Scientific Plotting: PCA Plots
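Principal component analysis (PCA) is a mathematical method for dimensionality reduction and feature extraction, widely used in data analysis, pattern recognition, image processing, and related fields. PCA applies a linear transformation that re-expresses the original data in a new coordinate system whose axes, the principal components, are ordered by the variance they capture: the first component points in the direction of greatest variance, and each later component captures the greatest remaining variance while staying orthogonal to the earlier ones. As a result, most of the information in the original dataset can be concentrated in the first few principal components, which is what makes the dimensionality reduction possible.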
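To make the variance ordering concrete, here is a minimal sketch in base R using `prcomp()`. The built-in `iris` measurements are only a stand-in dataset, and the variable names are illustrative assumptions rather than part of the workflow in this post.

```r
# Minimal PCA sketch with base R; iris is used only as a stand-in dataset.
x <- iris[, 1:4]                      # four numeric measurements per flower

# Center and scale so that each variable contributes on the same footing
fit <- prcomp(x, center = TRUE, scale. = TRUE)

# Share of the total variance captured by each principal component
var_explained <- fit$sdev^2 / sum(fit$sdev^2)
names(var_explained) <- colnames(fit$x)
print(round(var_explained, 3))

# Cumulative share: shows how few components carry most of the variance
print(round(cumsum(var_explained), 3))
```

Note that `prcomp()` expects samples in rows, whereas `PCAtools::pca()` expects samples in columns, so the same table may need to be transposed when moving between the two.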
Drawing a PCA plot with PCAtools
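library(PCAtools)
# Data matrix: rows are features, columns are samples
A <- read.table(file = 'A.txt',
                sep = '\t', header = TRUE, row.names = 1)
# Sample metadata: one row per sample, row names matching the columns of A
sample <- read.table(file = 'sample.txt',
                     sep = '\t', header = TRUE, row.names = 1)
# Run the PCA; PCAtools requires colnames(A) to match rownames(sample)
pca <- pca(A, metadata = sample)
# Biplot of the first two principal components
biplot(pca, x = 'PC1', y = 'PC2')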
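A common stumbling block at this step is that the sample names in the data matrix and in the metadata table are not in the same order. The sketch below shows one way to check and fix the alignment before calling `pca()`. It uses simulated toy inputs (`A_toy` and `sample_toy` are hypothetical names, not the files read above) and only base R.

```r
# Toy inputs mimicking the layout assumed above (all values are simulated).
set.seed(1)
A_toy <- matrix(rnorm(40), nrow = 10,
                dimnames = list(paste0("gene", 1:10),
                                c("s1", "s2", "s3", "s4")))
sample_toy <- data.frame(strain = c("a", "b", "a", "b"),
                         stage  = c("early", "early", "late", "late"),
                         row.names = c("s3", "s1", "s4", "s2"))  # shuffled on purpose

# Every column of the data matrix needs a matching metadata row
shared <- intersect(colnames(A_toy), rownames(sample_toy))
stopifnot(length(shared) == ncol(A_toy))

# Reorder the metadata rows to follow the column order of the data matrix
sample_toy <- sample_toy[colnames(A_toy), , drop = FALSE]

# TRUE once both tables list the samples in the same order
identical(colnames(A_toy), rownames(sample_toy))
```

With real data, the same reordering line applied to `sample` and `A` should be enough, provided the two files use identical sample names.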
Drawing a PCA plot with ggplot2
library(tidyverse)
library(cowplot)
library(ggsci)
# Join the sample metadata onto the PCA coordinates
pca_rotated_plus <- rownames_to_column(pca$rotated,
                                       var = 'sample_name') %>%
  left_join(rownames_to_column(sample, var = 'sample_name'),
            by = 'sample_name')
ggplot(pca_rotated_plus, aes(x = PC1, y = PC2)) +
  geom_point(size = 8, aes(shape = strain, fill = stage)) +
  labs(x = 'PC1 (68% variance explained)',
       y = 'PC2 (11% variance explained)') +
  scale_shape_manual(values = 21:22) +
  scale_fill_brewer(palette = 'Set2') +
  theme_half_open()
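The fill colors do not show up in the legend, because the default legend key is a solid point shape that ignores `fill`. Overriding the key with a fillable shape (shape 21) fixes this: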
ggplot(pca_rotated_plus, aes(x = PC1, y = PC2)) +
  geom_point(size = 8, aes(fill = stage, shape = strain)) +
  labs(x = 'PC1 (68% variance explained)',
       y = 'PC2 (11% variance explained)') +
  scale_shape_manual(values = 21:24) +
  scale_fill_brewer(palette = 'Set3') +
  theme_half_open() +
  # Draw the fill legend keys with shape 21 so the fill colors are visible
  guides(fill = guide_legend(override.aes = list(shape = 21)))
Adjusting the legend position and adding confidence ellipses
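ggplot(pca_rotated_plus, aes(x = PC1, y = PC2)) +
  geom_point(size = 8, aes(fill = stage, shape = strain)) +
  # One ellipse per stage; groups with very few points may not get an ellipse
  stat_ellipse(aes(color = stage)) +
  labs(x = 'PC1 (68% variance explained)',
       y = 'PC2 (11% variance explained)') +
  scale_shape_manual(values = 21:24) +
  scale_fill_brewer(palette = 'Set2') +
  scale_color_brewer(palette = 'Set2') +
  theme_half_open() +
  guides(fill = guide_legend(override.aes = list(shape = 21))) +
  theme(
    # Legend placed inside the panel, in relative coordinates from 0 to 1.
    # ggplot2 >= 3.5.0 deprecates a numeric legend.position in favor of
    # legend.position = "inside" together with legend.position.inside.
    legend.position = c(0.18, 0.85),
    legend.direction = "horizontal",
    legend.background = element_rect(fill = "gray95"))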
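The axis labels in the plots above are typed in by hand, so they go stale whenever the input data change. As a sketch, the labels could instead be built from the percentages stored in the PCA result. This assumes that the `variance` element of a PCAtools result is a numeric vector of percentages named `PC1`, `PC2`, and so on; `pc_label()` and `variance_toy` are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: build an axis label from a named vector of
# percentages, such as the `variance` element of a PCAtools result.
pc_label <- function(variance, pc) {
  sprintf("%s (%.0f%% variance explained)", pc, variance[[pc]])
}

# Stand-in values; with a real result you would pass pca$variance instead.
variance_toy <- c(PC1 = 68.2, PC2 = 11.4, PC3 = 6.1)

pc_label(variance_toy, "PC1")
pc_label(variance_toy, "PC2")
```

In the plotting code, this would replace the hard-coded strings with something like `labs(x = pc_label(pca$variance, 'PC1'), y = pc_label(pca$variance, 'PC2'))`.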