Scientific Plotting: PCA Plots

2026-02-13 18:28:25

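Principal component analysis (PCA) is a mathematical method for dimensionality reduction and feature extraction, widely used in data analysis, pattern recognition, image processing, and related fields. PCA applies a linear transformation that re-expresses the original data in a new coordinate system: the first axis (the first principal component) points in the direction of greatest variance, and each subsequent axis captures the largest remaining variance while staying orthogonal to the earlier ones. As a result, most of the information in the original dataset is concentrated in the first few principal components, and keeping only those reduces the dimensionality of the data.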
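To make the variance claim concrete, here is a minimal sketch using base R's `prcomp` on the built-in `iris` measurements. The dataset and the variable names (`X`, `fit`, `pc_var`, `pc_share`) are my own choices for illustration and are not part of the analysis in this post. The sketch shows that the component variances match the eigenvalues of the covariance matrix and that the rotation only redistributes the total variance.

```r
# Minimal PCA sketch with base R; `iris` is only stand-in example data.
X <- as.matrix(iris[, 1:4])

# Center each variable, then rotate the data onto the principal axes.
fit <- prcomp(X, center = TRUE, scale. = FALSE)

# Variance carried by each principal component (largest first).
pc_var <- fit$sdev^2

# Share of the total variance explained by each component.
pc_share <- pc_var / sum(pc_var)
print(round(pc_share, 3))

# The rotation preserves total variance: it is only redistributed across PCs.
total_var_original <- sum(apply(X, 2, var))
total_var_pcs <- sum(pc_var)

# The same variances are the eigenvalues of the covariance matrix.
eig_values <- eigen(cov(X))$values
```

If the first one or two entries of `pc_share` are large, a 2D plot of PC1 against PC2 is a reasonable summary of the data.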

Drawing a PCA plot with PCAtools

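library(PCAtools)

# Data matrix: variables (e.g. genes) in rows, samples in columns
A <- read.table(file = 'A.txt', sep = '\t', header = TRUE, row.names = 1)

# Sample metadata: one row per sample
sample <- read.table(file = 'sample.txt', sep = '\t', header = TRUE, row.names = 1)

# Run PCA; rownames(sample) must be identical to colnames(A), in the same order
pca <- pca(A, metadata = sample)

# Plot PC1 against PC2
biplot(pca, x = 'PC1', y = 'PC2')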
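The `metadata` argument expects the metadata row names to line up exactly with the matrix column names, so it can help to align the two tables before running the PCA. Below is a minimal base R sketch of that alignment. The objects `toy_expr` and `toy_meta` and all their values are hypothetical stand-ins for `A` and `sample`.

```r
# Toy stand-ins for A and sample; all names and values are hypothetical.
toy_expr <- matrix(
  c(5.1, 4.8, 9.2, 8.7,
    2.0, 2.3, 1.1, 0.9,
    7.4, 7.0, 3.3, 3.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3"), c("s1", "s2", "s3", "s4"))
)
toy_meta <- data.frame(
  strain = c("B", "A", "B", "A"),
  row.names = c("s3", "s1", "s4", "s2")  # deliberately in a different order
)

# Keep only the samples present in both tables, in the matrix's column order.
shared <- intersect(colnames(toy_expr), rownames(toy_meta))
toy_expr <- toy_expr[, shared, drop = FALSE]
toy_meta <- toy_meta[shared, , drop = FALSE]

# After alignment the sample names match exactly, in the same order.
stopifnot(identical(colnames(toy_expr), rownames(toy_meta)))
```

The same three alignment lines should work on real data by substituting `A` and `sample` for the toy objects.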

Drawing a PCA plot with ggplot2

library(tidyverse)
library(cowplot)
library(ggsci)

# Attach the sample metadata to the PCA scores, matching on sample name
pca_rotated_plus <- rownames_to_column(pca$rotated, var = 'sample_name') %>%
  left_join(rownames_to_column(sample, var = 'sample_name'),
            by = 'sample_name')

ggplot(pca_rotated_plus, aes(x = PC1, y = PC2)) +
  geom_point(size = 8, aes(shape = strain, fill = stage)) +
  labs(x = 'PC1 (68% variance explained)',
       y = 'PC2 (11% variance explained)') +
  scale_shape_manual(values = 21:22) +
  scale_fill_brewer(palette = 'Set2') +
  theme_half_open()
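The axis labels above hard-code the explained variance, so they go stale if the input data change. One option is to compute the labels from the PCA result. The sketch below does this with base R's `prcomp` on the built-in `iris` data as a stand-in; `fit`, `pct`, and `axis_label` are illustrative names of my own. As far as I know, PCAtools objects store the per-component percentages in `pca$variance`, which could take the place of `pct`, but treat that as an assumption to check against your installed version.

```r
# Stand-in PCA on example data; `fit`, `pct`, and `axis_label` are illustrative names.
fit <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Percent of total variance explained by each component.
pct <- 100 * fit$sdev^2 / sum(fit$sdev^2)

# Helper that formats an axis label for the i-th principal component.
axis_label <- function(i) {
  sprintf("PC%d (%.0f%% variance explained)", i, pct[i])
}

x_lab <- axis_label(1)
y_lab <- axis_label(2)
print(c(x_lab, y_lab))
```

The resulting strings can be passed straight to `labs(x = x_lab, y = y_lab)`.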

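The fill colors do not show up in the legend: the fill legend keys are drawn with the default point shape, which ignores the fill aesthetic, so every key looks the same. We fix this by overriding the legend key shape with a fillable one (shape 21).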

ggplot(pca_rotated_plus, aes(x = PC1, y = PC2)) +
  geom_point(size = 8, aes(fill = stage, shape = strain)) +
  labs(x = 'PC1 (68% variance explained)',
       y = 'PC2 (11% variance explained)') +
  scale_shape_manual(values = 21:24) +
  scale_fill_brewer(palette = 'Set3') +
  theme_half_open() +
  guides(fill = guide_legend(override.aes = list(shape = 21)))
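The legend fix can be tried in isolation on a tiny dataset. The sketch below uses only ggplot2; the data frame `toy` and its values are hypothetical. Only point shapes 21 to 25 respond to the `fill` aesthetic, which is why the legend key shape has to be overridden with one of them.

```r
library(ggplot2)

# Hypothetical toy data with one fill variable and one shape variable.
toy <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(2, 1, 4, 3),
  stage = c("early", "early", "late", "late"),
  strain = c("A", "B", "A", "B")
)

p <- ggplot(toy, aes(x, y)) +
  geom_point(aes(fill = stage, shape = strain), size = 5) +
  # Shapes 21-25 are the only point shapes that use the fill aesthetic.
  scale_shape_manual(values = c(A = 21, B = 22)) +
  # Draw the fill legend keys with a fillable shape so the colors are visible.
  guides(fill = guide_legend(override.aes = list(shape = 21)))

print(p)
```

Removing the `guides()` line from this sketch should reproduce the problem, with identical keys in the `stage` legend.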

Adjusting the legend position and adding confidence ellipses

ggplot(pca_rotated_plus, aes(x = PC1, y = PC2)) +
  geom_point(size = 8, aes(fill = stage, shape = strain)) +
  # One ellipse per stage; groups with very few samples cannot get an ellipse
  stat_ellipse(aes(color = stage)) +
  labs(x = 'PC1 (68% variance explained)',
       y = 'PC2 (11% variance explained)') +
  scale_shape_manual(values = 21:24) +
  scale_fill_brewer(palette = 'Set2') +
  scale_color_brewer(palette = 'Set2') +
  theme_half_open() +
  guides(fill = guide_legend(override.aes = list(shape = 21))) +
  theme(
    # Relative coordinates between 0 and 1; ggplot2 >= 3.5.0 prefers
    # legend.position = "inside" with legend.position.inside = c(0.18, 0.85)
    legend.position = c(0.18, 0.85),
    legend.direction = "horizontal",
    legend.background = element_rect(fill = "gray95"))
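`stat_ellipse()` has a few options worth knowing about when the ellipses are meant as confidence regions. The sketch below uses simulated scores for two hypothetical groups (not the data from this post) to show the `type` and `level` arguments.

```r
library(ggplot2)

# Simulated scores for two hypothetical groups; not the article's data.
set.seed(1)
toy_scores <- data.frame(
  PC1 = c(rnorm(20, mean = -2), rnorm(20, mean = 2)),
  PC2 = c(rnorm(20, mean = 0), rnorm(20, mean = 1)),
  group = rep(c("early", "late"), each = 20)
)

p <- ggplot(toy_scores, aes(x = PC1, y = PC2)) +
  geom_point(aes(fill = group), shape = 21, size = 3) +
  # type = "t" (the default) assumes a multivariate t-distribution;
  # type = "norm" assumes a multivariate normal. level sets the coverage.
  stat_ellipse(aes(color = group), type = "norm", level = 0.95) +
  theme_bw()

print(p)
```

With only a handful of samples per group, as is common in sequencing experiments, the ellipses are poorly constrained and are best read as a visual guide, not a formal test.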
