Analysis
Clustering analysis
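The SAW count, realign, and reanalyze pipelines output spatial clustering results in AnnData H5AD format. The file records the results of data preprocessing, dimensionality reduction, clustering, and differential expression analysis.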
The clustering results and the UMAP embedding stored in the H5AD file can be visualized in StereoMap.
The contents of an H5AD file are listed in detail below:
$ h5dump -n <task id>/outs/analysis/<SN>.bin200_1.0.h5ad ## you can also check <SN>.cellbin_1.0.h5ad
HDF5 "<task id>/outs/analysis/<SN>.bin200_1.0.h5ad" {
FILE_CONTENTS {
group /
dataset /X
group /layers
group /obs
dataset /obs/_index
group /obs/leiden
dataset /obs/leiden/categories
dataset /obs/leiden/codes
dataset /obs/n_genes_by_counts
group /obs/orig.ident
dataset /obs/orig.ident/categories
dataset /obs/orig.ident/codes
dataset /obs/pct_counts_mt
dataset /obs/total_counts
dataset /obs/x
dataset /obs/y
group /obsm
dataset /obsm/X_pca
dataset /obsm/X_umap
dataset /obsm/spatial
group /obsp
group /obsp/connectivities
dataset /obsp/connectivities/data
dataset /obsp/connectivities/indices
dataset /obsp/connectivities/indptr
group /obsp/distances
dataset /obsp/distances/data
dataset /obsp/distances/indices
dataset /obsp/distances/indptr
group /raw
group /raw/X
dataset /raw/X/data
dataset /raw/X/indices
dataset /raw/X/indptr
group /raw/var
dataset /raw/var/_index
dataset /raw/var/mean_umi
dataset /raw/var/n_cells
dataset /raw/var/n_counts
group /raw/var/real_gene_name
dataset /raw/var/real_gene_name/categories
dataset /raw/var/real_gene_name/codes
group /raw/varm
group /uns
dataset /uns/bin_size
dataset /uns/bin_type
group /uns/gene_exp_leiden
dataset /uns/gene_exp_leiden/1
...
dataset /uns/gene_exp_leiden/_index
group /uns/hvg
dataset /uns/hvg/method
group /uns/hvg/params
dataset /uns/hvg/source
dataset /uns/leiden_resolution
group /uns/neighbors
dataset /uns/neighbors/connectivities_key
dataset /uns/neighbors/distance_key
group /uns/rank_genes_groups
dataset /uns/rank_genes_groups/logfoldchanges
group /uns/rank_genes_groups/mean_count
dataset /uns/rank_genes_groups/mean_count/1
...
dataset /uns/rank_genes_groups/mean_count/_index
dataset /uns/rank_genes_groups/names
group /uns/rank_genes_groups/params
dataset /uns/rank_genes_groups/params/corr_method
dataset /uns/rank_genes_groups/params/groupby
dataset /uns/rank_genes_groups/params/method
dataset /uns/rank_genes_groups/params/reference
dataset /uns/rank_genes_groups/params/use_raw
group /uns/rank_genes_groups/pts
dataset /uns/rank_genes_groups/pts/1
...
dataset /uns/rank_genes_groups/pts/_index
group /uns/rank_genes_groups/pts_rest
dataset /uns/rank_genes_groups/pts_rest/1
...
dataset /uns/rank_genes_groups/pts_rest/_index
dataset /uns/rank_genes_groups/pvals
dataset /uns/rank_genes_groups/pvals_adj
dataset /uns/rank_genes_groups/scores
dataset /uns/resolution
group /uns/sn
dataset /uns/sn/_index
dataset /uns/sn/batch
dataset /uns/sn/sn
group /var
dataset /var/_index
dataset /var/dispersions
dataset /var/dispersions_norm
dataset /var/highly_variable
dataset /var/mean_umi
dataset /var/means
dataset /var/n_cells
dataset /var/n_counts
group /var/real_gene_name
dataset /var/real_gene_name/categories
dataset /var/real_gene_name/codes
group /varm
group /varp
}
}
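In the listing above, /obs/leiden is a group holding a categories dataset and a codes dataset. This is how the AnnData on-disk format stores a categorical column: each value in codes is an index into categories. The sketch below is illustrative only and not part of SAW. It shows how the cluster labels could be decoded and counted by hand. The function names are hypothetical, and the use of the third-party h5py package (and of variable-length string storage for categories) is an assumption; in practice, anndata.read_h5ad decodes such columns automatically.

```python
from collections import Counter


def decode_categorical(codes, categories):
    """Map integer codes to category labels (a code of -1 marks a missing value)."""
    return [categories[c] if c >= 0 else None for c in codes]


def cluster_sizes(codes, categories):
    """Count how many bins (or cells) carry each cluster label."""
    labels = decode_categorical(codes, categories)
    return Counter(label for label in labels if label is not None)


def read_leiden(h5ad_path):
    """Read the raw codes and categories of /obs/leiden from an H5AD file.

    Assumes h5py is installed and that categories are stored as strings.
    """
    import h5py  # imported lazily so the helpers above work without it

    with h5py.File(h5ad_path, "r") as f:
        codes = f["obs/leiden/codes"][:].tolist()
        categories = f["obs/leiden/categories"].asstr()[:].tolist()
    return codes, categories
```

Calling cluster_sizes(*read_leiden(path)) on an H5AD file with this layout would give the number of bins or cells per Leiden cluster.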
Differential expression analysis
SAW count, realign, and reanalyze output the differential expression analysis results in CSV format.
The differential expression analysis produces two CSV result files: find_marker_genes.csv and <bin_size>_marker_features.csv.
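find_marker_genes.csv is the raw output of the differential expression analysis. <bin_size>_marker_features.csv contains the same information, organized and reformatted to be more concise and readable.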
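For each feature, the following metrics are calculated per cluster:
- Mean MID count
- Log2 fold change in expression
- Adjusted p-value
- Proportion of the cluster in which the gene is expressed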
Feature ID,Feature Name,Cluster 1 Mean MID Count,Cluster 1 Log2 fold change,Cluster 1 Adjusted p-value,Cluster 1 % of expressed, ... ,Cluster 20 Mean MID Count,Cluster 20 Log2 fold change,Cluster 20 Adjusted p-value,Cluster 20 % of expressed
ENSMUSG00000016559,H3f3b,67.1754386,42.00155933,1.76E-41,1, ... ,0.076923077,-63.19518177,0,0.076923077
The differential expression results recorded in <bin_size>_marker_features.csv can be viewed in StereoMap or opened directly in Excel.
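The marker feature table is wide: every cluster contributes four columns named "Cluster N <metric>". For scripted filtering it can help to reshape it into one record per feature and cluster. The following is a minimal sketch using only the Python standard library. The function name to_long and the output keys are hypothetical, and the sample text reuses the header and row shown above with only clusters 1 and 20 kept (the columns elided by "..." are left out).

```python
import csv
import io
import re

# Header and row from the sample above, restricted to clusters 1 and 20.
SAMPLE = (
    "Feature ID,Feature Name,"
    "Cluster 1 Mean MID Count,Cluster 1 Log2 fold change,"
    "Cluster 1 Adjusted p-value,Cluster 1 % of expressed,"
    "Cluster 20 Mean MID Count,Cluster 20 Log2 fold change,"
    "Cluster 20 Adjusted p-value,Cluster 20 % of expressed\n"
    "ENSMUSG00000016559,H3f3b,67.1754386,42.00155933,1.76E-41,1,"
    "0.076923077,-63.19518177,0,0.076923077\n"
)

# Matches column names such as "Cluster 20 Log2 fold change".
COLUMN = re.compile(r"^Cluster (\d+) (.+)$")


def to_long(text):
    """Turn the wide per-cluster columns into one record per (feature, cluster)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        per_cluster = {}
        for name, value in row.items():
            match = COLUMN.match(name)
            if match:
                cluster, metric = int(match.group(1)), match.group(2)
                per_cluster.setdefault(cluster, {})[metric] = float(value)
        for cluster, metrics in sorted(per_cluster.items()):
            records.append({
                "feature_id": row["Feature ID"],
                "feature_name": row["Feature Name"],
                "cluster": cluster,
                **metrics,
            })
    return records


records = to_long(SAMPLE)
```

Each record can then be filtered directly, for example by keeping only entries whose "Adjusted p-value" is below a chosen threshold.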
Multi-omics joint analysis: clustering
If you run SAW reanalyze on a Stereo-CITE T FF sample, the clustering results of the joint multi-omics analysis are saved in an H5MU file.
The following is an example of the information recorded in an H5MU file:
$ h5dump -n <task id>/outs/analysis/<SN>.bin20.h5mu
HDF5 "<task id>/outs/analysis/<SN>.bin20.h5mu" {
FILE_CONTENTS {
group /
group /mod
group /mod/multiomics
dataset /mod/multiomics/X
group /mod/multiomics/layers
dataset /mod/multiomics/layers/denoised_rna
group /mod/multiomics/obs
dataset /mod/multiomics/obs/_index
group /mod/multiomics/obs/leiden
dataset /mod/multiomics/obs/leiden/categories
dataset /mod/multiomics/obs/leiden/codes
dataset /mod/multiomics/obs/n_genes_by_counts
group /mod/multiomics/obs/orig.ident
dataset /mod/multiomics/obs/orig.ident/categories
dataset /mod/multiomics/obs/orig.ident/codes
dataset /mod/multiomics/obs/pct_counts_mt
dataset /mod/multiomics/obs/total_counts
dataset /mod/multiomics/obs/x
dataset /mod/multiomics/obs/y
group /mod/multiomics/obsm
dataset /mod/multiomics/obsm/X_totalVI
dataset /mod/multiomics/obsm/X_umap
dataset /mod/multiomics/obsm/spatial
group /mod/multiomics/obsp
group /mod/multiomics/obsp/connectivities
dataset /mod/multiomics/obsp/connectivities/data
dataset /mod/multiomics/obsp/connectivities/indices
dataset /mod/multiomics/obsp/connectivities/indptr
group /mod/multiomics/obsp/distances
dataset /mod/multiomics/obsp/distances/data
dataset /mod/multiomics/obsp/distances/indices
dataset /mod/multiomics/obsp/distances/indptr
group /mod/multiomics/raw
group /mod/multiomics/raw/X
dataset /mod/multiomics/raw/X/data
dataset /mod/multiomics/raw/X/indices
dataset /mod/multiomics/raw/X/indptr
group /mod/multiomics/raw/var
dataset /mod/multiomics/raw/var/_index
dataset /mod/multiomics/raw/var/mean_umi
dataset /mod/multiomics/raw/var/n_cells
dataset /mod/multiomics/raw/var/n_counts
group /mod/multiomics/raw/var/real_gene_name
dataset /mod/multiomics/raw/var/real_gene_name/categories
dataset /mod/multiomics/raw/var/real_gene_name/codes
group /mod/multiomics/raw/varm
group /mod/multiomics/uns
dataset /mod/multiomics/uns/bin_size
dataset /mod/multiomics/uns/bin_type
group /mod/multiomics/uns/gene_exp_leiden
dataset /mod/multiomics/uns/gene_exp_leiden/1
...
dataset /mod/multiomics/uns/gene_exp_leiden/9
dataset /mod/multiomics/uns/gene_exp_leiden/_index
group /mod/multiomics/uns/hvg
dataset /mod/multiomics/uns/hvg/method
group /mod/multiomics/uns/hvg/params
dataset /mod/multiomics/uns/hvg/source
dataset /mod/multiomics/uns/leiden_resolution
group /mod/multiomics/uns/neighbors
dataset /mod/multiomics/uns/neighbors/connectivities_key
dataset /mod/multiomics/uns/neighbors/distance_key
dataset /mod/multiomics/uns/omics
dataset /mod/multiomics/uns/resolution
group /mod/multiomics/uns/sn
dataset /mod/multiomics/uns/sn/_index
dataset /mod/multiomics/uns/sn/batch
dataset /mod/multiomics/uns/sn/sn
group /mod/multiomics/var
dataset /mod/multiomics/var/_index
dataset /mod/multiomics/var/highly_variable
dataset /mod/multiomics/var/highly_variable_nbatches
dataset /mod/multiomics/var/highly_variable_rank
dataset /mod/multiomics/var/mean_umi
dataset /mod/multiomics/var/means
dataset /mod/multiomics/var/n_cells
dataset /mod/multiomics/var/n_counts
group /mod/multiomics/var/real_gene_name
dataset /mod/multiomics/var/real_gene_name/categories
dataset /mod/multiomics/var/real_gene_name/codes
dataset /mod/multiomics/var/variances
dataset /mod/multiomics/var/variances_norm
group /mod/multiomics/varm
group /mod/multiomics/varp
group /mod/protein
...
group /mod/rna
...
group /obs
dataset /obs/_index
dataset /obs/_scvi_batch
dataset /obs/_scvi_labels
dataset /obs/_scvi_raw_norm_scaling
group /obsm
dataset /obsm/multiomics
dataset /obsm/protein
dataset /obsm/rna
group /obsmap
dataset /obsmap/multiomics
dataset /obsmap/protein
dataset /obsmap/rna
group /obsp
group /uns
dataset /uns/_scvi_manager_uuid
dataset /uns/_scvi_uuid
group /var
dataset /var/_index
dataset /var/mean_umi
dataset /var/n_cells
dataset /var/n_counts
group /var/real_gene_name
dataset /var/real_gene_name/categories
dataset /var/real_gene_name/codes
group /varm
dataset /varm/multiomics
dataset /varm/protein
dataset /varm/rna
group /varmap
dataset /varmap/multiomics
dataset /varmap/protein
dataset /varmap/rna
group /varp
}
}
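The H5MU listing nests one AnnData-like group per modality under /mod (here multiomics, protein, and rna). As a quick sanity check on a file, the text printed by h5dump -n can be parsed to see which modalities and embeddings are present. The sketch below uses only the standard library. The function names are hypothetical, and the sample text is a short excerpt of the listing above. To load the data itself, a reader such as mudata.read_h5mu would normally be used instead.

```python
# Excerpt of the `h5dump -n` listing shown above.
LISTING = """\
group /
group /mod
group /mod/multiomics
dataset /mod/multiomics/X
group /mod/multiomics/obsm
dataset /mod/multiomics/obsm/X_totalVI
dataset /mod/multiomics/obsm/X_umap
group /mod/protein
...
group /mod/rna
...
"""


def parse_listing(text):
    """Return (kind, path) pairs for each 'group' or 'dataset' line."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("group", "dataset"):
            entries.append((parts[0], parts[1]))
    return entries


def modalities(entries):
    """Names of the groups directly under /mod, in listing order."""
    names = []
    for kind, path in entries:
        segments = path.strip("/").split("/")
        if kind == "group" and len(segments) == 2 and segments[0] == "mod":
            names.append(segments[1])
    return names


entries = parse_listing(LISTING)
found = modalities(entries)
```

Lines such as "..." are skipped because they do not start with group or dataset.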