count outputs

Directory overview

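A SAW count analysis is usually launched from a working directory. In that directory you will find a folder named after --id, or after --sn when the --id parameter is not set. Outputs are grouped by data type, and the main files are saved under /outs.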
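The folder naming rule above can be sketched in a few lines of Python. This is only an illustration, not part of SAW: the function name `outs_dir`, the example working directory `/data/work`, and the run ID `demo_run` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def outs_dir(workdir: str, sn: str, run_id: Optional[str] = None) -> Path:
    """Return the expected outs/ path of a SAW count run (hypothetical helper)."""
    # The run folder is named after --id when it is given, otherwise after --sn.
    run_folder = run_id if run_id else sn
    return Path(workdir) / run_folder / "outs"


# Without --id the folder takes the chip serial number.
print(outs_dir("/data/work", sn="C04042E3"))
# With --id the folder takes the ID instead.
print(outs_dir("/data/work", sn="C04042E3", run_id="demo_run"))
```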

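The exact files produced by the pipeline depend on:

  • the SAW software version
  • the pipeline chosen: SAW count or SAW realign
  • whether microscope images are included in the analysis
  • specific analysis parameter settings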
  • ...

Spatial gene expression

After a SAW count run completes for Stereo-seq T FF, Stereo-seq N FFPE, or Stereo-CITE T FF tissue samples, the following files can be found under outs/:

| Directory/File Name | Description |
| --- | --- |
| bam/ | Files in BAM format. |
| annotated_bam/ | BAM file after alignment and annotation. |
| <SN>.*.bam | Indexed BAM file containing position-sorted reads mapped to CIDs, aligned to the genome, and annotated with GTF/GFF. |
| <SN>.*.bam.csi | Index for <SN>.*.bam. |
| feature_expression/ | Feature expression matrices in HDF5 format at different dimensions. |
| <SN>.raw.gef | Feature expression matrix covering the complete chip region. It contains bin1 expression counts only. |
| <SN>.tissue.gef | Feature expression matrix restricted to the tissue coverage region. It also serves as a visualization GEF and includes expression counts for bin1, 5, 10, 20, 50, 100, 150, 200. |
| <SN>.cellbin.gef | Cellbin feature expression matrix that records each cell individually, including the centroid coordinate, boundary coordinates, gene expression, and cell area. |
| <SN>.adjusted.cellbin.gef | Cellbin expression matrix with expanded cell borders, based on <SN>_<stain_type>_mask_edm_dis_<distance>.tif. |
| <SN>.merge.barcodeReadsCount.txt | List of mapped CIDs with the read count for each CID, in three columns (x, y, count). |
| <SN>_raw_barcode_gene_exp.txt | Annotated list of coordinate, gene, MID, and read counts, used as the sampling file for sequencing saturation analysis. |
| analysis/ | Secondary analysis files. |
| <SN>.bin20_1.0.h5ad & <SN>.bin50_1.0.h5ad | AnnData H5AD files that record preprocessing, filtering, normalization, dimensionality reduction, clustering, and differential expression analysis, based on <SN>.tissue.gef. The files are named <SN>.<binN>_<leiden_res>.h5ad, where <SN> is the Stereo-seq chip serial number, <N> the bin size, and <leiden_res> the resolution of Leiden clustering. |
| <SN>.bin20_1.0.marker_features.csv & <SN>.bin50_1.0.marker_features.csv | Format-integrated differential expression analysis results, using <SN>.tissue.gef at bin20 and bin50. |
| <SN>.cellbin_1.0.h5ad | AnnData H5AD file that records preprocessing, filtering, normalization, dimensionality reduction, clustering, and differential expression analysis, using <SN>.cellbin.gef. |
| <SN>.cellbin_1.0.marker_features.csv | Format-integrated differential expression analysis results, using <SN>.cellbin.gef. |
| <SN>.cellbin_1.0.adjusted.h5ad | AnnData H5AD file that records preprocessing, filtering, normalization, dimensionality reduction, clustering, and differential expression analysis, using <SN>.adjusted.cellbin.gef. |
| <SN>.cellbin_1.0.adjusted.marker_features.csv | Format-integrated differential expression analysis results, using <SN>.adjusted.cellbin.gef. |
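The H5AD naming scheme described above lends itself to a small parser. The following is a minimal sketch, assuming the names follow exactly the patterns listed in this section (including the optional `protein.` segment seen in Stereo-CITE outputs); the function `parse_h5ad_name` and the regular expression are illustrative and not part of SAW.

```python
import re
from typing import Any, Dict, Optional

# Pattern inferred from the file names listed in this document (an assumption,
# not an official specification): <SN>[.protein].<binN|cellbin>_<res>[.adjusted].h5ad
H5AD_NAME = re.compile(
    r"^(?P<sn>[^.]+)\."
    r"(?P<protein>protein\.)?"
    r"(?P<unit>bin(?P<bin>\d+)|cellbin)_"
    r"(?P<res>\d+(?:\.\d+)?)"
    r"(?P<adjusted>\.adjusted)?"
    r"\.h5ad$"
)


def parse_h5ad_name(name: str) -> Optional[Dict[str, Any]]:
    """Split an analysis H5AD file name into its parts, or return None."""
    match = H5AD_NAME.match(name)
    if match is None:
        return None
    return {
        "sn": match.group("sn"),
        "protein": match.group("protein") is not None,
        "unit": match.group("unit"),
        # bin_size is None for cellbin files, which have no fixed bin size.
        "bin_size": int(match.group("bin")) if match.group("bin") else None,
        "leiden_res": float(match.group("res")),
        "adjusted": match.group("adjusted") is not None,
    }


print(parse_h5ad_name("C04042E3.bin20_1.0.h5ad"))
print(parse_h5ad_name("C04042E3.cellbin_1.0.adjusted.h5ad"))
```

Names that do not fit the pattern, such as GEF files, return `None`, so the helper can be used to filter a directory listing.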

Spatial protein expression

After a SAW count run completes for Stereo-CITE T FF tissue samples, the following files can be found under outs/:

| Directory/File Name | Description |
| --- | --- |
| feature_expression/ | Feature expression matrices in HDF5 format at different dimensions. |
| <SN>.protein.raw.gef | Feature expression matrix covering the complete chip region. It contains bin1 expression counts only. |
| <SN>.protein.tissue.gef | Feature expression matrix restricted to the tissue coverage region. It also serves as a visualization GEF and includes expression counts for bin1, 5, 10, 20, 50, 100, 150, 200. |
| <SN>.protein.cellbin.gef | Cellbin feature expression matrix that records each cell individually, including the centroid coordinate, boundary coordinates, expression of genes, and cell area. |
| <SN>.protein.adjusted.cellbin.gef | Cellbin expression matrix with expanded cell borders, based on <SN>_<stain_type>_mask_edm_dis_<distance>.tif. |
| <SN>.protein.tissue.rmbg.gem.gz | Feature expression matrix after automatic protein background removal. It contains bin1 expression counts. |
| <SN>_cid_pid_mid_reads.tsv | List of coordinate, PID, MID, and read counts, used as the sampling file for sequencing saturation analysis of all proteins. |
| <SN>_valid_cid_reads.tsv | List of mapped CIDs from all ADT FASTQs with the read count for each CID, in three columns (x, y, count). |
| analysis/ | Secondary analysis files. |
| <SN>.protein.bin20_0.1.h5ad & <SN>.protein.bin50_0.1.h5ad | AnnData H5AD files that record preprocessing, filtering, normalization, dimensionality reduction, clustering, and differential expression analysis, based on <SN>.protein.tissue.gef. The files are named <SN>.protein.<binN>_<leiden_res>.h5ad, where <SN> is the Stereo-seq chip serial number, <N> the bin size, and <leiden_res> the resolution of Leiden clustering. |
| <SN>.protein.cellbin_0.1.h5ad | AnnData H5AD file that records preprocessing, filtering, normalization, dimensionality reduction, clustering, and differential expression analysis, using <SN>.protein.cellbin.gef. |
| <SN>.protein.cellbin_0.1.adjusted.h5ad | AnnData H5AD file that records preprocessing, filtering, normalization, dimensionality reduction, clustering, and differential expression analysis, using <SN>.protein.adjusted.cellbin.gef. |
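The CID read-count lists above are described as three columns (x, y, count), and the tissue GEF stores counts at several bin sizes. The sketch below shows how such a list could be read and summed into larger bins. It rests on assumptions the documentation does not state: that the columns are whitespace- or tab-separated integers, that any non-numeric line (such as a header) can be skipped, and that a bin is labelled by the coordinate of its lower corner. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Record = Tuple[int, int, int]


def read_cid_counts(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (x, y, count) from text lines, skipping anything malformed."""
    for line in lines:
        fields = line.split()  # assumption: whitespace- or tab-separated
        if len(fields) != 3:
            continue
        try:
            x, y, count = (int(field) for field in fields)
        except ValueError:
            continue  # e.g. a header line
        yield x, y, count


def aggregate_to_bin(records: Iterable[Record], bin_size: int) -> Dict[Tuple[int, int], int]:
    """Sum bin1 counts into square bins of bin_size x bin_size spots."""
    binned: Counter = Counter()
    for x, y, count in records:
        # Assumption: each bin is keyed by its lower corner coordinate.
        origin = (x // bin_size * bin_size, y // bin_size * bin_size)
        binned[origin] += count
    return dict(binned)


sample = ["0\t0\t3", "19\t5\t2", "20\t0\t1"]
print(aggregate_to_bin(read_cid_counts(sample), bin_size=20))
```

For a real file, pass an open file handle to `read_cid_counts` instead of the in-memory sample list.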

Images

| Directory/File Name | Description |
| --- | --- |
| image/ | Images generated by the automatic or manual workflows. |
| <SN>_<stainType>_regist.tif | The panoramic image after registration with the <SN>.raw.gef matrix. |
| <SN>_<stainType>_tissue_cut.tif | The tissue segmentation image, based on the aligned panoramic image. |
| <SN>_<stainType>_mask.tif | The cell segmentation image, based on the aligned panoramic image. |
| <SN>_<stainType>_mask_edm_dis_<distance>.tif | The adjusted image, based on the cell segmentation image. |

HTML report and visualization

| Directory/File Name | Description |
| --- | --- |
| <SN>.report.html | Analysis summary report of metrics and plots in HTML format. |
| visualization.tar.gz | StereoMap visualization archive, used for display and manual processing. |
| <SN>.stereo | Manifest file in JSON format that contains experiment and pipeline information, basic analysis statistics, and references to the image and spatial matrix files in the SAW output visualization folder. |

visualization.tar.gz

The visualization archive bundles the files that StereoMap needs for display. An example of its contents after extraction:

visualization
├── C04042E3.adjusted.cellbin.gef
├── C04042E3.bin20_1.0.h5ad
├── C04042E3.bin50_1.0.h5ad
├── C04042E3.cellbin_1.0.adjusted.h5ad
├── C04042E3.rpi
├── C04042E3_SC_20240930_141410_4.1.0.tar.gz
├── C04042E3.stereo
├── C04042E3.tissue.gef
└── HE_matrix_template.txt

An example of the extracted visualization archive from a Stereo-CITE analysis:

visualization
├── A02677B5.adjusted.cellbin.gef
├── A02677B5.bin20_1.0.h5ad
├── A02677B5.bin50_1.0.h5ad
├── A02677B5.cellbin_1.0.adjusted.h5ad
├── A02677B5.protein.adjusted.cellbin.gef
├── A02677B5.protein.bin20_0.1.h5ad
├── A02677B5.protein.bin50_0.1.h5ad
├── A02677B5.protein.cellbin_0.1.adjusted.h5ad
├── A02677B5.protein.tissue.gef
├── A02677B5.rpi
├── A02677B5_SC_20240930_094017_4.1.0.tar.gz
├── A02677B5.stereo
├── A02677B5.tissue.gef
└── DAPI_matrix_template.txt
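To check what an archive like the ones above contains without opening StereoMap, the Python standard library is enough. This is a minimal sketch that only lists and groups member names; the helper names `list_archive` and `group_by_suffix` are hypothetical, and the archive path in the usage part assumes you run it from the outs/ directory.

```python
import tarfile
from pathlib import Path
from typing import Dict, List


def list_archive(archive_path: str) -> List[str]:
    """Return the sorted names of the regular files inside a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(member.name for member in tar.getmembers() if member.isfile())


def group_by_suffix(names: List[str]) -> Dict[str, List[str]]:
    """Group file names by their last extension, e.g. .gef, .h5ad, .stereo."""
    groups: Dict[str, List[str]] = {}
    for name in names:
        suffix = Path(name).suffix or "(none)"
        groups.setdefault(suffix, []).append(Path(name).name)
    return groups


if __name__ == "__main__":
    archive = Path("visualization.tar.gz")  # assumed location
    if archive.exists():
        for suffix, files in group_by_suffix(list_archive(str(archive))).items():
            print(suffix, files)
```

Listing before extracting is a cheap way to confirm that the expected GEF, H5AD, and .stereo files are present.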

.stereo

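.stereo is a manifest file in JSON format that records:

  • basic information about the SAW pipeline run
  • information about the tissue sample
  • basic analysis statistics
  • the image and matrix file information that StereoMap needs

*Detailed file descriptions can be found in the individual sections under "Outputs".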
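Because the manifest is JSON, it can be inspected with any JSON reader. The sketch below deliberately assumes no key names, since this page does not list them; it only assumes the top level is a JSON object and reports each top-level key with the type of its value. The function name and the example file name are illustrative.

```python
import json
from pathlib import Path
from typing import Dict


def summarize_manifest(path: str) -> Dict[str, str]:
    """Map each top-level key of a JSON manifest to the type name of its value."""
    with open(path, encoding="utf-8") as handle:
        manifest = json.load(handle)
    # Assumption: the manifest is a JSON object at the top level.
    if not isinstance(manifest, dict):
        raise ValueError("expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in manifest.items()}


if __name__ == "__main__":
    manifest_path = Path("C04042E3.stereo")  # example name from the listing above
    if manifest_path.exists():
        for key, kind in summarize_manifest(str(manifest_path)).items():
            print(f"{key}: {kind}")
```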

© 2025 STOmics Tech. All rights reserved. Modified: 2025-03-07 10:28:04
