count Outputs
Directory Overview
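A SAW count analysis task is usually launched from a working directory. In that directory you will find a folder named after the --id value (or after the --sn value when --id is not set). Outputs are grouped by data type, and the main files are saved under outs/.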
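The folder naming rule can be expressed as a small path helper. This is a minimal sketch: the function name `resolve_outs_dir` and its parameters are hypothetical, and it only mirrors the rule stated above (folder named after --id, falling back to --sn, with outs/ inside it). It does not call SAW.

```python
from pathlib import Path
from typing import Optional


def resolve_outs_dir(workdir: str, sample_id: Optional[str] = None,
                     sn: Optional[str] = None) -> Path:
    """Return the expected outs/ directory of a SAW count run.

    Hypothetical helper: the run folder is named after the --id value;
    when --id is not set, it falls back to the --sn value.
    """
    folder = sample_id or sn
    if not folder:
        raise ValueError("either sample_id (--id) or sn (--sn) must be given")
    return Path(workdir) / folder / "outs"
```

For example, with a hypothetical working directory, `resolve_outs_dir("/data/saw_runs", sn="C04042E3")` points at `/data/saw_runs/C04042E3/outs`.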
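The exact files produced by the pipeline depend on:
- the SAW software version
- the pipeline chosen, SAW count or SAW realign
- whether a microscope image is included in the analysis
- specific analysis parameter settings
- ...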
Spatial Gene Expression
After a SAW count run on a Stereo-seq T FF, Stereo-seq N FFPE, or Stereo-CITE T FF tissue sample completes, the following files can be found in the outs/ directory:
Directory/File Name | Description |
---|---|
bam/ | Files in BAM format. |
annotated_bam/ | BAM file after alignment and annotation. |
<SN>.*.bam | Indexed BAM file containing position-sorted reads mapped to CIDs, aligned to the genome, and annotated with GTF/GFF. |
<SN>.*.bam.csi | Index for <SN>.*.bam. |
feature_expression/ | Feature expression matrices in HDF5 format at different dimensions. |
<SN>.raw.gef | Feature expression matrix covering the entire chip region. It contains only bin1 expression counts. |
<SN>.tissue.gef | Feature expression matrix restricted to the tissue-covered region. It is also a visualization GEF that includes expression counts for bin1, 5, 10, 20, 50, 100, 150, and 200. |
<SN>.cellbin.gef | Cellbin feature expression matrix that records per-cell information, including the centroid coordinates, boundary coordinates, gene expression, and cell area. |
<SN>.adjusted.cellbin.gef | Cellbin expression matrix with expanded cell borders, based on <SN>_<stain_type>_mask_edm_dis_<distance>.tif. |
<SN>.merge.barcodeReadsCount.txt | List of mapped CIDs with the read count for each CID, in three columns (x, y, count). |
<SN>_raw_barcode_gene_exp.txt | Annotated list of coordinates, genes, MIDs, and read counts, used as the sampling file for sequencing saturation analysis. |
analysis/ | Secondary analysis files. |
<SN>.bin20_1.0.h5ad & <SN>.bin50_1.0.h5ad | AnnData H5AD file recording preprocessing, filtering, normalization, dimensionality reduction, clustering and differential expression analysis, based on <SN>.tissue.gef of bin20 and bin50. This output H5AD is named in the format of … |
<SN>.bin20_1.0.marker_features.csv & <SN>.bin50_1.0.marker_features.csv | Format-integrated differential expression analysis results, based on <SN>.tissue.gef of bin20 and bin50. |
<SN>.cellbin_1.0.h5ad | AnnData H5AD file recording preprocessing, filtering, normalization, dimensionality reduction, clustering and differential expression analysis, based on <SN>.cellbin.gef. |
<SN>.cellbin_1.0.marker_features.csv | Format-integrated differential expression analysis results, based on <SN>.cellbin.gef. |
<SN>.cellbin_1.0.adjusted.h5ad | AnnData H5AD file recording preprocessing, filtering, normalization, dimensionality reduction, clustering and differential expression analysis, based on <SN>.adjusted.cellbin.gef. |
<SN>.cellbin_1.0.adjusted.marker_features.csv | Format-integrated differential expression analysis results, based on <SN>.adjusted.cellbin.gef. |
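To check a finished run against the table above, the expected gene-expression outputs can be turned into a checklist. A minimal sketch, with these assumptions: `missing_outputs` and `EXPECTED_GENE_OUTPUTS` are hypothetical names; the two .txt files are assumed to sit in feature_expression/ based on the table's row order; BAM files are left out because their names contain a wildcard. Which files actually appear depends on the SAW version, pipeline, image input and parameters, so a missing entry is not necessarily an error.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the spatial gene expression table; "{sn}" is the
# chip serial number. Directory placement of the .txt files is an assumption.
EXPECTED_GENE_OUTPUTS = [
    "feature_expression/{sn}.raw.gef",
    "feature_expression/{sn}.tissue.gef",
    "feature_expression/{sn}.cellbin.gef",
    "feature_expression/{sn}.adjusted.cellbin.gef",
    "feature_expression/{sn}.merge.barcodeReadsCount.txt",
    "feature_expression/{sn}_raw_barcode_gene_exp.txt",
    "analysis/{sn}.bin20_1.0.h5ad",
    "analysis/{sn}.bin50_1.0.h5ad",
    "analysis/{sn}.cellbin_1.0.h5ad",
    "analysis/{sn}.cellbin_1.0.adjusted.h5ad",
]


def missing_outputs(outs_dir: str, sn: str) -> List[str]:
    """Return the checklist entries that are not present as files under outs_dir."""
    root = Path(outs_dir)
    expected = [rel.format(sn=sn) for rel in EXPECTED_GENE_OUTPUTS]
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_outputs("outs", "C04042E3")` from inside the run folder lists whichever checklist entries are absent.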
Spatial Protein Expression
After a SAW count run on a Stereo-CITE T FF tissue sample completes, the following files can be found in the outs/ directory:
Directory/File Name | Description |
---|---|
feature_expression/ | Feature expression matrices in HDF5 format at different dimensions. |
<SN>.protein.raw.gef | Feature expression matrix covering the entire chip region. It contains only bin1 expression counts. |
<SN>.protein.tissue.gef | Feature expression matrix restricted to the tissue-covered region. It is also a visualization GEF that includes expression counts for bin1, 5, 10, 20, 50, 100, 150, and 200. |
<SN>.protein.cellbin.gef | Cellbin feature expression matrix that records per-cell information, including the centroid coordinates, boundary coordinates, feature expression, and cell area. |
<SN>.protein.adjusted.cellbin.gef | Cellbin expression matrix with expanded cell borders, based on <SN>_<stain_type>_mask_edm_dis_<distance>.tif. |
<SN>.protein.tissue.rmbg.gem.gz | Feature expression matrix after automatic protein background removal, with bin1 expression counts. |
<SN>_cid_pid_mid_reads.tsv | List of coordinates, PIDs, MIDs, and read counts, used as the sampling file for sequencing saturation analysis across all proteins. |
<SN>_valid_cid_reads.tsv | List of mapped CIDs from all ADT FASTQs with the read count for each CID, in three columns (x, y, count). |
analysis/ | Secondary analysis files. |
<SN>.protein.bin20_0.1.h5ad & <SN>.protein.bin50_0.1.h5ad | AnnData H5AD file recording preprocessing, filtering, normalization, dimensionality reduction, clustering and differential expression analysis, based on <SN>.protein.tissue.gef of bin20 and bin50. This output H5AD is named in the format of … |
<SN>.protein.cellbin_0.1.h5ad | AnnData H5AD file recording preprocessing, filtering, normalization, dimensionality reduction, clustering and differential expression analysis, based on <SN>.protein.cellbin.gef. |
<SN>.protein.cellbin_0.1.adjusted.h5ad | AnnData H5AD file recording preprocessing, filtering, normalization, dimensionality reduction, clustering and differential expression analysis, based on <SN>.protein.adjusted.cellbin.gef. |
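Both `<SN>_valid_cid_reads.tsv` and the gene-side `<SN>.merge.barcodeReadsCount.txt` are described as three-column (x, y, count) lists with one entry per CID. The sketch below summarizes such a file in a single streaming pass, so a full-chip list does not have to be held in memory. It assumes whitespace-separated fields and tolerates an optional header row; neither the delimiter nor the presence of a header is stated on this page, and `summarize_cid_reads` is a hypothetical helper.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class CidReadSummary(NamedTuple):
    n_cids: int               # number of data rows (one row per CID assumed)
    total_reads: int          # sum of the count column
    x_range: Tuple[int, int]  # (min x, max x)
    y_range: Tuple[int, int]  # (min y, max y)


def summarize_cid_reads(lines: Iterable[str]) -> CidReadSummary:
    """Summarize an (x, y, count) CID read list.

    Assumptions: whitespace-separated fields (tabs or spaces) and an
    optional non-numeric header row.
    """
    n_cids = total_reads = 0
    min_x = max_x = min_y = max_y = None  # type: Optional[int]
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        try:
            x, y, count = (int(f) for f in fields)
        except ValueError:
            continue  # skip a header row such as "x y count"
        n_cids += 1
        total_reads += count
        if min_x is None:
            min_x = max_x = x
            min_y = max_y = y
        else:
            min_x, max_x = min(min_x, x), max(max_x, x)
            min_y, max_y = min(min_y, y), max(max_y, y)
    if n_cids == 0:
        raise ValueError("no (x, y, count) rows found")
    return CidReadSummary(n_cids, total_reads, (min_x, max_x), (min_y, max_y))
```

Usage, with the file location assumed: open `outs/feature_expression/A02677B5_valid_cid_reads.tsv` and pass the file handle to `summarize_cid_reads`.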
Images
Directory/File Name | Description |
---|---|
image/ | Images generated by the automatic or manual workflows. |
<SN>_<stainType>_regist.tif | Panoramic image after registration with the <SN>.raw.gef matrix. |
<SN>_<stainType>_tissue_cut.tif | Tissue segmentation image based on the registered panoramic image. |
<SN>_<stainType>_mask.tif | Cell segmentation image based on the registered panoramic image. |
<SN>_<stainType>_mask_edm_dis_<distance>.tif | Adjusted image based on the cell segmentation image. |
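The distance used for cell border expansion is encoded in the `<SN>_<stainType>_mask_edm_dis_<distance>.tif` file name, and the matching adjusted cellbin GEF follows the naming in the expression tables. A minimal sketch for pulling these apart, assuming `<SN>` and `<stainType>` contain no underscores and `<distance>` is an integer (neither is guaranteed by this page); `parse_edm_mask_name` and `adjusted_gef_name` are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional

# Template from the image table: <SN>_<stainType>_mask_edm_dis_<distance>.tif
# Assumption: no underscores inside SN or stainType, integer distance.
EDM_PATTERN = re.compile(
    r"^(?P<sn>[^_]+)_(?P<stain>[^_]+)_mask_edm_dis_(?P<distance>\d+)\.tif$"
)


class EdmMask(NamedTuple):
    sn: str
    stain_type: str
    distance: int


def parse_edm_mask_name(filename: str) -> Optional[EdmMask]:
    """Split an adjusted-mask file name into its parts, or return None."""
    m = EDM_PATTERN.match(filename)
    if m is None:
        return None
    return EdmMask(m["sn"], m["stain"], int(m["distance"]))


def adjusted_gef_name(mask: EdmMask, protein: bool = False) -> str:
    """Name of the adjusted cellbin GEF built from this mask, per the tables."""
    if protein:
        return f"{mask.sn}.protein.adjusted.cellbin.gef"
    return f"{mask.sn}.adjusted.cellbin.gef"
```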
HTML Report and Visualization
Directory/File Name | Description |
---|---|
<SN>.report.html | Analysis summary report of metrics and plots in HTML format. |
visualization.tar.gz | StereoMap visualization archive for presentation and manual processing. |
<SN>.stereo | Manifest file in JSON format that records experiment and pipeline information, basic analysis statistics, and references to the image and spatial matrix files in the SAW output visualization folder. |
visualization.tar.gz
The visualization archive bundles the files StereoMap needs for display. An example of the extracted contents:
visualization
├── C04042E3.adjusted.cellbin.gef
├── C04042E3.bin20_1.0.h5ad
├── C04042E3.bin50_1.0.h5ad
├── C04042E3.cellbin_1.0.adjusted.h5ad
├── C04042E3.rpi
├── C04042E3_SC_20240930_141410_4.1.0.tar.gz
├── C04042E3.stereo
├── C04042E3.tissue.gef
└── HE_matrix_template.txt
An example of the extracted visualization archive from a Stereo-CITE analysis:
visualization
├── A02677B5.adjusted.cellbin.gef
├── A02677B5.bin20_1.0.h5ad
├── A02677B5.bin50_1.0.h5ad
├── A02677B5.cellbin_1.0.adjusted.h5ad
├── A02677B5.protein.adjusted.cellbin.gef
├── A02677B5.protein.bin20_0.1.h5ad
├── A02677B5.protein.bin50_0.1.h5ad
├── A02677B5.protein.cellbin_0.1.adjusted.h5ad
├── A02677B5.protein.tissue.gef
├── A02677B5.rpi
├── A02677B5_SC_20240930_094017_4.1.0.tar.gz
├── A02677B5.stereo
├── A02677B5.tissue.gef
└── DAPI_matrix_template.txt
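Going by the two example listings, protein matrices (`*.protein.*`) appear only in the archive from the Stereo-CITE run. The sketch below uses Python's standard `tarfile` module to list an archive without extracting it, grouping entries by suffix and flagging protein content. The `.protein.` check is a heuristic derived from the examples above, not a documented rule, and `inspect_visualization_archive` is a hypothetical helper.

```python
import posixpath
import tarfile
from collections import defaultdict
from typing import Dict, List


def inspect_visualization_archive(path: str) -> dict:
    """List files in visualization.tar.gz grouped by suffix.

    Also flags whether any file name contains ".protein." (heuristic for
    Stereo-CITE output, based on the example listings).
    """
    by_suffix: Dict[str, List[str]] = defaultdict(list)
    with tarfile.open(path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directory entries such as visualization/
            name = posixpath.basename(member.name)
            # Treat ".tar.gz" as a single suffix; otherwise use the last extension.
            if name.endswith(".tar.gz"):
                suffix = ".tar.gz"
            else:
                suffix = posixpath.splitext(name)[1]
            by_suffix[suffix].append(name)
    all_names = [n for names in by_suffix.values() for n in names]
    return {
        "by_suffix": dict(by_suffix),
        "is_stereo_cite": any(".protein." in n for n in all_names),
    }
```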
.stereo
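.stereo is a JSON manifest file that records:
- basic information about the SAW analysis pipeline
- information about the tissue sample
- basic analysis statistics
- information on the image and matrix files required by StereoMap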
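This page does not document the key names inside `.stereo`, so the sketch below avoids assuming a schema: it walks the parsed JSON and collects string values that end in file suffixes seen on this page, together with a JSON-path-like location for each. The suffix list and the helper names (`iter_file_refs`, `load_stereo_refs`) are assumptions; adapt them once the actual structure of a `.stereo` file is known.

```python
import json
from typing import Any, Iterator, List, Tuple

# Assumed suffixes of referenced files, taken from the listings on this page.
REFERENCED_SUFFIXES = (".gef", ".h5ad", ".tif", ".rpi", ".tar.gz", ".txt")


def iter_file_refs(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, value) for every string that looks like a file name."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_file_refs(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from iter_file_refs(value, f"{path}[{i}]")
    elif isinstance(node, str) and node.endswith(REFERENCED_SUFFIXES):
        yield path, node


def load_stereo_refs(stereo_path: str) -> List[Tuple[str, str]]:
    """Parse a .stereo manifest and return all file-like string values."""
    with open(stereo_path, encoding="utf-8") as fh:
        return list(iter_file_refs(json.load(fh)))
```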
*Detailed descriptions of each file can be found in the corresponding sections under "Outputs".