If you prefer to use the SAW shell scripts provided on the SAW GitHub page (https://github.com/STOmics/SAW), we recommend organizing your raw data as follows.
1. Spatial Transcriptomics (ST):
```bash
$ tree
.
|-- image
|   |-- SS200000135TL_D1_SC_20230822_144400_3.0.0.ipr
|   `-- SS200000135TL_D1_SC_20230822_144400_3.0.0.tar.gz
|-- mask
|   `-- SS200000135TL_D1.barcodeToPos.h5
|-- md5
|-- reads
|   |-- E100026571_L01_trim_read_1.fq.gz
|   `-- E100026571_L01_trim_read_2.fq.gz
`-- reference
    |-- STAR_SJ100
    |   |-- chrLength.txt
    |   |-- chrNameLength.txt
    |   |-- chrName.txt
    |   |-- chrStart.txt
    |   |-- exonGeTrInfo.tab
    |   |-- exonInfo.tab
    |   |-- FMindex
    |   |-- geneInfo.tab
    |   |-- Genome
    |   |-- genomeParameters.txt
    |   |-- SA
    |   |-- SAindex
    |   |-- SAindexAux
    |   |-- sjdbInfo.txt
    |   |-- sjdbList.fromGTF.out.tab
    |   |-- sjdbList.out.tab
    |   `-- transcriptInfo.tab
    |-- genes.gtf
    `-- genome.fa

5 directories, 25 files
```
2. Spatial Proteomics & Transcriptomics (PT):
```bash
$ tree
.
|-- image
|   |-- A02677B5_SC_20240131_192213_3.0.3.ipr
|   `-- A02677B5_SC_20240131_192213_3.0.3.tar.gz
|-- mask
|   `-- A02677B5.barcodeToPos.h5
|-- md5
|-- STOmics-RNA
|   |-- V350248064_L01_read_1.fq.gz
|   |-- V350248064_L01_read_2.fq.gz
|   |-- V350248064_L02_read_1.fq.gz
|   |-- V350248064_L02_read_2.fq.gz
|   |-- V350248064_L03_read_1.fq.gz
|   `-- V350248064_L03_read_2.fq.gz
|-- STOmics-ADT
|   |-- E150023160_L01_11_1.fq.gz
|   `-- E150023160_L01_11_2.fq.gz
|-- protein-reference
|   `-- ProteinPanel_128_mouse.list
`-- reference
    |-- STAR_SJ100
    |   |-- chrLength.txt
    |   |-- chrNameLength.txt
    |   |-- chrName.txt
    |   |-- chrStart.txt
    |   |-- exonGeTrInfo.tab
    |   |-- exonInfo.tab
    |   |-- FMindex
    |   |-- geneInfo.tab
    |   |-- Genome
    |   |-- genomeParameters.txt
    |   |-- SA
    |   |-- SAindex
    |   |-- SAindexAux
    |   |-- sjdbInfo.txt
    |   |-- sjdbList.fromGTF.out.tab
    |   |-- sjdbList.out.tab
    |   `-- transcriptInfo.tab
    |-- genes.gtf
    `-- genome.fa

7 directories, 32 files
```
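Before launching a run, it can help to confirm that the top-level folders of the two layouts are in place. The following is a minimal sketch, not part of SAW: the helper name `missing_dirs` is hypothetical, and the folder names are simply copied from the two example trees, so adjust them if your layout differs.

```python
from pathlib import Path

# Top-level folders copied from the two example layouts (ST and PT).
ST_DIRS = ("image", "mask", "reads", "reference")
PT_DIRS = ("image", "mask", "STOmics-RNA", "STOmics-ADT",
           "protein-reference", "reference")


def missing_dirs(root, expected):
    """Return the expected subdirectories that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]
```

For example, `missing_dirs("/path/to/data", ST_DIRS)` returns an empty list when all four ST folders exist. The check only covers folder names; it does not validate the files inside them.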
Appendix B: SAW ST Output File List
Step | Output file | Description |
---|---|---|
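splitMask | *.SN.barcodeToPos.bin | Mask file of the Stereo-seq Chip T, split by CID. |
mapping | lane.Aligned.sortedByCoord.out.bam | Binary alignment/map (BAM) file storing the sequence alignment information. |
| | lane.barcodeReadsCount.txt | List of reads mapped to CIDs. The three columns are x, y, and read count. |
| | lane.Log.final.out | Summary of mapping statistics written when mapping finishes (STAR output). |
| | lane.Log.out | Main log file of STAR mapping (STAR output). |
| | lane.Log.progress.out | Job progress statistics (STAR output). |
| | lane.SJ.out.tab | Splice junctions detected during mapping (STAR output). |
| | lane.bcPara | Parameter file defining the CID mapping options. |
| | lane.CIDMap.stat | Statistical report of mapping, such as the number of reads mapped to CIDs, read sequencing quality, and the number of mapped DNBs. |
| | lane.run.log | Log file written by the mapping module. |
| | lane.valid_cid_reads.fq | Valid CID reads: the reads in FASTQ format after CID mapping. The format is similar to Q4 FASTQ, but the first line of each read differs, e.g. "@V350044321L1C001R0020993658\|Cx:i:10413\|Cy:i:7737 D3450E0D391E EC7FF", where Cx and Cy are the decoded coordinates, appended after the read ID. |
| | lane.unmapped_reads.fq | Unmapped reads in FASTQ format, left after mapping the clean reads to the reference genome. The format is similar to Q4 FASTQ, but the first line of each read differs, e.g. "@V350044321L1C001R0020993658\|Cx:i:10413\|Cy:i:7737 D3450E0D391E EC7FF", where Cx and Cy are the decoded coordinates, appended after the read ID. |
merge | SN.merge.barcodeReadsCount.txt | Merged list of reads mapped to CIDs. The three columns are x, y, and read count. |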
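count | SN.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam | Annotated BAM file sorted by coordinate, including uniquely mapped and multi-mapped reads whose HI:i tag is 1. |
| | SN.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam.csi | Index file of the annotated BAM. |
| | SN.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam.summary.stat | Statistical report of count. In the "Filter & Deduplication" metrics, "Total reads" is the total number of alignment records in the BAM. The "Pass filter reads" and "Annotated total reads" metrics are identical and give the uniquely mapped reads used for annotation, MID correction, and quantification. |
| | SN.raw.gef | Gene expression file in HDF5 format. This is the first raw matrix containing the expression information of the whole chip area. It includes only the geneExp group for bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset. |
| | SN_raw_barcode_gene_exp.txt | Space-delimited list recording coordinate, gene, MID, and count information. It is the sampling file required for computing sequencing saturation. The five columns are y, x, geneIndex, MIDIndex, and readCount. |
| | count_data_hhmmss.log | Log file. |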
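register & imageTools ipr2img | date-hh-mm-ss.log | Log file. |
| | SN, or another user-specified name for the image folder used as input to ImageQC/ImageStudio | Directory storing the original tile images in TIFF format. |
| | SN_0000_0000_YYYY-MM-DD_hh-mm-ss-n.tif | Tile image in TIFF format. |
| | <stainType>_fov_stitched_transformed.rpi | Stitched panoramic images (.tif) of multiple stains, stored in an RPI file. The stitched panoramic image has already been registered to the track line template, so it needs no further non-orthogonal rotation or scaling. The RPI file can hold stitched panoramic TIFF images of several stain types. |
| | fov_stitched.rpi | Stitched panoramic images (.tif) of multiple stains, stored in an RPI file. The stitched panoramic image has already been registered to the track line template, so it needs no further non-orthogonal rotation or scaling. The RPI file can hold stitched panoramic TIFF images of several stain types. |
| | <stainType>_fov_stitched_transformed.tif | Stitched panoramic image in TIFF format, pre-registered to the track line template. |
| | <stainType>_fov_stitched.tif | Stitched panoramic image. It still needs a non-orthogonal rotation or scaling. |
| | <stainType>matrix_template.txt | Track line cross-point template of the registered DAPI/IF image, used to evaluate the registration result. |
| | SN_<chipType>_date_time_version.ipr | Image processing record file in IPR format, recording the basic image information collected from ImageQC/ImageStudio. |
| | <stainType>_SN_mask.tif | Binary cell segmentation image of the registered DAPI/IF image, in TIFF format. |
| | <stainType>_SN_regist.tif | Registered panoramic image in TIFF format. |
| | SN.rpi | Image pyramid storing the registered microscope panoramic image, the tissue boundary, and the cell boundary (downsampled). |
| | <stainType>_SN_tissue_cut.tif | Tissue segmentation result of the DAPI/IF panoramic image, in TIFF format. |
| | <stainType>_transform_template.txt | Track line cross-point template of <stainType>_fov_stitched_transformed.tif, used to evaluate the stitching result. |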
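tissueCut | SN.gef | Gene expression file in HDF5 format. This is a complete GEF file that includes the geneExp and wholeExp groups for bin 1, 10, 20, 50, 100, 200, and 500, plus a stat group. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset. |
| | SN.tissue.gef | Gene expression file in HDF5 format. The tissue GEF contains the expression information of the tissue-covered region. It includes only the geneExp group for bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset, the same as in the raw GEF. |
| | SN.<label>.raw.label.gef | Gene expression file in HDF5 format. It carries the expression information of the labeled tissue-covered region and includes only the geneExp group for bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset, the same as in the raw GEF. |
| | SN.<label>.label.gef | Gene expression file in HDF5 format. It contains the expression information of the labeled tissue-covered region, with the geneExp and wholeExp groups for bin 1, 10, 20, 50, 100, 200, and 500, plus a stat group. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset. |
| | SN.gem.gz | Compressed gene expression matrix storing the spatial gene expression data. |
| | SN.tissue.gem.gz | Compressed gene expression matrix storing the spatial gene expression data of the tissue region. |
| | tissue_fig | Directory storing the statistical plots of the tissue-covered region. |
| | scatter_100x100_MID_gene_counts.png | Scatter plot of MID count and gene types in each bin (bin 100). |
| | scatter_150x150_MID_gene_counts.png | Scatter plot of MID count and gene types in each bin (bin 150). |
| | scatter_200x200_MID_gene_counts.png | Scatter plot of MID count and gene types in each bin (bin 200). |
| | scatter_20x20_MID_gene_counts.png | Scatter plot of MID count and gene types in each bin (bin 20). |
| | scatter_50x50_MID_gene_counts.png | Scatter plot of MID count and gene types in each bin (bin 50). |
| | statistic_100x100_MID_gene_DNB.png | Univariate distributions of MID count, gene types, and DNB count, with a rug along the x-axis (bin 100). |
| | statistic_150x150_MID_gene_DNB.png | Univariate distributions of MID count, gene types, and DNB count, with a rug along the x-axis (bin 150). |
| | statistic_200x200_MID_gene_DNB.png | Univariate distributions of MID count, gene types, and DNB count, with a rug along the x-axis (bin 200). |
| | statistic_20x20_MID_gene_DNB.png | Univariate distributions of MID count, gene types, and DNB count, with a rug along the x-axis (bin 20). |
| | statistic_50x50_MID_gene_DNB.png | Univariate distributions of MID count, gene types, and DNB count, with a rug along the x-axis (bin 50). |
| | violin_100x100_MID_gene.png | Violin plot of the deduplicated MID count and gene types in each bin (bin 100). |
| | violin_150x150_MID_gene.png | Violin plot of the deduplicated MID count and gene types in each bin (bin 150). |
| | violin_200x200_MID_gene.png | Violin plot of the deduplicated MID count and gene types in each bin (bin 200). |
| | violin_20x20_MID_gene.png | Violin plot of the deduplicated MID count and gene types in each bin (bin 20). |
| | violin_50x50_MID_gene.png | Violin plot of the deduplicated MID count and gene types in each bin (bin 50). |
| | tissuecut.stat | Statistical report of the tissue-covered region. |
| | <label>.tissuecut.stat | Statistical report of the labeled region. |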
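cellCorrect | SN.adjusted.cellbin.gef | Cellbin GEF generated from the adjusted cell segmentation TIFF image and *.raw.gef. |
| | SN.adjusted.gem | Cellbin GEM generated from the adjusted cell segmentation TIFF image and *.raw.gef. |
| | <stainType>_SN_mask_edm_dis_10.tif | Adjusted binary cell segmentation image in TIFF format. |
cellCut | SN.cellbin.gef | Cell gene expression file in HDF5 format. The cellbin GEF contains the expression information of each cell, such as centroid coordinates, boundary coordinates, gene expression, and cell area. Cells are delimited by their boundaries. The origin of the expression matrix has been calibrated to (0,0), and the x and y coordinate offsets are recorded in the GEF attributes offsetX and offsetY, which equal minX and minY in the raw GEF file. |
spatialCluster | SN.bin200_1.0.spatial.cluster.h5ad | H5AD file of the spatial clustering results. |
cellCluster | SN.cell.cluster.h5ad | H5AD file of the cell clustering results. |
| | SN.adjusted.cell.cluster.h5ad | H5AD file of the cell clustering results, based on the adjusted cellbin GEF. |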
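saturation | plot_1x1_saturation.png | Sequencing saturation plot for bin 1. For each bin (bin 1), saturation is computed as 1 - (unique reads / total reads). |
| | plot_200x200_saturation.png | Sequencing saturation plot for bin 200. For each bin (bin 200), saturation is computed as 1 - (unique reads / total reads). |
| | sequence_saturation.tsv | Sequencing saturation file. The nine columns are the sampling fraction (#sample), total reads of bin 1 (bar_x), sequencing saturation of bin 1 (bar_y1), median gene count of bin 1 (bar_y2), unique reads of bin 1 (bar_umi), total reads of bin 200 (bin_x), sequencing saturation of bin 200 (bin_y1), median gene count of bin 200 (bin_y2), and unique reads of bin 200 (bin_umi). |
report | SN.report.html | Analysis report as an HTML page. |
| | SN.statistics.json | Statistical summary report in JSON format. It collects all key metrics from the statistical report of each step. |
| | scatter_1x1_MID_gene_counts.png | Scatter plot of MID count and gene count in each cell. |
| | statistic_1x1_cell_area.png | Univariate distribution of cell area across cells. |
| | statistic_1x1_DNB.png | Univariate distribution of DNB count per cell along the x-axis. |
| | statistic_1x1_gene.png | Univariate distribution of gene types per cell along the x-axis. |
| | statistic_1x1_MID.png | Univariate distribution of MID count per cell along the x-axis. |
| | violin_1x1_gene.png | Violin plot of the gene types in each cell. |
| | violin_1x1_MID.png | Violin plot of the deduplicated MID count in each cell. |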
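The decoded coordinates that lane.valid_cid_reads.fq and lane.unmapped_reads.fq append to the read ID can be recovered with a small regular expression. This is a minimal sketch, assuming the first line of every read follows the single example shown in the table (`@<readID>|Cx:i:<x>|Cy:i:<y> ...`); the function name `parse_read_header` is hypothetical, and the pattern tolerates optional spaces around the `|` separators.

```python
import re

# Header layout inferred from the one example in the table:
# "@<readID>|Cx:i:<x>|Cy:i:<y> <rest>". Spaces around "|" are tolerated.
HEADER_RE = re.compile(
    r"^@(?P<read_id>[^|\s]+)\s*\|\s*Cx:i:(?P<x>\d+)\s*\|\s*Cy:i:(?P<y>\d+)"
)


def parse_read_header(line):
    """Return (read_id, x, y) for a matching header line, else None."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    return match.group("read_id"), int(match.group("x")), int(match.group("y"))
```

Lines that do not carry the Cx/Cy fields, such as headers from an ordinary FASTQ file, yield `None` rather than raising, so the helper can be applied to every fourth line of a file without pre-filtering.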
Appendix C: SAW PT Output File List
Step | Output file | Description |
---|---|---|
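splitMask | *.SN.barcodeToPos.bin | Mask file of the Stereo-seq Chip T, split by CID. |
mapping | lane.Aligned.sortedByCoord.out.bam | Binary alignment/map (BAM) file storing the sequence alignment information. |
| | lane.barcodeReadsCount.txt | List of reads mapped to CIDs. The three columns are x, y, and read count. |
| | lane.Log.final.out | Summary of mapping statistics written when mapping finishes (STAR output). |
| | lane.Log.out | Main log file of STAR mapping (STAR output). |
| | lane.Log.progress.out | Job progress statistics (STAR output). |
| | lane.SJ.out.tab | Splice junctions detected during mapping (STAR output). |
| | lane.bcPara | Parameter file defining the CID mapping options. |
| | lane.CIDMap.stat | Statistical report of mapping, such as the number of reads mapped to CIDs, read sequencing quality, and the number of mapped DNBs. |
| | lane.run.log | Log file written by the mapping module. |
| | lane.valid_cid_reads.fq | Valid CID reads: the reads in FASTQ format after CID mapping. The format is similar to Q4 FASTQ, but the first line of each read differs, e.g. "@V350044321L1C001R0020993658\|Cx:i:10413\|Cy:i:7737 D3450E0D391E EC7FF", where Cx and Cy are the decoded coordinates, appended after the read ID. |
| | lane.unmapped_reads.fq | Unmapped reads in FASTQ format, left after mapping the clean reads to the reference genome. The format is similar to Q4 FASTQ, but the first line of each read differs, e.g. "@V350044321L1C001R0020993658\|Cx:i:10413\|Cy:i:7737 D3450E0D391E EC7FF", where Cx and Cy are the decoded coordinates, appended after the read ID. |
merge | SN.merge.barcodeReadsCount.txt | Merged list of reads mapped to CIDs. The three columns are x, y, and read count. |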
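count | SN.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam | Annotated BAM file sorted by coordinate, including uniquely mapped and multi-mapped reads whose HI:i tag is 1. |
| | SN.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam.csi | Index file of the annotated BAM. |
| | SN.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam.summary.stat | Statistical report of count. In the "Filter & Deduplication" metrics, "Total reads" is the total number of alignment records in the BAM. The "Pass filter reads" and "Annotated total reads" metrics are identical and give the uniquely mapped reads used for annotation, MID correction, and quantification. |
| | SN.raw.gef | Gene expression file in HDF5 format. This is the first raw matrix containing the expression information of the whole chip area. It includes only the geneExp group for bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset. |
| | SN_raw_barcode_gene_exp.txt | Space-delimited list recording coordinate, gene, MID, and count information. It is the sampling file required for computing sequencing saturation. The five columns are y, x, geneIndex, MIDIndex, and readCount. |
| | count_data_hhmmss.log | Log file. |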
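register & imageTools ipr2img | date-hh-mm-ss.log | Log file. |
| | SN, or another user-specified name for the image folder used as input to ImageQC/ImageStudio | Directory storing the original tile images in TIFF format. |
| | SN_0000_0000_YYYY-MM-DD_hh-mm-ss-n.tif | Tile image in TIFF format. |
| | <stainType>_fov_stitched_transformed.rpi | Stitched panoramic images (.tif) of multiple stains, stored in an RPI file. The stitched panoramic image has already been registered to the track line template, so it needs no further non-orthogonal rotation or scaling. The RPI file can hold stitched panoramic TIFF images of several stain types. |
| | fov_stitched.rpi | Stitched panoramic images (.tif) of multiple stains, stored in an RPI file. The stitched panoramic image has already been registered to the track line template, so it needs no further non-orthogonal rotation or scaling. The RPI file can hold stitched panoramic TIFF images of several stain types. |
| | <stainType>_fov_stitched_transformed.tif | Stitched panoramic image in TIFF format, pre-registered to the track line template. |
| | <stainType>_fov_stitched.tif | Stitched panoramic image. It still needs a non-orthogonal rotation or scaling. |
| | <stainType>matrix_template.txt | Track line cross-point template of the registered DAPI/IF image, used to evaluate the registration result. |
| | SN_<chipType>_date_time_version.ipr | Image processing record file in IPR format, recording the basic image information collected from ImageQC/ImageStudio. |
| | <stainType>_SN_mask.tif | Binary cell segmentation image of the registered DAPI/IF image, in TIFF format. |
| | <stainType>_SN_regist.tif | Registered panoramic image in TIFF format. |
| | SN.rpi | Image pyramid storing the registered microscope panoramic image, the tissue boundary, and the cell boundary (downsampled). |
| | <stainType>_SN_tissue_cut.tif | Tissue segmentation result of the DAPI/IF panoramic image, in TIFF format. |
| | <stainType>_transform_template.txt | Track line cross-point template of <stainType>_fov_stitched_transformed.tif, used to evaluate the stitching result. |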
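mapping-SP | SN_cid_pid_mid_reads.tsv | Tab-delimited file recording coordinate, gene, MID, PIDIndex, and read count information. It serves as the sampling file for computing sequencing saturation over all proteins. The five columns are x, y, PIDIndex, MIDIndex, and readCount. |
| | SN_map.stat | Statistical report of CID mapping and filtering, MID filtering, PID mapping, and sequencing quality. |
| | SN.protein.gem.gz | Protein expression file in GEM format. |
| | SN.protein.raw.gef | Protein expression file in HDF5 format. This is the first raw matrix containing the expression information of the whole chip area. It includes only the geneExp group for bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of geneExp/expression. |
| | SN_valid_cid_reads.tsv | List of reads mapped to CIDs, with the read count of each CID. The three columns are x, y, and read count. |
calibration | SN.gef / SN.protein.gef | Gene expression file in HDF5 format. This is a complete GEF file that includes the geneExp and wholeExp groups for bin 1, 10, 20, 50, 100, 200, and 500, plus a stat group. The x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset. |
| | SN.calibrated.raw.gef / SN.protein.calibrated.raw.gef | Gene/protein expression files in HDF5 format, containing the expression information of the whole chip. Each includes only the geneExp group for bin 1. The two expression matrices have been calibrated to the same offset. |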
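tissueCut | SN.<label>.raw.label.gef / SN.protein.<label>.raw.label.gef | Gene/protein expression file in HDF5 format. It carries the expression information of the labeled tissue-covered region and includes only the geneExp group for bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset, the same as in the raw GEF. |
| | SN.<label>.label.gef / SN.protein.<label>.label.gef | Gene/protein expression file in HDF5 format. It carries the expression information of the labeled tissue-covered region and includes only the geneExp group for bin 1, plus a stat group. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset, the same as in the raw GEF. |
| | tissue_fig | Directory storing the statistical plots of the tissue-covered region. |
| | scatter_<bin>x<bin>_MID_gene_counts.png | Scatter plot of MID count and gene types in each bin. Bin sizes: bin 200, bin 150, bin 100, bin 50, and bin 20. |
| | statistic_<bin>x<bin>MID_gene_DNB.png / statistic_<bin>x<bin>MID.png / statistic_<bin>x<bin>gene.png / statistic_<bin>x<bin>_MID_gene_DNB.png | Univariate distributions of MID count, gene types, and DNB count along the x-axis. Bin sizes: bin 200, bin 150, bin 100, bin 50, and bin 20. |
| | violin_<bin>x<bin>MID_gene.png / violin_<bin>x<bin>MID.png / violin_<bin>x<bin>_gene.png | Violin plot of the deduplicated MID count and gene types in each bin. Bin sizes: bin 200, bin 150, bin 100, bin 50, and bin 20. |
| | tissuecut.stat | Statistical report of the tissue-covered region. |
| | <label>.tissuecut.stat | Statistical report of the labeled region. |
| | SN.tissue.gef / SN.protein.tissue.gef | Gene/protein expression file in HDF5 format. It carries the expression information of the labeled tissue-covered region and includes only the geneExp group for bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset, the same as in the raw GEF. |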
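cellCut | SN.cellbin.gef / SN.protein.cellbin.gef | Cell gene expression file in HDF5 format. The cellbin GEF contains the expression information of each cell, such as centroid coordinates, boundary coordinates, gene expression, and cell area. Cells are delimited by their boundaries. The origin of the expression matrix has been calibrated to (0,0), and the x and y coordinate offsets are recorded in the GEF attributes as offsetX and offsetY, which equal minX and minY in the raw GEF file. |
cellCorrect | SN.adjusted.cellbin.gef / SN.protein.adjusted.cellbin.gef | Cellbin GEF generated from the adjusted cell segmentation TIFF image and *.raw.gef. |
| | SN.adjusted.gem / SN.protein.adjusted.gem | Cellbin GEM generated from the adjusted cell segmentation TIFF image and *.raw.gef. |
| | <stainType>_SN_mask_edm_dis_10.tif | Adjusted binary cell segmentation image in TIFF format. |
spatialCluster & spatialCluster-SP | SN.bin200_1.0.spatial.cluster.h5ad / SN_bin200_0.1.protein.spatial.cluster.h5ad | H5AD file of the spatial clustering results. |
cellCluster & cellCluster-SP | SN.protein.cell.cluster.h5ad | H5AD file of the cell clustering results. |
| | SN.adjusted.cell.cluster.h5ad / SN.protein.adjusted.cell.cluster.h5ad | H5AD file of the cell clustering results, based on the adjusted cellbin GEF. |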
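saturation | plot_1x1_saturation.png | Sequencing saturation plot for bin 1. For each bin (bin 1), saturation is computed as 1 - (unique reads / total reads). |
| | plot_200x200_saturation.png | Sequencing saturation plot for bin 200. For each bin (bin 200), saturation is computed as 1 - (unique reads / total reads). |
| | sequence_saturation.tsv | Sequencing saturation file. The nine columns are the sampling fraction (#sample), total reads of bin 1 (bar_x), sequencing saturation of bin 1 (bar_y1), median gene count of bin 1 (bar_y2), unique reads of bin 1 (bar_umi), total reads of bin 200 (bin_x), sequencing saturation of bin 200 (bin_y1), median gene count of bin 200 (bin_y2), and unique reads of bin 200 (bin_umi). |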
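multiomics- Analysis | SN_50_differential_expression.csv | CSV file with the 1-vs-all differential analysis of protein and RNA. |
| | SN_50_dotplot_RNA_totalVI_03.png | Differential expression heatmap of marker genes after spatial clustering. |
| | SN_50_matrixplot_Protein_totalVI_04.png | Differential expression heatmap of marker proteins after spatial clustering. |
| | SN_50_Protein_Correlation_Heatmap_05.png | Protein-protein correlation plot. |
| | SN_50_spatial_leiden_totalVI_02.png | Spatial clustering plot of the latent space. |
| | SN_50_UMAP_leiden_totalVI_01.png | UMAP plot of the latent space. |
| | SN_50.h5mu | H5MU file recording the clustering results for protein and RNA. |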
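report-PT | rna_cell | Directory storing the transcriptome cellbin statistical plots of the tissue. |
| | protein_cell | Directory storing the proteome cellbin statistical plots of the tissue. |
| | AnalysisReport | Directory storing all elements of the workflow summary report. AnalysisReport/report.html is the home page. |
| | input.yaml | Configuration file recording all input files. |
| | SN.statistics.json | Statistical summary report in JSON format. It collects all key metrics from the statistical report of each step. |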
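The saturation rows above give the formula 1 - (unique reads / total reads). As a minimal sketch of that arithmetic only (the function name `sequencing_saturation` and the input validation are assumptions, not SAW's implementation):

```python
def sequencing_saturation(unique_reads, total_reads):
    """Return 1 - (unique reads / total reads) for one sampling point."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= unique_reads <= total_reads:
        raise ValueError("unique_reads must lie between 0 and total_reads")
    return 1.0 - unique_reads / total_reads
```

With 250 unique reads out of 1000 total reads the function returns 0.75, meaning three quarters of the sampled reads repeat a molecule already seen. A value of 0 means every read was unique.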
Appendix D: Handling Errors and Exceptions
```bash
terminate called after throwing an instance of 'std::invalid_argument'
  what():  Could not open the file: xxxxxx
...
```
```bash
#006: H5FDsec2.c line 352 in H5FD__sec2_open(): unable to open file: name = '/path/to/hdf5/file', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
    major: File accessibility
    minor: Unable to open file
...
```
```bash
# Attempt 1: Bind the file path before executing the command.
$ export SINGULARITY_BIND="/path/to/file/directory"
# Attempt 2: Turn off HDF5 file locking by running the command below before running SAW.
$ export HDF5_USE_FILE_LOCKING=FALSE
```
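If SAW is launched from a Python wrapper rather than an interactive shell, the same two variables can be set on the child process environment. This is a minimal sketch under stated assumptions: the helper `saw_env` is hypothetical, it applies both attempts at once (the text above presents them as separate attempts), and it joins several bind directories with commas, which is the delimiter Singularity documents for `SINGULARITY_BIND`.

```python
import os


def saw_env(bind_dirs, base_env=None):
    """Copy an environment and add the two workaround variables to it."""
    env = dict(os.environ if base_env is None else base_env)
    # Attempt 1: bind the directories that hold the input files.
    env["SINGULARITY_BIND"] = ",".join(bind_dirs)
    # Attempt 2: turn off HDF5 file locking.
    env["HDF5_USE_FILE_LOCKING"] = "FALSE"
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` together with your own SAW command line, so the variables affect only that child process and not the calling shell.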
```bash
EXITING: FATAL INPUT ERROR: empty value for parameter "readNameSeparator" in input "Command-Line"
SOLUTION: use non-empty value for this parameter
```
```bash
Error code: SAW-A10183
EXITING because of fatal ERROR: not enough memory for BAM sorting:
SOLUTION: re-run STAR with at least --limitBAMsortRAM xxxxxxxxx
```
```bash
EXITING because of fatal error: buffer size of SJ output is too small
Solution: increase input parameter --limitOutSJcollapsed
```
```bash
OSError: [Errno 30] Read-only file system: '/opt/saw_v5.4.0_software/pipeline/imageTools/imagetools-1.0.0/log'
```
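Appendix E: Error Codes
An error message consists of three parts: date and time, error code, and description. The date and time help you tell different runs apart. The error code combines letters and digits to identify the pipeline module and the error type. The description explains the error or exception in detail and suggests solutions to try.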
Pipeline | Code | Error type | Examples and error handling |
---|---|---|---|
splitMask | SAW-A00001 | Parameters invalid or missing | e.g. "parameters error" Please check your input parameters. Some required parameters may be missing. e.g. "splitBcPos error, expected 1_24 or 2_25" Please check that your input CID position is either 1_24 or 2_25. |
| | SAW-A00002 | File open failed | Please check that the input file exists and has the correct access permission. |
| | SAW-A00003 | File parse failed | e.g. "only support .bin or .h5 file." Please check that your input file is in the correct file format. |
| | SAW-A00004 | Other IO API error | e.g. "cannot write to file, /path/to/file" Please check that your output directory path exists. |
CIDCount | SAW-A00021 | Parameters invalid or missing | Please check your input parameters. Some required parameters may be missing. |
| | SAW-A00022 | File open failed | e.g. "cannot open such file, /path/to/file" Please check that the input file exists and has the correct access permission. |
| | SAW-A00023 | Failed to parse the file | Please check that your input file is in the correct file format. |
| | SAW-A00024 | Other API error | Please check Appendix C or contact FAS/FBS for help. |
| | SAW-A00025 | Software exception | Please check Appendix C or contact FAS/FBS for help. |
mapping | SAW-A10100 | Parameters invalid or missing | e.g. "EXITING: FATAL INPUT ERROR: unrecognized parameter name "outSAMattribute" in input "Command-Line-Initial"" Please check the spelling of your arguments and parameters. e.g. "please check the umi position and length" Please check that the read lengths in FQ1 are consistent with the barcode length and UMI length set in the parameters. For example, if the combined barcode and UMI length set in the bcPara file is 35 bp but one of the reads in FQ1 is only 30 bp, you will see the A10101 error code. Please check that the parameters set in the bcPara file are consistent with your FQs. |
| | SAW-A10101 | File open failed | e.g. "Error, cannot open the file which be expected in gz or ascii format" Please check the file permission and file format. e.g. "barcodePositionMapFile does not exists: /path/to/mask" Please check that the mask file exists and has the correct access permission. |
| | SAW-A10102 | File parse failed | e.g. "sequence and quality have different length" Please check the completeness of the reads in the FQ. This issue may arise if the FQs are in an incorrect format or the file was incompletely written or transferred. |
| | SAW-A10103 | Invalid data or data exception | Error data. Please check the file format and content. |
| | SAW-A10104 | File deletion failed | This error arises if the "_STARtmp" directory could not be deleted. Please check whether the program has completely finished according to the *.Log.progress.out or *.run.log file. |
| | SAW-A10105 | File IO failed | Please contact FAS/FBS for help. |
| | SAW-A10106 | Failure on APIs of system and libraries | Please contact FAS/FBS for help. |
| | SAW-A10107 | Software assert | Please contact FAS/FBS for help. |
| | SAW-A10108 | Software exception | Please contact FAS/FBS for help. |
| | SAW-A10109 | Allocate memory error | This error occurs when you run out of RAM. You may need to kill some jobs that use the same RAM or upgrade your hardware to get more RAM resources for your job. |
| | SAW-A10110 | Out of disk space | This error occurs when your hard disk is full. Please remove some files to free disk space. |
| | SAW-A10200 | Parameter missing | Please check your input parameters. Some required parameters may be missing. |
| | SAW-A10201 | CID comparison rate is too low | This error usually arises because the CID information of the input FQs does not match the CIDs in the input mask file. Please use the correct SN-FQ pairs for mapping. |
| | SAW-A10202 | Fail to create the index for the BAM file | Please contact FAS/FBS for help. |
| | SAW-A10300 | Fail to load indexed reference | Please check the existence, access permission, and completeness of the indexed reference. |
mapping-SP | SAW-A12100 | Parameter invalid | Please confirm that the parameter settings are reasonable. e.g. "cidLen=0" |
| | SAW-A12101 | Parameter missing | Please check your input parameters. Some required parameters may be missing. |
| | SAW-A12102 | File open failed | Please check that the input file exists and has the correct access permission. |
| | SAW-A12103 | File type invalid | Please check the file format. |
| | SAW-A12104 | File parse failed | Please check that your input file has a valid number of columns. |
| | SAW-A12105 | Protein name invalid | "PIDName" ONLY ACCEPTS letters [a-zA-Z], digits [0-9] and symbols ["(", ")", "-", "_"]. |
count | SAW-A20001 | Parameter missing | Please check your input parameters. Some required parameters may be missing. |
| | SAW-A20002 | File open failed | Please check that the input file exists and has the correct access permission. |
| | SAW-A20003 | File parse failed | Please check that your input BAM header is in the correct format. |
| | SAW-A20004 | Allocate memory error | This error occurs when you run out of RAM. You may need to kill some jobs that use the same RAM or upgrade your hardware to get more RAM resources for your job. |
| | SAW-A20005 | File parse failed | e.g. "Found <number> gene names with their length exceeding 64 characters" Please check the gene names in your input GTF/GFF file. |
| | SAW-A20101 | Parameter missing | Please check your input parameters. Some required parameters may be missing. |
| | SAW-A20102 | File open failed | Please check that the file exists and has the correct access permission. |
| | SAW-A20103 | File parse failed | Please check that your input file has a valid number of columns. |
| | SAW-A20105 | File parse failed | e.g. "Found <number> gene names with their length exceeding 64 characters" Please check the gene names in your input GTF/GFF file. |
merge | SAW-A30001 | Parameter missing | Please check your input parameters. Some required parameters may be missing. |
| | SAW-A30002 | File open failed | Please check that the input file exists and has the correct access permission. |
| | SAW-A30003 | File parse failed | Please check that your mask file is in the correct file format. Only .h5 and .bin mask files are supported. |
| | SAW-A30004 | Allocate memory error | Please check whether the range of the coordinates is too large or you have run out of RAM. You may need to kill some jobs that use the same RAM or upgrade your hardware to get more RAM resources for your job. |
| | SAW-A30005 | Fail to open input file | Failed to open the input TXT file. Please check that the file exists and has the correct access permission. |
register & rapidRegister | SAW-A40001 | Parameter missing | Please check your input parameters. Some required parameters may be missing. |
| | SAW-A40002 | File open failed | Please check that the input file exists and has the correct access permission. Alternatively, check whether a stitched panoramic TIFF (.tif or .tiff) image exists in the TAR.GZ. |
| | SAW-A40003 | File parse failed | Please check that your input file is in the correct file format. The -v input gene expression matrix should be either a *tsv, barcode_gene_exp.txt, *.gem.gz, or *raw.gef. |
| | SAW-A40004 | Invalid data or data exception | Error data. Please check the file content. This error may arise because the -v input file is empty, the CZI file in the TAR.GZ is invalid, or the QCPassFlag in the IPR is 0. |
| | SAW-A40005 | Tissue segmentation error | Abnormal tissue segmentation reference score in image preprocessing. |
| | SAW-A40006 | IPR field missing | "Stitch/BGIStitch/StitchedGlobalLoc" does not exist. |
| | SAW-A40007 | Tiled image missing | Please check that the uncompressed image folder contains tiled images. |
| | SAW-A40008 | Insufficient GPU memory | GPU resources on the current node are insufficient. Please resubmit the task on a node with sufficient GPU memory. |
| | SAW-A40009 | Abnormal chip SN prefix | Check "ImageInfo -> STOmicsChipSN" in the .ipr file, which records the chip SN prefix. |
imageTools | SAW-A40401 | Parameter missing | Please check your imageTools merge input parameters. Some required parameters may be missing. |
| | SAW-A40402 | File open failed | Please check that the imageTools merge input file exists and has the correct access permission. |
| | SAW-A40405 | Invalid input | imageTools merge received fewer than two or more than three input images. |
| | SAW-A40406 | File pairing failed | Please check that the imageTools merge input TIFFs are the same size. Since the merge function is used for evaluating segmentation results, the input images are supposed to be the same in size and position (tissue position in the whole image). |
| | SAW-A40501 | Parameter missing | Please check your imageTools overlay input parameters. Some required parameters may be missing. |
| | SAW-A40502 | File open failed | Please check that the imageTools overlay input file exists and has the correct access permission. |
| | SAW-A40504 | Invalid data or data exception | Error data. Please check the file content. This error may arise because the -c input IPR file does not include Stitch/TransformTemplate or Register/MatrixTemplate information. |
| | SAW-A40601 | Parameter missing | Please check your imageTools img2rpi input parameters. Some required parameters may be missing. |
| | SAW-A40602 | File open failed | Please check that the imageTools img2rpi input file exists and has the correct access permission. |
| | SAW-A40605 | Invalid input | The imageTools img2rpi inputs -i and -g have different lengths. These two inputs are supposed to be paired. |
| | SAW-A40701 | Parameter missing | Please check your imageTools ipr2img input parameters. Some required parameters may be missing. |
| | SAW-A40702 | File open failed | Please check that the imageTools ipr2img input file exists and has the correct access permission. Alternatively, check whether a stitched panoramic TIFF (.tif or .tiff) image exists in the TAR.GZ. |
| | SAW-A40703 | File parse failed | Please check that your imageTools ipr2img input file is in the ImageStudio output TAR.GZ format. |
| | SAW-A40704 | Invalid data or data exception | Error data. Please check the content of the imageTools ipr2img input IPR file. This error may arise because the CZI file in the TAR.GZ is invalid, or because the image has been registered with the expression matrix neither automatically nor manually. The second case can be confirmed in the IPR by checking whether StereoResepSwitch/register is TRUE (automatic registration has not been performed) or ManualState/register is FALSE (manual registration has not been performed). A third possible reason is that StereoResepSwitch/tissueseg is TRUE, so the cell segmentation result cannot be output. |
| | SAW-A40706 | File pairing failed | Please check that the registered tissue segmentation images stored in the imageTools ipr2img input IPR have the same shape, or contact FAS/FBS for help. |
manualRegister | SAW-A40801 | Parameter missing | Please check your input parameters. Some required parameters may be missing. |
| | SAW-A40802 | File open failed | Please check that the file exists and has the correct access permission. Alternatively, check whether the pre-registered image fov_stitched_transformed.tif exists in the input directory. |
| | SAW-A40803 | File parse failed | Please check that your input file is in the correct file format. The -v input gene expression matrix should be either a *tsv, barcode_gene_exp.txt, *.gem.gz, or *raw.gef. |
| | SAW-A40804 | Invalid data or data exception | Error data. Please check the file content. This error may arise because the -v input file is empty. A second possible reason is that the gene expression matrix information in the IPR Register module (MatrixShape, Xstart, Ystart) does not match the input GEF file (minX, minY, maxX, maxY), because manual registration has to be performed on the identical matrix. |
tissueCut | SAW-A50001 | Parameter missing | Please check your input parameters. Some required parameters may be missing. |
| | SAW-A50002 | File open failed | Please check that the file exists and has the correct access permission. |
| | SAW-A50003 | File parse failed | Please check that your input file is in the correct file format. |
| | SAW-A50004 | Fail to create output file | Failed to create the output file. Please check your write permission for the output directory. |
| | SAW-A50005 | Fail to write TIFF | Failed to write a TIFF file. |
| | SAW-A50006 | h5AttrWrite error | Please check the H5 file attributes. |
| | SAW-A50007 | h5DatasetWrite error | Please check the H5 file dataset. |
| | SAW-A50008 | Fail to create TIFF | Please check the access permission to write a TIFF image. |
| | SAW-A50009 | Different sizes between TIFF and GEF | Please check the sizes of the TIFF image and the GEF (gene expression matrix). Make sure they are the same size. |
| | SAW-A50010 | Allocate memory error | This error occurs when you run out of RAM. You may need to kill some jobs that use the same RAM or upgrade your hardware to get more RAM resources for your job. |
cellCut | SAW-A60001 | Parameter missing | Please check your input parameters. Some required parameters may be missing. |
| | SAW-A60002 | File open failed | Please check that the file exists and has the correct access permission. |
| | SAW-A60003 | File parse failed | The file does not contain the correct information. Please check the file format. |
| | SAW-A60110 | Program version error | Please check your output GEF version. Your input GEF version might be too old. |
| | SAW-A60111 | Call process error | e.g. "Please call freeRestriction first, or call restrictRegion function before restrictGene." This error arises because functions were invoked in the wrong order. Please adjust your invocation order as prompted. |
| | SAW-A60120 | Invalid data or data exception | Error data. Please check the file content. |
| | SAW-A60121 | File information missing | Failed to read the file. Please check whether the file is damaged. |
| | SAW-A60122 | File pairing failed | Please check that the TIFF mask size is consistent with the size of the expression matrix. Since the mask has been registered with the expression matrix, their sizes are supposed to be the same. |
| | SAW-A60130 | Fail to create output file | Failed to create the output H5 file. Please check your write permission for the output directory or contact FAS/FBS for help. |
| | SAW-A60140 | Allocate memory error | This error occurs when you run out of RAM. You may need to kill some jobs that use the same RAM or upgrade your hardware to get more RAM resources for your job. |
| | SAW-A60150 | Dimensions of gene expression matrix did not match | Please contact FAS/FBS for help. |
calibration | SAW-A60201 | Parameter missing | Please check your input parameters. Some required parameters may be missing. |
| | SAW-A60202 | File open failed | Please check that the file exists and has the correct access permission. |
spatialCluster | SAW-A70001 | Parameter missing | e.g. "-i or --gef_file is missing" Please check your input parameters. Some required parameters may be missing. |
| | SAW-A70002 | File open failed | e.g. "cannot access /path/to/file: No such file or directory." Please check that the file exists and has the correct access permission. |
| | SAW-A70005 | Value error | e.g. "The bin size is out of range, please check the range of gef binsize is in [1,10,20,50,100,200,500]." Please reset the bin size as prompted. e.g. "Gene number less than 3000, please check your gef file" Please check the content of your GEF file, and make sure there are at least 3000 genes for clustering. |
cellCluster | SAW-A70101 | Parameter missing | e.g. "-i or --gef_file is missing" Please check your input parameters. Some required parameters may be missing. |
| | SAW-A70102 | File open failed | e.g. "cannot access /path/to/file: No such file or directory." Please check that the file exists and has the correct access permission. |
| | SAW-A70105 | Value error | e.g. "The bin size is out of range, please check the range of gef binsize is in [1,10,20,50,100,200,500]." Please reset the bin size as prompted. e.g. "Gene number less than 3000, please check your gef file" Please check the content of your GEF file, and make sure there are at least 3000 genes for clustering. |
saturation | SAW-A80001 | Parameter missing | e.g. "-i is missing." Please check your input parameters. Some required parameters might be missing. |
SAW-A80002 | File open failed | Please check that the file exists and has the correct access permissions. | |
SAW-A80003 | File parse failed | Invalid GEF file. Please check the file format. | |
SAW-A80004 | Invalid data or data exception | e.g. "no data left after filter by coordinates." Please check the file content. | |
SAW-A80005 | Invalid data or data exception | e.g. "total map reads is 0, please check file format from --bcstat" Please check the file content of the input file as prompted. | |
SAW-A80006 | File pairing failed | e.g. "map reads less than annotated reads." Please check that the input mapping statistics report and the count statistics report come from the same analysis. | |
SAW-A80007 | Plot error | python3 may not be available on your PATH. Please check your environment, or contact FAS/FBS for help. | |
SAW-A80008 | Fail to create output | Failed to generate the saturation file. Please contact FAS/FBS for help. | |
report | SAW-A90001 | Parameter missing | e.g. "-m or --barcodeMapStat is missing." Please check your input parameters as prompted. Some required parameters might be missing. |
SAW-A90002 | File open failed | e.g. "cannot access *: No such file or directory." Please check that the file exists and has the correct access permissions. | |
SAW-A90003 | File parse failed | JSON file format error. This error may arise because the input statistics files were not generated by the same SAW version as report, or because the input mapping file prefix cannot be parsed. Please contact FAS/FBS for help. | |
SAW-A90004 | Invalid data or data exception | e.g. "information loss: fail to find 'bin_[size]' or 'ssDNA' in '*.rpi'." Please check the file content. | |
SAW-A90005 | Fail to create output file | Failed to create the output file. Please check your write permission for the output directory. | |
report-PT | SAW-A91001 | Parameter missing | e.g. "-s or --sn is missing." Please check your input parameters. Some required parameters might be missing. |
SAW-A91002 | File open failed | e.g. "cannot access *: No such file or directory." | |
SAW-A91004 | Invalid data or data exception | e.g. "information loss: fail to find 'bin_[size]' or 'ssDNA' in '*.rpi'." | |
SAW-A91005 | Fail to write | e.g. "failed to write in html" or "failed to write final_result_json." | |
cellCorrect | SAW-A13001 | GEF parse failed | e.g. "-i parameter is invalid, please check your input." Please check the path or file format of GEF. |
SAW-A13002 | TIFF parse failed | e.g. "-m parameter is invalid, please check your input." Please check the path or size of TIFF image. | |
SAW-A13003 | Fail to create output | e.g. "-o parameter is invalid, please check your input." Please check the access permission to write files or whether the output directory exists. | |
SAW-A13004 | Invalid adjusting distance | e.g. "-d parameter exceeds the range of adjusting distance." Please check whether the adjusting distance is in a proper and reasonable range. | |
multiomicsAnalysis | SAW-A14001 | Parameter Invalid | -r parameter is invalid, please check your Transcriptomics GEM/GEF file. |
SAW-A14002 | Parameter Invalid | -p parameter is invalid, please check your Proteomics GEM/GEF file. | |
SAW-A14003 | Parameter Invalid | -b parameter is invalid, please check your input binsize. | |
SAW-A14004 | Parameter Invalid | -o parameter is invalid, please check your output directory. | |
SAW-A14005 | Invalid data or data exception | --use_gpu parameter is invalid, please check your input. | |
SAW-A14006 | Invalid data or data exception | protein data/rna data is empty, it is probably all filtered out. | |
SAW-A14007 | Invalid data or data exception | RNA data and protein data are disjoint. | |
SAW-A14008 | Invalid data or data exception | Genes to plot scatters are empty. | |
lasso | SAW-A00031 | Parameters invalid or missing | e.g. "-i/-o/-m/-n is missing" Please check your input parameters. Some required parameters might be missing. |
SAW-A00032 | File open failed | e.g. "cannot access *: No such file or directory." Please check that the file exists and has the correct access permissions. | |
SAW-A00033 | File parse failed | e.g. "file type error: * is not GEF/GEOJSON file." Invalid file. Please check the file format. | |
MIDFilter | SAW-A00051 | Parameter Invalid | Please check your input parameters. |
SAW-A00052 | File open failed | No such file or directory. | |
SAW-A00053 | File parse failed | File type error. | |
SAW-A00054 | Fail to create output | Generate filtered gef failed. |
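
Several of the errors listed above ("File open failed", "Fail to create output file", the bin-size "Value error", and the python3 note under SAW-A80007) can often be ruled out before a run is launched. The following Python sketch is not part of SAW: the function name `preflight`, its arguments, and its messages are hypothetical and serve only as an illustration. The list of bin sizes is copied from the example message quoted for SAW-A70005/SAW-A70105.

```python
import os
import shutil

# Bin sizes quoted in the SAW-A70005 / SAW-A70105 example message.
ALLOWED_BIN_SIZES = (1, 10, 20, 50, 100, 200, 500)


def preflight(input_files, output_dir, bin_size=None):
    """Hypothetical helper: return a list of problems found before a run.

    An empty list means none of the checks below failed. It does not
    guarantee that the SAW run itself will succeed.
    """
    problems = []

    # "File open failed": the input must exist and be readable.
    for path in input_files:
        if not os.path.isfile(path):
            problems.append(f"missing input file: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"input file not readable: {path}")

    # "Fail to create output file": the output directory must exist and be writable.
    if not os.path.isdir(output_dir):
        problems.append(f"output directory does not exist: {output_dir}")
    elif not os.access(output_dir, os.W_OK):
        problems.append(f"output directory not writable: {output_dir}")

    # "Value error": the bin size must be one of the listed values.
    if bin_size is not None and bin_size not in ALLOWED_BIN_SIZES:
        problems.append(f"bin size {bin_size} not in {list(ALLOWED_BIN_SIZES)}")

    # "Plot error": python3 should be discoverable on PATH.
    if shutil.which("python3") is None:
        problems.append("python3 not found on PATH")

    return problems
```

A call such as `preflight(["sample.gef"], "out", bin_size=50)` (both paths are placeholders) returns the list of detected problems, which can be printed or logged before the pipeline is started.
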
BGIResearch/gefpy: gef io, draw out from stereopy. Accessed April 7, 2022. https://github.com/BGIResearch/gefpy
Gayoso A, Steier Z, Lopez R, et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat Methods. 2021;18(3):272-282. doi:10.1038/s41592-020-01050-x