SAW Software Operation Manual
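Appendices and References
Appendix A: Recommended Directory Structure for Raw Data

If you prefer to use the SAW shell scripts provided on the SAW GitHub page (https://github.com/STOmics/SAW), we recommend organizing your raw data as shown below.

1. Spatial Transcriptomics (ST):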

bash
$ tree
.
|-- image
|   |-- SS200000135TL_D1_SC_20230822_144400_3.0.0.ipr
|   `-- SS200000135TL_D1_SC_20230822_144400_3.0.0.tar.gz
|-- mask
|   `-- SS200000135TL_D1.barcodeToPos.h5
|-- md5
|-- reads
|   |-- E100026571_L01_trim_read_1.fq.gz
|   `-- E100026571_L01_trim_read_2.fq.gz
`-- reference
    |-- STAR_SJ100
    |   |-- chrLength.txt
    |   |-- chrNameLength.txt
    |   |-- chrName.txt
    |   |-- chrStart.txt
    |   |-- exonGeTrInfo.tab
    |   |-- exonInfo.tab
    |   |-- FMindex
    |   |-- geneInfo.tab
    |   |-- Genome
    |   |-- genomeParameters.txt
    |   |-- SA
    |   |-- SAindex
    |   |-- SAindexAux
    |   |-- sjdbInfo.txt
    |   |-- sjdbList.fromGTF.out.tab
    |   |-- sjdbList.out.tab
    |   `-- transcriptInfo.tab
    |-- genes.gtf
    `-- genome.fa

5 directories, 25 files


2. Spatial Proteomics & Transcriptomics (PT):

bash
$ tree
.
|-- image
|   |-- A02677B5_SC_20240131_192213_3.0.3.ipr
|   `-- A02677B5_SC_20240131_192213_3.0.3.tar.gz
|-- mask
|   `-- A02677B5.barcodeToPos.h5
|-- md5
|-- STOmics-RNA
|   |-- V350248064_L01_read_1.fq.gz
|   |-- V350248064_L01_read_2.fq.gz
|   |-- V350248064_L02_read_1.fq.gz
|   |-- V350248064_L02_read_2.fq.gz
|   |-- V350248064_L03_read_1.fq.gz
|   `-- V350248064_L03_read_2.fq.gz
|-- STOmics-ADT
|   |-- E150023160_L01_11_1.fq.gz
|   `-- E150023160_L01_11_2.fq.gz
|-- protein-reference
|   `-- ProteinPanel_128_mouse.list
`-- reference
    |-- STAR_SJ100
    |   |-- chrLength.txt
    |   |-- chrNameLength.txt
    |   |-- chrName.txt
    |   |-- chrStart.txt
    |   |-- exonGeTrInfo.tab
    |   |-- exonInfo.tab
    |   |-- FMindex
    |   |-- geneInfo.tab
    |   |-- Genome
    |   |-- genomeParameters.txt
    |   |-- SA
    |   |-- SAindex
    |   |-- SAindexAux
    |   |-- sjdbInfo.txt
    |   |-- sjdbList.fromGTF.out.tab
    |   |-- sjdbList.out.tab
    |   `-- transcriptInfo.tab
    |-- genes.gtf
    `-- genome.fa     

7 directories, 32 files
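
The two layouts above can be checked before a run is launched. The following is a minimal sketch and not part of SAW: `check_layout` is a hypothetical helper that only tests whether the top-level entries shown in the trees exist under a given root directory. It takes `st` or `pt` as its second argument and does not validate file contents.

```shell
# Hypothetical helper (not shipped with SAW): report entries from the
# recommended layout that are absent under ROOT.
# Usage: check_layout ROOT st|pt
check_layout() {
    root=$1
    kind=$2
    # Entries shared by the ST and PT trees shown above.
    common="image mask reference/STAR_SJ100 reference/genes.gtf reference/genome.fa"
    case $kind in
        st) entries="$common reads" ;;
        pt) entries="$common STOmics-RNA STOmics-ADT protein-reference" ;;
        *)  echo "usage: check_layout ROOT st|pt" >&2; return 2 ;;
    esac
    rc=0
    for entry in $entries; do
        if [ ! -e "$root/$entry" ]; then
            echo "missing: $entry"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `check_layout /path/to/data st` prints one `missing:` line per absent entry and returns a non-zero status if anything is missing.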


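Appendix B: SAW ST Output File List

Step
Output file
File description
splitMask
*.SN.barcodeToPos.bin
Mask file of the Stereo-seq Chip T, split by CID.
mapping
lane.Aligned.sortedByCoord.out.bam
Binary alignment/map file that stores the sequence alignment information.
lane.barcodeReadsCount.txt
List of reads mapped to CIDs. The three columns are x, y, and read count.
lane.Log.final.out
Summary of the mapping statistics, written once mapping has finished (STAR output).
lane.Log.out
Main log file of the STAR mapping (STAR output).
lane.Log.progress.out
Job progress statistics (STAR output).
lane.SJ.out.tab
Splice junctions detected during mapping (STAR output).
lane.bcPara
Parameter file that defines the CID mapping options.
lane.CIDMap.stat
Statistical report of mapping, such as the number of reads mapped to CIDs, the read sequencing quality, and the number of mapped DNBs.
lane.run.log
Log file output by the mapping module.
lane.valid_cid_reads.fq
Valid CID reads: reads in FASTQ format after CID mapping. The format is similar to Q4 FASTQ, except for the first line of each read, for example "@V350044321L1C001R0020993658 | Cx:i:10413 | Cy:i:7737 D3450E0D391E EC7FF", where Cx and Cy are the decoded coordinates, appended after the read ID.
lane.unmapped_reads.fq
Unmapped reads in FASTQ format, left over after the clean reads are mapped to the reference genome. The format is similar to Q4 FASTQ, except for the first line of each read, for example "@V350044321L1C001R0020993658 | Cx:i:10413 | Cy:i:7737 D3450E0D391E EC7FF", where Cx and Cy are the decoded coordinates, appended after the read ID.
merge
SN.merge.barcodeReadsCount.txt
Merged list of reads mapped to CIDs. The three columns are x, y, and read count.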
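count
SN.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam
Coordinate-sorted, annotated BAM file, including uniquely mapped reads and multi-mapped reads with the HI:i tag set to 1.
SN.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam.csi
Index file of the annotated BAM.
SN.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam.summary.stat
Statistical report of count. In the "Filter & Deduplication" metrics, "Total reads" is the total number of alignment records in the BAM. The "Pass filter reads" metric and the annotation "Total reads" metric are identical and give the uniquely mapped reads used for annotation, MID correction, and quantification.
SN.raw.gef
Gene expression file in HDF5 format. This is the first raw matrix that contains the expression information of the whole chip area. It includes only the geneExp group at bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset.
SN_raw_barcode_gene_exp.txt
Space-delimited list that records coordinate, gene, MID, and count information. This is the sampling file needed to compute the sequencing saturation. Its five columns are y, x, geneIndex, MIDIndex, and readCount.
count_data_hhmmss.log
Log file.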
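register & imageTools ipr2img
date-hh-mm-ss.log
Log file.
SN, or another user-specified name for the image folder used as input to ImageQC/ImageStudio
Directory that stores the original tile images in TIFF format.
SN_0000_0000_YYYY-MM-DD_hh-mm-ss-n.tif
Tile image in TIFF format.
<stainType>_fov_stitched_transformed.rpi
Stitched panoramic images (.tif) of multiple stains, stored in an RPI file. The stitched panoramas have already been registered to the track line template, and the file can hold stitched panoramas of several stain types in TIFF format, so no further adjustment for a non-right-angle rotation or for scale is needed.
fov_stitched.rpi
Stitched panoramic images (.tif) of multiple stains, stored in an RPI file. The stitched panoramas have already been registered to the track line template, and the file can hold stitched panoramas of several stain types in TIFF format, so no further adjustment for a non-right-angle rotation or for scale is needed.
<stainType>_fov_stitched_transformed.tif
Stitched panoramic image in TIFF format, already pre-registered to the track line template.
<stainType>_fov_stitched.tif
Stitched panoramic image. It still needs to be rotated by a non-right angle or rescaled.
<stainType>matrix_template.txt
Track line intersection template of the registered DAPI/IF image. Used to evaluate the registration result.
SN_<chipType>_date_time_version.ipr
Image processing record file in IPR format. It records the basic image information collected from ImageQC/ImageStudio.
<stainType>_SN_mask.tif
Binarized cell segmentation image of the registered DAPI/IF image, in TIFF format.
<stainType>_SN_regist.tif
Registered panoramic image in TIFF format.
SN.rpi
Image pyramid that stores the registered microscope panorama, the tissue boundary, and the cell boundaries (downsampled).
<stainType>_SN_tissue_cut.tif
Tissue segmentation result of the DAPI/IF panoramic image, in TIFF format.
<stainType>_transform_template.txt
Track line intersection template of <stainType>_fov_stitched_transformed.tif. Used to evaluate the stitching result.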
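tissueCut
SN.gef
Gene expression file in HDF5 format. This is a complete GEF that includes the geneExp and wholeExp groups at bin 1, 10, 20, 50, 100, 200, and 500. It also includes a statistics group. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset.
SN.tissue.gef
Gene expression file in HDF5 format. The tissue GEF contains the expression information of the tissue-covered region. It includes only the geneExp group at bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset, the same as in the raw GEF.
SN.<label>.raw.label.gef
Gene expression file in HDF5 format. It carries the expression information of the labeled tissue-covered region and includes only the geneExp group at bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset, the same as in the raw GEF.
SN.<label>.label.gef
Gene expression file in HDF5 format. It contains the expression information of the labeled tissue-covered region, with the geneExp and wholeExp groups at bin 1, 10, 20, 50, 100, 200, and 500. It also includes a statistics group. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset.
SN.gem.gz
Compressed gene expression matrix that stores the spatial gene expression data.
SN.tissue.gem.gz
Compressed gene expression matrix that stores the spatial gene expression data of the tissue region.
tissue_fig
Directory that stores the statistical plots of the tissue-covered region.
scatter_100x100_MID_gene_counts.png
Scatter plot of MID count versus gene types in each bin (bin 100).
scatter_150x150_MID_gene_counts.png
Scatter plot of MID count versus gene types in each bin (bin 150).
scatter_200x200_MID_gene_counts.png
Scatter plot of MID count versus gene types in each bin (bin 200).
scatter_20x20_MID_gene_counts.png
Scatter plot of MID count versus gene types in each bin (bin 20).
scatter_50x50_MID_gene_counts.png
Scatter plot of MID count versus gene types in each bin (bin 50).
statistic_100x100_MID_gene_DNB.png
Univariate distribution plots of MID count, gene types, and DNB count, with a rug along the x-axis (bin 100).
statistic_150x150_MID_gene_DNB.png
Univariate distribution plots of MID count, gene types, and DNB count, with a rug along the x-axis (bin 150).
statistic_200x200_MID_gene_DNB.png
Univariate distribution plots of MID count, gene types, and DNB count, with a rug along the x-axis (bin 200).
statistic_20x20_MID_gene_DNB.png
Univariate distribution plots of MID count, gene types, and DNB count, with a rug along the x-axis (bin 20).
statistic_50x50_MID_gene_DNB.png
Univariate distribution plots of MID count, gene types, and DNB count, with a rug along the x-axis (bin 50).
violin_100x100_MID_gene.png
Violin plot of the distributions of deduplicated MID count and gene types in each bin (bin 100).
violin_150x150_MID_gene.png
Violin plot of the distributions of deduplicated MID count and gene types in each bin (bin 150).
violin_200x200_MID_gene.png
Violin plot of the distributions of deduplicated MID count and gene types in each bin (bin 200).
violin_20x20_MID_gene.png
Violin plot of the distributions of deduplicated MID count and gene types in each bin (bin 20).
violin_50x50_MID_gene.png
Violin plot of the distributions of deduplicated MID count and gene types in each bin (bin 50).
tissuecut.stat
Statistical report of the tissue-covered region.
<label>.tissuecut.stat
Statistical report of the labeled region.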
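cellCorrect
SN.adjusted.cellbin.gef
Cellbin GEF generated from the adjusted cell segmentation TIFF image and *.raw.gef.
SN.adjusted.gem
Cellbin GEM generated from the adjusted cell segmentation TIFF image and *.raw.gef.
<stainType>_SN_mask_edm_dis_10.tif
Adjusted cell segmentation binary image in TIFF format.
cellCut
SN.cellbin.gef
Cell gene expression file in HDF5 format. The cellbin GEF contains per-cell expression information, such as centroid coordinates, boundary coordinates, gene expression, and cell area. Cells are delimited by their boundaries. The origin of the expression matrix has been calibrated to (0,0), and the x and y coordinate offsets are recorded in the GEF attributes offsetX and offsetY, which are the same as minX and minY in the raw GEF.
spatialCluster
SN.bin200_1.0.spatial.cluster.h5ad
H5AD file of the spatial clustering result.
cellCluster
SN.cell.cluster.h5ad
H5AD file of the cell clustering result.
SN.adjusted.cell.cluster.h5ad
H5AD file of the cell clustering result, based on the adjusted cellbin GEF.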
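saturation
plot_1x1_saturation.png
Sequencing saturation plot at bin 1. For each bin (bin 1), the value is computed as 1 - (unique reads / total reads).
plot_200x200_saturation.png
Sequencing saturation plot at bin 200. For each bin (bin 200), the value is computed as 1 - (unique reads / total reads).
sequence_saturation.tsv
Sequencing saturation file. The nine columns are the sampling fraction (#sample), total reads at bin 1 (bar_x), sequencing saturation at bin 1 (bar_y1), median gene count at bin 1 (bar_y2), unique reads at bin 1 (bar_umi), total reads at bin 200 (bin_x), sequencing saturation at bin 200 (bin_y1), median gene count at bin 200 (bin_y2), and unique reads at bin 200 (bin_umi).
report
SN.report.html
Analysis report as an HTML page.
SN.statistics.json
Statistical summary report in JSON format. It collects all the key metrics from the statistical report of each step.
scatter_1x1_MID_gene_counts.png
Scatter plot of MID count versus gene count in each cell.
statistic_1x1_cell_area.png
Univariate distribution of cell area across cells.
statistic_1x1_DNB.png
Univariate distribution of DNB count per cell, along the x-axis.
statistic_1x1_gene.png
Univariate distribution of gene types per cell, along the x-axis.
statistic_1x1_MID.png
Univariate distribution of MID count per cell, along the x-axis.
violin_1x1_gene.png
Violin plot of the distribution of gene types in each cell.
violin_1x1_MID.png
Violin plot of the distribution of deduplicated MID count in each cell.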
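
The header format described for lane.valid_cid_reads.fq and lane.unmapped_reads.fq can be read back with standard tools. The sketch below is an illustration rather than a SAW utility: `extract_xy` is a hypothetical function name, and it assumes four-line FASTQ records whose first line carries `Cx:i:<x>` followed by `Cy:i:<y>`, as in the example header quoted in the table.

```shell
# Hypothetical helper: print "x y" for every read header found on stdin.
# Assumes plain four-line FASTQ records (header, sequence, '+', quality).
extract_xy() {
    # Keep only header lines, then pull the digits after Cx:i: and Cy:i:.
    awk 'NR % 4 == 1' |
        sed -n 's/.*Cx:i:\([0-9][0-9]*\).*Cy:i:\([0-9][0-9]*\).*/\1 \2/p'
}
```

A typical call would be `extract_xy < lane.valid_cid_reads.fq | sort | uniq -c`, which counts the reads per decoded coordinate.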


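Appendix C: SAW PT Output File List
Step
Output file
File description
splitMask
*.SN.barcodeToPos.bin
Mask file of the Stereo-seq Chip T, split by CID.
mapping
lane.Aligned.sortedByCoord.out.bam
Binary alignment/map file that stores the sequence alignment information.
lane.barcodeReadsCount.txt
List of reads mapped to CIDs. The three columns are x, y, and read count.
lane.Log.final.out
Summary of the mapping statistics, written once mapping has finished (STAR output).
lane.Log.out
Main log file of the STAR mapping (STAR output).
lane.Log.progress.out
Job progress statistics (STAR output).
lane.SJ.out.tab
Splice junctions detected during mapping (STAR output).
lane.bcPara
Parameter file that defines the CID mapping options.
lane.CIDMap.stat
Statistical report of mapping, such as the number of reads mapped to CIDs, the read sequencing quality, and the number of mapped DNBs.
lane.run.log
Log file output by the mapping module.
lane.valid_cid_reads.fq
Valid CID reads: reads in FASTQ format after CID mapping. The format is similar to Q4 FASTQ, except for the first line of each read, for example "@V350044321L1C001R0020993658 | Cx:i:10413 | Cy:i:7737 D3450E0D391E EC7FF", where Cx and Cy are the decoded coordinates, appended after the read ID.
lane.unmapped_reads.fq
Unmapped reads in FASTQ format, left over after the clean reads are mapped to the reference genome. The format is similar to Q4 FASTQ, except for the first line of each read, for example "@V350044321L1C001R0020993658 | Cx:i:10413 | Cy:i:7737 D3450E0D391E EC7FF", where Cx and Cy are the decoded coordinates, appended after the read ID.
merge
SN.merge.barcodeReadsCount.txt
Merged list of reads mapped to CIDs. The three columns are x, y, and read count.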
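count
SN.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam
Coordinate-sorted, annotated BAM file, including uniquely mapped reads and multi-mapped reads with the HI:i tag set to 1.
SN.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam.csi
Index file of the annotated BAM.
SN.Aligned.sortedByCoord.out.merge.q10.dedup.target.bam.summary.stat
Statistical report of count. In the "Filter & Deduplication" metrics, "Total reads" is the total number of alignment records in the BAM. The "Pass filter reads" metric and the annotation "Total reads" metric are identical and give the uniquely mapped reads used for annotation, MID correction, and quantification.
SN.raw.gef
Gene expression file in HDF5 format. This is the first raw matrix that contains the expression information of the whole chip area. It includes only the geneExp group at bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset.
SN_raw_barcode_gene_exp.txt
Space-delimited list that records coordinate, gene, MID, and count information. This is the sampling file needed to compute the sequencing saturation. Its five columns are y, x, geneIndex, MIDIndex, and readCount.
count_data_hhmmss.log
Log file.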
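register & imageTools ipr2img
date-hh-mm-ss.log
Log file.
SN, or another user-specified name for the image folder used as input to ImageQC/ImageStudio
Directory that stores the original tile images in TIFF format.
SN_0000_0000_YYYY-MM-DD_hh-mm-ss-n.tif
Tile image in TIFF format.
<stainType>_fov_stitched_transformed.rpi
Stitched panoramic images (.tif) of multiple stains, stored in an RPI file. The stitched panoramas have already been registered to the track line template, and the file can hold stitched panoramas of several stain types in TIFF format, so no further adjustment for a non-right-angle rotation or for scale is needed.
fov_stitched.rpi
Stitched panoramic images (.tif) of multiple stains, stored in an RPI file. The stitched panoramas have already been registered to the track line template, and the file can hold stitched panoramas of several stain types in TIFF format, so no further adjustment for a non-right-angle rotation or for scale is needed.
<stainType>_fov_stitched_transformed.tif
Stitched panoramic image in TIFF format, already pre-registered to the track line template.
<stainType>_fov_stitched.tif
Stitched panoramic image. It still needs to be rotated by a non-right angle or rescaled.
<stainType>matrix_template.txt
Track line intersection template of the registered DAPI/IF image. Used to evaluate the registration result.
SN_<chipType>_date_time_version.ipr
Image processing record file in IPR format. It records the basic image information collected from ImageQC/ImageStudio.
<stainType>_SN_mask.tif
Binarized cell segmentation image of the registered DAPI/IF image, in TIFF format.
<stainType>_SN_regist.tif
Registered panoramic image in TIFF format.
SN.rpi
Image pyramid that stores the registered microscope panorama, the tissue boundary, and the cell boundaries (downsampled).
<stainType>_SN_tissue_cut.tif
Tissue segmentation result of the DAPI/IF panoramic image, in TIFF format.
<stainType>_transform_template.txt
Track line intersection template of <stainType>_fov_stitched_transformed.tif. Used to evaluate the stitching result.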
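mapping-SP
SN_cid_pid_mid_reads.tsv
Tab-delimited file that records coordinate, PIDIndex, MID, and read count information. It serves as the sampling file for computing the sequencing saturation over all proteins. The five columns are x, y, PIDIndex, MIDIndex, and readCount.
SN_map.stat
Statistical report of CID mapping and filtering, MID filtering, PID mapping, and sequencing quality.
SN.protein.gem.gz
Protein expression file in GEM format.
SN.protein.raw.gef
Protein expression file in HDF5 format. This is the first raw matrix that contains the expression information of the whole chip area. It includes only the geneExp group at bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of geneExp/expression.
SN_valid_cid_reads.tsv
List of CIDs with mapped reads, containing the read count of each CID. The three columns are x, y, and read count.
calibration
SN.gef / SN.protein.gef
Gene/protein expression file in HDF5 format. This is a complete GEF that includes the geneExp and wholeExp groups at bin 1, 10, 20, 50, 100, 200, and 500. It also includes a statistics group. The x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset.
SN.calibrated.raw.gef / SN.protein.calibrated.raw.gef
Gene/protein expression file in HDF5 format that contains the expression information of the whole chip. It includes only the geneExp group at bin 1. The two expression matrices have been calibrated to the same offsets.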
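tissueCut
SN.<label>.raw.label.gef / SN.protein.<label>.raw.label.gef
Gene/protein expression file in HDF5 format. It carries the expression information of the labeled tissue-covered region and includes only the geneExp group at bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset, the same as in the raw GEF.
SN.<label>.label.gef / SN.protein.<label>.label.gef
Gene/protein expression file in HDF5 format. It carries the expression information of the labeled tissue-covered region. It includes the geneExp group at bin 1 and a statistics group. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset, the same as in the raw GEF.
tissue_fig
Directory that stores the statistical plots of the tissue-covered region.
scatter_<bin>x<bin>_MID_gene_counts.png
Scatter plot of MID count versus gene types in each bin. Bin sizes: bin 200, bin 150, bin 100, bin 50, bin 20.
statistic_<bin>x<bin>_MID_gene_DNB.png / statistic_<bin>x<bin>_MID.png / statistic_<bin>x<bin>_gene.png
Univariate distributions of MID count, gene types, and DNB count along the x-axis. Bin sizes: bin 200, bin 150, bin 100, bin 50, bin 20.
violin_<bin>x<bin>_MID_gene.png / violin_<bin>x<bin>_MID.png / violin_<bin>x<bin>_gene.png
Violin plot of the distributions of deduplicated MID count and gene types in each bin. Bin sizes: bin 200, bin 150, bin 100, bin 50, bin 20.
tissuecut.stat
Statistical report of the tissue-covered region.
<label>.tissuecut.stat
Statistical report of the labeled region.
SN.tissue.gef / SN.protein.tissue.gef
Gene/protein expression file in HDF5 format. It carries the expression information of the tissue-covered region and includes only the geneExp group at bin 1. The origin of the expression matrix has been calibrated to (0,0), and the x and y offsets are recorded as minX and minY in the attributes of the geneExp/expression dataset, the same as in the raw GEF.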
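cellCut
SN.cellbin.gef / SN.protein.cellbin.gef
Cell gene expression file in HDF5 format. The cellbin GEF contains per-cell expression information, such as centroid coordinates, boundary coordinates, gene expression, and cell area. Cells are delimited by their boundaries. The origin of the expression matrix has been calibrated to (0,0), and the x and y coordinate offsets are recorded in the GEF attributes offsetX and offsetY, which are the same as minX and minY in the raw GEF.
cellCorrect
SN.adjusted.cellbin.gef / SN.protein.adjusted.cellbin.gef
Cellbin GEF generated from the adjusted cell segmentation TIFF image and *.raw.gef.
SN.adjusted.gem / SN.protein.adjusted.gem
Cellbin GEM generated from the adjusted cell segmentation TIFF image and *.raw.gef.
<stainType>_SN_mask_edm_dis_10.tif
Adjusted cell segmentation binary image in TIFF format.
spatialCluster & spatialCluster-SP
SN.bin200_1.0.spatial.cluster.h5ad / SN_bin200_0.1.protein.spatial.cluster.h5ad
H5AD file of the spatial clustering result.
cellCluster & cellCluster-SP
SN.cell.cluster.h5ad / SN.protein.cell.cluster.h5ad
H5AD file of the cell clustering result.
SN.adjusted.cell.cluster.h5ad / SN.protein.adjusted.cell.cluster.h5ad
H5AD file of the cell clustering result, based on the adjusted cellbin GEF.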
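saturation
plot_1x1_saturation.png
Sequencing saturation plot at bin 1. For each bin (bin 1), the value is computed as 1 - (unique reads / total reads).
plot_200x200_saturation.png
Sequencing saturation plot at bin 200. For each bin (bin 200), the value is computed as 1 - (unique reads / total reads).
sequence_saturation.tsv
Sequencing saturation file. The nine columns are the sampling fraction (#sample), total reads at bin 1 (bar_x), sequencing saturation at bin 1 (bar_y1), median gene count at bin 1 (bar_y2), unique reads at bin 1 (bar_umi), total reads at bin 200 (bin_x), sequencing saturation at bin 200 (bin_y1), median gene count at bin 200 (bin_y2), and unique reads at bin 200 (bin_umi).
multiomics-Analysis
SN_50_differential_expression.csv
CSV file of the 1-vs-all differential analysis for proteins and RNA.
SN_50_dotplot_RNA_totalVI_03.png
Differential expression heatmap of marker genes, based on the spatial clustering.
SN_50_matrixplot_Protein_totalVI_04.png
Differential expression heatmap of marker proteins, based on the spatial clustering.
SN_50_Protein_Correlation_Heatmap_05.png
Protein-protein correlation plot.
SN_50_spatial_leiden_totalVI_02.png
Spatial map of the clusters obtained in the latent space.
SN_50_UMAP_leiden_totalVI_01.png
UMAP plot of the latent space.
SN_50.h5mu
H5MU file that records the clustering results for proteins and RNA.
report-PT
rna_cell
Directory that stores the transcriptome cellbin statistical plots of the tissue.
protein_cell
Directory that stores the proteome cellbin statistical plots of the tissue.
AnalysisReport
Directory that stores all elements of the workflow summary report. AnalysisReport/report.html is the main page.
input.yaml
Configuration file that records all input files.
SN.statistics.json
Statistical summary report in JSON format. It collects all the key metrics from the statistical report of each step.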
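
The saturation rows above define the value as 1 - (unique reads / total reads). The sketch below illustrates that formula only, under stated assumptions: it treats each row of the five-column sampling file (y, x, geneIndex, MIDIndex, readCount) as one unique molecule, and sums readCount for the total. It computes a single value over the whole input, whereas the pipeline reports the value per bin and per sampling fraction. `overall_saturation` is a hypothetical name.

```shell
# Hypothetical illustration of 1 - (unique reads / total reads).
# Assumption: one input row = one unique molecule, column 5 = its read count.
overall_saturation() {
    awk '{ total += $5; unique += 1 }
         END { if (total > 0) printf "%.4f\n", 1 - unique / total }'
}
```

Called as `overall_saturation < SN_raw_barcode_gene_exp.txt`, it prints one number between 0 and 1, or nothing when the input is empty.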



Appendix D: Handling Errors and Exceptions

Common errors
  • File access errors
  • Some examples of file access errors
bash
terminate called after throwing an instance of 'std::invalid_argument'
what():  Could not open the file: xxxxxx
...
or
bash
#006: H5FDsec2.c line 352 in H5FD__sec2_open(): unable to open file: name = '/path/to/hdf5/file', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
  major: File accessibility
  minor: Unable to open file
...
Solution
bash
# Attempt 1: Bind the file path before executing the command.
$ export SINGULARITY_BIND="/path/to/file/directory"

# Attempt 2: Turn off the HDF5 lock on the file by running the command below before running SAW.
$ export HDF5_USE_FILE_LOCKING=FALSE
mapping
  • readNameSeparator error
bash
EXITING: FATAL INPUT ERROR: empty value for parameter "readNameSeparator" in input "Command-Line"
SOLUTION: use non-empty value for this parameter
Solution: try removing --readNameSeparator " " from the command line.
  • limitBAMsortRAM error
bash
Error code: SAW-A10183

EXITING because of fatal ERROR: not enough memory for BAM sorting:
SOLUTION: re-run STAR with at least --limitBAMsortRAM xxxxxxxxx
Solution: change the value passed to --limitBAMsortRAM as suggested by the message in stderr.
  • --limitOutSJcollapsed error:
bash
EXITING because of fatal error: buffer size of SJ output is too small
Solution: increase input parameter --limitOutSJcollapsed
Solution: the --limitOutSJcollapsed and --limitIObufferSize parameters go together. Replace them with --limitOutSJcollapsed 10000000 --limitIObufferSize=480000000 and run again.
imageTools
  • OSError thrown by imageTools merge
bash
OSError: [Errno 30] Read-only file system: '/opt/saw_v5.4.0_software/pipeline/imageTools/imagetools-1.0.0/log'
Solution: the error is raised because the -o input is a bare file name without a path. Try passing /path/to/output/file.rpi or ./file.rpi to -o.
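
The cases in this appendix can be collected into a small lookup that scans a saved stderr log and echoes the matching advice. This is a sketch for illustration, not a SAW feature: `suggest_fix` is a hypothetical helper, and it recognizes only the message fragments quoted above.

```shell
# Hypothetical helper: read log text on stdin and print the advice from
# this appendix for each recognized message fragment.
suggest_fix() {
    while IFS= read -r line; do
        case $line in
            *"Could not open the file"*|*"unable to open file"*)
                echo "file access: set SINGULARITY_BIND or HDF5_USE_FILE_LOCKING=FALSE" ;;
            *"empty value for parameter"*readNameSeparator*)
                echo "mapping: remove --readNameSeparator from the command line" ;;
            *"not enough memory for BAM sorting"*)
                echo "mapping: raise --limitBAMsortRAM as suggested in stderr" ;;
            *"buffer size of SJ output is too small"*)
                echo "mapping: set --limitOutSJcollapsed together with --limitIObufferSize" ;;
            *"Read-only file system"*)
                echo "imageTools: give -o a path such as ./file.rpi" ;;
        esac
    done
}
```

For example, `suggest_fix < run.stderr` (where `run.stderr` is a placeholder for your own captured log) prints one hint per matching line and nothing for unrecognized lines.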


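Appendix E: Error Codes

SAW error codes describe the errors caught by the pipeline in an easy-to-understand way. Based on the error code, users can quickly locate and handle an error themselves. The error messages are written to the "errcode.log" file.

An error code entry has three parts: date and time, error code, and description. The date and time help users tell different runs apart. The error code uses a combination of letters and digits to identify the pipeline module and the error type. The description explains the error or exception in detail and suggests solutions to try.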
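
Because the code itself identifies the module, a log can be summarized without knowing its exact line layout. The sketch below is an assumption-laden illustration: the layout of "errcode.log" lines is not shown here, so it relies only on the `SAW-A` plus five digits pattern, and the prefix-to-module mapping is read off the table in this appendix. `module_for_code` and `summarize_errcodes` are hypothetical names, and `grep -o` is assumed to be available (GNU and BSD grep provide it).

```shell
# Hypothetical helper: map an error code to the pipeline module listed
# in the table of this appendix.
module_for_code() {
    case $1 in
        SAW-A0000?)     echo splitMask ;;
        SAW-A0002?)     echo CIDCount ;;
        SAW-A10???)     echo mapping ;;
        SAW-A12???)     echo mapping-SP ;;
        SAW-A13???)     echo cellCorrect ;;
        SAW-A20???)     echo count ;;
        SAW-A30???)     echo merge ;;
        SAW-A400??)     echo "register & rapidRegister" ;;
        SAW-A40[4-7]??) echo imageTools ;;
        SAW-A408??)     echo manualRegister ;;
        SAW-A50???)     echo tissueCut ;;
        SAW-A60[01]??)  echo cellCut ;;
        SAW-A602??)     echo calibration ;;
        SAW-A700??)     echo spatialCluster ;;
        SAW-A701??)     echo cellCluster ;;
        SAW-A80???)     echo saturation ;;
        SAW-A90???)     echo report ;;
        SAW-A91???)     echo report-PT ;;
        *)              echo unknown ;;
    esac
}

# Hypothetical helper: list each distinct code found on stdin with its module.
summarize_errcodes() {
    grep -oE 'SAW-A[0-9]{5}' | sort -u | while IFS= read -r code; do
        echo "$code $(module_for_code "$code")"
    done
}
```

Running `summarize_errcodes < errcode.log` prints one line per distinct code, which is usually enough to decide which section of the table to consult.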

Pipeline
Code
Error type
Examples and error handling
splitMask
SAW-A00001
Parameters invalid or missing
e.g. "parameters error"
Please check your input parameters. Some required parameters might be missed.
e.g. "splitBcPos error, expected 1_24 or 2_25"
Please check your input CID position is either 1_24 or 2_25.
SAW-A00002
File open failed
Please check the input file exists and has the correct access permission.
SAW-A00003
File parse failed
e.g. "only support .bin or .h5 file."
Please check your input file is in the correct file format.
SAW-A00004
Other IO API error
e.g. "cannot write to file, /path/to/file"
Please check your output directory path is an existing path.
CIDCount
SAW-A00021
Parameters invalid or missing
Please check your input parameters. Some required parameters might be missed.
SAW-A00022
File open failed
e.g. "cannot open such file, /path/to/file"
Please check the input file exists and has the correct access permission.
SAW-A00023
Failed to parse the file
Please check your input file is in the correct file format.
SAW-A00024
Other API error
Please check Appendix D or contact FAS/FBS for help.
SAW-A00025
Software exception
Please check Appendix D or contact FAS/FBS for help.
mapping
SAW-A10100
Parameters invalid or missing
e.g. "EXITING: FATAL INPUT ERROR: unrecognized parameter name "outSAMattribute" in input "Command-Line-Initial""
Please check the spelling of your argument and parameters.
e.g. "please check the umi position and length"
Please check that the length of the reads in FQ1 is consistent with the parameters set for barcode length and umi length. For example, if the combined barcode and umi length set in the bcPara file is 35 bp but one of the reads in FQ1 is 30 bp, you will see the A10101 error code. Please check that the parameters set in the bcPara file are consistent with your FQs.
SAW-A10101
File open failed
e.g. "Error, cannot open the file which be expected in gz or ascii format"
Please check the file permission and file format.
e.g. "barcodePositionMapFile does not exists: /path/to/mask"
Please check the mask file exists and has the correct access permission.
SAW-A10102
File parse failed
e.g. "sequence and quality have different length"
Please check the completeness of the reads in FQ. This issue may arise if the FQs are in incorrect format or the file was incompletely written or transferred.
SAW-A10103
Invalid data or data exception
Error data. Please check the file format and content.
SAW-A10104
File deletion failed
This error arises if the "_STARtmp" directory could not be deleted. Please check whether the program has finished completely according to the *.Log.progress.out or *.run.log file.
SAW-A10105
File IO failed
Please contact FAS/FBS for help.
SAW-A10106
Failure on APIs of system and libraries
Please contact FAS/FBS for help.
SAW-A10107
Software assert
Please contact FAS/FBS for help.
SAW-A10108
Software exception
Please contact FAS/FBS for help.
SAW-A10109
Allocate memory error
This error occurs when you run out of RAM. You may need to kill some jobs that use the same RAM or upgrade your hardware to get more RAM resources for your job.
SAW-A10110
Out of disk space
This error occurs when you store too many files on your hard disk. Please remove some files to free disk space.
SAW-A10200
Parameter missing
Please check your input parameters. Some required parameters might be missed.
SAW-A10201
CID comparison rate is too low
This error usually arises because the CID information of the input FQs is not the same as the CID in the input mask file. Please use the correct SN-FQ pairs for mapping.
SAW-A10202
Fail to create the index for the BAM file
Please contact FAS/FBS for help.
SAW-A10300
Fail to load indexed reference
Please check the existence, access permission, and completeness for the indexed reference.
mapping-SP
SAW-A12100
Parameter invalid
Please confirm that the parameter setting is reasonable. e.g. "cidLen=0"
SAW-A12101
Parameter missing
Please check your input parameters. Some required parameters might be missed.
SAW-A12102
File open failed
Please check that the input file exists and has the correct access permission.
SAW-A12103
File type invalid
Please check the file format.
SAW-A12104
File parse failed
Please check your input file has a valid column number.
SAW-A12105
Protein name invalid
"PIDName" ONLY ACCEPTS letters [a-zA-Z], digits [0-9] and symbols ["(", ")", "-", "_"].
count
SAW-A20001
Parameter missing
Please check your input parameters. Some required parameters might be missed.
SAW-A20002
File open failed
Please check that the input file exists and has the correct access permission.
SAW-A20003
File parse failed
Please check your input BAM header is in the correct file format.
SAW-A20004
Allocate memory error
This error occurs when you run out of RAM. You may need to kill some jobs that use the same RAM or upgrade your hardware to get more RAM resources for your job.
SAW-A20005
File parse failed
e.g. "Found <number> gene names with their length exceeding 64 characters"
Please check gene names in your input GTF/GFF file.
SAW-A20101
Parameter missing
Please check your input parameters. Some required parameters might be missed.
SAW-A20102
File open failed
Please check that the file exists and has the correct access permission.
SAW-A20103
File parse failed
Please check your input file has a valid column number.
SAW-A20105
File parse failed
e.g. "Found <number> gene names with their length exceeding 64 characters"
Please check gene names in your input GTF/GFF file.
merge
SAW-A30001
Parameter missing
Please check your input parameters. Some required parameters might be missed.
SAW-A30002
File open failed
Please check the input file exists and has the correct access permission.
SAW-A30003
File parse failed
Please check your mask file is in the correct file format. Only support .h5/.bin mask file.
SAW-A30004
Allocate memory error
Please check whether the range of the coordinates is too large, or you have run out of RAM. You may need to kill some jobs that use the same RAM or upgrade your hardware to get more RAM resources for your job.
SAW-A30005
Fail to open input file
Failed to open the input TXT file. Please check that the file exists and has the correct access permission.
register & rapidRegister
SAW-A40001
Parameter missing
Please check your input parameters. Some required parameters might be missed.
SAW-A40002
File open failed
Please check that the input file exists and has the correct access permission. Or, please check whether a stitched panoramic TIFF (.tif or .tiff) image exists in the TAR.GZ.
SAW-A40003
File parse failed
Please check your input file is in the correct file format. The -v input gene expression matrix should be either a *tsv, barcode_gene_exp.txt, *.gem.gz, or *raw.gef.
SAW-A40004
Invalid data or data exception
Error data. Please check the file content. This error may arise because the -v input file is empty, the CZI file in TAR.GZ is invalid, or the QCPassFlag in IPR is 0.
SAW-A40005
Tissue segmentation error
Abnormal tissue segmentation reference score in image preprocessing.
SAW-A40006
IPR field missing
"Stitch/BGIStitch/StitchedGlobalLoc", does not exist.
SAW-A40007
Tiled image missing
Please check the uncompressed image folder has tiled images.
SAW-A40008
Insufficient GPU memory
GPU resources in the current node are insufficient. Please redeliver the task on an adequate one.
SAW-A40009
Abnormal chip SN prefix
Check "ImageInfo -> STOmicsChipSN" in the .ipr file, which records the chip SN prefix.
imageTools
SAW-A40401
Parameter missing
Please check your imageTools merge input parameters. Some required parameters might be missed.
SAW-A40402
File open failed
Please check that the imageTools merge input file exists and has the correct access permission.
SAW-A40405
Invalid input
imageTools merge received fewer than two or more than three input images.
SAW-A40406
File pairing failed
Please check that the imageTools merge input TIFFs are the same size. Since the merge function is used for evaluating segmentation results, the input images are supposed to match in size and position (tissue position in the whole image).
SAW-A40501
Parameter missing
Please check your imageTools overlay input parameters. Some required parameters might be missed.
SAW-A40502
File open failed
Please check that the imageTools overlay input file exists and has the correct access permission.
SAW-A40504
Invalid data or data exception
Error data. Please check the file content. This error may arise because the -c input IPR file does not include Stitch/TransformTemplate or Register/MatrixTemplate information.
SAW-A40601
Parameter missing
Please check your imageTools img2rpi input parameters. Some required parameters might be missed.
SAW-A40602
File open failed
Please check that the imageTools img2rpi input file exists and has the correct access permission.
SAW-A40605
Invalid input
The imageTools img2rpi inputs -i and -g have different lengths. These two inputs are supposed to be paired.
SAW-A40701
Parameter missing
Please check your imageTools ipr2img input parameters. Some required parameters might be missed.
SAW-A40702
File open failed
Please check that the imageTools ipr2img input file exists and has the correct access permission. Or, please check whether a stitched panoramic TIFF (.tif or .tiff) image exists in the TAR.GZ.
SAW-A40703
File parse failed
Please check your imageTools ipr2img input file is in the ImageStudio output TAR.GZ format.
SAW-A40704
Invalid data or data exception
Error data. Please check the imageTools ipr2img input IPR file content.
This error may arise because the CZI file in the TAR.GZ is invalid, or because the image has been registered with the expression matrix neither automatically nor manually. The second case can be confirmed from the IPR by checking whether StereoResepSwitch/register is TRUE (automatic registration has not been performed) or ManualState/register is FALSE (manual registration has not been performed). A third possible reason is that StereoResepSwitch/tissueseg is TRUE, so the cell segmentation result cannot be output.
SAW-A40706
File pairing failed
Please check that the registered tissue segmentation images stored in the imageTools ipr2img input IPR have the same shape. Or please contact FAS/FBS for help.
manualRegister
SAW-A40801
Parameter missing
Please check your input parameters. Some required parameters might be missed.
SAW-A40802
File open failed
Please check that the file exists and has the correct access permission. Or, please check whether the pre-registered image fov_stitched_transformed.tif exists in the input directory.
SAW-A40803
File parse failed
Please check your input file is in the correct file format. The -v input gene expression matrix should be either a *tsv, barcode_gene_exp.txt, *.gem.gz, or *raw.gef.
SAW-A40804
Invalid data or data exception
Error data. Please check the file content. This error may arise because the -v input file is empty. The second possible reason is that the gene expression matrix information in the IPR Register module (MatrixShape, Xstart, Ystart) does not match with the input GEF file (minX, minY, maxX, maxY), because the manual registration has to be processed on the identical matrix.
tissueCut
SAW-A50001
Parameter missing
Please check your input parameters. Some required parameters might be missed.
SAW-A50002
File open failed
Please check that the file exists and has the correct access permission.
SAW-A50003
File parse failed
Please check your input file is in the correct file format.
SAW-A50004
Fail to create output file
Fail to create output file. Please check your writing permission of the output directory.
SAW-A50005
Fail to write TIFF
Fail to write a TIFF file.
SAW-A50006
h5AttrWrite error
Please check the H5 file attributes.
SAW-A50007
h5DatasetWrite error
Please check the H5 file dataset.
SAW-A50008
Fail to create TIFF
Please check the access permission to write a TIFF image.
SAW-A50009
Different sizes between TIFF and GEF
Please check the sizes of the TIFF image and GEF (gene expression matrix) respectively. Make sure they are the same size.
SAW-A50010
Allocate memory error
This error occurs when you run out of RAM. You may need to kill some jobs that use the same RAM or upgrade your hardware to get more RAM resources for your job.
cellCut
SAW-A60001
Parameter missing
Please check your input parameters. Some required parameters might be missed.
SAW-A60002
File open failed
Please check that the file exists and has the correct access permission.
SAW-A60003
File parse failed
The file does not contain correct information. Please check the file format.
SAW-A60110
Program version error
Please check your output GEF version. Your input GEF version might be too old.
SAW-A60111
Call process error
e.g. "Please call freeRestriction first, or call restrictRegion function before restrictGene."
This error arises because functions were called in the wrong order. Please adjust the call order as prompted.
SAW-A60120
Invalid data or data exception
Error data. Please check the file content.
SAW-A60121
File information missing
Failed to read the file. Please check whether the file is damaged.
SAW-A60122
File pairing failed
Please check the TIFF mask size is consistent with the size of expression matrix. Since the mask has been registered with the expression matrix, their sizes are supposed to be the same.
SAW-A60130
Fail to create output file
Fail to create output H5 file. Please check your writing permission of the output directory or contact FAS/FBS for help.
SAW-A60140
Allocate memory error
This error occurs when you run out of RAM. You may need to kill some jobs that use the same RAM or upgrade your hardware to get more RAM resources for your job.
SAW-A60150
Dimensions of gene expression matrix did not match
Please contact FAS/FBS for help.
calibration
SAW-A60201
Parameter missing
Please check your input parameters. Some required parameters might be missed.
SAW-A60202
File open failed
Please check that the file exists and has the correct access permission.
spatialCluster
SAW-A70001
Parameter missing
e.g. "-i or --gef_file is missing"
Please check your input parameters. Some required parameters might be missed.
SAW-A70002
File open failed
e.g. "cannot access /path/to/file: No such file or directory."
Please check that the file exists and has the correct access permission.
SAW-A70005
Value error
e.g. "The bin size is out of range, please check the range of gef binsize is in [1,10,20,50,100,200,500]."
Please reset the bin size as prompted.
e.g. "Gene number less than 3000, please check your gef file"
Please check the content of your GEF file, and make sure there are at least 3000 genes for clustering.
cellCluster
SAW-A70101
Parameter missing
e.g. "-i or --gef_file is missing"
Please check your input parameters. Some required parameters might be missed.
SAW-A70102
File open failed
e.g. "cannot access /path/to/file: No such file or directory."
Please check that the file exists and has the correct access permission.
SAW-A70105
Value error
e.g. "The bin size is out of range, please check the range of gef binsize is in [1,10,20,50,100,200,500]."
Please reset the bin size as prompted.
e.g. "Gene number less than 3000, please check your gef file"
Please check the content of your GEF file, and make sure there are at least 3000 genes for clustering.
saturation
SAW-A80001
Parameter missing
e.g. "-i is missing."
Please check your input parameters. Some required parameters might be missed.
SAW-A80002
File open failed
Please check that the file exists and has the correct access permission.
SAW-A80003
File parse failed
Invalid GEF file. Please check the file format.
SAW-A80004
Invalid data or data exception
e.g. "no data left after filter by coordinates."
Please check the file content.
SAW-A80005
Invalid data or data exception
e.g. "total map reads is 0, please check file format from --bcstat"
Please check the file content of the input file as prompted.
SAW-A80006
File pairing failed
e.g. "map reads less than annotated reads."
Please check that the input mapping statistics report and the count statistics report come from the same analysis.
SAW-A80007
Plot error
python3 may not be available in the PATH environment variable. If the problem persists, please contact FAS/FBS for help.
SAW-A80008
Fail to create output
Failed to generate the saturation file. Please contact FAS/FBS for help.
report
SAW-A90001
Parameter missing
e.g. "-m or --barcodeMapStat is missing."
Please check your input parameters as prompted. Some required parameters may be missing.
SAW-A90002
File open failed
e.g. "cannot access *: No such file or directory."
Please check that the file exists and that you have the correct access permissions.
SAW-A90003
File parse failed
JSON file format error. This error may arise because the input statistics files were not generated by the same SAW version as report, or because the input mapping file prefix cannot be parsed. Please contact FAS/FBS for help.
SAW-A90004
Invalid data or data exception
e.g. "information loss: fail to find 'bin_[size]' or 'ssDNA' in '*.rpi'."
Please check the file content.
SAW-A90005
Fail to create output file
Failed to create the output file. Please check that you have write permission for the output directory.
report-PT
SAW-A91001
Parameter missing
e.g. "-s or --sn is missing."
Please check your input parameters. Some required parameters may be missing.
SAW-A91002
File open failed
e.g. "cannot access *: No such file or directory."
SAW-A91004
Invalid data or data exception
e.g. "information loss: fail to find 'bin_[size]' or 'ssDNA' in '*.rpi'."
SAW-A91005
Fail to write
e.g. "failed to write in html" or "failed to write final_result_json."
cellCorrect
SAW-A13001
GEF parse failed
e.g. "-i parameter is invalid, please check your input."
Please check the path or file format of GEF.
SAW-A13002
TIFF parse failed
e.g. "-m parameter is invalid, please check your input."
Please check the path or size of TIFF image.
SAW-A13003
Fail to create output
e.g. "-o parameter is invalid, please check your input."
Please check that the output directory exists and that you have permission to write files to it.
SAW-A13004
Invalid adjusting distance
e.g. "-d parameter exceeds the range of adjusting distance."
Please check that the adjusting distance is within a reasonable range.
multiomicsAnalysis
SAW-A14001
Parameter Invalid
-r parameter is invalid, please check your Transcriptomics GEM/GEF file.
SAW-A14002
Parameter Invalid
-p parameter is invalid, please check your Proteomics GEM/GEF file.
SAW-A14003
Parameter Invalid
-b parameter is invalid, please check your input binsize.
SAW-A14004
Parameter Invalid
-o parameter is invalid, please check your output directory.
SAW-A14005
Invalid data or data exception
--use_gpu parameter is invalid, please check your input.
SAW-A14006
Invalid data or data exception
protein data/rna data is empty, it is probably all filtered out.
SAW-A14007
Invalid data or data exception
RNA data and protein data are disjoint.
SAW-A14008
Invalid data or data exception
Genes to plot scatters are empty.
lasso
SAW-A00031
Parameters invalid or missing
e.g. "-i/-o/-m/-n is missing"
Please check your input parameters. Some required parameters may be missing.
SAW-A00032
File open failed
e.g. "cannot access *: No such file or directory."
Please check that the file exists and that you have the correct access permissions.
SAW-A00033
File parse failed
e.g. "file type error: * is not GEF/GEOJSON file."
Invalid file. Please check the file format.
MIDFilter
SAW-A00051
Parameter Invalid
Please check your input parameters.
SAW-A00052
File open failed
No such file or directory.
SAW-A00053
File parse failed
File type error.
SAW-A00054
Fail to create output
Generate filtered gef failed.
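
Several of the errors listed above ("File open failed", "Fail to create output", the bin size "Value error", and the python3 hint under SAW-A80007) can be caught before a job is submitted. The following is a minimal sketch of such a pre-flight check, not part of SAW itself. The function name `preflight` and its arguments are hypothetical; the list of bin sizes is copied from the SAW-A70005 / SAW-A70105 example message.

```python
import os
import shutil

# Bin sizes quoted in the SAW-A70005 / SAW-A70105 example message.
ALLOWED_BIN_SIZES = (1, 10, 20, 50, 100, 200, 500)


def preflight(input_files, output_dir, bin_size=None):
    """Hypothetical helper: return a list of problems (empty if none were found)."""
    problems = []

    # "File open failed": each input must exist and be readable.
    for path in input_files:
        if not os.path.isfile(path):
            problems.append(f"input not found: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"input not readable: {path}")

    # "Fail to create output": the output directory must exist and be writable.
    if not os.path.isdir(output_dir):
        problems.append(f"output directory does not exist: {output_dir}")
    elif not os.access(output_dir, os.W_OK):
        problems.append(f"output directory is not writable: {output_dir}")

    # "Value error": the bin size must be one of the listed values.
    if bin_size is not None and bin_size not in ALLOWED_BIN_SIZES:
        problems.append(f"bin size {bin_size} is not one of {ALLOWED_BIN_SIZES}")

    # SAW-A80007 hint: python3 should be discoverable through PATH.
    if shutil.which("python3") is None:
        problems.append("python3 not found on PATH")

    return problems
```

Calling `preflight(["/path/to/file"], "/path/to/outdir", bin_size=50)` returns a list of strings describing each problem found. An empty list only means that these basic checks passed; it does not guarantee that the SAW module will succeed.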


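When a run fails, it can help to pull the error codes out of a log and look them up in the table above. The sketch below assumes that the codes appear literally in the log text in the form shown in this appendix (`SAW-A` followed by five digits); the log format itself is an assumption, and `summarize_codes` is a hypothetical helper. The lookup dictionary holds only a few entries copied from the table and would need to be extended.

```python
import re

# Code pattern as it appears in this appendix, e.g. "SAW-A70005".
SAW_CODE = re.compile(r"SAW-A\d{5}")

# A few entries copied from the table above: code -> (module, error type).
KNOWN_CODES = {
    "SAW-A70005": ("spatialCluster", "Value error"),
    "SAW-A70105": ("cellCluster", "Value error"),
    "SAW-A80006": ("saturation", "File pairing failed"),
    "SAW-A90003": ("report", "File parse failed"),
    "SAW-A13004": ("cellCorrect", "Invalid adjusting distance"),
}


def summarize_codes(log_text):
    """Return (code, module, error type) for each distinct code, in order of first appearance."""
    seen = []
    for code in SAW_CODE.findall(log_text):
        if code not in seen:
            seen.append(code)
    # Codes missing from KNOWN_CODES are reported as "unknown" rather than guessed.
    return [(code, *KNOWN_CODES.get(code, ("unknown", "unknown"))) for code in seen]
```

Codes that are not in the dictionary are reported as unknown, so a missing entry never produces a misleading module name.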


References
  • BGIResearch/SAW. Accessed October 13, 2021. https://github.com/BGIResearch/SAW
  • Chen A, Liao S, Cheng M, et al. Large field of view-spatially resolved transcriptomics at nanoscale resolution. Short title: DNA nanoball stereo-sequencing. bioRxiv. Published online January 24, 2021:2021.01.17.427004. doi:10.1101/2021.01.17.427004
  • Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2009;38(6):1767-1771. doi:10.1093/nar/gkp1137
  • Archives SR, Sra T, Nucleotide I, et al. File Format Guide 1. Published online 2009:1-11. Accessed May 21, 2021.
  • Merkel D. Docker: lightweight Linux containers for consistent development and deployment. Linux J. 2014;2014(239):2. Accessed October 15, 2021. https://www.linuxjournal.com/content/docker-lightweight-linux-containers-consistent-development-and-deployment
  • Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017;12(5):e0177459. doi:10.1371/journal.pone.0177459
  • Sequence Alignment/Map Format Specification. 2021. Accessed May 21, 2021. https://github.com/samtools/hts-specs
  • Dobin A, Davis CA, Schlesinger F, et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15-21. doi:10.1093/bioinformatics/bts635
  • Ensembl. GFF/GTF File Format. Published 2020. Accessed May 27, 2021. http://www.ensembl.org/info/website/upload/gff.html?redirect=no
  • GFF2 - GMOD. Accessed May 27, 2021. http://gmod.org/wiki/GFF2
  • GitHub - BGIResearch/stereopy: A toolkit of spatial transcriptomic analysis. Accessed July 4, 2021. https://github.com/BGIResearch/stereopy
  • BGIResearch/geftools: Tools for manipulating GEFs. Accessed April 7, 2022. https://github.com/BGIResearch/geftools
  • BGIResearch/gefpy: gef io, draw out from stereopy. Accessed April 7, 2022. https://github.com/BGIResearch/gefpy
  • Gayoso A, Steier Z, Lopez R, et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat Methods. 2021;18(3):272-282. doi:10.1038/s41592-020-01050-x
