BAM
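A BAM file is a binary format used to store sequence alignments and gene annotation information. SAW count adds custom tags to the optional fields of the BAM file to record each read's coordinates, CID, and MID; annotation information is likewise added to the BAM through tag fields.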
Tag information
Custom tags added to the BAM optional fields:
Tag | Description |
---|---|
Cx:i | x coordinate of the Coordinate ID. |
Cy:i | y coordinate of the Coordinate ID. |
UR:Z | The hexadecimal representation of uncorrected binary-encoded MID. |
XF:i | Mapping region on the reference genome. Valid values: 0=EXONIC, 1=INTRONIC, 2=INTERGENIC, 3=rRNA, 4=ANTISENSE. |
GI:Z | Annotated gene ID. |
GE:Z | Annotated gene name. |
GS:Z | ‘+’ or ‘-’, indicating forward/reverse strand respectively. |
UB:Z | The hexadecimal representation of count corrected binary-encoded MID. |
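The XF codes listed in the table can be turned into readable region names with a small lookup. The following is a minimal sketch; `XF_REGIONS` and `describe_xf` are hypothetical names chosen for illustration and are not part of SAW.

```python
# Region codes exactly as listed for the XF tag in the table above.
XF_REGIONS = {
    0: "EXONIC",
    1: "INTRONIC",
    2: "INTERGENIC",
    3: "rRNA",
    4: "ANTISENSE",
}


def describe_xf(value):
    """Return the region name for an XF code (hypothetical helper).

    Accepts an int or a numeric string; codes outside the table map to
    "UNKNOWN" rather than raising, which is an assumption of this sketch.
    """
    return XF_REGIONS.get(int(value), "UNKNOWN")


print(describe_xf(0))
print(describe_xf("4"))
```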
Example BAM record from the raw alignment output:
E100026571L1C009R00301275185 16 1 3000095 255 26M121066N74M * 0 0 GGCTTTTTTTTTTTTTTTTTTTTTTTTTTTCTAAATATTGGGTTTTATTAGCACCATGATAACTGTATATTAATTTGCACTGACTGTCATAACAAAATAC G+:GFFGGFGFFGFFGFGGFFGFFFFFCFGFCFGGGFGGFGFFFFGGFGGFGFFFGGFFGFFFGFGFGFFGFFGFGFFFFGFFFFFFFFGGFFGGFFGEF NH:i:1 HI:i:1 AS:i:88 nM:i:0 Cx:i:4826 Cy:i:11598 UR:Z:6FA29
Example BAM record after gene annotation:
E100026571L1C002R00703943265 1040 1 3082766 255 11M132671N89M * 0 0 CTGCTGCAGCTTTTTTTTCTTTGAGATTTATTTTTATGCTATGTGTATGGGTATTTTGCCTGCATATATGTCTATGCACCATGTGTGTGCAGTGCTTGAG FFFFFECGFDCFGDGDFEE@EEGIBFGGCGFFGACGFCGFFDGDGFFFFFFEGCDFCGFFGG@FFF=EFFDGGGGGFDGFFFGGGFGFFGGGFFGGGDFG NH:i:1 HI:i:1 AS:i:88 nM:i:0 Cx:i:7767 Cy:i:18052 UR:Z:7AE49 XF:i:0 GI:Z:ENSMUSG00000051951 GE:Z:Xkr4 GS:Z:- UB:Z:79E49
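The custom tags in a record like the annotated example can be read by splitting the SAM text form of the record, for instance as printed by `samtools view`. The sketch below assumes the standard tab-separated SAM layout with 11 mandatory columns followed by `TAG:TYPE:VALUE` fields. The helper name `parse_optional_tags` is hypothetical, and SEQ and QUAL are replaced with `*` for brevity. Treating a difference between UR and UB as a sign that the MID was corrected is an interpretation of the tag descriptions, not documented behavior.

```python
def parse_optional_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM line as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 SAM columns are mandatory
        tag, typ, value = field.split(":", 2)
        # Integer-typed tags ("i") are converted; everything else stays a string.
        tags[tag] = int(value) if typ == "i" else value
    return tags


# Annotated example record from above, with SEQ and QUAL shortened to "*".
record = "\t".join([
    "E100026571L1C002R00703943265", "1040", "1", "3082766", "255",
    "11M132671N89M", "*", "0", "0", "*", "*",
    "NH:i:1", "HI:i:1", "AS:i:88", "nM:i:0",
    "Cx:i:7767", "Cy:i:18052", "UR:Z:7AE49",
    "XF:i:0", "GI:Z:ENSMUSG00000051951", "GE:Z:Xkr4", "GS:Z:-", "UB:Z:79E49",
])

tags = parse_optional_tags(record)
print(tags["Cx"], tags["Cy"], tags["GE"], tags["GS"])

# Assumption: UR differing from UB means the MID was changed by correction.
mid_was_corrected = tags["UR"] != tags["UB"]
print(mid_was_corrected)
```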
Alignment statistics
After the reads in the sequencing FASTQ files are aligned, the statistics and details for this step are saved in /STEREO_ANALYSIS_WORKFLOW/ALIGNMENT/<lane>.CIDMap.stat.
Metric | Description |
---|---|
Number of CID in chip mask | Number of CIDs in the chip mask file |
Number of unique CID in FASTQ | Number of unique CIDs in FASTQs |
Number of total reads | Number of total reads in FASTQs |
Q10 in CID % | Ratio of Q10 CID bases |
Q20 in CID % | Ratio of Q20 CID bases |
Q30 in CID % | Ratio of Q30 CID bases |
Number of mapped CID | Number of reads mapped to CID |
% of mapped CID | Ratio of reads mapped to CID |
Number of exactly mapped CID | Number of reads exactly mapped to CID |
% of exactly mapped CID | Ratio of reads exactly mapped to CID |
Number of CID with mismatch | Number of reads mapped to CID with mismatch |
% of CID with mismatch | Ratio of reads mapped to CID with mismatch |
Q10 in RNA % | Ratio of Q10 RNA bases |
Q20 in RNA % | Ratio of Q20 RNA bases |
Q30 in RNA % | Ratio of Q30 RNA bases |
Number of reads with polyA | Number of reads with polyA sequence |
% of reads with polyA | Ratio of reads with polyA sequence |
Number of short reads (trim polyA) | Number of short reads after trimming polyA sequence |
% of short reads (trim polyA) | Ratio of short reads after trimming polyA sequence |
Number of reads with adapter | Number of reads with adapter sequence |
% of reads with adapter | Ratio of reads with adapter sequence |
Number of short reads (trim adapter) | Number of short reads after trimming adapter sequence |
% of short reads (trim adapter) | Ratio of short reads after trimming adapter sequence |
Number of reads filtered with DNB | Number of reads with DNB sequence |
% of reads filtered with DNB | Ratio of reads with DNB sequence |
Q10 in clean RNA % | Ratio of Q10 RNA bases after filtering |
Q20 in clean RNA % | Ratio of Q20 RNA bases after filtering |
Q30 in clean RNA % | Ratio of Q30 RNA bases after filtering |
Q10 in MID % | Ratio of Q10 MID bases |
Q20 in MID % | Ratio of Q20 MID bases |
Q30 in MID % | Ratio of Q30 MID bases |
Number of low quality MID | Number of MID with low quality bases |
% of low quality MID | Ratio of MID with low quality bases |
Number of MID with N | Number of MID with N base |
% of MID with N | Ratio of MID with N base |
Number of MID in specific sequence | Number of MID mapped to specific sequences |
% of MID with specific sequence | Ratio of MID mapped to specific sequences |
Q10 in clean MID % | Ratio of Q10 MID bases after filtering |
Q20 in clean MID % | Ratio of Q20 MID bases after filtering |
Q30 in clean MID % | Ratio of Q30 MID bases after filtering |
Number of exact MID | Number of reads exactly mapped to MID |
% of exact MID | Ratio of reads exactly mapped to MID |
Number of inexact MID | Number of reads inexactly mapped to MID |
% of inexact MID | Ratio of reads inexactly mapped to MID |
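The Q10, Q20, and Q30 rows in the table above are ratios of bases whose quality score reaches a threshold. The sketch below shows how such a ratio could be computed from FASTQ quality strings. It assumes Phred+33 encoding and that "Q30" means a score of at least 30. The function name `q_ratio` is hypothetical, and SAW's own implementation may differ.

```python
def q_ratio(quality_strings, threshold, offset=33):
    """Percentage of bases with Phred score >= threshold.

    quality_strings: iterable of FASTQ quality strings (Phred+33 assumed).
    Returns 0.0 when there are no bases at all.
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return 100.0 * passing / total if total else 0.0


# 'I' encodes Q40, '5' encodes Q20, '+' encodes Q10 under Phred+33.
example = ["I5+"]
for q in (10, 20, 30):
    print(q, round(q_ratio(example, q), 2))
```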
Annotation statistics
After the reads are annotated with genes, the statistics and details for this step are saved in /STEREO_ANALYSIS_WORKFLOW/ANNOTATION/*.bam.summary.stat.
Metric | Description |
---|---|
Number of total reads | Number of total reads aligned to genome |
Number of reads to be annotated | Number of reads that will be annotated with GTF/GFF annotation database |
% of reads to be annotated | % of reads that will be annotated with GTF/GFF annotation database |
Number of uniquely mapped reads to be annotated | Number of reads to be annotated which are uniquely mapped to genome |
% of uniquely mapped reads to be annotated | Ratio of reads to be annotated which are uniquely mapped to genome |
Number of multi-mapped reads to be annotated | Number of reads to be annotated which are multi-mapped to genome |
% of multi-mapped reads to be annotated | Ratio of reads to be annotated which are multi-mapped to genome |
Number of multi-mapped reads | Number of reads multi-mapped to genome |
Number of reads mapped to transcriptome | Number of reads mapped to transcriptome, including exonic and intronic regions. |
% of reads mapped to transcriptome | % of reads mapped to transcriptome, including exonic and intronic regions. |
Number of unique captures (on CID, gene and MID) | Number of unique captures for reads, based on CID, gene and MID information |
% of unique captures (on CID, gene and MID) | % of unique captures for reads, based on CID, gene and MID information |
Number of duplicated reads | Number of duplicated captures for reads, based on CID, gene and MID information |
% of duplicated reads | % of duplicated captures for reads, based on CID, gene and MID information |
Number of reads mapped to exonic regions | Number of reads mapped to exonic regions |
% of reads mapped to exonic regions | % of reads mapped to exonic regions |
Number of reads mapped to intronic regions | Number of reads mapped to intronic regions |
% of reads mapped to intronic regions | % of reads mapped to intronic regions |
Number of reads mapped to intergenic regions | Number of reads mapped to intergenic regions |
% of reads mapped to intergenic regions | % of reads mapped to intergenic regions |
Number of reads mapped antisense to gene | Number of reads mapped antisense to gene |
% of reads mapped antisense to gene | % of reads mapped antisense to gene |
Number of reads mapped to rRNA | Number of reads mapped to rRNA regions |
Number of rRNA reads in uniquely mapped | Number of uniquely mapped reads mapped to rRNA regions |
% of rRNA reads in uniquely mapped | % of uniquely mapped reads mapped to rRNA regions |
Number of rRNA reads in multi-mapped | Number of multi-mapped reads mapped to rRNA regions |
% of rRNA reads in multi-mapped reads | % of multi-mapped reads mapped to rRNA regions |
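The unique-capture and duplicated-read metrics above are defined on the combination of CID, gene, and MID. One way to picture the counting is sketched below. It assumes each annotated read is reduced to a `(Cx, Cy, gene ID, corrected MID)` tuple, and that percentages are taken over the reads passed in. Both are assumptions of this sketch, and `capture_stats` is a hypothetical name.

```python
def capture_stats(reads):
    """Count unique and duplicated captures.

    reads: iterable of (cx, cy, gene_id, mid) tuples, one per annotated read.
    A read is a duplicate if its full tuple has already been seen.
    """
    total = 0
    seen = set()
    for key in reads:
        total += 1
        seen.add(key)
    unique = len(seen)
    duplicated = total - unique
    pct = (lambda n: 100.0 * n / total) if total else (lambda n: 0.0)
    return {
        "total": total,
        "unique": unique,
        "duplicated": duplicated,
        "pct_unique": pct(unique),
        "pct_duplicated": pct(duplicated),
    }


# Illustrative reads: the first two share CID, gene, and MID.
reads = [
    (7767, 18052, "ENSMUSG00000051951", "79E49"),
    (7767, 18052, "ENSMUSG00000051951", "79E49"),
    (7767, 18053, "ENSMUSG00000051951", "79E49"),
    (7767, 18052, "ENSMUSG00000051951", "6FA29"),
]
stats = capture_stats(reads)
print(stats["unique"], stats["duplicated"])
```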