BAM

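A BAM file is a binary format that stores sequence alignments and gene annotation information. SAW count adds custom tags to the optional fields of the BAM file to record each read's coordinates, CID, and MID. Annotation information is also added to the BAM through tag fields.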

Tag information

Custom tags added to the BAM optional fields:

| Tag | Description |
| --- | --- |
| Cx:i | x coordinate of the Coordinate ID (CID). |
| Cy:i | y coordinate of the Coordinate ID (CID). |
| UR:Z | Hexadecimal representation of the uncorrected binary-encoded MID. |
| XF:i | Mapping region on the reference genome. Valid values: 0=EXONIC, 1=INTRONIC, 2=INTERGENIC, 3=rRNA, 4=ANTISENSE. |
| GI:Z | Annotated gene ID. |
| GE:Z | Annotated gene name. |
| GS:Z | '+' or '-', indicating the forward or reverse strand respectively. |
| UB:Z | Hexadecimal representation of the count-corrected binary-encoded MID. |
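To make the tag layout concrete, the following is a minimal sketch of reading these tags from the text (SAM) form of a record, where each optional field is written as `TAG:TYPE:VALUE`. The function name `parse_optional_fields` and the `XF_REGIONS` mapping are illustrative names, not part of SAW; the tag values are copied from the annotated example record in this section.

```python
# Region codes for the XF tag, as listed in the tag table.
XF_REGIONS = {0: "EXONIC", 1: "INTRONIC", 2: "INTERGENIC", 3: "rRNA", 4: "ANTISENSE"}


def parse_optional_fields(fields):
    """Convert SAM-style 'TAG:TYPE:VALUE' strings into a {tag: value} dict.

    Hypothetical helper for illustration; it only handles the 'i' (integer)
    and text-like types used by the tags in this section.
    """
    tags = {}
    for field in fields:
        # Split at most twice so that a ':' inside the value is preserved.
        tag, type_code, value = field.split(":", 2)
        # 'i' marks an integer; other types (e.g. 'Z', a string) stay as text.
        tags[tag] = int(value) if type_code == "i" else value
    return tags


# Optional fields copied from the annotated example record in this section.
example = (
    "Cx:i:7767 Cy:i:18052 UR:Z:7AE49 XF:i:0 "
    "GI:Z:ENSMUSG00000051951 GE:Z:Xkr4 GS:Z:- UB:Z:79E49"
)
tags = parse_optional_fields(example.split())
region = XF_REGIONS[tags["XF"]]
print(tags["Cx"], tags["Cy"], tags["GE"], region)
```

For real BAM files, a BAM library would normally be used rather than parsing text by hand; the sketch only shows how the tag names, types, and values fit together.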

Example BAM record from the raw alignment output:

E100026571L1C009R00301275185    16      1       3000095 255     26M121066N74M   *       0       0       GGCTTTTTTTTTTTTTTTTTTTTTTTTTTTCTAAATATTGGGTTTTATTAGCACCATGATAACTGTATATTAATTTGCACTGACTGTCATAACAAAATAC      G+:GFFGGFGFFGFFGFGGFFGFFFFFCFGFCFGGGFGGFGFFFFGGFGGFGFFFGGFFGFFFGFGFGFFGFFGFGFFFFGFFFFFFFFGGFFGGFFGEF    NH:i:1  HI:i:1    AS:i:88 nM:i:0  Cx:i:4826       Cy:i:11598      UR:Z:6FA29

Example BAM record after gene annotation:

E100026571L1C002R00703943265    1040    1       3082766 255     11M132671N89M   *       0       0       CTGCTGCAGCTTTTTTTTCTTTGAGATTTATTTTTATGCTATGTGTATGGGTATTTTGCCTGCATATATGTCTATGCACCATGTGTGTGCAGTGCTTGAG    FFFFFECGFDCFGDGDFEE@EEGIBFGGCGFFGACGFCGFFDGDGFFFFFFEGCDFCGFFGG@FFF=EFFDGGGGGFDGFFFGGGFGFFGGGFFGGGDFG    NH:i:1  HI:i:1  AS:i:88 nM:i:0  Cx:i:7767       Cy:i:18052      UR:Z:7AE49      XF:i:0  GI:Z:ENSMUSG00000051951 GE:Z:Xkr4       GS:Z:-  UB:Z:79E49
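The two example records differ in their FLAG values (16 and 1040) and, for the annotated record, in the UR and UB tags. The sketch below decodes the two FLAG bits involved, using the bit meanings from the SAM specification, and locates where the uncorrected and corrected MID strings differ. The helper names are hypothetical, and how SAW itself sets these values is not described here.

```python
# SAM FLAG bits used here, as defined in the SAM specification.
FLAG_REVERSE = 0x10      # read aligned to the reverse strand
FLAG_DUPLICATE = 0x400   # read marked as a PCR or optical duplicate


def describe_flag(flag):
    """Report whether the reverse-strand and duplicate bits are set."""
    return {
        "reverse": bool(flag & FLAG_REVERSE),
        "duplicate": bool(flag & FLAG_DUPLICATE),
    }


def differing_hex_positions(ur, ub):
    """Return 0-based positions where two equal-length hex strings differ."""
    if len(ur) != len(ub):
        raise ValueError("UR and UB must have the same length")
    return [i for i, (a, b) in enumerate(zip(ur, ub)) if a != b]


# FLAG values taken from the two example records.
raw = describe_flag(16)
annotated = describe_flag(1040)

# UR and UB values taken from the annotated example record.
changed = differing_hex_positions("7AE49", "79E49")
changed_bits = int("7AE49", 16) ^ int("79E49", 16)

print(raw, annotated, changed, hex(changed_bits))
```

The bitwise XOR shows which bits of the binary-encoded MID were changed by correction; mapping those bits back to bases would require the MID encoding, which this section does not specify.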

Alignment statistics

After the reads in the sequencing FASTQ files are aligned, the statistics file with detailed information for this step is saved at /STEREO_ANALYSIS_WORKFLOW/ALIGNMENT/<lane>.CIDMap.stat.

| Metric | Description |
| --- | --- |
| Number of CID in chip mask | Number of CIDs in the chip mask file |
| Number of unique CID in FASTQ | Number of unique CIDs in FASTQs |
| Number of total reads | Number of total reads in FASTQs |
| Q10 in CID % | Ratio of Q10 CID bases |
| Q20 in CID % | Ratio of Q20 CID bases |
| Q30 in CID % | Ratio of Q30 CID bases |
| Number of mapped CID | Number of reads mapped to CID |
| % of mapped CID | Ratio of reads mapped to CID |
| Number of exactly mapped CID | Number of reads exactly mapped to CID |
| % of exactly mapped CID | Ratio of reads exactly mapped to CID |
| Number of CID with mismatch | Number of reads mapped to CID with mismatch |
| % of CID with mismatch | Ratio of reads mapped to CID with mismatch |
| Q10 in RNA % | Ratio of Q10 RNA bases |
| Q20 in RNA % | Ratio of Q20 RNA bases |
| Q30 in RNA % | Ratio of Q30 RNA bases |
| Number of reads with polyA | Number of reads with polyA sequence |
| % of reads with polyA | Ratio of reads with polyA sequence |
| Number of short reads (trim polyA) | Number of short reads after trimming polyA sequence |
| % of short reads (trim polyA) | Ratio of short reads after trimming polyA sequence |
| Number of reads with adapter | Number of reads with adapter sequence |
| % of reads with adapter | Ratio of reads with adapter sequence |
| Number of short reads (trim adapter) | Number of short reads after trimming adapter sequence |
| % of short reads (trim adapter) | Ratio of short reads after trimming adapter sequence |
| Number of reads filtered with DNB | Number of reads with DNB sequence |
| % of reads filtered with DNB | Ratio of reads with DNB sequence |
| Q10 in clean RNA % | Ratio of Q10 RNA bases after filtering |
| Q20 in clean RNA % | Ratio of Q20 RNA bases after filtering |
| Q30 in clean RNA % | Ratio of Q30 RNA bases after filtering |
| Q10 in MID % | Ratio of Q10 MID bases |
| Q20 in MID % | Ratio of Q20 MID bases |
| Q30 in MID % | Ratio of Q30 MID bases |
| Number of low quality MID | Number of MIDs with low-quality bases |
| % of low quality MID | Ratio of MIDs with low-quality bases |
| Number of MID with N | Number of MIDs with N bases |
| % of MID with N | Ratio of MIDs with N bases |
| Number of MID in specific sequence | Number of MIDs mapped to specific sequences |
| % of MID with specific sequence | Ratio of MIDs mapped to specific sequences |
| Q10 in clean MID % | Ratio of Q10 MID bases after filtering |
| Q20 in clean MID % | Ratio of Q20 MID bases after filtering |
| Q30 in clean MID % | Ratio of Q30 MID bases after filtering |
| Number of exact MID | Number of reads exactly mapped to MID |
| % of exact MID | Ratio of reads exactly mapped to MID |
| Number of inexact MID | Number of reads inexactly mapped to MID |
| % of inexact MID | Ratio of reads inexactly mapped to MID |

Annotation statistics

After reads are annotated with genes, the statistics file with detailed information for this step is saved at /STEREO_ANALYSIS_WORKFLOW/ANNOTATION/*.bam.summary.stat.

| Metric | Description |
| --- | --- |
| Number of total reads | Number of total reads aligned to genome |
| Number of reads to be annotated | Number of reads that will be annotated with GTF/GFF annotation database |
| % of reads to be annotated | % of reads that will be annotated with GTF/GFF annotation database |
| Number of uniquely mapped reads to be annotated | Number of reads to be annotated which are uniquely mapped to genome |
| % of uniquely mapped reads to be annotated | Ratio of reads to be annotated which are uniquely mapped to genome |
| Number of multi-mapped reads to be annotated | Number of reads to be annotated which are multi-mapped to genome |
| % of multi-mapped reads to be annotated | Ratio of reads to be annotated which are multi-mapped to genome |
| Number of multi-mapped reads | Number of reads multi-mapped to genome |
| Number of reads mapped to transcriptome | Number of reads mapped to transcriptome, including exonic and intronic regions |
| % of reads mapped to transcriptome | % of reads mapped to transcriptome, including exonic and intronic regions |
| Number of unique captures (on CID, gene and MID) | Number of unique captures for reads, based on CID, gene and MID information |
| % of unique captures (on CID, gene and MID) | % of unique captures for reads, based on CID, gene and MID information |
| Number of duplicated reads | Number of duplicated captures for reads, based on CID, gene and MID information |
| % of duplicated reads | % of duplicated captures for reads, based on CID, gene and MID information |
| Number of reads to be annotated | Number of reads that will be annotated with GTF/GFF annotation database |
| Number of reads mapped to exonic regions | Number of reads mapped to exonic regions |
| % of reads mapped to exonic regions | % of reads mapped to exonic regions |
| Number of reads mapped to intronic regions | Number of reads mapped to intronic regions |
| % of reads mapped to intronic regions | % of reads mapped to intronic regions |
| Number of reads mapped to intergenic regions | Number of reads mapped to intergenic regions |
| % of reads mapped to intergenic regions | % of reads mapped to intergenic regions |
| Number of reads mapped antisense to gene | Number of reads mapped antisense to gene |
| % of reads mapped antisense to gene | % of reads mapped antisense to gene |
| Number of reads mapped to rRNA | Number of reads mapped to rRNA regions |
| Number of rRNA reads in uniquely mapped | Number of uniquely mapped reads mapped to rRNA regions |
| % of rRNA reads in uniquely mapped | % of uniquely mapped reads mapped to rRNA regions |
| Number of rRNA reads in multi-mapped | Number of multi-mapped reads mapped to rRNA regions |
| % of rRNA reads in multi-mapped reads | % of multi-mapped reads mapped to rRNA regions |
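The unique-capture and duplicated-read metrics are described as being based on CID, gene, and MID information. As a rough illustration of that idea, the sketch below groups reads by a key built from those three pieces. It assumes the CID is represented by the (Cx, Cy) pair, the gene by GI, and the MID by the corrected UB tag; the actual SAW implementation may use a different key or additional rules. The function name and the second and third reads are made up for the example.

```python
from collections import Counter


def count_captures(records):
    """Count unique and duplicated captures keyed by (Cx, Cy, gene ID, MID).

    Hypothetical helper: the key choice is an assumption, not SAW's
    documented behavior.
    """
    seen = Counter((r["Cx"], r["Cy"], r["GI"], r["UB"]) for r in records)
    total = sum(seen.values())
    unique = len(seen)              # one capture per distinct key
    duplicated = total - unique     # extra reads that repeat an existing key
    return unique, duplicated, total


# Illustrative reads; the first uses values from the annotated example record,
# the others are invented to show a duplicate and a distinct MID.
reads = [
    {"Cx": 7767, "Cy": 18052, "GI": "ENSMUSG00000051951", "UB": "79E49"},
    {"Cx": 7767, "Cy": 18052, "GI": "ENSMUSG00000051951", "UB": "79E49"},
    {"Cx": 7767, "Cy": 18052, "GI": "ENSMUSG00000051951", "UB": "0A1B2"},
]
unique, duplicated, total = count_captures(reads)
print(unique, duplicated, total)
```

Here the second read repeats the first read's key and counts as a duplicate, while the third has a different MID and counts as a separate capture.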
© 2025 STOmics Tech. All rights reserved. Modified: 2025-03-07 10:28:04
