Matrices

Gene Expression File (GEF)

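The Gene Expression File (GEF) is a data management and storage format designed to support multidimensional datasets and high computational efficiency. The Stereo-seq analysis workflow generates bin GEF and cellbin GEF files. The bin GEF format is a hierarchically structured data model that stores one or more combined gene expression matrices at various bin sizes. The cellbin GEF format stores the expression information within each cell. Each GEF container organizes a collection of spatial gene expression matrices. It contains two main kinds of data object: Group and Dataset. A Dataset is a multidimensional array of data elements. A Group is similar to a file-system directory: it organizes Datasets and other Groups in a hierarchy.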
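
To illustrate what storing a matrix "at various bin sizes" means, the sketch below aggregates bin1 MID counts into coarser square bins. It is a simplification, not SAW's implementation: it assumes a binN spot covers an N x N block of bin1 spots indexed by (x // N, y // N). The function name aggregate_to_bin and the (gene, x, y, mid_count) tuple layout are hypothetical and are not part of any SAW or GEF API.

```python
from collections import defaultdict


def aggregate_to_bin(spots, bin_size):
    """Sum bin1 MID counts into square bins of side `bin_size`.

    `spots` is an iterable of (gene, x, y, mid_count) tuples at bin1
    resolution (hypothetical layout). Assumption: a binN spot is indexed
    by (x // bin_size, y // bin_size); SAW's exact binning rule may differ.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be a positive integer")
    binned = defaultdict(int)
    for gene, x, y, mid_count in spots:
        # All bin1 spots that fall in the same N x N block share one key.
        binned[(gene, x // bin_size, y // bin_size)] += mid_count
    return dict(binned)


if __name__ == "__main__":
    bin1 = [("GeneA", 0, 0, 1), ("GeneA", 1, 1, 2), ("GeneA", 5, 0, 3), ("GeneB", 4, 4, 1)]
    for size in (1, 5, 10):
        print(size, aggregate_to_bin(bin1, size))
```

A bin GEF can be thought of as holding several such precomputed levels side by side, each in its own Group. A viewer can then read the level that matches its zoom without re-aggregating bin1 data.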

Gene Expression Matrix (GEM)

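The Gene Expression Matrix (GEM) stores spatial gene expression data. SAW generates several gene expression matrix files during analysis. The basic format requires six columns and a header row that gives the column names. The six columns are gene ID, gene name, x coordinate, y coordinate, MID count, and exon count. For cellbin data, a seventh column records the cell ID. In the expression matrix of the maximum bounding rectangle region, several comment lines beginning with "#" come before the header row. The header field names and field types are described in the table.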
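
A GEM file can be read with the Python standard library alone. The sketch below rests on several assumptions that should be checked against the field table for your SAW version: fields are tab-separated, the header names are geneID, geneName, x, y, MIDCount, and ExonCount, and any extra column (such as the cellbin cell ID) can stay a string. The function name read_gem is hypothetical.

```python
import csv
import io

# Assumed integer-valued columns; any other column (e.g. a cell ID) stays a string.
INT_COLUMNS = {"x", "y", "MIDCount", "ExonCount"}


def read_gem(handle):
    """Parse a GEM text stream into a list of dicts.

    Drops every line that starts with '#', then treats the first
    remaining line as the header row. Assumes tab-separated fields.
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    return [
        {key: int(value) if key in INT_COLUMNS else value for key, value in row.items()}
        for row in reader
    ]


if __name__ == "__main__":
    # Illustrative input only, not real SAW output.
    sample = io.StringIO(
        "# illustrative comment line\n"
        "geneID\tgeneName\tx\ty\tMIDCount\tExonCount\n"
        "G0001\tGeneA\t10\t20\t3\t2\n"
    )
    for record in read_gem(sample):
        print(record)
```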

File types

The expression matrix files output by the SAW pipeline fall into two main types, bin GEF and cellbin GEF. The file extension identifies the type at a glance:

File extension | Description
.gef | The feature expression matrix file in HDF5 format for visualization. It contains the MID count for each gene in each spot. A spot is a fixed-size square binning unit within which expression values are accumulated. By default, a visualization .gef includes spot sizes of bin 1, 5, 10, 20, 50, 100, 150, and 200.
.cellbin.gef | The cellbin feature expression matrix file in HDF5 format. It contains the spatial location and area of each cell, the MID count for each gene in each cell, and the cluster to which the cell belongs. In .cellbin.gef, the cell is the smallest data unit. Only available when cell segmentation was performed on a microscopy image.
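
Every cellbin file name also ends in ".gef", so a suffix check has to test the longer ".cellbin.gef" extension first. Here is a minimal sketch. The function name gef_type is hypothetical, and the check is case-sensitive.

```python
def gef_type(filename):
    """Return 'cellbin GEF', 'bin GEF', or None based on the file extension.

    '.cellbin.gef' is tested before '.gef' because every cellbin
    file name also ends with '.gef'.
    """
    if filename.endswith(".cellbin.gef"):
        return "cellbin GEF"
    if filename.endswith(".gef"):
        return "bin GEF"
    return None


if __name__ == "__main__":
    for name in ("SN.tissue.gef", "SN.adjusted.cellbin.gef", "SN.microorganism.genus.gem"):
        print(name, gef_type(name))
```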

Common matrix files

The expression matrix files output by SAW count and SAW realign typically include:

File | Description
<SN>.raw.gef | Feature expression matrix containing all information over the complete chip region. It has bin1 expression counts only.
<SN>.gef | Feature expression matrix. It is also a visualization GEF that includes expression counts for bin 1, 5, 10, 20, 50, 100, 150, and 200.
<SN>.tissue.gef | Feature expression matrix under the tissue-covered region. It is also a visualization GEF that includes expression counts for bin 1, 5, 10, 20, 50, 100, 150, and 200.
<SN>.cellbin.gef | Cellbin feature expression matrix that records information for each cell individually, including centroid coordinates, boundary coordinates, gene expression, and cell area.
<SN>.adjusted.cellbin.gef | Cellbin expression matrix with expanded cell borders, based on <SN>_<stainType>_mask_edm_dis_<distance>.tif.
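
The adjusted cellbin matrix depends on a mask whose file name follows the pattern <SN>_<stainType>_mask_edm_dis_<distance>.tif. The sketch below recovers the three fields from such a name with a regular expression. It assumes that the stain type contains no underscore and that the distance is an integer. The SN itself may contain underscores. The helper name parse_edm_mask_name and the example file names are hypothetical.

```python
import re

# <SN>_<stainType>_mask_edm_dis_<distance>.tif
# The SN group is greedy, so an SN containing "_" is kept whole;
# stainType is assumed to be a single underscore-free segment.
EDM_MASK_RE = re.compile(
    r"^(?P<sn>.+)_(?P<stain_type>[^_]+)_mask_edm_dis_(?P<distance>\d+)\.tif$"
)


def parse_edm_mask_name(filename):
    """Return (sn, stain_type, distance), or None if the name does not match."""
    match = EDM_MASK_RE.match(filename)
    if match is None:
        return None
    return match["sn"], match["stain_type"], int(match["distance"])


if __name__ == "__main__":
    print(parse_edm_mask_name("SN0001_D1_ssDNA_mask_edm_dis_10.tif"))
```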

Microorganism-related matrices

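When you analyze Stereo-seq N FFPE tissue samples and set the --microorganism-detect parameter for the SAW count task, the microorganism-related expression matrix files are saved in the /outs/feature_expression/microorganism directory. The files are as follows: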

File | Description
<SN>.microorganism.raw.gef | Feature expression matrix of microorganisms containing all information over the complete chip region. It has bin1 expression counts only.
<SN>.microorganism.gef | Feature expression matrix of microorganisms. It is also a visualization GEF that includes expression counts for bin 1, 5, 10, 20, 50, 100, 150, and 200.
<SN>.host_microorganism.raw.gef | Feature expression matrix of microorganisms and the host containing all information over the complete chip region. It has bin1 expression counts only.
<SN>.host_microorganism.gef | Feature expression matrix of microorganisms and the host. It is also a visualization GEF that includes expression counts for bin 1, 5, 10, 20, 50, 100, 150, and 200.
<SN>.microorganism.<classification>.gem | Feature expression matrix of microbes at one specific classification level. Classification levels include phylum, class, order, family, genus, and species.
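
Because one GEM is written per classification level, a downstream script usually has to expand the <SN>.microorganism.<classification>.gem pattern into concrete paths. The sketch below does that. It assumes the /outs/feature_expression/microorganism directory named above sits inside a run's outs folder, passed in as outs_dir. The function name and the example SN are hypothetical.

```python
from pathlib import PurePosixPath

# Classification levels listed for <SN>.microorganism.<classification>.gem
CLASSIFICATIONS = ("phylum", "class", "order", "family", "genus", "species")


def microorganism_gem_paths(outs_dir, sn):
    """Map each classification level to its expected GEM path.

    PurePosixPath keeps the "/" separators used in the SAW output
    layout regardless of the host operating system.
    """
    base = PurePosixPath(outs_dir) / "feature_expression" / "microorganism"
    return {level: base / f"{sn}.microorganism.{level}.gem" for level in CLASSIFICATIONS}


if __name__ == "__main__":
    for level, path in microorganism_gem_paths("/data/run1/outs", "SN0001").items():
        print(level, path)
```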

Protein-related matrices

If you use SAW count or SAW realign to analyze Stereo-CITE T FF tissue samples, the spatial protein expression matrices are saved in /outs/feature_expression.

The files are as follows:

File | Description
<SN>.protein.raw.gef | Feature expression matrix containing all information over the complete chip region. It has bin1 expression counts only.
<SN>.protein.gef | Feature expression matrix. It is also a visualization GEF that includes expression counts for bin 1, 5, 10, 20, 50, 100, 150, and 200.
<SN>.protein.tissue.gef | Feature expression matrix under the tissue-covered region. It is also a visualization GEF that includes expression counts for bin 1, 5, 10, 20, 50, 100, 150, and 200.
<SN>.protein.cellbin.gef | Cellbin feature expression matrix that records information for each cell individually, including centroid coordinates, boundary coordinates, gene expression, and cell area.
<SN>.protein.adjusted.cellbin.gef | Cellbin expression matrix with expanded cell borders, based on <SN>_<stainType>_mask_edm_dis_<distance>.tif.
<SN>.protein.tissue.rmbg.gem.gz | Feature expression matrix after automatic protein background removal. It contains bin1 expression counts.
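
The background-removed protein matrix is a gzip-compressed GEM, so it can be streamed directly without first decompressing it to disk. The sketch below sums bin1 MID counts per feature. It assumes the same layout as an uncompressed GEM: tab-separated fields, "#" comment lines, and header columns named geneName and MIDCount. These names should be checked against the actual file. The function name total_counts_per_feature is hypothetical.

```python
import csv
import gzip
from collections import Counter


def total_counts_per_feature(path):
    """Sum MIDCount per geneName in a gzip-compressed GEM file.

    Assumes tab-separated fields, '#' comment lines, and header
    columns named 'geneName' and 'MIDCount' (assumed names).
    """
    totals = Counter()
    # "rt" opens the gzip stream in text mode, so lines are str, not bytes.
    with gzip.open(path, "rt", newline="") as handle:
        data_lines = (line for line in handle if not line.startswith("#"))
        for row in csv.DictReader(data_lines, delimiter="\t"):
            totals[row["geneName"]] += int(row["MIDCount"])
    return totals
```

Usage: total_counts_per_feature("<SN>.protein.tissue.rmbg.gem.gz").most_common(10) lists the ten features with the highest total counts.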
© 2025 STOmics Tech. All rights reserved. Modified: 2025-03-07 10:28:04
