SeekSpace小鼠脑demo数据分析¶
背景信息¶
本⽂档是关于寻因SeekSpace技术所得到的数据及其seurat分析⽅法的说明。
寻因SeekSpace技术可以检测到单细胞精度的基因表达数据,同时也能定位出每个细胞在组织中的空间坐标。
SeekSpace的数据分析简便易学,能够很方便地兼容常见单细胞转录组的分析软件,例如seurat和scanpy。
SeekSpace技术中一个像素点的大小是约为 0.2653 微米,将像素点的坐标乘以 0.2653 即可转换计算细胞在真实空间上的距离。
本文档的输入数据为小鼠脑数据的分析结果。首先使用SeekSpace tools,在默认参数下进行分析;然后,使用seurat进行聚类;进一步,将具有高比例线粒体cluster的细胞过滤掉,过滤后的结果作为本文档输入数据。
⽂件说明¶
SeekSpace技术的配套的基础数据处理的软件是SeekSpace tools,该软件可以从测序文库中识别细胞的表达信息,同时可以识别出每个细胞在空间上的位置。
经过SeekSpace tools软件处理之后所得到的结果文件格式如下:
├── WTH1092_filtered_feature_bc_matrix # 表达矩阵⽬录,可以使⽤seurat的Read10X命令进⾏读取
│ ├── barcodes.tsv.gz
│ ├── features.tsv.gz
│ ├── matrix.mtx.gz
│ └── cell_locations.tsv.gz 为⼩⿏脑测序数据中细胞的空间坐标⽂件。第1列是barcode,顺序与matrix中的filtered_feature_bc_matrix/barcode中细胞的顺序⼀致;第2列和第3列分别为该barcode所代表的细胞的空间位置(即空间芯⽚上的像素坐标)。
├── WTH1092_aligned_DAPI.png # 为⼩⿏脑组织切⽚的DAPI染⾊图⽚
├── WTH1092_aligned_HE_TIMG.png # 为⼩⿏脑组织切⽚的HE染⾊图⽚
├── seekspace_of_seurat.ipynb # 为使⽤seurat分析该⼩⿏脑空间数据的jupyter示例⽂件
└── seekspace_of_scanpy.ipynb # 为使⽤scanpy分析该⼩⿏脑空间数据的jupyter示例⽂件
import scanpy# Core scverse libraries
import scanpy as sc
import anndata as ad
import pandas as pd
# Data retrieval
import pooch
import warnings
warnings.filterwarnings("ignore")
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_header()
sc.settings.set_figure_params(dpi=80, facecolor="white")
scanpy==1.9.2 anndata==0.8.0 umap==0.5.3 numpy==1.23.5 scipy==1.10.1 pandas==1.5.3 scikit-learn==1.2.1 statsmodels==0.13.5 python-igraph==0.10.3 pynndescent==0.5.8
读取数据矩阵¶
adata = sc.read_10x_mtx(
"./Outs/WTH1092_filtered_feature_bc_matrix/",
cache=True
)
... writing an h5ad cache file to speedup reading next time
adata.var_names_make_unique()
adata
AnnData object with n_obs × n_vars = 22870 × 32285 var: 'gene_ids', 'feature_types'
降维聚类¶
## Preprocessing
adata.var["mt"] = adata.var_names.str.startswith("mt-")
sc.pp.calculate_qc_metrics(
adata, qc_vars=["mt"], percent_top=None, log1p=False, inplace=True
)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
adata.raw = adata
adata = adata[:, adata.var.highly_variable]
sc.pp.regress_out(adata, ["total_counts", "pct_counts_mt"])
sc.pp.scale(adata, max_value=10)
normalizing counts per cell finished (0:00:00) extracting highly variable genes finished (0:00:01) --> added 'highly_variable', boolean vector (adata.var) 'means', float vector (adata.var) 'dispersions', float vector (adata.var) 'dispersions_norm', float vector (adata.var) regressing out ['total_counts', 'pct_counts_mt'] sparse input is densified and may lead to high memory use finished (0:02:49)
## Principal component analysis
sc.tl.pca(adata, svd_solver="arpack")
computing PCA on highly variable genes with n_comps=50 finished (0:00:07)
## Computing the neighborhood graph
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)
computing neighbors using 'X_pca' with n_pcs = 40 finished: added to `.uns['neighbors']` `.obsp['distances']`, distances for each pair of neighbors `.obsp['connectivities']`, weighted adjacency matrix (0:00:18)
## Embedding the neighborhood graph
#sc.tl.paga(adata)
#sc.pl.paga(adata, plot=False) # remove `plot=False` if you want to see the coarse-grained graph
#sc.tl.umap(adata, init_pos='paga')
sc.tl.umap(adata)
sc.tl.tsne(adata)
computing UMAP finished: added 'X_umap', UMAP coordinates (adata.obsm) (0:00:15) computing tSNE using 'X_pca' with n_pcs = 50 using sklearn.manifold.TSNE finished: added 'X_tsne', tSNE coordinates (adata.obsm) (0:01:28)
## Clustering the neighborhood graph
sc.tl.leiden(
adata,
resolution=0.9,
random_state=0,
n_iterations=2,
directed=False,
)
running Leiden clustering finished: found 21 clusters and added 'leiden', the cluster labels (adata.obs, categorical) (0:00:00)
sc.pl.umap(adata, color=['Cst3'])
sc.pl.umap(adata, color=["Cst3"], use_raw=False)
sc.pl.embedding(adata,"umap",color=["leiden"])
添加空间坐标¶
spatial_df = pd.read_csv("./Outs/WTH1092_filtered_feature_bc_matrix/cell_locations.tsv.gz",index_col=0,sep='\t')
spatial_df.columns = ['spatial_1', 'spatial_2']
spatial_df = pd.merge(adata.obs, spatial_df, left_index=True, right_index=True)[["spatial_1","spatial_2"]]
adata.obsm["spatial"] = spatial_df.values
adata
AnnData object with n_obs × n_vars = 22870 × 5044 obs: 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden' var: 'gene_ids', 'feature_types', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std' uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'tsne', 'leiden', 'leiden_colors' obsm: 'X_pca', 'X_umap', 'X_tsne', 'spatial' varm: 'PCs' obsp: 'distances', 'connectivities'
sc.settings.set_figure_params(figsize=(10, 7))
Save the result.
sc.settings.set_figure_params(figsize=(15, 7))
sc.pl.embedding(adata,"spatial",color=["leiden"])
添加细胞注释结果¶
接下来我们再把细胞的注释结果添加到每个细胞上,这样我们就可以看不同的细胞类型在空间上的分布情况了。
SeekSpace细胞注释的过程,跟普通的单细胞注释的过程完全一致。
我们之前已经对demo数据进行了大群和亚群的注释,注释的结果文件放在了anntation.csv文件中。
接下来我们简单的读入处理一下。
cell_anno = pd.read_csv("./annotation.csv",index_col=0,sep=',')
adata.obs = pd.merge(adata.obs,cell_anno,left_index=True,right_index=True)
adata
AnnData object with n_obs × n_vars = 22870 × 5044 obs: 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'sc_barcode', 'Main_CellType', 'Sub_CellType' var: 'gene_ids', 'feature_types', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std' uns: 'log1p', 'hvg', 'pca', 'neighbors', 'umap', 'tsne', 'leiden', 'leiden_colors' obsm: 'X_pca', 'X_umap', 'X_tsne', 'spatial' varm: 'PCs' obsp: 'distances', 'connectivities'
sc.settings.set_figure_params(figsize=(7, 7))
sc.pl.embedding(adata,"umap",color=["Main_CellType"])
sc.settings.set_figure_params(figsize=(7, 7))
sc.pl.embedding(adata,"umap",color=[ "Sub_CellType"])
sc.settings.set_figure_params(figsize=(15, 7))
sc.pl.embedding(adata,"spatial",color=["Main_CellType"])
sc.settings.set_figure_params(figsize=(15, 7))
sc.pl.embedding(adata,"spatial",color=["Sub_CellType"])
adata.write("WTH1092_demo_mouse_brain.h5ad")