
stPlus: Accurately enhancing spatial transcriptomics analysis with information from scRNA-seq data

Recent advances in spatial transcriptomics technologies have made it possible to profile gene expression while simultaneously mapping the spatial organization of cells. Among these techniques, imaging-based methods offer higher spatial resolution. However, they are limited by the small number of genes that can be imaged or by the low sensitivity of gene detection. Although several methods have been proposed to enhance spatial transcriptomics data, insufficient accuracy in predicting gene expression and limited ability to identify cell populations still hinder their application.

A research team from Tsinghua University has proposed stPlus, a reference-based approach that enhances spatial transcriptomics analysis using information from scRNA-seq data.

What is stPlus?

stPlus is designed to enhance spatial transcriptomics analysis by accurately predicting the expression of undetected genes and efficiently imputing the expression of detected genes. The inputs to stPlus are the target spatial data and reference scRNA-seq data from a tissue that matches or resembles the tissue of the spatial data. Each dataset can be represented as a gene-cell matrix. Note that the cells in the two datasets do not correspond to one another, and that the genes in the reference data usually include most of the genes in the spatial data. The user can specify any gene in the reference data to be predicted. The output of stPlus is a gene-cell matrix containing the predicted expression of each specified gene in each cell of the spatial data.
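
The input/output contract described above can be sketched with pandas. This is a hypothetical helper, not part of the stPlus package: the name `prepare_inputs` is ours, and we assume each matrix is stored with cells as rows and genes as columns (the usual pandas orientation).

```python
import numpy as np
import pandas as pd


def prepare_inputs(spatial_df: pd.DataFrame, scrna_df: pd.DataFrame, genes_to_predict):
    """Hypothetical helper: validate inputs and allocate the output matrix.

    Assumes both DataFrames are cells x genes. The cells of the two
    datasets do not correspond to each other; only genes are shared.
    """
    ref_genes = set(scrna_df.columns)
    # Genes measured in both datasets are the only link between them.
    shared = [g for g in spatial_df.columns if g in ref_genes]
    # Every gene to be predicted must exist in the reference scRNA-seq data.
    missing = [g for g in genes_to_predict if g not in ref_genes]
    if missing:
        raise ValueError(f"genes absent from the reference data: {missing}")
    # Output: one row per spatial cell, one column per requested gene,
    # to be filled later with predicted expression.
    output = pd.DataFrame(np.nan, index=spatial_df.index, columns=list(genes_to_predict))
    return shared, output


spatial = pd.DataFrame([[1, 0], [3, 2], [0, 5]],
                       index=["s1", "s2", "s3"], columns=["GeneA", "GeneB"])
scrna = pd.DataFrame([[2, 1, 7, 0], [0, 4, 1, 3]],
                     index=["r1", "r2"], columns=["GeneA", "GeneB", "GeneC", "GeneD"])
shared, output = prepare_inputs(spatial, scrna, ["GeneC", "GeneD"])
print(shared)        # ['GeneA', 'GeneB']
print(output.shape)  # (3, 2)
```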

The stPlus enhancement process can be divided into three main steps: i) data processing in preparation for co-embedding; ii) co-embedding of the cells from the spatial transcriptomics data and the reference scRNA-seq data; and iii) prediction of spatially undetected gene expression based on the cell embeddings and the reference scRNA-seq data.
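
The three steps can be illustrated with a deliberately simplified NumPy sketch. It is not the stPlus algorithm: we swap in per-cell log-normalization for step i, plain PCA on the shared genes for the co-embedding of step ii (stPlus learns its embedding with its own model), and an inverse-distance-weighted k-nearest-neighbour average over reference cells for step iii. All function names and parameters (`n_components`, `k`) are hypothetical.

```python
import numpy as np


def log_normalize(counts):
    """Step i (simplified): scale each cell to 10,000 total counts, then log1p."""
    totals = counts.sum(axis=1, keepdims=True).astype(float)
    totals[totals == 0] = 1.0  # avoid dividing by zero for empty cells
    return np.log1p(counts / totals * 1e4)


def co_embed(spatial_shared, ref_shared, n_components=10):
    """Step ii (swapped-in PCA): embed both datasets in one space using shared genes only."""
    z = np.vstack([spatial_shared, ref_shared])
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-8)  # standardize each gene
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    emb = u[:, :n_components] * s[:n_components]  # principal-component scores
    n_spatial = spatial_shared.shape[0]
    return emb[:n_spatial], emb[n_spatial:]


def knn_predict(spatial_emb, ref_emb, ref_targets, k=5):
    """Step iii: weighted average of target genes over the k nearest reference cells."""
    # Dense spatial x reference distance matrix; fine for small examples only.
    dist = np.linalg.norm(spatial_emb[:, None, :] - ref_emb[None, :, :], axis=2)
    idx = np.argsort(dist, axis=1)[:, :k]
    weights = 1.0 / (np.take_along_axis(dist, idx, axis=1) + 1e-8)
    weights /= weights.sum(axis=1, keepdims=True)
    # ref_targets[idx] has shape (n_spatial, k, n_target_genes).
    return np.einsum("ck,ckg->cg", weights, ref_targets[idx])


# Toy data: 20 reference cells, 5 shared genes and 3 reference-only genes.
rng = np.random.default_rng(0)
ref_shared = rng.integers(1, 100, size=(20, 5)).astype(float)
ref_targets = rng.integers(0, 50, size=(20, 3)).astype(float)
spatial_shared = ref_shared[[3, 7, 11]]  # spatial cells mirroring reference cells 3, 7, 11

sp_emb, ref_emb = co_embed(log_normalize(spatial_shared), log_normalize(ref_shared), n_components=5)
pred = knn_predict(sp_emb, ref_emb, ref_targets, k=1)
print(np.allclose(pred, ref_targets[[3, 7, 11]]))  # True
```

With `k=1` each spatial cell simply copies its nearest reference cell; larger `k` smooths predictions across similar reference cells.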

Performance test results for stPlus

The research team compared the performance of stPlus with that of four baseline methods (SpaGE, Seurat, Liger and gimVI) and found that:

  • stPlus outperforms baseline methods in accurately predicting undetected gene expression;

  • stPlus-enhanced spatial transcriptomics data improve cell population identification;

  • Predicting the spatial expression of genes unique to the scRNA-seq data also offers the potential to characterize cellular heterogeneity;

  • In addition, stPlus is stable and scalable across datasets with different gene detection sensitivities, sample sizes and numbers of spatially measured genes.
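
The article does not describe how the different gene detection sensitivity levels were produced. One common way to emulate a less sensitive assay, shown here purely as an assumption, is binomial thinning: every detected transcript is kept independently with probability `rate`. The helper name `thin_counts` and its parameters are hypothetical.

```python
import numpy as np


def thin_counts(counts, rate, seed=0):
    """Hypothetical helper: keep each counted molecule with probability `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    rng = np.random.default_rng(seed)
    # One Binomial(n, p) draw per entry: n is the observed count, p the retention rate.
    return rng.binomial(np.asarray(counts, dtype=np.int64), rate)


counts = np.array([[10, 0, 3], [7, 2, 25]])
low_sensitivity = thin_counts(counts, rate=0.3)
# Each entry now has expected value 0.3 x the original count; zeros stay zero.
```

Varying the sample size or the number of spatially measured genes can be emulated in the same spirit by randomly subsampling rows (cells) or columns (genes) before running the method.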

# stPlus can accurately predict spatial transcriptome data

Comparison of Spearman correlation coefficients between stPlus and the other methods at the gene and cell levels: 1) Gene level. On every dataset except STARmap_AllenVISp, stPlus consistently outperforms the other methods, improving the median Spearman correlation coefficient by at least 8.8%; on the STARmap_AllenVISp dataset, stPlus significantly outperforms gimVI and performs comparably to SpaGE. 2) Cell level. On all five datasets, stPlus achieves significantly higher coefficients than the other four methods (one-sided paired Wilcoxon test, p < 0.01), with an average improvement of 23.2% in the median Spearman correlation coefficient over the second-best method. Together, these results demonstrate the superior performance of stPlus in predicting spatially undetected gene expression.
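
Gene-level and cell-level Spearman correlations, and a one-sided paired Wilcoxon test comparing two methods, can be computed with SciPy. This is a sketch under our own assumptions: matrices are cells x genes with matched rows and columns, and the "improvement in the median" is taken as a relative change; the article does not spell out either convention. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr, wilcoxon


def gene_level_spearman(measured, predicted):
    """One coefficient per gene: correlate each gene's values across cells."""
    return np.array([spearmanr(measured[:, j], predicted[:, j])[0]
                     for j in range(measured.shape[1])])


def cell_level_spearman(measured, predicted):
    """One coefficient per cell: correlate each cell's values across genes."""
    return np.array([spearmanr(measured[i], predicted[i])[0]
                     for i in range(measured.shape[0])])


def compare_methods(rho_a, rho_b):
    """Relative gain of A's median coefficient over B's, plus the one-sided
    paired Wilcoxon p-value for H1: A's coefficients tend to exceed B's."""
    gain = (np.median(rho_a) - np.median(rho_b)) / abs(np.median(rho_b))
    p_value = wilcoxon(rho_a, rho_b, alternative="greater").pvalue
    return gain, p_value


rng = np.random.default_rng(1)
measured = rng.gamma(2.0, 1.0, size=(50, 20))                  # 50 cells x 20 held-out genes
method_a = measured + rng.normal(0, 0.1, size=measured.shape)  # close to the truth
method_b = rng.gamma(2.0, 1.0, size=measured.shape)            # unrelated to the truth
gain, p = compare_methods(cell_level_spearman(measured, method_a),
                          cell_level_spearman(measured, method_b))
print(p < 0.01)  # True
```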

# stPlus helps identify cell populations

Using computationally predicted gene expression can significantly improve clustering performance. For example, on the first three datasets, which are based on osmFISH data, SpaGE and gimVI achieved better clustering performance with lower variance. Seurat gave the lowest clustering performance on these three datasets but outperformed the other baseline methods on the MERFISH dataset, which again suggests that Seurat's performance is strongly affected by the number of genes shared between the two datasets. Across all four datasets, stPlus achieved the best overall clustering performance, especially on the first three. The cross-validation experiments can be viewed as a data-enhancement strategy. As expected, the baseline performance improved significantly when data from all genes were used. On the first three pairs of datasets, only the gene expression predicted by stPlus yielded clustering performance exceeding this baseline. On the MERFISH_Moffit dataset, stPlus again outperformed both the baseline and the other computational methods. These results show not only that stPlus can predict spatially undetected gene expression, but also that stPlus-enhanced data identify cell clusters better than data enhanced by existing methods, or even the original spatial transcriptomics data.
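
The evaluation described above can be sketched with scikit-learn. Two parts are our assumptions: the cross-validation splits the spatial genes into folds and predicts each held-out fold from the remaining genes (through a hypothetical user-supplied `predict_fold` callback), and clustering quality is scored with k-means plus the adjusted Rand index (ARI) against known cell labels. The article does not name the clustering algorithm or the metric.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import KFold


def cross_validated_matrix(spatial, predict_fold, n_splits=5, seed=0):
    """Predict every spatial gene once, each time holding out one fold of genes.

    predict_fold(train_genes, test_genes) is a hypothetical callback returning
    predictions (cells x len(test_genes)) computed from the training genes only.
    """
    out = np.zeros(spatial.shape, dtype=float)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Split over gene indices (one placeholder row per gene).
    for train_genes, test_genes in folds.split(np.zeros((spatial.shape[1], 1))):
        out[:, test_genes] = predict_fold(train_genes, test_genes)
    return out


def clustering_ari(expr, labels, n_clusters, seed=0):
    """Cluster cells with k-means and compare the result with known labels via ARI."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return adjusted_rand_score(labels, km.fit_predict(expr))


# Toy check: three well-separated cell groups over 10 genes. An "oracle" predictor
# that returns the held-out measurements reproduces the original matrix.
rng = np.random.default_rng(2)
labels = np.repeat([0, 1, 2], 30)
spatial = labels[:, None] * 50.0 + rng.normal(0, 1, size=(90, 10))
oracle = cross_validated_matrix(spatial, lambda tr, te: spatial[:, te])
print(np.allclose(oracle, spatial))                 # True
print(round(clustering_ari(oracle, labels, 3), 2))  # 1.0
```

In a real comparison, `predict_fold` would wrap stPlus or a baseline method, and the ARI of each method's reconstructed matrix would be compared with the ARI obtained on the measured data.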

# stPlus scales to large datasets

The research team demonstrated the enhancement performance of stPlus on the MERFISH_Moffit dataset, which consists of 64,373 spatial transcriptomics cells and 31,299 scRNA-seq cells. stPlus offers satisfactory computational efficiency and scalability on large datasets, although SpaGE achieves the best computational efficiency; Seurat, Liger and gimVI are comparatively inefficient. When the results are visualized, the spatial gene expression enhanced by stPlus recapitulates expression patterns better and distinguishes differences among cell types more clearly than the other methods.
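
To see why efficiency matters at this scale: a dense distance matrix between all 64,373 spatial cells and all 31,299 scRNA-seq cells would hold 64,373 × 31,299 ≈ 2.01 × 10^9 entries, roughly 16 GB in float64. The article does not describe how stPlus handles this. A generic way to keep a nearest-neighbour prediction step memory-bounded, sketched here as an assumption rather than as stPlus's implementation, is to index the reference embedding once with scikit-learn's `NearestNeighbors` and query the spatial cells in batches. The function name and the `batch_size` default are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def batched_knn_average(query_emb, ref_emb, ref_values, k=10, batch_size=4096):
    """Hypothetical helper: average ref_values over each query cell's k nearest
    reference cells, querying in batches so the full query x reference distance
    matrix is never materialized at once."""
    index = NearestNeighbors(n_neighbors=k).fit(ref_emb)
    out = np.empty((query_emb.shape[0], ref_values.shape[1]))
    for start in range(0, query_emb.shape[0], batch_size):
        batch = query_emb[start:start + batch_size]
        _, idx = index.kneighbors(batch)  # idx has shape (len(batch), k)
        out[start:start + len(batch)] = ref_values[idx].mean(axis=1)
    return out


rng = np.random.default_rng(3)
ref_emb = rng.normal(size=(100, 4))
ref_values = rng.normal(size=(100, 3))
# Queries identical to reference cells: with k=1 each copies its own values.
pred = batched_knn_average(ref_emb[:25], ref_emb, ref_values, k=1, batch_size=7)
print(np.allclose(pred, ref_values[:25]))  # True
```

The batch size trades memory for Python-loop overhead; the result does not depend on it.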

# stPlus is stable to hyperparameter selection

# stPlus-predicted spatial expression of scRNA-seq-unique genes will further deepen understanding of cellular heterogeneity

The research team provided a user-friendly interface, detailed documentation, and quick-start tutorials to facilitate the adoption of stPlus.

stPlus, with detailed documentation, is available at the following link: /software/stPlus/

The source code is available at the following link:

/xy-chen16/stPlus  

Bibliography

Chen Shengquan, Zhang Boheng, Chen Xiaoyang, Zhang Xuegong, Jiang Rui. stPlus: a reference-based method for the accurate enhancement of spatial transcriptomics. Bioinformatics, Volume 37, Issue Supplement_1, July 2021, Pages i299–i307.

Images are from the official Bioinformatics website and the reference above.