
The UMI matrix in single-cell Seurat: features and counts (for QC)

Contents

About the UMI matrix
Calculating feature and count values from the UMI matrix
  ① Viewing the metadata
  ② Count and feature calculation (done automatically when the Seurat object is created)
    1) Extracting the UMI matrix
    2) Calculations
Other indicators
Assessment of quality indicators (focus)
  1) UMI count
  2) Gene counting
  3) UMIs vs. genes detected
  4) Mitochondrial count ratio
  5) Integrated filtration
Filtering to extract subsets

About the UMI matrix

10X data do not need to be corrected for gene length, because UMIs count molecules rather than reads. Differences in sequencing depth from cell to cell still have to be accounted for, which is what the LogNormalize function does: for each cell, the UMI count of each gene is divided by the total UMI count of that cell, multiplied by 10,000, and then log-transformed (natural log of 1 + x). After normalizing by column (cell) in this way, the data can be scaled by row (gene) so that genes with very large or very small values do not dominate the data.

See also: "About the processing of single-cell TPM, Count data: what is the umicount and readcount of an rds file" (CSDN Blogs)
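The normalization and scaling steps described above can be sketched in base R on a toy matrix. This is only an illustration of the arithmetic, not Seurat's own implementation; the matrix values and the names counts, lib_size, norm and scaled are made up for the example.

```r
# Toy UMI count matrix: rows are genes, columns are cells (values are made up).
counts <- matrix(
  c(10, 0, 5,
     0, 3, 5,
    90, 7, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2", "cell3"))
)

scale_factor <- 10000  # the factor of 10,000 mentioned in the text

# Column-wise step: divide each cell by its total UMI count,
# multiply by the scale factor, then take log(1 + x).
lib_size <- colSums(counts)
norm <- log1p(sweep(counts, 2, lib_size, "/") * scale_factor)

# Row-wise step: center and scale each gene across cells.
scaled <- t(scale(t(norm)))

round(norm, 3)
round(rowMeans(scaled), 10)
```

Zero counts stay at zero after normalization, and every gene ends up with mean 0 across cells after scaling.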

Calculating feature and count values from the UMI matrix

For the overall scRNA-seq quality control process, see: scRNA-seq - Quality Control

① Viewing the metadata

See also: "③ Seurat workflow for single-cell learning: pbmc_seurat, removing outlier cells" (CSDN Blogs)

Seurat automatically creates some metadata for each cell, stored in pbmc@meta.data:

    rm(list=ls())
    library(dplyr)
    library(Seurat)
    library(patchwork)
    ## Read the data
    pbmc.data <- Read10X(data.dir = "F:/##24 years of single-cell processing##/pbmc3k_filtered_gene_bc_matrices")
    ## Create the Seurat object
    pbmc <- CreateSeuratObject(counts = pbmc.data,
                               project = "pbmc3k",
                               min.cells = 3,      # keep features (genes) expressed in at least 3 cells
                               min.features = 200) # keep cells with at least 200 detected features
    pbmc
    ## Per-cell metadata created automatically by Seurat
    data <- pbmc@meta.data
> head(data)
                 orig.ident nCount_RNA nFeature_RNA
AAACATACAACCAC-1     pbmc3k       2419          779
AAACATTGAGCTAC-1     pbmc3k       4903         1352
AAACATTGATCAGC-1     pbmc3k       3147         1129
AAACCGTGCTTCCG-1     pbmc3k       2639          960
AAACCGTGTATGCG-1     pbmc3k        980          521
AAACGCACTGGTAC-1     pbmc3k       2163          781

  • orig.ident : Usually contains the sample identity; by default it is the project name we assigned

  • nCount_RNA : Number of UMIs per cell

  • nFeature_RNA : Number of genes detected per cell (genes with a non-zero count)


② Count and feature calculation (done automatically when the Seurat object is created)
1) Extracting the UMI matrix
    # Extract the UMI matrix (raw counts) from the RNA assay.
    # Note: recent Seurat versions (v5) use layer = "counts" in place of slot = "counts".
    exp <- GetAssayData(pbmc, slot = "counts", assay = "RNA")
    # Convert to a data frame; this changes "-" in the cell barcodes to "."
    umi_df <- data.frame(exp)
    umi_df[1:5,1:3]
> umi_df[1:5,1:3]
              AAACATACAACCAC.1 AAACATTGAGCTAC.1 AAACATTGATCAGC.1
AL627309.1                   0                0                0
AP006222.2                   0                0                0
RP11-206L10.2                0                0                0
RP11-206L10.9                0                0                0
LINC00115                    0                0                0
2) Calculations
    dat <- cbind(colnames(umi_df),
                 colSums(umi_df),       # sum of each column: total UMIs per cell
                 colSums(umi_df != 0))  # non-zero entries per column: genes (features) detected per cell
    dat1 <- apply(dat[,c(2,3)],2,as.numeric) # convert the two count columns to numeric
    rownames(dat1) <- rownames(dat)          # restore the cell barcodes as row names
    colnames(dat1) <- c("UMI1","UMI2")       # UMI1 = total UMIs, UMI2 = genes detected
    dat1 <- as.data.frame(dat1)
> head(dat1)
                 UMI1 UMI2
AAACATACAACCAC.1 2419  779
AAACATTGAGCTAC.1 4903 1352
AAACATTGATCAGC.1 3147 1129
AAACCGTGCTTCCG.1 2639  960
AAACCGTGTATGCG.1  980  521
AAACGCACTGGTAC.1 2163  781

UMI1 and UMI2 match the nCount_RNA and nFeature_RNA values that Seurat calculated automatically.
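The same two quantities can be computed in one step without the character conversion that cbind() introduces. The sketch below uses a small made-up matrix so it runs without Seurat; the names umi and qc and all the values are illustrative only.

```r
# Toy UMI matrix (made-up values): rows are genes, columns are cells.
umi <- matrix(
  c(2, 0, 1,
    0, 0, 4,
    3, 5, 0,
    0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("cellA", "cellB", "cellC"))
)

# Build the per-cell summary directly as a numeric data frame.
qc <- data.frame(
  nCount   = colSums(umi),       # total UMIs per cell
  nFeature = colSums(umi != 0)   # genes with at least one UMI per cell
)
qc
```

Building the data frame directly keeps both columns numeric, so the apply(..., as.numeric) step is not needed.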

Other indicators
  • number of genes detected per UMI: This metric gives us a good idea of the complexity of the dataset (the more genes detected per UMI, the more complex the data)

  • mitochondrial ratio: This metric gives us the percentage of a cell's reads that originate from mitochondrial genes.

The number of genes per UMI for each cell is easy to calculate. We log10-transform both values before dividing, which makes the result easier to compare between samples.

    # Add number of genes per UMI for each cell to metadata
    pbmc$log10GenesPerUMI <- log10(pbmc$nFeature_RNA) / log10(pbmc$nCount_RNA)
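As a quick worked check of this formula, the sketch below applies it by hand to the nCount_RNA and nFeature_RNA values of the first three cells listed in the metadata earlier. The variable names n_count, n_feature and novelty are illustrative only.

```r
# nCount_RNA and nFeature_RNA of the first three cells in the metadata above.
n_count   <- c(2419, 4903, 3147)
n_feature <- c(779, 1352, 1129)

# Genes detected per UMI on the log10 scale (the "novelty" or complexity score).
novelty <- log10(n_feature) / log10(n_count)
round(novelty, 7)

# Flag cells whose score falls below a chosen cut-off (0.80 is used here).
novelty < 0.80
```

Because both counts are log-transformed before dividing, the score stays in a narrow range close to 1 for typical cells.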

PercentageFeatureSet() takes a pattern and searches the gene identifiers for matches. For each cell, it then calculates the percentage of all counts that belong to the matched subset of features.

    # Compute the percentage of counts from mitochondrial genes
    pbmc$mitoRatio <- PercentageFeatureSet(object = pbmc, pattern = "^MT-")
    # Convert the percentage (0-100) to a ratio (0-1)
    pbmc$mitoRatio <- pbmc@meta.data$mitoRatio / 100

Note: the pattern provided ("^MT-") applies to human gene names. Other organisms may need a different pattern; mouse mitochondrial gene symbols, for example, typically start with "mt-".
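To make the calculation concrete, here is a minimal base R sketch of the same idea on a toy matrix. It is not the Seurat implementation; the counts are made up and the names is_mito, percent_mito and mito_ratio are chosen only for this example.

```r
# Toy UMI matrix (made-up values) containing two mitochondrial genes.
counts <- matrix(
  c(5, 1,
    3, 0,
    2, 4,
    0, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "ACTB", "CD3E"), c("cell1", "cell2"))
)

# Match gene names against the human mitochondrial pattern.
is_mito <- grepl("^MT-", rownames(counts))

# Percentage of each cell's counts that come from the matched genes,
# then the same value expressed as a ratio between 0 and 1.
percent_mito <- 100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)
mito_ratio   <- percent_mito / 100
mito_ratio
```

The drop = FALSE argument keeps the subset as a matrix even when only one gene matches, so colSums() still works.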

Viewing the updated metadata

    data1 <- pbmc@meta.data
    head(data1)
> head(data1)
                 orig.ident nCount_RNA nFeature_RNA log10GenesPerUMI
AAACATACAACCAC-1     pbmc3k       2419          779        0.8545652
AAACATTGAGCTAC-1     pbmc3k       4903         1352        0.8483970
AAACATTGATCAGC-1     pbmc3k       3147         1129        0.8727227
AAACCGTGCTTCCG-1     pbmc3k       2639          960        0.8716423
AAACCGTGTATGCG-1     pbmc3k        980          521        0.9082689
AAACGCACTGGTAC-1     pbmc3k       2163          781        0.8673469

Assessment of quality indicators (focus)
  • Cell counts

  • UMI counts per cell

  • Genes detected per cell

  • UMIs vs. genes detected

  • Mitochondrial ratio

  • Novelty (complexity)


1) UMI count

nCount_RNA : Number of UMIs per cell. No QC filtering on this metric has been done at this point.

Visualization

    # Visualize the number of UMIs/transcripts per cell
    library(ggplot2)
    data1 %>%
      ggplot(aes(color='', x= nCount_RNA, fill= '')) +
      geom_density(alpha = 0.2) +
      scale_x_log10() +
      theme_classic() +
      ylab("Cell density") +
      geom_vline(xintercept = 500)  # reference line at 500 UMIs

2) Gene counting

nFeature_RNA : Number of genes detected per cell

    # Visualize the distribution of genes detected per cell
    data1 %>%
      ggplot(aes(color='', x= nFeature_RNA, fill= '')) +
      geom_density(alpha = 0.2) +
      scale_x_log10() +
      theme_classic() +
      ylab("Cell density") +
      geom_vline(xintercept = 500)  # reference line at 500 genes

3) UMIs vs. genes detected

Two metrics that are typically evaluated together are the number of UMIs and the number of genes detected per cell. Here we plot the number of genes against the number of UMIs, colored by the fraction of mitochondrial reads. The mitochondrial read fraction is only high (darker points on the gray-to-black scale used below) in cells with exceptionally low counts in which few genes were detected. These could be damaged or dead cells whose cytoplasmic mRNA has leaked out through ruptured membranes, so that only the mRNA located in the mitochondria is preserved. Such cells are filtered out by our count and gene-number thresholds. Visualizing the count and gene thresholds together shows their combined filtering effect.

Poor quality cells are likely to have low numbers of genes and UMIs per cell and correspond to the data points in the lower left quadrant of the plot. Good cells will typically have more genes per cell and a higher number of UMIs.

With this plot we also assess the slope of the line and any scatter of data points in the lower right quadrant. Those cells have a large number of UMIs but only a few genes. They may be dying cells, but they may also represent a population of a low-complexity cell type (e.g., erythrocytes).

    # Visualize the correlation between genes detected and number of UMIs,
    # and check for a strong presence of cells with low numbers of genes/UMIs
    p <- data1 %>%
      ggplot(aes(x=nCount_RNA, y=nFeature_RNA, color=mitoRatio)) +
      geom_point() +
      scale_colour_gradient(low = "gray90", high = "black") +
      stat_smooth(method=lm) +
      scale_x_log10() +
      scale_y_log10() +
      theme_classic() +
      geom_vline(xintercept = 500) +
      geom_hline(yintercept = 250) +
      facet_wrap(~orig.ident)  # the facet variable is missing in the source; orig.ident is assumed
    p

4) Mitochondrial count ratio

This metric can identify whether there is a large amount of mitochondrial contamination from dead or dying cells. We define poor quality cells, in terms of mitochondrial counts, as those with a mitochondrial ratio above 0.2, unless a high ratio is expected in your sample.

    # Visualize the distribution of mitochondrial gene expression detected per cell
    # (the grouping variable is missing in the source; orig.ident is assumed)
    p1 <- data1 %>%
      ggplot(aes(color=orig.ident, x=mitoRatio, fill=orig.ident)) +
      geom_density(alpha = 0.2) +
      scale_x_log10() +
      theme_classic() +
      geom_vline(xintercept = 0.2)
    p1

5) Integrated filtration

Samples in which each cell was sequenced less deeply tend to have a higher overall complexity, because sequencing has not yet started to saturate for any given gene in those samples. Outlier cells in such samples may be cells with a less complex set of RNA species than other cells. Sometimes contamination by low-complexity cell types, such as red blood cells, can be detected with this metric. In general, we expect a novelty score of 0.80 or higher.

    # Visualize the overall complexity of the gene expression by plotting the genes detected per UMI
    # (the grouping variable is missing in the source; orig.ident is assumed)
    p3 <- data1 %>%
      ggplot(aes(x=log10GenesPerUMI, color = orig.ident, fill=orig.ident)) +
      geom_density(alpha = 0.2) +
      theme_classic() +
      geom_vline(xintercept = 0.8)
    p3

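Filtering to extract subsets
    # Filtering
    # The variable name before "< 5" is missing in the source. The standard Seurat
    # tutorial uses percent.mt < 5; because mitoRatio is stored here as a fraction,
    # the equivalent condition mitoRatio < 0.05 is assumed.
    pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & mitoRatio < 0.05)
    pbmc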
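The logic of this filter can be tried out on a plain data frame before applying it to the Seurat object. The sketch below is only an illustration: the mitochondrial values are made up, and the 5% mitochondrial cut-off, written as a fraction (0.05), is an assumption, as are the names meta and keep.

```r
# Hypothetical per-cell metadata; the mitoRatio values are made up.
meta <- data.frame(
  nFeature_RNA = c(779, 1352, 150, 2800, 960),
  mitoRatio    = c(0.03, 0.04, 0.02, 0.01, 0.12),
  row.names    = paste0("cell", 1:5)
)

# A cell is kept only if it passes all three conditions.
keep <- meta$nFeature_RNA > 200 &
  meta$nFeature_RNA < 2500 &
  meta$mitoRatio < 0.05

table(keep)    # how many cells pass and fail
meta[keep, ]   # the cells that would be retained
```

Checking table(keep) first shows how many cells each set of thresholds would remove before the object is actually subset.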


[1] Code for calculating these metrics yourself: /hbctraining/scRNA-seq/blob/master/lessons/
[2] Scrublet: /AllonKleinLab/scrublet