Welcome to the Sangshin Training Manual!
In any quantitative analysis of RNA-seq data, the reads are first aligned to the reference genome and then counted with a quantification tool, as in the classic hisat + StringTie strategy. Single-cell transcriptomes work on the same principle, with one difference: because UMI tags are introduced, quantification has to take into account that reads carrying the same UMI come from the same transcript, so it would be inappropriate to use traditional analysis software directly.
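To make the UMI point concrete, here is a minimal sketch of the idea, not Cell Ranger's actual algorithm: reads that share the same cell barcode, UMI, and gene are collapsed into one molecule before counting. The three-column toy input and the function name `count_umis` are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Toy illustration of UMI-aware counting (not Cell Ranger's real implementation).
# Input:  tab-separated lines of cell_barcode, UMI, gene (one line per aligned read).
# Output: cell_barcode, gene, number of distinct UMIs.
# Step 1 (sort -u) collapses duplicate (barcode, UMI, gene) reads;
# step 2 (awk) counts the remaining molecules per barcode/gene pair;
# step 3 (sort) only makes the output order predictable.
count_umis() {
    sort -u |
    awk -F '\t' '{ n[$1 "\t" $3]++ }
        END { for (k in n) print k "\t" n[k] }' |
    sort
}

# Hypothetical reads: the first two lines are duplicates of a single molecule.
printf 'AAAC\tTTGCA\tGeneA\nAAAC\tTTGCA\tGeneA\nAAAC\tGGATC\tGeneA\nCCCG\tTTGCA\tGeneB\n' |
    count_umis
```

A plain read count would report three reads for GeneA in cell AAAC, whereas the UMI-aware count reports two molecules, which is the distinction traditional bulk tools do not make.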
The official Cell Ranger software provides not only data demultiplexing, but also quantification and other analyses.
Quantification requires aligning the reads to the reference genome, and the first step of alignment is to index the reference genome. The official website provides human and mouse reference genomes for download at the following URL:
https://support./single-cell-gene-expression/software/downloads/latest
For other species, only the genome FASTA file and the transcript GTF file are needed to build a custom reference with the following steps.
1. Filtering the GTF file
The original GTF file contains many types of genes. The mkgtf subcommand can be used to keep only the genes of interest, with the following usage:
cellranger mkgtf \
\
\
--attribute=gene_biotype:protein_coding
mkgtf takes the input GTF and the filtered output GTF as its two positional arguments. Records are filtered with the --attribute option; in the example above, only records for protein-coding genes are kept.
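What such an attribute filter does can be sketched with plain awk on a toy GTF. This is only an illustration of the idea, assuming the attribute appears in column 9 in the form `gene_biotype "protein_coding";`. It is not how mkgtf is implemented, and `filter_gtf_by_attribute` is a hypothetical helper name.

```shell
#!/bin/sh
# Keep GTF records whose 9th column contains:  key "value";
# Header/comment lines starting with '#' are passed through unchanged.
filter_gtf_by_attribute() {
    key=$1
    value=$2
    awk -F '\t' -v pat="$key \"$value\";" \
        '/^#/ { print; next } index($9, pat) > 0 { print }'
}

# Toy two-record GTF with hypothetical genes G1 and G2.
printf 'chr1\ttoy\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\nchr1\ttoy\tgene\t200\t300\t.\t-\t.\tgene_id "G2"; gene_biotype "lincRNA";\n' |
    filter_gtf_by_attribute gene_biotype protein_coding
```

Only the G1 record is printed, because G2 carries a different gene_biotype value.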
2. Building the index
The mkref subcommand builds the index, with the following usage:
cellranger mkref \
--genome=output_genome \
--nthreads=10 \
--fasta= \
--genes=
The genome parameter specifies the output directory. After the index is built, the directory structure is as follows:
.
├── fasta
│   ├──
│   └──
├── genes
│   └──
├── pickle
│   └──
├──
└── star
As the listing shows, Cell Ranger builds a STAR index for the genome and calls STAR to align the reads to the reference genome.
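Before moving on to quantification, it can be handy to confirm that a reference directory has the expected layout. The sketch below assumes the four sub-directories from the listing above (fasta, genes, pickle, star); other Cell Ranger versions may lay out the reference differently, and `check_reference_dir` is a hypothetical helper, not part of Cell Ranger.

```shell
#!/bin/sh
# Report whether a directory contains the sub-directories seen in the listing.
# Returns 0 if all are present, 1 otherwise; missing ones are named on stderr.
check_reference_dir() {
    ref=$1
    status=0
    for sub in fasta genes pickle star; do
        if [ ! -d "$ref/$sub" ]; then
            echo "missing: $ref/$sub" >&2
            status=1
        fi
    done
    return $status
}

# Demo on a throwaway directory that mimics the layout (no real index involved).
demo=$(mktemp -d)
mkdir "$demo/fasta" "$demo/genes" "$demo/pickle" "$demo/star"
check_reference_dir "$demo" && echo "reference layout looks complete"
rm -rf "$demo"
```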
Quantification is performed with the count subcommand, with the following usage:
cellranger count \
--id=sample345 \
--transcriptome=database_path \
--fastqs=fastq_path \
--sample=mysample
The id parameter specifies the name of the output directory, the transcriptome parameter specifies the directory containing the genome index, and the fastqs parameter specifies the directory containing the sequence files generated by the mkfastq command. The sample parameter specifies the sample to analyze, which corresponds to a subdirectory under fastq_path.
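When several samples need the same call, the four parameters can be wrapped in a small function. The following is a minimal sketch: `run_count` and the `DRY_RUN` variable are hypothetical conveniences, and only the options from the example above are passed. With `DRY_RUN=1` (the default here) the command is printed rather than executed, so it can be inspected first.

```shell
#!/bin/sh
# Print (DRY_RUN=1, the default) or execute a cellranger count call.
# Arguments: output id, reference directory, FASTQ directory, sample name.
run_count() {
    id=$1; transcriptome=$2; fastqs=$3; sample=$4
    # Rebuild the positional parameters as the full command line.
    set -- cellranger count \
        --id="$id" \
        --transcriptome="$transcriptome" \
        --fastqs="$fastqs" \
        --sample="$sample"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"      # show the command without running it
    else
        "$@"           # requires cellranger to be on PATH
    fi
}

run_count sample345 database_path fastq_path mysample
```

Setting `DRY_RUN=0` before the call would run the command for real, which assumes that cellranger is installed and that the paths exist.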
The count subcommand not only performs quantification but also provides clustering and other analyses. Its results are written to a number of output files, which we will explain in detail in the following sections.