Sequencing depth is the ratio of the total number of bases obtained by sequencing (bp) to the size of the genome (or of the transcriptome or targeted region). It is one of the metrics used to evaluate the amount of sequencing.
The sequencing depth is calculated as:
Sequencing depth = (L × N)/ G
L: length of reads; N: number of reads; G: size of sequencing target region
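The formula can be written as a small Python sketch. The function names are illustrative only, and the 150 bp read length in the usage line is an assumption, not a value from this article.

```python
import math


def sequencing_depth(read_length_bp, num_reads, target_size_bp):
    """Average sequencing depth: (L x N) / G."""
    if target_size_bp <= 0:
        raise ValueError("target size must be positive")
    return read_length_bp * num_reads / target_size_bp


def reads_needed(depth, read_length_bp, target_size_bp):
    """Same formula solved for N: N = depth x G / L, rounded up."""
    if read_length_bp <= 0:
        raise ValueError("read length must be positive")
    return math.ceil(depth * target_size_bp / read_length_bp)


if __name__ == "__main__":
    # Assumed 150 bp reads on a 3 Gb genome at 30X.
    print(reads_needed(30, 150, 3_000_000_000))
```

Solving for N is often the more practical direction, since the target depth is usually fixed first and the read count follows from it.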
Coverage is the proportion of the whole genome (or target region) that is covered by the sequences obtained from sequencing. Because the genome contains high-GC regions, repetitive sequences and other complex structures, the sequences in the final assembly often cannot cover every region; the uncovered regions are called gaps. Note that "sequencing coverage" is also sometimes used to mean the depth of coverage, that is, the sequencing depth, which reflects how many reads cover a region on average.
For example:
Suppose a 2000 bp target sequence (G: size of the sequencing target region) is sequenced single-end, yielding 1000 reads (N: number of reads) of 200 bp each (L: read length). After all reads are aligned to the target region, 1800 bp of the 2000 bp target are covered by at least one read and the remaining 200 bp are not covered by any read. Then the sequencing depth is (200 × 1000) / 2000 = 100X, and the coverage is 1800 / 2000 = 90%.
It can be seen that sequencing depth is positively correlated with coverage.
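The difference between depth and coverage in this example can be sketched in Python by counting, for every base of the target, how many reads overlap it. The read start positions below are made up for illustration (spread evenly so that the last 200 bp stay uncovered); real positions would come from an alignment.

```python
def per_base_depth(read_starts, read_length_bp, target_size_bp):
    """Count how many reads overlap each base of the target region."""
    depth = [0] * target_size_bp
    for start in read_starts:
        end = min(start + read_length_bp, target_size_bp)
        for pos in range(max(start, 0), end):
            depth[pos] += 1
    return depth


G, N, L = 2000, 1000, 200  # target size, number of reads, read length

# Hypothetical alignment: starts spread over 0..1600, so no read
# reaches the last 200 bp of the target.
starts = [(i * 1600) // (N - 1) for i in range(N)]

depth = per_base_depth(starts, L, G)
mean_depth = sum(depth) / G                    # sequencing depth
breadth = sum(1 for d in depth if d > 0) / G   # coverage

print(f"depth = {mean_depth:.0f}X, coverage = {breadth:.0%}")
```

The mean depth is computed over the whole target, including uncovered bases, which is why it matches (L × N) / G even though 200 bp have no reads at all.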
The amount of sequencing required in a given experiment depends closely on the size of the target region, the mutation type, the disease model, and other factors.
For whole genome sequencing (WGS), the human genome is approximately 3 Gb, and healthy individuals generally need to be sequenced to 30X, i.e., 90 Gb of valid data. Reliable detection of single nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs) requires sequencing to at least 35X, i.e., 105 Gb of valid sequencing data [1]. De novo genome sequencing, which requires genome assembly, has an optimal sequencing depth of 50X [2].
There are approximately 180,000 exons in the human genome, representing about 1% of the genome, or roughly 30 Mb. For whole exome sequencing (WES), because of the increased heterogeneity of the target region and the 50% capture efficiency of the probes, a greater average read depth is required to obtain the same coverage as WGS: covering 89.6-96.8% of the target bases requires sequencing to 80X [1]. WES of healthy individuals is typically sequenced to 100X, yielding 6 Gb of data, of which 3 Gb is valid.
For highly heterogeneous cell populations or tissue samples, such as tumors, WGS requires 70-100X and WES requires 160-200X.
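The data volumes quoted above for WGS and WES follow from multiplying target size by depth and, for capture-based methods, dividing by the fraction of data that lands on target. A minimal Python sketch, with a hypothetical function name and the round genome and exome sizes used in this article:

```python
def required_data_bp(target_size_bp, depth, on_target_fraction=1.0):
    """Raw bases to sequence so that the on-target data reaches `depth`."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return target_size_bp * depth / on_target_fraction


HUMAN_GENOME_BP = 3_000_000_000  # ~3 Gb, as stated in the text
HUMAN_EXOME_BP = 30_000_000      # ~30 Mb, as stated in the text

wgs_30x = required_data_bp(HUMAN_GENOME_BP, 30)
wgs_35x = required_data_bp(HUMAN_GENOME_BP, 35)
# WES at 100X with 50% capture efficiency: half of the raw data is on target.
wes_100x_raw = required_data_bp(HUMAN_EXOME_BP, 100, on_target_fraction=0.5)

for label, bases in [("WGS 30X", wgs_30x), ("WGS 35X", wgs_35x),
                     ("WES 100X (raw)", wes_100x_raw)]:
    print(f"{label}: {bases / 1e9:.0f} Gb")
```

The same function can be applied to the tumor depths (for example 70-100X WGS), assuming the same target sizes.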
RNA sequencing (RNA-seq) has broader applications, and a large number of RNA-seq experiments have been performed in many cell and tissue types under different conditions. However, there are few definitive guidelines on RNA-seq sequencing depth, because the requirement depends on the biological question being studied and on the size and complexity of the transcriptome being measured.
Unicellular eukaryotes and bacteria have simpler transcriptomes and lower transcriptional potential. For example, 4 M reads can detect 80% of yeast genes.
Mammals possess tens of thousands of genes, many of which have multiple isoforms and different levels of expression. In RNA-seq analysis, gene or transcript abundance is usually expressed as RPKM. ENCODE evaluated this using H1 human embryonic stem cells: for genes with RPKM > 10, 36 M reads per sample are sufficient to accurately quantify the expression of 80% of the genes. For genes with low expression levels (RPKM < 10), however, 80 M reads per sample are required for accurate quantification. Therefore, if all genes in the whole transcriptome (including lncRNA genes) must be quantified accurately, samples need to be sequenced to more than 80 M reads. If the aim is only to study overall expression changes of highly expressed transcripts, 36 M reads per sample are sufficient.
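For reference, RPKM (reads per kilobase of transcript per million mapped reads) can be computed as in the sketch below. The function name and the example gene are illustrative, not taken from the ENCODE analysis.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bp per kilobase) x 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical 2 kb gene with 720 mapped reads in a 36 M read library.
    print(rpkm(720, 2000, 36_000_000))
```

Because RPKM normalizes by library size, a gene near the RPKM = 10 threshold is supported by proportionally fewer reads in a shallower library, which is one way to see why weakly expressed genes need deeper sequencing for stable quantification.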
If the goal of RNA-seq is to discover new and rare transcripts, such as noncoding RNAs or novel alternatively spliced mRNAs, it is estimated that more than 400 M reads would be required, given the low expression of these transcripts and the bias introduced by the library preparation protocol.