
Uploading RNA-seq data to the NCBI GEO database

SRA - NCBI

example - NCBI

When it's time to publish a paper, the editor will almost certainly ask you to upload the NGS sequencing data during review.

Sequencing data is generally kept on a cluster rather than on a personal computer, because some of it is frighteningly large (several terabytes).

So we just create a folder and symlink all the needed fastq files into it (copying is too slow and takes up too much space).
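The linking step might look something like the sketch below. The function name `link_fastq`, the file extensions it matches, and the directory names in the usage note are all assumptions for illustration, not part of the original workflow.

```shell
# link_fastq SRC_DIR DEST_DIR
# Symlink every fastq file found directly in SRC_DIR into DEST_DIR.
# The body runs in a subshell, so its variables do not leak into the caller.
link_fastq() (
    # Resolve SRC_DIR to an absolute path so the links stay valid
    # no matter where DEST_DIR lives.
    src=$(cd "$1" && pwd) || exit 1
    dest=$2
    mkdir -p "$dest" || exit 1
    for f in "$src"/*.fastq "$src"/*.fastq.gz "$src"/*.fq "$src"/*.fq.gz; do
        [ -e "$f" ] || continue   # skip patterns that matched nothing
        ln -sf "$f" "$dest/"      # link only; the data itself is not copied
    done
)
```

For example, `link_fastq /data/project/raw ~/sra_upload` (both paths hypothetical) would leave one symlink per fastq file in `~/sra_upload`, which can then be handed to the upload tool.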

Next, apply for an NCBI account. Once you have one, you can upload directly using Aspera.

The command is as follows:

~/.aspera/connect/bin/ascp -i ~/download/ -QT -l10000m -k1 -d WGS_BACE2_paper* subasp@:uploads/[email protected]_nYYKcqx0/RNAseq
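To make the options easier to read, here is a small sketch that only prints an equivalent command instead of running it. The function name `build_ascp_cmd` and its three arguments are placeholders of my own; the real key path and upload destination come from your NCBI submission page. The flag descriptions reflect my understanding of the `ascp` options and are worth checking against `ascp -h` on your system.

```shell
# build_ascp_cmd KEY_FILE SOURCE DEST
# Print (do not execute) an ascp upload command with the same flags as above.
# KEY_FILE, SOURCE and DEST are placeholders supplied by the caller.
build_ascp_cmd() (
    key_file=$1   # private key downloaded from NCBI
    src=$2        # local file or directory to upload
    dest=$3       # remote destination shown on the NCBI upload page
    # -i <key>  : private key file used for authentication
    # -Q        : use the "fair" transfer policy
    # -T        : disable encryption for higher throughput
    # -l10000m  : cap the transfer rate at 10000 Mbit/s
    # -k1       : resume an interrupted transfer if file attributes match
    # -d        : create the destination directory if it does not exist
    printf '%s -i %s -QT -l10000m -k1 -d %s %s\n' \
        "$HOME/.aspera/connect/bin/ascp" "$key_file" "$src" "$dest"
)
```

Printing the command first makes it easy to check the key path and destination before starting a transfer that may run for hours.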

Reference:

Raw Data Extreme Upload NCBI SRA Tutorials - Fairly comprehensive; you can basically just follow it.

Downloading genomic data from EBI or NCBI using Aspera - Supplementary notes on Aspera usage.

Tutorial: How to upload your data to the evil Sequence Read Archive (SRA)? - In English, and more standardized.

 

What you need to download is a program called Aspera Connect. There are many programs in the Aspera series, so don't get the wrong one.

This oddball piece of software has to be downloaded for Linux, and the Linux download link only appears when you open the page in a browser on Linux, so you need a machine running something like Ubuntu. Download it there and then transfer it to the cluster.

Note that the key file (the -i option) has to be downloaded from NCBI; the link is on the final upload page.

 

Uploading data to the SRA at NCBI goes relatively smoothly overall, except that you need to gather some information and fill out some forms yourself.

But I ran into one big problem that took me all night to solve, so be patient and read the NCBI error message carefully.

Your table upload failed because multiple BioSamples cannot have identical attributes

It means that when you fill out the sample form, no two samples may have exactly the same information in every column once certain required columns are set aside!

So my final solution was either to copy the sample name (which is definitely different for each sample) into another column, or to fill a column with a running number to prevent duplicates.
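Both the check and the workaround can be sketched with awk before you upload the table. This assumes a tab-separated sample table with a header row and the sample name in the first column; the function names and the column name `replicate_id` are hypothetical, and NCBI's exact rule for which columns it ignores may differ from this simplification.

```shell
# find_duplicate_samples FILE
# Print the sample name (column 1) of every row whose remaining columns
# exactly match an earlier row. Assumes a tab-separated file with a header.
find_duplicate_samples() {
    awk -F'\t' '
        NR == 1 { next }                    # skip the header row
        {
            key = ""
            for (i = 2; i <= NF; i++) key = key FS $i
            if (key in seen) { print $1 } else { seen[key] = $1 }
        }
    ' "$1"
}

# add_unique_column FILE
# Print the table with an extra column (hypothetical name "replicate_id")
# holding a running number, so that no two rows are identical.
add_unique_column() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 { print $0, "replicate_id"; next }
        { print $0, NR - 1 }
    ' "$1"
}
```

Running `find_duplicate_samples samples.tsv` (file name hypothetical) lists the samples that would likely trigger the error, and `add_unique_column samples.tsv > samples_fixed.tsv` applies the running-number fix.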

If you're not good at reading error reports, you really can't tell exactly where this step went wrong. Most of my supervisor's samples differ only in sample name, with all other information identical. Judging by the help requests online, plenty of people hit this problem, and Baidu has basically no correct answer.