Column 4: Parsing cellphoneDB (V3-based)

New version: website, GitHub

Older version: website, GitHub

This column covers both versions together as far as possible: the new version adds features, but the framework remains the same.

Official tutorial

20231115 Update

Note that the V3 version is no longer officially maintained and the V5 version has already appeared.

Data preparation (input):

  1. Expression matrix (counts file)

The counts file can be provided in any of the following forms:

  • A text file: a matrix in txt format.

With Seurat, extract the counts with GetAssayData():

    count = GetAssayData(seuratobj, slot = "counts")

The matrix is easy to extract; then just export it to txt format.

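2023.11.15 addendum: the faster fwrite function from data.table can be used instead.

    library(data.table)
    # Use fwrite instead; convert the sparse matrix to a data frame first so row names are kept
    fwrite(as.data.frame(as.matrix(count)), file = "", sep = "\t", quote = FALSE, row.names = TRUE, col.names = TRUE)
    # Base R equivalent (slower)
    write.table(as.matrix(count), file = "", quote = F, sep = "\t", row.names = T, col.names = T)

  • h5ad (recommended)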

If you are using h5ad, just change the suffix at runtime

cellphonedb method analysis test_meta.txt test_counts.h5ad
  • h5

  • a path to a folder containing mtx/barcode/features. You can use the output of cellranger directly

  • NOTE: Your gene/protein IDs must be HUMAN. If you are working with another species such as mouse, we recommend converting the gene IDs to their corresponding human orthologs.
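The conversion could look roughly like the sketch below. It assumes you already have a mouse-to-human ortholog table from a resource of your choice; the three-gene mapping, the example counts, and the function name `convert_to_human` are illustrative and not part of CellPhoneDB. Genes without an ortholog are dropped, and mouse genes that map to the same human gene are summed per cell.

```python
# Hypothetical ortholog table: mouse symbol -> human symbol.
# In practice, load this from your own ortholog resource.
MOUSE_TO_HUMAN = {"Cd4": "CD4", "Ptprc": "PTPRC", "Epcam": "EPCAM"}


def convert_to_human(counts, mapping):
    """Rename the gene rows of `counts` ({gene: [count per cell]}) to human symbols."""
    converted = {}
    for gene, values in counts.items():
        human = mapping.get(gene)
        if human is None:
            continue  # no known ortholog: drop the gene
        if human in converted:
            # Several mouse genes map to the same human gene: sum per cell.
            converted[human] = [a + b for a, b in zip(converted[human], values)]
        else:
            converted[human] = list(values)
    return converted


# Made-up counts for three cells; "Gm1234" has no entry in the mapping.
mouse_counts = {"Cd4": [0, 3, 1], "Ptprc": [5, 2, 0], "Gm1234": [1, 1, 1]}
human_counts = convert_to_human(mouse_counts, MOUSE_TO_HUMAN)
print(human_counts)
```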

  2. Meta file

A txt file containing two columns: one for the barcodes and one for the cluster/cell type information.
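A minimal sketch of writing such a two-column file with Python's standard library. The barcodes, cell type labels, file name, and the header names `Cell` and `cell_type` are assumptions made for illustration; use the labels from your own clustering.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical barcode -> cell type assignments (e.g. exported from Seurat or Scanpy).
cell_types = {
    "AAACCTGAGAAACCAT-1": "T_cell",
    "AAACCTGAGAAACCGC-1": "B_cell",
    "AAACCTGAGAAACCTA-1": "T_cell",
}

# Written to a temporary folder here; point this at your own path in practice.
meta_path = Path(tempfile.mkdtemp()) / "example_meta.txt"
with meta_path.open("w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Cell", "cell_type"])  # assumed header names
    for barcode, label in cell_types.items():
        writer.writerow([barcode, label])  # column 1: barcode, column 2: cell type

print(meta_path.read_text())
```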

  3. DEGs file (if method degs_analysis)

Required if the degs_analysis method is selected. The DEGs can come from Seurat's marker search results.

This is a two-column file indicating which gene is specific or upregulated in a cell type (see the official example). The first column should be the cell type/cluster name (matching those in the meta file) and the second column the associated gene ID. The remaining columns are ignored. We provide notebooks for both Seurat and Scanpy users. It is on you to design a DEG analysis appropriate for your research question.

In short: the first column is the cell type (matching the meta file), the second column is the gene, and any remaining columns are ignored.

The official repository provides notebooks for Seurat and Scanpy users, including how to compute the DEGs.
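As a rough sketch, the two columns can be cut out of an existing DEG result like this. The DEG records, the header names `cluster` and `gene`, and the file name are made up for illustration; the check that cluster names match the meta file follows the requirement described above.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical DEG results: (cluster, gene, adjusted p-value).
deg_results = [
    ("T_cell", "CD3E", 1e-30),
    ("T_cell", "IL7R", 1e-12),
    ("B_cell", "CD79A", 1e-25),
]
meta_cell_types = {"T_cell", "B_cell"}  # labels used in the meta file

# Cluster names must match the meta file, so check before writing.
unknown = {cluster for cluster, _, _ in deg_results} - meta_cell_types
if unknown:
    raise ValueError(f"clusters missing from the meta file: {sorted(unknown)}")

degs_path = Path(tempfile.mkdtemp()) / "example_degs.txt"
with degs_path.open("w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["cluster", "gene"])  # assumed header names
    for cluster, gene, _ in deg_results:
        writer.writerow([cluster, gene])  # column 1: cell type, column 2: gene

print(degs_path.read_text())
```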

4. Microenvironments file (if microenvs_file_path is set)

Microenvironment information can optionally be provided to further restrict the ligand-receptor analysis. As I understand it, a microenvironment is an additional, delimited (spatial) region of analysis, and the cell types listed in it should correspond to those in the meta file.
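To make the idea concrete, here is a small sketch under the assumption that only cell types sharing a microenvironment are paired with each other. The microenvironment names and cell types are invented, and the pairing rule is my reading of the description above rather than a statement of CellPhoneDB's internals.

```python
from itertools import combinations_with_replacement

# Hypothetical assignment: microenvironment -> cell types found there.
microenvs = {
    "tumor_core": ["T_cell", "Tumor"],
    "stroma": ["Fibroblast", "T_cell"],
}
meta_cell_types = {"T_cell", "Tumor", "Fibroblast"}  # labels from the meta file

# Every cell type in the microenvironment file should exist in the meta file.
for env, members in microenvs.items():
    missing = set(members) - meta_cell_types
    if missing:
        raise ValueError(f"{env}: not in the meta file: {sorted(missing)}")

# Assumption: only cell types that share a microenvironment are paired.
pairs = set()
for members in microenvs.values():
    for a, b in combinations_with_replacement(sorted(members), 2):
        pairs.add((a, b))

print(sorted(pairs))
```

In this toy setup Fibroblast and Tumor never share a microenvironment, so that pair is not considered.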

Run the code:

  1. Analysis methods

There are three methods: running with the statistical method, running without the statistical method, and the newer degs_analysis method.

  • method statistical_analysis

Usage:

    cellphonedb method statistical_analysis test_meta.txt test_counts.txt

  • method analysis

Usage:

    cellphonedb method analysis test_meta.txt test_counts.txt

  • method degs_analysis

  2. Run-time parameters

~ Optional Method parameters:

  • --counts-data: [ensembl | gene_name | hgnc_symbol] Type of gene identifiers in the counts data. This is the format of the gene names in the expression matrix; hgnc_symbol is the usual choice.

  • --project-name: Name of the project. A subfolder with this name is created in the output folder. Useful for telling samples apart when several samples are analyzed.

  • --iterations: Number of iterations for the statistical analysis [1000]. The default is usually fine.

  • --threshold: % of cells expressing the specific ligand/receptor. Genes expressed in a smaller share of cells than this are not analyzed; 0.1 (10% of cells) is a common choice.

  • --result-precision: Number of decimal digits in results [3].

  • --output-path: Directory where the results will be allocated (the directory must exist) [out]. CellPhoneDB does not create this folder itself (unlike cNMF, which creates one), so create it in advance.

  • --output-format: Output format of the results files (extension will be added to filename if not present) [txt]. Defaults to txt when unspecified.

  • --means-result-name: Means result filename [means].

  • --significant-means-result-name: Significant mean result filename [significant_means]. The default is fine.

  • --deconvoluted-result-name: Deconvoluted result filename [deconvoluted]. The default is fine.

  • --verbose/--quiet: Print or hide CellPhoneDB logs [verbose]. Controls whether log messages are printed during the run.

  • --subsampling: Enable subsampling. Can be used when the number of cells is too large.

  • --subsampling-log: Enable subsampling log1p for non log-transformed data inputs (mandatory).

  • --subsampling-num-pc: Subsampling NumPC argument (number of PCs to use) [100].

  • --subsampling-num-cells: Number of cells to subsample to [1/3 of cells].
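The effect of --threshold can be pictured with the following sketch. It assumes that "expressing" means a non-zero count and that a gene is kept when the fraction reaches the threshold; both are assumptions for illustration, and the counts are made up.

```python
def fraction_expressing(counts):
    """Share of cells in one cluster with a non-zero count for one gene."""
    return sum(1 for c in counts if c > 0) / len(counts)


THRESHOLD = 0.1  # the value often used for --threshold

# Hypothetical counts of one ligand gene in the cells of two clusters.
cluster_counts = {
    "T_cell": [0, 0, 2, 0, 1, 0, 0, 0, 0, 3],  # 3 of 10 cells express it
    "B_cell": [0] * 19 + [1],                  # 1 of 20 cells expresses it
}

decisions = {}
for cluster, counts in cluster_counts.items():
    frac = fraction_expressing(counts)
    decisions[cluster] = frac >= THRESHOLD  # assumed rule: keep if at or above threshold
    print(f"{cluster}: {frac:.2f} -> {'kept' if decisions[cluster] else 'ignored'}")
```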

~ Optional Method Statistical parameters

  • --pvalues-result-name: P-values result filename [pvalues]. The default is fine.

  • --pvalue: P-value threshold [0.05]. The default is fine.

  • --debug-seed: Debug random seed -1. To disable it please use a value >=0 [-1]. Used for debugging.

  • --threads: Number of threads to use. >=1 [4]. Controls multi-threading.
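Because the output folder must exist before the run, it can help to create it and assemble the command in one place. The sketch below only builds and prints the command; it uses flags from the list above, while the temporary output location is a placeholder chosen for the example.

```python
import shlex
import tempfile
from pathlib import Path

# Hypothetical output location; CellPhoneDB will not create it for you.
out_dir = Path(tempfile.mkdtemp()) / "out"
out_dir.mkdir(parents=True, exist_ok=True)

# Options taken from the parameter list above.
options = {
    "--counts-data": "hgnc_symbol",
    "--threshold": "0.1",
    "--iterations": "1000",
    "--threads": "4",
    "--output-path": str(out_dir),
}

command = ["cellphonedb", "method", "statistical_analysis",
           "test_meta.txt", "test_counts.txt"]
for flag, value in options.items():
    command.append(f"{flag}={value}")

# Only print the command here; run it yourself, e.g. with subprocess.run(command).
print(shlex.join(command))
```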

3. Official examples

The documentation for the old version gave four examples. They are much alike, and their options can be combined in a single command:

    cellphonedb method statistical_analysis test_meta.txt test_counts.txt --iterations=10 --threads=2

  4. Running the methods in the new version

Specific examples are in the tutorial. The new version runs from Python, no longer from the terminal.

Example with running the DEG-based method

    from cellphonedb.src.core.methods import cpdb_degs_analysis_method

    deconvoluted, means, relevant_interactions, significant_means = cpdb_degs_analysis_method.call(
        cpdb_file_path = cpdb_file_path,        # path to your CellPhoneDB database file
        meta_file_path = 'test_meta.txt',
        counts_file_path = 'test_counts.h5ad',
        degs_file_path = degs_file_path,        # path to the two-column DEGs file
        counts_data = 'hgnc_symbol',
        threshold = 0.1,
        output_path = out_path)                 # must already exist
Example with running the statistical method

    from cellphonedb.src.core.methods import cpdb_statistical_analysis_method

    deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
        cpdb_file_path = cpdb_file_path,        # path to your CellPhoneDB database file
        meta_file_path = 'test_meta.txt',
        counts_file_path = 'test_counts.h5ad',
        counts_data = 'hgnc_symbol',
        output_path = out_path)                 # must already exist
Example without using the statistical method

Using a text counts file

    from cellphonedb.src.core.methods import cpdb_analysis_method

    means, deconvoluted = cpdb_analysis_method.call(
        cpdb_file_path = cpdb_file_path,        # path to your CellPhoneDB database file
        meta_file_path = 'test_meta.txt',
        counts_file_path = 'test_counts.txt',
        counts_data = 'hgnc_symbol',
        output_path = out_path)                 # must already exist

Using an h5ad counts file

    from cellphonedb.src.core.methods import cpdb_analysis_method

    means, deconvoluted = cpdb_analysis_method.call(
        cpdb_file_path = cpdb_file_path,        # path to your CellPhoneDB database file
        meta_file_path = 'test_meta.txt',
        counts_file_path = 'test_counts.h5ad',
        counts_data = 'hgnc_symbol',
        output_path = out_path)                 # must already exist
Example running a microenvironments file

    from cellphonedb.src.core.methods import cpdb_degs_analysis_method

    deconvoluted, means, relevant_interactions, significant_means = cpdb_degs_analysis_method.call(
        cpdb_file_path = cpdb_file_path,        # path to your CellPhoneDB database file
        meta_file_path = 'test_meta.txt',
        counts_file_path = 'test_counts.h5ad',
        degs_file_path = degs_file_path,        # required by the DEG-based method
        counts_data = 'hgnc_symbol',
        microenvs_file_path = microenvs_file_path,
        output_path = out_path)                 # must already exist

Result Description:

Old official website: interpretation of CellPhoneDB results (GitHub)

Interpretation of the results of the new version: more detailed, and it also covers newer methods that are not presented here for the time being.

The results folder mainly contains the following files, each with several columns:

  • P-value (pvalues.txt)

  • Mean (means.txt)

  • Significant mean (significant_means.txt)

  • Deconvoluted (deconvoluted.txt)

  • The anno annotation file is input data, not a result.

Columns common to all files

  • id_cp_interaction: Unique CellPhoneDB identifier for each interaction stored in the database.

  • interacting_pair: Name of the interacting pairs separated by “|”.

  • partner A or B: Identifier for the first interacting partner (A) or the second (B). It could be: UniProt (prefix simple:) or complex (prefix complex:). A complex covers the multi-ligand / multi-receptor case.

  • gene A or B: Gene identifier for the first interacting partner (A) or the second (B). The identifier will depend on the input user list.

  • secreted: True if one of the partners is secreted.

  • Receptor A or B: True if the first interacting partner (A) or the second (B) is annotated as a receptor in our database.

  • annotation_strategy: Curated if the interaction was annotated by the CellPhoneDB developers. Otherwise, the name of the database where the interaction has been downloaded from.

  • is_integrin: True if one of the partners is integrin.


1. pvalues.txt

  • p-value: p-values for all the interacting partners: the p-value refers to the enrichment of the interacting ligand-receptor pair in each of the interacting pairs of cell types. (Only in pvalues.txt)

The columns that follow hold one value per pair of clusters, which should be the p-value of that ligand-receptor pair between those two clusters.
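A small sketch of picking out the significant entries from such a table. The excerpt is invented: the interaction IDs are placeholders, and the `A|B` labels for the cluster-pair columns are an assumption about how those columns are named in your output.

```python
import csv
import io

# Hypothetical excerpt of a p-value table (tab-separated).
table = (
    "id_cp_interaction\tT_cell|B_cell\tB_cell|T_cell\n"
    "ID_1\t0.001\t1.0\n"
    "ID_2\t0.2\t0.03\n"
)

PVALUE = 0.05  # same default as --pvalue

significant = []
for row in csv.DictReader(io.StringIO(table), delimiter="\t"):
    interaction = row.pop("id_cp_interaction")
    # The remaining columns are the cluster pairs.
    for cluster_pair, value in row.items():
        if float(value) < PVALUE:
            significant.append((interaction, cluster_pair, float(value)))

for interaction, cluster_pair, p in significant:
    print(interaction, cluster_pair, p)
```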

2. means.txt

The layout is similar to the p-value file.

  • means: Mean values for all the interacting partners: the mean value refers to the total mean of the individual partner average expression values in the corresponding interacting pairs of cell types. If one of the mean values is 0, then the total mean is set to 0. (Only in means.txt)
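Read literally, the rule above can be sketched as follows. The function name and the example values are illustrative; `mean_a` and `mean_b` stand for the average expression of each partner in its own cell type.

```python
def interaction_mean(mean_a, mean_b):
    """Mean of the two partners' average expression; 0 if either partner is 0."""
    if mean_a == 0 or mean_b == 0:
        return 0.0
    return (mean_a + mean_b) / 2


# Hypothetical average expression of a ligand in cluster A and its receptor in cluster B.
print(interaction_mean(1.0, 0.5))  # both partners expressed
print(interaction_mean(1.0, 0.0))  # receptor absent, so the total mean is 0
```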

3. significant_means.txt

The main columns are the same as before, with an additional rank column.

  • rank: Total number of significant p-values for each interaction divided by the number of cell type-cell type comparisons. (Only in significant_means.txt)

  • significant_mean: Significant mean calculation for all the interacting partners. If the p-value is < 0.05, the value will be the mean. Otherwise, the value is set to 0. (Only in significant_means.txt)
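The two definitions can be sketched directly from their descriptions. The function names mirror the column names, and the means and p-values for the four cell-type pairs are made up for illustration.

```python
PVALUE = 0.05  # significance cutoff from the description above


def significant_mean(mean, pvalue, cutoff=PVALUE):
    """Keep the mean when the p-value is significant, otherwise return 0."""
    return mean if pvalue < cutoff else 0.0


def rank(pvalues, cutoff=PVALUE):
    """Number of significant p-values divided by the number of cell-type pairs."""
    return sum(1 for p in pvalues if p < cutoff) / len(pvalues)


# Hypothetical values for one interaction across four cell-type pairs.
means = [0.75, 0.2, 1.1, 0.0]
pvalues = [0.001, 0.4, 0.03, 1.0]

sig_means = [significant_mean(m, p) for m, p in zip(means, pvalues)]
print(sig_means)
print(rank(pvalues))
```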

4. deconvoluted.txt

This file is a little different from the others.

  • gene_name: Gene identifier for one of the subunits that are participating in the interaction defined in the means file. The identifier will depend on the input of the user list.

  • uniprot: UniProt identifier for one of the subunits that are participating in the interaction defined in the means file.

  • is_complex: True if the subunit is part of a complex. Single if it is not, complex if it is.

  • protein_name: Protein name for one of the subunits that are participating in the interaction defined in the means file.

  • complex_name: Complex name if the subunit is part of a complex. Empty if not.

  • id_cp_interaction: Unique CellPhoneDB identifier for each of the interactions stored in the database.

  • mean: Mean expression of the corresponding gene in each cluster.

Plotting:

CellPhoneDB comes with a plotting function, but it is often more convenient to export the data and plot it in R.

For R, see the ktplots package for plotting CellPhoneDB results; the official plotting tutorial is also worth a look.

Check the database version:

List the latest available database versions (run in a terminal)

cellphonedb database list_remote

List the local database versions

cellphonedb database list_local
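If you prefer to check from a script, the same command can be wrapped as in the sketch below. It assumes the `cellphonedb` command-line tool from the older versions is installed; the helper name `list_local_databases` is made up, and it simply returns None when the command is not on the PATH.

```python
import shutil
import subprocess


def list_local_databases(executable="cellphonedb"):
    """Return the output of `<executable> database list_local`, or None if the CLI is missing."""
    if shutil.which(executable) is None:
        return None
    result = subprocess.run(
        [executable, "database", "list_local"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


output = list_local_databases()
print(output if output is not None else "cellphonedb command not found on PATH")
```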