How do I compare tumor vs normal expression?

Xena will only allow you to compare samples for a small set of genes. If you want to do a genome-wide differential expression analysis, you will need to do this outside of Xena.
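As a rough illustration of what a genome-wide comparison outside of Xena might look like, the sketch below ranks genes by Welch's t statistic between two groups of samples. It assumes the expression values are already log2-transformed and normalized, so that a difference of means can be read as a log2 fold change. The function names, gene names, and values are made up for illustration. Dedicated differential expression tools model count data more carefully and correct for multiple testing, so treat this only as a starting point.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent groups with unequal variances.

    Requires at least two values per group and non-zero variance overall.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(tumor, normal):
    """Return (gene, log2 fold change, t) tuples sorted by |t|, largest first.

    Assumes values are already log2-transformed, so a difference of
    means is a log2 fold change.
    """
    rows = []
    for gene in tumor.keys() & normal.keys():
        diff = mean(tumor[gene]) - mean(normal[gene])
        rows.append((gene, diff, welch_t(tumor[gene], normal[gene])))
    return sorted(rows, key=lambda r: abs(r[2]), reverse=True)


# Made-up log2 expression values, for illustration only.
tumor = {"GENE_A": [10.0, 11.0, 12.0], "GENE_B": [5.0, 5.5, 4.5]}
normal = {"GENE_A": [7.0, 8.0, 9.0], "GENE_B": [5.0, 4.5, 5.5]}
ranked = rank_genes(tumor, normal)
```

Here `ranked` lists GENE_A first, since its group means differ while those of GENE_B do not.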

Which 'normal' to use?

While TCGA is a good resource for tumor data, finding normal tissue expression data for comparison can be challenging. There are two main sources of normal expression data in Xena. The first is normal samples from TCGA itself. These samples are called "solid tissue normals" and are taken from normal tissue near the tumor. Solid tissue normal samples from TCGA are typically limited in number, but some cancer types may have enough for a robust statistical comparison. Because of their proximity to the tumor, their transcriptome profiles may carry signals from the tumor microenvironment. The second is GTEx, which has expression data from normal tissue of individuals who do not have cancer. There are typically many more samples in GTEx than TCGA solid tissue normals. However, the experimental sample processing is different from TCGA, which may lead to batch effects.

Using the TCGA TARGET GTEx study

You can use the TCGA TARGET GTEx study for both types of 'normal' samples. Data in the study comes from the UCSC RNA-seq Compendium, where TCGA and GTEx samples are re-analyzed (re-aligned to the hg38 genome, with expression called using the RSEM and Kallisto methods) by the same RNA-seq pipeline. Because all samples are processed using a uniform bioinformatic pipeline, batch effects due to different computational processing are eliminated.

To compare tumor vs normal, you will need to filter down to just the samples you want to compare and then compare gene expression between your groups of samples.

There are four gene expression datasets in this study. Two are normalized using within-sample methods: the 'RSEM norm_count' dataset is normalized by the upper quartile method, and the 'RSEM expected_count (DESeq2 standardized)' dataset is normalized by DESeq2. Because comparing expression between groups of samples requires normalized values, these are the two gene expression datasets that should be used.
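To make the upper quartile method concrete, here is a minimal sketch that rescales one sample's counts so that the 75th percentile of its non-zero counts equals a fixed value. The function name, the `target` parameter and its value of 1000, and the toy counts are assumptions for illustration. This is not the pipeline that produced the 'RSEM norm_count' dataset.

```python
from statistics import quantiles


def upper_quartile_normalize(counts, target=1000.0):
    """Scale one sample so the 75th percentile of its non-zero counts equals `target`.

    `counts` maps gene name to count; at least two non-zero counts are needed.
    """
    nonzero = [c for c in counts.values() if c > 0]
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    q3 = quantiles(nonzero, n=4, method="inclusive")[2]
    return {gene: c * target / q3 for gene, c in counts.items()}


# Made-up counts for a single sample, for illustration only.
sample = {"G1": 0, "G2": 10, "G3": 20, "G4": 30, "G5": 40, "G6": 50}
normalized = upper_quartile_normalize(sample)
```

In this toy sample the upper quartile of the non-zero counts is 40, so G5 is scaled to 1000 and every other gene is scaled by the same factor.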

Walk-through example

In this example, we will compare MYC gene expression between normal colon tissue from GTEx and TCGA colon adenocarcinoma.

1. Filter

Start with the TCGA TARGET GTEx study. First, we filter the cohort to keep only the colon samples. The search term used to filter is: colon.

cheat link:

2. Add gene

Now use 'Click to Add Column' to add the gene of interest (e.g. MYC), select 'gene expression', and click 'Done'.

cheat link:

3. Chart

  1. Click the graph icon in the upper right corner to enter Chart View.

  2. Click 'Compare subgroups', since we want to compare tumor vs normal samples.

  3. Click the dropdown for 'Show data from' and choose 'column F: MYC - gene expression RNAseq - RSEM norm_count'.

  4. Click the dropdown for 'Subgroup samples by' and choose 'column C: Sample Type'.

  5. Click 'Done'.

cheat link:
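The same filter-then-subgroup logic can be approximated offline on an exported table. The sketch below keeps rows that match a search term and then summarizes MYC expression per sample type. The row layout, sample IDs, sample type labels, and expression values are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (sample ID, study, primary site, sample type, MYC log2 expression).
samples = [
    ("S1", "GTEX", "Colon", "Normal Tissue", 9.1),
    ("S2", "GTEX", "Colon", "Normal Tissue", 9.5),
    ("S3", "TCGA", "Colon", "Primary Tumor", 12.0),
    ("S4", "TCGA", "Colon", "Primary Tumor", 12.4),
    ("S5", "TCGA", "Colon", "Solid Tissue Normal", 10.0),
    ("S6", "TCGA", "Lung", "Primary Tumor", 11.0),
]


def subgroup_medians(rows, term):
    """Keep rows with a text field containing `term`, then take the median per sample type."""
    term = term.lower()
    groups = defaultdict(list)
    for sample_id, study, site, sample_type, value in rows:
        fields = (sample_id, study, site, sample_type)
        if any(term in field.lower() for field in fields):
            groups[sample_type].append(value)
    return {name: median(values) for name, values in groups.items()}


medians = subgroup_medians(samples, "colon")
```

The lung sample is dropped by the filter, and `medians` holds one value per remaining sample type, which is roughly the center line of each box in the chart.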