Step-by-step instructions for our most common use cases
This page assumes you have a column on screen that has the groups you would like to compare (such as 'sample type' for comparing 'tumor' vs 'normal') or have already made subgroups (such as 'has mutations in EGFR' vs 'does not have mutations in EGFR'). If you need help making subgroups, please see the 'How do I make subgroups' help page.
First, make sure that the gene or genes that you want to compare across your groups are on screen.
Click on the charts icon in the top right and choose 'Compare subgroups'.
Click the dropdown for 'Show data from' and choose your gene expression column.
Click the dropdown for 'Subgroup samples by' and choose your subgroup column.
Choose if you would like a box plot or violin plot and click 'Done'.
Below we look at patient samples that have aberrations in EGFR in the TCGA Lung Adenocarcinoma study. We will investigate whether samples that have aberrations in EGFR (mutations or copy number amplifications) have higher EGFR expression.
Click the graph icon in the upper right corner to enter Chart View.
Click 'Compare subgroups', since we want to compare the group of samples that have aberrations in EGFR to the group of samples that do not.
Click the dropdown for 'Show data from' and choose 'column C: EGFR - gene expression RNAseq - HTSeq - FPKM-UQ'.
Click the dropdown for 'Subgroup samples by' and choose 'column B: (mis OR infra) OR C:>0.5 - Subgroup'.
Click 'Done'.
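For readers who download the two columns and want to reproduce the numbers behind a box plot outside Xena, here is a minimal sketch. It is not Xena's own implementation: the expression values and the `summarize` helper are made up for illustration, and only the 'true'/'false' subgroup labels come from the example above.

```python
from statistics import median, quantiles

# Hypothetical exported data: (subgroup label, EGFR expression) per sample.
samples = [
    ("true", 19.2), ("true", 18.7), ("true", 20.1), ("true", 19.5),
    ("false", 17.9), ("false", 18.3), ("false", 17.4), ("false", 18.0),
]

def summarize(rows):
    """Group expression values by subgroup and return box-plot statistics."""
    groups = {}
    for label, value in rows:
        groups.setdefault(label, []).append(value)
    summary = {}
    for label, values in groups.items():
        q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
        summary[label] = {
            "n": len(values),
            "median": median(values),
            "q1": q1,
            "q3": q3,
        }
    return summary

summary = summarize(samples)
for label, stats in summary.items():
    print(label, stats)
```

A higher median in the 'true' group than in the 'false' group is what the box plot would show visually; the p-value reported in Chart View is a separate calculation that this sketch does not attempt.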
For more information see our Basic Tutorial: Section 3.
For users who wish to use the datasets in a Pan-Can cohort but need to view just one cancer type.
1. Add the phenotype column that details the cancer type
The phenotype column will vary depending on which study you choose. See below for specific column names.
2. Search for the cancer type you are interested in, making sure that it is listed in the phenotype column. Click the Filter + subgroup menu next to the search bar and select 'Keep Samples'.
For the TCGA PanCan (PANCAN) study, you will want to add the phenotype column:
cancer type abbreviation
Here is a bookmark that will take you to the TCGA PanCan (PANCAN) Study with that phenotype column already selected.
For the TCGA TARGET GTEx study, you will want to add the phenotype columns:
main category
study
primary_site
Here is a bookmark that will take you to the TCGA TARGET GTEx Study with those phenotype columns already selected.
There are two main sources of normal expression data in Xena. The first is matched normal tissue samples from TCGA patients. These samples are called "solid tissue normals" and are taken from tissue near the tumor. Normal samples from TCGA patients are typically limited in number, but some cancer types may have enough for a robust statistical comparison. It is important to note that, because of their proximity to the tumor, they may carry tumor microenvironment signal. The second source of normal expression data is GTEx. GTEx has expression data from normal tissue of individuals who do not have cancer. There are typically many more samples in GTEx than in TCGA solid tissue normals. However, the experimental sample processing differs from that of TCGA, which may lead to batch effects.
You can use the TCGA TARGET GTEx study for both types of 'normal' samples. Data from the study is from the UCSC RNA-seq Compendium, where TCGA, TARGET, and GTEx samples are re-analyzed by the same RNA-seq pipeline. This pipeline involved re-aligning the reads to the hg38 genome and calling gene expression using the RSEM and Kallisto methods. Because all samples are processed using a uniform bioinformatic pipeline, batch effects due to different computational processing are eliminated. Note that the samples from this study have only undergone per-sample normalization.
To compare tumor vs normal, you will need to filter down to just the samples you want to compare and then compare gene expression between your groups of samples.
More information:
There are four gene expression datasets in this study. Two are normalized using within-sample methods: the 'RSEM norm_count' dataset is normalized by the upper quartile method, and the 'RSEM expected_count (DESeq2 standardized)' dataset is normalized by DESeq2. Therefore, these two gene expression datasets should be used.
If you are looking to compare just a few genes, you can use our chart view to run your analysis. If you are looking to run a genome-wide differential gene expression analysis, you can use our DEA feature. Note that we only allow users to run our Differential Gene Expression Analysis on fewer than 2,000 samples total. Thus, you will need to filter down to a subset of samples to run this analysis on this dataset.
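The filtering step can be pictured with a small sketch. The column names 'study' and 'primary_site' are the phenotype columns named for the TCGA TARGET GTEx study; the sample IDs, the column values ('TCGA', 'GTEX', 'Lung', and so on), and the `keep_samples` helper are assumptions made for illustration, not Xena's implementation.

```python
# Hypothetical phenotype rows; values are invented for this sketch.
phenotype = [
    {"sample": "S1", "study": "TCGA", "primary_site": "Lung"},
    {"sample": "S2", "study": "GTEX", "primary_site": "Lung"},
    {"sample": "S3", "study": "TARGET", "primary_site": "Kidney"},
    {"sample": "S4", "study": "GTEX", "primary_site": "Brain"},
    {"sample": "S5", "study": "TCGA", "primary_site": "Lung"},
]

# The analysis described above runs on fewer than 2,000 samples in total.
MAX_DEA_SAMPLES = 2000

def keep_samples(rows, site, studies):
    """Return IDs of samples from the given tissue site and studies."""
    return [
        row["sample"]
        for row in rows
        if row["primary_site"] == site and row["study"] in studies
    ]

kept = keep_samples(phenotype, "Lung", {"TCGA", "GTEX"})
can_run_dea = len(kept) < MAX_DEA_SAMPLES
print(kept, can_run_dea)
```

Filtering to one tissue and the two studies of interest is the same idea as searching and choosing 'Keep Samples' in the Visual Spreadsheet; checking the count afterwards tells you whether the filtered set is within the limit.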
You've run your analysis and are ready to publish your paper - congratulations! Cite the paper below to thank Xena and keep our project funded.
Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020).
You can also read our paper for free at bioRxiv:
This page assumes you are familiar with making 2 subgroups. If you are not, please see the help page.
This page details how to create subgroups based on the expression of 2 genes so that you create the following 4 subgroups:
geneA expression is high AND geneB expression is high
geneA expression is high AND geneB expression is low
geneA expression is low AND geneB expression is low
geneA expression is low AND geneB expression is high
To do this, enter a search term for each gene, such as 'C:>15' or 'D:<0.6', into the search box and separate the search terms with a ';'.
You can see in the search bar the expression used to make column A using the example genes of CD44 and CD24.
Also note that you can use this feature on columns besides gene expression, such as copy number variation, etc. You can also use it on categorical features, for instance to compare expression of a gene and the patient's gender (male or female). Simply add the gender column to the Visual Spreadsheet and enter 'female' for one of the search terms above.
See our help on renaming the subgroup labels from 'true' and 'false' to something more biologically meaningful.
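To see why two search terms give four subgroups, the sketch below labels each sample by whether it passes each gene's cutoff. This is an illustration of the logic only, not how Xena is implemented; the cutoffs, the expression values, and the `subgroup_label` helper are hypothetical.

```python
from collections import Counter

def subgroup_label(gene_a, gene_b, cutoff_a, cutoff_b):
    """Combine a high/low call for each gene into one subgroup label."""
    a = "high" if gene_a > cutoff_a else "low"
    b = "high" if gene_b > cutoff_b else "low"
    return f"geneA {a} AND geneB {b}"

# Hypothetical (geneA, geneB) expression values for five samples.
expression = [(16.0, 12.5), (16.2, 8.1), (11.3, 7.9), (10.8, 13.0), (17.1, 12.9)]

labels = [subgroup_label(a, b, cutoff_a=15, cutoff_b=10) for a, b in expression]
counts = Counter(labels)
for label, n in sorted(counts.items()):
    print(label, n)
```

Each sample falls into exactly one of the four combinations, which is why two thresholds are enough to define all four groups.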
To make a KM plot, click on the column menu at the top of a column and choose 'Kaplan Meier Plot'.
More information about KM plots can be found in our Overview of Kaplan Meier Plots.
Sometimes not all samples in a dataset have data. This can happen for a variety of reasons, for example when a particular patient's sample did not undergo one or more analyses. In this case, we use gray, or 'null', to show that there is no data.
To remove null data use the 'Remove samples with nulls' shortcut in the filter menu.
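As a rough picture of what removing nulls means, the sketch below drops every sample that is missing a value in any column. It assumes, for illustration only, that a null is represented as `None`; the sample IDs, the values, and the `remove_samples_with_nulls` helper are hypothetical.

```python
# Hypothetical data: sample ID -> values in the columns currently on screen.
data = {
    "S1": [1.2, 3.4],
    "S2": [None, 2.0],  # no data in the first column
    "S3": [0.5, None],  # no data in the second column
    "S4": [2.2, 2.9],
}

def remove_samples_with_nulls(rows):
    """Keep only the samples that have a value in every column."""
    return {
        sample: values
        for sample, values in rows.items()
        if all(value is not None for value in values)
    }

complete = remove_samples_with_nulls(data)
print(sorted(complete))
```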
Use the find samples feature (highlighted below) to make subgroups:
First, search for all the samples you want in one of your subgroups. Next, click the Filter + Subgroup menu and choose 'New subgroup column'.
This will create a new subgroup column. All the samples that matched your search term will be in one subgroup labeled as 'true' and all the samples that did not match your search term will be in the other subgroup labeled as 'false'.
Your new column can be used for a KM analysis or to compare gene expression.
More information:
In this example we are creating two subgroups in the TCGA Lung Adenocarcinoma study: samples with aberrations in EGFR and those without. These aberrations could be mutations or copy number amplifications.
Type '(mis OR infra) OR C:>0.5' into the samples search bar. This will select samples that either have a missense mutation or inframe deletion '(mis OR infra)', or where copy number variation (column C) is greater than 0.5. Note that the cutoff of 0.5 was chosen arbitrarily.
Click the filter menu and select 'New subgroup column'. This will create a new column that has samples that met our search term marked as 'true' (i.e. those that have an EGFR aberration) and those that did not meet our search term as 'false' (i.e. those that do not have an EGFR aberration).
For more information see our Basic Tutorial: Section 2.
See our help on renaming the subgroup labels from 'true' and 'false' to something more biologically meaningful.
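The logic of the search '(mis OR infra) OR C:>0.5' can be written out as a small sketch. It assumes, for illustration, that a text term matches when it appears anywhere in the mutation label regardless of case; the mutation labels, the copy number values, and the `has_egfr_aberration` helper are hypothetical and not part of Xena.

```python
def has_egfr_aberration(mutation_effect, copy_number, cutoff=0.5):
    """Return True if the sample matches '(mis OR infra) OR C:>cutoff'."""
    effect = (mutation_effect or "").lower()  # treat 'no mutation' as empty text
    text_match = "mis" in effect or "infra" in effect
    return text_match or copy_number > cutoff

# Hypothetical (mutation label, copy number) pairs.
samples = {
    "S1": ("missense_variant", 0.1),
    "S2": ("inframe_deletion", -0.2),
    "S3": (None, 1.3),
    "S4": (None, 0.0),
}

subgroup = {sid: has_egfr_aberration(effect, cn) for sid, (effect, cn) in samples.items()}
print(subgroup)
```

The resulting True/False value per sample corresponds to the 'true' and 'false' labels in the new subgroup column.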
This page assumes you are familiar with making 2 subgroups. If you are not, please see 'How do I make subgroups'.
To make more than 2 sample subgroups, enter multiple search terms, such as 'C:>15' into the search box. Separate each search term with a ';'.
This can be used for a number of situations:
To divide a single numerical column into more than 2 subgroups (e.g. geneA high, geneA mid, and geneA low)
To make subgroups over the expression of two genes such that you get 4 subgroups (e.g. geneA high + geneB high, geneA low + geneB high, geneA high + geneB low, geneA low + geneB low)
To make subgroups over the expression of a gene and a categorical column (e.g. geneA high + Estrogen Receptor positive, geneA low + Estrogen Receptor positive, geneA high + Estrogen Receptor negative, geneA low + Estrogen Receptor negative)
To make subgroups over two categorical columns (e.g. Estrogen Receptor positive + HER2 positive, Estrogen Receptor negative + HER2 positive, Estrogen Receptor positive + HER2 negative, Estrogen Receptor negative + HER2 negative)
See below for an example of each.
In the screenshot below you can see that column D ranges from 7.3 to 12. If you wanted to have 3 groups (7.3 - 9, 9 - 10, and 10 - 12), you would enter:
C:>9 ; C:>10
into the search bar and then choose 'New subgroup column' from the filter/subgroup drop down menu.
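One way to see how the two terms 'C:>9 ; C:>10' produce three groups is to record which terms each value matches. The sketch below is an illustration under that assumption, not a description of Xena's internals; the `subgroup` helper and its group names are hypothetical.

```python
THRESHOLDS = (9, 10)  # taken from the search terms 'C:>9' and 'C:>10'

# A value above 10 is also above 9, so only three match patterns can occur.
GROUP_NAMES = {
    (False, False): "<= 9",
    (True, False): "> 9 and <= 10",
    (True, True): "> 10",
}

def subgroup(value):
    """Name the group for a value based on which thresholds it exceeds."""
    matched = tuple(value > threshold for threshold in THRESHOLDS)
    return GROUP_NAMES[matched]

for value in (7.3, 9.5, 12):
    print(value, subgroup(value))
```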
See our help on renaming the subgroup labels from 'true' and 'false' to something more biologically meaningful.
Click here to see our separate help page for this scenario
In the screenshot below you can see that column E (ERBB2 gene expression) ranges from 10 to 16. If you wanted to have 4 groups (ERBB2 > 13 + Estrogen Receptor positive, ERBB2 <= 13 + Estrogen Receptor positive, ERBB2 > 13 + Estrogen Receptor negative, ERBB2 <= 13 + Estrogen Receptor negative), you would enter:
E:>13 ; C:Negative
into the search bar and then choose 'New subgroup column' from the filter/subgroup drop down menu.
See our help on renaming the subgroup labels from 'true' and 'false' to something more biologically meaningful.
In the screenshot below, if you wanted to have 4 groups: Estrogen Receptor positive + HER2 positive, Estrogen Receptor negative + HER2 positive, Estrogen Receptor positive + HER2 negative, Estrogen Receptor negative + HER2 negative you would enter:
C:Negative ; D:Negative
into the search bar and then choose 'New subgroup column' from the filter/subgroup drop down menu.
See our help on renaming the subgroup labels from 'true' and 'false' to something more biologically meaningful.
This page assumes that you are already viewing more than one cancer type. Please see the help page 'How do I view multiple types of cancer together' to get started with this.
If there are cancer types in view that you do not want to investigate, you will need to filter them out. Please see the help page 'How do I filter to just one cancer type' to get started with this.
Steps:
Add a column of data. Enter your gene or list of genes, select 'Gene Expression' and click 'Done'.
From the column menu at the top of the new column you created, select 'Chart & Statistics'
Choose 'Compare Subgroups'
Click the dropdown for 'Show data from' and choose your gene expression column.
Click the dropdown for 'Subgroup samples by' and choose the cancer type column.
Choose if you would like a box plot or violin plot and click 'Done'.
To change the color threshold, click on the column menu at the top and choose 'Display'. From there click 'custom', enter your new thresholds, and click 'Done'.
To change the color, click on the column menu at the top and choose 'Display'. From there choose a new set of colors from the drop down.
If your plot has an '!' icon next to the p-value, this means that some patients are in your plot twice. This can happen when A) a patient has more than one sample in the dataset, such as both a tumor and a normal sample, or a metastasis in addition to the primary tumor, and/or B) a tumor sample was split into multiple aliquots that were each run through the same analysis.
This page will guide you on how to remove duplicates due to A. If there are duplicates due to B you will need to download the data, decide how to resolve any inconsistencies between the multiple aliquots and load it into your own Xena Hub.
1. Add the 'sample type' data column from the Phenotype data.
We are adding a column of data that indicates the sample type, such as 'Primary Tumor', 'Normal', etc. Note that different datasets may have a different name for this data.
2. Filter to only samples that are 'Primary tumor' by typing 'primary' into the filter search box. Next, click the filter icon next to the filter search box and choose 'Filter'. This will filter out all samples that are not primary tumor.
Note that if you are viewing a mostly metastatic cancer like melanoma, you may need to filter on 'metastatic' instead of 'primary'.
3. Run your KM analysis by clicking the caret menu at the top of the column and choosing 'Kaplan-Meier plot'. It will now only have primary tumor samples in it.
Removing duplicate samples from TCGA Lower Grade Glioma KM analysis
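If you download the sample list, a quick check for patients that appear more than once can be sketched as follows. It assumes the sample IDs follow the TCGA barcode convention, in which the first 12 characters identify the patient and the next two digits encode the sample type ('01' for primary solid tumor, '11' for solid tissue normal). The IDs themselves and the helper names are made up for illustration.

```python
from collections import Counter

# Hypothetical sample IDs in TCGA barcode style.
sample_ids = [
    "TCGA-AA-0001-01",
    "TCGA-AA-0001-11",  # normal sample from the same patient
    "TCGA-BB-0002-01",
    "TCGA-CC-0003-01",
    "TCGA-CC-0003-02",  # second, non-primary tumor sample
]

def patient_id(sample_id):
    """First 12 characters of a TCGA barcode identify the patient."""
    return sample_id[:12]

def sample_type_code(sample_id):
    """Two-digit sample type code that follows the patient portion."""
    return sample_id[13:15]

def duplicated_patients(ids):
    """Return patients that contribute more than one sample."""
    counts = Counter(patient_id(s) for s in ids)
    return sorted(p for p, n in counts.items() if n > 1)

# Keeping only primary tumor samples mirrors the 'primary' filter above.
primary_only = [s for s in sample_ids if sample_type_code(s) == "01"]
print(duplicated_patients(sample_ids))
print(duplicated_patients(primary_only))
```

Duplicates that remain after this kind of filter would point to case B, multiple aliquots, which has to be resolved outside of Xena.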
For users who wish to compare data across different types of cancer
To view multiple types of cancer patients side-by-side you will need to start with a Pan-Cancer dataset and then filter down to just the cancer types you want to see.
The TCGA PanCan (PANCAN) study contains the latest data from the PanCan Atlas project, including many hand curated datasets. It also contains some legacy TCGA data across all cancer types, including GISTIC 2 CNV estimates and miRNAseq estimates.
1. Add the phenotype column 'cancer type abbreviation', which details the cancer type.
Here is a bookmark that will take you to the TCGA PanCan (PANCAN) Study with that phenotype column already selected.
2. Search for the cancer type you are interested in, making sure that it is listed in the phenotype column. Separate each cancer type by 'OR'. Example: 'lgg OR gbm'. Click the Filter + subgroup menu next to the search bar and select 'Keep Samples'.
Below is an example for viewing breast and ovarian cancer together for the TCGA PanCan Atlas
When you are in the Xena Visual Spreadsheet, hovering the mouse over any data on the screen will trigger a tooltip to show up at the top of the view.
To freeze the tooltip, you need to "Alt-click", i.e. hold down the ALT key on your keyboard and at the same time click the left mouse button.
To unfreeze the tooltip, click on the close (X) icon.
This can be helpful if you want to click on the link to take you to the UCSC Genome Browser, where you can view more information about those genomic coordinates.
If you are adding in new samples, you will need to combine the data outside of Xena and then load it. If you are adding new data on samples we already have, then simply load the data into a Xena Hub.
We apologize but we don't provide a simple way to do this because of the batch effects that would be present when combining most data across studies. You will need to download the data you wish to combine from TCGA, combine it yourself outside of Xena, and then load it into your own Xena hub.
Download TCGA data through
Load your data into your own Xena hub, making sure to select the cohort that you want to view your data side-by-side with when loading it.
Sample names and format are study specific. You will need to match what we already have in Xena.
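Before loading, it can help to check how many of your sample names match the names already in the study. The sketch below does this with plain set operations; the sample IDs and the `check_sample_ids` helper are hypothetical, and how you obtain the list of existing IDs is left to you.

```python
def check_sample_ids(new_ids, existing_ids):
    """Split new sample IDs into those that match existing IDs and those that do not."""
    new, existing = set(new_ids), set(existing_ids)
    return {
        "matched": sorted(new & existing),
        "unmatched": sorted(new - existing),
    }

# Hypothetical IDs: one of the new names uses a different format.
existing_ids = ["TCGA-AA-0001-01", "TCGA-BB-0002-01", "TCGA-CC-0003-01"]
new_ids = ["TCGA-AA-0001-01", "TCGA-BB-0002-01", "tcga_cc_0003"]

report = check_sample_ids(new_ids, existing_ids)
print(report)
```

Unmatched names usually mean a formatting difference, and fixing them before loading lets your data line up with the existing samples.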
Note that if you want to view a genomic signature on our gene expression data, you can do so using our