
User Help Pages

Welcome to the Help Pages for UCSC Xena

Tutorials, Live Examples, and How to pages for UCSC Xena

Tutorials and webinars

Step-by-step tutorials to get you started and our schedule of upcoming webinars

Video Tutorials

Overview of Xena (2 min)

Beginner Tutorial (17 min)

Advanced Tutorial (55 min)

Recording of tutorial given for NCI in May 2021. Includes closed captioning.

Workshop Cheatsheet/Handout

How do I ...

Request a workshop

Webinars

Explore upcoming webinars and sign up to stay in the loop on new dates.

Introduction to Xena for bulk sequencing data: June 5th, 2025 10am-12pm PT

This webinar will be on June 5th, 2025, 10am-12pm PT.

In this webinar we will explore Xena's core functionalities for bulk sequencing data including

Visualizing mutation, expression, and copy number variation for a gene, group of genes, or chromosome
Visualizing clinical/phenotype data
Running a Kaplan-Meier survival analysis
Creating charts and analyses with accompanying statistics
Running a Differential Gene Expression analysis

All analyses will be done on publicly available data. As time permits, we’ll also do a brief demonstration of how to explore your own bulk sequencing data within Xena.

The webinar will run for 1.5 hours, followed by 30 minutes for Q&A.

More webinars coming soon!

Basic Tutorial: Section 1

Learn to create your first views in Xena

Description

This tutorial is made for those who have never used Xena. We will cover how to create a Visual Spreadsheet with gene expression, mutation, and copy number variation data.

Prerequisites

This tutorial assumes basic knowledge of

gene expression, copy number variation, and mutational genomic sequencing data
how a change in copy number variation or mutations can lead to a change in gene expression
The Cancer Genome Atlas (TCGA)

These resources can help you gain basic knowledge of these concepts:

Estimated time needed

Part A: 5 min

Part B: 10 min

Learning goals

Part A

Create a Visual Spreadsheet
Compare data across columns

Part B

Move columns
Resize columns
Zoom in and out

Tutorial

We are going to look at EGFR aberrations in patients with lung adenocarcinomas using TCGA data. We will be looking at mutations and copy number aberrations and how they change gene expression.

Part A

Our goal is to build a Visual Spreadsheet and understand the relationship between the columns of data.

Steps

Type 'GDC TCGA Lung Adenocarcinoma (LUAD)', select this study from the drop down menu, and click 'To first variable'.
Type 'EGFR', select the checkboxes for Gene Expression, Copy Number, and Somatic Mutation, and click 'To second variable'.

Video of steps:

How to read a Visual Spreadsheet

Samples are on the y-axis and your columns of data are on the x-axis. We line up columns so that each row is the same sample, allowing you to easily see trends in the data. Samples are always sorted by the leftmost data column first and then sub-sorted by each column to its right.
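The sort order described above can be sketched in Python. This is only an illustration, not Xena's implementation: the sample IDs and values are made up, and sorting from high to low is an assumption made for the example.

```python
# Hypothetical samples: (sample ID, value in column B, value in column C).
samples = [
    ("S1", 2.0, 0.1),
    ("S2", 5.0, 0.3),
    ("S3", 2.0, 0.9),
    ("S4", 5.0, 0.0),
]

# Sort on column B first; ties in column B are broken by column C.
# Descending order is an assumption made for this sketch.
ordered = sorted(samples, key=lambda s: (s[1], s[2]), reverse=True)
sample_order = [s[0] for s in ordered]
print(sample_order)
```

Swapping the positions of the two values in the sort key mimics what happens when you drag column C to the left of column B.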

Biological interpretation

We can see that samples from TCGA patients that have high expression of EGFR (red, column B) tend to either have amplifications of EGFR (red, column C) or mutations in EGFR (blue tick marks, column D).

More information

Making your own Visual Spreadsheet: Which TCGA study to choose

Part B

We will now move the columns to change the sort order and resize columns. We will zoom in to the whole Visual Spreadsheet and also within a column.

Steps

Move columns. Click column C, copy number variation, and drag it to the left so that it becomes the first column after the samples column (i.e. column B). Note that the samples are now sorted by the values in this column.
Resize columns. Click the handle in the lower right corner of column D, mutation. Move it to the right to make the column bigger.
Zoom in on a column. Click and drag within column D. Release to zoom.
Zoom out on a column. Click the red zoom out text at the top of column D.
Zoom in on samples. Click and drag vertically in any column in the Visual Spreadsheet to zoom in on these samples.
Zoom out on samples. To zoom out click either 'Zoom out' or 'Clear zoom' at the top of the Visual Spreadsheet.

Video of step 1

Video of step 2

Video of steps 3-6

More information

Test your knowledge

Create a Visual Spreadsheet looking at TP53 gene expression and mutation in samples from patients in the GDC TCGA Lower Grade Glioma study.

Change the Visual Spreadsheet from Question 1 so that the patients' samples are sorted by mutations rather than gene expression.

Basic Tutorial: Section 2

Learn how to remove samples with no data, subgroup samples, and make Kaplan Meier plots

Description

This tutorial is made for those who have never used Xena but who have completed Section 1 of the Basic Tutorial. We will cover how to filter to just the samples you are interested in, how to create subgroups, and how to run a Kaplan Meier survival analysis.

Prerequisites

Estimated time needed

Part A: 7 min

Part B: 15 min

Part C: 5 min

Learning goals

Part A

Search for samples of interest
Remove samples with no data

Part B

Make subgroups
Rename subgroups

Part C

Run a Kaplan Meier survival analysis
Use a custom time endpoint

Tutorial

In the Basic Tutorial Section 1 we found that samples from patients that have aberrations in EGFR have relatively higher expression. These aberrations could be mutations or copy number amplifications.

Now we are going to look at whether those patients with aberrations in their samples also have a worse survival prognosis.

Part A

Our goal is to remove patients' samples with no data (i.e. null) from the view. This will make the view look cleaner and remove irrelevant samples from our Kaplan Meier survival analysis.

Steps

Type 'null' into the samples search bar. This will highlight samples that have 'null' values in any column on the screen. Null means that there is no data for that sample for that column.
Click the filter menu and select 'Remove samples'.
Delete the search term.

Video of steps

More information

Shortcut for Part A

Instead of typing 'null' and removing those samples from the view, you can also use the 'Remove samples with nulls' shortcut in the filter menu.
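The effect of removing samples with nulls can be sketched in Python. This is a minimal illustration under assumed data, not Xena's code: the sample IDs, column names, and values are hypothetical, and None stands in for a null.

```python
# Hypothetical view: one entry per sample, one value per column.
# None stands in for a null (a gray cell with no data).
view = {
    "S1": {"copy_number": 0.1, "expression": 3.2},
    "S2": {"copy_number": None, "expression": 2.7},
    "S3": {"copy_number": 0.6, "expression": None},
    "S4": {"copy_number": -0.2, "expression": 1.9},
}

def remove_samples_with_nulls(view):
    """Keep only samples that have data in every column on screen."""
    return {
        sample: columns
        for sample, columns in view.items()
        if all(value is not None for value in columns.values())
    }

kept = remove_samples_with_nulls(view)
print(sorted(kept))
```

A sample is dropped if any one of its columns is null, which is why the result depends on which columns are currently in the view.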

Part B

Our goal is to create two subgroups: patients' samples with aberrations in EGFR and patients' samples without aberrations in EGFR. We will then name the subgroups.

Steps

Type '(mis OR inframe) OR B:>0.5' into the samples search bar. This will select samples that either have a missense mutation or inframe deletion, '(mis OR inframe)', or where copy number variation (column B) is greater than 0.5. Note that the cutoff of 0.5 was chosen arbitrarily.

You must have the copy number variation column as column B for the search term '(mis OR inframe) OR B:>0.5' to work. The 'B' in 'B:>0.5' is instructing Xena to search in column B for values that are greater than 0.5.

Click the filter menu and select 'New subgroup column'. This will create a new column that has samples that met our search term marked as 'true' (i.e. those that have an EGFR aberration) and those that did not meet our search term as 'false' (i.e. those that do not have an EGFR aberration).
Click the column menu for the column we just created (column B) and choose 'Display'.
Rename the display so that samples that are 'true' are instead labeled as 'EGFR Aberrations' and the samples that are 'false' are instead labeled as 'No EGFR Aberrations'. Click 'Done'.
Delete the search term. This will remove the black tick marks for matching samples.
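The logic of the search term '(mis OR inframe) OR B:>0.5' and the renamed subgroups can be sketched in Python. This is an illustration only: the sample data and mutation effect names are hypothetical, and treating 'mis' and 'inframe' as case-insensitive substring matches is an assumption about how the search behaves.

```python
def has_egfr_aberration(mutation_effects, copy_number, cutoff=0.5):
    """Mimic '(mis OR inframe) OR B:>0.5' for one sample.

    mutation_effects: list of effect strings for the sample (hypothetical).
    copy_number: the value shown in the copy number column.
    """
    text_match = any(
        "mis" in effect.lower() or "inframe" in effect.lower()
        for effect in mutation_effects
    )
    return text_match or copy_number > cutoff

# Hypothetical samples: (mutation effects, copy number value).
samples = {
    "S1": (["missense_variant"], 0.1),
    "S2": ([], 0.8),
    "S3": ([], 0.2),
    "S4": (["inframe_deletion"], -0.1),
}

# The new subgroup column holds True/False, which we then rename.
labels = {True: "EGFR Aberrations", False: "No EGFR Aberrations"}
subgroup = {
    sample: labels[has_egfr_aberration(effects, cn)]
    for sample, (effects, cn) in samples.items()
}
print(subgroup)
```

Changing the cutoff argument shows how sensitive the subgroups are to the arbitrary 0.5 threshold.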

Video of step 1

Video of steps 2-4

More information

Part C

Now that we have our subgroups we will run a Kaplan Meier survival analysis. Note that TCGA survival data is in days, hence the x-axis will be in days.

We can now see that there is no difference in survival between patients with EGFR aberrations and those without.

Steps

Click the column menu at the top of column B.
Choose 'Kaplan Meier Plot'.
Click 'Custom survival time cutoff' at the bottom of the Kaplan Meier plot.
Enter 3650, as this is 10 years.
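To make the cutoff concrete, here is a minimal Kaplan Meier estimator in plain Python. It is a sketch under stated assumptions, not Xena's implementation: the survival times and events are made up, and treating follow-up beyond the cutoff as censored at the cutoff is an assumption about how a custom time endpoint is applied.

```python
def apply_cutoff(times, events, cutoff_days):
    """Censor follow-up at cutoff_days (assumed behavior for this sketch)."""
    new_times, new_events = [], []
    for t, e in zip(times, events):
        if t > cutoff_days:
            new_times.append(cutoff_days)
            new_events.append(0)  # event happened after the cutoff: censored
        else:
            new_times.append(t)
            new_events.append(e)
    return new_times, new_events

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve = []
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical data: time in days, event 1 = death, 0 = censored.
times = [100, 400, 900, 2000, 4000]
events = [1, 0, 1, 1, 1]

cutoff_days = 10 * 365  # 3650 days, i.e. 10 years
cut_times, cut_events = apply_cutoff(times, events, cutoff_days)
curve = kaplan_meier(cut_times, cut_events)
print(curve)
```

In this toy data the death at day 4000 falls after the cutoff, so it no longer counts as an event and the curve has three steps instead of four.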

Video of steps

More information

Test your knowledge

Starting at the end of Part A, filter down to only those patients' samples that have a missense mutation.

Search term: "missense"

Starting at the end of Part A, create two subgroups: those patients' samples with EGFR expression greater than 4 and those with EGFR expression less than 4.

Search term: "C:>4"

Starting at the end of Part A, run a Kaplan Meier analysis on the EGFR expression column.

Basic Tutorial: Section 3

Learn how to use Chart View and add new columns of data to a view

Description

This tutorial is made for those who have never used Xena but who have completed Section 1 of the Basic Tutorial. We will cover how to make box plots and bar charts using our Charts and Statistics View and how to add another column of data, in particular phenotype data, to the view.

Prerequisites

Estimated time needed

Part A: 5 min

Part B: 15 min

Learning goals

Part A

Create a box plot using the Charts and Statistics View

Part B

Add another column of data to the view
Add phenotype data to the view
Create a bar chart using the Charts and Statistics View

Tutorial

Part A

We found that patients' samples that have aberrations in EGFR have higher gene expression. Now we are going to investigate whether this difference in gene expression is statistically significant.

We can now see that patients' samples with EGFR aberrations have statistically significantly higher gene expression.

Steps

Click the 3-dot column menu at the top of the gene expression column (don't worry if you start with another column - you will be selecting the correct columns in the steps ahead).
Click 'Compare subgroups', since we want to compare the group of samples who have aberrations in EGFR to the group of samples that do not.
Click the dropdown for 'Show data from' and choose 'column C: EGFR - gene expression RNAseq - HTSeq - FPKM-UQ'.
Click the dropdown for 'Subgroup samples by' and choose 'column B: (mis OR infra) OR C:>0.5 - Subgroup'.
Click 'Done'.

Video of steps

More information

Part B

We will now investigate how EGFR aberrations compare between samples from men and women.

We can now see that EGFR aberrations are more common in samples from females.

Steps

Click the 'x' in the upper right corner to exit Chart View.
Hover between columns B and C until 'Click to insert a column' becomes visible. Click on it.
Choose 'Phenotypic', click in the search bar, and choose 'Advanced'.
Type 'gender' into the search bar, select 'gender.demographic' from the dropdown menu, and click 'Done'.
Click the column menu at the top of column C and choose 'Chart & Statistics'. Note that this is just another way to enter Chart View.
Click 'Compare subgroups', since we want to compare the group of samples who have aberrations in EGFR to the group of samples that do not.
'column C: gender.demographic' should already be selected for 'Show data from'. If not, select it.
'column B: (mis OR infra) OR C:>0.5 - Subgroup' should already be selected for 'Subgroup samples by'. If not, select it.
Click 'Done'.

Video of steps 1-4

Video of steps 5-9

More information

Test your knowledge

Starting at the end of Part A, create a violin plot that compares copy number variation between patients' samples that have EGFR aberrations and those that do not.

Starting at the end of Part B, add the phenotype data 'age_at_earliest_diagnosis_in_years.diagnoses.xena_derived' to the plot.

Note that your column order may be different.

Advanced Tutorial: Section 1

Learn how to view whole chromosomes and view advanced datasets such as exon expression

Description

This tutorial is made for those who have basic knowledge of how to use Xena. We will cover how to view whole chromosomes and how to use the advanced dataset menu to access datasets such as exon expression.

Prerequisites

Estimated time needed

10 min

Learning goals

Create a visual spreadsheet with a chromosome-wide column and data from the advanced dataset menu.

Tutorial

We will look at the ERG-TMPRSS2 gene fusion in patients from the TCGA Prostate Cancer study.

ERG is an oncogene that is expressed at low levels in normal prostate tissue. Some patients' prostate cancer samples have higher expression of ERG. These samples tend to have an intra-chromosomal deletion that fuses ERG to TMPRSS2. TMPRSS2 is expressed at high levels in normal prostate tissue. This allows ERG to use the TMPRSS2 promoter to increase ERG expression.

Note that column D may look slightly different, depending on how you resize and zoom the column.

We can now see that there are many patients' samples with relatively high expression of ERG (column B). This relatively high expression is not uniform across the exons of ERG, but instead is in the exons closer to the 3' end of the gene (column C). Looking at column D, we can see that these samples also have an intra-chromosomal deletion of part of chromosome 21. If we hover over the genes at either end of the deletion, we can see that the end points fall within ERG and TMPRSS2.

Steps

Type 'TCGA Prostate Cancer (PRAD)', select this study from the drop down menu, and click 'To first variable'.
Type 'ERG', select the checkbox for Gene Expression and click 'To second variable'.
Type 'ERG', click 'Show Advanced', select the checkbox for 'IlluminaHiSeq' under 'exon expression RNAseq', and click 'Done'.
Click the text 'Click to insert a column' after column C. Type 'chr21', select the checkbox for Copy Number and click 'Done'.
Click on the filter menu and select 'Remove samples with nulls'.
Click on the handle in the lower right corner of column D, copy number for chromosome 21. Move it to the right to make the column bigger.
Click and drag within column D, copy number for chromosome 21, to zoom into the intra-chromosomal deletion.

Video of steps 1-4

Video of steps 5-8

More information:

Test your knowledge

Add copy number data for chromosome 1.

Add DNA Methylation data for ERG.

Advanced Tutorial: Section 2

Learn how to use the pick samples feature, how to view multiple genes in a single column, how to view a signature, and how to run a differential expression analysis

Description

This tutorial is made for those who have basic knowledge of how to use Xena. We will cover how to use the pick samples feature, how to view multiple genes in a single column, how to enter and view a signature, and how to run a differential expression analysis.

Prerequisites

Estimated time needed

Part A: 10 min

Part B: 5 min

Part C: 15 min

Learning goals

Part A

Create a visual spreadsheet with a single column containing multiple genes.
Filter to only Primary Tumor samples using the Pick Samples mode.
Remove nulls using the option in the filter menu

Part B

Enter and view a gene expression signature

Part C

Run a differential expression analysis.

Tutorial

We will investigate the PAM50 molecular subtypes in breast cancer. PAM50 is a 50-gene signature that classifies breast cancer into five molecular intrinsic subtypes: Luminal A, Luminal B, HER2-enriched, Basal-like, and Normal-like.

Part A

We will make a visual spreadsheet where we can explore the relationship between the PAM50 subtype call and the 50 genes that make up the PAM50 subtype call.

Steps

Type 'TCGA Breast Cancer (BRCA)', select this study from the drop down menu, and click 'To first variable'.
Choose 'Phenotypic', select 'sample_type' from the dropdown menu, and click 'To second variable'.
Choose 'Phenotypic', click on 'advanced', type 'pam' into the search bar, select 'PAM50Call_RNAseq' from the dropdown menu, and click 'Done'. This will exit the wizard.
Click on 'Click to insert a column' after column C. Copy and paste the 50 genes, choose 'Gene Expression', and click 'Done'.
Click the handle in the lower right corner of column D, gene expression. Move it to the right to make the column bigger.

List of 50 genes used to calculate the PAM50 subtype call:

UBE2T BIRC5 NUF2 CDC6 CCNB1 TYMS MYBL2 CEP55 MELK NDC80 RRM2 UBE2C CENPF PTTG1 EXO1 ORC6L ANLN CCNE1 CDC20 MKI67 KIF2C ACTR3B MYC EGFR KRT5 PHGDH CDH3 MIA KRT17 FOXC1 SFRP1 KRT14 ESR1 SLC39A6 BAG1 MAPT PGR CXXC5 MLPH BCL2 MDM2 NAT1 FOXA1 BLVRA MMP11 GPR160 FGFR4 GRB7 TMEM45B ERBB2

Video of steps 1-4

Video of steps 5-6

Steps continued

Click on the picker icon next to the filter menu to enter pick samples mode.
Click on the Primary Tumor samples.
Click the filter menu and select 'Keep samples'.
Exit pick samples mode by clicking on the picker icon again.
Click the filter menu and select 'Remove samples with nulls'.

Video of steps 1-5

More information:

Part B

We will now look at the TFAC30 gene signature and see how it relates to the PAM50 subtype calls. This 30-gene expression signature predicts pathologic complete response (pCR) to preoperative weekly paclitaxel and fluorouracil-doxorubicin-cyclophosphamide (T/FAC) chemotherapy.

Steps

Click on 'Click to insert a column' after column D. Copy and paste the signature below, choose 'Gene Expression', and click 'Done'. Note you need to include the '=' as this tells Xena that you want the signature rather than to see all the genes individually.

TFAC30 gene expression signature:

=E2F3 + MELK + RRM2 + BTG3 - CTNND2 - GAMT - METRN - ERBB4 - ZNF552 - CA12 - KDM4B - NKAIN1 - SCUBE2 - KIAA1467 - MAPT - FLJ10916 - BECN1 - RAMP1 - GFRA1 - IGFBP4 - FGFR1OP - MDM2 - KIF3A - AMFR - MED13L - BBS4
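To show how a signature in this notation can be read, here is a small Python sketch that parses an expression of the form '=A + B - C' into per-gene signs and scores one sample. The scoring rule (a signed sum of expression values) is an assumption made for illustration, not a statement of how Xena computes the column, and the expression values are made up.

```python
def parse_signature(expr):
    """Parse '=A + B - C' into {'A': 1, 'B': 1, 'C': -1}."""
    if not expr.startswith("="):
        raise ValueError("a signature must start with '='")
    # Pad the operators with spaces so they become separate tokens.
    tokens = expr[1:].replace("+", " + ").replace("-", " - ").split()
    weights = {}
    sign = 1
    for token in tokens:
        if token == "+":
            sign = 1
        elif token == "-":
            sign = -1
        else:
            weights[token] = sign
            sign = 1
    return weights

def signature_score(weights, expression):
    """Signed sum of expression values (assumed scoring rule)."""
    return sum(weight * expression[gene] for gene, weight in weights.items())

# A shortened signature and made-up expression values for one sample.
weights = parse_signature("=E2F3 + MELK - MAPT")
expression = {"E2F3": 2.0, "MELK": 3.0, "MAPT": 1.5}
score = signature_score(weights, expression)
print(weights, score)
```

Without the leading '=', the same text would be read as a plain list of genes, which is why the parser rejects it.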

We can now see that patients' samples that are labeled as 'Her2' and 'Basal' are predicted to be more likely to achieve pCR on TFAC chemotherapy.

Video of step 1

More information

Part C

We will run a differential expression analysis comparing Basal samples to Luminal A and Luminal B samples.

Steps

Click the column menu for the PAM50 subtype call (column C) and choose 'Differential Expression'. This will open a new tab where we will run the analysis.
Choose the first subgroup to be 'Basal' and the second subgroup to be 'LumA' and 'LumB'. Hold the shift key while clicking to select multiple groups.
Click 'Submit'.

Note it can take a while for the analysis to run. Wait until it says 'Success' at the top.

Video of steps 1-3

More information

Tutorial: Tumor vs Normal

Learn how to compare tumor samples to normal samples using our TCGA TARGET GTEx study

Description

This tutorial is made for those who have basic knowledge of how to use Xena. We will cover how to view tumor and normal samples from healthy and diseased individuals together, and how to compare gene expression for one or more genes between tumor and normal samples.

We will use both GTEx samples and TCGA matched normal samples as our normal samples. More information on GTEx normal samples can be found here:

Prerequisites

Estimated time needed

Part A: 10 min

Part B: 5 min

Learning goals

Part A

Build a visual spreadsheet with the columns primary site, sample type, study, and gene expression for the TCGA TARGET GTEx study.
Filter to just colon samples.

Part B

Create a box plot using the Charts and Statistics View

Tutorial

We will compare MYC gene expression between patients' TCGA colon adenocarcinoma tumor samples and individuals' normal colon tissue in GTEx.

Part A

Our goal is to build a visual spreadsheet with the columns 'primary site', 'sample type', 'study', and gene expression for MYC for the TCGA TARGET GTEx study. We will then filter to samples in the colon.

We can now see that normal samples tend to have lower MYC gene expression.

Steps

Type 'TCGA TARGET GTEx', select this study from the drop down menu, and click 'To first variable'.
Type 'MYC', select the checkbox for Gene Expression and click 'To second variable'.
Choose 'Phenotypic' and select the checkboxes for 'sample type', 'study' and 'Primary site', and click 'Done'.
Type 'colon' in the samples search bar and choose 'Keep samples'.

Video of steps 1-4

Video of step 5

Part B

Our goal is to see if the difference in gene expression, where normal samples tend to have lower MYC gene expression, is statistically significant.

We can now see that patients' tumor samples, whether recurrent, primary, or metastatic, have higher expression compared to normal tissue, both patients' matched normal tissue from TCGA and unmatched individuals' normal tissue from GTEx.

Steps

Click the column menu for column B (MYC gene expression) and choose 'Charts & Stats'.
Click 'Compare subgroups', click the dropdown for 'Show data from', and choose 'column B: MYC - gene expression RNAseq - RSEM norm_count' if it is not already selected.
Click the dropdown for 'Subgroup samples by' and choose 'column C: Sample Type'.
Leave the chart type as 'box plot', and click 'Done'.

Video of steps 1-4

Test your knowledge

Compare EGFR gene expression between patients' tumor samples and individuals' normal lung tissue.

Tutorial: Viewing your own data

Learn how to view your own data using data from the Chinese Glioma Genome Atlas (CGGA)

Description

This tutorial is made for those who have basic knowledge of how to use Xena. We will cover how to load your own data into a Xena hub on your computer. We will then view the data in the Xena Browser.

We will be viewing RNAseq and clinical data from the Chinese Glioma Genome Atlas (CGGA).

Prerequisites

To format the datasets you will need access to a spreadsheet application, such as Microsoft Excel.

To load the data into a Local Xena Hub you will need a computer where you have installation privileges.

Estimated time needed

Part A: 10 min

Part B: 15 min

Part C: 10 min

Learning goals

Part A

Download data from CGGA
Use Microsoft Excel or another spreadsheet application to make small formatting adjustments. These adjustments are only to enable Kaplan Meier analyses. Data can be visualized as is.

Part B

Download and install a Local Xena Hub
Load data into the Xena Hub on your computer

Part C

Make a visual spreadsheet from the data in the Xena Hub on your computer
Create a box plot
Run a Kaplan Meier Analysis

Tutorial

Part A

We will start with downloading the files from the CGGA. These files already conform to our data file requirements. This is because they are matrices that have sample IDs along one axis and probe, gene, or clinical data names along the other. Additionally, the files are tab-delimited.

For more information see:

While we can load the files exactly as is, we will perform a small format adjustment so that we can create a Kaplan Meier plot. Our Kaplan Meier analyses need two columns of clinical data to create a plot: the event/censor column and the time to that event/censor. These columns need to be specially named so that our Kaplan Meier analysis recognizes them. For Overall Survival, the column names need to be 'OS' and 'OS.time'.

For more information on other supported columns for our Kaplan Meier analysis see:

Steps to format the file

Click to download the 'Clinical Data' and 'Expression Data from STAR+RSEM'. Unzip the files. The resulting files should be named 'CGGA.mRNAseq_693.RSEM-genes.20200506.txt' and 'CGGA.mRNAseq_693_clinical.20200506.txt'.
Open CGGA.mRNAseq_693_clinical.20200506.txt in a spreadsheet application like Microsoft Excel. If the spreadsheet application asks, these files are tab-delimited.
Rename the column header 'OS' to be 'OS.time'.
Rename the column header 'Censor (alive=0; dead=1)' to be 'OS'.
Save and close the file.

There is no need to open CGGA.mRNAseq_693.RSEM-genes.20200506.txt since it is ready to be loaded into the Local Xena Hub on your computer as is.
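If you prefer not to use a spreadsheet application, the two header renames can also be scripted. The following Python sketch is one possible way, not part of the Xena tooling: the function name rename_headers is hypothetical, and the demonstration runs on a tiny made-up file (including a made-up 'CGGA_ID' first column) rather than the real CGGA download.

```python
import csv
import os
import tempfile

# Old header name -> new header name, as described in the steps above.
RENAMES = {"OS": "OS.time", "Censor (alive=0; dead=1)": "OS"}

def rename_headers(in_path, out_path, renames=RENAMES):
    """Copy a tab-delimited file, renaming header cells in a single pass."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        header = next(reader)
        # Each cell is mapped once, so the new 'OS.time' is never renamed again.
        writer.writerow([renames.get(name, name) for name in header])
        writer.writerows(reader)

# Demonstration on a tiny made-up file (not real CGGA data).
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, "clinical.txt")
out_path = os.path.join(tmp_dir, "clinical_renamed.txt")
with open(in_path, "w", newline="") as f:
    f.write("CGGA_ID\tOS\tCensor (alive=0; dead=1)\nSAMPLE_1\t3817\t0\n")

rename_headers(in_path, out_path)
with open(out_path) as f:
    new_header = f.readline().rstrip("\n").split("\t")
print(new_header)
```

Doing both renames in one pass matters here because the new name of the censor column, 'OS', is the old name of the time column.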

Part B

Steps

If this is your first time viewing your own data

2. Click 'Open UCSC Xena' to set your computer up to automatically open the Xena Hub when you come to this page in the future.

3. Click on 'download & run a Local Xena Hub' to download the correct installer for your computer.

4. Double-click the installer to install the Xena Hub on your computer. Follow onscreen instructions, which vary by operating system.

If you already have viewed your own data

2. Wait for 30 seconds. If you allowed your browser to open the Xena Hub every time you come to this screen, then it will open the Xena Hub and this dialog box will close. If you did not, you will need to go to your Applications Folder and open UCSC Xena yourself.

Whether you have viewed your own data before or not, you should arrive at a screen like this:

If you have already loaded data previously, you may see datasets and cohorts listed at the bottom of the screen

Steps to load the data files

Click the 'Load Data' button.
Click 'Select Data File', choose 'CGGA.mRNAseq_693_clinical.20200506.txt', and click 'Next'.
Choose 'Phenotypic Data' and click 'Next'.
Choose 'The first column is sample IDs' and click 'Next'.
Choose 'These are the first data on these samples.', change the study name to 'CGGA', and click 'Import'.
Choose 'Load more data'.
Click 'Select Data File', choose 'CGGA.mRNAseq_693.RSEM-genes.20200506.txt', and click 'Next'.
Choose 'Genomic Data' and click 'Next'.
Confirm selection of 'The first row is sample IDs' and click 'Next'.
Choose 'I have loaded other data on these samples and want to connect to it.', select 'CGGA' from the drop down, and click 'Import'.

Video of steps 1-6

Video of steps 7-10

Note that it can take several minutes for the RNAseq data to load since it is larger.

Part C

We will look at the chromosome 1p-19q co-deletion in Chinese glioma patients and compare this to IDH1 expression.

Ending Screenshot for Visual Spreadsheet (end of step 5)

Ending Screenshot for Box plot (end of step 11)

Ending Screenshot for Kaplan Meier Analysis (end of step 13)

Steps

Click on 'Visualization' in the top menu bar.
Type 'CGGA', choose 'CGGA' as the study and click 'To first variable'.
Enter the gene 'IDH1', choose 'CGGA.mRNAseq_693.RSEM-genes.20200506.txt', and click 'To second variable'.
Choose 'Phenotypic', click '1p19q_codeletion_status', and click 'Done'.
The dataset authors annotated samples without a 1p/19q co-deletion status with 'NA'. To remove these samples, type 'NA' in the samples search bar and choose 'Remove Samples' from the filter actions menu drop down.
Compare IDH1 expression between samples with a 1p/19q co-deletion and those that do not. To do this, click on the column menu for column B (IDH1 expression) and choose 'Charts & Stats'.
Choose 'Compare Subgroups'.
Click the dropdown for 'Show data from' and choose 'column B: IDH1 - CGGA.mRNAseq_693.RSEM-genes.20200506.txt'.
Click the dropdown for 'Subgroup samples by' and choose 'column C: 1p19q_codeletion_status - CGGA.mRNAseq_693_clinical.20200506.txt'.
Click 'Done'.
Close the chart using the 'x' in the upper left corner.
Run a Kaplan Meier analysis comparing patients with high IDH1 expression to those with low IDH1 expression. To do this, click on the column menu for column B (IDH1 expression) and choose 'KM plot'.

Video of steps 2-4

Video of steps 5-10

Video of steps 11-12

Live examples

Live Examples of what types of visualizations and analyses you can perform using UCSC Xena

Workshop cheatsheet/handout

Xena mutation views support examination of both coding and non-coding mutations from whole genome analysis. We support viewing mutations from either a gene-centric or a coordinate-centric perspective. In the gene-centric view, you can dynamically toggle to show or hide introns. This figure shows the frequent intron mutations in 321 samples from the ICGC lymphoma cohorts. These 'pile-ups' would not be visible if viewing mutations only in the exome. These intron mutations overlap with known enhancer regions (Mathelier 2015).

How do I ...

Step-by-step instructions for our most common use cases

How do I compare tumor vs normal expression?

TCGA matched normal vs. GTEx normal

Using the TCGA TARGET GTEx study

To compare tumor vs normal, you will need to filter down to just the samples you want to compare and then compare gene expression between your groups of samples.

More information:

There are four gene expression datasets in this study. Two are normalized using within-sample methods. The 'RSEM norm_count' dataset is normalized by the upper quartile method, and the 'RSEM expected_count (DESeq2 standardized)' dataset is normalized with DESeq2. Therefore, these two gene expression datasets should be used.

Running a Differential Gene Expression Analysis

More information:

Tutorial

How do I remove null data (gray lines) from view?

Sometimes not all samples in a dataset have data. This can happen for a variety of reasons, such as when a particular patient's sample did not undergo one or more analyses. In this case, we use gray, or 'null', to show that there is no data.

To remove null data use the 'Remove samples with nulls' shortcut in the filter menu.

Example

Overview of features

More details about all the features we have on Xena

Overview of public data

FAQ

Viewing your own data

Technical documentation

Tutorial: Viewing your own data

Learn how to view your own data using data from the Chinese Glioma Genome Atlas (CGGA)

Description

This tutorial is made for those who have basic knowledge of how to use Xena. We will cover how to load your own data into a Xena hub on your computer. We will then view the data in the Xena Browser.

We will be viewing RNAseq and clinical data from the Chinese Glioma Genome Atlas (CGGA).

Prerequisites

To format the datasets you will need access to a spreadsheet application, such as Microsoft Excel.

To load the data into a Local Xena Hub you will need a computer where you have installation privileges.

To visualize the data, you will need basic knowledge of how to use Xena. To get this, go through the Basic Tutorials.

Estimated time needed

Part A: 10 min

Part B: 15 min

Part C: 10 min

Learning goals

Part A

Download data from CGGA
Use Microsoft Excel or another spreadsheet application to make small formatting adjustments. These adjustments are only to enable Kaplan Meier analyses. Data can be visualized as is.

Part B

Download and install a Local Xena Hub
Load data into the Xena Hub on your computer

Part C

Make a visual spreadsheet from the data in the Xena Hub on your computer
Create a box plot
Run a Kaplan Meier Analysis

Tutorial

Part A

For more information see:

Data format specifications and supported biological data types

For more information on other supported columns for our Kaplan Meier analysis see:

KM plots using data from a Local Xena Hub

Steps to format the file

Go to the CGGA website and scroll to the DataSet ID mRNAseq_693.
Click to download the 'Clinical Data' and 'Expression Data from STAR+RSEM'. Unzip the files. The resulting files should be named 'CGGA.mRNAseq_693.RSEM-genes.20200506.txt' and 'CGGA.mRNAseq_693_clinical.20200506.txt'.
Open CGGA.mRNAseq_693_clinical.20200506.txt in a spreadsheet application like Microsoft Excel. If the spreadsheet application asks, these files are tab-delimited.
Rename the column header 'OS' to be 'OS.time'.
Rename the column header 'Censor (alive=0; dead=1)' to be 'OS'.
Save and close the file.

There is no need to open CGGA.mRNAseq_693.RSEM-genes.20200506.txt since it is ready to be loaded into the Local Xena Hub on your computer as is.

Part B

Steps

If this is your first time viewing your own data

1. Click '' at the top of the screen. You should see a screen similar to this:

2. Click 'Open UCSC Xena' to set your computer up to automatically open the Xena Hub when you come to this page in the future.

3. Click on 'download & run a Local Xena Hub' to download the correct installer for your computer.

4. Double-click the installer to install the Xena Hub on your computer. Follow onscreen instructions, which vary by operating system.

Please see our or if you encounter any problems.

If you already have viewed your own data

1. Click '' at the top of the screen. You should see a screen similar to this:

Whether you have viewed your own data before or not, you should arrive at a screen like this:

If you have already loaded data previously, you may see datasets and cohorts listed at the bottom of the screen

Steps to load the data files

Click the 'Load Data' button.
Click 'Select Data File', choose 'CGGA.mRNAseq_693_clinical.20200506.txt', and click 'Next'.
Choose 'Phenotypic Data' and click 'Next'.
Choose 'The first column is sample IDs' and click 'Next'.
Choose 'These are the first data on these samples.', change the study name to 'CGGA', and click 'Import'.
Choose 'Load more data'.
Click 'Select Data File', choose 'CGGA.mRNAseq_693.RSEM-genes.20200506.txt', and click 'Next'.
Choose 'Genomic Data' and click 'Next'.
Confirm selection of 'The first row is sample IDs' and click 'Next'.
Choose 'I have loaded other data on these samples and want to connect to it.', select 'CGGA' from the drop down, and click 'Import'.

Video of steps 1-6

Video of steps 7-10

Note that it can take several minutes for the RNAseq data to load since it is larger.

Part C

We will look at the chromosome 1p-19q co-deletion in Chinese glioma patients and compare this to IDH1 expression.

Ending Screenshot for Visual Spreadsheet (end of step 5)

Ending Screenshot for Box plot (end of step 11)

Ending Screenshot for Kaplan Meier Analysis (end of step 13)

Note that we are unable to provide links to these ending screenshots because we do not allow users to create bookmarks when viewing data from their own Local Xena Hubs. This is to protect the privacy of your data.

Steps

Click on 'Visualization' in the top menu bar.
Type 'CGGA', choose 'CGGA' as the study and click 'To first variable'.
Enter the gene 'IDH1', choose 'CGGA.mRNAseq_693.RSEM-genes.20200506.txt', and click 'To second variable'.
Choose 'Phenotypic', click '1p19q_codeletion_status', and click 'Done'.
The dataset authors annotated samples without a 1p/19q co-deletion status with 'NA'. To remove these samples, type 'NA' in the samples search bar and choose 'Remove Samples' from the filter actions menu drop down.
Compare IDH1 expression between samples with a 1p/19q co-deletion and those without one. To do this, click on the column menu for column B (IDH1 expression) and choose 'Charts & Stats'.
Choose 'Compare Subgroups'.
Click the dropdown for 'Show data from' and choose 'column B: IDH1 - CGGA.mRNAseq_693.RSEM-genes.20200506.txt'.
Click the dropdown for 'Subgroup samples by' and choose 'column C: 1p19q_codeletion_status - CGGA.mRNAseq_693_clinical.20200506.txt'.
Click 'Done'.
Close the chart using the 'x' in the upper left corner.
Run a Kaplan Meier analysis comparing patients with high IDH1 expression to those with low IDH1 expression. To do this, click on the column menu for column B (IDH1 expression) and choose 'KM plot'

Video of steps 2-4

Video of steps 5-10

Video of steps 11-12

Visual Spreadsheet

This dynamic, powerful, and flexible view is our default view into the data.

The Visual Spreadsheet allows you to add an arbitrary number of columns of any data type (mutation, copy number, expression, protein, phenotype, methylation, etc.) on any number of patients' samples into a spreadsheet-like view. We line up all columns so that each row is the same sample, allowing you to easily see trends in the data. Samples are always sorted by the leftmost column first and then sub-sorted by each column to its right.

Making a Visual Spreadsheet

The wizard on the screen will guide you to choose a study to view and TWO columns of data to view on those samples. Note that if you do not choose at least two columns, the wizard will not exit and let you interact with the data.

Selecting a cohort

You can select a cohort either by choosing 'Help me select a cohort' and searching our cohorts for your cancer type, etc., or by choosing 'I know the study I want to use' and searching for the partial or full name of the cohort you are interested in.

Adding a Gene or Position

Enter a HUGO gene name or a dataset-specific probe name (e.g. a CpG island). You can enter one gene or multiple genes. Separate multiple genes with a space, comma, tab, or new line.

To display a genomic region, enter the genomic region, choose your dataset and click 'done'. We recognize chromosomes (e.g. chr1), arms of chromosomes (e.g. chr19q), and chromosome coordinates (e.g. chr1:100-4,000).
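
To make the accepted input forms concrete, here is a minimal Python sketch of how such input could be split and recognized. It is an illustration only, not Xena's actual parser: the function names and the regular expression are assumptions based on the examples above.

```python
import re

# Hypothetical pattern for the three region forms described above:
# chr1, chr19q, and chr1:100-4,000 (commas allowed inside coordinates).
REGION = re.compile(
    r"^chr(?P<chrom>[0-9]{1,2}|[XYM])"
    r"(?:(?P<arm>[pq])|:(?P<start>\d[\d,]*)-(?P<end>\d[\d,]*))?$"
)


def split_genes(text):
    """Split user input on spaces, commas, tabs, or new lines."""
    return [token for token in re.split(r"[\s,]+", text) if token]


def parse_region(text):
    """Return (chrom, arm, start, end) for a region string, or None."""
    match = REGION.match(text.strip())
    if match is None:
        return None
    start, end = match.group("start"), match.group("end")
    return (
        "chr" + match.group("chrom"),
        match.group("arm"),
        int(start.replace(",", "")) if start else None,
        int(end.replace(",", "")) if end else None,
    )
```

Anything that parse_region does not recognize (for example 'IDH1') would be treated as a gene or probe name in this sketch.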

Selecting a Dataset

After entering a gene or probe name, you will need to select one or more datasets.

Basic Datasets

We have pre-selected default datasets for most cohorts. These datasets are selected because they are the most used. Typically there is a default mutation, copy number, and expression dataset.

Advanced Datasets

Xena also has more datasets than those listed in the Basic Menu. Depending on the cohort, these can include DNA methylation, exon expression, thresholded CNV data and more. To access them, click on 'Show Advanced' below:

More information on basic datasets

Video of making a Visual Spreadsheet

After you made a Visual Spreadsheet

Overview

Patient samples are on the y-axis and your columns of data are on the x-axis. We line up all columns so that each row is the same sample, allowing you to easily see trends in the data. Data is always sorted left to right and sub-sorted on columns thereafter.

If you entered a single gene

If you entered a single gene, that gene will be listed at the top of the column. If there are multiple probes mapped to that gene in the dataset you selected they will be displayed as subcolumns ordered left to right in the direction of transcription.

If you selected a positional dataset, such as segmented copy number variation or mutation, the gene model will be displayed at the top of the column. The gene model is a composite of all transcripts of the gene. Boxes show different exons, with UTR regions being short and CDS regions being tall. We display 2 kb upstream to show the promoter region. Use the column menu to toggle the display of intronic regions.

If you entered multiple genes

If you entered multiple genes, each gene will be listed as a subcolumn for that dataset. If multiple probes in the dataset map to a gene (the probes you would see as subcolumns had you entered that gene alone), the probes are averaged to give a single value per gene.

Note that if you entered more than one gene and selected a mutation dataset, we will only show the first gene. If you wish to see multiple mutation columns, please enter each gene individually and click 'done'.
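
The probe averaging described above can be pictured with a short Python sketch. This is only an illustration: the function name is hypothetical, and skipping missing values when averaging is an assumption, since how missing probe values are handled is not stated here.

```python
def average_probes(probe_values):
    """Average probe values per sample, skipping missing (None) values.

    probe_values maps probe name -> list of per-sample values, with
    every list in the same sample order.
    """
    columns = list(probe_values.values())
    averaged = []
    for sample_values in zip(*columns):
        present = [v for v in sample_values if v is not None]
        # A sample with no probe data at all stays missing.
        averaged.append(sum(present) / len(present) if present else None)
    return averaged
```

For example, two probes with values [1.0, 3.0] and [3.0, 5.0] across two samples would give one value per sample for the gene.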

If you entered a chromosome or chromosome position

When displaying a chromosome range, genes will be shown at the top of the column, with dark blue genes being on the forward strand and red genes being on the reverse strand. Hovering over a gene will display the gene name in the tooltip. Note that introns are always shown in this mode.

Data values

Individual values vary by dataset. The legend at the bottom of the dataset will tell you the units for your particular dataset, including any normalization that was performed. If a sample does not have data for a column, it will show as gray and be labeled as 'null'.

If the entire column is gray this means we did not recognize the gene, probe, or position. If you believe this to be in error, please try an alternate name.

More information about a dataset can be found in the dataset details page. To get there, click on the column menu and choose 'About'.

Sample sorting

The Xena Browser uses the y-axis for samples and the x-axis/columns for genomic/phenotypic features. Data from a single sample is always on the same horizontal line across all columns, allowing you to see screen-wide trends. The Xena Browser orders samples by the first (leftmost) column, then the second, etc. If there are multiple genes, identifiers, or probes within a column, samples are ordered by the 1st sub-column, then the 2nd sub-column, and so on.

Numerical data are ordered in descending order (e.g. 3.5, 1.2, ...). Categorical data (e.g. stage, tumor type, etc.) are ordered by categories. CNV data is sorted by the average of the entire column. Positional mutation data is ordered by genomic coordinates (from 5'->3') and then by the predicted impact of the mutation. Both CNV and positional mutation data have the option to instead sort by the zoomed region. Click the column menu at the top of the column and choose 'Sort by zoom region avg'.

To reverse the ordering, click the column menu at the top of the column and choose 'Reverse sort'.
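
As a rough illustration of the left-to-right ordering for numerical columns, the following Python sketch sorts samples by the first column in descending order and breaks ties with each later column. It is not the Xena Browser's code: the function name is hypothetical, and placing samples with missing values last is an assumption.

```python
def sort_samples(samples, columns):
    """Order sample IDs by the first column, then the second, and so on.

    columns is a list of dicts mapping sample ID -> numeric value (or None),
    given in left-to-right display order.
    """
    def key(sample):
        parts = []
        for column in columns:
            value = column.get(sample)
            # (is_missing, -value): larger values first, missing values last.
            parts.append((value is None, 0 if value is None else -value))
        return parts

    return sorted(samples, key=key)
```

Moving a different column to the left corresponds to changing the order of the columns list in this sketch.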

Move a column/change the sample sorting

As the sample sort order is controlled by the leftmost columns, it can be useful to explore the data by moving a different column to the left.

To move a column click on the column header and drag a column to the right or left.

Zooming

Click and drag anywhere in any column to zoom in, in either direction. Zoom out to all samples by clicking 'Clear Zoom' at the top. Zoom out to the whole column by clicking the red 'x' at the top of a column.

Resize a column

You can change the size of a column by clicking on the bottom right corner of a column and dragging to a new size.

Add another column

You can add another column of data by clicking on 'Click to add column' on the right edge of the visual spreadsheet, or by hovering between columns until 'Click to insert column' displays.

Data format specifications and supported biological data types

There are 2 basic data formats and 2 advanced data formats. Each of these formats has one or more biological data types that it supports.

General Specifications for all data formats

We support most types of genomic and phenotype/clinical/sample annotations. For genomic data we support calls made on the raw data including but not limited to expression calls, mutation calls, etc. This is what TCGA calls ‘Level 3’ data and is typically a value on gene, transcript, probe, etc. We do not support FASTQ, BAMs, or other ‘raw’ files. Please contact us if you have any questions.

We support tab-delimited and Microsoft Excel files (.xlsx and .xls). Tab-delimited files generally have a file name ending in .tsv or .txt, though we do not require this. Note that we load tab-delimited files much faster than Excel files. You can export a Microsoft Excel file as a tab-delimited file using the 'Save as ...' function.

Please do not have any duplicate genes/probes/identifiers or samples. We will allow you to load with duplicates but will only display the first one encountered in the file.
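
Before loading, you may want to check a file for duplicates yourself. The Python sketch below is one possible way to do so. The function names are hypothetical, and keep_first only mimics the 'first one encountered' behavior described above rather than reproducing Xena's loader.

```python
from collections import Counter


def find_duplicates(identifiers):
    """Return identifiers that occur more than once, in first-seen order."""
    counts = Counter(identifiers)
    return [name for name in counts if counts[name] > 1]


def keep_first(rows):
    """Keep only the first row for each identifier (the first field of a row)."""
    seen = set()
    kept = []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            kept.append(row)
    return kept
```

Running find_duplicates on the first column (and on the header row for sample IDs) shows which entries would be hidden after loading.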

We assume you use a '.' rather than a ',' as the decimal separator.

Basic Genomic data: numbers in a rectangle/matrix/spreadsheet

Supported data types

RNA-seq expression (exon, transcript, gene, etc)
Array-based expression (probe, gene, etc)
Gene-level mutation
Gene-level copy number
DNA methylation
RPPA
and more ...

For samples that do not have expression for a particular gene, either have a blank field or use "NA".

An example of a genomic matrix file (in this case, expression):

Sample   TCGA-BA-4074-01   TCGA-BA-4075-01   TCGA-BA-4076-01
ACAP3    0.137             0.022
CTRT2    0.024             0.805             0.256
ALK      0.098             0.805             1.87
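
If your values live in a script rather than a spreadsheet, a genomic matrix in this layout can be written out as tab-delimited text along the following lines. This is a minimal sketch with a hypothetical function name. Missing values are written as blank fields by default, or as "NA" if you pass missing="NA".

```python
def format_matrix(samples, rows, missing=""):
    """Render a genomic matrix as tab-delimited text.

    rows maps an identifier (gene, probe, ...) to one value per sample;
    None marks a missing value and is written as `missing`.
    """
    lines = ["\t".join(["Sample"] + list(samples))]
    for identifier, values in rows.items():
        cells = [missing if v is None else str(v) for v in values]
        lines.append("\t".join([identifier] + cells))
    return "\n".join(lines) + "\n"
```

The returned string can be saved to a .tsv or .txt file and loaded as 'Genomic Data' with the first row as sample IDs.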

Basic Phenotypic data: categories or non-genomic in a rectangle/matrix/spreadsheet

These are data on a sample or patient that are categorical in nature (e.g. Tumor Stage or 'wild type' or 'mutant' for a gene) or are numerical but non-genomic (e.g. age or a genomic signature). Samples can be columns and phenotype/clinical/sample annotations can be rows, or vice versa. We support both orientations.

Supported data types

phenotype/patient/clinical data (age, weight, if there was blood drawn, etc)
sample/aliquot data (where it was sequenced, tumor weight, etc)
derived data (regulon activity for a gene, etc)
genomic signatures (EMT signature score, stemness score, etc)
other (whether a sample has an ERG-TMPRSS2 fusion, whether a sample has WGS data available, etc)

Categorical vs numerical data

We support both numerical and categorical data. For numerical data please use a blank field for any samples which may be missing data. For categorical data you can use a blank field or "NA" for any samples which may be missing data.

Note that if you use "NA" for a missing numerical field then the Xena software will automatically treat that column as a category.

To have it be treated as a numerical field please use a blank field.
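
The effect of "NA" on a numerical column can be illustrated with a small Python sketch. This is an assumption about how such a rule could work, not Xena's actual implementation, and the function name is hypothetical.

```python
def column_kind(values):
    """Classify a column as 'numerical' or 'categorical'.

    Blank fields count as missing; any other non-numeric text,
    including "NA", makes the whole column categorical.
    """
    for value in values:
        if value.strip() == "":
            continue  # blank field: missing data, does not affect the type
        try:
            float(value)
        except ValueError:
            return "categorical"
    return "numerical"
```

In this sketch, an age column of 34, blank, 51.5 stays numerical, while 34, NA, 51.5 becomes categorical.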

An example of a phenotype matrix file:

sample            ER_status   disease_status         age
TCGA-BA-4074-01   positive    complete remission
TCGA-BA-4082-01   positive    complete remission
TCGA-BA-4078-01   negative    undergoing treatment

Advanced Segmented data

For segmented data, we require the following 5 columns: sample, chr, start, end, and value. Note that your column headers must be these names exactly!

Please use 'NA' to indicate no data.

Supported data types

copy number

We currently accept hg38, hg19, hg18 coordinates.

Example segmented copy number data with required columns:

sample            chr    start      end         value
TCGA-V4-A9EL-01   chr1   61735      16815530    0.041
TCGA-V4-A9EL-01   chr1   16816090   17190862    -0.4227
TCGA-V4-A9EF-01   chr4   86979944   115173700   0.0414
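
Because the column headers must match exactly, it can help to check a segmented file before loading it. The following Python sketch is one hypothetical pre-check, not Xena's loader. It verifies that the five required headers are present and that each value is a number or 'NA'.

```python
import csv
import io

REQUIRED = ["sample", "chr", "start", "end", "value"]


def check_segmented(text):
    """Return a list of problems found in tab-delimited segmented data."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED if name not in header]
    if missing:
        return ["missing required column(s): " + ", ".join(missing)]
    problems = []
    # Data rows start on line 2 of the file, after the header.
    for line_number, row in enumerate(reader, start=2):
        value = (row["value"] or "").strip()
        if value != "NA":
            try:
                float(value)
            except ValueError:
                problems.append(
                    f"line {line_number}: value {value!r} is not a number or 'NA'"
                )
    return problems
```

An empty list means the sketch found nothing to flag. A header such as 'Sample' instead of 'sample' is reported as a missing column, since the names are case-sensitive here.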

Advanced Positional data

For positional data, we require 6 columns: sample, chr, start, end, reference, alt. Note that your column headers must be these names exactly!

Note that Xena will not call the gene, variant effect, etc for you. All gene annotation information must be included in the file

Supported data types

mutation data

We currently accept hg38, hg19, hg18 coordinates.

Example mutation data with the six required columns, plus the gene column:

sample

chr

start

end

reference

alt

gene

TCGA-AB-2802-03

chr2

29917721

ALK

TCGA-AB-2802-03

chr1

119270684

119270687

TTAAA

MYC

TCGA-AB-2867-03

chr1

150324146

PRPF3

To specify that a sample was assayed but no mutation was detected, include a line in the file with three columns filled: sample, start, and end. "start" and "end" are required to be integers (if left empty, the data loader will reject the file), so use -1 to indicate that these are bogus coordinates. The rest of the columns are empty strings.
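
Such a placeholder line can be generated as in the Python sketch below. The function name and the sample ID in the usage note are hypothetical. The column order follows the six required columns listed above.

```python
COLUMNS = ["sample", "chr", "start", "end", "reference", "alt"]


def no_mutation_row(sample_id, columns=COLUMNS):
    """Build a tab-delimited line for a sample assayed with no mutation found."""
    fields = {"sample": sample_id, "start": "-1", "end": "-1"}
    # Every other column is left as an empty string.
    return "\t".join(fields.get(name, "") for name in columns)
```

For example, no_mutation_row("SAMPLE-01") yields a line with the sample ID, -1 for start and end, and empty strings elsewhere. If your file has extra columns such as gene, pass the full header list as the columns argument so the line has the right number of fields.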