User Help Pages

How do I ...

Step-by-step instructions for our most common use cases

Overview of features

More details about all the features we have on Xena

Overview of public data

FAQ

Viewing your own data

Technical documentation

Basic Tutorial: Section 3

Learn how to use Chart View and add new columns of data to a view

Description

This tutorial is made for those who have never used Xena but who have completed Section 1 of the Basic Tutorial. We will cover how to make box plots and bar charts using our Charts and Statistics View and how to add another column of data, in particular phenotype data, to the view.

Prerequisites

Estimated time needed

Part A: 5 min

Part B: 15 min

Learning goals

Part A

  • Create a box plot using the Charts and Statistics View

Part B

  • Add another column of data to the view

  • Add phenotype data to the view

  • Create a bar chart using the Charts and Statistics View

Tutorial

Part A

We found that samples from patients with EGFR aberrations have higher EGFR gene expression. Now we will investigate whether this difference in gene expression is statistically significant.

We can now see that samples with EGFR aberrations have significantly higher EGFR gene expression.

Steps

  1. Click the 3-dot column menu at the top of the gene expression column (don't worry if you start with another column - you will be selecting the correct columns in the steps ahead).

  2. Click 'Compare subgroups', since we want to compare the group of samples that have aberrations in EGFR to the group of samples that do not.

  3. Click the dropdown for 'Show data from' and choose 'column C: EGFR - gene expression RNAseq - HTSeq - FPKM-UQ'.

  4. Click the dropdown for 'Subgroup samples by' and choose 'column B: (mis OR infra) OR C:>0.5 - Subgroup'.

  5. Click 'Done'.
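The 'Compare subgroups' box plot asks whether EGFR expression differs between the two subgroups by more than chance would explain. As an illustration only, the Python sketch below computes a Welch's two-sample t statistic and its degrees of freedom. Welch's t-test is an assumption here. Check the statistics Xena shows beside the box plot to see which test it actually ran. The expression values are made up.

```python
from statistics import mean, variance


def welch_t(group_a: list[float], group_b: list[float]) -> tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance / group size).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical EGFR expression values for the two subgroups (not TCGA data).
with_aberrations = [5.1, 6.3, 5.8, 7.0, 6.6]
without_aberrations = [3.2, 3.9, 4.1, 3.5, 3.8, 4.4]

t, df = welch_t(with_aberrations, without_aberrations)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Turning t into a p-value requires the t distribution. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test and returns both values.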

Video of steps

More information

Part B

We will now investigate how EGFR aberrations compare between samples from men and women.

We can now see that EGFR aberrations are more common in samples from females.

Steps

  1. Click the 'x' in the upper right corner to exit Chart View.

  2. Hover between columns B and C until 'Click to insert a column' becomes visible. Click on it.

  3. Choose 'Phenotypic', click in the search bar, and choose 'Advanced'.

  4. Type 'gender' into the search bar, select 'gender.demographic' from the dropdown menu, and click 'Done'.

  5. Click the column menu at the top of column C and choose 'Chart & Statistics'. Note that this is just another way to enter Chart View.

  6. Click 'Compare subgroups', since we want to compare the group of samples that have aberrations in EGFR to the group of samples that do not.

  7. 'column C: gender.demographic' should already be selected for 'Show data from'. If not, select it.

  8. 'column B: (mis OR infra) OR C:>0.5 - Subgroup' should already be selected for 'Subgroup samples by'. If not, select it.

  9. Click 'Done'.
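The bar chart compares the gender mix of the two subgroups. When both columns are categorical, a Pearson chi-square test of independence is a standard way to ask whether the proportions differ. This sketch assumes that test for illustration only. The statistic Xena reports may be computed differently, for example with a continuity correction. The counts are hypothetical.

```python
def chi_square_statistic(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic (no continuity correction) and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if subgroup and gender were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts. Rows: EGFR Aberrations, No EGFR Aberrations.
# Columns: female, male.
counts = [[60, 30],
          [200, 210]]
chi2, dof = chi_square_statistic(counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}")
```

With SciPy installed, `scipy.stats.chi2_contingency(counts, correction=False)` should report the same statistic along with a p-value.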

Video of steps 1-4

Video of steps 5-9

More information

Test your knowledge

Starting at the end of Part A, create a violin plot that compares copy number variation between samples that have EGFR aberrations and samples that do not.

Starting at the end of Part B, add the phenotype data 'age_at_earliest_diagnosis_in_years.diagnoses.xena_derived' to the plot.

Note that your column order may be different.

Tutorials and webinars

Step-by-step tutorials to get you started and our schedule of upcoming webinars

Video Tutorials

Overview of Xena (2 min)

Beginner Tutorial (17 min)

Advanced Tutorial (55 min)

Recording of tutorial given for NCI in May 2021. Includes closed captioning.

Workshop Cheatsheet/Handout

How do I ...

Request a workshop

This tutorial assumes you have done Basic Tutorial: Section 1. Basic Tutorial: Section 2 is recommended but not required. This tutorial begins where Basic Tutorial: Section 2 ends. A live link to the end of Basic Tutorial: Section 2 is at the beginning of this tutorial.

In Basic Tutorial: Section 1, we found that samples from patients with EGFR aberrations have higher EGFR expression. These aberrations could be mutations or copy number amplifications.

In Basic Tutorial: Section 2, we created two subgroups: samples with EGFR aberrations and samples without them. We ran a Kaplan Meier survival analysis and found no difference in survival between these two groups.

Now we are going to use the subgroups created in Basic Tutorial: Section 2 to see whether there is a statistically significant difference in gene expression between them. We will also look at whether samples from male or female patients have more aberrations.

To ensure your columns are sorted the same as those in this tutorial, please start at this link.

How do I ... guides are useful if you have a specific question.

We offer FREE online workshops to any group or organization, both within the USA and internationally. They can be 1-hour, 1/2-day, or 1-day in length. Please contact us for more information:

https://cbiit.webex.com/cbiit/lsr.php?RCID=acf4ea46dd9e41338662f0ba1ac59754
genome-cancer@soe.ucsc.edu

Webinars

Explore upcoming webinars and sign up to stay in the loop on new dates.

Introduction to Xena for bulk sequencing data: June 5th, 2025 10am-12pm PT

This webinar will be on June 5th, 2025, 10am-12pm PT.

In this webinar we will explore Xena's core functionalities for bulk sequencing data including

  • Visualizing mutation, expression, and copy number variation for a gene, group of genes, or chromosome

  • Visualizing clinical/phenotype data

  • Running a Kaplan-Meier survival analysis

  • Creating charts and analyses with accompanying statistics

  • Running a Differential Gene Expression analysis

All analyses will be done on publicly available data. As time permits, we’ll also do a brief demonstration of how to explore your own bulk sequencing data within Xena.

The webinar will run for 1.5 hours, followed by 30 minutes for Q&A.

More webinars coming soon!

Welcome to the Help Pages for UCSC Xena

Tutorials, Live Examples, and How to pages for UCSC Xena

Basic Tutorial: Section 2

Learn how to remove samples with no data, subgroup samples, and make Kaplan Meier plots

Description

This tutorial is made for those who have never used Xena but who have completed Section 1 of the Basic Tutorial. We will cover how to filter to just the samples you are interested in, how to create subgroups, and how to run a Kaplan Meier survival analysis.

Prerequisites

Estimated time needed

Part A: 7 min

Part B: 15 min

Part C: 5 min

Learning goals

Part A

  • Search for samples of interest

  • Remove samples with no data

Part B

  • Make subgroups

  • Rename subgroups

Part C

  • Run a Kaplan Meier survival analysis

  • Use a custom time endpoint

Tutorial

In Basic Tutorial: Section 1, we found that samples from patients with EGFR aberrations have relatively high EGFR expression. These aberrations could be mutations or copy number amplifications.

Now we are going to look at whether patients whose samples have these aberrations also have a worse survival prognosis.

Part A

Our goal is to remove samples with no data (i.e., null values) from the view. This will make the view cleaner and keep samples without data out of our Kaplan Meier survival analysis.

Steps

  1. Type 'null' into the samples search bar. This will highlight samples that have 'null' values in any column on the screen. Null means that there is no data for that sample for that column.

  2. Click the filter menu and select 'Remove samples'.

  3. Delete the search term.
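Searching for 'null' and then choosing 'Remove samples' keeps only the samples that have a value in every column on screen. The sketch below models this with a hypothetical in-memory table in which None stands in for a null (gray) cell. The sample IDs and column names are made up, and this is not how Xena stores data internally.

```python
# Hypothetical view: one entry per sample, None marks a null (gray) cell.
view = {
    "sample-1": {"copy_number": 0.9, "expression": 6.1},
    "sample-2": {"copy_number": None, "expression": 4.2},
    "sample-3": {"copy_number": 0.1, "expression": None},
}


def remove_samples_with_nulls(samples: dict) -> dict:
    """Keep only samples that have a value in every column on screen."""
    return {
        sample_id: row
        for sample_id, row in samples.items()
        if all(value is not None for value in row.values())
    }


print(list(remove_samples_with_nulls(view)))
```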

Video of steps

More information

Shortcut for Part A

Instead of typing 'null' and removing those samples from the view, you can also use the 'Remove samples with nulls' shortcut in the filter menu.

Part B

Our goal is to create two subgroups: samples with EGFR aberrations and samples without them. We will then name the subgroups.

Steps

  1. Type '(mis OR inframe) OR B:>0.5' into the samples search bar. This will select samples that either have a missense mutation or an inframe deletion '(mis OR inframe)', or whose copy number variation (column B) is greater than 0.5. Note that the cutoff of 0.5 was chosen arbitrarily.

You must have the copy number variation column as column B for the search term '(mis OR inframe) OR B:>0.5' to work. The 'B' in 'B:>0.5' instructs Xena to search column B for values greater than 0.5.

  2. Click the filter menu and select 'New subgroup column'. This will create a new column in which samples that met our search term are marked 'true' (i.e., those with an EGFR aberration) and samples that did not are marked 'false' (i.e., those without an EGFR aberration).

  3. Click the column menu for the column we just created (column B) and choose 'Display'.

  4. Rename the display so that samples that are 'true' are instead labeled 'EGFR Aberrations' and samples that are 'false' are instead labeled 'No EGFR Aberrations'. Click 'Done'.

  5. Delete the search term. This will remove the black tick marks for matching samples.
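Together, the search term and the 'New subgroup column' step split the samples into two labeled groups. The sketch below mimics that logic on hypothetical samples. The matching rule for text terms (a case-insensitive substring match against mutation labels such as 'missense_variant') is an assumption made for illustration. The labels and values are also made up.

```python
# Hypothetical samples: mutation consequence labels plus the column B (copy number) value.
samples = {
    "sample-1": {"mutations": ["missense_variant"], "B": 0.1},
    "sample-2": {"mutations": [], "B": 0.8},
    "sample-3": {"mutations": ["inframe_deletion"], "B": -0.2},
    "sample-4": {"mutations": ["synonymous_variant"], "B": 0.3},
}


def text_match(sample: dict, term: str) -> bool:
    """Assumed rule: the term appears, ignoring case, inside any mutation label."""
    return any(term.lower() in label.lower() for label in sample["mutations"])


def matches_query(sample: dict) -> bool:
    """Evaluate (mis OR inframe) OR B:>0.5 for one sample."""
    return (text_match(sample, "mis") or text_match(sample, "inframe")) or sample["B"] > 0.5


# 'New subgroup column' gives true/false; 'Display' then renames the two values.
labels = {True: "EGFR Aberrations", False: "No EGFR Aberrations"}
subgroup = {sample_id: labels[matches_query(s)] for sample_id, s in samples.items()}
for sample_id, group in subgroup.items():
    print(sample_id, group)
```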

Video of step 1

Video of steps 2-4

More information

Part C

Now that we have our subgroups we will run a Kaplan Meier survival analysis. Note that TCGA survival data is in days, hence the x-axis will be in days.

We can now see that there is no difference in survival between patients with EGFR aberrations and those without.

Steps

  1. Click the column menu at the top of column B.

  2. Choose 'Kaplan Meier Plot'.

  3. Click 'Custom survival time cutoff' at the bottom of the Kaplan Meier plot.

  4. Enter 3650, which is 10 years of 365 days expressed in days.
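Kaplan Meier curves come from the product-limit estimator. At each time a death occurs, estimated survival is multiplied by (1 - deaths / patients still at risk). The sketch below implements that estimator using the OS (1 = dead, 0 = censored) and OS.time (days) convention. How the custom cutoff is applied is an assumption. Here, any follow-up longer than the cutoff is treated as censored at the cutoff, which is the usual way to truncate a study at a fixed time. The patients are hypothetical.

```python
def kaplan_meier(times, events, cutoff=None):
    """Product-limit (Kaplan Meier) survival estimate.

    times:  follow-up in days (the OS.time column)
    events: 1 if death was observed, 0 if censored (the OS column)
    cutoff: optional time limit; longer follow-up is censored at the cutoff (assumed behavior)
    Returns (time, survival) pairs at each time with at least one death.
    """
    data = []
    for t, e in zip(times, events):
        if cutoff is not None and t > cutoff:
            t, e = cutoff, 0  # truncate follow-up at the cutoff and censor it
        data.append((t, e))
    data.sort()

    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t_now = data[i][0]
        deaths = removed = 0
        # Pool every patient whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == t_now:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t_now, survival))
        at_risk -= removed
    return curve


# Hypothetical patients: OS.time in days and the OS event indicator.
os_time = [100, 200, 200, 400, 5000]
os_event = [1, 1, 0, 1, 1]
for day, survival in kaplan_meier(os_time, os_event, cutoff=3650):
    print(day, round(survival, 3))
```

For these made-up patients, survival steps down to 0.8, 0.6, and 0.3 at days 100, 200, and 400. The patient followed for 5000 days is censored at day 3650 and does not add a step.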

Video of steps

More information

Test your knowledge

Starting at the end of Part A, filter down to only the samples that have a missense mutation.

Search term: "missense"

Starting at the end of Part A, create two subgroups: samples with EGFR expression greater than 4 and samples with EGFR expression of 4 or less.

Search term: "C:>4"

Starting at the end of Part A, run a Kaplan Meier analysis on the EGFR expression column.

Basic Tutorial: Section 1

Learn to create your first views in Xena

Description

This tutorial is made for those who have never used Xena. We will cover how to create a Visual Spreadsheet with gene expression, mutation, and copy number variation data.

Prerequisites

This tutorial assumes basic knowledge of

  • gene expression, copy number variation, and mutational genomic sequencing data

  • how a change in copy number variation or mutations can lead to a change in gene expression

  • The Cancer Genome Atlas (TCGA)

These resources can help you gain basic knowledge of these concepts:

Estimated time needed

Part A: 5 min

Part B: 10 min

Learning goals

Part A

  • Create a Visual Spreadsheet

  • Compare data across columns

Part B

  • Move columns

  • Resize columns

  • Zoom in and out

Tutorial

We are going to look at EGFR aberrations in patients with lung adenocarcinomas using TCGA data. We will be looking at mutations and copy number aberrations and how they change gene expression.

Part A

Our goal is to build a Visual Spreadsheet and understand the relationship between the columns of data.

Steps

  1. Type 'GDC TCGA Lung Adenocarcinoma (LUAD)', select this study from the drop down menu, and click 'To first variable'.

  2. Type 'EGFR', select the checkboxes for Gene Expression, Copy Number, and Somatic Mutation, and click 'To second variable'.

Video of steps:

How to read a Visual Spreadsheet

Samples are on the y-axis and your columns of data are on the x-axis. We line up the columns so that each row is the same sample, allowing you to easily see trends in the data. Samples are always sorted by the leftmost data column. Ties are broken by the next column to the right, and so on.
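The sorting rule works like a multi-key sort. The first data column is the primary key, and each column to its right breaks any ties left by the columns before it. The sketch below assumes numeric columns sorted from high to low, with nulls at the bottom. Those details, the sample IDs, and the values are illustrative assumptions. Non-numeric columns are not covered.

```python
# Hypothetical rows: (sample ID, column B value, column C value); None is a null.
rows = [
    ("sample-1", 2.0, 5.5),
    ("sample-2", 3.1, 4.0),
    ("sample-3", 2.0, 7.2),
    ("sample-4", None, 6.0),
]


def sort_key(row):
    """Column B first, then column C; high values first and nulls last (assumed)."""
    key = []
    for value in row[1:]:
        # False sorts before True, so non-null values come before nulls.
        key.append((value is None, 0.0 if value is None else -value))
    return key


for sample_id, *_ in sorted(rows, key=sort_key):
    print(sample_id)
```

sample-1 and sample-3 tie in column B, so column C decides their order.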

Biological interpretation

We can see that samples from TCGA patients that have high expression of EGFR (red, column B) tend to either have amplifications of EGFR (red, column C) or mutations in EGFR (blue tick marks, column D).

More information

Making your own Visual Spreadsheet: Which TCGA study to choose

Part B

We will now move columns to change the sort order and resize columns. We will also zoom in on the whole Visual Spreadsheet and within a single column.

Steps

  1. Move columns. Click column C, copy number variation, and drag it to the left so that it becomes the first column after the samples column (i.e. column B). Note that the samples are now sorted by the values in this column.

  2. Resize columns. Click the handle in the lower right corner of column D, mutation. Move it to the right to make the column bigger.

  3. Zoom in on a column. Click and drag within column D. Release to zoom.

  4. Zoom out on a column. Click the red zoom out text at the top of column D.

  5. Zoom in on samples. Click and drag vertically in any column in the Visual Spreadsheet to zoom in on these samples.

  6. Zoom out on samples. To zoom out click either 'Zoom out' or 'Clear zoom' at the top of the Visual Spreadsheet.

Video of step 1

Video of step 2

Video of steps 3-6

More information

Test your knowledge

Create a Visual Spreadsheet looking at TP53 gene expression and mutation in samples from patients in the GDC TCGA Lower Grade Glioma study.

Change the Visual Spreadsheet from Question 1 so that the samples are sorted by mutations rather than gene expression.

Advanced Tutorial: Section 2

Learn how to use the pick samples feature, how to view multiple genes in a single column, how to view a signature, and how to run a differential expression analysis

Description

This tutorial is made for those who have basic knowledge of how to use Xena. We will cover how to use the pick samples feature, how to view multiple genes in a single column, how to enter and view a signature, and how to run a differential expression analysis.

Prerequisites

Estimated time needed

Part A: 10 min

Part B: 5 min

Part C: 15 min

Learning goals

Part A

  • Create a visual spreadsheet with a single column containing multiple genes.

  • Filter to only Primary Tumor samples using the Pick Samples mode.

  • Remove nulls using the option in the filter menu

Part B

  • Enter and view a gene expression signature

Part C

  • Run a differential expression analysis.

Tutorial

We will investigate the PAM50 molecular subtypes in breast cancer. PAM50 is a 50-gene signature that classifies breast cancer into five molecular intrinsic subtypes: Luminal A, Luminal B, HER2-enriched, Basal-like, and Normal-like.

Part A

We will make a visual spreadsheet where we can explore the relationship between the PAM50 subtype call and the 50 genes that make up the PAM50 subtype call.

Steps

  1. Type 'TCGA Breast Cancer (BRCA)', select this study from the drop down menu, and click 'To first variable'.

  2. Choose 'Phenotypic', select 'sample_type' from the dropdown menu, and click 'To second variable'.

  3. Choose 'Phenotypic', click on 'advanced', type 'pam' into the search bar, select 'PAM50Call_RNAseq' from the dropdown menu, and click 'Done'. This will exit the wizard.

  4. Click on 'Click to insert a column' after column C. Copy and paste the 50 genes, choose 'Gene Expression', and click 'Done'.

  5. Click the handle in the lower right corner of column D, the 50-gene expression column. Move it to the right to make the column bigger.

List of 50 genes used to calculate the PAM50 subtype call:

UBE2T BIRC5 NUF2 CDC6 CCNB1 TYMS MYBL2 CEP55 MELK NDC80 RRM2 UBE2C CENPF PTTG1 EXO1 ORC6L ANLN CCNE1 CDC20 MKI67 KIF2C ACTR3B MYC EGFR KRT5 PHGDH CDH3 MIA KRT17 FOXC1 SFRP1 KRT14 ESR1 SLC39A6 BAG1 MAPT PGR CXXC5 MLPH BCL2 MDM2 NAT1 FOXA1 BLVRA MMP11 GPR160 FGFR4 GRB7 TMEM45B ERBB2

Video of steps 1-4

Video of steps 5-6

Steps continued

  1. Click on the picker icon next to the filter menu to enter pick samples mode.

  2. Click on the Primary Tumor samples.

  3. Click the filter menu and select 'Keep samples'.

  4. Exit pick samples mode by clicking on the picker icon again.

  5. Click the filter menu and select 'Remove samples with nulls'.

Video of steps 1-5

More information:

Part B

We will now look at the TFAC30 gene expression signature and see how it relates to the PAM50 subtype calls. This signature predicts pathologic complete response (pCR) to preoperative weekly paclitaxel and fluorouracil-doxorubicin-cyclophosphamide (T/FAC) chemotherapy.

Steps

  1. Click on 'Click to insert a column' after column D. Copy and paste the signature below, choose 'Gene Expression', and click 'Done'. Note that you need to include the '=', because it tells Xena that you want the signature rather than each gene shown individually.

TFAC30 gene expression signature:

=E2F3 + MELK + RRM2 + BTG3 - CTNND2 - GAMT - METRN - ERBB4 - ZNF552 - CA12 - KDM4B - NKAIN1 - SCUBE2 - KIAA1467 - MAPT - FLJ10916 - BECN1 - RAMP1 - GFRA1 - IGFBP4 - FGFR1OP - MDM2 - KIF3A - AMFR - MED13L - BBS4
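The formula can be read as one score per sample: the expression of each '+' gene is added and the expression of each '-' gene is subtracted. This interpretation, with an implicit weight of 1 on every gene, is an assumption used for this sketch. The sketch does not handle numeric coefficients, and the expression values in the example are made up. A gene missing from a sample's values raises a KeyError.

```python
# The TFAC30 formula from above, split over several lines for readability.
TFAC30 = (
    "=E2F3 + MELK + RRM2 + BTG3 - CTNND2 - GAMT - METRN - ERBB4 - ZNF552 - CA12 - KDM4B"
    " - NKAIN1 - SCUBE2 - KIAA1467 - MAPT - FLJ10916 - BECN1 - RAMP1 - GFRA1 - IGFBP4"
    " - FGFR1OP - MDM2 - KIF3A - AMFR - MED13L - BBS4"
)


def parse_signature(formula: str) -> list[tuple[int, str]]:
    """Turn '=GENE1 + GENE2 - GENE3' into [(1, 'GENE1'), (1, 'GENE2'), (-1, 'GENE3')]."""
    if not formula.startswith("="):
        raise ValueError("a signature formula must start with '='")
    terms = []
    sign = 1
    for token in formula[1:].split():
        if token in ("+", "-"):
            sign = 1 if token == "+" else -1
        else:
            terms.append((sign, token))
            sign = 1  # a gene with no operator before it (the first one) counts as '+'
    return terms


def signature_score(terms: list[tuple[int, str]], expression: dict[str, float]) -> float:
    """Signed sum of one sample's expression values (weights of 1 assumed)."""
    return sum(sign * expression[gene] for sign, gene in terms)


terms = parse_signature(TFAC30)
print(len(terms), "genes,", sum(sign == 1 for sign, _ in terms), "with a '+' sign")
print(signature_score(parse_signature("=E2F3 + MELK - MAPT"),
                      {"E2F3": 2.0, "MELK": 3.5, "MAPT": 1.5}))
```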

We can now see that samples labeled 'Her2' and 'Basal' are predicted to be more likely to achieve pCR on T/FAC chemotherapy.

Video of step 1

More information

Part C

We will run a differential expression analysis comparing Basal samples to Luminal A and Luminal B samples.

Steps

  1. Click the column menu for the PAM50 subtype call (column C) and choose 'Differential Expression'. This will open a new tab where we will run the analysis.

  2. Choose the first subgroup to be 'Basal' and the second subgroup to be 'LumA' and 'LumB'. Hold the shift key while clicking to select multiple groups.

  3. Click 'Submit'.

Note it can take a while for the analysis to run. Wait until it says 'Success' at the top.

Video of steps 1-3

More information

Advanced Tutorial: Section 1

Learn how to view whole chromosomes and view advanced datasets such as exon expression

Description

This tutorial is made for those who have basic knowledge of how to use Xena. We will cover how to view whole chromosomes and how to use the advanced dataset menu to access datasets such as exon expression.

Prerequisites

Estimated time needed

10 min

Learning goals

Create a visual spreadsheet with a chromosome-wide column and data from the advanced dataset menu.

Tutorial

We will look at the ERG-TMPRSS2 gene fusion in patients from the TCGA Prostate Cancer study.

ERG is an oncogene that is expressed at low levels in normal prostate tissue. Some patients' prostate cancer samples have higher expression of ERG. These samples tend to have an intra-chromosomal deletion that fuses ERG to TMPRSS2, a gene that is expressed at high levels in normal prostate tissue. The fusion lets ERG use the TMPRSS2 promoter, which increases ERG expression.

Note that column D may look slightly different, depending on how you resize and zoom the column.

We can now see that many samples have relatively high expression of ERG (column B). This high expression is not uniform across the exons of ERG but is concentrated in the exons closer to the 3' end of the gene (column C). Looking at column D, we can see that these samples also have an intra-chromosomal deletion of part of chromosome 21. If we hover over the genes at either end of the deletion, we can see that the end points fall within ERG and TMPRSS2.

Steps

  1. Type 'TCGA Prostate Cancer (PRAD)', select this study from the drop down menu, and click 'To first variable'.

  2. Type 'ERG', select the checkbox for Gene Expression and click 'To second variable'.

  3. Type 'ERG', click 'Show Advanced', select the checkbox for 'IlluminaHiSeq' under 'exon expression RNAseq', and click 'Done'.

  4. Click the text 'Click to insert a column' after column C. Type 'chr21', select the checkbox for Copy Number and click 'Done'.

  5. Click on the filter menu and select 'Remove samples with nulls'

  6. Click on the handle in the lower right corner of column D, copy number for chromosome 21. Move it to the right to make the column bigger.

  7. Click and drag within column D, copy number for chromosome 21, to zoom into the intra-chromosomal deletion.

Video of steps 1-4

Video of steps 5-8

More information:

Test your knowledge

Add copy number data for chromosome 1.

Add DNA Methylation data for ERG.

Tutorial: Tumor vs Normal

Learn how to compare tumor samples to normal samples using our TCGA TARGET GTEx study

Description

This tutorial is made for those who have basic knowledge of how to use Xena. We will cover how to view tumor and normal samples from healthy and diseased individuals together, and how to compare gene expression for one or more genes between tumor and normal samples.

We will be using both GTEx samples as our normal samples as well as TCGA matched normal samples. More information on GTEx normal samples can be found here:

Prerequisites

Estimated time needed

Part A: 10 min

Part B: 5 min

Learning goals

Part A

  • Build a visual spreadsheet with the columns primary site, sample type, study, and gene expression for the TCGA TARGET GTEx study.

  • Filter to just colon samples.

Part B

  • Create a box plot using the Charts and Statistics View

Tutorial

We will compare MYC gene expression between TCGA colon adenocarcinoma tumor samples and normal colon tissue from individuals in GTEx.

Part A

Our goal is to build a visual spreadsheet with the columns 'primary site', 'sample type', 'study', and gene expression for MYC for the TCGA TARGET GTEx study. We will then filter to samples from the colon.

We can now see that normal samples tend to have lower MYC gene expression.

Steps

  1. Type 'TCGA TARGET GTEx', select this study from the drop down menu, and click 'To first variable'.

  2. Type 'MYC', select the checkbox for Gene Expression and click 'To second variable'.

  3. Choose 'Phenotypic' and select the checkboxes for 'sample type', 'study' and 'Primary site', and click 'Done'.

  4. Type 'colon' in the samples search bar and choose 'Keep samples'.

Video of steps 1-4

Video of step 5

Part B

Our goal is to see if the difference in gene expression, where normal samples tend to have lower MYC gene expression, is statistically significant.

We can now see that tumor samples (primary, recurrent, and metastatic) have higher MYC expression than normal tissue. This holds for both matched normal tissue from TCGA patients and unmatched normal tissue from GTEx individuals.

Steps

  1. Click the column menu for column B (MYC gene expression) and choose 'Charts & Stats'

  2. Click 'Compare subgroups', click the dropdown for 'Show data from' and choose 'column B: MYC - gene expression RNAseq - RSEM norm_count' if it is not already selected

  3. Click the dropdown for 'Subgroup samples by' and choose 'column C: Sample Type'.

  4. Leave the chart type as 'box plot', and click 'Done'.

Video of steps 1-4

Test your knowledge

Compare EGFR gene expression between lung tumor samples and normal lung tissue.

How do I compare tumor vs normal expression?

TCGA matched normal vs. GTEx normal

Using the TCGA TARGET GTEx study

To compare tumor vs normal, you will need to filter down to just the samples you want to compare and then compare gene expression between your groups of samples.

More information:

There are four gene expression datasets in this study. Two are normalized using within-sample methods only. The other two are normalized across samples: the 'RSEM norm_count' dataset by the upper quartile method, and the 'RSEM expected_count (DESeq2 standardized)' dataset by DESeq2 normalization. Comparing tumor with normal samples requires values that are comparable between samples, so these two gene expression datasets should be used.
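As commonly defined, upper quartile normalization scales each sample by the 75th percentile of its non-zero values, so that differences in sequencing depth cancel out. The sketch below illustrates that idea on two made-up samples. It is not the exact procedure used to build the 'RSEM norm_count' dataset. Details such as any rescaling constant or the handling of zeros are not described here and are left out.

```python
from statistics import quantiles


def upper_quartile(values: list[float]) -> float:
    """75th percentile of a sample's non-zero values (needs at least two of them)."""
    nonzero = [v for v in values if v > 0]
    return quantiles(nonzero, n=4)[2]  # the three cut points are Q1, median, Q3


def upper_quartile_normalize(samples: dict[str, list[float]]) -> dict[str, list[float]]:
    """Divide each sample's values by that sample's own upper quartile."""
    normalized = {}
    for sample_id, values in samples.items():
        uq = upper_quartile(values)
        normalized[sample_id] = [v / uq for v in values]
    return normalized


# Made-up counts for six genes; sample-b looks like sample-a sequenced at half the depth.
counts = {
    "sample-a": [0, 10, 20, 30, 40, 100],
    "sample-b": [0, 5, 10, 15, 20, 50],
}
print(upper_quartile_normalize(counts))
```

After normalization the two made-up samples become identical, because dividing by each sample's own upper quartile removes the depth difference.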

Running a Differential Gene Expression Analysis

More information:

Tutorial

Live examples

Live Examples of what types of visualizations and analyses you can perform using UCSC Xena

Workshop cheatsheet/handout

Xena mutation views support examination of both coding and non-coding mutations from whole genome analysis. We support viewing mutations from either a gene-centric or a coordinate-centric perspective. In the gene-centric view, you can dynamically toggle to show or hide introns. This figure shows the frequent intron mutations in 321 samples from the ICGC lymphoma cohorts. These 'pile-ups' would not be visible if viewing mutations only in the exome. These intron mutations overlap with known enhancer regions (Mathelier 2015).

How do I remove null data (gray lines) from view?

Sometimes not all samples in a dataset have data. This can happen for a variety of reasons, such as a particular patient's sample not undergoing one or more analyses. In this case, we use gray ('null') to show that there is no data.

To remove null data use the 'Remove samples with nulls' shortcut in the filter menu.

Example

Tutorial: Viewing your own data

Learn how to view your own data using data from the Chinese Glioma Genome Atlas (CGGA)

Description

This tutorial is made for those who have basic knowledge of how to use Xena. We will cover how to load your own data into a Xena Hub on your computer. We will then view the data in the Xena Browser.

We will be viewing RNAseq and clinical data from the Chinese Glioma Genome Atlas (CGGA).

Prerequisites

To format the datasets you will need access to a spreadsheet application, such as Microsoft Excel.

To load the data into a Local Xena Hub you will need a computer where you have installation privileges.

Estimated time needed

Part A: 10 min

Part B: 15 min

Part C: 10 min

Learning goals

Part A

  • Download data from CGGA

  • Use Microsoft Excel or another spreadsheet application to make small formatting adjustments. These adjustments are only to enable Kaplan Meier analyses. Data can be visualized as is.

Part B

  • Download and install a Local Xena Hub

  • Load data into the Xena Hub on your computer

Part C

  • Make a visual spreadsheet from the data in the Xena Hub on your computer

  • Create a box plot

  • Run a Kaplan Meier Analysis

Tutorial

Part A

We will start by downloading the files from the CGGA. These files already conform to our data file requirements. They are tab-delimited matrices with sample IDs along one axis and probe, gene, or clinical variable names along the other.
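To see what "a matrix with sample IDs along one axis" means in practice, you can parse two tiny stand-in files. The sketch below assumes the genomic file has sample IDs across its first row and the clinical file has them down its first column. This matches the load options chosen in Part B ('The first row is sample IDs' and 'The first column is sample IDs'). The file contents and header names are made up. We also assume the hub connects the two files through the sample IDs they share.

```python
import csv
import io

# Made-up stand-ins for the two files (the real files are much larger).
genomic_tsv = "Gene_Name\tS1\tS2\nIDH1\t50.1\t12.3\nEGFR\t8.0\t30.5\n"
clinical_tsv = "Sample_ID\tAge\nS1\t43\nS2\t51\n"


def read_tsv(text: str) -> list[list[str]]:
    """Parse tab-delimited text into rows of strings."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


genomic = read_tsv(genomic_tsv)
clinical = read_tsv(clinical_tsv)

# Genomic matrix: sample IDs run across the first row, after the corner cell.
genomic_samples = genomic[0][1:]
# Phenotype matrix: sample IDs run down the first column, below the header.
clinical_samples = [row[0] for row in clinical[1:]]

# Sample IDs present in both files (assumed to be what links them).
shared = sorted(set(genomic_samples) & set(clinical_samples))
print(shared)
```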

For more information see:

While we can load the files exactly as is, we will perform a small format adjustment so that we can create a Kaplan Meier plot. Our Kaplan Meier analyses need two columns of clinical data to create a plot: the event/censor column and the time to that event/censor. These columns need to be specially named so that our Kaplan Meier analysis recognizes them. For Overall Survival, the column names need to be 'OS' and 'OS.time'.

For more information on other supported columns for our Kaplan Meier analysis see:

Steps to format the file

  1. Click to download the 'Clinical Data' and 'Expression Data from STAR+RSEM'. Unzip the files. The resulting files should be named 'CGGA.mRNAseq_693.RSEM-genes.20200506.txt' and 'CGGA.mRNAseq_693_clinical.20200506.txt'.

  2. Open CGGA.mRNAseq_693_clinical.20200506.txt in a spreadsheet application like Microsoft Excel. If the application asks, tell it that the file is tab-delimited.

  3. Rename the column header 'OS' to be 'OS.time'.

  4. Rename the column header 'Censor (alive=0; dead=1)' to be 'OS'.

  5. Save and close the file.
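If you prefer a script to a spreadsheet application, you can do the two header renames in Python. This sketch assumes the file is UTF-8 text with a single tab-delimited header line. The function names are hypothetical. Both renames are applied in one pass. Renaming 'Censor (alive=0; dead=1)' to 'OS' first and then 'OS' to 'OS.time' would turn both columns into 'OS.time', which is also why the manual steps above rename 'OS' first.

```python
from pathlib import Path

# Header renames needed for the overall-survival Kaplan Meier analysis.
RENAMES = {
    "OS": "OS.time",                   # survival time
    "Censor (alive=0; dead=1)": "OS",  # event indicator: 1 = dead, 0 = alive (censored)
}


def rename_header(text: str) -> str:
    """Rename columns in the first (header) line of tab-delimited text."""
    header_line, newline, body = text.partition("\n")
    names = header_line.split("\t")
    # One lookup per column, so the freshly renamed 'OS' is never renamed again.
    renamed = [RENAMES.get(name, name) for name in names]
    return "\t".join(renamed) + newline + body


def rename_file(path_in: str, path_out: str) -> None:
    """Apply rename_header to a whole file (assumes UTF-8 text)."""
    text = Path(path_in).read_text(encoding="utf-8")
    Path(path_out).write_text(rename_header(text), encoding="utf-8")
```

For example, `rename_file("CGGA.mRNAseq_693_clinical.20200506.txt", "CGGA.mRNAseq_693_clinical.20200506.txt")` would overwrite the clinical file in place, like step 5.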

There is no need to open CGGA.mRNAseq_693.RSEM-genes.20200506.txt since it is ready to be loaded into the Local Xena Hub on your computer as is.

Part B

Steps

If this is your first time viewing your own data

2. Click 'Open UCSC Xena' to set your computer up to automatically open the Xena Hub when you come to this page in the future.

3. Click on 'download & run a Local Xena Hub' to download the correct installer for your computer.

4. Double-click the installer to install the Xena Hub on your computer. Follow onscreen instructions, which vary by operating system.

If you already have viewed your own data

2. Wait for 30 seconds. If you allowed your browser to open the Xena Hub every time you come to this screen, it will open the Xena Hub and this dialog box will close. If you did not, you will need to go to your Applications folder and open UCSC Xena yourself.

Whether you have viewed your own data before or not, you should arrive at a screen like this:

If you have already loaded data previously, you may see datasets and cohorts listed at the bottom of the screen

Steps to load the data files

  1. Click the 'Load Data' button.

  2. Click 'Select Data File', choose 'CGGA.mRNAseq_693_clinical.20200506.txt', and click 'Next'.

  3. Choose 'Phenotypic Data' and click 'Next'.

  4. Choose 'The first column is sample IDs' and click 'Next'

  5. Choose 'These are the first data on these samples.', change the study name to 'CGGA', and click 'Import'.

  6. Choose 'Load more data'

  7. Click 'Select Data File', choose 'CGGA.mRNAseq_693.RSEM-genes.20200506.txt', and click 'Next'.

  8. Choose 'Genomic Data' and click 'Next'.

  9. Confirm selection of 'The first row is sample IDs' and click 'Next'

  10. Choose 'I have loaded other data on these samples and want to connect to it.', select 'CGGA' from the drop down, and click 'Import'.

Video of steps 1-6

Video of steps 7-10

Note that it can take several minutes for the RNAseq data to load since it is larger.

Part C

We will look at the chromosome 1p/19q co-deletion in Chinese glioma patients and compare it to IDH1 expression.

Ending Screenshot for Visual Spreadsheet (end of step 5)

Ending Screenshot for Box plot (end of step 11)

Ending Screenshot for Kaplan Meier Analysis (end of step 13)

Steps

  1. Click on 'Visualization' in the top menu bar.

  2. Type 'CGGA', choose 'CGGA' as the study and click 'To first variable'.

  3. Enter the gene 'IDH1', choose 'CGGA.mRNAseq_693.RSEM-genes.20200506.txt', and click 'To second variable'

  4. Choose 'Phenotypic', click '1p19q_codeletion_status', and click 'Done'

  5. The dataset authors annotated samples without a 1p/19q co-deletion status with 'NA'. To remove these samples, type 'NA' in the samples search bar and choose 'Remove Samples' from the filter actions menu drop down.

  6. Compare IDH1 expression between samples with a 1p/19q co-deletion and samples without one. To do this, click on the column menu for column B (IDH1 expression) and choose 'Charts & Stats'.

  7. Choose 'Compare Subgroups'.

  8. Click the dropdown for 'Show data from' and choose 'column B: IDH1 - CGGA.mRNAseq_693.RSEM-genes.20200506.txt'.

  9. Click the dropdown for 'Subgroup samples by' and choose 'column C: 1p19q_codeletion_status - CGGA.mRNAseq_693_clinical.20200506.txt'.

  10. Click 'Done'.

  11. Close the chart using the 'x' in the upper left corner.

  12. Run a Kaplan Meier analysis comparing patients with high IDH1 expression to those with low IDH1 expression. To do this, click on the column menu for column B (IDH1 expression) and choose 'KM plot'

Video of steps 2-4

Video of steps 5-10

Video of steps 11-12


This tutorial assumes completion of Basic Tutorial: Section 1 and begins where that section ends.

To ensure your columns are sorted the same as those in this tutorial, please start at this link.

Start at our home page and click on 'Launch Xena'. You are now in our Visual Spreadsheet Wizard.

In this example we selected the TCGA data from the GDC.

This tutorial assumes basic knowledge of how to build and read a Visual Spreadsheet. To get this, go through Basic Tutorial: Section 1. It also assumes basic knowledge of filtering. To get this, go through Basic Tutorial: Section 2.

Start at https://xenabrowser.net/.

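This tutorial assumes basic knowledge of how to build and read a Visual Spreadsheet. To get this, go through Basic Tutorial: Section 1.

Start at https://xenabrowser.net/.

This tutorial assumes basic knowledge of how to build and read a Visual Spreadsheet. To get this, go through Basic Tutorial: Section 1.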

Start at our home page and click on 'Launch Xena'. You are now in our Visual Spreadsheet Wizard.

There are two main sources of normal expression data in Xena. The first is matched normal tissue samples from TCGA patients. These samples are called "solid tissue normals" and are taken from tissue near the tumor. Normal samples from TCGA patients are typically limited in number, but some cancer types may have enough for a robust statistical comparison. Note that because these samples are taken close to the tumor, they may carry a tumor microenvironment signal. The second source of normal expression is GTEx. GTEx has expression data from normal tissue of individuals who do not have cancer. There are typically many more samples in GTEx than in TCGA solid tissue normals. However, GTEx experimental sample processing is different from TCGA's, which may lead to batch effects.

You can use the TCGA TARGET GTEx study for both types of 'normal' samples. Data in this study is from the UCSC RNA-seq Compendium, where TCGA, TARGET, and GTEx samples are re-analyzed by the same RNA-seq pipeline. This pipeline involved re-aligning the reads to the hg38 genome and calling gene expression using RSEM and Kallisto. Because all samples are processed using a uniform bioinformatic pipeline, batch effects due to different computational processing are eliminated. Note that the samples from this study have only undergone per-sample normalization.

If you are looking to compare just a few genes, you can use our chart view to run your analysis. If you are looking to run a genome-wide differential gene expression analysis, you can use our DEA feature. Note that we only allow users to run our Differential Gene Expression Analysis on fewer than 2,000 samples total, so you will need to filter this dataset before running the analysis.

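To visualize the data, you will need basic knowledge of how to build and read a Visual Spreadsheet, how to filter samples, how to create a box plot in Chart View, and how to run a Kaplan Meier Analysis. To get this, go through the Basic Tutorials, starting with Basic Tutorial: Section 1.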

Go to http://www.cgga.org.cn/download.jsp and scroll to the DataSet ID mRNAseq_693.

1. Click 'VIEW MY DATA' at the top of the screen. You should see a screen similar to this:

Please see our FAQ/Troubleshooting Guide or contact us if you encounter any problems.

Note that we are unable to provide links to these ending screenshots because we do not allow users to create bookmarks when viewing data from their own Local Xena Hubs. This is to protect the privacy of your data.

There are 4 versions of the TCGA data in Xena.
This page can help you decide which version of TCGA data to use for your own analysis.
FOXM1a, FOXM1b, FOXM1c transcript expression in tumor vs. normal samples
PDL1 and PD1 expression across 39 cancer types in TCGA and TARGET
MGMT gene expression ~ promoter DNA methylation in GBM
ERG-TMPRSS2 fusion in prostate cancer
KM plot of breast cancer PAM50 subtypes
Genetic separation of lower grade gliomas: one characterized by loss of chromosomes 1p & 19q, the other by TP53 & ATRX mutations
Copy number for EGFR, PTEN, chromosome 1, 7, 10, 19 in TCGA brain tumors
Mutation pile-ups in intron enhancers in ICGC lymphoma
Stemness scores across TCGA cancer types
For more information see our Bookmarks help section.

How do I make more than 2 subgroups?

To make more than 2 sample subgroups, enter multiple search terms, such as 'C:>15' into the search box. Separate each search term with a ';'.

This can be used for a number of situations:

  • To divide a single numerical column into more than 2 subgroups (e.g. geneA high, geneA mid, and geneA low)

  • To make subgroups over the expression of two genes such that you get 4 subgroups (e.g. geneA high + geneB high, geneA low + geneB high, geneA high + geneB low, geneA low + geneB low)

  • To make subgroups over the expression of a gene and a categorical column (e.g. geneA high + Estrogen Receptor positive, geneA low + Estrogen Receptor positive, geneA high + Estrogen Receptor negative, geneA low + Estrogen Receptor negative)

  • To make subgroups over two categorical columns (e.g. Estrogen Receptor positive + HER2 positive, Estrogen Receptor negative + HER2 positive, Estrogen Receptor positive + HER2 negative, Estrogen Receptor negative + HER2 negative)

See below for an example of each.

Examples

Dividing a single numerical column into more than 2 subgroups (e.g. geneA high, geneA mid, and geneA low)

In the screenshot below you can see that column D ranges from 7.3 to 12. If you wanted to have 3 groups (7.3 - 9, 9 - 10, and 10 - 12), you would enter:

C:>9 ; C:>10

into the search bar and then choose 'New subgroup column' from the filter/subgroup drop down menu.
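To see why two thresholds on the same column give three subgroups rather than four, it can help to model each ';'-separated term as an independent true/false test and to label every sample by the combination of terms it matches. The Python sketch below is a hypothetical model of that behavior, not Xena's code; the `subgroup_labels` helper and the sample values are made up for illustration.

```python
from collections import Counter

# Hypothetical samples: each dict maps a column letter to that sample's value.
samples = [{"C": v} for v in [7.3, 8.1, 9.5, 9.9, 10.4, 12.0]]

# 'C:>9 ; C:>10' modelled as two independent predicates (an assumption).
terms = [lambda s: s["C"] > 9, lambda s: s["C"] > 10]

def subgroup_labels(samples, terms):
    """Label each sample with the tuple of search terms it matches."""
    return [tuple(term(s) for term in terms) for s in samples]

labels = subgroup_labels(samples, terms)
print(sorted(set(labels)))  # [(False, False), (True, False), (True, True)]
print(Counter(labels))
```

Because no sample can be above 10 without also being above 9, the (False, True) combination never occurs, which leaves three groups. Under the same model, two independent terms such as 'E:>13 ; C:Negative' can produce all four combinations.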

Making subgroups over the expression of two genes such that you get 4 subgroups (e.g. geneA high + geneB high, geneA low + geneB high, geneA high + geneB low, geneA low + geneB low)

Making subgroups over the expression of a gene and a categorical column (e.g. geneA high + Estrogen Receptor positive, geneA low + Estrogen Receptor positive, geneA high + Estrogen Receptor negative, geneA low + Estrogen Receptor negative)

In the screenshot below you can see that column E (ERBB2 gene expression) ranges from 10 to 16. If you wanted to have 4 groups (ERBB2 > 13 + Estrogen Receptor positive, ERBB2 <= 13 + Estrogen Receptor positive, ERBB2 > 13 + Estrogen Receptor negative, ERBB2 <= 13 + Estrogen Receptor negative), you would enter:

E:>13 ; C:Negative

into the search bar and then choose 'New subgroup column' from the filter/subgroup drop down menu.

Making subgroups over two categorical columns (e.g. Estrogen Receptor positive + HER2 positive, Estrogen Receptor negative + HER2 positive, Estrogen Receptor positive + HER2 negative, Estrogen Receptor negative + HER2 negative)

In the screenshot below, if you wanted to have 4 groups (Estrogen Receptor positive + HER2 positive, Estrogen Receptor negative + HER2 positive, Estrogen Receptor positive + HER2 negative, Estrogen Receptor negative + HER2 negative), you would enter:

C:Negative ; D:Negative

into the search bar and then choose 'New subgroup column' from the filter/subgroup drop down menu.

How do I make subgroups with geneA high and geneB high?

This page details how to create subgroups based on the expression of 2 genes so that you create the following 4 subgroups:

  • geneA expression is high AND geneB expression is high

  • geneA expression is high AND geneB expression is low

  • geneA expression is low AND geneB expression is low

  • geneA expression is low AND geneB expression is high

To do this, enter a search term for each gene, such as 'C:>15' or 'D:<0.6', into the search box and separate the search terms with a ';'.

Example

The search bar shows the expression used to make column A, using CD44 and CD24 as the example genes.

Also note that you can use this feature on columns besides gene expression, such as copy number variation, etc. You can also use it on categorical features, for instance to compare expression of a gene and the patient's gender (male or female). Simply add the gender column to the Visual Spreadsheet and enter 'female' for one of the search terms above.

How do I make subgroups?

Use the find samples feature (highlighted below) to make subgroups:

First, search for all the patients' samples you want in one of your subgroups. Next, click the Filter + Subgroup menu and choose 'New subgroup column'.

This will create a new subgroup column. All the samples that matched your search term will be in one subgroup labeled 'true', and all the samples that did not match will be in the other subgroup labeled 'false'.

More information:

Example

In this example we are creating two subgroups in the TCGA Lung Adenocarcinoma study: patient's samples with aberrations in EGFR and those without. These aberrations could be mutations or copy number amplifications.

Steps

  1. Type '(mis OR infra) OR C:>0.5' into the samples search bar. This will select samples that either have a missense mutation or an in-frame deletion '(mis OR infra)', or whose copy number variation (column C) is greater than 0.5. Note that the cutoff of 0.5 was chosen arbitrarily.

  2. Click the filter menu and select 'New subgroup column'. This will create a new column in which samples that met our search term are marked 'true' (i.e. those that have an EGFR aberration) and samples that did not are marked 'false' (i.e. those that do not have an EGFR aberration).
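The search term in step 1 is a boolean expression over two columns. A rough Python analogue is sketched below; it assumes that 'mis' and 'infra' behave as case-insensitive substring matches on the mutation effect (for example 'missense_variant' and 'inframe_deletion') and that a null copy number never satisfies 'C:>0.5'. The sample IDs, values, and the `has_egfr_aberration` helper are hypothetical.

```python
# Hypothetical samples: EGFR mutation effects and EGFR copy number (column C).
samples = {
    "S1": {"effects": ["missense_variant"], "cnv": 0.1},
    "S2": {"effects": [], "cnv": 0.9},
    "S3": {"effects": ["inframe_deletion"], "cnv": None},
    "S4": {"effects": ["synonymous_variant"], "cnv": 0.2},
    "S5": {"effects": [], "cnv": None},
}

def has_egfr_aberration(sample, cnv_cutoff=0.5):
    """Rough analogue of the search '(mis OR infra) OR C:>0.5'."""
    # Assumption: 'mis' and 'infra' match as case-insensitive substrings.
    mutated = any(
        "mis" in effect.lower() or "infra" in effect.lower()
        for effect in sample["effects"]
    )
    # Assumption: a null copy number value never satisfies 'C:>0.5'.
    amplified = sample["cnv"] is not None and sample["cnv"] > cnv_cutoff
    return mutated or amplified

# True/False labels, like the new subgroup column.
subgroup = {sid: has_egfr_aberration(s) for sid, s in samples.items()}
print(subgroup)  # {'S1': True, 'S2': True, 'S3': True, 'S4': False, 'S5': False}
```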


This page assumes you are familiar with making 2 subgroups. If you are not, please see 'How do I make subgroups'.

See our help on renaming the subgroup labels from 'true' and 'false' to something more biologically meaningful.

Your new column can be used for a KM analysis or to compare gene expression.

For more information see our .



How do I change the color of a column?

To change the color threshold, click on the column menu at the top and choose 'Display'. From there click 'custom', enter your new thresholds, and click 'Done'.

To change the color, click on the column menu at the top and choose 'Display'. From there choose a new set of colors from the drop down.


How do I filter to just one cancer type?

For users who wish to use the datasets in a Pan-Can cohort but need to view just one cancer type.

Generalized Steps

1. Add the phenotype column that details the cancer type

The phenotype column will vary depending on which study you choose. See below for specific column names

2. Search for the cancer type you are interested in, making sure that it is listed in the phenotype column. Click the Filter + Subgroup menu next to the search bar and select 'Keep Samples'.

For TCGA Pan-Cancer (PANCAN)

For the TCGA PanCan (PANCAN), you will want to add the phenotype column:

cancer type abbreviation

For TCGA TARGET GTEx

For the TCGA TARGET GTEx, you will want to add the phenotype columns:

main category

study

primary_site

How do I interact with the tooltip?

When you are in the Xena Visual Spreadsheet, hovering the mouse over any data on the screen will trigger a tooltip to show up at the top of the view.

To freeze the tooltip, "Alt-click": hold down the ALT key on your keyboard and click the left mouse button at the same time.

To unfreeze the tooltip, click on the close (X) icon.

This can be helpful if you want to click on a link in the tooltip that takes you to the UCSC Genome Browser, where you can view more information about those genomic coordinates.

How do I compare gene expression between different cancer types?

Steps:

  1. Add a column of data. Enter your gene or list of genes, select 'Gene Expression' and click 'Done'.

  2. From the column menu at the top of the new column you created, select 'Chart & Statistics'

  3. Choose 'Compare Subgroups'

  4. Click the dropdown for 'Show data from' and choose your gene expression column.

  5. Click the dropdown for 'Subgroup samples by' and choose the cancer type column.

  6. Choose if you would like a box plot or violin plot and click 'Done'.

Visual Spreadsheet

This dynamic, powerful, and flexible view is our default view into the data.

The Visual Spreadsheet allows you to add an arbitrary number of columns of any data type (mutation, copy number, expression, protein, phenotype, methylation, etc) on any number of patient's samples into a spreadsheet-like view. We line up all columns so that each row is the same sample, allowing you to easily see trends in the data. Data is always sorted left to right and sub-sorted on columns thereafter.

Making a Visual Spreadsheet

The wizard on the screen will guide you to choose a study and TWO columns of data to view on those samples. Note that if you do not choose at least two columns, the wizard will not close and you will not be able to interact with the data.

Selecting a cohort

You can select a cohort either by choosing 'Help me select a cohort' and searching our cohorts for your cancer type, etc., or by choosing 'I know the study I want to use' and searching for the partial or full name of the cohort you are interested in.

Adding a Gene or Position

Enter a HUGO gene name or a dataset-specific probe name (e.g. a CpG island). You can enter one gene or multiple genes. Separate multiple genes with a space, comma, tab, or new line.
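As a minimal sketch of the accepted separators, the hypothetical helper below splits an input string on any run of spaces, commas, tabs, or newlines. It only illustrates the rule described above; it is not how Xena parses input internally.

```python
import re

def parse_gene_list(text):
    """Split user input on spaces, commas, tabs, or newlines."""
    # One character class covers all four separators; the filter drops empty
    # strings left behind by leading, trailing, or doubled separators.
    return [g for g in re.split(r"[ ,\t\n]+", text.strip()) if g]

print(parse_gene_list("TP53, EGFR\tPTEN\nIDH1"))  # ['TP53', 'EGFR', 'PTEN', 'IDH1']
```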

To display a genomic region, enter the region, choose your dataset, and click 'Done'. We recognize chromosomes (e.g. chr1), chromosome arms (e.g. chr19q), and chromosome coordinates (e.g. chr1:100-4,000).
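The three region forms above can be described with a single regular expression. The Python sketch below is illustrative only: the `parse_region` helper and its return format are assumptions, and it accepts only numbered chromosomes plus X and Y.

```python
import re

# Hypothetical pattern for the three accepted forms:
#   chr1            whole chromosome
#   chr19q          chromosome arm (p or q)
#   chr1:100-4,000  coordinate range, commas allowed
REGION = re.compile(r"^chr([0-9]+|X|Y)(?:([pq])|:([\d,]+)-([\d,]+))?$", re.IGNORECASE)

def parse_region(text):
    """Return a dict describing the region, or raise ValueError."""
    m = REGION.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized region: {text!r}")
    chrom, arm, start, end = m.groups()
    result = {"chrom": "chr" + chrom.upper()}
    if arm:
        result["arm"] = arm.lower()
    if start:
        # Strip thousands separators before converting to integers.
        result["start"] = int(start.replace(",", ""))
        result["end"] = int(end.replace(",", ""))
    return result

print(parse_region("chr1"))            # {'chrom': 'chr1'}
print(parse_region("chr19q"))          # {'chrom': 'chr19', 'arm': 'q'}
print(parse_region("chr1:100-4,000"))  # {'chrom': 'chr1', 'start': 100, 'end': 4000}
```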

Selecting a Dataset

After entering a gene or probe name, you will need to select one or more datasets.

Basic Datasets

We have pre-selected default datasets for most cohorts. These datasets were selected because they are the most used. Typically there is a default mutation, copy number, and expression dataset.

Advanced Datasets

Xena also has more datasets than those listed in the Basic Menu. Depending on the cohort, these can include DNA methylation, exon expression, thresholded CNV data and more. To access them, click on 'Show Advanced' below:

More information on basic datasets

Video of making a Visual Spreadsheet

After you have made a Visual Spreadsheet

Overview

Patient samples are on the y-axis and your columns of data are on the x-axis. We line up all columns so that each row is the same sample, allowing you to easily see trends in the data. Data is always sorted left to right and sub-sorted on columns thereafter.

If you entered a single gene

If you entered a single gene, that gene will be listed at the top of the column. If there are multiple probes mapped to that gene in the dataset you selected they will be displayed as subcolumns ordered left to right in the direction of transcription.

If you selected a positional dataset, such as segmented copy number variation or mutation, the gene model will be displayed at the top of the column. The gene model is a composite of all transcripts of the gene. Boxes show different exons, with UTR regions being short and CDS regions being tall. We display 2Kb upstream to show the promoter region. Use the column menu to toggle showing intronic regions.

If you entered multiple genes

If you entered multiple genes, each gene will be listed as a subcolumn for that dataset. If there are multiple probes mapped to that gene in the dataset (i.e. if you entered a single gene then you would see the probes as subcolumns), then the probes are averaged for a single value per gene.
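For a single sample, averaging probes into one value per gene can be sketched as below. The probe IDs and probe-to-gene mapping are hypothetical, the sketch assumes a plain arithmetic mean, and it does not handle null values.

```python
from statistics import mean

def gene_level_values(probe_values, probe_to_gene):
    """Average probe values into one value per gene for a single sample."""
    by_gene = {}
    for probe, value in probe_values.items():
        by_gene.setdefault(probe_to_gene[probe], []).append(value)
    return {gene: mean(values) for gene, values in by_gene.items()}

# Hypothetical probe IDs and mapping, for illustration only.
probe_values = {"probe_1": 2.0, "probe_2": 4.0, "probe_3": 7.5}
probe_to_gene = {"probe_1": "GENEA", "probe_2": "GENEA", "probe_3": "GENEB"}
print(gene_level_values(probe_values, probe_to_gene))  # {'GENEA': 3.0, 'GENEB': 7.5}
```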

Note that if you entered more than one gene and selected a mutation dataset, we will only show the first gene. If you wish to see multiple mutation columns, please enter each gene individually and click 'Done'.

If you entered a chromosome or chromosome position

When displaying a chromosome range, genes will be shown at the top of the column, with dark blue genes being on the forward strand and red genes being on the reverse strand. Hovering over a gene will display the gene name in the tooltip. Note that introns are always shown in this mode.

Data values

Individual values vary by dataset. The legend at the bottom of the column will tell you the units for your particular dataset, including any normalization that was performed. If a sample does not have data for a column, it will show as gray and be labeled as 'null'.

If the entire column is gray this means we did not recognize the gene, probe, or position. If you believe this to be in error, please try an alternate name.

More information about a dataset can be found in the dataset details page. To get there, click on the column menu and choose 'About'.

Sample sorting

The Xena Browser uses the y-axis for samples and the x-axis/columns for genomic/phenotypic features. Data from a single sample is always on the same horizontal line across all columns, allowing you to see screen-wide trends. The Xena Browser orders samples using the columns from left to right: first by the first column, then by the second, and so on. If there are multiple genes, identifiers, or probes within a column, samples are ordered by the 1st sub-column, then the 2nd sub-column, and so on.

Numerical data are ordered in descending order (e.g. 3.5, 1.2, ...). Categorical data (e.g. stage, tumor type, etc.) are ordered by categories. CNV data is sorted by the average of the entire column. Positional mutation data is ordered by genomic coordinates (from 5'->3') and then by the predicted impact of the mutation. Both CNV and positional mutation data have the option to sort by the zoomed region instead: click the column menu at the top of the column and choose 'Sort by zoom region avg'.

To reverse the ordering, click the column menu at the top of the column and choose 'Reverse sort'.
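The left-to-right ordering rule for numerical columns amounts to a multi-key sort. The sketch below is a simplified model (numerical columns only, with the assumption that null values always sort last); `sort_samples` and the example data are hypothetical.

```python
def sort_samples(rows, columns, reversed_columns=()):
    """Order sample IDs by the first column, then the second, and so on.

    rows: {sample_id: {column: number or None}}; None stands for 'null'.
    Numbers sort in descending order unless the column is listed in
    reversed_columns (mimicking 'Reverse sort').
    """
    def key(sample_id):
        parts = []
        for col in columns:
            value = rows[sample_id][col]
            if value is None:
                parts.append((1, 0.0))      # assumption: nulls go last
            elif col in reversed_columns:
                parts.append((0, value))    # ascending
            else:
                parts.append((0, -value))   # default: descending
        return parts

    return sorted(rows, key=key)


rows = {
    "S1": {"A": 1.2, "B": 0.5},
    "S2": {"A": 3.5, "B": 0.1},
    "S3": {"A": 1.2, "B": 2.0},
    "S4": {"A": None, "B": 9.0},
}
print(sort_samples(rows, ["A", "B"]))                          # ['S2', 'S3', 'S1', 'S4']
print(sort_samples(rows, ["A", "B"], reversed_columns={"A"}))  # ['S3', 'S1', 'S2', 'S4']
```

Python's `sorted` compares the per-column keys in order, so the second column only matters when samples tie on the first one, which is the sub-sorting described above.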

Move a column/change the sample sorting

As the sample sort order is controlled by the left most columns, it can be useful to explore the data by moving a different column to the left.

To move a column, click on the column header and drag it to the right or left.

Zooming

Click and drag anywhere in any column to zoom in, in either direction. Zoom out to all samples by clicking 'Clear Zoom' at the top. Zoom out to the whole column by clicking the red 'x' at the top of a column.

Tooltip

Resize a column

You can change the size of a column by clicking on the bottom right corner of a column and dragging to a new size.

Add another column

You can add another column of data by clicking on 'Click to add column' on the right edge of the Visual Spreadsheet, or by hovering between columns until 'Click to insert column' appears.

Kaplan Meier Plots

Generating a plot

To generate a KM plot, click on the column menu at the top of a column and choose 'Kaplan Meier Plot'.

Features

Sample groups

For numerical or continuous features, you will have the option of having 2 groups of samples, 3 groups of samples, or viewing the upper vs lower quartile. For 2 groups, we divide the samples on the median. For 3 groups, we divide samples into the upper third, middle third, and lower third.

When viewing the upper vs lower quartile, note that we only include samples that are greater than (not greater than or equal to) the upper quartile, and the same for the lower quartile.

Note that all samples are used to calculate the median and other dividing values, whether or not they have survival data. To see which samples have survival data, add the column 'OS' from the phenotype data.

If more than one sample has the same value, we put the samples in a group together, even if this means the groups end up being unequal in size.

For categorical features, we only show the first 10 categories.

We remove samples with 'null' data for all plots.
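The grouping rules above can be sketched in Python as follows. This is an illustrative model, not Xena's implementation: the quartile interpolation method (`method="inclusive"` in Python's `statistics.quantiles`) and sending values equal to the median to the low group are assumptions, while the strict inequalities for quartiles, keeping ties together, and removing nulls follow the text above.

```python
from statistics import median, quantiles

def drop_nulls(values):
    """Samples with null data are removed before grouping."""
    return {s: v for s, v in values.items() if v is not None}

def median_split(values):
    """2 groups: split at the median of all non-null samples."""
    values = drop_nulls(values)
    cut = median(values.values())
    # Comparing every value with one cut point keeps tied samples together;
    # putting values equal to the median in 'low' is an assumption.
    return {s: ("high" if v > cut else "low") for s, v in values.items()}

def quartile_split(values):
    """Upper vs lower quartile: strict inequalities, middle samples dropped."""
    values = drop_nulls(values)
    q1, _, q3 = quantiles(values.values(), n=4, method="inclusive")
    groups = {}
    for s, v in values.items():
        if v > q3:
            groups[s] = "upper"
        elif v < q1:
            groups[s] = "lower"
    return groups

# Hypothetical expression values; S9 has no data.
expression = {f"S{i}": float(i) for i in range(1, 9)}
expression["S9"] = None
print(median_split(expression))
print(quartile_split(expression))  # {'S1': 'lower', 'S2': 'lower', 'S7': 'upper', 'S8': 'upper'}
```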

Type of survival

Survival time cutoff

We default to the last time any individual in the plot was known to be alive. You can change this to be 1-year or 5-year survival by changing the time cutoff at the bottom of the screen. The statistics will automatically recalculate. TCGA data uses days as their measurement of time.
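A common way to implement a survival time cutoff is administrative censoring: anyone still followed beyond the cutoff is treated as censored at the cutoff. The sketch below assumes that approach and uses days, as TCGA does, so 1-year and 5-year cutoffs would be roughly 365 and 1825 days; the `apply_time_cutoff` helper is hypothetical.

```python
def apply_time_cutoff(records, cutoff):
    """Censor follow-up at `cutoff` (administrative censoring).

    records: list of (time, event) pairs; event is 1 if observed, 0 if censored.
    """
    truncated = []
    for time, event in records:
        if time > cutoff:
            # Event-free at the cutoff: treat as censored at the cutoff.
            truncated.append((cutoff, 0))
        else:
            truncated.append((time, event))
    return truncated

# Hypothetical (days, event) records with a 1-year cutoff.
print(apply_time_cutoff([(100, 1), (400, 1), (2000, 0), (365, 1)], 365))
# [(100, 1), (365, 0), (365, 0), (365, 1)]
```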

PDF

You can generate a high quality PDF by clicking the PDF icon.

Download

Statistics used

When there are multiple curves or lines in a KM plot, the Xena Browser compares the different Kaplan–Meier curves using the log-rank test. The Browser reports the test statistic (χ²) and its p-value (from the χ² distribution). Data is retrieved in real time from Xena Hub(s) to the user's web browser, and the test is performed in the browser to maintain your data privacy.
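For two groups, the log-rank statistic compares observed and expected events in one group at each distinct event time. The Python sketch below implements the standard two-group, unweighted (rho = 0) version with the usual hypergeometric variance and the 1-degree-of-freedom χ² p-value. It is a teaching sketch, not Xena's code; the function name and example data are hypothetical, and it does not implement the handling of fully censored groups described under Exceptions.

```python
import math

def logrank_two_groups(times, events, groups):
    """Two-group unweighted log-rank test.

    times:  follow-up time for each subject (e.g. days)
    events: 1 if the event (e.g. death) was observed, 0 if censored
    groups: 1 for the first group, 0 for the second
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    subjects = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in subjects if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in subjects if ti >= t)              # at risk, all
        n1 = sum(1 for ti, _, g in subjects if ti >= t and g == 1)  # at risk, group 1
        d = sum(1 for ti, e, _ in subjects if ti == t and e == 1)   # events, all
        d1 = sum(1 for ti, e, g in subjects if ti == t and e == 1 and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi_square = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical data: group 1 has events at days 1-3, group 0 at days 4-6.
chi2, p = logrank_two_groups([1, 2, 3, 4, 5, 6], [1] * 6, [1, 1, 1, 0, 0, 0])
print(round(chi2, 3), round(p, 4))
```

With more than two groups the statistic has k - 1 degrees of freedom and needs a general χ² survival function, such as `scipy.stats.chi2.sf`.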

Exceptions

If all patients in a particular group (i.e. line) are censored before any event happens in the whole population (including all the groups), we exclude this group from the statistical analysis and perform the log-rank test on the remaining groups. We do this because we have no way to know the number of people at risk in this group at any of the event times, and therefore cannot compute any statistics for it. R handles this exception in the same way. Although this group is removed from the statistical analysis, we still display it in the KM plot.

Duplicate samples

More information on how to load your own survival data into Xena

Filtering and subgrouping

How to find samples that you want to remove or keep in the view. How to make subgroups.

Use the search box at the top of the screen to first pick/find your samples of interest. Then filter to keep or remove these samples, create a new subgroup column, or zoom.

The bar highlighted above allows you to search all data on the screen for your search term. Note that it will not search data that is not on the screen. Samples that match your criteria are marked with a black bar in the Visual Spreadsheet.

Searching for samples

You can search for samples by either typing in the search bar or by clicking on the dropper icon to enter the pick samples mode. The pick samples mode will allow you to click on a column to select samples. The search term for your picked samples will appear in the search bar. To exit the pick samples mode, click on the dropper icon again.

Note the pick samples mode tends to work best if the column you are selecting from is the first column.

Example of pick samples mode

Filter + Subgroup menu

Once you have your sample(s) of interest, click on the filter + subgroup menu and choose to:

Keep samples: Keep only the samples which match your criteria.

Remove samples: Remove the samples which match your criteria.

Clear sample filter: Remove ALL filters currently applied.

Remove Samples with nulls: Removes samples that have no data for one or more columns. Equivalent to typing 'null' in the search bar and choosing 'Remove samples'.

Zoom: Zoom to the samples that meet your criteria. Shift-click to zoom out.
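The filter actions can be thought of as set operations on the samples in view. The sketch below models them in Python with a hypothetical `view` dictionary in which None stands for 'null'; the helper names are made up for illustration, and 'Clear sample filter' simply corresponds to starting again from the original `view`.

```python
# Hypothetical samples in view: sample ID -> column values (None = null).
view = {
    "S1": {"B": 5.2, "C": "positive"},
    "S2": {"B": 3.1, "C": None},
    "S3": {"B": None, "C": "negative"},
    "S4": {"B": 7.8, "C": "negative"},
}

def keep_samples(view, matches):
    """'Keep samples': retain only samples that match the search."""
    return {s: row for s, row in view.items() if matches(row)}

def remove_samples(view, matches):
    """'Remove samples': drop samples that match the search."""
    return {s: row for s, row in view.items() if not matches(row)}

def has_null(row):
    """Plays the role of searching for 'null'."""
    return any(value is None for value in row.values())

negatives = keep_samples(view, lambda row: row["C"] == "negative")
complete = remove_samples(view, has_null)  # like 'Remove Samples with nulls'
print(sorted(negatives))  # ['S3', 'S4']
print(sorted(complete))   # ['S1', 'S4']
```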

Search bar history

Once you have either filtered, created a subgroup column, or zoomed to samples, your search term will be added to the search history. Access the search history by clicking the downward facing arrow at the upper right of the search bar.

Changing subgroup labels

Once the subgroup column is created, users can change the labels from "true" and "false" to, for example, "EGFR mutant" and "wild type" by adjusting the column display settings. To access these, select the three-dot menu at the top of the column and choose 'Display'.

Chart & Statistics View

Enter Chart View

To get to the chart view click on the icon indicated below by the red box or use the column menu and select 'Chart & Statistics'.

Build a chart

Once you enter Chart View, it will ask you a series of questions about what type of graph you are trying to make.

Compare subgroups will allow you to compare groups of patient's samples, either those that you have made or via a categorical feature, such as sample type. It will build the appropriate graph depending on whether you have selected a continuous numerical or categorical column. This option will let you make box plots, violin plots, bar charts, and dot plots.

See a distribution will let you see a histogram distribution of the data in a single column. You can view the mean, median, and various standard deviations on the distribution. The column can have sub-columns, either multiple probes or multiple genes, which will instead create a plot with multiple box plots.

Make a scatterplot will make a scatterplot from two continuous numerical columns. The second column can have multiple sub-columns, either multiple probes or multiple genes, which will create overlapping scatterplots.

If an option is grayed out, this means that you do not have enough or the right type of data on the screen. Return to the Visual Spreadsheet and add more data.

After building a chart

If you are viewing a distribution of a continuous feature, such as gene expression for a single gene, you can add lines to the graph that indicate the mean/median or percentiles.

If you are viewing a scatterplot, you can color the points by a third column of data.

If you are viewing a dot plot, you can choose to view the data as 'continuous value', where both the size of the dot and the intensity of the color reflect the mean, or as 'single cell count data', where the size of the dot reflects the percent of cells/samples that have a non-zero value.

Advanced options available under the graph will allow you to change the scales of the axes.

We show statistics in the bottom right corner of the screen for most graphs. If we detect that the statistics will take some time to run, we may instead show a 'run stats' button, so that you can decide whether to run the statistical test.

Note that for violin plots, the width of each plot does not relate to the number of samples in the plot.

Return to the Visual Spreadsheet

To return to the Visual Spreadsheet, click either the icon in the upper left, or the 'x' close button.


Here is a bookmark that will take you to the TCGA PanCan (PANCAN) Study with that phenotype column already selected.

Here is a bookmark that will take you to the TCGA TARGET GTEx Study with those phenotype columns already selected.

This page assumes that you are already viewing more than one cancer type. Please see the help page 'How do I view multiple types of cancer together' to get started with this.

If there are cancer types in view that you do not want to investigate, you will need to filter them out. Please see the help page 'How do I filter to just one cancer type' to get started with this.

Get started by going to the Xena Browser and following the wizard to enter your data of interest.

We annotate datasets used in the basic Visual Spreadsheet wizard with a red asterisk in our datasets pages. For an example see: https://xenabrowser.net/datapages/?cohort=TCGA%20Acute%20Myeloid%20Leukemia%20(LAML)

The tooltip at the top of the Visual Spreadsheet shows more information about the data under the mouse. Links in the tooltip go to the UCSC Genome Browser, where you can learn more about that gene or genomic position. Alt-click to freeze and unfreeze the tooltip so that you can click on the links.

Kaplan Meier Survival Analyses are a way of comparing the survival of groups of patients. More information on what a Kaplan Meier analysis is can be found in this article.

For mutation features, we divide samples into those with any mutation and those without. To make different groups (e.g. samples with nonsense mutations vs those without), create your own subgroups and run a KM plot on the new column.

We default to Overall Survival. Users can select different end points if they are available. An example of this is in the TCGA PanCancer Study.

You can download the data used to generate the KM plot using the download icon. It will download the , in addition to the sample ID, patient ID, groups, and underlying data.

The statistics the Xena Browser reports are equivalent to survdiff from R's survival package with rho=0 (the default in R).

Note that we do not automatically remove duplicate patients (for instance, if there is a tumor and a normal sample from the same patient). You can determine whether there are duplicate patients by looking for the "!" icon next to the p-value. Learn how to remove duplicate samples.

More information on KM plots using data from a Local Xena Hub

New subgroup column: Create a new column where samples that meet your criteria are annotated as 'true' and samples that don't are annotated as 'false'. This new column can then be used for a Kaplan Meier Analysis or in the Chart View.

To create more than 2 subgroups, please see our guide.

Note that this search history will be preserved in bookmarks.

Chart View will generate bar plots, box plots, violin plots, scatter plots, and distribution graphs using any of the columns in a Visual Spreadsheet. Statistics, such as Welch's t-test, ANOVA, Pearson's correlation, and Spearman's rank correlation, will be calculated automatically.
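Welch's t-test is a standard choice for comparing a numerical column between two subgroups because it does not assume the two groups have equal variances. The sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom from scratch; it is a minimal illustration with made-up data, not Xena's code.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each mean, using the sample variance (n - 1).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df

# Hypothetical expression values for two subgroups.
t, df = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(round(t, 4), round(df, 4))  # -1.7321 4.4118
```

For a p-value, compare t against Student's t distribution with df degrees of freedom; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the p-value directly.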


Coloring for Segmented Copy Number Columns

When there are no overlapping segments, Xena displays the value and color of the copy number segment as indicated in the column legend at the bottom of the column.

When there are overlapping segments, Xena follows these steps:

  1. Compute overlaps by slicing segments that overlap with other segments. For example if there was one segment from chr1:10000-20000 and a second segment from chr1:10050-10100, then resulting segments from this step would be chr1:10000-10050, chr1:10050-10100, and chr1:10100-20000.

  2. For each segment defined in step 1, determine which segments in the original data overlap with this segment.

  3. Divide data segments into those that are greater than copy number neutral (i.e. are amplifications) and those that are less than copy number neutral (i.e. are deletions). Average the segments for each of these two groups.

  4. Find the colors corresponding to the two averages from step 3. Then pick a color that is in between those two colors on the color wheel. An example would be that if the amplifications are red and deletions are blue, the resulting color from a strong amplification and a strong deletion would be purple. Note that copy number neutral in this example would be white.
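Steps 1 to 3 can be sketched in Python as follows. This is a hypothetical illustration, not Xena's source: it treats segments as half-open intervals (start, end) on one chromosome, assumes copy number neutral is 0 (as for log2 ratios), and stops before the color blending of step 4.

```python
def overlaps(start, end, left, right):
    """True if half-open intervals [start, end) and [left, right) share a base."""
    return start < right and left < end


def slice_segments(segments):
    """Step 1: cut segments at every start/end so slices never partially overlap.

    segments -- list of (start, end, value) tuples on one chromosome
    """
    bounds = sorted({pos for start, end, _ in segments for pos in (start, end)})
    return [
        (left, right)
        for left, right in zip(bounds, bounds[1:])
        # Skip gaps that no original segment covers.
        if any(overlaps(start, end, left, right) for start, end, _ in segments)
    ]


def summarize_slices(segments, neutral=0.0):
    """Steps 2-3: per slice, average amplified and deleted values separately."""
    summary = []
    for left, right in slice_segments(segments):
        # Step 2: values of the original segments overlapping this slice.
        values = [v for start, end, v in segments if overlaps(start, end, left, right)]
        # Step 3: split into amplifications and deletions, then average each group.
        amps = [v for v in values if v > neutral]
        dels = [v for v in values if v < neutral]
        amp_avg = sum(amps) / len(amps) if amps else None
        del_avg = sum(dels) / len(dels) if dels else None
        summary.append(((left, right), amp_avg, del_avg))
    return summary
```

With the segments from step 1 (say, 0.8 on chr1:10000-20000 and -0.6 on chr1:10050-10100), only the middle slice has both an amplification average and a deletion average, so step 4 would color it with a blend of the two colors (purple in the red/blue example).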


Coloring for Mutation Columns

More information about how we color mutation columns

Samples that have mutation data are white, with a dot or line marking where each mutation falls in relation to the gene model at the top of the column. Mutation data is colored by functional impact:

  • Red - Deleterious

  • Blue - Missense

  • Orange - Splice site mutation

  • Green - Silent

  • Gray - Unknown

Samples for which there is no mutation data are gray with no dot or line, and are marked as 'null'.

More details for 'Somatic mutation (SNP and INDEL)' datasets

Red --> Nonsense_Mutation, frameshift_variant, stop_gained, splice_acceptor_variant, splice_acceptor_variant&intron_variant, splice_donor_variant, splice_donor_variant&intron_variant, Splice_Site, Frame_Shift_Del, Frame_Shift_Ins

Blue --> splice_region_variant, splice_region_variant&intron_variant, missense, non_coding_exon_variant, missense_variant, Missense_Mutation, exon_variant, RNA, Indel, start_lost, start_gained, De_novo_Start_OutOfFrame, Translation_Start_Site, De_novo_Start_InFrame, stop_lost, Nonstop_Mutation, initiator_codon_variant, 5_prime_UTR_premature_start_codon_gain_variant, disruptive_inframe_deletion, inframe_deletion, inframe_insertion, In_Frame_Del, In_Frame_Ins

Green --> synonymous_variant, 5_prime_UTR_variant, 3_prime_UTR_variant, 5'Flank, 3'Flank, 3'UTR, 5'UTR, Silent, stop_retained_variant

Orange --> others, SV, upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_region

Note that we are case insensitive when we color for these terms.
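The case-insensitive lookup can be sketched in Python as below. The term sets are copied from the lists above; treating every unlisted term as 'others' (orange) is our reading of the Orange list, and the name `mutation_color` is hypothetical. The sketch only covers samples that have a mutation call.

```python
# Effect terms copied from the lists above, stored lower-case because
# matching is case insensitive.
RED = {t.lower() for t in (
    "Nonsense_Mutation", "frameshift_variant", "stop_gained",
    "splice_acceptor_variant", "splice_acceptor_variant&intron_variant",
    "splice_donor_variant", "splice_donor_variant&intron_variant",
    "Splice_Site", "Frame_Shift_Del", "Frame_Shift_Ins",
)}
BLUE = {t.lower() for t in (
    "splice_region_variant", "splice_region_variant&intron_variant", "missense",
    "non_coding_exon_variant", "missense_variant", "Missense_Mutation",
    "exon_variant", "RNA", "Indel", "start_lost", "start_gained",
    "De_novo_Start_OutOfFrame", "Translation_Start_Site", "De_novo_Start_InFrame",
    "stop_lost", "Nonstop_Mutation", "initiator_codon_variant",
    "5_prime_UTR_premature_start_codon_gain_variant",
    "disruptive_inframe_deletion", "inframe_deletion", "inframe_insertion",
    "In_Frame_Del", "In_Frame_Ins",
)}
GREEN = {t.lower() for t in (
    "synonymous_variant", "5_prime_UTR_variant", "3_prime_UTR_variant",
    "5'Flank", "3'Flank", "3'UTR", "5'UTR", "Silent", "stop_retained_variant",
)}


def mutation_color(effect):
    """Return the color for one effect term, ignoring case."""
    term = effect.strip().lower()
    for color, terms in (("red", RED), ("blue", BLUE), ("green", GREEN)):
        if term in terms:
            return color
    # SV, upstream_gene_variant, downstream_gene_variant, intron_variant,
    # intergenic_region and any other term fall under "others" (orange).
    return "orange"
```

For example, `mutation_color("MISSENSE_MUTATION")` and `mutation_color("missense_mutation")` both return "blue".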

For the gene-level mutation datasets (Somatic gene-level non-silent mutation):

Red (=1) --> indicates that a non-silent somatic mutation (nonsense, missense, frame-shift indels, splice site mutations, stop codon readthroughs, change of start codon, inframe indels) was identified in the protein coding region of a gene, or that any mutation was identified in a non-coding gene

White (=0) --> indicates that none of the above mutation calls were made in this gene for the specific sample

Pink (=0.5) --> some samples have two aliquots. In the event that in one aliquot a mutation was called and in the other no mutation was called, we assign a value of 0.5.
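One way to reproduce the documented 0, 0.5, and 1 values is to average the per-aliquot 0/1 calls, as in the hypothetical Python sketch below. This is an assumption that matches the description only for samples with one or two aliquots; the text does not say how three or more aliquots would be combined.

```python
# Colors for the gene-level values described above.
GENE_LEVEL_COLORS = {1.0: "red", 0.5: "pink", 0.0: "white"}


def gene_level_value(aliquot_calls):
    """Average per-aliquot calls (1 = non-silent mutation called, 0 = not called)."""
    if not aliquot_calls:
        return None  # no mutation data for this sample
    return sum(aliquot_calls) / len(aliquot_calls)
```

For a sample with two aliquots that disagree, `gene_level_value([1, 0])` gives 0.5, which maps to pink.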

GSEA

Run a genome-wide differential GSEA analysis to compare groups of samples

To run a GSEA analysis, click on the 3 dot column menu at the top of a categorical column (not a numerical column) and choose 'GSEA'.

This will take you to a new page where you define the sample subgroups you would like to compare (note that you can select multiple categories for a single subgroup).

After you have your subgroups, choose a gene set library, scroll to the bottom and click 'submit'.

Due to compute limitations, you can run at most 2000 samples in total through the analysis pipeline.

This will start the analysis, which may take a while to run depending on the size of the dataset. As results are completed, the web page will update. Scroll to see more results. Once the analysis is finished, it will say 'Done' at the top of the page.

More details

The Advanced Visualization parameters apply to the PCA or t-SNE plot, as well as the blitzGSEA analysis itself.

Running it on your own data

Genomic Signatures

Enter a genomic signature over a set of genes for a particular dataset

Genomic signatures, sometimes expressed as a weighted sum of genes, are an algebra over genes, such as "ESR1 + 0.5*ERBB2 - GRB7". Once a signature is entered, each gene name is replaced by that gene's value in each sample, and the algebraic expression is evaluated for each sample.

Entering a signature

  1. Open the Add column menu

  2. Enter '=' and then your signature into the gene entry box

  3. Select 'gene expression' as the dataset

  4. Click 'Done'

There must be a space on both sides of the "+" and "-".

Alternatively, enter a list of genes and we will automatically add a '+' between each pair of genes when evaluating the signature.

If we cannot find a gene that is part of the signature, the missing gene is included as a zero in the expression calculation and the label lists the gene as missing.
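The evaluation rules above (spaces around '+' and '-', a plain gene list treated as a sum, missing genes counted as zero) can be sketched in Python. This is a hypothetical illustration for a single sample, not Xena's parser; it supports only terms of the form 'GENE' or 'weight*GENE'.

```python
def evaluate_signature(signature, expression):
    """Evaluate a signature such as "=ESR1 + 0.5*ERBB2 - GRB7" for one sample.

    expression -- dict mapping gene name -> value in this sample
    Returns (score, missing_genes); missing genes contribute zero.
    """
    tokens = signature.strip().lstrip("=").split()
    # A plain list of genes (no '+' or '-' tokens) is treated as a sum.
    if not any(tok in ("+", "-") for tok in tokens):
        tokens = " + ".join(tokens).split()
    score, sign, missing = 0.0, 1.0, []
    for tok in tokens:
        if tok == "+":
            sign = 1.0
        elif tok == "-":
            sign = -1.0
        else:
            # "0.5*ERBB2" -> weight "0.5", gene "ERBB2"; "ESR1" -> weight "", gene "ESR1".
            weight, _, gene = tok.rpartition("*")
            coefficient = float(weight) if weight else 1.0
            if gene not in expression:
                missing.append(gene)
            score += sign * coefficient * expression.get(gene, 0.0)
    return score, missing
```

Because the parser splits on whitespace, "ESR1 - GRB7" works but "ESR1-GRB7" would be read as a single (missing) gene name, which is one reason the spaces around "+" and "-" matter.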

Example: TFAC30 Gene Signature

Hess et al. identified 30 genes whose gene expression profile is predictive of complete pathologic response to chemotherapy treatment in breast cancer.

Gene signature

=E2F3 + MELK + RRM2 + BTG3 - CTNND2 - GAMT - METRN - ERBB4 - ZNF552 - CA12 - KDM4B - NKAIN1 - SCUBE2 - KIAA1467 - MAPT - FLJ10916 - BECN1 - RAMP1 - GFRA1 - IGFBP4 - FGFR1OP - MDM2 - KIF3A - AMFR - MED13L - BBS4

Here we can see that the predicted chemo response signature is high in the basal subtype and low in luminal subtype. Additionally, the signature is high for ER negative samples and low for ER positive samples.

Signatures datasets

To use these signatures, go to the dataset pages (links above) to find the names of the specific signatures (listed under Identifiers). Then, in the visualization, enter the name of the specific signature as a gene, click 'Advanced', choose the appropriate dataset, and click 'Done'.

Download Data

There are 4 ways to download data

The four ways to download data

1. Download data in a single column of a Visual Spreadsheet: In a Visual Spreadsheet, click on the column Hamburger menu, then "Download" to download just the data from that column.

2. Download data in an entire Visual Spreadsheet: In a Visual Spreadsheet, click on the download icon in the upper right corner of the spreadsheet.

3. Bulk download a whole dataset file: Click "Data Sets" in the top banner to navigate to the dataset of interest; the dataset page contains a download URL link. You can also reach the dataset page by clicking on the column Hamburger menu, then "About". Click on the download URL to download the entire dataset, or use wget or curl to download it from the command line.

4. Via our APIs: Python and R

How do I open the download files?

Our files are tab-delimited or '.tsv'. We recommend opening them in your favorite spreadsheet program, such as Microsoft Excel, which will automatically convert the tabs into new columns. Please note that if you have many thousands of samples, Microsoft Excel will likely have difficulty opening the file. In this case, the command line may work better for you.
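If you prefer a script to a spreadsheet program, Python's standard csv module can read the tab-delimited files directly. The sketch below assumes a matrix layout in which the first row holds column names (for example sample IDs) and the first column holds row identifiers (for example genes); the example data and the function name `read_tsv_matrix` are made up for illustration.

```python
import csv
import io


def read_tsv_matrix(handle):
    """Return (column_names, {row_id: {column_name: value_string}})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = header[1:]
    rows = {}
    for record in reader:
        if not record:
            continue  # skip blank lines
        rows[record[0]] = dict(zip(columns, record[1:]))
    return columns, rows


# Tiny made-up example; values stay strings, and an empty cell means no value.
example = "sample\tS1\tS2\nEGFR\t5.2\t7.9\nTP53\t3.1\t\n"
columns, rows = read_tsv_matrix(io.StringIO(example))
```

For a downloaded file, open it with `open(path, newline="")` and pass the file object instead of the `StringIO`. This reads the whole file into memory, so very large datasets may still call for command-line tools.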

Bookmarks

Bookmarks are a great way to save a particular view in Xena, either for yourself or to share with others.

Creating a bookmark

To bookmark a view, click on 'Bookmark' in the top navigation bar. From here you can either click 'Bookmark' to create a bookmark URL or click 'Export' to export a file that can then be imported back to the browser.

When you click 'Bookmark' you will then need to click 'Copy Bookmark' to copy the bookmark URL to your copy buffer. Large views may take a second or two to generate a URL.

Note that your filter and subgroup history, as well as the last Chart View you created, if any, will be saved as part of the bookmark.

Bookmarks are only guaranteed for 3 months

More information: Bookmark vs. Export/Import

The 'Bookmark' option will store all the data in view on our servers and provide you a link. This is the easiest way to share a view. Note that if you have any private data in view, this option will be disabled to preserve your privacy. Please also note that if you lose the link there is no way to get it back.

If you choose Export, you will get a file with everything Xena needs to recreate your view. You can then save this file and import it back into Xena. While this option can be a bit cumbersome, it will allow you to share private data. Note that these files are still only guaranteed for 3 months, though they may last longer.

Recent Bookmarks

The 'Recent Bookmarks' option will temporarily show the 15 most recent bookmarks you have created. This can be useful if you're constructing many bookmarks. Note that this menu is frequently reset so do not use this as permanent storage for a bookmark.

FAQ

How do I make a bookmark with private data in view?

When you create a bookmark link, we save the data in view on our servers. To protect user data privacy, we have disabled this option when private data is in view. Please use the Export/Import option instead.

The gene expression dataset chosen for a specific study/cohort is the same gene expression dataset as the one in the Basic Datasets menu.

Note that the GSEA analysis runs blitzGSEA, a faster implementation of a traditional GSEA analysis.

We disable running our GSEA analysis on your own data since we send the data in the analysis to various websites, which may not be secure. Currently we only offer a docker image as a method for running this pipeline on your own data. Please contact us if you need help setting this up.

Bookmark: https://xenabrowser.net/?bookmark=2401ccb792e256d7397008b24af20565

We also have a number of signature datasets under the TCGA Pan-Cancer study from the PanCan Atlas project:

  • gene programs

  • HRD score, genome-wide DNA damage footprint

  • Immune Signature Scores (Denise Wolf et al)

  • Stemness score (DNA methylation based)

  • Stemness score (RNA based)

Hess KR, et al. Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer. J Clin Oncol. 2006 Sep 10;24(26):4236-44. Epub 2006 Aug 8.

Xena Gene Set Viewer

Source code:

Overview

The Gene Set Viewer allows comparison of individual gene sets or pathways, and their genes, across two cancer tumor sample cohorts, as well as comparison within the same sub-cohorts.

As an overview, Figure 1 shows two cohorts: the left (olive background, TCGA Ovarian Cancer) and the right (tan background, TCGA Prostate Cancer). Figure 1A shows the selections for the analysis: Gene Set, view limit, and filter (differential versus similar). Figure 1B shows the view comparing the Mean Gene Set Score in the center and individual samples on the right. Figure 1C shows the individual samples, with the hover result showing the sample and score in Figure 1E. Figure 1D provides a link directly into Xena for the given gene set. Figure 1F provides a sharable URL link. Figure 1G provides a login for use in uploading.

Analysis

Gene Expression

  • BPA GENE EXPRESSION

  • PARADIGM IPL

  • REGULON ACTIVITY (only available for the LUAD Cohort)

Mutation / CNV

  • CNV ∩ MUTATION

  • COPY NUMBER

  • MUTATION

Sources for the somatic mutation and copy number variation data

The Xena Gene Sets Viewer compares gene expression, somatic mutation, and copy number variation profiles of cancer-related gene sets across cancer cohorts. It queries genomics data hosted on public Xena Hubs, in a similar way to other tools in the Xena Visualization suite, and then generates gene set visualizations of those data.

Figure 8 shows analysis of a GMT file using the BPA method [: thanks to Verena Friedl]. This feature is only available to logged-in users, who can see only their own analyses and are limited to 100 pathways. Any valid Google login can be used. Several public pathway sets are available, including those curated from the Gene Ontology Consortium (thanks to Laurent-Philippe Albou) as well as those from the Hallmark [cite] and Pancan [cite] analyses.

)

https://xenagoweb.xenahubs.net/xena
Xena Gene Set Viewer
Xena Analysis Server
citation
https://xenabrowser.net/datapages/?hub=https://tcga.xenahubs.net:443
https://xenabrowser.net/datapages/?cohort=Cancer Cell Line Encyclopedia (Breast

TCGA

TCGA is our most used data resource. We host several versions of the TCGA data.

This paper helps clarify the differences between the Legacy TCGA data and the TCGA data on the GDC:

Types of data we have

We support a wide variety of data types including:

  • SNPs and small INDELs

  • Large structural variants

  • Segmented copy number, gene-level copy number

  • Gene-, Transcript-, Exon-, Protein-, LncRNA-, and miRNA-expression

  • DNA methylation (genes and probes)

  • Phenotype, clinical data

  • Signature scores, classifications, derived parameters

The types of data in each study vary considerably and depend on the analyses that the particular study performed.

How do I make a KM plot?

To make a KM plot, click on the column menu at the top of a column and choose 'Kaplan Meier Plot'.

Example

How do I remove duplicate samples from a KM plot?

If your plot has an '!' icon next to the p-value, some patients are in your plot twice. This can happen when A) a patient has both a tumor and a normal sample, or has a metastasis that is also part of the dataset, and/or B) a tumor sample was split into multiple aliquots that were then run through the same analysis twice.

Example of error icon

Removing duplicates

  1. Add the data column of 'sample type' from the Phenotype data

We are adding a column of data that indicates the sample type, such as 'Primary Tumor', 'Normal', etc. Note that different datasets may use a different name for this column.

2. Filter to only samples that are 'Primary tumor' by typing 'primary' into the filter search box. Next, click the filter icon next to the filter search box and choose 'Filter'. This will filter out all samples that are not primary tumors.

Note that if you are viewing a mostly metastatic cancer like melanoma, you may need to filter on 'metastatic' instead of 'primary'.

3. Run your KM analysis by clicking the caret menu at the top of the column and choosing 'Kaplan-Meier plot'. It will now include only primary tumor samples.
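The same logic can be checked outside the browser on downloaded phenotype data. The Python sketch below is hypothetical: it assumes each row has 'sample', 'patient', and 'sample_type' fields, keeps rows whose sample type contains 'primary' (ignoring case, mirroring the search in step 2), and reports patients that still appear more than once, which points to duplicates due to B (multiple aliquots).

```python
from collections import Counter


def keep_sample_type(rows, term="primary"):
    """Keep rows whose sample type contains `term`; report patients still duplicated."""
    kept = [r for r in rows if term.lower() in r["sample_type"].lower()]
    counts = Counter(r["patient"] for r in kept)
    still_duplicated = sorted(p for p, n in counts.items() if n > 1)
    return kept, still_duplicated


rows = [
    {"sample": "S1", "patient": "P1", "sample_type": "Primary Tumor"},
    {"sample": "S2", "patient": "P1", "sample_type": "Solid Tissue Normal"},
    {"sample": "S3", "patient": "P2", "sample_type": "Primary Tumor"},
    {"sample": "S4", "patient": "P2", "sample_type": "Primary Tumor"},
    {"sample": "S5", "patient": "P3", "sample_type": "Metastatic"},
]
kept, duplicated = keep_sample_type(rows)
```

Here the normal sample S2 and the metastatic sample S5 are dropped, while patient P2 remains duplicated because both of its samples are primary tumors.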

Example

Removing duplicate samples from TCGA Lower Grade Glioma KM analysis

How do I view multiple types of cancer together?

For users who wish to compare data across different types of cancer

To view multiple types of cancer patients side-by-side you will need to start with a Pan-Cancer dataset and then filter down to just the cancer types you want to see.

Generalized Steps

1. Add the phenotype column 'cancer type abbreviation', which gives the cancer type of each sample.

2. Search for the cancer types you are interested in, making sure that they are listed in the phenotype column. Separate each cancer type with 'OR', for example 'lgg OR gbm'. Click the Filter + subgroup menu next to the search bar and select 'Keep Samples'.
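The 'OR' search in step 2 can be mimicked on downloaded phenotype data with the hypothetical Python sketch below. It is a simplification of the browser's search: it splits the query on ' OR ' and keeps samples whose cancer type abbreviation exactly matches one of the terms, ignoring case.

```python
def keep_samples(cancer_types, query):
    """cancer_types: dict sample -> cancer type abbreviation; query like 'lgg OR gbm'."""
    terms = {t.strip().lower() for t in query.split(" OR ")}
    return [sample for sample, ctype in cancer_types.items() if ctype.lower() in terms]
```

For example, `keep_samples({"S1": "LGG", "S2": "GBM", "S3": "BRCA"}, "lgg OR gbm")` keeps S1 and S2.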

Example

Below is an example for viewing breast and ovarian cancer together for the TCGA PanCan Atlas

The Cancer Genome Atlas (TCGA), a collaboration between the National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI), has generated comprehensive, multi-dimensional maps of the key genomic changes in 33 types of cancer. The TCGA dataset, describing tumor tissue and matched normal tissues from more than 11,000 patients, is publicly available and has been used widely by the research community. The data have contributed to more than a thousand studies of cancer by independent researchers and to the TCGA research network publications.

As its concluding project, The Cancer Genome Atlas (TCGA) Research Network completes the most comprehensive cross-cancer analysis to date: The Pan-Cancer Atlas. Xena displays the curated genomics and clinical data generated by the Pan-Cancer Atlas consortium working groups.

TCGA data uniformly re-analyzed at the GDC using the latest human genome assembly, hg38. We download all open-access tier data from the GDC and compile individual files into datasets organized by cohort (33 individual tumor cohorts as well as a Pancan cohort). Xena displays the compiled datasets.

TCGA data has been co-analyzed with GTEx data using the UCSC bioinformatic pipeline (TOIL RNA-seq) and can be used to compare tumor vs normal gene and transcript expression from the matching tissue of origin. Xena hosts gene and transcript expression results of the UCSC RNA-seq recompute compendium.

Data generated and published by TCGA Research Network before the Pan-Cancer Atlas publications. Xena displays the level-3 data.

If you need a particular type of data, please see choosing a study/cohort to help you find the study with that type of data.

More information about KM plots can be found in our Overview of Kaplan Meier Plots.

This page will guide you on how to remove duplicates due to A. If there are duplicates due to B, you will need to download the data, decide how to resolve any inconsistencies between the multiple aliquots, and load it into your own Xena Hub.

The TCGA PanCan Study contains the latest data from the PanCan Atlas project, including many hand-curated datasets. It also contains some legacy TCGA data across all cancer types, including GISTIC 2 CNV estimates and miRNAseq estimates.

Here is a bookmark that will take you to the TCGA PanCan (PANCAN) Study with that phenotype column already selected.

The Cancer Genome Atlas (TCGA)
TCGA Pan-Cancer Atlas
TCGA data from Genomic Data Commons
TCGA data in the UCSC RNA-seq Recompute Compendium
Legacy TCGA data
Please see our help page on how to choose between these different versions of the TCGA data

Choosing a study/cohort

General recommendations

We recommend the TCGA Pan-Cancer (PANCAN) study for most analyses, unless you need a specific type of data or need to run one of the types of analysis listed below.

Why do we recommend this study?

We recommend it because it has the data from the Cancer Genome Atlas (TCGA) Research Network, which generated the most comprehensive cross-cancer analysis to date: The Pan-Cancer Atlas. Xena displays the curated genomics and clinical data generated by the Pan-Cancer Atlas consortium working groups.

Note that if you use the TCGA Pan-Cancer (PANCAN) to study a specific cancer type, you will need to filter down to just that cancer type.

If you don't want to filter ...

Our second most recommended datasets are the cancer-specific GDC TCGA studies. These avoid the need to filter down to a single cancer type and contain harmonized data from the Genomic Data Commons.

Differences between the GDC and the legacy TCGA data

More information comparing the data in the GDC to the legacy TCGA data can be found here:

Choosing a study by type of data

The table below assumes that you are interested in TCGA data. These data types may also appear in other studies, but these are the recommended studies.

| Data type | Study | Dataset name | Menu |
| --- | --- | --- | --- |
| Transcript expression | TCGA Pan-Cancer (PANCAN) | TOIL Transcript expression | Advanced |
| lncRNA expression | TCGA Pan-Cancer (PANCAN) | TOIL Gene expression | Advanced |
| Exon expression | legacy TCGA datasets (per cancer type) | Exon expression | Advanced |
| miRNA expression | TCGA Pan-Cancer (PANCAN) | Batch Effects normalized miRNA data | Advanced |
| DNA methylation | Any | DNA methylation | Advanced |
| ATAC-seq | GDC Pan-Cancer (PANCAN) | ATAC-seq | Advanced |
| Varied survival endpoints | TCGA Pan-Cancer (PANCAN) | NA (run KM plot) | -- |

Choosing a study based on a specific analysis or sample type

| Analysis | Study |
| --- | --- |
| Compare Tumor vs Normal | TCGA, TARGET, GTEx |
| GRCh38 coordinates | Any GDC study |
| Cell Line | CCLE |
| Disease specific survival, disease free survival, progression free survival | TCGA Pan-Cancer (PANCAN) |


Xena Browser

Why can't I interact with the browser? I can see one column of data but I can't do anything.

The Visual Spreadsheet wizard asks that you add at least TWO columns of data before interacting with the browser. This is because Xena was designed to allow you to find correlations within the data and you need more than one type of data on the screen to find a trend.

Add another column of data and click Done. You can always delete this column after you have completed the wizard if it is not needed.

Help! My gene is showing up as gray.

In general, we recognize genes from the HUGO gene name space. If your gene name isn't recognized, try looking it up on GeneCards and see whether any of the other names listed there are recognized.

Transcript View

About

Usage

Enter the HUGO name of your gene of interest and click 'OK'. Choose your two studies of interest from the two drop down menus. Each row in the visualization shows the transcript, transcript structure and density plots showing range of expression of that transcript.

Change the units from TPM (Transcripts Per Million) to isoform percentage using the drop-down near the top. To zoom in on a row, click on it. To zoom out, click on the row again.

Data behind the view

Exon numbering

For this visualization, we numbered the exons using an in-house automated method which may not line up with exon numbering in the literature. This method is subject to change and should not be relied on to denote any exon going forward.

Regions that are intronic in all transcripts are removed. The remaining exonic regions are numbered 1..N. Different exons within a given region are labeled starting with ‘a’ for the left-most exon (in transcript direction).

For example, exon 3 is the unique exon in the third exonic region. Exons 4a and 4b are two different exons in the fourth exonic region.

Another way to say this is: different exons across all transcripts which overlap transitively will be assigned the same integer. So if one transcript has exons 4a and 4c, there must be exons in other transcripts that overlap them, and each other.
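The numbering rules above can be sketched in Python as follows. This is a hypothetical illustration, not the Transcript View's code: exons are (start, end) pairs with inclusive coordinates, all transcripts are assumed to be on the + strand (so left to right is the transcript direction), and at most 26 exons are assumed per region.

```python
def number_exons(transcripts):
    """Label every distinct exon with its region number, plus a letter if needed.

    transcripts -- dict transcript name -> list of (start, end) exons
    Returns dict (start, end) -> label such as "3" or "4a".
    """
    # Distinct exons across all transcripts, left to right.
    exons = sorted({exon for exon_list in transcripts.values() for exon in exon_list})
    # Group transitively overlapping exons into exonic regions; positions that are
    # intronic in every transcript never appear, so they are skipped.
    regions = []
    region_end = None
    for start, end in exons:
        if regions and start <= region_end:
            regions[-1].append((start, end))
            region_end = max(region_end, end)
        else:
            regions.append([(start, end)])
            region_end = end
    labels = {}
    for number, region in enumerate(regions, start=1):
        if len(region) == 1:
            labels[region[0]] = str(number)  # unique exon: plain number
        else:
            for index, exon in enumerate(region):
                labels[exon] = f"{number}{chr(ord('a') + index)}"
    return labels
```

With T1 = [(100, 200), (300, 400), (500, 600)], T2 = [(100, 200), (350, 450), (500, 600)] and T3 = [(100, 200), (700, 800)], the overlapping exons (300, 400) and (350, 450) share region 2 and become 2a and 2b. For a transcript on the - strand you would number from right to left instead.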

Data and datasets

What are the Source Repositories Xena pulls from?

  • Various journal publications for UCSC Public Hub data

For TCGA, which gene expression RNAseq dataset should I use for my analysis?

TCGA Pan-Cancer Atlas gene expression

GDC STAR gene expression

Toil RSEM gene expression

The goal of the Toil recompute was to process ~20,000 RNA-seq samples to create a consistent meta-analysis of four datasets free of computational batch effects. This dataset is best used to compare TCGA cohorts to TARGET or GTEx cohorts.

TCGA Gene expression RNAseq (IlluminaHiSeq)

For comparison within a single TCGA cohort, you can use the "gene expression RNAseq" data. Values in this dataset are log2(x+1), where x is the RSEM value.

TCGA Gene expression RNAseq (IlluminaHiSeq pancan normalized)

For questions regarding the gene expression of a particular cohort in relation to other tumor types, you can use the pancan normalized version of the "gene expression RNAseq" data. Values in this dataset are generated at UCSC by first combining the "gene expression RNAseq" values (above) of all TCGA cohorts and then mean normalizing all values per gene. The data were then divided back into the 30-40 cancer types after normalization, so that this data is available for each cancer type. Since there are 30-40 cancer types with RNAseq data, the TCGA pancan data can serve as a proxy for the background distribution of gene expression.
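Assuming that "mean normalizing" means subtracting each gene's mean over all pooled TCGA samples (our reading; the exact procedure is not spelled out here), the step looks like this hypothetical Python sketch:

```python
def mean_normalize_per_gene(matrix):
    """matrix: dict gene -> dict sample -> log2(RSEM + 1), pooled over all cohorts."""
    normalized = {}
    for gene, values in matrix.items():
        mean = sum(values.values()) / len(values)
        # Center every sample's value on the pan-cancer mean for this gene.
        normalized[gene] = {sample: v - mean for sample, v in values.items()}
    return normalized
```

A positive value then means higher expression than the pan-cancer average for that gene, which is what lets each per-cohort slice be read against the pooled background.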

TCGA Gene expression RNAseq (IlluminaHiSeq percentile)

For comparing with data outside TCGA, you can use the percentile version if your non-TCGA RNAseq data is normalized by percentile ranking. Values in this dataset are generated at UCSC by ranking the RSEM values within each sample. The values are percentile ranks ranging from 0 to 100, where lower values represent lower expression. You can also combine the TCGA RNAseq data with your RNAseq data, normalize across the combined dataset using whatever method you choose, and then analyze the combined dataset further.
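A per-sample percentile rank can be computed as in the hypothetical Python sketch below. The exact convention used for this dataset is not described here, so the sketch assumes a common one: rank the genes within the sample, give tied values the average of their ranks, and scale so the lowest value is 0 and the highest is 100.

```python
def percentile_ranks(values):
    """Map each gene's value in one sample to a 0-100 percentile rank.

    values -- dict gene -> RSEM value for one sample (at least two genes)
    """
    ordered = sorted(values.items(), key=lambda item: item[1])
    n = len(ordered)
    ranks = {}
    i = 0
    while i < n:
        # Extend j over a block of tied values starting at i.
        j = i
        while j + 1 < n and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        average_rank = (i + j) / 2  # 0-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = 100.0 * average_rank / (n - 1)
        i = j + 1
    return ranks
```

For example, values {"A": 5, "B": 1, "C": 3} give B = 0, C = 50, and A = 100.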

What is the difference between RPPA data and RPPA_RBN data?

Can I combine data from the methylation 450k and 27k datasets?

What is the difference between GISTIC 2 and GISTIC 2 thresholded datasets?

Where is the transcript-level expression data?

How do I calculate fold change (FC)?

Log transformed means that the output values from the gene expression caller/program have been put through the following transformation:

log2(x+theta) = y

Where x is the TPM, RSEM, etc. value, "theta" is a very small value (1, 0.01, etc.) added to x since you cannot take the log of zero, "log2" is log base 2, and y is the transformed value.

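log2(A/B) = log2(A) - log2(B)

So, within our downloads (either from our bulk downloads or just a slice of the data that has not been mean normalized), say you have 2 samples with expression values for a gene. In our downloads, one sample is 4 and the other is 1. Because our values are log transformed, this means:

log2(A) = 4

log2(B) = 1

Therefore:

log2(A/B) = 4 - 1

log2(A/B) = 3

This gives you a log2 fold change of 3, which corresponds to an 8-fold change (2^3 = 8). Strictly, because theta was added before taking the log, this is the fold change of (A + theta) / (B + theta), which is close to A/B when the expression values are large compared to theta.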

Please note that in this case we are reporting the log(fold change). Biologists often use the log(fold change) because without taking the log, down regulated genes would have values between 0 and 1, whereas up regulated genes would have any value between 1 and infinity. This distribution makes graphing and further statistical analysis difficult. Taking the log typically makes the resulting values more normally distributed, which is better for further analysis.
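The same arithmetic in Python, as a minimal sketch (the function names are hypothetical). It also shows why theta matters when undoing the log: with theta = 1, the values 4 and 1 correspond to raw values of 15 and 1, whose ratio is 15, while 2^3 = 8 is the ratio of (15 + 1) to (1 + 1).

```python
def log2_fold_change(log_a, log_b):
    """log2 fold change between two values already stored as log2(x + theta)."""
    return log_a - log_b  # quotient rule: log2(A/B) = log2(A) - log2(B)


def fold_change(log_a, log_b):
    """Undo the log to get the linear fold change."""
    return 2 ** log2_fold_change(log_a, log_b)


def unlog(value, theta=1.0):
    """Recover the raw value x from y = log2(x + theta)."""
    return 2 ** value - theta
```

For the example above, `log2_fold_change(4, 1)` is 3 and `fold_change(4, 1)` is 8.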

Can I get access to the raw TOIL data?

Example command to get the manifest

aws s3 cp s3://cgl-rnaseq-recompute-toil/tcga-manifest . --request-payer requester

Now you can look through the manifest to see the TCGA files.

Example command to download a single TCGA file

aws s3 cp s3://cgl-rnaseq-recompute-toil/tcga/0106d51d-d581-4be7-91f3-b2f0c84468d1.tar.gz . --request-payer requester

How do I view my data with the data from TCGA?

If you are adding in new samples, this will require you to combine outside of Xena and then load. If you are adding new data on samples we already have, then simply load the data into a Xena Hub.

Adding in new samples

We apologize but we don't provide a simple way to do this because of the batch effects that would be present when combining most data across studies. You will need to download the data you wish to combine from TCGA, combine it yourself outside of Xena, and then load it into your own Xena hub.

Adding in more data on TCGA patient's samples, such as new subtype calls

  1. Load your data into your own Xena hub, making sure to select the cohort that you want to view your data side-by-side with when loading it.

Sample names and formats are study-specific. You will need to match what we already have in Xena.

Genomic Signatures

Xena's Transcript View shows transcript-specific expression or isoform percentage for 'tumor' TCGA data and 'normal' GTEx data. It allows you to compare the distribution of these values for two groups of patient samples.

This tool was created by Akhil Kamath as part of Google Summer of Code 2017. Akhil was advised by Angela Brooks and Brian Craft. Thank you Akhil for all your work!

All RNA-seq data was generated by the Toil pipeline recompute done by the UCSC Computational Core using the RSEM package. All transcripts are from the Gencode V23 comprehensive annotation.

GDC data portal (https://portal.gdc.cancer.gov/repository) for the GDC Hub data

GDC legacy archive (https://portal.gdc.cancer.gov/legacy-archive) for the TCGA Hub data

ICGC data portal (https://dcc.icgc.org/) for the ICGC Hub data

Pan Cancer Atlas publications’ data site (https://gdc.cancer.gov/node/905/) for the Pan-Cancer Atlas Hub data

TCGA ATAC-seq publication’s data site (https://gdc.cancer.gov/about-data/publications/ATACseq-AWG) for the ATAC-seq Hub data

Nature Biotechnology publication (https://doi.org/10.1038/nbt.3772) for the UCSC Toil RNAseq Recompute Hub data

Use this dataset for comparison across multiple or all TCGA cohorts. It was generated by the TCGA PanCan Atlas project and has been normalized for batch effects. Please see the PanCan Atlas paper in Cell for more information. Dataset page: https://xenabrowser.net/datapages/?dataset=EB%2B%2BAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.xena&host=https%3A%2F%2Fpancanatlas.xenahubs.net

Generated by the GDC, this data can also be used to compare across TCGA cohorts, although it may not have as many batch effects removed as the PanCan Atlas work.

The TCGA RPPA data are generated at MD Anderson. The RPPA values are generated using the method described at http://bioinformatics.mdanderson.org/main/TCPA:Overview. We download the RPPA values from the TCGA DCC.

The RPPA_RBN data are normalized values generated using the RBN (replicate-based normalization) method developed by MDACC. For more information: http://bioinformatics.mdanderson.org/main/TCPA:Overview. We downloaded the RBN values from Synapse at https://www.synapse.org/#!Synapse:syn1750330.

The methylation 450k dataset has 90% of the probes from the 27k dataset. However, we have found the range of data in each dataset to be slightly different, so we recommend applying some form of normalization before combining them. We recommend looking in the literature to see what methods people have used.

Many copy number estimation algorithms estimate copy number variation on a continuous scale even though it is measuring something discrete (i.e. the number of copies of a piece of a chromosome or a gene in the cell). The GISTIC 2 thresholded data attempts to assign discrete numbers to these fragments by thresholding the data. The estimated values -2, -1, 0, 1, 2 represent homozygous deletion, single copy deletion, diploid normal copy, low-level copy number amplification, and high-level copy number amplification, respectively. More information can be found in the GISTIC 2 paper and at the Broad Institute, which is the group that processed this data.

As of March 2019, our transcript-level data is in the TCGA Pan-Cancer cohort. From here, choose 'Advanced' and select any of the transcript-level expression datasets. Enter your transcript of interest as an Ensembl identifier (not a gene name).

The following instructions assume that your data has been log transformed. All the RNAseq data in Xena public data hubs have already been log transformed, either by us or by the data providers. You can always confirm this by viewing the dataset details page (start at our Explore Data pages and drill down until you get to the details page for the dataset).

When comparing these log transformed values, we use the quotient rule of logarithms:

Yes! We host it on AWS. Note that because the files are large, you will need to pay the egress fees to download them. To get started, first look through the manifests for TCGA (s3://cgl-rnaseq-recompute-toil/tcga-manifest), TARGET (s3://cgl-rnaseq-recompute-toil/target-manifest), and GTEx (s3://cgl-rnaseq-recompute-toil/gtex-manifest) and decide which files you want. Then download the files using your AWS account. Contact us if you run into any issues.

Download TCGA data through our data pages

Note that if you want to view a genomic signature on our gene expression data, you can do so using our genomic signature feature.

More information about loading data into your own Xena hub

Our data pages have more information about the sample names for a study.

Getting Started

Step-by-step instructions to viewing your own data

Overview

Get started viewing your own data:

We support tab-delimited (.tsv and .txt) and Microsoft Excel files (.xlsx and .xls). Data on a Local Xena Hub can only be viewed or accessed by the same computer on which it is running, keeping private data secure.

The Local Xena Hub must be installed and running in order to load data, as well as any time you want to view data. The Local Xena Hub will remember previously loaded data.

Please use Chrome to view your own data.

Installing a Local Xena Hub

Double click on the download to begin the installation of the Xena Hub. Follow the wizard to finish the install.

System requirements for Xena Hub

  • Mac: OSX 10.7 and above

  • Windows: 64-bit

  • Linux: ability to run a .jar file

Starting/running a Local Xena Hub

Loading data into a Local Xena Hub

Viewing data from a Local Xena Hub

Gene names and identifiers for genomic data

When you loaded your genomic data we asked what type of genes, transcripts or probes you used. If you selected one of the options from the drop down menu then you can enter HUGO gene names or the identifiers in your file. If you did not select one of the options then you will need to enter the identifiers as they appear in your file.

Help! I don't see my study listed

Data security

How does Xena ensure the security of my data?

Xena does not utilize a central rendering service, or require hubs to be publicly accessible on the internet like, for example, the UCSC Genome Browser does. Data flows in one direction, from hubs to the user agent. If the user installs a Xena Hub on their laptop, the hub is as secure as the laptop. If the user installs a Xena Hub on a local network, behind a firewall, the hub is as secure as the local network.

The Xena Browser accesses data from a local Xena Hub on the same computer by requesting data from http://127.0.0.1. The local Xena Hub makes the data within it available at this address and will only answer requests made from the user's own computer.

Users will need a web browser that supports connecting to a Xena Hub on the loopback interface. At the time of writing, this includes Chrome and Firefox, but not Safari.

Is there any data that is considered to not be secure?

A very limited set of metadata is considered to be not secure in the Xena architecture model. This includes cohort names and sample names. This metadata is visible to other hubs in the following scenarios. When the user selects a cohort, all hubs are queried for samples in that cohort. When the user selects a data field, the hub holding that field is queried with the field ID (e.g. gene, probe, transcript, phenotype) and all cohort sample IDs. This means, for example, that two hubs holding data on the same cohort will see the union of sample IDs from that cohort. While data queries are not made available publicly, a malicious person could gain entry to a Xena Hub and comb through logs for these queries. For these reasons, these metadata fields should not contain private information.

KM plots using data from a Local Xena Hub

To visualize and perform a KM analysis, we use two columns/rows of data, time to event and event. These data must be loaded in a phenotype file. The phenotype file can contain other data as well.

Note that you will need to name the headers in your phenotype file EXACTLY as we recognize them. See the list of recognized headers for each type of survival/interval below.

Time-to-event values can be in days, months, years, etc.

Time to Event and Event

Time to Event is a duration variable for each subject having a beginning and an end anywhere along the timeline of the complete study. It begins when the subject is enrolled into a study or when treatment begins, and ends when the end-point (event of interest, for example, death or metastasis) is reached or the subject is censored from the study.

Censoring means the total survival time for that subject cannot be accurately determined. This can happen when something negative for the study occurs, such as the subject dropping out, being lost to follow-up, or the required data not being available. Conversely, it can happen when something good occurs, such as the study ending before the subject had the event of interest: they survived at least until the end of the study, but there is no knowledge of what happened thereafter.

Event indicates what the 'event' was for a patient, 1 for the event, for example, death or metastasis, and 0 for censored.
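To make the expected layout concrete, here is a minimal sketch that writes a tab-delimited phenotype file for overall survival, using the recognized headers OS (event) and OS.time (time to event) and a first column named 'sample' as in the example table on this page. The function name `buildSurvivalTsv` is hypothetical; it simply checks that events are 0 or 1 and times are non-negative before emitting rows.

```javascript
// Hypothetical sketch: build a tab-delimited phenotype file for a KM plot.
// Header names must match the recognized names exactly (case-sensitive),
// e.g. 'OS' for the event and 'OS.time' for the time to event.
function buildSurvivalTsv(rows, eventHeader = 'OS', timeHeader = 'OS.time') {
  const lines = [['sample', eventHeader, timeHeader].join('\t')];
  for (const { sample, event, time } of rows) {
    // Event: 1 = the event occurred (e.g. death), 0 = censored.
    if (event !== 0 && event !== 1) {
      throw new Error(`Event for ${sample} must be 0 or 1, got ${event}`);
    }
    // Time to event: any unit (days, months, years), but a non-negative number.
    if (!(typeof time === 'number' && time >= 0)) {
      throw new Error(`Time for ${sample} must be a non-negative number`);
    }
    lines.push([sample, event, time].join('\t'));
  }
  return lines.join('\n') + '\n';
}

const tsv = buildSurvivalTsv([
  { sample: 'TCGA-AB-1234-01', event: 0, time: 100 },
  { sample: 'TCGA-AB-6789-01', event: 1, time: 200 },
]);
console.log(tsv);
```

The same function can emit other survival types by passing a different header pair, such as 'PFI' and 'PFI.time'.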

Recognized header names for different types of survival

Below is a table of the column/row header names we recognize for each type. Note that these header names are case sensitive.

Example Overall Survival

More studies

TCGA, TARGET, and GTEx RNA-seq data are uniformly re-aligned to the hg38 genome and re-processed using RSEM and Kallisto with GENCODE v23 annotations to generate expression estimates for ~60,000 genes and ~200,000 transcripts, including many lncRNAs. Xena hosts and displays the gene and transcript expression results of this analysis.

The goal of the International Cancer Genome Consortium (ICGC) is to obtain a comprehensive description of genomic, transcriptomic, and epigenomic changes in 50 different tumor types and/or subtypes that are of clinical and societal importance across the globe. It includes TCGA data (U.S.A.) plus data contributed by groups from other countries in the consortium. The resource has publicly accessible non-coding somatic mutation data from non-TCGA samples.

The Pan-Cancer Analysis of Whole Genomes (PCAWG) study is an international collaboration to identify common patterns of mutation in more than 2,600 cancer whole genomes from the International Cancer Genome Consortium. Building upon previous work which examined cancer coding regions, this project explored the nature and consequences of somatic and germline variations in both coding and non-coding regions, with specific emphasis on cis-regulatory sites, non-coding RNAs, and large-scale structural alterations.

The NCI's Genomic Data Commons (GDC) provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine.

Cancer Cell Line Encyclopedia. Detailed genetic and pharmacologic characterization of a large panel (~1100) of human cancer cell lines.

Pediatric data

We have a number of sources of pediatric data.

The goal of the Treehouse Childhood Cancer Initiative (Treehouse) is to evaluate the utility of comparative gene expression analysis for difficult-to-treat pediatric cancer patients. With nearly 2,000 pediatric tumor samples, Treehouse has assembled a large collection of pediatric cancer RNA-Seq data, which, combined with adult data, forms a compendium of over 11,000 adult and pediatric tumor-derived gene expression profiles. Pediatric cancer expression data come from public repository samples and from clinical samples at partner institutions, including UC San Francisco, Stanford, Children’s Hospital of Orange County, and British Columbia Cancer Agency. In line with the UC Santa Cruz Genomics Institute’s commitment to sharing data and to furthering research everywhere, we have made this data available for all to download and use.

Interested in a dataset we don't have?

Hubs for institutions, collaborations, labs, and larger projects

Institutional Xena Hubs allow you to share data, visualizations, and analyses with a specific group of people. Xena Hubs can be set up on any server or in the cloud. You control who has access to the Xena Hub by controlling who has access to the server on which it is hosted.

To make your data publicly available, simply make the server open to the web.

Download

First, download the ucsc_xena_xxx.tar.gz file to your server, here:

The file to download is the one called "Tar archive, no updater or JRE - recommended for linux server developments". Uncompress and extract the .jar file (cavm-xxx-standalone.jar). The current version is 0.25.0.

Start the hub

The hub can be started with "java -jar cavm-xxx-standalone.jar". Passing option --help will display usage information.

Note that you need to use Java 8 to run the hub.

There are several options you will want to set.

To bind an external interface (instead of loopback), use "--host 0.0.0.0".

The connection between your hub and the Xena Browser is over HTTPS; use the --certfile and --keyfile options to set the certificate and private key.

There are three paths that can be configured: the database file, the log file, and the root directory for data files to be served. These are set by --database, --logfile, and --root. If you don't set these, they will default to paths under ${HOME}/xena.

Example start script for an open-access hub

Copy the content below to a file "start_script"

Link server.jar to cavm-x.xx.x-standalone.jar

Make "start_script" executable

Run "./start_script"

Your hub is now running on "https://computer-external-ip:7223".

Getting a security certificate for an open-access hub

When a Xena Hub starts, it opens two consecutive ports for HTTP and HTTPS connections, e.g. 7222 and 7223. HTTP always uses the lower number and HTTPS the higher number. This means your hub has two URLs, for example http://computer-external-ip:7222 and https://computer-external-ip:7223.
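The port rule above can be sketched as a small helper: the port passed with -p (PORT in the start script) is the HTTP port, and HTTPS is the next port up. The function name `hubUrls` is hypothetical.

```javascript
// Hypothetical sketch: derive a hub's two URLs from the port it was started with.
// HTTP uses the given port; HTTPS always uses the next (higher) port.
function hubUrls(hostname, httpPort = 7222) {
  return {
    http: `http://${hostname}:${httpPort}`,
    https: `https://${hostname}:${httpPort + 1}`,
  };
}

console.log(hubUrls('computer-external-ip').https); // https://computer-external-ip:7223
```

Because modern browsers require HTTPS, the `https` URL is the one to add on the Data Hub page.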

Connecting via HTTP to the hub is no longer supported by modern web browsers, thus you will need to connect via HTTPS. To do this you will need an HTTPS certificate and private key. Paths to the cert and key are set with --certfile and --keyfile. This might seem redundant for a hub behind a firewall, but the web app has no influence over the security policies of the web browser. HTTPS certificates can be acquired from free public Certificate Authorities, or via NIH InCommon.

Make your data ready

Load data through command line

Once the hub is running and input files have been placed in the --root directory, a file can be loaded by running the jar a second time with the -l option:

If your hub runs on the default 7222 port, you can load data with

If your hub is running on a different port, you can load data with

Please contact us at genome-cancer@soe.ucsc.edu for more assistance.

Delete data through command line

If your hub runs on the default 7222 port, you can delete data with

If your hub is running on a different port, you can delete data with
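The load and delete commands differ only in the flag (-l to load, -x to delete) and in whether -p is passed for a non-default port. As a sketch, the argument list can be assembled like this; `hubFileCommand` is a hypothetical helper, and "server.jar" assumes the symlink created for the start script.

```javascript
// Hypothetical sketch: assemble java arguments for loading (-l) or deleting (-x)
// a file on a running hub. -p is only added when the hub is not on port 7222.
const DEFAULT_PORT = 7222;
const FLAGS = { load: '-l', delete: '-x' };

function hubFileCommand(action, filePath, port = DEFAULT_PORT) {
  if (!Object.prototype.hasOwnProperty.call(FLAGS, action)) {
    throw new Error(`Unknown action: ${action}`);
  }
  const args = ['-jar', 'server.jar'];
  if (port !== DEFAULT_PORT) {
    args.push('-p', String(port));
  }
  args.push(FLAGS[action], filePath);
  return args;
}

console.log(['java', ...hubFileCommand('load', '/path/to/root/file.tsv')].join(' '));
// java -jar server.jar -l /path/to/root/file.tsv
console.log(['java', ...hubFileCommand('delete', '/path/to/root/file.tsv', 7000)].join(' '));
// java -jar server.jar -p 7000 -x /path/to/root/file.tsv
```

To run the command from Node rather than print it, the array can be passed to `child_process.spawnSync('java', args)`.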

Viewing data from the hub

You can now go to the visualization and add a cohort or study listed in your hub.

If you don't have a security certificate yet

If you don't have a security certificate yet but would like to verify that the hub is working, you can use SSH tunneling. An example of how to do this on AWS is below, assuming the Xena Hub is running on port 7222 for HTTP and 7223 for HTTPS. In this scenario, you start the hub without the --certfile and --keyfile options.

Assuming that you typically ssh into EC2 on AWS like this,

you will now set up an ssh tunnel to port 8000 on your computer. To do this we add the -L option:

Now on your computer, http://localhost:8000 is the same as the http://aws-ip:7222. Chrome Browser does not allow a connection to http://aws-ip:7222, but it will allow a connection to http://localhost:8000.

An example apache configuration on AWS VM

In /etc/httpd/conf/httpd.conf:

A landing page for my hub

How do I add a 'Launch Xena' button like the TOIL landing page?

<button class="hubButton" data-cohort="TCGA TARGET GTEx">Launch Xena</button>

To add a clickable button to the hub landing page, make sure the button has the class name 'hubButton'. You also need to specify the cohort to view with the data attribute 'data-cohort'. When users click the button, the visualization wizard will launch with the specified cohort. You can change the button label.

A landing page for my cohort

How do I add a "Launch" button like the TCGA TARGET GTEx landing page?

<button class="cohortButton" data-bookmark="bc7f3f46b042bcf5c099439c2816ff01">Example: compare FOXM1 expression</button>

The button must have the class name 'cohortButton'. If the button has the data attribute 'data-bookmark', clicking it will take the user to the bookmarked view. If it does not have 'data-bookmark', clicking it will take the user to the visualization wizard with an empty spreadsheet. You can change the button label, and you can add as many buttons as you want.
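If you generate landing-page markdown from a script, the two button types can be produced as HTML strings. This is a minimal sketch with hypothetical helper names; it only reproduces the class names and data attributes described above (class 'hubButton' with 'data-cohort', class 'cohortButton' with optional 'data-bookmark') and escapes attribute values.

```javascript
// Hypothetical sketch: generate landing-page buttons as HTML strings.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hub landing page: class 'hubButton' plus the cohort to open in the wizard.
function hubButton(cohort, label = 'Launch Xena') {
  return `<button class="hubButton" data-cohort="${escapeHtml(cohort)}">${escapeHtml(label)}</button>`;
}

// Cohort landing page: class 'cohortButton'. With a bookmark, the user goes to
// the bookmarked view; without one, to the wizard with an empty spreadsheet.
function cohortButton(label, bookmark) {
  const attr = bookmark ? ` data-bookmark="${escapeHtml(bookmark)}"` : '';
  return `<button class="cohortButton"${attr}>${escapeHtml(label)}</button>`;
}

console.log(hubButton('TCGA TARGET GTEx'));
// <button class="hubButton" data-cohort="TCGA TARGET GTEx">Launch Xena</button>
```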

Setting up Xena for your institution

Please see our Viewing your own Data documentation:

Deep Linking Into Xena

How to programmatically specify Xena Browser views

Xena can draw visualizations based on parameters passed through the URL. You will need to URL-encode the parameters.

Examples

Display Column Setup Examples

HTML Code showing how to build examples 1-7

Highlight Samples Examples

Below is HTML code showing how to generate URLs that:

1) highlight samples TCGA-C8-A131-01 or TCGA-BH-A0DL-01

2) highlight samples matching arbitrary criteria, such as samples in Column B with values > 10

HTML code for highlighting samples examples

HTML code showing sample filtering (to specify what samples to display in the view) examples

Base URL and URL construction

columns parameter

The columns parameter is a JSON-encoded array of objects, specifying the columns to display.

Properties

To specify a single column you need to, at a minimum, specify the dataset ID, the hub where the data resides, and the fields that you want to display.
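A minimal sketch of URL construction, mirroring the HTML example on this page: JSON-encode the columns array, URL-encode it into the columns parameter, and optionally do the same for the heatmap parameter. Each column object carries the three required properties: name (dataset ID), host (hub), and fields. The helper name `columnsUrl` is hypothetical; the dataset and hub are the ones used in the example code.

```javascript
// Hypothetical sketch: build a Xena deep link from column (and heatmap) objects.
const BROWSER = 'https://xenabrowser.net/';

function columnsUrl(columns, heatmap) {
  let url = BROWSER + 'heatmap/?columns=' + encodeURIComponent(JSON.stringify(columns));
  if (heatmap) {
    url += '&heatmap=' + encodeURIComponent(JSON.stringify(heatmap));
  }
  return url;
}

// Minimum column: dataset ID (name), hub (host), and fields.
const column = {
  name: 'tcga_Kallisto_tpm',
  host: 'https://toil.xenahubs.net',
  fields: 'TP53 FOXM1',
};

console.log(columnsUrl([column]));
console.log(columnsUrl([column], { mode: 'chart' }));
```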

Fields for genomic columns

Fields can be a gene, probe, or chromosome position. All fields need to be of the same type (i.e. all genes or all probes). You can only enter one chromosome position per column.

Fields for phenotypic columns

The field should be the field ID exactly as it appears in the dataset.

Optional properties

width: <number>

Width in pixels

columnLabel: <string>

Text for top column label

fieldLabel: <string>

Text for bottom column label

geneAverage: <boolean>

Display the gene average instead of the individual probes for a gene. You can use this only when a single gene is specified for a dataset that has probes on a gene.

normalize: ‘none’ | ‘mean’ | ‘log2’ | ‘normal2’

How the data should be dynamically normalized on the fly. 'mean' is x-mean (subtract mean), applied per (sub)column. 'log2' is log2(x+1). 'normal2' is (x-2).
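To make the four normalize options concrete, here is a sketch applying each formula as described above to one (sub)column of values. This is an illustration of the stated formulas, not Xena's own code; the function name is hypothetical.

```javascript
// Hypothetical sketch of the 'normalize' options, applied to one (sub)column.
function normalize(values, mode) {
  switch (mode) {
    case 'none':
      return values.slice();
    case 'mean': {
      // x - mean, with the mean computed per (sub)column
      const m = values.reduce((s, v) => s + v, 0) / values.length;
      return values.map((v) => v - m);
    }
    case 'log2':
      // log2(x + 1)
      return values.map((v) => Math.log2(v + 1));
    case 'normal2':
      // x - 2
      return values.map((v) => v - 2);
    default:
      throw new Error(`Unknown normalize mode: ${mode}`);
  }
}

console.log(normalize([1, 2, 3], 'mean')); // [ -1, 0, 1 ]
console.log(normalize([0, 1, 3, 7], 'log2'));
```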

showIntrons: <boolean>

Show introns for mutation and segmented copy number columns

sortDirection: 'reverse'

Reverse sort the samples

sortVisible: <boolean>

Sort column on the zoomed region

filterColumns parameter

Same as the columns parameter, but these columns will not be displayed. They are available for sample filtering. See the filter property of the heatmap parameter, below.

heatmap parameter

The heatmap parameter is a JSON-encoded object specifying global display options.

Properties

mode: 'chart'

Display in chart mode rather than visual spreadsheet mode.

showWelcome: <boolean>

Show the welcome banner.

searchSampleList: [<string>, ...]

Highlight the specified samples in the view.

search: <string>

filter: <string>

Both search and filter can be specified in the same url, in which case the samples will be filtered, and any remaining samples matching search will be highlighted. Note that the search expression should only reference columns, not filterColumns, since the latter are not available for visualization.
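A sketch of a URL that both filters and highlights samples. It assumes (not confirmed on this page) that filterColumns is passed as its own URL-encoded JSON query parameter, like columns and heatmap, and that the expressions use the same syntax as the Find box, modeled on the subgroup expression 'C:>0.5' used in the Compare subgroups example. The helper names and thresholds are hypothetical.

```javascript
// Hypothetical sketch: filter samples with a hidden filter column, then
// highlight a subset using a displayed column.
// ASSUMPTION: filterColumns is a query parameter named like the property.
const BROWSER = 'https://xenabrowser.net/';

function param(name, value) {
  return name + '=' + encodeURIComponent(JSON.stringify(value));
}

function filteredUrl(columns, filterColumns, heatmap) {
  const parts = [param('columns', columns), param('heatmap', heatmap)];
  if (filterColumns.length > 0) {
    parts.push(param('filterColumns', filterColumns));
  }
  return BROWSER + 'heatmap/?' + parts.join('&');
}

const hub = 'https://toil.xenahubs.net';
const columns = [{ name: 'tcga_Kallisto_tpm', host: hub, fields: 'TP53' }]; // column B
const filterColumns = [{ name: 'tcga_Kallisto_tpm', host: hub, fields: 'FOXM1' }]; // column C

// Keep samples with FOXM1 (C) above 5, then highlight those with TP53 (B) above 10.
// The search expression references only B, since filterColumns are not visualized.
const url = filteredUrl(columns, filterColumns, { filter: 'C:>5', search: 'B:>10' });
console.log(url);
```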

How do I cite UCSC Xena?

You've run your analysis and are ready to publish your paper - congratulations! Cite the paper below to thank Xena and keep our project funded.

How do I compare gene expression between subgroups?

  1. First, make sure that the gene or genes that you want to compare across your groups are on screen.

  2. Click on the charts icon in the top right and choose 'Compare subgroups'.

  3. Click the dropdown for 'Show data from' and choose your gene expression column.

  4. Click the dropdown for 'Subgroup samples by' and choose your subgroup column.

  5. Choose if you would like a box plot or violin plot and click 'Done'.

Example

Below we look at samples with aberrations in EGFR in the TCGA Lung Adenocarcinoma study. We will investigate whether samples that have aberrations in EGFR (mutations or copy number amplifications) have higher EGFR gene expression.

Steps

  1. Click the graph icon in the upper right corner to enter Chart View.

  2. Click 'Compare subgroups', since we want to compare the group of samples that have aberrations in EGFR to the group of samples that do not.

  3. Click the dropdown for 'Show data from' and choose 'column C: EGFR - gene expression RNAseq - HTSeq - FPKM-UQ'.

  4. Click the dropdown for 'Subgroup samples by' and choose 'column B: (mis OR infra) OR C:>0.5 - Subgroup'.

  5. Click 'Done'.


Differential Gene Expression

Run a genome-wide differential gene expression analysis to compare groups of samples

To run a differential gene expression analysis, click on the 3 dot column menu at the top of a categorical column (not a numerical column) and choose 'Differential Expression'.

This will take you to a new page where you define the sample subgroups you would like to compare (note that you can select multiple categories for a single subgroup).

After you have your subgroups, scroll to the bottom and click 'submit'.

Due to compute limitations you can only run a total of 2000 samples through the analysis pipeline.

This will start the analysis, which may take a while to run depending on the size of the dataset. The web page updates as results are completed; scroll to see more results. Once the analysis is finished, the top of the page will say 'Done'.

More details

The Advanced Visualization parameters only apply to the PCA or t-SNE plot. They do not apply to any other analyses.

Running it on your own data

We disable running our differential gene expression analysis on your own data since we send the data in the analysis to various websites, which may not be secure. There are 3 options to run our analysis on your own data:

TumorMap

A tool developed by the Stuart Lab to view samples in a 2D layout

TumorMap is a tool that enables grouping samples based on their omic signatures in a visually accessible way. Similar to dimensionality reduction methods, Tumor Map method takes a high-dimensional omics space and produces a two dimensional visualization. Unlike most dimensionality reduction methods, the TumorMap method is able to combine multiple types of omics data (e.g. mRNA expression and methylation data types in a single map). Furthermore, TumorMap is an interactive tool that allows navigating through a tumor landscape that represents a heterogeneous multi-dimensional and multi-platform omic space of oncogenic signatures.

In the TumorMap, each node is a sample and clusters of samples indicate groups with similar oncogenic signatures and genomic alteration events. The samples in a map may be colored by various molecular, clinical, diagnostic, prognostic, and phenotypic annotations (e.g. tumor type, molecular subtype, etc.) to visualize associations with the data type used in clustering.

Xena Single Cell

Overview of how to view single cell data

Entering Xena Single Cell

To enter Xena Single Cell, click on 'SINGLECELL' in the top navigation bar. This takes you to a welcome page, where you can click 'Enter'.

Selecting a study

You can either double-click to select a study or click on a study and click next.

For many of the studies we generated data on top of what was provided by the authors, such as the coordinates for a 3D UMAP. Note that these Xena-derived data are in beta and are subject to change at any point.

Selecting a layout

After you have selected a study you will be prompted to select a layout on the right. We offer two types of layouts: spatial and UMAP/t-SNE. Our UMAP/t-SNE plots can be 2D or 3D.

Image configuration

Many of our spatial layouts have an image component, either H&E or fluorescent image channels. Click on the 'image' tab in the right panel to manipulate the image component, if available. At the top of the tab will be an opacity slider for the H&E image, if that is available for your layout. Below that are the image channels. The check boxes allow you to turn color channels on and off. You can select different channels by clicking on the channel name and choosing a new one from the drop down menu. The slider below the channel name allows you to adjust the saturation of the color. At the bottom of the tab you can toggle the cell segmentation on and off, if it is available for an image.

Coloring the cell/dots

All of our layouts have the ability to color the cell or dot by various data. The options, which may or may not be available for a particular study, are donor, dataset, cell types/clusters, cell type/cluster scores, gene/protein, and more options. The 'cell type/cluster scores' allows you to more closely examine one cell type or cluster. 'Gene/protein' contains gene expression and protein expression data. More options contains a variety of additional data we have on those cells/dots.

Each of these options will also have the ability to blend this first coloring with another color. This will allow you to color a categorical feature by a continuous one so that the intensity of the categorical color is controlled by the continuous one. This can be useful for seeing in which cell types a gene is expressed. The blend with option will also allow you to visualize the expression of two continuous features together, such as two genes. Not all combinations of data types are supported.
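One way to picture the blend is a categorical color whose intensity is scaled by a continuous value. The sketch below is a hypothetical illustration of that idea, not Xena Single Cell's actual implementation: it rescales the continuous value to [0, 1] and interpolates each RGB channel between white and the category's color.

```javascript
// Hypothetical sketch (not Xena's implementation): blend a categorical color
// with a continuous value so the color's intensity tracks the value.
function blend(categoryRgb, value, min, max) {
  // Scale the continuous value to [0, 1], clamping out-of-range values.
  const t = max > min ? Math.min(1, Math.max(0, (value - min) / (max - min))) : 1;
  // Interpolate each channel between white (255) and the category color.
  return categoryRgb.map((c) => Math.round(255 + (c - 255) * t));
}

const red = [228, 26, 28]; // example color for one cell type
console.log(blend(red, 0, 0, 10)); // [ 255, 255, 255 ] (no expression: white)
console.log(blend(red, 10, 0, 10)); // [ 228, 26, 28 ] (maximum expression: full color)
```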

Viewing single cell data in the Visual Spreadsheet or Chart View

Before and After: A Comparison of Legacy and Harmonized TCGA Data at the Genomic Data Commons (NCI Genomic Data Commons)

We support most types of genomic and/or phenotypic/clinical/annotation data. Genomic data needs to be values called on genes, transcripts, exons, probes, or some other identifier. Phenotypic/clinical/annotation data can be almost anything, including patient data (e.g. age, sex, etc.), clinical data (survival data for a KM plot), and other data such as gene fusion calls, regulon activity, immune scores, and more. Samples can be bulk tissue, cell lines, cells, and more. We do not visualize raw data such as FASTQs or BAMs.

Data can be your own or from another source, like GEO or a publication.

Click on VIEW MY DATA. You will be prompted to download and install a Local Xena Hub.

After installing a Local Xena Hub, go back to VIEW MY DATA to auto-start the Hub. If it does not automatically start, refresh the page or double click on the Xena Hub application on your computer. The Xena Hub application should be in your Applications folder for Mac and Windows. Note that it can take up to one minute to start up.

Most people load data into their Local Xena Hub through our website wizard, which leads you through the loading process step by step. Note that you will want to make sure your data is properly formatted ahead of time.

You can also load data via the command line.

Click on VISUALIZATION. If your study is not already selected as step 1 of the wizard, select it from the drop down and click 'Done'. Note that if you did not enter a study name, your data will be under 'My Study'.

Your Local Xena Hub must be running to view any data that you have loaded into it. Please ensure it is running on your computer. You can also check which studies are on your hub and what data is in them by going to the My Computer Hub page.

Help text was partially taken from .

The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET), among many others.

Xena displays gene expression data from the metastatic cancer study published in

The goal of the Gabriella Miller Kids First Pediatric Research Program (Kids First) is to develop to help researchers uncover new insights into the biology of childhood cancer and structural birth defects, including the discovery of shared genetic pathways between these disorders. Over 2015-2018, the program selected 26 patient cohorts for whole genome sequencing through a peer-review process.

TARGET data is intended exclusively for biomedical research using pediatric data (i.e., the research objectives cannot be accomplished using data from adults) that focus on the development of more effective treatments, diagnostic tests, or prognostic markers for childhood cancers. Moreover, TARGET data can be used for research relevant to the biology, causes, treatment and late complications of treatment of pediatric cancers, but is not intended for the sole purposes of methods and/or tool development (please see section of the OCG website). If you are interested in using TARGET data for publication or other research purposes, you must follow the .

Don't see a study or dataset that you are interested in? for yourself or your group with the data you need.

or

Note that , which can be used for testing purposes only.

You will need to make your data file ready just as for a Local Xena Hub on your laptop. Please see instructions on .

You will also need to make your data's meta-data file (xxx.json) ready. Please see for instructions.

Go to the Data Hub page and add "https://computer-external-ip:7223".

After setting up the SSH tunnel, go to the Data Hub page and add "http://localhost:8000".

How to set up my hub to have a URL like https://tcga.xenahubs.net

Alternatively, you can run the hub behind a reverse proxy and attach the certificate and key file to an Apache, Nginx, or AWS load balancer configuration. In this scenario, you start the hub without the --certfile and --keyfile options. This is useful if you want your hub to have a URL like "https://tcga.xenahubs.net". Set up your DNS to point the hostname (tcga.xenahubs.net) to the IP address of the server on which the hub is running.

If you have a markdown file called $DOCROOT/meta/info.mdown in your hub's document root directory, the markdown file will serve as a splash page for your hub. An example is the UCSC Toil RNA-seq Recompute hub: . The corresponding markdown file is .

You can also have a landing page for a study cohort. An example is the TCGA TARGET GTEx cohort: . The corresponding markdown file is . The study cohort landing page is also a markdown file, which must be hosted in the cohortMetaData repository on GitHub at https://github.com/ucscXena/cohortMetaData/cohort_$cohortName/info.mdown.

The list of supported parameters below is not exhaustive. If you do not see the functionality you need below, please contact us.

One data column (with two subcolumns) display

Display in chart mode

Three data column display, one clinical data column, two genomic data columns

Specify column width, top label, and bottom label on Column C

Reverse sort on Column B

Display gene average in Column C

Display introns in Column C; Hide welcome banner

Equivalent to typing this text into the 'Find' feature in Xena. In this example, it highlights the samples whose value in column B is 'TARGET'.

Like the search parameter, but filters the view to the matching samples. Equivalent to . Columns that are needed only for filtering (not visualization) can be added to the filterColumns parameter; they are labeled after the displayed columns. For example, if columns has length two, they are labeled 'B' and 'C', and the first column in filterColumns will be 'D'.
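The labeling rule can be sketched as a small function: column A is the samples column, so displayed columns start at 'B' and filterColumns continue after them. The helper is hypothetical and only handles letters B through Z, since naming beyond Z is not described here.

```javascript
// Hypothetical sketch: letters assigned to columns and filterColumns.
// Column A is the samples column, so data columns start at 'B'.
function columnLetters(numColumns, numFilterColumns) {
  const total = numColumns + numFilterColumns;
  if (total > 25) {
    throw new Error('Only letters B..Z are handled in this sketch');
  }
  const letter = (i) => String.fromCharCode('B'.charCodeAt(0) + i);
  const all = Array.from({ length: total }, (_, i) => letter(i));
  return { columns: all.slice(0, numColumns), filterColumns: all.slice(numColumns) };
}

console.log(JSON.stringify(columnLetters(2, 1))); // {"columns":["B","C"],"filterColumns":["D"]}
```

A filter expression for the first filter column in this case would therefore reference 'D'.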

Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020).

You can also read our paper for free at bioRxiv:

This page assumes you have a column on screen that has the groups you would like to compare (such as 'sample type' for comparing tumor vs. normal) or that you have already made subgroups (such as 'has mutations in EGFR' vs. 'does not have mutations in EGFR'). If you need help making subgroups, please see the help page.

For more information see our .

The gene expression dataset chosen for a specific study/cohort is the same gene expression dataset as the one in the .

Upload your data to BioJupies. BioJupies, by the Ma'ayan lab, runs an analysis somewhat similar to ours and has a very user-friendly interface.

Upload your data to the to run a very similar analysis. Our analysis is based on this pipeline, which requires a bit more familiarity with running differential gene expression analyses. Our only modifications are to automatically pick the best normalization and other options based on our public data; you will need to know which options are best for your own data.

Run our pipeline on your own computer. This will give you identical results to our pipeline but requires the most engineering to set up and run. You will need to with all the dependencies pre-installed and then download and run the notebook on this docker.

UCSC TumorMap is a separate project developed by the Stuart Lab at UCSC. We link to them to help users gain another perspective on the data they are seeing in Xena. From their :

Note that we do not yet support users loading their own data. If there is data that you'd like to see on Xena Single Cell, public or private, please contact us.

In addition to viewing single cell data in Xena Single Cell, you can also view it in the Visual Spreadsheet or Chart View. Select the study and the columns or data you are interested in in the Visual Spreadsheet, as you would for bulk data.

  1. Install a Local Xena Hub on your computer

  2. Start the Local Xena Hub

  3. Load the data you want to view

  4. View the data

| Survival Type | 'Time to Event' Header name | 'Event' Header name |
| --- | --- | --- |
| Overall Survival | OS.time | OS |
| Disease free interval | DFI.time | DFI |
| Disease specific survival | DSS.time | DSS |
| Progression free interval | PFI.time | PFI |
| Local recurrence interval | LRI.time | LRI |
| Distant metastasis interval | DMI.time | DMI |
| Distant disease free survival | DDFS.time | DDFS |
| Invasive disease free survival | IDFS.time | IDFS |
| Regional recurrence | RR.time | RR |
| Relapse | Relapse.time | Relapse |
| Metastasis | Metastasis.time | Metastasis |
| Distant recurrence interval | DRI.time | DRI |
| Distant metastasis free survival | DMFS.time | DMFS |

| sample | OS | OS.time |
| --- | --- | --- |
| TCGA-AB-1234-01 | 0 | 100 |
| TCGA-AB-6789-01 | 1 | 200 |
| TCGA-CD-1234-01 | 0 | 300 |
| TCGA-CD-5678-01 | 1 | 400 |

Event and Time to Event columns
--database, -d   defaults to ${HOME}/xena/database
--logfile        defaults to ${HOME}/xena/xena.log
--root, -r       defaults to ${HOME}/xena/files/
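#!/bin/bash

# HTTP is served on ${PORT}; HTTPS on the next port up (7223 here)
PORT=7222
LOGFILE=xena/xena7222.log
DOCROOT=xena/files
DB=xena/myHub
# Paths to your HTTPS certificate and private key (placeholders: replace with your own)
CERTFILE=/path/to/certificate.pem
KEYFILE=/path/to/private-key.pem

# Start the hub in the background, bound to all interfaces, without the GUI
java -jar server.jar -r "${DOCROOT}" -d "${DB}" --no-gui -p "${PORT}" -H 0.0.0.0 --logfile "${LOGFILE}" --certfile "${CERTFILE}" --keyfile "${KEYFILE}" > log 2>&1 &

# Detach the background job so the hub keeps running after the shell exits
disown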
ln -sf cavm-0.xx.0-standalone.jar server.jar
chmod u+x start_script
./start_script
ln -sf cavm-x.xx.x-standalone.jar server.jar
java -jar server.jar -l /path/to/root/file.tsv
java -jar server.jar -p ${PORT} -l /path/to/root/file.tsv
java -jar server.jar -x /path/to/root/file.tsv
java -jar server.jar -p ${PORT} -x /path/to/root/file.tsv
ssh -i "xena.pem" ec2-user@ec2-11-111-11-111.compute-1.amazonaws.com
ssh -i "xena.pem" -L 8000:localhost:7222 ec2-user@ec2-11-111-11-111.compute-1.amazonaws.com
<VirtualHost *:443>
    ServerName tcga.xenahubs.net
    SSLEngine on
    SSLProxyEngine On
    SSLProxyVerify none
    SSLProxyCheckPeerCN off
    SSLProxyCheckPeerName off
    SSLProxyCheckPeerExpire off
    SSLCertificateFile YOURCERTIFICATE
    SSLCertificateKeyFile YOURKEY
    # Set up the proxy to the hub's HTTPS port (assumes the hub was started with -p 9000, so HTTPS is on 9001)
    ProxyPreserveHost On
    ProxyPass / https://localhost:9001/
    ProxyPassReverse / https://localhost:9001/
</VirtualHost>
<html>
<script>
	window.onload = function() {
        var browser = 'https://xenabrowser.net/';

        var hub1 = 'https://toil.xenahubs.net';
        var dataset1 = 'tcga_Kallisto_tpm';

		var hub2 = 'https://tcga.xenahubs.net';
		var dataset2 = 'TCGA.PANCAN.sampleMap/Gistic2_CopyNumber_Gistic2_all_data_by_genes';
		
		var hub3 = 'https://pancanatlas.xenahubs.net';
		var dataset3 = 'Survival_SupplementalTable_S1_20171025_xena_sp';

		var hub4 = 'https://pancanatlas.xenahubs.net';
		var dataset4 = 'broad.mit.edu_PANCAN_Genome_Wide_SNP_6_whitelisted.xena';

		var hub5 = 'https://gdc.xenahubs.net';
		var dataset5 = 'TCGA-BRCA.somaticmutation_wxs.tsv';

		var col1 = {name: dataset1, host: hub1, fields: 'TP53 FOXM1'};
		var col2 = {name: dataset2, host: hub2, fields: 'FOXM1'};
		var col3 = {name: dataset3, host: hub3, fields: 'cancer type abbreviation'};
		var col4 = {name: dataset4, host: hub4, fields: 'chr3:4000000-4100000'};
		var col5 = {
			name: dataset1,
			host: hub1,
			width: 400,
			columnLabel: 'top column label',
			fieldLabel: 'bottom column label',
			fields: 'ENST00000064780.6 ENST00000066544.7 ENST00000070846.10 ENST00000072516.7 ENST00000072644.5 ENST00000072869.8 ENST00000074304.9 ENST00000075120.11 ENST00000075322.10'
		};

		var col6 = {name: dataset1, host: hub1, fields: 'TP53'};
		var col6_geneAve = {name: dataset1, host: hub1, fields: 'TP53', geneAverage: true};

		var col7 = {name: dataset5, host: hub5, fields: 'TP53'};
		var col7_showIntron = {name: dataset5, host: hub5, fields: 'TP53', showIntrons: true};

		var col8 = {
			name: dataset1, 
			host: hub1, 
			fields: 'TP53 FOXM1', 
			sortDirection: 'reverse'
		};

		var heatmapChart = JSON.stringify({
			mode: 'chart'
		});

		var heatmapHideWelcome = JSON.stringify({
			showWelcome: false,
		});

        var columns1 = JSON.stringify([col1]);
        var l1 = document.getElementById('link1');
        l1.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns1);

        var columns2 = JSON.stringify([col1, col2]);
        var l2 = document.getElementById('link2');
        l2.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns2) + '&heatmap=' + encodeURIComponent(heatmapChart);

        var columns3 = JSON.stringify([col3, col2, col4]);
        var l3 = document.getElementById('link3');
        l3.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns3);

        var columns4 = JSON.stringify([col3, col5]);
        var l4 = document.getElementById('link4');
        l4.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns4);

        var columns5 = JSON.stringify([col8]);
        var l5 = document.getElementById('link5');
        l5.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns5);

        var columns6 = JSON.stringify([col6, col6_geneAve]);
        var l6 = document.getElementById('link6');
        l6.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns6);

        var columns7 = JSON.stringify([col7, col7_showIntron]);
        var l7 = document.getElementById('link7');
        l7.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns7) + '&heatmap=' + encodeURIComponent(heatmapHideWelcome);
    };
</script>
<body>
    <a id=link1>Example 1</a> Display one data column (with two subcolumns)
	<br><br>
	<a id=link2>Example 2</a> Display in chart mode
	<br><br>
	<a id=link3>Example 3</a> Display three data columns: one clinical data column and two genomic data columns
	<br><br>
	<a id=link4>Example 4</a> Specify column width, top label, and bottom label on Column C
	<br><br>
	<a id=link5>Example 5</a> Reverse sort on Column B
	<br><br>
	<a id=link6>Example 6</a> Display gene average in Column C
	<br><br>
	<a id=link7>Example 7</a> Display introns in Column C; Hide welcome banner
</body>
</html>
<html>
<script>
	window.onload = function() {
		var browser = 'https://xenabrowser.net/';
		var columns = JSON.stringify([
			{
				"width": 136,
				"columnLabel": "gene expression RNAseq - IlluminaHiSeq",
				"fieldLabel": "TP53",
				"showIntrons": true,
				"host": "https://tcga.xenahubs.net",
				"name": "TCGA.BRCA.sampleMap/HiSeqV2",
				"fields": "TP53"
			},
			{
				"width": 200,
				"columnLabel": "somatic mutation (SNPs and small INDELs) - MC3 public version",
				"fieldLabel": "TP53",
				"host": "https://tcga.xenahubs.net",
				"name": "mc3/BRCA_mc3.txt",
				"fields": "TP53"
			}
		]);


		var heatmap1 = JSON.stringify({
			showWelcome: false,
			searchSampleList: ["TCGA-C8-A131-01", "TCGA-BH-A0DL-01"]
		});

		var l1 = document.getElementById('link1');
		l1.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns) + '&heatmap=' + encodeURIComponent(heatmap1);

		var heatmap2 = JSON.stringify({
			showWelcome: false,
			search: "B:>10"
		});

		var l2 = document.getElementById('link2');
		l2.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns) + '&heatmap=' + encodeURIComponent(heatmap2);
	};
</script>
<body>
	<a id=link1>Sample highlight example 1:</a> highlight samples of specific sample IDs, such as TCGA-C8-A131-01 or TCGA-BH-A0DL-01
	<br>
	<a id=link2>Sample highlight example 2:</a> highlight samples matching arbitrary criteria, such as samples in Column B with values > 10
</body>
</html>
<html>
<script>
	window.onload = function() {
		var browser = 'http://dev.xenabrowser.net/';
		var columns = JSON.stringify([
			{
				"width": 90,
				"columnLabel": "",
				"fieldLabel": "site id",
				"host": "https://xena.treehouse.gi.ucsc.edu",
				"name": "clinical_TumorCompendium_v11_PolyA_2020-04-09.tsv",
				"fields": "site_id"
			},
			{
				"width": 150,
				"columnLabel": "gene expression RNAseq",
				"fieldLabel": "ALK",
				"host": "https://xena.treehouse.gi.ucsc.edu",
				"name": "TumorCompendium_v11_PolyA_AllSamples_AllGenes_Kallisto_HugoLog2TPM_20230630.tsv",
				"fields": "ALK"
			}
		]);


		var heatmap1 = JSON.stringify({
			showWelcome: false,
			searchSampleList: ["THR30_0820_S01", "THR30_0861_S01"]
		});

		var l1 = document.getElementById('link1');
		l1.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns) + '&heatmap=' + encodeURIComponent(heatmap1);

		var heatmap2 = JSON.stringify({
			showWelcome: false,
			search: "B:=TARGET"
		});

		var l2 = document.getElementById('link2');
		l2.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns) + '&heatmap=' + encodeURIComponent(heatmap2);

		var heatmap3 = JSON.stringify({
			filter: "B:TARGET",
		});

		var l3 = document.getElementById('link3');
		l3.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns) + '&heatmap=' + encodeURIComponent(heatmap3);

		var heatmap4 = JSON.stringify({
			showWelcome: false,
			filter: "B:TARGET OR B:TCGA",
		});

		var l4 = document.getElementById('link4');
		l4.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns) + '&heatmap=' + encodeURIComponent(heatmap4);

		var heatmap5 = JSON.stringify({
			showWelcome: false,
			filter: "B: TARGET",
			searchSampleList: ["TARGET-40-0A4I6O-01A-01R"]
		});

		var l5 = document.getElementById('link5');
		l5.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns) + '&heatmap=' + encodeURIComponent(heatmap5);

		var filterColumns = JSON.stringify([{
			"host": "https://xena.treehouse.gi.ucsc.edu",
			"name": "clinical_TumorCompendium_v11_PolyA_2020-04-09.tsv",
			"fields": "age_at_dx"
		}]);
		var heatmap6 = JSON.stringify({
			showWelcome: false,
			filter: "D:<5",
		});
		var l6 = document.getElementById('link6');
		l6.href = browser + 'heatmap/?columns=' + encodeURIComponent(columns) + '&filterColumns=' + encodeURIComponent(filterColumns) + '&heatmap=' + encodeURIComponent(heatmap6);
	};
</script>
<body>
	<a id=link1>Sample highlight Example 1:</a> highlight samples of specific sample IDs, such as THR30_0820_S01 or THR30_0861_S01
	<br><br>
	<a id=link2>Sample highlight Example 2:</a> highlight samples matching arbitrary criteria, such as TARGET samples
	<br><br>
	<a id=link3>Sample filtering Example 1:</a> specify (filter to) what samples to show in a view using a predicate, such as TARGET samples
	<br><br>
	<a id=link4>Sample filtering Example 2:</a> specify samples in a view using a predicate that includes a boolean term, such as TARGET and TCGA samples (use boolean OR)
	<br><br>
	<a id=link5>Sample filtering and highlight at the same time:</a> specify samples in a view using a predicate (e.g. just show TARGET samples), as well as highlight specific samples by IDs, such as TARGET-40-0A4I6O-01A-01R
	<br><br>
	<a id=link6>Sample filtering using data in a hidden filter column:</a> useful when you want to filter samples using data that is not displayed on the screen. You can use a <i>HIDDEN FILTER COLUMN</i>. For example, filter to samples younger than 5 years old without displaying the age column on the screen.
</body>
</html>
var page = 'https://xenabrowser.net/heatmap/';
var url = page + '?columns=' + encodeURIComponent(columns_parameter) +
    '&heatmap=' + encodeURIComponent(heatmap_parameter);
var column = {name: dataset1, host: hub1, fields: 'foo bar'};
var columns_parameter = JSON.stringify([column]);
var url = page + '?columns=' + encodeURIComponent(columns_parameter);
var heatmap_parameter = JSON.stringify({
      showWelcome: false,
      search: "B:=TARGET"
});
var url = page + '?columns=' + encodeURIComponent(columns_parameter) +
      '&heatmap=' + encodeURIComponent(heatmap_parameter);
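The snippets above assemble the link by hand in JavaScript. For scripts that generate many links, here is a minimal Python sketch of the same construction, assuming only the columns and heatmap query parameters shown above; xena_heatmap_url and encode_param are hypothetical helper names. It mirrors JSON.stringify by using compact separators and uses urllib.parse.quote(..., safe="") in place of encodeURIComponent. Python also escapes a few characters such as !, ', ( and ) that encodeURIComponent leaves alone; both decode to the same value.

```python
import json
from urllib.parse import quote

# Visual Spreadsheet page used in the examples above.
PAGE = "https://xenabrowser.net/heatmap/"


def encode_param(value):
    """Serialize like JSON.stringify (no extra spaces), then percent-encode the result."""
    text = json.dumps(value, separators=(",", ":"))
    # safe="" also escapes '/', ':', '&' and '?', so the JSON cannot break the query string.
    return quote(text, safe="")


def xena_heatmap_url(columns, heatmap=None):
    """Build a link from a list of column dicts and an optional heatmap settings dict."""
    url = PAGE + "?columns=" + encode_param(columns)
    if heatmap is not None:
        # The second parameter is joined with '&', not another '?'.
        url += "&heatmap=" + encode_param(heatmap)
    return url


column = {"name": "tcga_Kallisto_tpm", "host": "https://toil.xenahubs.net", "fields": "TP53 FOXM1"}
print(xena_heatmap_url([column], {"showWelcome": False, "search": "B:=TARGET"}))
```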
A PRACTICAL GUIDE TO UNDERSTANDING KAPLAN-MEIER CURVES
UCSC RNA-seq recompute compendium
ICGC
PCAWG
GDC
CCG
TCGA
TARGET
MET500
Robinson et al 2017 Integrative clinical genomics of metastatic cancer.
CCLE
KidsFirst
a large-scale data resource
TARGET
Using TARGET Data
TARGET Publication Guidelines
Treehouse Consortium
Set up a hub,
https://genome-cancer.ucsc.edu/download/public/get-xena/index.html
http://ip:7222
https://ip:7223
data format specifications
loading data from the command line
here
here
https://tcga.xenahubs.net
https://tcga.xenahubs.net
https://toil.xenahubs.net
this
https://xenabrowser.net/datapages/?cohort=TCGA%20TARGET%20GTEx
this
https://github.com/ucscXena/cohortMetaData
Hubs for institutions, collaborations, labs, and larger projects
contact us
Example 1
Example 2
Example 3
Example 4
Example 5
Example 6
Example 7
More examples of possible search terms.
https://doi.org/10.1038/s41587-020-0546-8
https://www.biorxiv.org/content/10.1101/326470v6
'How do I make subgroups'
Ending Screenshot
Basic Tutorial: Section 3
BioJupies
Bulk RNA-seq analysis pipeline Appyter
set up a docker
Overview page
contact us
Visual Spreadsheet
Chart View
the section below detailing a way to utilize ssh tunneling to get around this
selecting 'Filter' from the 'Find' UI
Basic Datasets menu

FAQ/Troubleshooting Guide

I ran into an Apple warning about an “unidentified developer” when installing Xena. What do I do?

I see my probes/genes/transcripts when loading my data, but I don't know whether to choose hg18, hg19, or hg38. Which should I pick?

The assembly only matters if you decide to visualize part or all of a chromosome, rather than a gene/probe/transcript. If you only want to visualize genes/probes/transcripts, then it does not matter which assembly you choose.

I don't see my study listed in the visualization

You also may not see your study if the hub is still loading the data. Wait a few minutes and refresh the page.

I'm entering a gene name but it only draws gray

When you loaded your genomic data we asked what type of genes, transcripts or probes you used. If you selected one of the options from the drop down menu then you can enter HUGO gene names or the identifiers in your file. If you did not select one of the options then you will need to enter the identifiers as they appear in your file.

Can I load two or more phenotype files to the same study?

Yes, we will allow you to select phenotypes from both files in the visualization.

Help! My file is too large to open in Microsoft Excel.

You might be able to load your file anyway, depending on the format. Give it a try, and if you are unable to load it, write us an email and we may be able to fix your file for you.

How do I convert my .xls or .xlsx into a tab-delimited file?

You can export a Microsoft Excel file as a tab-delimited file using the 'Save as ...' function.

Unix vs DOS

We require that your data files use Unix line endings. If your files have DOS (Windows) line endings, please follow the help here:

Note that this requirement is only for data files, not for the associated .json files.
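If you prefer a script to an editor, the conversion can be sketched in Python. This minimal example (dos_to_unix is a hypothetical helper, not part of Xena) reads the file as raw bytes, replaces every CRLF pair with LF, and writes the result back in place, so keep a copy of the original file.

```python
from pathlib import Path


def dos_to_unix(path):
    """Rewrite a data file in place so every CRLF (DOS) line ending becomes LF (Unix).

    Returns the number of line endings converted.
    """
    p = Path(path)
    data = p.read_bytes()  # raw bytes, so Python does no newline translation
    count = data.count(b"\r\n")
    if count:
        p.write_bytes(data.replace(b"\r\n", b"\n"))
    return count


# Example with a hypothetical file name:
# dos_to_unix("my_expression.tsv")
```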

Contact us

We'd love to hear from you!

Post to our public forum

Privately email us

genome-cancer@soe.ucsc.edu

Sign up for our monthly newsletter

Follow us

Tag us when you announce a publication or research

Do you use Xena to further your research? Tag us when you publish and we'll help promote you.

Cite us

You've run your analysis and are ready to publish your paper - congratulations! Cite the paper below to thank Xena and help keep our project funded.

Supported search terms for finding samples

Categorical features

Our search is a 'contains' search, meaning the term you enter can match the beginning, end, or middle of a value. Our search is case-insensitive. For example,

IIA

will match both 'Stage IIIA' and 'Stage IIA'. To match a specific string, put it in quotes:

"Stage IIA"

Numerical and Continuous features

You can specify a certain column and mathematical expression such as

A:>2

which will find all values greater than 2 in the first column. We support the following operators:

  • = (equal)

  • <= (less than or equal)

  • >= (greater than or equal)

  • < (less than)

  • > (greater than)

  • != (not equal)
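To make the column-qualified syntax concrete, here is a small Python sketch that parses a numeric term such as A:>2 and applies it to rows of values. It illustrates the rules described above and is not Xena's actual search parser: parse_term and matching_rows are hypothetical names, column letters follow the A = first, B = second convention described in this section, <= is used for "less than or equal", and treating missing values (None) as non-matching is an assumption.

```python
import operator
import re

# Comparison operators supported by the search, mapped to Python functions.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}
# Column letter, ':', operator (two-character forms tried first), then a number.
TERM = re.compile(r"^([A-Z]):(<=|>=|!=|=|<|>)(-?\d+(?:\.\d+)?)$")


def parse_term(term):
    """Split a term such as 'A:>2' into (column index, comparison, threshold); A -> 0, B -> 1."""
    match = TERM.match(term.strip())
    if match is None:
        raise ValueError(f"not a column-qualified numeric term: {term!r}")
    letter, op, number = match.groups()
    return ord(letter) - ord("A"), OPS[op], float(number)


def matching_rows(rows, term):
    """Keep rows whose value in the named column satisfies the term; None never matches."""
    col, compare, threshold = parse_term(term)
    return [row for row in rows if row[col] is not None and compare(row[col], threshold)]


rows = [[3.0, 5.0], [1.5, 12.0], [None, 20.0]]
print(matching_rows(rows, "A:>2"))
# [[3.0, 5.0]]
```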

Mutation data

You can search any annotation on a mutation, such as the functional impact, the protein position, or the gene name itself.

To find all samples with a specific protein change, enter:

V600E

To find all samples where the functional impact has the text 'frame' or 'nonsense' in it:

frame OR nonsense

To find all samples that have a mutation in a gene, search for the gene name:

TP53

To find all samples that do not have a mutation in a gene, negate the gene name:

!=TP53

No data or 'null'

To find all samples that do not have data in one or more columns, use:

null

and choose 'Remove samples'. To find all samples that do not have data in one specific column, use:

B:null

Sample IDs

Enter a sample ID to find a sample of interest. An example:

TCGA-DB-A4XH

If you are searching for multiple sample IDs, separate each one with 'OR'. You can copy and paste a list of sample IDs into the search bar as long as they are separated by a space, tab, or return (new line).

TCGA-DB-A4XH OR TCGA-2F-A9KO-01 OR TCGA-02-0001
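If your list of sample IDs comes from a spreadsheet or a script, you can build the OR query before pasting it. A minimal Python sketch (sample_search is a hypothetical helper) that splits on any whitespace and drops repeated IDs while keeping their order:

```python
def sample_search(text):
    """Join sample IDs separated by spaces, tabs, or new lines into one OR search string."""
    ids = text.split()  # split() with no argument splits on any run of whitespace
    unique = list(dict.fromkeys(ids))  # drop repeats, keep first-seen order
    return " OR ".join(unique)


pasted = "TCGA-DB-A4XH\nTCGA-2F-A9KO-01\tTCGA-02-0001"
print(sample_search(pasted))
# TCGA-DB-A4XH OR TCGA-2F-A9KO-01 OR TCGA-02-0001
```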

Search a specific column

To make it easy to search a specific column, we use shorthand to annotate the first column as 'A:', the second as 'B:', etc. An example is

A:YES

This will search ONLY the first column for the word 'YES'. Note that we will retain your original search if you move the columns around.

Boolean operators: OR, AND, and !=

You can enter multiple search terms and we will match all of them with an implicit 'AND'. We also support 'OR'.

Use parentheses to group search terms. For example:

"Stage II" (B:Negative OR C:Negative)

will search for samples that match 'Stage II' in any column and are 'Negative' for either the second or third column.

You can also use '!=' to negate a term such as:

!=null

which will match all samples that have data across all columns.

Accessing data through python

Installation

Usage

Example

Help

More Information

MuPIT

A 3D protein viewer developed by Rachel Karchin's lab

MuPIT interactive is an online tool that allows you to map sequence variants from their genomic position onto protein structures. Viewing a variant on protein structure can be useful in interpreting its potential biological consequences. After mapping, the variants are displayed on an interactive 3d structure. The user may turn variants on and off, and display annotations on the protein structure.

Once you have the mutation data you're interested in, click the menu at the top of the column and choose 'MuPIT View'. This sends your mutation data to MuPIT and opens their viewer in a new tab.

Example

On the left of the figure is the Xena mutation column view of ERBB2 somatic mutations from the TCGA breast cancer cohort. Clicking the MuPIT link in the caret menu at the top of the column sends the genomic positions of all the mutations, along with their recurrence p-values, to MuPIT. On the right side of the figure, MuPIT displays the mutations as bright green spheres whose size is determined by the recurrence p-value, so recurrent mutations appear as larger spheres. The MuPIT display shows that these ERBB2 somatic mutations cluster around the ERBB2 active site (ATP binding site in blue and proton acceptor site in teal).

GDC

Information on Xena data from GDC release v41.0

In addition to the data from the GDC, we added two new phenotype/clinical fields to all GDC cohorts: age_at_earliest_diagnosis.diagnoses.xena_derived and age_at_earliest_diagnosis_in_years.diagnoses.xena_derived. This was done because some GDC cohorts had multiple diagnoses, each with their own age_at_diagnosis.diagnoses. When there were multiple ages the Xena Visual Spreadsheet would display these fields as a category. In order to have a field that could always be displayed as a continuous feature, we created the age_at_earliest_diagnosis.diagnoses.xena_derived field that has the smallest value when there were multiple entries. age_at_earliest_diagnosis_in_years.diagnoses.xena_derived was created similarly, but also dividing the number of days by 365.
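The derivation of the two xena_derived fields can be sketched as follows. This is an illustration under assumptions, not the code used to build the GDC hub: ages are in days, missing entries are skipped (an assumption), the result in years is left unrounded, and the function names are hypothetical.

```python
def earliest_age(ages_in_days):
    """Smallest age_at_diagnosis among a case's diagnoses, ignoring missing entries.

    Returns None when no diagnosis has an age.
    """
    known = [a for a in ages_in_days if a is not None]
    return min(known) if known else None


def days_to_years(days):
    """Convert days to years by dividing by 365, as described above."""
    return None if days is None else days / 365


# Hypothetical case with two diagnoses recorded at different ages (in days).
ages = [21900, 20075]
print(earliest_age(ages), days_to_years(earliest_age(ages)))
```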

For this release, we worked to avoid including samples that have only phenotype/clinical data and no genomic data. This should make visualizing data in our Visual Spreadsheet easier.

CPTAC-3

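There are a couple of options. You can right-click the .dmg and choose 'Open'. You can also press the Control key, click the app icon, and then choose Open from the shortcut menu. These help pages might help: http://www.iclarified.com:8081/28180/how-to-open-applications-from-unidentified-developers-in-mac-os-x-mountain-lion and, from Apple, https://support.apple.com/kb/PH25088?locale=en_US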

Your Local Xena Hub must be running to view any data that you have loaded into it. Please make sure it is started. You can also check which studies are on your hub, and what data is in them, by going to the My Computer Hub page: xenabrowser.net/datapages/?host=https%3A%2F%2Flocal.xena.ucsc.edu%3A7223

for copying a sample ID from the tooltip.

You can use the Python API, xenaPython, to programmatically access data in the public Xena Data Hubs.

We use the MuPIT 3D protein viewer from Rachel Karchin's lab at Johns Hopkins to provide this visualization to our users. From their Help Page:

Access this tool by going to our Visualization tab and following the wizard to select samples. Next, enter your gene of interest, click 'somatic mutation' and then click 'Done'. You may need to choose another variable such as 'gene expression'.

MuPIT Help: http://mupit.icm.jhu.edu/MuPIT_Interactive/Help.html

This help page is for the Genomic Data Commons (GDC) data we host from GDC Data Release 41.0 (August 28, 2024). We display all GDC open access genomic data and its accompanying phenotype/clinical data. Explore the GDC data on Xena.

You can still view data from the older GDC Data Release v18.0 (August 28, 2019). This data will be available until October 2025. After October 2025, the data from this release will only be available for download.

For the CPTAC-3 cohort, we noted that occasionally samples were pooled into the same aliquot before sequencing was performed. Xena's visualizations are based on the sample level, so for these pooled aliquots there are several samples with duplicate data. An example of this is case C3N-03011, where samples C3N-03011-04, C3N-03011-02, and C3N-03011-01 were all pooled into the aliquot CPT0226250007 before sequencing was performed.

http://www.iclarified.com:8081/28180/how-to-open-applications-from-unidentified-developers-in-mac-os-x-mountain-lion
https://support.apple.com/kb/PH25088?locale=en_US
xenabrowser.net/datapages/?host=https%3A%2F%2Flocal.xena.ucsc.edu%3A7223
https://groups.google.com/g/ucsc-cancer-genomics-browser
http://xena.ucsc.edu/#whatsnew
https://twitter.com/ucscxena?lang=en
https://doi.org/10.1101/326470
pip install 'git+https://github.com/ucscXena/xenaPython'
pip install --upgrade 'git+https://github.com/ucscXena/xenaPython'
import xenaPython as xena  
import xenaPython as xena

hub = "https://toil.xenahubs.net"
dataset = "tcga_RSEM_gene_tpm"
# All sample IDs in the dataset (no limit)
samples = xena.dataset_samples(hub, dataset, None)
# At most 10 sample IDs
samples = xena.dataset_samples(hub, dataset, 10)
# Or list the samples you want explicitly
samples = ["TCGA-02-0047-01", "TCGA-02-0055-01", "TCGA-02-2483-01"]
probes = ['ENSG00000282740.1', 'ENSG00000000005.5']
# Unpack the result into the probe positions and one list of values per probe, in sample order
[position, [ENSG00000282740_1, ENSG00000000005_5]] = xena.dataset_probe_values(hub, dataset, samples, probes)
ENSG00000282740_1
[-9.9658, -2.8262, -9.9658]
import xenaPython
help(xenaPython)
Use 'alt-click' to freeze
xenaPython
Jupyter notebook example
https://github.com/ucscXena/xenaPython
MuPIT 3D protein viewer
Rachel Karchin's lab
Visualization tab
http://mupit.icm.jhu.edu/MuPIT_Interactive/Help.html
Live bookmark of above image
GDC Data Release 41.0 - August 28, 2024
GDC data on Xena
GDC Data Release v18.0 release - August 28, 2019
CPTAC-3
C3N-03011

Data Use Agreement

Get Started — TumorMap Help
Indices and tables — TumorMap Help
Today I Learned: Change DOS to Unix text file format in VIM (Hashrocket TIL)
UCSC Xena (@ucscxena.bsky.social) on Bluesky Social

As Xena does not generate any of the data it displays, there is no Xena Data Use Agreement. If you use data from Xena, please cite us.

Please check with the original data providers (e.g. the GDC) for any data use restrictions. You can see more about our data providers by clicking on the Hub page.

please cite us
Hub page

Probes/transcripts/identifiers we recognize

We will automatically detect and map your probes/transcripts/identifiers to HUGO gene names. For instance, we will map Affy probe IDs to HUGO gene names so that you can enter a HUGO gene name when creating a column in the Visual Spreadsheet and we will pull up the corresponding Affy probes.

You can still load your data if you do not see your identifiers listed. We will just not map them to HUGO genes for you. This means that in the visualization you will need to enter your identifiers as they appear in your file.

Supported probes and other identifiers

  • Affy U133 array (hg19) e.g. 1007_s_at

  • Affy HumanExon1.0ST (hg18) e.g. 2315101

  • Affy Human Gene 1.0 ST array (hg19) e.g. 7896736

  • Affy Human SNP6 array (hg18) e.g. CN_473963

  • Agilent Human gene expression 4X44K array (hg18) e.g. A_23_P100001

  • Agilent SurePrint G3 Human CGH array 2x400K (hg18) e.g. A_16_P01651995

  • Agilent Human 1A array (hg18) e.g. A_23_P149050

  • Exon: GENCODE 19 e.g. ENSE00000327880.1

  • Infinium HumanMethylation27 array GDC version (hg38) e.g. cg00000292

  • Infinium HumanMethylation27 array TCGA legacy version (hg18) e.g. cg26211698

  • Infinium HumanMethylation450 array TCGA legacy version (hg19) e.g. cg13332474

  • Infinium HumanMethylation450 array GDC version (hg38) e.g. cg00000029

Supported genes and transcripts

  • HUGO: human gene symbol (hg18) e.g. TP53

  • HUGO: human gene symbol (hg19) e.g. TP53

  • HUGO: human gene symbol (hg38) e.g. TP53

  • Gene: Ensembl human genes (hg19) e.g. ENSG00000223972

  • Gene: Ensembl human genes (hg38) e.g. ENSG00000223972

  • Gene: GENCODE 19 e.g. ENSG00000223972.4

  • Gene: GENCODE 22 comprehensive e.g. ENSG00000223972.5

  • Gene: GENCODE 23 comprehensive e.g. ENSG00000223972.5

  • Gene: GENCODE 23 basic e.g. ENSG00000223972.5

  • Gene: UCSC Known genes (hg18) e.g. uc001aaa.1

  • Gene: UCSC Known Genes (hg19) e.g. uc001aaa.1

  • Transcript: GENCODE 19 comprehensive e.g. ENST00000456328.2

  • Transcript: GENCODE 23 comprehensive e.g. ENST00000456328.2

  • Transcript: GENCODE 23 basic e.g. ENST00000456328.2

  • Transcript: RefSeq (hg19) e.g. NM_000014

  • miRNA miRBase v13 stem-loop (hg18) e.g. hsa-mir-1977

  • miRNA miRBase v20 stem-loop (hg19) e.g. hsa-mir-1302-2

Contact us if you don't see your gene or probe names in this list and we may be able to add it for you.

FAQ

FAQ: Xena didn't map the right probes

If it looks like we picked the wrong set of probes, please click 'Advanced' next to the 'Import' button on the last screen of the wizard to load data. You can then pick the appropriate probes.


Loading data from the command line

In addition to the data itself, we require some metadata about your file. When you use our website to load your data we fill in this metadata for you. When you use the command line, you will need to provide this data in an additional file.

Metadata file requirements

There are two required fields: type and cohort.

Type

Type can be:

Example:

{"type":"mutationVector"}

Cohort

Cohort tells Xena whether there are other data on the samples you are loading. You can either specify a pre-existing cohort or create your own. Cohort names are displayed on the dataset pages and in the cohort drop-down menu on the Heatmaps page.

For existing cohorts, you need to enter the cohort name EXACTLY as it appears as the existing cohort name. Note that our cohort names are case sensitive.

Example

{"type":"mutationVector", 
 "cohort":"TCGA Breast Cancer"}

Reference

If you are loading a mutation or segmented copy number file, you will also need to specify the reference genome. You do not need to specify this for other file types.

Example

{"type":"mutationVector", 
 "cohort":"TCGA Breast Cancer", 
 "assembly":"hg19"}

Probemap

If you are loading a file that has probes, transcripts, or exons and you would like to query your data by gene, you will need to provide a mapping file. You do not need to specify this for other file types.

#id    gene    chrom    chromStart    chromEnd    strand 
id_1    AADACL3    chr1    12776118    12776347    +
import xenaPython
host = "https://reference.xenahubs.net"
xenaPython.probemap_list(host)

If you do not see a probemap that will work for you, please let us know.

To reference a probemap, you need three things:

  1. Include the probemap reference in your data file .json

  2. Have the probemap file in the same directory as your data file and data file .json

  3. Also have a .json file for the probemap so that we know how to load it

Note that to reference a probemap you need to load the probemap first, then load the data file.

Example data file .json

{"type":"genomicMatrix", 
 "cohort":"TCGA Breast Cancer", 
 ":probeMap":"/unc_v2_exon_hg19_probe_TCGA"}

Example probemap

Example probemap .json file (required to be in the same folder as the probemap)

{ "type":"probeMap",
  "assembly":"hg19"}

More information about the metadata

Commands to load data

Put both your .tsv and .json files in your_home_directory/xena/files. Then run the jar, passing in the file name, like so:

java -jar cavm-0.xx.0-standalone.jar -l ~/xena/files/*

→ loads all files

OR

java -jar cavm-0.xx.0-standalone.jar -l ~/xena/files/file1.tsv

→ loads just file1.tsv

Note that you will need to substitute the name of the .jar file. As of the time of writing (September 20, 2018), the name of the .jar file was cavm-0.22.0-standalone.jar. On Linux this will be in the directory where you opened the archive. On Windows or macOS, use your operating system’s file search capability to search for cavm*jar. On Windows you will need to use the full path to your home directory instead of “~”.

Note you do not need to load the .json files. Xena will automatically look for these and load them.
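Since the loader finds the .json metadata file on its own, it can help to generate that file from a script. A minimal Python sketch, assuming the metadata file is named after the data file with .json appended (for example, mutations.tsv.json); check the metadata documentation for the exact naming your hub expects. write_metadata is a hypothetical helper that refuses to write metadata missing the two required fields, type and cohort.

```python
import json
from pathlib import Path

REQUIRED = ("type", "cohort")


def write_metadata(data_file, metadata):
    """Write metadata beside data_file as '<data file name>.json' (assumed naming) and return its path."""
    missing = [key for key in REQUIRED if key not in metadata]
    if missing:
        raise ValueError(f"metadata is missing required field(s): {missing}")
    data_path = Path(data_file)
    meta_path = data_path.with_name(data_path.name + ".json")
    meta_path.write_text(json.dumps(metadata, indent=1) + "\n")
    return meta_path


meta = {"type": "mutationVector", "cohort": "TCGA Breast Cancer", "assembly": "hg19"}
# Hypothetical data file in the default files directory:
# write_metadata(Path.home() / "xena" / "files" / "mutations.tsv", meta)
```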

Commands to delete data

java -jar cavm-0.xx.0-standalone.jar -x ~/xena/files/file1.tsv

→ delete just file1.tsv

java -jar cavm-0.xx.0-standalone.jar -x ~/xena/files/file1.tsv ~/xena/files/file2.tsv

→ delete file1.tsv and file2.tsv

Help

You can always type:

java -jar cavm-0.xx.0-standalone.jar -h

for help.

Data format specifications and supported biological data types

There are 2 basic data formats and 2 advanced data formats. Each of these formats has one or more biological data types that it supports.

General Specifications for all data formats

We support most types of genomic and phenotype/clinical/sample annotations. For genomic data we support calls made on the raw data including but not limited to expression calls, mutation calls, etc. This is what TCGA calls ‘Level 3’ data and is typically a value on gene, transcript, probe, etc. We do not support FASTQ, BAMs, or other ‘raw’ files. Please contact us if you have any questions.

We support tab-delimited and Microsoft Excel files (.xlsx and .xls). Tab-delimited files generally have a file name ending in .tsv or .txt, though we do not require this. Note that we load tab-delimited files much faster than Excel files. You can export a Microsoft Excel file as a tab-delimited file using the 'Save as ...' function.

Please avoid duplicate genes/probes/identifiers or samples. We will let you load a file with duplicates, but we will only display the first one encountered in the file.

We assume you use a '.' to indicate a decimal place as opposed to a ',' .
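Before loading, you can check a genomic matrix for duplicates yourself. This Python sketch (find_duplicates is a hypothetical helper) assumes identifiers run down the first column and sample IDs across the header row, as in the genomic matrix layout, and reports any value that appears more than once.

```python
import csv
from collections import Counter


def find_duplicates(lines):
    """Return (duplicated sample IDs, duplicated identifiers) for a tab-delimited genomic matrix.

    lines: any iterable of text lines, such as an open file. Sample IDs are read
    from the header row; identifiers from the first column of the other rows.
    """
    rows = list(csv.reader(lines, delimiter="\t"))
    header, body = rows[0], rows[1:]
    sample_counts = Counter(header[1:])
    id_counts = Counter(row[0] for row in body if row)
    return (
        [s for s, n in sample_counts.items() if n > 1],
        [i for i, n in id_counts.items() if n > 1],
    )


# Usage with a hypothetical file:
# with open("my_expression.tsv", newline="") as handle:
#     print(find_duplicates(handle))
```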

Basic Genomic data: numbers in a rectangle/matrix/spreadsheet

Supported data types

  • RNA-seq expression (exon, transcript, gene, etc)

  • Array-based expression (probe, gene, etc)

  • Gene-level mutation

  • Gene-level copy number

  • DNA methylation

  • RPPA

  • and more ...

For samples that do not have expression for a particular gene, either have a blank field or use "NA".

An example of a genomic matrix file (in this case, expression):

Sample    TCGA-BA-4074-01    TCGA-BA-4075-01    TCGA-BA-4076-01
ACAP3     0.137              NA                 0.022
CTRT2     0.024              0.805              0.256
ALK       0.098              0.805              1.87

Basic Phenotypic data: categories or non-genomic in a rectangle/matrix/spreadsheet

These are data on a sample or patient that are categorical in nature (e.g. tumor stage, or 'wild type' or 'mutant' for a gene) or numerical but non-genomic (e.g. age or a genomic signature). Samples can be in columns with phenotype/clinical/sample fields in rows, or vice versa; we support both orientations.

Supported data types

  • phenotype/patient/clinical data (age, weight, if there was blood drawn, etc)

  • sample/aliquot data (where it was sequenced, tumor weight, etc)

  • derived data (regulon activity for a gene, etc)

  • genomic signatures (EMT signature score, stemness score, etc)

  • other (whether a sample has an ERG-TMPRSS2 fusion, whether a sample has WGS data available, etc)

Categorical vs numerical data

We support both numerical and categorical data. For numerical data please use a blank field for any samples which may be missing data. For categorical data you can use a blank field or "NA" for any samples which may be missing data.

Note that if you use "NA" for a missing numerical field then the Xena software will automatically treat that column as a category.

To have it be treated as a numerical field please use a blank field.

An example of a phenotype matrix file:

sample             ER_status    disease_status          age
TCGA-BA-4074-01    positive     complete remission      63
TCGA-BA-4082-01    positive     complete remission      54
TCGA-BA-4078-01    negative     undergoing treatment    65
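The numerical-versus-categorical rule can be expressed as a short check. This Python sketch (column_kind is a hypothetical helper) approximates the stated rule rather than reproducing Xena's loader: blank cells count as missing, and any other value that does not parse as a number, including "NA", makes the column categorical. Python's float() also accepts strings such as "nan" and "inf", which this sketch would count as numbers.

```python
def column_kind(values):
    """Classify a phenotype column as 'numerical' or 'categorical'."""
    for value in values:
        text = value.strip()
        if text == "":
            continue  # blank = missing, allowed in numerical columns
        try:
            float(text)
        except ValueError:
            return "categorical"  # e.g. "NA" or "positive"
    return "numerical"


print(column_kind(["63", "54", "", "65"]))   # numerical
print(column_kind(["63", "NA", "65"]))       # categorical
```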

Advanced Segmented data

For segmented data, we require the following 5 columns: sample, chr, start, end, and value. Note that your column headers must be these names exactly!

Please use 'NA' to indicate no data.

Supported data types

  • copy number

We currently accept hg38, hg19, hg18 coordinates.

Example segmented copy number data with required columns:

sample             chr     start       end          value
TCGA-V4-A9EL-01    chr1    61735       16815530     0.041
TCGA-V4-A9EL-01    chr1    16816090    17190862     -0.4227
TCGA-V4-A9EF-01    chr4    86979944    115173700    0.0414
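Because the column headers must match these names exactly, a quick header check can catch problems before loading. A minimal Python sketch with a hypothetical helper, missing_columns, that reports which required segmented-data headers are absent from a tab-delimited header line:

```python
# Required headers for segmented data, as listed above; names must match exactly.
SEGMENTED_COLUMNS = ["sample", "chr", "start", "end", "value"]


def missing_columns(header_line, required=SEGMENTED_COLUMNS):
    """Return the required column names that are absent from a tab-delimited header line."""
    present = set(header_line.rstrip("\r\n").split("\t"))
    return [name for name in required if name not in present]


print(missing_columns("sample\tchr\tstart\tend\tvalue\n"))   # []
print(missing_columns("Sample\tchrom\tstart\tend\tvalue\n"))  # ['sample', 'chr']
```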

Advanced Positional data

For positional data, we require 6 columns: sample, chr, start, end, reference, alt. Note that your column headers must be these names exactly!

Note that Xena will not call the gene, variant effect, etc. for you. All gene annotation information must be included in the file.

Supported data types

  • mutation data

We currently accept hg38, hg19, hg18 coordinates.

Example mutation data with the six required columns, plus the gene column:

sample             chr     start        end          reference    alt    gene
TCGA-AB-2802-03    chr2    29917721     29917721     G            A      ALK
TCGA-AB-2802-03    chr1    119270684    119270687    TTAAA        T      MYC
TCGA-AB-2867-03    chr1    150324146    150324146    T            G      PRPF3

To indicate that a sample was assayed but no mutation was detected, add a line with only three columns filled: sample, start, and end. "start" and "end" must be integers (if they are left empty, the data loader will reject the file), so use -1 to mark them as placeholder coordinates. Leave the remaining columns as empty strings.
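Here is a minimal sketch of writing a mutation file that includes an "assayed, no mutation detected" row. It assumes a tab-delimited file with the six required columns plus gene. The sample ID HYPOTHETICAL-SAMPLE-01 and the helper no_mutation_row are made up for illustration.

```python
import csv
import io

COLUMNS = ["sample", "chr", "start", "end", "reference", "alt", "gene"]


def no_mutation_row(sample):
    """Hypothetical helper: sample assayed, no mutation detected.

    Only sample, start, and end are filled; -1 marks placeholder
    coordinates and every other column is an empty string.
    """
    row = dict.fromkeys(COLUMNS, "")
    row.update(sample=sample, start=-1, end=-1)
    return row


mutations = [
    {"sample": "TCGA-AB-2802-03", "chr": "chr2", "start": 29917721,
     "end": 29917721, "reference": "G", "alt": "A", "gene": "ALK"},
    no_mutation_row("HYPOTHETICAL-SAMPLE-01"),  # made-up sample ID
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(mutations)
print(buffer.getvalue())
```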

Advanced Other data

Cite us

Please cite us! Citations are an important metric to our funders. Citing us helps us continue to support Xena.

You've run your analysis and are ready to publish your paper - congratulations! Cite the paper below to thank Xena and keep our project funded.

Tag us when you announce a publication or research

Metadata Specification

metadata (.json file) specification

MAPs

Maps are t-SNE, UMAP, or PCA embeddings in 2D or 3D, or spatial maps for spatial data.

"map" is a list in the .json metadata file.

For each map

"label" free text. Display label of the map, should be easily readable by users

"dataSubType" a string describing the nature of the map; it must be either "embedding" or "spatial". Note that this is the dataSubType attribute for the map, not the dataSubType attribute for the file.

"dimension" a list of strings: the column headers of the dimension columns in the data file. They are used to retrieve data from the database.

If it is a spatial map, there may be one or more microscopy images associated with it.

"unit" (optional, only relevant to spatial map) a string. The unit of map values, e.g. pixel, micrometer

"micrometer_per_unit" (optional, only relevant to spatial map) a floating point number: the physical size in micrometers (µm) that corresponds to a value of 1 in the spatial map. It is used to render the scale bar in the spatial map. If not specified, no scale bar is shown.

"spot_diameter" (optional, relevant to spatial map) a floating point number: the spot size in map units (not image units). It determines the size of the spheres shown in the spatial map. If not specified, the browser determines the sphere size.

"image" (optional, only relevant to a spatial map with associated images) an array of images, each image has its own parameters. See below.

For each image

"label" free text. Display label of the image, should be easily readable

"path" file path to the image file

"offset" an array of integers: the image offset in pixels in the x and y dimensions. See below for how to convert spatial coordinate values in the map to pixel positions in this image.

"image_scalef": floating point number. A scaling factor that converts spatial coordinate values in the spatial map (e.g. pixel or micrometer) to the pixel unit in this image. It works together with the "offset" parameter to convert spatial coordinate values in the spatial map to the actual pixel positions in this image.

  • pixel_in_image_x = image_scalef * spatial_coordinate_x + offset_x

  • pixel_in_image_y = image_scalef * spatial_coordinate_y + offset_y
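The two formulas above translate directly into a small function. This is a sketch. The function name map_to_image_pixel is hypothetical, and offset is read as the [x, y] pair from the image's metadata.

```python
def map_to_image_pixel(x, y, image_scalef, offset):
    """Convert a spatial-map coordinate to a pixel position in one image.

    pixel = image_scalef * spatial_coordinate + offset, applied separately
    to x and y. `offset` is the image's [offset_x, offset_y] list.
    """
    offset_x, offset_y = offset
    return image_scalef * x + offset_x, image_scalef * y + offset_y


# Using the image_scalef and offset of the downscaled image in the spatial map example.
print(map_to_image_pixel(1000, 500, 0.08250825, [0, 0]))
```

Because the same scale factor is applied to both axes, a point at the far corner of the full-resolution map lands at roughly the far corner of the downscaled image.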

"transcript" (optional, only relevant to a spatial transcriptomics map) an array of transcript data, each has its own json parameters. See below.

Note: transcript coordinates must use the same unit and scale as the map, so no scaling or offset is needed to convert between transcript coordinates and map coordinates.

"label" free text. Display label of the transcript data, should be easily readable

"path" file path to the transcript datafile

"dimension" a list of strings. It must have the same number of dimensions as the map. These are the column headers of the dimension columns in the transcript file, and they are used to retrieve data from the database.
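The sketch below assembles a spatial map entry with one image and one transcript dataset. It applies two checks taken from the rules above: the map "dataSubType" must be "embedding" or "spatial", and the transcript dimensions must match the map dimensions. The helper check_map is hypothetical, and the labels and paths are placeholders.

```python
import json


def check_map(entry):
    """Hypothetical sanity checks based on the map specification above."""
    if entry.get("dataSubType") not in ("embedding", "spatial"):
        raise ValueError('map "dataSubType" must be "embedding" or "spatial"')
    n_dims = len(entry["dimension"])
    for transcript in entry.get("transcript", []):
        if len(transcript["dimension"]) != n_dims:
            raise ValueError("transcript dimensions must match the map's dimensions")
    return entry


metadata = {
    "type": "clinicalMatrix",
    "cohort": "name of the cohort",
    "label": "display label of the file",
    "map": [check_map({
        "label": "display label of the map",
        "dataSubType": "spatial",
        "dimension": ["X", "Y"],
        "unit": "pixel",
        "image": [{"label": "display label of the image", "path": "image file path",
                   "offset": [0, 0], "image_scalef": 1}],
        "transcript": [{"label": "display label of the transcript data",
                        "path": "transcripts.tsv", "dimension": ["x", "y"]}],
    })],
}
print(json.dumps(metadata, indent=4))
```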

Map examples

Example, map without image

Example, spatial map with matching microscopy image

Example, spatial map with matching microscopy image and transcript data

SURVIVAL TIME UNIT

The survival time unit is displayed on the x-axis of the KM plot. Specify it in the metadata file under the "units" attribute.

"units" free text, the KM plot x-axis unit, e.g. years, months, days

example

CUSTOM CATEGORICAL PHENOTYPE

You can customize the display of features in a phenotype file ("type": "clinicalMatrix") by adding a "clinicalFeature" file and an accompanying .json file. There are two steps:

  1. Add a "clinicalFeature" reference to the .json file that accompanies the phenotype file. Note the colon notation in example below.

  2. Compose the clinicalFeature file (tab-delimited) and its .json file.

Below is an example of adding a "clinicalFeature" reference to the phenotype file's .json metadata.

Below is an example clinicalFeature file. This file is tab-delimited, with the headers "feature", "attribute", and "value". The attributes are "valueType", "state", and "stateOrder". The values for the attribute "valueType" can be "category" or "float".

Below is an example clinicalFeature file .json file (clinicalFeature.txt.json)

An example of setting binary 0/1 variables as categorical data in the phenotype file: when the data are loaded, Xena assumes 0/1 values are numerical, so if you want them displayed as categorical you need to declare this in the clinicalFeature file.

Below is an example of how you would do this for a feature called "your_featureName".
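Generating these rows programmatically helps when many 0/1 features need the same treatment. This is a sketch that assumes the tab-delimited feature/attribute/value layout described above. The helper categorical_feature_rows is hypothetical.

```python
def categorical_feature_rows(feature, states):
    """Hypothetical helper: clinicalFeature rows (feature, attribute, value)
    declaring `feature` categorical, with `states` listed in display order."""
    rows = [(feature, "valueType", "category")]
    rows += [(feature, "state", state) for state in states]
    # stateOrder is a comma-separated list of double-quoted states
    rows.append((feature, "stateOrder", ",".join(f'"{s}"' for s in states)))
    return rows


lines = ["feature\tattribute\tvalue"]
lines += ["\t".join(row) for row in categorical_feature_rows("your_featureName", ["0", "1"])]
print("\n".join(lines))
```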

COLUMN DISPLAY NORMALIZATION

For genomic data matrix, the optional metadata parameter colNormalization sets the default display scale. If not specified, the browser automatically determines the scale.

colNormalization: true | "log2(x)" | "normal2"

  • true: display values centered on the column mean (x - column average). Example usage: a gene expression matrix that is already log-transformed.

  • log2(x): display on a log2(x+1) scale. Example usage: a gene expression count matrix.

  • normal2: display a value of 2 in the background color (i.e. white). This is typically used for copy number data where normal = 2.

  • for segmented copy number data, if you don't specify colNormalization, the display defaults to normal = 0, i.e. a value of 0 is shown in the background color (white)
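To make the three options concrete, the sketch below reproduces only the arithmetic in their descriptions. It is not the Xena browser's rendering code. For "normal2", it assumes the background color corresponds to 0 after the transform, so 2 is subtracted. The function display_values is hypothetical.

```python
import math


def display_values(column, col_normalization=None):
    """Hypothetical illustration of the colNormalization options above."""
    if col_normalization is True:
        mean = sum(column) / len(column)
        return [x - mean for x in column]          # centered on the column mean
    if col_normalization == "log2(x)":
        return [math.log2(x + 1) for x in column]  # log2(x+1) scale
    if col_normalization == "normal2":
        return [x - 2 for x in column]             # assumed: value 2 -> background (0)
    return list(column)                            # unspecified: values unchanged here


print(display_values([1.0, 3.0], True))     # [-1.0, 1.0]
print(display_values([0, 1, 3], "log2(x)"))  # [0.0, 1.0, 2.0]
```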

example

The metadata file is a .json file and follows JSON formatting. It must be in the same directory as the data file, and the two files must share the same base name, including any file extensions (e.g. my_first_dataset and my_first_dataset.json, OR my_second_dataset.txt and my_second_dataset.txt.json).
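The naming rule can be captured in a helper that appends ".json" to the full data file name, extension included. This is a sketch using the standard pathlib module. The helper names are hypothetical.

```python
import json
from pathlib import Path


def metadata_path_for(data_path):
    """Hypothetical helper: the metadata file sits next to the data file and
    is named <data file name>.json, keeping any existing extension
    (my_second_dataset.txt -> my_second_dataset.txt.json)."""
    data_path = Path(data_path)
    return data_path.with_name(data_path.name + ".json")


def write_metadata(data_path, metadata):
    """Write `metadata` as formatted JSON next to the data file."""
    meta_path = metadata_path_for(data_path)
    meta_path.write_text(json.dumps(metadata, indent=4))
    return meta_path


print(metadata_path_for("my_first_dataset"))       # my_first_dataset.json
print(metadata_path_for("my_second_dataset.txt"))  # my_second_dataset.txt.json
```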

'genomicMatrix' -> samples are columns and genomic regions are rows. Note that for loading on the command line we do not support the other orientation.

'clinicalMatrix' -> samples are rows and phenotypic features are columns. Note that for loading on the command line we do not support the other orientation.

'mutationVector' -> mutation data

'genomicSegment' -> segmented copy number data

Here is an example probemap file (a delimited file):

We have many probemap files that you can see via our .

Here is a folder with example data in addition to the examples below.

These are numeric data called on genomic regions (e.g. exon expression or gene-level copy number). The data form a rectangular matrix in which samples are columns and rows are genomic regions (e.g. HUGO gene symbols, transcript IDs, probe IDs). We also support the opposite orientation, with samples as rows and genomic regions as columns. For supported genomic regions, please see .

Contact us if you're unsure whether we will support your data.

This is our most flexible data type. If you are wondering whether your data counts as 'phenotypic', please contact us.

For more information about configuring your phenotype fields, such as controlling the order for categorical features, please see our .

Other columns that may follow are: gene, effect, DNA_VAF, RNA_VAF, and Amino_Acid_Change. These columns are optional but enhance the visualization of the data. For example, the "gene" column lets mutations be displayed when querying by gene name as well as by genomic coordinates, and the "effect" column colors mutations by effect (the default color is gray). Effect terms include "Nonsense" (red), "Frameshift" (red), "Splice" (orange), "missense" (blue), and "Silent" (green). The full list of accepted terms can be found .

We support a number of other specialty data types, such as structural variants. Please contact us if you have this data so we can help you load it.
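The effect-to-color behavior described above works like a lookup table with a gray fallback. The sketch below covers only the terms listed here. It assumes exact, case-sensitive matching, which may differ from how Xena actually matches terms. The names EFFECT_COLORS and mutation_color are hypothetical.

```python
# Only the effect terms listed above; the full accepted list is longer.
EFFECT_COLORS = {
    "Nonsense": "red",
    "Frameshift": "red",
    "Splice": "orange",
    "missense": "blue",
    "Silent": "green",
}
DEFAULT_COLOR = "gray"  # used when the effect is missing or not in the table


def mutation_color(effect):
    """Hypothetical lookup: exact-match effect term -> display color."""
    return EFFECT_COLORS.get(effect, DEFAULT_COLOR)


print(mutation_color("Frameshift"))  # red
print(mutation_color(""))            # gray
```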

Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020).

You can also read our paper for free at bioRxiv:

Do you use Xena to further your research? Tag us on Twitter when you publish and we'll promote you on our Publication Page.


For more information about how to specify missing data, and how Xena decides whether a column is categorical or numerical, see our Data Format Specifications.

https://toil.xenahubs.net/download/probeMap/gencode.v23.annotation.gene.probemap
{
    "type": "clinicalMatrix",
    "cohort": "name of the cohort",
    "label": "display label of the file",
    "dataSubType": "the section the dataset is displayed under in Xena Datapages, describes what data is in the file",
    "map": [
        {
            "label": "display label of the map",
            "dataSubType": "embedding",
            "dimension": ["UMAP_1", "UMAP_2", "UMAP_3"]
        }
    ]
}
{
    "type": "clinicalMatrix",
    "cohort": "name of the cohort",
    "label": "display label of the file",
    "dataSubType": "the section the dataset is displayed under in Xena Datapages, describes what data is in the file",
    "map": [
        {
            "label": "display label of the map",
            "dataSubType": "spatial",
            "dimension": ["X", "Y"],
            "unit": "pixel",
            "spot_diameter": 178.37655999999998,
            "micrometer_per_unit": 0.3083364764966877,
            "image": [
                {
                    "label": "display label of the image",
                    "path": "image file path",
                    "size": [24240, 24240],
                    "offset": [0, 0],
                    "image_scalef": 1
                },
                {
                    "label": "display label of the image",
                    "path": "image file path",
                    "size": [2000, 2000],
                    "offset": [0, 0],
                    "image_scalef": 0.08250825
                }
            ]
        }
    ]
}
{
    "type": "clinicalMatrix",
    "dataSubType": "phenotype",
    "label": "display label of the file",
    "cohort": "name of the cohort",
    "bioentity": "cell",
    "map": [
        {
            "label": "mIF H&E coregistered",
            "dataSubType": "spatial",
            "dimension": ["CenterX", "CenterY"],
            "unit": "pixel",
            "micrometer_per_unit": 0.120280945,
            "spot_diameter": 84,
            "image": [
                {
                    "label": "morphology 2D image, coregistered H&E",
                    "path": "/CosMx/img",
                    "offset": [0, 0],
                    "image_scalef": 1
                }
            ],
            "transcript": [
                {
                    "label": "CosMx transcript data",
                    "path": "transcripts.tsv",
                    "dimension": ["x", "y"]
                }
            ]
        }
    ]
}
{
    "cohort": "TCGA Breast Cancer (BRCA)", 
    "dataSubType": "phenotype", 
    "label": "Curated survival data", 
    "type": "clinicalMatrix", 
    "units": {
        "OS": "days",
        "DSS": "days",
        "DFI": "days",
        "PFI": "days"
    }
}
phenotypeFile.json
{
    "cohort": "TCGA Breast Cancer (BRCA)",
    "label": "label of your dataset",
    "type": "clinicalMatrix",
    ":clinicalFeature": "clinicalFeature.txt"
}
clinicalFeature.txt
feature    attribute    value
alcohol_history    valueType    category
alcohol_history    state    no
alcohol_history    state    yes
alcohol_history    stateOrder    "no","yes"
clinicalFeature.txt.json
{    
    "type":"clinicalFeature"
}

feature             attribute     value
your_featureName    valueType     category
your_featureName    state         0
your_featureName    state         1
your_featureName    stateOrder    "0","1"

{
     "cohort": "TCGA Acute Myeloid Leukemia (LAML)",
     "dataSubType": "gene expression RNAseq",
     "label": "IlluminaHiSeq",
     "colNormalization": true,
     "type": "genomicMatrix",
     "unit": "log2(norm_count+1)"
}
https://doi.org/10.1038/s41587-020-0546-8
https://www.biorxiv.org/content/10.1101/326470v6
UCSC Xena Publication Page – Publications that use Xena
328KB
Xena_Workshop_handout.pdf
pdf
Xena Workshop Handout
Red asterisk indicating this dataset is the one used in the Basic Wizard
How to enter the Chart View
Visual Spreadsheet showing TFAC30 genomic signature for the TCGA Breast Cancer cohort
Figure 2. Xena view of Gene Expression gene sets realized by clicking on Figure 1D shows
Figure 3. Opening a gene set figure Figure 1 to show individual gene expression by sample.
Figure 4. Clicking on an open gene target from 3 shows the gene set and the individual genes.
Figure 5. Clicking Edit in Figure 1 or 3 allows changing cohort, as well as individual sub cohorts.
Figure 6. Analysis can be changed from Figure 1A. Here we see CNV and Mutation together.
Figure 7. Hovering over CNV + Mutation view shows types of mutation and CNV.
Figure 8. A user that is logged in may upload a tab-delimited GMT file which is analyzed and available.
Figure 9. The '+' symbol in Figure 1A allows editing of gene sets from pre-analyzed set.
https://xenabrowser.net/?bookmark=d31da9334a490d3cc5b5b75446e679a1
https://xenabrowser.net/heatmap/?bookmark=6098aca9a00041d6271f18f2b471a241
https://xenabrowser.net/heatmap/?bookmark=d5de509a8ff0032298a0547c97638e3f
https://xenabrowser.net/heatmap/?bookmark=80d9c57b471b654cc569d4ceb44e6591
https://xenabrowser.net/heatmap/?bookmark=634da50313613e659e865c2bfb958ea1
https://xenabrowser.net/?bookmark=16e1d1a37ab7d9820a6bf1399ce5135e
https://xenabrowser.net/?bookmark=048e461b1819bc84808edc16b34e974b
https://xenabrowser.net/heatmap/?bookmark=2e553fc5ca9858653e225fabce0c36ab
https://xenabrowser.net/heatmap/?bookmark=c03d52bed79d2b474ffcef679796a12d
https://xenabrowser.net/heatmap/?bookmark=c6429007551de3bf0ea491c96814a1cf
https://xenabrowser.net/heatmap/?bookmark=ba5edb23fe570ef22f5f518859ca0911
https://xenabrowser.net/?bookmark=046e741291000b1b366c70ec5a3cd39f