How do I remove null data (gray lines) from view?

Sometimes not all samples in a dataset have data. This can happen for a variety of reasons; for example, a particular patient's sample may not have undergone one or more analyses. In this case, we use gray, or 'null', to show that there is no data.

To remove null data, use the 'Remove samples with nulls' shortcut in the filter menu.
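For readers who work with downloaded data outside the browser, the same idea can be sketched in a few lines of Python. This is only an illustration: the table layout (sample ID mapped to column values, with `None` standing in for null data) and the function name are assumptions, not a description of how Xena implements the shortcut.

```python
from typing import Dict, Optional

# Hypothetical layout: sample ID -> {column name: value}; None marks null (gray) data.
Table = Dict[str, Dict[str, Optional[float]]]


def remove_samples_with_nulls(table: Table) -> Table:
    """Keep only samples that have a value in every column."""
    return {
        sample: values
        for sample, values in table.items()
        if all(v is not None for v in values.values())
    }


table: Table = {
    "sample-1": {"geneA": 7.2, "copy_number": 0.1},
    "sample-2": {"geneA": None, "copy_number": 0.4},  # no expression data
    "sample-3": {"geneA": 9.8, "copy_number": None},  # no copy number data
}
kept = remove_samples_with_nulls(table)
print(sorted(kept))  # ['sample-1']
```

Only samples with data in every column on screen survive, which is why the number of samples can drop sharply when several columns are in view.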

Example

How do I remove duplicate samples from a KM plot?

If your plot has an '!' icon next to the p-value, this means that some patients are in your plot twice. This can happen when A) a patient has both a tumor and a normal sample, or has a metastasis that is part of the dataset, and/or B) a tumor sample was split into multiple aliquots and each aliquot was run through the same analysis.

This page explains how to remove duplicates due to A. If there are duplicates due to B, you will need to download the data, decide how to resolve any inconsistencies between the multiple aliquots, and load the result into your own Xena Hub.

Example of error icon

Removing duplicates

  1. Add the data column of 'sample type' from the Phenotype data

We are adding a column of data that indicates the sample type, such as 'Primary Tumor', 'Normal', etc. Note that different datasets may have a different name for this data.

  2. Filter to only samples that are 'Primary tumor' by typing 'primary' into the filter search box. Next, click the filter icon next to the filter search box and choose 'Filter'. This will filter out all samples that are not primary tumor.

Note that if you are viewing a mostly metastatic cancer like melanoma, you may need to filter on 'metastatic' instead of 'primary'.

  3. Run your KM analysis by clicking the caret menu at the top of the column and choosing 'Kaplan-Meier plot'. It will now only have primary tumor samples in it.
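The steps above can be sketched in Python to show why filtering on sample type removes duplicates of kind A. The records, field names, and helper functions below are hypothetical; real datasets may name these fields differently.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical records; field names are assumptions made for this sketch.
samples: List[Dict[str, str]] = [
    {"sample": "P1-tumor", "patient": "P1", "sample_type": "Primary Tumor"},
    {"sample": "P1-normal", "patient": "P1", "sample_type": "Solid Tissue Normal"},
    {"sample": "P2-tumor", "patient": "P2", "sample_type": "Primary Tumor"},
    {"sample": "P3-met", "patient": "P3", "sample_type": "Metastatic"},
]


def keep_sample_type(rows: List[Dict[str, str]], keyword: str) -> List[Dict[str, str]]:
    """Case-insensitive substring match on sample type, like typing a search term."""
    return [r for r in rows if keyword.lower() in r["sample_type"].lower()]


def duplicated_patients(rows: List[Dict[str, str]]) -> List[str]:
    """Patients that contribute more than one sample."""
    counts = Counter(r["patient"] for r in rows)
    return sorted(p for p, n in counts.items() if n > 1)


primary = keep_sample_type(samples, "primary")
print(duplicated_patients(samples))  # ['P1']
print(duplicated_patients(primary))  # []
```

If patients are still duplicated after a filter like this, the remaining duplicates are more likely to be of kind B (multiple aliquots), which this approach does not resolve.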

Example

Removing duplicate samples from TCGA Lower Grade Glioma KM analysis

Ending Screenshot
More help on filtering

How do I view my data with the data from TCGA?

If you are adding new samples, you will need to combine the data outside of Xena and then load it. If you are adding new data on samples we already have, simply load the data into a Xena Hub.

Adding in new samples

We apologize but we don't provide a simple way to do this because of the batch effects that would be present when combining most data across studies. You will need to download the data you wish to combine from TCGA, combine it yourself outside of Xena, and then load it into your own Xena hub.
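As a rough illustration of the "combine it yourself" step, the sketch below joins two gene-by-sample matrices on the genes they share. The matrix layout, sample names, and `combine_matrices` function are all hypothetical, and the sketch does nothing about batch effects, which you would still need to assess separately.

```python
from typing import Dict

# gene -> {sample -> value}; both matrices are hypothetical stand-ins.
Matrix = Dict[str, Dict[str, float]]


def combine_matrices(tcga: Matrix, own: Matrix) -> Matrix:
    """Join two gene-by-sample matrices, keeping only genes present in both."""
    shared_genes = sorted(set(tcga) & set(own))
    combined: Matrix = {}
    for gene in shared_genes:
        overlap = set(tcga[gene]) & set(own[gene])
        if overlap:
            # Sample names must be unique across the two sources.
            raise ValueError(f"sample names collide for {gene}: {sorted(overlap)}")
        combined[gene] = {**tcga[gene], **own[gene]}
    return combined


tcga = {
    "TP53": {"TCGA-1": 10.2, "TCGA-2": 9.7},
    "EGFR": {"TCGA-1": 8.1, "TCGA-2": 12.4},
}
own = {"TP53": {"LAB-1": 11.0}, "MYC": {"LAB-1": 7.5}}
combined = combine_matrices(tcga, own)
print(combined)
```

Genes measured in only one of the two sources are dropped here; whether that is the right choice depends on your analysis.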

Download TCGA data through our data pages.

More information about loading data into your own Xena hub

Adding in more data on TCGA patients' samples, such as new subtype calls

  1. Load your data into your own Xena hub, making sure to select the cohort that you want to view your data side-by-side with when loading it.

Sample names and format are study specific. You will need to match what we already have in Xena. Our data pages have more information about the sample names for a study.

Genomic Signatures

Note that if you want to view a genomic signature on our gene expression data, you can do so using our genomic signature feature.

How do I compare tumor vs normal expression?

TCGA matched normal vs. GTEx normal

There are two main sources of normal expression data in Xena. The first is matched normal tissue samples from TCGA patients. These samples are called "solid tissue normals" and are taken from tissue near the tumor. Normal samples from TCGA patients are typically limited in number, but some cancer types may have enough for a robust statistical comparison. It is important to note that, because these samples are taken close to the tumor, they may carry tumor microenvironment signal. The second source of normal expression is GTEx. GTEx has expression data from normal tissue of individuals who do not have cancer. There are typically many more samples in GTEx than in TCGA solid tissue normals. However, experimental sample processing is different from TCGA, which may lead to batch effects.

Using the TCGA TARGET GTEx study

You can use the TCGA TARGET GTEx study for both types of 'normal' samples. Data from the study is from the UCSC RNA-seq Compendium, where TCGA, TARGET, and GTEx samples are re-analyzed by the same RNA-seq pipeline. This pipeline involved re-aligning the reads to the hg38 genome and calling gene expression using the RSEM and Kallisto methods. Because all samples are processed using a uniform bioinformatic pipeline, batch effects due to different computational processing are eliminated. Note that the samples from this study have only undergone per-sample normalization.

To compare tumor vs normal, you will need to filter down to just the samples you want to compare and then compare gene expression between your groups of samples.
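To make the comparison step concrete, here is a minimal sketch of comparing one gene between two groups of samples using the difference in means and Welch's t statistic. The expression values are made up and the function name is an assumption; this is not a description of the statistics Xena itself reports.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Tuple


def welch_t(group_a: List[float], group_b: List[float]) -> Tuple[float, float]:
    """Return (difference in means, Welch's t statistic) for two groups."""
    diff = mean(group_a) - mean(group_b)
    # Standard error without assuming equal variances in the two groups.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return diff, diff / se


# Made-up expression values for a single gene.
tumor = [10.1, 11.3, 10.8, 11.9, 10.5]
normal = [8.9, 9.4, 9.1, 8.7, 9.6]
diff, t = welch_t(tumor, normal)
print(f"difference in means = {diff:.2f}, t = {t:.2f}")
```

If the values are log-transformed, the difference in means can be read as a log fold change between the two groups.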

More information:

  • Filtering and subgrouping

  • Compare gene expression between subgroups

There are four gene expression datasets in this study. Two are normalized using within-sample methods: the 'RSEM norm_count' dataset is normalized by the upper quartile method, and the 'RSEM expected_count (DESeq2 standardized)' dataset by DESeq2 normalization. Therefore, these two gene expression datasets should be used.

Running a Differential Gene Expression Analysis

If you are looking to compare just a few genes, you can use our chart view to run your analysis. If you are looking to run a genome-wide differential gene expression analysis, you can use our DEA feature. Note that we only allow users to run our Differential Gene Expression Analysis on fewer than 2,000 samples in total. Thus, you will need to filter down your samples before running this analysis on this dataset.

More information:

  • Filtering and subgrouping

  • Running a Differential Gene Expression Analysis

Tutorial

Tutorial: Tumor vs Normal

How do I make subgroups with geneA high and geneB high?


This page assumes you are familiar with making 2 subgroups. If you are not, please see the 'How do I make subgroups' help page.

This page details how to create subgroups based on the expression of 2 genes so that you create the following 4 subgroups:

  • geneA expression is high AND geneB expression is high

  • geneA expression is high AND geneB expression is low

  • geneA expression is low AND geneB expression is low

  • geneA expression is low AND geneB expression is high

To do this, enter a search term for each gene, such as 'C:>15' or 'D:<0.6', into the search box and separate the search terms with a ';'.

Example

You can see in the search bar the expression used to make column A using the example genes of CD44 and CD24.

Also note that you can use this feature on columns besides gene expression, such as copy number variation, etc. You can also use it on categorical features, for instance to compare expression of a gene and the patient's gender (male or female). Simply add the gender column to the Visual Spreadsheet and enter 'female' for one of the search terms above.
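The logic behind the four subgroups can be sketched as two independent high/low calls that are then combined. The cutoffs of 15 and 0.6 are borrowed from the example search terms above; the sample values, the direction of each comparison, and the function name are illustrative assumptions.

```python
from typing import Dict, Tuple


def two_gene_subgroup(gene_a: float, gene_b: float, cut_a: float, cut_b: float) -> str:
    """Label a sample by whether each gene is above its own cutoff."""
    a = "high" if gene_a > cut_a else "low"
    b = "high" if gene_b > cut_b else "low"
    return f"geneA {a} + geneB {b}"


# Hypothetical (geneA, geneB) expression values per sample.
samples: Dict[str, Tuple[float, float]] = {
    "s1": (16.2, 0.9),
    "s2": (16.8, 0.3),
    "s3": (12.1, 0.2),
    "s4": (11.5, 0.8),
}
groups = {
    name: two_gene_subgroup(a, b, cut_a=15, cut_b=0.6)
    for name, (a, b) in samples.items()
}
for name, label in groups.items():
    print(name, label)
```

Two yes/no criteria give at most four combinations, which is why two search terms produce four subgroups.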

See our help on renaming the subgroup labels from 'true' and 'false' to something more biologically meaningful.

How do I make subgroups?

Use the find samples feature (highlighted below) to make subgroups:

First, search for all the samples you want in one of your subgroups. Next, click the Filter + Subgroup menu and choose 'New subgroup column'.

This will create a new subgroup column. All the samples that matched your search term will be in one subgroup labeled 'true', and all the samples that did not match your search term will be in the other subgroup labeled 'false'.
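Conceptually, a subgroup column is the result of testing every sample against the search and recording 'true' or 'false'. The sketch below shows this for a search that combines two criteria with OR, for instance a missense mutation or a copy number value above an arbitrary cutoff. The `Sample` fields and `matches` function are hypothetical and only mimic the behavior described above.

```python
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    mutation_effects: List[str]  # hypothetical field: effects of mutations found in the gene
    copy_number: float           # hypothetical field: copy number estimate for the gene


def matches(sample: Sample, cutoff: float = 0.5) -> bool:
    """True if the sample has a missense mutation OR copy number above the cutoff."""
    has_missense = any("missense" in e.lower() for e in sample.mutation_effects)
    return has_missense or sample.copy_number > cutoff


cohort: Dict[str, Sample] = {
    "s1": Sample(["Missense_Mutation"], 0.0),
    "s2": Sample([], 1.2),
    "s3": Sample(["Silent"], 0.1),
}
# Every sample lands in exactly one of the two subgroups.
subgroup = {name: str(matches(s)).lower() for name, s in cohort.items()}
print(subgroup)  # {'s1': 'true', 's2': 'true', 's3': 'false'}
```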

Your new column can be used for a KM analysis or to compare gene expression.

More information:

  • How to search for samples

  • Search terms for filtering and subgrouping

Example

In this example we are creating two subgroups in the TCGA Lung Adenocarcinoma study: samples with aberrations in EGFR and those without. These aberrations could be mutations or copy number amplifications.

Ending Screenshot

Steps

  1. Type '(mis OR infra) OR C:>0.5' into the samples search bar. This will select samples that either have a missense mutation or inframe deletion '(mis OR infra)', or where copy number variation (column C) is greater than 0.5. Note that the cutoff of 0.5 was chosen arbitrarily.

  2. Click the filter menu and select 'New subgroup column'. This will create a new column in which samples that met our search term are marked as 'true' (i.e. those that have an EGFR aberration) and those that did not are marked as 'false' (i.e. those that do not have an EGFR aberration).

For more information see our Basic Tutorial: Section 2.

See our help on renaming the subgroup labels from 'true' and 'false' to something more biologically meaningful.

How do I filter to just one cancer type?

For users who wish to use the datasets in a Pan-Can cohort but need to view just one cancer type.

Generalized Steps

1. Add the phenotype column that details the cancer type

The phenotype column will vary depending on which study you choose. See below for specific column names.

2. Search for the cancer type you are interested in, making sure that it is listed in the phenotype column. Click the Filter + subgroup menu next to the search bar and select 'Keep Samples'.
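The effect of 'Keep Samples' can be pictured as a filter over the phenotype column. In the sketch below, the mapping from sample to cancer type and the `keep_cancer_types` function are hypothetical, and the abbreviations are example values only.

```python
from typing import Dict, Iterable, List


def keep_cancer_types(phenotype: Dict[str, str], wanted: Iterable[str]) -> List[str]:
    """Return the samples whose cancer type is one of the wanted types (case-insensitive)."""
    wanted_set = {w.lower() for w in wanted}
    return [sample for sample, ctype in phenotype.items() if ctype.lower() in wanted_set]


# Hypothetical phenotype column: sample -> cancer type abbreviation.
phenotype = {"s1": "LGG", "s2": "BRCA", "s3": "GBM", "s4": "LGG"}
print(keep_cancer_types(phenotype, ["lgg"]))         # ['s1', 's4']
print(keep_cancer_types(phenotype, ["lgg", "gbm"]))  # ['s1', 's3', 's4']
```

Passing more than one type corresponds to joining search terms with 'OR' when you want several cancer types in view.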

For TCGA Pan-Cancer (PANCAN)

For the TCGA PanCan (PANCAN), you will want to add the phenotype column:

cancer type abbreviation

Here is a bookmark that will take you to the TCGA PanCan (PANCAN) Study with that phenotype column already selected.

For TCGA TARGET GTEx

For the TCGA TARGET GTEx, you will want to add the phenotype columns:

main category

study

primary_site

Here is a bookmark that will take you to the TCGA TARGET GTEx Study with those phenotype columns already selected.

How do I make more than 2 subgroups?

This page assumes you are familiar with making 2 subgroups. If you are not, please see 'How do I make subgroups'.

To make more than 2 sample subgroups, enter multiple search terms, such as 'C:>15' into the search box. Separate each search term with a ';'.

This can be used for a number of situations:

  • To divide a single numerical column into more than 2 subgroups (e.g. geneA high, geneA mid, and geneA low)

  • To make subgroups over the expression of two genes such that you get 4 subgroups (e.g. geneA high + geneB high, geneA low + geneB high, geneA high + geneB low, geneA low + geneB low)

  • To make subgroups over the expression of a gene and a categorical column (e.g. geneA high + Estrogen Receptor positive, geneA low + Estrogen Receptor positive, geneA high + Estrogen Receptor negative, geneA low + Estrogen Receptor negative)

  • To make subgroups over two categorical columns (e.g. Estrogen Receptor positive + HER2 positive, Estrogen Receptor negative + HER2 positive, Estrogen Receptor positive + HER2 negative, Estrogen Receptor negative + HER2 negative)

See below for an example of each.

    Examples

    Dividing a single numerical column into more than 2 subgroups (e.g. geneA high, geneA mid, and geneA low)

    In the screenshot below you can see that column D ranges from 7.3 to 12. If you wanted to have 3 groups (7.3 - 9, 9 - 10, and 10 - 12), you would enter:

    C:>9 ; C:>10

    into the search bar and then choose 'New subgroup column' from the filter/subgroup drop down menu.
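One way to see why the two terms 'C:>9 ; C:>10' give three groups rather than four is to evaluate both terms for each value and look at the combinations that can actually occur. This is a reading of the search terms, not a description of Xena's internals; the values and the function name are made up for the sketch.

```python
from typing import List, Tuple


def term_results(value: float, cutoffs: List[float]) -> Tuple[bool, ...]:
    """Evaluate one 'greater than' term per cutoff, e.g. cutoffs 9 and 10."""
    return tuple(value > c for c in cutoffs)


values = [7.3, 8.5, 9.4, 10.0, 11.2, 12.0]
for v in values:
    print(v, term_results(v, [9, 10]))

# (False, False): up to 9, (True, False): above 9 and up to 10, (True, True): above 10.
distinct = sorted({term_results(v, [9, 10]) for v in values})
print(distinct)
```

A value cannot be above 10 without also being above 9, so the combination (False, True) never appears and only three groups remain.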

    See our help on renaming the subgroup labels from 'true' and 'false' to something more biologically meaningful.

    Making subgroups over the expression of two genes such that you get 4 subgroups (e.g. geneA high + geneB high, geneA low + geneB high, geneA high + geneB low, geneA low + geneB low)

    See the separate help page 'How do I make subgroups with geneA high and geneB high?' for this scenario.

    Making subgroups over the expression of a gene and a categorical column (e.g. geneA high + Estrogen Receptor positive, geneA low + Estrogen Receptor positive, geneA high + Estrogen Receptor negative, geneA low + Estrogen Receptor negative)

    In the screenshot below you can see that column E (ERBB2 gene expression) ranges from 10 to 16. If you wanted to have 4 groups (ERBB2 > 13 + Estrogen Receptor positive, ERBB2 <= 13 + Estrogen Receptor positive, ERBB2 > 13 + Estrogen Receptor negative, ERBB2 <= 13 + Estrogen Receptor negative), you would enter:

    E:>13 ; C:Negative

    into the search bar and then choose 'New subgroup column' from the filter/subgroup drop down menu.

    See our help on renaming the subgroup labels from 'true' and 'false' to something more biologically meaningful.

    Making subgroups over two categorical columns (e.g. Estrogen Receptor positive + HER2 positive, Estrogen Receptor negative + HER2 positive, Estrogen Receptor positive + HER2 negative, Estrogen Receptor negative + HER2 negative)

    In the screenshot below, if you wanted to have 4 groups (Estrogen Receptor positive + HER2 positive, Estrogen Receptor negative + HER2 positive, Estrogen Receptor positive + HER2 negative, Estrogen Receptor negative + HER2 negative), you would enter:

    C:Negative ; D:Negative

    into the search bar and then choose 'New subgroup column' from the filter/subgroup drop down menu.

    See our help on renaming the subgroup labels from 'true' and 'false' to something more biologically meaningful.


    How do I make a KM plot?

    To make a KM plot, click on the column menu at the top of a column and choose 'Kaplan Meier Plot'.
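For background on what the plot shows, below is a minimal sketch of the standard Kaplan-Meier (product-limit) estimator. The follow-up times, event flags, and function name are made up for illustration, and Xena's own implementation may differ in its details.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time where an event occurred.

    events[i] is True if the event was observed at times[i], and False if the
    sample was censored (lost to follow-up) at that time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the fraction of at-risk samples that survive past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored samples leave the at-risk set after t.
        at_risk -= leaving
    return curve


times = [5, 8, 8, 12, 15, 20]
events = [True, True, False, True, False, True]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored samples never lower the curve themselves, but they shrink the at-risk set, so later events cause larger drops.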

    Example

    More information about KM plots can be found in our Overview of Kaplan Meier Plots.

    How do I view multiple types of cancer together?

    For users who wish to compare data across different types of cancer.

    To view multiple types of cancer side by side, you will need to start with a Pan-Cancer dataset and then filter down to just the cancer types you want to see.

    The TCGA PanCan Study contains the latest data from the PanCan Atlas project, including many hand-curated datasets. It also contains some legacy TCGA data across all cancer types, including GISTIC 2 CNV estimates and miRNAseq estimates.

    Generalized Steps

    1. Add the phenotype column cancer type abbreviation that details the cancer type.

    Here is a bookmark that will take you to the TCGA PanCan (PANCAN) Study with that phenotype column already selected.

    2. Search for the cancer type you are interested in, making sure that it is listed in the phenotype column. Separate each cancer type by 'OR'. Example: 'lgg OR gbm'. Click the Filter + subgroup menu next to the search bar and select 'Keep Samples'.

    Example

    Below is an example for viewing breast and ovarian cancer together for the TCGA PanCan Atlas.

    How do I compare gene expression between different cancer types?

    This page assumes that you are already viewing more than one cancer type. Please see the help page 'How do I view multiple types of cancer together' to get started with this.

    If there are cancer types in view that you do not want to investigate, you will need to filter them out. Please see the help page 'How do I filter to just one cancer type' to get started with this.

    Steps:

    1. Add a column of data. Enter your gene or list of genes, select 'Gene Expression' and click 'Done'.

    2. From the column menu at the top of the new column you created, select 'Chart & Statistics'.

    3. Choose 'Compare Subgroups'.

    4. Click the dropdown for 'Show data from' and choose your gene expression column.

    5. Click the dropdown for 'Subgroup samples by' and choose the cancer type column.

    6. Choose if you would like a box plot or violin plot and click 'Done'.

    How do I compare gene expression between subgroups?

    This page assumes you have a column on screen that has the groups you would like to compare (such as 'sample type' for comparing tumor vs normal) or have already made subgroups (such as 'has mutations in EGFR' vs 'does not have mutations in EGFR'). If you need help making subgroups, please see the 'How do I make subgroups' help page.

    1. First, make sure that the gene or genes that you want to compare across your groups are on screen.

    2. Click on the charts icon in the top right and choose 'Compare subgroups'.

    3. Click the dropdown for 'Show data from' and choose your gene expression column.

    4. Click the dropdown for 'Subgroup samples by' and choose your subgroup column.

    5. Choose if you would like a box plot or violin plot and click 'Done'.

    Example

    Below we look at samples that have aberrations in EGFR in the TCGA Lung Adenocarcinoma study. We will investigate whether samples that have aberrations in EGFR (mutations or copy number amplifications) have higher EGFR expression.

    Ending Screenshot

    https://xenabrowser.net/?bookmark=d31da9334a490d3cc5b5b75446e679a1

    Steps

    1. Click the graph icon in the upper right corner to enter Chart View.

    2. Click 'Compare subgroups', since we want to compare the group of samples that have aberrations in EGFR to the group of samples that do not.

    3. Click the dropdown for 'Show data from' and choose 'column C: EGFR - gene expression RNAseq - HTSeq - FPKM-UQ'.

    4. Click the dropdown for 'Subgroup samples by' and choose 'column B: (mis OR infra) OR C:>0.5 - Subgroup'.

    5. Click 'Done'.

    Video of steps

    For more information see our Basic Tutorial: Section 3.

    How do I ...

    Step-by-step instructions for our most common use cases

    How do I change the color of a column?

    To change the color threshold, click on the column menu at the top and choose 'Display'. From there click 'custom', enter your new thresholds, and click 'Done'.

    To change the color, click on the column menu at the top and choose 'Display'. From there choose a new set of colors from the drop down.

    How do I interact with the tooltip?

    When you are in the Xena Visual Spreadsheet, hovering the mouse over any data on the screen will trigger a tooltip to show up at the top of the view.

    To freeze the tooltip, you need to "Alt-click", i.e. hold down the Alt key on your keyboard and at the same time click the left mouse button.

    To unfreeze the tooltip, click on the close (X) icon.

    This can be helpful if you want to click on the link to take you to the UCSC Genome Browser, where you can view more information about those genomic coordinates.

    How do I cite UCSC Xena?

    You've run your analysis and are ready to publish your paper - congratulations! Cite the paper below to thank Xena and keep our project funded.

    Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8

    You can also read our paper for free at bioRxiv:

    https://www.biorxiv.org/content/10.1101/326470v6