More details about all the features we have on Xena
When there are no overlapping segments, Xena displays the value and color of the copy number segment as indicated in the column legend at the bottom of the column.
When there are overlapping segments, Xena follows these steps:
1. Compute overlaps by slicing segments that overlap with other segments. For example, if there is one segment at chr1:10000-20000 and a second at chr1:10050-10100, the resulting segments from this step are chr1:10000-10050, chr1:10050-10100, and chr1:10100-20000.
2. For each segment defined in step 1, determine which segments in the original data overlap with it.
3. Divide the overlapping data segments into those that are greater than copy number neutral (i.e. amplifications) and those that are less than copy number neutral (i.e. deletions). Average the segments within each of these two groups.
4. Find the colors corresponding to the two averages from step 3, then pick a color that lies between those two colors on the color wheel. For example, if amplifications are red and deletions are blue, a strong amplification overlapping a strong deletion would be shown as purple. Note that copy number neutral in this example would be white.
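The slicing and averaging steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Xena's actual implementation: the function names are hypothetical, segments are (start, end, value) tuples on one chromosome, and copy number neutral is assumed to be 0.0 (as in log-ratio data). The color blending step is not shown.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int, float]  # (start, end, value); 0.0 is assumed to be copy number neutral


def slice_segments(segments: List[Segment]) -> List[Tuple[int, int]]:
    """Step 1: cut the covered range at every segment boundary."""
    points = sorted({p for start, end, _ in segments for p in (start, end)})
    pieces = list(zip(points, points[1:]))
    # Keep only pieces that lie inside at least one original segment.
    return [(s, e) for s, e in pieces
            if any(seg_s <= s and e <= seg_e for seg_s, seg_e, _ in segments)]


def mean(values: List[float]) -> Optional[float]:
    return sum(values) / len(values) if values else None


def average_by_direction(piece, segments, neutral=0.0):
    """Steps 2 and 3: collect overlapping segments, then average gains and losses separately."""
    s, e = piece
    values = [v for seg_s, seg_e, v in segments if seg_s < e and s < seg_e]
    amplifications = [v for v in values if v > neutral]
    deletions = [v for v in values if v < neutral]
    return mean(amplifications), mean(deletions)


segments = [(10000, 20000, 1.0), (10050, 10100, -0.5)]
pieces = slice_segments(segments)
for piece in pieces:
    print(piece, average_by_direction(piece, segments))
```

With the two example segments from step 1, `pieces` holds the same three sliced segments listed in the text, and only the middle one has both an amplification average and a deletion average.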
More information about how we color mutation columns
Samples that have mutation data are white, with a dot or line showing where each mutation falls in relation to the gene model at the top of the column. Mutation data is colored by functional impact:
Red - Deleterious
Blue - Missense
Orange - Splice site mutation
Green - Silent
Gray - Unknown
Samples for which there is no mutation data are gray with no dot or line, and are marked as 'null'.
Red --> Nonsense_Mutation, frameshift_variant, stop_gained, splice_acceptor_variant, splice_acceptor_variant&intron_variant, splice_donor_variant, splice_donor_variant&intron_variant, Splice_Site, Frame_Shift_Del, Frame_Shift_Ins
Blue --> splice_region_variant, splice_region_variant&intron_variant, missense, non_coding_exon_variant, missense_variant, Missense_Mutation, exon_variant, RNA, Indel, start_lost, start_gained, De_novo_Start_OutOfFrame, Translation_Start_Site, De_novo_Start_InFrame, stop_lost, Nonstop_Mutation, initiator_codon_variant, 5_prime_UTR_premature_start_codon_gain_variant, disruptive_inframe_deletion, inframe_deletion, inframe_insertion, In_Frame_Del, In_Frame_Ins
Green --> synonymous_variant, 5_prime_UTR_variant, 3_prime_UTR_variant, 5'Flank, 3'Flank, 3'UTR, 5'UTR, Silent, stop_retained_variant
Orange --> others, SV, upstream_gene_variant, downstream_gene_variant, intron_variant, intergenic_region
Note that we match these terms case-insensitively when assigning colors.
For the gene-level mutation datasets (Somatic gene-level non-silent mutation):
Red (=1) --> indicates that a non-silent somatic mutation (nonsense, missense, frame-shift indels, splice site mutations, stop codon readthroughs, change of start codon, inframe indels) was identified in the protein coding region of a gene, or any mutation identified in a non-coding gene
White (=0) --> indicates that none of the above mutation calls were made in this gene for the specific sample
Pink (=0.5) --> some samples have two aliquots. In the event that in one aliquot a mutation was called and in the other no mutation was called, we assign a value of 0.5.
Kaplan Meier Survival Analyses are a way of comparing the survival of groups of patients. More information on what a Kaplan Meier analysis is can be found in this article
To generate a KM plot, click on the column menu at the top of a column and choose 'Kaplan Meier Plot'.
For numerical or continuous features, you will have the option of having 2 groups of samples, 3 groups of samples, or viewing the upper vs lower quartile. For 2 groups, we divide the samples on the median. For 3 groups, we divide samples into the upper third, middle third, and lower third.
When viewing the upper vs lower quartile, note that we only include samples that are greater than (not greater than or equal to) the upper quartile, and the same for the lower quartile.
Note that all samples are used to calculate the median and other dividing values, whether or not they have survival data. To see which samples have survival data, add the column 'OS' from the phenotype data.
If more than one sample has the same value, we put the samples in a group together, even if this means the groups end up being unequal in size.
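The grouping rules above can be illustrated with a short Python sketch. It is not Xena's code: the function names are hypothetical, the quartile cutoffs use Python's "inclusive" quantile method (Xena's exact method is not stated here), and placing samples equal to the median in the upper group is an assumption.

```python
import statistics


def split_two_groups(values):
    """Split on the median; ties stay together, so group sizes can differ."""
    cutoff = statistics.median(values)
    # Assumption: samples equal to the median go in the upper group.
    return {
        "lower": [v for v in values if v < cutoff],
        "upper": [v for v in values if v >= cutoff],
    }


def split_quartiles(values):
    """Keep only samples strictly beyond the lower and upper quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "lower": [v for v in values if v < q1],
        "upper": [v for v in values if v > q3],
    }


expression = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(split_two_groups(expression))
print(split_quartiles(expression))

# Tied values end up in the same group, which makes the groups unequal in size.
print(split_two_groups([1, 2, 2, 2, 3]))
```

In the tied example, three samples share the median value, so one group has a single sample and the other has four.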
For categorical features, we only show the first 10 categories.
For mutation features, we divide samples into those with any mutation and those without. To make different groups (e.g. samples with nonsense mutations vs those without), create your own subgroups and run a KM plot on the new column.
We remove samples with 'null' data for all plots.
We default to Overall Survival. Users can select different end points if they are available. An example of this is in the TCGA PanCancer Study.
We default to the last time any individual in the plot was known to be alive. You can change this to be 1-year or 5-year survival by changing the time cutoff at the bottom of the screen. The statistics will automatically recalculate. TCGA data uses days as its measurement of time.
You can generate a high quality PDF by clicking the PDF icon.
You can download the data used to generate the KM plot using the download icon. It will download the Event and Time to Event columns, in addition to the sample ID, patient ID, groups, and underlying data.
When there are multiple curves or lines in a KM plot, Xena Browser compares the different Kaplan–Meier curves using the log-rank test. The Browser reports the test statistic (χ²) and p-value (from the χ² distribution). Data is retrieved in real-time from Xena Hub(s) to a user's web browser and the test is performed in the browser to maintain your data privacy.
The statistics the Xena Browser reports are equivalent to R's survival package, survdiff, with rho=0 (default in R).
If all patients in a particular group (i.e. line) are censored before any event happens for the whole population (including all the groups), we exclude this group from the statistical analysis and perform the log-rank test on the remaining groups. We do this because we have no way to know the number of people at risk for this particular group at any of the event times, and therefore cannot compute any statistics for this group. R handles this exception in the same way. Although this group is removed from the statistical analysis, we still display the group in the KM plot.
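For readers who want to see what the log-rank test computes, here is a minimal sketch of the standard two-group version in plain Python. It is a textbook formulation written for illustration, not the Browser's code: the function name is hypothetical, it handles exactly two groups, and it does not implement the group-exclusion rule described above (it will fail with a division by zero if the variance is zero).

```python
import math


def logrank_two_groups(times, events, groups):
    """Standard log-rank test for two groups.

    times:  follow-up time per sample
    events: 1 if the event was observed, 0 if censored
    groups: 0 or 1 per sample
    Returns (chi-squared statistic, p-value with 1 degree of freedom).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed = expected = variance = 0.0
    for t in event_times:
        at_risk = [g for tt, g in zip(times, groups) if tt >= t]
        n = len(at_risk)                      # everyone still at risk at time t
        n1 = sum(1 for g in at_risk if g == 1)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = logrank_two_groups(
    times=[1, 2, 3, 4], events=[1, 1, 1, 1], groups=[0, 0, 1, 1])
print(round(chi2, 3), round(p, 3))
```

For real analyses, an established implementation such as R's survdiff is the safer choice; the sketch is only meant to make the calculation concrete.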
Note that we do not automatically remove duplicate patients (for instance if there is a tumor and a normal sample from the same patient). You can determine if there are duplicate patients by looking for the "!" icon next to the p value. Learn how to remove duplicate samples.
This dynamic, powerful, and flexible view is our default view into the data.
The Visual Spreadsheet allows you to add an arbitrary number of columns of any data type (mutation, copy number, expression, protein, phenotype, methylation, etc) on any number of patients' samples into a spreadsheet-like view. We line up all columns so that each row is the same sample, allowing you to easily see trends in the data. Data is always sorted left to right and sub-sorted on columns thereafter.
Get started by going to the Xena Browser and following the wizard to enter your data of interest.
The wizard on the screen will guide you to choose a study to view and TWO columns of data to view on those samples. Note that if you do not choose at least two columns, the wizard will not exit and let you interact with the data.
You can select a cohort either by choosing 'Help me select a cohort' and searching our cohorts for your cancer type, etc., or by choosing 'I know the study I want to use' and searching for the partial or full name of the cohort you are interested in.
Enter a HUGO gene name or a dataset-specific probe name (e.g. a CpG island). You can enter one gene or multiple genes. Separate multiple genes with a space, comma, tab, or new line.
To display a genomic region, enter the region, choose your dataset, and click 'done'. We recognize chromosomes (e.g. chr1), chromosome arms (e.g. chr19q), and chromosome coordinates (e.g. chr1:100-4,000).
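If you prepare region strings in a script before pasting them into the Browser, a small parser can catch typos early. The sketch below is a hypothetical helper, not part of Xena: the regular expression is an assumption that covers only the three forms shown above, with numbered chromosomes plus X, Y, and M.

```python
import re

REGION = re.compile(
    r"(?P<chrom>chr(?:\d+|X|Y|M))"               # chromosome, e.g. chr1
    r"(?P<arm>[pq])?"                            # optional arm, e.g. chr19q
    r"(?::(?P<start>[\d,]+)-(?P<end>[\d,]+))?"   # optional coordinates
)


def parse_region(text):
    """Return a dict describing the region, or None if the text is not a region."""
    match = REGION.fullmatch(text.strip())
    if match is None:
        return None

    def to_int(value):
        # Coordinates may contain thousands separators, e.g. 4,000.
        return int(value.replace(",", "")) if value else None

    return {
        "chrom": match.group("chrom"),
        "arm": match.group("arm"),
        "start": to_int(match.group("start")),
        "end": to_int(match.group("end")),
    }


for example in ["chr1", "chr19q", "chr1:100-4,000", "TP53"]:
    print(example, parse_region(example))
```

A gene name such as TP53 returns None, which is a convenient way to tell gene entries apart from region entries in a mixed list.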
After entering a gene or probe name, you will need to select one or more datasets.
We have pre-selected default datasets for most cohorts. These datasets are selected because they are the most used. Typically there is a default mutation, copy number, and expression dataset.
Xena also has more datasets than those listed in the Basic Menu. Depending on the cohort, these can include DNA methylation, exon expression, thresholded CNV data and more. To access them, click on 'Show Advanced' below:
More information on basic datasets
We annotate datasets used in the basic Visual Spreadsheet wizard with a red asterisk in our datasets pages. For an example see: https://xenabrowser.net/datapages/?cohort=TCGA%20Acute%20Myeloid%20Leukemia%20(LAML)
Patient samples are on the y-axis and your columns of data are on the x-axis. We line up all columns so that each row is the same sample, allowing you to easily see trends in the data. Data is always sorted left to right and sub-sorted on columns thereafter.
If you entered a single gene, that gene will be listed at the top of the column. If there are multiple probes mapped to that gene in the dataset you selected they will be displayed as subcolumns ordered left to right in the direction of transcription.
If you selected a positional dataset, such as segmented copy number variation or mutation, the gene model will be displayed at the top of the column. The gene model is a composite of all transcripts of the gene. Boxes show different exons with UTR regions being short and CDS regions being tall. We display 2 kb upstream to show the promoter region. Use the column menu to toggle the display of intronic regions.
If you entered multiple genes, each gene will be listed as a subcolumn for that dataset. If multiple probes map to a gene in the dataset (probes that would appear as subcolumns had you entered that gene alone), the probes are averaged to give a single value per gene.
Note that if you entered more than one gene and selected a mutation dataset, we will only show the first gene. If you wish to see multiple mutation columns, please enter each gene individually and click 'done'.
When displaying a chromosome range, genes will be shown at the top of the column, with dark blue genes being on the forward strand and red genes being on the reverse strand. Hovering over a gene will display the gene name in the tooltip. Note that introns are always shown in this mode.
Individual values vary by dataset. The legend at the bottom of the dataset will tell you the units for your particular dataset, including any normalization that was performed. If a sample does not have data for a column, it will show as gray and be labeled as 'null'.
If the entire column is gray this means we did not recognize the gene, probe, or position. If you believe this to be in error, please try an alternate name.
More information about a dataset can be found in the dataset details page. To get there, click on the column menu and choose 'About'.
The Xena Browser uses the y-axis for samples and the x-axis/columns for genomic/phenotypic features. Data from a single sample is always on the same horizontal line across all columns, allowing you to see screen-wide trends. The Xena Browser orders samples by the first column, then the second, and so on, working left to right. If there are multiple genes, identifiers, or probes within a column, samples are ordered by the first sub-column, then the second sub-column, and so on.
Numerical data are ordered in descending order (e.g. 3.5, 1.2, ...). Categorical data (e.g. stage, tumor type, etc) are ordered by categories. CNV data is sorted by the average of the entire column. Positional mutation data is ordered by genomic coordinates (from 5'->3') and then by the predicted impact of the mutation. Both CNV and positional mutation data have the option to instead sort by the zoomed region. Click the column menu at the top of the column and choose 'Sort by zoom region avg'.
To reverse the ordering, click the column menu at the top of the column and choose 'Reverse sort'.
As the sample sort order is controlled by the left most columns, it can be useful to explore the data by moving a different column to the left.
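The effect of column order on sample order can be shown with a toy example. The sketch below assumes numeric columns only, sorted in descending order with ties broken by the next column. The function name and the sample and gene names are made up for illustration, and the handling of categorical, CNV, and mutation columns is not covered.

```python
def order_samples(rows, column_order):
    """Sort rows by each column in turn (left to right), numeric values descending."""
    return sorted(rows, key=lambda row: tuple(-row[c] for c in column_order))


rows = [
    {"sample": "S1", "geneA": 1.2, "geneB": 0.4},
    {"sample": "S2", "geneA": 3.5, "geneB": 0.1},
    {"sample": "S3", "geneA": 1.2, "geneB": 2.0},
]

# geneA is the left-most column: S1 and S3 tie on geneA, so geneB breaks the tie.
by_a = [r["sample"] for r in order_samples(rows, ["geneA", "geneB"])]

# Moving geneB to the left changes the sample order for the whole view.
by_b = [r["sample"] for r in order_samples(rows, ["geneB", "geneA"])]

print(by_a)
print(by_b)
```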
To move a column click on the column header and drag a column to the right or left.
Click and drag anywhere in a column to zoom in, in either direction. Zoom out to all samples by clicking the 'Clear Zoom' at the top. Zoom out to the whole column by clicking the red 'x' at the top of a column.
The Tooltip at the top of the Visual Spreadsheet shows more information about the data under the mouse. The links go to the UCSC Genome Browser, where you can learn more about that gene or genomic position. Alt-click to freeze and unfreeze the tooltip to be able to click on the links. Click here for more information about interacting with the tooltip.
You can change the size of a column by clicking on the bottom right corner of a column and dragging to a new size.
You can add another column of data by clicking on 'Click to add column' either on the right edge of the visual spreadsheet or by hovering between columns until 'Click to insert column' displays.
Chart View will generate bar plots, box plots, violin plots, scatter plots, and distribution graphs using any of the columns in a Visual Spreadsheet. Statistics will be calculated automatically.
To get to the chart view click on the icon indicated below by the red box or use the column menu and select 'Chart & Statistics'.
Once you enter Chart View, it will ask you a series of questions about what type of graph you are trying to make.
Compare subgroups will allow you to compare groups of patients' samples, either those that you have made or via a categorical feature, such as sample type. It will build the appropriate graph depending on whether you have selected a continuous numerical or categorical column. This option will let you make box plots, violin plots, and bar charts.
See a distribution will let you see a histogram distribution of the data in a single column. The column can have sub-columns, either multiple probes or multiple genes, which will instead create a plot with multiple box plots.
Make a scatterplot will make a scatterplot from two continuous numerical columns. The second column can have multiple sub-columns, either multiple probes or multiple genes, which will create overlapping scatterplots
If an option is grayed out, this means that you do not have enough or the right type of data on the screen. Return to the Visual Spreadsheet and add more data.
We show statistics in the upper right corner of the screen for most graphs. If we detect that the statistics will take some time to run, we may instead show a 'run stats' button so that you can decide whether to run the statistical test.
Advanced options available under the graph will allow you to change the scales of the axes. If you are viewing a scatterplot it will also allow you to color the points by a column of data.
Note that for violin plots, the width of each plot does not relate to the number of samples in the plot.
To return to the Visual Spreadsheet, click either the icon in the upper left, or the 'x' close button.
How to find samples that you want to remove or keep in the view. How to make subgroups.
Use the search box at the top of the screen to first pick/find your samples of interest. Then filter to keep or remove these samples, create a new subgroup column, or zoom.
The bar highlighted above allows you to search all data on the screen for your search term. Note that it will not search data that is not on the screen. Samples that match your criteria are marked with a black bar in the Visual Spreadsheet.
You can search for samples by either typing in the search bar or by clicking on the dropper icon to enter the pick samples mode. The pick samples mode will allow you to click on a column to select samples. The search term for your picked samples will appear in the search bar. To exit the pick samples mode, click on the dropper icon again.
Note the pick samples mode tends to work best if the column you are selecting from is the first column.
Once you have your sample(s) of interest, click on the filter + subgroup menu and choose to:
Keep samples: Keep only the samples which match your criteria.
Remove samples: Remove the samples which match your criteria.
Clear sample filter: Remove ALL filters currently applied.
Remove Samples with nulls: Removes samples that have no data for one or more columns. Equivalent to typing 'null' in the search bar and choosing 'Remove samples'.
Zoom: Zoom to the samples that meet your criteria. Shift-click to zoom out.
Once you have either filtered, created a subgroup column, or zoomed to samples, your search term will be added to the search history. Access the search history by clicking the downward facing arrow at the upper right of the search bar.
Once the subgroup column is created, users can change the labels from "true" or "false" to, for example, "wild type" or "EGFR mutant" by adjusting the column display settings. To access these, select the three dot menu at the top of the column and choose 'Display'.
New subgroup column: Create a new column where samples that meet your criteria are annotated as 'true' and samples that don't meet your criteria are annotated as 'false'. This new column can then be used like any other column, for example in a Kaplan Meier plot or in Chart View.
To create more than 2 subgroups, please see our guide.
Note this search history will be preserved in bookmarks.
Our search is a 'contains' search, meaning the term you enter can be at the beginning, end, or in the middle of a matched term. Our search is case-insensitive. An example is
IIA
will match 'Stage IIIA' and 'Stage IIA'. To specify a specific string, use quotes
"Stage IIA"
You can specify a certain column and mathematical expression such as
A:>2
which will find all values greater than 2 in the first column. We support the following operators
= (equal)
<= (less than or equal)
>= (greater than or equal)
< (less than)
> (greater than)
!= (not equal)
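To make the column-plus-operator syntax concrete, here is a small Python sketch that interprets a term such as A:>2 against rows of numeric data. It is an illustration of the semantics described above, not the Browser's search code: the function name is hypothetical, only single-letter columns and numeric comparisons are handled, and skipping samples with no data is an assumption.

```python
import operator
import re

OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<=": operator.le, ">=": operator.ge,
    "<": operator.lt, ">": operator.gt,
}

# Two-character operators are listed first so that '>=' is not read as '>'.
TERM = re.compile(r"([A-Z]):(!=|<=|>=|=|<|>)(-?\d+(?:\.\d+)?)")


def match_numeric(term, rows):
    """Return the indexes of rows whose value in the named column satisfies the term."""
    m = TERM.fullmatch(term)
    if m is None:
        raise ValueError(f"not a column comparison: {term!r}")
    column = ord(m.group(1)) - ord("A")   # 'A' is the first column, 'B' the second
    compare = OPERATORS[m.group(2)]
    threshold = float(m.group(3))
    # Assumption: samples with no data (None) never match a comparison.
    return [i for i, row in enumerate(rows)
            if row[column] is not None and compare(row[column], threshold)]


rows = [[3.5, 1.0], [1.2, 4.0], [None, 2.5]]
print(match_numeric("A:>2", rows))
print(match_numeric("B:>=2.5", rows))
```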
You can search any annotation on a mutation, such as the functional impact, protein position, or gene name itself
To find all samples with a mutation that causes a specific protein change, enter the change:
V600E
To find all samples where the functional impact has the text 'frame' or 'nonsense' in it:
frame OR nonsense
To find all samples that have a mutation, search the gene annotation:
TP53
To find all samples that do not have a mutation, use the negation of the gene annotation:
!=TP53
To find all samples that do not have data in one or more columns, use:
null
and choose 'Remove samples'. To find all samples that do not have data for just one column, use:
B:null
Enter a sample ID to find a sample of interest. An example:
TCGA-DB-A4XH
If you are searching for multiple sample IDs, you will need to separate each by an 'OR'. You can copy and paste a list of sample IDs into the search bar as long as they are separated by a space, tab, or return (new line).
TCGA-DB-A4XH OR TCGA-2F-A9KO-01 OR TCGA-02-0001
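If your sample IDs live in a file or spreadsheet column, a one-line helper can build the 'OR' form for you. This is a convenience sketch with a hypothetical function name. It simply splits on any whitespace and joins with ' OR '.

```python
def sample_query(pasted_ids):
    """Build a search term from IDs separated by spaces, tabs, or new lines."""
    return " OR ".join(pasted_ids.split())


pasted = "TCGA-DB-A4XH\nTCGA-2F-A9KO-01\tTCGA-02-0001"
print(sample_query(pasted))
```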
To make it easy to search a specific column, we use shorthand to annotate the first column as 'A:', the second as 'B:', etc. An example is
A:YES
This will search ONLY the first column for the word 'YES'. Note that we will retain your original search if you move the columns around.
You can enter multiple search terms and we will match all of them with an implicit 'AND'. We also support 'OR'.
Use parentheses to group search terms. For example:
"Stage II" (B:Negative OR C:Negative)
will search for samples that match 'Stage II' in any column and are 'Negative' for either the second or third column.
You can also use '!=' to negate a term such as:
!=null
which will match all samples that have data across all columns.
Run a genome-wide differential gene expression analysis to compare groups of samples
To run a differential gene expression analysis, click on the 3 dot column menu at the top of a categorical column (not a numerical column) and choose 'Differential Expression'.
This will take you to a new page where you will define the sample subgroups you would like to compare (note that you can select multiple categories for a single subgroup).
After you have your subgroups, scroll to the bottom and click 'submit'.
Due to compute limitations you can only run a total of 2000 samples through the analysis pipeline.
This will start the analysis, which may take a while to run depending on the size of the dataset. As the results are completed, the web page will update. Scroll to see more results. Once the analysis is finished it will say 'Done' at the top of the page.
The gene expression dataset chosen for a specific study/cohort is the same gene expression dataset as the one in the Basic Datasets menu.
The Advanced Visualization parameters only apply to the PCA or t-SNE plot. They do not apply to any other analyses.
We disable running our differential gene expression analysis on your own data since we send the data in the analysis to various websites, which may not be secure. There are 3 options to run our analysis on your own data:
Upload your data to BioJupies, from the Ma'ayan lab, to run a somewhat similar analysis to ours. It has a very user-friendly interface.
Upload your data to the Bulk RNA-seq analysis pipeline Appyter to run a very similar analysis. This pipeline is what our analysis is based on and requires a bit more familiarity with running differential gene expression analyses. Our only modifications are to automatically pick the best options (normalization, etc.) based on our public data. You will need to know which options are best given your own data.
Run our pipeline on your own computer. This will give you identical results to our pipeline but requires the most engineering to set up and run. You will need to set up a Docker container with all the dependencies pre-installed and then download and run the notebook in that container.
Run a genome-wide differential GSEA analysis to compare groups of samples
To run a GSEA analysis, click on the 3 dot column menu at the top of a categorical column (not a numerical column) and choose 'GSEA'.
This will take you to a new page where you will define the sample subgroups you would like to compare (note that you can select multiple categories for a single subgroup).
After you have your subgroups, choose a gene set library, scroll to the bottom and click 'submit'.
Due to compute limitations you can only run a total of 2000 samples through the analysis pipeline.
This will start the analysis, which may take a while to run depending on the size of the dataset. As the results are completed, the web page will update. Scroll to see more results. Once the analysis is finished it will say 'Done' at the top of the page.
The gene expression dataset chosen for a specific study/cohort is the same gene expression dataset as the one in the Basic Datasets menu.
The Advanced Visualization parameters apply to the PCA or t-SNE plot, as well as the blitzGSEA analysis itself.
Note that the GSEA analysis runs blitzGSEA, a faster implementation of a traditional GSEA analysis.
We disable running our GSEA analysis on your own data since we send the data in the analysis to various websites, which may not be secure. Currently we only offer a as a method for running this pipeline on your own data. Please contact us if you need help setting this up.
for copying a sample ID from the tooltip.
Enter a genomic signature over a set of genes for a particular dataset
Genomic signatures, sometimes expressed as a weighted sum of genes, are an algebra over genes, such as "ESR1 + 0.5*ERBB2 - GRB7". Once a signature is entered, the value of each gene for each sample is substituted and the algebraic expression is evaluated.
Open the Add column menu
Enter '=' and then your signature into the gene entry box
Select 'gene expression' as the dataset
Click 'Done'
There must be a space on both sides of the "+" and "-".
Alternatively enter a list of genes and we will automatically add a '+' in between each gene when evaluating the signature
If we cannot find a gene that is part of the signature, the missing gene will be included as a zero in the calculation and the column label will list it as missing.
Hess et al. identified 30 genes whose gene expression profile is predictive of complete pathologic response to chemotherapy treatment in breast cancer.
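The rules above (spaces around "+" and "-", optional weights, a bare gene list treated as a sum, missing genes counted as zero) can be sketched for a single sample as follows. This is an illustration only, with hypothetical function names and made-up expression values. It is not how the Browser parses signatures internally.

```python
def parse_signature(text):
    """Turn '=ESR1 + 0.5*ERBB2 - GRB7' into a list of (weight, gene) pairs."""
    terms, sign = [], 1.0
    for token in text.lstrip("=").split():
        if token == "+":
            sign = 1.0
        elif token == "-":
            sign = -1.0
        else:
            weight, _, gene = token.rpartition("*")
            terms.append((sign * (float(weight) if weight else 1.0), gene))
            sign = 1.0   # a bare list of genes is treated as a sum
    return terms


def evaluate(terms, expression):
    """Evaluate the signature for one sample; missing genes contribute zero."""
    missing = [gene for _, gene in terms if gene not in expression]
    score = sum(weight * expression.get(gene, 0.0) for weight, gene in terms)
    return score, missing


terms = parse_signature("=ESR1 + 0.5*ERBB2 - GRB7")
sample = {"ESR1": 2.0, "ERBB2": 4.0}   # GRB7 is deliberately absent
print(terms)
print(evaluate(terms, sample))
```

Because the parser splits on whitespace, a signature written without spaces around the operators (for example "ESR1+GRB7") would be read as a single gene name, which mirrors why the spaces are required.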
=E2F3 + MELK + RRM2 + BTG3 - CTNND2 - GAMT - METRN - ERBB4 - ZNF552 - CA12 - KDM4B - NKAIN1 - SCUBE2 - KIAA1467 - MAPT - FLJ10916 - BECN1 - RAMP1 - GFRA1 - IGFBP4 - FGFR1OP - MDM2 - KIF3A - AMFR - MED13L - BBS4
Here we can see that the predicted chemo response signature is high in the basal subtype and low in luminal subtype. Additionally, the signature is high for ER negative samples and low for ER positive samples.
Bookmark: https://xenabrowser.net/?bookmark=2401ccb792e256d7397008b24af20565
We also have a number of signature datasets under the TCGA Pan-Cancer study from the PanCan Atlas project:
To use these signatures, go to the dataset pages (links above) to see what the names of the specific signatures are (under Identifiers). Then in the visualization enter the name of the specific signature as a gene, click 'Advanced', choose the appropriate dataset, and click 'Done'
There are 4 ways to download data
1. Download data in a single column of a Visual Spreadsheet. In a Visual Spreadsheet, click on the column hamburger menu, then "Download", to download just the data from that column.
2. Download data in an entire Visual Spreadsheet. In a Visual Spreadsheet, click on the download icon in the upper right corner of the spreadsheet.
3. Bulk download a whole dataset file. Click "Data Sets" in the top banner and navigate to your dataset of interest, where the page includes a download URL. You can also reach the dataset page by clicking on the column hamburger menu, then "About". Click on the download URL to download the entire dataset, or use "wget" or "curl" to download it from the command line.
4. Via our APIs:
Our files are tab-delimited or '.tsv'. We recommend opening them in your favorite spreadsheet program, such as Microsoft Excel, which will automatically convert the tabs into new columns. Please note that if you have many thousands of samples, Microsoft Excel will likely have difficulty opening the file. In this case, the command line may work better for you.
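When a file is too large for a spreadsheet program, a few lines of Python can read it row by row without loading it all into memory. The sketch below uses only the standard library. The function name and the column names in the in-memory example are made up, and real files will have their own headers.

```python
import csv
import io


def summarize_tsv(handle):
    """Read a tab-delimited file one row at a time; return the header and the row count."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    n_rows = sum(1 for _ in reader)
    return header, n_rows


# A tiny in-memory stand-in for a downloaded file (hypothetical column names).
example = io.StringIO("sample\tOS\tOS.time\nS1\t1\t120\nS2\t0\t340\n")
header, n_rows = summarize_tsv(example)
print(header, n_rows)
```

For a downloaded file, replace the in-memory example with `open("your_file.tsv", newline="")`, where the file name is whatever you saved the download as.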
A tool developed by the Stuart Lab to view samples in a 2D layout
UCSC TumorMap is a separate project developed by the Stuart Lab at UCSC. We link to them to help users gain another perspective on the data they are seeing in Xena. In their words:
TumorMap is a tool that enables grouping samples based on their omic signatures in a visually accessible way. Similar to dimensionality reduction methods, Tumor Map method takes a high-dimensional omics space and produces a two dimensional visualization. Unlike most dimensionality reduction methods, the TumorMap method is able to combine multiple types of omics data (e.g. mRNA expression and methylation data types in a single map). Furthermore, TumorMap is an interactive tool that allows navigating through a tumor landscape that represents a heterogeneous multi-dimensional and multi-platform omic space of oncogenic signatures.
In the TumorMap, each node is a sample and clusters of samples indicate groups with similar oncogenic signatures and genomic alteration events. The samples in a map may be colored by various molecular, clinical, diagnostic, prognostic, and phenotypic annotations (e.g. tumor type, molecular subtype, etc.) to visualize associations with the data type used in clustering.
Bookmarks are a great way to save a particular view in Xena, either for yourself or to share with others.
To bookmark a view, click on 'Bookmark' in the top navigation bar. From here you can either click 'Bookmark' to create a bookmark URL or click 'Export' to export a file that can then be imported back to the browser.
When you click 'Bookmark' you will then need to click 'Copy Bookmark' to copy the bookmark URL to your copy buffer. Large views may take a second or two to generate a URL.
Note that your filter and subgroup history, as well as the last Chart View you created, if any, will be saved as part of the bookmark.
Bookmarks are only guaranteed for 3 months.
The 'Bookmark' option will store all the data in view on our servers and provide you a link. This is the easiest way to share a view. Note that if you have any private data in view, this option will be disabled to preserve your privacy. Please also note that if you lose the link there is no way to get it back.
If you choose Export, it will give you a file with everything Xena needs to recreate your view. You can then save this file and import it back into Xena. While this option can be a bit cumbersome, it will allow you to share private data. Note that these files are still only guaranteed for 3 months, though they may last for longer.
The 'Recent Bookmarks' option will temporarily show the 15 most recent bookmarks you have created. This can be useful if you're constructing many bookmarks. Note that this menu is frequently reset so do not use this as permanent storage for a bookmark.
When you create a bookmark link, we save the data in view on our servers. To protect user data privacy, we have disabled this option when private data is in view. Please use the Export/Import option instead.
A 3D protein viewer developed by Rachel Karchin's lab
We use the MuPIT 3D protein viewer from Rachel Karchin's lab at Johns Hopkins to provide this visualization to our users. From their Help Page:
MuPIT interactive is an online tool that allows you to map sequence variants from their genomic position onto protein structures. Viewing a variant on protein structure can be useful in interpreting its potential biological consequences. After mapping, the variants are displayed on an interactive 3d structure. The user may turn variants on and off, and display annotations on the protein structure.
Access this tool by going to our Visualization tab and following the wizard to select samples. Next, enter your gene of interest, click 'somatic mutation' and then click 'Done'. You may need to choose another variable such as 'gene expression'.
Once you have the mutation data you're interested in, click the menu at the top of the column and choose 'MuPIT View'. This will send your mutation data to MuPIT and open their viewer in a new tab.
MuPIT Help: http://mupit.icm.jhu.edu/MuPIT_Interactive/Help.html
On the left of the figure is the Xena mutation column view of ERBB2 somatic mutations from the TCGA breast cancer cohort. Users click on the MuPIT link from the caret menu at the top of the column. This sends all the mutations' genomic positions, as well as their recurrence p-values, to the MuPIT display. On the right side of the figure, MuPIT displays the mutations as bright green spheres of various sizes, with larger spheres for recurrent mutations. The size of each sphere is determined by the recurrence p-value. The MuPIT display shows that these ERBB2 somatic mutations cluster around the ERBB2 active site (ATP binding site in blue and proton acceptor site in teal).
You can use the python API, xenaPython, to programmatically access data in the public Xena Data Hubs.
Xena's Transcript View shows transcript-specific expression or isoform percentage for 'tumor' TCGA data and 'normal' GTEX data. It allows you to compare the distribution of these values for two groups of patient samples.
This tool was created by Akhil Kamath as part of Google Summer of Code 2017. Akhil was advised by Angela Brooks and Brian Craft. Thank you Akhil for all your work!
Enter the HUGO name of your gene of interest and click 'OK'. Choose your two studies of interest from the two drop down menus. Each row in the visualization shows the transcript, transcript structure and density plots showing range of expression of that transcript.
Change the units from TPM (Transcripts Per Million) to isoform percentage using the drop-down near the top. To zoom on a row, click on it. To zoom out, click on the row again.
All RNAseq data was generated by the Toil pipeline recompute done by the UCSC Computational Core using the RSEM package. All transcripts are from Gencode V23 comprehensive annotation.
For this visualization, we numbered the exons using an in-house automated method which may not line up with exon numbering in the literature. This method is subject to change and should not be relied on to denote any exon going forward.
Regions that are intronic in all transcripts are removed. The remaining exonic regions are numbered 1..N. Different exons within a given region are labeled starting with ‘a’ for the left-most exon (in transcript direction).
For example, exon 3 is the unique exon in the third exonic region. Exons 4a and 4b are two different exons in the fourth exonic region.
Another way to say this is: different exons across all transcripts which overlap transitively will be assigned the same integer. So if one transcript has exons 4a and 4c, there must be exons in other transcripts that overlap them, and each other.
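The numbering idea can be sketched as a merge of overlapping intervals followed by labeling. The actual in-house method is not published here and is subject to change, so the following is only an illustration under assumptions: exons are (start, end) tuples with half-open coordinates, all transcripts are on the forward strand, and the function name is hypothetical.

```python
from string import ascii_lowercase


def label_exons(transcripts):
    """Assign labels such as '1', '2a', '2b' to the distinct exons of a gene.

    transcripts: list of transcripts, each a list of (start, end) exons.
    """
    exons = sorted({exon for transcript in transcripts for exon in transcript})
    regions = []          # each region holds exons that overlap transitively
    region_end = None
    for exon in exons:
        if region_end is not None and exon[0] < region_end:
            regions[-1].append(exon)
            region_end = max(region_end, exon[1])
        else:
            regions.append([exon])
            region_end = exon[1]
    labels = {}
    for number, region in enumerate(regions, start=1):
        # zip stops at 26 letters, which is enough for this illustration.
        for letter, exon in zip(ascii_lowercase, region):
            labels[exon] = str(number) if len(region) == 1 else f"{number}{letter}"
    return labels


transcripts = [
    [(100, 200), (300, 400), (500, 600)],
    [(100, 200), (350, 450), (500, 600)],
]
labels = label_exons(transcripts)
print(labels)
```

In this toy gene the second exonic region contains two different exons, which receive the labels 2a and 2b, while the exons in the other two regions are unique and get plain integers.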
The Xena Gene Sets Viewer https://xenagoweb.xenahubs.net/xena compares gene expression, somatic mutation, and copy number variation profiles of cancer-related gene sets across cancer cohorts. It queries genomics data hosted on public Xena Hubs, in a similar way to other tools in the Xena Visualization suite, and then generates gene set visualizations of those data.
Source code:
The Gene Set Viewer allows comparison of individual gene sets or pathways and their genes across two cancer tumor sample cohorts as well as comparison within the same sub cohorts.
As an overview, Figure 1 shows two cohorts, the left (olive background, TCGA Ovarian Cancer) and the right (tan background, TCGA Prostate Cancer). Figure 1A shows the selection for the analysis, Gene Set, view limit, and filter (differential versus similar). Figure 1B shows the view comparing the Mean Gene Set Score in the center and individual samples on the right. 1C shows the individual samples, with the hover result showing the sample and score in 1E. 1D provides a link directly into Xena for the given gene set. 1F provides a sharable URL link. 1G provides a login for use in uploading.
Figure 8 shows analysis of a GMT file using the BPA method [citation: thanks to Verena Friedl]. This is only available to logged-in users, who may only see their own analyses and are limited to 100 pathways. Any valid Google login can be used. Several public pathway sets are available, including those curated from the Gene Ontology Consortium (thanks to Laurent-Philippe Albou) as well as those from the Hallmark [cite] and Pancan [cite] analyses.
BPA GENE EXPRESSION
PARADIGM IPL
REGULON ACTIVITY (only available for the LUAD Cohort)
CNV ∩ MUTATION
COPY NUMBER
MUTATION