Step-by-step instructions for viewing your own data
Get started viewing your own data:
We support most types of genomic and/or phenotypic/clinical/annotation data. Genomic data needs to be values called on genes, transcripts, exons, probes or some other identifier. Phenotypic/clinical/annotation data can be almost anything, including patient data (e.g. age, sex, etc.), clinical data (e.g. survival data for a KM plot), and other data such as gene fusion calls, regulon activity, immune scores, and more. Samples can be bulk tissue, cell lines, cells, and more. We do not visualize raw data such as FASTQs or BAMs.
Data can be your own or from another source, like GEO or a publication.
We support tab-delimited (.tsv and .txt) and Microsoft Excel files (.xlsx and .xls). Data on a Local Xena Hub can only be viewed or accessed by the same computer on which it is running, keeping private data secure.
The Local Xena Hub must be installed and running in order to load data, as well as any time you want to view data. The Local Xena Hub will remember previously loaded data.
Please use Chrome to view your own data.
Click on VIEW MY DATA. You will be prompted to download and install a local Xena Hub.
Double click on the download to begin the installation of the Xena Hub. Follow the wizard to finish the install.
Mac: OSX 10.7 and above
Windows: 64-bit
Linux: ability to run a .jar file
After installing a local Xena Hub, go back to VIEW MY DATA to auto-start the Hub. If it does not automatically start, refresh the page or double click on the Xena Hub application on your computer. The Xena Hub application should be in your Applications folder for Mac and Windows. Note that it will take up to one minute to start up.
Most people load data into their Local Xena Hub through our website wizard, which leads you through the loading process step by step. Note that you will want to make sure your data is properly formatted ahead of time.
You can also load data via the command line.
Click on VISUALIZATION. If your study is not already selected as step 1 of the wizard, then select it from the drop down and click 'Done'. Note that if you did not enter a study name your data will be under 'My Study'.
When you loaded your genomic data we asked what type of genes, transcripts or probes you used. If you selected one of the options from the drop down menu then you can enter HUGO gene names or the identifiers in your file. If you did not select one of the options then you will need to enter the identifiers as they appear in your file.
Your Local Xena Hub must be running to view any data that you have loaded into it. Please ensure it is running on your computer. You can also check which studies are on your hub and what data is in them by going to the My Computer Hub page.
Xena does not utilize a central rendering service, or require hubs to be publicly accessible on the internet like, for example, the UCSC Genome Browser does. Data flows in one direction, from hubs to the user agent. If the user installs a Xena Hub on their laptop, the hub is as secure as the laptop. If the user installs a Xena Hub on a local network, behind a firewall, the hub is as secure as the local network.
The Xena Browser accesses data from a local Xena Hub on the same computer by requesting data from http://127.0.0.1. The local Xena Hub will make the data within it available at this address. The local Xena Hub will only answer requests made from the user's own computer.
Users will need to use a web browser that supports this if they wish to use a Xena Hub on the loopback interface. At the time of writing, this includes Chrome and Firefox, but not Safari.
A very limited set of metadata is considered to be not secure in the Xena architecture model. This includes cohort names and sample names. This metadata is visible to other hubs in the following scenarios. When the user selects a cohort, all hubs are queried for samples in that cohort. When the user selects a data field, the hub holding that field is queried with the field ID (e.g. gene, probe, transcript, phenotype) and all cohort sample IDs. This means, for example, that two hubs holding data on the same cohort will see the union of sample IDs from that cohort. While data queries are not made available publicly, a malicious person could gain entry to a Xena Hub and comb through logs for these queries. For these reasons, these metadata fields should not contain private information.
To visualize and perform a KM analysis, we use two columns/rows of data, time to event and event. These data must be loaded in a phenotype file. The phenotype file can contain other data as well.
Note that you will need to name the headers in your phenotype file EXACTLY what we recognize. See the list of recognized headers for each type of survival/interval below.
This data can be in days, months, years, etc.
Time to Event is a duration variable for each subject having a beginning and an end anywhere along the timeline of the complete study. It begins when the subject is enrolled into a study or when treatment begins, and ends when the end-point (event of interest, for example, death or metastasis) is reached or the subject is censored from the study.
Censoring means the total survival time for that subject cannot be accurately determined. This can happen when something negative for the study occurs, such as the subject drops out, is lost to follow-up, or the required data is not available or, conversely, something good happens, such as the study ends before the subject had the event of interest occur, i.e., they survived at least until the end of the study, but there is no knowledge of what happened thereafter.
Event indicates what the 'event' was for a patient, 1 for the event, for example, death or metastasis, and 0 for censored.
Help text was partially taken from A PRACTICAL GUIDE TO UNDERSTANDING KAPLAN-MEIER CURVES.
Below is a table of the column/row header names we recognize for each type. Note that these header names are case sensitive.
Survival Type                     'Time to Event' Header name  'Event' Header name
Overall Survival                  OS.time                      OS
Disease free interval             DFI.time                     DFI
Disease specific survival         DSS.time                     DSS
Progression free interval         PFI.time                     PFI
Local recurrence interval         LRI.time                     LRI
Distant metastasis interval       DMI.time                     DMI
Distant disease free survival     DDFS.time                    DDFS
Invasive disease free survival    IDFS.time                    IDFS
Regional recurrence               RR.time                      RR
Relapse                           Relapse.time                 Relapse
Metastasis                        Metastasis.time              Metastasis
Distant recurrence interval       DRI.time                     DRI
Distant metastasis free survival  DMFS.time                    DMFS
sample           OS  OS.time
TCGA-AB-1234-01  0   100
TCGA-AB-6789-01  1   200
TCGA-CD-1234-01  0   300
TCGA-CD-5678-01  1   400
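Before loading, it can help to check that a phenotype file uses the recognized header names with the exact case. The following is a minimal sketch and not part of the Xena software. It assumes a tab-delimited file with samples as rows, and it assumes that a blank event value means missing data; the function name find_km_pairs is hypothetical.

```python
import csv
import io

# Header pairs from the table above: time-to-event header -> event header.
# Matching is case sensitive, as the text notes.
KM_HEADERS = {
    "OS.time": "OS", "DFI.time": "DFI", "DSS.time": "DSS", "PFI.time": "PFI",
    "LRI.time": "LRI", "DMI.time": "DMI", "DDFS.time": "DDFS",
    "IDFS.time": "IDFS", "RR.time": "RR", "Relapse.time": "Relapse",
    "Metastasis.time": "Metastasis", "DRI.time": "DRI", "DMFS.time": "DMFS",
}


def find_km_pairs(tsv_text):
    """Return the (time, event) header pairs present in a phenotype file.

    Raises ValueError if an event value is not 0, 1 or blank
    (treating blank as missing is an assumption of this sketch).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    headers = reader.fieldnames or []
    pairs = [(t, e) for t, e in KM_HEADERS.items()
             if t in headers and e in headers]
    for row in reader:
        for _, event in pairs:
            if row[event] not in ("0", "1", ""):
                raise ValueError(
                    f"{row[headers[0]]}: {event} must be 0 or 1")
    return pairs
```

An empty result means no complete pair of recognized headers was found, which is the usual reason a KM plot is unavailable for a loaded phenotype file.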
We will automatically detect and map your probes/transcripts/identifiers to HUGO gene names. For instance, we will map Affy probe IDs to HUGO gene names so that you can enter a HUGO gene name when creating a column in the Visual Spreadsheet and we will pull up the corresponding Affy probes.
You can still load your data if you do not see your identifiers listed. We will just not map them to HUGO genes for you. This means that in the visualization you will need to enter your identifiers as they appear in your file.
Affy U133 array (hg19) e.g. 1007_s_at
Affy HumanExon1.0ST (hg18) e.g. 2315101
Affy Human Gene 1.0 ST array (hg19) e.g. 7896736
Affy Human SNP6 array (hg18) e.g. CN_473963
Agilent Human gene expression 4X44K array (hg18) e.g. A_23_P100001
Agilent SurePrint G3 Human CGH array 2x400K (hg18) e.g. A_16_P01651995
Agilent Human 1A array (hg18) e.g. A_23_P149050
Exon: GENCODE 19 e.g. ENSE00000327880.1
Infinium HumanMethylation27 array GDC version (hg38) e.g. cg00000292
Infinium HumanMethylation27 array TCGA legacy version (hg18) e.g. cg26211698
Infinium HumanMethylation450 array TCGA legacy version (hg19) e.g. cg13332474
Infinium HumanMethylation450 array GDC version (hg38) e.g. cg00000029
HUGO: human gene symbol (hg18) e.g. TP53
HUGO: human gene symbol (hg19) e.g. TP53
HUGO: human gene symbol (hg38) e.g. TP53
Gene: Ensembl human genes (hg19) e.g. ENSG00000223972
Gene: Ensembl human genes (hg38) e.g. ENSG00000223972
Gene: GENCODE 19 e.g. ENSG00000223972.4
Gene: GENCODE 22 comprehensive e.g. ENSG00000223972.5
Gene: GENCODE 23 comprehensive e.g. ENSG00000223972.5
Gene: GENCODE 23 basic e.g. ENSG00000223972.5
Gene: UCSC Known genes (hg18) e.g. uc001aaa.1
Gene: UCSC Known Genes (hg19) e.g. uc001aaa.1
Transcript: GENCODE 19 comprehensive e.g. ENST00000456328.2
Transcript: GENCODE 23 comprehensive e.g. ENST00000456328.2
Transcript: GENCODE 23 basic e.g. ENST00000456328.2
Transcript: RefSeq (hg19) e.g. NM_000014
miRNA miRBase v13 stem-loop (hg18) e.g. hsa-mir-1977
miRNA miRBase v20 stem-loop (hg19) e.g. hsa-mir-1302-2
Contact us if you don't see your gene or probe names in this list and we may be able to add them for you.
If it looks like we picked the wrong set of probes, please click 'Advanced' next to the 'Import' button on the last screen of the wizard to load data. You can then pick the appropriate probes.
In addition to the data itself, we require some metadata about your file. When you use our website to load your data we fill in this metadata for you. When you use the command line, you will need to provide this data in an additional file.
The metadata file is a .json file and follows json formatting. The metadata .json file needs to be in the same directory as the data file. The metadata file and the data file need to have the same base name, including any file extensions (e.g. my_first_dataset and my_first_dataset.json OR my_second_dataset.txt and my_second_dataset.txt.json).
There are two required fields: type and cohort.
Type can be:
'genomicMatrix' -> genomic data where samples are columns and genomic regions are rows. Note that for loading on the command line we do not support the other orientation.
'clinicalMatrix' -> phenotypic data where samples are rows and phenotypic fields are columns. Note that for loading on the command line we do not support the other orientation.
'mutationVector' -> mutation data
'genomicSegment' -> segmented copy number data
Cohort is used to know if there are other data on the samples that you are loading. You can either specify a pre-existing cohort or create your own. Cohort names are displayed on the dataset pages and the cohort drop down menu on the Heatmaps page.
For existing cohorts, you need to enter the cohort name EXACTLY as it appears as the existing cohort name. Note that our cohort names are case sensitive.
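The naming rule and the two required fields can be sketched as follows. This is only an illustration under stated assumptions: the helper names metadata_path, build_metadata and write_metadata are hypothetical, and only the "type" and "cohort" fields described above are written.

```python
import json
from pathlib import Path

# The four type values listed above.
VALID_TYPES = {"genomicMatrix", "clinicalMatrix",
               "mutationVector", "genomicSegment"}


def metadata_path(data_file):
    """Same directory, same full name (extension included), plus '.json'."""
    data_path = Path(data_file)
    return data_path.with_name(data_path.name + ".json")


def build_metadata(data_type, cohort):
    """Return the JSON text holding the two required fields."""
    if data_type not in VALID_TYPES:
        raise ValueError(f"unknown type: {data_type}")
    return json.dumps({"type": data_type, "cohort": cohort}, indent=2) + "\n"


def write_metadata(data_file, data_type, cohort):
    """Write the metadata file next to the data file and return its path."""
    path = metadata_path(data_file)
    path.write_text(build_metadata(data_type, cohort))
    return path
```

For example, write_metadata("my_second_dataset.txt", "genomicMatrix", "My Study") would create my_second_dataset.txt.json in the same directory. The cohort name is passed through unchanged because existing cohort names are case sensitive.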
If you are loading a mutation or segmented copy number file you will also need to specify the reference genome. You do not need to specify this for other file types
If you are loading a file that has probes, transcripts, or exons and you would like to query your data by gene, you will need to provide a mapping file. You do not need to specify this for other file types.
Here is an example probemap file (a delimited file): https://toil.xenahubs.net/download/probeMap/gencode.v23.annotation.gene.probemap
We have many probemap files that you can see via our xenaPython app.
If you do not see a probemap that will work for you, please let us know.
To reference a probemap you need to do three things:
Include the probemap reference in your data file .json
Have the probemap file in the same directory as your data file and data file .json
Also have a .json file for the probemap so that we know how to load it
Note that to reference a probemap you need to load the probemap first, then load the data file.
https://toil.xenahubs.net/download/probeMap/gencode.v23.annotation.gene.probemap
Put both your .tsv and .json files in your_home_directory/xena/files. Then run the jar, passing in the file name, like so:
→ loads all files
OR
→ loads just file1.tsv
Note that you will need to substitute the name of the .jar file. As of the time of writing (September 20, 2018), the name of the .jar file was cavm-0.22.0-standalone.jar. On Linux this will be in the directory where you opened the archive. On Windows or MacOS, use your operating system’s file search capability to search for cavm*jar. On Windows you will need to use the full path to your home directory, instead of “~”.
Note you do not need to load the .json files. Xena will automatically look for these and load them.
→ delete just file1.tsv
→ delete file1.tsv and file2.tsv
You can always type:
for help.
There are a couple of options. You can right-click the .dmg and choose 'Open'. You can also press the Control key, then click the app icon, then choose Open from the shortcut menu. These help pages might help: http://www.iclarified.com:8081/28180/how-to-open-applications-from-unidentified-developers-in-mac-os-x-mountain-lion and from Apple: https://support.apple.com/kb/PH25088?locale=en_US .
The only time the assembly matters is if you decide to visualize part or all of a chromosome, rather than a gene/probe/transcript. If you want to visualize only genes/probes/transcripts then it does not matter which assembly you choose.
Your Local Xena Hub must be running to view any data that you have loaded into it. Please ensure it is started up. You can also check which studies are on your hub and what data is in them by going to the My Computer Hub page: xenabrowser.net/datapages/?host=https%3A%2F%2Flocal.xena.ucsc.edu%3A7223.
You also may not see your study if the hub is still loading the data. Wait a few minutes and refresh the page.
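One quick way to tell whether the hub is up is to check that something is listening on its port on the loopback interface. The sketch below is an illustration only. It assumes the hub's HTTP port is 7222, the default mentioned elsewhere in this document, and the function name hub_is_listening is hypothetical.

```python
import socket


def hub_is_listening(host="127.0.0.1", port=7222, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    This only shows that some process accepts connections on the port;
    it does not confirm that the process is a Xena Hub.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False, start the Xena Hub application and allow it up to a minute to start before retrying.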
When you loaded your genomic data we asked what type of genes, transcripts or probes you used. If you selected one of the options from the drop down menu then you can enter HUGO gene names or the identifiers in your file. If you did not select one of the options then you will need to enter the identifiers as they appear in your file.
Yes, we will allow you to select phenotypes from both files in the visualization.
You might be able to load your file anyways, depending on the format. Give it a try and if you are unable to load it, write us an email and we may be able to fix your file for you.
You can export a Microsoft Excel file as a tab-delimited file using the 'Save as ...' function.
We require that your data files have a unix line ending. To ensure that your files have this line ending on a DOS, please follow the help here:
Note that this requirement is only for data files, not for the associated .json files.
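If no conversion tool is at hand, line endings can also be normalized with a few lines of Python. This is a minimal sketch rather than an official procedure, and the helper names are hypothetical.

```python
from pathlib import Path


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert DOS (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(path):
    """Rewrite a data file in place with Unix line endings."""
    p = Path(path)
    p.write_bytes(to_unix_line_endings(p.read_bytes()))
```

Working on bytes avoids any assumption about the text encoding of the file. Keep a copy of the original file, since convert_file overwrites it.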
There are 2 basic data formats and 2 advanced data formats. Each of these formats has one or more biological data types that it supports.
We support most types of genomic and phenotype/clinical/sample annotations. For genomic data we support calls made on the raw data including but not limited to expression calls, mutation calls, etc. This is what TCGA calls ‘Level 3’ data and is typically a value on gene, transcript, probe, etc. We do not support FASTQ, BAMs, or other ‘raw’ files. Please contact us if you have any questions.
We support tab-delimited and Microsoft Excel files (.xlsx and .xls). Tab-delimited files generally have a file name ending in .tsv or .txt, though we do not require this. Note that we load tab-delimited files much faster than Excel files. You can export a Microsoft Excel file as a tab-delimited file using the 'Save as ...' function.
Please do not have any duplicate genes/probes/identifiers or samples. We will allow you to load with duplicates but will only display the first one encountered in the file.
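Because only the first of any duplicates is displayed, it is worth checking a file before loading it. The sketch below assumes a tab-delimited matrix whose first row and first column hold the sample names and identifiers, in either orientation. The function name find_duplicates is hypothetical.

```python
import csv
import io
from collections import Counter


def find_duplicates(tsv_text):
    """Return (duplicated first-column labels, duplicated header labels)."""
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    if not rows:
        return [], []
    header, body = rows[0], rows[1:]
    # Skip the top-left cell, which only labels the first column.
    header_counts = Counter(header[1:])
    row_counts = Counter(r[0] for r in body)
    dup_rows = [name for name, n in row_counts.items() if n > 1]
    dup_cols = [name for name, n in header_counts.items() if n > 1]
    return dup_rows, dup_cols
```

Both lists being empty means no duplicate genes/probes/identifiers or samples were found.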
We assume you use a '.' to indicate a decimal place as opposed to a ',' .
Here is a with example data in addition to the examples below.
These are numeric data called on genomic regions (e.g. exon expression or gene-level copy number). This data is in a rectangle where samples are columns and rows are the genomic regions (e.g. HUGO gene symbol, transcript ID, probe ID, etc). We also support samples as rows and genomic regions as the columns (i.e. the opposite orientation). For supported genomic regions, please see .
RNA-seq expression (exon, transcript, gene, etc)
Array-based expression (probe, gene, etc)
Gene-level mutation
Gene-level copy number
DNA methylation
RPPA
and more ...
For samples that do not have expression for a particular gene, either have a blank field or use "NA".
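To illustrate how a genomic matrix with missing values might be read, here is a minimal sketch. It assumes samples as columns and genomic regions as rows, treats a blank field or "NA" as missing, and keeps only the first row for a repeated identifier. The function name read_genomic_matrix is hypothetical and this is not how Xena itself parses files.

```python
import csv
import io


def read_genomic_matrix(tsv_text):
    """Return {identifier: {sample: value or None}} from tab-delimited text."""
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    samples = rows[0][1:]
    matrix = {}
    for row in rows[1:]:
        identifier = row[0]
        if identifier in matrix:
            continue  # keep the first occurrence of a duplicated identifier
        matrix[identifier] = {
            sample: None if value in ("", "NA") else float(value)
            for sample, value in zip(samples, row[1:])
        }
    return matrix
```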
An example of a genomic matrix file (in this case, expression):
These are data on a sample or patient that are categorical in nature (e.g. Tumor Stage, or 'wild type' or 'mutant' for a gene) or numerical but non-genomic (e.g. age or a genomic signature). Samples can be columns with phenotype/clinical/sample annotations as rows, or vice versa. We support both orientations.
phenotype/patient/clinical data (age, weight, if there was blood drawn, etc)
sample/aliquot data (where it was sequenced, tumor weight, etc)
derived data (regulon activity for a gene, etc)
genomic signatures (EMT signature score, stemness score, etc)
other (whether a sample has an ERG-TMPRSS2 fusion, whether a sample has WGS data available, etc)
We support both numerical and categorical data. For numerical data please use a blank field for any samples which may be missing data. For categorical data you can use a blank field or "NA" for any samples which may be missing data.
Note that if you use "NA" for a missing numerical field then the Xena software will automatically treat that column as a category.
To have it be treated as a numerical field please use a blank field.
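A file that uses "NA" throughout can be cleaned up before loading. The sketch below blanks out "NA" only in columns whose other values all parse as numbers and leaves categorical columns alone. It assumes a rectangular tab-delimited file with samples as rows, and the function name is hypothetical.

```python
import csv
import io


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def blank_out_na_in_numeric_columns(tsv_text):
    """Replace "NA" with a blank field in numeric columns only."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    for col in range(1, len(header)):  # column 0 holds the sample names
        present = [r[col] for r in body if r[col] not in ("", "NA")]
        if present and all(_is_number(v) for v in present):
            for r in body:
                if r[col] == "NA":
                    r[col] = ""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows([header] + body)
    return out.getvalue()
```

The output uses Unix line endings, so it is also suitable for command line loading.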
An example of a phenotype matrix file:
For segmented data, we require the following 5 columns: sample, chr, start, end, and value. Note that your column headers must be these names exactly!
Please use 'NA' to indicate no data.
copy number
We currently accept hg38, hg19, hg18 coordinates.
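The column requirements for segmented data can be checked with a short script. This is a sketch under stated assumptions: the file is tab-delimited, start and end are expected to be non-negative integers (an assumption, since the text only states this for mutation data), and value is either a number or 'NA'. The function name check_segment_file is hypothetical.

```python
import csv
import io

# The five required column names, which must match exactly.
SEGMENT_COLUMNS = ["sample", "chr", "start", "end", "value"]


def check_segment_file(tsv_text):
    """Return a list of problems; an empty list means the checks passed."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    headers = reader.fieldnames or []
    problems = [f"missing column: {c}"
                for c in SEGMENT_COLUMNS if c not in headers]
    if problems:
        return problems
    for line_no, row in enumerate(reader, start=2):
        start, end = row.get("start") or "", row.get("end") or ""
        if not (start.isdigit() and end.isdigit()):
            problems.append(f"line {line_no}: start and end must be integers")
        value = row.get("value") or ""
        if value != "NA":
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: value must be a number or NA")
    return problems
```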
Example segmented copy number data with required columns:
For positional data, we require 6 columns: sample, chr, start, end, reference, alt. Note that your column headers must be these names exactly!
Note that Xena will not call the gene, variant effect, etc for you. All gene annotation information must be included in the file
mutation data
We currently accept hg38, hg19, hg18 coordinates.
Example mutation data with the six required columns, plus the gene column:
To specify that a sample was assayed but no mutation was detected, you need a line in the file with three columns filled: sample, start, end. "start" and "end" are required to be integers (if left empty, the data loader will reject the file), so use -1 to indicate that these are bogus coordinates. The rest of the columns are empty strings.
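Such a placeholder line can be generated as in the sketch below, which fills sample, start and end and leaves every other column empty. The function name no_mutation_row is hypothetical, and the column list should match the header of your own file.

```python
# The six required columns; append optional ones such as "gene" if your
# file has them, so the row has the same number of fields as the header.
MUTATION_COLUMNS = ["sample", "chr", "start", "end", "reference", "alt"]


def no_mutation_row(sample, columns=MUTATION_COLUMNS):
    """Return a tab-delimited line marking a sample as assayed, no mutation."""
    values = {"sample": sample, "start": "-1", "end": "-1"}
    return "\t".join(values.get(column, "") for column in columns)
```

For a file that also has a gene column, pass columns=MUTATION_COLUMNS + ["gene"] so the placeholder row stays aligned with the header.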
if you're unsure if we will support your data
This is our most flexible data type. If you are wondering if your data is considered to be 'phenotypic' please contact us.
For more information about configuring your phenotype fields, such as controlling the order for categorical features, please see our .
Other columns that may follow are: gene, effect, DNA_VAF, RNA_VAF, and Amino_Acid_Change. These other columns are not required but will enhance the visualization of this data. For example, the "gene" column enables displaying mutations when querying by gene name in addition to querying by genomic coordinates. The "effect" column will color the mutations by effect (the default color is gray). The effect terms include "Nonsense" (color red), "Frameshift" (red), "Splice" (orange), "missense" (blue), "Silent" (green), etc. The full list of accepted terms can be found .
We support a number of other specialty data types such as structural variants. Please contact us if you have this data so we can help you load it.
Sample  TCGA-BA-4074-01  TCGA-BA-4075-01  TCGA-BA-4076-01
ACAP3   0.137            NA               0.022
CTRT2   0.024            0.805            0.256
ALK     0.098            0.805            1.87
sample           ER_status  disease_status        age
TCGA-BA-4074-01  positive   complete remission    63
TCGA-BA-4082-01  positive   complete remission    54
TCGA-BA-4078-01  negative   undergoing treatment  65
sample           chr   start     end        value
TCGA-V4-A9EL-01  chr1  61735     16815530   0.041
TCGA-V4-A9EL-01  chr1  16816090  17190862   -0.4227
TCGA-V4-A9EF-01  chr4  86979944  115173700  0.0414
sample           chr   start      end        reference  alt  gene
TCGA-AB-2802-03  chr2  29917721   29917721   G          A    ALK
TCGA-AB-2802-03  chr1  119270684  119270687  TTAAA      T    MYC
TCGA-AB-2867-03  chr1  150324146  150324146  T          G    PRPF3
Institutional Xena Hubs allow you to share data, visualizations, and analyses with a specific group of people. Xena Hubs can be set up on any server or in the cloud. You control who has access to the Xena Hub by controlling who has access to the server on which it is hosted.
To make your data publicly available, simply make the server open to the web.
First, download the ucsc_xena_xxx.tar.gz file to your server, here:
The file to download is the one called "Tar archive, no updater or JRE - recommended for linux server developments". Uncompress and extract the .jar file (cavm-xxx-standalone.jar). The current version is 0.25.0.
The hub can be started with "java -jar cavm-xxx-standalone.jar". Passing option --help will display usage information.
Note that you need to use Java 8 to run the hub.
There are several options you will want to set.
To bind an external interface (instead of loopback), use "--host 0.0.0.0".
The connection between your hub and the Xena Browser is through HTTPS. Use the "--certfile" and "--keyfile" options to set the certificate and private key.
There are three paths that can be configured: the database file, the log file, and the root directory for data files to be served. These are set by --database, --logfile, and --root. If you don't set these, they will default to paths under ${HOME}/xena.
Copy the content below to a file "start_script"
Link server.jar to cavm-x.xx.x-standalone.jar
Make "start_script" executable
Run "./start_script"
Your hub is now running on "https://computer-external-ip:7223".
When a Xena Hub starts, it opens two consecutive ports, for http and https connections, e.g. 7222 and 7223. HTTP is always the lower number, and HTTPS is always the higher number. This means your hub has two urls
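The port rule can be expressed as a small helper: given the HTTP port, the HTTPS port is the next number up. This is only a sketch. The function name hub_urls is hypothetical, and computer-external-ip stands in for your server's address.

```python
def hub_urls(host, http_port=7222):
    """Return the (HTTP URL, HTTPS URL) pair for a hub.

    The hub opens two consecutive ports: HTTP on the lower number and
    HTTPS on the higher one.
    """
    return (f"http://{host}:{http_port}", f"https://{host}:{http_port + 1}")


# Example with the default ports mentioned in this document.
http_url, https_url = hub_urls("computer-external-ip")
```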
Connecting via HTTP to the hub is no longer supported by modern web browsers, thus you will need to connect via HTTPS. To do this you will need an HTTPS certificate and private key. Paths to the cert and key are set with --certfile and --keyfile. This might seem redundant for a hub behind a firewall, but the web app has no influence over the security policies of the web browser. HTTPS certificates can be acquired from free public Certificate Authorities, or via NIH InCommon.
Once the hub is running, and input files have been placed in the --root directory, a file can be loaded by running the jar a second time, with the -l option, like
If your hub is run on the default 7222 port, you can load data with
If your hub is running on a different port, you load data with
Please contact us at genome-cancer@soe.ucsc.edu for more assistance.
If your hub is run on the default 7222 port, you can delete data with
If your hub is running on a different port, you delete data with
You can now go to the visualization and add a cohort or study listed in your hub.
If you don't have a security certificate yet but you would like to verify that the hub is working you can use ssh tunneling. An example of how to do this for AWS is below, where it is assumed that the xena hub is running on port 7222 for http and 7223 for https. In this scenario, you start the hub without using --certfile and --keyfile options.
Assuming that you typically ssh into EC2 on AWS like this,
you will now set up an ssh tunnel to port 8000 on your computer. To do this we add the -L option:
Now on your computer, http://localhost:8000 is the same as http://aws-ip:7222. The Chrome browser does not allow a connection to http://aws-ip:7222, but it will allow a connection to http://localhost:8000.
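The tunnel described above maps local port 8000 to port 7222 on the remote machine using the ssh -L option. The sketch below only builds the argument list for such a command and does not run it. The login ec2-user@aws-ip and the function name tunnel_command are hypothetical, and any identity-file or other options you normally pass to ssh would go in extra_args.

```python
def tunnel_command(remote_login, local_port=8000, hub_port=7222, extra_args=()):
    """Build an ssh command that forwards local_port to the hub's HTTP port.

    With -L, "localhost" in the forwarding spec is resolved on the remote
    machine, so it refers to the server running the hub.
    """
    forward = f"{local_port}:localhost:{hub_port}"
    return ["ssh", *extra_args, "-L", forward, remote_login]


# Hypothetical login; replace with the one you normally use for ssh.
command = tunnel_command("ec2-user@aws-ip")
```

The resulting list can be passed to subprocess.run, or joined with spaces and typed in a terminal.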
An example apache configuration on AWS VM
in /etc/httpd/conf/httpd.conf
<button class="hubButton" data-cohort="TCGA TARGET GTEx">Launch Xena</button>
To add a clickable button in the hub landing page, make sure the button has classname 'hubButton'. You also need to specify the cohort to view, defined by the data parameter 'data-cohort'. Once users click the button, the visualization wizard will be launched to the specified cohort. You can change the button label.
<button class="cohortButton" data-bookmark="bc7f3f46b042bcf5c099439c2816ff01">Example: compare FOXM1 expression</button>
The button must have the classname 'cohortButton'. If you have the data parameter 'data-bookmark', clicking the button will take the user to the bookmark view. If you don't have the 'data-bookmark' parameter, clicking the button will take the user to the visualization wizard with an empty spreadsheet. You can change the button label. You can add as many buttons as you want.
or
Note that , which can be used for testing purposes only.
You will need to make your data file ready just like for local Xena hub on your laptop. Please see instructions on .
You will also need to make your data's meta-data file (xxx.json) ready. Please see for instructions.
Go to Data Hub page , add "https://computer-external-ip:7223"
After setting up the ssh tunnel go to Data Hub page , add "http://localhost:8000".
Alternatively, you can run the hub behind a reverse proxy, and attach the certificate and keyfile to Apache, Nginx or AWS load balancer configurations. In this scenario, you start the hub without using the --certfile and --keyfile options. This is useful if you want your hub to have a url like "". Set up your DNS to point the hostname (e.g. tcga.xenahubs.net) to the IP address of the server on which the hub is running.
If you have a markdown file called $DOCROOT/meta/info.mdown in your hub's document root directory, the markdown file will serve as a splash page for your hub. An example is the UCSC Toil RNA-seq Recompute hub: . The corresponding markdown file is .
You can also have a landing page for a study cohort. An example is the TCGA TARGET GTEx cohort: . The corresponding markdown file is . The study cohort landing page is also a markdown file, which must be hosted in the repository on github. The markdown file is called https://github.com/ucscXena/cohortMetaData/cohort_$cohortName/info.mdown.