
Viewing your own data

Getting Started

Step-by-step instructions for viewing your own data

Overview

Get started viewing your own data:

Install a Local Xena Hub on your computer
Start the Local Xena Hub
Load the data you want to view
View the data

We support most types of genomic and/or phenotypic/clinical/annotation data. Genomic data needs to be values called on genes, transcripts, exons, probes or some other identifier. Phenotypic/clinical/annotation data can be almost anything, including patient data (e.g. age, sex, etc.), clinical data (survival data for a KM plot), and other data such as gene fusion calls, regulon activity, immune scores, and more. Samples can be bulk tissue, cell lines, cells, and more. We do not visualize raw data such as FASTQs or BAMs.

Data can be your own or from another source, like GEO or a publication.

We support tab-delimited (.tsv and .txt) and Microsoft Excel files (.xlsx and .xls). Data on a Local Xena Hub can only be viewed or accessed by the same computer on which it is running, keeping private data secure.

The Local Xena Hub must be installed and running in order to load data, as well as any time you want to view data. The Local Xena Hub will remember previously loaded data.

Please use Chrome to view your own data.

Installing a Local Xena Hub

Click on VIEW MY DATA. You will be prompted to download and install a local Xena Hub.

Double click on the download to begin the installation of the Xena Hub. Follow the wizard to finish the install.

System requirements for Xena Hub

Mac: OSX 10.7 and above
Windows: 64-bit
Linux: ability to run a .jar file

Starting/running a Local Xena Hub

After installing a local Xena Hub, go back to VIEW MY DATA to auto-start the Hub. If it does not automatically start, refresh the page or double click on the Xena Hub application on your computer. The Xena Hub application should be in your Applications folder for Mac and Windows. Note that it will take up to one minute to start up.

Loading data into a Local Xena Hub

Most people load data into their Local Xena Hub through our website wizard, which leads you through the loading process step by step. Note that you will want to make sure your data is properly formatted ahead of time.

You can also load data via the command line.

Viewing data from a Local Xena Hub

Click on VISUALIZATION. If your study is not already selected as step 1 of the wizard, then select it from the drop down and click 'Done'. Note that if you did not enter a study name your data will be under 'My Study'.

Gene names and identifiers for genomic data

When you loaded your genomic data we asked what type of genes, transcripts or probes you used. If you selected one of the options from the drop down menu then you can enter HUGO gene names or the identifiers in your file. If you did not select one of the options then you will need to enter the identifiers as they appear in your file.

Help! I don't see my study listed

Your Local Xena Hub must be running to view any data that you have loaded into it. Please ensure it is running on your computer. You can also check which studies are on your hub and what data is in them by going to the My Computer Hub page.

Data security

How does Xena ensure the security of my data?

Xena does not utilize a central rendering service, or require hubs to be publicly accessible on the internet like, for example, the UCSC Genome Browser does. Data flows in one direction, from hubs to the user agent. If the user installs a Xena Hub on their laptop, the hub is as secure as the laptop. If the user installs a Xena Hub on a local network, behind a firewall, the hub is as secure as the local network.

The Xena Browser accesses data from a local Xena Hub on the same computer by requesting data from http://127.0.0.1. The local Xena Hub will make the data within it available at this address. The local Xena Hub will only answer requests made from the user's own computer.

Users will need to use a web browser that supports this if they wish to use a Xena Hub on the loopback interface. At the time of writing, this includes Chrome, and Firefox, but not Safari.

Is there any data that is considered to not be secure?

A very limited set of metadata is considered to be not secure in the Xena architecture model. This includes cohort names and sample names. This metadata is visible to other hubs in the following scenarios. When the user selects a cohort, all hubs are queried for samples in that cohort. When the user selects a data field, the hub holding that field is queried with the field ID (e.g. gene, probe, transcript, phenotype) and all cohort sample IDs. This means, for example, that two hubs holding data on the same cohort will see the union of sample IDs from that cohort. While data queries are not made available publicly, a malicious person could gain entry to a Xena Hub and comb through logs for these queries. For these reasons, these metadata fields should not contain private information.

Probes/transcripts/identifiers we recognize

We will automatically detect and map your probes/transcripts/identifiers to HUGO gene names. For instance, we will map Affy probe IDs to HUGO gene names so that you can enter a HUGO gene name when creating a column in the Visual Spreadsheet and we will pull up the corresponding Affy probes.

You can still load your data if you do not see your identifiers listed. We will just not map them to HUGO genes for you. This means that in the visualization you will need to enter your identifiers as they appear in your file.

Supported probes and other identifiers

Affy U133 array (hg19) e.g. 1007_s_at
Affy HumanExon1.0ST (hg18) e.g. 2315101
Affy Human Gene 1.0 ST array (hg19) e.g. 7896736
Affy Human SNP6 array (hg18) e.g. CN_473963
Agilent Human gene expression 4X44K array (hg18) e.g. A_23_P100001
Agilent SurePrint G3 Human CGH array 2x400K (hg18) e.g. A_16_P01651995
Agilent Human 1A array (hg18) e.g. A_23_P149050
Exon: GENCODE 19 e.g. ENSE00000327880.1
Infinium HumanMethylation27 array GDC version (hg38) e.g. cg00000292
Infinium HumanMethylation27 array TCGA legacy version (hg18) e.g. cg26211698
Infinium HumanMethylation450 array TCGA legacy version (hg19) e.g. cg13332474
Infinium HumanMethylation450 array GDC version (hg38) e.g. cg00000029

Supported genes and transcripts

HUGO: human gene symbol (hg18) e.g. TP53
HUGO: human gene symbol (hg19) e.g. TP53
HUGO: human gene symbol (hg38) e.g. TP53
Gene: Ensembl human genes (hg19) e.g. ENSG00000223972
Gene: Ensembl human genes (hg38) e.g. ENSG00000223972
Gene: GENCODE 19 e.g. ENSG00000223972.4
Gene: GENCODE 22 comprehensive e.g. ENSG00000223972.5
Gene: GENCODE 23 comprehensive e.g. ENSG00000223972.5
Gene: GENCODE 23 basic e.g. ENSG00000223972.5
Gene: UCSC Known genes (hg18) e.g. uc001aaa.1
Gene: UCSC Known Genes (hg19) e.g. uc001aaa.1
Transcript: GENCODE 19 comprehensive e.g. ENST00000456328.2
Transcript: GENCODE 23 comprehensive e.g. ENST00000456328.2
Transcript: GENCODE 23 basic e.g. ENST00000456328.2
Transcript: RefSeq (hg19) e.g. NM_000014
miRNA miRBase v13 stem-loop (hg18) e.g. hsa-mir-1977
miRNA miRBase v20 stem-loop (hg19) e.g. hsa-mir-1302-2

FAQ

FAQ: Xena didn't map the right probes

If it looks like we picked the wrong set of probes, please click 'Advanced' next to the 'Import' button on the last screen of the wizard to load data. You can then pick the appropriate probes.

Data format specifications and supported biological data types

There are 2 basic data formats and 2 advanced data formats. Each of these formats has one or more biological data types that it supports.

General Specifications for all data formats

We support most types of genomic and phenotype/clinical/sample annotations. For genomic data we support calls made on the raw data including but not limited to expression calls, mutation calls, etc. This is what TCGA calls ‘Level 3’ data and is typically a value on gene, transcript, probe, etc. We do not support FASTQ, BAMs, or other ‘raw’ files. Please contact us if you have any questions.

We support tab-delimited and Microsoft Excel files (.xlsx and .xls). Tab-delimited files generally have a file name ending in .tsv or .txt, though we do not require this. Note that we load tab-delimited files much faster than Excel files. You can export a Microsoft Excel file as a tab-delimited file using the 'Save as ...' function.

Please do not have any duplicate genes/probes/identifiers or samples. We will allow you to load with duplicates but will only display the first one encountered in the file.
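Because only the first of any duplicated identifiers or samples is displayed, it can help to check a file before loading it. The following is a minimal sketch, not part of Xena: the function names are hypothetical, and it assumes a tab-delimited matrix whose first row and first column hold the names.

```python
import csv
import io
from collections import Counter


def find_duplicates(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    seen = set()
    duplicates = []
    for name in names:
        if counts[name] > 1 and name not in seen:
            seen.add(name)
            duplicates.append(name)
    return duplicates


def duplicate_report(tsv_text):
    """Check the header row and the first column of a tab-delimited matrix."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    return {
        "columns": find_duplicates(header[1:]),
        "rows": find_duplicates([row[0] for row in body if row]),
    }


# Made-up example: sample S1 and identifier ALK each appear twice.
example = "Sample\tS1\tS2\tS1\nALK\t0.1\t0.2\t0.3\nTP53\t1\t2\t3\nALK\t4\t5\t6\n"
print(duplicate_report(example))
```

An empty list for both keys means no duplicates were found.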

We assume you use a '.' to indicate a decimal place as opposed to a ',' .

Here is a folder with example data in addition to the examples below.

Basic Genomic data: numbers in a rectangle/matrix/spreadsheet

These are numeric data called on genomic regions (e.g. exon expression or gene-level copy number). This data is in a rectangle where samples are columns and rows are the genomic regions (e.g. HUGO gene symbol, transcript ID, probe ID, etc). We also support samples as rows and genomic regions as the columns (i.e. the opposite orientation). For supported genomic regions, please see supported gene and probe names.

Supported data types

RNA-seq expression (exon, transcript, gene, etc)
Array-based expression (probe, gene, etc)
Gene-level mutation
Gene-level copy number
DNA methylation
RPPA
and more ...

For samples that do not have expression for a particular gene, either have a blank field or use "NA".

An example of a genomic matrix file (in this case, expression):

Sample

TCGA-BA-4074-01

TCGA-BA-4075-01

TCGA-BA-4076-01

ACAP3

0.137

0.022

CTRT2

0.024

0.805

0.256

ALK

0.098

0.805

1.87
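If you generate a genomic matrix from a script rather than a spreadsheet, the layout described above can be sketched as follows. This is an illustration only: `matrix_to_tsv` is a hypothetical helper, the sample and gene names are made up, and missing values are written as blank fields (one of the two conventions described above).

```python
import csv
import io


def matrix_to_tsv(samples, rows):
    """Serialize {identifier: [values...]} as a samples-as-columns matrix.

    None is written as a blank field to mark a missing value.
    """
    out = io.StringIO()
    # lineterminator="\n" keeps Unix line endings in the output.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample"] + list(samples))
    for identifier, values in rows.items():
        if len(values) != len(samples):
            raise ValueError(f"{identifier}: expected {len(samples)} values")
        writer.writerow([identifier] + ["" if v is None else v for v in values])
    return out.getvalue()


# GENE_A has no value for sample S2, so that field is left blank.
text = matrix_to_tsv(["S1", "S2"], {"GENE_A": [0.5, None], "GENE_B": [1.25, 2.0]})
print(text)
```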

Basic Phenotypic data: categories or non-genomic in a rectangle/matrix/spreadsheet

These are data on a sample or patient that are categorical in nature (e.g. Tumor Stage or 'wild type' or 'mutant' for a gene) or are numerical but non-genomic (e.g. age or a genomic signature). Samples can be columns and phenotype/clinical/sample annotations can be rows, or vice versa. We support both orientations.

Supported data types

phenotype/patient/clinical data (age, weight, if there was blood drawn, etc)
sample/aliquot data (where it was sequenced, tumor weight, etc)
derived data (regulon activity for a gene, etc)
genomic signatures (EMT signature score, stemness score, etc)
other (whether a sample has an ERG-TMPRSS2 fusion, whether a sample has WGS data available, etc)

This is our most flexible data type. If you are wondering if your data is considered to be 'phenotypic' please contact us.

Categorical vs numerical data

We support both numerical and categorical data. For numerical data please use a blank field for any samples which may be missing data. For categorical data you can use a blank field or "NA" for any samples which may be missing data.

Note that if you use "NA" for a missing numerical field then the Xena software will automatically treat that column as a category.

To have it be treated as a numerical field please use a blank field.
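The effect of "NA" in a numerical column can be illustrated with a small sketch. This is not Xena's actual detection code; it is a hypothetical `column_kind` function that assumes a column counts as numerical only when every non-blank value parses as a number.

```python
def column_kind(values):
    """Classify a phenotype column under the rule assumed above.

    Blank fields are ignored as missing data. If every remaining value
    parses as a number the column is 'numerical'; any other text,
    including "NA", makes it 'categorical'.
    """
    present = [v.strip() for v in values if v.strip() != ""]
    for value in present:
        try:
            float(value)
        except ValueError:
            return "categorical"
    return "numerical"


print(column_kind(["34", "", "61"]))    # blank marks the missing age
print(column_kind(["34", "NA", "61"]))  # "NA" is text, so the column becomes a category
```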

For more information about configuring your phenotype fields, such as controlling the order for categorical features, please see our Metadata Specifications.

An example of a phenotype matrix file:

sample

ER_status

disease_status

age

TCGA-BA-4074-01

positive

complete remission

TCGA-BA-4082-01

positive

complete remission

TCGA-BA-4078-01

negative

undergoing treatment

Advanced Segmented data

For segmented data, we require the following 5 columns: sample, chr, start, end, and value. Note that your column headers must be these names exactly!

Please use 'NA' to indicate no data.

Supported data types

copy number

We currently accept hg38, hg19, hg18 coordinates.

Example segmented copy number data with required columns:

sample    chr    start    end    value
TCGA-V4-A9EL-01    chr1    61735    16815530    0.041
TCGA-V4-A9EL-01    chr1    16816090    17190862    -0.4227
TCGA-V4-A9EF-01    chr4    86979944    115173700    0.0414
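Since the column headers must match exactly, a quick check before loading can save a failed import. The sketch below is an assumption-laden illustration, not the loader's own validation: `check_segment_file` is a hypothetical name, and it only checks that the five required headers are present (case sensitive) and that each value is a number or 'NA'.

```python
REQUIRED_COLUMNS = ["sample", "chr", "start", "end", "value"]


def check_segment_file(tsv_text):
    """Return a list of problems found in tab-delimited segmented data."""
    lines = tsv_text.splitlines()
    if not lines:
        return ["file is empty"]
    header = lines[0].split("\t")
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return ["missing required columns: " + ", ".join(missing)]
    value_at = header.index("value")
    problems = []
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != len(header):
            problems.append(f"line {number}: expected {len(header)} fields")
            continue
        value = fields[value_at]
        if value != "NA":
            try:
                float(value)
            except ValueError:
                problems.append(f"line {number}: value {value!r} is not a number or NA")
    return problems


good = "sample\tchr\tstart\tend\tvalue\nTCGA-V4-A9EL-01\tchr1\t61735\t16815530\t0.041\n"
print(check_segment_file(good))
```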

Advanced Positional data

For positional data, we require 6 columns: sample, chr, start, end, reference, alt. Note that your column headers must be these names exactly!

Other columns that may follow are: gene, effect, DNA_VAF, RNA_VAF, and Amino_Acid_Change. These columns are not required but will enhance the visualization of this data. For example, the "gene" column enables displaying mutations when querying by gene name in addition to querying by genomic coordinates. The "effect" column colors the mutations by effect (the default color is gray). The effect terms include "Nonsense" (red), "Frameshift" (red), "Splice" (orange), "missense" (blue), and "Silent" (green). The full list of accepted terms can be found here in our code.

Note that Xena will not call the gene, variant effect, etc. for you. All gene annotation information must be included in the file.

Supported data types

mutation data

We currently accept hg38, hg19, hg18 coordinates.

Example mutation data with the six required columns, plus the gene column:

sample

chr

start

end

reference

alt

gene

TCGA-AB-2802-03

chr2

29917721

ALK

TCGA-AB-2802-03

chr1

119270684

119270687

TTAAA

MYC

TCGA-AB-2867-03

chr1

150324146

PRPF3

To specify that a sample was assayed but no mutation was detected, include a line in the file with three columns filled: sample, start, and end. "start" and "end" must be integers (if left empty, the data loader will reject the file), so use -1 to indicate that these are bogus coordinates. The rest of the columns are empty strings.
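Such a placeholder line can be sketched as follows. `no_mutation_row` is a hypothetical helper, and the sketch assumes a file with only the six required columns; extra columns such as gene would simply be left empty as well.

```python
REQUIRED_COLUMNS = ["sample", "chr", "start", "end", "reference", "alt"]


def no_mutation_row(sample, columns=REQUIRED_COLUMNS):
    """Build a tab-delimited row marking `sample` as assayed, no mutation found."""
    row = {name: "" for name in columns}  # every other column is an empty string
    row["sample"] = sample
    row["start"] = "-1"  # bogus coordinates, as described above
    row["end"] = "-1"
    return "\t".join(row[name] for name in columns)


print(repr(no_mutation_row("TCGA-AB-2802-03")))
```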

Advanced Other data

We support a number of other specialty data types such as structural variants. Please contact us if you have this data so we can help you load it.

KM plots using data from a Local Xena Hub

To visualize and perform a KM analysis, we use two columns/rows of data, time to event and event. These data must be loaded in a phenotype file. The phenotype file can contain other data as well.

Note that you will need to name the headers in your phenotype file EXACTLY what we recognize. See the list of recognized headers for each type of survival/interval below.

This data can be in days, months, years, etc.

Time to Event and Event

Time to Event is a duration variable for each subject having a beginning and an end anywhere along the timeline of the complete study. It begins when the subject is enrolled into a study or when treatment begins, and ends when the end-point (event of interest, for example, death or metastasis) is reached or the subject is censored from the study.

Censoring means the total survival time for that subject cannot be accurately determined. This can happen when something negative for the study occurs, such as the subject drops out, is lost to follow-up, or the required data is not available or, conversely, something good happens, such as the study ends before the subject had the event of interest occur, i.e., they survived at least until the end of the study, but there is no knowledge of what happened thereafter.

Event indicates what the 'event' was for a patient, 1 for the event, for example, death or metastasis, and 0 for censored.

Help text was partially taken from A PRACTICAL GUIDE TO UNDERSTANDING KAPLAN-MEIER CURVES.

Recognized header names for different types of survival

Below is a table of the column/row header names we recognize for each type. Note that these header names are case sensitive.

Survival Type    'Time to Event' Header name    'Event' Header name
Overall Survival    OS.time    OS
Disease free interval    DFI.time    DFI
Disease specific survival    DSS.time    DSS
Progression free interval    PFI.time    PFI
Local recurrence interval    LRI.time    LRI
Distant metastasis interval    DMI.time    DMI
Distant disease free survival    DDFS.time    DDFS
Invasive disease free survival    IDFS.time    IDFS
Regional recurrence    RR.time    RR
Relapse    Relapse.time    Relapse
Metastasis    Metastasis.time    Metastasis
Distant recurrence interval    DRI.time    DRI
Distant metastasis free survival    DMFS.time    DMFS
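Because the header names are case sensitive and must come in pairs, it can be useful to check which survival types a phenotype file actually supports. The sketch below is illustrative only: `usable_survival_pairs` is a hypothetical name, and it assumes the event header is always the time header without the `.time` suffix, which is the pattern the listed pairs follow.

```python
# 'Time to Event' header names listed in the table above.
TIME_HEADERS = [
    "OS.time", "DFI.time", "DSS.time", "PFI.time", "LRI.time", "DMI.time",
    "DDFS.time", "IDFS.time", "RR.time", "Relapse.time", "Metastasis.time",
    "DRI.time", "DMFS.time",
]


def usable_survival_pairs(headers):
    """Return (time header, event header) pairs that are both present."""
    present = set(headers)  # exact, case-sensitive matching
    pairs = []
    for time_header in TIME_HEADERS:
        event_header = time_header[: -len(".time")]
        if time_header in present and event_header in present:
            pairs.append((time_header, event_header))
    return pairs


# "pfi.time" has the wrong case and "DSS.time" has no matching event column.
headers = ["sample", "DFI.time", "DFI", "pfi.time", "PFI", "DSS.time"]
print(usable_survival_pairs(headers))
```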

Example Overall Survival

sample    OS.time
TCGA-AB-1234-01    100
TCGA-AB-6789-01    200
TCGA-CD-1234-01    300
TCGA-CD-5678-01    400

Hubs for institutions, collaborations, labs, and larger projects

Institutional Xena Hubs allow you to share data, visualizations, and analyses with a specific group of people. Xena Hubs can be set up on any server or in the cloud. You control who has access to the Xena Hub by controlling who has access to the server on which it is hosted.

To make your data publicly available, simply make the server open to the web.

Download

First, download the ucsc_xena_xxx.tar.gz file to your server, here:

https://genome-cancer.ucsc.edu/download/public/get-xena/index.html

The file to download is the one called "Tar archive, no updater or JRE - recommended for linux server developments". Uncompress and extract the .jar file (cavm-xxx-standalone.jar). The current version is 0.25.0.

Start the hub

The hub can be started with "java -jar cavm-xxx-standalone.jar". Passing option --help will display usage information.

Note that you need to use Java 8 to run the hub.

There are several options you will want to set.

To bind an external interface (instead of loopback), use "--host 0.0.0.0".

The connection between your hub and the Xena Browser is made over HTTPS. Use the "--certfile" and "--keyfile" options to set the certificate and private key.

There are three paths that can be configured: the database file, the log file, and the root directory for data files to be served. These are set by --database, --logfile, and --root. If you don't set these, they will default to paths under ${HOME}/xena.

--database -d default to ${HOME}/xena/database

--logfile default to ${HOME}/xena/xena.log

--root -r default to ${HOME}/xena/files/

Example start script for an open-access hub

Copy the content below to a file "start_script"

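#!/bin/bash

PORT=7222
LOGFILE=xena/xena7222.log
DOCROOT=xena/files
DB=xena/myHub
# Placeholders: set these to the paths of your HTTPS certificate and private key
CERTFILE=/path/to/certfile
KEYFILE=/path/to/keyfile

# Start the hub in the background, listening on all interfaces
java -jar server.jar -r ${DOCROOT} -d ${DB} --no-gui -p ${PORT} -H 0.0.0.0 --logfile ${LOGFILE} --certfile ${CERTFILE} --keyfile ${KEYFILE} > log 2>&1 &

disown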

Link server.jar to cavm-x.xx.x-standalone.jar

ln -sf cavm-0.xx.0-standalone.jar server.jar

Make "start_script" executable

chmod u+x start_script

Run "./start_script"

./start_script

Your hub is now running on "https://computer-external-ip:7223".

Getting a security certificate for an open-access hub

When a Xena Hub starts, it opens two consecutive ports, for http and https connections, e.g. 7222 and 7223. HTTP is always the lower number, and HTTPS is always the higher number. This means your hub has two URLs:

http://ip:7222 or https://ip:7223
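The port rule can be written down as a tiny sketch. `hub_urls` is a hypothetical helper, not part of the hub software; it only encodes the convention that HTTPS uses the port directly above the HTTP port.

```python
def hub_urls(host, http_port=7222):
    """Return the two URLs of a hub whose HTTP port is `http_port`."""
    return {
        "http": f"http://{host}:{http_port}",
        "https": f"https://{host}:{http_port + 1}",  # HTTPS is always one higher
    }


print(hub_urls("computer-external-ip"))
```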

Connecting via HTTP to the hub is no longer supported by modern web browsers, thus you will need to connect via HTTPS. To do this you will need an HTTPS certificate and private key. Paths to the cert and key are set with --certfile and --keyfile. This might seem redundant for a hub behind a firewall, but the web app has no influence over the security policies of the web browser. HTTPS certificates can be acquired from free public Certificate Authorities, or via NIH InCommon.

Note that a section below describes a way to use ssh tunneling to get around this; it should be used for testing purposes only.

Make your data ready

You will need to make your data file ready just like for local Xena hub on your laptop. Please see instructions on data format specifications.

You will also need to make your data's meta-data file (xxx.json) ready. Please see loading data from the command line for instructions.

Load data through command line

Once the hub is running, and input files have been placed in the --root directory, a file can be loaded by running the jar a second time with the -l option.

If server.jar is not already linked to the jar, link it first:

ln -sf cavm-x.xx.x-standalone.jar server.jar

If your hub is running on the default port 7222, you can load data with

java -jar server.jar -l /path/to/root/file.tsv

If your hub is running on a different port, you load data with

java -jar server.jar -p ${PORT} -l /path/to/root/file.tsv

Delete data through command line

If your hub is running on the default port 7222, you can delete data with

java -jar server.jar -x /path/to/root/file.tsv

If your hub is running on a different port, you delete data with

java -jar server.jar -p ${PORT} -x /path/to/root/file.tsv

Please contact us at genome-cancer@soe.ucsc.edu for more assistance.

Viewing data from the hub

Go to the Data Hub page here and add "https://computer-external-ip:7223".

You can now go to the visualization and add a cohort or study listed in your hub.

If you don't have a security certificate yet

If you don't have a security certificate yet but you would like to verify that the hub is working you can use ssh tunneling. An example of how to do this for AWS is below, where it is assumed that the xena hub is running on port 7222 for http and 7223 for https. In this scenario, you start the hub without using --certfile and --keyfile options.

Assuming that you typically ssh into EC2 on AWS like this,

ssh -i "xena.pem" ec2-user@ec2-11-111-11-111.compute-1.amazonaws.com

you will now set up an ssh tunnel to port 8000 on your computer. To do this we add the -L option:

ssh -i "xena.pem" -L 8000:localhost:7222 ec2-user@ec2-11-111-11-111.compute-1.amazonaws.com

Now on your computer, http://localhost:8000 is the same as the http://aws-ip:7222. Chrome Browser does not allow a connection to http://aws-ip:7222, but it will allow a connection to http://localhost:8000.

After setting up the ssh tunnel go to Data Hub page here, add "http://localhost:8000".

How to set up my hub to have a url like https://tcga.xenahubs.net

Alternatively, you can run the hub behind a reverse proxy, and attach the certificate and keyfile to Apache, Nginx or AWS load balancer configurations. In this scenario, you start the hub without using the --certfile and --keyfile options. This is useful if you want your hub to have a url like "https://tcga.xenahubs.net". You set up your DNS to point the hostname (tcga.xenahubs.net) to the IP address of the server on which the hub is running.

An example apache configuration on AWS VM

in /etc/httpd/conf/httpd.conf

<VirtualHost *:443>
    ServerName tcga.xenahubs.net
    SSLEngine on
    SSLProxyEngine On
    SSLProxyVerify none
    SSLProxyCheckPeerCN off
    SSLProxyCheckPeerName off
    SSLProxyCheckPeerExpire off
    SSLCertificateFile YOURCERTIFICATE
    SSLCertificateKeyFile YOURKEY
    # set up the proxy
    ProxyPreserveHost On
    ProxyPass / https://localhost:9001/
    ProxyPassReverse / https://localhost:9001/
</VirtualHost>

A landing page for my hub

If you have a markdown file called $DOCROOT/meta/info.mdown in your hub's document root directory, the markdown file will serve as a splash page for your hub. An example is the UCSC Toil RNA-seq Recompute hub: https://toil.xenahubs.net. The corresponding markdown file is this.

How do I add a 'Launch Xena' button like the TOIL landing page

<button class="hubButton" data-cohort="TCGA TARGET GTEx">Launch Xena</button>

To add a clickable button in the hub landing page, make sure the button has classname 'hubButton'. You also need to specify the cohort to view, defined by the data parameter 'data-cohort'. Once users click the button, the visualization wizard will be launched to the specified cohort. You can change the button label.

A landing page for my cohort

You can also have a landing page for a study cohort. An example is the TCGA TARGET GTEx cohort: https://xenabrowser.net/datapages/?cohort=TCGA%20TARGET%20GTEx. The corresponding markdown file is this. The study cohort landing page is also a markdown file, which must be hosted in the https://github.com/ucscXena/cohortMetaData repository on GitHub. The markdown file is called https://github.com/ucscXena/cohortMetaData/cohort_$cohortName/info.mdown.

How do I add a "Launch" button like the TCGA TARGET GTEx landing page

<button class="cohortButton" data-bookmark="bc7f3f46b042bcf5c099439c2816ff01">Example: compare FOXM1 expression</button>

The button must have the classname 'cohortButton'. If you have the data parameter 'data-bookmark', clicking the button will take the user to the bookmark view. If you don't have the 'data-bookmark' parameter, clicking the button will take the user to the visualization wizard with an empty spreadsheet. You can change the button label. You can add as many buttons as you want.

Loading data from the command line

In addition to the data itself, we require some metadata about your file. When you use our website to load your data we fill in this metadata for you. When you use the command line, you will need to provide this data in an additional file.

Metadata file requirements

The metadata file is a .json file and follows json formatting. The metadata .json file needs to be in the same directory as the data file. The metadata file and the data file need to have the same base name, including any file extensions (e.g. my_first_dataset and my_first_dataset.json OR my_second_dataset.txt and my_second_dataset.txt.json).

There are two required fields: type and cohort.

Type

Type can be:

'genomicMatrix' -> genomic data where samples are columns and genomic regions are rows. Note that for loading on the command line we do not support the other orientation.
'clinicalMatrix' -> phenotypic data where samples are rows and phenotypes are columns. Note that for loading on the command line we do not support the other orientation.
'mutationVector' -> mutation data
'genomicSegment' -> segmented copy number data

Example:

{"type":"mutationVector"}

Cohort

Cohort is used to know if there are other data on the samples that you are loading. You can either specify a pre-existing cohort or create your own. Cohort names are displayed on the dataset pages and the cohort drop down menu on the Heatmaps page.

For existing cohorts, you need to enter the cohort name EXACTLY as it appears as the existing cohort name. Note that our cohort names are case sensitive.

Example

{"type":"mutationVector", 
 "cohort":"TCGA Breast Cancer"}

Reference

If you are loading a mutation or segmented copy number file you will also need to specify the reference genome. You do not need to specify this for other file types

Example

{"type":"mutationVector", 
 "cohort":"TCGA Breast Cancer", 
 "assembly":"hg19"}
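Writing the metadata file by hand is easy to get wrong, so here is a minimal sketch of generating it. `write_metadata` is a hypothetical helper, not a Xena tool; it follows the naming rule above (data file name plus `.json`, in the same directory) and insists on an assembly for the two types that need a reference genome.

```python
import json
import tempfile
from pathlib import Path

KNOWN_TYPES = {"genomicMatrix", "clinicalMatrix", "mutationVector", "genomicSegment"}
NEEDS_ASSEMBLY = {"mutationVector", "genomicSegment"}


def write_metadata(data_file, type_, cohort, assembly=None):
    """Write <data_file>.json next to the data file and return its path."""
    if type_ not in KNOWN_TYPES:
        raise ValueError(f"unknown type: {type_}")
    if type_ in NEEDS_ASSEMBLY and assembly is None:
        raise ValueError(f"{type_} requires an assembly such as hg19")
    metadata = {"type": type_, "cohort": cohort}
    if assembly is not None:
        metadata["assembly"] = assembly
    data_path = Path(data_file)
    # Keep the full file name, including its extension, and append ".json".
    json_path = data_path.with_name(data_path.name + ".json")
    json_path.write_text(json.dumps(metadata, indent=1) + "\n")
    return json_path


# Demonstration in a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "my_second_dataset.txt"
    json_path = write_metadata(data_file, "mutationVector", "TCGA Breast Cancer", "hg19")
    written_name = json_path.name
    written = json.loads(json_path.read_text())
print(written_name)
```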

Probemap

If you are loading a file that has probes, transcripts, or exons and you would like to query your data by gene, you will need to provide a mapping file. You do not need to specify this for other file types.

Here is an example probemap file (a delimited file): https://toil.xenahubs.net/download/probeMap/gencode.v23.annotation.gene.probemap

#id    gene    chrom    chromStart    chromEnd    strand 
id_1    AADACL3    chr1    12776118    12776347    +

We have many probemap files that you can see via our xenaPython app.

import xenaPython

host = "https://reference.xenahubs.net"
xenaPython.probemap_list(host)

If you do not see a probemap that will work for you, please let us know.

To reference a probemap you need three files:

Include the probemap reference in your data file .json
Have the probemap file in the same directory as your data file and data file .json
Also have a .json file for the probemap so that we know how to load it

Note that to reference a probemap you need to load the probemap first, then load the data file.

Example data file .json

{"type":"genomicMatrix", 
 "cohort":"TCGA Breast Cancer", 
 ":probeMap":"/unc_v2_exon_hg19_probe_TCGA"}

Example probemap

https://toil.xenahubs.net/download/probeMap/gencode.v23.annotation.gene.probemap

Example probemap .json file (required to be in the same folder as the probemap)

{"type":"probeMap", 
 "assembly":"hg19"}
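The ordering requirement (probemap first, then the data file that references it) can be sketched as a helper that builds the two load commands. `load_commands` is a hypothetical name and the file names are made up; only the `-l` and `-p` options described in this document are used, and the commands are built as lists rather than executed.

```python
def load_commands(jar, probemap_file, data_file, port=None):
    """Return the load commands in the order they must be run."""
    base = ["java", "-jar", jar]
    if port is not None:
        base += ["-p", str(port)]  # only needed when the hub is not on the default port
    return [
        base + ["-l", probemap_file],  # 1. load the probemap
        base + ["-l", data_file],      # 2. then the data file that references it
    ]


for command in load_commands("server.jar", "my.probemap", "expression.tsv"):
    print(" ".join(command))
```

Each list could then be passed to `subprocess.run` once the hub is running.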

More information about the metadata

Commands to load data

Put both your .tsv and .json files in your_home_directory/xena/files. Then run the jar, passing in the file name, like so:

java -jar cavm-0.xx.0-standalone.jar -l ~/xena/files/*

→ loads all files

java -jar cavm-0.xx.0-standalone.jar -l ~/xena/files/file1.tsv

→ loads just file1.tsv

Note that you will need to substitute the name of the .jar file. As of the time of writing (September 20, 2018), the name of the .jar file was cavm-0.22.0-standalone.jar. On Linux this will be in the directory where you opened the archive. On Windows or MacOS, use your operating system’s file search capability to search for cavm*jar. On Windows you will need to use the full path to your home directory, instead of “~”.

Note you do not need to load the .json files. Xena will automatically look for these and load them.

Commands to delete data

java -jar cavm-0.xx.0-standalone.jar -x ~/xena/files/file1.tsv

→ delete just file1.tsv

java -jar cavm-0.xx.0-standalone.jar -x ~/xena/files/file1.tsv ~/xena/files/file2.tsv

→ delete file1.tsv and file2.tsv

Help

You can always type:

java -jar cavm-0.xx.0-standalone.jar -h

for help.

FAQ/Troubleshooting Guide

I ran into an Apple “unidentified developer” warning when installing Xena. What do I do?

There are a couple of options. You can right-click the .dmg and choose 'Open'. You can also press the Control key, then click the app icon, then choose Open from the shortcut menu. These help pages might help: http://www.iclarified.com:8081/28180/how-to-open-applications-from-unidentified-developers-in-mac-os-x-mountain-lion and from Apple: https://support.apple.com/kb/PH25088?locale=en_US .

I see my probes/genes/transcripts when loading my data, but I don't know whether to choose hg18, hg19 or hg38?

The only time the assembly matters is if you decide to visualize part or all of a chromosome, rather than a gene/probe/transcript. If you want to visualize only genes/probes/transcripts then it does not matter which assembly you choose.

I don't see my study listed in the visualization

Your Local Xena Hub must be running to view any data that you have loaded into it. Please ensure it is started up. You can also check which studies are on your hub and what data is in them by going to the My Computer Hub page: xenabrowser.net/datapages/?host=https%3A%2F%2Flocal.xena.ucsc.edu%3A7223.

You also may not see your study if the hub is still loading the data. Wait a few minutes and refresh the page.

I'm entering a gene name but it only draws gray

Can I load two or more phenotype files to the same study?

Yes, we will allow you to select phenotypes from both files in the visualization.

Help! My file is too large to open in Microsoft Excel.

You might be able to load your file anyways, depending on the format. Give it a try and if you are unable to load it, write us an email and we may be able to fix your file for you.

How do I convert my .xls or .xlsx into a tab-delimited file?

You can export a Microsoft Excel file as a tab-delimited file using the 'Save as ...' function.

Unix vs DOS

We require that your data files have a unix line ending. To ensure that your files have this line ending on a DOS, please follow the help here:

Note that this requirement is only for data files, not for the associated .json files.
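One way to convert a file to Unix line endings is sketched below. This is a generic approach rather than an official Xena tool, and `to_unix_line_endings` is a hypothetical name; it works on bytes so that the rest of the file is left untouched.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert DOS (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


dos_text = b"sample\tage\r\nS1\t34\r\n"
print(to_unix_line_endings(dos_text))
```

To convert a file, read it in binary mode (`open(path, "rb")`), pass the contents through the function, and write the result back in binary mode (`"wb"`).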

Data format specifications and supported biological data types

There are 2 basic data formats and 2 advanced data formats. Each of these formats has one or more biological data types that it supports.

General Specifications for all data formats

Please do not have any duplicate genes/probes/identifiers or samples. We will allow you to load with duplicates but will only display the first one encountered in the file.

We assume you use a '.' to indicate a decimal place as opposed to a ',' .

Here is a folder with example data in addition to the examples below.

Basic Genomic data: numbers in a rectangle/matrix/spreadsheet

Supported data types

RNA-seq expression (exon, transcript, gene, etc)
Array-based expression (probe, gene, etc)
Gene-level mutation
Gene-level copy number
DNA methylation
RPPA
and more ...

For samples that do not have expression for a particular gene, either have a blank field or use "NA".

An example of a genomic matrix file (in this case, expression):

Sample

TCGA-BA-4074-01

TCGA-BA-4075-01

TCGA-BA-4076-01

ACAP3

0.137

0.022

CTRT2

0.024

0.805

0.256

ALK

0.098

0.805

1.87
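Since a genomic matrix is a rectangle, a quick shape check can catch rows that lost a field. This is a sketch only: it assumes that every row should have as many tab-separated fields as the header row, with missing values kept as blank fields or "NA". The function name is hypothetical.

```shell
# Illustrative helper; the function name is not part of Xena.

# Prints the line number and field count of every row whose number of
# tab-separated fields differs from the header row. Prints nothing when
# the file is rectangular. A trailing blank field still counts as a field.
ragged_rows() {
    awk -F '\t' 'NR == 1 { n = NF; next } NF != n { print NR ": " NF " fields, expected " n }' "$1"
}
```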

Basic Phenotypic data: categories or non-genomic in a rectangle/matrix/spreadsheet

Supported data types

phenotype/patient/clinical data (age, weight, if there was blood drawn, etc)
sample/aliquot data (where it was sequenced, tumor weight, etc)
derived data (regulon activity for a gene, etc)
genomic signatures (EMT signature score, stemness score, etc)
other (whether a sample has an ERG-TMPRSS2 fusion, whether a sample has WGS data available, etc)

This is our most flexible data type. If you are wondering if your data is considered to be 'phenotypic' please contact us.

Categorical vs numerical data

Note that if you use "NA" for a missing numerical field then the Xena software will automatically treat that column as a category.

To have it be treated as a numerical field please use a blank field.

For more information about configuring your phenotype fields, such as controlling the order for categorical features, please see our Metadata Specifications.
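If a numerical column already uses "NA" for missing values, the values can be blanked before loading. The sketch below assumes a tab-delimited phenotype file whose first row holds the column names. The function name and the file and column names in the usage note are hypothetical.

```shell
# Illustrative helper; the function name is not part of Xena.

# Prints the file ($1) with exact "NA" values in the named column ($2)
# replaced by blank fields. Other columns are left unchanged, so a
# categorical column may keep its "NA" values.
blank_na_in_column() {
    awk -F '\t' -v OFS='\t' -v col="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i }
        NR > 1 && c && $c == "NA" { $c = "" }
        { print }
    ' "$1"
}
```

For example, blank_na_in_column phenotype.tsv age > phenotype.fixed.tsv writes a copy in which only the age column has its "NA" values blanked. If the named column is not in the header, the file is printed unchanged.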

An example of a phenotype matrix file:

| sample | ER_status | disease_status | age |
| --- | --- | --- | --- |
| TCGA-BA-4074-01 | positive | complete remission | |
| TCGA-BA-4082-01 | positive | complete remission | |
| TCGA-BA-4078-01 | negative | undergoing treatment | |

Advanced Segmented data

For segmented data, we require the following 5 columns: sample, chr, start, end, and value. Note that your column headers must be these names exactly!

Please use 'NA' to indicate no data.
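Because the column headers must match exactly, it can be worth checking them before loading. The sketch below assumes a tab-delimited file and checks only that each required name is present in the header row. It does not check column order, which this page does not specify. The function name is hypothetical.

```shell
# Illustrative helper; the function name is not part of Xena.

# Succeeds if every required column name appears in the header row of the
# tab-delimited file; otherwise prints the missing names and fails.
# Usage: require_columns FILE NAME...
require_columns() {
    file=$1
    shift
    status=0
    header=$(head -n 1 "$file")
    for name in "$@"; do
        # Compare whole header cells as fixed strings, one per line.
        if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -Fqx -- "$name"; then
            echo "missing column: $name"
            status=1
        fi
    done
    return $status
}
```

For segmented data this would be called as require_columns segments.tsv sample chr start end value, where segments.tsv is a placeholder file name. The same function can be called with a different list of names.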

Supported data types

copy number

We currently accept hg38, hg19, hg18 coordinates.

Example segmented copy number data with required columns:

| sample | chr | start | end | value |
| --- | --- | --- | --- | --- |
| TCGA-V4-A9EL-01 | chr1 | 61735 | 16815530 | 0.041 |
| TCGA-V4-A9EL-01 | chr1 | 16816090 | 17190862 | -0.4227 |
| TCGA-V4-A9EF-01 | chr4 | 86979944 | 115173700 | 0.0414 |

Advanced Positional data

For positional data, we require 6 columns: sample, chr, start, end, reference, alt. Note that your column headers must be these names exactly!

Note that Xena will not call the gene, variant effect, etc. for you. All gene annotation information must be included in the file.

Supported data types

mutation data

We currently accept hg38, hg19, hg18 coordinates.

Example mutation data with the six required columns, plus the gene column:

sample

chr

start

end

reference

alt

gene

TCGA-AB-2802-03

chr2

29917721

ALK

TCGA-AB-2802-03

chr1

119270684

119270687

TTAAA

MYC

TCGA-AB-2867-03

chr1

150324146

PRPF3

Advanced Other data

We support a number of other specialty data types such as structural variants. Please contact us if you have this data so we can help you load it.

Hubs for institutions, collaborations, labs, and larger projects

To make your data publicly available, simply make the server open to the web.

Download

First, download the ucsc_xena_xxx.tar.gz file to your server, here:

https://genome-cancer.ucsc.edu/download/public/get-xena/index.html

Start the hub

The hub can be started with "java -jar cavm-xxx-standalone.jar". Passing option --help will display usage information.

Note that you need to use Java 8 to run the hub.

There are several options you will want to set.

To bind an external interface (instead of loopback), use "--host 0.0.0.0".

The connection between your hub and the Xena Browser is over https. Use the "--certfile" and "--keyfile" options to set the certificate and key files.

--database -d defaults to ${HOME}/xena/database

--logfile defaults to ${HOME}/xena/xena.log

--root -r defaults to ${HOME}/xena/files/
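Before starting a hub with --certfile and --keyfile, it can help to confirm that both files are readable by the account that runs the hub. This is a minimal sketch with a hypothetical function name. It checks only file access, not whether the certificate and key are valid or match each other.

```shell
# Illustrative helper; the function name is not part of Xena.

# Succeeds if both files exist and are readable; otherwise reports each
# unreadable file on stderr and fails.
# Usage: check_tls_files CERTFILE KEYFILE
check_tls_files() {
    status=0
    for f in "$1" "$2"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f" >&2
            status=1
        fi
    done
    return $status
}
```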

Example start script for an open-access hub

Copy the content below to a file "start_script"

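#!/bin/bash

PORT=7222
LOGFILE=xena/xena7222.log
DOCROOT=xena/files
DB=xena/myHub
# Placeholders: set these to the paths of your own certificate and key files.
CERTFILE=/path/to/certfile
KEYFILE=/path/to/keyfile

# Start the hub in the background without the GUI, bound to an external interface.
java -jar server.jar -r "${DOCROOT}" -d "${DB}" --no-gui -p "${PORT}" -H 0.0.0.0 --logfile "${LOGFILE}" --certfile "${CERTFILE}" --keyfile "${KEYFILE}" > log 2>&1 &

disown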

Link server.jar to cavm-x.xx.x-standalone.jar

ln -sf cavm-0.xx.0-standalone.jar server.jar

Make "start_script" executable

chmod u+x start_script

Run "./start_script"

./start_script

Your hub is now running on "https://computer-external-ip:7223".

Getting a security certificate for an open-access hub

http://ip:7222 or https://ip:7223

Note that the section below details a way to use ssh tunneling to get around this, which can be used for testing purposes only.

Make your data ready

Prepare your data files just as you would for a Local Xena Hub on your laptop. Please see the instructions on data format specifications.

You will also need to prepare the metadata file (xxx.json) for your data. Please see loading data from the command line for instructions.

Load data through command line

Once the hub is running and the input files have been placed in the --root directory, a file can be loaded by running the jar a second time with the -l option. The commands below assume that server.jar is linked to the standalone jar:

ln -sf cavm-x.xx.x-standalone.jar server.jar

If your hub is running on the default port 7222, you can load data with

java -jar server.jar -l /path/to/root/file.tsv

If your hub is running on a different port, you can load data with

java -jar server.jar -p ${PORT} -l /path/to/root/file.tsv

Please contact us at genome-cancer@soe.ucsc.edu for more assistance.

Delete data through command line

If your hub is running on the default port 7222, you can delete data with

java -jar server.jar -x /path/to/root/file.tsv

If your hub is running on a different port, you can delete data with

java -jar server.jar -p ${PORT} -x /path/to/root/file.tsv
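When several files need to be loaded or deleted, the commands in this section can be generated in a batch and reviewed before they are run. The sketch below only prints commands and does not contact the hub. It uses the -p, -l and -x options shown in this section, and assumes file paths without spaces. The function name is hypothetical.

```shell
# Illustrative helper; the function name is not part of Xena.

# Prints one hub command per file, for review or for piping to sh.
# Usage: xena_batch_commands PORT FLAG FILE...
#   FLAG is -l to load or -x to delete.
# Assumes file paths contain no spaces or shell metacharacters.
xena_batch_commands() {
    port=$1
    flag=$2
    shift 2
    for f in "$@"; do
        printf 'java -jar server.jar -p %s %s %s\n' "$port" "$flag" "$f"
    done
}
```

For example, xena_batch_commands 7222 -l /path/to/root/*.tsv prints one load command for each .tsv file in the root directory. Piping the output to sh runs them in order.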

Viewing data from the hub

Go to the Data Hub page here and add "https://computer-external-ip:7223".

You can now go to the visualization and add a cohort or study listed in your hub.
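The My Computer Hub link earlier on this page passes the hub address as a percent-encoded host parameter. Assuming the same link pattern also works for other hubs (an assumption, not something this page states), a direct link to a hub's data pages could be built as sketched below. The function name is hypothetical, and only the ":" and "/" characters are encoded.

```shell
# Illustrative helper; the function name is not part of Xena.

# Prints a Data Pages link for the given hub URL, percent-encoding the
# ":" and "/" characters in the host parameter.
hub_datapage_url() {
    encoded=$(printf '%s' "$1" | sed -e 's/:/%3A/g' -e 's|/|%2F|g')
    printf 'https://xenabrowser.net/datapages/?host=%s\n' "$encoded"
}
```

Calling hub_datapage_url https://local.xena.ucsc.edu:7223 reproduces the My Computer Hub link, with the https scheme added in front.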

If you don't have a security certificate yet

Assuming that you typically ssh into EC2 on AWS like this,

ssh -i "xena.pem" ec2-user@ec2-11-111-11-111.compute-1.amazonaws.com

you will now set up an ssh tunnel to port 8000 on your computer. To do this we add the -L option:

ssh -i "xena.pem" -L 8000:localhost:7222 ec2-user@ec2-11-111-11-111.compute-1.amazonaws.com

After setting up the ssh tunnel, go to the Data Hub page here and add "http://localhost:8000".

How to set up my hub to have a url like https://tcga.xenahubs.net

An example Apache configuration on an AWS VM

In /etc/httpd/conf/httpd.conf:

<VirtualHost *:443>
    ServerName tcga.xenahubs.net
    SSLEngine on
    SSLProxyEngine On
    SSLProxyVerify none
    SSLProxyCheckPeerCN off
    SSLProxyCheckPeerName off
    SSLProxyCheckPeerExpire off
    SSLCertificateFile YOURCERTIFICATE
    SSLCertificateKeyFile YOURKEY
    # set up the proxy
    ProxyPreserveHost On
    ProxyPass / https://localhost:9000/
    ProxyPassReverse / https://localhost:9001/
</VirtualHost>

A landing page for my hub

How do I add a 'Launch Xena' button like the TOIL landing page?

<button class="hubButton" data-cohort="TCGA TARGET GTEx">Launch Xena</button>

A landing page for my cohort

How do I add a "Launch" button like the TCGA TARGET GTEx landing page?

<button class="cohortButton" data-bookmark="bc7f3f46b042bcf5c099439c2816ff01">Example: compare FOXM1 expression</button>