Getting Started

Step-by-step instructions for viewing your own data

Overview

Get started viewing your own data:

  1. Install a Local Xena Hub on your computer

  2. Start the Local Xena Hub

  3. Load the data you want to view

  4. View the data

We support most types of genomic and/or phenotypic/clinical/annotation data. Genomic data must be values called on genes, transcripts, exons, probes or some other identifier. Phenotypic/clinical/annotation data can be almost anything, including patient data (e.g. age, sex, etc.), clinical data (survival data for a KM plot), and other data such as gene fusion calls, regulon activity, immune scores, and more. Samples can be bulk tissue, cell lines, cells, and more. We do not visualize raw data such as FASTQs or BAMs.

Data can be your own or from another source, like GEO or a publication.

We support tab-delimited (.tsv and .txt) and Microsoft Excel files (.xlsx and .xls).


Please be careful when using Microsoft Excel to open files with gene names, as Microsoft Excel will automatically convert some gene names into dates. For more information see: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7

Data on a Local Xena Hub can only be viewed or accessed from the computer on which it is running, keeping private data secure.

The Local Xena Hub must be installed and running in order to load data, as well as any time you want to view data. The Local Xena Hub will remember previously loaded data.


Please use Chrome to view your own data.

Installing a Local Xena Hub

Click on VIEW MY DATA. You will be prompted to download and install a Local Xena Hub.

Double click on the download to begin the installation of the Xena Hub. Follow the wizard to finish the install.

System requirements for Xena Hub

  • Mac: OSX 10.7 and above

  • Windows: 64-bit

  • Linux: ability to run a .jar file

Starting/running a Local Xena Hub

After installing a Local Xena Hub, go back to VIEW MY DATA to auto-start the Hub. If it does not start automatically, refresh the page or double click on the Xena Hub application on your computer. The Xena Hub application should be in your Applications folder for Mac and Windows. Note that it can take up to one minute to start up.

Loading data into a Local Xena Hub

Most people load data into their Local Xena Hub through our website wizard, which leads you through the loading process step by step. Note that you will want to make sure your data is properly formatted ahead of time.

You can also load data via the command line.

Viewing data from a Local Xena Hub

Click on VISUALIZATION. If your study is not already selected as step 1 of the wizard, select it from the drop-down and click 'Done'. Note that if you did not enter a study name, your data will be under 'My Study'.

Gene names and identifiers for genomic data

When you loaded your genomic data we asked what type of genes, transcripts or probes you used. If you selected one of the options from the drop down menu then you can enter HUGO gene names or the identifiers in your file. If you did not select one of the options then you will need to enter the identifiers as they appear in your file.

Help! I don't see my study listed

Your Local Xena Hub must be running to view any data that you have loaded into it. Please ensure it is running on your computer. You can also check which studies are on your hub and what data is in them by going to the My Computer Hub page.

Data security

How does Xena ensure the security of my data?

Xena does not utilize a central rendering service, or require hubs to be publicly accessible on the internet like, for example, the UCSC Genome Browser does. Data flows in one direction, from hubs to the user agent. If the user installs a Xena Hub on their laptop, the hub is as secure as the laptop. If the user installs a Xena Hub on a local network, behind a firewall, the hub is as secure as the local network.

The Xena Browser accesses data from a Local Xena Hub on the same computer by requesting data from http://127.0.0.1. The Local Xena Hub makes the data within it available at this address, and will only answer requests made from the user's own computer.

Users will need to use a web browser that supports this if they wish to use a Xena Hub on the loopback interface. At the time of writing, this includes Chrome and Firefox, but not Safari.

Is there any data that is considered to not be secure?

A very limited set of metadata is considered not secure in the Xena architecture model: cohort names and sample names. This metadata is visible to other hubs in the following scenarios. When the user selects a cohort, all hubs are queried for samples in that cohort. When the user selects a data field, the hub holding that field is queried with the field ID (e.g. gene, probe, transcript, phenotype) and all cohort sample IDs. This means, for example, that two hubs holding data on the same cohort will see the union of sample IDs from that cohort. While data queries are not made publicly available, a malicious person could gain entry to a Xena Hub and comb through its logs for these queries. For these reasons, these metadata fields should not contain private information.


KM plots using data from a Local Xena Hub

To visualize and perform a KM analysis, we use two columns/rows of data: time to event and event. These data must be loaded in a phenotype file. The phenotype file can contain other data as well.

Note that you will need to name the headers in your phenotype file EXACTLY as we recognize them. See the list of recognized headers for each type of survival/interval below.

The time to event data can be in days, months, years, etc.

Time to Event and Event

Time to Event is a duration variable for each subject having a beginning and an end anywhere along the timeline of the complete study. It begins when the subject is enrolled into a study or when treatment begins, and ends when the end-point (event of interest, for example, death or metastasis) is reached or the subject is censored from the study.

Censoring means the total survival time for that subject cannot be accurately determined. This can happen when something negative for the study occurs, such as the subject drops out, is lost to follow-up, or the required data is not available or, conversely, something good happens, such as the study ends before the subject had the event of interest occur, i.e., they survived at least until the end of the study, but there is no knowledge of what happened thereafter.

Event indicates whether the event occurred for a patient: 1 if the event (for example, death or metastasis) occurred, and 0 if the patient was censored.

Help text was partially taken from A PRACTICAL GUIDE TO UNDERSTANDING KAPLAN-MEIER CURVES.
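To make the roles of the two columns concrete, here is a minimal Kaplan-Meier (product-limit) estimator in Python. It is an illustrative sketch of the standard method, not Xena's own implementation, and the function name `kaplan_meier` is hypothetical. It takes parallel lists of time-to-event values and event flags (1 = event, 0 = censored), exactly as described above.

```python
from collections import Counter


# Hypothetical helper for illustration; not Xena's own KM code.
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival curve.

    times  -- time to event for each subject (any unit: days, months, ...)
    events -- 1 if the event occurred, 0 if the subject was censored
    Returns (time, survival probability) pairs, one for each distinct
    time at which at least one event occurred.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    events_at = Counter(t for t, e in zip(times, events) if e == 1)
    leaving_at = Counter(times)  # subjects leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= leaving_at[t]
    return curve


# Hypothetical OS.time and OS values (time to event, event).
print(kaplan_meier([100, 200, 300, 400], [0, 1, 0, 1]))
```

Censored subjects never lower the curve themselves; they only shrink the number at risk, which is why the event column matters as much as the time column.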

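Recognized header names for different types of survival

Below is a table of the column/row header names we recognize for each type. Note that these header names are case sensitive.

Survival Type                       'Time to Event' Header name    'Event' Header name
Overall Survival                    OS.time                        OS
Disease free interval               DFI.time                       DFI
Disease specific survival           DSS.time                       DSS
Progression free interval           PFI.time                       PFI
Local recurrence interval           LRI.time                       LRI
Distant metastasis interval         DMI.time                       DMI
Distant disease free survival       DDFS.time                      DDFS
Invasive disease free survival      IDFS.time                      IDFS
Regional recurrence                 RR.time                        RR
Relapse                             Relapse.time                   Relapse
Metastasis                          Metastasis.time                Metastasis
Distant recurrence interval         DRI.time                       DRI
Distant metastasis free survival    DMFS.time                      DMFS

Example Overall Survival

sample             OS    OS.time
TCGA-AB-1234-01    0     100
TCGA-AB-6789-01    1     200
TCGA-CD-1234-01    0     300
TCGA-CD-5678-01    1     400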
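Because the header names are case sensitive and only work in pairs, it can help to check a phenotype file's header row before loading. The sketch below uses hypothetical helpers (not part of Xena) built from the recognized header names on this page: it reports which survival types have both their 'Event' and 'Time to Event' columns, and which partner columns are still missing.

```python
# Event header names from the recognized list; each 'Time to Event'
# header is the same name with ".time" appended.
SURVIVAL_EVENTS = [
    "OS", "DFI", "DSS", "PFI", "LRI", "DMI", "DDFS", "IDFS",
    "RR", "Relapse", "Metastasis", "DRI", "DMFS",
]


# Hypothetical helpers for checking a header row; not part of Xena.
def survival_pairs(header):
    """Survival types whose 'Event' and 'Time to Event' columns are both present.

    Matching is exact (case sensitive), like the recognized header names.
    """
    columns = set(header)
    return [e for e in SURVIVAL_EVENTS if e in columns and e + ".time" in columns]


def missing_partner_columns(header):
    """Names of columns needed to complete a half-present survival pair."""
    columns = set(header)
    missing = []
    for e in SURVIVAL_EVENTS:
        has_event, has_time = e in columns, e + ".time" in columns
        if has_event and not has_time:
            missing.append(e + ".time")
        elif has_time and not has_event:
            missing.append(e)
    return missing


header = "sample\tOS\tOS.time\tDFI\tage".split("\t")
print(survival_pairs(header))           # types ready for a KM plot
print(missing_partner_columns(header))  # columns still to add
```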

Hubs for institutions, collaborations, labs, and larger projects

Institutional Xena Hubs allow you to share data, visualizations, and analyses with a specific group of people. Xena Hubs can be set up on any server or in the cloud. You control who has access to the Xena Hub by controlling who has access to the server on which it is hosted.

To make your data publicly available, simply make the server open to the web.

Download

First, download the ucsc_xena_xxx.tar.gz file to your server from here:

https://genome-cancer.ucsc.edu/download/public/get-xena/index.html

The file to download is the one called "Tar archive, no updater or JRE - recommended for linux server developments". Uncompress and extract the .jar file (cavm-xxx-standalone.jar). The current version is 0.25.0.

Start the hub

The hub can be started with "java -jar cavm-xxx-standalone.jar". Passing the --help option displays usage information.

Note that you need to use Java 8 to run the hub.

There are several options you will want to set.

To bind an external interface (instead of loopback), use "--host 0.0.0.0".

The connection between your hub and the Xena Browser is made over HTTPS; use the "--certfile" and "--keyfile" options to set the certificate and private key files.

There are three paths that can be configured: the database file, the log file, and the root directory for data files to be served. These are set by --database, --logfile, and --root. If you don't set these, they default to paths under ${HOME}/xena:

--database (-d)   defaults to ${HOME}/xena/database
--logfile         defaults to ${HOME}/xena/xena.log
--root (-r)       defaults to ${HOME}/xena/files/

Example start script for an open-access hub

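Copy the content below to a file "start_script", and set CERTFILE and KEYFILE to the paths of your HTTPS certificate and private key:

#!/bin/bash

PORT=7222
LOGFILE=xena/xena7222.log
DOCROOT=xena/files
DB=xena/myHub
# Placeholder paths: replace with your HTTPS certificate and private key.
CERTFILE=/path/to/certificate
KEYFILE=/path/to/private-key

java -jar server.jar -r ${DOCROOT} -d ${DB} --no-gui -p ${PORT} -H 0.0.0.0 --logfile ${LOGFILE} --certfile ${CERTFILE} --keyfile ${KEYFILE} > log 2>&1 &

disown

Link server.jar to cavm-x.xx.x-standalone.jar:

ln -sf cavm-x.xx.x-standalone.jar server.jar

Make "start_script" executable:

chmod u+x start_script

Run "./start_script":

./start_script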

Your hub is now running on "https://computer-external-ip:7223".

Getting a security certificate for an open-access hub

When a Xena Hub starts, it opens two consecutive ports for HTTP and HTTPS connections, e.g. 7222 and 7223. HTTP is always the lower number and HTTPS is always the higher number. This means your hub has two URLs:

http://ip:7222 or https://ip:7223
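The port rule above is simple arithmetic: if HTTP is on port p, HTTPS is on p + 1. A tiny sketch that derives both URLs from the HTTP port; the function name `hub_urls` is hypothetical and the default of 7222 is taken from the examples on this page.

```python
# Hypothetical helper; mirrors the "two consecutive ports" rule described above.
def hub_urls(host, http_port=7222):
    """Return the (HTTP, HTTPS) URLs for a Xena Hub.

    The hub opens two consecutive ports: HTTP on the lower one and
    HTTPS on the next one up.
    """
    https_port = http_port + 1
    return f"http://{host}:{http_port}", f"https://{host}:{https_port}"


http_url, https_url = hub_urls("computer-external-ip")
print(https_url)  # the URL browsers will accept
```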

Connecting via HTTP to the hub is no longer supported by modern web browsers, thus you will need to connect via HTTPS. To do this you will need an HTTPS certificate and private key. Paths to the cert and key are set with --certfile and --keyfile. This might seem redundant for a hub behind a firewall, but the web app has no influence over the security policies of the web browser. HTTPS certificates can be acquired from free public Certificate Authorities, or via NIH InCommon.

Note that the section below describes a way to use ssh tunneling to get around this, which should be used for testing purposes only.

Make your data ready

You will need to make your data file ready, just as for a Local Xena Hub on your laptop. Please see the instructions on data format specifications.

You will also need to make your data's metadata file (xxx.json) ready. Please see loading data from the command line for instructions.

Load data through command line

Once the hub is running, and input files have been placed in the --root directory, a file can be loaded by running the jar a second time with the -l option.

If your hub is running on the default 7222 port, you can load data with

java -jar server.jar -l /path/to/root/file.tsv

If your hub is running on a different port, you load data with

java -jar server.jar -p ${PORT} -l /path/to/root/file.tsv

Delete data through command line

If your hub is running on the default 7222 port, you can delete data with

java -jar server.jar -x /path/to/root/file.tsv

If your hub is running on a different port, you delete data with

java -jar server.jar -p ${PORT} -x /path/to/root/file.tsv

Please contact us at genome-cancer@soe.ucsc.edu for more assistance.
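If you load or delete many files, it can be convenient to build these commands from a script. The sketch below assumes only the flags shown on this page (-p for the port, -l to load, -x to delete) and the server.jar symlink from the start script; the helper name `hub_command` is hypothetical. It returns an argument list that you could pass to `subprocess.run`.

```python
import shlex

# Flags shown on this page: -l loads a file, -x deletes it, -p selects the port.
ACTIONS = {"load": "-l", "delete": "-x"}


# Hypothetical helper; builds the same commands as the examples above.
def hub_command(action, files, jar="server.jar", port=None):
    """Build the java command line that loads or deletes data files on a hub."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    cmd = ["java", "-jar", jar]
    if port is not None and port != 7222:
        # Only needed when the hub is not on the default 7222 port.
        cmd += ["-p", str(port)]
    cmd += [ACTIONS[action], *files]
    return cmd


# Print a shell-ready command; pass the list to subprocess.run(...) to execute it.
print(shlex.join(hub_command("load", ["/path/to/root/file.tsv"], port=7224)))
```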

Viewing data from the hub

Go to the Data Hub page here, and add "https://computer-external-ip:7223".

You can now go to the visualization and add a cohort or study listed in your hub.

If you don't have a security certificate yet

If you don't have a security certificate yet but would like to verify that the hub is working, you can use ssh tunneling. An example of how to do this for AWS is below, where it is assumed that the Xena Hub is running on port 7222 for HTTP and 7223 for HTTPS. In this scenario, you start the hub without using the --certfile and --keyfile options.

Assuming that you typically ssh into EC2 on AWS like this,

ssh -i "xena.pem" ec2-user@ec2-11-111-11-111.compute-1.amazonaws.com

you will now set up an ssh tunnel to port 8000 on your computer. To do this, add the -L option:

ssh -i "xena.pem" -L 8000:localhost:7222 ec2-user@ec2-11-111-11-111.compute-1.amazonaws.com

Now on your computer, http://localhost:8000 is the same as the http://aws-ip:7222. Chrome Browser does not allow a connection to http://aws-ip:7222, but it will allow a connection to http://localhost:8000.

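After setting up the ssh tunnel, go to the Data Hub page here and add "http://localhost:8000".

How to set up my hub to have a URL like https://tcga.xenahubs.net

Alternatively, you can run the hub behind a reverse proxy, and attach the certificate and keyfile to Apache, Nginx or AWS load balancer configurations. In this scenario, you start the hub without using the --certfile and --keyfile options. This is useful if you want your hub to have a URL like "https://tcga.xenahubs.net". You set up your DNS to point the hostname (tcga.xenahubs.net) to the IP address of the server on which the hub is running.

An example Apache configuration on an AWS VM, in /etc/httpd/conf/httpd.conf:

<VirtualHost *:443>
    ServerName tcga.xenahubs.net
    SSLEngine on
    SSLProxyEngine On
    SSLProxyVerify none
    SSLProxyCheckPeerCN off
    SSLProxyCheckPeerName off
    SSLProxyCheckPeerExpire off
    SSLCertificateFile YOURCERTIFICATE
    SSLCertificateKeyFile YOURKEY
    # setup the proxy (ProxyPass and ProxyPassReverse must name the same backend)
    ProxyPreserveHost On
    ProxyPass / https://localhost:9000/
    ProxyPassReverse / https://localhost:9000/
</VirtualHost>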

A landing page for my hub

If you have a markdown file called $DOCROOT/meta/info.mdown in your hub's document root directory, it will serve as a splash page for your hub. An example is the UCSC Toil RNA-seq Recompute hub: https://toil.xenahubs.net. The corresponding markdown file is this.

How do I add a 'Launch Xena' button like the TOIL landing page

<button class="hubButton" data-cohort="TCGA TARGET GTEx">Launch Xena</button>

To add a clickable button in the hub landing page, make sure the button has classname 'hubButton'. You also need to specify the cohort to view, defined by the data parameter 'data-cohort'. Once users click the button, the visualization wizard will be launched to the specified cohort. You can change the button label.

A landing page for my cohort

You can also have a landing page for a study cohort. An example is the TCGA TARGET GTEx cohort: https://xenabrowser.net/datapages/?cohort=TCGA%20TARGET%20GTEx. The corresponding markdown file is this. The study cohort landing page is also a markdown file, which must be hosted in the https://github.com/ucscXena/cohortMetaData repository on GitHub, at https://github.com/ucscXena/cohortMetaData/cohort_$cohortName/info.mdown.
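The data-page links used in this guide put a URL-encoded cohort name or hub address into a query parameter. A small sketch using the standard `urllib.parse.quote`; the function names are hypothetical, and the base URL is taken from the example link above.

```python
from urllib.parse import quote

DATAPAGES = "https://xenabrowser.net/datapages/"


# Hypothetical helpers; they only reproduce the URL pattern of the links in this guide.
def cohort_page_url(cohort):
    """Landing page URL for a study cohort (spaces become %20)."""
    return f"{DATAPAGES}?cohort={quote(cohort, safe='')}"


def hub_page_url(hub):
    """Data page URL for a hub (':' and '/' in the hub address are escaped)."""
    return f"{DATAPAGES}?host={quote(hub, safe='')}"


print(cohort_page_url("TCGA TARGET GTEx"))
print(hub_page_url("https://local.xena.ucsc.edu:7223"))
```

`safe=''` matters for the hub address: the default keeps '/' unescaped, which would not match the encoded form Xena's own links use.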

How do I add a "Launch" button like the TCGA TARGET GTEx landing page

<button class="cohortButton" data-bookmark="bc7f3f46b042bcf5c099439c2816ff01">Example: compare FOXM1 expression</button>

The button must have the class name 'cohortButton'. If you set the data parameter 'data-bookmark', clicking the button takes the user to the bookmark view. If you don't set the 'data-bookmark' parameter, clicking the button takes the user to the visualization wizard with an empty spreadsheet. You can change the button label. You can add as many buttons as you want.

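Data format specifications and supported biological data types

There are 2 basic data formats and 2 advanced data formats. Each of these formats has one or more biological data types that it supports.

General Specifications for all data formats

We support most types of genomic and phenotype/clinical/sample annotations. For genomic data we support calls made on the raw data, including but not limited to expression calls, mutation calls, etc. This is what TCGA calls ‘Level 3’ data and is typically a value on a gene, transcript, probe, etc. We do not support FASTQ, BAMs, or other ‘raw’ files. Please contact us if you have any questions.

We support tab-delimited and Microsoft Excel files (.xlsx and .xls). Tab-delimited files generally have a file name ending in .tsv or .txt, though we do not require this. Note that we load tab-delimited files much faster than Excel files. You can export a Microsoft Excel file as a tab-delimited file using the 'Save as ...' function.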

Please be careful when using Microsoft Excel to open files with gene names, as Microsoft Excel will automatically convert some gene names into dates. For more information see: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
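If a file has already been through Excel, a quick heuristic scan can flag identifiers that look like converted dates (for example, the gene symbol SEPT2 becoming "2-Sep"). The pattern below is our assumption and covers only the common day-month and month-day forms; it is not an exhaustive check and not part of Xena.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
# Heuristic (assumed) pattern: forms such as "2-Sep" or "Mar-01", any letter case.
DATE_LIKE = re.compile(rf"^(?:\d{{1,2}}-(?:{MONTHS})|(?:{MONTHS})-\d{{1,2}})$", re.IGNORECASE)


# Hypothetical helper; run it on the first column of a genomic matrix.
def date_like_identifiers(identifiers):
    """Return identifiers that look like Excel has converted them to dates."""
    return [i for i in identifiers if DATE_LIKE.match(i.strip())]


print(date_like_identifiers(["TP53", "2-Sep", "ALK", "1-Mar"]))
```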

Please do not have any duplicate genes/probes/identifiers or samples. We will allow you to load data with duplicates, but will only display the first one encountered in the file.

We assume you use a '.' to indicate a decimal place, as opposed to a ','.
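The duplicate and decimal-separator rules above can be checked before loading. A minimal sketch for a tab-delimited matrix with samples in the header row and identifiers in the first column; the function name and the comma-decimal pattern are assumptions for illustration (a thousands separator such as "1,000" would also be flagged).

```python
import re

# Assumed pattern: a number written with a comma as the decimal separator, e.g. "0,137".
COMMA_DECIMAL = re.compile(r"^-?\d+,\d+$")


# Hypothetical helper; assumes samples are columns and identifiers are rows.
def check_matrix(text):
    """Report duplicate identifiers/samples and comma decimals in a TSV matrix."""
    rows = [line.split("\t") for line in text.splitlines() if line]
    if not rows:
        raise ValueError("empty matrix")
    header, body = rows[0], rows[1:]

    def duplicates(values):
        seen, dups = set(), []
        for v in values:
            if v in seen and v not in dups:
                dups.append(v)
            seen.add(v)
        return dups

    return {
        # Only the first occurrence of a duplicate would be displayed.
        "duplicate_samples": duplicates(header[1:]),
        "duplicate_identifiers": duplicates(r[0] for r in body),
        "comma_decimals": [v for r in body for v in r[1:] if COMMA_DECIMAL.match(v)],
    }


example = "Sample\tS1\tS2\nTP53\t0.1\t0,5\nTP53\t0.2\t0.3\n"
print(check_matrix(example))
```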

Here is a folder with example data in addition to the examples below.

Basic Genomic data: numbers in a rectangle/matrix/spreadsheet

These are numeric data called on genomic regions (e.g. exon expression or gene-level copy number). This data is in a rectangle where samples are columns and rows are the genomic regions (e.g. HUGO gene symbol, transcript ID, probe ID, etc). We also support samples as rows and genomic regions as the columns (i.e. the opposite orientation). For supported genomic regions, please see supported gene and probe names.

Supported data types

  • RNA-seq expression (exon, transcript, gene, etc)

  • Array-based expression (probe, gene, etc)

  • Gene-level mutation

  • Gene-level copy number

  • DNA methylation

  • RPPA

  • and more ...

Contact us if you're unsure whether we will support your data.


For samples that do not have expression for a particular gene, either have a blank field or use "NA".

An example of a genomic matrix file (in this case, expression):

Sample    TCGA-BA-4074-01    TCGA-BA-4075-01    TCGA-BA-4076-01
ACAP3     0.137              NA                 0.022
CTRT2     0.024              0.805              0.256
ALK       0.098              0.805              1.87
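Both orientations are accepted when loading through the website, but the command-line loader expects 'genomicMatrix' files with samples as columns (see Loading data from the command line). If your file has samples as rows, a short transpose like the sketch below converts it; it uses only the standard library, the function name is hypothetical, and missing values ("NA" or blank) are carried over unchanged.

```python
# Hypothetical helper; swaps the orientation of a rectangular TSV matrix.
def transpose_tsv(text):
    """Swap rows and columns of a rectangular tab-delimited matrix."""
    rows = [line.split("\t") for line in text.splitlines() if line]
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("matrix is not rectangular")
    # zip(*rows) yields the columns of the original as tuples.
    return "\n".join("\t".join(col) for col in zip(*rows)) + "\n"


samples_as_rows = (
    "Sample\tACAP3\tCTRT2\n"
    "TCGA-BA-4074-01\t0.137\t0.024\n"
    "TCGA-BA-4075-01\tNA\t0.805\n"
)
print(transpose_tsv(samples_as_rows))
```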

Basic Phenotypic data: categories or non-genomic in a rectangle/matrix/spreadsheet

These are data on a sample or patient that are categorical in nature (e.g. Tumor Stage or 'wild type' or 'mutant' for a gene) or numerical but non-genomic (e.g. age or a genomic signature). Samples can be rows with phenotype/clinical/sample fields as columns, or vice versa. We support both orientations.

Supported data types

  • phenotype/patient/clinical data (age, weight, if there was blood drawn, etc)

  • sample/aliquot data (where it was sequenced, tumor weight, etc)

  • derived data (regulon activity for a gene, etc)

  • genomic signatures (EMT signature score, stemness score, etc)

  • other (whether a sample has an ERG-TMPRSS2 fusion, whether a sample has WGS data available, etc)

This is our most flexible data type. If you are wondering if your data is considered to be 'phenotypic' please contact us.

Categorical vs numerical data

We support both numerical and categorical data. For numerical data please use a blank field for any samples which may be missing data. For categorical data you can use a blank field or "NA" for any samples which may be missing data.


Note that if you use "NA" for a missing numerical field then the Xena software will automatically treat that column as a category.

To have it be treated as a numerical field please use a blank field.
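The rule above can be expressed as a small check. This is a sketch of the behavior as stated here, not Xena's actual parser: a column counts as numerical only if every non-blank value parses as a number, so a single "NA" makes it categorical. `blank_out_na` is a hypothetical convenience that rewrites "NA" to a blank field in an otherwise numeric column.

```python
# Hypothetical helpers approximating the stated rule; not Xena's own code.
def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def column_kind(values):
    """Classify a phenotype column: blanks are missing, anything else must be numeric."""
    present = [v.strip() for v in values if v.strip() != ""]
    if present and all(is_number(v) for v in present):
        return "numerical"
    return "categorical"


def blank_out_na(values):
    """Replace "NA" with a blank field if the remaining values are all numeric."""
    others = [v for v in values if v.strip() not in ("", "NA")]
    if others and all(is_number(v) for v in others):
        return ["" if v.strip() == "NA" else v for v in values]
    return list(values)


ages = ["63", "54", "NA"]
print(column_kind(ages))                # categorical because of "NA"
print(column_kind(blank_out_na(ages)))  # numerical once "NA" is blank
```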


For more information about configuring your phenotype fields, such as controlling the order for categorical features, please see our Metadata Specifications.

An example of a phenotype matrix file:

sample             ER_status    disease_status          age
TCGA-BA-4074-01    positive     complete remission      63
TCGA-BA-4082-01    positive     complete remission      54
TCGA-BA-4078-01    negative     undergoing treatment    65

Advanced Segmented data

For segmented data, we require the following 5 columns: sample, chr, start, end, and value. Note that your column headers must be these names exactly!

Please use 'NA' to indicate no data.

Supported data types

  • copy number

We currently accept hg38, hg19, hg18 coordinates.

Example segmented copy number data with required columns:

sample             chr     start       end          value
TCGA-V4-A9EL-01    chr1    61735       16815530     0.041
TCGA-V4-A9EL-01    chr1    16816090    17190862     -0.4227
TCGA-V4-A9EF-01    chr4    86979944    115173700    0.0414
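A quick pre-load check for the segmented format, based only on the requirements stated above: the five exact column names, integer start and end, and a numeric value or 'NA'. The function name is hypothetical, and Xena's own loader may apply additional rules.

```python
import re

REQUIRED_SEGMENT_COLUMNS = ["sample", "chr", "start", "end", "value"]
INTEGER = re.compile(r"-?\d+")


# Hypothetical pre-load check; Xena's own loader may apply other rules.
def check_segments(text):
    """Return a list of problems found in a tab-delimited segmented file."""
    lines = [line for line in text.splitlines() if line]
    if not lines:
        return ["empty file"]
    header = lines[0].split("\t")
    # Header names must match exactly, including case.
    missing = [c for c in REQUIRED_SEGMENT_COLUMNS if c not in header]
    if missing:
        return [f"missing column: {c}" for c in missing]
    idx = {c: header.index(c) for c in REQUIRED_SEGMENT_COLUMNS}
    problems = []
    for n, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) < len(header):
            problems.append(f"row {n}: expected {len(header)} fields")
            continue
        for col in ("start", "end"):
            if not INTEGER.fullmatch(fields[idx[col]]):
                problems.append(f"row {n}: {col} is not an integer")
        value = fields[idx["value"]]
        if value != "NA":
            try:
                float(value)
            except ValueError:
                problems.append(f"row {n}: value is neither a number nor NA")
    return problems


example = "sample\tchr\tstart\tend\tvalue\nTCGA-V4-A9EL-01\tchr1\t61735\t16815530\t0.041\n"
print(check_segments(example))
```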

Advanced Positional data

For positional data, we require 6 columns: sample, chr, start, end, reference, alt. Note that your column headers must be these names exactly!

Other columns that may follow are: gene, effect, DNA_VAF, RNA_VAF, and Amino_Acid_Change. These other columns are not required but will enhance the visualization of this data. For example, the "gene" column enables displaying mutations when queried by gene name, in addition to genomic coordinates, and the "effect" column colors the mutations by effect (the default color is gray). The effect terms include "Nonsense" (red), "Frameshift" (red), "Splice" (orange), "missense" (blue), "Silent" (green), and others. The full list of accepted terms can be found here in our code.

Note that Xena will not call the gene, variant effect, etc. for you. All gene annotation information must be included in the file.

Supported data types

  • mutation data

We currently accept hg38, hg19, hg18 coordinates.

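Example mutation data with the six required columns, plus the gene column:

sample             chr     start        end          reference    alt    gene
TCGA-AB-2802-03    chr2    29917721     29917721     G            A      ALK
TCGA-AB-2802-03    chr1    119270684    119270687    TTAAA        T      MYC
TCGA-AB-2867-03    chr1    150324146    150324146    T            G      PRPF3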

To specify that a sample was assayed but no mutation was detected, you need a line in the file with three columns filled: sample, start, end. "start" and "end" must be integers (if left empty, the data loader will reject the file), so use -1 to indicate that these are placeholder coordinates. The rest of the columns are empty strings.
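For example, such a placeholder row can be generated as below. The sketch assumes the column order of the mutation example on this page (the six required columns plus gene); pass your own column list if your file differs. The helper name is hypothetical.

```python
# Column order assumed from the mutation example on this page.
POSITIONAL_COLUMNS = ["sample", "chr", "start", "end", "reference", "alt", "gene"]


# Hypothetical helper; not part of Xena.
def no_mutation_row(sample, columns=POSITIONAL_COLUMNS):
    """Tab-delimited row saying `sample` was assayed but has no mutations.

    start and end must be integers, so -1 marks them as placeholder
    coordinates; every other column is left as an empty string.
    """
    values = {"sample": sample, "start": "-1", "end": "-1"}
    return "\t".join(values.get(col, "") for col in columns)


print(repr(no_mutation_row("TCGA-AB-2867-03")))
```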

Advanced Other data

We support a number of other specialty data types such as structural variants. Please contact us if you have this data so we can help you load it.

Probes/transcripts/identifiers we recognize

We will automatically detect and map your probes/transcripts/identifiers to HUGO gene names. For instance, we will map Affy probe IDs to HUGO gene names so that you can enter a HUGO gene name when creating a column in the Visual Spreadsheet and we will pull up the corresponding Affy probes.


You can still load your data if you do not see your identifiers listed. We will just not map them to HUGO genes for you. This means that in the visualization you will need to enter your identifiers as they appear in your file.

Supported probes and other identifiers
  • Affy U133 array (hg19) e.g. 1007_s_at

  • Affy HumanExon1.0ST (hg18) e.g. 2315101

  • Affy Human Gene 1.0 ST array (hg19) e.g. 7896736

  • Affy Human SNP6 array (hg18) e.g. CN_473963

  • Agilent Human gene expression 4X44K array (hg18) e.g. A_23_P100001

  • Agilent SurePrint G3 Human CGH array 2x400K (hg18) e.g. A_16_P01651995

  • Agilent Human 1A array (hg18) e.g. A_23_P149050

  • Exon: GENCODE 19 e.g. ENSE00000327880.1

  • Infinium HumanMethylation27 array GDC version (hg38) e.g. cg00000292

  • Infinium HumanMethylation27 array TCGA legacy version (hg18) e.g. cg26211698

  • Infinium HumanMethylation450 array TCGA legacy version (hg19) e.g. cg13332474

  • Infinium HumanMethylation450 array GDC version (hg38) e.g. cg00000029

Supported genes and transcripts

  • HUGO: human gene symbol (hg18) e.g. TP53

  • HUGO: human gene symbol (hg19) e.g. TP53

  • HUGO: human gene symbol (hg38) e.g. TP53

  • Gene: Ensembl human genes (hg19) e.g. ENSG00000223972

  • Gene: Ensembl human genes (hg38) e.g. ENSG00000223972

  • Gene: GENCODE 19 e.g. ENSG00000223972.4

  • Gene: GENCODE 22 comprehensive e.g. ENSG00000223972.5

  • Gene: GENCODE 23 comprehensive e.g. ENSG00000223972.5

  • Gene: GENCODE 23 basic e.g. ENSG00000223972.5

  • Gene: UCSC Known genes (hg18) e.g. uc001aaa.1

  • Gene: UCSC Known Genes (hg19) e.g. uc001aaa.1

  • Transcript: GENCODE 19 comprehensive e.g. ENST00000456328.2

  • Transcript: GENCODE 23 comprehensive e.g. ENST00000456328.2

  • Transcript: GENCODE 23 basic e.g. ENST00000456328.2

  • Transcript: RefSeq (hg19) e.g. NM_000014

  • miRNA miRBase v13 stem-loop (hg18) e.g. hsa-mir-1977

  • miRNA miRBase v20 stem-loop (hg19) e.g. hsa-mir-1302-2

Contact us if you don't see your gene or probe names in this list, and we may be able to add them for you.

FAQ

FAQ: Xena didn't map the right probes

If it looks like we picked the wrong set of probes, please click 'Advanced' next to the 'Import' button on the last screen of the wizard to load data. You can then pick the appropriate probes.

FAQ/Troubleshooting Guide

I ran into an Apple warning about an “unidentified developer” when installing Xena. What do I do?

There are a couple of options. You can right-click the .dmg and choose 'Open'. You can also press the Control key, then click the app icon, then choose Open from the shortcut menu. These help pages might help: http://www.iclarified.com:8081/28180/how-to-open-applications-from-unidentified-developers-in-mac-os-x-mountain-lion and from Apple: https://support.apple.com/kb/PH25088?locale=en_US

I see my probes/genes/transcripts when loading my data, but I don't know whether to choose hg18, hg19 or hg38?

The only time the assembly matters is if you decide to visualize part or all of a chromosome, rather than a gene/probe/transcript. If you want to visualize only genes/probes/transcripts, then it does not matter which assembly you choose.

I don't see my study listed in the visualization

Your Local Xena Hub must be running to view any data that you have loaded into it. Please ensure it is started up. You can also check which studies are on your hub and what data is in them by going to the My Computer Hub page: xenabrowser.net/datapages/?host=https%3A%2F%2Flocal.xena.ucsc.edu%3A7223

You also may not see your study if the hub is still loading the data. Wait a few minutes and refresh the page.

I'm entering a gene name but it only draws gray

When you loaded your genomic data we asked what type of genes, transcripts or probes you used. If you selected one of the options from the drop down menu then you can enter HUGO gene names or the identifiers in your file. If you did not select one of the options then you will need to enter the identifiers as they appear in your file.

Can I load two or more phenotype files to the same study?

Yes, we will allow you to select phenotypes from both files in the visualization.

Help! My file is too large to open in Microsoft Excel.

You might be able to load your file anyway, depending on the format. Give it a try, and if you are unable to load it, email us and we may be able to fix your file for you.

How do I convert my .xls or .xlsx into a tab-delimited file?

You can export a Microsoft Excel file as a tab-delimited file using the 'Save as ...' function.

Please be careful when using Microsoft Excel to open files with gene names, as Microsoft Excel will automatically convert some gene names into dates. For more information see: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7

Unix vs DOS

We require that your data files have Unix line endings. To ensure that your files have this line ending on DOS/Windows, please follow the help here: Today I Learned: Change DOS to Unix text file format in VIM (Hashrocket).

Note that this requirement is only for data files, not for the associated .json files.
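If you prefer a script to an editor, the conversion can also be done in Python. A minimal sketch, assuming your data file is UTF-8 encoded; the function names are hypothetical. Opening the file with `newline=""` turns off Python's own newline translation, so the carriage returns are seen on read and nothing is added back on write.

```python
from pathlib import Path


# Hypothetical helpers; assume UTF-8 text files.
def to_unix(text):
    """Convert DOS (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def convert_file(path):
    """Rewrite a data file in place with Unix line endings."""
    p = Path(path)
    with p.open("r", encoding="utf-8", newline="") as f:
        text = f.read()
    with p.open("w", encoding="utf-8", newline="") as f:
        f.write(to_unix(text))


# Example: convert_file("my_first_dataset.tsv")
```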


Loading data from the command line

In addition to the data itself, we require some metadata about your file. When you use our website to load your data we fill in this metadata for you. When you use the command line, you will need to provide this data in an additional file.

Metadata file requirements

The metadata file is a .json file and follows JSON formatting. The metadata .json file needs to be in the same directory as the data file. The metadata file must have the same name as the data file, including any file extension, followed by .json (e.g. my_first_dataset and my_first_dataset.json OR my_second_dataset.txt and my_second_dataset.txt.json).

There are two required fields: type and cohort.

Type

Type can be:

  • 'genomicMatrix' -> genomic data, where samples are columns and genomic regions are rows. Note that for loading on the command line we do not support the other orientation.

  • 'clinicalMatrix' -> phenotypic data, where samples are rows and phenotypic fields are columns. Note that for loading on the command line we do not support the other orientation.

  • 'mutationVector' -> mutation data

  • 'genomicSegment' -> segmented copy number data

Example:

{"type":"mutationVector"}

Cohort

Cohort tells Xena whether there are other data on the samples that you are loading. You can either specify a pre-existing cohort or create your own. Cohort names are displayed on the dataset pages and in the cohort drop-down menu on the Heatmaps page.

For existing cohorts, you need to enter the cohort name EXACTLY as it appears as the existing cohort name. Note that our cohort names are case sensitive.

Example

{"type":"mutationVector",
 "cohort":"TCGA Breast Cancer"}

Reference

If you are loading a mutation or segmented copy number file, you will also need to specify the reference genome. You do not need to specify this for other file types.

Example

{"type":"mutationVector",
 "cohort":"TCGA Breast Cancer",
 "assembly":"hg19"}
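When preparing many files, the metadata can be generated rather than typed. The sketch below builds the .json for a data file from the fields described on this page (type, cohort, and assembly for mutation and segmented files) and names it after the data file plus ".json". The helper names are hypothetical, and only these keys are covered; see the Metadata Specification for the others.

```python
import json
from pathlib import Path

DATA_TYPES = {"genomicMatrix", "clinicalMatrix", "mutationVector", "genomicSegment"}
NEEDS_ASSEMBLY = {"mutationVector", "genomicSegment"}


# Hypothetical helpers; cover only the type, cohort and assembly fields.
def metadata_for(data_path, data_type, cohort, assembly=None):
    """Return (json_path, metadata dict) for a data file loaded from the command line."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unsupported type: {data_type!r}")
    meta = {"type": data_type, "cohort": cohort}
    if data_type in NEEDS_ASSEMBLY:
        if assembly is None:
            raise ValueError(f"{data_type} files need a reference assembly, e.g. 'hg19'")
        meta["assembly"] = assembly
    # Same name including the extension, plus .json: file.txt -> file.txt.json
    json_path = Path(str(data_path) + ".json")
    return json_path, meta


def write_metadata(data_path, data_type, cohort, assembly=None):
    """Write the metadata file next to the data file and return its path."""
    json_path, meta = metadata_for(data_path, data_type, cohort, assembly)
    json_path.write_text(json.dumps(meta), encoding="utf-8")
    return json_path


path, meta = metadata_for("my_second_dataset.txt", "mutationVector", "TCGA Breast Cancer", "hg19")
print(path, json.dumps(meta))
```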

Probemap

If you are loading a file that has probes, transcripts, or exons and you would like to query your data by gene, you will need to provide a mapping file. You do not need to specify this for other file types.

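Here is an example probemap file (a delimited file): https://toil.xenahubs.net/download/probeMap/gencode.v23.annotation.gene.probemap

#id    gene    chrom    chromStart    chromEnd    strand
id_1    AADACL3    chr1    12776118    12776347    +

We have many probemap files that you can see via our xenaPython package:

import xenaPython
host = "https://reference.xenahubs.net"
xenaPython.probemap_list(host)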

If you do not see a probemap that will work for you, please let us know.

To reference a probemap you need three files:

  1. Include the probemap reference in your data file .json

  2. Have the probemap file in the same directory as your data file and data file .json

  3. Also have a .json file for the probemap so that we know how to load it


Note that to reference a probemap you need to load the probemap first, then load the data file.

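Example data file .json

{"type":"genomicMatrix",
 "cohort":"TCGA Breast Cancer",
 ":probeMap":"/unc_v2_exon_hg19_probe_TCGA"}

Example probemap

https://toil.xenahubs.net/download/probeMap/gencode.v23.annotation.gene.probemap

Example probemap .json file (required to be in the same folder as the probemap)

{"type":"probeMap",
 "assembly":"hg19"}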
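To illustrate what a probemap provides, the sketch below parses the layout shown above (#id, gene, chrom, chromStart, chromEnd, strand), assuming tab-delimited fields, and inverts it so a gene name finds its probes. This is roughly the mapping that lets you query probe-level data by gene; it is a simplified illustration with hypothetical function names, not Xena's loader.

```python
from collections import defaultdict


# Hypothetical helpers; assume a tab-delimited probemap with one gene per row.
def parse_probemap(text):
    """Parse a probemap into {probe id: record}."""
    probes = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the "#id ..." header
        probe_id, gene, chrom, start, end, strand = line.split("\t")[:6]
        probes[probe_id] = {
            "gene": gene, "chrom": chrom,
            "start": int(start), "end": int(end), "strand": strand,
        }
    return probes


def probes_by_gene(probes):
    """Invert the map so a gene name finds all of its probes."""
    index = defaultdict(list)
    for probe_id, rec in probes.items():
        index[rec["gene"]].append(probe_id)
    return dict(index)


example = (
    "#id\tgene\tchrom\tchromStart\tchromEnd\tstrand\n"
    "id_1\tAADACL3\tchr1\t12776118\t12776347\t+\n"
)
print(probes_by_gene(parse_probemap(example)))
```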

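More information about the metadata

See the Metadata Specification.

Commands to load data

Put both your .tsv and .json files in your_home_directory/xena/files. Then run the jar, passing in the file name, like so:

java -jar cavm-0.xx.0-standalone.jar -l ~/xena/files/*
→ loads all files

OR

java -jar cavm-0.xx.0-standalone.jar -l ~/xena/files/file1.tsv
→ loads just file1.tsv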

Note that you will need to substitute the name of the .jar file. As of the time of writing (September 20, 2018), the name of the .jar file was cavm-0.22.0-standalone.jar. On Linux this will be in the directory where you opened the archive. On Windows or macOS, use your operating system’s file search capability to search for cavm*jar. On Windows you will need to use the full path to your home directory instead of “~”.

Note you do not need to load the .json files. Xena will automatically look for these and load them.

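Commands to delete data

java -jar cavm-0.xx.0-standalone.jar -x ~/xena/files/file1.tsv
→ deletes just file1.tsv

java -jar cavm-0.xx.0-standalone.jar -x ~/xena/files/file1.tsv ~/xena/files/file2.tsv
→ deletes file1.tsv and file2.tsv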

Help

You can always type:

java -jar cavm-0.xx.0-standalone.jar -h

for help.

