Kaplan–Meier survival analysis is a way of comparing the survival of groups of patients. More information on what a Kaplan–Meier analysis is can be found in this article.
To generate a KM plot, click on the column menu at the top of a column and choose 'Kaplan Meier Plot'.
For numerical or continuous features, you will have the option of having 2 groups of samples, 3 groups of samples, or viewing the upper vs lower quartile. For 2 groups, we divide the samples on the median. For 3 groups, we divide samples into the upper third, middle third, and lower third.
When viewing the upper vs lower quartile, note that we only include samples whose value is strictly greater than (not greater than or equal to) the upper quartile, or strictly less than (not less than or equal to) the lower quartile.
Note that all samples are used to calculate the median and other dividing values, whether or not they have survival data. To see which samples have survival data, add the column 'OS' from the phenotype data.
If more than one sample has the same value, we put the samples in a group together, even if this means the groups end up being unequal in size.
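The grouping rules above can be sketched in a few lines of Python. This is only an illustration, not the Browser's code: the function name `assign_groups`, the mode labels, the use of `statistics.quantiles` with the "inclusive" method, and the choice to put values equal to a cut point in the lower group are all assumptions. What it does take from the text is that cut points are computed from every non-null sample, that tied values stay together, and that the quartile view uses strict inequalities.

```python
from statistics import median, quantiles

def assign_groups(values, mode="2 groups"):
    """Return a group label (or None) for each value, in input order.

    Hypothetical helper: the quantile method and the side on which ties
    at a cut point fall are assumptions, not documented Xena behaviour.
    """
    # Cut points are computed from every non-null value.
    data = [v for v in values if v is not None]
    if mode == "2 groups":
        cut = median(data)
        label = lambda v: "upper" if v > cut else "lower"
    elif mode == "3 groups":
        low, high = quantiles(data, n=3, method="inclusive")
        label = lambda v: "upper" if v > high else ("lower" if v <= low else "middle")
    elif mode == "quartiles":
        q1, _, q3 = quantiles(data, n=4, method="inclusive")
        # Strict inequalities: values equal to a quartile are left out.
        label = lambda v: "upper" if v > q3 else ("lower" if v < q1 else None)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [None if v is None else label(v) for v in values]
```

With the values `[1, 2, 2, 2, 3, 4]`, the three samples tied at the median all land in the same group, so the two groups have four and two samples rather than three each.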
For categorical features, we only show the first 10 categories.
For mutation features, we divide samples into those with any mutation and those without. To make different groups (e.g. samples with nonsense mutations vs those without), create your own subgroups and run a KM plot on the new column.
We remove samples with 'null' data for all plots.
We default to Overall Survival. Users can select different end points if they are available. An example of this is in the TCGA PanCancer Study.
By default, the time cutoff is the last time any individual in the plot was known to be alive. You can change this to 1-year or 5-year survival by changing the time cutoff at the bottom of the screen. The statistics will automatically recalculate. TCGA data uses days as its unit of time.
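A conventional way to apply a time cutoff is to treat every sample still being followed at the cutoff as censored at that time. The sketch below shows that idea. It is an assumption about how a cutoff is typically handled, not a description of the Browser's implementation, and `apply_time_cutoff` is a hypothetical name.

```python
def apply_time_cutoff(times, events, cutoff):
    """Censor follow-up at `cutoff` (same unit as `times`, e.g. days for TCGA).

    Samples with time beyond the cutoff are truncated to the cutoff and
    marked as censored (event = 0); all other samples are unchanged.
    """
    cut_times, cut_events = [], []
    for t, e in zip(times, events):
        if t > cutoff:
            cut_times.append(cutoff)
            cut_events.append(0)
        else:
            cut_times.append(t)
            cut_events.append(e)
    return cut_times, cut_events
```

For data measured in days, a 1-year cutoff would be passed as `cutoff=365`. An event recorded at day 400 then no longer counts as an event within the first year.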
You can generate a high-quality PDF by clicking the PDF icon.
You can download the data used to generate the KM plot using the download icon. It will download the Event and Time to Event columns, in addition to the sample ID, patient ID, groups, and underlying data.
When there are multiple curves or lines in a KM plot, Xena Browser compares the different Kaplan–Meier curves using the log-rank test. The Browser reports the test statistics (χ²) and p-value (χ² distribution). Data is retrieved in real-time from Xena Hub(s) to a user's web browser and the test is performed in the browser to maintain your data privacy.
The statistics the Xena Browser reports are equivalent to those from survdiff in R's survival package with rho=0 (the default in R).
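For readers who want to see what the log-rank test computes, here is a minimal Python sketch of the textbook unweighted statistic (the rho=0 case) for exactly two groups. It is not the Browser's code, and the function name `logrank_two_groups` is hypothetical. It relies on the fact that a χ² p-value with one degree of freedom can be obtained from the complementary error function. More than two groups would need the general multi-group form, which is not shown.

```python
import math

def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Return (chi2, p) for the unweighted two-group log-rank test."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                # at risk, all
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)   # at risk, A
        d = sum(1 for ti, e, _ in data if ti == t and e)          # events, all
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi2 = obs_minus_exp ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 degree of freedom
    return chi2, p
```

A censored sample counts as at risk at every event time up to and including its own time, so it contributes to the denominators without ever adding to the event counts.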
If all patients in a particular group (i.e. line) are censored before any event happens for the whole population (including all the groups), we exclude this group from the statistical analysis and perform the log-rank test on the remaining groups. We do this because we have no way to know the number of people at risk for this particular group at any of the event times, and therefore cannot compute any statistics for this group. R handles this exception in the same way. Although this group is removed from the statistical analysis, we still display the group in the KM plot.
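The exclusion rule can be expressed as a small check: find the earliest event time across all groups, then keep only the groups that still have someone under observation at that time. The sketch below assumes that a sample censored at exactly the first event time still counts as at risk. The function name `groups_for_logrank` is hypothetical and the code is illustrative only.

```python
def groups_for_logrank(times, events, groups):
    """Return the set of group labels that can take part in the log-rank test."""
    event_times = [t for t, e in zip(times, events) if e]
    if not event_times:
        return set()  # no events at all, so there is nothing to test
    first_event = min(event_times)

    # Latest follow-up time seen in each group.
    last_seen = {}
    for t, g in zip(times, groups):
        last_seen[g] = max(t, last_seen.get(g, t))

    # A group whose follow-up ends before the first event is never at risk.
    return {g for g, t in last_seen.items() if t >= first_event}
```

A group dropped by this check would still be drawn in the plot. Only the test statistic and p-value are computed without it.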
Note that we do not automatically remove duplicate patients (for instance if there is a tumor and a normal sample from the same patient). You can determine if there are duplicate patients by looking for the "!" icon next to the p-value. Learn how to remove duplicate samples.
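If you have downloaded the data behind a plot, you can also check for duplicate patients yourself by counting how often each patient ID appears. The sketch below assumes you have already read the patient ID column into a list. The helper name `duplicated_patients` and the example IDs are made up for illustration.

```python
from collections import Counter

def duplicated_patients(patient_ids):
    """Return the sorted patient IDs that appear on more than one sample."""
    counts = Counter(patient_ids)
    return sorted(pid for pid, n in counts.items() if n > 1)

# Made-up IDs: PATIENT-A contributes two samples (e.g. tumor and normal).
print(duplicated_patients(["PATIENT-A", "PATIENT-A", "PATIENT-B"]))
```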