# Basic Tutorial: Section 2

## Description

This tutorial is for new Xena users who have completed Section 1 of the Basic Tutorial. We will cover how to filter to just the samples you are interested in, how to create subgroups, and how to run a Kaplan Meier survival analysis.

## Prerequisites

This tutorial assumes you have completed the [Basic Tutorial: Section 1](https://ucsc-xena.gitbook.io/project/tutorials/basic-tutorial-section-1) and begins where that section ends.

## Estimated time needed

**Part A**: 7 min

**Part B**: 15 min

**Part C**: 5 min

## Learning goals

**Part A**

* Search for samples of interest
* Remove samples with no data

**Part B**

* Make subgroups
* Rename subgroups

**Part C**

* Run a Kaplan Meier survival analysis
* Use a custom time endpoint

## Tutorial

In Basic Tutorial: Section 1 we found that samples from patients with aberrations in *EGFR* have relatively higher *EGFR* expression. These aberrations could be mutations or copy number amplifications.

Now we will look at whether patients with these aberrations in their samples also have a worse survival prognosis.

{% hint style="warning" %}
To ensure your columns are sorted the same way as those in this tutorial, [please start at this link](https://xenabrowser.net/?bookmark=373e6aac9ce49ce3420c95e81d1eb686).
{% endhint %}

### Part A

Our goal is to remove samples with no data (i.e., null values) from the view. This makes the view cleaner and removes irrelevant samples from our Kaplan Meier survival analysis.

#### [Ending Screenshot](https://xenabrowser.net/?bookmark=c892731ed5eb7dfe875146d0aca87bc3)

<figure><img src="https://3676322134-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LAsF1Wuj3-Bs3YBFd7f%2Fuploads%2FK6t8ORqGEbDbUXjhT2gC%2FScreenshot%202024-10-14%20at%205.05.09%E2%80%AFPM.png?alt=media&#x26;token=2f14714f-c941-412a-8f8b-172184f4c674" alt=""><figcaption></figcaption></figure>

#### Steps

1. Type 'null' into the samples search bar. This highlights samples that have 'null' values in any column on the screen. Null means there is no data for that sample in that column.
2. Click the filter menu and select 'Remove samples'.
3. Delete the search term.
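Conceptually, searching for 'null' and then removing the highlighted samples drops every sample that lacks data in at least one visible column. The Python sketch below illustrates that logic on a hypothetical table: the sample IDs, column names, and values are made up, `None` stands in for null, and this is not how Xena itself is implemented.

```python
# Hypothetical toy data: each sample maps visible column names to values.
# None stands in for "null" (no data for that sample in that column).
samples = {
    "TCGA-01": {"copy_number": 0.8, "expression": 5.1},
    "TCGA-02": {"copy_number": None, "expression": 3.2},
    "TCGA-03": {"copy_number": -0.1, "expression": None},
    "TCGA-04": {"copy_number": 0.1, "expression": 2.7},
}


def has_null(row):
    """True if any visible column lacks data for this sample."""
    return any(value is None for value in row.values())


# Step 1: searching 'null' highlights samples with a null in any column.
highlighted = [sample_id for sample_id, row in samples.items() if has_null(row)]

# Step 2: 'Remove samples' drops the highlighted samples from the view.
kept = {sample_id: row for sample_id, row in samples.items() if not has_null(row)}

print(highlighted)   # ['TCGA-02', 'TCGA-03']
print(sorted(kept))  # ['TCGA-01', 'TCGA-04']
```

A sample with data in every column but one is still removed, which is why this step can shrink the sample count noticeably when many columns are open.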

#### Video of steps

<figure><img src="https://3676322134-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LAsF1Wuj3-Bs3YBFd7f%2Fuploads%2FlEk086G6QJs6psnyQrTQ%2Ftutorial2.1.gif?alt=media&#x26;token=d2606ad0-10d3-482f-90cc-f264c5a1ebbc" alt=""><figcaption></figcaption></figure>

{% hint style="success" %}
More information

* [Filtering and subgrouping samples](https://ucsc-xena.gitbook.io/project/overview-of-features/filter-and-subgrouping)
* [Supported search terms](https://ucsc-xena.gitbook.io/project/overview-of-features/filter-and-subgrouping/supported-search-terms-for-finding-samples)
  {% endhint %}

#### Shortcut for Part A

Instead of typing 'null' and removing those samples from the view, you can also use the 'Remove samples with nulls' shortcut in the filter menu.

<figure><img src="https://3676322134-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LAsF1Wuj3-Bs3YBFd7f%2Fuploads%2Fqdfza73TZgfhPkKFy8ce%2Ftutorial2.2.gif?alt=media&#x26;token=e4b3930c-b5c0-4b63-817e-655cd68c313f" alt=""><figcaption></figcaption></figure>

### Part B

Our goal is to create two subgroups: patients whose samples have aberrations in *EGFR* and patients whose samples do not. We will then rename the subgroups.

#### [Ending Screenshot](https://xenabrowser.net/?bookmark=d29d9b70671ac2af3dda01a7448be38f)

<figure><img src="https://3676322134-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LAsF1Wuj3-Bs3YBFd7f%2Fuploads%2Fw7nXwW5pHX05Ci2llp4s%2FScreenshot%202024-10-14%20at%205.19.03%E2%80%AFPM.png?alt=media&#x26;token=55693cb5-b3af-4b11-89c1-04e07082dbe5" alt=""><figcaption></figcaption></figure>

#### Steps

1. Type '(mis OR inframe) OR B:>0.5' into the samples search bar. This selects samples that either have a missense mutation or an in-frame deletion ('(mis OR inframe)'), or whose copy number variation (column B) is greater than 0.5. Note that I chose the cutoff of 0.5 arbitrarily.

{% hint style="danger" %}
You must have the **copy number variation column as column B** for the search term '(mis OR inframe) OR B:>0.5' to work. The 'B' in 'B:>0.5' instructs Xena to search column B for values greater than 0.5.
{% endhint %}

2. Click the filter menu and select 'New subgroup column'. This creates a new column in which samples that match the search term are marked 'true' (i.e., those with an *EGFR* aberration) and samples that do not match are marked 'false' (i.e., those without an *EGFR* aberration).
3. Click the column menu for the column we just created (column B) and choose 'Display'.
4. Rename the display labels so that 'true' samples are labeled 'EGFR Aberrations' and 'false' samples are labeled 'No EGFR Aberrations'. Click 'Done'.
5. Delete the search term. This removes the black tick marks next to matching samples.
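The search term combines a text match on the mutation column with a numeric comparison on column B, and 'New subgroup column' turns the result into a true/false value per sample. The Python sketch below mimics that on hypothetical data. It assumes that a bare term such as 'mis' matches any mutation annotation containing it (case-insensitively) and that 'B:>0.5' is a strict greater-than comparison; the annotation strings and sample IDs are made up, and Xena's actual matching rules are described on the Supported search terms page.

```python
# Hypothetical samples remaining after Part A. "mutations" holds made-up
# EGFR mutation annotations; "B" is the copy number value in column B.
samples = [
    {"id": "S1", "mutations": ["missense_variant"], "B": 0.1},
    {"id": "S2", "mutations": [], "B": 0.9},
    {"id": "S3", "mutations": ["inframe_deletion"], "B": -0.2},
    {"id": "S4", "mutations": [], "B": 0.3},
    {"id": "S5", "mutations": ["synonymous_variant"], "B": 0.5},
]


def matches_term(sample, term):
    # Assumption: a bare term matches when it appears (case-insensitively)
    # inside any of the sample's mutation annotations.
    return any(term.lower() in m.lower() for m in sample["mutations"])


def has_egfr_aberration(sample):
    # Mirrors '(mis OR inframe) OR B:>0.5'.
    text_match = matches_term(sample, "mis") or matches_term(sample, "inframe")
    return text_match or sample["B"] > 0.5


# 'New subgroup column': one true/false value per sample.
subgroup = {s["id"]: has_egfr_aberration(s) for s in samples}

# 'Display': replace the true/false labels with readable names.
labels = {True: "EGFR Aberrations", False: "No EGFR Aberrations"}
display = {sample_id: labels[flag] for sample_id, flag in subgroup.items()}

for sample_id, label in display.items():
    print(sample_id, label)
```

In this sketch S5, with a copy number of exactly 0.5, lands in the 'No EGFR Aberrations' group because the comparison is strict.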

#### Video of step 1

<figure><img src="https://3676322134-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LAsF1Wuj3-Bs3YBFd7f%2Fuploads%2Fde13iVm6PescWx4ziSHc%2Ftutorial2.3.gif?alt=media&#x26;token=4bbace23-4920-4977-9339-206c09c50725" alt=""><figcaption></figcaption></figure>

#### Video of steps 2-4

<figure><img src="https://3676322134-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LAsF1Wuj3-Bs3YBFd7f%2Fuploads%2FgQFMOkAENgpqK11BU1Km%2Ftutorial2.4.gif?alt=media&#x26;token=30429c91-420d-4f7d-8385-a026957ba401" alt=""><figcaption></figcaption></figure>

{% hint style="success" %}
More information

* [Filtering and subgrouping samples](https://ucsc-xena.gitbook.io/project/overview-of-features/filter-and-subgrouping)
* [Supported search terms](https://ucsc-xena.gitbook.io/project/overview-of-features/filter-and-subgrouping/supported-search-terms-for-finding-samples)
  {% endhint %}

### Part C

Now that we have our subgroups, we will run a Kaplan Meier survival analysis. Note that TCGA survival data is recorded in days, so the x-axis is in days.

#### [Ending Screenshot](https://xenabrowser.net/?bookmark=6787986724f81fd87d64980ff8ef85e9)

<figure><img src="https://3676322134-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LAsF1Wuj3-Bs3YBFd7f%2Fuploads%2Fv9Tc7C7ZXzbNGSMra2Ed%2FScreenshot%202024-10-14%20at%205.22.24%E2%80%AFPM.png?alt=media&#x26;token=ebe761ca-7490-4917-b420-4e31d8ec9813" alt=""><figcaption></figcaption></figure>

The plot shows no meaningful difference in survival between patients whose samples have *EGFR* aberrations and those whose samples do not.
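Whether two Kaplan Meier curves differ is commonly judged with a log-rank test, which compares the deaths observed in one group with the deaths expected if both groups shared the same survival. The Python sketch below implements the standard two-group log-rank test as an illustration of that statistic; it is not Xena's code, and the survival times (in days) and event flags (1 = death, 0 = censored) are hypothetical.

```python
import math


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    events: 1 = death observed, 0 = censored (alive at last follow-up).
    """
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Samples still at risk just before time t in each group.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Deaths at exactly time t in each group.
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    statistic = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Chi-square with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical survival data for the two subgroups.
aberrant_times = [150, 400, 700, 900, 1500]
aberrant_events = [1, 1, 0, 1, 1]
other_times = [200, 350, 800, 1000, 1600]
other_events = [1, 0, 1, 1, 1]

stat, p = log_rank(aberrant_times, aberrant_events, other_times, other_events)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

A large p-value means the test found no detectable difference between the curves at this sample size.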

#### Steps

1. Click the column menu at the top of column B.
2. Choose 'Kaplan Meier Plot'.
3. Click 'Custom survival time cutoff' at the bottom of the Kaplan Meier plot.
4. Enter 3650, which is 10 years expressed in days.
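To show what the plot estimates and what a time cutoff changes, the Python sketch below computes a Kaplan Meier (product-limit) curve for one hypothetical subgroup. It assumes that a custom survival time cutoff censors any follow-up longer than the cutoff at the cutoff, so deaths after day 3650 no longer count; check the Kaplan Meier documentation for Xena's exact behavior. The times and events are made up.

```python
def apply_cutoff(times, events, cutoff):
    """Censor any follow-up longer than `cutoff` days at the cutoff.

    Assumption: deaths after the cutoff are treated as censored at the cutoff.
    """
    cut_times, cut_events = [], []
    for t, e in zip(times, events):
        if t > cutoff:
            cut_times.append(cutoff)
            cut_events.append(0)
        else:
            cut_times.append(t)
            cut_events.append(e)
    return cut_times, cut_events


def kaplan_meier(times, events):
    """Product-limit estimate as a list of (time, survival) steps.

    events: 1 = death observed, 0 = censored (alive at last follow-up).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = [(0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all samples that share this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored samples leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical survival data for one subgroup (times in days).
times = [200, 500, 500, 1200, 4000, 5000]
events = [1, 1, 0, 1, 1, 0]

cut_times, cut_events = apply_cutoff(times, events, cutoff=3650)  # 10 x 365 days
curve = kaplan_meier(cut_times, cut_events)
print([(t, round(s, 4)) for t, s in curve])
# [(0, 1.0), (200, 0.8333), (500, 0.6667), (1200, 0.4444)]
```

Without the cutoff, the death at day 4000 would add one more step to the curve; with a 3650-day cutoff that patient is counted as censored at 10 years. Note that 3650 days treats every year as 365 days and ignores leap days.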

#### Video of steps

<figure><img src="https://3676322134-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LAsF1Wuj3-Bs3YBFd7f%2Fuploads%2FqInMyvxYBMBEYRoYZQbG%2Ftutorial2.5.gif?alt=media&#x26;token=def966f5-4332-4307-a375-3940f0ba0d85" alt=""><figcaption></figcaption></figure>

{% hint style="success" %}
More information

* [Kaplan Meier survival analysis](https://ucsc-xena.gitbook.io/project/overview-of-features/kaplan-meier-plots)
  {% endhint %}

## Test your knowledge

{% tabs %}
{% tab title="Question 1" %}
Starting at the end of Part A, filter down to only those samples that have a missense mutation.
{% endtab %}

{% tab title="Answer 1" %}
Search term: "missense"

[**Ending screenshot**](https://xenabrowser.net/?bookmark=14d3d143685004f3c61e40c3cdaa4855)

<figure><img src="https://3676322134-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LAsF1Wuj3-Bs3YBFd7f%2Fuploads%2FPaFponCEZ6cpeF5hHQ62%2FScreenshot%202024-10-14%20at%205.25.12%E2%80%AFPM.png?alt=media&#x26;token=36055fea-8624-401b-90b1-62eb3d1ac15c" alt=""><figcaption></figcaption></figure>
{% endtab %}
{% endtabs %}

{% tabs %}
{% tab title="Question 2" %}
Starting at the end of Part A, create two subgroups: samples with *EGFR* expression greater than 4 and samples with *EGFR* expression of 4 or less.
{% endtab %}

{% tab title="Answer 2" %}
Search term: "C:>4"

[**Ending screenshot**](https://xenabrowser.net/?bookmark=5bc9a8c614a34134a56b51df73c9aaa2)

<figure><img src="https://3676322134-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LAsF1Wuj3-Bs3YBFd7f%2Fuploads%2F8jfIGen0lpNP5TpJqWWC%2FScreenshot%202024-10-14%20at%205.27.19%E2%80%AFPM.png?alt=media&#x26;token=274eacb8-1f7e-481f-8f88-b3c45d5bb108" alt=""><figcaption></figcaption></figure>
{% endtab %}
{% endtabs %}

{% tabs %}
{% tab title="Question 3" %}
Starting at the end of Part A, run a Kaplan Meier analysis on the *EGFR* expression column.
{% endtab %}

{% tab title="Answer 3" %}
[**Ending screenshot**](https://xenabrowser.net/?bookmark=8c5344fd189aa168bd4e8495e2c42963)

<figure><img src="https://3676322134-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LAsF1Wuj3-Bs3YBFd7f%2Fuploads%2FpLPSx9BQq7HB69oKy73I%2FScreenshot%202024-10-14%20at%205.28.48%E2%80%AFPM.png?alt=media&#x26;token=403558c7-352f-4bd5-826b-47200fe86eff" alt=""><figcaption></figcaption></figure>
{% endtab %}
{% endtabs %}
