Characterizing the Cancer Genome in Lung Adenocarcinoma
A presentation by Peter Belenky
Based on Weir et al., Characterizing the cancer genome in lung adenocarcinoma. Nature, December 2007.
Genetics 144: Oncogenomics
Dartmouth Medical School
Course Director: Charles Brenner, Ph.D.
Spring 2008
Overview
The development of cancer is defined by an accumulation of mutations that provide a growth advantage to a particular cell. A strong argument can be made that the progression of cancer is analogous to the Darwinian evolution seen in the zoological world. Cancer development is not an organized process, and as a result each tumor is likely to have a largely unique combination of genetic alterations. Two separate adenocarcinoma tumors diagnosed by the same pathologist as being at the same stage and originating from the same cell subtype are expected to have highly divergent genetic fingerprints. However, within these highly variable genetic alterations there are numerous common events that appear in large subsets of tumors.
Identifying the commonalities between tumors can shed light on the biology of cancer development, provide targets for therapy, and help improve diagnosis. Genome-wide searches for genetic alterations are an increasingly popular method for identifying commonalities between cancers. The paper presented on this web page uses single nucleotide polymorphism (SNP) arrays to characterize copy-number alterations in lung adenocarcinoma. Weir et al. found 26 large-scale events and 31 focal events that were prevalent in their tumor samples. Focal events contained multiple known tumor suppressors and oncogenes, in addition to some novel candidate tumor suppressors and oncogenes. The most common amplification (14q13.3) contained the proposed transcription factor NKX2-1. NKX2-1 is overexpressed in multiple lung cancer cell lines, where it improves cell viability and anchorage-independent cell growth.

Introduction
Primary lung adenocarcinoma is a non-small cell lung cancer (NSCLC) affecting the peripheral tissues of the lung. Lung adenocarcinoma accounts for 30-40% of all lung cancers. This cancer is strongly correlated with smoking; however, about 25% of all patients with lung adenocarcinoma have never smoked. Surgical resection is the most common treatment, and it is often paired with chemotherapy or radiation therapy. Depending on the choice of treatment and the stage at which the cancer was diagnosed, the five-year survival rate can be anywhere from 50% to 2%. Closely adapted from Sun et al., Nature Reviews Cancer, 2007.
The nicotine-derived nitrosamine ketone NNK (Panel 1A) is the primary mutagen produced from nicotine. NNK covalently binds to DNA, forming bulky adducts that lead to numerous mutations. The most common mutations seen in lung adenocarcinoma are activating mutations in the KRAS and EGFR oncogenes and disrupting mutations in the p53 tumor suppressor. Recent evidence indicates that there is a striking difference in mutation pattern between smoking-induced adenocarcinoma and adenocarcinoma in non-smokers. Smokers are more likely to have mutations in KRAS, whereas non-smokers tend to have mutations in EGFR (Panel 1B, C). Weir et al. included samples from both groups in their study.

Results
Method
Weir et al. started with 575 DNA samples obtained from lung tumors, 528 of which were diagnosed as originating from primary lung adenocarcinomas. In addition to tumor DNA, they had 439 matched and 53 unmatched normal (non-tumor) DNA samples. Tumor and non-tumor samples were collected from 8 different locations in the US and abroad. Samples in this study had an equal gender distribution and included a broad range of patients of varying age, cancer progression, and smoking history.
Tumor and paired normal DNAs were SNP genotyped using the Affymetrix 500K Human Mapping Array Set, StyI chip (Panel 2A). DNA was digested using StyI, ligated to an adaptor, and PCR amplified. The amplified DNA was then fragmented using DNase I, labeled, and hybridized to the chip. The StyI chip contains probe sets for 238,000 individual SNPs. For each tumor at each probe set, the researchers calculated an intensity ratio between the tumor DNA and normal DNA samples, allowing Weir et al. to estimate copy number at each of the 238K SNPs.
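The step from an intensity ratio to a copy-number estimate can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the hybridization signal scales linearly with DNA dose and that the normal sample carries 2 copies, and it ignores the normalization and probe-set summarization a real analysis needs. The function names are hypothetical.

```python
import math


def copy_number_from_intensities(tumor_intensity, normal_intensity, normal_copies=2):
    """Estimate tumor copy number at one SNP from a tumor/normal intensity ratio.

    Assumption (not from the paper): signal is proportional to DNA dose,
    so copy number = normal_copies * (tumor / normal).
    """
    if tumor_intensity <= 0 or normal_intensity <= 0:
        raise ValueError("intensities must be positive")
    return normal_copies * (tumor_intensity / normal_intensity)


def log2_ratio(tumor_intensity, normal_intensity):
    """Log2 of the tumor/normal ratio: 0 means no change, >0 gain, <0 loss."""
    if tumor_intensity <= 0 or normal_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(tumor_intensity / normal_intensity)


if __name__ == "__main__":
    # A probe set that is 1.5x brighter in tumor than in normal DNA
    print(copy_number_from_intensities(300.0, 200.0))
    print(log2_ratio(300.0, 200.0))
```

Working on the log2 scale is a common convention because gains and losses of the same magnitude become symmetric around zero.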
Weir et al. analyzed their raw data using GLAD (Gain and Loss Analysis of DNA) to produce segmented copy numbers for each tumor. Part of this initial data analysis was a quality control (QC) step. Samples were discarded for technical failure, low SNP identity to the normal sample, too much copy-number variation (>100 events), or high copy-number similarity to the normal sample, indicating stromal contamination. After GLAD and QC, Weir et al. had 371 high-quality tumor samples and 242 matched normal samples (Panel 2B). These data were further analyzed using the GISTIC (Genomic Identification of Significant Targets in Cancer) algorithm (Panel 2C). GISTIC calculates a score based on the amplitude and frequency of copy-number change at each SNP. Amplifications or deletions were considered significant only if they covered more than 8 consecutive SNPs.
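The idea of a score that combines amplitude and frequency, followed by a minimum-length filter, can be sketched as below. This is a simplified illustration and not the GISTIC implementation: the actual algorithm also assesses significance against a background distribution. The 0.1 log2-ratio threshold and both function names are assumptions made for the example.

```python
def amplification_scores(log2_ratios, threshold=0.1):
    """Score each SNP as (fraction of samples amplified) x (mean amplitude).

    log2_ratios: list of samples, each a list of per-SNP log2 copy ratios.
    threshold: hypothetical cutoff above which a SNP counts as amplified.
    """
    n_samples = len(log2_ratios)
    n_snps = len(log2_ratios[0])
    scores = []
    for j in range(n_snps):
        altered = [sample[j] for sample in log2_ratios if sample[j] > threshold]
        if not altered:
            scores.append(0.0)
            continue
        frequency = len(altered) / n_samples
        mean_amplitude = sum(altered) / len(altered)
        scores.append(frequency * mean_amplitude)
    return scores


def runs_longer_than(flags, min_snps=8):
    """Return (start, end) pairs, end exclusive, for runs of True flags
    that cover more than min_snps consecutive SNPs."""
    runs = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start > min_snps:
                runs.append((start, i))
            start = None
    if start is not None and len(flags) - start > min_snps:
        runs.append((start, len(flags)))
    return runs


if __name__ == "__main__":
    samples = [[0.0, 1.0], [0.0, 0.5], [0.0, 0.0], [0.0, 0.0]]
    print(amplification_scores(samples))
    print(runs_longer_than([True] * 9 + [False] + [True] * 8))
```

In the second call, the 9-SNP run is kept and the 8-SNP run is dropped, matching the "more than 8 consecutive SNPs" rule described above.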
Using GISTIC, they identified 31 focal events (<50% of a chromosome arm) and 26 large-scale events (>50% of a chromosome arm; 10 amplifications and 16 deletions). Weir et al. identified substantially more large-scale events than any other study. When the copy-number data for each of the 371 tumors are ordered by intensity of interchromosomal variation and divided into tertiles, a distinct pattern of amplification and deletion emerges (Panel 3A). For example, the p arm of chromosome 5 shows strong red shading, indicating high frequency and amplitude of amplification, whereas the p arm of chromosome 8 is strongly blue, indicating high frequency and amplitude of deletion. GISTIC data can also be plotted as a function of q value (false discovery rate) at each chromosome position (Panel 3B). These data show numerous large-scale amplification and deletion events that pass the 0.25 q-value cutoff. Because a large-scale event covers so much genetic material, they could not match large-scale events with particular genes or clinical parameters with statistical significance.
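The two thresholds used above, the 50% arm-length split and the 0.25 q-value cutoff, can be illustrated with a short sketch. It assumes the q values come from a Benjamini-Hochberg style false-discovery-rate correction, which is a standard way to compute them; the text here only states that q is a false discovery rate. The function names and example numbers are hypothetical.

```python
def classify_event(event_length_bp, arm_length_bp):
    """Label an event 'focal' (<50% of the arm) or 'large-scale' (otherwise).

    The text does not say how exactly 50% is handled; this sketch
    treats it as large-scale.
    """
    if arm_length_bp <= 0:
        raise ValueError("arm length must be positive")
    return "focal" if event_length_bp / arm_length_bp < 0.5 else "large-scale"


def benjamini_hochberg(p_values):
    """Convert p values to q values with the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


if __name__ == "__main__":
    print(classify_event(1_000_000, 50_000_000))
    q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
    print([value <= 0.25 for value in q])
```

A cutoff of q = 0.25 means that roughly a quarter of the regions called significant may be false positives, which is a permissive threshold suited to a discovery screen.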

Adapted from Weir et al.
LOH
Large- and small-scale deletions are likely to lead to loss of heterozygosity (LOH). Weir et al. were unable to effectively characterize genome-wide LOH because of the high non-tumor content of their samples. They estimate non-tumor DNA content to be 78%, 65%, and 50% in their bottom, middle, and top tertiles, respectively. Weir et al. report LOH only for their top tertile (Panel 4A and B). As expected, LOH in Panel 4B closely matches the large-scale and focal deletions plotted in Panels 3A, B, and C.
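A simple way to see why non-tumor DNA obscures deletions and LOH is a linear mixing model, sketched below. This model is an assumption introduced for illustration and is not described in the paper: it treats the measured copy number as a weighted average of the tumor copy number and the 2 copies in normal cells. The function name is hypothetical.

```python
def observed_copy_number(tumor_copies, non_tumor_fraction, normal_copies=2.0):
    """Expected measured copy number for a tumor/normal DNA mixture.

    Assumes a linear mixture: non-tumor cells contribute normal_copies,
    tumor cells contribute tumor_copies, weighted by their DNA fractions.
    """
    if not 0.0 <= non_tumor_fraction <= 1.0:
        raise ValueError("non_tumor_fraction must be between 0 and 1")
    tumor_fraction = 1.0 - non_tumor_fraction
    return tumor_fraction * tumor_copies + non_tumor_fraction * normal_copies


if __name__ == "__main__":
    # A one-copy (hemizygous) deletion at the three estimated contamination levels
    for label, fraction in [("bottom", 0.78), ("middle", 0.65), ("top", 0.50)]:
        print(label, round(observed_copy_number(1.0, fraction), 2))
```

Under this model a one-copy deletion would read as about 1.78 copies in the bottom tertile but 1.5 copies in the top tertile, which is consistent with LOH being reported only for the samples with the least non-tumor DNA.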

Adapted from Weir et al.
Focal Deletions
After analyzing large-scale events without much success, Weir et al. turned to focal deletions and amplifications. A sufficiently localized focal deletion event is likely to uncover a tumor suppressor gene. They analyzed 5 of the most statistically significant focal deletions and found a known or presumed tumor suppressor gene for each deletion (bottom of Panel 5). Previously known tumor suppressors included CDKN2A/CDKN2B, inhibitors of Cdk4/6; PTEN, a phosphatase involved in inhibition of the AKT pathway; and RB1, a regulator of cell progression through G1. New candidate tumor suppressors were PTPRD, a tyrosine phosphatase; PDE4D, a phosphodiesterase that degrades cAMP; and AUTS2, a protein of unknown function. Based on the information available, they could not conclude that deletion of the new candidates had a specific causative effect as opposed to being a passenger event. To further test the validity of their new candidates as tumor suppressors, they sequenced the exons of AUTS2, PDE4D, and PTPRD in primary tumor samples. AUTS2 and PDE4D had no validating mutations, but PTPRD had 11 mutations in a total of 188 samples. Most significantly, 3 of the mutations inactivated the tyrosine phosphatase domain.
They note that this method will likely miss many significant tumor genes. For example, the chromosomal location encoding TP53, which is mutated in up to 50% of all adenocarcinomas, shows no large- or small-scale deletions. This limitation is expected, because many tumor suppressors are likely to be inactivated through direct mutations; deletions, amplifications, and mutations in non-coding DNA; disrupted mRNA stability; and many other alterations that would not be identified by this study.

Focal amplifications are likely to pinpoint oncogenes. They list 17 of their strongest amplifications at the top of Table 1 (Panel 5). Scanning the list, we see the names of many known and familiar oncogenes such as KRAS, EGFR, and MYC. In addition to the known genes, some of their amplification events contained no identifiable oncogenes, and two pointed to new and untested candidates. A small amplification on the p arm of chromosome 6 contained two genes, from which they chose VEGFA as the likely oncogene because it encodes a vascular endothelial growth factor. They did not pursue the VEGFA story any further. It is important to observe that the average size of their focal amplifications is greater than 1 Mb, an area likely to contain multiple genes. As a result, many of their oncogene identifications are simply highly educated guesses that do not exclude other genes in the same region.
NKX2-1
The most common and strongest focal amplification was located on the q arm of chromosome 14 and contained no known oncogenes. The authors narrowed the amplification to a short 480-kb region containing only two genes, MBIP and NKX2-1 (Panel 6A, B, and C). To confirm this amplification, Weir et al. conducted FISH assays using probes for NKX2-1 on 330 tumors. The signal was amplified in 12% of samples, and the level of amplification was as high as 100-fold. This amount of amplification is considerably higher than the 14-fold seen in their SNP assay (Panel 6D), indicating that the SNP assay saturates at a lower signal level and is diluted by high non-tumor content. Sequencing NKX2-1 and MBIP identified no mutations, leading them to conclude that the tumorigenic effect was provided by the wild-type gene.



Adapted from Weir et al.
Conclusion:
Weir et al. conducted a genome-wide SNP array analysis of 528 lung adenocarcinoma tumor resection samples to identify regions of copy-number alteration. Adenocarcinomas proved to be malignancies with highly altered genomes. They identified 26 large-scale deletion and amplification events and 31 focal events. Large-scale events proved to be uninformative because their large size made it statistically difficult to define a particular gene as the target of the amplification or deletion. In addition, the large non-tumor DNA contamination (up to 78%) of their samples reduced the overall signal and made it impossible to pinpoint the epicenter of each event. However, when the data are analyzed as a whole, the pattern of amplification and deletion is remarkable and may still prove to be useful when combined with other oncogenomic data. For example, information about large-scale deletions and amplifications can be combined with gene expression microarrays to pinpoint genes that contribute to tumor development.
Focal copy-number changes were more useful for identifying causative genes involved in the development of lung adenocarcinoma. Focal deletions pointed out previously documented and new candidate tumor suppressor genes. Sequencing the new tumor suppressor candidates in cancer tissues identified significant mutations only in PTPRD, making it the strongest tumor suppressor candidate identified in this study. Focal amplifications led to the identification of numerous previously reported oncogenes and 2 new oncogene candidates. The new candidate NKX2-1, a predicted transcription factor, was associated with the most common amplification. Interestingly, NKX2-1 is known to be specifically critical in lung development, indicating that it may be a lung-specific oncogene. FISH assays using probes directed at NKX2-1 demonstrated that this region is amplified up to 100-fold in tested tumor samples. NKX2-1 is upregulated in multiple adenocarcinoma cell lines, where its suppression leads to reduced cell viability.
This study identified many regions of copy-number change; however, the authors investigated only several of them carefully. This paper contains far more usable data than the authors presented or discussed. Many more new candidate oncogenes and tumor suppressors can be pulled out of this study. To accomplish this, the same types of experiments that were conducted to demonstrate the identity of NKX2-1 must be conducted on many other regions of copy-number change. Since the average size of a focal event was greater than one megabase, it is also likely that even regions that were assigned known oncogenes or tumor suppressors may still contain other causative cancer genes. To increase sensitivity and reduce the size of the event epicenter, more tumor samples must be added to this study. In addition, focal events shorter than 8 SNPs were not considered significant and were thrown out at the beginning of the study. Yet an event of this size, or even smaller, is extremely likely to contain a whole gene, part of a gene, or even an intron that may have a direct and robust cancer-causing phenotype. In fact, it is these short segments of conserved copy-number change that may prove to be the most powerful identifiers of cancer genes.
In addition to the failure to identify genes in the focal-event data set, a large portion of the data set, containing copy-number events greater than half of a chromosome arm, was not characterized at all because of the huge number of genes these events encompassed. These data can be utilized by combining them with other SNP array experiments, microarrays, and proteomic information. This data set has a lot of information that can be exploited by computational biologists, and it must be made publicly available.
(1) Weir, B. A., Woo, M. S., Getz, G., Perner, S., Ding, L., Beroukhim, R., Lin, W. M., Province, M. A., Kraja, A., Johnson, L. A., Shah, K., Sato, M., Thomas, R. K., Barletta, J. A., Borecki, I. B., Broderick, S., Chang, A. C., Chiang, D. Y., Chirieac, L. R., Cho, J., Fujii, Y., Gazdar, A. F., Giordano, T., Greulich, H., Hanna, M., Johnson, B. E., Kris, M. G., Lash, A., Lin, L., Lindeman, N., Mardis, E. R., McPherson, J. D., Minna, J. D., Morgan, M. B., Nadel, M., Orringer, M. B., Osborne, J. R., Ozenberger, B., Ramos, A. H., Robinson, J., Roth, J. A., Rusch, V., Sasaki, H., Shepherd, F., Sougnez, C., Spitz, M. R., Tsao, M. S., Twomey, D., Verhaak, R. G., Weinstock, G. M., Wheeler, D. A., Winckler, W., Yoshizawa, A., Yu, S., Zakowski, M. F., Zhang, Q., Beer, D. G., Wistuba, II, Watson, M. A., Garraway, L. A., Ladanyi, M., Travis, W. D., Pao, W., Rubin, M. A., Gabriel, S. B., Gibbs, R. A., Varmus, H. E., Wilson, R. K., Lander, E. S., and Meyerson, M. (2007) Characterizing the cancer genome in lung adenocarcinoma. Nature 450, 893-8.
(2) Sun, S., Schiller, J. H., and Gazdar, A. F. (2007) Lung cancer in never smokers--a different disease. Nat Rev Cancer 7, 778-90.