Characterizing the Cancer Genome in Lung Adenocarcinoma.

A presentation by Peter Belenky

Based on Weir et al. Characterizing the cancer genome in lung adenocarcinoma. Nature, December 2007.

Genetics 144: Oncogenomics
Dartmouth Medical School
Course Director: Charles Brenner, Ph.D.
Spring 2008

Overview

The development of cancer is defined by an accumulation of mutations that provide a growth advantage to a particular cell. A strong argument can be made that the progress of cancer is analogous to Darwinian evolution in the natural world. Cancer development is not an organized process, and as a result each tumor is likely to carry a largely unique combination of genetic alterations. Two adenocarcinomas that the same pathologist diagnoses as being at the same stage and as originating from the same cell subtype are still expected to have highly divergent genetic fingerprints. However, within this highly variable landscape there are numerous recurrent alterations that appear in large subsets of tumors.

Identifying the commonalities between tumors can shed light on the biology of cancer development, provide targets for therapy, and help improve diagnosis. Genome-wide searches for genetic alterations are an increasingly popular method for identifying such commonalities. The paper presented on this web page uses single nucleotide polymorphism (SNP) arrays to characterize copy-number alterations in lung adenocarcinoma. Weir et al. found 26 large-scale events and 31 focal events that were prevalent in their tumor samples. The focal events contained multiple known tumor suppressors and oncogenes, as well as several novel candidate tumor suppressors and oncogenes. The most common amplification (14q13.3) contained the transcription factor gene NKX2-1. NKX2-1 is overexpressed in multiple lung cancer cell lines, where it supports cell viability and anchorage-independent growth.

[Panel 1 image]



Introduction

Primary lung adenocarcinoma is a non-small cell lung cancer (NSCLC) that affects the peripheral tissues of the lung. Lung adenocarcinoma accounts for 30-40% of all lung cancers. This cancer is strongly correlated with smoking; however, about 25% of all patients with lung adenocarcinoma have never smoked. Surgical resection is the most common treatment, and it is often paired with chemotherapy or radiation therapy. Depending on the choice of treatment and the stage at which the cancer is diagnosed, the five-year survival rate can range from 50% down to 2%.

Closely adapted from Sun et al., Nature Reviews Cancer, 2007 (2).

Nicotine-derived nitrosamine ketone (NNK; Panel 1A) is the primary mutagen derived from nicotine. After metabolic activation, NNK covalently binds to DNA, forming bulky adducts that lead to numerous mutations. The most common mutations seen in lung adenocarcinoma are activating mutations in the KRAS and EGFR oncogenes and inactivating mutations in the p53 tumor suppressor. Recent evidence indicates a striking difference in mutation pattern between smoking-induced adenocarcinoma and adenocarcinoma in non-smokers: smokers are more likely to have mutations in KRAS, whereas non-smokers tend to have mutations in EGFR (Panel 1B, C). Weir et al. included samples from both groups in their study.


[Panel 2 image]




Results

Method

Weir et al. started with 575 DNA samples obtained from lung tumors, 528 of which were diagnosed as primary lung adenocarcinomas. In addition to tumor DNA, they had 439 matched and 53 unmatched normal (non-tumor) DNA samples. Tumor and non-tumor samples were collected at 8 different sites in the US and abroad. Samples in this study had an equal gender distribution and came from patients with a broad range of ages, stages of cancer progression, and smoking histories.

Tumor and paired normal DNAs were SNP-genotyped using the StyI chip of the Affymetrix 500K Human Mapping Array Set (Panel 2A). DNA was digested with StyI, ligated to an adaptor, and PCR-amplified. The amplified DNA was then fragmented with DNase I, labeled, and hybridized to the chip. The StyI chip contains probe sets for approximately 238,000 individual SNPs. For each tumor at each probe set, the researchers calculated the ratio of tumor to normal DNA signal intensity, which allowed Weir et al. to estimate copy number at each of the ~238K SNPs.
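
The ratio-to-copy-number step can be sketched in a few lines of Python. This is a simplified illustration, not the Affymetrix software or the authors' pipeline: it assumes intensities have already been normalized between arrays and that the matched normal sample is diploid, so that copy number is roughly 2 times the tumor/normal ratio. The function name `estimate_copy_number` and the example intensities are hypothetical.

```python
import math


def estimate_copy_number(tumor, normal):
    """Estimate per-SNP copy number from normalized probe intensities.

    Assumes (hypothetically) that intensities are already normalized across
    arrays and that the normal sample carries two copies at every SNP.
    Returns a list of (log2_ratio, copy_number) tuples, one per SNP.
    """
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must cover the same SNPs")
    results = []
    for t, n in zip(tumor, normal):
        ratio = t / n                   # tumor/normal intensity ratio
        log2_ratio = math.log2(ratio)   # 0 = no change, >0 gain, <0 loss
        copy_number = 2 * ratio         # diploid reference -> 2 copies
        results.append((log2_ratio, copy_number))
    return results


# Hypothetical intensities for three SNPs: unchanged, halved, four-fold higher.
for log2_ratio, cn in estimate_copy_number([1000, 500, 4000], [1000, 1000, 1000]):
    print(f"log2 ratio {log2_ratio:+.2f} -> ~{cn:.1f} copies")
```

On the log2 scale, 0 means no change, -1 means a one-copy loss and +1 means a doubling. Real probe-level ratios are noisy, which is why the data were segmented before any further analysis.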

Weir et al. analyzed their raw data using GLAD (gain and loss analysis of DNA) to produce segmented copy numbers for each tumor. This initial analysis included a quality-control (QC) step in which samples were discarded for technical failure; low SNP-genotype concordance with the matched normal sample; excessive copy-number variation (>100 events); or a copy-number profile so similar to the normal sample that stromal contamination was indicated. After GLAD and QC, Weir et al. had 371 high-quality tumor samples and 242 matched normal samples (Panel 2B). These data were further analyzed with the GISTIC (genomic identification of significant targets in cancer) algorithm (Panel 2C). GISTIC calculates a score at each SNP based on the amplitude and frequency of copy-number change. Amplifications or deletions were considered significant only if they covered more than 8 consecutive SNPs.
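
The scoring and length filter described above can be sketched as follows. This is a simplified illustration of the GISTIC idea, not the published algorithm: it scores only amplifications, uses a per-SNP score equal to (fraction of tumors gained) times (mean gain amplitude), and omits the permutation-based significance test. The 0.1 log2 threshold, the score cutoff and the function names `g_scores` and `significant_runs` are assumptions. The minimum run length is a parameter (`min_snps`) so that an inclusive or exclusive 8-SNP cutoff can be chosen.

```python
def g_scores(log2_ratios, threshold=0.1):
    """Simplified GISTIC-style amplification score for each SNP.

    log2_ratios: one list per tumor, holding a segmented log2 ratio per SNP.
    The score is (fraction of tumors gained) * (mean amplitude of those gains),
    which equals the sum of above-threshold amplitudes divided by the tumor count.
    The 0.1 threshold is an assumed value for illustration.
    """
    n_tumors = len(log2_ratios)
    n_snps = len(log2_ratios[0])
    scores = []
    for j in range(n_snps):
        gains = [tumor[j] for tumor in log2_ratios if tumor[j] > threshold]
        scores.append(sum(gains) / n_tumors)
    return scores


def significant_runs(scores, cutoff, min_snps=8):
    """Return (start, end) index pairs, end exclusive, for runs of at least
    min_snps consecutive SNPs whose score exceeds cutoff."""
    runs, start = [], None
    # A trailing -inf sentinel guarantees that a run reaching the end is closed.
    for i, score in enumerate(scores + [float("-inf")]):
        if score > cutoff and start is None:
            start = i
        elif score <= cutoff and start is not None:
            if i - start >= min_snps:
                runs.append((start, i))
            start = None
    return runs


# Hypothetical data: 2 tumors x 15 SNPs, both gained over the first 10 SNPs.
tumors = [[0.5] * 10 + [0.0] * 5, [0.3] * 10 + [0.0] * 5]
print(significant_runs(g_scores(tumors), cutoff=0.2))
```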







Adapted from www.affymetrix.com
Large Scale Events

Using GISTIC, they identified 31 focal events (covering <50% of a chromosome arm) and 26 large-scale events (covering >50% of a chromosome arm). The 26 large-scale events (10 amplifications and 16 deletions) are substantially more than any previous study had found. When the copy-number profiles of the 371 tumors are ordered by their overall degree of chromosomal copy-number variation and divided into tertiles, a distinct pattern of amplification and deletion emerges (Panel 3A). For example, the p arm of chromosome 5 shows strong red shading, indicating high frequency and amplitude of amplification, whereas the p arm of chromosome 8 is strongly blue, indicating high frequency and amplitude of deletion. GISTIC results can also be plotted as q values (false discovery rates) along each chromosome (Panel 3B). This plot shows numerous large-scale amplification and deletion events that pass the q = 0.25 significance cutoff. Because each large-scale event covers so much genetic material, the authors could not link large-scale events to particular target genes or to clinical parameters with statistical significance.
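
The q values and the focal versus large-scale split can be illustrated with a short sketch. False discovery rate q values are commonly computed from p values with the Benjamini-Hochberg procedure, shown below; the p values themselves (which GISTIC derives from a permutation-based null distribution) are not computed here, and this is not the GISTIC source. The hypothetical `classify_event` helper applies the 50%-of-arm rule described above; treating an event that covers exactly half an arm as large-scale is an assumption.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q values, returned in the same order as pvalues."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q values never decrease with p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def classify_event(event_length_bp, arm_length_bp):
    """Label an event 'focal' if it covers less than half of its chromosome arm,
    otherwise 'large-scale' (exactly half is treated as large-scale here)."""
    return "focal" if event_length_bp / arm_length_bp < 0.5 else "large-scale"


Q_CUTOFF = 0.25  # q value cutoff used to call an event significant

# Hypothetical p values for four candidate regions.
pvals = [0.01, 0.04, 0.03, 0.50]
print([q < Q_CUTOFF for q in bh_qvalues(pvals)])
```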

[Panel 3 image; adapted from Weir et al.]


LOH

Large and small deletions are likely to lead to loss of heterozygosity (LOH). Weir et al. were unable to characterize genome-wide LOH effectively because of the high non-tumor content of their samples. They estimate the non-tumor DNA content to be 78%, 65% and 50% in their bottom, middle and top tertiles, respectively. Weir et al. therefore report LOH only for their top tertile (Panel 4A and B). As expected, LOH in Panel 4B closely matches the large and focal deletions plotted in Panels 3A, B and C.
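
A simple two-component mixing model shows why so much non-tumor DNA weakens the signal. This sketch assumes that the non-tumor cells are diploid, ignores changes in tumor ploidy and tumor heterogeneity, and uses hypothetical function names; it illustrates the dilution effect only and does not reproduce the authors' purity estimates.

```python
import math


def observed_log2_ratio(tumor_cn, purity, normal_cn=2):
    """Expected array log2 ratio for a tumor/normal cell mixture (simplified model).

    purity: fraction of cells in the sample that are tumor cells.
    Assumes non-tumor cells are diploid and ignores tumor ploidy changes.
    """
    mixed_cn = purity * tumor_cn + (1 - purity) * normal_cn
    return math.log2(mixed_cn / normal_cn)


def b_allele_fraction_after_loh(tumor_cn, purity):
    """Fraction of total signal contributed by the lost allele at a heterozygous
    SNP, when tumor cells keep tumor_cn copies of the other allele only
    (tumor_cn=1 is a one-copy deletion, tumor_cn=2 is copy-neutral LOH)."""
    lost_allele = (1 - purity) * 1  # only normal cells still carry the lost allele
    total = purity * tumor_cn + (1 - purity) * 2
    return lost_allele / total


# Non-tumor content of 78%, 65% and 50% corresponds to purity 0.22, 0.35, 0.50.
for purity in (0.22, 0.35, 0.50):
    print(f"purity {purity:.2f}: one-copy deletion log2 ratio "
          f"{observed_log2_ratio(1, purity):+.2f}, lost-allele fraction "
          f"{b_allele_fraction_after_loh(1, purity):.2f}")
```

Under these assumptions, with 78% non-tumor DNA a one-copy deletion moves the log2 ratio only to about -0.17 (versus -1 in a pure tumor), and the lost allele still contributes about 44% of the signal at a heterozygous SNP (versus 50% without LOH), a shift that can be hard to separate from noise.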

[Panel 4 image; adapted from Weir et al.]

Focal Deletions

After analyzing the large-scale events without much success, Weir et al. turned to focal deletions and amplifications. A sufficiently localized focal deletion is likely to uncover a tumor suppressor gene. They analyzed 5 of the most statistically significant focal deletions and found a known or presumed tumor suppressor gene in each (bottom of Panel 5). Previously known tumor suppressors included CDKN2A/CDKN2B, inhibitors of Cdk4/6; PTEN, a phosphatase that antagonizes the AKT pathway; and RB1, a regulator of cell-cycle progression through G1. New candidate tumor suppressors were PTPRD, a tyrosine phosphatase; PDE4D, a phosphodiesterase that degrades cAMP; and AUTS2, a protein of unknown function. Based on the information available, they could not conclude that the new candidates had a specific causative role rather than being passenger alterations. To further test whether these candidates are real tumor targets, they sequenced the exons of AUTS2, PDE4D and PTPRD in primary tumor samples. AUTS2 and PDE4D had no validating mutations, but PTPRD had 11 mutations in a total of 188 samples. Most significantly, 3 of these mutations inactivated the tyrosine phosphatase domain.

They note that this method will likely miss many significant tumor genes. For example, the chromosomal locus of TP53, which is mutated in up to 50% of all adenocarcinomas, shows no large- or small-scale deletions. This is to be expected, because many tumor suppressors are likely inactivated through point mutations; deletions, amplifications and mutations in non-coding DNA; disrupted mRNA stability; and many other alterations that this study would not detect.
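
Once a focal deletion has been narrowed to a peak region, listing the annotated genes that overlap it is the first step toward naming a candidate tumor suppressor. A minimal sketch of that lookup follows; the `Gene` record, the function name, and all gene names and coordinates are hypothetical placeholders, not real genome annotation.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def genes_in_region(genes: List[Gene], chrom: str, start: int, end: int) -> List[str]:
    """Names of genes that overlap the closed interval [start, end] on chrom."""
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= end and g.end >= start]


# Hypothetical annotation and peak region, for illustration only.
annotation = [
    Gene("GENE_A", "chr1", 1_000_000, 1_200_000),
    Gene("GENE_B", "chr1", 1_500_000, 1_550_000),
    Gene("GENE_C", "chr1", 3_000_000, 3_100_000),
    Gene("GENE_D", "chr2", 1_100_000, 1_300_000),
]
print(genes_in_region(annotation, "chr1", 1_100_000, 1_600_000))
```

When a peak contains several genes, this lookup only produces a candidate list; sequencing and functional tests like those applied to PTPRD are still needed to separate drivers from passengers.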

[Panel 5 image; adapted from Weir et al.]
Focal Amplifications

Focal amplifications are likely to pinpoint oncogenes. The authors list 17 of their strongest amplifications at the top of Table 1 (Panel 5). Scanning the list, we see many known oncogenes such as KRAS, EGFR and MYC. In addition to these known genes, some amplification events contained no identifiable oncogene, and two pointed to new and untested candidates. A small amplification on the p arm of chromosome 6 contained two genes, of which they chose VEGFA as the likely oncogene because it encodes vascular endothelial growth factor A. They did not pursue VEGFA any further. It is important to note that the average size of their focal amplifications is greater than 1 Mb, a region likely to contain multiple genes. As a result, many of their oncogene assignments are educated guesses that do not exclude other genes in the same region.
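
A back-of-the-envelope calculation shows why a region larger than 1 Mb will usually hold several genes. The sketch assumes a human genome of roughly 3.1 billion base pairs, roughly 20,000 protein-coding genes, and (unrealistically) a uniform gene density; the function name is hypothetical.

```python
GENOME_SIZE_BP = 3.1e9         # approximate haploid human genome size
PROTEIN_CODING_GENES = 20_000  # approximate number of human protein-coding genes


def expected_genes(region_bp: float) -> float:
    """Average number of protein-coding genes in a region of the given size,
    assuming genes are spread uniformly across the genome."""
    return region_bp * PROTEIN_CODING_GENES / GENOME_SIZE_BP


for size_bp in (1e6, 2e6):
    print(f"{size_bp / 1e6:.0f} Mb: ~{expected_genes(size_bp):.1f} genes on average")
```

Gene density varies considerably along the genome, so a real region can hold far fewer or far more genes, but an average of about six genes per megabase already means that a typical focal amplification is unlikely to contain a single obvious target.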


NKX2-1

The most common and strongest focal amplification was located on the q arm of chromosome 14 and contained no known oncogenes. The authors narrowed the region of amplification to a 480-kb interval containing only two genes, MBIP and NKX2-1 (Panel 6A, B and C). To confirm this amplification, Weir et al. performed FISH assays with probes for NKX2-1 on 330 tumors. The signal was amplified in 12% of samples, and the level of amplification was as high as 100-fold. This is far higher than the 14-fold amplification seen in their SNP assay (Panel 6D), indicating that the SNP array signal saturates at lower copy numbers and is diluted by the high non-tumor content of the samples. Sequencing NKX2-1 and MBIP identified no mutations, leading the authors to conclude that the tumorigenic effect is provided by the amplified wild-type gene.

[Panel 6 image; adapted from Weir et al.]

To determine whether NKX2-1 or MBIP is the causative gene in the amplicon, they used RNAi to knock down each protein in adenocarcinoma cell lines. NCI-H2009 cells carry a high-level amplification of NKX2-1, whereas A549 cells do not (Panel 7A). RNAi targeting NKX2-1 significantly reduced NKX2-1 protein levels and anchorage-independent growth in NCI-H2009 cells, but had no effect on A549 cells (Panels 7A and B). RNAi targeting MBIP reduced MBIP protein levels but did not reduce anchorage-independent growth of NCI-H2009 cells. From these data they conclude that NKX2-1 amplification is responsible for the tumorigenic effect and that MBIP is merely a passenger in the amplicon. Interestingly, they also tested the viability of NCI-H2009 cells and found that RNAi targeting NKX2-1 also reduced overall cell viability (Panel 8). This raises the possibility that the reduced anchorage-independent growth observed in Panel 7B after RNAi treatment was simply a consequence of reduced cell viability. Surprisingly, Weir et al. do not discuss this possibility and imply that the two results are independent of each other.
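
One way to separate these two effects, offered here as a hypothetical analysis rather than one performed by Weir et al., is to divide the relative drop in soft-agar colonies by the relative drop in viability measured in parallel. The function and parameter names are assumptions for illustration.

```python
def viability_adjusted_colony_ratio(colonies_knockdown: float, colonies_control: float,
                                    viability_knockdown: float,
                                    viability_control: float) -> float:
    """Relative soft-agar colony count after knockdown, divided by the relative
    viability measured in parallel.

    About 1.0: the loss of colonies is fully explained by fewer viable cells.
    Well below 1.0: anchorage-independent growth falls more than viability does,
    pointing to an effect beyond a general loss of viability.
    All inputs are hypothetical assay readouts in any consistent units.
    """
    colony_ratio = colonies_knockdown / colonies_control
    viability_ratio = viability_knockdown / viability_control
    return colony_ratio / viability_ratio
```

For example, halving both colony number and viability gives a ratio of 1.0, whereas a 5-fold drop in colonies with only a 20% drop in viability gives 0.25.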

[Panels 7 and 8 images; adapted from Weir et al.]
Conclusion

Weir et al. used genome-wide SNP arrays to analyze 528 lung adenocarcinoma resection samples and identify regions of copy-number alteration. Adenocarcinomas proved to be malignancies with highly altered genomes. The authors identified 26 large-scale deletion and amplification events and 31 focal events. The large-scale events proved uninformative because their size made it statistically difficult to single out a particular gene as the target of the amplification or deletion. In addition, the high non-tumor DNA content of their samples (up to 78%) reduced the overall signal and made it impossible to pinpoint the epicenter of each event. However, when the data are analyzed as a whole, the pattern of amplification and deletion is remarkable and may still prove useful when combined with other oncogenomic data. For example, information about large-scale deletions and amplifications can be combined with gene expression microarrays to pinpoint genes that contribute to tumorigenesis.

Focal copy-number changes were more useful for identifying causative genes involved in the development of lung adenocarcinoma. Focal deletions pointed to previously documented and new candidate tumor suppressor genes. Sequencing the new candidates in tumor tissue identified significant mutations only in PTPRD, making it the strongest tumor suppressor candidate identified in this study. Focal amplifications led to the identification of numerous previously reported oncogenes and 2 new oncogene candidates. One new candidate, the transcription factor NKX2-1, was associated with the most common amplification. Interestingly, NKX2-1 is known to be critical in lung development, suggesting that it may act as a lineage-specific oncogene in the lung. FISH assays with probes directed at NKX2-1 demonstrated that this region is amplified up to 100-fold in tested tumor samples. NKX2-1 is upregulated in multiple adenocarcinoma cell lines, where its suppression leads to reduced cell viability.
   
This study identified many regions of copy-number change; however, the authors investigated only a few of them carefully. This paper contains far more usable data than the authors presented or discussed, and many more new candidate oncogenes and tumor suppressors could be pulled out of it. To accomplish this, experiments of the same kind as those used to establish the role of NKX2-1 must be conducted on many other regions of copy-number change. Since the average focal event was greater than one megabase, even regions that were assigned known oncogenes or tumor suppressors may still contain other causative cancer genes. To increase sensitivity and narrow the event epicenters, more tumor samples must be added to this study. In addition, focal events shorter than 8 SNPs were not considered significant and were discarded at the beginning of the study. Yet an event of this size, or even smaller, could easily contain a whole gene, part of a gene, or even an intronic element with a direct and robust cancer-causing phenotype. In fact, it is these short segments of recurrent copy-number change that may prove to be the most powerful identifiers of cancer genes.
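
A rough calculation puts the discarded 8-SNP events in perspective. It assumes a genome of about 3.1 billion base pairs and treats the roughly 238,000 SNPs of the StyI chip as evenly spaced, which real array SNPs are not, so the numbers are only indicative; the function name is hypothetical.

```python
GENOME_SIZE_BP = 3.1e9   # approximate human genome size
SNPS_ON_ARRAY = 238_000  # approximate SNP count of the StyI chip (Method section)


def approx_span_bp(n_snps: int) -> float:
    """Approximate genomic span of n consecutive SNPs, assuming (unrealistically)
    uniform spacing of array SNPs along the genome."""
    mean_spacing = GENOME_SIZE_BP / SNPS_ON_ARRAY
    return (n_snps - 1) * mean_spacing


print(f"mean spacing ~{GENOME_SIZE_BP / SNPS_ON_ARRAY / 1e3:.0f} kb")
print(f"8 consecutive SNPs span ~{approx_span_bp(8) / 1e3:.0f} kb")
```

On average, then, 8 consecutive SNPs span on the order of 90 kb, while many human genes span tens of kilobases, so an event of this size could indeed cover a whole gene or a substantial part of one.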

In addition to the genes left unidentified within the focal events, a large portion of the data set, the copy-number events spanning more than half of a chromosome arm, was not characterized at all because of the huge number of genes these events encompass. These data could be exploited by combining them with other SNP array experiments, expression microarrays and proteomic information. This data set holds a lot of information that computational biologists could mine, and it must be made publicly available.
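
Combining copy-number data with expression data can be sketched as a per-gene correlation across tumors: within a large amplified or deleted region, genes whose expression rises and falls with copy number are more plausible drivers than genes whose expression is unaffected. This is a hypothetical illustration; the data layout and the function names are assumptions, and a real analysis would also need normalization and a significance test with multiple-testing correction.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_genes_by_dosage_effect(copy_number, expression):
    """Rank genes in an altered region by how closely expression tracks copy number.

    copy_number, expression: dicts mapping gene name -> list of per-tumor values,
    with tumors in the same order in both. Genes with constant copy number or
    constant expression are skipped because the correlation is undefined.
    """
    scores = {}
    for gene, cn in copy_number.items():
        expr = expression.get(gene)
        if expr is None or len(set(cn)) < 2 or len(set(expr)) < 2:
            continue
        scores[gene] = pearson(cn, expr)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```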


 
   

(1)          Weir, B. A., Woo, M. S., Getz, G., Perner, S., Ding, L., Beroukhim, R., Lin, W. M., Province, M. A., Kraja, A., Johnson, L. A., Shah, K., Sato, M., Thomas, R. K., Barletta, J. A., Borecki, I. B., Broderick, S., Chang, A. C., Chiang, D. Y., Chirieac, L. R., Cho, J., Fujii, Y., Gazdar, A. F., Giordano, T., Greulich, H., Hanna, M., Johnson, B. E., Kris, M. G., Lash, A., Lin, L., Lindeman, N., Mardis, E. R., McPherson, J. D., Minna, J. D., Morgan, M. B., Nadel, M., Orringer, M. B., Osborne, J. R., Ozenberger, B., Ramos, A. H., Robinson, J., Roth, J. A., Rusch, V., Sasaki, H., Shepherd, F., Sougnez, C., Spitz, M. R., Tsao, M. S., Twomey, D., Verhaak, R. G., Weinstock, G. M., Wheeler, D. A., Winckler, W., Yoshizawa, A., Yu, S., Zakowski, M. F., Zhang, Q., Beer, D. G., Wistuba, II, Watson, M. A., Garraway, L. A., Ladanyi, M., Travis, W. D., Pao, W., Rubin, M. A., Gabriel, S. B., Gibbs, R. A., Varmus, H. E., Wilson, R. K., Lander, E. S., and Meyerson, M. (2007) Characterizing the cancer genome in lung adenocarcinoma. Nature 450, 893-8.

(2)          Sun, S., Schiller, J. H., and Gazdar, A. F. (2007) Lung cancer in never smokers--a different disease. Nat Rev Cancer 7, 778-90.