Oncogenic Pathway Signatures in
Human Cancers as a Guide to Targeted Therapies

Seung Hyuk Choi

Oncogenomics (Genetics 144)

January 30, 2006


I. "Overview of Microarray-based Cancer Gene Expression Profiling" by Sridhar Ramaswamy and Todd R. Golub (1)

i. Overview

Gene expression studies in human cancer can identify genetic markers of malignant transformation. Traditionally, differential display (2), serial analysis of gene expression (3), and representational differential analysis (4) have been used to identify genes expressed in human tumors. Although these methods are powerful, they are technically difficult, require large-scale DNA sequencing, and allow only a few biologic samples to be studied at one time.

In contrast, DNA microarray-based gene expression profiling relies on nucleic acid hybridization and the use of nucleic acid polymers, immobilized on a solid surface, as probes for complementary gene sequences (5). Expression profiling techniques have been used to simultaneously monitor the expression of thousands of genes from human tumor samples. They are relatively easy to use and can be applied to large numbers of samples in parallel. Although a number of competing microarray technologies exist, two platforms (cDNA and oligonucleotide microarrays) are currently used by a majority of investigators, and both are effective (Figure 1).

With cDNA arrays, polymerase chain reaction products of cDNA clone inserts representing genes of interest are spotted systematically on nitrocellulose filters or glass slides (6). Spotted arrays are constructed using cDNA collections (ie, libraries) that can be focused on genes expressed in a particular context or cell type (eg, the lymphochip, which contains genes known to be important in lymphocyte biology). The primary benefit of spotted arrays is that they can be made by individual investigators and are easily customizable. However, managing large clone libraries can be a daunting task for most laboratories, and making high-quality arrays can be difficult.

Oligonucleotide microarrays differ in a number of important ways. Oligonucleotide probes for different genes can be deposited or synthesized directly on the surface of a silicon wafer in a patterned manner (7). Oligonucleotides offer greater specificity than cDNAs because they can be tailored to minimize the chance of cross-hybridization, and sequences up to 60 nucleotides long have been used effectively. Major advantages of this approach include uniformity of probe length and the ability to discern splice variants.

The hybridization of a test sample to an array can be detected in one of two ways. cDNA microarrays are commonly queried simultaneously with cDNAs derived from experimental and reference RNA samples that have been differentially labeled with two fluorophores to allow for the quantification of differential gene expression, and expression values are reported as ratios between two fluorescent values. Alternatively, the Affymetrix oligonucleotide system uses a single color fluorescent label, where experimental mRNA is enzymatically amplified, biotin-labeled for detection, hybridized to the wafer, and detected through the binding of a fluorescent compound (Figure 1).
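The two-color readout described above reduces each spot to a ratio of two fluorescence values. A minimal Python sketch of that step follows; the log2 transform, the function name `log2_ratios`, the `floor` parameter, and the intensity values are illustrative assumptions and do not come from the cited papers.

```python
import math

def log2_ratios(experimental, reference, floor=1.0):
    """Return log2(experimental / reference) for each spot.

    `floor` is an assumed lower bound that keeps the ratio defined
    when a channel reads zero after background subtraction.
    """
    return [
        math.log2(max(e, floor) / max(r, floor))
        for e, r in zip(experimental, reference)
    ]

# Hypothetical background-subtracted intensities for four spots.
experimental_channel = [800.0, 200.0, 400.0, 0.0]
reference_channel = [200.0, 800.0, 400.0, 50.0]

ratios = log2_ratios(experimental_channel, reference_channel)
print(ratios)
```

On a log scale a four-fold increase and a four-fold decrease give values of equal magnitude and opposite sign, which is why this sketch does not report raw ratios.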

ii. Tumor Sampling

Tumors from biopsy are heterogeneous mixtures of different cell types, including malignant cells with varying degrees of differentiation, stromal elements, blood vessels, and inflammatory cells. Two tumors with similar clinical stages can vary markedly in grade. Tumors of different grades might potentially differ in gene expression, and different markers can be expressed either by malignant cells or by other cellular elements. Because this heterogeneity can complicate the interpretation of gene expression studies, sample selection is an important issue.

The most obvious method for sample selection involves careful histopathologic examination of specimens before microarray analysis. In addition, numerous groups are attempting to focus on the malignant components of this heterogeneous cellular mix using a variety of microdissection techniques. Laser capture microdissection allows for the isolation of individual cells from a tumor section and has been used to isolate cancer cell RNA for microarray studies (8,9). However, it is difficult to obtain adequate amounts of high-quality RNA for expression profiling with this technique, thus limiting its utility. Further refinement of this and other approaches to isolating pure-cell populations should be encouraged. However, a theoretical limitation of focusing only on malignant tumor components relates to the growing appreciation that tumor-stroma, tumor-endothelial, and tumor-immune cell interactions play critical roles in tumor progression. Expression signatures from nonmalignant cells may also be informative. For these reasons, we currently favor using whole tumors enriched in malignant cells.

iii. Variability

Multiple sources of variation must be understood in evaluating any microarray experiment, including the following: (A) varying cellular composition among tumors, (B) genetic heterogeneity within tumors due to selection and genomic instability, (C) differences in sample preparation, (D) nonspecific cross-hybridization of probes, and (E) differences between individual microarrays.

In general, biologic variation is the major source of variation in gene expression experiments. Increasing the sample number can help in understanding the range of biologic variation in an experiment. Variation due to technical factors can be addressed by replicating sample preparation or array hybridization (10). Although most high throughput expression profiling centers have informal criteria for what constitutes bad data, there are no generally accepted guidelines.

iv. Data Analysis

Gene expression studies pose many challenges for data organization, storage, and analysis (11,12). Present technology allows for the evaluation of nearly the entire genome from a single biologic sample. To date, the computational analysis of gene expression data has centered on two approaches (Figure 2). Unsupervised learning, or clustering, involves the aggregation of a diverse collection of data into clusters based on different features in a data set (13,14). For example, one could divide a group of people into clusters based on any combination of eye color, waist size, or height. Similarly, one can gather data about the various expressed genes in a collection of tumor samples and then cluster the samples as best as possible into groups based on the similarity of their aggregate expression profiles. Alternatively, one could cluster genes across all samples, to identify genes that share similar patterns of expression in varying biologic contexts. Such approaches have the advantage of being unbiased and allow for the identification of structure in a complex data set without making any a priori assumptions. However, because many different relationships are possible in a complex data set, the predominant structure uncovered by clustering may not necessarily reflect clinical or biologic distinctions of interest.
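Clustering samples by the similarity of their aggregate expression profiles can be sketched in a few lines of Python. The example below uses average-linkage agglomerative clustering with a distance of one minus the Pearson correlation; that choice of linkage and distance, the function names, and the toy profiles are assumptions made for illustration, not the specific methods of references 13 and 14.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Distance between samples is 1 - Pearson correlation; distance
    between clusters is the mean over all cross-cluster pairs.
    """
    clusters = [[i] for i in range(len(profiles))]

    def dist(i, j):
        return 1.0 - pearson(profiles[i], profiles[j])

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist(i, j) for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical expression of four genes in four tumor samples.
samples = [
    [9.0, 8.0, 1.0, 2.0],
    [8.0, 9.0, 2.0, 1.0],
    [1.0, 2.0, 9.0, 8.0],
    [2.0, 1.0, 8.0, 9.0],
]
groups = average_linkage(samples, n_clusters=2)
print(groups)
```

No class labels enter the computation, so the two groups recovered here reflect only the dominant structure of the toy data. Passing the transposed matrix would cluster genes across samples instead.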

In contrast, supervised learning incorporates the knowledge of class label information to make distinctions of interest. A training data set is used to select those features that best make a distinction. These features are then applied to an independent test data set to validate the ability of selected features to make that distinction. For example, one could select a subset of expressed genes that are best able to distinguish between two cancer types and build a computational model that uses these selected genes to sort an independent, unlabelled collection of those tumor types into the two groups of interest. However, supervised learning is dependent on accurate sample labels, which can be an issue given the limitations of histopathologic cancer diagnosis (Figure 2).
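The train-then-test procedure above can be illustrated with a small Python sketch. Genes are ranked on the training set by a signal-to-noise score and a nearest-centroid rule then labels independent samples; both the score and the classifier are assumptions chosen for brevity, and all names and values are hypothetical.

```python
from statistics import mean, stdev

def signal_to_noise(values_a, values_b):
    """Difference in class means scaled by the summed class spreads."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))

def select_genes(train, labels, k):
    """Return indices of the k genes that best separate class A from B."""
    scores = []
    for g in range(len(train[0])):
        a = [s[g] for s, l in zip(train, labels) if l == "A"]
        b = [s[g] for s, l in zip(train, labels) if l == "B"]
        scores.append((abs(signal_to_noise(a, b)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def nearest_centroid(train, labels, genes, sample):
    """Assign the label whose class centroid is closest on the selected genes."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(labels)):
        members = [s for s, l in zip(train, labels) if l == label]
        centroid = [mean(m[g] for m in members) for g in genes]
        d = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Hypothetical training set: three genes, two labelled tumor types.
train = [
    [10.0, 5.0, 1.0], [11.0, 3.0, 2.0], [9.0, 4.0, 1.5],
    [2.0, 4.5, 8.0], [1.0, 3.5, 9.0], [3.0, 5.5, 8.5],
]
labels = ["A", "A", "A", "B", "B", "B"]

genes = select_genes(train, labels, k=2)

# Independent test samples, never used for gene selection.
test_set = [[9.5, 4.0, 1.2], [2.5, 4.0, 8.8]]
predictions = [nearest_centroid(train, labels, genes, s) for s in test_set]
print(genes, predictions)
```

Because the features are chosen using the labels, a mislabelled training sample shifts both the gene ranking and the centroids, which is the dependence on accurate labels noted above.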

v. Outcome Prediction

Although it is difficult to predict whether chemotherapy will be effective for individual patients, DNA microarrays offer the opportunity to ask whether tumor expression profiles can be used to predict chemosensitivity. The NCI60 panel of 60 cancer cell lines is used extensively at the National Cancer Institute as a screen for drug sensitivity. These lines have been treated with more than 70,000 agents, one at a time and independently. Scherf et al used clustering to correlate gene expression and drug sensitivity patterns in the NCI60 panel for 118 drugs with known mechanisms of action (15). They described correlations between the expression of certain genes and the sensitivity or resistance of the NCI60 panel to several drugs. For example, expression of dihydropyrimidine dehydrogenase, the rate-limiting enzyme in fluorouracil metabolism, was inversely correlated with sensitivity to fluorouracil. More recently, Staunton et al used supervised learning to demonstrate that statistically significant prediction of chemosensitivity is possible for some compounds using this NCI60 cell-line system (16).

Investigators have demonstrated the utility of using pretreatment gene expression profiling to determine prognosis. In a retrospective study of 38 patients with diffuse large B-cell lymphoma (DLBCL), Alizadeh et al clustered cDNA microarray data to define new subtypes of this lymphoma (17). They found that these subtypes differentially express genes that correlate with either an activated peripheral-blood B-cell (AB) or a normal germinal center B-cell (GCB) phenotype. Because all patients were uniformly treated with anthracycline-based chemotherapy, they then correlated treatment outcome with these two subsets. Although overall 5-year survival was 52%, 76% of GCB DLBCL patients were alive at 5 years compared with 16% of AB DLBCL patients. They also demonstrated that expression profiling can add value to existing clinical prognostic indices. Among 24 patients with low-risk DLBCL tumors, as defined by the International Prognostic Index ([IPI] score 0 to 2), those with the AB subtype were again at higher risk of dying despite standard treatment than those with the GCB subtype (Figure 3). Although a small study, this work was the first to demonstrate expression-based correlates of outcome.

II. "Oncogenic pathway signatures in human cancers as a guide to targeted therapies" by Bild et al. (18)

i. Characteristic Oncogenic Signatures

When evaluated in several large collections of human cancers, gene expression signatures that reflect the activation status of oncogenic pathways identify patterns of pathway deregulation in tumours and clinically relevant associations with disease outcomes. Combining signature-based predictions across several pathways identifies coordinated patterns of pathway deregulation that distinguish between specific cancers and tumour subtypes. Clustering tumours based on pathway signatures further defines prognosis in respective patient subsets, demonstrating that patterns of oncogenic pathway deregulation underlie the development of the oncogenic phenotype and reflect the biology and outcome of specific cancers. Predictions of pathway deregulation in cancer cell lines are also shown to predict the sensitivity to therapeutic agents that target components of the pathway. Linking pathway deregulation with sensitivity to therapeutics that target components of the pathway provides an opportunity to make use of these oncogenic pathway signatures to guide the use of targeted therapeutics.

Primary human mammary epithelial cells (HMECs) were used to develop a series of pathway signatures (Box 1). Recombinant adenoviruses were used to express various oncogenic activities in an otherwise quiescent cell, thereby specifically isolating the subsequent events as defined by the activation/deregulation of that single pathway. RNA from multiple independent infections was collected for DNA microarray analysis using the Affymetrix Human Genome U133 Plus 2.0 Array. Gene expression signatures that reflect the activity of a given pathway were identified using previously described supervised classification methods (19). The analysis selects a set of genes whose expression levels are most highly correlated with the classification of HMEC samples into oncogene-activated/deregulated versus control (green fluorescent protein, GFP). The dominant principal components from such a set of genes then define a relevant phenotype-related metagene, and regression models assign the relative probability of pathway deregulation in tumour or cell line samples.
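The metagene idea can be sketched in Python: summarize the selected signature genes by their leading principal component, then map that score to a probability of pathway deregulation. This is a simplified stand-in under stated assumptions. It uses only the first component, found by power iteration, and a plain logistic regression fitted by gradient descent, whereas the text does not give the form of the regression models. All names and expression values are hypothetical.

```python
import math

def first_component(rows, iterations=200):
    """Leading principal axis of mean-centred rows (samples x genes).

    Power iteration on X^T X; assumes the uniform starting vector is
    not orthogonal to the leading axis, which holds for the toy data.
    """
    n_genes = len(rows[0])
    means = [sum(r[g] for r in rows) / len(rows) for g in range(n_genes)]
    centred = [[r[g] - means[g] for g in range(n_genes)] for r in rows]
    axis = [1.0 / math.sqrt(n_genes)] * n_genes
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, axis)) for row in centred]
        axis = [sum(s * row[g] for s, row in zip(scores, centred))
                for g in range(n_genes)]
        norm = math.sqrt(sum(w * w for w in axis))
        axis = [w / norm for w in axis]
    return means, axis

def metagene_score(sample, means, axis):
    """Project one sample onto the metagene axis."""
    return sum((x - m) * w for x, m, w in zip(sample, means, axis))

def fit_logistic(scores, labels, rate=0.1, steps=2000):
    """One-variable logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(w * s + b)))
            grad_w += (p - y) * s
            grad_b += p - y
        w -= rate * grad_w / len(scores)
        b -= rate * grad_b / len(scores)
    return w, b

def deregulation_probability(sample, means, axis, w, b):
    score = metagene_score(sample, means, axis)
    return 1.0 / (1.0 + math.exp(-(w * score + b)))

# Hypothetical signature genes: three oncogene-expressing and three
# GFP control cultures (label 1 = pathway activated, 0 = control).
training = [
    [8.0, 7.5, 2.0], [7.0, 8.0, 2.5], [8.5, 7.0, 1.5],
    [3.0, 2.5, 6.0], [2.0, 3.0, 6.5], [2.5, 2.0, 7.0],
]
labels = [1, 1, 1, 0, 0, 0]

means, axis = first_component(training)
train_scores = [metagene_score(s, means, axis) for s in training]
w, b = fit_logistic(train_scores, labels)

# Two hypothetical tumour samples scored with the cell-culture signature.
p_high = deregulation_probability([7.8, 7.2, 2.2], means, axis, w, b)
p_low = deregulation_probability([2.2, 2.8, 6.4], means, axis, w, b)
print(round(p_high, 3), round(p_low, 3))
```

The signature is trained only on the cultured cells, and tumour samples are projected onto the same axis afterwards, which mirrors how the text describes applying the HMEC-derived signatures to tumours and cell lines.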

ii. Gene expression patterns that predict oncogenic pathway deregulation

The different oncogenic signatures distinguish cells expressing the oncogenic activity from control cells (Figure 1a and b). Use of the first three principal components from each signature, evaluated across all experimental samples, demonstrates that the patterns of expression in each signature are specific to each pathway; the gene expression patterns accurately distinguish the individual oncogenic effects despite overlapping downstream consequences (Figure 1b). To evaluate more formally the predictive validity and robustness of the pathway signatures, a leave-one-out cross validation study was applied to the set of pathway predictors. This analysis demonstrates that these signatures of oncogenic pathways can accurately predict the cells expressing the oncogenic activity from the control cells.
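Leave-one-out cross validation holds out each sample in turn, rebuilds the predictor from the remaining samples, and checks the held-out prediction. The Python sketch below uses a nearest-centroid classifier as an assumed stand-in for the pathway predictors; the data and function names are hypothetical.

```python
def centroid(rows):
    """Per-gene mean of a list of expression profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def classify(sample, centroids):
    """Return the label of the closest class centroid."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(sample, c))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def leave_one_out_accuracy(samples, labels):
    """Fraction of samples predicted correctly when each is held out once."""
    correct = 0
    for i in range(len(samples)):
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        # The predictor is rebuilt without sample i. In a full analysis,
        # gene selection would also be repeated inside this loop.
        centroids = {
            label: centroid([s for s, l in train if l == label])
            for label in {l for _, l in train}
        }
        if classify(samples[i], centroids) == labels[i]:
            correct += 1
    return correct / len(samples)

# Hypothetical signature-gene profiles for oncogene-expressing and control cells.
samples = [
    [8.0, 7.5, 2.0], [7.0, 8.0, 2.5], [8.5, 7.0, 1.5],
    [3.0, 2.5, 6.0], [2.0, 3.0, 6.5], [2.5, 2.0, 7.0],
]
labels = ["oncogene", "oncogene", "oncogene", "control", "control", "control"]

accuracy = leave_one_out_accuracy(samples, labels)
print(accuracy)
```

Because the held-out sample never contributes to the centroids used to classify it, the resulting accuracy is a less optimistic estimate than one computed on the training data itself.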

iii. Validation of pathway predictions in tumors

Pathway signatures were regenerated from the genes common to both human and mouse data sets for the three pathways with matching mouse models that could be used for validation: Myc, Ras and E2F3. Across the set of mouse tumours, this analysis evaluates the relative probability of pathway deregulation in each tumour, that is, the predicted status of the pathway in each mouse tumour based only on the signatures developed in HMECs. These predictions are displayed as a colour map: red indicates a high probability of pathway deregulation and blue indicates a low probability, with predictions sorted by the relative probability of pathway deregulation. As shown in Figure 2a, the pathway predictions correlate closely with the molecular basis for tumour induction. For instance, the five mouse mammary tumour virus (MMTV)-MYC tumours exhibit the highest probability of Myc pathway deregulation, whereas the six Rb null tumours exhibit the highest probability of E2F3 deregulation. The probability of Ras pathway activation was highest in the MMTV-HRAS and MMTV-MYC tumours; this indication of Ras pathway activation in the MMTV-MYC tumours is consistent with past results demonstrating a selection for Ras mutations in these tumours (20). There was a consistent prediction of Ras pathway deregulation within these tumours when compared to the set of samples from control lung tissue (Figure 2b). Taken together, these results strongly support the conclusion that the various oncogenic pathway signatures reliably reflect pathway status under a variety of circumstances, and thus can serve as useful tools to probe the status of these pathways.

iv. Hierarchical Clustering of Predictions of Pathway deregulation in Human Tumors

Previous work has linked Ras activation with the development of adenocarcinomas of the lung (21,22). As shown in Figure 2c, pathway status was predicted for a set of non-small cell lung carcinoma (NSCLC) samples, which were then sorted according to predicted Ras activity (Ras mutation status is indicated by an asterisk). Ras pathway status very clearly correlates with the histological subtype: most of the adenocarcinoma samples exhibit a high probability of Ras deregulation relative to the squamous cell carcinoma samples. Prediction of the status of the other pathways revealed a less distinct pattern, although each tended to be more active in the squamous cell carcinoma samples. This pattern becomes more evident in the analysis shown in Figure 7. An examination of Ras mutation identified 11 samples with K-Ras mutations, all confined to the adenocarcinomas (indicated by an asterisk in the figure). Overall, 14% of NSCLC tumours and 29% of the adenocarcinomas had K-Ras mutations in codon 12. Because nearly all of the adenocarcinomas exhibited Ras pathway deregulation, it seems that deregulation of the Ras pathway is indeed a characteristic of development of adenocarcinoma of the lung, and that this can occur as a result of Ras mutations as well as following other events that deregulate the pathway.

The real power in this approach is the ability to identify patterns of pathway deregulation, using hierarchical clustering, much the same as identifying patterns of gene expression. The hierarchical clustering of the lung cancer samples (Figure 3a, left panel) distinguished adenocarcinomas from squamous cell carcinomas, driven in part by the Ras pathway distinction. It is also evident that the tumours predicted as exhibiting relatively low Ras activity are generally predicted at higher levels of Myc, E2F3, β-catenin and Src activity (clusters 1-3). Conversely, the tumours with relatively elevated Ras activity exhibited relatively lower levels of these other pathways (clusters 4-7). Independent of the tumour histopathology, concerted deregulation of Ras with β-catenin, Src and Myc (cluster 8) identified a population of patients with poor survival: a median survival of 19.7 months versus 51.3 months for all other clusters (Figure 3a, right panel). This analysis demonstrates the ability of integrated pathway analysis, based on multiple signatures of component pathway deregulation, to define improved categorization of lung cancer patients.

Two additional examples made use of large sets of breast cancer samples (Figure 3b) and ovarian cancer samples (Figure 3c). Again, there were evident patterns of pathway deregulation, distinct from those seen in the lung samples, which characterized the breast and ovarian tumours. For breast cancer, there were two clusters of patients with good prognosis (clusters 2 and 4), and two clusters with poor prognosis (clusters 1 and 3). Furthermore, clusters 2 and 3, which both contain oestrogen receptor (ER)-positive tumours (and no discernible differences in HER2 status or other clinical parameters), show distinct survival rates (P-value = 0.07). Patients defined by cluster 5 (in which higher than average β-catenin and Myc activities were predicted, and E2F3 activity was lower than average) exhibited very poor survival, again illustrating the importance of co-deregulation of multiple oncogenic pathways as a determinant of clinical outcome. A final analysis made use of an advanced stage (III or IV) ovarian cancer data set. The ovarian samples exhibited a dominant pattern of β-catenin and Src deregulation, either elevated (clusters 1 and 2) or diminished (clusters 3-6). Notably, the co-deregulation of Src and β-catenin defined by clusters 1 and 2 identifies a population of patients with very poor survival compared to other pathway clusters (median survival: 29.0 months versus 91.0 months) (Figure 3c, right panel).
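Median survival comparisons between clusters, such as those quoted above, are commonly read from Kaplan-Meier curves. The Python sketch below assumes that estimator, which the text does not name, and uses invented follow-up times; the function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time with an observed death.

    times: follow-up in months; events: 1 if death was observed,
    0 if the patient was censored at that time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve

def median_survival(times, events):
    """First time at which estimated survival falls to 0.5 or below."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up

# Hypothetical follow-up for one pathway cluster (months, death indicator).
cluster_times = [5, 10, 15, 20, 25]
cluster_events = [1, 1, 0, 1, 1]

median_months = median_survival(cluster_times, cluster_events)
print(median_months)
```

Censored patients contribute to the number at risk until they leave the study, so the estimate uses their partial follow-up without counting them as deaths.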

v. Pathway Prediction and Therapeutic Agents that Target Pathways

Given the capacity of the gene expression signatures to predict deregulation of oncogenic signalling pathways, it may also be possible to predict sensitivity to a therapeutic agent that targets that pathway. To test this, pathway deregulation was predicted in a series of breast cancer cell lines, which were then screened against candidate therapeutic drugs. The results using the set of five pathway predictors, together with an initial collection of breast cancer cell lines, are shown in Figure 4a. In each case, the relative probabilities of pathway activation are predicted from the signature in a manner completely analogous to the prediction of pathway status in tumours. In most cases, there is a good correlation between biochemical measures of pathway activation and prediction based on gene expression signatures. An exception is Ras, where there is no significant correlation between the biochemical measure of pathway activation and pathway prediction, presumably reflecting additional events not measured in the biochemical assay. The critical issue is whether the gene expression signature predicts drug sensitivity; this point is addressed by the dose-response assays in Figure 4b.

In parallel with mapping the pathway status, the cell lines were assayed with drugs known to target specific activities within given oncogenic pathways. The assays involve growth inhibition measurements using standard colorimetric assays (23). The results of testing the sensitivity of the cell lines to inhibitors of the Ras pathway, using both a farnesyl transferase inhibitor (L-744,832) and farnesylthiosalicylic acid (FTS), are shown in Figure 4b. A Src inhibitor (SU6656) was also used for these assays. In each case, the results show a close concordance and correlation between the probability of Ras and Src pathway deregulation based on the gene expression prediction and the extent of cell proliferation inhibition by the respective drugs (Figure 4b). These results confirm the ability of the defined 'pathway deregulation signatures' to also predict sensitivity to therapeutic agents that target the corresponding pathways.
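The concordance between predicted pathway probability and growth inhibition can be quantified with a rank correlation. The Python sketch below uses Spearman's coefficient as one reasonable choice; the text does not say which statistic was used, and the cell-line values are invented for illustration.

```python
import math

def ranks(values):
    """Rank from 1 (smallest); assumes no tied values for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical values for five cell lines: predicted probability of
# pathway deregulation and percent growth inhibition by a pathway inhibitor.
pathway_probability = [0.95, 0.80, 0.55, 0.30, 0.10]
growth_inhibition = [70.0, 78.0, 45.0, 38.0, 12.0]

rho = spearman(pathway_probability, growth_inhibition)
print(round(rho, 2))
```

A rank-based measure asks only whether cell lines with higher predicted deregulation are inhibited more, without assuming a linear dose-response relationship.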

III. Discussion and Implications

i. Summary

The development of an oncogenic state is a complex process involving the accumulation of multiple independent mutations that lead to deregulation of cell signalling pathways central to the control of cell growth and cell fate. The ability to define cancer subtypes, recurrence of disease and response to specific therapies using DNA microarray-based gene expression signatures has been demonstrated, as has the potential for using gene expression profiles for the analysis of oncogenic pathways. Gene expression signatures can be identified that reflect the activation status of several oncogenic pathways. When evaluated in several large collections of human cancers, these gene expression signatures identify patterns of pathway deregulation in tumors and clinically relevant associations with disease outcomes. Combining signature-based predictions across several pathways identifies coordinated patterns of pathway deregulation that distinguish between specific cancers and tumor subtypes. Clustering tumors based on pathway signatures further defines prognosis in respective patient subsets, demonstrating that patterns of oncogenic pathway deregulation underlie the development of the oncogenic phenotype and reflect the biology and outcome of specific cancers. Predictions of pathway deregulation in cancer cell lines are also shown to predict the sensitivity to therapeutic agents that target components of the pathway. Linking pathway deregulation with sensitivity to therapeutics that target components of the pathway provides an opportunity to make use of these oncogenic pathway signatures to guide the use of targeted therapeutics.

ii. Conclusion

In most instances, the consequence of mutations in proto-oncogenes or inactivation of tumour suppressor genes is the deregulation of cellular signalling pathways, which ultimately affects the expression of a variety of genes. Use of gene expression signatures that reflect the action of oncogenic pathway deregulation provides a strategy for measuring the functional consequence of these events. Undoubtedly, an ability to distinguish the deregulation of additional subpathways, as well as pathways reflective of additional aspects of tumorigenesis (apoptosis, DNA repair, and so on), will help to categorize further and understand the complexity of tumour development and the oncogenic process. Although the development of targeted biological agents holds the promise of a more precise matching of therapy with disease mechanism, it is nevertheless true that the success rate of single agents as well as the selection of combination therapies could be improved. The ability to predict the deregulation of various oncogenic pathways through gene expression analysis offers an opportunity to identify new therapeutic options for patients by providing a potential basis for guiding the use of pathway-specific drugs. The major value of this approach may be the capacity to direct combinations of therapies (multiple drugs that target multiple pathways) based on information that specifies the activation state of the pathways.

IV. References

1.  Ramaswamy S & Golub TR: DNA microarrays in clinical oncology. J Clin Oncol 20: 1932-1941 (2002)

2.  Liang P & Pardee AB: Differential display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 257: 967-971 (1992)

3.  Velculescu VE, Zhang L, Vogelstein B, et al: Serial analysis of gene expression. Science 270: 484-487 (1995)

4.  Diatchenko L, Lau YFC, Campbell AP, et al: Suppression subtractive hybridization: A method for generating differentially regulated or tissue-specific cDNA probes and libraries. Proc Natl Acad Sci USA 93: 6025-6030 (1996)

5.  Southern E, Mir K, Shchepinov M: Molecular interactions on microarrays. Nat Genet 21: 5-9 (1999)

6.  Schena M, Shalon D, Davis RW, et al: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467-470 (1995)

7.  Lockhart DJ, Dong H, Byrne MC, et al: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14: 1675-1680 (1996)

8.  Emmert-Buck M, Bonner RF, Smith PD, et al: Laser capture microdissection. Science 274: 998-1001 (1996)

9.  Kitahara O, Furukawa Y, Tanaka T, et al: Alterations of gene expression during colorectal carcinogenesis revealed by cDNA microarrays after laser-capture microdissection of tumor tissues and normal epithelia. Cancer Res 61: 3544-3549 (2001)

10. Lee ML, Kuo FC, Whitmore GA, et al: Importance of replication in microarray gene expression studies: Statistical methods and evidence from replicative cDNA hybridizations. Proc Natl Acad Sci USA 97: 9834-9839 (2000)

11. Ermolaeva O, Rastogi M, Pruitt KD, et al: Data management and analysis for gene expression arrays. Nat Genet 20: 19-23 (1998)

12. Quackenbush J: Computational analysis of microarray data. Nat Rev Genet 2: 418-427 (2001)

13. Eisen MB, Spellman PT, Brown PO, et al: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95: 14863-14868 (1998)

14. Tamayo P, Slonim DK, Mesirov J, et al: Interpreting patterns of gene expression with self-organizing maps: Methods and applications to hematopoietic differentiation. Proc Natl Acad Sci USA 96: 2907-2912 (1999)

15. Scherf U, Ross DT, Waltham M, et al: A gene expression database for the molecular pharmacology of cancer. Nat Genet 24: 236-244 (2000)

16. Staunton JE, Slonim DK, Coller HA, et al: Chemosensitivity prediction by gene expression profiling in cancer cell lines. Proc Natl Acad Sci USA 98: 10787-10792 (2001)

17. Alizadeh AA, Eisen MB, Davis RE, et al: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403: 503-511 (2000)

18. Bild A, et al: Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439: 353-357 (2006)

19. West M, et al: Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA 98: 11462-11467 (2001)

20. D'Cruz CM, et al: c-MYC induces mammary tumorigenesis by means of a preferred pathway involving spontaneous Kras2 mutations. Nat Med 7: 235-239 (2001)

21. Rodenhuis S, et al: Mutational activation of the K-ras oncogene and the effect of chemotherapy in advanced adenocarcinoma of the lung: a prospective study. J Clin Oncol 15: 285-291 (1997)

22. Salgia R & Skarin AT: Molecular abnormalities in lung cancer. J Clin Oncol 16: 1207-1217 (1998)

23. Riss TL & Moravec RA: Comparison of MTT, XTT, and a novel tetrazolium compound MTS for in vitro proliferation and chemosensitivity assays. Mol Biol Cell 3: 184a (1992)