Preliminary evidence of different selection pressures on cancer cells as compared to normal tissues
© Ovens and Naugler; licensee BioMed Central Ltd. 2012
Received: 16 August 2012
Accepted: 3 November 2012
Published: 12 November 2012
Cancer is characterized by a high mutation rate and by high rates of cell division and cell death. We postulate that these conditions will result in the eventual mutational inactivation of genes not essential to the survival of the cancer cell, while mutations in essential genes will be eliminated by natural selection, leaving molecular signatures of selection in genes required for survival and reproduction. By looking for signatures of natural selection in the genomes of cancer cells, it should therefore be possible to determine which genes have been essential for the development of a particular cancer.
We provide a proof-of-principle test of this idea by applying a test of neutrality (the Nei-Gojobori Z-test of selection) to 139 cancer-related nucleotide sequences obtained from GenBank, representing 46 cancer-derived genes.
Among cancer-associated sequences, 10 genes showed molecular evidence of selection. Of these 10 genes, four also showed molecular evidence of selection in non-cancer transcripts. Among non-cancer-associated sequences, eight genes showed molecular evidence of selection, with four of these also showing selection in the cancer-associated sequences.
These results provide preliminary evidence that the same genes may experience different selection pressures within normal and cancer tissues. Application of this technique could identify genes under unique selection pressure in cancer tissues and thereby indicate possible targets for therapeutic intervention.
Cancer cell clones evolve over the lifespan of a tumour[1–3]. The selective pressures driving this clonal evolution are myriad and may include microenvironmental factors, immune system surveillance, competition with other cancer and somatic cells, and selective killing of cancer cells by surgery, chemotherapy and radiation[2–9]. Two features of cancer portend intense natural selection among cancer cells. The first is the observation that cancer cells (at least in the later stages of growth) experience a high rate of cell death. The second is the greatly increased rate of mutations in cancer cells[11–16]. For example, a recent large-scale study identified mutations in 11% of the protein-coding genes examined across 756 cancer cell lines. Many of these mutations, even if they change the protein sequence of the gene product, may be considered “passenger” mutations that do not contribute to oncogenesis and are of no significance to the cancer cell[3, 12]. Indeed, mutations in non-essential genes may even be adaptive for the cancer cell, because they shed costly metabolic processes irrelevant to its reproduction.
The high mutation rate and rapid cellular turnover may be expected to form an intense environment for natural selection where mutations arise and are tested for functional importance through competition with other cells. Eventually, this environment may lead to the situation where many genes have been rendered nonfunctional by mutations and the subset of genes that have been important for the survival and multiplication of the cancer cells will have been preserved through constant selection of functional versions of these genes.
Evolutionary biologists have identified a number of methods for detecting molecular evidence of natural selection. These so-called “tests of selection” attempt to differentiate neutral evolution (i.e. genetic drift) from Darwinian selection. One commonly used method compares the rates of synonymous and nonsynonymous base substitutions. This approach has the advantage of being robust with regard to population growth, a confounding factor particularly important in the context of cancer cell growth. Synonymous base substitutions change the exonic base sequence but conserve the translated amino acid sequence (because of the degenerate nature of the genetic code). In contrast, nonsynonymous base substitutions change both the base sequence and the translated amino acid sequence. A higher rate of synonymous than nonsynonymous base substitutions provides evidence that the base sequence in question is or has been under natural selection to conserve the amino acid sequence (purifying selection). Less commonly, a sequence may exhibit a higher rate of nonsynonymous than synonymous base substitutions, indicating that the base sequence in question has been under natural selection to change the ancestral amino acid sequence (diversifying selection). Perhaps the best described example of this is the diversifying selection shaping the peptide binding grooves of MHC class I molecules. We might expect that the majority of selection pressures on cancer cells would be in the form of purifying selection to maintain the function of essential genes. However, it is also possible that diversifying selection plays a role in cancer cell evolution, possibly by facilitating the exploitation of new microenvironments.
Here we test the hypothesis that, owing to the high mutation rates and increased cell turnover in cancer cells, genes of importance to the survival of the cancer cell should show molecular evidence of natural selection. Furthermore, we predict that in the majority of cases this selection would be in the form of purifying selection.
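The distinction between synonymous and nonsynonymous substitutions can be illustrated with a minimal Python sketch. It assumes the standard nuclear genetic code and DNA codons written in upper case; the function name `classify_substitution` is hypothetical and introduced here only for illustration.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_a, codon_b):
    """Label a codon change by its effect on the encoded amino acid."""
    if codon_a == codon_b:
        return "identical"
    if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
        return "synonymous"
    return "nonsynonymous"


# CTT and CTC both encode leucine; GAA (glutamate) and GTA (valine) differ.
print(classify_substitution("CTT", "CTC"))
print(classify_substitution("GAA", "GTA"))
```

A test of selection then asks whether one class of change is over-represented relative to the number of sites at which each class could have occurred.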
Materials and methods
Gene sequences used in the analyses
GenBank Accession Numbers (cancer-related sequences)
Probability of Null hypothesis (Hs=Hn) in cancer-related sequences
Type of selection
GenBank Accession Numbers (non- cancer-related sequences)
Probability of Null hypothesis (Hs=Hn) in non-cancer-related sequences
Type of selection
GI:229892268 GI:229892301 GI:229892299
Growth factor receptor
GI:429094 GI:429093 GI:429091
protease present in seminal plasma
GI:24497567 GI:24497563 GI:24497560
GI:386365498 GI:385648248 GI:385648249
candidate tumor suppressor
GI:15488980 GI:15277570 GI:38197497
GI:342187210 GI:342187198 GI:342187192
GI:223941890 GI:223941884 GI:223941878
protein tyrosine phosphatase
GI:323462168 GI:323276671 GI:323462166 GI:323462167
bone morphogenetic protein antagonist
GI:268370219 GI:268370215 GI:268370223 GI:268370217
tumour suppression gene
GI:331284157 GI:116174749 GI:331284159
GI:331284154 GI:331284152 GI:331284159
regulates the effect of the cAMP-dependent protein kinase signaling pathway
GI:350276262 GI:350276256 GI:350276260
GI:100913195 GI:100913193 GI:100913197
GI:41350319 GI:29171703 GI:29171704
negative regulator of wild type p53 activity
GI:262331568 GI:262331573 GI:262331571 GI:262331569
GI:48734726 GI:38197249 GI:15029674 GI:13435962
GI:253970505 GI:253970501 GI:253970497 GI:253970503
involved in gene expression, cell signaling, and RNA processing and transport
Involved in NF-kappa-B activation and regulation of inflammation
GI:13111833 GI:27694435 GI:21411491
GI:51593770 GI:16198474 GI:22121998 GI:9255808
GI:40317615 GI:40317617 GI:40317619
Involved in down-regulation of the androgen receptor
GI:313882513 GI:30411006 GI:39963693 GI:24659234
GI:134053863 GI:134053924 GI:134053890
GI:34785158 GI:25058020 GI:288915540
GI:33874094 GI:25123208 GI:21328745
GI:46249372 GI:46249366 GI:46249370 GI:46249365
GI:33873809 GI:24934971 GI:211938417
GI:211938416 GI:310750386 GI:310750384
possibly involved in regulation of tumor cell growth and monocyte/macrophage-mediated immunological processes
growth-related cell surface protein
GI:114108213 GI:77415336 GI:111598967 GI:296923772
enzyme involved in ubiquinol-cytochrome c reductase complex
GI:85067502 GI:85067498 GI:39645887 GI:85067500
involved in the production of fatty acid hydroperoxides
GI:166795235 GI:238914823 GI:14495610
cell surface receptor that transduces calcium signals
GI:21040248 GI:341915375 GI:343183384
delivers activating signals to NK cells
GI:15489350 GI:164693182 GI:49456482
GI:284004909 GI:257796251 GI:257796250
GI:347360911 GI:166235149 GI:384475524
cytokine that controls the production, differentiation, and function of macrophages
GI:148664243 GI:148664200 GI:187950332 GI:15012012
involved in regulating aneuploidy, cell cycling, and cell death
GI:182252 GI:115528448 GI:324120948
GI:324120957 GI:324120955 GI:324120951 GI:324120950
adhesion and anti-adhesion protein; involved in cell signaling
GI:14709533 GI:33991397 GI:145701031
GI:291167774 GI:145701029 GI:291167776
serine protease; plays a role in hearing
GI:34785969 GI:166064049 GI:166064051
GI:166064049 GI:166064053 GI:166064055
GI:33873803 GI:110624585 GI:83641884 GI:83641883
GI:224177471 GI:1435190 GI:179571 GI:179575
involved in transcriptional initiation
GI:33872201 GI:262205658 GI:262205664
possible cell adhesion molecule
epithelial membrane protein
associated with central nervous system development and motor function
GI:19913528 GI:16924228 GI:14250064
Mediates enzymes, ion channels and other proteins
GI:49522882 GI:34782767 GI:33873543
signaling protein that mediates cell-cell interactions
GI:244790015 GI:244790004 GI:244790009 GI:244790019
GI:82571721 GI:34784984 GI:21951814 GI:21984183
Involved in vesicle trafficking and melanosome distribution
GI:213511729 GI:213511011 GI:213511507 GI:40807164 GI:15680022
multi-pass endoplasmic reticulum transmembrane protein
GI:3150001 GI:16306740 GI:8648884
GI:62420874 GI:62420871 GI:62420872 GI:8308037
serine/threonine protein kinase
GI:164697166 GI:34528462 GI:336455029
likely involved in integrin signaling
GI:23273964 GI:262205557 GI:262205902
sulfotransferase; modifies glycan structures on ligands of the lymphocyte homing receptor L-selectin
GI:17389375 GI:13528905 GI:18490914
GI:37694064 GI:37694063 GI:158254733
GI:76779232 GI:333033786 GI:163965363
GI:85397251 GI:85397957 GI:60116922
Component of nascent polypeptide-associated complex; prevents mistranslocation of proteins
GI:269914125 GI:269914123 GI:269914126 GI:269914124 GI:269954665
Analyses were performed using the Molecular Evolutionary Genetics Analysis (MEGA) software, version 5. Following sequence alignment using the ClustalW method, the Nei-Gojobori Z-test of selection was used to calculate the synonymous and nonsynonymous base substitution rates and the associated statistical probabilities. P-values of less than 0.05 were considered significant.
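To make the quantities behind this test concrete, the following Python sketch estimates synonymous and nonsynonymous distances for one pair of aligned coding sequences. It is a simplified illustration, not a reimplementation of MEGA. It assumes gap-free, in-frame, upper-case sequences, counts changes to stop codons as nonsynonymous, applies a Jukes-Cantor correction, and skips codon pairs that differ at more than one position (the full Nei-Gojobori method averages over mutational pathways for those). All function names are hypothetical.

```python
import math
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_sites(codon):
    """Number of synonymous sites in a codon (each position contributes 0 to 1)."""
    aa = CODON_TABLE[codon]
    count = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa:
                    count += 1
    return count / 3  # three possible changes per position


def split_codons(seq):
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    if p >= 0.75:
        raise ValueError("proportion too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def simple_ds_dn(seq_a, seq_b):
    """Return (dS, dN) for two aligned coding sequences (simplified sketch)."""
    codons_a, codons_b = split_codons(seq_a), split_codons(seq_b)
    if len(codons_a) != len(codons_b):
        raise ValueError("sequences must be aligned to the same length")
    # Average the site counts of the two sequences.
    syn_sites = (sum(map(synonymous_sites, codons_a))
                 + sum(map(synonymous_sites, codons_b))) / 2
    nonsyn_sites = 3 * len(codons_a) - syn_sites
    syn_diffs = nonsyn_diffs = 0
    for a, b in zip(codons_a, codons_b):
        if sum(x != y for x, y in zip(a, b)) != 1:
            continue  # identical codons, or multi-hit codons skipped in this sketch
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn_diffs += 1
        else:
            nonsyn_diffs += 1
    return jukes_cantor(syn_diffs / syn_sites), jukes_cantor(nonsyn_diffs / nonsyn_sites)


# Toy sequences (invented for illustration): one synonymous change, CTT -> CTC.
d_s, d_n = simple_ds_dn("CTTCTTCTTCTTGAAGAA", "CTCCTTCTTCTTGAAGAA")
print(d_s > d_n)
```

The Z-test itself additionally requires variance estimates for the two distances, which this sketch does not reproduce. Sequences with no synonymous sites at all (for example, only ATG and TGG codons) would cause a division by zero here and would need separate handling.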
A total of 46 cancer-derived genes represented by 139 sequences were identified (Table 1). No sequences were derived from propagated cell lines. However, we were unable to determine what proportion of examples were from primary versus metastatic tumors. Of the 46 genes, nine showed evidence of purifying selection and one showed evidence of diversifying selection in the cancer-associated sequences (Table 1). Six genes showed molecular evidence of selection only in cancer-associated sequences (all in the form of purifying selection), four genes showed evidence of selection only in non-cancer-associated sequences (three cases of purifying selection and one case of diversifying selection), and four genes showed molecular evidence of selection in both cancer and non-cancer-associated sequences (three cases of purifying selection and one case of diversifying selection; Table 1). Table 1 also gives the GenBank accession numbers for all sequences used, as well as sequence divergence estimates (p-distances) and the results of the Nei-Gojobori Z-tests of selection.
If signatures of selection become more common as mutations accumulate in a cancer-associated sequence, we might expect to see greater nucleotide divergence estimates in examples showing significant selection. To test this, we compared p-distances in the 10 examples showing molecular evidence of selection in the cancer associated sequences with the 36 examples not showing evidence of selection in the cancer associated sequences. The mean p-distance of sequences showing evidence of selection was 0.125, while the mean p-distance of sequences not showing evidence of selection was 0.082 (unpaired t-test, p=0.398).
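The comparison described above can be sketched in Python using only the standard library. The p-distance is taken here as the proportion of aligned sites at which two sequences differ, and the unpaired t statistic uses a pooled variance. Both function names are hypothetical, the per-gene p-distances below are invented for illustration because the individual values are not listed in the text, and obtaining a p-value would additionally require the t distribution (for example via `scipy.stats.ttest_ind`).

```python
import math
from statistics import mean, variance


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def pooled_t_statistic(x, y):
    """Unpaired (equal-variance) t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical p-distances for genes with and without evidence of selection.
with_selection = [0.05, 0.21, 0.11, 0.13]
without_selection = [0.02, 0.15, 0.07, 0.09, 0.08]
t, df = pooled_t_statistic(with_selection, without_selection)
print(f"t = {t:.3f} on {df} degrees of freedom")
```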
We describe a proof-of-principle test of a method for identifying molecular signatures of natural selection in cancer-derived gene sequences. We also show that, in a sample of 46 genes, the cancer and non-cancer derived sequences show different patterns of selection.
As a cancer grows and evolves and different genes come under selection pressure, natural selection may be expected to leave a record of this selection in the proportion of synonymous to nonsynonymous base substitutions, as we have discussed here. Even if a particular gene later becomes non-functional through further mutations, evidence of prior selection pressure would be expected to persist. Thus the genes showing molecular evidence of selection only in cancer cells could be considered those that have been important to the survival of the cancer cell up to that point in time. In essence, this provides us with a method to determine which genes have been integral to the survival of the cancer cell.
There are several potential weaknesses to our study. First, a different number of sequences were available for the various genes we examined. With a greater number of sequences we may expect a greater power to detect signatures of selection. To test such an effect we compared the mean number of sequences from genes which showed selection (3.17) to the mean number of sequences from genes which did not show selection (3.27). The difference was not statistically significant (p=0.134, unpaired t-test). Therefore, although this is a potential theoretical concern, we can find no evidence of this in our data.
Second, we do not have information about the geographic or racial origins of the individuals from whom the cancer and non-cancer gene sequences were derived. It is possible that increased variability noted for some genes could be due to these factors.
Third, and perhaps most importantly, the choice of the model used to calculate dN/dS and the interpretation of the test are both potentially controversial. The Nei-Gojobori method is perhaps less conservative than a maximum likelihood model, but at the same time, if the majority of sites in a protein evolve under purifying selection (as we might expect in a functionally essential gene in a tumour), the dN/dS statistic has reduced sensitivity to detect positive selection. Moreover, dN/dS statistics may behave differently when applied to polymorphisms within a population than when applied to fixed mutations between species. Whether cancer cells from the same tumour and/or from tumours from different individuals are sufficiently diverged to be considered analogous to different species is a critical unanswered question. Because of these uncertainties, we decided to use the simple Nei-Gojobori statistic for this preliminary analysis. As major cancer sequencing initiatives begin producing whole genome sequences from paired cancer/normal samples from the same patient, this question will become more important. Further work should critically examine the optimal statistic to be used for these analyses.
Although we could not detect a statistically significant difference in the mean p-distances between cancer-associated sequences showing evidence of selection and those that did not, there was a trend toward greater p-distances among the sequences showing selection, and so our inability to demonstrate a difference may reflect the limited sample size.
Parenthetically, the process postulated here, in which relentless mutation in cancer cells results in either mutational inactivation of genes or positive selection to maintain their function, gives a functional explanation for why more advanced cancers invariably show what pathologists refer to as “de-differentiation”, as Muller’s ratchet removes all but the reproductively essential genes.
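For reference, the statistic under discussion can be written out as follows. This is a sketch of the Nei-Gojobori quantities as they are commonly described, assuming a Jukes-Cantor correction; the text does not state which distance correction was used. The symbols $S_d$ and $N_d$ (observed synonymous and nonsynonymous differences) and $S$ and $N$ (numbers of synonymous and nonsynonymous sites) are notation introduced here for illustration.

```latex
\begin{align}
p_S &= \frac{S_d}{S}, & p_N &= \frac{N_d}{N}, \\
d_S &= -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_S\right), &
d_N &= -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_N\right), \\
Z &= \frac{d_N - d_S}{\sqrt{\operatorname{Var}(d_S) + \operatorname{Var}(d_N)}}
\end{align}
```

Under the null hypothesis of neutrality, $d_N = d_S$. A significant excess of $d_S$ over $d_N$ is read as purifying selection and a significant excess of $d_N$ over $d_S$ as diversifying selection, with the sign of $Z$ depending on which alternative is tested.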
It will be obvious that the ability of gene sequences to display evidence of natural selection is based both on a high cancer cell mutation rate and an increased cancer cell proliferative rate which together provide the raw material on which selection can act. As these conditions likely are greater in more advanced cancers, we would expect to see greater molecular evidence of selection in later stage cancer cells. Indeed, comparison of early and later stage cancer cells could provide a roadmap of when particular genes experience selection pressure and therefore when these genes are important for tumorigenesis. Furthermore, because the molecular signatures of selection would be expected to persist for many generations of cancer cells, late stage cancers would be expected to contain a molecular record of genes conserved at essentially any stage of the clonal evolution of the cancer cell, even if that gene is no longer under selection pressure or even is no longer functional. By this line of reasoning, genes which are epigenetically silenced would be shielded from selection and may be expected to eventually be subject to loss of function mutations, even if they maintain molecular evidence of prior natural selection during tumorigenesis.
We caution that our results with regard to specific genes should be interpreted as preliminary only. Our sample was based only on publicly available sequences and encompassed a number of different malignancies, making any conclusions about gene function based on these findings premature. Furthermore, this approach may not distinguish between driver genes, which promote oncogenesis, and non-driver genes that are nevertheless essential for cancer cell growth and reproduction. However, previously described methods could be used to distinguish these[16, 17].
As new databases of cancer genomes become available[14, 17–27], a future direction for this work will be to apply these techniques to whole genome sequences of cancer cells. This could be performed at the level of the tumour as a whole, to look at genes important across a sample of tumours of the same type, or it could be applied to single cells to explore the genes of importance in particular microenvironments such as metastatic deposits. This approach, combined with oncogenetic reconstruction of cancer clonal lineages using the same sequencing data, could provide a powerful new tool to identify candidate genes of functional significance for potential targeted therapies, as well as new insights into the evolutionary mechanisms of cancer cell clonal evolution.
Genes may be under different selection pressures within a cancer as compared to normal tissues. In this paper we proposed a method to answer the question of which genes are important to a cancer cell. The high mutation rates and rapid cell division present in cancer suggest that functionally important genes will show evidence of selection. We could therefore, in an indirect manner, observe which genes a cancer cell needs to survive. The genes that are important could then form a list of possible targets for therapeutic intervention.
- Nowell PC: The clonal evolution of tumor cell populations. Science. 1976, 194: 23-28.
- Crespi B, Summers K: Evolutionary biology of cancer. Trends Ecol Evol. 2005, 20: 545-552.
- Merlo LM, Pepper JW, Reid BJ, Maley CC: Cancer as an evolutionary and ecological process. Nat Rev Cancer. 2006, 6: 924-935.
- Shackleton M, Quintana E, Fearon ER, Morrison SJ: Heterogeneity in cancer: cancer stem cells versus clonal evolution. Cell. 2009, 138: 822-829.
- Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C, Edkins S, O'Meara S, Vastrik I, Schmidt EE, Avis T, Barthorpe S, Bhamra G, Buck G, Choudhury B, Clements J, Cole J, Dicks E, Forbes S, Gray K, Halliday K, Harrison R, Hills K, Hinton J, Jenkinson A, Jones D: Patterns of somatic mutation in human cancer genomes. Nature. 2007, 446: 153-158.
- Cahill DP, Kinzler KW, Vogelstein B, Lengauer C: Genetic instability and Darwinian selection in tumors. Trends Cell Biol. 1999, 9: M57-M60.
- Tao Y, Ruan J, Yeh SH, Lu X, Wang Y, Zhai W, Cai J, Ling S, Gong Q, Chong Z, Qu Z, Li Q, Liu J, Yang J, Zheng C, Zeng C, Wang HY, Zhang J, Wang SH, Hao L, Dong L, Li W, Sun M, Zou W, Yu C, Li C, Liu G, Jiang L, Xu J, Huang H: Rapid growth of a hepatocellular carcinoma and the driving mutations revealed by cell-population genetic analysis of whole-genome data. Proc Natl Acad Sci USA. 2011, 108: 12042-12047.
- Komarova N, Wodarz D: Drug resistance in cancer: principles of emergence and prevention. Proc Natl Acad Sci USA. 2005, 102: 9714-9719.
- Anderson AR, Weaver AM, Cummings PT, Quaranta V: Tumor morphology and phenotypic evolution driven by selective pressure from the microenvironment. Cell. 2006, 127: 905-915.
- Naugler CT: Population genetics of cancer cell clones: possible implications of cancer stem cells. Theor Biol Med Model. 2010, 7: 42.
- Stoler DL, Chen N, Basik M, Kahlenberg MS, Rodriguez-Bigas MA, Petrelli NJ, Anderson GR: The onset and extent of genomic instability in sporadic colorectal tumor progression. Proc Natl Acad Sci USA. 1999, 96: 15121-15126.
- Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, Teague JW, Campbell PJ, Stratton MR, Futreal PA: COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011, 39 (database issue): D945-D950.
- Jackson AL, Loeb LA: The mutation rate and cancer. Genetics. 1998, 148: 1483-1490.
- Puente XS, Pinyol M, Quesada V, Conde L, Ordóñez GR, Villamor N, Escaramis G, Jares P, Beà S, González-Díaz M, Bassaganyas L, Baumann T, Juan M, López-Guerra M, Colomer D, Tubío JM, López C, Navarro A, Tornador C, Aymerich M, Rozman M, Hernández JM, Puente DA, Freije JM, Velasco G, Gutiérrez-Fernández A, Costa D, Carrió A, Guijarro S, Enjuanes A: Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature. 2011, 475: 101-105.
- Tomlinson I, Sasieni P, Bodmer W: How many mutations in a cancer? Am J Pathol. 2002, 160: 755-758.
- Youn A, Simon R: Identifying cancer driver genes in tumor genome sequencing studies. Bioinformatics. 2011, 27: 175-181.
- Bignell GR, Greenman CD, Davies H, Butler AP, Edkins S, Andrews JM, Buck G, Chen L, Beare D, Latimer C, Widaa S, Hinton J, Fahey C, Fu B, Swamy S, Dalgliesh GL, Teh BT, Deloukas P, Yang F, Campbell PJ, Futreal PA, Stratton MR: Signatures of mutation and selection in the cancer genome. Nature. 2010, 463: 893-898.
- Nielsen R: Molecular signatures of natural selection. Annu Rev Genet. 2005, 39: 197-218.
- Naugler C: Origins and relatedness of human leukocyte antigen Class I supertypes. Hum Immunol. 2010, 71: 837-842.
- Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol Biol Evol. 2011, 28: 2731-2739.
- Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3: 418-426.
- Nielsen R, Yang Z: Likelihood models for detecting positively selected amino acid sites and application to the HIV-1 envelope gene. Genetics. 1998, 148: 929-936.
- Kryazhimskiy S, Plotkin JB: The population genetics of dN/dS. PLoS Genetics. 2008, 4 (12): e1000304.
- Vincent MD: The animal within: carcinogenesis and the clonal evolution of cancer cells are speciation events sensu stricto. Evolution. 2010, 64: 1173-1183.
- Muller HJ: Further studies on the nature and causes of gene mutations. Proceedings of the sixth international congress of genetics. Edited by: Jones DF. 1932, Brooklyn Botanic Gardens, Menasha, Wisconsin, 213-255.
- Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, Dooling D, Dunford-Shore BH, McGrath S, Hickenbotham M, Cook L, Abbott R, Larson DE, Koboldt DC, Pohl C, Smith S, Hawkins A, Abbott S, Locke D, Hillier LW, Miner T, Fulton L, Magrini V, Wylie T, Glasscock J, Conyers J, Sander N, Shi X, Osborne JR, Minx P: DNA sequencing of a cytogenetically normal acute myeloid leukemia genome. Nature. 2008, 456: 66-72.
- The 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature. 2010, 467: 1061-1073.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.