
Preliminary evidence of different selection pressures on cancer cells as compared to normal tissues



Cancer is characterized by a high mutation rate and by high rates of cell division and cell death. We postulate that these conditions will result in the eventual mutational inactivation of genes not essential to the survival of the cancer cell, while mutations in essential genes will be eliminated by natural selection, leaving molecular signatures of selection in genes required for survival and reproduction. By looking for signatures of natural selection in the genomes of cancer cells, it should therefore be possible to determine which genes have been essential for the development of a particular cancer.


We provide a proof of principle test of this idea by applying a test of neutrality (Nei-Gojobori Z-test of selection) to 139 cancer-related nucleotide sequences obtained from GenBank representing 46 cancer-derived genes.


Among cancer associated sequences, 10 genes showed molecular evidence of selection. Of these 10 genes, four showed molecular evidence of selection in non-cancer transcripts. Among non-cancer associated sequences, eight genes showed molecular evidence of selection, with four of these also showing selection in the cancer associated sequences.


These results provide preliminary evidence that the same genes may experience different selection pressures within normal and cancer tissues. Application of this technique could identify genes under unique selection pressure in cancer tissues and thereby indicate possible targets for therapeutic intervention.


Background

Cancer cell clones evolve over the lifespan of a tumour[1-3]. The selective pressures driving this clonal evolution are myriad and may include microenvironmental factors, immune system surveillance, competition with other cancer and somatic cells, and selective killing of cancer cells by surgery, chemotherapy and radiation[2-9]. Two features of cancer portend intense natural selection among cancer cells. The first is the observation that cancer cells (at least in the later stages of growth) experience a high rate of cell death[10]. The second is the greatly increased rate of mutations in cancer cells[11-16]. For example, a recent large-scale study identified mutations in 11% of protein-coding genes examined over 756 cancer cell lines[17]. Many of these mutations, even if they change the protein sequence of the gene product, may be considered “passenger” mutations that do not contribute to oncogenesis[16] and are of no significance to the cancer cell[3, 12]. Indeed, mutations in non-essential genes may even be adaptive to the cancer cell as they shed costly metabolic processes irrelevant to reproduction of the cancer cell[3].

The high mutation rate and rapid cellular turnover may be expected to form an intense environment for natural selection where mutations arise and are tested for functional importance through competition with other cells. Eventually, this environment may lead to the situation where many genes have been rendered nonfunctional by mutations and the subset of genes that have been important for the survival and multiplication of the cancer cells will have been preserved through constant selection of functional versions of these genes.

Evolutionary biologists have identified a number of methods for detecting molecular evidence of natural selection[18]. These so-called “tests of selection” attempt to differentiate neutral evolution (i.e. genetic drift) from Darwinian selection. One commonly used method compares the ratio of synonymous to nonsynonymous base substitutions. This approach has the advantage of being robust with regard to population growth[18], a confounding factor particularly important in the context of cancer cell growth. Synonymous base substitutions change the exonic base pair sequence but conserve the translated amino acid sequence (because of the degenerate nature of the genetic code). In contrast, nonsynonymous base substitutions change both the base pair sequence and the translated amino acid sequence. An excess of synonymous over nonsynonymous base substitutions provides evidence that the base sequence in question is or has been under natural selection to conserve the amino acid sequence (purifying selection). Less commonly, a sequence may exhibit an excess of nonsynonymous over synonymous base substitutions, indicating that the base sequence in question has been under natural selection to change the ancestral amino acid sequence (diversifying selection). Perhaps the best described example of this is the diversifying selection shaping the peptide binding grooves of MHC class I molecules[19]. We might expect that the majority of selection pressures on cancer cells would be in the form of purifying selection to maintain the function of essential genes. However, it is possible that diversifying selection also plays a role in cancer cell evolution, possibly in facilitating the exploitation of new microenvironments.
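The distinction between synonymous and nonsynonymous substitutions can be illustrated with a short Python sketch. It assumes the standard genetic code and codons that differ at a single position; the function name `classify_substitution` and the example codons are hypothetical and are not taken from the sequences analysed in this study.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_a: str, codon_b: str) -> str:
    """Classify the difference between two aligned codons (illustrative only)."""
    codon_a, codon_b = codon_a.upper(), codon_b.upper()
    n_diff = sum(a != b for a, b in zip(codon_a, codon_b))
    if n_diff == 0:
        return "identical"
    if n_diff > 1:
        # Codons differing at several positions need pathway averaging,
        # which this sketch does not attempt.
        return "multiple"
    if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
        return "synonymous"
    return "nonsynonymous"


print(classify_substitution("CTT", "CTC"))  # Leu -> Leu
print(classify_substitution("GAA", "GTA"))  # Glu -> Val
```

Counting these two classes over an alignment, and scaling each by the number of sites at which such a change could occur, gives the quantities that tests of selection compare.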

Here we test the hypothesis that due to the high mutation rates and increased cell turnover in cancer cells, genes of importance to the survival of the cancer cell should show molecular evidence of natural selection. Furthermore, we predict that in the majority of cases this selection would be in the form of purifying selection.

Materials and methods

As an initial test of this hypothesis, we obtained cancer-derived DNA sequences from GenBank using the search parameters “carcinoma expression library”, “cancer-associated transcript”, “tumour-associated transcript” and “Homo sapiens”. We did not attempt to obtain an exhaustive list of all available transcripts but rather sought a convenience sample of different genes where at least two different examples of the same gene sequence from cancer tissue could be obtained. We did not include animal model-derived sequences or experimental cell line sequences. To determine whether these genes show natural selection in non-cancerous tissues, GenBank was again used to find non-cancer versions of the same genes. In cases where we could not locate two non-cancer sequences from among the GenBank entries, we isolated the relevant sequences from the primary and alternate assemblies of the NCBI reference sequences. The sequences used in this study are all publicly available from NCBI; the sequence references are given in Table 1.

Table 1 Gene sequences used in the analyses

Analyses were performed using the Molecular Evolutionary Genetics Analysis (MEGA) software Version 5[20]. Following sequence alignment using the ClustalW method, the Nei-Gojobori Z-Test of Selection[21] was used to calculate the synonymous to nonsynonymous base substitution rates and the associated statistical probabilities. P-values of less than 0.05 were considered significant.
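The quantities behind a codon-based Z-test of this kind can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the MEGA implementation: changes to stop codons are simply counted as nonsynonymous, the variances of dN and dS are taken as inputs rather than estimated, the p-value uses a normal approximation, and all function names are hypothetical.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_sites(codon: str) -> float:
    """Number of synonymous sites s in one codon; nonsynonymous sites n = 3 - s."""
    codon = codon.upper()
    s = 0.0
    for pos in range(3):
        n_syn = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Assumption: a change to a stop codon counts as nonsynonymous.
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                n_syn += 1
        s += n_syn / 3  # fraction of the three possible changes at this position
    return s


def jukes_cantor(p: float) -> float:
    """Convert a proportion of differences per site into a corrected distance."""
    return -0.75 * math.log(1 - 4 * p / 3)


def z_statistic(d_n: float, d_s: float, var_d_n: float, var_d_s: float) -> float:
    """Z = (dN - dS) / sqrt(Var(dN) + Var(dS))."""
    return (d_n - d_s) / math.sqrt(var_d_n + var_d_s)


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# Toy numbers only, not values from Table 1.
z = z_statistic(d_n=0.1, d_s=0.3, var_d_n=0.005, var_d_s=0.005)
print(z, two_sided_p(z))
```

In this sketch a negative Z (dN below dS) points toward purifying selection and a positive Z toward diversifying selection, with significance judged against the 0.05 threshold stated above.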


Results

A total of 46 cancer-derived genes represented by 139 sequences were identified (Table 1). No sequences were derived from propagated cell lines. However, we were unable to determine what proportion of examples were from primary tumours versus metastatic tumours. Among the cancer associated sequences, nine of the 46 genes showed evidence of purifying selection and one showed evidence of diversifying selection (Table 1). Six genes showed molecular evidence of selection only in cancer associated sequences (all in the form of purifying selection), four genes showed evidence of selection only in non-cancer associated sequences (three cases of purifying selection and one case of diversifying selection), and four genes showed molecular evidence of selection in both cancer and non-cancer associated sequences (three cases of purifying selection and one case of diversifying selection; Table 1). Table 1 also gives the GenBank accession numbers for all sequences used as well as sequence divergence estimates (p-distances) and the results of the Nei-Gojobori Z-tests of selection.

If signatures of selection become more common as mutations accumulate in a cancer-associated sequence, we might expect to see greater nucleotide divergence estimates in examples showing significant selection. To test this, we compared p-distances in the 10 examples showing molecular evidence of selection in the cancer associated sequences with the 36 examples not showing evidence of selection in the cancer associated sequences. The mean p-distance of sequences showing evidence of selection was 0.125, while the mean p-distance of sequences not showing evidence of selection was 0.082 (unpaired t-test, p=0.398).
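As a rough illustration of this comparison, the Python sketch below computes a p-distance (the proportion of compared sites that differ between two aligned sequences) and the statistic of an unpaired t-test. It assumes that gapped positions are skipped and that the t-test is the pooled-variance (Student) form; neither detail is stated above, and the function names and toy inputs are hypothetical.

```python
import math
from statistics import mean, variance


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: positions with a gap are ignored
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def pooled_t_statistic(x, y):
    """Student's unpaired t statistic and its degrees of freedom."""
    n_x, n_y = len(x), len(y)
    df = n_x + n_y - 2
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / df
    std_err = math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return (mean(x) - mean(y)) / std_err, df


# Toy inputs only, not data from this study.
print(p_distance("ACGTACGT", "ACGAACGT"))
print(pooled_t_statistic([1, 2, 3], [2, 3, 4]))
```

The t statistic would then be referred to a t distribution with the returned degrees of freedom to obtain a p-value such as the one reported above.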


Discussion

We describe a proof-of-principle test of a method for identifying molecular signatures of natural selection in cancer-derived gene sequences. We also show that, in a sample of 46 genes, the cancer and non-cancer derived sequences show different patterns of selection.

As a cancer grows and evolves and different genes come under selection pressure, natural selection may be expected to leave a record of this selection in the proportion of synonymous to nonsynonymous base substitutions, as we have discussed here. Even if a particular gene later becomes non-functional through further mutations, evidence of prior selection pressure would be expected to persist. Thus the genes showing molecular evidence of selection only in cancer cells could be considered to be those genes which have been important to the survival of the cancer cell up to that point in time. In essence, this provides us with a method to determine which genes have been integral to the survival of the cancer cell.

There are several potential weaknesses to our study. First, a different number of sequences was available for each of the genes we examined. With a greater number of sequences we may expect greater power to detect signatures of selection. To test for such an effect, we compared the mean number of sequences from genes which showed selection (3.17) to the mean number of sequences from genes which did not show selection (3.27). The difference was not statistically significant (p=0.134, unpaired t-test). Therefore, although this is a potential theoretical concern, we can find no evidence of it in our data.

Second, we do not have information about the geographic or racial origins of the individuals from whom the cancer and non-cancer gene sequences were derived. It is possible that increased variability noted for some genes could be due to these factors.

Third, and perhaps most importantly, the choice of model used to calculate dN/dS and the interpretation of the test are both potentially controversial. The Nei-Gojobori method is perhaps less conservative than a maximum likelihood model, but at the same time, if the majority of sites in a protein evolve under purifying selection (as we might expect in a functionally essential gene in a tumour), the dN/dS statistic has reduced sensitivity to detect positive selection[22]. Moreover, dN/dS statistics may behave differently when applied to polymorphisms within a population than when applied to fixed mutations between species[23]. Whether cancer cells from the same tumour and/or from tumours from different individuals are sufficiently diverged to be considered analogous to different species[24] is a critical unanswered question. Because of these uncertainties, we decided to use the simple Nei-Gojobori statistic for this preliminary analysis. As major cancer sequencing initiatives begin producing whole genome sequences from paired cancer/normal samples from the same patient, this question will become more important. Further work should critically examine the optimal statistic to be used for these analyses.

Although we could not detect a statistically significant difference in the mean p-distances between cancer associated sequences showing evidence of selection and those that did not, there was a trend toward greater p-distances among the sequences showing selection, and so our inability to demonstrate a difference may be a consequence of the limited sample size.

Parenthetically, the process postulated here, where relentless mutation in cancer cells results in either mutational inactivation of genes or positive selection to maintain their function, gives a functional explanation for why more advanced cancers invariably show what pathologists refer to as “de-differentiation”, as Muller’s ratchet[25] removes all but the reproductively essential genes.

It will be obvious that the ability of gene sequences to display evidence of natural selection is based both on a high cancer cell mutation rate and an increased cancer cell proliferative rate which together provide the raw material on which selection can act. As these conditions likely are greater in more advanced cancers, we would expect to see greater molecular evidence of selection in later stage cancer cells. Indeed, comparison of early and later stage cancer cells could provide a roadmap of when particular genes experience selection pressure and therefore when these genes are important for tumorigenesis. Furthermore, because the molecular signatures of selection would be expected to persist for many generations of cancer cells, late stage cancers would be expected to contain a molecular record of genes conserved at essentially any stage of the clonal evolution of the cancer cell, even if that gene is no longer under selection pressure or even is no longer functional. By this line of reasoning, genes which are epigenetically silenced would be shielded from selection and may be expected to eventually be subject to loss of function mutations, even if they maintain molecular evidence of prior natural selection during tumorigenesis.

We caution that our results with regards to specific genes should be interpreted as preliminary only. Our sample was based only on publicly available sequences and encompassed a number of different malignancies making any conclusions about gene function based on these findings premature. Furthermore, this approach may not distinguish between driver genes which promote oncogenesis and non-driver genes nevertheless essential for cancer cell growth and reproduction. However, the application of previously described methods could be used to distinguish these[16, 17].

As new databases of cancer genomes become available[14, 17-27], a future direction for this work will be to apply these techniques to whole genome sequences of cancer cells. This could be performed at the level of the tumour as a whole to look at genes important across a sample of tumours of the same type or it could be applied to single cells to explore the genes of importance in particular microenvironments such as metastatic deposits. This approach, combined with oncogenetic reconstruction of cancer clonal lineages using the same sequencing data could provide a powerful new tool to identify candidate genes of functional significance for potential targeted therapies as well as providing new insights into the evolutionary mechanisms of cancer cell clonal evolution.


Conclusions

Genes may be under different selection pressures within a cancer as compared to normal tissues. In this paper we proposed a method to answer the question of which genes are important to a cancer cell. The high mutation rates and rapid cell division present in cancer suggest that functionally important genes will show evidence of selection. We could therefore, in an indirect manner, observe which genes a cancer cell needs to survive. The genes that are important could then form a list of possible targets for therapeutic intervention.


References

  1. Nowell PC: The clonal evolution of tumor cell populations. Science. 1976, 194: 23-28.

  2. Crespi B, Summers K: Evolutionary biology of cancer. Trends Ecol Evol. 2005, 20: 545-552.

  3. Merlo LM, Pepper JW, Reid BJ, Maley CC: Cancer as an evolutionary and ecological process. Nat Rev Cancer. 2006, 6: 924-935.

  4. Shackleton M, Quintana E, Fearon ER, Morrison SJ: Heterogeneity in cancer: cancer stem cells versus clonal evolution. Cell. 2009, 138: 822-829.

  5. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C, Edkins S, O'Meara S, Vastrik I, Schmidt EE, Avis T, Barthorpe S, Bhamra G, Buck G, Choudhury B, Clements J, Cole J, Dicks E, Forbes S, Gray K, Halliday K, Harrison R, Hills K, Hinton J, Jenkinson A, Jones D: Patterns of somatic mutation in human cancer genomes. Nature. 2007, 446: 153-158.

  6. Cahill DP, Kinzler KW, Vogelstein B, Lengauer C: Genetic instability and Darwinian selection in tumors. Trends Cell Biol. 1999, 9: M57-M60.

  7. Tao Y, Ruan J, Yeh SH, Lu X, Wang Y, Zhai W, Cai J, Ling S, Gong Q, Chong Z, Qu Z, Li Q, Liu J, Yang J, Zheng C, Zeng C, Wang HY, Zhang J, Wang SH, Hao L, Dong L, Li W, Sun M, Zou W, Yu C, Li C, Liu G, Jiang L, Xu J, Huang H: Rapid growth of a hepatocellular carcinoma and the driving mutations revealed by cell-population genetic analysis of whole-genome data. Proc Natl Acad Sci USA. 2011, 108: 12042-12047.

  8. Komarova N, Wodarz D: Drug resistance in cancer: principles of emergence and prevention. Proc Natl Acad Sci USA. 2005, 102: 9714-9719.

  9. Anderson AR, Weaver AM, Cummings PT, Quaranta V: Tumor morphology and phenotypic evolution driven by selective pressure from the microenvironment. Cell. 2006, 127: 905-915.

  10. Naugler CT: Population genetics of cancer cell clones: possible implications of cancer stem cells. Theor Biol Med Model. 2010, 7: 42.

  11. Stoler DL, Chen N, Basik M, Kahlenberg MS, Rodriguez-Bigas MA, Petrelli NJ, Anderson GR: The onset and extent of genomic instability in sporadic colorectal tumor progression. Proc Natl Acad Sci USA. 1999, 96: 15121-15126.

  12. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, Teague JW, Campbell PJ, Stratton MR, Futreal PA: COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011, 39 (database issue): D945-D950.

  13. Jackson AL, Loeb LA: The mutation rate and cancer. Genetics. 1998, 148: 1483-1490.

  14. Puente XS, Pinyol M, Quesada V, Conde L, Ordóñez GR, Villamor N, Escaramis G, Jares P, Beà S, González-Díaz M, Bassaganyas L, Baumann T, Juan M, López-Guerra M, Colomer D, Tubío JM, López C, Navarro A, Tornador C, Aymerich M, Rozman M, Hernández JM, Puente DA, Freije JM, Velasco G, Gutiérrez-Fernández A, Costa D, Carrió A, Guijarro S, Enjuanes A: Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature. 2011, 475: 101-105.

  15. Tomlinson I, Sasieni P, Bodmer W: How many mutations in a cancer?. Am J Pathol. 2002, 160: 755-758.

  16. Youn A, Simon R: Identifying cancer driver genes in tumor genome sequencing studies. Bioinformatics. 2011, 27: 175-181.

  17. Bignell GR, Greenman CD, Davies H, Butler AP, Edkins S, Andrews JM, Buck G, Chen L, Beare D, Latimer C, Widaa S, Hinton J, Fahey C, Fu B, Swamy S, Dalgliesh GL, Teh BT, Deloukas P, Yang F, Campbell PJ, Futreal PA, Stratton MR: Signatures of mutation and selection in the cancer genome. Nature. 2010, 463: 893-898.

  18. Nielsen R: Molecular signatures of natural selection. Annu Rev Genet. 2005, 39: 197-218.

  19. Naugler C: Origins and relatedness of human leukocyte antigen Class I supertypes. Hum Immunol. 2010, 71: 837-842.

  20. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol Biol Evol. 2011, 28: 2731-2739.

  21. Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3: 418-426.

  22. Nielsen R, Yang Z: Likelihood models for detecting positively selected amino acid sites and application to the HIV-1 envelope gene. Genetics. 1998, 148: 929-936.

  23. Kryazhimskiy S, Plotkin JB: The population genetics of dN/dS. PLoS Genetics. 2008, 4 (12): e1000304.

  24. Vincent MD: The animal within: carcinogenesis and the clonal evolution of cancer cells are speciation events sensu stricto. Evolution. 2010, 64: 1173-1183.

  25. Muller HJ: Further studies on the nature and causes of gene mutations. Proceedings of the sixth international congress of genetics. Edited by: Jones DF. 1932, Brooklyn Botanic Gardens, Menasha, Wisconsin, 213-255.

  26. Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, Dooling D, Dunford-Shore BH, McGrath S, Hickenbotham M, Cook L, Abbott R, Larson DE, Koboldt DC, Pohl C, Smith S, Hawkins A, Abbott S, Locke D, Hillier LW, Miner T, Fulton L, Magrini V, Wylie T, Glasscock J, Conyers J, Sander N, Shi X, Osborne JR, Minx P: DNA sequencing of a cytogenetically normal acute myeloid leukemia genome. Nature. 2008, 456: 66-72.

  27. The 1000 genomes project consortium: A map of human genome variation from population-scale sequencing. Nature. 2010, 467: 1061-1073.

Author information



Corresponding author

Correspondence to Christopher Naugler.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

KO participated in study design, performed the majority of data analysis and drafted the manuscript. CN conceived of the initial study design, critically revised the manuscript and performed some of the data analyses. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Ovens, K., Naugler, C. Preliminary evidence of different selection pressures on cancer cells as compared to normal tissues. Theor Biol Med Model 9, 44 (2012).



  • Natural Selection
  • Molecular Evidence
  • High Mutation Rate
  • Clonal Evolution
  • Translate Amino Acid Sequence