Theoretical Biology and Medical Modelling
Analysis of Variation of Amplitudes in Cell Cycle Gene Expression

Background: Variation in gene expression among cells in a population is often considered as noise produced from gene transcription and post-transcription processes and experimental artifacts. Most studies on noise in gene expression have emphasized a few well-characterized genes and proteins. We investigated whether different cell-arresting methods have impacts on the maximum expression levels (amplitudes) of a cell cycle related gene.


Introduction
Variation in gene expression is often considered as noise or uncertainty arising from experimental artifacts and biological variability. Various studies of noise in gene expression have focused on different scales, ranging from a single gene [1] to a single cell [2,3] to a cell population [4][5][6][7][8][9]. These studies have greatly helped us understand the effects of stochastic noise in gene expression and gene regulation in various model organisms. In a similar spirit, we were interested in the effects of different cell-arresting methods on the maximum expression levels (amplitudes) of some cell cycle related genes.
Various methods such as chemical induction and temperature shift have been used to arrest cells in genome-wide cell cycle studies [10][11][12][13]. Each method may have direct or indirect impacts on the synthesis or degradation of mRNAs from some genes after the interrupted cell cycle resumes. For example, Whitfield et al. [11] used thymidine-thymidine (thy-thy) to arrest HeLa cells in G1/S phase and thymidine-nocodazole (thy-noc) to arrest them in G2/M phase. Intuitively, the synthesis or degradation of some mRNAs in G1/S phase and G2/M phase may be differentially affected by thy-thy and thy-noc arrests, respectively.
Measurements of the intensities of gene expression from microarray experiments are subject to two main sources of variation: (i) technical variability, including bioassay preparation, dye effects and hybridization on chips; and (ii) biological variability, including variation in activation of transcription from cell to cell in a population after release from cell cycle arrest. Another implicit feature of microarray data is that gene expression is an average value over a cell population rather than a value for a single cell. In general, it is difficult to separate these two sources of variation for expression of a gene under given experimental conditions unless multiple repeated measurements are made over time and some prior knowledge of the expression of this gene is available. Periodic expression of some genes may be a good model for examining the effects of various cell-arresting methods on the transcription of known genes during cell cycle experiments.
Some advantages of using cell cycle related gene expression to probe the variation in maximum expression level due to different cell-arresting methods are: (i) cells can be synchronized to some extent so that variation of expression from cell to cell can be reduced; (ii) the expression profiles of some known cell cycle related genes such as PCNA and CDC20 (Figures 1 and 2) have been well characterized as sinusoidal waveforms over multiple cycles in different model organisms [10][11][12][13]. This makes it relatively easy to distinguish biological variation from technical variation, which produces random or transient fluctuations around a sinusoidal profile over time.
Amplitude, period and phase angle define the dynamics of a sinusoidal profile. In cell cycle or circadian rhythm studies, the phase angle, or time of maximum expression of a cycling gene, has been a primary focus because it reflects the gene's biological role [10][11][12][13][14][15]. However, the biological implications of the amplitudes of cycling genes, defined here as the maximum expression level in one cycle, have not been explored in any previous microarray study of cell cycle or circadian cycle gene expression [10][11][12][13][14][15]. This might be due to the impression that gene expression from high-throughput data is noisy and therefore not reliable. Alternatively, it may be because no control (reference) mRNA was used across the experiments. When the expression of a cycling gene is measured at multiple time points in a cell cycle experiment and modeled by a sinusoidal profile, its amplitude can be estimated with reasonable accuracy [16]. When a common reference mRNA is used in cell cycle experiments [11], the estimated amplitudes of the same cycling genes should be comparable across experiments. In addition to phases, changes in amplitude may reveal effects of cell-arrest methods on the expression of some cell cycle related genes.
In a single cell, the amplitude and phase of a cell cycle related gene are considered two independent parameters in a sinusoidal model. Within a cell population, however, variation in amplitude may be dependent on variation in phase angle for some genes of this kind when the cells are stressed at different stages of the cycle. The linking of amplitude to phase variability is similar to Winfree's suggestion about the connection: "Thirty-four years later the situation is beginning to change. It is at least widely recognized now that 'phase' is just one aspect of the circadian clock's 'state,' needing supplementation by at least 'amplitude' (possibly a measure of cell-population phase scatter) before experiments can be designed and interpreted with confidence" [17].
In this paper, we first illustrate how variation in amplitude depends on the distribution of phase angles of a cell cycle related gene in a cell population. We then analyze the effects of two different cell-arresting methods on the amplitudes of some cell cycle related genes.
Figure 1. Log2 expression ratio for PCNA, a known G1/S phase gene, in the thymidine-thymidine arrest (exp2) and thymidine-nocodazole arrest (exp4) studies.

Methodologies
Three parameters are commonly used for modeling the time-course of expression, $y_g(t)$, of a cell cycle related gene g over time t: amplitude, which we denote as $K_g$; duration of cycle (period), $T$; and phase angle, $\phi_g$, which is the time in the cycle when the gene is maximally activated; i.e. $y_g(t) = f(t; K_g, T, \phi_g)$. In our previous cell cycle related gene expression studies [16], we introduced a variance parameter $\sigma$ to $y_g(t)$ for modeling attenuation of the amplitude of gene g over time, leading to the following random-periods model (RPM):

$$y_g(t) = a_g + b_g t + \frac{K_g}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos\left(\frac{2\pi t}{T\exp(\sigma z)} + \phi_g\right) \exp\left(-\frac{z^2}{2}\right) dz \tag{1}$$

where the integral averages the expression level across cells and z is assumed to be distributed as standard Gaussian. The linear terms, $a_g$ and $b_g$, give the background gene expression. This model approximated the pattern of cycling, with its attenuation across time, when it was applied to a set of known cell cycle related genes [16].
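As a rough numerical illustration of how the variance parameter σ attenuates the amplitude over successive cycles, the sketch below evaluates a random-periods curve of the form a + b·t + K/√(2π) ∫ cos(2πt/(T·exp(σz)) + φ) exp(−z²/2) dz by trapezoidal quadrature over z. The functional form as written here, the function name `rpm_expression`, the quadrature settings and all parameter values are assumptions made for illustration; none of them are fitted values from [16].

```python
import math

def rpm_expression(t, K, T, phi, sigma, a=0.0, b=0.0, n_nodes=2001, z_max=8.0):
    """Evaluate an assumed random-periods curve at time t.

    The integral over the standard Gaussian variable z is approximated
    with the trapezoidal rule on [-z_max, z_max].
    """
    h = 2.0 * z_max / (n_nodes - 1)
    total = 0.0
    for i in range(n_nodes):
        z = -z_max + i * h
        weight = 0.5 if i in (0, n_nodes - 1) else 1.0  # trapezoid end points
        period = T * math.exp(sigma * z)                # cell-specific cycle length
        total += weight * math.cos(2.0 * math.pi * t / period + phi) * math.exp(-0.5 * z * z)
    periodic = K * total * h / math.sqrt(2.0 * math.pi)
    return a + b * t + periodic

# Hypothetical settings: unit amplitude, period of 15 time units, zero phase.
first_peak = rpm_expression(15.0, K=1.0, T=15.0, phi=0.0, sigma=0.1)
second_peak = rpm_expression(30.0, K=1.0, T=15.0, phi=0.0, sigma=0.1)
```

With σ = 0 the curve is an undamped cosine; with σ > 0 the value near the second cycle is smaller than near the first, which is the attenuation the model is meant to capture.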
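Here, we introduce random noise, ε, into the phase of gene expression in a cell population in model (1). The expectation, E[·], of the periodic term, which we call $c_g(t)$ in (1) for gene g, is

$$E[c_g(t)] = E\left[K_{g\max} \cos\left(\frac{2\pi t}{T\exp(\sigma z)} + \phi_g + \varepsilon\right)\right] \tag{2}$$

where ε is von Mises distributed with concentration parameter κ and mean direction 0, and z is, as before, normally distributed with mean 0 and variance 1. $K_{g\max}$ is the amplitude when ε = 0, i.e. no variation in phase/peak expression time for gene g in a population of perfectly synchronized cells. The expectation of $c_g(t)$ in (2), $E[c_g(t)]$, can be expanded as

$$E[c_g(t)] = K_{g\max}\, E\left[\cos\left(\frac{2\pi t}{T\exp(\sigma z)} + \phi_g\right)\cos(\varepsilon) - \sin\left(\frac{2\pi t}{T\exp(\sigma z)} + \phi_g\right)\sin(\varepsilon)\right]$$

If the random variables z and ε are independent, we obtain the simplified expression

$$E[c_g(t)] = K_{g\max}\left\{E\left[\cos\left(\frac{2\pi t}{T\exp(\sigma z)} + \phi_g\right)\right]E[\cos(\varepsilon)] - E\left[\sin\left(\frac{2\pi t}{T\exp(\sigma z)} + \phi_g\right)\right]E[\sin(\varepsilon)]\right\}$$

Since $E[\sin(\varepsilon)] = 0$ for the random variable ε with a von Mises distribution (its density is symmetric about the mean direction 0), we obtain

$$E[c_g(t)] = K_{g\max}\, E[\cos(\varepsilon)]\, E\left[\cos\left(\frac{2\pi t}{T\exp(\sigma z)} + \phi_g\right)\right] \tag{3}$$

Therefore, the amplitude $K_g$ in model (1) is the product of two terms, $K_{g\max}$ and E[cos(ε)] in (3). E[cos(ε)] can be considered a measure of the variability in phase across cells in a given experiment. When the duration of the cell cycle is highly variable, as when σ is large in model (1), one might expect a corresponding attenuation of the amplitude over time.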
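The claim that phase scatter scales the population amplitude by E[cos(ε)] can be checked by simulation. The sketch below draws phase errors from a von Mises distribution using the standard-library `random.vonmisesvariate`, averages cos(θ + ε) over a simulated cell population, and returns the sample means needed for the comparison. The function name `population_average`, the population size, the seed and the value κ = 2 are illustrative assumptions, and the period jitter (the σz term) is deliberately left out to isolate the phase effect.

```python
import math
import random

def population_average(theta, kappa, n_cells=100_000, seed=1):
    """Average cos(theta + eps) over cells with eps ~ von Mises(mean 0, kappa).

    Returns (mean of cos(theta + eps), mean of cos(eps), mean of sin(eps)).
    """
    rng = random.Random(seed)
    # vonmisesvariate returns angles in [0, 2*pi); cos and sin are unaffected by the wrap.
    eps = [rng.vonmisesvariate(0.0, kappa) for _ in range(n_cells)]
    mean_signal = sum(math.cos(theta + e) for e in eps) / n_cells
    mean_cos = sum(math.cos(e) for e in eps) / n_cells
    mean_sin = sum(math.sin(e) for e in eps) / n_cells
    return mean_signal, mean_cos, mean_sin

# Hypothetical example: a fixed phase angle of 0.3 rad and concentration kappa = 2.
signal, cos_mean, sin_mean = population_average(0.3, 2.0)
damped = math.cos(0.3) * cos_mean  # prediction when the sine term averages out
```

In such a run the mean of sin(ε) should be close to zero, so the population average is close to cos(θ) multiplied by the mean of cos(ε); the noise-free signal is damped but not shifted in phase.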
Since it is difficult to estimate both the amplitude $K_{g\max}$ and the term E[cos(ε)] directly from (3), we propose instead to compare the amplitude parameters in two independent experiments under the same protocol for gene g, by taking the ratio

$$\frac{K_g^{(1)}}{K_g^{(2)}} = \frac{K_{1g}\, E[\cos(\varepsilon_1)]}{K_{2g}\, E[\cos(\varepsilon_2)]} \tag{4}$$

where $K_g^{(j)}$ denotes the amplitude parameter of gene g in model (1) for experiment j, $E[\cos(\varepsilon_j)] = I_1(\kappa_{jg})/I_0(\kappa_{jg})$ with $I_0$ and $I_1$ the modified Bessel functions of the first kind of orders 0 and 1, $\kappa_g$ is the concentration parameter of ε with a von Mises distribution [18], and $K_{1g}$ and $K_{2g}$ are the maximum expressions of gene g in experiments 1 and 2, respectively, when the phases or peak expression times for g in a cell population are perfectly synchronized. We have 0 ≤ E[cos(ε)] ≤ 1: as the concentration parameter $\kappa_g \to \infty$, the variance goes to 0 and E[cos(ε)] → 1; when $\kappa_g = 0$, E[cos(ε)] = 0.
Figure 2. Log2 expression ratio for CDC20, a known G2/M phase gene, in the thymidine-thymidine arrest (exp2) and thymidine-nocodazole arrest (exp4) studies.
Provided that $K_{1g} = K_{2g}$, we reduce the ratio in (4) to

$$\frac{K_g^{(1)}}{K_g^{(2)}} = \frac{E[\cos(\varepsilon_1)]}{E[\cos(\varepsilon_2)]} = \frac{I_1(\kappa_{1g})/I_0(\kappa_{1g})}{I_1(\kappa_{2g})/I_0(\kappa_{2g})} \tag{5}$$

Equation (5) implies that the ratio between the amplitude parameters of periodic expression of gene g in experiments 1 and 2 can be represented by the ratio of the mean noise variation, which has von Mises distributions in both experiments. In biological terms, the concentration parameter, κ, reflects the distribution of phases or peak expression times for a gene within a cell population. Therefore, we can use the ratio of estimated amplitudes from RPM (1) to examine the relative variability in phase/peak expression time for gene g in two cell cycle experiments.
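For a von Mises variable with mean direction 0, E[cos(ε)] equals I₁(κ)/I₀(κ), the ratio of modified Bessel functions of the first kind, so the amplitude ratio under equal maximum amplitudes can be computed directly from two concentration parameters. The following is a minimal sketch using the power series of the Bessel functions; the helper names `bessel_i`, `mean_cos` and `amplitude_ratio` and the example κ values are assumptions for illustration only.

```python
import math

def bessel_i(order, x, terms=200):
    """Modified Bessel function of the first kind, I_order(x), from its power series."""
    term = (x / 2.0) ** order / math.factorial(order)  # k = 0 term
    total = term
    for k in range(1, terms):
        term *= (x * x / 4.0) / (k * (k + order))      # ratio of consecutive series terms
        total += term
    return total

def mean_cos(kappa):
    """E[cos(eps)] for eps ~ von Mises(mean 0, concentration kappa)."""
    return bessel_i(1, kappa) / bessel_i(0, kappa)

def amplitude_ratio(kappa_1, kappa_2):
    """Ratio of amplitudes in experiments 1 and 2, assuming equal maximum amplitudes."""
    return mean_cos(kappa_1) / mean_cos(kappa_2)

# Hypothetical concentrations: broad phase scatter (kappa = 2) vs. tight scatter (kappa = 10).
ratio = amplitude_ratio(2.0, 10.0)
```

Because E[cos(ε)] increases with κ, a ratio below 1 indicates that phases are more widely scattered in experiment 1 than in experiment 2, under the stated equal-maximum assumption.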
To get a sense of how the ratios of estimated amplitudes in (5) vary with the concentration parameter κ, E[cos(ε)] can be plotted against κ.
In the following two sections, we apply the concepts presented above to the variation in amplitude of a set of cycling genes common to two experiments, using the cell cycle gene expression data of Whitfield et al. [11]. Here, we are primarily interested in assessing the variability of amplitudes of cell cycle related genes commonly expressed in two experiments where cells were arrested by two different methods, and in identifying genes whose amplitudes $K_g$ change between the two experiments, provided there is no systematic variation between the pair of experiments.

Testing equality of amplitudes of a set of cycling genes in two experiments
Let $\hat{K}_{gj}$ and $\hat{\sigma}^2_{gj}$ denote the estimated amplitude and the variance of the amplitude for the g-th gene in the j-th experiment, g = 1, ..., n, where n is the number of genes and j = 1, 2.

Example
In our previous work [19], we studied the phase association of 47 cell cycle related genes common to the 2nd, 3rd and 4th experiments of Whitfield et al. [11]. In the present study, we use the same 47 genes commonly expressed in the 2nd and 4th experiments, with amplitudes estimated for each time-course experiment using the random-periods model (1) on log2-transformed data. The assumptions underlying the model appear reasonable for these data, although our conclusions are somewhat limited given the small sample size. Owing to the systematically smaller amplitudes of the 47 cell cycle related genes in the 3rd experiment of Whitfield et al. [11], which were identified by the Wilcoxon signed rank test of (6), we excluded the 3rd experiment from our comparison of amplitudes in this study.
The estimated amplitudes $\hat{K}_{gj}$, and the variances of the $\hat{K}_{gj}$, g = 1, ..., 47, in the 2nd and 4th experiments are listed in Table 1.

Results
The p-value from the Wilcoxon signed rank test of median ∆ = 0 in (6) is 0.56, which is not significant at the level of α = 0.05, suggesting that the median amplitudes in exp2 and exp4 are similar. Therefore, we can directly compare the estimated amplitudes for each of the 47 genes in the two experiments. The log2 ratios of amplitudes in exp4 over exp2 are plotted in Figure 4. By comparing the amplitudes of the 47 cycling transcripts in these two experiments, we found that the 95% confidence intervals (z_{α/2} = 1.96, α = 0.05) for the genes FLJ10540, PCNA, CDC6 and CDC20 did not include zero, suggesting that the estimated amplitudes for these four genes in exp2 and exp4 of Whitfield et al. [11] might be affected by thy-thy arrest in exp2 and thy-noc arrest in exp4. This was not true of the estimated amplitudes of the other 43 genes (Table 1). Note that the amplitudes of CDC6 and PCNA, which are expressed in the G1/S phase, were reduced almost to half in the thy-thy (S phase arrest) experiment relative to the thy-noc (M phase arrest) experiment; the amplitude of CDC20, which is expressed in the G2/M phase, was reduced in the thy-noc experiment to half that in the thy-thy experiment.
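A confidence interval for a log2 amplitude ratio can be sketched as follows. The interval formula used by the authors is not shown in the text, so the delta-method interval below is an assumption: it treats the two amplitude estimates as independent and approximately normal. The function name `log2_ratio_ci` and the numerical inputs are hypothetical and are not values from Table 1.

```python
import math

def log2_ratio_ci(k_num, var_num, k_den, var_den, z=1.96):
    """Approximate confidence interval for log2(k_num / k_den).

    Delta method: Var(ln K_hat) is approximated by Var(K_hat) / K_hat**2,
    and the two estimates are assumed independent.
    """
    estimate = math.log2(k_num / k_den)
    se_ln = math.sqrt(var_num / k_num ** 2 + var_den / k_den ** 2)
    se_log2 = se_ln / math.log(2.0)  # convert from natural log to log base 2
    return estimate - z * se_log2, estimate + z * se_log2

# Hypothetical gene whose amplitude in one experiment is half that in the other.
lower, upper = log2_ratio_ci(0.5, 0.01, 1.0, 0.01)
excludes_zero = not (lower <= 0.0 <= upper)
```

An interval that excludes zero on the log2 scale corresponds to an amplitude ratio that differs from 1 at the chosen level, which is the criterion described for the four genes above.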

Discussion
In this paper, we have analyzed the effect of the scattering of phase angles of a cell cycle related gene in a cell population on the amplitude of expression of this gene. Our analysis suggests that variation in amplitude for such a gene between two experiments depends on the variation of phase distribution in a population of cells. We illustrated our analysis by comparing the amplitudes of 47 cell cycle related genes in the 2nd and 4th experiments of Whitfield et al. [11], where two different methods were used that resulted in cells being arrested at different stages of the cycle. The amplitudes of 43 of the 47 genes were not significantly affected by the differences in cell-arresting methods. Among the 4 genes that were differentially affected, the amplitudes of the G1/S phase genes CDC6 and PCNA were smaller in the thy-thy (S phase arrest) experiment 2, while the amplitude of the G2/M gene CDC20 was smaller in the thy-noc (M phase arrest) experiment 4 of Whitfield et al. [11]. These results suggest that thy-thy and thy-noc affected the maximum expression levels of some G1/S and G2/M phase genes differentially. It appears plausible that the thy-thy arresting method might completely prevent expression of some G1/S phase genes. Some of these genes could be recovered from the gene list of the 4th experiment using the thy-noc method.
Our results suggest that thy-thy interrupts PCNA and CDC6 mRNA synthesis in S phase arrest, and thy-noc interrupts CDC20 and FLJ10540 mRNA synthesis in G2/M arrest. After the cells are released, synthesis of the mRNAs for some affected genes resumes but with large variation in pace across cells. In other words, the phase distributions of PCNA and CDC6 in the cell population of exp2 are more spread out during the G1/S phase; and the phase distributions of FLJ10540 and CDC20 in the cell population of exp4 are more spread out in the G2/M phase. For example, the ratio between the two amplitudes of CDC20 in exp4 vs. exp2 is about 0.5. According to the ratio defined in (5), we could infer that the concentration parameter κ of the von Mises distribution for CDC20 in exp4 is bounded above by 2.5, provided that κ for CDC20 in exp2 is very large, e.g. >20. The significant difference between the two distributions with κ = 2 and 10 is illustrated graphically in an accompanying figure.
Figure. Plot of concentration parameter κ vs. expectation of cos(ε), where ε is von Mises distributed with zero mean direction and concentration κ, i.e., ε ~ VM(κ, 0).
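The CDC20 example can be worked through numerically by inverting E[cos(ε)] for κ. The sketch below computes E[cos(ε)] by direct numerical integration of the von Mises density and then solves for the concentration in exp4 by bisection, given an amplitude ratio of 0.5 and an assumed concentration of 20 in exp2. The function names `mean_cos_vonmises` and `solve_kappa`, the grid size, the search bracket and the choice κ = 20 are assumptions for illustration, and the maximum amplitudes are taken to be equal in the two experiments.

```python
import math

def mean_cos_vonmises(kappa, n=4096):
    """E[cos(eps)] for eps ~ von Mises(mean 0, kappa), by the periodic trapezoid rule."""
    num = 0.0
    den = 0.0
    for i in range(n):
        theta = -math.pi + 2.0 * math.pi * i / n
        w = math.exp(kappa * (math.cos(theta) - 1.0))  # density up to a constant factor
        num += math.cos(theta) * w
        den += w
    return num / den

def solve_kappa(target, low=0.0, high=500.0, tol=1e-8):
    """Find kappa with E[cos(eps)] = target by bisection (E[cos(eps)] increases with kappa)."""
    while high - low > tol:
        mid = 0.5 * (low + high)
        if mean_cos_vonmises(mid) < target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

# Amplitude ratio exp4/exp2 of about 0.5, with an assumed kappa of 20 in exp2.
target = 0.5 * mean_cos_vonmises(20.0)
kappa_exp4 = solve_kappa(target)
```

Under these assumptions the solved concentration for exp4 is a little above 1, well below 2.5; letting κ in exp2 grow without bound moves the solution only slightly upward, to about 1.16.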
Our results show that some cell cycle related genes may be more responsive or sensitive than others to changes in the environment, e.g. cell-arresting chemicals, temperature shift, etc. Raser and O'Shea [8] suggested that noise intrinsic to eukaryotic gene expression is gene-specific, and Fraser et al. [9] suggested that the production of essential and complex-forming proteins involves lower levels of noise than does the production of most other genes. Our findings indicate that the 43 cell cycle related genes with unaltered amplitudes in exp2 and exp4 of Whitfield et al. [11] may be essential to the HeLa cell cycle, and thus less sensitive to perturbation by stress or chemicals. However, CDC6 and CDC20, which are important to the yeast cell cycle [20], were expressed at significantly different amplitudes in the HeLa cell cycle. Further studies are needed to investigate whether some essential cell cycle genes such as CDC6 and CDC20 are cell type specific in response to chemicals.
The amplitude, phase angle and period estimated from (1) for genes from the microarray data are characteristic of cell populations rather than a single cell. Conventionally, amplitude and phase angle are considered independent parameters in a sinusoidal model. However, in microarray studies, where the measured periodic expression for a cell cycle related gene is averaged over a cell population (>10^6 cells), a change in the concentration of the von Mises distribution of phases for a gene can contribute to a change in amplitude. Note that our analysis partially addresses Winfree's concern about whether amplitude should be considered as additional information to phase in studies of circadian rhythms [17].
The detection of cell cycle related genes with significantly different amplitudes between exp2 and exp4 of Whitfield et al. [11] depends on: (i) approximation of the true distribution of the amplitudes $K_{gx}$ and $K_{gy}$, g = 1, ..., 47, by a normal distribution; and (ii) the design of exp2 and exp4, including the number of time points per gene. While these assumptions appear tenable for these data, a more comprehensive analysis of other relevant cell cycle gene expression studies is needed for more definitive conclusions about their validity. The four genes currently identified all have an estimated 1.5-fold change, and with the current sample size, the power to detect such a change is only around 50%. If the number of time points in exp2 and exp4 were larger (e.g. 47, as in exp3 of Whitfield et al. [11]), the power for detecting amplitudes with less than 2-fold change would be increased.
One often neglected but important factor in interpreting and analyzing cell cycle related gene expression data is the quality of synchrony of the cell culture. Currently there are no quantitative standards for measuring to what extent cells have been synchronized. The periodic patterns of the 47 genes were measured from stressed or perturbed cells in the 2nd and 4th experiments of Whitfield et al. [11]. Gene expression from normal, unperturbed and synchronized HeLa cells obtained using the technologies proposed by Helmstetter et al. [21] may serve as a reference for comparing the expression of these genes when mRNA synthesis is interrupted by different cell-arresting methods, e.g. temperature shift or chemical induction at various phases of the cell cycle. Good quality control of cell synchrony, as suggested in Cooper et al. [22], will provide a basis for microarray studies of cell cycle related genes.
More quantitative measures of cell culture synchrony, and investigation of the impacts of cell culture with various degrees of synchrony on expression of some cell cycle related genes, are needed in future studies.

Conclusion
The amplitudes of some cell cycle related genes were used to measure the effects of two different cell-arresting methods on gene expression. Some genes with periodic expression patterns can be used as models to probe the effects of different cell-arresting methods on expression of these genes, which can be quantitatively described in terms of amplitude and phase. The ratio between the amplitudes estimated in two experiments for a cell cycle related gene can be used to gauge the variation of the phase/peak expression time distribution involved in stochastic transcriptional and post-transcriptional processes for the gene in a cell population. Further investigations are needed using normal, unperturbed and synchronized HeLa cells as a reference for determining how many cell cycle related genes are directly or indirectly affected by various cell-arresting methods.