The time-profile of cell growth in fission yeast: model selection criteria favoring bilinear models over exponential ones

Background: There is considerable controversy concerning the exact growth profile of size parameters during the cell cycle. Linear, exponential and bilinear models are commonly considered, and the same model may not apply for all species. Selection of the most adequate model to describe a given data-set requires the use of quantitative model selection criteria, such as the partial (sequential) F-test, the Akaike information criterion and the Schwarz Bayesian information criterion, which are suitable for comparing differently parameterized models in terms of the quality and robustness of the fit but have not yet been used in cell growth-profile studies.
Results: Length increase data from representative individual fission yeast (Schizosaccharomyces pombe) cells measured on time-lapse films have been reanalyzed using these model selection criteria. To fit the data, an extended version of a recently introduced linearized biexponential (LinBiExp) model was developed, which makes possible a smooth, continuously differentiable transition between two linear segments and, hence, allows fully parametrized bilinear fittings. Despite relatively small differences, essentially all the quantitative selection criteria considered here indicated that the bilinear model was somewhat more adequate than the exponential model for fitting these fission yeast data.
Conclusion: A general quantitative framework was introduced to judge the adequacy of bilinear versus exponential models in the description of growth time-profiles. For single cell growth, because of the relatively limited data-range, the statistical evidence is not strong enough to favor one model clearly over the other and to settle the bilinear versus exponential dispute. Nevertheless, for the present individual cell growth data for fission yeast, the bilinear model seems more adequate according to all metrics, especially in the case of wee1Δ cells.


Background
During the division cycle of individual growing cells, most size-related parameters such as length (L), volume (V), surface area, dry mass and others show a continuous increase, but there is considerable controversy concerning the exact time-profile of these increases. To describe the growth period, commonly considered possibilities include linear, exponential and bilinear models, and various bodies of experimental evidence and theoretical considerations have been proposed to support one or the other [1]. The same model may not apply for all species, and because of the uncertainties in the experimental data and of the relatively small differences in predictions owing to the relatively limited data-range (approximate doubling of size during a cell cycle), it is difficult to identify the most adequate model unequivocally. Exponential models such as $V = \alpha e^{\beta t}$, which are easy to rationalize (the rate of growth is proportional to the existing size: dV/dt = βV) and convenient to parameterize (α, β) and implement, are often employed. However, a number of cases seem to support a bilinear-type growth pattern with growth occurring along two (or perhaps more) essentially linear segments, corresponding to constant rates, separated by a transitional period around a rate-change point (RCP) during which the rate of length-growth increases [2][3][4][5][6][7][8][9]. The difference between the two models is most evident in the time profiles of the speed (rate) of growth increases (dL/dt): that of the bilinear model contains two constant segments connected by a transition period (a characteristic sigmoid step-up function), whereas that of the exponential model shows a continuous, accelerating increase.
Whereas an exponential increase could be related to a steady growth of ribosome numbers, a bilinear pattern might be caused by effects of the cell cycle itself causing a relatively sudden rate-increase at an RCP (or more than one RCP). These effects have not yet been fully characterized. However, two different possibilities have been raised [10], one being passage through a cell-cycle stage (a so-called checkpoint) and the other being a doubling of structural genes, i.e., a "gene dosage" effect at DNA replication (S phase). A bilinear model seemed most adequate to describe the increase of cell length in fission yeast (Schizosaccharomyces pombe) as determined from detailed analyses of time-lapse films of single cells (wild-type, WT, and various mutants) [6,8,9]. In this cylindrical cell species, diameter does not change during the cycle; therefore, cell length is proportional to volume. The adequacy of the bilinear model has been questioned [11,12] by invoking Occam's razor, an often-used principle attributed to William of Occam (c. 1280-1349) that favors the most parsimonious model (originally Pluralitas non est ponenda sine necessitate, i.e., plurality should not be posited without necessity, but most often expressed as Entia non sunt multiplicanda praeter necessitatem, i.e., entities are not to be multiplied without necessity [13]). Accordingly, the exponential model was suggested as more adequate because it relies on fewer parameters and provides only a very slight worsening in the quality-of-fit as judged on the basis of the squared correlation coefficient ($r^2$) [11].
However, when differently parameterized models are fitted to the same data, $r^2$ alone is not a sufficient criterion for judging adequacy, and a number of quantitative indicators (model selection criteria) such as the partial (sequential) F-test, the Akaike information criterion (AIC) [14,15] and the Schwarz Bayesian information criterion (SBIC) [16] can be used to decide whether or not the improvement in fitting justifies the increased number of parameters employed (i.e., whether there is enough "necessity" for "entities to be multiplied") [17][18][19][20][21]. Related details are briefly discussed in the Methods section.
Here, a reanalysis of the fission yeast cell growth data is presented on the basis of these more rigorous, quantitative criteria, and a general quantitative framework is introduced to judge the adequacy of bilinear versus exponential models for describing the time-profiles of arbitrary growth processes. This was also made possible by extending a recently-introduced linearized biexponential model (LinBiExp) [21] to allow fitting of general bilinear-type data with a single, unified model. Originally, LinBiExp was introduced to describe quantitative structure-activity relationship (QSAR) data such as toxicities, antimicrobial activities and receptor-binding affinities that have a maximum or a minimum, but are essentially linear sufficiently far away from the zone of the turning point (the zone of the extreme value) [21,22]. However, by extending its parameter-range, LinBiExp can easily be generalized to describe not only data that show a maximum or a minimum, but also data that show only a rate-change between two essentially linear portions, such as those presented here and related to cell growth. Because LinBiExp makes possible a smooth, continuously differentiable and fully parameterizable transition between two linear segments, it is now possible to apply a unified model in a single fitting instead of the error-prone and bias-sensitive procedure of performing two separate linear regressions after visually separating the data into two linear portions. Hence, with LinBiExp, the minimization algorithm itself determines the two slope values ($\alpha_1$, $\alpha_2$) and the position of the rate change point ($t_{RCP}$) that result in the lowest sum of squared errors (SSE), and this no longer has to be done by the user relying on preconceived assumptions or mere visual inspection.
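The idea that the minimization itself locates the rate change point can be illustrated with a deliberately simplified sketch. It is not the LinBiExp fit used in this work: it assumes a sharp break (the limiting case of an abrupt transition) and profiles the SSE over a grid of candidate break points, solving an ordinary least-squares problem for the offset and the two slopes at each candidate. The function names and the synthetic, noise-free data are hypothetical.

```python
def solve3(a, b):
    """Solve the 3x3 linear system a.x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_sharp_bilinear(t, y, tau_grid):
    """Profile the SSE over candidate break points; offset and slopes come from least squares."""
    best = None
    for tau in tau_grid:
        # Require two points strictly on each side so that both slopes are identifiable.
        if sum(ti < tau for ti in t) < 2 or sum(ti > tau for ti in t) < 2:
            continue
        # Basis: constant, time before the break (<= 0), time after the break (>= 0).
        rows = [(1.0, min(ti - tau, 0.0), max(ti - tau, 0.0)) for ti in t]
        ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
        chi, a1, a2 = solve3(ata, aty)
        sse = sum((yi - (chi + a1 * r[1] + a2 * r[2])) ** 2 for r, yi in zip(rows, y))
        if best is None or sse < best["sse"]:
            best = {"tau": tau, "chi": chi, "alpha1": a1, "alpha2": a2, "sse": sse}
    return best


# Synthetic, noise-free profile (hypothetical values, not the data of Table 1).
times = [5.0 * i for i in range(24)]  # 0, 5, ..., 115 min
lengths = [6.0 + (0.02 if ti < 45 else 0.04) * (ti - 45) for ti in times]
fit = fit_sharp_bilinear(times, lengths, tau_grid=[float(v) for v in range(10, 106)])
print(fit)
```

On such idealized data the search returns the break point and slopes used to generate the profile, without the user having to split the data by eye; with real, noisy data a smooth-transition model fitted by nonlinear regression would be the more appropriate tool.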

Data
Cell length growth data are for individual fission yeast (Schizosaccharomyces pombe) cells (Table 1), selected as representative during the analysis of a large number of cell cycles (40-80 for each strain). These single cell data were determined using time-lapse microscopic films and are from previous publications [8,12]. The length increases occurring during the 5 min observation periods were often less than the smallest quantifiable unit, as the resolution was 0.33 µm for the wild-type and 0.13 µm for the wee1∆ mutant cell, depending on the final magnification. As a consequence, the growth profiles tended to have stair-like patterns with a number of plateaus; these were short inside the cycle, but there was a long plateau at the end of the cycle. To obtain more uniform profiles, they were smoothed using the resistant smooth (rsmooth) procedure of Minitab 7.2 (Minitab, State College, PA, USA) using the default "4235H, twice" method, similar to the original publications. To verify consistency, smoothing has also been redone here with Sigma Plot 8.0 (SPSS Inc., Chicago, IL, USA) and with a 2D bisquare, $(1 - u^2)^2$, or Loess, $(1 - |u|^3)^3$, smoothing using the nearest neighbor bandwidth method and a sampling proportion of 0.3; these resulted in almost identical values. For example, average differences between the rsmooth and Loess values were only 0.008 µm and 0.021 µm for the wee1∆ and WT cell lines, respectively (Table 1). Data up to 135 min for the WT cell and 115 min for the wee1∆ cell were considered as part of the growth period and were used for fitting.
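For readers unfamiliar with this kind of smoothing, the following is a minimal sketch of a Loess-type smoother with the tricube weight $(1 - |u|^3)^3$, a nearest-neighbor bandwidth and a sampling proportion of 0.3. It assumes local straight-line fits (the polynomial degree is not stated above) and is not a reimplementation of the Sigma Plot or Minitab routines; the stair-like input profile is hypothetical.

```python
import math


def tricube(u):
    """Loess weight (1 - |u|^3)^3 for |u| < 1, zero otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess(x, y, span=0.3):
    """Smooth y(x) with local weighted straight-line fits over the nearest span*n points."""
    n = len(x)
    k = max(3, math.ceil(span * n))
    smoothed = []
    for x0 in x:
        nearest = sorted(range(n), key=lambda i: abs(x[i] - x0))[:k]
        h = max(abs(x[i] - x0) for i in nearest)  # bandwidth: distance to the k-th neighbor
        w = [tricube((x[i] - x0) / h) for i in nearest]
        sw = sum(w)
        mx = sum(wi * x[i] for wi, i in zip(w, nearest)) / sw
        my = sum(wi * y[i] for wi, i in zip(w, nearest)) / sw
        sxx = sum(wi * (x[i] - mx) ** 2 for wi, i in zip(w, nearest))
        sxy = sum(wi * (x[i] - mx) * (y[i] - my) for wi, i in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Hypothetical stair-like profile: steady growth recorded in 0.13 um steps every 5 min.
t = [5.0 * i for i in range(24)]
raw = [5.0 + 0.13 * math.floor(0.03 * ti / 0.13) for ti in t]
smooth = loess(t, raw)
```

Because each local fit is a weighted straight line, perfectly linear input is returned unchanged, while plateaus caused by the limited resolution are averaged over neighboring time points.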

Model for bilinear-type data: LinBiExp
Bilinear fitting was done with the LinBiExp model [21], which relies on the following functional form (written here as a function of time t instead of a general independent variable x and with all adjustable parameters denoted in Greek symbols):
$$f(t) = \eta \ln\left[e^{\alpha_1 (t - \tau_c)/\eta} + e^{\alpha_2 (t - \tau_c)/\eta}\right] + \chi \qquad (1)$$
Here e (e = 2.718...) denotes the base of the natural logarithm ($\ln x = \log_e x$), and $\alpha_1$, $\alpha_2$, χ, $\tau_c$ and η are adjustable parameters. This form is somewhat more complex than those of simple linear models, f(t) = αt + χ, because it contains the logarithm of the sum of two exponentials, and it is not suitable for linear regression because it contains nonlinear parameters ($\tau_c$, η). Nevertheless, it allows a convenient extension of linear models with $\alpha_1$ and $\alpha_2$ representing the two different slopes and $\tau_c$ essentially corresponding to the rate change point $t_{RCP}$. LinBiExp as defined by eq. 1 is a very general bilinear model: the transition from one linear segment to the other does not necessarily have to be along a sharp break point between two lines; it can happen along a smooth, curved portion of adjustable width. The η parameter regulates the smoothness/abruptness of the transition between the two linear portions with smaller absolute values corresponding to more abrupt transitions [21]. Because QSAR data are usually on a decimal log-scale and are arranged to show a maximum, LinBiExp was implemented there in a slightly different form and, in most cases, η was considered as having a fixed value of 1/ln10 = 0.4343 [21,22]. No such considerations apply to the present extension; therefore, η is considered as an adjustable parameter, the only restriction being that its value has to remain sufficiently small to maintain a fast-enough transition between the two linear portions (i.e., to maintain an observably bilinear character over the investigated time-range, meaning that the rate of increase, dL/dt, remains constant for at least some time in both the beginning and the ending time-periods). Depending on the actual data, this might in some cases require an upper limit to be imposed on η, but no such restrictions were needed here. To be able to describe general bilinear data of arbitrary shapes and curvatures, $\alpha_1$, $\alpha_2$ and η must be allowed to take both positive and negative values; however, all of them are always positive for the present data.
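The roles of the parameters can be made explicit with a short derivation. It assumes that the LinBiExp form is $f(t) = \eta \ln[e^{\alpha_1 (t - \tau_c)/\eta} + e^{\alpha_2 (t - \tau_c)/\eta}] + \chi$, i.e., the logarithm of the sum of two exponentials described above; under that assumption the growth rate is a logistic step between the two slopes:

```latex
\begin{aligned}
\frac{df}{dt}
  &= \frac{\alpha_1 e^{\alpha_1 (t-\tau_c)/\eta} + \alpha_2 e^{\alpha_2 (t-\tau_c)/\eta}}
          {e^{\alpha_1 (t-\tau_c)/\eta} + e^{\alpha_2 (t-\tau_c)/\eta}} \\
  &= \alpha_1 + \frac{\alpha_2 - \alpha_1}{1 + e^{-(\alpha_2 - \alpha_1)(t-\tau_c)/\eta}}
\end{aligned}
```

For positive η and $\alpha_2 > \alpha_1$, the rate therefore tends to $\alpha_1$ well before $\tau_c$ and to $\alpha_2$ well after it, equals $(\alpha_1 + \alpha_2)/2$ at $t = \tau_c$, and changes over a time scale of $\eta/(\alpha_2 - \alpha_1)$, so that smaller η gives a sharper transition.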
Thus, LinBiExp uses a novel functional form, the logarithm of the sum of two exponentials, to obtain a completely general bilinear functionality that can now fit not only data with a minimum or a maximum, such as those commonly seen in QSAR cases, but also data that show a rate-change, such as those seen for certain growth profiles.
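A minimal numerical sketch of such a function is given below. It assumes the form $f(t) = \eta \ln[e^{\alpha_1 (t - \tau_c)/\eta} + e^{\alpha_2 (t - \tau_c)/\eta}] + \chi$ and evaluates it with the usual log-sum-exp rearrangement to avoid overflow for small η. The function names and the parameter values in the example call are hypothetical illustrations, not fitted results.

```python
import math


def linbiexp(t, alpha1, alpha2, chi, tau_c, eta):
    """Logarithm of the sum of two exponentials (assumed LinBiExp form), overflow-safe."""
    x1 = alpha1 * (t - tau_c) / eta
    x2 = alpha2 * (t - tau_c) / eta
    m = max(x1, x2)  # factor out the larger exponent before taking the logarithm
    return eta * (m + math.log(math.exp(x1 - m) + math.exp(x2 - m))) + chi


def linbiexp_rate(t, alpha1, alpha2, tau_c, eta):
    """First derivative df/dt: a logistic step from alpha1 to alpha2 centered at tau_c."""
    z = (alpha2 - alpha1) * (t - tau_c) / eta
    if z >= 0:
        w = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)
        w = ez / (1.0 + ez)
    return alpha1 + (alpha2 - alpha1) * w


# Hypothetical parameter values chosen only to show the step-up in the growth rate.
params = dict(alpha1=0.02, alpha2=0.04, tau_c=45.0, eta=0.01)
for t in (0.0, 45.0, 115.0):
    print(t, linbiexp(t, chi=6.0, **params), linbiexp_rate(t, **params))
```

Evaluating the rate at times well before, at, and well after the center shows the two constant-rate regimes and the midpoint value between them.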
The nonlinear fittings required for LinBiExp can be performed using either the Excel (Microsoft, Seattle, WA, USA) worksheet or the custom-built WinNonlin (Pharsight Corp., Mountain View, CA) model provided with the model [21] (or, obviously, any other implementation with any software capable of nonlinear regression). Those presented here were performed with WinNonlin 5.0, a software package developed for pharmacokinetic modeling [17], but well-suited for the present purposes. The Gauss-Newton (Levenberg and Hartley) minimization algorithm was used with the convergence criteria set to $10^{-5}$, the increment for partial derivatives set to $10^{-3}$, and the number of iterations set to 50. User-provided initial parameter estimates and bounds were employed. All fittings were done with unweighted data. Because LinBiExp uses a smooth, continuously differentiable functional form, the optimization process is relatively trouble-free; nevertheless, sufficient care is recommended to verify that a global and not just a local minimum is reached (e.g., by using a more stringent convergence criterion and by starting with different initial parameter values from both sides of the final values). Multiple linear regressions and additional statistical analyses were performed in Excel.

Model selection criteria
Because the various models discussed here use different numbers of parameters ($n_{par}$), it is not sufficient to rely simply on the correlation coefficient r or its square $r^2$:
$$r^2 = 1 - \frac{SSE}{SS_y}$$
which is a measure of the variance explained in the predicted variable y = f(x) and is expressed here as a function of the overall (total) variance, $SS_y = \sum_i (y_i - y_{mean})^2$, and of the sum of squared errors (residual variance), $SSE = \sum_i (y_i - y_{i,pred})^2$; it is likely to increase with an increasing number of parameters. Further discrimination between rival models (model selection criteria) is needed. Improvement (decrease) in the residual standard deviation (s) is a first possibility, as it accounts at least in part for the change in the degrees of freedom, $df = n_{obs} - n_{par}$:
$$s = \sqrt{\frac{SSE}{df}} = \sqrt{\frac{SSE}{n_{obs} - n_{par}}}$$
More accurate indicators (model selection criteria) include, for example, the partial (sequential) F-tests, Mallows's $C_p$, the Akaike information criterion (AIC), the Schwarz Bayesian information criterion (SBIC), the minimum description length (MDL), cross validation (CV, including prediction sum of squares PRESS statistics), and Bayesian model selection [17][18][19][20]. The F-statistics, by using the p-value of the corresponding F probability distribution, verifies whether the reduction in SSE is statistically significant as the corresponding degrees of freedom (df) decrease:
$$F = \frac{(SSE_1 - SSE_2)/(df_1 - df_2)}{SSE_2/df_2}$$
where the subscripts 1 and 2 denote the simpler and the more complex model, respectively. The Akaike information criterion (AIC) [14,15] and the Schwarz Bayesian information criterion (SBIC) [16] were also used:
$$AIC = n_{obs} \ln(SSE) + 2 n_{par}$$
$$SBIC = n_{obs} \ln(SSE) + n_{par} \ln(n_{obs})$$
They both attempt to quantify the information content of a given set of parameter estimates by relating SSE to the number of parameters required to obtain the fit. The model associated with smaller values of AIC and SBIC is more appropriate, and, as shown by their definitions, SBIC is a more restrictive criterion on increasing $n_{par}$. Sometimes, they are used in terms of $\ln(SSE/n_{obs})$, but for a given data-set with minimization of AIC and/or SBIC as the goal, this makes no difference.
AIC is similar to Mallows's $C_p$ [23]:
$$C_p = \frac{SSE}{\sigma^2} + 2 n_{par} - n_{obs} \approx \frac{SSE}{s^2_{full\ model}} + 2 n_{par} - n_{obs} \qquad (9)$$
(being essentially the same if σ is known), and its asymptotic equivalence with leave-one-out (LOO) cross-validation has been demonstrated by Stone [24].
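The criteria discussed in this section can be computed directly from the SSE of each fit. The sketch below assumes the conventions $r^2 = 1 - SSE/SS_y$, $s = \sqrt{SSE/df}$, $F = [(SSE_1 - SSE_2)/(df_1 - df_2)]/(SSE_2/df_2)$, $AIC = n_{obs} \ln(SSE) + 2 n_{par}$ and $SBIC = n_{obs} \ln(SSE) + n_{par} \ln(n_{obs})$; other software may use differently scaled definitions. The SSE values in the example are hypothetical and only illustrate the comparison of a two-parameter exponential with a five-parameter bilinear model.

```python
import math


def r_squared(sse, ss_y):
    """Fraction of the total variance explained by the model."""
    return 1.0 - sse / ss_y


def residual_sd(sse, n_obs, n_par):
    """Residual standard deviation with df = n_obs - n_par."""
    return math.sqrt(sse / (n_obs - n_par))


def partial_f(sse_simple, df_simple, sse_complex, df_complex):
    """Partial (sequential) F statistic for nested-type comparison of two fits."""
    return ((sse_simple - sse_complex) / (df_simple - df_complex)) / (sse_complex / df_complex)


def aic(sse, n_obs, n_par):
    """Akaike information criterion in the assumed n_obs*ln(SSE) form."""
    return n_obs * math.log(sse) + 2 * n_par


def sbic(sse, n_obs, n_par):
    """Schwarz Bayesian information criterion in the assumed n_obs*ln(SSE) form."""
    return n_obs * math.log(sse) + n_par * math.log(n_obs)


# Hypothetical SSE values for 24 observations (not results from this study).
n_obs = 24
exp_fit = {"sse": 0.060, "n_par": 2}
bil_fit = {"sse": 0.020, "n_par": 5}
f_stat = partial_f(exp_fit["sse"], n_obs - exp_fit["n_par"],
                   bil_fit["sse"], n_obs - bil_fit["n_par"])
for name, fit in (("exponential", exp_fit), ("bilinear", bil_fit)):
    print(name,
          residual_sd(fit["sse"], n_obs, fit["n_par"]),
          aic(fit["sse"], n_obs, fit["n_par"]),
          sbic(fit["sse"], n_obs, fit["n_par"]))
print("partial F:", f_stat)
```

The p-value for the F statistic would then be read from the F distribution with $(df_1 - df_2, df_2)$ degrees of freedom, which requires a statistics library or tables and is not shown here.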

Length growth pattern in wild-type fission yeast
Growth of the wild-type (WT) cell considered is less clearly bilinear as there appears to be no sudden rate-change. Instead, there is a curved middle part corresponding to a transition section (Figure 1). Consequently, the exponential and the bilinear LinBiExp models gave very similar fits that are hard to distinguish visually over most of their ranges. Nevertheless, even on these data, most [...] The bilinear model of eq. 12 gives a slightly better performance than the exponential one of eq. 11 as judged from s and AIC (they decrease) but not from the more restrictive SBIC, which is more sensitive to the increase in the number of adjustable parameters. According to the F-statistics, the improvement in the quality of fit is statisti[...] LinBiExp, where it deviates significantly from both its linear segments, is proportional to $\eta/(\alpha_2 - \alpha_1)$; here, data points deviating by more than 0.1 µm from both linear trend-lines were considered as part of the transition section and denoted with a different color (Figure 1). For this particular WT cell, the two slopes obtained from LinBiExp (0.042 µm min$^{-1}$, 0.064 µm min$^{-1}$; eq. 12) correspond to an approximately 50% rate increase and are in excellent agreement with those obtained by separate linear regressions on the two end segments (0.044 µm min$^{-1}$, 0.063 µm min$^{-1}$) as shown in Figure 1. This is somewhat higher than the average of 31% observed for these cells [8], but this is mainly due to the large scattering among individual cells in the population. The position of the RCP at about the 0.36 fraction of the cell cycle (at 62 min with a cycle time of ~170 min; eq. 12, Figure 1) is in excellent agreement with the average observed for WT cells (0.34) [8].

Length growth pattern in wee1∆ mutant fission yeast
Growth of the representative mutant cell (wee1∆) examined is much more clearly bilinear with a much more abrupt transition (Figure 2); here, consequently, the bilinear model provides a much more clearly superior fit than the exponential model: [...] For these data, the difference between the two models and the systematic error of the exponential model are much more pronounced according to all metrics and are much more clearly present even by visual inspection (Figure 2). Consequently, the F-statistic also indicates a much more significant difference [...] rate-increase (somewhat less than the average of 100% observed for these mutants [8]). The rate-change point ($t_{RCP}$) is quite clearly delimited and is around 45 min (eq. 15; Figure 2), which corresponds to the 0.28 fraction of the cell cycle, in excellent agreement with the average of 0.27 for these cells. It is also worth noting that the overall growth-rate of the whole cell cycle, (division length − birth length)/cycle time, corresponds to the growth-rate of the first growth period ($\alpha_1$), as the increased rate in the second growth period after the RCP ($\alpha_2$) only makes up for the part that is lost during the final, constant-length period. This can clearly be seen in both figures as the first trend-line catches up with the length data exactly at the end of the cycle, so that the growth rate of the daughter cell(s) will be exactly the same as that of the mother cell, as it should be. For example, in this cell, the overall growth rate is (8.41 µm − 4.94 µm)/160 min = 0.0216 µm min$^{-1}$, which is in good agreement with the corresponding average of (8.4 µm − 5.0 µm)/155 min = 0.0220 µm min$^{-1}$ obtained from data from 129 cells [8], and corresponds excellently with the growth rate of the first period: $\alpha_1$ = 0.0213 µm min$^{-1}$.
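The arithmetic behind this comparison can be retraced in a few lines, using only the values quoted in the paragraph above (birth and division lengths, cycle time, the position of the RCP and $\alpha_1$); the variable names are ad hoc.

```python
birth_length = 4.94     # um, this wee1 mutant cell
division_length = 8.41  # um
cycle_time = 160.0      # min
alpha_1 = 0.0213        # um/min, slope of the first growth period
t_rcp = 45.0            # min, approximate rate change point

# Overall growth rate over the whole cycle, (division length - birth length) / cycle time.
overall_rate = (division_length - birth_length) / cycle_time
# Relative difference between the overall rate and the first-period slope.
relative_difference = abs(overall_rate - alpha_1) / alpha_1
# Length reached at the end of the cycle by extrapolating the first trend-line from birth.
first_line_at_division = birth_length + alpha_1 * cycle_time
# Position of the RCP as a fraction of the cell cycle.
rcp_fraction = t_rcp / cycle_time

print(overall_rate, relative_difference, first_line_at_division, rcp_fraction)
```

The overall rate and $\alpha_1$ differ by less than 2%, and the first trend-line extrapolated over the full cycle ends within about 0.1 µm of the observed division length.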

Discussion
In balanced growth of asynchronous populations of unicellular organisms, total cell mass increases exponentially as a function of time in parallel with cell number; i.e., both exponential functions are characterized by the same β parameter. This also means that every cell (or more precisely, the "average" cell) must double its mass between birth and division. The simplest hypothesis supposes that the size (volume) of individual cells during the cycle grows by the very same exponential function characterized by the very same β parameter. The only problem with this hypothesis is that many experiments with different organisms do not support it, and, at least in some cases, linear patterns with one or more rate change point(s) have been found instead [1]. This is a crucial point in cell physiology, since the two pattern-types reflect totally different strategies: namely, exponential growth means that progression through the cell cycle has no effect on growth at all, whereas the existence of rate change point(s) in a linear [...] An attractive model organism in these studies is fission yeast, since its length (which is proportional to its volume) can be followed very easily on time-lapse microscopic films. It has long been known that there are at least two rate change points in length growth during the cell cycle of wild-type fission yeast cells [25].
Figure 3. Time-profiles of the speed (rate) of length-growth (∆L/∆t for the experimental data and dL/dt, the first order derivative, for the model functions) for the two types of cells investigated here, together with those obtained from the best-fitting exponential (Exp) and bilinear (LinBiExp) models.
One of them is connected to mitosis; from this point (designated rate change point 3 in Figure 1 and Figure 2) and up to cytokinesis, cell wall synthesis is restricted to septum formation in the middle of the cell leading to a cessation of length growth. After division, the newborn progeny immediately start to grow in length, meaning that there must be another RCP at the beginning of the cycle (designated rate change point 1 in Figure 1 and Figure 2). As a consequence, the cell cycle definitely influences length growth in fission yeast; however, whether or not growth is exponential between RCP1 and RCP3 remains an open question. Experiments seem to favor a bilinear pattern with a third RCP (designated as rate change point 2 in Figure 1 and Figure 2) over an exponential one [6,8,9]; however, detailed statistical analysis has been lacking.
Because there is only a relatively limited range for both the dependent (L) and the independent (t) variables in the cases considered here, the statistical evidence suggesting a bilinear dependence rather than an exponential one is not strong enough to favor one model unequivocally over the other. Nevertheless, the bilinear time-profile seems more adequate according to model selection criteria standards, as described in the Methods section, especially in the case of the wee1∆ cells. This is also well illustrated by a comparison of the predicted speeds of length-growth in the best-fitting exponential and bilinear models (Figure 3): the characteristic sigmoid step-up profile obtained from the bilinear model fits the experimental data for wee1∆ much better than the continuously increasing profile obtained from the exponential model, but the case of the WT is less clear.
A major goal of the present paper is to propose a general quantitative framework for judging the adequacy of bilinear versus exponential models for arbitrary growth profiles. Hopefully, in addition to the relatively limited number of applications included here, the present detailed description of quantitative model selection procedures will also help to differentiate accurately among linear, exponential and bilinear models for future cell growth data. Furthermore, by introducing the fully optimizable bilinear model LinBiExp, the cumbersome approach of performing two separate linear regressions after separating the data at a visually determined place can be replaced by a single, unified fitting. Hence, the nonlinear regression algorithm itself will determine the position of the rate change point ($t_{RCP}$) and the value of the two slopes on its left and right sides ($\alpha_1$, $\alpha_2$, respectively) by minimizing the sum of squared errors (SSE), and this will not have to be done by the user on the basis of preconceived assumptions or mere visual inspection. To facilitate the application of these models and model selection criteria further, a fully functional Excel worksheet-based implementation, which relies on Excel's powerful Solver data analysis tool and contains detailed instructions, is included as a downloadable supplement (see additional file 1: Excel spreadsheet with the wee1∆ data used to perform this analysis). Finally, we are certain that from a cell biologist's perspective, it might be difficult to accept that a mutant shows a particular phenomenon more clearly than the wild type. In such cases, the effect of the mutation on the observed phenomenon should also be examined. We are fortunate to be able to say that the bilinear length growth pattern of fission yeast is probably not an artifact produced somehow by deleting the wee1 gene from the genome.
Formerly, we assumed that the reason for the existence of RCP2 in WT is different from that in the wee1∆ mutant [10]. At about one-third of their cycle, WT cells are in mid-G2 phase; they are just passing through the so-called mitotic checkpoint and are changing from unipolar to bipolar growth (a phenomenon called new end take-off, NETO, see [6]). It is easy to imagine that the RCP caused by NETO is not a sharp one, since the growth rate at the new end may continuously increase for a period. In contrast, the small-sized wee1∆ mutant cells have a quite different type of cell cycle: at about one-quarter of their cycle, they are just replicating their DNA [26], which is a fast process on the scale of the whole cycle. As a consequence, S phase could cause the rate change here via the gene dosage effect, which might be a much sharper process, leading to a clear bilinear pattern. Note that the rate increase at RCP2 is also larger in the wee1∆ mutant than in wild type [8].