
# A statistical model for mapping morphological shape

Guifang Fu^{1, 2}, Arthur Berg^{2}, Kiranmoy Das^{1, 2}, Jiahan Li^{1, 2}, Runze Li^{1, 2} and Rongling Wu^{3, 2, 1}

**7**:28

https://doi.org/10.1186/1742-4682-7-28

© Fu et al; licensee BioMed Central Ltd. 2010

**Received:** 11 February 2010 **Accepted:** 1 July 2010 **Published:** 1 July 2010

## Abstract

### Background

Living things come in all shapes and sizes, from bacteria, plants, and animals to humans. Knowledge about the genetic mechanisms for biological shape has far-reaching implications for a broad spectrum of scientific disciplines, including anthropology, agriculture, developmental biology, evolution and biomedicine.

### Results

We derived a statistical model for mapping specific genes or quantitative trait loci (QTLs) that control morphological shape. The model was formulated within the mixture framework, in which different types of shape are thought to result from genotypic differences at a QTL. The EM algorithm was implemented to estimate QTL genotype-specific shapes based on a shape correspondence analysis. Computer simulation was used to investigate the statistical properties of the model.

### Conclusion

By identifying specific QTLs for morphological shape, the model will help to pose and address many major integrative biological and genetic questions about the genetic control of biological shape and function.

## Keywords

- Quantitative Trait Locus
- Shape Mapping
- Leaf Shape
- Backcross Progeny
- Signed Distance Function

## Background

Morphological shape is one of the most conspicuous aspects of an organism's phenotype and provides an intricate link between biological structure and function in changing environments [1, 2]. For this reason, comparing the anatomical and shape features of organisms has been a central element of biology for centuries. More recently, attempts have been made to unlock the genetic secrets behind phenotypic differentiation in developmental shape [3], understand the origin and pattern of shape variation from a developmental perspective [4, 5], and predict the adaptation of morphological shapes in a range of environmental conditions [6].

Three major advances in the life and physical sciences over recent decades have made it possible to study shape variation and its biological underpinnings. First, DNA-based molecular markers allow the identification of quantitative trait loci (QTLs) and biochemical pathways that contribute to quantitatively inherited traits such as shape. In his seminal review, Tanksley [3] summarized some major discoveries of genes for fruit size and shape in tomato. Over a long process of domestication, tremendous shape variation has arisen in tomato fruit, from almost invariably round (wild or semiwild types) to round, oblate, pear-shaped, torpedo-shaped, and bell pepper-shaped (cultivated types). Some of the QTLs that cause these differences, namely *fw2.2*, *ovate*, and *sun*, have been cloned [7–9].

Second, digital technologies, through computerized analysis and processing procedures, can produce a comprehensive representation of the objects involved, capable not only of representing most of the original information but also of emphasizing their less redundant portions [10–15]. Third, statistical and computational technologies have been well developed for analyzing high-dimensional, large-scale, high-throughput data of high complexity [16, 17]. Building on the development of missing data analysis, Lander and Botstein [18] pioneered an approach for dissecting complex quantitative traits into individual QTLs using genetic linkage maps constructed with molecular markers. A vast literature has since developed on QTL mapping models (see [19–25] among many others).

The motivation of this study is to develop a statistical and computational model for mapping specific QTLs that are responsible for differences in morphological shape. Historically, genetic mapping has focused on the genetic control of a trait at a static point, ignoring the dynamic behavior and spatial properties of the trait. Now, by integrating the developmental principle of trait growth, a new genetic mapping approach, called functional mapping [26–28], can be used to study the dynamic control of genes over a time course. The central idea of functional mapping is to connect the genetic control of a developmental trait at different time points through robust mathematical and statistical equations. Complementary to functional mapping, the model developed for shape mapping in this study links gene action with key morphometric parameters of a shape within a statistical framework. We will perform computer simulation to examine the statistical properties of the model.

## Model

### Genetic Design

Consider a backcross population of size *n*, founded with two inbred lines that contrast sharply in leaf shape. Because of gene segregation, there is a range of variation in leaf shape among the backcross progeny. Such shape variation is illustrated in Fig. 1 using leaf morphology in cucurbit plants [29]. To map the shape trait, the mapping population is typed for a panel of molecular markers from which a genetic linkage map covering the genome is constructed. The statistical approach for linkage analysis and map construction is reviewed in Wu et al. [30]. Assume that there are some specific QTLs responsible for the biological shape. The approach being developed aims to detect and map such QTLs by capitalizing on knowledge about shape analysis and the biological principles behind shape formation and variation.

### Shape Analysis

For a given leaf shape *I*^{i} (*i* = 1, ..., *n*), described by a black and white image, the image is gridded as an *L* × *L* matrix, where *L* is the number of pixels in each row and column. At each point in the matrix, we use 0 to denote the background (black) and 1 to denote the leaf (white), whatever its shape. The 1/0 value of the matrix is assumed to follow a Bernoulli distribution. All these *n* shapes, *T* = {*I*^{1}, *I*^{2}, ..., *I*^{n}}, need to be aligned in order to minimize the interference caused by pose variations. This can be carried out by establishing a coordinate reference with respect to position, scale and rotation, commonly known as pose, to which all shapes are aligned [10, 12, 14]. Denote the pose parameter for each shape *I*^{i} by *p*^{i} = [*a*, *b*, *h*, *θ*]^{T}, where *a* and *b* correspond to *x* and *y* translations, *h* is the scaling parameter, and *θ* corresponds to rotation. The transformed image of *I*^{i}, based on the pose parameter *p*^{i}, is denoted by *Ĩ*^{i}.

*Ĩ*^{i} is defined through a transformation matrix *T*[*p*], which is the product of three matrices: a translation matrix *M*(*a*, *b*), a scaling matrix *H*(*h*), and an in-plane rotation matrix *R*(*θ*). The transformation matrix *T*[*p*] maps the coordinates (*x*, *y*) ∈ *R*^{2} into the transformed coordinates (*x̃*, *ỹ*) ∈ *R*^{2}, where *x*, *y* = 1, ..., *L*.
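As a sketch, the pose transformation can be written in homogeneous coordinates. The explicit matrix entries and the order of the product below are assumptions (a common convention for similarity transforms), not the display equation of the source; (x̃, ỹ) denotes the transformed coordinates.

```latex
\begin{bmatrix} \tilde{x} \\ \tilde{y} \\ 1 \end{bmatrix}
= T[p] \begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad
T[p] = M(a,b)\, H(h)\, R(\theta)
= \begin{bmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{bmatrix}
  \begin{bmatrix} h & 0 & 0 \\ 0 & h & 0 \\ 0 & 0 & 1 \end{bmatrix}
  \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
```

Under this convention a point is first rotated by θ, then scaled by h, then shifted by (a, b).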

One strategy for jointly aligning the *n* binary images is to use gradient descent to minimize an energy function *E* (equation 2) defined over the image domain Ω. Minimizing the energy function (2) is equivalent to simultaneously minimizing the difference between any pair of binary images in the training database. What we would like to estimate is the pose parameter *p*^{i} for each *I*^{i}.

*p*^{i} and *Ĩ*^{i} are given in each iterative step. The steepest descent algorithm is then used to minimize *E* in (2) and obtain the pose parameter *p*^{i} for each shape *I*^{i}. The training shapes obtained after this alignment procedure are shown in Fig. 2.
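The idea of an alignment energy can be illustrated with a small sketch. The pairwise form used here (squared difference normalized by squared sum, summed over all pairs) is an assumption borrowed from the level-set shape alignment literature rather than the exact equation (2), and only integer translations are handled; the function names are hypothetical.

```python
def pair_energy(img_a, img_b):
    """Squared difference of two binary images, normalized by their squared sum."""
    num = sum((a - b) ** 2 for ra, rb in zip(img_a, img_b) for a, b in zip(ra, rb))
    den = sum((a + b) ** 2 for ra, rb in zip(img_a, img_b) for a, b in zip(ra, rb))
    return num / den if den > 0 else 0.0


def alignment_energy(images):
    """Sum the pairwise energy over all ordered pairs i != j."""
    return sum(
        pair_energy(a, b)
        for i, a in enumerate(images)
        for j, b in enumerate(images)
        if i != j
    )


def translate(img, dx, dy):
    """Shift a binary image by whole pixels, filling uncovered cells with 0."""
    size = len(img)
    out = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            ny, nx = y + dy, x + dx
            if 0 <= ny < size and 0 <= nx < size:
                out[ny][nx] = img[y][x]
    return out


def make_square(size, lo, hi):
    """Binary image with a filled square covering rows and columns lo..hi-1."""
    return [[1 if lo <= y < hi and lo <= x < hi else 0 for x in range(size)]
            for y in range(size)]


reference = make_square(16, 4, 10)
shifted = translate(reference, 3, 2)       # same shape, different pose

energy_misaligned = alignment_energy([reference, shifted])
energy_aligned = alignment_energy([reference, translate(shifted, -3, -2)])
print(energy_misaligned, energy_aligned)
```

The energy is positive for the misaligned pair and drops to zero once the translation is undone, which is the behavior a gradient descent over the pose parameters would exploit.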

### Statistical Model

The statistical model is built on {*Ĩ*^{1}, *Ĩ*^{2}, ..., *Ĩ*^{n}}, i.e., the transformed images, which now become continuous variables. The signed distance function was used as a shape descriptor to represent the contours of the shape. Each contour is embedded as the zero level set of a signed distance function, with negative distances assigned to the inside and positive distances assigned to the outside. This technique yields *n* level set functions *Y* = {*Y*_{1}, *Y*_{2}, ..., *Y*_{n}} corresponding to the *n* aligned training shapes above. From the standpoint of QTL mapping, we treat *Y* = {*Y*_{1}, *Y*_{2}, ..., *Y*_{n}} as the multiple phenotypic traits of the *n* individuals. For a progeny *i* (*i* = 1, 2, ..., *n*), *Y*_{i} takes one value at each point of the *L* × *L* grid. Thus, each individual has a total of *m* = *L*^{2} phenotypes.
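A minimal sketch of the signed distance descriptor follows. It uses a brute-force discrete approximation (distance from each pixel to the nearest pixel of the opposite class), which is an assumption made for brevity; the function names are hypothetical and the image must contain both leaf and background pixels.

```python
import math


def signed_distance(img):
    """Discrete signed distance map: negative inside the shape, positive outside."""
    size = len(img)
    inside = [(y, x) for y in range(size) for x in range(size) if img[y][x] == 1]
    outside = [(y, x) for y in range(size) for x in range(size) if img[y][x] == 0]
    sdf = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            # Distance to the nearest pixel on the other side of the contour.
            targets = outside if img[y][x] == 1 else inside
            d = min(math.hypot(y - ty, x - tx) for ty, tx in targets)
            sdf[y][x] = -d if img[y][x] == 1 else d
    return sdf


def stack_columns(matrix):
    """Stack the columns of an L x L matrix into one vector of length m = L * L."""
    size = len(matrix)
    return [matrix[y][x] for x in range(size) for y in range(size)]


# Toy 8 x 8 binary image with a filled square in rows and columns 2..5.
leaf = [[1 if 2 <= y <= 5 and 2 <= x <= 5 else 0 for x in range(8)] for y in range(8)]
sdf = signed_distance(leaf)
phenotype = stack_columns(sdf)   # the m = 64 phenotypes of this individual
```

Each pixel value of the distance map then plays the role of one phenotype for that progeny.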

*Y* is assumed to have arisen from one of the two groups of QTL genotypes, each group being modeled by a density function (frequently a normal distribution is assumed). Thus, the population density function of *Y* is a two-component mixture in which *ω* represents the mixture proportions (*ω*_{1|i}, *ω*_{2|i}), which are constrained to be nonnegative and sum to unity, *ϕ*_{j} is the expectation parameter specific to QTL genotype *j* = 1, 2, *η* is the variance-covariance parameter common to all genotype groups, and *f*_{j}(*Y*_{i}|*ϕ*_{j}, *η*) is the probability density function for QTL genotype *j*. After images are transformed, *Y*_{i} can be assumed to follow a multivariate normal distribution with genotype-specific mean vector *μ*_{j} and (*m × m*) residual variance-covariance matrix ∑. If some patterns exist, we will use *ϕ*_{j} to model the mean structure of *μ*_{j} and *η* to model the covariance structure of ∑.
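For reference, the mixture density and its normal components can be sketched as below. This is a form assembled from the symbols defined in the text, offered as an assumption rather than as the exact display equations of the source; *Y*_{i} is treated as a row vector of length *m*.

```latex
f(Y_i \mid \omega, \phi, \eta)
  = \omega_{1|i}\, f_1(Y_i \mid \phi_1, \eta) + \omega_{2|i}\, f_2(Y_i \mid \phi_2, \eta),
  \qquad \omega_{1|i} + \omega_{2|i} = 1,

f_j(Y_i \mid \phi_j, \eta)
  = (2\pi)^{-m/2}\, |\Sigma|^{-1/2}
    \exp\!\left[ -\tfrac{1}{2}\, (Y_i - \mu_j)\, \Sigma^{-1} (Y_i - \mu_j)^{\mathsf{T}} \right],
  \qquad j = 1, 2.
```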

Each level set function is sampled on the *L × L* rectangular grid of the training shapes to generate *m = L × L* lexicographically ordered samples (where the columns of the matrix grid are sequentially stacked on top of one another to form one large row). Also, we assume that the observations in the long row are independent among the progeny. Now, from equation (5), we obtain the likelihood function, in which the mean matrix of QTL genotype *j* (*μ*_{j}) is modeled by parameter *ϕ*_{j} and the covariance matrix (∑) is modeled by parameter *η*.
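Under the independence assumption among progeny, the likelihood is a product of per-individual mixture densities. The following is a sketch consistent with the definitions in the text, not necessarily the exact form printed in the source:

```latex
L(\omega, \phi, \eta \mid Y)
  = \prod_{i=1}^{n} \left[ \omega_{1|i}\, f_1(Y_i \mid \phi_1, \eta)
                         + \omega_{2|i}\, f_2(Y_i \mid \phi_2, \eta) \right],
\qquad
\log L = \sum_{i=1}^{n} \log \left[ \sum_{j=1}^{2} \omega_{j|i}\, f_j(Y_i \mid \phi_j, \eta) \right].
```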

### Computational Algorithm

The EM algorithm is implemented to estimate the mean shape of each QTL genotype *j*, for *j* = 1, 2, at each of the pixels *k* = 1, 2, ..., *m*.

The EM steps are iterated between equations (9) and (10) until the estimates converge to stable values. It should be pointed out that the data set for shape analysis is highly sparse and high-dimensional. For example, if a shape is described by (256 × 256) pixels, i.e., *L* = 256, then we will have *m* = 256^{2} = 65,536, and an (*n* × 65,536) matrix for the phenotypic observations. Several approaches will be developed to model the structure of the variance-covariance matrix. One of the simplest approaches is to use
. This choice is large enough to assure that various levels of differences lie well within a Gaussian distribution.
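The E and M steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ∑ is taken to be the identity matrix (the simplification used in the simulation section), the prior genotype probabilities *ω*_{j|i} are supplied for each progeny rather than computed from flanking markers, and all function names are hypothetical.

```python
import math
import random


def weighted_means(Y, weights):
    """M step: weighted mean vector for each of the two QTL genotypes."""
    m = len(Y[0])
    means = []
    for j in range(2):
        total = sum(w[j] for w in weights)
        means.append([sum(w[j] * y[k] for w, y in zip(weights, Y)) / total
                      for k in range(m)])
    return means


def posterior(y, prior, mu):
    """E step: posterior genotype probabilities for one progeny (Sigma = I)."""
    log_terms = [
        math.log(prior[j]) - 0.5 * sum((a - b) ** 2 for a, b in zip(y, mu[j]))
        for j in range(2)
    ]
    top = max(log_terms)                      # subtract the max for stability
    unnorm = [math.exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def em(Y, priors, n_iter=30):
    """Alternate E and M steps, starting from prior-weighted means."""
    mu = weighted_means(Y, priors)
    post = priors
    for _ in range(n_iter):
        post = [posterior(y, p, mu) for y, p in zip(Y, priors)]   # E step
        mu = weighted_means(Y, post)                              # M step
    return mu, post


# Toy data: m = 4 "pixels", two genotype-specific mean vectors (made up).
random.seed(1)
true_mu = [[2.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 2.0]]
Y, priors, genotype = [], [], []
for i in range(200):
    g = i % 2
    genotype.append(g)
    Y.append([value + random.gauss(0.0, 1.0) for value in true_mu[g]])
    # Informative priors stand in for marker-based conditional probabilities.
    priors.append([0.8, 0.2] if g == 0 else [0.2, 0.8])

mu_hat, post = em(Y, priors)
predicted = [0 if p[0] >= p[1] else 1 for p in post]
accuracy = sum(a == b for a, b in zip(predicted, genotype)) / len(genotype)
```

With uninformative priors (0.5, 0.5) for every progeny the two starting means coincide and the iteration cannot separate the groups, so in this sketch the marker information is what breaks the symmetry.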

### Hypothesis Tests

As with other mapping approaches, shape mapping faces the problem that the distribution of the log-likelihood ratio test statistic is unknown. However, an empirical approach based on permutation tests, which does not rely on the distribution of log-likelihood ratios, can be used to determine the threshold for claiming the existence of a significant QTL.
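A permutation threshold can be sketched as below. The test statistic here (squared distance between the two genotype group means) is a hypothetical stand-in for the log-likelihood ratio, and a single test position with known genotypes is assumed; in a genome scan the maximum statistic over all positions would be recorded for each permutation.

```python
import math
import random


def mean_shift_stat(phenotypes, genotypes):
    """Stand-in statistic: squared distance between the two genotype group means."""
    groups = {0: [], 1: []}
    for y, g in zip(phenotypes, genotypes):
        groups[g].append(y)
    m = len(phenotypes[0])
    means = {g: [sum(y[k] for y in ys) / len(ys) for k in range(m)]
             for g, ys in groups.items()}
    return sum((a - b) ** 2 for a, b in zip(means[0], means[1]))


def permutation_threshold(phenotypes, genotypes, stat_fn, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the statistic under shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)                 # break the genotype-phenotype link
        null_stats.append(stat_fn(shuffled, genotypes))
    null_stats.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return null_stats[index]


# Toy data: 60 progeny, 3 "pixels", genotype effect on the first pixel only.
rng = random.Random(42)
genotypes = [i % 2 for i in range(60)]
phenotypes = [[3.0 * g + rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
              for g in genotypes]

observed = mean_shift_stat(phenotypes, genotypes)
threshold = permutation_threshold(phenotypes, genotypes, mean_shift_stat)
significant = observed > threshold
```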

## Computer Simulation

Cucurbit (*Cucurbita argyrosperma*) plants display tremendous variation in leaf shape between cultivars and wild types [29]. By mimicking leaf morphologies of this species, we performed simulation studies to examine the statistical behavior of our shape mapping model. A backcross population of 200 progeny was simulated for a linkage group with 11 equally spaced markers. A QTL that determines leaf shape is hypothesized in the third marker interval. The phenotypic values of the shape were simulated with a (75 × 75) dimension by *Y*_{i} = *ξ*_{i}*μ*_{1} + (1 − *ξ*_{i})*μ*_{2} + *e*_{i}, where *μ*_{j} is the mean shape matrix for QTL genotype *j* (*j* = 1, 2), *ξ*_{i} is the indicator variable defined as 1 and 0 if progeny *i* carries QTL genotype *QQ* (1) and *qq* (2), respectively, and *e*_{i} follows a multivariate normal distribution with mean vector zero and covariance matrix ∑. To simplify computing, we assumed that ∑ is an identity matrix. We designed two simulation schemes to test our shape mapping algorithm.
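The phenotype simulation described above can be sketched as follows. The two mean shapes are placeholder ellipses invented for illustration (the source does not specify them here), the QTL genotype is drawn with probability 1/2 as expected in a backcross, and marker simulation is omitted; function names are hypothetical.

```python
import random


def ellipse_mask(size, half_width, half_height):
    """Placeholder mean shape: 1.0 inside a centered ellipse, 0.0 outside."""
    c = (size - 1) / 2.0
    return [[1.0 if ((x - c) / half_width) ** 2 + ((y - c) / half_height) ** 2 <= 1.0
             else 0.0
             for x in range(size)]
            for y in range(size)]


def simulate_progeny(mu1, mu2, n, rng):
    """Draw Y_i = xi_i * mu1 + (1 - xi_i) * mu2 + e_i with e_i ~ N(0, I)."""
    data = []
    for _ in range(n):
        xi = 1 if rng.random() < 0.5 else 0      # QTL genotype indicator
        mean = mu1 if xi == 1 else mu2
        y = [[value + rng.gauss(0.0, 1.0) for value in row] for row in mean]
        data.append((xi, y))
    return data


rng = random.Random(2010)
mu1 = ellipse_mask(75, 30, 15)   # wide shape for genotype 1 (assumed)
mu2 = ellipse_mask(75, 15, 30)   # tall shape for genotype 2 (assumed)
progeny = simulate_progeny(mu1, mu2, 200, rng)
```

Each element of `progeny` pairs the genotype indicator with a 75 × 75 phenotype matrix, matching the dimensions used in the text.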

## Discussion

When specific genes that control morphological shape and physiological function are identified, we are in an excellent position to address fundamental questions related to growth, development, adaptation, domestication, and human health. In the past decades, the increasing availability of DNA-based markers has raised hopes of mapping genes or quantitative trait loci (QTLs) for complex phenotypes [19–25]. However, only a few studies have attempted to map so-called shape genes; successful examples include the positional cloning of genes for fruit shape in tomato [3, 7–9]. These successes were possible because a single major mutation determines the shape difference. For many quantitatively inherited shape traits, genetic mapping will provide a powerful tool for characterizing QTLs affecting morphological shape. Klingenberg and colleagues [4, 5] have developed quantitative genetic theory to estimate the heritability of shape by integrating geometric shape analysis. This theory was used to map specific QTLs for morphometric shapes in the mouse [32, 33]. Airey et al. [34] used Procrustes superimposition to study shape differences in the cortical area map of inbred mice.

In this article, we present a new statistical model for mapping shape QTLs in a segregating population. The new model embeds shape analysis within a mixture model framework in which different types of morphological shape are defined for individual genotypes at a QTL. The model was solved using a traditional shape correspondence analysis approach and the EM algorithm. The advantage of shape mapping lies in its capacity to quantify subtle differences in any part of a morphological shape and to detect specific QTLs that contribute to these differences. Results from simulation studies suggest that the model has reasonably high power to detect a QTL that controls shape difference. Even with a modest sample size (200), the model is able to discern a QTL with a small effect on morphological shape. The model can be easily extended to model epistatic interactions on morphological shape by including more components in the mixture model.

The model will need to be modified to integrate developmental events and their consequences into ontogenetic trajectories of shape. Modern biological studies display an increasing interest in understanding shape variation in the ontogenetic processes that bring about differentiation at the adult stage [35–37]. In a longitudinal study of radiographs from the Denver Growth Study, Bulygina et al. [37] investigated the morphological development of individual differences in the anterior neurocranium, face, and basicranium. The modified model can map the QTLs that cause variation in developmental trajectories of shape.

In biology, a cell or organ fulfills certain biological functions through its shape. Shape is thought to govern the extent and pattern of energy, matter and signal transduction through the surface and inner structure of the biological object. For this reason, the understanding of biological curvature and texture has received a surge of interest in structural biology. The new model can be extended to map the QTLs that determine the three-dimensional (3D) shape and texture of a biological object. Vision technologies have been developed to estimate the 3D shape of an object from 2D image data without information about its texture (albedo), its pose and the illumination environment [38, 39]. These technologies include a 3D morphable model (3DMM) that represents 3D shapes and textures as a linear combination of shape and texture principal components, a stochastic Newton optimization algorithm that fits the 3DMM to a single facial image, thereby estimating the 3D shape, the texture and the imaging conditions, and a multi-feature fitting algorithm that uses not only the pixel intensity but also other image cues such as edges and specular highlights. Statistical models can be developed to map QTLs that control the 3D shape and texture of a biological object with image data. A series of hypothesis tests about the genetic control of topological features (such as stepness and ridgeness) and texture of a shape will be formulated.

## Declarations

### Acknowledgements

This work is supported by NSF/NIH Joint grant DMS/NIGMS-0540745 and the Changjiang Scholars Award to RW. RL's research is supported by NIDA, NIH grants R21 DA024260 and R21 DA024266. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA or the NIH.


## References

1. Ricklefs RE, Miles DB: Ecological and evolutionary inferences from morphology: an ecological perspective. Ecological morphology. Edited by: Wainwright PC, Reilly SM. 1994, Univ. of Chicago Press, Chicago, 13-41.
2. Reich PB: Body size, geometry, longevity and metabolism: do plant leaves behave like animal bodies? Trends Ecol Evol. 2001, 16: 674-680. 10.1016/S0169-5347(01)02306-0.
3. Tanksley SD: The genetic, developmental, and molecular bases of fruit size and shape variation in tomato. Plant Cell. 2004, 16: S181-S189. 10.1105/tpc.018119.
4. Klingenberg CP, Leamy LJ: Quantitative genetics of geometric shape in the mouse mandible. Evolution. 2001, 55: 2342-2352.
5. Klingenberg CP: Quantitative genetics of geometric shape: heritability and the pitfalls of the univariate approach. Evolution. 2001, 57: 191-195.
6. Tsukaya H: Leaf shape: genetic controls and environmental factors. Intl J Dev Biol. 2005, 49: 547-555. 10.1387/ijdb.041921ht.
7. Frary A, Nesbitt TC, Grandillo S, Knaap E, Cong B, Liu J, Meller J, Elber R, Alpert KB, Tanksley SD: *fw2.2*: A quantitative trait locus key to the evolution of tomato fruit size. Science. 2000, 289: 85-88. 10.1126/science.289.5476.85.
8. Liu J, Van Eck J, Cong B, Tanksley SD: A new class of regulatory genes underlying the cause of pear-shaped tomato fruit. Proc Natl Acad Sci USA. 2002, 99: 13302-13306. 10.1073/pnas.162485999.
9. Xiao H, Jiang N, Schaffner E, Stockinger EJ, van der Knaap E: A retrotransposon-mediated gene duplication underlies morphological variation in tomato fruit. Science. 2008, 319: 1527-1530. 10.1126/science.1153040.
10. Bookstein FL: The Measurement of Biological Shape and Shape Change. 1978, Springer-Verlag, New York.
11. Monteiro LR, Diniz-Filho JA, dos Reis SF, Araujo ED: Geometric estimates of heritability in biological shape. Evolution. 2002, 56: 563-572.
12. Adams DC, Rohlf FJ, Slice DE: Geometric morphometrics: ten years of progress following the "revolution". Ital J Zool. 2004, 71: 5-16. 10.1080/11250000409356545.
13. Bernal B: Size and shape analysis of human molars: Comparing traditional and geometric morphometric techniques. J Comp Hum Biol. 2007, 58: 279-296. 10.1016/j.jchb.2006.11.003.
14. Stegmann MB, Gomez DD: A Brief Introduction to Statistical Shape Analysis. 2002, Informatics and Mathematical Modelling, Technical University of Denmark, DTU.
15. Basri R, Costa L, Geiger D, Jacobs D: Determining the similarity of deformable shapes. Vision Res. 1998, 38: 2365-2385. 10.1016/S0042-6989(98)00043-1.
16. Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc Ser B. 1977, 39: 1-38.
17. Tsai A, Wells W, Warfield S, Willsky A: An EM algorithm for shape classification based on level sets. Med Image Anal. 2005, 9: 491-502. 10.1016/j.media.2005.05.001.
18. Lander ES, Botstein D: Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989, 121: 185-199.
19. Zeng Z-B: Precision mapping of quantitative trait loci. Genetics. 1994, 136: 1457-1468.
20. Jansen RC, Stam P: High resolution mapping of quantitative traits into multiple loci via interval mapping. Genetics. 1994, 136: 1447-1455.
21. Xu S, Atchley W: A random model approach to interval mapping of quantitative trait loci. Genetics. 1995, 141: 1189-1197.
22. Lynch M, Walsh B: Genetics and Analysis of Quantitative Traits. 1998, Sinauer Associates, Sunderland, MA.
23. Broman KW, Speed TP: A model selection approach for the identification of quantitative trait loci in experimental crosses (with discussion). J Roy Stat Soc Ser B. 2002, 64: 641-656. 10.1111/1467-9868.00354.
24. Zou F, Fine JP, Hu J, Lin DY: An efficient resampling method for assessing genome-wide statistical significance in mapping quantitative trait loci. Genetics. 2004, 168: 2307-2316. 10.1534/genetics.104.031427.
25. Yi N, Yandell BS, Churchill GA, Allison DB, Eisen EJ, Pomp D: Bayesian model selection for genome-wide epistatic quantitative trait loci analysis. Genetics. 2005, 170: 1333-1344. 10.1534/genetics.104.040386.
26. Ma C-X, Casella G, Wu RL: Functional mapping of quantitative trait loci underlying the character process: A theoretical framework. Genetics. 2002, 161: 1751-1762.
27. Wu RL, Ma C-X, Lou Y-X, Casella G: Molecular dissection of allometry, ontogeny and plasticity: A genomic view of developmental biology. BioScience. 2003, 53: 1041-1047. 10.1641/0006-3568(2003)053[1041:MDOAOA]2.0.CO;2.
28. Wu RL, Lin M: Functional mapping: How to study the genetic architecture of dynamic complex traits. Nat Rev Genet. 2006, 7: 229-237. 10.1038/nrg1804.
29. Schlichting CD, Pigliucci M: Phenotypic Evolution: A Reaction Norm Perspective. 1998, Sinauer Associates, Sunderland, MA.
30. Wu RL, Ma C-X, Casella G: Statistical Genetics of Quantitative Traits: Linkage, Maps, and QTL. 2007, Springer-Verlag, New York.
31. Dryden IL, Mardia KV: Statistical Shape Analysis. 1998, John Wiley & Sons, New York.
32. Leamy LJ, Klingenberg CP, Sherratt E, Wolf JB, Cheverud JM: A search for quantitative trait loci exhibiting imprinting effects on mouse mandible size and shape. Heredity. 2008, 101: 518-526. 10.1038/hdy.2008.79.
33. Klingenberg CP, Leamy LJ, Cheverud JM: Integration and modularity of quantitative trait locus effects on geometric shape in the mouse mandible. Genetics. 2004, 166: 1909-1921. 10.1534/genetics.166.4.1909.
34. Airey DC, Wu F, Guan M, Collins CE: Geometric morphometrics defines shape differences in the cortical area map of C57BL/6J and DBA/2J inbred mice. BMC Neurosci. 2006, 7: 63. 10.1186/1471-2202-7-63.
35. Vidarsdottir US, O'Higgins P, Stringer C: A geometric morphometric study of regional differences in the ontogeny of the modern human facial skeleton. J Anat. 2002, 201: 211-229. 10.1046/j.1469-7580.2002.00092.x.
36. Quillevere F, Debat V, Auffray J-C: Ontogenetic and evolutionary patterns of shape differentiation during the initial diversification of paleocene acarininids (*planktonic foraminifera*). Paleobiology. 2002, 28: 435-448. 10.1666/0094-8373(2002)028<0435:OAEPOS>2.0.CO;2.
37. Bulygina E, Mitteroecker P, Aiello L: Ontogeny of facial dimorphism and patterns of individual development within one human population. Am J Phys Anthrop. 2006, 131: 432-443. 10.1002/ajpa.20317.
38. Romdhani S, Vetter T: Estimating 3D shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior. IEEE Computer Soc Conf Computer Vision Pattern Recog. 2005, 2: 986-993.
39. Romdhani S, Ho J, Kriegman DJ: Face recognition using 3-D models: Pose and illumination. Proc IEEE. 2006, 94: 1977-1999. 10.1109/JPROC.2006.886019.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.