Analysis of the relationship between end-to-end distance and activity of single-chain antibody against colorectal carcinoma

We used computer-aided simulation to investigate how the end-to-end distance between VH and VL, joined by different peptide linkers, relates to the activity of single-chain antibodies. First, we used (G4S)n (n = 1-9) as the linker connecting VH and VL, and predicted the 3D structure of each single-chain Fv antibody (scFv) by homology modeling. The molecular models were evaluated and optimized. A coordinate system was then built for each protein, all of these were unified into a single coordinate system, and end-to-end distances were calculated from the 3D coordinates. ScFv-n with (G4S)n linkers (n = 1, 3, 5, 7 or 9) were expressed and purified, and the immunoreactivity of purified ND-1 scFv-n was determined by ELISA. A multi-factorial relationship model was used to analyze the structural factors affecting scFv: $r(n)=\sqrt{(AB_n-AB_0)^2+(CD_n-CD_0)^2+(BC_n-BC_{st})^2}$. The relationship between immunoreactivity and r-values showed that the fusion protein structure approached the desired state when the r-value = 3. Immunoreactivity declined as the r-value increased, but stabilized once the r-value exceeded a certain threshold. We used a linear relationship to analyze the structural factors affecting scFv immunoreactivity.


Introduction
Single-chain Fv antibody (scFv) is composed of immunoglobulin heavy- and light-chain variable regions connected by a short peptide linker [1][2][3]. ScFv is an ideal tool for constructing single-chain bispecific antibody fusion proteins [4][5][6]. Bivalent antibodies derived from scFv by genetic engineering have a promising future in the clinic. An scFv can be therapeutic and, at the same time, serve as a vector for delivering a toxin [7]. In recent years, there has been progress in colorectal cancer diagnosis and treatment using scFv as a carrier. However, achieving both high affinity and anti-tumor activity can be difficult, and both are required for clinical efficacy. Studies have shown that an appropriate linker can give an scFv biological activity that is better suited to clinical applications [8][9][10]. Consequently, choosing and designing an appropriate linker is a key consideration.
Proteomics has revealed a great deal about the composition, structure, and function of proteins, and bioinformatics provides a powerful tool for studying the structure-activity relationships of fusion proteins [11][12][13]. Drug design based on structural simulation incorporates 3D structural information, including data from fusion proteins with various functional domains and inter-peptide linkers [14][15][16]. Linkers containing (G4S)n are the most widely used [12,17], which prompted us to examine their effects on the structure and function of scFvs.

Materials and methods
Materials
IC-2 and CCL-187 cells were cultured under standard conditions. IC-2 is a murine hybridoma cell line that secretes the monoclonal antibody ND-1, which is specific for human colorectal carcinoma. CCL-187 is a human colorectal carcinoma cell line. The pET28a(+) expression vector and E. coli BL21 were contributed by Prof. J. Yun, Xi'an (China). The pMD18-T vector, E. coli JM109 competent cells, DNA polymerase, restriction enzymes, and DNA recovery kits were purchased from TaKaRa Biotechnology (Shanghai, China). mRNA purification kits and T4 DNA ligase were purchased from Pharmacia Biotech (Shanghai, China). Anti-His6 tag antibody was obtained from Invitrogen (Foster City, CA, USA). Ni-NTA resin was provided by QIAGEN (Shanghai, China); MDP and 99mTc were kindly provided by the Department of Nuclear Medicine of China Medical University (Liaoning Province, China). Heavy chain primers 1 and 2, the light chain primer mix, the linker [(GGGGS)n] primer mix, and the RS primer mix were purchased from Pharmacia Biotech.
ND-1 scFv-n was constructed as previously described. Briefly, mRNA was extracted from 5 × 10^6 IC-2 hybridoma cells, and cDNA was synthesized by reverse transcription using random primers. VH and VL genes were amplified separately from the cDNA by PCR using the heavy and light chain primer mixes. The VH and VL gene fragments were recovered and mixed in equimolar ratios for two PCR reactions: the first used the linker primer mix for 7 cycles, and the second used the RS primer mix for 30 cycles. In this way, the VH and VL gene fragments were joined into an scFv construct by splicing by overlap extension (SOE) PCR. The resulting ND-1 scFv-n construct was cloned into pMD18-T and transformed into E. coli JM109, and positive clones were identified by colony PCR and DNA sequencing.

Amino acid sequence
The amino acid sequences of wild-type VH and wild-type VL are listed below [18] and illustrated in Figure 1. The amino acid sequence of VH-(G4S)n-VL is: MAQVQLQQSGPGLVAPSQSLSITCTVSGFSLTTYDVHWVRQPPRKGLEWLGLVWANGRTNCTSALMSRISITRDTSKNQVFLTMNSLQTDDTAMYYCARGSYGAVDFWGQGTTVTVSS(GGGGS)nDIELTQSPASLAVSLGQRATISYRASKSVSTSGYSYMHWQQKPGQPPRLLIYLVSNLESGVPARFSGSGSGTDFTLNIHPVEEEDAATYYCQHIRELTRSEGGPSWK.
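For reference, the fused sequence above can be assembled programmatically for each linker length. This sketch is illustrative and not part of the original workflow. It assumes that everything before "(GGGGS)n" in the printed sequence, including the leading MA, is treated as the VH segment, and that everything after it is VL. The helpers `build_scfv` and `segment_bounds` are hypothetical names.

```python
# Segments of the printed VH-(G4S)n-VL sequence, split at "(GGGGS)n".
VH = (
    "MAQVQLQQSGPGLVAPSQSLSITCTVSGFSLTTYDVHWVRQPPRKGLEWLGLVW"
    "ANGRTNCTSALMSRISITRDTSKNQVFLTMNSLQTDDTAMYYCARGSYGAVDFWG"
    "QGTTVTVSS"
)
VL = (
    "DIELTQSPASLAVSLGQRATISYRASKSVSTSGYSYMHWQQ"
    "KPGQPPRLLIYLVSNLESGVPARFSGSGSGTDFTLNIHPVEEEDAATYYCQHIRELTRSE"
    "GGPSWK"
)
LINKER_UNIT = "GGGGS"  # one G4S repeat


def build_scfv(n: int) -> str:
    """Return the VH-(G4S)n-VL sequence for n linker repeats (n >= 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return VH + LINKER_UNIT * n + VL


def segment_bounds(n: int):
    """0-based, end-exclusive residue ranges of VH, linker and VL in scFv-n."""
    linker_start = len(VH)
    linker_end = linker_start + len(LINKER_UNIT) * n
    return (0, linker_start), (linker_start, linker_end), (linker_end, linker_end + len(VL))
```

For example, `build_scfv(3)` returns the n = 3 construct. `segment_bounds(3)` returns the residue ranges that could be used to locate the first and last α-carbons of VH and VL in the corresponding model.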

Homology modeling, assessment, and optimization
The amino acid sequence of a protein determines its higher-order structure. Predicting this structure relies on identifying one or more known protein "templates" whose structures resemble that of the query sequence, and then aligning the query sequence residues to the template residues. SWISS-MODEL can be used for homology modeling; it searches protein sequence and structure databases such as the Protein Data Bank (PDB) [19][20][21]. A three-dimensional model of the target molecule was obtained by homology modeling, and the model was then assessed and optimized using Meta-MQAP [22,23].

Construction of coordinate system
The PDB files obtained from SWISS-MODEL give the atomic coordinates in their original coordinate system. To facilitate comparison of protein structures, a new coordinate system was constructed for each protein with MATLAB 7.0.

Determination of the origin of the coordinate system
The atomic weights of the atoms in the protein were used as weights, and the center of mass was calculated from the position of each atom. This center of mass is the origin of the new coordinate system [24]. To determine the axes, we constructed the second-order moment matrix of the protein's atomic coordinates. The eigenvector corresponding to the principal component of this matrix was taken as the X-axis of the new coordinate system, and the eigenvector corresponding to the second component was taken as the Y-axis. These axes were used to build a coordinate system for the protein's three-dimensional structure.
The 3 × 3 second-order moment matrix $S$ is

$$S=\frac{1}{M}\sum_{k} m_k\begin{pmatrix} X_k^2 & X_kY_k & X_kZ_k\\ X_kY_k & Y_k^2 & Y_kZ_k\\ X_kZ_k & Y_kZ_k & Z_k^2 \end{pmatrix},\qquad M=\sum_k m_k,$$

where $m_k$ is the atomic weight of atom $k$ and $[X_k, Y_k, Z_k]$ are its 3D coordinates relative to the center of mass. The eigenvalues and eigenvectors of $S$ were calculated. The eigenvector corresponding to the largest eigenvalue was taken as the first axis (X axis, X = [X1, X2, X3]), and the eigenvector corresponding to the second-largest eigenvalue was taken as the second axis (Y axis, Y = [Y1, Y2, Y3]). The Z axis was defined in the same way.
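The origin and axis construction described above can be sketched with NumPy. This minimal sketch is not the authors' MATLAB 7.0 code. It assumes that coordinates and atomic weights have already been read from a PDB file into arrays, and it uses `numpy.linalg.eigh` for the symmetric moment matrix. The function name `principal_axis_frame` is hypothetical. Eigenvectors are defined only up to sign, so comparing different proteins would require a consistent sign convention, which the text does not specify. Here only the Z axis is fixed, as X × Y, to make the frame right-handed.

```python
import numpy as np


def principal_axis_frame(coords, masses):
    """Express atomic coordinates in a mass-weighted principal-axis frame.

    coords: (N, 3) atomic coordinates in the original PDB frame.
    masses: (N,) atomic weights used as weights.
    Returns (new_coords, center, axes, eigenvalues); the columns of `axes`
    are the X, Y, Z axes ordered by decreasing eigenvalue.
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    total_mass = masses.sum()

    # Origin of the new frame: mass-weighted center of the atoms.
    center = (masses[:, None] * coords).sum(axis=0) / total_mass
    centered = coords - center

    # 3 x 3 second-order moment matrix S = (1/M) * sum_k m_k r_k r_k^T.
    moment = (masses[:, None] * centered).T @ centered / total_mass

    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(moment)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals = eigvals[order]
    axes = eigvecs[:, order]

    # Right-handed frame: Z = X x Y (still an eigenvector of the smallest eigenvalue).
    axes[:, 2] = np.cross(axes[:, 0], axes[:, 1])

    # Project the centered coordinates onto the new axes.
    new_coords = centered @ axes
    return new_coords, center, axes, eigvals
```

In the new frame, the mass-weighted moment matrix is diagonal. Two structures can therefore be superposed by aligning their centers and principal axes.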

Analysis of end-to-end distance in fusion proteins
The end-to-end distance is the distance between the first and the last α-carbon atoms of a protein. We obtained these atoms and their X/Y/Z coordinates from the PDB files. The algorithm is as follows:
A. Locate the first and last α-carbon atoms in wild-type VH and VL, and likewise in the protein after introduction of (G4S)n.
B. Calculate the end-to-end distances of wild-type VH (VL) and of mutant VH (VL) after introduction of (G4S)n.
C. Analyze the relationship between the end-to-end distance and n.
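Steps A and B can be sketched in Python using the fixed-column PDB format. The helper names are hypothetical. Two details are assumptions rather than statements from the text. First, alternate locations other than blank or "A" are skipped. Second, B and C are taken as the last α-carbon of VH and the first α-carbon of VL, so that BC spans the linker and AB, BC and CD together cover the single-chain model.

```python
import math


def ca_coordinates(pdb_lines):
    """Return (x, y, z) for each alpha-carbon in ATOM records, in file order."""
    coords = []
    for line in pdb_lines:
        # Fixed-column PDB format: atom name in columns 13-16, alternate
        # location in column 17, x/y/z in columns 31-38, 39-46 and 47-54.
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[12:16].strip() == "CA" and line[16] in (" ", "A"):
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def end_to_end(coords):
    """Distance between the first and last alpha-carbon of a segment."""
    return math.dist(coords[0], coords[-1])


def segment_distances(ca, vh_len, linker_len):
    """AB (VH), BC (linker) and CD (VL) for a VH-linker-VL chain.

    Assumes A/B are the first/last VH alpha-carbons and C/D the first/last
    VL alpha-carbons, with `ca` holding one alpha-carbon per residue.
    """
    a, b = ca[0], ca[vh_len - 1]
    c, d = ca[vh_len + linker_len], ca[-1]
    return {"AB": math.dist(a, b), "BC": math.dist(b, c), "CD": math.dist(c, d)}
```

For a wild-type domain modeled on its own, `end_to_end(ca_coordinates(lines))` gives its end-to-end distance directly.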

Biological experiments
Expression and purification of ND-1 scFv-n
pET28a(+)-ND-1 scFv-n plasmids were constructed as expression vectors and transformed into E. coli BL21 cells, which were grown in 100 ml LB broth containing 50 μg/ml kanamycin at 37°C. When the culture reached an OD of 0.6, IPTG was added to a final concentration of 1 mM, and the cells were shaken at 37°C. After 3.5 h, the culture was centrifuged at 5,000 rpm for 10 min, and the cell pellets were treated with lysis solution. After sonication and centrifugation, inclusion bodies containing scFv proteins were solubilized and denatured in 6 M guanidine hydrochloride. The scFv was purified by affinity chromatography on Ni-NTA resin, and the column was eluted sequentially with 8 M urea at pH 8.0, 6.5 and 4.2. The pH 4.2 fraction, which contained the scFv, was collected and renatured by dialysis. Protein purity and concentration were determined by Bradford assay.

Western blot analysis
ND-1 scFv-n proteins were detected by western blot analysis. Lysates of BL21 cells transformed with pET-28a(+)-ND-1 scFv-n were each incubated in loading buffer (125 mmol/L Tris-HCl, pH 6.8, 10% β-mercaptoethanol, 4.6% SDS, 20% glycerol and 0.003% bromophenol blue) for 5 min at 100°C. The samples were separated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and electroblotted onto PVDF membrane (Bio-Rad, Hercules, CA, USA). Non-specific binding sites were blocked for 1 h with 5% nonfat milk in TPBS (PBS containing 0.05% Tween 20), and the membrane was incubated overnight at 4°C with primary antibody. After three washes in TPBS, the membrane was incubated with horseradish peroxidase-conjugated goat anti-rabbit IgG for 2 h at room temperature and washed twice with TPBS. The immunoblot signal was detected by autoradiography using an enhanced chemiluminescence detection kit.

ELISA assay for activity of ND-1 scFv-n
CCL-187 cells (5 × 10^4) were grown in 96-well microtiter plates at 37°C for 24 h, fixed with 2.5% glutaraldehyde, and blocked with 1% BSA. They were then incubated with ND-1 IgG or ND-1 scFv-n at 37°C for 2 h. After three washes with PBS, anti-His6 antibody was added to the wells containing ND-1 scFv-n and incubated. The plate was washed, and HRP-labeled goat anti-mouse IgG was added to both the ND-1 IgG and ND-1 scFv-n wells. After incubation at 37°C for 2 h, TMB substrate was added and the samples were incubated in the dark for 30 min. The reaction was stopped with 1 M H2SO4, and absorbance was read at 450 nm (A450). PBS was used as a negative control.

Protein structures
Using MATLAB 7.0, a coordinate system was built from the atomic coordinates in the PDB files returned by SWISS-MODEL. The resulting maps were used to compare the protein structures (Figure 2). Homology models were built with SWISS-MODEL, and Meta-MQAP was used to assess and optimize them. The accuracy scores of the models and their root mean square (RMS) deviations are shown in Table 1. The assessment indicates that the models are reliable.

Local alignment
The end-to-end distances of VH (AB), VL (CD) and the linker (BC) at different values of n are presented in Table 2. The linker distance BC was relatively stable for n = 1-7, whereas the end-to-end distances AB and CD varied. As n increased within a certain range, the end-to-end distance of VH fluctuated relatively strongly. In contrast, the end-to-end distance of VL was essentially unchanged except at n = 6 and n = 0. The data suggest that the main reason for this behavior is that BC remained close to its median value of about 22.6622 Å. Although the changes in end-to-end distance were small, AB and CD fluctuated around the ideal state. The effect of the linker peptide on the structure of VH and VL can therefore be summarized by a structural factor $r(n)$:

$$r(n)=\sqrt{(AB_n-AB_0)^2+(CD_n-CD_0)^2+(BC_n-BC_{st})^2}$$

Here $AB_n$, $CD_n$ and $BC_n$ are the end-to-end distances of VH, VL and the linker in scFv-n, $AB_0$ and $CD_0$ are those of wild-type VH and VL, and $BC_{st}$ is the reference linker distance. The ideal fusion protein should have a stable structure, with the linker peptide that gives $\min_n r(n)$. The results suggest that the r-value was smallest at n = 3, where the structure of the fusion protein was closest to the ideal state. The r-values increased as n, and hence linker length, increased, indicating that the VH and VL structures were perturbed to a greater extent. The r-value was least favorable at n = 6.
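Given the distances in Table 2, the r-value and the linker length that minimizes it can be computed directly. The sketch below only implements the formula above. `r_value` and `best_linker` are hypothetical names, and the reference values $AB_0$, $CD_0$ and $BC_{st}$ must be supplied by the user. The text suggests a median BC of about 22.6622 Å, but it does not state which value was used as $BC_{st}$. No values from Table 2 are reproduced here.

```python
import math


def r_value(ab_n, cd_n, bc_n, ab_0, cd_0, bc_st):
    """r(n) = sqrt((AB_n - AB_0)^2 + (CD_n - CD_0)^2 + (BC_n - BC_st)^2)."""
    return math.sqrt((ab_n - ab_0) ** 2 + (cd_n - cd_0) ** 2 + (bc_n - bc_st) ** 2)


def best_linker(distances, ab_0, cd_0, bc_st):
    """Return the n with the smallest r(n) and a dict of all r-values.

    distances maps n -> (AB_n, CD_n, BC_n), e.g. one row of Table 2 per n.
    """
    r = {n: r_value(ab, cd, bc, ab_0, cd_0, bc_st)
         for n, (ab, cd, bc) in distances.items()}
    return min(r, key=r.get), r
```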

Determination of expression and purity of proteins
Plasmids ND-1 scFv-pET28a(+) were transformed into E. coli BL21, and protein expression was induced with IPTG. Western blot analysis showed that BL21 lysates expressed scFv-n proteins, with bands at 30 kDa (Figure 4). The sequence encoding the short His-tag peptide lies upstream of the multiple cloning site (MCS) of vector pET28a(+), so ND-1 scFv-n was expressed as a recombinant fusion protein. Western blot analysis showed that the scFv-n protein was expressed in inclusion bodies in BL21 lysates. Inclusion-body protein was purified to 94% purity by metal affinity chromatography on Ni-NTA resin, which binds the His tag at the N terminus of the scFv.

Analysis of the relationship between immunoreactivity and end-to-end distance
The immunoreactivity of purified ND-1 scFv-n was determined by ELISA. ScFv-n exhibited immunoreactivity similar to that of the parental ND-1 antibody and bound well to CCL-187 cells, which express the colorectal carcinoma-associated antigen LEA. This suggests that scFv-n retains good specificity and activity. Table 3 shows the relationship between scFv immunoreactivity (A450 value) and r-values. Immunoreactivity declined with increasing r-value and changed significantly when the r-value was below 42.3716. Above this value, immunoreactivity was relatively stable (Figure 5).
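The linear relationship and the plateau described above can be summarized with a least-squares fit. This is a sketch under stated assumptions. The text does not say how the line was fitted, so ordinary least squares is assumed. Points with r below the reported threshold of 42.3716 are fitted linearly, and points at or above it are summarized by their mean. The function names are hypothetical, and the A450 and r-values from Table 3 are not reproduced here.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def split_at_threshold(r_values, a450, threshold=42.3716):
    """Fit A450 vs r linearly below the threshold; average the plateau above it."""
    below = [(r, a) for r, a in zip(r_values, a450) if r < threshold]
    above = [a for r, a in zip(r_values, a450) if r >= threshold]
    fit = linear_fit(*zip(*below)) if len(below) >= 2 else None
    plateau = sum(above) / len(above) if above else None
    return fit, plateau
```

A negative slope in the fitted region corresponds to the reported decline in immunoreactivity with increasing r.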

Discussion
Homology modeling has been successfully applied to interpreting the relationships among protein sequence, structure, and function. Using a structural model, multiple sequences of orthologous proteins can be compared and evaluated in light of the constraints of natural selection and the requirements of protein folding, stability, dynamics, and function. Based on analysis of conserved residues in the binding site, homology modeling can help determine which functional group a protein belongs to. Homology modeling also plays an important role in computer-aided drug design [25,26].
A basic issue in the study of protein structure is structural comparison. A relatively direct method is to treat each protein as a rigid structure composed of a set of points and then compare the corresponding residues of different proteins. Early work used rigid superposition, translating and rotating the spatial structure of a protein to find the corresponding residues between two proteins [27,28]. Chen instead proposed weighting the atoms composing the protein, using these weights to calculate the protein's center of gravity, and constructing a 3 × 3 matrix of second-order moments [24]. On this basis, principal component analysis (PCA) can be used to find the main and secondary axes. The best rigid superposition is then obtained by superposing the centers of gravity of the proteins and rotating them so that their main axes coincide. In this study, we used the atomic weights of the atoms to obtain the center of mass from the coordinates of each atom.
It is recognized that fusion proteins can differ from the original molecules in affinity and anti-tumor activity, due in large part to structural alterations in the fusion proteins [4][28][29][30][31]. Inter-peptide linkers can be optimized by computer-aided design [32]. Based on homology modeling of derivatives [33], future design of inter-peptide linkers can be viewed as solving an equation. The structure and characteristics of the target molecules, and the composition, length, and flexibility of the inter-peptide linker, should be taken into consideration [34,35].
Previous studies [35][36][37] have shown that the length and composition of the linkers used to join VH and VL in bivalent single-chain antibodies often affect stability and function. A linker that is too short may prevent correct folding because of intermolecular interference, whereas one that is too long may increase the immunogenicity of the antibody. Several design strategies have been developed to satisfy these requirements. One approach is to use flexible glycine-rich (G4S)n sequences as tethers. Linkers made of G4S repeats, 5-15 amino acids long, have been used to construct bivalent single-chain antibodies targeting colorectal cancer [18,36]. With a 5-amino-acid linker, immunoreactivity was unsatisfactory. A possible reason is that the linker was too short to separate the two antigen-binding sites effectively, which affected the stability of the cross-linked protein. The 15-amino-acid linker tended to fold correctly and retained the affinity and binding capacity of the bivalent single-chain antibody. It has long been noted that a linker of sufficient length and flexibility allows the VH and VL domains to assemble in the natural Fv orientation. They then form a monovalent antigen-binding site comparable to the Fab fragment of native antibodies. The length and sequence of the linker peptide have also been shown to significantly affect scFv expression and stability [36]. Notably, the impact of linker length on the activity and affinity of engineered antibodies depends strongly on the distance between the N- and C-termini of the VH domain [37]. A certain degree of linker flexibility is required for the two subunits to cooperate functionally. The goal of this study was to characterize novel scFvs and to quantify the impact of the linker peptide on binding affinity. Using computer-guided homology modeling, we evaluated scFvs with different linker peptides on the basis of their activity and end-to-end distance.
Our aim was to evaluate the impact of (G4S)n on the structure and function of VH and VL. We also sought to determine the relationship between the VH/VL end-to-end distance and n (or BC) in bivalent single-chain antibodies targeting colorectal cancer. A multi-factor relationship model was established to evaluate VH and VL structural factors using the following formula:

$$r(n)=\sqrt{(AB_n-AB_0)^2+(CD_n-CD_0)^2+(BC_n-BC_{st})^2}$$

Here the subscript $n$ denotes scFv-n, the subscript $0$ denotes wild-type VH and VL, and $BC_{st}$ is the reference linker distance. Based on simulated data and biological experiments, a linear relationship was established between immunoreactivity and r-values: immunoreactivity declines as the r-value increases. Fusion protein structure is ideal when r = 3 and least satisfactory when n = 6. However, this relationship requires further exploration. Indeed, the expression level and activity of an scFv depend largely on the length and sequence of the linker. Successful construction of an scFv therefore depends on selecting a linker that neither interferes with the folding and association of the VH and VL domains nor reduces the stability and recognition ability of the Fv molecule.
In summary, using databases of natural protein structures and their associated functions, we predicted the structure and function of fusion proteins by homology modeling. We then conducted biological experiments to validate these calculations. A dual approach that combines molecular modeling and linker design of engineered antibodies with quantitative determination of antibody affinity is thus useful for optimizing construction. Our approach not only provides a rationale for designing novel engineered antibodies using molecular modeling, but also offers new insight into quantifying antibody binding affinity, especially at low protein concentration. A combination of bioinformatics and genetic research may therefore be beneficial in exploring new agents for the genetic engineering of antibodies.

Competing interests
The author(s) declare that they have no competing interests.