A simple powerful bivariate test for two-sample location problems in experimental and observational studies

Background In many areas of medical research, a bivariate analysis is desirable because it simultaneously tests two response variables that are of equal interest and importance in two populations. Several parametric and nonparametric bivariate procedures are available for the location problem, but each requires stringent assumptions such as a specific distribution, affine invariance or elliptical symmetry. The aim of this study is to propose a powerful test statistic that requires none of these assumptions. We reduce the bivariate problem to the univariate problem of the sum or difference of the measurements, which yields a simple bivariate test for the difference in location between two populations.
Method The proposed test is compared with Hotelling's $T^2$ test, the two-sample Rank test, the Cramér test for the multivariate two-sample problem and Mathur's test using Monte Carlo simulation. The power study shows that the proposed test performs better than its competitors for most of the populations considered and is equivalent to the Rank test for specific distributions.
Conclusions The simulation studies show that the proposed test performs well under a range of underlying population distributions: normal or non-normal, skewed or symmetric, medium tailed or heavy tailed. The test is therefore recommended for practical applications, because it is more powerful than the alternatives compared in this paper for almost all shifts in location and in any direction.


Background
Few medical research studies compare two groups on only a single response variable; comparisons on two or more response variables are usually desired. If a single variable is of primary research interest, a two-independent-samples t-test or a Mann-Whitney test is appropriate. In some studies, however, two response variables are of equal interest and importance. For example, in studies comparing two treatments for hypertension, it is equally important to compare their effects on systolic and on diastolic blood pressure. For such studies, a bivariate analysis that compares the treatments on both response variables simultaneously may have advantages over two separate univariate tests, one for each variable. The main advantage of a bivariate analysis is the possibility of increased power: if the response variables are not too highly correlated, the bivariate test may detect a significant difference between the treatments even when neither univariate test is significant [1].
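In most medical research, location analysis is sufficient, and comparing the entire distributions is unnecessary. For example, when two characteristics of a population, such as the weight and height of infants, are to be compared with those of another population, the researcher compares the bivariate locations of the two populations. In terms of statistical theory, this problem may be restated as follows. Writing $F_1$ and $F_2$ for the joint distribution functions of $(X_1, Y_1)$ and $(X_2, Y_2)$, we test

$$H_0: F_2(x, y) = F_1(x, y) \quad \text{for all } (x, y) \qquad (1)$$

against

$$H_1: F_2(x, y) = F_1(x - \delta_1, y - \delta_2), \quad (\delta_1, \delta_2) \neq (0, 0). \qquad (2)$$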
For this problem, when both populations are bivariate normal, Hotelling's $T^2$ test is optimal: it is the uniformly most powerful test among tests that are invariant under nonsingular linear transformations, and $T^2$ itself is invariant under such transformations.
When the form of the underlying population is unknown, nonparametric tests are needed, and many have been proposed. In 1958, Blumen [2] described a bivariate sign test for the hypothesis that the medians of two variables take specified values. The vectors from the hypothesized bivariate median to the n sample points are ordered by the angles they make with the positive horizontal axis, and Blumen's statistic is proportional to the squared distance of the centre of gravity from the hypothesized centre. In 1962, Bennett [3] used properties of the multivariate normal integral to develop sign tests for the equality of means in two correlated multivariate populations. Chatterjee and Sen [4] extended the Wilcoxon-Mann-Whitney rank sum test to the case of two variables using a conditional approach. Mardia [5] proposed an unconditional nonparametric statistic based on the median vector of the combined sample. Peters and Randles [6] introduced an affine-invariant signed-rank test for the difference in location between two elliptically symmetric populations. Hettmansperger and Oja [7] developed an affine-invariant multivariate sign test for the multi-sample location problem. Sen and Mathur [8] used the angles that the centred data of the two samples make with the positive x-axis to construct an affine-invariant test statistic for the bivariate two-sample location problem. Sen and Mathur [9] also proposed a consistent test, similar to the Mann-Whitney test, for the difference in location between two bivariate populations. LaRocque et al. [10] extended the univariate Wilcoxon signed-rank test to the bivariate location problem. Baringhaus and Franz [11] proposed a test statistic based on the difference between sums of Euclidean interpoint distances. Mathur [12] suggested a nonparametric bivariate test for the two-sample location problem that does not require affine invariance or elliptic symmetry.
Most of these tests are not easy to apply, and their power depends on the direction of the shift and on the covariance matrix of the alternative distribution. Some are powerful only for particular distributional forms, and others require assumptions that must be checked before the test statistic can be used. The tests available in the literature therefore do not seem wholly adequate. This motivates a test statistic that is more powerful than the existing ones, does not depend on the covariance structure of the underlying population, and is easy to apply with readily available software by those who are not experts in statistics.
In the following section, we present a simple bivariate test statistic for the two-sample location problem. A simulation study was carried out to investigate the power of the proposed test and to compare it with the alternatives in the literature; its results are summarized in the Results and discussion section. The Conclusions section gives an application of the proposed test statistic to a real data set.

Test Statistic
Let $(X_{1i}, Y_{1i})$, $i = 1, \dots, m$, and $(X_{2j}, Y_{2j})$, $j = 1, \dots, n$, be two independent random samples from bivariate populations, and let $[X_1\ Y_1]'$ and $[X_2\ Y_2]'$ denote the joint distributions of $(X_1, Y_1)$ and $(X_2, Y_2)$, respectively. We wish to test the null hypothesis (1) against the alternative (2). The structure of this testing problem presumes that the two distributions $[X_1\ Y_1]'$ and $[X_2\ Y_2]'$ have the same form, but that $[X_2\ Y_2]'$ may be shifted in location by $(\delta_1, \delta_2)$ with respect to $[X_1\ Y_1]'$. We therefore aim to test for the existence of a location shift.
Many tests are available for the difference in location between two univariate populations, so it is natural to look for a transformation that reduces the bivariate data to the univariate case. We implement the test for three possible configurations of the shift direction, as follows.
(i) When the shifts in the two variables have the same direction, i.e. $\delta_1\delta_2 > 0$, define $S_{1i} = X_{1i} + Y_{1i}$ for $i = 1, \dots, m$ and $S_{2j} = X_{2j} + Y_{2j}$ for $j = 1, \dots, n$. It is then sufficient to test $H_0: \delta = 0$ against $H_1: \delta \neq 0$, where $\delta = \delta_1 + \delta_2$ is a location difference parameter. Because $\delta_1$ and $\delta_2$ have the same sign, $|\delta| = |\delta_1| + |\delta_2|$, so the two shifts reinforce each other in the sum. This is a univariate location problem, and an available test such as the Mann-Whitney test can be used to solve it.
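(ii) When the shifts have opposite directions, i.e. $\delta_1\delta_2 < 0$, define $D_{1i} = X_{1i} - Y_{1i}$ for $i = 1, \dots, m$ and $D_{2j} = X_{2j} - Y_{2j}$ for $j = 1, \dots, n$. It is then sufficient to test $H_0: \delta = 0$ against $H_1: \delta \neq 0$, where $\delta = \delta_1 - \delta_2$ is a location difference parameter. Because the shifts have opposite signs, $|\delta| = |\delta_1| + |\delta_2|$, so taking the difference makes the two shifts reinforce rather than cancel. This univariate location problem is again solved with the Mann-Whitney test.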
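(iii) When $\delta_1 = 0$ and $\delta_2 \neq 0$, or $\delta_1 \neq 0$ and $\delta_2 = 0$, it is enough to apply a rank test to the second or the first variable, respectively.
Remark 2: (a) When the two variables are on markedly different scales, the data have to be transformed by the following relations before the testing problem is solved:
$$X^*_{1i} = \frac{X_{1i}}{\sigma_X}, \quad Y^*_{1i} = \frac{Y_{1i}}{\sigma_Y}, \qquad X^*_{2j} = \frac{X_{2j}}{\sigma_X}, \quad Y^*_{2j} = \frac{Y_{2j}}{\sigma_Y},$$
where $i = 1, \dots, m$, $j = 1, \dots, n$, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of the random variables $X_1$ (or $X_2$) and $Y_1$ (or $Y_2$), respectively.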
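In practice, when the two variables are on markedly different scales, the data are transformed by the following relations before the test statistic is computed:
$$X^*_{1i} = \frac{X_{1i}}{s_X}, \quad Y^*_{1i} = \frac{Y_{1i}}{s_Y}, \qquad X^*_{2j} = \frac{X_{2j}}{s_X}, \quad Y^*_{2j} = \frac{Y_{2j}}{s_Y},$$
where $s_X = \sqrt{s_X^2}$, $s_Y = \sqrt{s_Y^2}$, and $s_X^2$ and $s_Y^2$ are the pooled sample variances of the first variables and of the second variables in $(X_1, Y_1)$ and $(X_2, Y_2)$, respectively; for example, $s_X^2 = [(m-1)s_{X_1}^2 + (n-1)s_{X_2}^2]/(m+n-2)$, with $s_{X_1}^2$ and $s_{X_2}^2$ the two sample variances.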
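As a rough illustration of cases (i) and (ii) combined with the pooled-variance rescaling, the following Python sketch uses only the standard library. The function names (`pooled_sd`, `mann_whitney_p`, `sum_difference_test`) and the `direction` argument are hypothetical. The Mann-Whitney p-value uses the usual large-sample normal approximation with tie and continuity corrections rather than the exact null distribution, so for small samples it may differ slightly from what a package such as R's `wilcox.test` reports. The text does not say how the shift direction is chosen in practice, so here it is passed in by the caller; choosing it from the same data being tested could affect the significance level.

```python
import math
from statistics import variance


def pooled_sd(a, b):
    """Square root of the pooled sample variance of two samples."""
    m, n = len(a), len(b)
    pooled_var = ((m - 1) * variance(a) + (n - 1) * variance(b)) / (m + n - 2)
    return math.sqrt(pooled_var)


def midranks(values):
    """Ranks 1..N; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney_p(s1, s2):
    """Two-sided Mann-Whitney p-value: normal approximation with tie and
    continuity corrections."""
    m, n = len(s1), len(s2)
    big_n = m + n
    ranks = midranks(list(s1) + list(s2))
    u1 = sum(ranks[:m]) - m * (m + 1) / 2
    mu = m * n / 2
    tie_counts = {}
    for r in ranks:  # tied values share one midrank, so count by midrank
        tie_counts[r] = tie_counts.get(r, 0) + 1
    ties = sum(t ** 3 - t for t in tie_counts.values())
    var = m * n / 12 * ((big_n + 1) - ties / (big_n * (big_n - 1)))
    if var <= 0:
        return 1.0  # all observations tied: no evidence of a shift
    z = max(abs(u1 - mu) - 0.5, 0.0) / math.sqrt(var)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


def sum_difference_test(x1, y1, x2, y2, direction, rescale=True):
    """direction='same'     (delta1 * delta2 > 0): compare X + Y
       direction='opposite' (delta1 * delta2 < 0): compare X - Y"""
    if direction not in ("same", "opposite"):
        raise ValueError("direction must be 'same' or 'opposite'")
    # Divide each variable by its pooled SD so neither dominates the sum/difference
    sx, sy = (pooled_sd(x1, x2), pooled_sd(y1, y2)) if rescale else (1.0, 1.0)
    sign = 1.0 if direction == "same" else -1.0
    s1 = [a / sx + sign * b / sy for a, b in zip(x1, y1)]
    s2 = [a / sx + sign * b / sy for a, b in zip(x2, y2)]
    return mann_whitney_p(s1, s2)
```

For case (iii), one would instead apply the univariate test to the single shifted variable, for example `mann_whitney_p(y1, y2)`.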

Power
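This section reports the results of a Monte Carlo study of the power of the new test. For comparison, the performances of the following tests were also simulated:
(1) Hotelling's $T^2$ test, with test statistic
$$T^2 = \frac{mn}{m+n}\,(\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2)'\, S^{-1}\, (\bar{\mathbf{X}}_1 - \bar{\mathbf{X}}_2),$$
where $\bar{\mathbf{X}}_1$ and $\bar{\mathbf{X}}_2$ are the sample mean vectors and $S^{-1}$ is the inverse of the pooled sample variance-covariance matrix $S$ [13].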
(2) The Rank test, based on marginal ranks [14]. Its statistic is constructed from sets of scores for each variable $j = 1, 2$, assuming that the $X_{ij}$ are independent, identically distributed random variables with a continuous bivariate distribution.
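Hotelling's two-sample $T^2$ (the quadratic form in the difference of sample mean vectors, weighted by $mn/(m+n)$ and the inverse pooled covariance matrix) can be computed directly in the bivariate case. The sketch below is a standard-library illustration, not the implementation used in the study, which relied on R packages; the function name `hotelling_t2_bivariate` is hypothetical. It uses the usual conversion $F = \frac{m+n-p-1}{p(m+n-2)}T^2$, which follows an $F(p, m+n-p-1)$ distribution under bivariate normality with a common covariance matrix, together with the closed-form upper tail of $F(2, \nu)$, namely $(1 + 2f/\nu)^{-\nu/2}$, so no special-function library is needed.

```python
def hotelling_t2_bivariate(x1, y1, x2, y2):
    """Two-sample Hotelling T^2 for bivariate data, with an F-based p-value."""
    m, n = len(x1), len(x2)
    mx1, my1 = sum(x1) / m, sum(y1) / m
    mx2, my2 = sum(x2) / n, sum(y2) / n

    def cross_products(xs, ys, mx, my):
        sxx = sum((a - mx) ** 2 for a in xs)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        syy = sum((b - my) ** 2 for b in ys)
        return sxx, sxy, syy

    c1 = cross_products(x1, y1, mx1, my1)
    c2 = cross_products(x2, y2, mx2, my2)
    # Pooled covariance matrix S = [[sxx, sxy], [sxy, syy]]
    sxx, sxy, syy = ((c1[k] + c2[k]) / (m + n - 2) for k in range(3))
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("pooled covariance matrix is singular")
    dx, dy = mx1 - mx2, my1 - my2
    # d' S^{-1} d via the closed-form inverse of a 2 x 2 matrix
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    t2 = m * n / (m + n) * quad
    p, nu = 2, m + n - 3  # nu = m + n - p - 1 with p = 2
    f_stat = nu / (p * (m + n - 2)) * t2
    # Upper tail of F(2, nu) in closed form
    p_value = (1 + 2 * f_stat / nu) ** (-nu / 2)
    return t2, f_stat, p_value
```

Applying a nonsingular linear map to both samples, such as $(x, y) \mapsto (10x, x + y)$, leaves the returned $T^2$ unchanged up to rounding, which illustrates the invariance property of $T^2$.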
The proposed test (P) was compared with the above four tests using samples from bivariate normal and non-normal distributions. Simulations were run for the bivariate normal distribution with $\rho = -0.5, 0, 0.5$. Simulations were also run for non-normal distributions generated from the g-and-h distribution [15], i.e. by generating $Z_{ij}$ from a bivariate normal distribution and setting
$$X_{ij} = \frac{\exp(g Z_{ij}) - 1}{g}\,\exp\!\left(\frac{h Z_{ij}^2}{2}\right).$$
For $g = 0$ this expression is taken to be $X_{ij} = Z_{ij}\exp(h Z_{ij}^2/2)$.
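A minimal sketch of how such g-and-h samples might be generated, in Python with the standard library only. The function names are hypothetical, and the construction of correlated normals, $Z_2 = \rho Z_1 + \sqrt{1-\rho^2}\,Z'$ with $Z'$ independent, is a common choice assumed here rather than taken from the text. `math.expm1` computes $e^{gz} - 1$ accurately when $gz$ is small. Note that the correlation of the transformed variables is generally not exactly $\rho$.

```python
import math
import random


def g_and_h(z, g, h):
    """Map a standard normal value z to a g-and-h value."""
    tail = math.exp(h * z * z / 2)        # h controls tail heaviness
    if g == 0:
        return z * tail                    # symmetric case
    return math.expm1(g * z) / g * tail    # (exp(g z) - 1) / g; g controls skewness


def bivariate_g_and_h(size, g, h, rho, rng):
    """Draw `size` pairs: correlated standard normals, then g-and-h margins."""
    pairs = []
    for _ in range(size):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((g_and_h(z1, g, h), g_and_h(z2, g, h)))
    return pairs


# Hypothetical usage: a skewed sample and a second sample shifted by (delta1, delta2)
rng = random.Random(2024)
sample1 = bivariate_g_and_h(15, 0.5, 0.0, 0.5, rng)
delta1, delta2 = 0.5, 0.5
sample2 = [(x + delta1, y + delta2) for x, y in bivariate_g_and_h(18, 0.5, 0.0, 0.5, rng)]
```

Adding $(\delta_1, \delta_2)$ to every pair of the second sample produces the shifted alternative used in a power study.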
Because the g-and-h distribution covers a very wide range of situations, from symmetric to highly asymmetric distributions, it is well suited to this purpose. The case $g = h = 0$ corresponds to a standard normal distribution, the case $g = 0$ corresponds to a symmetric distribution, and skewness increases as $g$ increases. For example, with $g = 0.5$ and $h = 0$ the skewness is 1.75, which is considerable [16].
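The skewness value quoted above can be checked from a closed form. With $h = 0$, $X = (e^{gZ} - 1)/g$ is a positive affine function of the lognormal variable $e^{gZ}$ (for $g > 0$), and skewness is unchanged by such transformations, so it equals the lognormal skewness $(e^{g^2} + 2)\sqrt{e^{g^2} - 1}$. A short Python check (the function name is hypothetical):

```python
import math


def g_and_h_skewness_h0(g):
    """Skewness of the g-and-h distribution when h = 0 (lognormal with sigma = g)."""
    w = math.exp(g * g)
    return (w + 2) * math.sqrt(w - 1)


for g in (0.25, 0.5):
    print(f"g = {g}: skewness = {g_and_h_skewness_h0(g):.2f}")
```

For $g = 0.5$ this gives about 1.75, in agreement with [16], and for $g = 0.25$ about 0.78.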
In this study, simulations were run with $g = 0.25$ and $g = 0.5$ to span the range of skewness values that seem to occur in practice.
The parameter $h$ determines the heaviness of the tails: as $h$ increases, the tails become heavier. With $h = 0.2$ and $g = 0$, the kurtosis equals 36. This might seem extreme, but even higher values were found by Wilcox, so our simulations were run with $h = 0.2$ [16].
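Similarly, for $g = 0$ the moments of $X = Z\exp(hZ^2/2)$ have closed forms: $E[X^2] = (1-2h)^{-3/2}$ and $E[X^4] = 3(1-4h)^{-5/2}$, the latter finite only for $h < 1/4$, so the kurtosis is $3(1-2h)^3/(1-4h)^{5/2}$. The Python sketch below (function name hypothetical) evaluates it; $h = 0$ gives 3, the normal value, and $h = 0.2$ gives about 36.2, consistent with the figure of 36 above.

```python
def g_and_h_kurtosis_g0(h):
    """Kurtosis (not excess) of X = Z * exp(h Z^2 / 2), Z standard normal."""
    if not 0 <= h < 0.25:
        raise ValueError("the fourth moment is finite only for 0 <= h < 1/4")
    second = (1 - 2 * h) ** -1.5       # E[X^2]
    fourth = 3 * (1 - 4 * h) ** -2.5   # E[X^4]
    return fourth / second ** 2
```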

Results and discussion
The results in Tables 1 to 6 are based on 10,000 pairs of samples of sizes $m = 15$ and $n = 18$ drawn from bivariate populations with location shift $(\delta_1, \delta_2)$. A nominal significance level of 0.05 was used. The computations used the stats, MASS, cramer and ICSNP packages in R version 2.10.0.
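The Monte Carlo procedure can be sketched for a single cell of such a table: simulate many pairs of samples, apply the test to each pair and record the proportion of rejections at the 0.05 level. This is a simplified, hypothetical Python illustration, not the authors' R code. It covers only the bivariate normal case with a same-direction shift (case (i), the sum of the variables), uses a normal-approximation Mann-Whitney test without tie correction (reasonable for continuous data), and defaults to fewer replications than the 10,000 used in the study to keep the run time short.

```python
import math
import random


def mw_two_sided_p(s1, s2):
    """Two-sided Mann-Whitney p-value, normal approximation with continuity
    correction; assumes continuous data (no tie correction)."""
    m, n = len(s1), len(s2)
    u = sum((a > b) + 0.5 * (a == b) for a in s1 for b in s2)
    mu = m * n / 2
    sd = math.sqrt(m * n * (m + n + 1) / 12)
    z = max(abs(u - mu) - 0.5, 0.0) / sd
    return math.erfc(z / math.sqrt(2))


def proposed_test_p(sample1, sample2):
    """Case (i): reduce each pair to X + Y, then compare the two samples."""
    return mw_two_sided_p([x + y for x, y in sample1], [x + y for x, y in sample2])


def draw_normal_pairs(size, rho, shift, rng):
    """Bivariate normal pairs with unit variances, correlation rho, mean `shift`."""
    d1, d2 = shift
    pairs = []
    for _ in range(size):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((z1 + d1, z2 + d2))
    return pairs


def estimate_power(m, n, rho, shift, n_sim=2000, alpha=0.05, seed=0):
    """Proportion of simulated sample pairs in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        s1 = draw_normal_pairs(m, rho, (0.0, 0.0), rng)
        s2 = draw_normal_pairs(n, rho, shift, rng)
        if proposed_test_p(s1, s2) < alpha:
            rejections += 1
    return rejections / n_sim
```

With shift `(0.0, 0.0)` the estimate approximates the actual significance level, which should be close to the nominal 0.05; nonzero shifts give estimated power.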
Under the bivariate normal distribution with different correlations, the simulation results showed that the proposed test statistic performed better than any of the test statistics compared here for almost all shifts in location.
The proposed test had greater power than Hotelling's $T^2$ and Mathur's test for skewed populations. It also had greater power than Cramér's test for small shifts in location, and its power was equivalent to that of the Rank test for skewed populations. When the population was highly skewed, the proposed test statistic dominated Hotelling's $T^2$ and Mathur's test for almost all shifts in location, and it dominated Cramér's test for small and moderate shifts.
Under a heavy-tailed bivariate distribution, the power of the proposed test was greater than that of all its competitors for almost all shifts in location; the only exception was the Rank test for large shifts.
The simulation results show that the proposed test statistic performs better than its competitors in most settings, whether the underlying population is bivariate normal, skewed, highly skewed or heavy tailed.
Simulations were also run for sample sizes $m = n = 10$, and the results were very similar. In general, simulations with other sample sizes showed similar power trends.

Conclusions
In the medical field, pairs of measurements such as changes in closing volume and white blood cell count [18], cholesterol level and blood pressure, or potassium and sodium [19] are used for important diagnoses. Because the two values may be related in an unknown way, bivariate analysis is an important problem. The bivariate population distributions are often unknown, so parametric tests cannot be applied, and some nonparametric tests require assumptions that are hard to validate. The proposed test does not require the stringent assumptions of affine invariance or elliptic symmetry, and it is easy to understand and to apply with standard statistical software. In effect, we have solved the bivariate problem by reducing it to the univariate problem of the sum or difference of the measurements.
The simulation studies showed that the proposed test performed better than most of its competitors for almost all shifts in location, whether the underlying population was normal or non-normal, skewed or symmetric, medium tailed or heavy tailed. Its application is therefore recommended, since it is more powerful than the alternatives compared here for almost all shifts in location and in any direction.
Most of the test statistics available in the literature are difficult to compute even with a computer. By contrast, the proposed test statistic can easily be calculated by hand for small and moderately sized data sets, which is another important advantage.
For illustration, the proposed test statistic is applied to a real data set. Ayatollahi [17] studied growth velocity standards from longitudinal measurements of infants aged 0-2 years born in Shiraz. A cohort of 317 healthy neonates was selected and followed for two years; they were visited at home at several target ages, and several variables were measured. Here we focus on children aged 12 months, with two dependent variables, height and weight, and one grouping variable, mother's education level. Ages were recorded exactly as the difference in days between the date of the visit and the date of birth and then converted to months. The weight velocity over the first year of life was defined as the difference between the weight at 12 months and the weight at birth, divided by the difference between the date of the 12-month visit and the date of birth [17].
The main interest was the simultaneous comparison of weight and height velocities between two groups of infants: those with primary-educated mothers and those with secondary-educated mothers. The bivariate observations were 87 measurements of weight velocity ($X_1$) and height velocity ($Y_1$) over the first year of life for infants with primary-educated mothers and 54 measurements of weight velocity ($X_2$) and height velocity ($Y_2$) for infants with secondary-educated mothers. To illustrate the performance of the proposed test relative to Hotelling's $T^2$ test, particularly for small samples, a random sample of 22 infants was selected. The weight and height velocities for this random sample, part of the data from Ayatollahi (2005) [17], are presented in Table 8. The bivariate observations were 13 measurements of $X_1$ and $Y_1$ for infants with primary-educated mothers and 9 measurements of $X_2$ and $Y_2$ for infants with secondary-educated mothers.
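The velocity definition above amounts to a simple ratio. The Python sketch below computes a per-month velocity from two measurements and two dates. The function name, the hypothetical example values and the average month length of 365.25/12 days used to convert days to months are all assumptions, since the text does not state the conversion factor.

```python
from datetime import date

# Assumption (not given in the text): average month length of 365.25 / 12 days.
DAYS_PER_MONTH = 365.25 / 12


def velocity_per_month(at_birth, at_visit, birth_date, visit_date):
    """(value at visit - value at birth) / (age at visit in months)."""
    days = (visit_date - birth_date).days
    if days <= 0:
        raise ValueError("visit must be after birth")
    return (at_visit - at_birth) / (days / DAYS_PER_MONTH)


# Hypothetical infant: 3.2 kg at birth and 9.5 kg at the 12-month visit
v = velocity_per_month(3.2, 9.5, date(2001, 3, 1), date(2002, 3, 1))
```

Height velocity would be computed the same way from length at birth and at the visit.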
Table 9 presents the mean and standard deviation of weight and height velocity over the first year of life for infants with primary- and secondary-educated mothers. The proposed test gave a p-value of 0.030, which leads to rejection of the null hypothesis at the 5% level of significance. Hotelling's $T^2$ test, however, gave a p-value of 0.072 and did not reject it. In this small data set Hotelling's $T^2$ could not detect the difference, whereas the proposed test detected it in the small data set as well as in the large one.