The analysis of data from several sites and years

	An e-publication by the World Agroforestry Centre
	METEOROLOGY AND AGROFORESTRY

Section 4: Measurement and analysis of agroforestry experiments

J. Riley

Rothamsted Experimental Station
Harpenden, Herts AL5 2JQ, UK

Abstract

The collection of data from different sites and/or years gives rise to a data set subject to both spatial and temporal variability caused by climatic and other variables. Any statistical analysis of the data must take into account these different types of variation; a review of the available methods with reference to agroforestry experiments is presented.

Introduction

It is common practice to extend the results of a simple experiment by repeating the trial at a number of different sites or over a number of years. Such extensive experimentation will provide information about the treatments or genotypes under a range of environmental conditions. Although sites may be chosen for their similarity in soil type, climatic or other factors may still cause the results to differ between them. Experiments repeated over years will be subject to fluctuations in weather patterns. The analysis of data from such a range of environments permits the experimenter to determine the sensitivity of the treatments or varieties to a range of conditions and to provide recommendations about their use for a wider geographical area.

The data collected from such an experimental scheme, however, present substantial problems of analysis. Statistical analysis depends heavily upon the idea of randomness; data collected from a number of locations or from several years cannot be considered random: sites are chosen for their particular characteristics, and years, in terms of climate, can be considered to be correlated to some degree. Particular techniques thus need to be adopted when analysing data from treatments or varieties collected from a number of environments. Where the environments are represented by different points in time it is sometimes possible to examine the behaviour of the varieties by response curves over time with possible adjustments for seasonal effects. This is discussed in greater detail in the following paper (this Section) by S. Langton. The focus here is on the more general data set classified according to treatments (or genotypes) and environments.

Analyses available

Much statistical research has been done on the examination of genotype x environment data sets and, since no single technique of analysis has been found to suit all cases, research is continuing. A summary of suitable approaches to the analysis of such data sets is given in Freeman (1973). More recent work was presented by Silvey (1982) and by Goodchild (1982) at the XIth International Biometric Conference.

Although each data set requires analysis tailored to its structure, a common approach to the analysis of data collected from different environments is as follows:

Analysis of variance of the complete data set to determine whether genotype x environment interactions exist; if they do not exist, there is little problem but if they do exist, then interactions need examination.
Joint regression analysis of genotype yield on environment as described by Finlay and Wilkinson (1963) to detect genotype sensitivity to changes of environment.
Multivariate analysis of the genotype x environment table, possibly by one or more of the following methods:

principal components analysis;
canonical variate analysis;
cluster analysis;
principal coordinate analysis; or
significant rank ordering.

Analysis of variance

A most important part of any statistical analysis is to determine the questions that are to be answered. When data are collected at a number of sites, are recommendations required for each site, for groups of sites or for the one area incorporating them all? Only with this information can a relevant analysis be produced. If data are to be combined from different environments then an analysis of variance accounting for the main effects of genotype and of environment and for their interaction will determine whether any substantial interaction exists. Subdivision of the main effects and interaction into components of interest is described by Cochran and Cox (1957) together with appropriate significance tests.
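As an illustration only, the partition into genotype, environment and interaction components might be sketched as follows. The function name two_way_anova, the data layout and the example yields are hypothetical; the sketch assumes equal replication in every genotype x environment cell and ignores any blocking within environments, which a real analysis would normally need to include.

```python
def two_way_anova(data):
    """Partition variation for data[g][e] = list of replicate yields.

    Assumes equal replication in every cell and no blocking within
    environments (a simplification made for this sketch).
    """
    n_g, n_e, r = len(data), len(data[0]), len(data[0][0])
    # Cell means, grand mean and marginal means
    cell = [[sum(data[g][e]) / r for e in range(n_e)] for g in range(n_g)]
    grand = sum(sum(row) for row in cell) / (n_g * n_e)
    g_mean = [sum(cell[g]) / n_e for g in range(n_g)]
    e_mean = [sum(cell[g][e] for g in range(n_g)) / n_g for e in range(n_e)]

    # Sums of squares for main effects, interaction and within-cell error
    ss_g = r * n_e * sum((m - grand) ** 2 for m in g_mean)
    ss_e = r * n_g * sum((m - grand) ** 2 for m in e_mean)
    ss_ge = r * sum(
        (cell[g][e] - g_mean[g] - e_mean[e] + grand) ** 2
        for g in range(n_g) for e in range(n_e)
    )
    ss_err = sum(
        (y - cell[g][e]) ** 2
        for g in range(n_g) for e in range(n_e) for y in data[g][e]
    )
    df_ge = (n_g - 1) * (n_e - 1)
    df_err = n_g * n_e * (r - 1)
    # Variance ratio for the interaction against within-cell error
    f_ge = (ss_ge / df_ge) / (ss_err / df_err)
    return {"ss_g": ss_g, "ss_e": ss_e, "ss_ge": ss_ge, "ss_err": ss_err,
            "df_ge": df_ge, "df_err": df_err, "f_ge": f_ge}


# Hypothetical yields: 2 genotypes x 2 environments x 2 replicates
example = [[[4, 6], [9, 11]],
           [[5, 7], [6, 8]]]
result = two_way_anova(example)
print(result["ss_ge"], result["f_ge"])
```

The variance ratio f_ge would then be referred to the F distribution on df_ge and df_err degrees of freedom to judge whether a substantial interaction exists.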

Joint regression analysis

If genotype x environment interactions exist these can be examined in more detail by regressing the genotype means upon a variate of environmental indices. The sum of squares for the interaction can thus be subdivided into a component for the heterogeneity of these regressions and a remainder component. Each of these components can be subdivided correspondingly into comparisons of interest. Environmental indices can be determined in different ways. Wood (1976) examines a number of them. The usual choice of index is the mean yield of all genotypes at each environment, but others are the yield of control genotypes; climatic variables such as rainfall, altitude, temperature; environmental variables such as N% in the plant or the P content of soil; or even a combination or function of several of these external variables. The method of joint regression analysis provides useful indications of the behaviour of each genotype as external conditions vary. Response to environmental change cannot always be expected to be linear and an extreme index value for one site may well distort results. In practice it would be wise to perform such regressions for several different forms of index to avoid the use of spurious results.
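A minimal sketch of such a regression, using the usual index (the mean yield of all genotypes at each environment, expressed as a deviation from the grand mean), is given below. The function name joint_regression and the example table are hypothetical. The sums of squares are computed on the table of means, so they would need multiplying by the number of replicates before comparison with an error mean square.

```python
def joint_regression(table):
    """Regress each genotype's means on an environmental index.

    table[g][e] is the mean yield of genotype g in environment e.
    Returns (slopes, ss_heterogeneity, ss_remainder), where the two
    sums of squares add up to the interaction sum of squares.
    """
    n_g, n_e = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_g * n_e)
    g_mean = [sum(row) / n_e for row in table]
    e_mean = [sum(table[g][e] for g in range(n_g)) / n_g for e in range(n_e)]

    # Environmental index: environment mean minus grand mean (sums to zero)
    index = [m - grand for m in e_mean]
    s_xx = sum(x * x for x in index)

    # Least-squares slope of each genotype on the index; slopes average 1
    slopes = [
        sum(table[g][e] * index[e] for e in range(n_e)) / s_xx
        for g in range(n_g)
    ]
    ss_interaction = sum(
        (table[g][e] - g_mean[g] - e_mean[e] + grand) ** 2
        for g in range(n_g) for e in range(n_e)
    )
    # Part of the interaction explained by differences between slopes
    ss_heterogeneity = s_xx * sum((b - 1.0) ** 2 for b in slopes)
    return slopes, ss_heterogeneity, ss_interaction - ss_heterogeneity


# Hypothetical means: 3 genotypes x 3 environments
means = [[2, 4, 9],
         [4, 5, 6],
         [3, 6, 6]]
slopes, ss_het, ss_rem = joint_regression(means)
print(slopes, ss_het, ss_rem)
```

A slope above 1 indicates a genotype more sensitive than average to changes of environment, and a slope below 1 a less sensitive one. Replacing the index with rainfall or another external variable would require centring that variable first, and the slopes would then no longer average 1.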

Multivariate analysis

The data in the two-way genotype x environment table can be considered as a multivariate set in order to investigate the variation within it more fully. Principal component analysis can be used to identify those environments that are most influential in the production of interactions. Similarly, the technique can be used to identify groups of genotypes contributing largely to the interaction. The method is described clearly in Holland (1969); a brief outline is given here.

Principal component analysis involves the transformation of the original variates into a new set, usually of fewer variates, which are independent and which successively account for maximum proportions of the total variation. The independence property means the new variates are more easily examined by standard statistical methods; the fact that fewer variates are involved means that calculations are less cumbersome. Although principal components do not necessarily correspond to biological interpretations, Holland (1969) shows that by geometrical manipulation linear functions of the original variates can be found, having biological meaning, thus aiding more fully in the interpretation of the data. Canonical variate analysis has been used by Paterson (1974) to test whether the environments differ in their response to the treatments (genotypes). Although the treatments are not random effects, Paterson was able to produce a satisfactory analysis; the method appears, however, to be of most use with large data sets.
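The principal component transformation outlined above can be illustrated for its first component alone. The sketch below is not the procedure of Holland (1969): it treats environments as variates and genotypes as observations, and finds the leading component of the covariance matrix by power iteration, chosen here only to avoid dependence on a numerical library. The function name first_principal_component and the example table are hypothetical.

```python
import math


def first_principal_component(table, iterations=200):
    """Leading principal component of table[g][e] (rows = genotypes).

    Returns (loadings, proportion of total variance, genotype scores).
    Assumes the table shows some variation, so the norm is never zero.
    """
    n, p = len(table), len(table[0])
    means = [sum(row[j] for row in table) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in table]
    # Sample covariance matrix of the p environments
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Power iteration: repeated multiplication converges on the
    # eigenvector with the largest eigenvalue
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    total = sum(cov[j][j] for j in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centred]
    return v, eigenvalue / total, scores


# Hypothetical means: 4 genotypes x 2 environments
loadings, proportion, scores = first_principal_component(
    [[1, 2], [2, 1], [3, 4], [4, 3]]
)
print(loadings, proportion)
```

Environments with large loadings contribute most to the component, and the scores place each genotype along it. Later components would need the earlier ones removed from the covariance matrix first, which a library eigenvalue routine does more reliably.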

Other methods used are cluster analysis and principal coordinate analysis. In cluster analysis, attempts are made to find similarities between clusters of environments using the yields from the treatments or genotypes. Various methods for calculating similarities exist and these may give rise to different clusters. Principal coordinate analysis has also been used by Shukla (1972) to demonstrate groups of genotypes making similar contributions.
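One of the many possible clustering schemes can be sketched as follows. The choices made here, Euclidean distance between the yield profiles of environments and single-linkage merging, are assumptions for illustration and are not taken from any of the papers cited; as noted above, other measures of similarity may give different clusters. The function name cluster_environments and the example table are hypothetical.

```python
import math


def cluster_environments(table, n_clusters):
    """Group environments by single-linkage agglomerative clustering.

    table[g][e] is the mean yield of genotype g in environment e.
    Returns a list of clusters, each a sorted list of environment indices.
    """
    n_g, n_e = len(table), len(table[0])
    # Yield profile of each environment across all genotypes
    profiles = [[table[g][e] for g in range(n_g)] for e in range(n_e)]
    clusters = [[e] for e in range(n_e)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members
                d = min(
                    math.dist(profiles[a], profiles[b])
                    for a in clusters[i] for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair of clusters
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Hypothetical means: 2 genotypes x 4 environments
yields = [[1.0, 1.2, 5.0, 5.1],
          [2.0, 2.1, 1.0, 1.3]]
print(cluster_environments(yields, 2))
```

Because the distance is computed on raw yields, environments with high overall yield tend to group together; clustering the interaction residuals instead would group environments by the pattern of genotype response.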

A recent approach to the examination of genotype x environment interaction data is presented in Beale and Goodchild (1980). They used analysis of variance followed by a ranking process and called this significant rank order. This approach was chosen to suit the data in question: neither analysis of variance alone nor ranking procedures alone were used, because of the nature of the variation in the data and the small number of replicates. Analysis of variance was used for the data from each environment (site) separately to provide least significant differences (LSDs). The means for the genotypes within each site were then ranked and grouped together if their difference was less than the appropriate LSD. The groupings obtained were compared from site to site. Interactions could be clearly seen through such examination. Care, however, is needed when using the LSD, since applying it to many pairs of means increases the chance of declaring differences that are not real.
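The ranking and grouping step can be sketched as below. This is an interpretation of the description above and not a reproduction of the published procedure: in particular, the rule that a genotype joins the current group when it lies within one LSD of that group's highest mean is an assumption, and the LSD for each site is taken as given rather than computed. The function name rank_groups and the example means are hypothetical.

```python
def rank_groups(means, lsd):
    """Rank genotype means at one site and group those within one LSD.

    means maps genotype name to mean yield; lsd is the least significant
    difference from that site's own analysis of variance.
    Assumption: a genotype joins the current group if it is less than
    one LSD below the highest mean in that group.
    """
    ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)
    groups = []
    for name, value in ranked:
        if groups and groups[-1][0][1] - value < lsd:
            groups[-1].append((name, value))
        else:
            groups.append([(name, value)])
    return [[name for name, _ in group] for group in groups]


# Hypothetical genotype means at two sites, each with an LSD of 1.0
site_a = {"G1": 10.0, "G2": 9.5, "G3": 7.0, "G4": 6.8}
site_b = {"G1": 6.0, "G2": 9.0, "G3": 8.6, "G4": 5.5}
print(rank_groups(site_a, 1.0))
print(rank_groups(site_b, 1.0))
```

In this invented example G1 falls in the top group at the first site and in the lower group at the second, which is the kind of change in grouping from site to site that indicates an interaction.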

Discussion

When data from many sites are to be combined, no single method of analysis is sufficient. The data need to be explored by several approaches: joint regression analysis to determine the response of the genotypes to site fluctuations, and multivariate approaches to determine clusters within sites (environments) or within treatments (genotypes). The nature of the experiment and the form of the data will need close scrutiny to indicate which analyses are suitable and whether any further modification of the analyses is necessary. No routine form of analysis is possible; the analyses chosen depend upon the information required from the data.

Acknowledgement

This contribution was prepared while the author was funded by the U.K. Overseas Development Administration.

References and related publications

Beale, P.E. and N.A. Goodchild. 1980. Genotype x environment interactions for resistance to Kabatiella caulivora in Trifolium subterraneum subspecies yanninicum. Aust. J. Agric. Res. 31:1111-1117.

Cochran, W.G. and G.M. Cox. 1957. Experimental designs. New York: Wiley.

Finlay, K.W. and G.N. Wilkinson. 1963. The analysis of adaptation in a plant breeding programme. Aust. J. Agric. Res. 14:742-754.

Freeman, G.H. 1973. Statistical methods for the analysis of genotype-environment interactions. Heredity 31: 339-354.

Freeman, G.H., and B.D. Dowker. 1973. The analysis of variation between and within genotypes and environments. Heredity 30:97-109.

Goodchild, N.A. and K. Vijayan. 1974. Significance tests in plots of multidimensional data in two dimensions. Biometrics 30:209-210.

Goodchild, N.A. and W.J.R. Boyd. 1975. Regional and temporal variations in wheat yield in Western Australia and their implications in plant breeding. Aust. J. Agric. Res. 26:209-217.

Goodchild, N.A. 1982. Combining temporal and spatial components in data analysis models. In Proceedings of the XIth International Biometric Conference. Toulouse, France, 6-11 September 1982.

Hill, J. and N.A. Goodchild. 1981. Analysing environments for plant breeding purposes as exemplified by multivariate analysis of long term wheat yields. Theor. Appl. Genet. 59:317-325.

Holland, D.A. 1969. Component analysis: an aid to the interpretation of data. Exp. Agric. 5:151-164.

McCulloch, J.S.G., B.C. Pereira, O. Kerfoot and N.A. Goodchild. 1965. Effect of shade trees on tea yields. Agric. Meteorol. 2:385-399.

Paterson, J.G. 1974. The distribution and ecology of wild oats (Avena spp.) in the agricultural environment of Western Australia. Unpublished PhD thesis, University of Western Australia.

Shukla, G.K. 1972. Application of some multivariate techniques in the analysis and interpretation of the genotype-environment interaction. Paper presented at the Symposium on Genotype x Environment Interactions, Birmingham University, September 1972.

Silvey, V. 1982. Analysis of crop variety adaptation from performance trials in England and Wales. In Proceedings of the XIth International Biometric Conference. Toulouse, France, 6-11 September 1982.

Wood, J.T. 1976. The use of environmental variables in the interpretation of genotype-environment interaction. Heredity 37:1-7.