Most of the definitions were taken from The Cambridge Dictionary of Statistics (Everitt, 2002).
Bias: In general terms, deviation of results or inferences from the truth, or processes leading to such deviation. More specifically, the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated, or does not test the hypothesis to be tested.
Cluster: A term applied to both data in which the sampling units are grouped into clusters sharing some common feature, for example, animal litters, families or geographical regions, and longitudinal data in which a cluster is defined by a set of repeated measures on the same unit. A distinguishing feature of such data is that they tend to exhibit intracluster correlation, and their analysis needs to address this correlation to reach valid conclusions.
Confidence interval: A range of values, calculated from the sample observations, that is believed, with a particular probability, to contain the true parameter value. A 95% confidence interval, for example, implies that were the estimation process repeated again and again, then 95% of the calculated intervals would be expected to contain the true parameter value. Note that the stated probability level refers to properties of the interval and not to the parameter itself, which is not considered a random variable.
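The repeated-sampling reading of a 95% confidence interval can be checked by simulation. The sketch below is illustrative only and is not taken from the source: it assumes normally distributed observations with a known standard deviation, and the function name `coverage` and all parameter values are hypothetical. It draws many samples, builds an interval of the sample mean plus or minus 1.96 standard errors (σ/√n) for each, and reports the fraction of intervals that contain the true mean.

```python
import random
import statistics


def coverage(n_repeats=2000, n=30, true_mean=10.0, sigma=2.0, seed=0):
    """Fraction of nominal 95% intervals that contain true_mean.

    Assumes normal data with known sigma (an illustrative simplification).
    """
    rng = random.Random(seed)
    z = 1.96  # approximate 97.5th percentile of the standard normal
    half_width = z * sigma / n ** 0.5  # z times the standard error of the mean
    hits = 0
    for _ in range(n_repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        # The interval is random; the parameter true_mean is fixed.
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / n_repeats


if __name__ == "__main__":
    print(f"Observed coverage: {coverage():.3f}")
```

The observed fraction should lie close to 0.95, varying slightly with the seed and the number of repetitions.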
Hierarchical/nested sampling: A design in which levels of one or more factors are subsampled within one or more other factors so that, for example, each level of a factor B occurs at only one level of another factor A. Factor B is said to be nested within factor A. An example might be where interest centres on assessing the effect of hospital and doctor on a response variable, patient satisfaction. The doctors can only practise at one hospital so they are nested within hospitals.
Measurement error: Errors in reading, calculating or recording a numerical value. The difference between observed values of a variable recorded under similar conditions and some fixed true value.
Random sampling: Either a set of n independent and identically distributed random variables, or a sample of n individuals selected from a population in such a way that each sample of the same size is equally likely.
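As a minimal sketch of the second sense (selection from a finite population), Python's standard `random.sample` draws without replacement so that every subset of the requested size is equally likely. The helper name `simple_random_sample` and the plot identifiers are hypothetical and used only for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that each subset of size n is equally likely."""
    rng = random.Random(seed)
    # random.sample needs a sequence, so copy the population into a list.
    return rng.sample(list(population), n)


if __name__ == "__main__":
    plots = [f"plot_{i:03d}" for i in range(1, 201)]  # hypothetical unit labels
    print(simple_random_sample(plots, 10, seed=42))
```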
Sample: A selected subset of a population chosen by some process usually with the objective of investigating particular properties of the parent population.
Sample size: The number of individuals to be included in an investigation. Usually chosen so that the study has a particular power of detecting an effect of a particular size.
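One common way to link sample size, power and effect size is the normal-approximation formula for comparing two group means, n = 2(z_{1-α/2} + z_{power})² / d² per group, where d is the standardized difference between the means. This formula is not given in the source and is shown here only as an illustrative sketch; the function name `n_per_group` and its defaults are assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate units per group for a two-sided, two-sample comparison.

    effect_size is the difference in means divided by the common standard
    deviation. Uses the normal approximation, so it slightly understates
    the size needed for a t-test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    print(n_per_group(0.5))  # 63 per group under these assumptions
```

Halving the effect size roughly quadruples the required sample size, which is why small effects demand large studies.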
Sampling: The process of selecting some part of a population to observe so as to estimate something of interest about the whole population. To estimate the amount of recoverable oil in a region, for example, a few sample holes might be drilled, or to estimate the abundance of a rare and endangered bird species, the abundance of birds in the population might be estimated from the pattern of detections at a sample of sites in the study region. Some obvious questions are how to obtain the sample and make the observations and, once the sample data are to hand, how best to use them to estimate the characteristic of the whole population.
Sampling design: The procedure by which a sample of units is selected from the population.
Sampling error: The difference between the sample result and the population characteristic being estimated. In practice, the sampling error can rarely be determined because the population characteristic is not usually known. With appropriate sampling procedures, however, it can be kept small and the investigator can determine its probable limits of magnitude.
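In a simulation the population characteristic is known, so the sampling error can be computed directly, which is rarely possible in a real survey. The sketch below uses an entirely synthetic population; the values, sample size and the helper name `sampling_error` are illustrative assumptions, not data from the source. It repeats the sampling many times to show how the errors scatter around zero.

```python
import random
import statistics


def sampling_error(population, n, seed=None):
    """Sample mean minus population mean for one simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    return statistics.fmean(sample) - statistics.fmean(population)


# Synthetic population of 10,000 values (hypothetical measurements).
_rng = random.Random(1)
population = [_rng.gauss(50.0, 10.0) for _ in range(10_000)]

# Sampling errors from 200 independent samples of 100 units each.
errors = [sampling_error(population, 100, seed=s) for s in range(200)]

if __name__ == "__main__":
    print(f"Mean error:          {statistics.fmean(errors):+.3f}")
    print(f"Largest abs. error:  {max(abs(e) for e in errors):.3f}")
    print(f"Std. dev. of errors: {statistics.stdev(errors):.3f}")
```

The standard deviation of these errors approximates the standard error of the sample mean, which is how the probable limits of the sampling error are usually judged.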
SOC (soil organic carbon) stock vs concentration.
Sampling frames: The portion of the population from which the sample is selected. They are usually defined by geographic listings, maps, directories, membership lists or from telephone or other electronic formats.
Soil infrared spectroscopy: Spectroscopy is the study of the interaction between matter and radiated energy [1]. Soil infrared spectroscopy (IR spectroscopy) uses the infrared region of the electromagnetic spectrum, that is, light with a longer wavelength and lower frequency than visible light, to estimate soil properties such as soil organic carbon content. IR is an established technology for rapid, non-destructive characterization of the composition of materials based on the interaction of electromagnetic energy with matter [2, 3]. IR is now routinely used for analyses of a wide range of materials in laboratory and process control applications in agriculture, food and feed technology, geology and biomedicine (Shepherd and Walsh, 2004; 2007).
Standard error: The standard deviation of the sampling distribution of a statistic. For example, the standard error of the sample mean of n observations is σ/√n, where σ² is the variance of the original observations.
Strata: See stratification.
Stratification: The division of a population into parts known as strata, particularly for the purpose of drawing a sample.
Stratified random sampling: Random sampling from each stratum of a population after stratification.
Stratum: Each subpopulation formed by stratification.
__________
Everitt, B. S. 2002.
The Cambridge Dictionary of Statistics, 2nd ed. Institute of Psychiatry, King’s College, University of London, Cambridge University Press.
[1] Crouch, Stanley; Skoog, Douglas A. (2007). Principles of instrumental analysis. Australia: Thomson Brooks/Cole.
[2] Shepherd, K.D., and Walsh, M.G. 2004. Diffuse reflectance spectroscopy for rapid soil analysis. In Lal, R., ed. Encyclopedia of Soil Science. Marcel Dekker Inc., New York.
[3] Shepherd, K.D. and Walsh, M.G. 2007. Infrared spectroscopy - enabling an evidence-based diagnostic surveillance approach to agricultural and environmental management in developing countries. Journal of Near Infrared Spectroscopy 15:1-19.