Chapter 1 Populations in Genetic Studies
A population is a set of individuals sharing, to a greater or lesser degree, common characteristics or properties. In ecological terms, a biological population can include all living individuals on the earth. It can also refer to all living individuals of one biological species, such as populations of human beings, animals, plants, or microorganisms. More often, a biological population consists of individuals of one species living in a specific area or society. In genetics, the population is much smaller: its individuals are more closely related by co-ancestry or kinship and therefore share more common characteristics. A genetic population can be any race of one biological species, any variety with genetic variation, or the progeny produced by sexual or asexual propagation from some individuals used as parents. Individuals or lines in one genetic population normally have clear relationships or kinship, but they also differ both phenotypically and genetically. Any genetic study requires one, or sometimes several, populations.
A genetic population must include a number of different genotypes. Many factors can affect population architecture, such as the mating system, the number of parental lines, and population size. Developing the most suitable populations is fundamental to most genetic studies. Population genetics is concerned with gene frequency and genotypic frequency in genetic populations, with how these frequencies change from the parental generation to the progeny generation when the mating system, mutation, selection, random drift, etc. are taken into consideration, and with the effects these changes have on each population. The number of alleles and their frequencies at each locus, together with the number of genotypes and their frequencies, are the major parameters characterizing population structure (Wang, 2017; Hartl and Clark, 2007; Hartl and Jones, 2005; Falconer and Mackay, 1996; Crow and Kimura, 1970). This chapter begins with mating designs and various types of genetic populations, followed by the structure of commonly used populations, the collection and preliminary analysis of genotypic data, the collection and analysis of variance (ANOVA) of phenotypic data, and the estimation of variance components, heritability, and genotypic values.
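As a small illustration of the two parameters named above, allele frequencies and genotypic frequencies at one locus can be counted directly from a sample of genotype calls. The sketch assumes a biallelic locus with genotypes written as two-character strings such as "AA", "Aa" and "aa"; this encoding and the function name `locus_frequencies` are hypothetical and not taken from any particular software.

```python
from collections import Counter


def locus_frequencies(genotypes):
    """Allele and genotypic frequencies at one biallelic locus.

    `genotypes` is an iterable of two-character strings such as "AA",
    "Aa" or "aa" (a hypothetical encoding, one character per allele).
    Returns two dicts: allele -> frequency and genotype -> frequency.
    """
    # Normalise heterozygotes so that "aA" and "Aa" are counted together;
    # uppercase letters sort before lowercase ones in ASCII.
    calls = ["".join(sorted(g)) for g in genotypes]
    if not calls:
        raise ValueError("at least one genotype is required")

    genotype_counts = Counter(calls)
    # Each diploid genotype contributes two alleles.
    allele_counts = Counter(allele for g in calls for allele in g)

    n_individuals = len(calls)
    n_alleles = sum(allele_counts.values())
    allele_freqs = {a: c / n_alleles for a, c in allele_counts.items()}
    genotype_freqs = {g: c / n_individuals for g, c in genotype_counts.items()}
    return allele_freqs, genotype_freqs


if __name__ == "__main__":
    alleles, genos = locus_frequencies(["AA", "Aa", "Aa", "aa"])
    print(alleles)  # allele frequencies
    print(genos)    # genotypic frequencies
```

For a sample of one AA, two Aa and one aa individual, the sketch returns an allele frequency of 0.5 for both A and a, and genotypic frequencies of 0.25, 0.5 and 0.25.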
1.1 Commonly Used Populations in Genetic Studies
1.1.1 Bi-Parental Populations
Various mating designs have been proposed and widely used in genetic studies (Wang, 2017; Bernardo, 2010; Lynch and Walsh, 1998). Populations derived from two homozygous parental lines (also called pure lines or fixed lines) have been the most widely used in plant genetic studies since the rediscovery of Mendel’s hybridization experiments on garden peas in 1900. The bi-parental mating design begins with two pure lines showing obvious differences in one or several phenotypic traits. The two parents (represented by P1 and P2) are crossed to generate their F1 hybrid. Selfing the F1 hybrid generates the segregating population called F2; crossing the F1 hybrid back to either parent generates the segregating populations called P1BC1F1 (backcrossed with P1) and P2BC1F1 (backcrossed with P2). Selfing and backcrossing may be applied repeatedly to F2, P1BC1F1, and P2BC1F1 to obtain more advanced generations. Recombinant inbred lines (RILs) are formed after several rounds of repeated selfing. Alternatively, pure lines called doubled haploid (DH) lines can be generated from F1, P1BC1F1, or P2BC1F1 in a single generation by DH technology. Figure 1.1 shows 20 bi-parental populations commonly used in plant genetic studies, together with chromosomal segment substitution lines (CSSL) derived from repeated backcrossing and selfing, and the nested association mapping (NAM) population derived from crosses between several parents and one common parent.
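The effect of repeated selfing on a single segregating locus can be sketched with expected genotypic frequencies. Starting from the F1 hybrid (all Aa), selfing an AA or aa plant gives the same homozygote, while selfing an Aa plant gives 1/4 AA, 1/2 Aa and 1/4 aa, so the heterozygote frequency halves in every generation and approaches zero in RILs. A DH line is obtained by doubling a haploid derived from one gamete, so every DH line is homozygous; DH lines from the F1 are AA or aa with frequency 0.5 each. The sketch ignores selection, mutation and random drift, and the function names `selfing_series` and `doubled_haploid` are hypothetical.

```python
def selfing_series(n_selfings, start=(0.0, 1.0, 0.0)):
    """Expected (AA, Aa, aa) frequencies over repeated selfing.

    `start` defaults to the F1 hybrid, in which every plant is Aa.
    Element k of the returned list is the population after k rounds
    of selfing (k = 1 gives F2 when starting from F1).
    Selection, mutation and random drift are ignored.
    """
    p_AA, p_Aa, p_aa = start
    series = [(p_AA, p_Aa, p_aa)]
    for _ in range(n_selfings):
        # AA and aa breed true; Aa segregates 1/4 AA : 1/2 Aa : 1/4 aa.
        p_AA, p_Aa, p_aa = p_AA + p_Aa / 4, p_Aa / 2, p_aa + p_Aa / 4
        series.append((p_AA, p_Aa, p_aa))
    return series


def doubled_haploid(p_AA, p_Aa, p_aa):
    """Expected (AA, Aa, aa) frequencies of DH lines from a source population."""
    # Gametes carry A with the population's allele frequency, and
    # chromosome doubling makes every DH line homozygous.
    f_A = p_AA + p_Aa / 2
    return (f_A, 0.0, 1.0 - f_A)


if __name__ == "__main__":
    for k, freqs in enumerate(selfing_series(5)):
        print(f"F{k + 1}", freqs)
    print("DH from F1", doubled_haploid(0.0, 1.0, 0.0))
```

Running it gives F2 = (0.25, 0.5, 0.25), F3 = (0.375, 0.25, 0.375) and so on, while the frequency of allele A, p_AA + p_Aa/2, stays at 0.5 in every generation, consistent with the equal allele frequencies expected in selfing and DH populations derived from the F1 hybrid.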
FIG. 1.1 – Biparental populations and their derivative relationship in genetic studies in plants.
At one polymorphic locus (whether it is a marker or a gene), assume parent P1 carries allele A and parent P2 carries allele a, so the genotypes of the two parents are AA and aa, respectively. When selection and random drift due to limited population size are ignored, the two alleles have equal frequency, i.e., 0.5, in selfing, repeated-selfing, and DH populations derived from the F1 hybrid. Each generation of backcrossing halves the frequency of the non-recurrent parent allele. Based on the frequency of allele A, denoted fA, the 20 bi-parental populations shown in Figure 1.1 fall into five classes.
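The halving rule for backcrossing can be turned into a small calculator for fA. The sketch assumes that the F1 has fA = 0.5, that each backcross to the recurrent parent halves the frequency of the non-recurrent parent's allele, and that any later selfing or DH production leaves fA unchanged because selection and drift are ignored. The function name `freq_A` and its arguments are hypothetical; `fractions.Fraction` is used only to keep the values exact.

```python
from fractions import Fraction


def freq_A(recurrent=None, n_backcrosses=0):
    """Expected frequency of allele A (carried by P1) in a bi-parental population.

    recurrent: "P1" or "P2", the parent used in backcrossing; ignored when
        n_backcrosses is 0 (F1, F2, RILs or DH lines from F1).
    n_backcrosses: number of backcross generations applied to the F1.
    Later selfing or DH production does not change the value, since
    selection and random drift are ignored.
    """
    if n_backcrosses == 0:
        return Fraction(1, 2)
    if recurrent not in ("P1", "P2"):
        raise ValueError("recurrent must be 'P1' or 'P2' when backcrossing")
    # Start from 1/2 in F1 and halve the non-recurrent allele once per backcross.
    non_recurrent = Fraction(1, 2) ** (n_backcrosses + 1)
    return 1 - non_recurrent if recurrent == "P1" else non_recurrent


if __name__ == "__main__":
    for parent in ("P1", "P2"):
        for n in (1, 2):
            print(f"{parent}BC{n}F1: fA = {float(freq_A(parent, n))}")
```

For P1BC2F1 the sketch gives 7/8 = 0.875. With zero, one or two backcrosses to either parent, the only possible values are 0.875, 0.75, 0.5, 0.25 and 0.125.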
(1) fA = 0.875. After two generations of backcrossing with parent P1, the frequency of allele a is one-quarter of its frequency in F1, i.e., 0.125, and the frequency of allele A is 0.875 in P1BC2F1. Selfing, repeated selfing, and DH popul