Probability in genetics pdf
Genetic and Evolutionary Computation Conference.
Moore, Lance W. Hahn, Marylyn D. Ritchie, Tricia A. Thornton, Bill C.
This is particularly true in human genetics and genetic epidemiology where new analytical methods are needed for the detection and characterization of disease susceptibility genes whose effects are complex, nonlinear, and partially or solely dependent on the effects of other genes.
Despite this need, the development of complex genetic models that can be used to simulate data is not always intuitive. In fact, only a few such models have been published.
In this paper, we present a strategy that uses genetic algorithms to identify complex genetic models for simulation studies. The genetic models used in this study are penetrance functions that define the probability of disease given that a specific DNA sequence variation has been inherited. We demonstrate that the genetic algorithm approach routinely identifies interesting and useful penetrance functions in a human-competitive manner.
The identification of disease susceptibility genes has the potential to improve human health through the development of new prevention, diagnosis, and treatment strategies. Although achieving this goal is an important public health endeavor, it is not easily accomplished for common diseases, such as essential hypertension, due to the complex multifactorial nature of the disease (Kardia; Moore and Williams). That is, in such cases, risk of disease is due to a complex interplay between multiple genes and multiple environmental factors.
The identification of genes that influence risk of disease only through complex interactions with other genes i. The statistical challenge is to consider high-dimensional interactions without loss of degrees of freedom while the computational challenge lies in the size and complexity of the search space.
Gene-gene interactions are examples of attribute interactions, a major challenge for data mining (Freitas). Several new methods have been developed in an attempt to address the statistical and computational challenges of detecting and characterizing complex disease susceptibility genes. These methods can be classified as either data reduction approaches or pattern recognition approaches.
Data reduction methods such as the multifactor dimensionality reduction or MDR approach Ritchie et al. MDR reduces multiple predictor variables to a single variable, thereby reducing the dimensionality of the problem. In contrast, pattern recognition and machine learning strategies such as neural networks Lucek et al.
Although these methods are promising, their power for identifying gene-gene and gene-environment interactions has not been fully evaluated. The evaluation of power is best accomplished using simulated data. The goal of this study was to develop a genetic algorithm (GA) strategy for discovering complex genetic models in the form of penetrance functions that can be used to simulate data for the evaluation of new statistical and computational methods.
Penetrance functions define the probability of disease given that a particular combination of DNA sequence variations has been inherited. Penetrance functions of interest in this study exhibit gene-gene or attribute interactions in the absence of independent main effects. We begin in Section 2 with an overview of genetic models in terms of penetrance functions. A summary and discussion of the results are presented in Sections 4 and 5 respectively.
The conclusions are presented in Section 6. The results presented in this paper demonstrate that a GA strategy is capable of routinely identifying interesting and useful genetic models in a human-competitive manner.
Penetrance is simply the probability of disease given a particular combination of genotypes. A single genotype is determined by one allele i. For most genetic variations, only two alleles A or a exist in the biological population.
Therefore, because the order of the alleles is unimportant, a genotype can have one of three values: AA, Aa or aa. Penetrance functions define the probability of disease for all genotypes for one or more genetic variations. Once the penetrance functions are specified, genetic data can easily be simulated for people with the disease and for people without the disease. For example, the penetrance function for an autosomal recessive disease i. Here, individuals who inherit the AA or Aa genotypes have zero probability of disease while individuals who inherit the aa genotype are certain to have the disease.
From this simple recessive Mendelian model, data can be simulated by giving affected individuals aa genotypes and unaffected individuals AA or Aa genotypes, in proportion to their defined population frequencies.
Table 1. Penetrance values for three genotypes from a gene acting under an autosomal recessive disease model.
Genotype    Penetrance
AA          0
Aa          0
aa          1
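The simulation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names, the allele frequency of 0.5, the sample size, and the use of Hardy-Weinberg proportions for the genotype frequencies are all assumptions made for the example. Only the penetrance values (0, 0, 1) come from the text.

```python
import random

# Penetrance function from the text: P(disease | genotype) under an
# autosomal recessive model.
PENETRANCE = {"AA": 0.0, "Aa": 0.0, "aa": 1.0}


def genotype_frequencies(p):
    """Genotype frequencies for allele A at frequency p.

    Assumption for this sketch: Hardy-Weinberg proportions.
    """
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def simulate(n, p, penetrance, rng):
    """Draw n individuals as (genotype, affected) pairs."""
    freqs = genotype_frequencies(p)
    genotypes = rng.choices(list(freqs), weights=list(freqs.values()), k=n)
    # An individual is affected with probability penetrance[genotype].
    return [(g, rng.random() < penetrance[g]) for g in genotypes]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = simulate(1000, 0.5, PENETRANCE, rng)
cases = [g for g, affected in sample if affected]
controls = [g for g, affected in sample if not affected]
```

Because the penetrance values here are 0 or 1, every simulated case carries aa and every control carries AA or Aa. The same function would accept a table with intermediate probabilities.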
This means that the probability of disease given the BB genotype is 0. Similarly, the marginal penetrance of Bb can be calculated as 0. Note that for this model, all of the marginal penetrance values i. This is true despite the table penetrance values not being equal. Here, risk of disease is greatly increased by inheriting exactly two high-risk alleles e.
This model was first described by Frankel and Schork What makes this model complex is the absence of a main effect for either of the genetic variations. Thus, each genetic variation only has an effect on disease risk in the context of the other genetic variation.
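To make the idea of "interaction without main effects" concrete, the sketch below computes marginal penetrances for a two-locus table. The nine table entries and the genotype frequencies (0.25, 0.5, 0.25) are illustrative assumptions chosen so that the margins come out equal. They are not recovered from the source table, and the function names are hypothetical.

```python
# Assumed genotype frequencies at each locus (illustrative only).
FREQ_A = {"AA": 0.25, "Aa": 0.50, "aa": 0.25}
FREQ_B = {"BB": 0.25, "Bb": 0.50, "bb": 0.25}

# Assumed two-locus penetrance table: P(disease | genotype at A, genotype at B).
TABLE = {
    ("AA", "BB"): 0.0, ("AA", "Bb"): 0.0,  ("AA", "bb"): 0.1,
    ("Aa", "BB"): 0.0, ("Aa", "Bb"): 0.05, ("Aa", "bb"): 0.0,
    ("aa", "BB"): 0.1, ("aa", "Bb"): 0.0,  ("aa", "bb"): 0.0,
}


def marginal_b(table, freq_a, gb):
    """Penetrance of B genotype gb, averaged over A genotype frequencies."""
    return sum(freq_a[ga] * table[(ga, gb)] for ga in freq_a)


def marginal_a(table, freq_b, ga):
    """Penetrance of A genotype ga, averaged over B genotype frequencies."""
    return sum(freq_b[gb] * table[(ga, gb)] for gb in freq_b)


margins_b = {gb: marginal_b(TABLE, FREQ_A, gb) for gb in FREQ_B}
margins_a = {ga: marginal_a(TABLE, FREQ_B, ga) for ga in FREQ_A}
```

With these assumed values all six margins are the same, even though the individual table entries differ. That is the sense in which neither locus has a main effect: looking at one locus alone gives no information about disease risk.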
Table 2. Penetrance values for combinations of genotypes from two genes exhibiting interactions but not main effects.
More complex genetic models can be developed by assigning disease risk to more than one genotype from one or more genetic variations. Table 2 illustrates a penetrance function that relates two genetic variations, each with two alleles and three genotypes, to risk of disease.
Thus, assuming the frequency of the AA genotype is 0. This is one of only a few complex genetic models that have been described in the literature. The scarcity of complex genetic models in the literature is primarily due to the extraordinary combinatorial complexity of the problem, as has been discussed by Culverhouse et al.
Effectively, there are an infinite number of possible penetrance functions that could be developed for just two genetic variations. Only some of these models would exhibit a complex relationship with disease risk. The size of the search space precludes the human-based trial and error approach as well as exhaustive computational searches without specific restrictions and assumptions about the allele frequency and penetrance function values.
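A genetic algorithm search over this space might look like the following sketch. It is an illustration under stated assumptions, not the paper's method: the fitness function (reward variance among table entries, penalize variance among marginal penetrances), the penalty weight, the genotype frequencies, and the selection, crossover, and mutation settings are all choices made for the example.

```python
import random
from statistics import pvariance

FREQ = [0.25, 0.5, 0.25]  # assumed genotype frequencies at both loci


def margins(table):
    """Six marginal penetrances of a flat, row-major 3x3 table."""
    rows = [sum(FREQ[j] * table[3 * i + j] for j in range(3)) for i in range(3)]
    cols = [sum(FREQ[i] * table[3 * i + j] for i in range(3)) for j in range(3)]
    return rows + cols


def fitness(table, weight=100.0):
    """Assumed objective: varied table entries, near-equal margins."""
    return pvariance(table) - weight * pvariance(margins(table))


def mutate(table, rng, rate=0.2, step=0.1):
    """Perturb some entries, keeping each a valid probability in [0, 1]."""
    return [
        min(1.0, max(0.0, v + rng.uniform(-step, step))) if rng.random() < rate else v
        for v in table
    ]


def crossover(a, b, rng):
    """One-point crossover of two flat tables."""
    cut = rng.randrange(1, 9)
    return a[:cut] + b[cut:]


def evolve(rng, pop_size=50, generations=200):
    """Return the best table found and the best fitness in the initial population."""
    pop = [[rng.random() for _ in range(9)] for _ in range(pop_size)]
    start_best = max(fitness(t) for t in pop)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness), start_best


best_table, initial_best = evolve(random.Random(1))
```

Because the top half of each generation is carried over unchanged, the best fitness can never decrease. A real study would likely add constraints on allele frequencies and on the range of penetrance values, as the text notes.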
All four albino? What is the probability that the first son of a woman whose brother is affected will be affected? What is the probability that the second son of a woman whose brother is affected will be affected if her first son was affected?
What phenotypes would this cross produce and in what ratios? What can the principles of probability be used for? What is the probability that their first son will have hemophilia? A woman whose brother had cystic fibrosis marries a man who had a child with cystic fibrosis from a previous marriage. They plan to have 3 children. What is the probability that only one of the 3 will have cystic fibrosis? If a person with type O blood has a baby with a person who has type AB blood, then what blood type will the baby have?
The allele for curly hair is incompletely dominant. If a mother is homozygous for curly hair and the father is homozygous for straight hair, what percentage of the offspring will exhibit characteristics of both parents? What is the probability that a seed from the cross will produce a tall plant? What is represented by a pair of lowercase letters, such as tt, in a Punnett square? If the frequency of the p allele is. Why is crossing over so important to a species?
Why are small populations more prone to genetic disease? Are calculated genotypic frequencies necessarily the same as observed genotypic frequencies?