The proper understanding and use of statistical tools are essential to the scientific enterprise. This is true both at the level of designing one's own experiments as well as for critically evaluating studies carried out by others. Unfortunately, many researchers who are otherwise rigorous and thoughtful in their scientific approach lack sufficient knowledge of this field. This methods chapter is written with such individuals in mind. Although the majority of examples are drawn from the field of Caenorhabditis elegans biology, the concepts and practical applications are also relevant to those who work in the disciplines of molecular genetics and cell and developmental biology.
Our intent has been to limit theoretical considerations to a necessary minimum and to use common examples as illustrations for statistical analysis. Our chapter includes a description of basic terms and central concepts and also contains in-depth discussions on the analysis of means, proportions, ratios, probabilities, and correlations. We also address issues related to sample size, normality, outliers, and non-parametric approaches.
At the first group meeting that I attended as a new worm postdoc , D. Yes, mutant X has a phenotype. No, mutant Y does not genetically complement mutant Z. We are perhaps even a bit suspicious of other kinds of data, which we perceive as requiring excessive hand waving. However, the realities of biological complexity, the sometimes-necessary intrusion of sophisticated experimental design, and the need for quantifying results may preclude black-and-white conclusions.
Oversimplified statements can also be misleading or at least overlook important and interesting subtleties. Finally, more and more of our experimental approaches rely on large multi-faceted datasets. These types of situations may not lend themselves to straightforward interpretations or facile models.
Statistics may be required. The intent of these sections will be to provide C. Namely, which common situations require statistical approaches and what are some of the appropriate methods i. Our intent is therefore to aid worm researchers in applying statistics to their own work, including considerations that may inform experimental design.
In addition, we hope to provide reviewers and critical readers of the worm scientific literature with some criteria by which to interpret and evaluate statistical analyses carried out by others.
At various points we suggest some general guidelines, which may lead to somewhat more uniformity in how our field conducts and presents statistical findings. Finally, we provide some suggestions for additional readings for those interested in a more systematic and in-depth coverage of the topics introduced (Appendix A).
There are numerous ways to describe and present the variation that is inherent to most data sets. Range (defined as the largest value minus the smallest) is one common measure and has the advantage of being simple and intuitive. Range, however, can be misleading because of the presence of outliers, and it tends to be larger for larger sample sizes even without unusual data values.
Standard deviation (SD) is the most common way to present variation in biological data. It has the advantage that nearly everyone is familiar with the term and that its units are identical to the units of the sample measurement. Its disadvantage is that few people can recall what it actually means. Figure 1 depicts density curves of brood sizes in two different populations of self-fertilizing hermaphrodites.
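The tendency of range to grow with sample size can be checked with a small simulation. The sketch below is illustrative only: the population (normal, mean 300, SD 50) and the function name `mean_range` are assumptions chosen for the example, not values taken from the figures.

```python
import random

def mean_range(sample_size, trials=2000, seed=1):
    """Average range (largest minus smallest value) over many random samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumed population for illustration: normal, mean 300, SD 50.
        sample = [rng.gauss(300, 50) for _ in range(sample_size)]
        total += max(sample) - min(sample)
    return total / trials

# The population never changes, yet the typical range widens as n grows.
for n in (5, 20, 100):
    print(n, round(mean_range(n), 1))
```

Because no outliers are added, any increase in the printed ranges comes from sample size alone.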
Both have identical average brood sizes. However, the population in Figure 1B displays considerably more inherent variation than the population in Figure 1A.
Looking at the density curves, we would predict that 10 randomly selected values from the population depicted in Figure 1B would tend to show a wider range than an equivalent set from the more tightly distributed population in Figure 1A.
We might also note from the shape and symmetry of the density curves that both populations are normally distributed (this is also referred to as a Gaussian distribution). In reality, most biological data do not conform to a perfect bell-shaped curve, and, in some cases, they may profoundly deviate from this ideal.
Nevertheless, in many instances, the distribution of various types of data can be roughly approximated by a normal distribution. Furthermore, the normal distribution is a particularly useful concept in classical statistics more on this later and in this example is helpful for illustrative purposes.
Figure 1. Two normal distributions.
Often we can never really know the true mean or SD of a population because we cannot usually observe the entire population. Instead, we must use a sample to make an educated guess. In the case of experimental laboratory science, there is often no limit to the number of animals that we could theoretically test or the number of experimental repeats that we could perform.
It's awkward for us to think of a theoretical collection of bands on a western blot or a series of cycle numbers from a qRT-PCR experiment as a population, but from the standpoint of statistics, that's exactly what they are.
Thus, our populations tend to be mythical in nature as well as infinite. Moreover, even the most sadistic advisor can only expect a finite number of biological or technical repeats to be carried out. The data that we ultimately analyze are therefore always just a tiny proportion of the population, real or theoretical, from whence they came.
It is important to note that increasing our sample size will not predictably increase or decrease the amount of variation that we are ultimately likely to record. What can be stated is that a larger sample size will tend to give a sample SD that is a more accurate estimate of the population SD. In the same vein, a larger sample size will also provide a more accurate estimation of other parameters, such as the population mean. In some cases, standard numerical summaries e.
In particular, these measures usually tell you nothing about the shape of the underlying distribution. Figure 2 illustrates this point; Panels A and B show the duration in seconds of vulval muscle cell contractions in two populations of C. The data from both panels have nearly identical means and SDs, but the data from Panel A are clearly bimodal, whereas the data from Panel B conform more to a normal distribution.
One way to present this observation would be to show the actual histograms in a figure or supplemental figure. Alternatively, a somewhat more concise depiction, which still gets the basic point across, is shown by the individual data plot in Panel C. In any case, presenting these data simply as a mean and SD without highlighting the difference in distributions would be potentially quite misleading, as the populations would appear to be identical.
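A short simulation can make the same point numerically. The sketch below is not the data behind Figure 2: the cluster centers, spreads, and sample sizes are hypothetical, and the helper `fraction_near_mean` is introduced only for illustration. It builds a bimodal sample, then rescales a unimodal sample so that both share the same mean and SD.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical bimodal data: two clusters of contraction durations (seconds).
bimodal = ([rng.gauss(2.0, 0.3) for _ in range(500)]
           + [rng.gauss(4.0, 0.3) for _ in range(500)])
target_mean = statistics.mean(bimodal)
target_sd = statistics.stdev(bimodal)

# Unimodal data, rescaled so its mean and SD match the bimodal sample.
raw = [rng.gauss(0, 1) for _ in range(1000)]
raw_mean, raw_sd = statistics.mean(raw), statistics.stdev(raw)
unimodal = [target_mean + target_sd * (x - raw_mean) / raw_sd for x in raw]

def fraction_near_mean(data, width=0.5):
    """Fraction of values lying within `width` SDs of the sample mean."""
    m, s = statistics.mean(data), statistics.stdev(data)
    return sum(abs(x - m) <= width * s for x in data) / len(data)

for name, data in (("bimodal", bimodal), ("unimodal", unimodal)):
    print(name,
          round(statistics.mean(data), 3),
          round(statistics.stdev(data), 3),
          round(fraction_near_mean(data), 3))
```

The means and SDs agree, but far fewer of the bimodal values sit close to the mean, which is exactly the information that a mean-and-SD summary hides.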
Figure 2. Two distributions with similar means and SDs. Panels A and B show histograms of simulated data of vulval muscle cell contraction durations derived from underlying populations with distributions that are either bimodal (A) or normal (B).
Note that both populations have nearly identical means and SDs, despite major differences in the population distributions. Panel C displays the same information shown in the two histograms using individual data plots.
Horizontally arrayed sets of dots represent repeat values.
Before you become distressed about what the title of this section actually means, let's be clear about something. Statistics, in its broadest sense, effectively does two things for us, more or less simultaneously. This includes fairly simple stuff such as means and proportions.
It also includes more complex statistics such as the correlation between related measurements, the slope of a linear regression, and the odds ratio for mortality under differing conditions.
These can all be useful for interpreting our data, making informed conclusions, and constructing hypotheses for future studies. However, statistics gives us something else, too. What a deal! Not only can we obtain predictions for the population mean and other parameters, we also estimate how accurate those predictions really are.
In the preceding section we discussed the importance of SD as a measure for describing natural variation within an entire population of worms. We also touched upon the idea that we can calculate statistics, such as SD, from a sample that is drawn from a larger population. Intuition also tells us that these two values, one corresponding to the population, the other to the sample, ought to generally be similar in magnitude, if the sample size is large.
Finally, we understand that the larger the sample size, the closer our sample statistic will be to the true population statistic. This is true not only for the SD but also for many other statistics as well.
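The claim that larger samples give sample statistics closer to the population values can be checked by simulation. In this sketch the population (normal, mean 300, SD 50) and the function name `sd_estimate_error` are assumptions made for illustration; the code measures how far the sample SD typically lands from the population SD.

```python
import random
import statistics

POPULATION_SD = 50  # assumed population SD for this illustration

def sd_estimate_error(sample_size, trials=2000, seed=2):
    """Average absolute gap between the sample SD and the population SD."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Assumed population mean of 300; only the SD matters here.
        sample = [rng.gauss(300, POPULATION_SD) for _ in range(sample_size)]
        errors.append(abs(statistics.stdev(sample) - POPULATION_SD))
    return statistics.mean(errors)

for n in (6, 20, 60):
    print(n, round(sd_estimate_error(n), 2))
```

The typical error shrinks as the sample grows, even though any single small sample may happen to land close to the true value.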
It is now time to discuss SD in another context that is central to the understanding of statistics. We do this with a thought experiment. Imagine that we determine the brood size for six animals that are randomly selected from a larger population.
Not being satisfied with our efforts, we repeat this approach every day for 10 days, each day obtaining a new mean and new SD (Table 1). At the end of 10 days, having obtained ten different means, we can now use each sample mean as though it were a single data point to calculate a new mean, which we can call the mean of the means.
In addition, we can calculate the SD of these ten mean values, which we can refer to for now as the SD of the means. We can then pose the following question: will the SD calculated using the ten means generally turn out to be a larger or smaller value on average than the SD calculated from each sample of six random individuals?
This is not merely an idiosyncratic question posed for intellectual curiosity. The notion of the SD of the mean is critical to statistical inference.
Read on. Thinking about this, we may realize that the ten mean values, being averages of six worms, will tend to show less total variation than measurements from individual worms. In other words, the variation between means should be less than the variation between individual data values.
Moreover, the average of these means will generally be closer to the true population mean than would a mean obtained from just six random individuals. In fact, this idea is borne out in Table 1, which used random sampling from a theoretical population with a known mean and an SD of 50 to generate the sample values.
We can therefore conclude that sample means will generally exhibit less variation than that seen among individual samples. Furthermore, we can consider what might happen if we were to take daily samples of 20 worms instead of 6. Namely, the larger sample size would result in an even tighter cluster of mean values. This in turn would produce an even smaller SD of the means than from the experiment where only six worms were analyzed each day.
Thus, increasing sample size will consistently lead to a smaller SD of the means. Note however, as discussed above, increasing sample size will not predictably lead to a smaller or larger SD for any given sample.
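The thought experiment can be run as a simulation. The sketch below assumes a normal population with an SD of 50, as in the text, and a mean of 300, which is an assumption made only so the code can run; `sd_of_means` is a hypothetical helper name. It is run for far more than ten days so that the comparison is stable.

```python
import random
import statistics

def sd_of_means(sample_size, days, seed=4, pop_mean=300, pop_sd=50):
    """SD of the daily sample means, one sample of `sample_size` worms per day."""
    rng = random.Random(seed)
    daily_means = []
    for _ in range(days):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(sample_size)]
        daily_means.append(statistics.mean(sample))
    return statistics.stdev(daily_means)

# Ten days, as in the thought experiment (noisy with so few means).
print("6 worms, 10 days:   ", round(sd_of_means(6, 10), 1))
# Many days, to show the long-run behaviour for each sample size.
print("6 worms, 5000 days: ", round(sd_of_means(6, 5000), 1))
print("20 worms, 5000 days:", round(sd_of_means(20, 5000), 1))
```

Both long-run values fall well below the population SD of 50, and the value for 20 worms per day is smaller than the value for 6.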
It turns out that this concept of calculating the SD of multiple means or other statistical parameters is a very important one. The good news is that rather than having to actually collect samples for ten or more days, statistical theory gives us a short cut that allows us to estimate this value based on only a single day's effort.
In fact, whenever an SD is calculated for a statistic (e.g., the mean), it is referred to as the standard error (SE) of that statistic. SD is a term generally reserved for describing variation within a sample or population only.
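For the mean, the shortcut from statistical theory is the standard error of the mean, estimated by dividing a single sample's SD by the square root of the sample size. The sketch below shows that calculation; the six brood sizes are made-up values for illustration, and `standard_error_of_mean` is a hypothetical helper name.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the SD of the mean from one sample: sample SD / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Made-up brood sizes from a single day's sample of six animals.
brood_sizes = [285, 310, 342, 268, 301, 294]

print("mean:", round(statistics.mean(brood_sizes), 1))
print("SD:  ", round(statistics.stdev(brood_sizes), 1))
print("SE:  ", round(standard_error_of_mean(brood_sizes), 1))
```

The SE is smaller than the SD by a factor of the square root of the sample size, which matches the observation that means vary less than individual values.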
Quality Glossary Definition: Statistical process control
Statistical process control (SPC) is defined as the use of statistical techniques to control a process or production method. SPC tools and procedures can help you monitor process behavior, discover issues in internal systems, and find solutions for production issues. Statistical process control is often used interchangeably with statistical quality control (SQC). A control chart helps you record data and lets you see when an unusual event occurs, such as a very high or low observation compared with "typical" process performance.
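A control chart's basic logic can be sketched in a few lines. The example below is a simplified illustration, not a full SPC procedure: the limits are set at the baseline mean plus or minus three SDs, a common convention that is an assumption here rather than something stated above, and the measurements and function names are hypothetical.

```python
import statistics

def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from baseline observations.

    k=3 is an assumed convention, not a value given in the text.
    """
    center = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    return center - k * spread, center, center + k * spread

def flag_unusual(observations, lower, upper):
    """List (index, value) pairs that fall outside the control limits."""
    return [(i, x) for i, x in enumerate(observations)
            if x < lower or x > upper]

# Hypothetical measurements from a period of typical process performance.
baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0, 10.1, 9.9]
lower, center, upper = control_limits(baseline)

# New observations to monitor against those limits.
new_observations = [10.0, 10.2, 11.5, 9.9, 8.2]
print(flag_unusual(new_observations, lower, upper))
```

Observations inside the limits are treated as typical variation; anything outside them is flagged for a closer look.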
When performing research, it is essential that you are able to make sense of your data. This allows you to inform other researchers in your field, and others, of what you have found. It can also be used to help build evidence for a theory. An understanding of which test to use, and when, is therefore necessary. If you are not familiar with performing statistical analysis, there are some good practices to follow. The first is to determine your variables.
Overview of basic statistics:
Dependent variable (DV): this is the one that you are measuring.
Published on January 28, by Rebecca Bevans. Revised on December 28.
Statistical tests are used in hypothesis testing.