This quick blog is designed to help you get off to the races in the world of data science; and here specifically, experimental design. Enjoy!
When it comes to experimental design, there are three main steps it can be broken down into:
Planning & Design
Planning should always begin with a well-formed hypothesis.
Some of the major considerations you want to make in this process are as follows:
- What is the question you want answered?
- What is the population in question?
- What are your dependent and independent variables?
When conducting an experiment, there are three key components to consider: randomization, replication, and blocking. These three aspects of an experiment allow us to assess our population’s variability.
The purpose of randomization is to ensure that any variation in outcomes related to outside factors is distributed evenly across treatment groups.
When conducting an experiment, we seek to understand the variability of outcomes. For instance, if I were to run a given experiment only once, I would be depending on an outcome that may have occurred due to random chance. To understand the broad spectrum of possible outcomes, it’s important that we replicate the experiment accordingly.
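To make the idea of replication concrete, here is a minimal simulation sketch (the numbers are made up for illustration): each "experiment" is a sample mean of 30 draws, and replicating it many times shows how much a single run can vary by chance alone.

```r
# Hypothetical replication sketch: repeat a simulated experiment many times
# to see how much any single outcome varies by random chance.
set.seed(42)
outcomes <- replicate(1000, mean(rnorm(30, mean = 5, sd = 2)))

range(outcomes)  # a single run can land anywhere in this spread
mean(outcomes)   # the replicated average settles near the true mean of 5
```

Any one of those 1,000 runs could mislead you; the replicated collection reveals the distribution of possible outcomes.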
Statistical power is the probability that, if the alternative hypothesis is true, your experiment will correctly reject the null hypothesis rather than arrive at that conclusion by random chance. Best practice is 80% statistical power.
So to simplify this even further: if the effect you are testing for is real, power is the likelihood that your experiment will actually detect it.
Blocking is used to help control variability by making treatment groups more alike. Within a given block, differences are minimal; across blocks, they can be much larger. One example of this might be blocking an experiment by gender.
- Blocking by variables – use aov for the sake of blocking
- Randomized Complete Block Design (RCBD) experiment
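As a sketch of the aov approach to blocking mentioned above, here is a minimal RCBD example; the dataset and its variable names (`rcbd`, `block`, `treatment`, `response`) are hypothetical stand-ins, not a real study.

```r
# Hypothetical RCBD sketch: each treatment appears once in every block.
set.seed(1)
rcbd <- data.frame(
  block     = factor(rep(1:4, each = 3)),              # e.g., site, batch, gender
  treatment = factor(rep(c("A", "B", "C"), times = 4)),
  response  = rnorm(12)
)

# Adding block to the model removes block-to-block variation before
# assessing the treatment effect.
fit <- aov(response ~ treatment + block, data = rcbd)
summary(fit)
```

The block term soaks up variability between blocks, making the comparison of treatments within blocks more sensitive.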
After accumulating data from your experiment, one quick and easy test of statistical significance you might run is called a t-test.
- Consider your hypothesis or central research question:
- NULL hypothesis – let’s keep this simple: the null hypothesis states that there is no effect or difference. For the mtcars dataset, the null hypothesis might be something like: a vehicle’s horsepower has no effect on its miles per gallon.
- Alternative hypothesis – conversely, the alternative hypothesis means that there was a difference. If you can determine with statistical significance the effect of the independent variable on the dependent variable, you would say that you reject the null hypothesis and accept the alternative hypothesis.
- Is this a one or two sided test?
- One-sided test – when you are testing whether a given variable is greater than another, it’s a one-sided test; if you’re testing whether it’s less than another… still one-sided.
- Two-sided test – when you are testing whether a given variable is not equal to another, that is a two-sided test; it checks for a difference in either direction, greater or less, in a single test.
- Were your results statistically significant?
- People use the term statistically significant left and right with little consideration of what it actually means. If you run your test and your data suggest that your hypothesis is correct, statistical significance is effectively knowing that the result is not likely due to random chance.
- The standard here is 95% confidence, or a p-value less than or equal to 0.05.
- What is statistical power?
- Similar to statistical significance; given that the alternative hypothesis is true, power represents the likelihood that the null hypothesis will be rejected.
- The standard for power is 80%.
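Putting the checklist above into practice, here is a quick t-test on the built-in mtcars dataset. I'm comparing transmission type (`am`) rather than horsepower, since a t-test needs two groups; `t.test` is two-sided by default.

```r
# Do automatic (am = 0) and manual (am = 1) cars differ in miles per gallon?
# Null hypothesis: no difference in mean mpg between the two groups.
result <- t.test(mpg ~ am, data = mtcars)

result$p.value  # well below 0.05, so we reject the null hypothesis
```

A p-value at or below 0.05 meets the 95% confidence standard described above.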
For a given experiment, one thing to consider is the sample size. Arriving at the required number depends on a handful of other variables, including your targeted statistical power and significance level.
Another measure is that of effect size. Effect size represents the difference between the averages of two groups divided by their pooled standard deviation.
The greater the difference between groups, the smaller the sample needed to validate it. The smaller the difference, the greater the likelihood that the observed difference is due only to chance.
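The effect size formula above can be sketched in a few lines of R; the two groups here are toy numbers chosen purely for illustration.

```r
# Effect size (Cohen's d): difference in group means divided by the
# pooled standard deviation. Toy data for illustration only.
g1 <- c(1, 2, 3)
g2 <- c(2, 3, 4)

pooled_sd <- sqrt(((length(g1) - 1) * var(g1) + (length(g2) - 1) * var(g2)) /
                  (length(g1) + length(g2) - 2))
d <- (mean(g2) - mean(g1)) / pooled_sd

d  # 1 here: the group means sit one pooled standard deviation apart
```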
In order to calculate any one of these values (effect size, statistical power, significance level, sample size), you need all but one of them.
Load up the pwr package and use pwr.anova.test to solve for the missing variable.
k – number of groups
n – sample size per group
f – effect size
sig.level – significance level
power – statistical power
library(pwr)
pwr.anova.test(k = 2, n = NULL, f = .1, sig.level = 0.05, power = .8)
As you can see above, given the demands that we passed to the pwr.anova.test function, we need at least 394 samples per group for those things to hold true.
I hope this helps to get you started in experimental design!