Link: Resampling Hypothesis testing and the null hypothesis
What is bootstrapping
Bootstrapping is a resampling method: we build new datasets by randomly drawing values, with replacement, from the original sample.
How does bootstrapping work
Usually the original sample contains 10 or more measurements. Let’s say it contains n measurements. To build each bootstrapped dataset:
- Draw one value from the original sample and put it back afterwards (sampling with replacement)
- Write down the value
- Repeat the draw n times
- Calculate the mean (or another statistic, such as the median or SD) of this new sample, called the bootstrapped dataset
- Repeat the steps above until we have lots of means (>10k)
- Calculate the standard deviation of the bootstrapped means (or medians, SDs, etc.); this is the bootstrap estimate of the standard error of that statistic
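The steps above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the measurements are made up for illustration, and the function name `bootstrap_statistics` and its parameters are hypothetical.

```python
import random
import statistics

def bootstrap_statistics(sample, stat=statistics.mean, n_boot=10_000, seed=0):
    """Return n_boot values of `stat`, one per bootstrapped dataset."""
    rng = random.Random(seed)
    n = len(sample)
    # rng.choices draws n values WITH replacement from the original sample
    return [stat(rng.choices(sample, k=n)) for _ in range(n_boot)]

# Made-up measurements, for illustration only
sample = [0.8, -1.2, 2.3, 0.1, 1.9, -0.4, 0.7, 1.5, -0.9, 0.2]

boot_means = bootstrap_statistics(sample)

# The SD of the bootstrapped means estimates the standard error of the mean
standard_error = statistics.stdev(boot_means)
print(f"bootstrap standard error of the mean: {standard_error:.3f}")
```

Passing `stat=statistics.median` or `stat=statistics.stdev` gives the same procedure for a median or an SD.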
Why is bootstrapping so useful?
Bootstrapping lets us estimate the sampling distribution of almost any statistic using only the original sample: the bootstrapped values form a histogram of what might happen if we repeated the experiment many times. We can then use that histogram to calculate quantities such as the standard error or a confidence interval without needing a formula.
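As an illustration of reading a confidence interval straight off the histogram, here is a sketch of the percentile method, which is one of several ways to build a bootstrap interval. The data are made up and the variable names are assumptions.

```python
import random
import statistics

rng = random.Random(0)

# Made-up measurements, for illustration only
sample = [0.8, -1.2, 2.3, 0.1, 1.9, -0.4, 0.7, 1.5, -0.9, 0.2]
n_boot = 10_000

# Sorted bootstrapped means: this is the "histogram" in list form
boot_means = sorted(
    statistics.mean(rng.choices(sample, k=len(sample)))
    for _ in range(n_boot)
)

# 95% percentile interval: cut off 2.5% of the means in each tail
lower = boot_means[n_boot * 25 // 1000]
upper = boot_means[n_boot * 975 // 1000 - 1]
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```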
Calculate p-value using bootstrapping
Bootstrapping can also be used to calculate the p-value for Hypothesis testing and the null hypothesis.
Example: bootstrapping for mean
- Null hypothesis: the drug makes no difference, i.e. the true mean = 0
- The mean of the original sample = 0.5
- Shift the original sample so that its mean is 0, then bootstrap the shifted sample to get a histogram of means centered on 0
- The p-value = probability of observing a mean of 0.5 or something more extreme = proportion of bootstrapped means >= 0.5 or <= -0.5
- If p > 0.05, as in this example, we fail to reject the null hypothesis
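A minimal sketch of this shift-and-bootstrap test, assuming a two-sided test against a null mean of 0. The measurements are made up (chosen so that their mean is about 0.5), and `bootstrap_p_value` is a hypothetical helper, not a library function.

```python
import random
import statistics

def bootstrap_p_value(sample, null_value=0.0, n_boot=10_000, seed=0):
    """Two-sided bootstrap p-value for the mean under H0: mean == null_value."""
    rng = random.Random(seed)
    n = len(sample)
    observed = statistics.mean(sample)
    # Shift the sample so that its mean equals the null value
    shifted = [x - observed + null_value for x in sample]
    threshold = abs(observed - null_value)
    extreme = 0
    for _ in range(n_boot):
        boot_mean = statistics.mean(rng.choices(shifted, k=n))
        # Count bootstrapped means at least as far from the null as the observed mean
        if abs(boot_mean - null_value) >= threshold:
            extreme += 1
    return observed, extreme / n_boot

# Made-up measurements, for illustration only
sample = [0.8, -1.2, 2.3, 0.1, 1.9, -0.4, 0.7, 1.5, -0.9, 0.2]
observed, p_value = bootstrap_p_value(sample)
print(f"observed mean = {observed:.2f}, p-value = {p_value:.3f}")
if p_value > 0.05:
    print("fail to reject the null hypothesis")
```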
Example: bootstrapping for median
- Calculate the median of the original sample
- Shift the sample so that its median is 0, which makes the data consistent with the null hypothesis (true median = 0)
- Use bootstrapping to generate the histogram of medians around 0
- Use the histogram to calculate the p-value for the observed median, given that the null hypothesis is true. E.g. observed median = 1.8, so its p-value = probability of observing a bootstrapped median >= 1.8 or <= -1.8, so p-value = p1 + p2 = 0.2
- Since p > 0.05, we fail to reject the null hypothesis that the drug has no effect. This does not show that the drug has an effect; it only means the data are not strong enough evidence against "no effect"
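The median version can be sketched the same way, counting the two tails separately as p1 and p2. The measurements below are made up, chosen only so that the observed median is 1.8; the resulting p-value is whatever this invented data produces and is not meant to reproduce the 0.2 above.

```python
import random
import statistics

rng = random.Random(0)

# Made-up measurements, chosen only so that the observed median is 1.8
sample = [-6.0, -4.1, -2.7, -1.5, 0.4, 1.8, 2.9, 4.6, 6.3, 7.7, 9.2]
observed = statistics.median(sample)

# Shift the sample so that its median is 0 (the null hypothesis value)
shifted = [x - observed for x in sample]

n_boot = 10_000
boot_medians = [
    statistics.median(rng.choices(shifted, k=len(shifted)))
    for _ in range(n_boot)
]

# p1: upper tail (>= observed), p2: lower tail (<= -observed)
p1 = sum(m >= abs(observed) for m in boot_medians) / n_boot
p2 = sum(m <= -abs(observed) for m in boot_medians) / n_boot
p_value = p1 + p2
print(f"observed median = {observed}, p1 = {p1:.3f}, p2 = {p2:.3f}, p = {p_value:.3f}")
```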