What is bootstrapping §

Bootstrapping is a resampling method: we build new datasets by randomly drawing values, with replacement, from the original sample.

How does bootstrapping work §

Usually we take 10 or more measurements in each experiment. Let’s say the original sample contains n measurements. To build each bootstrapped dataset:

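1. Draw one value at random from the original sample, with replacement (the value stays available for later draws)
2. Write down the value
3. Repeat the draw n times
4. Calculate the mean of this new sample, called the bootstrapped dataset
5. Repeat the steps above until we have lots of means (>10k)
6. Calculate the standard deviation of the bootstrapped means, which estimates the standard error of the mean. The same procedure works for medians, standard deviations, etc.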
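
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `bootstrap_statistics`, the fixed seed, and the measurements in `sample` are all made up for the example, and only the standard library is used.

```python
import random
import statistics

def bootstrap_statistics(sample, stat=statistics.mean, n_boot=10_000, seed=0):
    """Return n_boot values of stat, each computed on one bootstrapped dataset."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(sample)
    # rng.choices draws n values with replacement (steps 1-3);
    # stat(...) summarizes the bootstrapped dataset (step 4);
    # the comprehension repeats this n_boot times (step 5).
    return [stat(rng.choices(sample, k=n)) for _ in range(n_boot)]

# Made-up measurements, for illustration only.
sample = [0.8, -1.2, 2.4, 1.9, -0.3, 0.7, 1.5, -0.9, 0.4, -0.3]

boot_means = bootstrap_statistics(sample)
standard_error = statistics.stdev(boot_means)  # step 6
print(f"bootstrap standard error of the mean: {standard_error:.3f}")
```

Passing `stat=statistics.median` instead gives bootstrapped medians without changing anything else.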

Why is bootstrapping so useful? §

Bootstrapping lets us use the original sample alone to build a histogram of what a statistic might look like if we repeated the experiment many times. We can then use that histogram to calculate quantities such as the standard error or a confidence interval for almost any statistic, without needing a formula.
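
As one example of reading a confidence interval off the histogram, the sketch below uses the percentile method: sort the bootstrapped statistics and take the values that cut off 2.5% in each tail. The percentile method is an assumption here (the notes do not say which interval to use), and the function name `percentile_interval` and the data are made up for illustration.

```python
import random
import statistics

def percentile_interval(sample, stat=statistics.mean, level=0.95,
                        n_boot=10_000, seed=0):
    """Percentile bootstrap confidence interval for stat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(sample)
    # Sorted bootstrapped statistics play the role of the histogram.
    boot = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    tail = (1 - level) / 2
    lower = boot[int(tail * n_boot)]            # cuts off the lower tail
    upper = boot[int((1 - tail) * n_boot) - 1]  # cuts off the upper tail
    return lower, upper

# Made-up measurements, for illustration only.
sample = [0.8, -1.2, 2.4, 1.9, -0.3, 0.7, 1.5, -0.9, 0.4, -0.3]

lower, upper = percentile_interval(sample)
print(f"95% interval for the mean: ({lower:.2f}, {upper:.2f})")
```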

Calculate p-value using bootstrapping §

Bootstrapping can also be used for hypothesis testing: it gives a p-value for the observed statistic, assuming the null hypothesis is true.

Example: bootstrapping for mean §

• Null hypothesis: the drug makes no difference, i.e. the true mean = 0
• The mean of the original sample = 0.5
• Shift the original sample so that its mean is 0, then bootstrap the shifted sample to get a histogram of means centered on 0
• The p-value is the probability of observing a mean of 0.5 or something more extreme, i.e. the fraction of bootstrapped means that are >= 0.5 or <= -0.5
• In this example p > 0.05, so we fail to reject the null hypothesis
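
The mean example can be sketched as follows. The measurements are made up so that their mean is 0.5; they are not data from a real experiment, and the seed and number of bootstrap repetitions are arbitrary choices.

```python
import random
import statistics

# Made-up measurements whose mean is 0.5, for illustration only.
sample = [0.8, -1.2, 2.4, 1.9, -0.3, 0.7, 1.5, -0.9, 0.4, -0.3]
observed_mean = statistics.mean(sample)

# Shift the sample so its mean is 0, i.e. so the null hypothesis holds.
shifted = [x - observed_mean for x in sample]

rng = random.Random(0)  # fixed seed so the run is repeatable
n_boot = 10_000
boot_means = [
    statistics.mean(rng.choices(shifted, k=len(shifted)))
    for _ in range(n_boot)
]

# Two-sided p-value: fraction of bootstrapped means at least as far
# from 0 as the observed mean.
extreme = sum(abs(m) >= abs(observed_mean) for m in boot_means)
p_value = extreme / n_boot
print(f"p-value: {p_value:.3f}")
```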

Example: bootstrapping for median §

1. Calculate the median of the original sample

2. Shift the sample so that its median is 0, which makes the null hypothesis (median = 0) true for the shifted data

3. Use bootstrapping on the shifted sample to generate the histogram of medians around 0

4. Use histogram to calculate p-value for the observed median, given the null hypothesis is true.

e.g. if the observed median = 1.8, its p-value is the probability of a bootstrapped median >= 1.8 (p1) plus the probability of one <= -1.8 (p2); here p-value = p1 + p2 = 0.2

5. Therefore, p > 0.05, so we fail to reject the null hypothesis that the drug has no effect. This does not show that the drug has an effect; it only means the data are not strong enough evidence against the null hypothesis.
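
The same recipe can be written once for any location statistic (one that moves by c when every value moves by c, such as the mean or the median). The sketch below is a hedged illustration: the function name `bootstrap_p_value`, its parameters, and the data are assumptions made for the example, chosen so that the observed median is 1.8.

```python
import random
import statistics

def bootstrap_p_value(sample, stat, null_value=0.0, n_boot=10_000, seed=0):
    """Two-sided bootstrap p-value for H0: the true value of stat is null_value."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = stat(sample)                                   # step 1
    # Step 2: shift the data so the statistic equals null_value.
    shifted = [x - observed + null_value for x in sample]
    distance = abs(observed - null_value)
    extreme = 0
    for _ in range(n_boot):                                   # step 3
        boot_stat = stat(rng.choices(shifted, k=len(shifted)))
        # Step 4: count results at least as extreme as the observed one.
        if abs(boot_stat - null_value) >= distance:
            extreme += 1
    return extreme / n_boot

# Made-up measurements with median 1.8, for illustration only.
sample = [1.8, -0.6, 3.1, 2.2, -1.4, 0.9, 2.7, -0.2, 1.8, 4.0, -2.5]

p_value = bootstrap_p_value(sample, statistics.median)
print(f"observed median: {statistics.median(sample)}, p-value: {p_value:.3f}")
```

If the p-value is above the chosen threshold (0.05 in these notes), we fail to reject the null hypothesis.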