Link: P-value

What is p-hacking?

P-hacking is the misuse or abuse of analysis techniques in a way that leads to being fooled by false positives.

False Positives in p-value

Sometimes we get a small p-value even when there is actually no real difference. This is called a False Positive.

With a p-value threshold of 5%, approximately 5% of the statistical tests run on data with no real difference will result in False Positives.
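A minimal simulation sketch of the 5% figure above. It assumes both groups are drawn from the same normal distribution (so any "significant" result is a False Positive) and uses a simple z-test with known standard deviation to stay within the standard library. The function names, group size and number of tests are illustrative choices, not part of the original notes.

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known sigma."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_tests=5000, n=30, alpha=0.05, seed=0):
    """Fraction of tests with p < alpha when there is no real difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        # Both groups come from the same distribution: the null is true.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if z_test_p_value(a, b) < alpha:
            hits += 1
    return hits / n_tests

rate = false_positive_rate()
print(f"False Positive rate: {rate:.3f}")  # should land close to 0.05
```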

Multiple Testing Problem: refers to the issue of getting a large absolute number of False Positives when doing a large number of tests. However, there are ways to reduce False Positives. See False discovery rates (FDR) for more details.
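The arithmetic behind the Multiple Testing Problem can be sketched as follows. This assumes the tests are independent and that there is no real difference in any of them. The helper names are hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of False Positives when every null hypothesis is true."""
    return n_tests * alpha

def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more False Positives across independent tests."""
    return 1 - (1 - alpha) ** n_tests

for m in (1, 10, 100, 1000):
    print(
        f"{m:>5} tests: "
        f"expected False Positives = {expected_false_positives(m):.1f}, "
        f"P(at least one) = {prob_at_least_one_false_positive(m):.3f}"
    )
```

Under these assumptions, 100 tests already give about 5 expected False Positives, and at least one is almost certain.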

How does p-hacking work in reality?

When the p-value is close to the 0.05 threshold, we tend to do things like add data points (add samples), and this sometimes produces the desired outcome, e.g. the result becomes significantly different because p < 0.05. But it is actually a False Positive.
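A rough simulation sketch of this behaviour, under stated assumptions: there is no real difference between the groups, a z-test with known standard deviation stands in for whatever test is actually used, and "close to the threshold" is taken to mean 0.05 <= p < 0.10. All names and parameter values are illustrative.

```python
import random
from statistics import NormalDist, mean

def p_value(a, b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means (known sigma)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate(n_sims=3000, n_start=20, n_extra=10, max_rounds=3,
             alpha=0.05, near=0.10, seed=1):
    """Return (fixed_rate, hacked_rate) of False Positives under the null."""
    rng = random.Random(seed)
    fixed_hits = 0
    hacked_hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n_start)]
        b = [rng.gauss(0, 1) for _ in range(n_start)]
        p = p_value(a, b)
        # Honest analysis: test once at the planned sample size.
        if p < alpha:
            fixed_hits += 1
        # P-hacking: if p is close to the threshold, add data and re-test.
        rounds = 0
        while alpha <= p < near and rounds < max_rounds:
            a += [rng.gauss(0, 1) for _ in range(n_extra)]
            b += [rng.gauss(0, 1) for _ in range(n_extra)]
            p = p_value(a, b)
            rounds += 1
        if p < alpha:
            hacked_hits += 1
    return fixed_hits / n_sims, hacked_hits / n_sims

fixed_rate, hacked_rate = simulate()
print(f"fixed sample size:  {fixed_rate:.3f}")
print(f"add data near 0.05: {hacked_rate:.3f}")
```

Every result that is significant at the planned sample size is also counted in the second procedure, so adding data near the threshold can only push the False Positive rate above the nominal 5%.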

How to avoid p-hacking?

To avoid it, we need to decide the sample size before the experiment by doing a Power analysis.
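As a sketch of what such a calculation can look like, the snippet below uses the standard normal-approximation formula for comparing two group means, n = 2 * ((z_(1 - alpha/2) + z_power) / d)^2 per group, where d is the standardized effect size. The function name and the default values (alpha = 0.05, power = 0.80) are assumptions for illustration. An exact t-test based calculation gives slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

n = sample_size_per_group(effect_size=0.5)
print(f"samples needed per group: {n}")  # 63 with this approximation
```

Smaller expected effects need much larger samples, which is why the sample size should be fixed before looking at any p-value.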