What is p-hacking?
P-hacking is the misuse of analysis techniques in a way that leads to being fooled by False Positives.
False Positives and p-values
Sometimes we get a small p-value even when there is actually no real difference. This is called a False Positive.
If the p-value threshold is 0.05, then approximately 5% of the statistical tests done on data with no real difference will result in False Positives.
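The 5% figure can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: both groups are drawn from the same normal distribution (so any "significant" result is a False Positive), and a two-sided z-test with a known standard deviation of 1 stands in for whatever test is actually used. The function names are hypothetical.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means (assumes known sigma)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_tests=10_000, n=10, alpha=0.05, seed=0):
    """Fraction of tests with p < alpha when there is no real difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        # Both samples come from the same distribution: the null is true.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if z_test_p_value(a, b) < alpha:
            hits += 1
    return hits / n_tests


rate = false_positive_rate()
print(f"False Positive rate: {rate:.3f}")
```

The printed rate should land close to 0.05, since the threshold itself fixes how often a true null gets rejected.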
Multiple Testing Problem: when we do a large number of tests, the absolute number of False Positives becomes large as well. However, there are ways to reduce the number of False Positives. See False Discovery Rate (FDR) for more details.
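To make the Multiple Testing Problem concrete, here is a rough sketch. It assumes 1000 tests where nothing is really different, and it draws the p-values directly from a uniform distribution, which is how p-values behave when there is no real difference. The correction shown is the Benjamini-Hochberg procedure, one common FDR method; the text above does not say which method it has in mind, so treat that choice as an assumption.

```python
import random


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping a running
    # minimum so the adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


rng = random.Random(0)
# 1000 tests with no real difference: p-values are uniform on [0, 1).
p_values = [rng.random() for _ in range(1000)]

raw_hits = sum(p < 0.05 for p in p_values)
bh_hits = sum(p < 0.05 for p in benjamini_hochberg(p_values))
print(f"False Positives without correction: {raw_hits}")
print(f"False Positives after adjustment:   {bh_hits}")
```

Without correction we expect roughly 1000 x 0.05 = 50 False Positives. The adjusted p-values are never smaller than the raw ones, so the corrected count can only be the same or lower.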
How does p-hacking work in reality?
When the p-value is close to the 0.05 threshold, we are tempted to do things like add more data points and test again. Sometimes this produces the desired outcome, e.g. the result becomes statistically significant because p < 0.05, but it is actually a False Positive.
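The effect of adding data points after peeking at the p-value can be simulated. The sketch below is an illustration, not a description of any real study: both groups come from the same distribution, "close to the threshold" is arbitrarily taken to mean 0.05 <= p < 0.10, up to 10 extra points are added per group, and a z-test with known standard deviation 1 is assumed. All names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means (assumes known sigma)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_sims=5000, n=10, max_extra=10, alpha=0.05, near=0.10, seed=1):
    """Compare False Positive rates with and without adding data points."""
    rng = random.Random(seed)
    honest = hacked = 0
    for _ in range(n_sims):
        # No real difference between the groups.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        p = z_test_p_value(a, b)
        if p < alpha:
            honest += 1  # significant on the first, pre-planned test
        extra = 0
        # P-hacking: p is close to the threshold, so add a data point
        # to each group and test again, stopping once p < alpha.
        while alpha <= p < near and extra < max_extra:
            a.append(rng.gauss(0, 1))
            b.append(rng.gauss(0, 1))
            p = z_test_p_value(a, b)
            extra += 1
        if p < alpha:
            hacked += 1
    return honest / n_sims, hacked / n_sims


honest_rate, hacked_rate = simulate()
print(f"Fixed sample size:  {honest_rate:.3f}")
print(f"Adding data points: {hacked_rate:.3f}")
```

Every run that was significant on the first test stays significant, and some of the near misses get pushed below 0.05, so the False Positive rate with added data points ends up above the nominal 5%.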
How to avoid p-hacking?
To avoid it, decide the sample size before the experiment by doing a power analysis.
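As a rough idea of what a power analysis computes, the sketch below uses the standard normal-approximation formula for comparing two group means, n = 2 * ((z_(1 - alpha/2) + z_power) / d)^2 per group, where d is the expected difference in units of standard deviations. The inputs (d = 0.5, power = 0.8) are example assumptions, not values from the text, and a t-test based calculation would give a slightly larger n.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided, two-sample z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)


n = sample_size_per_group(effect_size=0.5)
print(f"Samples needed per group: {n}")  # prints 63
```

With the sample size fixed in advance like this, there is no room to keep adding data points until the p-value drops below 0.05.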