Personal Dev Statistics

10 cards — 🟢 3 easy | 🟡 4 medium | 🔴 3 hard

🟢 Easy (3)

1. What is the base rate fallacy and why does it matter in everyday reasoning?

The base rate fallacy is ignoring how common or rare something is in the overall population when evaluating a specific case. Example: a medical test that is 99% accurate (99% of sick people test positive and 99% of healthy people test negative) still produces mostly false positives if the disease affects only 1 in 10,000 people. The disease is so rare that the 1% of healthy people who wrongly test positive vastly outnumber the few sick people who correctly test positive.
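
The arithmetic behind the medical-test example can be checked with Bayes' theorem. This is a minimal sketch that assumes "99% accurate" means 99% sensitivity and 99% specificity (the card does not say which); the function name `positive_predictive_value` is illustrative, not from any library.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test), computed with Bayes' theorem."""
    true_pos = sensitivity * prevalence               # sick and correctly flagged
    false_pos = (1 - specificity) * (1 - prevalence)  # healthy but wrongly flagged
    return true_pos / (true_pos + false_pos)

# Assumed numbers: 1-in-10,000 prevalence, 99% sensitivity and specificity.
ppv = positive_predictive_value(prevalence=1 / 10_000, sensitivity=0.99, specificity=0.99)
print(f"P(disease | positive) = {ppv:.2%}")  # prints 0.98%
```

Under these assumptions fewer than 1 positive result in 100 is a true positive, even though the test itself is "99% accurate".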

2. What is the difference between correlation and causation, and what is the simplest test to separate them?

Correlation means two things move together; causation means one actually produces the other. Things can correlate for dumb reasons: a hidden third variable, coincidence, or reverse causation (the supposed effect is really driving the supposed cause). The simplest test is a three-question checklist: is there a plausible mechanism, does the timing make sense (cause before effect), and has a controlled experiment been done? The controlled experiment is the strongest of the three. Without those, correlation is just a pattern, not an explanation.

3. What is sampling bias and how does it distort conclusions?

Sampling bias occurs when the sample studied does not represent the population you want to draw conclusions about. Common forms: convenience sampling (studying whoever is easiest to reach), survivorship bias (studying only successes), and voluntary response bias (only motivated people respond). The math can be perfect, but if the sample is skewed, the conclusion is skewed.
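
Survivorship bias is easy to reproduce. In this sketch the "population" is a set of hypothetical fund returns with a true average of zero, and any fund that lost more than 5% is assumed to close and drop out of the data; all numbers are invented for illustration.

```python
import random
from statistics import mean

random.seed(1)
# Hypothetical population: annual returns of 10,000 funds, true average 0%.
returns = [random.gauss(0.0, 0.10) for _ in range(10_000)]

# Survivorship filter (assumption): funds that lost more than 5% closed
# and no longer appear in the database we get to study.
survivors = [r for r in returns if r > -0.05]

print(f"population mean = {mean(returns):+.3f}, survivor mean = {mean(survivors):+.3f}")
```

The averaging is done correctly in both cases; the survivor mean is still inflated to roughly +5% because the sample was filtered on the outcome.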

🟡 Medium (4)

1. What does a p-value actually mean, and what is the most common misconception about it?

A p-value is the probability of seeing data at least as extreme as what was observed, assuming the null hypothesis is true. It is NOT the probability that the hypothesis is correct. Common misconception: "p = 0.03 means there is a 3% chance the result is due to chance." Wrong. It means that if nothing real were happening, you would see data at least this extreme 3% of the time. A small p-value also does not tell you the effect is large or important.
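
To make "at least as extreme, assuming the null is true" concrete, here is a sketch for a hypothetical experiment: 60 heads in 100 coin flips, with the null hypothesis that the coin is fair. It computes the two-sided p-value exactly with `math.comb`, then recovers the same number by simulating a fair coin, which is literally what the definition describes.

```python
import math
import random

n, observed_heads = 100, 60  # hypothetical experiment
deviation = abs(observed_heads - n / 2)

# Exact two-sided p-value under H0 (fair coin): total probability of every
# outcome at least as far from 50 heads as the one observed.
def prob(k: int) -> float:
    return math.comb(n, k) * 0.5 ** n

p_exact = sum(prob(k) for k in range(n + 1) if abs(k - n / 2) >= deviation)

# Same quantity by brute force: pretend nothing real is happening (fair coin)
# and count how often the data come out at least this extreme.
random.seed(2)
trials = 20_000
extreme = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n))
    if abs(heads - n / 2) >= deviation:
        extreme += 1
p_simulated = extreme / trials

print(f"exact p = {p_exact:.4f}, simulated p = {p_simulated:.4f}")
```

Both come out near 0.057. That number describes how surprising the data would be for a fair coin; it is not the probability that the coin is fair.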

2. What is Simpson's paradox and why is it dangerous?

Simpson's paradox occurs when a trend that appears in several groups reverses or disappears when the groups are combined. It happens when a confounding variable is distributed unevenly across the groups, so the combined totals mix together cases that are not comparable. Example: a treatment can appear better in every subgroup but worse overall if it is disproportionately used in the harder cases. It is dangerous because aggregated data can tell the opposite story from disaggregated data, and both are mathematically correct.
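
The reversal can be checked with a few lines of arithmetic. The counts below are illustrative (they follow the often-quoted kidney-stone example): treatment A wins in both the easy and the hard group, but because A was mostly given to hard cases, B wins on the pooled totals.

```python
# Illustrative counts: (successes, patients) per treatment and case difficulty.
data = {
    "A": {"easy": (81, 87), "hard": (192, 263)},  # A is used mostly on hard cases
    "B": {"easy": (234, 270), "hard": (55, 80)},  # B is used mostly on easy cases
}

def rate(successes: int, total: int) -> float:
    return successes / total

# Within each subgroup, A has the higher success rate.
for group in ("easy", "hard"):
    a, b = rate(*data["A"][group]), rate(*data["B"][group])
    print(f"{group}: A {a:.0%} vs B {b:.0%}")

# Pooling the subgroups reverses the comparison.
overall = {
    t: rate(sum(s for s, _ in g.values()), sum(n for _, n in g.values()))
    for t, g in data.items()
}
print(f"overall: A {overall['A']:.0%} vs B {overall['B']:.0%}")
```

This prints 93% vs 87% for easy cases, 73% vs 69% for hard cases, and 78% vs 83% overall. Each line is correct; which one answers your question depends on whether case difficulty is a confounder you need to hold fixed.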

3. What is the difference between a confidence interval and a prediction interval?

A confidence interval estimates where the true population parameter (like a mean) likely falls. A prediction interval estimates where a single future observation might fall. At the same confidence level, a prediction interval is always wider because it includes both the uncertainty about the population parameter AND the natural variability of individual data points. Confusing the two leads to overconfident predictions about individual cases.
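
The gap between the two intervals shows up directly in their formulas. This sketch assumes roughly normal data and uses the normal quantile from `statistics.NormalDist` as a stand-in for the t quantile (a reasonable approximation at n = 200); the "response time" sample is invented.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(3)
# Hypothetical sample: 200 response times in ms.
sample = [random.gauss(120, 15) for _ in range(200)]
n, xbar, s = len(sample), mean(sample), stdev(sample)

# Normal approximation to the t quantile for a 95% interval.
z = NormalDist().inv_cdf(0.975)

ci_half = z * s / math.sqrt(n)          # uncertainty about the mean only
pi_half = z * s * math.sqrt(1 + 1 / n)  # plus the spread of one new observation

print(f"95% CI for the mean:         {xbar:.1f} +/- {ci_half:.1f}")
print(f"95% PI for one new response: {xbar:.1f} +/- {pi_half:.1f}")
```

The confidence interval shrinks like 1/sqrt(n) as data accumulates; the prediction interval never gets narrower than about plus or minus 1.96 standard deviations, because individual responses stay just as noisy. Here the prediction interval is sqrt(n + 1), about 14 times wider.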

4. Why is statistical significance not the same as practical importance, and what concept bridges the gap?

Statistical significance only means the result is unlikely under the null hypothesis — it says nothing about the size or importance of the effect. A massive sample can make a trivially small difference statistically significant. Effect size bridges the gap: it measures how large the difference actually is (e.g., Cohen's d, odds ratio). Always ask: significant AND large enough to matter?
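
A sketch of "significant but trivial", using hypothetical summary statistics for a very large A/B test: a 0.15-point difference on a scale with a standard deviation of 15. It runs a two-sample z-test and computes Cohen's d with the pooled standard deviation.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics for two very large groups.
n_a = n_b = 1_000_000
mean_a, mean_b = 100.00, 100.15
sd_a = sd_b = 15.0

# Two-sample z-test (the normal approximation is fine at this sample size).
se_diff = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
z = (mean_b - mean_a) / se_diff
p_value = 2 * NormalDist().cdf(-abs(z))

# Cohen's d: difference in means divided by the pooled standard deviation.
pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))
cohens_d = (mean_b - mean_a) / pooled_sd

print(f"z = {z:.1f}, p = {p_value:.1e}, Cohen's d = {cohens_d:.3f}")
```

The p-value is astronomically small (z is about 7), yet d = 0.01 is far below 0.2, the conventional threshold for even a "small" effect. The sample size, not the effect, produced the significance.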

🔴 Hard (3)

1. What is the core difference between Bayesian and frequentist approaches to probability?

Frequentist probability is about long-run frequencies of repeatable events — a coin's probability is defined by what happens over many flips. Bayesian probability represents degrees of belief updated by evidence — you start with a prior (what you believed before), encounter data, and compute a posterior (updated belief) using Bayes' theorem. Frequentists ask "how likely is this data given the hypothesis?" Bayesians ask "how likely is the hypothesis given this data?" In practice, Bayesian reasoning is closer to how real decisions work because you almost always have prior information.
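
The prior-to-posterior update can be sketched on a grid, with no special libraries. Assumptions: the data are 7 heads in 10 flips, and the prior (a Beta(10, 10) shape, unnormalized) encodes "coins are usually close to fair". The frequentist estimate is the observed frequency; the Bayesian answer multiplies prior by likelihood and normalizes.

```python
# Hypothetical data: 7 heads in 10 flips.
heads, flips = 7, 10

# Frequentist point estimate: the observed frequency (maximum likelihood).
mle = heads / flips

# Bayesian update over a grid of candidate coin biases p.
grid = [i / 1000 for i in range(1, 1000)]
# Assumed prior: Beta(10, 10) shape, i.e. "probably close to fair" (unnormalized).
prior = [p**9 * (1 - p) ** 9 for p in grid]
# Likelihood: P(data | p), the quantity a frequentist test is built on.
likelihood = [p**heads * (1 - p) ** (flips - heads) for p in grid]

# Posterior is proportional to prior times likelihood; normalize to sum to 1.
unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

posterior_mean = sum(p * w for p, w in zip(grid, posterior))
prob_biased_to_heads = sum(w for p, w in zip(grid, posterior) if p > 0.5)

print(f"MLE = {mle:.2f}, posterior mean = {posterior_mean:.3f}, "
      f"P(p > 0.5 | data) = {prob_biased_to_heads:.2f}")
```

The prior pulls the estimate from 0.70 toward fairness (the posterior mean is 17/30, about 0.567), and the posterior directly answers "how likely is the hypothesis given this data?", something the frequentist estimate does not attempt.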

2. What is regression to the mean, and why does it create false narratives about interventions?

Regression to the mean is the statistical tendency for extreme measurements to be followed by less extreme ones, simply because extreme values include a large random component. It creates false narratives because people intervene after extreme results (e.g., punishing after a terrible performance, applying a treatment after peak symptoms) and then credit the intervention when things naturally return toward average. The improvement would have happened anyway.
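
A simulation makes the "improvement would have happened anyway" point concrete. The model is an assumption: each observed score is a stable skill plus independent luck on each occasion. We pick the worst 10% after the first test, apply an intervention that does nothing, and measure again.

```python
import random
from statistics import mean

random.seed(4)
n = 10_000
# Assumed model: observed score = stable skill + fresh luck on each occasion.
skill = [random.gauss(100, 10) for _ in range(n)]
test1 = [s + random.gauss(0, 10) for s in skill]
test2 = [s + random.gauss(0, 10) for s in skill]  # nothing changes between tests

# "Intervene" on the worst 10% after test 1 (the intervention has no effect).
cutoff = sorted(test1)[n // 10]
worst = [i for i in range(n) if test1[i] < cutoff]

before = mean(test1[i] for i in worst)
after = mean(test2[i] for i in worst)
print(f"worst group: test 1 = {before:.1f}, test 2 = {after:.1f}")
```

The worst group climbs by roughly 12 points on the retest even though nothing was done to them: their first scores were low partly because of bad luck, and the luck does not repeat. A real intervention has to beat this baseline, which is why a control group matters.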

3. What is denominator blindness and how does it distort risk perception?

Denominator blindness is focusing on the numerator (the dramatic count) while ignoring the denominator (the total population at risk). "500 people died from X" sounds terrifying, but if 200 million were exposed, the risk is 0.00025%. It distorts risk perception in headlines, medical scares, and policy debates. The fix: always ask "out of how many?" and convert counts to rates before comparing risks.
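
The "convert counts to rates" fix is one division. The sketch below reuses the card's X figures and adds a second risk, Y, whose numbers are hypothetical, to show how ranking by raw counts can point the wrong way; `risk_rate` is an illustrative helper name.

```python
def risk_rate(deaths: int, exposed: int) -> float:
    """Convert a raw count into a per-person rate."""
    return deaths / exposed

risks = {
    "X": (500, 200_000_000),  # the card's example
    "Y": (50, 1_000_000),     # hypothetical comparison risk
}

for name, (deaths, exposed) in risks.items():
    rate = risk_rate(deaths, exposed)
    print(f"{name}: {deaths} deaths out of {exposed:,} -> {rate:.5%} "
          f"(about 1 in {exposed / deaths:,.0f})")
```

X works out to 0.00025% (about 1 in 400,000). Y has one-tenth as many deaths but a rate of 0.005% (about 1 in 20,000), so per person exposed it is 20 times riskier. A headline that leads with the count ranks them the other way around.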