What a Chi-Square (χ²) Test Measures
A chi-square test is one of the most widely used tools for analyzing categorical data—data that falls into labels or groups like “Yes/No,” “Red/Blue/Green,” “Device type,” “Outcome category,” or “Preference option.” Instead of comparing averages, a chi-square test compares counts. The core question is always the same: are the counts you observed close to what a hypothesis predicts, or is the gap too large to be explained by random variation alone?
Chi-square tests convert the mismatch between observed counts (O) and expected counts (E) into a single statistic, the χ² statistic:

χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ
The sum runs over categories (goodness of fit) or cells in a contingency table (independence). If observed counts are close to expected counts, χ² is small. If some categories are much higher or lower than expected, χ² becomes larger. Since χ² is always nonnegative, chi-square tests are typically right-tailed: large χ² values fall in the right tail of the χ² distribution and provide evidence against the null hypothesis.
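The formula above is a one-liner in code. A minimal sketch, using hypothetical counts for three categories:

```python
# Minimal sketch: compute the χ² statistic from observed vs expected counts.
# The counts below are hypothetical illustration values.
observed = [48, 35, 17]
expected = [50.0, 30.0, 20.0]

# Sum (O - E)^2 / E over all categories.
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2_stat, 4))  # → 1.3633
```

Each term measures one category's squared deviation, scaled by its expected count, so a shortfall in a small category can matter as much as a large absolute gap in a big one.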
Two Main Chi-Square Tests This Calculator Supports
This Chi-Square Calculator includes the two most common χ² test families:
| Test type | Data structure | Null hypothesis (H₀) | Typical question |
|---|---|---|---|
| Goodness of fit | One categorical variable with k categories | Observed follows expected distribution | “Do these categories match the expected proportions?” |
| Independence | Two categorical variables in an r×c table | Variables are independent (no association) | “Are these variables related?” |
Goodness of Fit: Observed vs Expected Distribution
A goodness-of-fit chi-square test checks whether observed counts match a specific distribution. For example, you might test whether survey responses match a predicted breakdown, whether product defects appear equally across categories, or whether a random generator appears uniform across several outcomes.
You can define expectations in several ways: (1) expected proportions that sum to 1, (2) expected counts directly, or (3) a uniform expectation where every category has the same expected share. This tool supports all three. If you enter proportions, the calculator converts them to expected counts via Eᵢ = n·pᵢ.
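The proportion-to-count conversion is straightforward; a sketch with hypothetical observed counts and proportions:

```python
# Sketch: convert expected proportions to expected counts, E_i = n * p_i.
# The observed counts and proportions are hypothetical.
observed = [52, 28, 20]
proportions = [0.5, 0.3, 0.2]   # must sum to 1

n = sum(observed)                          # total sample size = 100
expected = [n * p for p in proportions]    # [50.0, 30.0, 20.0]
```

With proportions summing to 1, the expected counts always sum back to n, which is a useful sanity check before running the test.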
Degrees of freedom matter because they determine the reference χ² distribution. For goodness of fit:

df = k − 1 − m
Here, k is the number of categories and m is the number of parameters estimated from the data. Many simple tests use m = 0, which gives df = k − 1. If you estimate parameters (for example, fitting a distribution from the same sample), degrees of freedom should be reduced accordingly.
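A hedged sketch of the full goodness-of-fit test using SciPy's `scipy.stats.chisquare`, with hypothetical counts; its `ddof` argument plays the role of m:

```python
# Sketch: goodness-of-fit test via SciPy (hypothetical counts).
# ddof corresponds to m, the number of parameters estimated from the data,
# so the reference distribution has df = k - 1 - ddof.
from scipy.stats import chisquare

observed = [52, 28, 20]
expected = [50.0, 30.0, 20.0]

stat, p_value = chisquare(observed, f_exp=expected, ddof=0)  # df = 3 - 1 - 0 = 2
```

Here the observed counts sit close to the expected ones, so the statistic is small and the p-value is large, i.e. no evidence against the hypothesized distribution.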
Independence: Contingency Tables and Expected Counts
A chi-square test of independence uses a contingency table to determine whether two categorical variables are related. Examples include checking if product preference depends on region, whether a website conversion rate depends on device type, or whether an outcome depends on treatment group (with categorical outcomes).
In an r×c table, the null hypothesis assumes independence. Under independence, the expected count in each cell is:

Eᵢⱼ = (Rᵢ × Cⱼ) / n

where Rᵢ is the total of row i, Cⱼ is the total of column j, and n is the overall sample size.
Then compute χ² by summing (O−E)²/E over all cells. Degrees of freedom are:

df = (r − 1)(c − 1)
This calculator also provides a Yates continuity correction option for 2×2 tables. It slightly reduces χ² to be more conservative when counts are small, though it is not always recommended in modern workflows. If your expected counts are small, consider whether a different method (like an exact test) is more appropriate.
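One way to run the independence test, including the Yates correction for a 2×2 table, is SciPy's `chi2_contingency`; the table below is hypothetical:

```python
# Sketch: χ² test of independence on a hypothetical 2x2 table
# (rows: device type, columns: converted yes/no).
from scipy.stats import chi2_contingency

table = [[30, 70],
         [45, 55]]

# correction=True applies the Yates continuity correction (2x2 tables only).
chi2_stat, p_value, dof, expected = chi2_contingency(table, correction=True)
# dof = (2 - 1) * (2 - 1) = 1; expected[0][0] = 100 * 75 / 200 = 37.5
```

Note that the function returns the expected counts it computed, which makes it easy to run the small-count checks described below before trusting the p-value.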
How to Interpret the p-Value and the Decision
The p-value in a chi-square test is the probability, under the null hypothesis, of observing a χ² statistic at least as large as the one computed:

p = P(χ² with df degrees of freedom ≥ observed χ²)
You choose a significance level α (commonly 0.05). If p ≤ α, you reject H₀ (evidence of mismatch or association). If p > α, you fail to reject H₀ (insufficient evidence to claim a mismatch or association). Importantly, “fail to reject” does not prove independence or a perfect fit—it simply means the sample does not provide strong evidence against H₀.
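The decision rule can be sketched with SciPy's χ² distribution; the statistic and df below are hypothetical example values:

```python
# Sketch: right-tailed p-value, critical value, and decision for a χ² test.
# chi2_stat and df below are hypothetical example values.
from scipy.stats import chi2

chi2_stat, df, alpha = 9.2, 3, 0.05

p_value = chi2.sf(chi2_stat, df)        # right-tail: P(χ²_df >= chi2_stat)
critical = chi2.ppf(1 - alpha, df)      # ≈ 7.815 for df = 3, α = 0.05
reject_h0 = p_value <= alpha
```

Comparing the statistic to the critical value and comparing p to α are equivalent decisions; reporting both simply gives readers two familiar reference points.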
Expected Count Checks and Practical Assumptions
Chi-square tests are built on an approximation that works best when expected counts are not too small. A common rule of thumb is that most expected counts should be at least 5 (some texts say 80% ≥ 5 and none < 1). This calculator can warn you when expected counts fall below a threshold so you can interpret results more cautiously.
If expected counts are small, you may consider:
- Combining categories (if it makes sense conceptually), so expected counts increase.
- Collecting more data to increase sample size and stabilize expected counts.
- Using an exact test for 2×2 tables or specialized alternatives for sparse data.
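For a sparse 2×2 table, Fisher's exact test avoids the large-sample approximation entirely. A sketch with hypothetical counts where several expected counts would fall below 5:

```python
# Sketch: Fisher's exact test for a hypothetical sparse 2x2 table,
# where several expected counts would fall below 5.
from scipy.stats import fisher_exact

table = [[2, 8],
         [7, 3]]

odds_ratio, p_value = fisher_exact(table)  # two-sided by default
```

The exact test computes the p-value directly from the hypergeometric distribution, so it remains valid even when the χ² approximation would be unreliable.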
Effect Size: How Strong Is the Difference or Association?
Statistical significance answers whether the pattern is unlikely under H₀; it does not tell you how large or important the pattern is. That’s why effect sizes are helpful:
- Goodness of fit: Cohen’s w = √(χ² / n). Larger w indicates a bigger overall deviation from the expected distribution.
- Independence: Cramér’s V = √(χ² / (n × (min(r−1, c−1)))). For 2×2, Phi (φ) is a special case of Cramér’s V.
These effect sizes make it easier to compare results across studies with different sample sizes. Two datasets can both be “significant,” but one may have a much stronger association than the other.
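Both effect sizes are one-liners; the χ² values and table shape below are hypothetical:

```python
import math

# Cohen's w for goodness of fit: w = sqrt(chi2 / n)
chi2_gof, n_gof = 8.4, 150          # hypothetical statistic and sample size
w = math.sqrt(chi2_gof / n_gof)

# Cramér's V for an r x c independence test:
# V = sqrt(chi2 / (n * min(r - 1, c - 1)))
chi2_ind, n_ind, r, c = 12.6, 300, 3, 4   # hypothetical values
v = math.sqrt(chi2_ind / (n_ind * min(r - 1, c - 1)))
```

Because both divide χ² by the sample size, doubling n with the same pattern of counts leaves the effect size unchanged while the p-value shrinks, which is exactly why effect sizes travel across studies better than p-values do.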
Residuals and Contributions: Finding “Where” the χ² Comes From
The overall χ² statistic is a sum of cell/category contributions. This calculator reports per-row (goodness of fit) or per-cell (independence) values:
- Contribution: (O−E)²/E tells you how much a category/cell adds to χ².
- Residual: (O−E)/√E gives a signed, scaled difference to show direction and magnitude.
Residuals help you interpret results beyond a single p-value. For example, a significant independence test tells you an association exists, and residuals help identify which combinations (cells) are unusually high or low relative to independence.
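A sketch of per-category contributions and residuals, using hypothetical counts in a goodness-of-fit layout:

```python
import math

# Hypothetical observed/expected counts for three categories.
observed = [48, 35, 17]
expected = [50.0, 30.0, 20.0]

# Contribution of each category to χ²: (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# Signed, scaled residual: (O - E) / sqrt(E) shows direction of deviation.
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

# The contributions sum back to the overall χ² statistic.
chi2_stat = sum(contributions)
```

Here the second category sits above its expected count (positive residual) while the others sit below, so even before looking at the p-value you can see where the mismatch concentrates.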
Step-by-Step: Using This Chi-Square Calculator
Quick workflow
- Choose α and whether you want to display detailed tables.
- Select Goodness of Fit (one variable) or Independence (two variables).
- Enter observed counts (and expected proportions/counts if needed).
- Click Calculate to get χ², df, p-value, critical χ², and effect size.
- Use the details table to see expected counts, contributions, and residuals.
Chi-Square Calculator FAQs
Common questions about χ² goodness-of-fit and independence tests, assumptions, degrees of freedom, and interpretation.
What does a chi-square test measure?
A chi-square test compares observed counts to expected counts. It measures how far the observed data deviates from what a hypothesis (or independence assumption) predicts, using the χ² distribution.

What is the difference between goodness of fit and independence?
Goodness of fit tests whether one categorical variable follows a specific distribution (expected proportions). Independence tests whether two categorical variables are related (using a contingency table).

What assumptions does a chi-square test make?
Key assumptions include independent observations and sufficiently large expected counts (commonly most expected counts ≥ 5). If expected counts are small, consider combining categories or using exact methods.

Why is the chi-square test right-tailed?
The χ² statistic is always nonnegative, and larger values indicate greater mismatch from expectations, so the rejection region is in the right tail of the χ² distribution.

How are degrees of freedom calculated?
Goodness of fit: df = k − 1 − m (k categories, m parameters estimated from data). Independence: df = (r − 1)(c − 1) for an r×c table.

What does the p-value mean?
The p-value is the probability of observing a χ² statistic at least as large as yours (under the null hypothesis). Small p-values suggest evidence against the null.

Which effect size should I report?
For goodness of fit, Cohen’s w is common. For independence tests, Cramér’s V (or Phi for 2×2 tables) summarizes association strength on a 0–1 scale.

What do residuals tell me?
Residuals compare each cell’s observed count to its expected count. Large residuals highlight which categories or cells contribute most to the overall χ² statistic.