
Linear Regression Calculator

Enter your (x, y) data to compute the best-fit line, slope and intercept, correlation (r), R², RMSE, residuals, and predictions with confidence intervals.


Best-Fit Line, R², Error Metrics & Predictions

Add data points manually or paste them, then calculate the regression line and diagnostics instantly.

Tip: Enter at least 2 points to fit a line. For confidence intervals and error estimates, you’ll typically want 3+ points so degrees of freedom (n − 2) is positive.
Accepted formats: comma-separated (x,y), space-separated (x y), or tab-separated pairs. Empty lines are ignored.
Predictions are most reliable inside the range of your observed X values. Extrapolation (predicting far beyond your data) can be misleading even when R² is high.
Residuals are y − ŷ. Use them to spot outliers, patterns (non-linearity), and changing variance.

What Linear Regression Is and What It Tells You

Linear regression is one of the most widely used tools for understanding and predicting relationships between two variables. In its simplest form, you provide paired observations (x, y), and the model finds the straight line that best explains how y tends to change as x changes. The result is a line called the least-squares regression line:

ŷ = b0 + b1x

Here, b1 is the slope (how much y changes per 1 unit of x), and b0 is the intercept (the predicted y when x = 0). Because the method minimizes squared errors, it has a simple closed-form solution and is fast to compute, even for large datasets. But it still requires careful interpretation: linear regression can summarize trends well, yet it can also be misleading when the true relationship is non-linear, when outliers dominate, or when you extrapolate beyond your observed x values.
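Turning the fitted line into a prediction is a one-line computation. A minimal sketch in plain Python, using illustrative values for b0 and b1 (not output from any particular dataset):

```python
def predict(b0, b1, x):
    """Predicted y-hat = b0 + b1*x from the fitted line."""
    return b0 + b1 * x

# Illustrative values: intercept 1.0, slope 2.0
print(predict(1.0, 2.0, 3.0))  # → 7.0
```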

How the Best-Fit Line Is Calculated

This linear regression calculator uses the classic ordinary least squares (OLS) approach. OLS chooses b0 and b1 so that the sum of squared residuals is as small as possible. A residual is the vertical difference between a real point and the regression line:

Residual = y − ŷ

The slope and intercept can be computed using summary quantities from your dataset. If you let n be the number of points, x̄ the mean of x, and ȳ the mean of y, then the most common (numerically stable) formulation is based on centered sums:

Sxx = Σ(x − x̄)²
Sxy = Σ(x − x̄)(y − ȳ)

b1 = Sxy / Sxx
b0 = ȳ − b1x̄

If Sxx = 0, then all x values are identical and there is no unique slope (you cannot fit a meaningful line in x). The calculator will warn you in that case. Otherwise, once b0 and b1 are known, every predicted value is computed from ŷ = b0 + b1x.
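The centered-sums formulation above can be sketched in a few lines of plain Python (no external libraries). Variable names follow the article's notation; the example data is illustrative:

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centered sums: Sxx = sum((x - x_bar)^2), Sxy = sum((x - x_bar)(y - y_bar))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("All x values are identical: no unique slope")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx          # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4], [2.1, 4.1, 5.9, 8.1])
print(b0, b1)  # intercept near 0.1, slope near 1.98
```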

Interpreting Slope and Intercept

The slope is usually the most important parameter. It translates your line into a statement about change: “for each 1 unit increase in x, y changes by about b1 units on average.” If b1 is positive, y tends to increase with x. If b1 is negative, y tends to decrease as x increases. The intercept is the value the line predicts at x = 0. That is meaningful when x = 0 is realistic in your context (for example, if x is time starting at 0), but it can be purely mathematical if x = 0 is outside your data range.

Correlation r and R²: Strength vs Explained Variance

In simple linear regression, the correlation coefficient r describes the direction and strength of the linear relationship. It ranges from −1 to +1. Values near +1 mean a strong increasing linear relationship; values near −1 mean a strong decreasing relationship; values near 0 suggest a weak or no linear relationship.

R², the coefficient of determination, describes how much of the variance in y is explained by the model. In simple regression, R² equals r². An R² of 0.75 means that about 75% of the variability in y is accounted for by the line — but that does not automatically mean the model is “good.” You also need to look at error size (RMSE), residual patterns, and whether the relationship is plausible for your domain.
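Both r and R² can be computed from the same centered sums used for the slope. A short sketch in plain Python (names follow the article's notation; the example data is illustrative):

```python
import math

def correlation_and_r2(xs, ys):
    """Pearson r and R² = r² for simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # r is undefined if either variable has zero variance (sxx or syy == 0)
    r = sxy / math.sqrt(sxx * syy)
    return r, r * r

r, r2 = correlation_and_r2([1, 2, 3, 4], [2.1, 4.1, 5.9, 8.1])
```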

Metric | Meaning | Range | What to watch for
Slope (b1) | Average change in y per 1 unit of x | Any real number | Outliers can skew the slope heavily
Intercept (b0) | Predicted y when x = 0 | Any real number | May be meaningless if x = 0 isn’t relevant
Correlation (r) | Direction/strength of linear association | −1 to +1 | High r doesn’t prove causation
R² | Variance in y explained by the line | 0 to 1 | Can look high even with bad extrapolation
RMSE | Typical error size in y units | 0 to ∞ | Compare to the scale of y and domain tolerance

Residuals: The Most Important Diagnostic

If you only look at the regression equation and R², you can miss major issues. Residuals reveal whether the straight-line assumption is reasonable and whether error variance is stable across x. Ideally, residuals should look like random noise: no strong patterns, no funnel shapes, and no clusters.

Use the Residuals tab in this tool to review each point’s predicted value and residual. If you see residuals that grow in magnitude as x increases, that can indicate heteroscedasticity (non-constant variance). If residuals curve or switch signs systematically, a non-linear model might fit better.
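Residuals and RMSE are easy to compute once the line is fitted. A sketch in plain Python, with one caveat: this version divides by n, while some tools report the residual standard error, which divides by n − 2 instead, so check which convention a given tool uses. The fitted coefficients below are illustrative:

```python
import math

def residuals_and_rmse(xs, ys, b0, b1):
    """Residuals y - y_hat for each point, plus RMSE in y's units.
    Note: divides by n; the residual standard error uses n - 2."""
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    rmse = math.sqrt(sum(e * e for e in resid) / len(resid))
    return resid, rmse

# Illustrative fit: b0 = 0.1, b1 = 1.98 for this small dataset
resid, rmse = residuals_and_rmse([1, 2, 3, 4], [2.1, 4.1, 5.9, 8.1],
                                 0.1, 1.98)
```

Scanning the residual list for signs that grow with x, or that alternate systematically, is the quickest version of the diagnostics described above.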

Confidence Intervals and What They Mean

This calculator can show confidence intervals for the slope and intercept, and (optionally) a confidence interval for the mean prediction at a chosen x value. Confidence intervals quantify uncertainty in the estimated line based on your sample. They are most meaningful when the regression assumptions are reasonably satisfied.

The tool uses the usual OLS standard errors, based on mean squared error:

SSE = Σ(y − ŷ)²
MSE = SSE / (n − 2)

SE(b1) = √(MSE / Sxx)
SE(b0) = √(MSE · (1/n + x̄²/Sxx))

With degrees of freedom df = n − 2, a (for example) 95% confidence interval is:

b1 ± t* · SE(b1)     and     b0 ± t* · SE(b0)

Here, t* is a critical value from the Student’s t distribution. For very small datasets, confidence intervals can be wide — which is a useful signal that the line is uncertain.
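The standard-error formulas above translate directly into code. In this sketch, t_star is supplied by the caller (from a t-table, or e.g. scipy.stats.t.ppf(0.975, n − 2) if SciPy is available) so the example stays dependency-free; the data and fitted coefficients are illustrative:

```python
import math

def slope_intercept_cis(xs, ys, b0, b1, t_star):
    """Confidence intervals b1 ± t*·SE(b1) and b0 ± t*·SE(b0).
    Requires n > 2 so that df = n - 2 is positive."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse / sxx)
    se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))
    return ((b1 - t_star * se_b1, b1 + t_star * se_b1),
            (b0 - t_star * se_b0, b0 + t_star * se_b0))

# t* ≈ 4.303 is the two-sided 95% critical value for df = 2
ci_b1, ci_b0 = slope_intercept_cis([1, 2, 3, 4], [2.1, 4.1, 5.9, 8.1],
                                   0.1, 1.98, 4.303)
```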

How to Use the Linear Regression Calculator

Quick workflow

  1. Add your data points in the Data tab (or paste them in the Paste tab).
  2. Click Calculate to compute the best-fit line and statistics.
  3. Use Predict to estimate y for new x values (and optionally show a confidence interval).
  4. Review the Residuals table to spot outliers and non-linear patterns.

Common Mistakes to Avoid

  • Extrapolating too far: A line can fit your observed range well but fail outside it.
  • Ignoring outliers: A single extreme point can dramatically change slope and R².
  • Confusing correlation with causation: Regression describes association, not mechanism.
  • Forgetting units: RMSE is in y units; slope units are (y units)/(x units).
  • Assuming linearity: Residual patterns often tell you when a curved model is better.

When Simple Linear Regression Is the Right Tool

Use simple linear regression when you have one primary predictor x and the relationship with y is approximately linear within your range of interest. If you have multiple predictors, interactions, seasonality, or strong curvature, you may need multiple regression or non-linear models. Even then, simple regression remains a valuable first step because it provides an interpretable baseline and helps you understand data scale, trends, and error levels quickly.


Linear Regression Calculator – FAQs

Answers about slope, intercept, r, R², residuals, forecasting, and regression assumptions.

What does this calculator do?
It finds the best-fit straight line for your (x, y) data using least squares and reports key statistics like slope, intercept, correlation (r), R², error metrics, and predictions.

What equation does the calculator fit?
For simple linear regression, the model is ŷ = b0 + b1x, where b1 is the slope and b0 is the intercept. The calculator estimates b0 and b1 from your data.

How do I interpret the slope?
The slope (b1) is the average change in y for a 1-unit increase in x. A positive slope means y tends to increase as x increases; a negative slope means it tends to decrease.

What does R² mean?
R² (coefficient of determination) is the fraction of variance in y explained by the line. For example, R² = 0.80 means the line explains about 80% of the variability in y.

What is the difference between r and R²?
r is the correlation coefficient that measures the direction and strength of the linear relationship (from −1 to +1). R² is r² in simple regression and measures explained variance (from 0 to 1).

How do I predict y for a new x?
Once the line is estimated, you plug a new x value into ŷ = b0 + b1x to get a predicted y. The calculator can also show a confidence interval for the mean prediction.

Can I use the line to forecast beyond my data?
You can, but be careful with extrapolation. Predictions are more reliable within the range of your observed x values and when the relationship is close to linear.

What are residuals and RMSE?
Residuals (y − ŷ) show how far each point is from the line. RMSE summarizes typical prediction error size in the same units as y.

What assumptions does linear regression rely on?
Common assumptions include linearity, independent observations, constant variance (homoscedasticity), and approximately normal residuals for inference like confidence intervals.

This tool is for educational and analytical use. If your data involves important decisions, validate model assumptions and consider domain-specific methods.