Simple Regression
Simple regression, also known as simple linear regression, is a statistical method used to model the relationship between two variables: one dependent and one independent. The relationship is assumed to be linear, meaning that a straight line can adequately describe the association between them. The model is written as:
Y = β0 + β1X + ε
Where:
- Y is the dependent variable.
- X is the independent variable.
- β0 is the intercept (the value of Y when X is zero).
- β1 is the slope (the change in Y corresponding to a one-unit change in X).
- ε is the error term, representing the difference between the observed and predicted values of Y.
The goal of simple regression is to estimate the values of β0 and β1 that minimize the sum of squared differences between the observed and predicted values of Y. Once the model is fitted, we can use it to make predictions about the dependent variable based on new values of the independent variable.
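Under this least-squares criterion, the estimates have closed forms: β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and β0 = ȳ − β1x̄. As a quick sketch, these can be computed directly in R (using the same sample data as the lm() example in this article):

```r
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)

# Closed-form least-squares estimates
beta1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0 <- mean(y) - beta1 * mean(x)

beta1  # slope: 0.8
beta0  # intercept: 1.8
```

These hand-computed values match the coefficients that lm() reports below, which is a useful sanity check on what the function is doing internally.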
Start with simple linear regression to understand the relationship between one independent variable and one dependent variable.
- Use the lm() function in R to fit a simple linear regression model.
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
# Fit simple linear regression model
model <- lm(y ~ x)
# Summary of the model
summary(model)
Output:
Call:
lm(formula = y ~ x)
Residuals:
1 2 3 4 5
-0.6 0.6 0.8 -1.0 0.2
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.8000 0.9381 1.919 0.1508
x 0.8000 0.2828 2.828 0.0663 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.8944 on 3 degrees of freedom
Multiple R-squared: 0.7273, Adjusted R-squared: 0.6364
F-statistic: 8 on 1 and 3 DF, p-value: 0.06628
Interpreting the output:
- The “Call” section shows the formula used for the regression.
- The “Residuals” section displays the differences between the observed and predicted values.
- The “Coefficients” section provides estimates for the intercept and slope of the regression line, along with their standard errors and statistical significance.
- The “Residual standard error” indicates the typical distance of data points from the regression line.
- The “Multiple R-squared” and “Adjusted R-squared” values measure the goodness of fit of the regression model.
- The “F-statistic” tests the overall significance of the model, with its associated p-value.
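Once the model is fitted, it can be used for prediction with predict(). A minimal sketch, reusing the data and model from the example above; note that newdata must be a data frame whose column name matches the predictor in the formula:

```r
# Refit the model from the example above
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
model <- lm(y ~ x)

# Predict y for new values of x
new_data <- data.frame(x = c(6, 7))
predict(model, newdata = new_data)
# Predicted values: 1.8 + 0.8 * x, i.e. 6.6 and 7.4
```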
How to proceed from Simple to Multiple and Polynomial Regression in R
Regression analysis allows us to understand how one or more independent variables relate to a dependent variable. Simple linear regression explores the relationship between two variables; multiple linear regression extends this to include several predictors simultaneously. Finally, polynomial regression introduces flexibility by accommodating non-linear relationships, all within the R programming language.
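As a brief preview, both extensions use the same lm() interface. The sketch below uses illustrative variable names and arbitrarily chosen data (x1, x2, and the coefficients in the simulated y are assumptions for the example, not from the article):

```r
# Illustrative data, chosen arbitrarily for this sketch
set.seed(1)
x1 <- 1:10
x2 <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)
y  <- 3 + 2 * x1 - 0.5 * x2 + rnorm(10, sd = 0.5)

# Multiple linear regression: several predictors at once
multi_model <- lm(y ~ x1 + x2)

# Polynomial regression: a degree-2 fit in a single predictor
poly_model <- lm(y ~ poly(x1, 2, raw = TRUE))

summary(multi_model)
summary(poly_model)
```

The only change from simple regression is the right-hand side of the formula: additional terms are joined with `+`, and poly() generates the polynomial terms.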