
Regression: An Overview

In general, regression refers to the phenomenon of "falling back" toward something: a tendency of data to drift back toward a central value. In machine learning, regression is used to model relationships between variables, predicting the value of one variable (the target) from one or more others (the features).

Linear Regression

Linear regression is a type of regression in which the relationship between the features and the target is modeled with a straight line (or, with multiple features, a hyperplane). The line is determined by fitting it through the data points so that it matches them as closely as possible.

Terminology

Cost Function

Once we provide the training data to the model, we want it to learn the underlying patterns. For this, we define a cost function, which measures how close the predicted values are to the actual data points.

How We Train the Model Using the Cost Function

A line in linear regression can be represented as:

\hat{y} = \theta_1 \cdot x + \theta_0

Where:

- \hat{y} is the predicted value
- x is the input feature
- \theta_1 (also written w) is the weight, i.e. the slope of the line
- \theta_0 (also written b) is the bias, i.e. the intercept

We need to tweak the weight w (or \theta_1) and the bias b (or \theta_0) to minimize the cost function.
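As a quick illustration, here is a minimal Python sketch of this prediction step; the function name predict and the parameter values are made up for the example:

```python
def predict(x, theta1, theta0):
    """Predicted value for a single feature: y_hat = theta1 * x + theta0."""
    return theta1 * x + theta0

# Example: with theta1 = 2.0 and theta0 = 1.0, an input of x = 3.0 gives y_hat = 7.0
y_hat = predict(3.0, theta1=2.0, theta0=1.0)
```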

Gradient Descent

To adjust w and b, we use algorithms such as gradient descent, one of the most popular optimization algorithms.

The goal is to find the minimum of the cost function J(w). For this, we calculate the gradients, which indicate the direction and magnitude of the adjustment.

Gradient Descent Process

At each iteration, we compute the gradients of the cost function with respect to the parameters and move w and b a small step in the opposite direction of the gradient, repeating until the cost stops decreasing.

Learning Rate

The size of each step is controlled by the learning rate \alpha: if it is too small, training is slow; if it is too large, the updates can overshoot the minimum and fail to converge.

Cost Function: Ordinary Least Squares (OLS)

The cost function J(w) for Ordinary Least Squares (OLS) regression is defined as:

J(w) = \frac{1}{2m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left( (w x_i + b) - y_i \right)^2

Where:

- m is the number of training examples
- y_i is the actual value for example i
- \hat{y}_i = w x_i + b is the predicted value for example i
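As a rough illustration, the OLS cost can be computed with NumPy as follows (a sketch; the helper name compute_cost is assumed for this example):

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Mean squared error cost J(w) = 1/(2m) * sum((w*x + b - y)^2)."""
    m = len(x)
    y_hat = w * x + b                  # predictions for all m examples
    return np.sum((y_hat - y) ** 2) / (2 * m)
```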

Gradient Calculations

To minimize the cost function, we compute the gradients with respect to w and b as follows:

\frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \left( (w x_i + b) - y_i \right) x_i

\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( (w x_i + b) - y_i \right)
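A rough NumPy sketch of these two gradients (the helper name compute_gradients is assumed for this example):

```python
import numpy as np

def compute_gradients(x, y, w, b):
    """Partial derivatives of J with respect to w and b."""
    m = len(x)
    error = (w * x + b) - y            # (w*x + b) - y for each example
    dJ_dw = np.sum(error * x) / m      # (1/m) * sum(error * x)
    dJ_db = np.sum(error) / m          # (1/m) * sum(error)
    return dJ_dw, dJ_db
```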

Why Do We Subtract the Gradient?

We subtract the gradient because we want to move in the direction of the negative slope (i.e., towards the minimum). If the slope is negative, subtracting the gradient increases w; if the slope is positive, it decreases w.

This ensures that the model parameters are adjusted in the correct direction to minimize the cost function.
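Putting the pieces together, a minimal gradient descent loop might look like the sketch below, assuming the compute_cost and compute_gradients helpers from the earlier snippets; the toy data, learning rate, and iteration count are arbitrary choices for illustration:

```python
import numpy as np

# Toy data roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

w, b = 0.0, 0.0
alpha = 0.01                             # learning rate
for _ in range(10_000):
    dJ_dw, dJ_db = compute_gradients(x, y, w, b)
    w -= alpha * dJ_dw                   # subtract the gradient: move downhill
    b -= alpha * dJ_db

print(w, b, compute_cost(x, y, w, b))    # w and b should approach roughly 2 and 1
```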

Multivariate Linear Regression

In simple linear regression, we have only one feature and one label, which gives a straight line. In multivariate linear regression, however, there are multiple features, and the model fits a hyperplane instead of a line.

Equation for Multivariate Linear Regression

The regression equation in multivariate linear regression is:

\hat{y} = \mathbf{w}^T \mathbf{x} + b

Where:

- \hat{y} is the predicted value
- \mathbf{w} is the vector of weights, one per feature
- \mathbf{x} is the feature vector
- b is the bias
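In code, this prediction becomes a dot product. A minimal NumPy sketch (the names X, w, and b are illustrative; X is assumed to be an m-by-n feature matrix):

```python
import numpy as np

def predict(X, w, b):
    """Vectorized predictions: y_hat = X @ w + b for an (m, n) feature matrix."""
    return X @ w + b
```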

Gradient Descent for Multiple Features

In multivariate regression, gradient descent works just as in the univariate case, except that we now update one weight per feature in addition to the bias.

For every iteration, we subtract the gradient scaled by the learning rate from the weights and bias:

\mathbf{w} = \mathbf{w} - \alpha \frac{\partial J}{\partial \mathbf{w}}

b = b - \alpha \frac{\partial J}{\partial b}

Where \alpha is the learning rate.
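Below is a rough vectorized sketch of this update loop in NumPy, under the same naming assumptions as the earlier snippets; the learning rate and iteration count are arbitrary defaults:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iterations=10_000):
    """Fit w (length-n) and b by batch gradient descent on an (m, n) matrix X."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(iterations):
        error = X @ w + b - y          # shape (m,): prediction minus target
        dJ_dw = X.T @ error / m        # shape (n,): gradient for each weight
        dJ_db = error.sum() / m        # scalar: gradient for the bias
        w -= alpha * dJ_dw
        b -= alpha * dJ_db
    return w, b

# Usage sketch: w_fit, b_fit = gradient_descent(X_train, y_train)
```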

Assumptions of Linear Regression

While linear regression is a powerful tool, it is built on several key assumptions. Violating these assumptions can lead to inaccurate or misleading results. Below are the assumptions of linear regression, common problems that arise when they are violated, and the necessary tests to identify and address these issues.

1. Linearity: the relationship between the features and the target is linear.

2. Independence of Errors: the residuals are independent of one another.

3. Homoscedasticity: the residuals have constant variance across all levels of the features.

4. Normality of Errors: the residuals are (approximately) normally distributed.

5. No Multicollinearity: the features are not highly correlated with each other.

6. No Autocorrelation: the residuals are not correlated with themselves across time or ordering.

When Problems Arise

Violations of these assumptions can result in issues like biased coefficients, inefficient estimates, incorrect standard errors, or even failure of the model to generalize.

1. Violation of Linearity: the model systematically mispredicts, and coefficient estimates are biased.

2. Violation of Independence of Errors: standard errors are misestimated, making hypothesis tests unreliable.

3. Violation of Homoscedasticity: coefficient estimates remain unbiased but are inefficient, and standard errors are wrong.

4. Violation of Normality of Errors: confidence intervals and p-values become unreliable, especially in small samples.

5. Multicollinearity: coefficient estimates become unstable, with inflated variances that make individual effects hard to interpret.

6. Autocorrelation: standard errors are typically underestimated, making predictors look more significant than they are.

Tests to Identify Problems

1. Durbin-Watson Test for Autocorrelation: values near 2 suggest no autocorrelation; values toward 0 or 4 suggest positive or negative autocorrelation.

2. Breusch-Pagan Test for Heteroscedasticity: a small p-value indicates that the residual variance depends on the features.

3. Variance Inflation Factor (VIF) for Multicollinearity: VIF values above roughly 5-10 signal problematic multicollinearity.

4. Shapiro-Wilk Test for Normality: a small p-value suggests that the residuals are not normally distributed.
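As a sketch of how these four tests might be run in Python with statsmodels and SciPy, assuming an OLS model has already been fitted (the toy data here is made up purely for illustration):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor
from scipy.stats import shapiro

# Toy data: two features, linear target with noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

X_const = sm.add_constant(X)              # add intercept column
model = sm.OLS(y, X_const).fit()

# 1. Durbin-Watson: values near 2 suggest no autocorrelation
print("Durbin-Watson:", durbin_watson(model.resid))

# 2. Breusch-Pagan: small p-value suggests heteroscedasticity
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, X_const)
print("Breusch-Pagan p-value:", lm_pvalue)

# 3. VIF: values above ~5-10 indicate multicollinearity
for i in range(1, X_const.shape[1]):      # skip the constant column
    print(f"VIF feature {i}:", variance_inflation_factor(X_const, i))

# 4. Shapiro-Wilk: small p-value suggests non-normal residuals
stat, p = shapiro(model.resid)
print("Shapiro-Wilk p-value:", p)
```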

How to Solve These Issues

1. Non-linearity: try feature transformations or use a more flexible model.
2. Heteroscedasticity: use Weighted Least Squares (WLS) or transform the dependent variable (see the sketch after this list).
3. Autocorrelation: apply time-series models such as ARIMA or introduce lagged variables.
4. Multicollinearity: remove or combine correlated variables, or use PCA.
5. Non-normality of errors: apply transformations or use robust models that do not assume normality.
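As one illustrative sketch of two of these remedies, a log transform of the target and a Weighted Least Squares fit with statsmodels might look like the following; the toy data and the weighting heuristic are assumptions for the example, not something prescribed above:

```python
import numpy as np
import statsmodels.api as sm

# Toy data with a strictly positive, skewed target
rng = np.random.default_rng(1)
X = rng.uniform(1, 10, size=(100, 2))
y = np.exp(0.3 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.2, size=100))

X_const = sm.add_constant(X)

# Non-linearity / non-normality: model log(y) instead of y
ols_log = sm.OLS(np.log(y), X_const).fit()

# Heteroscedasticity: Weighted Least Squares, down-weighting high-variance points
resid = sm.OLS(y, X_const).fit().resid
weights = 1.0 / np.maximum(resid ** 2, 1e-8)   # simple heuristic weights
wls = sm.WLS(y, X_const, weights=weights).fit()

print(ols_log.params, wls.params)
```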


Conclusion

Identifying and solving issues in linear regression involves understanding the assumptions and running the appropriate tests. By addressing violations of these assumptions, you can ensure that your model produces reliable and valid predictions.