Lasso regression is a method we can use to fit a regression model when multicollinearity is present in the data.

In a nutshell, least squares regression tries to find coefficient estimates that minimize the sum of squared residuals (RSS):

**RSS = Σ(y_{i} – ŷ_{i})^{2}**

where:

- **Σ**: A Greek symbol that means *sum*
- **y_{i}**: The actual response value for the i^{th} observation
- **ŷ_{i}**: The predicted response value based on the multiple linear regression model

Conversely, lasso regression seeks to minimize the following:

**RSS + λΣ|β _{j}|**

where *j* ranges from 1 to *p* predictor variables and λ ≥ 0.

This second term in the equation is known as a *shrinkage penalty*. In lasso regression, we select a value for λ that produces the lowest possible test MSE (mean squared error).
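To make the objective concrete, here is a minimal base-R sketch of the quantity lasso minimizes (the function name `lasso_objective` is ours for illustration, not part of any package):

```r
# Sketch of the lasso objective: RSS + lambda * sum(|beta_j|).
# The intercept is not penalized, so only the slope coefficients
# enter the shrinkage penalty.
lasso_objective <- function(y, X, intercept, beta, lambda) {
  y_hat <- intercept + as.vector(X %*% beta)  # predicted responses
  rss   <- sum((y - y_hat)^2)                 # residual sum of squares
  rss + lambda * sum(abs(beta))               # add the shrinkage penalty
}
```

With `lambda = 0` this reduces to the ordinary least squares criterion; larger values of `lambda` make large coefficients more costly.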

This tutorial provides a step-by-step example of how to perform lasso regression in R.

**Step 1: Load the Data**

For this example, we’ll use the R built-in dataset called **mtcars**. We’ll use **hp** as the response variable and the following variables as the predictors:

- mpg
- wt
- drat
- qsec

To perform lasso regression, we’ll use functions from the **glmnet** package. This package requires the response variable to be a vector and the set of predictor variables to be of the class **data.matrix**.

The following code shows how to define our data:

```r
#define response variable
y <- mtcars$hp

#define matrix of predictor variables
x <- data.matrix(mtcars[, c('mpg', 'wt', 'drat', 'qsec')])
```

**Step 2: Fit the Lasso Regression Model**

Next, we’ll use the **glmnet()** function to fit the lasso regression model and specify **alpha=1**.

Note that setting alpha equal to 0 is equivalent to using ridge regression, and setting alpha to some value between 0 and 1 is equivalent to using an elastic net.

To determine what value to use for lambda, we’ll perform k-fold cross-validation and identify the lambda value that produces the lowest test mean squared error (MSE).

Note that the function **cv.glmnet()** automatically performs k-fold cross validation using k = 10 folds.
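As an illustration of what **cv.glmnet()** automates, the following base-R sketch performs 10-fold cross-validation by hand for an ordinary least squares fit of the same model (lasso would repeat this over a grid of lambda values). The fold assignment is random, so we set a seed; for the same reason, adding `set.seed()` before `cv.glmnet()` makes its results reproducible too.

```r
# Manual 10-fold cross-validation with ordinary least squares,
# to show the idea that cv.glmnet() automates over a lambda grid.
set.seed(1)  # fold assignment is random, so fix a seed for reproducibility
k <- 10
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

cv_mse <- sapply(1:k, function(i) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit   <- lm(hp ~ mpg + wt + drat + qsec, data = train)
  mean((test$hp - predict(fit, test))^2)  # held-out MSE for fold i
})

mean(cv_mse)  # cross-validated estimate of test MSE
```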

```r
library(glmnet)

#perform k-fold cross-validation to find optimal lambda value
cv_model <- cv.glmnet(x, y, alpha = 1)

#find optimal lambda value that minimizes test MSE
best_lambda <- cv_model$lambda.min
best_lambda

[1] 5.616345

#produce plot of test MSE by lambda value
plot(cv_model)
```

The lambda value that minimizes the test MSE turns out to be **5.616345**.

**Step 3: Analyze Final Model**

Lastly, we can analyze the final model produced by the optimal lambda value.

We can use the following code to obtain the coefficient estimates for this model:

```r
#find coefficients of best model
best_model <- glmnet(x, y, alpha = 1, lambda = best_lambda)
coef(best_model)

5 x 1 sparse Matrix of class "dgCMatrix"
                    s0
(Intercept) 484.20742
mpg          -2.95796
wt           21.37988
drat          .
qsec        -19.43425
```

No coefficient is shown for the predictor **drat** because lasso regression shrank its coefficient all the way to zero. This means the predictor was completely dropped from the model because it wasn’t influential enough.

Note that this is a key difference between ridge regression and lasso regression. Ridge regression shrinks all coefficients *towards* zero, but lasso regression has the potential to remove predictors from the model by shrinking the coefficients *completely* to zero.
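The mechanism behind this is the soft-thresholding operator: in the simplest case of a single standardized predictor, the lasso coefficient is the least squares estimate pulled toward zero by λ, and set exactly to zero once the estimate is smaller than λ in absolute value. A small base-R sketch (the function name is ours for illustration):

```r
# Soft-thresholding: how lasso turns a least squares estimate into a
# lasso estimate in the simple single-standardized-predictor case.
soft_threshold <- function(z, lambda) {
  sign(z) * pmax(abs(z) - lambda, 0)  # shrink toward zero; clip at zero
}

soft_threshold(c(-3, -0.5, 0.5, 3), lambda = 1)
# [1] -2  0  0  2
```

Small estimates are zeroed out entirely (this is why **drat** disappears), while large ones are merely shrunk by λ.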

We can also use the final lasso regression model to make predictions on new observations. For example, suppose we have a new car with the following attributes:

- mpg: 24
- wt: 2.5
- drat: 3.5
- qsec: 18.5

The following code shows how to use the fitted lasso regression model to predict the value for *hp* of this new observation:

```r
#define new observation
new = matrix(c(24, 2.5, 3.5, 18.5), nrow=1, ncol=4)

#use lasso regression model to predict response value
predict(best_model, s = best_lambda, newx = new)

[1,] 109.0842
```

Based on the input values, the model predicts this car to have an *hp* value of **109.0842**.

Lastly, we can calculate the R-squared of the model on the training data:

```r
#use fitted best model to make predictions
y_predicted <- predict(best_model, s = best_lambda, newx = x)

#find SST and SSE
sst <- sum((y - mean(y))^2)
sse <- sum((y_predicted - y)^2)

#find R-Squared
rsq <- 1 - sse/sst
rsq

[1] 0.8047064
```

The R-squared turns out to be **0.8047064**. That is, the best model was able to explain **80.47%** of the variation in the response values of the training data.
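Related error metrics follow directly from the same residuals. A base-R sketch using small placeholder vectors for `y` and `y_predicted` (in the example above these would be `mtcars$hp` and the output of `predict(best_model, s = best_lambda, newx = x)`):

```r
# MSE and RMSE from observed and predicted responses.
# y and y_predicted are placeholders here; substitute the actual
# response vector and the fitted model's predictions.
y           <- c(110, 110, 93, 96)
y_predicted <- c(120, 105, 90, 100)

mse  <- mean((y - y_predicted)^2)  # mean squared error
rmse <- sqrt(mse)                  # root mean squared error
mse
# [1] 37.5
rmse
# [1] 6.123724
```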


Shouldn’t you standardize before Lasso/Ridge?

Hi Zach

Thank you for this post. I couldn’t reproduce all the results until I remembered that `cv.glmnet` is not deterministic. You might want to add a seed. This helps others to be sure, that their code does the same as your example :).

Is it possible to extract the residuals part of the lasso regression as a data.frame?

You fit the final model to the same data that you used to find the optimal lambda. That is a textbook example of overfitting. Please correct this, as it contributes to false practice of ML model application.

How can it be used for the lasso-based Cox Ph model to produce a full coefficient table (including p-values)?

how do I then find the RMSE and MSE?

how can I then find RMSE for lasso in R?

Is it not necessary to scale to some consistent range across variables prior to fitting the model?

I want to cite this amazing work in my paper but couldn’t find your full name. Could you send me your full name please?

Thank you so much for your contribution. It helps me a lot. Keep up the good work.

Hi Zach

please, I had an issue with “sst”: when I write it down in R, everything after it errors because R doesn’t recognize “sst”.

hoping to have your help with that.

Mohamed Morsy

I split my dataset into training (80%) and test (20%) datasets.

I have conducted all of the above (up to the coefficients) on the training dataset.

Now, I want to test the model on the ‘test dataset’.

How do I do that?

Thanks in advance.