How to Perform Logistic Regression in Excel


Logistic regression is a method that we use to fit a regression model when the response variable is binary.

This tutorial explains how to perform logistic regression in Excel.

Example: Logistic Regression in Excel

Use the following steps to perform logistic regression in Excel for a dataset that shows whether or not college basketball players got drafted into the NBA (draft: 0 = no, 1 = yes) based on their average points, rebounds, and assists in the previous season.

Step 1: Input the data.

First, input the following data:

Raw data in Excel

Step 2: Enter cells for regression coefficients.

Since we have three explanatory variables in the model (pts, rebs, ast), we will create cells for three regression coefficients plus one for the intercept in the model. We will set the values for each of these to 0.001, but we will optimize for them later.

Logistic regression in Excel

Next, we will have to create a few new columns that we will use to optimize for these regression coefficients including the logit, elogit,  probability, and log likelihood.

Step 3: Create values for the logit.

Next, we will create the logit column by using the the following formula:

Logistic regression equation in Excel

Step 4: Create values for elogit.

Next, we will create values for elogit by using the following formula:

Logistic regression calculations in Excel

Step 5: Create values for probability.

Next, we will create values for probability by using the following formula:

Step 6: Create values for log likelihood.

Next, we will create values for log likelihood by using the following formula:

Log likelihood = LN(Probability)

Step 7: Find the sum of the log likelihoods.

Lastly, we will find the sum of the log likelihoods, which is the number we will attempt to maximize to solve for the regression coefficients.

Step 8: Use the Solver to solve for the regression coefficients.

If you haven’t already install the Solver in Excel, use the following steps to do so:

  • Click File.
  • Click Options.
  • Click Add-Ins.
  • Click Solver Add-In, then click Go.
  • In the new window that pops up, check the box next to Solver Add-In, then click Go.

Once the Solver is installed, go to the Analysis group on the Data tab and click Solver. Enter the following information:

  • Set Objective: Choose cell H14 that contains the sum of the log likelihoods.
  • By Changing Variable Cells: Choose the cell range B15:B18 that contains the regression coefficients.
  • Make Unconstrained Variables Non-Negative: Uncheck this box.
  • Select a Solving Method: Choose GRG Nonlinear.

Then click Solve.

The Solver automatically calculates the regression coefficient estimates:

By default, the regression coefficients can be used to find the probability that draft = 0.

However, typically in logistic regression we’re interested in the probability that the response variable = 1.

So, we can simply reverse the signs on each of the regression coefficients:

logistic regression in Excel

Now these regression coefficients can be used to find the probability that draft = 1.

For example, suppose a player averages 14 points per game, 4 rebounds per game, and 5 assists per game. The probability that this player will get drafted into the NBA can be calculated as:

P(draft = 1) = e3.681193 + 0.112827*(14) -0.39568*(4) – 0.67954*(5) / (1+e3.681193 + 0.112827*(14) -0.39568*(4) – 0.67954*(5)) = 0.57.

Since this probability is greater than 0.5, we predict that this player would get drafted into the NBA.

Related: How to Create a ROC Curve in Excel (Step-by-Step)

74 Replies to “How to Perform Logistic Regression in Excel”

  1. Hi, how’s it going.

    Very helpful post. However in step 6, I guess you forgot to put the formula for calculating the log likelihood. Couldn’t find any explanation somewhere else. Hoping you can help us out with this one!

    Cheers

  2. Thanks for the tutorial. Unfortunately, it appears that your screenshot for step No. 6 is missing the formula.

  3. I’m confused by your last step. When I followed your procedure, my numbers came out as positive 3.68, etc…I didn’t have to reverse the signs. You note that the default is X=0, but those numbers are identical to what you then outline as being X=1. Can you clarify?

  4. The data are provided as an image. To enter the data, we would have to type them in. Would it make more sense to provide them as a table that we can copy?

  5. In Step 6: Create values for log likelihood, the formula to calculate log likelihood is missing. Please supply this formula.

  6. Hi Zach
    I think you left out the formula for creating the log likelihood values in step 6. I would love to know how you calculated those. Kind regards Katy

  7. Hello,

    This is very helpful, but I’m stuck on Step 6 (calculating the log likelihood). Your instructions say to use the following formula…but then there is no formula. What is the proper Excel function to use at this step?

    Thanks!

  8. -8,492092603 I found different loglikelihood value . i dont understand why its so different. probability and ln functions are ok. but ln() formula give me different numbers

  9. I used MS-Excel 2016 and I have gone through the procedure as you have instructed. Results for all columns were the same as your instruction notes. However, my results for log likelihoods differs from yours. what are the possible causes of the differences between my results and your results.

  10. Hi,
    I’m a retired ENT doctor and I’m just entering again -after some decades – into the field of statistics. Since I need to use logistic regression to evaluate the results of an experiment with a binomial outcome fail/pass I’ve been trying to understand this peculiar type of statistical analysis. However when I used the values reported in this example in an Excel sheet while the values of Logit e^logit, Probability are the same the LogLikelihood values are for some cells different ( for example the values in the cells H2 and H3 are identical, -0.7042); so also the sum ( cell H14) is different and when I apply the solver I get very high values for b0,b1, b3, b4 and probability values become all equal to 1 and LogLikelyhood values all equal to 0.
    I’d be very grateful If you could explain me which mistake I’m making.
    Thx in advance
    Bruno Morra

  11. Very helpful.. some numbers appear inconsistent. e.g., LN (.4945) returns .6822 in row 1 .7042 in row 2… other rows have the same inconsistency… Lastly, solver says no convergence, do I need to add constraints?

  12. Why are the calculations in step 6 varying with identical inputs? For example, H2 and H3 are identical inputs with different outputs.
    I installed the solver on windows 10 excel version for office 365 however the output following these steps gives me “the object cell values do not converge”

  13. Log likelihoods appear inconsistent. eg. , row 2 probability = .4945; LN( probability) = -.6822 ; row 3 probability = .4945;
    LN (probability ) = -.7042. Can you please explain.

  14. When I take the LN(Probability) I don’t get the same numbers as you. Not sure if that is significant but when I run the solver the analysis tells me the numbers don’t converge.

  15. The above e^Logit formula should be
    = F2/(1+F2) instead of 1/(1+F2)

    Following Step 6 gave me Solver convergence error.
    After browsing deeper I found out that the correct formula for calculating Likelihood (Step 6) is
    Y=Y*LN(P)+(1-Y)*LN(1-P).

    Which means in above excel case it will be
    =A2 * LN(G2) + (1-A2)*LN(1-G2)

  16. Hi Zach,

    Shouldn’t the information from the draft data (column A) be used in the optimisation?

    Thanks
    Neil

  17. https://www.statology.org/logistic-regression-excel/
    Thanks Zach, when I run this, I get the “the Objective Cell values do not converge”, but my file mimics exactly the same as yours, is there anything I need to edit in the Solver Options for GRG Nonlinear? thanks

  18. This doesn’t make any sense.

    1) where does the draft column factor in at all?
    2) why do you have different loglihood’s ol rows 1 and 2, they should be identical per your formula.

    I think the main error lies somewhere in there…

  19. Dear Zach,

    Many thanks for this. I noticed that the above post never refers to the ‘draft’ data in its formulae. Hence, the regression is not complete and solver doesnt converge. I believe step 6 should include data from the ‘draft’ column. Unless I’ve missed something. Hope this helps.

  20. Zach, when I run these same calculations, step by step, once I select Solve my b values change to these:
    b0 -3186.946
    b1 -53687.090
    b2 -17287.476
    b3 -17812.004

    Any ideas why this is happening?

  21. Hey Zach,

    I tried to follow your example, however I have a few discrepancies between the natural logs that you have in your screenshots and the ones that I’m getting. Then, when I use the solver, I do not get a converging result.

  22. Hi Zach,

    I was looking for examples of logistical regression calculations using spreadsheets and I found your site, which has helpful resources. Your efforts are commendable.

    Unfortunately I am having trouble reproducing the numbers in your example above.

    I think there is a problem with step 5: Create values for probability, where you compute Probability in cell G2 as
    G2 = 1/(1 + F2) [1].
    The logistic regression is defined as
    Probability = F2/(1 + F2), with F2 = exp(b_0 + b_1*B2 + b_2*C2 + b_3*D2),
    which corresponds to Probability
    G2 = 1/[1 + (1/F2)] or G2 = 1 – 1/(1 + F2) [2]
    i.e. my G2 = 1 – your G2.

    Moreover, your values in “Step 6: Create values for log likelihood = LN(Probability)”,
    column H, sometimes correspond to formula [1] above and sometimes to formula [2].
    Here is the output of my version
    Inputs Logit e^logit Prob. LN(Prob) 1-Prob LN(1-Prob) Web
    0 12 3 6 0.022 1.0222 0.4945 -0.7042 0.5055 -0.6822 -0.6822
    1 13 4 4 0.022 1.0222 0.4945 -0.7042 0.5055 -0.6822 -0.7042
    0 13 4 6 0.024 1.0243 0.4940 -0.7052 0.5060 -0.6812 -0.6812
    1 12 9 9 0.031 1.0315 0.4923 -0.7088 0.5077 -0.6778 -0.7088
    1 14 4 5 0.024 1.0243 0.4940 -0.7052 0.5060 -0.6812 -0.7052
    0 14 4 4 0.023 1.0233 0.4943 -0.7047 0.5057 -0.6817 -0.6817
    0 17 2 2 0.022 1.0222 0.4945 -0.7042 0.5055 -0.6822 -0.6822
    1 17 6 5 0.029 1.0294 0.4928 -0.7078 0.5072 -0.6788 -0.7078
    1 21 5 7 0.034 1.0346 0.4915 -0.7103 0.5085 -0.6763 -0.7103
    0 21 9 3 0.034 1.0346 0.4915 -0.7103 0.5085 -0.6763 -0.6763
    1 24 11 11 0.047 1.0481 0.4883 -0.7169 0.5117 -0.6699 -0.7169
    0 24 4 5 0.034 1.0346 0.4915 -0.7103 0.5085 -0.6763 -0.6763

    As it can be seen, the numbers in the last column, that correspond to the numbers in your Steps 6 and 7, alternate between LN(Probability) and LN(1-Prob) (which is the correct formula [1]), and seem to be correlated with the values in the first column’s cells “draft” being either 0 or 1: when draft = 0, the LN value matches LN(1-Prob) but it matches LN(Prob) when draft = 1.

    As a result, when I try to solve the regression, eXcel’s output doesn’t match your values, as it would be expected.

    It would be nice if you could provide some input on this issue.

    Thank you,

    Oscar Rondon

  23. Thanks for this Zach, but I’m confused about step 6. I get a log likelihood on row 2 of -0.704207679 which should match row 3 because they share the same probability. When I follow the remaining steps, I get an error in the Solver function that “Objective Cells do not converge”. Can you clarify how you arrived at the values in column H in step 6?

  24. I tried this and the solver managed to converge to 0 sum of log-likelyhood. It seems your log-likelyhoods are independent of the value of the response variable (“draft” in this case). I suspect your formula for log-likelyhood should be `=LOG(IF(A2, G2, 1-G2))`.

  25. Hi Zach, I think I see the issue, the probability for P(draft=0) formula should be e^logit/(1+e^logit) and for P(draft =1) should be 1/(1+e^logit). Once, I modified my spreadsheet formulas, I was able to tie out with your final results. Does this make sense? I’m definitely not a Logistic Regression expert. Thank you, Alex

  26. Hi Zach,

    Thanks for posting this, it helped me a lot! I noticed a few discrepancies –
    1) There is a step missing in the explanation: A ‘Likelihood’ column is needed after the ‘Probability’ column, in order to calculate LN(Likelihood). The ‘Likelihood’ formula is =IF(A2=1,G2,1-G2). Then ‘Log Likelihood’ will be the LN of the ‘Likelihood’ column. With that change, I get the same numbers as you.
    2) Something is off with the last screenshot – the Logits and coefficients have been negated and the other columns have different numbers. Also, I didn’t need to switch the signs on coefficients, they came out as P(X=1).
    I think once you add in the Likelihood column it will work. Thanks again!

  27. Likelihood is supposed to gauge fitness of coefficients to the sample. But here the likelihood formula isn’t influenced by the `draft` column.

  28. What formula are you actually using to get Log Likelihood here? You suggest it is =LN(Probability) but if I put this formula into excel I do not reproduce your Log Likelihood numbers, equally the solver does then not converge for me and the method doesn’t work.

    draft pts rebs ast Logit eLogit Probability Log Likelihood
    0 12 3 6 0.022 1.022243784 0.494500222 -0.704207679
    1 13 4 4 0.022 1.022243784 0.494500222 -0.704207679
    0 13 4 6 0.024 1.024290318 0.494000288 -0.705219179
    1 12 9 9 0.031 1.031485504 0.492250621 -0.708767301
    1 14 4 5 0.024 1.024290318 0.494000288 -0.705219179
    0 14 4 4 0.023 1.02326654 0.494250253 -0.704713304
    0 17 2 2 0.022 1.022243784 0.494500222 -0.704207679
    1 17 6 5 0.029 1.029424594 0.492750508 -0.707752302
    1 21 5 7 0.034 1.034584607 0.491500819 -0.710291674
    0 21 9 3 0.034 1.034584607 0.491500819 -0.710291674
    1 24 11 11 0.047 1.048122009 0.488252163 -0.71692328
    0 24 4 5 0.034 1.034584607 0.491500819 -0.710291674

    Thanks for your assistance.

  29. Would you explain why the values in H2 and H3 are different when the values in G2 and G3 are the same? Is there a missing step(s)? I got the same values for H2 and H3 and the data did not converge. I tried the values in the write up and they did not converge. I am not a solver skilled person so maybe I missed something.

    Frank

  30. Thank you for this post. I tried it with a small sample (20 banks) to try out 3 ratios. Unfortunately, I experienced some issues with my solver with the constraints. I tried adding some and changing the parameters but it didn’t work. I ended up using r for the maximization and inverted it but it was frustrating.
    Any tips for solver issues?
    Thank you,

  31. Zach — I think I’m missing something fundamental in your recipe and I hope you can help explain it to me: I don’t see any reference in your write-up to column A, the draft outcome that you are trying to model. In other words, it looks to me like everything that you describe can be run without even inputting Col. A, which seems pretty counterintuitive. Is there something missing from the Solver step?

  32. Many thanks Zach for this beautiful example to understand logistic regression. While replicating this logistic regression in excel, I ran into an error that says – “Objective cell values do not converge”. I am not quite sure how to fix it, though I understand a little bit that the objective function will keep on increasing even if constraints are satisfied and may not reach a stable objective function value.
    My question: how to put a constraint and how can i make this workable?
    many thanks!
    Samresh

  33. In step 6 de log likelihood is calculated: LN(Probability). Cells G2 and G3 show a probability of 0.4945. The result should be the same if LN(0.4945) is calculated. Cell H3 seems correct (-0.7042) , but the question is how cell H2 is calculated (-0.6822).

    The same question for all draft = 0 log likelihoods. They show a different value then LN(probability).

    I guess I am misinterpreting something

  34. Zach,

    Hope you’re up for humoring a question from someone that is in the process of trying to better understand statistics, regression analytics and predictive modeling somewhat late in life, after not having been math focused during my time in college. I am 100% open to the idea that my question is just an extension of my relative ignorance of the baseline mathematics and statistical theory at play here. That said…

    With the above example, how is the solver able to link the included independent variables with the observed actual outcomes for the binary dependent variable around whether the individuals included in the sample were drafted to the NBA? Unless I’m missing it, it seems like the solver is optimizing around permutations of calculations & transformations that are calculated without the use of any values in column A, even though column A would appear to be the sample size Y values for what you’re trying to predict. Completely open to the possibility that I’m just embarrassing myself here, but if you could help me out with what I’m missing, I’d sincerely appreciate it.

  35. Hi good document – suggest formula in H2 should be =A2*LN(G2)+(1-A2)*LN(1-G2) … the formula needs to be copied down

    rgds
    Bruce

  36. Are oyu sure this is correct. The likelihood function needs to involve the dependant variable (i.e. column A needs to be referenced?). Otherwise the solver is not using the actual outcome

  37. Cool set up, very helpful! Unfortunately, the example doesn’t seem to be replicable. Your LN(probability) calculation is not producing the actual LN of the probability column. And Solver doesn’t find the solution with the parameters you specified.

  38. Hi, could you help me with doing the probability below in excel?
    P(draft = 1) = e3.681193 + 0.112827*(14) -0.39568*(4) – 0.67954*(5) / (1+e3.681193 + 0.112827*(14) -0.39568*(4) – 0.67954*(5)) = 0.57.

  39. Hi Zach,

    Thanks for posting this.

    Unfortunately, I am running into two issues that are throwing off my results. One is that my Log Likelihood is -8.49 instead of -8.33.

    Secondly, when I run the solver it transforms all of my Log Likelihood variables into 0 and does this wild transformation to Probability, eLogit, and Logit variables. I would be happy to send you my excel file or send screenshots of some of the issues im experiencing.

    Thank you and I look forward to your response.

  40. The explanation for how to evaluate Log likelihood is poor.

    People may interpret your instruction of:
    Log likelihood = LN(Probability)
    as Log likelihood = LN(G2)

    This is incorrect. Here is the formula for how you calculate log likelihood on row 2:
    =(A2*LN(G2))+((1-A2)*LN(1-G2))

  41. There are steps missing in this tutorial, and it used the incorrect formula for calculating probability (probability = odds/(1+odds) not 1/(1+odds))

  42. I don’t understand, the draft column, our response variable is not used at all in this. Where does our actual response variable, and whether the predictors predict our response actually included in the model?

  43. Further, row G2 and G3 are identical (each have a probability of 0.4945) but when we take the natural log of them now they are suddenly different numbers in the screenshots above? Why? This means that when I’ve tried to replicate your results the sum of the log likelihood is different from what is reported above.

  44. Okay, I see what was unclear. To calculate the log likelihood we should be inputting:

    (A2*LN(G2)) + (1-A2)*(LN(1-G2)) then my answers more closely match yours.

    Also, how much variation in our solved values can we expect? Because every time I run the optimizer, it gives me a different solution.

  45. I don’t think this is correct. You are not using the draft values at all for calculating the likelihood. To do it right you need to calculate draft*prob(draft) + (1-draft)*(1-prob(draft)).

  46. I tried replicating this example but did not get the same results. It started with the setup. In the example, row 2, Log Likelihood = -0.6822. But the natural log of 0.4945 (Probability) is -0.7042. Differences such as this continue in the Log Likelihood column of the example leading to a different sum of Log Likelihood. Example = -8.3331. My attempt = -8.4921.

    Finally, when running the Solver with the settings indicated in the example, I had wildly different results for the regression coefficients: b0 = -3186.95, b1 = -53687.1, b2 = -17287.5, b3 = -17812.

  47. Nice ‘logistic Regression in Excel’ lesson BUT your formula showing in Step 5 is wrong; it should be =F2/(1+F2).
    Also, the probability for “1” and “0” should be calculated differently for “1” it’s =F2/(1+F2) while for “0” it’s = 1 – (F2/(1+F2)). The “Solver” won’t converge the way it is shown in this example.

  48. This is missing a step at step 6. the formula for the Log likelihood needs to include a reference to the column A for the variable is either 1 or 0. it appears your spreadsheet has this formula as the screen shot shows the different values for the same probability. Without this correction the formula as presented do not function.

  49. Hi Zach
    Thank you for this – I did all the steps but did not get the results when I press solve. If there something I missed?

  50. I have 2 problems with this example:

    1. Some of the values in the log likelihood column are incorrect; this can easily be seen since rows with the same probability values have different log likelihoods; I calculate the sum of these values to be -8.492. However, this should not affect the next stage where the solver is used.
    2. When I use the solver as instructed it simply gives me values for the regression coefficients as all being 0.

  51. Hi there,

    I got all numbers correct except the P(draft = 1)
    Can you please write it like you did with your example loit, elogit o rpropability?
    I could not get 0.57

  52. Hey, I tried to solve the above illustration in excel using the steps above. The results were different though to a large extent. Also, may you please share how to look for t stat and p value for the coefficients ? I am unable to check for the same.

Leave a Reply

Your email address will not be published. Required fields are marked *