Comparison Between Logistic Regression And Linear Regression
Recently, I learned a new statistical model in my statistics courses called logistic regression, which is used to model a binary dependent variable. It differs from linear regression, the model I am most familiar with and use most often, which is commonly applied to simple research questions. Linear regression is a linear method for modelling the relationship between a dependent variable and one or more independent variables. Below, I describe the differences between logistic regression and linear regression in several respects.
Figure: Comparison between linear regression and logistic regression
First of all, the goals of the two regression models are distinct. Linear regression aims to find the best-fit line that accurately describes a continuous dependent variable, whereas logistic regression aims to predict a categorical dependent variable from one or more explanatory variables. The type of dependent variable therefore differs between the two models. Because the dependent variable in logistic regression is categorical, the model's output is a probability between 0 and 1; the output of a linear regression model is a continuous value, such as price, height, weight or score.
The two models also differ in how they are estimated and in what they assume. In linear regression the parameters are usually estimated by the least-squares method, while in logistic regression statisticians usually use maximum likelihood estimation. Linear regression requires a linear relationship between the dependent variable and the independent variables. Logistic regression does not require this; instead, the log odds (the logit) of the mean response must be a linear function of the independent variables. Finally, collinearity between independent variables can be a problem in linear regression, and it should be checked in logistic regression as well, because it makes the coefficient estimates unstable in both models.
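To make the contrast concrete, here is a minimal Python sketch that fits both models to the same binary outcome. Everything in it is my own illustration rather than something taken from the references: the data (hours studied versus pass/fail) are made up, the names `fit_linear` and `fit_logistic` are hypothetical, and the learning rate and step count are arbitrary choices. Least squares is solved in closed form, while the logistic coefficients are found by plain gradient ascent on the log-likelihood, which is only one of several ways to carry out maximum likelihood estimation.

```python
import math

def fit_linear(xs, ys):
    """Least-squares fit of y = a + b*x (closed form for one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sigmoid(z):
    """Inverse of the logit: maps a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Maximum likelihood fit of P(y=1|x) = sigmoid(a + b*x).

    Uses gradient ascent on the mean log-likelihood; lr and steps are
    illustrative assumptions, not tuned values.
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - sigmoid(a + b * x) for x, y in zip(xs, ys)]
        a += lr * sum(residuals) / n
        b += lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return a, b

# Made-up data: hours studied (x) and whether the student passed (y).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

lin_intercept, lin_slope = fit_linear(hours, passed)
log_intercept, log_slope = fit_logistic(hours, passed)

for x in (0, 4.5, 10):
    linear_pred = lin_intercept + lin_slope * x          # can leave [0, 1]
    logistic_pred = sigmoid(log_intercept + log_slope * x)  # stays in (0, 1)
    print(x, round(linear_pred, 3), round(logistic_pred, 3))
```

On this toy data the straight line gives a negative "probability" at 0 hours and a value above 1 at 10 hours, while the logistic predictions stay between 0 and 1 because the linear part is applied to the log odds rather than to the probability itself.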
Specifically, logistic regression is better than linear regression for modelling percentage data. This is because binomial data are conveniently expressed as percentages, and logistic regression is designed for binomial data. If a statistician uses linear regression for binary data, three potential problems arise: (1) the error term is heteroscedastic; (2) the error term is not normally distributed; (3) nothing restricts the predictions to fall between 0 and 1. Weighted least-squares regression can handle the first problem. For the second, if the sample size is large enough, the least-squares estimators are approximately normally distributed even though the error term is not. Unfortunately, there is still no good way to solve the third problem within linear regression. For these reasons, logistic regression is more appropriate than linear regression for modelling percentage data.
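The three problems can be seen in a small sketch. For a binary outcome that equals 1 with probability p, the error takes only two values (1 - p or -p), so it cannot be normal, and its variance is p(1 - p), which changes with p. The code below is a rough illustration under my own assumptions: the percentage data are invented, the helper names are hypothetical, and the weights are computed from the observed proportions for simplicity (in practice they would usually come from fitted values).

```python
def bernoulli_error_variance(p):
    """Variance of the error y - p when y is 1 with probability p, else 0."""
    return p * (1.0 - p)

def fit_weighted_linear(xs, ys, weights):
    """Weighted least-squares fit of y = a + b*x (closed form)."""
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Problem (1): the error variance depends on p, so it is not constant.
for p in (0.1, 0.5, 0.9):
    print(p, bernoulli_error_variance(p))

# Invented percentage data: proportion of successes at each x,
# each proportion based on the same (assumed) number of trials.
xs = [1, 2, 3, 4, 5, 6]
proportions = [0.05, 0.10, 0.30, 0.70, 0.90, 0.95]
trials = 20

# Weight each point by the inverse of its variance, p(1 - p) / trials.
weights = [trials / bernoulli_error_variance(p) for p in proportions]
wls_intercept, wls_slope = fit_weighted_linear(xs, proportions, weights)

# Problem (3) remains: the weighted line still leaves the [0, 1] range.
print(wls_intercept + wls_slope * 0)  # below 0
print(wls_intercept + wls_slope * 7)  # above 1
```

Weighting deals with the unequal variances, but the fitted line is still a line, so predictions just outside the observed range of x already fall below 0 or above 1.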
Overall, there are various differences between logistic regression and linear regression. Even though linear regression models might provide an acceptable goodness of fit, logistic regression is better for modelling percentage data.
References:
Zhao, L., Chen, Y., & Schaffner, D. W. (2001). Comparison of logistic regression and linear regression in modeling percentage data. Applied and Environmental Microbiology, 67(5), 2129-2135. doi:10.1128/aem.67.5.2129-2135.2001
Linear regression vs logistic regression - JAVATPOINT. (n.d.). Retrieved March 13, 2021, from https://www.javatpoint.com/linear-regression-vs-logistic-regression-in-machine-learning#:~:text=Linear%20regression%20is%20used%20to,given%20set%20of%20independent%20variables.&text=In%20logistic%20Regression%2C%20we%20predict%20the%20values%20of%20categorical%20variables.
Logistic regression. (n.d.). Retrieved March 29, 2021, from https://sci2lab.github.io/ml_tutorial/logistic_regression/
That's a good point. Learnt a lot from it.
That’s really helpful to understand the logistic model!
Thanks for your support.
I have learned it too. This is really helpful for me to understand the differences between logistic regression and linear regression.
There are some basic differences between the two regressions. i hope it can really help you. Thanks!
Well-written! Thanks for sharing
I learned a lot from your text, and I would like to see more about your ideas in the future.
Thanks for your appreciation! I hope it can really help you.
Totally agree! Learned a lot about how to distinguish these two regression models.
That's similar with my post. We can discuss more about their limitations. Thanks for sharing.