QBoard » Statistical modeling » Stats - Conceptual » What is the difference between linear regression and logistic regression?

What is the difference between linear regression and logistic regression?

  • When we have to predict a categorical (or discrete) outcome, we use logistic regression. I believe we also use linear regression to predict the value of an outcome given the input values.

    Then, what is the difference between the two methodologies?

      November 27, 2021 10:56 AM IST
    0
  • Linear regression

    Is meant to resolve the problem of predicting/estimating the output value for a given input X (say f(x)). The prediction is a continuous function whose values may be positive or negative. You normally have an input dataset with many examples and the output value for each of them. The goal is to fit a model to this dataset so that you can predict the output for new, never-seen elements. The classical example below fits a line to a set of points, but in general linear regression can fit more complex models (using higher polynomial degrees):

    [image: a line fitted to a set of points]

    Resolving the problem

    Linear regression can be solved in two different ways:

    1. Normal equation (direct way to solve the problem)
    2. Gradient descent (Iterative approach)
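    Both approaches can be sketched in a few lines of NumPy; the toy data and variable names below are illustrative, not from any particular library:

```python
import numpy as np

# Toy data: y = 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 * x + 1 + rng.normal(0, 0.5, size=50)

# Design matrix with a column of ones for the intercept term
X = np.column_stack([np.ones_like(x), x])

# 1. Normal equation: solve (X^T X) theta = X^T y directly
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# 2. Gradient descent: iteratively step down the MSE cost surface
theta_gd = np.zeros(2)
lr = 0.01
for _ in range(20000):
    grad = X.T @ (X @ theta_gd - y) / len(y)  # gradient of the MSE cost
    theta_gd -= lr * grad

print(theta_ne)  # close to [1, 2]
print(theta_gd)  # agrees with the normal-equation solution
```

    Both methods recover (approximately) the same theta; the normal equation is exact but costs a matrix solve, while gradient descent scales better to many features.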

    Logistic regression

    Is meant to resolve classification problems where, given an element, you have to assign it to one of N categories. Typical examples are classifying an email as spam or not, or deciding which category a vehicle belongs to (car, truck, van, etc.). In other words, the output is a finite set of discrete values.

    Resolving the problem

    Logistic regression has no closed-form solution like the normal equation, so it is resolved iteratively, typically by gradient descent. The formulation is in general very similar to linear regression; the only difference is the use of a different hypothesis function. In linear regression the hypothesis has the form:

    h(x) = theta_0 + theta_1*x_1 + theta_2*x_2 + ...
    

    where theta is the model we are trying to fit and [1, x_1, x_2, ...] is the input vector. In logistic regression the hypothesis passes that same linear combination through the sigmoid (logistic) function:

    h(x) = g(theta_0 + theta_1*x_1 + ...)   where   g(z) = 1 / (1 + e^-z)
    

    [image: the sigmoid function g(z)]

    This function has a nice property: it maps any real value into the range (0, 1), which is appropriate for handling probabilities during classification. For example, in binary classification h(x) can be interpreted as the probability of belonging to the positive class. The classes are then separated by a decision boundary, which is basically the curve that decides the separation between the different classes (the points where the predicted probability is 0.5). Following is an example of a dataset separated into two classes.

    [image: a dataset separated into two classes by a decision boundary]
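    The whole procedure can be sketched with gradient descent in NumPy; the toy one-feature dataset below is made up for illustration:

```python
import numpy as np

def g(z):
    """Sigmoid: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary problem: class 1 roughly when x > 3
rng = np.random.default_rng(1)
x = rng.uniform(0, 6, size=200)
y = (x + rng.normal(0, 0.5, size=200) > 3).astype(float)

X = np.column_stack([np.ones_like(x), x])  # input vectors [1, x_1]
theta = np.zeros(2)
lr = 0.1

# Gradient descent on the logistic (cross-entropy) cost
for _ in range(5000):
    p = g(X @ theta)                   # h(x): probability of class 1
    theta -= lr * X.T @ (p - y) / len(y)

# Predicted probabilities: near 0 below the boundary, near 1 above it
print(g(np.array([1.0, 0.0]) @ theta))  # small
print(g(np.array([1.0, 6.0]) @ theta))  # large
```

    The learned decision boundary is where h(x) = 0.5, i.e. where the linear term theta_0 + theta_1*x crosses zero; here that lands near x = 3.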

      February 11, 2022 12:25 PM IST
    0
    • Linear regression is used to handle regression problems, whereas logistic regression is used to handle classification problems.
    • Linear regression provides a continuous output, but logistic regression provides a discrete output.
    • The purpose of linear regression is to find the best-fitted line, while logistic regression goes one step further and passes the line's values through the sigmoid curve.
    • The loss function in linear regression is the mean squared error, whereas logistic regression uses the log loss (cross-entropy), which follows from maximum likelihood estimation.
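    The two loss functions from the last bullet can be written out directly; the arrays below are toy values for illustration:

```python
import numpy as np

# Mean squared error: the loss for linear regression
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Log loss (cross-entropy): the loss for logistic regression,
# derived from maximum likelihood estimation
def log_loss(y_true, p_pred):
    return -np.mean(y_true * np.log(p_pred)
                    + (1 - y_true) * np.log(1 - p_pred))

# Continuous targets and predictions (regression)
y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.1, 1.9, 3.2])
print(mse(y, y_hat))        # 0.02

# Binary labels and predicted probabilities (classification)
t = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.2, 0.8])
print(log_loss(t, p))       # about 0.18
```

    Note that MSE compares numbers to numbers, while log loss compares 0/1 labels to probabilities, which is exactly the continuous-vs-discrete distinction above.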
      January 11, 2022 3:36 PM IST
    0