Multiple linear regression in Python

  • I can't seem to find any Python libraries that do multiple regression; everything I find only does simple regression. I need to regress my dependent variable (y) against several independent variables (x1, x2, x3, etc.).

    For example, with this data:

    print('y        x1      x2       x3       x4      x5     x6       x7')
    for t in texts:
        print("{:>7.1f}{:>10.2f}{:>9.2f}{:>9.2f}{:>10.2f}{:>7.2f}{:>7.2f}{:>9.2f}"
              .format(t.y, t.x1, t.x2, t.x3, t.x4, t.x5, t.x6, t.x7))

    (output of the above:)

      y        x1       x2       x3        x4     x5     x6       x7
       -6.0     -4.95    -5.87    -0.76     14.73   4.02   0.20     0.45
       -5.0     -4.55    -4.52    -0.71     13.74   4.47   0.16     0.50
      -10.0    -10.96   -11.64    -0.98     15.49   4.18   0.19     0.53
       -5.0     -1.08    -3.36     0.75     24.72   4.96   0.16     0.60
       -8.0     -6.52    -7.45    -0.86     16.59   4.29   0.10     0.48
       -3.0     -0.81    -2.36    -0.50     22.44   4.81   0.15     0.53
       -6.0     -7.01    -7.33    -0.33     13.93   4.32   0.21     0.50
       -8.0     -4.46    -7.65    -0.94     11.40   4.43   0.16     0.49
       -8.0    -11.54   -10.03    -1.03     18.18   4.28   0.21     0.55

    How would I regress these in Python to get the linear regression formula:

    Y = a1x1 + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7 + c

      December 19, 2020 11:02 AM IST
    0
  • sklearn.linear_model.LinearRegression will do it:
    from sklearn import linear_model

    # Build one row per observation (features x1..x7) and fit y against it
    clf = linear_model.LinearRegression()
    clf.fit([[getattr(t, 'x%d' % i) for i in range(1, 8)] for t in texts],
            [t.y for t in texts])
    

    Then clf.coef_ will hold the regression coefficients a1..a7, and clf.intercept_ will hold the constant term c.
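
    For example, to print the fitted a1..a7 and the intercept c from the snippet above:

    print(clf.coef_)       # a1 .. a7
    print(clf.intercept_)  # c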

    sklearn.linear_model also offers similar interfaces for various kinds of regularized regression (e.g. Ridge, Lasso).
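
    For instance, a minimal sketch using Ridge (L2-regularized) regression on the same data; the alpha value here is just a placeholder to tune:

    from sklearn.linear_model import Ridge

    # alpha sets the regularization strength (placeholder value)
    ridge = Ridge(alpha=1.0)
    ridge.fit([[getattr(t, 'x%d' % i) for i in range(1, 8)] for t in texts],
              [t.y for t in texts])
    print(ridge.coef_, ridge.intercept_)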

      December 21, 2020 1:09 PM IST
    0
  • Use scipy.optimize.curve_fit. It is not limited to linear fits:
    import numpy as np
    from scipy.optimize import curve_fit

    # Model: y = a + b*x0 + c*x1
    def fn(x, a, b, c):
        return a + b*x[0] + c*x[1]

    # y(x0,x1) data:
    #    x0=0 1 2
    # ___________
    # x1=0 |0 1 2
    # x1=1 |1 2 3
    # x1=2 |2 3 4

    x = np.array([[0, 1, 2, 0, 1, 2, 0, 1, 2],
                  [0, 0, 0, 1, 1, 1, 2, 2, 2]])
    y = np.array([0, 1, 2, 1, 2, 3, 2, 3, 4])
    popt, pcov = curve_fit(fn, x, y)
    print(popt)  # fitted parameters a, b, c
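
    Since curve_fit accepts an arbitrary model function, a nonlinear fit is the same call. A minimal sketch with an assumed exponential model and synthetic, noise-free data (reusing the imports above):

    # Hypothetical nonlinear model: y = a * exp(b * x)
    def expo(x, a, b):
        return a * np.exp(b * x)

    x = np.linspace(0, 1, 20)
    y = 2.0 * np.exp(0.5 * x)                          # synthetic data
    popt, pcov = curve_fit(expo, x, y, p0=(1.0, 0.1))  # p0: initial guess
    print(popt)                                        # ~ [2.0, 0.5]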
      December 21, 2020 3:05 PM IST
    0
  • You can use numpy.linalg.lstsq:
    import numpy as np
    
    y = np.array([-6, -5, -10, -5, -8, -3, -6, -8, -8])
    X = np.array(
        [
            [-4.95, -4.55, -10.96, -1.08, -6.52, -0.81, -7.01, -4.46, -11.54],
            [-5.87, -4.52, -11.64, -3.36, -7.45, -2.36, -7.33, -7.65, -10.03],
            [-0.76, -0.71, -0.98, 0.75, -0.86, -0.50, -0.33, -0.94, -1.03],
            [14.73, 13.74, 15.49, 24.72, 16.59, 22.44, 13.93, 11.40, 18.18],
            [4.02, 4.47, 4.18, 4.96, 4.29, 4.81, 4.32, 4.43, 4.28],
            [0.20, 0.16, 0.19, 0.16, 0.10, 0.15, 0.21, 0.16, 0.21],
            [0.45, 0.50, 0.53, 0.60, 0.48, 0.53, 0.50, 0.49, 0.55],
        ]
    )
    X = X.T  # transpose so input vectors are along the rows
    X = np.c_[X, np.ones(X.shape[0])]  # add bias term
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    print(beta_hat)


    RESULT (coefficients a1..a7 followed by the intercept c):

    [ -0.49104607   0.83271938   0.0860167    0.1326091    6.85681762  22.98163883 -41.08437805 -19.08085066]
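
    Continuing from the snippet above, one way to sanity-check the fit is to apply beta_hat to the same design matrix and look at the residuals:

    y_hat = X @ beta_hat   # fitted values for the training rows
    print(y - y_hat)       # residuals; values near zero indicate a close fit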

      December 21, 2020 3:28 PM IST
    0