Multiple linear regression in Python

  • I can't seem to find any Python libraries that do multiple regression; everything I find only does simple regression. I need to regress my dependent variable (y) against several independent variables (x1, x2, x3, etc.).

    For example, with this data:

    print('y        x1      x2       x3       x4      x5     x6       x7')
    for t in texts:
        print("{:>7.1f}{:>10.2f}{:>9.2f}{:>9.2f}{:>10.2f}{:>7.2f}{:>7.2f}{:>9.2f}"
              .format(t.y, t.x1, t.x2, t.x3, t.x4, t.x5, t.x6, t.x7))

    (output of the above:)

      y        x1       x2       x3        x4     x5     x6       x7
       -6.0     -4.95    -5.87    -0.76     14.73   4.02   0.20     0.45
       -5.0     -4.55    -4.52    -0.71     13.74   4.47   0.16     0.50
      -10.0    -10.96   -11.64    -0.98     15.49   4.18   0.19     0.53
       -5.0     -1.08    -3.36     0.75     24.72   4.96   0.16     0.60
       -8.0     -6.52    -7.45    -0.86     16.59   4.29   0.10     0.48
       -3.0     -0.81    -2.36    -0.50     22.44   4.81   0.15     0.53
       -6.0     -7.01    -7.33    -0.33     13.93   4.32   0.21     0.50
       -8.0     -4.46    -7.65    -0.94     11.40   4.43   0.16     0.49
       -8.0    -11.54   -10.03    -1.03     18.18   4.28   0.21     0.55

    How would I regress these in Python to get the linear regression formula:

    Y = a1x1 + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7 + c

      December 19, 2020 11:02 AM IST
    0
  • sklearn.linear_model.LinearRegression will do it:
    from sklearn import linear_model

    # Build one row per observation (features x1..x7) and fit y against it
    clf = linear_model.LinearRegression()
    clf.fit([[getattr(t, 'x%d' % i) for i in range(1, 8)] for t in texts],
            [t.y for t in texts])
    

    Then clf.coef_ will hold the regression coefficients a1..a7, and clf.intercept_ will hold the constant term c.
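
    For example, to print the fitted a1..a7 and the intercept c from the snippet above:

    print(clf.coef_)       # a1 .. a7
    print(clf.intercept_)  # c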

    sklearn.linear_model also offers similar interfaces for various kinds of regularized regression (e.g. Ridge, Lasso).
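
    For instance, a minimal sketch using Ridge (L2-regularized) regression on the same data; the alpha value here is just a placeholder to tune:

    from sklearn.linear_model import Ridge

    # alpha sets the regularization strength (placeholder value)
    ridge = Ridge(alpha=1.0)
    ridge.fit([[getattr(t, 'x%d' % i) for i in range(1, 8)] for t in texts],
              [t.y for t in texts])
    print(ridge.coef_, ridge.intercept_)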

      December 21, 2020 1:09 PM IST
    0
  • Use scipy.optimize.curve_fit. It is not limited to linear fits:
    import numpy as np
    from scipy.optimize import curve_fit

    # Model: y = a + b*x0 + c*x1
    def fn(x, a, b, c):
        return a + b*x[0] + c*x[1]

    # y(x0,x1) data:
    #    x0=0 1 2
    # ___________
    # x1=0 |0 1 2
    # x1=1 |1 2 3
    # x1=2 |2 3 4

    x = np.array([[0, 1, 2, 0, 1, 2, 0, 1, 2],
                  [0, 0, 0, 1, 1, 1, 2, 2, 2]])
    y = np.array([0, 1, 2, 1, 2, 3, 2, 3, 4])
    popt, pcov = curve_fit(fn, x, y)
    print(popt)  # fitted parameters a, b, c
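
    Since curve_fit accepts an arbitrary model function, a nonlinear fit is the same call. A minimal sketch with an assumed exponential model and synthetic, noise-free data (reusing the imports above):

    # Hypothetical nonlinear model: y = a * exp(b * x)
    def expo(x, a, b):
        return a * np.exp(b * x)

    x = np.linspace(0, 1, 20)
    y = 2.0 * np.exp(0.5 * x)                          # synthetic data
    popt, pcov = curve_fit(expo, x, y, p0=(1.0, 0.1))  # p0: initial guess
    print(popt)                                        # ~ [2.0, 0.5]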
      December 21, 2020 3:05 PM IST
    0
  • You can use numpy.linalg.lstsq:
    import numpy as np
    
    y = np.array([-6, -5, -10, -5, -8, -3, -6, -8, -8])
    X = np.array(
        [
            [-4.95, -4.55, -10.96, -1.08, -6.52, -0.81, -7.01, -4.46, -11.54],
            [-5.87, -4.52, -11.64, -3.36, -7.45, -2.36, -7.33, -7.65, -10.03],
            [-0.76, -0.71, -0.98, 0.75, -0.86, -0.50, -0.33, -0.94, -1.03],
            [14.73, 13.74, 15.49, 24.72, 16.59, 22.44, 13.93, 11.40, 18.18],
            [4.02, 4.47, 4.18, 4.96, 4.29, 4.81, 4.32, 4.43, 4.28],
            [0.20, 0.16, 0.19, 0.16, 0.10, 0.15, 0.21, 0.16, 0.21],
            [0.45, 0.50, 0.53, 0.60, 0.48, 0.53, 0.50, 0.49, 0.55],
        ]
    )
    X = X.T  # transpose so input vectors are along the rows
    X = np.c_[X, np.ones(X.shape[0])]  # add bias term
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    print(beta_hat)


    RESULT (coefficients a1..a7 followed by the intercept c):

    [ -0.49104607   0.83271938   0.0860167    0.1326091    6.85681762  22.98163883 -41.08437805 -19.08085066]
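
    Continuing from the snippet above, one way to sanity-check the fit is to apply beta_hat to the same design matrix and look at the residuals:

    y_hat = X @ beta_hat   # fitted values for the training rows
    print(y - y_hat)       # residuals; values near zero indicate a close fit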

      December 21, 2020 3:28 PM IST
    0