Use sklearn to do multiple linear regression (with a random column added as an extra feature)

Data file: Multiple linear regression.csv

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
sns.set()
##use sklearn
from sklearn.linear_model import LinearRegression
In :
data = pd.read_csv('C:\\Users\\Python_practice\\1.02. Multiple linear regression.csv')
In :
data.describe()   ##The dataset has 84 samples
Out:
              SAT        GPA  Rand 1,2,3
count   84.000000  84.000000   84.000000
mean  1845.273810   3.330238    2.059524
std    104.530661   0.271617    0.855192
min   1634.000000   2.400000    1.000000
25%   1772.000000   3.190000    1.000000
50%   1846.000000   3.380000    2.000000
75%   1934.000000   3.502500    3.000000
max   2050.000000   3.810000    3.000000
In :
x = data[['SAT','Rand 1,2,3']]
y = data['GPA']
In :
reg = LinearRegression()
reg.fit(x,y)
Out:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In :
reg.coef_
Out:
array([ 0.00165354, -0.00826982])
In :
reg.intercept_   # We get y = 0.296 + 0.0017*SAT - 0.0083*(Rand 1,2,3)
Out:
0.29603261264909486
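The coefficients and intercept above fully define the fitted line, so a prediction can be reconstructed by hand. A small sketch, using the values copied from the Out cells and a hypothetical student (SAT = 1700, Rand 1,2,3 = 2):

```python
import numpy as np

# Coefficients copied from the Out cells above
intercept = 0.29603261264909486
coefs = np.array([0.00165354, -0.00826982])

sample = np.array([1700, 2])          # hypothetical: SAT = 1700, Rand 1,2,3 = 2
manual = intercept + coefs @ sample   # same as reg.predict on this sample
print(round(manual, 4))               # → 3.0905
```

This is exactly what `reg.predict([[1700, 2]])` would return.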
In :
reg.score(x,y)    ##This is R-square, not adjusted R-square. For multiple regression we usually analyze the adjusted one
Out:
0.4066811952814285
In :
##Even if we add a feature with little explanatory power, the R-square would still increase.
##Thus we need to penalize this excessive usage through the adjusted R-square
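The penalty matters because, with ordinary least squares, the training R-square can never decrease when a column is added, even a purely random one. A minimal demonstration with synthetic data (the variables here are stand-ins, not the course dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(84, 1))                 # one informative predictor
y = 2.0 * X[:, 0] + rng.normal(size=84)

r2_base = LinearRegression().fit(X, y).score(X, y)

# Append a purely random column -- it carries no information about y
X_plus = np.column_stack([X, rng.normal(size=84)])
r2_plus = LinearRegression().fit(X_plus, y).score(X_plus, y)

print(r2_plus >= r2_base)   # least squares always fits the larger model at least as well
```

The adjusted R-square corrects for this by shrinking the score as `p` grows.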

In :
x.shape  ##n=84 (the number of observations), p=2 (the number of predictors)
Out:
(84, 2)
In :
r2 = reg.score(x,y)
n = x.shape[0]   ##number of observations
p = x.shape[1]   ##number of predictors
Rsquare_adj = 1 - (1 - r2)*(n - 1)/(n - p - 1)
Rsquare_adj
Out:
0.39203134825134023
In :
##Conclusion: the adjusted R-square is considerably less than the R-square
##Thus one or more of the predictors have little or no explanatory power
##We need to eliminate those unnecessary features
##A feature with p-value > 0.05 can be disregarded; sklearn provides f_regression to compute these p-values
In :
from sklearn.feature_selection import f_regression
In :
f_regression(x,y)   ##first array: F-statistics; second array: p-values
Out:
(array([56.04804786,  0.17558437]), array([7.19951844e-11, 6.76291372e-01]))
In :
p_value = f_regression(x,y)[1]   ##keep only the p-values
p_value
Out:
array([7.19951844e-11, 6.76291372e-01])
In :
p_value.round(3)   ##We don't need so many digits; three decimal places are enough
Out:
array([0.   , 0.676])

We find that Rand 1,2,3 is a useless feature (p-value = 0.676 > 0.05)

In :
##Build a summary table
reg_summary = pd.DataFrame(data = x.columns.values, columns = ['Features'])
reg_summary['Coefficients'] = reg.coef_
reg_summary['p-value'] = p_value.round(3)
reg_summary
Out:
     Features  Coefficients  p-value
0         SAT      0.001654    0.000
1  Rand 1,2,3     -0.008270    0.676