Weekday As Dummy / Factor Variable In A Linear Regression Model Using Statsmodels
The question: How can I add a dummy / factor variable to a model using sm.OLS()?
The details: Data sample structure:
Date        A      B      weekday
2013-05-04  25.03  88.51  Saturday
20
Solution 1:
You can create the dummy variables yourself with a pandas categorical, or, more simply, use the formula interface, where patsy converts every non-numeric column into dummy variables (or another factor encoding).
Using the formula interface in this case (sm.OLS.from_formula, which is the same as the lower-case ols in statsmodels.formula.api) gives the result shown below.
Patsy sorts the levels of the categorical variable alphabetically. 'Friday' is missing from the list of variables because it has been selected as the reference category.
>>> res = sm.OLS.from_formula('A ~ B + weekday', df).fit()
>>> print(res.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      A   R-squared:                       0.301
Model:                            OLS   Adj. R-squared:                  0.029
Method:                 Least Squares   F-statistic:                     1.105
Date:                Thu, 03 May 2018   Prob (F-statistic):              0.401
Time:                        15:26:02   Log-Likelihood:                -97.898
No. Observations:                  26   AIC:                             211.8
Df Residuals:                      18   BIC:                             221.9
Df Model:                           7
Covariance Type:            nonrobust
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
Intercept               -1.4717     19.343     -0.076      0.940     -42.110      39.167
weekday[T.Monday]        2.5837      9.857      0.262      0.796     -18.124      23.291
weekday[T.Saturday]     -6.5889      9.599     -0.686      0.501     -26.755      13.577
weekday[T.Sunday]        9.2287      9.616      0.960      0.350     -10.975      29.432
weekday[T.Thursday]     -1.7610     10.321     -0.171      0.866     -23.445      19.923
weekday[T.Tuesday]       2.6507      9.664      0.274      0.787     -17.652      22.953
weekday[T.Wendesday]    -6.9320      9.911     -0.699      0.493     -27.754      13.890
B                        0.4047      0.258      1.566      0.135      -0.138       0.948
==============================================================================
Omnibus:                        1.039   Durbin-Watson:                   2.313
Prob(Omnibus):                  0.595   Jarque-Bera (JB):                0.532
Skew:                          -0.350   Prob(JB):                        0.766
Kurtosis:                       3.007   Cond. No.                         638.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
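To try the formula interface without the original data, something like the following sketch should work. The DataFrame is synthetic (only the column names A, B and weekday come from the question), so the estimates will not match the table above; the point is only to show which dummy columns patsy generates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with the question's column names; values are made up.
rng = np.random.default_rng(0)
dates = pd.date_range("2013-05-04", periods=28, freq="D")
df = pd.DataFrame({
    "A": rng.normal(25, 5, size=len(dates)),
    "B": rng.normal(90, 10, size=len(dates)),
    "weekday": dates.day_name(),
})

# weekday is a string column, so patsy treats it as categorical and
# expands it into treatment-coded dummies automatically.
res = smf.ols("A ~ B + weekday", data=df).fit()

# Names of the fitted coefficients; there is no term for 'Friday',
# the alphabetically first level, because it is the reference category.
names = list(res.params.index)
print(names)
```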
See the patsy documentation for the available categorical encodings: http://patsy.readthedocs.io/en/latest/categorical-coding.html
For example, the reference category can be specified explicitly, as in this formula:
"A ~ B + C(weekday, Treatment('Sunday'))"
http://patsy.readthedocs.io/en/latest/API-reference.html#patsy.Treatment
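A small sketch of how that formula might be used, again on synthetic data (only the column names are from the question). It fits the model with the default reference level and with 'Sunday' as the reference, then checks that the two fits differ only in parametrization.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with the question's column names; values are made up.
rng = np.random.default_rng(0)
dates = pd.date_range("2013-05-04", periods=28, freq="D")
df = pd.DataFrame({
    "A": rng.normal(25, 5, size=len(dates)),
    "B": rng.normal(90, 10, size=len(dates)),
    "weekday": dates.day_name(),
})

# Default coding: the alphabetically first level ('Friday') is the reference.
res_default = smf.ols("A ~ B + weekday", data=df).fit()

# Explicit reference level: 'Sunday' is dropped and 'Friday' gets a dummy.
res_sunday = smf.ols("A ~ B + C(weekday, Treatment('Sunday'))", data=df).fit()
print(res_sunday.params)

# Changing the reference level only re-expresses the same model:
# fitted values are identical, and the new intercept equals the old
# intercept plus the old Sunday coefficient.
same_fit = np.allclose(res_default.fittedvalues, res_sunday.fittedvalues)
print(same_fit)
```

Each weekday coefficient in the second fit is then read as a difference from Sunday rather than from Friday.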