
Weekday As Dummy / Factor Variable In A Linear Regression Model Using Statsmodels

The question: How can I add a dummy / factor variable to a model using sm.OLS()?

The details: Data sample structure:

Date        A      B      weekday
2013-05-04  25.03  88.51  Saturday
20

Solution 1:

You can use a pandas categorical to create the dummy variables yourself or, more simply, use the formula interface, where patsy converts every non-numeric column into dummy variables (or another factor encoding).

Using the formula interface here (sm.OLS.from_formula is the same as the lowercase ols in statsmodels.formula.api) gives the result below. Patsy sorts the levels of a categorical variable alphabetically, so 'Friday' comes first, is selected as the reference category, and therefore does not appear in the list of variables.

>>> res = sm.OLS.from_formula('A ~ B + weekday', df).fit()
>>> print(res.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      A   R-squared:                       0.301
Model:                            OLS   Adj. R-squared:                  0.029
Method:                 Least Squares   F-statistic:                     1.105
Date:                Thu, 03 May 2018   Prob (F-statistic):              0.401
Time:                        15:26:02   Log-Likelihood:                -97.898
No. Observations:                  26   AIC:                             211.8
Df Residuals:                      18   BIC:                             221.9
Df Model:                           7
Covariance Type:            nonrobust
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
Intercept               -1.4717     19.343     -0.076      0.940     -42.110      39.167
weekday[T.Monday]        2.5837      9.857      0.262      0.796     -18.124      23.291
weekday[T.Saturday]     -6.5889      9.599     -0.686      0.501     -26.755      13.577
weekday[T.Sunday]        9.2287      9.616      0.960      0.350     -10.975      29.432
weekday[T.Thursday]     -1.7610     10.321     -0.171      0.866     -23.445      19.923
weekday[T.Tuesday]       2.6507      9.664      0.274      0.787     -17.652      22.953
weekday[T.Wendesday]    -6.9320      9.911     -0.699      0.493     -27.754      13.890
B                        0.4047      0.258      1.566      0.135      -0.138       0.948
==============================================================================
Omnibus:                        1.039   Durbin-Watson:                   2.313
Prob(Omnibus):                  0.595   Jarque-Bera (JB):                0.532
Skew:                          -0.350   Prob(JB):                        0.766
Kurtosis:                       3.007   Cond. No.                         638.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
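To check which level was dropped without reading the full summary, the parameter names can be inspected directly. The sketch below uses synthetic data (an assumption; only the column names match the question) and the lowercase ols from statsmodels.formula.api.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (an assumption): same column names as the question, made-up values.
rng = np.random.default_rng(1)
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
df = pd.DataFrame({"B": rng.normal(50, 10, size=28), "weekday": days * 4})
df["A"] = 0.5 * df["B"] + rng.normal(0, 5, size=28)

# The string column 'weekday' is treated as categorical automatically.
res = smf.ols("A ~ B + weekday", data=df).fit()

# Every level except the reference gets a 'weekday[T.<level>]' coefficient.
names = res.params.index.tolist()
print(names)
```

The level that has no weekday[T.…] entry is the reference category; every other weekday coefficient is a difference from it.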

See the patsy documentation for the available categorical encodings: http://patsy.readthedocs.io/en/latest/categorical-coding.html

For example, the reference category can be specified explicitly, as in this formula:

"A ~ B + C(weekday, Treatment('Sunday'))"

http://patsy.readthedocs.io/en/latest/API-reference.html#patsy.Treatment
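A short sketch of that formula in use, again on synthetic data (an assumption; only the column names follow the question). With 'Sunday' as the reference, 'Friday' gets its own coefficient and 'Sunday' disappears from the parameter list.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (an assumption): same column names as the question, made-up values.
rng = np.random.default_rng(2)
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
df = pd.DataFrame({"B": rng.normal(50, 10, size=28), "weekday": days * 4})
df["A"] = 0.5 * df["B"] + rng.normal(0, 5, size=28)

# Treatment('Sunday') makes Sunday the baseline instead of the alphabetical default.
res = smf.ols("A ~ B + C(weekday, Treatment('Sunday'))", data=df).fit()

# Collect the level suffixes, e.g. '[T.Friday]', from the parameter names.
names = res.params.index.tolist()
print(names)
```

Changing the reference only shifts what the coefficients are measured against; the fitted values and R-squared stay the same.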
