Evaluate Slope And Error For A Specific Category For A statsmodels OLS Fit
Solution 1:
Very brief background:
The general question here is: how does the prediction change if we change one of the explanatory variables, holding the other explanatory variables fixed or averaging over them?
In the nonlinear discrete models, there is a dedicated margins method (`get_margeff`) that calculates this, although it is not implemented for changes in categorical variables.
In the linear model, the prediction and the change in prediction are just linear functions of the estimated parameters, so we can (mis)use t_test
to calculate the effect, its standard error, and its confidence interval for us.
(Aside: There are more helper methods in the works for statsmodels to make prediction and margin calculations like this easier; they will most likely be available later in the year.)
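As a minimal sketch of that general pattern, separate from the animal example below: a row vector c defines a linear combination of the parameters, and t_test(c) reports its estimate, standard error, and confidence interval. (The toy regression here is made up for illustration.)

```python
import numpy as np
from statsmodels.regression.linear_model import OLS

rng = np.random.RandomState(0)
nobs = 50
x = rng.randn(nobs)
X = np.column_stack([np.ones(nobs), x])  # intercept + slope
y = 1.0 + 2.0 * x + rng.randn(nobs)
res = OLS(y, X).fit()

# c picks out the linear combination c @ params, here the slope:
# the effect of a one-unit increase in x.
c = np.array([[0.0, 1.0]])
tt = res.t_test(c)

print(tt.effect)      # same as c @ res.params
print(tt.sd)          # its standard error
print(tt.conf_int())  # and confidence interval
```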
As a brief explanation of the following code:
- I make up a similar example.
- I define the explanatory variables for length = 1 and length = 2, for each animal type.
- Then, I calculate the difference between these explanatory variables.
- This defines linear combinations, or contrasts, of the parameters that can be used in t_test.
Finally, I compare with the result from predict to check that I didn't make any obvious mistakes. (I assume this is correct, but I wrote it pretty fast.)
import numpy as np
import pandas as pd
import patsy
from statsmodels.regression.linear_model import OLS

np.random.seed(2)
nobs = 20
animal_names = np.array(['cat', 'dog', 'snake'])
# np.random.random_integers is deprecated; randint's upper bound is exclusive
animal_idx = np.random.randint(0, 3, size=nobs)
animal = animal_names[animal_idx]
length = np.random.randn(nobs) + animal_idx
weight = np.random.randn(nobs) + animal_idx + length
data = pd.DataFrame(dict(length=length, weight=weight, animal=animal))

res = OLS.from_formula('weight ~ length * animal', data=data).fit()
print(res.summary())

# explanatory variables at length = 1 and length = 2 for each animal type
data_predict1 = pd.DataFrame(dict(length=np.ones(3), weight=np.ones(3),
                                  animal=animal_names))
data_predict2 = pd.DataFrame(dict(length=2 * np.ones(3), weight=np.ones(3),
                                  animal=animal_names))

# design matrices at the two prediction points; their difference is the contrast
x1 = patsy.dmatrix('length * animal', data_predict1)
x2 = patsy.dmatrix('length * animal', data_predict2)

tt = res.t_test(x2 - x1)
print(tt.summary(xname=animal_names.tolist()))
The result of the last print is
                             Test for Constraints
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
cat            1.0980      0.280      3.926      0.002         0.498     1.698
dog            0.9664      0.860      1.124      0.280        -0.878     2.811
snake          1.5930      0.428      3.720      0.002         0.675     2.511
==============================================================================
We can verify the results by using predict and comparing the difference in predicted weight when the length for a given animal type increases from 1 to 2:
>>> [res.predict({'length': 2, 'animal': [an]}) - res.predict({'length': 1, 'animal': [an]}) for an in animal_names]
[array([ 1.09801656]), array([ 0.96641455]), array([ 1.59301594])]
>>> tt.effect
array([ 1.09801656,  0.96641455,  1.59301594])
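The same check can be done against the linear algebra directly: in a linear model, the t_test effect is just the contrast times the parameter vector, and its standard errors come from the parameter covariance matrix. A self-contained sketch that reruns the example above (variable names follow the answer's code; `weight` is dropped from the prediction frames since the design matrix does not need it):

```python
import numpy as np
import pandas as pd
import patsy
from statsmodels.regression.linear_model import OLS

np.random.seed(2)
nobs = 20
animal_names = np.array(['cat', 'dog', 'snake'])
animal_idx = np.random.randint(0, 3, size=nobs)
animal = animal_names[animal_idx]
length = np.random.randn(nobs) + animal_idx
weight = np.random.randn(nobs) + animal_idx + length
data = pd.DataFrame(dict(length=length, weight=weight, animal=animal))
res = OLS.from_formula('weight ~ length * animal', data=data).fit()

data_predict1 = pd.DataFrame(dict(length=np.ones(3), animal=animal_names))
data_predict2 = pd.DataFrame(dict(length=2 * np.ones(3), animal=animal_names))
x1 = np.asarray(patsy.dmatrix('length * animal', data_predict1))
x2 = np.asarray(patsy.dmatrix('length * animal', data_predict2))
diff = x2 - x1

tt = res.t_test(diff)
params = np.asarray(res.params)
cov = np.asarray(res.cov_params())

# The effect is the contrast times the estimated parameters ...
assert np.allclose(tt.effect, diff @ params)
# ... and the standard errors are sqrt of the contrast's variances.
assert np.allclose(tt.sd, np.sqrt(np.diag(diff @ cov @ diff.T)))
```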
Note: np.random.seed(2) is set at the top of the code, so the numbers should be reproducible.