Tuesday, March 11, 2008

Interaction variables yet again

This is the first time I have written down my thoughts on this, although I had assumed that everyone had read the post on New Economist and the links in it. The catalyst for this post was a paper by Matthew Kotchen and Laura Grant, "Does Daylight Saving Time Save Energy? Evidence from a Natural Experiment in Indiana". What bothered me a little was that they labeled their paper "Very Preliminary" yet allowed their findings to be presented in the Wall Street Journal. See Marginal Revolution's post for the link. What bothered me even more was that this paper was presented at the NBER and no one seems to have caught the error: an interaction term is included without all of its component variables as main effects. This suggests to me that economists make this error frequently, just like political scientists. See this link for the same error made by political scientists.

However, I have to admit that even after getting a PhD I was not made aware of this type of error until I started working, and then it was statisticians who pointed it out. One analyst (another PhD economist) proposed estimating an equation with interaction terms but without including all of the component variables as main (or level) effects. The statistician on the project had to point out that this is not correct. I can now see why. For instance, let's say we want to estimate:

Y = a + bX1 + cX2 + dX1X2
1. From an ANOVA standpoint there is no reason to exclude X1 or X2 (one or both) as main effects and include only the interaction X1X2.
2. Leaving out one of the main effects (or level variables), for instance X2, is tantamount to imposing the restriction c = 0. There is no a priori reason to do this. Econometrics lets us test this restriction, and there really is no harm in keeping X2 in.
3. Leaving out one variable is similar to doing model selection by dropping insignificant variables, but in this case the authors do not test whether the variable is in fact insignificant. In any case, even if a variable is not statistically significant, there is still no good reason to drop it in these types of analyses.
4. At the very least, analysts should consider including the variable as a main effect as part of a sensitivity analysis (even if they do not believe that the variable should be included as a main effect).
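
The sensitivity check in points 2 to 4 can be sketched on simulated data. This is only an illustration under assumptions of my own choosing, not a re-analysis of the paper: X1 is taken to be a 0/1 dummy, X2 a continuous variable, the coefficient values are made up, and `ols` and `simulate` are hypothetical helper names. It fits the equation once with both main effects and once with X2 dropped, then compares the two estimates of d.

```python
import random

def ols(X, y):
    """Least squares via the normal equations (X'X) beta = X'y, Gauss-Jordan solve."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def simulate(n=5000, a=1.0, b=2.0, c=1.5, d=0.5, seed=0):
    """Assumed data-generating process: Y = a + b*X1 + c*X2 + d*X1*X2 + noise."""
    rng = random.Random(seed)
    x1 = [rng.randint(0, 1) for _ in range(n)]      # assumed 0/1 dummy
    x2 = [rng.gauss(1.0, 1.0) for _ in range(n)]    # assumed continuous variable
    y = [a + b * u + c * v + d * u * v + rng.gauss(0.0, 1.0)
         for u, v in zip(x1, x2)]
    # Full specification: constant, X1, X2, X1*X2.
    full = ols([[1.0, u, v, u * v] for u, v in zip(x1, x2)], y)
    # Restricted specification: X2 dropped as a main effect (imposes c = 0).
    restricted = ols([[1.0, u, u * v] for u, v in zip(x1, x2)], y)
    return full[3], restricted[2]

d_full, d_restricted = simulate()
print("d with both main effects:", round(d_full, 3))
print("d with X2 omitted:       ", round(d_restricted, 3))
```

In this particular setup the restricted estimate lands near c + d rather than d, because with X2 missing the interaction is the only term left to pick up the slope on X2 when X1 = 1. The direction and size of the distortion depend on the assumed sign of c, so it shows the mechanism rather than what would happen in any given data set.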

In their paper, Kotchen and Grant focus on the coefficient of the interaction (d in this case), which they use to support their claim that DST increases energy usage. My guess is that if they were to estimate the model correctly, the size of d would fall. Right now their estimate of d is partially capturing the effect of the omitted variable. I suppose the other possibility is that including all the relevant variables as main effects would have resulted in perfect collinearity, although they don't indicate that this is the case.
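
The sense in which d "partially captures" the omitted variable can be written out with the usual omitted-variable-bias algebra. As a sketch, assume the four-term equation above is the correct model with an error term uncorrelated with the regressors, and suppose X2 is the omitted main effect. The symbols π0, π1, δ and u are introduced here for illustration; δ is the coefficient on X1X2 in an auxiliary regression of X2 on a constant, X1 and X1X2:

```latex
\[
X_2 = \pi_0 + \pi_1 X_1 + \delta\, X_1 X_2 + u,
\qquad
\operatorname{plim}\, \hat{d}_{\text{restricted}} = d + c\,\delta .
\]
```

Under these assumptions the interaction coefficient is overstated when c and δ have the same sign, and it would fall once X2 is restored as a main effect. If they have opposite signs it would rise instead, so the direction cannot be settled without the data.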
