On the front page of today's Oregonian there is a headline that reads: "More trees in a city bring surprising benefit, Portland study finds." You probably figured I would post on this and get all worked up about the correlation vs. causation problem. You are right.
The article refers to this study by Donovan et al., published in the journal Health & Place. [NB: They also did one on trees and crime that I blogged about before.] In it, the authors find that increased tree canopy around a house significantly reduces the incidence of babies being born 'small for gestational age' (SGA), that is, undersized, but has no impact on pre-term births (PTB). The marginal effect is very small: slightly more than a 0.1% decrease in the incidence of SGA births from a 1% increase in tree canopy coverage within 50 meters of the house.
The first thing that even the most inexperienced empiricist thinks of is that tree cover is probably highly correlated with wealth and status, and these are probably highly correlated with birth outcomes. To address these and other problems, the authors include measures of the mother's education, age and race and, to proxy for income, the real market value of the home. The problem here is, of course, that the real market value of the home is not a very good proxy for income. Another good proxy for income might be, you guessed it, trees on the property. They are expensive to buy and expensive to maintain, so what trees might be capturing is the unexplained variation in income not controlled for by home value. So the correlation might be entirely spurious: it may be working entirely through the correlation with income.
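To make the proxy argument concrete, here is a toy simulation. It is a sketch under loudly stated assumptions, not the authors' data or model: every variable name and every number is made up, income is the only thing that affects the birth outcome, and both tree canopy and home value are just noisy reflections of income. The true effect of trees is set to zero by construction.

```python
import random

def ols_two(y, x1, x2):
    """Slopes from an OLS regression of y on x1 and x2 (intercept included),
    solved directly from the 2x2 normal equations on demeaned data."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical data-generating process (all values are assumptions).
rng = random.Random(0)
n = 20000
income = [rng.gauss(0, 1) for _ in range(n)]
trees = [i + rng.gauss(0, 1) for i in income]        # richer households have more canopy
home_value = [i + rng.gauss(0, 1) for i in income]   # noisy proxy for income
sga_risk = [-i + rng.gauss(0, 1) for i in income]    # depends on income only, NOT on trees

# Regression the analyst can actually run: trees plus the imperfect proxy.
b_trees_proxy, b_home = ols_two(sga_risk, trees, home_value)
# Regression we would like to run: trees plus true income.
b_trees_true, b_income = ols_two(sga_risk, trees, income)

print("trees coefficient, controlling for home value:", round(b_trees_proxy, 3))
print("trees coefficient, controlling for income:   ", round(b_trees_true, 3))
```

With these made-up parameters, the trees coefficient in the first regression should come out around -1/3 even though trees do nothing, and it should shrink to roughly zero once true income is in the regression. The size of the bias depends entirely on the assumed noise levels; the point is only that an imperfect proxy leaves room for trees to pick up the rest.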
But it actually gets worse. The authors, after making hay about the inclusion of these variables, actively data mine them away. This is inexcusable. Look at the final regression that was estimated, which I display below. Only the total births, no college education and a single race variable were included as controls and the proxy for income, market value of the house. In the iterative deletion of the other controls, one expects that the proxy for income was dropped because it was insignificant. But of course this can happen if it is correlated with another regressor...like trees! So now trees and college education are the remaining proxies for income, and trees is probably the better one.
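A rough illustration of why throwing out the income proxy matters, again with invented numbers rather than anything from the paper. The sketch compares the trees coefficient with and without home value as a control. The controlled coefficient is computed by partialling out home value first (the Frisch-Waugh-Lovell approach), which gives the same slope as the multiple regression. It does not reproduce the authors' deletion procedure or any significance test.

```python
import random

def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, x):
    """The part of y left over after a simple regression of y on x."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

# Hypothetical data: income drives everything, trees have no true effect.
rng = random.Random(1)
n = 20000
income = [rng.gauss(0, 1) for _ in range(n)]
trees = [i + rng.gauss(0, 1) for i in income]
home_value = [i + rng.gauss(0, 1) for i in income]
sga_risk = [-i + rng.gauss(0, 1) for i in income]

# With the proxy: regress the outcome on the part of trees not explained by home value.
trees_net_of_home = residuals(trees, home_value)
coef_with_control = slope(trees_net_of_home, sga_risk)

# Without the proxy: trees are left to soak up all of the income effect they can.
coef_without_control = slope(trees, sga_risk)

print("trees coefficient with home value:   ", round(coef_with_control, 3))
print("trees coefficient without home value:", round(coef_without_control, 3))
```

Under these assumptions the coefficient moves from about -1/3 to about -1/2 when home value is dropped. Neither number reflects a causal effect of trees, because there is none in the simulated data.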
So this result is probably totally spurious, by which I mean that if we could adequately control for income and nutrition, the effect would most likely vanish entirely (which is probably why they throw out covariates). I get so worked up about these things because pretty soon, in public policy debates in neighborhoods and cities, you are going to hear references to this article (and the one on crime), and people are going to take it seriously and it may affect policy. This is dangerous because the real problem of income and information is not going to be solved by planting trees, and money for trees would be spent more effectively in other ways if the objective is to improve prenatal health. It is possible that trees matter, but there is no way to know from this study.
In defending the results the authors say this:
Although no observational study can prove a causal relationship, consider the following strengths of the study. First, it builds on past experimental work demonstrating that trees can improve health outcomes (Ulrich, 1984). Second, if trees were merely proxies for positive neighborhood characteristics, one would expect that trees further than 50 m from a house would also be correlated with better birth outcomes, but they were not. Third, a wide range of individual and neighborhood characteristics, including many markers for socioeconomic status, were controlled for. Fourth, validation testing showed that results were not due to spurious correlation.
A couple of things to say about this. One, I am not worried about trees being proxies for neighborhood characteristics; I am worried that trees are proxies for household characteristics, a worry further enhanced by the lack of a 50-plus-meter effect. Two, the validation testing is crap. It tells you that there is a correlation, but not the source of the correlation.
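The household-versus-neighborhood point can be sketched the same way. In this toy setup (all names and numbers are assumptions, and household income is assumed, unrealistically, to be independent of the neighborhood), trees near the house track household income while trees farther away track only the neighborhood. Trees have no effect on the outcome at all.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data-generating process (illustrative assumptions only).
rng = random.Random(2)
n = 20000
income = [rng.gauss(0, 1) for _ in range(n)]         # household characteristic
neighborhood = [rng.gauss(0, 1) for _ in range(n)]   # unrelated to income here
near_trees = [i + rng.gauss(0, 1) for i in income]       # canopy within 50 m: the household's own trees
far_trees = [h + rng.gauss(0, 1) for h in neighborhood]  # canopy beyond 50 m: the neighborhood's trees
sga_risk = [-i + rng.gauss(0, 1) for i in income]        # driven by household income only

r_near = pearson(near_trees, sga_risk)
r_far = pearson(far_trees, sga_risk)

print("correlation of near trees with SGA risk:", round(r_near, 3))
print("correlation of far trees with SGA risk: ", round(r_far, 3))
```

Here near trees should correlate with the outcome at about -0.5 and far trees at about zero, which is exactly the pattern the authors offer as evidence for a causal effect. In this simulation it comes purely from household income.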
Another big problem of the study is the lack of any real model: just what exactly do trees do? The authors meekly suggest 'reduce stress', but little else. A model might help you expect, a priori, that trees close by help and trees farther away don't, etc., but as no such model exists, it is hard to interpret the results in a meaningful way. I would also like to know what causes SGA but does not affect PTB. My guess is nutrition, because intuitively the physical and mental health of the mother would seem to affect both. But I do not know, and this is a vital question given the results. I mention this because proper nutrition (both being able to afford it and understanding what proper nutrition is) is probably highly correlated with income and education.
Unfortunately this paper is an example of too much of the stuff that passes for real 'research' in policy debates. It is not just useless, it is actively harmful and what we need are more policy types who are data savvy. I am going to try and do my part this term with my OSU MPP students.
Finally, I enjoyed immensely the quote in the Oregonian from "Dr. Stephen Fortmann, a senior investigator at the Kaiser Permanente Center for Health Research in Portland." Dr. Fortmann is not an economist and is therefore very, very polite. But he clearly understands the weaknesses of the study when he says: "The issue with any observational epidemiological study is confounding. Is there a causal relationship here, or is something else going on?" An economist like me would just say it is crap. See, I just did.