The multiple levels of Mister P

In a previous entry, we began to discuss the Mister P method for estimating constituency opinion.

Specifically, we looked at the ‘regression’ and ‘post-stratification’ steps of the Mister P method. To recap, in the regression step, national survey data is used to model individual voters’ opinions as a function of their socio-demographic characteristics, yielding a predicted probability of holding a particular opinion for every voter type (where each voter type is a unique combination of socio-demographic characteristics). Then, in the second, ‘post-stratification’ step, opinion in a constituency is estimated by averaging the predicted probabilities across voter types, weighting each by the number of voters of that type living in the constituency.
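The post-stratification step is just a weighted average, which a few lines of Python can make concrete. This is a minimal sketch rather than our actual code: the voter types, predicted probabilities and constituency counts below are all invented for illustration.

```python
# Hypothetical predicted probabilities of holding the opinion, one per voter type.
# In practice these would come from the regression step.
predicted_prob = {
    ("18-34", "degree"): 0.70,
    ("18-34", "no degree"): 0.55,
    ("35+", "degree"): 0.50,
    ("35+", "no degree"): 0.30,
}

# Hypothetical number of voters of each type living in one constituency.
constituency_counts = {
    ("18-34", "degree"): 10_000,
    ("18-34", "no degree"): 15_000,
    ("35+", "degree"): 20_000,
    ("35+", "no degree"): 25_000,
}


def poststratify(probs, counts):
    """Average the predicted probabilities, weighted by voter-type counts."""
    total = sum(counts.values())
    return sum(probs[voter_type] * n for voter_type, n in counts.items()) / total


estimate = poststratify(predicted_prob, constituency_counts)
print(f"Estimated share holding the opinion: {estimate:.3f}")
```

A constituency with a different mix of voter types would get a different estimate from the same predicted probabilities, which is the whole point of this step.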

To keep things simple, last time we left out one important aspect of Mister P that we discuss in more detail here: the use of a particular type of regression — multilevel regression — in the first step of Mister P.

As its name suggests, multilevel regression allows us to explicitly account for the fact that data is often structured at multiple levels. In our case, the national survey data has two levels. First, there is an ‘individual level’: our political opinion of interest and respondents’ demographic characteristics both vary across individual respondents in our data.

Second, there is a ‘constituency level’: our respondents are clustered by the Parliamentary constituencies in which they live, and some variables, such as population density or unemployment levels, vary only at this constituency level.

The key thing is that, because we are interested in estimating average political opinion in each constituency, it makes sense to make use of constituency level information by employing multilevel regression.

This involves a tweak to the standard regression, which modelled political opinion only as a function of demographic characteristics that vary at the individual level. To this regression model, we add a ‘random effect’ for each constituency. These constituency effects are modelled as draws from a common distribution and capture average differences in opinion between survey respondents from different constituencies after accounting for demographics.
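For readers who like notation, one common way to write such a model is shown below. The specific choices here are assumptions made for illustration, not details given above: a logistic link, a normal distribution for the constituency effects, and the symbols themselves, where y_i is respondent i’s opinion, x_i their demographic characteristics, and c[i] the constituency in which they live.

```latex
\Pr(y_i = 1) = \operatorname{logit}^{-1}\!\left(\beta_0 + x_i^{\top}\beta + \alpha_{c[i]}\right),
\qquad
\alpha_c \sim \mathcal{N}\!\left(0, \sigma_\alpha^{2}\right)
```

Here α_c is the constituency effect, and σ_α describes how much constituencies differ from one another once demographics are accounted for.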

As a result, they allow for the possibility that geography matters — that voters in some areas may be different from voters in other areas, even though they are similar in terms of age, education, or social grade.

When we incorporate multilevel regression at the first stage of Mister P, the post-stratification step remains very similar to before, except that the predicted probability for each voter type now also includes the estimated effect for the constituency in question.
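To show how the constituency effects enter at this stage, here is a small sketch in Python. It assumes a logistic model, and every number in it (the coefficients, the constituency effects and the voter counts) is hypothetical. The two constituencies, "A" and "B", are deliberately given the same demographic make-up so that any difference in the estimates comes from the constituency effects alone.

```python
import math


def inv_logit(x):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical estimates from the multilevel regression (log-odds scale).
intercept = -0.2
age_effect = {"18-34": 0.4, "35+": 0.0}
education_effect = {"degree": 0.3, "no degree": 0.0}
constituency_effect = {"A": 0.25, "B": -0.25}

# Hypothetical voter-type counts, identical in both constituencies.
counts = {
    ("18-34", "degree"): 10_000,
    ("18-34", "no degree"): 15_000,
    ("35+", "degree"): 20_000,
    ("35+", "no degree"): 25_000,
}


def estimate_opinion(constituency, cell_counts):
    """Post-stratify predictions that include the constituency effect."""
    total = sum(cell_counts.values())
    weighted = 0.0
    for (age, education), n in cell_counts.items():
        log_odds = (
            intercept
            + age_effect[age]
            + education_effect[education]
            + constituency_effect[constituency]
        )
        weighted += n * inv_logit(log_odds)
    return weighted / total


estimates = {name: estimate_opinion(name, counts) for name in constituency_effect}
for name, value in estimates.items():
    print(f"Constituency {name}: {value:.3f}")
```

Because the demographics are identical, the higher estimate for A reflects only its larger constituency effect.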

One big advantage of the multilevel regression approach is that it is very flexible. For example, we can model our constituency effects as a function of constituency-level variables such as population density. This introduces further useful information about constituencies into our method, which can in turn lead to further improvements in the accuracy of our eventual opinion estimates.
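As a rough illustration of what modelling the constituency effects means, the sketch below splits each effect into a part explained by population density and a leftover constituency-specific part. The linear form, the standardised density values and all of the numbers are assumptions made for the example, not estimates from real data.

```python
# Hypothetical constituency-level predictor: standardised population density.
density = {"A": 2.0, "B": -1.0}

# Hypothetical model estimates (log-odds scale).
gamma_density = 0.1                  # change in the effect per unit of density
residual = {"A": 0.05, "B": -0.02}   # what density does not explain


def constituency_effect(name):
    """Part explained by density plus the constituency-specific residual."""
    return gamma_density * density[name] + residual[name]


effects = {name: constituency_effect(name) for name in density}
for name, value in effects.items():
    print(f"Constituency {name}: effect = {value:+.2f}")
```

These effects would then feed into the post-stratification step in the usual way.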
