Extracting Systematic Social Science Meaning from Text

Date Published: Sep 16, 2007

Abstract:

We develop two methods of automated content analysis that give approximately unbiased estimates of quantities of theoretical interest to social scientists. With a small sample of documents hand coded into investigator-chosen categories, our methods can give accurate estimates of the proportion of text documents in each category in a larger population. Existing methods successful at maximizing the percent of documents correctly classified allow for the possibility of substantial estimation bias in the category proportions of interest. Our first approach corrects this bias for any existing classifier, with no additional assumptions. Our second method estimates the proportions without the intermediate step of individual document classification, and thereby greatly reduces the required assumptions. For both methods, we also correct statistically, apparently for the first time, for the far less-than-perfect levels of inter-coder reliability that typically characterize human attempts to classify documents, an approach that will normally outperform even population hand coding when that is feasible. These methods allow us to measure the classical conception of public opinion as those views that are actively and publicly expressed, rather than the attitudes or nonattitudes of the populace as a whole. To do this, we track the daily opinions of millions of people about President Bush and the candidates for the 2008 presidential nominations using a massive data set of online blogs that we develop and make available with this article. We also offer easy-to-use software that implements our methods, which we demonstrate also work with many other sources of unstructured text.
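The bias correction in the first approach can be illustrated with a small sketch. This is a simplification, and every number and category here is hypothetical: if the classifier's misclassification probabilities are estimated from the hand-coded sample, the category proportions the classifier reports in the unlabeled population are a known linear mixture of the true proportions, and that mixture can be inverted.

```python
import numpy as np

# Hypothetical 3-category example. M[j, k] is the probability that the
# classifier assigns category j to a document whose true category is k,
# estimated from the small hand-coded sample. Each column sums to 1.
M = np.array([
    [0.80, 0.10, 0.15],
    [0.15, 0.85, 0.05],
    [0.05, 0.05, 0.80],
])

# Raw (potentially biased) category proportions the classifier
# reports in the large unlabeled population.
p_observed = np.array([0.45, 0.35, 0.20])

# Since p_observed = M @ p_true, solving the linear system recovers
# corrected estimates of the true category proportions.
p_true = np.linalg.solve(M, p_observed)
print(p_true)
```

Because each column of `M` sums to one, the corrected proportions automatically sum to one as well; in practice one would also want standard errors, which this toy inversion does not provide.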

Notes:

This paper describes material that is patent pending. Earlier versions of this paper were presented at the 2006 annual meetings of the Midwest Political Science Association (under a different title) and the Society for Political Methodology.