What are the media saying about your candidate?
In an analysis of five hundred articles from the last few days, Klobuchar leads in sentiment, Biden in volume.
There are only a handful of ways to talk about presidential candidates: the personal bio, the policy analysis, the scandal, and the horse race. By now we are all familiar with the critique of horse-race coverage: without aggregated polling methods, individual data points are misleading. In this new era, we have all sorts of fun, data-driven ways of writing stories about politics, and one particularly efficient method is sentiment analysis. This is the first installment of a weekly series in which I run a repeatable method, with public code, to get a sense of the sentiment in any given week. With these repeatable results we can begin to build some semblance of a social science of sentiment in an election.
To give you a sense of the data, this is a plot of all five hundred articles at the paragraph level, where color is the standard deviation of sentiment within each paragraph, point size is the mean, and position on the y-axis is the sum of sentiment.
Things to notice: the notch around 8,000 is a wave of hit pieces against one candidate in some regional papers. Some of the high-peaking dots are just lists of candidate names alongside positive words, and the low dots often either shouldn’t be in the dataset at all or are read backwards in the first place.
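The three quantities in that plot are straightforward to compute once each sentence has a score. Below is a minimal sketch, assuming each paragraph arrives as a list of numeric sentence-level scores from some sentiment scorer. The function name `paragraph_stats` is hypothetical, and using the population (rather than sample) standard deviation is my assumption, not necessarily what the project repository does.

```python
from statistics import mean, pstdev


def paragraph_stats(sentence_scores):
    """Summarize one paragraph's sentence-level sentiment scores.

    Returns (total, average, spread), which map to the plot's
    y-position, point size and color respectively.
    """
    if not sentence_scores:
        raise ValueError("paragraph has no scored sentences")
    total = sum(sentence_scores)
    average = mean(sentence_scores)
    # Population standard deviation (an assumption); a one-sentence
    # paragraph simply gets a spread of 0.0.
    spread = pstdev(sentence_scores)
    return total, average, spread


# Hypothetical scores for a three-sentence paragraph.
print(paragraph_stats([2.0, -1.0, 5.0]))
```

Keeping all three numbers matters because a paragraph with a high sum can come from one extreme sentence or from several mildly positive ones, and the spread tells those apart.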
Occam’s Razor and Text Mining
For this project, I will do my best to avoid unnecessary human intervention, both because I have stuff to do and because it is not good social science. Thus, I will avoid word-level analysis, opting instead for sentence-level analysis. I can’t tell who an article is about, only which candidates are named in a given paragraph. This is a really important point: this analysis uses a large dataset, in this case 500 news articles, and because I will be making claims about the composition of the data, I will avoid grinding axes or making claims that depend on thick description or close reading of individual records. If, in any week of this eighteen-month project, I do make that turn, I will mark it in that week’s article. Also, get hyped for a lot of really boring articles about this. It is conceivable that I will slightly shift my search terms after the nominations are in, roughly one year from now, or if the dynamics of the Democratic race change. I will declare any such changes at those points in special posts.
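Since the method can’t tell who an article is about, one simple way to attribute text is by name mention at the paragraph level. Here is a minimal sketch that builds per-candidate subcorpora by matching a first or last name. The `CANDIDATES` table and `build_subcorpora` function are hypothetical illustrations, not the project’s actual code, and plain name matching has obvious false positives (any “Amy” or “Warren” counts).

```python
import re
from collections import defaultdict

# Hypothetical name table: each candidate matches on first OR last name.
CANDIDATES = {
    "Biden": ("Joe", "Biden"),
    "Warren": ("Elizabeth", "Warren"),
    "Klobuchar": ("Amy", "Klobuchar"),
}


def build_subcorpora(paragraphs, candidates=CANDIDATES):
    """Map each candidate to the indices of paragraphs that name them.

    Matching is case-sensitive and on whole words only. A paragraph that
    names several candidates lands in several subcorpora.
    """
    patterns = {
        key: re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
        for key, names in candidates.items()
    }
    subcorpora = defaultdict(list)
    for i, text in enumerate(paragraphs):
        for key, pattern in patterns.items():
            if pattern.search(text):
                subcorpora[key].append(i)
    return dict(subcorpora)


paras = [
    "Joe Biden spoke in Iowa.",
    "Warren and Klobuchar debated.",
    "The weather was mild.",
]
print(build_subcorpora(paras))
```

Candidates who are never named simply do not appear in the result, and paragraphs that name nobody are ignored.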
Good Text Science
This might seem boring, but I am literally doing this as a form of descriptive social science. The current state of research in this field, known as Critical Search (a subset of Cultural Analytics, a communication-related discipline), is Guldi 2018. In some of my other work I use the extensive methods Guldi describes; in this case, I am going to keep things very simple and repeatable. Will this give me flashy results? No. Is it likely to produce a null result? That’s OK. The core product here is an extremely basic sense of what people are saying in major newspapers. The repository for this project is linked here.
If this project is successful, we should have a weekly impression of at least half of all newspaper sentiment regarding the Democratic candidates. The approach is comprehensive and repeatable, yet limited by all the problems inherent to the method.
Who has the most positive coverage?
Highest mean sentiment across paragraphs featuring the candidate’s name (first or last), among candidates with at least ten relevant paragraphs:
- Klobuchar 3.34
- Booker 2.69
- Warren 2.06*** (with outliers removed)
- Buttigieg 1.99
- Sanders 1.95
- O’Rourke 1.47 AND Harris 1.47
- Gillibrand 1.33
- Biden .62
- Warren -.84 (uncorrected)
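The ranking rule behind that list (mean over a candidate’s paragraphs, with a minimum of ten paragraphs) can be sketched as follows. This assumes you already have one sentiment score per paragraph and a mapping from each candidate to the indices of the paragraphs that name them; `rank_by_mean` is a hypothetical helper, not the repository’s code.

```python
from statistics import mean


def rank_by_mean(paragraph_scores, subcorpora, min_paragraphs=10):
    """Rank candidates by mean paragraph sentiment, highest first.

    paragraph_scores: list of per-paragraph sentiment values.
    subcorpora: dict of candidate -> list of paragraph indices naming them.
    Candidates with fewer than `min_paragraphs` paragraphs are dropped.
    """
    ranking = [
        (name, mean([paragraph_scores[i] for i in indices]))
        for name, indices in subcorpora.items()
        if len(indices) >= min_paragraphs
    ]
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)


# Toy data: candidate C has a high mean but only five paragraphs.
scores = [1.0] * 10 + [-1.0] * 10 + [3.0] * 5
subcorpora = {
    "A": list(range(10)),
    "B": list(range(10, 20)),
    "C": list(range(20, 25)),
}
print(rank_by_mean(scores, subcorpora))
```

The threshold is what keeps a candidate with a handful of glowing paragraphs from topping the list, and removing or keeping a few outlier paragraphs can move a low-volume candidate a long way.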
Why the asterisks on Warren, and why does she appear twice? There is something important to understand about this method: initially, Warren’s sentiment was strongly negative, for two reasons:
A. She was represented in a story about Facebook that was decidedly negative. I removed this major outlier.
B. There was a clutch of horribly negative, downright malicious content about the Senator in some smaller California papers. If I don’t remove those, she remains in 9th place. Unless I take out both the outlier and the cluster of repeated mean editorials, that is, unless I intervene on her behalf, she is well behind Biden for sentiment this week.
Next week, I will NOT intervene on Warren’s behalf. If hit pieces are a part of the sentiment ecology, and they are, we need to include them in our primary score. I would guess that as this shifts from a minimax to a maximin game we will see much more of this, and our strategies for correction (likely a clustering method) will be an important topic for this series.
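As a hypothetical starting point for the correction problem, one could flag clusters of near-identical editorials before scoring. The sketch below swaps in a simple technique, Jaccard similarity on word 3-grams (“shingles”) with single-linkage grouping, in place of whatever clustering method the series eventually settles on. The function names are illustrative, and the 0.6 threshold is an arbitrary assumption that would need tuning against real articles.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of lowercased k-word shingles in `text`."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def near_duplicate_groups(docs, threshold=0.6, k=3):
    """Group documents whose shingle sets overlap at or above `threshold`.

    Single linkage via union-find: if A~B and B~C, then A, B and C
    share a group. Returns only groups with two or more members.
    """
    sets = [shingles(d, k) for d in docs]
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Flagged groups could then be reported separately, or collapsed to a single representative, rather than silently deleted, which fits the plan of keeping hit pieces in the primary score.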
Who had the most coverage?
Number of paragraphs in each candidate’s subcorpus:
- Biden 551
- Harris 160
- Sanders 159
- Warren 131
- Buttigieg 127
- O’Rourke 111
- Booker 71
- Klobuchar 43
- Gillibrand 31
What do we see here? Biden had an intense, bad week. After him, coverage was balanced among the other candidates, with a substantial drop-off from O’Rourke (111) to Booker (71).
Complications for the “Narrative”
There is no evidence here of gender-based differences in either the volume or the descriptive quality of coverage. Harris and Warren are receiving major coverage. The coverage of Klobuchar and Gillibrand is positive; they simply don’t receive that much of it.
Biden is the frontrunner and dealing with a negative story.
There is a lot of overlap between the candidates’ subcorpora. That one super-positive point shared by several candidates is a single positive paragraph that names a number of them, though not Biden. There are some real problems with autocorrelation: the signal is finding itself. This brings us back to the question of methods: does sentiment analysis help us understand the coverage?
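One rough way to quantify that overlap is to count how many paragraphs sit in more than one candidate’s subcorpus, and which pairs of candidates share the most paragraphs. This is a minimal sketch, assuming a mapping from each candidate to the indices of paragraphs that name them; `overlap_report` is a hypothetical name, not part of the project’s code.

```python
from collections import Counter
from itertools import combinations


def overlap_report(subcorpora):
    """Summarize paragraphs shared between candidate subcorpora.

    Returns (shared, total, pair_counts): the number of paragraphs naming
    two or more candidates, the number naming at least one, and a Counter
    of how many paragraphs each (alphabetical) candidate pair shares.
    """
    names_by_paragraph = {}
    for name, indices in subcorpora.items():
        for i in set(indices):
            names_by_paragraph.setdefault(i, set()).add(name)
    shared = sum(1 for names in names_by_paragraph.values() if len(names) > 1)
    pair_counts = Counter(
        pair
        for names in names_by_paragraph.values()
        for pair in combinations(sorted(names), 2)
    )
    return shared, len(names_by_paragraph), pair_counts


print(overlap_report({"Biden": [0, 2], "Warren": [1, 2], "Sanders": [2, 3]}))
```

A high shared fraction means the candidates’ scores are partly computed from the same paragraphs, which is one concrete form of the signal finding itself.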
What do high positivity and low volume likely mean? Either meta-coverage (the “why isn’t my candidate being covered?” sort of story) or a profile piece on a relative unknown. Given this baseline, and the influence of heavy candidate-specific stories this week, it will be important to see how this changes going forward.