Tag Archives: Statistics

When Will Republicans Bail?

With one shocking headline after another, many progressives are starting to wonder if there is anything that would cause Republicans to stop supporting Donald Trump. Historically, partisans tend to rally around their president during a scandal. As the prior two links explain in more detail, Republicans didn’t start jumping ship until near the end of the Nixon investigation. Reagan’s approval among Republicans never dropped below 73 percent. Bill Clinton actually grew in popularity during the Monica Lewinsky investigation, since he was able to portray it as the work of a rabid partisan prosecutor.

It seems unlikely that leading Republicans who had backed Trump in the past will quickly bail on him now. Every prior backing ties these Republicans closer together. Any Republican who breaks ties now will face questions of “why didn’t you do this sooner?” The Republicans who break with Trump over the Comey firing and/or Trump’s ties to Russia will probably be the same Republicans who refused to endorse him in the general election and have otherwise taken a stand.

It looks like we need to wait for Republican voters to turn on Trump. Then Republicans in Congress will jump off the sinking ship. That’s not exactly the most optimistic proposition. After all, Trump faced a major scandal one month before the election when the Access Hollywood tape was released. Trump boasted about committing sexual assault on this tape. Anderson Cooper confronted Trump during the next debate about whether Trump understood what he was admitting to. Many progressives thought there was no way Trump could win after the tape came out…but he won the Electoral College anyway!

Obviously, a lot has happened in American politics since the Access Hollywood tape and the election. Voters who were willing to give Trump a chance could always say enough is enough. But how come voters didn’t reach this conclusion during the election? I thought it would be worth checking the 2016 American National Election Survey. The ANES has two waves, one before the election and one after the election. In the post-election wave, they asked two questions specifically about the Access Hollywood tape:

In October, the media released a 2005 recording of Donald Trump having a crude conversation about women. Have you heard about this video, or not?

95.54 percent of respondents said they heard about the video. There’s little partisan split: 95.11 percent of Trump voters said they heard about the video. So what did they do with the information? The ANES didn’t ask “did the video affect your vote?” Instead, they asked how others should use the information:

In deciding how to vote, how much do you think the information from the video should have mattered to people?

A great deal, a lot, a moderate amount, a little, or not at all?

I’m not an expert on polling language. My naive assumption is that asking “should this have mattered to people?” is better than asking “how did this affect your vote?” Strong partisans probably made up their mind about the election well before the tape came out. However, strong partisans may also have the strongest feelings about what moderates should do with this information. To start off with, let’s look at some crosstabs for how people answered this question, based on who they said they voted for:

Video Should Matter…   Other Voter   Trump Voter   Overall
A Great Deal              55.46%        1.99%      32.07%
A Lot                     20.80%        4.37%      13.61%
Moderate Amount           13.26%       20.67%      16.50%
A Little                   6.36%       34.59%      18.71%
Not At All                 4.12%       38.39%      19.11%
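For readers who want to reproduce this kind of column-percentage crosstab outside of Stata, here is a minimal pandas sketch. The dataframe and column names (“trumpvoter”, “video_should_matter”) are illustrative stand-ins, not the actual ANES variable names, and the data are made up:

```python
import pandas as pd

# Toy stand-in for the ANES post-election extract (fake data,
# hypothetical column names).
anes = pd.DataFrame({
    "trumpvoter": [0, 0, 0, 1, 1, 1],
    "video_should_matter": ["A Great Deal", "A Lot", "A Great Deal",
                            "Not At All", "A Little", "Moderate Amount"],
})

# Column percentages: within each voter group, the share giving each answer.
tab = pd.crosstab(anes["video_should_matter"], anes["trumpvoter"],
                  normalize="columns") * 100
print(tab.round(2))
```

`normalize="columns"` is what makes each voter group sum to 100 percent, matching the table above.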

There’s a definite partisan split here. There are also some differences within each group. Are there any other variables that explain how voters thought people should make sense of the Access Hollywood tape? If there’s something here beyond the basic “who did you vote for?” this may give us some clues about how people will react to Trump’s more recent scandals.

To test this possibility, I ran an ordered logit regression. The ologit model assumes we can create separate bins for each response category and place them in order from left to right. Each independent variable pushes respondents to the left or the right. For example, voting for Trump probably pushes people to the “the tape should not matter at all” side. The ologit model also assumes there are no walls or speed bumps that would keep a respondent in a middle bin instead of going all the way to an extreme if the value on the independent variables is high or low enough.

Here are the independent variables I used:

  • trumpvoter: Did the respondent vote for Trump or someone else? (Non-voters are in the ANES but excluded here)
  • male: Is the respondent male?
  • senior: Is the respondent a senior citizen? (Age 65 or older)
  • collegegrad: Is the respondent a college graduate?
  • nonwhite: Is the respondent someone other than non-Hispanic White? (I also tested more specific racial categories)
  • foxindex: Did the respondent watch The O’Reilly Factor, Hannity or The Kelly File on a “regular basis” the month before the election? I added the yes responses together. The ANES tells respondents to check the box if they saw a particular TV show once in the last month. It’s a very low bar for TV news. Nonetheless, 65.42 percent of Trump voters have a zero here.
  • msnbcindex: Did the respondent watch Hardball, The Rachel Maddow Show or All In With Chris Hayes? I added the yes responses together for a 0-3 scale.
  • anynightlynet: Did the respondent watch any of the network nightly news broadcasts? Since these broadcasts are direct competitors in the same time slot, an index doesn’t make sense here. It’s just a yes/no variable.
  • ideology_post: The respondent’s self-described political ideology on a seven-point scale, where 1 is strong liberal and 7 is strong conservative. Respondents who said they haven’t thought much about their political ideology were excluded (when I double-checked in a separate analysis, they answered like moderates on this question). Strong liberals are the omitted category.

Here are the results. Positive coefficients push people towards saying the Access Hollywood tape should not matter at all. Sadly I have to screenshot this, so apologies for the mess:

[Screenshots: ordered logit regression output]

Even after I added a bunch of control variables, whether or not someone voted for Trump is still the biggest factor determining how much they think the Access Hollywood tape should have mattered to people. It’s not surprising that Trump voters would back their candidate and tell others to ignore the scandal. Loyalty to a particular candidate (or maybe a party) blows most variables out of the water. The gender difference is so small that it is not statistically significant. Fox News didn’t push viewers further right here, although MSNBC pushed viewers a little further left.

The big surprise is the effect of political ideology. When I first ran this model, I treated ideology as a linear variable. I didn’t expect there to be anything all that dramatic. Using political ideology as a categorical variable was one of those last minute “I better double check everything before hitting publish” situations. In this model, strong liberals are the omitted category. The regression coefficients measure the difference between moderates and strong liberals, strong conservatives and strong liberals, etc.

Everyone is considerably to the right of strong liberals’ feelings about the Access Hollywood tape. Maybe a better way to put it is strong liberals felt very strongly that the tape should matter a great deal to people. Other respondents had more mixed opinions. After controlling for who someone voted for, there isn’t much of a difference between moderates and strong conservatives. To help make these regression coefficients more concrete, I used Stata’s margins command to give predicted probabilities for some respondents:

                   Not Trump     Not Trump     Trump        Trump
                   Strong Lib.   Moderate      Moderate     Strong Cons.
A Great Deal         87.36%        42.40%       4.02%        3.38%
A Lot                 7.82%        24.30%       6.66%        5.71%
Moderate Amount       3.52%        18.44%      20.56%       18.46%
A Little              1.00%         6.83%      34.98%       34.58%
Not At All            0.30%         2.22%      33.78%       37.87%

There is a big jump among people who didn’t vote for Trump between strong liberals and moderates. (The median Clinton voter identified as slightly liberal.) There is another big jump between moderates who voted Trump versus moderates who voted for someone else. However, the difference between moderate Trump voters and strong conservative Trump voters is minimal.
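For readers curious what Stata’s margins command is doing under the hood, these predicted probabilities fall directly out of the ordered logit’s cutpoints. Here is a small numpy sketch with made-up cutpoints and linear predictors (not the actual estimates from the post):

```python
import numpy as np

def ologit_probs(xb, cuts):
    """Category probabilities for an ordered logit:
    P(Y = k) = F(cut_k - xb) - F(cut_{k-1} - xb),
    where F is the logistic CDF, cut_0 = -inf, and cut_K = +inf.
    """
    F = lambda z: 1.0 / (1.0 + np.exp(-z))
    cdf = np.concatenate(([0.0], F(np.asarray(cuts, dtype=float) - xb), [1.0]))
    return np.diff(cdf)

# Made-up cutpoints for five response categories ("a great deal" first,
# "not at all" last) and made-up linear predictors for two voter profiles.
cuts = [-1.0, 0.0, 1.0, 2.0]
for label, xb in [("non-Trump moderate", -0.5), ("Trump moderate", 2.0)]:
    print(label, np.round(ologit_probs(xb, cuts), 3))
```

A higher linear predictor shifts probability mass toward the “not at all” end, which is exactly the pattern in the table above for Trump voters.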

Strong liberals rallied around the Access Hollywood tape. I don’t think any of the strong liberals I knew gave Trump a chance of winning after the tape was released. All the criticism of Trump being unqualified because of his incompetence and racism turned into criticism that Trump is morally unqualified because he bragged about sexual assault. How could anyone but the most committed conservative vote for this man? I was always a bit dubious about this argument. Historically, the United States is a bit of an outlier in expecting moral purity from heads of state. Some voters are deeply affected by seeing someone who bragged about sexual assault in the White House. Other voters are more selfish, and mainly want to know what government will do for them.

Without any polling or data, my guess is that the current scandals surrounding Trump will play differently. Trump fired the head of the FBI and is making sweeping changes to law enforcement philosophy. Trump gave classified intelligence to the Russians and may have deep ties to Putin. This would fundamentally weaken national security. Trump’s current scandals are less symbolic. It’s easier to connect Trump’s latest actions to dangerous policy. If strong liberals focus on the tangible implications of Trump’s scandals – not just the symbolism – it may be possible to pull moderates and weaker conservatives away from Trump.


Facebook Dives Into the Echo Chamber

Earlier today, several in-house researchers at Facebook published a study in Science regarding how much users engage with links that cut against their ideological beliefs. There are already a lot of thoughtful posts on this article, since there’s a lot to chew on. The basic finding isn’t too surprising: people are less likely to “engage” with links that do not correspond to their stated political beliefs. The authors argue there is a three-step process:

  1. We only see links posted by friends and other pages we follow, and they are not a random group. People tend to congregate on Facebook based on their political ideology.
  2. Facebook’s algorithm does not place all possible stories on our “News Feed” when we log in. It favors posts shared, liked and commented on by friends. The authors do not fully disclose how the algorithm works, but they do find it cuts down on how much people see stories that cross ideological boundaries. 5% of stories were screened out for self-identified liberals, and 8% for self-identified conservatives.
  3. Facebook users don’t click on every link. As I’ll discuss later, Facebook users ignore the vast majority of links to political stories. After controlling for things like the position of the link (people are much more likely to click on the first link when they log in), liberals were 6% less likely to click on a link mainly shared by conservatives. Conservatives were 17% less likely to click on a link mainly shared by liberals.

This process makes sense for an individual story, but it’s a troubling model for studying months’ worth of Facebook user behavior. As Christian Sandvig points out, Facebook’s algorithm is based on what users engage with. In other words, if I tend to click on all of the fantasy baseball links I see in May, I will be more likely to see fantasy baseball links that people share in June. I’ll probably see some fantasy football links too, even though I want no part of fantasy football.

Separating step 3 from step 2 is problematic, but it appears to be the authors’ main goal in interpreting their results: “We conclusively establish that on average in the context of Facebook, individual choices more than algorithms limit exposure to attitude-challenging content.” To continue with the sports reference, this is where sociologists start throwing penalty flags. The interpretation found in the scholarly journal just happens to be the same argument that Andy Mitchell, the director of news and media partnerships for Facebook, gave when facing criticism last month. (See Jay Rosen’s criticism here.) As I argued weeks ago, Facebook isn’t in a position to get the benefit of the doubt. We’ll get back to the problems of how to interpret the article’s findings in a minute. First, it is important to understand how the group that the authors claim to study and the group they actually study are very different.

As Eszter Hargittai and other sociologists have pointed out, 91 percent of Facebook users were excluded from this study because they did not explicitly disclose their political ideology on their Facebook biography. Users were excluded for providing ideologies that weren’t explicitly liberal or conservative – a user who said their politics “are none of your damn business” would be dropped. It is unclear how self-identified “independents” were treated in this study (none of the posts I have seen mentioned this). My political scientist friends would like me to point out that self-identified independents are often treated as “moderates” when they are actually covert partisans. Users who did not log on at least four times a week were dropped as well.

Once Hargittai added all the exclusions, just under three percent of Facebook users were included in the study. As she argues, the 3% figure is far more important than the 10 million observations:

“Can publications and researchers please stop being mesmerized by large numbers and go back to taking the fundamentals of social science seriously? In related news, I recently published a paper asking “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites” that I recommend to folks working through and with big data in the social sciences.”

The 3% of Facebook users who are included in the study are probably different from the 97% who are not. At this point, it would probably be helpful to separate the two groups.

What Happens for People in the Sample?

One of the hardest things for a scholar to do is publish findings that aren’t surprising. We already know that people tend to have social networks with disproportionately like-minded people. The biggest effect that the Facebook researchers found is homophily. We don’t see a random selection of stories when we log on to Facebook because our friends aren’t a random group of humans. We see stories from people who we are friends with – assuming we haven’t muted those friends because of their postings – and from pages we follow. Most media scholars have found some degree of self-selection, and found it is most prominent online. Neither the study’s authors nor its critics want to emphasize this point, but the results seem pretty clear in the graphic below (reproduced from the article):

[Screenshot: exposure graphic reproduced from the article]

Critics focus on the role of the algorithm (the “exposed” line in this graphic) versus the role of users choosing to ignore stories. When I first read the term, I thought “users’ choice” included choice of friends (the big drop for “potential from network” in the graphic). Apparently this only refers to whether users choose to click on a story or not (the last line of the graphic). It does not refer to whether users choose to unfriend or block a user because of their political beliefs. Maybe I’m thinking of this differently because I recently talked with someone who chose to unfriend everyone who didn’t share her political views. If we include adding and dropping friends to the big ledger of “user decisions” and Facebook’s friend suggestion algorithm to the big ledger of “algorithmic influence,” it is much easier to see why the authors would argue user behavior is so important, but I may be giving them more credit than they deserve.

The “News Feed” algorithm picks favorites, and we don’t fully know how, which is very troubling. On the other hand, it is only picking from the narrow subset of stories our friends have posted, and that may be a very narrow ideological range. As I wrote weeks ago, Facebook clearly has its thumb on the scale by not showing everything on a user’s “News Feed” when they log in. Facebook’s in-house researchers acknowledge some degree of algorithmic censorship of stories that are mainly shared by the other side instead of the user’s side. The effect is 5% for liberals and 8% for conservatives. This looks like Facebook has its thumb on the scale. However, the weight comes from who we are friends with.

Click-throughs as the End Measure? Really?

The emphasis on people clicking links was surprising to me, because “clickthroughs” are relatively rare. This study only included people who provided their political ideology on their Facebook pages. These users are likely to be more engaged with politics. We would expect them to be more likely to click on political links than other users. But the overall click-through rate reported in this article was only 6.53 percent. As many scholars and writers are finding, social media “engagement” often has a very low correlation with reading the link. In many cases, the low correlation is driven by posts that get a lot of likes and comments, even though people don’t read the story that gets linked to.

Imagine someone linked to a story about Hillary Clinton’s speech where she advocated for more pathways to citizenship for undocumented immigrants. Furthermore, imagine the person sharing the story is a conservative, arguing against Clinton’s “pro-amnesty” position. Other conservatives may rally around the Facebook post, seeing it as an opportunity to voice their complaints about Clinton instead of new information to be consumed to make more informed decisions in the democratic process. This behavior happens on both sides of the aisle. Progressives may post a link about Rand Paul’s avoiding a campaign stop in Baltimore for the same reason.

What About People Excluded from the Study?

97% of Facebook users were excluded from the study. Some of these users will be just as partisan and ideological as the people who were included in the study; they just declined to put their ideology in their bio page. Other users may be less ideological or less interested in politics. Because most people interested in Facebook’s effect on the news are interested in political news, it is easy to overlook the fact that a lot of people who write online may not be all that interested in politics. (In my dissertation, I found a strong preference for bloggers publishing non-political phrases instead of political phrases during the time period of the 2008 election, but there are critical methodological differences between repeating phrases and showing holistic interest.)

If people do not engage in posting political stories or reading most political links on Facebook, would we expect them to learn anything about politics when they log on? I’m not sure if any research has been published specifically on this question yet, but studies of television “infotainment” suggest the answer is yes. Matthew Baum and Angela Jamison found that people who avoided the news but regularly watched shows like Oprah and David Letterman were better informed about politics than people who avoided both the news and those shows. (Full disclosure, I worked as an RA for years on a project with Tim Groeling and Matt Baum.) Watching the news or reading the newspaper provides more information than “soft news,” but soft news can be surprisingly effective in communicating the broad strokes of current events.

Skimming Facebook may also give people the broad strokes of current events. People who have read their Facebook wall in the last two weeks may know there was a riot or uprising in Baltimore, even if they do not regularly watch the news or click on links to news stories. The difference with Facebook is that exposure to political information is largely contingent on who your friends are, and your friends are more likely than not to congregate on the same side of the political spectrum. Thus, some people may have heard about the Baltimore riot while others heard about the Baltimore uprising.

Ironically, it is Mitchell, the Facebook executive, who offers the best advice on how to treat Facebook as a potential news source:

“We have to create a great experience for people on Facebook and give them the content they’re interested in. And like I said earlier, Facebook should be a complimentary news source.”

The problem is that skimming Facebook could make it easy for people to feel like they are getting informed without actually being informed.

March Math-ness?

Nate Silver decided to launch fivethirtyeight.com on Monday as opposed to last week so he could launch the site with his own NCAA Tournament predictions and a little strategy about how to fill out your bracket. Even though college basketball’s single elimination format makes outcomes less predictable, lots of people are looking for any edge they can get in their March Madness pools. Silver’s timing makes sense. Outside of elections, the NCAA Tournament may have the highest demand for quantitative predictions of outcomes here in the United States.

While Silver defines his political coverage as a kind of corrective to the predictions of traditional pundits, college basketball is not an area that appears to need this corrective. If you happen to sneak away from work tomorrow to watch opening round games of the NCAA Basketball Tournament, you will probably see a preview screen that looks like this while waiting for a game to start:

[Screenshot: CBS online preview screen]

The graphics for “keys to the game” are difficult to read in this screenshot, but I’m not here to troll the lack of contrast. Instead, I want to point out how CBS’ online preview screen is embracing advanced statistics. If you could read the screen, you would see the “keys to the game” are eFG%, FT Rate, OR% and TO%. CBS is telling us each team’s record, but it isn’t showing the traditional preview numbers, like Iowa scoring 82 points per game and Tennessee scoring 71.3. CBS isn’t telling us a count of which team gets more rebounds or which team commits more turnovers.

In college basketball, it is critical to look at statistics as rates instead of counts when predicting a winner, because some teams play much faster or slower than others. (I’m sure friends who are rooting for Wisconsin or Virginia can sympathize.) According to Dean Oliver, the four rates CBS put on its preview screen are the best way to predict basketball success. Quantitative analysts debate Oliver’s weighting (he has shooting at 40%, turnovers 25%, rebounding 20%, free throws 15%), but Oliver’s “four factors” is a common starting point for analysis.
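For the curious, the four factors are easy to compute from ordinary box-score counts. Here is a minimal sketch using the standard published formulas; the box score is invented, and note that FT Rate is sometimes computed as FTM/FGA rather than the FTA/FGA used here:

```python
def four_factors(fgm, fga, fg3m, fta, orb, opp_drb, tov):
    """Dean Oliver's four factors from team box-score totals."""
    possessions = fga - orb + tov + 0.44 * fta  # standard possession estimate
    return {
        "eFG%": (fgm + 0.5 * fg3m) / fga,  # shooting, with 3s worth extra
        "TO%": tov / possessions,          # turnovers per possession
        "OR%": orb / (orb + opp_drb),      # share of available off. rebounds
        "FT Rate": fta / fga,              # how often a team gets to the line
    }

# Invented box score: 28-of-60 from the field including 8 threes,
# 20 FT attempts, 12 offensive rebounds vs. 25 opponent defensive
# rebounds, and 13 turnovers.
print(four_factors(fgm=28, fga=60, fg3m=8, fta=20,
                   orb=12, opp_drb=25, tov=13))
```

Expressing each factor as a rate rather than a raw count is exactly why these numbers travel well between fast and slow teams.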

In many ways, it makes sense that college basketball would be the sport where mainstream national columnists and broadcasters adopted “advanced statistics” with relatively little resistance. Anyone who spends tomorrow watching the NCAA Tournament will realize that a 10 point Wisconsin lead feels very different than a 10 point Oregon or BYU lead. It’s easy to convince columnists that rates are more informative than counts. It’s also easy for journalists and broadcasters to find an analyst who will take all the different rates and a few other key variables, combining them into one number for offensive and defensive “efficiency.” (I subscribe to Ken Pomeroy’s stats – it’s only $20 and a great investment for a college hoops fan.)

On the other hand, CBS’ preview graphic leaves a lot to be desired as a way to introduce casual viewers to these ideas. For starters, they only present offensive statistics, ignoring defense. Viewers do not get to see Tennessee’s defensive prowess throughout the year. Second, the acronyms on the screen won’t make sense to someone who doesn’t already know them. Live announcers assume viewers don’t know the acronyms and will explain the concepts on-air. Third, the photos do not feature either of Tennessee’s star players, which seems like a particularly glaring mistake for an upcoming TV broadcast! That being said, it’s an encouraging sign that CBS does not fear putting more sophisticated statistics on screen for the entire audience.