Sports Officials Get Outside Help. What If We Gave It to Debate Moderators?

A few weeks ago I posted about how moderating a debate or candidate forum is a losing proposition. It’s almost impossible to end with over half the audience happy with your performance. Moderators who challenge a candidate will be scorned by that candidate’s supporters. Moderators who don’t challenge candidates may come off as meek and lose face with partisans on both sides. Challenging a candidate tends to hurt the moderator’s future career, so I wouldn’t expect a lot of fireworks coming from Lester Holt later today.

Since I wrote that post, both campaigns have argued over whether the moderator should act as a fact checker. Trump laid some of the groundwork last week by claiming that Holt is a Democrat, even though he is registered as a Republican. Here’s Robby Mook, Clinton campaign manager, appearing on ABC’s This Week:

“All that we’re asking is that if Donald Trump lies, that it’s pointed out. It’s unfair to ask that Hillary Clinton both play traffic cop with Trump, make sure that his lies are corrected, and also to present her vision for what she wants to do for the American people.”

As we might expect, Trump campaign manager Kellyanne Conway disagreed:

“I really don’t appreciate campaigns thinking it is the job of the media to go and be these virtual fact-checkers and that these debate moderators should somehow do their bidding.”

Historically, the question of whether debate moderators should be aggressive fact checkers was not a partisan issue. Candidates and journalists favored staying out of the way and letting the candidates be the story. Janet Brown, head of the committee that organized the presidential debates, endorsed this view on CNN yesterday. For this election cycle, many journalists and pundits have argued Trump requires special rules (see this Slate interview with the executive editor of the New York Times, then see Jay Rosen here for a longer version of that argument). As a moral issue, I have favored more aggressive fact checking since I first got the right to vote in a presidential election. However, I also felt confident in my own ability to judge whether candidates were telling the truth without relying on the moderator.

The more time I spend studying journalism and then watching sports in my free time, the more I doubt whether any moderator could meet my fact-checking expectations. In the last week I saw a pitch right in the middle of the strike zone get called a ball. That umpire faced an easy, objective, technical call and got it wrong. I went to UCLA, which means I have seen a lot of Pac-12 sports. If you’re a college sports fan, you won’t be surprised that when I typed “pac 12 refs” into Google the first auto-complete was “are the worst.” The conference’s officials are notorious for baffling and inconsistent interpretations of the rules for football and basketball. Then again, even the best officials in the sports world make mistakes. Why do we expect debate moderators to be perfect?

When I was a reporter, I made a bunch of mistakes in interviews. Sometimes I caught people lying to me right away. Sometimes I had to look things up afterwards. There were a lot of times when I looked back in my notes and didn’t have as much material as I thought I did, and I really wish I could have followed up on things. On a national stage, with less cooperative sources in Donald Trump and Hillary Clinton, catching all the lies is much harder.

As much as I care about sports, I think the risk of a “blown call” in live fact-checking of a presidential debate is much more serious. Sports leagues have universally adopted the use of supplemental off-field officials as a way to get calls right. Professional reviews from the league office help insulate the on-field officials from hostile crowds. It seems absurd to expect Lester Holt or any other moderator to do the entire job on their own, with no help. Marvel’s latest superhero couldn’t achieve that feat, let alone a real person.

Simulating How a Jury of Fact Checkers Would Work

Since I am also a stats person, I decided to run a few very simple simulations to try and explain why moderator error is a bigger risk than people realize, and how a large jury of outside fact checkers could solve the problem. Let’s assume that if Clinton and Trump talked forever, they would each say 10,000 things that are fact-checkable and false. It’s probably best to call them the Clinton lying bot and the Trump lying bot, because this isn’t a simulation of how often candidates lie. This is a simulation of how well moderators could catch lying and what could happen when moderators are imperfect. (We could call them the ice bot and the fire bot instead of naming them after candidates; it makes no difference to Stata.)

I started by creating an aggressive, skilled, courageous moderator. This moderator will roll a six-sided die every time one of the candidate bots lies to them. On a 1, they don’t notice the lie right away. On a 2 through 6, they challenge the lie the next time they get to speak. No, I don’t expect a debate moderator to do any better than this while they also have to think about how to fit a large range of topics into a small time frame.

(Sidenote: The limited time frame is another strong and largely unmentioned issue in the current fact checking debate. Political journalists will stop following up on a particular topic once it becomes clear that the president will not give a direct answer on topic X, because they might give good answers on topics Y and Z.)

Anyone who plays tabletop games or knows basic probability can guess what happens in the simulation. The aggressive moderator caught nine of the Trump bot’s first ten lies, and eight of the Clinton bot’s first ten. Over the first 30 statements, this moderator is catching 83.33% of the Clinton bot’s lies – the predicted mean. However, they caught 87% of the Trump bot’s lies in this period. Over the full dataset the aggressive moderator would stay just as aggressive (it’s a simulation, not real life), catching 83.19% of Clinton bot lies and 83.27% of Trump bot lies.
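The original simulations were run in Stata. What follows is only a minimal Python sketch of the same dice-rolling idea, not the code behind the numbers above. The function names and the random seed are assumptions made for illustration, so the exact percentages it prints will differ from the ones quoted here.

```python
import random

def simulate_moderator(catch_faces, n_lies=10_000, seed=0):
    """Roll one six-sided die per lie; True means the lie was challenged."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs repeat
    return [rng.randint(1, 6) in catch_faces for _ in range(n_lies)]

def catch_rate(catches, first_n):
    """Share of the first `first_n` lies that the moderator challenged."""
    return sum(catches[:first_n]) / first_n

# Aggressive moderator: misses on a 1, challenges on a 2 through 6.
aggressive = simulate_moderator(catch_faces={2, 3, 4, 5, 6})

for n in (10, 30, 10_000):
    print(f"first {n:>6} lies: caught {catch_rate(aggressive, n):.2%}")
```

The short windows can land well above or below five in six, while the full run of 10,000 rolls should settle close to it.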

Next I created a moderator who is really bad at fact checking. Maybe they have a very high threshold for challenging a politician. Maybe they really want to fact check but can’t focus on what a candidate is saying right now and the next question all at the same time. Either way, this moderator still gets to roll a six-sided die for every lie, but they only challenge the lie on a 6. This moderator challenged three of the Clinton bot’s first ten lies, while only challenging one of the Trump bot’s first ten. By thirty observations the poor moderator is catching one out of every six Clinton bot lies, but is still stuck at catching only one of ten from Trump bot.
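A gap like three catches versus one over ten lies is what a binomial distribution predicts for so few rolls. Here is a small sketch of that arithmetic, assuming each lie is an independent die roll that succeeds only on a 6. The helper name prob_catches is hypothetical.

```python
from math import comb

def prob_catches(k, n_lies=10, p_catch=1/6):
    """Probability of challenging exactly k of n_lies, one die roll per lie."""
    return comb(n_lies, k) * p_catch**k * (1 - p_catch)**(n_lies - k)

# Chance the poor moderator challenges at most one of ten lies.
at_most_one = sum(prob_catches(k) for k in (0, 1))
# Chance the same moderator challenges three or more of ten lies.
at_least_three = 1 - sum(prob_catches(k) for k in (0, 1, 2))

print(f"P(at most 1 of 10 challenged):  {at_most_one:.3f}")
print(f"P(at least 3 of 10 challenged): {at_least_three:.3f}")
```

Under these assumptions the first outcome happens a little under half the time and the second a bit more than one time in five, so seeing both in a single pair of ten-lie runs is unremarkable.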

Let’s imagine Lester Holt misses a lie during the real debate. Maybe he catches some but not others. There is a limit to how many lies a candidate can tell in 90 minutes – they are long-winded and repetitive speakers. I wouldn’t expect a large enough sample of lies for a moderator’s forgetfulness to balance out. What are the chances that partisan audiences will tweet “Oh Lester Holt just made an innocent mistake. Things happen. Nobody is perfect”? I’m going to pan over to Holt’s colleague Matt Lauer and say the chance of Holt getting a pass is zero. We have no idea what’s going on in a moderator’s head. We don’t know if the failure to challenge a presidential candidate is an innocent mistake or a more serious attempt to influence voters. And I’m not sure we care, because even an innocent mistake can have real consequences.

What would happen if we had a room of 100 good fact checkers? I ran several simulations creating 100 fact checkers for each of Trump bot’s lies and Clinton bot’s lies. To start with, I rolled a six-sided die to set each fact checker’s evaluation for each political bot’s 10,000 lies. What I want to do here is show how different juries would make sense of those impressions and whether they would buzz the moderator saying “this response is a lie, you MUST follow up!”

Let’s assume we had a jury full of good moderators who catch a lie with a 2-6 on their die roll. With this large a group none of the 20,000 total lies in the database was red flagged by my entire group of fact-checkers. However, the crowd can pick up an individual’s mistake. Every statement was red-flagged by at least 66 fact checkers. If we could find 100 great fact checkers, they would be far superior to any individual moderator trying to fact check in real time. The converse is also true. If we got 100 of the bad fact checkers together, each needing to roll a six to catch the lie, they would never agree on whether to buzz the moderator.
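The two uniform juries can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the Stata code behind the numbers above: jury_flag_counts is a hypothetical helper, the seed is arbitrary, and a uniform random draw below 5/6 (or 1/6) stands in for rolling a 2 through 6 (or a 6) on the die.

```python
import random

def jury_flag_counts(p_catch, jurors=100, n_lies=20_000, seed=1):
    """For each lie, count how many of the jurors flag it."""
    rng = random.Random(seed)  # arbitrary seed, chosen for repeatability
    return [
        sum(rng.random() < p_catch for _juror in range(jurors))
        for _lie in range(n_lies)
    ]

good = jury_flag_counts(5 / 6)  # every juror catches a lie on a 2-6
bad = jury_flag_counts(1 / 6)   # every juror catches a lie only on a 6

print("good jury, fewest flags on any lie:", min(good))
print("good jury, lies flagged by all 100:", sum(c == 100 for c in good))
print("bad jury, most flags on any lie:   ", max(bad))
print("bad jury, lies with a majority:    ", sum(c > 50 for c in bad))
```

The exact minimum and maximum depend on the seed, but the good jury should clear a simple majority on every lie and the bad jury should never get close to one.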

In the real world, a lot of fact checking watchdogs are politically motivated. So let’s assume we have a fact checking jury of 25% Clinton supporters, 25% Trump supporters, 35% good fact checkers, 15% bad fact checkers. For this simulation the partisans will call out the opposing bot if they roll a 2-6. I also decided they would call out their own bot on a 6: partisans may hope a second question pushes their bot to a more acceptable answer. In this scenario the median is 57% of the jury detecting a lie. If it only took a simple majority to buzz the moderator and demand a follow up, this jury would be effective 95 percent of the time. If the jury acted like they had to break through a filibuster in the Senate, random error would be a much bigger issue. This partisan jury would start by buzzing in for three Clinton bot lies but only one Trump bot lie. After 300 statements the odds even out, but that’s a lot to ask.
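The majority and supermajority figures can also be checked without any random draws, by building the exact distribution of flag counts one juror at a time. This is a sketch under assumptions: the function names are hypothetical, the jury is viewed from the side of a single bot (so 25 partisans oppose it and 25 support it), and a 60-vote threshold is my stand-in for the filibuster comparison.

```python
def flag_count_distribution(groups):
    """Exact distribution of how many jurors flag one lie.

    groups: list of (number_of_jurors, probability_each_one_flags).
    Returns dist where dist[k] = P(exactly k jurors flag the lie).
    """
    dist = [1.0]
    for count, p in groups:
        for _ in range(count):
            new = [0.0] * (len(dist) + 1)
            for k, prob in enumerate(dist):
                new[k] += prob * (1 - p)   # this juror stays silent
                new[k + 1] += prob * p     # this juror flags the lie
            dist = new
    return dist

def prob_at_least(dist, threshold):
    """Probability that at least `threshold` jurors flag the lie."""
    return sum(dist[threshold:])

# Opposing partisans and good checkers flag on a 2-6 (5/6);
# the bot's own partisans and bad checkers flag only on a 6 (1/6).
partisan_jury = [(25, 5 / 6), (25, 1 / 6), (35, 5 / 6), (15, 1 / 6)]

dist = flag_count_distribution(partisan_jury)
expected = sum(k * p for k, p in enumerate(dist))

print(f"expected flags per lie: {expected:.1f} of 100")
print(f"P(51 or more flags):    {prob_at_least(dist, 51):.3f}")
print(f"P(60 or more flags):    {prob_at_least(dist, 60):.3f}")
```

The expected count works out to 340/6, or about 57 of 100 jurors, which lines up with the median reported above. Raising the threshold from 51 to 60 moves the bar from below that center to above it, which is why random error matters so much more under a supermajority rule.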

I thought an ideal situation would be having a range of debate jurors. I made one last room with 15% dedicated Clinton supporters and 15% dedicated Trump supporters. Then I added leaners: 20% who lean toward Clinton and 20% who lean toward Trump. Leaners buzz in for the opposing bot’s lie on a 3 through 6 and for their own candidate’s bot on a 5 or 6. The jury also has 15% good moderators and 15% bad moderators. It turns out the balanced debate jury was also the most unstable in simulations. The median result was a 50-50 deadlock. At this point it becomes rather philosophical. For debate juries to work better than a sole moderator, the key appears to be packing the jury with people willing (if not eager) to challenge both candidates if and when they distort the truth.
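To see how often a jury like this would actually buzz the moderator, here is a minimal dice-rolling sketch in Python. It is an illustration, not the original Stata code. I am assuming the dedicated partisans behave like the partisans in the previous room (2 through 6 against the other side, 6 against their own), that a buzz takes 51 of 100 jurors, and that the jury is viewed from the side of a single bot. The names and the seed are hypothetical.

```python
import random

# Each entry: (number of jurors, die faces on which they flag this bot's lie).
BALANCED_JURY = [
    (15, {2, 3, 4, 5, 6}),  # dedicated supporters of the other candidate
    (15, {6}),              # dedicated supporters of this bot's candidate
    (20, {3, 4, 5, 6}),     # lean toward the other candidate
    (20, {5, 6}),           # lean toward this bot's candidate
    (15, {2, 3, 4, 5, 6}),  # good fact checkers
    (15, {6}),              # bad fact checkers
]

def simulate_buzz_rate(jury, n_lies=10_000, threshold=51, seed=2):
    """Share of lies flagged by at least `threshold` jurors."""
    rng = random.Random(seed)  # arbitrary seed, chosen for repeatability
    buzzes = 0
    for _lie in range(n_lies):
        flags = sum(
            rng.randint(1, 6) in faces
            for count, faces in jury
            for _juror in range(count)
        )
        buzzes += flags >= threshold
    return buzzes / n_lies

rate = simulate_buzz_rate(BALANCED_JURY)
print(f"share of lies that trigger a buzz: {rate:.1%}")
```

Every group in this jury is mirrored by a group of the same size with the opposite odds, so the flag count is symmetric around 50 and a simple-majority buzz should happen a bit less than half the time. That is the deadlock described above.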


About Noah Grand

PhD in Sociology. I use statistics to predict news coverage. And home runs.
