
Striking a balance: automated and human content moderation

Rushi Bhavsar

COVID-19 was accompanied by a surge of misinformation that laid siege to social media platforms at the very moment that they became indispensable. What made the problem even knottier was that these platforms were forced to confront this crisis without their first line of defense — content moderators.

Content moderators are the unsung heroes of the internet — an invisible but omnipresent network of key workers who help sanitize the internet for the rest of us by filtering out the most dangerous and harmful content being posted. Because these moderators often have access to highly sensitive data, they are required to work in closely monitored, call-centre-like settings. This work is considered too sensitive to be done from home, and so lockdown conditions under COVID-19 have led to a drastic increase in the volume of automated content moderation — a planet-scale experiment testing computers' ability to police speech online.

Barring some notable exceptions, this push towards a technology-driven response to policing content has been a success. It might seem hard to believe given the extent to which misinformation has seeped into our societies today, but the fact remains that it could have been far worse, and that the major social media platforms have still protected billions of people from the worst infodemic to date. However, it is time to consider how global moderation systems may evolve as a result of the pandemic, and how that will affect the media landscape as a whole.

Perhaps the biggest problem with end-to-end automated moderation is that these systems are blunt instruments that can't read nuance in speech the way humans can. For example, while Facebook’s automated systems remove 95% of posts involving adult nudity, terrorism and child exploitation, they catch only 16% of posts involving bullying and harassment. The disparities are even more glaring when it comes to non-English content.

False Positives and Brand Safety

Leaving aside the deeper ethical issues around systemic biases in machine learning systems, the simplest way to think about the effects of automated moderation is through the lens of error management theory. When a system makes decisions under conditions of uncertainty (such as moderation), it can make two kinds of errors: Type 1 errors, or false positives, where the system decides that a piece of content is problematic when it isn't, and Type 2 errors, or false negatives, where the system decides that a piece of content isn't problematic when it is.
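
To make the two error types concrete, here is a minimal, purely illustrative Python sketch. The toxicity scores, labels and thresholds are invented and stand in for whatever signal a real moderation system would use; the point is only the trade-off between the two errors.

```python
# Minimal sketch of the two error types in an automated moderation pipeline.
# The scores, labels, and thresholds below are purely illustrative.

posts = [
    # (hypothetical classifier toxicity score, human judgement)
    (0.92, "harmful"),
    (0.71, "benign"),   # e.g. sarcasm the model misreads
    (0.40, "harmful"),  # e.g. subtle harassment the model misses
    (0.10, "benign"),
]

def count_errors(posts, threshold):
    false_positives = sum(1 for score, label in posts
                          if score >= threshold and label == "benign")
    false_negatives = sum(1 for score, label in posts
                          if score < threshold and label == "harmful")
    return false_positives, false_negatives

# A stricter (lower) threshold removes more harmful posts (fewer Type 2 errors)
# but also takes down more legitimate speech (more Type 1 errors).
for threshold in (0.3, 0.6, 0.9):
    fp, fn = count_errors(posts, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

There is no threshold that drives both error counts to zero; choosing one is choosing which kind of mistake you would rather make.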

When COVID-19 first hit, Facebook, Google and Twitter immediately declared that a consequence of increased automated moderation would be an increase in false positives: any potentially dangerous content would be taken down, including posts that might have passed review in the past. A version of this bias was also seen in the programmatic advertising space, where brand safety filters based on semantic strings immediately prevented any ads from running on web pages containing COVID-19-related content.
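
The bluntness of string-based filtering is easy to illustrate. The sketch below assumes a hypothetical blocklist-style filter; the keywords and page texts are invented, but the behaviour mirrors how a filter keyed on COVID-19-related terms blocks a public-health explainer just as readily as a piece of misinformation.

```python
# Illustrative sketch of a keyword-based ("semantic string") brand safety filter.
# The blocklist and page texts are invented for demonstration.

BLOCKLIST = {"covid-19", "coronavirus", "pandemic"}

def is_brand_safe(page_text: str) -> bool:
    """Return False if any blocked keyword appears anywhere on the page."""
    text = page_text.lower()
    return not any(keyword in text for keyword in BLOCKLIST)

# A reputable public-health explainer is blocked just as readily as misinformation:
print(is_brand_safe("How to wash your hands properly during the coronavirus outbreak"))  # False
print(is_brand_safe("Ten recipes for an easy weeknight dinner"))                          # True
```

The filter has no sense of whether the page is trustworthy reporting or conspiracy content; it only knows that a string appeared.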

While this bias towards false positives makes sense in the context of viral disinformation during a pandemic, we must use this opportunity to think deeply about the consequences of this shift towards increased automation.

To begin with, we must ask ourselves whether un-nuanced automated rules and a bias towards false positives are really the right path to brand safety, especially if they come at the cost of damaging the publisher landscape as a whole. Improved publisher whitelisting combined with a renewed focus on contextual advertising are possible starting points.
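
As a rough illustration of that direction, the hypothetical sketch below gates ad delivery on an allowlist of vetted publishers plus a coarse topical match, rather than on individual keywords; the domains and topic labels are invented.

```python
# Hypothetical sketch: publisher allowlisting plus a simple contextual check,
# instead of keyword blocking. Domains and topics are illustrative only.

TRUSTED_PUBLISHERS = {"example-news.com", "example-health.org"}

def should_serve_ad(domain: str, page_topic: str, ad_topic: str) -> bool:
    # Serve only on vetted publishers, and only when the page context
    # is relevant to the ad, rather than blocking on individual keywords.
    return domain in TRUSTED_PUBLISHERS and page_topic == ad_topic

print(should_serve_ad("example-news.com", "health", "health"))  # True
print(should_serve_ad("unknown-blog.net", "health", "health"))  # False
```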

Next, we have to come to terms with the fact that no matter how advanced automated systems get, they may always require humans to adjust for context. As the stewards of the critical infrastructure that is social media, moderators need to be considered essential workers. As Mark Zuckerberg himself stated to the Washington Post, "I think there will always be humans in the loop."

Finally, when we think about how much automation we need in the loop, we should reject a black-and-white dichotomy framed as either a fully automated, AI-moderated internet or a free-for-all of "unbearable human self-expression," as described by Sarah Roberts at UCLA. She reminds us that humans are not "infrastructure." The tens of thousands of moderators worldwide are not mere cogs in the machine, but rather incredible supercomputers capable of parsing nuance, language and culture, aided by our best attempts at artificial intelligence. They work best as a team.