One of Twitter’s first ethics projects, years earlier, had found that the system inaccurately labeled some tweets as marginally abusive, but the fix it applied covered only a small set of words, Yee says. The team now wanted to develop a more comprehensive patch. Otherwise, “over-penalization risks hurting the very communities that these tools are meant to protect,” Yee said at a conference last year.
Abuse-detection algorithms at other companies, including Google and Facebook, had been shown to struggle with African American vernacular, as well as traditionally hateful speech that targeted populations had reclaimed for their own use. The speaker and the context matter significantly in these cases, but those variables can be lost on algorithms that developers have not set up carefully.
At the time Yee studied it, Twitter’s marginal abuse system scored tweets for content that was insulting, malicious, or encouraging of dangerous behavior. The company trained it on a sample of tweets that it hired people to rate.
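Twitter has not published that system, but the basic setup it describes, a text classifier trained on human-rated examples that then scores new tweets, can be sketched in a few lines. Everything below, from the library choice to the sample labels, is an illustrative assumption rather than a description of Twitter’s code.

```python
# A minimal sketch, assuming a scikit-learn-style pipeline; Twitter’s real
# marginal-abuse model is not public, so every detail here is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical human-rated sample: 1 = marginally abusive, 0 = within the rules.
rated_tweets = [
    "you are an embarrassment to this site",
    "great thread, thanks for sharing",
]
rater_labels = [1, 0]

# Train a simple classifier on the rated sample.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(rated_tweets, rater_labels)

# The fitted model then scores new tweets for how likely they are to be
# insulting, malicious, or encouraging of dangerous behavior.
scores = model.predict_proba(["a new tweet to score"])[:, 1]
```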
A novel automated analysis by META identified the 350 terms most strongly associated with tweets that had been inaccurately flagged as marginally abusive. The team grouped them into several categories, including identity-related terms (Chinese, deaf), geographies (Palestine, Africa), political identity (feminist, Tories), and current events (cop, abortions). For the 33 terms whose errors researchers considered particularly concerning, including “queer” and “Jewish,” they retrained the machine-learning system with nearly 50,000 additional examples of the terms being used within Twitter’s rules.
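The procedure described above, ranking terms by their association with false positives and then padding the training data with in-policy uses of the most concerning ones, might look roughly like the following. The function names, thresholds, and data shapes here are hypothetical; Twitter has not released this code.

```python
from collections import Counter

def terms_linked_to_false_positives(flagged_tweets, human_labels, top_k=350):
    """Rank terms by how often they appear in tweets the model flagged as
    marginally abusive but human raters judged to be within the rules.
    `flagged_tweets` is a list of tokenized tweets the model flagged;
    `human_labels` holds the raters' verdicts (True = within the rules)."""
    false_positive_terms = Counter()
    all_flagged_terms = Counter()
    for tokens, within_rules in zip(flagged_tweets, human_labels):
        for term in set(tokens):
            all_flagged_terms[term] += 1
            if within_rules:  # the model flagged it, but raters said it was fine
                false_positive_terms[term] += 1
    # Score each term by the share of its flagged appearances that were errors,
    # ignoring rare terms to avoid noisy ratios.
    scores = {
        term: false_positive_terms[term] / count
        for term, count in all_flagged_terms.items()
        if count >= 20
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def augment_training_set(training_set, in_policy_examples):
    """Add human-verified, within-the-rules uses of a concerning term,
    labeled as non-abusive (0), before retraining the classifier."""
    return training_set + [(text, 0) for text in in_policy_examples]
```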
The adjusted model wrongly flagged tweets less often for some of the terms, without substantially worsening the system’s overall ability to predict whether something was actually problematic. Although the improvements were small and not universal, the team judged the adjusted model better and deployed it on Twitter in mid-2022. Yee says the update laid a foundation for future research and updates. “This should really be seen not as the ending point, with like a perfect bow on it, but rather the starting point,” she says.
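The kind of check that paragraph describes, comparing per-term false-positive rates before and after retraining while confirming that overall accuracy has not slipped, could be sketched as below. The metrics and inputs are again assumptions made for illustration, not Twitter’s published evaluation.

```python
def false_positive_rate(predictions, labels):
    """Share of genuinely in-policy tweets (label 0) that the model still flags."""
    flags_on_in_policy = [p for p, y in zip(predictions, labels) if y == 0]
    return sum(flags_on_in_policy) / max(len(flags_on_in_policy), 1)

def compare_models(old_preds, new_preds, labels, tweets, term):
    """Report per-term false-positive rates and overall accuracy for both models."""
    idx = [i for i, t in enumerate(tweets) if term in t.lower()]
    return {
        "term": term,
        # How often each model wrongly flags in-policy tweets containing the term.
        "old_fpr": false_positive_rate([old_preds[i] for i in idx],
                                       [labels[i] for i in idx]),
        "new_fpr": false_positive_rate([new_preds[i] for i in idx],
                                       [labels[i] for i in idx]),
        # Overall accuracy, to check the retraining has not hurt the model broadly.
        "old_accuracy": sum(p == y for p, y in zip(old_preds, labels)) / len(labels),
        "new_accuracy": sum(p == y for p, y in zip(new_preds, labels)) / len(labels),
    }
```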
But Twitter has lost at least two-thirds of its workforce in its first quarter under Musk, and work on further changes to abuse detection has likely stopped. It is just one of many shelved projects whose absence could, over time, make Twitter a worse place to be.
“There’s nobody there to do this work. There’s nobody there who cares. There’s no one picking up the phone,” Chowdhury says. The META team’s projects had included testing a dashboard to identify, in real time, the amplification of different political parties on Twitter, in hopes of catching bot networks seeking to manipulate discussions.
By some accounts, Twitter is already growing sour. A survey of 20 popular LGBTQ+ organizations and personalities, published this month by Amnesty International USA and other groups, found that 12 had experienced an increase in hateful and abusive speech on Twitter since Musk became owner; the rest had not noticed a change.
Shoshana Goldberg, director of public education and research at the Human Rights Campaign Foundation, says much of the most-viewed hate speech comes from a relatively small number of accounts, so even with few researchers left, Musk could still help marginalized communities on Twitter. “I appreciate when companies are willing to take an internal look,” Goldberg says. “But we also kind of know who is doing this, and not enough is being done to address this.”