Botometer is a supervised machine learning tool, which means it has been trained on examples of accounts already labeled as bots or humans so it can learn to tell the two apart. Yang says Botometer differentiates bots from humans by examining more than 1,000 details associated with a single Twitter account—such as its name, profile picture, followers, and ratio of tweets to retweets—before giving it a score between zero and five. “The higher the score means it’s more likely to be a bot, the lower score means it’s more likely to be a human,” says Yang. “If an account has a score of 4.5, it means it’s really likely to be a bot. But if it’s 1.2, it’s more likely to be a human.”
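For readers curious what that looks like in practice, here is a minimal sketch using the botometer Python client published by the tool’s developers. The credentials and the handle are placeholders, and the response fields shown follow the client’s documented v4 format, which may differ between API versions.

```python
# Minimal sketch: querying Botometer for one account's 0-5 score.
# Assumes the `botometer` client (pip install botometer) plus placeholder
# RapidAPI and Twitter credentials; the response field names follow the
# client's documented v4 format and are an assumption here.
import botometer

rapidapi_key = "YOUR_RAPIDAPI_KEY"              # placeholder
twitter_app_auth = {
    "consumer_key": "YOUR_CONSUMER_KEY",        # placeholder
    "consumer_secret": "YOUR_CONSUMER_SECRET",  # placeholder
}

bom = botometer.Botometer(
    wait_on_ratelimit=True,
    rapidapi_key=rapidapi_key,
    **twitter_app_auth,
)

result = bom.check_account("@example")  # hypothetical handle

# The overall score sits on a 0-5 scale: higher reads as more bot-like,
# lower as more human-like.
overall = result["display_scores"]["universal"]["overall"]
print(f"Bot score (0-5): {overall}")
```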
Crucially, however, Botometer does not give users a threshold, a definitive number above which every account counts as a bot. Yang says the tool should not be used at all to decide whether individual accounts or groups of accounts are bots. He prefers that it be used comparatively, to gauge whether one conversation topic is more polluted by bots than another.
Still, some researchers continue to use the tool incorrectly, says Yang. And the lack of a threshold has created a gray area: without one, there is no consensus on how to define a bot. Researchers hoping to find more bots can choose a lower threshold than researchers hoping to find fewer. In pursuit of clarity, many disinformation researchers have defaulted to defining a bot as any account that scores above 2.5, the midpoint of Botometer’s scale, according to Florian Gallwitz, a computer science professor at Germany’s Nuremberg Institute of Technology.
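To see why the choice of cutoff matters so much, consider this small illustrative sketch. The scores are invented for the example and stand in for a hypothetical sample of accounts; the same data yields very different “bot” percentages depending on which threshold a researcher picks.

```python
# Illustrative only: invented scores on Botometer's 0-5 scale for a
# hypothetical sample of ten accounts. No real accounts are used.
scores = [0.4, 1.1, 1.9, 2.3, 2.6, 2.8, 3.1, 3.7, 4.2, 4.6]

def bot_share(scores, threshold):
    """Fraction of accounts labeled 'bot' under a given cutoff."""
    return sum(s >= threshold for s in scores) / len(scores)

# Same data, three different cutoffs, three very different answers:
# 60% at 2.5, 30% at 3.5, 10% at 4.5.
for threshold in (2.5, 3.5, 4.5):
    print(f"threshold {threshold}: {bot_share(scores, threshold):.0%} 'bots'")
```

This is the arithmetic behind Yang’s preference for comparing scores across topics rather than declaring any single account a bot.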
Gallwitz is an outspoken critic of Botometer, arguing that it is polluting the way academics study disinformation on Twitter. In July, he published a paper claiming that out of hundreds of accounts scoring 2.5 and above, not a single one was a bot. “Many of these accounts are operated by people with impressive academic and professional credentials,” the paper reads.
One account that Botometer flags as suspicious under the 2.5 threshold belongs to Annalena Baerbock, Germany’s foreign minister; it scores 2.8 (although Botometer warns in the results that “19 percent of accounts with a bot score above 2.8 are labeled as humans”). Baerbock’s team told WIRED that the foreign minister’s account is not automated in any way.
To Gallwitz, these types of false positives prove that Botometer doesn’t work. “It is a tool that everybody can use to produce pseudoscience,” he claims. Gallwitz is frustrated that researchers relying on Botometer do not share examples of the accounts they identified as bots so that others can verify their results. As an example, he points to an August 2022 study by researchers at the University of Adelaide, which used Botometer to claim that between 60 and 80 percent of accounts tweeting pro-Ukraine and pro-Russia hashtags are bots. “We avoid reporting individual-level data due to privacy and ethics,” says Joshua Watt, one of the study’s authors.
Yet Yang is clear: 2.5 should not be used as a threshold, because a score in the middle of the scale signals that the machine learning model is “not really confident.” The allegations in Gallwitz’s study are not new, Yang adds, noting that some people exploit Botometer’s limitations—inevitable for all supervised machine learning algorithms, he argues—to undermine the entire field of study devoted to social bots.
But the threshold is an important detail when assessing the use of Botometer by Musk’s legal team. “Musk’s team didn’t provide any detail on what threshold they used,” adds Yang. “I’m not sure I’m convinced that the number they’ve provided is accurate,” he says. “You can choose any threshold to get any number you want.”