These problems come from the top. Machine learning research is overwhelmingly male and white, a demographic world away from the diverse communities it purports to help. And Big Tech firms don’t just offer online diversions—they hold enormous amounts of power to shape events in the real world.
Birhane and others have branded this “digital colonialism”—arguing that the power of Big Tech rivals the old colonial empires. Its harms will not affect us all equally, she argues: As technology is exported to the global south, it carries embedded Western norms and philosophies along with it. It’s sold as a way of helping people in underdeveloped nations, but it’s often imposed on them without consultation, pushing them further into the margins. “Nobody in Silicon Valley stays up worrying about the unbanked Black women in a rural part of Timbuktu,” Birhane says.
Birhane believes shifting public attitudes will be the most effective driver of change: Big Tech firms respond more to outrage than bureaucratic rule changes. But she has no desire to live in a permanent cloud of bile: As a Black woman doing critical work, she has faced pushback from day one. “I don’t know if I can live my life fighting,” she says. Birhane—who now combines lecturing with a senior fellowship at the Mozilla Foundation—would prefer to let her research do the work. “I am a big proponent of ‘show the data,’” she says.
But Birhane does not think that will be enough—she is not optimistic that Big Tech will self-correct. For every problematic data set that’s revealed and corrected, another lies waiting. Sometimes nothing even changes: In 2021, Birhane and colleagues published a paper about a data set of more than 400 million images, called the LAION-400M data set, which returned explicit pornography when prompted with even mildly feminine words such as “mummy” or “aunty.” The paper triggered outrage, but the data set still exists and has swelled to more than 5 billion images. It recently won an award.
There’s a reason nothing has changed. While creating data sets for AI is fairly simple—just trawl the internet—auditing them is time-consuming and expensive. “Doing the dirty work is just a lot harder,” Birhane says. There is no incentive to make a clean data set—only a profitable one. But this means all that dirty work falls on the shoulders of researchers like Birhane, for whom sifting through these data sets—having to spend hours looking at racist imagery or rape scenes—takes a toll. “It’s really depressing,” she says. “It really can be traumatizing, looking at these things.”
In an ideal world, change would be driven by the vast resources of the tech companies, not by independent researchers. But corporations are not likely to overhaul their ways without considerable pressure. “I want, in an ideal world, a civilized system where corporations will take accountability and responsibility and make sure that the systems they’re putting out are as accurate and as fair and just for everybody,” Birhane says. “But that just feels like it’s asking too much.”
This article appears in the March/April 2023 edition of WIRED UK magazine.