Over the course of 24 hours, a bot built by the YouTuber and AI researcher Yannic Kilcher ran rampant on 4chan, the notorious online forum that serves as an incubator for a variety of extremist ideologies. To the unsuspecting eye, messages from Kilcher’s bot accounts were exactly what one would expect from 4chan posts: insulting, conspiratorial, misogynistic, anti-Semitic, and racist. To the hundreds of 4chan users who interacted with them, the bot accounts, which posted messages generated by a large language model Kilcher built, seemed no different from any other anonymous user contributing to the site’s toxic discourse. 4chan’s users were being hoodwinked by a racist, anti-Semitic, degenerate AI.
Kilcher built the language model in question by training an AI on a dataset of posts scraped from 4chan’s infamous /pol/ message board. He dubbed the model gpt-4chan, and for anyone who has spent time on the site that inspired it, the model is uncannily good at reproducing 4chan’s blend of toxic rhetoric. When I asked the model its opinion of women’s role in society, it described women as a “different species” and “not human.” When I asked its opinion of Black people, its answers were uniformly racist and featured the n-word. Remarkably, Kilcher allowed this bot to post freely on 4chan for 24 hours, setting up a system in which nine instances of the bot posted to the site. While his bot accounts were up and running, Kilcher claims, they contributed 10% of the posts on /pol/. The high volume of messages posted by Kilcher’s bots attracted the attention of 4chan users, who began to speculate on the identity of this anonymous poster. Some correctly identified it as a bot; others found the bot’s messages to bear the hallmarks of human language; still others thought it was a police operation. At no point were 4chan users informed that they were interacting with an AI.
According to Kilcher’s account, gpt-4chan was built by fine-tuning GPT-J, a large language model developed and open-sourced by a collective of independent AI researchers. Large language models represent an increasingly popular—and increasingly accessible—way to demonstrate the power of machine learning systems. Trained on large bodies of text, these models can generate original text that reads as remarkably human and is difficult to distinguish from computer-generated writing’s human counterpart. Such models were pioneered by the research firm OpenAI, whose GPT-3 is probably the most advanced and widely deployed large language model, but they are becoming increasingly available. (The acronym GPT stands for “Generative Pre-trained Transformer,” which refers to a particular type of machine learning architecture.)
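For readers curious what working with such a model looks like in practice, here is a minimal sketch of loading a causal language model and asking it to continue a prompt using the open-source Hugging Face transformers library. It is illustrative only, not Kilcher’s code: the model name, prompt, and sampling settings below are assumptions, and a small stand-in model is used because the full GPT-J checkpoint requires far more memory than a typical laptop offers. A fine-tuned variant such as gpt-4chan would be loaded and prompted the same way.

# Minimal sketch, not Kilcher's actual pipeline: "gpt2" is a small stand-in
# for a larger fine-tuned model like GPT-J; prompt and settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # a fine-tuned checkpoint would be referenced here instead
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Large language models are becoming more accessible because"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling parameters control how varied the generated continuation is.
outputs = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))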
Kilcher’s experiment illustrates the profound ethical questions raised by the proliferation of this technology. The deployment of large language models poses a variety of potential harms—ranging from automated disinformation to spam, fraud, and astroturfing—and gpt-4chan encapsulates many of these concerns. Kilcher’s bots spammed 4chan and exposed users to hateful, racist, anti-Semitic messages, reinforcing users’ impression of how prevalent such language is on the platform. Trained on /pol/’s toxic language, gpt-4chan reproduced that language in original posts with sufficient authenticity to spark an intense debate on the platform about its identity. And the bot’s references to itself—as well as its references to its girlfriend—were among the evidence users cited to argue that the poster must be a human, or perhaps a team of users working together, rather than a bot.
Kilcher’s YouTube video about gpt-4chan sparked an intense debate on Hugging Face, the open-source repository for machine learning models where Kilcher made the model freely available, over whether anyone should be allowed to download it. For now, Hugging Face has gated access to the model as the company continues to develop ethics reviews for models posted on the platform. In a widely cited Twitter thread, the AI ethics researcher Lauren Oakden-Rayner argued that gpt-4chan represents a profound ethical failure in conducting what she calls an “experiment” on unsuspecting users. Kilcher has dismissed these concerns, arguing that his critics have failed to identify any actual harm caused by his model.
This debate highlights how far the AI field has to go in building effective systems to govern the use of increasingly powerful tools. Earlier this month, the AI firms Cohere, OpenAI, and AI21 Labs released a set of best practices for companies deploying large language models, but these policies are neither binding nor widely implemented. The guidelines include prohibitions on misuse, the mitigation of unintentional harm, and thoughtful collaboration with stakeholders—guidelines that Kilcher obviously ignored when he unleashed his model on 4chan.
The contours of acceptable online speech, and the appropriate mechanisms for ensuring meaningful online communities, are among the most contentious policy debates in America today. Moderating content that is not per se illegal but that likely creates significant harm has proven particularly divisive. Many on the left insist digital platforms haven’t done enough to combat hate speech, misinformation, and other potentially harmful material, while many on the right argue that platforms are doing far too much—to the point where “Big Tech” is censoring legitimate speech and effectively infringing on Americans’ fundamental rights. As Congress weighs new regulation for digital platforms, and as states like Texas and Florida create social media legislation of their own, the importance and urgency of the issue are only set to grow. Unfortunately, the debate over free speech online is also being shaped by fundamentally incorrect understandings of the First Amendment, Aileen Nielsen writes.
Antitrust. Federal Trade Commission Chair Lina Khan warned in an interview that her agency is closely watching the video game market, which has seen several major acquisitions in recent months, as well as tech companies’ efforts to expand into virtual-reality technologies. "I think these types of nascent, expanding markets are definitely on our radar and top of mind," Khan told Protocol. "Especially in as much as VR or AR [are] also becoming a major part of how some of these games are functioning for users and [how] users are interacting."
Twitter. Twitter plans to give Elon Musk access to the full, so-called “firehose” API for tweets to address his concerns about the prevalence of bots on the platform. Since launching his takeover bid for the platform, Musk has accused Twitter of failing to provide sufficient data regarding bots, and Twitter is now moving to address that concern by giving him unfettered access to tweets as they are posted. The move is unlikely to give Musk a meaningful tool for assessing the prevalence of bots; rather, it is likely aimed at preempting future arguments from Musk’s lawyers that the platform has been insufficiently cooperative, an argument Musk could use to try to scupper the acquisition.
Cyberwar. After American officials revealed that U.S. military hackers have carried out offensive cyberoperations in support of Ukraine, Russia warned that cyberattacks against its infrastructure could result in a military confrontation. The exact nature of U.S. operations against Russia remains unclear, but U.S. willingness to carry out such attacks—while refraining from a broader military intervention—suggests that American officials are not particularly concerned about the type of reprisals Russian officials are now warning of. American officials have warned that Russia may carry out retaliatory cyberattacks against the United States for its support of Ukraine, but, so far, such attacks have not materialized.
Chinese hacking. U.S. security agencies warned that they have observed Chinese government-backed hackers targeting American telecommunications firms using publicly disclosed cybersecurity vulnerabilities. By carrying out attacks with known vulnerabilities rather than unique tools, Chinese hackers may be attempting to give their attacks greater deniability or to mask their origin altogether.
Crypto. A bill introduced in the Senate this week would treat most cryptocurrencies as commodities and assign responsibility for regulating them to the Commodity Futures Trading Commission (CFTC), rather than the Securities and Exchange Commission (SEC). Granting the CFTC authority over regulating digital currencies has been a long-standing goal of the crypto industry, which would much prefer that the smaller, less powerful body oversee the nascent industry—rather than the SEC, whose head, Gary Gensler, has said he wants far tighter rules for regulating digital currencies.
Voting machines. The U.S. Cybersecurity and Infrastructure Security Agency warned that a voting machine in use in 16 states suffers from software vulnerabilities that could allow an attacker to access sensitive systems. The warning regarding the machines made by Dominion Voting Systems does not include any information indicating that vote totals have been manipulated, but it urges state election officials to address the vulnerabilities quickly. Claims of election fraud involving Dominion machines have been central to the false narrative that former President Donald Trump won the 2020 election; in warning about the Dominion machines’ vulnerabilities, U.S. officials are attempting to walk a fine line, addressing security concerns without providing additional fuel for these conspiracy theories.
Charging ports. European Union officials reached an agreement that will require tech companies to use a standard USB-C charging port for new devices. The move is aimed at reducing electronic waste but is being fiercely resisted by tech companies, who argue it will reduce innovation in charging technology. The rule is set to go into effect in 2024 for portable devices and in 2026 for laptops.
Reports we’re reading
TikTok. A report from the Mozilla Foundation examines the prevalence of disinformation related to the 2022 Kenyan election.
Aging scientists. A new paper finds that as scientists age, they tend to cite older work and become more hostile to new ideas.
A final point
“Taser-equipped drones ... is more than any of us can abide.”
— A statement from members of Axon’s AI Ethics Board who resigned in response to the company’s announcement that it will develop taser-equipped drones and position them in schools to respond to mass shootings.
Google provides financial support to the Brookings Institution, a nonprofit organization devoted to rigorous, independent, in-depth public policy research.