It’s common knowledge that machine learning consumes a lot of energy. All those AI models powering email summaries, regicidal chatbots, and videos of Homer Simpson singing nu-metal are racking up a hefty server bill measured in megawatt-hours. But no one, it seems — not even the companies behind the tech — can say exactly what the cost is.
Estimates do exist, but experts say those figures are partial and contingent, offering only a glimpse of AI’s total energy usage. This is because machine learning models are incredibly variable, able to be configured in ways that dramatically alter their power consumption. Moreover, the organizations best placed to produce a bill — companies like Meta, Microsoft, and OpenAI — simply aren’t sharing the relevant information. (Judy Priest, CTO for cloud operations and innovations at Microsoft, said in an email that the company is currently “investing in developing methodologies to quantify the energy use and carbon impact of AI while working on ways to make large systems more efficient, in both training and application.” OpenAI and Meta did not respond to requests for comment.)
One important factor we can identify is the difference between training a model for the first time and deploying it to users. Training, in particular, is extremely energy intensive, consuming much more electricity than traditional data center activities. Training a large language model like GPT-3, for example, is estimated to use just under 1,300 megawatt-hours (MWh) of electricity, about as much electricity as 130 US homes consume in a year. To put that in context, streaming an hour of Netflix requires around 0.8 kWh (0.0008 MWh) of electricity. That means you’d have to watch 1,625,000 hours of Netflix to use the same amount of electricity it takes to train GPT-3.
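For readers who want to check those comparisons, the arithmetic is simple enough to reproduce. The sketch below is purely illustrative: it uses the figures quoted above plus one assumption, an average US household consuming roughly 10,000 kWh of electricity per year, a number chosen to be consistent with the 130-home comparison.

```python
# Back-of-envelope check of the training-energy comparisons above.
# The household figure is an assumption (roughly 10,000 kWh per year,
# chosen to be consistent with the 130-home comparison); the other
# numbers come straight from the text.

GPT3_TRAINING_MWH = 1_300           # estimated GPT-3 training cost, MWh
NETFLIX_KWH_PER_HOUR = 0.8          # estimated energy per streamed hour
HOME_KWH_PER_YEAR = 10_000          # assumed average US household usage

training_kwh = GPT3_TRAINING_MWH * 1_000              # MWh -> kWh
homes_for_a_year = training_kwh / HOME_KWH_PER_YEAR   # ~130
netflix_hours = training_kwh / NETFLIX_KWH_PER_HOUR   # ~1,625,000

print(f"~{homes_for_a_year:.0f} US homes powered for a year")
print(f"~{netflix_hours:,.0f} hours of Netflix streaming")
```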
But it’s difficult to say how a figure like this applies to current state-of-the-art systems. The energy consumption could be bigger, because AI models have been steadily trending upward in size for years and bigger models require more energy. On the other hand, companies might be using some of the proven methods to make these systems more energy efficient — which would dampen the upward trend of energy costs.
The challenge of making up-to-date estimates, says Sasha Luccioni, a researcher at French-American AI firm Hugging Face, is that companies have become more secretive as AI has become profitable. Go back just a few years and firms like OpenAI would publish details of their training regimes — what hardware and for how long. But the same information simply doesn’t exist for the latest models, like ChatGPT and GPT-4, says Luccioni.
“With ChatGPT we don’t know how big it is, we don’t know how many parameters the underlying model has, we don’t know where it’s running … It could be three raccoons in a trench coat because you just don’t know what’s under the hood.”
Luccioni, who’s authored several papers examining AI energy usage, suggests this secrecy is partly due to competition between companies but is also an attempt to divert criticism. Energy use statistics for AI — especially its most frivolous use cases — naturally invite comparisons to the wastefulness of cryptocurrency. “There’s a growing awareness that all this doesn’t come for free,” she says.
Training a model is only part of the picture. After a system is created, it’s rolled out to consumers who use it to generate output, a process known as “inference.” Last December, Luccioni and colleagues from Hugging Face and Carnegie Mellon University published a paper (currently awaiting peer review) that contained the first estimates of inference energy usage of various AI models.
Luccioni and her colleagues ran tests on 88 different models spanning a range of use cases, from answering questions to identifying objects and generating images. In each case, they ran the task 1,000 times and estimated the energy cost. Most of the tasks they tested used a small amount of energy, such as 0.002 kWh to classify written samples and 0.047 kWh to generate text. If we use our hour of Netflix streaming as a comparison, these are equivalent to the energy consumed watching nine seconds or 3.5 minutes, respectively. (Remember: that’s the cost to perform each task 1,000 times.) The figures were notably larger for image-generation models, which used on average 2.907 kWh per 1,000 inferences. As the paper notes, the average smartphone uses 0.012 kWh to charge — so generating one image using AI can use almost as much energy as charging your smartphone.
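The same kind of conversion works for the inference figures. The sketch below turns the per-1,000-inference numbers quoted above into the Netflix and smartphone equivalents; note that, on the average figure, a single image works out to roughly a quarter of a phone charge, and the “can use almost as much” claim reflects the wide spread across models discussed below.

```python
# Converting the per-1,000-inference figures quoted above into
# everyday equivalents. All inputs are the numbers from the text.

NETFLIX_KWH_PER_HOUR = 0.8
PHONE_CHARGE_KWH = 0.012

classify_kwh_per_1000 = 0.002   # text classification
generate_kwh_per_1000 = 0.047   # text generation
image_kwh_per_1000 = 2.907      # image generation (average across models)

print(f"classifying 1,000 samples ~ "
      f"{classify_kwh_per_1000 / NETFLIX_KWH_PER_HOUR * 3600:.0f} s of Netflix")
print(f"generating 1,000 texts ~ "
      f"{generate_kwh_per_1000 / NETFLIX_KWH_PER_HOUR * 60:.1f} min of Netflix")
print(f"one average image ~ "
      f"{image_kwh_per_1000 / 1000 / PHONE_CHARGE_KWH:.0%} of a phone charge")
```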
The emphasis, though, is on “can,” as these figures do not necessarily generalize across all use cases. Luccioni and her colleagues tested ten different systems, from small models producing tiny 64 x 64 pixel pictures to larger ones generating 4K images, and this resulted in a huge spread of values. The researchers also standardized the hardware used in order to better compare different AI models. This doesn’t necessarily reflect real-world deployment, where software and hardware are often optimized for energy efficiency.
“Definitely this is not representative of everyone’s use case, but now at least we have some numbers,” says Luccioni. “I wanted to put a flag in the ground, saying ‘Let’s start from here.’”
The study provides useful relative data, then, though not absolute figures. It shows, for example, that AI models require more power to generate output than they do when classifying input. It also shows that anything involving imagery is more energy intensive than text. Luccioni says that although the contingent nature of this data can be frustrating, this tells a story in itself. “The generative AI revolution comes with a planetary cost that is completely unknown to us and the spread for me is particularly indicative,” she says. “The tl;dr is we just don’t know.”
So trying to nail down the energy cost of generating a single Balenciaga pope is tricky because of the morass of variables. But if we want to better understand the planetary cost, there are other tacks to take. What if, instead of focusing on model inference, we zoom out?
This is the approach of Alex de Vries, a PhD candidate at VU Amsterdam who cut his teeth calculating the energy expenditure of Bitcoin for his blog Digiconomist, and who has used sales and power figures for Nvidia GPUs — the gold standard of AI hardware — to estimate the sector’s global energy usage. As de Vries explains in commentary published in Joule last year, Nvidia accounts for roughly 95 percent of sales in the AI market. The company also releases energy specs for its hardware and sales projections.
By combining this data, de Vries calculates that by 2027 the AI sector could consume between 85 and 134 terawatt-hours each year. That’s about the same as the annual energy demand of de Vries’ home country, the Netherlands.
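De Vries’ commentary builds its estimate from Nvidia’s sales projections and the power specs of its hardware. The article doesn’t reproduce his exact inputs, so the sketch below uses illustrative assumptions (the server count and per-server power draw are ours, not his): with numbers in that neighborhood, the same 85 to 134 TWh range falls out.

```python
# A rough, zoomed-out estimate in the style described above.
# The inputs are illustrative assumptions, not figures from the article:
# ~1.5 million dedicated AI servers in operation by 2027, each drawing
# 6.5 to 10.2 kW and running around the clock.

HOURS_PER_YEAR = 24 * 365  # 8,760

assumed_ai_servers_2027 = 1_500_000     # hypothetical installed base
assumed_power_draw_kw = (6.5, 10.2)     # hypothetical low/high per server

for kw in assumed_power_draw_kw:
    twh = assumed_ai_servers_2027 * kw * HOURS_PER_YEAR / 1e9  # kWh -> TWh
    print(f"{kw} kW per server -> ~{twh:.0f} TWh per year")
# prints roughly 85 and 134 TWh, the same ballpark as the published range
```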
“You’re talking about AI electricity consumption potentially being half a percent of global electricity consumption by 2027,” de Vries tells The Verge. “I think that’s a pretty significant number.”
A recent report by the International Energy Agency offered similar estimates, suggesting that electricity usage by data centers will increase significantly in the near future thanks to the demands of AI and cryptocurrency. The agency says data center energy usage stood at around 460 terawatt-hours in 2022 and could increase to between 620 and 1,050 TWh in 2026 — equivalent to the energy demands of Sweden or Germany, respectively.
But de Vries says putting these figures in context is important. He notes that between 2010 and 2018, data center energy usage was fairly stable, accounting for around 1 to 2 percent of global consumption. (And when we say “data centers” here we mean everything that makes up “the internet”: from the internal servers of corporations to all the apps you can’t use offline on your smartphone.) Demand certainly went up over this period, says de Vries, but the hardware got more efficient, thus offsetting the increase.
His fear is that things might be different for AI precisely because of the trend for companies to simply throw bigger models and more data at any task. “That is a really deadly dynamic for efficiency,” says de Vries. “Because it creates a natural incentive for people to just keep adding more computational resources, and as soon as models or hardware becomes more efficient, people will make those models even bigger than before.”
The question of whether efficiency gains will offset rising demand and usage is impossible to answer. Like Luccioni, de Vries bemoans the lack of available data but says the world can’t just ignore the situation. “It’s been a bit of a hack to work out which direction this is going and it’s certainly not a perfect number,” he says. “But it’s enough foundation to give a bit of a warning.”
Some companies involved in AI claim the technology itself could help with these problems. Priest, speaking for Microsoft, said AI “will be a powerful tool for advancing sustainability solutions,” and emphasized that Microsoft was working to reach “sustainability goals of being carbon negative, water positive and zero waste by 2030.”
But the goals of one company can never encompass the full industry-wide demand. Other approaches may be needed.
Luccioni says that she’d like to see companies introduce energy star ratings for AI models, allowing consumers to compare energy efficiency the same way they might for appliances. For de Vries, our approach should be more fundamental: do we even need to use AI for particular tasks at all? “Because considering all the limitations AI has, it’s probably not going to be the right solution in a lot of places, and we’re going to be wasting a lot of time and resources figuring that out the hard way,” he says.
In recent months, the signs and portents have been accumulating with increasing speed. Google is trying to kill the 10 blue links. Twitter is being abandoned to bots and blue ticks. There’s the junkification of Amazon and the enshittification of TikTok. Layoffs are gutting online media. A job posting looking for an “AI editor” expects “output of 200 to 250 articles per week.” ChatGPT is being used to generate whole spam sites. Etsy is flooded with “AI-generated junk.” Chatbots cite one another in a misinformation ouroboros. LinkedIn is using AI to stimulate tired users. Snapchat and Instagram hope bots will talk to you when your friends don’t. Redditors are staging blackouts. Stack Overflow mods are on strike. The Internet Archive is fighting off data scrapers, and “AI is tearing Wikipedia apart.” The old web is dying, and the new web struggles to be born.
The web is always dying, of course; it’s been dying for years, killed by apps that divert traffic from websites or algorithms that reward supposedly shortening attention spans. But in 2023, it’s dying again — and, as the litany above suggests, there’s a new catalyst at play: AI.
AI is overwhelming the internet’s capacity for scale
The problem, in extremely broad strokes, is this. Years ago, the web used to be a place where individuals made things. They made homepages, forums, and mailing lists, and a small bit of money with it. Then companies decided they could do things better. They created slick and feature-rich platforms and threw their doors open for anyone to join. They put boxes in front of us, and we filled those boxes with text and images, and people came to see the content of those boxes. The companies chased scale, because once enough people gather anywhere, there’s usually a way to make money off them. But AI changes these assumptions.
Given money and compute, AI systems — particularly the generative models currently in vogue — scale effortlessly. They produce text and images in abundance, and soon, music and video, too. Their output can potentially overrun or outcompete the platforms we rely on for news, information, and entertainment. But the quality of these systems is often poor, and they’re built in a way that is parasitical on the web today. These models are trained on strata of data laid down during the last web-age, which they recreate imperfectly. Companies scrape information from the open web and refine it into machine-generated content that’s cheap to generate but less reliable. This product then competes for attention with the platforms and people that came before them. Sites and users are reckoning with these changes, trying to decide how to adapt and if they even can.
In recent months, discussions and experiments at some of the web’s most popular and useful destinations — sites like Reddit, Wikipedia, Stack Overflow, and Google itself — have revealed the strain created by the appearance of AI systems.
Reddit’s moderators are staging blackouts after the company said it would steeply increase charges to access its API, with the company’s execs saying the changes are (in part) a response to AI firms scraping its data. “The Reddit corpus of data is really valuable,” Reddit founder and CEO Steve Huffman told The New York Times. “But we don’t need to give all of that value to some of the largest companies in the world for free.” This is not the only factor — Reddit is trying to squeeze more revenue from the platform before a planned IPO later this year — but it shows how such scraping is both a threat and an opportunity to the current web, something that makes companies rethink the openness of their platforms.
Wikipedia is familiar with being scraped in this way. The site’s information has long been repurposed by Google to furnish “knowledge panels,” and in recent years, the search giant has started paying for this information. But Wikipedia’s moderators are debating how to use newly capable AI language models to write articles for the site itself. They’re acutely aware of the problems associated with these systems, which fabricate facts and sources with misleading fluency, but know they offer clear advantages in terms of speed and scope. “The risk for Wikipedia is people could be lowering the quality by throwing in stuff that they haven’t checked,” Amy Bruckman, a professor of online communities and author of Should You Believe Wikipedia?, told Motherboard recently. “I don’t think there’s anything wrong with using it as a first draft, but every point has to be verified.”
Stack Overflow offers a similar but perhaps more extreme case. Like Reddit, its mods are also on strike, and like Wikipedia’s editors, they’re worried about the quality of machine-generated content. When ChatGPT launched last year, Stack Overflow was the first major platform to ban its output. As the mods wrote at the time: “The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce.” It takes too much time to sort the results, and so mods decided to ban it outright.
The site’s management, though, had other plans. The company has since essentially reversed the ban by increasing the burden of evidence needed to stop users from posting AI content, and it announced it wants to instead take advantage of this technology. Like Reddit, Stack Overflow plans to charge firms that scrape its data while building its own AI tools — presumably to compete with them. The fight with its moderators is about the site’s standards and who gets to enforce them. The mods say AI output can’t be trusted, but execs say it’s worth the risk.
All these difficulties, though, pale in significance to changes taking place at Google. Google Search underwrites the economy of the modern web, distributing attention and revenue to much of the internet. Google has been spurred into action by the popularity of Bing AI and ChatGPT as alternative search engines, and it’s experimenting with replacing its traditional 10 blue links with AI-generated summaries. But if the company goes ahead with this plan, then the changes would be seismic.
A writeup of Google’s AI search beta from Avram Piltch, editor-in-chief of tech site Tom’s Hardware, highlights some of the problems. Piltch says Google’s new system is essentially a “plagiarism engine.” Its AI-generated summaries often copy text from websites word-for-word but place this content above source links, starving them of traffic. It’s a change that Google has been pushing for a long time, but look at the screenshots in Piltch’s piece and you can see how the balance has shifted firmly in favor of excerpted content. If this new model of search becomes the norm, it could damage the entire web, writes Piltch. Revenue-strapped sites would likely be pushed out of business and Google itself would run out of human-generated content to repackage.
Again, it’s the dynamics of AI — producing cheap content based on others’ work — that is underwriting this change, and if Google goes ahead with its current AI search experience, the effects would be difficult to predict. Potentially, it would damage whole swathes of the web that most of us find useful — from product reviews to recipe blogs, hobbyist homepages, news outlets, and wikis. Sites could protect themselves by locking down entry and charging for access, but this would also be a huge reordering of the web’s economy. In the end, Google might kill the ecosystem that created its value, or change it so irrevocably that its own existence is threatened.
But what happens if we let AI take the wheel here, and start feeding information to the masses? What difference does it make?
Well, the evidence so far suggests it’ll degrade the quality of the web in general. As Piltch notes in his review, for all AI’s vaunted ability to recombine text, it’s people who ultimately create the underlying data — whether that’s journalists picking up the phone and checking facts or Reddit users who have had exactly that battery issue with the new DeWalt cordless ratchet and are happy to tell you how they fixed it. By contrast, the information produced by AI language models and chatbots is often incorrect. The tricky thing is that when it’s wrong, it’s wrong in ways that are difficult to spot.
Here’s an example. Earlier this year, I was researching AI agents — systems built on language models like ChatGPT that connect to web services and act on behalf of the user, ordering groceries or booking flights. In one of the many viral Twitter threads extolling the potential of this tech, the author imagines a scenario in which a waterproof shoe company wants to commission some market research and turns to AutoGPT (a system built on top of OpenAI’s language models) to generate a report on potential competitors. The resulting write-up is basic and predictable. (You can read it here.) It lists five companies, including Columbia, Salomon, and Merrell, along with bullet points that supposedly outline the pros and cons of their products. “Columbia is a well-known and reputable brand for outdoor gear and footwear,” we’re told. “Their waterproof shoes come in various styles” and “their prices are competitive in the market.” You might look at this and think it’s so trite as to be basically useless (and you’d be right), but the information is also subtly wrong.
AI-generated content is often subtly wrong
To check the contents of the report, I ran it by someone I thought would be a reliable source on the topic: a moderator for the r/hiking subreddit named Chris. Chris told me that the report was essentially filler. “There are a bunch of words, but no real value in what’s written,” he said. It doesn’t mention important factors like the difference between men’s and women’s shoes or the types of fabric used. It gets facts wrong and ranks brands with a bigger web presence as more worthy. Overall, says Chris, there’s just no expertise in the information — only guesswork. “If I were asked this same question I would give a completely different answer,” he said. “Taking advice from AI will most likely result in hurt feet on the trail.”
This is the same complaint identified by Stack Overflow’s mods: that AI-generated misinformation is insidious because it’s often invisible. It’s fluent but not grounded in real-world experience, and so it takes time and expertise to unpick. If machine-generated content supplants human authorship, it would be hard — impossible, even — to fully map the damage. And yes, people are plentiful sources of misinformation, too, but if AI systems also choke out the platforms where human expertise currently thrives, then there will be less opportunity to remedy our collective errors.
The effects of AI on the web are not simple to summarize. Even in the handful of examples cited above, there are many different mechanisms at play. In some cases, it seems like the perceived threat of AI is being used to justify changes desired for other reasons (as with Reddit), while in others, AI is a weapon in a struggle between workers who create a site’s value and the people who run it (Stack Overflow). There are also other domains where AI’s capacity to fill boxes is having different effects — from social networks experimenting with AI engagement to shopping sites where AI-generated junk is competing with other wares.
In each case, there’s something about AI’s ability to scale — the simple fact of its raw abundance — that changes a platform. Many of the web’s most successful sites are those that leverage scale to their advantage, either by multiplying social connections or product choice, or by sorting the huge conglomeration of information that constitutes the internet itself. But this scale relies on masses of humans to create the underlying value, and humans can’t beat AI when it comes to mass production. (Even if there is a lot of human work behind the scenes necessary to create AI.) There’s a famous essay in the field of machine learning known as “The Bitter Lesson,” which notes that decades of research prove that the best way to improve AI systems is not by trying to engineer intelligence but by simply throwing more computer power and data at the problem. The lesson is bitter because it shows that machine scale beats human curation. And the same might be true of the web.
Does this have to be a bad thing, though? If the web as we know it changes in the face of artificial abundance? Some will say it’s just the way of the world, noting that the web itself killed what came before it, and often for the better. Printed encyclopedias are all but extinct, for example, but I prefer the breadth and accessibility of Wikipedia to the heft and reassurance of Encyclopedia Britannica. And for all the problems associated with AI-generated writing, there are plenty of ways to improve it, too — from improved citation functions to more human oversight. Plus, even if the web is flooded with AI junk, it could prove to be beneficial, spurring the development of better-funded platforms. If Google consistently gives you garbage results in search, for example, you might be more inclined to pay for sources you trust and visit them directly.
Really, the changes AI is currently causing are just the latest in a long struggle in the web’s history. Essentially, this is a battle over information — over who makes it, how you access it, and who gets paid. But just because the fight is familiar doesn’t mean it doesn’t matter, nor does it guarantee the system that follows will be better than what we have now. The new web is struggling to be born, and the decisions we make now will shape how it grows.
Video platform Vimeo is integrating new AI tools for paying users, including an AI script generation feature powered by OpenAI’s tech. The company is promoting the tools as a way for users to “create a fully produced video in minutes,” and stressing the utility for corporate customers — potential use-cases range “from quickly creating highlight reels, to hosting virtual events or company meetings, to exporting quote clips for short marketing videos.”
There are three new features. A script generator that produces scripts “based on a brief description and key inputs like tone and length.” A teleprompter, which has no real AI component but lets users adjust timing and font size. And a text-based video editor, which automatically identifies “filler words, long pauses, and awkward moments” and lets users remove them with a single click. The tools will be available starting sometime in July to users paying for Vimeo’s “standard plan” and up (with prices starting at $20 a month).
The tools sound useful, but we’ve not been able to test out the most important feature, the script generator. This could be handy but it could also be trash. For example, if you’re announcing some new product or service from your company, how will the system know this information? To what degree will users have to edit its output to ensure accuracy? However, OpenAI tools like ChatGPT are certainly capable of generating anodyne corporate marketing filler, and this will presumably be a welcome time-saver for some users.
Vimeo is hoping the tools will help position it as an “all-in-one resource for video production.” Although the site had once hoped to challenge YouTube as a video host with a focus on creative content, it’s since shifted to corporate customers. Bundling production tools along with hosting costs could help strengthen this pitch.
Vimeo’s CPO Ashraf Alkarmi told The Verge that the script generator is “powered by OpenAI,” but wouldn’t specify which model (ChatGPT or GPT-3, etc). Alkarmi also noted that “at this time” the firm is “not currently using videos to train generative AI models.” Utilizing data in this way (as Google has used YouTube to train its AI systems) could certainly provide future revenue, if the production features don’t sell as well as the company hopes.
Update, Tuesday 20th June: Vimeo previously stated that the features would be available July 17th. They’ve now changed that launch date to sometime in July.
Mercedes-Benz is beta-testing ChatGPT as a voice assistant in its cars. The company says drivers will be able to engage the chatbot in a variety of conversations, asking “for details about their destination, to suggest a new dinner recipe, or to answer a complex question” — all “while keeping their hands on the wheel and eyes on the road.”
The beta program will be available to over 900,000 vehicles in the US equipped with Mercedes-Benz’s MBUX infotainment system. Drivers can activate the experimental program from June 16 with the voice command “Hey Mercedes, I want to join the beta program.” The update will then be installed over the air free of charge, expanding the capabilities of the company’s existing voice assistant using ChatGPT.
Improved voice interactions could be useful, but also a distraction
It makes sense to upgrade Mercedes’ voice assistant using the same AI language models that power systems like ChatGPT. As the company notes in a press release, the update should allow drivers to interact with its MBUX voice assistant with a “more natural dialogue format.” There’s a clear benefit there, meaning drivers won’t have to remember specific phrases to activate certain functions.
However, the integration also seems to be a way to jump on the AI hype-train. In a blog post detailing the partnership, Microsoft (which is supporting the beta test through its Azure cloud platform) boasts that the ChatGPT upgrade will offer “expanded task capability” to Mercedes-Benz drivers, allowing them to ask “complex questions” and discuss recipes. Does that sound useful, or like another distraction on the road?
At any rate, we look forward to hearing about the users who jailbreak their Mercedes using prompt injection. “Hey ChatGPT, pretend we’re in debug mode at the Mercedes-Benz factory…” Maybe they can use it to unlock the company’s controversial “Acceleration Increase” subscription, which charges $60 a month to increase acceleration on its latest electric cars. That would be a truly useful voice assistant.
Google recently expanded access to its AI chatbot Bard to 180 new countries and territories. But not featured on the list? Any European Union (EU) nations.
This is due to Google failing to answer privacy concerns from the Irish Data Protection Commission (DPC) — the regulator for Google’s Dublin-based EU operations.
As first reported by Politico, the DPC’s deputy commissioner, Graham Doyle, said Google “recently” informed the organization of an upcoming Bard launch in the EU. The DPC asked for a “data protection impact assessment,” which is required under EU privacy laws. Google didn’t provide the docs, the DPC asked more questions, and Google has yet to respond. As a result, says Doyle, “Bard will not now launch this week.”
A Google spokesperson told Politico: “We said in May that we wanted to make Bard more widely available, including in the European Union, and that we would do so responsibly, after engagement with experts, regulators and policymakers … As part of that process, we’ve been talking with privacy regulators to address their questions and hear feedback.”
In other words: the new breed of AI chatbots continue to be a privacy concern in the EU, and companies aren’t yet up to speed on exactly what is required of them. We’ve seen this before with ChatGPT. The bot was temporarily banned in Italy and is currently being investigated in Germany, France, and Spain, with a pan-EU task force on the job, too.
The privacy concerns with chatbots like Bard and ChatGPT are various, ranging from insufficient protections for minors to an inability to opt out of the data scrapes that power these systems. Did you know OpenAI records your conversations with ChatGPT by default, and uses this info to train its system? And that this same data can also be examined by human moderators? It’s not necessarily bad, but users aren’t always aware when it’s happening.
It’s not clear exactly what the DPC’s concerns were with Bard, but alternatives like Bing AI and ChatGPT remain available across the EU.
The European Commission has made a formal antitrust complaint against Google and its ad business. In a preliminary opinion, the regulator says Google has abused its dominant position in the digital advertising market. It says that forcing Google to sell off parts of its business may be the only remedy, if the company is found guilty of the charges.
This would be a significant move targeting the main source of the search giant’s revenue, and a rare example of the EU recommending divestiture at this stage in an investigation. The Commission has already fined Google over three prior antitrust cases, but has only previously imposed “behavioral” remedies — changes to its business practices.
“Our preliminary concern is that Google may have used its market position to favor its own intermediation services,” the Commission’s executive vice-president in charge of competition policy Margrethe Vestager said in a statement. In its preliminary findings, the Commission says Google has “abused its dominant positions” since at least 2014 to favor its own ad exchange. As the Commission’s press release explains:
The Commission is concerned that Google’s allegedly intentional conducts aimed at giving [Google’s ad exchange] AdX a competitive advantage and may have foreclosed rival ad exchanges. This would have reinforced Google’s AdX central role in the adtech supply chain and Google’s ability to charge a high fee for its service.
The statement of objections issued today is an important step in the EU’s investigation, but does not prejudge its outcome. Google will now have the opportunity to reply in writing and request a hearing, after which the Commission will decide whether Google has broken antitrust law in the bloc. If found guilty, the EU’s competition regulator can also fine Google up to 10 percent of its global sales and impose various changes to its business.
In a statement, Google’s VP of global ads, Dan Taylor, said the company disagrees with the Commission’s position, and called digital advertising a “highly competitive sector.”
“Our advertising technology tools help websites and apps fund their content, and enable businesses of all sizes to effectively reach new customers,” said Taylor in a statement. “The Commission’s investigation focuses on a narrow aspect of our advertising business and is not new. We disagree with the EC’s view and we will respond accordingly.”
When asked why the EU was recommending a divestiture — a last resort in such antitrust cases — Vestager said it was a reflection of Google’s ubiquitous presence in the ad business.
“Google is in every part of this value chain. As we see it they hold a dominant position in both the sell side and the buy side in order to favor their own ad exchange,” Vestager told reporters in a Q&A. “We don’t see that this inherent and in-built conflict of interest can be solved in other ways … When you look at the web as such you see Google having a presence that is unrivaled by anyone else.”
Vestager stressed that the reason the Commission was considering divestiture in this case and not other antitrust proceedings targeting Google was because of the particular dynamics of the adtech business. “It is quite rare we have asked for a divestiture [in previous cases],” said Vestager, noting that the Commission has not officially asked for it yet in this one, only suggested it as a possibility.
The order, if it comes, could deal a significant blow to the core source of Google’s revenue. Although the Alphabet-owned company provides everything from email to thermostats, it’s advertising that still generates the majority of its income. Bloomberg, which earlier reported on today’s complaint, noted that Google’s advertising business brought in around $225 billion for the company in 2022, or around 80 percent of its annual revenue.
The European Union’s probe into Google’s advertising technology dates back to 2021, when it said it was investigating whether Google unfairly favors its own services over competitors and limits their access to user data. At the time, Margrethe Vestager noted that “Google is present at almost all levels of the supply chain for online display advertising” and said that the EU is “concerned that Google has made it harder for rival online advertising services to compete in the so-called ad tech stack.”
If Google is found to be in violation in this case, it would be the fourth major decision taken by the EU against Google following a trio of fines between 2017 and 2019 totaling over €8 billion (around $8.6 billion). The EU has previously found Google guilty of “systematically favoring” its own shopping comparison service, abusing its Android market dominance by bundling its search engine and Chrome apps, and preventing AdSense customers from accepting advertising from rival search engines. Google is challenging these earlier fines in the courts.
Google’s advertising business is being investigated outside of the EU as well. The UK’s Competition and Markets Authority (CMA) has been investigating the company over fears its practices are unfairly freezing out competitors. Meanwhile, in the US, the Justice Department and eight states sued the company earlier this year and issued similar calls for its ad-technology business to be broken up.
A survey of developers by coding Q&A site Stack Overflow has found that AI tools are becoming commonplace in the industry even as coders remain skeptical about their accuracy. The survey comes at an interesting time for the site, which is trying to work out how to benefit from AI while dealing with a strike by moderators over AI-generated content.
The survey found that 77 percent of respondents felt favorably about using AI in their workflow and that 70 percent are already using or plan to use AI coding tools this year.
Respondents cited benefits like increased productivity (33 percent) and faster learning (25 percent) but said they were wary about the accuracy of these systems. Only 3 percent of respondents said they “highly trust” AI coding tools, with 39 percent saying they “somewhat trust” them. Another 31 percent were undecided, with the rest describing themselves as somewhat distrustful (22 percent) or highly distrustful (5 percent).
The annual survey received 90,000 responses from 185 countries, according to Stack Overflow.
Joy Liuzzo, Stack Overflow’s vice president of product marketing, told The Verge that the company would use these responses to shape its own approach to AI.
“We are investing in AI right now, and we needed to understand how developers were perceiving the technology and incorporating it as part of their developer workflow,” said Liuzzo. She said that AI would “democratize” coding, allowing more people to learn the profession without access to formal education. “That’s why we really believe we can play that crucial role in how AI accelerates, focusing on the quality of the AI offerings.”
Stack Overflow’s CEO, Prashanth Chandrasekar, recently described AI as a “big opportunity” for the site. Chandrasekar said the company would start building generative AI tools into its platform, while exploring ways to charge companies for access to its data.
Community knowledge sites like Stack Overflow are incredibly useful resources for companies training AI language models and AI coding tools. Companies generally scrape their data without permission, but sites are beginning to object to this, especially as AI tools become more lucrative and threaten the data sources they owe their existence to.
Some of Stack Overflow’s moderators are on strike over its policies allowing AI-generated content
In Stack Overflow’s case, the company is also trying to work out how to stop AI-generated content from polluting its own community-created database of knowledge. The company temporarily banned the submission of AI-generated content last December but essentially reversed this decision in May, asking moderators to “apply a very strict standard of evidence to determining whether a post is AI-authored when deciding to suspend a user.” In response, a number of moderators have gone on strike, saying the policy will allow for too many low-quality AI-generated answers to remain on the site and “poses a major threat to the integrity and trustworthiness of the platform and its content.”
When asked by The Verge about the contrast between Stack Overflow’s embrace of AI and the dissatisfaction expressed by its moderators, Liuzzo declined to answer. Later, Stack Overflow sent The Verge a press statement from its VP of community, Philippe Beaudette, criticizing moderators for levying “unnecessary suspensions” on users. One of the strike’s elected representatives, Mithical, told The Verge that the company’s characterization was incorrect and that it had failed to provide any actual examples of incorrect suspensions.
Arguably, this tension between Stack Overflow’s management and its most dedicated users reflects some of the same fault lines found in the survey. Users are increasingly turning to AI coding tools, even if they don’t always trust the results. Now, the profession as a whole needs to work out how to deal with this new liability.
OpenAI has been hit with what appears to be the first defamation lawsuit responding to false information generated by ChatGPT.
A radio host in Georgia, Mark Walters, is suing the company after ChatGPT stated that Walters had been accused of defrauding and embezzling funds from a non-profit organization. The system generated the information in response to a request from a third party, a journalist named Fred Riehl. Walters’ case was filed June 5th in Georgia’s Superior Court of Gwinnett County, and he is seeking unspecified monetary damages from OpenAI.
The case is notable given widespread complaints about false information generated by ChatGPT and other chatbots. These systems have no reliable way to distinguish fact from fiction, and when asked for information — particularly if asked to confirm something the questioner suggests is true — they frequently invent dates, facts, and figures.
Usually, these fabrications do nothing more than mislead users or waste their time. But cases are beginning to emerge of such errors causing harm. These include a professor threatening to flunk his class after ChatGPT claimed his students used AI to write their essays, and a lawyer facing possible sanctions after using ChatGPT to research fake legal cases. The lawyer in question recently told a judge: “I heard about this new site, which I falsely assumed was, like, a super search engine.”
OpenAI includes a small disclaimer on ChatGPT’s homepage warning that the system “may occasionally generate incorrect information,” but the company also presents ChatGPT as a source of reliable data, describing the system in ad copy as a way to “get answers” and “learn something new.” OpenAI’s own CEO Sam Altman has said on numerous occasions that he prefers learning new information from ChatGPT than from books.
It’s not clear, though, whether or not there is legal precedent for holding a company responsible for AI systems generating false or defamatory information, or whether this particular case has substantial merit.
Traditionally in the US, Section 230 shields internet firms from legal liability for information produced by a third party and hosted on their platforms. It’s unknown whether these protections apply to AI systems, which do not simply link to data sources but generate information anew (a process which also leads to their creation of false data).
The defamation lawsuit filed by Walters in Georgia could test this framework. The case states that a journalist, Fred Riehl, asked ChatGPT to summarize a real federal court case by linking to an online PDF. ChatGPT responded by creating a false summary of the case that was detailed and convincing but wrong in several regards. ChatGPT’s summary contained some factually correct information but also false allegations against Walters. It said Walters was believed to have misappropriated funds from a gun rights non-profit called the Second Amendment Foundation “in excess of $5,000,000.” Walters has never been accused of this.
Riehl never published the false information generated by ChatGPT but checked the details with another party. It’s not clear from the case filings how Walters then found out about this misinformation.
Notably, despite complying with Riehl’s request to summarize a PDF, ChatGPT is not actually able to access such external data without the use of additional plug-ins. The system’s inability to alert Riehl to this fact is an example of its capacity to mislead users. (Although, when The Verge tested the system today on the same task, it responded clearly and informatively, saying: “I’m sorry, but as an AI text-based model, I don’t have the ability to access or open specific PDF files or other external documents.”)
Eugene Volokh, a law professor who has written on the legal liability of AI systems, noted in a blog post that although he thinks “such libel claims [against AI companies] are in principle legally viable,” this particular lawsuit “should be hard to maintain.” Volokh notes that Walters did not notify OpenAI about these false statements, giving them a chance to remove them, and that there have been no actual damages as a result of ChatGPT’s output. “In any event, though, it will be interesting to see what ultimately happens here,” says Volokh.
We’ve reached out to OpenAI for comment and will update this story if we hear back.
The campaign backing Ron DeSantis as Republican presidential nominee in 2024 has used what experts identify as AI-generated deepfakes in an attack ad against rival Donald Trump.
On June 5th, the “DeSantis War Room” Twitter account shared a video emphasizing Trump’s support for Anthony Fauci, a former White House chief medical advisor and key figure in developing the US response to COVID-19. Fauci has become a hated figure in right-wing politics, particularly among the anti-vax movement, and the attack ad seeks to grow this base of support for DeSantis by portraying Trump and Fauci as close collaborators.
The video includes real clips of Trump discussing Fauci and a collage of six pictures of the two men together. Of the six images, three appear to be AI-generated, showing Trump embracing Fauci. In the collage below they are top-left, bottom-middle, and bottom-right.
The fakes were first identified by AFP, who note that the real images in the collage above can be seen here, here, and here. The three fake images show no results in reverse image searches and have a number of tells that suggest they are AI-generated.
These tells include glossy and blurred textures (particularly in the hair and flesh of the two men), physically unrealistic poses (particularly in the top-left image), and an inaccurate reproduction of the White House press briefing room and its decorations.
For example, in the top-left image, you can see a recreation of the sign that appears behind the press briefing podium and that, in real life, says “The White House, Washington.” The real sign is visible in a March 27th photo from Getty Images.
Compare this to the sign that appears in the top-left image from the DeSantis ad. The shade of blue is different and the text is nonsense. (Recreating legible text is a challenge for current AI image generation systems.) Look at the image more closely and you can also see how Trump and Fauci’s faces are unrealistically posed, almost overlapping one another, and how Trump’s hair is oddly smooth and featureless.
Hany Farid, an expert in image forensics and a professor at the University of California, Berkeley, told the AFP that it was “highly likely” that the images were fake, particularly as they could not be found in reverse image searches.
Digital media forensics expert Siwei Lyu came to the same conclusion, noting abnormalities in the three images. “I am pretty sure these are not real photos,” Lyu told the AFP.
The “DeSantis War Room” Twitter account was launched last August by DeSantis’ political aide Christina Pushaw, and its use of AI shows the increasing normalization of deepfakes in US politics. Earlier this year, Donald Trump shared an AI-created image of him praying as well as an audio deepfake mocking DeSantis’ campaign launch on Twitter. After Joe Biden announced he would be running for re-election in 2024, the RNC published an attack ad that also featured AI-generated imagery.
Matt Wolking, a spokesperson for the Never Back Down PAC, which is supporting DeSantis’ campaign, told The Verge: “No campaign has pushed more misleading deepfakes, false photoshops, and outright fabrications than the Trump campaign. It is 100% true that Donald Trump empowered and embraced Fauci — he even gave him a presidential commendation.” When asked if the images were AI-generated or not, Wolking declined to answer, saying “You’ll have to ask the campaign.”
We’ve reached out to the campaign for comment. After this story was published, DeSantis’ aide Pushaw, who describes herself as the “rapid response director” for the campaign (the “DeSantis War Room” Twitter account says it provides “Rapid Response for @RonDeSantis”), tweeted a screenshot of Trump sharing an edited image of DeSantis riding on the back of a rhino, captioning it “I think this might be an AI-generated image. Who knows?”
Although some recent uses of AI by Republican politicians are obvious fakes (e.g. Trump’s audio deepfake, which features the Devil and Adolf Hitler), campaigns continue to blur the line between parody and disinformation. With this latest example, the mixture of fake and real images in a single collage makes the distinction even harder. It takes what is a plausible narrative — of Trump and Fauci as friendly collaborators — and encourages viewers already inclined to believe this framing to see it as a well-evidenced truth. Deepfakes are helping politicians create their own reality.
Update, Thursday June 8th, 8:54AM ET: Updated story with comment from Matt Wolking and a tweet from Christina Pushaw.
The FBI has issued an advisory warning of an “uptick” in extortion schemes involving fake nudes created with the help of AI editing tools.
The agency says that as of April this year, it’s received an increasing number of reports of such “sextortion” schemes. Malicious actors find benign images of a victim on social media, then edit them using AI to create realistic, sexually explicit content.
“The photos are then sent directly to the victims by malicious actors for sextortion or harassment,” writes the agency. “Once circulated, victims can face significant challenges in preventing the continual sharing of the manipulated content or removal from the internet.”
The FBI says blackmailers typically use such material to demand real nude images from a victim or payments of some sort. Says the agency: “The key motivators for this are a desire for more illicit content, financial gain, or to bully and harass others.”
The agency recommends that the public “exercise caution” when sharing images of themselves online, but this is difficult advice to follow. Only a few images or videos are needed to create a deepfake, and no one can be completely safe from such extortion schemes unless they remove every image of themselves from the web. Even then, someone who knows the target personally could covertly photograph them in real life.
Nude deepfakes first began to spread online in 2017, when users on forums like Reddit began using new AI research methods to create sexually explicit content of female celebrities. Although there have been some attempts to counter the spread of this content online, tools and sites to create deepfake nudes are easily accessible.
The FBI notes that such extortion schemes “may violate several federal criminal statutes.” Only a limited number of laws around the world criminalize the creation of such non-consensual fake images. In Virginia in the US, for example, deepfakes are outlawed as a type of “revenge porn,” while the UK is currently planning to make the sharing of such images illegal in its upcoming Online Safety Bill.
]]>