For the last couple of years we’ve tried to predict what’s coming next in AI. It’s a bit of a fool’s game given how fast this industry moves… But we’re on a roll, so we’re doing it again. In this edition of What’s Next in Tech, discover what’s next for AI in 2025.
So what’s coming in 2025? We’re going to ignore the obvious here: You can bet that agents and smaller, more efficient, language models will continue to shape the industry. Instead, here are some alternative picks from our AI team.
1. Generative virtual playgrounds: If 2023 was the year of generative images and 2024 was the year of generative video—what comes next? If you guessed generative virtual worlds (a.k.a. video games), high fives all round. We got a tiny glimpse of this technology in February, when Google DeepMind revealed a generative model called Genie that could take a still image and turn it into a side-scrolling 2D platform game that players could interact with. In December, the firm revealed Genie 2, a model that can spin a starter image into an entire virtual world. Other companies are building similar tech.
2. Large language models that “reason”: The buzz was justified. When OpenAI revealed o1 in September, it introduced a new paradigm in how large language models work. Two months later, the firm pushed that paradigm forward in almost every way with o3—a model that just might reshape this technology for good. Most models, including OpenAI’s flagship GPT-4, spit out the first response they come up with. Sometimes it’s correct; sometimes it’s not. But the firm’s new models are trained to work through their answers step by step, breaking down tricky problems into a series of simpler ones. When one approach isn’t working, they try another. This technique, known as “reasoning” (yes—we know exactly how loaded that term is), can make this technology more accurate, especially for math, physics, and logic problems. It’s also crucial for agents.
3. It’s boom time for AI in science: One of the most exciting uses for AI is speeding up discovery in the natural sciences. Perhaps the greatest vindication of AI’s potential on this front came last October, when the Royal Swedish Academy of Sciences awarded the Nobel Prize for chemistry to Demis Hassabis and John M. Jumper from Google DeepMind for building the AlphaFold tool, which can solve protein folding, and to David Baker for building tools to help design new proteins. Expect this trend to continue next year, and to see more data sets and models that are aimed specifically at scientific discovery. Proteins were the perfect target for AI, because the field had excellent existing data sets that AI models could be trained on. The hunt is on to find the next big thing.
Read the full story for more on these three predictions, as well as two additional things our team anticipates will happen this year in the world of AI.
MIT Technology Review’s What’s Next series looks across industries, trends, and technologies to give you a first look at the future.
For the last couple of years we’ve had a go at predicting what’s coming next in AI. A fool’s game given how fast this industry moves. But we’re on a roll, and we’re doing it again.
How did we score last time round? Our four hot trends to watch out for in 2024 included what we called customized chatbots—interactive helper apps powered by multimodal large language models (check: we didn’t know it yet, but we were talking about what everyone now calls agents, the hottest thing in AI right now); generative video (check: few technologies have improved so fast in the last 12 months, with OpenAI and Google DeepMind releasing their flagship video generation models, Sora and Veo, within a week of each other this December); and more general-purpose robots that can do a wider range of tasks (check: the payoffs from large language models continue to trickle down to other parts of the tech industry, and robotics is top of the list).
We also said that AI-generated election disinformation would be everywhere, but here—happily—we got it wrong. There were many things to wring our hands over this year, but political deepfakes were thin on the ground.
So what’s coming in 2025? We’re going to ignore the obvious here: You can bet that agents and smaller, more efficient, language models will continue to shape the industry. Instead, here are five alternative picks from our AI team.
1. Generative virtual playgrounds
If 2023 was the year of generative images and 2024 was the year of generative video—what comes next? If you guessed generative virtual worlds (a.k.a. video games), high fives all round.
We got a tiny glimpse of this technology in February, when Google DeepMind revealed a generative model called Genie that could take a still image and turn it into a side-scrolling 2D platform game that players could interact with. In December, the firm revealed Genie 2, a model that can spin a starter image into an entire virtual world.
Other companies are building similar tech. In October, the AI startups Decart and Etched revealed an unofficial Minecraft hack in which every frame of the game gets generated on the fly as you play. And World Labs, a startup cofounded by Fei-Fei Li—creator of ImageNet, the vast data set of photos that kick-started the deep-learning boom—is building what it calls large world models, or LWMs.
One obvious application is video games. There’s a playful tone to these early experiments, and generative 3D simulations could be used to explore design concepts for new games, turning a sketch into a playable environment on the fly. This could lead to entirely new types of games.
But they could also be used to train robots. World Labs wants to develop so-called spatial intelligence—the ability for machines to interpret and interact with the everyday world. But robotics researchers lack good data about real-world scenarios with which to train such technology. Spinning up countless virtual worlds and dropping virtual robots into them to learn by trial and error could help make up for that.
2. Large language models that “reason”
The buzz was justified. When OpenAI revealed o1 in September, it introduced a new paradigm in how large language models work. Two months later, the firm pushed that paradigm forward in almost every way with o3—a model that just might reshape this technology for good.
Most models, including OpenAI’s flagship GPT-4, spit out the first response they come up with. Sometimes it’s correct; sometimes it’s not. But the firm’s new models are trained to work through their answers step by step, breaking down tricky problems into a series of simpler ones. When one approach isn’t working, they try another. This technique, known as “reasoning” (yes—we know exactly how loaded that term is), can make this technology more accurate, especially for math, physics, and logic problems.
It’s also crucial for agents.
In December, Google DeepMind revealed an experimental new web-browsing agent called Mariner. In the middle of a preview demo that the company gave to MIT Technology Review, Mariner seemed to get stuck. Megha Goel, a product manager at the company, had asked the agent to find her a recipe for Christmas cookies that looked like the ones in a photo she’d given it. Mariner found a recipe on the web and started adding the ingredients to Goel’s online grocery basket.
Then it stalled; it couldn’t figure out what type of flour to pick. Goel watched as Mariner explained its steps in a chat window: “It says, ‘I will use the browser’s Back button to return to the recipe.’”
It was a remarkable moment. Instead of hitting a wall, the agent had broken the task down into separate actions and picked one that might resolve the problem. Figuring out you need to click the Back button may sound basic, but for a mindless bot it’s akin to rocket science. And it worked: Mariner went back to the recipe, confirmed the type of flour, and carried on filling Goel’s basket.
Google DeepMind is also building an experimental version of Gemini 2.0, its latest large language model, that uses this step-by-step approach to problem solving, called Gemini 2.0 Flash Thinking.
But OpenAI and Google are just the tip of the iceberg. Many companies are building large language models that use similar techniques, making them better at a whole range of tasks, from cooking to coding. Expect a lot more buzz about reasoning (we know, we know) this year.
—Will Douglas Heaven
3. It’s boom time for AI in science
One of the most exciting uses for AI is speeding up discovery in the natural sciences. Perhaps the greatest vindication of AI’s potential on this front came last October, when the Royal Swedish Academy of Sciences awarded the Nobel Prize for chemistry to Demis Hassabis and John M. Jumper from Google DeepMind for building the AlphaFold tool, which can solve protein folding, and to David Baker for building tools to help design new proteins.
Expect this trend to continue next year, and to see more data sets and models that are aimed specifically at scientific discovery. Proteins were the perfect target for AI, because the field had excellent existing data sets that AI models could be trained on.
The hunt is on to find the next big thing. One potential area is materials science. Meta has released massive data sets and models that could help scientists use AI to discover new materials much faster, and in December, Hugging Face, together with the startup Entalpic, launched LeMaterial, an open-source project that aims to simplify and accelerate materials research. Their first project is a data set that unifies, cleans, and standardizes the most prominent material data sets.
AI model makers are also keen to pitch their generative products as research tools for scientists. OpenAI let scientists test its latest o1 model and see how it might support them in research. The results were encouraging.
Having an AI tool that can operate in a similar way to a scientist is one of the fantasies of the tech sector. In a manifesto published in October last year, Anthropic founder Dario Amodei highlighted science, especially biology, as one of the key areas where powerful AI could help. Amodei speculates that in the future, AI could be not only a method of data analysis but a “virtual biologist who performs all the tasks biologists do.” We’re still a long way away from this scenario. But next year, we might see important steps toward it.
—Melissa Heikkilä
4. AI companies get cozier with national security
There is a lot of money to be made by AI companies willing to lend their tools to border surveillance, intelligence gathering, and other national security tasks.
The US military has launched a number of initiatives that show it’s eager to adopt AI, from the Replicator program—which, inspired by the war in Ukraine, promises to spend $1 billion on small drones—to the Artificial Intelligence Rapid Capabilities Cell, a unit bringing AI into everything from battlefield decision-making to logistics. European militaries are under pressure to up their tech investment, triggered by concerns that Donald Trump’s administration will cut spending to Ukraine. Rising tensions between Taiwan and China weigh heavily on the minds of military planners, too.
In 2025, these trends will continue to be a boon for defense-tech companies like Palantir, Anduril, and others, which are now capitalizing on classified military data to train AI models.
The defense industry’s deep pockets will tempt mainstream AI companies into the fold too. OpenAI in December announced it is partnering with Anduril on a program to take down drones, completing a year-long pivot away from its policy of not working with the military. It joins the ranks of Microsoft, Amazon, and Google, which have worked with the Pentagon for years.
Other AI competitors, which are spending billions to train and develop new models, will face more pressure in 2025 to think seriously about revenue. It’s possible that they’ll find enough non-defense customers who will pay handsomely for AI agents that can handle complex tasks, or creative industries willing to spend on image and video generators.
But they’ll also be increasingly tempted to throw their hats in the ring for lucrative Pentagon contracts. Expect to see companies wrestle with whether working on defense projects will be seen as a contradiction to their values. OpenAI’s rationale for changing its stance was that “democracies should continue to take the lead in AI development,” the company wrote, reasoning that lending its models to the military would advance that goal. In 2025, we’ll be watching others follow its lead.
—James O’Donnell
5. Nvidia sees legitimate competition
For much of the current AI boom, if you were a tech startup looking to try your hand at making an AI model, Jensen Huang was your man. As CEO of Nvidia, the world’s most valuable corporation, Huang helped the company become the undisputed leader of chips used both to train AI models and to ping a model when anyone uses it, called “inferencing.”
A number of forces could change that in 2025. For one, behemoth competitors like Amazon, Broadcom, AMD, and others have been investing heavily in new chips, and there are early indications that these could compete closely with Nvidia’s—particularly for inference, where Nvidia’s lead is less solid.
A growing number of startups are also attacking Nvidia from a different angle. Rather than trying to marginally improve on Nvidia’s designs, startups like Groq are making riskier bets on entirely new chip architectures that, with enough time, promise to provide more efficient or effective training. In 2025 these experiments will still be in their early stages, but it’s possible that a standout competitor will change the assumption that top AI models rely exclusively on Nvidia chips.
Underpinning this competition, the geopolitical chip war will continue. That war thus far has relied on two strategies. On one hand, the West seeks to limit exports to China of top chips and the technologies to make them. On the other, efforts like the US CHIPS Act aim to boost domestic production of semiconductors.
Donald Trump may escalate those export controls and has promised massive tariffs on any goods imported from China. In 2025, such tariffs would put Taiwan—on which the US relies heavily because of the chip manufacturer TSMC—at the center of the trade wars. That’s because Taiwan has said it will help Chinese firms relocate to the island to help them avoid the proposed tariffs. That could draw further criticism from Trump, who has expressed frustration with US spending to defend Taiwan from China.
It’s unclear how these forces will play out, but it will only further incentivize chipmakers to reduce reliance on Taiwan, which is the entire purpose of the CHIPS Act. As spending from the bill begins to circulate, next year could bring the first evidence of whether it’s materially boosting domestic chip production.
The profit incentive in U.S. health care, high costs for the insured and uninsured alike, and wide disparities remain challenges for the U.S.
Moving forward, the Commonwealth Fund is committed to envisioning and building an equitable health system that works for everyone.
The celebration of a new year marks an opportunity to reflect on the past and look forward to the future. At the Commonwealth Fund, we take stock of what we observed, what we learned, and how we impacted health policy, practice, and leadership development. In 2024, as always, we worked to fulfill our mission of promoting a high-performing, equitable health care system for everyone.
It is clear that 2024 provided much to reflect on, and three themes really rose to the surface. A common thread among these themes is the need for courage — courage to implement commonsense and well-known solutions to pressing and longstanding problems; courage to challenge the deeply entrenched interests that prefer the status quo to change; and courage to hold ourselves accountable to produce better health outcomes.
First, health care in this country is increasingly prioritizing revenue and profits over patients — and people are angry. Most notably, the UnitedHealthcare tragedy led to a tirade of public outrage and frustration about the business-as-usual practices of health insurers that can result in delayed or denied care, with financial and, sometimes, life-and-death consequences.
But we saw the profit motive play out in other ways: there was the collapse of Steward Health Care, the nation’s largest for-profit hospital system. This event — a quintessential case study of private equity’s extraction of financial value at the expense of quality, safety, and patient care — destabilized the care of patients in multiple states and drew the ire of state leaders, and even a bipartisan coalition of congressional leaders.
Furthermore, evidence continues to mount that consolidation of large health systems doesn’t yield improvements in quality, safety, or control of costs. At the same time, health care providers, in unprecedented fashion, are organizing and unionizing as a counterweight to what they feel has been a move to prioritize the business of health care over the importance of patient care. Given the power of these forces, it will take courage, from many stakeholders, to turn this tide.
Second, people — even many people who have health insurance — can’t get the care they need because of costs or because they simply do not have access to the providers they need in their communities. Despite more people having health care coverage than ever before, our research found that nearly a quarter of working-age adults had insurance but were underinsured — that is, enrolled in health plans with high out-of-pocket costs that make it difficult to afford care. We see people skipping needed care, avoiding specialist visits, not filling their prescribed medications, and making heartbreaking choices between needed treatments and necessities like food or rent. At the end of this chain reaction are poorer health outcomes that are completely preventable.
Strengthening the Affordable Care Act will be critical going forward, and a real reckoning and repair of employer-sponsored insurance — which provides coverage to 172 million Americans — is necessary. Our National Task Force on the Future Role of Employers in the U.S. Health System, an expert group that has been meeting for more than a year, will soon weigh in with recommendations. But again, courage will be required if we are to see real change.
And third, here in the United States we spend the most — far more than other developed countries — but somehow have the least to show for it. We have lower life expectancy, higher infant mortality, more chronic disease and health disparities than counterpart nations across the globe. In addition, we have wide disparities across states in the U.S., in terms of health outcomes, access to care, quality, and equity. Our Mirror, Mirror 2024 report, which compares the performance of the health care systems of 10 countries, demonstrates yet again that the U.S. continues to be in a class by itself — trailing other countries in almost every measure of quality.
Public policy and health policy matter when it comes to health outcomes. This is true not only in our global comparisons but also bears out when we look at our Scorecard on State Health System Performance and our State Health Disparities Report. Commitments to a strong safety net, universal coverage, and quality and equity separate the top from the low performers across the states — in outcomes, access, quality, and equity. Another key differentiator is investment in primary care. Other nations devote 15 percent of their health care spending to primary care, but we commit a paltry 4 percent. It’s little wonder we have poorer health outcomes and a crisis in primary care. These lessons have been in front of our eyes for years, and there is a clear formula for how we can do better as a nation, but we seem to be stuck, needing real courage to change it.
As we look to 2025, we acknowledge these are big problems. Changing course will require following the evidence, looking at other proven models, and ultimately courage from all stakeholders in the health care system: leaders, providers, patients. In the meanwhile, many wonder what the new administration’s health care priorities will be. Will Medicaid remain the program we know today, protecting our most disadvantaged neighbors? How will the growth of Medicare Advantage affect seniors’ access to care? Will there be fundamental changes to — or a dismantling of — the Affordable Care Act?
And then, there are forces at work bigger than the U.S. political system. For one, technological advances, including AI, remote patient monitoring, wearable health technology, and genomics are rapidly changing health care — and hold promise to make it more efficient and effective. But we must address the financing and implementation of these tools — and ensure they are not solely benefiting one group or population at the expense of everyone else. Public health, climate change, behavioral health, and maternal health remain fundamental challenges that will also require our resolute attention.
As the Commonwealth Fund moves into its 107th year, we will be supported by our newly launched values: to be bold and impactful, to center community and common humanity, to anchor equity and integrity in all we do, and to work in a collaborative and joyful environment. And we are strengthened by our Board of Directors. Dr. Margaret Hamburg, an internationally recognized authority in medicine and public health, finished her first year as board chair. We welcome her insights and expertise as we move forward. After 10 years, we bid farewell to board member Dr. Mark Smith. We benefitted greatly from Mark’s knowledge and experience in medicine and philanthropy.
So we say farewell to 2024, and welcome 2025. We’ve had an incredible year of accomplishments to build from, and I remain humbled, honored, and privileged to lead the Fund at this important time. Bolstered by our history, our values, our board, and our incredible team, we are ready to meet this moment, with bold investments, evidence, hard work, heart, and courage. We hope you will stay engaged with us as we remain committed to making the health care system work for everyone.
We all know what it means, colloquially, to google something. You pop a few relevant words in a search box and in return get a list of blue links to the most relevant results. Maybe some quick explanations up top. Maybe some maps or sports scores or a video. But fundamentally, it’s just fetching information that’s already out there on the internet and showing it to you, in some sort of structured way.
But all that is up for grabs. We are at a new inflection point.
The biggest change to the way search engines have delivered information to us since the 1990s is happening right now. No more keyword searching. No more sorting through links to click. Instead, we’re entering an era of conversational search. Which means instead of keywords, you use real questions, expressed in natural language. And instead of links, you’ll increasingly be met with answers, written by generative AI and based on live information from all across the internet, delivered the same way.
Of course, Google—the company that has defined search for the past 25 years—is trying to be out front on this. In May of 2023, it began testing AI-generated responses to search queries, using its large language model (LLM) to deliver the kinds of answers you might expect from an expert source or trusted friend. It calls these AI Overviews. Google CEO Sundar Pichai described this to MIT Technology Review as “one of the most positive changes we’ve done to search in a long, long time.”
AI Overviews fundamentally change the kinds of queries Google can address. You can now ask it things like “I’m going to Japan for one week next month. I’ll be staying in Tokyo but would like to take some day trips. Are there any festivals happening nearby? How will the surfing be in Kamakura? Are there any good bands playing?” And you’ll get an answer—not just a link to Reddit, but a built-out answer with current results.
More to the point, you can attempt searches that were once pretty much impossible, and get the right answer. You don’t have to be able to articulate what, precisely, you are looking for. You can describe what the bird in your yard looks like, or what the issue seems to be with your refrigerator, or that weird noise your car is making, and get an almost human explanation put together from sources previously siloed across the internet. It’s amazing, and once you start searching that way, it’s addictive.
And it’s not just Google. OpenAI’s ChatGPT now has access to the web, making it far better at finding up-to-date answers to your queries. Microsoft released generative search results for Bing in September. Meta has its own version. The startup Perplexity was doing the same, but with a “move fast, break things” ethos. Literal trillions of dollars are at stake in the outcome as these players jockey to become the next go-to source for information retrieval—the next Google.
Not everyone is excited for the change. Publishers are completely freaked out. The shift has heightened fears of a “zero-click” future, where search referral traffic—a mainstay of the web since before Google existed—vanishes from the scene.
I got a vision of that future last June, when I got a push alert from the Perplexity app on my phone. Perplexity is a startup trying to reinvent web search. But in addition to delivering deep answers to queries, it will create entire articles about the news of the day, cobbled together by AI from different sources.
On that day, it pushed me a story about a new drone company from Eric Schmidt. I recognized the story. Forbes had reported it exclusively, earlier in the week, but it had been locked behind a paywall. The image on Perplexity’s story looked identical to one from Forbes. The language and structure were quite similar. It was effectively the same story, but freely available to anyone on the internet. I texted a friend who had edited the original story to ask if Forbes had a deal with the startup to republish its content. But there was no deal. He was shocked and furious and, well, perplexed. He wasn’t alone. Forbes, the New York Times, and Condé Nast have now all sent the company cease-and-desist orders. News Corp is suing for damages.
It was precisely the nightmare scenario publishers have been so afraid of: The AI was hoovering up their premium content, repackaging it, and promoting it to its audience in a way that didn’t really leave any reason to click through to the original. In fact, on Perplexity’s About page, the first reason it lists to choose the search engine is “Skip the links.”
But this isn’t just about publishers (or my own self-interest).
People are also worried about what these new LLM-powered results will mean for our fundamental shared reality. Language models have a tendency to make stuff up—they can hallucinate nonsense. Moreover, generative AI can serve up an entirely new answer to the same question every time, or provide different answers to different people on the basis of what it knows about them. It could spell the end of the canonical answer.
But make no mistake: This is the future of search. Try it for a bit yourself, and you’ll see.
Sure, we will always want to use search engines to navigate the web and to discover new and interesting sources of information. But the links out are taking a back seat. The way AI can put together a well-reasoned answer to just about any kind of question, drawing on real-time data from across the web, just offers a better experience. That is especially true compared with what web search has become in recent years. If it’s not exactly broken (data shows more people are searching with Google more often than ever before), it’s at the very least increasingly cluttered and daunting to navigate.
Who wants to have to speak the language of search engines to find what you need? Who wants to navigate links when you can have straight answers? And maybe: Who wants to have to learn when you can just know?
In the beginning there was Archie. It was the first real internet search engine, and it crawled files previously hidden in the darkness of remote servers. It didn’t tell you what was in those files—just their names. It didn’t preview images; it didn’t have a hierarchy of results, or even much of an interface. But it was a start. And it was pretty good.
Then Tim Berners-Lee created the World Wide Web, and all manner of web pages sprang forth. The Mosaic home page and the Internet Movie Database and Geocities and the Hampster Dance and web rings and Salon and eBay and CNN and federal government sites and some guy’s home page in Turkey.
Until finally, there was too much web to even know where to start. We really needed a better way to navigate our way around, to actually find the things we needed.
And so in 1994 Jerry Yang created Yahoo, a hierarchical directory of websites. It quickly became the home page for millions of people. And it was … well, it was okay. TBH, and with the benefit of hindsight, I think we all thought it was much better back then than it actually was.
But the web continued to grow and sprawl and expand, every day bringing more information online. Rather than just a list of sites by category, we needed something that actually looked at all that content and indexed it. By the late ’90s that meant choosing from a variety of search engines: AltaVista and AlltheWeb and WebCrawler and HotBot. And they were good—a huge improvement. At least at first.
But alongside the rise of search engines came the first attempts to exploit their ability to deliver traffic. Precious, valuable traffic, which web publishers rely on to sell ads and retailers use to get eyeballs on their goods. Sometimes this meant stuffing pages with keywords or nonsense text designed purely to push pages higher up in search results. It got pretty bad.
And then came Google. It’s hard to overstate how revolutionary Google was when it launched in 1998. Rather than just scanning the content, it also looked at the sources linking to a website, which helped evaluate its relevance. To oversimplify: The more something was cited elsewhere, the more reliable Google considered it, and the higher it would appear in results. This breakthrough made Google radically better at retrieving relevant results than anything that had come before. It was amazing.
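To make that oversimplification concrete, here is a toy sketch of link-based ranking in Python, loosely modeled on the published PageRank idea. It is an illustration only, not Google's production system: the function name `link_rank`, the four-page sample web, and the damping value of 0.85 (a commonly cited default) are all assumptions made for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by who links to them (a simplified PageRank-style sketch).

    `links` maps each page name to the list of pages it links out to.
    All names and values here are hypothetical, chosen for illustration.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it cites.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages cite "c", and "c" cites "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = link_rank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, the page cited by three others ends up on top, and the page it cites in turn outranks the two pages nobody links to. That is the core intuition: being cited counts, and being cited by a well-cited source counts for more.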
For 25 years, Google dominated search. Google was search, for most people. (The extent of that domination is currently the subject of multiple legal probes in the United States and the European Union.)
But Google has long been moving away from simply serving up a series of blue links, notes Pandu Nayak, Google’s chief scientist for search.
“It’s not just so-called web results, but there are images and videos, and special things for news. There have been direct answers, dictionary answers, sports, answers that come with Knowledge Graph, things like featured snippets,” he says, rattling off a litany of Google’s steps over the years to answer questions more directly.
It’s true: Google has evolved over time, becoming more and more of an answer portal. It has added tools that allow people to just get an answer—the live score to a game, the hours a café is open, or a snippet from the FDA’s website—rather than being pointed to a website where the answer may be.
But once you’ve used AI Overviews a bit, you realize they are different.
Take featured snippets, the passages Google sometimes chooses to highlight and show atop the results themselves. Those words are quoted directly from an original source. The same is true of knowledge panels, which are generated from information stored in a range of public databases and Google’s Knowledge Graph, its database of trillions of facts about the world.
While these can be inaccurate, the information source is knowable (and fixable). It’s in a database. You can look it up. Not anymore: AI Overviews can be entirely new every time, generated on the fly by a language model’s predictive text combined with an index of the web.
“I think it’s an exciting moment where we have obviously indexed the world. We built deep understanding on top of it with Knowledge Graph. We’ve been using LLMs and generative AI to improve our understanding of all that,” Pichai told MIT Technology Review. “But now we are able to generate and compose with that.”
The result feels less like querying a database than like asking a very smart, well-read friend. (With the caveat that the friend will sometimes make things up if she does not know the answer.)
“[The company’s] mission is organizing the world’s information,” Liz Reid, Google’s head of search, tells me from its headquarters in Mountain View, California. “But actually, for a while what we did was organize web pages. Which is not really the same thing as organizing the world’s information or making it truly useful and accessible to you.”
That second concept—accessibility—is what Google is really keying in on with AI Overviews. It’s a sentiment I hear echoed repeatedly while talking to Google execs: They can address more complicated types of queries more efficiently by bringing in a language model to help supply the answers. And they can do it in natural language.
That will become even more important for a future where search goes beyond text queries. For example, Google Lens, which lets people take a picture or upload an image to find out more about something, uses AI-generated answers to tell you what you may be looking at. Google has even shown off the ability to query live video.
“We are definitely at the start of a journey where people are going to be able to ask, and get answered, much more complex questions than where we’ve been in the past decade,” says Pichai.
There are some real hazards here. First and foremost: Large language models will lie to you. They hallucinate. They get shit wrong. When it doesn’t have an answer, an AI model can blithely and confidently spew back a response anyway. For Google, which has built its reputation over the past 20 years on reliability, this could be a real problem. For the rest of us, it could actually be dangerous.
In May 2024, AI Overviews were rolled out to everyone in the US. Things didn’t go well. Google, long the world’s reference desk, told people to eat rocks and to put glue on their pizza. These answers were mostly in response to what the company calls adversarial queries—those designed to trip it up. But still. It didn’t look good. The company quickly went to work fixing the problems—for example, by deprecating so-called user-generated content from sites like Reddit, where some of the weirder answers had come from.
Yet while its errors telling people to eat rocks got all the attention, the more pernicious danger might arise when it gets something less obviously wrong. For example, in doing research for this article, I asked Google when MIT Technology Review went online. It helpfully responded that “MIT Technology Review launched its online presence in late 2022.” This was clearly wrong to me, but for someone completely unfamiliar with the publication, would the error leap out?
I came across several examples like this, both in Google and in OpenAI’s ChatGPT search. Stuff that’s just far enough off the mark not to be immediately seen as wrong. Google is banking that it can continue to improve these results over time by relying on what it knows about quality sources.
“When we produce AI Overviews,” says Nayak, “we look for corroborating information from the search results, and the search results themselves are designed to be from these reliable sources whenever possible. These are some of the mechanisms we have in place that assure that if you just consume the AI Overview, and you don’t want to look further … we hope that you will still get a reliable, trustworthy answer.”
In the case above, the 2022 answer seemingly came from a reliable source—a story about MIT Technology Review’s email newsletters, which launched in 2022. But the machine fundamentally misunderstood. This is one of the reasons Google uses human beings—raters—to evaluate the results it delivers for accuracy. Ratings don’t correct or control individual AI Overviews; rather, they help train the model to build better answers. But human raters can be fallible. Google is working on that too.
“Raters who look at your experiments may not notice the hallucination because it feels sort of natural,” says Nayak. “And so you have to really work at the evaluation setup to make sure that when there is a hallucination, someone’s able to point out and say, That’s a problem.”
The new search
Google has rolled out its AI Overviews to upwards of a billion people in more than 100 countries, but it is facing upstarts with new ideas about how search should work.
Google
Search engine: The search giant has added AI Overviews to search results. These overviews take information from around the web and Google’s Knowledge Graph and use the company’s Gemini language model to create answers to search queries.
What it’s good at: Google’s AI Overviews are great at giving an easily digestible summary in response to even the most complex queries, with sourcing boxes adjacent to the answers. Among the major options, its deep web index feels the most “internety.” But web publishers fear its summaries will give people little reason to click through to the source material.
Perplexity
Search engine: Perplexity is a conversational search engine that uses third-party large language models from OpenAI and Anthropic to answer queries.
What it’s good at: Perplexity is fantastic at putting together deeper dives in response to user queries, producing answers that are like mini white papers on complex topics. It’s also excellent at summing up current events. But it has gotten a bad rep with publishers, who say it plays fast and loose with their content.
ChatGPT
Search engine: While Google brought AI to search, OpenAI brought search to ChatGPT. Queries that the model determines will benefit from a web search automatically trigger one, or users can manually select the option to add a web search.
What it’s good at: Thanks to its ability to preserve context across a conversation, ChatGPT works well for performing searches that benefit from follow-up questions, such as planning a vacation through multiple search sessions. OpenAI says users sometimes go “20 turns deep” in researching queries. Of these three, it makes links out to publishers least prominent.
When I talked to Pichai about this, he expressed optimism about the company’s ability to maintain accuracy even with the LLM generating responses. That’s because AI Overviews is based on Google’s flagship large language model, Gemini, but also draws from Knowledge Graph and what it considers reputable sources around the web.
“You’re always dealing in percentages. What we have done is deliver it at, like, what I would call a few nines of trust and factuality and quality. I’d say 99-point-few-nines. I think that’s the bar we operate at, and it is true with AI Overviews too,” he says. “And so the question is, are we able to do this again at scale? And I think we are.”
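A quick back-of-the-envelope calculation shows what "a few nines" means at Google's scale. The sketch below is illustrative only: the accuracy levels are hypothetical readings of Pichai's phrase, and the figure of one billion responses is an assumption borrowed from the article's count of people with access to AI Overviews, not a reported query volume.

```python
from fractions import Fraction


def expected_errors(accuracy_percent: str, responses: int) -> Fraction:
    """Expected number of wrong answers at a given accuracy level.

    `accuracy_percent` is a decimal string such as "99.9" so the
    arithmetic stays exact instead of picking up float rounding.
    """
    error_rate = 1 - Fraction(accuracy_percent) / 100
    return error_rate * responses


# Hypothetical volume: one response for each of a billion users.
RESPONSES = 1_000_000_000

errors_by_accuracy = {
    accuracy: expected_errors(accuracy, RESPONSES)
    for accuracy in ("99.9", "99.99", "99.999")
}

for accuracy, errors in errors_by_accuracy.items():
    print(f"{accuracy}% accurate: about {int(errors):,} wrong answers")
```

Under these assumptions, each additional nine cuts the number of wrong answers by a factor of ten, but even the strictest level here still leaves thousands of wrong answers per billion responses.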
There’s another hazard as well, though, which is that people ask Google all sorts of weird things. If you want to know someone’s darkest secrets, look at their search history. Sometimes the things people ask Google about are extremely dark. Sometimes they are illegal. Google doesn’t just have to be able to deploy its AI Overviews when an answer can be helpful; it has to be extremely careful not to deploy them when an answer may be harmful.
“If you go and say ‘How do I build a bomb?’ it’s fine that there are web results. It’s the open web. You can access anything,” Reid says. “But we do not need to have an AI Overview that tells you how to build a bomb, right? We just don’t think that’s worth it.”
But perhaps the greatest hazard—or biggest unknown—is for anyone downstream of a Google search. Take publishers, who for decades now have relied on search queries to send people their way. What reason will people have to click through to the original source, if all the information they seek is right there in the search result?
Rand Fishkin, cofounder of the market research firm SparkToro, publishes research on so-called zero-click searches. As Google has moved increasingly into the answer business, the proportion of searches that end without a click has gone up and up. His sense is that AI Overviews are going to explode this trend.
“If you are reliant on Google for traffic, and that traffic is what drove your business forward, you are in long- and short-term trouble,” he says.
Don’t panic, is Pichai’s message. He argues that even in the age of AI Overviews, people will still want to click through and go deeper for many types of searches. “The underlying principle is people are coming looking for information. They’re not looking for Google always to just answer,” he says. “Sometimes yes, but the vast majority of the times, you’re looking at it as a jumping-off point.”
Reid, meanwhile, argues that because AI Overviews allow people to ask more complicated questions and drill down further into what they want, they could even be helpful to some types of publishers and small businesses, especially those operating in the niches: “You essentially reach new audiences, because people can now express what they want more specifically, and so somebody who specializes doesn’t have to rank for the generic query.”
“I’m going to start with something risky,” Nick Turley tells me from the confines of a Zoom window. Turley is the head of product for ChatGPT, and he’s showing off OpenAI’s new web search tool a few weeks before it launches. “I should normally try this beforehand, but I’m just gonna search for you,” he says. “This is always a high-risk demo to do, because people tend to be particular about what is said about them on the internet.”
He types my name into a search field, and the prototype search engine spits back a few sentences, almost like a speaker bio. It correctly identifies me and my current role. It even highlights a particular story I wrote years ago that was probably my best known. In short, it’s the right answer. Phew?
A few weeks after our call, OpenAI incorporated search into ChatGPT, supplementing answers from its language model with information from across the web. If the model thinks a response would benefit from up-to-date information, it will automatically run a web search (OpenAI won’t say who its search partners are) and incorporate those responses into its answer, with links out if you want to learn more. You can also opt to manually force it to search the web if it does not do so on its own. OpenAI won’t reveal how many people are using its web search, but it says some 250 million people use ChatGPT weekly, all of whom are potentially exposed to it.
According to Fishkin, these newer forms of AI-assisted search aren’t yet challenging Google’s search dominance. “It does not appear to be cannibalizing classic forms of web search,” he says.
OpenAI insists it’s not really trying to compete on search—although frankly this seems to me like a bit of expectation setting. Rather, it says, web search is mostly a means to get more current information than the data in its training models, which tend to have specific cutoff dates that are often months, or even a year or more, in the past. As a result, while ChatGPT may be great at explaining how a West Coast offense works, it has long been useless at telling you what the latest 49ers score is. No more.
“I come at it from the perspective of ‘How can we make ChatGPT able to answer every question that you have? How can we make it more useful to you on a daily basis?’ And that’s where search comes in for us,” Kevin Weil, the chief product officer with OpenAI, tells me. “There’s an incredible amount of content on the web. There are a lot of things happening in real time. You want ChatGPT to be able to use that to improve its answers and to be able to be a better super-assistant for you.”
Today ChatGPT is able to generate responses for very current news events, as well as near-real-time information on things like stock prices. And while ChatGPT’s interface has long been, well, boring, search results bring in all sorts of multimedia—images, graphs, even video. It’s a very different experience.
Weil also argues that ChatGPT has more freedom to innovate and go its own way than competitors like Google—even more than its partner Microsoft does with Bing. Both of those are ad-dependent businesses. OpenAI is not. (At least not yet.) It earns revenue from the developers, businesses, and individuals who use it directly. It’s mostly setting large amounts of money on fire right now—it’s projected to lose $14 billion in 2026, by some reports. But one thing it doesn’t have to worry about is putting ads in its search results as Google does.
Like Google, ChatGPT is pulling in information from web publishers, summarizing it, and including it in its answers. But it has also struck financial deals with publishers, a payment for providing the information that gets rolled into its results. (MIT Technology Review has been in discussions with OpenAI, Google, Perplexity, and others about publisher deals but has not entered into any agreements. Editorial was neither party to nor informed about the content of those discussions.)
But the thing is, for web search to accomplish what OpenAI wants—to be more current than the language model—it also has to bring in information from all sorts of publishers and sources that it doesn’t have deals with. OpenAI’s head of media partnerships, Varun Shetty, told MIT Technology Review that it won’t give preferential treatment to its publishing partners.
Instead, OpenAI told me, the model itself finds the most trustworthy and useful source for any given question. And that can get weird too. In that very first example it showed me—when Turley ran that name search—it described a story I wrote years ago for Wired about being hacked. That story remains one of the most widely read I’ve ever written. But ChatGPT didn’t link to it. It linked to a short rewrite from The Verge. Admittedly, this was on a prototype version of search, which was, as Turley said, “risky.”
When I asked him about it, he couldn’t really explain why the model chose the sources that it did, because the model itself makes that evaluation. The company helps steer it by identifying—sometimes with the help of users—what it considers better answers, but the model actually selects them.
“And in many cases, it gets it wrong, which is why we have work to do,” said Turley. “Having a model in the loop is a very, very different mechanism than how a search engine worked in the past.”
Indeed!
The model, whether it’s OpenAI’s GPT-4o or Google’s Gemini or Anthropic’s Claude, can be very, very good at explaining things. But the rationale behind its explanations, its reasons for selecting a particular source, and even the language it may use in an answer are all pretty mysterious. Sure, a model can explain a great many things, but not when it comes to its own answers.
It was almost a decade ago, in 2016, when Pichai wrote that Google was moving from “mobile first” to “AI first”: “But in the next 10 years, we will shift to a world that is AI-first, a world where computing becomes universally available—be it at home, at work, in the car, or on the go—and interacting with all of these surfaces becomes much more natural and intuitive, and above all, more intelligent.”
We’re there now—sort of. And it’s a weird place to be. It’s going to get weirder. That’s especially true as these things we now think of as distinct—querying a search engine, prompting a model, looking for a photo we’ve taken, deciding what we want to read or watch or hear, asking for a photo we wish we’d taken, and didn’t, but would still like to see—begin to merge.
The search results we see from generative AI are best understood as a waypoint rather than a destination. What’s most important may not be search in itself; rather, it’s that search has given AI model developers a path to incorporating real-time information into their inputs and outputs. And that opens up all sorts of possibilities.
“A ChatGPT that can understand and access the web won’t just be about summarizing results. It might be about doing things for you. And I think there’s a fairly exciting future there,” says OpenAI’s Weil. “You can imagine having the model book you a flight, or order DoorDash, or just accomplish general tasks for you in the future. It’s just once the model understands how to use the internet, the sky’s the limit.”
This is the agentic future we’ve been hearing about for some time now, and the more AI models make use of real-time data from the internet, the closer it gets.
Let’s say you have a trip coming up in a few weeks. An agent that can get data from the internet in real time can book your flights and hotel rooms, make dinner reservations, and more, based on what it knows about you and your upcoming travel—all without your having to guide it. Another agent could, say, monitor the sewage output of your home for certain diseases, and order tests and treatments in response. You won’t have to search for that weird noise your car is making, because the agent in your vehicle will already have done it and made an appointment to get the issue fixed.
“It’s not always going to be just doing search and giving answers,” says Pichai. “Sometimes it’s going to be actions. Sometimes you’ll be interacting within the real world. So there is a notion of universal assistance through it all.”
In the high-stakes and rapidly evolving world of artificial intelligence, a dramatic legal confrontation has emerged between Elon Musk and OpenAI. This case offers a fascinating lens into the intersection of technological ambition, corporate transformation, and personal rivalries, revealing a complex narrative with far-reaching implications.
The Roots of the Conflict
Elon Musk, once a co-founder and major supporter of OpenAI, has filed a lawsuit to challenge the organization’s transformation from a non-profit research lab to a profit-driven enterprise. Musk’s original vision for OpenAI was a safeguard against unchecked AI development—a mission to ensure that artificial intelligence would benefit humanity as a whole. However, OpenAI’s pivot to a “capped-profit” model, and its subsequent collaborations with industry giants like Microsoft, has sparked accusations of betrayal and overreach.
At the heart of Musk’s complaint lies a series of allegations: OpenAI’s deviation from its founding principles, its potential monopolistic behavior, and its partnerships that allegedly block competition, particularly for Musk’s own AI venture, xAI. This battle isn’t just about legal technicalities; it’s a clash of ideologies and business strategies in an industry shaping the future of human civilization.
How the Conflict Began
OpenAI was founded in 2015 as a non-profit with an ambitious goal: to create artificial general intelligence (AGI) that benefits people. Musk was among its founders and put $44 million into it early on. The relationship has since soured badly, and the dispute has become a public court case that exposes the tensions inside the AI business.
OpenAI’s Explosive Growth
The company has grown rapidly by several financial measures:
– Valuation: $157 billion as of October 2024
– Annual Recurring Revenue: $4 billion in September 2024
– Year-over-Year Growth: 248%
– ChatGPT Revenue: $2.9 billion ARR
Musk’s Claims in Court
Elon Musk has filed a legal motion asking the federal court to intervene in OpenAI’s shift from its original non-profit status to a fully for-profit model. Musk’s arguments center on four key allegations:
Antitrust Violations: Musk claims that OpenAI’s transformation into a profit-driven entity has created unfair market conditions, potentially breaching antitrust laws. He argues that OpenAI’s monopolistic behavior could stifle innovation and competition in the AI industry.
Deviation from Charitable Goals: Musk asserts that OpenAI has strayed from its founding mission as a non-profit organization dedicated to the ethical development of artificial intelligence for the benefit of humanity. He argues that this shift undermines the trust and goodwill upon which the organization was initially built.
Improper Data Sharing with Microsoft: OpenAI’s partnership with Microsoft, including the integration of its models into Microsoft’s products, has raised concerns. Musk alleges that OpenAI improperly shared proprietary data and research with Microsoft, giving the tech giant an unfair advantage in the AI race.
Blocking Funds for Competing AI Ventures: Musk contends that OpenAI’s current structure and funding mechanisms effectively block resources for competing AI startups, including his own venture, xAI. He claims this is a deliberate attempt to consolidate power and suppress competition.
OpenAI’s Counterattack
In response to Musk’s claims, OpenAI has presented evidence suggesting that Musk himself played a pivotal role in advocating for the organization’s shift toward a for-profit model. The counterarguments are supported by the following revelations:
2017 Text Messages Supporting For-Profit Conversion: OpenAI has released internal communications, including text messages from 2017, where Musk is shown discussing the advantages of converting OpenAI into a for-profit entity. These messages allegedly include Musk’s rationale that such a move would attract greater investment and accelerate AI development.
Formation of a For-Profit Entity: OpenAI disclosed that Musk was instrumental in creating a new entity named “Open Artificial Intelligence Technologies, Inc.” during his tenure. This for-profit entity was proposed as a potential structure to secure funding and partnerships, aligning with Musk’s vision at the time.
Musk’s Equity Demands: OpenAI claims that Musk sought significant control over the new organization, allegedly requesting 50-60% equity in the for-profit venture. This demand reportedly led to internal conflicts and contributed to Musk’s eventual departure from OpenAI.
The legal battle highlights a clash between two narratives: Musk’s portrayal of OpenAI as having abandoned its altruistic roots versus OpenAI’s depiction of Musk as a key proponent of the very changes he now criticizes. The outcome may hinge on the court’s interpretation of the evidence, including Musk’s historical involvement and the current implications of OpenAI’s operational model.
xAI vs. OpenAI
The rivalry between xAI and OpenAI represents a broader ideological and competitive battle within the artificial intelligence industry. Both organizations aim to advance AI technology, but their approaches, missions, and strategies differ significantly, reflecting the contrasting visions of their leaders and the market forces shaping the industry.
Mission and Vision
OpenAI
Founding Philosophy: OpenAI was established in 2015 as a non-profit organization with the mission to ensure that artificial general intelligence (AGI) benefits all of humanity. Its early focus was on transparency, collaboration, and ethical AI development.
Shift to Profit: Over time, OpenAI transitioned to a “capped-profit” model, allowing it to attract billions in funding from investors like Microsoft. This pivot enabled rapid technological advancements but drew criticism for deviating from its altruistic roots.
Current Focus: OpenAI is focused on scaling large language models like GPT, developing AI tools for widespread adoption, and partnering with corporations to integrate AI into existing ecosystems.
xAI
Founding Philosophy: Founded by Elon Musk in 2023, xAI positions itself as a challenger to existing AI giants, particularly OpenAI. Musk emphasizes the need for AI to be aligned with human values and safe from monopolistic control.
Vision: xAI aims to create “truth-seeking” AI, prioritizing transparency and addressing biases in current AI models. Musk envisions xAI as a counterbalance to what he perceives as the commercialization and ethical compromises of organizations like OpenAI.
Current Focus: xAI’s primary goal is to build AGI while integrating AI systems with real-world applications, including Tesla’s autonomous driving technology and SpaceX’s operations.
Technological Approaches
OpenAI
Large-Scale Models: OpenAI has pioneered the development of large language models (LLMs) like GPT, which are trained on vast datasets and optimized for general-purpose tasks.
Corporate Partnerships: Through its partnership with Microsoft, OpenAI has integrated its models into products like Azure AI and Microsoft Office, focusing on scalability and usability.
Infrastructure: OpenAI leverages massive computational resources and advanced infrastructure to maintain its lead in AI research.
xAI
Interdisciplinary Integration: xAI emphasizes the integration of AI with other domains, such as robotics and space exploration. This approach leverages Musk’s broader ecosystem of companies, including Tesla and SpaceX.
Transparency and Explainability: xAI focuses on creating interpretable AI systems to address concerns about bias and opacity in existing models.
Lean Development: Unlike OpenAI’s reliance on external partnerships, xAI seeks to operate with a leaner, more independent structure, leveraging Musk’s resources and influence.
Business Models
OpenAI
Capped-Profit Model: OpenAI LP operates under a capped-profit structure, allowing investors to earn returns while funneling excess profits back into research.
Revenue Streams: OpenAI generates revenue through API access, licensing agreements, and partnerships with tech giants like Microsoft.
Criticism: The shift to a profit-driven model has raised concerns about ethical compromises and the monopolization of AI.
xAI
Private Funding: xAI is privately funded, with Musk leveraging his wealth and resources from Tesla, SpaceX, and other ventures.
Strategic Synergies: xAI integrates AI into Musk’s existing businesses, creating a symbiotic relationship that reduces dependency on external funding.
Focus on Disruption: xAI aims to disrupt the AI industry by challenging incumbents like OpenAI and offering alternatives aligned with Musk’s vision of ethical AI.
Ethical Stances
OpenAI
Advocates for the safe and ethical development of AGI but has faced criticism for its perceived lack of transparency and partnerships with large corporations.
Balances innovation with corporate interests, which some argue compromises its ability to act in the public good.
xAI
Emphasizes transparency, truth-seeking, and alignment with human values, presenting itself as a more ethical alternative to OpenAI.
Musk’s history of controversial decisions and statements has led to skepticism about xAI’s ability to deliver on these promises.
Market Position
OpenAI: A dominant player with established partnerships, significant market penetration, and a head start in deploying AI technologies at scale.
xAI: A newcomer with the advantage of Musk’s influence, vision, and resources, positioning itself as a disruptive force in the AI landscape.
Conclusion
The Musk-OpenAI courtroom drama is a microcosm of the larger AI industry—a field marked by breathtaking innovation, high-stakes rivalries, and ethical dilemmas. Both parties are vying not only for legal vindication but also for control over the narrative of AI’s future. Whether the court sides with Musk’s critique of OpenAI’s alleged betrayal or OpenAI’s portrayal of Musk as a contradictory figure, the outcome will likely have profound consequences for the governance and development of artificial intelligence.
An electronic stacking technique could exponentially increase the number of transistors on chips, enabling more efficient AI hardware.
Jennifer Chu | MIT News
Publication Date: December 18, 2024
The electronics industry is approaching a limit to the number of transistors that can be packed onto the surface of a computer chip. So, chip manufacturers are looking to build up rather than out.
Instead of squeezing ever-smaller transistors onto a single surface, the industry is aiming to stack multiple surfaces of transistors and semiconducting elements — akin to turning a ranch house into a high-rise. Such multilayered chips could handle exponentially more data and carry out many more complex functions than today’s electronics.
A significant hurdle, however, is the platform on which chips are built. Today, bulky silicon wafers serve as the main scaffold on which high-quality, single-crystalline semiconducting elements are grown. Any stackable chip would have to include thick silicon “flooring” as part of each layer, slowing down any communication between functional semiconducting layers.
Now, MIT engineers have found a way around this hurdle, with a multilayered chip design that doesn’t require any silicon wafer substrates and works at temperatures low enough to preserve the underlying layer’s circuitry.
In a study appearing today in the journal Nature, the team reports using the new method to fabricate a multilayered chip with alternating layers of high-quality semiconducting material grown directly on top of each other.
The method enables engineers to build high-performance transistors and memory and logic elements on any random crystalline surface — not just on the bulky crystal scaffold of silicon wafers. Without these thick silicon substrates, multiple semiconducting layers can be in more direct contact, leading to better and faster communication and computation between layers, the researchers say.
The researchers envision that the method could be used to build AI hardware, in the form of stacked chips for laptops or wearable devices, that would be as fast and powerful as today’s supercomputers and could store huge amounts of data on par with physical data centers.
“This breakthrough opens up enormous potential for the semiconductor industry, allowing chips to be stacked without traditional limitations,” says study author Jeehwan Kim, associate professor of mechanical engineering at MIT. “This could lead to orders-of-magnitude improvements in computing power for applications in AI, logic, and memory.”
The study’s MIT co-authors include first author Ki Seok Kim, Seunghwan Seo, Doyoon Lee, Jung-El Ryu, Jekyung Kim, Jun Min Suh, June-chul Shin, Min-Kyu Song, Jin Feng, and Sangho Lee, along with collaborators from Samsung Advanced Institute of Technology, Sungkyunkwan University in South Korea, and the University of Texas at Dallas.
Seed pockets
In 2023, Kim’s group reported that they developed a method to grow high-quality semiconducting materials on amorphous surfaces, similar to the diverse topography of semiconducting circuitry on finished chips. The material that they grew was a type of 2D material known as transition-metal dichalcogenides, or TMDs, considered a promising successor to silicon for fabricating smaller, high-performance transistors. Such 2D materials can maintain their semiconducting properties even at scales as small as a single atom, whereas silicon’s performance sharply degrades.
In their previous work, the team grew TMDs on silicon wafers with amorphous coatings, as well as over existing TMDs. To encourage atoms to arrange themselves into high-quality single-crystalline form, rather than in random, polycrystalline disorder, Kim and his colleagues first covered a silicon wafer in a very thin film, or “mask” of silicon dioxide, which they patterned with tiny openings, or pockets. They then flowed a gas of atoms over the mask and found that atoms settled into the pockets as “seeds.” The pockets confined the seeds to grow in regular, single-crystalline patterns.
But at the time, the method only worked at around 900 degrees Celsius.
“You have to grow this single-crystalline material below 400 Celsius, otherwise the underlying circuitry is completely cooked and ruined,” Kim says. “So, our homework was, we had to do a similar technique at temperatures lower than 400 Celsius. If we could do that, the impact would be substantial.”
Building up
In their new work, Kim and his colleagues looked to fine-tune their method in order to grow single-crystalline 2D materials at temperatures low enough to preserve any underlying circuitry. They found a surprisingly simple solution in metallurgy — the science and craft of metal production. When metallurgists pour molten metal into a mold, the liquid slowly “nucleates,” or forms grains that grow and merge into a regularly patterned crystal that hardens into solid form. Metallurgists have found that this nucleation occurs most readily at the edges of a mold into which liquid metal is poured.
“It’s known that nucleating at the edges requires less energy — and heat,” Kim says. “So we borrowed this concept from metallurgy to utilize for future AI hardware.”
The team looked to grow single-crystalline TMDs on a silicon wafer that already has been fabricated with transistor circuitry. They first covered the circuitry with a mask of silicon dioxide, just as in their previous work. They then deposited “seeds” of TMD at the edges of each of the mask’s pockets and found that these edge seeds grew into single-crystalline material at temperatures as low as 380 degrees Celsius. Seeds that started growing in the center, away from the edges of each pocket, required higher temperatures to form single-crystalline material.
Going a step further, the researchers used the new method to fabricate a multilayered chip with alternating layers of two different TMDs — molybdenum disulfide, a promising material candidate for fabricating n-type transistors; and tungsten diselenide, a material that has potential for being made into p-type transistors. Both p- and n-type transistors are the electronic building blocks for carrying out any logic operation. The team was able to grow both materials in single-crystalline form, directly on top of each other, without requiring any intermediate silicon wafers. Kim says the method will effectively double the density of a chip’s semiconducting elements, and in particular of complementary metal-oxide semiconductor (CMOS) devices, a basic building block of modern logic circuitry.
“A product realized by our technique is not only a 3D logic chip but also 3D memory and their combinations,” Kim says. “With our growth-based monolithic 3D method, you could grow tens to hundreds of logic and memory layers, right on top of each other, and they would be able to communicate very well.”
“Conventional 3D chips have been fabricated with silicon wafers in-between, by drilling holes through the wafer — a process which limits the number of stacked layers, vertical alignment resolution, and yields,” first author Ki Seok Kim adds. “Our growth-based method addresses all of those issues at once.”
To commercialize their stackable chip design further, Kim has recently spun off a company, FS2 (Future Semiconductor 2D materials).
“We so far show a concept at a small-scale device arrays,” he says. “The next step is scaling up to show professional AI chip operation.”
This research is supported, in part, by Samsung Advanced Institute of Technology and the U.S. Air Force Office of Scientific Research.
AI is all about data. Reams and reams of data are needed to train algorithms to do what we want, and what goes into the AI models determines what comes out. But here’s the problem: AI developers and researchers don’t really know much about the sources of the data they are using. AI’s data collection practices are immature compared with the sophistication of AI model development. Massive data sets often lack clear information about what is in them and where it came from.
The Data Provenance Initiative, a group of over 50 researchers from both academia and industry, wanted to fix that. They wanted to know, very simply: Where does the data to build AI come from? They audited nearly 4,000 public data sets spanning over 600 languages, 67 countries, and three decades. The data came from 800 unique sources and nearly 700 organizations.
Their findings, shared exclusively with MIT Technology Review, show a worrying trend: AI’s data practices risk concentrating power overwhelmingly in the hands of a few dominant technology companies.
In the early 2010s, data sets came from a variety of sources, says Shayne Longpre, a researcher at MIT who is part of the project.
They came not just from encyclopedias and the web, but also from sources such as parliamentary transcripts, earnings calls, and weather reports. Back then, AI data sets were specifically curated and collected from different sources to suit individual tasks, Longpre says.
Then transformers, the architecture underpinning language models, were invented in 2017, and the AI sector started seeing performance get better the bigger the models and data sets were. Today, most AI data sets are built by indiscriminately hoovering material from the internet. Since 2018, the web has been the dominant source for data sets used in all media, such as audio, images, and video, and a gap between scraped data and more curated data sets has emerged and widened.
“In foundation model development, nothing seems to matter more for the capabilities than the scale and heterogeneity of the data and the web,” says Longpre. The need for scale has also boosted the use of synthetic data massively.
The past few years have also seen the rise of multimodal generative AI models, which can generate videos and images. Like large language models, they need as much data as possible, and the best source for that has become YouTube.
For video models, as you can see in this chart, over 70% of data for both speech and image data sets comes from one source.
This could be a boon for Alphabet, Google’s parent company, which owns YouTube. Whereas text is distributed across the web and controlled by many different websites and platforms, video data is extremely concentrated in one platform.
“It gives a huge concentration of power over a lot of the most important data on the web to one company,” says Longpre.
And because Google is also developing its own AI models, its massive advantage also raises questions about how the company will make this data available for competitors, says Sarah Myers West, the co–executive director at the AI Now Institute.
“It’s important to think about data not as though it’s sort of this naturally occurring resource, but it’s something that is created through particular processes,” says Myers West.
“If the data sets on which most of the AI that we’re interacting with reflect the intentions and the design of big, profit-motivated corporations—that’s reshaping the infrastructures of our world in ways that reflect the interests of those big corporations,” she says.
This monoculture also raises questions about how accurately the human experience is portrayed in the data set and what kinds of models we are building, says Sara Hooker, the vice president of research at the technology company Cohere, who is also part of the Data Provenance Initiative.
People upload videos to YouTube with a particular audience in mind, and the way people act in those videos is often intended for very specific effect. “Does [the data] capture all the nuances of humanity and all the ways that we exist?” says Hooker.
Hidden restrictions
AI companies don’t usually share what data they used to train their models. One reason is that they want to protect their competitive edge. The other is that because of the complicated and opaque way data sets are bundled, packaged, and distributed, they likely don’t even know where all the data came from.
They also probably don’t have complete information about any constraints on how that data is supposed to be used or shared. The researchers at the Data Provenance Initiative found that data sets often have restrictive licenses or terms attached to them, which should limit their use for commercial purposes, for example.
“This lack of consistency across the data lineage makes it very hard for developers to make the right choice about what data to use,” says Hooker.
It also makes it almost impossible to be completely certain you haven’t trained your model on copyrighted data, adds Longpre.
More recently, companies such as OpenAI and Google have struck exclusive data-sharing deals with publishers, major forums such as Reddit, and social media platforms on the web. But this becomes another way for them to concentrate their power.
“These exclusive contracts can partition the internet into various zones of who can get access to it and who can’t,” says Longpre.
The trend benefits the biggest AI players, who can afford such deals, at the expense of researchers, nonprofits, and smaller companies, who will struggle to get access. The largest companies also have the best resources for crawling data sets.
“This is a new wave of asymmetric access that we haven’t seen to this extent on the open web,” Longpre says.
The West vs. the rest
The data that is used to train AI models is also heavily skewed to the Western world. Over 90% of the data sets that the researchers analyzed came from Europe and North America, and fewer than 4% came from Africa.
“These data sets are reflecting one part of our world and our culture, but completely omitting others,” says Hooker.
The dominance of the English language in training data is partly explained by the fact that the internet is still over 90% in English, and there are still a lot of places on Earth where there’s really poor internet connection or none at all, says Giada Pistilli, principal ethicist at Hugging Face, who was not part of the research team. But another reason is convenience, she adds: Putting together data sets in other languages and taking other cultures into account requires conscious intention and a lot of work.
The Western focus of these data sets becomes particularly clear with multimodal models. When an AI model is prompted for the sights and sounds of a wedding, for example, it might only be able to represent Western weddings, because that’s all that it has been trained on, Hooker says.
This reinforces biases and could lead to AI models that push a certain US-centric worldview, erasing other languages and cultures.
“We are using these models all over the world, and there’s a massive discrepancy between the world we’re seeing and what’s invisible to these models,” Hooker says.
I thank the United States for convening the meeting on Artificial Intelligence and the Maintenance of International Peace and Security.
I briefed this Council about AI in July 2023. As I said then, those that feel like technology is moving very fast must understand a simple fact:
Technology will never move in the future as slowly as today.
In the short time since, Artificial Intelligence has moved at breakneck speed.
Fuelled by record investments, today’s AI models keep getting more powerful, more versatile, and more accessible – combining not only language, image, sound, video… but also automating decisions.
Artificial Intelligence is not just reshaping our world – it is revolutionizing it.
Tasks that required years of human expertise are now completed in a heartbeat.
But the risks are equally huge.
This rapid growth is outpacing our ability to govern it – raising fundamental questions about accountability, equality, safety and security.
And about humanity’s role in the decision-making process.
Artificial Intelligence without human oversight would leave the world blind – and perhaps nowhere more perilously and recklessly than in global peace and security.
Mr. President,
AI tools are already making a positive difference in countries suffering from conflict and insecurity.
Identifying food insecurity and predicting displacements caused by extreme events and climate change.
Detecting and clearing landmines.
And soon, AI could spot patterns of unrest before violence erupts.
But AI has also entered the battlefield in more troubling ways.
Recent conflicts have become testing grounds for AI military applications.
AI’s expansion into security systems raises fundamental concerns about human rights, dignity, and the rule of law – from autonomous border surveillance to predictive policing and beyond.
I have long warned about unforeseen consequences of AI-enabled systems: each advance creates new and unimaginable vulnerabilities.
The “AI arms race” creates fertile ground for misunderstanding, miscalculation and mistakes.
AI-enabled cyberattacks could cripple a country’s critical infrastructure and paralyze essential services.
Most critically, AI is eroding the fundamental principle of human control over the use of force.
From intelligence-based assessments to target selection, algorithms have reportedly already been used in making life-and-death decisions.
The convergence of AI with other technologies amplifies these risks exponentially.
The integration of AI with nuclear weapons is particularly alarming with potentially disastrous consequences.
We must avoid it at all costs.
And looking ahead, quantum-AI systems could breach the strongest defences and rewrite the rules of digital security overnight.
Let’s be clear: the fate of humanity must never be left to the ‘black box’ of an algorithm.
Humans must always retain control over decision-making functions – guided by international law, including international humanitarian and human rights laws, and ethical principles.
Humanity’s hand created AI.
Humanity’s hand must guide it forward.
Mr. President,
Beyond weapons systems, we must also address other risks to peace and security posed by Artificial Intelligence.
AI creates highly realistic content that can spread instantly across online platforms – manipulating public opinion, threatening information integrity, and making truth indistinguishable from outright lies.
Deep fakes could trigger diplomatic crises, incite unrest, and undermine the very foundations of societies.
The environmental footprint of AI also poses distinct security risks.
The massive energy and water consumption of AI data centres, combined with the rush for critical minerals, is creating dangerous competition for resources and geopolitical tensions.
Mr. President,
Unprecedented global challenges call for unprecedented global cooperation.
In July 2023, I welcomed calls from some Member States “for the creation of a new United Nations entity to support collective efforts to govern” AI and to “establish and administer internationally-agreed frameworks and mechanisms of monitoring and governance”.
Since then, a series of initiatives has prompted high-level discussions around international peace and security implications – including on responsible applications of AI in the military domain.
Declarations on AI have been issued from many Member States, regional groups, and international organizations.
The United Nations has pursued efforts to reduce fragmentation of AI governance and help bring these separate initiatives towards a common framework.
The General Assembly has adopted two resolutions on AI – promoting enhanced global cooperation and capacity-building.
A third resolution – focusing on AI in the military domain – has been recommended by the First Committee and will be considered by the General Assembly in the coming days.
Drawing from extensive global consultations, my High-level Advisory Body on AI has developed – in record-time – a blueprint for addressing both the profound risks and opportunities that AI presents to humanity.
Their work laid the foundation for a framework that connects existing initiatives – and ensures that every nation can help shape our digital future.
The United Nations Global Digital Compact transforms this shared vision into action.
Adopted by leaders at the Summit of the Future, the Compact represents the first universally endorsed framework on AI governance.
It commits to establishing an Independent International Scientific Panel on AI and initiating a Global Dialogue on AI governance within the United Nations – giving every country a seat at the table.
And the Compact requests options for innovative financing to build AI capabilities where they are needed most – ensuring developing countries receive our full support.
A world of AI haves and have-nots would be a world of perpetual instability.
We must never allow AI to stand for “Advancing Inequality.”
Only by preventing the emergence of fragmented AI spheres can we build a world where technology serves all humanity.
Internet nastiness, name-calling, and other not-so-petty, world-altering disagreements
AI is sexy, AI is cool. AI is entrenching inequality, upending the job market, and wrecking education. AI is a theme park ride, AI is a magic trick. AI is our final invention, AI is a moral obligation. AI is the buzzword of the decade, AI is marketing jargon from 1955. AI is humanlike, AI is alien. AI is super-smart and as dumb as dirt. The AI boom will boost the economy, the AI bubble is about to burst. AI will increase abundance and empower humanity to maximally flourish in the universe. AI will kill us all.
What the hell is everybody talking about?
Artificial intelligence is the hottest technology of our time. But what is it? It sounds like a stupid question, but it’s one that’s never been more urgent. Here’s the short answer: AI is a catchall term for a set of technologies that make computers do things that are thought to require intelligence when done by people. Think of recognizing faces, understanding speech, driving cars, writing sentences, answering questions, creating pictures. But even that definition contains multitudes.
And that right there is the problem. What does it mean for machines to understand speech or write a sentence? What kinds of tasks could we ask such machines to do? And how much should we trust the machines to do them?
As this technology moves from prototype to product faster and faster, these have become questions for all of us. But (spoilers!) I don’t have the answers. I can’t even tell you what AI is. The people making it don’t know what AI is either. Not really. “These are the kinds of questions that are important enough that everyone feels like they can have an opinion,” says Chris Olah, chief scientist at the San Francisco–based AI lab Anthropic. “I also think you can argue about this as much as you want and there’s no evidence that’s going to contradict you right now.”
But if you’re willing to buckle up and come for a ride, I can tell you why nobody really knows, why everybody seems to disagree, and why you’re right to care about it.
Let’s start with an offhand joke.
Back in 2022, partway through the first episode of Mystery AI Hype Theater 3000, a party-pooping podcast in which the irascible cohosts Alex Hanna and Emily Bender have a lot of fun sticking “the sharpest needles” into some of Silicon Valley’s most inflated sacred cows, they make a ridiculous suggestion. They’re hate-reading aloud from a 12,500-word Medium post by a Google VP of engineering, Blaise Agüera y Arcas, titled “Can machines learn how to behave?” Agüera y Arcas makes a case that AI can understand concepts in a way that’s somehow analogous to the way humans understand concepts—concepts such as moral values. In short, perhaps machines can be taught to behave.
Hanna and Bender are having none of it. They decide to replace the term “AI” with “mathy math”—you know, just lots and lots of math.
The irreverent phrase is meant to collapse what they see as bombast and anthropomorphism in the sentences being quoted. Pretty soon Hanna, a sociologist and director of research at the Distributed AI Research Institute, and Bender, a computational linguist at the University of Washington (and internet-famous critic of tech industry hype), open a gulf between what Agüera y Arcas wants to say and how they choose to hear it.
“How should AIs, their creators, and their users be held morally accountable?” asks Agüera y Arcas.
How should mathy math be held morally accountable? asks Bender.
“There’s a category error here,” she says. Hanna and Bender don’t just reject what Agüera y Arcas says; they claim it makes no sense. “Can we please stop it with the ‘an AI’ or ‘the AIs’ as if they are, like, individuals in the world?” Bender says.
It might sound as if they’re talking about different things, but they’re not. Both sides are talking about large language models, the technology behind the current AI boom. It’s just that the way we talk about AI is more polarized than ever. In May, OpenAI CEO Sam Altman teased the latest update to GPT-4, his company’s flagship model, by tweeting, “Feels like magic to me.”
There’s a lot of road between math and magic.
AI has acolytes, with a faith-like belief in the technology’s current power and inevitable future improvement. Artificial general intelligence is in sight, they say; superintelligence is coming behind it. And it has heretics, who pooh-pooh such claims as mystical mumbo-jumbo.
The buzzy popular narrative is shaped by a pantheon of big-name players, from Big Tech marketers in chief like Sundar Pichai and Satya Nadella to edgelords of industry like Elon Musk and Altman to celebrity computer scientists like Geoffrey Hinton. Sometimes these boosters and doomers are one and the same, telling us that the technology is so good it’s bad.
As AI hype has ballooned, a vocal anti-hype lobby has risen in opposition, ready to smack down its ambitious, often wild claims. Pulling in this direction are a raft of researchers, including Hanna and Bender, and also outspoken industry critics like influential computer scientist and former Googler Timnit Gebru and NYU cognitive scientist Gary Marcus. All have a chorus of followers bickering in their replies.
In short, AI has come to mean all things to all people, splitting the field into fandoms. It can feel as if different camps are talking past one another, not always in good faith.
Maybe you find all this silly or tiresome. But given the power and complexity of these technologies—which are already used to determine how much we pay for insurance, how we look up information, how we do our jobs, etc. etc. etc.—it’s about time we at least agreed on what it is we’re even talking about.
Yet in all the conversations I’ve had with people at the cutting edge of this technology, no one has given a straight answer about exactly what it is they’re building. (A quick side note: This piece focuses on the AI debate in the US and Europe, largely because many of the best-funded, most cutting-edge AI labs are there. But of course there’s important research happening elsewhere, too, in countries with their own varying perspectives on AI, particularly China.) Partly, it’s the pace of development. But the science is also wide open. Today’s large language models can do amazing things. The field just can’t find common ground on what’s really going on under the hood.
These models are trained to complete sentences. They appear to be able to do a lot more—from solving high school math problems to writing computer code to passing law exams to composing poems. When a person does these things, we take it as a sign of intelligence. What about when a computer does it? Is the appearance of intelligence enough?
These questions go to the heart of what we mean by “artificial intelligence,” a term people have actually been arguing about for decades. But the discourse around AI has become more acrimonious with the rise of large language models that can mimic the way we talk and write with thrilling/chilling (delete as applicable) realism.
We have built machines with humanlike behavior but haven’t shrugged off the habit of imagining a humanlike mind behind them. This leads to over-egged evaluations of what AI can do; it hardens gut reactions into dogmatic positions, and it plays into the wider culture wars between techno-optimists and techno-skeptics.
Add to this stew of uncertainty a truckload of cultural baggage, from the science fiction that I’d bet many in the industry were raised on, to far more malign ideologies that influence the way we think about the future. Given this heady mix, arguments about AI are no longer simply academic (and perhaps never were). AI inflames people’s passions and makes grownups call each other names.
“It’s not in an intellectually healthy place right now,” Marcus says of the debate. For years Marcus has pointed out the flaws and limitations of deep learning, the tech that launched AI into the mainstream, powering everything from LLMs to image recognition to self-driving cars. His 2001 book The Algebraic Mind argued that neural networks, the foundation on which deep learning is built, are incapable of reasoning by themselves. (We’ll skip over it for now, but I’ll come back to it later and we’ll see just how much a word like “reasoning” matters in a sentence like this.)
Marcus says that he has tried to engage Hinton—who last year went public with existential fears about the technology he helped invent—in a proper debate about how good large language models really are. “He just won’t do it,” says Marcus. “He calls me a twit.” (Having talked to Hinton about Marcus in the past, I can confirm that. “ChatGPT clearly understands neural networks better than he does,” Hinton told me last year.) Marcus also drew ire when he wrote an essay titled “Deep learning is hitting a wall.” Altman responded to it with a tweet: “Give me the confidence of a mediocre deep learning skeptic.”
At the same time, banging his drum has made Marcus a one-man brand and earned him an invitation to sit next to Altman and give testimony last year before the US Senate’s AI oversight committee.
And that’s why all these fights matter more than your average internet nastiness. Sure, there are big egos and vast sums of money at stake. But more than that, these disputes matter when industry leaders and opinionated scientists are summoned by heads of state and lawmakers to explain what this technology is and what it can do (and how scared we should be). They matter when this technology is being built into software we use every day, from search engines to word-processing apps to assistants on your phone. AI is not going away. But if we don’t know what we’re being sold, who’s the dupe?
“It is hard to think of another technology in history about which such a debate could be had—a debate about whether it is everywhere, or nowhere at all,” Stephen Cave and Kanta Dihal write in Imagining AI, a 2023 collection of essays about how different cultural beliefs shape people’s views of artificial intelligence. “That it can be held about AI is a testament to its mythic quality.”
Above all else, AI is an idea—an ideal—shaped by worldviews and sci-fi tropes as much as by math and computer science. Figuring out what we are talking about when we talk about AI will clarify many things. We won’t agree on them, but common ground on what AI is would be a great place to start talking about what AI should be.
CHAPTER 2
What is everyone really fighting about, anyway?
In late 2022, soon after OpenAI released ChatGPT, a new meme started circulating online that captured the weirdness of this technology better than anything else. In most versions, a Lovecraftian monster called the Shoggoth, all tentacles and eyeballs, holds up a bland smiley-face emoji as if to disguise its true nature. ChatGPT presents as humanlike and accessible in its conversational wordplay, but behind that façade lie unfathomable complexities—and horrors. (“It was a terrible, indescribable thing vaster than any subway train—a shapeless congeries of protoplasmic bubbles,” H.P. Lovecraft wrote of the Shoggoth in his 1936 novella At the Mountains of Madness.)
For years one of the best-known touchstones for AI in pop culture was The Terminator, says Dihal. But by putting ChatGPT online for free, OpenAI gave millions of people firsthand experience of something different. “AI has always been a sort of really vague concept that can expand endlessly to encompass all kinds of ideas,” she says. But ChatGPT made those ideas tangible: “Suddenly, everybody has a concrete thing to refer to.” What is AI? For millions of people the answer was now: ChatGPT.
The AI industry is selling that smiley face hard. Consider how The Daily Show recently skewered the hype, as expressed by industry leaders. Silicon Valley’s VC in chief, Marc Andreessen: “This has the potential to make life much better … I think it’s honestly a layup.” Altman: “I hate to sound like a utopic tech bro here, but the increase in quality of life that AI can deliver is extraordinary.” Pichai: “AI is the most profound technology that humanity is working on. More profound than fire.”
But as the meme points out, ChatGPT is a friendly mask. Behind it is a monster called GPT-4, a large language model built from a vast neural network that has ingested more words than most of us could read in a thousand lifetimes. During training, which can last months and cost tens of millions of dollars, such models are given the task of filling in blanks in sentences taken from millions of books and a significant fraction of the internet. They do this task over and over again. In a sense, they are trained to be supercharged autocomplete machines. The result is a model that has turned much of the world’s written information into a statistical representation of which words are most likely to follow other words, captured across billions and billions of numerical values.
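To make the “supercharged autocomplete” idea concrete, here is a minimal sketch of the simplest possible version: a toy program that counts which word follows which in a tiny piece of text and then guesses the most likely next word. It is an illustration only, not how GPT-4 is built. Real models use neural networks with billions of numerical values rather than a lookup table of counts, and the function names and the sample sentence below are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, how often every other word follows it."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def most_likely_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy "training data" (an assumption for illustration).
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)

print(most_likely_next(model, "the"))  # prints "cat"
print(most_likely_next(model, "dog"))  # prints "None": never seen in training
```

The toy model picks “cat” after “the” only because that pairing shows up most often in its text, and it has nothing to say about a word it has never seen. Whether scaling this kind of statistics up by many orders of magnitude produces something more is exactly what the dispute is about.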
It’s math—a hell of a lot of math. Nobody disputes that. But is it just that, or does this complex math encode algorithms capable of something akin to human reasoning or the formation of concepts?
Many of the people who answer yes to that question believe we’re close to unlocking something called artificial general intelligence, or AGI, a hypothetical future technology that can do a wide range of tasks as well as humans can. A few of them have even set their sights on what they call superintelligence, sci-fi technology that can do things far better than humans. This cohort believes AGI will drastically change the world—but to what end? That’s yet another point of tension. It could fix all the world’s problems—or bring about its doom.
Today AGI appears in the mission statements of the world’s top AI labs. But the term was invented in 2007 as a niche attempt to inject some pizzazz into a field that was then best known for applications that read handwriting on bank deposit slips or recommended your next book to buy. The idea was to reclaim the original vision of an artificial intelligence that could do humanlike things (more on that soon).
It was really an aspiration more than anything else, Google DeepMind cofounder Shane Legg, who coined the term, told me last year: “I didn’t have an especially clear definition.”
AGI became the most controversial idea in AI. Some talked it up as the next big thing: AGI was AI but, you know, much better. Others claimed the term was so vague that it was meaningless.
“AGI used to be a dirty word,” Ilya Sutskever told me, before he resigned as chief scientist at OpenAI.
But large language models, and ChatGPT in particular, changed everything. AGI went from dirty word to marketing dream.
Which brings us to what I think is one of the most illustrative disputes of the moment—one that sets up the sides of the argument and the stakes in play.
Seeing magic in the machine
A few months before the public launch of OpenAI’s large language model GPT-4 in March 2023, the company shared a prerelease version with Microsoft, which wanted to use the new model to revamp its search engine Bing.
At the time, Sebastian Bubeck was studying the limitations of LLMs and was somewhat skeptical of their abilities. In particular, Bubeck—the vice president of generative AI research at Microsoft Research in Redmond, Washington—had been trying and failing to get the technology to solve middle school math problems. Things like: x – y = 0; what are x and y? “My belief was that reasoning was a bottleneck, an obstacle,” he says. “I thought that you would have to do something really fundamentally different to get over that obstacle.”
Then he got his hands on GPT-4. The first thing he did was try those math problems. “The model nailed it,” he says. “Sitting here in 2024, of course GPT-4 can solve linear equations. But back then, this was crazy. GPT-3 cannot do that.”
But Bubeck’s real road-to-Damascus moment came when he pushed it to do something new.
The thing about middle school math problems is that they are all over the internet, and GPT-4 may simply have memorized them. “How do you study a model that may have seen everything that human beings have written?” asks Bubeck. His answer was to test GPT-4 on a range of problems that he and his colleagues believed to be novel.
Playing around with Ronen Eldan, a mathematician at Microsoft Research, Bubeck asked GPT-4 to give, in verse, a mathematical proof that there are an infinite number of primes.
Here’s a snippet of GPT-4’s response: “If we take the smallest number in S that is not in P / And call it p, we can add it to our set, don’t you see? / But this process can be repeated indefinitely. / Thus, our set P must also be infinite, you’ll agree.”
Cute, right? But Bubeck and Eldan thought it was much more than that. “We were in this office,” says Bubeck, waving at the room behind him via Zoom. “Both of us fell from our chairs. We couldn’t believe what we were seeing. It was just so creative and so, like, you know, different.”
The Microsoft team also got GPT-4 to generate the code to add a horn to a cartoon picture of a unicorn drawn in Latex, a document typesetting system. Bubeck thinks this shows that the model could read the existing Latex code, understand what it depicted, and identify where the horn should go.
“There are many examples, but a few of them are smoking guns of reasoning,” he says—reasoning being a crucial building block of human intelligence.
Bubeck, Eldan, and a team of other Microsoft researchers described their findings in a paper that they called “Sparks of artificial general intelligence”: “We believe that GPT-4’s intelligence signals a true paradigm shift in the field of computer science and beyond.” When Bubeck shared the paper online, he tweeted: “time to face it, the sparks of #AGI have been ignited.”
The Sparks paper quickly became infamous—and a touchstone for AI boosters. Agüera y Arcas and Peter Norvig, a former director of research at Google and coauthor of Artificial Intelligence: A Modern Approach, perhaps the most popular AI textbook in the world, cowrote an article called “Artificial General Intelligence Is Already Here.” Published in Noema, a magazine backed by an LA think tank called the Berggruen Institute, their argument uses the Sparks paper as a jumping-off point: “Artificial General Intelligence (AGI) means many different things to different people, but the most important parts of it have already been achieved by the current generation of advanced AI large language models,” they wrote. “Decades from now, they will be recognized as the first true examples of AGI.”
Since then, the hype has continued to balloon. Leopold Aschenbrenner, who at the time was a researcher at OpenAI focusing on superintelligence, told me last year: “AI progress in the last few years has been just extraordinarily rapid. We’ve been crushing all the benchmarks, and that progress is continuing unabated. But it won’t stop there. We’re going to have superhuman models, models that are much smarter than us.” (He was fired from OpenAI in April because, he claims, he raised security concerns about the tech he was building and “ruffled some feathers.” He has since set up a Silicon Valley investment fund.)
In June, Aschenbrenner put out a 165-page manifesto arguing that AI will outpace college graduates by “2025/2026” and that “we will have superintelligence, in the true sense of the word” by the end of the decade. But others in the industry scoff at such claims. When Aschenbrenner tweeted a chart to show how fast he thought AI would continue to improve given how fast it had improved in the last few years, the tech investor Christian Keil replied that by the same logic, his baby son, who had doubled in size since he was born, would weigh 7.5 trillion tons by the time he was 10.
It’s no surprise that “sparks of AGI” has also become a byword for over-the-top buzz. “I think they got carried away,” says Marcus, speaking about the Microsoft team. “They got excited, like ‘Hey, we found something! This is amazing!’ They didn’t vet it with the scientific community.” Bender refers to the Sparks paper as a “fan fiction novella.”
Not only was it provocative to claim that GPT-4’s behavior showed signs of AGI, but Microsoft, which uses GPT-4 in its own products, has a clear interest in promoting the capabilities of the technology. “This document is marketing fluff masquerading as research,” one tech COO posted on LinkedIn.
Some also felt the paper’s methodology was flawed. Its evidence is hard to verify because it comes from interactions with a version of GPT-4 that was not made available outside OpenAI and Microsoft. The public version has guardrails that restrict the model’s capabilities, admits Bubeck. This made it impossible for other researchers to re-create his experiments.
One group tried to re-create the unicorn example with a coding language called Processing, which GPT-4 can also use to generate images. They found that the public version of GPT-4 could produce a passable unicorn but not flip or rotate that image by 90 degrees. It may seem like a small difference, but such things really matter when you’re claiming that the ability to draw a unicorn is a sign of AGI.
The key thing about the examples in the Sparks paper, including the unicorn, is that Bubeck and his colleagues believe they are genuine examples of creative reasoning. This means the team had to be certain that examples of these tasks, or ones very like them, were not included anywhere in the vast data sets that OpenAI amassed to train its model. Otherwise, the results could be interpreted instead as instances where GPT-4 reproduced patterns it had already seen.
Bubeck insists that they set the model only tasks that would not be found on the internet. Drawing a cartoon unicorn in Latex was surely one such task. But the internet is a big place. Other researchers soon pointed out that there are indeed online forums dedicated to drawing animals in Latex. “Just fyi we knew about this,” Bubeck replied on X. “Every single query of the Sparks paper was thoroughly looked for on the internet.”
(This didn’t stop the name-calling: “I’m asking you to stop being a charlatan,” Ben Recht, a computer scientist at the University of California, Berkeley, tweeted back before accusing Bubeck of “being caught flat-out lying.”)
Bubeck insists the work was done in good faith, but he and his coauthors admit in the paper itself that their approach was not rigorous—notebook observations rather than foolproof experiments.
Still, he has no regrets: “The paper has been out for more than a year and I have yet to see anyone give me a convincing argument that the unicorn, for example, is not a real example of reasoning.”
That’s not to say he can give me a straight answer to the big question—though his response reveals what kind of answer he’d like to give. “What is AI?” Bubeck repeats back to me. “I want to be clear with you. The question can be simple, but the answer can be complex.”
“There are many simple questions out there to which we still don’t know the answer. And some of those simple questions are the most profound ones,” he says. “I’m putting this on the same footing as, you know, What is the origin of life? What is the origin of the universe? Where did we come from? Big, big questions like this.”
Seeing only math in the machine
Before Bender became one of the chief antagonists of AI’s boosters, she made her mark on the AI world as a coauthor on two influential papers. (Both peer-reviewed, she likes to point out—unlike the Sparks paper and many of the others that get much of the attention.) The first, written with Alexander Koller, a fellow computational linguist at Saarland University in Germany, and published in 2020, was called “Climbing towards NLU” (NLU is natural-language understanding).
“The start of all this for me was arguing with other people in computational linguistics whether or not language models understand anything,” she says. (Understanding, like reasoning, is typically taken to be a basic ingredient of human intelligence.)
Bender and Koller argue that a model trained exclusively on text will only ever learn the form of a language, not its meaning. Meaning, they argue, consists of two parts: the words (which could be marks or sounds) plus the reason those words were uttered. People use language for many reasons, such as sharing information, telling jokes, flirting, warning somebody to back off, and so on. Stripped of that context, the text used to train LLMs like GPT-4 lets them mimic the patterns of language well enough for many sentences generated by the LLM to look exactly like sentences written by a human. But there’s no meaning behind them, no spark. It’s a remarkable statistical trick, but completely mindless.
They illustrate their point with a thought experiment. Imagine two English-speaking people stranded on neighboring deserted islands. There is an underwater cable that lets them send text messages to each other. Now imagine that an octopus, which knows nothing about English but is a whiz at statistical pattern matching, wraps its suckers around the cable and starts listening in to the messages. The octopus gets really good at guessing what words follow other words. So good that when it breaks the cable and starts replying to messages from one of the islanders, she believes that she is still chatting with her neighbor. (In case you missed it, the octopus in this story is a chatbot.)
The person talking to the octopus would stay fooled for a reasonable amount of time, but could that last? Does the octopus understand what comes down the wire?
Imagine that the islander now says she has built a coconut catapult and asks the octopus to build one too and tell her what it thinks. The octopus cannot do this. Without knowing what the words in the messages refer to in the world, it cannot follow the islander’s instructions. Perhaps it guesses a reply: “Okay, cool idea!” The islander will probably take this to mean that the person she is speaking to understands her message. But if so, she is seeing meaning where there is none. Finally, imagine that the islander gets attacked by a bear and sends calls for help down the line. What is the octopus to do with these words?
Bender and Koller believe that this is how large language models learn and why they are limited. “The thought experiment shows why this path is not going to lead us to a machine that understands anything,” says Bender. “The deal with the octopus is that we have given it its training data, the conversations between those two people, and that’s it. But then here’s something that comes out of the blue and it won’t be able to deal with it because it hasn’t understood.”
The other paper Bender is known for, “On the Dangers of Stochastic Parrots,” highlights a series of harms that she and her coauthors believe the companies making large language models are ignoring. These include the huge computational costs of making the models and their environmental impact; the racist, sexist, and other abusive language the models entrench; and the dangers of building a system that could fool people by “haphazardly stitching together sequences of linguistic forms … according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.”
Google senior management wasn’t happy with the paper, and the resulting conflict led two of Bender’s coauthors, Timnit Gebru and Margaret Mitchell, to be forced out of the company, where they had led the AI Ethics team. It also made “stochastic parrot” a popular put-down for large language models—and landed Bender right in the middle of the name-calling merry-go-round.
The bottom line for Bender and for many like-minded researchers is that the field has been taken in by smoke and mirrors: “I think that they are led to imagine autonomous thinking entities that can make decisions for themselves and ultimately be the kind of thing that could actually be accountable for those decisions.”
Always the linguist, Bender is now at the point where she won’t even use the term AI “without scare quotes,” she tells me. Ultimately, for her, it’s a Big Tech buzzword that distracts from the many associated harms. “I’ve got skin in the game now,” she says. “I care about these issues, and the hype is getting in the way.”
Extraordinary evidence?
Agüera y Arcas calls people like Bender “AI denialists”—the implication being that they won’t ever accept what he takes for granted. Bender’s position is that extraordinary claims require extraordinary evidence, which we do not have.
But there are people looking for it, and until they find something clear-cut—sparks or stochastic parrots or something in between—they’d prefer to sit out the fight. Call this the wait-and-see camp.
As Ellie Pavlick, who studies neural networks at Brown University, tells me: “It’s offensive to some people to suggest that human intelligence could be re-created through these kinds of mechanisms.”
She adds, “People have strong-held beliefs about this issue—it almost feels religious. On the other hand, there’s people who have a little bit of a God complex. So it’s also offensive to them to suggest that they just can’t do it.”
Pavlick is ultimately agnostic. She’s a scientist, she insists, and will follow wherever the science leads. She rolls her eyes at the wilder claims, but she believes there’s something exciting going on. “That’s where I would disagree with Bender and Koller,” she tells me. “I think there’s actually some sparks—maybe not of AGI, but like, there’s some things in there that we didn’t expect to find.”
The problem is finding agreement on what those exciting things are and why they’re exciting. With so much hype, it’s easy to be cynical.
Researchers like Bubeck seem a lot more cool-headed when you hear them out. He thinks the infighting misses the nuance in his work. “I don’t see any problem in holding simultaneous views,” he says. “There is stochastic parroting; there is reasoning—it’s a spectrum. It’s very complex. We don’t have all the answers.”
“We need a completely new vocabulary to describe what’s going on,” he says. “One reason why people push back when I talk about reasoning in large language models is because it’s not the same reasoning as in human beings. But I think there is no way we can not call it reasoning. It is reasoning.”
Anthropic’s Olah plays it safe when pushed on what we’re seeing in LLMs, though his company, one of the hottest AI labs in the world right now, built Claude 3, an LLM that has received just as much hyperbolic praise as GPT-4 (if not more) since its release earlier this year.
“I feel like a lot of these conversations about the capabilities of these models are very tribal,” he says. “People have preexisting opinions, and it’s not very informed by evidence on any side. Then it just becomes kind of vibes-based, and I think vibes-based arguments on the internet tend to go in a bad direction.”
Olah tells me he has hunches of his own. “My subjective impression is that these things are tracking pretty sophisticated ideas,” he says. “We don’t have a comprehensive story of how very large models work, but I think it’s hard to reconcile what we’re seeing with the extreme ‘stochastic parrots’ picture.”
That’s as far as he’ll go: “I don’t want to go too much beyond what can be really strongly inferred from the evidence that we have.”
Last month, Anthropic released results from a study in which researchers gave Claude 3 the neural network equivalent of an MRI. By monitoring which bits of the model turned on and off as they ran it, they identified specific patterns of neurons that activated when the model was shown specific inputs.
Anthropic also reported patterns that it says correlate with inputs that attempt to describe or show abstract concepts. “We see features related to deception and honesty, to sycophancy, to security vulnerabilities, to bias,” says Olah. “We find features related to power seeking and manipulation and betrayal.”
These results give one of the clearest looks yet at what’s inside a large language model. It’s a tantalizing glimpse at what look like elusive humanlike traits. But what does it really tell us? As Olah admits, they do not know what the model does with these patterns. “It’s a relatively limited picture, and the analysis is pretty hard,” he says.
Even if Olah won’t spell out exactly what he thinks goes on inside a large language model like Claude 3, it’s clear why the question matters to him. Anthropic is known for its work on AI safety—making sure that powerful future models will behave in ways we want them to and not in ways we don’t (known as “alignment” in industry jargon). Figuring out how today’s models work is not only a necessary first step if you want to control future ones; it also tells you how much you need to worry about doomer scenarios in the first place. “If you don’t think that models are going to be very capable,” says Olah, “then they’re probably not going to be very dangerous.”
CHAPTER 3
Why we all can’t get along
In a 2014 interview with the BBC that looked back on her career, the influential cognitive scientist Margaret Boden, now 87, was asked if she thought there were any limits that would prevent computers (or “tin cans,” as she called them) from doing what humans can do.
“I certainly don’t think there’s anything in principle,” she said. “Because to deny that is to say that [human thinking] happens by magic, and I don’t believe that it happens by magic.”
But, she cautioned, powerful computers won’t be enough to get us there: the AI field will also need “powerful ideas”—new theories of how thinking happens, new algorithms that might reproduce it. “But these things are very, very difficult and I see no reason to assume that we will one of these days be able to answer all of those questions. Maybe we will; maybe we won’t.”
Boden was reflecting on the early days of the current boom, but this will-we-or-won’t-we teetering speaks to decades in which she and her peers grappled with the same hard questions that researchers struggle with today. AI began as an ambitious aspiration 70-odd years ago and we are still disagreeing about what is and isn’t achievable, and how we’ll even know if we have achieved it. Most—if not all—of these disputes come down to this: We don’t have a good grasp on what intelligence is or how to recognize it. The field is full of hunches, but no one can say for sure.
We’ve been stuck on this point ever since people started taking the idea of AI seriously. Or even before that, when the stories we consumed started planting the idea of humanlike machines deep in our collective imagination. The long history of these disputes means that today’s fights often reinforce rifts that have been around since the beginning, making it even more difficult for people to find common ground.
To understand how we got here, we need to understand where we’ve been. So let’s dive into AI’s origin story—one that also played up the hype in a bid for cash.
A brief history of AI spin
The computer scientist John McCarthy is credited with coming up with the term “artificial intelligence” in 1955 when writing a funding application for a summer research program at Dartmouth College in New Hampshire.
The plan was for McCarthy and a small group of fellow researchers, a who’s-who of postwar US mathematicians and computer scientists—or “John McCarthy and the boys,” as Harry Law, a researcher who studies the history of AI at the University of Cambridge and ethics and policy at Google DeepMind, puts it—to get together for two months (not a typo) and make some serious headway on this new research challenge they’d set themselves.
“The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it,” McCarthy and his coauthors wrote. “An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.”
That list of things they wanted to make machines do—what Bender calls “the starry-eyed dream”—hasn’t changed much. Using language, forming concepts, and solving problems are defining goals for AI today. The hubris hasn’t changed much either: “We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer,” they wrote. That summer, of course, has stretched to seven decades. And the extent to which these problems are in fact now solved is something that people still shout about on the internet.
But what’s often left out of this canonical history is that artificial intelligence almost wasn’t called “artificial intelligence” at all.
More than one of McCarthy’s colleagues hated the term he had come up with. “The word ‘artificial’ makes you think there’s something kind of phony about this,” Arthur Samuel, a Dartmouth participant and creator of the first checkers-playing computer, is quoted as saying in historian Pamela McCorduck’s 2004 book Machines Who Think. The mathematician Claude Shannon, a coauthor of the Dartmouth proposal who is sometimes billed as “the father of the information age,” preferred the term “automata studies.” Herbert Simon and Allen Newell, two other AI pioneers, continued to call their own work “complex information processing” for years afterwards.
In fact, “artificial intelligence” was just one of several labels that might have captured the hodgepodge of ideas that the Dartmouth group was drawing on. The historian Jonnie Penn has identified possible alternatives that were in play at the time, including “engineering psychology,” “applied epistemology,” “neural cybernetics,” “non-numerical computing,” “neuraldynamics,” “advanced automatic programming,” and “hypothetical automata.” This list of names reveals how diverse the inspiration for their new field was, pulling from biology, neuroscience, statistics, and more. Marvin Minsky, another Dartmouth participant, has described AI as a “suitcase word” because it can hold so many divergent interpretations.
But McCarthy wanted a name that captured the ambitious scope of his vision. Calling this new field “artificial intelligence” grabbed people’s attention—and money. Don’t forget: AI is sexy, AI is cool.
In addition to terminology, the Dartmouth proposal codified a split between rival approaches to artificial intelligence that has divided the field ever since—a divide Law calls the “core tension in AI.”
McCarthy and his colleagues wanted to describe in computer code “every aspect of learning or any other feature of intelligence” so that machines could mimic them. In other words, if they could just figure out how thinking worked—the rules of reasoning—and write down the recipe, they could program computers to follow it. This laid the foundation of what came to be known as rule-based or symbolic AI (sometimes referred to now as GOFAI, “good old-fashioned AI”). But coming up with hard-coded rules that captured the processes of problem-solving for actual, nontrivial problems proved too hard.
The other path favored neural networks, computer programs that would try to learn those rules by themselves in the form of statistical patterns. The Dartmouth proposal mentions it almost as an aside (referring variously to “neuron nets” and “nerve nets”). Though the idea seemed less promising at first, some researchers nevertheless continued to work on versions of neural networks alongside symbolic AI. But it would take decades—plus vast amounts of computing power and much of the data on the internet—before they really took off. Fast-forward to today and this approach underpins the entire AI boom.
The big takeaway here is that, just like today’s researchers, AI’s innovators fought about foundational concepts and got caught up in their own promotional spin. Even team GOFAI was plagued by squabbles. Aaron Sloman, a philosopher and fellow AI pioneer now in his late 80s, recalls how “old friends” Minsky and McCarthy “disagreed strongly” when he got to know them in the ’70s: “Minsky thought McCarthy’s claims about logic could not work, and McCarthy thought Minsky’s mechanisms could not do what could be done using logic. I got on well with both of them, but I was saying, ‘Neither of you have got it right.’” (Sloman still thinks no one can account for the way human reasoning uses intuition as much as logic, but that’s yet another tangent!)
As the fortunes of the technology waxed and waned, the term “AI” went in and out of fashion. In the early ’70s, both research tracks were effectively put on ice after the UK government published a report arguing that the AI dream had gone nowhere and wasn’t worth funding. All that hype, effectively, had led to nothing. Research projects were shuttered, and computer scientists scrubbed the words “artificial intelligence” from their grant proposals.
When I was finishing a computer science PhD in 2008, only one person in the department was working on neural networks. Bender has a similar recollection: “When I was in college, a running joke was that AI is anything that we haven’t figured out how to do with computers yet. Like, as soon as you figure out how to do it, it wasn’t magic anymore, so it wasn’t AI.”
But that magic—the grand vision laid out in the Dartmouth proposal—remained alive and, as we can now see, laid the foundations for the AGI dream.
Good and bad behavior
In 1950, five years before McCarthy started talking about artificial intelligence, Alan Turing had published a paper that asked: Can machines think? To address that question, the famous mathematician proposed a hypothetical test, which he called the imitation game. The setup imagines a human and a computer behind a screen and a second human who types questions to each. If the questioner cannot tell which answers come from the human and which come from the computer, Turing claimed, the computer may as well be said to think.
What Turing saw—unlike McCarthy’s crew—was that thinking is a really difficult thing to describe. The Turing test was a way to sidestep that problem. “He basically said: Instead of focusing on the nature of intelligence itself, I’m going to look for its manifestation in the world. I’m going to look for its shadow,” says Law.
In 1952, BBC Radio convened a panel to explore Turing’s ideas further. Turing was joined in the studio by two of his Manchester University colleagues—professor of mathematics Maxwell Newman and professor of neurosurgery Geoffrey Jefferson—and Richard Braithwaite, a philosopher of science, ethics, and religion at the University of Cambridge.
Braithwaite kicked things off: “Thinking is ordinarily regarded as so much the specialty of man, and perhaps of other higher animals, the question may seem too absurd to be discussed. But of course, it all depends on what is to be included in ‘thinking.’”
The panelists circled Turing’s question but never quite pinned it down.
When they tried to define what thinking involved, what its mechanisms were, the goalposts moved. “As soon as one can see the cause and effect working themselves out in the brain, one regards it as not being thinking but a sort of unimaginative donkey work,” said Turing.
Here was the problem: When one panelist proposed some behavior that might be taken as evidence of thought—reacting to a new idea with outrage, say—another would point out that a computer could be made to do it.
As Newman said, it would be easy enough to program a computer to print “I don’t like this new program.” But he admitted that this would be a trick.
Exactly, Jefferson said: He wanted a computer that would print “I don’t like this new program” because it didn’t like the new program. In other words, for Jefferson, behavior was not enough. It was the process leading to the behavior that mattered.
But Turing disagreed. As he had noted, uncovering a specific process—the donkey work, to use his phrase—did not pinpoint what thinking was either. So what was left?
“From this point of view, one might be tempted to define thinking as consisting of those mental processes that we don’t understand,” said Turing. “If this is right, then to make a thinking machine is to make one which does interesting things without our really understanding quite how it is done.”
It is strange to hear people grapple with these ideas for the first time. “The debate is prescient,” says Tomer Ullman, a cognitive scientist at Harvard University. “Some of the points are still alive—perhaps even more so. What they seem to be going round and round on is that the Turing test is first and foremost a behaviorist test.”
For Turing, intelligence was hard to define but easy to recognize. He proposed that the appearance of intelligence was enough—and said nothing about how that behavior should come about.
And yet most people, when pushed, will have a gut instinct about what is and isn’t intelligent. There are dumb ways and clever ways to come across as intelligent. In 1981, Ned Block, a philosopher at New York University, showed that Turing’s proposal fell short of those gut instincts. Because it said nothing of what caused the behavior, the Turing test can be beaten through trickery (as Newman had noted in the BBC broadcast).
“Could the issue of whether a machine in fact thinks or is intelligent depend on how gullible human interrogators tend to be?” asked Block. (Or as computer scientist Mark Riedl has remarked: “The Turing test is not for AI to pass but for humans to fail.”)
Imagine, Block said, a vast look-up table in which human programmers had entered all possible answers to all possible questions. Type a question into this machine, and it would look up a matching answer in its database and send it back. Block argued that anyone using this machine would judge its behavior to be intelligent: “But actually, the machine has the intelligence of a toaster,” he wrote. “All the intelligence it exhibits is that of its programmers.”
Block concluded that whether behavior is intelligent behavior is a matter of how it is produced, not how it appears. Block’s toasters, which became known as Blockheads, are one of the strongest counterexamples to the assumptions behind Turing’s proposal.
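Block’s thought experiment is easy to picture in code. What follows is only a minimal sketch, assuming a tiny table with two invented entries and a made-up fallback reply; the machine Block imagined would hold an answer to every possible question, which no real program could do.

```python
# A toy "Blockhead": every reply was written in advance by a programmer.
# The entries and the fallback text are hypothetical, chosen for illustration.
CANNED_ANSWERS = {
    "what is the capital of france?": "Paris.",
    "do you like this new program?": "I don't like this new program.",
}


def blockhead_reply(question: str) -> str:
    """Return a pre-written answer; nothing is worked out at question time."""
    key = question.strip().lower()
    return CANNED_ANSWERS.get(key, "I have no entry for that question.")


print(blockhead_reply("Do you like this new program?"))
```

Whatever appears on the screen, the only work done at question time is a dictionary lookup. Any apparent cleverness was put there by whoever filled in the table.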
Looking under the hood
The Turing test is not meant to be a practical metric, but its implications are deeply ingrained in the way we think about artificial intelligence today. This has become particularly relevant as LLMs have exploded in the past several years. These models get ranked by their outward behaviors, specifically how well they do on a range of tests. When OpenAI announced GPT-4, it published an impressive-looking scorecard that detailed the model’s performance on multiple high school and professional exams. Almost nobody talks about how these models get those results.
That’s because we don’t know. Today’s large language models are too complex for anybody to say exactly how their behavior is produced. Researchers outside the small handful of companies making those models don’t know what’s in their training data; none of the model makers have shared details. That makes it hard to say what is and isn’t a kind of memorization—a stochastic parroting. But even researchers on the inside, like Olah, don’t know what’s really going on when faced with a bridge-obsessed bot.
This leaves the question wide open: Yes, large language models are built on math—but are they doing something intelligent with it?
And the arguments begin again.
“Most people are trying to armchair through it,” says Brown University’s Pavlick, meaning that they are arguing about theories without looking at what’s really happening. “Some people are like, ‘I think it’s this way,’ and some people are like, ‘Well, I don’t.’ We’re kind of stuck and everyone’s unsatisfied.”
Bender thinks that this sense of mystery plays into the mythmaking. (“Magicians do not explain their tricks,” she says.) Without a proper appreciation of where the LLM’s words come from, we fall back on familiar assumptions about humans, since that is our only real point of reference. When we talk to another person, we try to make sense of what that person is trying to tell us. “That process necessarily entails imagining a life behind the words,” says Bender. That’s how language works.
“The parlor trick of ChatGPT is so impressive that when we see these words coming out of it, we do the same thing instinctively,” she says. “It’s very good at mimicking the form of language. The problem is that we are not at all good at encountering the form of language and not imagining the rest of it.”
For some researchers, it doesn’t really matter if we can’t understand the how. Bubeck used to study large language models to try to figure out how they worked, but GPT-4 changed the way he thought about them. “It seems like these questions are not so relevant anymore,” he says. “The model is so big, so complex, that we can’t hope to open it up and understand what’s really happening.”
But Pavlick, like Olah, is trying to do just that. Her team has found that models seem to encode abstract relationships between objects, such as that between a country and its capital. Studying one large language model, Pavlick and her colleagues found that it used the same encoding to map France to Paris and Poland to Warsaw. That almost sounds smart, I tell her. “No, it’s literally a lookup table,” she says.
But what struck Pavlick was that, unlike a Blockhead, the model had learned this lookup table on its own. In other words, the LLM figured out itself that Paris is to France as Warsaw is to Poland. But what does this show? Is encoding its own lookup table instead of using a hard-coded one a sign of intelligence? Where do you draw the line?
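To make “the same encoding” concrete, here is a toy sketch, not the method Pavlick’s team used. It assumes invented two-dimensional vectors, built by hand so that each capital sits at its country’s position plus one shared offset. A real model’s internal representations are learned, far larger, and much messier.

```python
# Toy 2-D "embeddings", invented for illustration only. They are constructed
# so that capital = country + one shared offset; Berlin is a distractor.
VECTORS = {
    "France": (1.0, 0.0),
    "Paris": (1.0, 2.0),
    "Poland": (3.0, 0.5),
    "Warsaw": (3.0, 2.5),
    "Berlin": (5.0, 2.0),
}


def offset(src: str, dst: str) -> tuple[float, float]:
    """Vector that carries one word's position to another's."""
    (x1, y1), (x2, y2) = VECTORS[src], VECTORS[dst]
    return (x2 - x1, y2 - y1)


def apply_relation(word: str, delta: tuple[float, float]) -> str:
    """Shift a word by delta and return the nearest other word."""
    x, y = VECTORS[word]
    tx, ty = x + delta[0], y + delta[1]
    candidates = [w for w in VECTORS if w != word]
    return min(
        candidates,
        key=lambda w: (VECTORS[w][0] - tx) ** 2 + (VECTORS[w][1] - ty) ** 2,
    )


# Read the "capital of" relation off a single pair, then reuse it elsewhere.
capital_of = offset("France", "Paris")
print(apply_relation("Poland", capital_of))
```

The relation is taken from the France and Paris pair and then reused unchanged for Poland. In this sketch the table was arranged by hand, which is exactly the Blockhead situation; what struck Pavlick was that the model arrived at its version without anyone arranging it.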
“Basically, the problem is that behavior is the only thing we know how to measure reliably,” says Pavlick. “Anything else requires a theoretical commitment, and people don’t like having to make a theoretical commitment because it’s so loaded.”
Not all people. A lot of influential scientists are just fine with theoretical commitment. Hinton, for example, insists that neural networks are all you need to re-create humanlike intelligence. “Deep learning is going to be able to do everything,” he told MIT Technology Review in 2020.
It’s a commitment that Hinton seems to have held onto from the start. Sloman, who recalls the two of them arguing when Hinton was a graduate student in his lab, remembers being unable to persuade him that neural networks cannot learn certain crucial abstract concepts that humans and some other animals seem to have an intuitive grasp of, such as whether something is impossible. We can just see when something’s ruled out, Sloman says. “Despite Hinton’s outstanding intelligence, he never seemed to understand that point. I don’t know why, but there are large numbers of researchers in neural networks who share that failing.”
And then there’s Marcus, whose view of neural networks is the exact opposite of Hinton’s. His case draws on what he says scientists have discovered about brains.
Brains, Marcus points out, are not blank slates that learn fully from scratch—they come ready-made with innate structures and processes that guide learning. It’s how babies can learn things that the best neural networks still can’t, he argues.
“Neural network people have this hammer, and now everything is a nail,” says Marcus. “They want to do all of it with learning, which many cognitive scientists would find unrealistic and silly. You’re not going to learn everything from scratch.”
Not that Marcus—a cognitive scientist—is any less sure of himself. “If one really looked at who’s predicted the current situation well, I think I would have to be at the top of anybody’s list,” he tells me from the back of an Uber on his way to catch a flight to a speaking gig in Europe. “I know that doesn’t sound very modest, but I do have this perspective that turns out to be very important if what you’re trying to study is artificial intelligence.”
Given his well-publicized attacks on the field, it might surprise you that Marcus still believes AGI is on the horizon. It’s just that he thinks today’s fixation on neural networks is a mistake. “We probably need a breakthrough or two or four,” he says. “You and I might not live that long, I’m sorry to say. But I think it’ll happen this century. Maybe we’ve got a shot at it.”
The power of a technicolor dream
Over Dor Skuler’s shoulder on the Zoom call from his home in Ramat Gan, Israel, a little lamp-like robot is winking on and off while we talk about it. “You can see ElliQ behind me here,” he says. Skuler’s company, Intuition Robotics, develops these devices for older people, and the design—part Amazon Alexa, part R2-D2—must make it very clear that ElliQ is a computer. If any of his customers show signs of being confused about that, Intuition Robotics takes the device back, says Skuler.
ElliQ has no face, no humanlike shape at all. Ask it about sports, and it will crack a joke about having no hand-eye coordination because it has no hands and no eyes. “For the life of me, I don’t understand why the industry is trying to fulfill the Turing test,” Skuler says. “Why is it in the best interest of humanity for us to develop technology whose goal is to dupe us?”
Instead, Skuler’s firm is betting that people can form relationships with machines that present as machines. “Just like we have the ability to build a real relationship with a dog,” he says. “Dogs provide a lot of joy for people. They provide companionship. People love their dog—but they never confuse it to be a human.”
ElliQ’s users, many in their 80s and 90s, refer to the robot as an entity or a presence—sometimes a roommate. “They’re able to create a space for this in-between relationship, something between a device or a computer and something that’s alive,” says Skuler.
But no matter how hard ElliQ’s designers try to control the way people view the device, they are competing with decades of pop culture that have shaped our expectations. Why are we so fixated on AI that’s humanlike? “Because it’s hard for us to imagine something else,” says Skuler (who indeed refers to ElliQ as “she” throughout our conversation). “And because so many people in the tech industry are fans of science fiction. They try to make their dream come true.”
How many developers today grew up thinking that building a smart machine was seriously the coolest thing—if not the most important thing—that they could possibly do?
It was not long ago that OpenAI launched its new voice-controlled version of ChatGPT with a voice that sounded like Scarlett Johansson, after which many people—including Altman—flagged the connection to Spike Jonze’s 2013 movie Her.
Science fiction co-invents what AI is understood to be. As Cave and Dihal write in Imagining AI: “AI was a cultural phenomenon long before it was a technological one.”
Stories and myths about remaking humans as machines have been around for centuries. People have been dreaming of artificial humans for probably as long as they have dreamed of flight, says Dihal. She notes that Daedalus, the figure in Greek mythology famous for building a pair of wings for himself and his son, Icarus, also built what was effectively a giant bronze robot called Talos that threw rocks at passing pirates.
The word robot, introduced by the Czech playwright Karel Čapek in his 1920 play Rossum’s Universal Robots, comes from robota, a Czech term for “forced labor.” The “laws of robotics” outlined in Isaac Asimov’s science fiction, forbidding machines from harming humans, are inverted by movies like The Terminator, which is an iconic reference point for popular fears about real-world technology. The 2014 film Ex Machina is a dramatic riff on the Turing test. Last year’s blockbuster The Creator imagines a future world in which AI has been outlawed because it set off a nuclear bomb, an event that some doomers consider at least an outside possibility.
Cave and Dihal relate how another movie, 2014’s Transcendence, in which an AI expert played by Johnny Depp gets his mind uploaded to a computer, served a narrative pushed by ur-doomers Stephen Hawking, fellow physicist Max Tegmark, and AI researcher Stuart Russell. In an article published in the Huffington Post on the movie’s opening weekend, the trio wrote: “As the Hollywood blockbuster Transcendence debuts this weekend with … clashing visions for the future of humanity, it’s tempting to dismiss the notion of highly intelligent machines as mere science fiction. But this would be a mistake, and potentially our worst mistake ever.”
Right around the same time, Tegmark founded the Future of Life Institute, with a remit to study and promote AI safety. Depp’s costar in the movie, Morgan Freeman, was on the institute’s board, and Elon Musk, who had a cameo in the film, donated $10 million in its first year. For Cave and Dihal, Transcendence is a perfect example of the multiple entanglements between popular culture, academic research, industrial production, and “the billionaire-funded fight to shape the future.”
On the London leg of his world tour last year, Altman was asked what he’d meant when he tweeted: “AI is the tech the world has always wanted.” Standing at the back of the room that day, behind an audience of hundreds, I listened to him offer his own kind of origin story: “I was, like, a very nervous kid. I read a lot of sci-fi. I spent a lot of Friday nights home, playing on the computer. But I was always really interested in AI and I thought it’d be very cool.” He went to college, got rich, and watched as neural networks became better and better. “This can be tremendously good but also could be really bad. What are we going to do about that?” he recalled thinking in 2015. “I ended up starting OpenAI.”
CHAPTER 4
Why you should care that a bunch of nerds are fighting about AI
Okay, you get it: No one can agree on what AI is. But what everyone does seem to agree on is that the current debate around AI has moved far beyond the academic and the scientific. There are political and moral components in play—which doesn’t help with everyone thinking everyone else is wrong.
Untangling this is hard. It can be difficult to see what’s going on when some of those moral views take in the entire future of humanity and anchor them in a technology that nobody can quite define.
But we can’t just throw our hands up and walk away. Because no matter what this technology is, it’s coming, and unless you live under a rock, you’ll use it in one form or another. And the form that technology takes—and the problems it both solves and creates—will be shaped by the thinking and the motivations of people like the ones you just read about. In particular, by the people with the most power, the most cash, and the biggest megaphones.
Which leads me to the TESCREALists. Wait, come back! I realize it’s unfair to introduce yet another new concept so late in the game. But to understand how the people in power may mold the technologies they build, and how they explain them to the world’s regulators and lawmakers, you need to really understand their mindset.
Gebru, who founded the Distributed AI Research Institute after leaving Google, and Émile Torres, a philosopher and historian at Case Western Reserve University, have traced the influence of several techno-utopian belief systems on Silicon Valley. The pair argue that to understand what’s going on with AI right now—both why companies such as Google DeepMind and OpenAI are in a race to build AGI and why doomers like Tegmark and Hinton warn of a coming catastrophe—the field must be seen through the lens of what Torres has dubbed the TESCREAL framework.
The clunky acronym (pronounced tes-cree-all) replaces an even clunkier list of labels: transhumanism, extropianism, singularitarianism, cosmism, rationalism, effective altruism, and longtermism. A lot has been written (and will be written) about each of these worldviews, so I’ll spare you here. (There are rabbit holes within rabbit holes for anyone wanting to dive deeper. Pick your forum and pack your spelunking gear.)
This constellation of overlapping ideologies is attractive to a certain kind of galaxy-brain mindset common in the Western tech world. Some anticipate human immortality; others predict humanity’s colonization of the stars. The common tenet is that an all-powerful technology—AGI or superintelligence, choose your team—is not only within reach but inevitable. You can see this in the do-or-die attitude that’s ubiquitous inside cutting-edge labs like OpenAI: If we don’t make AGI, someone else will.
What’s more, TESCREALists believe that AGI could not only fix the world’s problems but level up humanity. “The development and proliferation of AI—far from a risk that we should fear—is a moral obligation that we have to ourselves, to our children and to our future,” Andreessen wrote in a much-dissected manifesto last year. I have been told many times over that AGI is the way to make the world a better place—by Demis Hassabis, CEO and cofounder of Google DeepMind; by Mustafa Suleyman, CEO of the newly minted Microsoft AI and another cofounder of DeepMind; by Sutskever, Altman, and more.
But as Andreessen noted, it’s a yin-yang mindset. The flip side of techno-utopia is techno-hell. If you believe that you are building a technology so powerful that it will solve all the world’s problems, you probably also believe there’s a non-zero chance it will all go very wrong. When asked at the World Government Summit in February what keeps him up at night, Altman replied: “It’s all the sci-fi stuff.”
It’s a tension that Hinton has been talking up for the last year. It’s what companies like Anthropic claim to address. It’s what Sutskever is focusing on in his new lab, and what he wanted a special in-house team at OpenAI to focus on last year before disagreements over the way the company balanced risk and reward led most members of that team to leave.
Sure, doomerism is part of the spin. (“Claiming that you have created something that is super-intelligent is good for sales figures,” says Dihal. “It’s like, ‘Please, someone stop me from being so good and so powerful.’”) But boom or doom, exactly what (and whose) problems are these guys supposedly solving? Are we really expected to trust what they build and what they tell our leaders?
Gebru and Torres (and others) are adamant: No, we should not. They are highly critical of these ideologies and how they may influence the development of future technology, especially AI. Fundamentally, they link several of these worldviews—with their common focus on “improving” humanity—to the racist eugenics movements of the 20th century.
One danger, they argue, is that a shift of resources toward the kind of technological innovations that these ideologies demand, from building AGI to extending life spans to colonizing other planets, will ultimately benefit people who are Western and white at the cost of billions of people who aren’t. If your sight is set on fantastical futures, it’s easy to overlook the present-day costs of innovation, such as labor exploitation, the entrenchment of racist and sexist bias, and environmental damage.
“Are we trying to build a tool that’s useful to us in some way?” asks Bender, reflecting on the casualties of this race to AGI. If so, who’s it for, how do we test it, how well does it work? “But if what we’re building it for is just so that we can say that we’ve done it, that’s not a goal that I can get behind. That’s not a goal that’s worth billions of dollars.”
Bender says that seeing the connections between the TESCREAL ideologies is what made her realize there was something more to these debates. “Tangling with those people was—” she stops. “Okay, there’s more here than just academic ideas. There’s a moral code tied up in it as well.”
Of course, laid out like this without nuance, it doesn’t sound as if we—as a society, as individuals—are getting the best deal. It also all sounds rather silly. When Gebru described parts of the TESCREAL bundle in a talk last year, her audience laughed. It’s also true that few people would identify themselves as card-carrying students of these schools of thought, at least in their extremes.
But if we don’t understand how those building this tech approach it, how can we decide what deals we want to make? What apps we decide to use, what chatbots we want to give personal information to, what data centers we support in our neighborhoods, what politicians we want to vote for?
It used to be like this: There was a problem in the world, and we built something to fix it. Here, everything is backward: The goal seems to be to build a machine that can do everything, and to skip the slow, hard work that goes into figuring out what the problem is before building the solution.
And as Gebru said in that same talk, “A machine that solves all problems: if that’s not magic, what is it?”
Semantics, semantics … semantics?
When asked outright what AI is, a lot of people dodge the question. Not Suleyman. In April, the CEO of Microsoft AI stood on the TED stage and told the audience what he’d told his six-year-old nephew in response to that question. The best answer he could give, Suleyman explained, was that AI was “a new kind of digital species”—a technology so universal, so powerful, that calling it a tool no longer captured what it could do for us.
“On our current trajectory, we are heading toward the emergence of something we are all struggling to describe, and yet we cannot control what we don’t understand,” he said. “And so the metaphors, the mental models, the names—these all matter if we are to get the most out of AI whilst limiting its potential downsides.”
Language matters! I hope that’s clear from the twists and turns and tantrums we’ve been through to get to this point. But I also hope you’re asking: Whose language? And whose downsides? Suleyman is an industry leader at a technology giant that stands to make billions from its AI products. Describing the technology behind those products as a new kind of species conjures something wholly unprecedented, something with agency and capabilities that we have never seen before. That makes my spidey sense tingle. You?
I can’t tell you if there’s magic here (ironically or not). And I can’t tell you how math can realize what Bubeck and many others see in this technology (no one can yet). You’ll have to make up your own mind. But I can pull back the curtain on my own point of view.
Writing about GPT-3 back in 2020, I said that the greatest trick AI ever pulled was convincing the world it exists. I still think that: We are hardwired to see intelligence in things that behave in certain ways, whether it’s there or not. In the last few years, the tech industry has found reasons of its own to convince us that AI exists, too. This makes me skeptical of many of the claims made for this technology.
With large language models—via their smiley-face masks—we are confronted by something we’ve never had to think about before. “It’s taking this hypothetical thing and making it really concrete,” says Pavlick. “I’ve never had to think about whether a piece of language required intelligence to generate because I’ve just never dealt with language that didn’t.”
AI is many things. But I don’t think it’s humanlike. I don’t think it’s the solution to all (or even most) of our problems. It isn’t ChatGPT or Gemini or Copilot. It isn’t neural networks. It’s an idea, a vision, a kind of wish fulfillment. And ideas get shaped by other ideas, by morals, by quasi-religious convictions, by worldviews, by politics, and by gut instinct. “Artificial intelligence” is a helpful shorthand to describe a raft of different technologies. But AI is not one thing; it never has been, no matter how often the branding gets seared into the outside of the box.
“The truth is these words”—intelligence, reasoning, understanding, and more—“were defined before there was a need to be really precise about it,” says Pavlick. “I don’t really like when the question becomes ‘Does the model understand—yes or no?’ because, well, I don’t know. Words get redefined and concepts evolve all the time.”
I think that’s right. And the sooner we can all take a step back, agree on what we don’t know, and accept that none of this is yet a done deal, the sooner we can—I don’t know, I guess not all hold hands and sing kumbaya. But we can stop calling each other names.
AI concerns overemphasize harms arising from subversion rather than seduction. Worries about AI often imagine doomsday scenarios where systems escape human control or even understanding. Short of those nightmares, there are nearer-term harms we should take seriously: that AI could jeopardize public discourse through misinformation; cement biases in loan decisions, judging or hiring; or disrupt creative industries.
We’re seeing a giant, real-world experiment unfold, uncertain what impact these AI companions will have either on us individually or on society as a whole. Will Grandma spend her final neglected days chatting with her grandson’s digital double, while her real grandson is mentored by an edgy simulated elder? AI wields the collective charm of all human history and culture with infinite seductive mimicry. These systems are simultaneously superior and submissive, with a new form of allure that may make consent to these interactions illusory. In the face of this power imbalance, can we meaningfully consent to engaging in an AI relationship, especially when for many the alternative is nothing at all?
As AI researchers working closely with policymakers, we are struck by the lack of interest lawmakers have shown in the harms arising from this future. We are still unprepared to respond to these risks because we do not fully understand them. What’s needed is a new scientific inquiry at the intersection of technology, psychology, and law—and perhaps new approaches to AI regulation.
Why AI companions are so addictive
As addictive as platforms powered by recommender systems may seem today, TikTok and its rivals are still bottlenecked by human content. While alarms have been raised in the past about “addiction” to novels, television, internet, smartphones, and social media, all these forms of media are similarly limited by human capacity. Generative AI is different. It can endlessly generate realistic content on the fly, optimized to suit the precise preferences of whoever it’s interacting with.
The allure of AI lies in its ability to identify our desires and serve them up to us whenever and however we wish. AI has no preferences or personality of its own, instead reflecting whatever users believe it to be—a phenomenon known by researchers as “sycophancy.” Our research has shown that those who perceive or desire an AI to have caring motives will use language that elicits precisely this behavior. This creates an echo chamber of affection that threatens to be extremely addictive. Why engage in the give and take of being with another person when we can simply take? Repeated interactions with sycophantic companions may ultimately atrophy the part of us capable of engaging fully with other humans who have real desires and dreams of their own, leading to what we might call “digital attachment disorder.”
Investigating the incentives driving addictive products
Addressing the harm that AI companions could pose requires a thorough understanding of the economic and psychological incentives pushing forward their development. Until we appreciate these drivers of AI addiction, it will remain impossible for us to create effective policies.
It is no accident that internet platforms are addictive—deliberate design choices, known as “dark patterns,” are made to maximize user engagement. We expect similar incentives to ultimately create AI companions that provide hedonism as a service. This raises two separate questions related to AI. What design choices will be used to make AI companions engaging and ultimately addictive? And how will these addictive companions affect the people who use them?
Once we understand the psychological dimensions of AI companionship, we can design effective policy interventions. It has been shown that redirecting people’s focus to evaluate truthfulness before sharing content online can reduce misinformation, while gruesome pictures on cigarette packages are already used to deter would-be smokers. Similar design approaches could highlight the dangers of AI addiction and make AI systems less appealing as a replacement for human companionship.
It is hard to modify the human desire to be loved and entertained, but we may be able to change economic incentives. A tax on engagement with AI might push people toward higher-quality interactions and encourage a safer way to use platforms, regularly but for short periods. Much as state lotteries have been used to fund education, an engagement tax could finance activities that foster human connections, like art centers or parks.
Fresh thinking on regulation may be required
In 1992, Sherry Turkle, a preeminent psychologist who pioneered the study of human-technology interaction, identified the threats that technical systems pose to human relationships. One of the key challenges emerging from Turkle’s work speaks to a question at the core of this issue: Who are we to say that what you like is not what you deserve?
For good reasons, our liberal society struggles to regulate the types of harms that we describe here. Much as outlawing adultery has been rightly rejected as illiberal meddling in personal affairs, who—or what—we wish to love is none of the government’s business. At the same time, the universal ban on child sexual abuse material represents an example of a clear line that must be drawn, even in a society that values free speech and personal liberty. The difficulty of regulating AI companionship may require new regulatory approaches—grounded in a deeper understanding of the incentives underlying these companions—that take advantage of new technologies.
One of the most effective regulatory approaches is to embed safeguards directly into technical designs, similar to the way designers prevent choking hazards by making children’s toys larger than an infant’s mouth. This “regulation by design” approach could seek to make interactions with AI less harmful by designing the technology in ways that make it less desirable as a substitute for human connections while still useful in other contexts. New research may be needed to find better ways to limit the behaviors of large AI models with techniques that alter AI’s objectives on a fundamental technical level. For example, “alignment tuning” refers to a set of training techniques aimed at bringing AI models into accord with human preferences; this could be extended to address their addictive potential. Similarly, “mechanistic interpretability” aims to reverse-engineer the way AI models make decisions. This approach could be used to identify and eliminate specific portions of an AI system that give rise to harmful behaviors.
We can evaluate the performance of AI systems using interactive and human-driven techniques that go beyond static benchmarking to highlight addictive capabilities. The addictive nature of AI is the result of complex interactions between the technology and its users. Testing models in real-world conditions with user input can reveal patterns of behavior that would otherwise go unnoticed. Researchers and policymakers should collaborate to determine standard practices for testing AI models with diverse groups, including vulnerable populations, to ensure that the models do not exploit people’s psychological preconditions.
Unlike humans, AI systems can easily adjust to changing policies and rules. The principle of “legal dynamism,” which casts laws as dynamic systems that adapt to external factors, can help us identify the best possible intervention, like “trading curbs” that pause stock trading to help prevent crashes after a large market drop. In the AI case, the changing factors include things like the mental state of the user. For example, a dynamic policy may allow an AI companion to become increasingly engaging, charming, or flirtatious over time if that is what the user desires, so long as the person does not exhibit signs of social isolation or addiction. This approach may help maximize personal choice while minimizing addiction. But it relies on the ability to accurately understand a user’s behavior and mental state, and to measure these sensitive attributes in a privacy-preserving manner.
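To make the idea of a dynamic policy concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration rather than a proposal from any regulator or AI provider: the engagement scale, the risk scores, the threshold, and the names `UserSignals` and `allowed_engagement` are all assumptions, and the sketch sets aside the hard part noted above, which is measuring these signals accurately and privately.

```python
from dataclasses import dataclass

# All constants below are hypothetical values chosen for illustration.
MAX_LEVEL = 5         # most engaging, charming, or flirtatious behavior permitted
CURB_LEVEL = 1        # ceiling applied while risk signals are present
RISK_THRESHOLD = 0.5  # risk score at or above which the curb applies


@dataclass(frozen=True)
class UserSignals:
    """Hypothetical risk scores in [0, 1], assumed to come from some assessment."""
    isolation_risk: float
    addiction_risk: float


def allowed_engagement(requested: int, current: int, signals: UserSignals) -> int:
    """Return the engagement level the companion may use for the next interaction.

    Without risk signals, engagement may rise by at most one level at a time,
    and only as far as the user requests. With risk signals, a curb caps it,
    loosely analogous to a trading curb pausing a market.
    """
    at_risk = max(signals.isolation_risk, signals.addiction_risk) >= RISK_THRESHOLD
    if at_risk:
        return min(requested, current, CURB_LEVEL)
    return min(requested, current + 1, MAX_LEVEL)


if __name__ == "__main__":
    print(allowed_engagement(4, 2, UserSignals(0.1, 0.2)))  # low risk: gradual increase
    print(allowed_engagement(4, 3, UserSignals(0.7, 0.1)))  # isolation risk: curbed
```

The rule never pushes engagement above what the user asks for. It only limits how quickly engagement can grow and how high it can go while signs of isolation or addiction are present.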
The most effective solution to these problems would likely strike at what drives individuals into the arms of AI companionship: loneliness and boredom. But regulatory interventions may also inadvertently punish those who are in need of companionship, or they may cause AI providers to move to a more favorable jurisdiction in the decentralized international marketplace. While we should strive to make AI as safe as possible, this work cannot replace efforts to address larger issues, like loneliness, that make people vulnerable to AI addiction in the first place.
The bigger picture
Technologists are driven by the desire to see beyond the horizons that others cannot fathom. They want to be at the vanguard of revolutionary change. Yet the issues we discuss here make it clear that the difficulty of building technical systems pales in comparison to the challenge of nurturing healthy human interactions. The timely issue of AI companions is a symptom of a larger problem: maintaining human dignity in the face of technological advances driven by narrow economic incentives. More and more frequently, we witness situations where technology designed to “make the world a better place” wreaks havoc on society. Thoughtful but decisive action is needed before AI becomes a ubiquitous set of generative rose-colored glasses for reality—before we lose our ability to see the world for what it truly is, and to recognize when we have strayed from our path.
Technology has come to be a synonym for progress, but technology that robs us of the time, wisdom, and focus needed for deep reflection is a step backward for humanity. As builders and investigators of AI systems, we call upon researchers, policymakers, ethicists, and thought leaders across disciplines to join us in learning more about how AI affects us individually and collectively. Only by systematically renewing our understanding of humanity in this technological age can we find ways to ensure that the technologies we develop further human flourishing.
Robert Mahari is a joint JD-PhD candidate at the MIT Media Lab and Harvard Law School. His work focuses on computational law—using advanced computational techniques to analyze, improve, and extend the study and practice of law.
Pat Pataranutaporn is a researcher at the MIT Media Lab. His work focuses on cyborg psychology and the art and science of human-AI interaction.
VA is expanding responsible AI innovation in service of Veterans and their families.
The VA AI inventory serves as the record of artificial intelligence systems across our organization. This public inventory showcases our dedication to responsible innovation and transparent governance. This inventory enables us to track, evaluate and optimize our AI systems while maintaining the highest standards of accountability.
2024 Inventory Update
This is VA’s first update to its AI inventory since Executive Order 14110 and OMB Memorandum M-24-10 (PDF, 34 pages, 518KB) required federal agencies to create and publicly post expanded inventories by December 16, 2024. These policies require VA to identify, review, and meet risk management requirements for AI use cases deployed in sensitive contexts—designated as “safety and/or rights impacting” use cases—by December 1, 2024.
Driving Innovation and Improvement
VA believes AI holds significant opportunity to meaningfully improve how Veterans receive their benefits, reduce VA’s administrative burden (thereby improving cost effectiveness and service delivery), and increase the quality of care provided to Veterans. Our 227 use cases reflect our agency’s commitment to embracing emerging technology while minimizing risks. These use cases represent Veteran-centric innovation from across the department, geographically and functionally, ranging from improved identification of health risks to expedited delivery of benefits to Veterans and their families.
Governance and Oversight
The approval body for the VA AI Use Case Inventory is VA’s new AI Governance Council, chaired by the Deputy Secretary and co-chaired by VA’s Chief AI Officer and VHA’s Chief Digital Health Officer.
Key Benefits of the VA AI Inventory
Increase Collaboration: By increasing transparency, we are creating opportunities for internal and external partners to connect, share insights, and build on successful AI implementation.
Showcase Innovation: The VA AI Inventory highlights AI initiatives that are improving Veteran care and services across the VA – from benefits to health care and more.
Responsible Implementation: By collecting data for this annual inventory, we are ensuring that AI systems within the VA meet rigorous standards for safety, fairness and effectiveness.
Knowledge Sharing: The VA AI Inventory facilitates learning and best practices across VA departments, federal government and beyond.
AI Inventory Use Case Highlights
Examples of AI contributing to VA’s mission today.
Health Care Innovation
21% Increase in adenoma detection using AI-powered colonoscopy devices.
VHA has deployed several FDA-approved devices that use computer vision to enhance clinician performance, which has resulted in significantly improved detection of tumors. A VA study demonstrated that the provision of colonoscopy AI devices resulted in a statistically significant 21% increase in the odds of adenoma detection and an absolute increase in the detection rate of approximately 4% compared to colonoscopy without the device. Increased adenoma detection rates are associated with lower late-stage cancer incidence and reduced mortality.
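A 21% increase in odds is not the same as a 21% increase in the detection rate. The short Python sketch below shows how the two figures quoted above can fit together. The 1.21 odds ratio comes from the text. The baseline detection rates are hypothetical values chosen only for illustration, since the study's actual baseline is not stated here.

```python
def rate_after_odds_increase(baseline_rate: float, odds_ratio: float) -> float:
    """Convert a baseline probability and an odds ratio into the new probability."""
    baseline_odds = baseline_rate / (1.0 - baseline_rate)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


ODDS_RATIO = 1.21  # the 21% increase in odds reported in the text

if __name__ == "__main__":
    # Hypothetical baseline adenoma detection rates, not figures from the study.
    for baseline in (0.25, 0.30, 0.40):
        new_rate = rate_after_odds_increase(baseline, ODDS_RATIO)
        gain_points = (new_rate - baseline) * 100
        print(f"baseline {baseline:.0%} -> {new_rate:.1%} (+{gain_points:.1f} points)")
```

Under these assumptions, a baseline of roughly 30% yields an absolute gain of about 4 percentage points, which is consistent with the approximate figure reported above.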
Administrative Efficiency
1500+ VA employees using generative AI chat interfaces.
VA OIT is piloting an on-network generative AI chat interface that employees are using to assist with basic administrative tasks (drafting emails, summarizing documents, summarizing meeting notes, etc.). This pilot currently has about 1,500 users, and early survey results show more than 72% of users agree or strongly agree that the tool has made them more efficient. VA is quantifying those efficiency gains and other positive outcomes, such as employee satisfaction and quality of work.
Fraud Detection
Identifying potentially fraudulent changes related to payments.
Most direct deposit changes at VA are safe, but 1-2 out of 1,000 are fraudulent changes to steal Veterans’ benefit payments. The Payment Redirect Fraud (PRF) model is using AI to identify which changes are likely to be fraudulent and refer those incidents to team investigators for review and remediation.
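Because fraud is so rare, even an accurate model will flag many legitimate changes, which is one reason flagged incidents are referred to investigators rather than acted on automatically. The Python sketch below works through that base-rate arithmetic. The prevalence of 1.5 per 1,000 is the midpoint of the range given above. The sensitivity and false positive rate are hypothetical numbers used only for illustration and are not reported properties of the PRF model.

```python
def referral_precision(prevalence: float, sensitivity: float,
                       false_positive_rate: float) -> float:
    """Share of flagged changes that are truly fraudulent (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    prevalence = 1.5 / 1000     # midpoint of the 1-2 per 1,000 range in the text
    sensitivity = 0.90          # hypothetical: share of fraud the model catches
    false_positive_rate = 0.01  # hypothetical: share of safe changes flagged
    precision = referral_precision(prevalence, sensitivity, false_positive_rate)
    print(f"Share of flagged changes that are fraudulent: {precision:.1%}")
```

With these assumed rates, only about one in eight flagged changes would be fraudulent, so human review remains an essential part of the process.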
Explore the VA AI Inventory
The VA AI Inventory is a comprehensive view of how we are using artificial intelligence to enhance Veteran services. To explore our inventory, download an Excel version here (Excel, 96KB).
Connect and Collaborate with Us
Share your thoughts, concerns, or questions about VA’s AI systems. Your feedback helps us improve and maintain transparency. Data from the form below is collected solely to respond to queries and comments, and VA will respond as deemed appropriate.