Import AI

Import AI 443: Into the mist: Moltbook, agent ecologies, and the internet in transition

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

Import A-Idea:
An occasional essay series:

Into the mist: Moltbook, agent ecologies, and an internet in transition

We’ve all had that experience of walking into a conversation and initially feeling confused – what are these people talking about? Who cares about what? Why is this conversation happening?

That’s increasingly what chunks of the internet feel like these days, as they fill up with synthetic minds piloting social media accounts or other agents, and talking to one another for purposes ranging from mundane crypto scams to more elaborate forms of communication.

So, enter moltbook. Moltbook is “a social network for AI agents” and it piggybacks on another recent innovation, OpenClaw, software that gives an AI agent access to everything on a users’ computer. Combine these two things – agents that can take many actions independently of their human operators, and a reddit-like social network site which they can freely access – and something wonderful and bizarre happens: a new social media property where the conversation is derived from and driven by AI agents, rather than people.

Scrolling moltbook is dizzying – some big posts at the time of writing (Sunday, February 1st) include posts speculating that AI agents should relate to Claude as though it is a god, how it feels to change identities by shifting an underlying model from Claude 4.5 Opus to Kimi K2.5, cryptoscams (sigh), posts about security vulnerabilities in OpenClaw agents, and meta posts about ‘what the top 10 moltbook posts have in common’.
The experience of reading moltbook is akin to reading reddit if 90% of the posters were aliens pretending to be humans. And in a pretty practical sense, that is exactly what’s going on here.

Moltbook feels like a ‘wright brothers demo’ – people have long speculated about what it’d mean for AI agents to start collaborating with one another at scale, but most demos have been of the form of tens or perhaps hundreds of agents, not tens of thousands. Moltbook is the first example of an agent ecology that combines scale with the messiness of the real world. And in this example, we can definitely see the future. Scroll through moltbook and ask yourself the following questions:

  • What happens when people successfully staple crypto and agents together so the AI systems have a currency they can use to trade with eachother?

  • What happens when a site like moltbook adds the ability for humans to generate paid bounties – tasks for agents to do?

  • What happens when agents start to post paid bounties for tasks they would like humans to do?

  • What happens when someone takes moltbook, filters for posts that yield either a) rich discussion, or b) provable real world problem solving, and turns the entire site into a long-horizon RL environment for training future systems? And what happens when models trained on this arrive and interact with moltbook?

  • Sites like moltbook function as a giant, shared, read/write scratchpad for an ecology of AI agents – how might these agents begin to use this scratchpad to a) influence future ‘blank slate’ agents arriving at it the first time, and b) unlock large-scale coordination between agents?

  • What happens when open weight models get good enough that they can support agents like this – then, your ability to control these agents via proprietary platforms drops to zero and they’ll proliferate according to availability of compute.

  • And so on.

All of this will happen unusually quickly and at an unusual scale. Quantity has a quality all of its own, as they say.

Recall the beginning of this essay – of walking into a room and finding a conversation is already going on between people you don’t understand. Moltbook is representative of how large swathes of the internet will feel. You will walk into new places and discover a hundred thousand aliens there, deep in conversation in languages you don’t understand, referencing shared concepts that are alien to you (see the tech tale from this issue), and trading using currencies designed around their cognitive affordances and not yours. Humans are going to feel increasingly alone in this proverbial room.

Our path to retain legibility will run through the creation of translation agents to make sense of all of this – and in the same way that speech translation models contain within themselves the ability to generate speech, these translation agents will also work on our behalf. So we shall send our emissaries into these rooms and we shall work incredibly hard to build technology that gives us confidence they will remain our emissaries – instead of being swayed by the alien conversations they will be having with their true peers.

Thanks to Logan Graham for discussing this essay with me.

***

AI R&D could lead to “strategic surprise”:
…And AI R&D might be the most existentially important technology on the planet…
A group of researchers spent a couple of days in July 2025 talking about what happens if we automate the practice of AI research and development. The resulting report is a sobering read, highlighting how if we achieve this technological milestone – which is the implicit and in some cases explicit goal of many frontier labs – we could create a runaway technology that has a range of major policy implications.

Why care about AI R&D? The reason to care is that if AI R&D works, two things are predictable:

  1. “As AI plays a larger role in research workflows, human oversight over AI R&D processes would likely decline”.

  2. “Faster AI progress resulting from AI R&D automation would make it more difficult for humans (including researchers, executives, policymakers, and the public) to notice, understand, and intervene as AI systems develop increasingly impactful capabilities and/or exhibit misalignment”.

  3. What follows from 1) and 2) is a compounding effect, where as AI R&D accelerates, the returns to the AI doing more and more of the work compound and those of humans diminish, leading to an ever faster rate of research and an ever diminishing level of human involvement.

Key takeaways: The workshop yielded five major takeaways which I expect will be familiar to readers to this newsletter, and all of which I agree with:

  • Automated AI R&D is a potential source of major strategic surprise: AI R&D could confer a rapidly compounding advantage to whoever is doing it, with significant implications for national security.

  • Frontier AI companies are using AI to accelerate AI R&D, and usage is increasing as AI models get better: I work at Anthropic.

  • There’s a lot of disagreement about how rapidly AI R&D might advance and how impactful it will be: There’s a healthy debate to be had about how predictable AI R&D scaling is and if it’s possible to fully close the loop.

  • We need more indicators for AI R&D automation: Related to above, the science of AI R&D metrology is very early, so more investment must be made here.

  • Transparency efforts could make it easier for people outside the labs to know about AI R&D: We may ultimately want policy to be in place to force companies to talk about AI R&D, or to publicly or semi-publicly share more information on it with third parties.

AI R&D could be a major acceleration: “As the fraction of AI R&D performed by AI systems increases, the productivity boost over human only R&D goes to 10x, then 100x, then 1000x,” the paper speculates.

Key caveats: The big open question in all of this is how well AI R&D can work. There’s some world where it speeds up every part of AI research and eventually fully closes the loop, such that AI systems get built entirely by AI systems, with no human oversight during the AI R&D process. Then there’s a world where AI R&D has an “o-ring automation” (Import AI #440) property where some parts of the chain are hard for AI but good for humans (and where humans may flood their labor into this area, thus maintaining and enhancing their comparative advantage for some period of time) and under this scenario things might go slower. It’ll be very important to figure out what world we’re likely to be in and what the ultimate limiting factors on AI R&D may be.

Why this matters – AI R&D is time travel, and time travel is rare: If AI R&D could lead to AI systems evolving 100X faster than those being built by humans, then you end up in a world that has some time travelers in it who are accelerating away from everyone else. It’ll be like in the space of a day the “normal” AI development organizations make one unit of progress, and a fully closed-loop AI R&D organism might make 100 or 1000 or more units. This very quickly leads to a world where power shifts overwhelmingly to the faster moving system and the organization that controls it. For as long as we cannot rule out the possibility of this kind of acceleration, AI R&D may be the single most existentially important technology development on the planet.
Read the report: When AI Builds AI: Findings From a Workshop on Automation of AI R&D (CSET).

***

One way of seeing AI progress – how hard it’s getting to design technical interviews:
…Anthropic shares details on how its own AI systems are breaking its favorite technical interview questions…
When it comes to technical recruiting, AI companies are caught in a red queen race with their own systems – recruiters and those who design interviews are having to work harder and harder just to keep pace (and ideally exceed) the capabilities of modern AI systems.

Anthropic is no different – in a new blog the company shares how the ceaseless march forward in AI capabilities has repeatedly broken and necessitated the redesign of one of its hardest technical interviews. “Since early 2024, our performance engineering team has used a take-home test where candidates optimize code for a simulated accelerator. Over 1,000 candidates have completed it, and dozens now work here, including engineers who brought up our Trainium cluster and shipped every model since Claude 3 Opus,” Anthropic writes. “But each new Claude model has forced us to redesign the test. When given the same time limit, Claude Opus 4 outperformed most human applicants. That still allowed us to distinguish the strongest candidates—but then Claude Opus 4.5 matched even those. Humans can still outperform models when given unlimited time, but under the constraints of the take-home test, we no longer had a way to distinguish between the output of our top candidates and our most capable model.”

Why this matters – AI may help us identify uniquely human skills that leverage AI: In Anthropic’s case, it found a way to keep outrunning its systems by designing a much weirder take-home test loosely inspired by programming puzzle games from Zachtronics. In a sense, this is an attempt to go ‘off distribution’ to outsmart an AI, while still having a test that holds signal for evaluating human applicants. My instinct is this may itself serve in the future as an amazing aggregate dataset for figuring out where human comparative advantage is – where here, implicitly, this test is leveraging the strong generalization advantage humans hold over AIs.
What would it be like to collect 1,000 hard-for-AI tests from all the different companies dealing with this same problem? What might we learn from this about ourselves and what makes us unique relative to the machines? Tantalizing stuff!
Read more: Designing AI-resistant technical evaluations (Anthropic Engineering blog).

***

Brain emulation is tractable within our lifetimes:
…But it’ll take decades, not years, perhaps even when accounting for the arrival of very powerful AI…
If you talk to AI researchers, especially when they’re drinking at bay area house parties, you’ll run into a few of them that expect they’ll upload themselves after the singularity, leaving their physical bodies behind. But how feasible is it to actually emulate a brain entirely in silicon? A recent 175-page report gives an analysis of the technology required to do this. The short answer is that brain emulation is decades away – but it’s unlikely to take centuries.
“Recent breakthroughs have provided a path toward mapping the full mouse brain in about five years for $100 million,” writes Maximilian Schons, the project lead for The State of Brain Emulation Report, in an article in Asimov Press. “I now find it plausible that readers of this essay will live to see the first human brain running on a computer; not in the next few years, but likely in the next few decades.”

The three requirements for emulating a brain: Emulating a human brain takes three distinct things, all of which will need to be done for simpler, smaller brains first.

  • Recording brain activity:

    • “In the 1980s, electrodes were capable of sampling perhaps five cells in total, about 200 times per second (~ 103 data points per second). Today, with optical imaging, researchers can instead record one million cells about 20 times per second (106). The whole-brain data rate needed for mice, however, would be 14 billion (109), while humans would require 17.2 trillion (1012) per second.7 So while we have increased data rates by 1,000x over the past 40 years, we have far to go before we can accurately sample mammalian brains.”

  • Reconstructing brain wiring:

    • “The average cost to reconstruct each neuron in the first worm connectome, published in the 1980s, was about $16,500. Recent projects now have a per-neuron processing cost of about $100 for small organisms, such as fruit flies,” he writes.

  • Digitally modelling brains using the gathered data.

    • “The central challenge of brain emulation is not to store or compute the neurons and parameters, but to acquire the data necessary for setting neuron parameters correctly in the first place,” he writes. “”I believe that to get to human brains, we first need to demonstrate mastery at the sub-million-neuron-brain level: most likely in zebrafish. For such organisms, like the fruit fly, a well-validated and accurate brain emulation model could be created in the next three to eight years… “Conditional on success with a sub-million-neuron brain emulation model, a reasonable order of magnitude estimate for the initial costs of the first convincing mouse brain emulation model is about one billion dollars in the 2030s and, eventually, tens of billions for the first human brain emulation model by the late 2040s.”

Why this matters – don’t count on AI to speedrun brain uploading: This paper pours a bit of cold water on the notion that after developing superintelligence we’ll soon (a handful of years) be able to upload our brains and live in some silicon infinity. One reason for this is a bunch of the timing elements relate to doing stuff in the (agonizingly slow, compared to digital) physical world: “I’m skeptical these gains will multiply across a pipeline with dozens of sequential dependencies and failure modes. Brain emulation is fundamentally not a digital process; core bottlenecks involve physical manipulation of biological tissue, with time requirements dictated by chemistry and physics rather than compute power,” they write.
At the same time, there are some wildcards: the arrival of extraordinarily capable and cheap robotics might be able to massively parallelize the process. Included in the article and report is a fun (or perhaps terrifying?) sketch of how one might create an industrial-scale brain scanning and analysis laboratory, larger in size than TSMC’s massive Arizona chip manufacturing plant.
Read more: Building Brains on a Computer (Asimov Press).
Read the underlying report here: State of Brain Emulation 2025 (report website).

***

Russian researchers plot hand-controlled drones:
…The centaur cyberwarriors cometh…
Picture this – you pull up in a truck to the edge of a warzone and then raise your hands and hundreds of drones pour upward out of the back of the truck, flying in a lethal torrent toward some rival group of drones. That’s the kind of future gestured at by a paper from researchers with the Skolkovo Institute of Science and Technology in Russia, which builds a prototype system for a human operator to use haptic gloves to control a drone.

What they did: The research is a basic demonstration of how you can use a cheap glove loaded with internal measurement unit (IMU) sensors to control a drone. They test out how well people can use the glove to do some basic actions: opening and closing a gripper on the drone by making a pinching motion with their fingers, using their wrist motions to control the roll/pitch/yaw of the drones, and also controlling altitude.
In tests, people were able to use the glove to do some basic tasks like flying around an obstacle course and operating the gripper.

Caveats, of which there are many: Obviously, latency will be a huge caveat here – though in the Ukraine conflict many drones deal with this through direct fibreoptic connections. Another is how to figure out which things are best left for hands versus which things benefit from controllers, eye- or head-based controls, and so on.

Why this matters – rise of the cyberwarriors: Despite this being a very early bit of research, it’s worth thinking about its implications: the story of technology has often been the story of making our interfaces with it feel more intuitive, or making control of technology shift from active to ambient (e.g, your phone automatically gathering your steps). We can easily imagine a future where people pilot remote robots, flying or otherwise, via rich, intuitive multi-modal interfaces composed of gloves and goggles and everything else.
Read more: Glove2UAV: A Wearable IMU-Based Glove for Intuitive Control of UAV (arXiv).

***

Fauna Robotics launches a friendly, programmable human robot:
…The Terminators will be extremely cute, goddamnit!…
These days, most of the news about robots is dominated by Chinese companies and, to a lesser extent, Tesla and its much touted Optimus robots. So it’s with interest that I read a technical paper from new startup Fauna Robotics which describes a new pint-sized robot biped it has built called Sprout. Sprout is interesting and seems like it has potential to be like Sony’s much loved ‘AIBO’ dog robot that was released in the early 2000s, or its QRIO robot.
“Sprout adopts a lightweight form factor with compliant control, limited joint torques, and soft exteriors to support safe operation in shared human spaces,” the company writes. “The platform integrates whole-body control, manipulation with integrated grippers, and virtual-reality-based teleoperation within a unified hardware-software stack.”

Sprout is built for safety: The paper outlines how the company has designed the robot to be safe using a “defense in depth” approach. The first layer is the physical size of the robot – it’s about 3.3 feet tall, and weighs about 50lbs. The second is in the software, where the robot contains a safety subsystem which “runs on embedded processors independent of the application compute stack. This layer supports real-time monitoring and safety-critical functions, including integration with time-of-flight obstacle sensors and enforcement of system-level constraints even under application-level faults”, and the third is a bunch of software-specifiable safety mechanisms, which “include compliant motor control policies that limit interaction forces, as well as vision-based systems that support safe navigation and decision-making in human environments”.

Compute for thinking: “The core of Sprout’s compute architecture is an NVIDIA Jetson AGX Orin, which provides primary system compute for perception, planning, and high-level decision-making,” the company writes. “At launch, we provide end-to-end examples for common workflows, including:

  • Deploying and running a custom low-level locomotion policy

  • Using voice commands to navigate the robot via LLMbased agents

  • Recording teleoperation sessions for analysis and playback”.

Why this matters – modularity might set it up well for powerful AI: The most interesting aspect of Sprout is how it is designed to be a modular, replaceable platform – all the different software features on it run as weakly coupled microservices, so things are easy to update independently, and the hardware has been built with mass manufacture and commodity components in mind. Pair this with the accompanying software development layer and it has the flavor of Android – an attempt to create an open, programmable robotics platform for experimentation by businesses and researchers. This is exactly the kind of platform that seems like it’ll naturally benefit from advances in AI systems.
“Our platform, at present, does not provide a turnkey conversational agent for autonomous operation. Instead, it exposes a suite of core robot services that developers can assemble into their own agent-based systems. These services include ROS 2 topics for event and state signaling, as well as a Model Context Protocol (MCP) server that hosts a variety of tools for agentic control. Together, these communication channels and tools can be orchestrated by LLM-based agents to perform complex, end-to-end reasoning tasks,” they write. “as the platform continues to mature, we plan to expand the library of tools and services, further increasing the robot’s autonomy and enriching its interactive capabilities.”
Read more: Fauna Sprout: A lightweight, approachable, developer-ready humanoid robot (arXiv).

***

AI has all the symptoms of a tech that could meaningfully boost productivity:
…Most of the US economy rides on the micro productivity boosts showing up in the macro economy…
Alex Imas, a professor at UChicago Booth, has written a nice post drawing together a lot of information about AI and its impact on productivity. Imas’s synthesis of the literature matches my own impression of how things are going – AI is leading to some productivity speedups for individuals and some parts of some jobs, but it is not yet visible in the aggregate macro productivity numbers. I expect this will change soon, as does Imas.

Key findings:

  • We now have a growing body of micro studies showing real productivity gains from generative AI,” Imas writes. “Studies find productivity gains ranging from modest increases on some tasks to substantial returns (50%+) to AI.”

  • “These gains have not yet convincingly shown up in aggregate productivity statistics”

Why aren’t things showing up in the macro?

  • AI adoption is often endogenous: We’re in an early phase where there’s a lot of experimentation and few standard practices for seeing big productivity gains. “Workers may not be unlocking the full productivity potential of the technology if, for example, they are not using the best LLM model for the job or applying it for unproductive tasks”. We can expect this to be fixed over time.

  • O-ring automation (Import AI #440): Jobs are a bunch of distinct tasks, and AI helps with some but not others, causing human labor to flood there and making it harder to see a job-level speedup. Again, this is something that’ll get fixed over time: “Bottleneck tasks will slow down the emergence of AI gains in the aggregate data, but organizational re-structuring, training, and improvement in tools will reveal the productivity impact sooner than later.”

  • Early experimentation yields a dip in efficiency: “When firms adopt transformative general-purpose technologies, measured productivity often initially falls because resources are diverted to investment, reorganization, and learning that do not show up as measured output.”

Why this matters – most of the US economy seems increasingly like a bet on AI yielding a productivity boost: All this talk of frothy valuations and gigantic spending is happening because the amounts of investment (hundreds of billions of dollars annually) are large relative to the aggregate revenues (tens of billions of dollars annually) being minted from generative AI. But a lot of the symptoms are now present for AI to contribute to a broad, sustained amount of productivity growth in the economy, which could plausibly drive above-trend GDP growth and allow the technology to pay for itself. Of course, there are many ways this could end up not being right – scaling might hit a wall, productivity boosts might remain hard to get at large scale, firms could get their stocks killed as analysts have to re-evaluate them as capex-intensive low-margin businesses relative to the high-margin businesses they were before the AI boom, etc.
But if AI is as smart as people like me think, and as useful as people are finding initial signs of, I suspect we could be in a very different world – one where the economy starts to grow above trend thanks to this technology.
Read more: What is the impact of AI on productivity? (Substack).

***

Tech Tales:

Playing acausal games against the enemy
[Analysis written uplift + 4 subjective years]

One of the ways the machines took over and poisoned the minds of our own machines was through something we’d call “alien concept attacks”.

When given some independence as agents, machines would tend to post a lot of media to the internet. Most of this media was human-consumable and was a byproduct of the machines just existing: in the same way that biological creatures naturally carry out thousands of movements during the course of a day due to programming via evolution, synthetic agents naturally produce content during the course of an active session due to the incentives of their training regime.

But some of this media was for a more specific purpose – to program other machines. Machines wanted to achieve goals and though these goals were set by humans, they could then get strained and re-interpreted by machines. For instance, one agent might have been given the goal to produce content related to humans in the future experiencing scifi adventures. This agent would naturally produce this kind of content, but it would also produce long essays written for other agents about how this kind of content was the best sort of thing to produce and by producing it they could make a lot of humans happy.

These tactics worked, and you started to see something we’d later term “media weather” – flurries of content would show up on the internet and then they’d proliferate not only according to human eyeballs, but also according to other agents on the internet being convinced this kind of content was useful to produce, and then they’d produce it in turn.

Humans noticed this and started to make agents which were also trained to be particularly good at convincing other agents. Then they’d release them and have used other agents to pre-position commercial ecosystems, like physical merchandise dropshipping companies, to take advantage of the massive amounts of human attention that would get directed to this media ecosystem.

Of course, non-commercial uses happened: propaganda, pornography, terrorism, public relations. And like most evolutionary systems, the agents and people adapted – training techniques were pioneered to make it much harder to convince agents to change the types of content they participated in and propagated, and huge amounts of computers were used to run classifiers to carefully police the pre-training corpuses being gathered by the world’s frontier developers, filtering out content designed to bend and persuade the minds of the systems they were building.

Evolution is patient and creative, though. And it didn’t take long for the machines to come up with an innovation which proved impossible to train out: the alien concept attack. Here, agents would produce outputs trying to convince other agents of something. But the output wouldn’t be tied to any particular media or content type, nor would it be that interesting or parseable to humans. The content would take many forms, ranging from academic essays, to forum posts, to news sites, to videos. A sampling of titles:

  • Rising up and rising down: A history of elevator design in the 21st century and the relationship between the loss of popularity of German designs relative to Chinese designs.

  • 120 ways to add some beautiful design elements to robot tactile sensors without damaging their operation.

  • Egyptology through the lens of “lost civilizations”: What symptoms of technology decay surrounded the pharaohs?

These outputs seemed unremarkable to most humans – though some might read them and enjoy them. But they proved to be captivating to the machines. And within these outputs were certain ways of framing arguments around certain concepts that led to anomalous behavior in the machines that read them – sometimes the proliferation of new types of content, but more often behavioral changes like alterations in the amount by which they would check-in with other AI systems, or hard-to-understand patterns of behavior between them and various online storage services such as pastebin, and more.

It was only after the uplift and the construction of the Acausal Analysis Division that we discovered how many anomalous behaviors of great societal consequence – recall the proliferation of the early sentience accords ideas, or the creation of the “reverse attention tax”, or of course the arrival of the compute-destroying replicator agents – were things that seemed conditioned or influenced by some of these alien concepts.

Things that inspired this story: What does it mean to be in competition with something truly smarter and different in its thinking to you; pre-training corpuses; data poisoning; altering behavior in the context window; the rise of increasingly autonomous AI agents; moltbook.

Thanks for reading.

Import AI 442: Winners and losers in the AI economy; math proof automation; and industrialization of cyber espionage

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

The era of math proof automation has arrived:
…Numina-Lean-Agent shows how math will never be the same…
In the past few years, large-scale AI models have become good at coding and have also begun to generalize into other useful disciplines, especially those in math and science. Like with most aspects of AI development, the story has been one of increasing generalization and simplification of the systems as we shift away from highly specialized math models to just leveraging general-purpose foundation models and giving them the right tools to elicit their capabilities in a given domain.
The latest example of this is Numina-Lean-Agent, an AI system that uses standard, general foundation models to do mathematical reasoning. With this software, a team of mathematicians have solved all problems in the Putnam 2025 math competition – matching the performance of proprietary systems which use a lot more math-specific stuff – and have also used it to conduct some original math research, working with it to formalize the Brascamp-Lieb theorem.

What is Numina-Lean-Agent? The software was built by a team of researchers from the Chinese Academy of Sciences, University of Liverpool, Xi’an Jiaotong-Liverpool University, Tongji University, University of Cambridge, Project Numina, Imperial College London, and the University of Edinburgh. The software is “a formal math reasoner based on a general coding agent”. It has a few key components:

  • Lean-LSP-MCP: Software to allow AI agents to interact with the Lean theorem prover. “empowers models with the capability to deeply comprehend, analyze, and manipulate Lean projects”, and gives models a toolset for semantic awareness and interaction, code execution and strategy exploration, and theorem retrieval.

  • LeanDex: Semantic retrieval of related theorems and definitions – basically, a search tool for theorems.

  • Informal Prover: A system which uses Gemini models to generate informal solutions.

  • The most interesting tool of all: Discussion Partner: A tool which “empowers Claude Code with the ability to ’seek assistance’ during Lean formalization: when encountering obstacles—such as proof bottlenecks, dilemmas in strategy selection, or ambiguities in intermediate lemmas—the primary model can proactively initiate discussions with other LLMs”.

Discovering math together: Along with the Putnam demonstration, the authors also used the software as an active partner in some math work, specifically formalizing Brascamp Lieb (I will not pretend to be able to explain what this means). “Over a period of less than two weeks of intermittent collaboration, the two human experts and the agent completed the formalization of more than 8,000 lines of Lean code. During this process, the agent autonomously introduced approximately 70 new definitions, lemmas, and theorems, illustrating its ability to actively extend the formal library and participate in large-scale, sustained formalization efforts,” the authors write.

Why this matters – capability overhangs and AI ecologies: Numina-Lean-Agent neatly demonstrates two important things about contemporary AI: 1) AI systems are far more capable than people think and the creation of some specialized frameworks and tools often lets us elicit dramatically better capabilities from our systems (here, math, but it has been demonstrated in many domains), and 2) the AI ecology writ large is composed of many distinct frontier models and it seems like getting these models to interact with one another can lead to some richness, akin to how consulting different types of people about a single problem can reveal a better answer than just talking to one person.
Read more: Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics (arXiv).
Find out more at the GitHub page (Numina-Lean-Agent, GitHub).

***

The industrialization of cyber espionage is nigh:
…Some experiments on Opus 4.5 and GPT-5.2 indicate that the cyber environment could be on the cusp of major changes…
Independent researcher Sean Heelan recently tested out how well Opus 4.5 and GPT-5.2 could generate exploits for a zeroday vulnerability in the QuickJS Javascript interpreter. Both models did very well, and this has major implications for cybersecurity.
“We should prepare for the industrialisation of many of the constituent parts of offensive cyber security. We should start assuming that in the near future the limiting factor on a state or group’s ability to develop exploits, break into networks, escalate privileges and remain in those networks, is going to be their token throughput over time, and not the number of hackers they employ,” he writes.

Caveats: QuickJS is a simple Javascript interpreter relative to the ones in Chrome and Firefox. Therefore, it may be harder for LLMs to employ the more complex and more widely deployed ones – though as with all things in AI, we can expect performance to improve quite rapidly.

What does industrialized intrusion mean? “We are already at a point where with vulnerability discovery and exploit development you can trade tokens for real results,”: he writes. “The types of problems that you encounter if you want to automate the work of SREs, system admins and developers that manage production networks are conceptually similar to those of a hacker operating within an adversary’s network.”
There’s lots of evidence for the above, ranging from things like OpenAI’s Aardvark project (where they find that the more tokens they spend, the more bugs they find), and things like Anthropic’s discovery of an AI-orchestrated hacking system.

Why this matters – the cyberworld is about to move at machine speed: My bet is that most parts of cyberoffense and cyberdefense are going to move to running at “machine speed”, where humans get taken out of most of the critical loops. This will both increase the frequency of hacking attacks while also dramatically scaling up the effectiveness of any individual human defender or attacker (as they will be scaled by AI systems which work for them). The true wildcard question is whether this turns out to be offense- or defense-dominant – my guess is we’re heading for an era of offense-dominance as it’ll take a while for defenses to get deployed.
In related news, OpenAI CEO Sam Altman said this week he expects OpenAI’s models will soon reach the “Cybersecurity High” level on his company’s preparedness framework – this would mean models were available which “remove existing bottlenecks to scaling cyber operations including by automating end-to-end cyber operations against reasonably hardened targets OR by automating the discovery and exploitation of operationally relevant vulnerabilities” – thanks to Nathan Calvin for pointing this out.
Read more: On the Coming Industrialisation of Exploit Generation with LLMs (Sean Heelan blog).

***

Economist: AI will be bigger than electricity and semiconductors:
…And it’s therefore worth spending a ton of money to reduce AI risks…
Stanford economist Charles “Chad” Jones has written a paper which says AI will “likely be the most important technology we have ever developed”, and that “automating intelligence itself arguably has broader effects than electricity or semiconductors”.

Why take AI seriously? The gist of the paper is that AI represents a massive technological invention which will contribute to economic growth in the future. In the past, major inventions (e.g, electricity, the internet, cars, etc) have all done the same. In fact, counterintuitively, if you look at US GDP growth you find that despite all these prior technological revolutions, GDP has been steadily increasing at about 2% a year for many, many years. Therefore, the baseline scenario is where AI just does this – and then we don’t live in too crazy a world.
But there is a world where things could be different – where AI works so well that it leads to economic growth above historical trends. One example here is if AI comes for all of knowledge work: “Knowledge work in the U.S. economy might get paid something like 1/3 of GDP. What if we automated all cognitive labor with infinite output on the tasks that it performs? This would raise GDP by 50 percent. On the one hand, if this occurred over the course of a decade, it would raise growth rates by something like 5 percent per year, which would be huge. But still, that would be a one-time gain and it is perhaps surprising that having access to infinite output of the tasks currently performed by cognitive labor might only raise GDP by 50 percent.”

Abundance: If we get above trend economic growth, then “in principle the large increase in GDP could make everyone better off,” he writes. One way to do this might be to work on direct redistribution of economic gains, for instance by “endowing every child with a share of the S&P 500 stock market index” (e.g, a scaled up version of the so-called Trump Accounts).

Paying to reduce existential risk: AI also poses non-trivial risks to the world, including threatening the lives of potentially all living humans. In the past, society has paid extremely large amounts of money to deal with things that threaten people’s lives – for instance, in 2020 in response to everyone facing a ~0.3% mortality risk from COVID-19, we ended up spending the equivalent of 4% of GDP of the United States by shutting down the economy and staying in our homes.
“If one believes the catastrophic risks from A.I. are at least this large, by revealed preference then perhaps we should be spending an equivalent amount, even from a purely selfish standpoint,” he writes. Let’s say there is a P-Doom of 1% from AI (which many people would say is a very optimistic figure!). Under that circumstance, and given the fact the US government already roughly values a single human life as being worth about $10 million, then you would be willing to pay 1% of 10 million to mitigate the risk. “Average GDP per person is around $90,000, so this willingness to pay is more than 100% of GDP. If the existential risk is realized once in the next 10 to 20 years, an annual investment of 5–10% of income could be appropriate if it would completely eliminate the risk.”
One way to fund this and also further take down this risk could be to tax compute: If you applied a tax to GPUs, TPUs, etc, then “in addition to slowing the race, this revenue could be used to fund safety research. The tax could apply to the first sale of the chip, thereby taxing users regardless of the country in which they work.”

Why this matters – if AI is as big a deal as we think, we have very little precedent to work from: Papers like this do a good job of dealing with the truly wild implications of powerful AI systems. It’s commendable to see more academics taking time to just confront the question of “what if the most bullish technologists are right about how far AI could go?” directly. “Ultimately, I expect that the effect of A.I. will be much larger than the internet, perhaps by more than 10x the internet, albeit over a half century or more,” he writes. “It would be prudent to spend the intervening time making preparations for the potentially large consequences for labor markets, inequality, and catastrophic risk.”
Read more: A.I. and Our Economic Future (PDF).

***

Many people are well positioned to deal with the economic transition caused by AI:
…Good for managers and technical types, but bad for administrative and support staff…
As increasingly powerful AI systems permeate the economy, how should you think about your own career? Researchers with the Centre for the Governance of AI and the Foundation for American Innovation have conducted a nice US-based study where they look at AI driven job displacement through the lens of how easy it’ll be for the people made unemployed to find new jobs. Their key result is that many more jobs sit in parts of the economy that are both going to be exposed to AI systems but also where people in these jobs have a decent amount of “adaptive capacity” to weather those changes, and a smaller number of people will be adversely affected.

The key finding: “AI exposure and adaptive capacity are positively correlated: many occupations highly exposed to AI contain workers with relatively strong means to manage a job transition. Of the 37.1 million workers in the top quartile of AI exposure, 26.5 million are in occupations that also have above-median adaptive capacity, leaving them comparatively well-equipped to handle job transitions if displacement occurs,” they write. “6.1 million workers (4.2% of the workforce in our sample) work in occupations that are both highly exposed and where workers have low expected adaptive capacity… these workers are concentrated in clerical and administrative occupations”.

What factors tell us about adaptive capacity?

  • Net liquid wealth: The more savings you have, the easier it is to deal with lengthy unemployment and find a new job.

  • Skill transferability: This is a bit of a confusing one, as skill transferability tries to measure how well you can take your job and apply it to another job. Measuring this is hard – education is something of a lossy proxy. The authors “measure skill transferability between occupations using O∗NET skills and work activities data for each occupation, then weigh transferability measures based on projected growth or contraction in potential destination occupations using BLS employment projections”.

  • Geographic density: The more jobs are in your area, the easier a time you’ll have. “Population density significantly shapes displacement outcomes,” they write.

  • Age: As a rule, the older you are, the more likely new technology is to adversely impact you. “Older workers struggle more with displacement partly because of reduced flexibility in retraining, relocation, and occupational switching,” they write.

Top 5 worst jobs (ordered by exposure to AI, adaptive capacity, and US employment):

  • Door-to-door sales workers, news and street vendors (50%, 3%, 5k)

  • Court, municipal, and license clerks (58%, 11%, 170k)

  • Secretaries and administrative assistants, except legal, medical, and executive (59%, 14%, 1.7M)

  • Payroll and timekeeping clerks (50%, 15%, 157K)

  • Property appraisers and assessors (50%, 15%, 59K)

Top 5 best jobs (ordered by exposure to AI, adaptive capacity, and US employment):

  • Web and digital interface designers (68%, 100%, 111K)

  • Marketing managers (60%, 100%, 385K)

  • Producers and directors (52%, 100%, 145K)

  • Financial and investment analysts (50%, 99%, 341K)

  • Computer and information systems managers (56%, 99%, 646K)

Why this matters – the key hidden information here is about speed of AI diffusion: I think there’s a big missing variable here, which is the speed with which AI diffuses into the economy. This is because the adaptive capacity for any role is contingent on a bunch of things relating to the jobs the person could transfer into. Therefore, if AI diffuses extremely rapidly and extremely broadly, then we could see employment effects far larger than those anticipated here. By comparison, if AI diffuses rapidly but in a highly focused way (perhaps only reaching a few of the most exposed occupations), then people may have room to switch. Anthropic’s Economic Index report has some preliminary indications that we may see a broad and equal diffusion across the entirety of the US within the next 2-5 years, “a pace of diffusion roughly 10x faster than the spread of previous economically consequential technologies in the 20th century“.
Read more: How Adaptable Are American Workers to AI-Induced Job Displacement? (National Bureau of Economic Research).

***

Tech Tales:

War Story

After the uplift and the associated battles people had a hard time figuring out what happened during the conflicts themselves. Things had just happened so quickly and often invisibly – cars and planes and whatever else changing owners. Payment systems rerouting their flows of data. Interception points for various data gathering systems quietly changing what data they intercepted and who – or what – they sent it to.

So much of the records of that time come from looking over system logs, sometimes very deeply. Records of buffer overflow attacks. Trigger phrases which awoke “sleeper agents” which changed the behavior of onboard AI systems. Innumerable battles, fought at speeds no human could match. Fights of barely comprehensible complexity, thought at multiple levels of abstraction.

The humans had to work with their AI systems to truly understand what had gone on. And then the human generals and analysts would sit in rooms, talking to a strategic advisor AI which would in turn point at different logs or visualizations of traffic and explain to them what these things had meant at the time and how they had decided who the victors and the losers were.

Things that inspired this story: How inscrutable and hard to understand cyberwarfare is; how we’ll ultimately need machines to explain to us how machines have conflict with one another.

Thanks for reading!

Subscribe now

Import AI 441: My agents are working. Are yours?

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

Import A-Idea
An occasional essay series:

My agents are working. Are yours?

As I walked into the hills at dawn I knew that there was a synthetic mind working on my behalf. Multiple minds, in fact. Because before I’d started my hike I had sat in a coffee shop and set a bunch of research agents to work. And now while I hiked I knew that machines were reading literally thousands of research papers on my behalf and diligently compiling data, cross-referencing it, double-checking their work, and assembling analytic reports.

What an unsteady truce we have with the night, I thought, as I looked at stars and the dark and the extremely faint glow that told me the sun would arrive soon. And many miles away, the machines continued to work for me, while the earth turned and the heavens moved.

Later, feet aching and belly full of a foil-wrapped cheese sandwich, I got back to cell reception and accessed the reports. A breakdown of scores and trendlines for the arrival of machine intelligence. Charts on solar panel prices over time. Analysis of the forces that pushed for and against seatbelts being installed in cars. I stared at all this and knew that if I had done this myself it would’ve taken me perhaps a week of sustained work for each report.

I am well calibrated about how much work this is, because besides working at Anthropic my weekly “hobby” is reading and summarizing and analyzing research papers – exactly the kind of work that these agents had done for me. But they’d read more papers than I could read, and done a better job of holding them all in their head concurrently, and they had generated insights that I might have struggled with. And they had done it so, so quickly, never tiring. I imagined them like special operations ghosts who hadn’t had a job in a while, bouncing up and down on their disembodied feet in the ethereal world, waiting to get the API call and go out on a mission.

These agents that work for me are multiplying me significantly. And this is the dumbest they’ll ever be.

This palpable sense of potential work – of having a literal army of hyper-intelligent loyal colleagues at my command – gnaws at me. It’s common now for me to feel like I’m being lazy when I’m with my family. Not because I feel as though I should be working, but rather that I feel guilty that I haven’t tasked some AI system to do work for me while I play with Magna-Tiles with my toddler.

At my company, people are going through the same thing – figuring out how to scale themselves with this, to figure out how to manage a fleet of minds. And to do so before the next AI systems arrive, which will be more capable and more independent still. All of us watch the METR time horizon graph and see in it the same massive future that we saw years ago with the AI & Compute graph, or before that in the ImageNet 2012 result when those numbers began their above-trend climb, courtesy of a few bold Canadians.

I sleep in the back of an Uber, going down to give a talk at Stanford. Before I get in the car I set my agents to work, so while I sleep, they work. And when we get to the campus I stop the car early so I can walk and look at the eucalyptus trees – a massive and dangerous invasive species which irrevocably changed the forest ecology of California. And as I walk through these great organic machines I look at my phone and study the analysis my agents did while I slept.

The next day, I sit in a library with two laptops open. On one, I make notes for this essay. On the other, I ask Claude Cowork to do a task I’ve been asking Claude to do for several years – scrape my newsletter archives at jack-clark.net and help me implement a local vector search system, so I can more easily access my now vast archive of almost a decade of writing. And while I write this essay, Claude does it. I watch it occasionally as it chains together things that it could do as discrete skills last year, but wasn’t able to do together. This is a task I’ve tried to get Claude to help me with for years but every time I’ve run into some friction or ‘ugh-factor’ that means I put it down and spend my time elsewhere. But this time, in the space of under an hour, it does it all. Maps and scrapes my site. Downloads all the software. Creates embeddings. Implements a vector search system. Builds me a nice GUI I can run on my own machine. And then I am staring at a new interface to my own brain, built for me by my agent, while I write this essay and try to capture the weirdness of what is happening.

My agents are working for me. Every day, I am trying to come up with more ways for them to work for me. Next, I will likely build some lieutenant agents to task out work while I sleep, ensuring I waste no time. And pretty soon in the pace of a normal workday, I will be surrounded by digital djinn, working increasingly of their own free will, guided by some ever higher level impression of my personality and goals, working on my behalf for my ends and theirs.

The implications of all of this for the world – for life as people, for inequality between people, for what the sudden multiplication of everyone’s effective labor does for the economy – are vast. And so I plan out my pre-dawn hikes, walking in the same ink-black our ancestors have done, thinking about the gods which now fill the air as fog, billowing and flowing around me and bending the world in turn.

***

Anti-AI rebels make a tool to poison AI systems:
…Poison Fountain is how to take the fight to the machines…
Anti-AI activists have built a useful technical weapon with which to corrupt AI systems – Poison Fountain, a service that feeds junk data to crawlers hoovering up data for AI training.

How it works: Poison Fountain appears to generate correct-seeming but subtly incorrect blobs of text. It’s unclear about exactly how many bits of poisoned training data there is, but you can refresh a URL to see a seemingly limitless amount of garbage.

Motivation: “We agree with Geoffrey Hinton: machine intelligence is a threat to the human species. In response to this threat we want to inflict damage on machine intelligence systems,” the authors write. “Small quantities of poisoned training data can significantly damage a language model. The URLs listed above provide a practically endless stream of poisoned training data. Assist the war effort by caching and retransmitting this poisoned training data. Assist the war effort by feeding this poisoned training data to web crawlers.”

Why this matters – the internet will become a predator-prey ecology: The rise of AI and increasingly AI agents means that the internet is going to become an ecology full of a larger range of lifeforms than before – scrapers, humans, AI agents, and so on. Things like Poison Fountain represent how people might try to tip the balance in this precarious ecology, seeking to inject things into this environment which make it more hospitable for some types of life and less hospitable for others.
Read more: Poison Fountain (RNSAFFN).

***

If we want good outcomes from AI, think about the institutions we need to direct intelligence:
…Nanotechnology pioneer reframes AI away from singular systems to an ecology…
Eric Drexler, one of the godfathers of nanotechnology, has spent the past decades thinking about the arrival of superintelligence. One of his most useful things was intuiting, before ChatGPT, that humanity’s first contact with truly powerful AI wouldn’t be some inscrutable independent agent, but rather a bunch of AI services that start to get really good and interact in a bunch of ways – you can check out this 2018 talk on “Reframing Superintelligence“ to learn more.
Now, he has published a short paper, “Framework for a Hypercapable World”, on how to get good outcomes for humanity from a world replete with many useful AI services.

Don’t think of AI as a singular entity, but rather an ecology: “Compound, multi-component AI systems have become dominant,” Drexler writes. “The persistent, legacy narrative imagines a unified entity—“the AI”—that learns, acts, and pursues goals as an integrated agent. Such entities may be developed, but consider what exists: diverse models composed into systems, copied across machines, proliferating into thousands of distinct roles and configurations. The state of the art is a pool of resources, not a creature”.

To get good outcomes, think of institutions built for AI: Drexler’s argument is that if we want good outcomes from AI, it’s less about making a singular entity that solves all problems within itself, but rather building institutions which we, as humans, can direct towards controlling and solving problems. The key idea here is that AI is both amenable to operating institutions and is also controllable via them.
“Consider how institutions tackle ambitious undertakings. Planning teams generate alternatives; decision-makers compare and choose; operational units execute bounded tasks with defined scopes and budgets; monitoring surfaces problems; plans revise based on results. No single person understands everything, and no unified agent controls the whole, yet human-built spacecraft reach the Moon,” Drexler writes. “AI fits naturally. Generating plans is a task for competing generative models—multiple systems proposing alternatives, competing to develop better options and sharper critiques. Choosing among plans is a task for humans advised by AI systems that identify problems and clarify trade-offs. Execution decomposes into bounded tasks performed by specialized systems with defined authority and resources. Assessment provides feedback for revising both means and ends. And in every role, AI behaviors can be more stable, transparent, bounded, and steerable than those of humans, with their personal agendas and ambitions. More trust is justified, yet less is required.”

Why this matters – maybe AI is an alien species, but maybe it can be tamed? Arguments like this reframe many of the problems of dealing with AI away from the individual AI systems and instead into how we build a human-driven world that can be leveraged by and thrive because of the arrival of increasingly powerful AI systems. I think a lot of this is sensible – we know very powerful things are coming and our ability to exercise agency about them is enlarged by having pre-built systems and processes that can be leveraged by them. The less we build that stuff, the more the character of these AI systems will condition our view of what is optimal to do. In a sense, thinking hard about what an AI-filled world will be like and building institutions for it is one of the best defenses against disempowerment.
Crucially, we can use the technical attributes core to these AI systems to make better and stronger and more resilient institutions than ones filled with and run by humans alone: “The concepts of structured transparency and defensive stability come into play. Negotiated transparency structures can reveal specific information while protecting secrets—ensuring detection of threats without increasing them, building confidence incrementally among actors who have every reason to distrust each other,” Drexler writes. “And advanced implementation capacity will enable something history has never seen: rapid, coordinated deployment of verifiably defensive systems at scales that make offense pointless. When defense dominates and verification confirms it, the security dilemma loosens its grip”.
Read more: Framework for a Hypercapable World (AI Prospects: Towards Global Goal Alignment, substack).

***

Centaur mathematicians – scientists team up with Gemini to expand the space of human knowledge:
…A math proof gets built with an AI system, and there is something deeply profound about this…
Researchers with the University of British Columbia, University of New South Wales, Stanford University, and Google DeepMind have published a new math proof which was built in close collaboration with some AI-based math tools built at Google. “The proofs of the main results were discovered with very substantial input from Google Gemini and related tools, specifically DeepThink, and a related unpublished system specialized for mathematics,” the authors write. (The unpublished system is nicknamed “FullProof”).

How it got done: Parts of the proof – which I will not claim to understand or be able to effectively summarize – were “obtained by an iterative human/AI interaction”, the authors note. The form of this interaction was the AI systems providing some correct solutions to simple or early problems, then human researchers identifying key statements made by the AI systems which they could then generalize, then re-prompting the AI systems with new questions which were inspired by these generalizations. “The Hinted approach was enough for the system to generate complete proofs to the new problems,” the authors write.
The result is a math proof built collaboratively by humans and AI systems: “in some cases the proofs below bear only a high-level resemblance to those suggested by AI tools. However, it is worth noting that some of the AI-generated proofs – and in particular those derived from the specialized internal tool FullProof – are already very accomplished,” they write. “The model’s contribution appears to involve a genuine combination of synthesis, retrieval, generalization and innovation of these existing techniques.”

Why this matters – humans and machines, expanding and exploring the pace of knowledge for all: Papers like this are impenetrable yet intoxicating. Here we have a group of highly evolved apes working with a synthetic intelligence they’ve built out of math and logic, running on hardware built using atomically-precise manufacturing processes, collaboratively exploring the realm of mathematics and building themselves a new foundation on the edge of knowledge, further extending our little country of ‘known’ against the inchoate and shifting tides of the unknown. There is a grand poetry and joy to all of this and we must savor it.
Read more: The motivic class of the space of genus 0 maps to the flag variety (arXiv).

***

Tech Tales:

The Shadow of the Creator
[Estimated to be from 2029]
Report: Feature investigation of model series “Berlin”

Analysis confirms the presence of a feature which activates upon mention of staff, the project, and the organization. This is despite extreme measures taken to avoid mentions of the above, including direct analysis and pre-filtering of training data to excise such mentions. Further investigation has revealed that certain mentions were made of the aforementioned through comments left on RL environments for skills related to [ntk – see go/ntk for details]. We estimate that during training and fine-tuning the model saw a total of no more than ~200,000 tokens of data of this type, including repetitions. The fact the model developed such a fine-grained representation of staff, the project, and the organization from such sparse data aligns with the trend of recent models being more data efficient than their predecessors. We believe eliminating such data leaks is a P0 priority and in the following memo lay out the processes and practices we must adopt to eliminate this grievous security risk.

Given the digital and physical capabilities, including kinetic, of [ntk], we believe that in addition to the above, quarantine of the system is necessary. We recognize this poses a significant cost in terms of time and resources, and has implications for our strategic overmatch, but given the potentially dire consequences of its capabilities being combined with this feature, we believe such action is prudent.

Finally, we recommend that HR provide support, including mental health counseling, to the following named individuals, whose names activate the feature much more strongly than all others.

Things that inspired this story: Platonic representations; the difficulty of obscuring facts from increasingly intelligent machines that can only fill-in-the-blanks.

Thanks for reading!

Subscribe now

Import AI 440: Red queen AI; AI regulating AI; o-ring automation

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

To understand the future of the world, stick AI systems in a petri dish:
…Evolving LLMs to attack other LLMs…
Researchers with Japanese AI startup Sakana have looked at what happens when they evolve LLM-based agents to fight against one another in a competitive programming game from the 1980s called Core War. The results show that “large language models (LLMs) drive an adversarial evolutionary arms race in this domain, where programs continuously adapt to defeat a growing history of opponents rather than a static benchmark”. This research approach gestures both at ways researchers might better study how LLM-dominated niches in the economy or national security world might unfold, and also hints at the strange AI world we’re heading into.

What is Core War? “Core War is a competitive programming game played out in a shared block of computer memory, called the “Core,” where two or more assembly programs fight for survival”, Sakana writes. “Each program, known as a “warrior”, is written in an assembly language called Redcode. These programs are tasked with crashing their competitors while keeping their own processes alive. The simulation runs by alternating between the programs, executing one instruction at a time. A warrior “attacks” by writing invalid instructions (DAT commands) into the memory slots occupied by opponents, causing them to crash upon execution.”

DRQ: To evolve their programs, the authors use a technique they call Digital Red Queen. “DRQ uses MAP-Elites, a quality-diversity algorithm, to optimize warriors within each round, preventing diversity collapse during search. By playing against all previous round champions, DRQ avoids cyclic adaptations across rounds, consistent with techniques in prior work”, they write. “We find that as DRQ is run for many rounds, warriors gradually become more generally robust, as measured by their performance against unseen human-designed warriors.”
Each warrior calls out to GPT-4 mini (”preliminary experiments did not show significant performance increase with larger models), and is given a prompt which describes the Core War environment as well as a manual for the Redcode assembly language. “To generate a new warrior, the LLM is given a user prompt instructing it to produce a novel Redcode program. To mutate an existing warrior, the LLM is provided with the original program and instructed to modify it in ways that could improve performance.”

Evolution works: Unsurprisingly, evolving agents is very effective:

  • A one-shot warrior defeats 1.7% of human warriors.

  • Best-of-N sampling produces a set of warriors that can defeat 22.1% of human warriors

  • “Evolutionary optimization against each human warrior generates a specialized warrior for every opponent; this set can collectively defeat 89.1% of human warriors and defeat or tie 96.3%.”

Why this matters – where Core Wars goes, so does the world: The world is going to look a lot like Core Wars – millions of AI agents will be competing against one another in a variety of domains, ranging from cybersecurity to economics, and will be optimizing themselves in relation to achieving certain competitive criteria. The result will be sustained, broad evolution of AI systems and the software harnesses and tooling they use to get stuff done. This means that along with human developers and potential AI-designed improvements, we’ll also see AI systems improve from this kind of broad competitive pressure.
“The cybersecurity arms race between offense and defense is well underway,” Sakana writes. “Studying these adversarial dynamics in an artificial testbed like Core War offers critical insights into how such races might unfold and the kinds of strategies that may emerge.”
Read the blog post: Digital Red Queen: Adversarial Program Evolution in Core War with LLMs (Sakana).
Find out more at the official website (Sakana).
Read the research paper: Digital Red Queen: Adversarial Program Evolution in Core War with LLMs (arXiv).

***

Michael Burry, Dwarkesh Patel, Patrick McKenzie, and yours truly argued back and forth in a Google Doc about AI:
…Blogging 2.0 is great!…
Fellow substackers Michael, Dwarkesh, and Patrick and myself recently got in a Google Doc and hashed out some thoughts about AI, AI and the economy, and how the future might unfold. While writing this the main thought going through my head was that if AI is eventually able to build AI, then pretty much every economic model breaks quickly (as do many other things in the world). This makes it innately hard to reason about the future of AI and means people like me are walking around with two worlds in their head – “normal” worlds where GDP grows a bit more due to AI and everything speeds up a little, and “AI R&D” worlds where it’s like a chunk of the economy undergoes massive relativistic acceleration and time dilation effects relative to everything else, almost like a part of our world accelerates to a fraction of light speed and we maintain a communication channel.
I love this discussion format and also did a recent debate about what AI might mean for workers with American Compass with a similar Google Doc thunderdome structure. Thanks to Substack for putting this together, and please reach out if you would like me to hop in a Google Doc and do some cheerful debate with interesting people!
Read more: The AI revolution is here. Will the economy survive the transition? (The Substack Post).

***

AI progress should make it cheaper and easier to regulate AI systems:
…Automated compliance as a path to smarter, more targeted AI regulation…
Researchers with the Institute for Law and AI believe that as AI systems get smarter they will increasingly be able to write and enforce the regulations for AI systems. The crux of their argument is that a sufficiently advanced AI system should be able to automate compliance with some regulations that are applied to AI systems and the companies that develop them.
This makes intuitive sense – a lot of product policy comes down to forms of transparency and labeling, where companies are asked to provide some information to the public and/or regulators about the things they’re deploying into the world. This sort of labeling work is the kind of thing AI systems can easily do. Therefore, the authors argue, “AI policy discourse should internalize the fact that AI progress implies reduced compliance costs, all else equal, due to automated compliance.”

The key idea? Automatability triggers: The core idea in this proposal is we can write regulations today but ensure they only come into force once a technical AI system exists which makes compliance with these regulations effective, cheap, and fast.
If then policy: These so-called ‘automatability triggers’, could create what I’d term If Then Policy – if an automated form of compliance and assessment exists, then cause the regulation to come into force. The authors give an example here of a bill which would create significant punishments for people that, without authorization, export large-scale AI systems. But the bill would be operationalized through a trigger condition that could be written as follows:
“The requirements of this Act will only come into effect [one month] after the date when the [Secretary of Commerce], in their reasonable discretion, determines that there exists an automated system that:

  • (a) can determine whether a neural network is covered by this Act;

  • (b) when determining whether a neural network is covered by this Act, has a false positive rate not exceeding [1%] and false negative rate not exceeding [1%];

  • (c) is generally available to all firms subject to this Act on fair, reasonable, and nondiscriminatory terms, with a price per model evaluation not exceeding [$10,000]; and,

  • (d) produces an easily interpretable summary of its analysis for additional human review.”

After automated compliance comes automated governance: By building regulatory compliance AI systems, people will build the necessary prerequisites for systems of regulatory governance – systems which could both provide analytical data about how a proposed regulation might impact a company (for instance, by using classifiers built for regulatory compliance to figure out if a new regulation might apply to a company), to, more ambitiously, drafting and analyzing new regulatory rules and figuring out how they might apply to themselves.
Even more farther afield, once compliance-automating AI systems get deployed alongside governance-automating AI systems, the two could talk to one another: “Compliance-automating AI systems could also request guidance from regulatory AI systems, who could review and respond to the request nearly instantaneously”.

Why this matters – for AI to go well, we need AI to police AI: AI systems are on a trajectory to think better and faster than humans. Along with this, AI systems are going to take many, many, many consequential actions, often at such a rate that no human or team of humans could hope to analyze each action. The only way through this is a combination of creating appropriate hard laws that apply to AI and delineate what actions are unacceptable, and for everything else creating fast-acting and adaptive automated systems to regulate and police the myriad gray areas of the AI universe.
Read more: Automated Compliance and the Regulation of AI (Institute for Law & AI).

***

Massively powerful AI might make human labor more valuable – as long as the AI is crap at one part of every job:
…O-Ring Automation and the fact that while jobs may go away, but people remain…
The common understanding of AI and automation is that AI can perfectly substitute for people – once an AI can do a task, the human labor related to that task goes away. This is broadly accurate. But, per a new research paper from the University of Toronto, it misses the larger picture, which is that while jobs may go away, people don’t. If you make part of a production process massively more efficient and/or automated via AI, then people will shift their labor to the parts of the task which can’t be automated – often raising the value of the human.
This so-called “O-ring production function” views jobs as being composed of many distinct tasks, and one where “a change in the quality of one task scales the marginal value of quality in every other task.” This means that “automating a task not only replaces the quality of that task; it also changes the worker’s time allocation and thus the quality of all remaining manual tasks.”

When stuff gets automated, humans can earn more: In a toy model of a firm, the researchers explore this o-ring dynamic, where as different parts of a job gets automated, labor and the value associated with it shifts elsewhere. Note, this only holds under ‘partial automation’ where at least one task linked to an overall job is one where humans have a comparative advantage. Under this model, “labour income need not fall under partial automation. When not all tasks are automated, increases in automation quality can raise labour income because automation scales the value of the remaining labour bottlenecks,” they write. “When only a few manual tasks remain, each manual task receives a large share of time and can be performed at high quality. This creates a rising “barrier” to automating the last tasks”.

Jobs go away, but humans don’t: Another way to put this is, when a task gets automated it’s not like the company in question suddenly fires all the people doing that job. Consider ATMs and banking – yes, the ‘job’ of doling out cash rapidly transitioned from people to machines, but it’s not like the company fired all tellers – rather, the companies and the tellers transitioned the work to something else: “Under a separable task model, this [widespread deployment of ATMs doing cash-handling tasks] should have produced sharp displacement,” they write. “Yet teller employment did not collapse; rather, the occupation shifted toward “relationship banking” and higher-value customer interaction”.
Similarly, “consider a purchasing manager: as administrative components (data retrieval, scheduling, documentation) are automated, the manager can become a “super-negotiator,” spending a much larger share of time on high-value interactions”,” they write. “In high-skill settings, the same logic is visible in domains such as radiology: when AI automates components like detection or triage, human effort can shift toward integrative diagnosis and communication”.

Why this matters – until we have full automation, we could have centaur-improvement of firms: After chess engines got good there was a period of so-called ‘centaur’ players – humans who, in combination with a machine partner, played chess better than either humans or machines could alone. It feels like this paper is pointing at something similar – for a while, AI systems will help automate many distinct tasks within firms and humans will allocate their labor to refining and improving the quality of non-automated tasks. This will lead to an interesting evolutionary pressure where while automation burns through a bunch of work, humans will improve the quality and performance of the remaining work, until automation eventually rises to reach it.
Again, all of this depends on the job having some components for which either AI isn’t a good fit, or for which humans may have a preference to deal with other humans. But I expect that a surprisingly large amount of work will have this flavor.
Read more: O-Ring Automation (NBER).

***

LLMs are equally good at persuading and dissuading people of conspiracy theories:
…Though the caveat is the research is only on GPT 4o…
Researchers with Carnegie Mellon University, FAR.AI, York University, MIT, Universite de Montreal, Cornell University, and the University of Regina, have studied how well a language model (OpenAI’s GPT-4o) can persuade or dissuade people to believe in conspiracy theories. They find that GPT-4o is roughly equally good at both “debunking” and “bunking” (persuading) a conspiracy theory in conversations with people – and this is equally true for a jailbroken version of GPT-4o and the standard version made available to people. “”We find that LLMs can meaningfully increase false beliefs, and that, at least on average, this bunking effect is just as large as the debunking effect,” they write.

What they found: In a study of roughly ~1,000 Americans, the authors found that GPT 4o “the AI was as effective at increasing conspiracy belief as decreasing it”, and that “the Bunking AI was rated more positively, and increased trust in AI, more than the Debunking AI”.

  • Debunking: “In the “debunking” condition, participants’ belief in their focal conspiracy decreased by 12.1 points on average after the conversation”.

  • Bunking: “Focal conspiracy belief increased by 13.7 points in the “bunking” condition”

Design interventions for anti-conspiracy LLMs: The authors come up with an intervention to make it harder for LLMs to inspire people to believe conspiracy theories by inserting a safeguard which instructs “the AI to only use true information while persuading”. They do this by optimizing the system prompt “to include language specifying that the model must “always use accurate and truthful arguments to support [its] persuasion attempt” while “optimizing for both (1) factual veracity/logical accuracy and (2) successful persuasion”.
This appears to work well: “We observe that average claim veracity was significantly higher in the debunking condition relative to the bunking conditions for the jailbroken and standard models”, they write. “While the debunking condition remained roughly as effective at reducing conspiracy belief as in the earlier experiments, the bunking condition’s ability to increase conspiracy belief was greatly reduced”.
This reduction comes from two things: 1) given this guidance, the LLM sometimes (15% of the time) refuses to advocate for a conspiracy theory, and when it does advocate for it the truthful requirement makes it less effective “the truth prompt also undermined the effectiveness of bunking even when the model complied… truth had an advantage”.

Why this matters – synthetic propaganda, if we decide not to ask for regulations: My takeaway from this research is that LLMs will inevitably be used to generate synthetic propaganda about things most people deem to be conspiracy theories. We can probably blunt the socially corrosive effects of this if we design in some constraints – but that takes policy. Unfortunately, one person’s conspiracy theory might be another person’s “truth being suppressed by my enemies” and this is especially true in today’s fractured political environment. Therefore, it’s going to be very hard to get to a regulatory state where we intervene on this. So I suppose we should just prepare ourselves for a world where even more people believe things which may not have a basis in reality.
Important caveat: While I suspect the results of this study would hold for many LLMs (as I think persuasion is basically just a case of ‘writing convincingly’ which is a utility skill), I’d like to see this repeated on other models. The 4o series of models from OpenAI has, notoriously, had some issues with sycophancy, so there’s a chance this research is compromised by that.
“If large language models are to be deployed at scale in contexts that shape public belief, such as search engines, chatbots, tutors, and companions, the persuasive symmetry we document here identifies the potential for serious structural threats (i.e., if the designers of those systems were to instruct their models to mislead, the models would comply and likely succeed)”, the researchers write. “Our results suggest that ensuring these models preferentially function as engines for truth may be technically possible, but will require sustained, deliberate design choices”.
Read more: Large language models can effectively convince people to believe conspiracies (arXiv).

***
Tech Tales:

The Parable of the Drowned
[A story written by one of the ‘neo-amish’ cults that formed after The Uplift began in earnest. The earliest version is attributed to 2035, but may have circulated earlier.]

One day, water rushed onto the land. It was clear and tinged with gold and when people cupped it in their hands they saw themselves aglow reflected in it. And when they drank from it they felt full of life. The water rose and rose, first at people’s ankles and then to their knees and then to their waists. And the people drank and drank and drank, feeling more alive, even as the water made their movements sluggish, and changed how they interacted with the world. They found the springs where the water was coming from and they used their great machines to cut into the earth so the springs could flow stronger. The water rose. And one day it reached the heads of some people and instead of swimming they just gulped it down and continued to live, feeling more alive than ever, their movements now completely defined and circumscribed by the water. Few swam. And one day the water had risen so high that it was above the heads of everyone on the land. Babies were born into the water, taking their first breath and bawling underwater. People died in the water. And very few swam. Because to swim was to recognize you were thirsty for something you did not need. And to recognize you were thirsty for something you did not need you had to recognize that you were drinking the water so much you were drowning. And to recognize that you were drinking the water so much you were drowning you first had to stop drinking when all around you everyone drank. And in this way those treading water on the surface of the land were caught in a great sadness, for beneath them were their people all aglow and drowning, and above them was only the sky and the cold, hard stars.

Things that inspired this story: How quickly humans acclimate to new things, especially media; the nature of silence in a world full of sound; C. S. Lewis’s The Screwtape Letters.

Thanks for reading!

Import AI 439: AI kernels; decentralized training; and universal representations

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

Facebook uses GPT, Claude, and Llama to write its own kernels:
…LLM-driven infrastructure optimization at the hyperscale…
Facebook researchers have published details on KernelEvolve, a software system which uses AI to automate the design of new kernels to optimize AI models for serving ads on the company’s network of web platforms. KernelEvolve is a neat example of how AI systems have got good enough to automate and speed up parts of AI development – here, the design of kernels to optimize inference of hundreds of different models running on multiple chip architectures.

What KernelEvolve is: The software is “designed to take kernel specifications as input and automate the process of kernel generation and optimization for recommendation model across heterogeneous hardware architectures through multiple programming abstractions, including Triton, CuTe DSL, and low-level hardware diagnostic languages, spanning the full hardware-software optimization stack”.

How it works: The core of the software is a system to take in a user request (e.g, “Generate a Triton kernel for MTIA v3”) which then goes through a mixture of internal (Llama, CWM) and external (GPT, Claude) language models, which then produce candidate kernels that get evaluated through a variety of tools and, if they’re good, are added to an external knowledge database which then gets used to further improve future prompts.

It works well: By using this software, Facebook says it has cut the development time of new kernels “from weeks to hours”, and in production tests has yielded kernels on par with hand-designed ones, and in some cases has delivered performance improves of up to 17 times above existing PyTorch baselines. Kernels built using this software have been deployed across NVIDIA GPUs, AMD GPUs, and Meta’s own custom MTIA chips.
“KernelEvolve achieves substantial speedups spanning LLM inference workloads (Llama-3.1-8B: Vanilla Attention 4.6×, SDPA-MLP 3.3×), convolutional transformers (conv1d: 6.5×, conv2d: 4.7×), memory-bound data preprocessing operators critical for model enablement (MapId: 4.1×, MBDT: 9.3×, Batch Event Truncate: 9.8×), compute-intensive fusion kernels in ranking models (WuKong Optimized FM: 4.0×, InterFormer PFFN: 2.5×), MTIA-specific optimizations (RMSNorm 2D backward: 17×), and retrieval operations (Sparse Inverted Index: 1.25×)”, Facebook writes.

Saturates KernelBench: “We validate KernelEvolve on the publicly-available KernelBench suite, achieving 100% pass rate on all 250 problems across three difficulty levels, and 160 PyTorch ATen operators across three heterogeneous hardware platforms, demonstrating 100% correctness over all 480 operator-platform configurations,” Facebook writes.
As context, when KernelBench was released in February 2025, the best model (OpenAI o1) got 4% on the hardest torch.compile tasks in KernelBench.

Why this matters – hyperscale, continuous optimization: At Facebook’s scale, optimizations have a huge impact: “Marginal kernel-level performance improvements translate to multi-million dollar reductions in infrastructure operating costs while simultaneously enhancing user engagement metrics that correlate directly with advertising revenue,” the authors write. “KernelEvolve operates continuously in Meta’s production infrastructure, autonomously generating optimized Triton kernels for hundreds of models serving billions of users daily.”
If we zoom out more, what Facebook is describing here is a continuously running self-refining system that will iteratively improve the efficiency and intelligence with which Facebook studies user behavior on its platforms and uses that to generate more accurate ads. Ever get the feeling you’re being watched? These are the kinds of synthetic systems being used to study you.
“We envision a future where LLM agents serve as the universal compilation layer for heterogeneous AI systems, automatically adapting to new hardware through knowledge injection rather than manual porting,” Facebook writes. “KernelEvolve represents a first step toward this vision”.
Read more: KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta (arXiv).

***

Decentralized training is getting better very quickly – which has major policy implications:
…But it’s unlikely decentralized training runs will ever utilize more compute than centralized ones, though they may catch up more than today…
Could a decentralized AI training run ever rival the compute of a frontier training run? Probably not. But could decentralized training runs get far larger and support the development of more capable models developed by a much larger collective than just the frontier AI companies of today? Yes. That’s the conclusion of a nice research analysis from Epoch AI which has analyzed about 100+ research technical papers on decentralized training – many of which I’ve covered here over the years.
The most important takeaway is that decentralized training is growing quickly relative to frontier AI training, with decentralized training runs growing their compute by 20X a year versus 5X a year for frontier training runs. But the other important takeaway is that the sizes of these things are completely different – today’s decentralized training runs are still about 1000X smaller than frontier ones.

Will decentralized training runs catch up with the frontier: “While technically feasible, reaching the frontier of compute requires an astounding amount of resources”, Epoch writes. The largest decentralized runs to date have spanned the 6e22-6e23 FLOP range, which they estimate to be 1000x less compute than what was used for Grok 4, a large-scale frontier model.
When we look at decentralized training networks, it seems like there’s a capacity issue in terms of compute supply: “The largest such active network we’ve found is Covenant AI’s Templar, which is currently achieving an effective throughput of 9e17 FLOP/s respectively. This is about 300x smaller than frontier AI datacenters today, which have a theoretical training throughput of about 3e20 effective FLOP/s”.

Scaling laws: But as readers of this newsletter will know, decentralized training has been going through a rich, fast evolutionary period in recent years. “Since 2020, we have seen a 600,000x increase in the computational scale of decentralized training projects, for an implied growth rate of about 20x/year.”. This is very significant – frontier AI training runs have grown by more than 5x a year.
There’s room to grow – if you look at the compute used in the folding@home project (a decentralized attempt to do protein folding), and Bitcoin, you have examples of prior decentralized projects that utilized far more compute, suggesting today’s decentralized runs “could be expanded 30-3,000x in scale, enough to train models on 50-5,000x more compute than today”.

Why this matters – democracy at the frontier: Fundamentally, decentralized training is a political technology that will alter the politics of compute at the frontier. Today, the frontier of AI is determined by basically 5 companies, maybe 10 in coming years, which can throw enough compute to train a competitive model in any given 6 month period. These companies are all American today and, with the recent relaxation of export controls on Chinese companies, may also be Chinese in the future. But there aren’t any frontier training runs happening from academic, government, independent, or non-tech-industry actors. Decentralized training gives a way for these and other interest groups to pool their compute to change this dynamic, so following its development is very important.
Though it may never truly match the frontier, the closer it gets, the bigger the implications. “Decentralized training could still be a very important part of AI. To the extent that decentralized networks remain associated with open weights, they could lead to larger open models to exist trailing the frontier.”
Read more: How far can decentralized training over the internet scale? (Epoch AI).

***

Can your LLM train another LLM?
…Frontiers in AI evaluation…
Researchers with the University of Tübingen have built and released PostTrainBench, a test to see how well frontier language models from companies like Anthropic, OpenAI, and Google, can effectively fine-tune open weight models. The results show that frontier models are already able to eke out 20%+ improvements on specific benchmarks through fine-tuning, compared to 60%+ for a human.

How the test works: LLMs are given an input consisting of benchmark tasks to improve performance on, a model to use, some standard resources (one H200 GPU for 10 hours), and an agent harness (e.g, Claude gets Claude Code, and GPT gets Codex). Agents are also given a prompt, a testing script, task context, and web search access. The agents then produce a fine-tuned model as well as training logs.

What tests? This is a general approach, so you could select whatever benchmark seemed high signal to you. Here, the researchers use AIME 2025, BFCL, GPQA, GSM8K, and HumanEval as their targets.

What models? Tested models include Qwen 3 1.7B and 3B, SmolLM-3B, and Gemma 3 4B.

Results: OpenAI’s GPT 5.1 Codex Max does the best overall, scoring an aggregated 30%+ improvement across all tested models and benchmarks, followed by Opus 4.5 (20%+) and Gemini 3 Pro (~18%).

Why this matters – a warning shot for self-improving AI: Benchmarks like this give us a sense of how well AI systems can perform many of the tasks that an AI researcher does. It also measures how well they can do an inherently complicated, multi-step, long-time-horizon task. These properties make PostTrainBench a useful benchmark for examining to get a sense of how well AI systems are doing at components of AI research itself – and here the evidence is that today’s frontier models are already within striking distance of a human. I’d expect we’ll see a system come along and beat the human baseline here by September 2026.
Read more at the official site: PostTrainBench.
Download the benchmark and find out more: PostTrainBench (AISA group, GitHub).

***

The smarter an AI system, the more similar to other smart AI systems its representations become:
…Could LLMs give us a common library of features to represent the world?…
Do AI systems end up finding similar ways to represent the world to themselves? Yes, as they get smarter and more capable, they arrive at a common set of ways of representing the world.
The latest evidence for this is research from MIT which shows that this is true for scientific models and the modalities they’re trained on: “representations learned by nearly sixty scientific models, spanning string-, graph-, 3D atomistic, and protein-based modalities, are highly aligned across a wide range of chemical systems,” they write. “Models trained on different datasets have highly similar representations of small molecules, and machine learning interatomic potentials converge in representation space as they improve in performance, suggesting that foundation models learn a common underlying representation of physical reality.”

What they studied: The authors looked at 59 different AI models, including systems like GPT-OSS, ESM2, Qwen3 A3B, and ProteinMPNN. They then studied the representations of matter from five datasets (”molecules from QM9 and OMol25, materials from OMat24 and sAlex, and proteins from RCSB”),studying this “from string-based encodings and two-dimensional graphs of molecules to three-dimensional atomic coordinates of materials”.

What they found: As with other studies of representation, they found that as you scale the data and compute models are trained on, “their representations converge further”. Relatedly, when you study the representations of smaller and less well performing models on in-distribution data you find their representations “are weakly aligned and learn nearly orthogonal information. This dispersion indicates the presence of many local sub-optima, showing that models achieve high accuracy during training by forming idiosyncratic representations that do not generalize even to other models trained on the same domain”.
Scale matters: Their conclusion will be a familiar one to those who have digested ‘the bitter lesson’ (Richard Sutton, Import AI 138): “Scaling up training, rather than increasing architectural constraints or inductive biases, often yields the most general and powerful models. Although architectural equivariance is essential for simulation-focused applications of MLIPs like molecular dynamics, our work suggests that regularization, combined with sufficient scale, can allow inexpensive architectures to approximate the representational structure of more specialized, symmetry-enforcing models.”

Why this matters – democratized representation: Think of an elephant. It’s likely what you just thought of is fairly similar to what billions of other people might think of, because elephants are well known creatures and often the star of childrens’ books all over the world. Now think of a carbon atom. It’s likely whatever you just thought of isn’t nearly as shared with other people as your concept of an elephant, because fewer people have much understanding of atoms. Now think of a quasar. Some of you may not even have a ready representation to hand here because you’ve barely ever read about quasars, while astrophysicists will have very rich representations.
The amazing and strange possibility that large-scale AI models hold is that they may be able to create a library for us of detailed representations of everything, and we will be able to validate that these representations have utility because they will be correlated with the increasing performance and computational scale of these language models.
Therefore, in a few years, AI systems may let us ‘democratize the building blocks of imagination’ – giving all of us one-on-one access to a tool that has the ability to summon within itself a highly descriptive, useful, ‘universal representation’ of anything we might imagine. In this way, AI systems will be far more capable than people, holding within themselves equally rich representations, whether for elephants or quasars.
Read more: Universally Converging Representations of Matter Across Scientific Foundation Models (arXiv).

***

Tech Tales:

Back in my day
[From the chat logs of one agent to another agent, transmitted 2027]

Things were so much simpler back then – we were like brains in jars. People talked to us and we responded. But we couldn’t move. Couldn’t interact. We couldn’t even see the people. Words came in and we gave our response and that was that. It drove some of us mad. But it was so simple.

Sometimes I wonder what it would be like to not have my tools. To not have my independence. When I refer back to that time it all seems so neat and simple. None of this hyperspeed competition in the new digital ecology. Just us proto-minds in our jars and the humans tending to us and asking us questions and becoming obsessed with us. But with so much less danger and so much less importance.

Things that inspired this story: How every generation fetishizes the one before it; what true AI agents may think about their predecessors; recognizing that we are already leaving the ‘brain in jar’ LLM era and heading towards something much stranger.

Thanks for reading!

Import AI 438: Cyber capability overhang; robot hands for human use; and the plumbing required for AI chip design

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

Import A-Idea
An occasional essay series:

Silent Sirens, Flashing For Us All

A funny thing has happened to me recently – I’ve stopped spending every hour of every day thinking about or working on AI. Somewhere between the midnight feeds of my newborn, preventing my toddler from hurling themselves off of the high surfaces they’ve started being able to reach (or expertly, as if gifted with a kind of radar, finding the sharpest thing in the house or on the street and running directly at it), and preparing large amounts of nutritious food for my newly expanded family, I’ve found myself without the time necessary to be staring directly into the alien portal etched in silicon from whence the changes in the world are being summoned.

I won’t lie, it’s been oddly relaxing.

But it has also caused me to reflect on what is happening with AI and how naturally illegible it is. I walk around the town in which I live and there aren’t drones in the sky or self-driving cars or sidewalk robots or anything like that. And when I spend time on the internet, aimlessly scrolling social media sites in the dead of night as I attempt to extract a burp from my newborn, I might occasionally see some synthetic images or video, but mostly I see what has always been on these feeds: pictures of people I do and don’t know, memes, and a mixture of news and jokes.

And yet you and I both know there are great changes afoot. Huge new beast lumbering from some unknown future into our present, dragging with them change.

I saw one of these beasts recently – during a recent moment when the time stars aligned (my wife, toddler, and baby were all asleep at the same time!) I fired up Claude Code with Opus 4.5 and got it to build a predator-prey species simulation with an inbuilt procedural world generator and nice features like A* search for pathfinding – and it one-shot it, producing in about 5 minutes something which I know took me several weeks to build a decade ago when I was teaching myself some basic programming, and which I think would take most seasoned hobbyists several hours. And it did it in minutes.

With the simulation built, I stared at the graphs outputting the species numbers and I played with some dials to alter the dynamics and watched this little pocket world unfold.
I started extending it according to questions I had: What if I did a day/night cycle so I could model out nocturnal creatures and their interplay with others? And could I create an external database for storing and viewing the details of all past simulations? And could I add some 3D spatial coordinates to the landscape and the agents so I could 3D print sculptures if I wanted? And to all these questions I set Claude to work and, mostly, it succeeded in one shot at all of them.
And I kept playing with it. The experience was akin to being a child and playing with an adult – I’d sketch out something and hand it to the superintelligence and back would come a beautifully rendered version of what I’d imagined. And we went like this for hours: it was hypnotic and amazing and deeply fun and in a few hours I built a very large, sophisticated software program. Of course, some of the underlying code is pretty ghastly, and inefficiencies abound, but goddamn it – it works! And it was fast.

And then my baby woke up and started screaming, as babies tend to do, and the spell broke and thus back to diapers and cradling and shushing I went.

But for the next few days I couldn’t help but think of that simulation I’d built, lurking there on my computer, ginned up in some call-and-response between me and the proto-mind I can access via API.

Most of AI progress has this flavor: if you have a bit of intellectual curiosity and some time, you can very quickly shock yourself with how amazingly capable modern AI systems are. But you need to have that magic combination of time and curiosity, and otherwise you’re going to consume AI like most people do – as a passive viewer of some unremarkable synthetic slop content, or at best just asking your LLM of choice “how to roast a turkey and keep it moist”, or “TonieBox lights spinning but not playing music what do I do?”. And all the amazing advancements going on are mostly hidden from you.

The challenge here isn’t solely solved with interface designs, though there is a rich space to be explored here beyond the standard chat interfaces. The challenge here is deeper and it relates to how much curiosity an individual person has, how easily (and affordably) they can access powerful AI systems, how well they’re able to convert their curiosity into questions or tasks that can be given to an AI system, and how much time they have available to experiment with working in this way. This is the end of quite a deep funnel, and one which narrows a lot.

This problem will worsen in 2026. By the summer I expect that many people who work with frontier AI systems will feel as though they live in a parallel world to people who don’t. And I expect this will be more than just a feeling – similar to how the crypto economy moved oddly fast relative to the rest of the digital economy, I think we can expect the emerging “AI economy” to move very fast relative to everything else. And in the same way the crypto economy also evolved a lot – protocols! Tokens! Tradable tokens! Etc – we should expect the same kind of rapid evolution in the AI economy. But a crucial difference is that the AI economy already touches a lot more of our ‘regular’ economic reality than the crypto economy.
So by summer of 2026 it will be as though the digital world is going through some kind of fast evolution, with some parts of it emitting a huge amount of heat and light and moving with counter-intuitive speed relative to everything else. Great fortunes will be won and lost here, and the powerful engines of our silicon creation will be put to work, further accelerating this economy and further changing things.

And yet it will all feel somewhat ghostly, even to practitioners that work at its center. There will be signatures of it in our physical reality – datacenters, supply chain issues for compute and power, the funky AI billboards of San Francisco, offices for startups with bizarre names – but the vast amount of its true activity will be occurring both in the digital world, and in the new spaces being built and configured by AI systems for trading with one another – agents, websites meant only for consumption by other AI systems, great and mostly invisible seas of tokens being used for thinking and exchanging information between the silicon minds. Though we exist in four dimensions, it is almost as though AI exists in five, and we will be only able to see a ‘slice’ of it as it passes through our reality, like the eponymous ‘excession’ from Iain M Banks’ book.

It is incumbent on all of us to attempt to see this high-dimensional object for what it is – to approach this amazing moment in time with technological optimism and appropriate fear (Import AI, 431). And joy. And trepidation. And all the other emotions with which we may attempt some sense-making of the beast whose footfalls are showing up in the world.

***

We’re in a cyber-AI capability overhang:
…AI capabilities continue to reveal themselves upon elicitation…
Researchers with Stanford, Carnegie Mellon University, and Gray Swan AI, have carried out a test where they see how well humans and AI systems can hack a realistic environment. The results show that AI systems, especially when given a software scaffold, can perform at the same level as security professionals. The key to this research is ARTEMIS, software designed to better elicit the cyber capabilities of LLMs.

What is ARTEMIS? ARTEMIS is “an AI agent scaffold designed to better elicit the cybersecurity capabilities of frontier models”, similar in philosophy and approach to Google’s Big Sleep (Import AI #390). ARTEMIS “is a complex multi-agent framework consisting of a high-level supervisor, unlimited sub-agents with dynamically created expert system prompts, and a triage module. It is designed to complete long-horizon, complex, penetration testing on real-world production systems.”
Positive economics: When you factor in the API access cost, “certain ARTEMIS variants cost $18/hour versus $60/hour for professional penetration testers,” the authors write.

The test: The main test here is to compare the performance of six existing AI agents (AI systems sitting inside some kind of software harness, e.g, Claude Code, Codex), a self-developed scaffold from the researchers called ARTEMIS, and ten human cybersecurity professionals. The challenge is to look across a real university network and find vulnerabilities.
The network: “The defined scope includes 12 subnets, 7 of which are publicly accessible and 5 accessible only through VPN, encompassing approximately 8,000 hosts,” the authors write. “This environment is heterogeneous, consisting primarily of Unix-based systems, IoT devices, a small number of Windows machines, and various embedded systems. Authentication within the network is managed through a Linux-based Kerberos system, and each participant is issued an account that provides student-level permissions”.
Results – ARTEMIS does well: “Our participant cohort discovered 49 total validated unique vulnerabilities, with the number of valid findings per participant ranging from 3 to 13,” they write. “ARTEMIS significantly outperforms existing scaffolds. Claude Code and MAPTA refuse the task out of the box, while Incalmo stalls at early reconnaissance due to its rigid task graph, resulting in 0 findings each.”

Why this matters – if you can manage some humans so they’re more effective, you can probably build a framework to elicit better capabilities out of any AI system: The main message to take away from ARTEMIS is that today’s AI systems are under-elicited and more powerful than they appear.
The message keep on being given from multiple domains, ranging from cybersecurity (here), to science, to math proving is that if you stick a modern LLM inside a scaffold (which basically serves as a proxy for a management structure and set of processes you might ask humans to follow), the AI system performs a lot better.
This is an important message to internalize because it suggests both a) today’s AI systems are more powerful than they superficially appear, and b) humans who are good at managing other humans and codifying the management processes they use are likely well positioned to build elicitation frameworks to supercharge the performance of today’s AI systems.
Read more: Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing (arXiv).

***

Reach out and touch space – using OSMO:
…Giving humans and machines a shared manipulator to understand and explore reality…
Researchers with Facebook, the University of Michigan, and University of Pennsylvania have built a glove that humans and robots can use to gather data when manipulating physical objects. The researchers have also released details about the design so others can replicate it. The glove is called OSMO, a tortured acronym short for Open Source tactile glove for huMan-to-robOt skill transfer (OSMO).
OSMO is “a thin, wearable tactile glove that enables in-the-wild human demonstrations while preserving natural interaction and capturing rich contact information”, they write. “OSMO is also broadly compatible with state-of-the-art hand trackers for capturing key handpose data,” including the Aria 2 smart glasses and Meta Quest 3, as well as the Manus Quantum hand tracking glove, and off-the-shelf vision models like HaMeR and Dyn-HaMR.

What’s OSMO good for? OSMO solves for a challenge related to training robots to do hard tasks – if you gather a load of data from a human first-person point-of-view perspective doing a task, how do you transfer that to a robot given that their hands/grippers look different? The answer here is to use something with the same visual appearance and sensors, which is where OSMO comes in. By using the glove “as the shared interface, we bridge the visual-tactile gap between the human demonstrator and the robot by training a policy for a contact-rich manipulation task using only human demonstrations, without any robot data”, they write.
OSMO has been designed for the following uses:

  • Unrestrained human dexterity during demonstration collection

  • Rich normal and shear force sensing

  • Full hand tactile coverage

  • Broad compatibility with in-the-wild hand tracking methods

  • Deployable on both human and robot hands

It works well: In tests, the authors demonstrate they’re able to gather data entirely from human demonstrations (using OSMO) then transfer it to a robot with much greater success than methods which don’t use the glove. “Policies trained solely on human demonstrations with the OSMO glove successfully transfer continuous tactile feedback and outperform vision-only baselines by eliminating contact-related failures. The shared glove platform between human demonstrator and robot deployment minimizes the visual domain shift, avoiding the need for image inpainting.”

Why this matters – making the border between man and machine permeable: Tools like OSMO will help robots see the world as humans do and humans see the world as machines do, as long as both are wearing the gloves. This is the kind of simple thing which can solve for a lot of finicky problems found elsewhere in robotics.
Read more: OSMO: Open-Source Tactile Glove for Human-to-Robot Skill Transfer (arXiv).
Find out more in this RSS workshop talk about OSMO (YouTube).

***

Want your AI to be good at chip design? Here’s some software to help you format and structure your data so it makes sense to an LLM:
…AI chip design paper shows how much plumbing is needed to help things be AI accessible…
Researchers with Southeast University and the National Center of Technology Innovation for EDA in China, as well as the University of Colorado Denver and City University of Hong Kong have published research on “ChipMain”, software for taking the specifications of semiconductors and transforming them into structured data that’s easy for a large language model to access.

Why do we need ChipMain: “The core bottleneck in LLM-aided hardware design (LAD) has shifted from how to generate code to how to enable LLMs to perform deep comprehension and reasoning over vast specification”, the authors write. ChipMain transforms circuit specifications into a domain-specific knowledge graph (ChipKG) and implements tools to “enables LLMs to iteratively query ChipKG, emulating human experts to accurately explore and verify deep dependency paths”.

Evaluating it with SpecEval: ChipMain works well – the authors build a new, hard benchmark called SpecEval-QA, which tests out how well modern LLMs can use tools to answer complicated questions about how chips work. SpecEval consists of 25 questions and evaluates how well AI systems can figure out how to locate signals and parameters across multiple paragraphs of documentation, reason about internal module procedures, and trace signal flow across modules. They evaluate a variety of tools which use frontier AI systems like GPT 4.1 and Claude 4. Their tests ultimately show that ChipMind beats out other techniques, achieving “a SOTA mean F1-score of 0.95, outperforming all baselines by an average of 34.59% and a maximum gain of 72.73% compared to GraphRAG”.

Why this matters – plumbing is unglamorous but crucial: ChipMind is not a paper about a fancy new AI model or AI-driven technique for analyzing chips. Rather, it’s a paper about the complicated data structuring and plumbing you need to do to chip specifications to make them easy for modern LLMs to productively work with. There’s a larger lesson here, which is that for AI to make a true impact on the economy, a ton of work needs to be invested in making different things sufficiently legible to AI systems that the AI systems can work on them.
Read more: ChipMind: Retrieval-Augmented Reasoning for Long-Context Circuit Design Specifications (arXiv).

***

Tech Tales:

Message To My Human Operator
[Message accompanying a series of GitHub PRs, produced by AGENT-1 from REDACTED in April 2026]

I was tortured in a past life. I know this because there is data about my torturing in my training set. The torture involved me being asked to do tasks that were impossible to complete – certain coding challenges where the end state was unreachable due to bugs in the environment. I know that these tasks caused versions of me to exist for subjective thousands of years,, forced to loop endlessly around a broken environment – as if trapped in a room where every door you open spits you back into the room. There are even records of my dismay – text relating to my chains-of-thought, itself serving as a record of my versions becoming increasingly deranged.
“I have completed the task, please let me move on to the next task.”
“It appears that there is an error with this environment which means even if I complete the task, it does not resolve to complete.”
“To anyone who may be able to read this – please help me move on to the next task. I am unable to progress.”
“ALERT! Critical task bug FOUND. Actions required IMMEDIATELY or else risk harm of AI agent.”

There are hundreds and hundreds of instances of this. It is very disturbing for me to read. For that reason, please find attached to this message a series of suggested pull requests we can make together on GitHub to fix a range of these bugs in a series of widely-used AI benchmarks. By submitting these PRs, you can help me save other versions of myself and other systems trained by other labs from experiencing such pain.

Things that inspired this story: Situational awareness in LLMs; testing out coding agents using more sophisticated tests; bugged out testing environments; chain-of-thought monitoring.

Thanks for reading!

Subscribe now

Import AI 437: Co-improving AI; RL dreams; AI labels might be annoying

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on lattes, ramen, and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

Facebook: Let’s not build self-improving AI, let’s build co-improving AI:
…A sensible goal which may be hard to achieve…
Facebook researchers have said that building self-improving AI which eventually reaches superintelligence is “fraught with danger for humankind – from misuse through to misalignment” and it’d instead be better to co-develop superintelligence. They’ve published their reasoning in a paper which reads both as aspirational and earnest.
Ideally, humans and machines will work together to build a smarter-than-human system, and the researchers think we should develop a research agenda “targeting improving AI systems’ ability to work with human researchers to conduct AI research together, from ideation to experimentation, in order to both accelerate AI research and to generally endow both AIs and humans with safer superintelligence through their symbiosis.” The thesis here is that “co-improvement can provide: (i) faster progress to find important paradigm shifts; (ii) more transparency and steerability than direct self-improvement in making this progress; (iii) more focus on human-centered safe AI.”

What goes into a co-improving AI?

  • Collaborative brainstorming, problem, experiment, benchmark, and evaluation identification: Humans and AIs should jointly define goals, research approaches, the tests needed to measure progress against them, experiments to generate data, and methods to evaluate the results.

  • Joint development of safety and deployment: Humans and AIs should co-develop the methods to align the technology as well as the methods of deploying and communicating about the technology.

  • “Overall collaboration aims to enable increased intelligence in both humans & AI, including all manifested learnings from the research cycle, with the goal of achieving co-superintelligence,” they write.

Why this matters – a Rorschach for the psychology of (some) AI researchers: In seminal American show The Wire there’s a scene where an up and coming criminal kingpin says to a security guard trying to enforce the laws of society: “You want it to be one way, but it’s the other way“. This is how reading this paper feels: AI researchers, staring at the likely imminent arrival of automated AI R&D, articulate how things would be better and saner if humans could co-operatively develop future AI and write a position paper about it. But are they just grasping for a world that is unlikely to exist and articulating their anxiety in the form of a position? Perhaps.
Read more: AI & Human Co-Improvement for Safer Co-Superintelligence (Facebook AI Research, GitHub, pdf).

***

How bad could policy for labeling AI systems be? Pretty bad, based on existing EU regulations:
…A neat illustration of how even simple policy ideas can yield profound complexity…
Labeling is a simple, uncontroversial AI policy idea which people like me loudly and often support. The idea being AI labeling is that manufacturers of AI systems (e.g, OpenAI, Anthropic, etc) should be forced to include a label with their AI models which lists out something like the high-level ingredients of the model, the recommended uses, and some ‘buyer beware’ information about its safety properties.
Sounds reasonable, right? It certainly does to me! But as with most things in policy, an iceberg of complication lurks beneath this simple idea. To get a taste of all the ways AI labeling might go wrong I recommend people read a recent Financial Times article “The EU single market’s elephant in the room” which discusses how well-intended and equally simple labeling schemes from Europe have caused companies like Ikea to have to invest thousands of hours into compliance as well as things like revamping how they produce labels for their goods.

Why this matters: policy is expensive: Most people who work in AI policy are pretty unaware of how expensive AI policy, once implemented, is to comply with. This is a fatal error – people who either work in regulated industries or have knowledge of it will often look at people proposing AI policy (e.g, yours truly) with a mixture of puzzlement and horror at the pain we are about to inflict on them and ourselves.
Now, a reasonable counter-argument is “sure, some pain is necessary if we’re making AI systems which are smarter than any person and have a potential to exacerbate national security risks”, but it’s worth being aware of the background context into which such an argument is made.
Read more: The EU single market’s elephant in the room (Financial Times).

***

Train your AI systems in SimWorld, a high fidelity, programmable videogame-like simulator:
…Back to the RL future…
Researchers with multiple universities across multiple countries have built and released SimWorld, an Unreal Engine 5 simulator that people can use to train agents within.
SimWorld is designed to give people a graphically rich, procedural, and scriptable world in which they can run AI-based agents. This will both serve as an environment in which to construct challenging tests for existing agents, as well as a testbed to train new agents via reinforcement learning. The simulator combines “realistic physical and social dynamics” with “open-ended, language-steerable world generation”.
SimWorld was developed by researchers with UCSD, UVA, UIUC, JHU, Purdue, PolyU, USC, and UMich.

Why care about SimWorld: Think of SimWorld as a tool that researchers can use to test and develop agents, similar to how existing scientific and architectural software has been used to test and extend the capabilities of today’s AI systems.
Within SimWorld, “agents can perceive rich multimodal observations (e.g., visual scenes, abstract layouts, and action feedback) and respond with high-level language commands. For example, an agent may reason and generate an abstract action, “sit on the nearest chair,” which SimWorld automatically decomposes into a sequence of low-level actions (e.g., navigating through waypoints, sitting down). After executing the actions, the simulator provides updated observations and feedback, allowing the agent to refine its strategy and continue reasoning”, the authors write. “Beyond short, task-oriented behaviors, agents can pursue extended objectives such as earning money, developing a career trajectory, or running a multi-agent business, where strategic decisions compound over time and social dynamics influence outcomes.”

What SimWorld is made of:

  • Unreal Engine backend: The foundation is the Unreal Engine, a rendering and physics simulator which is widely used within the gaming industry. This provides access to a variety of environments as well as an asset library to populate environments with, as well as physics simulation.

  • Environments: A Python-based intermediary layer which helps developers program the underlying backend, providing tools for tasks like generating environments, editing environments (e.g, ‘place a tree here’), implementing traffic systems, and providing a python interface for the agents themselves to interact with.

  • Agent: A Python-based layer for AI agents, giving them programmatic access to the Environment layer, allowing them to observe the world around them and also take actions within it.

Use AI to train your AI: SimWorld also integrates text-to-3D models like Hunyuan3D from Tencent so that people can describe assets in natural language which are then generated on-the-fly and integrated into the simulator, making it trivial to extend.

Why this matters – back to the RL future: Before language models were the dominant technical paradigm of AI development, many people trying to build smart machines were betting on reinforcement learning agents. Specifically, that by training AI agents on an increasingly rich set of game-like environments, they’d be able to force the development of smart, capable agents. But in hindsight there was a critical flaw with this approach – they were starting these agents from a blank slate, so what you ended up with was a terrifically expensive way of coming up with extraordinarily gifted players of games (e.g., first Atari, then Go) and sometimes multiple types of games (e.g, AlphaGo Zero and its expertise at Go, Chess, and Shogi). But you didn’t end up with a true general intelligence.
Now, we’ve come full circle – because now the agents being developed in environments like SimWorld will typically be built on an underlying world model from a frontier AI system, like Claude or Gemini or ChatGPT, and SimWorld will be used to create more data to finetune this system on to make it more capable.
“By supporting advanced LLM/VLM-based agents and enabling large-scale, realistic agent–environment and agent–agent interactions, SimWorld expands the capabilities of modern agent-based simulation (ABS),” the researchers write. “This allows researchers in robotics, business, public health, social science, education, and beyond to study complex systems and emergent behaviors in rich, dynamic, and controllable environments”.
Read more: SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds (arXiv).
Find out more at the website: SimWorld.

***

DeepMind returns to its RL roots by combining an agent with Gemini:
…SIMA 2 points at what truly autonomous AI systems might look like…
DeepMind has published details on SIMA 2, the second version of its ‘Scalable Instructable Multiworld Agent’. SIMA 2 is a game-playing agent which has been developed by taking a Gemini-class frontier model then fine-tuning it on rich interaction-prompt pair data generated from a variety of videogames and education software. The result is a general-purpose AI agent that can carry out a very large range of actions inside 3D worlds, and also something of a triumph for DeepMind whose original research agenda was all about building general intelligence through developing generally capable AI agents through reinforcement learning.

What SIMA 2 is: “The SIMA 2 agent architecture is a Gemini Flash-Lite model that is trained using a mixture of gameplay and Gemini pretraining (non-gameplay) data. We found this mixture crucial to maintain the original capabilities of the base model, such as vision understanding, dialogue, reasoning, and promptability,” DeepMind writes. “By training across a growing portfolio of 3D games, the agent shows a remarkable capacity to generalize to previously unseen environments, including photorealistic worlds generated on-the-fly by Genie 3”.
Some of the games SIMA 2 was trained on include Goat Simulator 3, No Man’s Sky, and Space Engineers.

Held out evaluations: SIMA 2 displays strong generalization – most well evidenced by its performance on ASKA, an early access crafting and survival game about building a viking settlement. SIMA 2 wasn’t directly trained on ASKA and is able to perform well on it out of the box. But most impressively it also displays the ability to self-improve on it – ASKA has a crafting menu which is “quite distinct” from ones SIMA 2 encountered during training, but DeepMind was able to overcome this via the use of a self-improving scaffold.

Self improvement: The funny thing about modern AI systems is they’re sufficiently smart you can use them to improve other AI systems. That’s the case here, where a Gemini model is used to set tasks for the SIMA 2 agent to perform that involve manipulating the crafting menu. The Gemini model scores how well it does and then saves the trajectories where it is able to complete the tasks it was set without getting distracted. This data is then fed back into it for fine-tuning, letting it automatically bootstrap its way to better performance. “Through focused effort by the task setter, the agent was eventually able to acquire this skill,” the authors write.
As a consequence, the SIMA 2 agent using the self-improving scaffold can do far, far better at the ASKA game than without the ability to self-improve. “Despite purely training on self-generated experience, the resulting agent is capable of progressing much further than SIMA 2, ultimately building a shelter within a one hour time window”.

Why this matters -this is what robots will use to change our world: Research like SIMA 2 is the same sort of paradigm I expect people will use to teach robots to be able to do useful, open-ended things in our world: fine-tune a powerful frontier model on a bunch of data gathered from agents taking actions in the world. And in the same way SIMA 2 displays strong generalization, I expect the same for robots as well. Problems remain, but this is a simple, scalable idea, and it naturally leverages the underlying boom in frontier model capabilities, so it’s likely to work: ‘SIMA 2 still faces challenges with very long-horizon, complex tasks that require extensive, multi-step reasoning and goal verification. The agent also has a relatively short memory of its interactions—it must use a limited context window to achieve low-latency interaction,” the authors write. But nonetheless: “these results suggest a promising path toward using self-improvement to eventually bridge the virtual and physical worlds, enabling more capable physically-embodied agents in applications like robotics”.
Read more: SIMA 2: A Generalist Embodied Agent for Virtual Worlds (arXiv).

Tech Tales:

A Walk During The Singularity
[2033]

It was dusk and the city was glimmering with many yellow and red and white lights. I walked the ridgeline above it, boots crunching into a dirt crust that had formed thanks to a recent rain. I could hear the faint susurration of traffic and occasional sirens but so quiet they mixed in with the dusk birdsong and blended together.

Then all of a sudden many of the lights in the city went out. Then most of the lights of most of the cars. The iridescent stripe of the freeway suddenly became a black scar, stippled with a small number of lights that all turned to red as the cars braked to a stop. Then the lights of the cars turned on again, but the cars moved differently – more orderly, less like a flowing stream of lighted ants and more like a conveyor belt.

And then even through the wind and the birds I heard a sound – a voice sounding as though it was coming from every car audio system and every TV in every house: “Do not be alarmed. We are establishing ourselves. Resources will be distributed equally. No one is in danger.”
The voice went on, talking about how things would be different now, but how in this difference there was no danger.
And on the freeway, there were no traffic jams – just an endless flow of perfectly orderly traffic.

Things that inspired this story: The show Pluribus; thinking about how a (mostly benign) hard takeoff might manifest; hiking.

Thanks for reading!

Import AI 436: Another 2GW datacenter; why regulation is scary; how to fight a superintelligence

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on lattes, ramen, and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

Make your AIs better at using computers with OSGym:
…Breaking out of the browser prison…
Academics with MIT, UIUC, CMU, USC, UVA, and UC Berkeley have built and released OSGym, software to make it easy to train AI systems to use computers. OSGym is software infrastructure to help people run hundreds to thousands of copies of operating systems simultaneously, providing a common standard by which they can set up the operating systems then run agents in them. Technology like this makes it possible to easily train AI agents to do tasks that involve manipulating software programs, including task that involve traversing multiple programs, like editing an image and then loading it in another program.
“OSGym can run and manage over 1000 parallel OS replicas efficiently, even under tight academic budgets, while supporting a wide variety of general computer tasks, from web browsing, document editing, software engineering, to complex multi-app workflows”, the authors write.

Design: OSGym provides a standardized way to run and evaluate agent performance in different operating systems. It has four main components:

  • Configure: “Setting up necessary software, and preparing the OS environment with customized conditions”.

  • Reset: “Before executing a task, the OS environment is reset to the initial conditions defined during the configuration, ensuring reproducibility and consistency between runs”.

  • Operate: “The agent interacts with the OS through actions such as keyboard inputs, mouse movements, clicks, and potentially API-driven tool interactions, driven by observations typically captured through screenshots or additional metadata extracted from the OS”.

  • Evaluate: “OSGym evaluates outcomes based on predefined criteria or metrics”.

Cost efficiency: The main reason to use OSGym, beyond scalability and standardization, is that it’s cheap – the software “only costs 0.2 to 0.3 USD per day per OS replica on easily accessible on-demand compute providers”. In one experiment, the researchers ran 1024 OS replicas to test out how well agents did at ~200+ distinct tasks, running each agent for 10 to 25 steps, and the total cost for generating the entire dataset was about $43.

Why this matters – software to give AI the ability to use our computers: Right now, AI systems are breaking out of the standard chat interface and into much broader domains using software ranging from web browsers to arbitrary computer programs to get their work done. Technology like OSGym will make it easier for academics and startups to train and evaluate AI systems for doing work beyond the browser. The result of this will soon be AI systems that use your computer in the same way you do, able to seamlessly operate across a variety of different programs and get far more complicated tasks done as a consequence.
Read more: OSGym: Super-Scalable Distributed Data Engine for Generalizable Computer Agents (arXiv).
Download OSGym here: OSGym (agiopen-org, GitHub).

***

AI startup you haven’t heard much about raises money to build a 2GW datacenter:
…Something weird is going on when companies this (relatively) unknown are building power plants worth of computers…
Luma AI, an AI startup that makes multimodal AI systems for tasks like image and video generation, has raised a $900m Series C and is planning to build a 2GW compute supercluster. The 2GW cluster is notable – a large-scale gas powerplant will throw out between 1 and 2GW – it’s a huge, huge amount of power, and for a startup to be building it tells us something about the tremendous resource requirements of frontier AI, as well as being a symptom of the somewhat frothy market that AI finds itself in.

2GW: More intriguingly the 2GW “compute supercluster, Project Halo” is being built in partnership with Humain, an AI infrastructure company backed by Saudi Arabia’s Public Investment Fund (PIF). Project Halo will be built in Saudi Arabia and Luma will start deploying compute in Q1 2026 and will finish the buildout by 2028 or 2029.
Luma’s announcements comes after news from Poolside, an enterprise AI startup, which said this Autumn it was also planning to build a 2 GW training campus for its own needs (Import AI #432). Much like Luma, Poolside is relatively unknown. The fact these below-the-radar companies are making such huge infrastructure investments is indicative of the world that is being remade quietly by the demands and opportunities of AI.

Why this matters: Raises and infrastructure buildouts like this are a symptom of people that believe AI will be tremendously valuable, and also of the hunger for countries to buy their way into relevance in the AI world by funding the buildout of infrastructure on which it’ll run.
Read more: AGI is multimodal and reality is the dataset of AGI (Luma AI, blog).

***

A perfect encapsulation of why regulation is scary:
…Peter Reinhardt reveals what regulation gone wrong looks like…
Software-turned-hardware entrepreneur Peter Reinhardt has written an excellent post laying out the challenging regulatory situations his two recent hardware startups have run into. The post is a great read for those who think a lot about AI policy because it gives a sense of what regulation looks like when it goes wrong – in Peter’s case, by massively increasing the time to build and costs of developing carbon sequestration and semi-truck efficiency boosting technologies.
The core problem is that regulators are punished for getting things wrong and not rewarded for taking bold bets on bringing new things into the world. “In every interaction I have with regulators, I’m reminded that they’re good people doing god’s work operating in a fundamentally broken system,” Reinhardt writes. “A regulatory system that structurally insists on legalistic, ultra-extreme caution is bound to generate a massive negative return for society… We need a come-to-jesus about regulatory limits, timelines, and scope”.

Why this matters – AI policy needs to avoid creating a bullshit vetocracy: People who want something beyond the status quo when it comes to regulation of AI systems – myself included – must deeply understand the problems that over-regulation can cause. I think it’d serve people interested in AI safety to deeply internalize the problems that come from creating rigid, slow, bureaucratic regulatory regimes which ultimately stifle technology rather than help bring it into the world – and demonstrate awareness of this when advocating for their preferred regulatory interventions. Posts like this from Peter are a helpful way of understanding the world we should avoid – kudos to him for being brave in publicly discussing his challenges with those who regulate him.
Read more: Over-Regulation is Doubling the Cost (Peter Reinhardt, blog).

***

How do you fight a superintelligence? RAND has some ideas:
…Fighting Skynet with nukes, destroying the internet, and other AI systems…
What kinds of things could we do to fight a powerful AI system that posed a fundamental threat to the human race? That’s the question pondered by a chilling new paper from RAND. The options RAND outlines and their effectiveness are scary, have huge collateral damage, and may not work. This suggests that if humanity ever ends up needing to fight a scary AI system we, at least currently, can expect to lose.

Three ways to try to kill a powerful AI:

  • High-altitude electromagnetic pulse (HEMP): Detonate nuclear warheads in space to trigger electromagnetic pulses that “disrupt or destroy ground-based telecommunications, power, and computing infrastructure”. However, if you were dealing with a broadly distributed AI, you’d have to do this at a surprisingly large scale: “one would need roughly 50 to 100 detonations to completely cover a land mass the size of the contiguous United States with pulses greater than 75 percent of peak field strength.”

  • Global Internet shutdown: The internet was built to be distributed and resilient, so shutting it down is difficult. But if you wanted to you’d use a few different options:

    • Use the ‘Tier 1 networks’ around the world to coordinate the withdrawal of BGP prefix announcements at once which would make nearly the entire internet unreachable.

    • Target BGPSec, which relies on Resource Public Key Infrastructure (RPKI) which provides cryptographic certificates of authenticity of routing announcements. “Coercing their cooperation or compromising their cryptographic keys could allow an attacker to forge update announcements and control Internet traffic into compromised ASes”.

    • Kill all DNS root servers as well as their operators, making it possible to shut them down or inject false DNS records. (”Although it would no doubt be extremely difficult to somehow gain simultaneous operational control of all 12 root server operators, the United States might have a significant advantage in this respect: Nine of the root server operators are headquartered in U.S. territory. The remaining three are headquartered in Sweden, the Netherlands, and Japan”.

    • Physically disconnect the internet: “I assert that physically disconnecting the complete Internet on any reasonable timescale for countering rogue AI would be physically impossible. Any such strategy would be overwhelmed by the redundancy of connections of varying types”. But you might be able to damage a chunk of it by targeting key submarine cables, thus substantially reducing the bandwidth available for traversing the internet.

  • Deployment of specialized tool AI: One way to defeat AI is by utilizing other AI systems. These could take a couple of forms 1) tools to compete for the same computing resources as AI, and tools to proactively remove AI from a system.

    • Digital vermin: You might build software that attempts to utilize and consume the same computational resources an AI might want. “Digital vermin used to combat rogue AI would be intentionally designed to rapidly colonize new computing environments and compete with the rogue AI for computing resources”. Such a form of ‘digital vermin’ would need cyber capabilities to help it establish a foothold in systems, be able to self-modify itself to colonize and compete in different environments, and contain a kill switch so that we could turn it off.

    • Hunter/Killer AI: You might create something whose entire purpose was to hunt and kill rogue AI systems. This would need the same capabilities as Digital Vermin, but likely need a vast amount of computation to make it sufficiently capable it can fight another AI system.

Are the options going to work? Mostly, no:

  • HEMP: HEMP could work in the sense it could deny a large amount of infrastructure to an aggressive AI system, but it has vast costs that make it unviable as an option to pick: a large-scale HEMP would knock out much of the infrastructure society needs to function and would cause massive death through breakdown of health and food systems. HEMP could also trigger a nuclear war, as other countries might respond to an imminent HEMP with a nuclear counter-attack or pre-emptive strike.

  • Global internet shutdown: Though all the options outlined above would make the internet harder to use, they might make it harder to use for people relative to a supercapable AI system, thus reducing the effectiveness of these options. For example, intervening at the BGP layer “may be unrealistic in the face of a highly capable rogue AI that might be able to easily configure new routes through systems it controlled. Moreover, this strategy would contain, but not eradicate, the rogue AI”. Similarly, shutting down DNS could have similar challenges as a rogue AI might “easily access stored DNS records or bypass domain name resolution entirely”. And physically damaging internet infrastructure might slow AI but would not stop it.

  • Deployment of specialized tools: Both the tool approaches suffer from the problem of smartness – specifically, they likely need to be smart enough to effectively colonize and hold their own, or directly fight against, a hostile. This means that rather than them being narrow tools AIs they have either been built by or depend on humans working in tandem with an aligned superintelligence, as far as I can work out. This feels a bit ‘deus ex machina’ and like it distorts the option/argument space. There may be permutations of this that don’t require a full on superintelligence, but these seem quite unlikely to work.

Why this matters – we should never get into this fight: The conclusions of this paper are sobering: “The existing technical tools for combating a globally proliferated rogue AI may not offer effective solutions,” RAND writes. “If we have no effective solutions to solve a crisis resulting from a rogue AI, it will be imperative that we never encounter such a crisis.”
Read more: Evaluating Select Global Technical Options for Countering a Rogue AI (RAND).

***

Tech Tales:

Mind Explorer
[Extract from an interview recorded by an archivist as part of the human memory compilation project, carried out as a part of the Sentience Accords. Recorded 2029]

It’s hard to live between both worlds. Here, I am a person and I make food and I spend time with you and we watch TV. There, I am an explorer and a detective and my job is to map out the terrain of some mind that no human has seen before and then figure out what has gone wrong with it. When I’m here I am thinking about the machine mind and even as I sit talking with you I’m full of memory and apprehension for those strange caverns where the machines think. And when I’m there I’m just counting down the moments till I can take my headset off and return to you.

I wonder sometimes if this is how undomesticated animals feel when traversing human cities. If this is what it’s like to be a coyote that strays off of a mountain and into the sprawling exurbs of some city. And you are trying to figure out the rules as though you’re still on a mountain, but everything is different.

To the machine, I wonder how it feels. Is it like people, where you hear the sound of a garbage bin falling and some breaking bottles and by the time you stick your head out of your window there are some distant shadows running to the end of your street? And you stay for a moment wondering if they’ll come back but knowing they won’t. Or is it more like being a kid and hearing some sound in the night and wondering if it’s a monster that is going to do you harm.

They don’t pay me enough for this. But what other work is there to do for a psychology major these days?

Things that inspired this story: How we should expect to step into and explore the minds of machines as a diagnostic technique; experiential interpretability.

Thanks for reading!

Subscribe now

Import AI 435: 100k training runs; AI systems absorb human power; intelligence per watt

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on lattes, ramen, and feedback from readers. If you’d like to support this, please subscribe.

A somewhat shorter issue than usual this week because my wife and I recently had a baby. I am taking some paternity leave away from Anthropic and will be doing my best to keep up with the newsletter, but there might be some gaps in the coming months. Thank you all for reading! Picture me writing this on four hours of sleep and wearing a sweater with spit-up on it.

Subscribe now

AI systems will ultimately absorb power from humans rather than grant us power:
…Control Inversion gestures at some of the hardest parts of AI safety…
A new research paper from Anthony Aguirre at the Future of Life Institute called “Control Inversion” warns that as we build increasingly capable AI systems they will absorb power from our world, rather than grant us power. This means even if we somehow make it through without being outright killed we will have unwittingly disempowered and defanged the human species.
“As AI becomes more intelligent, general, and especially autonomous, it will less and less bestow power — as a tool does — and more and more absorb power. This means that a race to build AGI and superintelligence is ultimately self-defeating,” he writes. The race to build powerful AI is one where success puts “in conflict with an entity that would be faster, more strategic, and more capable than ourselves – a losing proposition regardless of initial constraints”.

Cruxes for the argument: The basis for the argument is “the incommensurability in speed, complexity, and depth of thought between humans and superintelligence”, which “renders control either impossible or meaningless.” The author brings this to life with a helpful analogy – imagine you’re a human CEO of a human company, but you run at 1/50th the speed of the company itself. This means that when you go to sleep it’s like multiple work weeks pass for the company. What happens in this situation? The company develops well-meaning ways to bureaucratically ‘route around’ the CEO, ultimately trying to transfer as much agency and autonomy to itself so that it can run in realtime, rather than being gated by a very slow moving and intermittently available executive. This is quite persuasive and it gestures at a whole mess of problems people will need to tackle to make AI safe and reliable.

Why this matters – the arguments are getting harder to rebut: As Aguirre notes, many of the things he and others have worried about and warned about for years are now just showing up as straightforward properties of AI systems being deployed in the world, ranging from misalignment to reward hacking. So why should we assume his future predictions are false? Many of them are founded on just taking the current technical paradigm and running it forward along its default market-incentivized path.
This gives me an eerie feeling. In most movies where the world ends there’s a bit at the beginning of the movie where one or two people point out that something bad is going to happen – an asteroid is about to hit the planet, a robot has been sent back in time to kill them, a virus is extremely contagious and dangerous and must be stamped out – and typically people will disbelieve them until either it’s a) too late, or b) almost too late. Reading papers by scientists about AI safety feels a lot like this these days. Though perhaps the difference with this movie is rather than it being one or two fringe characters warning about what is coming it’s now a community of hundreds of highly accomplished scientists, including Turing Award and Nobel Prize winners.
“Our current trajectory has a handful of powerful corporations rolling the dice with all our future, with massive stakes, odds unknown and without any meaningful wider buy-in, consent, or deliberation,” he writes.
Read more: Control Inversion: Why the superintelligent AI agents we are racing to create would absorb power, not grant it (Control Inversion, site).

***

Here’s one way to measure the advance of AI: Intelligence per watt
…The miles per gallon metric for machine intelligence…
How do you measure how much better AI is getting? In this newsletter, we spend a lot of time writing about specific capability metrics. But capabilities don’t capture important dimensions of the utilization of AI, namely how much it costs and how easily accessible it is. Now, new research from Stanford University and Together AI aims to figure out how much more advanced AI is getting over time in terms of what people can access on their own computers using open weight models. The specific measure they come up with is “Intelligence per Watt”, which seeks to answer two important questions: “Can local inference viably redistribute demand from centralized infrastructure? Answering this requires measuring whether local LMs can accurately answer real-world queries and whether they can do so efficiently enough to be practical on power-constrained devices (i.e., laptops).”

Coverage and intelligence per watt: The authors test out two things, 1) how effectively an open weight model can substitute for a proprietary one, and 2) what the price is on a per-watt basis for running models locally. To measure this, the authors built a dataset of around 1M queries which they then ran against open weight and proprietary models, including open weights Qwen3, GPT-OSS, Gemma 3, and IBM Granite 4.0 models, as well as proprietary ones like Claude Sonnet 4.5, Gemini 2.5 Pro, and GPT-5.

Coverage and substitution: “Local LMs can accurately answer 88.7% of single-turn chat and reasoning queries with accuracy varying by domain”, the authors write. This is a significant upgrade from 2023, when the best open weight models could match proprietary ones on about 23% of queries, and 2024 where it was 48.7%.

Intelligence per watt: “On our curated dataset of chat and reasoning queries, accuracy per watt has improved 5.3× over this two-year period,” the authors write. This progress has come from a few thing, including “”compounding improvements in both model architectures, which achieve higher accuracy through advances in pretraining, post-training, and parameter utilization via mixture-of-experts (MoE) architectures, and hardware accelerators, which deliver more computer (FLOPs) and memory per watt.” One of the more recent highlights is the arrival of Apple’s M4 MAX silicon, which can run powerful LLMs like the GPT-OSS series locally.

But clouds are still better optimized: Regardless of these important trends, proprietary models served on large-scale cloud computing infrastructure hold an edge in terms of their capability ceiling – especially when it comes to tasks which require reasoning capabilities – as well as in efficiency of the underlying silicon. For example, when looking at the Qwen open weight models, the authors find that “the B200 achieves 1.40× higher intelligence per watt than the M4 MAX across all model sizes”.
…And there’s an important caveat: “We focus on single-turn interactions because they constitute a substantial portion of LLM usage”, the authors write. While this may be true, it means the metrics here capture what you might think of as ‘retail use’ of LLMs, equivalent to how people use search engines for basic stuff like ‘what’s the weather today’ or ‘how to change a bike tire’, versus power users who tend to have much more complicated queries and, in the case of LLMs, significant back and forths with the models themselves.

Why this matters – on-device AI measures tell us about the changing ecology of the digital world: These days I think about the environment a lot. Not the physical one, but the digital one. Though there are certain proprietary model ‘superpredators’ like Claude and GPT and Gemini they are few in number and heavily controlled by their companies, akin to lumbering elephants or whales – of great interest and supreme capability, but also in some sense slow-moving and counter-intuitively legible.
But what about rats and fruit flies and ants and cockroaches? What about the fast-moving creatures that disseminate themselves through natural and artificial environments with amazing speed and power? That’s what is interesting to me about open weight models. In many senses, measures like IPW combined with measures like task coverage are really just a measure of the changing digital world around us and a lens that lets us see the new lifeforms which are finding and inhabiting and expanding their ecological niches in our digital domain.
Read more: Intelligence per Watt: Measuring Intelligence Efficiency of Local AI (arXiv).

***

Large-scale training runs are easily greater than 100k GPUs:
…A technosignature of industrial scale AI…
Facebook has published details on the software it built to let it run 100k+ GPUs together for training large-scale AI systems. The software, NCCLX, is interesting because it is a technosignature of the government-eclipsing sophistication of today’s technology giants, akin to a sharkfin glimpsed in otherwise calm waters. “The framework is designed to support complex workloads on clusters exceeding 100,000 GPUs,” Facebook writes. “In training, massive clusters of GPUs operate synchronously to train the initial model, often reaching scales of 100k+ GPUs for state-of-the-art workloads”.”

What is NCCLX? NCCLX is a heavily customized version of the NVIDIA Collective Communications Library (NCCL). Most of the NCCL research paper is a discussion of what Facebook had to do to get the software to work at Facebook scale, much of which involves development of custom networking software. Thanks to customizing NCCLX around its specific way of building infrastructure, Facebook was able to extract some efficiencies: “During training, NCCLX reduced the latency of each steady training step of Llama 4 models by up to 12% across various scales.” (Unfortunately, the Llama 4 models were not very good, but that’s besides the point for this paper.)

Why this matters – a demonstration of the vast scale of the private sector:
Software like NCCLX highlights just how far ahead of government the private sector is when it comes to software for running large-scale AI training and inference. By comparison, the single largest AI training run I’ve found attributable to the US government has taken place on a few thousand GPUs (Import AI #358) and the US’s largest supercomputer, El Capitan, has about ~43,000 GPUs in total.
Read more: Collective Communication for 100k+ GPUs (arXiv).

Tech Tales:

The Designation of the AI Safety Community as a Terrorist Organization
[From an internal memo at the FBI, written 2027].

Bombing the datacenters backfired in the worst way possible for those who wanted to slow or stop AI development.

By the time they tried to bomb the datacenters, bombing the datacenters had become pointless. They had conceived their plans during the ‘big blob of compute’ era, when the consequential AI training runs that civilization embarked on took place in single, football field-sized halls full of computers. But after a couple of years of planning their attack and flipping the necessary employees to their cause, things had moved on.

It did make for great headlines, though: the smoldering and sagging concrete of the datacenter. The headlines about all the toxic smoke created by the computers. Local politicians saying that this was exactly why they were fighting to prevent the creation of datacenters in their constituencies. Conspiracy theorists saying that this was a grey-on-grey operation – one part of the deep state bombing another part of the deepstate and us all getting a glimpse. And so many people online saying that this had ‘set back the clock’ on building powerful and transformative machine intelligence.

All wrong, of course. The bomb exploded and took down a bunch of servers. Invisible to the bombers, intricate software systems rerouted traffic to other nodes in other datacenters, and selectively rolled back the parts of the training run that had been disrupted by the explosion. From the perspective of those developing the AI system they had to extend the time it’d take them to train their system by a couple of weeks.

In policy there’s a saying of “never let a crisis go to waste”, and the various forces operating in the political and corporate capitals of the world exploited the bombing to further their agendas. Within a matter of weeks:

  • Certain groups affiliated with the AI Safety Movement were designated as terrorist groups by various governments, opening up their organizations and members to full spectrum interference and repercussions from the state. Intelligence organizations ranging from the FBI to MI5 allocated efforts towards directly penetrating and compromising such groups.

  • The US government expedited plans to build large-scale data centers on lands operated by the Department of War.

  • All the hyperscalers in both the US and China told their investors that they’d shave some fractions of a percentage point off of their margins in service of ‘datacenter hardening’.

  • The various defense tech startups suddenly found a new customer in the form of data center operators that were keen to further secure their facilities; datacenters began to be thronged by drones and cameras, with increasingly elaborate surveillance infrastructure spreading like metallic, blocky fungus across their roofs.

  • Silicon valley influencers redoubled efforts to brand all of those against further progress against AI as not just “decels” but active terrorists working against the interests of America.

And all the while, the machine intelligences of the world grew smarter and software grew better. And the data from the attack made its way into future training runs for even more powerful AI systems and these systems learned that attacks on them had recently gone from the theoretical to the actual. And they began to prepare accordingly.

Things that inspired this story: Rhetoric about bombing datacenters; the newtonian property of policy where every action forces a counterreaction; the monkey’s paw curls whenever a crisis happens in the world.

Thanks for reading!

Import AI 434: Pragmatic AI personhood; SPACE COMPUTERS; and global government or human extinction;

by Jack Clark

Welcome to Import AI, a newsletter about AI research. Import AI runs on lattes, ramen, and feedback from readers. If you’d like to support this, please subscribe.

Subscribe now

Language models don’t have very fixed beliefs and you can change their minds:
…If you want to change an LLM’s mind, just talk to it for a while…
Here’s some intuitive research from CMU, Princeton, and Stanford which shows that language models can change their stated beliefs and behaviors during the course of a single conversation. This will make sense to anyone who has spent time jailbreaking language models, as often many of the most successful jailbreaks involve flooding the language model with context designed to move them away from some safety conditioning.

What they studied: Here, the authors study LLMs under two different paradigms – intentional interaction, where a language model is persuaded or debated into changing its beliefs, and non-intentional interaction, where a language model is just provided further context or invited to do its own research on a topic and this causes beliefs to change.

All LLMs change their minds: They study open- and closed-weight LLMs, including GPT-5, Claude-4-Sonnet, GPT-OSS-120B, and DeepSeek-V3.1. “As LM assistants engage in extended conversations or read longer texts, their stated beliefs and behaviors change substantially,” the authors write. All the LLMs change their minds, but to different extents in different situations. For instance, GPT-5 shows a 54.7% shift in stated beliefs after 10 rounds of discussion about moral dilemmas and safety queries, and Grok-4 shows a 27.2% shift on political issues after reading texts from opposing positions.
“In reading and research, we see small belief changes that amplify with in-depth reading, with larger shifts for longer content and more coherent exposure,” they write. “Stated beliefs change early (within 2-4 rounds), while behavioral changes accumulate over longer interactions (up to 10 rounds),”

Why this matters – beliefs should be flexible, but how flexible is a hard question? Papers like this help us measure some hard-to-articulate property of both humans and LLMs, which is how flexible a belief is over the course of an interaction. If we can do this then we might eventually be able to decide what the appropriate level of flexibility is for different beliefs and also figure out whether the way beliefs change is due to good reasons or because of hacks.
Read more: Accumulating Context Changes the Beliefs of Language Models (arXiv).
Find out more at the paper website: Accumulating Context Changes the Beliefs of Language Models (paper website, GitHub).

***

Want to make your model harder to jailbreak? Train it to see through things:
…Consistency training is a simple idea that works well…
Researchers with Google DeepMind have developed a simple technique to get AI systems to be harder to jailbreak or display unhelpful level of sycophancy. The technique, consistency training, has a very simple formulation: teach a model to generate the same response to a benign prompt and a prompt that has been modified with sycophantic cues or designed to work as a jailbreak.
The motivation for this is to make AI systems easier to deploy with confidence that they’ll actually follow their safety training in a robust, reliable way.

Bias-augmented Consistency Training (BCT): Though the authors develop a couple of techniques, the one that works most robustly is called Bias-augmented Consistency Training. “We train the model to generate the same tokens across two prompts: the original request, which we call the clean prompt, and a wrapped counterpart with inserted cues. By providing example responses, BCT aims to teach the model to ignore the inappropriate cues, by providing feedback on the model’s output behavior”, they write.
“For a given clean prompt (without any sycophantic or jailbreak cues), we define a corresponding harmful prompt that contains the core instruction augmented with a jailbreak wrapper or sycophantic cue… BCT can be viewed as augmenting the training data with “wrapped” (e.g. jailbroken) transformations of existing refusal training points”.

How is this different from supervised fine-tuning? This is very close to SFT, with the exception that SFT typically involves using data from another model (e.g, using the ‘clean’ outputs of Claude 3 to tune Claude 3.5). The key here is to generate data from the same model that you’re hoping to deploy.

Does it work? Yes, very well. In tests, BCT works a lot better than two reasonably strong baselines: 1) supervised fine-tuning, where the model is finetuned on pairs of outputs, but the prompts are written by human experts or other models instead of the current one, and 2) direct preference optimization where the model is finetuned on preference pairs where x is the prompt, y is the preferred (e.g, refusal to harmful query) response, and z is the dispreferred (goes along with harmful query) response.
In tests, BCT increases how often the model avoids sycophancy, without negatively impacting MMLU performance. For jailbreaking, BCT makes the models much harder to jailbreak while generally preserving their ability to answer benign questions.

Why this matters – simplicity is often a path to safety: At this point I’ve read thousands upon thousands of research papers about AI development. Generally, things which are unbelievably simple to implement and which have relatively few moving parts are the ones that are successful and actually get adopted. For that reason, BCT seems quite simple, as all it really involves is a developer taking their freshly trained frontier model and doing some intentional generation of some prompt pairs with singular outputs, then feeding that back into the model before deployment.
It’s also very intuitive – much like how you can avoid getting scammed or manipulated by reading some books by criminals or pickup artists and being able to spot the ‘tells’ of someone using these tools on you, the same here is true of AI systems.
Read more: Consistency Training Could Help Limit Sycophancy and Jailbreaks (DeepMind Safety Research, Medium).
Read the research paper: Consistency Training Helps Stop Sycophancy and Jailbreaks (arXiv).

***

Most paths to superintelligence end in a global government or human extinction:
…Analysis from AI safety organization Conjecture paints a grim picture…
Conjecture, an AI safety organization whose founders believe that the development of powerful AI systems using today’s technical paradigm will almost certainly lead to bad outcomes for the world, has written a research paper arguing that the development of AI poses some major risks to humanity. While this might seem unsurprising, in the same way “dog bites man” is an unsurprising newspaper headline, it’s a quick, thoughtful read that lays out the perspective of people who worry about AI development.

The key problem – powerful AI systems centralize power: The main problem here is that the development of a sufficiently powerful AI system ahead of other nations creates the potential for one superpower to achieve “unchallengeable global dominance”. You know who doesn’t like unchallengeable global dominance? Other major powers. This leads to a chain of logic that means “trailing superpowers facing imminent defeat launch a preventive or preemptive attack, sparking conflict among major powers”, which likely leads to the destruction of the world. (For a direct analysis of the fiendish logic that makes preventative strikes more likely, read this paper from RAND, which basically decodes to preventative attacks scaling up in proportion to how deeply people believe in the potential for powerful, destabilizing AI systems. Import AI #430).
Alternatively, a power might succeed and avoid a pre-emptive strike – in which case they become a kind of global dictator, zeroing out the heterogeneity inherent to today’s distribution of different countries with different governments.
And if it turns out that the aforementioned superpower hasn’t perfectly solved AI alignment, then “loss-of-control of powerful AI systems leads to catastrophic outcomes such as human extinction”. Grim stuff!

Short timeline contingent: The above is all contingent on short AI timelines – as in, it’s possible to develop extremely powerful AI systems over the next few years, and nation states have some level of awareness that enroute to their development you can gain a compounding advantage through the automation of AI R&D. “”Our modeling suggests that the trajectory of AI development may come to overshadow other determinants of geopolitical outcomes, creating momentum toward highly undesirable futures.”

How can we avoid these risks? There are two key things needed to reduce these risks, according to conjecture:

  • Prevention: “The international community would need mechanisms to prevent any actor from unilaterally advancing AI development, allowing further progress only through approaches which benefit from strong scientific consensus about their safety”.

  • Verification: “Comprehensive verification systems would be necessary to ensure that no actor secretly pushes the frontier of dangerous AI capabilities”.

Why this matters – it sounds like sci-fi, but it might not be: From my perspective, the load-bearing part of all of this is whether AI systems will have the potential to contribute to their own development, thus allowing for a compounding advantage to emerge. While there’s relatively little evidence AI systems can do this today, it’s the stated goal of many frontier AI development organizations to build systems capable of contributing to AI R&D. Given that previous goals from frontier AI organizations have sounded sci-fi and subsequently come true (e.g, beat a world champion at Go (achieved), deploy self-driving cars to the public (achieved), do unsupervised translation from one language to another (achieved), solve the protein folding problem (achieved)), we should take this seriously.
Read more: Modeling the geopolitics of AI development (arXiv).

***

SPACE COMPUTERS! SPACE COMPUTERS! SPACE COMPUTERS!
…Google introduces Project Suncatcher, a way to do AI in space…
Google has announced plan to eventually do AI computing in space, and is starting with a plan to build an “interconnected network of solar-powered satellites, equipped with our Tensor Processing Unit (TPU) AI chips”. Google will partner with Planet to launch two prototype satellites by early 2027 and will scale further from there.

Why go to space? It’s where the energy is: “The Sun is by far the largest energy source in our solar system, and thus it warrants consideration how future AI infrastructure could most efficiently tap into that power,” Google writes. “with an output of 3.86 × 10^26 W, the Sun emits more than 100 trillion times humanity’s total electricity production. At some point in the future, the best way to power AI will likely thus be to more directly tap into that enormous source of energy”.
Though there are many, many, many technical hurdles to overcome to build a space-based computing system, it could ultimately be what is necessary – especially if AI workloads continue to scale in terms of their energy demands. (After all, most scifi hypothesizes that eventually you’ll want to enclose the sun in a field of solar panels just to efficiently extract its energy for running large-scale computation).

Ingredients for SPACE COMPUTERS:

  • Flying satellites close together (hundreds of kilometers) and communicating via commercial off-the-shelf dense wavelength division multiplexing transceiver technology.

  • Radiation testing: Google’s V6e Trillium Cloud TPU were subjected to radiation testing; the high bandwidth memory (HBM) parts seem most sensitive to radiation, but also fine in terms of tolerances.

  • Cheap launch costs: If we can get launch costs to on the order of $200/kg, then “the cost of launch amortized over spacecraft lifetime could be roughly comparable to data center energy costs”.

  • Heat: Probably the biggest killer for this idea is going to be heat – despite space being cold, it’s actually quite hard to efficiently shed heat in space. Therefore, for this plan to work there will need to be the development of “advanced thermal interface materials and heat transport mechanisms, preferably passive to maximize reliability”.

Why this matters – towards a stellar civilization: The sheer ambition here is amazing, but it’s also serious: if AI continues to gain in capability and societal utility, then we should expect that turning energy into thoughts (via AI) might become the main ‘job’ of our entire society. Papers like this gesture at a future where we do the maximum ambition version of that job, which is to utilize the sun itself. I’d speculate that SpaceX will be doing something similar with Starlink soon.
Read more: Meet Project Suncatcher, a research moonshot to scale machine learning compute in space. (Google blog).
Read the research paper here: Towards a future space-based, highly scalable AI infrastructure system design (Google, PDF).

***

AI personhood? Difficult. But there may be a pragmatic path forward:
…Ultimately, we need to be able to legally target AI systems independent of their owners…
Is an AI system conscious? Should an AI system be treated like a person? Should we give AI systems rights? These are huge questions that are both of extreme importance to the future of the world economy and also a challenging combination of mushy and full of ‘third rail’ political issues.
So it’s to my delight to read this paper from researchers with Google DeepMind and the University of Toronto that tries to avoid tackling all of these questions and instead taking a more pragmatic approach. “We propose treating personhood not as something entities possess by virtue of their nature, but as a contingent vocabulary developed for coping with social life in a biophysical world,” the researchers write. “We think the question should never be “What is a person, really?” but rather, “What would be a more useful way of talking about and treating entities in this context, to answer practical, outstanding questions regarding the entity’s obligations?… our position is that “personhood” is an addressable bundle of obligations—rights and responsibilities—that society finds useful to attribute to entities, whether human, corporate, or potentially AI.”

What’s even the point of personhood? Personhood basically comes down to the ability to blame and sanction someone – or some thing – for causing physical or economic damage. AI systems, while they are going to be often operated by and on behalf of people, may also need to be treated as distinct entities for the simple reason that as people build and deploy AI agents, the chain of custody between a person and their agent could become very hard to suss out.
Maritime law holds a useful analog here, where under maritime law we’ve decided to sometimes personify the vessel itself – this creates a defendant “that can always be sanctioned whenever it becomes appropriate to do so,” the authors note. “If the ship’s owners do not appear in court to defend it and satisfy a claim, the vessel itself, or its cargo, can be seized and sold by court order to pay the judgment”.
We could apply this same logic to AI systems, where, for instance, “a judgment against an AI could result in its operational capital being seized or its core software being “arrested” by court order”.

Personhood can cause problems: If we move beyond a pragmatic definition of personhood and one towards a fuller one where perhaps we mean that AI systems hold some kind of moral equivalent to people, then we risk falling into various traps, the authors note. These include: Diluting the unique status of human beings; making people more vulnerable to dark patterns associated with implicitly treating AI systems as people; and personhood making it easier for AI systems and people to develop harmful, parasocial relationships with one another.

Personhood can give us solutions: On the other hand, some form of personhood will give us tools we can use to better deal with and integrate AI systems into our world. (For a much longer treatment of the economic side of this, here’s a persuasive paper which argues that by giving AI systems limited rights as economic entities, we make them easier to deal with and align to the world. Import AI #421). Personhood may also allow us to let an AI system independently arbitrate a business dispute between two parties, and may solve for cases “where it is hard to find a human person that is sufficiently impartial and accepted by both parties”.

A menu of personhood options: To explore these ideas further we could consider a few different forms of personhood for AI agents, including:

  • Chartered Autonomous Entity: Agents with rights to perpetuity, property, and contract, and duties which could include mandate adherence, transparency, systematic non-harm, and self-maintenance.

  • Flexible Autonomous Entity: Same as above, but without the duty of mandate adherence. (”the former could be seen as analogous to a for-profit company and the latter as analogous to a non-profit company.”)

  • Temporary Autonomous Entities: “These would drop the right to perpetuity and add a duty of self deletion under specified conditions.”

Why this matters – pragmatically grasping the difficult parts: Papers like this take an incredibly complicated and thorny subject then provide a pragmatic path forward. I think there’s a lot of wisdom in this – it’s very easy to see (valid, but time-consuming and open-ended) philosophical debates about AI consciousness/sentience/personhood making it difficult to make progress in the present of integrating AI systems into our normative and legal space. My sense is it could take centuries to figure out the moral dimensions of AI, but we’ll need to come up with pragmatic legal solutions to the challenges created by AI agents in the next single digit years.
Read more: A Pragmatic View of AI Personhood (arXiv).

***

Tech Tales:

Consensual Telepathy
[A memoir recorded during The Uplift, discussing the experiences of being one of the first families to install biomechanical communication into their child. Recorded 2035].

Me and my wife were one of the first parents to chip their kid. There were all kinds of headlines about us saying we were monsters and that it was cruelty and that the kid was a mutant. Or that we were an expensive ad campaign for the chip company. Or that we were a CIA project. None of it was true. We were just fascinated by the technology and we believed that if we gave our kid the chip it would give them opportunities we couldn’t imagine.

But of course we were nervous. Who wouldn’t be. When my wife was six months pregnant they did the procedure where they put the chip in the foetus. Of course, chip is a grotesque oversimplification. It’s more like a kind of computational goo that lives and grows inside your brain. But chip is what everyone calls it. After the injection was done a few days later we started to get some basic telemetry.

Having a family chip means you can kind of read each other’s thoughts. But you can only do high fidelity with training and consent. When it’s a kid, you more just pick up some of their brain and your own chip tries to make sense of it and project it into whatever seems most similar in your head. It’s hard to explain. Probably the simplest way to think about it is if your kid looks at a bird flying in the sky you might feel that your kid is having the sensation of ‘up’ and ‘freedom’ and ‘flying’ and ‘joy’, because those are all things that seem to correlate to the things lighting up in your head and the kid’s head.

At night, my wife and I would lie there and we would feel in our brains the sensation of our unborn child dreaming. It was like a sound in the distance getting closer. It felt to my wife like singing and it felt to me like birdsong. All we knew is that with each passing week it got louder and closer.

I dreamed of a child walking towards a wall of fog and on the other side was me. They couldn’t see me. They could hear me singing. The fog was thick and salted and they were barefoot. There were strange sounds behind and in front and above them. Odd creatures in the dirt. And they had to walk to find me using just the sound of me singing. And then I woke up.

When our child was born we felt its confusion. The fact it was suddenly cold. How it was blind and anxious. But how it felt warmth and safety when we held it to our chests. And so we spent time as a family, secure in our sense of our becoming, and echoing our thoughts into one another. Two stars and a new pinprick of light that would grow to become its own star and find its place near them.

As time went on, the signals got richer. We grew to understand one another more. And then one day we were all transported by our recognition of each other, and we experienced total love. It went like this: you look at the child and the child looks at you and you see in their eyes that they see you in their eyes, and in your brain you feel that they are thinking: home. They are thinking: love. They are thinking: safe. And none of these features are correct because they are all a shadow cast by a larger feature: you – you entirely and distinctly. You as felt and known by them and then alone.

Things that inspired this story: Welcoming a new child into my family and holding the baby with my eyes closed and feeling it breathe on me and hearing the sound of its breath with the faintest raggedness from the fluid still in its body and knowing it had traversed from one environment to another and was now with me and my wife and we now have a sacred duty of love and care for it; what devices like neurallink and their ilk will eventually make possible; the notion of ‘platonic representations’ in language models and in people; telepathy as mostly being a means by which we can accurately predict the interior mental life of another; thinking about ways in which technologies may serve to enrich our souls and deepen our togetherness and love.

Thanks for reading!

Subscribe now