Original: Swyx · 10/02/2026
Summary
Curing cancer, designing new materials, and solving energy will require AI that can interface with and predict the physical world.
Key Insights
“Curing cancer, designing new materials, and solving energy will require AI that can interface with and predict the physical world.” — Highlighting the necessity of AI to go beyond language processing for scientific advancements.
“Scientists (LLMs) and simulators (domain models) are different programs within ML research that require distinct talent pools and data infrastructure.” — Explaining the distinct roles and requirements of LLMs and domain-specific models in AI research.
“The capacity for LLMs to accelerate (and eventually replace) scientists cannot be overstated.” — Emphasizing the potential of LLMs to transform the scientific research landscape.
Full Article
Published: 2026-02-10
Source: https://www.latent.space/p/scientist-simulator
Editor: The response to our new AI for Science agenda has been cautiously positive! We’ll also be featuring essays and approachable analysis for AI Engineers, in this dedicated feed — which you can opt in/out of on your account! One struggle we’ve had: approximately NONE of us love the “AI for Science” moniker. We are excited to launch our Science section with Melissa Du, who proposed a useful taxonomy framework for thinking about how money and talent are funneling into 2-3 main approaches… and how they combine in a coherent plan for progress. By coincidence, she introduces many of our upcoming guests on the Science pod!
We’ve witnessed the meteoric rise of LLMs over the past 5 years. Through scale alone, the models have grown from naive stochastic parrots into entities we credit with agency and emotional depth. 66% of physicians use AI in the clinic; 47% of software developers rely on AI coding assistants daily (surprised this number isn’t higher…). 79% of law firms report AI adoption in document review. AI has nearly mastered language and humanity’s digitized knowledge.

When Dario Amodei published Machines of Loving Grace in 2024, he promised that AI would eliminate all bodily and mental ailments, resolve economic inequality, and create material abundance. But these aren’t exclusively language problems. Curing cancer, designing new materials, and solving energy will require AI that can interface with and predict the physical world, not just reason about text. Modern-day AI for science discourse largely conflates progress in language models with progress in scientific modeling more broadly. And while the former will certainly accelerate our capacity to understand the natural world, the field of scientific modeling has had its own tribulations and successes predating the launch of ChatGPT.
Physics is simple, biology is complex
Why are simulators valuable? The core distinction between scientists and simulators, per our definition, is the reliance on text and reasoning as opposed to the reliance on domain-specific foundation models. Reasoning is sufficient when a domain has enough theoretical structure to support chain-of-thought derivation, but when theory is lacking, we require models that can learn directly from the data. The question of when this transition happens points to a deeper tension at the heart of scientific modeling: when can you derive predictions from theory and first principles, and when do you have to build models that pattern-match from empirics? Silicon Valley loves “first-principles thinking,” and, as it happens, so do academics. Everything is derivable from first principles. Chemistry emerges from physics, biology from chemistry, cognition from biology. If you encode the fundamental laws and apply enough computation, everything can be simulated. This is true! But it is unfortunately not as helpful for simulation as we’d like. Everything is atoms, but modeling atoms quickly becomes computationally intractable.

“I had a stormy graduate career, where every week we would have a shouting match. I kept doing deals where I would say, ‘Okay let me do neural nets for another six months and I will prove to you they work.’ At the end of the six months, I would say, ‘Yeah, but I am almost there, give me another six months.’” — Geoffrey Hinton, “Godfather of AI”

The turning point came in 2012, when AlexNet, a convolutional neural network, won the ImageNet image classification benchmark and became the watershed moment that convinced the broader ML community that neural networks could scale. LLMs followed as an even more prodigious success, exemplifying the necessity of data-driven learned simulators and launching the transformer architecture to well-deserved acclaim.
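To make the intractability concrete, here is a back-of-the-envelope sketch (my own illustration, not from the article), assuming the commonly cited cubic scaling of conventional density functional theory and a hypothetical one-second, 100-electron reference calculation:

```python
# Order-of-magnitude illustration (assumed numbers) of why first-principles
# simulation hits a wall: conventional DFT scales roughly as O(N^3) in the
# number of electrons, and biological systems contain vastly more than the
# small molecules DFT handles comfortably.

def dft_cost(n_electrons: float, base_electrons: float = 100.0,
             base_seconds: float = 1.0) -> float:
    """Extrapolate wall-clock cost from a reference calculation,
    assuming cubic scaling in system size."""
    return base_seconds * (n_electrons / base_electrons) ** 3

# small molecule -> protein -> fragment of a cell
for n in (1e2, 1e4, 1e6):
    print(f"{n:>9.0e} electrons -> ~{dft_cost(n):.1e} s")
```

Even under these toy assumptions, a million-electron system costs on the order of 10^12 seconds, which is why data-driven surrogates become attractive.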
Attention is not all you need
The scaling laws transformed modeling in scientific domains as well: learned simulators are outperforming traditional physics-based methods in both accuracy and speed.
- In 2023, Google DeepMind’s GraphCast, a graph neural network, exceeded ECMWF’s accuracy while making 10-day forecasts in under a minute on a single TPU, compared to hours on a supercomputer for traditional methods.
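For readers unfamiliar with graph neural networks, the core operation is message passing: each node updates its state from its neighbors. The sketch below is my own minimal illustration, not GraphCast, whose actual encoder/processor/decoder architecture over a multi-resolution global mesh is far more elaborate; all names and shapes here are illustrative.

```python
import numpy as np

def message_pass(node_states: np.ndarray,
                 edges: list[tuple[int, int]],
                 weight: np.ndarray) -> np.ndarray:
    """One message-passing step: mean-aggregate neighbor states,
    combine with each node's own state, apply a learned projection."""
    n, _ = node_states.shape
    agg = np.zeros_like(node_states)
    deg = np.zeros(n)
    for src, dst in edges:            # send each node's state along its edges
        agg[dst] += node_states[src]
        deg[dst] += 1
    agg /= np.maximum(deg, 1)[:, None]  # mean over incoming messages
    return np.tanh((node_states + agg) @ weight)

rng = np.random.default_rng(0)
states = rng.normal(size=(4, 8))            # 4 toy grid points, 8 features
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]    # toy ring "mesh"
out = message_pass(states, edges, rng.normal(size=(8, 8)))
print(out.shape)
```

A weather model stacks many such learned steps so that information propagates across the mesh, analogous to how physical influences propagate across the atmosphere.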
Slaves to the physical world
Biology is arguably where the simulator is both the most necessary and the least developed. The accessibility of data in other domains is a largely solved problem. Weather had ERA5 reanalysis data: decades of global atmospheric observations, assimilated and quality-controlled, publicly available. For materials science, training data comes largely from DFT calculations, which are expensive but automatable. But biology wet-lab data is slow, noisy, expensive, often proprietary, and has historically been difficult to translate into real-world validity. Cell lines don’t reliably predict what will happen in humans, and animal models fail constantly; over 90% of drugs that work in mice fail in human trials. Single experiments can cost millions of dollars over the course of months. Sequencing has become cheap, but sequencing is only one modality. Predicting gene expression from sequence is hard. Predicting protein function from structure is hard. Predicting drug efficacy from molecular interactions is very hard. Arguably, we don’t even know what the right data to collect looks like.

Noetik has distinguished itself with a multimodal approach of training cancer world models on multiplex protein staining, spatial gene expression, DNA sequencing, and structural markers. Biohub has been racing to build diverse measurement tools across scales, from individual proteins to whole organisms. Generating data across a plurality of modalities for a plurality of models is the strategy of having no strategy (and it applies to the entire field of biology). It’s even possible that holistic theories of biology will continue to evade us. If so, progress will look less like physics and more like engineering: narrow focuses on particular diseases, particular organs, particular modalities. The work is unglamorous and the timelines are long. We remain slaves to the physical world.

AGI AI for science timelines
So why not just wait for the LLMs to figure it out? There’s credible evidence that the big labs have invested direct effort into building AI scientists for ML research, a potential route toward recursive self-improvement. But even if LLMs build or significantly accelerate the creation of accurate simulators, the scientist and simulator systems can still be distinguished by their technical basis, data requirements, and deployment timelines, which have tangible impacts on investment and policy. GPT-7 may very well have the cognitive capabilities to design digital twins that simulate human biology, but it will have been enabled by many other players already advancing the algorithms behind effective simulators and building automated data infrastructure. In the same way that ML for voice, image generation, and world models was pushed forward by ElevenLabs, Midjourney, and World Labs, among others, we should expect ML for science to be pushed forward by a plurality of efforts.
- The scientist (LLMs for reasoning and synthesis) is being built by the frontier labs.
- The simulator (domain-specific ML models) requires specialized architectures and domain expertise. DeepMind has done impressive work here, but it’s not their core business.
- Data infrastructure (automated labs, high-throughput assays, simulation pipelines) requires capital-intensive physical facilities and years of iteration.
Melissa Du is a Research Engineer at Radical Numerics and is on X and on Substack. Give her a follow!
Related Topics
- [[topics/ai-for-science]]
- [[topics/llms]]
- [[topics/scientific-modeling]]
- [[topics/ai-agents]]
Related Articles
Experts Have World Models. LLMs Have Word Models.
Swyx · explanation · 73% similar
[AINews] OpenAI and Anthropic go to war: Claude Opus 4.6 vs GPT 5.3 Codex
Swyx · explanation · 71% similar
[AINews] AI vs SaaS: The Unreasonable Effectiveness of Centralizing the AI Heartbeat
Swyx · explanation · 71% similar
Originally published at https://www.latent.space/p/scientist-simulator.