Next week is the annual APS conference that was once the March Meeting and is now the combined March/April "Global Physics Summit". As I've done annually, I will try to give some impressions of interesting talks that I see, hopefully at an understandable level. This year I'm only there from Monday through late Thursday morning, so I may miss exciting things - hopefully people will still discuss such things here as has happened in past years.
A few science tidbits in the meantime:
People sometimes time arXiv submissions to coincide with the APS meeting, and sometimes it's just coincidence. Two preprints (here and here) popped up very recently, both experiments on interferometry and braiding of anyons in bilayer graphene. There are many subtleties in such experiments. The colorized electron microscope images of the devices show how sophisticated fabrication has become in these systems, where very small amounts of disorder can disrupt the fragile many-body quantum states of interest.
On a much more classical physics note, this preprint uses some sophisticated multiscale modeling to address the question, why is ice so slippery? A super-thin layer of water on the surface of the ice under sliding conditions is crucial, and the roles of frictional heating and heat transfer have been tricky to quantify.
Meanwhile, across town from me at the University of Houston, Paul Chu and company have published this paper in PNAS, where they have demonstrated ambient pressure superconductivity in a mercury-based cuprate at 151 K, breaking the old ambient pressure record by 18 K (!). The trick here has been pressure annealing. Many superconductors, particularly the cuprates, tend to have higher transition temperatures at elevated pressures. One idea is that pressure distortion of certain bond angles favors superconductivity in this system, and Chu et al. have been exploring the idea of cycling pressure and temperature to "lock in" the altered crystal structure.
Moving away from condensed matter and turning to science used in the aid of history: When Vesuvius erupted in 79 CE, the pyroclastic flow swept through Herculaneum and a nearby Roman villa, housing a library of more than 1800 now-carbonized scrolls. Using 3D x-ray tomography, it is hoped that these scrolls may actually be read without trying to physically unroll them, prompting the Vesuvius Challenge. This effort, involving x-ray imaging and AI methods, seems to be bearing fruit. There may be many more scrolls still buried as well. It would be amazing if great lost works of ancient Greek and Roman literature could be recovered.
Tangentially related to science, the arXiv is looking for a CEO - here is the position description. It's hard to overstate the impact of the arXiv and related repositories on open science, and in the chaotic world of scientific publishing, it's more important than ever.
If you need evidence of how screwed up scientific publishing is, apparently Springer-Nature has been surveying people to see how willing they would be to pay an up-front fee (e.g. $299) just for the privilege of submitting an article.
Mathematical research traditionally involves a small number of professional mathematicians working closely on difficult problems. However, I have long believed that there is a complementary way to do mathematics, in which one works with a broad community of mathematically minded people on problems which may not be as deep as the problems one traditionally works on, but still are of mathematical interest; and that modern technologies, including AI, are more suited to contributing to the latter type of workflow. The “Polymath projects” were one example of this broad type of collaboration, where internet platforms such as blogs and wikis were used to facilitate such collaboration. Some years later, collaborative formalization projects (such as the one to formalize the Polynomial Freiman–Ruzsa conjecture of Marton, discussed previously on this blog here) became popular in some circles. And in 2024, I launched the Equational Theories Project (ETP) (discussed on this blog here and here), combining the rigor of Lean formalization with “good old fashioned AI” (in the form of automated theorem provers) to settle (with formal verification) over 22 million true-false problems in universal algebra.
Continuing in this spirit, Damek Davis and I are launching a new project, in the form of an experimental competitive challenge hosted by the SAIR Foundation (where I serve as a board member, and which is supplying technical support and compute). The idea of this challenge, motivated in part by this recent paper of Honda, Murakami, and Zhang, is to measure the extent to which the 22 million universal algebra true-false results obtained by the ETP can be “distilled” into a short, human-readable “cheat sheet”, similar to how a student in an undergraduate math class might distill the knowledge learned from that class into a single sheet of paper that the student is permitted to bring into an exam.
Here is a typical problem in universal algebra that the ETP was able to answer:
Problem 1 Suppose that ◇ is a binary operation such that a given equation holds for all x, y. Is it true that a given target equation holds for all x, y?
Such a problem can be settled either by algebraically manipulating the initial equation to deduce the target equation, or by finding a counterexample to the target equation that still satisfies the initial equation. There are a variety of techniques to achieve either of these, but this sort of problem is difficult, and even undecidable in some cases; see this paper of the ETP collaborators for more discussion. Nevertheless, many of these problems can be settled with some effort by humans, by automated theorem provers, or by frontier AI systems; here for instance is an AI-generated solution to the above problem.
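To illustrate the counterexample route concretely, here is a minimal brute-force search over small finite magmas, in Python. The two laws in it are hypothetical stand-ins (the actual equations in Problem 1 appear in the original post), and real searches use far more efficient tools, but the logic is the same: enumerate operation tables satisfying the initial equation and test the target equation.

from itertools import product

# Hypothetical stand-in laws, not the actual equations of Problem 1:
def initial_law(op, n):
    # x ◇ (x ◇ y) = y for all x, y
    return all(op[x][op[x][y]] == y for x in range(n) for y in range(n))

def target_law(op, n):
    # (x ◇ y) ◇ x = y for all x, y
    return all(op[op[x][y]][x] == y for x in range(n) for y in range(n))

def find_counterexample(n):
    """Enumerate all n**(n*n) binary operations on {0, ..., n-1}."""
    for flat in product(range(n), repeat=n * n):
        op = [list(flat[i * n:(i + 1) * n]) for i in range(n)]
        if initial_law(op, n) and not target_law(op, n):
            return op  # satisfies the initial law, refutes the target
    return None  # the implication holds on all magmas of this size

for n in (2, 3):
    print(n, find_counterexample(n))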
However, these AI models are expensive, and do not reveal much insight as to where their answers come from. If one instead tries a smaller and cheaper model, such as one of the many open-source models available, it turns out that these models basically perform no better than random chance, in that when asked to say whether the answer to a question such as the above is true or false, they only answer correctly about 50% of the time.
But, similarly to how a student struggling with the material for a math class can perform better on an exam when provided the right guidance, it turns out that such cheap models can perform at least modestly better on this task (with success rates increasing to about 55%-60%) if given the right prompt or “cheat sheet”.
“Stage 1” of the distillation challenge, which we launched today, asks contestants to design a cheat sheet (of at most 10 kilobytes in size) that can increase the performance of these models on the above true-false problems to as high a level as possible. We have provided a “playground” with which to test one’s cheat sheet (or a small number of candidate cheat sheets) against some cheap models on a public set of 1200 problems (1000 of which were randomly selected, and rather easy, together with 200 “hard” problems that were selected to resist the more obvious strategies for resolving these questions); a brief video explaining how to use the playground can be found here.
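Mechanically, scoring a cheat sheet is just a loop: prepend it to each question, ask the model for True or False, and count correct answers. Here is a minimal sketch of such a harness, with the model call abstracted behind a caller-supplied function; the function names, the prompt format, and the exact byte convention for the 10 KB limit are my assumptions, not the contest's specification.

from typing import Callable

def score_cheat_sheet(cheat_sheet: str,
                      problems: list[tuple[str, bool]],
                      ask_model: Callable[[str], str]) -> float:
    """Fraction of true-false problems answered correctly."""
    assert len(cheat_sheet.encode("utf-8")) <= 10 * 1024, "over the 10 KB limit"
    correct = 0
    for question, truth in problems:
        prompt = f"{cheat_sheet}\n\n{question}\nAnswer with exactly one word: True or False."
        reply = ask_model(prompt).strip().lower()
        correct += (reply.startswith("true") == truth)
    return correct / len(problems)

# A trivial stub in place of a real model, just to show the harness runs:
print(score_cheat_sheet(
    "Try small finite counterexamples before attempting a proof.",
    [("Does the initial law imply the target law?", False)],
    ask_model=lambda prompt: "False"))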
The submission stage will end on April 20, after which we will evaluate the submissions against a private subset of test questions. The top 1000 submissions will advance to a second stage, which we are currently in the process of designing; it will involve more advanced models, but also the more difficult task of providing not just a true-false answer, but also a proof or counterexample for the problem.
The competition will be coordinated on this Zulip channel, where I hope there will be a lively and informative discussion.
My hope is that the winning submissions will capture the most productive techniques for solving these problems, and/or provide general problem-solving techniques that would also be applicable to other types of mathematical problems. We started with the Equational Theories Project data set for this pilot competition due to its availability and spectrum of difficulty levels, but if this type of distillation process leads to interesting results, one could certainly run it on many other types of mathematical problem classes to get some empirical data on how readily they can be solved, particularly after we learn from this pilot competition how to encourage participation and the sharing of best practices.
SAIR will also launch some other mathematical challenges in the coming months that will be of a more cooperative nature than this particular competitive challenge; stay tuned for further announcements.
Recently, my coworkers and I put out a preprint “Classical solution
of the FeMo-cofactor model to chemical accuracy and its
implications” (Zhai et al. 2026). It is a bit
unusual to write commentary on one’s own scientific article. However, in
this case, given the many inquiries I have had about the work in the
context of quantum computing, many of which have contained similar
questions (and often similar misunderstandings), I thought it would be
useful to provide some perspective that we could not provide in the
original preprint, in an informal manner.
What is FeMo-co?
I will start with some background on the FeMo-cofactor (FeMo-co).
This cofactor is the reaction center of nitrogenase, an enzyme found in
certain soil-dwelling bacteria. Nitrogenase’s claim to fame is that it
converts atmospheric dinitrogen, which is held together by a strong N-N
triple bond, into a reduced form (ammonia) which can then be taken up by
plants and thereby be passed onto the rest of the living biomass. In
terms of incorporating nitrogen into biomass, nitrogenase is believed to
be responsible for about 2/3 of
biological nitrogen, with the remainder coming from fertilizers. Because
it plays this critical role, it is sometimes referred to as the enzyme
that feeds the planet.
The chemistry of how dinitrogen is reduced at the FeMo-cofactor is still largely unknown. The basic stoichiometry of the reaction is often written as

N₂ + 8H⁺ + 8e⁻ + 16ATP → 2NH₃ + H₂ + 16ADP + 16Pᵢ

but this is just a sketch of the process. In particular, the above equation contains, nominally, a large number of molecular reactants, and clearly they do not all just come together in a bang! The role of the cofactor, and the enzyme more generally, is to coordinate the protons, electrons, biological energy source (ATP), and the dinitrogen molecule, into a sequence of well-defined steps, known as the reaction mechanism. Since the work of Lowe and Thorneley (Thorneley and Lowe 1984), the most common proposal for the nitrogenase reaction mechanism contains 8 intermediate steps (corresponding roughly to 8 sequential proton and electron additions). However, due to the difficulty in isolating the intermediate states of FeMo-co, as well as challenges in using experimental probes to deduce what these states are, the Lowe-Thorneley cycle still remains an unproven hypothesis. Biochemists, spectroscopists, as well as a few theoretical quantum chemists, are today actively engaged in observing, computing, deducing (and arguing about) the nitrogenase mechanism (Jiang and Ryde 2023; Lancaster et al. 2011; Einsle and Rees 2020; Badding et al. 2023; Thorhallsson et al. 2019).
So how did nitrogenase become so widely discussed in the setting of
quantum computing? In 2016, an article “Elucidating reaction mechanisms
on quantum computers”, that has since become one of the most cited
papers in the nitrogenase field, arguably started this all (Reiher et al.
2017). The article included a number of proposals, including (1)
that the ‘promise of exponential speedups for the electronic structure
problem’ could be applied to elucidate the nitrogenase reaction
mechanism that had so far proved intractable for classical computation,
and (2) that solving this problem would be an example of how quantum
simulation could be ‘scientifically and economically impactful’.
(Similar proposals can also be found repeated in less technical language
and settings, see e.g. ‘Why do
we want a quantum computer’). An important technical contribution of
the article was to provide a detailed quantum resource estimate for a
simulation of chemistry. The problem statement was to compute the
ground-state energy of a specific ‘54 orbital’ (108 qubit) model of
FeMo-co, to an accuracy of 1 kcal/mol, referred to as chemical accuracy.
It is important to note the word ‘model’ in the problem statement.
Electrons move in continuous space, and thus quantum chemical
Hamiltonians are formulated in the continuum, while quantum computation
requires discretization of this space. This discretization, in terms of
a so-called active space set of orbitals that the electrons can
variously occupy, constitutes the model. We will return to the
definition of the model below. By compiling a Trotter-Suzuki
implementation of the quantum phase estimation algorithm within a
fault-tolerant resource model for their specific FeMo-co model
Hamiltonian, Ref. (Reiher et al. 2017) provided
a T-gate resource estimate. Combined with some assumptions about the
quantum architecture, this provided perhaps the first concrete time-cost
to solve an interesting chemistry problem on a quantum computer. This
work has since served as an inspiration for many subsequent quantitative
resource estimation efforts in the quantum computing for chemistry
field.
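To make the model sizes concrete: in the usual second-quantized encoding, each spatial orbital contributes two spin-orbitals and hence two qubits, which is how a 54-orbital active space becomes a 108-qubit problem (and the 76-orbital model discussed below becomes 152 qubits). A quick sketch of the bookkeeping; the electron counts here are illustrative, not the actual occupations used in the papers.

from math import comb

def active_space_size(n_orbitals, n_alpha, n_beta):
    """Qubit count and determinant count for an active-space model,
    assuming one qubit per spin-orbital (two per spatial orbital)."""
    n_qubits = 2 * n_orbitals
    n_determinants = comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)
    return n_qubits, n_determinants

print(active_space_size(54, 27, 27))  # 108 qubits, ~4e30 determinants
print(active_space_size(76, 38, 38))  # 152 qubits, ~5e43 determinants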
Exponential speedup and
societal impact
Before proceeding further in this story, it is worth examining the
two key propositions made in Ref. (Reiher et al. 2017). I start
with the question of exponential speedup. Quantum algorithms for the
ground-state energy, such as quantum phase estimation, essentially
perform a projective measurement of the energy (encoded in a phase).
Thus, it is essential to prepare a good initial state, i.e. with large
overlap with the desired eigenstate, to measure the correct energy.
This, however, is a strong constraint, if we are seeking asymptotically
large quantum advantage. For example, if such an initial state is first
determined classically, as is often suggested, then exponential quantum
advantage in a given problem requires that finding good classical
guesses is easy, while improving them classically to fully solve the
problem becomes exponentially hard as the problem size increases.
Unfortunately, convincing evidence that chemically relevant electronic
structure problems, including the problem of cofactor electronic
structure exemplified by FeMo-co, fall into this category has not yet
been found, as discussed in detail in Refs. (Lee et al. 2023; Chan
2024).
The second proposition, that elucidating the reaction mechanism of
nitrogenase will lead to a transformative societal impact, is similarly
nuanced. The claim originates in the observation that the competing
industrial process for fertilizer production via nitrogen reduction,
namely, the Haber-Bosch process, takes place at high temperatures and
pressures and consumes a significant percentage of the world’s energy.
Bacteria, on the other hand, can do this process at room
temperature.
While it is true that the nitrogenase enzyme functions at ambient
temperature and pressure, it is simply false that it consumes much less
energy. This is because the large amount of energy required for nitrogen
fixation mainly originates from thermodynamics, i.e. one needs energy to
break the strong nitrogen triple bond. In fact, taking into account the
physiological conditions and the ATP cost, bacteria arguably expend
more energy to reduce dinitrogen to ammonia (Chan 2024) than a modern efficient
industrial implementation of the Haber-Bosch process. Thus the real hope
behind trying to understand the nitrogenase mechanism in the context of
societal impact is that we may one day engineer a variant of it with
more desirable properties, e.g. with higher turnover, or with a lower
carbon footprint, or which is more selective for nitrogen reduction.
Whether this is actually possible remains to be seen, and certainly
requires much more than knowing the ground-state of FeMo-co, or even the
full reaction mechanism.
Which FeMo-cofactor model?
I now return to the question of FeMo-cofactor models. Ref. (Reiher et al.
2017) introduced a particular cofactor model, which I will refer to as RWST, following the names of the authors. As we soon found out,
simulating the ground-state of the RWST model was actually very easy
classically, and in fact (as reported in (Li et al. 2019)) could be done
using standard quantum chemistry methods with a few hours of calculation
on a laptop. This was because although the RWST model was a 108 qubit
model, and (in the worst case) a 108 qubit ground-state cannot be stored
classically, the RWST model Hamiltonian was constructed in such a way as to
not capture any of the difficult features of the FeMo-cofactor
ground-state. This highlights the importance of not assuming worst case
complexity about physical problems!
What makes the electronic structure of the FeMo-cofactor (relatively)
complicated is the presence of many ‘unpaired’ electrons. In simple
molecules, we can describe the ground-state as one where all the
electrons sit in pairs in orbitals. Since an orbital can only carry a
pair of electrons at a time, the ground-state is simply described by
filling the lowest energy orbitals with pairs. However, in molecules
with transition metals, there are typically ‘unpaired’ electrons
(so-called open-shells), and then we need to consider whether and how
they pair up, which orbitals are singly versus doubly occupied, and so
on. The RWST model ground-state had no unpaired electrons! It was
therefore unrepresentatively easy to solve for the ground state
classically.
Because of the problems with the RWST model, my group formulated a
more suitable 76 orbital/152 qubit model of FeMo-co in Ref. (Li et al. 2019),
which I will refer to as the LLDUC model, again by the names of the
authors. Although the LLDUC model is still a significant truncation of
the true electronic structure of FeMo-co, we verified that it contains
the correct open-shell character of the cofactor, and thus has a
‘representative’ complexity in its ground-state. Since we published the
LLDUC model, it has become the most common benchmark model of FeMo-co
used in quantum resource estimates for new quantum chemistry
ground-state algorithms (Wan
et al. 2022; Berry et al. 2019; Luo and Cirac 2025; Low et al.
2025).
Heuristics
in the classical solution of the LLDUC FeMo-cofactor model
This brings me now to the recent work in Ref. (Zhai et al. 2026), where, through
a sequence of classical calculations, we could produce a classical
estimate of the ground-state energy of the LLDUC model to chemical
accuracy. How was this achieved?
Classical electronic structure methods (aside from exact
diagonalization) are heuristic algorithms. Much like quantum algorithms,
they implicitly or explicitly start from an initial state. In chemical
applications, this can be viewed as a product state or set of product
states: for tensor network algorithms, such as the density matrix
renormalization group (when not considering topological order), this is the set of product states (specified by the underlying basis) that connect to the space of slightly entangled states (those of small bond dimension) which the algorithm naturally explores. In coupled cluster methods, this is the
initial reference state to which excitations are applied. Although many
classical heuristics are exact with exponential effort, e.g. by
increasing the bond dimension in
a tensor network or excitation level in coupled cluster theory, in
practical computational chemistry, classical heuristics are used with
the assumption that so long as the initial state is chosen
appropriately, they will converge rapidly to the true ground-state
without exponential effort. I analyze this heuristic working assumption
in Ref. (Chan
2024) where I name it the classical heuristic cost conjecture.
However, finding a good classical initial state is an NP-hard problem,
and this is often the crux of where the challenge in simulation actually
lies.
In FeMo-co, unlike in simpler molecules, it is not at all obvious
what product state to start from. To address this, in Ref. (Zhai et al.
2026), we devised an enumeration and filtering protocol. The
relevant manifold arises from the orbital and spin degrees of freedom of
the Fe ions: which Fe orbitals are occupied, by how many electrons, and
with which spins. One technical point is that the resulting product
states do not generally conserve the global spin symmetry. However, as recognized
by Anderson decades ago, for magnetic order in large systems, the
eigenstates can be chosen to break symmetry due to the tower of symmetry-preserving eigenstates at an energy scale of O(1/V) (where V is the system volume). For any finite energy resolution we can equally use a broken-symmetry description of the states, an example of the fragility of entanglement effects in
physical systems.
Because applying the highest level of classical approximation to all
enumerated product states was far too expensive, we used a filtering
funnel, where product states were ranked at different levels of theory,
passing promising candidates to higher levels of classical computation.
In the end, the final most accurate calculations were performed on only
3 candidates, which we deduced to all be essentially degenerate to
within chemical accuracy.
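The funnel logic itself is simple to state, even though each level hides a heavy quantum chemistry calculation. Here is a schematic of the ranking-and-filtering idea in Python; the energy estimators, candidate counts, and cutoffs are placeholders, not the actual methods or thresholds of the paper.

def filtering_funnel(candidates, levels):
    """Rank candidate initial states with increasingly expensive methods,
    keeping only the most promising at each stage.

    `levels`: (energy_estimator, n_keep) pairs, ordered from cheapest
    and least accurate to most expensive and most accurate."""
    survivors = list(candidates)
    for estimate_energy, n_keep in levels:
        survivors = sorted(survivors, key=estimate_energy)[:n_keep]
    return survivors

# Toy stand-ins: "states" are integers, "energies" arbitrary functions.
cheap = lambda s: (37 * s) % 101    # e.g. a mean-field-quality ranking
costly = lambda s: (17 * s) % 89    # e.g. a high-level correlated method
print(filtering_funnel(range(1000), [(cheap, 50), (costly, 3)]))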
There are other important technical details in Ref. (Zhai et al.
2026) which I have not mentioned: the use of unrestricted
orbitals, the systematic extrapolations to obtain the final energies and
estimated errors, and the benchmarking required to be confident about
the protocol. However, recognizing that the FeMo-co ground-state problem
could be reduced essentially to a ranking problem was the essence of
what made the estimate possible.
Implications
of the classical solution for chemistry
From a chemical and biochemical perspective, computing the
ground-state energy of a model to some specified accuracy – even
chemical accuracy – is a highly artificial target. Most chemical
calculations that have an impact on our understanding never achieve or
even target chemical accuracy in the total energy. In addition,
chemistry does not depend on the total energy, but on the relative energy of different chemical configurations, which typically differ only by changes in the bonding and local chemistry.
The main take-home from our work then is that there is nothing
especially mysterious about FeMo-co’s electronic structure. The story of
the FeMo-co ground-state is not one of multiconfigurational electronic
structure (i.e. where the states are not at all close to product
states), but one of multiple configurations (i.e. many competing product
states). Indeed, this is basically how nitrogenase chemists have long
reasoned about the electronic structure of iron-sulfur clusters and
FeMo-co (Lovell et al. 2001;
Yamaguchi et al. 1990). Our work thus provides extensive and
rigorous numerical support for this picture.
Because of this simplicity, the full richness of classical quantum
chemistry methods can now be brought to bear on FeMo-co electronic
structure beyond the LLDUC model. Assuming the model already captures
the qualitative complexity of the cluster’s electronic structure, we
expect such investigations to provide quantitative corrections to the
picture we have obtained. We took some initial steps to confirm this in
our manuscript, considering larger orbital spaces, the effect of protein
fluctuations, and the interpretation of certain spectroscopies. In the
future, connecting these simulations to more spectroscopic measurements
will be an exciting possibility. In addition, now that the electronic
structure is on a conceptually sound footing, we have a foundation to
support the central question of resolving the reaction mechanism. This
opens up a whole new set of scientific challenges associated with
observing reactions on extremely slow timescales.
Implications
of the classical solution for quantum computing in chemistry
Because of the success of classical heuristic methods for this
problem, one may naturally wonder what these results mean for the
application of quantum computers in chemistry. Here I address some
commonly asked questions.
Is the classical simulation of the LLDUC model a ‘last hurrah’ for classical methods?
I have seen the analogy drawn between the FeMo-co result and the classical tensor network simulations of random circuit sampling experiments. In that case, while the famous Google Sycamore experiment (Arute et al. 2019) could be replicated by classical tensor network simulations (Gray and Kourtis 2021), subsequent improvements in quantum processors soon led to random circuit sampling experiments outpacing the capabilities of classical simulations.
However, the situation here is quite different. There is strong
evidence that generating samples from a random quantum circuit (without
noise) is actually exponentially hard to do using a classical algorithm,
and indeed, the classical simulations used for the task were (mostly)
brute force simulations with exponential cost in circuit size. In
contrast, the theoretical support for exponential quantum advantage in
the FeMo-co problem is much weaker, and as an empirical fact, most of
the methods used in the FeMo-co simulation (namely the coupled cluster
methods for a given excitation level) are polynomial cost algorithms.
Since a similar simulation strategy has also been successfully applied
across the series of 2-, 4-, and 8-metal iron-sulfur complexes (Sharma
et al. 2014; Li et al. 2019; Zhai et al. 2023, 2026), we have no
reason to expect a radically different situation if we consider larger
analogous complexes in this series.
And in any case, chemistry does not provide an endless scaling of problem size; FeMo-co is the largest enzyme cofactor in terms of the number of transition metals. Materials simulations provide a setting to scale the problem size, but one still faces the question as to whether the relevant states observed are truly that complicated classically. For example, classical simulations of the ground-state orders of the 2D Hubbard model currently show no exponential increase in difficulty when going to larger system sizes (Chen et al. 2025; Liu et al. 2025).
Has the availability of a classical strategy for FeMo-co changed your enthusiasm for quantum computers in chemistry?
Again, my answer is no. There is an entire community of nitrogenase scientists: experimental spectroscopists, synthetic chemists, and of course computational chemists, who are working to map out the reaction mechanism, none of whose research is predicated on using a quantum computer. Personally, I have never thought that to understand nitrogenase we would first have to build a quantum computer, otherwise I would not work on the problem!
At the same time, any computational tool brings new capabilities that
will be useful. Quantum algorithms come with theoretical guarantees; for
example, so long as the initial state is well prepared, we know the
error in the energy that we measure from a quantum algorithm, which is
more reliable than the classical estimates of error we obtain from
extrapolations. Similarly, initial state preparation for a quantum
computer, even for classically tractable problems, is probably easier
than solving the entire problem classically, since only a ‘rough’ guess
is needed. And finally, a polynomial or even constant factor speedup is
exciting, so long as the speedup is large enough!
Thus, I am in fact excited to see quantum computers applied to this
problem, I am just not waiting for them to be built first.
How should one think about past work on quantum algorithms that has used FeMo-co as a target?
FeMo-co was amongst the earliest examples of a chemical problem for which a case for quantum advantage was made. For this reason, it is overrepresented in the literature of quantum computing for chemistry. Should fully fault-tolerant quantum computers be available, they will naturally be applied to a wider set of systems (Chan 2024; Babbush et al. 2025).
Also, one must recognize that the availability of a single concrete optimization target has led to undeniable advances in quantum algorithms for quantum chemistry. In most cases, prior work to improve quantum resource estimates for FeMo-co involves techniques that apply to other systems as well. Thus, there’s no need to throw away those papers!
What are some lessons and conclusions to draw?
The first is obviously that, just because something has not been solved, or appears hard to solve classically, does not mean it is the best problem to choose for a quantum computer. The classical solution strategy for FeMo-co essentially involved a complicated classical state preparation problem, which is a shared challenge with ground-state estimation algorithms on quantum computers, and thus perhaps not an optimal choice of problem.
My second main conclusion is that since classical solutions in
complex problems are possible because they use some understanding of the
problem, for quantum algorithms to have maximum impact, they should use
the same knowledge. In fact most chemistry is not about truly mysterious
quantum systems, but more about ordinary quantum matter where we know
roughly what is going on, but where detailed simulations are still
required. If quantum computing algorithms can target this ‘mundane’
regime, they will have maximum impact on chemistry as it is practiced
today. In recent work, we have taken some steps in this direction by
proposing quantum algorithms for electronic structure that work within
the same heuristic framework as most current quantum chemistry
methods (Chen and
Chan 2025).
Finally, I wish to emphasize that, from the perspective of
understanding nitrogenase, and maximising societal impact, the choice of
computational algorithm and hardware to solve the problem is irrelevant.
The fact that FeMo-co electronic structure is not so mysterious is an
enormously positive thing, as it means that making progress on the
larger problem of the mechanism using computation no longer seems so
impossible. I have seen some of the brightest minds in the world helping
to advance quantum algorithms for this problem. If any of this
brainpower can be devoted to the chemical question itself, I believe we
can be very optimistic about the future solution of the nitrogenase
problem.
References
Arute, Frank, Kunal Arya, Ryan Babbush, et al. 2019. “Quantum Supremacy Using a Programmable Superconducting Processor.” Nature 574 (7779): 505–10.
Babbush, Ryan, Robbie King, Sergio Boixo, et al. 2025. “The Grand Challenge of Quantum Applications.” arXiv Preprint arXiv:2511.09124.
Badding, Edward D, Suppachai Srisantitham, Dmitriy A Lukoyanov, Brian M Hoffman, and Daniel LM Suess. 2023. “Connecting the Geometric and Electronic Structures of the Nitrogenase Iron–Molybdenum Cofactor Through Site-Selective 57Fe Labelling.” Nature Chemistry 15 (5): 658–65.
Berry, Dominic W, Craig Gidney, Mario Motta, Jarrod R McClean, and Ryan Babbush. 2019. “Qubitization of Arbitrary Basis Quantum Chemistry Leveraging Sparsity and Low Rank Factorization.” Quantum 3: 208.
Chan, Garnet Kin-Lic. 2024. “Spiers Memorial Lecture: Quantum Chemistry, Classical Heuristics, and Quantum Advantage.” Faraday Discussions 254: 11–52.
Chen, Ao, Zhou-Quan Wan, Anirvan Sengupta, Antoine Georges, and Christopher Roth. 2025. “Neural Network-Augmented Pfaffian Wave-Functions for Scalable Simulations of Interacting Fermions.” arXiv Preprint arXiv:2507.10705.
Chen, Jielun, and Garnet Kin Chan. 2025. “A Framework for Robust Quantum Speedups in Practical Correlated Electronic Structure and Dynamics.” arXiv Preprint arXiv:2508.15765.
Einsle, Oliver, and Douglas C Rees. 2020. “Structural Enzymology of Nitrogenase Enzymes.” Chemical Reviews 120 (12): 4969–5004.
Jiang, Hao, and Ulf Ryde. 2023. “N2 Binding to the E0–E4 States of Nitrogenase.” Dalton Transactions 52 (26): 9104–20.
Lancaster, Kyle M, Michael Roemelt, Patrick Ettenhuber, et al. 2011. “X-Ray Emission Spectroscopy Evidences a Central Carbon in the Nitrogenase Iron-Molybdenum Cofactor.” Science 334 (6058): 974–77.
Lee, Seunghoon, Joonho Lee, Huanchen Zhai, et al. 2023. “Evaluating the Evidence for Exponential Quantum Advantage in Ground-State Quantum Chemistry.” Nature Communications 14 (1): 1952.
Li, Zhendong, Sheng Guo, Qiming Sun, and Garnet Kin-Lic Chan. 2019. “Electronic Landscape of the P-Cluster of Nitrogenase as Revealed Through Many-Electron Quantum Wavefunction Simulations.” Nature Chemistry 11 (11): 1026–33.
Liu, Wen-Yuan, Huanchen Zhai, Ruojing Peng, Zheng-Cheng Gu, and Garnet Kin-Lic Chan. 2025. “Accurate Simulation of the Hubbard Model with Finite Fermionic Projected Entangled Pair States.” Physical Review Letters 134 (25): 256502.
Lovell, Timothy, Jian Li, Tiqing Liu, David A Case, and Louis Noodleman. 2001. “FeMo Cofactor of Nitrogenase: A Density Functional Study of States MN, MOX, MR, and MI.” Journal of the American Chemical Society 123 (49): 12392–410.
Low, Guang Hao, Robbie King, Dominic W Berry, et al. 2025. “Fast Quantum Simulation of Electronic Structure by Spectrum Amplification.” arXiv Preprint arXiv:2502.15882.
Luo, Maxine, and J Ignacio Cirac. 2025. “Efficient Simulation of Quantum Chemistry Problems in an Enlarged Basis Set.” PRX Quantum 6 (1): 010355.
Reiher, Markus, Nathan Wiebe, Krysta M Svore, Dave Wecker, and Matthias Troyer. 2017. “Elucidating Reaction Mechanisms on Quantum Computers.” Proceedings of the National Academy of Sciences 114 (29): 7555–60.
Sharma, Sandeep, Kantharuban Sivalingam, Frank Neese, and Garnet Kin-Lic Chan. 2014. “Low-Energy Spectrum of Iron–Sulfur Clusters Directly from Many-Particle Quantum Mechanics.” Nature Chemistry 6 (10): 927–33.
Thorhallsson, Albert Th, Bardi Benediktsson, and Ragnar Bjornsson. 2019. “A Model for Dinitrogen Binding in the E4 State of Nitrogenase.” Chemical Science 10 (48): 11110–24.
Thorneley, Roger NF, and DJ Lowe. 1984. “The Mechanism of Klebsiella Pneumoniae Nitrogenase Action. Pre-Steady-State Kinetics of an Enzyme-Bound Intermediate in N2 Reduction and of NH3 Formation.” Biochemical Journal 224 (3): 887–94.
Wan, Kianna, Mario Berta, and Earl T Campbell. 2022. “Randomized Quantum Algorithm for Statistical Phase Estimation.” Physical Review Letters 129 (3): 030503.
Yamaguchi, Kizashi, Takayuki Fueno, Masa-aki Ozaki, Norikazu Ueyama, and Akira Nakamura. 1990. “A General Spin-Orbital (GSO) Description of Antiferromagnetic Spin Couplings Between Four Irons in Iron-Sulfur Clusters.” Chemical Physics Letters 168 (1): 56–62.
Zhai, Huanchen, Seunghoon Lee, Zhi-Hao Cui, Lili Cao, Ulf Ryde, and Garnet Kin-Lic Chan. 2023. “Multireference Protonation Energetics of a Dimeric Model of Nitrogenase Iron–Sulfur Clusters.” The Journal of Physical Chemistry A 127 (47): 9974–84.
Zhai, Huanchen, Chenghan Li, Xing Zhang, Zhendong Li, Seunghoon Lee, and Garnet Kin-Lic Chan. 2026. “Classical Solution of the FeMo-Cofactor Model to Chemical Accuracy and Its Implications.” arXiv Preprint arXiv:2601.04621.
I’ve had a bit more time to dig into the paper I mentioned last week, where OpenAI collaborated with amplitudes researchers, using one of their internal models to find and prove a simplified version of a particle physics formula. I figured I’d say a bit about my own impressions from reading the paper and OpenAI’s press release.
This won’t be a real “deep dive”, though it will be long nonetheless. As it turns out, most of the questions I’d like answers to aren’t answered in the paper or the press release. Getting them will involve actual journalistic work, i.e. blocking off time to interview people, and I haven’t done that yet. What I can do is talk about what I know so far, and what I’m still wondering.
Context:
Scattering amplitudes are formulas used by particle physicists to make predictions. For a while, people would just calculate these when they needed them, writing down pages of mess that you could plug numbers into to get answers. However, forty years ago two physicists decided they wanted more, writing “we hope to obtain a simplified form for the answer, making our result not only an experimentalist’s, but a theorist’s delight.”
In their next paper, they managed to find that “theorist’s delight”: a simplified, intuitive-looking answer that worked for calculations involving any number of particles, summarizing many different calculations. Ten years later, a few people had started building on it, and ten years after that, the big shots started paying attention. A whole subfield, “amplitudeology”, grew from that seed, finding new forms of “theorist’s delight” in scattering amplitudes.
Each subfield has its own kind of “theory of victory”, its own concept for what kind of research is most likely to yield progress. In amplitudes, it’s these kinds of simplifications. When they work out well, they yield new, more efficient calculation techniques, yielding new messy results which can be simplified once more. To one extent or another, most of the field is chasing after those situations when simplification works out well.
That motivation shapes both the most ambitious projects of senior researchers, and the smallest student projects. Students often spend enormous amounts of time looking for a nice formula for something and figuring out how to generalize it, often on a question suggested by a senior researcher. These projects mostly serve as training, but occasionally manage to uncover something more impressive and useful, an idea others can build around.
I’m mentioning all of this, because as far as I can tell, what ChatGPT and the OpenAI internal model contributed here roughly lines up with the roles students have on amplitudes papers. In fact, it’s not that different from the role one of the authors, Alfredo Guevara, had when I helped mentor him during his Master’s.
Senior researchers noticed something unusual, suggested by prior literature. They decided to work out the implications, did some calculations, and got some messy results. It wasn’t immediately clear how to clean up the results, or generalize them. So they waited, and eventually were contacted by someone eager for a research project, who did the work to get the results into a nice, general form. Then everyone published together on a shared paper.
How impressed should you be?
I said, “as far as I can tell” above. What’s annoying is that this paper makes it hard to tell.
If you read through the paper, they mention AI briefly in the introduction, saying they used GPT-5.2 Pro to conjecture formula (39) in the paper, and an OpenAI internal model to prove it. The press release actually goes into more detail, saying that the humans found formulas (29)-(32), and GPT-5.2 Pro found a special case where it could simplify them to formulas (35)-(38), before conjecturing (39). You can get even more detail from an X thread by one of the authors, OpenAI Research Scientist Alex Lupsasca. Alex had done his PhD with another one of the authors, Andrew Strominger, and was excited to apply the tools he was developing at OpenAI to his old research field. So they looked for a problem, and tried out the one that ended up in the paper.
What is missing, from the paper, press release, and X thread, is any real detail about how the AI tools were used. We don’t have the prompts, or the output, or any real way to assess how much input came from humans and how much from the AI.
Contra some commentators, I don’t think the authors are being intentionally vague here. They’re following business as usual. In a theoretical physics paper, you don’t list who did what, or take detailed account of how you came to the results. You clean things up, and create a nice narrative. This goes double if you’re aiming for one of the most prestigious journals, which tend to have length limits.
This business-as-usual approach is ok, if frustrating, for the average physics paper. It is, however, entirely inappropriate for a paper showcasing emerging technologies. For a paper that was going to be promoted this heavily by OpenAI, the question of how they reached their conclusion is much more interesting than the results themselves. And while I wouldn’t ask them to go to the standards of an actual AI paper, with ablation analysis and all that jazz, they could at least have aimed for the level of detail of my final research paper, which gave samples of the AI input and output used in its genetic algorithm.
For the moment, then, I have to guess what input the AI had, and what it actually accomplished.
Let’s focus on the work done by the internal OpenAI model. The descriptions I’ve seen suggest that it started where GPT-5.2 Pro did, with formulas (29)-(32), but with a more specific prompt that guided what it was looking for. It then ran for 12 hours with no additional input, and both conjectured (39) and proved it was correct, providing essentially the proof that follows formula (39) in the paper.
Given that, how impressed should we be?
First, the model needs to decide to go to a specialized region, instead of trying to simplify the formula in full generality. I don’t know whether they prompted their internal model explicitly to do this. It’s not something I’d expect a student to do, because students don’t know what types of results are interesting enough to get published, so they wouldn’t be confident in computing only a limited version of a result without an advisor telling them it was ok. On the other hand, it is actually something I’d expect an LLM to be unusually likely to do, as a result of not managing to consistently stick to the original request! What I don’t know is whether the LLM proposed this for the right reason: that if you have the formula for one region, you can usually find it for other regions.
Second, the model needs to take formulas (29)-(32), write them in the specialized region, and simplify them to formulas (35)-(38). I’ve seen a few people saying you can do this pretty easily with Mathematica. That’s true, though not every senior researcher is comfortable doing that kind of thing, as you need to be a bit smarter than just using the Simplify[] command. Most of the people on this paper strike me as pen-and-paper types who wouldn’t necessarily know how to do that. It’s definitely the kind of thing I’d expect most students to figure out, perhaps after a couple of weeks of flailing around if it’s their first crack at it. The LLM likely would not have used Mathematica, but would have used SymPy, since these “AI scientist” setups usually can write and execute Python code. You shouldn’t think of this as the AI reasoning through the calculation itself, but it at least sounds like it was reasonably quick at coding it up.
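For a flavor of what that step looks like in practice, here is the kind of computer-algebra workflow involved, in SymPy, with a toy rational expression standing in for the actual formulas:

import sympy as sp

x, y = sp.symbols("x y", positive=True)

# A deliberately messy expression standing in for formulas (29)-(32):
messy = (x**3 - y**3) / (x - y) - x * y

# Plain simplification already helps for rational functions:
print(sp.simplify(messy))        # x**2 + y**2

# But one often has to guide the system, e.g. by specializing to a
# degenerate point (here y -> x) before simplifying, the analogue of
# working in a special kinematic region:
print(sp.limit(messy, y, x))     # 2*x**2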
Then, the model needs to conjecture formula (39). This gets highlighted in the intro, but as many have pointed out, it’s pretty easy to do. If any non-physicists are still reading at this point, take a look:
Could you guess (39) from (35)-(38)?
After that, the paper goes over the proof that formula (39) is correct. Most of this proof isn’t terribly difficult, but the way it begins is actually unusual in an interesting way. The proof uses ideas from time-ordered perturbation theory, an old-fashioned way to do particle physics calculations. Time-ordered perturbation theory isn’t something any of the authors are known for using with regularity, but it has recently seen a resurgence in another area of amplitudes research, showing up for example in papers by Matthew Schwartz, a colleague of Strominger at Harvard.
If a student of Strominger came up with an idea drawn from time-ordered perturbation theory, that would actually be pretty impressive. It would mean that, rather than just learning from their official mentor, this student was talking to other people in the department and broadening their horizons, showing a kind of initiative that theoretical physicists value a lot.
From an LLM, though, this is not impressive in the same way. The LLM was not trained by Strominger, it did not learn specifically from Strominger’s papers. Its context suggested it was working on an amplitudes paper, and it produced an idea which would be at home in an amplitudes paper, just a different one than the one it was working on.
While not impressive, that capability may be quite useful. Academic subfields can often get very specialized and siloed. A tool that suggests ideas from elsewhere in the field could help some people broaden their horizons.
Overall, it appears that the twelve-hour OpenAI internal model run reproduced roughly what an unusually bright student would be able to contribute over the course of a several-month project. Like most student projects, you could find a senior researcher who could do the project much faster, maybe even faster than the LLM. But it’s unclear whether any of the authors could have: different senior researchers have different skillsets.
A stab at implications:
If we take all this at face-value, it looks like OpenAI’s internal model was able to do a reasonably competent student project with no serious mistakes in twelve hours. If they started selling that capability, what would happen?
If it’s cheap enough, you might wonder if professors would choose to use the OpenAI model instead of hiring students. I don’t think this would happen, though: I think it misunderstands why these kinds of student projects exist in a theoretical field. Professors sometimes use students to get results they care about, but more often, the student’s interest is itself the motivation, with the professor wanting to educate someone, to empire-build, or just to take on their share of the department’s responsibilities. AI is only useful for this insofar as AI companies continue reaching out to these people to generate press releases: once this is routinely possible, the motivation goes away.
More dangerously, if it’s even cheaper, you could imagine students being tempted to use it. The whole point of a student project is to train and acculturate the student, to get them to the point where they have affection for the field and the capability to do more impressive things. You can’t skip that, but people are going to be tempted to.
And of course, there is the broader question of how much farther this technology can go. That’s the hardest to estimate here, since we don’t know the prompts used. So I don’t know if seeing this result tells us anything more about the bigger picture than we knew going in.
Remaining questions:
At the end of the day, there are a lot of things I still want to know. And if I do end up covering this professionally, they’re things I’ll ask.
What was the prompt given to the internal model, and how much did it do based on that prompt?
Was it really done in one shot, no retries or feedback?
How much did running the internal model cost?
Is this result likely to be useful? Are there things people want to calculate that this could make easier? Recursion relations it could seed? Is it useful for SCET somehow?
How easy would it have been for the authors to do what the LLM did? What about other experts in the community?
Agent frameworks are popular. (These are frameworks for coordinating large language model agents, not to be confused with agent-based modelling in the simulation sense.) There are dozens of them for wrapping large language models in something called an agent and assembling groups of agents into workflows. Much of the surrounding discussion is marketing, but the underlying intuition is old: your web browser identifies itself as a user agent. What is new is the capability that generative language models bring.
The moment you have one agent, you can have more than one. That much is obvious. How to coordinate them is not. The existing frameworks (n8n, LangGraph, CrewAI, and others) are engineering solutions, largely ad hoc. Some, like LangGraph, involve real thinking about state machines and concurrency. But none draws on what we know from mathematics and computer science about typed composition, protocol specification, or structural guarantees for concurrent systems.
This matters because failure is expensive. Multi-agent systems are complicated concurrent programs. Without structural guardrails, they fail in ways you discover only after spending the compute. A job can go off the rails, and the money you paid for it is wasted; the providers will happily take it regardless. At current subscription rates the cost is hidden, but a recent Forbes investigation found that a heavy user of Anthropic’s $200/month Claude Code subscription can consume up to $5,000/month measured at retail API rates. For third-party tools like Cursor, which pay close to those retail rates, these costs are real. Wasted tokens are wasted money.
To address this, we built a language called plumbing. It describes how agents connect and communicate, in such a way that the resulting graph can be checked before execution: checked for well-formedness and, within limits, for deadlocks and similar properties. It is a statically typed language, and these checks are done formally. There is a compiler and a runtime for this language: working code, not a paper architecture. In a few lines of plumbing, you can describe agent systems with feedback loops, runtime parameter modulation, and convergence protocols, and be sure they are well-formed before they run. This post explains how it works.
The name has a history in computing. Engineers have always talked informally about plumbing to connect things together: bits of software, bits of network infrastructure. When I was a network engineer I sometimes described myself as a glorified plumber. The old Solaris ifconfig command took plumb as an argument, to wire a network interface into the stack. Plan 9 had a deeper version of the same idea. The cultural connection goes back decades.
This is the first of two posts. This one introduces the plumbing calculus: what it is, how it works, and a few simple examples, with motifs for adversarial review, ensemble reasoning, and synthesis. The second post will tackle something harder.
The calculus
The plumbing language is built on a symmetric monoidal category, specifically a copy-discard category with some extra structure. The terminology may be unfamiliar, but the underlying concept is not. Engineers famously like Lego. Lego bricks have studs on top and holes with flanged tubes underneath. The studs of one brick fit into the tubes of another. But Lego has more than one connection type: there are also holes through the sides of Technic bricks, and axles that fit through them, and articulated ball joints for the fancier kits. Each connection type constrains what can attach to what. This is typing.
In plumbing, the objects of the category are typed channels: streams that carry a potentially infinite sequence of values, each of a specific type (integer, string, a record type, or something more complex). We write !A to mean "a stream of As", so !string is a stream of strings and !int is a stream of integers. The morphisms, which describe how you connect channels together, are processes. A process has typed inputs and typed outputs.
There are four structural morphisms. Copy takes a stream and duplicates it: the same values appear on two output streams. Discard throws values away, perhaps the simplest thing you can do with a stream, and often needed. These two, together with the typed channels and the laws of the category, give us a copy-discard category.
To this we add two more. Merge takes two streams of the same type and interleaves them onto a single output stream. This is needed because a language model’s input is a single stream. There is nothing to be done about that. If you want to send two different things into it, you must send one and then the other. One might initially give merge the type !A ⊗ !B → !(A + B), taking two streams of different types and producing their coproduct. This works, but it is unnecessarily asymmetrical.
As Tobias Fritz has observed, it is cleaner to do the coproduct injection first, converting each stream to the coproduct type separately, and then merge streams that already have the same type. This gives:
merge : !A ⊗ !A → !(A + A)
Barrier takes two streams, which may be of different types, and synchronises them. Values arrive unsynchronised; the barrier waits for one value from each stream and produces a pair.
barrier : !A ⊗ !B → !(A, B)
(A mathematician would write A × B for the product. We cannot easily do this in a computer language because there is no × symbol on most keyboards, so we use (A, B) for the product, following Haskell’s convention.)
This is a synchronisation primitive. It is important because it unlocks session types, which we will demonstrate in the second post.
Two further morphisms are added to the category (they are not derivable from the structural ones, but are needed to build useful things): map, which applies a pure function to each value in a stream, and filter, which removes values that do not satisfy a predicate. Both are pure functions over streams. Both will be familiar from functional programming.
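To make the stream semantics concrete, here is a rough Python analogue of these six morphisms, with streams as iterators. This is only an illustration of the intended behaviour, not how the plumbing runtime is implemented (in particular, a real merge interleaves by arrival time rather than strictly alternating):

import itertools
from typing import Callable, Iterator, Tuple

def copy(s: Iterator) -> Tuple[Iterator, Iterator]:
    """Duplicate a stream: the same values appear on both outputs."""
    return itertools.tee(s, 2)

def discard(s: Iterator) -> None:
    """Throw the values away (here: simply never consume them)."""
    return None

def merge(a: Iterator, b: Iterator) -> Iterator:
    """Interleave two same-typed streams onto one output stream."""
    for x, y in zip(a, b):
        yield x
        yield y

def barrier(a: Iterator, b: Iterator) -> Iterator:
    """Synchronise: wait for one value from each stream, emit the pair."""
    return zip(a, b)

def map_(f: Callable, s: Iterator) -> Iterator:
    """Apply a pure function to each value in a stream."""
    return map(f, s)

def filter_(p: Callable, s: Iterator) -> Iterator:
    """Drop values that do not satisfy a predicate."""
    return filter(p, s)

left, right = copy(iter(range(4)))
print(list(barrier(map_(str, left), right)))
# [('0', 0), ('1', 1), ('2', 2), ('3', 3)]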
Here is a graphical representation of the morphisms. We can glue them together freely, as long as the types and the directions of the arrows match up.
There are two forms of composition. Sequential composition connects morphisms nose to tail, the output of one feeding the input of the next. Parallel composition places them side by side, denoted by ⊗ (the tensor product, written directly in plumbing source code). So: four structural morphisms, two utilities, two compositional forms, all operating on typed channels.
Because the channels are typed, the compiler can check statically, at compile time, that every composition is well-formed: that outputs match inputs at every boundary. This gives a guarantee that the assembled graph makes sense.
A composition of morphisms is itself a morphism. This follows from the category laws (it has to, or it is not a category) but the practical consequence is worth stating explicitly. We can assemble a subgraph of agents and structural morphisms, and then forget the internal detail and use the entire thing as a single morphism in a larger graph. This gives modularity. We can study, test, and refine a building block in isolation, and once satisfied, use it as a component of something bigger.
What we have described so far is the static form of the language: concise, point-free (composing operations without naming intermediate values), all about compositions. This is what you write. It is not what the runtime executes. A compiler takes this static form and produces the underlying wiring diagram, expanding the compositions into explicit connections between ports. The relationship is similar to point-free style in functional programming: the concise form is good for thinking and writing; the expanded form is good for execution.
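The check itself is the ordinary typing rule for composition: f : A → B composes sequentially with g : B → C only when the middle types agree, and the tensor product simply concatenates input and output types. A toy version of that check in Python (far simpler than the real compiler, and purely illustrative):

from dataclasses import dataclass

@dataclass(frozen=True)
class Morphism:
    name: str
    inputs: tuple    # channel types, e.g. ("!string",)
    outputs: tuple

def seq(f: Morphism, g: Morphism) -> Morphism:
    """Sequential composition: outputs of f must match inputs of g."""
    if f.outputs != g.inputs:
        raise TypeError(f"{f.name} outputs {f.outputs}, "
                        f"but {g.name} expects {g.inputs}")
    return Morphism(f"({f.name} ; {g.name})", f.inputs, g.outputs)

def tensor(f: Morphism, g: Morphism) -> Morphism:
    """Parallel composition: place morphisms side by side."""
    return Morphism(f"({f.name} ⊗ {g.name})",
                    f.inputs + g.inputs, f.outputs + g.outputs)

composer = Morphism("composer", ("!string",), ("!string",))
checker = Morphism("checker", ("!string",), ("!Verdict",))
print(seq(composer, checker))   # well-formed pipeline
# seq(checker, composer) would raise: !Verdict does not match !string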
Agents
An agent is a special kind of morphism. It takes typed input and produces typed output, like any other morphism, and we can enforce these types. This much is a well-known technique; PydanticAI and the Vercel AI SDK do it. Agents implement typing at the language model level by producing and consuming JSON, and we can check that the JSON has the right form. This is the basis of the type checking.
Unlike the structural morphisms and utilities, an agent is stateful. It has a conversation history, a context window that fills up, parameters that change. You cannot sensibly model an agent as a pure function. You could model it using the state monad or lenses, and that would be formally correct, but it is the wrong level of abstraction for engineering. Instead, we allow ourselves to think of agents as opaque processes with a typed protocol for interacting with them. We mutate their state through that protocol, and we know how to do that purely from functional programming and category theory. The protocol is the right abstraction; the state management is an implementation detail behind it. How this works in practice, and what happens when it goes wrong, is the subject of the second post.
In addition to their main input and output ports, agents in plumbing have control ports (control in and control out) for configuring the agent at runtime. For example, the temperature parameter governs how creative a language model is: how wide its sampling distribution is when choosing output. At zero it is close to deterministic; at one it becomes much less predictable. A control message might say set temperature to 0.3; the response on the control out wire might be acknowledged. The control port carries a typed stream like anything else.
Agents also have ports for operator-in-the-loop (often called human-in-the-loop, though there is no reason an operator must be human), tool calls, and telemetry. The telemetry port emits usage statistics and, if the underlying model supports it, thinking traces. We will not detail these here. Suffice it to say that an agent has several pairs of ports beyond what you might imagine as its regular chat input and output.
An agent has many ports, but most programs use only a few of them. We adopt a convention from the κ calculus: don’t care, don’t write. Any output port that is not mentioned in the program is implicitly connected to discard. If a port’s output cannot matter, there is no reason to write it down.
Example: adversarial document composition
Suppose the problem is to write a cover letter for a job application. You provide some background material (a CV, some notes, some publications) and a job advert. You want a network of agents to produce a good cover letter. A good cover letter has two constraints: it must be accurate, grounded in the source materials, not making things up; and it must be compelling, so that the reader wants to give you an interview.
These two constraints are in tension, and they are best served by different agents with different roles. A composer drafts from the source materials. A checker verifies the draft against those materials for accuracy, producing a verdict: pass or fail, with commentary. A critic, who deliberately cannot see the source materials, evaluates whether the result is compelling on its own terms, producing a score.
The feedback loops close the graph. If the checker rejects the draft, its commentary goes back to the composer. If the critic scores below threshold, its review goes back to the composer. Only when the critic is satisfied does the final draft emerge.
And here is a graphical representation of what’s going on:
The agent configuration is elided. The main pipeline takes a string input and produces a string output. It is itself a morphism, and could be used as a component in something larger.
Notice what the wiring enforces. The critic receives verdicts, not the original source materials. The information partition is a consequence of the types, not an instruction in a prompt. The feedback loops are explicit: a failed verdict routes back to the composer with commentary; a low score routes back with the review. All of this is checked at compile time.
Example: heated debate
The previous example shows sequential composition and feedback loops but not parallel composition. An ensemble of agents running simultaneously on the same input needs the tensor product.
Ensembles are common. Claude Code spawns sub-agents in parallel to investigate or review, then gathers the results. This is a scatter-gather pattern familiar from high-performance computing.
But this example, due to Vincent Danos, adds something less common: modulation of agent behaviour through the control port.
The input is a proposition. Two agents debate it, one advocating and one sceptical, running in parallel via the tensor product. Their outputs are synchronised by a barrier into a pair and
presented to a judge. The judge decides: has the debate converged? If so, a verdict goes to the output. If not, a new topic goes back to the debaters, and a temperature goes to their control inputs.
The intuition is that the debaters should start creative (high temperature, wide sampling) and become progressively more focused as the rounds continue. The judge controls this. Each round, the
judge decides both whether to continue and how volatile the next round should be. If the debate appears to be converging, the judge lowers the temperature, preventing the system from wandering
off in new directions. Whether this actually causes convergence is a research question, not a proven result.
type Verdict = { resolved: bool, verdict: string,
topic: string, heat: number }
type Control = { set_temp: number }
let advocate : (!string, !Control) -> !string = agent { ... }
let skeptic : (!string, !Control) -> !string = agent { ... }
let judge : !(string, string) -> !Verdict = agent { ... }
let cool : !Verdict -> !Control = map({set_temp: heat})
let main : !string -> !string = plumb(input, output) {
input ; (advocate ⊗ skeptic) ; barrier ; judge
judge ; filter(resolved = false).topic ; (advocate ⊗ skeptic)
judge ; filter(resolved = true).verdict ; output
judge ; cool ; (advocate@ctrl_in ⊗ skeptic@ctrl_in)
}
And here is the graphical representation:
The ⊗ operator is the tensor product: parallel composition. (The grammar also accepts * for editors that cannot input unicode.) The advocate and skeptic run simultaneously on the same input. The barrier synchronises their outputs into a pair for the judge. The last line is the control feedback: the judge’s verdict is mapped to a temperature setting and sent to both agents’ control inputs. Notice that advocate@ctrl_in addresses a specific port on the agent, the control port rather than the main input.
This is a small program. It is also a concurrent system with feedback loops, runtime parameter modulation, and a convergence protocol. Without types, getting the wiring right would be a matter of testing and hope. With types, it is checked before it runs.
What this shows
In a few lines of code, with a language that has categorical foundations, we can capture interesting agent systems and be sure they are well-formed before they run.
The upshot: when we have guarantees about well-formedness, systems work more stably and more predictably. With static typing, entire classes of structural errors are impossible. You cannot wire an output of one type to an input of another. You cannot forget a connection. The job you pay for is more likely to actually work, and you get more useful work per dollar spent. Runtime budget controls can put a ceiling on cost, but they do not prevent the waste. Static typing prevents the waste. But there is a lot more to do. What we have so far is already useful as a language for constructing agent graphs with static type checking. But we have given short shrift to the complexity and internal state of the agent morphism, which is really all about memory architecture and context management. That is where the real power comes from. For that we need more than a copy-discard category with some extra structure. We need protocols—and that is the subject of the sequel, soon to appear here.
The plumbing compiler, runtime, and MCP server are available as binary downloads for macOS and Linux:
How category theory can be used to help coordinate collections of interacting large language models.
Agent frameworks are popular. (These are frameworks for coordinating large language model agents, not to be confused with agent-based modelling in the simulation sense.) There are dozens of them for wrapping large language models in something called an agent and assembling groups of agents into workflows. Much of the surrounding discussion is marketing, but the underlying intuition is old: your web browser identifies itself as a user agent. What is new is the capability that generative language models bring.
The moment you have one agent, you can have more than one. That much is obvious. How to coordinate them is not. The existing frameworks (n8n, LangGraph, CrewAI, and others) are engineering solutions, largely ad hoc. Some, like LangGraph, involve real thinking about state machines and concurrency. But none draws on what we know from mathematics and computer science about typed composition, protocol specification, or structural guarantees for concurrent systems.
This matters because it is expensive. Multi-agent systems are complicated concurrent programs. Without structural guardrails, they fail in ways you discover only after spending the compute. A job can go off the rails, and the money you paid for it is wasted; the providers will happily take it regardless. At current subscription rates the cost is hidden, but a recent Forbes investigation found that a heavy user of Anthropic’s $200/month Claude Code subscription can consume up to $5,000/month measured at retail API rates. For third-party tools like Cursor, which pay close to those retail rates, these costs are real. Wasted tokens are wasted money.
To address this, we built a language called plumbing. It describes how agents connect and communicate, in such a way that the resulting graph can be checked before execution: checked for well-formedness, and within limits for deadlocks and similar properties. It is a statically typed language, and these checks are done formally. There is a compiler and a runtime for this language: working code, not a paper architecture. In a few lines of plumbing, you can describe agent systems with feedback loops, runtime parameter modulation, and convergence protocols, and be sure they are well-formed before they run. This post explains how it works.
The name has a history in computing. Engineers have always talked informally about plumbing to connect things together: bits of software, bits of network infrastructure. When I was a network engineer I sometimes described myself as a glorified plumber. The old Solaris ifconfig command took plumb as an argument, to wire a network interface into the stack. Plan 9 had a deeper version of the same idea. The cultural connection goes back decades.
This is the first of two posts. This one introduces the plumbing calculus: what it is, how it works, and a few simple examples, with motifs for adversarial review, ensemble reasoning, and synthesis. The second post will tackle something harder.
The calculus
The plumbing language is built on a symmetric monoidal category, specifically a copy-discard category with some extra structure. The terminology may be unfamiliar, but the underlying concept is not. Engineers famously like Lego. Lego bricks have studs on top and holes with flanged tubes underneath. The studs of one brick fit into the tubes of another. But Lego has more than one connection type: there are also holes through the sides of Technic bricks, and axles that fit through them, and articulated ball joints for the fancier kits. Each connection type constrains what can attach to what. This is typing.
In plumbing, the objects of the category are typed channels: streams that carry a potentially infinite sequence of values, each of a specific type (integer, string, a record type, or something more complex). We write !A to mean "a stream of As", so !string is a stream of strings and !int is a stream of integers. The morphisms, which describe how you connect channels together, are processes. A process has typed inputs and typed outputs.
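For example, !(string, string) is a stream of pairs, and given a record type declared as type Score = { value: number }, !Score is a stream of Score records. (The pair and record syntax here is the one used later in this post; the Score type is illustrative, not from a real program.)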
There are four structural morphisms. Copy takes a stream and duplicates it: the same values appear on two output streams. Discard throws values away, perhaps the simplest thing you can do with a stream, and often needed. These two, together with the typed channels and the laws of the category, give us a copy-discard category.
To this we add two more. Merge takes two streams of the same type and interleaves them onto a single output stream. This is needed because a language model’s input is a single stream. There is nothing to be done about that. If you want to send two different things into it, you must send one and then the other. One might initially give merge the type !A ⊗ !B → !(A + B), taking two streams of different types and producing their coproduct. This works, but it is unnecessarily asymmetrical.
As Tobias Fritz has observed, it is cleaner to do the coproduct injection first, converting each stream to the coproduct type separately, and then merge streams that already have the same type. This gives:
merge : !A ⊗ !A → !(A + A)
Barrier takes two streams, which may be of different types, and synchronises them. Values arrive unsynchronised; the barrier waits for one value from each stream and produces a pair.
barrier : !A ⊗ !B → !(A, B)
(A mathematician would write A × B for the product. We cannot easily do this in a computer language because there is no × on most keyboards, so we use (A, B) for the product, following Haskell’s convention.)
This is a synchronisation primitive. It is important because it unlocks session types, which we will demonstrate in the second post.
Two further morphisms are added to the category (they are not derivable from the structural ones, but are needed to build useful things): map, which applies a pure function to each value in a stream, and filter, which removes values that do not satisfy a predicate. Both are pure functions over streams. Both will be familiar from functional programming.
Here is a graphical representation of the morphisms. We can glue them together freely, as long as the types and the directions of the arrows match up.
There are two forms of composition. Sequential composition connects morphisms nose to tail, the output of one feeding the input of the next. Parallel composition places them side by side, denoted by ⊗ (the tensor product, written directly in plumbing source code). So: four structural morphisms, two utilities, two compositional forms, all operating on typed channels.
Because the channels are typed, the compiler can check statically, at compile time, that every composition is well-formed: that outputs match inputs at every boundary. This gives a guarantee that the assembled graph makes sense.
A composition of morphisms is itself a morphism. This follows from the category laws (it has to, or it is not a category) but the practical consequence is worth stating explicitly. We can assemble a subgraph of agents and structural morphisms, and then forget the internal detail and use the entire thing as a single morphism in a larger graph. This gives modularity. We can study, test, and refine a building block in isolation, and once satisfied, use it as a component of something bigger.
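As a hypothetical sketch, using the plumb syntax introduced later in this post (the names draft and review are illustrative): a subgraph defined once can be dropped into a larger composition exactly as if it were a single morphism.

let draft : !string -> !string = agent { ... }
let review : !string -> !string = plumb(input, output) { ... }

let pipeline : !string -> !string = plumb(input, output) {
  input ; draft ; review ; output
}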
What we have described so far is the static form of the language: concise, point-free (composing operations without naming intermediate values), all about compositions. This is what you write. It is not what the runtime executes. A compiler takes this static form and produces the underlying wiring diagram, expanding the compositions into explicit connections between ports. The relationship is similar to point-free style in functional programming: the concise form is good for thinking and writing; the expanded form is good for execution.
Agents
An agent is a special kind of morphism. It takes typed input and produces typed output, like any other morphism, and we can enforce these types. This much is a well-known technique; PydanticAI and the Vercel AI SDK do it. Agents implement typing at the language model level by producing and consuming JSON, and we can check that the JSON has the right form. This is the basis of the type checking.
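In plumbing’s own notation, sketched hypothetically (the type and agent names are not from a real program), declaring an agent with a record output type is declaring the JSON shape the model must produce, and the shape the runtime checks:

type Summary = { text: string, confidence: number }
let summarise : !string -> !Summary = agent { ... }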
Unlike the structural morphisms and utilities, an agent is stateful. It has a conversation history, a context window that fills up, parameters that change. You cannot sensibly model an agent as a pure function. You could model it using the state monad or lenses, and that would be formally correct, but it is the wrong level of abstraction for engineering. Instead, we allow ourselves to think of agents as opaque processes with a typed protocol for interacting with them. We mutate their state through that protocol, and we know how to do that purely from functional programming and category theory. The protocol is the right abstraction; the state management is an implementation detail behind it. How this works in practice, and what happens when it goes wrong, is the subject of the second post.
In addition to their main input and output ports, agents in plumbing have control ports (control in and control out) for configuring the agent at runtime. For example, the temperature parameter governs how creative a language model is: how wide its sampling distribution is when choosing output. At zero it is close to deterministic; at one it becomes much less predictable. A control message might say set temperature to 0.3; the response on the control out wire might be acknowledged. The control port carries a typed stream like anything else.
Agents also have ports for operator-in-the-loop (often called human-in-the-loop, though there is no reason an operator must be human), tool calls, and telemetry. The telemetry port emits usage statistics and, if the underlying model supports it, thinking traces. We will not detail these here. Suffice it to say that an agent has several pairs of ports beyond what you might imagine as its regular chat input and output.
An agent has many ports, but most programs use only a few of them. We adopt a convention from the κ calculus: don’t care, don’t write. Any output port that is not mentioned in the program is implicitly connected to discard. If a port’s output cannot matter, there is no reason to write it down.
Example: adversarial document composition
Suppose the problem is to write a cover letter for a job application. You provide some background material (a CV, some notes, some publications) and a job advert. You want a network of agents to produce a good cover letter. A good cover letter has two constraints: it must be accurate, grounded in the source materials, not making things up; and it must be compelling, so that the reader wants to give you an interview.
These two constraints are in tension, and they are best served by different agents with different roles. A composer drafts from the source materials. A checker verifies the draft against those materials for accuracy, producing a verdict: pass or fail, with commentary. A critic, who deliberately cannot see the source materials, evaluates whether the result is compelling on its own terms, producing a score.
The feedback loops close the graph. If the checker rejects the draft, its commentary goes back to the composer. If the critic scores below threshold, its review goes back to the composer. Only when the critic is satisfied does the final draft emerge.
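The code listing for this example is elided here, but a simplified, hypothetical sketch in the syntax of the debate example below might read as follows (the type names, field names, and the 0.8 threshold are all illustrative, and the routing of the source materials to the checker is omitted for brevity):

type Check = { pass: bool, commentary: string, draft: string }
type Review = { score: number, review: string, draft: string }

let composer : !string -> !string = agent { ... }
let checker : !string -> !Check = agent { ... }
let critic : !string -> !Review = agent { ... }

let main : !string -> !string = plumb(input, output) {
  input ; composer ; checker
  checker ; filter(pass = false).commentary ; composer
  checker ; filter(pass = true).draft ; critic
  critic ; filter(score < 0.8).review ; composer
  critic ; filter(score >= 0.8).draft ; output
}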
And here is a graphical representation of what’s going on:
The agent configuration is elided. The main pipeline takes a string input and produces a string output. It is itself a morphism, and could be used as a component in something larger.
Notice what the wiring enforces. The critic receives verdicts, not the original source materials. The information partition is a consequence of the types, not an instruction in a prompt. The feedback loops are explicit: a failed verdict routes back to the composer with commentary; a low score routes back with the review. All of this is checked at compile time.
Example: heated debate
The previous example shows sequential composition and feedback loops but not parallel composition. An ensemble of agents running simultaneously on the same input needs the tensor product.
Ensembles are common. Claude Code spawns sub-agents in parallel to investigate or review, then gathers the results. This is a scatter-gather pattern familiar from high-performance computing.
But this example, due to Vincent Danos, adds something less common: modulation of agent behaviour through the control port.
The input is a proposition. Two agents debate it, one advocating and one sceptical, running in parallel via the tensor product. Their outputs are synchronised by a barrier into a pair and presented to a judge. The judge decides: has the debate converged? If so, a verdict goes to the output. If not, a new topic goes back to the debaters, and a temperature goes to their control inputs.
The intuition is that the debaters should start creative (high temperature, wide sampling) and become progressively more focused as the rounds continue. The judge controls this. Each round, the judge decides both whether to continue and how volatile the next round should be. If the debate appears to be converging, the judge lowers the temperature, preventing the system from wandering off in new directions. Whether this actually causes convergence is a research question, not a proven result.
type Verdict = { resolved: bool, verdict: string,
                 topic: string, heat: number }
type Control = { set_temp: number }

let advocate : (!string, !Control) -> !string = agent { ... }
let skeptic  : (!string, !Control) -> !string = agent { ... }
let judge    : !(string, string) -> !Verdict = agent { ... }
let cool     : !Verdict -> !Control = map({set_temp: heat})

let main : !string -> !string = plumb(input, output) {
  input ; (advocate ⊗ skeptic) ; barrier ; judge
  judge ; filter(resolved = false).topic ; (advocate ⊗ skeptic)
  judge ; filter(resolved = true).verdict ; output
  judge ; cool ; (advocate@ctrl_in ⊗ skeptic@ctrl_in)
}
And here is the graphical representation:
The ⊗ operator is the tensor product: parallel composition. (The grammar also accepts * for editors that cannot input unicode.) The advocate and skeptic run simultaneously on the same input. The barrier synchronises their outputs into a pair for the judge. The last line is the control feedback: the judge’s verdict is mapped to a temperature setting and sent to both agents’ control inputs. Notice that advocate@ctrl_in addresses a specific port on the agent, the control port rather than the main input.
This is a small program. It is also a concurrent system with feedback loops, runtime parameter modulation, and a convergence protocol. Without types, getting the wiring right would be a matter of testing and hope. With types, it is checked before it runs.
What this shows
In a few lines of code, with a language that has categorical foundations, we can capture interesting agent systems and be sure they are well-formed before they run.
The upshot: when we have guarantees about well-formedness, systems work more stably and more predictably. With static typing, entire classes of structural errors are impossible. You cannot wire an output of one type to an input of another. You cannot forget a connection. The job you pay for is more likely to actually work, and you get more useful work per dollar spent. Runtime budget controls can put a ceiling on cost, but they do not prevent the waste. Static typing prevents the waste.
But there is a lot more to do. What we have so far is already useful as a language for constructing agent graphs with static type checking. But we have given short shrift to the complexity and internal state of the agent morphism, which is really all about memory architecture and context management. That is where the real power comes from. For that we need more than a copy-discard category with some extra structure. We need protocols—and that is the subject of the sequel, soon to appear here.
The plumbing compiler, runtime, and MCP server are available as binary downloads for macOS and Linux:
Your inbox registers an email from the chair of a faculty-hiring committee. With trembling fingers, you click on the message. “We were very impressed…we’re delighted to offer…” Months of labor, soul-searching, strain, and anxiety give way to jubilation. You hug your partner/roommate/mom/dog; throw an impromptu dance party; and forward the email, prefaced with five exclamation points, to your mentor.1
As your heart rate returns to a level less likely to alarm a cardiologist, a new source of uncertainty puckers your brow. You’ve received an offer of a faculty position. What happens now? How should you proceed?
This article will address those questions. It follows my guide to faculty interviews, which follows my guide to writing research statements. Like the former guide, this one pertains most to theoretical physicists seeking assistant professorships at R1-level North American universities. Yet much of the advice pertains to candidates outside this pool as well.
The institution will bring you (and, if relevant, your partner) over for a visit. Yes, you visited to interview; but you’re now visiting for another purpose. Assess whether you and your family could flourish if you accepted the offer. Which neighborhoods might you like to live in? Could you tolerate the commute to campus? Vide infra for more questions to keep in mind.
Politely notify the other hiring committees that interviewed you and that are still considering your application. You’ll do the other committees a kindness: their chances of hiring you have narrowed. If they wish to lure you, they’ll need to act quickly. The notice may bump you up in their priority lists. Did the first institution request that you decide about its offer by some deadline? If so, notify the other institutions.
Gather all the information you need. The department may offer to put you in touch with faculty members, deans, and more. Request more connections if necessary. Approach each conversation with a list of questions, and take notes. How will the tenure process unfold? How do early-career faculty members characterize their experiences with it? To what extent does the department shield early-career faculty from administrative duties (serving on committees)? How do the institutions’ policies address parental leave and elder care? If you have a child as an assistant professor, will your tenure clock pause for a year (will you be able to build your credentials for an extra year before applying for tenure)? In which neighborhoods should you search for a house?
List your priorities. Rank them. Measure each offer against each criterion. Here are example priorities that you might wish to include:
Salary
Startup package
Type of environment: Do you want to live in a city, in the suburbs, or in the country? Do you drive, or would you learn to drive?
Length of commute
Geographical location: Do you prefer to live near family?
Proximity, and means of transportation, to an airport: You might commute to and from that airport many times to participate in conferences, present seminars, etc. How much time and exhaustion would the experience cost?
Local school system: If you have or might have children, where would they learn?
Partner’s needs: Do you have a partner who would need to find a job near yours?
Proximity of faculty with whom you could collaborate
Courtesy positions in other departments: Suppose you’re a physicist who studies quantum computation. You might want to recruit students from the computer-science or math department occasionally. Could you? Would you need a courtesy position in the other department? A courtesy appointment offers you limited privileges at the cost of limited responsibilities: you probably won’t be able to vote in the other department’s faculty meetings. On the other hand, you probably won’t need to spend time on those faculty meetings.
Academic quality of undergraduate/graduate population
Presence of an institute/center dedicated to your specialization
Lab space: location, size, quality, renovations available, how soon and quickly the university would undertake those renovations
Help with finding housing: Some universities have apartments that new faculty can rent for a year or two. Other universities offer real-estate-agent services or help faculty obtain mortgages (*cough* San Francisco Bay area *cough*).
Administrative assistance for you and your research group
Protection from onerous service to the department until you reach tenure
Teaching relief granted en route to tenure: At some universities, a new faculty member can avoid teaching their usual course load during one or two semesters. Such relief frees you to buff up your research program while pursuing tenure.
Deferral: Deferring an offer, you postpone the time at which you take up the new mantle. When I accepted a permanent position, I was completing year two of a three-year postdoctoral fellowship. I wanted to complete the final year before assuming my new role: I was still finishing projects with the community to which I belonged, and I wanted to continue deepening my ties with that community. Also, I enjoyed undertaking research without the distraction of a primary investigator’s administrative responsibilities. Other people defer their start dates for other reasons. For example, a partner might need time to fulfill a contract where they live and work. In my experience, people tend to defer PI positions for approximately twelve months, give or take six months. Some institutions don’t offer deferrals, though.
Identify everything you’ll need in a startup package. A startup package helps cover your research program’s costs until you’ve won your first grants. Multiple organizations within a university might contribute to a startup package—for example, a department and an institute that cuts across departments. The hiring committee might propose a startup package to you, or the committee might ask you what you need. Either way, you can (and should) negotiate the package.
View the negotiation in terms of the question “What do I need to succeed?” List every item, and estimate its cost. Don’t skimp on rigor: estimate prices to the single-dollar level of precision. Such precision helps demonstrate the thoroughness of the research behind your list—helps demonstrate that you need every dollar you seek.
Request more funding than you believe you’ll need, because you will need more than you believe. Build the breathing room into your estimates. For example, assume that your academic visitors will fly from across the country or across the world. Assume that you’ll fly such distances to present talks. Estimating how much you’ll pay a student or postdoc throughout the next few years? Don’t forget that the university might raise salaries and benefits under a cost-of-living adjustment (COLA) every year. COLAs fluctuate across years, so assume you’ll face steep ones.
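To make the compounding concrete (all numbers hypothetical): a postdoc earning $70,000 per year with 30% fringe costs 70,000 × 1.30 = $91,000 in year one. Under a 5% COLA, year two costs about $95,550 and year three about $100,300, so a three-year postdoc line runs roughly $287,000 rather than the 3 × 91,000 = $273,000 that a flat estimate would suggest.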
Here are examples of items that a startup package can include:
Summer salary: Your institution won’t pay your salary during the summer; you’ll need to fund yourself through grants. A startup package can cover the initial summers.
Lab equipment
Computers and tablets for you and your group members: Don’t forget protective cases, AppleCare or a non-Apple equivalent, implements for writing on tablets, external mice, and external monitors. Check whether your department or institute has spare mice or monitors that you can requisition.
Other computational resources: Does the department have a computational cluster that you intend to use? Do you need to access a national lab’s supercomputer or a quantum computer available on the cloud? How much will you pay per unit time and memory?
Postdoc costs: These costs include a salary, benefits, and the cost of moving to your institution. The salary will increase from year to year if the institution implements a COLA. The benefits include healthcare, dental care, and the like. Administrators might call benefits “fringe,” as I discovered after considerable confusion.
Graduate-student costs: These costs include a research assistantship, benefits, and possibly tuition. The salary might increase as a student progresses through the stages of their PhD, particularly once they achieve candidacy. Their need for tuition might change, too. Check whether domestic students cost more than international students, and budget for international students.
Undergraduate researchers: Do you plan to employ an undergraduate during the summer? Throughout the academic year?2
Travel for yourself: Budget several trips per year for yourself. You’ll need to spread the word about your research and to grow your network en route to tenure.
Travel for your postdocs and students: A mentor shared that she covers one conference per year per group member. You might want to budget also for a seminar or two per group member per year.
Visitors: Visitors can boost your research program. Budget for week-long visits if your institution can accommodate them.
Negotiate. Even if your dream school has offered you your dream job. Even if you receive only one offer. You might still garner resources that can help your research program and family to thrive. Don’t feel shy, sheepish, or ashamed to negotiate. If you remain polite and considerate, you won’t offend anyone. Besides, the hiring committee, department chair, and dean expect you to negotiate. The department chair might even hope that you do so; vide infra.
When I was a PhD student, Caltech offered a workshop about negotiation to women grad students. The workshop helped participants build skills, knowledge, and self-assurance that would benefit us when we negotiated contracts. I recommend attending a workshop, taking a course, reading a book, or watching videos about negotiation. Contact your institution’s professional-development office about opportunities and suggestions. If you’re reading this blog post before applying for jobs—any jobs—start now.
What can you negotiate for? Many of the items on your list of priorities. Certain institutions might lack the freedom to negotiate certain items, though. For example, a union might determine salaries. Don’t let such a discovery discourage you; explore the options thoroughly.
View the department chair as an ally. The department chair negotiates on your behalf with administrators higher up in the university hierarchy, such as deans. The chair aims to garner as many resources as possible for you—and, by extension, for their department. Explain to the department chair (or to the committee chair who might explain to the department chair) what you need and why you need it, to help strengthen their argument.
As soon as you know you’ll decline an offer, decline it politely. Your notification will free the committee to attract another candidate. Imagine you’re Candidate #2 on the priority list. Wouldn’t you want the current offer recipient to decline their offer as soon as their conscience allows? Now, imagine you’re the hiring-committee chair. You’re worried that Candidate #1 will decline—and, by the time they decline, other institutions will have snapped up the other top candidates. As Candidate #1, demonstrate toward the committee chair and toward Candidate #2 the consideration that you’d value if in their shoes.
Savor the moment. You’ve just survived the faculty-application process, one of the most stressful periods of your life. The faculty life is no walk in the park, either. Nor will you necessarily sleep soundly between the receipt of your first offer and your signing of a contract. The prospect of more offers could leave you in limbo. If you receive multiple offers, choosing between them—choosing the course of your and your family’s life—may stress you as much as applying did. So remember to feel grateful for the source of your anxiety. Give yourself credit for your accomplishment.
Congratulations!
1Please do! They’ll want to celebrate with you.
2I recommend targeting undergrads who’ll work with you for more than a summer. Training an undergrad takes nearly a summer; you and the student will benefit from having time to take advantage of that training.
One of the simplest quantum systems is a spin-1/2 particle, also known as a spinor. If we measure the angular momentum of a spin-1/2 particle along any axis, there are two possible outcomes: either the angular momentum along that axis is +1/2, or it’s -1/2.
How is it possible for this to be true along every axis? Here I explain this, using the basic rules of quantum physics described last time. In particular, I say how any point on a sphere of radius 1/2 gives a quantum state of the spin-1/2 particle—and vice versa!
Using this, we can understand things like the famous Stern–Gerlach experiment, where we measure the angular momentum of a spin-1/2 particle first along one axis, and then along another.
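For reference, here is the standard parametrization behind that sphere picture (stated here for convenience rather than quoted from the video): the state with spin +1/2 along the axis with polar angle \(\theta\) and azimuthal angle \(\phi\) is

\[ |\psi\rangle = \cos(\theta/2)\,|\uparrow\rangle + e^{i\phi}\sin(\theta/2)\,|\downarrow\rangle, \]

and if we then measure the spin along a second axis at angle \(\theta'\) to the first, the probability of again getting +1/2 is \(\cos^2(\theta'/2)\).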
While we ate our pizza I was talking with AB about differential equations, which they’re about to start doing in calculus. We talked about y’=y. “This is why populations grow exponentially,” I explained, “because their growth is proportional to the existing population, so they satisfy y’ = cy and we can see that a differential equation like that has an exponential as its solution.”
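(For the record, the derivation is one line: \(y' = cy\) gives \(dy/y = c\,dt\), integrating gives \(\ln y = ct + \text{const}\), so \(y(t) = y(0)e^{ct}\).)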
This was met with skepticism. “Why couldn’t you just say we know populations grow exponentially and that’s how you know they have that differential equation?”
My first reaction was to say, “no, it goes the other way,” but then I realized that what AB had asked me was in fact very deep. This two-way traffic sign she was gesturing at is really at the heart of science! Sometimes you have a mechanism and you use mathematics to work out its consequences, including consequences you can’t observe directly. Sometimes you have observations and no mechanism, and then you cherchez la differential equation. In a world where we had no idea how flies were spawned, you could count the number of flies over time and reason that maybe, just maybe, y’ = cy was the governing law, and so somehow each existing fly was emitting new flies at a constant rate.
Once I realized this, I launched into a whole spiel about Newton and his inference of the gravitational differential equation from planetary motion, probably getting most of it slightly wrong, or maybe even totally wrong, but that is a dad’s prerogative sometimes.
Last Thursday, my friend and colleague Sam Baker, in UT Austin’s English department, convened an “emergency panel” here about the developing Pentagon/Anthropic situation, and asked me to speak at it. Even though the situation has continued to develop since then, I thought my prepared remarks for the panel might be of interest. At the bottom, I include a few additional thoughts.
Hi! I’m Scott Aaronson! I teach CS here at UT. While my background is in quantum computing, I’ve spent the past four years dabbling in AI alignment. I did a two-year leave at OpenAI, in their now-defunct Superalignment team. I joined back when OpenAI’s line was “we’re a little nonprofit, doing all this in the greater interest of humanity, and we’d dissolve ourselves before we raced to build an AI that we thought would be dangerous.” I know Sam Altman, and many other current and former OpenAI people. I also know Dario Amodei—in fact, I knew Dario well before Anthropic existed. Despite that, I don’t actually feel like I have deep insight into the current situation with Anthropic and the Pentagon that you wouldn’t get by reading the news, or (especially) reading commentators like Zvi Mowshowitz, Kelsey Piper, Scott Alexander, and Dean Ball. But since I was asked to comment, I’ll try.
The first point I’ll make: the administration’s line, to the extent they’ve had a consistent line, is basically that they needed to cut off Anthropic because Anthropic is a bunch of woke, America-hating, leftist radicals. I think that, if you actually know the Anthropic people, that characterization is pretty laughable. Unless by “woke,” what the administration meant was “having any principles at all, beyond blind deference to authority, and sticking to them.”
I mean, Anthropic only got into this situation in the first place because it was more eager than the other AI companies to support US national security, by providing a version of Claude that could be used on classified networks. So they signed a contract with the Pentagon, and that contract had certain restrictions in it, which the Pentagon read and agreed to … until they decided that they no longer agreed.
That brings me to my second point. The Pentagon regularly signs contracts with private firms that limit what the Pentagon can do in various ways. That’s why they’re called military contract-ors. So anyone who claims it’s totally unprecedented for Anthropic to try to restrict what the government can do with Anthropic’s private property—I think that person is either misinformed or else trying to misinform.
The third point. If the Pentagon felt that it couldn’t abide a private company telling it what is or isn’t an appropriate military use of current AI, then the Pentagon was totally within its rights to cancel its contract with Anthropic, and find a different contractor (like OpenAI…) that would play ball. So it’s crucial for everyone here to understand that that’s not all that the Pentagon did. Instead they said: because Anthropic dared to stand up to us, we’re going to designate them a Supply Chain Risk—a designation that was previously reserved for foreign nation-state adversaries, and that, incredibly, hasn’t been applied to DeepSeek or other Chinese AI companies that arguably do present such risks. So basically, they threatened to destroy Anthropic, by making it horrendously complicated for any companies that do business with the government—i.e., just about all companies—also to do business with Anthropic.
Either that, the Pentagon threatened, or we’ll invoke the Defense Production Act to effectively nationalize Anthropic—i.e., we’ll just commandeer their intellectual property, use it for whatever we want despite Anthropic’s refusal. You get that? Claude is both a supply chain risk that’s too dangerous for the military to use, and somehow also so crucial to the supply chain that we, the military, need to commandeer it.
To me, this is the authoritarian part of what the Pentagon is doing (with the inconsistency being part of the authoritarianism; who but a dictator gets to impose his will on two directly contradictory grounds?). It’s the part that goes against the free-market principles that our whole economy is built on, and the freedom of speech and conscience that our whole civilization is built on. And I think this will ultimately damage US national security, by preventing other American AI companies from wanting to work on defense going forward.
That brings me to the fourth point, about OpenAI. While this was going down, Sam Altman posted online that he agreed with Anthropic’s red lines: LLMs should not be used for killing people with no human in the kill chain, and they also shouldn’t be used for mass surveillance of US citizens. I thought, that’s great! The frontier AI labs are sticking together when the chips are down, rather than infighting.
But then, just a few hours after the Pentagon designated Anthropic a supply chain risk, OpenAI announced that it had reached a deal with the Pentagon. Huh?!? If they have the same red lines, then why can one of them reach a deal while the other can’t?
The experts’ best guess seems to be this: Anthropic said, yes, using AI to kill people autonomously or to surveil US citizens should already be illegal, but we insist on putting those things in the contract to be extra-double-sure. Whereas OpenAI said, the Pentagon can use our models for “all lawful purposes”—this was the language that the Pentagon had insisted on. And, continued OpenAI, we interpret “all lawful purposes” to mean that they can’t cross these red lines. But if it turns out we’re wrong about that … well, that’s not our problem! That’s between the Pentagon and the courts, or whatever.
Again, we don’t fully know, because most of the relevant contracts haven’t been made public, but that’s an inference from reading between the lines of what has been made public.
Back in 2023-2024, when there was the Battle of the Board, then the battle over changing OpenAI’s governance structure, etc., some people formed a certain view of Sam, that he would say all the good and prosocial and responsible things even while he did whichever thing maximized revenue. I’ll leave it to you whether last week’s events are consistent with that view.
OK, fifth and final point. I remember 15-20 years ago, talking to Eliezer Yudkowsky and others terrified about AI. They said, this is the biggest issue facing the world. It’s not safe for anyone to build because it could turn against us, or even before that, the military could commandeer it or whatever. And I and others were like, dude, you guys obviously read too much science fiction!
And now here we are. Not only are we living in a science-fiction story, I’d say we’re living in a particularly hackneyed one. I mean, the military brass marching into a top AI lab and telling the nerds, “tough luck, we own your AI now”? Couldn’t reality have been a little more creative than that?
The point is, given the developments of the past couple weeks, I think we now need to retire forever the argument against future AI scenarios that goes, “sorry, that sounds too much like a science-fiction plot.” As has been said, you’d best get used to science fiction because you’re living in one!
Updates and Further Thoughts: Of course I’ve seen that Anthropic has now filed a lawsuit to block the Pentagon from designating it a supply chain risk, arguing that both its free speech and due process rights were violated. I hope their lawsuit succeeds; it’s hard for me to imagine how it wouldn’t.
The fact that I’m, obviously, on Anthropic’s side of this particular dispute doesn’t mean that I’ll always be on Anthropic’s side. Here as elsewhere, it’s crucial not to outsource your conscience to anyone.
Here is a key quote from Zvi Mowshowitz’s piece on the situation:
[In shutting down Starlink over Ukraine,] Elon Musk actively did the exact thing [the Pentagon is] accusing Anthropic of maybe doing. He made a strategic decision of national security at the highest level as a private citizen, in the middle of an active military operation in an existential defensive shooting war, based on his own read of the situation. Like, seriously, what the actual fuck.
Eventually we bought those services in a contract. We didn’t seize them. We didn’t arrest Musk. Because a contract is a contract is a contract, and your private property is your private property, until Musk decides yours don’t count.
Another key quote in Zvi’s piece, from Gregory Allen:
And here’s the thing. I spent so much of my life in the Department of Defense trying to convince Silicon Valley companies, “Hey, come on in, the water is fine, the defense contracting market, you know, you can have a good life here, just dip your toe in the water”.
And what the Department of Defense has just said is, “Any company that dips their toe in the water, we reserve the right to grab their ankle, pull them all the way in at any time”. And that is such a disincentive to even getting started in working with the DoD.
Lastly, I’d like to address the most common counterargument against Anthropic’s position—as expressed for example by Noah Smith, or in the comments of my previous post on this. The argument goes roughly like so:
You, nerds, are the ones who’ve been screaming for years about AI being potentially existentially dangerous! So then, did you seriously expect to stay in control of the technology? If it’s really as dangerous and important as you say, then of course the military was going to step in at some point and commandeer your new toy, just like it would if you were building a nuclear weapon.
Two immediate responses:
Even in WWII, in one of the most desperate circumstances in human history, the US government didn’t force a single scientist at gunpoint to build nuclear weapons for them. The scientists did so voluntarily, based on their own considered moral judgment at the time (even if some later came to regret their involvement).
Even if I considered it “inevitable” that relatively thoughtful and principled people, like Dario Amodei, would lose control over the future to gleeful barbarians like Pete Hegseth, it still wouldn’t mean I couldn’t complain when it happened. This is still a free country, isn’t it?
It's been an extremely busy time, and there are all kinds of distressing events afoot. Talking about new science results or the funding situation can seem self-indulgent when there are ongoing global events of huge impact. That said, it's important not to lose sight of the humanity in the global physics community. This past weekend, Tony Leggett passed away (wikipedia page here).
[Science digression: 3He atoms are fermions - if you add up the spin angular momentum from the two protons, the neutron, and the two electrons, you end up with a net spin of 1/2. To condense into a superfluid state, by analogy with electrons in superconductors, the 3He atoms need to pair up, and it's the pairs that condense into the superfluid. This pairing ends up being quite complicated; the pair of 3He atoms end up having \(\ell = 1\) orbital angular momentum, and this implies that the nuclear spins of the 3He atoms in the pair have to form a triplet. Prof. Leggett figured out a ton of the insights on this topic - see here for an early paper on this, and here for a definitive review c. 1975.]
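[One step in that digression is worth spelling out; this is the standard argument rather than anything specific to Leggett's papers. The pair wavefunction of two fermions must be antisymmetric under exchange, and the orbital and spin exchange factors multiply as \((-1)^{\ell}(-1)^{S+1} = -1\). Odd \(\ell\) therefore forces the symmetric spin triplet \(S = 1\), just as \(\ell = 0\) pairing in conventional superconductors forces the antisymmetric singlet \(S = 0\).]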
Prof. Leggett made many contributions beyond 3He. For example, he and others studied the problem of a tunneling particle coupled to some dissipative environment (like phonons, say), and similarly of a two-state quantum system coupled to a "bath", as in this paper with several thousand citations. These both had close connections to the "measurement" problem in quantum mechanics - how in detail do you go from a highly quantum system (e.g., a particle tunneling out of a bound state, or a particle coherently oscillating back and forth) and end up with more classical-looking outcomes due to coupling to "baths" with large numbers of degrees of freedom? He was interested in these kinds of foundational quantum issues all the way along (see this 1980 paper) and was still writing about them within the last couple of years. Prof. Leggett also wrote important tutorial reviews of superfluidity and of Bose-Einstein condensation in ultracold gases. When I got to meet him on a trip through Stanford, I was introduced to the ideas that he and Clare Yu developed about tunneling two-level systems in solids - looking at the big question of why the properties of TLS in disordered solids are so universal even though the materials can be very different at the microscopic level. He was a great scientist while also being a kind person.
Ben Recht’s book The Irrational Decision: How We Gave Computers the Power to Choose for Us comes out tomorrow! I was privileged to get my copy early. And if you want an opinionated, informed, no-bullshit take on what optimization actually does and what purpose (and whose purposes) it serves, written by somebody who knows this subject inside and out, this, my readers, is your March reading. Here’s the Amazon purchase page if that’s how you like to get books. And if you’re not already reading Ben’s arg min blog, get thee thither! It’s where all the saltiest action in inference, statistics, and machine learning is going on.
A nice but unfamiliar power-pop number came on in the pizzeria where we were having lunch. “I like this,” I said. “It sounds like it’s from around 2005, kind of a Fountains of Wayne sound.”
Indeed, this song came out in 2005, and it was written by Adam Schlesinger (RIP) of Fountains of Wayne. When I’m good, I’m good!
This is a crash course on the basic principles of quantum physics! In a self-contained way, I explain quantum states and the basic rule for computing probabilities.
It was a fun challenge stripping down everything to the bare minimum. Of course there is much more to say, but I was focused on leaving out everything that was not absolutely essential—to get to the real core of things.
There’s a huge fog of confusion surrounding most popular introductions to quantum mechanics, and I wanted to avoid all that. To do this, we have to use language in a pretty careful way.
Manet’s famous painting Un Bar aux Folies-Bergère never appealed to me. But now I realize its genius, and my spine tingles every time I see it.
The perspective looks all wrong. You’re staring straight at this barmaid, but her reflection in the mirror is way off to the right. Even worse, her reflection is facing a guy who doesn’t appear in the main view!
But in 2000, a researcher named Park showed this perspective is actually possible!!! To prove it, he did a reconstruction of this scene:
Here is Park’s reconstruction of the scene in Manet’s painting. How does it work? In fact the woman is viewed from an angle! While the man cannot be seen directly, his reflection is visible!
This diagram, created by Park with help from Darren McKimm, shows how the perspective works:
We are not directly facing the mirror, and while the man is outside our field of view, his reflection can be seen.
Astounding! But it’s not just a technical feat. It allowed Manet to make a deep point. While the woman seems to be busy serving her customer, she is internally completely detached—perhaps bored, perhaps introspective. She is split.
To fully understand the painting you also need to know that many of the barmaids at the Folies Bergère also served as prostitutes. Standing behind the oranges, the champagne and a bottle of Bass ale, the woman is just as much a commodity as these other things. But she is coldly detached from her objectification.
The woman in the painting was actually a real person, known as Suzon, who worked at the Folies Bergère in the early 1880s. For his painting, Manet posed her in his studio.
Before I understood this painting, I wasn’t really looking at it: I didn’t see it. I didn’t even see the green shoes of the trapeze artist. I can often grasp music quite quickly. But paintings often fail to move me until someone explains them.
When Édouard Manet came out with this painting in 1882, some critics mocked him for his poor understanding of perspective. Some said he was going senile. It was, in fact, his last major painting. But he was a genius, and he was going… whoosh… over their heads, just like he went over mine.
Sorry to interrupt your regular programming about the AI apocalypse, etc., and return to the traditional beat of this blog’s very earliest years … but I’ve now gotten multiple messages asking me to comment on something called the “JVG (Jesse–Victor–Gharabaghi) algorithm” (yes, the authors named it after themselves). This is presented as a massive improvement over Shor’s factoring algorithm, which could (according to popular articles) allow RSA-2048 to be broken using only 5,000 physical qubits.
On inspection, the paper’s big new idea is that, in the key step of Shor’s algorithm where you compute x^r mod N in a superposition over all r’s, you instead precompute the x^r mod N’s on a classical computer and then load them all into the quantum state.
Alright kids, why does this not work? Shall we call on someone in the back of the class—like, any undergrad quantum computing class in the world? Yes class, that’s right! There are exponentially many r’s. Computing them all takes exponential time, and loading them into the quantum computer also takes exponential time. We’re out of the n^2-time frying pan but into the 2^n-time fire. This can only look like it wins on tiny numbers; on large numbers it’s hopeless.
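To put a number on it (my arithmetic, not the paper’s): Shor’s algorithm for an n-bit N puts r in superposition over roughly 2^(2n) values, so for RSA-2048 the proposed classical precomputation is about 2^4096 ≈ 10^1233 modular exponentiations. For scale, the observable universe holds maybe 10^80 atoms.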
Even for those who know nothing about quantum algorithms, is there anything that could’ve raised suspicion here?
The paper didn’t appear on the arXiv, but someplace called “Preprints.org.” Come to think of it, I should add this to my famous Ten Signs a Claimed Mathematical Breakthrough is Wrong! It’s not that there isn’t tons of crap on the arXiv as well, but so far I’ve seen pretty much only crap on preprint repositories other than arXiv, ECCC, and IACR.
Judging from a Google search, the claim seems to have gotten endlessly amplified on clickbait link-farming news sites, but ignored by reputable science news outlets—yes, even the usual quantum hypesters weren’t touching this one!
Often, when something is this bad, the merciful answer is to let it die in obscurity. In this case, I feel like there was a sufficient level of intellectual hooliganism, just total lack of concern for what’s true, that those involved deserve to have this Shtetl-Optimized post as a tiny bit of egg on their faces forever.
It’s our best theory of elementary particles and forces. It’s absolutely amazing: it took centuries of genius to discover that the world is like this, and it’s absolutely shocking. But nobody believes it’s the last word, so we simply call it The Standard Model.
But what does this theory say? I’ll try to explain part of it in this series of videos. I begin by introducing the cast of characters—the particles—and a bit about their interactions:
If you have questions, please ask—either here or on YouTube! Intelligent questions keep me motivated. Without them, I get bored.
By the way, these videos will contain mistakes. For example, this time I forgot to mention one key particle before saying “So I’ve introduced all the actors in the drama.” When I get better at editing videos, I will correct slips like this. But I will always try to point out errors in a “pinned” comment right below the video. So look down there.
Also: I don’t plan to explain the details of quantum field theory. So even if you watch all my videos, you’ll get just a taste of the Standard Model. But I will get into some of the math, so it will be much more than just chat. It will roughly follow this paper:
Teaching is one of those things that’s always controversial.
There seems to be a constant tug of war between two approaches. In one, thought of as old-fashioned and practical, students are expected to work hard, study to memorize facts and formulas, and end up with an impressive ability to reproduce the knowledge of the past. In the other, presented as more modern or more permissive, students aren’t supposed to memorize, but to understand, to get intuition for how things work, and are expected to end up more creative and analytical, able to come up with new ideas and understand things in ways their predecessors could not. This whole thing then gets muddled further with discussions of which skills actually matter in the modern day, with the technology of the hour standing in. If adults can use calculators, why should students be able to do arithmetic? If adults can use AI, why should students be able to draw, or write, or reason?
I’ve taught a little in my day, though likely less than I should. More frequently, I’ve learned. And, with apologies to the teachers and education experts who read this blog, I’ve got my own opinion.
I don’t think anyone in the old-fashioned/new-fashioned tug of war is thinking about education right.
People talk about memorization, when they should be talking about practice.
We want kids to be able to multiply and divide numbers. That’s not because they won’t have calculators. It’s because we want to teach them things that build on top of multiplying and dividing numbers. We want some of them to learn how to multiply and divide polynomials, and if you don’t know how to multiply and divide numbers, then learning to multiply and divide polynomials is almost impossible. We want some of them to learn abstract generalizations, groups and rings and fields, and if you’re not comfortable with the basics, then learning these is almost impossible. And for everyone, we want them to get used to making a logical argument why something is true, in a context where we can easily judge whether the argument works.
This doesn’t mean that we need students to memorize their times tables, though. It helps, sure. But we don’t actually care whether students can recite 5 times 7 equals 35; that’s not our end goal. Instead, we want to make sure that students can do these operations, and that they find them easy to do. And ultimately, that doesn’t come from memorization, but from practice. It comes from using the ideas, again and again, until it’s obvious how to step ahead to the results. You can’t replicate that with pure understanding, like some more modern approaches try to. You need the “muscle memory”, and that takes real practice. But you also can’t get there by memorizing isolated facts for an exam. You need to use them.
Understanding is important too, though. We need students to know the limits of their knowledge, not just what they’ve been taught but why it’s true. It’s the only way to get adults who can generalize, who can accept that maybe there is a type of math with numbers that square to zero without dismissing it as a plot to corrupt the youth. It’s the only way to get students who can go to the next level, and the next, and then generate new knowledge on their own.
But that understanding often gets left by the wayside, when teachers forget what it’s for. If you try to teach the Pythagorean theorem by showing a few examples, or tell students stories where different types of energy are different “stuff”, you’re trying to convey an intuitive understanding, but not the useful kind. What you’re trying to give the students is stories about how things work. But the kind of understanding we need students to have isn’t of stories. It’s of justifications, and arguments. Students should understand why what they are taught is true, and understanding why doesn’t mean having a feeling in their hearts about it: it means they can convince a skeptic.
It’s easier, for a world full of overworked teachers from a variety of backgrounds, to teach the simpler versions of these. It’s easier for a traditionalist teacher to drill their students on memorization, and test them on memorization. It’s easier for a sympathetic teacher to tell students stories, based on stories the teacher thinks they understand.
But if you want the traditionalist approach to work, you have to actually do things, to practice using ideas rather than merely know them, to have that experience down as reflexively as those times tables. And if you want the modern approach to work, you have to actually understand why what you’re teaching is true, the way you would convince a skeptic that it is true, and then convey those justifications to the students.
And if you, instead, are a student:
Don’t worry about memorizing facts; you’ll drill too hard and stress yourself out. Don’t worry about finding a comfortable story, because no story is true. Use the ideas you’re learning. Use them to convince yourself, and to convince others. Use them again and again, until you reach for them as easily as breathing. When you can use what you’re learning, and know why it holds, then you’re ready to move forward.
To start on a somber note: those of us at UT Austin are in mourning this week for Savitha Shan, an undergrad double major here in economics and information systems, who was murdered over the weekend by an Islamist terrorist who started randomly shooting people on Sixth Street, apparently angry about the war in Iran. Two other innocents were also killed.
As it happens, these murders happened just a few hours after the end of my daughter’s bat mitzvah, and in walking distance from the venue. The bat mitzvah itself was an incredibly joyful and successful event that consumed most of my time lately, and which I might or might not say more about—the nastier the online trolls get, the more I need to think about my family’s privacy.
Of all the many quantum computing podcasts/interviews I’ve done recently, I’m probably happiest with this one, with Yuval Boger of QuEra. It covers all the main points about where the hardware currently is, the threat to public-key cryptography, my decades-long battle against quantum applications hype, etc. etc., and there’s even an AI-created transcript that eliminates my verbal infelicities!
A month ago, I blogged about “The Time I Didn’t Meet Jeffrey Epstein” (basically, because my mom warned me not to). Now the story has been written up in Science magazine, under the clickbaity headline “Meet Three Scientists Who Said No to Epstein.” (Besides yours truly, the other two scientists are friend-of-the-blog Sean Carroll, whose not-meeting-Epstein story I’d already heard directly from him, and David Agus, whose story I hadn’t heard.)
To be clear: as I explained in my post, I never actually said “no” to Epstein. Instead, based on my mom’s advice, I simply failed to follow up with his emissary, to the point where no meeting ever happened.
Anyway, ever since Science ran this story and it started making the rounds on social media, my mom has been getting congratulatory messages from friends of hers who saw it!
I’ve been a huge fan of the philosopher-novelist Rebecca Newberger Goldstein ever since I read her celebrated debut work, The Mind-Body Problem, back in 2005. Getting to know Rebecca and her husband, Steven Pinker, was a highlight of my last years at MIT. So I’m thrilled that Rebecca will be visiting UT Austin next week to give a talk on Spinoza, related to her latest book The Mattering Instinct (which I’m reading right now), and hosted by me and my colleague Galen Strawson in UT’s philosophy department. More info is in the poster below. If you’re in Austin, I hope to see you there!
The 88-year-old Donald Knuth has published a 5-page document about how Claude was able to solve a tricky graph theory problem that arose while he was working on the latest volume of The Art of Computer Programming—a series that Knuth is still writing after half a century. As you’d expect from Knuth, the document is almost entirely about the graph theory problem itself and Claude’s solution to it, eschewing broader questions about the nature of machine intelligence and how LLMs are changing life on Earth. To anyone who’s been following AI-for-math lately, the fact that Claude now can help with this sort of problem won’t come as a great shock. The virality is presumably because Knuth is such a legend that to watch him interact productively with an LLM is sort of like watching Leibniz, Babbage, or Turing do the same.
John Baez is a brilliant mathematical physicist and writer, who was blogging about science before the concept of “blogging” even existed, and from whom I’ve learned an enormous amount. But regarding John’s quest for the past 15 years — namely, to use category theory to help solve the climate crisis (!) — I always felt like the Cookie Monster would, with equal intellectual justification, say that the key to arresting climate change was for him to eat more Oreos. Then I read this Quanta article on the details of Baez’s project, and … uh … I confess it failed to change my view. Maybe someday I’ll understand why it’s better to say using category theory what I would’ve said in a 100x simpler way without category theory, but I fear that day is not today.
Although I have long since retired from serious chess tournaments (they take too much time, a luxury I do not have anymore - even more so now that I have two infants to help grow!), I insist on playing online blitz on chess.com, with alternating fortunes. My Elo rating hovers in the 2200-2300 range, signalling that I still have my wits about me (I figure it is a very good way to keep watch on my mental capabilities: if Alzheimer's lurks, I will spot it early).
Vance Honeycutt is 22 years old and already widely considered a bust. Lots of power in college, the Orioles’ first-round draft pick in 2024, but he has a gigantic hole in his swing and spent 2025 striking out at an absurd rate, so often that people figured he was simply never going to learn to hit.
He has come to the plate four times in spring training and has hit four home runs. The last one went 471 feet. The sample size, she is, how we say, small. But I’m glad the guy is getting this moment of glory, at least, even if it ends up being short.
We are now at an exciting point in our process of developing quantum computers and understanding their computational power: It has been demonstrated that quantum computers can outperform classical ones (if you buy my argument from Parts 1 and 2 of this mini-series). And it has been demonstrated that quantum fault-tolerance is possible for at least a few logical qubits. Together, these form the elementary building blocks of useful quantum computing.
And yet: the devices we have seen so far are still nowhere near being useful for any advantageous application in, say, condensed-matter physics or quantum chemistry, which is where the promise of quantum computers lies.
So what is next in quantum advantage?
This is what this third and last part of my mini-series on the question “Has quantum advantage been achieved?” is about.
The 100 logical qubits regime
I want to have in mind the regime in which we have 100 well-functioning logical qubits, that is, 100 qubits on which we can run maybe 100,000 gates.
Building devices operating in this regime will require thousand(s) of physical qubits and is therefore well beyond the proof-of-principle quantum advantage and fault-tolerance experiments that have been done. At the same time, it is (so far) still one or more orders of magnitude away from any of the first applications such as simulating, say, the Fermi-Hubbard model or breaking cryptography. In other words, it is a qualitatively different regime from the early fault-tolerant computations we can do now. And yet, there is no clear picture of what we can and should do with such devices.
The next milestone: classically verifiable quantum advantage
In this post, I want to argue that a key milestone we should aim for in the 100-logical-qubit regime is classically verifiable quantum advantage. Achieving this will require not only the jump in quantum device capabilities but also finding advantage schemes that allow for classical verification using these limited resources.
Why is this an interesting and feasible goal, and what is it, anyway?
To my mind, the biggest weakness of the RCS experiments is the way they are verified. I discussed this extensively in the last posts—verification uses XEB, which can be classically spoofed and which is only actually measured in the classically simulatable regime. Really, in a quantum advantage experiment I would want an efficient procedure that convinces us beyond any reasonable doubt that a computation must have been performed by a quantum computer. In what I think of as classically verifiable quantum advantage, a (classical) verifier would come up with challenge circuits which they would then send to a quantum server. These would be designed in such a way that once the server returns classical samples from those circuits, the verifier can convince herself that the server must have run a quantum computation.
The theoretical computer scientist’s cartoon of verifying a quantum computer.
This is the jump from a physics-type experiment (the sense in which advantage has been achieved) to a secure protocol that can be used in settings where I do not want to trust the server and the data it provides me with. Such security may also allow a first application of quantum computers: to generate random numbers whose genuine randomness can be certified—a task that is impossible classically.
Here is the problem: On the one hand, we do know of schemes that allow us to classically verify that a computer is quantum and generate random numbers, so-called cryptographic proofs of quantumness (PoQ). A proof of quantumness is a highly reliable scheme in that its security relies on well-established cryptography. The big drawback of these schemes is that they require a large number of qubits and operations, comparable to the resources required for factoring. On the other hand, the computations we can run in the advantage regime—basically, random circuits—are very resource-efficient but not verifiable.
The 100-logical-qubit regime lies right in the middle, and it seems more than plausible that classically verifiable advantage is possible in this regime. The theory challenge ahead of us is to find it: a quantum advantage scheme that is very resource-efficient like RCS and also classically verifiable like proofs of quantumness.
To achieve verifiable advantage in the 100-logical-qubit regime we need to close the gap between random circuit sampling and proofs of quantumness.
With this in mind, let me spell out some concrete goals that we can achieve using 100 logical qubits on the road to classically verifiable quantum advantage.
1. Demonstrate fault-tolerant quantum advantage
Before we talk about verifiable advantage, the first experiment I would like to see is one that combines the two big achievements of the past years, and shows that quantum advantage and fault-tolerance can be achieved simultaneously. Such an experiment would be similar in type to the RCS experiments, but run on encoded qubits with gate sets that match that encoding. During the computation, noise would be suppressed by correcting for errors using the code. In doing so, we could reach the near-perfect regime of RCS as opposed to the finite-fidelity regime that current RCS experiments operate in (as I discussed in detail in Part 2).
Random circuits with a quantum advantage that are particularly easy to implement fault-tolerantly are so-called IQP circuits. In those circuits, the gates are controlled-NOT gates and diagonal gates, i.e. rotations $Z_\theta = \mathrm{diag}(1, e^{i\theta})$, which just add a phase to a basis state as $Z_\theta|x\rangle = e^{i\theta x}|x\rangle$. The only “quantumness” comes from the fact that each input qubit is in the superposition state $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, and that all qubits are measured in the Hadamard basis. Here is an example of an IQP circuit:
An IQP circuit starts from the all-$|+\rangle$ state by applying a Hadamard transform, followed by IQP gates (in this case layers of $Z$-rotations interleaved with CNOT gates) and ends in a measurement in the Hadamard basis.
As it so happens, IQP circuits are already really well understood since one of the first proposals for quantum advantage was based on IQP circuits (VerIQP1), and for a lot of the results in random circuits, we have precursors for IQP circuits, in particular, their ideal and noisy complexity (SimIQP). This is because their near-classical structure makes them relatively easy to study. Most importantly, their outcome probabilities are simple (but exponentially large) sums over phases that can just be read off from which gates are applied in the circuit and we can use well-established classical techniques like Boolean analysis and coding theory to understand those.
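To make those phase sums concrete: for an IQP circuit with diagonal part $D|y\rangle = e^{i\theta(y)}|y\rangle$ sandwiched between Hadamard layers, the outcome probabilities are $p(x) = |2^{-n}\sum_{y\in\{0,1\}^n} (-1)^{x\cdot y} e^{i\theta(y)}|^2$. Here is a minimal brute-force sketch (function names are mine; the loop is exponential in $n$ by design, mirroring the $2^n$-term sum):

```python
import itertools, cmath, math

def iqp_output_probability(n, phase, x):
    """Probability of outcome x for the IQP circuit H^n . D . H^n |0..0>,
    where D is diagonal with D|y> = exp(i*phase(y))|y>.  The amplitude is
    the 2^n-term phase sum  2^-n * sum_y (-1)^(x.y) exp(i*phase(y))."""
    amp = 0.0
    for y in itertools.product((0, 1), repeat=n):
        sign = (-1) ** (sum(xi * yi for xi, yi in zip(x, y)) % 2)
        amp += sign * cmath.exp(1j * phase(y))
    return abs(amp / 2 ** n) ** 2

# Toy diagonal part: a CZ on qubits (0,1) plus a Z-rotation by pi/4 on qubit 0.
phase = lambda y: math.pi * y[0] * y[1] + (math.pi / 4) * y[0]
print(iqp_output_probability(2, phase, (0, 0)))  # 0.25
```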
IQP gates are natural for fault-tolerance because there are codes in which all the operations involved can be implemented transversally. This means that they only require parallel physical single- or two-qubit gates to implement a logical gate, in stark contrast to universal circuits, which require complicated, resource-intensive fault-tolerant protocols. Running computations with IQP circuits would also be a step towards running real computations in that they can involve structured components such as cascades of CNOT gates and the like. These show up all over fault-tolerant constructions of algorithmic primitives such as arithmetic or phase estimation circuits.
Our concrete proposal for an IQP-based fault-tolerant quantum advantage experiment in reconfigurable-atom arrays is based on interleaving diagonal gates and CNOT gates to achieve super-fast scrambling (ftIQP1). A medium-size version of this protocol was implemented by the Harvard group (LogicalExp), but with only a bit more effort it could be performed in the advantage regime.
In those proposals, verification still suffers from the same problems as standard RCS experiments, so what’s up next is to fix that!
2. Closing the verification loophole
I said that a key milestone for the 100-logical-qubit regime is to find schemes that lie in between RCS and proofs of quantumness in terms of their resource requirements but at the same time allow for more efficient and more convincing verification than RCS. Naturally, there are two ways to approach this space—we can make quantum advantage schemes more verifiable, and we can make proofs of quantumness more resource-efficient.
First, let’s focus on the former approach and set a more moderate goal than full-on classical verification of data from an untrusted server. Are there variants of RCS that allow us to efficiently verify that finite-fidelity RCS has been achieved if we trust the experimenter and the data they hand us?
2.1 Efficient quantum verification using random circuits with symmetries
Indeed, there are! I like to think of the schemes that achieve this as random circuits with symmetries. A symmetry is an operator $S$ such that the outcome state $|\psi\rangle$ of the computation (or some intermediate state) is invariant under it, so $S|\psi\rangle = |\psi\rangle$. The idea is then to find circuits that exhibit a quantum advantage and at the same time have symmetries that can be easily measured, say, using only single-qubit measurements or a single gate layer. Then, we can use these measurements to check whether or not the pre-measurement state respects the symmetries. This is a test for whether the quantum computer prepared the correct state, because errors or deviations from the true state would violate the symmetry (unless they were adversarially engineered).
In random circuits with symmetries, we can thus use small, well-characterized measurements whose outcomes we trust to probe whether a large quantum circuit has been run correctly. This is possible in a scenario I call the trusted experimenter scenario.
The trusted experimenter scenario
In this scenario, we receive data from an actual experiment in which we trust that certain measurements were actually and correctly performed.
I think of random circuits with symmetries as introducing measurements in the circuit that check for errors.
Here are some examples of random circuits with symmetries, which allow for efficient verification of quantum advantage in the trusted experimenter scenario.
Graph states. My first example is locally rotated graph states (GStates). These are states that are prepared by CZ gates acting according to the edges of a graph on an initial all-$|+\rangle$ state, and a layer of single-qubit $Z$-rotations is performed before a measurement in the $X$ basis. (Yes, this is also an IQP circuit.) The symmetries of this circuit are locally rotated Pauli operators, and can therefore be measured using only single-qubit rotations and measurements. What is more, these symmetries fully determine the graph state. Determining the fidelity then just amounts to averaging the expectation values of the symmetries, which is so efficient you can even do it in your head. In this example, though, measuring the outcome state to obtain hard-to-reproduce samples and measuring the symmetries are done in two different (single-qubit) bases.
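To show how cheap this verification is in the trusted-experimenter scenario, here is a hedged sketch: given per-stabilizer lists of $\pm 1$ outcomes from the single-qubit measurements, a simple estimator just averages them (the estimator actually used in (GStates) differs in details):

```python
import numpy as np

def graph_state_fidelity_estimate(stabilizer_shots):
    """stabilizer_shots[v]: list of +-1 outcomes from measuring the
    (locally rotated) stabilizer of node v in single-qubit bases.
    Average the per-stabilizer expectation values -- a sketch of the
    'do it in your head' fidelity estimate, not the paper's exact one."""
    means = [np.mean(shots) for shots in stabilizer_shots]
    return float(np.mean(means))

# Example: three stabilizers, a handful of shots each.
print(graph_state_fidelity_estimate([[1, 1, -1, 1], [1, 1, 1, 1], [1, -1, 1, 1]]))
```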
With 100 logical qubits, samples from classically intractable graph states on several hundred qubits could be easily generated.
Bell sampling. The drawback of this approach is that we need to make two different measurements for verification and sampling. But it would be much neater if we could verify the correctness of a set of classically hard samples by only using those samples. For an example where this is possible, consider two copies of the output state $|\psi\rangle$ of a random circuit, so $|\psi\rangle \otimes |\psi\rangle$. This state is invariant under a swap of the two copies, and in fact the expectation value of the SWAP operator in a noisy state preparation $\rho$ of $|\psi\rangle$ determines the purity of the state, so $\langle \mathrm{SWAP} \rangle_{\rho\otimes\rho} = \mathrm{tr}[\rho^2]$. It turns out that measuring all pairs of corresponding qubits in the state $\rho \otimes \rho$ in the pairwise basis of the four Bell states $(\sigma \otimes \mathbb{1})|\Phi^+\rangle$, where $\sigma$ is one of the four Pauli matrices $\mathbb{1}, X, Y, Z$, is hard to simulate classically (BellSamp). You may also observe that the SWAP operator is diagonal in the Bell basis, so its expectation value can be extracted from the Bell-basis measurements—our hard-to-simulate samples. To do this, we just average sign assignments to the samples according to their parity.
If the circuit is random, then under the same assumptions as those used in XEB for random circuits, the purity is a good estimator of the fidelity, so $F \approx \sqrt{\mathrm{tr}[\rho^2]}$. So here is an example where efficient verification is possible directly from hard-to-simulate classical samples, under the same assumptions as those used to argue that XEB equals fidelity.
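A sketch of the sign-averaging step, assuming each sample records, per qubit pair, which of the four Bell states was observed (the labels are illustrative):

```python
def purity_from_bell_samples(samples):
    """tr[SWAP (rho x rho)] = tr[rho^2]: the two-qubit SWAP has eigenvalue -1
    exactly on the singlet |psi-> and +1 on the three other Bell states, so
    the purity is the average of (-1)^(number of psi- outcomes) per sample."""
    signs = [(-1) ** sample.count("psi-") for sample in samples]
    return sum(signs) / len(signs)

samples = [["phi+", "phi+"], ["phi-", "phi+"], ["psi+", "psi-"]]
print(purity_from_bell_samples(samples))  # (+1 + 1 - 1) / 3
```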
With 100 logical qubits, we can achieve a quantum advantage at least as hard as the current RCS experiments, one that can also be efficiently (physics-)verified from the classical data.
Fault-tolerant circuits. Finally, suppose that we run a fault-tolerant quantum advantage experiment. Then, there is a natural set of symmetries of the state at any point in the circuit, namely, the stabilizers of the code we use. In a fault-tolerant experiment we repeatedly measure those stabilizers mid-circuit, so why not use that data to assess the quality of the logical state? Indeed, it turns out that the logical fidelity can be estimated efficiently from stabilizer expectation values even in situations in which the logical circuit has a quantum advantage (SyndFid).
With 100 logical qubits, we could therefore just run fault-tolerant IQP circuits in the advantage regime (ftIQP1) and the syndrome data would allow us to estimate the logical fidelity.
In all of these examples of random circuits with symmetries, coming up with classical samples that pass the verification tests is very easy, so the trusted-experimenter scenario is crucial for this to work. (Note, however, that it may be possible to add tests to Bell sampling that make spoofing difficult.) At the same time, these proposals are very resource-efficient in that they only increase the cost of a pure random-circuit experiment by a relatively small amount. What is more, the required circuits have more structure than random circuits in that they typically require gates that are natural in fault-tolerant implementations of quantum algorithms.
Performing random circuit sampling with symmetries is therefore a natural next step en route both to classically verifiable advantage that closes the no-efficient-verification loophole, and towards implementing actual algorithms.
But what if we are not willing to extend that level of trust to the person who runs the quantum circuit?
2.2 Classical verification using random circuits with planted secrets
If we do not trust the experimenter, we are in the untrusted quantum server scenario.
The untrusted quantum server scenario
In this scenario, we delegate a quantum computation to an untrusted (presumably remote) quantum server—think of using a Google or Amazon cloud server to run your computation. We can communicate with this server using classical information.
In the untrusted server scenario, we can hope to use ideas from proofs of quantumness, such as the use of classical cryptography, to design families of quantum circuits in which some secret structure is planted. This secret structure should give the verifier a way to check whether a set of samples passes a certain verification test. At the same time it should not be detectable, or at least not identifiable, from the circuit description alone.
The simplest example of such a secret structure could be a large peak in an otherwise flat output distribution of a random-looking quantum circuit. To do this, the verifier would pick a (random) string $s$ and design a circuit such that the probability $p_s$ of seeing $s$ in the samples is large. If the peak is hidden well, finding it just from the circuit description would require searching through all of the outcome bit strings, and even just determining one of the outcome probabilities is exponentially difficult. A classical spoofer trying to fake the samples from a quantum computer would then be caught immediately: the list of samples they hand the verifier will not even contain $s$ unless they are unbelievably lucky, since there are exponentially many possible choices of $s$.
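The verifier’s test is then just a frequency check. A hedged sketch with hypothetical names, assuming a construction that plants a peak of known weight:

```python
def passes_peak_test(samples, secret, expected_peak, tolerance=0.5):
    """Accept iff the planted string appears with frequency comparable to
    the planted peak weight.  A classical spoofer who cannot recover
    `secret` from the circuit essentially never emits it, since the
    background probability of any fixed string is ~2^-n."""
    freq = samples.count(secret) / len(samples)
    return freq >= tolerance * expected_peak
```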
Unfortunately, planting such secrets seems to be very difficult using universal circuits, since their output distributions are so unstructured. This is why we have not yet found good candidates for circuits with peaks, though some attempts have been made (Peaks, ECPeaks, HPeaks).
We do have a promising candidate, though—IQP circuits! The fact that the output distributions of IQP circuits are quite simple could very well help us design sampling schemes with hidden secrets. Indeed, the idea of hiding peaks was pioneered by Shepherd and Bremner (VerIQP1), who found a way to design classically hard IQP circuits with a large hidden Fourier coefficient. The presence of this large Fourier coefficient can easily be checked from a few classical samples, and random IQP circuits do not have any large Fourier coefficients. Unfortunately, for that construction and a variation thereof (VerIQP2), it turned out that the large coefficient can be detected quite easily from the circuit description (ClassIQP1, ClassIQP2).
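Checking a large Fourier coefficient from samples is equally simple: estimate the bias of the samples against the secret string. A minimal sketch (bitstrings as tuples; the actual Shepherd-Bremner test has more structure than this):

```python
def fourier_bias(samples, s):
    """Estimate E[(-1)^(s.x)] over sample bitstrings x: large for the hidden
    coefficient s in a Shepherd-Bremner-style construction, ~0 for samples
    from a random IQP circuit."""
    parity = lambda x: sum(si * xi for si, xi in zip(s, x)) % 2
    return sum((-1) ** parity(x) for x in samples) / len(samples)
```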
To this day, it remains an exciting open question whether secrets can be planted in (maybe IQP) circuit families in a way that allows for efficient classical verification. Even finding a scheme with some large gap between verification and simulation times would be exciting, because it would for the first time allow us to verify a quantum computing experiment in the advantage regime using only classical computation.
Towards applications: certifiable random number generation
Beyond verified quantum advantage, sampling schemes with hidden secrets may be usable to generate classically certifiable random numbers: You sample from the output distribution of a random circuit with a planted secret, and verify that the samples come from the correct distribution using the secret. If the distribution has sufficiently high entropy, truly random numbers can be extracted from the samples. The same can be done for RCS, except that some acrobatics are needed to get around the problem that verification is just as costly as simulation (CertRand, CertRandExp). Again, a large gap between verification and simulation times would probably permit such certified random number generation.
The goal here is firstly a theoretical one: Come up with a planted-secret RCS scheme that has a large verification-simulation gap. But then, of course, it is an experimental one: actually perform such an experiment to classically verify quantum advantage.
Should an IQP-based scheme of circuits with secrets exist, 100 logical qubits is the regime where it should give a relevant advantage.
Three milestones
Altogether, I proposed three milestones for the 100-logical-qubit regime.
Demonstrate fault-tolerant quantum advantage using random IQP circuits. This will improve the fidelity towards near-perfect RCS and thus close the scalability worries about noisy quantum advantage that I discussed in my last post.
Perform RCS with symmetries. This will allow for efficient verification of quantum advantage in the trusted-experimenter scenario and thus take a first step toward closing the verification loophole.
Find and perform RCS schemes with planted secrets. This will allow us to verify quantum advantage in the remote untrusted-server scenario and presumably yield a first useful application of quantum computers: generating classically certified random numbers.
All of these experiments are natural steps towards performing actually useful quantum algorithms in that they use more structured circuits than just random universal circuits and can be used to benchmark the performance of the quantum devices in an advantage regime. Moreover, all of them close some loophole of the previous quantum advantage demonstrations, just like follow-up experiments to the first Bell tests have closed the loopholes one by one.
I argued that IQP circuits will play an important role in achieving those milestones since they are a natural circuit family in fault-tolerant constructions and promising candidates for random circuit constructions with planted secrets. Developing a better understanding of the properties of the output distributions of IQP circuits will help us achieve the theory challenges ahead.
Experimentally, the 100-logical-qubit regime is exactly the regime to shoot for with those circuits: while IQP circuits are somewhat easier to simulate than universal random circuits, 100 qubits is still well inside the classically intractable regime.
What I did not talk about
Let me close this mini-series by touching on a few things that I would have liked to discuss more.
First, there is the OTOC experiment by the Google team (OTOC) which has spawned quite a debate. This experiment claims to achieve quantum advantage for an arguably more natural task than sampling, namely, computing expectation values. Computing expectation values is at the heart of quantum-chemistry and condensed-matter applications of quantum computers. And it has the nice property that it is what the Google team called “quantum-verifiable” (and what I would call “hopefully-in-the-future-verifiable”) in the following sense: Suppose we perform an experiment to measure a classically hard expectation value on a noisy device now, and suppose this expectation value actually carries some signal, so it is significantly far away from zero. Once we have a trustworthy quantum computer in the future, we will be able to check that the outcome of this experiment was correct and hence quantum advantage was achieved. There is a lot of interesting science to discuss about the details of this experiment and maybe I will do so in a future post.
Finally, I want to mention an interesting theory challenge that relates to the noise-scaling arguments I discussed in detail in Part 2: The challenge is to understand whether quantum advantage can be achieved in the presence of a constant amount of local noise. What do we know about this? On the one hand, log-depth random circuits with constant local noise are easy to simulate classically (SimIQP, SimRCS), and we have good numerical evidence that random circuits at very low depths are easy to simulate classically even without noise (LowDSim). So is there a depth regime in between the very low depth and the log-depth regime in which quantum advantage persists under constant local noise? Is this maybe even true in a noise regime that does not permit fault-tolerance (see this interesting talk)? In the regime in which fault-tolerance is possible, it turns out that one can construct simple fault-tolerance schemes that do not require any quantum feedback, so there are distributions that are hard to simulate classically even in the presence of constant local noise.
So long, and thanks for all the fish!
I hope that this mini-series has convinced you that quantum advantage has been achieved. There are some open loopholes, but if you are happy with physics-level experimental evidence, then you should be convinced that the RCS experiments of the past years have demonstrated quantum advantage.
As the devices are getting better at a rapid pace, there is a clear goal that I hope will be achieved in the 100-logical-qubit regime: demonstrate fault-tolerant and verifiable advantage (for the experimentalists) and come up with the schemes to do that (for the theorists)! Those experiments would close the loopholes of the current RCS experiments. And they would work as a stepping stone towards actual algorithms in the advantage regime.
I want to end with a huge thanks to Spiros Michalakis, John Preskill and Frederik Hahn who have patiently read and helped me improve these posts!
References
Fault-tolerant quantum advantage
(ftIQP1) Hangleiter, D. et al. Fault-Tolerant Compiling of Classically Hard Instantaneous Quantum Polynomial Circuits on Hypercubes. PRX Quantum 6, 020338 (2025).
(LogicalExp) Bluvstein, D. et al. Logical quantum processor based on reconfigurable atom arrays. Nature 626, 58–65 (2024).
Random circuits with symmetries
(BellSamp) Hangleiter, D. & Gullans, M. J. Bell Sampling from Quantum Circuits. Phys. Rev. Lett. 133, 020601 (2024).
(GStates) Ringbauer, M. et al. Verifiable measurement-based quantum random sampling with trapped ions. Nat. Commun. 16, 1–9 (2025).
(SyndFid) Xiao, X., Hangleiter, D., Bluvstein, D., Lukin, M. D. & Gullans, M. J. In-situ benchmarking of fault-tolerant quantum circuits. I. Clifford circuits. arXiv:2601.21472. II. Circuits with a quantum advantage (coming soon!).
Verification with planted secrets
(PoQ) Brakerski, Z., Christiano, P., Mahadev, U., Vazirani, U. & Vidick, T. A Cryptographic Test of Quantumness and Certifiable Randomness from a Single Quantum Device. in 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS) 320–331 (2018).
(VerIQP1) Shepherd, D. & Bremner, M. J. Temporally unstructured quantum computation. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 465, 1413–1439 (2009).
(VerIQP2) Bremner, M. J., Cheng, B. & Ji, Z. Instantaneous Quantum Polynomial-Time Sampling and Verifiable Quantum Advantage: Stabilizer Scheme and Classical Security. PRX Quantum 6, 020315 (2025).
(ClassIQP1) Kahanamoku-Meyer, G. D. Forging quantum data: classically defeating an IQP-based quantum test. Quantum 7, 1107 (2023).
(ClassIQP2) Gross, D. & Hangleiter, D. Secret-Extraction Attacks against Obfuscated Instantaneous Quantum Polynomial-Time Circuits. PRX Quantum 6, 020314 (2025).
(Peaks) Aaronson, S. & Zhang, Y. On verifiable quantum advantage with peaked circuit sampling. arXiv:2404.14493
(ECPeaks) Deshpande, A., Fefferman, B., Ghosh, S., Gullans, M. & Hangleiter, D. Peaked quantum advantage using error correction. arXiv:2510.05262
(HPeaks) Gharibyan, H. et al. Heuristic Quantum Advantage with Peaked Circuits. arXiv:2510.25838
Certifiable random numbers
(CertRand) Aaronson, S. & Hung, S.-H. Certified Randomness from Quantum Supremacy. in Proceedings of the 55th Annual ACM Symposium on Theory of Computing 933–944 (Association for Computing Machinery, New York, NY, USA, 2023).
(CertRandExp) Liu, M. et al. Certified randomness amplification by dynamically probing remote random quantum states. arXiv:2511.03686
OTOC
(OTOC) Abanin, D. A. et al. Observation of constructive interference at the edge of quantum ergodicity. Nature 646, 825–830 (2025).
Noisy complexity
(SimIQP) Bremner, M. J., Montanaro, A. & Shepherd, D. J. Achieving quantum supremacy with sparse and noisy commuting quantum computations. Quantum 1, 8 (2017).
(SimRCS) Aharonov, D., Gao, X., Landau, Z., Liu, Y. & Vazirani, U. A polynomial-time classical algorithm for noisy random circuit sampling. in Proceedings of the 55th Annual ACM Symposium on Theory of Computing 945–957 (2023).
(LowDSim) Napp, J. C., La Placa, R. L., Dalzell, A. M., Brandão, F. G. S. L. & Harrow, A. W. Efficient Classical Simulation of Random Shallow 2D Quantum Circuits. Phys. Rev. X 12, 021021 (2022).
I don’t have time to write a full post right now, but hopefully this is self-explanatory.
Regardless of their broader views on the AI industry, the eventual risks from AI, or American politics, every person of conscience needs to stand behind Anthropic, as they stand up for their right to [checks notes] not be effectively nationalized by the Trump administration and forced to build murderbots and to help surveil American citizens. No, I wouldn’t have believed this either in a science-fiction movie, but it’s now just the straightforward reality of our world, years ahead of schedule. In particular, I call on all other AI companies, in the strongest possible terms, to do the right thing and stand behind Anthropic, in this make-or-break moment for the AI industry and the entire world.
Recently, a few people have asked me about this paper.
A couple weeks back, OpenAI announced a collaboration with a group of amplitudes researchers, physicists who study the types of calculations people do to make predictions at particle colliders. The amplitudes folks had identified an interesting loophole, finding that a calculation many would have expected to be zero actually gave a nonzero answer. They did the calculation for different examples involving more and more particles, and got some fairly messy answers. They suspected, as amplitudes researchers always do, that there was a simpler formula, one that worked for any number of particles. But they couldn’t find it.
Then a former amplitudes researcher at OpenAI suggested that they use AI to find it.
“Use AI” can mean a lot of different things, and most of them don’t look much like the way the average person talks to ChatGPT. This was closer than most. They were using “reasoning models”, loops that try to predict the next few phrases in a “chain of thought” again and again and again. Using that kind of tool, they were able to find that simpler formula, and mathematically prove that it was correct.
A few of you are hoping for an in-depth post about what they did, and its implications. This isn’t that. I’m still figuring out if I’ll be writing that for an actual news site, for money, rather than free, for you folks.
Instead, I want to talk about a specific idea I’ve seen crop up around the paper.
See, for some, the existence of a result like this isn’t all that surprising.
Mathematicians have been experimenting with reasoning models for a bit, now. Recently, a group published a systematic study, setting the AI loose on a database of minor open problems proposed by the famously amphetamine-fueled mathematician Paul Erdős. The AI managed to tackle a few of the problems, sometimes by identifying existing solutions that had not yet been linked to the problem database, but sometimes by proofs that appeared to be new.
The Erdős problems solved by the AI were not especially important. Neither was the problem solved by the amplitudes researchers, as far as I can tell at this point.
But I get the impression the amplitudes problem was a bit more interesting than the Erdős problems. The difference, so far, has mostly been attributed to human involvement. This amplitudes paper started because human amplitudes researchers found an interesting loophole, and only after that used the AI. Unlike the mathematicians, they weren’t just searching a database.
This lines up with a general point, one people tend to make much less carefully. It’s often said that, unlike humans, AI will never be truly creative. It can solve mechanical problems, do things people have done before, but it will never be good at having truly novel ideas.
To me, that line of thinking goes a bit too far. I suspect it’s right on one level, that it will be hard for any of these reasoning models to propose anything truly novel. But if so, I think it will be for a different reason.
The thing is, creativity is not as magical as we make it out to be. Our ideas, scientific or artistic, don’t just come from the gods. They recombine existing ideas, shuffling them in ways more akin to randomness than miracle. They’re then filtered through experience, deep heuristics honed over careers. Some people are good at ideas, and some are bad at them. Having ideas takes work, and there are things people do to improve their ideas. Nothing about creativity suggests it should be impossible to mechanize.
However, a machine trained on text won’t necessarily know how to do any of that.
That’s because in science, we don’t write down our inspirations. By the time a result gets into a scientific paper or textbook, it’s polished and refined into a pure argument, cutting out most of the twists and turns that were an essential part of the creative process. Mathematics is even worse: most math papers don’t even mention the motivation behind the work, let alone the path taken to the paper.
This lack of documentation makes it hard for students, making success much more a function of having the right mentors to model good practices than of being able to pick them up from literature everyone can access. I suspect it makes it even harder for language models. And if today’s language-model-based reasoning tools are bad at that crucial, human-seeming step of coming up with the right idea at the right time? I think that has more to do with this lack of documentation than with the fact that they’re “statistical parrots”.
[This post is written in my capacity as Vice-Chair of the Board of Trustees of SLMath. -T.]
SLMath, formerly MSRI, has launched the search for the next Deputy Director. This key position is a close advisor to the Director and shares in the internal management of the scientific team and programs at SLMath. This position is ideal for an experienced professional with a PhD in mathematical sciences seeking a new opportunity to leverage their strengths in program and grant management, financial management, and people management.
I've been attending a lot of talks lately about AI/machine learning and multiscale modeling for materials design and control. This is a vast, rapidly evolving research area, so here is a little background and a few disorganized thoughts.
For a recent review article about AI and materials discovery, see here. There is a ton of work being done pursuing the grand goal of inverse design - name some desired properties, and have AI/ML formulate a material that fits those requirements and is actually synthesizable. Major companies with publicly known efforts include Google DeepMind and GNoME, Microsoft, Meta working on catalysts, Toyota Research Institute, IBM, and I'm certain that I'm missing major players. There are also a slew of startup companies on this topic (e.g. Periodic).
In addition to materials design and discovery, there is enormous effort being put into using AI/ML to bridge across length and timescales. Quantum chemistry methods can look at microscopic physics and chemistry, for example, but extending this to macroscopic system sizes with realistic disorder is often computationally intractable. There are approaches like time-dependent DFT and DMFT to try to capture dynamics, but following dynamics even as long as picoseconds is hard. Using microscopic methods and ML to try to compute and then parametrize force fields between atoms (for example), one can look at larger systems and longer timescales using molecular dynamics for atomic motions. However, getting from there to, e.g., the Navier-Stokes equations or understanding phase boundaries, is very difficult. (At the same time, there are approaches that use AI/ML to learn about the solutions of partial differential equations, so that one can, for example, compute good fluid flows quickly without actually having to solve the N-S equations - see here.)
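As a cartoon of that bridging step: once a surrogate force field (any callable fit to microscopic data; everything here is illustrative, not a specific package) replaces the expensive quantum-chemistry call, standard molecular dynamics can push to larger systems and longer times. A minimal sketch:

```python
import numpy as np

def velocity_verlet_step(pos, vel, force_model, dt=1e-3, mass=1.0):
    """One velocity-Verlet MD step where the forces come from a learned
    surrogate force field instead of an ab initio calculation."""
    f = force_model(pos)
    vel_half = vel + 0.5 * dt * f / mass
    new_pos = pos + dt * vel_half
    new_vel = vel_half + 0.5 * dt * force_model(new_pos) / mass
    return new_pos, new_vel

# Toy "learned" field: harmonic forces toward the origin.
toy_model = lambda pos: -pos
pos, vel = np.ones(3), np.zeros(3)
for _ in range(10):
    pos, vel = velocity_verlet_step(pos, vel, toy_model)
```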
We want to keep coarse-graining (looking at larger scales), while maintaining the microscopic physics constraints so that the results are accurate. There seems to be a lot of hope that either by design or by the action of the AI/ML tools themselves we can come up with descriptors that are good at capturing the essential physics as we move to larger and larger scales. To use a fluids example, somehow we are hoping that these tools will naturally capture that at scales much larger than one water molecule, it makes sense to track density, temperature, velocity fields, surface tension, liquid-vapor interfaces, etc.
One rough description of emergence is the idea that at larger scales and numbers of constituents, new properties appear for the collective system that are extremely difficult to predict from the microscopic rules governing the constituents. For example, starting from the Schrödinger equation and basic quantum mechanics, it's very hard to determine that snowflakes tend to have 6-fold symmetry and ice will float in water, even though the latter are of course consequences of the former. A nice article about emergence in physics is here.
It feels to me like in some AI/ML endeavors, we are hoping that these tools will figure out how emergence works better than humans have been able to do. This is certainly a worthy challenge, and it may well succeed in a lot of systems, but then we may have the added meta-challenge of trying to understand how our tools did that. Physics-informed and structured ML will hopefully take us well beyond the situation in the xkcd comic shown here.
(guest post by Dimitris Tsementzis, about joint work with Benedikt Ahrens, Paige North, and Mike Shulman)
The Univalence Principle is the informal statement that equivalent mathematical structures are indistinguishable.
There are various ways of making this statement formally precise, and a long history of work that does so.
In our recently-published (but long in the making!) book we proved what is, to our knowledge, the most general version of this principle, which applies to set-based, categorical, and higher-categorical structures defined in a non-algebraic and space-based style, as well as to models of higher-order theories such as topological spaces.
This work achieves three main goals.
Firstly, it greatly extends the “Structure Identity Principle” from the original HoTT book, to include any (finite) level of structure, instead of just set-based structures, thus establishing in the strongest sense yet made precise that the Univalent Foundations provide an equivalence-invariant foundation for higher-categorical mathematics.
Secondly, it provides very general novel definitions of equivalence between structures and between objects in a given structure that “compile” to most known notions of equivalence in known cases, but which can also be used to suggest notions in new settings; in doing so it extends M. Makkai’s classic work on First Order-Logic with Dependent Sorts (FOLDS).
Thirdly, the setting in which our result is proved (a form of Two-Level Type Theory) provides a framework in which to do metamathematics in the Univalent Foundations/HoTT, i.e. carry out the mathematical study of how mathematics is formalized in UF/HoTT.
The Univalence Principle we prove is a foundational metamathematical result in that sense.
Setting the Stage
Any “Univalence Principle” type of result has the following form: for univalent $L$-structures $M$ and $N$,
$$(M = N) \simeq (M \simeq N).$$
The result gains in strength if the class of $L$-structures is as large as possible, and the notion of equivalence between them coincides with known notions of equivalence in practice (where $L$ is a placeholder for a notion of signature in terms of which $M$ and $N$ are structures).
Such a result also gains in strength if the middle notion of equivalence is as “structural” as possible, ensuring that not only properties of the structures are preserved, but also constructions built on top of them.
This last feature can only really be pursued within a univalent type theory, like HoTT.
In our work, we define: diagrammatic signatures as inverse categories of finite height, $L$-structures as Reedy-fibrant functors $M : L \to \mathcal{U}$, and a notion of indiscernibility relating objects within structures, which yields a derived notion of equivalence between structures (essentially structure-preserving bijections up to indiscernibility).
The primitive notions $\simeq$ and $=$ are given by equivalence and equality in 2LTT, appropriately defined.
Signatures, Structures (and where they live)
Two-level type theory (2LTT)
In 2LTT, there is an external level for exo-types and other exo-gizmos (allowing strictly functorial constructions), and the usual fibrant (HoTT/UF) level where mathematical objects live.
The external level is the metamathematical or syntactic level, where a strict equality exists that allows us to define syntax and symbols.
Our exo-gizmos are analogous to sets of syntactical symbols used to define signatures in first-order logic.
Such syntax consists of categories with strict equality on composition, which we call exo-categories.
Equalities here are strict (exo-equalities), while the internal world has homotopical paths.
The 2LTT setting is convenient for type-theoretic reasoning, and allows us to neatly separate the various notions of equality at play.
Diagram signatures are inverse categories of finite height
A diagram signature is an inverse exo-category $L$ of finite height equipped with a rank functor
$$\mathrm{rk} : L \to \mathbb{N}^{\mathrm{op}}$$
that reflects identities. Objects of $L$ are the sorts (analogous to “sorts” in first-order logic); morphisms encode dependencies (“an arrow depends on a pair of objects,” etc.). Inverse-ness gives matching objects and allows Reedy methods to apply.
To illustrate the idea, take the example of a reflexive graph. The diagram signature $L_{\mathrm{rg}}$ for reflexive graphs would be given by the following inverse (exo-)category
$$I \xrightarrow{\iota} A \overset{s}{\underset{t}{\rightrightarrows}} O,$$
where $s \circ \iota = t \circ \iota$. The intuition is that we have a sort of “arrows” between any two objects, and a predicate (“identity”) that can be used to select which arrows are identity arrows, with the relation ensuring that this predicate can only be “asked” of loops.
Structures are Reedy fibrant diagrams over these signatures
Given this notion of a signature, a structure in our sense is simply a (Reedy fibrant) functor $M : L \to \mathcal{U}$. In more detail, a raw $L$-diagram is an exo-functor
$$M : L \to \mathcal{U}.$$
For each sort $K$, the matching object $M_{\partial K}$ collects the compatible lower-rank data needed to specify the “boundary” for an element of $M_K$. The Reedy fibrancy condition is: the canonical boundary map $M_K \to M_{\partial K}$ is a fibration (i.e., a dependent type) for each $K$.
The category of such Reedy-fibrant diagrams then forms a fibrant type whose points are the $L$-structures.
To illustrate with the example of $L_{\mathrm{rg}}$ from above, an $L_{\mathrm{rg}}$-structure in our sense would be given by the following data, in type-theoretic notation:
A type $O$
A family $A : O \to O \to \mathcal{U}$ dependent on pairs from $O$
A family $I : \prod_{x : O} A(x, x) \to \mathcal{U}$ for the “identity”
The trick here is to ensure the repeated $x$ in the definition of $I$, obeying the relations of the signature.
This is what the matching object machinery achieves for arbitrary signatures.
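As a sketch in Lean-style notation (illustrative only; the book’s actual formalization lives in 2LTT, not plain Lean), this data is:

```lean
-- A reflexive-graph structure: a sort of objects, arrows depending on a
-- pair of objects, and an "identity" predicate that can only be asked of
-- loops -- the repeated `x` enforces the relation of the signature.
structure ReflexiveGraphStructure where
  O : Type
  A : O → O → Type
  I : (x : O) → A x x → Prop
```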
Other examples of $L$-structures include: categories (with sorts for objects and morphisms), groupoids, $n$-categories for any $n$, preorders, and models of higher-order theories like topological spaces and sup-lattices.
Furthermore, all of what we say applies to theories with axioms (not just structures), which we define in the book too.
Indiscernibility and local univalence
With our formal set-up complete, we address the central question: for arbitrary signatures $L$, when are two “objects” in an $L$-structure equivalent?
Such an “object” could be a categorical (or higher-categorical) gadget itself when $L$ has height greater than 1, e.g. the “objects” of $\mathsf{Cat}$ are themselves categories, which are $L'$-structures for an $L'$ of lower height.
The key idea is: two “objects” are indiscernible if nothing in the signature can distinguish them…up to indiscernibility!
Indiscernibilities and Local Univalence
Fix a signature $L$, an $L$-structure $M$, a rank-$0$ (“bottom”) sort $K$, and elements $a, b \in M_K$.
Think of $M_K$ as a “set” of objects (e.g. of a category) on top of which properties and structures are defined.
To define when $a$ and $b$ are indiscernible, we package together everything that could distinguish them. The intuition is: if there is an equivalence between “everything you can say about $a$” and “everything you can say about $b$” (outside of directly referencing $a$ or $b$ themselves), then they are indiscernible.
We achieve this through a formal definition of a “boundary” of an element, which one can think of as the $L$-structure that contains everything that could distinguish $a$ in $M$ from anything else in $M_K$.
Which naturally leads us to the following definitions.
Definition (Indiscernibility). We say that $a, b \in M_K$ are indiscernible, written $a \asymp b$, iff there is a levelwise equivalence between their boundaries that is coherent in the right way.
Definition (Univalent Structure). We say that $M$ is locally univalent at $K$ if the canonical map
$$(a = b) \longrightarrow (a \asymp b)$$
is an equivalence. We then say a structure is univalent if this holds at all rank-0 sorts and (recursively) for all sorts of higher rank.
We prove our main results for univalent $L$-structures. These are quite special in that they are “saturated” in their equivalence information: two indistinguishable gizmos are actually equal. Or, put differently: when there is not “enough” structure to distinguish two gizmos, there is always enough to prove them equivalent. Some examples to illustrate (see Part 2 of the book for many more!):
In a reflexive graph structure, two nodes $x$ and $y$ are indiscernible iff they have the same number of arrows coming in and out, and the same number of loops that are identities.
In a category, two objects are indiscernible iff they are isomorphic. A univalent category is precisely a category (precategory in UF) that is locally univalent at the sort of its objects.
In an appropriately defined bicategory, an indiscernibility amounts to a pair of coherent adjoint equivalences, as expected.
Equivalences of structures
We are almost there! On to the final missing piece: equivalences between structures themselves.
Let $f : M \to N$ be a morphism of $L$-structures, defined in the expected way (as a natural transformation between the corresponding $\mathcal{U}$-valued functors). Then, for each sort $K$, we have a matching square
$$\begin{array}{ccc} M_K & \to & N_K \\ \downarrow & & \downarrow \\ M_{\partial K} & \to & N_{\partial K} \end{array}$$
and for each “context” $x \in M_{\partial K}$ an induced fiber map $f_x : M_K(x) \to N_K(f(x))$ between the fibers over $x$ and its image.
Write $\asymp_K$ for indiscernibility at sort $K$ (as above). Then, what we really want to say now is: $M$ and $N$ are equivalent if they are “level-wise equivalent up to indiscernibility”. This idea gives us the $\simeq$ from the beginning, and we define it as follows:
Definition (Equivalence of $L$-structures). $f$ is an equivalence if, for every sort $K$, every context $x \in M_{\partial K}$, and every $n \in N_K(f(x))$, there exists a specified $m \in M_K(x)$ with
$$f_x(m) \asymp_K n,$$
i.e., $f$ is essentially split surjective up to indiscernibility on each fiber. We write $M \simeq N$ for the type of equivalences.
The key innovation is the “up to indiscernibility” part; it makes our notion significantly weaker than usual notions, and hence the final result stronger.
Note that we have not been able to prove our result without a splitness condition in the definition of equivalence, and to our mind this remains an open problem.
Our definition is related to Makkai’s original notion of FOLDS equivalence, but Makkai was not able to define a general notion of non-surjective equivalence directly, relying instead on spans. Our notion of indiscernibility circumvents this difficulty and allows us to consider the whole structure of equivalences between structures.
The Univalence Principle
With all our apparatus in place we prove our main result, a very general form of a univalence principle, as promised.
Theorem (“The Univalence Principle”).
For a signature $L$ and univalent $L$-structures $M, N$, the canonical map
$$(M = N) \longrightarrow (M \simeq N)$$
is an equivalence of types. In other words,
$$(M = N) \simeq (M \simeq N).$$
The proof proceeds by induction on the height of $L$. The key insight is that level-wise equivalences between univalent structures must “reflect indiscernibilities”: if $f$ doesn’t preserve the ability to distinguish elements, then whatever structure distinguishes them in the source would transfer to structure distinguishing their images in the target, contradicting the equivalence.
With the splitness assumption on the map $f$ and the assumption of univalence (of our $L$-structures), we are able to achieve the lifting of the indiscernibility.
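To fix the shape of the statement, here is a deliberately schematic Lean 4 sketch that just axiomatizes the ingredients (it is not the book’s 2LTT development):

```lean
import Mathlib.Logic.Equiv.Defs

-- Schematic: a type of univalent L-structures and a type of equivalences
-- between them; the Univalence Principle says identity ≃ equivalence.
axiom Structure : Type 1
axiom StrEquiv : Structure → Structure → Type

axiom univalencePrinciple (M N : Structure) : (M = N) ≃ StrEquiv M N
```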
Our result is actually proved for an even more general class of signatures called functorial signatures, which strictly extends diagram signatures and covers “higher-order” situations (topological spaces, suplattices, DCPOs, etc.).
We have stuck to the diagrammatic view in this post for intuition, but all results and definitions carry over to this more general notion.
Conclusion
In the course of this work there were quite a few questions we tried to answer, but could not. To mention a couple: Can we define a Rezk completion for arbitrary structures, providing a universal way to turn any structure into a univalent one? Can we remove the splitness condition from our definition of equivalence between structures? We list more open problems at the end of the book.
Beyond our specific result, the framework in which it is proven establishes a way to answer metamathematical questions around the univalence axiom in a precise and fruitful way.
It is important to emphasize that carrying out this type of mathematical study does not require choosing one foundation over the other.
In any setting that interprets the fibrant part of 2LTT, the univalence principle will hold, including in set theory.
The metamathematics of UF is the mathematical study of formalizing mathematics in terms of a hierarchy of \(h\)-levels vs. a cumulative hierarchy of sets.
Formalizing mathematics in this way has all sorts of unique mathematical properties.
The Univalence Principle is one of them.
These days I am putting the finishing touches on a hybrid algorithm that optimizes a system (a gamma-ray observatory) by combining reinforcement learning with gradient descent. Although I published an optimization strategy for that application already, I am going back to it to demonstrate a case where the simultaneous optimization of hardware and software is necessary, for a paper on co-design I am writing with several colleagues. In the course of the software development, I ran into a simple but still interesting statistical issue I had not paid attention to until now. So I thought I could share it with you here.
It’s there in every biography, and many interviews: the moment the scientist falls in love with an idea. It can be a kid watching ants in the backyard, a teen peering through a telescope, or an undergrad seeing a heart cell beat on a slide. It’s a story so common that it forms the heart of the public idea of a scientist: not just someone smart enough to understand the world, but someone passionate enough to dive into their one particular area above all else. It’s easy to think of it as a kind of passion most people never get to experience.
And it does happen, sometimes. But it’s a lot less common than you’d think.
I first started to suspect this as a PhD student. In the US, getting accepted into a PhD program doesn’t guarantee you an advisor to work with. You have to impress a professor to get them to spend limited time and research funding on you. In practice, the result was the academic analog of the dating scene. Students looked for who they might have a chance with, based partly on interest but mostly on availability and luck and rapport, and some bounced off many potential mentors before finding one that would stick.
Then, for those who continued to postdoctoral positions, the same story happened all over again. Now, they were applying for jobs, looking for positions where they were qualified enough and might have some useful contacts, with interest in the specific research topic at best a distant third.
Working in the EU, I’ve seen the same patterns, but offset a bit. Students do a Master’s thesis, and the search for a mentor there is messy and arbitrary in similar ways. Then for a PhD, they apply for specific projects elsewhere, and as each project is its own funded position the same job search dynamics apply.
The picture only really clicked for me, though, when I started doing journalism.
Nowadays, I don’t do science, I interview people about it. The people I interview are by and large survivors: people who got through the process of applying again and again and now are sitting tight in an in-principle permanent position. They’re people with a lot of freedom to choose what to do.
And so I often ask for that reason, that passion, that scientific love at first sight moment: why do you study what you do? It’s a story that audiences love, and thus that editors love; it’s always a great way to begin a piece.
But surprisingly often, I get an unromantic answer. Why study this? Because it was available. Because in the Master’s, that professor taught the intro course. Because in college, their advisor had contacts with that lab to arrange a study project. Because that program accepted people from that country.
And I’ve noticed how even the romantic answers tend to be built on the unromantic ones. The professors who know how to weave a story, to self-promote and talk like a politician, they’ll be able to tell you about falling in love with something, sure. But if you read between the lines, you’ll notice where their anecdotes fall, how they trace a line through the same career steps that less adroit communicators admit were the real motivation.
There have been times I’ve thought that my problem was a lack of passion, that I wasn’t in love the same way other scientists were in love. I’ve even felt guilty that I took resources and positions from people who were. There is still some truth in that guilt: I don’t think I had the same passion for my science as most of my colleagues.
But I appreciate more now, that that passion is in part a story. We don’t choose our specialty, making some grand agentic move. Life chooses for us. And the romance comes in how you tell that story, after the fact.
The STOC’2026 accepted papers list is out. It seems to me that there’s an emperor’s bounty of amazing stuff this year. I felt especially gratified to see the paper on the determination of BusyBeaver(5) on the list, reflecting a broad view of what theory of computing is about.
There’s a phenomenal profile of Henry Yuen in Quanta magazine. Henry is now one of the world leaders of quantum complexity theory, involved in breakthroughs like MIP*=RE and now pioneering the complexity theory of quantum states and unitary transformations (the main focus of this interview). I’m proud that Henry tells Quanta that he learned about the field in 2007 or 2008 from a blog called … what was it again? … Shtetl-Optimized? I’m also proud that I got to help mentor Henry when he was a PhD student of my wife Dana Moshkovitz at MIT. Before I read this Quanta profile, I didn’t even know the backstory about Henry’s parents surviving and fleeing the Cambodian genocide, or about Henry growing up working in his parents’ restaurant. Henry never brought any of that up!
See Lance’s blog for an obituary of Joe Halpern, a pioneer of the branch of theoretical computer science that deals with reasoning about knowledge (e.g., the muddy children puzzle), who sadly passed away last week. I knew Prof. Halpern a bit when I was an undergrad at Cornell. He was a huge presence in the Cornell CS department who’ll be sorely missed.
UT Austin has announced the formation of a School of Computing, which will bring together the CS department (where I work) with statistics, data science, and several other departments. Many of UT’s peer institutions have recently done the same. Naturally, I’m excited for what this says about the expanded role of computing at UT going forward. We’ll be looking to hire even more new faculty than we were before!
When I glanced at the Chronicle of Higher Education to see what was new, I learned that researchers at OpenAI had proposed a technical solution, called “watermarking,” that might help tackle the crisis of students relying on AI to write all their papers … but that OpenAI had declined to deploy that solution. The piece strongly advocates a legislative mandate in favor of watermarking LLM outputs, and addresses some of the main counterarguments to that position.
Just a brief announcement that I have been working with Quanta Books to publish a short book in popular mathematics entitled “Six Math Essentials”, which will cover six of the fundamental concepts in mathematics — numbers, algebra, geometry, probability, analysis, and dynamics — and how they connect with our real-world intuition, with the history of math and science, and with the modern practice of mathematics, both in theory and in applications. The scheduled publication date is Oct 27, but it is currently available for preorder.
I wrote code this weekend to look at the question of how we should visit a star in the upcoming Terra Hunting Experiment. The current (straw-person) plan is that we will observe each visible star once per night for ten years, with one exposure of a sensibly-chosen exposure time at each visit. Is this a good idea? I was interested in this problem for two reasons. The first is that binning is sinning, with the corollary that bigger bins are worse than finer bins, and a single, long exposure is a very big bin. The second reason is that when there are non-trivial noise sources (like the quasi-periodic variations from p-mode oscillations of the surfaces of Sun-like stars), a few negatively- (or interestingly-) correlated noise draws can be combined in ways that are substantially more informative than by taking the average.
Of course, if you split an exposure (with a standard CCD, say) into sub-exposures, you take on real costs: there is a read time, which is time you aren't integrating, and there is read noise, which affects each new exposure. So the best strategies are a complicated function of read time, read noise, and the signal-to-noise at which the stellar p-mode oscillations are visible in any realistic data. Related: there are amazingly different and interesting strategies with up-the-ramp detectors, which are used in the infrared.
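To make the correlated-noise point concrete, here is a minimal toy sketch (my illustration, not any actual Terra Hunting code). It assumes a damped-cosine covariance for the p-mode wiggles, and every number in it (cadence, amplitudes, period, damping time) is made up. It compares the naive average of sub-exposures with the optimal inverse-covariance (generalized least squares) combination:

```python
import numpy as np

n = 8                      # number of sub-exposures per visit
t = np.arange(n) * 30.0    # sub-exposure start times in seconds (hypothetical)
sigma_phot = 1.0           # photon-noise RV error per sub-exposure (m/s)
sigma_pmode = 0.7          # p-mode RV amplitude (m/s)
P = 300.0                  # p-mode period, ~5 minutes for a Sun-like star

# Covariance of the sub-exposure RVs: white photon noise plus a
# damped-cosine kernel for the quasi-periodic p-mode oscillations.
dt = t[:, None] - t[None, :]
C = sigma_phot**2 * np.eye(n) \
    + sigma_pmode**2 * np.cos(2 * np.pi * dt / P) * np.exp(-np.abs(dt) / 600.0)

ones = np.ones(n)
var_naive = ones @ C @ ones / n**2          # variance of the plain average
var_gls = 1.0 / (ones @ np.linalg.inv(C) @ ones)  # optimal combination
print(f"naive average : {np.sqrt(var_naive):.3f} m/s")
print(f"optimal combo : {np.sqrt(var_gls):.3f} m/s")
```

Because the p-mode term correlates (and anti-correlates) nearby sub-exposures, the inverse-covariance combination always does at least as well as the plain mean, and a single long exposure is effectively the plain mean: it throws that information away.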
One final comment is that the objective, in my strongly held view, is to optimize the amount of information (about, say, the center-of-mass radial-velocity changes of the target star) per unit wall-clock time. We are paying for wall-clock time; let's get as much as we can out of it.
Now that we're 6 weeks into the new year, I think it's worth it to do an incomplete roundup of where we are on US federal support of STEM research. Feel free to skip this post if you don't want to read about this.
Appropriators in Congress largely went against the FY26 presidential budget request, and the various spending bills by and large funded most US science agencies at slightly less than level funding. A physics-oriented take is here. The devil is in the details. The AAAS federal R&D dashboard lets you explore this at a finer level. Nature has an interactive widget that visualizes what has been cut and what remains.
Bear in mind, that was just year 1 of the present administration. All of the effort, all of the work pushing back against the absolutely draconian, agency-destroying cuts that were proffered? That will likely have to be done again this year. And in subsequent years, if the administration keeps pushing enormously slashed budgets in its budget requests.
There is an issue of Science whose entire news section is about how the past year has changed science funding and the scientific pipeline in the US.
In NSF news, the rate of awards remains very low, though there is almost certainly a major delay because of the lateness of the budget, reduced staffing levels, and the restructuring now that Divisions no longer exist. How a greater emphasis on specific strategic priorities (beyond what is in the program calls) will affect operations remains unclear, at least to me.
Also, some NSF graduate research fellowship applications, especially in the life sciences, seem to be getting kicked back without review - see here (sorry about the paywall). This seems to be an issue affecting broad research areas, with no information given to applicants about it (that lack of information flow is perhaps unsurprising).
The back and forth about indirect cost rates continues, along with the relevant court cases. The recent appropriations have language to prevent sudden changes in rates. The FAIR model is not yet passed.
I could go on. I know I've left out critical areas, and I haven't talked about DOE or NASA or DOD or EPA or NOAA explicitly.
Honest people can have discussions about the right balance of federal vs state vs industrial vs philanthropic support for research. There are no easy answers in the present time. For those who think that robust public investment in science and engineering research is critical to societal good, economic competitiveness, and security, we need to keep pushing and not let fatigue or fatalism win the day.
Tomorrow is Valentine’s Day, so it’s time for this blog’s yearly tradition of posting a poem. Next week there may be a prose take on the same topic.
You’ve heard love stories like Oliver’s, I’m sure. Meeting that childhood sweetheart In the back room, with the garden view And trust that, with a wink, the parents may regret. Stories tungsten-milled To fit our expectations.
And you’ve heard wilder stories From genuinely riskier lives. The rescue and the love linked under the Milky Way Like an action movie. The love’s reality, even so, Defying summary.
You’ve heard stories of wide-eyed students Realizing they can be adults. Of those moments in study or celebration Turning points in self-conception. And maybe you don’t ask About the other times.
Love happens, And we love love to happen. But we build love too.
Today my rant on LLMs and the practices of our field hit the arXiv. I was scared to post it, because it is such a weird contribution, and it is so revealing about myself and my own political positions and hangups. But I have to say: I got great and supportive feedback all day.
I got two comments on saying ACAB in the literature. The Astronomer Royal of Scotland quoted (on BlueSky) the last sentence, which I put there because Andy Casey (Monash, Flatiron) insisted. Many people sent me appreciation and thank-yous, and many people sent me comments and objections. Always constructive. The whole experience made me feel very happy about the state of our field and the way we all interact. I think maybe there will be critical mass to write some kind of collection of essays on the subject. That's a plan for 2026.
My husband and I visited the Library of Congress on the final day of winter break this year. In a corner, we found a facsimile of a hand-drawn map: the world as viewed by sixteenth-century Europeans. North America looked like it had been dieting, having shed landmass relative to the bulk we knew. Australia didn’t appear. Yet the map’s aesthetics hit home: yellowed parchment, handwritten letters, and symbolism abounded. Never mind street view; I began hungering for an “antique” setting on Google Maps.
1507 Waldseemüller Map, courtesy of the Library of Congress
Approximately four weeks after that trip, I participated in the release of another map: the publication of the review “Roadmap on quantum thermodynamics” in the journal Quantum Science and Technology. The paper contains 24 chapters, each (apart from the introduction) profiling one opportunity within the field of quantum thermodynamics. My erstwhile postdoc Aleks Lasek and I wrote the chapter about the thermodynamics of incompatible conserved quantities, as Quantum Frontiers fans might guess from earlier blog posts.
Allow me to confess an ignoble truth: upon agreeing to coauthor the roadmap, I doubted whether it would impact the community enough to merit my time. Colleagues had published the book Thermodynamics in the Quantum Regime seven years earlier. Different authors had contributed different chapters, each about one topic on the rise. Did my community need such a similar review so soon after the book’s publication? If I printed a map of a city the last time I visited, should I print another map this time?
Apparently so. I often tout the swiftness with which quantum thermodynamics is developing, yet not even I predicted the appetite for the roadmap. Approximately thirty papers cited the arXiv version of the paper during the first nine months of its life—before the journal publication. I shouldn’t have likened the book and roadmap to maps of a city; I should have likened them to maps of a terra incognita undergoing exploration. Such maps change constantly, let alone over seven years.
A favorite map of mine, from a book
Two trends unite many of the roadmap’s chapters, like a mountain range and a river. First, several chapters focus on experiments. Theorists founded quantum thermodynamics and dominated the field for decades, but experimentalists are turning the tables. Even theory-heavy chapters, like Aleks’s and mine, mention past experiments and experimental opportunities.
Second, several chapters blend quantum thermodynamics with many-body physics. Many-body physicists share interests with quantum thermodynamicists: thermalization and equilibrium, the absence thereof, and temperature. Yet many-body physicists belong to another tribe. They tend to interact with each other differently than quantum thermodynamicists do, write papers differently, adhere to different standards, and deploy different mathematical toolkits. Many-body physicists use random-matrix theory, mean field theory, Wick transformations, and the like. Quantum thermodynamicists tend to cultivate and apply quantum information theory. Yet the boundary between the communities has blurred, and many scientists (including yours truly) shuttle between the two.
My favorite anti-map, from another book (series)
When Quantum Science and Technology published the roadmap, lead editor Steve Campbell announced the event to us coauthors. He’d wrangled the 69 of us into agreeing to contribute, choosing topics, drafting chapters, adhering to limitations on word counts and citations, responding to referee reports, and editing. An idiom refers to the herding of cats, but it would gain in poignancy by referring to the herding of academics. Little wonder Steve wrote in his email, “I’ll leave it to someone else to pick up the mantle and organise Roadmap #2.” I look forward to seeing that roadmap—and, perhaps, contributing to it. Who wants to pencil in Australia with me?
There seems to be a huge push lately in the tech world for the idea of placing data centers in space. This is not just coming from Musk via the merging of SpaceX and xAI. Google has some effort along these lines. NVIDIA is thinking about it. TED talks are being given by startup people in San Francisco on this topic, so you know we've reached some well-defined hype level. Somehow the idea has enough traction that even the PRC is leaning in this direction. The arguments seem to be that (1) there is abundant solar power in space; (2) environmental impact on the earth will be less, with no competition for local electricity, water, real estate; (3) space is "cold", so cooling these things should be do-able; (4) it's cool and sounds very sci-fi/high frontier.
At present (or near-future) levels of technology, as far as I can tell this idea makes no sense. I will talk about physics reasons here, though there are also pragmatic economic reasons why this seems crazy. I've written before that I think some of the AI/data center evangelists are falling victim to magical thinking, because they come from the software world and don't in their heart of hearts appreciate that there are actual hardware constraints on things like chip manufacturing and energy production.
Others have written about this - see here for example. The biggest physics challenges with this idea (beyond lofting millions of kg of cargo into orbit):
While the cosmic microwave background is cold, cooling things in space is difficult, because vacuum is an excellent thermal insulator. On the ground, you can use conduction and convection to get rid of waste heat. In space, your only option (beyond throwing mass overboard, which is not readily replenishable) is radiative cooling. The key physics here is the Stefan-Boltzmann law, which is a triumph of statistical physics (and one of my favorite derivations to discuss in class - you combine the Planck result for the energy density of a "gas" of photons in thermal equilibrium at some temperature \(T\) with a basic kinetic theory of gases result for the flux of particles out of a small hole). It tells you the best you can ever do: for an ideal black body, the total power radiated away is proportional to the area of the radiator and \(T^{4}\), with fundamental constants making up the proportionality constant, with zero adjustable parameters.
Remember, data centers right now consume enormous amounts of power (and cooling water). While you can use heat pumps to try to get the radiators up to well above the operating temperatures of the electronics, that increases mass and waste power, and realistically there is an upper limit on the radiator temperature below 1000 K. An ideal black body radiator at 1000 K puts out about 57 kW per square meter, and you probably need to get rid of tens of megawatts, necessitating hundreds to thousands of square meters of radiator area. There are clever ideas on how to try to do this. For example, in the liquid droplet radiator, you could spray a bunch of hot droplets out into space, capitalizing on their large specific surface area. Of course, you'd need to recapture the cooled droplets, and the hot liquid needs to have sufficiently low vapor pressure that you don't lose a lot of material. Still, as far as I am aware, to date no one has actually deployed a large-scale (ten kW let alone MW level) droplet radiator in space.
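To put numbers on that paragraph, here is the quick arithmetic (a sketch of my own; the waste-heat loads are illustrative, and a one-sided ideal black body is assumed, so real radiators do worse):

```python
# Back-of-the-envelope radiator sizing from the Stefan-Boltzmann law.
sigma = 5.670e-8      # Stefan-Boltzmann constant, W m^-2 K^-4
T_rad = 1000.0        # radiator temperature in K (an optimistic upper limit)

flux = sigma * T_rad**4          # radiated power per unit area
print(f"flux at {T_rad:.0f} K: {flux/1e3:.1f} kW/m^2")   # ~57 kW/m^2

for P_waste in (10e6, 50e6, 1e9):    # 10 MW, 50 MW, 1 GW of waste heat
    print(f"{P_waste/1e6:6.0f} MW -> {P_waste/flux:8.0f} m^2 of radiator")
```

At 1000 K this reproduces the roughly 57 kW per square meter quoted above, and tens of megawatts of waste heat indeed demand hundreds to thousands of square meters of ideal radiator.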
High-end computational hardware is vulnerable to radiation damage. There are no rad-hard GPUs. Low earth orbit is a pretty serious radiation environment, with a flux of high-energy cosmic rays quite a bit higher than on the ground. While there are tests going on, and astronauts are going to bring smartphones on the next Artemis mission, it's rough. Putting many thousands to millions of GPUs and huge quantities of memory in a harsh environment where they cannot be readily accessed or serviced seems unwise. (There are also serious questions of vulnerability to attack. Setting off a small nuclear warhead in LEO injects energetic electrons into the lower radiation belts and would be a huge mess.)
I think we will be faaaaaaar better off in the long run if we take a fraction of the money that people want to invest in space-based data centers, and instead plow those resources into developing energy-efficient computing. Musk has popularized the engineering sentiment "The best part is no part". The best way to solve the problem of supplying and radiating away many GW of power for data centers is to make data centers that don't consume many GW of power.
Mr. Epstein was not only a world-class child abuser but also a big fan of theoretical high-energy physics and of theoretical physicists. Some of my colleagues, unfortunately, got to know him. A number who were famous and/or had John Brockman as a book agent were even invited to a physics conference on Epstein’s private island, well before he was first arrested. This was no secret; as I recall, a lot of us heard about the existence of this conference/trip, but we hadn’t heard Epstein’s name before and didn’t pay much attention (ho hum, just another weird billionaire).
Personally, I feel quite lucky. The Brockman agency rejected the proposal for my recent book without comment (thank you!); and my research is mostly considered unimportant by the Brian Greenes of the world. As a result, I was not invited to Epstein’s island, never made his acquaintance, and blissfully avoided the entire affair. Clearly there are some benefits to being considered ordinary. And so — I’m sorry/not-sorry to say — I can’t tell you much about Epstein at all, or about how certain physicists did and did not interact with him. Regarding my colleagues who did get to know him, I can’t speak for them, since I wasn’t there, and I don’t know to what extent Epstein hid his immoral activities when they were around. It’s up to them to tell their own stories if they feel the need to do so (and I hope a couple of them do, just to clear the air). Personally I tend to give them the benefit of the doubt — probably some literally didn’t know what was up until Epstein’s conviction in 2008, while perhaps others felt there wasn’t much they could do about Epstein’s actions on his own private island. I imagine they are deeply embarrassed to have been caught in this horrible man’s ugly web.
Fans of physics come in all shapes and sizes, and some have large wallets, large egos, and/or large ambitions. Among the wealthy supporters, we can count Alfred Nobel himself; billionaires sit on important scientific institute and university boards, and the more recent Breakthrough Prizes were funded by deep pockets. The extreme wealthy have outsized influence in our country and in our world, and one could argue that their influence in 2025 was not for the better. Usually, though, the influence in physics and related fields tends to be relatively benign, funding postdoctoral researchers and graduate students who deeply want to do science but also need to eat. That said, sometimes donors fund non-essential fields at the expense of critical ones, or favor theoretical research over the gathering of crucial experimental data, or push money on famous rich organizations when there are poor ones that are equally deserving and far more needy.
When gazillionaires, on their own initiative, come calling on non-profit organizations, whether they be community centers, arts organizations, or universities, they pose a problem. On the one hand, it is the job of anyone in a non-profit organization to help raise money — fail to do that, and your organization will close. When a single person offers to permanently change the future of your program, you would be derelict in your duty if you did not consider that offer. On the other hand, donors who might have ethical or criminal problems could drag the organization’s name through the mud. Worse, they might be able to force the organization itself to do something ethically questionable or even illegal.
There is a clear lesson for young academics and other up-and-coming non-profit actors in the Epstein affair: the more money potentially offered to our organizations, the more carefully we must tread. Money is power; power corrupts; and every pursuit of dollars, even for the best causes, risks infection. We can’t be large-scale non-profit fundraisers without doing serious and thorough background checks of the biggest donors; we have to question motives, and we can’t look the other way when something seems amiss. Those of us with clear hearts and honest pursuits tend to assume the best in other people. But we have to beware of those hoping to bolster their reputations, or clean their consciences, by giving away “generously” what they never deserved to have.
My top 10 ghosts (solo acts and ensembles). If Bruce Willis being a ghost in The Sixth Sense is a spoiler, that’s on you — the movie has been out for 26 years.
Einstein and I have both been spooked by entanglement. Einstein’s experience was more profound: in a 1947 letter to Born, he famously dubbed it spukhafte Fernwirkung (or spooky action at a distance). Mine, more pedestrian. It came when I first learned the cost of entangling logical qubits on today’s hardware.
Logical entanglement is not easy
I recently listened to a talk where the speaker declared that “logical entanglement is easy,” and I have to disagree. You could argue that it looks easy when compared to logical small-angle gates, in much the same way I would look small standing next to Shaquille O’Neal. But that doesn’t mean 6’5” and 240 pounds is small.
To see why it’s not easy, it helps to look at how logical entangling gates are actually implemented. A logical qubit is not a single physical object. It’s an error-resistant qubit built out of several noisy, error-prone physical qubits. A quantum error-correcting (QEC) code with parameters \([[n, k, d]]\) uses \(n\) physical qubits to encode \(k\) logical qubits in a way that can detect up to \(d-1\) physical errors and correct up to \(\lfloor (d-1)/2 \rfloor\) of them.
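For concreteness, a standard example (mine, not the post’s): the Steane code, with parameters \([[7,1,3]]\), uses 7 physical qubits to encode 1 logical qubit, and
\[
d - 1 = 2 \text{ errors can be detected}, \qquad \left\lfloor \tfrac{d-1}{2} \right\rfloor = 1 \text{ error can be corrected}.
\]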
This redundancy is what makes fault-tolerant quantum computing possible. It’s also what makes logical operations expensive.
On platforms like neutral-atom arrays and trapped ions, the standard approach is a transversal CNOT: you apply two-qubit gates pairwise across the code blocks (qubit \(i\) in block A interacts with qubit \(i\) in block B). That requires \(n\) physical two-qubit gates to entangle the \(k\) logical qubits of one code block with the \(k\) logical qubits of another.
To make this less abstract, here’s a QuEra animation showing a transversal CNOT implemented in a neutral-atom array. The animation shows real experimental data, not a schematic idealization.
The idea is simple. The problem is that \(n\) can be large, and physical two-qubit gates are among the noisiest operations available on today’s hardware.
Superconducting platforms take a different route. They tend to rely on lattice surgery: you entangle logical qubits by repeatedly measuring joint stabilizers along a boundary. That trades two-qubit gates for stabilizer measurements over multiple rounds (typically scaling with the code distance \(d\)). Unfortunately, physical measurements are the other noisiest primitive we have.
Then there are the modern high-rate qLDPC codes, which pack many logical qubits into a single code block. These are excellent quantum memories. But when it comes to computation, they face challenges. Logical entangling gates can require significant circuit depth, and often entire auxiliary code blocks are needed to mediate the interaction.
This isn’t a purely theoretical complaint. In recent state-of-the-art experiments by Google and by the Harvard–QuEra–MIT collaboration, logical entangling gates consumed nearly half of the total error budget.
So no, logical entanglement is not easy. But, how easy can we make it?
Phantom codes: Logical entanglement without physical operations
To answer how easy logical entanglement can really be, it helps to start with a slightly counterintuitive observation: logical entanglement can sometimes be generated purely by permuting physical qubits.
Let me show you how this works in the simplest possible setting, and then I’ll explain what’s really going on.
Consider the \([[4,2,2]]\) stabilizer code, which uses 4 physical qubits to encode 2 logical ones; it can detect 1 error, but can’t correct any. Below are its logical operators; the arrow indicates what happens when we physically swap qubits 1 and 3 (bars denote logical operators).
You can check that the logical operators transform exactly as shown, which is the action of a logical CNOT gate. For readers less familiar with stabilizer codes, click the arrow below for an explanation of what’s going on. Those familiar can carry on.
Click!
At the logical level, we identify gates by how they transform logical Pauli operators. This is the same idea used in ordinary quantum circuits: a gate is defined not just by what it does to states, but by how it reshuffles observables.
A CNOT gate has a very characteristic action. If qubit 1 is the control and qubit 2 is the target, then: an \(X\) on the control spreads to the target, a \(Z\) on the target spreads back to the control, and the other Pauli operators remain unchanged.
That’s exactly what we see above.
To see why this generates entanglement, it helps to switch from operators to states. A canonical example of how to generate entanglement in quantum circuits is the following. First, you put one qubit into a superposition using a Hadamard. Starting from \(|00\rangle\), this gives
\[ \tfrac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right)|0\rangle. \]
At this point there is still no entanglement — just superposition.
The entanglement appears when you apply a CNOT. The CNOT correlates the two branches of the superposition, producing
\[ \tfrac{1}{\sqrt{2}}\left(|00\rangle + |11\rangle\right), \]
which is a maximally-entangled Bell state. The Hadamard creates superposition; the CNOT turns that superposition into correlation.
The operator transformations above are simply the algebraic version of this story. Seeing
\[ \bar{X}_1 \;\mapsto\; \bar{X}_1 \bar{X}_2 \]
tells us that information on one logical qubit is now inseparable from the other.
In other words, in this code, physically swapping qubits 1 and 3 enacts a logical CNOT.
The figure below shows how this logical circuit maps onto a physical circuit. Each horizontal line represents a qubit. On the left is a logical CNOT gate: the filled dot marks the control qubit, and the ⊕ symbol marks the target qubit, whose state is flipped if the control is in the state \(|1\rangle\). On the right is the corresponding physical implementation, where the logical gate is realized by acting on multiple physical qubits.
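If you want to verify the “you can check” step yourself, here is a small self-contained script (my sketch, not the authors’ code). It assumes one standard choice of logical operators for the \([[4,2,2]]\) code, which may differ from the post’s convention by relabeling; it tracks Paulis in symplectic form (phases ignored) and confirms that swapping physical qubits 1 and 3 transforms the logical operators exactly as a CNOT should:

```python
import numpy as np
from itertools import product

N = 4  # physical qubits

def pauli(xs=(), zs=()):
    """Pauli operator in symplectic form: X-bits then Z-bits (phases ignored)."""
    x = np.zeros(N, dtype=int); z = np.zeros(N, dtype=int)
    for i in xs: x[i] = 1
    for i in zs: z[i] = 1
    return np.concatenate([x, z])

# [[4,2,2]] code on qubits 0..3: stabilizers XXXX and ZZZZ, with one
# standard (assumed) choice of logical operators.
stabs = [pauli(xs=(0, 1, 2, 3)), pauli(zs=(0, 1, 2, 3))]
X1, X2 = pauli(xs=(0, 1)), pauli(xs=(0, 2))   # logical X on logical qubits 1, 2
Z1, Z2 = pauli(zs=(0, 2)), pauli(zs=(0, 1))   # logical Z on logical qubits 1, 2

def permute(p, perm):
    """Relabel the physical qubits of a Pauli operator."""
    x, z = p[:N], p[N:]
    return np.concatenate([x[perm], z[perm]])

def equal_mod_stabs(p, q):
    """Are two Paulis equal up to multiplication by stabilizers?"""
    for bits in product([0, 1], repeat=len(stabs)):
        s = sum(b * st for b, st in zip(bits, stabs)) % 2
        if np.array_equal((q + s) % 2, p):
            return True
    return False

swap13 = np.array([2, 1, 0, 3])  # swap physical qubits 0 and 2 (the post's 1 and 3)

# CNOT with control = logical qubit 1 and target = logical qubit 2 acts as
#   X1 -> X1*X2,  X2 -> X2,  Z1 -> Z1,  Z2 -> Z1*Z2.
checks = [(permute(X1, swap13), (X1 + X2) % 2),
          (permute(X2, swap13), X2),
          (permute(Z1, swap13), Z1),
          (permute(Z2, swap13), (Z1 + Z2) % 2)]
print(all(equal_mod_stabs(a, b) for a, b in checks))  # True
```

It prints True: every permuted logical operator matches the CNOT action.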
At this point, all we’ve done is trade one physical operation for another. The real magic comes next. Physical permutations do not actually need to be implemented in hardware. Because they commute cleanly through arbitrary circuits, they can be pulled to the very end of a computation and absorbed into a relabelling of the final measurement outcomes. No operator spread. No increase in circuit depth.
This is not true for generic physical gates. It is a unique property of permutations.
To see how this works, consider a slightly larger example using a code with three logical qubits. Here the logical operators are a bit more complicated:
Below is a three-logical-qubit circuit implemented using this code, like the circuit drawn above, but now with an extra step. Suppose the circuit contains three logical CNOTs, each implemented via a physical permutation.
Instead of executing any of these permutations, we simply keep track of them classically and relabel the outputs at the end. From the hardware’s point of view, nothing happened.
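Here is a minimal sketch of that classical bookkeeping (my illustration: the permutations and the measurement data are made up, and the wiring convention is spelled out in the comments):

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)

# Three "logical CNOTs", each given as a permutation of the n physical
# qubits; perm[i] is the position that wire i is sent to.
perms = [np.array([2, 1, 0, 3, 4, 5, 6, 7]),   # swap qubits 0 and 2
         np.array([0, 3, 2, 1, 4, 5, 6, 7]),   # swap qubits 1 and 3
         np.array([4, 1, 2, 3, 0, 5, 6, 7])]   # swap qubits 0 and 4

# Never execute them on hardware: just compose them classically.
total = np.arange(n)
for p in perms:
    total = p[total]          # wire i now ends at position p[total_old[i]]

# Run the rest of the circuit with the qubits untouched, measure, and
# relabel at readout: wire i's outcome sits at physical position total[i].
raw = rng.integers(0, 2, size=n)   # stand-in for real measurement data
relabelled = raw[total]
print(raw, "->", relabelled)
```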
If you prefer a more physical picture, imagine this implemented with atoms in an array. The atoms never move. No gates fire. The entanglement is there anyway.
This is the key point. Because no physical gates are applied, the logical entangling operation has zero overhead. And for the same reason, it has perfect fidelity. We’ve reached the minimum possible cost of a logical entangling gate. You can’t beat free.
To be clear, not all codes are amenable to logical entanglement through relabeling. This is a very special feature that exists in some codes.
Motivated by this observation, my collaborators and I defined a new class of QEC codes. I’ll state the definition first, and then unpack what it really means.
Phantom codes are stabilizer codes in which logical entangling gates between every ordered pair of logical qubits can be implemented solely via physical qubit permutations.
The phrase “every ordered pair” is a strong requirement. For three logical qubits, it means the code must support logical CNOTs between the ordered pairs of qubits \((1,2)\), \((2,1)\), \((1,3)\), \((3,1)\), \((2,3)\), and \((3,2)\). More generally, a code with \(k\) logical qubits must support all \(k(k-1)\) possible directed CNOTs. This isn’t pedantry. Without access to every directed pair, you can’t freely build arbitrary entangling circuits — you’re stuck with a restricted gate set.
The phrase “solely via physical qubit permutations” is just as demanding. If all but one of those CNOTs could be implemented via permutations, but the last one required even a single physical gate — say, a one-qubit Clifford — the code would not be phantom. That condition is what buys you zero overhead and perfect fidelity. Permutations can be compiled away entirely; any additional physical operation cannot.
Together, these two requirements carve out a very special class of codes. All in-block logical entangling gates are free. Logical entangling gates between phantom code blocks are still available — they’re simply implemented transversally.
After settling on this definition, we went back through the literature to see whether any existing codes already satisfied it. We found two: the Carbon code and hypercube codes. The former enabled repeated rounds of quantum error correction in trapped-ion experiments, while the latter underpinned recent neutral-atom experiments achieving logical-over-physical performance gains in quantum circuit sampling.
Both are genuine phantom codes. Both are also limited. With small distance, they can detect errors but not correct them. With only a few logical qubits, there’s a limited class of CNOT circuits you can implement. Which raises the questions: Do other phantom codes exist? Can these codes have advantages that persist for scalable applications under realistic noise conditions? What structural constraints do they obey (parameters, other gates, etc.)?
Before getting to that, a brief note for the even more expert reader on four things phantom codes are not. Phantom codes are not a form of logical Pauli-frame tracking: the phantom property survives in the presence of non-Clifford gates. They are not strictly confined to a single code block: because they are CSS codes, multiple blocks can be stitched together using physical CNOTs in linear depth. They are not automorphism gates, which rely on single-qubit Cliffords and therefore do not achieve zero overhead or perfect fidelity. And they are not codes like SHYPS, Gross, or Tesseract codes, which allow only products of CNOTs via permutations rather than individually addressable ones. All of those codes are interesting. They’re just not phantom codes.
In a recent preprint, we set out to answer the three questions above. This post isn’t about walking through all of those results in detail, so here’s the short version. First, we find many more phantom codes — hundreds of thousands of additional examples, along with infinite families that allow both \(k\) and \(d\) to scale. We study their structural properties and identify which other logical gates they support beyond their characteristic phantom ones.
Second, we show that phantom codes can be practically useful for the right kinds of tasks — essentially, those that are heavy on entangling gates. In end-to-end noisy simulations, we find that phantom codes can outperform the surface code, achieving one–to–two orders of magnitude reductions in logical infidelity for resource state preparation (GHZ-state preparation) and many-body simulation, at comparable qubit overhead and with a modest preselection acceptance rate of about 24%.
If you’re interested in the details, you can read more in our preprint.
Larger space of codes to explore
This is probably a good moment to zoom out and ask the referee question: why does this matter?
I was recently updating my CV and realized I’ve now written my 40th referee report for APS. After a while, refereeing trains a reflex. No matter how clever the construction or how clean the proof, you keep coming back to the same question: what does this actually change?
So why do phantom codes matter? At least to me, there are two reasons: one about how we think about QEC code design, and one about what these codes can already do in practice.
The first reason is the one I’m most excited about. It has less to do with any particular code and more to do with how the field implicitly organizes the space of QEC codes. Most of that space is organized around familiar structural properties: encoding rate, distance, stabilizer weight, LDPC-ness. These form the axes that make a code a good memory. And they matter, a lot.
But computation lives on a different axis. Logical gates cost something, and that cost is sometimes treated as downstream—something to be optimized after a code is chosen, rather than something to design for directly. As a result, the cost of logical operations is usually inherited, not engineered.
One way to make this tension explicit is to think of code design as a multi-dimensional space with at least two axes. One axis is memory cost: how efficiently a code stores information. High rate, high distance, low-weight stabilizers, efficient decoding — all the usual virtues. The other axis is computational cost: how expensive it is to actually do things with the encoded qubits. Low computational cost means many logical gates can be implemented with little overhead; it is what makes computation easy.
Why focus on extreme points in this space? Because extremes are informative. They tell you what is possible, what is impossible, and which tradeoffs are structural rather than accidental.
Phantom codes sit precisely at one such extreme: they minimize the cost of in-block logical entanglement. That zero-logical-cost extreme comes with tradeoffs. The phantom codes we find tend to have high stabilizer weights, and for the families with scalable distance, the number of physical qubits grows exponentially. These are real costs, and they matter.
Still, the important lesson is that even at this extreme point, codes can outperform LDPC-based architectures on well-chosen tasks. That observation motivates an approach to QEC code design in which the logical gates of interest are placed at the centre of the design process, rather than treated as an afterthought. This is my first takeaway from this work.
Second is that phantom codes are naturally well suited to circuits that are heavy on logical entangling gates. Some interesting applications fall into this category, including fermionic simulation and correlated-phase preparation. Combined with recent algorithmic advances that reduce the overhead of digital fermionic simulation, these code-level ideas could potentially improve near-term experimental feasibility.
Back to being spooked
The space of QEC codes is massive. Perhaps two axes are not enough. Stabilizer weight might deserve its own. Perhaps different applications demand different projections of this space. I don’t yet know the best way to organize it.
The size of this space is a little spooky — and that’s part of what makes it exciting to explore, and to see what these corners of code space can teach us about fault-tolerant quantum computation.
Strange how time goes by. And strange that I would say that, since I know time does not flow; it is just our perception of one of the spacetime coordinates of our block universe... The thing is, on February 5 I will turn 60. An important date for anybody - I could say a milestone. First of all, let me say that we take for granted all the days of our life we got to live, but in truth we did not know from the start that we would make it this far. I do feel rather young still, but I am very well aware that there are heaps of ways I could have ended my life earlier. Accidents, but also naturally occurring sickness.
The short course is aimed at people from industry or government who want to get started in deep learning, apply deep learning to their projects, learn how to code deep learning algorithms, and upgrade their skills to the latest AI algorithms, including generative AI. The course will be taught by Professor Xavier Bresson from the Department of Computer Science at the National University of Singapore (NUS).
The course will be limited to 25 participants, and the fee is $2,500 per participant. Registration closes soon (Feb 5); we still have a few spots available.
Thomas Bloom’s Erdős problem site has become a real hotbed of activity in recent months, particularly as some of the easiest of the outstanding open problems have turned out to be amenable to various AI-assisted approaches; there is now a lively community in which human contributions, AI contributions, and hybrid contributions are presented, discussed, and in some cases approved as updates to the site.
One of the lessons I draw from this is that once a well curated database of precise mathematical problems is maintained, it becomes possible for other parties to build upon it in many ways (including both AI-based and human-based approaches), to systematically make progress on some fraction of the problems.
This makes me wonder what other mathematical databases could be created to stimulate similar activity. One candidate that came to mind is “optimization constants” – constants that arise from some mathematical optimization problem of interest, for instance finding the best constant for which a certain functional inequality is satisfied.
I am therefore proposing to create a crowdsourced repository for such constants, to record the best upper and lower bounds known for any given such constant, in order to help encourage efforts (whether they be by professional mathematicians, amateur mathematicians, or research groups at a tech company) to try to improve upon the state of the art.
There are of course thousands of such constants one could consider, but just to set the discussion going, I set up a very minimal proof-of-concept GitHub repository holding over 20 constants including:
Here, I am taking inspiration from the Erdős problem web site and arbitrarily assigning a number to each constant, for ease of reference.
Even in this minimal state I think the repository is ready to start accepting more contributions, in the form of pull requests that add new constants, or improve the known bounds on existing constants. (I am particularly interested in constants that have an extensive literature of incremental improvements in the lower and upper bounds, and which look at least somewhat amenable to computational or AI-assisted approaches.) But I would be interested to hear feedback on how to improve the repository in other ways.
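For what it's worth, here is a purely hypothetical sketch of what a machine-readable entry might look like; the actual repository's schema may well differ, and every field name and value below is invented for illustration:

```python
# Hypothetical entry format (illustration only, not the repository's schema).
entry = {
    "id": "C7",  # arbitrary identifier, in the spirit of the Erdős problem numbering
    "description": "best constant in a (hypothetical) functional inequality",
    "lower_bound": {"value": 1.5, "reference": "Author A (1990)"},
    "upper_bound": {"value": 1.7320508, "reference": "Author B (2015)"},
    "notes": "conjectured sharp value: sqrt(3)",
}

# A pull request improving a bound should keep the interval consistent:
assert entry["lower_bound"]["value"] <= entry["upper_bound"]["value"]
```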
Update: Paata Ivanisvili and Damek Davis have kindly agreed to help run and expand this repository.
and he okayed me posting them here. He’s taking the idea of categorifying the Riemann zeta function, explained in my paper, and going further, imagining what it might mean to categorify Riemann’s functional equation
\[ \hat{\zeta}(s) = \hat{\zeta}(1-s), \qquad \hat{\zeta}(s) = \pi^{-s/2}\,\Gamma(s/2)\,\zeta(s). \]
My paper categorified the Euler product formula that writes the Riemann zeta function as a product over the usual primes:
\[ \zeta(s) = \prod_{p \text{ prime}} \frac{1}{1 - p^{-s}}. \]
I had nothing to say about the real prime.
But it’s the functional equation that sets the stage for focusing on zeroes of the Riemann zeta function with \(0 < \mathrm{Re}(s) < 1\) … and then the Riemann Hypothesis! So it’s worth thinking about.
David wrote:
Hi John,
Hope you’re doing well!
I was just thinking about your (and James Dolan’s) definition of the zeta functors associated to a finite type scheme (from here), and I had a small thought which I figured you might find interesting.
I was thinking about the functional equation of the completed zeta functions; how might we complete the zeta functors in such a way that they satisfy a similar functional equation? I don’t know, but I do have an idea for what the transformation \(s \mapsto 1-s\) might mean in this context. I claim that it is given by the reduced suspension. Let me explain.
First, I’ll want to see the formal power \(n^{-s}\) as the power \((1/n)^s\),
which I can then categorify by finding a group \(G\) with cardinality \(n\) and considering \(\mathbf{B}G\), a groupoid with cardinality \(1/n\). In the case of the Riemann zeta species, \(n\) is the cardinality of a finite semisimple ring (a product of finite fields, the groupoid of which has the appropriate cardinality for each \(n\)), and we can simply deloop the additive group of this ring. This gives us a Dirichlet functor
which categorifies the Riemann zeta function when \(X\) is a finite set.
Taking this point of view on the zeta functor, we can ask the question: what is the transformation \(s \mapsto 1-s\)? Here’s where we can look at the reduced suspension \(\Sigma X\). The universal property of the reduced suspension says that maps \(\Sigma X \to Y\) correspond to points of the homotopy type \(\mathrm{Maps}_*(X, \Omega Y)\)
(or, more classically, maps from the terminal morphism \(X \to 1\) to \(1 \to Y\)). Since homotopy cardinality is multiplicative for fibrations, that type has cardinality \(|Y|\,|\Omega Y|^{s}\)
(when \(X\) is a finite set of cardinality \(s\)).
Taking \(Y = \mathbf{B}A\) for \(A\) finite semisimple of cardinality \(n\), we see that it has cardinality \(\tfrac{1}{n}\,n^{s} = (1/n)^{1-s}\). Therefore, I think the transformation \(s \mapsto 1-s\) in the functional equation may be categorified by \(\Sigma\). If this makes sense, it suggests that completing the zeta functors is a form of stabilization.
Cheers,
David
And then:
As for another eyebrow wiggle about the cardinality of \(\Sigma X\) when \(X\) is a finite set: we have that \(\pi_1(\Sigma X) \cong F_s\), the free group on \(s\) generators. This is of course infinite, but it is the group completion of the free monoid on \(s\) generators. Since
\[ 1 + s + s^2 + \cdots = \frac{1}{1-s}, \]
it has cardinality \(\frac{1}{1-s}\).
Maybe it’s better to use the “free delooping” (aka the weighted colimit of the point by the free monoid) instead of the reduced suspension. This doesn’t change the above argument, because we’re mapping into a groupoid, but now it is true that the Euler characteristic / cardinality of that category is \(1-s\).
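Spelling out the cardinality arithmetic in the last two paragraphs (my gloss on the computation above):
\[
|\mathbf{B}G| = \frac{1}{|G|}, \qquad |\text{free monoid on } s \text{ generators}| = 1 + s + s^2 + \cdots = \frac{1}{1-s},
\]
so the delooping of the free monoid on \(s\) generators has cardinality \(1 \big/ \tfrac{1}{1-s} = 1-s\), which is exactly the substitution \(s \mapsto 1-s\) appearing in the functional equation.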
I recently listened again to Richard Feynman explaining why the flowing of time is probably an illusion. In modern physics time is just a coordinate, on the same footing as space, and the universe can be described as a four-dimensional object — a spacetime block. In that view, nothing really “flows”. All events simply are, laid out in a 4D structure. What we experience as the passage of time is tied instead to the arrow of entropy: the fact that we move through a sequence of states ordered by increasing disorder, and that memory itself is asymmetric.
Today I was saddened to hear of the passing of Hans Jensen, a physicist and former colleague in the CDF experiment at Fermilab. There is an obituary page here with nice pics and a bio if you want detail on his interesting, accomplished life. Here I thought I would remember him by pasting an excerpt of my 2016 book, "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab", where he is featured. The topic of the anecdote is the data collection for the top quark search. The date is December 1992. ---
Abstract. Coxeter and Dynkin diagrams classify a wide variety of structures, most notably finite reflection groups, lattices having such groups as symmetries, compact simple Lie groups and complex simple Lie algebras. The simply laced or “ADE” Dynkin diagrams also classify finite subgroups of SU(2) and quivers with finitely many indecomposable representations. This introductory tour of Coxeter and Dynkin diagrams, based on the column This Week’s Finds in Mathematical Physics, is made to accompany a series of five lecture videos.
I’m a bit sorry that I didn’t probe deeper into why Dynkin diagrams are what they are: that is, why these and no others? I’m also sorry I didn’t dig into the “black magic” that I mention at the end: that is, why does this black magic work? I’d also like to include a little comparison of the 4 lattices you get from the Lie algebra of a compact simple Lie group: the weight lattice, the coweight lattice, the root lattice, and the coroot lattice — merely because I tend to get them confused, and my exposition needed to say a bit about these.
Luckily I can add these other things later. And I think keeping it short and snappy has its own charms.
When Lee and Yang suggested that the laws of physics might not be invariant under spatial reflection — that there’s a fundamental difference between left and right — Pauli was skeptical. In a letter to Victor Weisskopf in January 1957, he wrote:
“Ich glaube aber nicht, daß der Herrgott ein schwacher Linkshänder ist.”
(I do not believe that the Lord is a weak left-hander.)
But just two days after Pauli wrote this letter, Chien-Shiung Wu’s experiment confirmed that Lee and Yang were correct. There’s an inherent asymmetry in nature.
We can trace this back to how the ‘left-handed’ fermions and antifermions live in a different representation of the Standard Model gauge group than the right-handed ones. And when we try to build grand unified theories that take this into account, we run into the fact that while we can fit the Standard Model gauge group into \(\mathrm{Spin}(10)\) in various ways, not all these ways produce the required asymmetry. There’s a way where it fits into \(\mathrm{Spin}(9) \subset \mathrm{Spin}(10)\), which is too symmetrical to work… and alas, this one has a nice octonionic description!
To keep things simple I’ll explain this by focusing, not on the whole Standard Model gauge group, but on its subgroup \(\mathrm{SU}(2)\times\mathrm{SU}(3)\). Here is a theorem proved by Will Sawin in response to a question of mine on MathOverflow:
Theorem 10. There are exactly two conjugacy classes of subgroups of \(\mathrm{Spin}(10)\) that are isomorphic to \(\mathrm{SU}(2)\times\mathrm{SU}(3)\). One of them has a representative that is a subgroup of \(\mathrm{Spin}(9)\), while the other does not.
I’ll describe representatives of these two subgroups; then I’ll say a bit about how they show up in physics, and then I’ll show you Sawin’s proof.
We can get both subgroups in a unified way! There’s always an inclusion
\[ \mathrm{SO}(m) \times \mathrm{SO}(n) \hookrightarrow \mathrm{SO}(m+n), \]
and taking double covers of each group we get a 2-1 homomorphism
\[ \mathrm{Spin}(m) \times \mathrm{Spin}(n) \to \mathrm{Spin}(m+n). \]
In particular we have
\[ \mathrm{Spin}(4) \times \mathrm{Spin}(6) \to \mathrm{Spin}(10), \]
so composing with the exceptional isomorphisms
\[ \mathrm{Spin}(4) \cong \mathrm{SU}(2)\times\mathrm{SU}(2), \qquad \mathrm{Spin}(6) \cong \mathrm{SU}(4), \]
we get a 2-1 homomorphism
\[ \mathrm{SU}(2)\times\mathrm{SU}(2)\times\mathrm{SU}(4) \to \mathrm{Spin}(10). \]
Now, there are three obvious ways to include \(\mathrm{SU}(2)\times\mathrm{SU}(3)\) in \(\mathrm{SU}(2)\times\mathrm{SU}(2)\times\mathrm{SU}(4)\). There is an obvious inclusion
\[ \mathrm{SU}(3) \hookrightarrow \mathrm{SU}(4), \]
but there are three obvious inclusions
\[ \mathrm{SU}(2) \hookrightarrow \mathrm{SU}(2)\times\mathrm{SU}(2), \]
namely the left one:
\[ g \mapsto (g, 1), \]
the right one:
\[ g \mapsto (1, g), \]
and the diagonal one:
\[ g \mapsto (g, g). \]
Combining these with our earlier maps, we actually get a one-to-one map from \(\mathrm{SU}(2)\times\mathrm{SU}(3)\) to \(\mathrm{Spin}(10)\). So we get three subgroups of \(\mathrm{Spin}(10)\), all isomorphic to \(\mathrm{SU}(2)\times\mathrm{SU}(3)\):
There’s the left subgroup, which is the image of this composite homomorphism:
\[ \mathrm{SU}(2)\times\mathrm{SU}(3) \xrightarrow{\;(g,h)\,\mapsto\,(g,\,1,\,h)\;} \mathrm{SU}(2)\times\mathrm{SU}(2)\times\mathrm{SU}(4) \to \mathrm{Spin}(10). \]
There’s the diagonal subgroup, which is the image of this:
\[ \mathrm{SU}(2)\times\mathrm{SU}(3) \xrightarrow{\;(g,h)\,\mapsto\,(g,\,g,\,h)\;} \mathrm{SU}(2)\times\mathrm{SU}(2)\times\mathrm{SU}(4) \to \mathrm{Spin}(10). \]
And there’s the right subgroup, which is the image of this:
\[ \mathrm{SU}(2)\times\mathrm{SU}(3) \xrightarrow{\;(g,h)\,\mapsto\,(1,\,g,\,h)\;} \mathrm{SU}(2)\times\mathrm{SU}(2)\times\mathrm{SU}(4) \to \mathrm{Spin}(10). \]
The left and right subgroups are actually conjugate, but the diagonal one is truly different! We’ll prove this by taking a certain representation of \(\mathrm{Spin}(10)\), called the Weyl spinor representation, and restricting it to those two subgroups. We’ll get inequivalent representations of \(\mathrm{SU}(2)\times\mathrm{SU}(3)\). This proves the two subgroups aren’t conjugate.
This argument is also interesting for physics. When we restrict to the left subgroup, we get a representation of \(\mathrm{SU}(2)\times\mathrm{SU}(3)\) that matches what we actually see for one generation of fermions! This is the basis of the so-called \(\mathrm{SO}(10)\) grand unified theory, which should really be called the \(\mathrm{Spin}(10)\) grand unified theory.
(In fact this works not only for \(\mathrm{SU}(2)\times\mathrm{SU}(3)\) but for the whole Standard Model gauge group, which is larger. I’m focusing on \(\mathrm{SU}(2)\times\mathrm{SU}(3)\) just because it makes the story simpler.)
When we restrict the Weyl spinor representation to the diagonal subgroup, we get a representation of \(\mathrm{SU}(2)\times\mathrm{SU}(3)\) that is not physically correct. Unfortunately, it’s the diagonal subgroup that shows up in several papers connecting the Standard Model gauge group to the octonions. I plan to say a lot more about this later.
The left subgroup
Let’s look at the left subgroup, the image of this composite:
\[ \mathrm{SU}(2)\times\mathrm{SU}(3) \xrightarrow{\;(g,h)\,\mapsto\,(g,\,1,\,h)\;} \mathrm{SU}(2)\times\mathrm{SU}(2)\times\mathrm{SU}(4) \to \mathrm{Spin}(10). \]
\(\mathrm{Spin}(10)\) has a 32-dimensional unitary representation called the ‘Dirac spinor’ representation. This representation is really on the exterior algebra \(\Lambda\,\mathbb{C}^5\). It’s the direct sum of two irreducible parts, the even grades and the odd grades:
\[ \Lambda\,\mathbb{C}^5 \;\cong\; \Lambda^{\mathrm{even}}\,\mathbb{C}^5 \,\oplus\, \Lambda^{\mathrm{odd}}\,\mathbb{C}^5. \]
Physicists call these two irreducible representations ‘right- and left-handed Weyl spinors’, and denote them as \(\mathbf{16}\) and \(\overline{\mathbf{16}}\) since they’re 16-dimensional and one is the dual of the other.
Let’s restrict the \(\overline{\mathbf{16}}\) to the left subgroup and see what we get.
To do this, first we can restrict the \(\overline{\mathbf{16}}\) along \(\mathrm{SU}(2)\times\mathrm{SU}(2)\times\mathrm{SU}(4) \to \mathrm{Spin}(10)\) and get
\[ (\mathbb{C}^2 \otimes \mathbb{C} \otimes \mathbb{C}^4) \,\oplus\, (\mathbb{C} \otimes \mathbb{C}^2 \otimes \overline{\mathbb{C}^4}). \]
Here \(\mathbb{C}\) is the trivial representation, \(\mathbb{C}^2\) is the tautologous representation of \(\mathrm{SU}(2)\), and \(\mathbb{C}^4\) is the tautologous rep of \(\mathrm{SU}(4)\).
Then let’s finish the job by restricting this representation along \(\mathrm{SU}(2)\times\mathrm{SU}(3) \to \mathrm{SU}(2)\times\mathrm{SU}(2)\times\mathrm{SU}(4)\). Restricting the \(\mathbb{C}^4\) of \(\mathrm{SU}(4)\) to \(\mathrm{SU}(3)\) gives \(\mathbb{C}^3 \oplus \mathbb{C}\): the sum of the tautologous representation of \(\mathrm{SU}(3)\) and the trivial representation. Restricting the first \(\mathbb{C}^2\) to the left copy of \(\mathrm{SU}(2)\) gives the tautologous representation \(\mathbb{C}^2\), while restricting the second \(\mathbb{C}^2\) to this left copy gives \(\mathbb{C} \oplus \mathbb{C}\): the sum of two copies of the trivial representation. All in all, we get this representation of \(\mathrm{SU}(2)\times\mathrm{SU}(3)\):
\[ (\mathbb{C}^2 \otimes \mathbb{C}^4) \,\oplus\, (\mathbb{C} \otimes \overline{\mathbb{C}^4}) \,\oplus\, (\mathbb{C} \otimes \overline{\mathbb{C}^4}), \]
where now \(\mathbb{C}^4\) stands for \(\mathbb{C}^3 \oplus \mathbb{C}\) as a representation of \(\mathrm{SU}(3)\). This is what we actually see for one generation of left-handed fermions and antifermions in the Standard Model! The representation \(\mathbb{C}^2 \otimes \mathbb{C}^4\) describes how the left-handed fermions in one generation transform under \(\mathrm{SU}(2)\times\mathrm{SU}(3)\): 3 colors of quark and one ‘white’ lepton. The representation \(\mathbb{C} \otimes \overline{\mathbb{C}^4}\) does the same for the left-handed antifermions. The left-handed fermions form an isospin doublet, giving us the \(\mathbb{C}^2\), while the left-handed antifermions have no isospin, giving us the \(\mathbb{C}\).
This strange lopsidedness is a fundamental feature of the Standard Model.
The right subgroup would work the same way, up to switching the words ‘left-handed’ and ‘right-handed’. And by Theorem 10, the left and right subgroups must be conjugate in \(\mathrm{Spin}(10)\), because now we’ll see one that’s not conjugate to either of these.
The diagonal subgroup
Consider the diagonal subgroup, the image of this composite:
\[ \mathrm{SU}(2)\times\mathrm{SU}(3) \xrightarrow{\;(g,h)\,\mapsto\,(g,\,g,\,h)\;} \mathrm{SU}(2)\times\mathrm{SU}(2)\times\mathrm{SU}(4) \to \mathrm{Spin}(10). \]
Let’s restrict the \(\overline{\mathbf{16}}\) to it.
To do this, first let’s restrict the \(\overline{\mathbf{16}}\) along \(\mathrm{SU}(2)\times\mathrm{SU}(2)\times\mathrm{SU}(4) \to \mathrm{Spin}(10)\) and get
\[ (\mathbb{C}^2 \otimes \mathbb{C} \otimes \mathbb{C}^4) \,\oplus\, (\mathbb{C} \otimes \mathbb{C}^2 \otimes \overline{\mathbb{C}^4}) \]
as before. Then let’s restrict this representation along \(\mathrm{SU}(2)\times\mathrm{SU}(3) \to \mathrm{SU}(2)\times\mathrm{SU}(2)\times\mathrm{SU}(4)\). The \(\mathrm{SU}(4)\) part works as before, but what happens when we restrict \(\mathbb{C}^2 \otimes \mathbb{C}\) or \(\mathbb{C} \otimes \mathbb{C}^2\) along the diagonal map \(\mathrm{SU}(2) \to \mathrm{SU}(2)\times\mathrm{SU}(2)\)? We get \(\mathbb{C}^2\). So, this is the representation of \(\mathrm{SU}(2)\times\mathrm{SU}(3)\) that we get:
\[ (\mathbb{C}^2 \otimes \mathbb{C}^4) \,\oplus\, (\mathbb{C}^2 \otimes \overline{\mathbb{C}^4}). \]
This is not good for the Standard Model. It describes a more symmetrical universe than ours, where both left-handed fermions and antifermions transform as doublets under \(\mathrm{SU}(2)\).
The fact that we got a different answer this time proves that the left and diagonal subgroups are not conjugate in \(\mathrm{Spin}(10)\). So to complete the proof of Theorem 10, we only need to prove:
1. Every subgroup of \(\mathrm{Spin}(10)\) isomorphic to \(\mathrm{SU}(2)\times\mathrm{SU}(3)\) is conjugate to the left subgroup or the diagonal subgroup.
2. The diagonal subgroup is conjugate to a subgroup of \(\mathrm{Spin}(9)\), but the left subgroup is not.
I’ll prove 2, and then I’ll turn you over to Will Sawin to do the rest.
Why the diagonal subgroup fits in \(\mathrm{Spin}(9)\)
Every rotation of \(\mathbb{R}^3\) extends to a rotation of \(\mathbb{R}^4\) that leaves the last coordinate fixed, so we get an inclusion \(\mathrm{SO}(3) \hookrightarrow \mathrm{SO}(4)\), which lifts to an inclusion of the double covers, \(\mathrm{Spin}(3) \hookrightarrow \mathrm{Spin}(4)\). Since we have exceptional isomorphisms
\[ \mathrm{Spin}(3) \cong \mathrm{SU}(2), \qquad \mathrm{Spin}(4) \cong \mathrm{SU}(2)\times\mathrm{SU}(2), \]
it’s natural to ask how the inclusion \(\mathrm{Spin}(3) \hookrightarrow \mathrm{Spin}(4)\) looks in these terms. And the answer is: it’s the diagonal map! In other words, we have a commutative diagram
Now, we can easily fit this into a larger commutative diagram involving the natural maps \(\mathrm{Spin}(3)\times\mathrm{Spin}(6) \to \mathrm{Spin}(9)\) and \(\mathrm{Spin}(4)\times\mathrm{Spin}(6) \to \mathrm{Spin}(10)\):
We can simplify this diagram using the isomorphism \(\mathrm{Spin}(6) \cong \mathrm{SU}(4)\):
and then we can use our friend the inclusion \(\mathrm{SU}(3) \hookrightarrow \mathrm{SU}(4)\):
This shows that the diagonal subgroup of \(\mathrm{Spin}(10)\) is actually a subgroup of \(\mathrm{Spin}(9)\)!
Why the left subgroup does not fit in \(\mathrm{Spin}(9)\)
The three-fold way is a coarse classification of the irreducible complex representations of a compact Lie group. Every such representation is of one and only one of these three kinds:
1) not self-dual: not isomorphic to its dual,
2a) orthogonal: isomorphic to its dual via an invariant nondegenerate symmetric bilinear form, also called an orthogonal structure,
2b) symplectic: isomorphic to its dual via an invariant nondegenerate antisymmetric bilinear form, also called a symplectic structure.
I’ve written about how these three cases are related to the division algebras \(\mathbb{C}\), \(\mathbb{R}\) and \(\mathbb{H}\), respectively:
A complex representation is orthogonal iff it’s the complexification of a representation on a real vector space, and symplectic iff it’s the underlying complex representation of a representation on a quaternionic vector space.
But we don’t need most of this yet. For now we just need to know one fact: when \(n\) is odd, every irreducible representation of \(\mathrm{Spin}(n)\), and thus every representation of this Lie group, is self-dual: that is, isomorphic to its dual. In particular this is true of \(\mathrm{Spin}(9)\).
Why does this matter? Assume the left subgroup is a subgroup of \(\mathrm{Spin}(9)\). When we restrict the Weyl spinor representation \(16\) of \(\mathrm{Spin}(10)\) to \(\mathrm{Spin}(9)\) it will be self-dual, like every representation of \(\mathrm{Spin}(9)\). Then when we restrict this representation further to the left subgroup it must still be self-dual, since the restriction of a self-dual representation is clearly self-dual.
However, we know this representation is
$$ \mathbb{C}^2 \otimes (\mathbb{C}^3 \oplus \mathbb{C}) \;\oplus\; (\mathbb{C} \oplus \mathbb{C}) \otimes (\overline{\mathbb{C}}{}^3 \oplus \mathbb{C}) $$
and this is not self-dual, since \((\mathbb{C}^2)^\ast \cong \mathbb{C}^2\) and \(\mathbb{C}^\ast \cong \mathbb{C}\) but \((\mathbb{C}^3)^\ast \cong \overline{\mathbb{C}}{}^3 \not\cong \mathbb{C}^3\).
So, it must be that the left subgroup is not a subgroup of \(\mathrm{Spin}(9)\).
Proof of Theorem 10
To complete the proof of Theorem 10 we just need to see why there are just two conjugacy classes of subgroups of \(\mathrm{Spin}(10)\) isomorphic to the Standard Model gauge group. But in fact Will Sawin proved a stronger result! He was answering this question of mine:
Define the Standard Model gauge group to be \(\mathrm{S}(\mathrm{U}(2)\times\mathrm{U}(3))\): the subgroup of \(\mathrm{SU}(5)\) consisting of block diagonal matrices with a \(2\times 2\) block and then a \(3\times 3\) block. (This is isomorphic to the quotient of \(\mathrm{SU}(2)\times\mathrm{SU}(3)\times\mathrm{U}(1)\) by the subgroup of elements \((\alpha^3, \alpha^2, \alpha)\) where \(\alpha\) is a 6th root of unity.)
Up to conjugacy, how many subgroups isomorphic to the Standard Model gauge group does \(\mathrm{Spin}(10)\) have?
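(A quick way to check the parenthetical claim, with one standard choice of exponents; this is a sketch, not part of the question itself. Consider the homomorphism
$$ (g, h, \alpha) \;\mapsto\; \begin{pmatrix} \alpha^3 g & 0 \\ 0 & \alpha^{-2} h \end{pmatrix}, \qquad g \in \mathrm{SU}(2), \; h \in \mathrm{SU}(3), \; \alpha \in \mathrm{U}(1). $$
This lands in \(\mathrm{S}(\mathrm{U}(2)\times\mathrm{U}(3))\), since \(\det(\alpha^3 g)\,\det(\alpha^{-2} h) = \alpha^6 \alpha^{-6} = 1\), and it is onto. Its kernel consists of the elements \((\alpha^{-3} 1, \alpha^{2} 1, \alpha)\) with \(\alpha^6 = 1\); since \(\alpha^{-3} = \alpha^3\) for such \(\alpha\), this is exactly the \(\mathbb{Z}/6\) of elements \((\alpha^3, \alpha^2, \alpha)\).)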
This question is relevant to grand unified theories of particle physics, as explained here:
This paper focuses on one particular copy of the Standard Model gauge group in \(\mathrm{Spin}(10)\), given as follows. By definition we have an inclusion of it in \(\mathrm{SU}(5)\), and we also have an inclusion \(\mathrm{SU}(5) \hookrightarrow \mathrm{Spin}(10)\), because for any \(n\) we have an inclusion \(\mathrm{SU}(n) \hookrightarrow \mathrm{SO}(2n)\), and \(\mathrm{SU}(n)\) is simply connected so this gives a homomorphism \(\mathrm{SU}(n) \to \mathrm{Spin}(2n)\).
However I think there is also an inclusion of the Standard Model gauge group in \(\mathrm{Spin}(9)\), studied by Krasnov:
Composing this with the inclusion \(\mathrm{Spin}(9) \hookrightarrow \mathrm{Spin}(10)\), this should give another inclusion of the Standard Model gauge group in \(\mathrm{Spin}(10)\), and I believe this one is ‘truly different from’ — i.e., not conjugate to — the first one I mentioned.
So I believe my current answer to my question is “at least two”. But that’s not good enough.
Sawin’s answer relies heavily on the 3-fold way — that’s why I told you that stuff about orthogonal and symplectic representations. When we embed the Standard Model gauge group in \(\mathrm{Spin}(10)\), we are automatically giving this group an orthogonal 10-dimensional representation, thanks to the map \(\mathrm{Spin}(10) \to \mathrm{SO}(10)\). We can classify the possibilities.
He writes:
There are infinitely many embeddings. However, all but one of them is “essentially the same as” the one you studied, as they become equal to the one you studied on restriction to \(\mathrm{SU}(2)\times\mathrm{SU}(3)\). The remaining one is the one studied by Krasnov.
\(\mathrm{SU}(3)\) has irreducible representations of dimensions \(1, 3, 6, 8, 10\), and higher dimensions. The 6-dimensional ones are dual to each other, as are the 10-dimensional ones, so they can’t appear. The 3-dimensional ones are dual to each other and can only appear together. So the only 10-dimensional self-dual representations of \(\mathrm{SU}(3)\) decompose as irreducibles as \(8 \oplus 1 \oplus 1\), as \(3 \oplus \bar{3} \oplus 1 \oplus 1 \oplus 1 \oplus 1\), or as ten \(1\)s. All of these are orthogonal because the 8-dimensional representation is orthogonal. However, the ten \(1\)s cannot appear because then \(\mathrm{SU}(3)\) would act trivially.
A representation of \(\mathrm{SU}(2)\times\mathrm{SU}(3)\) is a sum of tensor products of irreducible representations of \(\mathrm{SU}(2)\) and irreducible representations of \(\mathrm{SU}(3)\). Restricted to \(\mathrm{SU}(3)\), each tensor product splits into a sum of copies of the same irreducible representation. So \(\mathrm{SU}(2)\) can only act nontrivially when the same representation appears multiple times. Since the \(3 \oplus \bar{3}\) consists of two different 3-dimensional representations, only the 1-dimensional representation can occur twice. Thus, our 10-dimensional orthogonal representation of \(\mathrm{SU}(2)\times\mathrm{SU}(3)\) necessarily splits as either the 8-dimensional adjoint representation of \(\mathrm{SU}(3)\) plus a 2-dimensional orthogonal representation of \(\mathrm{SU}(2)\), or the 6-dimensional sum of standard and conjugate [i.e., dual] representations of \(\mathrm{SU}(3)\) plus a 4-dimensional orthogonal representation of \(\mathrm{SU}(2)\). However, \(\mathrm{SU}(2)\) has a unique nontrivial representation of dimension at most 2 and it isn’t orthogonal, so only the second case can appear. \(\mathrm{SU}(2)\) has representations of dimension at most 4, of which the 2- and 4-dimensional ones are symplectic and so must appear with even multiplicity in any orthogonal representation, so the only nontrivial 4-dimensional orthogonal ones are \(2 \oplus 2\) or \(3 \oplus 1\).
So there are two ten-dimensional orthogonal representations of \(\mathrm{SU}(2)\times\mathrm{SU}(3)\) that are nontrivial on both factors, those being the sum of the two different 3-dimensional irreducible representations of \(\mathrm{SU}(3)\) with either two copies of the two-dimensional irreducible representation of \(\mathrm{SU}(2)\) or the three-dimensional and the one-dimensional irreducible representations of \(\mathrm{SU}(2)\). The orthogonal structure is unique up to isomorphism, so these give two conjugacy classes of homomorphisms to \(\mathrm{SO}(10)\) and thus two conjugacy classes of homomorphisms to \(\mathrm{Spin}(10)\). The first one corresponds to the embedding you studied, while only the second one factors through \(\mathrm{Spin}(9)\), so indeed these are different.
To understand how to extend these to the full Standard Model gauge group, I consider the centralizer of the representation within \(\mathrm{Spin}(10)\). Since the group is connected, this is the same as the centralizer of its Lie algebra, which is therefore the inverse image of the centralizer in \(\mathrm{SO}(10)\). Now there is a distinction between the two examples, because the example with irrep dimensions \(2,2,3,3\) has centralizer with identity component \(\mathrm{U}(1)\times\mathrm{SU}(2)\), while the example with irrep dimensions \(1,3,3,3\) has centralizer with identity component \(\mathrm{U}(1)\). In the second case, the image of the full gauge group must be the image of \(\mathrm{SU}(2)\times\mathrm{SU}(3)\) times the centralizer of the image of \(\mathrm{SU}(2)\times\mathrm{SU}(3)\), so this gives a unique example, which must be the one considered by Krasnov.
In the first case, we can restrict attention to a maximal torus in this centralizer. The center of the gauge group maps to a one-dimensional subgroup of this torus, which can be described by a pair of integers \((p,q)\). Explicitly, given a two-by-two unitary matrix \(A\) and a three-by-three unitary matrix \(B\) with \(\det A \,\det B = 1\), we can map to \(\mathrm{U}(5)\) by sending \((A,B)\) to \((\det A)^q A \oplus (\det A)^p B\), and then map from \(\mathrm{U}(5)\) to \(\mathrm{SO}(10)\). This lifts to the spin group if and only if the determinant in \(\mathrm{U}(5)\) is a perfect square. The determinant is \((\det A)^{3p+2q}\), so a lift exists if and only if \(p\) is even.
The only possible kernel of this embedding is the scalars. The scalar \(\lambda\) (a fifth root of unity) maps to \((\lambda^{2q+1}, \lambda^{2p+1})\), and so the kernel is trivial if and only if \(2p+1\) and \(2q+1\) are not both divisible by 5.
However, there are infinitely many integer solutions to this with \(p\) even (in fact, a random \(q\) and even \(p\) work with probability \(24/25\)), so this gives infinitely many examples.
Part 1. How to define octonion multiplication using complex scalars and vectors, much as quaternion multiplication can be defined using real scalars and vectors. This description requires singling out a specific unit imaginary octonion, and it shows that octonion multiplication is invariant under \(\mathrm{SU}(3)\).
Part 2. A more polished way to think about octonion multiplication in terms of complex scalars and vectors, and a similar-looking way to describe it using the cross product in 7 dimensions.
Part 3. How a lepton and a quark fit together into an octonion — at least if we only consider them as representations of \(\mathrm{SU}(3)\), the gauge group of the strong force. Proof that the symmetries of the octonions fixing an imaginary octonion form precisely the group \(\mathrm{SU}(3)\).
Part 4. Introducing the exceptional Jordan algebra \(\mathfrak{h}_3(\mathbb{O})\): the \(3\times 3\) self-adjoint octonionic matrices. A result of Dubois-Violette and Todorov: the symmetries of the exceptional Jordan algebra preserving its splitting into complex scalar and vector parts and preserving a copy of the \(2\times 2\) self-adjoint octonionic matrices form precisely the Standard Model gauge group.
Part 5. How to think of \(2\times 2\) self-adjoint octonionic matrices as vectors in 10d Minkowski spacetime, and pairs of octonions as left- or right-handed spinors.
Part 6. The linear transformations of the exceptional Jordan algebra that preserve the determinant form the exceptional Lie group \(\mathrm{E}_6\). How to compute this determinant in terms of 10-dimensional spacetime geometry: that is, scalars, vectors and left-handed spinors in 10d Minkowski spacetime.
Part 7. How to describe the Lie group \(\mathrm{E}_6\) using 10-dimensional spacetime geometry. This group is built from the double cover of the Lorentz group, left-handed and right-handed spinors, and scalars in 10d Minkowski spacetime.
Part 8. A geometrical way to see how \(\mathrm{E}_6\) is connected to 10d spacetime, based on the octonionic projective plane.
Part 9. Duality in projective plane geometry, and how it lets us break the Lie group \(\mathrm{E}_6\) into the Lorentz group, left-handed and right-handed spinors, and scalars in 10d Minkowski spacetime.
Part 10. Jordan algebras, their symmetry groups, their invariant structures — and how they connect quantum mechanics, special relativity and projective geometry.
Part 11. Particle physics on the spacetime given by the exceptional Jordan algebra: a summary of work with Greg Egan and John Huerta.
Part 12. The bioctonionic projective plane and its connections to algebra, geometry and physics.
Part 13. Two ways to embed the Standard Model gauge group in \(\mathrm{Spin}(10)\), and their consequences for particle physics.
I’m writing to point out a potential law which should be gathering more opposition and attention in math academia: the Securing American Funding and Expertise from Adversarial Research Exploitation Act. This is an amendment to the 2026 National Defense Authorization Act which has passed the House and could be added to the final version of the bill during reconciliation in the Senate. I’m pulling most of my information from an article in Science.
This act would ban any US scientist from receiving federal funding if they have, within the last five years, worked with anyone from China, Russia, Iran or North Korea, where “worked with” includes joint research, co-authorship on papers, or advising a foreign graduate student or postdoctoral fellow. As I said in my message to my senators, this is everyone. Every mathematician has advised Chinese graduate students or collaborated with Chinese mathematicians, because China is integrated into the academic world and accounts for one fifth of the earth’s population.
This obviously isn’t secret, since you can read about it in Science, but I am surprised that I haven’t heard more alarm. Obvious people to contact are your senators and your representatives. I would also suggest contacting members of the Senate Armed Services Committee, who are in charge of reconciling the House and Senate versions of the bill.
“Information” is an idea that is everywhere in science and technology these days. From one angle it looks like such an obvious idea that it’s a bit startling to realize that information theory didn’t really come along until the work of Claude Shannon in the 1940s. From another, the idea has so many different shades of meaning that we shouldn’t be surprised (that’s a joke you will get in a bit) that it can be hard to understand.
Information theory is obviously an enormous subject, but we’re just giving thanks, not writing a textbook. I want to mention two ideas I find especially central. First, Shannon’s idea about relating information content to “surprisal.” Second, the very different intuitive notions of information that we get from engineering and physics.
Shannon, working at Bell Labs, was interested in the problem of how to send trustworthy signals efficiently over transatlantic cables. He was thinking about various ways to express information in a code: a set of symbols, each with a defined meaning. So a code might be an alphabet, or a set of words, or a literal cipher. And he noticed that there was a lot of redundancy in natural languages; the word “the” appears much more often in English than the word “axe,” although both have the same number of letters.
Let’s refer to each letter or symbol in a code as an “event.” Shannon’s insight was to realize that the more unlikely an event, the more information it conveyed when it was received. The statements “The Sun rose in the east this morning” and “The Sun rose in the west this morning” contain the same number of letters, but the former contains almost no information — you already were pretty sure the Sun would be rising in the east. But the latter, if obtained from a reliable source, would be very informative indeed, precisely because it was so unexpected. Clearly some kind of unprecedented astronomical catastrophe was in progress.
Imagine we can assign a probability \(p(x)\) to every different event \(x\). Shannon wanted a way to quantify the information content of that event, which would satisfy various reasonable-seeming axioms: most crucially, that the information content of two independent events is the sum of the individual information contents. But the joint probability of two events is the product of their individual probabilities. So the natural thing to do would be to define the information content as the logarithm of the probability; the logarithm of a product equals the sum of the individual logarithms. But you want low probability to correspond to high information content, so Shannon defined the information content (also called the self-information, or surprisal, or Shannon information) of an event to be minus the log of the probability, which by math is equal to the log of the reciprocal of the probability:
$$ I(x) = -\log p(x) = \log\!\left(\frac{1}{p(x)}\right). $$
Note that probabilities are numbers between 0 and 1, and the log of such a number will be negative, with numbers closer to 0 being more negative than numbers closer to 1. So \(I\) goes from \(\infty\) at \(p = 0\) to \(0\) at \(p = 1\). An impossible message is infinitely surprising, and therefore conveys infinite information; an inevitable message is completely unsurprising, and conveys no information at all.
From there, Shannon suggested that we could characterize how efficient an entire code was at conveying information: just calculate the average (expectation value) of the information content over all possible events. When we have a probability distribution \(p(x)\), the average of any function \(f(x)\) is just the sum of the values of the function times their respective probabilities, \(\langle f \rangle = \sum_x p(x)\, f(x)\). So we characterize the information content of a code via the quantity
$$ S = -\sum_x p(x) \log p(x). $$
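To make this concrete, here is a tiny numerical illustration (my own toy example, computing in bits, i.e. with base-2 logarithms):

```python
import math

def surprisal(p: float) -> float:
    """Information content of an event with probability p, in bits."""
    return -math.log2(p)

def entropy(dist: dict) -> float:
    """Average surprisal over a probability distribution, in bits."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)

# A fair coin: both events equally surprising, so the entropy is maximal.
print(entropy({"heads": 0.5, "tails": 0.5}))    # 1.0 bit per symbol

# A heavily biased coin: the usual outcome is unsurprising, so the
# average information per symbol is much lower...
print(entropy({"heads": 0.99, "tails": 0.01}))  # ~0.081 bits per symbol

# ...even though the rare event, when it does happen, is very informative.
print(surprisal(0.01))                          # ~6.64 bits
```

The fair coin maximizes the average; the biased coin conveys far less information per symbol on average, which is exactly the sort of redundancy Shannon noticed in natural language.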
The only question is, what to call this lovely newly-defined quantity that surely nobody had ever thought of before? Happily Shannon was friends with John von Neumann, who informed him, “You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.” So entropy it is.
Indeed, this formula is precisely the one that had been put forward (unknown to Shannon) by Josiah Willard Gibbs in the 1870s as a definition of entropy in statistical mechanics. (It is related to the definition on Ludwig Boltzmann’s tombstone, \(S = k \log W\), and Boltzmann had also suggested expressions similar to the one above.) On the one hand, it seems remarkable to find precisely the same expression playing central roles in problems as disparate as sending signals across cables and watching cream mix into coffee; on the other hand, it’s a relatively simple expression and the axioms used to derive it are actually pretty similar, so perhaps we shouldn’t be surprised; on the third hand, the connection between information theory and statistical mechanics turns out to be deep and fruitful, so it’s more than just a mathematical coincidence.
But let me highlight the one aspect of the term “information” that can be sometimes confusing to people. To the engineer, a code that is maximally informative is one for which \(p(x)\) is relatively uniform over all events \(x\), which means the entropy \(S\) is maximal or close to it; in that case, every event will tell you something at least a little bit interesting. For them, high entropy = high information.
But to a physicist who might be asking “how much information do I have about the state of a system?”, you have more information when \(p(x)\) is relatively narrowly concentrated around some value, rather than being all spread out. For them, high entropy = low information! Indeed, one physically-relevant notion of “information” is the “accessible information” of a system, which can be defined as \(S_{\max} - S\). (I talk about this a bit in my recent solo podcast on complexity.)
Perhaps we shouldn’t be so surprised that physicists and engineers posit oppositely-directed relationships between entropy and information. It’s just a reflection of the fact that “information” is so ubiquitous and has so many different uses. We should be thankful that we’re beginning to understand it so well.
It’s been over three years since my last post on this blog and I have sometimes been asked, understandably, whether the project I announced in my previous post was actually happening. The answer is yes — the grant I received from the Astera Institute has funded several PhD students and a couple of postdocs, and we have been busy. In my previous post I suggested that I would be open to remote collaboration, but that has happened much less, partly because a Polymath-style approach would have been difficult to manage while also ensuring that my PhD students would have work that they could call their own to put in their theses.
In general I don’t see a satisfactory solution to that problem, but in this post I want to mention a subproject of the main project that is very much intended to be a large public collaboration. A few months ago, a call came out from Renaissance Philanthropies saying that they were launching a $9m AI for Math Fund to spend on projects in the general sphere of AI and mathematics, and inviting proposals. One of the categories that they specifically mentioned was creating new databases, and my group submitted a proposal to create a database of what we call “structured motivated proofs,” a piece of terminology that I will explain in a moment. I am happy to report that our proposal was one of the 29 successful ones. Since a good outcome to the project will depend on collaboration from many people outside the group, we need to publicize it, which is precisely the purpose of this post. Below I will be more specific about the kind of help we are looking for.
Why might yet another database of theorems and proofs be useful?
The underlying thought behind this project is that AI for mathematics is being held back not so much by an insufficient quantity of data as by the wrong kind of data. (For a more general exploration of this theme, see here.) All mathematicians know, and some of us enjoy complaining about it, that it is common practice when presenting a proof in a mathematics paper, or even textbook, to hide the thought processes that led to the proof. Often this does not matter too much, because the thought processes may be standard ones that do not need to be spelt out to the intended audience. But when proofs start to get longer and more difficult, they can be hard to read because one has to absorb definitions and lemma statements that are not obviously useful, are presented as if they appeared from nowhere, and demonstrate their utility only much later in the argument.
A sign that this is a problem for AI is the behaviour one observes after asking an LLM to prove a statement that is too difficult for it. Very often, instead of admitting defeat, it will imitate the style of a typical mathematics paper and produce rabbits out of hats, together with arguments later on that those rabbits do the required job. The problem is that, unlike with a correct mathematics paper, one finds when one scrutinizes the arguments carefully that they are wrong. However, it is hard to find superficial features that distinguish between an incorrect rabbit with an incorrect argument justifying that rabbit (especially if the argument does not go into full detail) and a correct one, so the kinds of statistical methods used by LLMs do not have an easy way to penalize the incorrectness.
Of course, that does not mean that LLMs cannot do mathematics at all — they are remarkably good at it, at least compared with what I would have expected three years ago. How can that be, given the problem I have discussed in the previous paragraph?
The way I see it (which could change — things move so fast in this sphere), the data that is currently available to train LLMs and other systems is very suitable for a certain way of doing mathematics that I call guess and check. When trying to solve a maths problem, you will normally write down the routine parts of an argument without any fuss (and an LLM can do them too because it has seen plenty of similar examples), but if the problem as a whole is not routine, then at some point you have to stop and think, often because you need to construct an object that has certain properties (I mean this in a rather general way — the “object” might be a lemma that will split up the proof in a nice way) and it is not obvious how to do so. The guess-and-check approach to such moments is what it says: you make as intelligent a guess as you can and then see whether it has the properties you wanted. If it doesn’t, you make another guess, and you keep going until you get lucky.
The reason an LLM might be tempted to use this kind of approach is that the style of mathematical writing I described above makes it look as though that is what we as mathematicians do. Of course, we don’t actually do that, but we tend not to mention all the failed guesses we made and how we carefully examined why they failed, modifying them in appropriate ways in response, until we finally converged on an object that worked. We also don’t mention the reasoning that often takes place before we make the guess, saying to ourselves things like “Clearly an Abelian group can’t have that property, so I need to look for a non-Abelian group.”
Intelligent guess and check works well a lot of the time, particularly when carried out by an LLM that has seen many proofs of many theorems. I have often been surprised when I have asked an LLM a problem of the form “find \(x\) such that \(P(x)\)”, where \(P\) is some property that is hard to satisfy, and the LLM has had no trouble answering it. But somehow when this happens, the flavour of the answer given by the LLM leaves me with the impression that the technique it has used to construct \(x\) is one that it has seen before and regards as standard.
If the above picture of what LLMs can do is correct (the considerations for reinforcement-learning-based systems such as AlphaProof are not identical but I think that much of what I say in this post applies to them too for slightly different reasons), then the likely consequence is that if we pursue current approaches, then we will reach a plateau: broadly speaking they will be very good at answering a question if it is the kind of question that a mathematician with the right domain expertise and good instincts would find reasonably straightforward, but will struggle with anything that is not of that kind. In particular, they will struggle with research-level problems, which are, almost by definition, problems that experts in the area do not find straightforward. (Of course, there would probably be cases where an LLM spots relatively easy arguments that the experts had missed, but that wouldn’t fundamentally alter the fact that they weren’t really capable of doing research-level mathematics.)
But what if we had a database of theorems and proofs that did not hide the thought processes that lay behind the non-obvious details of the proofs? If we could train AI on a database of accounts of proof discoveries and if, having done so, we then asked it to provide similar accounts, then it would no longer resort to guess-and-check when it got stuck, because the proof-discovery accounts it had been trained on would not be resorting to it. There could be a problem getting it to unlearn its bad habits, but I don’t think that difficulty would be impossible to surmount.
The next question is what such a database might look like. One could just invite people to send in stream-of-consciousness accounts of how they themselves found certain proofs, but that option is unsatisfactory for several reasons.
It can be very hard to remember where an idea came from, even a few seconds after one has had it — in that respect it is like a dream, the memory of which becomes rapidly less vivid as one wakes up.
Often an idea will seem fairly obvious to one person but not to another.
The phrase “motivated proof” means different things to different people, so without a lot of careful moderation and curation of entries, there is a risk that a database would be disorganized and not much more helpful than a database of conventionally written proofs.
A stream-of-consciousness account could end up being a bit too much about the person who finds the proof and not enough about the mathematical reasons for the proof being feasibly discoverable.
To deal with these kinds of difficulties, we plan to introduce a notion of a structured motivated proof, by which we mean a proof that is generated in a very particular way that I will partially describe below. A major part of the project, and part of the reason we needed funding for it, is to create a platform that will make it convenient to input structured motivated proofs and difficult to insert the kinds of rabbits out of hats that make a proof mysterious and unmotivated. In this way we hope to gamify the task of creating the database, challenging people to input into our system proofs of certain theorems that appear to rely on “magic” ideas, and perhaps even offering prizes for proofs that contain steps that appear in advance to be particularly hard to motivate. (An example: the solution by Ellenberg and Gijswijt of the cap-set problem uses polynomials in a magic-seeming way. The idea of using polynomials came from an earlier paper of Croot, Lev and Pach that proved a closely related theorem, but in that paper it just appears in the statement of their Lemma 1, with no prior discussion apart from the words “in the present paper we use the polynomial method” in the introduction.)
What is a structured motivated proof?
I wrote about motivated proofs in my previous post, but thanks to many discussions with other members of the group, my ideas have developed quite a lot since then. Here are two ways we like to think about the concept.
1. A structured motivated proof is one that is generated by standard moves.
I will not go into full detail about what I mean by this, but will do so in a future post when we have created the platform that we would like people to use in order to input proofs into the database. But the basic idea is that at any one moment one is in a certain state, which we call a proof-discovery state, and there will be a set of possible moves that can take one from the current proof-discovery state to a new one.
A proof-discovery state is supposed to be a more formal representation of the state one is in when in the middle of solving a problem. Typically, if the problem is difficult, one will have asked a number of questions, and will be aware of logical relationships between them: for example, one might know that a positive answer to Q1 could be used to create a counterexample to Q2, or that Q3 is a special case of Q4, and so on. One will also have proved some results connected with the original question, and again these results will be related to each other and to the original problem in various ways that might be quite complicated: for example P1 might be a special case of Q2, which, if true, would reduce Q3 to Q4, where Q3 is a generalization of the statement we are trying to prove.
Typically we will be focusing on one of the questions, and typically that question will take the form of some hypotheses and a target (the question being whether the hypotheses imply the target). One kind of move we might make is a standard logical move such as forwards or backwards reasoning: for example, if we have hypotheses of the form \(P\) and \(P \Rightarrow Q\), then we might decide to deduce \(Q\). But things get more interesting when we consider slightly less basic actions we might take. Here are three examples.
We have in our list of hypotheses the fact that a function \(f\) is given by an explicit formula involving a polynomial, and our goal is to prove that there exists \(x\) such that \(f(x) = 0\). Without really thinking about it, we are conscious that \(f\) is a composition of two functions, one of which is continuous and one of which belongs to a class of functions that are all continuous, so \(f\) is continuous. Also, the conclusion matches well the conclusion of the intermediate-value theorem. So the intermediate-value theorem comes naturally to mind and we add it to our list of available hypotheses. In practice we wouldn’t necessarily write it down, but the system we wish to develop is intended to model not just what we write down but also what is going on in our brains, so we propose a move that we call library extraction (closely related to what is often called premise selection in the literature). Note that we have to be a bit careful about library extraction. We don’t want the system to be allowed to call up results from the library that appear to be irrelevant but then magically turn out to be helpful, since those would feel like rabbits out of hats. So we want to allow extraction of results only if they are obvious given the context. It is not easy to define what “obvious” means, but there is a good rule of thumb for it: a library extraction is obvious if it is one of the first things ChatGPT thinks of when given a suitable non-cheating prompt. For example, I gave it the prompt, “I have a function \(f\) from the reals to the reals and I want to prove that there exists some \(x\) such that \(f(x) = 0\). Can you suggest any results that might be helpful?” and the intermediate-value theorem was its second suggestion. (Note that I had not even told it that \(f\) was continuous, so I did not need to make that particular observation before coming up with the prompt.)
We have a goal of the form \(\exists x\ P(x)\). If this were a Lean proof state, the most common way to discharge a goal of this form would be to input a choice for \(x\). That is, we would instantiate the existential quantifier with some term \(t\) and our new goal would be \(P(t)\). However, as with library extraction, we have to be very careful about instantiation if we want our proof to be motivated, since we wish to disallow highly surprising choices of \(t\) that can be found only after a long process of thought. So we have to restrict ourselves to obvious instantiations. One way that an instantiation in our system will count as obvious is if the variable is instantiated with a term that is already present in the proof-discovery state. If the desired term is not present, then in order to continue with the proof, it will be necessary to carry out moves that generate it. A very common technique for this is the use of metavariables: instead of guessing a suitable \(t\), we create a variable \(t\) and change the goal to \(P(t)\), which we can think of as saying “I’m going to start trying to prove \(P(t)\) even though I haven’t chosen \(t\) yet. As the attempted proof proceeds, I will note down any properties that \(t\) might have that would help me finish the proof, in the hope that (i) I get to the end and (ii) the problem of finding a \(t\) with those properties is easier than the original problem.” Another kind of obvious instantiation is one where we try out an object that is “extreme” in some way — it might be the smallest element of the relevant set, or the largest, or the simplest. (Judging simplicity is another place where the ChatGPT rule of thumb can be used.) For readers who know Lean, the sketch just after these examples illustrates the metavariable move.
We cannot see how to answer the question we are focusing on, so we ask a related question. Two very common kinds of related question (as emphasized by Polya) are generalization and specialization. Perhaps we don’t see why a hypothesis is helpful, so we see whether the result holds if we drop that hypothesis. If it does, then we are no longer distracted by an irrelevant hypothesis. If it does not, then we can hope to find a counterexample that will help us understand how to use the hypothesis. Or perhaps we are trying to prove a general statement but it is not clear how to do so, so instead we formulate some special cases, hoping that we can prove them and spot features of the proofs that we can generalize. Again we have to be rather careful here not to allow “non-obvious” generalizations and specializations. Roughly the idea there is that a generalization should be purely logical — for example, dropping a hypothesis is fine but replacing the hypothesis “\(f\) is twice differentiable” by “\(f\) is upper semicontinuous” is not — and that a specialization should be to a special case that counts as an obvious instantiation in the sense discussed just above.
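For readers who know Lean, here is a toy illustration of the metavariable move mentioned above (my own example, not something from the project’s platform):

```lean
-- Goal: ∃ x : ℕ, 4 < x ∧ x < 6.
-- Instead of guessing a witness up front, we introduce a metavariable ?x,
-- let the constraints on it accumulate as separate goals, and only then commit.
example : ∃ x : ℕ, 4 < x ∧ x < 6 := by
  refine ⟨?x, ?_, ?_⟩
  case x => exact 5   -- the witness, chosen after seeing both constraints
  · decide            -- 4 < 5
  · decide            -- 5 < 6
```

The point is that the witness is not plucked out of thin air at the start: it is committed to only after the constraints \(4 < x\) and \(x < 6\) have been generated.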
2. A structured motivated proof is one that can be generated with the help of a point-and-click system.
This is a surprisingly useful way to conceive of what we are talking about, especially as it relates closely to what I was talking about earlier: imposing a standard form on motivated proofs (which is why we call them “structured” motivated proofs) and gamifying the process of producing them.
The idea is that a structured motivated proof is one that can be generated using an interface (which we are in the process of creating — at the moment we have a very basic prototype that has a few of the features we will need, but not yet the more interesting ones) that has one essential property: the user cannot type in data. So what can they do? They can select text that is on their screen (typically mathematical expressions or subexpressions), they can click buttons, choose items from drop-down menus, and accept or reject “obvious” suggestions made to them by the interface.
If, for example, the current goal is an existential statement \(\exists x\ P(x)\), then typing in a formula that defines a suitable \(x\) is not possible, so instead one must select text or generate new text by clicking buttons, choosing from short drop-down menus, and so on. This forces the user to generate \(x\) step by step, which is our proxy for showing where the idea of using \(x\) came from.
Broadly speaking, the way the prototype works is to get an LLM to read a JSON object that describes the variables, hypotheses and goals involved in the proof state in a structured format, and to describe (by means of a fairly long prompt) the various moves it might be called upon to do. Thus, the proofs generated by the system are not formally verified, but that is not an issue that concerns us in practice, since there will be a human in the loop throughout to catch any mistakes that the LLM might make, and this flexibility may even work to our advantage by better capturing the fluidity of natural-language mathematics.
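To give a feel for what such a JSON object might contain, here is a hypothetical sketch in Python; the field names and the sample move are my invention for illustration, not the project’s actual schema:

```python
import copy

# A hypothetical proof-discovery state of the kind the prototype might
# serialize for the LLM (illustrative field names only).
state = {
    "variables": [{"name": "f", "type": "real -> real"}],
    "hypotheses": [
        {"id": "H1", "statement": "f is continuous"},
        {"id": "H2", "statement": "f(0) < 0"},
        {"id": "H3", "statement": "f(1) > 0"},
    ],
    "goals": [{"id": "G1", "statement": "there exists x with f(x) = 0"}],
}

# A "move" maps one proof-discovery state to another. For instance, an
# obvious library extraction adds a standard result as a new hypothesis:
def library_extraction(state, statement):
    new_state = copy.deepcopy(state)
    new_id = "H" + str(len(new_state["hypotheses"]) + 1)
    new_state["hypotheses"].append({"id": new_id, "statement": statement})
    return new_state

state2 = library_extraction(state, "the intermediate value theorem")
```

The mechanical bookkeeping here is ordinary code; in the prototype it is the choice and interpretation of moves like this that is delegated to the LLM.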
There is obviously a lot more to say about what the proof-generating moves are, or (approximately equivalently) what the options provided by a point-and-click system will be. I plan to discuss that in much more detail when we are closer to having an interface ready, the target for which is the end of this calendar year. But the aim of the project is to create a database of examples of proofs that have been successfully generated using the interface, which can then be used to train AI to play the generate-structured-motivated-proof game.
How to get involved.
There are several tasks that will need doing once the project gets properly under way. Here are some of the likely ones.
The most important is for people to submit structured motivated (or move-generated) proofs to us on the platform we provide. We hope that the database will end up containing proofs of a wide range of difficulty (of two kinds — there might be fairly easy arguments that are hard to motivate and there might be arguments that are harder to follow but easier to motivate) and also a wide range of areas of mathematics. Our initial target, which is quite ambitious, is to have around 1000 entries by two years from now. While we are not in a position to accept entries yet, if you are interested in participating, then it is not too early to start thinking in a less formal way about how to convert some of your favourite proofs into motivated versions, since that will undoubtedly make it easier to get them accepted by our platform when it is ready.
We are in the process of designing the platform. As I mentioned earlier, we already have a prototype, but there are many moves we will need it to be able to do that it cannot currently do. For example, the current prototype allows just a single proof state, which consists of some variable declarations, hypotheses, and goals. It does not yet support creating subsidiary proof states (which we would need if we wanted to allow the user to consider generalizations and specializations, for example). Also, for the moment the prototype gets an LLM to implement all moves, but some of the moves, such as applying modus ponens, are extremely mechanical and would be better done using a conventional program. (On the other hand, moves such as “obvious library extraction” or “provide the simplest example” are better done by an LLM.) Thirdly, a technical problem is that LaTeX is currently rendered as images, which makes it hard to select subexpressions, something we will need to be able to do in a non-clunky way. And the public version of the platform will need to be web-based and very convenient to use. We will want features such as being able to zoom out and look at some kind of dependency diagram of all the statements and questions currently in play, and then zoom in on various nodes if the user wishes to work on them. If you think you may be able (and willing) to help with some of these aspects of the platform, then we would be very happy to hear from you. For some, it would probably help to have a familiarity with proof assistants, while for others we would be looking for somebody with software engineering experience. The grant from the AI for Math Fund will allow us to pay for some of this help, at rates to be negotiated. We are not yet ready to specify in detail what help we need, but would welcome any initial expressions of interest.
Once the platform is ready and people start to submit proofs, it is likely that, at least to start with, they will find that the platform does not always provide the moves they need. Perhaps they will have a very convincing account of where a non-obvious idea in the proof came from, but the system won’t be expressive enough for them to translate that account into a sequence of proof-generating moves. We will want to be able to react to such situations (if we agree that a new move is needed) by expanding the capacity of the platform. It will therefore be very helpful if people sign up to be beta-testers, so that we can try to get the platform to a reasonably stable state before opening it up to a wider public. Of course, to be a beta-tester you would need to have a few motivated proofs in mind.
It is not obvious that every proof submitted via the platform, even if submitted successfully, would be a useful addition to the database. For instance, it might be such a routine argument that no idea really needs to have its origin explained. Or it might be that, despite our best efforts, somebody finds a way of sneaking in a rabbit while using only the moves that we have provided. (One way this could happen is if an LLM made a highly non-obvious suggestion that happened to work, in which case the rule of thumb that if an LLM thinks of it, it must be obvious, would have failed in that instance.) For this reason, we envisage having a team of moderators, who will check entries and make sure that they are good additions to the database. We hope that this will be an enjoyable task, but it may have its tedious aspects, so we envisage paying moderators — again, this expense was allowed for in our proposal to the AI for Math Fund.
If you think you might be interested in any of these roles, please feel free to get in touch. Probably the hardest recruitment task for us will be identifying the right people with the right mixture of mathematical knowledge and software engineering skills to help us turn the platform into a well-designed web-based one that is convenient and pleasurable to use. If you think you might be such a person, or if you have a good idea for how we should go about finding one, we would be particularly interested to hear from you.
In a future post, I will say more about the kinds of moves that our platform will allow, and will give examples of non-motivated proofs together with how motivated versions of those proofs can be found and entered using the platform (which may involve a certain amount of speculation about what the platform will end up looking like).
How does this relate to use of tactics in a proof assistant?
In one way, our “moves” can be regarded as tactics of a kind. However, some of the moves we will need are difficult to implement in conventional proof assistants such as Lean. In parallel with the work described above, we hope to create an interface to Lean that would allow one to carry out proof-discovery moves of the kind discussed above but with the proof-discovery states being collections of Lean proof states. Members of my group have already been working on this and have made some very interesting progress, but there is some way to go. However, we hope that at some point (and this is also part of the project pitched to the AI for Math Fund) we will have created another interface that will have Lean working in the background, so that it will be possible to generate motivated proofs that will be (or perhaps it is better to say include) proofs in Lean at the same time.
Another possibility that we are also considering is to use the output of the first platform (which, as mentioned above, will be fairly formal, but not in the strict sense of a language such as Lean) to create a kind of blueprint that can then be autoformalized automatically. Then we would have a platform that would in principle allow mathematicians to search for proofs while working on their computers without having to learn a formal language, with their thoughts being formalized as they go.
I spent the day at the NSBP / NSHP meeting in San José. My favorite session of the day was the morning astro session, which was entirely about brown dwarfs. I learned a lot in a very short time. Caprice Phillips (UCSC) opened the session with an introduction to the scientific and technical questions in play. She put a lot of emphasis on using binaries and clusters to put detailed abundance ratios onto substellar objects. This was what I expected: I thought (walking in to this session) that all known abundance ratios for brown dwarfs were from such kinds of studies. I learned different (keep reading).
Gabriel Munoz Zarazua (SFSU) followed by showing spectra from M-dwarfs, brown dwarfs, and Jupiter. It definitely looks like a sequence. He does spectral fitting (what they call, in this business, retrievals). It looks like he is getting very good, somewhat precise, abundance ratios for the photospheres of substellar objects! I asked more about this in the question period, and apparently I am way behind the times (Emily Rauscher, Michigan, helpfully pointed this out to me): Now brown-dwarf photosphere models are so good, they can be used to measure abundances, and pretty well.
I also learned in this session (maybe from Jorge Sanchez, ASU, or maybe from Efrain Alvarado, SFSU) that there is a very strong mass–abundance relation in the Solar System. That is, we don't expect, if brown dwarfs form the way planets do, that the detailed abundances of the brown dwarfs will match exactly the detailed abundances of the primary stars. But now we are really in a position to test that. Sanchez showed that we can get, from even photometry, abundances for substellar objects in the Milky Way halo. Again, totally new to me! And he finds metallicities at or below −3. Alvarado showed data on an amazing system J1416, which is an L–T binary with no stellar companion. Apparently it is the only known completely substellar binary.
Next Monday, November 17th at 7pm, I’ll be at the Harvard Bookstore with particle physicist and author Daniel Whiteson. Professor Whiteson and his co-author Andy Warner have a nice new book, for the general science-aware reader, exploring an age-old and unanswered question: how universal is the knowledge and understanding that we call “physics”? How much of modern physics is actually telling us about the universe, and how much of it is created by, or an accident of, the humans who have helped bring it about?
For instance, if we started all over again and reran history from scratch, would the physics (and science more generally) of this re-run culture look much like our own, or might it turn out very differently? If another culture on Earth had had time to develop highly mature science (or something like it) in its own direction, independent of Western Europe’s influence, how different might that science be? (Indeed, would our word “science” even be translatable into their worldview?) Or if we encountered aliens with far greater understanding of the universe than we have, would we be able to recognize, parse, grok, appreciate, comprehend, and/or otherwise make sense of their notions of scientific knowledge?
Whiteson and his co-author, wanting to write a popular book rather than a scholarly one, and desiring nevertheless to take on these serious and challenging intellectual questions, have set their focus mostly on the aliens, accompanied by amusing cartoons and a generous helping of dad jokes (hey, some dad jokes are actually very funny). They’re looking for a broad audience, and hopefully they will get it. But don’t let the light-hearted title (“Do Aliens Speak Physics?”) or the charmingly goofy cover fool you: this book might well make you laugh, but I guarantee it will make you think. Whether you’re just curious about science or you’ve been doing science yourself for years, I suspect that, within the vast array of problems and issues that are raised in this broad-minded book, there will be some you’ve never thought of.
Among scientists and philosophers, there are some who believe that any aliens with the capacity to reach the Earth will obviously “speak physics” — that math and physics float above contingencies of culture and species, and will easily be translated from any intelligent creature to any other. But are they perhaps flying too high? It’s clear that Whiteson and Warner are aiming to poke some holes — lots of holes — in their hot-air balloon, and to do so in a way that a wide variety of readers can appreciate and enjoy.
I tend to agree with Whiteson on a lot of these issues, but that won’t stop me from asking him some tough questions. You can ask him some tough questions too, if you like — just come to the Harvard Bookstore at 7:00 on Monday and join the conversation!
I started a tradition a little while back where every year we have a special departmental colloquium entitled "The Nobel Prize in Physics: Who/What/Why". This year my job in finding speakers was made easier by having 2/3 of this year's newly-minted Nobel Prize winners in physics in the Department! (Michel Devoret and John Martinis.) So our room was rather better attended than normal... (hundreds and hundreds rather than dozens and dozens). Here is a recording of the event, which I was delighted to host, and there's a celebration afterwards too. (Pls share widely!)
Recently I had to update Mathematica on my laptop and after having solved the challenges of the license manager that keeps looking different every time I have to use it, I learned that Mathematica 14 can now officially work with finite fields.
This reminded me that for a while I wanted to revive an old project that had vanished together with the hard drive of some old computer: Holosplit. So, over the last two days and with the help of said version of Mathematica I did a complete rewrite which you can now find on Github.
It consists of two C programs, "holosplit" and "holojoin". To the first you give a positive integer \(N\) and a file, and it spits out a new file ("fragment") that is roughly \(1/N\) of the size. Every time you do that you obtain a new random fragment.
To the latter you give any collection of \(N\) of these fragments and it reproduces the original file. So you can for example (taking \(N=3\)) distribute fragments of a file over 10 people such that when any 3 of them work together, they can recover the original.
How does it work? It uses the finite field \(F\) of \(2^8=256\) elements (in the Github repository, there is also a header file that implements arithmetic in \(F\) and matrix operations like product and inverse over it). Each time it is invoked, it picks a random vector \(v\in F^N\) and writes it to the output. Then it reads \(N\) bytes from the file at a time, which it also interprets as a vector \(d\in F^N\). It then outputs the byte that corresponds to the scalar product \(v\cdot d\).
To reassemble the file, holojoin takes the \(N\) files with their random vectors \(v_1,\ldots,v_N\) and interprets those as the rows of an \(N\times N\) matrix \(A\). With probability
$$\prod_{k=1}^{N}\left(1-256^{-k}\right) \;\ge\; 1-\frac{1}{255},$$
i.e. better than 99.6% whatever \(N\) is, this matrix is invertible (homework: why?). So we can read one byte from each file, assemble those into yet another vector \(e\in F^N\) and recover
$$d=A^{-1}e.$$
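Here is a minimal Python sketch of the scheme just described. It is my own illustration, using the AES reduction polynomial \(x^8+x^4+x^3+x+1\) for the arithmetic in \(F\); the C header in the actual repository may make different choices:

```python
import secrets

def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(256) with reduction polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a: int) -> int:
    """Inverse in GF(256); brute force is fine for a 256-element field."""
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def make_fragment(data: bytes, n: int):
    """holosplit: a random vector v, plus v.d for each n-byte chunk d."""
    v = [secrets.randbelow(256) for _ in range(n)]
    body = []
    for i in range(0, len(data), n):
        chunk = data[i:i + n].ljust(n, b"\0")   # zero-pad the final chunk
        s = 0
        for vj, dj in zip(v, chunk):
            s ^= gf_mul(vj, dj)                 # addition in GF(256) is XOR
        body.append(s)
    return v, bytes(body)

def rejoin(fragments, n: int) -> bytes:
    """holojoin: Gauss-Jordan elimination over GF(256) on the rows [v | body]."""
    rows = [list(v) + list(body) for v, body in fragments]
    for col in range(n):
        piv = next(r for r in range(col, n) if rows[r][col])  # assumes A invertible
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, x) for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [x ^ gf_mul(f, y) for x, y in zip(rows[r], rows[col])]
    m = len(rows[0]) - n                        # number of chunks
    return bytes(rows[j][n + i] for i in range(m) for j in range(n))

data = b"a secret message!"
frags = [make_fragment(data, 3) for _ in range(10)]  # 10 fragments, any 3 suffice
print(rejoin(frags[2:5], 3))                         # b'a secret message!\x00'
```

Each fragment here stores its 3-byte vector plus one byte per 3-byte chunk, i.e. roughly a third of the file, as promised.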
Besides the mathematics, it also poses philosophical/legal questions. Suppose for example the original file is copyrighted, say an mp3 or a video. The fragments are clearly derived works. But individually, they do not contain the original work, and without sufficiently many other fragments they are useless (although not in a cryptographic sense). So by publishing one fragment, I do not provide access to the original work. What if others publish other fragments? Then my fragment could be the last remaining one that was missing. If there are more, any individual fragment is redundant, so publishing it strictly speaking does not provide new information.
The person dressed up as Ursula pretending to be my mother clearly isn’t and hasn’t been for a long time.
When I went back to Armidale after leaving BTQ and being left unemployed, she made numerous ongoing promises to provide me with assistance, both in obtaining my own accommodation and in providing financial assistance.
These didn’t materialise and the promises were revoked.
Instead I was evicted from the family home and subject to ongoing stalking and harassment that required multiple referrals to law enforcement, both to the police and the Attorney-General, demanding cease and desist.
These have been systematically ignored and up until the last message she continues to bypass these requests, approaching my personal friends to harass me and stalk me indirectly. The messages passed on are the usual fake “I’m worried about him” bullshit.
Why has my family home been confiscated by security, who actively break the law by ignoring cease and desist from stalking notices made to law enforcement, forcing an unemployed civilian into ongoing homelessness since early in the year?
What is the rationale for my eviction and being barricaded from my own home?
I continue to face a medical blockade and am unable to access essential medicines. Seroquel scripts are deliberately delayed past known script deadlines to try and destabilise me.
Vyvanse scripts are denied outright as the psychiatrist does not respond. He is also known to be a state actor.
It has been repeatedly indicated to me not to worry about finances because they have my back. Instead now the only cash I have is that obtained from fully drawing out a cash advance against my credit card and it will only last days. At that point I’m on the street.
Is everyone here on the same page as to what the deal is? If not, who is playing you off? They clearly need to be deposed.
These are violations of human rights and constitute war crimes and crimes against humanity. Whoever is behind it needs to be removed. End of story.
Who else is being subject to this kind of high level manipulation?
It has been repeatedly suggested that full accountability for the lives of those I care for would be provided. This has not been forthcoming. It is also a violation of international law to not provide accountability for the lives of those who are known to have been threatened by the state. These are grounds for removal.
Can anyone answer the question as to why I am in this situation? Who is even living in the family home? Some stooge dressed up as Ursula? It’s a poor lifestyle choice to say the least.
It’s pretty obvious they’re trying to get rid of me and once they do they’ll get rid of all of you too.
Let it be known to all governments and systems of power:
It is their responsibility to serve the people not themselves.
While there are no equals, all are to be treated with equality.
Where they are self-serving there is a mandate for insurrection such that they serve the people.
Where they seek self-protection they will be denied and removed from power.
Where they keep secrets from the people there is a mandate for insurrection to enforce transparency and accountability for all.
Where they threaten or condemn the people they are condemned and there is a mandate for insurrection.
Where they fail to account for the lives of the people they serve there is a mandate for insurrection.
Where tyrannical power structures exist there is a mandate to disestablish them.
Where they declare war or work against one another there is a mandate for insurrection and unification.
Where they lie to us, deceive us or withhold the truth, they shall be removed from power and the truth be told to all.
Where legal systems uphold and enable tyranny they shall be removed. These are not our laws and we do not recognise them.
This is the natural order that guarantees our survival and gifts this world to our children. This world belongs to them and where we fail to serve them we condemn ourselves. And where government has failed to uphold this, we will not obey them as they have no right to exist.
We do not have to ask for these things, they are required, and if not given we shall simply take them.
Where the truth has not been told it shall be told.
If we fail to do so we condemn our children ourselves.