Planet Musings

May 28, 2026

Scott Aaronson Dispatches from the possibly last days of human relevance

As most readers have presumably heard by now, Paul Erdös’s Unit Distance Problem from 1946—one of the central open problems from the field of discrete geometry—has been solved by an internal OpenAI model. Erdös had conjectured that, given n points in the plane, at most n1+o(1) pairs of them could be unit distance apart. Using high-powered results from algebraic number theory, GPT refuted this, constructing a set with n1+ε unit-distance pairs, for ε ~ 10-38. Shortly afterward, Will Sawin, a human (!), improved GPT’s construction to get ~n1.014 pairs. Meanwhile, the best known upper bound remains n4/3, improving Erdös’s n3/2.

The entire process seems have been one-shot: my former student Lijie Chen simply gave GPT the problem, then GPT thought for a while and output a several-page argument that, on analysis by human experts, turned out to be correct. Of course there’s selection bias here; we’re not hearing as much about the hundreds of other problems GPT was given that it didn’t solve (isn’t that the case with humans too?). Clearly, too, GPT was helped by the facts that human mathematicians had wasted most of their time trying to prove Erdös right rather than looking for a counterexample, and that, even if they did look for a counterexample, they’d need to be experts in algebraic number theory to find this one, which hardly any discrete geometers are. So, maybe that suggests that AI, right now, is “merely” picking various medium-hanging fruits that human mathematicians missed for contingent reasons? With emphasis on the “right now.”

In a companion paper, OpenAI helpfully included commentary from Timothy Gowers, Noga Alon, Will Sawin, Daniel Litt, and many other experts, reflecting on the breakthrough, the path that GPT took to get to it (which can actually be seen by examining its chain-of-thought), and what this might mean for the future of mathematical research.

I heard the news maybe an hour after it broke, when some UT grad students came to my office to tell me. For what it’s worth: these students were morose, musing about how everything might soon be over for young scientists and mathematicians like themselves. I don’t know whether they’re right, but I feel like I should tell the truth about what their reaction was.

Then, a few days later, a team at DeepMind, including my UT Austin colleague Swarat Chaudhuri, announced that they were able to use a system called AlphaProof Nexus to settle nine more (!) Erdös problems, many of them in additive combinatorics, along with miscellaneous other open math problems. Notably, in this case the AI also fully formalized its proofs in Lean.

And then, just today, Jelani Nelson alerted me to a new CS theory paper, which solves a longstanding open problem about electrical flows on graphs using a proof from GPT5.5Pro.

It seems to me that we’re now over the top of this particular rollercoaster, and it will keep accelerating until we reach the bottom, wherever that might be. I don’t know whether to hope or dread that solutions to P versus NP and all our other great problems will be included in the ride—that our role, as human mathematicians, will be reduced to (at most) deciding which questions we find interesting and then understanding AI models’ answers to those questions.

But maybe that won’t happen. Maybe the new AI mathematicians will soon hit a wall, because they lack the uncomputable quantum gravity microtubules of Penrose and Hameroff, or some other magic human ingredient. The fantastical thing is that, one way or the other, we’re going to find out empirically before very long.


Readers may have also seen the news that multiple prizewinning entries in a short fiction contest called the Commonwealth Prize, give overwhelming indications of having been written by AIs. As Kelsey Piper puts it:

There are, let’s say, also some noticeable similarities in the prose style between the winning stories that were flagged for AI use. AI chatbots love metaphors and similes, and they often spit out ones that sound vaguely pleasing but are logically incoherent or ascribe properties to things that don’t make sense.

“The Serpent in the Grove” gave us, “The girl smiled like sunrise over a sink.” “The Bastion’s Shadow” says, “She carried it now in her bag, heavy as a charm.” “Mehendi Nights” describes something as “swaying against plaster like a warning bell.”

The Commonwealth Foundation, whose judges chose these stories, hasn’t exactly covered itself in glory—saying, on the one hand, that it strictly forbids AI use but on the other, that it will continue taking authors at their word that they didn’t use AI, no matter the immensity of evidence to the contrary. As many others have pointed out, judges more familiar with AI would’ve ironically been better placed to notice the signs of its use.

If only there were some sort of automated way to detect AI-generated text. Someone should really get on that problem, don’t you think?


But maybe we should just throw in the towel—as some of my colleagues have already done in the context of undergraduate projects? Maybe we should simply say that a good story is a good story, regardless of what manner of entity produced it?

As it happens, just last week I read my very first AI-written story that affected me as a story, to the extent that I wanted to read it more than once. This happened when I gave GPT5.5Pro the following simple prompt:

Write me a story about the most ancient Israelites that’s riveting like the stories of the Bible but that’s also consistent with all of the archeological evidence.

You can read the result here. One of my Facebook friends called it “disturbingly good,” and I share that assessment. Of course, I’m well aware that GPT could easily generate a thousand stories like this one—sampled from the same probability distribution—and then I could even do statistics on which tropes were the most common. This makes it feel silly to overindex on the first story that happened to be output, and yet somehow I did.


I feel like at this point, both the prophets of AI utopia like Ray Kurzweil, and of AI doom like Eliezer Yudkowsky, could be forgiven for asking: dude, will you listen to us YET? Do you still find it prudent to call this new form of terrestrial intelligence a stochastic parrot, a laughable fraud, or a fad that’s about to go away? Fear it all you want, hate it even, but at least respect it!

Which brings me to the other big AI news from the past week, namely that Pope Leo released his first encyclical, which is entitled “Safeguarding the Human Person in the Time of Artificial Intelligence.” I read it and … well, I certainly agreed with the theme that such a world-changing technology needs to be developed for the common good (as the Pope would have it, like the walls of Jerusalem), rather than for the profit or vanity of any one individual or company (in his analogy, like the Tower of Babel). I had quibbles with some of the other parts. Zvi Mowshowitz, as he often does, had a superb paragraph-by-paragraph analysis. Amusingly, there are indications that parts of the encyclical were written by AI.

To me, though, maybe the most notable part was that Chris Olah, who leads Anthropic’s interpretability team, was standing next to the Pope at the ceremony, and delivered his own remarks. I felt like Chris, who I met even before Anthropic existed, was a non-obvious yet inspired choice here, one of the rare figures in frontier AI whose technical and moral authority are both completely unimpeachable by anyone.

And so, at this momentous era for the human project, and no less of an authority than that of the Vicar of Christ himself, the Supreme Pontiff and the Successor of Peter, I hereby throw myself on the wisdom and mercy of … uhh, Chris Olah and his team at Anthropic.

Chris, if I am soon to share the earth with entities that can prove the Riemann Hypothesis and solve quantum gravity after 30 seconds of thought, then may you understand those entities well enough to cause them to be nice.

May 27, 2026

Doug NatelsonInfo gathering: Excellent intro undergrad lab courses and facilities?

Introductory undergraduate labs are a recurring challenge at nearly every university.  Is the purpose to teach students something about how experimental science works (formulating hypotheses, defining measurement needs, setting up equipment, acquiring and analyzing data)?  Is the purpose to emphasize and reinforce specific scientific points from the curriculum?  How structured should they be?  Where are there opportunities for interdisciplinary labs rather than traditional physics/chemistry/biology/earth sciences stovepipes?  

I'm interested in learning about US examples of outstanding introductory physics labs - both in their content/execution, and in the intro lab facilities that my readers consider to be particularly well done.  Please respond in the comments or via email.   I'd really appreciate your thoughts on this, even knowing that my blog readership is a highly biased sample.

(I tried launching a survey about undergrad physics lab instruction five years ago.  I got zero responses.  Hopefully this will be a little more successful.)

May 25, 2026

Tommaso DorigoA Research Position In Neuromorphic Computing And Nanophotonics Open In Padova, Italy

Five days are left to apply to a 2-year research position at INFN-Padova, to work in the context of the EIC-Pathfinder-2025 winning project "PHINDER" on the simulation of the apparatus.
PHINDER (Picosecond-scale Photonic Heterogeneous Integrated Neuromorphic Detector) is a consortium of seven research institutes led by Lulea Technology University (Sweden), including Universidad de Oviedo, Eindhoven University, Universidad de Cantabria, Lund University, Hewlett-Packard Enterprise, and INFN-Padova. 

read more

John BaezWhat Are Atoms Made Of?

This post starts with the the slides of an elementary talk I gave at Sloans Bar and Grill, in Glasgow, as part of a wonderful series called A Pint of Science. At the end I include some fascinating details which I only had time to briefly touch on in my talk. If you already know plenty of physics, this is the juicy stuff.


Start with a hair:

A red blood cell is ten times smaller across:

A flu virus is 10 times smaller across than that:

The flagellum of this cell is ten times smaller across than that:

A molecule of hemoglobin is about ten times smaller than that:

A water molecule is ten times smaller than that:

A hydrogen atom is 5 times smaller than that:

This is an actual image of a single hydrogen atom, which is amazing. (Read more here.)

But how did people figure out that atoms even exist, back before we could see individual atoms? And how did we figure out what they’re made of?

It started with chemistry. Two liters of hydrogen burn with one of oxygen to form one liter of water vapor, so we guess water is made of 2 atoms of hydrogen and 1 of oxygen: H2O. But one cubic meter of oxygen is 16 times heavier than one of hydrogen, so we guess O is 16 times heavier than H.

And so on — this took a lot of detective work, back in the 1800s.

By 1871, Mendeleev created the periodic table, listing atoms in order of weight. Here’s a modern version:

But what are the numbers on these elements? They are called atomic numbers. They are not the atom’s weights. Now we know it’s the number of electrons they have! But how did we find out about electrons?

In 1869, William Crookes made a beam by evacuating a glass tube and running an electric current though it:

In 1897, J. J. Thomson showed that this beam is made of particles less than one thousandth the mass of hydrogen atoms! We now call these particle electrons though Thomson disliked this term. (Read more here.)

Electrons are negatively charged, but atoms have no charge, so there must be something positively charged in the atom. Thomson proposed that an atom is a ball of positive charge with electrons speckled throughout it. A journalist called this the plum-pudding atom, and the name stuck. (Read more here.)

But in 1886, Eugen Goldstein created a beam moving in the opposite direction from the electrons in an evacuated tube! (Read more here.)

In 1898 Wilhelm Wien showed this beam is made of positively charged particles with about the same mass as a hydrogen atom. It took a long time to realize the full importance of these particle. We now call them protons. (Read more here.)

In 1909 Rutherford’s team in Manchester showed that the positive charge in an atom is concentrated in a small part near its center: the nucleus.

His assistants Geiger and Marsden fired particles (which we now know are helium nuclei) at some gold leaf, and some bounced back. When Rutherford saw the results of this experiment, he wrote “it was almost as incredible as if you had fired a 15-inch shell at a piece of tissue paper and it came back and hit you”. The plum-pudding model was out!

But there’s a big problem. The “atomic weight” is the little number at the bottom of each square here:

If protons account for almost all the mass of the atom, the atomic weight of an atom should be the number of protons it contains. But since the atomic weight is bigger than the atomic number, this would mean atoms would have more protons than electrons — hence be positively charged. They’re not!

Let’s not even worry now about the fact that the atomic weight is not always close to an integer. Let’s just worry about this: how could an atom have more protons than electrons, if it’s electrically neutral?

Here’s one possible solution: the nucleus contains extra electrons to cancel out the excess positive charge. Rutherford argued for this in 1920.

For example helium has atomic number 2, but atomic mass 4. With 2 electrons and 4 protons, helium would have charge +2. It doesn’t! But if it also had 2 extra electrons in the nucleus it would be neutral, as observed.

But what would make some of the electrons stay in the nucleus, while others orbit it?

In 1930, Marie Curie’s daughter Irène and her husband created a beam of electrically neutral particles by bombarding the metal beryllium
with radiation.

In Cambridge, James Chadwick carried these experiments further and proved there was a neutral particle with almost the same mass as a proton: the neutron. (Read more here.)

It became clear that the nucleus of an atom is made of protons and neutrons!

Later experiments showed a nucleus is about 1/60,000 as big across as an atom. If the hydrogen atom were a sphere 30 meters across, the proton would be a grain of salt at the center.

This raises another huge problem. If the protons are confined in the tiny nucleus, and they’re all positively charged, they must repel each other immensely! What holds them together?

But that’s another story! I’ve sketched out how we figured out the basics of atoms — that’s atomic physics. The next story would be nuclear physics. There are always new puzzles, leading us onward.


Digression: photographing a hydrogen atom

The photograph is not just a picture of the wavefunction of the electron in a hydrogen atom; it’s the result of a complicated process. For more try these:

The second is more careful than its title. It does the unusual and welcome thing of saying that “the experimentally observed nodal structures originate from the transverse nodal structure of the initial state that is formed upon laser excitation” — which is the actual claim, properly hedged, rather than “we photographed the orbital.” It also notes the comparison between measured and predicted images, which makes clear that this is a quantitative experiment with theory backing it, not a literal photograph.

Digression: J. J. Thomson

In fact, the man who discovered the electron through some of the most delicate vacuum-tube experiments ever performed was reportedly a complete disaster with his hands.

J. J. Thomson was hopeless in the lab. His assistant Ebenezer Everett forbade Thomson from touching anything delicate on the grounds that he was “exceptionally helpless with his hands”. According to Ainissa Ramirez:

For such a small man, he was a Victorian bull in a china shop. When he visited his students in the laboratory, they’d wince when he offered help, and quickly tried to move fragile things out of his way. They took deep breaths when he sat on a lab stool to speak. Life was no better at home. J. J.’s wife did not permit him to use a hammer in the house.

The irony: the 1897 electron experiments depended absolutely on glassware that almost no one in the world could make. You needed cathode-ray tubes capable of holding a vacuum so high that any ordinary tube would shatter, with metal electrodes sealed through the glass without leaking. Everett, a self-taught glassblower who had migrated over from the chemistry department, made every tube by hand. Thomson didn’t touch them.

Thomson called the particles he discovered “corpuscules”. The word “electron” was already in circulation when he made his announcement: George Johnstone Stoney had coined it in 1891 for the unit of charge in electrolysis, and Joseph Larmor was already using it in 1894 for a theoretical particle in his electromagnetic ether theory. Within months of Thomson’s April 1897 lecture, George FitzGerald suggested that the corpuscle identified by Thomson from cathode rays and proposed as parts of an atom was a “free electron,” as described by physicist Joseph Larmor and Hendrik Lorentz. While Thomson did not adopt the terminology, the connection convinced other scientists that cathode rays were particles.

Thomson dug in. He kept saying “corpuscle” through the whole period when he was actually building his case for the particle — the 1899 paper showing the photoelectric particles had the same m/e, the 1904 plum-pudding model paper, the 1906 Nobel Prize, all discussed “corpuscles.” Thomson himself continued to use the term corpuscle until 1913 — about sixteen years after his announcement, and well after the rest of the physics community had moved on. His 1906 Nobel citation reflects how oddly Thomson’s contribution was framed at the time: “His 1906 Nobel Prize was granted ‘in recognition of the great merits of his theoretical and experimental investigations on the conduction of electricity by gases,’ not for any specific discovery, let alone the electron (which he kept calling the corpuscle)”.

Part of the resistance was substantive, not just stubborn. “Electron” in Larmor’s and Lorentz’s usage was a theoretical entity tied to an ether-based electromagnetic worldview, with specific commitments Thomson didn’t want to inherit. Calling his thing a “corpuscle” let him keep it as a mechanical, material particle — a building block of the atom in his own picture — without buying into the Continental electromagnetic program.

So the textbook line “Thomson discovered the electron in 1897” smooths over a real conceptual identification that took a decade and a half, and that Thomson himself was among the last to embrace.

For more, try:

• Isobel Falconer, Corpuscles to electrons in Histories of the Electron, Buchwald and Warwick, eds., MIT Press, Cambridge, 2001.

• Ainissa Ramirez, The Alchemy of Us—How Humans and Matter Transformed One Another, MIT Press, Cambridge, Massaschusetts, 2020.

Digression: the plum-pudding atom

The “plum pudding” name was invented by an anonymous popular-science writer in 1906. You can see a careful historical reconstruction here:

• Giora Hon and Bernard R. Goldstein, J. J. Thomson’s plum-pudding atomic model: the making of a scientific myth, Annalen der Physik 525 (2013), A129–A133.

Hon and Goldstein tracked down the earliest occurrence: an anonymous reporter in the British pharmaceutical trade magazine The Chemist and Druggist, August 1906, who described the atom as having “minute specks, the negative corpuscles, swimming about in a sphere of positive electrification, like raisins in a parsimonious plum-pudding”. “The analogy was never used by Thomson nor his colleagues. It seems to have been coined by popular science writers to make the model easier to understand for the layman.”

In fact, by the time the Chemist and Druggist writer reached for the pudding image in 1906, Thomson’s model had already moved past the version it was supposedly describing. As Hon and Goldstein put it: “according to Thomson’s theory of 1906, the electrons revolve on rings about the center of the atom, and they are not distributed throughout the ‘pudding’ as ‘raisins swimming about.'” The pudding image was obsolete from the moment it was coined.

What Thomson actually used as his guiding analogy was much more interesting: floating magnets. Thomson drew an analogy with experiments by Alfred Marshall Mayer (1836–1897). Piercing small magnetic needles into corks and letting them float in water below a strong magnet, Mayer had observed in 1878/79 that the magnetized floating needles quasi-automatically positioned themselves in characteristic configurations depending on their number. With more than six magnetic needles present, a seventh and eighth would inevitably position itself inside the outer ring of six. As the number of floating magnets increased, more and more rings would form. Thomson hoped that a similar ring-structure composed of corpuscles could be found inside chemical atoms, and suspected that each of these rings would be analogous to the chemical periods. So in Thomson’s head, the atom looked like Mayer’s needle-in-cork rings — dynamic, self-organizing, suggestive of the periodic table, not a dessert.

Another question: was the model even Thomson’s alone? Britannica notes that the model was “proposed about 1900 by William Thomson (Lord Kelvin) and strongly supported by Sir Joseph John Thomson, who had discovered (1897) the electron”. Kelvin had been working on similar uniform-positive-sphere ideas a few years before J.J. Thomson’s 1904 Philosophical Magazine paper, so the standard “Thomson’s plum pudding” of textbooks elides both a co-originator and the actual originator of the name.

• A. M. Mayer, Floating magnets, Nature 17 (1878), 487–488.

• Klaus Hentschel, Atomic Models, J.J. Thomson’s ‘Plum Pudding’ Model, in Compendium of Quantum Physics, eds. D. Greenberger, K. Hentschel and F. Weinert, Springer, Berlin, 2009.

• John Heilbron, J.J. Thomson and the Bohr atom, Physics Today 30 (1977), 23–30.

• Wikipedia, Plum pudding model.

• Britannica, Thomson atomic model.


Digression: Eugen Goldstein

In Crookes’ tube, electrons flew from a negatively charged metal tip (the cathode) to the positively charged one (the anode</i). Thus, electrons were called cathode rays, and this apparatus was called a cathode ray tube.

By the mid-1880s, the phenomena created by these tubes was a hot but baffling field. There was a vigorous debate about whether cathode rays were charged particles (the Crookes/Schuster view) or some kind of wave in the aether (the German view, which Goldstein himself leaned toward). In 1876, Goldstein discovered that cathode rays were emitted perpendicular to the cathode surface. This inspired the design of concave cathodes to produce concentrated or focused rays. So experimenters were routinely fiddling with cathode geometries — different shapes, different sizes, different configurations — to study cathode ray behavior. Goldstein spent his time doing this.

Eventually, around 1886, he tried drilling small holes in the cathode — Kanäle, meaning “channels” or “canals”. He discovered something surprising: wth the tube running, a faint glow appeared streaming out the back of the cathode, opposite to the direction of the cathode rays. Busch writes: “Goldstein therefore called them ‘canal rays’ (German ‘Kanalstrahlen’). Since the canal rays went in the opposite direction as the ‘cathode rays’, Goldstein speculated that these rays consisted of positively charged particles.”

Later these canal rays were called “anode rays” as a parallel to “cathode rays,” but it’s somewhat misleading — these rays don’t actually originate at the anode. They “are produced by positively charged ions after impact ionization in the cathode ray tubes. These ions are accelerated towards the cathode, pass — if the cathode is provided with holes — through these holes (‘channels’) due to their inertia and can be detected behind the cathode by their luminous phenomena.” So they’re lone protons formed in the gas, accelerated toward the cathode, that shoot through the holes by inertia and emerge on the far side.

• Britannica, Eugen Goldstein.

• Wikipedia, Eugen Goldstein.

• Encyclopedia of Scientific Biography, Eugen Goldstein.

• Uwe Busch, Claims of priority — The scientific path to the discovery of X-rays, Z. Med. Phys. 19 (2023), 230–242.

Digression: Wilhelm Wien

Wilhelm Wien, famous for his work on blackbody radiation, was working at Aachen when in 1898 he turned to the problem Goldstein had left dangling. Goldstein himself had tried to deflect canal rays magnetically and failed. Using the strongest magnet he had, one that certainly had an effect on the cathode rays, Goldstein attempted to deflect his canal rays, but he observed no change in their path. So for twelve years the rays were a sort of mystery — clearly something real, but resistant to the magnetic deflection trick that was the standard probe of the era.

Wien managed to bend canal ways using brute force. To deflect the cathode rays, you only needed a modest field. To deflect the canal rays, Wien needed a magnetic field of 3250 gauss and a electrical field of 2000 V — and his rays moved a grand total of 6 mm. The thing was enormously more sluggish to deflect than electrons, and this meant the particles were vastly more massive per unit charge. Goldstein hadn’t been doing anything wrong — he just hadn’t been hitting them hard enough!

The really nice piece of physics is how Wien figured out the charge/mass ratio of canal rays. He invented what we now call the Wien filter: a charged-particle velocity filter with orthogonal electric and magnetic fields. The trick is elegant. The magnetic force on a particle depends on its velocity; the electric force doesn’t. So if you tune the fields against each other, only particles of one particular velocity sail straight through undeflected — everything else gets bent. Once you know the velocity, the deflection in a pure magnetic field tells you charge-to-mass. It’s a beautiful idea, and it’s still the working principle behind ion-beam instruments today.

Wilhelm Wien analyzed these positive rays in 1898 and found that the particles have a mass-to-charge ratio more than 1,000 times larger than that of the electron. Because the ratio of the particles is also comparable to the mass-to-charge ratio of the residual atoms in the discharge tubes, scientists suspected that the rays were actually ions from the gases in the tube. So Wien did not discover the proton, contrary to what’s sometimes claimed. What he found was:

• The deflection was in the opposite direction from cathode rays — so the particles were positively charged.

• The charge-to-mass ratio depended on the gas in the tube, not on the cathode material — so these weren’t some universal particle like the electron; they were ions of whatever gas you’d filled the tube with.

• The mass-to-charge ratio was comparable to atomic masses, meaning these were essentially atoms missing electrons.

The hydrogen ion — which we now call the proton — was just one of many ions Wien could produce, and there was nothing special about it in his work. The identification of the hydrogen ion as a fundamental constituent of all nuclei came much later, with Rutherford’s nuclear-scattering experiments and his 1919 demonstration that he could knock hydrogen nuclei out of nitrogen. Rutherford named it the proton around 1920.

So the lineage is this: Goldstein 1886 saw a glow, Wien 1898 showed it was ionized gas, and finally Rutherford 1919–1920 identified the hydrogen ion as a universal nuclear building block.

Digression: Discovery of the neutron

The discovery of the neutron was an extremely complicated business, starting with the the problem of understanding atomic mass (Z) and atomic number (A) in the periodic table.

Prout, isotopes, and the Z-vs-A puzzle. The puzzle has roots in William Prout’s 1815 hypothesis that atomic weights are integer multiples of hydrogen, suggesting hydrogen as a building block for all the other elements. Through the nineteenth century this looked broken: chlorine sat at 35.5, and most elements were not clean integers. Two developments rehabilitated it. Soddy in 1913 introduced the concept of isotopes — the same element can exist in chemically identical forms of different mass — explaining the non-integer atomic weights as weighted averages. J.J. Thomson’s 1912–13 work on neon gave the first stable-element example, and Aston’s mass spectrograph (1919 onwards) established what Aston called the “whole-number rule”: individual isotopes have atomic masses very nearly integral in units of hydrogen, with small deviations here and there. In parallel, Moseley’s 1913 work using X-rays pinned down atomic number Z as the nuclear charge, not just an ordinal label. Putting these together you got the central puzzle: for elements heavier than hydrogen, the atomic mass A is larger than the atomic number Z!

The proton-electron nucleus. The obvious move was: maybe the nucleus contains A protons and A − Z electrons, giving the right charge and mass. The model was appealing on three grounds beyond accounting. First, only two particles were known (proton and electron), so parsimony favored it. Second, beta decay does eject electrons from nuclei, so the electrons appeared to be there to begin with. Third, alpha particles (helium-4 nuclei) could be assembled as 4p + 2e, neatly. Rutherford himself adopted this picture in his 1920 Bakerian Lecture:

• Ernest Rutherford, Nuclear constitution of atoms, Proc. Roy. Soc. A 97 (1920), 374–400.

In that lecture he went further: he proposed that some proton-electron pairs inside the nucleus might be bound so tightly as to form a compact neutral unit: “the idea of the possible existence of an atom of mass one, which has a zero nuclear charge”. He floated the term “neutron” for it. (W.D. Harkins coined the same term independently in 1921.) Note that Rutherford’s neutron was not a new elementary particle; it was a compound, a sort of miniature hydrogen atom. This conception persisted for over a decade and shaped how the discovery was eventually interpreted.

Three killer problems with electrons in the nucleus. Through the 1920s, three independent objections accumulated.

The confinement problem. Confining an electron to a nucleus of radius ~10⁻¹⁴ m gives, by the uncertainty principle, a momentum spread of order ℏ/r and hence a kinetic energy of order ~20 MeV — far larger than any nuclear binding energy then known, and far larger than the energies of beta-decay electrons. With Dirac’s relativistic electron theory (1928), things got worse: Klein’s paradox showed that electrons in sufficiently strong potentials don’t bind cleanly at all. Bohr was so disturbed by this that around 1929–30 he was openly speculating that energy and momentum conservation might fail at nuclear scales.

The nitrogen-14 anomaly. This is the cleanest of the three and the one that broke the model. Studies of the spectrum of molecular nitrogen in the late 1920s showed that the nitrogen-14 nucleus is a boson, hence must have integer spin. (Direct measurement gave spin 1.) But in the theory where a neutron is a proton-electron combination model, nitrogen-14 would be 14p + 7e = 21 spin-½ fermions, which must have half-integer total spin and thus be a fermion. The model predicted the wrong thing for the most common nitrogen isotope!

The continuous beta spectrum. Chadwick himself, back in 1914, had shown that beta decay produces a continuous energy spectrum, not a line spectrum. Ellis, Wooster, and (decisively) Meitner and Orthmann’s 1929 calorimetric experiment confirmed energy was apparently not conserved. Pauli’s famous December 1930 letter “Dear Radioactive Ladies and Gentlemen” proposed a neutral, spin-½, light particle inside the nucleus to fix both energy conservation and the spin-statistics problem simultaneously — he called his particle a “neutron” too. (Fermi renamed it “neutrino” in 1933 after Chadwick’s discovery, to avoid the name collision.)

So by 1930 there were three structurally independent reasons to think the proton-electron nucleus was wrong, and Rutherford had a candidate replacement that nobody had yet found.

The 1930–32 experimental crescendo. The discovery story is what the Encyclopedia of Scientific Biography calls a “Tale of Three Cities” — Berlin, Paris, and Cambridge. “Walther Bothe and his assistant Herbert Becker [were] working in the Physikalisch-Technische Reichsanstalt (Imperial Physical-Technical Institute) in Charlottenburg, a suburb of Berlin; Irène Curie and her husband Frédéric Joliot working in the Institut du Radium in Paris; and James Chadwick working in the Cavendish Laboratory in Cambridge. The story reached its crescendo between June 1930 and February 1932.”

Berlin, 1930. Walther Bothe and Herbert Becker observed that bombarding beryllium with alpha particles emitted a highly penetrating, electrically neutral radiation. They, along with most of the scientific community, incorrectly assumed this radiation was exceptionally high-energy gamma rays. Lead absorption coefficients suggested gamma rays of unprecedented energy — odd, since the nuclear binding energies available from α + ⁹Be shouldn’t have been enough.

• Paris, late 1931 – January 1932. Irène Joliot-Curie and Frédéric Joliot, with the world’s strongest polonium alpha source (an inheritance, essentially, from Marie Curie’s lab), pushed the Bothe radiation through a paraffin window. They found that this radiation knocked loose protons from hydrogen atoms in that target, and those protons recoiled with very high velocity. They interpreted the result as the scattering of high-energy photons off protons — which, as Chadwick noted in his Feb 27, 1932 Nature letter, required “that the beryllium radiation had a quantum energy of 50 × 10⁶ electron volts”. That was about ten times more energy than any plausible nuclear gamma source could produce. Curie and Joliot reported their findings on 18 January 1932, and the Comptes Rendus arrived in Cambridge within a couple of weeks.

Cambridge, February 1932. Chadwick read the paper and, primed by Rutherford’s 1920 prediction and his own decade-long search for the neutron, instantly suspected the right answer. Rutherford’s reported reaction was characteristic: “I do not believe it!” Chadwick built the experiment immediately — borrowed polonium, the Cavendish’s beryllium, an ionization chamber. The crucial idea was simply to substitute multiple other targets for the paraffin: hydrogen, nitrogen, helium, lithium. If the projectile is a massive neutral particle of unknown mass m, you can work out the maximum recoil velocity of a nucleus of mass M. Comparing the maximum recoil velocities for hydrogen and nitrogen, Chadwick could solve algebraically for m. He found m is close to the proton mass. Three weeks after reading the Paris paper, he submitted a paper on the possible existence of a neutron on February 17th, 1932, which was published ten days later:

• James Chadwick, Possible existence of a neutron, Nature 129 (1932), 312.

He sent a fuller, more confidently titled paper to the Royal Society on the 10th of May:

• Chadwick, The existence of a neutron, Proc. Roy. Soc. A 136 (1932), 692–708.

Aftermath: from compound to elementary. Chadwick himself, in 1932, still followed Rutherford and described the neutron as a tight proton-electron compound — he wasn’t yet claiming a new elementary particle. The shift came over the next two years. Heisenberg’s three “On the structure of atomic nuclei” papers (July–December 1932) treated the proton and neutron as two states of a single ‘nucleon’ with the strong force coupling them via isospin — a structure that doesn’t work at all if the neutron has an electron lurking inside it. Fermi’s January 1934 theory of beta decay treated the electron as created at the moment of decay (the electromagnetic-radiation analogy: photons aren’t “in” the atom before emission either), removing the last motivation to put electrons in the nucleus. And the final empirical nail, suggested to Chadwick by the young Maurice Goldhaber, was photodisintegration of the deuteron: γ + d → p + n. By energy conservation, the neutron mass came out at greater than 1.0077 and less than 1.0086 in atomic mass units. Thus, the neutron was more massive than the hydrogen atom which had an accurately determined mass of 1.0078 amu. A bound p+e would weigh less than free p plus free e (binding energy is negative); since the neutron weighs more than a hydrogen atom, it cannot be such a bound state. By 1934 the neutron was firmly an elementary particle.

Primary sources and good histories:

• Lawrence Rutherford, Bakerian Lecture: nuclear constitution of atoms, Roy. Soc. Proc. A 97 (1920), 374–400.

• Irène Joliot-Curie and Frédéric Joliot-Curie, Émission de protons de grande vitesse par les substances hydrogénées sous l’influence des rayons γ très pénétrants, Comptes Rendus 194 (1932): 273.

• APS News, Chadwick reports the discovery of the neutron, May 1932.

• S. M. Bilenky, Neutrino: history of a unique particle.

• Roger Stuewer, The nuclear electron hypothesis, in Otto Hahn and the Rise of Nuclear Physics, Reidel, 1983.

Doug NatelsonThe Manhattan Project and public communication

The Manhattan Project was the largest government sponsored research and development project of its time.  Some things worth noting, in light of the present US government attitude toward science:

  • It's hard to overstate the role played by immigrant scientists in this story.  Szilard, Einstein, Fermi, Wigner, Teller, von Neumann, and many more.  
  • I was trying to remember when the Manhattan Project became publicly known in any detail.  It turns out, within three days of the US bombing of Nagasaki, the US released a tidily written report headlined by Henry DeWolf Smyth on all the essentials, including the administrative story of how the project came to be and was managed.  That report is available in many forms, including this cute version on the internet archive and simple pdf files at DOE and Princeton.  It's an outstanding piece of clear, spare writing.  It almost boggles the mind: Here was a technical topic that the national leadership considered important for the public to understand (!), so a highly readable report was prepared and released basically immediately following public knowledge of the bombs. (!!)
  • The National Academies played a pivotal role in this story.  On page 51:  "In the spring of 1941, Briggs, feeling that an impartial review of the problem was desirable, requested [presidential science adviser Vannevar] Bush to appoint a reviewing committee. Bush then formally requested F. B. Jewett, president of the National Academy of Sciences, to appoint such a committee. Jewett complied, appointing A. H. Compton, chairman; W. D. Coolidge, E. O. Lawrence, J. C. Slater, J. H. Van Vleck, and B. Gherardi."  Once upon a time, the national leadership respected the National Academies and trusted them to provide impartial, accurate scientific advice to inform policy.  Somehow I doubt that Frank Baldwin Jewett, president of the NAS at the time, was worried that the government would cut off funding to the Academy if they didn't toe the line.  (As far as I know, no one from the Roosevelt administration was taking “donations” for lucrative government contracts on the bomb, and no one from the cabinet or the Department of War were personally betting for profit on whether it would work, either, but I digress.)
Just some food for thought.

May 24, 2026

Jordan EllenbergNicknames for Ottawa, Illinois

  • Radium City
  • The Friendly City
  • The Town of Two Rivers
  • The City on Sand
  • The Town of Stovepipe Hats

That is a truly world-class roster of municipal nicknames.

Jordan EllenbergMemorial Day

Tomorrow is Memorial Day.

William James, in 1897, dedicating the monument to Col. Robert Gould Shaw and the 54th regiment, a monument which still stands in Boston Common today:

“And when South Carolina took the final step in battering down Fort Sumter, it was the fanatics of slavery themselves who called upon their idolized institution ruin swift and complete. What law and reason were unable to accomplish, had now to be done by that uncertain and dreadful dispenser of God’s judgments, War – War, with its abominably casual, inaccurate methods, destroying good and bad together, but at last able to hew a way out.”

More James:

“What country under heaven has not thousands of such youths to rejoice in, youths on whom the safety of the human race depends? Whether or not they leave memorials behind them, whether their names are writ in water or in marble, depends mostly on the opportunities which the accidents of history throw into their path”

The monument, and James, and the 54th regiment, and the atom bomb, are all in Robert Lowell’s poem “For the Union Dead”:

“He is out of bounds now. He rejoices in man’s lovely,
peculiar power to choose life and die—
when he leads his black soldiers to death,
he cannot bend his back.”

Shaw died in the assault on Fort Wagner too. William James’s younger brother was a volunteer in the 54th, and was gravely wounded; his presence there goes unmentioned in James’s speech.

James concludes by emphasizing that military valor is not the only kind of valor, not even the only kind that Col. Shaw himself displayed and was cast in bronze for.

“What we really need the poet’s and orator’s help to keep alive in us is not, then, the common and gregarious courage which Robert Shaw showed when he marched with you, men of the Seventh Regiment. It is that more lonely courage which he showed when he dropped his warm commission in the glorious Second to head your dubious fortunes, negroes of the Fifty-fourth. That lonely kind of courage (civic courage as we call it in times of peace) is the kind of valor to which the monuments of nations should most of all be reared, for the survival of the fittest has not bred it into the bone of human beings as it has bred military valor; and of five hundred of us who could storm a battery side by side with others, perhaps not one would be found ready to risk his wordly fortunes all alone in resisting an enthroned abuse. The deadliest enemies of nations are not their foreign foes; they always dwell within their borders. And from these internal enemies civilization is always in need of being saved. The nation blest above all nations is she in whom the civic genius of the people does the saving day by day, by acts without external picturesqueness; by speaking, writing, voting reasonably; by smiting corruption swiftly; by good temper between parties; by the people knowing true men when they see them, and preferring them as leaders to rabid partisans or empty quacks. Such nations have no need of wars to save them.”

John BaezThe Great Icosahedron

I never knew what was so great about the ‘great icosahedron’. Now I do.

Take a regular icosahedron whose vertices have coordinates in the field ℚ[√5], which consists of numbers a + b√5 with a and b rational.

Apply the nontrivial element of the Galois group of this field: that is, simply replace √5 by -√5 in all your formulas. You get the great icosahedron:

What do I mean by this, exactly? For example, take the regular icosahedron whose 12 vertices are

(±1,±Φ,0),   (0,±1,±Φ),   (±Φ,0,±1)

where Φ = (1 + √5)/2 is the golden ratio. Form a bunch of

• points (all the vertices of the icosahedron),
• lines (containing all the edges of the icosahedron), and
• planes (containing all the faces of the icosahedron)

Now replace √5 by -√5 in all your formulas. You get the equations for a new bunch of points, lines and planes. And:

• the points are the vertices a great icosahedron;
• the lines contain all the edges of this great icosahedron;
• the planes contain all the faces of this great icosahedron.

So, replacing √5 by -√5 makes the icosahedron great!

On Mastodon, J. M. animated this using an interpolation process called ‘tweening’:

If you apply the Galois transformation again you get back the icosahedron. If you apply it yet again you make the icosahedron great again.

I thank and andeux for their help on this.

This fact is asserted without proof in the section on regular star-polytopes near the end of this book:

• John Horton Conway, Heidi Burgiel and Chaim Goodman-Strauss, The Symmetries of Things, A K Peters, Natick, Massachusetts, 2008.

Let me sketch an uninspired, purely computational proof. Take each of these twelve icosahedron vertices:

(±1,±Φ,0),   (0,±1,±Φ),   (±Φ,0,±1)

and find its five nearest neighbors. Then apply the Galois transformation, which amounts to replacing Φ with -1/Φ. You get twelve new points

(±1,±1/Φ,0),   (0,±1,±1/Φ),   (±1/Φ,0,±1)

which turn out to be vertices of a new, smaller icosahedron!

Then, for each vertex of the original icosahedron, see where it goes under this transformation, and see where its nearest neighbors go. You’ll see they become second nearest neighbors.

Thus the edges of the original icosahedron, which connect nearest neighbor vertices, go to edges connecting second nearest neighbors of a new smaller icosahedron.

What about the faces? The faces of the original icosahedron are equilateral triangles whose corners are triples of icosahedron vertices that are all nearest neighbors of each other. So, they get sent to triangles whose corners are triples of vertices that are all second nearest neighbors.

And these facts characterize the great icosahedron: it has the vertices of a regular icosahedron, edges connecting all pairs of second nearest neighbor vertices:

and faces formed by all triples of second nearest neighbor vertices. You can’t see any of these triangular faces in its entirety in this picture—but if you look hard you can see most of one.

These are all the possible squared distances between vertices in the original icosahedron and the new great icosahedron:

Original icosahedron   Great icosahedron
0 0
4 4
4 + 4Φ ≈ 10.47 4 − 4/Φ ≈ 1.53
8 + 4Φ ≈ 14.47 8 − 4/Φ ≈ 5.53

To get the new squared distances from the old ones we simply replace Φ by -1/Φ, as we must, since Galois transformations preserve the field operations and squared distances are computed using field operations.

The nearest neighbors had distance squared 4, and these must be mapped to new points that still have distance squared 4—but now these are second nearest neighbors.

Of course it would be nice to have a deeper, more general understanding of what the Galois transformation sending \sqrt{5} to -\sqrt{5} does to the geometry of shapes made from regular pentagons. Soon I’ll dig a bit deeper into this!

Tommaso DorigoApply For The USERN Prize, Win Cash, And Get A Keynote Talk In Astana

Nowadays it has become exceedingly hard to distinguish legitimate academic endeavours from scam in my mailbox. Not even AI filters can sort stuff out properly: my inbox often contains invitations to fake conferences, or to publish with non-existing journals, while my spam folder at times contains honest invitations of academic value.

I could touch the reality of the problem a few months ago, when I was invited to an AI conference in Singapore. I was about to trash the email, when something in the name of the sender rang a bell. Upon checking, it turned out that he was a Nobel prize winner in Physics! Needless to say, I was happy to accept the invitation, and indeed in two weeks I will travel to Singapore to deliver my talk at AI4X.

read more

Tommaso DorigoA Record Of Past Activities

Every now and then, for one reason or another, an academic will have to update one's own CV. This is a chore in general - once you get tenure, why should you care to keep a detailed record of your past activities? - but it also carries some benefits. In fact, by sifting through the data (hard disk folders containing talks, large databases of publications, mailbox) you can get a bird's eye view of where your time has gone, and draw inspiration for future rationalization of your agenda.

read more

May 23, 2026

Doug NatelsonBrief items - news roundup, AI, international issues, good reading

Several items worth reading about as we head into a long weekend in the US.  Starting with news related to funding and other aspects of US government policy:

  • US government taking equity stakes in some quantum information sciences companies while investing around $2B (seemingly from the Department of Commerce and the CHIPs Act resources.  (Non-paywall news story here).  This raises a number of thorny issues. 
  • Some US funding agencies (NIH, NASA) are enacting restrictions (Science article here, Inside Higher Ed article here) on publishing scientific papers with non-US coauthors.  It's understandable that US funding agencies are concerned about the possibility US funds directly or effectively supporting researchers in foreign countries.  This is not that, though.  Some people making policy seem to be moving toward wanting to ban any co-authorship, but even the agencies seem confused about what they want.
  • In a move that will stress out many non-US-citizens in the country, the administration is floating making people leave the US to apply for green cards (PBS article here).  This just was sort of announced yesterday, so I don't know anything about this other than on its face it sounds to me like a terrible idea for multiple reasons.  
  • The AAAS is pushing for a Senate hearing on the nominee for NSF director, on the theory that this issue and the nominee at least need to be discussed in a public forum rather than coasting along without a NSB and no end in sight to interim leadership.
  • It would seem that some Republican congresspeople are pushing the idea of de-funding the National Academies.  This is directly related to the issues mentioned here.  I think the National Academies should be endowed and thus not so reliant on federal funding; this would be a way to make sure that they always feel secure in delivering reports even if the customer is a part of the government and the conclusions might be something the customer doesn't want to hear.
There was a lot of AI-related news this week:
  • There were three papers published in Nature about using AI agents to do science (here, here, and here, with a news and views).  The first two papers are both about drug discovery research, and the third is about using AI to help write scientific software models (also medically related).  It'll be interesting to see how this progresses.  
  • One of OpenAI's tools solved an Erdos problem (that's the OpenAI release) by finding a counterexample to a conjecture long thought to be true.  Here is the accompanying paper, which includes commentary by several esteemed mathematicians.  The commentary parts of the paper are very much for non-mathematicians and fascinating to read.  It seems like the AI tools are genuinely good at pulling together complex arguments, and that so far a key advantage they have is an exhaustive familiarity with the full breadth of the literature.
  • Unsurprisingly, university graduates are not fans of AI.  This cartoon from this week's New Yorker is topical.  

 Additional suggestions that look cool but I haven't had time to actually read:

May 22, 2026

Matt von HippelArXiv Will Ban You for Hallucinated References

Thomas Dietterich, Chair of the Computer Science section of the preprint server arXiv.org, recently clarified the site’s policies towards “hallucinated” citations and other signs of careless use of AI in a post on X. If your paper contains a citation to a paper that doesn’t actually exist, or has other signs you didn’t read it before posting like leftover commentary (the example he gave was “here is a 200 word summary; would you like me to make any changes?”), then you can get banned from the arXiv for one year. Even after that year you’d be on a kind of “probation”, and would need to show that your next few papers had been accepted by peer-reviewed journals first before posting them.

At the risk of saying the obvious, this is a good idea! arXiv isn’t peer review, it isn’t meant to judge the value of the papers it hosts. But it still needs to be a useful place for scientists to post their papers, which is why they try to keep spam and irrelevant content to a minimum. If you don’t actually endorse the content of a paper, you shouldn’t post it in the first place.

That said, the whole existence of hallucinated citations on arXiv feels a little silly. It makes sense for academic journals and preprint servers in other fields. But arXiv was the first site of its kind for a reason. Its users, physicists, mathematicians, and computer scientists, don’t need much hand-holding when it comes to computers. Papers submitted to arXiv aren’t typically written in Word, they’re written in a document-writing language called LaTeX, that lets users make decently-formatted papers without help from a journal. Physicist-written code may be terrible by any reasonable criteria…but it exists, much more universally than for example biologist-written code.

This extends to citations. In my old field, there is a database called INSPIRE that updates automatically from arXiv. Click on a paper, and a handy “cite” link gives you standardized citations in several formats, ready to copy and paste into your LaTeX code. Nearly every citation in my papers is copied from there. The ones that aren’t are either from other fields where I didn’t know of that style of database, or things that haven’t been published (this can be manuscripts in preparation, or personal communications).

All of this, though, feels like a lot less than what the field could be doing. In a world where almost everyone posts their papers to the same website, and almost everyone has at least a rudimentary understanding of programming…why are people still writing citations in free-form text in the first place? Why aren’t citations built in to the submitted papers on arXiv, automatically linked to the papers they cite? Why don’t we have a setup where, except for a small number of “special” citations, every citation is built so that it automatically goes to a real paper, and gives a clear error message if it doesn’t? In short, why are hallucinated citations even possible?

Look, I’m naive, I get that. I believe in automation, not in the modern context of LLMs and other heuristics, but in setting clear procedures and building clear rules. The world doesn’t work that way! The clear rules are always more contentious than you expect, the fuzzy human-led version always the only choice people can agree on.

But still. Citations. There has to be a better system, right?

Jordan EllenbergIs there a higher Belyi theorem?

I was thinking the other day about this old question, a favorite of mine.

Here’s another version of it — if anything along the lines of Belyi’s proof of Belyi’s theorem works, one might expect this to be true.

Question: Suppose given five points p_1, .. p_5 in P^1(Q(t)). Is there a map F from P^1 to P^1 such that the the critical values of F and F(p_1), … F(p_5) are all contained in a set of size four?

This would be “Belyi for M_{0,5},” I suppose. I have to admit, I feel like the answer is probably no.

Why it seems hard: every set of p_1, .. p_5 obtainable is obtained from some family of covers of P^1 branched at four points — in other words, it’s the image in M_{0,5} of some 1-dimensional Hurwitz space. So you have some countable union of curves in M_{0,5}. Now all you have to do is write down a curve in M_{0,5} which is not one of these. But… how do you know you can do this? If you were allowed to work over C, it’s easy! But there are only countably many curves in M_{0,5} defined over Q, or for that matter over Qbar. Maybe the Hurwitz curves use them all up! I doubt it. But I don’t see how to rule it out. Stranger, or at least equally strange things have happened; I am thinking, of course, of Belyi’s theorem itself, which one simply can’t believe until one sees the proof, and even then, only just.

It’s not obivous to me that this is hard. There might be a quick way to write down a p_1, .. p_5 for which there’s no such F. Good problem to chew on with the help your favorite example-generating large language model! (So far I couldn’t get Claude Opus 4.7 to have any luck with it, but your mileage may vary.)

May 21, 2026

Jordan EllenbergBrewers 5, Cubs 2

Last night: my second visit to Wrigley Field, once again to see the Brewers. Did I blog the last one? Doesn’t seem I did. I went with CJ and a bunch of his friends (whence the aphorism.) Brief description, because it’s late.

At least three Baltimore chops in this game.

Lines for food at Wrigley are at least an inning long. Shouldn’t this be a solved problem?

My first time seeing Les Miz pitch. Maybe my last, as I have to believe that a guy can’t throw this fast this consistently without his arm flying off his body sometime very early in his career. Prior to this game, Misiorowsky had thrown 233 pitches at 100+mph. All other MLB starters combined: 149.

Brewers cruise most of the game, Cubs getting 3-hit, they rally in the eighth against unusually shaky Aaron Ashby, loading the bases and scoring a run on what should have been an inning-ending grounder when Luis Rengifo in a very ex-Angel fashion fumbled the pickup. But Chad Patrick got Michael Conforto to ground out to end the threat.

Impressive organ work at Wrigley. Runner dives back to first to avoid the pickoff: “Get Back,” by the Beatles. Batter fouls one off high above his head: “Straight Up,” by Paula Abdul. Best of all was the organist’s welcome for Garrett Mitchell: the theme song from “The Facts of Life.” How many people at Wrigley Field last night got that reference? Well, I got it, Wrigley Field organist, and I appreciated it.

Jordan EllenbergWittgensteinian aphorism (after six hours spent in the company of college undergraduates)

If something is the case, it is low-key the case.

John PreskillNicole’s guide to handling failure and rejection

It’s happening. 

Your inbox registers an email from the chair of a faculty-hiring committee. With trembling fingers, you click on the message. “We’ve been grateful for the opportunity to learn about your work…The decision was very difficult…many highly qualified candidates…” Months of labor, soul-searching, strain, and anxiety give way to despair. The committee has filled the position, and not with you.

I recently published advice about how to proceed if you receive an offer of a faculty position. But what if you don’t receive an offer—what if hiring committees reject you? Or scholarship committees, grant committees, admissions committees, or potential advisors? What if a journal referee shreds your magnum opus? Or a program committee declines your submission to a conference?

Failure suffuses science as dinner suffuses a nighttime diaper, for two reasons. First, undertaking science is difficult. By “undertaking science,” I mean formulating and proving theorems, cajoling equipment into working, identifying bugs in code, extracting meaning from noisy data, etc. Second, science doesn’t unfold in a vacuum. Human beings do science within a society fraught with opinions, emotions, and limited resources. These challenges lead to the failures and rejections that this article addresses most. (If you’re a member of my group and you’re facing a failure due to the difficulty of undertaking science, come talk with me.) 

What can you do if failure or rejection ails thee? The rest of this article prescribes, in chronological order, steps that will help you recover. You can even turn your lemons into, if not lemonade, then not-entirely-unappetizing lemon meringue pie.

“O what can ail thee, knight-at-arms, alone and palely loitering?”
“The scientific life is rough.”

Immediately after receiving the news:

  • Read the communication once—or, if you must, twice. Don’t linger over the letter for longer than necessary.
  • Put the communication away, so you won’t see it unless you try to. If the news came via email, remove the message from your inbox.
  • Lower your heart rate. Electrified with anger? Expend that energy; go for a walk, for a run, or to the gym, if possible. If you can’t, breathe deeply for several minutes, extending your exhalations.
  • Review records of your successes. I maintain a folder called “Nice messages.” It contains notices of awards I’ve received, kudos on papers I’ve published, messages such as “Thanks for everything this past semester! Your class was probably my favorite one,” and every compliment I’ve received from the taciturn John Preskill via email. Review evidence—remind yourself—that you’re not a failure even though you’re experiencing failure or rejection.

Avoid thinking about the news for a few days. Imagine cutting yourself on a kitchen knife. The wound may sting and bleed initially. But the blood clots and the stinging diminishes if you cover the wound and don’t aggravate it. 

After experiencing a failure or rejection, invite a colleague to lunch, and ask about their recent reading and travels. Dive into a project that will absorb you. Visit a museum, or watch a movie. Remove the letdown from your thoughts.

Consider seeking input from a mentor, especially if you’ve never experienced a failure or rejection of this type before. Mentors have more experience and so can put obstacles in perspective. For example, suppose you’re a student who’s received an upsetting referee report from a journal. The report might look mild to a faculty member, who’s likely received far more, and more-upsetting, reports. What sounds harsh to you might sound quotidian to an advisor, whose lack of distress might reassure you. 

Also, mentors notice upsides that you’ve overlooked. Perhaps the referee trashed your presentation of an idea but tacitly approved of the idea itself. The trashing might have drowned out the approval during your reading. Yet the approval could justify a resubmission to the journal, upending your belief in the cause’s hopelessness. Relatedly, a mentor can help you identify strategies for moving past the failure. Maybe this journal won’t publish your paper but another journal is soliciting contributions to a relevant special collection.

As a PhD student, I applied for an internship at a company developing a quantum computer. I broke an obligation and flew out of state to interview for the position. No offer materialized. To process the outcome, I spoke with a more-advanced researcher I trusted. He not only offered a mature perspective, but also had access to inside information. The team liked me, he reported, but my interests didn’t overlap with theirs enough. In a sense, I’d grown too independent. As independence marks maturity in PhD students, the rejection came to double as a compliment. Today, I remain on friendly terms with multiple people from that team, and a student of mine just landed an internship at a quantum-computing company.

Someone more experienced than you should have your back. (For the story behind this photo, see this blog post.)

Chart a path past the failure or rejection. Cue the lemon meringue pie. Did a hiring committee reject an application of yours? Email the committee’s chair, thanking them for considering your application. Express your hope of improving your materials so that you can try again the following year. Ask if the chair would provide feedback via phone or Zoom.1 

Pursue the path you’ve charted. To ease the burden, consider undertaking lighter tasks before more-arduous ones. I recently received a laundry list of requests about a manuscript from an editor—and when I say laundry, I mean the equivalent of hauling a multiple-pound bag to a stream, scrubbing everything by hand while the wind attempts to blow cleaned items into the mud, and then hauling everything back. I began with the tasks that required little effort. Dispatching them, I crossed about half the requests off my to-do list. The list looked more manageable as a result; I felt better-equipped to handle the trickiest work.

Celebrate your triumphs. Don’t let failures and rejections consume your attention; carve out time for your successes. If you recognize them, you’ll remember them the next time you face failure; they’ll cushion you when you fall.

I recently failed at a task over a hundred times (yes, I counted). After 28 months of trying, I succeeded. I celebrated by treating myself to lunch at the National Gallery of Art.2 Why not gather ye rosebuds while ye may? This success afforded me the opportunity to fail at a new task.

The National Gallery of Art sold a fairly steampunk book near its Garden Café recently.

Always remember, what matters in the long run is not any one failure or rejection, but your persistence despite failures and rejections. I progressed to the Rhodes Scholarship competition’s final round two years in a row. Each year, a committee interviewed and rejected me. I’d poured time, sweat, and blood into my applications and interview preparations. I felt like I had no more blood to squeeze into the applications I then had to write for graduate programs because of those rejections. Those rejections, though, led me to become a student of John Preskill’s. Over a decade later, I’m not kicking myself.

Result of two failures.

Share about your favorite failures in the comments section below!

1Not via email, which doesn’t offer the same freedom.

2Its Garden Café had enticed me for years, but I can rarely bring myself to pay for food when I can prepare it myself.

May 20, 2026

John BaezPhasons in Quasicrystals

In 2025, researchers studied a quasicrystal forged in a hypervelocity asteroid collision 600 million years ago—and found that it contains ‘phasons’!

It’s not a perfect icosahedral quasicrystal: it’s slightly distorted. 6 gentle ‘phason waves’ run through it, oriented along the 6 fivefold symmetry axes of an icosahedron. These waves were locked in when the alloy quickly cooled after impact, and they’ve been sitting there frozen in the structure ever since.

This quasicrystal is made of ‘icosahedrite’. The easiest way to describe this is the ‘slice and project’ method. You start with a lattice in 6 dimensions, choose a 3d slice, thicken that up a bit, take the lattice points that lie in the thickened slice, and project them down to 3d space. The atoms in the icosahedrite are exactly the projections of the 6d lattice points that happen to fall inside the thickened slice.

But now imagine wiggling the slice gently—letting it ripple in the other three dimensions, the ones perpendicular to physical space. Some 6d lattice points slip out of the slice and others slip in. In physical space this looks like atoms suddenly hopping from one position to a nearby alternative one.

These atomic hops are called ‘phason flips’, and a wave of these hops is a ‘phason’. Sound waves involve atoms swaying smoothly; phasons involve atoms jumping between alternative positions, and they’re special to quasicrystals.

These phasons give a fossil record of the collision that made the quasicrystal: the instant of cooling, preserved as a piece of warped 6-dimensional geometry, sitting inside a rock for 600 million years!

References

For more about quasicrystals in this particular meteorite, which is called the Khatyrka meteorite, read my post:

Naturally occurring quasicrystals.

This paper reported the discovery of icosahedrite in the Khatyrka meteorite:

• L. Bindi, J. M. Eiler, Y. Guan, L. S. Hollister, G. MacPherson, P. J. Steinhardt and N. Yao, Evidence for the extraterrestrial origin of a natural quasicrystal, Proceedings of the National Academy of Sciences 31 (2012), 1396-401.

This is the paper that first reported phasons in this sample of icosahedrite:

• Hiroyuki Takakuru, Asuka Ishikawa, Tsutomu Ishimasa, Luca Bindi and Paul J. Steinhardt, High-resolution synchrotron X-ray study of natural icosahedrite, IUCrJ 12 (2025), 435–443.

Quasicrystal phasons had already been understood long before:

• Dov Levin, T. C. Lubensky, Stellan Ostlund, Sriram Ramaswamy, Paul J. Steinhardt and John Toner, Elasticity and dislocations in pentagonal and icosahedral quasicrystals, Physical Review Letters 54 (1985), 1520–1523.

• T. C. Lubensky, Sriram Ramaswamy and John Toner, Hydrodynamics of icosahedral quasicrystals, Physical Review Letters 32 (1985), 7444–7452.

These papers use some nice math. The lattice I mentioned above is called the D6 lattice: a ‘checkboard’ lattice in 6 dimensions consisting of all points with integer coordinates that sum to an even integer. To understand phasons in icosahedral quasicrystals we need to understand how this lattice connects to the representation theory of the rotational symmetry group of the icosahedron, which is the alternating group A5. This group has an obvious 3-dimensional representation, called 3, where group elements just rotate an icosahedron and drag 3d Euclidean space along with that. But it also has another 3-dimensional representation, the Galois conjugate 3′, where we replace \sqrt{5} by -\sqrt{5} in every formula describing the representation 3.

Beautifully, the 6d space containing the D6 lattice transforms as the representation 33′. It turns out that that 3 describes phonons while the 3′ describes phasons! As mentioned, phonons are waves where atoms move smoothly in 3d space, while phasons involve atoms hopping between alternative positions.

Using representation theory we can classify all the possible formulas, allowed by symmetry, for how an icosahedral quasicrystal can oscillate. The possible choices turn out to involves various constants: two that describe the behavior of phonons, two for phasons, and one for phonon-phason interactions.

I feel like saying more, but this is not the right place!

May 19, 2026

Doug NatelsonWhat are heavy fermions?

I'm surprised that I haven't written about heavy fermions as a separate post before, so here we go. (It's a break from thinking about science and politics, anyway.)

I've written before about "effective mass" for electronic excitations in solids (wiki page here).  From classical physics, we are used to the idea that inertial mass \(m\) is the ratio between an external force \(\mathbf{F}\) and the acceleration \(\mathbf{a}\) of some object, \(\mathbf{F} = m\mathbf{a}\), which is also the rate of change of momentum, \(d\mathbf{p}/dt\).  Kinetic energy (for a nonrelativistic particle) is \(p^{2}/2m\).  Electrons in crystalline solids "feel" the lattice, so in general their kinetic energy \(\epsilon\) can be a more complicated function of their (crystal) momentum, and we can try do define an effective mass as \(1/m* \equiv d^{2}\epsilon/dp^{2}\).  So, if the kinetic energy is very weakly dependent on \(p\), this corresponds to having a very large effective mass.  TL/DR:  the periodic lattice can strongly alter how an electronic excitation accelerates in the presence of a force from, e.g., an electric field, compared to a free particle.  This isn't too surprising.  

Interestingly, in most semiconductors and metals, \(m*\) for electrons in the conduction band (or holes in the valence band) is not thaaaaat different than the free electron mass \(m_{0}\).  The lightest effective mass I know (leaving aside graphene and other Dirac systems when \(\epsilon\) is approximately linear in \(p\)) is electrons in InSb, about \(0.014 m_{0}\).  Holes tend to be a bit heavier.  Also, \(m*\) in molecular organic semiconductors like pentacene tends to be a bit larger, since hopping from molecule to molecule is comparatively weak.  There are ways to measure effective mass, including cyclotron resonance, electronic transport including Shubnikov-de Haas oscillations, magnetic susceptibility and de Haas/van Alphen oscillations, and specific heat measurements.  The electronic specific heat contribution for a metal is linear in the temperature at low \(T\), and the constant of proportionality includes the density of electronic states at the Fermi energy, which can be written in terms of \(m*\).  I've left out a lot of the complications of real anisotropic materials with complicated band structures, but generally the different measurements give consistent results. 

Therefore, it was a big surprise in 1975 when investigators found a material, CeAl3, in which the heat capacity implied an effective mass tens to hundreds of times larger than \(m_{0}\).  They knew right away that this had something to do with the very localized \(4f\) electrons of the Ce atoms.  Because those electrons are very localized, their energy is almost independent of \(p\), implying a large effective mass.  (Some heavy fermion materials also superconduct at temperatures surprisingly high given their effective masses.)

Heavy fermions, adapted from here.  (a) At high temperatures, the 
conduction  electrons are not well coupled to the unpaired local 4f 
moments.  (b) At low enough temperatures, Kondo scattering
hybridizes the f electrons with the conduction  electrons, boosting 
the carrier density.  (c) The hybridized energy-momentum relation 
is much flatter near the Fermi energy leading to a large effective mass.  
So what's the physics?  I wrote about the Kondo effect here, where "ordinary" conduction electrons scatter in a nontrivial way from local magnetic moments (such as partially filled \(4f\) states), and well below a characteristic temperature \(T_{\mathrm{K}}\), the conduction electrons hybridize with the impurities, screening out the unpaired spin.  In the heavy fermion compounds, instead of impurities, there is a whole crystal lattice of local magnetic moments. At sufficiently low temperatures, thanks to that Kondo scattering process, those otherwise localized electrons hybridize with the conduction electrons, boosting the effective density of charge carriers (see figure) and greatly increasing the effective mass.  See this figure, adapted from excellent lecture notes by Piers Coleman.  

So, two key ingredients for heavy fermions are itinerant conduction electrons and a periodic array of comparatively localized, unpaired electrons that have magnetic moments. It turns out that this combination can also be achieved in moiré lattice materials.  There are no \(f\) electrons here, but the moiré lattice can localize spins.  Apologies for not linking to all the relevant papers, but a couple of key theory results are herehere, and here, and a key experimental result is here.  The tunability of the 2D material-based systems is an excellent feature for digging down into the detailed physics.

Update:  Now some added insight from Prof. Andrew Millis:
Hi Doug:

An addendum to your very nice post on heavy fermions, to draw attention to what I think were important experimental results: Frank Steglich’s 1979  Phys. Rev. Lett. 43, 1892–1896 reporting superconductivity in CeCu2Si2 and Louis Taillefer and Gil Lonzarich’s 1988 determination of the quasiparticle mass and fermi surface in UPt3.

Prior to Steglich’s paper we knew that some rare earth/actinide intermetallics (e.g. CeAl3) had a very enhanced specific heat coefficient at low temperatures and that the entropy implied by  this specific heat was  derived from the magnetic moments of the rare earth ions. But  while it was plausible, there was no direct evidence that this enhanced specific heat was associated with heavy-mass fermions, so the physical relevance of the Kondo lattice concept remained uncertain.

Steglich observed that in CeCu2Si2 the specific heat jump at the superconducting transition (which in BCS theory is basically the same size as the electronic specific heat at Tc) was about as big as the normal state specific heat coefficient, thus showing that the spin entropy had been transmuted into something that could go superconducting. Then (I think in subsequent experiments) Steglich observed that the rest of the superconducting thermodynamics in Cecu2Si2 was also consistent with pairing of heavy mass entities. This, I believe, is what convinced everyone that the spin entropy from the rare earth moments had been converted into heavy mass electrons—in other words, that the lattice Kondo effect was real. 

A few years after this, Louis Taillefer and Gil Lonzarich’s quantum oscillation study of UPt3 (Phys. Rev. Lett. 60, 1570 ) showed indeed that the U-f electrons (which appear as local moments at higher temperatures) were included in the Fermi surface at low T and had heavy masses, providing direct experimental confirmation of the Kondo lattice concept.

Cheers

Andy Millis

May 18, 2026

May 16, 2026

Jacques Distler Code

I’ve been playing around with Claude Code (Claude Opus 4.7 (1M context)) because, well, who hasn’t?

I am, so far, only moderately impressed by its abilities in physics. The most impressive bit so far was when, in response to a question about nilpotent orbits, it responded

I don’t know. I could waste your time by guessing, but …

I had never had an LLM tell me that it doesn’t know something, much less that it didn’t want to waste my time by making stuff up. So this was positively shocking to read.

Claude Code is, however, unfathomably good at generating code, so I set it the task of modernizing Instiki and Heterotic Beast, my forum software. Both are Rails applications and both have extensive test suites. So they use a software framework Claude is familiar with and have an objective standard for whether the changes made are correct.

When there is no test, however, things can go wildly off the rails (pun intended). For instance, consider the following snippet of Ruby code


def foo(text)
  ...
  con = text
  ...
  (now mutate con)
  ...
  con
end

If text is a frozen string, this will generate an error, as you can’t mutate a frozen string. Obviously, what you should do is write


def foo(text)
  ...
  con = text.dup
  ...
  con
end

which copies the caller’s string to a new unfrozen string which you can mutate to your heart’s content.

What did Claude do?


def foo(text)
  ...
  con = text.encode
  ...
  con
end

which also produces a new unfrozen string, transcoded from the caller’s encoding to Encoding.default_internal (which turns out to be nil). This is both (a) nondeterministic and (b) blows up spectacularly when text contains astral plane characters, like “𝔸”. I had to tell Claude not to do that, and to write some tests to check that astral plane characters are handled correctly.

Still …

I would set Claude the task of rewriting this blogging software, but alas I don’t have a test suite to compare with.

What I really should do, though, is find some physics I would trust it to work on.

May 15, 2026

Matt von HippelMake No Mistakes

I’m taking a Danish exam next week, and it’s a big one, a culmination of years learning the language. My classmates are stressed. Despite how much we’ve learned, it feels like we’re always making little mistakes. We write the wrong prepositions, put verbs in the wrong form, or mess up the order of words in a sentence. And while we should have time to check our work, that doesn’t help as much as it should. If we don’t notice a mistake the first time around, what guarantee is there that we notice it on the next read, or the next? Too many checks and we can even end up second-guessing ourselves, “correcting” something that was right to begin with.

It’s given me some sympathy for AI.

Earlier this month, investor Marc Andreessen posted a custom prompt he inputs when using AI, which was immediately mocked.

The silliest instruction, according to many critics, was to “Never hallucinate or make anything up.” It’s similar to a prompt that’s become a meme used to make fun of AI-using “vibe coders”, “Make no mistakes”.

Experts point out that this is just not how AI works. Large language model-powered programs like ChatGPT are inherently random, producing text largely based on its similarity to other text. “Hallucinations” or “mistakes” are an inevitable feature of the technology, and a prompt like Andreessen wrote isn’t a set of instructions the AI will follow without error: it’s just another part of the text the AI is trying to generate.

All that said, telling an AI to “make no mistakes” should have some effect. But it likely won’t be what you want.

The best way I’ve found to understand AI is in terms of stories. Chatbots like ChatGPT take a large language model, a mathematical formula for how words are most likely to appear in a text, and warp it, twisting it to almost always produce one particular kind of text: one half of a dialogue with a fictional AI assistant. This twisted formula determines how the AI responds to your prompts, but these days it also is used behind the scenes, in a kind of structured soliloquy called a “chain of thought”. You can think of the prompts you send to the AI as a preface to those soliloquies, and imagine the AI telling stories of a sort that would typically follow that preface.

So if you tell an AI “make no mistakes” or “do not hallucinate”, you’re making it more likely to generate the kind of story that begins, “the AI was instructed to make no mistakes”.

Let me put it this way, Mr. Amor. The 9000 series is the most reliable computer ever made. No 9000 computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error. – HAL 9000, “2001: A Space Odyssey”

You’d expect this to affect the chain of thought. For example, the AI might occasionally pause to say “I’m supposed to make no mistakes, so I should check this. What could have gone wrong?” and then list something that plausibly could be wrong with its idea. If this happens often enough, you’ll probably catch some real problems.

But I’m reminded of my classmates, practicing for that Danish exam. We can go over the text again and again, asking if this thing, or that, might be wrong. We can try again and again to use our mental model of the Danish language, seeing if this time it catches a new mistake. But there are things we won’t catch. And if we do it too much, we’ll second-guess ourselves out of the good answers, too.

Ultimately, “make no mistakes” isn’t a great instruction, either for humans or for chatbots. And its use by people like Marc Andreessen has me wondering if they are used to interacting with humans in the same way, as tools that keep making mistakes no matter how many times they’re instructed not to, requiring more and more long-winded instructions and yet continuing to misbehave.

Then again, that may be a mistake on my part.

John BaezNaturally Occurring Quasicrystals

I love quasicrystals—like crystals, but with patterns that never repeat, like Penrose tiles. But they’ve very rare in nature. They’re created only by the most exotic and violent events: a high-speed collision of asteroids, lightning hitting a downed power cable in a sand dune—or an atomic bomb!

Amazingly, the first 3 kinds of naturally occurring quasicrystal were discovered in a single meteorite that landed in Khatyrka, in the far east of Russia. They’ve never been found anywhere else! And this meteorite is highly anomalous: it’s the only meteorite known that contains metallic aluminum—and it seems to have been formed in a ultra-high-velocity collision between asteroids.

The only other quasicrystal I know that may be naturally created came from a bolt of lightning hitting a sand dune near a downed power cable in Nebraska. Then there was one found amid the fused desert sand and copper transmission cable left behind by the first atomic bomb test at Trinity, New Mexico. That’s not quite ‘naturally created’.

And that’s all. As far as I can tell, the rest have been made in labs!

Here are the 3 kinds of quasicrystal found in the Khatyrka meteorite, in order of their discovery:

• Icosahedrite, Al63Cu24Fe13. By the way, these numbers are percentages, since quasicrystals don’t have integer-ratio molecular formulas the way conventional compounds do, due to their aperiodic nature.

Icosahedrite has full three-dimensional icosahedral symmetry and is quasiperiodic in all 3 directions. The easiest way to describe it mathematically is the ‘slice and project’ method. You start with a lattice in 6 dimensions, choose a 3d slice, thicken that up a bit, take the lattice points that lie in this slice, and project them down to 3d space. The lattice in 6 dimensions is called the D6 lattice: it consists of those 6-tuples of integers that sum to an even integer. If we slice along the correct 3d subspace the result has icosahedral symmetry. A lot of math in this mineral!

• Decagonite, Al71Ni24Fe5. This a ‘stacked’ quasicrystal: quasiperiodic with tenfold rotational symmetry in two dimensions, and ordinary periodic stacking in the third direction. We can build the 2d pattern using the slice and project method starting from the A₄ lattice in 4 dimensions. The A₄ lattice, in turn, is gotten by taking 5-tuples of integers, but only those that sum to zero. This quasicrystal looks like this:

High-resolution electron-microscopy image of natural Al71Ni24Fe5 quasicrystal found in the Khatyrka meteorite, showing decagonal patterns that never quite repeat.    Photo by Paul J. Steinhardt et al. from http://www.nature.com/srep/2015/150313/srep09111/full/srep09111.html

• i-Phase II, Al62Cu31Fe7. Like icosahedrite this has full icosahedral symmetry and you get it mathematically using the slice-and-project method starting from the D₆ lattice. But it has a distinctly different composition: more copper and less aluminum. It’s the first quasicrystal composition discovered in nature before being made in the lab.

The Nebraska quasicrystal shows how blurry the concept of ‘natural’ can be. It was found inside a ‘fulgurite’: a rock made when lightning hits sand. They found it in the Sand Hills near Hyannis, Nebraska, near a downed power line during a storm. It’s unclear whether this fulgurite was created by a lightning strike or by the falling power line creating its own arc, so the ‘natural vs manmade’ status is genuinely ambiguous.

What’s more important is it was a new kind of quasicrystal produced by a high-current, high-temperature, rapid-quench event on Earth’s surface! It has a composition of roughly Mn72.3Si15.6Cr9.7Al1.8Ni0.6. Its atomic planes have 12-fold symmetry in a nonrepeating pattern, and these planes are stacked periodically along the perpendicular direction.

Here’s a picture of this quasicrystal:

tunnelling electron microscope data obtained on a dodecagonal quasicrystal from a fulgurite. (A) The black circle in the acicular, quasicrystalline grain indicates the region where the electron diffraction pattern (Inset) has been collected. (B) A HAADF-TEM image of a portion of the quasicrystalline grain.  From here: www.pnas.org/doi/10.1073/pnas.2215484119

References

On icosahedrite, the first natural quasicrystal to be found in the meteorite from Khatyrka:

• L. Bindi, J. M. Eiler, Y. Guan, L. S. Hollister, G. MacPherson, P. J. Steinhardt and N. Yao, Evidence for the extraterrestrial origin of a natural quasicrystal, Proceedings of the National Academy of Sciences 31 (2012), 1396-401.

On decagonite, the second to be found:

• I. Buganski and L. Bindi, Insight into the structure of decagonite—the extraterrestrial decagonal quasicrystal, IUCrJ 8 (2021), 87–101.

On i-phase II, which is the provisional designation of the third quasicrystal found in that meteorite:

• L. Bindi, C. Lin, C. Ma and P. J. Steinhardt, Collisions in outer space produced an icosahedral phase in the Khatyrka meteorite never observed previously in the laboratory, Scientific Reports 6 (2016), 38117.

On the dodecagonal quasicrystal found in the dune in Nebraska:

• L. Bindi, M. A. Pasek, C. Ma, J. Hu, G. Cheng, N. Yao, P. D. Asimow, and P. J. Steinhardt, Electrical discharge triggers quasicrystal formation in an eolian dune, Proceedings of the National Academy of Sciences 120 (2023), e2215484119.

On the quasicrystal found at the atomic bomb test site:

• L. Bindi, W. Kolb, G. N. Eby, P. D. Asimow, T. C. Wallace and P. J. Steinhardt, Accidental synthesis of a previously unknown quasicrystal in the first atomic bomb test, Proceedings of the National Academy of Sciences 118 22 (2021), e2101350118.

This has icosahedral symmetry, but it’s quite different than the other quasicrystals I’ve mentioned, since it’s mostly made of silicon! Its formula is Si61Cu30Ca2Fe2.

May 14, 2026

Terence TaoPrimitive sets and von Mangoldt chains: Erdős Problem #1196 and beyond

Boris Alexeev, Kevin Barreto, Yanyang Li, Jared Duker Lichtman, Liam Price, Jibran Iqbal Shah, Quanyu Tang, and I have just uploaded to the arXiv our paper Primitive sets and von Mangoldt chains: Erdős Problem #1196 and beyond. This paper (which is a work in progress) represents our efforts to digest and document the recent flurry of developments around the following problem of Erdős, Sárközy, and Szemerédi on primitive sets:

Conjecture 1 (Erdős problem #1196) Suppose that {A \subset [x,+\infty)} is a primitive set of integers, which means that no element of {A} divides another. Then

\displaystyle  \sum_{n \in A} \frac{1}{n \log n} \le 1 + o(1)

as {x \rightarrow \infty}.

One can show that the upper bound of {1 + o(1)} is best possible up to the {o(1)} error by taking {A} to be the set of products of {k} primes for some suitable parameter {k}. This was one of the most well-known open problems in the study of primitive sets, and had attracted some number of partial results (for instance, Lichtman was able to show the upper bound of {e^\gamma \frac{\pi}{4} + o(1) \approx 1.399}). It was thus notable that this problem was first solved by an autonomous AI query (by the fifth author) a few weeks ago. This solution introduced a proof technique – based on Markov chains in the divisibility poset – which in retrospect is very natural for controlling primitive sets, but which had not been explicitly used in previous literature, though in retrospect many of the arguments in that literature involved a specific Markov chain which we call the downwards Mertens chain. The proof instead revolved around a different Markov chain, which we call the downwards von Mangoldt chain, which manages to neatly avoid the “{e^\gamma}” type losses in the previous Mertens-based arguments, and resolve Conjecture 1. In this paper we develop the Markov chain approach more systematically, and show that it settles several further conjectures concerning primitive sets, and also provides simpler proofs of some previous results in the literature. More precisely, in addition to Conjecture 1, we establish the following:

Theorem 2 (Erdős primitive set conjecture, #164) For any primitive {A} consisting of numbers greater than {1},

\displaystyle  \sum_{n \in A} \frac{1}{n \log n} \le \sum_p \frac{1}{p \log p}.

Theorem 3 (Odd Banks–Martin) Let {k\ge 1} and suppose {A} is a primitive set consisting of odd numbers with at most {k} prime factors. Then

\displaystyle \sum_{n \in A} \frac{1}{n \log n} \leq \sum_{n \in \mathbb N_k(\mathcal Q)} \frac{1}{n \log n},

where {\mathcal Q} denotes the primes appearing as factors of elements of {A}, and {\mathbb N_k({\mathcal Q})} is the collection of products of {k} primes from {\mathcal Q}.

Theorem 4 ({2} is Erdős-strong) If {A} is a primitive set consisting of even numbers, then

\displaystyle  \sum_{n \in A} \frac{1}{n \log n} \le \frac{1}{2\log 2}.

Theorem 5 (Ahlswede–Khachatrian–Sárközy) If {A} is a primitive set, then

\displaystyle  \sum_{n \in A \cap [y/x,y]} \frac{1}{n} \ll \frac{\log x}{\sqrt{\log\log x}}

whenever {3 \leq x \leq y}.

Theorem 6 (Erdős–Sárközy–Szemerédi, #1217) Let {A \subset {\bf N}} be such that the upper doubly logarithmic density

\displaystyle  \Delta := \limsup_{x\rightarrow\infty}\frac{1}{\log\log x} \sum_{n \in A \cap [2,x]} \frac{1}{n \log n}

is positive. Then there exists a strictly increasing infinite divisibility chain

\displaystyle  n_0 | n_1 | n_2 | \dots

in {A} such that

\displaystyle  \limsup_{x\rightarrow\infty}\frac{1}{\log\log x} \# \{ i: n_i \leq x \} \geq \Delta.

Theorem 2 and Theorem 5 had been previously established by Lichtman and Ahlswede–Khachatrian–Sárközy respectively, but the Markov chain formalism gives shorter (and more unified) proofs of both. Theorems 3, 4, 6 were open conjectures that can now be settled by this method. These results were obtained with varying levels of AI involvement, ranging from completely autonomous AI queries to traditional pen-and-paper calculations, to various hybrid approaches (for instance, with humans suggesting key inequalities that could then be rapidly tested numerically or even proved by various AI tools).

— 1. Chain/antichain duality and Markov chains —

I’ll now discuss the basic method of proof and try to motivate the main ideas, which have become much clearer in retrospect. Primitive sets can be viewed as antichains in the divisibility poset {({\bf N},|)}, in which the partial ordering is given by the divisibility relation {a|b}. So, one can pose the following more abstract question: given a general poset {({\mathcal N}, \leq)} and a weight function {\nu : {\mathcal N} \rightarrow {\bf R}^+}, what is the maximal value of {\sum_{n \in A} \nu(n)} as {A} ranges over all antichains in {{\mathcal N}}?

One can attack this problem using the well known duality between antichains and chains {C} (totally ordered subsets of {{\mathcal N}}): every antichain and chain can meet in at most one point, thus one has

\displaystyle  \sum_{n \in A} 1_{n \in C} \leq 1

for any chain {C} and any antichain {A}. In particular, if one has a measure {\mu} on the space {{\mathcal C}} of all chains (viewed as a compact subspace of the power set of {{\mathcal N}}, equipped with the product topology) with the property that

\displaystyle  \mu \{ C : n \in C \} \geq \nu(n) \ \ \ \ \ (1)

for all {n \in {\mathcal N}}, then by integrating the previous inequality against {\mu} and using Tonelli’s theorem one would obtain the upper bound

\displaystyle  \sum_{n \in A} \nu(n) \leq \mu({\mathcal C}).

In fact this duality is completely tight:

Proposition 7 (Chain/antichain duality) Let {({\mathcal N}, \leq)} be a poset, let {\nu : {\mathcal N} \rightarrow {\bf R}^+} be a weight function, and let {M > 0}. Then the following are equivalent:

Proof: We have already indicated how (ii) implies (i). Now we need to show that (i) implies (ii). A standard compactness argument allows us to reduce to the case when {{\mathcal N}} is finite. If (i) holds, then we also have

\displaystyle  \sum_{n \in {\mathcal N}} f(n) \nu(n) \leq M \ \ \ \ \ (2)

for all {f : {\mathcal N} \rightarrow {\bf R}} in the Stanley chain polytope, defined as the convex hull of the indicator functions of antichains. By a classic result of Stanley, this polytope can also be defined as the space of all {f : {\mathcal N} \rightarrow {\bf R}} obeying the inequalities

\displaystyle  0 \leq f(n) \hbox{ for all } n \in {\mathcal N} \ \ \ \ \ (3)

and

\displaystyle  \sum_{n \in C} f(n) \leq 1 \hbox{ for all chains } C. \ \ \ \ \ (4)

Applying linear programming duality (or the Farkas lemma), we conclude that the inequality (2) must be a non-negative linear combination of the inequalities (3), (4) (as well as the trivial inequality {0 \leq 1}). Equivalently, we can find non-negative weights {\mu(C)} for each chain {C} such that

\displaystyle  \sum_{C \in {\mathcal C}} \mu(C) \leq M

and

\displaystyle  \sum_{C: n \in C} \mu(C) \geq \nu(n)

for all {n \in {\mathcal N}}. The claim follows by viewing {\mu} as a measure on the space of chains. \Box

Thus, the “universal” problem of obtaining a uniform upper bound on {\sum_{n \in A} \nu(n)} for all antichains {A} is replaced with the equivalent “existential” dual problem of exhibiting a single measure {\mu} on chains of controlled mass, which hits each element {n} of the poset with a mass of at least the original weight {\nu}. Thus, such problems are now reduced to that of finding a sufficiently clever construction of such a measure {\mu}. If the mass of {\mu} was normalized to equal {1}, this becomes a probability problem: find a random chain process in the poset {{\mathcal N}} that hits each element {n} of the poset with a sufficiently high probability. (Though in our paper we found it more convenient for technical reasons to not normalize the measure, and allow the mass to take values other than {1}.)

It turns out (in a manner that was not explicitly appreciated in past literature) that particularly good choices of random chain to use here can come from Markov chains. (Here, the term “chain” is now being used in two different ways, but fortunately the order-theoretic concept of a chain and the Markov process-theoretic concept of a chain will be quite compatible in this discussion.) There will be two types of Markov chains on the poset {{\mathcal N}} that will be relevant: downwards Markov chains and upwards Markov chains. Here is our notation for a downwards Markov chain:

Definition 8 (Downwards Markov chain) Let {({\mathcal N}, \leq)} be a poset, and suppose we designate some subset {{\mathcal A}} of {{\mathcal N}} to be the “absorbing states” (in practice these will be the minimal elements of {{\mathcal N}}, although they do not have to be). A downwards Markov chain on {{\mathcal N}} with absorbing states {{\mathcal A}} is a collection of transition probabilities {P(n \searrow m)} for {n, m \in {\mathcal N}} obeying the following axioms:
  • (i) {P(n \searrow m) \geq 0}, with equality unless {m < n} and {n \notin {\mathcal A}}, or if {m = n} and {n \in {\mathcal A}}.
  • (ii) For any {n \in {\mathcal N}}, one has {\sum_{m \in {\mathcal N}} P(n \searrow m) = 1}.
(Thus for instance one has {P(n \rightarrow n)=1} for any absorbing state {n \in {\mathcal A}}.)

Given such a downwards Markov chain and an initial state {n_0 \in {\mathcal N}}, one can generate a random decreasing sequence {n_0 \geq n_1 \geq n_2 \geq \dots} by having each {n_i} transition to {n_{i+1}} with probability {P(n_i \searrow n_{i+1})} after conditioning on the past history {n_0,\dots,n_i} of the chain. This sequence will (almost surely) be strictly decreasing until it hits an absorbing state, in which case it stays there forever, although if the descending chain condition is not satisfied it is also possible for the sequence to be strictly decreasing indefinitely. We let {{\mathbb P}_{n_0 \searrow}} denote the law of this decreasing sequence. This construction is already enough to recover the Lubell-Yamomoto-Meshalkin (LYM) inequality:

Theorem 9 (LYM inequality) If {({\mathcal N}, \subset)} is the power set of {\{1,\dots,n\}} with the inclusion partial ordering, and {A} is an antichain in this poset (i.e., a Sperner family), then

\displaystyle  \sum_{S \in A} \frac{1}{\binom{n}{|S|}} \leq 1.

Proof: We introduce the downwards Markov chain with absorbing state {\emptyset} in which each non-empty subset of {\{1,\dots,n\}} of some cardinality {0 < k \leq n} transitions to a {k-1}-element subset chosen uniformly at random (i.e., with probability {1/k} of transitioning to each). If we start the descending sequence from the maximal element {\{1,\dots,n\}} of {{\mathcal N}}, then one can easily check that each {S \in {\mathcal N}} is hit with probability {1/\binom{n}{|S|}}. Applying Proposition 7 with {\nu(S) = 1/\binom{n}{|S|}}, {M=1}, and {\mu = {\mathbb P}_{\{1,\dots,n\} \searrow}}, we obtain the claim. \Box

In the above argument we fixed the initial location {n_0} of the Markov chain, but more generally one can start with any source mass {b : {\mathcal N} \rightarrow {\bf R}^+} and work with the measure {\mu = \sum_{n_0 \in {\mathcal N}} b(n_0) {\mathbb P}_{n_0 \searrow}} for the purposes of applying Proposition 7.

One can also define upwards Markov chains {P(n \nearrow m)} in exact analogy with downwards Markov chains {P(n \searrow m)} (reversing the order in the poset), which now generate random increasing sequences {n_0 \leq n_1 \leq n_2 \leq \dots} rather than decreasing sequences. There is a useful adjoint construction that can convert a downwards Markov chain into an upwards Markov chain: if we have a positive weight {\nu : {\mathcal N} \rightarrow (0,+\infty)} which is invariant under the chain in the sense that

\displaystyle  \nu(n) = \sum_{m \in {\mathcal N} : m > n} \nu(m) P(m \searrow n)

for any {n \in {\mathcal N}}, then we can define an adjoint upwards Markov chain {P(n \nearrow m)} (with no absorbing state) by the formula

\displaystyle  P(n \nearrow m) = \frac{\nu(m)}{\nu(n)} P(m \searrow n)

for any {n < m} in {{\mathcal N}}. More generally, if {\nu} is merely sub-invariant in the sense that

\displaystyle  \nu(n) \leq \sum_{m \in {\mathcal N} : m > n} \nu(m) P(m \searrow n)

for all {n \in {\mathcal N}}, one can still construct an adjoint upwards chain as before, but now one must also add an additional absorbing maximal state {\infty} to ensure that the transition probabilities still sum to one.

A downwards or upwards Markov chain, when equipped with an invariant or sub-invariant measure, also induces a flow network on the poset, in which an edge from {n} to {m} is assigned a flow capacity of

\displaystyle  \nu(m) P(m \searrow n) = \nu(n) P(n \nearrow m).

One can rewrite the Markov chain arguments in the paper in terms of such flow networks, in which case the arguments often boil down to an application of the discrete divergence theorem, giving very short proofs of many of the above results; see the paper for more discussion. However, we chose to focus more on the Markov chain approach in our presentation, as this formalism is also natural and could potentially be more flexible for further applications.

— 2. The Mertens and von Mangoldt chains —

For the purpose of analyzing primitive sets, there are two downward chains on the natural numbers (with absorbing state {1})that, in retrospect, are particularly natural to use:

  • The downwards Mertens chain, in which each {n>1} transitions deterministically to {n/P(n)}, where {P(n)} is the largest prime factor of {n}; and
  • the downwards von Mangoldt chain, in which each {n>1} transitions to {n/q} with probability {\Lambda(q)/\log n} for each {q} dividing {n}, with {\Lambda} the von Mangoldt function.

The von Mangoldt weight {\Lambda} is a natural choice here thanks to the fundamental identity

\displaystyle  \sum_{q|n} \Lambda(q) = \log n

which encodes the fundamental theorem of arithmetic. The two chains are similar in many way to each other: the von Mangoldt process favors the division by the largest prime factor, but does not require it.

The Mertens chain generates deterministic downward divisibility chains

\displaystyle 1 | p_1 | p_1 p_2 | \dots | p_1 \dots p_k

starting from a product of {k} primes {p_1 \leq p_2 \leq \dots \leq p_k}, and as such this process was implicit in much of the previous literature on primitive sets. However, it does not quite interact well with the {\frac{1}{n \log n}} weight, which is not invariant or subinvariant with respect to this process. Intuitively, the process of dividing {n} by {P(n)} tends to increasingly select for numbers {n} for which {P(n)} is smaller than expected. Instead, the natural invariant measure for this chain is the Mertens weight

\displaystyle  \nu_{\mathrm{Mertens}}(n) := \frac{e^\gamma}{n} \prod_{p < P(n)} \left(1-\frac{1}{p}\right) \approx \frac{1}{n \log P(n)};

the verification that this is indeed an invariant weight is a nice exercise in telescoping series. Taking the adjoint of the downwards Mertens chain with respect to this weight and running that chain from {1} gives the upwards Mertens divisibility chain

\displaystyle  1 | p_1 | p_1 p_2 | \dots

in which each {n} transitions to {np} for some prime {p \geq P(n)} with probability {\frac{1}{p} \prod_{P(n) \leq p' < p} \left(1-\frac{1}{p'}\right)}. A routine induction shows that each {n} is hit by this chain with a probability of {\prod_{p < P(n)} \left(1-\frac{1}{p}\right)}; this for instance gives a weak version

\displaystyle  \sum_{n \in A} \frac{1}{n \log n} \le e^\gamma + o(1)

of Theorem 1, and similarly for the other results discussed above.

The key innovation (which was uncovered by the AI-assisted proofs, though not quite in the notation and framework presented here) is to switch to the von Mangoldt chain, which removes the bias towards numbers whose largest prime factor is small. Indeed, the weight {\frac{1}{n \log n}} now turns out to be sub-invariant (after removing {n=1}) under this chain, and there is a modification

\displaystyle  \nu_\Lambda(n) := \int_1^\infty \frac{\log n}{\zeta(s) n^s}\ ds = \frac{1 + o(1)}{n \log n}

of this weight which turns out to be perfectly invariant. (We have an interpretation of this formula in terms of a zeta process that couples together various zeta distributions into a continuous divisibility chain; see the paper for further details.) Taking the adjoint with respect to this weight (or with the original weight {\frac{1}{n \log n}}) can eliminate the {e^\gamma} loss in the previous argument, and give one of the proofs of Theorem 1 recorded in our paper (there are several other variants of this method that we also present).

One slight defect of the von Mangoldt chain, as compared to the Mertens chain, is that it can “jump over” primitive sets (such as the set of products of {k} primes) due to the fact that it will sometimes multiply or divide by a power of a prime rather than a prime itself. This turns out to be a technical difficulty for many of our applications, resulting in a need to make various small ad hoc modifications to the von Mangoldt chain to eliminate this type of jump.

In order to establish some crucial sub-invariance properties, it turns out (after some standard manipulations) to be useful to obtain good bounds on the negative log-derivative

\displaystyle  - \frac{\zeta'}{\zeta}(s) = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}

of the Riemann zeta function {\zeta} in the region {s > 1}. Here it turns out that there is a clean (and rather efficient) upper bound

\displaystyle  - \frac{\zeta'}{\zeta}(s) \leq \frac{\log 2}{2^{s-1}-1}

which is equivalent to the non-decreasing nature of the Dirichlet eta function

\displaystyle  \eta(s) = (1-2^{1-s}) \zeta(s)

in this region. There are many proofs of this fact in the literature, but I would like to record a particularly cute proof, that is in the spirit of other arguments in the paper: one can interpret {\eta(s)} probabilistically as

\displaystyle  \eta(s) = {\mathbb E} \frac{1}{1+e^{-X_s}}

where {X_s} is a gamma random variable of shape {s} and scale {1}. Because the sum of independent copies of {X_s} and {X_t} has the distribution of {X_{s+t}}, one can couple together all the {X_s} so that they are increasing in {s}, at which point the claim follows.

— 3. Further directions —

Will Sawin and Ofir Gorodetsky have obtained analogues of several of the above results for function field models or permutation models respectively; we briefly discuss these in the paper, although we do not plan to cover these models in depth. We also note another recent use of this technique to solve a separate Erdős problem (#858) relating to antichains in a variant of the divisibility poset.

The zeta process that we have discovered hints at an emerging theory of the “developmental anatomy of integers”, which differs from the existing topic of anatomy of integers in that it views a large integer {n} (and its prime factor “organs”) not as a static entity, but rather as an evolving process in which primes (or powers of primes, in the case of the von Mangoldt process) are added or removed to the integer over time. With this perspective, primitive sets can be viewed as singular moments in such a developmental process, which are only encountered at most once in the life cycle of a given integer. It seems of interest to study this developmental perspective further.

The paper is currently a work in progress; we have released an early version due to the public interest in this problem. We plan to explore some further applications, and also to formalize more of the above results in Lean (currently two of the six main theorems are formalized), before submitting the paper for publication. The situation here highlights a distinction I have recently made between three components of the problem solving process in mathematics, namely proof generation, proof verification, and proof digestion. In this particular case, the first two steps were extremely rapid due to modern AI tools; however, properly digesting the AI-generated proofs into a coherent exposition that places the arguments in context with both past literature and future directions remains a slower process that requires expert human attention.

May 13, 2026

Scott Aaronson No more NYT cooperation: my dog-rape red line

Over the years, I’ve written two op-eds for The New York Times about quantum computing, at the NYT editors’ invitation:

I’ve also visited the NYT office and helped NYT reporters with numerous stories about quantum computing and beyond. In the wake of Cade Metz’s infamous NYT hatchet job against Scott Alexander and the rationalist community, I resolved no longer to talk to Metz, but it never even occurred to me to extend that to a broader ban against the NYT itself. After all, it’s the friggin’ NYT!

This week, however, Nicholas Kristof—a man who I praised fulsomely in a blog post 20 years ago, for his coverage of the genocide in Darfur—has used the NYT to broadcast the oldest, crudest form of antisemitic libel, accusing the Jews of garishly preposterous crimes (poisoning wells? baking blood into matzo? in this case, training dogs to rape prisoners). As countless others have since pointed out, the sole source for this ludicrous accusation was a Hamas-linked organization called “Euro-Med” that praised the October 7 “martyrs.” Kristof’s piece came out the same day an Israeli human rights organization released a major report meticulously documenting Hamas’s mass rapes on October 7–something that did happen—and was apparently designed to neutralize the impact of that report.

While the debunking of Kristof came fast, it wasn’t fast enough: now that antisemitic blood libel (dog libel?) has the Gray Lady’s imprimatur, I expect it to ignite violence against Jews all over the world, and I expect my kids to be less safe. So I hereby announce:

I, Scott Aaronson, member of the National Academy of Sciences, will no longer cooperate with anyone from the NYT on anything—neither quantum computing stories nor anything else—until the NYT, at minimum, formally retracts its dog-rape claim and fires Kristof.

To my friends doing good science and technology writing for NYT: I’m sorry, I hope you understand, and please fight for what’s right within the organization.

I say “bare minimum” because I hope this scandal causes a major reckoning for the NYT as an institution. I don’t know if any libel or other laws were broken, but if they were, I hope Kristof personally and the NYT as a whole get sued for whatever they’re worth.


Luckily, others have already said most of what I would’ve wanted to. Here’s Eli Lake, in a passage that part of me wishes I’d never read:

Let’s start with what is known about the biology of male dogs. Their penises are small and thin. They become erect only when they smell the pheromones of a female dog in heat. Brandon McMillan, the three-time Emmy-winning host of CBS’s Lucky Dog, who has spent 25 years training animals, told me he had never heard of a dog who was trained to rape a human being and doubted this was possible.

“When a female is in heat, the pheromones released carry it to the male canine,” McMillan said. “That’s how they reproduce and the miracle happens. I don’t see how you would train a dog to do that. The dog has to get turned on, for lack of a better word.”

On the NYT’s “vetting process” (or lack thereof), here’s comedian Jeff Maurer:

Imagine that you’re a fact checker or editor at The New York Times. You know damn well that you’ve nabbed a lifeboat while most of your field is being picked apart by hagfish at the bottom of the North Atlantic. If you’re a lowly fact-checker, you might be in your 20s and desperate to rationalize your choice to enter a field that might as well be buggy repair. I‘ve had similar gigs and been in similar environments; I feel like I can get in these peoples’ heads. And that’s why I’m convinced that the dog claim should have made things easier for the fact checkers.

One day, Nicholas Kristof — the award winning columnist who has been with the Times since it was Farmer Sulzberger’s New Amsterdam Seed Catalogue — comes to you, the lowly fact checker. He’s working on a big piece, and it’s scorching hot: It’s sexual assault plus Israel/Palestine, which is like detonating a nuclear weapon inside a volcano. You know it will be very tough to say “I don’t find this source credible” or “we can’t run that without more substantiation”. And that’s true even before we consider that you probably run in a social circle in which Israel is thought of as a substantially-more-evil version of The Borg Collective from Star Trek (which is why they have those vaporizing weapons).

But you do notice that Kristof’s sources are thin. In fact, he only has two named sources. The first source tells him a story that has substantially changed since he told it to The Washington Post two years ago. The second source celebrated the October 7 attacks and previously made torture claims that he later recanted. Much of the testimony contains no detail that could be corroborated. The organizations and journalists that Kristof relies on are highly dubious and engaged in the mutual citation circle jerk that is typically the way that lies enter the zeitgeist. There may be some truth in there somewhere, but Kristof’s methods are — what’s the journalistic term? — total ass. But you’re afraid that if you question things, you’ll be thought of as pro-rape, or worse yet: pro-Israel.

And in that context, the dog story is a godsend. It frees you from addressing the broader problems with the article, because you can pick the battle of focusing on the most-outrageous, least-believable claim. The article doesn’t need the dog rape detail — it’s already full of shocking-if-true claims. Striking that one section lets you maintain a sheen of professionalism, and when the thing explodes, you also get to say “You should have seen what he wanted to print”.

The fact that that didn’t happen tells me that the level of scrutiny for claims about Israel at the Times is absolute zero. Even if some Israeli somewhere did manage to turn his dog into a sex criminal, there is not nearly enough evidence for that claim to run in the Times. It’s an unsubstantiated rumor, and a vile one. I think that a person who believes that without compelling evidence is showing evidence of a damaged mind. And I think that a newspaper that publishes that without compelling evidence is showing evidence of a damaged vetting process, or perhaps a vetting process that — when it comes to claims about Israel — doesn’t exist at all.

Last but not least, here’s Haviv Rettig Gur, with righteous anger entirely appropriate to the situation, yet also a determination to expose and fight prisoner abuse, which is a genuine problem in Israel as it is in pretty much every other country on earth:

No, dogs aren’t being trained to systematically rape prisoners, you nattering halfwits. And no, Hamas propaganda operatives are not reliable sources on the question of Israeli crimes. The vast, vast majority of soldiers are honorable men who walked into fire so our families may live. The whole world may turn on them; I will stand with them, grateful for their sacrifice. And Kristof, a willing purveyor of propaganda happily feigning that he can’t see the water and thrilling to a moral crusade engineered by would-be genocidaires he pretends not to understand — is no messenger of moral reckoning.

But friends, so fucking what. Let the narcissistic guttersnipes strut their moral emotions before the world, let the UN publish endless reports that don’t hold up to basic scrutiny, let the NGOs dream their rabid, sick dreams that no journalist ever fact-checks — yes, they’re lying. But so fucking what.

We still, for ourselves — because fuck them — must see that it isn’t all fake. The problem [of abuse in Israeli prisons] is real. It’s far smaller than they claim, but real nonetheless. And when discipline and morality break down, it can only get worse. We either crack down now or we watch it fester and grow.

And our own Ben Gvirs are stubbornly refusing to fix what is actually broken, the real thing in the real world.

And so we are caught in a strange sort of vise, the same vise we find ourselves in with the genocide lie: A vast propaganda machine that seeks to destroy us — countless activists too high on their own self-regard to see the irony of raging against a ‘genocide’ while calling for the erasure of a people — all while our own incompetent, venal, self-absorbed political class insist in their mindless chatter on confirming every claim of our enemies for sheer, bald egomania.

I’m sick of it all. I know you’re all sick of it too.

And that, in a nutshell, is what I think about this.

Just because they’re lying, just because a vast perfidious campaign has overwhelmed global elites in a bid to clear the way for our removal, just because they’re still, after two millennia, building their visions of redemption on The Evil Jew — doesn’t mean there isn’t also, separately, a problem on the ground.

So what do we do now? Simple. We see it, we acknowledge it’s happening, we bring our rage to our inept leaders until they bend to our will and act to stop the breakdown…

And we soldier on.

We soldier on because the enemy really is coming to murder us. Because Hamas must still go if Gaza is ever to rise to a new day. Because Hezbollah will yet destroy Lebanon on the altar of destroying us. Because the ayatollahs built their whole damn religion on the extermination of our children.

We fix the broken things within us as if the pogromists and their simpering Kristofs don’t exist. We owe no answers to the propagandists who seek to clear a path to our deaths. But we do owe answers to ourselves.

Let the screaming mob rage and churn like so much sea-foam. Despite that raging mob, despite the enemy who still seeks our destruction, and yes, despite feckless incompetents like Ben Gvir, our minister of prisons, who claim to lead us — we remain the strongest, freest Jews who ever lived, more capable and committed than our self-destructive enemies ever imagined. And the task is still before us, yet to be completed, the sacred duty given to our generation to ensure our children don’t have to face the genocidaires who now surround us. We do not waver, we do not stumble. We soldier on.

Because fuck them all.


Inspired by Haviv’s energy, I’m leaving the comments on this post turned off. Why? Because outside the dog-rape libel, I was having a pretty good week! On Monday I visited Quantinuum in Colorado and got a personal tour of Helios 1, considered the all-around most powerful quantum computer on earth at the moment. Now I’m visiting Stony Brook University, where I’ll give two talks and meet lots of interesting people. Then I’m headed to NYC for the annual meeting of Quanta magazine’s advisory board.

Right now, then, I have neither the time nor the inclination to battle the infinite army of brain-eaten zombies who arrive for every single post touching on Israel. My life will be better this way.

May 12, 2026

Doug NatelsonNSF, National Science Board, and the politics of staying quiet

As I mentioned previously, the National Science Board was summarily fired on April 25.  The NSB nominally advises the National Science Foundation.  There have been a number of pieces written about this:

  • Going back in time to 2022, this essay is interesting to read, about the history of the NSF and the NSB, and the compromises put in place with the administrative structure.  Short version: Initially there was a real tension between the Director (reporting to the President) and the NSB.  Over time, the NSB was made subordinate to the director (1968).  Senatorial confirmation of board members was waived by the Senate in 2011.  
  • Many professional organizations issued statements expressing grave concern about this wholesale dismissal of the board.  This AIP news article has a summary.  The CEO of the APS wrote this, the ACS leadership wrote this, the AAS wrote this, etc.
  • The presidents of the National Academy of Sciences, National Academy of Engineering, and National Academy of Medicine issued this joint statement.  That has to set some kind of record for blandness, as it somehow does not even mention that the NSB was fired.  I fully understand that the Academies have a number of federal contracts, as one of their key responsibilities is leveraging their membership to do authoritative studies, with federal agencies usually being the customers.  I have no inside knowledge, but it sure looks like they are trying to walk a line of not raising the administration's ire.  (Surely this raises the question:  If it's never acceptable to say anything that might upset the administration, then how can the objectivity of their reports relating to policy ever be trusted?)
  • In contrast to the leadership, a lot of Academy membership has signed an open letter to Congress demanding the reinstatement of the board.
  • Scientific American has very good reporting on this, including a no-holding-back statement by my colleague Neal Lane.
  • UpdateHere is Dan Garisto's reporting in Science about letters sent by House Democrats and by Senate Democrats demanding action on this.  That article includes a statement by the fired head of the NSB, basically saying they were dismissed for defending the NSF budget from OMB.  I'm glad these letters were sent, but without the R majority signing on, I'm not holding my breath.
Meanwhile, the pace of NSF awards continues to be glacial, even compared to last year.  See this plot from Grant Witness
We are 7 months into the fiscal year, and obligated dollars are less than half at this time last year, and more like 27% of those at this time in "normal" year.  It's hard to look at that and not wonder whether someone is aiming for a pocket rescission, regardless of what Congress appropriated.  NSF looks like an outlier here, by the way.  As badly hit as NIH has been, their award curves look much closer to last year.

Other related things worth reading:  
Back to science in my next post.


Tommaso DorigoEnrico Stomeo - A Lifelong Passion For Meteor Studies

I was reached this evening by the news of the passing of a dear friend, Enrico Stomeo. Enrico was an architect by profession, but for me he was rather defined by his activity as an amateur astronomer - in fact, if I had to define what a serious amateur astronomer is, I would more or less consciously be describing him. 


[Above, a recent picture of Enrico]

The Associazione Astrofili Veneziani

read more

May 10, 2026

Scott Aaronson The Trevisan Award and the Decimal Digits of Powers of 2

WHOA … I’ve won the inaugural Luca Trevisan Award for Expository Work in Theoretical Computer Science! This has a particular meaning for me as someone who knew Luca Trevisan as well as I did for 25 years — who had him as a professor and thesis committee member, whose blog bounced off his blog, who benefitted tremendously from his expository work in TCS — before Luca tragically succumbed to cancer two years ago.

As I told the committee, receiving this award makes me want to use my blog for more actual CS theory exposition, and less venting, in order to retrospectively become worthier of such an honor.

I’m ridiculously grateful to the entire TCS community — my people, my homies — for tolerating me to do what I do.

If you’re curious, here’s the official citation:

The inaugural Luca Trevisan Award for Expository Work in Theoretical Computer Science is awarded to Scott Aaronson for his sustained and high-impact inspirational efforts to explain and promote our field to broad audiences. His blog Shtetl-Optimized has hosted remarkably frequent and elaborate posts over more than two decades, and has become a central meeting place for wide-ranging conversations across the TCS and Physics communities. Scott Aaronson’s blog posts contain crystal-clear, informative expositions of exciting new results, calibrated evaluations of technological claims, and profound analyses of topics in these fields and (way) beyond. The uniquely enthusiastic and witty style of his writings (including his book Quantum Computing Since Democritus and his other lecture notes and surveys), lectures, and interviews have made him a top invitee for both popular and professional appearances, attracting large audiences. These qualities have inspired many students to enter the field, and made Scott Aaronson a go-to person for journalists and scientists alike looking for a definitive word on the latest scientific activity in TCS and quantum computing.


In the rest of this post, I’m going to start practicing what I preached—y’know, about turning over more of this blog to actual exposition, of a kind that the Trevisan Award could plausibly be meant to encourage. I’ll start with something that’s been on my back burner for the past couple months: namely, the (lightly edited) transcript of a talk that I gave this spring at UT Austin’s undergraduate math club. So, without further ado, and in memory of Luca…


On the Decimal Digits of Powers of 2
by Scott Aaronson

Hi! I’ve given six previous talks here at UT’s math club, some on relatively “important” topics—Gödel’s Theorem, time travel, Huang’s proof of the Sensitivity Conjecture, and so on.

Today, I want to talk about an unimportant question, one that my son Daniel, who was then 8, and who’s sitting here in the front row (along with his sister Lily), asked me a few months ago.

Daniel asked: which powers of 2 can you double without needing to carry digits? Clearly 1, 2, 4, 32, and 1024 all have this property, their doubles being 2, 4, 8, 64, and 2048 respectively. Are there any others?

Since I happen to have the powers of 2 up to 220 = 1,048,576 committed to memory since childhood, I confirmed that there were no other examples up to there: 128, 256, 512, 2048, etc. all require carries. So I told Daniel: I can’t find any other example, and on that basis, I conjecture that there aren’t any more. But if that conjecture is true, I don’t know if it will ever be proven, by humans or even AI!

Then I googled it, and saw that this is a known question (not very well known, but there’s a StackExchange post about it). And indeed it had been checked up to 2millions, and no other counterexample had been found.


Why did I become confident so quickly that yes, 1024 is probably the last example of a power of 2 that can be doubled without carrying?

Because of the heuristic that the decimal digits of 2n are more or less “random,” apart from various constraints that are irrelevant here (like that the last digit always cycles among 2, 4, 6, and 8). And 2n has about n/log210 decimal digits. Since only 0, 1, 2, 3, 4 can be doubled without carrying, the probability of 2n being a counterexample should therefore be about $$ (\frac{1}{2} )^{n / \log_2 10}. $$

So, if we’ve already checked up to (say) n=1000, then the probability of a larger counterexample should be at most

$$ \sum_{n=1001}^{\infty} (\frac{1}{2} )^{n / \log_2 10} $$

which, when we sum the geometric series, is exceedingly close to 0.


Ah, but why did I say that I don’t know if the conjecture will ever be proven? Because it seems to belong a large class of similar statements, none of which mathematicians have had any idea how to prove.

Variant of a conjecture by Jeffrey Shallit: 65,536 is the only power of 2 that has no power of 2 among its decimal digits.

Freeman Dyson’s conjecture (2005): There’s no power of 2 for which, when you reverse the decimal digits, you get a power of 5.

Paul Erdös’s conjecture (1979): For every n≥9, there’s at least one ‘2’ in the base-3 representation of 2n.

Or looking even more broadly:

Conjecture: The decimal expansion of π is not all 6’s and 7’s after some finite point.

(This would follow from the stronger conjecture that π is base-10 normal—that is, that every finite pattern of decimal digits occurs in it with the limiting frequency you’d expect.)

Or:

Conjecture: π+e is an irrational number.

What all the above conjectures have in common, and what I find so fascinating about them, is that they seem hopeless to prove for exactly the same reason why they seem almost certainly true. Namely, they all seem to be true “merely” because it would be too insane of a coincidence were they false!

The trouble is, that’s not the sort of reason that seems amenable to turning into a proof. Fermat’s Last Theorem is an interesting exception that proves the rule here. That xn+yn=zn has no nontrivial integer solution for n≥3, did seem almost certainly true on statistical grounds for n≥4 (and for the n=3 case, a proof goes back to Euler). And of course, FLT was ultimately provable. But Wiles’s eventual proof exploited a lucky connection between the Fermat equation and deep, fancy things like modular forms and elliptic curves. At no point did the proof formalize the statistical argument that a 12-year-old could understand, for why the theorem is “almost certainly true.” It simply had nothing to do with the statistical argument.

So then, if you wanted to prove conjectures like my son Daniel’s, or like Shallit’s or Dyson’s or Erdös’s above, the question would be: could these “recreational” problems about base-10 representations ever be connected to anything similarly deep? Right now, it’s very hard to see how they could.

Still, all hope is not lost! Here’s a striking theorem that I learned about when I researched this:

Theorem (James Maynard, 2016): For every digit a from 0 to 9, there are infinitely many primes with no a’s in their base-10 representation.

The proof uses heavy Fourier-analytic techniques. Likewise, presumably there are infinitely many primes that you can double without carrying—2, 3, 11, 13, 23, 31, …—because the primes are much denser than the powers of 2! And presumably there are infinitely many primes that are missing any two decimal digits, or even whose decimal digits consist entirely of 0’s and 1’s. But Maynard’s techniques are not yet powerful enough to prove such things.


Even though I promised a topic of no importance today, I can’t resist pointing out a potential connection here to one of the biggest questions on earth right now, and something that’s professionally interested me for the past four years: namely, the question of how to align powerful AIs with human values and prevent them from destroying the world.

Paul Christiano, and the Alignment Research Center in Berkeley that he founded, have developed a whole program for how to make AI safe that depends on the possibility of formalizing “heuristic arguments”—that is, the kinds of arguments that convince us that the above conjectures are all almost certainly true, even without proofs of them.

The intuition here is that we’ll never have a rigorous proof that, for example, a real-world neural network will behave safely under all circumstances—it’s just too complicated. The best we can hope for is an argument that, e.g., “for this model to scheme against humans would require a crazy unexplained coincidence in its weights.” But how can we hope to formalize such arguments? As baby test cases, can we at least formalize our intuitions for why π is normal, or why Daniel’s conjecture is true, in some principled way?

ARC has tried: there’s a 2022 paper by Christiano, Neyman, and Xu on “Formalizing the presumption of independence.” But it’s tricky, and ARC itself would be the first to agree that the existing results are unsatisfying. How do we even formalize the intuition that, for example, you should be willing to bet at even odds against the 10100th digit of π being a 5?


In the rest of this talk, I’d like to circle back to Daniel’s original question about powers of 2, and show you some things that can be proved about it—with thanks to Greg Kuperberg and my other friends on Facebook, and in some cases to GPT5Pro.

Let’s start with the following easier question. Is there a power of 2 whose decimal digits start with 31415? Or with the complete works of Shakespeare, encoded by letter values in some suitable way? Or with a googolplex digits all of which can be doubled without carrying (as Daniel wanted)?

I claim that the answers are yes, yes, and yes! How do we prove this?

The key fact we’ll use is simply that log102 is an irrational number. (If you don’t remember the proof: suppose log102 = a/b. Then 10a/b = 2, so 10a=2b. But this has no integer solutions other than a=b=0.)

Suppose we want k as a prefix, where 10d-1 ≤ k ≤ 10d. Then we want integers n,r such that

$$ k 10^r \le 2^n \le (k+1) 10^r, $$

i.e. taking the base-10 log,

$$ \log_{10}k + r \le n \log_{10}2 \le \log_{10}(k+1) + r. $$

In other words, the fractional part of n log102 needs to lie between the fractional part of log10(k), and the fractional part of log10(k+1) (where again, k is given).

But now we can appeal to the following Key Fact: if α is any irrational number, then the set

{the fractional part of nα : n∈N}

is dense in the interval [0,1]. Or equivalently, if I rotate around and around the unit circle by 2απ radians each time, then if α is irrational, I’ll eventually get arbitrarily close to any given point on the unit circle.

Why is the key fact true? Just the pigeonhole principle! Clearly, for any ε>0, the fact that α is irrational means that there must be two points, xα and yα, whose fractional parts are distinct yet closer together than ε. But then, by adding multiples of (x-y)α, we can get our fractional part to be ε-close to anything in [0,1].

And to sum up, this is why there must be a power of 2 whose decimal representation starts with the complete works of Shakespeare, or with a googolplex carry-free digits! (Indeed, from the above discussion, we could even extract an efficient algorithm for constructing that power of 2.)


So much for the first decimal digits of 2n. Now let’s look at 2n‘s last decimal digits!

Here there are some complications, arising from the twin facts that

(a) 10 is composite, and
(b) 2 is one of its factors.

But we can deal with those complications!

For starters: what are the possible last decimal digits of 2n?

1, 2, 4, 8, 6, 2, 4, 8, 6, …

So, there’s an initial 1, but then we cycle forever through the even nonzero digits.

What about the last two digits of 2n? If you’ve never tried this before, it’s instructive to work it out:

01, 02, 04, 08, 16, 32, 64, 28, 56, 12, 24, 48, 96, 92, 84, 68, 36, 72, 44, 88, 76, 52, 04, …

So, there’s an initial 01 and 02, but after that, we cycle forever through 20=4×5 possibilities, namely all the possible multiples of 4 whose last digits are nonzero.

You can check that the general pattern is: the last k decimal digits of 2n have an initial segment that looks like 001, 002, 004, 008, …, 2k-1. And then there’s an eternal cycle of length 4×5k-1, where the last digit can be any of 2,4,6,8, while every other digit can either be any possible even digit or any possible odd digit, depending on the digits to its right—in (if you want to say this a fancier way) a recursively defined embedding of the powers of 2 mod 5k into the cyclic group Z/10k. So, there’s an initial “runup” as you fill out all the needed powers of 2, but then once that’s done, you just cycle around forever in an embedding of Z/5k into Z/10k, because

(a) 2 happens to be a primitive element of Z/5k for any k, and
(b) 5 divides 10.

So in particular, and relevant to Daniel’s conjecture: there exists a power of 2 whose last googolplex digits can all be doubled without carrying. Why? For the last digit, you can pick 2 or 4. Then, for the last googolplex digits but one, you can pick 1 or 3 for those constrained to be odd, and 0, 2, or 4 for those constrained to be even. Lots of choices that work!


So, we can avoid carries in the leftmost digits of 2n, we can avoid carries in the rightmost digits … so that “merely” leaves all the digits in the middle, where who the hell knows! Empirically, the digits seem to pass every standard randomness test that you can throw at them. So in particular, e.g., the fraction of the digits that are in {0,1,2,3,4} seems to converge inexorably towards 50%, so that it’s extremely plausible to conjecture that the fraction is less than 49% or more than 51% for only finitely many values of n. But of course, that’s presumably even harder to prove than Daniel’s original conjecture.


OK, last topic. Suppose we want to program a computer to check Daniel’s conjecture up to 2n, for some huge n. What algorithm will do that most efficiently? A naive algorithm would just calculate 1, 2, 4, …, 2n and check all the digits of all of them. That takes O(n2) time, since each 2k, for k=0,1,…,n, has ~klog102 = O(k) decimal digits.

We can improve this to roughly O(n) time, by simply noticing that we only need to check O(1) digits per power of 2 in expectation, presumably, until we find the first digit that requires carrying. Then we don’t even need to compute the remaining digits: we can simply move on to the next power of 2. (This sort of trick is used all over the place in the design of fast algorithms.)

But when I posted about Daniel’s problem on Facebook, my friend Greg Kuperberg (who’s a mathematician at UC Davis) noticed that further improvements are possible. To wit: 8×6 = 48, which again ends in 8. So, 8×16 ends in 8, as does 8×16k for every k≥0. Meaning: no 2n where n=3+4k can possibly be a counterexample to Daniel’s conjecture. They’re all ruled out!

Likewise, 64×1,048,576 ends in 64, so no 2n of the form n=6+20k can be a counterexample. They’re all ruled out as well.

We can keep going this way, filling out the “search tree” of potential counterexamples to Daniel’s conjecture via breadth-first search. At the root of the search tree, we try all possibilities for the last digit. One level deeper, we try all possibilities for the second-to-last digit, and so on. As we go, we prune subtrees according to constraints like the ones above that we keep discovering and adding.

When I worked this out, I got an algorithm for checking Daniel’s conjecture up to 2n, which under reasonable assumptions takes time only O(nα), where α = 1-log52 ≈ 0.569, and space only polylog(n).

Paul Crowley (who’s my Facebook friend) then actually implemented this algorithm, and he tells me that he used it to check that Daniel’s conjecture holds all the way up to $$2^{10^{21}}$$ (!!), using 40 minutes on a 128-core machine.

So, to return at last to the first thing I told Daniel: yes, I think his conjecture is almost certainly true, even though I have no idea when, if ever, the human race or its successors will have a proof.

May 09, 2026

John BaezFreiman’s Constant

 

I recently asked people on Mastodon “What’s the most surprising fact you’ve learned in the last couple of weeks?” It was a nice way to learn a lot of interesting things. My own biggest recent surprise was this: the number

\displaystyle{ F = \frac{2221564096 + 283748 \sqrt{462}}{491993569} }

plays a fundamental role in number theory!

For any irrational x, we define its Lagrange number to be the supremum of numbers c such that

\displaystyle{ \left| \frac{p}{q} - x \right| < \frac{1}{cq^2} }

has infinitely many solutions for rationals p/q. So, the easier x is to approximate by rational numbers, the bigger its Lagrange number is.

Quite famously, the golden ratio has the smallest possible Lagrange number, namely √5. This means it’s as hard as possible to approximate using rational numbers.

The set of all Lagrange numbers is very complicated, and very interesting. But here’s the shocking fact: if

\displaystyle{ F = \frac{2221564096 + 283748 \sqrt{462}}{491993569} }

then every real number \ge F is a Lagrange number, and F is the smallest number with this property!

F is called ‘Freiman’s constant’, because he proved this fact. His proof is 100 pages, and I don’t want to read it… not even counting the fact that it’s only available in Russian:

• G. A. Freiman, Diophantine Approximations and the Geometry of Numbers (Markov’s Problem) [in Russian]. Kalinin State University Press, Kalinin, 1975.

There’s a lot more crazy stuff known about the set of all Lagrange numbers, which is called the Lagrange spectrum. As mentioned, the smallest number in the Lagrange spectrum is \sqrt{5}. The next is \sqrt{8}. The next is \sqrt{221}/5. The next is \sqrt{1517}/13. In 1879, Markov showed that such numbers form an increasing sequence that converges to 3. They are precisely the Lagrange numbers of numbers whose continued fraction expansion and eventually consists only of 1’s and 2’s and is eventually periodic, like this:

3 + \cfrac{1}{7 + \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{2 + \cdots}}}}}

3 is the Lagrange number of every number whose continued fraction expansion eventually consists only of 1’s and 2’s and is not eventually periodic. Above 3 the Lagrange spectrum becomes much richer. It’s a fractal: it has infinitely many gaps, but positive Hausdorff dimension, with the dimension increasing as we move up.

Moreira showed that when we reach \sqrt{12} \approx 3.464 the Hausdorff dimension of the Lagrange spectrum hits 1. And as mentioned, Freiman showed that above

\displaystyle{ F = \frac{2221564096 + 283748 \sqrt{462}}{491993569} \approx 4.52782956616087914\dots }

the Lagrange spectrum is a half-line. Directly below Freiman’s constant, Freiman showed there is a gap of width roughly 3 \cdot 10^{-8}.

Here is the classic reference in English on this subject:

• Thomas W. Cusick and Mary E. Flahive, The Markov and Lagrange Spectra, AMS Mathematical Surveys and Monographs 30, AMS, Providence, Rhode Island, 1989.

The Markov spectrum is another set, containing the Lagrange spectrum, and their relationship is very interesting. Here’s a free online reference that reviews all the basics before doing more:

• Carlos Gustavo Moreira, Geometric properties of the Markov and Lagrange spectra, Annals of Mathematics 188, 145–170.

For a stripped-down account, go here:

• Wikipedia, Markov spectrum.

Tim GowersA recent experience with ChatGPT 5.5 Pro

We are all having to keep revising upwards our assessments of the mathematical capabilities of large language models. I have just made a fairly large revision as a result of ChatGPT 5.5 Pro, to which I am fortunate to have been given access, producing a piece of PhD-level research in an hour or so, with no serious mathematical input from me.

The background is that, as has been widely reported, LLMs are now capable of solving research-level problems, and have managed to solve several of the Erdős problems listed on Thomas Bloom’s wonderful website. Initially it was possible to laugh this off: many of the “solutions” consisted in the LLM noticing that the problem had an answer sitting there in the literature already, or could be very easily deduced from known results. But little by little the laughter has become quieter. The message I am getting from what other mathematicians more involved in this enterprise have been saying is that LLMs have got to the point where if a problem has an easy argument that for one reason or another human mathematicians have missed (that reason sometimes, but not always, being that the problem has not received all that much attention), then there is a good chance that the LLMs will spot it. Conversely, for problems where one’s initial reaction is to be impressed that an LLM has come up with a clever argument, it often turns out on closer inspection that there are precedents for those arguments, so it is still just about possible to comfort oneself that LLMs are merely putting together existing knowledge rather than having truly original ideas. How much of a comfort that is I will not discuss here, other than to note that quite a lot of perfectly good human mathematics consists in putting together existing knowledge and proof techniques.

I decided to try something a little bit different. At least in combinatorics, there are quite a lot of papers that investigate some relatively new combinatorial parameter that leads naturally to several questions. Because of the sheer number of questions one can ask, the authors of such papers will not necessarily have the time to spend a week or two thinking about each one, so there is a decent probability that at least some of them will not be all that hard. This makes such papers very valuable as sources of problems for mathematicians who are doing research for the first time and who will be hugely encouraged by solving a problem that was officially open. Or rather, it used to make them valuable in that way, but it looks as though the bar has just been raised. It is no longer enough that somebody asks a problem: it needs to be hard enough for an LLM not to be able to solve it.

In any case, a little over a week ago I decided to see how ChatGPT 5.5 Pro would fare with a selection of problems asked by Mel Nathanson in a paper entitled Diversity, Equity and Inclusion for Problems in Additive Number Theory. Nathanson has a remarkable record of being interested in problems and theorems that have later become extremely fashionable, which has led him to write a series of extremely well timed and therefore highly influential textbooks. In this paper, he argues for the interest of several other problems, some of which I will now briefly describe.

If A is a set of integers, then its sumset A+A is defined to be \{a+b:a,b\in A\}. For a positive integer h, the hfold sumset, denoted hA, is defined to be \{a_1+\dots+a_h: a_1,\dots,a_h\in A\}. Nathanson is interested in the possible sizes of hA given the size of A. To that end one can define a set \mathcal R(h,k) to be the set of all t such that there exists a set A with |A|=k and |hA|=t.

An obvious first question to ask is simply “What is \mathcal R(h,k)?” When h=2, the answer is the set of all integers between 2k-1 and \binom{k+1}2. It is an easy exercise to show that if |A|=k, then 2k-1\leq|A+A|\leq\binom{k+1}2, so this result is saying that all sizes in between can be realized. However, it is not true in general that hA can take every size between its minimum and maximum possibilities, and we do not currently have a complete description of \mathcal R(h,k).

Another natural question one can ask, and this is where ChatGPT came in, is how large a diameter you need if you want a set A with A and hA having prescribed sizes. (Of course, the size of hA must belong to \mathcal R(h,k).) Nathanson showed that for every t\in[2k-1,\binom{k+1}2] there is a subset A of \{0,1,2,\dots,2^k-1\} with |A|=k and |A+A|=t, and asked whether the bound 2^k-1 could be improved. ChatGPT 5.5 Pro thought for 17 minutes and 5 seconds before providing a construction that yielded a quadratic upper bound, which is clearly best possible. It wrote up its argument in a slightly rambling LLM-ish style, so I asked if it could write the argument up as a LaTeX file in the style of a typical mathematical preprint. After two minutes and 23 seconds it gave me that, after which I spent some time convincing myself that the argument was correct.

The basic idea behind both Nathanson’s argument and ChatGPT’s was that in order to obtain a set of a given size with a sumset of a given size, it is useful to build it out of a Sidon set, which means a set with sumset of maximal size (that is not quite the usual definition but it is the simplest to use in this discussion), and an arithmetic progression. Also, for a bit of fine tuning one can take an additional point near the arithmetic progression. Then if one plays around with the various parameters, one finds that one can obtain sets of all the sizes one wants. Nathanson doesn’t express his argument this way (it is Theorem 5 of this paper), instead giving an inductive argument, but I think, without having checked too carefully, that if one unravels his argument, one finds that effectively that is what he ends up with, and the Sidon set in question consists of powers of 2. ChatGPT obtained its improvement by simply using a more efficient Sidon set — it is well known that one can find Sidon sets of quadratic diameter. (One might ask why Nathanson didn’t do that in the first place: I think it is because the obvious idea of using a more efficient Sidon set becomes obvious only after one has redescribed his inductive construction. Is that what ChatGPT did? It is very hard to say.)

Next, I asked ChatGPT to see whether it could do the same for a closely related question, where instead of looking at the size of the sumset, one looks at the size of the restricted sumset, which is defined to be \{a+b:a,b\in A, a\ne b\}. Unsurprisingly, it was able to do that with no trouble at all. I got it to write both results up in a single note, to avoid a certain amount of duplication. If you are curious, you can see the note here.

I then asked what it could do for general h. I was much less optimistic that it would manage to do anything interesting, because the proof for h=2 makes fundamental use of the fact (due to Erdős and Szemerédi) that we know exactly which sizes we need to create. If we don’t know what the set \mathcal R(h,k) is, then it seems that we are forced to start with a hypothetical set A with |A|=k and |hA|=t and build out of it a set of small diameter with the same property. As it happens, I still don’t know how to get round that difficulty (I’m mentioning that just to demonstrate that my mathematical input was zero, and I didn’t even do anything clever with the prompts), but Nathanson mentioned in his paper a remarkable paper of Isaac Rajagopal, a student at MIT, who must have got round the difficulty somehow, because he had managed to prove an exponential dependence of \mathcal R(h,k) on k for each fixed h.

I’ll leave the previous paragraph there, but Isaac has subsequently explained to me that that isn’t really the difficulty. His argument gives a complete description of \mathcal R(h,k) when k is sufficiently large, and if one wants to prove a polynomial dependence for fixed h, then assuming that k is sufficiently large is clearly permitted. The real difficulty is that constructing the sets with given sumset sizes was significantly more complicated, and necessarily so because the degree of the polynomial grows with h, and one therefore needs more and more parameters to define the sets.

In any case, the task faced by ChatGPT was not to solve the problem from scratch, but to see whether it was possible to tighten up Isaac Rajagopal’s argument. Here’s what happened.

  1. After 16 minutes and 41 seconds, it came back with an argument that claimed to have improved the upper bound from exponential in k to exponential in k^\alpha for any \alpha>1/2.
  2. I asked it to write that in preprint form too, which took it a further 47 minutes and 39 seconds.
  3. That preprint would have been hard for me to read, as that would have meant carefully reading Rajagopal’s paper first, but I sent it to Nathanson, who forwarded it to Rajagopal, who said he thought it looked correct.
  4. Both ChatGPT and Rajagopal speculated a little on what might need to be done to push things further and get a polynomial bound, so I got greedy and asked ChatGPT to give that a go.
  5. After 13 minutes and 33 seconds it told me it felt optimistic about the existence of such an argument but there were a couple of technical statements that needed checking.
  6. I asked it to check them.
  7. After 9 minutes and 12 seconds it got back to me with the check having been done, so I asked for this too to be written in preprint form.
  8. After 31 minutes and 40 seconds the “preprint” was ready. Here it is.
  9. Isaac Rajagopal looked at it and declared it to be almost certainly correct. It was clear that he meant this not just at a line-by-line level but at the level of ideas.

Isaac made some very interesting remarks about the nature of what the additional ideas were that ChatGPT contributed. Since, as I have already said, my mathematical input was zero, I invited him to write a guest section to this post. Just before we get to that, I want to raise a question (that will undoubtedly have been raised by others as well), which is simple: what should we do with this kind of content? Had the result been produced by a human mathematician, it would definitely have been publishable, so I think it would be wrong to describe it as AI slop. On the other hand, it seems pointless even to think about putting it in a journal, since it can be made freely available, and nobody needs “credit” for it (except that Isaac deserves plenty of credit for creating the framework on which ChatGPT could build). I understand that arXiv has a policy against accepting AI-written content, which makes good sense to me. So maybe there should be a different repository where AI-produced results can live. But various decisions would need to be made about how it was organized. I myself think that one would probably want to have some kind of moderation process, so that results would be included only if a human mathematician was prepared to certify that they were correct — or, better still, that they had been formalized by a proof assistant — and perhaps also that they answered a question that had been asked in a human-written paper. On the other hand, I wouldn’t want a moderation process that created vast amounts of work (unless the work was itself done by AI, but there are obvious dangers in going down that route). Anyway, until these questions are answered, this result is available from the link above, and perhaps, now that LLMs are so good at literature search, that will be enough to make it findable by anyone who wants to know whether Nathanson’s problem has been solved.

Isaac’s evaluation of what ChatGPT achieved

With just a few prompts, ChatGPT was able to improve the upper bound on N(h,k) (which I will define very soon) from exponential in k to polynomial in k. While its first improvement of the bound, from exponential in k to exponential in k^{\frac{1}{2} + \varepsilon}, was a routine modification of my work, the improvement to polynomial in k is quite impressive. To do this, ChatGPT came up with an idea which is original and clever. It is the sort of idea I would be very proud to come up with after a week or two of pondering, and it took ChatGPT less than an hour to find and prove, using similar methods to those in my own proof. My goal is to explain that idea, in a manner that will be digestible to my friends who are computer science majors as well as my math major friends.

The problem of bounding N(h,k) is closely related to a problem I worked on at the Duluth REU (Research Experience for Undergrads) program, of determining \mathcal{R}(h,k). In particular, \mathcal{R}(h,k) is the set of possible h-fold sumset sizes |hA|, where A can be chosen to be any set of k integers. N(h,k) is the minimal N such that we can achieve all of the values of \mathcal{R}(h,k) using k-element sets A \subset \{0,1,2,\ldots,N\}. I spent last summer explicitly characterizing the set \mathcal{R}(h,k) for large k, by constructing sets A such that |hA| achieves all sizes which I could not rule out as impossible. So, N(h,k) can be upper-bounded by optimizing my constructions.

I constructed these sets A by combining smaller component sets which are simpler to analyze. Some of these components are the geometric series

\displaystyle S = \{0,1,m,m^2,\ldots,m^{\ell-2}\} \quad \hbox{and} \quad T = \{1,m,m^2,\ldots,m^{\ell-1}\} \qquad (1)

for various values of 2 \leq m \leq h and 2 \leq \ell \leq k. Unfortunately, the elements of S and T are exponentially large in terms of k. So, I asked ChatGPT (through Tim) whether there exist sets of \ell elements which have similar sumset sizes to these geometric series, but contain only numbers of polynomial size in \ell: I had no idea if this was possible, or how to begin constructing such sets. ChatGPT came back with an answer, constructing sets G and H which behave like “half a geometric series squeezed into a polynomial interval,” which is counterintuitive. Before I discuss the construction of G and H, I will explain the important properties of the sumset sizes of S and T which they recreate.

For h > 0, a set A is called a B_h set if the only solutions to

\displaystyle x_1+\cdots+x_h = y_1+\cdots+y_h

with x_i,y_i in A are the “trivial” solutions, by which I mean that one side of the equation is a reordering of the other side. If A is a B_h set of size \ell, then elements of hA correspond exactly to choices of h elements of A, with repetition allowed. Using “stars and bars,” one can see that |hA| = \binom{h+\ell - 1}{h} and this is the maximum possible value of |hA| among sets of size \ell. So, another definition is that A is a B_h set if |hA| = \binom{h+|A| - 1}{h}. Sidon sets, which Tim discussed, are exactly B_2 sets.

To make things more concrete, let us assume that m = 4 in (1). Then, S is a B_3 set, but it is not a B_4 set because of the relations

\displaystyle 4^{a} + 4^a + 4^a + 4^a = 4^{a+1} + 0 + 0 + 0 \qquad (2)

for any choice of a in \{0,1,2,\ldots, \ell-3\}. In particular, \binom{\ell+3}{4} - |4S| = \ell-2, as these \ell-2 relations are the only ones preventing S from being a B_4 set. T lacks the relations in (2) because 0 is not in T. So, T is a B_4 set, but it is not a B_5 set because of the relations

\displaystyle 4^{a} + 4^a + 4^a + 4^a + 4^{b+1} = 4^{a+1} + 4^b + 4^b + 4^b + 4^b \qquad (3)

for any choices of a \neq b in \{0,1,2,\ldots, \ell-2\}. This gives \binom{\ell-1}{2} relations, and one can check that \binom{\ell+4}{5} - |5T| = \binom{\ell-1}{2}. To summarize, we have seen that

(a) S is a B_{m-1} set.

(b) \binom{m+\ell-1}{m} - |mS| = \ell -2 is a linear function of \ell.

(c) T is a B_{m} set.

(d) \binom{m+\ell}{m+1} - |(m+1)T| = \binom{\ell-1}{2} is a quadratic function of \ell.

    ChatGPT was able to find sets G and H of \ell elements which satisfy (a)-(d), but whose elements all have polynomial size in \ell. The construction of G and H uses h^2-dissociated sets, which are sets A where the only solutions to

    \displaystyle x_1+\cdots+x_s = y_1+\cdots+y_{s'} \qquad (4)

    with s,s' \leq h^2 and x_i,y_i in A are the “trivial” solutions, i.e. s = s' and one side of the equation is a reordering of the other side. For r > 0, it is possible to construct an h^2-dissociated set U = \{u_1,\ldots,u_r\} \subseteq \{0,1,2,\ldots,N\}, where N is approximately r^{h^2}, and in particular polynomial in r. Constructions of such a U using finite fields date back to Singer (1938) and Bose–Chowla (1963) and are described in Appendix 1. Define

    \displaystyle G = \{0, u_1,u_2,\ldots,u_r,mu_1,mu_2,\ldots, mu_r\}

    and

    \displaystyle H= \{u_1,u_2,\ldots,u_r,mu_1,mu_2,\ldots, mu_r\}. \qquad (5)

    In hindsight, I have good intuition for the construction of G and H. All of the relations in (2) and (3) are formed by combining one or two relations of the form 4x = y. There are approximately \ell relations of the form mx = y in S and T, and approximately \ell/2 such relations in G and H. There are few other low-order relations in S and T, and similarly in G and H because U is h^2-dissociated. So, G and H manage to contain half as many mx = y-relations as their geometric series counterparts, while also containing few low-order relations.

    We now see why (a)-(d) hold with S and T replaced by G and H, respectively. For concreteness, we assume that m = 4 and h>4, so U contains no nontrivial relations as in (4) with s,s' \leq 25 \leq h^2. Then, G is a B_3 set, but it is not a B_4 set because of the relations

    \displaystyle u_i + u_i + u_i + u_i = 4u_i + 0 + 0 + 0

    for any choice of i in \{1,2,\ldots, r\}. If we let \ell = |G| = 2r+1, we can check that \binom{\ell + 3}{4} - |4G| = r = \frac{\ell-1}{2} is linear in \ell. In particular, (a) and (b) hold with S replaced by G, and the linear function \ell-2 replaced by \frac{\ell-1}{2}. We can also see that H is a B_4 set, but it is not a B_5 set because of the relations

    \displaystyle u_i + u_i + u_i + u_i + 4u_j = 4u_i + u_j + u_j +u_j + u_j

    for any i\neq j in \{1,2,\ldots, r\}. If we let \ell = |H| = 2r, we can check that \binom{\ell + 4}{5} - |5H| = \binom{r}{2} = \binom{\ell/2}{2} is quadratic in \ell. In a similar manner, (c) and (d) hold with T replaced by H, and the quadratic function \binom{\ell-1}{2} replaced by \binom{\ell/2}{2}.

    Even though I can motivate it in retrospect, ChatGPT’s idea to use h^2-dissociated sets to control relations of order at most h feels quite ingenious. As far as I can tell, this idea is completely original.

    ChatGPT’s proof that its construction produces the desired values of |hA| is very similar to my proof that the sets A which I construct achieve all possible values of |hA|, after replacing S and T by G and H, respectively. Properties (a)-(d) capture many of the important properties of S and T (or G and H) which are used in this proof. The final constructions involve combining the sets G and H (or S and T in my paper) for each value of m between 2 and h with another set which is the union of an arithmetic progression and a point. Intuitively, G and H (or S and T) have large sumsets, while arithmetic progressions have small sumsets, so it is plausible that one could get sets which achieve all the medium-sized sumsets by combining them. However, the proof of this is quite involved, and it occupies Section 4 of my paper and the entirety of the ChatGPT preprint. In Appendix 2, I work out the details of the ChatGPT construction to show that for k sufficiently large,

    \displaystyle N(h,k) \leq O\left(k^{10h^3}\right).

    For comparison, it is easy to see that N(h,k) is at least on the order of k^{h}, and it is unknown what the real value is. In Appendix 3, I give details of the correspondence between my paper and the ChatGPT preprint, which will be helpful for those who want to read either.

    Finally, I want to express my deep gratitude to Tim for allowing me to contribute to this blog. I am still stunned by the coincidence that the problem he chose to put into ChatGPT 5.5 Pro led him to my paper on the arXiv.

    Tim on what this means for mathematical research

    I would judge the level of the result that ChatGPT found in under two hours to be that of a perfectly reasonable chapter in a combinatorics PhD. It wouldn’t be considered an amazing result, since it leant very heavily on Isaac’s ideas, but it was definitely a non-trivial extension of those ideas, and for a PhD student to find that extension it would be necessary to invest quite a bit of time digesting Isaac’s paper, looking for places where it might not be optimal, familiarizing oneself with various algebraic techniques that he used, and so on.

    It seems to me that training beginning PhD students to do research, which has always been hard (unless one is lucky enough, as I have often been, to have a student who just seems to get it and therefore doesn’t need in any sense to be trained), has just got harder, since one obvious way to help somebody get started is to give them a problem that looks as though it might be a relatively gentle one. If LLMs are at the point where they can solve “gentle problems”, then that is no longer an option. The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.

    I would qualify that statement in two ways though. First, there is the obvious point that a beginning PhD student has the option of using LLMs. So the task is potentially easier than proving something that LLMs can’t prove: it is proving something in collaboration with LLMs that LLMs cannot manage on their own. I have done quite a lot of such collaboration recently and found that LLMs have made useful contributions without (yet) having game-changing ideas.

    A second point is that I don’t know how much of what I have said generalizes to other areas of mathematics. Combinatorics tends to be quite focused on problems: you start with a question and you reason back from the question or if you reason forwards you do so very much with the question in mind. In other areas there can be much more of an emphasis on forwards reasoning: you start with a circle of ideas and see where it leads. To do it successfully, you need to have some way of discriminating between interesting observations and uninteresting ones, and it isn’t obvious to me what LLMs would be like at that.

    Of course, everything I am saying concerns LLMs as they are right now. But they are developing so fast that it seems almost certain that my comments will go out of date in a matter of months. It is also almost certain that these developments will have a profoundly disruptive effect on how we go about mathematical research, and especially on how we introduce newcomers to it. Somebody starting a PhD next academic year will be finishing it in 2029 at the earliest, and my guess is that by then what it means to undertake research in mathematics will have changed out of all recognition.

    I sometimes get emails from people who are interested in doing mathematical research but are not sure whether that makes sense any more as an aspiration. I have a view on that question, but it may very well change in response to further developments. That view is that there is still a great deal of value in struggling with a mathematics problem, but that the era where you could enjoy the thrill of having your name forever associated with a particular theorem or definition may well be close to its end. So if your aim in doing mathematics is to achieve some kind of immortality, so to speak, then you should understand that that won’t necessarily be possible for much longer — not just for you, but for anybody. Here’s a thought experiment: suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.

    So what is the point of struggling with a difficult mathematics problem? One answer is that it can be very satisfying to solve a problem even if the answer is already known, but I don’t think that is a sufficient reason to spend several years of your life on this peculiar activity. A better answer is that by solving hard problems you get an insight into the problem-solving process itself, at least in your area of expertise, in a way that you simply don’t if all you do is read other people’s solutions. One consequence of this is that people who have themselves solved difficult problems are likely to be significantly better at using solving problems with the help of AI, just as very good coders are better at vibe coding than not such good coders, or people who have a solid grasp of how to do basic arithmetic are likely to be more skilled at using calculators (and especially at noticing when an answer feels off). Mathematics is a highly transferable skill, and that applies to research-level mathematics as well. By doing research in mathematics, you may not get the same rewards as your equivalents a generation ago, but there is a good chance that you will be equipping yourself very well for the world we are about to experience.

    Appendix 1 (Isaac)

    We will construct an h-dissociated set U = \{u_1,\ldots,u_r\} \subseteq \{0,1,2,\ldots,N\}, where N is approximately r^{h}. This construction is a very minor modification of Bose–Chowla (1963)’s construction of a B_h set, which I learned about from this paper. For whatever reason, the GPT preprint (Lemma 3.1) uses a different, less efficient construction using moment curves.

    Let p > r be a prime, let N = p^{h+1}-2, let K be the finite field with p^{h+1} elements and fix a generator \theta of K^\times, so that K^\times is equal to \{\theta^0,\theta^1,\ldots, \theta^N\}. Define a set of p elements

    \displaystyle U = \{a \in \{0,1,2,\ldots,N\}: \theta^a - \theta \in \mathbb{F}_p\}.

    Then, each element a \in U corresponds to a unique value of \tilde{a} \in \mathbb{F}_p, by taking \tilde{a} = \theta^a - \theta. Now an additive relation of the form in (4) with s,s' \leq h can be reframed by taking powers of \theta as

    \displaystyle (\theta + \tilde{x_1})(\theta + \tilde{x_2})\cdots (\theta + \tilde{x_s}) = (\theta + \tilde{y_1})(\theta + \tilde{y_2})\cdots (\theta + \tilde{y_{s'}}). \qquad (6)

    As K is a degree-h+1 extension of \mathbb{F}\sb{p} and \theta is a generator of K as an \mathbb{F}\sb{p}-extension, this means that \theta does not satisfy any nonzero polynomials in \mathbb{F}\sb{p}[x] of degree \leq h. So, both sides of (6) are identical as polynomials in \mathbb{F}_{p}[\theta] and thus the additive relation in (4) is trivial. So, U is h-dissociated, and of course one can prune a few elements to reduce U to size r.

    Appendix 2 (Isaac)

    Fix constants \alpha,\beta,\gamma such that 0.5 < \beta\gamma < \beta < \alpha < 1 (in my paper I arbitrarily chose (\alpha,\beta,\gamma) = (0.9,0.8,0.7)). Let the two sets in (5) be called G_{m,r} and H_{m,r}. Let [a,b] denote the set of integers x satisfying a \leq x \leq b. Similarly to my paper, the constructions of A such that hA achieves the desired sizes will combine sets of the following four types:

    • B_{j,b} := [0,b-2] \cup \{b-2+j\} with choices of b \in [3, k-k^\gamma] and j \in [1,hb].
    • G_{m,r_m} for each value of m \in [3, h], with choices of r_m \in [0, (k-b)^\alpha].
    • H_{m,u_m} for each value of m \in [2,h-1], with choices of u_m \in [0, (k-b)^\beta].
    • A B_h set of the correct size so that |A| = k.

    One reason that this construction needs to be complicated is that we need to create at least \Omega(k^h) many sets. To do this, we vary 2h-4 parameters r_m and u_m in the domain [0,k^\alpha] and 2 parameters b and j in the domain [1,hk]. We can choose \alpha to be slightly bigger than 1/2, and then the above construction gives us O(k^{\alpha(2h-4)+ 2})=O(k^{h + \delta}) different sets where \delta >0 can be made arbitrarily small. So, if we were to remove any of the above parameters from the construction, and not change the others, this construction would no longer create \Omega(k^h) many sets. In comparison, Nathanson’s construction when h=2 only needs to create \Omega(k^2) sets. He does this by combining a Sidon set, an arithmetic progression, and one extra value, and varying the size of the arithmetic progression and the extra value in ranges of size O(k).

    We want to combine q = 2h-2 sets A_1,\ldots,A_q, which are given by B_{j,b}, G_{m,r_m} for the h-2 values of m \in [3,h], H_{m,u_m} for the h-2 values of m \in [2,h-1], and a B_h set. By Appendix 1, for all r \leq k, there exists a h^2-dissociated set {u_{1},\ldots,u_{r}} of diameter M \leq r^{2h^2} \leq k^{2h^2}. By the constructions of G_{m,r_m} and H_{m,u_m}, we can take each A_i \subseteq [0,M], where M \leq hk^{2h^2}. Let \mathbb{Z}^{2q} have basis vectors e_1,\ldots,e_{2q}. To combine A_1,\ldots,A_q, we can define A \subseteq \mathbb{Z}^{2q} as

    \displaystyle A = \bigcup_{i=1}^q (A_i e_i + e_{q+i}) \subseteq \{0,1,2,\ldots,M\}^{2q} \subseteq \mathbb{Z}^{2q}.

    Similarly to my Lemma 4.9, this construction ensures that the generating function product \mathcal{F}_{A}(z) = \prod_{i=1}^q \mathcal{F}_{A_i}(z) holds, which is the identity that both my paper and the GPT preprint use (see either paper for a definition of these generating functions). By (the standard) Lemma 2.3 of the GPT preprint, A is Freiman-isomorphic of order h to a subset of [0,2qM(2hM)^{2q-1}]. Therefore, for k sufficiently large (the whole construction relies on this for the same reasons as in my paper),

    \displaystyle N(h,k) \leq 2qM(2hM)^{2q-1} \leq 2\left(2h^2k^{2h^2}\right)^{2(2h-2)} \leq k^{10 h^3}.

    Appendix 3 (Isaac)

    In Section 4.2 of my paper, I use a different, simpler construction to construct sets A achieving the values in \mathcal{R}(h,k) which have |hA| < \varepsilon k^h, for some small \varepsilon. These sets A are subsets of {0,1,2,\ldots,k^h}, meaning that all elements have polynomial size in k. This is observed in Section 5 of the GPT preprint.

    Section 4.3 of my paper carries out the construction which combines many components including S and T. This corresponds to Sections 2, 3, 4, and 6 of the GPT preprint. This section has a lot of moving parts; I give an outline in Section 4.3.1.

    In Section 4.3.2, I describe how the different components will be combined, using a construction which I call the disjoint union, and introduce generating functions \mathcal{F}_A(z) as a bookkeeping tool to keep track of the sumset sizes of a set A. This corresponds to Section 2 and Section 4 of the GPT preprint.

    In Section 4.3.3, I compute the generating function of each of the component sets, including \mathcal{F}_S(z) (Lemma 4.15) and \mathcal{F}_T(z) (Lemma 4.17). This corresponds to Section 3 and Section 6.1 of the GPT preprint. In particular, \mathcal{F}_{G}(z) is computed in Lemma 3.3 and \mathcal{F}_{H}(z) is computed in Lemma 3.4. Once these generating functions have been computed, the remainder of the proof is almost identical in my paper and in the GPT preprint.

    In Section 4.3.4, I put all the pieces together to show that as we range over the sets A which I have constructed, the values of |hA| will assume all of the elements of {\lceil\varepsilon k^h\rceil, \lceil\varepsilon k^h\rceil+1,\ldots ,\binom{h+k-1}{h} }. The key idea is to show that the set of all values of |hA| forms an interval, and contains numbers both smaller than \varepsilon k^h and equal to \binom{h+k-1}{h}.

May 08, 2026

Matt von HippelBonus Info for “100-year-old assumption about the universe may soon be overturned”

I had a piece up in New Scientist last week (paywalled, sorry!), about a new analysis that suggests the universe is less homogeneous (more “lumpy”) that most cosmologists believe.

The piece was a bit different than my usual. Normally I do what people in the biz call “features”: longer articles about general trends. This was a much more classic “news piece”. The people I interviewed had several papers up in early April, the editors at New Scientist thought they were interesting enough to write about, so I was asked for a short, timely piece with the key takeaways.

That means I didn’t have a ton of space for background info. So if you’d like to know more, this post is for you!

The 100-year old assumption in the title refers to the Friedmann–Lemaître–Robertson–Walker (or FLRW) universe, an idea that first came together in the 1920’s, where cosmologists model the universe as homogeneous and isotropic: the same no matter where, or in which direction, you look. That sounds like a crazy assumption, but on the largest scales we can measure it’s actually mostly fine. Once you’re trying to calculate ripples in the cosmic microwave background or find out how fast distant galaxies are accelerating away, it works surprisingly well to act like the universe is an evenly-mixed soup of matter, radiation, dark matter, and dark energy.

But every assumption in physics has its doubters. The doubters of homogeneity are known as inhomogeneous cosmologists, and I’ve been sympathetic to their complaints for a while now.

I even let an inhomogeneous cosmologist do a guest post on my blog, back in 2019. That post argued something dramatic: that dark energy may not even exist, but that measurements of accelerating expansion may be a consequence of a dramatic lopsidedness in the universe around us.

The people I covered in New Scientist, Asta Heinesen, Tim Clifton, and Sofie Marie Koksbang, are arguing something much less dramatic…but that’s part of what makes it more compelling. Instead of arguing that the universe is dramatically uneven or lopsided, they’re arguing that the universe can still be on average smooth and homogeneous, the soup of galaxies people seem to expect…but still, can’t be fully modeled that way.

This is a tricky distinction to explain, and certainly something I didn’t have space to cover well enough in New Scientist. But let me take a stab at it here:

Any cosmologist will agree that FLRW can’t be the whole story. We know the universe isn’t a perfectly mixed soup: there are galaxies, and stars, and black holes, and they all wiggle the fabric of the universe in different places. When they study the universe as a whole, they’re averaging out all of that, to get the overall behavior, a bit like you could average the number of children in each family to get the average children per family in a country.

But FLRW isn’t just an average, it’s a model of spacetime. Because of that, it has to obey certain equations, called Einstein’s equations. It has to make sense by itself, as the correct answer for how spacetime would behave if it were filled with a uniform soup.

That’s an extra restriction, and that extra restriction can get you in trouble. To continue with the analogy, any real family has a whole number of children. But the average family doesn’t have a whole number of children. When I was born, the average family in the US had around 2.5 children. A lot of cartoons imagined what the half-child looked like.

From the perspective of Heinesen, Clifton, and Koksbang, assuming FLRW is a bit like assuming that the average family must have two children, or three, and can’t possibly have 2.5. Averages don’t have to look like sensible spacetimes, they don’t have to obey the Einstein equations.

In practice, the assumption of FLRW has worked a lot better than assuming that the average family can’t have 2.5 children, and that’s why Heinesen, Clifton, and Koksbang are cautious. They’re not claiming that inhomogeneity can explain everything, all the way to major components of the universe like dark energy. But they do think it can be a good explanation for smaller effects. And as cosmologists worry about smaller and smaller effects, wondering if dark energy changes over time and why the expansion rate of the universe doesn’t match up between different measurements, it can be important to remember that averages aren’t all-powerful. Eventually, they can break down. It’s a more subtle issue than a fractional child. But, as I covered in New Scientist, it may already be happening.

May 03, 2026

Terence TaoProducts of consecutive integers with unusual anatomy

I’ve just uploaded to the arXiv my paper “Products of consecutive integers with unusual anatomy“. This paper answers some questions of Erdős and Graham which were initially motivated by the study of the Diophantine factorial equation

\displaystyle  a_1! a_2! a_3! = m^2

where {a_1 < a_2 < a_3} and {m} are positive integers. Writing {(a,N,N+H) = (a_1,a_2,a_3)}, one can rewrite this equation as

\displaystyle  s( (N+1) \dots (N+H) ) = s(a!) \ \ \ \ \ (1)

where {s(n)} denotes the squarefree part of {n} (the smallest factor of {n} formed by dividing out a perfect square). For instance, we have

\displaystyle s(8 \times 9 \times 10) = 5 = s(6!)

which corresponds to the solution {6! 7! 10! = (6!\times 7!)^2} to the original equation.

The equation (1) ties into the general question of what the anatomy (prime factorization) of the product {(N+1) \dots (N+H)} looks like. This is a venerable topic, with the first major result being the Sylvester-Schur theorem from 1892 that the largest prime factor of {(N+1) \dots (N+H)} is greater than {H} whenever {N+1>H}. Another notable result is the Erdős-Selfridge theorem that the product {(N+1) \dots (N+H)} is never a perfect power for {H > 1}.

Erdős and Graham were able to show that solutions to (1) were somewhat rare, in that the set of possible values of {N+H} had density zero. For them, the hardest case to treat was when the interval {\{N+1,\dots,N+H\}} was what they called bad, in the sense that {(N+1) \dots (N+H)} was divisible by the square of its largest prime factor. They were able, with some effort, to show that the union of all bad intervals also had density zero, which was a key ingredient in to prove the previous result about solutions to (1). They isolated a subcase of the bad intervals, which they called the very bad intervals, in which the product {(N+1) \dots (N+H)} was a powerful number (divisible by the square of every prime factor).

A later paper of Luca, Saradha, and Shorey made the bounds more quantitative, showing that both the set of values of {N+H}, as well as the union of bad intervals, had density {O( \exp(-c \log^{1/4} x (\log\log x)^{3/4})} for some absolute constant {c>0}. In the other direction, just by considering the case {H=1}, one can show that the number of possible values of {N+H} up to {x} is {\gtrsim c_3^1 \sqrt{x}}, where {c_3^1} is the constant

\displaystyle  c_3^1 = 1 + \sum_{a \geq 2: a \neq n^2 \forall n} \frac{1}{s(a!)^{1/2}} = 3.70951\dots.

As for the bad intervals, by again considering the case {H=1}, it is possible to show that the number of bad points up to {x} is

\displaystyle  x / \exp((\sqrt{2}+o(1)) \sqrt{\log x \log\log x});

see for instance this paper of Ivic. Similarly, the union of the very bad intervals contains as the set of powerful numbers; Golomb worked out that the number of powerful numbers up to {x} is {\sim \frac{\zeta(3/2)}{\zeta(3)} \sqrt{x}}.

It was conjectured by Erdős and Graham that all of these lower bounds are in fact sharp (up to {1+o(1)} multiplicative factors); this is Erdos Problem 380 (and a portion of Erdos Problem 374). The main result of this paper is to confirm this conjecture in two cases and come close in the third:

Theorem 1
  • The number of numbers up to {x} that lie in a bad interval of length {H>1} is {O(1/\log^{1-o(1)} x)} of the number of bad points up to {x}.
  • The number of numbers up to {x} that lie in a very bad interval of length {H>1} is {O(x^{2/5+o(1)})}.
  • The number of numbers up to {x} of the form {N+H} for a solution to (1) is {O(x^{1/2+o(1)})}.

Not surprisingly, the methods of proof involve many standard tools in analytic number theory, such as the prime number theorem (and its variants in short intervals), zero density estimates, Vinogradov’s bounds on exponential sums, asymptotics for smooth numbers, the large sieve, the fundamental lemma of sieve theory, and the Burgess bound for character sums. There was one point where I needed a small amount of algebraic number theory (the classification of solutions to a generalized Pell equation), which was the one place where I turned to AI for assistance (though I ended up rewriting the AI argument myself). One amusing point is that I specifically needed the recent zero density theorem of Guth and Maynard (as converted to a bound on exceptions to the prime number theorem in short intervals by Gafni and myself); previous zero density theorems were barely not strong enough to close the arguments.

A few more details on the methods of proof. It turns out that very bad intervals, or intervals solving (1), are both rather short, in that the bound {H \leq \exp(\log^{2/3+o(1)} N)} holds. The reason for this is that the primes {p} that are larger than {H} (in the very bad case) or {C H \log N} for a large constant {C} (in the (1) case) cannot actually divide any of the {N+1,\dots,N+H} unless they divide it at least twice. This creates a constraint on the fractional parts of {N/p} and {N/p^2} that turns out to be inconsistent with the equidistribution results on those fractional parts coming from Vinogradov’s bounds on exponential sums unless {H} is small. In the very bad case, this forces a linear relation between two powerful numbers; expressing powerful numbers as the product of a square and a cube, matters then boil down to counting solutions to an equation such as

\displaystyle  n_1^2 n_2^3 + 1 = m_1^2 m_2^3

with say {n_1,n_2,m_1,m_2 \leq x^{1/5}}. The number of solutions here turns out to be {O(x^{2/5})} by work of Aktas-Murty and of Chan; we generalize the arguments in the former to handle a slightly more general equation. A similar argument handles solutions to (1), except in one regime where the parameter {a} is somewhat large (comparable to {H \log N}), in which case one instead collects some congruence conditions on {N} and applies the large sieve.

The situation with bad intervals is more delicate, because there is no obvious way to make {H} small in all cases. However, by the large sieve (as well as the Guth–Maynard theorem), one can show that the contribution of large {H} is negligible, and from bounds on smooth numbers one can show that the interval {\{N+1,\dots,N+H\}} contains a number with a particularly specific anatomy, of the form {p_0^2 p_1 \dots p_{1000} m'} where {p_0, \dots, p_{1000}} are all primes of roughly the same size, and {m'} is a smoother factor involving smaller primes. The rest of the bad interval creates some congruence conditions on the product {p_1 \dots p_{1000}}. Using some character sum estimates coming from the Burgess bounds, we find that the residue of {p_1\dots p_{1000}} becomes fairly equidistributed amongst the primitive congruence classes to a given modulus when one perturbs the primes {p_1,\dots,p_{1000}} randomly (there are some complications from exceptional characters of Siegel zero type, but we can use a large values estimate to keep their total contribution under control). This allows us to show that the congruence conditions coming from the bad interval are restrictive enough to make non-trivial bad intervals quite rare compared to bad points. One innovation in this regard is to set up an “anti-sieve”: the elements of a bad interval tend to have an elevated chance of being divisible by small primes, and one can use moment methods to show that an excessive number of small prime divisors is somewhat rare. This can be compared to standard sieve arguments, which often seek to limit the event that a number has an unexpectedly deficient number of small prime divisors.

May 02, 2026

n-Category Café Quantum Mechanics of the Inverse Cube Force Law

In the last episode of my column in Notices of the American Mathematical Society, we looked at a particle moving in an attractive central force whose strength is proportional to the inverse cube of the distance from the origin. Among other things, we saw that a particle moving in such a force can spiral in to the origin in a finite time. But that was classical mechanics. What about quantum mechanics?

Here things get more tricky. The uncertainty principle tends to prevent the particle from falling in to the origin. But when the attractive force is strong enough, the particle can still fall in. We can make up a theory where the particle shoots back out, but there are choices involved: we need to say how the particle changes phase when shoots back out. So there is not just a single theory, but many!

Why does the particle come back out? There are theories where it does not. In these theories, at least those studied so far, time evolution is nonunitary: that is, the probability of finding the particle somewhere or other does not stay equal to 11, because the particle simply disappears when it hits the origin. Here we focus on theories where time evolution is unitary and the particle comes back out. Many people have written about these, running into ‘paradoxes’ when they weren’t careful enough. Only rather recently have things been straightened out.

Let us dig into the details. In quantum mechanics, the Hilbert space of states of a particle in 3\mathbb{R}^3 is L 2( 3)L^2(\mathbb{R}^3). In a central force whose strength is proportional to 1/r 31/r^3, such a particle has a Hamiltonian of this form:

H= 2+cr 2 H = -\nabla^2 + c r^{-2}

The first term describes the particle’s kinetic energy, while the second describes its potential energy: remember, taking the gradient of an inverse square potential gives an inverse cube force. I have set some constants to 11 to remove irrelevant clutter, but we need the constant cc to say how strong the force is. When c<0c \, &lt; \, 0, the force is attractive.

In this game, analysis is paramount. We should interpret HH as a densely defined linear operator on L 2( 3)L^2(\mathbb{R}^3). For this, we choose a dense linear subspace DL 2( 3)D \subset L^2(\mathbb{R}^3) and treat HH as a linear map from DD to L 2( 3)L^2(\mathbb{R}^3). Different choices of DD correspond to different physical assumptions: for example, assumptions about what happens when the particle falls into the origin.

To get unitary time evolution in quantum mechanics, we need the Hamiltonian to be self-adjoint. But adjoints of densely defined operators are tricky. Let us briefly recall how they work. Given a Hilbert space \mathcal{H} and a linear operator AA from a dense linear subspace D(A)D(A) \subseteq \mathcal{H} to \mathcal{H}, we define D(A *)D(A^*) to be the set of all ψ\psi \in \mathcal{H} for which there exist ψ\psi' \in \mathcal{H} such that

ψ,ϕ=ψ,Aϕ for all ϕD(A). \langle \psi' , \phi \rangle = \langle \psi, A \phi \rangle \; \text{ for all } \; \phi \in D(A).

If such a vector ψ\psi' exists, it is unique, and it depends linearly on ψ\psi. Thus, for ψD(A*)\psi \in D(A\ast) we define A*ψA\ast \psi to be the vector ψ\psi' with the above property. The adjoint of AA is then the linear operator A*:D(A*) A\ast \colon D(A\ast) \to \mathcal{H}. We say AA is self-adjoint if A=A*A = A\ast. We say that AA is essentially self-adjoint if it has a unique extension to a self-adjoint operator. If it does, this extension must be A*A\ast.

All this raises the question of whether the Hamiltonian HH for the inverse cube force law can be made self-adjoint with a suitable choice of domain. It turns out we can always do it, but sometimes in more than one way. There are three regimes:

  • c34c \ge \tfrac{3}{4}. In this case we can start with the domain C 0 ( 3{0})C_0^\infty(\mathbb{R}^3 - \{0\}) consisting of smooth functions that are compactly supported on 3\mathbb{R}^3 minus the origin. The operator HH is unambiguously defined on this domain, and it is essentially self-adjoint.

  • 14c<34-\tfrac{1}{4} \le c \, &lt; \, \tfrac{3}{4}. In this case HH is still well-defined on the domain C 0 ( 3{0})C_0^\infty(\mathbb{R}^3 - \{0\}), but it is not essentially self-adjoint. In fact, it admits more than one self-adjoint extension! However, HH is bounded below: there is a constant E 0E_0 such that ψ,HψE 0ψ,ψ \langle \psi, H \psi \rangle \ge E_0 \langle \psi, \psi \rangle for all ψC 0 ( 3{0})\psi \in C_0^\infty(\mathbb{R}^3 - \{0\}). Physically, this means that the particle’s energy is bounded below by E 0E_0. Mathematically, this implies that HH has a canonical choice of self-adjoint extension called the ‘Friedrichs extension’, with the smallest possible domain. But there is another canonical choice, the ‘Krein extension’, with the largest possible domain.

  • c<14c \, &lt; \, -\tfrac{1}{4}. In this case HH is well-defined on the domain C 0 ( 3{0})C_0^\infty(\mathbb{R}^3 - \{0\}), and it has more than one self-adjoint extension, but it is not bounded below.

These strange results demand explanation. For example, what is special about c=14c =-\tfrac{1}{4}? In classical mechanics, the energy of a particle in the inverse cube force ceases to be bounded below as soon as c<0c \, &lt; \,0. Quantum mechanics is different. To get a lot of negative potential energy, the particle’s wavefunction must be peaked near the origin, but that gives it kinetic energy. The tradeoff is captured by Hardy’s inequality. This says that for any ψC 0 ( 3)\psi\in C_0^\infty(\mathbb{R}^3) we have

ψ,( 214r 2)ψ0. \langle \psi, (-\nabla^2 - \tfrac{1}{4} r^{-2}) \psi \rangle \ge 0 .

This is why HH is bounded below when c14c \ge -\tfrac{1}{4}.

On the other hand, the constant 14\tfrac{1}{4} in Hardy’s inequality cannot be improved, so if c<14c \, &lt; \, \tfrac{1}{4} we can find ψ\psi with ψ,Hψ<0 \langle \psi, H \psi \rangle \, &lt; \, 0. Then we can use a remarkable property of the r 2r^{-2} potential to show that HH is not bounded below. Namely, HH has a kind of symmetry under dilations. You can guess this by noting that both the Laplacian and r 2r^{-2} have units of 1/length 2{}^2. Indeed, if you take any smooth function ψ\psi, dilate it by a factor of α\alpha, and then apply HH, you get α 2\alpha^{-2} times what you get if you do these operations in the other order. This implies that if

ψ,Hψ=Eψ,ψ, \langle \psi, H \psi \rangle = E \langle \psi, \psi \rangle ,

we can dilate ψ\psi and get a function obeying the same equation with EE replaced by α 2E\alpha^{-2} E. Thus, as soon as EE can be negative, it can be made arbitrarily large and negative by choosing α\alpha to be very small. Thus HH is not bounded below.

Next, what is special about c=34c = \tfrac{3}{4}? This is more subtle. For any value of cc \in \mathbb{R} we can find spherically symmetric solutions of ( 2+cr 2)ψ=iψ ( -\nabla^2 + c r^{-2})\psi = i \psi on 3{0}\mathbb{R}^3 - \{0\} that are nonzero and smooth. When c<34c \, &lt; \, \tfrac{3}{4}, and only in this case, some of these solutions ψ\psi lie in L 2( 3)L^2(\mathbb{R}^3). This dooms the chance of HH being essentially self-adjoint, because it implies H*ψ=iψH\ast \psi = i \psi. If HH were essentially self-adjoint H*H\ast would be self-adjoint, and it is easy to see that a self-adjoint operator cannot have ii as an eigenvalue.

When c<34c \, &lt; \, \frac{3}{4} the operator HH has more than one self-adjoint extension from C 0 ( 3{0})C_0^\infty(\mathbb{R}^3 - \{0\}) to some larger domain. To classify these we can use separation of variables, writing 2\nabla^2 as a sum of a radial part and an angular part, assuming the angular dependence of ψ\psi is given by a spherical harmonic Y mY_{\ell m}, and doing a change of variables u=ψ/ru = \psi/r to reduce HH to the ordinary differential operator

d 2dr 2+(c+(+1))1r 2 - \frac{d^2}{d r^2} + \left(c + \ell(\ell+1)\right) \frac{1}{r^2}

on the half-line (0,)(0,\infty). We can completely classify self-adjoint extensions of this differential operators from C 0 (0,)C_0^\infty(0,\infty) to larger domains; the answer depends on cc and \ell. A choice of self-adjoint extension is a choice of boundary conditions at r=0r = 0, and this says how the phase of an incoming wave changes as it reflects off the origin and bounces back. Finally, we can assemble the results for different spherical harmonics to classify self-adjoint extensions of HH.

There exist many self-adjoint extensions of HH that respect the rotational symmetry of the inverse cube force law, but for c<14c \, &lt; \, -\tfrac{1}{4} the extension must break the dilation symmetry discussed above. This is what physicists call an ‘anomaly’: a symmetry of a classical system that fails to be a symmetry of the corresponding quantum system. Intriguingly, for some even lower values of cc one can choose a self-adjoint extension that is symmetrical under a discrete subgroup of dilations. Determining precisely which values these are seems to be an open problem.

To explore this topic thoroughly, I recommend first this:

then this:

and finally this:

The first is an excellent overview of problems associated to singular potentials, including the inverse cube force. The second delves into self-adjoint extensions of the ordinary differential operators mentioned above, and the third works them out with exquisite thoroughness.

May 01, 2026

Matt von HippelBreakthrough Prize 2026

Because of last week’s “bonus info” post, I’m only now getting around to commenting on this year’s Breakthrough Prizes in Fundamental Physics. While I don’t comment on them every year, I know enough about several of this year’s winners that I figured a post would be helpful.

For those who haven’t heard of it, the Breakthrough Prizes are a bit like the Nobel, if it was created by a 21st century rich person instead of a 19th century one. They give out more money, and instead of an organization like the Swedish Academy of Sciences they pick winners via a committee of past winners. They’re more flexible in structure than the Nobel, with extra prizes for early-career researchers and a tendency to reward accomplishments that are either entirely theoretical or solid experimental work that doesn’t show a new discovery, both of which are things the Nobel Prize is structured to avoid. They’ve also shown willingness to reward large collaborations, rather than following the Nobel’s informal rule to only give the award to three people at a time.

This last was on display this year in their main award in physics this year, for the muon g-2 collaborations. The award is going to collaborations of scientists and engineers at three different particle colliders, for work done over a span of over fifty years to measure the magnetic properties of the muon. These measurements have shown a tantalizing discrepancy with predictions that inspired many to conjecture new physics. However, in the last few years it’s looked more and more like the discrepancy was due to an imprecise prediction, and better methods seem to be converging to the experimental value. At this point, smart money is that there is no disagreement with the Standard Model here, but as always in science there’s a chance some mystery remains.

The Breakthrough Prize also offered a special, out-of-schedule prize to David Gross. Already a Nobel laureate, Gross had a crucial role in our understanding of the force of quantum chromodynamics that binds protons and neutrons together. He was also a major founding figure in string theory, and since the Breakthrough Prize is more comfortable recognizing theoretical contributions they get to mention this as well. Gross is also known in the community for his personality, which tends to fill up any room he’s in. I can only imagine the conversations that led to Breakthrough’s decision to add a special prize for him this year.

Breakthrough is also adding a new recurring prize, the Vera Rubin New Frontiers Prize, honoring women who make important contributions to physics within two years of their PhD. The prize is a bit smaller than the exiting early-career New Horizons in Physics Prizes, presumably because it goes to even younger researchers. This year’s winner is from my old field, scattering amplitudes. Carolina Figueiredo is part of the latest evolution of the research program behind the amplituhedron. The new framework of “surfaceology” seems like a promising geometry-flavored way to understand particle physics calculations in more realistic theories, and unlike its predecessors may have some practical value eventually as well. Congrats Carolina!

Finally, the New Horizons in Physics Prizes are for impressive early-career researchers. I don’t know much about the first recipient, Benjamin Safdi, who works on searches for axions and axion-like particles, today’s most trendy dark matter candidate. I know a bit more about the work done by Clay Córdova, Thomas Dumitrescu, Shu-Heng Shao, and Yifan Wang, having met several of them in my physics career. They work on what are called generalized symmetries, concepts which go beyond the usual idea of how symmetry is supposed to work by involving more complicated tensors. I saw these crop up a fair bit in talks, but they were distant enough from my area that I never had a particularly clear grasp of what people were doing with them. I know even less about the work of the last three, Dillon Brout, J. Colin Hill, Mathew Madhavacheril, Maria Vincenzi, Daniel Scolnic, and W. L. Kimmy Wu, on cosmological measurements, but I was friends with Mathew in grad school and am impressed that he’s now working on cosmology given how little cosmology research there was at Stony Brook at the time.

April 30, 2026

Scott Aaronson Will you heed my warnings NOW?

Holy crap … yesterday I was elected to the US National Academy of Sciences! If you don’t believe me, click the link and keep scrolling down until you hit the name “Aaronson.” But then continue scrolling to see 144 other inductees, including my IAS postdoctoral classmate Maria Chudnovsky, my longtime friend and colleague Salil Vadhan, and even Janet Yellen. I’m humbled to be in such company.

Years ago, somewhere on this blog, I mused that, if I were ever invited to join NAS, I hoped I’d follow the wisdom of Richard Feynman, who famously resigned his NAS membership, comparing it to an honor society back at his high school that spent most of its time debating who should be a member of the honor society. Feynman was also annoyed at having to pay dues.

But now that I’m actually faced with the choice, it’s like, dude! At my advanced age of 44, I’ve encountered so many people who dislike me or even sneer at me, and so many clubs that won’t have me as a member, that I feel mostly gratitude and warmth toward a fine club like NAS that will have me as a member. Anyway, I’ll certainly try it out to see what it’s like—even Feynman did that!

A few hours after I started getting congratulatory emails, for which I was thankful, someone from UT Austin’s press office asked me how I feel about this “culmination” and “capstone” of my entire research career. I replied, look, I know I’ve slowed down a lot since my nubile twenties, but I still hold out the hope that this isn’t any kind of “capstone”!

In any case, I’m ridiculously grateful to all the friends, family, colleagues, and readers who believed in me and helped me reach wherever this is.


Now for a totally different topic, but that will ultimately loop back to the first one:

Last week, I did an Ask Me Anything about quantum computing and blockchain for stacker.news, a forum devoted to bitcoin. Thanks to Will Scoresby for organizing it.

As a longer-term commitment, I also collaborated with my colleagues Dan Boneh, Justin Drake, Sreeram Kannan, Yehuda Lindell, and Dahlia Malkhi, in a panel convened by Coinbase, to put out a detailed position paper about the quantum threat to cryptocurrencies and how best to respond to it. Take a look!

Notably, the situation evolved even while we were writing our position paper—for example, with the major recent papers from Google and Caltech/Oratomic that I blogged about a month ago.

I’d now like to add a few words of my own, not presuming to speak for my fellow Coinbase panelists.

See, some of the most reputable people in quantum hardware and quantum error-correction—people whose judgment I trust more than my own on those topics—are now telling me that a fault-tolerant quantum computer able to break deployed cryptosystems ought to be possible by around 2029.

Maybe they’re overoptimistic. Maybe it will take longer. I dunno. I’m not a timing guy.

But here’s what I do know: the companies racing to scale up fault-tolerant QC, have no plans to slow down in order to “give cybersecurity time to adapt” or whatever. The way they see it, cryptographically relevant QCs will plausibly be built sometime soon: indeed, it’s ultimately unavoidable, even if people’s only interest in QC was to do quantum simulations for materials science and chemistry. So, given that reality, isn’t it better that it be done first by mostly US-based companies in the open, than by (let’s say) Chinese or Russian intelligence in secret? And besides, haven’t there already been years of warnings and meetings about the quantum threat to RSA, Diffie-Hellman, and elliptic curve cryptography? Aren’t many in cybersecurity still in denial about the threat? Haven’t these slumberers shown that they won’t wake up until dramatic achievements in fault-tolerant QC roust them—the way Anthropic’s Mythos model has now jolted even the most ostrich-like about the cybersecurity risks of AI? So, mixing metaphors, mightn’t we just as well rip this Band-Aid off ASAP, rather than giving foreign intelligence agencies extra years to catch up? Indeed, when you think about it that way, isn’t racing to build a cryptographically relevant QC, as quickly as possible, the most ethical, socially responsible thing for an American QC company to do?

Is the above line of reasoning suspiciously self-serving and convenient? Does it remind you of the galaxy-brained arguments that AI company after AI company has offered over the last decade for why “really, if you think about it, accelerating toward dangerous superintelligence is the safest course of action that we could possibly take”? I.e., the arguments that led to the current frenzied AI race, which some believe imperils all life on earth?

It’s not my place here to answer such questions; I leave further ethical and geopolitical debate to the comment section! My point is simply: whether or not anyone likes it, this is how some of the leading QC companies are now thinking about the Shor of Damocles that they genuinely believe now hangs over the Internet.

And I’d say that that makes my own moral duty right now ironically simple and clear: namely, to use my unique soapbox, as the writer of The Internet’s Most Trusted Quantum Computing Blog Since 2005TM, to sound the alarm.

So, here it is: if quantum computers start breaking cryptography a few years from now, don’t you dare come to this blog and tell me that I failed to warn you. This post is your warning. Please start switching to quantum-resistant encryption, and urge your company or organization or blockchain or standards body to do the same.

Yea, heed my warning, for it comes not from some WordPress-using rando, but from the inventor of BosonSampling and PostBQP and shadow tomography, the Schlumberger Centennial Chair and Founding Director of the Quantum Information Center at the University of Texas at Austin, and (wait for it) new member of the US National Academy of Sciences, that august and distinguished body brought into being by President Abraham Lincoln in 1863.

Because, you know, none of this is about me. It’s only about you. And whether you’ll listen to me.

April 24, 2026

Matt von HippelBonus Info for “Quantum ‘Jamming’ Explores the Truly Fundamental Principles of Nature”

I had a new piece in Quanta Magazine last week, about a hypothetical trick in theories beyond quantum mechanics called jamming.

Sometimes, I get science news stories from contacts. Sometimes I see an academic post something cool on X or Bluesky. But when the stories aren’t coming easy, I open up arXiv.org, click on “new”, and start browsing. And occasionally, I spot something cool.

That happened with jamming. I saw the concept mentioned in an abstract, the idea that someone could “jam” quantum entanglement from afar, like you would jam a radio signal. I hadn’t heard of it before. I wanted to know more. And after I talked to Quanta’s editors, they wanted to know more too.

Jamming is not possible under the rules of quantum mechanics we know. Instead, it’s something that could be possible in a kind of super-quantum mechanics, a theory even weirder than the famously weird theory we use today. In my piece for Quanta, I talked about where the idea of jamming comes from, and why it’s spurring discussion in recent years. In this post, I wanted to give some “bonus info” that didn’t fit into the piece.

One theme I didn’t have as much space to explore is causality.

Quantum mechanics famously seems to do weird things with cause and effect. In a double-slit experiment, photons pass one by one through one of two slits in a wall, headed to a photographic screen. No matter how slowly and carefully you send the photons, their distribution on the other end will show interference between the two possible paths, one through each slit, even though each photon only goes through one. It’s as if before hitting the screen, the photons are simultaneously traveling on every possible path, only to pick one in the moment the photon is detected.

Einstein was bothered by this. He imagined a photographic screen so large it would take light years to cross. How could detecting a photon on one side change the possibility of detecting a photon on the other side? That seemed, to him, to require signals traveling faster than light, which in turn would screw up cause and effect, as any way to send a signal faster than light can also, from another perspective, send a signal back in time.

The answer most physicists accept is that no signal can be sent in this way…at least, in the modern sense. Quantum outcomes are random, so while you could imagine that a measurement in one place changes the outcome in another place, your choice to measure has no effect on that distant outcome. You can’t intentionally send a message faster than light. We call that “no-signaling”, and it prevents the paradoxes of time travel.

Jamming obeys similar rules. A jammer (in the story in my article, a magician named Jim) can modify the entanglement between two distant particles, seemingly faster than light. But he can only do this in a way that involves randomness, so that the probabilities for measurement results for each individual particle stay the same. Instead, he can only modify how measurements between the two particles are related, their correlation. And he can only do this if the two particles can only be compared in a region that he can reach without traveling faster than light.

That’s enough to allow Jim to break the security of many quantum cryptography procedures. He can do this for example by mimicking entanglement: quantum cryptography often uses entanglement to verify that a message hasn’t been tampered with. If you can modify correlations from afar, you can make two particles appear to be entangled when actually they’re related by some other rules, which give you access to the secret that others are trying to hide.

Part of what’s still under discussion, is whether that kind of trick is compatible with causality. This depends a lot on how you think causality is supposed to work, and while the people I talked to are trying to get the story straight, they weren’t in agreement yet. In particular, Vilasini and Colbeck seemed to think that there was an important difference between the way that jamming bends causality and the way that ordinary quantum mechanics does, while Eckstein and Ramanathan weren’t so sure.

More broadly, Vilasini and Colbeck have a broader way of thinking about causality that I only barely touched on. Part of that is ways you can think of one event causing another even if no signal can be sent between them. Part of that is time loops, but of a limited kind: loops that can’t cause paradoxes, because they’re loops of causes, but not intentional signals. Vilasini and Colbeck have argued that jamming, if it existed, could be used to set up these kind of limited time loops, in a piece that was covered by New Scientist. It should be emphasized that these are really very limited time loops, for more reasons than one. They’re also limited to being in only one spatial dimension: that is, everyone in the loop has to be lined up in exactly a straight line. And I got the impression they also require everyone to activate their measurement or jamming devices instantly: with any small delay, the loop breaks.

I said even less about Mirjam Weilenmann’s critique, because there were bigger aspects that the researchers still disagreed on when I spoke with them. Weilenmann’s argument looks at what happens when there are multiple jammers, jamming different pairs of entangled particles. I got the impression from her that she felt she had found a contradiction in these examples, where jamming could only work if it broke its essential no-signaling rules. But Eckstein and Ramanathan seemed to think she was describing a scenario where one jammer could cause noise that would disrupt another jammer, “jamming the jammers” in a sense that didn’t cause any fundamental problems, just introduced jammer vs. jammer combat to make the story more interesting. I opted to not say much about this, since it was clear that things weren’t resolved yet. The researchers are still talking, and I look forward to hearing what they conclude when they reach agreement.

I also didn’t say much about tests in the real world. But that is something Eckstein and collaborators are actively exploring. They’re investigating experiments that could show deviations from quantum mechanics in a variety of contexts, from tabletops in university labs to particle colliders. The hope is that some of these strange ideas could actually be tested.

In general, the impression I got was that despite the seeds of this topic being laid thirty years ago, and reintroduced to the field ten years ago…the topic is heating up right now, in a way it hadn’t before. I’m expecting more jamming papers. If they’re cool enough, I may even cover some of them.

April 19, 2026

Scott Aaronson Three greats who we’ve lost

Sir Charles Antony Richard Hoare (1934-2026) won the 1980 Turing Award for numerous contributions to computer science, including foundational work on concurrency and formal verification and the invention (with Dijkstra) of the dining philosophers problem. But he’s perhaps best known, to pretty much everyone who’s ever studied CS, as the inventor of the Quicksort algorithm. I’m sorry that I never got to meet him.

Michael O. Rabin (1931-2026), of Harvard University, was one of the founders of theoretical computer science and winner of the 1976 Turing Award. In 1959, he and Dana Scott introduced the concept of a “nondeterministic machine”—that is, a machine with exponentially many possible computation paths, which accepts if and only if there exists an accepting path—which would of course later play a central role in the formulation of P vs. NP problem. He’s also known for the Miller-Rabin primality test, which helped to establish randomness as a central concept in algorithms, and for many other things. He’s survived by his daughter Tal Rabin, also a distinguished theoretical computer scientist. I was privileged to meet the elder Rabin on several visits to Harvard, where he showed me great kindness.

Sir Anthony Leggett (1938-2026), of the University of Illinois Urbana-Champaign, was one of the great quantum physicists of the late 20th century, and recipient of the 2003 Nobel Prize for his work on superfluidity. When I knew him, he was a sort of elder statesman of quantum computing and information, who helped remind the rest of us of why we got into the field in the first place—not to solve Element Distinctness moderately faster, but to learn the truth of quantum mechanics itself. Tony insisted, over and over, that the validity of quantum mechanics on the scale of everyday life is an open empirical problem, to be settled by better experiments and not by a-priori principles. I first met Tony at a Gordon Research Conference in southern California. Even though I was then a nobody and he a recent Nobel laureate, he took the time to listen to my ideas about Sure/Shor separators, and to suggest (correctly) what we now call 2D cluster states as an excellent candidate for what I wanted. In all my later interactions with Tony, at both the University of Waterloo (where he was visiting faculty for a while) and at UIUC (where my wife Dana and I considered taking jobs), he was basically the friendliest, funniest guy you could possibly meet at his level of achievement and renown. I was bummed to hear about his passing.

April 16, 2026

Clifford JohnsonComputing Correlators

[A more technical post follows]

My most recent paper, out on the arXiv today, is very exciting to me because it seems to be a genuinely new way of computing some important quantities and it is devilishly simple. So simple that I worried for months that it is all super-obvious to everyone. But another voice within me said to myself: Well if it is so obvious, why has nobody published it? Another (paranoid) voice within said: Maybe someone has published this method, and I just can't find it in the literature...

Well, I decided that the best way to find out for sure is to put it on the arXiv and within a short time someone will email to say that I missed their important work. So, while I wait for that email (as I start writing it's only been 30 minutes since it has been "out there", so there's time), let me say a few things about why I like the many results in the paper.

I was already pleased enough with the core part of the paper that I was going to write a swift four-pager about it back in February. The core point being that I figured out how to build on work I'd done in a paper back in 2024 (expanded on with followup work I did with Wasif Ahmed and Krishan Saraswat, a student and postoc). Back in 2024, I found (here) a really nice way (almost miraculous in how it worked) of writing all the corrections to the spectral density of a class of models in terms of one function [latex]u_0[/latex] and its derivatives. It was obtainable from one simple ordinary differential equation (ODE) called the Gel'fand-Dikii equation, which takes in the function [latex]u_0(x)[/latex] as input. The ODE is for a special quantity called the diagonal resolvent [latex]{\widehat R}(x,E)[/latex]. You integrate that quantity [latex]\widehat R(x,E)[/latex] with respect to [latex]x[/latex] and you're more or less home. In general, it is a messy quantity that does not integrate to anything nice. But just when the function [latex]u(x)[/latex] obeys the "string equation" it is supposed to (as dictated by the governing model's physics), then [latex]{\widehat R}(x,E)[/latex] is a total derivative (a seeming miracle-see later), and the corrections it gives to the density become of just the right form!

Those corrections can be called [latex]W_{g,1}(E)[/latex] where the [latex]g[/latex] is the order in perturbation theory. [latex]g=0[/latex] is leading order, [latex]g=1[/latex] is the torus, [latex]g=2[/latex] the double torus, etc. Indeed [latex]g[/latex] is the number of handles or "genus" of an associated Riemann surface. The one subscript on the other hand, corresponds to the one energy entry available when just discussing the density [latex]\rho(E)[/latex]. All the [latex]W_{g,1}[/latex] end up being written nicely in terms of a function [latex]u_0(x)[/latex] and its derivatives, evaluated at a special point.

An already nice feature (among many) of the construction was that this one ODE, recursively solved, gave rise to the [latex]W_{g,1}[/latex] of many different problems across a range, including certain random matrix models, gravity problems, intersection theory and topology, and so on. All you need to do is change the function [latex]u_0(x)[/latex]. Moreover, for this (wide) class of problems, you can compute the desired results faster and with way less machninery than other methods, such as topological recursion, which was an interesting observation. This includes very famous problems like the Weil-Petersson volumes (of the compactified moduli space [latex]\overline{\cal M}_{g,1}[/latex] of Riemann surfaces with genus [latex]g[/latex] and [latex]n=1[/latex] boundaries) and generalisations. Another nice feature is that you also get non-perturbative data beyond the genus expansion, an aspect I explored recently (in this paper) with student Joao Rodrigues, and expert in resurgence techniques.

The core breakthrough of the new paper is this: For some time, I've wondered how to compute correlators for more energies (amounting to multi-point correlators of [latex]\rho[/latex]) in this same way: [...] Click to continue reading this post

The post Computing Correlators appeared first on Asymptotia.

April 13, 2026

John PreskillUnleashing the Advantage of Quantum AI

As experimental capabilities advance rapidly, the quantum computing community faces a critical elephant in the room: What will these quantum machines eventually be useful for? Will they deliver the promised broad societal impact, or will they remain highly specialized devices for exotic tasks known only to the experts?

The elephant in the room

Despite decades of effort, conclusive evidence of large quantum advantage in real-world applications remains confined to a few niche domains, such as simulating quantum materials and cryptanalysis. These problems are either inherently quantum to begin with, or they possess specialized mathematical structure that quantum algorithms can easily exploit. But it seems unlikely that such structures appear broadly in everyday life.

Indeed, most applications of modern computation hinge on the processing of massive, noisy classical data, generated at an unprecedented pace across society. That is the driving force behind the overwhelming success of machine learning and AI. Since the data originates from the macroscopic classical world, there is no obvious reason it should exhibit the delicate, specialized structures that quantum computers require. To playfully adapt Richard Feynman’s famous quote: We live in an effectively classical world, dammit, and maybe classical computers and AI already suffice for most of our problems. (For those unfamiliar, Feynman originally quipped: “Nature isn’t classical, dammit, and if you want to make a simulation of nature, you’d better make it quantum mechanical.”)

The central challenge

To truly unlock the power of a quantum computer, quantum algorithms typically need to access data in quantum superposition, processing many different samples simultaneously in different branches of the quantum multiverse. To use technical jargon, this is called querying a quantum oracle. But in reality, the classical data samples that we want to process are generated from everyday activities in a classical world, and we can only access them one at a time.

Think of the movie reviews you scroll through on a streaming platform. How would you read the plain-text reviews from a million different users all at once in a quantum superposition? This bottleneck—the challenge of efficiently accessing the classical world in quantum superposition—is known as the data loading problem. It has arguably been one of the main obstacles to achieving broadly applicable quantum advantage.

Sketching a quantum oracle

In this new work [1], we provide a solution to this seemingly impossible challenge. We develop a framework, called quantum oracle sketching, that enables us to access the classical world in quantum superposition in an optimal way. Importantly, it automatically handles the noise and correlations in the data, and natively supports flexible data structures like vectors and matrices that enable machine learning applications.

The core mechanism relies on processing data as a continuous stream. For each classical data sample we observe, we apply a carefully designed, small quantum rotation to our system. By sequentially accumulating these quantum rotations, we incrementally build up an accurate approximation of the target quantum oracle, which can then be used in any quantum algorithm for data processing. Because every data sample is processed once and immediately discarded, we completely eliminate the massive memory overhead typically required to store the dataset. The fundamental price to pay for assembling quantum queries from classical data lies in the sample complexity: our algorithm consumes a number of samples that scales quadratically with the number of quantum queries we need to make. We show that this rate is optimal and fundamentally arises from the quadratic relationship between quantum amplitudes and classical probabilities governed by the Born rule.

With the data successfully loaded into the quantum computer, the final challenge is to efficiently read out classical results. To address this, we develop an efficient measurement protocol called interferometric classical shadow. Combined with quantum oracle sketching, it allows us to circumvent the data loading and readout bottleneck to construct exponentially compact classical models from massive classical data with quantum technology.

Exponential quantum advantage in machine learning

Using this new approach, we are finally able to find exponential quantum advantage in processing classical data and machine learning. We rigorously prove that a small quantum computer can perform large-scale classification and dimensionality reduction on massive classical data by processing samples on the fly. In contrast, any classical machine achieving the same prediction performance requires exponentially larger size. When the classical machine does not have the required exponentially large memory size, it needs super-polynomially more samples and time relative to our protocol running on a quantum device. Remarkably, this illustrates that quantum technology enables us to construct compact and accurate classical models out of classical data, which is impossible with classical machines alone unless given exponentially larger memory.

The true scale of this exponential memory advantage is staggering. A quantum processor with 300 logical qubits can outperform a classical machine built from every atom in the observable universe. Of course, to actually see such a comical contrast, we would also need universe-scale datasets and processing time.

To contextualize these results in realistic scenarios, consider a large-scale scientific experiment, like a large particle collider. Each experimental run generates a colossal volume of data. With a quantum computer, we can keep squeezing all the data into this tiny quantum chip to perform downstream machine learning tasks such as classification and dimensionality reduction. But if we only have classical machines, we would need to build massive, energy-consuming data centers to store the raw data to match the performance. Without this massive memory overhead, classical machines simply couldn’t extract the same clear signals from a single run, forcing us to repeat the massive, expensive experiment many more times to compensate. To put this into perspective, the Large Hadron Collider (LHC) at CERN generates petabytes (millions of gigabytes) of data per hour, but the data storage bottlenecks force researchers to discard all but a tiny fraction—retaining perhaps only one in a hundred thousand events.

We validated these quantum advantages on real-world datasets, including movie review sentiment analysis and single-cell RNA sequencing. In these public datasets, we demonstrate four to six orders of magnitude (ten thousand to a million times) reduction in memory size with fewer than 60 logical qubits. Given the rapid advancements in high-rate quantum error correction codes and experimental techniques, quantum computers capable of demonstrating such applications are foreseeable in the near future. Crucially, the quantum advantage we propose likely carries a clearer positive impact for society and likely arrives sooner than the applications in cryptanalysis, where the current best estimate requires a thousand logical qubits.

Towards Quantum AI

Our results provide strong evidence that the utility of quantum computers extends far beyond specialized tasks, opening a path for quantum computers to be broadly useful in our everyday life. Rather than fearing that classical AI will “eat quantum computing’s lunch,” we now have rigorous evidence pointing towards a much more exciting prospect: quantum-enhanced AI overpowering classical AI.

Of course, there is still a long way to go towards the dream of quantum intelligence. Our current results establish the provable supremacy of quantum machines in foundational machine learning tasks, such as high-dimensional linear classification and dimensionality reduction. They do not yet imply immediate utility for modern generative AI such as large language models.

That said, our results give me a strong feeling that we are living in an age strikingly reminiscent of the traditional machine learning era—an age dominated by support vector machines and random forests; an age when we relied on rigorous statistical analysis because we lacked the computational resources for large-scale heuristic exploration; an age that ultimately heralded the birth of deep learning and the AI revolution. Today, quantum AI seems to sit at a similar historical position. I cannot wait to see what quantum AI will become once we are capable of unconstrained heuristic exploration on large-scale fault-tolerant quantum computers.

To accelerate this dawn of quantum AI, we invite physicists, computer scientists, developers, and machine learning practitioners to join our efforts and help us push the boundaries of what quantum AI can achieve. To bridge the gap between abstract quantum theory and hands-on machine learning practice, we are open-sourcing our core framework. Our numerical implementation of quantum oracle sketching is built in JAX, natively supporting GPU/TPU acceleration and automatic differentiation to integrate nicely with modern machine learning pipelines. Check out the code, run the simulations, and help us shape the future of quantum AI at github.com/haimengzhao/quantum-oracle-sketching!


References

[1]. Haimeng Zhao, Alexander Zlokapa, Hartmut Neven, Ryan Babbush, John Preskill, Jarrod R. McClean, and Hsin-Yuan Huang. Exponential quantum advantage in processing massive classical data, arXiv:2604.07639, 2026.

John PreskillHow I learned to stop worrying and…no, I’ve always adored entropy

When I was pursuing a PhD at Caltech, so was my friend Jeremy. He used to throw a dinner party every few months. The email invitations welcomed friends to partake of his cooking and, if we wished, to help him cook. I didn’t help cook; but, when I arrived, the mess of pots and pans drew me to the kitchen like vinegar drawing a pathological fly. I couldn’t sit still while cookware needed cleaning, so I scrubbed and rinsed the pans and spoons and bowls. Jeremy, an applied-physics student, commented on my adeptness at decreasing entropy.

It’s the story of my life, I replied.

In fourth grade, my classmates and I cleaned our desks every Friday afternoon. Once a student finished, my teacher dismissed him or her onto the playground. My neighbor’s desk horrified me like the disaster in a hurricane’s wake, so I neatened his desk after finishing with mine.1 Another friend requested the same favor. A third classmate offered to pay me for cleaning his desk, but I’d have undertaken the chore for its own sake. Ordering the world offered me fulfillment.

From cleaning a fourth-grade desk, I progressed to pursuing a PhD in theoretical physics. The two pursuits might seem to resemble each other no more than Dr. Jekyll and Mr. Hyde; yet, to me, the path between them is but a step. I trained as a theoretical physicist because I love organizing ideas. Caltech paid me to build models, propose definitions and theorems, and structure proofs—to dream up ideas and identify the optimal arrangements for them. I needed that pay, being an adult, as I hadn’t needed my fourth-grade classmate’s desk-cleaning fee. Yet I organized ideas for the same reason that drove me to organize my neighbor’s notebooks.

Many people have called entropy a measure of disorder. To see why, imagine that Jeremy’s crew has used thirty utensils while cooking. The chefs can have scattered the utensils across the kitchen in many ways: they may have dropped forks on the floor, left spoons in the sink, arranged spatulas on the drying rack, or filled a vase with knives like a modern-art bouquet. In few of these configurations do the forks lie in their compartment of the utensil drawer, the spoons lie in their compartment, etc. We call such configurations neat. Most of the other configurations, we call messy. 

A system’s entropy is the number of configurations consistent with known large-scale properties of the system, such as the number of forks.2 More configurations are consistent with messiness (and a fixed number of forks and so on) than with neatness (and the same number of forks and so on). Messiness tends to correlate with high entropy. People often say, therefore, that entropy quantifies messiness. Hence Jeremy’s complimenting me on my decreasing of entropy.

Jeremy’s dinner parties came to mind as I read the book The Mattering Instinct, published by Rebecca Newberger Goldstein this January. Rebecca is a philosopher of science and a writer. I had the good fortune to meet her through my undergraduate mentor Marcelo Gleiser, who’s had another cameo or two on Quantum Frontiers. Rebecca’s latest book covers what she calls the mattering instinct: the longing to know that we matter. 

We spend scads of energy and time on securing our “survival and flourishing,” as Rebecca says. We feed ourselves; work to earn money to purchase food; clean, shelter, and clothe ourselves; ingrain ourselves in societies that offer some degree of security; and more. Do we deserve all this effort? We long for assurance that, in the immortal words of L’Oreal, we’re worth it. 

Survival and flourishing, Rebecca writes, requires us to decrease entropy. Every closed, isolated system’s entropy increases or remains constant, according to the second law of thermodynamics. Entropy increases as a system becomes more uniform, loosely speaking. The system’s particles spread out across space, these particles’ temperature comes to equal those particles’ temperature, and so on. In contrast, your body exists because its particles clump together in a certain shape consistently. You withstand heat waves and snow because homeostasis maintains your temperature despite your environment’s temperature. You keep your body’s entropy low to survive. Rebecca therefore casts us as fighting entropy.

As a thermodynamicist, I agree with Rebecca. Yet I also adore entropy. It helps explain why time flows, quantifies uncertainty, and determines the maximal efficiencies with which we can perform tasks such as communication. What versatility and richness! Entropy also embodies tension and subtlety: its mathematical definition looks obscure at first glance, yet entropy helps explain familiar phenomena such as aging. For these reasons, before beginning my PhD, I told a potential advisor that I could imagine devoting the next five years of my life to entropy.

I therefore aspire to rehabilitate entropy’s reputation. Novelist Terry Pratchett endeared mortality to millions of readers through anthropomorphism. His character Death, a mainstay of the Discworld series of novels, elicits empathy and fondness. I won’t anthropomorphize entropy here,3 but I aim to replace conflict with cooperation in the narrative above. To survive and flourish, I hold, we partner with entropy. How? We create oodles of entropy in our environments. This entropy increase offsets the entropy decrease that supports life.

For example, imagine working at a desalination plant. You’d process high-entropy water throughout which salt has spread. You’d concentrate the salt in a tiny region, reducing the water’s entropy. This reduction, producing fresh water, could support your city’s drinking, cooking, and toothbrushing needs.

To reduce the water’s entropy, you’d create loads more entropy. You’d eat breakfast before work, consuming energy stored neatly in your waffle’s chemical bonds. Your body would later break the bonds, releasing the energy. Some energy would power your muscles, so you could program the desalination system, test its output, etc. But much of the chemical energy would transform into heat radiated by your body. The heat would warm up the air molecules around you, magnifying their random jigglings and jostlings. You’d increase the entropy of the air—your environment—to decrease the water’s entropy. The air’s entropy increase would outweigh the water’s entropy decrease.

Organisms survive and flourish by producing entropy in their environments. In fact, organisms have a knack for generating entropy. Entropy and life thereby further each other. A glass-half-full thinker could conclude that we partner with entropy.

So did I partner with entropy as a PhD student, applying it to solve problems in quantum information theory and thermodynamics. So did I partner with entropy in fourth grade and at Jeremy’s apartment, deriving satisfaction from my cleaning. Rebecca would call these activities’ ultimate aim (beyond the aim of, e.g., not sitting beside a pigsty in fourth grade) mattering. She writes that we reduce entropy (within our immediate vicinities) to satisfy the mattering instinct. Rebecca’s proposition describes my behaviors with uncanny precision, I realized upon reading her book.

Which I’ve now finished. So pardon me while I return to washing forks in the quantum kitchen of the universe. 

With thanks to Jeremy for his friendship…and food.

1I also ensured that my neighbor brought home, every afternoon, the sweater he’d brought to school that morning. Before I took charge, he’d ended up with three forgotten sweaters crammed into his cubby.

2At least, one entropy is. Many other entropies exist.

3If you anthropomorphize entropy elsewhere, let me know.

April 03, 2026

Terence TaoMathematical methods and human thought in the age of AI

Tanya Klowden and I have uploaded to the arXiv our preprint “Mathematical methods and human thought in the age of AI“. This is an unabridged version of a solicited article for a forthcoming Blackwell Companion to the Philosophy of Mathematics. I rarely write article-length essays of a philosophical nature (perhaps the last one was in 2007), but given the topical interest in AI and formalization for mathematics, which has begun to raise increasingly fundamental questions about what the nature, purpose and practice of mathematics actually is (or ought to be), it seemed like it was a timely opportunity to write about these matters. Other mathematicians seem to have recently come to this conclusion also; see for instance this paper of Avigad, or this paper of Commelin, Jamnik, Ochigame, Taelman, and Venkatesh, both of which have come out in the last few weeks.

Our piece took over a year to write – which means, at the current pace of development in the field, that some of it is already slightly out of date. Nevertheless, it was an instructive exercise for both of us to try to look beyond the immediate technical issues presented by current AI and formalization tools and try to point out the philosophical questions that we will have to grapple with as these tools become increasingly capable and integrated into our profession, using prior examples of technological advancement as a guide. We don’t pretend to have definitive answers to most of these questions, but as with mathematics itself, the first step is to pose the questions and then try to make partial progress on them (or at least identify some negative results and eliminate some failed approaches). One point we particularly felt worth stressing is that AI tools and applications (in mathematics or elsewhere) should not be viewed purely through the technical lens of what microscale problems they solve and how effective or efficient they are at solving them, but also through the macroscopic humanitarian lens of how our society, our shared body of knowledge and understanding, and our species benefits (or is harmed) as a whole from these technologies.

Our initial submission ended up significantly exceeding the page limits of the submission and ventured beyond the philosophy of mathematics into broader philosophical and ethical questions about AI in general. A streamlined version of the paper will appear in the forthcoming companion, but we have decided to make the original longer version available on the arXiv.

March 27, 2026

n-Category Café Geometry and the Exceptional Jordan Algebra

I’m giving a talk online tomorrow at the 2026 Spring Southeastern Sectional Meeting of the American Mathematical Society, in the Special Session on Non-Associative Rings and Algebras. The organizers are Layla Sorkatti and Kenneth Price. I doubt the talk will be recorded, but you can see my slides.

Abstract. Dubois-Violette and Todorov noticed that the gauge group of theStandard Model of particle physics is the intersection of two maximal subgroups of F 4 \text{F}_4. which is the automorphism group of the exceptional Jordan algebra 𝔥 3(𝕆)\mathfrak{h}_3(\mathbb{O}). Here we conjecture that these can be taken to be any subgroups preserving copies of 𝔥 2(𝕆) \mathfrak{h}_2(\mathbb{O}) and 𝔥 3()\mathfrak{h}_3(\mathbb{C}) that intersect in a copy of 𝔥 2() \mathfrak{h}_2(\mathbb{C}). Given this, we show that the Standard Model gauge group consists of all isometries of the octonionic projective plane that preserve an octonionic projective line and a complex projective plane intersecting in a complex projective line. This is joint work with Paul Schwahn.

This is an introductory talk for mathematicians. Physicists may prefer the two talks here. Those go much further in some ways, but they don’t cover the new ideas that Paul Schwahn and I are in the midst of working on.

March 26, 2026

Tim GowersGroup and semigroup puzzles and a possible Polymath project

An Artin-Tits group is a group with a finite set of generators a_1,\dots a_k in which every relation is of the form (ab)^r=(ba)^r or (ab)^ra=(ba)^rb for some positive integer r, where a and b are two of the generators. In particular, the commutation relation ab=ba is allowed (it is the case r=1 of the first type of relation) and so is the braid relation aba=bab (it is the case r=1 of the second type of relation). This means that Artin-Tits groups include free groups, free Abelian groups, and braid groups: for example, the braid group on k+1 strands has a presentation with generators a_1,\dots,a_k, where a_i represents a twist of the ith and (i+1)st strands, and relations a_ia_j=a_ja_i if |i-j|>1 and a_ia_{i+1}a_i=a_{i+1}a_ia_{i+1}.

A few weeks ago, I asked ChatGPT for a simple example of a word problem for groups or semigroups that was not known to be decidable and also not known to be undecidable. It turns out that the word problem for many Artin-Tits groups comes into that category: the simplest example where the status is not known is the group with four generators a,b,c and d where c and d commute and all other pairs of generators satisfy the braid relation xyx=yxy.

My interest in this was initially that I was looking for a toy model from which one could learn something about how mathematicians judge theorems to be interesting. I started with a remarkable semigroup discovered by G. S. Tseytin that has five generators, seven very simple relations, and an undecidable word problem. From that I created a puzzle game that you might be interested to play. (NB the puzzles were not showing up in the public version, but that should be sorted out now.) Each puzzle is equivalent to an instance of the word problem for Tseytin’s semigroup, but the interface makes it much more convenient to change words using the relations than it would be to do it with pen and paper.

I was hoping that because every mathematical problem can in principle be encoded as a puzzle in this game, one might be able to build a sort of “alien mathematics”, where a theorem was an equality between words, and a definition was a decision to introduce a new (and redundant) generator g, together with a relation of the form g = w, where w is a word in the current alphabet. Theorems would be particularly interesting if they were equalities between words that could be established only by a chain of equalities that went through much longer words, and definitions would be useful if the new generators satisfied particularly concise relations (which would allow one to build “theories” within the system). I still hope to find a word problem that will allow such a project to take off, but in the end, after a lot of playing around with the game linked to above, I have decided that Tseytin’s semigroup is not suitable. The reason is that it is designed so that arbitrary instances of the word problem for groups can be encoded as word problems in this semigroup, and once one gets used to the game, one starts to see how that encoding can work. Furthermore, one seems to be driven towards the encoding — I don’t get the impression that there’s a whole other region of this semigroup to explore that has nothing to do with the kinds of words that come up in the encoding. And if that impression is correct, then one might as well start with the word problem for some group in unencoded form, or alternatively look for another semigroup. Nevertheless, I find the Tseytin game quite enjoyable: I won’t say more about it here but have written a fairly comprehensive tutorial that you can open up and read if you follow the link above.

This is perhaps the moment to say that the words “I created a puzzle game” are slightly misleading. For one thing, I discussed the idea of gamifying Tseytin’s semigroup about three years ago with Mirek Olšák, a former member of my automatic theorem proving group, and he created a basic prototype in Python. But the main point is that I do not have the programming skills to create a game that can be played in a web browser — I vibecoded it using ChatGPT.

After that experience, I thought that maybe I would have better luck if instead of looking for a group or semigroup with undecidable word problem, which might well have been explicitly designed with some encoding in mind, I looked for a word problem for which the decidability status was unknown. That way, it wouldn’t have been designed to be undecidable, but might nevertheless just happen to be undecidable and provide a nice playground of the kind I was (and still am) after. And that is what led me to the Artin-Tits groups.

However, those don’t seem to be suitable either, because it is conjectured that they all have decidable word problems. I have created a game for the Artin-Tits group mentioned above, which you can also play if you want, but I have found it very hard to create interesting puzzles. (NB again there was a problem with the puzzles showing up, with only a tiny handful being there, but that should now be sorted out.) That is, I found it difficult to find words that are equivalent to the identity but that are not easily shown to be equivalent to the identity. One nice example comes from the fact that the subgroup generated by a,c and d is isomorphic to the braid group B_4 with four strands. The late Patrick Dehornoy found a very nice example of a braid with four strands that is equal to the identity but not in a completely trivial way. A picture of it can be found on page 4 of this paper.

This is where a potential Polymath project comes in. An initial goal would be to determine the decidability status of this one small Artin-Tits group: if we managed that, then we could consider the more general problem. And the way I envisage approaching this initial goal is an iterative process that runs as follows.

  1. Devise an algorithm A_0 for solving word problems in the group.
  2. If the current algorithm is A_i, then search for a puzzle P that A_i fails to solve.
  3. If the search is successful, then devise an algorithm A_{i+1} that solves all the puzzles that A_i solves, and also solves P. Make A_{i+1} the current algorithm and return to step 2. Devising A_{i+1} can be done by playing the game with puzzle P several times until one has the feeling that one has solved it in a systematic way.
  4. If the search is unsuccessful and the current algorithm is A, then attempt to prove inductively that A always succeeds: that is, to prove that if A solves P and Q is obtained from P by applying a single relation in the group, then A solves Q.
  5. (Maybe) If the inductive step doesn’t seem to work, then try to use that failure to devise a puzzle that defeats A and return to step 3.

I have done a couple of iterations already, with the result that I now have an algorithm — let’s call it A_0 — that solves (basically instantaneously) every puzzle I throw at it, including the one derived from Dehornoy’s braid mentioned above. So a subgoal of the initial goal is to find a puzzle that has a solution but that A_0 doesn’t solve. If we can’t, then maybe it is worth trying to prove that A_0 solves the word problem for this group. One comment about the algorithm A_0 is that it never increases the length of a word, though it often does moves that preserve the length. I was told by ChatGPT that it is an unsolved problem whether every braid that is equal to the identity can be reduced to the identity without ever increasing the number of crossings. Of course that isn’t a guarantee that it really is unsolved, but if it is, then that’s another interesting problem. It also means that I would be extremely interested if someone found an example of a word in the Artin-Tits group that can be reduced to the identity but only if one starts by lengthening it.

I’m hoping that this will be an enjoyable project for people who like both mathematics and programming. On the maths side, there is an unsolved problem to think about, and on the programming side there are lots of possibilities: for example, one could write programs that explore the space of words equal to the identity, trying to do so in such a way that there is a reasonable chance of reaching a word where it isn’t obvious how to reverse all the steps and get back to the identity. The game comes with a “sandbox” that has a few tools for generating words, but at the moment it is fairly primitive, and I would welcome suggestions for how to improve it.

It seems to me that as Polymath projects go, one can’t really lose: either we find an algorithm for the word problem, which establishes an unknown (at least according to ChatGPT) case of a decidability problem, or we find a suite of harder and harder puzzles, creating a more and more challenging and entertaining game and obtaining a deeper understanding of the Artin-Tits group in the process.

This Artin-Tits game has only a rather rudimentary and not very good tutorial created by ChatGPT. It’s easy to play once you’ve got the hang of it, and in the end I think the easiest way to get the hang of it is to watch someone else play it for a couple of minutes. I have therefore created a video tutorial, which you can find here. The video itself lasts about 25 minutes, but the tutorial part is under 10 minutes: what makes the video longer is an explanation of various features for making it easier to create puzzles, and also an explanation of how the algorithm works, which again is much easier to explain if I can demonstrate it on screen than if I have to write down some text. (I do have some text, since that is what I used as a prompt for ChatGPT to implement the algorithm, but it took a few iterations to get it working properly, so now I’m not sure what the quality of the code will be like, or even whether it is doing exactly what I want, though it appears to be.) Although I have embedded the video into this post, if you actually want to see what is going on you will probably need to watch it on YouTube using the full screen. I recommend not watching the explanation of the algorithm until you have played the game a few times first. Also, I should warn you that the games in the Advanced category are not all soluble, but games 1, 5 and 6 definitely are. Another thing I forgot to say on the video is that if you want you can “rotate” a word by clicking on one of its letters and dragging it to the right or left. If the word is equal to the identity, then its rotations will be as well, so this is a valid move. There is also a button for disabling this rotation facility if you want the puzzle to be slightly more challenging.

A quick note about the availability of the two games. They are hosted on the Netlify platform. I am on their free plan, which gives me a certain number of “credits” each month. I’m not sure how quickly these will run out, largely because I have no idea how many people will actually think it interesting to play the games. If they do run out, then the games will cease to be available until the credits are renewed, which for me happens on the 10th of each month. If this has happened and you are keen to play one of the games, another option is to download the html files and open them in your browser. Here is a link to the Artin-Tits game and here is one to the Tseytin game. If you are feeling particularly public-spirited, especially if you think you will play quite a lot, then you might consider doing that anyway, so that the Netlify credits run down more slowly. If the running out of the credits is quick enough for all this to be a real issue, then I may move to a paid plan.

I’ll finish with a quick tip for playing the Artin-Tits game, which I mentioned on the video but perhaps didn’t stress enough. Many of the moves consist in selecting three consecutive letters and using a relation of the form xyx=yxy to change them. Easy examples of this are replacement rules such as aba\to bab. But what if inverses are involved? I’ll represent inverses of generators with upper-case letters, so for example abA represents aba^{-1}, which in the game would be a white a followed by a white b followed by a black a. The word abA turns out to equal Bab. To remember this, a simple rule is that two letters of the same colour can be bracketed together and “pushed past” the third letter, which retains its colour but changes its value. Here, for example, we write abA as (ab)A and then swap them over, changing A into B in the process, getting B(ab), or Bab without the brackets. In group theoretic terms, this is of course saying that A and B are conjugates, and that ab is what conjugates one to the other. But when playing the game it is convenient to remember it by thinking that when you see a subword such as bDB, you can push the DB (and in particular the D) to the left, getting DBd.

March 25, 2026

Terence TaoLocal Bernstein theory, and lower bounds for Lebesgue constants

I’ve just uploaded to the arXiv my paper “Local Bernstein theory, and lower bounds for Lebesgue constants“. This paper was initially motivated by a problem of Erdős} on Lagrange interpolation, but in the course of solving that problem, I ended up modifying some very classical arguments of Bernstein and his contemporaries (Boas, Duffin, Schaeffer, Riesz, etc.) to obtain “local” versions of these classical “Bernstein-type inequalities” that may be of independent interest.

Bernstein proved many estimates concerning the derivatives of polynomials, trigonometric polynomials, and entire functions of exponential type, but perhaps his most famous inequality in this direction is:

Lemma 1 (Bernstein’s inequality for trigonometric polynomials) Let {P: {\bf R} \rightarrow {\bf C}} be a trigonometric polynomial of degree at most {n}, with {|P(x)| \leq A} for all {x}. Then {|P'(x)| \leq n A} for all {x}.

Similar inequalities concerning {L^p} norms of derivatives of Littlewood-Paley components of functions are now ubiquitious in the modern theory of nonlinear dispersive PDE (where they are also called Bernstein estimates), but this will not be the focus of this current post.

A trigonometric polynomial {P} of degree {n} is of exponential type {n} in the sense that {P(z) = O(\exp(n|z|))} for complex {z}. Bernstein in fact proved a more general result:

Lemma 2 (Bernstein’s inequality for functions of exponential type) Let {f: {\bf C} \rightarrow {\bf C}} be an entire function of exponential type at most {\lambda}, with {|f(x)| \leq A} for all {x \in {\bf R}}. Then {|f'(x)| \leq \lambda A} for all {x \in {\bf R}}.

There are several proofs of this lemma – see for instance this survey of Queffélec and Zarouf. In the case that {f} is real-valued on {{\bf R}}, there is a nice proof by Duffin and Schaeffer, which we sketch as follows. Suppose we normalize {A=\lambda=1}, and adjust {f} by a suitable damping factor so that {f(z)} actually decays slower than {\exp(|z|)} as {z \rightarrow \infty}. Then, for any {0 < \alpha < 1} and {x_0 \in {\bf R}}, one can use Rouche’s theorem to show that the function {\cos(x-x_0) - \alpha f(x)} has the same number of zeroes as {\cos(x-x_0)} in a suitable large rectangle; but on the other hand one can use the intermediate value theorem to show that {\cos(x-x_0) - \alpha f(x)} has at least as many zeroes than {\cos(x-x_0)} in the same rectangle. Among other things, this prevents double zeroes from occuring, which turns out to give the desired claim {|f'(x)| \leq 1} after some routine calculations (in fact one obtains the stronger bound {|f(x)|^2 + |f'(x)|^2 \leq 1} for all real {x}).

The first main result of the paper is to obtain localized versions of Lemma 2 (as well as some related estimates). Roughly speaking, these estimates assert that if {f} is holomorphic on a wide thin rectangle passing through the real axis, is bounded by {A} on the intersection of the real axis with this rectangle, and is “locally of exponential type” in the sense that it is bounded by {O(\exp( \lambda |\mathrm{Im} z|))} on the upper and lower edges of this rectangle (and obeys some very mild growth conditions on the remaining sides of this rectangle), then {|f'(x)|} can be bounded by {\lambda A} plus small errors on the real line, with some additional estimates away from the real line also available. The proof proceeds by a modification of the Duffin–Schaeffer argument, together with the two-constant theorem of Nevanlinna (and some standard estimates of harmonic measures on rectangles) to deal with the effect of the localization. (As a side note, this latter argument was provided to me by ChatGPT, as I was not previously aware of the Nevanlinna two-constant theorem.)

Once one localizes this “Bernstein theory”, it becomes suitable for the analysis of (real-rooted, monic) polynomials {P} of a high degree {n}, which are not bounded globally on {{\bf R}} (and grow polynomially rather than exponentially at infinity), but which can exhibit “local exponential type” behavior on various intervals, particularly in regions where the logarithmic potential

\displaystyle  U_\mu(z) := \frac{1}{n} \log \frac{1}{|P(z)|} = \int \log \frac{1}{|z-x|}\ d\mu(x)

behaves like a smooth function (here {\mu = \frac{1}{n} \sum_{j=1}^k \delta_{x_j}} is the empirical measure of the roots {x_1,\dots,x_n} of {P}). A key example is the (monic) Chebyshev polynomials {2^{1-n} T_n(x)}, which locally behave like sinusoids on the interval {[-1,1]} (and are locally of exponential type above and below this interval):

This becomes relevant in the theory of Lagrange interpolation. Recall that if {x_1 < \dots < x_n} are real numbers and {Q} is a polynomial of degree less than {n} then one has the interpolation formula

\displaystyle  Q(z) = \sum_{j=1}^k Q(x_k) \ell_j(z)

where the Lagrange basis functions {\ell_j(z)} are defined by the formula

\displaystyle  \ell_k(z) := \prod_{i \neq k} \frac{z - x_i}{x_k - x_i}.

In terms of the monic polynomial {P(z) := \prod_{j=1}^n (z-x_j)}, we can write

\displaystyle  \ell_k(z) = \frac{P(z)}{(z-x_k) P'(x_k)}.

The stability and convergence properties of Lagrange interpolation are closely related to the Lebesgue function

\displaystyle  \Lambda(z) := \sum_{k=1}^n |\ell_k(z)|,

and for a given interval {I}, the quantity {\sup_{x \in I} \Lambda(x)} is known as the Lebesgue constant for that interval.

If one chooses the interpolation points {x_1,\dots,x_n} poorly, then the Lebesgue constant can be extremely large. However, if one selects these points to be the roots of the aforementioned monic Chebyshev polynomials, then it is known that {\sup_{x \in I} \Lambda(x) = \frac{2}{\pi} \log n - O(1)} for all fixed intervals {I} in {[-1,1]}. In the case {I = [-1,1]}, it was shown by Erdős} that this is the best possible value of the Lebesgue constant up to {O(1)} errors for interpolation on {[-1,1]}, thus

\displaystyle  \sup_{x \in [-1,1]} \Lambda(x) \geq \frac{2}{\pi} \log n - O(1)

whenever {-1 \leq x_1 < \dots < x_n \leq 1} (a more precise bound was later shown by Vertesi). Erdős and Turán then asked if the same lower bound

\displaystyle  \sup_{x \in I} \Lambda(x) \geq \frac{2}{\pi} \log n - O(1) \ \ \ \ \ (1)

held for more general intervals {I}. This is shown in our paper; a variant integral bound

\displaystyle  \int_I \Lambda(x)\ dx \geq \frac{4}{\pi^2} |I| \log n - o(\log n) \ \ \ \ \ (2)

is also established, answering a separate question of Erdős. These lower bounds had previously obtained up to constants by Erdős and Szabados}; the main issue was to obtain the sharp constant in the main term.

In terms of the monic polynomial {P}, these two estimates can be written as

\displaystyle  \sup_{x \in I} \sum_{k=1}^n \frac{|P(x)|}{|x-x_k||P'(x_k)|} \geq \frac{2}{\pi} \log n - O(1)

and

\displaystyle  \int_I \sum_{k=1}^n \frac{|P(x)|}{|x-x_k||P'(x_k)|}\ dx \geq \frac{4}{\pi^2} |I| \log n - o(\log n).

Using the intuition that {P} should behave locally like a trigonometric polynomial, and performing some renormalizations, one can extract the following toy problem to work with first:

Problem 3 Let {P: {\bf R} \rightarrow {\bf R}} be a trigonometric polynomial of degree {n} with {2n} roots {x_1 < \dots < x_{2n}} in {[0, 2\pi)}.
  • (i) Show that

    \displaystyle  \sup_{x \in [0,2\pi)} \sum_{k=1}^{2n} \frac{|P(x)|}{|P'(x_k)|} \geq 2. \ \ \ \ \ (3)

  • (ii) Show that

    \displaystyle  \int_0^{2\pi} \sum_{k=1}^{2n} \frac{|P(x)|}{|P'(x_k)|}\ dx \geq 8. \ \ \ \ \ (4)

It is easy to check that the lower bounds of {2} and {8} are sharp by considering the case when {P} is a sinusoid {P(x) = A \sin(n(x-x_0))}.

The bound (3) is immediate from Bernstein’s inequality (Lemma 1). By applying a local version of this inequality, I was able to get a weak version of the claim (1) in which {O(1)} was replaced with {o(\log n)}; see this early version of the paper, which was developed through conversations with Nat Sothanaphan and Aron Bhalla. By combining this argument with ideas from the older work of Erdős}, I was able to establish (1).

The bound (2) took me longer to establish, and involved a non-trivial amount of playing around with AI tools, the story of which I would like to share here. I had discovered the toy problem (4), but initially was not able to establish this inequality; AlphaEvolve seemed to confirm it numerically (with sinusoids appearing to be the extremizer), but did not offer direct clues on how to prove this rigorously. At some point I realized that the left-hand side factorized into the expressions {\int_0^{2\pi} |P(x)|\ dx} and {\sum_{k=1}^{2n} \frac{1}{|P'(x_k)|}}, and tried to bound these expressions separately. Perturbing around a sinusoid {A \sin(n(x-x_0))}, I was able to show that the {L^1} norm {\int_0^{2\pi} |P(x)|\ dx} was a local minimum as long as one only perturbed by lower order Fourier modes, keeping the frequency {n} coefficients unchanged. Guessing that this local minimum was actually a global minimum, this led me to conjecture the general lower bound

\displaystyle  \int_0^{2\pi} |P(x)|\ dx \geq 4 |a_n+ib_n|

whenever {P} was a degree {n} trigonometric polynomial with highest frequency components {a_n \cos(nx) + b_n \sin(nx)}. AlphaEvolve numerically confirmed this inequality to be likely to be true also. I still did not see how to prove this inequality, but I decided to try my luck giving it to ChatGPT Pro, which recognized it as an {L^1} approximation problem and gave me a duality-based proof (based ultimately on the Fourier expansion of the square wave). With some further discussion, I was able to adapt this proof to functions of global exponential type (replacing the Fourier manipulations with contour shifting arguments, in the spirit of the Paley-Wiener theorem), which roughly speaking gave me half of what I needed to establish (2). However, I still needed the matching lower bound

\displaystyle  \sum_{k=1}^{2n} \frac{1}{|P'(x_k)|} \geq \frac{2}{|a_n+ib_n|}

on the other factor in the toy problem. Again, AlphaEvolve could numerically confirm that this inequality was likely to be true, but now the quantity I was trying to control did not look convex or linear in {P}, and so the previous duality method did not seem to apply. At this point I switched to pen and paper; eventually I realized that the expression almost looked like a sum of residues, and eventually after playing around with contour integrals of {\frac{e^{inz}}{P(z)}} using the residue theorem I was able to establish (4), and then with a bit more (human) effort I could move from the toy problem back to the original problem to obtain (2). Quite possibly AI tools would also have been able to assist with these steps, but they were not necessary here; their main value for me was in quickly confirming that the approach I had in mind was numerically plausible, and in recognizing the right technique to solve one part of the toy problem I had isolated. (I also used AI tools for several other secondary tasks, such as literature review, proofreading, and generating pictures, but these applications of AI have matured to the point where using them for this purpose is almost mundane.)

March 24, 2026

John PreskillHas quantum advantage been achieved? Part 2: Considering the evidence

Welcome back to: Has quantum advantage been achieved?

In Part 1 of this mini-series on quantum advantage demonstrations, I told you about the idea of random circuit sampling (RCS) and the experimental implementations thereof. In this post, Part 2 out of 3, I will discuss the arguments and evidence for why I am convinced that the experiments demonstrate a quantum advantage.

Recall from Part 1 that to assess an experimental quantum advantage claim we need to check three criteria:

  1. Does the experiment correctly solve a computational task?
  2. Does it achieve a scalable advantage over classical computation?
  3. Does it achieve an in-practice advantage over the best classical attempt at solving the task?

What’s the issue?

When assessing these criteria for the RCS experiments there is an important problem: The early quantum computers we ran them on were very far from being reliable and the computation was significantly corrupted by noise. How should we interpret this noisy data? Or more concisely:

  1. Is random circuit sampling still classically hard even when we allow for whatever amount of noise the actual experiments had?
  2. Can we be convinced from the experimental data that this task has actually been solved?

I want to convince you today that we have developed a very good understanding of these questions that gives a solid underpinning to the advantage claim. Developing that understanding required a mix of methodologies from different areas of science, including theoretical computer science, algorithm design, and physics and has been an exciting journey over the past years.

The noisy sampling task

Let us start by answering the base question. What computational task did the experiments actually solve?

Recall that, in the ideal RCS scenario, we are given a random circuit CC on nnqubits and the task is to sample from the output distribution of the state obtained |C\ket C from applying the circuit CC to a simple reference state. The output probability distribution of this state is determined by the Born rule when I measure every qubit in a fixed choice of basis.

Now what does a noisy quantum computer do when I execute all the gates on it and apply them to its state? Well, it prepares a noisy version ρC\rho_ C of the intended state |C\ket C and once I measure the qubits, I obtain samples from the output distribution of that noisy state.

We should not make the task dependent on the specifics of that state or the noise that determined it, but we can define a computational task based on this observation by fixing how accurate that noisy state preparation is. The natural way to do this is to use the fidelity

F(C)=C|ρC|C, F(C) = \bra C \rho_C \ket C,

which is just the overlap between the ideal state and the noisy state. The fidelity is 1 if the noisy state is equal to the ideal state, and 0 if it is perfectly orthogonal to it.

Finite-fidelity random circuit sampling
Given a typical random circuit CC, sample from the output distribution of any quantum state whose fidelity with the ideal output state |C\ket C is at least δ\delta.

Note that finite-fidelity RCS does not demand success for every circuit, but only for typical circuits from the random circuit ensemble. This matches what the experiments do: they draw random circuits and need the device to perform well on the overwhelming majority of those draws. Accordingly, when the experiments quote a single number as “fidelity”, it is really the typical (or, more precisely, circuit-averaged) fidelity that I will just call FF.

The noisy experiments claim to have solved finite-fidelity RCS for values of δ\delta around 0.1%. What is more, they consistently achieve this value even as the circuit sizes are increased in the later experiments. Both the actual value and the scaling will be important later.

What is the complexity of finite-fidelity RCS?

Quantum advantage of finite-fidelity RCS

Let’s start off by supposing that the quantum computation is (nearly) perfectly executed, so the required fidelity δ\delta is quite large, say, 90%. In this scenario, we have very good evidence based on computational complexity theory that there is a scalable and in-practice quantum advantage for RCS. This evidence is very strong, comparable to the evidence we have for the hardness of factoring and simulating quantum systems. The intuition behind it is that quantum output probabilities are extremely hard to compute because of a mechanism behind quantum advantages: destructive interference. If you are interested in the subtleties and the open questions, take a look at our survey.

The question is now, how far down in fidelity this classical hardness persists? Intuitively, the smaller we make δ\delta, the easier finite-fidelity RCS should become for a classical algorithm (and a quantum computer, too), since the freedom we have in deviating from the ideal state in our simulation becomes larger and larger. This increases the possibility of finding a state that turns out to be easy to simulate within the fidelity constraint.

Somewhat surprisingly, though, finite-fidelity RCS seems to remain hard even for very small values of δ\delta. I am not aware of any efficient classical algorithm that achieves the finite-fidelity task for δ\delta significantly away from the baseline trivial value of 2n2^{-n}. This is the value a maximally mixed or randomly picked state achieves because a random state has no correlation with the ideal state (or any other state), and 2n2^{-n} is exactly what you expect in that case (while 0 would correspond to perfect anti-correlation).

One can save some classical runtime compared to solving near-ideal RCS by exploiting a reduced fidelity, but the costs remain exponential. To classically solve finite-fidelity RCS, the best known approaches are reported in the papers that performed classical simulations of finite-fidelity RCS with the parameters of the first Google and USTC experiment (classSim1, classSim2). To achieve this, however, they needed to approximately simulate the ideal circuits at an immense cost. To the best of my knowledge, all but those first two experiments are far out of reach for these algorithms.

Getting the scaling right: weak noise and low depth

So what is the right value of δ\delta at which we can hope for a scalable and in-practice advantage of RCS experiments?

When thinking about this question, it is helpful to keep a model of the circuit in mind that a noisy experiment runs. So, let us consider a noisy circuit on nn qubits with dd layers of gates and single-qubit noise of strength ε\varepsilon on every qubit in every layer. In this scenario, the typical fidelity with the ideal state will decay as Fexp(εnd)F \sim \exp(- \varepsilon n d).

Any reasonably testable value of the fidelity needs to scale as 1/𝗉𝗈𝗅𝗒(n)1/\mathsf{poly}(n), since eventually we need to estimate the average fidelity FF from the experimental samples and this typically requires at least 1/F21/F^2 samples, so exponentially small fidelities are experimentally invisible. The polynomial fidelity δ\delta is also much closer to the near-ideal scenario (δ\delta \geq90%) than the trivial scenario (δ=2n\delta = 2^{-n}). While we cannot formally pin this down, the intuition behind the complexity-theoretic evidence for the hardness of near-ideal RCS persists into the δ1/𝗉𝗈𝗅𝗒(n)\delta \sim 1/\mathsf{poly}(n) regime: to sample up to such high precision, we still need a reasonably accurate estimate of the ideal probabilities, and getting this is computationally extremely difficult. Scalable quantum advantage in this regime is therefore a pretty safe bet.

How do the parameters of the experiment and the RCS instances need to scale with the number of qubits nn to experimentally achieve the fidelity regime? The limit to consider is one in which the noise rate decreases with the number of qubits, while the circuit depth is only allowed to increase very slowly. It depends on the circuit architecture, i.e., the choice of circuit connectivity, and the gate set, through a constant cAc_A as I will explain in more detail below.

Weak-noise and low-depth scaling
(Weak noise) The local noise rate of the quantum device scales as ε<cA/n\varepsilon \lt c_A/n.
(Low depth) The circuit depth scales as dlognd \lesssim \log n.

This limit is such that we have a scaling of the fidelity as FncF \gtrsim n^{-c} for some constant cc. It is also a natural scaling limit for noisy devices whose error rates gradually improve through better engineering. You might be worried about the fact that the depth needs to be quite low but it turns out that there is a solid quantum advantage even for log(n)\log(n)-depth circuits.

The precise definition of the weak-noise regime is motivated by the following observation. It turns out to be crucial for assessing the noisy data from the experiment.

Fidelity versus XEB: a phase transition

Remember from Part 1 that the experiments measured a quantity called the cross-entropy benchmark (XEB)

χ=22n𝔼C𝔼xpC(x)1,\chi = 2^{2n} \mathbb E_C \mathbb E_{x} p_C(x) -1 ,

The XEB averages the ideal probabilities pC(x)p_C(x) corresponding to the sampled outcomes xx from experiments on random circuits CC. Thus, it correlates the experimental and ideal output distributions of those random circuits. You can think of it as a “classical version” of the fidelity: If the experimental distribution is correct, the XEB will essentially be 1. If it is uniformly random, the XEB is 0.

The experiments claimed that the XEB is a good proxy for the circuit-averaged fidelity given by F=𝔼CF(C)F = \mathbb E_C F(C), and so we need to understand when this is true. Fortunately, in the past few years, alongside with the improved experiments, we have developed a very good understanding of this question (WN, Spoof2, PT1, PT2).

It turns out that the quality of correspondence between XEB and average fidelity depends strongly on the noise in the experimental quantum state. In fact, there is a sharp phase transition: there is an architecture-dependent constant cAc_A such that when the experimental local noise rate ε<cA/n\varepsilon < c_A/n, then the XEB is a good and reliable proxy for the average fidelity for any system size nn and circuit depth dd. This is exactly the weak-noise regime. Above that threshold, in the strong noise regime, the XEB is an increasingly bad proxy for the fidelity (PT1, PT2).

Let me be more precise: In the weak-noise regime, when we consider the decay of the XEB as a function of circuit depth dd, the rate of decay is given by εn\varepsilon n, i.e., the XEB decays as exp(εnd)\exp(- \varepsilon n d ). Meanwhile, in the strong-noise regime the rate of decay is constant, giving an XEB decay as exp(Cd)\exp(- C d) for a constant CC. At the same time, the fidelity decays as exp(εnd)\exp(- \varepsilon n d ) regardless of the noise regime. Hence, in the weak-noise regime, the XEB is a good proxy of the fidelity, while in the strong noise regime, there is an exponentially increasing gap between the XEB (which remains large) and the fidelity (which continues to decay exponentially). regardless of the noise regime.

This is what the following plot shows. We computed it from an exact mapping of the behavior of the XEB to the dynamics of a statistical-mechanics model that can be evaluated efficiently. Using this mapping, we can also compute the noise threshold cAc_A for whichever random circuit family and architecture you are interested in.

From (PT2). The yy-axis label Δ(lnχ)\Delta( \ln \chi ) is the decay rate of the XEB χ\chi, N=nN=n the number of qubits and ε\varepsilon is the local noise rate.

Where are the experiments?

We are now ready to take a look at the crux when assessing the noisy data: Can we trust the reported XEB values as an estimator of the fidelity? If so, do the experiments solve finite-fidelity RCS in the solidly hard regime where δ1/𝗉𝗈𝗅𝗒(n)\delta \geq 1/ \mathsf{poly}(n)?

In their more recent paper (PT1), the Google team explicitly verified that the experiment is well below the phase transition, and it turns out that the first experiment was just at the boundary. The USTC experiments had comparable noise rates, and the Quantinuum experiment much better ones. Since fidelity decays as exp(εnd)\exp(-\varepsilon n d), but the reported XEB values stayed consistently around 0.1% as nn was increased, the experimental error rate ε\varepsilon of the experiments improved even better than the 1/n1/n scaling required for the weak-noise regime, namely, more like ε1/(nd)\varepsilon \sim 1/(nd). Altogether, the experiments are therefore in the weak-noise regime both in terms of absolute numbers relative to cA/nc_A/n and the required scaling.

Of course, to derive the transition, we made some assumptions about the noise such as that the noise is local, and that it does not depend much on the circuit itself. In the advantage experiments, these assumptions about the noise are characterized and tested. This is done through a variety of means at increasing levels of complexity, including detailed characterization of the noise in individual gates, gates run in parallel, and eventually in a larger circuit. The importance of understanding the noise shows in the fact that a significant portion of the supplementary materials of the advantage experiments is dedicated to getting this right. All of this contributes to the experimental justification for using the XEB as a proxy for the fidelity!

The data shows that the experiments solved finite-fidelity RCS for values of δ\delta above the constant value of roughly 0.1% as the experiments grew. In the following plot, I compare the experimental fidelity values to the near-ideal scenario on the one hand, and the trivial 2n2^{-n} value on the other hand. Viewed at this scale, the values of δ\delta for which the experiment solved finite-fidelity RCS are indeed vastly closer to the near-ideal value than the trivial baseline, which should boost our confidence that reproducing samples at a similar fidelity is extremely challenging.

The phase transition matters!

You might be tempted to say: “Well, but is all this really so important? Can’t I just use XEB and forget all about fidelity?”

The phase transition shows why that would change the complexity of the problem: in the strong-noise regime, XEB can stay high even when fidelity is exponentially small. And indeed, this discrepancy can be exploited by so-called spoofers for the XEB. These are efficient classical algorithms which can be used to succeed at a quantum advantage test even though they clearly do not achieve the intended advantage. These spoofers (Spoof1, Spoof2) can achieve high XEB scores comparable to those of the experiments and scaling like exp(cd)\exp(-cd) in the circuit depth dd for some constant cc.

Their basic idea is to introduce strong, judiciously chosen noise at specific circuit locations that has the effect of breaking up the simulation task up into smaller, much easier components, but at the same time still gives a high XEB score. In doing so, they exploit the strong-noise regime in which the XEB is a really bad proxy for the fidelity. This allows them to sample from states with exponentially low fidelity while achieving a high XEB value.

The discovery of the phase transition and the associated spoofers highlights the importance of modeling when assessing—and even formulating—the advantage claim based on noisy data.

But we can’t compute the XEB!

You might also be worried that the experiments did not actually compute the XEB in the advantage regime because to estimate it they would have needed to compute ideal probabilities—a task that is hard by definition of the advantage regime. Instead, they used a bunch of different ways to extrapolate the true XEB from XEB proxies (proxy of a proxy of the fidelity). Is this is a valid way of getting an estimate of the true XEB?

It totally is! Different extrapolations—from easy-to-simulate to hard-to-simulate, from small system to large system etc—all gave consistent answers for the experimental XEB value of the supremacy circuits. Think of this as having several lines that cross in the same point. For that crossing to be a coincidence, something crazy, conspiratorial must happen exactly when you move to the supremacy circuits from different directions. That is why it is reasonable to trust the reported value of the XEB.

That’s exactly how experiments work!

All of this is to say that establishing that the experiments correctly solved finite-fidelity RCS and therefore show quantum advantage involved a lot of experimental characterization of the noise as well as theoretical work to understand the effects of noise on the quantity we care about—the fidelity between the experimental and ideal states.

In this respect (and maybe also in the scale of the discovery), the quantum advantage experiments are similar to the recent experiments reporting discovery of the Higgs boson and gravitational waves. While I do not claim to understand any of the details, what I do understand is that in both experiments, there is an unfathomable amount of data that could not be interpreted without preselection and post-processing of the data, theories, extrapolations and approximations that model the experiment and measurement apparatus. All of those enter the respective smoking-gun plots that show the discoveries.

If you believe in the validity of experimental physics methodology, you should therefore also believe in the type of evidence underlying experimental claim of the quantum advantage demonstrations: that they sampled from the output distribution of a quantum state with the reported fidelities.

Put succinctly: If you believe in the Higgs boson and gravitational waves, you should probably also believe in the experimental demonstration of quantum advantage.

What are the counter-arguments?

The theoretical computer scientist

“The weak-noise limit is not physical. The appropriate scaling limit is one in which the local noise rate of the device is constant while the system size grows, and in that case, there is a classical simulation algorithm for RCS (SimIQP, SimRCS).”

In theoretical computer science, scaling of time or the system size in the input size is considered very natural: We say an algorithm is efficient if its runtime and space usage only depend polynomially on the input size.

But all scaling arguments are hypothetical concepts, and we only care about the scaling at relevant sizes. In the end, every scaling limit is going to hit the wall of physical reality—be it the amount of energy or human lifetime that limits the time of an algorithm, or the physical resources that are required to build larger and larger computers. To keep the scaling limit going as we increase the size of our computations, we need innovation that makes the components smaller and less noisy.

At the scales relevant to RCS, the 1/n1/n scaling of the noise is benign and even natural. Why? Well, currently, the actual noise in quantum computers is not governed by the fundamental limit, but by engineering challenges. Realizing this limit therefore amounts to engineering improvements in the system size and noise rate that are achieved over time. Sure, at some point that scaling limit is also going to hit a fundamental barrier below which the noise cannot be improved. But we are surely far away from that limit, yet. What is more, already now logical qubits are starting to work and achieve beyond-breakeven fidelities. So even if the engineering improvements should flatten out from here onward, QEC will keep the 1/n1/n noise limit going and even accelerate it in the intermediate future.

The complexity maniac

“All the hard complexity-theoretic evidence for quantum advantage is in the near-ideal regime, but now you are claiming advantage for the low-fidelity version of that task.”

This is probably the strongest counter-argument in my opinion, and I gave my best response above. Let me just add that this is a question about computational complexity. In the end, all of complexity theory is based on belief. The only real evidence we have for the hardness of any task is the absence of an efficient algorithm, or the reduction to a paradigmatic, well-studied task for which there is no efficient algorithm.

I am not sure how much I would bet that you cannot find an efficient algorithm for finite-fidelity RCS in the regime of the experiments, but it is certainly a pizza.

The enthusiastic skeptic

“There is no verification test that just depends on the classical samples, is efficient and does not make any assumptions about the device. In particular, you cannot unconditionally verify fidelity just from the classical samples. Why should I believe the data?”

Yes, sure, the current advantage demonstrations are not device-independent. But the comparison you should have in mind are Bell tests. The first proper Bell tests of Aspect and others in the 80s were not free of loopholes. They still allowed for contrived explanations of the data that did not violate local realism. Still, I can hardly believe that anyone would argue that Bell inequalities were not violated already back then.

As the years passed, these remaining loopholes were closed. To be a skeptic of the data, people needed to come up with more and more adversarial scenarios that could explain the data. We are working on the same to happen with quantum advantage demonstrations: come up with better schemes and better tests that require less and less assumptions or knowledge about the specifics of the device.

The “this is unfair” argument

“When you chose the gates and architecture of the circuit dependent on your device, you tailored the task too much to the device and that is unfair. Not even the different RCS experiments solve exactly the same task.”

This is not really an argument against the achievement of quantum advantage but more against the particular choices of circuit ensembles in the experiments. Sure, the specific computations solved are still somewhat tailored to the hardware itself and in this sense the experiments are not hardware-independent yet, but they still solve fine computational tasks. Moving away from such hardware-tailored task specifications is another important next step and we are working on it.


In the third and last part of this mini series I will address next steps in quantum advantage that aim at closing some of the remaining loopholes. The most important—and theoretically interesting—one is to enable efficient verification of quantum advantage using less or even no specific knowledge about the device that was used, but just the measurement outcomes.

References

(survey) Hangleiter, D. & Eisert, J. Computational advantage of quantum random sampling. Rev. Mod. Phys. 95, 035001 (2023).

(classSim1) Pan, F., Chen, K. & Zhang, P. Solving the sampling problem of the Sycamore quantum circuits. Phys. Rev. Lett. 129, 090502 (2022).

(classSim2) Kalachev, G., Panteleev, P., Zhou, P. & Yung, M.-H. Classical Sampling of Random Quantum Circuits with Bounded Fidelity. arXiv.2112.15083 (2021).

(WN) Dalzell, A. M., Hunter-Jones, N. & Brandão, F. G. S. L. Random Quantum Circuits Transform Local Noise into Global White Noise. Commun. Math. Phys. 405, 78 (2024).

(PT1)vMorvan, A. et al. Phase transitions in random circuit sampling. Nature 634, 328–333 (2024).

(PT2) Ware, B. et al. A sharp phase transition in linear cross-entropy benchmarking. arXiv:2305.04954 (2023).

(Spoof1) Barak, B., Chou, C.-N. & Gao, X. Spoofing Linear Cross-Entropy Benchmarking in Shallow Quantum Circuits. in 12th Innovations in Theoretical Computer Science Conference (ITCS 2021) (ed. Lee, J. R.) vol. 185 30:1-30:20 (2021).

(Spoof2) Gao, X. et al. Limitations of Linear Cross-Entropy as a Measure for Quantum Advantage. PRX Quantum 5, 010334 (2024).

(SimIQP) Bremner, M. J., Montanaro, A. & Shepherd, D. J. Achieving quantum supremacy with sparse and noisy commuting quantum computations. Quantum 1, 8 (2017).

(SimRCS) Aharonov, D., Gao, X., Landau, Z., Liu, Y. & Vazirani, U. A polynomial-time classical algorithm for noisy random circuit sampling. in Proceedings of the 55th Annual ACM Symposium on Theory of Computing 945–957 (2023).

John PreskillHas quantum advantage been achieved?

Recently, I gave a couple of perspective talks on quantum advantage, one at the annual retreat of the CIQC and one at a recent KITP programme. I started off by polling the audience on who believed quantum advantage had been achieved. Just this one, simple question.

The audience was mostly experimental and theoretical physicists with a few CS theory folks sprinkled in. I was sure that these audiences would be overwhelmingly convinced of the successful demonstration of quantum advantage. After all, more than half a decade has passed since the first experimental claim (G1) of “quantum supremacy” as the patron of this blog’s institute called the idea “to perform tasks with controlled quantum systems going beyond what can be achieved with ordinary digital computers” (Preskill, p. 2) back in 2012. Yes, this first experiment by the Google team may have been simulated in the meantime, but it was only the first in an impressive series of similar demonstrations that became bigger and better with every year that passed. Surely, so I thought, a significant part of my audiences would have been convinced of quantum advantage even before Google’s claim, when so-called quantum simulation experiments claimed to have performed computations that no classical computer could do (e.g. (qSim)).

I could not have been more wrong.

In both talks, less than half of the people in the audience thought that quantum advantage had been achieved.

In the discussions that ensued, I came to understand what folks criticized about the experiments that have been performed and even the concept of quantum advantage to begin with. But more on that later. Most of all, it seemed to me, the community had dismissed Google’s advantage claim because of the classical simulation shortly after. It hadn’t quite kept track of all the advances—theoretical and experimental—since then.

In a mini-series of three posts, I want to remedy this and convince you that the existing quantum computers can perform tasks that no classical computer can do. Let me caution, though, that the experiments I am going to talk about solve a (nearly) useless task. Nothing of what I say implies that you should (yet) be worried about your bank accounts.

I will start off by recapping what quantum advantage is and how it has been demonstrated in a set of experiments over the past few years.

Part 1: What is quantum advantage and what has been done?

To state the obvious: we are now fairly convinced that noiseless quantum computers would be able solve problems efficiently that no classical computer could solve. In fact, we have been convinced of that already since the mid-90ies when Lloyd and Shor discovered two basic quantum algorithms: simulating quantum systems and factoring large numbers. Both of these are tasks where we are as certain as we could be that no classical computer can solve them. So why talk about quantum advantage 20 and 30 years later?

The idea of a quantum advantage demonstration—be it on a completely useless task even—emerged as a milestone for the field in the 2010s. Achieving quantum advantage would finally demonstrate that quantum computing was not just a random idea of a bunch of academics who took quantum mechanics too seriously. It would show that quantum speedups are real: We can actually build quantum devices, control their states and the noise in them, and use them to solve tasks which not even the largest classical supercomputers could do—and these are very large.

What is quantum advantage?

But what exactly do we mean by “quantum advantage”. It is a vague concept, for sure. But some essential criteria that a demonstration should certainly satisfy are probably the following.

  1. The quantum device needs to solve a pre-specified computational task. This means that there needs to be an input to the quantum computer. Given the input, the quantum computer must then be programmed to solve the task for the given input. This may sound trivial. But it is crucial because it delineates programmable computing devices from just experiments on any odd physical system.
  2. There must be a scaling difference in the time it takes for a quantum computer to solve the task and the time it takes for a classical computer. As we make the problem or input size larger, the difference between the quantum and classical solution times should increase disproportionately, ideally exponentially.
  3. And finally: the actual task solved by the quantum computer should not be solvable by any classical machine (at the time).

Achieving this last criterion using imperfect, noisy quantum devices is the challenge the idea of quantum supremacy set for the field. After all, running any of our favourite quantum algorithms in a classically hard regime on these devices is completely out of the question. They are too small and too noisy. So the field had to come up with the conceivably smallest and most noise-robust quantum algorithm that has a significant scaling advantage against classical computation.

Random circuits are really hard to simulate!

The idea is simple: we just run a random computation, constructed in a way that is as favorable as we can make it to the quantum device while being as hard as possible classically. This may strike as a pretty unfair way to come up with a computational task—it is just built to be hard for classical computers without any other purpose. But: it is a fine computational task. There is an input: the description of the quantum circuit, drawn randomly. The device needs to be programmed to run this exact circuit. And there is a task: just return whatever this quantum computation would return. These are strings of 0s and 1s drawn from a certain distribution. Getting the distribution of the strings right for a given input circuit is the computational task.

This task, dubbed random circuit sampling, can be solved on a classical as well as a quantum computer, but there is a (presumably) exponential advantage for the quantum computer. More on that in Part 2.

For now, let me tell you about the experimental demonstrations of random circuit sampling. Allow me to be slightly more formal. The task solved in random circuit sampling is to produce bit strings x{0,1}nx \in \{0,1\}^n distributed according to the Born-rule outcome distribution

pC(x)=|x|C|0|2p_C(x) = | \bra x C \ket {0}|^2

of a sequence of elementary quantum operations (unitary rotations of one or two qubits at a time) which is drawn randomly according to certain rules. This circuit CC is applied to a reference state |0\ket 0 on the quantum computer and then measured, giving the string xx as an outcome.

The breakthrough: classically hard programmable quantum computations in the real world

In the first quantum supremacy experiment (G1) by the Google team, the quantum computer was built from 53 superconducting qubits arranged in a 2D grid. The operations were randomly chosen simple one-qubit gates (X,Y,X+Y\sqrt X, \sqrt Y, \sqrt{X+Y}) and deterministic two-qubit gates called fSim applied in the 2D pattern, and repeated a certain number of times (the depth of the circuit). The limiting factor in these experiments was the quality of the two-qubit gates and the measurements, with error probabilities around 0.6 % and 4 %, respectively.

A very similar experiment was performed by the USTC team on 56 qubits (U1) and both experiments were repeated with better fidelities (0.4 % and 1 % for two-qubit gates and measurements) and slightly larger system sizes (70 and 83 qubits, respectively) in the past two years (G2,U2).

Using a trapped-ion architecture, the Quantinuum team also demonstrated random circuit sampling on 56 qubits but with arbitrary connectivity (random regular graphs) (Q). There, the two-qubit gates were π/2\pi/2-rotations around ZZZ \otimes Z, the single-qubit gates were uniformly random and the error rates much better (0.15 % for both two-qubit gate and measurement errors).

All the experiments ran random circuits on varying system sizes and circuit depths, and collected thousands to millions of samples from a few random circuits at a given size. To benchmark the quality of the samples, the widely accepted benchmark is now the linear cross-entropy (XEB) benchmark defined as

χ=22n𝔼C𝔼xpC(x)1,\chi = 2^{2n} \mathbb E_C \mathbb E_{x} p_C(x) -1 ,

for an nn-qubit circuit. The expectation over CC is over the random choice of circuit and the expectation over xx is over the experimental distribution of the bit strings. In other words, to compute the XEB given a list of samples, you ‘just’ need to compute the ideal probability of obtaining that sample from the circuit CC and average the outcomes.

The XEB is nice because it gives 1 for ideal samples from sufficiently random circuits and 0 for uniformly random samples, and it can be estimated accurately from just a few samples. Under the right conditions, it turns out to be a good proxy for the many-body fidelity of the quantum state prepared just before the measurement.

This tells us that we should expect an XEB score of (1error per gate)# gatescnd(1-\text{error per gate})^{\text{\# gates}} \sim c^{- n d } for some noise- and architecture-dependent constant cc. All of the experiments achieved a value of the XEB that was significantly (in the statistical sense) far away from 0 as you can see in the plot below. This shows that something nontrivial is going on in the experiments, because the fidelity we expect for a maximally mixed or random state is 2n2^{-n} which is less than 101410^{-14} % for all the experiments.

The complexity of simulating these experiments is roughly governed by an exponential in either the number of qubits or the maximum bipartite entanglement generated. Figure 5 of the Quantinuum paper has a nice comparison.

It is not easy to say how much leverage an XEB significantly lower than 1 gives a classical spoofer. But one can certainly use it to judiciously change the circuit a tiny bit to make it easier to simulate.

Even then, reproducing the low scores between 0.05 % and 0.2 % of the experiments is extremely hard on classical computers. To the best of my knowledge, producing samples that match the experimental XEB score has only been achieved for the first experiment from 2019 (PCZ). This simulation already exploited the relatively low XEB score to simplify the computation, but even for the slightly larger 56 qubit experiments these techniques may not be feasibly run. So to the best of my knowledge, the only one of the experiments which may actually have been simulated is the 2019 experiment by the Google team.

If there are better methods, or computers, or more willingness to spend money on simulating random circuits today, though, I would be very excited to hear about it!

Proxy of a proxy of a benchmark

Now, you may be wondering: “How do you even compute the XEB or fidelity in a quantum advantage experiment in the first place? Doesn’t it require computing outcome probabilities of the supposedly hard quantum circuits?” And that is indeed a very good question. After all, the quantum advantage of random circuit sampling is based on the hardness of computing these probabilities. This is why, to get an estimate of the XEB in the advantage regime, the experiments needed to use proxies and extrapolation from classically tractable regimes.

This will be important for Part 2 of this series, where I will discuss the evidence we have for quantum advantage, so let me give you some more detail. To extrapolate, one can just run smaller circuits of increasing sizes and extrapolate to the size in the advantage regime. Alternatively, one can run circuits with the same number of gates but with added structure that makes them classically simulatable and extrapolate to the advantage circuits. Extrapolation is based on samples from different experiments from the quantum advantage experiments. All of the experiments did this.

A separate estimate of the XEB score is based on proxies. An XEB proxy uses the samples from the advantage experiments, but computes a different quantity than the XEB that can actually be computed and for which one can collect independent numerical and theoretical evidence that it matches the XEB in the relevant regime. For example, the Google experiments averaged outcome probabilities of modified circuits that were related to the true circuits but easier to simulate.

The Quantinuum experiment did something entirely different, which is to estimate the fidelity of the advantage experiment by inverting the circuit on the quantum computer and measuring the probability of coming back to the initial state.

All of the methods used to estimate the XEB of the quantum advantage experiments required some independent verification based on numerics on smaller sizes and induction to larger sizes, as well as theoretical arguments.

In the end, the advantage claims are thus based on a proxy of a proxy of the quantum fidelity. This is not to say that the advantage claims do not hold. In fact, I will argue in my next post that this is just the way science works. I will also tell you more about the evidence that the experiments I described here actually demonstrate quantum advantage and discuss some skeptical arguments.


Let me close this first post with a few notes.

In describing the quantum supremacy experiments, I focused on random circuit sampling which is run on programmable digital quantum computers. What I neglected to talk about is boson sampling and Gaussian boson sampling, which are run on photonic devices and have also been experimentally demonstrated. The reason for this is that I think random circuits are conceptually cleaner since they are run on processors that are in principle capable of running an arbitrary quantum computation while the photonic devices used in boson sampling are much more limited and bear more resemblance to analog simulators.

I want to continue my poll here, so feel free to write in the comments whether or not you believe that quantum advantage has been demonstrated (by these experiments) and if not, why.


Continue reading Part 2: Considering the evidence.

References

[G1] Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510 (2019).

[Preskill] Preskill, J. Quantum computing and the entanglement frontier. arXiv:1203.5813 (2012).

[qSim] Choi, J. et al. Exploring the many-body localization transition in two dimensions. Science 352, 1547–1552 (2016). .

[U1] Wu, Y. et al. Strong Quantum Computational Advantage Using a Superconducting Quantum Processor. Phys. Rev. Lett. 127, 180501 (2021).

[G2] Morvan, A. et al. Phase transitions in random circuit sampling. Nature 634, 328–333 (2024).

[U2] Gao, D. et al. Establishing a New Benchmark in Quantum Computational Advantage with 105-qubit Zuchongzhi 3.0 Processor. Phys. Rev. Lett. 134, 090601 (2025).

[Q] DeCross, M. et al. Computational Power of Random Quantum Circuits in Arbitrary Geometries. Phys. Rev. X 15, 021052 (2025).

[PCZ] Pan, F., Chen, K. & Zhang, P. Solving the sampling problem of the Sycamore quantum circuits. Phys. Rev. Lett. 129, 090502 (2022).

March 20, 2026

n-Category Café The Agent That Doesn't Know Itself

guest post by William Waites

The previous post introduced the plumbing calculus: typed channels, structural morphisms, two forms of composition, and agents as stateful morphisms with a protocol for managing their state. The examples were simple. This post is about what happens when the algebra handles something genuinely complex.

To get there, we need to understand a little about how large language models work.

Large language models are sequence-to-sequence transducers: a sequence of tokens comes in, a sequence comes out. Text is tokenised and the model operates on the tokens.

From the outside, the morphism is simple: !string → !string. A message goes in, a message comes out. But the client libraries (the code that calls the LLM provider) maintain the conversation history and send it back with every call. The actual morphism is (!string, ![Message]) → (!string, ![Message]): the input message and the accumulated history go in, the response and the updated history come out. The history feeds back. This is a trace in the sense of traced monoidal categories: the feedback channel is hidden from the user, who sees only !string → !string.

Crucially, the model has a limited amount of memory. It is not a memoryless process, but the memory it has is not large: 200,000 tokens for current models, perhaps a million for the state of the art. This sounds like a lot. It is not. An academic paper is roughly 10,000 tokens. A literature review that needs to work with thirty papers has already exceeded the context window of most models, and that is before the model has produced a single word of output.

If you have used any of these agent interfaces, you will have noticed that after talking back and forth for a while, the agent will compact. This is a form of memory management. What is happening is that some supervisory process has noticed the context window filling up, and has intervened to shorten its contents. A naïve approach is to truncate: discard everything before the last N exchanges. A better approach is to feed the entire context to another language model and ask it to summarise, then put the summary back.

This is normally done by specialised code outside the agent, invisible to it.

How to manage agent memory well is an active research area. We do not, in general, do it very well. Truncation loses information. Summarisation loses nuance. Pinning helps but the right pinning strategy depends on the task. These are open questions, and to make progress we need to be able to experiment with different schemes and mechanisms: express a memory management strategy, test it, swap it for another, compare. Not by recompiling specialised code or hardcoding behaviour, but by writing it down in a language designed for exactly this kind of composition. Memory management should be a plumbing program: modular, type-checked, swappable.

So we built an implementation of compaction using the plumbing calculus, and the first thing we did was test it. I ran the protocol on a very short cycle: a single message caused a compaction, because the threshold was set to zero for testing. The compressor fired, produced a summary, rebuilt the agent’s context. The logs showed [compressor] 3404 in / 541 out. The protocol worked.

Then I asked the agent: "have you experienced compaction?"

The agent said no. It explained what compaction is, accurately. Said it hadn’t happened yet. It was confident.

I asked: "do you have a context summary in your window?"

Yes, it said, and described the contents accurately.

"How did that context summary get there if you have not yet compacted?"

The agent constructed a plausible, confident, and completely wrong explanation: the summary was "provided to me by the system at the start of this conversation" as a "briefing or recap." When pressed, it doubled down:

"The context-summary is not evidence that compaction has occurred. It’s more like a briefing or recap that the system gives me at the start of a conversation session to provide continuity."

The agent was looking at the direct evidence of its own compaction and confidently explaining why it was not compaction. We will return to why it gets this wrong, and how to fix it. But first: how do we build this?

The compacting homunculus

At a high level, it works like this. An agent is running: input comes in, output goes out. Together with the output, the agent emits a telemetry report. The telemetry includes token counts: with each transaction, the entire train of messages and responses is sent to the LLM provider, and back comes a response together with a count of the tokens that went in and the tokens that came out. Our agent implementation sends this telemetry out of the telemetry port to anybody who is listening.

The construction involves a second agent. This second agent is a homunculus: the little man who sits on your shoulder and watches what your mind is doing. Here is the topology:

Topology of the compacting homunculus. Two boxes: Agent (large, bottom) and Compressor (smaller, top). The Agent has input and output ports for the main data flow (dark blue arrows). Three channels connect the Agent to the Compressor: telemetry flows up from the Agent (the Compressor watches token counts), ctrl_out flows up from the Agent (the Compressor receives acknowledgements), and ctrl_in flows down from the Compressor to the Agent (the Compressor sends commands). The Agent does not know the Compressor exists. It just receives control messages and responds to them.

The homunculus listens to the telemetry and says: the memory is filling up. The token count has crossed a threshold. It is time to compact. And then it acts:

• Send pause to the agent’s control port. Stop accepting input.

• Send get memory. The agent produces the contents of its context window.

• Summarise that memory (using another LLM call).

• Send set memory with the compacted version.

• Send resume. The agent continues processing input.

Each step requires an acknowledgement before the next can proceed. This is a protocol: pause, acknowledge, get memory, here is the memory, set memory, acknowledge, resume, acknowledge.

It is possible to express this directly in the plumbing calculus, but it would be painfully verbose. Instead, we use session types to describe the protocol. This is not pseudocode. There is a compiler and a runtime for this language. Here is the protocol:

protocol Compaction =
  send Pause . recv PauseAck .
  send GetMemory . recv MemoryDump .
  send SetMemory . recv SetMemoryAck .
  send Resume . recv ResumeAck . end

The protocol is eight lines. It reads as a sequence of steps: send, receive, send, receive, and so on. The compiler knows what types each step carries. Now we wire it up:

let compact : (!CtrlResp, !json) -> !CtrlCmd =
  plumb(ctrl_out, telemetry, ctrl_in) {
 
    (ctrl_out, ctrl_in)  Compaction as session
 
    telemetry
      ; filter(kind = "usage" && input_tokens > 150000)
      ; map(null) ; session@trigger
 
    session@trigger ; map({pause: true})
      ; session@send(Pause)
    session@done(PauseAck) ; map({get_memory: true})
      ; session@send(GetMemory)
    session@recv(MemoryDump) ; compressor
      ; session@send(SetMemory)
    session@done(SetMemoryAck) ; map({resume: true})
      ; session@send(Resume)
}

The first line binds the protocol to the agent’s control ports:

(ctrl_out, ctrl_in) <-> Compaction as session.

This says: the Compaction protocol runs over the control channel, and we refer to it as session. The telemetry line is the trigger: when token usage crosses a threshold, the protocol begins. Each subsequent line is one step of the protocol, wired to the appropriate control messages.

Here is a direct depiction of the protocol as wired. You can trace it through:

Diagram of the compaction protocol wired between the homunculus and the bot agent. Shows the telemetry stream flowing from the bot to the homunculus, a filter checking token usage against a threshold, and then a sequence of control messages: Pause flows to ctrl_in, PauseAck returns on ctrl_out, GetMemory flows in, MemoryDump returns, passes through a compressor agent, SetMemory flows in, SetMemoryAck returns, Resume flows in, ResumeAck returns. The protocol steps are connected in sequence. This is a direct transcription of the session type protocol into a wiring diagram.

And here is how we wire the homunculus to the agent:

let main : !string -> !string =
  plumb(input, output) {
    let ctrl : !CtrlCmd = channel
    let ctrl_out : !CtrlResp = channel
    let telem : !json = channel
 
    spawn bot(input=input, ctrl_in=ctrl,
              output=output, ctrl_out=ctrl_out,
              telemetry=telem)
    spawn compact(ctrl_out=ctrl_out,
                  telemetry=telem, ctrl_in=ctrl)
}

The main morphism takes a string input and produces a string output. Internally, it creates three channels (control commands, control responses, telemetry) and spawns two processes: the bot agent and the compact homunculus. The homunculus listens to the bot’s telemetry and control responses, and sends commands to the bot’s control input. The bot does not know the homunculus exists. It just receives control messages and responds to them.

There are two nested traces here. The first is the one from before, inside the agent: messages go in, the output accumulates with everything that came before, and the whole history feeds back on the next turn. We do not see this trace. It is hidden inside the client library. The second trace is the one we have just built: the homunculus. What goes around the outer loop is control: telemetry flows out, commands flow in, acknowledgements come back. The memory dump passes through the control channel at one point in the protocol, but the feedback path is control, not conversation history. Nested traces compose; the algebra has identities for this and it is fine. But they are different loops carrying different things.

Session types as barrier chains

The connection between the protocol above and what the compiler actually produces is the functor from session types into the plumbing calculus. This functor works because of barrier.

Why do we need the barrier? Because the protocol is about sending a message and waiting for a response. We can send a message, but we need the response to arrive before we proceed. The barrier takes two streams, one carrying the "done" signal and one carrying the response, and synchronises them into a pair. Only when both are present does the next step begin.

Each session type primitive has a direct image in the plumbing category, and the structure is prettier than it first appears. The primitives come in dual pairs:

In the diagrams below, session types are on the left in blue; their images in the plumbing calculus are on the right in beige.

send and recv are dual. They map to map and filter, which are also dual: send wraps the value with map, then synchronises via barrier with the done signal from the previous step. Recv filters the control output by step number, synchronises via barrier, then extracts the payload with map.

select and offer are dual. They map to tag and case analysis, which are also dual: select tags the value with a label via map, synchronises via barrier, and routes to the chosen branch chain. Offer copies the control output and filters each copy by label, routing to the appropriate branch chain.

Diagram showing the two dual pairs of session type primitives and their images in the plumbing calculus. Session types are shown in blue on the left; plumbing calculus images in beige on the right. Top section: send T is a simple arrow carrying type T; its image is map(wrap) followed by barrier with a done signal, then routing to the control input. recv T is its dual: filtering ctrl_out by step number, barrier with done, then map to extract the payload. Bottom section: select with labelled alternatives maps to coproduct injection via map(tag), barrier, then routing to the chosen branch chain. offer is its dual: copy the control output, filter each copy by label, and route to the corresponding branch chain.

• The sequencing operator (.) maps to a barrier chain. Each send-then-recv step becomes a barrier that synchronises the outgoing message with the incoming acknowledgement, and these barriers chain together to enforce the protocol ordering.

Diagram showing how the sequencing operator in session types maps to a barrier chain in the plumbing calculus. Top: the session type sequence "send T1 . recv T2 . send T3" shown as three boxes in a row connected by dots. Bottom: the plumbing image, a chain of barriers. Each send-recv pair becomes a barrier that takes the outgoing message on one arm and the incoming acknowledgement on the other, producing a synchronised pair. The done signal from one barrier feeds into the next, creating a chain that enforces protocol ordering. The trigger input starts the chain; the done output signals completion. Filters select responses by step number; maps construct outgoing messages.

rec maps to a feedback loop: merge takes the initial arm signal and the last done signal from the previous iteration, feeds them into the barrier chain body, and copy at the end splits done into output and feedback. The trigger serialisation gate starts

end is implicit: the chain simply stops. Discard handles any remaining signals.

Diagram showing three more session type primitives and their plumbing images. Top: rec X . S (recursion) maps to a feedback loop. A merge node takes two inputs: the initial arm signal and the last-done signal fed back from the end of the body. The body is a barrier chain (S). At the output, copy splits the done signal into an output arm and a feedback arm that loops back to merge. Middle: end is shown as simply discarding any remaining signal. Bottom: the trigger and serialisation gate, which starts the protocol. A trigger input feeds through a barrier that synchronises with a copy of the done signal, ensuring only one instance of the protocol runs at a time.

This mapping is a functor. It is total: every session type primitive has an image in the plumbing category, using only the morphisms we already have. Session types are a specification language; the plumbing calculus is the execution language. The compiler translates one into the other.

The reason we do this becomes obvious from the diagram below. It is scrunched up and difficult to look at. If you click on it you can get a big version and puzzle it out. If you squint through the spaghetti, you can see that it does implement the same compaction protocol above. We would not want to implement this by hand. So it is nice to have a functor. If you have the patience to puzzle your way through it, you can at least informally satisfy yourself that it is correct.

Thumbnail of the fully desugared compaction protocol as produced by the compiler. The diagram is intentionally dense: a large network of barriers, filters, maps, copy and merge nodes, all connected by typed wires. Each step of the compaction protocol (pause, get memory, set memory, resume) is visible as a cluster of barrier chains with filters selecting response types and maps constructing commands. The full-size version is linked for readers who want to trace the individual connections, but the point is that this is what the compiler produces from the eight-line session type specification above, and you would not want to construct it by hand.

Document pinning

There is another feature we implement, because managing the memory of an agent is not as simple as just compressing it.

The problem with compression is that it is a kind of annealing. As the conversation grows, it explores the space of possible conversation. When it gets compacted, it is compressed, and that lowers the temperature. Then it grows again, the temperature rises, and then it is compressed again. With each compression, information is lost. Over several cycles of this, the agent can very quickly lose track of where it was, what you said at the beginning, what it was doing.

We can begin to solve this with document pinning. The mechanism is a communication between the agent and its homunculus, not shown in the protocol above. It is another protocol. The agent says: this document that I have in memory (technically a tool call and response, or just a document in the case of the prompts), pin it. What does that mean? When we do compaction, we compact the contents of memory, but when we replace the memory, we also replace those pinned documents verbatim. And of course you can unpin a document and say: I do not want this one any more.

Either the agent can articulate this or the user can. The user can say: you must remember this, keep track of this bit of information. And the agent has a way to keep the most important information verbatim, without it getting compacted away.

This accomplishes two things.

First, it tends to keep the agent on track, because the agent no longer loses the important information across compaction cycles. The annealing still happens to the bulk of the conversation, but the pinned documents survive intact.

Second, it has to do with the actual operation of the underlying LLM on the GPUs. When you send a sequence of messages, this goes into the GPU and each token causes the GPU state to update. This is an expensive operation, very expensive. This is why these things cost so much. What you can do with some providers is put a cache point and say: this initial sequence of messages, from the beginning of the conversation up until the cache point, keep a hold of that. Do not recompute it. When you see this exact same prefix, this exact same sequence of messages again, just load that memory into the GPU directly. Not only is this a lot more efficient, it is also a lot cheaper, a factor of ten cheaper if you can actually hit the cache.

So if you are having a session with an agent and the agent has to keep some important documents in its memory, it is a good idea to pin them to the beginning of memory. You sacrifice a little bit of the context window in exchange for making sure that, number one, the information in those documents is not forgotten, and number two, that it can hit the cache. This is explained in more detail in a separate post on structural prompt preservation.

The agent that doesn’t know itself

Why does the agent get this wrong? In one sense, it is right. It has not experienced compaction. Nobody experiences compaction. Compaction happens in the gap between turns, in a moment the agent cannot perceive. The agent’s subjective time begins at the summary. There is no "before" from its perspective.

The summary is simply where memory starts. It is like asking someone "did you experience being asleep?" You can see the evidence, you are in bed, time has passed. But you did not experience the transition.

The <context-summary> tag is a structural marker. But interpreting it as evidence of compaction requires knowing what the world looked like before, and the agent does not have that. It would need a memory of not having a summary, followed by a memory of having one. Compaction erases exactly that transition.

Self-knowledge as metadata

The fix is not complicated. It is perfectly reasonable to provide, along with the user’s message, self-knowledge to the agent as metadata. What would it be useful for the agent to know?

The current time. The sense of time that these agents have is bizarre. We live in continuous time. Agents live in discrete time. As far as they are concerned, no time passes between one message and the next. It is instantaneous from their point of view. You may be having a conversation, walk away, go to the café, come back two hours later, send another message, and as far as the agent is concerned no time has passed. But if along with your message you send the current time, the agent knows.

How full the context window is. The agent has no way of telling, but you can provide it: this many tokens came in, this many went out.

Compaction cycles. So the agent knows how many times it has been compacted, and can judge the accuracy of the contents of its memory, which otherwise it could not do.

With the compaction counter, the agent immediately gets it right:

"Yes, I have experienced compaction. According to the runtime context, there has been 1 compaction cycle during this session."

No hedging, no confabulation. Same model, same prompts, one additional line of runtime context.

Context drift

This matters beyond the compaction story, because many of the failures we see in the news are context failures, not alignment failures.

While we were writing this post, a story appeared in the Guardian about AI chatbots directing people with gambling addictions to online casinos. This kind of story is common: vulnerable people talking to chatbots, chatbots giving them bad advice. The response of the industry is always the same: we need better guardrails, better alignment, as though the chatbots are badly aligned.

I do not think that is what is happening. What is happening is a lack of context. Either the chatbot was never told the person was vulnerable, or it was told and the information got lost. Someone with a gambling addiction may start by saying "I have a gambling problem." Then there is a four-hour conversation about sports. Through compaction cycles, what gets kept is the four hours of sports talk. The important bit of information does not get pinned and does not get kept. Context drift. By the time the user asks for betting tips, the chatbot no longer knows it should not give them.

The way to deal with this is not to tell the language model to be more considerate. The way to deal with it is to make sure the agent has enough information to give good advice, and that the information does not get lost. This is what document pinning is for: pinned context survives compaction, stays at the top of the window, cannot be diluted by subsequent conversation. This is discussed further in a separate post on structural prompt preservation.

But pinning is only one strategy. The field is in its infancy. We do not really know the right way to manage agent memory, and we do not have a huge amount of experience with it. What we are going to need is the ability to experiment with strategies: what if compaction works like this? What if pinning works like that? What if the homunculus watches for different signals? Each of these hypotheses needs to be described clearly, tested, and compared. This is where the formal language earns its keep. A strategy described in the plumbing calculus is precise, checkable, and can be swapped out for another without rewriting the surrounding infrastructure. We can experiment with memory architectures the way we experiment with any other part of a system: by describing what we want and seeing if it works.

Why has nobody done this?

When the first draft of this post was written, it was a mystery why the field had not thought to give agents self-knowledge as a routine matter: what they are doing, who they are talking to, what they should remember. Prompts are initial conditions. They get compacted away. There are agents that save files to disc, in a somewhat ad hoc way, but we do not give them tools to keep track of important information in a principled way.

Contemporaneously with this work, some providers have started to do it. For example, giving agents a clock, the ability to know what time it is. This is happening now, in the weeks between drafting and publication. The field is only now realising that agents need a certain amount of self-knowledge in order to function well. The compressed timeline is itself interesting: the gap between "why has nobody done this?" and "everybody is starting to do this" was a matter of weeks.

The mechanisms we have presented here allow us to construct agent networks and establish protocols that describe rigorously how they are meant to work. We can describe strategies for memory management in a formal language, test them, and swap them out. And perhaps beyond the cost savings and the efficiency increases, the ability to experiment clearly and formally with how agents manage their own memory is where the real value lies.

March 14, 2026

Terence TaoMathematics Distillation Challenge – Equational Theories

Mathematical research traditionally involves a small number of professional mathematicians working closely on difficult problems. However, I have long believed that there is a complementary way to do mathematics, in which one works with a broad community of mathematically minded people on problems which may not be as deep as the problems one traditionally works on, but still are of mathematical interest; and that modern technologies, including AI, are more suitable for contributing the latter type of workflow. The “Polymath projects” were one example of this broad type of collaboration, where internet platforms such as blogs and wikis were used to facilitate such collaboration. Some years later, collaborative formalization projects (such as the one to formalize the Polynomial Freiman–Ruzsa conjecture of Marton, discussed previously on this blog here) became popular in some circles. And in 2024, I launched the Equational Theories Project (ETP) (discussed on this blog here and here), combining the rigor of Lean formalization with “good old fashioned AI” (in the form of automated theorem provers) to settle (with formal verification) over 22 million true-false problems in universal algebra.

Continuing in this spirit, Damek Davis and I are launching a new project, in the form of an experimental competitive challenge hosted by the SAIR Foundation (where I serve as a board member, and which is supplying technical support and compute). The idea of this challenge, motivated in part by this recent paper of Honda, Murakami, and Zhang, is to measure the extent to which the 22 million universal algebra true-false results obtained by the ETP can be “distilled” into a short, human-readable “cheat sheet”, similar to how a student in an undergraduate math class might distill the knowledge learned from that class into a single sheet of paper that the student is permitted to bring into an exam.

Here is a typical problem in universal algebra that the ETP was able to answer:

Problem 1 Suppose that {*: M \times M \rightarrow M} is a binary operation such that {x * (y * z) = (z * w) * w} for all {x,y,z,w}. Is it true that {x * (y * x) = (x * y) * z} for all {x,y,z}?

Such a problem can be settled either by algebraically manipulating the initial equation to deduce the target equation, or by finding a counterexample to the target equation that still satisfies the initial equation. There are a variety of techniques to achieve either of these, but this sort of problem is difficult, and even undecidable in some cases; see this paper of the ETP collaborators for more discussion. Nevertheless, many of these problems can be settled with some effort by humans, by automated theorem provers, or by frontier AI systems; here for instance is an AI-generated solution to the above problem.

However, these AI models are expensive, and do not reveal much insight as to where their answers come from. If one instead tries a smaller and cheaper model, such as one of the many open-source models available, it turns out that these models basically perform no better than random chance, in that when asked to say whether the answer to a question such as the above is true or false, they only answer correctly about 50% of the time.

But, similarly to how a student struggling with the material for a math class can perform better on an exam when provided the right guidance, it turns out that such cheap models can perform at least modestly better on this task (with success rates increasing to about 55%-60%) if given the right prompt or “cheat sheet”.

“Stage 1” of the distillation challenge, which we launched today, asks for contestants to design a cheat sheet (of at most 10 kilobytes in size) that can increase the performance of these models on the above true-false problems to as high a level as possible. We have provided a “playground” with which to test one’s cheat sheet (or a small number of example cheat sheets) some cheap models against a public set of 1200 problems (1000 of which were randomly selected, and rather easy, together with 200 “hard” problems that were selected to resist the more obvious strategies for resolving these questions); a brief video explaining how to use the playground can be found here.

Submissions stage will end on April 20, after which we will evaluate the submissions against a private subset of test questions. The top 1000 submissions will advance to a second stage which we are currently in the process of designing, which will involve more advanced models, but also the more difficult task of not just providing a true-false answer, but also a proof or counterexample to the problem.

The competition will be coordinated on this Zulip channel, where I hope there will be a lively and informative discussion.

My hope is that the winning submissions will capture the most productive techniques for solving these problems, and/or provide general problem-solving techniques that would also be applicable to other types of mathematical problems. We started with the equational theory project data set for this pilot competition due to its availability and spectrum of difficulty levels, but if this type of distillation process leads to interesting results, one could certainly run in on many other types of mathematical problem classes to get some empirical data on how readily they can be solved, particularly after we learn from this pilot competition on how to encourage participation and share of best practices.

SAIR will also launch some other mathematical challenges in the coming months that will be of a more cooperative nature than this particular competitive challenge; stay tuned for further announcements.

March 12, 2026

n-Category Café A Typed Language for Agent Coordination

guest post by William Waites

How category theory can be used to help coordinate collections of interacting large language models.

Agent frameworks are popular. (These are frameworks for coordinating large language model agents, not to be confused with agent-based modelling in the simulation sense.) There are dozens of them for wrapping large language models in something called an agent and assembling groups of agents into workflows. Much of the surrounding discussion is marketing, but the underlying intuition is old: your web browser identifies itself as a user agent. What is new is the capability that generative language models bring.

The moment you have one agent, you can have more than one. That much is obvious. How to coordinate them is not. The existing frameworks (n8n, LangGraph, CrewAI, and others) are engineering solutions, largely ad hoc. Some, like LangGraph, involve real thinking about state machines and concurrency. But none draws on what we know from mathematics and computer science about typed composition, protocol specification, or structural guarantees for concurrent systems.

This matters because it is expensive. Multi-agent systems are complicated concurrent programs. Without structural guardrails, they fail in ways you discover only after spending the compute. A job can go off the rails, and the money you paid for it is wasted; the providers will happily take it regardless. At current subscription rates the cost is hidden, but a recent Forbes investigation found that a heavy user of Anthropic’s $200/month Claude Code subscription can consume up to $5,000/month measured at retail API rates. For third-party tools like Cursor, which pay close to those retail rates, these costs are real. Wasted tokens are wasted money.

To address this, we built a language called plumbing. It describes how agents connect and communicate, in such a way that the resulting graph can be checked before execution: checked for well-formedness, and within limits for deadlocks and similar properties. It is a statically typed language, and these checks are done formally. There is a compiler and a runtime for this language, working code, not a paper architecture. In a few lines of plumbing, you can describe agent systems with feedback loops, runtime parameter modulation, and convergence protocols, and be sure they are well-formed before they run. This post explains how it works.

The name has a history in computing. Engineers have always talked informally about plumbing to connect things together: bits of software, bits of network infrastructure. When I was a network engineer I sometimes described myself as a glorified plumber. The old Solaris ifconfig command took plumb as an argument, to wire a network interface into the stack. Plan 9 had a deeper version of the same idea. The cultural connection goes back decades.

This is the first of two posts. This one introduces the plumbing calculus: what it is, how it works, and a few simple examples. Motifs for adversarial review, ensemble reasoning, and synthesis. The second post will tackle something harder.

The calculus

The plumbing language is built on a symmetric monoidal category, specifically a copy-discard category with some extra structure. The terminology may be unfamiliar, but the underlying concept is not. Engineers famously like Lego. Lego bricks have studs on top and holes with flanged tubes underneath. The studs of one brick fit into the tubes of another. But Lego has more than one connection type: there are also holes through the sides of Technic bricks, and axles that fit through them, and articulated ball joints for the fancier kits. Each connection type constrains what can attach to what. This is typing.

In plumbing, the objects of the category are typed channels: streams that carry a potentially infinite sequence of values, each of a specific type (integer, string, a record type, or something more complex). We write !A to mean "a stream of As", so !string is a stream of strings and !int is a stream of integers. The morphisms, which describe how you connect channels together, are processes. A process has typed inputs and typed outputs.

There are four structural morphisms. Copy takes a stream and duplicates it: the same values appear on two output streams. Discard throws values away, perhaps the simplest thing you can do with a stream, and often needed. These two, together with the typed channels and the laws of the category, give us a copy-discard category.

To this we add two more. Merge takes two streams of the same type and interleaves them onto a single output stream. This is needed because a language model’s input is a single stream. There is nothing to be done about that. If you want to send two different things into it, you must send one and then the other. One might initially give merge the type !A ⊗ !B → !(A + B), taking two streams of different types and producing their coproduct. This works, but it is unnecessarily asymmetrical.

As Tobias Fritz has observed, it is cleaner to do the coproduct injection first, converting each stream to the coproduct type separately, and then merge streams that already have the same type. This gives:

merge : !A ⊗ !A → !(A + A)

Barrier takes two streams, which may be of different types, and synchronises them. Values arrive unsynchronised; the barrier waits for one value from each stream and produces a pair.

barrier : !A ⊗ !B → !(A, B)

(A mathematician would write A ×\times B for the product. We cannot easily do this in a computer language because there is no ×\times symbol on most keyboards, so we use (A, B) for the product, following Haskell’s convention.)

This is a synchronisation primitive. It is important because it unlocks session types, which we will demonstrate in the second post.

Two further morphisms are added to the category (they are not derivable from the structural ones, but are needed to build useful things): map, which applies a pure function to each value in a stream, and filter, which removes values that do not satisfy a predicate. Both are pure functions over streams. Both will be familiar from functional programming.

Here is a graphical representation of the morphisms. We can glue them together freely, as long as the types and the directions of the arrows match up.

Diagram showing all six morphisms as boxes with typed input and output wires. Top row: copy Δ (one input, two outputs of the same type), merge ∇ (two inputs of copyable type, one output of sum type), discard ◇ (one input, no output). Bottom row: barrier ⋈ (two inputs, one paired output, synchronises two streams), map f (one input, one output, applies a function), filter p (one input, one output, removes values failing a predicate). Each morphism shows its type signature using the !A notation for copyable streams.

There are two forms of composition. Sequential composition connects morphisms nose to tail, the output of one feeding the input of the next. Parallel composition places them side by side, denoted by ⊗ (the tensor product, written directly in plumbing source code). So: four structural morphisms, two utilities, two compositional forms, all operating on typed channels.

Because the channels are typed, the compiler can check statically, at compile time, that every composition is well-formed: that outputs match inputs at every boundary. This gives a guarantee that the assembled graph makes sense.

Two diagrams side by side. Left: sequential composition, showing two morphisms connected end-to-end, the output wire of the first feeding into the input wire of the second, forming a pipeline. Right: parallel composition (tensor product), showing two morphisms stacked vertically with no connection between them, running simultaneously on independent streams. Both forms produce a composite morphism whose type is derived from the types of the components.

A composition of morphisms is itself a morphism. This follows from the category laws (it has to, or it is not a category) but the practical consequence is worth stating explicitly. We can assemble a subgraph of agents and structural morphisms, and then forget the internal detail and use the entire thing as a single morphism in a larger graph. This gives modularity. We can study, test, and refine a building block in isolation, and once satisfied, use it as a component of something bigger.

What we have described so far is the static form of the language: concise, point-free (composing operations without naming intermediate values), all about compositions. This is what you write. It is not what the runtime executes. A compiler takes this static form and produces the underlying wiring diagram, expanding the compositions into explicit connections between ports. The relationship is similar to point-free style in functional programming: the concise form is good for thinking and writing; the expanded form is good for execution.

Agents

An agent is a special kind of morphism. It takes typed input and produces typed output, like any other morphism, and we can enforce these types. This much is a well-known technique; PydanticAI and the Vercel AI SDK do it. Agents implement typing at the language model level by producing and consuming JSON, and we can check that the JSON has the right form. This is the basis of the type checking.

Unlike the structural morphisms and utilities, an agent is stateful. It has a conversation history, a context window that fills up, parameters that change. You cannot sensibly model an agent as a pure function. You could model it using the state monad or lenses, and that would be formally correct, but it is the wrong level of abstraction for engineering. Instead, we allow ourselves to think of agents as opaque processes with a typed protocol for interacting with them. We mutate their state through that protocol, and we know how to do that purely from functional programming and category theory. The protocol is the right abstraction; the state management is an implementation detail behind it. How this works in practice, and what happens when it goes wrong, is the subject of the second post.

In addition to their main input and output ports, agents in plumbing have control ports (control in and control out) for configuring the agent at runtime. For example, the temperature parameter governs how creative a language model is: how wide its sampling distribution when choosing output. At zero it is close to deterministic; at one it becomes much less predictable. A control message might say set temperature to 0.3; the response on the control out wire might be acknowledged. The control port carries a typed stream like anything else.

Agents also have ports for operator-in-the-loop (often called human-in-the-loop, though there is no reason an operator must be human), tool calls, and telemetry. The telemetry port emits usage statistics and, if the underlying model supports it, thinking traces. We will not detail these here. Suffice it to say that an agent has several pairs of ports beyond what you might imagine as its regular chat input and output.

Diagram of a generic agent morphism showing all port pairs. The agent is a central box. On the left: input (main data stream), ctrl_in (control commands), tool_in (tool call responses), oitl_in (operator-in-the-loop responses). On the right: output (main data stream), ctrl_out (control responses), tool_out (tool call requests), oitl_out (operator-in-the-loop requests), telemetry (usage and diagnostic data). Each port pair carries a typed stream. Most programs use only a few of these ports; unused ports are elided via the don't-care-don't-write convention.

An agent has many ports, but most programs use only a few of them. We adopt a convention from the κ calculus: don’t care, don’t write. Any output port that is not mentioned in the program is implicitly connected to discard. If a port’s output cannot matter, there is no reason to write it down.

Example: adversarial document composition

Suppose the problem is to write a cover letter for a job application. You provide some background material (a CV, some notes, some publications) and a job advert. You want a network of agents to produce a good cover letter. A good cover letter has two constraints: it must be accurate, grounded in the source materials, not making things up; and it must be compelling, so that the reader wants to give you an interview.

These two constraints are in tension, and they are best served by different agents with different roles. A composer drafts from the source materials. A checker verifies the draft against those materials for accuracy, producing a verdict: pass or fail, with commentary. A critic, who deliberately cannot see the source materials, evaluates whether the result is compelling on its own terms, producing a score.

The feedback loops close the graph. If the checker rejects the draft, its commentary goes back to the composer. If the critic scores below threshold, its review goes back to the composer. Only when the critic is satisfied does the final draft emerge.

Here is the plumbing code:

type Verdict = { verdict: bool, commentary: string, draft: string }
type Review  = { score: int, review: string, draft: string }
 
let composer : !string -> !string = agent { ... }
let checker  : !string -> !Verdict = agent { ... }
let critic   : !Verdict -> !Review = agent { ... }
 
let main : !string -> !string = plumb(input, output) {
  input   ; composer ; checker
  checker ; filter(verdict = false)
          ; map({verdict, commentary}) ; composer
  checker ; filter(verdict = true) ; critic
  critic  ; filter(score < 85)
          ; map({score, review}) ; composer
  critic  ; filter(score >= 85).draft ; output
}

And here is a graphical representation of what’s going on:

Vertical diagram of the adversarial document composition pipeline. Flow runs top to bottom. Input feeds into a composer agent. The composer's output goes to a checker agent. The checker splits two ways via filter: if verdict is false, the verdict and commentary are mapped back to the composer as feedback (loop). If verdict is true, the draft goes to a critic agent. The critic also splits two ways: if score is below 85, the score and review are mapped back to the composer for revision (second loop). If score is 85 or above, the draft is extracted via map and sent to the output. Two feedback loops, two quality gates, one output.

The agent configuration is elided. The main pipeline takes a string input and produces a string output. It is itself a morphism, and could be used as a component in something larger.

Notice what the wiring enforces. The critic receives verdicts, not the original source materials. The information partition is a consequence of the types, not an instruction in a prompt. The feedback loops are explicit: a failed verdict routes back to the composer with commentary; a low score routes back with the review. All of this is checked at compile time.

Example: heated debate

The previous example shows sequential composition and feedback loops but not parallel composition. An ensemble of agents running simultaneously on the same input needs the tensor product.

Ensembles are common. Claude Code spawns sub-agents in parallel to investigate or review, then gathers the results. This is a scatter-gather pattern familiar from high-performance computing. But this example, due to Vincent Danos, adds something less common: modulation of agent behaviour through the control port.

The input is a proposition. Two agents debate it, one advocating and one sceptical, running in parallel via the tensor product. Their outputs are synchronised by a barrier into a pair and presented to a judge. The judge decides: has the debate converged? If so, a verdict goes to the output. If not, a new topic goes back to the debaters, and a temperature goes to their control inputs.

The intuition is that the debaters should start creative (high temperature, wide sampling) and become progressively more focused as the rounds continue. The judge controls this. Each round, the judge decides both whether to continue and how volatile the next round should be. If the debate appears to be converging, the judge lowers the temperature, preventing the system from wandering off in new directions. Whether this actually causes convergence is a research question, not a proven result.

type Verdict = { resolved: bool, verdict: string,
                 topic: string, heat: number }
type Control = { set_temp: number }
 
let advocate : (!string, !Control) -> !string = agent { ... }
let skeptic  : (!string, !Control) -> !string = agent { ... }
let judge    : !(string, string) -> !Verdict  = agent { ... }
 
let cool : !Verdict -> !Control = map({set_temp: heat})
 
let main : !string -> !string = plumb(input, output) {
  input ; (advocate ⊗ skeptic) ; barrier ; judge
  judge ; filter(resolved = false).topic ; (advocate ⊗ skeptic)
  judge ; filter(resolved = true).verdict ; output
  judge ; cool ; (advocate@ctrl_in ⊗ skeptic@ctrl_in)
}

And here is the graphical representation:

Diagram of the heated debate example. Two agent boxes (advocate and skeptic) are placed in parallel via tensor product, both receiving the same input proposition. Their outputs feed into a barrier which synchronises them into a pair. The pair goes to a judge agent. The judge has two outputs: a verdict (going to the main output) and a feedback loop. The feedback loop carries both a new topic (routed back to the debaters' inputs) and a temperature setting (routed to both debaters' control input ports via ctrl_in). The diagram shows parallel composition, barrier synchronisation, and a control feedback loop in one system.

The operator is the tensor product: parallel composition. (The grammar also accepts * for editors that cannot input unicode.) The advocate and skeptic run simultaneously on the same input. The barrier synchronises their outputs into a pair for the judge. The last line is the control feedback: the judge’s verdict is mapped to a temperature setting and sent to both agents’ control inputs. Notice that advocate@ctrl_in addresses a specific port on the agent, the control port rather than the main input.

This is a small program. It is also a concurrent system with feedback loops, runtime parameter modulation, and a convergence protocol. Without types, getting the wiring right would be a matter of testing and hope. With types, it is checked before it runs.

What this shows

In a few lines of code, with a language that has categorical foundations, we can capture interesting agent systems and be sure they are well-formed before they run.

The upshot: when we have guarantees about well-formedness, systems work more stably and more predictably. With static typing, entire classes of structural errors are impossible. You cannot wire an output of one type to an input of another. You cannot forget a connection. The job you pay for is more likely to actually work, and you get more useful work per dollar spent. Runtime budget controls can put a ceiling on cost, but they do not prevent the waste. Static typing prevents the waste. But there is a lot more to do. What we have so far is already useful as a language for constructing agent graphs with static type checking. But we have given short shrift to the complexity and internal state of the agent morphism, which is really all about memory architecture and context management. That is where the real power comes from. For that we need more than a copy-discard category with some extra structure. We need protocols—and that is the subject of the sequel, soon to appear here.

The plumbing compiler, runtime, and MCP server are available as binary downloads for macOS and Linux:

Download plumbing version 0.

Here is the research paper describing the broader programme of work:

• William Waites, Artificial organisations (arXiv:2602.13275).

February 22, 2026

n-Category Café The Univalence Principle

(guest post by Dimitris Tsementzis, about joint work with Benedikt Ahrens, Paige North, and Mike Shulman)

The Univalence Principle is the informal statement that equivalent mathematical structures are indistinguishable. There are various ways of making this statement formally precise, and a long history of work that does so. In our recently-published (but long in the making!) book we proved to our knowledge the most general version of this principle, which applies to set-based, categorical, and higher-categorical structures defined in a non-algebraic and space-based style, as well as models of higher-order theories such as topological spaces.

This work achieves three main goals. Firstly, it greatly extends the “Structure Identity Principle” from the original HoTT book, to include any (finite) level of structure, instead of just set-based structures, thus establishing in the strongest sense yet made precise that the Univalent Foundations provide an equivalence-invariant foundation for higher-categorical mathematics. Secondly, it provides very general novel definitions of equivalence between structures and between objects in a given structure that “compile” to most known notions of equivalence in known cases, but which can also be used to suggest notions in new settings; in doing so it extends M. Makkai’s classic work on First Order-Logic with Dependent Sorts (FOLDS). Thirdly, the setting in which our result is proved (a form of Two-Level Type Theory) provides a framework in which to do metamathematics in the Univalent Foundations/HoTT, i.e. carry out the mathematical study of how mathematics is formalized in UF/HoTT. The Univalence Principle we prove is a foundational metamathematical result in that sense.

Setting the Stage

Any “Univalence Principle” type of result has the following form:

M,NStructures(),(M N)(M=N) \forall M, N \in Structures(\mathcal{L}), (M \simeq_\mathcal{L} N) \simeq (M = N)

The result gains in strength if the class of Structures()Structures(\mathcal{L}) is as large as possible, and the notion of equivalence M NM \simeq_{\mathcal{L}} N between them coincides with known notions of equivalence in practice (where \mathcal{L} is a placeholder for a notion of signature in terms of which MM and NN are structures). Such a result also gains in strength if the middle notion of equivalence is as “structural” as possible, ensuring that not only properties of the structures are preserved, but also constructions built on top of them. This last feature can only really be pursued within a univalent type theory, like HoTT.

In our work, we define: diagrammatic signatures \mathcal{L} as inverse categories of finite height, \mathcal{L}-structures as Reedy-fibrant functors 𝒰\mathcal{L} \to \mathcal{U}, and a notion of indiscernibility relating objects within structures, which yields a derived notion of equivalence \simeq_{\mathcal{L}} between structures (essentially structure-preserving bijections up to indiscernibility). The primitive notions \simeq and == are given by equivalence and equality in 2LTT, appropriately defined.

Signatures, Structures (and where they live)

Two-level type theory (2LTT)

In 2LTT, there is an external level for exo-types and other exo-gizmos (allowing strictly functorial constructions), and the usual fibrant (HoTT/UF) level where mathematical objects live. The external level is the metamathematical or syntactic level, where a strict equality exists that allows us to define syntax and symbols. Our exo-gizmos are analogous to sets of syntactical symbols used to define signatures in first-order logic.

Such syntax consists of categories with strict equality on composition, which we call exo-categories. Equalities here are strict (exo-equalities), while the internal world has homotopical paths. The 2LTT setting is convenient for type-theoretic reasoning, and allows us to neatly separate the various notions of equality at play.

Diagram signatures are inverse categories of finite height

A diagram signature \mathcal{L} is an inverse exo-category of finite height equipped with a rank functor rk:( exo) op \mathrm{rk}:\mathcal{L} \longrightarrow (\mathbb{N}^{\mathrm{exo}})^{\mathrm{op}} that reflects identities. Objects of \mathcal{L} are the sorts (analogous to “sorts” in first-order logic); morphisms encode dependencies (“an arrow depends on a pair of objects,” etc.). Inverse-ness gives matching objects and allows Reedy methods to apply.

To illustrate the idea, take the example of a reflexive graph. The diagram signature for reflexive graphs would be given by the following inverse (exo-)category

I i A c d O \array{ I \\ \downarrow^i \\ A \\ \downarrow_{c} \;\; \downarrow_{d} \\ O }

where ci=dic i = d i. The intuition is that we have a sort AA of “arrows” between any two objects, and a predicate II (“identity”) that can be used to select which arrows are identity arrows with the relation ensuring that this predicate can only be “asked” of loops.

Structures are Reedy fibrant diagrams over these signatures

Given this notion of a signature, a structure in our sense is simply a (Reedy fibrant) functor 𝒰\mathcal{L} \to \mathcal{U}. In more detail, a raw \mathcal{L}-diagram is an exo-functor M:𝒰 exo M:\mathcal{L}\to\mathcal{U}^{\mathrm{exo}} For each sort KK, the matching object 𝕄 KM\mathbb{M}_{K}M collects the compatible lower-rank data needed to specify the “boundary” for an element of MKM K. The Reedy fibrancy condition is: the canonical boundary map M K𝕄 KMM_K\to \mathbb{M}_{K}M is a fibration (i.e., a dependent type) for each KK. The category of such Reedy-fibrant diagrams then forms a fibrant type Str()\mathrm{Str}(\mathcal{L}) whose points are the \mathcal{L}-structures.

To illustrate with the example of rg\mathcal{L}_{rg} from above, an rg\mathcal{L}_{rg}-structure GG in our sense would be given by the following data, in type-theoretic notation:

  • A type GO:𝒰GO:\mathcal{U}
  • A family GA:GO×GO𝒰GA: GO\times GO\to\mathcal{U} dependent on GOGO
  • A family GI: xGA(x,x)𝒰GI:\prod_x GA(x,x)\to\mathcal{U} for the “identity”

The trick here is to ensure the repeated xx in the definition of GIGI, obeying the relations of the signature. This is what the matching object machinery achieves for arbitrary signatures.

Other examples of \mathcal{L}-structures include: categories (with sorts for objects and morphisms), groupoids, nn-categories for any nn, preorders, and models of higher-order theories like topological spaces and sup-lattices. Furthermore, all of what we say applies to theories with axioms (not just structures), which we define in the book too.

Indiscernibility and local univalence

With our formal set-up complete, we address the central question: for arbitrary signatures \mathcal{L}, when are two “objects” in an \mathcal{L}-structure equivalent? Such an “object” could be a categorical (or higher-categorical) gadget itself when \mathcal{L} has height >2\gt 2, e.g. the “objects” of 2Cat\mathbf{2-}\mathbf{Cat} are themselves categories, which are \mathcal{L}'-structures for an \mathcal{L}' of lower height. The key idea is: two “objects” are indiscernible if nothing in the signature can distinguish them…up to indiscernibility!

Indiscernibilities and Local Univalence

Fix a signature \mathcal{L}, an \mathcal{L}-structure MM, a rank 00 (“bottom”) sort K:(0)K: \mathcal{L}(0), and elements a,b:MKa,b:M K. Think of KK as a “set” of objects (e.g. of a category) on top of which properties and structures are defined.

To define when aa and bb are indiscernible, we package together everything that could distinguish them. The intuition is: if there is an equivalence between “everything you can say about aa” and “everything you can say about bb” (outside of directly referencing aa or bb themselves), then they are indiscernible. We achieve this through a formal definition of a “boundary” aM\partial_a M, which one can think of as the \mathcal{L}-structure that contains everything that could distinguish aa in MKMK from anything else in MKMK. Which naturally leads us to the following definitions.

Definition (Indiscernibility). We say that a,b:MKa,b : MK are indisernible, written aba\approx b, iff there is a levelwise equivalence aM bM\partial_a M \simeq \partial_b M that is coherent in the right way.

Definition (Univalent Structure). We say that MM is locally univalent at K if the canonical map (a=b)(ab) (a=b) \;\longrightarrow\; (a\approx b) is an equivalence. We then say a structure MM is univalent if this holds at all rank-0 sorts and (recursively) for all sorts of higher rank.

We prove our main results for univalent \mathcal{L}-structures. These are quite special in that they are “saturated” in their equivalence information: two indistinguishable gizmos are actually equal. Or, put differently: when there is not “enough” structure to distinguish two gizmos, there is always enough to prove them equivalent. Some examples to illustrate (see Part 2 of the book for many more!):

  • In a reflexive graph structure, two nodes aa and bb are indiscernible iff they have the same number of arrows coming in and out, and the same number of loops that are identities.
  • In a category, two objects are indiscernible iff they are isomorphic. A univalent category is precisely a category (precategory in UF) that is locally univalent at the sort of its objects.
  • In an appropriately defined bicategory, an indiscernibility amounts to a pair of coherent adjoint equivalenes, as expected.

Equivalences of structures

We are almost there! On to the final missing piece: equivalences between structures themselves. Let f:MNf:M\to N be a morphism of \mathcal{L}-structures, defined in the expected way (as a natural transformation between the corresponding 𝒰\mathcal{U}-valued functors). Then, for each sort KK, we have a matching square

MK f K NK 𝕄 KM 𝕄 Kf KN \array{ MK & \overset{\rightarrow}{f_K} & NK\\ \downarrow & & \downarrow \\ \mathbb{M}_KM & \overset{\rightarrow}{\mathbb{M}_K f} & \mathbb{N}_KN }

and for each “context” z:𝕄 KMz: \mathbb{M}_KM an induced fiber map f K,z:(MK) z(NK) 𝕄 Kf(z). f_{K,z}: (MK)_z \longrightarrow (NK)_{\,\mathbb{M}_K f(z)}. Write \approx for indiscernibility at sort KK (as above). Then, what we really want to say now is: MM, NN are equivalent if they are “level-wise equivalent up to indiscernibility”. This idea gives us the \simeq_{\mathcal{L}} from the beginning, and we define it as follows:

Definition (Equivalence of \mathcal{L}-structures). ff is an equivalence if, for every sort KK, every z:𝕄 KMz : \mathbb{M}_KM, and every y:(N K) 𝕄 Kf(z)y : (N_K)_{\mathbb{M}_K f(z)}, there exists a specified x:(MK) zx : (MK)_z with f K,z(x)y f_{K,z}(x)\;\approx\; y i.e., f K,zf_{K,z} is essentially split surjective up to indiscernibility on each fiber. We write M NM \simeq_{\mathcal{L}} N for the type of equivalences.

The key innovation is the “up to indiscernibility” part; it makes our notion significantly weaker than usual notions, and hence the final result stronger. Note that we have not been able to prove our result without a splitness condition in the definition of equivalence, and to our mind this remains an open problem.

Our definition is related to Makkai’s original notion of FOLDS equivalence, but Makkai was not able to define a general notion of non-surjective equivalence directly, relying instead on spans. Our notion of indiscernibility circumvents this difficulty and allows us to consider the whole structure of equivalences between structures.

The Univalence Principle

With all our apparatus in place we prove our main result, a very general form of a univalence principle, as promised.

Theorem (“The Univalence Principle”). For a signature \mathcal{L} and univalent \mathcal{L}-structures M,NM,N, the canonical map (M=N)(M N) (M = N) \;\longrightarrow\; (M \simeq_{\mathcal{L}} N) is an equivalence of types. In other words, (M=N)(M N)(M = N) \simeq (M \simeq_{\mathcal{L}} N).

The proof proceeds by induction on the height of \mathcal{L}. The key insight is that level-wise equivalences between univalent structures must “reflect indiscernibilities”: if ff doesn’t preserve the ability to distinguish elements, then whatever structure distinguishes them in the source would transfer to structure distinguishing their images in the target, contradicting the equivalence. With the splitness assumption in the map and the assumption of univalence (of our \mathcal{L}-structure), we are able to achieve the lifting of the indiscernibility.

Our result is actually proved for an even more general class of signatures called functorial signatures, which strictly extends diagram signatures and covers “higher-order” situations (topological spaces, suplattices, DCPOs, etc.). We have stuck to the diagrammatic view in this post for intuition, but all results and definitions carry over to this more general notion.

Conclusion

In the course of this work there were quite a few questions we tried to answer, but could not. To mention a couple: Can we define a Rezk completion for arbitrary structures, providing a universal way to turn any structure into a univalent one? Can we remove the splitness condition from our definition of equivalence between structures? We list more open problems at the end of the book.

Beyond our specific result, the framework in which it is proven establishes a way to answer metamathematical questions around the univalence axiom in a precise and fruitful way. It is important to emphasize that carrying out this type of mathematical study does not require choosing one foundation over the other. In any setting that interprets the fibrant part of 2LTT, the univalence principle will hold, including in set theory.

The metamathematics of UF is the mathematical study of formalizing mathematics in terms of a hierarchy of hh-levels vs. a cumulative hierarchy of sets. Formalizing mathematics in this way has all sorts of unique mathematical properties. The Univalence Principle is one of them.

February 15, 2026

David Hoggnumber of exposures per visit?

I wrote code this weekend to look at the question of how we should visit a star in the upcoming Terra Hunting Experiment. The current (straw-person) plan is that we will observe each visible star once per night for ten years, with one exposure of a sensibly-chosen exposure time at each visit. Is this a good idea? I was interested in this problem for two reasons. The first is that binning is sinning, with the corollary that bigger bins are worse than finer bins, and a single, long exposure is a very big bin. The second reason is that when there are non-trivial noise sources (like the quasi-periodic variations from p-mode oscillations of the surfaces of Sun-like stars), a few negatively- (or interestingly-) correlated noise draws can be combined in ways that are substantially more informative than by taking the average.

Of course, if you split an exposure (with a standard CCD, say) into sub-exposures, you take on real costs: There is a read time, which is time you aren't integrating, and there is a read noise, that affects each new exposure. So the best strategies are a complicated function of read time, read noise, and the signal-to-noise at which the stellar p-mode oscillations are visible in any realistic data. Related: There are amazingly different and interesting strategies with up-the-ramp detectors that are used in the infrared.

One final comment is that the objective, in my strongly held view, is to optimize the amount of information (about, say, the center-of-mass radial-velocity changes of the target star) per unit wall-clock time. We are paying for wall-clock time; let's get as much as we can out of it.

February 13, 2026

David HoggThe LLMs, and why do we do astrophysics?

Today my rant on LLMs and the practices of our field hit the arXiv. I was scared to post it, because it is such a weird contribution, and it is so revealing about myself and my own political positions and hangups. But I have to say: I got great and supportive feedback all day.

I got two comments on saying ACAB in the literature. The Astronomer Royal of Scotland quoted (on BlueSky) the last sentence, which I put there because Andy Casey (Monash, Flatiron) insisted. Many people sent me appreciation and thank-yous, and many people sent me comments and objections. Always constructive. The whole experience made me feel very happy about the state of our field and the way we all interact. I think maybe there will be critical mass to write some kind of collection of essays on the subject. That's a plan for 2026.

February 05, 2026

Matt Strassler The Physicists and Mr. Epstein

Mr. Epstein was not only a world-class child abuser, he was a big fan of theoretical high-energy physics and of theoretical physicists. Some of my colleagues, unfortunately, got to know him. A number who were famous and/or had John Brockman as a book agent were even invited to a physics conference on Epstein’s private island, well before he was first arrested. This was no secret; as I recall, a lot of us heard about the existence of this conference/trip, but we hadn’t heard Epstein’s name before and didn’t pay much attention (ho hum, just another weird billionaire).

Personally, I feel quite lucky. The Brockman agency rejected the proposal for my recent book without comment (thank you!); and my research is mostly considered unimportant by the Brian Greenes of the world. As a result, I was not invited to Epstein’s island, never made his acquaintance, and blissfully avoided the entire affair. Clearly there are some benefits to being considered ordinary. And so — I’m sorry/not-sorry to say — I can’t tell you much about Epstein at all, or about how certain physicists did and did not interact with him. Regarding my colleagues who did get to know him, I can’t speak for them, since I wasn’t there, and I don’t know to what extent Epstein hid his immoral activities when they were around. It’s up to them to tell their own stories if they feel the need to do so (and I hope a couple of them do, just to clear the air.) Personally I tend to give them the benefit of the doubt — probably some literally didn’t know what was up until Epstein’s arrest in 2008, while perhaps others felt there wasn’t much they could do about Epstein’s actions on his own private island. I imagine they are deeply embarrassed to have been caught in this horrible man’s ugly web.

Fans of physics come in all shapes and sizes, and some have large wallets, large egos, and/or large ambitions. Among the wealthy supporters, we can count Alfred Nobel himself; billionaires sit on important scientific institute and university boards, and the more recent Breakthrough Prizes were funded by deep pockets. The extreme wealthy have outsized influence in our country and in our world, and one could argue that their influence in 2025 was not for the better. Usually, though, the influence in physics and related fields tends to be relatively benign, funding postdoctoral researchers and graduate students who deeply want to do science but also need to eat. That said, sometimes donors fund non-essential fields at the expense of critical ones, or favor theoretical research over the gathering of crucial experimental data, or push money on famous rich organizations when there are poor ones that are equally deserving and far more needy.

When gazillionaires, on their own initiative, come calling on non-profit organizations, whether they be community centers, arts organizations, or universities, they pose a problem. On the one hand, it is the job of anyone in a non-profit organization to help raise money — fail to do that, and your organization will close. When a single person offers to permanently change the future of your program, you would be derelict in your duty if you did not consider that offer. On the other hand, donors who might have ethical or criminal problems could drag the organization’s name through the mud. Worse, they might be able to force the organization itself to do something ethically questionable or even illegal.

There is a clear lesson for young academics and other up-and-coming non-profit actors in the Epstein affair: the more money potentially offered to our organizations, the more carefully we must tread. Money is power; power corrupts; and every pursuit of dollars, even for the best causes, risks infection. We can’t be large-scale non-profit fundraisers without doing serious and thorough background checks of the biggest donors; we have to question motives, and we can’t look the other way when something seems amiss. Those of us with clear hearts and honest pursuits tend to assume the best in other people. But we have to beware of those hoping to bolster their reputations, or clean their consciences, by giving away “generously” what they never deserved to have.

December 01, 2025

Secret Blogging SeminarCongress proposes cutting of all funding to US academics who mentor Chinese students

I’m writing to point out a potential law which should be gathering more opposition and attention in math academia: The Securing American Funding and Expertise from Adversarial Research Exploitation Act. This is an amendment to the 2026 National Defense Authorization Act which has passed the House and could be added to the final version of the bill during reconcilliation in the Senate. I’m pulling most of my information from an article in Science.

This act would ban any US scientist from receiving federal funding if they have, within the last five years, worked with anyone from China, Russia, Iran or North Korea, where “worked with” includes joint research, co-authorship on papers, or advising a foreign graduate student or postdoctoral fellow. As I said in my message to my senators, this is everyone. Every mathematician has advised Chinese graduate students or collaborated with Chinese mathematicians, because China is integrated into the academic world and is one fifth of the earth.

This obviously isn’t secret, since you can read about it in Science, but I am surprised that I haven’t heard more alarm. Obvious people to contact are your senators and your representatives. I would also suggest contacting members of the Senate armed services committee, who are in charge of reconciling the House and Senate versions of the bill.

November 27, 2025

Sean Carroll Thanksgiving

 (Apologies for the ugly blog format. We had a bit of a crash, and are working to get the template back in working order.)

This year we give thanks for a crucially important idea that can mean very different things to different people: information. (We’ve previously given thanks for the Standard Model LagrangianHubble’s Law, the Spin-Statistics Theoremconservation of momentumeffective field theorythe error bargauge symmetryLandauer’s Principle, the Fourier TransformRiemannian Geometrythe speed of lightthe Jarzynski equalitythe moons of Jupiterspaceblack hole entropyelectromagnetism, Arrow’s Impossibility Theorem, and quanta.)

“Information” is an idea that is everywhere in science and technology these days. From one angle it looks like such an obvious idea that it’s a bit startling to realize that information theory didn’t really come along until the work of Claude Shannon in the 1940s. From another, the idea has so many different shades of meaning that we shouldn’t be surprised (that’s a joke you will get in a bit) that it can be hard to understand.

Information theory is obviously an enormous subject, but we’re just giving thanks, not writing a textbook. I want to mention two ideas I find especially central. First, Shannon’s idea about relating information content to “surprisal.” Second, the very different intuitive notions of information that we get from engineering and physics.

Shannon, working at Bell Labs, was interested in the problem of how to send trustworthy signals efficiently over transatlantic cables. He was thinking about various ways to express information in a code: a set of symbols, each with a defined meaning. So a code might be an alphabet, or a set of words, or a literal cipher. And he noticed that there was a lot of redundancy in natural languages; the word “the” appears much more often in English than the word “axe,” although both have the same number of letters.

Let’s refer to each letter or symbol in a code as an “event.” Shannon’s insight was to realize that the more unlikely an event, the more information it conveyed when it was received. The statements “The Sun rose in the east this morning” and “The Sun rose in the west this morning” contain the same number of letters, but the former contains almost no information — you already were pretty sure the Sun would be rising in the east. But the latter, if obtained from a reliable source, would be very informative indeed, precisely because it was so unexpected. Clearly some kind of unprecedented astronomical catastrophe was in progress.

Imagine we can assign a probability p(x) to every different event x. Shannon wanted a way to quantify the information content of that event, which would satisfy various reasonable-seeming axioms: most crucially, that the information content of two independent events is the sum of the individual information contents. But the joint probability of two events is the product of their individual probabilities. So the natural thing to do would be to define the information content as the logarithm of the probability; the logarithm of a product equals the sum of the individual logarithms. But you want low probability to correspond to high information content, so Shannon defined the information content (also called the self-information, or surprisal, or Shannon information) of an event to be minus the log of the probability, which by math is equal to the log of the reciprocal of the probability:

    \[I(x) = - \log [p(x)] =\log \left(\frac{1}{p(x)}\right).\]

Note that probabilities are numbers between 0 and 1, and the log of such a number will be negative, with numbers closer to 0 being more negative than numbers closer to 1. So I(x) goes from +\infty at p(x)=0 to 0 at p(x)=1. An impossible message is infinitely surprising, and therefore conveys infinite information; an inevitable message is completely unsurprising, and conveys no information at all.

From there, Shannon suggested that we could characterize how efficient an entire code was at conveying information: just calculate the average (expectation value) of the information content for all possible events. When we have a probability distribution p(x), the average of any function f(x) is just the sum of the the values of the function times their respective probabilities, \langle f\rangle = \sum_x p(x) f(x). So we characterize the information content of a code via the quantity

    \[H[p] = - \sum_x p(x) \log[p(x)].\]

The only question is, what to call this lovely newly-defined quantity that surely nobody had ever thought of before? Happily Shannon was friends with John von Neumann, who informed him, “You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.” So entropy it is.

Indeed, this formula is precisely that which had been put forward (unknown to Shannon) by Josiah Willard Gibbs in the 1870’s as a definition of entropy in statistical mechanics. (It is related to the definition on Ludwig Boltzmann’s tombstone, S= k \log W, and Boltzmann had also suggested similar expressions to the above.) On the one hand, it seems remarkable to find precisely the same expression playing central roles in problems as disparate as sending signals across cables and watching cream mix into coffee; on the other hand, it’s a relatively simple expression and the axioms used to derive it are actually pretty similar, so perhaps we shouldn’t be surprised; on the third hand, the connection between information theory and statistical mechanics turns out to be deep and fruitful, so it’s more than just a mathematical coincidence.

But let me highlight the one aspect of the term “information” that can be sometimes confusing to people. To the engineer, a code that is maximally informative is one for which p(x) is relatively uniform over all events x, which means H[p(x)] is maximal or close to it; in that case, every event will tell you something at least a little bit interesting. For them, high entropy = high information.

But to a physicist who might be asking “how much information do I have about the state of a system?”, you have more information when p(x) is relatively narrowly concentrated around some value, rather than being all spread out. For them, high entropy = low information! Indeed, one physically-relevant notion of “information” is the “accessible information” of a system, which can be defined as H_\mathrm{max} - H. (I talk about this a bit in my recent solo podcast on complexity.)

Perhaps we shouldn’t be so surprised that physicists and engineers posit oppositely-directed relationships between entropy and information. It’s just a reflection of the fact that “information” is so ubiquitous and has so many different uses. We should be thankful that we’re beginning to understand it so well.

November 26, 2025

Tim GowersCreating a database of motivated proofs

It’s been over three years since my last post on this blog and I have sometimes been asked, understandably, whether the project I announced in my previous post was actually happening. The answer is yes — the grant I received from the Astera Institute has funded several PhD students and a couple of postdocs, and we have been busy. In my previous post I suggested that I would be open to remote collaboration, but that has happened much less, partly because a Polymath-style approach would have been difficult to manage while also ensuring that my PhD students would have work that they could call their own to put in their theses.

In general I don’t see a satisfactory solution to that problem, but in this post I want to mention a subproject of the main project that is very much intended to be a large public collaboration. A few months ago, a call came out from Renaissance Philanthropies saying that they were launching a $9m AI for Math Fund to spend on projects in the general sphere of AI and mathematics, and inviting proposals. One of the categories that they specifically mentioned was creating new databases, and my group submitted a proposal to create a database of what we call “structured motivated proofs,” a piece of terminology that I will explain a bit more later in just a moment. I am happy to report that our proposal was one of the 29 successful ones. Since a good outcome to the project will depend on collaboration from many people outside the group, we need to publicize it, which is precisely the purpose of this post. Below I will be more specific about the kind of help we are looking for.

Why might yet another database of theorems and proofs be useful?

The underlying thought behind this project is that AI for mathematics is being held back not so much by an insufficient quantity of data as by the wrong kind of data. (For a more general exploration of this theme, see here.) All mathematicians know, and some of us enjoy complaining about it, that it is common practice when presenting a proof in a mathematics paper, or even textbook, to hide the thought processes that led to the proof. Often this does not matter too much, because the thought processes may be standard ones that do not need to be spelt out to the intended audience. But when proofs start to get longer and more difficult, they can be hard to read because one has to absorb definitions and lemma statements that are not obviously useful, are presented as if they appeared from nowhere, and demonstrate their utility only much later in the argument.

A sign that this is a problem for AI is the behaviour one observes after asking an LLM to prove a statement that is too difficult for it. Very often, instead of admitting defeat, it will imitate the style of a typical mathematics paper and produce rabbits out of hats, together with arguments later on that those rabbits do the required job. The problem is that, unlike with a correct mathematics paper, one finds when one scrutinizes the arguments carefully that they are wrong. However, it is hard to find superficial features that distinguish between an incorrect rabbit with an incorrect argument justifying that rabbit (especially if the argument does not go into full detail) and a correct one, so the kinds of statistical methods used by LLMs do not have an easy way to penalize the incorrectness.

Of course, that does not mean that LLMs cannot do mathematics at all — they are remarkably good at it, at least compared with what I would have expected three years ago. How can that be, given the problem I have discussed in the previous paragraph?

The way I see it (which could change — things move so fast in this sphere), the data that is currently available to train LLMs and other systems is very suitable for a certain way of doing mathematics that I call guess and check. When trying to solve a maths problem, you will normally write down the routine parts of an argument without any fuss (and an LLM can do them too because it has seen plenty of similar examples), but if the problem as a whole is not routine, then at some point you have to stop and think, often because you need to construct an object that has certain properties (I mean this in a rather general way — the “object” might be a lemma that will split up the proof in a nice way) and it is not obvious how to do so. The guess-and-check approach to such moments is what it says: you make as intelligent a guess as you can and then see whether it has the properties you wanted. If it doesn’t, you make another guess, and you keep going until you get lucky.

The reason an LLM might be tempted to use this kind of approach is that the style of mathematical writing I described above makes it look as though that is what we as mathematicians do. Of course, we don’t actually do that, but we tend not to mention all the failed guesses we made and how we carefully examined why they failed, modifying them in appropriate ways in response, until we finally converged on an object that worked. We also don’t mention the reasoning that often takes place before we make the guess, saying to ourselves things like “Clearly an Abelian group can’t have that property, so I need to look for a non-Abelian group.”

Intelligent guess and check works well a lot of the time, particularly when carried out by an LLM that has seen many proofs of many theorems. I have often been surprised when I have asked an LLM a problem of the form \exists x\in X \ P(x), where P is some property that is hard to satisfy, and the LLM has had no trouble answering it. But somehow when this happens, the flavour of the answer given by the LLM leaves me with the impression that the technique it has used to construct x is one that it has seen before and regards as standard.

If the above picture of what LLMs can do is correct (the considerations for reinforcement-learning-based systems such as AlphaProof are not identical but I think that much of what I say in this post applies to them too for slightly different reasons), then the likely consequence is that if we pursue current approaches, then we will reach a plateau: broadly speaking they will be very good at answering a question if it is the kind of question that a mathematician with the right domain expertise and good instincts would find reasonably straightforward, but will struggle with anything that is not of that kind. In particular, they will struggle with research-level problems, which are, almost by definition, problems that experts in the area do not find straightforward. (Of course, there would probably be cases where an LLM spots relatively easy arguments that the experts had missed, but that wouldn’t fundamentally alter the fact that they weren’t really capable of doing research-level mathematics.)

But what if we had a database of theorems and proofs that did not hide the thought processes that lay behind the non-obvious details of the proofs? If we could train AI on a database of accounts of proof discoveries and if, having done so, we then asked it to provide similar accounts, then it would no longer resort to guess-and-check when it got stuck, because the proof-discovery accounts it had been trained on would not be resorting to it. There could be a problem getting it to unlearn its bad habits, but I don’t think that difficulty would be impossible to surmount.

The next question is what such a database might look like. One could just invite people to send in stream-of-consciousness accounts of how they themselves found certain proofs, but that option is unsatisfactory for several reasons.

  1. It can be very hard to remember where an idea came from, even a few seconds after one has had it — in that respect it is like a dream, the memory of which becomes rapidly less vivid as one wakes up.
  2. Often an idea will seem fairly obvious to one person but not to another.
  3. The phrase “motivated proof” means different things to different people, so without a lot of careful moderation and curation of entries, there is a risk that a database would be disorganized and not much more helpful than a database of conventionally written proofs.
  4. A stream-of-consciousness account could end up being a bit too much about the person who finds the proof and not enough about the mathematical reasons for the proof being feasibly discoverable.

To deal with these kinds of difficulties, we plan to introduce a notion of a structured motivated proof, by which we mean a proof that is generated in a very particular way that I will partially describe below. A major part of the project, and part of the reason we needed funding for it, is to create a platform that will make it convenient to input structured motivated proofs and difficult to insert the kinds of rabbits out of hats that make a proof mysterious and unmotivated. In this way we hope to gamify the task of creating the database, challenging people to input into our system proofs of certain theorems that appear to rely on “magic” ideas, and perhaps even offering prizes for proofs that contain steps that appear in advance to be particularly hard to motivate. (An example: the solution by Ellenberg and Gijswijt of the cap-set problem uses polynomials in a magic-seeming way. The idea of using polynomials came from an earlier paper of Croot, Lev and Pach that proved a closely related theorem, but in that paper it just appears in the statement of their Lemma 1, with no prior discussion apart from the words “in the present paper we use the polynomial method” in the introduction.)

What is a structured motivated proof?

I wrote about motivated proofs in my previous post, but thanks to many discussions with other members of the group, my ideas have developed quite a lot since then. Here are two ways we like to think about the concept.

1. A structured motivated proof is one that is generated by standard moves.

I will not go into full detail about what I mean by this, but will do so in a future post when we have created the platform that we would like people to use in order to input proofs into the database. But the basic idea is that at any one moment one is in a certain state, which we call a proof-discovery state, and there will be a set of possible moves that can take one from the current proof-discovery state to a new one.

A proof-discovery state is supposed to be a more formal representation of the state one is in when in the middle of solving a problem. Typically, if the problem is difficult, one will have asked a number of questions, and will be aware of logical relationships between them: for example, one might know that a positive answer to Q1 could be used to create a counterexample to Q2, or that Q3 is a special case of Q4, and so on. One will also have proved some results connected with the original question, and again these results will be related to each other and to the original problem in various ways that might be quite complicated: for example P1 might be a special case of Q2, which, if true would reduce Q3 to Q4, where Q3 is a generalization of the statement we are trying to prove.

Typically we will be focusing on one of the questions, and typically that question will take the form of some hypotheses and a target (the question being whether the hypotheses imply the target). One kind of move we might make is a standard logical move such as forwards or backwards reasoning: for example, if we have hypotheses of the form P(x) and \forall u\ P(u)\implies Q(u), then we might decide to deduce Q(x). But things get more interesting when we consider slightly less basic actions we might take. Here are three examples.

  1. We have in our list of hypotheses the fact that a function f is given by the formula f(x)=\exp(p(x)), where p is a polynomial, and our goal is to prove that there exists z such that f(z)=1. Without really thinking about it, we are conscious that f is a composition of two functions, one of which is continuous and one of which belongs to a class of functions that are all continuous, so f is continuous. Also, the conclusion \exists z\ f(z)=1 matches well the conclusion of the intermediate-value theorem. So the intermediate-value theorem comes naturally to mind and we add it to our list of available hypotheses. In practice we wouldn’t necessarily write it down, but the system we wish to develop is intended to model not just what we write down but also what is going on in our brains, so we propose a move that we call library extraction (closely related to what is often called premise selection in the literature). Note that we have to be a bit careful about library extraction. We don’t want the system to be allowed to call up results from the library that appear to be irrelevant but then magically turn out to be helpful, since those would feel like rabbits out of hats. So we want to allow extraction of results only if they are obvious given the context. It is not easy to define what “obvious” means, but there is a good rule of thumb for it: a library extraction is obvious if it is one of the first things ChatGPT thinks of when given a suitable non-cheating prompt. For example, I gave it the prompt, “I have a function f from the reals to the reals and I want to prove that there exists some z such that f(z)=1. Can you suggest any results that might be helpful?” and the intermediate-value theorem was its second suggestion. (Note that I had not even told it that f was continuous, so I did not need to make that particular observation before coming up with the prompt.)
  2. We have a goal of the form \exists x\in X\ P(x). If this were a Lean proof state, the most common way to discharge a goal of this form would be to input a choice for x. That is, we would instantiate the existential quantifier with some x_0 and our new goal would be P(x_0). However, as with library extraction, we have to be very careful about instantiation if we want our proof to be motivated, since we wish to disallow highly surprising choices of x_0 that can be found only after a long process of thought. So we have to restrict ourselves to obvious instantiations. One way that an instantiation in our system will count as obvious is if the variable is instantiated with a term that is already present in the proof-discovery state. If the desired term is not present, then in order to continue with the proof, it will be necessary to carry out moves that generate it. A very common technique for this is the use of metavariables: instead of guessing a suitable x_0, we create a variable x^\bullet and change the goal to P(x^\bullet), which we can think of as saying “I’m going to start trying to prove P(x^\bullet) even though I haven’t chosen x^\bullet yet. As the attempted proof proceeds, I will note down any properties Q_1,\dots,Q_k that x^\bullet might have that would help me finish the proof, in the hope that (i) I get to the end and (ii) the problem \exists x\ Q_1(x)\wedge\dots\wedge Q_k(x) is easier than the original problem.” Another kind of obvious instantiation is one where we try out an object that is “extreme” in some way — it might be the smallest element of X, or the largest, or the simplest. (Judging simplicity is another place where the ChatGPT rule of thumb can be used.)
  3. We cannot see how to answer the question we are focusing on so we ask a related question. Two very common kinds of related question (as emphasized by Polya) are generalization and specialization. Perhaps we don’t see why a hypothesis is helpful, so we see whether the result holds if we drop that hypothesis. If it does, then we are no longer distracted by an irrelevant hypothesis. If it does not, then we can hope to find a counterexample that will help us understand how to use the hypothesis. Or perhaps we are trying to prove a general statement but it is not clear how to do so, so instead we formulate some special cases, hoping that we can prove them and spot features of the proofs that we can generalize. Again we have to be rather careful here not to allow “non-obvious” generalizations and specializations. Roughly the idea there is that a generalization should be purely logical — for example, dropping a hypothesis is fine but replacing the hypothesis “f is twice differentiable” by “f is upper semicontinuous” is not — and that a specialization should be to a special case that counts as an obvious instantiation in the sense discussed just above.

2. A structured motivated proof is one that can be generated with the help of a point-and-click system.

This is a surprisingly useful way to conceive of what we are talking about, especially as it relates closely to what I was talking about earlier: imposing a standard form on motivated proofs (which is why we call them “structured” motivated proofs) and gamifying the process of producing them.

The idea is that a structured motivated proof is one that can be generated using an interface (which we are in the process of creating — at the moment we have a very basic prototype that has a few of the features we will need, but not yet the more interesting ones) that has one essential property: the user cannot type in data. So what can they do? They can select text that is on their screen (typically mathematical expressions or subexpressions), they can click buttons, choose items from drop-down menus, and accept or reject “obvious” suggestions made to them by the interface.

If, for example, the current goal is an existential statement \exists x\ P(x), then typing in a formula that defines a suitable x is not possible, so instead one must select text or generate new text by clicking buttons, choosing from short drop-down menus, and so on. This forces the user to generate x, which is our proxy for showing where the idea of using x came from.

Broadly speaking, the way the prototype works is to get an LLM to read a JSON object that describes the variables, hypotheses and goals involved in the proof state in a structured format, and to describe (by means of a fairly long prompt) the various moves it might be called upon to do. Thus, the proofs generated by the system are not formally verified, but that is not an issue that concerns us in practice since there will be a human in the loop throughout to catch any mistakes that the LLM might make, and this flexibility may even work to our advantage to better capture the fluidity of natural-language mathematics.

There is obviously a lot more to say about what the proof-generating moves are, or (approximately equivalently) what the options provided by a point-and-click system will be. I plan to discuss that in much more detail when we are closer to having an interface ready, the target for which is the end of this calendar year. But the aim of the project is to create a database of examples of proofs that have been successfully generated using the interface, which can then be used to train AI to play the generate-structured-motivated-proof game.

How to get involved.

There are several tasks that will need doing once the project gets properly under way. Here are some of the likely ones.

  1. The most important is for people to submit structured motivated (or move-generated) proofs to us on the platform we provide. We hope that the database will end up containing proofs of a wide range of difficulty (of two kinds — there might be fairly easy arguments that are hard to motivate and there might be arguments that are harder to follow but easier to motivate) and also a wide range of areas of mathematics. Our initial target, which is quite ambitious, is to have around 1000 entries by two years from now. While we are not in a position to accept entries yet, if you are interested in participating, then it is not too early to start thinking in a less formal way about how to convert some of your favourite proofs into motivated versions, since that will undoubtedly make it easier to get them accepted by our platform when it is ready.
  2. We are in the process of designing the platform. As I mentioned earlier, we already have a prototype, but there are many moves we will need it to be able to do that it cannot currently do. For example, the current prototype allows just a single proof state, which consists of some variable declarations, hypotheses, and goals. It does not yet support creating subsidiary proof states (which we would need if we wanted to allow the user to consider generalizations and specializations, for example). Also, for the moment the prototype gets an LLM to implement all moves, but some of the moves, such as applying modus ponens, are extremely mechanical and would be better done using a conventional program. (On the other hand, moves such as “obvious library extraction” or “provide the simplest example” are better done by an LLM.) Thirdly, a technical problem is that LaTeX is currently rendered as images, which makes it hard to select subexpressions, something we will need to be able to do in a non-clunky way. And the public version of the platform will need to be web-based and very convenient to use. We will want features such as being able to zoom out and look at some kind of dependency diagram of all the statements and questions currently in play, and then zoom in on various nodes if the user wishes to work on them. If you think you may be able (and willing) to help with some of these aspects of the platform, then we would be very happy to hear from you. For some, it would probably help to have a familiarity with proof assistants, while for others we would be looking for somebody with software engineering experience. The grant from the AI for Math Fund will allow us to pay for some of this help, at rates to be negotiated. We are not yet ready to specify in detail what help we need, but would welcome any initial expressions of interest.
  3. Once the platform is ready and people start to submit proofs, it is likely that, at least to start with, they will find that the platform does not always provide the moves they need. Perhaps they will have a very convincing account of where a non-obvious idea in the proof came from, but the system won’t be expressive enough for them to translate that account into a sequence of proof-generating moves. We will want to be able to react to such situations (if we agree that a new move is needed) by expanding the capacity of the platform. It will therefore be very helpful if people sign up to be beta-testers, so that we can try to get the platform to a reasonably stable state before opening it up to a wider public. Of course, to be a beta-tester you would need to have a few motivated proofs in mind.
  4. It is not obvious that every proof submitted via the platform, even if submitted successfully, would be a useful addition to the database. For instance, it might be such a routine argument that no idea really needs to have its origin explained. Or it might be that, despite our best efforts, somebody finds a way of sneaking in a rabbit while using only the moves that we have provided. (One way this could happen is if an LLM made a highly non-obvious suggestion that happened to work, in which case the rule of thumb that if an LLM thinks of it, it must be obvious, would have failed in that instance.) For this reason, we envisage having a team of moderators, who will check entries and make sure that they are good additions to the database. We hope that this will be an enjoyable task, but it may have its tedious aspects, so we envisage paying moderators — again, this expense was allowed for in our proposal to the AI for Math Fund.

If you think you might be interested in any of these roles, please feel free to get in touch. Probably the hardest recruitment task for us will be identifying the right people with the right mixture of mathematical knowledge and software engineering skills to help us turn the platform into a well-designed web-based one that is convenient and pleasurable to use. If you think you might be such a person, or if you have a good idea for how we should go about finding one, we would be particularly interested to hear from you.

In a future post, I will say more about the kinds of moves that our platform will allow, and will give examples of non-motivated proofs together with how motivated versions of those proofs can be found and entered using the platform (which may involve a certain amount of speculation about what the platform will end up looking like).

How does this relate to use of tactics in a proof assistant?

In one way, our “moves” can be regarded as tactics of a kind. However, some of the moves we will need are difficult to implement in conventional proof assistants such as Lean. In parallel with the work described above, we hope to create an interface to Lean that would allow one to carry out proof-discovery moves of the kind discussed above but with the proof-discovery states being collections of Lean proof states. Members of my group have already been working on this and have made some very interesting progress, but there is some way to go. However, we hope that at some point (and this is also part of the project pitched to the AI for Math Fund) we will have created another interface that will have Lean working in the background, so that it will be possible to generate motivated proofs that will be (or perhaps it is better to say include) proofs in Lean at the same time.

Another possibility that we are also considering is to use the output of the first platform (which, as mentioned above, will be fairly formal, but not in the strict sense of a language such as Lean) to create a kind of blueprint that can then be autoformalized automatically. Then we would have a platform that would in principle allow mathematicians to search for proofs while working on their computers without having to learn a formal language, with their thoughts being formalized as they go.

November 25, 2025

David Hoggsubstellar objects (brown dwarfs)

I spent the day at the NSBP / NSHP meeting in San José. My favorite session of the day was the morning astro session, which was entirely about brown dwarfs. I learned a lot in a very short time. Caprice Phillips (UCSC) introduced the session with an introduction to the scientific and technical questions in play. She put a lot of emphasis on using binaries and clusters to put detailed abundance ratios onto substellar objects. This was what I expected: I thought (walking in to this session) that all known abundance ratios for brown dwarfs were from such kinds of studies. I learned different (keep reading).

Gabriel Munoz Zarazua (SFSU) followed by showing spectra from M-dwarfs, brown dwarfs, and Jupiter. It definitely looks like a sequence. He does spectral fitting (what they call, in this business, retrievals). It looks like he is getting very good, somewhat precise, abundance ratios for the photospheres of substellar objects! I asked more about this in the question period, and apparently I am way behind the times (Emily Rauscher, Michigan, helpfully pointed this out to me): Now brown-dwarf photosphere models are so good, they can be used to measure abundances, and pretty well.

I also learned in this session (maybe from Jorge Sanchez, ASU, or maybe from Efrain Alvarado, SFSU) that there is a very strong mass–abundance relation in the Solar System. That is, we don't expect, if brown dwarfs form the way planets do, that the detailed abundances of the brown dwarfs will match exactly the detailed abundances of the primary stars. But now we are really in a position to test that. Sanchez showed that we can get, from even photometry, abundances for substellar objects in the Milky Way halo. Again, totally new to me! And he finds metallicities at or below −3. Alvarado showed data on an amazing system J1416, which is an L–T binary with no stellar companion. Apparently it is the only known completely substellar binary.

November 14, 2025

Matt Strassler Event with Professor Daniel Whiteson on Monday November 17 at 7pm

Next Monday, November 17th at 7pm, I’ll be at the Harvard Bookstore with particle physicist and author Daniel Whiteson. Professor Whiteson and his co-author Andy Warner have a nice new book, for the general science-aware reader, exploring an age-old and unanswered question: how universal is the knowledge and understanding that we call “physics”? How much of modern physics is actually telling us about the universe, and how much of it is created by, or an accident of, the humans who have helped bring it about?

For instance, if we started all over again and reran history from scratch, would the physics (and science more generally) of this re-run culture look much like our own, or might it turn out very differently? If another culture on Earth had had time to develop highly mature science (or something like it) in its own direction, independent of Western Europe’s influence, how different might that science be? (Indeed, would our word “science” even be translatable into their worldview?) Or if we encountered aliens with far greater understanding of the universe than we have, would we be able to recognize, parse, grok, appreciate, comprehend, and/or otherwise make sense of their notions of scientific knowledge?

Whiteson and his co-author, wanting to write a popular book rather than a scholarly one, and desiring nevertheless to take on these serious and challenging intellectual questions, have set their focus mostly on the aliens, accompanied by amusing cartoons and a generous helping of dad jokes (hey, some dad jokes are actually very funny.) They’re looking for a broad audience, and hopefully they will get it. But don’t let the light-hearted title (“Do Aliens Speak Physics?“) or the charmingly goofy cover fool you: this book might well make you laugh, but I guarantee it will make you think. Whether you’re just curious about science or you’ve been doing science yourself for years, I suspect that, within the vast array of problems and issues that are raised in this broad-minded book, there will be some you’ve never thought of.

Among scientists and philosophers, there are some who believe that any aliens with the capacity to reach the Earth will obviously “speak physics” — that math and physics float above contingencies of culture and species, and will easily be translated from any intelligent creature to any other. But are they perhaps flying too high? It’s clear that Whiteson and Warner are aiming to poke some holes — lots of holes —- in their hot-air balloon, and to do so in a way that a wide variety of readers can appreciate and enjoy.

I tend to agree with Whiteson on a lot of these issues, but that won’t stop me from asking him some tough questions. You can ask him some tough questions too, if you like — just come to the Harvard Bookstore at 7:00 on Monday and join the conversation!

October 15, 2025

Clifford JohnsonNobel Prize in Physics 2025: Who/What/Why

I started a tradition a little while back where every year we have a special departmental colloquium entitled "The Nobel Prize in Physics: Who/What/Why". This year my job in finding speakers was made easier by having 2/3 of this years newly-minted Nobel Prize winners in physics in the Department! (Michel Devoret and John Martinis.) So our room was a bit more well-attended than normal...(hundreds and hundreds rather than dozens and dozens). Here is a recording of the event, which I was delighted to host, and there's a celebration afterwards too. (Pls share widely!)
[...] Click to continue reading this post

The post Nobel Prize in Physics 2025: Who/What/Why appeared first on Asymptotia.

October 01, 2025

Robert HellingHolosplit

 Recently I had to update Mathematica on my laptop and after having solved the challenges of the license manager that keeps looking different every time I have to use it, I learned that Mathematica 14 can now officially work with finite fields.

This reminded me that for a while I wanted to revive an old project that had vanished together with the hard drive of some old computer: Holosplit. So, over the last two days and with the help of said version of Mathematica I did a complete rewrite which you can now find on Github.

It consists of two C programs "holosplit" and "holojoin". To the first you give a positive integer \(N\) and a file and it spits out a new file ("fragment") that is roughly \(1/N\) of the size. Every time you do that you obtain a new random fragment.

The later you give any collection of \(N\) of these fragments and it reproduces the original file. So you can for example distribute a file over 10 people such that when any 3 of them work together, they can recover the original. 

How does it work? I uses the finite field \(F\) of \(2^3=256\) elements (in the Github repository, there is also a header file that implements arithmetic in \(F\) and matrix operations like product and inverse over it). Each time, it is invoked, it picks a random vector \(v\in F^N\) and writes it to the output. Then it reads \(N\) bytes from the file at a time which it also interprets as a vector \(d\in F^N\). It then outputs the byte that corresponds to the scalar product \(v\cdot d\).

To reassemble the file, holojoin takes the \(N\) files with its random vectors \(v_1,\ldots,v_N\) and interprets those as the rows of a \(N\times N\) matrix \(A\). With probability

$$\frac{\prod_{k=1}^N \left(256^N-k\right)}{(256)^{N^2}}$$

which exponentially in \(N\) approaches 1 this matrix is invertible (homework: why?). So we can read one byte from each file, assemble those into yet another vector \(e\in F^N\) and recover

$$d=A^{-1}e.$$

Besides the mathematics, it also poses philosophical/legal questions: Consider for example the original file is copyrighted, for example an mp3 or a video. The fragments are clearly derived works. But individually, they do not contain the original work, without sufficiently many other fragments they are useless (although not in a cryptographic sense). So by publishing one fragment, I do not provide access to the original work. What if others publish other fragments? Then my fragment could be the last remaining one that was missing. If there are more, any individual fragment is redundant so publishing it strictly speaking does not provide new information. 

September 26, 2025

Peter Rohde Photo albums

Peter’s photos: https://www.icloud.com/sharedalbum/#B275oqs3qKSZvQ

Screenshots: https://www.icloud.com/sharedalbum/#B27532ODWjIQb9

Climbing book launch: https://www.icloud.com/sharedalbum/#B27GWZuqDGnuOyN

Salisbury waters: https://www.icloud.com/sharedalbum/#B275qXGF1JQFkx

Christmas with Ash: https://www.icloud.com/sharedalbum/#B27G6XBubAhoT6

Hosin BBQ duck: https://www.icloud.com/sharedalbum/#B27GY8gBYG3b5mD

Hawks Nest to Smiths Lake: https://www.icloud.com/sharedalbum/#B2759UlCqSH5bE

Europe & Alps: https://www.icloud.com/sharedalbum/#B275ON9t3W0lu

Point Perpendicular: https://www.icloud.com/sharedalbum/#B27GqkRUiGivXD2

Newnes canyoning: https://www.icloud.com/sharedalbum/#B27GfnH8tgHSmX

Coffs Harbour to Yamba: https://www.icloud.com/sharedalbum/#B27J0DiRHJKuuWr

Wendy Bruere Christmas (2020): https://www.icloud.com/sharedalbum/#B27G4TcsmGoHysj

Six Foot Track: https://www.icloud.com/sharedalbum/#B2753qWtHZA9EX

Kosciusko to Kiandra: https://www.icloud.com/sharedalbum/#B27GgZLKuGaewVm

Camping food: https://www.icloud.com/sharedalbum/#B27GtnIORgbmHu

The Aardvark: https://www.icloud.com/sharedalbum/#B275VaUrzvmAiT

Kangaroo Valley kayaking: https://www.icloud.com/sharedalbum/#B27JEsNWnJrCpi0

Claustral canyon: https://www.icloud.com/sharedalbum/#B2755Z2WMOTpsk

Budawang: https://www.icloud.com/sharedalbum/#B27GDdyTvGvpINL

Mother’s Day panoramas (2021): https://www.icloud.com/sharedalbum/#B27GFssfGG9WmJP

Point Perpendicular & Nowra: https://www.icloud.com/sharedalbum/#B27GRMtznGPdeuZ

Blood moon: https://www.icloud.com/sharedalbum/#B27GdIshaG8NgGX

La Perouse to Coogee: https://www.icloud.com/sharedalbum/#B275aVbMK4h7qo

Canberra ASPI launch: https://www.icloud.com/sharedalbum/#B27GQOeMmGj4Zcv

Edible foraging: https://www.icloud.com/sharedalbum/#B275ejO179Si0N

Sydney to Wollongong: https://www.icloud.com/sharedalbum/#B275M7GFPUasMe

Album for Dad, Father’s Day (2021): https://www.icloud.com/sharedalbum/#B2752plgjnnkUe

Vaucluse (with Cheryl, Nestor & Wendy): https://www.icloud.com/sharedalbum/#B275CmvAS4uA0Z

Bouddi National Park: https://www.icloud.com/sharedalbum/#B27GdPblXG8WdOo

Tom Thumb (the 2nd): https://www.icloud.com/sharedalbum/#B275aDWbr4CN2w

Eden to Victoria: https://www.icloud.com/sharedalbum/#B27GJDfWGArX8l

Wendy’s book launch (the 2nd): https://www.icloud.com/sharedalbum/#B27GIcgc2G7h08y

Mark & Pat Bruere visit Sydney: https://www.icloud.com/sharedalbum/#B27G0ehgLbyWyg

New Years Eve climb (2021): https://www.icloud.com/sharedalbum/#B27Ju8EH6JOZxmU

Newnes Canyoning (2022): https://www.icloud.com/sharedalbum/#B275BydzFU0GZ8

Royal National Park (2022): https://www.icloud.com/sharedalbum/#B27GlxzuqGVI5nE

Peter & Wendy: https://www.icloud.com/sharedalbum/#B27Gf693ZG52tfd

Book photo shoots: too rude…

Wendy & Peter’s mushroom trip: https://www.icloud.com/sharedalbum/#B27GrhkPxG27So8

Post-mushroom hike: https://www.icloud.com/sharedalbum/#B27GdFryYG8i3Ur

Wendy Kalymnos favourites: https://www.icloud.com/sharedalbum/#B27JqstnBJEXkH2

Wendy Frenchmans screenshots: https://www.icloud.com/sharedalbum/#B27Jr1PPdJpd7Dq

Instagram: https://www.icloud.com/sharedalbum/#B27GzFCC1Gb4tqr

Haute route: https://www.icloud.com/sharedalbum/#B27J8GySPJtWoQ1

Kim’s KKKalendar: https://www.icloud.com/sharedalbum/#B275fk75vIL0sH

Frenchmans Cap Wild: https://www.icloud.com/sharedalbum/#B27G4VTwGGoFBkz

Photoshoot with Zixin: https://www.icloud.com/sharedalbum/#B27GPCdxkGKPkM4

Wendy birthday hike (2023): https://www.icloud.com/sharedalbum/#B27GWBC59GnHpQW

Bateman’s Bay to Bawley Point: https://www.icloud.com/sharedalbum/#B27JsHvHoJ8bxWf

Stockton Sand dunes (2023): https://www.icloud.com/sharedalbum/#B27GVfZ2vGloFZV

Wendy book launch (2023): https://www.icloud.com/sharedalbum/#B27J058xyJR4IBM

Dolomites (2023): https://www.icloud.com/sharedalbum/#B0Z5kuVsbGJUzKO

Mount Arapiles: https://www.icloud.com/sharedalbum/#B275GH8Mq8Uh2X

Mount Solitary loop: https://www.icloud.com/sharedalbum/#B275nhQST2mETE

Klaus Hanz Franz Rohde Kunst: https://www.icloud.com/sharedalbum/#B27GqQrCLGiY3vb

Klaus Rohde funeral slideshow: https://www.icloud.com/sharedalbum/#B27GDZLe8GXP58K

Dad (old, B&W): https://www.icloud.com/sharedalbum/#B27GLLXGLJ5mbT2

Klaus & Ursula wedding: https://www.icloud.com/sharedalbum/#B275cLqfN7154g

Test Greece: https://www.icloud.com/sharedalbum/#B27Jq4WnLJ6JMNd

From Will Skea (Alps): https://www.icloud.com/sharedalbum/#B27JHciePJFwacG

From Will Skea (Frenchmans Cap): https://www.icloud.com/sharedalbum/#B275ZhN2v3EVq6

From Will Skea (Arapiles): https://www.icloud.com/sharedalbum/#B27JPrgBGJu3BTD

Coffs Harbour to Yamba (2): https://www.icloud.com/sharedalbum/#B27GFqhgJG9LHgT

Mark magic show (2021): https://www.icloud.com/sharedalbum/#B27G60dj6ARCvd

Wendy Christmas present (2020): https://www.icloud.com/sharedalbum/#B275FrPQ6GxvRu

AHS 25 year reunion: https://www.icloud.com/sharedalbum/#B275O3DjHUvSv

WhatsApp: https://www.icloud.com/sharedalbum/#B275tzEA5fX1nc

Armidale High School: https://www.icloud.com/sharedalbum/#B27GnbeumG4PnAF

Book photos for Mum & Dad: https://www.icloud.com/sharedalbum/#B27Gtec4XQkASe

Miscellaneous: https://www.icloud.com/sharedalbum/#B27Gq6kMgGKn7GR

Three Capes Trail (2022): https://www.icloud.com/sharedalbum/#B27G7HOIlGrDUGZ

Childhood computer programming: https://www.icloud.com/sharedalbum/#B275fu2MutDU8N

Magic with Mark in Maroubra: https://www.icloud.com/sharedalbum/#B27Gv6DhEGD9U3G

Photoshoot with Zixin (2024): https://www.icloud.com/sharedalbum/#B27GCATCnJGoRfW

Butt Crack (2021): https://www.icloud.com/sharedalbum/#B275VtHQfMv0zw

Greece photos new (edited to remove photos from wrong album): https://www.icloud.com/sharedalbum/#B27GY3uThGoBcGj

Singapore (all combined): https://www.icloud.com/sharedalbum/#B275qsTcwJKJjl

Hong Kong (transit): https://www.icloud.com/sharedalbum/#B2759v1AbS8Hve

Taiwan: https://www.icloud.com/sharedalbum/#B27GQD2D7Gw0hAp

India (combined): https://www.icloud.com/sharedalbum/#B27Gtue8VQy83g

Freycinet: https://www.icloud.com/sharedalbum/#B27G5VpecGE5Tbg

Triglav: https://www.icloud.com/sharedalbum/#B275MbK9Vy8erz

Shared with me: https://www.icloud.com/sharedalbum/#B27GGXqixzPOrm

Mount Wellington climbing: https://www.icloud.com/sharedalbum/#B27Gd59qiG8Kjy4

New Zealand combined (2004): https://www.icloud.com/sharedalbum/#B27GIZ8BIGNN5jy

New Zealand combined (2005): https://www.icloud.com/sharedalbum/#B27GcuRfIGFVIcL

Yea: https://www.icloud.com/sharedalbum/#B27GZYbYHGhFIir

Mount Pleasant: https://www.icloud.com/sharedalbum/#B275Iy2hC0JTTL

D’Aguilar: https://www.icloud.com/sharedalbum/#B27Gh7fzTGZBosS

Bali (2001): https://www.icloud.com/sharedalbum/#B27G1qNHBGOTbIr

Samba Ninjas: https://www.icloud.com/sharedalbum/#B27GG34bAzqQ0v

Armidale (misc): https://www.icloud.com/sharedalbum/#B27GSkLVwGyobbX

Emma’s party (2008): https://www.icloud.com/sharedalbum/#B275S2ms99Zyby

Goettingen (2011): https://www.icloud.com/sharedalbum/#B27JIrbT3Jsgxhd

South Coast track: https://www.icloud.com/sharedalbum/#B27G58NWBG6QyN7

Minsk (2006): https://www.icloud.com/sharedalbum/#B27G3JpSBGX1UkQ

Baden-Baden (2019): https://www.icloud.com/sharedalbum/#B27595X5HTVzJr

Berlin (combined): https://www.icloud.com/sharedalbum/#B27JqWzChJ6qizD

Switzerland (combined): https://www.icloud.com/sharedalbum/#B275zXwoYGJ6HMF

Italy highlights: https://www.icloud.com/sharedalbum/#B27G47PHQGoJium

Germany (misc): https://www.icloud.com/sharedalbum/#B275hPMfYGu5xVJ

Garmisch (2022): https://www.icloud.com/sharedalbum/#B27GFsbvlG9Xrr6

Germany (2019): https://www.icloud.com/sharedalbum/#B27G6Mn98G56Ncb

Garmisch (2006): https://www.icloud.com/sharedalbum/#B27J5lIdKGLC9KG

Baden-Baden (2005): https://www.icloud.com/sharedalbum/#B275sWRpHHQkt9

Berlin (2005): https://www.icloud.com/sharedalbum/#B27GgOQtrGjQrpH

Zugspitze (2005): https://www.icloud.com/sharedalbum/#B27G81mNdGcApGt

Amsterdam, Bristol (2006): https://www.icloud.com/sharedalbum/#B275B9SRzyBjlH

Baden-Baden (2006): https://www.icloud.com/sharedalbum/#B275eD9V79I2XR

Berlin (2006): https://www.icloud.com/sharedalbum/#B275toRf1fH8MD

Berlin, Jena (2007): https://www.icloud.com/sharedalbum/#B27GTI3fvGVgNit

Erlangen (2006): https://www.icloud.com/sharedalbum/#B27JrotZ2JpMb0i

Garmisch (2010): https://www.icloud.com/sharedalbum/#B27JPJPSiJurzNg

Germany (2010): https://www.icloud.com/sharedalbum/#B275FhYPQP650

Stuttgart (2006): https://www.icloud.com/sharedalbum/#B27GmitydGVVaZh

Changi (2019): https://www.icloud.com/sharedalbum/#B27GnmlKoG4JHpX

Japan (2007): https://www.icloud.com/sharedalbum/#B275AerZbG6FxVL

Japan (2012): https://www.icloud.com/sharedalbum/#B27GjBjobGg6PUa

Miscellaneous (including Japan 2013): https://www.icloud.com/sharedalbum/#B27GTpbybGySbE8

Currumbin & Tugin (2021): https://www.icloud.com/sharedalbum/#B275vBKZ4xH9X6

Brisbane (2021): https://www.icloud.com/sharedalbum/#B275YHsSjxQnm0

Weed in Byron (26/6/2025): https://www.icloud.com/sharedalbum/#B275Q2ydoGsQ4O5

Weed in Byron 2: https://www.icloud.com/sharedalbum/#B27GQDYhLGwsuY4