

On hopeThe comments on my previous post, on recent AI breakthroughs in solving Erdös problems and beyond, must’ve set some sort of record for the number of separate reasons commenters offered me to despair about the future of humanity. All this in a post that I saw as relatively nerdy and anodyne, goring few oxen, when I clicked “Publish”!
According to some persistent commenters, the only reason why I wrote about recent AI-enabled math breakthroughs is that I’m a shameless shill for the AI companies, my loud public criticisms of those companies being nothing more than a cynical smokescreen. Except that I’m also a laughable dupe of the AI companies.
See, AI, despite all appearances to the contrary, has not solved the Erdös unit distance problem or any other important math problems at all. It’s merely produced vast amounts of garbage via brute-force search, and then human mathematicians, sifting through the digital garbage pile, found some things they could call “proofs.” Except also, those human mathematicians aren’t even real mathematicians! They’re merely Hungarian combinatorialists, the kind obsessed with trivial, uninteresting Erdös problems, which it stands to reason that AI can now solve. AI will never touch the truly deep, creative parts of math, epitomized by Grothendieck-style algebraic geometry.
(When I relayed this to a world-leading algebraic geometer of my acquaintance, he laughed and said that everyone has to tell themselves whatever it takes to cope with the situation. He himself has been using LLMs in his research, and while they can’t yet write his papers for him, he expects them to improve very rapidly.)
When pressed, my commenters made it explicit: Timothy Gowers, the Fields Medalist who told his fellow mathematicians that he hopes they’re sitting down before he broke the news about the Erdös distance problem, is not a real mathematician, just a combinatorial puzzle-solver. Paul Erdös himself was not a real mathematician either.
Oh, and also, I’m a genocidal Judeofascist Zionist. That entered the comments as well, with the pretext being that I had shared a GPT-written story about ancient Israelites.
(Note: For every comment that I allowed to appear in the thread, assume as usual that there are many worse ones that I didn’t.)
Does anyone wonder why I blog so much less than I used to? Seeing what humanity has to offer, as reflected in my comment section, I feel like maybe we should take our chances with our future AI overlords. Except that some of my comments are—ironically, given their content—likely to be AI-generated as well.
These days friend after friend of mine, colleague after colleague, acquaintance after acquaintance is becoming a multimillionaire or even a billionaire from startup equity. Meanwhile, I’ve scrupulously abstained from all of that.
Why? Well, probably the single biggest reason has been Shtetl-Optimized. I’ve zealously sought to protect my “neutrality” and “objectivity” as a commentator, on this free (and even ad-free) public forum—the one where I try every week to reason with anonymous trolls with “proton.me” email addresses who show up to call me a hack and a shill and a baby-killer and a dunce. Ironically, the actual billionaires who I know hardly ever get called those sorts of names, mostly because they don’t offer the world a huge attack surface like I do. Or if they occasionally do get called them, they don’t care.
On reflection, all the commenters calling me a dunce have a point. When one looks at how I chose to spend my life, versus how all these billionaires did, I am kind of a moron.
And yet I titled this post “On Hope.” In a situation like the present, one needs to find hope wherever I can. And right now, I’m choosing to find it in this open letter, which has been signed by over 1,250 professors at the University of California. Let me quote the beginning of it:
To the UC Regents, UCOP, Academic Senate leadership, and the people of California:
We write as University of California mathematics faculty, joined by faculty from other STEM disciplines. UC has long served students from every background and has been a powerful engine of social mobility for the people of California. That public trust must be protected for future generations. Today, UC’s mission is at risk. To preserve that mission:
We call for the reinstatement of the SAT/ACT mathematics requirement for applicants to STEM majors beginning with the 2027 admissions cycle, alongside STEM faculty oversight of readiness standards and admissions practices affecting those majors.
Over the past five years, we have seen a widening divergence in mathematical preparation levels within the same classroom. This trend indicates that current admissions practices do not provide a sufficiently reliable check on mathematical readiness for STEM majors. The UC San Diego Senate–Administration Workgroup on Admissions report documents this crisis in stark terms: in the last five years, the number of students whose mathematics skills fall below high school level increased nearly thirtyfold; moreover, 70% of those students fall below middle school levels, reaching roughly one in twelve members of the entering cohort. These findings are corroborated by data across our campuses…
Everywhere one looks right now, and on every part of the political spectrum, doofuses and blankfaces strut across the earth triumphantly. Yet there remain pockets of sanity. What reading this open letter told me is that the University of California STEM faculty is one of them.
With enough such pockets, I could live a perfectly reasonable rest of my life, from now until my natural death (or until AI changes all our lives beyond recognition), regardless of who shows up in 3 … 2 … 1 … with a “proton.me” email address to confidently tell me otherwise.
Readers may have noticed that I haven’t been very active here for a while. That isn’t because I haven’t felt the “blogging urge”, but because I felt that the things I want to blog about right now wouldn’t be very interesting to much of the n-Category Cafe audience: they’re mostly fairly technical details about implementing proof assistants (because that’s what I’m mostly working on right now).
Accordingly, I’ve started a new blog! It’s at https://gwaithimirdain.github.io/blog/. (Gwaith-i-Mírdain is the github organization for development of Narya, the experimental proof assistant for Higher Observational Type Theory – and now Multimodal Type Theory as well – that I’ve been spending most of my time on, and will primarily be blogging about.) And I already wrote three posts (mostly about implementing multimodal type theory, with several survey questions for the reader), so you can check it out right now and see whether it’s likely to be your cup of tea.
Never fear, I’ll still come back here when I have more category-theoretic things to write about.
I’m not completely happy with this interview with Micah Zarin. It was nothing he did, it was me. I forgot to say that current-day AI wastes a lot of energy, and companies hope to use it to lay off people, and oligarchs are using it to extract lots of money from everyone. While obvious, these things are tremendously important and I should have emphasized them.
I was distracted by Micah’s fear that AI would make a career in math pointless, which really surprised me. So instead of giving my general thoughts on AI, I focused on putting myself in his place and imagining what to do in that situation. I suggested doing math with the help of AI as a way to overcome his fear and go ahead doing math while keeping abreast of new developments. If AI overtakes humans in math in his lifetime, which is far from certain, this could be a way to keep productively participating in math throughout this process. But I warned him to be very critical of what LLMs say, to lessen the danger of getting caught up in the ‘AI vortex’ that is turning many people into crackpots.
Mathematics, in case you haven’t been paying attention, is different from some other subjects because it’s an area where LLMs have shown some truly impressive problem-solving ability: read the various mathematicians’ comments in Remarks on the disproof of the unit distance conjecture where they grapple with this. But nobody really knows where this is going. So far LLMs have not shown much ability to invent new theories of mathematics, so it would be jumping to conclusions to assume AI will soon overtake humans in that realm. It would also be jumping to conclusions to assume it won’t.
Whatever happens, the real danger is not that AI will become too good, but that it will become too evil—most likely because of the oligarchs, corporations and governments behind it. I wish I had emphasized that point, which is always on my mind.
I think I succeeded in making another point, which is that life will not become pointless simply because some other entity gets better than humans at something and knocks us off our throne. To think that the meaning of life resides in our superiority is a childish attitude.
The sum of the reciprocals of the primes diverges, but very slowly. The sum of the reciprocals of the first 100 primes is
The sum of the reciprocals of the first 1,000 primes is
For the first 10,000 it’s
And it keeps creeping up, ever more slowly. To get the sum to reach 6, you need to add up the reciprocals of the first 3 × 10¹³² primes—far more than the number of atoms in the observable universe! Luckily there is no shortage of primes.
Here’s how you can see that the sum diverges, and that it diverges very slowly. First, remember Euler’s product formula for the Riemann zeta function:
This diverges logarithmically when Taking logs we see
must diverge too—but only log-logarithmically!
A deeper result, called Merten’s Second Theorem and proved here, says that
for some constant This constant is called the Meissel–Mertens constant. You can think of it as a fancier relative of Euler’s constant
which is defined by
But it’s much less widespread in mathematics than Euler’s constant, much as primes are less widespread than natural numbers.
Here’s how it works:
What’s a bit surprising, given the slowness of convergence and the somewhat erratic behavior of the primes, is that people can compute the Meissel–Mertens constant very precisely:
The trick, of course, is to use another formula for this constant, which lets you compute it much more efficiently than the definition. Here it is:
where is the Möbius function and
is the Riemann zeta function. By the way, this formula shows that calling
a fancier relative of
is not just talk.
When you know how to compute the Meissel–Mertens constant, you can compare the sum of reciprocals of primes to and the agreement is very good:
In 1983, Guy Robin proved the curve goes above and below the actual sum infinitely many times, i.e.
changes sign infinitely many times. You can see a bit of that happening here:
Puzzle. Can you find a naturally occurring sum that diverges even more slowly, for example like
The pictures were created by Dcoetzee, Marek Wolf and Saroad, respectively, and placed into the public domain on Wikicommons. Click on the pictures for more details.
“Let’s hazard an assertion: On or about June 2007, human character changed. To be more exact—because the phrase human character now feels antique—we might say instead that the human sensorium changed. By this we don’t necessarily mean a sudden and definite alteration in how we perceive the world—in the forms, sources, and amount of information we absorb, and in how we conduct our relations with parents, children, spouses, partners, mentors, friends. Yet a transition was set in motion, differentiating life before the omnipresent smartphone and life after, and dating its onset to the birth of the iPhone seems apt.”
This is Nicholas Dames writing in the Atlantic about Ben Lerner’s new novel, Transcription. He is riffing on Virginia Woolf, who said something similar, but about December 1910 and not about the iPhone.
Anyway, I vehemently and respectfully disagree. The iPhone didn’t change human character. Human character is much as it was when I was a child. And my children are growing up in much the same world that I did. The world I grew up in makes sense to them and the world they live in makes sense to me. How it was for Woolf in 1924 I can’t say. Maybe she was wrong, too! AB read the Odyssey this year in high school. (OK, selected clips from the Odyssey. Maybe on or about September 1995 high school English curricula changed, I’d be willing to grant that.) Her response was “People were not really very different back then, were they?” and I think she was right.
Anyway, Transcription is good, really good. My favorite of his since his first. You can read it in a day. It is not about iPhones or screen time, despite some editions having a phone on the cover. It is about human character, though, which is a good and traditional thing for novels and epic poems and blog posts to be about, and which will never be antique, as long as I have anything to say about it.
Dispatches from the possibly last days of human relevanceAs most readers have presumably heard by now, Paul Erdös’s Unit Distance Problem from 1946—one of the central open problems from the field of discrete geometry—has been solved by an internal OpenAI model. Erdös had conjectured that, given n points in the plane, at most n1+o(1) pairs of them could be unit distance apart. Using high-powered results from algebraic number theory, GPT refuted this, constructing a set with n1+ε unit-distance pairs, for ε ~ 10-38. Shortly afterward, Will Sawin, a human (!), improved GPT’s construction to get ~n1.014 pairs. There’s since been a claim to improve this further, to ~n1.034. Meanwhile, the best known upper bound remains n4/3, improving Erdös’s n3/2.
The entire process seems have been one-shot: my former student Lijie Chen simply gave GPT the problem, then GPT thought for a while and output a several-page argument that, on analysis by human experts, turned out to be correct. Of course there’s selection bias here; we’re not hearing as much about the hundreds of other problems GPT was given that it didn’t solve (isn’t that the case with humans too?). Clearly, too, GPT was helped by the facts that human mathematicians had wasted most of their time trying to prove Erdös right rather than looking for a counterexample, and that, even if they did look for a counterexample, they’d need to be experts in algebraic number theory to find this one, which hardly any discrete geometers are. So, maybe that suggests that AI, right now, is “merely” picking various medium-hanging fruits that human mathematicians missed for contingent reasons? With emphasis on the “right now.”
In a companion paper, OpenAI helpfully included commentary from Timothy Gowers, Noga Alon, Will Sawin, Daniel Litt, and many other experts, reflecting on the breakthrough, the path that GPT took to get to it (which can actually be seen by examining its chain-of-thought), and what this might mean for the future of mathematical research.
I heard the news maybe an hour after it broke, when some UT grad students came to my office to tell me. For what it’s worth: these students were morose, musing about how everything might soon be over for young scientists and mathematicians like themselves. I don’t know whether they’re right, but I feel like I should tell the truth about what their reaction was.
[Update: News has been coming faster than I can write about it, but today we learned that another important conjecture of Erdös has been refuted. Erdös and Szemerédi’s strong sumset conjecture over R, from the 1970s, had said that, if A is a finite set of real numbers, then either |A+A| or |A×A| must be at least |A|2-o(1). In this case humans, including the aforementioned Sawin, did almost all of the work of constructing the counterexample, but they were directly inspired by GPT’s earlier refutation of the unit distance conjecture. It remains open whether such a counterexample exists where A is a set of integers.]
Then, a few days later, a team at DeepMind, including my UT Austin colleague Swarat Chaudhuri, announced that they were able to use a system called AlphaProof Nexus to settle nine more (!) Erdös problems, many of them in additive combinatorics, along with miscellaneous other open math problems. Notably, in this case the AI also fully formalized its proofs in Lean.
And then, just today, Jelani Nelson alerted me to a new CS theory paper, which solves a longstanding open problem about electrical flows on graphs using a proof from GPT5.5Pro.
It seems to me that we’re now over the top of this particular rollercoaster, and it will keep accelerating until we reach the bottom, wherever that might be. I don’t know whether to hope or dread that solutions to P versus NP and all our other great problems will be included in the ride—that our role, as human mathematicians, will be reduced to (at most) deciding which questions we find interesting and then understanding AI models’ answers to those questions.
But maybe that won’t happen. Maybe the new AI mathematicians will soon hit a wall, because they lack the uncomputable quantum gravity microtubules of Penrose and Hameroff, or some other magic human ingredient. The fantastical thing is that, one way or the other, we’re going to find out empirically before very long.
Readers may have also seen the news that multiple prizewinning entries in a short fiction contest called the Commonwealth Prize, give overwhelming indications of having been written by AIs. As Kelsey Piper puts it:
There are, let’s say, also some noticeable similarities in the prose style between the winning stories that were flagged for AI use. AI chatbots love metaphors and similes, and they often spit out ones that sound vaguely pleasing but are logically incoherent or ascribe properties to things that don’t make sense.
“The Serpent in the Grove” gave us, “The girl smiled like sunrise over a sink.” “The Bastion’s Shadow” says, “She carried it now in her bag, heavy as a charm.” “Mehendi Nights” describes something as “swaying against plaster like a warning bell.”
The Commonwealth Foundation, whose judges chose these stories, hasn’t exactly covered itself in glory—saying, on the one hand, that it strictly forbids AI use but on the other, that it will continue taking authors at their word that they didn’t use AI, no matter the immensity of evidence to the contrary. As many others have pointed out, judges more versed in AI would’ve ironically been better placed to notice the signs of its use.
If only there were some sort of automated way to detect AI-generated text. Someone should really get on that problem, don’t you think?
But maybe we should just throw in the towel—as some of my colleagues have already done in the context of undergraduate projects? Maybe we should simply say that a good story is a good story, regardless of what manner of entity produced it?
As it happens, just last week I read my very first AI-written story that affected me as a story, to the extent that I wanted to read it more than once. This happened when I gave GPT5.5Pro the following simple prompt:
Write me a story about the most ancient Israelites that’s riveting like the stories of the Bible but that’s also consistent with all of the archeological evidence.
You can read the result here. One of my Facebook friends called it “disturbingly good,” and whatever the problems with the piece, I share that feeling. Of course, I’m well aware that GPT could easily generate a thousand stories like it—sampled from the same probability distribution—and then I could even do statistics on which tropes were the most common. This makes it feel silly to overindex on the first story that happened to be output, and yet somehow I did.
I feel like at this point, both the prophets of AI utopia like Ray Kurzweil, and of AI doom like Eliezer Yudkowsky, could be forgiven for asking: dude, will you listen to us YET? Do you still find it prudent to call this new form of terrestrial intelligence a stochastic parrot, a laughable fraud, or a fad that’s about to go away? Fear it all you want, hate it even, but at least respect it!
Which brings me to the other big AI news from the past week, namely that Pope Leo released his first encyclical, which is entitled “Safeguarding the Human Person in the Time of Artificial Intelligence.” I read it and … well, I certainly agreed with the theme that such a world-changing technology needs to be developed for the common good (as the Pope would have it, like the walls of Jerusalem), rather than for the profit or vanity of any one individual or company (in his analogy, like the Tower of Babel). I had quibbles with some of the other parts. Zvi Mowshowitz, as he often does, had a superb paragraph-by-paragraph analysis. Amusingly, there are indications that parts of the encyclical were written by AI.
To me, though, maybe the most notable part was that Chris Olah, who leads Anthropic’s interpretability team, was standing next to the Pope at the ceremony, and delivered his own remarks. I felt like Chris, who I met even before Anthropic existed, was a non-obvious yet inspired choice here, one of the rare figures in frontier AI whose technical and moral authority are both completely unimpeachable by anyone.
And so, at this momentous era for the human project, and on no less of an authority than that of the Vicar of Christ himself, the Supreme Pontiff and the Successor of Peter, I hereby throw myself on the wisdom and mercy of … uhh, I guess, Chris Olah and his team at Anthropic.
Chris, if I am soon to share the earth with entities that can prove the Riemann Hypothesis and solve quantum gravity after 30 seconds of thought, then may you understand those entities well enough to cause them to be nice.
Endnote: I should have foreseen, but didn’t, that the comments on this post would be dominated by people looking for ways to minimize whichever specific AI accomplishments I blogged about. Thus, it turns out, the ability of AI to solve Erdös problems just demonstrates that Erdös’s problems were never “serious” math in the first place—nothing like algebraic geometry or Grothendieck-style theory-building, which remain untouched. Likewise, the story I shared was obvious AI slop.
I had taken it as obvious that, when assessing AI’s impact on the world, one needs to look at least somewhat into the future: to remember where things were four years ago, compared to where they are today, and at least try to draw a straight line through the data, if not the exponential that seems to fit better.
Does anyone seriously doubt at this point that major open problems in algebraic geometry and other “Grothendieck-friendly” areas of math will fall to future AI models? Or that AI-written stories will improve, not only to win literary awards from AI-naïve judges, but to avoid the features that commenters here are complaining about? And that, whenever that happens, there will be new confident reasons not to care immediately offered up in comment sections like mine?
Apparently people do still doubt—hence the throwaway remark in my post about Penrose and Hameroff and microtubules. If not that or something like it, what exactly do they think the ceiling will be, and why?
Recently (I should have mentioned this before), I came across what I consider one of the greatest social experiments of all time, one that illuminates people’s reactions to every AI advance. A Twitter/X user named SHL0MS displayed the following AI-generated fake “Monet painting,” and asked people to explain what made it worse than real Monet paintings:
If you haven’t seen this yet, I recommend that you try the exercise yourself before reading further.
As it was, numerous art aficionados responded at length, savaging the flat, lifeless, uncreative AI slop, the emotionless composition, the missing spark, the lack of tranquility, the harshness, the lack of depth and symbiosis, and on and on and on.
Only after they had all said their piece did SHL0MS reveal that this is an actual Monet painting.
In the US, funding agencies seem to be increasingly opposed to an often inevitable feature of good science: international collaboration. Scientists have been told by officials at the National Institutes of Health that they need to remove mention of foreign collaborators from progress reports, or that they need to avoid such collaborations to begin with. At NASA, officials have told scientists that rather than just avoiding funding work in China, they should actively avoid collaborating with Chinese researchers. And a recently introduced bill would make that restriction more explicit.
I have a general policy against discussing concrete political issues on this blog, so I’m not going to dig into the details of who’s doing what here, how far it’s going or how novel it is. That policy extends to the comments. If you mention specific laws, politicians, or political parties, I will delete your comment.
I do want to say something more general, though. I think people often underestimate just how important international collaboration is.
I’ve talked before about how scientific specialization spreads scientists around the world. Scientists want to work with people who work on their specific interests, and there are often only a few people that fit that description. So people move across the world, creating centers of expertise.
More than that, though, essentially any activity, done well, is done internationally. The better you want to perform, the more likely it is that the best collaborator will be someone in another country.
People don’t notice this as much as they could, because they’re used to the exceptions. Popular art is often siloed by language and cultural references. Sports are intentionally set up as competitions between regions and nations, and militaries compete as a practical necessity. But without those exceptions, international competition wins out. The best doctor, the best classical musician, and the best businessperson for a job can’t be expected to come from one country or another. Those fields, like science, are international.
When that internationalism is weak, it’s a warning sign. Without that drive to succeed on an international stage, scientists get lazy. There are countries with a history of academic cronyism, where universities were run more on interpersonal politics than scholarly merit, cozy fiefdoms where prominent academics dole out positions. To combat this, policymakers work to make their research systems more international. They explicitly ask about international collaborations and participation in international conferences in grant applications, not to discourage them, but to encourage them: to reward academics who show merit on the international stage and break up lazy patronage networks.
It worries me that it sounds like some US policymakers want to do the opposite. People are increasingly worried about bias and groupthink in the sciences, and increasingly mad that scientists could be wasting the public’s money to maintain a cushy lifestyle. International collaboration is how you hold scientists to account, how you force them to compete and show their merit. If you drop that, academia is going to get a whole lot worse.
I recently showed you that if you take the regular icosahedron:
considered in a coordinate system based on the golden ratio, and then replace √5 by -√5 in all your formulas, you get the great icosahedron:
But this fact isn’t an isolated one-off! If we do the same for the regular dodecahedron:
we get the great stellated dodecahedron:
There’s also a star polyhedron called the great dodecahedron:
and if we play the same game, replacing by
in all the formulas, we get the small stellated dodecahedron:
These six polyhedra form a family; the four nonconvex ones are called the Kepler–Poinsot polyhedra. I never understood what was so great about them, though of course they look ravishingly attractive. So it was nice to learn that if we include the convex ones, they come in three pairs related by the operation of replacing by
which is called Galois conjugation. This is mentioned near the end of this book:
• John Horton Conway, Heidi Burgiel and Chaim Goodman-Strauss, The Symmetries of Things, A K Peters, Natick, Massachusetts, 2008.
These authors spend more energy describing three other relations among this family of polyhedra:
But I’m more interested in Galois conjugation, which carries each polyhedron in this picture to the one at the opposite corner of the hexagon. I got interested in Galois conjugation because it interchanges two kinds of quasiparticles that propagate in icosahedral quasicrystals, called phonons and phasons:
But there’s some simple geometry behind it, which I’d like to discuss here.
You’ll notice that in all the examples I gave, Galois conjugation takes regular pentagons to regular pentagrams. And that turns out to be a general fact!
Let be the golden field: that is, the set of all numbers
with rational, equipped with the usual addition, multiplication, subtraction and division. Define Galois conjugation
by
This map preserves all the field operations, and if you apply it twice you get back where you started. Thus, it’s like complex conjugation in many respects.
The golden field gets its name because it contains the golden ratio
If we apply Galois conjugation to the golden ratio, we get its negative reciprocal:
This suggests that Galois conjugation should somehow map regular pentagons to regular pentagrams! Why? Well, in a regular pentagon, each exterior turning angle is :
while in a regular pentagram, each exterior turning angle is
The cosine of the exterior turning angle for the pentagon is
and we apply Galois conjugation to this, we get the cosine of the exterior turning angle for the pentagram!
This is not quite a proof that Galois conjugation turns regular pentagons into regular pentagrams—indeed, we have to clarify what we even mean by that claim. But it’s a key ingredient of the proof.
To be more precise, let’s consider a regular pentagon in whose vertices lie in
Beware: such a pentagon is impossible in the plane!
Puzzle. Show this.
But it’s possible in 3 or more dimensions. For example:
taken in cyclic order, are the vertices of a regular pentagon in 3 dimensions. And once you can get one, you can get plenty, by translations and rotations. It may take a bit of thought to dream up rotation matrices with entries in the golden field, but there are lots: even rotation matrices with rational entries are dense among all rotations.
Now, Galois conjugation acts on coordinatewise; let’s abuse language and call this map
If we take a regular pentagon and apply this map to its vertices and edges, what do we get? A regular pentagram, I claim!
We can simplify the proof by noticing that only the cyclic ordering on the vertices is needed to distinguish a regular pentagon and a regular pentagram.
Theorem. Let be the vertices of a regular pentagon, listed in cyclic order. Then
listed in the same cyclic order, are the vertices of a regular pentagram.
Proof. Define the edge vectors
All these have the same squared length
The exterior turning angle at each vertex of the pentagon is so this is the angle between the consecutive edge vectors
and
Since
we have
Now let and
It is easy to check that the usual dot product of
obeys
so
and
Therefore the cosine of the angle between consecutive edge vectors is
Since
it follows that the angle between consecutive edge vectors is
But wait! The above calculation secretly assumed is positive, because we claimed that the usual positive square root of
equals
and we also felt free to divide by
Why is
positive? Writing
with we have
and thus
This is a sum of squares of real numbers, hence nonnegative. It is strictly positive because is injective, so the
are not all zero.
The five points are coplanar, since coplanarity amounts to the vanishing of certain 3 × 3 minors in the matrix of coordinate differences, which is a polynomial condition over
hence preserved by
These points are also distinct, since
is injective. The edges
thus form a planar closed
-gon with equal edge lengths
and constant exterior turning angle
at each vertex. This is a regular pentagram. █
The same style of argument shows that applying Galois conjugation to a regular pentagram with vertices in we get back a regular pentagon. And if you’re worried about what happened to the plane, fear not! We can draw regular pentagons in the plane whose vertices have coordinates in a certain quadratic extension of
This larger field again has an automorphism that carries regular pentagons to regular pentagrams.
We can also play similar games with heptagons and the like, using different fields.
The red pictures of polyhedra were made using were created using Robert Webb’s Stella software and placed on Wikicommons. The diagram of Kepler–Poinsot polyhedra was created by Tilman Piesk and placed on Wikicommons.
For a description of the small stellated dodecahedron and great dodecahedron as Riemann surfaces—branched coverings of the sphere—try this:
• John Baez, Small stellated dodecahedron, Visual Insight, 15 June 2016.
I do not know how these branched coverings are related to the Galois theory perspective given here!
On Mastodon, J. M. animated an icosahedron morphing into a great icosahedron:
and a dodecahedron morphing to a great stellated dodecahedron:
Here’s another fun example. If you take a rhombicosidodecahedron with vertex coordinates all in the golden field:
and apply the Galois transformation the pentagons turn into pentagrams, while the squares stay squares and the equilateral triangles stay equilateral triangles. It gets messy if we draw everything, but if we draw just the pentagrams it’s beautiful:
Interestingly the squares and triangles, not drawn, stay the same size—because the squared lengths of their edges are rational! But the squared lengths of the pentagon edges involve so they change as they become pentagram edges.
This post starts with the the slides of an elementary talk I gave at Sloans Bar and Grill, in Glasgow, as part of a wonderful series called A Pint of Science. At the end I include some fascinating details which I only had time to briefly touch on in my talk. If you already know plenty of physics, this is the juicy stuff.
Start with a hair:

A red blood cell is ten times smaller across:

A flu virus is 10 times smaller across than that:
The flagellum of this cell is ten times smaller across than that:

A molecule of hemoglobin is about ten times smaller than that:

A water molecule is ten times smaller than that:
A hydrogen atom is 5 times smaller than that:
This is an actual image of a single hydrogen atom, which is amazing. (Read more here.)
But how did people figure out that atoms even exist, back before we could see individual atoms? And how did we figure out what they’re made of?
It started with chemistry. Two liters of hydrogen burn with one of oxygen to form one liter of water vapor, so we guess water is made of 2 atoms of hydrogen and 1 of oxygen: H2O. But one cubic meter of oxygen is 16 times heavier than one of hydrogen, so we guess O is 16 times heavier than H.
And so on — this took a lot of detective work, back in the 1800s.
By 1871, Mendeleev created the periodic table, listing atoms in order of weight. Here’s a modern version:
But what are the numbers on these elements? They are called atomic numbers. They are not the atom’s weights. Now we know it’s the number of electrons they have! But how did we find out about electrons?
In 1869, William Crookes made a beam by evacuating a glass tube and running an electric current though it:
In 1897, J. J. Thomson showed that this beam is made of particles less than one thousandth the mass of hydrogen atoms! We now call these particle electrons though Thomson disliked this term. (Read more here.)
Electrons are negatively charged, but atoms have no charge, so there must be something positively charged in the atom. Thomson proposed that an atom is a ball of positive charge with electrons speckled throughout it. A journalist called this the plum-pudding atom, and the name stuck. (Read more here.)
But in 1886, Eugen Goldstein created a beam moving in the opposite direction from the electrons in an evacuated tube! (Read more here.)
In 1898 Wilhelm Wien showed this beam is made of positively charged particles with about the same mass as a hydrogen atom. It took a long time to realize the full importance of these particle. We now call them protons. (Read more here.)
In 1909 Rutherford’s team in Manchester showed that the positive charge in an atom is concentrated in a small part near its center: the nucleus.
His assistants Geiger and Marsden fired particles (which we now know are helium nuclei) at some gold leaf, and some bounced back. When Rutherford saw the results of this experiment, he wrote “it was almost as incredible as if you had fired a 15-inch shell at a piece of tissue paper and it came back and hit you”. The plum-pudding model was out!
But there’s a big problem. The “atomic weight” is the little number at the bottom of each square here:
If protons account for almost all the mass of the atom, the atomic weight of an atom should be the number of protons it contains. But since the atomic weight is bigger than the atomic number, this would mean atoms would have more protons than electrons — hence be positively charged. They’re not!
Let’s not even worry now about the fact that the atomic weight is not always close to an integer. Let’s just worry about this: how could an atom have more protons than electrons, if it’s electrically neutral?
Here’s one possible solution: the nucleus contains extra electrons to cancel out the excess positive charge. Rutherford argued for this in 1920.
For example helium has atomic number 2, but atomic mass 4. With 2 electrons and 4 protons, helium would have charge +2. It doesn’t! But if it also had 2 extra electrons in the nucleus it would be neutral, as observed.
But what would make some of the electrons stay in the nucleus, while others orbit it?
In 1930, Marie Curie’s daughter Irène and her husband created a beam of electrically neutral particles by bombarding the metal beryllium
with radiation.
In Cambridge, James Chadwick carried these experiments further and proved there was a neutral particle with almost the same mass as a proton: the neutron. (Read more here.)
It became clear that the nucleus of an atom is made of protons and neutrons!
Later experiments showed a nucleus is about 1/60,000 as big across as an atom. If the hydrogen atom were a sphere 30 meters across, the proton would be a grain of salt at the center.
This raises another huge problem. If the protons are confined in the tiny nucleus, and they’re all positively charged, they must repel each other immensely! What holds them together?
But that’s another story! I’ve sketched out how we figured out the basics of atoms — that’s atomic physics. The next story would be nuclear physics. There are always new puzzles, leading us onward.
The photograph is not just a picture of the wavefunction of the electron in a hydrogen atom; it’s the result of a complicated process. For more try these:
The second is more careful than its title. It does the unusual and welcome thing of saying that “the experimentally observed nodal structures originate from the transverse nodal structure of the initial state that is formed upon laser excitation” — which is the actual claim, properly hedged, rather than “we photographed the orbital.” It also notes the comparison between measured and predicted images, which makes clear that this is a quantitative experiment with theory backing it, not a literal photograph.
In fact, the man who discovered the electron through some of the most delicate vacuum-tube experiments ever performed was reportedly a complete disaster with his hands.
J. J. Thomson was hopeless in the lab. His assistant Ebenezer Everett forbade Thomson from touching anything delicate on the grounds that he was “exceptionally helpless with his hands”. According to Ainissa Ramirez:
For such a small man, he was a Victorian bull in a china shop. When he visited his students in the laboratory, they’d wince when he offered help, and quickly tried to move fragile things out of his way. They took deep breaths when he sat on a lab stool to speak. Life was no better at home. J. J.’s wife did not permit him to use a hammer in the house.
The irony: the 1897 electron experiments depended absolutely on glassware that almost no one in the world could make. You needed cathode-ray tubes capable of holding a vacuum so high that any ordinary tube would shatter, with metal electrodes sealed through the glass without leaking. Everett, a self-taught glassblower who had migrated over from the chemistry department, made every tube by hand. Thomson didn’t touch them.
Thomson called the particles he discovered “corpuscules”. The word “electron” was already in circulation when he made his announcement: George Johnstone Stoney had coined it in 1891 for the unit of charge in electrolysis, and Joseph Larmor was already using it in 1894 for a theoretical particle in his electromagnetic ether theory. Within months of Thomson’s April 1897 lecture, George FitzGerald suggested that the corpuscle identified by Thomson from cathode rays and proposed as parts of an atom was a “free electron,” as described by physicist Joseph Larmor and Hendrik Lorentz. While Thomson did not adopt the terminology, the connection convinced other scientists that cathode rays were particles.
Thomson dug in. He kept saying “corpuscle” through the whole period when he was actually building his case for the particle — the 1899 paper showing the photoelectric particles had the same m/e, the 1904 plum-pudding model paper, the 1906 Nobel Prize, all discussed “corpuscles.” Thomson himself continued to use the term corpuscle until 1913 — about sixteen years after his announcement, and well after the rest of the physics community had moved on. His 1906 Nobel citation reflects how oddly Thomson’s contribution was framed at the time: “His 1906 Nobel Prize was granted ‘in recognition of the great merits of his theoretical and experimental investigations on the conduction of electricity by gases,’ not for any specific discovery, let alone the electron (which he kept calling the corpuscle)”.
Part of the resistance was substantive, not just stubborn. “Electron” in Larmor’s and Lorentz’s usage was a theoretical entity tied to an ether-based electromagnetic worldview, with specific commitments Thomson didn’t want to inherit. Calling his thing a “corpuscle” let him keep it as a mechanical, material particle — a building block of the atom in his own picture — without buying into the Continental electromagnetic program.
So the textbook line “Thomson discovered the electron in 1897” smooths over a real conceptual identification that took a decade and a half, and that Thomson himself was among the last to embrace.
For more, try:
• Isobel Falconer, Corpuscles to electrons in Histories of the Electron, Buchwald and Warwick, eds., MIT Press, Cambridge, 2001.
• Ainissa Ramirez, The Alchemy of Us—How Humans and Matter Transformed One Another, MIT Press, Cambridge, Massaschusetts, 2020.
The “plum pudding” name was invented by an anonymous popular-science writer in 1906. You can see a careful historical reconstruction here:
• Giora Hon and Bernard R. Goldstein, J. J. Thomson’s plum-pudding atomic model: the making of a scientific myth, Annalen der Physik 525 (2013), A129–A133.
Hon and Goldstein tracked down the earliest occurrence: an anonymous reporter in the British pharmaceutical trade magazine The Chemist and Druggist, August 1906, who described the atom as having “minute specks, the negative corpuscles, swimming about in a sphere of positive electrification, like raisins in a parsimonious plum-pudding”. “The analogy was never used by Thomson nor his colleagues. It seems to have been coined by popular science writers to make the model easier to understand for the layman.”
In fact, by the time the Chemist and Druggist writer reached for the pudding image in 1906, Thomson’s model had already moved past the version it was supposedly describing. As Hon and Goldstein put it: “according to Thomson’s theory of 1906, the electrons revolve on rings about the center of the atom, and they are not distributed throughout the ‘pudding’ as ‘raisins swimming about.'” The pudding image was obsolete from the moment it was coined.
What Thomson actually used as his guiding analogy was much more interesting: floating magnets. Thomson drew an analogy with experiments by Alfred Marshall Mayer (1836–1897). Piercing small magnetic needles into corks and letting them float in water below a strong magnet, Mayer had observed in 1878/79 that the magnetized floating needles quasi-automatically positioned themselves in characteristic configurations depending on their number. With more than six magnetic needles present, a seventh and eighth would inevitably position itself inside the outer ring of six. As the number of floating magnets increased, more and more rings would form. Thomson hoped that a similar ring-structure composed of corpuscles could be found inside chemical atoms, and suspected that each of these rings would be analogous to the chemical periods. So in Thomson’s head, the atom looked like Mayer’s needle-in-cork rings — dynamic, self-organizing, suggestive of the periodic table, not a dessert.
Another question: was the model even Thomson’s alone? Britannica notes that the model was “proposed about 1900 by William Thomson (Lord Kelvin) and strongly supported by Sir Joseph John Thomson, who had discovered (1897) the electron”. Kelvin had been working on similar uniform-positive-sphere ideas a few years before J.J. Thomson’s 1904 Philosophical Magazine paper, so the standard “Thomson’s plum pudding” of textbooks elides both a co-originator and the actual originator of the name.
• A. M. Mayer, Floating magnets, Nature 17 (1878), 487–488.
• Klaus Hentschel, Atomic Models, J.J. Thomson’s ‘Plum Pudding’ Model, in Compendium of Quantum Physics, eds. D. Greenberger, K. Hentschel and F. Weinert, Springer, Berlin, 2009.
• John Heilbron, J.J. Thomson and the Bohr atom, Physics Today 30 (1977), 23–30.
• Wikipedia, Plum pudding model.
• Britannica, Thomson atomic model.
In Crookes’ tube, electrons flew from a negatively charged metal tip (the cathode) to the positively charged one (the anode). Thus, electrons were called cathode rays, and this apparatus was called a cathode ray tube.
By the mid-1880s, the phenomena created by these tubes was a hot but baffling field. There was a vigorous debate about whether cathode rays were charged particles (the Crookes/Schuster view) or some kind of wave in the aether (the German view, which Goldstein himself leaned toward). In 1876, Goldstein discovered that cathode rays were emitted perpendicular to the cathode surface. This inspired the design of concave cathodes to produce concentrated or focused rays. So experimenters were routinely fiddling with cathode geometries — different shapes, different sizes, different configurations — to study cathode ray behavior. Goldstein spent his time doing this.
Eventually, around 1886, he tried drilling small holes in the cathode — Kanäle, meaning “channels” or “canals”. He discovered something surprising: wth the tube running, a faint glow appeared streaming out the back of the cathode, opposite to the direction of the cathode rays. Busch writes: “Goldstein therefore called them ‘canal rays’ (German ‘Kanalstrahlen’). Since the canal rays went in the opposite direction as the ‘cathode rays’, Goldstein speculated that these rays consisted of positively charged particles.”
Later these canal rays were called “anode rays” as a parallel to “cathode rays,” but it’s somewhat misleading — these rays don’t actually originate at the anode. They “are produced by positively charged ions after impact ionization in the cathode ray tubes. These ions are accelerated towards the cathode, pass — if the cathode is provided with holes — through these holes (‘channels’) due to their inertia and can be detected behind the cathode by their luminous phenomena.” So they’re lone protons formed in the gas, accelerated toward the cathode, that shoot through the holes by inertia and emerge on the far side.
• Britannica, Eugen Goldstein.
• Wikipedia, Eugen Goldstein.
• Encyclopedia of Scientific Biography, Eugen Goldstein.
• Uwe Busch, Claims of priority — The scientific path to the discovery of X-rays, Z. Med. Phys. 19 (2023), 230–242.
Wilhelm Wien, famous for his work on blackbody radiation, was working at Aachen when in 1898 he turned to the problem Goldstein had left dangling. Goldstein himself had tried to deflect canal rays magnetically and failed. Using the strongest magnet he had, one that certainly had an effect on the cathode rays, Goldstein attempted to deflect his canal rays, but he observed no change in their path. So for twelve years the rays were a sort of mystery — clearly something real, but resistant to the magnetic deflection trick that was the standard probe of the era.
Wien managed to bend canal ways using brute force. To deflect the cathode rays, you only needed a modest field. To deflect the canal rays, Wien needed a magnetic field of 3250 gauss and a electrical field of 2000 V — and his rays moved a grand total of 6 mm. The thing was enormously more sluggish to deflect than electrons, and this meant the particles were vastly more massive per unit charge. Goldstein hadn’t been doing anything wrong — he just hadn’t been hitting them hard enough!
The really nice piece of physics is how Wien figured out the charge/mass ratio of canal rays. He invented what we now call the Wien filter: a charged-particle velocity filter with orthogonal electric and magnetic fields. The trick is elegant. The magnetic force on a particle depends on its velocity; the electric force doesn’t. So if you tune the fields against each other, only particles of one particular velocity sail straight through undeflected — everything else gets bent. Once you know the velocity, the deflection in a pure magnetic field tells you charge-to-mass. It’s a beautiful idea, and it’s still the working principle behind ion-beam instruments today.
Wilhelm Wien analyzed these positive rays in 1898 and found that the particles have a mass-to-charge ratio more than 1,000 times larger than that of the electron. Because the ratio of the particles is also comparable to the mass-to-charge ratio of the residual atoms in the discharge tubes, scientists suspected that the rays were actually ions from the gases in the tube. So Wien did not discover the proton, contrary to what’s sometimes claimed. What he found was:
• The deflection was in the opposite direction from cathode rays — so the particles were positively charged.
• The charge-to-mass ratio depended on the gas in the tube, not on the cathode material — so these weren’t some universal particle like the electron; they were ions of whatever gas you’d filled the tube with.
• The mass-to-charge ratio was comparable to atomic masses, meaning these were essentially atoms missing electrons.
The hydrogen ion — which we now call the proton — was just one of many ions Wien could produce, and there was nothing special about it in his work. The identification of the hydrogen ion as a fundamental constituent of all nuclei came much later, with Rutherford’s nuclear-scattering experiments and his 1919 demonstration that he could knock hydrogen nuclei out of nitrogen. Rutherford named it the proton around 1920.
So the lineage is this: Goldstein 1886 saw a glow, Wien 1898 showed it was ionized gas, and finally Rutherford 1919–1920 identified the hydrogen ion as a universal nuclear building block.
The discovery of the neutron was an extremely complicated business, starting with the the problem of understanding atomic mass (Z) and atomic number (A) in the periodic table.
Prout, isotopes, and the Z-vs-A puzzle. The puzzle has roots in William Prout’s 1815 hypothesis that atomic weights are integer multiples of hydrogen, suggesting hydrogen as a building block for all the other elements. Through the nineteenth century this looked broken: chlorine sat at 35.5, and most elements were not clean integers. Two developments rehabilitated it. Soddy in 1913 introduced the concept of isotopes — the same element can exist in chemically identical forms of different mass — explaining the non-integer atomic weights as weighted averages. J.J. Thomson’s 1912–13 work on neon gave the first stable-element example, and Aston’s mass spectrograph (1919 onwards) established what Aston called the “whole-number rule”: individual isotopes have atomic masses very nearly integral in units of hydrogen, with small deviations here and there. In parallel, Moseley’s 1913 work using X-rays pinned down atomic number Z as the nuclear charge, not just an ordinal label. Putting these together you got the central puzzle: for elements heavier than hydrogen, the atomic mass A is larger than the atomic number Z!
The proton-electron nucleus. The obvious move was: maybe the nucleus contains A protons and A − Z electrons, giving the right charge and mass. The model was appealing on three grounds beyond accounting. First, only two particles were known (proton and electron), so parsimony favored it. Second, beta decay does eject electrons from nuclei, so the electrons appeared to be there to begin with. Third, alpha particles (helium-4 nuclei) could be assembled as 4p + 2e, neatly. Rutherford himself adopted this picture in his 1920 Bakerian Lecture:
• Ernest Rutherford, Nuclear constitution of atoms, Proc. Roy. Soc. A 97 (1920), 374–400.
In that lecture he went further: he proposed that some proton-electron pairs inside the nucleus might be bound so tightly as to form a compact neutral unit: “the idea of the possible existence of an atom of mass one, which has a zero nuclear charge”. He floated the term “neutron” for it. (W.D. Harkins coined the same term independently in 1921.) Note that Rutherford’s neutron was not a new elementary particle; it was a compound, a sort of miniature hydrogen atom. This conception persisted for over a decade and shaped how the discovery was eventually interpreted.
Three killer problems with electrons in the nucleus. Through the 1920s, three independent objections accumulated.
• The confinement problem. Confining an electron to a nucleus of radius ~10⁻¹⁴ m gives, by the uncertainty principle, a momentum spread of order ℏ/r and hence a kinetic energy of order ~20 MeV — far larger than any nuclear binding energy then known, and far larger than the energies of beta-decay electrons. With Dirac’s relativistic electron theory (1928), things got worse: Klein’s paradox showed that electrons in sufficiently strong potentials don’t bind cleanly at all. Bohr was so disturbed by this that around 1929–30 he was openly speculating that energy and momentum conservation might fail at nuclear scales.
• The nitrogen-14 anomaly. This is the cleanest of the three and the one that broke the model. Studies of the spectrum of molecular nitrogen in the late 1920s showed that the nitrogen-14 nucleus is a boson, hence must have integer spin. (Direct measurement gave spin 1.) But in the theory where a neutron is a proton-electron combination model, nitrogen-14 would be 14p + 7e = 21 spin-½ fermions, which must have half-integer total spin and thus be a fermion. The model predicted the wrong thing for the most common nitrogen isotope!
• The continuous beta spectrum. Chadwick himself, back in 1914, had shown that beta decay produces a continuous energy spectrum, not a line spectrum. Ellis, Wooster, and (decisively) Meitner and Orthmann’s 1929 calorimetric experiment confirmed energy was apparently not conserved. Pauli’s famous December 1930 letter “Dear Radioactive Ladies and Gentlemen” proposed a neutral, spin-½, light particle inside the nucleus to fix both energy conservation and the spin-statistics problem simultaneously — he called his particle a “neutron” too. (Fermi renamed it “neutrino” in 1933 after Chadwick’s discovery, to avoid the name collision.)
So by 1930 there were three structurally independent reasons to think the proton-electron nucleus was wrong, and Rutherford had a candidate replacement that nobody had yet found.
The 1930–32 experimental crescendo. The discovery story is what the Encyclopedia of Scientific Biography calls a “Tale of Three Cities” — Berlin, Paris, and Cambridge. “Walther Bothe and his assistant Herbert Becker [were] working in the Physikalisch-Technische Reichsanstalt (Imperial Physical-Technical Institute) in Charlottenburg, a suburb of Berlin; Irène Curie and her husband Frédéric Joliot working in the Institut du Radium in Paris; and James Chadwick working in the Cavendish Laboratory in Cambridge. The story reached its crescendo between June 1930 and February 1932.”
• Berlin, 1930. Walther Bothe and Herbert Becker observed that bombarding beryllium with alpha particles emitted a highly penetrating, electrically neutral radiation. They, along with most of the scientific community, incorrectly assumed this radiation was exceptionally high-energy gamma rays. Lead absorption coefficients suggested gamma rays of unprecedented energy — odd, since the nuclear binding energies available from α + ⁹Be shouldn’t have been enough.
• Paris, late 1931 – January 1932. Irène Joliot-Curie and Frédéric Joliot, with the world’s strongest polonium alpha source (an inheritance, essentially, from Marie Curie’s lab), pushed the Bothe radiation through a paraffin window. They found that this radiation knocked loose protons from hydrogen atoms in that target, and those protons recoiled with very high velocity. They interpreted the result as the scattering of high-energy photons off protons — which, as Chadwick noted in his Feb 27, 1932 Nature letter, required “that the beryllium radiation had a quantum energy of 50 × 10⁶ electron volts”. That was about ten times more energy than any plausible nuclear gamma source could produce. Curie and Joliot reported their findings on 18 January 1932, and the Comptes Rendus arrived in Cambridge within a couple of weeks.
• Cambridge, February 1932. Chadwick read the paper and, primed by Rutherford’s 1920 prediction and his own decade-long search for the neutron, instantly suspected the right answer. Rutherford’s reported reaction was characteristic: “I do not believe it!” Chadwick built the experiment immediately — borrowed polonium, the Cavendish’s beryllium, an ionization chamber. The crucial idea was simply to substitute multiple other targets for the paraffin: hydrogen, nitrogen, helium, lithium. If the projectile is a massive neutral particle of unknown mass m, you can work out the maximum recoil velocity of a nucleus of mass M. Comparing the maximum recoil velocities for hydrogen and nitrogen, Chadwick could solve algebraically for m. He found m is close to the proton mass. Three weeks after reading the Paris paper, he submitted a paper on the possible existence of a neutron on February 17th, 1932, which was published ten days later:
• James Chadwick, Possible existence of a neutron, Nature 129 (1932), 312.
He sent a fuller, more confidently titled paper to the Royal Society on the 10th of May:
• Chadwick, The existence of a neutron, Proc. Roy. Soc. A 136 (1932), 692–708.
Aftermath: from compound to elementary. Chadwick himself, in 1932, still followed Rutherford and described the neutron as a tight proton-electron compound — he wasn’t yet claiming a new elementary particle. The shift came over the next two years. Heisenberg’s three “On the structure of atomic nuclei” papers (July–December 1932) treated the proton and neutron as two states of a single ‘nucleon’ with the strong force coupling them via isospin — a structure that doesn’t work at all if the neutron has an electron lurking inside it. Fermi’s January 1934 theory of beta decay treated the electron as created at the moment of decay (the electromagnetic-radiation analogy: photons aren’t “in” the atom before emission either), removing the last motivation to put electrons in the nucleus. And the final empirical nail, suggested to Chadwick by the young Maurice Goldhaber, was photodisintegration of the deuteron: γ + d → p + n. By energy conservation, the neutron mass came out at greater than 1.0077 and less than 1.0086 in atomic mass units. Thus, the neutron was more massive than the hydrogen atom which had an accurately determined mass of 1.0078 amu. A bound p+e would weigh less than free p plus free e (binding energy is negative); since the neutron weighs more than a hydrogen atom, it cannot be such a bound state. By 1934 the neutron was firmly an elementary particle.
Primary sources and good histories:
• Lawrence Rutherford, Bakerian Lecture: nuclear constitution of atoms, Roy. Soc. Proc. A 97 (1920), 374–400.
• Irène Joliot-Curie and Frédéric Joliot-Curie, Émission de protons de grande vitesse par les substances hydrogénées sous l’influence des rayons γ très pénétrants, Comptes Rendus 194 (1932): 273.
• APS News, Chadwick reports the discovery of the neutron, May 1932.
• S. M. Bilenky, Neutrino: history of a unique particle.
• Roger Stuewer, The nuclear electron hypothesis, in Otto Hahn and the Rise of Nuclear Physics, Reidel, 1983.
As experimental capabilities advance rapidly, the quantum computing community faces a critical elephant in the room: What will these quantum machines eventually be useful for? Will they deliver the promised broad societal impact, or will they remain highly specialized devices for exotic tasks known only to the experts?
Despite decades of effort, conclusive evidence of large quantum advantage in real-world applications remains confined to a few niche domains, such as simulating quantum materials and cryptanalysis. These problems are either inherently quantum to begin with, or they possess specialized mathematical structure that quantum algorithms can easily exploit. But it seems unlikely that such structures appear broadly in everyday life.
Indeed, most applications of modern computation hinge on the processing of massive, noisy classical data, generated at an unprecedented pace across society. That is the driving force behind the overwhelming success of machine learning and AI. Since the data originates from the macroscopic classical world, there is no obvious reason it should exhibit the delicate, specialized structures that quantum computers require. To playfully adapt Richard Feynman’s famous quote: We live in an effectively classical world, dammit, and maybe classical computers and AI already suffice for most of our problems. (For those unfamiliar, Feynman originally quipped: “Nature isn’t classical, dammit, and if you want to make a simulation of nature, you’d better make it quantum mechanical.”)

To truly unlock the power of a quantum computer, quantum algorithms typically need to access data in quantum superposition, processing many different samples simultaneously in different branches of the quantum multiverse. To use technical jargon, this is called querying a quantum oracle. But in reality, the classical data samples that we want to process are generated from everyday activities in a classical world, and we can only access them one at a time.
Think of the movie reviews you scroll through on a streaming platform. How would you read the plain-text reviews from a million different users all at once in a quantum superposition? This bottleneck—the challenge of efficiently accessing the classical world in quantum superposition—is known as the data loading problem. It has arguably been one of the main obstacles to achieving broadly applicable quantum advantage.

In this new work [1], we provide a solution to this seemingly impossible challenge. We develop a framework, called quantum oracle sketching, that enables us to access the classical world in quantum superposition in an optimal way. Importantly, it automatically handles the noise and correlations in the data, and natively supports flexible data structures like vectors and matrices that enable machine learning applications.
The core mechanism relies on processing data as a continuous stream. For each classical data sample we observe, we apply a carefully designed, small quantum rotation to our system. By sequentially accumulating these quantum rotations, we incrementally build up an accurate approximation of the target quantum oracle, which can then be used in any quantum algorithm for data processing. Because every data sample is processed once and immediately discarded, we completely eliminate the massive memory overhead typically required to store the dataset. The fundamental price to pay for assembling quantum queries from classical data lies in the sample complexity: our algorithm consumes a number of samples that scales quadratically with the number of quantum queries we need to make. We show that this rate is optimal and fundamentally arises from the quadratic relationship between quantum amplitudes and classical probabilities governed by the Born rule.

With the data successfully loaded into the quantum computer, the final challenge is to efficiently read out classical results. To address this, we develop an efficient measurement protocol called interferometric classical shadow. Combined with quantum oracle sketching, it allows us to circumvent the data loading and readout bottleneck to construct exponentially compact classical models from massive classical data with quantum technology.
Using this new approach, we are finally able to find exponential quantum advantage in processing classical data and machine learning. We rigorously prove that a small quantum computer can perform large-scale classification and dimensionality reduction on massive classical data by processing samples on the fly. In contrast, any classical machine achieving the same prediction performance requires exponentially larger size. When the classical machine does not have the required exponentially large memory size, it needs super-polynomially more samples and time relative to our protocol running on a quantum device. Remarkably, this illustrates that quantum technology enables us to construct compact and accurate classical models out of classical data, which is impossible with classical machines alone unless given exponentially larger memory.

The true scale of this exponential memory advantage is staggering. A quantum processor with 300 logical qubits can outperform a classical machine built from every atom in the observable universe. Of course, to actually see such a comical contrast, we would also need universe-scale datasets and processing time.
To contextualize these results in realistic scenarios, consider a large-scale scientific experiment, like a large particle collider. Each experimental run generates a colossal volume of data. With a quantum computer, we can keep squeezing all the data into this tiny quantum chip to perform downstream machine learning tasks such as classification and dimensionality reduction. But if we only have classical machines, we would need to build massive, energy-consuming data centers to store the raw data to match the performance. Without this massive memory overhead, classical machines simply couldn’t extract the same clear signals from a single run, forcing us to repeat the massive, expensive experiment many more times to compensate. To put this into perspective, the Large Hadron Collider (LHC) at CERN generates petabytes (millions of gigabytes) of data per hour, but the data storage bottlenecks force researchers to discard all but a tiny fraction—retaining perhaps only one in a hundred thousand events.
We validated these quantum advantages on real-world datasets, including movie review sentiment analysis and single-cell RNA sequencing. In these public datasets, we demonstrate four to six orders of magnitude (ten thousand to a million times) reduction in memory size with fewer than 60 logical qubits. Given the rapid advancements in high-rate quantum error correction codes and experimental techniques, quantum computers capable of demonstrating such applications are foreseeable in the near future. Crucially, the quantum advantage we propose likely carries a clearer positive impact for society and likely arrives sooner than the applications in cryptanalysis, where the current best estimate requires a thousand logical qubits.
Our results provide strong evidence that the utility of quantum computers extends far beyond specialized tasks, opening a path for quantum computers to be broadly useful in our everyday life. Rather than fearing that classical AI will “eat quantum computing’s lunch,” we now have rigorous evidence pointing towards a much more exciting prospect: quantum-enhanced AI overpowering classical AI.
Of course, there is still a long way to go towards the dream of quantum intelligence. Our current results establish the provable supremacy of quantum machines in foundational machine learning tasks, such as high-dimensional linear classification and dimensionality reduction. They do not yet imply immediate utility for modern generative AI such as large language models.
That said, our results give me a strong feeling that we are living in an age strikingly reminiscent of the traditional machine learning era—an age dominated by support vector machines and random forests; an age when we relied on rigorous statistical analysis because we lacked the computational resources for large-scale heuristic exploration; an age that ultimately heralded the birth of deep learning and the AI revolution. Today, quantum AI seems to sit at a similar historical position. I cannot wait to see what quantum AI will become once we are capable of unconstrained heuristic exploration on large-scale fault-tolerant quantum computers.
To accelerate this dawn of quantum AI, we invite physicists, computer scientists, developers, and machine learning practitioners to join our efforts and help us push the boundaries of what quantum AI can achieve. To bridge the gap between abstract quantum theory and hands-on machine learning practice, we are open-sourcing our core framework. Our numerical implementation of quantum oracle sketching is built in JAX, natively supporting GPU/TPU acceleration and automatic differentiation to integrate nicely with modern machine learning pipelines. Check out the code, run the simulations, and help us shape the future of quantum AI at github.com/haimengzhao/quantum-oracle-sketching!
[1]. Haimeng Zhao, Alexander Zlokapa, Hartmut Neven, Ryan Babbush, John Preskill, Jarrod R. McClean, and Hsin-Yuan Huang. Exponential quantum advantage in processing massive classical data, arXiv:2604.07639, 2026.
The Manhattan Project was the largest government sponsored research and development project of its time. Some things worth noting, in light of the present US government attitude toward science:
That is a truly world-class roster of municipal nicknames.
Tomorrow is Memorial Day.
William James, in 1897, dedicating the monument to Col. Robert Gould Shaw and the 54th regiment, a monument which still stands in Boston Common today:
“And when South Carolina took the final step in battering down Fort Sumter, it was the fanatics of slavery themselves who called upon their idolized institution ruin swift and complete. What law and reason were unable to accomplish, had now to be done by that uncertain and dreadful dispenser of God’s judgments, War – War, with its abominably casual, inaccurate methods, destroying good and bad together, but at last able to hew a way out.”
More James:
“What country under heaven has not thousands of such youths to rejoice in, youths on whom the safety of the human race depends? Whether or not they leave memorials behind them, whether their names are writ in water or in marble, depends mostly on the opportunities which the accidents of history throw into their path”
The monument, and James, and the 54th regiment, and the atom bomb, are all in Robert Lowell’s poem “For the Union Dead”:
“He is out of bounds now. He rejoices in man’s lovely,
peculiar power to choose life and die—
when he leads his black soldiers to death,
he cannot bend his back.”
Shaw died in the assault on Fort Wagner too. William James’s younger brother was a volunteer in the 54th, and was gravely wounded; his presence there goes unmentioned in James’s speech.
James concludes by emphasizing that military valor is not the only kind of valor, not even the only kind that Col. Shaw himself displayed and was cast in bronze for.
“What we really need the poet’s and orator’s help to keep alive in us is not, then, the common and gregarious courage which Robert Shaw showed when he marched with you, men of the Seventh Regiment. It is that more lonely courage which he showed when he dropped his warm commission in the glorious Second to head your dubious fortunes, negroes of the Fifty-fourth. That lonely kind of courage (civic courage as we call it in times of peace) is the kind of valor to which the monuments of nations should most of all be reared, for the survival of the fittest has not bred it into the bone of human beings as it has bred military valor; and of five hundred of us who could storm a battery side by side with others, perhaps not one would be found ready to risk his wordly fortunes all alone in resisting an enthroned abuse. The deadliest enemies of nations are not their foreign foes; they always dwell within their borders. And from these internal enemies civilization is always in need of being saved. The nation blest above all nations is she in whom the civic genius of the people does the saving day by day, by acts without external picturesqueness; by speaking, writing, voting reasonably; by smiting corruption swiftly; by good temper between parties; by the people knowing true men when they see them, and preferring them as leaders to rabid partisans or empty quacks. Such nations have no need of wars to save them.”
I never knew what was so great about the ‘great icosahedron’. Now I do.
Take a regular icosahedron whose vertices have coordinates in the field ℚ[√5], which consists of numbers a + b√5 with a and b rational.
Apply the nontrivial element of the Galois group of this field: that is, simply replace √5 by -√5 in all your formulas. You get the great icosahedron:
What do I mean by this, exactly? For example, take the regular icosahedron whose 12 vertices are
where Φ = (1 + √5)/2 is the golden ratio. Form a bunch of
• points (all the vertices of the icosahedron),
• lines (containing all the edges of the icosahedron), and
• planes (containing all the faces of the icosahedron)
Now replace √5 by -√5 in all your formulas. You get the equations for a new bunch of points, lines and planes. And:
• the points are the vertices a great icosahedron;
• the lines contain all the edges of this great icosahedron;
• the planes contain all the faces of this great icosahedron.
So, replacing √5 by -√5 makes the icosahedron great!
On Mastodon, J. M. animated this using an interpolation process called ‘tweening’:
If you apply the Galois transformation again you get back the icosahedron. If you apply it yet again you make the icosahedron great again.
I thank and andeux for their help on this.
This fact is asserted without proof in the section on regular star-polytopes near the end of this book:
• John Horton Conway, Heidi Burgiel and Chaim Goodman-Strauss, The Symmetries of Things, A K Peters, Natick, Massachusetts, 2008.
Let me sketch an uninspired, purely computational proof. Take each of these twelve icosahedron vertices:
and find its five nearest neighbors. Then apply the Galois transformation, which amounts to replacing Φ with -1/Φ. You get twelve new points
which turn out to be vertices of a new, smaller icosahedron!
Then, for each vertex of the original icosahedron, see where it goes under this transformation, and see where its nearest neighbors go. You’ll see they become second nearest neighbors.
Thus the edges of the original icosahedron, which connect nearest neighbor vertices, go to edges connecting second nearest neighbors of a new smaller icosahedron.
What about the faces? The faces of the original icosahedron are equilateral triangles whose corners are triples of icosahedron vertices that are all nearest neighbors of each other. So, they get sent to triangles whose corners are triples of vertices that are all second nearest neighbors.
And these facts characterize the great icosahedron: it has the vertices of a regular icosahedron, edges connecting all pairs of second nearest neighbor vertices:
and faces formed by all triples of second nearest neighbor vertices. You can’t see any of these triangular faces in its entirety in this picture—but if you look hard you can see most of one.
These are all the possible squared distances between vertices in the original icosahedron and the new great icosahedron:
| Original icosahedron | Great icosahedron |
|---|---|
| 0 | 0 |
| 4 | 4 |
| 4 + 4Φ ≈ 10.47 | 4 − 4/Φ ≈ 1.53 |
| 8 + 4Φ ≈ 14.47 | 8 − 4/Φ ≈ 5.53 |
To get the new squared distances from the old ones we simply replace Φ by -1/Φ, as we must, since Galois transformations preserve the field operations and squared distances are computed using field operations.
The nearest neighbors had distance squared 4, and these must be mapped to new points that still have distance squared 4—but now these are second nearest neighbors.
Of course it would be nice to have a deeper, more general understanding of what the Galois transformation sending to
does to the geometry of shapes made from regular pentagons. Soon I’ll dig a bit deeper into this!
Several items worth reading about as we head into a long weekend in the US. Starting with news related to funding and other aspects of US government policy:
Additional suggestions that look cool but I haven't had time to actually read:
Thomas Dietterich, Chair of the Computer Science section of the preprint server arXiv.org, recently clarified the site’s policies towards “hallucinated” citations and other signs of careless use of AI in a post on X. If your paper contains a citation to a paper that doesn’t actually exist, or has other signs you didn’t read it before posting like leftover commentary (the example he gave was “here is a 200 word summary; would you like me to make any changes?”), then you can get banned from the arXiv for one year. Even after that year you’d be on a kind of “probation”, and would need to show that your next few papers had been accepted by peer-reviewed journals first before posting them.
At the risk of saying the obvious, this is a good idea! arXiv isn’t peer review, it isn’t meant to judge the value of the papers it hosts. But it still needs to be a useful place for scientists to post their papers, which is why they try to keep spam and irrelevant content to a minimum. If you don’t actually endorse the content of a paper, you shouldn’t post it in the first place.
That said, the whole existence of hallucinated citations on arXiv feels a little silly. It makes sense for academic journals and preprint servers in other fields. But arXiv was the first site of its kind for a reason. Its users, physicists, mathematicians, and computer scientists, don’t need much hand-holding when it comes to computers. Papers submitted to arXiv aren’t typically written in Word, they’re written in a document-writing language called LaTeX, that lets users make decently-formatted papers without help from a journal. Physicist-written code may be terrible by any reasonable criteria…but it exists, much more universally than for example biologist-written code.
This extends to citations. In my old field, there is a database called INSPIRE that updates automatically from arXiv. Click on a paper, and a handy “cite” link gives you standardized citations in several formats, ready to copy and paste into your LaTeX code. Nearly every citation in my papers is copied from there. The ones that aren’t are either from other fields where I didn’t know of that style of database, or things that haven’t been published (this can be manuscripts in preparation, or personal communications).
All of this, though, feels like a lot less than what the field could be doing. In a world where almost everyone posts their papers to the same website, and almost everyone has at least a rudimentary understanding of programming…why are people still writing citations in free-form text in the first place? Why aren’t citations built in to the submitted papers on arXiv, automatically linked to the papers they cite? Why don’t we have a setup where, except for a small number of “special” citations, every citation is built so that it automatically goes to a real paper, and gives a clear error message if it doesn’t? In short, why are hallucinated citations even possible?
Look, I’m naive, I get that. I believe in automation, not in the modern context of LLMs and other heuristics, but in setting clear procedures and building clear rules. The world doesn’t work that way! The clear rules are always more contentious than you expect, the fuzzy human-led version always the only choice people can agree on.
But still. Citations. There has to be a better system, right?
I was thinking the other day about this old question, a favorite of mine.
Here’s another version of it — if anything along the lines of Belyi’s proof of Belyi’s theorem works, one might expect this to be true.
Question: Suppose given five points p_1, .. p_5 in P^1(Q(t)). Is there a map F from P^1 to P^1 such that the the critical values of F and F(p_1), … F(p_5) are all contained in a set of size four?
This would be “Belyi for M_{0,5},” I suppose. I have to admit, I feel like the answer is probably no.
Why it seems hard: every set of p_1, .. p_5 obtainable is obtained from some family of covers of P^1 branched at four points — in other words, it’s the image in M_{0,5} of some 1-dimensional Hurwitz space. So you have some countable union of curves in M_{0,5}. Now all you have to do is write down a curve in M_{0,5} which is not one of these. But… how do you know you can do this? If you were allowed to work over C, it’s easy! But there are only countably many curves in M_{0,5} defined over Q, or for that matter over Qbar. Maybe the Hurwitz curves use them all up! I doubt it. But I don’t see how to rule it out. Stranger, or at least equally strange things have happened; I am thinking, of course, of Belyi’s theorem itself, which one simply can’t believe until one sees the proof, and even then, only just.
It’s not obivous to me that this is hard. There might be a quick way to write down a p_1, .. p_5 for which there’s no such F. Good problem to chew on with the help your favorite example-generating large language model! (So far I couldn’t get Claude Opus 4.7 to have any luck with it, but your mileage may vary.)
Last night: my second visit to Wrigley Field, once again to see the Brewers. Did I blog the last one? Doesn’t seem I did. I went with CJ and a bunch of his friends (whence the aphorism.) Brief description, because it’s late.
At least three Baltimore chops in this game.
Lines for food at Wrigley are at least an inning long. Shouldn’t this be a solved problem?
My first time seeing Les Miz pitch. Maybe my last, as I have to believe that a guy can’t throw this fast this consistently without his arm flying off his body sometime very early in his career. Prior to this game, Misiorowsky had thrown 233 pitches at 100+mph. All other MLB starters combined: 149.
Brewers cruise most of the game, Cubs getting 3-hit, they rally in the eighth against unusually shaky Aaron Ashby, loading the bases and scoring a run on what should have been an inning-ending grounder when Luis Rengifo in a very ex-Angel fashion fumbled the pickup. But Chad Patrick got Michael Conforto to ground out to end the threat.
Impressive organ work at Wrigley. Runner dives back to first to avoid the pickoff: “Get Back,” by the Beatles. Batter fouls one off high above his head: “Straight Up,” by Paula Abdul. Best of all was the organist’s welcome for Garrett Mitchell: the theme song from “The Facts of Life.” How many people at Wrigley Field last night got that reference? Well, I got it, Wrigley Field organist, and I appreciated it.
It’s happening.
Your inbox registers an email from the chair of a faculty-hiring committee. With trembling fingers, you click on the message. “We’ve been grateful for the opportunity to learn about your work…The decision was very difficult…many highly qualified candidates…” Months of labor, soul-searching, strain, and anxiety give way to despair. The committee has filled the position, and not with you.
I recently published advice about how to proceed if you receive an offer of a faculty position. But what if you don’t receive an offer—what if hiring committees reject you? Or scholarship committees, grant committees, admissions committees, or potential advisors? What if a journal referee shreds your magnum opus? Or a program committee declines your submission to a conference?
Failure suffuses science as dinner suffuses a nighttime diaper, for two reasons. First, undertaking science is difficult. By “undertaking science,” I mean formulating and proving theorems, cajoling equipment into working, identifying bugs in code, extracting meaning from noisy data, etc. Second, science doesn’t unfold in a vacuum. Human beings do science within a society fraught with opinions, emotions, and limited resources. These challenges lead to the failures and rejections that this article addresses most. (If you’re a member of my group and you’re facing a failure due to the difficulty of undertaking science, come talk with me.)
What can you do if failure or rejection ails thee? The rest of this article prescribes, in chronological order, steps that will help you recover. You can even turn your lemons into, if not lemonade, then not-entirely-unappetizing lemon meringue pie.

Immediately after receiving the news:
Avoid thinking about the news for a few days. Imagine cutting yourself on a kitchen knife. The wound may sting and bleed initially. But the blood clots and the stinging diminishes if you cover the wound and don’t aggravate it.
After experiencing a failure or rejection, invite a colleague to lunch, and ask about their recent reading and travels. Dive into a project that will absorb you. Visit a museum, or watch a movie. Remove the letdown from your thoughts.
Consider seeking input from a mentor, especially if you’ve never experienced a failure or rejection of this type before. Mentors have more experience and so can put obstacles in perspective. For example, suppose you’re a student who’s received an upsetting referee report from a journal. The report might look mild to a faculty member, who’s likely received far more, and more-upsetting, reports. What sounds harsh to you might sound quotidian to an advisor, whose lack of distress might reassure you.
Also, mentors notice upsides that you’ve overlooked. Perhaps the referee trashed your presentation of an idea but tacitly approved of the idea itself. The trashing might have drowned out the approval during your reading. Yet the approval could justify a resubmission to the journal, upending your belief in the cause’s hopelessness. Relatedly, a mentor can help you identify strategies for moving past the failure. Maybe this journal won’t publish your paper but another journal is soliciting contributions to a relevant special collection.
As a PhD student, I applied for an internship at a company developing a quantum computer. I broke an obligation and flew out of state to interview for the position. No offer materialized. To process the outcome, I spoke with a more-advanced researcher I trusted. He not only offered a mature perspective, but also had access to inside information. The team liked me, he reported, but my interests didn’t overlap with theirs enough. In a sense, I’d grown too independent. As independence marks maturity in PhD students, the rejection came to double as a compliment. Today, I remain on friendly terms with multiple people from that team, and a student of mine just landed an internship at a quantum-computing company.

Chart a path past the failure or rejection. Cue the lemon meringue pie. Did a hiring committee reject an application of yours? Email the committee’s chair, thanking them for considering your application. Express your hope of improving your materials so that you can try again the following year. Ask if the chair would provide feedback via phone or Zoom.1
Pursue the path you’ve charted. To ease the burden, consider undertaking lighter tasks before more-arduous ones. I recently received a laundry list of requests about a manuscript from an editor—and when I say laundry, I mean the equivalent of hauling a multiple-pound bag to a stream, scrubbing everything by hand while the wind attempts to blow cleaned items into the mud, and then hauling everything back. I began with the tasks that required little effort. Dispatching them, I crossed about half the requests off my to-do list. The list looked more manageable as a result; I felt better-equipped to handle the trickiest work.
Celebrate your triumphs. Don’t let failures and rejections consume your attention; carve out time for your successes. If you recognize them, you’ll remember them the next time you face failure; they’ll cushion you when you fall.
I recently failed at a task over a hundred times (yes, I counted). After 28 months of trying, I succeeded. I celebrated by treating myself to lunch at the National Gallery of Art.2 Why not gather ye rosebuds while ye may? This success afforded me the opportunity to fail at a new task.
Always remember, what matters in the long run is not any one failure or rejection, but your persistence despite failures and rejections. I progressed to the Rhodes Scholarship competition’s final round two years in a row. Each year, a committee interviewed and rejected me. I’d poured time, sweat, and blood into my applications and interview preparations. I felt like I had no more blood to squeeze into the applications I then had to write for graduate programs because of those rejections. Those rejections, though, led me to become a student of John Preskill’s. Over a decade later, I’m not kicking myself.
Share about your favorite failures in the comments section below!
1Not via email, which doesn’t offer the same freedom.
2Its Garden Café had enticed me for years, but I can rarely bring myself to pay for food when I can prepare it myself.
![]() |
Heavy fermions, adapted from here. (a) At high temperatures, the conduction electrons are not well coupled to the unpaired local 4f moments. (b) At low enough temperatures, Kondo scattering hybridizes the f electrons with the conduction electrons, boosting the carrier density. (c) The hybridized energy-momentum relation is much flatter near the Fermi energy leading to a large effective mass. |
Hi Doug:An addendum to your very nice post on heavy fermions, to draw attention to what I think were important experimental results: Frank Steglich’s 1979 Phys. Rev. Lett. 43, 1892–1896 reporting superconductivity in CeCu2Si2 and Louis Taillefer and Gil Lonzarich’s 1988 determination of the quasiparticle mass and fermi surface in UPt3.Prior to Steglich’s paper we knew that some rare earth/actinide intermetallics (e.g. CeAl3) had a very enhanced specific heat coefficient at low temperatures and that the entropy implied by this specific heat was derived from the magnetic moments of the rare earth ions. But while it was plausible, there was no direct evidence that this enhanced specific heat was associated with heavy-mass fermions, so the physical relevance of the Kondo lattice concept remained uncertain.Steglich observed that in CeCu2Si2 the specific heat jump at the superconducting transition (which in BCS theory is basically the same size as the electronic specific heat at Tc) was about as big as the normal state specific heat coefficient, thus showing that the spin entropy had been transmuted into something that could go superconducting. Then (I think in subsequent experiments) Steglich observed that the rest of the superconducting thermodynamics in Cecu2Si2 was also consistent with pairing of heavy mass entities. This, I believe, is what convinced everyone that the spin entropy from the rare earth moments had been converted into heavy mass electrons—in other words, that the lattice Kondo effect was real.A few years after this, Louis Taillefer and Gil Lonzarich’s quantum oscillation study of UPt3 (Phys. Rev. Lett. 60, 1570 ) showed indeed that the U-f electrons (which appear as local moments at higher temperatures) were included in the Fermi surface at low T and had heavy masses, providing direct experimental confirmation of the Kondo lattice concept.CheersAndy Millis
I’ve been playing around with Claude Code (Claude Opus 4.7 (1M context)) because, well, who hasn’t?
I am, so far, only moderately impressed by its abilities in physics. The most impressive bit so far was when, in response to a question about nilpotent orbits, it responded
I don’t know. I could waste your time by guessing, but …
I had never had an LLM tell me that it doesn’t know something, much less that it didn’t want to waste my time by making stuff up. So this was positively shocking to read.
Claude Code is, however, unfathomably good at generating code, so I set it the task of modernizing Instiki and Heterotic Beast, my forum software. Both are Rails applications and both have extensive test suites. So they use a software framework Claude is familiar with and have an objective standard for whether the changes made are correct.
When there is no test, however, things can go wildly off the rails (pun intended). For instance, consider the following snippet of Ruby code
def foo(text)
...
con = text
...
(now mutate con)
...
con
end
If text is a frozen string, this will generate an error, as you can’t mutate a frozen string. Obviously, what you should do is write
def foo(text)
...
con = text.dup
...
con
end
which copies the caller’s string to a new unfrozen string which you can mutate to your heart’s content.
What did Claude do?
def foo(text)
...
con = text.encode
...
con
end
which also produces a new unfrozen string, transcoded from the caller’s encoding to Encoding.default_internal (which turns out to be nil). This is both (a) nondeterministic and (b) blows up spectacularly when text contains astral plane characters, like “𝔸”. I had to tell Claude not to do that, and to write some tests to check that astral plane characters are handled correctly.
Still …
I would set Claude the task of rewriting this blogging software, but alas I don’t have a test suite to compare with.
What I really should do, though, is find some physics I would trust it to work on.
I’m taking a Danish exam next week, and it’s a big one, a culmination of years learning the language. My classmates are stressed. Despite how much we’ve learned, it feels like we’re always making little mistakes. We write the wrong prepositions, put verbs in the wrong form, or mess up the order of words in a sentence. And while we should have time to check our work, that doesn’t help as much as it should. If we don’t notice a mistake the first time around, what guarantee is there that we notice it on the next read, or the next? Too many checks and we can even end up second-guessing ourselves, “correcting” something that was right to begin with.
It’s given me some sympathy for AI.
Earlier this month, investor Marc Andreessen posted a custom prompt he inputs when using AI, which was immediately mocked.
The silliest instruction, according to many critics, was to “Never hallucinate or make anything up.” It’s similar to a prompt that’s become a meme used to make fun of AI-using “vibe coders”, “Make no mistakes”.
Experts point out that this is just not how AI works. Large language model-powered programs like ChatGPT are inherently random, producing text largely based on its similarity to other text. “Hallucinations” or “mistakes” are an inevitable feature of the technology, and a prompt like Andreessen wrote isn’t a set of instructions the AI will follow without error: it’s just another part of the text the AI is trying to generate.
All that said, telling an AI to “make no mistakes” should have some effect. But it likely won’t be what you want.
The best way I’ve found to understand AI is in terms of stories. Chatbots like ChatGPT take a large language model, a mathematical formula for how words are most likely to appear in a text, and warp it, twisting it to almost always produce one particular kind of text: one half of a dialogue with a fictional AI assistant. This twisted formula determines how the AI responds to your prompts, but these days it also is used behind the scenes, in a kind of structured soliloquy called a “chain of thought”. You can think of the prompts you send to the AI as a preface to those soliloquies, and imagine the AI telling stories of a sort that would typically follow that preface.
So if you tell an AI “make no mistakes” or “do not hallucinate”, you’re making it more likely to generate the kind of story that begins, “the AI was instructed to make no mistakes”.
Let me put it this way, Mr. Amor. The 9000 series is the most reliable computer ever made. No 9000 computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error. – HAL 9000, “2001: A Space Odyssey”
You’d expect this to affect the chain of thought. For example, the AI might occasionally pause to say “I’m supposed to make no mistakes, so I should check this. What could have gone wrong?” and then list something that plausibly could be wrong with its idea. If this happens often enough, you’ll probably catch some real problems.
But I’m reminded of my classmates, practicing for that Danish exam. We can go over the text again and again, asking if this thing, or that, might be wrong. We can try again and again to use our mental model of the Danish language, seeing if this time it catches a new mistake. But there are things we won’t catch. And if we do it too much, we’ll second-guess ourselves out of the good answers, too.
Ultimately, “make no mistakes” isn’t a great instruction, either for humans or for chatbots. And its use by people like Marc Andreessen has me wondering if they are used to interacting with humans in the same way, as tools that keep making mistakes no matter how many times they’re instructed not to, requiring more and more long-winded instructions and yet continuing to misbehave.
Then again, that may be a mistake on my part.
Boris Alexeev, Kevin Barreto, Yanyang Li, Jared Duker Lichtman, Liam Price, Jibran Iqbal Shah, Quanyu Tang, and I have just uploaded to the arXiv our paper Primitive sets and von Mangoldt chains: Erdős Problem #1196 and beyond. This paper (which is a work in progress) represents our efforts to digest and document the recent flurry of developments around the following problem of Erdős, Sárközy, and Szemerédi on primitive sets:
Conjecture 1 (Erdős problem #1196) Suppose thatis a primitive set of integers, which means that no element of
divides another. Then
as
.
One can show that the upper bound of is best possible up to the
error by taking
to be the set of products of
primes for some suitable parameter
. This was one of the most well-known open problems in the study of primitive sets, and had attracted some number of partial results (for instance, Lichtman was able to show the upper bound of
). It was thus notable that this problem was first solved by an autonomous AI query (by the fifth author) a few weeks ago. This solution introduced a proof technique – based on Markov chains in the divisibility poset – which in retrospect is very natural for controlling primitive sets, but which had not been explicitly used in previous literature, though in retrospect many of the arguments in that literature involved a specific Markov chain which we call the downwards Mertens chain. The proof instead revolved around a different Markov chain, which we call the downwards von Mangoldt chain, which manages to neatly avoid the “
” type losses in the previous Mertens-based arguments, and resolve Conjecture 1. In this paper we develop the Markov chain approach more systematically, and show that it settles several further conjectures concerning primitive sets, and also provides simpler proofs of some previous results in the literature. More precisely, in addition to Conjecture 1, we establish the following:
Theorem 2 (Erdős primitive set conjecture, #164) For any primitiveconsisting of numbers greater than
,
Theorem 3 (Odd Banks–Martin) Letand suppose
is a primitive set consisting of odd numbers with at most
prime factors. Then
where
![]()
denotes the primes appearing as factors of elements of
, and
is the collection of products of
primes from
.
Theorem 4 (is Erdős-strong) If
is a primitive set consisting of even numbers, then
Theorem 5 (Ahlswede–Khachatrian–Sárközy) Ifis a primitive set, then
whenever
.
Theorem 6 (Erdős–Sárközy–Szemerédi, #1217) Letbe such that the upper doubly logarithmic density
is positive. Then there exists a strictly increasing infinite divisibility chain
in
such that
Theorem 2 and Theorem 5 had been previously established by Lichtman and Ahlswede–Khachatrian–Sárközy respectively, but the Markov chain formalism gives shorter (and more unified) proofs of both. Theorems 3, 4, 6 were open conjectures that can now be settled by this method. These results were obtained with varying levels of AI involvement, ranging from completely autonomous AI queries to traditional pen-and-paper calculations, to various hybrid approaches (for instance, with humans suggesting key inequalities that could then be rapidly tested numerically or even proved by various AI tools).
— 1. Chain/antichain duality and Markov chains —
I’ll now discuss the basic method of proof and try to motivate the main ideas, which have become much clearer in retrospect. Primitive sets can be viewed as antichains in the divisibility poset , in which the partial ordering is given by the divisibility relation
. So, one can pose the following more abstract question: given a general poset
and a weight function
, what is the maximal value of
as
ranges over all antichains in
?
One can attack this problem using the well known duality between antichains and chains (totally ordered subsets of
): every antichain and chain can meet in at most one point, thus one has
In fact this duality is completely tight:
Proposition 7 (Chain/antichain duality) Letbe a poset, let
be a weight function, and let
. Then the following are equivalent:
- (i)
for all antichains
.
- (ii) There exists a measure
of total mass at most
on the space of chains, such that (1) holds for all
.
Proof: We have already indicated how (ii) implies (i). Now we need to show that (i) implies (ii). A standard compactness argument allows us to reduce to the case when is finite. If (i) holds, then we also have
Thus, the “universal” problem of obtaining a uniform upper bound on for all antichains
is replaced with the equivalent “existential” dual problem of exhibiting a single measure
on chains of controlled mass, which hits each element
of the poset with a mass of at least the original weight
. Thus, such problems are now reduced to that of finding a sufficiently clever construction of such a measure
. If the mass of
was normalized to equal
, this becomes a probability problem: find a random chain process in the poset
that hits each element
of the poset with a sufficiently high probability. (Though in our paper we found it more convenient for technical reasons to not normalize the measure, and allow the mass to take values other than
.)
It turns out (in a manner that was not explicitly appreciated in past literature) that particularly good choices of random chain to use here can come from Markov chains. (Here, the term “chain” is now being used in two different ways, but fortunately the order-theoretic concept of a chain and the Markov process-theoretic concept of a chain will be quite compatible in this discussion.) There will be two types of Markov chains on the poset that will be relevant: downwards Markov chains and upwards Markov chains. Here is our notation for a downwards Markov chain:
Definition 8 (Downwards Markov chain) Letbe a poset, and suppose we designate some subset
of
to be the “absorbing states” (in practice these will be the minimal elements of
, although they do not have to be). A downwards Markov chain on
with absorbing states
is a collection of transition probabilities
for
obeying the following axioms:
(Thus for instance one has
- (i)
, with equality unless
and
, or if
and
.
- (ii) For any
, one has
.
for any absorbing state
.)
Given such a downwards Markov chain and an initial state , one can generate a random decreasing sequence
by having each
transition to
with probability
after conditioning on the past history
of the chain. This sequence will (almost surely) be strictly decreasing until it hits an absorbing state, in which case it stays there forever, although if the descending chain condition is not satisfied it is also possible for the sequence to be strictly decreasing indefinitely. We let
denote the law of this decreasing sequence. This construction is already enough to recover the Lubell-Yamomoto-Meshalkin (LYM) inequality:
Theorem 9 (LYM inequality) Ifis the power set of
with the inclusion partial ordering, and
is an antichain in this poset (i.e., a Sperner family), then
Proof: We introduce the downwards Markov chain with absorbing state in which each non-empty subset of
of some cardinality
transitions to a
-element subset chosen uniformly at random (i.e., with probability
of transitioning to each). If we start the descending sequence from the maximal element
of
, then one can easily check that each
is hit with probability
. Applying Proposition 7 with
,
, and
, we obtain the claim.
In the above argument we fixed the initial location of the Markov chain, but more generally one can start with any source mass
and work with the measure
for the purposes of applying Proposition 7.
One can also define upwards Markov chains in exact analogy with downwards Markov chains
(reversing the order in the poset), which now generate random increasing sequences
rather than decreasing sequences. There is a useful adjoint construction that can convert a downwards Markov chain into an upwards Markov chain: if we have a positive weight
which is invariant under the chain in the sense that
A downwards or upwards Markov chain, when equipped with an invariant or sub-invariant measure, also induces a flow network on the poset, in which an edge from to
is assigned a flow capacity of
— 2. The Mertens and von Mangoldt chains —
For the purpose of analyzing primitive sets, there are two downward chains on the natural numbers (with absorbing state )that, in retrospect, are particularly natural to use:
The von Mangoldt weight is a natural choice here thanks to the fundamental identity
The Mertens chain generates deterministic downward divisibility chains
The key innovation (which was uncovered by the AI-assisted proofs, though not quite in the notation and framework presented here) is to switch to the von Mangoldt chain, which removes the bias towards numbers whose largest prime factor is small. Indeed, the weight now turns out to be sub-invariant (after removing
) under this chain, and there is a modification
One slight defect of the von Mangoldt chain, as compared to the Mertens chain, is that it can “jump over” primitive sets (such as the set of products of primes) due to the fact that it will sometimes multiply or divide by a power of a prime rather than a prime itself. This turns out to be a technical difficulty for many of our applications, resulting in a need to make various small ad hoc modifications to the von Mangoldt chain to eliminate this type of jump.
In order to establish some crucial sub-invariance properties, it turns out (after some standard manipulations) to be useful to obtain good bounds on the negative log-derivative
— 3. Further directions —
Will Sawin and Ofir Gorodetsky have obtained analogues of several of the above results for function field models or permutation models respectively; we briefly discuss these in the paper, although we do not plan to cover these models in depth. We also note another recent use of this technique to solve a separate Erdős problem (#858) relating to antichains in a variant of the divisibility poset.
The zeta process that we have discovered hints at an emerging theory of the “developmental anatomy of integers”, which differs from the existing topic of anatomy of integers in that it views a large integer (and its prime factor “organs”) not as a static entity, but rather as an evolving process in which primes (or powers of primes, in the case of the von Mangoldt process) are added or removed to the integer over time. With this perspective, primitive sets can be viewed as singular moments in such a developmental process, which are only encountered at most once in the life cycle of a given integer. It seems of interest to study this developmental perspective further.
The paper is currently a work in progress; we have released an early version due to the public interest in this problem. We plan to explore some further applications, and also to formalize more of the above results in Lean (currently two of the six main theorems are formalized), before submitting the paper for publication. The situation here highlights a distinction I have recently made between three components of the problem solving process in mathematics, namely proof generation, proof verification, and proof digestion. In this particular case, the first two steps were extremely rapid due to modern AI tools; however, properly digesting the AI-generated proofs into a coherent exposition that places the arguments in context with both past literature and future directions remains a slower process that requires expert human attention.
No more NYT cooperation: my dog-rape red lineOver the years, I’ve written two op-eds for The New York Times about quantum computing, at the NYT editors’ invitation:
I’ve also visited the NYT office and helped NYT reporters with numerous stories about quantum computing and beyond. In the wake of Cade Metz’s infamous NYT hatchet job against Scott Alexander and the rationalist community, I resolved no longer to talk to Metz, but it never even occurred to me to extend that to a broader ban against the NYT itself. After all, it’s the friggin’ NYT!
This week, however, Nicholas Kristof—a man who I praised fulsomely in a blog post 20 years ago, for his coverage of the genocide in Darfur—has used the NYT to broadcast the oldest, crudest form of antisemitic libel, accusing the Jews of garishly preposterous crimes (poisoning wells? baking blood into matzo? in this case, training dogs to rape prisoners). As countless others have since pointed out, the sole source for this ludicrous accusation was a Hamas-linked organization called “Euro-Med” that praised the October 7 “martyrs.” Kristof’s piece came out the same day an Israeli human rights organization released a major report meticulously documenting Hamas’s mass rapes on October 7–something that did happen—and was apparently designed to neutralize the impact of that report.
While the debunking of Kristof came fast, it wasn’t fast enough: now that antisemitic blood libel (dog libel?) has the Gray Lady’s imprimatur, I expect it to ignite violence against Jews all over the world, and I expect my kids to be less safe. So I hereby announce:
I, Scott Aaronson, member of the National Academy of Sciences, will no longer cooperate with anyone from the NYT on anything—neither quantum computing stories nor anything else—until the NYT, at minimum, formally retracts its dog-rape claim and fires Kristof.
To my friends doing good science and technology writing for NYT: I’m sorry, I hope you understand, and please fight for what’s right within the organization.
I say “bare minimum” because I hope this scandal causes a major reckoning for the NYT as an institution. I don’t know if any libel or other laws were broken, but if they were, I hope Kristof personally and the NYT as a whole get sued for whatever they’re worth.
Luckily, others have already said most of what I would’ve wanted to. Here’s Eli Lake, in a passage that part of me wishes I’d never read:
Let’s start with what is known about the biology of male dogs. Their penises are small and thin. They become erect only when they smell the pheromones of a female dog in heat. Brandon McMillan, the three-time Emmy-winning host of CBS’s Lucky Dog, who has spent 25 years training animals, told me he had never heard of a dog who was trained to rape a human being and doubted this was possible.
“When a female is in heat, the pheromones released carry it to the male canine,” McMillan said. “That’s how they reproduce and the miracle happens. I don’t see how you would train a dog to do that. The dog has to get turned on, for lack of a better word.”
On the NYT’s “vetting process” (or lack thereof), here’s comedian Jeff Maurer:
Imagine that you’re a fact checker or editor at The New York Times. You know damn well that you’ve nabbed a lifeboat while most of your field is being picked apart by hagfish at the bottom of the North Atlantic. If you’re a lowly fact-checker, you might be in your 20s and desperate to rationalize your choice to enter a field that might as well be buggy repair. I‘ve had similar gigs and been in similar environments; I feel like I can get in these peoples’ heads. And that’s why I’m convinced that the dog claim should have made things easier for the fact checkers.
One day, Nicholas Kristof — the award winning columnist who has been with the Times since it was Farmer Sulzberger’s New Amsterdam Seed Catalogue — comes to you, the lowly fact checker. He’s working on a big piece, and it’s scorching hot: It’s sexual assault plus Israel/Palestine, which is like detonating a nuclear weapon inside a volcano. You know it will be very tough to say “I don’t find this source credible” or “we can’t run that without more substantiation”. And that’s true even before we consider that you probably run in a social circle in which Israel is thought of as a substantially-more-evil version of The Borg Collective from Star Trek (which is why they have those vaporizing weapons).
But you do notice that Kristof’s sources are thin. In fact, he only has two named sources. The first source tells him a story that has substantially changed since he told it to The Washington Post two years ago. The second source celebrated the October 7 attacks and previously made torture claims that he later recanted. Much of the testimony contains no detail that could be corroborated. The organizations and journalists that Kristof relies on are highly dubious and engaged in the mutual citation circle jerk that is typically the way that lies enter the zeitgeist. There may be some truth in there somewhere, but Kristof’s methods are — what’s the journalistic term? — total ass. But you’re afraid that if you question things, you’ll be thought of as pro-rape, or worse yet: pro-Israel.
And in that context, the dog story is a godsend. It frees you from addressing the broader problems with the article, because you can pick the battle of focusing on the most-outrageous, least-believable claim. The article doesn’t need the dog rape detail — it’s already full of shocking-if-true claims. Striking that one section lets you maintain a sheen of professionalism, and when the thing explodes, you also get to say “You should have seen what he wanted to print”.
The fact that that didn’t happen tells me that the level of scrutiny for claims about Israel at the Times is absolute zero. Even if some Israeli somewhere did manage to turn his dog into a sex criminal, there is not nearly enough evidence for that claim to run in the Times. It’s an unsubstantiated rumor, and a vile one. I think that a person who believes that without compelling evidence is showing evidence of a damaged mind. And I think that a newspaper that publishes that without compelling evidence is showing evidence of a damaged vetting process, or perhaps a vetting process that — when it comes to claims about Israel — doesn’t exist at all.
Last but not least, here’s Haviv Rettig Gur, with righteous anger entirely appropriate to the situation, yet also a determination to expose and fight prisoner abuse, which is a genuine problem in Israel as it is in pretty much every other country on earth:
No, dogs aren’t being trained to systematically rape prisoners, you nattering halfwits. And no, Hamas propaganda operatives are not reliable sources on the question of Israeli crimes. The vast, vast majority of soldiers are honorable men who walked into fire so our families may live. The whole world may turn on them; I will stand with them, grateful for their sacrifice. And Kristof, a willing purveyor of propaganda happily feigning that he can’t see the water and thrilling to a moral crusade engineered by would-be genocidaires he pretends not to understand — is no messenger of moral reckoning.
But friends, so fucking what. Let the narcissistic guttersnipes strut their moral emotions before the world, let the UN publish endless reports that don’t hold up to basic scrutiny, let the NGOs dream their rabid, sick dreams that no journalist ever fact-checks — yes, they’re lying. But so fucking what.
We still, for ourselves — because fuck them — must see that it isn’t all fake. The problem [of abuse in Israeli prisons] is real. It’s far smaller than they claim, but real nonetheless. And when discipline and morality break down, it can only get worse. We either crack down now or we watch it fester and grow.
And our own Ben Gvirs are stubbornly refusing to fix what is actually broken, the real thing in the real world.
And so we are caught in a strange sort of vise, the same vise we find ourselves in with the genocide lie: A vast propaganda machine that seeks to destroy us — countless activists too high on their own self-regard to see the irony of raging against a ‘genocide’ while calling for the erasure of a people — all while our own incompetent, venal, self-absorbed political class insist in their mindless chatter on confirming every claim of our enemies for sheer, bald egomania.
I’m sick of it all. I know you’re all sick of it too.
And that, in a nutshell, is what I think about this.
Just because they’re lying, just because a vast perfidious campaign has overwhelmed global elites in a bid to clear the way for our removal, just because they’re still, after two millennia, building their visions of redemption on The Evil Jew — doesn’t mean there isn’t also, separately, a problem on the ground.
So what do we do now? Simple. We see it, we acknowledge it’s happening, we bring our rage to our inept leaders until they bend to our will and act to stop the breakdown…
And we soldier on.
We soldier on because the enemy really is coming to murder us. Because Hamas must still go if Gaza is ever to rise to a new day. Because Hezbollah will yet destroy Lebanon on the altar of destroying us. Because the ayatollahs built their whole damn religion on the extermination of our children.
We fix the broken things within us as if the pogromists and their simpering Kristofs don’t exist. We owe no answers to the propagandists who seek to clear a path to our deaths. But we do owe answers to ourselves.
Let the screaming mob rage and churn like so much sea-foam. Despite that raging mob, despite the enemy who still seeks our destruction, and yes, despite feckless incompetents like Ben Gvir, our minister of prisons, who claim to lead us — we remain the strongest, freest Jews who ever lived, more capable and committed than our self-destructive enemies ever imagined. And the task is still before us, yet to be completed, the sacred duty given to our generation to ensure our children don’t have to face the genocidaires who now surround us. We do not waver, we do not stumble. We soldier on.
Because fuck them all.
Inspired by Haviv’s energy, I’m leaving the comments on this post turned off. Why? Because outside the dog-rape libel, I was having a pretty good week! On Monday I visited Quantinuum in Colorado and got a personal tour of Helios 1, considered the all-around most powerful quantum computer on earth at the moment. Now I’m visiting Stony Brook University, where I’ll give two talks and meet lots of interesting people. Then I’m headed to NYC for the annual meeting of Quanta magazine’s advisory board.
Right now, then, I have neither the time nor the inclination to battle the infinite army of brain-eaten zombies who arrive for every single post touching on Israel. My life will be better this way.
The Trevisan Award and the Decimal Digits of Powers of 2WHOA … I’ve won the inaugural Luca Trevisan Award for Expository Work in Theoretical Computer Science! This has a particular meaning for me as someone who knew Luca Trevisan as well as I did for 25 years — who had him as a professor and thesis committee member, whose blog bounced off his blog, who benefitted tremendously from his expository work in TCS — before Luca tragically succumbed to cancer two years ago.
As I told the committee, receiving this award makes me want to use my blog for more actual CS theory exposition, and less venting, in order to retrospectively become worthier of such an honor.
I’m ridiculously grateful to the entire TCS community — my people, my homies — for tolerating me to do what I do.
If you’re curious, here’s the official citation:
The inaugural Luca Trevisan Award for Expository Work in Theoretical Computer Science is awarded to Scott Aaronson for his sustained and high-impact inspirational efforts to explain and promote our field to broad audiences. His blog Shtetl-Optimized has hosted remarkably frequent and elaborate posts over more than two decades, and has become a central meeting place for wide-ranging conversations across the TCS and Physics communities. Scott Aaronson’s blog posts contain crystal-clear, informative expositions of exciting new results, calibrated evaluations of technological claims, and profound analyses of topics in these fields and (way) beyond. The uniquely enthusiastic and witty style of his writings (including his book Quantum Computing Since Democritus and his other lecture notes and surveys), lectures, and interviews have made him a top invitee for both popular and professional appearances, attracting large audiences. These qualities have inspired many students to enter the field, and made Scott Aaronson a go-to person for journalists and scientists alike looking for a definitive word on the latest scientific activity in TCS and quantum computing.
In the rest of this post, I’m going to start practicing what I preached—y’know, about turning over more of this blog to actual exposition, of a kind that the Trevisan Award could plausibly be meant to encourage. I’ll start with something that’s been on my back burner for the past couple months: namely, the (lightly edited) transcript of a talk that I gave this spring at UT Austin’s undergraduate math club. So, without further ado, and in memory of Luca…
On the Decimal Digits of Powers of 2
by Scott Aaronson
Hi! I’ve given six previous talks here at UT’s math club, some on relatively “important” topics—Gödel’s Theorem, time travel, Huang’s proof of the Sensitivity Conjecture, and so on.
Today, I want to talk about an unimportant question, one that my son Daniel, who was then 8, and who’s sitting here in the front row (along with his sister Lily), asked me a few months ago.
Daniel asked: which powers of 2 can you double without needing to carry digits? Clearly 1, 2, 4, 32, and 1024 all have this property, their doubles being 2, 4, 8, 64, and 2048 respectively. Are there any others?
Since I happen to have the powers of 2 up to 220 = 1,048,576 committed to memory since childhood, I confirmed that there were no other examples up to there: 128, 256, 512, 2048, etc. all require carries. So I told Daniel: I can’t find any other example, and on that basis, I conjecture that there aren’t any more. But if that conjecture is true, I don’t know if it will ever be proven, by humans or even AI!
Then I googled it, and saw that this is a known question (not very well known, but there’s a StackExchange post about it). And indeed it had been checked up to 2millions, and no other counterexample had been found.
Why did I become confident so quickly that yes, 1024 is probably the last example of a power of 2 that can be doubled without carrying?
Because of the heuristic that the decimal digits of 2n are more or less “random,” apart from various constraints that are irrelevant here (like that the last digit always cycles among 2, 4, 6, and 8). And 2n has about n/log210 decimal digits. Since only 0, 1, 2, 3, 4 can be doubled without carrying, the probability of 2n being a counterexample should therefore be about $$ (\frac{1}{2} )^{n / \log_2 10}. $$
So, if we’ve already checked up to (say) n=1000, then the probability of a larger counterexample should be at most
$$ \sum_{n=1001}^{\infty} (\frac{1}{2} )^{n / \log_2 10} $$
which, when we sum the geometric series, is exceedingly close to 0.
Ah, but why did I say that I don’t know if the conjecture will ever be proven? Because it seems to belong a large class of similar statements, none of which mathematicians have had any idea how to prove.
Variant of a conjecture by Jeffrey Shallit: 65,536 is the only power of 2 that has no power of 2 among its decimal digits.
Freeman Dyson’s conjecture (2005): There’s no power of 2 for which, when you reverse the decimal digits, you get a power of 5.
Paul Erdös’s conjecture (1979): For every n≥9, there’s at least one ‘2’ in the base-3 representation of 2n.
Or looking even more broadly:
Conjecture: The decimal expansion of π is not all 6’s and 7’s after some finite point.
(This would follow from the stronger conjecture that π is base-10 normal—that is, that every finite pattern of decimal digits occurs in it with the limiting frequency you’d expect.)
Or:
Conjecture: π+e is an irrational number.
What all the above conjectures have in common, and what I find so fascinating about them, is that they seem hopeless to prove for exactly the same reason why they seem almost certainly true. Namely, they all seem to be true “merely” because it would be too insane of a coincidence were they false!
The trouble is, that’s not the sort of reason that seems amenable to turning into a proof. Fermat’s Last Theorem is an interesting exception that proves the rule here. That xn+yn=zn has no nontrivial integer solution for n≥3, did seem almost certainly true on statistical grounds for n≥4 (and for the n=3 case, a proof goes back to Euler). And of course, FLT was ultimately provable. But Wiles’s eventual proof exploited a lucky connection between the Fermat equation and deep, fancy things like modular forms and elliptic curves. At no point did the proof formalize the statistical argument that a 12-year-old could understand, for why the theorem is “almost certainly true.” It simply had nothing to do with the statistical argument.
So then, if you wanted to prove conjectures like my son Daniel’s, or like Shallit’s or Dyson’s or Erdös’s above, the question would be: could these “recreational” problems about base-10 representations ever be connected to anything similarly deep? Right now, it’s very hard to see how they could.
Still, all hope is not lost! Here’s a striking theorem that I learned about when I researched this:
Theorem (James Maynard, 2016): For every digit a from 0 to 9, there are infinitely many primes with no a’s in their base-10 representation.
The proof uses heavy Fourier-analytic techniques. Likewise, presumably there are infinitely many primes that you can double without carrying—2, 3, 11, 13, 23, 31, …—because the primes are much denser than the powers of 2! And presumably there are infinitely many primes that are missing any two decimal digits, or even whose decimal digits consist entirely of 0’s and 1’s. But Maynard’s techniques are not yet powerful enough to prove such things.
Even though I promised a topic of no importance today, I can’t resist pointing out a potential connection here to one of the biggest questions on earth right now, and something that’s professionally interested me for the past four years: namely, the question of how to align powerful AIs with human values and prevent them from destroying the world.
Paul Christiano, and the Alignment Research Center in Berkeley that he founded, have developed a whole program for how to make AI safe that depends on the possibility of formalizing “heuristic arguments”—that is, the kinds of arguments that convince us that the above conjectures are all almost certainly true, even without proofs of them.
The intuition here is that we’ll never have a rigorous proof that, for example, a real-world neural network will behave safely under all circumstances—it’s just too complicated. The best we can hope for is an argument that, e.g., “for this model to scheme against humans would require a crazy unexplained coincidence in its weights.” But how can we hope to formalize such arguments? As baby test cases, can we at least formalize our intuitions for why π is normal, or why Daniel’s conjecture is true, in some principled way?
ARC has tried: there’s a 2022 paper by Christiano, Neyman, and Xu on “Formalizing the presumption of independence.” But it’s tricky, and ARC itself would be the first to agree that the existing results are unsatisfying. How do we even formalize the intuition that, for example, you should be willing to bet at even odds against the 10100th digit of π being a 5?
In the rest of this talk, I’d like to circle back to Daniel’s original question about powers of 2, and show you some things that can be proved about it—with thanks to Greg Kuperberg and my other friends on Facebook, and in some cases to GPT5Pro.
Let’s start with the following easier question. Is there a power of 2 whose decimal digits start with 31415? Or with the complete works of Shakespeare, encoded by letter values in some suitable way? Or with a googolplex digits all of which can be doubled without carrying (as Daniel wanted)?
I claim that the answers are yes, yes, and yes! How do we prove this?
The key fact we’ll use is simply that log102 is an irrational number. (If you don’t remember the proof: suppose log102 = a/b. Then 10a/b = 2, so 10a=2b. But this has no integer solutions other than a=b=0.)
Suppose we want k as a prefix, where 10d-1 ≤ k ≤ 10d. Then we want integers n,r such that
$$ k 10^r \le 2^n \le (k+1) 10^r, $$
i.e. taking the base-10 log,
$$ \log_{10}k + r \le n \log_{10}2 \le \log_{10}(k+1) + r. $$
In other words, the fractional part of n log102 needs to lie between the fractional part of log10(k), and the fractional part of log10(k+1) (where again, k is given).
But now we can appeal to the following Key Fact: if α is any irrational number, then the set
{the fractional part of nα : n∈N}
is dense in the interval [0,1]. Or equivalently, if I rotate around and around the unit circle by 2απ radians each time, then if α is irrational, I’ll eventually get arbitrarily close to any given point on the unit circle.
Why is the key fact true? Just the pigeonhole principle! Clearly, for any ε>0, the fact that α is irrational means that there must be two points, xα and yα, whose fractional parts are distinct yet closer together than ε. But then, by adding multiples of (x-y)α, we can get our fractional part to be ε-close to anything in [0,1].
And to sum up, this is why there must be a power of 2 whose decimal representation starts with the complete works of Shakespeare, or with a googolplex carry-free digits! (Indeed, from the above discussion, we could even extract an efficient algorithm for constructing that power of 2.)
So much for the first decimal digits of 2n. Now let’s look at 2n‘s last decimal digits!
Here there are some complications, arising from the twin facts that
(a) 10 is composite, and
(b) 2 is one of its factors.
But we can deal with those complications!
For starters: what are the possible last decimal digits of 2n?
1, 2, 4, 8, 6, 2, 4, 8, 6, …
So, there’s an initial 1, but then we cycle forever through the even nonzero digits.
What about the last two digits of 2n? If you’ve never tried this before, it’s instructive to work it out:
01, 02, 04, 08, 16, 32, 64, 28, 56, 12, 24, 48, 96, 92, 84, 68, 36, 72, 44, 88, 76, 52, 04, …
So, there’s an initial 01 and 02, but after that, we cycle forever through 20=4×5 possibilities, namely all the possible multiples of 4 whose last digits are nonzero.
You can check that the general pattern is: the last k decimal digits of 2n have an initial segment that looks like 001, 002, 004, 008, …, 2k-1. And then there’s an eternal cycle of length 4×5k-1, where the last digit can be any of 2,4,6,8, while every other digit can either be any possible even digit or any possible odd digit, depending on the digits to its right—in (if you want to say this a fancier way) a recursively defined embedding of the powers of 2 mod 5k into the cyclic group Z/10k. So, there’s an initial “runup” as you fill out all the needed powers of 2, but then once that’s done, you just cycle around forever in an embedding of Z/5k into Z/10k, because
(a) 2 happens to be a primitive element of Z/5k for any k, and
(b) 5 divides 10.
So in particular, and relevant to Daniel’s conjecture: there exists a power of 2 whose last googolplex digits can all be doubled without carrying. Why? For the last digit, you can pick 2 or 4. Then, for the last googolplex digits but one, you can pick 1 or 3 for those constrained to be odd, and 0, 2, or 4 for those constrained to be even. Lots of choices that work!
So, we can avoid carries in the leftmost digits of 2n, we can avoid carries in the rightmost digits … so that “merely” leaves all the digits in the middle, where who the hell knows! Empirically, the digits seem to pass every standard randomness test that you can throw at them. So in particular, e.g., the fraction of the digits that are in {0,1,2,3,4} seems to converge inexorably towards 50%, so that it’s extremely plausible to conjecture that the fraction is less than 49% or more than 51% for only finitely many values of n. But of course, that’s presumably even harder to prove than Daniel’s original conjecture.
OK, last topic. Suppose we want to program a computer to check Daniel’s conjecture up to 2n, for some huge n. What algorithm will do that most efficiently? A naive algorithm would just calculate 1, 2, 4, …, 2n and check all the digits of all of them. That takes O(n2) time, since each 2k, for k=0,1,…,n, has ~klog102 = O(k) decimal digits.
We can improve this to roughly O(n) time, by simply noticing that we only need to check O(1) digits per power of 2 in expectation, presumably, until we find the first digit that requires carrying. Then we don’t even need to compute the remaining digits: we can simply move on to the next power of 2. (This sort of trick is used all over the place in the design of fast algorithms.)
But when I posted about Daniel’s problem on Facebook, my friend Greg Kuperberg (who’s a mathematician at UC Davis) noticed that further improvements are possible. To wit: 8×6 = 48, which again ends in 8. So, 8×16 ends in 8, as does 8×16k for every k≥0. Meaning: no 2n where n=3+4k can possibly be a counterexample to Daniel’s conjecture. They’re all ruled out!
Likewise, 64×1,048,576 ends in 64, so no 2n of the form n=6+20k can be a counterexample. They’re all ruled out as well.
We can keep going this way, filling out the “search tree” of potential counterexamples to Daniel’s conjecture via breadth-first search. At the root of the search tree, we try all possibilities for the last digit. One level deeper, we try all possibilities for the second-to-last digit, and so on. As we go, we prune subtrees according to constraints like the ones above that we keep discovering and adding.
When I worked this out, I got an algorithm for checking Daniel’s conjecture up to 2n, which under reasonable assumptions takes time only O(nα), where α = 1-log52 ≈ 0.569, and space only polylog(n).
Paul Crowley (who’s my Facebook friend) then actually implemented this algorithm, and he tells me that he used it to check that Daniel’s conjecture holds all the way up to $$2^{10^{21}}$$ (!!), using 40 minutes on a 128-core machine.
So, to return at last to the first thing I told Daniel: yes, I think his conjecture is almost certainly true, even though I have no idea when, if ever, the human race or its successors will have a proof.
We are all having to keep revising upwards our assessments of the mathematical capabilities of large language models. I have just made a fairly large revision as a result of ChatGPT 5.5 Pro, to which I am fortunate to have been given access, producing a piece of PhD-level research in an hour or so, with no serious mathematical input from me.
The background is that, as has been widely reported, LLMs are now capable of solving research-level problems, and have managed to solve several of the Erdős problems listed on Thomas Bloom’s wonderful website. Initially it was possible to laugh this off: many of the “solutions” consisted in the LLM noticing that the problem had an answer sitting there in the literature already, or could be very easily deduced from known results. But little by little the laughter has become quieter. The message I am getting from what other mathematicians more involved in this enterprise have been saying is that LLMs have got to the point where if a problem has an easy argument that for one reason or another human mathematicians have missed (that reason sometimes, but not always, being that the problem has not received all that much attention), then there is a good chance that the LLMs will spot it. Conversely, for problems where one’s initial reaction is to be impressed that an LLM has come up with a clever argument, it often turns out on closer inspection that there are precedents for those arguments, so it is still just about possible to comfort oneself that LLMs are merely putting together existing knowledge rather than having truly original ideas. How much of a comfort that is I will not discuss here, other than to note that quite a lot of perfectly good human mathematics consists in putting together existing knowledge and proof techniques.
I decided to try something a little bit different. At least in combinatorics, there are quite a lot of papers that investigate some relatively new combinatorial parameter that leads naturally to several questions. Because of the sheer number of questions one can ask, the authors of such papers will not necessarily have the time to spend a week or two thinking about each one, so there is a decent probability that at least some of them will not be all that hard. This makes such papers very valuable as sources of problems for mathematicians who are doing research for the first time and who will be hugely encouraged by solving a problem that was officially open. Or rather, it used to make them valuable in that way, but it looks as though the bar has just been raised. It is no longer enough that somebody asks a problem: it needs to be hard enough for an LLM not to be able to solve it.
In any case, a little over a week ago I decided to see how ChatGPT 5.5 Pro would fare with a selection of problems asked by Mel Nathanson in a paper entitled Diversity, Equity and Inclusion for Problems in Additive Number Theory. Nathanson has a remarkable record of being interested in problems and theorems that have later become extremely fashionable, which has led him to write a series of extremely well timed and therefore highly influential textbooks. In this paper, he argues for the interest of several other problems, some of which I will now briefly describe.
If is a set of integers, then its sumset
is defined to be
. For a positive integer
, the
–fold sumset, denoted
, is defined to be
. Nathanson is interested in the possible sizes of
given the size of
. To that end one can define a set
to be the set of all
such that there exists a set
with
and
.
An obvious first question to ask is simply “What is ?” When
, the answer is the set of all integers between
and
. It is an easy exercise to show that if
, then
, so this result is saying that all sizes in between can be realized. However, it is not true in general that
can take every size between its minimum and maximum possibilities, and we do not currently have a complete description of
.
Another natural question one can ask, and this is where ChatGPT came in, is how large a diameter you need if you want a set with
and
having prescribed sizes. (Of course, the size of
must belong to
.) Nathanson showed that for every
there is a subset
of
with
and
, and asked whether the bound
could be improved. ChatGPT 5.5 Pro thought for 17 minutes and 5 seconds before providing a construction that yielded a quadratic upper bound, which is clearly best possible. It wrote up its argument in a slightly rambling LLM-ish style, so I asked if it could write the argument up as a LaTeX file in the style of a typical mathematical preprint. After two minutes and 23 seconds it gave me that, after which I spent some time convincing myself that the argument was correct.
The basic idea behind both Nathanson’s argument and ChatGPT’s was that in order to obtain a set of a given size with a sumset of a given size, it is useful to build it out of a Sidon set, which means a set with sumset of maximal size (that is not quite the usual definition but it is the simplest to use in this discussion), and an arithmetic progression. Also, for a bit of fine tuning one can take an additional point near the arithmetic progression. Then if one plays around with the various parameters, one finds that one can obtain sets of all the sizes one wants. Nathanson doesn’t express his argument this way (it is Theorem 5 of this paper), instead giving an inductive argument, but I think, without having checked too carefully, that if one unravels his argument, one finds that effectively that is what he ends up with, and the Sidon set in question consists of powers of 2. ChatGPT obtained its improvement by simply using a more efficient Sidon set — it is well known that one can find Sidon sets of quadratic diameter. (One might ask why Nathanson didn’t do that in the first place: I think it is because the obvious idea of using a more efficient Sidon set becomes obvious only after one has redescribed his inductive construction. Is that what ChatGPT did? It is very hard to say.)
Next, I asked ChatGPT to see whether it could do the same for a closely related question, where instead of looking at the size of the sumset, one looks at the size of the restricted sumset, which is defined to be . Unsurprisingly, it was able to do that with no trouble at all. I got it to write both results up in a single note, to avoid a certain amount of duplication. If you are curious, you can see the note here.
I then asked what it could do for general . I was much less optimistic that it would manage to do anything interesting, because the proof for
makes fundamental use of the fact (due to Erdős and Szemerédi) that we know exactly which sizes we need to create. If we don’t know what the set
is, then it seems that we are forced to start with a hypothetical set
with
and
and build out of it a set of small diameter with the same property. As it happens, I still don’t know how to get round that difficulty (I’m mentioning that just to demonstrate that my mathematical input was zero, and I didn’t even do anything clever with the prompts), but Nathanson mentioned in his paper a remarkable paper of Isaac Rajagopal, a student at MIT, who must have got round the difficulty somehow, because he had managed to prove an exponential dependence of
on
for each fixed
.
I’ll leave the previous paragraph there, but Isaac has subsequently explained to me that that isn’t really the difficulty. His argument gives a complete description of when
is sufficiently large, and if one wants to prove a polynomial dependence for fixed
, then assuming that
is sufficiently large is clearly permitted. The real difficulty is that constructing the sets with given sumset sizes was significantly more complicated, and necessarily so because the degree of the polynomial grows with
, and one therefore needs more and more parameters to define the sets.
In any case, the task faced by ChatGPT was not to solve the problem from scratch, but to see whether it was possible to tighten up Isaac Rajagopal’s argument. Here’s what happened.
Isaac made some very interesting remarks about the nature of what the additional ideas were that ChatGPT contributed. Since, as I have already said, my mathematical input was zero, I invited him to write a guest section to this post. Just before we get to that, I want to raise a question (that will undoubtedly have been raised by others as well), which is simple: what should we do with this kind of content? Had the result been produced by a human mathematician, it would definitely have been publishable, so I think it would be wrong to describe it as AI slop. On the other hand, it seems pointless even to think about putting it in a journal, since it can be made freely available, and nobody needs “credit” for it (except that Isaac deserves plenty of credit for creating the framework on which ChatGPT could build). I understand that arXiv has a policy against accepting AI-written content, which makes good sense to me. So maybe there should be a different repository where AI-produced results can live. But various decisions would need to be made about how it was organized. I myself think that one would probably want to have some kind of moderation process, so that results would be included only if a human mathematician was prepared to certify that they were correct — or, better still, that they had been formalized by a proof assistant — and perhaps also that they answered a question that had been asked in a human-written paper. On the other hand, I wouldn’t want a moderation process that created vast amounts of work (unless the work was itself done by AI, but there are obvious dangers in going down that route). Anyway, until these questions are answered, this result is available from the link above, and perhaps, now that LLMs are so good at literature search, that will be enough to make it findable by anyone who wants to know whether Nathanson’s problem has been solved.
With just a few prompts, ChatGPT was able to improve the upper bound on (which I will define very soon) from exponential in
to polynomial in
. While its first improvement of the bound, from exponential in
to exponential in
, was a routine modification of my work, the improvement to polynomial in
is quite impressive. To do this, ChatGPT came up with an idea which is original and clever. It is the sort of idea I would be very proud to come up with after a week or two of pondering, and it took ChatGPT less than an hour to find and prove, using similar methods to those in my own proof. My goal is to explain that idea, in a manner that will be digestible to my friends who are computer science majors as well as my math major friends.
The problem of bounding is closely related to a problem I worked on at the Duluth REU (Research Experience for Undergrads) program, of determining
. In particular,
is the set of possible
-fold sumset sizes
, where
can be chosen to be any set of
integers.
is the minimal
such that we can achieve all of the values of
using
-element sets
. I spent last summer explicitly characterizing the set
for large
, by constructing sets
such that
achieves all sizes which I could not rule out as impossible. So,
can be upper-bounded by optimizing my constructions.
I constructed these sets by combining smaller component sets which are simpler to analyze. Some of these components are the geometric series
for various values of and
. Unfortunately, the elements of
and
are exponentially large in terms of
. So, I asked ChatGPT (through Tim) whether there exist sets of
elements which have similar sumset sizes to these geometric series, but contain only numbers of polynomial size in
: I had no idea if this was possible, or how to begin constructing such sets. ChatGPT came back with an answer, constructing sets
and
which behave like “half a geometric series squeezed into a polynomial interval,” which is counterintuitive. Before I discuss the construction of
and
, I will explain the important properties of the sumset sizes of
and
which they recreate.
For , a set
is called a
set if the only solutions to
with in
are the “trivial” solutions, by which I mean that one side of the equation is a reordering of the other side. If
is a
set of size
, then elements of
correspond exactly to choices of
elements of
, with repetition allowed. Using “stars and bars,” one can see that
and this is the maximum possible value of
among sets of size
. So, another definition is that
is a
set if
. Sidon sets, which Tim discussed, are exactly
sets.
To make things more concrete, let us assume that in (1). Then,
is a
set, but it is not a
set because of the relations
for any choice of in
. In particular,
, as these
relations are the only ones preventing
from being a
set.
lacks the relations in (2) because
is not in
. So,
is a
set, but it is not a
set because of the relations
for any choices of in
. This gives
relations, and one can check that
. To summarize, we have seen that
(a) is a
set.
(b) is a linear function of
.
(c) is a
set.
(d) is a quadratic function of
.
ChatGPT was able to find sets and
of
elements which satisfy (a)-(d), but whose elements all have polynomial size in
. The construction of
and
uses
-dissociated sets, which are sets
where the only solutions to
with and
in
are the “trivial” solutions, i.e.
and one side of the equation is a reordering of the other side. For
, it is possible to construct an
-dissociated set
, where
is approximately
, and in particular polynomial in
. Constructions of such a
using finite fields date back to Singer (1938) and Bose–Chowla (1963) and are described in Appendix 1. Define
and
In hindsight, I have good intuition for the construction of and
. All of the relations in (2) and (3) are formed by combining one or two relations of the form
. There are approximately
relations of the form
in
and
, and approximately
such relations in
and
. There are few other low-order relations in
and
, and similarly in
and
because
is
-dissociated. So,
and
manage to contain half as many
-relations as their geometric series counterparts, while also containing few low-order relations.
We now see why (a)-(d) hold with and
replaced by
and
, respectively. For concreteness, we assume that
and
, so
contains no nontrivial relations as in (4) with
. Then,
is a
set, but it is not a
set because of the relations
for any choice of in
. If we let
, we can check that
is linear in
. In particular, (a) and (b) hold with
replaced by
, and the linear function
replaced by
. We can also see that
is a
set, but it is not a
set because of the relations
for any in
. If we let
, we can check that
is quadratic in
. In a similar manner, (c) and (d) hold with
replaced by
, and the quadratic function
replaced by
.
Even though I can motivate it in retrospect, ChatGPT’s idea to use -dissociated sets to control relations of order at most
feels quite ingenious. As far as I can tell, this idea is completely original.
ChatGPT’s proof that its construction produces the desired values of is very similar to my proof that the sets
which I construct achieve all possible values of
, after replacing
and
by
and
, respectively. Properties (a)-(d) capture many of the important properties of
and
(or
and
) which are used in this proof. The final constructions involve combining the sets
and
(or
and
in my paper) for each value of
between
and
with another set which is the union of an arithmetic progression and a point. Intuitively,
and
(or
and
) have large sumsets, while arithmetic progressions have small sumsets, so it is plausible that one could get sets which achieve all the medium-sized sumsets by combining them. However, the proof of this is quite involved, and it occupies Section 4 of my paper and the entirety of the ChatGPT preprint. In Appendix 2, I work out the details of the ChatGPT construction to show that for
sufficiently large,
For comparison, it is easy to see that is at least on the order of
, and it is unknown what the real value is. In Appendix 3, I give details of the correspondence between my paper and the ChatGPT preprint, which will be helpful for those who want to read either.
Finally, I want to express my deep gratitude to Tim for allowing me to contribute to this blog. I am still stunned by the coincidence that the problem he chose to put into ChatGPT 5.5 Pro led him to my paper on the arXiv.
I would judge the level of the result that ChatGPT found in under two hours to be that of a perfectly reasonable chapter in a combinatorics PhD. It wouldn’t be considered an amazing result, since it leant very heavily on Isaac’s ideas, but it was definitely a non-trivial extension of those ideas, and for a PhD student to find that extension it would be necessary to invest quite a bit of time digesting Isaac’s paper, looking for places where it might not be optimal, familiarizing oneself with various algebraic techniques that he used, and so on.
It seems to me that training beginning PhD students to do research, which has always been hard (unless one is lucky enough, as I have often been, to have a student who just seems to get it and therefore doesn’t need in any sense to be trained), has just got harder, since one obvious way to help somebody get started is to give them a problem that looks as though it might be a relatively gentle one. If LLMs are at the point where they can solve “gentle problems”, then that is no longer an option. The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.
I would qualify that statement in two ways though. First, there is the obvious point that a beginning PhD student has the option of using LLMs. So the task is potentially easier than proving something that LLMs can’t prove: it is proving something in collaboration with LLMs that LLMs cannot manage on their own. I have done quite a lot of such collaboration recently and found that LLMs have made useful contributions without (yet) having game-changing ideas.
A second point is that I don’t know how much of what I have said generalizes to other areas of mathematics. Combinatorics tends to be quite focused on problems: you start with a question and you reason back from the question or if you reason forwards you do so very much with the question in mind. In other areas there can be much more of an emphasis on forwards reasoning: you start with a circle of ideas and see where it leads. To do it successfully, you need to have some way of discriminating between interesting observations and uninteresting ones, and it isn’t obvious to me what LLMs would be like at that.
Of course, everything I am saying concerns LLMs as they are right now. But they are developing so fast that it seems almost certain that my comments will go out of date in a matter of months. It is also almost certain that these developments will have a profoundly disruptive effect on how we go about mathematical research, and especially on how we introduce newcomers to it. Somebody starting a PhD next academic year will be finishing it in 2029 at the earliest, and my guess is that by then what it means to undertake research in mathematics will have changed out of all recognition.
I sometimes get emails from people who are interested in doing mathematical research but are not sure whether that makes sense any more as an aspiration. I have a view on that question, but it may very well change in response to further developments. That view is that there is still a great deal of value in struggling with a mathematics problem, but that the era where you could enjoy the thrill of having your name forever associated with a particular theorem or definition may well be close to its end. So if your aim in doing mathematics is to achieve some kind of immortality, so to speak, then you should understand that that won’t necessarily be possible for much longer — not just for you, but for anybody. Here’s a thought experiment: suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.
So what is the point of struggling with a difficult mathematics problem? One answer is that it can be very satisfying to solve a problem even if the answer is already known, but I don’t think that is a sufficient reason to spend several years of your life on this peculiar activity. A better answer is that by solving hard problems you get an insight into the problem-solving process itself, at least in your area of expertise, in a way that you simply don’t if all you do is read other people’s solutions. One consequence of this is that people who have themselves solved difficult problems are likely to be significantly better at using solving problems with the help of AI, just as very good coders are better at vibe coding than not such good coders, or people who have a solid grasp of how to do basic arithmetic are likely to be more skilled at using calculators (and especially at noticing when an answer feels off). Mathematics is a highly transferable skill, and that applies to research-level mathematics as well. By doing research in mathematics, you may not get the same rewards as your equivalents a generation ago, but there is a good chance that you will be equipping yourself very well for the world we are about to experience.
We will construct an -dissociated set
, where
is approximately
. This construction is a very minor modification of Bose–Chowla (1963)’s construction of a
set, which I learned about from this paper. For whatever reason, the GPT preprint (Lemma 3.1) uses a different, less efficient construction using moment curves.
Let be a prime, let
, let
be the finite field with
elements and fix a generator
of
, so that
is equal to
. Define a set of
elements
Then, each element corresponds to a unique value of
, by taking
. Now an additive relation of the form in (4) with
can be reframed by taking powers of
as
As is a degree-
extension of
and
is a generator of
as an
-extension, this means that
does not satisfy any nonzero polynomials in
of degree
. So, both sides of (6) are identical as polynomials in
and thus the additive relation in (4) is trivial. So,
is
-dissociated, and of course one can prune a few elements to reduce
to size
.
Fix constants such that
(in my paper I arbitrarily chose
). Let the two sets in (5) be called
and
. Let
denote the set of integers
satisfying
. Similarly to my paper, the constructions of
such that
achieves the desired sizes will combine sets of the following four types:
One reason that this construction needs to be complicated is that we need to create at least many sets. To do this, we vary
parameters
and
in the domain
and
parameters
and
in the domain
. We can choose
to be slightly bigger than
, and then the above construction gives us
different sets where
can be made arbitrarily small. So, if we were to remove any of the above parameters from the construction, and not change the others, this construction would no longer create
many sets. In comparison, Nathanson’s construction when
only needs to create
sets. He does this by combining a Sidon set, an arithmetic progression, and one extra value, and varying the size of the arithmetic progression and the extra value in ranges of size
.
We want to combine sets
, which are given by
,
for the
values of
,
for the
values of
, and a
set. By Appendix 1, for all
, there exists a
-dissociated set
of diameter
. By the constructions of
and
, we can take each
, where
. Let
have basis vectors
. To combine
, we can define
as
Similarly to my Lemma 4.9, this construction ensures that the generating function product holds, which is the identity that both my paper and the GPT preprint use (see either paper for a definition of these generating functions). By (the standard) Lemma 2.3 of the GPT preprint,
is Freiman-isomorphic of order
to a subset of
. Therefore, for
sufficiently large (the whole construction relies on this for the same reasons as in my paper),
In Section 4.2 of my paper, I use a different, simpler construction to construct sets achieving the values in
which have
, for some small
. These sets
are subsets of
, meaning that all elements have polynomial size in
. This is observed in Section 5 of the GPT preprint.
Section 4.3 of my paper carries out the construction which combines many components including and
. This corresponds to Sections 2, 3, 4, and 6 of the GPT preprint. This section has a lot of moving parts; I give an outline in Section 4.3.1.
In Section 4.3.2, I describe how the different components will be combined, using a construction which I call the disjoint union, and introduce generating functions as a bookkeeping tool to keep track of the sumset sizes of a set
. This corresponds to Section 2 and Section 4 of the GPT preprint.
In Section 4.3.3, I compute the generating function of each of the component sets, including (Lemma 4.15) and
(Lemma 4.17). This corresponds to Section 3 and Section 6.1 of the GPT preprint. In particular,
is computed in Lemma 3.3 and
is computed in Lemma 3.4. Once these generating functions have been computed, the remainder of the proof is almost identical in my paper and in the GPT preprint.
In Section 4.3.4, I put all the pieces together to show that as we range over the sets which I have constructed, the values of
will assume all of the elements of
. The key idea is to show that the set of all values of
forms an interval, and contains numbers both smaller than
and equal to
.
I had a piece up in New Scientist last week (paywalled, sorry!), about a new analysis that suggests the universe is less homogeneous (more “lumpy”) that most cosmologists believe.
The piece was a bit different than my usual. Normally I do what people in the biz call “features”: longer articles about general trends. This was a much more classic “news piece”. The people I interviewed had several papers up in early April, the editors at New Scientist thought they were interesting enough to write about, so I was asked for a short, timely piece with the key takeaways.
That means I didn’t have a ton of space for background info. So if you’d like to know more, this post is for you!
The 100-year old assumption in the title refers to the Friedmann–Lemaître–Robertson–Walker (or FLRW) universe, an idea that first came together in the 1920’s, where cosmologists model the universe as homogeneous and isotropic: the same no matter where, or in which direction, you look. That sounds like a crazy assumption, but on the largest scales we can measure it’s actually mostly fine. Once you’re trying to calculate ripples in the cosmic microwave background or find out how fast distant galaxies are accelerating away, it works surprisingly well to act like the universe is an evenly-mixed soup of matter, radiation, dark matter, and dark energy.
But every assumption in physics has its doubters. The doubters of homogeneity are known as inhomogeneous cosmologists, and I’ve been sympathetic to their complaints for a while now.
I even let an inhomogeneous cosmologist do a guest post on my blog, back in 2019. That post argued something dramatic: that dark energy may not even exist, but that measurements of accelerating expansion may be a consequence of a dramatic lopsidedness in the universe around us.
The people I covered in New Scientist, Asta Heinesen, Tim Clifton, and Sofie Marie Koksbang, are arguing something much less dramatic…but that’s part of what makes it more compelling. Instead of arguing that the universe is dramatically uneven or lopsided, they’re arguing that the universe can still be on average smooth and homogeneous, the soup of galaxies people seem to expect…but still, can’t be fully modeled that way.
This is a tricky distinction to explain, and certainly something I didn’t have space to cover well enough in New Scientist. But let me take a stab at it here:
Any cosmologist will agree that FLRW can’t be the whole story. We know the universe isn’t a perfectly mixed soup: there are galaxies, and stars, and black holes, and they all wiggle the fabric of the universe in different places. When they study the universe as a whole, they’re averaging out all of that, to get the overall behavior, a bit like you could average the number of children in each family to get the average children per family in a country.
But FLRW isn’t just an average, it’s a model of spacetime. Because of that, it has to obey certain equations, called Einstein’s equations. It has to make sense by itself, as the correct answer for how spacetime would behave if it were filled with a uniform soup.
That’s an extra restriction, and that extra restriction can get you in trouble. To continue with the analogy, any real family has a whole number of children. But the average family doesn’t have a whole number of children. When I was born, the average family in the US had around 2.5 children. A lot of cartoons imagined what the half-child looked like.
From the perspective of Heinesen, Clifton, and Koksbang, assuming FLRW is a bit like assuming that the average family must have two children, or three, and can’t possibly have 2.5. Averages don’t have to look like sensible spacetimes, they don’t have to obey the Einstein equations.
In practice, the assumption of FLRW has worked a lot better than assuming that the average family can’t have 2.5 children, and that’s why Heinesen, Clifton, and Koksbang are cautious. They’re not claiming that inhomogeneity can explain everything, all the way to major components of the universe like dark energy. But they do think it can be a good explanation for smaller effects. And as cosmologists worry about smaller and smaller effects, wondering if dark energy changes over time and why the expansion rate of the universe doesn’t match up between different measurements, it can be important to remember that averages aren’t all-powerful. Eventually, they can break down. It’s a more subtle issue than a fractional child. But, as I covered in New Scientist, it may already be happening.
I’ve just uploaded to the arXiv my paper “Products of consecutive integers with unusual anatomy“. This paper answers some questions of Erdős and Graham which were initially motivated by the study of the Diophantine factorial equation
The equation (1) ties into the general question of what the anatomy (prime factorization) of the product looks like. This is a venerable topic, with the first major result being the Sylvester-Schur theorem from 1892 that the largest prime factor of
is greater than
whenever
. Another notable result is the Erdős-Selfridge theorem that the product
is never a perfect power for
.
Erdős and Graham were able to show that solutions to (1) were somewhat rare, in that the set of possible values of had density zero. For them, the hardest case to treat was when the interval
was what they called bad, in the sense that
was divisible by the square of its largest prime factor. They were able, with some effort, to show that the union of all bad intervals also had density zero, which was a key ingredient in to prove the previous result about solutions to (1). They isolated a subcase of the bad intervals, which they called the very bad intervals, in which the product
was a powerful number (divisible by the square of every prime factor).
A later paper of Luca, Saradha, and Shorey made the bounds more quantitative, showing that both the set of values of , as well as the union of bad intervals, had density
for some absolute constant
. In the other direction, just by considering the case
, one can show that the number of possible values of
up to
is
, where
is the constant
It was conjectured by Erdős and Graham that all of these lower bounds are in fact sharp (up to multiplicative factors); this is Erdos Problem 380 (and a portion of Erdos Problem 374). The main result of this paper is to confirm this conjecture in two cases and come close in the third:
Theorem 1
- The number of numbers up to
that lie in a bad interval of length
is
of the number of bad points up to
.
- The number of numbers up to
that lie in a very bad interval of length
is
.
- The number of numbers up to
of the form
for a solution to (1) is
.
Not surprisingly, the methods of proof involve many standard tools in analytic number theory, such as the prime number theorem (and its variants in short intervals), zero density estimates, Vinogradov’s bounds on exponential sums, asymptotics for smooth numbers, the large sieve, the fundamental lemma of sieve theory, and the Burgess bound for character sums. There was one point where I needed a small amount of algebraic number theory (the classification of solutions to a generalized Pell equation), which was the one place where I turned to AI for assistance (though I ended up rewriting the AI argument myself). One amusing point is that I specifically needed the recent zero density theorem of Guth and Maynard (as converted to a bound on exceptions to the prime number theorem in short intervals by Gafni and myself); previous zero density theorems were barely not strong enough to close the arguments.
A few more details on the methods of proof. It turns out that very bad intervals, or intervals solving (1), are both rather short, in that the bound holds. The reason for this is that the primes
that are larger than
(in the very bad case) or
for a large constant
(in the (1) case) cannot actually divide any of the
unless they divide it at least twice. This creates a constraint on the fractional parts of
and
that turns out to be inconsistent with the equidistribution results on those fractional parts coming from Vinogradov’s bounds on exponential sums unless
is small. In the very bad case, this forces a linear relation between two powerful numbers; expressing powerful numbers as the product of a square and a cube, matters then boil down to counting solutions to an equation such as
The situation with bad intervals is more delicate, because there is no obvious way to make small in all cases. However, by the large sieve (as well as the Guth–Maynard theorem), one can show that the contribution of large
is negligible, and from bounds on smooth numbers one can show that the interval
contains a number with a particularly specific anatomy, of the form
where
are all primes of roughly the same size, and
is a smoother factor involving smaller primes. The rest of the bad interval creates some congruence conditions on the product
. Using some character sum estimates coming from the Burgess bounds, we find that the residue of
becomes fairly equidistributed amongst the primitive congruence classes to a given modulus when one perturbs the primes
randomly (there are some complications from exceptional characters of Siegel zero type, but we can use a large values estimate to keep their total contribution under control). This allows us to show that the congruence conditions coming from the bad interval are restrictive enough to make non-trivial bad intervals quite rare compared to bad points. One innovation in this regard is to set up an “anti-sieve”: the elements of a bad interval tend to have an elevated chance of being divisible by small primes, and one can use moment methods to show that an excessive number of small prime divisors is somewhat rare. This can be compared to standard sieve arguments, which often seek to limit the event that a number has an unexpectedly deficient number of small prime divisors.

In the last episode of my column in Notices of the American Mathematical Society, we looked at a particle moving in an attractive central force whose strength is proportional to the inverse cube of the distance from the origin. Among other things, we saw that a particle moving in such a force can spiral in to the origin in a finite time. But that was classical mechanics. What about quantum mechanics?
Here things get more tricky. The uncertainty principle tends to prevent the particle from falling in to the origin. But when the attractive force is strong enough, the particle can still fall in. We can make up a theory where the particle shoots back out, but there are choices involved: we need to say how the particle changes phase when shoots back out. So there is not just a single theory, but many!
Why does the particle come back out? There are theories where it does not. In these theories, at least those studied so far, time evolution is nonunitary: that is, the probability of finding the particle somewhere or other does not stay equal to , because the particle simply disappears when it hits the origin. Here we focus on theories where time evolution is unitary and the particle comes back out. Many people have written about these, running into ‘paradoxes’ when they weren’t careful enough. Only rather recently have things been straightened out.
Let us dig into the details. In quantum mechanics, the Hilbert space of states of a particle in is . In a central force whose strength is proportional to , such a particle has a Hamiltonian of this form:
The first term describes the particle’s kinetic energy, while the second describes its potential energy: remember, taking the gradient of an inverse square potential gives an inverse cube force. I have set some constants to to remove irrelevant clutter, but we need the constant to say how strong the force is. When , the force is attractive.
In this game, analysis is paramount. We should interpret as a densely defined linear operator on . For this, we choose a dense linear subspace and treat as a linear map from to . Different choices of correspond to different physical assumptions: for example, assumptions about what happens when the particle falls into the origin.
To get unitary time evolution in quantum mechanics, we need the Hamiltonian to be self-adjoint. But adjoints of densely defined operators are tricky. Let us briefly recall how they work. Given a Hilbert space and a linear operator from a dense linear subspace to , we define to be the set of all for which there exist such that
If such a vector exists, it is unique, and it depends linearly on . Thus, for we define to be the vector with the above property. The adjoint of is then the linear operator . We say is self-adjoint if . We say that is essentially self-adjoint if it has a unique extension to a self-adjoint operator. If it does, this extension must be .
All this raises the question of whether the Hamiltonian for the inverse cube force law can be made self-adjoint with a suitable choice of domain. It turns out we can always do it, but sometimes in more than one way. There are three regimes:
. In this case we can start with the domain consisting of smooth functions that are compactly supported on minus the origin. The operator is unambiguously defined on this domain, and it is essentially self-adjoint.
. In this case is still well-defined on the domain , but it is not essentially self-adjoint. In fact, it admits more than one self-adjoint extension! However, is bounded below: there is a constant such that for all . Physically, this means that the particle’s energy is bounded below by . Mathematically, this implies that has a canonical choice of self-adjoint extension called the ‘Friedrichs extension’, with the smallest possible domain. But there is another canonical choice, the ‘Krein extension’, with the largest possible domain.
. In this case is well-defined on the domain , and it has more than one self-adjoint extension, but it is not bounded below.
These strange results demand explanation. For example, what is special about ? In classical mechanics, the energy of a particle in the inverse cube force ceases to be bounded below as soon as . Quantum mechanics is different. To get a lot of negative potential energy, the particle’s wavefunction must be peaked near the origin, but that gives it kinetic energy. The tradeoff is captured by Hardy’s inequality. This says that for any we have
This is why is bounded below when .
On the other hand, the constant in Hardy’s inequality cannot be improved, so if we can find with . Then we can use a remarkable property of the potential to show that is not bounded below. Namely, has a kind of symmetry under dilations. You can guess this by noting that both the Laplacian and have units of 1/length. Indeed, if you take any smooth function , dilate it by a factor of , and then apply , you get times what you get if you do these operations in the other order. This implies that if
we can dilate and get a function obeying the same equation with replaced by . Thus, as soon as can be negative, it can be made arbitrarily large and negative by choosing to be very small. Thus is not bounded below.
Next, what is special about ? This is more subtle. For any value of we can find spherically symmetric solutions of on that are nonzero and smooth. When , and only in this case, some of these solutions lie in . This dooms the chance of being essentially self-adjoint, because it implies . If were essentially self-adjoint would be self-adjoint, and it is easy to see that a self-adjoint operator cannot have as an eigenvalue.
When the operator has more than one self-adjoint extension from to some larger domain. To classify these we can use separation of variables, writing as a sum of a radial part and an angular part, assuming the angular dependence of is given by a spherical harmonic , and doing a change of variables to reduce to the ordinary differential operator
on the half-line . We can completely classify self-adjoint extensions of this differential operators from to larger domains; the answer depends on and . A choice of self-adjoint extension is a choice of boundary conditions at , and this says how the phase of an incoming wave changes as it reflects off the origin and bounces back. Finally, we can assemble the results for different spherical harmonics to classify self-adjoint extensions of .
There exist many self-adjoint extensions of that respect the rotational symmetry of the inverse cube force law, but for the extension must break the dilation symmetry discussed above. This is what physicists call an ‘anomaly’: a symmetry of a classical system that fails to be a symmetry of the corresponding quantum system. Intriguingly, for some even lower values of one can choose a self-adjoint extension that is symmetrical under a discrete subgroup of dilations. Determining precisely which values these are seems to be an open problem.
To explore this topic thoroughly, I recommend first this:
then this:
and finally this:
The first is an excellent overview of problems associated to singular potentials, including the inverse cube force. The second delves into self-adjoint extensions of the ordinary differential operators mentioned above, and the third works them out with exquisite thoroughness.
Because of last week’s “bonus info” post, I’m only now getting around to commenting on this year’s Breakthrough Prizes in Fundamental Physics. While I don’t comment on them every year, I know enough about several of this year’s winners that I figured a post would be helpful.
For those who haven’t heard of it, the Breakthrough Prizes are a bit like the Nobel, if it was created by a 21st century rich person instead of a 19th century one. They give out more money, and instead of an organization like the Swedish Academy of Sciences they pick winners via a committee of past winners. They’re more flexible in structure than the Nobel, with extra prizes for early-career researchers and a tendency to reward accomplishments that are either entirely theoretical or solid experimental work that doesn’t show a new discovery, both of which are things the Nobel Prize is structured to avoid. They’ve also shown willingness to reward large collaborations, rather than following the Nobel’s informal rule to only give the award to three people at a time.
This last was on display this year in their main award in physics this year, for the muon g-2 collaborations. The award is going to collaborations of scientists and engineers at three different particle colliders, for work done over a span of over fifty years to measure the magnetic properties of the muon. These measurements have shown a tantalizing discrepancy with predictions that inspired many to conjecture new physics. However, in the last few years it’s looked more and more like the discrepancy was due to an imprecise prediction, and better methods seem to be converging to the experimental value. At this point, smart money is that there is no disagreement with the Standard Model here, but as always in science there’s a chance some mystery remains.
The Breakthrough Prize also offered a special, out-of-schedule prize to David Gross. Already a Nobel laureate, Gross had a crucial role in our understanding of the force of quantum chromodynamics that binds protons and neutrons together. He was also a major founding figure in string theory, and since the Breakthrough Prize is more comfortable recognizing theoretical contributions they get to mention this as well. Gross is also known in the community for his personality, which tends to fill up any room he’s in. I can only imagine the conversations that led to Breakthrough’s decision to add a special prize for him this year.
Breakthrough is also adding a new recurring prize, the Vera Rubin New Frontiers Prize, honoring women who make important contributions to physics within two years of their PhD. The prize is a bit smaller than the exiting early-career New Horizons in Physics Prizes, presumably because it goes to even younger researchers. This year’s winner is from my old field, scattering amplitudes. Carolina Figueiredo is part of the latest evolution of the research program behind the amplituhedron. The new framework of “surfaceology” seems like a promising geometry-flavored way to understand particle physics calculations in more realistic theories, and unlike its predecessors may have some practical value eventually as well. Congrats Carolina!
Finally, the New Horizons in Physics Prizes are for impressive early-career researchers. I don’t know much about the first recipient, Benjamin Safdi, who works on searches for axions and axion-like particles, today’s most trendy dark matter candidate. I know a bit more about the work done by Clay Córdova, Thomas Dumitrescu, Shu-Heng Shao, and Yifan Wang, having met several of them in my physics career. They work on what are called generalized symmetries, concepts which go beyond the usual idea of how symmetry is supposed to work by involving more complicated tensors. I saw these crop up a fair bit in talks, but they were distant enough from my area that I never had a particularly clear grasp of what people were doing with them. I know even less about the work of the last three, Dillon Brout, J. Colin Hill, Mathew Madhavacheril, Maria Vincenzi, Daniel Scolnic, and W. L. Kimmy Wu, on cosmological measurements, but I was friends with Mathew in grad school and am impressed that he’s now working on cosmology given how little cosmology research there was at Stony Brook at the time.
Will you heed my warnings NOW?Holy crap … yesterday I was elected to the US National Academy of Sciences! If you don’t believe me, click the link and keep scrolling down until you hit the name “Aaronson.” But then continue scrolling to see 144 other inductees, including my IAS postdoctoral classmate Maria Chudnovsky, my longtime friend and colleague Salil Vadhan, and even Janet Yellen. I’m humbled to be in such company.
Years ago, somewhere on this blog, I mused that, if I were ever invited to join NAS, I hoped I’d follow the wisdom of Richard Feynman, who famously resigned his NAS membership, comparing it to an honor society back at his high school that spent most of its time debating who should be a member of the honor society. Feynman was also annoyed at having to pay dues.
But now that I’m actually faced with the choice, it’s like, dude! At my advanced age of 44, I’ve encountered so many people who dislike me or even sneer at me, and so many clubs that won’t have me as a member, that I feel mostly gratitude and warmth toward a fine club like NAS that will have me as a member. Anyway, I’ll certainly try it out to see what it’s like—even Feynman did that!
A few hours after I started getting congratulatory emails, for which I was thankful, someone from UT Austin’s press office asked me how I feel about this “culmination” and “capstone” of my entire research career. I replied, look, I know I’ve slowed down a lot since my nubile twenties, but I still hold out the hope that this isn’t any kind of “capstone”!
In any case, I’m ridiculously grateful to all the friends, family, colleagues, and readers who believed in me and helped me reach wherever this is.
Now for a totally different topic, but that will ultimately loop back to the first one:
Last week, I did an Ask Me Anything about quantum computing and blockchain for stacker.news, a forum devoted to bitcoin. Thanks to Will Scoresby for organizing it.
As a longer-term commitment, I also collaborated with my colleagues Dan Boneh, Justin Drake, Sreeram Kannan, Yehuda Lindell, and Dahlia Malkhi, in a panel convened by Coinbase, to put out a detailed position paper about the quantum threat to cryptocurrencies and how best to respond to it. Take a look!
Notably, the situation evolved even while we were writing our position paper—for example, with the major recent papers from Google and Caltech/Oratomic that I blogged about a month ago.
I’d now like to add a few words of my own, not presuming to speak for my fellow Coinbase panelists.
See, some of the most reputable people in quantum hardware and quantum error-correction—people whose judgment I trust more than my own on those topics—are now telling me that a fault-tolerant quantum computer able to break deployed cryptosystems ought to be possible by around 2029.
Maybe they’re overoptimistic. Maybe it will take longer. I dunno. I’m not a timing guy.
But here’s what I do know: the companies racing to scale up fault-tolerant QC, have no plans to slow down in order to “give cybersecurity time to adapt” or whatever. The way they see it, cryptographically relevant QCs will plausibly be built sometime soon: indeed, it’s ultimately unavoidable, even if people’s only interest in QC was to do quantum simulations for materials science and chemistry. So, given that reality, isn’t it better that it be done first by mostly US-based companies in the open, than by (let’s say) Chinese or Russian intelligence in secret? And besides, haven’t there already been years of warnings and meetings about the quantum threat to RSA, Diffie-Hellman, and elliptic curve cryptography? Aren’t many in cybersecurity still in denial about the threat? Haven’t these slumberers shown that they won’t wake up until dramatic achievements in fault-tolerant QC roust them—the way Anthropic’s Mythos model has now jolted even the most ostrich-like about the cybersecurity risks of AI? So, mixing metaphors, mightn’t we just as well rip this Band-Aid off ASAP, rather than giving foreign intelligence agencies extra years to catch up? Indeed, when you think about it that way, isn’t racing to build a cryptographically relevant QC, as quickly as possible, the most ethical, socially responsible thing for an American QC company to do?
Is the above line of reasoning suspiciously self-serving and convenient? Does it remind you of the galaxy-brained arguments that AI company after AI company has offered over the last decade for why “really, if you think about it, accelerating toward dangerous superintelligence is the safest course of action that we could possibly take”? I.e., the arguments that led to the current frenzied AI race, which some believe imperils all life on earth?
It’s not my place here to answer such questions; I leave further ethical and geopolitical debate to the comment section! My point is simply: whether or not anyone likes it, this is how some of the leading QC companies are now thinking about the Shor of Damocles that they genuinely believe now hangs over the Internet.
And I’d say that that makes my own moral duty right now ironically simple and clear: namely, to use my unique soapbox, as the writer of The Internet’s Most Trusted Quantum Computing Blog Since 2005TM, to sound the alarm.
So, here it is: if quantum computers start breaking cryptography a few years from now, don’t you dare come to this blog and tell me that I failed to warn you. This post is your warning. Please start switching to quantum-resistant encryption, and urge your company or organization or blockchain or standards body to do the same.
Yea, heed my warning, for it comes not from some WordPress-using rando, but from the inventor of BosonSampling and PostBQP and shadow tomography, the Schlumberger Centennial Chair and Founding Director of the Quantum Information Center at the University of Texas at Austin, and (wait for it) new member of the US National Academy of Sciences, that august and distinguished body brought into being by President Abraham Lincoln in 1863.
Because, you know, none of this is about me. It’s only about you. And whether you’ll listen to me.
[A more technical post follows]
My most recent paper, out on the arXiv today, is very exciting to me because it seems to be a genuinely new way of computing some important quantities and it is devilishly simple. So simple that I worried for months that it is all super-obvious to everyone. But another voice within me said to myself: Well if it is so obvious, why has nobody published it? Another (paranoid) voice within said: Maybe someone has published this method, and I just can't find it in the literature...
Well, I decided that the best way to find out for sure is to put it on the arXiv and within a short time someone will email to say that I missed their important work. So, while I wait for that email (as I start writing it's only been 30 minutes since it has been "out there", so there's time), let me say a few things about why I like the many results in the paper.
I was already pleased enough with the core part of the paper that I was going to write a swift four-pager about it back in February. The core point being that I figured out how to build on work I'd done in a paper back in 2024 (expanded on with followup work I did with Wasif Ahmed and Krishan Saraswat, a student and postoc). Back in 2024, I found (here) a really nice way (almost miraculous in how it worked) of writing all the corrections to the spectral density of a class of models in terms of one function [latex]u_0[/latex] and its derivatives. It was obtainable from one simple ordinary differential equation (ODE) called the Gel'fand-Dikii equation, which takes in the function [latex]u_0(x)[/latex] as input. The ODE is for a special quantity called the diagonal resolvent [latex]{\widehat R}(x,E)[/latex]. You integrate that quantity [latex]\widehat R(x,E)[/latex] with respect to [latex]x[/latex] and you're more or less home. In general, it is a messy quantity that does not integrate to anything nice. But just when the function [latex]u(x)[/latex] obeys the "string equation" it is supposed to (as dictated by the governing model's physics), then [latex]{\widehat R}(x,E)[/latex] is a total derivative (a seeming miracle-see later), and the corrections it gives to the density become of just the right form!
Those corrections can be called [latex]W_{g,1}(E)[/latex] where the [latex]g[/latex] is the order in perturbation theory. [latex]g=0[/latex] is leading order, [latex]g=1[/latex] is the torus, [latex]g=2[/latex] the double torus, etc. Indeed [latex]g[/latex] is the number of handles or "genus" of an associated Riemann surface. The one subscript on the other hand, corresponds to the one energy entry available when just discussing the density [latex]\rho(E)[/latex]. All the [latex]W_{g,1}[/latex] end up being written nicely in terms of a function [latex]u_0(x)[/latex] and its derivatives, evaluated at a special point.
An already nice feature (among many) of the construction was that this one ODE, recursively solved, gave rise to the [latex]W_{g,1}[/latex] of many different problems across a range, including certain random matrix models, gravity problems, intersection theory and topology, and so on. All you need to do is change the function [latex]u_0(x)[/latex]. Moreover, for this (wide) class of problems, you can compute the desired results faster and with way less machninery than other methods, such as topological recursion, which was an interesting observation. This includes very famous problems like the Weil-Petersson volumes (of the compactified moduli space [latex]\overline{\cal M}_{g,1}[/latex] of Riemann surfaces with genus [latex]g[/latex] and [latex]n=1[/latex] boundaries) and generalisations. Another nice feature is that you also get non-perturbative data beyond the genus expansion, an aspect I explored recently (in this paper) with student Joao Rodrigues, and expert in resurgence techniques.
The core breakthrough of the new paper is this: For some time, I've wondered how to compute correlators for more energies (amounting to multi-point correlators of [latex]\rho[/latex]) in this same way: [...] Click to continue reading this post
The post Computing Correlators appeared first on Asymptotia.
When I was pursuing a PhD at Caltech, so was my friend Jeremy. He used to throw a dinner party every few months. The email invitations welcomed friends to partake of his cooking and, if we wished, to help him cook. I didn’t help cook; but, when I arrived, the mess of pots and pans drew me to the kitchen like vinegar drawing a pathological fly. I couldn’t sit still while cookware needed cleaning, so I scrubbed and rinsed the pans and spoons and bowls. Jeremy, an applied-physics student, commented on my adeptness at decreasing entropy.
It’s the story of my life, I replied.
In fourth grade, my classmates and I cleaned our desks every Friday afternoon. Once a student finished, my teacher dismissed him or her onto the playground. My neighbor’s desk horrified me like the disaster in a hurricane’s wake, so I neatened his desk after finishing with mine.1 Another friend requested the same favor. A third classmate offered to pay me for cleaning his desk, but I’d have undertaken the chore for its own sake. Ordering the world offered me fulfillment.
From cleaning a fourth-grade desk, I progressed to pursuing a PhD in theoretical physics. The two pursuits might seem to resemble each other no more than Dr. Jekyll and Mr. Hyde; yet, to me, the path between them is but a step. I trained as a theoretical physicist because I love organizing ideas. Caltech paid me to build models, propose definitions and theorems, and structure proofs—to dream up ideas and identify the optimal arrangements for them. I needed that pay, being an adult, as I hadn’t needed my fourth-grade classmate’s desk-cleaning fee. Yet I organized ideas for the same reason that drove me to organize my neighbor’s notebooks.
Many people have called entropy a measure of disorder. To see why, imagine that Jeremy’s crew has used thirty utensils while cooking. The chefs can have scattered the utensils across the kitchen in many ways: they may have dropped forks on the floor, left spoons in the sink, arranged spatulas on the drying rack, or filled a vase with knives like a modern-art bouquet. In few of these configurations do the forks lie in their compartment of the utensil drawer, the spoons lie in their compartment, etc. We call such configurations neat. Most of the other configurations, we call messy.
A system’s entropy is the number of configurations consistent with known large-scale properties of the system, such as the number of forks.2 More configurations are consistent with messiness (and a fixed number of forks and so on) than with neatness (and the same number of forks and so on). Messiness tends to correlate with high entropy. People often say, therefore, that entropy quantifies messiness. Hence Jeremy’s complimenting me on my decreasing of entropy.
Jeremy’s dinner parties came to mind as I read the book The Mattering Instinct, published by Rebecca Newberger Goldstein this January. Rebecca is a philosopher of science and a writer. I had the good fortune to meet her through my undergraduate mentor Marcelo Gleiser, who’s had another cameo or two on Quantum Frontiers. Rebecca’s latest book covers what she calls the mattering instinct: the longing to know that we matter.
We spend scads of energy and time on securing our “survival and flourishing,” as Rebecca says. We feed ourselves; work to earn money to purchase food; clean, shelter, and clothe ourselves; ingrain ourselves in societies that offer some degree of security; and more. Do we deserve all this effort? We long for assurance that, in the immortal words of L’Oreal, we’re worth it.
Survival and flourishing, Rebecca writes, requires us to decrease entropy. Every closed, isolated system’s entropy increases or remains constant, according to the second law of thermodynamics. Entropy increases as a system becomes more uniform, loosely speaking. The system’s particles spread out across space, these particles’ temperature comes to equal those particles’ temperature, and so on. In contrast, your body exists because its particles clump together in a certain shape consistently. You withstand heat waves and snow because homeostasis maintains your temperature despite your environment’s temperature. You keep your body’s entropy low to survive. Rebecca therefore casts us as fighting entropy.
As a thermodynamicist, I agree with Rebecca. Yet I also adore entropy. It helps explain why time flows, quantifies uncertainty, and determines the maximal efficiencies with which we can perform tasks such as communication. What versatility and richness! Entropy also embodies tension and subtlety: its mathematical definition looks obscure at first glance, yet entropy helps explain familiar phenomena such as aging. For these reasons, before beginning my PhD, I told a potential advisor that I could imagine devoting the next five years of my life to entropy.
I therefore aspire to rehabilitate entropy’s reputation. Novelist Terry Pratchett endeared mortality to millions of readers through anthropomorphism. His character Death, a mainstay of the Discworld series of novels, elicits empathy and fondness. I won’t anthropomorphize entropy here,3 but I aim to replace conflict with cooperation in the narrative above. To survive and flourish, I hold, we partner with entropy. How? We create oodles of entropy in our environments. This entropy increase offsets the entropy decrease that supports life.
For example, imagine working at a desalination plant. You’d process high-entropy water throughout which salt has spread. You’d concentrate the salt in a tiny region, reducing the water’s entropy. This reduction, producing fresh water, could support your city’s drinking, cooking, and toothbrushing needs.
To reduce the water’s entropy, you’d create loads more entropy. You’d eat breakfast before work, consuming energy stored neatly in your waffle’s chemical bonds. Your body would later break the bonds, releasing the energy. Some energy would power your muscles, so you could program the desalination system, test its output, etc. But much of the chemical energy would transform into heat radiated by your body. The heat would warm up the air molecules around you, magnifying their random jigglings and jostlings. You’d increase the entropy of the air—your environment—to decrease the water’s entropy. The air’s entropy increase would outweigh the water’s entropy decrease.
Organisms survive and flourish by producing entropy in their environments. In fact, organisms have a knack for generating entropy. Entropy and life thereby further each other. A glass-half-full thinker could conclude that we partner with entropy.
So did I partner with entropy as a PhD student, applying it to solve problems in quantum information theory and thermodynamics. So did I partner with entropy in fourth grade and at Jeremy’s apartment, deriving satisfaction from my cleaning. Rebecca would call these activities’ ultimate aim (beyond the aim of, e.g., not sitting beside a pigsty in fourth grade) mattering. She writes that we reduce entropy (within our immediate vicinities) to satisfy the mattering instinct. Rebecca’s proposition describes my behaviors with uncanny precision, I realized upon reading her book.
Which I’ve now finished. So pardon me while I return to washing forks in the quantum kitchen of the universe.
With thanks to Jeremy for his friendship…and food.
1I also ensured that my neighbor brought home, every afternoon, the sweater he’d brought to school that morning. Before I took charge, he’d ended up with three forgotten sweaters crammed into his cubby.
2At least, one entropy is. Many other entropies exist.
3If you anthropomorphize entropy elsewhere, let me know.
Tanya Klowden and I have uploaded to the arXiv our preprint “Mathematical methods and human thought in the age of AI“. This is an unabridged version of a solicited article for a forthcoming Blackwell Companion to the Philosophy of Mathematics. I rarely write article-length essays of a philosophical nature (perhaps the last one was in 2007), but given the topical interest in AI and formalization for mathematics, which has begun to raise increasingly fundamental questions about what the nature, purpose and practice of mathematics actually is (or ought to be), it seemed like it was a timely opportunity to write about these matters. Other mathematicians seem to have recently come to this conclusion also; see for instance this paper of Avigad, or this paper of Commelin, Jamnik, Ochigame, Taelman, and Venkatesh, both of which have come out in the last few weeks.
Our piece took over a year to write – which means, at the current pace of development in the field, that some of it is already slightly out of date. Nevertheless, it was an instructive exercise for both of us to try to look beyond the immediate technical issues presented by current AI and formalization tools and try to point out the philosophical questions that we will have to grapple with as these tools become increasingly capable and integrated into our profession, using prior examples of technological advancement as a guide. We don’t pretend to have definitive answers to most of these questions, but as with mathematics itself, the first step is to pose the questions and then try to make partial progress on them (or at least identify some negative results and eliminate some failed approaches). One point we particularly felt worth stressing is that AI tools and applications (in mathematics or elsewhere) should not be viewed purely through the technical lens of what microscale problems they solve and how effective or efficient they are at solving them, but also through the macroscopic humanitarian lens of how our society, our shared body of knowledge and understanding, and our species benefits (or is harmed) as a whole from these technologies.
Our initial submission ended up significantly exceeding the page limits of the submission and ventured beyond the philosophy of mathematics into broader philosophical and ethical questions about AI in general. A streamlined version of the paper will appear in the forthcoming companion, but we have decided to make the original longer version available on the arXiv.
I’m giving a talk online tomorrow at the 2026 Spring Southeastern Sectional Meeting of the American Mathematical Society, in the Special Session on Non-Associative Rings and Algebras. The organizers are Layla Sorkatti and Kenneth Price. I doubt the talk will be recorded, but you can see my slides.
Abstract. Dubois-Violette and Todorov noticed that the gauge group of theStandard Model of particle physics is the intersection of two maximal subgroups of . which is the automorphism group of the exceptional Jordan algebra . Here we conjecture that these can be taken to be any subgroups preserving copies of and that intersect in a copy of . Given this, we show that the Standard Model gauge group consists of all isometries of the octonionic projective plane that preserve an octonionic projective line and a complex projective plane intersecting in a complex projective line. This is joint work with Paul Schwahn.
This is an introductory talk for mathematicians. Physicists may prefer the two talks here. Those go much further in some ways, but they don’t cover the new ideas that Paul Schwahn and I are in the midst of working on.
An Artin-Tits group is a group with a finite set of generators in which every relation is of the form
or
for some positive integer
, where
and
are two of the generators. In particular, the commutation relation
is allowed (it is the case
of the first type of relation) and so is the braid relation
(it is the case
of the second type of relation). This means that Artin-Tits groups include free groups, free Abelian groups, and braid groups: for example, the braid group on
strands has a presentation with generators
, where
represents a twist of the
th and
st strands, and relations
if
and
.
A few weeks ago, I asked ChatGPT for a simple example of a word problem for groups or semigroups that was not known to be decidable and also not known to be undecidable. It turns out that the word problem for many Artin-Tits groups comes into that category: the simplest example where the status is not known is the group with four generators and
where
and
commute and all other pairs of generators satisfy the braid relation
.
My interest in this was initially that I was looking for a toy model from which one could learn something about how mathematicians judge theorems to be interesting. I started with a remarkable semigroup discovered by G. S. Tseytin that has five generators, seven very simple relations, and an undecidable word problem. From that I created a puzzle game that you might be interested to play. (NB the puzzles were not showing up in the public version, but that should be sorted out now.) Each puzzle is equivalent to an instance of the word problem for Tseytin’s semigroup, but the interface makes it much more convenient to change words using the relations than it would be to do it with pen and paper.
I was hoping that because every mathematical problem can in principle be encoded as a puzzle in this game, one might be able to build a sort of “alien mathematics”, where a theorem was an equality between words, and a definition was a decision to introduce a new (and redundant) generator g, together with a relation of the form g = w, where w is a word in the current alphabet. Theorems would be particularly interesting if they were equalities between words that could be established only by a chain of equalities that went through much longer words, and definitions would be useful if the new generators satisfied particularly concise relations (which would allow one to build “theories” within the system). I still hope to find a word problem that will allow such a project to take off, but in the end, after a lot of playing around with the game linked to above, I have decided that Tseytin’s semigroup is not suitable. The reason is that it is designed so that arbitrary instances of the word problem for groups can be encoded as word problems in this semigroup, and once one gets used to the game, one starts to see how that encoding can work. Furthermore, one seems to be driven towards the encoding — I don’t get the impression that there’s a whole other region of this semigroup to explore that has nothing to do with the kinds of words that come up in the encoding. And if that impression is correct, then one might as well start with the word problem for some group in unencoded form, or alternatively look for another semigroup. Nevertheless, I find the Tseytin game quite enjoyable: I won’t say more about it here but have written a fairly comprehensive tutorial that you can open up and read if you follow the link above.
This is perhaps the moment to say that the words “I created a puzzle game” are slightly misleading. For one thing, I discussed the idea of gamifying Tseytin’s semigroup about three years ago with Mirek Olšák, a former member of my automatic theorem proving group, and he created a basic prototype in Python. But the main point is that I do not have the programming skills to create a game that can be played in a web browser — I vibecoded it using ChatGPT.
After that experience, I thought that maybe I would have better luck if instead of looking for a group or semigroup with undecidable word problem, which might well have been explicitly designed with some encoding in mind, I looked for a word problem for which the decidability status was unknown. That way, it wouldn’t have been designed to be undecidable, but might nevertheless just happen to be undecidable and provide a nice playground of the kind I was (and still am) after. And that is what led me to the Artin-Tits groups.
However, those don’t seem to be suitable either, because it is conjectured that they all have decidable word problems. I have created a game for the Artin-Tits group mentioned above, which you can also play if you want, but I have found it very hard to create interesting puzzles. (NB again there was a problem with the puzzles showing up, with only a tiny handful being there, but that should now be sorted out.) That is, I found it difficult to find words that are equivalent to the identity but that are not easily shown to be equivalent to the identity. One nice example comes from the fact that the subgroup generated by and
is isomorphic to the braid group
with four strands. The late Patrick Dehornoy found a very nice example of a braid with four strands that is equal to the identity but not in a completely trivial way. A picture of it can be found on page 4 of this paper.
This is where a potential Polymath project comes in. An initial goal would be to determine the decidability status of this one small Artin-Tits group: if we managed that, then we could consider the more general problem. And the way I envisage approaching this initial goal is an iterative process that runs as follows.
I have done a couple of iterations already, with the result that I now have an algorithm — let’s call it — that solves (basically instantaneously) every puzzle I throw at it, including the one derived from Dehornoy’s braid mentioned above. So a subgoal of the initial goal is to find a puzzle that has a solution but that
doesn’t solve. If we can’t, then maybe it is worth trying to prove that
solves the word problem for this group. One comment about the algorithm
is that it never increases the length of a word, though it often does moves that preserve the length. I was told by ChatGPT that it is an unsolved problem whether every braid that is equal to the identity can be reduced to the identity without ever increasing the number of crossings. Of course that isn’t a guarantee that it really is unsolved, but if it is, then that’s another interesting problem. It also means that I would be extremely interested if someone found an example of a word in the Artin-Tits group that can be reduced to the identity but only if one starts by lengthening it.
I’m hoping that this will be an enjoyable project for people who like both mathematics and programming. On the maths side, there is an unsolved problem to think about, and on the programming side there are lots of possibilities: for example, one could write programs that explore the space of words equal to the identity, trying to do so in such a way that there is a reasonable chance of reaching a word where it isn’t obvious how to reverse all the steps and get back to the identity. The game comes with a “sandbox” that has a few tools for generating words, but at the moment it is fairly primitive, and I would welcome suggestions for how to improve it.
It seems to me that as Polymath projects go, one can’t really lose: either we find an algorithm for the word problem, which establishes an unknown (at least according to ChatGPT) case of a decidability problem, or we find a suite of harder and harder puzzles, creating a more and more challenging and entertaining game and obtaining a deeper understanding of the Artin-Tits group in the process.
This Artin-Tits game has only a rather rudimentary and not very good tutorial created by ChatGPT. It’s easy to play once you’ve got the hang of it, and in the end I think the easiest way to get the hang of it is to watch someone else play it for a couple of minutes. I have therefore created a video tutorial, which you can find here. The video itself lasts about 25 minutes, but the tutorial part is under 10 minutes: what makes the video longer is an explanation of various features for making it easier to create puzzles, and also an explanation of how the algorithm works, which again is much easier to explain if I can demonstrate it on screen than if I have to write down some text. (I do have some text, since that is what I used as a prompt for ChatGPT to implement the algorithm, but it took a few iterations to get it working properly, so now I’m not sure what the quality of the code will be like, or even whether it is doing exactly what I want, though it appears to be.) Although I have embedded the video into this post, if you actually want to see what is going on you will probably need to watch it on YouTube using the full screen. I recommend not watching the explanation of the algorithm until you have played the game a few times first. Also, I should warn you that the games in the Advanced category are not all soluble, but games 1, 5 and 6 definitely are. Another thing I forgot to say on the video is that if you want you can “rotate” a word by clicking on one of its letters and dragging it to the right or left. If the word is equal to the identity, then its rotations will be as well, so this is a valid move. There is also a button for disabling this rotation facility if you want the puzzle to be slightly more challenging.
A quick note about the availability of the two games. They are hosted on the Netlify platform. I am on their free plan, which gives me a certain number of “credits” each month. I’m not sure how quickly these will run out, largely because I have no idea how many people will actually think it interesting to play the games. If they do run out, then the games will cease to be available until the credits are renewed, which for me happens on the 10th of each month. If this has happened and you are keen to play one of the games, another option is to download the html files and open them in your browser. Here is a link to the Artin-Tits game and here is one to the Tseytin game. If you are feeling particularly public-spirited, especially if you think you will play quite a lot, then you might consider doing that anyway, so that the Netlify credits run down more slowly. If the running out of the credits is quick enough for all this to be a real issue, then I may move to a paid plan.
I’ll finish with a quick tip for playing the Artin-Tits game, which I mentioned on the video but perhaps didn’t stress enough. Many of the moves consist in selecting three consecutive letters and using a relation of the form to change them. Easy examples of this are replacement rules such as
. But what if inverses are involved? I’ll represent inverses of generators with upper-case letters, so for example
represents
, which in the game would be a white
followed by a white
followed by a black
. The word
turns out to equal
. To remember this, a simple rule is that two letters of the same colour can be bracketed together and “pushed past” the third letter, which retains its colour but changes its value. Here, for example, we write
as
and then swap them over, changing
into
in the process, getting
, or
without the brackets. In group theoretic terms, this is of course saying that
and
are conjugates, and that
is what conjugates one to the other. But when playing the game it is convenient to remember it by thinking that when you see a subword such as
, you can push the
(and in particular the
) to the left, getting
.
I’ve just uploaded to the arXiv my paper “Local Bernstein theory, and lower bounds for Lebesgue constants“. This paper was initially motivated by a problem of Erdős} on Lagrange interpolation, but in the course of solving that problem, I ended up modifying some very classical arguments of Bernstein and his contemporaries (Boas, Duffin, Schaeffer, Riesz, etc.) to obtain “local” versions of these classical “Bernstein-type inequalities” that may be of independent interest.
Bernstein proved many estimates concerning the derivatives of polynomials, trigonometric polynomials, and entire functions of exponential type, but perhaps his most famous inequality in this direction is:
Lemma 1 (Bernstein’s inequality for trigonometric polynomials) Letbe a trigonometric polynomial of degree at most
, with
for all
. Then
for all
.
Similar inequalities concerning norms of derivatives of Littlewood-Paley components of functions are now ubiquitious in the modern theory of nonlinear dispersive PDE (where they are also called Bernstein estimates), but this will not be the focus of this current post.
A trigonometric polynomial of degree
is of exponential type
in the sense that
for complex
. Bernstein in fact proved a more general result:
Lemma 2 (Bernstein’s inequality for functions of exponential type) Letbe an entire function of exponential type at most
, with
for all
. Then
for all
.
There are several proofs of this lemma – see for instance this survey of Queffélec and Zarouf. In the case that is real-valued on
, there is a nice proof by Duffin and Schaeffer, which we sketch as follows. Suppose we normalize
, and adjust
by a suitable damping factor so that
actually decays slower than
as
. Then, for any
and
, one can use Rouche’s theorem to show that the function
has the same number of zeroes as
in a suitable large rectangle; but on the other hand one can use the intermediate value theorem to show that
has at least as many zeroes than
in the same rectangle. Among other things, this prevents double zeroes from occuring, which turns out to give the desired claim
after some routine calculations (in fact one obtains the stronger bound
for all real
).
The first main result of the paper is to obtain localized versions of Lemma 2 (as well as some related estimates). Roughly speaking, these estimates assert that if is holomorphic on a wide thin rectangle passing through the real axis, is bounded by
on the intersection of the real axis with this rectangle, and is “locally of exponential type” in the sense that it is bounded by
on the upper and lower edges of this rectangle (and obeys some very mild growth conditions on the remaining sides of this rectangle), then
can be bounded by
plus small errors on the real line, with some additional estimates away from the real line also available. The proof proceeds by a modification of the Duffin–Schaeffer argument, together with the two-constant theorem of Nevanlinna (and some standard estimates of harmonic measures on rectangles) to deal with the effect of the localization. (As a side note, this latter argument was provided to me by ChatGPT, as I was not previously aware of the Nevanlinna two-constant theorem.)
Once one localizes this “Bernstein theory”, it becomes suitable for the analysis of (real-rooted, monic) polynomials of a high degree
, which are not bounded globally on
(and grow polynomially rather than exponentially at infinity), but which can exhibit “local exponential type” behavior on various intervals, particularly in regions where the logarithmic potential
This becomes relevant in the theory of Lagrange interpolation. Recall that if are real numbers and
is a polynomial of degree less than
then one has the interpolation formula
If one chooses the interpolation points poorly, then the Lebesgue constant can be extremely large. However, if one selects these points to be the roots of the aforementioned monic Chebyshev polynomials, then it is known that
for all fixed intervals
in
. In the case
, it was shown by Erdős} that this is the best possible value of the Lebesgue constant up to
errors for interpolation on
, thus
In terms of the monic polynomial , these two estimates can be written as
Problem 3 Letbe a trigonometric polynomial of degree
with
roots
in
.
It is easy to check that the lower bounds of and
are sharp by considering the case when
is a sinusoid
.
The bound (3) is immediate from Bernstein’s inequality (Lemma 1). By applying a local version of this inequality, I was able to get a weak version of the claim (1) in which was replaced with
; see this early version of the paper, which was developed through conversations with Nat Sothanaphan and Aron Bhalla. By combining this argument with ideas from the older work of Erdős}, I was able to establish (1).
The bound (2) took me longer to establish, and involved a non-trivial amount of playing around with AI tools, the story of which I would like to share here. I had discovered the toy problem (4), but initially was not able to establish this inequality; AlphaEvolve seemed to confirm it numerically (with sinusoids appearing to be the extremizer), but did not offer direct clues on how to prove this rigorously. At some point I realized that the left-hand side factorized into the expressions and
, and tried to bound these expressions separately. Perturbing around a sinusoid
, I was able to show that the
norm
was a local minimum as long as one only perturbed by lower order Fourier modes, keeping the frequency
coefficients unchanged. Guessing that this local minimum was actually a global minimum, this led me to conjecture the general lower bound
Welcome back to: Has quantum advantage been achieved?
In Part 1 of this mini-series on quantum advantage demonstrations, I told you about the idea of random circuit sampling (RCS) and the experimental implementations thereof. In this post, Part 2 out of 3, I will discuss the arguments and evidence for why I am convinced that the experiments demonstrate a quantum advantage.
Recall from Part 1 that to assess an experimental quantum advantage claim we need to check three criteria:
When assessing these criteria for the RCS experiments there is an important problem: The early quantum computers we ran them on were very far from being reliable and the computation was significantly corrupted by noise. How should we interpret this noisy data? Or more concisely:
I want to convince you today that we have developed a very good understanding of these questions that gives a solid underpinning to the advantage claim. Developing that understanding required a mix of methodologies from different areas of science, including theoretical computer science, algorithm design, and physics and has been an exciting journey over the past years.
Let us start by answering the base question. What computational task did the experiments actually solve?
Recall that, in the ideal RCS scenario, we are given a random circuit on qubits and the task is to sample from the output distribution of the state obtained from applying the circuit to a simple reference state. The output probability distribution of this state is determined by the Born rule when I measure every qubit in a fixed choice of basis.
Now what does a noisy quantum computer do when I execute all the gates on it and apply them to its state? Well, it prepares a noisy version of the intended state and once I measure the qubits, I obtain samples from the output distribution of that noisy state.
We should not make the task dependent on the specifics of that state or the noise that determined it, but we can define a computational task based on this observation by fixing how accurate that noisy state preparation is. The natural way to do this is to use the fidelity
which is just the overlap between the ideal state and the noisy state. The fidelity is 1 if the noisy state is equal to the ideal state, and 0 if it is perfectly orthogonal to it.
Finite-fidelity random circuit sampling
Given a typical random circuit , sample from the output distribution of any quantum state whose fidelity with the ideal output state is at least .
Note that finite-fidelity RCS does not demand success for every circuit, but only for typical circuits from the random circuit ensemble. This matches what the experiments do: they draw random circuits and need the device to perform well on the overwhelming majority of those draws. Accordingly, when the experiments quote a single number as “fidelity”, it is really the typical (or, more precisely, circuit-averaged) fidelity that I will just call .
The noisy experiments claim to have solved finite-fidelity RCS for values of around 0.1%. What is more, they consistently achieve this value even as the circuit sizes are increased in the later experiments. Both the actual value and the scaling will be important later.
What is the complexity of finite-fidelity RCS?
Let’s start off by supposing that the quantum computation is (nearly) perfectly executed, so the required fidelity is quite large, say, 90%. In this scenario, we have very good evidence based on computational complexity theory that there is a scalable and in-practice quantum advantage for RCS. This evidence is very strong, comparable to the evidence we have for the hardness of factoring and simulating quantum systems. The intuition behind it is that quantum output probabilities are extremely hard to compute because of a mechanism behind quantum advantages: destructive interference. If you are interested in the subtleties and the open questions, take a look at our survey.
The question is now, how far down in fidelity this classical hardness persists? Intuitively, the smaller we make , the easier finite-fidelity RCS should become for a classical algorithm (and a quantum computer, too), since the freedom we have in deviating from the ideal state in our simulation becomes larger and larger. This increases the possibility of finding a state that turns out to be easy to simulate within the fidelity constraint.
Somewhat surprisingly, though, finite-fidelity RCS seems to remain hard even for very small values of . I am not aware of any efficient classical algorithm that achieves the finite-fidelity task for significantly away from the baseline trivial value of . This is the value a maximally mixed or randomly picked state achieves because a random state has no correlation with the ideal state (or any other state), and is exactly what you expect in that case (while 0 would correspond to perfect anti-correlation).
One can save some classical runtime compared to solving near-ideal RCS by exploiting a reduced fidelity, but the costs remain exponential. To classically solve finite-fidelity RCS, the best known approaches are reported in the papers that performed classical simulations of finite-fidelity RCS with the parameters of the first Google and USTC experiment (classSim1, classSim2). To achieve this, however, they needed to approximately simulate the ideal circuits at an immense cost. To the best of my knowledge, all but those first two experiments are far out of reach for these algorithms.
So what is the right value of at which we can hope for a scalable and in-practice advantage of RCS experiments?
When thinking about this question, it is helpful to keep a model of the circuit in mind that a noisy experiment runs. So, let us consider a noisy circuit on qubits with layers of gates and single-qubit noise of strength on every qubit in every layer. In this scenario, the typical fidelity with the ideal state will decay as .
Any reasonably testable value of the fidelity needs to scale as , since eventually we need to estimate the average fidelity from the experimental samples and this typically requires at least samples, so exponentially small fidelities are experimentally invisible. The polynomial fidelity is also much closer to the near-ideal scenario (90%) than the trivial scenario (). While we cannot formally pin this down, the intuition behind the complexity-theoretic evidence for the hardness of near-ideal RCS persists into the regime: to sample up to such high precision, we still need a reasonably accurate estimate of the ideal probabilities, and getting this is computationally extremely difficult. Scalable quantum advantage in this regime is therefore a pretty safe bet.
How do the parameters of the experiment and the RCS instances need to scale with the number of qubits to experimentally achieve the fidelity regime? The limit to consider is one in which the noise rate decreases with the number of qubits, while the circuit depth is only allowed to increase very slowly. It depends on the circuit architecture, i.e., the choice of circuit connectivity, and the gate set, through a constant as I will explain in more detail below.
Weak-noise and low-depth scaling
(Weak noise) The local noise rate of the quantum device scales as .
(Low depth) The circuit depth scales as .
This limit is such that we have a scaling of the fidelity as for some constant . It is also a natural scaling limit for noisy devices whose error rates gradually improve through better engineering. You might be worried about the fact that the depth needs to be quite low but it turns out that there is a solid quantum advantage even for -depth circuits.
The precise definition of the weak-noise regime is motivated by the following observation. It turns out to be crucial for assessing the noisy data from the experiment.
Remember from Part 1 that the experiments measured a quantity called the cross-entropy benchmark (XEB)
The XEB averages the ideal probabilities corresponding to the sampled outcomes from experiments on random circuits . Thus, it correlates the experimental and ideal output distributions of those random circuits. You can think of it as a “classical version” of the fidelity: If the experimental distribution is correct, the XEB will essentially be 1. If it is uniformly random, the XEB is 0.
The experiments claimed that the XEB is a good proxy for the circuit-averaged fidelity given by , and so we need to understand when this is true. Fortunately, in the past few years, alongside with the improved experiments, we have developed a very good understanding of this question (WN, Spoof2, PT1, PT2).
It turns out that the quality of correspondence between XEB and average fidelity depends strongly on the noise in the experimental quantum state. In fact, there is a sharp phase transition: there is an architecture-dependent constant such that when the experimental local noise rate , then the XEB is a good and reliable proxy for the average fidelity for any system size and circuit depth . This is exactly the weak-noise regime. Above that threshold, in the strong noise regime, the XEB is an increasingly bad proxy for the fidelity (PT1, PT2).
Let me be more precise: In the weak-noise regime, when we consider the decay of the XEB as a function of circuit depth , the rate of decay is given by , i.e., the XEB decays as . Meanwhile, in the strong-noise regime the rate of decay is constant, giving an XEB decay as for a constant . At the same time, the fidelity decays as regardless of the noise regime. Hence, in the weak-noise regime, the XEB is a good proxy of the fidelity, while in the strong noise regime, there is an exponentially increasing gap between the XEB (which remains large) and the fidelity (which continues to decay exponentially). regardless of the noise regime.
This is what the following plot shows. We computed it from an exact mapping of the behavior of the XEB to the dynamics of a statistical-mechanics model that can be evaluated efficiently. Using this mapping, we can also compute the noise threshold for whichever random circuit family and architecture you are interested in.

We are now ready to take a look at the crux when assessing the noisy data: Can we trust the reported XEB values as an estimator of the fidelity? If so, do the experiments solve finite-fidelity RCS in the solidly hard regime where ?
In their more recent paper (PT1), the Google team explicitly verified that the experiment is well below the phase transition, and it turns out that the first experiment was just at the boundary. The USTC experiments had comparable noise rates, and the Quantinuum experiment much better ones. Since fidelity decays as , but the reported XEB values stayed consistently around 0.1% as was increased, the experimental error rate of the experiments improved even better than the scaling required for the weak-noise regime, namely, more like . Altogether, the experiments are therefore in the weak-noise regime both in terms of absolute numbers relative to and the required scaling.
Of course, to derive the transition, we made some assumptions about the noise such as that the noise is local, and that it does not depend much on the circuit itself. In the advantage experiments, these assumptions about the noise are characterized and tested. This is done through a variety of means at increasing levels of complexity, including detailed characterization of the noise in individual gates, gates run in parallel, and eventually in a larger circuit. The importance of understanding the noise shows in the fact that a significant portion of the supplementary materials of the advantage experiments is dedicated to getting this right. All of this contributes to the experimental justification for using the XEB as a proxy for the fidelity!
The data shows that the experiments solved finite-fidelity RCS for values of above the constant value of roughly 0.1% as the experiments grew. In the following plot, I compare the experimental fidelity values to the near-ideal scenario on the one hand, and the trivial value on the other hand. Viewed at this scale, the values of for which the experiment solved finite-fidelity RCS are indeed vastly closer to the near-ideal value than the trivial baseline, which should boost our confidence that reproducing samples at a similar fidelity is extremely challenging.

You might be tempted to say: “Well, but is all this really so important? Can’t I just use XEB and forget all about fidelity?”
The phase transition shows why that would change the complexity of the problem: in the strong-noise regime, XEB can stay high even when fidelity is exponentially small. And indeed, this discrepancy can be exploited by so-called spoofers for the XEB. These are efficient classical algorithms which can be used to succeed at a quantum advantage test even though they clearly do not achieve the intended advantage. These spoofers (Spoof1, Spoof2) can achieve high XEB scores comparable to those of the experiments and scaling like in the circuit depth for some constant .
Their basic idea is to introduce strong, judiciously chosen noise at specific circuit locations that has the effect of breaking up the simulation task up into smaller, much easier components, but at the same time still gives a high XEB score. In doing so, they exploit the strong-noise regime in which the XEB is a really bad proxy for the fidelity. This allows them to sample from states with exponentially low fidelity while achieving a high XEB value.
The discovery of the phase transition and the associated spoofers highlights the importance of modeling when assessing—and even formulating—the advantage claim based on noisy data.
You might also be worried that the experiments did not actually compute the XEB in the advantage regime because to estimate it they would have needed to compute ideal probabilities—a task that is hard by definition of the advantage regime. Instead, they used a bunch of different ways to extrapolate the true XEB from XEB proxies (proxy of a proxy of the fidelity). Is this is a valid way of getting an estimate of the true XEB?
It totally is! Different extrapolations—from easy-to-simulate to hard-to-simulate, from small system to large system etc—all gave consistent answers for the experimental XEB value of the supremacy circuits. Think of this as having several lines that cross in the same point. For that crossing to be a coincidence, something crazy, conspiratorial must happen exactly when you move to the supremacy circuits from different directions. That is why it is reasonable to trust the reported value of the XEB.
All of this is to say that establishing that the experiments correctly solved finite-fidelity RCS and therefore show quantum advantage involved a lot of experimental characterization of the noise as well as theoretical work to understand the effects of noise on the quantity we care about—the fidelity between the experimental and ideal states.
In this respect (and maybe also in the scale of the discovery), the quantum advantage experiments are similar to the recent experiments reporting discovery of the Higgs boson and gravitational waves. While I do not claim to understand any of the details, what I do understand is that in both experiments, there is an unfathomable amount of data that could not be interpreted without preselection and post-processing of the data, theories, extrapolations and approximations that model the experiment and measurement apparatus. All of those enter the respective smoking-gun plots that show the discoveries.
If you believe in the validity of experimental physics methodology, you should therefore also believe in the type of evidence underlying experimental claim of the quantum advantage demonstrations: that they sampled from the output distribution of a quantum state with the reported fidelities.
Put succinctly: If you believe in the Higgs boson and gravitational waves, you should probably also believe in the experimental demonstration of quantum advantage.
“The weak-noise limit is not physical. The appropriate scaling limit is one in which the local noise rate of the device is constant while the system size grows, and in that case, there is a classical simulation algorithm for RCS (SimIQP, SimRCS).”
In theoretical computer science, scaling of time or the system size in the input size is considered very natural: We say an algorithm is efficient if its runtime and space usage only depend polynomially on the input size.
But all scaling arguments are hypothetical concepts, and we only care about the scaling at relevant sizes. In the end, every scaling limit is going to hit the wall of physical reality—be it the amount of energy or human lifetime that limits the time of an algorithm, or the physical resources that are required to build larger and larger computers. To keep the scaling limit going as we increase the size of our computations, we need innovation that makes the components smaller and less noisy.
At the scales relevant to RCS, the scaling of the noise is benign and even natural. Why? Well, currently, the actual noise in quantum computers is not governed by the fundamental limit, but by engineering challenges. Realizing this limit therefore amounts to engineering improvements in the system size and noise rate that are achieved over time. Sure, at some point that scaling limit is also going to hit a fundamental barrier below which the noise cannot be improved. But we are surely far away from that limit, yet. What is more, already now logical qubits are starting to work and achieve beyond-breakeven fidelities. So even if the engineering improvements should flatten out from here onward, QEC will keep the noise limit going and even accelerate it in the intermediate future.
“All the hard complexity-theoretic evidence for quantum advantage is in the near-ideal regime, but now you are claiming advantage for the low-fidelity version of that task.”
This is probably the strongest counter-argument in my opinion, and I gave my best response above. Let me just add that this is a question about computational complexity. In the end, all of complexity theory is based on belief. The only real evidence we have for the hardness of any task is the absence of an efficient algorithm, or the reduction to a paradigmatic, well-studied task for which there is no efficient algorithm.
I am not sure how much I would bet that you cannot find an efficient algorithm for finite-fidelity RCS in the regime of the experiments, but it is certainly a pizza.
“There is no verification test that just depends on the classical samples, is efficient and does not make any assumptions about the device. In particular, you cannot unconditionally verify fidelity just from the classical samples. Why should I believe the data?”
Yes, sure, the current advantage demonstrations are not device-independent. But the comparison you should have in mind are Bell tests. The first proper Bell tests of Aspect and others in the 80s were not free of loopholes. They still allowed for contrived explanations of the data that did not violate local realism. Still, I can hardly believe that anyone would argue that Bell inequalities were not violated already back then.
As the years passed, these remaining loopholes were closed. To be a skeptic of the data, people needed to come up with more and more adversarial scenarios that could explain the data. We are working on the same to happen with quantum advantage demonstrations: come up with better schemes and better tests that require less and less assumptions or knowledge about the specifics of the device.
“When you chose the gates and architecture of the circuit dependent on your device, you tailored the task too much to the device and that is unfair. Not even the different RCS experiments solve exactly the same task.”
This is not really an argument against the achievement of quantum advantage but more against the particular choices of circuit ensembles in the experiments. Sure, the specific computations solved are still somewhat tailored to the hardware itself and in this sense the experiments are not hardware-independent yet, but they still solve fine computational tasks. Moving away from such hardware-tailored task specifications is another important next step and we are working on it.
In the third and last part of this mini series I will address next steps in quantum advantage that aim at closing some of the remaining loopholes. The most important—and theoretically interesting—one is to enable efficient verification of quantum advantage using less or even no specific knowledge about the device that was used, but just the measurement outcomes.
(survey) Hangleiter, D. & Eisert, J. Computational advantage of quantum random sampling. Rev. Mod. Phys. 95, 035001 (2023).
(classSim1) Pan, F., Chen, K. & Zhang, P. Solving the sampling problem of the Sycamore quantum circuits. Phys. Rev. Lett. 129, 090502 (2022).
(classSim2) Kalachev, G., Panteleev, P., Zhou, P. & Yung, M.-H. Classical Sampling of Random Quantum Circuits with Bounded Fidelity. arXiv.2112.15083 (2021).
(WN) Dalzell, A. M., Hunter-Jones, N. & Brandão, F. G. S. L. Random Quantum Circuits Transform Local Noise into Global White Noise. Commun. Math. Phys. 405, 78 (2024).
(PT1)vMorvan, A. et al. Phase transitions in random circuit sampling. Nature 634, 328–333 (2024).
(PT2) Ware, B. et al. A sharp phase transition in linear cross-entropy benchmarking. arXiv:2305.04954 (2023).
(Spoof1) Barak, B., Chou, C.-N. & Gao, X. Spoofing Linear Cross-Entropy Benchmarking in Shallow Quantum Circuits. in 12th Innovations in Theoretical Computer Science Conference (ITCS 2021) (ed. Lee, J. R.) vol. 185 30:1-30:20 (2021).
(Spoof2) Gao, X. et al. Limitations of Linear Cross-Entropy as a Measure for Quantum Advantage. PRX Quantum 5, 010334 (2024).
(SimIQP) Bremner, M. J., Montanaro, A. & Shepherd, D. J. Achieving quantum supremacy with sparse and noisy commuting quantum computations. Quantum 1, 8 (2017).
(SimRCS) Aharonov, D., Gao, X., Landau, Z., Liu, Y. & Vazirani, U. A polynomial-time classical algorithm for noisy random circuit sampling. in Proceedings of the 55th Annual ACM Symposium on Theory of Computing 945–957 (2023).
Recently, I gave a couple of perspective talks on quantum advantage, one at the annual retreat of the CIQC and one at a recent KITP programme. I started off by polling the audience on who believed quantum advantage had been achieved. Just this one, simple question.
The audience was mostly experimental and theoretical physicists with a few CS theory folks sprinkled in. I was sure that these audiences would be overwhelmingly convinced of the successful demonstration of quantum advantage. After all, more than half a decade has passed since the first experimental claim (G1) of “quantum supremacy” as the patron of this blog’s institute called the idea “to perform tasks with controlled quantum systems going beyond what can be achieved with ordinary digital computers” (Preskill, p. 2) back in 2012. Yes, this first experiment by the Google team may have been simulated in the meantime, but it was only the first in an impressive series of similar demonstrations that became bigger and better with every year that passed. Surely, so I thought, a significant part of my audiences would have been convinced of quantum advantage even before Google’s claim, when so-called quantum simulation experiments claimed to have performed computations that no classical computer could do (e.g. (qSim)).
I could not have been more wrong.
In both talks, less than half of the people in the audience thought that quantum advantage had been achieved.
In the discussions that ensued, I came to understand what folks criticized about the experiments that have been performed and even the concept of quantum advantage to begin with. But more on that later. Most of all, it seemed to me, the community had dismissed Google’s advantage claim because of the classical simulation shortly after. It hadn’t quite kept track of all the advances—theoretical and experimental—since then.
In a mini-series of three posts, I want to remedy this and convince you that the existing quantum computers can perform tasks that no classical computer can do. Let me caution, though, that the experiments I am going to talk about solve a (nearly) useless task. Nothing of what I say implies that you should (yet) be worried about your bank accounts.
I will start off by recapping what quantum advantage is and how it has been demonstrated in a set of experiments over the past few years.
To state the obvious: we are now fairly convinced that noiseless quantum computers would be able solve problems efficiently that no classical computer could solve. In fact, we have been convinced of that already since the mid-90ies when Lloyd and Shor discovered two basic quantum algorithms: simulating quantum systems and factoring large numbers. Both of these are tasks where we are as certain as we could be that no classical computer can solve them. So why talk about quantum advantage 20 and 30 years later?
The idea of a quantum advantage demonstration—be it on a completely useless task even—emerged as a milestone for the field in the 2010s. Achieving quantum advantage would finally demonstrate that quantum computing was not just a random idea of a bunch of academics who took quantum mechanics too seriously. It would show that quantum speedups are real: We can actually build quantum devices, control their states and the noise in them, and use them to solve tasks which not even the largest classical supercomputers could do—and these are very large.
But what exactly do we mean by “quantum advantage”. It is a vague concept, for sure. But some essential criteria that a demonstration should certainly satisfy are probably the following.
Achieving this last criterion using imperfect, noisy quantum devices is the challenge the idea of quantum supremacy set for the field. After all, running any of our favourite quantum algorithms in a classically hard regime on these devices is completely out of the question. They are too small and too noisy. So the field had to come up with the conceivably smallest and most noise-robust quantum algorithm that has a significant scaling advantage against classical computation.
The idea is simple: we just run a random computation, constructed in a way that is as favorable as we can make it to the quantum device while being as hard as possible classically. This may strike as a pretty unfair way to come up with a computational task—it is just built to be hard for classical computers without any other purpose. But: it is a fine computational task. There is an input: the description of the quantum circuit, drawn randomly. The device needs to be programmed to run this exact circuit. And there is a task: just return whatever this quantum computation would return. These are strings of 0s and 1s drawn from a certain distribution. Getting the distribution of the strings right for a given input circuit is the computational task.
This task, dubbed random circuit sampling, can be solved on a classical as well as a quantum computer, but there is a (presumably) exponential advantage for the quantum computer. More on that in Part 2.
For now, let me tell you about the experimental demonstrations of random circuit sampling. Allow me to be slightly more formal. The task solved in random circuit sampling is to produce bit strings distributed according to the Born-rule outcome distribution
of a sequence of elementary quantum operations (unitary rotations of one or two qubits at a time) which is drawn randomly according to certain rules. This circuit is applied to a reference state on the quantum computer and then measured, giving the string as an outcome.
In the first quantum supremacy experiment (G1) by the Google team, the quantum computer was built from 53 superconducting qubits arranged in a 2D grid. The operations were randomly chosen simple one-qubit gates () and deterministic two-qubit gates called fSim applied in the 2D pattern, and repeated a certain number of times (the depth of the circuit). The limiting factor in these experiments was the quality of the two-qubit gates and the measurements, with error probabilities around 0.6 % and 4 %, respectively.
A very similar experiment was performed by the USTC team on 56 qubits (U1) and both experiments were repeated with better fidelities (0.4 % and 1 % for two-qubit gates and measurements) and slightly larger system sizes (70 and 83 qubits, respectively) in the past two years (G2,U2).
Using a trapped-ion architecture, the Quantinuum team also demonstrated random circuit sampling on 56 qubits but with arbitrary connectivity (random regular graphs) (Q). There, the two-qubit gates were -rotations around , the single-qubit gates were uniformly random and the error rates much better (0.15 % for both two-qubit gate and measurement errors).
All the experiments ran random circuits on varying system sizes and circuit depths, and collected thousands to millions of samples from a few random circuits at a given size. To benchmark the quality of the samples, the widely accepted benchmark is now the linear cross-entropy (XEB) benchmark defined as
for an -qubit circuit. The expectation over is over the random choice of circuit and the expectation over is over the experimental distribution of the bit strings. In other words, to compute the XEB given a list of samples, you ‘just’ need to compute the ideal probability of obtaining that sample from the circuit and average the outcomes.
The XEB is nice because it gives 1 for ideal samples from sufficiently random circuits and 0 for uniformly random samples, and it can be estimated accurately from just a few samples. Under the right conditions, it turns out to be a good proxy for the many-body fidelity of the quantum state prepared just before the measurement.
This tells us that we should expect an XEB score of for some noise- and architecture-dependent constant . All of the experiments achieved a value of the XEB that was significantly (in the statistical sense) far away from 0 as you can see in the plot below. This shows that something nontrivial is going on in the experiments, because the fidelity we expect for a maximally mixed or random state is which is less than % for all the experiments.

The complexity of simulating these experiments is roughly governed by an exponential in either the number of qubits or the maximum bipartite entanglement generated. Figure 5 of the Quantinuum paper has a nice comparison.
It is not easy to say how much leverage an XEB significantly lower than 1 gives a classical spoofer. But one can certainly use it to judiciously change the circuit a tiny bit to make it easier to simulate.
Even then, reproducing the low scores between 0.05 % and 0.2 % of the experiments is extremely hard on classical computers. To the best of my knowledge, producing samples that match the experimental XEB score has only been achieved for the first experiment from 2019 (PCZ). This simulation already exploited the relatively low XEB score to simplify the computation, but even for the slightly larger 56 qubit experiments these techniques may not be feasibly run. So to the best of my knowledge, the only one of the experiments which may actually have been simulated is the 2019 experiment by the Google team.
If there are better methods, or computers, or more willingness to spend money on simulating random circuits today, though, I would be very excited to hear about it!
Now, you may be wondering: “How do you even compute the XEB or fidelity in a quantum advantage experiment in the first place? Doesn’t it require computing outcome probabilities of the supposedly hard quantum circuits?” And that is indeed a very good question. After all, the quantum advantage of random circuit sampling is based on the hardness of computing these probabilities. This is why, to get an estimate of the XEB in the advantage regime, the experiments needed to use proxies and extrapolation from classically tractable regimes.
This will be important for Part 2 of this series, where I will discuss the evidence we have for quantum advantage, so let me give you some more detail. To extrapolate, one can just run smaller circuits of increasing sizes and extrapolate to the size in the advantage regime. Alternatively, one can run circuits with the same number of gates but with added structure that makes them classically simulatable and extrapolate to the advantage circuits. Extrapolation is based on samples from different experiments from the quantum advantage experiments. All of the experiments did this.
A separate estimate of the XEB score is based on proxies. An XEB proxy uses the samples from the advantage experiments, but computes a different quantity than the XEB that can actually be computed and for which one can collect independent numerical and theoretical evidence that it matches the XEB in the relevant regime. For example, the Google experiments averaged outcome probabilities of modified circuits that were related to the true circuits but easier to simulate.
The Quantinuum experiment did something entirely different, which is to estimate the fidelity of the advantage experiment by inverting the circuit on the quantum computer and measuring the probability of coming back to the initial state.
All of the methods used to estimate the XEB of the quantum advantage experiments required some independent verification based on numerics on smaller sizes and induction to larger sizes, as well as theoretical arguments.
In the end, the advantage claims are thus based on a proxy of a proxy of the quantum fidelity. This is not to say that the advantage claims do not hold. In fact, I will argue in my next post that this is just the way science works. I will also tell you more about the evidence that the experiments I described here actually demonstrate quantum advantage and discuss some skeptical arguments.
Let me close this first post with a few notes.
In describing the quantum supremacy experiments, I focused on random circuit sampling which is run on programmable digital quantum computers. What I neglected to talk about is boson sampling and Gaussian boson sampling, which are run on photonic devices and have also been experimentally demonstrated. The reason for this is that I think random circuits are conceptually cleaner since they are run on processors that are in principle capable of running an arbitrary quantum computation while the photonic devices used in boson sampling are much more limited and bear more resemblance to analog simulators.
I want to continue my poll here, so feel free to write in the comments whether or not you believe that quantum advantage has been demonstrated (by these experiments) and if not, why.
Continue reading Part 2: Considering the evidence.
[G1] Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510 (2019).
[Preskill] Preskill, J. Quantum computing and the entanglement frontier. arXiv:1203.5813 (2012).
[qSim] Choi, J. et al. Exploring the many-body localization transition in two dimensions. Science 352, 1547–1552 (2016). .
[U1] Wu, Y. et al. Strong Quantum Computational Advantage Using a Superconducting Quantum Processor. Phys. Rev. Lett. 127, 180501 (2021).
[G2] Morvan, A. et al. Phase transitions in random circuit sampling. Nature 634, 328–333 (2024).
[U2] Gao, D. et al. Establishing a New Benchmark in Quantum Computational Advantage with 105-qubit Zuchongzhi 3.0 Processor. Phys. Rev. Lett. 134, 090601 (2025).
[Q] DeCross, M. et al. Computational Power of Random Quantum Circuits in Arbitrary Geometries. Phys. Rev. X 15, 021052 (2025).
[PCZ] Pan, F., Chen, K. & Zhang, P. Solving the sampling problem of the Sycamore quantum circuits. Phys. Rev. Lett. 129, 090502 (2022).
guest post by William Waites
The previous post introduced the plumbing calculus: typed channels, structural morphisms, two forms of composition, and agents as stateful morphisms with a protocol for managing their state. The examples were simple. This post is about what happens when the algebra handles something genuinely complex.
To get there, we need to understand a little about how large language models work.
Large language models are sequence-to-sequence transducers: a sequence of tokens comes in, a sequence comes out. Text is tokenised and the model operates on the tokens.
From the outside, the morphism is simple: !string → !string. A message goes in, a message comes out. But the client libraries (the code that calls the LLM provider) maintain the conversation history and send it back with every call. The actual morphism is
(!string, ![Message]) → (!string, ![Message]): the input message and the accumulated history go in, the response and the updated history come out. The history feeds back. This is a trace in the sense of traced monoidal categories: the feedback channel is hidden from the user, who sees only !string → !string.
Crucially, the model has a limited amount of memory. It is not a memoryless process, but the memory it has is not large: 200,000 tokens for current models, perhaps a million for the state of the art. This sounds like a lot. It is not. An academic paper is roughly 10,000 tokens. A literature review that needs to work with thirty papers has already exceeded the context window of most models, and that is before the model has produced a single word of output.
If you have used any of these agent interfaces, you will have noticed that after talking back and forth for a while, the agent will compact. This is a form of memory management. What is happening is that some supervisory process has noticed the context window filling up, and has intervened to shorten its contents. A naïve approach is to truncate: discard everything before the last N exchanges. A better approach is to feed the entire context to another language model and ask it to summarise, then put the summary back.
This is normally done by specialised code outside the agent, invisible to it.
How to manage agent memory well is an active research area. We do not, in general, do it very well. Truncation loses information. Summarisation loses nuance. Pinning helps but the right pinning strategy depends on the task. These are open questions, and to make progress we need to be able to experiment with different schemes and mechanisms: express a memory management strategy, test it, swap it for another, compare. Not by recompiling specialised code or hardcoding behaviour, but by writing it down in a language designed for exactly this kind of composition. Memory management should be a plumbing program: modular, type-checked, swappable.
So we built an implementation of compaction using the plumbing calculus, and the first thing we did was test it. I ran the protocol on a very short cycle: a single message caused a compaction, because the threshold was set to zero for testing. The compressor fired, produced a summary, rebuilt the agent’s context. The logs showed [compressor] 3404 in / 541 out. The protocol worked.
Then I asked the agent: "have you experienced compaction?"
The agent said no. It explained what compaction is, accurately. Said it hadn’t happened yet. It was confident.
I asked: "do you have a context summary in your window?"
Yes, it said, and described the contents accurately.
"How did that context summary get there if you have not yet compacted?"
The agent constructed a plausible, confident, and completely wrong explanation: the summary was "provided to me by the system at the start of this conversation" as a "briefing or recap." When pressed, it doubled down:
"The context-summary is not evidence that compaction has occurred. It’s more like a briefing or recap that the system gives me at the start of a conversation session to provide continuity."
The agent was looking at the direct evidence of its own compaction and confidently explaining why it was not compaction. We will return to why it gets this wrong, and how to fix it. But first: how do we build this?
At a high level, it works like this. An agent is running: input comes in, output goes out. Together with the output, the agent emits a telemetry report. The telemetry includes token counts: with each transaction, the entire train of messages and responses is sent to the LLM provider, and back comes a response together with a count of the tokens that went in and the tokens that came out. Our agent implementation sends this telemetry out of the telemetry port to anybody who is listening.
The construction involves a second agent. This second agent is a homunculus: the little man who sits on your shoulder and watches what your mind is doing. Here is the topology:
The homunculus listens to the telemetry and says: the memory is filling up. The token count has crossed a threshold. It is time to compact. And then it acts:
• Send pause to the agent’s control port. Stop accepting input.
• Send get memory. The agent produces the contents of its context window.
• Summarise that memory (using another LLM call).
• Send set memory with the compacted version.
• Send resume. The agent continues processing input.
Each step requires an acknowledgement before the next can proceed. This is a protocol: pause, acknowledge, get memory, here is the memory, set memory, acknowledge, resume, acknowledge.
It is possible to express this directly in the plumbing calculus, but it would be painfully verbose. Instead, we use session types to describe the protocol. This is not pseudocode. There is a compiler and a runtime for this language. Here is the protocol:
protocol Compaction =
send Pause . recv PauseAck .
send GetMemory . recv MemoryDump .
send SetMemory . recv SetMemoryAck .
send Resume . recv ResumeAck . end
The protocol is eight lines. It reads as a sequence of steps: send, receive, send, receive, and so on. The compiler knows what types each step carries. Now we wire it up:
let compact : (!CtrlResp, !json) -> !CtrlCmd =
plumb(ctrl_out, telemetry, ctrl_in) {
(ctrl_out, ctrl_in) Compaction as session
telemetry
; filter(kind = "usage" && input_tokens > 150000)
; map(null) ; session@trigger
session@trigger ; map({pause: true})
; session@send(Pause)
session@done(PauseAck) ; map({get_memory: true})
; session@send(GetMemory)
session@recv(MemoryDump) ; compressor
; session@send(SetMemory)
session@done(SetMemoryAck) ; map({resume: true})
; session@send(Resume)
}
The first line binds the protocol to the agent’s control ports:
(ctrl_out, ctrl_in) <-> Compaction as session.
This says: the Compaction protocol runs over the control channel, and we refer to it as session. The telemetry line is the trigger: when token usage crosses a threshold, the protocol begins. Each subsequent line is one step of the protocol, wired to the appropriate control
messages.
Here is a direct depiction of the protocol as wired. You can trace it through:
And here is how we wire the homunculus to the agent:
let main : !string -> !string =
plumb(input, output) {
let ctrl : !CtrlCmd = channel
let ctrl_out : !CtrlResp = channel
let telem : !json = channel
spawn bot(input=input, ctrl_in=ctrl,
output=output, ctrl_out=ctrl_out,
telemetry=telem)
spawn compact(ctrl_out=ctrl_out,
telemetry=telem, ctrl_in=ctrl)
}
The main morphism takes a string input and produces a string output. Internally, it creates three channels (control commands, control responses, telemetry) and spawns two processes: the bot agent and the compact homunculus. The homunculus listens to the bot’s telemetry and control responses, and sends commands to the bot’s control input. The bot does not know the homunculus exists. It just receives control messages and responds to them.
There are two nested traces here. The first is the one from before, inside the agent: messages go in, the output accumulates with everything that came before, and the whole history feeds back on the next turn. We do not see this trace. It is hidden inside the client library. The second trace is the one we have just built: the homunculus. What goes around the outer loop is control: telemetry flows out, commands flow in, acknowledgements come back. The memory dump passes through the control channel at one point in the protocol, but the feedback path is control, not conversation history. Nested traces compose; the algebra has identities for this and it is fine. But they are different loops carrying different things.
The connection between the protocol above and what the compiler actually produces is the functor from session types into the plumbing calculus. This functor works because of barrier.
Why do we need the barrier? Because the protocol is about sending a message and waiting for a response. We can send a message, but we need the response to arrive before we proceed. The barrier takes two streams, one carrying the "done" signal and one carrying the response, and synchronises them into a pair. Only when both are present does the next step begin.
Each session type primitive has a direct image in the plumbing category, and the structure is prettier than it first appears. The primitives come in dual pairs:
In the diagrams below, session types are on the left in blue; their images in the plumbing calculus are on the right in beige.
• send and recv are dual. They map to map and filter, which are also dual: send wraps the value with map, then synchronises via barrier with the done signal from the previous step. Recv filters the control output by step number, synchronises via barrier, then extracts the payload with map.
• select and offer are dual. They map to tag and case analysis, which are also dual: select tags the value with a label via map, synchronises via barrier, and routes to the chosen branch chain. Offer copies the control output and filters each copy by label, routing to the appropriate branch chain.
• The sequencing operator (.) maps to a barrier chain. Each send-then-recv step becomes a barrier that synchronises the outgoing message with the incoming acknowledgement, and these barriers chain together to enforce the protocol ordering.
• rec maps to a feedback loop: merge takes the initial arm signal and the last done signal from the previous iteration, feeds them into the barrier chain body, and copy at the end splits done into output and feedback. The trigger serialisation gate starts
• end is implicit: the chain simply stops. Discard handles any remaining signals.
This mapping is a functor. It is total: every session type primitive has an image in the plumbing category, using only the morphisms we already have. Session types are a specification language; the plumbing calculus is the execution language. The compiler translates one into the other.
The reason we do this becomes obvious from the diagram below. It is scrunched up and difficult to look at. If you click on it you can get a big version and puzzle it out. If you squint through the spaghetti, you can see that it does implement the same compaction protocol above. We would not want to implement this by hand. So it is nice to have a functor. If you have the patience to puzzle your way through it, you can at least informally satisfy yourself that it is correct.
There is another feature we implement, because managing the memory of an agent is not as simple as just compressing it.
The problem with compression is that it is a kind of annealing. As the conversation grows, it explores the space of possible conversation. When it gets compacted, it is compressed, and that lowers the temperature. Then it grows again, the temperature rises, and then it is compressed again. With each compression, information is lost. Over several cycles of this, the agent can very quickly lose track of where it was, what you said at the beginning, what it was doing.
We can begin to solve this with document pinning. The mechanism is a communication between the agent and its homunculus, not shown in the protocol above. It is another protocol. The agent says: this document that I have in memory (technically a tool call and response, or just a document in the case of the prompts), pin it. What does that mean? When we do compaction, we compact the contents of memory, but when we replace the memory, we also replace those pinned documents verbatim. And of course you can unpin a document and say: I do not want this one any more.
Either the agent can articulate this or the user can. The user can say: you must remember this, keep track of this bit of information. And the agent has a way to keep the most important information verbatim, without it getting compacted away.
This accomplishes two things.
First, it tends to keep the agent on track, because the agent no longer loses the important information across compaction cycles. The annealing still happens to the bulk of the conversation, but the pinned documents survive intact.
Second, it has to do with the actual operation of the underlying LLM on the GPUs. When you send a sequence of messages, this goes into the GPU and each token causes the GPU state to update. This is an expensive operation, very expensive. This is why these things cost so much. What you can do with some providers is put a cache point and say: this initial sequence of messages, from the beginning of the conversation up until the cache point, keep a hold of that. Do not recompute it. When you see this exact same prefix, this exact same sequence of messages again, just load that memory into the GPU directly. Not only is this a lot more efficient, it is also a lot cheaper, a factor of ten cheaper if you can actually hit the cache.
So if you are having a session with an agent and the agent has to keep some important documents in its memory, it is a good idea to pin them to the beginning of memory. You sacrifice a little bit of the context window in exchange for making sure that, number one, the information in those documents is not forgotten, and number two, that it can hit the cache. This is explained in more detail in a separate post on structural prompt preservation.
Why does the agent get this wrong? In one sense, it is right. It has not experienced compaction. Nobody experiences compaction. Compaction happens in the gap between turns, in a moment the agent cannot perceive. The agent’s subjective time begins at the summary. There is no "before" from its perspective.
The summary is simply where memory starts. It is like asking someone "did you experience being asleep?" You can see the evidence, you are in bed, time has passed. But you did not experience the transition.
The <context-summary> tag is a structural marker. But interpreting it as evidence of compaction requires knowing what the world looked like before, and the agent does not have that. It would need a memory of not having a summary, followed by a memory of having one. Compaction erases exactly that transition.
The fix is not complicated. It is perfectly reasonable to provide, along with the user’s message, self-knowledge to the agent as metadata. What would it be useful for the agent to know?
The current time. The sense of time that these agents have is bizarre. We live in continuous time. Agents live in discrete time. As far as they are concerned, no time passes between one message and the next. It is instantaneous from their point of view. You may be having a conversation, walk away, go to the café, come back two hours later, send another message, and as far as the agent is concerned no time has passed. But if along with your message you send the current time, the agent knows.
How full the context window is. The agent has no way of telling, but you can provide it: this many tokens came in, this many went out.
Compaction cycles. So the agent knows how many times it has been compacted, and can judge the accuracy of the contents of its memory, which otherwise it could not do.
With the compaction counter, the agent immediately gets it right:
"Yes, I have experienced compaction. According to the runtime context, there has been 1 compaction cycle during this session."
No hedging, no confabulation. Same model, same prompts, one additional line of runtime context.
This matters beyond the compaction story, because many of the failures we see in the news are context failures, not alignment failures.
While we were writing this post, a story appeared in the Guardian about AI chatbots directing people with gambling addictions to online casinos. This kind of story is common: vulnerable people talking to chatbots, chatbots giving them bad advice. The response of the industry is always the same: we need better guardrails, better alignment, as though the chatbots are badly aligned.
I do not think that is what is happening. What is happening is a lack of context. Either the chatbot was never told the person was vulnerable, or it was told and the information got lost. Someone with a gambling addiction may start by saying "I have a gambling problem." Then there is a four-hour conversation about sports. Through compaction cycles, what gets kept is the four hours of sports talk. The important bit of information does not get pinned and does not get kept. Context drift. By the time the user asks for betting tips, the chatbot no longer knows it should not give them.
The way to deal with this is not to tell the language model to be more considerate. The way to deal with it is to make sure the agent has enough information to give good advice, and that the information does not get lost. This is what document pinning is for: pinned context survives compaction, stays at the top of the window, cannot be diluted by subsequent conversation. This is discussed further in a separate post on structural prompt preservation.
But pinning is only one strategy. The field is in its infancy. We do not really know the right way to manage agent memory, and we do not have a huge amount of experience with it. What we are going to need is the ability to experiment with strategies: what if compaction works like this? What if pinning works like that? What if the homunculus watches for different signals? Each of these hypotheses needs to be described clearly, tested, and compared. This is where the formal language earns its keep. A strategy described in the plumbing calculus is precise, checkable, and can be swapped out for another without rewriting the surrounding infrastructure. We can experiment with memory architectures the way we experiment with any other part of a system: by describing what we want and seeing if it works.
When the first draft of this post was written, it was a mystery why the field had not thought to give agents self-knowledge as a routine matter: what they are doing, who they are talking to, what they should remember. Prompts are initial conditions. They get compacted away. There are agents that save files to disc, in a somewhat ad hoc way, but we do not give them tools to keep track of important information in a principled way.
Contemporaneously with this work, some providers have started to do it. For example, giving agents a clock, the ability to know what time it is. This is happening now, in the weeks between drafting and publication. The field is only now realising that agents need a certain amount of self-knowledge in order to function well. The compressed timeline is itself interesting: the gap between "why has nobody done this?" and "everybody is starting to do this" was a matter of weeks.
The mechanisms we have presented here allow us to construct agent networks and establish protocols that describe rigorously how they are meant to work. We can describe strategies for memory management in a formal language, test them, and swap them out. And perhaps beyond the cost savings and the efficiency increases, the ability to experiment clearly and formally with how agents manage their own memory is where the real value lies.
Mathematical research traditionally involves a small number of professional mathematicians working closely on difficult problems. However, I have long believed that there is a complementary way to do mathematics, in which one works with a broad community of mathematically minded people on problems which may not be as deep as the problems one traditionally works on, but still are of mathematical interest; and that modern technologies, including AI, are more suitable for contributing the latter type of workflow. The “Polymath projects” were one example of this broad type of collaboration, where internet platforms such as blogs and wikis were used to facilitate such collaboration. Some years later, collaborative formalization projects (such as the one to formalize the Polynomial Freiman–Ruzsa conjecture of Marton, discussed previously on this blog here) became popular in some circles. And in 2024, I launched the Equational Theories Project (ETP) (discussed on this blog here and here), combining the rigor of Lean formalization with “good old fashioned AI” (in the form of automated theorem provers) to settle (with formal verification) over 22 million true-false problems in universal algebra.
Continuing in this spirit, Damek Davis and I are launching a new project, in the form of an experimental competitive challenge hosted by the SAIR Foundation (where I serve as a board member, and which is supplying technical support and compute). The idea of this challenge, motivated in part by this recent paper of Honda, Murakami, and Zhang, is to measure the extent to which the 22 million universal algebra true-false results obtained by the ETP can be “distilled” into a short, human-readable “cheat sheet”, similar to how a student in an undergraduate math class might distill the knowledge learned from that class into a single sheet of paper that the student is permitted to bring into an exam.
Here is a typical problem in universal algebra that the ETP was able to answer:
Problem 1 Suppose thatis a binary operation such that
for all
. Is it true that
for all
?
Such a problem can be settled either by algebraically manipulating the initial equation to deduce the target equation, or by finding a counterexample to the target equation that still satisfies the initial equation. There are a variety of techniques to achieve either of these, but this sort of problem is difficult, and even undecidable in some cases; see this paper of the ETP collaborators for more discussion. Nevertheless, many of these problems can be settled with some effort by humans, by automated theorem provers, or by frontier AI systems; here for instance is an AI-generated solution to the above problem.
However, these AI models are expensive, and do not reveal much insight as to where their answers come from. If one instead tries a smaller and cheaper model, such as one of the many open-source models available, it turns out that these models basically perform no better than random chance, in that when asked to say whether the answer to a question such as the above is true or false, they only answer correctly about 50% of the time.
But, similarly to how a student struggling with the material for a math class can perform better on an exam when provided the right guidance, it turns out that such cheap models can perform at least modestly better on this task (with success rates increasing to about 55%-60%) if given the right prompt or “cheat sheet”.
“Stage 1” of the distillation challenge, which we launched today, asks for contestants to design a cheat sheet (of at most 10 kilobytes in size) that can increase the performance of these models on the above true-false problems to as high a level as possible. We have provided a “playground” with which to test one’s cheat sheet (or a small number of example cheat sheets) some cheap models against a public set of 1200 problems (1000 of which were randomly selected, and rather easy, together with 200 “hard” problems that were selected to resist the more obvious strategies for resolving these questions); a brief video explaining how to use the playground can be found here.
Submissions stage will end on April 20, after which we will evaluate the submissions against a private subset of test questions. The top 1000 submissions will advance to a second stage which we are currently in the process of designing, which will involve more advanced models, but also the more difficult task of not just providing a true-false answer, but also a proof or counterexample to the problem.
The competition will be coordinated on this Zulip channel, where I hope there will be a lively and informative discussion.
My hope is that the winning submissions will capture the most productive techniques for solving these problems, and/or provide general problem-solving techniques that would also be applicable to other types of mathematical problems. We started with the equational theory project data set for this pilot competition due to its availability and spectrum of difficulty levels, but if this type of distillation process leads to interesting results, one could certainly run in on many other types of mathematical problem classes to get some empirical data on how readily they can be solved, particularly after we learn from this pilot competition on how to encourage participation and share of best practices.
SAIR will also launch some other mathematical challenges in the coming months that will be of a more cooperative nature than this particular competitive challenge; stay tuned for further announcements.
guest post by William Waites
How category theory can be used to help coordinate collections of interacting large language models.
Agent frameworks are popular. (These are frameworks for coordinating large language model agents, not to be confused with agent-based modelling in the simulation sense.) There are dozens of them for wrapping large language models in something called an agent and assembling groups of agents into workflows. Much of the surrounding discussion is marketing, but the underlying intuition is old: your web browser identifies itself as a user agent. What is new is the capability that generative language models bring.
The moment you have one agent, you can have more than one. That much is obvious. How to coordinate them is not. The existing frameworks (n8n, LangGraph, CrewAI, and others) are engineering solutions, largely ad hoc. Some, like LangGraph, involve real thinking about state machines and concurrency. But none draws on what we know from mathematics and computer science about typed composition, protocol specification, or structural guarantees for concurrent systems.
This matters because it is expensive. Multi-agent systems are complicated concurrent programs. Without structural guardrails, they fail in ways you discover only after spending the compute. A job can go off the rails, and the money you paid for it is wasted; the providers will happily take it regardless. At current subscription rates the cost is hidden, but a recent Forbes investigation found that a heavy user of Anthropic’s $200/month Claude Code subscription can consume up to $5,000/month measured at retail API rates. For third-party tools like Cursor, which pay close to those retail rates, these costs are real. Wasted tokens are wasted money.
To address this, we built a language called plumbing. It describes how agents connect and communicate, in such a way that the resulting graph can be checked before execution: checked for well-formedness, and within limits for deadlocks and similar properties. It is a statically typed language, and these checks are done formally. There is a compiler and a runtime for this language, working code, not a paper architecture. In a few lines of plumbing, you can describe agent systems with feedback loops, runtime parameter modulation, and convergence protocols, and be sure they are well-formed before they run. This post explains how it works.
The name has a history in computing. Engineers have always talked informally about plumbing to connect things together: bits of software, bits of network infrastructure. When I was a network engineer I sometimes described myself as a glorified plumber. The old Solaris ifconfig command took plumb as an argument, to wire a network interface into the stack. Plan 9 had a deeper version of the same idea. The cultural connection goes back decades.
This is the first of two posts. This one introduces the plumbing calculus: what it is, how it works, and a few simple examples. Motifs for adversarial review, ensemble reasoning, and synthesis. The second post will tackle something harder.
The plumbing language is built on a symmetric monoidal category, specifically a copy-discard category with some extra structure. The terminology may be unfamiliar, but the underlying concept is not. Engineers famously like Lego. Lego bricks have studs on top and holes with flanged tubes underneath. The studs of one brick fit into the tubes of another. But Lego has more than one connection type: there are also holes through the sides of Technic bricks, and axles that fit through them, and articulated ball joints for the fancier kits. Each connection type constrains what can attach to what. This is typing.
In plumbing, the objects of the category are typed channels: streams that carry a potentially infinite sequence of values, each of a specific type (integer, string, a record type, or something more complex). We write !A to mean "a stream of As", so !string is a stream of strings and !int is a stream of integers. The morphisms, which describe how you connect channels together, are processes. A process has typed inputs and typed outputs.
There are four structural morphisms. Copy takes a stream and duplicates it: the same values appear on two output streams. Discard throws values away, perhaps the simplest thing you can do with a stream, and often needed. These two, together with the typed channels and the laws of the category, give us a copy-discard category.
To this we add two more. Merge takes two streams of the same type and interleaves them onto a single output stream. This is needed because a language model’s input is a single stream. There is nothing to be done about that. If you want to send two different things into it, you must send one and then the other. One might initially give merge the type !A ⊗ !B → !(A + B), taking two streams of different types and producing their coproduct. This works, but it is unnecessarily asymmetrical.
As Tobias Fritz has observed, it is cleaner to do the coproduct injection first, converting each stream to the coproduct type separately, and then merge streams that already have the same type. This gives:
merge : !A ⊗ !A → !(A + A)
Barrier takes two streams, which may be of different types, and synchronises them. Values arrive unsynchronised; the barrier waits for one value from each stream and produces a pair.
barrier : !A ⊗ !B → !(A, B)
(A mathematician would write A B for the product. We cannot easily do this in a computer language because there is no symbol on most keyboards, so we use (A, B) for the product, following Haskell’s convention.)
This is a synchronisation primitive. It is important because it unlocks session types, which we will demonstrate in the second post.
Two further morphisms are added to the category (they are not derivable from the structural ones, but are needed to build useful things): map, which applies a pure function to each value in a stream, and filter, which removes values that do not satisfy a predicate. Both are pure functions over streams. Both will be familiar from functional programming.
Here is a graphical representation of the morphisms. We can glue them together freely, as long as the types and the directions of the arrows match up.

There are two forms of composition. Sequential composition connects morphisms nose to tail, the output of one feeding the input of the next. Parallel composition places them side by side, denoted by ⊗ (the tensor product, written directly in plumbing source code). So: four structural morphisms, two utilities, two compositional forms, all operating on typed channels.
Because the channels are typed, the compiler can check statically, at compile time, that every composition is well-formed: that outputs match inputs at every boundary. This gives a guarantee that the assembled graph makes sense.

A composition of morphisms is itself a morphism. This follows from the category laws (it has to, or it is not a category) but the practical consequence is worth stating explicitly. We can assemble a subgraph of agents and structural morphisms, and then forget the internal detail and use the entire thing as a single morphism in a larger graph. This gives modularity. We can study, test, and refine a building block in isolation, and once satisfied, use it as a component of something bigger.
What we have described so far is the static form of the language: concise, point-free (composing operations without naming intermediate values), all about compositions. This is what you write. It is not what the runtime executes. A compiler takes this static form and produces the underlying wiring diagram, expanding the compositions into explicit connections between ports. The relationship is similar to point-free style in functional programming: the concise form is good for thinking and writing; the expanded form is good for execution.
An agent is a special kind of morphism. It takes typed input and produces typed output, like any other morphism, and we can enforce these types. This much is a well-known technique; PydanticAI and the Vercel AI SDK do it. Agents implement typing at the language model level by producing and consuming JSON, and we can check that the JSON has the right form. This is the basis of the type checking.
Unlike the structural morphisms and utilities, an agent is stateful. It has a conversation history, a context window that fills up, parameters that change. You cannot sensibly model an agent as a pure function. You could model it using the state monad or lenses, and that would be formally correct, but it is the wrong level of abstraction for engineering. Instead, we allow ourselves to think of agents as opaque processes with a typed protocol for interacting with them. We mutate their state through that protocol, and we know how to do that purely from functional programming and category theory. The protocol is the right abstraction; the state management is an implementation detail behind it. How this works in practice, and what happens when it goes wrong, is the subject of the second post.
In addition to their main input and output ports, agents in plumbing have control ports (control in and control out) for configuring the agent at runtime. For example, the temperature parameter governs how creative a language model is: how wide its sampling distribution when choosing output. At zero it is close to deterministic; at one it becomes much less predictable. A control message might say set temperature to 0.3; the response on the control out wire might be acknowledged. The control port carries a typed stream like anything else.
Agents also have ports for operator-in-the-loop (often called human-in-the-loop, though there is no reason an operator must be human), tool calls, and telemetry. The telemetry port emits usage statistics and, if the underlying model supports it, thinking traces. We will not detail these here. Suffice it to say that an agent has several pairs of ports beyond what you might imagine as its regular chat input and output.

An agent has many ports, but most programs use only a few of them. We adopt a convention from the κ calculus: don’t care, don’t write. Any output port that is not mentioned in the program is implicitly connected to discard. If a port’s output cannot matter, there is no reason to write it down.
Suppose the problem is to write a cover letter for a job application. You provide some background material (a CV, some notes, some publications) and a job advert. You want a network of agents to produce a good cover letter. A good cover letter has two constraints: it must be accurate, grounded in the source materials, not making things up; and it must be compelling, so that the reader wants to give you an interview.
These two constraints are in tension, and they are best served by different agents with different roles. A composer drafts from the source materials. A checker verifies the draft against those materials for accuracy, producing a verdict: pass or fail, with commentary. A critic, who deliberately cannot see the source materials, evaluates whether the result is compelling on its own terms, producing a score.
The feedback loops close the graph. If the checker rejects the draft, its commentary goes back to the composer. If the critic scores below threshold, its review goes back to the composer. Only when the critic is satisfied does the final draft emerge.
Here is the plumbing code:
type Verdict = { verdict: bool, commentary: string, draft: string }
type Review = { score: int, review: string, draft: string }
let composer : !string -> !string = agent { ... }
let checker : !string -> !Verdict = agent { ... }
let critic : !Verdict -> !Review = agent { ... }
let main : !string -> !string = plumb(input, output) {
input ; composer ; checker
checker ; filter(verdict = false)
; map({verdict, commentary}) ; composer
checker ; filter(verdict = true) ; critic
critic ; filter(score < 85)
; map({score, review}) ; composer
critic ; filter(score >= 85).draft ; output
}
And here is a graphical representation of what’s going on:

The agent configuration is elided. The main pipeline takes a string input and produces a string output. It is itself a morphism, and could be used as a component in something larger.
Notice what the wiring enforces. The critic receives verdicts, not the original source materials. The information partition is a consequence of the types, not an instruction in a prompt. The feedback loops are explicit: a failed verdict routes back to the composer with commentary; a low score routes back with the review. All of this is checked at compile time.
The previous example shows sequential composition and feedback loops but not parallel composition. An ensemble of agents running simultaneously on the same input needs the tensor product.
Ensembles are common. Claude Code spawns sub-agents in parallel to investigate or review, then gathers the results. This is a scatter-gather pattern familiar from high-performance computing. But this example, due to Vincent Danos, adds something less common: modulation of agent behaviour through the control port.
The input is a proposition. Two agents debate it, one advocating and one sceptical, running in parallel via the tensor product. Their outputs are synchronised by a barrier into a pair and presented to a judge. The judge decides: has the debate converged? If so, a verdict goes to the output. If not, a new topic goes back to the debaters, and a temperature goes to their control inputs.
The intuition is that the debaters should start creative (high temperature, wide sampling) and become progressively more focused as the rounds continue. The judge controls this. Each round, the judge decides both whether to continue and how volatile the next round should be. If the debate appears to be converging, the judge lowers the temperature, preventing the system from wandering off in new directions. Whether this actually causes convergence is a research question, not a proven result.
type Verdict = { resolved: bool, verdict: string,
topic: string, heat: number }
type Control = { set_temp: number }
let advocate : (!string, !Control) -> !string = agent { ... }
let skeptic : (!string, !Control) -> !string = agent { ... }
let judge : !(string, string) -> !Verdict = agent { ... }
let cool : !Verdict -> !Control = map({set_temp: heat})
let main : !string -> !string = plumb(input, output) {
input ; (advocate ⊗ skeptic) ; barrier ; judge
judge ; filter(resolved = false).topic ; (advocate ⊗ skeptic)
judge ; filter(resolved = true).verdict ; output
judge ; cool ; (advocate@ctrl_in ⊗ skeptic@ctrl_in)
}
And here is the graphical representation:

The ⊗ operator is the tensor product: parallel composition. (The grammar also accepts * for editors that cannot input unicode.) The advocate and skeptic run simultaneously on the same input. The barrier synchronises their outputs into a pair for the judge. The last line is the control feedback: the judge’s verdict is mapped to a temperature setting and sent to both agents’ control inputs. Notice that advocate@ctrl_in addresses a specific port on the agent, the control port rather than the main input.
This is a small program. It is also a concurrent system with feedback loops, runtime parameter modulation, and a convergence protocol. Without types, getting the wiring right would be a matter of testing and hope. With types, it is checked before it runs.
In a few lines of code, with a language that has categorical foundations, we can capture interesting agent systems and be sure they are well-formed before they run.
The upshot: when we have guarantees about well-formedness, systems work more stably and more predictably. With static typing, entire classes of structural errors are impossible. You cannot wire an output of one type to an input of another. You cannot forget a connection. The job you pay for is more likely to actually work, and you get more useful work per dollar spent. Runtime budget controls can put a ceiling on cost, but they do not prevent the waste. Static typing prevents the waste. But there is a lot more to do. What we have so far is already useful as a language for constructing agent graphs with static type checking. But we have given short shrift to the complexity and internal state of the agent morphism, which is really all about memory architecture and context management. That is where the real power comes from. For that we need more than a copy-discard category with some extra structure. We need protocols—and that is the subject of the sequel, soon to appear here.
The plumbing compiler, runtime, and MCP server are available as binary downloads for macOS and Linux:
• Download plumbing version 0.
Here is the research paper describing the broader programme of work:
• William Waites, Artificial organisations (arXiv:2602.13275).
I wrote code this weekend to look at the question of how we should visit
a star in the upcoming Terra Hunting Experiment. The current (straw-person) plan is that we will observe each visible star once per night for ten years, with one exposure of a sensibly-chosen exposure time at each visit. Is this a good idea? I was interested in this problem for two reasons. The first is that binning is sinning, with the corollary that bigger bins are worse than finer bins, and a single, long exposure is a very big bin. The second reason is that when there are non-trivial noise sources (like the quasi-periodic variations from p-mode oscillations of the surfaces of Sun-like stars), a few negatively- (or interestingly-) correlated noise draws can be combined in ways that are substantially more informative than by taking the average.
Of course, if you split an exposure (with a standard CCD, say) into sub-exposures, you take on real costs: There is a read time, which is time you aren't integrating, and there is a read noise, that affects each new exposure. So the best strategies are a complicated function of read time, read noise, and the signal-to-noise at which the stellar p-mode oscillations are visible in any realistic data. Related: There are amazingly different and interesting strategies with up-the-ramp detectors that are used in the infrared.
One final comment is that the objective, in my strongly held view, is to optimize the amount of information (about, say, the center-of-mass radial-velocity changes of the target star) per unit wall-clock time. We are paying for wall-clock time; let's get as much as we can out of it.
Today my rant on LLMs and the practices of our field hit the arXiv. I was scared to post it, because it is such a weird contribution, and it is so revealing about myself and my own political positions and hangups. But I have to say: I got great and supportive feedback all day.
I got two comments on saying ACAB
in the literature. The Astronomer Royal of Scotland quoted (on BlueSky) the last sentence, which I put there because Andy Casey (Monash, Flatiron) insisted. Many people sent me appreciation and thank-yous, and many people sent me comments and objections. Always constructive. The whole experience made me feel very happy about the state of our field and the way we all interact. I think maybe there will be critical mass to write some kind of collection of essays on the subject. That's a plan for 2026.
The Physicists and Mr. EpsteinMr. Epstein was not only a world-class child abuser, he was a big fan of theoretical high-energy physics and of theoretical physicists. Some of my colleagues, unfortunately, got to know him. A number who were famous and/or had John Brockman as a book agent were even invited to a physics conference on Epstein’s private island, well before he was first arrested. This was no secret; as I recall, a lot of us heard about the existence of this conference/trip, but we hadn’t heard Epstein’s name before and didn’t pay much attention (ho hum, just another weird billionaire).
Personally, I feel quite lucky. The Brockman agency rejected the proposal for my recent book without comment (thank you!); and my research is mostly considered unimportant by the Brian Greenes of the world. As a result, I was not invited to Epstein’s island, never made his acquaintance, and blissfully avoided the entire affair. Clearly there are some benefits to being considered ordinary. And so — I’m sorry/not-sorry to say — I can’t tell you much about Epstein at all, or about how certain physicists did and did not interact with him. Regarding my colleagues who did get to know him, I can’t speak for them, since I wasn’t there, and I don’t know to what extent Epstein hid his immoral activities when they were around. It’s up to them to tell their own stories if they feel the need to do so (and I hope a couple of them do, just to clear the air.) Personally I tend to give them the benefit of the doubt — probably some literally didn’t know what was up until Epstein’s arrest in 2008, while perhaps others felt there wasn’t much they could do about Epstein’s actions on his own private island. I imagine they are deeply embarrassed to have been caught in this horrible man’s ugly web.
Fans of physics come in all shapes and sizes, and some have large wallets, large egos, and/or large ambitions. Among the wealthy supporters, we can count Alfred Nobel himself; billionaires sit on important scientific institute and university boards, and the more recent Breakthrough Prizes were funded by deep pockets. The extreme wealthy have outsized influence in our country and in our world, and one could argue that their influence in 2025 was not for the better. Usually, though, the influence in physics and related fields tends to be relatively benign, funding postdoctoral researchers and graduate students who deeply want to do science but also need to eat. That said, sometimes donors fund non-essential fields at the expense of critical ones, or favor theoretical research over the gathering of crucial experimental data, or push money on famous rich organizations when there are poor ones that are equally deserving and far more needy.
When gazillionaires, on their own initiative, come calling on non-profit organizations, whether they be community centers, arts organizations, or universities, they pose a problem. On the one hand, it is the job of anyone in a non-profit organization to help raise money — fail to do that, and your organization will close. When a single person offers to permanently change the future of your program, you would be derelict in your duty if you did not consider that offer. On the other hand, donors who might have ethical or criminal problems could drag the organization’s name through the mud. Worse, they might be able to force the organization itself to do something ethically questionable or even illegal.
There is a clear lesson for young academics and other up-and-coming non-profit actors in the Epstein affair: the more money potentially offered to our organizations, the more carefully we must tread. Money is power; power corrupts; and every pursuit of dollars, even for the best causes, risks infection. We can’t be large-scale non-profit fundraisers without doing serious and thorough background checks of the biggest donors; we have to question motives, and we can’t look the other way when something seems amiss. Those of us with clear hearts and honest pursuits tend to assume the best in other people. But we have to beware of those hoping to bolster their reputations, or clean their consciences, by giving away “generously” what they never deserved to have.
I’m writing to point out a potential law which should be gathering more opposition and attention in math academia: The Securing American Funding and Expertise from Adversarial Research Exploitation Act. This is an amendment to the 2026 National Defense Authorization Act which has passed the House and could be added to the final version of the bill during reconcilliation in the Senate. I’m pulling most of my information from an article in Science.
This act would ban any US scientist from receiving federal funding if they have, within the last five years, worked with anyone from China, Russia, Iran or North Korea, where “worked with” includes joint research, co-authorship on papers, or advising a foreign graduate student or postdoctoral fellow. As I said in my message to my senators, this is everyone. Every mathematician has advised Chinese graduate students or collaborated with Chinese mathematicians, because China is integrated into the academic world and is one fifth of the earth.
This obviously isn’t secret, since you can read about it in Science, but I am surprised that I haven’t heard more alarm. Obvious people to contact are your senators and your representatives. I would also suggest contacting members of the Senate armed services committee, who are in charge of reconciling the House and Senate versions of the bill.
Thanksgiving(Apologies for the ugly blog format. We had a bit of a crash, and are working to get the template back in working order.)
This year we give thanks for a crucially important idea that can mean very different things to different people: information. (We’ve previously given thanks for the Standard Model Lagrangian, Hubble’s Law, the Spin-Statistics Theorem, conservation of momentum, effective field theory, the error bar, gauge symmetry, Landauer’s Principle, the Fourier Transform, Riemannian Geometry, the speed of light, the Jarzynski equality, the moons of Jupiter, space, black hole entropy, electromagnetism, Arrow’s Impossibility Theorem, and quanta.)
“Information” is an idea that is everywhere in science and technology these days. From one angle it looks like such an obvious idea that it’s a bit startling to realize that information theory didn’t really come along until the work of Claude Shannon in the 1940s. From another, the idea has so many different shades of meaning that we shouldn’t be surprised (that’s a joke you will get in a bit) that it can be hard to understand.
Information theory is obviously an enormous subject, but we’re just giving thanks, not writing a textbook. I want to mention two ideas I find especially central. First, Shannon’s idea about relating information content to “surprisal.” Second, the very different intuitive notions of information that we get from engineering and physics.
Shannon, working at Bell Labs, was interested in the problem of how to send trustworthy signals efficiently over transatlantic cables. He was thinking about various ways to express information in a code: a set of symbols, each with a defined meaning. So a code might be an alphabet, or a set of words, or a literal cipher. And he noticed that there was a lot of redundancy in natural languages; the word “the” appears much more often in English than the word “axe,” although both have the same number of letters.
Let’s refer to each letter or symbol in a code as an “event.” Shannon’s insight was to realize that the more unlikely an event, the more information it conveyed when it was received. The statements “The Sun rose in the east this morning” and “The Sun rose in the west this morning” contain the same number of letters, but the former contains almost no information — you already were pretty sure the Sun would be rising in the east. But the latter, if obtained from a reliable source, would be very informative indeed, precisely because it was so unexpected. Clearly some kind of unprecedented astronomical catastrophe was in progress.
Imagine we can assign a probability
to every different event
. Shannon wanted a way to quantify the information content of that event, which would satisfy various reasonable-seeming axioms: most crucially, that the information content of two independent events is the sum of the individual information contents. But the joint probability of two events is the product of their individual probabilities. So the natural thing to do would be to define the information content as the logarithm of the probability; the logarithm of a product equals the sum of the individual logarithms. But you want low probability to correspond to high information content, so Shannon defined the information content (also called the self-information, or surprisal, or Shannon information) of an event to be minus the log of the probability, which by math is equal to the log of the reciprocal of the probability:
![]()
Note that probabilities are numbers between 0 and 1, and the log of such a number will be negative, with numbers closer to 0 being more negative than numbers closer to 1. So
goes from
at
to
at
. An impossible message is infinitely surprising, and therefore conveys infinite information; an inevitable message is completely unsurprising, and conveys no information at all.
From there, Shannon suggested that we could characterize how efficient an entire code was at conveying information: just calculate the average (expectation value) of the information content for all possible events. When we have a probability distribution
, the average of any function
is just the sum of the the values of the function times their respective probabilities,
. So we characterize the information content of a code via the quantity
![]()
The only question is, what to call this lovely newly-defined quantity that surely nobody had ever thought of before? Happily Shannon was friends with John von Neumann, who informed him, “You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.” So entropy it is.
Indeed, this formula is precisely that which had been put forward (unknown to Shannon) by Josiah Willard Gibbs in the 1870’s as a definition of entropy in statistical mechanics. (It is related to the definition on Ludwig Boltzmann’s tombstone,
, and Boltzmann had also suggested similar expressions to the above.) On the one hand, it seems remarkable to find precisely the same expression playing central roles in problems as disparate as sending signals across cables and watching cream mix into coffee; on the other hand, it’s a relatively simple expression and the axioms used to derive it are actually pretty similar, so perhaps we shouldn’t be surprised; on the third hand, the connection between information theory and statistical mechanics turns out to be deep and fruitful, so it’s more than just a mathematical coincidence.
But let me highlight the one aspect of the term “information” that can be sometimes confusing to people. To the engineer, a code that is maximally informative is one for which
is relatively uniform over all events
, which means
is maximal or close to it; in that case, every event will tell you something at least a little bit interesting. For them, high entropy = high information.

But to a physicist who might be asking “how much information do I have about the state of a system?”, you have more information when
is relatively narrowly concentrated around some value, rather than being all spread out. For them, high entropy = low information! Indeed, one physically-relevant notion of “information” is the “accessible information” of a system, which can be defined as
. (I talk about this a bit in my recent solo podcast on complexity.)

Perhaps we shouldn’t be so surprised that physicists and engineers posit oppositely-directed relationships between entropy and information. It’s just a reflection of the fact that “information” is so ubiquitous and has so many different uses. We should be thankful that we’re beginning to understand it so well.
It’s been over three years since my last post on this blog and I have sometimes been asked, understandably, whether the project I announced in my previous post was actually happening. The answer is yes — the grant I received from the Astera Institute has funded several PhD students and a couple of postdocs, and we have been busy. In my previous post I suggested that I would be open to remote collaboration, but that has happened much less, partly because a Polymath-style approach would have been difficult to manage while also ensuring that my PhD students would have work that they could call their own to put in their theses.
In general I don’t see a satisfactory solution to that problem, but in this post I want to mention a subproject of the main project that is very much intended to be a large public collaboration. A few months ago, a call came out from Renaissance Philanthropies saying that they were launching a $9m AI for Math Fund to spend on projects in the general sphere of AI and mathematics, and inviting proposals. One of the categories that they specifically mentioned was creating new databases, and my group submitted a proposal to create a database of what we call “structured motivated proofs,” a piece of terminology that I will explain a bit more later in just a moment. I am happy to report that our proposal was one of the 29 successful ones. Since a good outcome to the project will depend on collaboration from many people outside the group, we need to publicize it, which is precisely the purpose of this post. Below I will be more specific about the kind of help we are looking for.
The underlying thought behind this project is that AI for mathematics is being held back not so much by an insufficient quantity of data as by the wrong kind of data. (For a more general exploration of this theme, see here.) All mathematicians know, and some of us enjoy complaining about it, that it is common practice when presenting a proof in a mathematics paper, or even textbook, to hide the thought processes that led to the proof. Often this does not matter too much, because the thought processes may be standard ones that do not need to be spelt out to the intended audience. But when proofs start to get longer and more difficult, they can be hard to read because one has to absorb definitions and lemma statements that are not obviously useful, are presented as if they appeared from nowhere, and demonstrate their utility only much later in the argument.
A sign that this is a problem for AI is the behaviour one observes after asking an LLM to prove a statement that is too difficult for it. Very often, instead of admitting defeat, it will imitate the style of a typical mathematics paper and produce rabbits out of hats, together with arguments later on that those rabbits do the required job. The problem is that, unlike with a correct mathematics paper, one finds when one scrutinizes the arguments carefully that they are wrong. However, it is hard to find superficial features that distinguish between an incorrect rabbit with an incorrect argument justifying that rabbit (especially if the argument does not go into full detail) and a correct one, so the kinds of statistical methods used by LLMs do not have an easy way to penalize the incorrectness.
Of course, that does not mean that LLMs cannot do mathematics at all — they are remarkably good at it, at least compared with what I would have expected three years ago. How can that be, given the problem I have discussed in the previous paragraph?
The way I see it (which could change — things move so fast in this sphere), the data that is currently available to train LLMs and other systems is very suitable for a certain way of doing mathematics that I call guess and check. When trying to solve a maths problem, you will normally write down the routine parts of an argument without any fuss (and an LLM can do them too because it has seen plenty of similar examples), but if the problem as a whole is not routine, then at some point you have to stop and think, often because you need to construct an object that has certain properties (I mean this in a rather general way — the “object” might be a lemma that will split up the proof in a nice way) and it is not obvious how to do so. The guess-and-check approach to such moments is what it says: you make as intelligent a guess as you can and then see whether it has the properties you wanted. If it doesn’t, you make another guess, and you keep going until you get lucky.
The reason an LLM might be tempted to use this kind of approach is that the style of mathematical writing I described above makes it look as though that is what we as mathematicians do. Of course, we don’t actually do that, but we tend not to mention all the failed guesses we made and how we carefully examined why they failed, modifying them in appropriate ways in response, until we finally converged on an object that worked. We also don’t mention the reasoning that often takes place before we make the guess, saying to ourselves things like “Clearly an Abelian group can’t have that property, so I need to look for a non-Abelian group.”
Intelligent guess and check works well a lot of the time, particularly when carried out by an LLM that has seen many proofs of many theorems. I have often been surprised when I have asked an LLM a problem of the form , where
is some property that is hard to satisfy, and the LLM has had no trouble answering it. But somehow when this happens, the flavour of the answer given by the LLM leaves me with the impression that the technique it has used to construct
is one that it has seen before and regards as standard.
If the above picture of what LLMs can do is correct (the considerations for reinforcement-learning-based systems such as AlphaProof are not identical but I think that much of what I say in this post applies to them too for slightly different reasons), then the likely consequence is that if we pursue current approaches, then we will reach a plateau: broadly speaking they will be very good at answering a question if it is the kind of question that a mathematician with the right domain expertise and good instincts would find reasonably straightforward, but will struggle with anything that is not of that kind. In particular, they will struggle with research-level problems, which are, almost by definition, problems that experts in the area do not find straightforward. (Of course, there would probably be cases where an LLM spots relatively easy arguments that the experts had missed, but that wouldn’t fundamentally alter the fact that they weren’t really capable of doing research-level mathematics.)
But what if we had a database of theorems and proofs that did not hide the thought processes that lay behind the non-obvious details of the proofs? If we could train AI on a database of accounts of proof discoveries and if, having done so, we then asked it to provide similar accounts, then it would no longer resort to guess-and-check when it got stuck, because the proof-discovery accounts it had been trained on would not be resorting to it. There could be a problem getting it to unlearn its bad habits, but I don’t think that difficulty would be impossible to surmount.
The next question is what such a database might look like. One could just invite people to send in stream-of-consciousness accounts of how they themselves found certain proofs, but that option is unsatisfactory for several reasons.
To deal with these kinds of difficulties, we plan to introduce a notion of a structured motivated proof, by which we mean a proof that is generated in a very particular way that I will partially describe below. A major part of the project, and part of the reason we needed funding for it, is to create a platform that will make it convenient to input structured motivated proofs and difficult to insert the kinds of rabbits out of hats that make a proof mysterious and unmotivated. In this way we hope to gamify the task of creating the database, challenging people to input into our system proofs of certain theorems that appear to rely on “magic” ideas, and perhaps even offering prizes for proofs that contain steps that appear in advance to be particularly hard to motivate. (An example: the solution by Ellenberg and Gijswijt of the cap-set problem uses polynomials in a magic-seeming way. The idea of using polynomials came from an earlier paper of Croot, Lev and Pach that proved a closely related theorem, but in that paper it just appears in the statement of their Lemma 1, with no prior discussion apart from the words “in the present paper we use the polynomial method” in the introduction.)
I wrote about motivated proofs in my previous post, but thanks to many discussions with other members of the group, my ideas have developed quite a lot since then. Here are two ways we like to think about the concept.
I will not go into full detail about what I mean by this, but will do so in a future post when we have created the platform that we would like people to use in order to input proofs into the database. But the basic idea is that at any one moment one is in a certain state, which we call a proof-discovery state, and there will be a set of possible moves that can take one from the current proof-discovery state to a new one.
A proof-discovery state is supposed to be a more formal representation of the state one is in when in the middle of solving a problem. Typically, if the problem is difficult, one will have asked a number of questions, and will be aware of logical relationships between them: for example, one might know that a positive answer to Q1 could be used to create a counterexample to Q2, or that Q3 is a special case of Q4, and so on. One will also have proved some results connected with the original question, and again these results will be related to each other and to the original problem in various ways that might be quite complicated: for example P1 might be a special case of Q2, which, if true would reduce Q3 to Q4, where Q3 is a generalization of the statement we are trying to prove.
Typically we will be focusing on one of the questions, and typically that question will take the form of some hypotheses and a target (the question being whether the hypotheses imply the target). One kind of move we might make is a standard logical move such as forwards or backwards reasoning: for example, if we have hypotheses of the form and
, then we might decide to deduce
. But things get more interesting when we consider slightly less basic actions we might take. Here are three examples.
This is a surprisingly useful way to conceive of what we are talking about, especially as it relates closely to what I was talking about earlier: imposing a standard form on motivated proofs (which is why we call them “structured” motivated proofs) and gamifying the process of producing them.
The idea is that a structured motivated proof is one that can be generated using an interface (which we are in the process of creating — at the moment we have a very basic prototype that has a few of the features we will need, but not yet the more interesting ones) that has one essential property: the user cannot type in data. So what can they do? They can select text that is on their screen (typically mathematical expressions or subexpressions), they can click buttons, choose items from drop-down menus, and accept or reject “obvious” suggestions made to them by the interface.
If, for example, the current goal is an existential statement , then typing in a formula that defines a suitable
is not possible, so instead one must select text or generate new text by clicking buttons, choosing from short drop-down menus, and so on. This forces the user to generate
, which is our proxy for showing where the idea of using
came from.
Broadly speaking, the way the prototype works is to get an LLM to read a JSON object that describes the variables, hypotheses and goals involved in the proof state in a structured format, and to describe (by means of a fairly long prompt) the various moves it might be called upon to do. Thus, the proofs generated by the system are not formally verified, but that is not an issue that concerns us in practice since there will be a human in the loop throughout to catch any mistakes that the LLM might make, and this flexibility may even work to our advantage to better capture the fluidity of natural-language mathematics.
There is obviously a lot more to say about what the proof-generating moves are, or (approximately equivalently) what the options provided by a point-and-click system will be. I plan to discuss that in much more detail when we are closer to having an interface ready, the target for which is the end of this calendar year. But the aim of the project is to create a database of examples of proofs that have been successfully generated using the interface, which can then be used to train AI to play the generate-structured-motivated-proof game.
There are several tasks that will need doing once the project gets properly under way. Here are some of the likely ones.
If you think you might be interested in any of these roles, please feel free to get in touch. Probably the hardest recruitment task for us will be identifying the right people with the right mixture of mathematical knowledge and software engineering skills to help us turn the platform into a well-designed web-based one that is convenient and pleasurable to use. If you think you might be such a person, or if you have a good idea for how we should go about finding one, we would be particularly interested to hear from you.
In a future post, I will say more about the kinds of moves that our platform will allow, and will give examples of non-motivated proofs together with how motivated versions of those proofs can be found and entered using the platform (which may involve a certain amount of speculation about what the platform will end up looking like).
In one way, our “moves” can be regarded as tactics of a kind. However, some of the moves we will need are difficult to implement in conventional proof assistants such as Lean. In parallel with the work described above, we hope to create an interface to Lean that would allow one to carry out proof-discovery moves of the kind discussed above but with the proof-discovery states being collections of Lean proof states. Members of my group have already been working on this and have made some very interesting progress, but there is some way to go. However, we hope that at some point (and this is also part of the project pitched to the AI for Math Fund) we will have created another interface that will have Lean working in the background, so that it will be possible to generate motivated proofs that will be (or perhaps it is better to say include) proofs in Lean at the same time.
Another possibility that we are also considering is to use the output of the first platform (which, as mentioned above, will be fairly formal, but not in the strict sense of a language such as Lean) to create a kind of blueprint that can then be autoformalized automatically. Then we would have a platform that would in principle allow mathematicians to search for proofs while working on their computers without having to learn a formal language, with their thoughts being formalized as they go.
I spent the day at the NSBP / NSHP meeting in San José. My favorite session of the day was the morning astro session, which was entirely about brown dwarfs. I learned a lot in a very short time. Caprice Phillips (UCSC) introduced the session with an introduction to the scientific and technical questions in play. She put a lot of emphasis on using binaries and clusters to put detailed abundance ratios onto substellar objects. This was what I expected: I thought (walking in to this session) that all known abundance ratios for brown dwarfs were from such kinds of studies. I learned different (keep reading).
Gabriel Munoz Zarazua (SFSU) followed by showing spectra from M-dwarfs, brown dwarfs, and Jupiter. It definitely looks like a sequence. He does spectral fitting (what they call, in this business, retrievals). It looks like he is getting very good, somewhat precise, abundance ratios for the photospheres of substellar objects! I asked more about this in the question period, and apparently I am way behind the times (Emily Rauscher, Michigan, helpfully pointed this out to me): Now brown-dwarf photosphere models are so good, they can be used to measure abundances, and pretty well.
I also learned in this session (maybe from Jorge Sanchez, ASU, or maybe from Efrain Alvarado, SFSU) that there is a very strong mass–abundance relation in the Solar System. That is, we don't expect, if brown dwarfs form the way planets do, that the detailed abundances of the brown dwarfs will match exactly the detailed abundances of the primary stars. But now we are really in a position to test that. Sanchez showed that we can get, from even photometry, abundances for substellar objects in the Milky Way halo. Again, totally new to me! And he finds metallicities at or below −3. Alvarado showed data on an amazing system J1416, which is an L–T binary with no stellar companion. Apparently it is the only known completely substellar binary.
Event with Professor Daniel Whiteson on Monday November 17 at 7pmNext Monday, November 17th at 7pm, I’ll be at the Harvard Bookstore with particle physicist and author Daniel Whiteson. Professor Whiteson and his co-author Andy Warner have a nice new book, for the general science-aware reader, exploring an age-old and unanswered question: how universal is the knowledge and understanding that we call “physics”? How much of modern physics is actually telling us about the universe, and how much of it is created by, or an accident of, the humans who have helped bring it about?
For instance, if we started all over again and reran history from scratch, would the physics (and science more generally) of this re-run culture look much like our own, or might it turn out very differently? If another culture on Earth had had time to develop highly mature science (or something like it) in its own direction, independent of Western Europe’s influence, how different might that science be? (Indeed, would our word “science” even be translatable into their worldview?) Or if we encountered aliens with far greater understanding of the universe than we have, would we be able to recognize, parse, grok, appreciate, comprehend, and/or otherwise make sense of their notions of scientific knowledge?
Whiteson and his co-author, wanting to write a popular book rather than a scholarly one, and desiring nevertheless to take on these serious and challenging intellectual questions, have set their focus mostly on the aliens, accompanied by amusing cartoons and a generous helping of dad jokes (hey, some dad jokes are actually very funny.) They’re looking for a broad audience, and hopefully they will get it. But don’t let the light-hearted title (“Do Aliens Speak Physics?“) or the charmingly goofy cover fool you: this book might well make you laugh, but I guarantee it will make you think. Whether you’re just curious about science or you’ve been doing science yourself for years, I suspect that, within the vast array of problems and issues that are raised in this broad-minded book, there will be some you’ve never thought of.
Among scientists and philosophers, there are some who believe that any aliens with the capacity to reach the Earth will obviously “speak physics” — that math and physics float above contingencies of culture and species, and will easily be translated from any intelligent creature to any other. But are they perhaps flying too high? It’s clear that Whiteson and Warner are aiming to poke some holes — lots of holes —- in their hot-air balloon, and to do so in a way that a wide variety of readers can appreciate and enjoy.
I tend to agree with Whiteson on a lot of these issues, but that won’t stop me from asking him some tough questions. You can ask him some tough questions too, if you like — just come to the Harvard Bookstore at 7:00 on Monday and join the conversation!

I started a tradition a little while back where every year we have a special departmental colloquium entitled "The Nobel Prize in Physics: Who/What/Why". This year my job in finding speakers was made easier by having 2/3 of this years newly-minted Nobel Prize winners in physics in the Department! (Michel Devoret and John Martinis.) So our room was a bit more well-attended than normal...(hundreds and hundreds rather than dozens and dozens). Here is a recording of the event, which I was delighted to host, and there's a celebration afterwards too. (Pls share widely!)
[...] Click to continue reading this post
The post Nobel Prize in Physics 2025: Who/What/Why appeared first on Asymptotia.
Recently I had to update Mathematica on my laptop and after having solved the challenges of the license manager that keeps looking different every time I have to use it, I learned that Mathematica 14 can now officially work with finite fields.
This reminded me that for a while I wanted to revive an old project that had vanished together with the hard drive of some old computer: Holosplit. So, over the last two days and with the help of said version of Mathematica I did a complete rewrite which you can now find on Github.
It consists of two C programs "holosplit" and "holojoin". To the first you give a positive integer \(N\) and a file and it spits out a new file ("fragment") that is roughly \(1/N\) of the size. Every time you do that you obtain a new random fragment.
The later you give any collection of \(N\) of these fragments and it reproduces the original file. So you can for example distribute a file over 10 people such that when any 3 of them work together, they can recover the original.
How does it work? I uses the finite field \(F\) of \(2^3=256\) elements (in the Github repository, there is also a header file that implements arithmetic in \(F\) and matrix operations like product and inverse over it). Each time, it is invoked, it picks a random vector \(v\in F^N\) and writes it to the output. Then it reads \(N\) bytes from the file at a time which it also interprets as a vector \(d\in F^N\). It then outputs the byte that corresponds to the scalar product \(v\cdot d\).
To reassemble the file, holojoin takes the \(N\) files with its random vectors \(v_1,\ldots,v_N\) and interprets those as the rows of a \(N\times N\) matrix \(A\). With probability
$$\frac{\prod_{k=1}^N \left(256^N-k\right)}{(256)^{N^2}}$$
which exponentially in \(N\) approaches 1 this matrix is invertible (homework: why?). So we can read one byte from each file, assemble those into yet another vector \(e\in F^N\) and recover
$$d=A^{-1}e.$$
Besides the mathematics, it also poses philosophical/legal questions: Consider for example the original file is copyrighted, for example an mp3 or a video. The fragments are clearly derived works. But individually, they do not contain the original work, without sufficiently many other fragments they are useless (although not in a cryptographic sense). So by publishing one fragment, I do not provide access to the original work. What if others publish other fragments? Then my fragment could be the last remaining one that was missing. If there are more, any individual fragment is redundant so publishing it strictly speaking does not provide new information.
Photo albumsPeter’s photos: https://www.icloud.com/sharedalbum/#B275oqs3qKSZvQ
Screenshots: https://www.icloud.com/sharedalbum/#B27532ODWjIQb9
Climbing book launch: https://www.icloud.com/sharedalbum/#B27GWZuqDGnuOyN
Salisbury waters: https://www.icloud.com/sharedalbum/#B275qXGF1JQFkx
Christmas with Ash: https://www.icloud.com/sharedalbum/#B27G6XBubAhoT6
Hosin BBQ duck: https://www.icloud.com/sharedalbum/#B27GY8gBYG3b5mD
Hawks Nest to Smiths Lake: https://www.icloud.com/sharedalbum/#B2759UlCqSH5bE
Europe & Alps: https://www.icloud.com/sharedalbum/#B275ON9t3W0lu
Point Perpendicular: https://www.icloud.com/sharedalbum/#B27GqkRUiGivXD2
Newnes canyoning: https://www.icloud.com/sharedalbum/#B27GfnH8tgHSmX
Coffs Harbour to Yamba: https://www.icloud.com/sharedalbum/#B27J0DiRHJKuuWr
Wendy Bruere Christmas (2020): https://www.icloud.com/sharedalbum/#B27G4TcsmGoHysj
Six Foot Track: https://www.icloud.com/sharedalbum/#B2753qWtHZA9EX
Kosciusko to Kiandra: https://www.icloud.com/sharedalbum/#B27GgZLKuGaewVm
Camping food: https://www.icloud.com/sharedalbum/#B27GtnIORgbmHu
The Aardvark: https://www.icloud.com/sharedalbum/#B275VaUrzvmAiT
Kangaroo Valley kayaking: https://www.icloud.com/sharedalbum/#B27JEsNWnJrCpi0
Claustral canyon: https://www.icloud.com/sharedalbum/#B2755Z2WMOTpsk
Budawang: https://www.icloud.com/sharedalbum/#B27GDdyTvGvpINL
Mother’s Day panoramas (2021): https://www.icloud.com/sharedalbum/#B27GFssfGG9WmJP
Point Perpendicular & Nowra: https://www.icloud.com/sharedalbum/#B27GRMtznGPdeuZ
Blood moon: https://www.icloud.com/sharedalbum/#B27GdIshaG8NgGX
La Perouse to Coogee: https://www.icloud.com/sharedalbum/#B275aVbMK4h7qo
Canberra ASPI launch: https://www.icloud.com/sharedalbum/#B27GQOeMmGj4Zcv
Edible foraging: https://www.icloud.com/sharedalbum/#B275ejO179Si0N
Sydney to Wollongong: https://www.icloud.com/sharedalbum/#B275M7GFPUasMe
Album for Dad, Father’s Day (2021): https://www.icloud.com/sharedalbum/#B2752plgjnnkUe
Vaucluse (with Cheryl, Nestor & Wendy): https://www.icloud.com/sharedalbum/#B275CmvAS4uA0Z
Bouddi National Park: https://www.icloud.com/sharedalbum/#B27GdPblXG8WdOo
Tom Thumb (the 2nd): https://www.icloud.com/sharedalbum/#B275aDWbr4CN2w
Eden to Victoria: https://www.icloud.com/sharedalbum/#B27GJDfWGArX8l
Wendy’s book launch (the 2nd): https://www.icloud.com/sharedalbum/#B27GIcgc2G7h08y
Mark & Pat Bruere visit Sydney: https://www.icloud.com/sharedalbum/#B27G0ehgLbyWyg
New Years Eve climb (2021): https://www.icloud.com/sharedalbum/#B27Ju8EH6JOZxmU
Newnes Canyoning (2022): https://www.icloud.com/sharedalbum/#B275BydzFU0GZ8
Royal National Park (2022): https://www.icloud.com/sharedalbum/#B27GlxzuqGVI5nE
Peter & Wendy: https://www.icloud.com/sharedalbum/#B27Gf693ZG52tfd
Book photo shoots: too rude…
Wendy & Peter’s mushroom trip: https://www.icloud.com/sharedalbum/#B27GrhkPxG27So8
Post-mushroom hike: https://www.icloud.com/sharedalbum/#B27GdFryYG8i3Ur
Wendy Kalymnos favourites: https://www.icloud.com/sharedalbum/#B27JqstnBJEXkH2
Wendy Frenchmans screenshots: https://www.icloud.com/sharedalbum/#B27Jr1PPdJpd7Dq
Instagram: https://www.icloud.com/sharedalbum/#B27GzFCC1Gb4tqr
Haute route: https://www.icloud.com/sharedalbum/#B27J8GySPJtWoQ1
Kim’s KKKalendar: https://www.icloud.com/sharedalbum/#B275fk75vIL0sH
Frenchmans Cap Wild: https://www.icloud.com/sharedalbum/#B27G4VTwGGoFBkz
Photoshoot with Zixin: https://www.icloud.com/sharedalbum/#B27GPCdxkGKPkM4
Wendy birthday hike (2023): https://www.icloud.com/sharedalbum/#B27GWBC59GnHpQW
Bateman’s Bay to Bawley Point: https://www.icloud.com/sharedalbum/#B27JsHvHoJ8bxWf
Stockton Sand dunes (2023): https://www.icloud.com/sharedalbum/#B27GVfZ2vGloFZV
Wendy book launch (2023): https://www.icloud.com/sharedalbum/#B27J058xyJR4IBM
Dolomites (2023): https://www.icloud.com/sharedalbum/#B0Z5kuVsbGJUzKO
Mount Arapiles: https://www.icloud.com/sharedalbum/#B275GH8Mq8Uh2X
Mount Solitary loop: https://www.icloud.com/sharedalbum/#B275nhQST2mETE
Klaus Hanz Franz Rohde Kunst: https://www.icloud.com/sharedalbum/#B27GqQrCLGiY3vb
Klaus Rohde funeral slideshow: https://www.icloud.com/sharedalbum/#B27GDZLe8GXP58K
Dad (old, B&W): https://www.icloud.com/sharedalbum/#B27GLLXGLJ5mbT2
Klaus & Ursula wedding: https://www.icloud.com/sharedalbum/#B275cLqfN7154g
Test Greece: https://www.icloud.com/sharedalbum/#B27Jq4WnLJ6JMNd
From Will Skea (Alps): https://www.icloud.com/sharedalbum/#B27JHciePJFwacG
From Will Skea (Frenchmans Cap): https://www.icloud.com/sharedalbum/#B275ZhN2v3EVq6
From Will Skea (Arapiles): https://www.icloud.com/sharedalbum/#B27JPrgBGJu3BTD
Coffs Harbour to Yamba (2): https://www.icloud.com/sharedalbum/#B27GFqhgJG9LHgT
Mark magic show (2021): https://www.icloud.com/sharedalbum/#B27G60dj6ARCvd
Wendy Christmas present (2020): https://www.icloud.com/sharedalbum/#B275FrPQ6GxvRu
AHS 25 year reunion: https://www.icloud.com/sharedalbum/#B275O3DjHUvSv
WhatsApp: https://www.icloud.com/sharedalbum/#B275tzEA5fX1nc
Armidale High School: https://www.icloud.com/sharedalbum/#B27GnbeumG4PnAF
Book photos for Mum & Dad: https://www.icloud.com/sharedalbum/#B27Gtec4XQkASe
Miscellaneous: https://www.icloud.com/sharedalbum/#B27Gq6kMgGKn7GR
Three Capes Trail (2022): https://www.icloud.com/sharedalbum/#B27G7HOIlGrDUGZ
Childhood computer programming: https://www.icloud.com/sharedalbum/#B275fu2MutDU8N
Magic with Mark in Maroubra: https://www.icloud.com/sharedalbum/#B27Gv6DhEGD9U3G
Photoshoot with Zixin (2024): https://www.icloud.com/sharedalbum/#B27GCATCnJGoRfW
Butt Crack (2021): https://www.icloud.com/sharedalbum/#B275VtHQfMv0zw
Greece photos new (edited to remove photos from wrong album): https://www.icloud.com/sharedalbum/#B27GY3uThGoBcGj
Singapore (all combined): https://www.icloud.com/sharedalbum/#B275qsTcwJKJjl
Hong Kong (transit): https://www.icloud.com/sharedalbum/#B2759v1AbS8Hve
Taiwan: https://www.icloud.com/sharedalbum/#B27GQD2D7Gw0hAp
India (combined): https://www.icloud.com/sharedalbum/#B27Gtue8VQy83g
Freycinet: https://www.icloud.com/sharedalbum/#B27G5VpecGE5Tbg
Triglav: https://www.icloud.com/sharedalbum/#B275MbK9Vy8erz
Shared with me: https://www.icloud.com/sharedalbum/#B27GGXqixzPOrm
Mount Wellington climbing: https://www.icloud.com/sharedalbum/#B27Gd59qiG8Kjy4
New Zealand combined (2004): https://www.icloud.com/sharedalbum/#B27GIZ8BIGNN5jy
New Zealand combined (2005): https://www.icloud.com/sharedalbum/#B27GcuRfIGFVIcL
Yea: https://www.icloud.com/sharedalbum/#B27GZYbYHGhFIir
Mount Pleasant: https://www.icloud.com/sharedalbum/#B275Iy2hC0JTTL
D’Aguilar: https://www.icloud.com/sharedalbum/#B27Gh7fzTGZBosS
Bali (2001): https://www.icloud.com/sharedalbum/#B27G1qNHBGOTbIr
Samba Ninjas: https://www.icloud.com/sharedalbum/#B27GG34bAzqQ0v
Armidale (misc): https://www.icloud.com/sharedalbum/#B27GSkLVwGyobbX
Emma’s party (2008): https://www.icloud.com/sharedalbum/#B275S2ms99Zyby
Goettingen (2011): https://www.icloud.com/sharedalbum/#B27JIrbT3Jsgxhd
South Coast track: https://www.icloud.com/sharedalbum/#B27G58NWBG6QyN7
Minsk (2006): https://www.icloud.com/sharedalbum/#B27G3JpSBGX1UkQ
Baden-Baden (2019): https://www.icloud.com/sharedalbum/#B27595X5HTVzJr
Berlin (combined): https://www.icloud.com/sharedalbum/#B27JqWzChJ6qizD
Switzerland (combined): https://www.icloud.com/sharedalbum/#B275zXwoYGJ6HMF
Italy highlights: https://www.icloud.com/sharedalbum/#B27G47PHQGoJium
Germany (misc): https://www.icloud.com/sharedalbum/#B275hPMfYGu5xVJ
Garmisch (2022): https://www.icloud.com/sharedalbum/#B27GFsbvlG9Xrr6
Germany (2019): https://www.icloud.com/sharedalbum/#B27G6Mn98G56Ncb
Garmisch (2006): https://www.icloud.com/sharedalbum/#B27J5lIdKGLC9KG
Baden-Baden (2005): https://www.icloud.com/sharedalbum/#B275sWRpHHQkt9
Berlin (2005): https://www.icloud.com/sharedalbum/#B27GgOQtrGjQrpH
Zugspitze (2005): https://www.icloud.com/sharedalbum/#B27G81mNdGcApGt
Amsterdam, Bristol (2006): https://www.icloud.com/sharedalbum/#B275B9SRzyBjlH
Baden-Baden (2006): https://www.icloud.com/sharedalbum/#B275eD9V79I2XR
Berlin (2006): https://www.icloud.com/sharedalbum/#B275toRf1fH8MD
Berlin, Jena (2007): https://www.icloud.com/sharedalbum/#B27GTI3fvGVgNit
Erlangen (2006): https://www.icloud.com/sharedalbum/#B27JrotZ2JpMb0i
Garmisch (2010): https://www.icloud.com/sharedalbum/#B27JPJPSiJurzNg
Germany (2010): https://www.icloud.com/sharedalbum/#B275FhYPQP650
Stuttgart (2006): https://www.icloud.com/sharedalbum/#B27GmitydGVVaZh
Changi (2019): https://www.icloud.com/sharedalbum/#B27GnmlKoG4JHpX
Japan (2007): https://www.icloud.com/sharedalbum/#B275AerZbG6FxVL
Japan (2012): https://www.icloud.com/sharedalbum/#B27GjBjobGg6PUa
Miscellaneous (including Japan 2013): https://www.icloud.com/sharedalbum/#B27GTpbybGySbE8
Currumbin & Tugin (2021): https://www.icloud.com/sharedalbum/#B275vBKZ4xH9X6
Brisbane (2021): https://www.icloud.com/sharedalbum/#B275YHsSjxQnm0
Weed in Byron (26/6/2025): https://www.icloud.com/sharedalbum/#B275Q2ydoGsQ4O5
Weed in Byron 2: https://www.icloud.com/sharedalbum/#B27GQDYhLGwsuY4