To start on a somber note: those of us at UT Austin are in mourning this week for Savitha Shan, an undergrad double major here in economics and information systems, who was murdered over the weekend by an Islamist terrorist who started randomly shooting people on Sixth Street, apparently angry about the war in Iran. Two other innocents were also killed.
As it happens, these murders happened just a few hours after the end of my daughter’s bat mitzvah, and within walking distance of the venue. The bat mitzvah itself was an incredibly joyful and successful event that consumed most of my time lately, and which I might or might not say more about—the nastier the online trolls get, the more I need to think about my family’s privacy.
Of all the many quantum computing podcasts/interviews I’ve done recently, I’m probably happiest with this one, with Yuval Boger of QuEra. It covers all the main points about where the hardware currently is, the threat to public-key cryptography, my decades-long battle against quantum applications hype, etc. etc., and there’s even an AI-created transcript that eliminates my verbal infelicities!
A month ago, I blogged about “The Time I Didn’t Meet Jeffrey Epstein” (basically, because my mom warned me not to). Now the story has been written up in Science magazine, under the clickbaity headline “Meet Three Scientists Who Said No to Epstein.” (Besides yours truly, the other two scientists are friend-of-the-blog Sean Carroll, whose not-meeting-Epstein story I’d already heard directly from him, and David Agus, whose story I hadn’t heard.)
To be clear: as I explained in my post, I never actually said “no” to Epstein. Instead, based on my mom’s advice, I simply failed to follow up with his emissary, to the point where no meeting ever happened.
Anyway, ever since Science ran this story and it started making the rounds on social media, my mom has been getting congratulatory messages from friends of hers who saw it!
I’ve been a huge fan of the philosopher-novelist Rebecca Newberger Goldstein ever since I read her celebrated debut work, The Mind-Body Problem, back in 2005. Getting to know Rebecca and her husband, Steven Pinker, was a highlight of my last years at MIT. So I’m thrilled that Rebecca will be visiting UT Austin next week to give a talk on Spinoza, related to her latest book The Mattering Instinct (which I’m reading right now), and hosted by me and my colleague Galen Strawson in UT’s philosophy department. More info is in the poster below. If you’re in Austin, I hope to see you there!
The 88-year-old Donald Knuth has published a 5-page document about how Claude was able to solve a tricky graph theory problem that arose while he was working on the latest volume of The Art of Computer Programming—a series that Knuth is still writing after half a century. As you’d expect from Knuth, the document is almost entirely about the graph theory problem itself and Claude’s solution to it, eschewing broader questions about the nature of machine intelligence and how LLMs are changing life on Earth. To anyone who’s been following AI-for-math lately, the fact that Claude now can help with this sort of problem won’t come as a great shock. The virality is presumably because Knuth is such a legend that to watch him interact productively with an LLM is sort of like watching Leibniz, Babbage, or Turing do the same.
John Baez is a brilliant mathematical physicist and writer, who was blogging about science before the concept of “blogging” even existed, and from whom I’ve learned an enormous amount. But regarding John’s quest for the past 15 years — namely, to use category theory to help solve the climate crisis (!) — I always felt like the Cookie Monster would, with equal intellectual justification, say that the key to arresting climate change was for him to eat more Oreos. Then I read this Quanta article on the details of Baez’s project, and … uh … I confess it failed to change my view. Maybe someday I’ll understand why it’s better to say using category theory what I would’ve said in a 100x simpler way without category theory, but I fear that day is not today.
Although I have long retired from serious chess tournaments (they take too much time, a luxury I do not have anymore, even more so now that I have two infants to help grow!), I insist on playing online blitz on chess.com, with alternating fortunes. My Elo rating hovers in the 2200–2300 range, signalling that I still have my wits about me (I figure it is a very good way to keep watch on my mental capabilities: if Alzheimer’s lurks, I will spot it early).
Vance Honeycutt is 22 years old and already widely considered a bust. Lots of power in college, the Orioles’ first-round draft pick in 2024, but he has a gigantic hole in his swing and spent 2025 striking out all the time, absurdly much, so much that people figured he was simply not going to learn to hit.
He has come to the plate four times in spring training and has hit four home runs. The last one went 471 feet. The sample size, she is, how we say, small. But I’m glad the guy is getting this moment of glory, at least, even if it ends up being short.
Natalie interviewed me twice for this, and it also features Matteo Capucci, Brendan Fong, Bob Coecke, David Spivak, Amar Hadzihasanovic, Nathaniel Osgood and Tom Leinster.
I’m glad that it explains a bit about ‘green mathematics’ and my struggle to do mathematics that will help the world. I coined that term exactly 15 years ago here on this blog.
We are now at an exciting point in our process of developing quantum computers and understanding their computational power: It has been demonstrated that quantum computers can outperform classical ones (if you buy my argument from Parts 1 and 2 of this mini-series). And it has been demonstrated that quantum fault-tolerance is possible for at least a few logical qubits. Together, these form the elementary building blocks of useful quantum computing.
And yet: the devices we have seen so far are still nowhere near being useful for any advantageous application in, say, condensed-matter physics or quantum chemistry, which is where the promise of quantum computers lies.
So what is next in quantum advantage?
This is what this third and last part of my mini-series on the question “Has quantum advantage been achieved?” is about.
The 100 logical qubits regime
I want to have in mind the regime in which we have 100 well-functioning logical qubits, so 100 qubits on which we can run maybe 100,000 gates.
Building devices operating in this regime will require thousands of physical qubits and is therefore well beyond the proof-of-principle quantum advantage and fault-tolerance experiments that have been done. At the same time, it is (so far) still one or more orders of magnitude away from any of the first applications, such as simulating, say, the Fermi-Hubbard model or breaking cryptography. In other words, it is a qualitatively different regime from the early fault-tolerant computations we can do now. And yet, there is no clear picture of what we can and should do with such devices.
The next milestone: classically verifiable quantum advantage
In this post, I want to argue that a key milestone we should aim for in the 100 logical qubit regime is classically verifiable quantum advantage. Achieving this will not only require the jump in quantum device capabilities but also finding advantage schemes that allow for classical verification using these limited resources.
Why is it an interesting and feasible goal and what is it anyway?
To my mind, the biggest weakness of the RCS experiments is the way they are verified. I discussed this extensively in the last posts: verification uses XEB, which can be classically spoofed and which is only actually measured in the classically simulatable regime. Really, in a quantum advantage experiment I would want an efficient procedure that, when run, convinces us beyond any reasonable doubt that the computation must have been performed by a quantum computer. In what I think of as classically verifiable quantum advantage, a (classical) verifier would come up with challenge circuits which they would then send to a quantum server. These would be designed in such a way that once the server returns classical samples from those circuits, the verifier can convince herself that the server must have run a quantum computation.
The theoretical computer scientist’s cartoon of verifying a quantum computer.
This is the jump from a physics-type experiment (the sense in which advantage has been achieved) to a secure protocol that can be used in settings where I do not want to trust the server and the data it provides me with. Such security may also allow a first application of quantum computers: to generate random numbers whose genuine randomness can be certified—a task that is impossible classically.
Here is the problem: On the one hand, we do know of schemes that allow us to classically verify that a computer is quantum and to generate random numbers, so-called cryptographic proofs of quantumness (PoQ). Proofs of quantumness are highly reliable schemes in that their security relies on well-established cryptography. Their big drawback is that they require a large number of qubits and operations, comparable to the resources required for factoring. On the other hand, the computations we can run in the advantage regime, basically random circuits, are very resource-efficient but not verifiable.
The 100-logical-qubit regime lies right in the middle, and it seems more than plausible that classically verifiable advantage is possible in this regime. The theory challenge ahead of us is to find it: a quantum advantage scheme that is very resource-efficient like RCS and also classically verifiable like proofs of quantumness.
To achieve verifiable advantage in the 100-logical-qubit regime we need to close the gap between random circuit sampling and proofs of quantumness.
With this in mind, let me spell out some concrete goals that we can achieve using 100 logical qubits on the road to classically verifiable quantum advantage.
1. Demonstrate fault-tolerant quantum advantage
Before we talk about verifiable advantage, the first experiment I would like to see is one that combines the two big achievements of the past years, and shows that quantum advantage and fault-tolerance can be achieved simultaneously. Such an experiment would be similar in type to the RCS experiments, but run on encoded qubits with gate sets that match that encoding. During the computation, noise would be suppressed by correcting for errors using the code. In doing so, we could reach the near-perfect regime of RCS as opposed to the finite-fidelity regime that current RCS experiments operate in (as I discussed in detail in Part 2).
Random circuits with a quantum advantage that are particularly easy to implement fault-tolerantly are so-called IQP circuits. In those circuits, the gates are controlled-NOT gates and diagonal gates, so rotations $Z(\theta) = \mathrm{diag}(1, e^{i\theta})$, which just add a phase to a basis state as $Z(\theta)|x\rangle = e^{i\theta x}|x\rangle$. The only “quantumness” comes from the fact that each input qubit is in the superposition state $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, and that all qubits are measured in the Hadamard ($X$) basis. This is an example of an IQP circuit:
An IQP circuit starts from the all-$|0\rangle$ state, applies a Hadamard transform, followed by IQP gates (in this case layers of $Z$-rotations alternating with CNOT gates), and ends in a measurement in the Hadamard basis.
As it so happens, IQP circuits are already really well understood since one of the first proposals for quantum advantage was based on IQP circuits (VerIQP1), and for a lot of the results in random circuits, we have precursors for IQP circuits, in particular, their ideal and noisy complexity (SimIQP). This is because their near-classical structure makes them relatively easy to study. Most importantly, their outcome probabilities are simple (but exponentially large) sums over phases that can just be read off from which gates are applied in the circuit and we can use well-established classical techniques like Boolean analysis and coding theory to understand those.
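To make the phase-sum structure concrete, here is a brute-force sketch (my own toy code in Python/NumPy, not from any of the cited papers) that computes the output distribution of a small IQP circuit directly from the phases $\theta(x)$ applied by its diagonal part, via the amplitude $2^{-n} \sum_x e^{i\theta(x)} (-1)^{x \cdot y}$:

```python
import numpy as np
from itertools import product

def iqp_output_distribution(n, diag_phases):
    """Brute-force output distribution of an n-qubit IQP circuit.

    diag_phases: function mapping a bit string x (tuple of 0/1 entries) to
    the phase theta(x) that the diagonal part D applies to |x>.
    The circuit is H^n . D . H^n on |0...0>, measured in the computational
    basis, so the amplitude of outcome y is
    2^{-n} * sum_x exp(i*theta(x)) * (-1)^{x.y}.
    """
    xs = list(product([0, 1], repeat=n))
    amps = np.array([np.exp(1j * diag_phases(x)) for x in xs]) / 2**n
    probs = {}
    for y in xs:
        signs = np.array([(-1) ** sum(xi * yi for xi, yi in zip(x, y))
                          for x in xs])
        probs[y] = abs(np.dot(amps, signs)) ** 2
    return probs

# toy example: theta(x) = (pi/4) * (Hamming weight of x), i.e. a T gate
# acting on each qubit as the diagonal part
dist = iqp_output_distribution(3, lambda x: (np.pi / 4) * sum(x))
```

Of course, this classical brute force is exactly what becomes infeasible beyond a few dozen qubits; the point is only to show how the outcome probabilities can be read off from which gates are applied.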
IQP gates are natural for fault-tolerance because there are codes in which all the operations involved can be implemented transversally, meaning that they only require parallel physical single- or two-qubit gates to implement a logical gate. This is in stark contrast to universal circuits, which require resource-intensive fault-tolerant protocols. Running computations with IQP circuits would also be a step towards running real computations, in that they can involve structured components such as cascades of CNOT gates and the like, which show up all over fault-tolerant constructions of algorithmic primitives such as arithmetic or phase-estimation circuits.
Our concrete proposal for an IQP-based fault-tolerant quantum advantage experiment in reconfigurable-atom arrays is based on interleaving diagonal gates and CNOT gates to achieve super-fast scrambling (ftIQP1). A medium-size version of this protocol was implemented by the Harvard group (LogicalExp) but with only a bit more effort, it could be performed in the advantage regime.
In those proposals, verification will still suffer from the same problems of standard RCS experiments, so what’s up next is to fix that!
2. Closing the verification loophole
I said that a key milestone for the 100-logical-qubit regime is to find schemes that lie in between RCS and proofs of quantumness in terms of their resource requirements but at the same time allow for more efficient and more convincing verification than RCS. Naturally, there are two ways to approach this space—we can make quantum advantage schemes more verifiable, and we can make proofs of quantumness more resource-efficient.
First, let’s focus on the former approach and set a more moderate goal than full-on classical verification of data from an untrusted server. Are there variants of RCS that allow us to efficiently verify that finite-fidelity RCS has been achieved if we trust the experimenter and the data they hand us?
2.1 Efficient quantum verification using random circuits with symmetries
Indeed, there are! I like to think of the schemes that achieve this as random circuits with symmetries. A symmetry is an operator $S$ such that the outcome state $|\psi\rangle$ of the computation (or some intermediate state) is invariant under the symmetry, so $S|\psi\rangle = |\psi\rangle$. The idea is then to find circuits that exhibit a quantum advantage and at the same time have symmetries that can be easily measured, say, using only single-qubit measurements or a single gate layer. Then, we can use these measurements to check whether or not the pre-measurement state respects the symmetries. This is a test for whether the quantum computer prepared the correct state, because errors or deviations from the true state would violate the symmetry (unless they were adversarially engineered).
In random circuits with symmetries, we can thus use small, well-characterized measurements whose outcomes we trust to probe whether a large quantum circuit has been run correctly. This is possible in a scenario I call the trusted experimenter scenario.
The trusted experimenter scenario
In this scenario, we receive data from an actual experiment in which we trust that certain measurements were actually and correctly performed.
I think of random circuits with symmetries as introducing measurements in the circuit that check for errors.
Here are some examples of random circuits with symmetries, which allow for efficient verification of quantum advantage in the trusted experimenter scenario.
Graph states. My first example is locally rotated graph states (GStates). These are states prepared by CZ gates acting, according to the edges of a graph, on an initial all-$|+\rangle$ state; a layer of single-qubit $Z$-rotations is then performed before a measurement in the $X$ basis. (Yes, this is also an IQP circuit.) The symmetries of this circuit are locally rotated Pauli operators, and can therefore be measured using only single-qubit rotations and measurements. What is more, these symmetries fully determine the graph state. Determining the fidelity then just amounts to averaging the expectation values of the symmetries, which is so efficient you can even do it in your head. In this example, measuring the outcome state to obtain hard-to-reproduce samples and measuring the symmetries require two different (single-qubit) measurement bases.
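To illustrate how cheap this kind of verification is, here is a small NumPy sketch of my own (a toy calculation, not code from GStates, and with no local rotations) that builds a graph state, forms its standard symmetries $K_v = X_v \prod_{w \sim v} Z_w$, and shows that averaging their expectation values directly witnesses the fidelity of a noisy preparation:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def kron_all(ops):
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

def graph_state(n, edges):
    """|G> = prod_{(i,j) in edges} CZ_ij |+>^n as a dense state vector."""
    psi = np.full(2**n, 2**(-n / 2))
    for (i, j) in edges:
        for x in range(2**n):
            # CZ flips the sign when both qubits i and j are 1
            if (x >> (n - 1 - i)) & 1 and (x >> (n - 1 - j)) & 1:
                psi[x] *= -1
    return psi

def symmetry(n, edges, v):
    """K_v = X_v prod_{w ~ v} Z_w, the graph-state symmetry at vertex v."""
    ops = [I2] * n
    ops[v] = X
    for (i, j) in edges:
        if i == v:
            ops[j] = ops[j] @ Z
        if j == v:
            ops[i] = ops[i] @ Z
    return kron_all(ops)

n, edges = 3, [(0, 1), (1, 2)]
psi = graph_state(n, edges)
# noisy preparation: fidelity f with the graph state, rest maximally mixed
f = 0.9
rho = f * np.outer(psi, psi) + (1 - f) * np.eye(2**n) / 2**n
# average symmetry expectation value as a fidelity witness
exps = [np.real(np.trace(symmetry(n, edges, v) @ rho)) for v in range(n)]
avg = float(np.mean(exps))
```

For the globally depolarized state used here, the average symmetry expectation equals the fidelity parameter exactly (each $K_v$ is traceless); for general noise it still bounds the fidelity.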
With 100 logical qubits, samples from classically intractable graph states on several hundred qubits could easily be generated.
Bell sampling. The drawback of this approach is that we need to make two different measurements for verification and sampling. It would be much neater if we could verify the correctness of a set of classically hard samples using only those samples. For an example where this is possible, consider two copies of the output state $|\psi\rangle$ of a random circuit, so $|\psi\rangle \otimes |\psi\rangle$. This state is invariant under a swap of the two copies, and in fact the expectation value of the SWAP operator in a noisy preparation $\rho$ of $|\psi\rangle$ determines the purity of the state, so $\mathrm{tr}(\mathrm{SWAP} \cdot \rho \otimes \rho) = \mathrm{tr}(\rho^2)$. It turns out that measuring each pair of corresponding qubits of the two copies in the basis of the four Bell states $(\mathbb{1} \otimes \sigma)|\Phi^+\rangle$, where $\sigma$ is one of the four Pauli matrices $\mathbb{1}, X, Y, Z$, is hard to simulate classically (BellSamp). You may also observe that the SWAP operator is diagonal in the Bell basis, so its expectation value can be extracted from the Bell-basis measurements, which are exactly our hard-to-simulate samples. To do this, we just average sign assignments to the samples according to their parity.
If the circuit is random, then under the same assumptions as those used in XEB for random circuits, the purity is a good estimator of the (squared) fidelity, so $\mathrm{tr}(\rho^2) \approx F^2$. So here is an example where efficient verification is possible directly from hard-to-simulate classical samples, under the same assumptions as those used to argue that XEB equals fidelity.
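For a single pair of qubits, the sign-averaging estimator can be spelled out in a few lines of NumPy (a toy calculation of my own: it computes the exact Bell-basis outcome probabilities rather than sampling them, and the function name is mine):

```python
import numpy as np

# Bell basis for one pair of qubits. The three symmetric Bell states have
# SWAP eigenvalue +1; the singlet (last row) has SWAP eigenvalue -1.
bell = np.array([
    [1, 0, 0,  1],   # |Phi+>
    [0, 1, 1,  0],   # |Psi+>
    [1, 0, 0, -1],   # |Phi->
    [0, 1, -1, 0],   # |Psi->  (singlet)
]) / np.sqrt(2)
signs = np.array([1, 1, 1, -1])

def purity_from_bell(rho):
    """Estimate tr(rho^2) for a single-qubit state rho from Bell-basis
    outcome probabilities on rho (x) rho, by averaging the +/-1 signs.
    Since SWAP = sum_s signs[s] |bell_s><bell_s|, this sign-weighted
    average equals tr(SWAP rho (x) rho) = tr(rho^2)."""
    two_copies = np.kron(rho, rho)
    probs = np.array([np.real(b.conj() @ two_copies @ b) for b in bell])
    return float(probs @ signs)

# random single-qubit mixed state
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = A @ A.conj().T
rho = rho / np.trace(rho).real
purity = purity_from_bell(rho)
```

For an $n$-qubit state one measures all $n$ pairs of corresponding qubits and multiplies one $\pm 1$ sign per pair; the average of that product over the samples estimates $\mathrm{tr}(\rho^2)$.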
With 100 logical qubits, we can achieve quantum advantage which is at least as hard as the current RCS experiments and which can also be efficiently (physics-)verified from the classical data.
Fault-tolerant circuits. Finally, suppose that we run a fault-tolerant quantum advantage experiment. Then, there is a natural set of symmetries of the state at any point in the circuit, namely, the stabilizers of the code we use. In a fault-tolerant experiment we repeatedly measure those stabilizers mid-circuit, so why not use that data to assess the quality of the logical state? Indeed, it turns out that the logical fidelity can be estimated efficiently from stabilizer expectation values even in situations in which the logical circuit has a quantum advantage (SyndFid).
With 100 logical qubits, we could therefore just run fault-tolerant IQP circuits in the advantage regime (ftIQP1) and the syndrome data would allow us to estimate the logical fidelity.
In all of these examples of random circuits with symmetries, coming up with classical samples that pass the verification tests is very easy, so the trusted-experimenter scenario is crucial for this to work. (Note, however, that it may be possible to add tests to Bell sampling that make spoofing difficult.) At the same time, these proposals are very resource-efficient in that they only increase the cost of a pure random-circuit experiment by a relatively small amount. What is more, the required circuits have more structure than random circuits in that they typically require gates that are natural in fault-tolerant implementations of quantum algorithms.
Performing random circuit sampling with symmetries is therefore a natural next step en route both to classically verifiable advantage that closes the no-efficient-verification loophole, and to implementing actual algorithms.
But what if we do not want to grant that level of trust to the person who runs the quantum circuit?
2.2 Classical verification using random circuits with planted secrets
If we do not trust the experimenter, we are in the untrusted quantum server scenario.
The untrusted quantum server scenario
In this scenario, we delegate a quantum computation to an untrusted (presumably remote) quantum server: think of using a Google or Amazon cloud server to run your computation. We can communicate with this server using classical information.
In the untrusted server scenario, we can hope to use ideas from proofs of quantumness such as the use of classical cryptography to design families of quantum circuits in which some secret structure is planted. This secret structure should give the verifier a way to check whether a set of samples passes a certain verification test. At the same time it should not be detectable, or at least not be identifiable from the circuit description alone.
The simplest example of such a secret structure could be a large peak in an otherwise flat output distribution of a random-looking quantum circuit. To do this, the verifier would pick a (random) string $s$ and design a circuit such that the probability of seeing $s$ in the samples is large. If the peak is hidden well, finding it from the circuit description alone would require searching through all of the outcome bit strings, and even just determining one of the outcome probabilities is exponentially difficult. A classical spoofer trying to fake the samples from a quantum computer would then be caught immediately: the list of samples they hand the verifier will not even contain $s$, unless they are unbelievably lucky, since there are exponentially many possible choices of $s$.
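The verifier's side of such a protocol would be almost free. Here is an entirely hypothetical sketch (the function, the 30% peak weight, the 4-bit strings, and the threshold are my own illustrative choices, not any real scheme, where the strings would be far longer and the peak far smaller):

```python
import numpy as np

def passes_peak_test(samples, secret, threshold):
    """Toy verifier for a peaked-circuit challenge: accept iff the planted
    secret string appears with frequency above `threshold`, chosen between
    the engineered peak weight and the flat background a spoofer can reach."""
    hits = sum(1 for s in samples if s == secret)
    return hits / len(samples) >= threshold

secret = "0110"
rng = np.random.default_rng(1)
# honest quantum samples: a peak on the secret, roughly flat elsewhere
honest = [secret if rng.random() < 0.3 else format(int(rng.integers(16)), "04b")
          for _ in range(1000)]
# classical spoofer: uniform guesses, which land on the peak only by luck
spoofed = [format(int(rng.integers(16)), "04b") for _ in range(1000)]
```

With 4 bits the spoofer still hits the secret about 1/16 of the time; at realistic string lengths that background is exponentially small, which is what makes the test decisive.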
Unfortunately, planting such secrets seems to be very difficult using universal circuits, since their output distributions are so unstructured. This is why we have not yet found good candidates for circuits with peaks, though some attempts have been made (Peaks, ECPeaks, HPeaks).
We do have a promising candidate, though—IQP circuits! The fact that the output distributions of IQP circuits are quite simple could very well help us design sampling schemes with hidden secrets. Indeed, the idea of hiding peaks has been pioneered by Shepherd and Bremner (VerIQP1) who found a way to design classically hard IQP circuits with a large hidden Fourier coefficient. The presence of this large Fourier coefficient can easily be checked from a few classical samples, and random IQP circuits do not have any large Fourier coefficients. Unfortunately, for that construction and a variation thereof (VerIQP2), it turned out that the large coefficient can be detected quite easily from the circuit description (ClassIQP1,ClassIQP2).
To this day, it remains an exciting open question whether secrets can be planted in (maybe IQP) circuit families in a way that allows for efficient classical verification. Even finding a scheme with some large gap between verification and simulation times would be exciting, because it would for the first time allow us to verify a quantum computing experiment in the advantage regime using only classical computation.
Towards applications: certifiable random number generation
Beyond verified quantum advantage, sampling schemes with hidden secrets may be usable to generate classically certifiable random numbers: You sample from the output distribution of a random circuit with a planted secret, and verify that the samples come from the correct distribution using the secret. If the distribution has sufficiently high entropy, truly random numbers can be extracted from them. The same can be done for RCS, except that some acrobatics are needed to get around the problem that verification is just as costly as simulation (CertRand, CertRandExp). Again, a large gap between verification and simulation times would probably permit such certified random number generation.
The goal here is firstly a theoretical one: Come up with a planted-secret RCS scheme that has a large verification-simulation gap. But then, of course, it is an experimental one: actually perform such an experiment to classically verify quantum advantage.
Should an IQP-based scheme of circuits with secrets exist, 100 logical qubits is the regime where it should give a relevant advantage.
Three milestones
Altogether, I proposed three milestones for the 100-logical-qubit regime.
Perform fault-tolerant quantum advantage using random IQP circuits. This will allow an improvement of the fidelity towards performing near-perfect RCS and thus addresses the scalability worries about noisy quantum advantage that I discussed in my last post.
Perform RCS with symmetries. This will allow for efficient verification of quantum advantage in the trusted experimenter scenario and thus make a first step toward closing the verification loophole.
Find and perform RCS schemes with planted secrets. This will allow us to verify quantum advantage in the remote untrusted server scenario and presumably give a first useful application of quantum computers to generate classically certified random numbers.
All of these experiments are natural steps towards performing actually useful quantum algorithms in that they use more structured circuits than just random universal circuits and can be used to benchmark the performance of the quantum devices in an advantage regime. Moreover, all of them close some loophole of the previous quantum advantage demonstrations, just like follow-up experiments to the first Bell tests have closed the loopholes one by one.
I argued that IQP circuits will play an important role in achieving those milestones since they are a natural circuit family in fault-tolerant constructions and promising candidates for random circuit constructions with planted secrets. Developing a better understanding of the properties of the output distributions of IQP circuits will help us achieve the theory challenges ahead.
Experimentally, the 100-logical-qubit regime is exactly the regime to shoot for with those circuits: while IQP circuits are somewhat easier to simulate than universal random circuits, 100 qubits is well within the classically intractable regime.
What I did not talk about
Let me close this mini-series by touching on a few things that I would have liked to discuss more.
First, there is the OTOC experiment by the Google team (OTOC) which has spawned quite a debate. This experiment claims to achieve quantum advantage for an arguably more natural task than sampling, namely, computing expectation values. Computing expectation values is at the heart of quantum-chemistry and condensed-matter applications of quantum computers. And it has the nice property that it is what the Google team called “quantum-verifiable” (and what I would call “hopefully-in-the-future-verifiable”) in the following sense: Suppose we perform an experiment to measure a classically hard expectation value on a noisy device now, and suppose this expectation value actually carries some signal, so it is significantly far away from zero. Once we have a trustworthy quantum computer in the future, we will be able to check that the outcome of this experiment was correct and hence quantum advantage was achieved. There is a lot of interesting science to discuss about the details of this experiment and maybe I will do so in a future post.
Finally, I want to mention an interesting theory challenge that relates to the noise-scaling arguments I discussed in detail in Part 2: The challenge is to understand whether quantum advantage can be achieved in the presence of a constant amount of local noise. What do we know about this? On the one hand, log-depth random circuits with constant local noise are easy to simulate classically (SimIQP,SimRCS), and we have good numerical evidence that random circuits at very low depths are easy to simulate classically even without noise (LowDSim). So is there a depth regime in between the very low depth and the log-depth regime in which quantum advantage persists under constant local noise? Is this maybe even true in a noise regime that does not permit fault-tolerance (see this interesting talk)? In the regime in which fault-tolerance is possible, it turns out that one can construct simple fault-tolerance schemes that do not require any quantum feedback, so there are distributions that are hard to simulate classically even in the presence of constant local noise.
So long, and thanks for all the fish!
I hope that in this mini-series I have convinced you that quantum advantage has been achieved. Some loopholes remain open, but if you are happy with physics-level experimental evidence, then you should be convinced that the RCS experiments of the past years have demonstrated quantum advantage.
As the devices get better at a rapid pace, there is a clear goal that I hope will be achieved in the 100-logical-qubit regime: demonstrate fault-tolerant and verifiable advantage (for the experimentalists) and come up with the schemes to do that (for the theorists)! Those experiments would close the loopholes of the current RCS experiments. And they would serve as a stepping stone towards actual algorithms in the advantage regime.
I want to end with a huge thanks to Spiros Michalakis, John Preskill and Frederik Hahn who have patiently read and helped me improve these posts!
References
Fault-tolerant quantum advantage
(ftIQP1) Hangleiter, D. et al. Fault-Tolerant Compiling of Classically Hard Instantaneous Quantum Polynomial Circuits on Hypercubes. PRX Quantum 6, 020338 (2025).
(LogicalExp) Bluvstein, D. et al. Logical quantum processor based on reconfigurable atom arrays. Nature 626, 58–65 (2024).
Random circuits with symmetries
(BellSamp) Hangleiter, D. & Gullans, M. J. Bell Sampling from Quantum Circuits. Phys. Rev. Lett. 133, 020601 (2024).
(GStates) Ringbauer, M. et al. Verifiable measurement-based quantum random sampling with trapped ions. Nat. Commun. 16, 1–9 (2025).
(SyndFid) Xiao, X., Hangleiter, D., Bluvstein, D., Lukin, M. D. & Gullans, M. J. In-situ benchmarking of fault-tolerant quantum circuits. I. Clifford circuits. arXiv:2601.21472 II. Circuits with a quantum advantage. (coming soon!)
Verification with planted secrets
(PoQ) Brakerski, Z., Christiano, P., Mahadev, U., Vazirani, U. & Vidick, T. A Cryptographic Test of Quantumness and Certifiable Randomness from a Single Quantum Device. in 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS) 320–331 (2018).
(VerIQP1) Shepherd, D. & Bremner, M. J. Temporally unstructured quantum computation. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 465, 1413–1439 (2009).
(VerIQP2) Bremner, M. J., Cheng, B. & Ji, Z. Instantaneous Quantum Polynomial-Time Sampling and Verifiable Quantum Advantage: Stabilizer Scheme and Classical Security. PRX Quantum 6, 020315 (2025).
(ClassIQP1) Kahanamoku-Meyer, G. D. Forging quantum data: classically defeating an IQP-based quantum test. Quantum 7, 1107 (2023).
(ClassIQP2) Gross, D. & Hangleiter, D. Secret-Extraction Attacks against Obfuscated Instantaneous Quantum Polynomial-Time Circuits. PRX Quantum 6, 020314 (2025).
(Peaks) Aaronson, S. & Zhang, Y. On verifiable quantum advantage with peaked circuit sampling. arXiv:2404.14493
(ECPeaks) Deshpande, A., Fefferman, B., Ghosh, S., Gullans, M. & Hangleiter, D. Peaked quantum advantage using error correction. arXiv:2510.05262
(HPeaks) Gharibyan, H. et al. Heuristic Quantum Advantage with Peaked Circuits. arXiv:2510.25838
Certifiable random numbers
(CertRand) Aaronson, S. & Hung, S.-H. Certified Randomness from Quantum Supremacy. in Proceedings of the 55th Annual ACM Symposium on Theory of Computing 933–944 (Association for Computing Machinery, New York, NY, USA, 2023).
(CertRandExp) Liu, M. et al. Certified randomness amplification by dynamically probing remote random quantum states. arXiv:2511.03686
OTOC
(OTOC) Abanin, D. A. et al. Observation of constructive interference at the edge of quantum ergodicity. Nature 646, 825–830 (2025).
Noisy complexity
(SimIQP) Bremner, M. J., Montanaro, A. & Shepherd, D. J. Achieving quantum supremacy with sparse and noisy commuting quantum computations. Quantum 1, 8 (2017).
(SimRCS) Aharonov, D., Gao, X., Landau, Z., Liu, Y. & Vazirani, U. A polynomial-time classical algorithm for noisy random circuit sampling. in Proceedings of the 55th Annual ACM Symposium on Theory of Computing 945–957 (2023).
(LowDSim) Napp, J. C., La Placa, R. L., Dalzell, A. M., Brandão, F. G. S. L. & Harrow, A. W. Efficient Classical Simulation of Random Shallow 2D Quantum Circuits. Phys. Rev. X12, 021021 (2022).
It has come to my attention that students at prestigious universities are referring to the practice of cold-emailing alums of their school and asking for Zoom calls to discuss possible summer internships as “networking.”
No! It is not!
“Networking” is called networking because it involves a network: the collection of strong and loose ties of various kinds between pairs of people. When an older friend from your prestigious college, who has graduated and now works at prestigious corporation X, puts in a word for you with her boss, that is networking; you’re connected to your friend, your friend is connected to her boss, you are in the same connected component of, and indeed at rather short distance in, the network. “Networking” means making use of the (hopefully) rich neighborhood of iterated ties that is supposed to develop organically when you move among other ambitious people who are going places, some of them places you also want to go.
A list of other alumni of your prestigious college is not a network. It is just a set. Totally different mathematical object. More precisely, it is a granfalloon. (Vonnegut specifically names “Cornell alumni” as an example!) Granfallooning is not networking.
I don’t have time to write a full post right now, but hopefully this is self-explanatory.
Regardless of their broader views on the AI industry, the eventual risks from AI, or American politics, every person of conscience needs to stand behind Anthropic, as they stand up for their right to [checks notes] not be effectively nationalized by the Trump administration and forced to build murderbots and to help surveil American citizens. No, I wouldn’t have believed this either in a science-fiction movie, but it’s now just the straightforward reality of our world, years ahead of schedule. In particular, I call on all other AI companies, in the strongest possible terms, to do the right thing and stand behind Anthropic, in this make-or-break moment for the AI industry and the entire world.
Recently, a few people have asked me about this paper.
A couple weeks back, OpenAI announced a collaboration with a group of amplitudes researchers, physicists who study the types of calculations people do to make predictions at particle colliders. The amplitudes folks had identified an interesting loophole, finding that a calculation many would have expected to be zero actually gave a nonzero answer. They did the calculation for different examples involving more and more particles, and got some fairly messy answers. They suspected, as amplitudes researchers always expect, that there was a simpler formula, one that worked for any number of particles. But they couldn’t find it.
Then a former amplitudes researcher at OpenAI suggested that they use AI to find it.
“Use AI” can mean a lot of different things, and most of them don’t look much like the way the average person talks to ChatGPT. This was closer than most. They were using “reasoning models”, loops that try to predict the next few phrases in a “chain of thought” again and again and again. Using that kind of tool, they were able to find that simpler formula, and mathematically prove that it was correct.
A few of you are hoping for an in-depth post about what they did, and its implications. This isn’t that. I’m still figuring out if I’ll be writing that for an actual news site, for money, rather than for free, for you folks.
Instead, I want to talk about a specific idea I’ve seen crop up around the paper.
See, for some, the existence of a result like this isn’t all that surprising.
Mathematicians have been experimenting with reasoning models for a bit, now. Recently, a group published a systematic study, setting the AI loose on a database of minor open problems proposed by the famously amphetamine-fueled mathematician Paul Erdös. The AI managed to tackle a few of the problems, sometimes by identifying existing solutions that had not yet been linked to the problem database, but sometimes by proofs that appeared to be new.
The Erdös problems solved by the AI were not especially important. Neither was the problem solved by the amplitudes researchers, as far as I can tell at this point.
But I get the impression the amplitudes problem was a bit more interesting than the Erdös problems. The difference, so far, has mostly been attributed to human involvement. This amplitudes paper started because human amplitudes researchers found an interesting loophole, and only after that used the AI. Unlike the mathematicians, they weren’t just searching a database.
This lines up with a general point, one people tend to make much less carefully. It’s often said that, unlike humans, AI will never be truly creative. It can solve mechanical problems, do things people have done before, but it will never be good at having truly novel ideas.
To me, that line of thinking goes a bit too far. I suspect it’s right on one level, that it will be hard for any of these reasoning models to propose anything truly novel. But if so, I think it will be for a different reason.
The thing is, creativity is not as magical as we make it out to be. Our ideas, scientific or artistic, don’t just come from the gods. They recombine existing ideas, shuffling them in ways more akin to randomness than miracle. They’re then filtered through experience, deep heuristics honed over careers. Some people are good at ideas, and some are bad at them. Having ideas takes work, and there are things people do to improve their ideas. Nothing about creativity suggests it should be impossible to mechanize.
However, a machine trained on text won’t necessarily know how to do any of that.
That’s because in science, we don’t write down our inspirations. By the time a result gets into a scientific paper or textbook, it’s polished and refined into a pure argument, cutting out most of the twists and turns that were an essential part of the creative process. Mathematics is even worse: most math papers don’t even mention the motivation behind the work, let alone the path taken to the paper.
This lack of documentation makes it hard for students, making success much more a function of having the right mentors to model good practices, rather than being able to pick them up from literature everyone can access. I suspect it makes it even harder for language models. And if today’s language model-based reasoning tools are bad at that crucial, human-seeming step, of coming up with the right idea at the right time? I think that has more to do with this lack of documentation, than with the fact that they’re “statistical parrots”.
[This post is written in my capacity as Vice-Chair of the Board of Trustees of SLMath. -T.]
SLMath, formerly MSRI, has launched the search for the next Deputy Director. This key position is a close advisor to the Director and shares in the internal management of the scientific team and programs at SLMath. This position is ideal for an experienced professional with a PhD in mathematical sciences seeking a new opportunity to leverage their strengths in program and grant management, financial management, and people management.
The Harvard faculty is proposing to hard-cap A’s at 20% per course, leading to an undergraduate crashout the good old Yard has not seen the likes of since the dining hall discontinued the Chickwich.
Still, the current argument leaves me with nobody to really root for. When a Harvard student says “We pay to go here to get the product, which is to have a better signal of performance… If you’re just lowering that for everyone, then you’re just lowering the value you provide as a business for the same cost, even while raising tuition year over year,” I just want to cry in my soup a little bit. Is Harvard a product and a business? Sure. But is it just that? And is the product being purchased really “a better signal of performance?” Another undergrad has a slightly different spin on the new policy, and on the point of college: “It misses the point of college, which is to network, go out there, have fun…It would create so much pressure where life wouldn’t be worth that much to live.”
Again: is that the point of college? Is it, specifically, the point of Harvard? It is a point of Harvard, that I’ll grant you. I networked there. I had fun. I believe I probably did whatever is meant these days by “going out there.” But I also went to class, and made a point of really trying a lot, both in the areas that would become my job and those (painting, poetry, Biblical interpretation, the politics of the Chinese Cultural Revolution) which weren’t and were never going to be. That, too, is a point of Harvard.
Anyway: this hubbub is all a bit rich considering the Monterey Jack-level mildness of the proposed policy. The faculty wants to limit the proportion of A grades to 20% — or more precisely, 20% of the class + 4, so small seminars could still have a solid complement of A’s. (Maybe this entire thing is secretly a plan to incentivize grade-grubbing undergrads to take a seminar once in a while?) But no other grade would be limited; a professor would be free to give 80% A- and 20% A if they saw fit. And 20% A’s isn’t a return to the pre-Vietnam gentleman’s C era; it would be roughly in line with grading practices in 2006, at which time, as far as I know, life at Harvard was considered worth living.
But I don’t think the faculty’s document supporting the plan is very convincing either. The ostensible problem to be solved is that the present system offers no way for really excellent students to distinguish themselves, that grades have ceased to offer any meaningful signal at all. Is that true? That median 3.8-ish GPA translates into about half A’s and half A-‘s in your 32 college credits. The difference between straight A’s and 16 A’s and 16 A-‘s is actually pretty noticeable on a transcript!
You can actually see this in the faculty’s own dataset. They propose using APR (average percentile rank) rather than GPA for internal awards. That just means you get a score that averages your rank in each class; if you get the median grade, you get 50, whether that grade was a B or an A- or an A. Look, they say, this is highly correlated with GPA, but no grade inflation!
Well, sure — but you can read that same correlation to say maybe GPA isn’t doing such a bad job capturing academic variation among the students!
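To make the APR idea concrete, here's a quick sketch with made-up grade data, assuming the usual midpoint convention for percentile rank (strictly lower grades count fully, ties count half), which is what makes the median grade come out at exactly 50:

```python
def percentile_rank(grade, class_grades):
    """Percentile rank of `grade` within a class, midpoint convention:
    strictly lower grades count fully, ties (including yourself) count half.
    Under this convention the median grade lands at exactly 50."""
    below = sum(g < grade for g in class_grades)
    equal = sum(g == grade for g in class_grades)
    return 100 * (below + equal / 2) / len(class_grades)

def apr(transcript):
    """Average percentile rank: `transcript` is a list of
    (my_grade, all_grades_in_that_class) pairs."""
    ranks = [percentile_rank(g, cls) for g, cls in transcript]
    return sum(ranks) / len(ranks)
```

A student who gets the median grade in every class scores an APR of 50 whether that median was a B or an A-, which is exactly the inflation-proofing the faculty are after.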
The report makes much of that vertical line at the far edge — students with 4.0 averages, who are indeed not distinguished from one another by GPA. How are you supposed to know who’s the best of the best of the best? I’m not moved by this argument. The true academic standouts are the ones who are easy, not hard, to tell apart from the grade-grubbing masses. They do things GPA doesn’t measure, and faculty know they did these things, and they write about it in their letters.
(In passing: the faculty also complain about “the shift of the summa cum laude cutoff so close to 4.0 that summa eligibility now depends on GPAs carried out to five decimal places.” How can this be? Getting 32 A’s is a 4.0. Getting 31 As and an A- is just about 3.99. I only need two decimal places to tell those apart! Are students taking tens of thousands of courses and getting an A- in just one? Update: My source inside Harvard makes the good point that 31 As and an A- and 32 As and an A- give much closer GPAs; but that still only gets you to the third decimal place.)
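The decimal-places arithmetic is easy to check. A minimal sketch, assuming the standard point values A = 4.0 and A- = 3.67 (Harvard's exact scale may differ):

```python
def gpa(grades):
    """Unweighted GPA: average of grade points, one entry per course."""
    return sum(grades) / len(grades)

A, A_MINUS = 4.0, 3.67   # assumed point values; the actual scale may differ

all_As       = gpa([A] * 32)              # 4.0 exactly
one_A_minus  = gpa([A] * 31 + [A_MINUS])  # about 3.9897
extra_course = gpa([A] * 32 + [A_MINUS])  # 33 courses: 3.99 on the nose
```

Two decimal places separate straight A’s from a single A-; the update’s pair (31 A’s + A- versus 32 A’s + A-) differ by only about 0.0003, i.e. in the third decimal place.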
The report also makes the claim, several times, that a broader grade range would improve “fairness” in the overall academic assessment. Would it? I can see the argument that the relation between GPA and academic success would be less noisy if the grade range were broader; but I’m not sure how much. More compelling to me is their argument that narrow GPA range gives an advantage to students with strong networks who have access to back-channel endorsements orthogonal to GPA (“A fine touch with the epee, that one, just the sort of man for the job.”) I can look at a 3.8 and 3.9 and see a measurable difference in how well the student did in classes, but to the outside world I will concede those numbers look pretty damned close.
Still, I can’t help feeling that the motivation for the policy is simpler. Professors don’t really believe signs are arbitrarily assigned to the signified. We have a sense of what an A means, and don’t want to give an A for work that’s not A work. But somehow we’re embarrassed to just say that; we have to dress it up in concerns about fairness and credibility and get all scatterplot about it all.
One thing that makes me sad: despite the policy difference between faculty and undergrads, they seem to agree on one big thing: that grades are the reason for students to work. The students worry that with scarcer A’s, they’ll have to work too much; the faculty worry that with too many A’s available, the students aren’t working enough. (This article in Harvard Magazine is a really nice, humane rendition of this view.)
Let me be honest about something at this point. When I was a Harvard undergrad I was a huge grade-grubber. I always knew exactly what my GPA was. I once went to office hours to argue with a very patient sociology TA that my final exam essay was better than he thought it was and how could I possibly be getting a B+? (I got the B+.) So I understand the drive for the grade and I don’t look down on it. (OK, maybe I look down on it a little, but in an “I’ve been there too” kind of way.) But I don’t think the grades were why I was working hard in class. I worked hard because I was there to learn. Not everybody who goes to Harvard is aiming to be a professional scholar, I get that. But my possibly sentimental view of Harvard is that it’s where you go because you want to learn things, in every area, with the same kind of depth and intensity as the scholars-to-be of that thing.
Since I spent this post disagreeing with everyone, and since I do more or less agree that the Harvard grading system isn’t ideal, I ought to tell you what proposal I would have written if I were king of the Harvard faculty! Here it is, my three-point plan:
Bring back the 15-point scale. When I was in college, the GPA was out of 15. An A was 15, an A- was 14, a B+ was 12 (see why I was so mad at my sociology TA?) and so on. Harvard undergrads worry that a Harvard with fewer A’s will leave them with lower GPA’s than their competitors from other colleges, and employers and med schools won’t recognize “Harvard grades different.” Under this plan, they would have way higher GPA than kids from other schools! That’s partly a joke but partly not — a completely incompatible scale would make it clear that there was no direct translation. Many schools couldn’t get away with this, but Harvard can. And one of Harvard’s goofy charms is that they call everything by their own Harvard-y names. Political science is “Government.” Dorms are “Houses.” Majors are “concentrations.” Why shouldn’t a 4.0 be a 15?
Establish the A+. This option (which is what Princeton does) is brought up in the faculty report only to be rejected. That’s coming from the “A should mean A!” perspective. But an A+ would solve the “how to distinguish really superb performance” problem. You don’t even have to call it an A+ or average it into GPA; you could call it “with distinction,” and there could be a clear understanding that you were only allowed to give one or two per class, that zero distinctions is fine and even expected, and that any student who asked about distinction was disqualified for distinction. A student who racked up three or four in their career would be a visible standout.
More pass-fail. I teach a first-year seminar at Wisconsin, a writer’s workshop where the students, mostly data science majors, learn how to write good sentences and put those sentences together into good, clear paragraphs. I am going to be honest; I give mostly A’s in this class. And I stand by that practice, and would be pissed if the dean told me I had to stop, because I’m teaching a hard skill that most first-years don’t feel confident in and I think fewer students would take it if they thought they might get a bad grade for not being a fully developed writer of English prose. I want them to take the risk of coming to my class! The big downside of low grades, for me, is that they might discourage students from signing up for a hard course in an area where they don’t feel fully confident. Those are the exact courses we want students to take! (Sidebar: of course, this is also a potential downside of high grades; if you feel everybody around you has a 4.0, maybe even taking a class where you might get an A- is too great a danger to endure.) My son, as I think I’ve mentioned, goes to Brown, and I think the relaxed culture of taking courses pass-fail there really does help students stretch themselves more. I just looked this up in the student handbook, and it looks like Harvard students are also essentially unlimited in how many of their electives they can take pass-fail. So maybe this is a cultural change that Harvard needs — sure, limit A’s, but also create a Brown-like environment where students are encouraged to take that weird but interesting course with a great professor where you really might not master everything, and take it pass/fail. Heck, Harvard could require every student to take a certain number of pass-fail courses that met no requirement. Imagine the undergraduate crashout that would cause….
Say you want to find all the N-tone equal tempered scales that have better fifths than any scale with fewer notes. Mathematically this means you want to find all fractions that come closer to

log₂(3/2) ≈ 0.585

than any fraction with a smaller denominator.
Let me show you the first 14 fractions like this, and then talk about how you can efficiently find them:
You get some of these by taking the continued fraction expansion of log₂(3/2) shown below, and truncating it at some point:

log₂(3/2) = 1/(𝟭 + 1/(𝟭 + 1/(𝟮 + 1/(𝟮 + 1/(𝟯 + ⋯)))))

Truncating gives

1/𝟭 = 1
1/(𝟭 + 1/𝟭) = 1/2
1/(𝟭 + 1/(𝟭 + 1/𝟮)) = 3/5
1/(𝟭 + 1/(𝟭 + 1/(𝟮 + 1/𝟮))) = 7/12

and so on. These are all closer to log₂(3/2) than any fraction with a smaller denominator. But this method does not give all the fractions with that property. For example, 2/3 and 4/7 are two you don’t get by this method!
The others show up whenever we end our continued fraction at a number that’s bigger than 1. The second time this happens is for
1/(𝟭 + 1/(𝟭 + 1/(𝟮 + 1/𝟮))) = 7/12
What we do now is look at the previous fraction in the list (3/5) and the one before that (1/2), and write down this funny thing built from those two:
(3n + 1)/(5n + 2)
When n = 0 this is 1/2 (already on our list), when n = 1 this is 4/7 (new), and when n = 2 this is 7/12 (already on our list). We take n from 0 to 2 because we chose to end our continued fraction at 𝟮.
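Here is a small Python sketch of the procedure just described: the standard convergent recurrence, plus the funny interpolating fractions built from the last two convergents (the function names are my own):

```python
from fractions import Fraction

def convergents(cf):
    """Convergents of a continued fraction [a0; a1, a2, ...], via the
    standard recurrence h_k = a_k*h_{k-1} + h_{k-2} (same for denominators)."""
    h_prev, h = 1, cf[0]
    k_prev, k = 0, 1
    out = [Fraction(h, k)]
    for a in cf[1:]:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        out.append(Fraction(h, k))
    return out

def semiconvergents_of_last_term(cf):
    """The 'funny thing built from the last two convergents':
    (n*h + h')/(n*k + k') for n = 0, ..., a_last, where h/k and h'/k'
    are the last two convergents before the final term of cf."""
    prev2, prev1 = convergents(cf[:-1])[-2:]
    return [Fraction(n * prev1.numerator + prev2.numerator,
                     n * prev1.denominator + prev2.denominator)
            for n in range(cf[-1] + 1)]
```

For the continued fraction ending at the second 𝟮 this reproduces 1/2, 4/7, 7/12 — recovering the missing 4/7.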
The number 4/7 is not on our list, so it’s a new candidate for being closer to log₂(3/2) than any fraction with a smaller denominator. And it is!
You may get more than one new number using this procedure. Alas, they aren’t always closer to log₂(3/2) than any fraction with a smaller denominator. But this procedure does give every fraction with that property.
Even better, this algorithm is a completely general procedure for finding the best rational approximations to irrational numbers!—where by ‘best’ I mean: closer than any fraction with a smaller denominator.
Let’s look at this wacky procedure in a more interesting case. Let’s end our continued fraction where the number in boldface is bigger, like 𝟱:

1/(𝟭 + 1/(𝟭 + 1/(𝟮 + 1/(𝟮 + 1/(𝟯 + 1/(𝟭 + 1/𝟱)))))) = 179/306

Here the previous fraction on our list is 31/53 and the one before that is 24/41, so we form

(31n + 24)/(53n + 41)

Taking n = 0,1,2,3,4,5 this gives 24/41 (on our list), 55/94, 86/147, 117/200, 148/253, and 179/306 (on our list). We take n from 0 to 5 because we chose to end our continued fraction at the number 𝟱.
The four fractions not on our list are new candidates to be closer to log₂(3/2) than any fraction with a smaller denominator! But only the last two of these actually have this good property: 117/200 and 148/253.
This is not a coincidence. In general, whenever we get an even number of new candidates this way, the last half have this good property. The first half do not.
When we get an odd number of new candidates, it becomes more tricky. The middle one can go either way—but all those after it are closer to log₂(3/2) than any fraction with a smaller denominator, and none before are.
There is a rule to decide this tricky middle case, but you’ve probably had enough by now!
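If you'd rather not worry about the tricky cases at all, you can find every record-setting fraction by brute force. A sketch of my own (not the efficient continued-fraction method described above):

```python
import math
from fractions import Fraction

def best_approximations(x, max_den):
    """Every fraction p/q with q <= max_den that is closer to x than
    any fraction with a smaller denominator, found by brute force."""
    best_err = float("inf")
    out = []
    for q in range(1, max_den + 1):
        p = round(x * q)          # nearest numerator for this denominator
        err = abs(x - p / q)
        if err < best_err:        # strictly better than all smaller denominators
            best_err = err
            out.append(Fraction(p, q))
    return out

# The equal temperaments with record-setting fifths, up to denominator 306.
fifths = best_approximations(math.log2(3 / 2), 306)
```

Running it reproduces the story above: the list begins 1/1, 1/2, 2/3, 3/5, 4/7, 7/12, and of the four candidates from the n = 0,…,5 step, only 117/200 and 148/253 make the cut before 179/306.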
Again: what makes all this stuff worth knowing is that it gives the best rational approximations of any positive irrational number, not just log₂(3/2). This is relevant to resonance problems like the rings of Saturn, which have gaps at orbital periods that are close to simple rational multiples of the periods of the big moons. But more importantly, it’s a basic part of number theory.
There are various places to read more about this stuff. I haven’t read them yet, I’m ashamed to say!
But first, some useful buzzwords.
• The best approximations to an irrational number coming from truncating its continued fraction are called ‘convergents’.
• The other candidates for best approximations, obtained by the wacky procedure I described, are called ‘semiconvergents’. These include convergents as a special case. Here are the first 14 fractions that are closer to log₂(3/2) than any fraction with a smaller denominator:
9 are convergents, and the rest are only semiconvergents.
• Given two fractions a/b and c/d their ‘mediant’ is (a + c)/(b + d). The procedure I described is based on mediants. Starting from the numbers 0/1 and 1/0 you can build a tree of numbers by taking mediants, called the ‘Stern-Brocot tree’. It looks like this:
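The mediant construction is easy to play with in code. A quick sketch that builds the first few rows of the tree, representing 1/0 (“infinity”) as a plain numerator/denominator pair so nothing divides by zero:

```python
def mediant(a, b):
    """Mediant of two fractions given as (numerator, denominator) pairs."""
    return (a[0] + b[0], a[1] + b[1])

def stern_brocot_rows(depth):
    """First `depth` rows of the Stern-Brocot tree: start from the boundary
    fractions 0/1 and 1/0 and repeatedly insert mediants between neighbors."""
    boundary = [(0, 1), (1, 0)]   # 1/0 plays the role of infinity
    rows = []
    for _ in range(depth):
        row = []
        new_boundary = [boundary[0]]
        for left, right in zip(boundary, boundary[1:]):
            m = mediant(left, right)
            row.append(m)
            new_boundary += [m, right]
        rows.append(row)
        boundary = new_boundary
    return rows
```

The rows come out 1/1; then 1/2, 2/1; then 1/3, 2/3, 3/2, 3/1; and so on — every positive rational appears exactly once.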
Here are some books:
• Khinchin’s Continued Fractions covers best approximations and semiconvergents carefully, including the delicate middle case.
• Rockett and Szüsz’s Continued Fractions goes into the best-approximation theory in lots of detail.
• If you like the Stern–Brocot tree, you may like to think about how semiconvergents are connected to that. For this, see Conway and Guy’s The Book of Numbers, and Graham, Knuth, and Patashnik’s Concrete Mathematics. Both these books are packed with fun.
All this from trying to understand equal-tempered scales!
For more on equal temperament, read these:
• Part 1: Some equal-tempered scales with better perfect fifths than all equal-tempered scales with fewer notes: 1-TET, 2-TET, 3-TET, 5-TET, 7-TET, 12-TET, 29-TET, 41-TET and 53-TET.
• Part 2: Patterns that emerge when we study which equal-tempered scales have the best perfect fifths, major thirds or minor thirds.
I've been attending a lot of talks lately about AI/machine learning and multiscale modeling for materials design and control. This is a vast, rapidly evolving research area, so here is a little background and a few disorganized thoughts.
For a recent review article about AI and materials discovery, see here. There is a ton of work being done pursuing the grand goal of inverse design - name some desired properties, and have AI/ML formulate a material that fits those requirements and is actually synthesizable. Major companies with publicly known efforts include Google Deepmind and GNoME, Microsoft, Meta working on catalysts, Toyota Research Institute, IBM, and I'm certain that I'm missing major players. There are also a slew of startup companies on this topic (e.g. Periodic).
In addition to materials design and discovery, there is enormous effort being put into using AI/ML to bridge across length and timescales. Quantum chemistry methods can look at microscopic physics and chemistry, for example, but extending this to macroscopic system sizes with realistic disorder is often computationally intractable. There are approaches like time-dependent DFT and DMFT to try to capture dynamics, but following dynamics even as long as picoseconds is hard. Using microscopic methods and ML to try to compute and then parametrize force fields between atoms (for example), one can look at larger systems and longer timescales using molecular dynamics for atomic motions. However, getting from there to, e.g., the Navier-Stokes equations or understanding phase boundaries, is very difficult. (At the same time, there are approaches that use AI/ML to learn about the solutions of partial differential equations, so that one can, for example, compute good fluid flows quickly without actually having to solve the N-S equations - see here.)
We want to keep coarse-graining (looking at larger scales), while maintaining the microscopic physics constraints so that the results are accurate. There seems to be a lot of hope that either by design or by the action of the AI/ML tools themselves we can come up with descriptors that are good at capturing the essential physics as we move to larger and larger scales. To use a fluids example, somehow we are hoping that these tools will naturally capture that at scales much larger than one water molecule, it makes sense to track density, temperature, velocity fields, surface tension, liquid-vapor interfaces, etc.
One rough description of emergence is the idea that at larger scales and numbers of constituents, new properties appear for the collective system that are extremely difficult to predict from the microscopic rules governing the constituents. For example, starting from the Schroedinger equation and basic quantum mechanics, it's very hard to determine that snowflakes tend to have 6-fold symmetry and ice will float in water, even though the latter are of course consequences of the former. A nice article about emergence in physics is here.
It feels to me like in some AI/ML endeavors, we are hoping that these tools will figure out how emergence works better than humans have been able to do. This is certainly a worthy challenge, and it may well succeed in a lot of systems, but then we may have the added meta-challenge of trying to understand how our tools did that. Physics-informed and structured ML will hopefully take us well beyond the situation in the xkcd comic shown here.
(guest post by Dimitris Tsementzis, about joint work with Benedikt Ahrens, Paige North, and Mike Shulman)
The Univalence Principle is the informal statement that equivalent mathematical structures are indistinguishable.
There are various ways of making this statement formally precise, and a long history of work that does so.
In our recently-published (but long in the making!) book we proved what is, to our knowledge, the most general version of this principle, which applies to set-based, categorical, and higher-categorical structures defined in a non-algebraic and space-based style, as well as models of higher-order theories such as topological spaces.
This work achieves three main goals.
Firstly, it greatly extends the “Structure Identity Principle” from the original HoTT book, to include any (finite) level of structure, instead of just set-based structures, thus establishing in the strongest sense yet made precise that the Univalent Foundations provide an equivalence-invariant foundation for higher-categorical mathematics.
Secondly, it provides very general novel definitions of equivalence between structures and between objects in a given structure that “compile” to most known notions of equivalence in known cases, but which can also be used to suggest notions in new settings; in doing so it extends M. Makkai’s classic work on First Order-Logic with Dependent Sorts (FOLDS).
Thirdly, the setting in which our result is proved (a form of Two-Level Type Theory) provides a framework in which to do metamathematics in the Univalent Foundations/HoTT, i.e. carry out the mathematical study of how mathematics is formalized in UF/HoTT.
The Univalence Principle we prove is a foundational metamathematical result in that sense.
Setting the Stage
Any “Univalence Principle” type of result has the following form: for any two 𝕃-structures M and N, an equality M = N is equivalent to an equivalence M ≃ N.

The result gains in strength if the class of structures covered is as large as possible, and the notion of equivalence between them coincides with known notions of equivalence in practice (where 𝕃 is a placeholder for a notion of signature in terms of which M and N are structures).
Such a result also gains in strength if the middle notion of equivalence is as “structural” as possible, ensuring that not only properties of the structures are preserved, but also constructions built on top of them.
This last feature can only really be pursued within a univalent type theory, like HoTT.
In our work, we define: diagrammatic signatures as inverse categories of finite height, 𝕃-structures as Reedy-fibrant functors from a signature 𝕃 into types, and a notion of indiscernibility relating objects within structures, which yields a derived notion of equivalence between structures (essentially structure-preserving bijections up to indiscernibility).
The primitive notions of equivalence (≃) and equality (=) are given by equivalence and equality in 2LTT, appropriately defined.
Signatures, Structures (and where they live)
Two-level type theory (2LTT)
In 2LTT, there is an external level for exo-types and other exo-gizmos (allowing strictly functorial constructions), and the usual fibrant (HoTT/UF) level where mathematical objects live.
The external level is the metamathematical or syntactic level, where a strict equality exists that allows us to define syntax and symbols.
Our exo-gizmos are analogous to sets of syntactical symbols used to define signatures in first-order logic.
Such syntax consists of categories with strict equality on composition, which we call exo-categories.
Equalities here are strict (exo-equalities), while the internal world has homotopical paths.
The 2LTT setting is convenient for type-theoretic reasoning, and allows us to neatly separate the various notions of equality at play.
Diagram signatures are inverse categories of finite height
A diagram signature is an inverse exo-category 𝕃 of finite height equipped with a rank functor to the natural numbers that reflects identities. Objects of 𝕃 are the sorts (analogous to “sorts” in first-order logic); morphisms encode dependencies (“an arrow depends on a pair of objects,” etc.). Inverse-ness gives matching objects and allows Reedy methods to apply.
To illustrate the idea, take the example of a reflexive graph. The diagram signature for reflexive graphs would be given by an inverse (exo-)category with a sort O of objects, a sort A of arrows with two maps s, t : A → O, and a sort I sitting over A via a map i : I → A, where s ∘ i = t ∘ i. The intuition is that we have a sort of “arrows” between any two objects, and a predicate (“identity”) that can be used to select which arrows are identity arrows, with the relation s ∘ i = t ∘ i ensuring that this predicate can only be “asked” of loops.
Structures are Reedy fibrant diagrams over these signatures
Given this notion of a signature, a structure in our sense is simply a (Reedy fibrant) functor from 𝕃 into types. In more detail, a raw 𝕃-diagram is an exo-functor from 𝕃 into the exo-category of types.

For each sort K, the matching object M_K collects the compatible lower-rank data needed to specify the “boundary” for an element of K. The Reedy fibrancy condition is: the canonical boundary map M(K) → M_K is a fibration (i.e., a dependent type) for each sort K.
The category of such Reedy-fibrant diagrams then forms a fibrant type whose points are the 𝕃-structures.
To illustrate with the reflexive-graph signature from above, a structure in our sense would be given by the following data, in type-theoretic notation:
A type O of objects
A family A(x, y) of arrows, dependent on x, y : O
A family I(x, f) for the “identity” predicate, dependent on x : O and f : A(x, x)
The trick here is to ensure that the repeated x in the definition of I (the same object appearing twice in A(x, x)) obeys the relations of the signature.
This is what the matching object machinery achieves for arbitrary signatures.
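In Lean-style notation (a hand-rolled sketch of my own, not the book's formalization), the reflexive-graph data above is just an iterated dependent record, with each field depending only on the lower-rank sorts:

```lean
/-- A reflexive-graph structure, sort by sort: each field lives over the
    lower-rank sorts, mirroring the matching-object boundaries. -/
structure ReflGraphStructure where
  O : Type                    -- rank-0 sort: objects
  A : O → O → Type            -- rank-1 sort: arrows, depending on a pair of objects
  I : (x : O) → A x x → Prop  -- rank-2 sort: "is an identity", asked only of loops
```

Note how the type of `I` only accepts loops `A x x`; the dependency structure of the record enforces the relation of the signature.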
Other examples of 𝕃-structures include: categories (with sorts for objects and morphisms), groupoids, n-categories for any n, preorders, and models of higher-order theories like topological spaces and sup-lattices.
Furthermore, all of what we say applies to theories with axioms (not just structures), which we define in the book too.
Indiscernibility and local univalence
With our formal set-up complete, we address the central question: for an arbitrary signature 𝕃, when are two “objects” in an 𝕃-structure equivalent?
Such an “object” could be a categorical (or higher-categorical) gadget itself when has height , e.g. the “objects” of are themselves categories, which are -structures for an of lower height.
The key idea is: two “objects” are indiscernible if nothing in the signature can distinguish them…up to indiscernibility!
Indiscernibilities and Local Univalence
Fix a signature, a structure over it, a rank-0 (“bottom”) sort K, and two elements of K.
Think of K as a “set” of objects (e.g. of a category) on top of which properties and structures are defined.
To define when two elements of K are indiscernible, we package together everything that could distinguish them. The intuition is: if there is an equivalence between “everything you can say about the first” and “everything you can say about the second” (outside of directly referencing the two elements themselves), then they are indiscernible.
We achieve this through a formal definition of a “boundary”, which one can think of as the structure that contains everything that could distinguish a given element from anything else in the ambient structure.
This naturally leads us to the following definitions.
Definition (Indiscernibility). We say that two elements are indiscernible iff there is a levelwise equivalence (between their boundaries) that is coherent in the right way.
Definition (Univalent Structure). We say that is locally univalent at K if the canonical map
is an equivalence. We then say a structure is univalent if this holds at all rank-0 sorts and (recursively) for all sorts of higher rank.
We prove our main results for univalent -structures. These are quite special in that they are “saturated” in their equivalence information: two indistinguishable gizmos are actually equal. Or, put differently: when there is not “enough” structure to distinguish two gizmos, there is always enough to prove them equivalent. Some examples to illustrate (see Part 2 of the book for many more!):
In a reflexive graph structure, two nodes and are indiscernible iff they have the same number of arrows coming in and out, and the same number of loops that are identities.
In a category, two objects are indiscernible iff they are isomorphic. A univalent category is precisely a category (precategory in UF) that is locally univalent at the sort of its objects.
In an appropriately defined bicategory, an indiscernibility amounts to a pair of coherent adjoint equivalences, as expected.
Equivalences of structures
We are almost there! On to the final missing piece: equivalences between structures themselves.
Let be a morphism of -structures, defined in the expected way (as a natural transformation between the corresponding -valued functors). Then, for each sort , we have a matching square
and for each “context” an induced fiber map
Write for indiscernibility at sort (as above). Then, what we really want to say now is: , are equivalent if they are “level-wise equivalent up to indiscernibility”. This idea gives us the from the beginning, and we define it as follows:
Definition (Equivalence of -structures). is an equivalence if, for every sort , every , and every , there exists a specified with
i.e., is essentially split surjective up to indiscernibility on each fiber. We write for the type of equivalences.
The key innovation is the “up to indiscernibility” part; it makes our notion significantly weaker than usual notions, and hence the final result stronger.
Note that we have not been able to prove our result without a splitness condition in the definition of equivalence, and to our mind this remains an open problem.
Our definition is related to Makkai’s original notion of FOLDS equivalence, but Makkai was not able to define a general notion of non-surjective equivalence directly, relying instead on spans. Our notion of indiscernibility circumvents this difficulty and allows us to consider the whole structure of equivalences between structures.
The Univalence Principle
With all our apparatus in place we prove our main result, a very general form of a univalence principle, as promised.
Theorem (“The Univalence Principle”).
For a signature and univalent -structures , the canonical map
is an equivalence of types. In other words,
.
The proof proceeds by induction on the height of . The key insight is that level-wise equivalences between univalent structures must “reflect indiscernibilities”: if doesn’t preserve the ability to distinguish elements, then whatever structure distinguishes them in the source would transfer to structure distinguishing their images in the target, contradicting the equivalence.
With the splitness assumption in the map and the assumption of univalence (of our -structure), we are able to achieve the lifting of the indiscernibility.
Our result is actually proved for an even more general class of signatures called functorial signatures, which strictly extends diagram signatures and covers “higher-order” situations (topological spaces, suplattices, DCPOs, etc.).
We have stuck to the diagrammatic view in this post for intuition, but all results and definitions carry over to this more general notion.
Conclusion
In the course of this work there were quite a few questions we tried to answer, but could not. To mention a couple: Can we define a Rezk completion for arbitrary structures, providing a universal way to turn any structure into a univalent one? Can we remove the splitness condition from our definition of equivalence between structures? We list more open problems at the end of the book.
Beyond our specific result, the framework in which it is proven establishes a way to answer metamathematical questions around the univalence axiom in a precise and fruitful way.
It is important to emphasize that carrying out this type of mathematical study does not require choosing one foundation over the other.
In any setting that interprets the fibrant part of 2LTT, the univalence principle will hold, including in set theory.
The metamathematics of UF is the mathematical study of formalizing mathematics in terms of a hierarchy of -levels vs. a cumulative hierarchy of sets.
Formalizing mathematics in this way has all sorts of unique mathematical properties.
The Univalence Principle is one of them.
These days I am putting the finishing touches on a hybrid algorithm that optimizes a system (a gamma-ray observatory) by combining reinforcement learning with gradient descent. Although I published an optimization strategy for that application already, I am going back to it to demonstrate a case where the simultaneous optimization of hardware and software is necessary, for a paper on co-design I am writing with several colleagues. In the course of the software development, I ran into a simple but still interesting statistical issue I had not paid attention to until now. So I thought I could share it with you here.
It’s there in every biography, and many interviews: the moment the scientist falls in love with an idea. It can be a kid watching ants in the backyard, a teen peering through a telescope, or an undergrad seeing a heart cell beat on a slide. It’s a story so common that it forms the heart of the public idea of a scientist: not just someone smart enough to understand the world, but someone passionate enough to dive in to their one particular area above all else. It’s easy to think of it as a kind of passion most people never get to experience.
And it does happen, sometimes. But it’s a lot less common than you’d think.
I first started to suspect this as a PhD student. In the US, getting accepted into a PhD program doesn’t guarantee you an advisor to work with. You have to impress a professor to get them to spend limited time and research funding on you. In practice, the result was the academic analog of the dating scene. Students looked for who they might have a chance with, based partly on interest but mostly on availability and luck and rapport, and some bounced off many potential mentors before finding one that would stick.
Then, for those who continued to postdoctoral positions, the same story happened all over again. Now they were applying for jobs, looking for positions where they were qualified enough and might have some useful contacts, with interest in the specific research topic at best a distant third.
Working in the EU, I’ve seen the same patterns, but offset a bit. Students do a Master’s thesis, and the search for a mentor there is messy and arbitrary in similar ways. Then for a PhD, they apply for specific projects elsewhere, and as each project is its own funded position the same job search dynamics apply.
The picture only really clicked for me, though, when I started doing journalism.
Nowadays, I don’t do science, I interview people about it. The people I interview are by and large survivors: people who got through the process of applying again and again and now are sitting tight in an in-principle permanent position. They’re people with a lot of freedom to choose what to do.
And so I often ask for that reason, that passion, that scientific love-at-first-sight moment: why do you study what you do? It’s a story that audiences love, and thus that editors love; it’s always a great way to begin a piece.
But surprisingly often, I get an unromantic answer. Why study this? Because it was available. Because in the Master’s, that professor taught the intro course. Because in college, their advisor had contacts with that lab to arrange a study project. Because that program accepted people from that country.
And I’ve noticed how even the romantic answers tend to be built on the unromantic ones. The professors who know how to weave a story, to self-promote and talk like a politician, they’ll be able to tell you about falling in love with something, sure. But if you read between the lines, you’ll notice where their anecdotes fall, how they trace a line through the same career steps that less adroit communicators admit were the real motivation.
There have been times I’ve thought that my problem was a lack of passion, that I wasn’t in love the same way other scientists were in love. I’ve even felt guilty that I took resources and positions from people who were. There is still some truth in that guilt: I don’t think I had the same passion for my science as most of my colleagues.
But I appreciate more now, that that passion is in part a story. We don’t choose our specialty, making some grand agentic move. Life chooses for us. And the romance comes in how you tell that story, after the fact.
The STOC’2026 accepted papers list is out. It seems to me that there’s a wealth of amazing stuff this year. I felt especially gratified to see the paper on the determination of BusyBeaver(5) on the list, reflecting a broad view of what the theory of computing is about.
There’s a phenomenal profile of Henry Yuen in Quanta magazine. Henry is now one of the world leaders of quantum complexity theory, involved in breakthroughs like MIP*=RE and now pioneering the complexity theory of quantum states and unitary transformations (the main focus of the profile). I’m proud that Henry tells Quanta that he learned about the field in 2007 or 2008 from a blog called … what was it again? … Shtetl-Optimized? I’m also proud that I got to help mentor Henry when he was a PhD student of my wife Dana Moshkovitz at MIT. Before I read this Quanta profile, I didn’t even know the backstory about Henry’s parents surviving and fleeing the Cambodian genocide, or about Henry growing up working in his parents’ restaurant. Henry never brought any of that up!
See Lance’s blog for an obituary of Joe Halpern, a pioneer of the branch of theoretical computer science that deals with reasoning about knowledge (e.g., the muddy children puzzle), who sadly passed away last week. I knew Prof. Halpern a bit when I was an undergrad at Cornell. He was a huge presence in the Cornell CS department who’ll be sorely missed.
UT Austin has announced the formation of a School of Computing, which will bring together the CS department (where I work) with statistics, data science, and several other departments. Many of UT’s peer institutions have recently done the same. Naturally, I’m excited for what this says about the expanded role of computing at UT going forward. We’ll be looking to hire even more new faculty than we were before!
When I glanced at the Chronicle of Higher Education to see what was new, I learned that researchers at OpenAI had proposed a technical solution, called “watermarking,” that might help tackle the crisis of students relying on AI to write all their papers … but that OpenAI had declined to deploy that solution. The piece strongly advocates a legislative mandate in favor of watermarking LLM outputs, and addresses some of the main counterarguments to that position.
One bad thing about archeologists is that some of the successful ones get a big head.
People used to think the Olmecs, who made these colossal stone heads, were contemporary with the Mayans. But in 1939, an archaeologist couple, Marion and Matthew Stirling, found the bottom half of an Olmec stela, now called Stela C, that had part of a date carved on it!
The Stirlings guessed the date was 7.16.6.16.18. In the calendar used by the Olmecs and other Central American civilizations, this corresponds to September 3, 32 BC. That meant the Olmecs were extremely old—much older than the Mayans.
But the first digit was missing from the bottom half of the stone! All the Stirlings actually saw was 16.6.16.18. And the missing first digit was the most significant one! If it were 8 instead of 7, the date of the stone would be much later: roughly 362 AD, when the Mayans were in full swing.
The Stirlings guessed that the first digit must be 7 using a clever indirect argument. But perhaps because of the subtlety of this argument, and certainly because of the general skepticism among experts that the Olmecs were so old, few believed the Stirlings.
But then, 30 years later, in 1969, they were proven correct! A farmer found the other half of the stone and confirmed that yes, the missing digit was a 7. So the date on Stela C really is September 3, 32 BC.
That’s a wonderful story of delayed vindication. But it leaves two mysteries.
• First, how in the world could the Olmec calendar be so damn good that we can look at that date and know it meant September 3, 32 BC?
• Second, what clever argument did the Stirlings use to guess the missing digit?
You can only fully understand the answers if you know a bit about the Olmec way of counting time. Like the Mayans, they used the Mesoamerican Long Count Calendar. This identifies a day by counting how many days have passed since the world was created. The count is more or less base 20, except that the second “digit” is in base 18, since they liked a “year” that was 18 × 20 = 360 days long. So, a Long Count date a.b.c.d.e stands for a × 144,000 + b × 7,200 + c × 360 + d × 20 + e days since creation.
But then we have to ask: when did the Olmecs and Mayans think the world was created? Experts believe they know: September 6, 3114 BCE in the proleptic Julian calendar, where ‘proleptic’ means roughly that we’re extrapolating this calendar back to times long before anyone used this calendar.
But enough background. I asked my friend Gro-Tsen
how in the world could the Olmec calendar be so damn good that we can look at that date and know it meant September 3, 32 BC?
And while I’ve already given a kind of answer, I’ve skimmed over many subtleties. So, it’s worth reading his answer:
I did the math.
It’s Sept. 3, 32BCE (reminder: “32BCE” actually means “−31” ) in the proleptic Julian calendar = Sept. 1 prol. Gregorian.
The Western equivalent of the Mesoamerican Long Count is the “Julian Date”. The Julian Date simply counts the number of days from an arbitrary remote reference point (Nov. 24, 4714 BCE proleptic Gregorian). More practically, on 2000-01-01 it equaled 2 451 545 (at 12:00 UTC if we want to use fractional Julian dates).
For example, today as I write is Julian Date 2 461 082 (well, 2 461 081.9 because it’s not yet noon UTC). And the date of Sept. 1, 32 BCE [prol. Greg.] we’re talking about corresponds to Julian Date 1 709 981. More convenient than all this dealing with complicated calendar conventions.
So to convert a Long Count date to the Western calendar, we first convert the Long Count to an integer (trivial: it’s already just an integer written in base 20-except-18-in-the-penultimate-digit), we add a constant (C) to get a Julian Date, and we convert to our messy calendars.
BUT! What is this constant C? This is known as the “Mayan correlation”. For a long time in the 20th century there was a debate about its value: scholars could relate any two Mayan dates, but not situate them exactly w.r.t. our own calendar. Various values were proposed, … ranging from the (frankly rather ludicrous) 394 483 to 774 078, an interval of about 1000 years! The now accepted value for C is 584 283 (the “Goodman-Martínez-Thompson” or GMT correlation, not to be confused with Greenwich Mean Time or UTC), first proposed in 1905.
This C = 584 283 or “GMT” correlation value places the “Long Count epoch” 0.0.0.0.0 on August 11, 3114BCE in the proleptic Gregorian calendar (the day with Julian Date 584 283), although IIUC it’s not clear if this precise date held any particular importance to the Olmecs (or later Mayans).
Maybe it was just arbitrary, like the start of our own Julian Date (because, no, Joseph Scaliger didn’t think the world started on November 24, 4714 BCE proleptic Gregorian).
One Mayan inscription suggests that the Long Count was the truncation to the last 5 “digits” of an even longer count, and that a Long Count value such as 9.15.13.6.9 was in fact 13.13.13.13.13.13.13.13.9.15.13.6.9 in this Even Longer Count (why 13 everywhere? I don’t know!). But this may be one particular astronomer’s weird ideas; I guess we’ll never know.
But back to the Mayan correlation constant C.
Wikipedia suggests that this “GMT” value C = 584 283 for the Mayan correlation is now settled and firmly established. But between 1905 and now there was some going back and forth, with various authors (including the three Goodman, Martínez and Thompson after whom it is named) adding or removing a day or two (I think Goodman first proposed 584 283, then changed his mind to 584 280, but nobody really cared; Hernández resurrected the proposal in 1926 but altered it to 584 284, then Thompson to 584 285 in 1927, and then Thompson later said Goodman’s initial value of 584 283 had been right all along, and while this is now accepted, the confusion of ±3 days might still linger).
The Emacs program’s calendar (M-x calendar) can give you the Long Count date (type ‘p m’ for “Print Mayan date”) and uses the GMT value C = 584 283. Today is 13.0.13.5.19. (You can also go to a particular Long Count date using ‘g m l’ but Emacs won’t let you go to 7.16.6.16.18 because its calendar starts on January 1, 1 prol. Gregorian = Julian Date 1 721 426 = Long Count 7.17.18.13.3. So close! This caused me some annoyance in checking the dates.)
So anyway, 7.16.6.16.18 is
(((7×20+16)×20+6)×18+16)×20+18 = 1 125 698 days
after the Long Count epoch, so Julian Date 1 125 698 + 584 283 = 1 709 981 if we accept the GMT value of C = 584 283 for the Mayan correlation, and this is September 1, 32 BCE in the proleptic Gregorian calendar, or September 3, 32 BCE in the proleptic Julian calendar. (I write “proleptic” here, even though the Julian calendar did exist in 32 BCE, because it was incorrectly applied between 45 BCE and 9 BCE, with the Pontiffs inserting a leap year every 3 years, not 4, and Augustus had this mess fixed.)
Also, confusingly, if we use Thompson’s modified (and later disavowed) correlation of 584 285, then we get September 3, 32 BCE in the proleptic Gregorian calendar, so maybe this could also be what was meant. Yeah, Julian Dates are a great way of avoiding this sort of confusion!
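Gro-Tsen’s conversion can be sketched in a few lines of Python (my own code, not his; the function names are mine):

```python
# Convert a Mesoamerican Long Count date to a Julian Date, using the
# GMT (Goodman-Martinez-Thompson) correlation constant C = 584283.
GMT_CORRELATION = 584283

def long_count_to_days(baktun, katun, tun, uinal, kin):
    """Days since the Long Count epoch 0.0.0.0.0: base 20 throughout,
    except the uinal-to-tun carry, which happens at 18."""
    return (((baktun * 20 + katun) * 20 + tun) * 18 + uinal) * 20 + kin

def long_count_to_julian_date(*digits):
    return long_count_to_days(*digits) + GMT_CORRELATION

# The Stela C date 7.16.6.16.18:
print(long_count_to_days(7, 16, 6, 16, 18))         # 1125698
print(long_count_to_julian_date(7, 16, 6, 16, 18))  # 1709981
```

Julian Date 1 709 981 is then September 1, 32 BCE in the proleptic Gregorian calendar, exactly as in the quote.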
All this is great. But it leaves us with the second puzzle:
how in the world did the Stirlings guess the missing first digit of the date on the bottom half of Stela C?
Here’s the answer, as best as I can tell:
The Olmecs and Mayans used two calendars! In addition to the Mesoamerican Long Count, they also used one called the Tzolkʼin. This uses a 260-day cycle, where each day gets its own number and name: there are 13 numbers and 20 names. And the bottom half of Stela C had inscribed not only the last four digits of the Mesoamerican Long Count, but also the Tzolkʼin day: 6 Etz’nab.
This is what made the reconstruction possible!
Here’s why 7 was the only possible choice of the missing first digit. If the digit were one higher, that would make the date 144,000 days later. But there are 20 different Tzolkʼin day names, and
144,000 ≡ 0 mod 20
so the Tzolkʼin day name wouldn’t change.
On the other hand, there are 13 different Tzolkʼin day numbers, and
144,000 ≡ –1 (mod 13),
so adding 1 to the missing first digit would subtract 1 from the Tzolkʼin day number. So, after the day
7.16.6.16.18 and 6 Etz’nab
the next day of the form
N.16.6.16.18 and 6 Etz’nab
happens when N = 7+13. But this is 13 × 144,000 days later: that is, roughly 5,128 years after 32 BC. Far in the future!
So, while 32 BC seemed awfully early for the Olmecs to carve this stone, there’s no way they could have done it later. (Or earlier, for that matter.)
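The counting argument is easy to check mechanically. Here is a sketch (my own code; the helper name is hypothetical):

```python
# Which baktun digit N makes N.16.6.16.18 land on Tzolk'in day "6 Etz'nab"?
# Bumping the baktun digit by 1 moves the date by one baktun = 144,000 days.
BAKTUN = 144_000

assert BAKTUN % 20 == 0   # the 20-cycle of day names never moves...
assert BAKTUN % 13 == 12  # ...but the 13-cycle day number shifts by -1 per bump

def matching_baktun_digits(known_good=7, search_up_to=19):
    """Digits N whose date gets the same Tzolk'in day number as N = 7."""
    target = (known_good * BAKTUN) % 13
    return [n for n in range(search_up_to + 1) if (n * BAKTUN) % 13 == target]

print(matching_baktun_digits())  # [7] -- the next match, 7 + 13 = 20, is far too big
```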
Here is the Stirlings’ actual photo of Stela C:
This is from
• Matthew W. Stirling, An Initial Series from Tres Zapotes, Vera Cruz, Mexico. National Geographic Society Contributions, Technical Papers, Mexican Archaeological Series, Vol. 1, No. 1. Washington, 1940.
By the way, in this paper he doesn’t actually explain the argument I just gave. Apparently he assumes that expert Mayanists would understand this brief remark:
Assuming then that the number 6 adjacent to the terminal glyph represents the coefficient of the day sign, the complete reading of the date would be (7)-16-6-16-18, or 6 Eznab 1 Uo, since only by supplying a baktun reading of 7 can the requirements of the day sign 6 be satisfied.
I can’t help but wonder if this was much too terse! I haven’t found any place where he makes the argument in more detailed form.
Puzzle 1. What does “1 Uo” mean, and what bearing does this have on the dating of Stela C?
Puzzle 2. Why does the Tzolkʼin calendar use a 260-day cycle?
The second one is extremely hard: there are several theories but no consensus.
By the way, we now know that the Olmecs go back way beyond 32 BCE! The site where Stela C was found, Tres Zapotes, was the third capital city of the Olmecs. The first, San Lorenzo Tenochtitlán, was the center of Olmec culture from about 1400 to 900 BCE. The second, La Venta, was preeminent from about 900 to 400 BCE. Tres Zapotes may have first emerged around 900-800 BCE, but it was dominant from about 400 BCE to 300 CE, and finally fizzled out much later, around 900 CE. In fact this later period is now usually called, not Olmec, but Epi-Olmec.
Just a brief announcement that I have been working with Quanta Books to publish a short book in popular mathematics entitled “Six Math Essentials”, which will cover six of the fundamental concepts in mathematics — numbers, algebra, geometry, probability, analysis, and dynamics — and how they connect with our real-world intuition, with the history of math and science, and with the modern practice of mathematics, both in theory and in applications. The scheduled publication date is Oct 27, but it is currently available for preorder.
So, a group based in Sydney, Australia has put out a preprint with a new estimate of the resource requirements for Shor’s algorithm, claiming that if you use LDPC codes rather than the surface code, you should be able to break RSA-2048 with fewer than 100,000 physical qubits, which is an order-of-magnitude improvement over the previous estimate by friend-of-the-blog Craig Gidney. I’ve now gotten sufficiently many inquiries about it that it’s passed the threshold of blog-necessity.
A few quick remarks, and then we can discuss more in the comments section:
Yes, this is serious work. The claim seems entirely plausible to me, although it would be an understatement to say that I haven’t verified the details. The main worry I’d have is simply that LDPC codes are harder to engineer than the surface code (especially for superconducting qubits, less so for trapped-ion), because you need wildly nonlocal measurements of the error syndromes. Experts (including Gidney himself, if he likes!) should feel free to opine in the comments.
I have no idea by how much this shortens the timeline for breaking RSA-2048 on a quantum computer. A few months? Dunno. I, for one, had already “baked in” the assumption that further improvements were surely possible by using better error-correcting codes. But it’s good to figure it out explicitly.
On my Facebook, I mused that it might be time for the QC community to start having a conversation about whether work like this should still be openly published—a concern that my friends in the engineering side of QC have expressed to me. I got strong pushback from cryptographer and longtime friend Nadia Heninger, who told me that the crypto community has already had this conversation for decades, and has come down strongly on the side of open publication, albeit with “responsible disclosure” waiting periods, which are often 90 days. While the stakes would surely be unusually high with a full break of RSA-2048, Nadia didn’t see that the basic principles there were any different. Nadia’s arguments updated me in the direction of saying that groups with further improvements to the resource requirements for Shor’s algorithm should probably just go ahead and disclose what they’ve learned, and that the crypto community will have their backs, regarding such disclosure as the right thing to do in light of what it has learned over the decades. Certainly, any advantage that such disclosure would give to hackers, who could take the new Shor circuits and simply submit them to the increasingly powerful QCs that will gradually come online via cloud services, needs to be balanced against the loud, clear, open warning the world will get to migrate faster to quantum-resistant encryption.
I’m told that these days, the biggest practical game is breaking elliptic curve cryptography, not breaking Diffie-Hellman or RSA. Somewhat ironically, elliptic curve crypto is likely to fall to quantum computers a bit before RSA and Diffie-Hellman will fall, because ECC’s “better security” (against classical attacks, that is) led people to use 256-bit keys rather than 2,048-bit keys, and Shor’s algorithm mostly just cares about the key size.
In the acknowledgments of the paper, I’m thanked for “thoughtful feedback on the title.” Indeed, their original title was about “breaking RSA-2048” with 100,000 physical qubits. When they sent me a draft, I pointed out that they needed to change it, since journalists would predictably misinterpret it to mean that they’d already done it, rather than simply that it could be done.
I wrote code this weekend to look at the question of how we should visit a star in the upcoming Terra Hunting Experiment. The current (straw-person) plan is that we will observe each visible star once per night for ten years, with one exposure of a sensibly-chosen exposure time at each visit. Is this a good idea? I was interested in this problem for two reasons. The first is that binning is sinning, with the corollary that bigger bins are worse than finer bins, and a single, long exposure is a very big bin. The second reason is that when there are non-trivial noise sources (like the quasi-periodic variations from p-mode oscillations of the surfaces of Sun-like stars), a few negatively- (or interestingly-) correlated noise draws can be combined in ways that are substantially more informative than by taking the average.
Of course, if you split an exposure (with a standard CCD, say) into sub-exposures, you take on real costs: There is a read time, which is time you aren't integrating, and there is a read noise, that affects each new exposure. So the best strategies are a complicated function of read time, read noise, and the signal-to-noise at which the stellar p-mode oscillations are visible in any realistic data. Related: There are amazingly different and interesting strategies with up-the-ramp detectors that are used in the infrared.
One final comment is that the objective, in my strongly held view, is to optimize the amount of information (about, say, the center-of-mass radial-velocity changes of the target star) per unit wall-clock time. We are paying for wall-clock time; let's get as much as we can out of it.
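As a toy illustration of the cost side of this trade-off, here is my own sketch with made-up numbers (not the post’s actual code or the real Terra Hunting parameters):

```python
import math

def snr_of_stack(n_sub, total_time=600.0, t_read=20.0, sigma_read=5.0, rate=100.0):
    """SNR of the summed counts from n_sub sub-exposures within a fixed
    wall-clock block: each split costs read time (lost photons) and adds
    read noise. Photon noise is Poisson, so variance ~ signal."""
    t_integrating = total_time - n_sub * t_read
    if t_integrating <= 0:
        return 0.0  # all wall-clock time consumed by reads
    signal = rate * t_integrating              # expected photon counts
    variance = signal + n_sub * sigma_read**2  # Poisson + accumulated read noise
    return signal / math.sqrt(variance)

# With only these costs, splitting always hurts; the case for sub-exposures
# has to come from the correlated p-mode noise described in the post.
print(snr_of_stack(1) > snr_of_stack(5) > snr_of_stack(20))  # True
```

The point of the toy model is that the optimum number of sub-exposures depends on read time, read noise, and the noise correlations all at once, which is why the answer isn’t obvious in advance.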
Now that we're 6 weeks into the new year, I think it's worth it to do an incomplete roundup of where we are on US federal support of STEM research. Feel free to skip this post if you don't want to read about this.
Appropriators in Congress largely went against the FY26 presidential budget request, and various spending bills by and large slightly-less-than level-funded most US science agencies. A physics-oriented take is here. The devil is in the details. The AAAS federal R&D dashboard lets you explore this at a finer level. Nature has an interactive widget that visualizes what has been cut and what remains.
Bear in mind, that was just year 1 of the present administration. All of the effort, all of the work pushing back against the absolutely draconian, agency-destroying cuts that were proffered? That will likely have to be done again this year, and in subsequent years, if the administration keeps pushing enormously slashed budgets in its requests.
There is an issue of Science with the whole news section about how the past year has changed the science funding and pipeline in the US.
In NSF news, the rate of awards remains very low, though there is almost certainly a major delay because of the lateness of the budget, coping with reduced staffing levels, and restructuring now that Divisions no longer exist. How greater emphasis on specific strategic priorities (beyond what is in the program calls) will affect operations remains unclear, at least to me.
Also, some NSF graduate research fellowship applications, especially in the life sciences, seem to be getting kicked back without review - see here (sorry about the paywall). This appears to affect whole broad research areas, with no information given to applicants about it (that lack of information flow is perhaps unsurprising).
The back and forth about indirect cost rates continues, along with the relevant court cases. The recent appropriations have language to prevent sudden changes in rates. The FAIR model is not yet passed.
I could go on. I know I've left out critical areas, and I haven't talked about DOE or NASA or DOD or EPA or NOAA explicitly.
Honest people can have discussions about the right balance of federal vs state vs industrial vs philanthropic support for research. There are no easy answers in the present time. For those who think that robust public investment in science and engineering research is critical to societal good, economic competitiveness, and security, we need to keep pushing and not let fatigue or fatalism win the day.
Tomorrow is Valentine’s Day, so it’s time for this blog’s yearly tradition of posting a poem. Next week there may be a prose take on the same topic.
You’ve heard love stories like Oliver’s, I’m sure. Meeting that childhood sweetheart In the back room, with the garden view And trust that, with a wink, the parents may regret. Stories tungsten-milled To fit our expectations.
And you’ve heard wilder stories From genuinely riskier lives. The rescue and the love linked under the Milky Way Like an action movie. The love’s reality, even so, Defying summary.
You’ve heard stories of wide-eyed students Realizing they can be adults. Of those moments in study or celebration Turning points in self-conception. And maybe you don’t ask About the other times.
Love happens, And we love love to happen. But we build love too.
Today my rant on LLMs and the practices of our field hit the arXiv. I was scared to post it, because it is such a weird contribution, and it is so revealing about myself and my own political positions and hangups. But I have to say: I got great and supportive feedback all day.
I got two comments on saying ACAB in the literature. The Astronomer Royal of Scotland quoted (on BlueSky) the last sentence, which I put there because Andy Casey (Monash, Flatiron) insisted. Many people sent me appreciation and thank-yous, and many people sent me comments and objections. Always constructive. The whole experience made me feel very happy about the state of our field and the way we all interact. I think maybe there will be critical mass to write some kind of collection of essays on the subject. That's a plan for 2026.
The following are prepared remarks that I delivered by Zoom to a student group at my old stomping-grounds of MIT, and which I thought might interest others (even though much of it will be familiar to Shtetl-Optimized regulars). The students asked me to share my “optimistic vision” for the year 2050, so I did my best to oblige. A freewheeling discussion then followed, as a different freewheeling discussion can now follow in the comments section.
I was asked to share my optimistic vision for the future. The trouble is, optimistic visions for the future are not really my shtick!
It’s not that I’m a miserable, depressed person—I only sometimes am! It’s just that, on a local level, I try to solve the problems in front of me, which have often been problems in computational complexity or quantum computing theory.
And then, on a global level, I worry about the terrifying problems of the world, such as climate change, nuclear war, and of course the resurgence of populist, authoritarian strongmen who’ve turned their backs on the Enlightenment and appeal to the basest instincts of humanity. I won’t name any names.
So then my optimistic vision is simply that we survive all this—“we” meaning the human race, but also meaning communities that I personally care about, like Americans, academics, scientists, and my extended family. We survive all of it so that we can reach the next crisis, the one where we don’t even know what it is yet.
But I get the sense that you wanted more optimism than that! Since I’ve spent 27 years working in quantum computing, the easiest thing for me to do would be to spin an optimistic story about how QC is going to make our lives so much better in 2050, by, I dunno, solving machine learning and optimization problems much faster, curing cancer, fixing global warming, whatever.
The good news is that there has been spectacular progress over the past couple years toward actually building a scalable QC. We now have two-qubit gates with 99.9% accuracy, close to the threshold where quantum error-correction becomes a net win. We can now do condensed-matter physics simulations that give us numbers that we don’t know how to get classically. I think it’s fair to say that all the key ideas and hardware building blocks for a fault-tolerant quantum computer are now in place, and what remains is “merely” the staggeringly hard engineering problem, which might take a few years, or a decade or more, but should eventually be solved.
The trouble for the optimistic vision is that the applications, where quantum algorithms outperform classical ones, have stubbornly remained pretty specialized. In fact, the two biggest ones remain the two that we knew about in the 1990s:
simulation of quantum physics and chemistry themselves, and
breaking existing public-key encryption.
Quantum simulation could help with designing better batteries, or solar cells, or high-temperature superconductors, or other materials, but the road from improved understanding to practical value is long and uncertain. Meanwhile, breaking public-key cryptography could help various spy agencies and hackers and criminal syndicates, but it doesn’t obviously help the world.
The quantum speedups that we know outside those two categories—for example, for optimization and machine learning—tend to be either modest or specialized or speculative.
Honestly, the application of QC that excites me the most, by far, is just disproving all the people who said QC was impossible!
So much for QC then.
And so we come to the elephant in the room—the elephant in pretty much every room nowadays—which is AI. AI has now reached a place that exceeds the imaginations of many of the science-fiction writers of generations past—excelling not only at writing code and solving math competition problems but at depth of emotional understanding. Many of my friends are terrified of where this is leading us—and not in some remote future but in 5 or 10 or 20 years. I think they’re probably correct to be terrified. There’s an enormous range of possible outcomes on the table, including ones where the new superintelligences that we bring into being treat humans basically as humans treated the dodo bird, or the earlier hominids that used to share the earth with us.
But, within this range of outcomes, I think there are also some extremely good ones. Look, for millennia, people have prayed to God or gods for help, life, health, longevity, freedom, justice—and for millennia, God has famously been pretty slow to answer their prayers. A superintelligence that was aligned with human values would be nothing less than a God who did answer, who did deliver all those things, because we had created it to do so. Or for religious people, perhaps such an AI would be the means by which the old God was finally able to deliver all those things into the temporal world. These are the stakes here.
To switch metaphors, people sometimes describe the positive AI-enabled future as “luxury space communism.” AI would take care of all of our material needs, leaving us to seek value in our lives through family, friendships, competition, hobbies, humor, art, entertainment, or exploration. The super-AI would give us the freedom to pursue all those things, but would not give us the freedom to harm each other, to curtail each other’s freedoms, or to build a bad AI capable of overthrowing it. The super-AI would be a singleton, a monotheistic God or its emissary on earth.
Many people say that something would still be missing from this future. After all, we humans would no longer really be needed for anything—for building or advancing or defending civilization. To put a personal fine point on it, my students and colleagues and I wouldn’t be needed any more to discover new scientific truths or to write about them. That would all be the AI’s job.
I agree that something would be lost here. But on the other hand, what fraction of us are needed right now for these things? Most humans already derive the meaning in their lives from family and community and enjoying art and music and food and things like that. So maybe the remaining fraction of us should just get over ourselves! On the whole, while this might not be the best future imaginable, I would accept it in a heartbeat given the realistic alternatives on offer. Thanks for listening.
My husband and I visited the Library of Congress on the final day of winter break this year. In a corner, we found a facsimile of a hand-drawn map: the world as viewed by sixteenth-century Europeans. North America looked like it had been dieting, having shed landmass relative to the bulk we knew. Australia didn’t appear. Yet the map’s aesthetics hit home: yellowed parchment, handwritten letters, and symbolism abounded. Never mind street view; I began hungering for an “antique” setting on Google maps.
1507 Waldseemüller Map, courtesy of the Library of Congress
Approximately four weeks after that trip, I participated in the release of another map: the publication of the review “Roadmap on quantum thermodynamics” in the journal Quantum Science and Technology. The paper contains 24 chapters, each (apart from the introduction) profiling one opportunity within the field of quantum thermodynamics. My erstwhile postdoc Aleks Lasek and I wrote the chapter about the thermodynamics of incompatible conserved quantities, as Quantum Frontiers fans might guess from earlier blog posts.
Allow me to confess an ignoble truth: upon agreeing to coauthor the roadmap, I doubted whether it would impact the community enough to merit my time. Colleagues had published the book Thermodynamics in the Quantum Regime seven years earlier. Different authors had contributed different chapters, each about one topic on the rise. Did my community need such a similar review so soon after the book’s publication? If I printed a map of a city the last time I visited, should I print another map this time?
Apparently so. I often tout the swiftness with which quantum thermodynamics is developing, yet not even I predicted the appetite for the roadmap. Approximately thirty papers cited the arXiv version of the paper during the first nine months of its life—before the journal publication. I shouldn’t have likened the book and roadmap to maps of a city; I should have likened them to maps of a terra incognita undergoing exploration. Such maps change constantly, let alone over seven years.
A favorite map of mine, from a book
Two trends unite many of the roadmap’s chapters, like a mountain range and a river. First, several chapters focus on experiments. Theorists founded quantum thermodynamics and dominated the field for decades, but experimentalists are turning the tables. Even theory-heavy chapters, like Aleks’s and mine, mention past experiments and experimental opportunities.
Second, several chapters blend quantum thermodynamics with many-body physics. Many-body physicists share interests with quantum thermodynamicists: thermalization and equilibrium, the absence thereof, and temperature. Yet many-body physicists belong to another tribe. They tend to interact with each other differently than quantum thermodynamicists do, write papers differently, adhere to different standards, and deploy different mathematical toolkits. Many-body physicists use random-matrix theory, mean field theory, Wick transformations, and the like. Quantum thermodynamicists tend to cultivate and apply quantum information theory. Yet the boundary between the communities has blurred, and many scientists (including yours truly) shuttle between the two.
My favorite anti-map, from another book (series)
When Quantum Science and Technology published the roadmap, lead editor Steve Campbell announced the event to us coauthors. He’d wrangled the 69 of us into agreeing to contribute, choosing topics, drafting chapters, adhering to limitations on word counts and citations, responding to referee reports, and editing. An idiom refers to the herding of cats, but it would gain in poignancy by referring to the herding of academics. Little wonder Steve wrote in his email, “I’ll leave it to someone else to pick up the mantle and organise Roadmap #2.” I look forward to seeing that roadmap—and, perhaps, contributing to it. Who wants to pencil in Australia with me?
Music theorists have studied many fractions of the form
\(2^i \, 3^j \, 5^k\)
that are close to 1. They’re called 5-limit commas. Especially cherished are those that have fairly small exponents—given how close they are to 1. I discussed a bunch here:
Two pitches differing by this ratio sound the same to everyone except certain cleverly designed machines. But remarkably, the atom of Kirnberger shows up rather naturally in music—and it was discovered by a student of Bach! Read my article for details.
All this made me want to systematically explore such tiny intervals. Below is a table of them, where I list the best ones: the ones that are closest to 1 for a given complexity. The first eleven have names, and many of them play important roles in music! But beyond that point, all but one remain unnamed—or at least I don’t know their names. That’s because they’re too small to be audible, and—except for one—not even considered to be of great theoretical importance.
I’ll list these numbers in decimal form and also in cents, where we take the logarithm of the number in base 2 and multiply by 1200. (I dislike this blend of base 2 and base 10, but it’s traditional in music theory.)
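As a quick sanity check, here is a tiny sketch of that cents conversion (plain Python, standard library only; the specific example ratios are mine, not from the table):

```python
import math

def cents(ratio):
    """Size of an interval in cents: 1200 * log2(ratio)."""
    return 1200 * math.log2(ratio)

print(round(cents(81/80), 2))          # syntonic comma: 21.51
print(round(cents(531441/524288), 2))  # Pythagorean comma (3^12 / 2^19): 23.46
```

So an octave (ratio 2) is exactly 1200 cents, and the commas discussed here are a couple dozen cents or less.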
Most importantly, I list the monzo of each number. This is the vector of exponents: for example, the monzo of
\(2^i \, 3^j \, 5^k\)
is
[i, j, k]
In case you’re wondering, this term was named after the music theorist Joseph Monzo.
Finally, I list the Tenney height of each number. This is a measure of the number’s complexity: the Tenney height of
\(2^i \, 3^j \, 5^k\)
is
\(|i| \log_2 2 + |j| \log_2 3 + |k| \log_2 5\)
The table below purports to list only 5-limit commas that are as close to 1 as possible for a given Tenney height. More precisely, it should list numbers of the form \(2^i \, 3^j \, 5^k\) that are > 1 and closer to 1 than any number with smaller Tenney height, except of course for 1 itself.
You’ll see there’s a big increase in Tenney height after the schisma. This is very interesting: it suggests that the schisma is the last ‘useful’ interval. It’s useful only in that it’s the ratio of two musically important commas, the syntonic comma and the Pythagorean comma. Life in music would be simpler if these were equal, and in well-tempered tuning systems it’s common to pretend that they are.
All the intervals in this table up to the schisma were discovered by musicians a long time ago, and they all have standard names! After the schisma, interest drops off dramatically.
The atom of Kirnberger has such amazing properties that it was worth naming. The rest, maybe not. But as you can see, I’ve taken the liberty of naming the smallest interval in the table the ‘quark of Baez’. This is much smaller than all that come before. It’s in bad taste to name things after oneself—indeed this is item 25 on the crackpot index—but I hope it’s allowed as a joke.
I also hope that in the future this is considered my smallest mathematical discovery.
Here is the Python code that should generate the above information. If you’re good at programming, please review it and check it! Someone gave me a gift subscription to Claude, and it (more precisely Opus 4.5) created this code. It seems to make sense, and I’ve checked a bunch of the results, but I don’t know Python.
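The original script isn't reproduced here, but a minimal sketch of the kind of search it performs might look like the following. (This is my own reconstruction, not the Claude-generated code: the search bound `rng` and the trick of only considering the power of 2 that brings each \(3^j 5^k\) closest to 1 are my assumptions.)

```python
import math

LOG2_3, LOG2_5 = math.log2(3), math.log2(5)

def tenney_height(i, j, k):
    # |i| log2(2) + |j| log2(3) + |k| log2(5)
    return abs(i) + abs(j) * LOG2_3 + abs(k) * LOG2_5

def record_commas(rng=20):
    """5-limit numbers 2^i 3^j 5^k > 1 that are closer to 1 than
    every such number (other than 1) of smaller Tenney height."""
    cands = []
    for j in range(-rng, rng + 1):
        for k in range(-rng, rng + 1):
            if (j, k) == (0, 0):
                continue
            t = j * LOG2_3 + k * LOG2_5
            i0 = -round(t)            # power of 2 bringing the ratio nearest 1
            for i in (i0, i0 + 1):    # nearest candidates above 1
                c = 1200 * (i + t)    # size in cents
                if c > 0:
                    cands.append((tenney_height(i, j, k), c, (i, j, k)))
    cands.sort()                      # ascending Tenney height
    best, records = float("inf"), []
    for h, c, monzo in cands:
        if c < best:                  # closer to 1 than all simpler numbers
            best = c
            records.append((monzo, h, c))
    return records

for monzo, h, c in record_commas():
    print(monzo, round(h, 2), round(c, 4))
```

The early records are ordinary intervals like the fifth and the major third; further down, the list narrows to commas, with the syntonic comma at monzo (-4, 4, -1) and the schisma at (-15, 8, 1), around 1.95 cents. Reaching the atom of Kirnberger would require a much larger `rng`.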
In studying this subject I discovered that tiny 5-limit intervals were studied by Gene Ward Smith, a mathematician I used to see around on sci.math and the like. I never knew he worked on microtonal music! I am sad to hear that he died from COVID-19 in January 2021.
I may just be redoing a tiny part of his work: if anyone can find details, please let me know. In his memory, I’ll conclude with this article from the Xenharmonic Wiki:
Gene Ward Smith (1947–2021) was an American mathematician, music theorist, and composer.
In mathematics, he worked in the areas of Galois theory and Moonshine theory.
In music theory, he introduced wedge products as a way of classifying regular temperaments. In this system, a temperament is specified by means of a wedgie, which may technically be identified as a point on a Grassmannian. He had long drawn attention to the relationship between equal divisions of the octave and the Riemann zeta function.[1][2][3] He early on identified and emphasized free abelian groups of finite rank and their homomorphisms, and it was from that perspective that he contributed to the creation of the regular mapping paradigm.
In the 1970s, Gene experimented with musical compositions using a device with four square-wave voices, whose tuning was very stable and accurate, being controlled by a crystal oscillator. The device in turn was controlled by HP 9800 series desktop computers, initially the HP 9830A, programmed in HP Basic, later the 9845A. Using this, he explored both just intonation with a particular emphasis on groups of transformations, and pajara.
Gene had a basic understanding of the regular mapping paradigm during this period, but it was limited in practice since he was focused on the idea that the next step from meantone should keep some familiar features, and so was interested in tempering out 64/63 in place of 81/80. He knew 7-limit 12 and 22 had tempering out 64/63 and 50/49 in common, and 12 and 27 had tempering out 64/63 and 126/125 in common, and thought these would be logical places to progress to, blending novelty with familiarity. While he never got around to working with augene, he did consider it. For pajara, he found tempering certain JI scales, the 10 and 12 note highschool scales, led to interesting (omnitetrachordal) results, and that there were also closely related symmetric (MOS) scales of size 10 and 12 for pajara; he did some work with these, particularly favoring the pentachordal decatonic scale.
Gene was among the first to consider extending the Tonnetz of Hugo Riemann beyond the 5-limit and hence into higher dimensional lattices. In three dimensions, the hexagonal lattice of 5-limit harmony extends to a lattice of type A3 ~ D3. He is also the first to write music in a number of exotic intonation systems.
There seems to be a huge push lately in the tech world for the idea of placing data centers in space. This is not just coming from Musk via the merging of SpaceX and xAI. Google has some effort along these lines. NVIDIA is thinking about it. TED talks are being given by startup people in San Francisco on this topic, so you know we've reached some well-defined hype level. Somehow the idea has enough traction that even the PRC is leaning in this direction. The arguments seem to be that (1) there is abundant solar power in space; (2) environmental impact on the earth will be less, with no competition for local electricity, water, real estate; (3) space is "cold", so cooling these things should be do-able; (4) it's cool and sounds very sci-fi/high frontier.
At present (or near-future) levels of technology, as far as I can tell this idea makes no sense. I will talk about physics reasons here, though there are also pragmatic economic reasons why this seems crazy. I've written before that I think some of the AI/data center evangelists are falling victim to magical thinking, because they come from the software world and don't in their heart of hearts appreciate that there are actual hardware constraints on things like chip manufacturing and energy production.
Others have written about this - see here for example. The biggest physics challenges with this idea (beyond lofting millions of kg of cargo into orbit):
While the cosmic microwave background is cold, cooling things in space is difficult, because vacuum is an excellent thermal insulator. On the ground, you can use conduction and convection to get rid of waste heat. In space, your only option (beyond throwing mass overboard, which is not readily replenishable) is radiative cooling. The key physics here is the Stefan-Boltzmann law, which is a triumph of statistical physics (and one of my favorite derivations to discuss in class - you combine the Planck result for the energy density of a "gas" of photons in thermal equilibrium at some temperature \(T\) with a basic kinetic theory of gases result for the flux of particles out of a small hole). It tells you the best you can ever do: for an ideal black body, the total power radiated away is proportional to the area of the radiator and \(T^{4}\), with fundamental constants making up the proportionality constant and zero adjustable parameters.
Remember, data centers right now consume enormous amounts of power (and cooling water). While you can use heat pumps to try to get the radiators up to well above the operating temperatures of the electronics, that increases mass and waste power, and realistically there is an upper limit on the radiator temperature below 1000 K. An ideal black body radiator at 1000 K puts out about 57 kW per square meter, and you probably need to get rid of tens of megawatts, necessitating hundreds to thousands of square meters of radiator area. There are clever ideas on how to try to do this. For example, in the liquid droplet radiator, you could spray a bunch of hot droplets out into space, capitalizing on their large specific surface area. Of course, you'd need to recapture the cooled droplets, and the hot liquid needs to have sufficiently low vapor pressure that you don't lose a lot of material. Still, as far as I am aware, to date no one has actually deployed a large-scale (ten kW let alone MW level) droplet radiator in space.
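To put rough numbers on the radiator sizing, here is a back-of-the-envelope sketch. It assumes an ideal black body (emissivity 1) radiating into empty space, and ignores absorbed sunlight and view-factor effects, so real radiators would need to be larger still; the 50 MW figure is just an illustrative choice.

```python
SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)

def radiated_flux(T):
    """Power per unit area radiated by an ideal black body at temperature T (K)."""
    return SIGMA * T**4

def radiator_area(power_W, T):
    """Radiator area needed to reject power_W watts at temperature T (K)."""
    return power_W / radiated_flux(T)

print(f"{radiated_flux(1000)/1e3:.0f} kW/m^2 at 1000 K")  # ~57 kW/m^2
print(f"{radiator_area(50e6, 1000):.0f} m^2 for 50 MW")   # ~880 m^2
```

The \(T^4\) scaling is the whole game: drop the radiator temperature to 300 K and the required area grows by a factor of about 120.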
High-end computational hardware is vulnerable to radiation damage. There are no rad-hard GPUs. Low earth orbit is a pretty serious radiation environment, with fluxes of high-energy particles quite a bit higher than on the ground. While there are tests going on, and astronauts are going to bring smartphones on the next Artemis mission, it's rough. Putting many thousands to millions of GPUs and huge quantities of memory in a harsh environment where they cannot be readily accessed or serviced seems unwise. (There are also serious questions of vulnerability to attack. Setting off a small nuclear warhead in LEO injects energetic electrons into the lower radiation belts and would be a huge mess.)
I think we will be faaaaaaar better off in the long run if we take a fraction of the money that people want to invest in space-based data centers, and instead plow those resources into developing energy-efficient computing. Musk has popularized the engineering sentiment "The best part is no part". The best way to solve the problem of supplying and radiating away many GW of power for data centers is to make data centers that don't consume many GW of power.
Quanta Magazine recently published a reflection by Natalie Wolchover on the state of fundamental particle physics. The discussion covers a lot of ground, but one particular paragraph has gotten the lion’s share of the attention. Wolchover talked to Jared Kaplan, the ex-theoretical physicist turned co-founder of Anthropic, one of the foremost AI companies today.
Kaplan was one of Nima Arkani-Hamed’s PhD students, which adds an extra little punch.
There’s a lot to contest here. Is AI technology anywhere close to generating papers as good as the top physicists’, or is that relegated to the sci-fi future? Does Kaplan really believe this, or is he just hyping up his company?
I don’t have any special insight into those questions, about the technology and Kaplan’s motivations. But I think that, even if we trusted him on the claim that AI could be generating Witten- or Nima-level papers in three years, that doesn’t mean it will replace theoretical physicists. That part of the argument isn’t a claim about the technology, but about society.
So let’s take the technological claims as given, and make them a bit more specific. Since we don’t have any objective way of judging the quality of scientific papers, let’s stick to the subjective. Today, there are a lot of people who get excited when Witten posts a new paper. They enjoy reading them, they find the insights inspiring, they love the clarity of the writing and their tendency to clear up murky ideas. They also find them reliable: the papers very rarely have mistakes, and don’t leave important questions unanswered.
Let’s use that as our baseline, then. Suppose that Anthropic had an AI workflow that could reliably write papers that were just as appealing to physicists as Witten’s papers are, for the same reasons. What happens to physicists?
Witten himself is retired, which for an academic means you do pretty much the same thing you were doing before, but now paid out of things like retirement savings and pension funds, not an institute budget. Nobody is going to fire Witten, there’s no salary to fire him from. And unless he finds these developments intensely depressing and demoralizing (possible, but very much depends on how this is presented), he’s not going to stop writing papers. Witten isn’t getting replaced.
More generally, though, I don’t think this directly results in anyone getting fired, or in universities trimming positions. The people making funding decisions aren’t just sitting on a pot of money, trying to maximize research output. They’ve got money to be spent on hires, and different pools of money to be spent on equipment, and the hires get distributed based on what current researchers at the institutes think is promising. Universities want to hire people who can get grants, to help fund the university, and absent rules about AI personhood, the AIs won’t be applying for grants.
Funding cuts might be argued for based on AI, but that will happen long before AI is performing at the Witten level. We already see this happening in other industries or government agencies, where groups that already want to cut funding are getting think tanks and consultants to write estimates that justify cutting positions, without actually caring whether those estimates are performed carefully enough to justify their conclusions. That can happen now, and doesn’t depend on technological progress.
AI could also replace theoretical physicists in another sense: the physicists themselves might use AI to do most of their work. That’s more plausible, but here adoption still heavily depends on social factors. Will people feel like they are being assessed on whether they can produce these Witten-level papers, and that only those who make them get hired, or funded? Maybe. But it will propagate unevenly, from subfield to subfield. Some areas will make their own rules forbidding AI content, there will be battles and scandals and embarrassments aplenty. It won’t be a single switch, the technology alone setting the timeline.
Finally, AI could replace theoretical physicists in another way: people outside of academia could use it to fill the field so thoroughly that theoretical physicists have nothing left that they want to do. Some non-physicists are very passionate about physics, and some of those people have a lot of money. I’ve done writing work for one such person, whose foundation is now attempting to build an AI Physicist. If these AI Physicists get to Witten-level quality, they might start writing compelling paper after compelling paper. Those papers, though, will be specialized, owing to their origins. Much as philanthropists mostly fund the subfields they’ve heard of, philanthropist-funded AI will mostly target topics the people running the AI have heard are important. Much like physicists themselves adopting the technology, there will be uneven progress from subfield to subfield, inch by socially-determined inch.
In a hard-to-quantify area like progress in science, that’s all you can hope for. I suspect Kaplan got a bit of a distorted picture of how progress and merit work in theoretical physics. He studied with Nima Arkani-Hamed, who is undeniably exceptionally brilliant but also undeniably exceptionally charismatic. It must feel to a student of Nima’s that academia simply hires the best people, that it does whatever it takes to accomplish the obviously best research. But the best research is not obvious.
Mr. Epstein was not only a world-class child abuser, he was a big fan of theoretical high-energy physics and of theoretical physicists. Some of my colleagues, unfortunately, got to know him. A number who were famous and/or had John Brockman as a book agent were even invited to a physics conference on Epstein’s private island, well before he was first arrested. This was no secret; as I recall, a lot of us heard about the existence of this conference/trip, but we hadn’t heard Epstein’s name before and didn’t pay much attention (ho hum, just another weird billionaire).
Personally, I feel quite lucky. The Brockman agency rejected the proposal for my recent book without comment (thank you!); and my research is mostly considered unimportant by the Brian Greenes of the world. As a result, I was not invited to Epstein’s island, never made his acquaintance, and blissfully avoided the entire affair. Clearly there are some benefits to being considered ordinary. And so — I’m sorry/not-sorry to say — I can’t tell you much about Epstein at all, or about how certain physicists did and did not interact with him. Regarding my colleagues who did get to know him, I can’t speak for them, since I wasn’t there, and I don’t know to what extent Epstein hid his immoral activities when they were around. It’s up to them to tell their own stories if they feel the need to do so (and I hope a couple of them do, just to clear the air). Personally I tend to give them the benefit of the doubt — probably some literally didn’t know what was up until Epstein’s conviction in 2008, while perhaps others felt there wasn’t much they could do about Epstein’s actions on his own private island. I imagine they are deeply embarrassed to have been caught in this horrible man’s ugly web.
Fans of physics come in all shapes and sizes, and some have large wallets, large egos, and/or large ambitions. Among the wealthy supporters, we can count Alfred Nobel himself; billionaires sit on important scientific institute and university boards, and the more recent Breakthrough Prizes were funded by deep pockets. The extreme wealthy have outsized influence in our country and in our world, and one could argue that their influence in 2025 was not for the better. Usually, though, the influence in physics and related fields tends to be relatively benign, funding postdoctoral researchers and graduate students who deeply want to do science but also need to eat. That said, sometimes donors fund non-essential fields at the expense of critical ones, or favor theoretical research over the gathering of crucial experimental data, or push money on famous rich organizations when there are poor ones that are equally deserving and far more needy.
When gazillionaires, on their own initiative, come calling on non-profit organizations, whether they be community centers, arts organizations, or universities, they pose a problem. On the one hand, it is the job of anyone in a non-profit organization to help raise money — fail to do that, and your organization will close. When a single person offers to permanently change the future of your program, you would be derelict in your duty if you did not consider that offer. On the other hand, donors who might have ethical or criminal problems could drag the organization’s name through the mud. Worse, they might be able to force the organization itself to do something ethically questionable or even illegal.
There is a clear lesson for young academics and other up-and-coming non-profit actors in the Epstein affair: the more money potentially offered to our organizations, the more carefully we must tread. Money is power; power corrupts; and every pursuit of dollars, even for the best causes, risks infection. We can’t be large-scale non-profit fundraisers without doing serious and thorough background checks of the biggest donors; we have to question motives, and we can’t look the other way when something seems amiss. Those of us with clear hearts and honest pursuits tend to assume the best in other people. But we have to beware of those hoping to bolster their reputations, or clean their consciences, by giving away “generously” what they never deserved to have.
My top 10 ghosts (solo acts and ensembles). If Bruce Willis being a ghost in The Sixth Sense is a spoiler, that’s on you — the movie has been out for 26 years.
Einstein and I have both been spooked by entanglement. Einstein’s experience was more profound: in a 1947 letter to Born, he famously dubbed it spukhafte Fernwirkung (or spooky action at a distance). Mine, more pedestrian. It came when I first learned the cost of entangling logical qubits on today’s hardware.
Logical entanglement is not easy
I recently listened to a talk where the speaker declared that “logical entanglement is easy,” and I have to disagree. You could argue that it looks easy when compared to logical small-angle gates, in much the same way I would look small standing next to Shaquille O’Neal. But that doesn’t mean 6’5” and 240 pounds is small.
To see why it’s not easy, it helps to look at how logical entangling gates are actually implemented. A logical qubit is not a single physical object. It’s an error-resistant qubit built out of several noisy, error-prone physical qubits. A quantum error-correcting (QEC) code with parameters \([[n, k, d]]\) uses \(n\) physical qubits to encode \(k\) logical qubits in a way that can detect up to \(d-1\) physical errors and correct up to \(\lfloor (d-1)/2 \rfloor\) of them.
This redundancy is what makes fault-tolerant quantum computing possible. It’s also what makes logical operations expensive.
On platforms like neutral-atom arrays and trapped ions, the standard approach is a transversal CNOT: you apply two-qubit gates pairwise across the code blocks (qubit \(i\) in block A interacts with qubit \(i\) in block B). That requires \(n\) physical two-qubit gates to entangle the logical qubits of one code block with the logical qubits of another.
To make this less abstract, here’s a QuEra animation showing a transversal CNOT implemented in a neutral-atom array. This animation is showing real experimental data, not a schematic idealization.
The idea is simple. The problem is that \(n\) can be large, and physical two-qubit gates are among the noisiest operations available on today’s hardware.
Superconducting platforms take a different route. They tend to rely on lattice surgery: you entangle logical qubits by repeatedly measuring joint stabilizers along a boundary. That replaces two-qubit gates with stabilizer measurements over multiple rounds (typically scaling with the code distance). Unfortunately, physical measurements are the other noisiest primitive we have.
Then there are the modern high-rate qLDPC codes, which pack many logical qubits into a single code block. These are excellent quantum memories. But when it comes to computation, they face challenges. Logical entangling gates can require significant circuit depth, and often entire auxiliary code blocks are needed to mediate the interaction.
This isn’t a purely theoretical complaint. In recent state-of-the-art experiments by Google and by the Harvard–QuEra–MIT collaboration, logical entangling gates consumed nearly half of the total error budget.
So no, logical entanglement is not easy. But how easy can we make it?
Phantom codes: Logical entanglement without physical operations
To answer how easy logical entanglement can really be, it helps to start with a slightly counterintuitive observation: logical entanglement can sometimes be generated purely by permuting physical qubits.
Let me show you how this works in the simplest possible setting, and then I’ll explain what’s really going on.
Consider the \([[4,2,2]]\) stabilizer code, which encodes 2 logical qubits into 4 physical ones and can detect 1 error, but can’t correct any. Below are its logical operators; the arrow indicates what happens when we physically swap qubits 1 and 3 (bars denote logical operators).
You can check that the logical operators transform exactly as shown, which is the action of a logical CNOT gate. For readers less familiar with stabilizer codes, click the arrow below for an explanation of what’s going on. Those familiar can carry on.
Click!
At the logical level, we identify gates by how they transform logical Pauli operators. This is the same idea used in ordinary quantum circuits: a gate is defined not just by what it does to states, but by how it reshuffles observables.
A CNOT gate has a very characteristic action. If qubit 1 is the control and qubit 2 is the target, then: an \(X\) on the control spreads to the target, a \(Z\) on the target spreads back to the control, and the other Pauli operators remain unchanged.
That’s exactly what we see above.
To see why this generates entanglement, it helps to switch from operators to states. A canonical example of how to generate entanglement in quantum circuits is the following. First, you put one qubit into a superposition using a Hadamard. Starting from \(|00\rangle\), this gives \(\frac{1}{\sqrt{2}}\left(|00\rangle + |10\rangle\right)\).
At this point there is still no entanglement — just superposition.
The entanglement appears when you apply a CNOT. The CNOT correlates the two branches of the superposition, producing \(\frac{1}{\sqrt{2}}\left(|00\rangle + |11\rangle\right)\),
which is a maximally-entangled Bell state. The Hadamard creates superposition; the CNOT turns that superposition into correlation.
The operator transformations above are simply the algebraic version of this story. Seeing \(\bar{X}_1 \mapsto \bar{X}_1\bar{X}_2\) tells us that information on one logical qubit is now inseparable from the other.
In other words, in this code, a single physical swap of qubits 1 and 3 acts as a logical CNOT.
The figure below shows how this logical circuit maps onto a physical circuit. Each horizontal line represents a qubit. On the left is a logical CNOT gate: the filled dot marks the control qubit, and the ⊕ symbol marks the target qubit whose state is flipped if the control is in the state \(|1\rangle\). On the right is the corresponding physical implementation, where the logical gate is realized by acting on multiple physical qubits.
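To make this concrete, here is a small sanity check you can run. It uses one standard choice of stabilizers (\(XXXX\), \(ZZZZ\)) and logical operators for the \([[4,2,2]]\) code (this particular operator convention is my assumption; the post's figure may use an equivalent one) and verifies that swapping physical qubits 1 and 3 conjugates the logical Paulis exactly as a CNOT would.

```python
# Check that physically swapping qubits 1 and 3 of the [[4,2,2]] code
# acts as a logical CNOT. The logical-operator choice below is one
# standard convention (an assumption, not necessarily the post's).

def swap_13(pauli):
    """Permute a 4-qubit Pauli string by exchanging qubits 1 and 3."""
    p = list(pauli)
    p[0], p[2] = p[2], p[0]
    return "".join(p)

def mul(a, b):
    """Multiply Pauli strings made only of {I, X} or only of {I, Z}
    (no phases needed: like Paulis commute and square to I)."""
    table = {("I", "I"): "I", ("I", "X"): "X", ("X", "I"): "X", ("X", "X"): "I",
             ("I", "Z"): "Z", ("Z", "I"): "Z", ("Z", "Z"): "I"}
    return "".join(table[(x, y)] for x, y in zip(a, b))

X1, X2 = "XXII", "XIXI"   # logical X operators
Z1, Z2 = "ZIZI", "ZZII"   # logical Z operators

# CNOT (control = logical 1, target = logical 2) conjugation rules:
assert swap_13(X1) == mul(X1, X2)   # X on control spreads to target
assert swap_13(X2) == X2            # X on target unchanged
assert swap_13(Z1) == Z1            # Z on control unchanged
assert swap_13(Z2) == mul(Z1, Z2)   # Z on target spreads to control

# Stabilizers XXXX, ZZZZ are permutation-invariant, so the code is preserved.
assert swap_13("XXXX") == "XXXX" and swap_13("ZZZZ") == "ZZZZ"
print("swap(1,3) implements logical CNOT(1 -> 2)")
```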
At this point, all we’ve done is trade one physical operation for another. The real magic comes next. Physical permutations do not actually need to be implemented in hardware. Because they commute cleanly through arbitrary circuits, they can be pulled to the very end of a computation and absorbed into a relabelling of the final measurement outcomes. No operator spread. No increase in circuit depth.
This is not true for generic physical gates. It is a unique property of permutations.
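A toy way to see the "compile the permutation away" trick, using classical reversible bits in place of qubits (a simplification I am making purely for illustration): execute a random circuit once with the SWAPs performed inline, and once with the SWAPs only tracked classically and applied as a final relabelling of the measured bits. The outcomes agree.

```python
import random

# Toy illustration with classical reversible circuits: deferring every
# SWAP to a final relabelling of the measured bits reproduces the
# results of executing the SWAPs inline.

def run(circuit, bits, defer_swaps=False):
    bits = list(bits)
    perm = list(range(len(bits)))  # perm[i] = physical wire holding logical wire i
    for gate, *wires in circuit:
        if gate == "SWAP" and defer_swaps:
            i, j = wires
            perm[i], perm[j] = perm[j], perm[i]  # track classically; no hardware op
        elif gate == "SWAP":
            i, j = wires
            bits[i], bits[j] = bits[j], bits[i]
        elif gate == "X":
            (i,) = wires
            bits[perm[i] if defer_swaps else i] ^= 1
        elif gate == "CNOT":
            c, t = wires
            if defer_swaps:
                c, t = perm[c], perm[t]
            bits[t] ^= bits[c]
    if defer_swaps:
        bits = [bits[perm[i]] for i in range(len(bits))]  # relabel outcomes at the end
    return bits

random.seed(0)
for _ in range(100):
    circ = [random.choice([("SWAP", 0, 1), ("SWAP", 1, 2), ("X", 0), ("X", 2),
                           ("CNOT", 0, 1), ("CNOT", 2, 0)]) for _ in range(20)]
    start = [random.randint(0, 1) for _ in range(3)]
    assert run(circ, start) == run(circ, start, defer_swaps=True)
print("deferred swaps reproduce inline swaps on all trials")
```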
To see how this works, consider a slightly larger example using a code with three logical qubits. Here the logical operators are a bit more complicated:
Below is a three-logical-qubit circuit implemented using this code, analogous to the circuit drawn above but now with an extra step. Suppose the circuit contains three logical CNOTs, each implemented via a physical permutation.
Instead of executing any of these permutations, we simply keep track of them classically and relabel the outputs at the end. From the hardware’s point of view, nothing happened.
If you prefer a more physical picture, imagine this implemented with atoms in an array. The atoms never move. No gates fire. The entanglement is there anyway.
This is the key point. Because no physical gates are applied, the logical entangling operation has zero overhead. And for the same reason, it has perfect fidelity. We’ve reached the minimum possible cost of a logical entangling gate. You can’t beat free.
To be clear, not all codes are amenable to logical entanglement through relabeling. This is a very special feature that exists in some codes.
Motivated by this observation, my collaborators and I defined a new class of QEC codes. I’ll state the definition first, and then unpack what it really means.
Phantom codes are stabilizer codes in which logical entangling gates between every ordered pair of logical qubits can be implemented solely via physical qubit permutations.
The phrase “every ordered pair” is a strong requirement. For three logical qubits, it means the code must support logical CNOTs between qubits \((1,2)\), \((2,1)\), \((1,3)\), \((3,1)\), \((2,3)\), and \((3,2)\). More generally, a code with \(k\) logical qubits must support all \(k(k-1)\) possible directed CNOTs. This isn’t pedantry. Without access to every directed pair, you can’t freely build arbitrary entangling circuits — you’re stuck with a restricted gate set.
The phrase “solely via physical qubit permutations” is just as demanding. If all but one of those CNOTs could be implemented via permutations, but the last one required even a single physical gate — say, a one-qubit Clifford — the code would not be phantom. That condition is what buys you zero overhead and perfect fidelity. Permutations can be compiled away entirely; any additional physical operation cannot.
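The combinatorics of "every ordered pair" is just \(k(k-1)\) directed CNOTs; a two-liner makes the three-logical-qubit case explicit:

```python
# For k logical qubits, "every ordered pair" means k*(k-1) directed
# CNOTs; for k = 3, that is six of them.
from itertools import permutations

k = 3
pairs = list(permutations(range(1, k + 1), 2))
print(pairs)  # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
assert len(pairs) == k * (k - 1)
```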
Together, these two requirements carve out a very special class of codes. All in-block logical entangling gates are free. Logical entangling gates between phantom code blocks are still available — they’re simply implemented transversally.
After settling on this definition, we went back through the literature to see whether any existing codes already satisfied it. We found two: the Carbon code and hypercube codes. The former enabled repeated rounds of quantum error-correction in trapped-ion experiments, while the latter underpinned recent neutral-atom experiments achieving logical-over-physical performance gains in quantum circuit sampling.
Both are genuine phantom codes. Both are also limited. With small distance, they can detect errors but not correct them. With only a few logical qubits, there’s a limited class of CNOT circuits you can implement. Which raises the questions: Do other phantom codes exist? Can these codes have advantages that persist for scalable applications under realistic noise conditions? What structural constraints do they obey (parameters, other gates, etc.)?
Before getting to that, a brief note for the even more expert reader on four things phantom codes are not. Phantom codes are not a form of logical Pauli-frame tracking: the phantom property survives in the presence of non-Clifford gates. They are not strictly confined to a single code block: because they are CSS codes, multiple blocks can be stitched together using physical CNOTs in linear depth. They are not automorphism gates, which rely on single-qubit Cliffords and therefore do not achieve zero overhead or perfect fidelity. And they are not codes like SHYPS, Gross, or Tesseract codes, which allow only products of CNOTs via permutations rather than individually addressable ones. All of those codes are interesting. They’re just not phantom codes.
In a recent preprint, we set out to answer the three questions above. This post isn’t about walking through all of those results in detail, so here’s the short version. First, we find many more phantom codes — hundreds of thousands of additional examples, along with infinite families that allow both the number of logical qubits and the distance to scale. We study their structural properties and identify which other logical gates they support beyond their characteristic phantom ones.
Second, we show that phantom codes can be practically useful for the right kinds of tasks — essentially, those that are heavy on entangling gates. In end-to-end noisy simulations, we find that phantom codes can outperform the surface code, achieving one–to–two orders of magnitude reductions in logical infidelity for resource state preparation (GHZ-state preparation) and many-body simulation, at comparable qubit overhead and with a modest preselection acceptance rate of about 24%.
If you’re interested in the details, you can read more in our preprint.
Larger space of codes to explore
This is probably a good moment to zoom out and ask the referee question: why does this matter?
I was recently updating my CV and realized I’ve now written my 40th referee report for APS. After a while, refereeing trains a reflex. No matter how clever the construction or how clean the proof, you keep coming back to the same question: what does this actually change?
So why do phantom codes matter? At least to me, there are two reasons: one about how we think about QEC code design, and one about what these codes can already do in practice.
The first reason is the one I’m most excited about. It has less to do with any particular code and more to do with how the field implicitly organizes the space of QEC codes. Most of that space is organized around familiar structural properties: encoding rate, distance, stabilizer weight, LDPC-ness. These are the axes that make a code a good memory. And they matter, a lot.
But computation lives on a different axis. Logical gates cost something, and that cost is sometimes treated as downstream—something to be optimized after a code is chosen, rather than something to design for directly. As a result, the cost of logical operations is usually inherited, not engineered.
One way to make this tension explicit is to think of code design as a multi-dimensional space with at least two axes. One axis is memory cost: how efficiently a code stores information. High rate, high distance, low-weight stabilizers, efficient decoding — all the usual virtues. The other axis is computational cost: how expensive it is to actually do things with the encoded qubits. Low computational cost means many logical gates can be implemented with little overhead; in other words, computation becomes easy.
Why focus on extreme points in this space? Because extremes are informative. They tell you what is possible, what is impossible, and which tradeoffs are structural rather than accidental.
Phantom codes sit precisely at one such extreme: they minimize the cost of in-block logical entanglement. That zero-logical-cost extreme comes with tradeoffs. The phantom codes we find tend to have high stabilizer weights, and for the families with scalable parameters, the number of physical qubits grows exponentially. These are real costs, and they matter.
Still, the important lesson is that even at this extreme point, codes can outperform LDPC-based architectures on well-chosen tasks. That observation motivates an approach to QEC code design in which the logical gates of interest are placed at the centre of the design process, rather than treated as an afterthought. This is my first takeaway from this work.
The second is that phantom codes are naturally well suited to circuits that are heavy on logical entangling gates. Some interesting applications fall into this category, including fermionic simulation and correlated-phase preparation. Combined with recent algorithmic advances that reduce the overhead of digital fermionic simulation, these code-level ideas could potentially improve near-term experimental feasibility.
Back to being spooked
The space of QEC codes is massive. Perhaps two axes are not enough. Stabilizer weight might deserve its own. Perhaps different applications demand different projections of this space. I don’t yet know the best way to organize it.
The size of this space is a little spooky — and that’s part of what makes it exciting to explore, and to see what these corners of code space can teach us about fault-tolerant quantum computation.
After seeing this latest extremely good video from Veritasium, and looking back through my posts, I realized that while I've referenced it indirectly, I've never explicitly talked about the Aharonov-Bohm effect. The video is excellent, and that Wikipedia page is pretty good, but maybe some people will find another angle on this to be helpful.
The ultrabrief version: The quantum interference of charged particles like electrons can be controllably altered by tuning a magnetic field in a region that the particles never pass through. This is weird and spooky because it's an entirely quantum mechanical effect - classical physics, where motion is governed by local forces, says that zero field = unaffected trajectories.
In quantum mechanics, we describe the spatial distribution of particles like electrons with a wavefunction, a complex-valued quantity that one can write as an amplitude and a phase \(\varphi\), where both depend on position \(\mathbf{r}\). The phase is important because waves can interfere. Crudely speaking, when the crests of one wave (say \(\varphi = 0\)) line up with the troughs of another wave (\(\varphi = \pi\)) at some location, the waves interfere destructively, so the total wave at that location is zero if the amplitudes of each contribution are identical. As quantum particles propagate through space, their phase "winds" with distance \(\mathbf{r}\) like \(\mathbf{k}\cdot \mathbf{r}\), where \(\hbar \mathbf{k} = \mathbf{p}\) is the momentum. Higher momentum = faster winding of phase = shorter wavelength. This propagation, phase winding, and interference is the physics behind the famous two-slit experiment. (In his great little popular book - read it if you haven't yet - Feynman described phase as a clockface attached to each particle.) One important note: The actual phase itself is arbitrary; it's phase differences that matter in interference experiments. If you added an arbitrary amount \(\varphi_{0}\) to every phase, no physically measurable observables would change.
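As a minimal numerical illustration of the interference rule described above (not specific to electrons), add two unit-amplitude complex phases and look at the resulting intensity:

```python
import cmath, math

# Two-path interference: the total amplitude is the sum of two
# unit-amplitude contributions with phases phi1 and phi2;
# the detected intensity is |amplitude|^2.
def intensity(phi1, phi2):
    amp = cmath.exp(1j * phi1) + cmath.exp(1j * phi2)
    return abs(amp) ** 2

print(intensity(0.0, 0.0))      # crests aligned: 4.0 (constructive)
print(intensity(0.0, math.pi))  # crest meets trough: ~0.0 (destructive)
```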
Things get trickier if the particles that move around are charged. It was realized 150+ years ago that formal conservation of momentum gets tricky if we consider electric and magnetic fields. The canonical momentum that shows up in the Lagrange and Hamilton equations is \(\mathbf{p}_{c} = \mathbf{p}_{kin} + q \mathbf{A}\), where \(\mathbf{p}_{kin}\) is the kinetic momentum (the part that actually has to do with the classical velocity and which shows up in the kinetic energy), \(q\) is the charge of the particle, and \(\mathbf{A}(\mathbf{r})\) is the vector potential.
Background digression: The vector potential is very often a slippery concept for students. We get used to the idea of a scalar potential \(\phi(\mathbf{r})\), such that the electrostatic potential energy is \(q\phi\) and the electric field is given by \(\mathbf{E} = -\nabla \phi\) if there are no magnetic fields. Adding an arbitrary uniform offset to the scalar potential, \(\phi \rightarrow \phi + \phi_{0}\), doesn't change the electric field (and therefore forces on charged particles), because the zero that we define for energy is arbitrary (general relativity aside). For the vector potential, \(\mathbf{B} = \nabla \times \mathbf{A}\). This means we can add an arbitrary gradient of a scalar function to the vector potential, \(\mathbf{A} \rightarrow \mathbf{A}+ \nabla f(\mathbf{r})\), and the magnetic field won't change. Maxwell's equations mean that \(\mathbf{E} = -\nabla \phi - \partial \mathbf{A}/\partial t\). "Gauge freedom" means that there is more than one way to choose internally consistent definitions of \(\phi\) and \(\mathbf{A}\).
TL/DR main points: (1) The vector potential can be nonzero in places where \(\mathbf{B}\) (and hence the classical Lorentz force) is zero. (2) Because the canonical momentum becomes the operator \(-i \hbar \nabla\) in quantum mechanics and the kinetic momentum is what shows up in the kinetic energy, charged propagating particles pick up an extra phase winding given by \(\delta \varphi = (q/\hbar)\int \mathbf{A}\cdot d\mathbf{r}\) along a path.
This is the source of the creepiness of the Aharonov-Bohm effect. Think of two paths (see still taken from the Veritasium video), and threading magnetic flux just through the little region using a solenoid will tune the intensity detected on the screen on the far right. That field region can be made arbitrarily small and positioned anywhere inside the diamond formed by the paths, and the effect still works. Something not mentioned in the video: The shifting of the interference pattern is periodic in the flux through the solenoid, with a period of \(h/e\), where \(h\) is Planck's constant and \(e\) is the electronic charge.
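A quick dimensional check of the \(h/e\) periodicity, using SI values: sending one flux quantum \(h/e\) through the solenoid changes the relative phase \(\delta \varphi = q\Phi/\hbar\) by exactly \(2\pi\), returning the interference pattern to where it started.

```python
import math

# One flux quantum h/e through the solenoid shifts the relative phase
# between the two paths by delta_phi = q * Phi / hbar = 2*pi.
h = 6.62607015e-34      # Planck constant, J s (exact in the 2019 SI)
e = 1.602176634e-19     # elementary charge, C (exact in the 2019 SI)
hbar = h / (2 * math.pi)

Phi0 = h / e            # Aharonov-Bohm flux period for charge e, in webers
delta_phi = e * Phi0 / hbar
print(f"flux period h/e = {Phi0:.3e} Wb, phase shift = {delta_phi / math.pi:.1f} pi")
```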
Why should you care about this?
As the video discusses, the A-B effect shows that the potentials are physically important quantities that affect motion, at least as much as the corresponding fields, and there are quantum consequences to this that are just absent in the classical world.
The A-B effect (though not with the super skinny field confinement) has been seen experimentally in many mesoscopic physics experiments (e.g., here, or here) and can be used as a means of quantifying coherence at these scales (e.g., here and here).
When dealing with emergent quasiparticles that might have unusual fractional charges (\(e^*\)), then A-B interferometers can have flux periodicities that are given by \(h/e^*\). (This can be subtle and tricky.)
Interferometry to detect potential-based phase shifts is well established. Here's the paper mentioned in the video about a gravitational analog of the A-B effect. (Quibblers can argue that there is no field-free region in this case, so it's not strictly speaking the A-B analog.)
Basically, the A-B effect has gone from an initially quite controversial prediction to an established piece of physics that can be used as a tool. If you want to learn Aharonov's take on all this, please read this interesting oral history.
Update: The always informative Steve Simon has pointed out to me a history of this that I had not known, that this effect had already been discovered a decade earlier by Ehrenberg and Siday. Please see this arXiv paper about this. Here is Ehrenberg and Siday's paper. Aharonov and Bohm were unaware of it and arrived at their conclusions independently. One lesson to take away: Picking a revealing article title can really help your impact.
Strange how time goes by. And strange that I would say that, since I know time does not flow; it is just our perception of one of the spacetime coordinates of our block universe... The thing is, on February 5 I will turn 60. An important date for anybody - I could say a milestone. First of all, let me say that we take for granted all the days of our life we get to live, but in truth we never knew from the start that we would make it this far. I do feel rather young still, but I am very well aware that there are heaps of ways my life could have ended earlier. Accidents, but also naturally occurring sickness.
Steel mills dumped molten slag in parts of Chicago and nearby areas. The slag hardened in layers up to 5 meters deep. These places became barren wastelands. Other industries dumped hot ash and cinders there.
But eventually the steel mills closed.
The deep layers of hard, toxic material were not friendly to plants. Cottonwoods are usually 30 meters tall or more. In the slag fields, stunted cottonwoods grow to just 2 meters.
But rare species that could handle these conditions began to thrive. The lakeside daisy, a federally threatened species lost to Illinois for decades, turned out to grow taller on slag than on topsoil! The capitate spike-rush, last recorded in Illinois in 1894 and considered locally extinct, was rediscovered growing on slag.
A team of women ecologists began studying these unusual landscapes. They call themselves the Slag Queens.
Ecologist Alison Anastasio visited a former US Steel South Works site in Chicago. She expected to just find “crap plants”: common invasive weeds. To her surprise she spotted little bluestem and three species of native milkweed. She already knew she didn’t want a career as an academic scientist. But she came up with the idea of forming a group to study this ecosystem: “a dream team of people I wanted to work with”.
She knew Laura Merwin from the University of Chicago, and later she met Lauren Umek, a project manager for the Chicago Park District. She invited them to brunch to pitch her idea to research plants growing on slag. Not for any obvious career goal. Just out of sheer curiosity.
Merwin and Umek were excited to join her project—which she called a “reverse side hustle,” since it involved a lot of work but didn’t make any money: it actually costs money.
And thus the Slag Queens were born.
Alison Anastasio (left) and Lauren Umek (right) along with Laura Merwin (not pictured), formed the Slag Queens in 2018. Photograph by Jason Smith.
Their first paper, Urban post-industrial landscapes have unrealized ecological potential, was published in Restoration Ecology in 2022. It argues that slag fields don’t need to be fixed. They have ecological value in and of themselves. And land managers should forget whatever ecosystem was there before. Instead, they should look to more exotic ecosystems as a guide, like the dolomite prairies of Illinois, where magnesium-rich rock near the surface makes it hard for ordinary plants to thrive. Slag too is rich in magnesium.
The Slag Queens are continuing their revolutionary work even now! For more, start here:
The short course is aimed at people from industry or government who want to get started in deep learning, apply deep learning to their projects, learn how to code deep learning algorithms, and upgrade their skills to the latest AI algorithms, including generative AI. The course will be taught by Professor Xavier Bresson from the Department of Computer Science at the National University of Singapore (NUS).
The course will be limited to 25 participants. The fee is $2,500 per participant. Registration closes soon (Feb 5); we still have a few spots available.
This is a compilation of posts related to some basic concepts of the physics of materials and nanoscale physics. I realized the other day that I hadn't updated this since 2019, and therefore a substantial audience may not have seen these. Wikipedia's physics entries have improved greatly over the years, but hopefully these are a complement that's useful to students and maybe some science writers. Please let me know if there are other topics that you think would be important to include.
Have you seen “population pyramids“? They’re diagrams that show snapshots of a population, how many people there are of each age. They can give you an intuition for how a population is changing, and where the biggest hurdles are to survival.
I wonder what population pyramids would look like for academia. In each field and subfield, how many people are PhD students, postdocs, and faculty?
If every PhD student was guaranteed to become faculty, and the number of faculty stayed fixed, you could roughly estimate what this pyramid would have to look like. An estimate for the US might take an average 7-year PhD, two postdoc positions at 3 years each, followed by a 30-year career as faculty, and estimate the proportions of each stage based on proportions of each scholar’s life. So you’d have roughly one PhD student per four faculty, and one postdoc per five. In Europe, with three-year PhDs, the proportion of PhD students decreases further, and in a world where people are still doing at least two postdocs you expect significantly more postdocs than PhDs.
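The back-of-envelope arithmetic in the paragraph above, spelled out:

```python
# Steady-state proportions from the career lengths in the text:
# 7-year PhD, two 3-year postdocs, 30 years as faculty, assuming
# zero attrition and a fixed number of faculty posts.
phd, postdoc, faculty = 7, 3 + 3, 30
total = phd + postdoc + faculty  # 43-year academic "lifetime"

print(f"PhD students per faculty member: 1 per {faculty / phd:.1f}")      # ~1 per 4.3
print(f"Postdocs per faculty member:     1 per {faculty / postdoc:.1f}")  # 1 per 5.0
```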
Of course, the world doesn’t look like that at all, because the assumptions are wrong.
The number of faculty doesn’t stay fixed, for one. When population is growing in the wider world, new universities open in new population centers, and existing universities find ways to expand. When population falls, enrollments shrink, and universities cut back.
But this is a minor perturbation compared to the much more obvious difference: most PhD students do not stay in academia. A single professor may mentor many PhDs at the same time, and potentially several postdocs. Most of those people aren’t staying.
You can imagine someone trying to fix this by fiat, setting down a fixed ratio between PhD students, postdocs, and faculty. I’ve seen partial attempts at this. When I applied for grants at the University of Copenhagen, I was told I had to budget at least half of my hires as PhD students, not postdocs, which makes me wonder if they were trying to force careers to default to one postdoc position, rather than two. More likely, they hadn’t thought about it.
Zero attrition doesn’t really make sense, anyway. Some people are genuinely better off leaving: they made a mistake when they started, or they changed over time. Sometimes new professions arise, and the best way in is from an unexpected direction. I’ve talked to people who started data science work in the early days, before there really were degrees in it, who felt a physics PhD had been the best route possible to that world. Similarly, some move into policy, or academic administration, or found a startup. And if we think there are actually criteria to choose better or worse academics (which I’m a bit skeptical of), then presumably some people are simply not good enough, and trying to filter them out earlier is irresponsible when they still don’t have enough of a track record to really judge.
How much attrition there should be is the big question, and one I don’t have an answer for. In academia, where so much of these decisions are made by just a few organizations, it seems like a question that someone should have a well-considered answer to. But so far, it’s unclear to me that anyone does.
It also makes me think, a bit, about how these population pyramids work in industry. There, there is no overall control. Instead, there’s a web of incentives, many of them decades-delayed from the behavior they’re meant to influence, leaving each individual to try to predict as well as they can. If companies only hire senior engineers, no one gets a chance to start a career, and the population of senior engineers dries up. Eventually, those companies have to settle for junior engineers. (Or, I guess, ex-academics.) It sounds like it should lead to the kind of behavior biologists model in predators and prey: wild swings in population described by a differential equation. But maybe there’s something that tamps down those wild swings.
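For the curious, here is what the predator-prey analogy looks like in code: a Lotka-Volterra sketch with made-up rate constants, where junior openings play prey and senior engineers play predator. Purely illustrative; nothing here is calibrated to real labor markets.

```python
# A minimal Lotka-Volterra-style sketch (an analogy, not a model of
# real hiring), integrated with simple Euler steps.
def simulate(juniors=10.0, seniors=5.0, dt=0.001, steps=50_000):
    a, b, c, d = 1.0, 0.1, 0.075, 1.5   # made-up rate constants
    history = []
    for _ in range(steps):
        dj = (a - b * seniors) * juniors  # juniors grow, get "consumed" by promotion
        ds = (c * juniors - d) * seniors  # seniors need a junior pipeline to persist
        juniors += dj * dt
        seniors += ds * dt
        history.append((juniors, seniors))
    return history

hist = simulate()
print(f"junior population swings between {min(j for j, s in hist):.1f} "
      f"and {max(j for j, s in hist):.1f}")
```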
Thomas Bloom’s Erdős problem site has become a real hotbed of activity in recent months, particularly as some of the easiest of the outstanding open problems have turned out to be amenable to various AI-assisted approaches; there is now a lively community in which human contributions, AI contributions, and hybrid contributions are presented, discussed, and in some cases approved as updates to the site.
One of the lessons I draw from this is that once a well curated database of precise mathematical problems is maintained, it becomes possible for other parties to build upon it in many ways (including both AI-based and human-based approaches), to systematically make progress on some fraction of the problems.
This makes me wonder what other mathematical databases could be created to stimulate similar activity. One candidate that came to mind are “optimization constants” – constants that arise from some mathematical optimization problem of interest, for instance finding the best constant for which a certain functional inequality is satisfied.
I am therefore proposing to create a crowdsourced repository for such constants, to record the best upper and lower bounds known for any given such constant, in order to help encourage efforts (whether they be by professional mathematicians, amateur mathematicians, or research groups at a tech company) to try to improve upon the state of the art.
There are of course thousands of such constants one could consider, but just to set the discussion going, I set up a very minimal, proof-of-concept GitHub repository holding over 20 constants including:
Here, I am taking inspiration from the Erdős problem web site and arbitrarily assigning a number to each constant, for ease of reference.
Even in this minimal state I think the repository is ready to start accepting more contributions, in the form of pull requests that add new constants, or improve the known bounds on existing constants. (I am particularly interested in constants that have an extensive literature of incremental improvements in the lower and upper bounds, and which look at least somewhat amenable to computational or AI-assisted approaches.) But I would be interested to hear feedback on how to improve the repository in other ways.
Update: Paata Ivanisvili and Damek Davis have kindly agreed to help run and expand this repository.
and he okayed me posting them here. He’s taking the idea of categorifying the Riemann zeta function, explained in my paper, and going further, imagining what it might mean to categorify Riemann’s functional equation
My paper categorified the Euler product formula that writes the Riemann zeta function as a product over the usual primes:
I had nothing to say about the real prime.
But it’s the functional equation that sets the stage for focusing on zeroes of the Riemann zeta function with … and then the Riemann Hypothesis! So it’s worth thinking about.
David wrote:
Hi John,
Hope you’re doing well!
I was just thinking about your (and James Dolan’s) definition of the zeta functors associated to a finite type scheme (from here), and I had a small thought which I figured you might find interesting.
I was thinking about the functional equation of the completed zeta functions; how might we complete the zeta functors in such a way that they satisfy a similar functional equation? I don’t know, but I do have an idea for what the transformation might mean in this context. I claim that it is given by the reduced suspension. Let me explain.
First, I'll want to see the formal power $n^{-s}$ as the power $(1/n)^s$, which I can then categorify by finding a group $G$ with cardinality $n$ and considering maps into the delooping $BG$, which has groupoid cardinality $1/n$. In the case of the Riemann zeta species, $n$ is the cardinality of a finite semisimple ring (a product of finite fields, the groupoid of which has a well-defined cardinality for each $n$), and we can simply deloop the additive group of this ring. This gives us a Dirichlet functor which categorifies the Riemann zeta function when $S$ is a finite set.
Taking this point of view on the zeta functor, we can ask the question: what is the transformation $s \mapsto 1 - s$? Here's where we can look at the reduced suspension $\Sigma S$. The universal property of the reduced suspension says that pointed maps $\Sigma S \to BG$ correspond to points of the homotopy type
$$\mathrm{Map}_*(S, \Omega BG)$$
(or, more classically, maps from the terminal morphism $\ast \to S$ to $\ast \to \Omega BG$). Since homotopy cardinality is multiplicative for fibrations, the full mapping type $\mathrm{Map}(\Sigma S, BG)$ has cardinality
$$n^{s-1}$$
(when $S$ is a finite set of cardinality $s$ and $G$ has cardinality $n$).
Taking $G$ to be the additive group of a finite semisimple ring of cardinality $n$, we see that $\mathrm{Map}(\Sigma S, BG)$ has cardinality $n^{s-1} = n^{-(1-s)}$. Therefore, I think the transformation $s \mapsto 1 - s$ in the functional equation may be categorified by $S \mapsto \Sigma S$. If this makes sense, it suggests that completing the zeta functors is a form of stabilization.
Cheers,
David
And then:
As for another eyebrow wiggle about the cardinality of $\Omega \Sigma S$ when $S$ is a finite set: we have that $\pi_1(\Sigma S)$ is a free group on a finite set of generators. This is of course infinite, but it is the group completion of the free monoid on those generators. Since
$$1 + s + s^2 + \cdots = \frac{1}{1-s},$$
the free monoid on $s$ generators has cardinality $\frac{1}{1-s}$.
Maybe it's better to use the "free delooping" (a certain weighted colimit) instead of the reduced suspension. This doesn't change the above argument, because we're mapping into a groupoid, but now it is true that the Euler characteristic / cardinality of that category is $1 - s$.
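To spell out the cardinality count in David's first email (this is my paraphrase: it glosses over basepoint bookkeeping and assumes homotopy cardinality is multiplicative along the evaluation fibration, with $|BG| = 1/|G|$):

```latex
% Unpointed maps into the delooping categorify the factor n^{-s}:
%   |Map(S, BG)| = |BG|^{s} = (1/n)^{s} = n^{-s},  when |G| = n and |S| = s.
% Pointed maps out of the reduced suspension are loop-valued maps:
\[
  \mathrm{Map}_*(\Sigma S,\, BG) \;\simeq\; \mathrm{Map}_*(S,\, \Omega BG)
  \;\simeq\; G^{S},
\]
% and evaluation at the basepoint gives a fibration
%   Map_*(\Sigma S, BG)  -->  Map(\Sigma S, BG)  -->  BG,
% so multiplicativity of homotopy cardinality yields
\[
  |\mathrm{Map}(\Sigma S,\, BG)|
  \;=\; |G^{S}| \cdot |BG|
  \;=\; n^{s} \cdot \frac{1}{n}
  \;=\; n^{-(1-s)}.
\]
```

So mapping out of $\Sigma S$ instead of $S$ replaces $n^{-s}$ by $n^{-(1-s)}$, which is exactly the substitution $s \mapsto 1 - s$ appearing in the functional equation.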
On December 10, I gave a keynote address at the Q2B 2025 Conference in Silicon Valley. This is a transcript of my remarks. The slides I presented are here. The video is here.
The first century
We are nearing the end of the International Year of Quantum Science and Technology, so designated to commemorate the 100th anniversary of the discovery of quantum mechanics in 1925. The story goes that 23-year-old Werner Heisenberg, seeking relief from severe hay fever, sailed to the remote North Sea Island of Helgoland, where a crucial insight led to his first, and notoriously obscure, paper describing the framework of quantum mechanics.
In the years following, that framework was clarified and extended by Heisenberg and others. Notable among them was Paul Dirac, who emphasized that we have a theory of almost everything that matters in everyday life. It's the Schrödinger equation, which captures the quantum behavior of many electrons interacting electromagnetically with one another and with atomic nuclei. That describes everything in chemistry and materials science and all that is built on those foundations. But, as Dirac lamented, in general the equation is too complicated to solve for more than a few electrons.
Somehow, over 50 years passed before Richard Feynman proposed that if we want a machine to help us solve quantum problems, it should be a quantum machine, not a classical machine. The quest for such a machine, he observed, is “a wonderful problem because it doesn’t look so easy,” a statement that still rings true.
I was drawn into that quest about 30 years ago. It was an exciting time. Efficient quantum algorithms for the factoring and discrete log problems were discovered, followed rapidly by the first quantum error-correcting codes and the foundations of fault-tolerant quantum computing. By late 1996, it was firmly established that a noisy quantum computer could simulate an ideal quantum computer efficiently if the noise is not too strong or strongly correlated. Many of us were then convinced that powerful fault-tolerant quantum computers could eventually be built and operated.
Three decades later, as we enter the second century of quantum mechanics, how far have we come? Today’s quantum devices can perform some tasks beyond the reach of the most powerful existing conventional supercomputers. Error correction had for decades been a playground for theorists; now informative demonstrations are achievable on quantum platforms. And the world is investing heavily in advancing the technology further.
Current NISQ machines can perform quantum computations with thousands of two-qubit gates, enabling early explorations of highly entangled quantum matter, but still with limited commercial value. To unlock a wide variety of scientific and commercial applications, we need machines capable of performing billions or trillions of two-qubit gates. Quantum error correction is the way to get there.
I’ll highlight some notable developments over the past year—among many others I won’t have time to discuss. (1) We’re seeing intriguing quantum simulations of quantum dynamics in regimes that are arguably beyond the reach of classical simulations. (2) Atomic processors, both ion traps and neutral atoms in optical tweezers, are advancing impressively. (3) We’re acquiring a deeper appreciation of the advantages of nonlocal connectivity in fault-tolerant protocols. (4) And resource estimates for cryptanalytically relevant quantum algorithms have dropped sharply.
Quantum machines for science
A few years ago, I was not particularly excited about running applications on the quantum platforms that were then available; now I’m more interested. We have superconducting devices from IBM and Google with over 100 qubits and two-qubit error rates approaching 10^{-3}. The Quantinuum ion trap device has even better fidelity as well as higher connectivity. Neutral-atom processors have many qubits; they lag behind now in fidelity, but are improving.
Users face tradeoffs: The high connectivity and fidelity of ion traps is an advantage, but their clock speeds are orders of magnitude slower than for superconducting processors. That limits the number of times you can run a given circuit, and therefore the attainable statistical accuracy when estimating expectations of observables.
Verifiable quantum advantage
Much attention has been paid to sampling from the output of random quantum circuits, because this task is provably hard classically under reasonable assumptions. The trouble is that, in the high-complexity regime where a quantum computer can reach far beyond what classical computers can do, the accuracy of the quantum computation cannot be checked efficiently. Therefore, attention is now shifting toward verifiable quantum advantage — tasks where the answer can be checked. If we solved a factoring or discrete log problem, we could easily check the quantum computer’s output with a classical computation, but we’re not yet able to run these quantum algorithms in the classically hard regime. We might settle instead for quantum verification, meaning that we check the result by comparing two quantum computations and verifying the consistency of the results.
A type of classical verification of a quantum circuit was demonstrated recently by BlueQubit on a Quantinuum processor. In this scheme, a designer builds a family of so-called “peaked” quantum circuits such that, for each such circuit and for a specific input, one output string occurs with unusually high probability. An agent with a quantum computer who knows the circuit and the right input can easily identify the preferred output string by running the circuit a few times. But the quantum circuits are cleverly designed to hide the peaked output from a classical agent — one may argue heuristically that the classical agent, who has a description of the circuit and the right input, will find it hard to predict the preferred output. Thus quantum agents, but not classical agents, can convince the circuit designer that they have reliable quantum computers. This observation provides a convenient way to benchmark quantum computers that operate in the classically hard regime.
The notion of quantum verification was explored by the Google team using Willow. One can execute a quantum circuit acting on a specified input, and then measure a specified observable in the output. By repeating the procedure sufficiently many times, one obtains an accurate estimate of the expectation value of that output observable. This value can be checked by any other sufficiently capable quantum computer that runs the same circuit. If the circuit is strategically chosen, then the output value may be very sensitive to many-qubit interference phenomena, in which case one may argue heuristically that accurate estimation of that output observable is a hard task for classical computers. These experiments, too, provide a tool for validating quantum processors in the classical hard regime. The Google team even suggests that such experiments may have practical utility for inferring molecular structure from nuclear magnetic resonance data.
Correlated fermions in two dimensions
Quantum simulations of fermionic systems are especially compelling, since electronic structure underlies chemistry and materials science. These systems can be hard to simulate in more than one dimension, particularly in parameter regimes where fermions are strongly correlated, or in other words profoundly entangled. The two-dimensional Fermi-Hubbard model is a simplified caricature of two-dimensional materials that exhibit high-temperature superconductivity and hence has been much studied in recent decades. Large-scale tensor-network simulations are reasonably successful at capturing static properties of this model, but the dynamical properties are more elusive.
Dynamics in the Fermi-Hubbard model has been simulated recently on both Quantinuum (here and here) and Google processors. Only a 6 x 6 lattice of electrons was simulated, but this is already well beyond the scope of exact classical simulation. Comparing (error-mitigated) quantum circuits with over 4000 two-qubit gates to heuristic classical tensor-network and Majorana path methods, discrepancies were noted, and the Phasecraft team argues that the quantum simulation results are more trustworthy. The Harvard group also simulated models of fermionic dynamics, but were limited to relatively low circuit depths due to atom loss. It’s encouraging that today’s quantum processors have reached this interesting two-dimensional strongly correlated regime, and with improved gate fidelity and noise mitigation we can go somewhat further, but expanding system size substantially in digital quantum simulation will require moving toward fault-tolerant implementations. We should also note that there are analog Fermi-Hubbard simulators with thousands of lattice sites, but digital simulators provide greater flexibility in the initial states we can prepare, the observables we can access, and the Hamiltonians we can reach.
When it comes to many-particle quantum simulation, a nagging question is: “Will AI eat quantum’s lunch?” There is surging interest in using classical artificial intelligence to solve quantum problems, and that seems promising. How will AI impact our quest for quantum advantage in this problem space? This question is part of a broader issue: classical methods for quantum chemistry and materials have been improving rapidly, largely because of better algorithms, not just greater processing power. But for now classical AI applied to strongly correlated matter is hampered by a paucity of training data. Data from quantum experiments and simulations will likely enhance the power of classical AI to predict properties of new molecules and materials. The practical impact of that predictive power is hard to clearly foresee.
The need for fundamental research
Today is December 10th, the anniversary of Alfred Nobel’s death. The Nobel Prize award ceremony in Stockholm concluded about an hour ago, and the Laureates are about to sit down for a well-deserved sumptuous banquet. That’s a fitting coda to this International Year of Quantum. It’s useful to be reminded that the foundations for today’s superconducting quantum processors were established by fundamental research 40 years ago into macroscopic quantum phenomena. No doubt fundamental curiosity-driven quantum research will continue to uncover unforeseen technological opportunities in the future, just as it has in the past.
I have emphasized superconducting, ion-trap, and neutral atom processors because those are most advanced today, but it’s vital to continue to pursue alternatives that could suddenly leap forward, and to be open to new hardware modalities that are not top-of-mind at present. It is striking that programmable, gate-based quantum circuits in neutral-atom optical-tweezer arrays were first demonstrated only a few years ago, yet that platform now appears especially promising for advancing fault-tolerant quantum computing. Policy makers should take note!
The joy of nonlocal connectivity
As the fault-tolerant era dawns, we increasingly recognize the potential advantages of the nonlocal connectivity resulting from atomic movement in ion traps and tweezer arrays, compared to geometrically local two-dimensional processing in solid-state devices. Over the past few years, many contributions from both industry and academia have clarified how this connectivity can reduce the overhead of fault-tolerant protocols.
Even when using the standard surface code, the ability to implement two-qubit logical gates transversally—rather than through lattice surgery—significantly reduces the number of syndrome-measurement rounds needed for reliable decoding, thereby lowering the time overhead of fault tolerance. Moreover, the global control and flexible qubit layout in tweezer arrays increase the parallelism available to logical circuits.
Nonlocal connectivity also enables the use of quantum low-density parity-check (qLDPC) codes with higher encoding rates, reducing the number of physical qubits needed per logical qubit for a target logical error rate. These codes now have acceptably high accuracy thresholds, practical decoders, and—thanks to rapid theoretical progress this year—emerging constructions for implementing universal logical gate sets. (See for example here, here, here, here.)
A serious drawback of tweezer arrays is their comparatively slow clock speed, limited by the timescales for atom transport and qubit readout. A millisecond-scale syndrome-measurement cycle is a major disadvantage relative to microsecond-scale cycles in some solid-state platforms. Nevertheless, the reductions in logical-gate overhead afforded by atomic movement can partially compensate for this limitation, and neutral-atom arrays with thousands of physical qubits already exist.
To realize the full potential of neutral-atom processors, further improvements are needed in gate fidelity and continuous atom loading to maintain large arrays during deep circuits. Encouragingly, active efforts on both fronts are making steady progress.
Approaching cryptanalytic relevance
Another noteworthy development this year was a significant improvement in the physical qubit count required to run a cryptanalytically relevant quantum algorithm, reduced by Gidney to less than 1 million physical qubits from the 20 million Gidney and Ekerå had estimated earlier. This applies under standard assumptions: a two-qubit error rate of 10^{-3} and 2D geometrically local processing. The improvement was achieved using three main tricks. One was using approximate residue arithmetic to reduce the number of logical qubits. (This also suppresses the success probability and therefore lengthens the time to solution by a factor of a few.) Another was using a more efficient scheme to reduce the number of physical qubits for each logical qubit in cold storage. And the third was a recently formulated scheme for reducing the spacetime cost of non-Clifford gates. Further cost reductions seem possible using advanced fault-tolerant constructions, highlighting the urgency of accelerating migration from vulnerable cryptosystems to post-quantum cryptography.
Looking forward
Over the next 5 years, we anticipate dramatic progress toward scalable fault-tolerant quantum computing, and scientific insights enabled by programmable quantum devices arriving at an accelerated pace. Looking further ahead, what might the future hold? I was intrigued by a 1945 letter from John von Neumann concerning the potential applications of fast electronic computers. After delineating some possible applications, von Neumann added: “Uses which are not, or not easily, predictable now, are likely to be the most important ones … they will … constitute the most surprising extension of our present sphere of action.” Not even a genius like von Neumann could foresee the digital revolution that lay ahead. Predicting the future course of quantum technology is even more hopeless because quantum information processing entails an even larger step beyond past experience.
As we contemplate the long-term trajectory of quantum science and technology, we are hampered by our limited imaginations. But one way to loosely characterize the difference between the past and the future of quantum science is this: For the first hundred years of quantum mechanics, we achieved great success at understanding the behavior of weakly correlated many-particle systems, leading for example to transformative semiconductor and laser technologies. The grand challenge and opportunity we face in the second quantum century is acquiring comparable insight into the complex behavior of highly entangled states of many particles, behavior well beyond the scope of current theory or computation. The wonders we encounter in the second century of quantum mechanics, and their implications for human civilization, may far surpass those of the first century. So we should gratefully acknowledge the quantum pioneers of the past century, and wish good fortune to the quantum explorers of the future.
Welcome back to: Has quantum advantage been achieved?
In Part 1 of this mini-series on quantum advantage demonstrations, I told you about the idea of random circuit sampling (RCS) and the experimental implementations thereof. In this post, Part 2 out of 3, I will discuss the arguments and evidence for why I am convinced that the experiments demonstrate a quantum advantage.
Recall from Part 1 that to assess an experimental quantum advantage claim we need to check three criteria:
Does the experiment correctly solve a computational task?
Does it achieve a scalable advantage over classical computation?
Does it achieve an in-practice advantage over the best classical attempt at solving the task?
What’s the issue?
When assessing these criteria for the RCS experiments, there is an important problem: the early quantum computers the experiments ran on were very far from being reliable, and the computation was significantly corrupted by noise. How should we interpret this noisy data? Or more concisely:
Is random circuit sampling still classically hard even when we allow for whatever amount of noise the actual experiments had?
Can we be convinced from the experimental data that this task has actually been solved?
I want to convince you today that we have developed a very good understanding of these questions, which gives a solid underpinning to the advantage claim. Developing that understanding required a mix of methodologies from different areas of science, including theoretical computer science, algorithm design, and physics, and it has been an exciting journey over the past several years.
The noisy sampling task
Let us start by answering the basic question. What computational task did the experiments actually solve?
Recall that, in the ideal RCS scenario, we are given a random circuit $C$ on $n$ qubits and the task is to sample from the output distribution of the state obtained by applying the circuit to a simple reference state. The output probability distribution of this state is determined by the Born rule when I measure every qubit in a fixed choice of basis.
Now what does a noisy quantum computer do when I execute all the gates on it? Well, it prepares a noisy version of the intended state, and once I measure the qubits, I obtain samples from the output distribution of that noisy state.
We should not make the task dependent on the specifics of that state or the noise that determined it, but we can define a computational task based on this observation by fixing how accurate that noisy state preparation is. The natural way to do this is to use the fidelity
$$F = \langle \psi_C | \rho | \psi_C \rangle,$$
which is just the overlap between the ideal state $|\psi_C\rangle$ and the noisy state $\rho$. The fidelity is 1 if the noisy state is equal to the ideal state, and 0 if it is perfectly orthogonal to it.
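As a toy numerical illustration (my own sketch, not from any experiment; the names and numbers are illustrative), we can depolarize a random pure state and watch the fidelity interpolate between 1 and the value attained by a maximally mixed state:

```python
# Toy illustration: fidelity F = <psi| rho |psi> between an ideal pure state
# and a globally depolarized version of it.
import numpy as np

rng = np.random.default_rng(0)
n = 4                      # number of qubits
dim = 2 ** n

# Random "ideal" state |psi>: complex Gaussian amplitudes, normalized
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)

def fidelity(psi, rho):
    """F = <psi| rho |psi>; real for Hermitian rho."""
    return float(np.real(np.conj(psi) @ rho @ psi))

ideal_rho = np.outer(psi, np.conj(psi))      # perfect preparation
maximally_mixed = np.eye(dim) / dim          # no information at all

for lam in [0.0, 0.5, 1.0]:                  # depolarizing strength
    rho = (1 - lam) * ideal_rho + lam * maximally_mixed
    print(f"lambda = {lam}:  F = {fidelity(psi, rho):.4f}")
```

With $n = 4$ qubits, full depolarization gives $F = 2^{-4} = 0.0625$: the "no correlation" baseline.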
Finite-fidelity random circuit sampling: Given a typical random circuit $C$, sample from the output distribution of any quantum state whose fidelity with the ideal output state is at least $F$.
Note that finite-fidelity RCS does not demand success for every circuit, but only for typical circuits from the random circuit ensemble. This matches what the experiments do: they draw random circuits and need the device to perform well on the overwhelming majority of those draws. Accordingly, when the experiments quote a single number as "fidelity", it is really the typical (or, more precisely, circuit-averaged) fidelity, which I will just call $F$.
The noisy experiments claim to have solved finite-fidelity RCS for values of $F$ around 0.1%. What is more, they consistently achieve this value even as the circuit sizes are increased in the later experiments. Both the actual value and the scaling will be important later.
What is the complexity of finite-fidelity RCS?
Quantum advantage of finite-fidelity RCS
Let’s start off by supposing that the quantum computation is (nearly) perfectly executed, so the required fidelity is quite large, say, 90%. In this scenario, we have very good evidence based on computational complexity theory that there is a scalable and in-practice quantum advantage for RCS. This evidence is very strong, comparable to the evidence we have for the hardness of factoring and simulating quantum systems. The intuition behind it is that quantum output probabilities are extremely hard to compute because of a mechanism behind quantum advantages: destructive interference. If you are interested in the subtleties and the open questions, take a look at our survey.
The question now is: how far down in fidelity does this classical hardness persist? Intuitively, the smaller we make $F$, the easier finite-fidelity RCS should become for a classical algorithm (and a quantum computer, too), since the freedom we have in deviating from the ideal state in our simulation becomes larger and larger. This increases the possibility of finding a state that turns out to be easy to simulate within the fidelity constraint.
Somewhat surprisingly, though, finite-fidelity RCS seems to remain hard even for very small values of $F$. I am not aware of any efficient classical algorithm that achieves the finite-fidelity task for $F$ significantly above the trivial baseline value of $2^{-n}$. This is the value a maximally mixed or randomly picked state achieves, because a random state has no correlation with the ideal state (or any other state), and $2^{-n}$ is exactly what you expect in that case (while 0 would correspond to perfect anti-correlation).
One can save some classical runtime compared to solving near-ideal RCS by exploiting a reduced fidelity, but the costs remain exponential. The best known approaches to classically solving finite-fidelity RCS are reported in the papers that performed classical simulations with the parameters of the first Google and USTC experiments (classSim1, classSim2). To achieve this, however, they needed to approximately simulate the ideal circuits at an immense cost. To the best of my knowledge, all but those first two experiments are far out of reach for these algorithms.
Getting the scaling right: weak noise and low depth
So what is the right value of $F$ at which we can hope for a scalable and in-practice advantage of RCS experiments?
When thinking about this question, it is helpful to keep in mind a model of the circuit that a noisy experiment runs. So, let us consider a noisy circuit on $n$ qubits with $d$ layers of gates and single-qubit noise of strength $\epsilon$ on every qubit in every layer. In this scenario, the typical fidelity with the ideal state will decay as $F \approx e^{-\epsilon n d}$.
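As a quick numerical sketch of this decay (the numbers below are purely illustrative, not the parameters of any actual experiment):

```python
# Back-of-the-envelope fidelity decay: F ≈ (1 - eps)^(n*d) ≈ exp(-eps*n*d),
# assuming independent single-qubit noise of strength eps on each of the
# n qubits in each of the d circuit layers.
import math

def fidelity_estimate(eps, n, d):
    """Typical circuit fidelity under the independent-noise model."""
    return (1.0 - eps) ** (n * d)

n, d = 53, 20   # qubits and layers, Sycamore-scale for illustration only
for eps in [2e-3, 5e-3, 1e-2]:
    print(f"eps = {eps:g}:  F ≈ {fidelity_estimate(eps, n, d):.2e}")
```

The point of the sketch is the exponential sensitivity: a factor-of-a-few change in the physical error rate moves the fidelity by orders of magnitude, which is why sub-percent fidelities are the natural scale for these experiments.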
Any reasonably testable value of the fidelity needs to scale as $F \geq 1/\mathrm{poly}(n)$, since eventually we need to estimate the average fidelity from the experimental samples, and this typically requires at least roughly $1/F^2$ samples, so exponentially small fidelities are experimentally invisible. A polynomially small fidelity is also much closer to the near-ideal scenario (90%) than to the trivial scenario ($2^{-n}$). While we cannot formally pin this down, the intuition behind the complexity-theoretic evidence for the hardness of near-ideal RCS persists into the $1/\mathrm{poly}(n)$ regime: to sample up to such high precision, we still need a reasonably accurate estimate of the ideal probabilities, and getting this is computationally extremely difficult. Scalable quantum advantage in this regime is therefore a pretty safe bet.
How do the parameters of the experiment and the RCS instances need to scale with the number of qubits to experimentally achieve the $1/\mathrm{poly}(n)$ fidelity regime? The limit to consider is one in which the noise rate decreases with the number of qubits, while the circuit depth is only allowed to increase very slowly. The precise limit depends on the circuit architecture, i.e., the choice of circuit connectivity and the gate set, through a constant $\epsilon_c$, as I will explain in more detail below.
Weak-noise and low-depth scaling: (Weak noise) The local noise rate of the quantum device scales as $\epsilon = O(1/n)$. (Low depth) The circuit depth scales as $d = O(\log n)$.
This limit is such that the fidelity scales as $F \sim n^{-c}$ for some constant $c > 0$. It is also a natural scaling limit for noisy devices whose error rates gradually improve through better engineering. You might be worried about the fact that the depth needs to be quite low, but it turns out that there is a solid quantum advantage even for $O(\log n)$-depth circuits.
The precise definition of the weak-noise regime is motivated by the following observation. It turns out to be crucial for assessing the noisy data from the experiment.
Fidelity versus XEB: a phase transition
Remember from Part 1 that the experiments measured a quantity called the cross-entropy benchmark (XEB),
$$\mathrm{XEB} = 2^n \, \mathbb{E}\left[ p_C(x) \right] - 1,$$
where $p_C(x)$ is the ideal output probability of the sampled outcome $x$, and the average is taken over the experimental samples and the random circuits $C$.
The XEB averages the ideal probabilities corresponding to the sampled outcomes from experiments on random circuits $C$. Thus, it correlates the experimental and ideal output distributions of those random circuits. You can think of it as a "classical version" of the fidelity: if the experimental distribution is correct, the XEB will essentially be 1. If it is uniformly random, the XEB is 0.
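Here is a small simulation sketch of that behavior (my own toy model: a Porter-Thomas-like output distribution generated from Gaussian amplitudes stands in for a random circuit's output state). An ideal sampler scores an XEB near 1, while a uniform sampler scores near 0:

```python
# Toy XEB: for a Porter-Thomas-like distribution p, the linear cross-entropy
# benchmark XEB = 2^n * E_x[p(x)] - 1 is ≈ 1 when x is drawn from p itself
# and ≈ 0 when x is drawn uniformly at random.
import numpy as np

rng = np.random.default_rng(1)
n = 12
dim = 2 ** n

# Stand-in for a random circuit's output state: complex Gaussian amplitudes
amps = rng.normal(size=dim) + 1j * rng.normal(size=dim)
p = np.abs(amps) ** 2
p /= p.sum()                                  # ideal output distribution

def xeb(samples, p, dim):
    """Linear XEB: dim times the mean ideal probability of the samples, minus 1."""
    return dim * p[samples].mean() - 1.0

shots = 200_000
ideal_samples = rng.choice(dim, size=shots, p=p)   # perfect "quantum device"
uniform_samples = rng.integers(dim, size=shots)    # sampler that knows nothing

print("ideal sampler:   XEB ≈", xeb(ideal_samples, p, dim))
print("uniform sampler: XEB ≈", xeb(uniform_samples, p, dim))
```

For Gaussian amplitudes, the expected ideal-sampler XEB is $2^n \sum_x p(x)^2 - 1 \approx 1$, up to statistical fluctuations, while the uniform sampler gives $2^n \cdot 2^{-n} - 1 = 0$ on average.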
The experiments claimed that the XEB is a good proxy for the circuit-averaged fidelity $F$, and so we need to understand when this is true. Fortunately, in the past few years, alongside the improved experiments, we have developed a very good understanding of this question (WN, Spoof2, PT1, PT2).
It turns out that the quality of correspondence between XEB and average fidelity depends strongly on the noise in the experimental quantum state. In fact, there is a sharp phase transition: there is an architecture-dependent constant $\epsilon_c$ such that when the experimental local noise rate satisfies $\epsilon < \epsilon_c$, the XEB is a good and reliable proxy for the average fidelity for any system size $n$ and circuit depth $d$. This is exactly the weak-noise regime. Above that threshold, in the strong-noise regime, the XEB is an increasingly bad proxy for the fidelity (PT1, PT2).
Let me be more precise: In the weak-noise regime, when we consider the decay of the XEB as a function of circuit depth $d$, the rate of decay is given by $\epsilon n$, i.e., the XEB decays as $e^{-\epsilon n d}$. Meanwhile, in the strong-noise regime the rate of decay is constant in $n$, giving an XEB decay as $e^{-c d}$ for a constant $c$. At the same time, the fidelity decays as $e^{-\epsilon n d}$ regardless of the noise regime. Hence, in the weak-noise regime, the XEB is a good proxy for the fidelity, while in the strong-noise regime, there is an exponentially increasing gap between the XEB (which remains comparatively large) and the fidelity (which continues to decay exponentially).
This is what the following plot shows. We computed it from an exact mapping of the behavior of the XEB to the dynamics of a statistical-mechanics model that can be evaluated efficiently. Using this mapping, we can also compute the noise threshold for whichever random circuit family and architecture you are interested in.
From (PT2). The $y$-axis shows the decay rate of the XEB, $n$ is the number of qubits, and $\epsilon$ is the local noise rate.
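To get a feel for the size of the gap in the strong-noise regime, here is a back-of-the-envelope calculation with made-up numbers (the exponential forms follow the discussion above; the constants are arbitrary):

```python
# Toy comparison of XEB vs fidelity decay. Assumed model, not from the papers:
# the fidelity decays as exp(-eps*n*d) in both regimes, while the XEB decays
# as exp(-eps*n*d) below the transition but only as exp(-c*d) above it.
import math

def circuit_fidelity(eps, n, d):
    return math.exp(-eps * n * d)        # decays this way in both regimes

def xeb_weak_noise(eps, n, d):
    return math.exp(-eps * n * d)        # tracks the fidelity

def xeb_strong_noise(c, d):
    return math.exp(-c * d)              # decay rate independent of n

n, d = 60, 24
eps_strong, c = 0.05, 0.5                # arbitrary illustrative constants
gap = xeb_strong_noise(c, d) / circuit_fidelity(eps_strong, n, d)
print(f"strong noise: XEB = {xeb_strong_noise(c, d):.1e}, "
      f"fidelity = {circuit_fidelity(eps_strong, n, d):.1e}, gap = {gap:.1e}")
```

Even at these modest sizes, the XEB overstates the fidelity by dozens of orders of magnitude in the strong-noise regime, and the gap grows exponentially with depth.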
Where are the experiments?
We are now ready to take a look at the crux of assessing the noisy data: can we trust the reported XEB values as an estimator of the fidelity? If so, do the experiments solve finite-fidelity RCS in the solidly hard regime where $F \geq 1/\mathrm{poly}(n)$?
In their more recent paper (PT1), the Google team explicitly verified that the experiment is well below the phase transition, and it turns out that the first experiment was just at the boundary. The USTC experiments had comparable noise rates, and the Quantinuum experiment much better ones. Since the fidelity decays as $e^{-\epsilon n d}$, but the reported XEB values stayed consistently around 0.1% as $n$ was increased, the experimental error rate improved even faster than the $1/n$ scaling required for the weak-noise regime, namely more like $1/(nd)$. Altogether, the experiments are therefore in the weak-noise regime, both in terms of absolute numbers relative to $\epsilon_c$ and in terms of the required scaling.
Of course, to derive the transition, we made some assumptions about the noise such as that the noise is local, and that it does not depend much on the circuit itself. In the advantage experiments, these assumptions about the noise are characterized and tested. This is done through a variety of means at increasing levels of complexity, including detailed characterization of the noise in individual gates, gates run in parallel, and eventually in a larger circuit. The importance of understanding the noise shows in the fact that a significant portion of the supplementary materials of the advantage experiments is dedicated to getting this right. All of this contributes to the experimental justification for using the XEB as a proxy for the fidelity!
The data shows that the experiments solved finite-fidelity RCS for values of $F$ at or above a constant value of roughly 0.1% as the experiments grew. In the following plot, I compare the experimental fidelity values to the near-ideal scenario on the one hand, and the trivial value $2^{-n}$ on the other hand. Viewed at this scale, the values of $F$ for which the experiments solved finite-fidelity RCS are indeed vastly closer to the near-ideal value than to the trivial baseline, which should boost our confidence that reproducing samples at a similar fidelity is extremely challenging.
The phase transition matters!
You might be tempted to say: “Well, but is all this really so important? Can’t I just use XEB and forget all about fidelity?”
The phase transition shows why that would change the complexity of the problem: in the strong-noise regime, the XEB can stay high even when the fidelity is exponentially small. And indeed, this discrepancy can be exploited by so-called spoofers of the XEB. These are efficient classical algorithms that can succeed at a quantum advantage test even though they clearly do not achieve the intended advantage. These spoofers (Spoof1, Spoof2) can achieve high XEB scores, comparable to those of the experiments, scaling like $e^{-c d}$ in the circuit depth $d$ for some constant $c$.
Their basic idea is to introduce strong, judiciously chosen noise at specific circuit locations, which has the effect of breaking the simulation task up into smaller, much easier components while still giving a high XEB score. In doing so, they exploit the strong-noise regime, in which the XEB is a really bad proxy for the fidelity. This allows them to sample from states with exponentially low fidelity while achieving a high XEB value.
The discovery of the phase transition and the associated spoofers highlights the importance of modeling when assessing—and even formulating—the advantage claim based on noisy data.
But we can’t compute the XEB!
You might also be worried that the experiments did not actually compute the XEB in the advantage regime, because to estimate it they would have needed to compute ideal probabilities—a task that is hard by definition of the advantage regime. Instead, they used a bunch of different ways to extrapolate the true XEB from XEB proxies (a proxy of a proxy of the fidelity). Is this a valid way of getting an estimate of the true XEB?
It totally is! Different extrapolations—from easy-to-simulate to hard-to-simulate circuits, from small systems to large systems, etc.—all gave consistent answers for the experimental XEB value of the supremacy circuits. Think of this as having several lines that cross at the same point. For that crossing to be a coincidence, something crazy and conspiratorial would have to happen exactly when you move to the supremacy circuits, and from several independent directions at once. That is why it is reasonable to trust the reported value of the XEB.
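One cross-check of this kind can be sketched numerically. Under a digital error model (a standard assumption in RCS experiments, not a result established here), the full-circuit fidelity is predicted as a product of independently measured component fidelities; all error rates and gate counts below are illustrative placeholders, not numbers from any experiment:

```python
import math

# Illustrative component error rates and counts (placeholders).
e_1q, e_2q, e_meas = 0.0015, 0.0060, 0.02
n_1q, n_2q, n_qubits = 1113, 430, 53

# Digital error model: errors are assumed independent, so the predicted
# full-circuit fidelity is a product over all components ...
F_product = (1 - e_1q) ** n_1q * (1 - e_2q) ** n_2q * (1 - e_meas) ** n_qubits

# ... which is nearly the same as exponential decay in the total error budget.
lam = n_1q * e_1q + n_2q * e_2q + n_qubits * e_meas
F_exponential = math.exp(-lam)

print(F_product, F_exponential)   # both a fraction of a percent, nearly equal
```

A prediction of this kind, made from component-level characterization alone, is one of the independent "lines" that can be compared against XEB-proxy extrapolations.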
That’s exactly how experiments work!
All of this is to say that establishing that the experiments correctly solved finite-fidelity RCS and therefore show quantum advantage involved a lot of experimental characterization of the noise as well as theoretical work to understand the effects of noise on the quantity we care about—the fidelity between the experimental and ideal states.
In this respect (and maybe also in the scale of the discovery), the quantum advantage experiments are similar to the recent experiments reporting discovery of the Higgs boson and gravitational waves. While I do not claim to understand any of the details, what I do understand is that in both experiments, there is an unfathomable amount of data that could not be interpreted without preselection and post-processing of the data, theories, extrapolations and approximations that model the experiment and measurement apparatus. All of those enter the respective smoking-gun plots that show the discoveries.
If you believe in the validity of experimental physics methodology, you should therefore also believe in the type of evidence underlying the experimental claims of the quantum advantage demonstrations: that they sampled from the output distribution of a quantum state with the reported fidelities.
Put succinctly: If you believe in the Higgs boson and gravitational waves, you should probably also believe in the experimental demonstration of quantum advantage.
What are the counter-arguments?
The theoretical computer scientist
“The weak-noise limit is not physical. The appropriate scaling limit is one in which the local noise rate of the device is constant while the system size grows, and in that case, there is a classical simulation algorithm for RCS (SimIQP, SimRCS).”
In theoretical computer science, studying how time or system size scales with the input size is considered very natural: we say an algorithm is efficient if its runtime and space usage depend only polynomially on the input size.
But all scaling arguments are hypothetical concepts, and we only care about the scaling at relevant sizes. In the end, every scaling limit is going to hit the wall of physical reality—be it the amount of energy or human lifetime that limits the time of an algorithm, or the physical resources that are required to build larger and larger computers. To keep the scaling limit going as we increase the size of our computations, we need innovation that makes the components smaller and less noisy.
At the scales relevant to RCS, the scaling of the noise is benign and even natural. Why? Well, currently, the actual noise in quantum computers is not governed by any fundamental limit, but by engineering challenges. Realizing the weak-noise scaling limit therefore amounts to engineering improvements in system size and noise rate that are achieved over time. Sure, at some point that scaling limit is also going to hit a fundamental barrier below which the noise cannot be improved. But we are surely still far away from that limit. What is more, logical qubits are already starting to work and achieve beyond-breakeven fidelities. So even if the engineering improvements should flatten out from here onward, QEC will keep the noise scaling going and even accelerate it in the intermediate future.
The complexity maniac
“All the hard complexity-theoretic evidence for quantum advantage is in the near-ideal regime, but now you are claiming advantage for the low-fidelity version of that task.”
This is probably the strongest counter-argument in my opinion, and I gave my best response above. Let me just add that this is a question about computational complexity. In the end, all of complexity theory is based on belief. The only real evidence we have for the hardness of any task is the absence of an efficient algorithm, or the reduction to a paradigmatic, well-studied task for which there is no efficient algorithm.
I am not sure how much I would bet that you cannot find an efficient algorithm for finite-fidelity RCS in the regime of the experiments, but it is certainly a pizza.
The enthusiastic skeptic
“There is no verification test that just depends on the classical samples, is efficient and does not make any assumptions about the device. In particular, you cannot unconditionally verify fidelity just from the classical samples. Why should I believe the data?”
Yes, sure, the current advantage demonstrations are not device-independent. But the comparison you should have in mind is with Bell tests. The first proper Bell tests, by Aspect and others in the 1980s, were not free of loopholes. They still allowed for contrived explanations of the data that did not violate local realism. Still, I can hardly believe that anyone would argue that Bell inequalities were not violated already back then.
As the years passed, these remaining loopholes were closed. To be a skeptic of the data, people needed to come up with more and more adversarial scenarios that could explain it. We are working to make the same happen with quantum advantage demonstrations: coming up with better schemes and better tests that require fewer and fewer assumptions or less and less knowledge about the specifics of the device.
The “this is unfair” argument
“When you chose the gates and architecture of the circuit depending on your device, you tailored the task too much to the device, and that is unfair. Not even the different RCS experiments solve exactly the same task.”
This is not really an argument against the achievement of quantum advantage, but rather against the particular choices of circuit ensembles in the experiments. Sure, the specific computations solved are still somewhat tailored to the hardware itself, and in this sense the experiments are not hardware-independent yet, but they still solve perfectly fine computational tasks. Moving away from such hardware-tailored task specifications is another important next step, and we are working on it.
In the third and last part of this mini-series I will address next steps in quantum advantage that aim at closing some of the remaining loopholes. The most important—and theoretically most interesting—one is to enable efficient verification of quantum advantage using little or even no specific knowledge about the device that was used, but just the measurement outcomes.
References
(survey) Hangleiter, D. & Eisert, J. Computational advantage of quantum random sampling. Rev. Mod. Phys. 95, 035001 (2023).
(classSim1) Pan, F., Chen, K. & Zhang, P. Solving the sampling problem of the Sycamore quantum circuits. Phys. Rev. Lett. 129, 090502 (2022).
(classSim2) Kalachev, G., Panteleev, P., Zhou, P. & Yung, M.-H. Classical Sampling of Random Quantum Circuits with Bounded Fidelity. arXiv:2112.15083 (2021).
(WN) Dalzell, A. M., Hunter-Jones, N. & Brandão, F. G. S. L. Random Quantum Circuits Transform Local Noise into Global White Noise. Commun. Math. Phys. 405, 78 (2024).
(PT1) Morvan, A. et al. Phase transitions in random circuit sampling. Nature 634, 328–333 (2024).
(PT2) Ware, B. et al. A sharp phase transition in linear cross-entropy benchmarking. arXiv:2305.04954 (2023).
(Spoof1) Barak, B., Chou, C.-N. & Gao, X. Spoofing Linear Cross-Entropy Benchmarking in Shallow Quantum Circuits. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021) (ed. Lee, J. R.), vol. 185, 30:1–30:20 (2021).
(Spoof2) Gao, X. et al. Limitations of Linear Cross-Entropy as a Measure for Quantum Advantage. PRX Quantum 5, 010334 (2024).
(SimIQP) Bremner, M. J., Montanaro, A. & Shepherd, D. J. Achieving quantum supremacy with sparse and noisy commuting quantum computations. Quantum 1, 8 (2017).
(SimRCS) Aharonov, D., Gao, X., Landau, Z., Liu, Y. & Vazirani, U. A polynomial-time classical algorithm for noisy random circuit sampling. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, 945–957 (2023).
I recently listened again to Richard Feynman explaining why the flowing of time is probably an illusion. In modern physics time is just a coordinate, on the same footing as space, and the universe can be described as a four-dimensional object — a spacetime block. In that view, nothing really “flows”. All events simply are, laid out in a 4D structure. What we experience as the passage of time is tied instead to the arrow of entropy: the fact that we move through a sequence of states ordered by increasing disorder, and that memory itself is asymmetric.
Today I was saddened to hear of the passing of Hans Jensen, a physicist and former colleague in the CDF experiment at Fermilab. There is an obituary page here with nice pics and a bio if you want detail on his interesting, accomplished life. Here I thought I would remember him by pasting an excerpt of my 2016 book, "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab", where he is featured. The topic of the anecdote is the data collection for the top quark search. The date is December 1992. ---
A basic problem in sieve theory is to understand what happens when we start with the integers (or some subinterval of the integers) and remove some congruence classes $a_i \pmod{q_i}$ for various moduli $q_i$. Here we shall concern ourselves with the simple setting where we are sieving the entire integers rather than an interval, and are only removing a finite number of congruence classes $a_1 \pmod{q_1}, \dots, a_k \pmod{q_k}$. In this case, the set of integers that remain after the sieving is periodic with period $q := \operatorname{lcm}(q_1,\dots,q_k)$, so one can work without loss of generality in the cyclic group $\mathbb{Z}/q\mathbb{Z}$. One can then ask: what is the density of the sieved set $\{ n \in \mathbb{Z}/q\mathbb{Z} : n \not\equiv a_i \pmod{q_i} \text{ for all } i = 1,\dots,k \}$?
If the moduli $q_i$ were all coprime, then it is easy to see from the Chinese remainder theorem that the density is given by the product $\prod_{i=1}^k \left(1 - \frac{1}{q_i}\right)$.
However, when the $q_i$ are not coprime, the situation is more complicated. One can use the inclusion-exclusion formula to get a complicated expression for the density, but it is not easy to work with. Sieve theory also supplies various useful upper and lower bounds (starting with the classical Bonferroni inequalities), but these do not give exact formulae.
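Since the sieved set is periodic, its density can always be computed by brute force within one period. Here is a minimal sketch (the function name is mine) showing the CRT product formula holding for coprime moduli and failing for non-coprime ones:

```python
from math import lcm, prod

def sieved_density(classes):
    """Density of integers surviving the removal of a_i (mod q_i).

    classes: list of (a_i, q_i) pairs.  The surviving set is periodic with
    period lcm(q_1, ..., q_k), so it suffices to count inside one period.
    """
    q = lcm(*(qi for _, qi in classes))
    survivors = sum(1 for n in range(q)
                    if all(n % qi != ai % qi for ai, qi in classes))
    return survivors / q

# Pairwise coprime moduli: density equals the product (1-1/q_1)...(1-1/q_k).
coprime = [(0, 2), (1, 3), (2, 5)]
assert abs(sieved_density(coprime) - prod(1 - 1/q for _, q in coprime)) < 1e-12

# Non-coprime moduli: the product formula no longer gives the right answer.
print(sieved_density([(0, 4), (2, 6)]))   # 2/3
print(prod(1 - 1/q for q in (4, 6)))      # 5/8 = 0.625, not the true density
```

Of course, this is only feasible when the lcm of the moduli is small; the interest of exact results such as Rogers' theorem is precisely that they apply in general.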
In this blog post I would like to note one simple fact, due to Rogers, that one can say about this problem:
Theorem 1 (Rogers’ theorem) For fixed $q_1,\dots,q_k$, the density of the sieved set is maximized when all the $a_i$ vanish. Thus, the density for arbitrary residues $a_i$ is at most the density obtained by sieving out $0 \pmod{q_i}$ for each $i$.
Example 2 If one sieves out , , and , then only remains, giving a density of . On the other hand, if one sieves out , , and , then the remaining elements are and , giving the larger density of .
This theorem is somewhat obscure: its only appearance in print is in pages 242-244 of this 1966 text of Halberstam and Roth, where the authors write in a footnote that the result is “unpublished; communicated to the authors by Professor Rogers”. I have only been able to find it cited in three places in the literature: in this 1996 paper of Lewis, in this 2007 paper of Filaseta, Ford, Konyagin, Pomerance, and Yu (where they credit Tenenbaum for bringing the reference to their attention), and briefly in this 2008 paper of Ford. As far as I can tell, the result is not available online, which could explain why it is rarely cited (and also not known to AI tools). This became relevant recently with regards to Erdös problem 281, posed by Erdös and Graham in 1980, which was solved recently by Neel Somani, via an AI query, with an elegant ergodic theory argument. However, shortly after this solution was located, it was discovered by KoishiChan that Rogers’ theorem reduces this problem immediately to a very old result of Davenport and Erdös from 1936. Apparently, Rogers’ theorem was so obscure that even Erdös was unaware of it when posing the problem!
Modern readers may see some similarities between Rogers’ theorem and various rearrangement or monotonicity inequalities, suggesting that the result may be proven by some sort of “symmetrization” or “compression” method. This is indeed the case, and is basically Rogers’ original proof. We can modernize it a bit as follows. Firstly, we can abstract the setting to a finite cyclic group $G$, with residue classes now becoming cosets $x_i + H_i$ of various subgroups $H_i$ of $G$. We can take complements and restate Rogers’ theorem as follows:
Theorem 3 (Rogers’ theorem, again) Let $x_1 + H_1, \dots, x_k + H_k$ be cosets of subgroups $H_1,\dots,H_k$ of a finite cyclic group $G$. Then $\left| \bigcup_{i=1}^k (x_i + H_i) \right| \geq \left| \bigcup_{i=1}^k H_i \right|.$
Example 4 Take , , , and . Then the cosets , , and cover the residues , with a cardinality of ; but the subgroups cover the residues , having the smaller cardinality of .
Intuitively: “sliding” the cosets together reduces the total amount of space that these cosets occupy. As pointed out in comments, the requirement of cyclicity is crucial; four lines in a finite affine plane already suffice to be a counterexample otherwise.
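This "sliding" intuition is easy to test by brute force in a small cyclic group. A randomized sanity check (names and parameters are mine; this is evidence, not a proof):

```python
from random import randrange, seed

def subgroup(n, d):
    """The subgroup of Z/n generated by d, for d dividing n: {0, d, 2d, ...}."""
    return set(range(0, n, d))

def coset(H, x, n):
    return {(h + x) % n for h in H}

def union_size(sets):
    u = set()
    for s in sets:
        u |= s
    return len(u)

seed(1)
n = 36
divisors = [d for d in range(1, n + 1) if n % d == 0]

# Randomized check of Theorem 3 in Z/36: sliding each coset back onto its
# subgroup never increases the total size of the union.
for _ in range(2000):
    Hs = [subgroup(n, divisors[randrange(len(divisors))]) for _ in range(3)]
    xs = [randrange(n) for _ in range(3)]
    assert union_size([coset(H, x, n) for H, x in zip(Hs, xs)]) >= union_size(Hs)
print("no counterexample in 2000 random trials")
```

Running the same experiment over a non-cyclic group such as $(\mathbb{Z}/3)^2$, with lines of an affine plane as the cosets, would turn up the counterexamples mentioned above.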
By factoring the cyclic group $G$ into $p$-groups, Rogers’ theorem is an immediate consequence of two observations:
Theorem 5 (Rogers’ theorem for cyclic groups of prime power order) Rogers’ theorem holds when $G = \mathbb{Z}/p^j\mathbb{Z}$ for some prime power $p^j$.
Theorem 6 (Rogers’ theorem preserved under products) If Rogers’ theorem holds for two finite abelian groups $G_1, G_2$ of coprime orders, then it also holds for the product $G_1 \times G_2$.
The case of cyclic groups of prime power order is trivial, because the subgroups of $\mathbb{Z}/p^j\mathbb{Z}$ are totally ordered. In this case $\bigcup_{i=1}^k H_i$ is simply the largest of the $H_i$, which has the same size as any of its cosets and thus has lesser or equal cardinality to $\bigcup_{i=1}^k (x_i + H_i)$.
The preservation of Rogers’ theorem under products is also routine to verify. By the coprimality of the orders of $G_1$ and $G_2$ and standard group theoretic arguments (e.g., Goursat’s lemma, the Schur–Zassenhaus theorem, or the classification of finite abelian groups), one can see that any subgroup $H_i$ of $G_1 \times G_2$ splits as a direct product $H_i = H_i^{(1)} \times H_i^{(2)}$ of subgroups of $G_1, G_2$ respectively, so the cosets also split as $x_i + H_i = (x_i^{(1)} + H_i^{(1)}) \times (x_i^{(2)} + H_i^{(2)}).$
Applying Rogers’ theorem for $G_2$ to each “vertical slice” $\{g_1\} \times G_2$ of $G_1 \times G_2$ and summing, we see that $\left| \bigcup_i (x_i^{(1)} + H_i^{(1)}) \times (x_i^{(2)} + H_i^{(2)}) \right| \geq \left| \bigcup_i (x_i^{(1)} + H_i^{(1)}) \times H_i^{(2)} \right|,$
and then applying Rogers’ theorem for $G_1$ to each “horizontal slice” $G_1 \times \{g_2\}$ and summing, we obtain $\left| \bigcup_i (x_i^{(1)} + H_i^{(1)}) \times H_i^{(2)} \right| \geq \left| \bigcup_i H_i^{(1)} \times H_i^{(2)} \right|.$
Combining the two inequalities, we obtain the claim.
Abstract. Coxeter and Dynkin diagrams classify a wide variety of structures, most notably finite reflection groups, lattices having such groups as symmetries, compact simple Lie groups and complex simple Lie algebras. The simply laced or “ADE” Dynkin diagrams also classify finite subgroups of SU(2) and quivers with finitely many indecomposable representations. This introductory tour of Coxeter and Dynkin diagrams, based on the column This Week’s Finds in Mathematical Physics, is made to accompany a series of five lecture videos.
I’m a bit sorry that I didn’t probe deeper into why Dynkin diagrams are what they are: that is, why these and no others? I’m also sorry I didn’t dig into the “black magic” that I mention at the end: that is, why does this black magic work? I’d also like to include a little comparison of the 4 lattices you get from the Lie algebra of a compact simple Lie group: the weight lattice, the coweight lattice, the root lattice, and the coroot lattice — merely because I tend to get them confused, and my exposition needed to say a bit about these.
Luckily I can add these other things later. And I think keeping it short and snappy has its own charms.
When Lee and Yang suggested that the laws of physics might not be invariant under spatial reflection — that there’s a fundamental difference between left and right — Pauli was skeptical. In a letter to Victor Weisskopf in January 1957, he wrote:
“Ich glaube aber nicht, daß der Herrgott ein schwacher Linkshänder ist.”
(I do not believe that the Lord is a weak left-hander.)
But just two days after Pauli wrote this letter, Chien-Shiung Wu’s experiment confirmed that Lee and Yang were correct. There’s an inherent asymmetry in nature.
We can trace this back to how the ‘left-handed’ fermions and antifermions live in a different representation of the Standard Model gauge group than the right-handed ones. And when we try to build grand unified theories that take this into account, we run into the fact that while we can fit the Standard Model gauge group into $\mathrm{Spin}(10)$ in various ways, not all these ways produce the required asymmetry. There’s a way where it fits into $\mathrm{Spin}(9) \subset \mathrm{Spin}(10)$, which is too symmetrical to work… and alas, this one has a nice octonionic description!
To keep things simple I’ll explain this by focusing, not on the whole Standard Model gauge group, but on its subgroup $\mathrm{SU}(2) \times \mathrm{SU}(3)$. Here is a theorem proved by Will Sawin in response to a question of mine on MathOverflow:
Theorem 10. There are exactly two conjugacy classes of subgroups of $\mathrm{Spin}(10)$ that are isomorphic to $\mathrm{SU}(2) \times \mathrm{SU}(3)$. One of them has a representative that is a subgroup of $\mathrm{Spin}(9)$, while the other does not.
I’ll describe representatives of these two subgroups; then I’ll say a bit about how they show up in physics, and then I’ll show you Sawin’s proof.
We can get both subgroups in a unified way! There’s always an inclusion $\mathrm{SO}(n) \times \mathrm{SO}(m) \hookrightarrow \mathrm{SO}(n+m),$
and taking double covers of each group we get a 2-1 homomorphism $\mathrm{Spin}(n) \times \mathrm{Spin}(m) \to \mathrm{Spin}(n+m).$
In particular we have $\mathrm{Spin}(4) \times \mathrm{Spin}(6) \to \mathrm{Spin}(10),$
so composing with the exceptional isomorphisms $\mathrm{Spin}(4) \cong \mathrm{SU}(2) \times \mathrm{SU}(2)$ and $\mathrm{Spin}(6) \cong \mathrm{SU}(4),$
we get a 2-1 homomorphism $\mathrm{SU}(2) \times \mathrm{SU}(2) \times \mathrm{SU}(4) \to \mathrm{Spin}(10).$
Now, there are three obvious ways to include $\mathrm{SU}(2) \times \mathrm{SU}(3)$ in $\mathrm{SU}(2) \times \mathrm{SU}(2) \times \mathrm{SU}(4)$. There is an obvious inclusion $\mathrm{SU}(3) \hookrightarrow \mathrm{SU}(4),$
but there are three obvious inclusions $\mathrm{SU}(2) \hookrightarrow \mathrm{SU}(2) \times \mathrm{SU}(2),$
namely the left one: $g \mapsto (g, 1),$
the right one: $g \mapsto (1, g),$
and the diagonal one: $g \mapsto (g, g).$
Combining these with our earlier maps, we actually get a one-to-one map from $\mathrm{SU}(2) \times \mathrm{SU}(3)$ to $\mathrm{Spin}(10)$ in each case. So we get three subgroups of $\mathrm{Spin}(10)$, all isomorphic to $\mathrm{SU}(2) \times \mathrm{SU}(3)$:
There’s the left subgroup, which is the image of this composite homomorphism: $\mathrm{SU}(2) \times \mathrm{SU}(3) \xrightarrow{(g,h) \mapsto (g,1,h)} \mathrm{SU}(2) \times \mathrm{SU}(2) \times \mathrm{SU}(4) \to \mathrm{Spin}(10),$ where $h \in \mathrm{SU}(3)$ is regarded as an element of $\mathrm{SU}(4)$.
There’s the diagonal subgroup, which is the image of this: $\mathrm{SU}(2) \times \mathrm{SU}(3) \xrightarrow{(g,h) \mapsto (g,g,h)} \mathrm{SU}(2) \times \mathrm{SU}(2) \times \mathrm{SU}(4) \to \mathrm{Spin}(10).$
And there’s the right subgroup, which is the image of this: $\mathrm{SU}(2) \times \mathrm{SU}(3) \xrightarrow{(g,h) \mapsto (1,g,h)} \mathrm{SU}(2) \times \mathrm{SU}(2) \times \mathrm{SU}(4) \to \mathrm{Spin}(10).$
The left and right subgroups are actually conjugate, but the diagonal one is truly different! We’ll prove this by taking a certain representation of $\mathrm{Spin}(10)$, called the Weyl spinor representation, and restricting it to those two subgroups. We’ll get inequivalent representations of $\mathrm{SU}(2) \times \mathrm{SU}(3)$. This proves the two subgroups aren’t conjugate.
This argument is also interesting for physics. When we restrict to the left subgroup, we get a representation of $\mathrm{SU}(2) \times \mathrm{SU}(3)$ that matches what we actually see for one generation of fermions! This is the basis of the so-called $\mathrm{SO}(10)$ grand unified theory, which should really be called the $\mathrm{Spin}(10)$ grand unified theory.
(In fact this works not only for $\mathrm{SU}(2) \times \mathrm{SU}(3)$ but for the whole Standard Model gauge group, which is larger. I’m focusing on $\mathrm{SU}(2) \times \mathrm{SU}(3)$ just because it makes the story simpler.)
When we restrict the Weyl spinor representation to the diagonal subgroup, we get a representation of $\mathrm{SU}(2) \times \mathrm{SU}(3)$ that is not physically correct. Unfortunately, it’s the diagonal subgroup that shows up in several papers connecting the Standard Model gauge group to the octonions. I plan to say a lot more about this later.
The left subgroup
Let’s look at the left subgroup, the image of this composite: $\mathrm{SU}(2) \times \mathrm{SU}(3) \to \mathrm{SU}(2) \times \mathrm{SU}(2) \times \mathrm{SU}(4) \to \mathrm{Spin}(10).$
$\mathrm{Spin}(10)$ has a 32-dimensional unitary representation called the ‘Dirac spinor’ representation. This representation is really on the exterior algebra $\Lambda \mathbb{C}^5$. It’s the direct sum of two irreducible parts, the even grades and the odd grades: $\Lambda \mathbb{C}^5 \cong \Lambda^{\mathrm{even}} \mathbb{C}^5 \oplus \Lambda^{\mathrm{odd}} \mathbb{C}^5.$
Physicists call these two irreducible representations ‘right- and left-handed Weyl spinors’, and denote them as $16$ and $\overline{16}$ since they’re 16-dimensional and one is the dual of the other.
Let’s restrict the $16$ to the left subgroup and see what we get.
To do this, first we can restrict the $16$ along $\mathrm{SU}(2) \times \mathrm{SU}(2) \times \mathrm{SU}(4) \to \mathrm{Spin}(10)$ and get $(2 \otimes 1 \otimes 4) \oplus (1 \otimes 2 \otimes \overline{4}).$
Here $1$ is the trivial representation of $\mathrm{SU}(2)$, $2$ is the tautologous representation of $\mathrm{SU}(2)$, and $4$ is the tautologous rep of $\mathrm{SU}(4)$.
Then let’s finish the job by restricting this representation along $\mathrm{SU}(2) \times \mathrm{SU}(3) \to \mathrm{SU}(2) \times \mathrm{SU}(2) \times \mathrm{SU}(4)$. Restricting the $4$ of $\mathrm{SU}(4)$ to $\mathrm{SU}(3)$ gives $3 \oplus 1$: the sum of the tautologous representation of $\mathrm{SU}(3)$ and the trivial representation. Restricting the first $2$ to the left copy of $\mathrm{SU}(2)$ gives the tautologous representation $2$, while restricting the second $2$ to this left copy gives $1 \oplus 1$: the sum of two copies of the trivial representation. All in all, we get this representation of $\mathrm{SU}(2) \times \mathrm{SU}(3)$: $2 \otimes (3 \oplus 1) \;\oplus\; 1 \otimes (\overline{3} \oplus 1) \;\oplus\; 1 \otimes (\overline{3} \oplus 1).$
This is what we actually see for one generation of left-handed fermions and antifermions in the Standard Model! The representation $2 \otimes (3 \oplus 1)$ describes how the left-handed fermions in one generation transform under $\mathrm{SU}(2) \times \mathrm{SU}(3)$: 3 colors of quark and one ‘white’ lepton. The representation $1 \otimes (\overline{3} \oplus 1)$ does the same for the left-handed antifermions. The left-handed fermions form an isospin doublet, giving us the $2$, while the left-handed antifermions have no isospin, giving us the $1$.
This strange lopsidedness is a fundamental feature of the Standard Model.
The right subgroup would work the same way, up to switching the words ‘left-handed’ and ‘right-handed’. And by Theorem 10, the left and right subgroups must be conjugate in $\mathrm{Spin}(10)$, because now we’ll see one that’s not conjugate to either of these.
The diagonal subgroup
Consider the diagonal subgroup, the image of this composite: $\mathrm{SU}(2) \times \mathrm{SU}(3) \xrightarrow{(g,h) \mapsto (g,g,h)} \mathrm{SU}(2) \times \mathrm{SU}(2) \times \mathrm{SU}(4) \to \mathrm{Spin}(10).$
Let’s restrict the $16$ to this subgroup.
To do this, first let’s restrict the $16$ along $\mathrm{SU}(2) \times \mathrm{SU}(2) \times \mathrm{SU}(4) \to \mathrm{Spin}(10)$ and get $(2 \otimes 1 \otimes 4) \oplus (1 \otimes 2 \otimes \overline{4})$
as before. Then let’s restrict this representation along $\mathrm{SU}(2) \times \mathrm{SU}(3) \to \mathrm{SU}(2) \times \mathrm{SU}(2) \times \mathrm{SU}(4)$. The $\mathrm{SU}(4)$ part works as before, but what happens when we restrict $2 \otimes 1$ or $1 \otimes 2$ along the diagonal map $g \mapsto (g,g)$? We get $2$. So, this is the representation of $\mathrm{SU}(2) \times \mathrm{SU}(3)$ that we get: $2 \otimes (3 \oplus 1) \;\oplus\; 2 \otimes (\overline{3} \oplus 1).$
This is not good for the Standard Model. It describes a more symmetrical universe than ours, where both left-handed fermions and antifermions transform as doublets under $\mathrm{SU}(2)$.
The fact that we got a different answer this time proves that the left subgroup and the diagonal subgroup are not conjugate in $\mathrm{Spin}(10)$. So to complete the proof of Theorem 10, we only need to prove:
1. Every subgroup of $\mathrm{Spin}(10)$ isomorphic to $\mathrm{SU}(2) \times \mathrm{SU}(3)$ is conjugate to the left subgroup or the diagonal subgroup.
2. The diagonal subgroup is conjugate to a subgroup of $\mathrm{Spin}(9)$, but the left subgroup is not.
I’ll prove 2, and then I’ll turn you over to Will Sawin to do the rest.
Why the diagonal subgroup fits in $\mathrm{Spin}(9)$
Every rotation of $\mathbb{R}^3$ extends to a rotation of $\mathbb{R}^4$ that leaves the last coordinate fixed, so we get an inclusion $\mathrm{SO}(3) \hookrightarrow \mathrm{SO}(4)$, which lifts to an inclusion of the double covers, $\mathrm{Spin}(3) \hookrightarrow \mathrm{Spin}(4)$. Since we have exceptional isomorphisms $\mathrm{Spin}(3) \cong \mathrm{SU}(2)$ and $\mathrm{Spin}(4) \cong \mathrm{SU}(2) \times \mathrm{SU}(2),$
it’s natural to ask how the inclusion $\mathrm{Spin}(3) \hookrightarrow \mathrm{Spin}(4)$ looks in these terms. And the answer is: it’s the diagonal map! In other words, we have a commutative diagram
$$\begin{array}{ccc} \mathrm{Spin}(3) & \hookrightarrow & \mathrm{Spin}(4) \\ \downarrow \cong & & \downarrow \cong \\ \mathrm{SU}(2) & \xrightarrow{\;g \mapsto (g,g)\;} & \mathrm{SU}(2) \times \mathrm{SU}(2) \end{array}$$
Now, we can easily fit this into a larger commutative diagram involving some natural maps and :
We can simplify this diagram using the isomorphism $\mathrm{Spin}(6) \cong \mathrm{SU}(4)$:
and then we can use our friend the inclusion $\mathrm{SU}(3) \hookrightarrow \mathrm{SU}(4)$:
This shows that the diagonal subgroup of $\mathrm{Spin}(10)$ is actually a subgroup of $\mathrm{Spin}(9)$!
Why the left subgroup does not fit in $\mathrm{Spin}(9)$
The three-fold way is a coarse classification of irreducible complex representations of a compact Lie group. Every such representation is of one and only one of these three kinds:
1) not self-dual: not isomorphic to its dual,
2a) orthogonal: isomorphic to its dual via an invariant nondegenerate symmetric bilinear form, also called an orthogonal structure,
2b) symplectic: isomorphic to its dual via an invariant nondegenerate antisymmetric bilinear form, also called a symplectic structure.
I’ve written about how these three cases are related to the division algebras $\mathbb{C}$, $\mathbb{R}$ and $\mathbb{H}$, respectively:
A complex representation is orthogonal iff it’s the complexification of a representation on a real vector space, and symplectic iff it’s the underlying complex representation of a representation on a quaternionic vector space.
But we don’t need most of this yet. For now we just need to know one fact: when $n$ is odd, every irreducible representation of $\mathrm{Spin}(n)$, and thus every representation of this Lie group, is self-dual: that is, isomorphic to its dual. In particular this is true of $\mathrm{Spin}(9)$.
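For the group $\mathrm{SU}(2)$ the analogous self-duality can be checked numerically through the Frobenius–Schur indicator $\int_G \chi(g^2)\,dg$, which is $+1$ for orthogonal irreps, $-1$ for symplectic ones, and $0$ exactly when an irrep is not self-dual. A quadrature sketch using the Weyl integration formula (function names are mine):

```python
import numpy as np

def character(j, theta):
    """Character of the spin-j irrep of SU(2): sum of e^{2 i m theta}, m = -j..j."""
    ms = np.arange(-j, j + 1)
    return np.cos(2 * np.outer(ms, theta)).sum(axis=0)

def fs_indicator(j, samples=200_000):
    """Frobenius-Schur indicator  int_SU(2) chi(g^2) dg,  via Weyl integration:
    conjugacy classes are angles theta in [0, pi] with normalized measure
    (2/pi) sin^2(theta) d(theta)."""
    theta = np.linspace(0.0, np.pi, samples)
    integrand = (2 / np.pi) * np.sin(theta) ** 2 * character(j, 2 * theta)
    return integrand.mean() * np.pi   # integrand vanishes at both endpoints

# +1 = orthogonal (real), -1 = symplectic (quaternionic); 0 would mean not self-dual.
for j in (0.5, 1, 1.5, 2):
    print(j, round(fs_indicator(j), 3))
```

The indicator alternates between $-1$ (half-integer spin) and $+1$ (integer spin) but never hits $0$: every irrep of $\mathrm{SU}(2)$ is self-dual, matching the three-fold way discussed above.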
Why does this matter? Assume the left subgroup is a subgroup of $\mathrm{Spin}(9)$. When we restrict the Weyl spinor representation of $\mathrm{Spin}(10)$ to $\mathrm{Spin}(9)$ it will be self-dual, like every representation of $\mathrm{Spin}(9)$. Then when we restrict this representation further to the left subgroup it must still be self-dual, since the restriction of a self-dual representation is clearly self-dual.
However, we know this representation is $2 \otimes (3 \oplus 1) \;\oplus\; 1 \otimes (\overline{3} \oplus 1) \;\oplus\; 1 \otimes (\overline{3} \oplus 1),$
and this is not self-dual, since $2^* \cong 2$ and $1^* \cong 1$ but $3^* \cong \overline{3} \not\cong 3$.
So, it must be that the left subgroup is not a subgroup of $\mathrm{Spin}(9)$.
Proof of Theorem 10
To complete the proof of Theorem 10 we just need to see why there are just two conjugacy classes of subgroups of $\mathrm{Spin}(10)$ isomorphic to $\mathrm{SU}(2) \times \mathrm{SU}(3)$. But in fact Will Sawin proved a stronger result! He was answering this question of mine:
Define the Standard Model gauge group to be $\mathrm{S}(\mathrm{U}(2) \times \mathrm{U}(3))$, the subgroup of $\mathrm{SU}(5)$ consisting of block diagonal matrices with a $2 \times 2$ block and then a $3 \times 3$ block. (This is isomorphic to the quotient of $\mathrm{SU}(2) \times \mathrm{SU}(3) \times \mathrm{U}(1)$ by the subgroup of elements $(\zeta^3, \zeta^2, \zeta)$ where $\zeta$ is a 6th root of unity.)
Up to conjugacy, how many subgroups isomorphic to the Standard Model gauge group does $\mathrm{Spin}(10)$ have?
This question is relevant to grand unified theories of particle physics, as explained here:
This paper focuses on one particular copy of the Standard Model gauge group in $\mathrm{Spin}(10)$, given as follows. By definition we have an inclusion of the Standard Model gauge group in $\mathrm{SU}(5)$, and we also have an inclusion $\mathrm{SU}(5) \hookrightarrow \mathrm{Spin}(10)$: for any $n$ we have an inclusion $\mathrm{SU}(n) \hookrightarrow \mathrm{SO}(2n)$, and $\mathrm{SU}(n)$ is simply connected, so this gives a homomorphism $\mathrm{SU}(n) \to \mathrm{Spin}(2n)$.
However I think there is also an inclusion , studied by Krasnov:
Composing this with , this should give another inclusion , and I believe this one is ‘truly different from’ — i.e., not conjugate to — the first one I mentioned.
So I believe my current answer to my question is “at least two”. But that’s not good enough.
Sawin’s answer relies heavily on the 3-fold way — that’s why I told you that stuff about orthogonal and symplectic representations. When we embed the group in $\mathrm{Spin}(10)$, we are automatically giving this group an orthogonal 10-dimensional representation, thanks to the map $\mathrm{Spin}(10) \to \mathrm{SO}(10)$. We can classify the possibilities.
He writes:
There are infinitely many embeddings. However, all but one of them are “essentially the same as” the one you studied, as they become equal to the one you studied on restriction to $\mathrm{SU}(2) \times \mathrm{SU}(3)$. The remaining one is the one studied by Krasnov.
$\mathrm{SU}(3)$ has irreducible representations of dimensions $1, 3, 6, 8, 10$, and higher. The $6$-dimensional ones are dual to each other, as are the $10$-dimensional ones, so they can’t appear. The $3$-dimensional ones are dual to each other and can only appear together. So the only $10$-dimensional self-dual representations of $\mathrm{SU}(3)$ decompose as irreducibles as $8 \oplus 1 \oplus 1$, $3 \oplus \overline{3} \oplus 1 \oplus 1 \oplus 1 \oplus 1$, or ten $1$s. All of these are orthogonal because the $8$-dimensional representation is orthogonal. However, the ten $1$s cannot appear because then $\mathrm{SU}(3)$ would act trivially.
A representation of $\mathrm{SU}(2) \times \mathrm{SU}(3)$ is a sum of tensor products of irreducible representations of $\mathrm{SU}(2)$ and irreducible representations of $\mathrm{SU}(3)$. Restricted to $\mathrm{SU}(3)$, each tensor product splits into a sum of copies of the same irreducible representation. So $\mathrm{SU}(2)$ can only act nontrivially when the same representation appears multiple times. Since the $3 \oplus \overline{3}$ is two different $3$-dimensional representations, only the $1$-dimensional representation can occur twice. Thus, our 10-dimensional orthogonal representation of $\mathrm{SU}(2) \times \mathrm{SU}(3)$ necessarily splits as either the $8$-dimensional adjoint representation of $\mathrm{SU}(3)$ plus a $2$-dimensional orthogonal representation of $\mathrm{SU}(2)$, or the $6$-dimensional sum of standard and conjugate [i.e., dual] representations of $\mathrm{SU}(3)$ plus a $4$-dimensional orthogonal representation of $\mathrm{SU}(2)$. However, $\mathrm{SU}(2)$ has a unique nontrivial representation of dimension $2$ and it isn’t orthogonal, so only the second case can appear. $\mathrm{SU}(2)$ has representations of dimension $1, 2, 3, 4$, of which the $2$- and $4$-dimensional ones are symplectic and so must appear with even multiplicity in any orthogonal representation, so the only nontrivial $4$-dimensional orthogonal ones are $2 \oplus 2$ or $3 \oplus 1$.
So there are two ten-dimensional orthogonal representations of $\mathrm{SU}(2) \times \mathrm{SU}(3)$ that are nontrivial on both factors, those being the sum of two different $3$-dimensional irreducible representations of $\mathrm{SU}(3)$ with either two copies of the two-dimensional irreducible representation of $\mathrm{SU}(2)$ or the three-dimensional and the one-dimensional irreducible representation of $\mathrm{SU}(2)$. The orthogonal structure is unique up to isomorphism, so these give two conjugacy classes of homomorphisms $\mathrm{SU}(2) \times \mathrm{SU}(3) \to \mathrm{SO}(10)$ and thus two conjugacy classes of homomorphisms $\mathrm{SU}(2) \times \mathrm{SU}(3) \to \mathrm{Spin}(10)$. The first one corresponds to the embedding you studied, while only the second one restricts to $\mathrm{Spin}(9)$, so indeed these are different.
To understand how to extend these to , I consider the centralizer of the representation within . Since the group is connected, this is the same as the centralizer of its Lie algebra, which is therefore the inverse image of the centralizer in . Now there is a distinction between the two examples because the example with irrep dimensions has centralizer with identity component while the example with irrep dimensions has centralizer with identity component . In the second case, the image of must be the image of times the centralizer of the image of , so this gives a unique example, which must be the one considered by Krasnov.
In the first case, we can restrict attention to a torus in . The center of maps to a one-dimensional subgroup of this torus, which can be described by a pair of integers. Explicitly, given a two-by-two-unitary matrix and a three-by-three unitary matrix with , we can map to by sending to where , and then map from to . This lifts to the spin group if and only if the determinant in is a perfect square. The determinant is so a lift exists if and only if is even.
The only possible kernel of this embedding is the scalars. The scalar maps to and so the kernel is trivial if and only if .
However, there are infinitely many integer solutions to with even (in fact, a random and even works with probability ), so this gives infinitely many examples.
Part 1. How to define octonion multiplication using complex scalars and vectors, much as quaternion multiplication can be defined using real scalars and vectors. This description requires singling out a specific unit imaginary octonion, and it shows that octonion multiplication is invariant under $\mathrm{SU}(3)$.
Part 2. A more polished way to think about octonion multiplication in terms of complex scalars and vectors, and a similar-looking way to describe it using the cross product in 7 dimensions.
Part 3. How a lepton and a quark fit together into an octonion — at least if we only consider them as representations of $\mathrm{SU}(3)$, the gauge group of the strong force. Proof that the symmetries of the octonions fixing an imaginary octonion form precisely the group $\mathrm{SU}(3)$.
Part 4. Introducing the exceptional Jordan algebra $\mathfrak{h}_3(\mathbb{O})$: the $3 \times 3$ self-adjoint octonionic matrices. A result of Dubois-Violette and Todorov: the symmetries of the exceptional Jordan algebra preserving their splitting into complex scalar and vector parts and preserving a copy of the $2 \times 2$ self-adjoint octonionic matrices form precisely the Standard Model gauge group.
Part 5. How to think of $2 \times 2$ self-adjoint octonionic matrices as vectors in 10d Minkowski spacetime, and pairs of octonions as left- or right-handed spinors.
Part 6. The linear transformations of the exceptional Jordan algebra that preserve the determinant form the exceptional Lie group $\mathrm{E}_6$. How to compute this determinant in terms of 10-dimensional spacetime geometry: that is, scalars, vectors and left-handed spinors in 10d Minkowski spacetime.
Part 7. How to describe the Lie group $\mathrm{E}_6$ using 10-dimensional spacetime geometry. This group is built from the double cover of the Lorentz group, left-handed and right-handed spinors, and scalars in 10d Minkowski spacetime.
Part 8. A geometrical way to see how $\mathrm{E}_6$ is connected to 10d spacetime, based on the octonionic projective plane.
Part 9. Duality in projective plane geometry, and how it lets us break the Lie group $\mathrm{E}_6$ into the Lorentz group, left-handed and right-handed spinors, and scalars in 10d Minkowski spacetime.
Part 10. Jordan algebras, their symmetry groups, their invariant structures — and how they connect quantum mechanics, special relativity and projective geometry.
Part 11. Particle physics on the spacetime given by the exceptional Jordan algebra: a summary of work with Greg Egan and John Huerta.
Part 12. The bioctonionic projective plane and its connections to algebra, geometry and physics.
Part 13. Two ways to embed $\mathrm{SU}(2) \times \mathrm{SU}(3)$ in $\mathrm{Spin}(10)$, and their consequences for particle physics.
The bioctonionic plane also has intriguing mathematical connections to the Standard Model. But it’s not a projective plane in the axiomatic sense — and it can’t be constructed by straightforwardly copying the way you build a projective plane over a division algebra, since unlike the octonions, the bioctonions are not a division algebra. Nonetheless we can define points and lines in the bioctonionic plane. The twist is that now some pairs of distinct lines intersect in more than one point — and dually, some pairs of distinct points lie on more than one line. It obeys some subtler axioms, so people call it a Hjelmslev plane.
I am not ready to give a really good explanation of the bioctonionic plane! Instead, I just want to lay out some basic facts about how it fits into mathematics — and possibly physics.
Latham Boyle works at the University of Edinburgh, which is where I am now. Being able to talk to someone who deeply understands octonions and particle physics is very energizing. I’m especially fascinated by this paper of his:
It gives a convincing argument that the bioctonionic plane may be better than the octonionic projective plane for particle physics. The reason is that the tangent space of any point of the bioctonionic plane is a copy of C¹⁶, a 16-dimensional complex vector space. The symmetry group of the bioctonionic plane is the exceptional Lie group E6. Sitting inside the stabilizer group of any given point is a copy of the Standard Model gauge group. And — here’s the cool part — this group acts on C¹⁶ just as it does on one generation of fermions (not their antiparticles). If we try the same trick using the octonionic projective plane, we can fit the Standard Model gauge group in the stabilizer group of a point in a very natural way, but its action on the tangent space is its action only on left-handed fermions.
I want to explain this in detail, but not today. Instead, I want to skim through some basic facts about the bioctonionic plane.
the octonionic projective plane OP², a 16-dimensional compact Riemannian manifold on which the compact Lie group F4 acts transitively as isometries, with the stabilizer of any point being Spin(9). This is a symmetric space, and as such it’s called FII in Cartan’s classification.
the bioctonionic plane (C⊗O)P², a 32-dimensional compact Riemannian manifold on which the compact Lie group E6 acts transitively as isometries, with the stabilizer of any point being (Spin(10)×U(1))/Z4. This is the symmetric space EIII.
the quateroctonionic plane (H⊗O)P², a 64-dimensional compact Riemannian manifold on which the compact Lie group E7 acts transitively as isometries, with the stabilizer of any point being (Spin(12)×Sp(1))/Z2. This is the symmetric space EVI.
the octooctonionic plane (O⊗O)P², a 128-dimensional compact Riemannian manifold on which the compact Lie group E8 acts transitively as isometries, with the stabilizer of any point being Spin(16)/Z2. This is the symmetric space EVIII.
There’s a nice systematic approach to these spaces: they form one row of the so-called magic square, so one way to learn about the bioctonionic plane is to study the magic square, for example here:
Chris H. Barton and Anthony Sudbery, Magic squares of Lie algebras. Available as arXiv:math/0001083; see also arXiv:math/0203010 for a “streamlined and extended” version, which has more yet also less.
Here you can also find lots of references to earlier work, e.g. to Freudenthal and Tits. The basic idea of the magic square is that you start with two normed division algebras K and K′ and from them you build a Lie algebra, which gives a Lie group G. There’s also a way to get a subgroup H, and the quotient space G/H is a kind of ‘plane’ on which the group G acts. If you take K′ = O, this construction gives you the four Rosenfeld planes listed above.
Each one of these planes is a compact Riemannian symmetric space: this means it’s a connected compact Riemannian manifold such that for each point p there’s an isometry σ_p that fixes p, acts as −1 on its tangent space, and squares to the identity. This map σ_p is called ‘reflection across p’ for the obvious reason. For example, a round 2-sphere is a symmetric space, with σ_p switching the red and blue tangent vectors if p is the black dot:
Cartan classified compact Riemannian symmetric spaces, and there’s a nice theory of them. Any compact simple Lie group is one, and most of the rest come in infinite series connected to real and complex Clifford algebras, as I explained here. But there are 9 extra ones, all related to the octonions and exceptional Lie groups. The Rosenfeld planes are four of these.
You can learn this material starting on Wikipedia and then going to a textbook, ideally this:
Sigurdur Helgason, Differential Geometry, Lie Groups and Symmetric Spaces, Academic Press, 1978.
Helgason taught me Lie theory when I was a grad student at MIT, so I have a fondness for his book—but it’s also widely accepted as the most solid text on symmetric spaces!
The bioctonionic plane is even better: it’s a compact hermitian symmetric space: a compact Riemannian symmetric space where each tangent space has a complex structure J compatible with the metric, and reflection about each point preserves this complex structure. I mentioned that the bioctonionic plane is

(C⊗O)P² = E6/((Spin(10)×U(1))/Z4)

where E6 acts transitively, and the stabilizer of a point is (Spin(10)×U(1))/Z4. The U(1) here comes from the complex structure!
Wikipedia is especially thorough on hermitian symmetric spaces, so if you want to delve into those, start here:
Another tack is to focus on the exceptional Lie groups directly and their connections to nonassociative algebras. Here I recommend this:
Ichiro Yokota, Exceptional Lie Groups. Available as arXiv:0902.0431. (See especially Chapter 3, for E6 and the complexified exceptional Jordan algebra C⊗h3(O).)
If you have a fondness for algebra you may also want to learn how symmetric spaces arise from Jordan triple systems or Jordan pairs. This is important if we wish to see the bioctonionic plane as the space of pure states of some exotic quantum system!
Now, this is much easier to do for the octonionic plane, because that’s the space of pure states for the exceptional Jordan algebra h3(O), which is a Euclidean Jordan algebra, meaning one in which a sum of squares can only be zero if all those squares are zero. You can think of a Euclidean Jordan algebra A as consisting of observables, with the sums of squares being ‘nonnegative’ observables. These nonnegative observables form a convex cone C ⊆ A. The dual vector space A* contains a cone of linear functionals that send these nonnegative observables to nonnegative real numbers — I’ll call this the dual cone C*. The functionals f ∈ C* with f(1) = 1 are called states. The states form a convex set, and the extreme points are called pure states. All of this fits nicely into a modern framework for understanding quantum theory and potential generalizations, called ‘generalized probabilistic theories’:
Howard Barnum, Alexander Wilce, Post-classical probability theory, in Quantum Theory: Informational Foundations and Foils, eds. Giulio Chiribella and Robert W. Spekkens, Springer, 2016. (See Section 5 for Jordan algebras, and ignore the fact that they say the exceptional Jordan algebra consists of 2×2 matrices: they know perfectly well that they’re 3×3.)
The underlying math, with a lot more about symmetric spaces, cones and Euclidean Jordan algebras but with none of the physics interpretation, is wonderfully explained here:
Jacques Faraut and Adam Korányi, Analysis on Symmetric Cones, Oxford U. Press, 1994.
A crucial fact throughout this book is that when you start with a Euclidean Jordan algebra A, its cone C of nonnegative observables is self-dual: there’s an isomorphism of vector spaces A ≅ A* that maps C to C* in a one-to-one and onto way. The cone C is also homogeneous, meaning that the group of invertible linear transformations of A preserving C acts transitively on the interior of C. Faraut and Korányi call a self-dual homogeneous cone a symmetric cone — and they show that any symmetric cone comes from a Euclidean Jordan algebra! This result plays an important role in modern work on the foundations of quantum theory.
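To make this concrete, here is a small numerical sanity check (my own toy example, not from Faraut and Korányi) using the simplest nontrivial Euclidean Jordan algebra: 2×2 real symmetric matrices with the product x∘y = (xy + yx)/2, whose cone of nonnegative observables is the positive semidefinite matrices:

```python
import random

random.seed(1)

def mat_mul(a, b):
    """Ordinary 2x2 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def jordan(a, b):
    """Jordan product a∘b = (ab + ba)/2 on 2x2 symmetric matrices."""
    ab, ba = mat_mul(a, b), mat_mul(b, a)
    return [[(ab[i][j] + ba[i][j]) / 2 for j in range(2)] for i in range(2)]

def random_sym():
    p, q, r = (random.uniform(-1, 1) for _ in range(3))
    return [[p, q], [q, r]]

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

# The Jordan identity (x∘x)∘(x∘y) = x∘((x∘x)∘y) holds, even though
# the Jordan product itself is not associative.
for _ in range(200):
    x, y = random_sym(), random_sym()
    x2 = jordan(x, x)
    assert close(jordan(x2, jordan(x, y)), jordan(x, jordan(x2, y)))

# 'Euclidean': every square x∘x is positive semidefinite (nonnegative
# trace and determinant), so a sum of squares vanishes only if each does.
s = mat_add(jordan(x, x), jordan(y, y))
assert s[0][0] + s[1][1] >= 0
assert s[0][0] * s[1][1] - s[0][1] * s[1][0] >= -1e-12
```

The same checks pass for hermitian matrices over C, H, or (in the 3×3 case) O, but fail for h3(C⊗O), which is the point of the Puzzle below.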
Unfortunately, I’m telling you all this nice stuff about Euclidean Jordan algebras and symmetric cones just to say that while all this applies to the octonionic projective plane, sadly, it does not apply to the bioctonionic plane! The bioctonionic plane does not come from a Euclidean Jordan algebra or a symmetric cone. Thus, to understand it as a space of pure states, we’d have to resort to a more general formalism.
There are a few papers that attempt exactly this:
Lawrence C. Biedenharn and Piero Truini, An invariant quantum mechanics for a Jordan pair, Journal of Mathematical Physics 23 (1982), 1327–1345.
Lawrence C. Biedenharn and Piero Truini, Exceptional groups and elementary particle structures, Physica A: Statistical Mechanics and its Applications 114 (1982), 257–270.
Lawrence C. Biedenharn, G. Olivieri and Piero Truini, Three graded exceptional algebras and symmetric spaces, Zeitschrift für Physik C — Particles and Fields 33 (1986), 47–65.
Here’s the basic idea. We can define h3(C⊗O) to consist of 3×3 hermitian matrices with entries in C⊗O, where ‘hermitian’ is defined using the star-algebra structure on C⊗O where we conjugate the octonion part but not the complex part! Then h3(C⊗O) is just the complexification of h3(O):

h3(C⊗O) ≅ C⊗h3(O)

Then because h3(O) is a Jordan algebra over R, h3(C⊗O) is a Jordan algebra over C. So we can do a lot with it. But it’s not a Euclidean Jordan algebra.
Puzzle. Show that it’s not.
So, Biedenharn and Truini need a different approach to relate h3(C⊗O) to some sort of exotic quantum system. And they use an approach already known to mathematicians: namely, the theory of Jordan pairs! Here you work, not with a single element of h3(C⊗O), but with a pair of them.
Jordan triple systems and Jordan pairs are two closely related generalizations of Jordan algebras. I’ve been working on the nLab articles about these concepts, so click the links if you want to learn more about them. I explain how either of these things gives you a 3-graded Lie algebra — that is, a Z-graded Lie algebra that is nonvanishing only in the middle 3 grades:

g = g₋₁ ⊕ g₀ ⊕ g₁
And from a 3-graded Lie algebra g you can get a symmetric space G/H where the Lie algebra of G is g and the Lie algebra of H is g₀. Each tangent space of this symmetric space is thus isomorphic to g₋₁ ⊕ g₁.
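To see what a 3-graded Lie algebra looks like in the smallest case, here is a toy check with sl(2), where the lowering element sits in grade −1, the diagonal in grade 0, and the raising element in grade +1 (my own illustration; this is not the e6 case, just the simplest one):

```python
# sl(2) as a 3-graded Lie algebra: f in grade -1, h in grade 0,
# e in grade +1. We check that brackets respect the grading,
# [g_i, g_j] ⊆ g_{i+j}.

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def bracket(a, b):
    """Lie bracket [a, b] = ab - ba."""
    ab, ba = mat_mul(a, b), mat_mul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

e = [[0, 1], [0, 0]]   # grade +1
h = [[1, 0], [0, -1]]  # grade 0
f = [[0, 0], [1, 0]]   # grade -1

def grade(m):
    """Grade of a matrix in the span of e, h, f (None if mixed)."""
    coeffs = {1: m[0][1], 0: m[0][0], -1: m[1][0]}
    nonzero = [g for g, c in coeffs.items() if c != 0]
    return nonzero[0] if len(nonzero) == 1 else None

# [h, e] = 2e lives in grade 1, [e, f] = h in grade 0, [h, f] = -2f
# in grade -1, and grades beyond ±1 vanish: [e, e] = 0.
assert grade(bracket(h, e)) == 1
assert grade(bracket(e, f)) == 0
assert grade(bracket(h, f)) == -1
assert bracket(e, e) == [[0, 0], [0, 0]]
```

For sl(2) this grading gives the symmetric space construction in miniature; for e6 the same pattern holds with much bigger graded pieces.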
In the case relevant to the bioctonionic plane, the 3-graded Lie algebra is e₆ = g₋₁ ⊕ g₀ ⊕ g₁, with g₀ = so(10) ⊕ u(1) and with g₋₁ and g₁ being 16-dimensional spinor representations of so(10).
So, the bioctonionic plane is a symmetric space on which E6 acts, with stabilizer group Spin(10)×U(1) (up to covering spaces), and with tangent space isomorphic to C¹⁶.
So all this is potentially very nice. For much more on this theory, try the work of Ottmar Loos:
To wrap things up, I should say a bit about ‘Hjelmslev planes’, since the bioctonionic plane is supposed to be one of these. Axiomatically, a Hjelmslev plane is a set of points, a set of lines, and an incidence relation between points and lines. We require that for any two distinct points there is at least one line incident to both, and for any two distinct lines there is at least one point incident to both. If two points are incident to more than one line we say they are neighbors. If two lines are incident to more than one point we say they are neighbors. We demand that both these ‘neighbor’ relations are equivalence relations, and that if we quotient the set of points and the set of lines by these equivalence relations, we get an axiomatic projective plane.
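These axioms are easiest to test concretely over a finite chain ring, with Z/4 standing in for the bioctonions (my own example, not from the literature on the bioctonionic plane). The sketch below builds the projective Hjelmslev plane over Z/4 and checks that distinct neighbor points lie on more than one common line, while quotienting by the neighbor relation yields the 7-point Fano plane:

```python
from itertools import product

# Projective Hjelmslev plane over Z/4: points are triples with at
# least one unit coordinate (1 or 3), modulo scaling by units.
R = range(4)
UNITS = {1, 3}

def normalize(v):
    """Canonical representative among the unit multiples of v."""
    return min(tuple((u * x) % 4 for x in v) for u in UNITS)

def has_unit(v):
    return any(x in UNITS for x in v)

points = {normalize(v) for v in product(R, R, R) if has_unit(v)}
lines = points  # by duality, lines have the same coordinate description

def incident(p, l):
    """Point p lies on line l when their dot product vanishes mod 4."""
    return sum(a * b for a, b in zip(p, l)) % 4 == 0

def lines_through(p, q):
    return [l for l in lines if incident(p, l) and incident(q, l)]

assert len(points) == 28  # q^2 (q^2 + q + 1) with q = 2

# Any two distinct points lie on at least one common line, and they
# lie on MORE than one exactly when they are neighbors (equal mod 2).
for p in points:
    for q in points:
        if p == q:
            continue
        n = len(lines_through(p, q))
        neighbors = tuple(x % 2 for x in p) == tuple(y % 2 for y in q)
        assert n >= 1 and (n > 1) == neighbors

# Quotienting points by the neighbor relation gives the Fano plane.
classes = {tuple(x % 2 for x in p) for p in points}
print(len(classes))  # 7
```

Here the quotient is the projective plane over the residue field F2 of Z/4, which at least suggests what to expect in the Challenge below.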
Challenge. What projective plane do we get if we apply this quotient construction to the bioctonionic plane?
My only guess is that we get the octonionic projective plane — but I don’t know why.
The literature on Hjelmslev planes seems a bit difficult, but I’m finding this to be a good introduction:
The answer to my puzzle should be here, because they’re talking about Hjelmslev planes built using split octonion algebras (like C⊗O):
Tonny A. Springer and Ferdinand D. Veldkamp, On Hjelmslev–Moufang planes, Mathematische Zeitschrift 107 (1968), 249–263.
But I don’t see the answer here yet!
Part 1. How to define octonion multiplication using complex scalars and vectors, much as quaternion multiplication can be defined using real scalars and vectors. This description requires singling out a specific unit imaginary octonion, and it shows that octonion multiplication is invariant under SU(3).
Part 2. A more polished way to think about octonion multiplication in terms of complex scalars and vectors, and a similar-looking way to describe it using the cross product in 7 dimensions.
Part 3. How a lepton and a quark fit together into an octonion — at least if we only consider them as representations of SU(3), the gauge group of the strong force. Proof that the symmetries of the octonions fixing an imaginary octonion form precisely the group SU(3).
Part 4. Introducing the exceptional Jordan algebra h3(O): the 3×3 self-adjoint octonionic matrices. A result of Dubois-Violette and Todorov: the symmetries of the exceptional Jordan algebra preserving the splitting into complex scalar and vector parts and preserving a copy of the 2×2 self-adjoint octonionic matrices form precisely the Standard Model gauge group.
I’m writing to point out a potential law that should be gathering more opposition and attention in math academia: the Securing American Funding and Expertise from Adversarial Research Exploitation Act. This is an amendment to the 2026 National Defense Authorization Act which has passed the House and could be added to the final version of the bill during reconciliation in the Senate. I’m pulling most of my information from an article in Science.
This act would ban any US scientist from receiving federal funding if they have, within the last five years, worked with anyone from China, Russia, Iran or North Korea, where “worked with” includes joint research, co-authorship on papers, or advising a foreign graduate student or postdoctoral fellow. As I said in my message to my senators, this covers essentially everyone. Nearly every mathematician has advised Chinese graduate students or collaborated with Chinese mathematicians, because China is integrated into the academic world and accounts for one fifth of the world’s population.
This obviously isn’t secret, since you can read about it in Science, but I am surprised that I haven’t heard more alarm. Obvious people to contact are your senators and your representatives. I would also suggest contacting members of the Senate Armed Services Committee, who are in charge of reconciling the House and Senate versions of the bill.
“Information” is an idea that is everywhere in science and technology these days. From one angle it looks like such an obvious idea that it’s a bit startling to realize that information theory didn’t really come along until the work of Claude Shannon in the 1940s. From another, the idea has so many different shades of meaning that we shouldn’t be surprised (that’s a joke you will get in a bit) that it can be hard to understand.
Information theory is obviously an enormous subject, but we’re just giving thanks, not writing a textbook. I want to mention two ideas I find especially central. First, Shannon’s idea about relating information content to “surprisal.” Second, the very different intuitive notions of information that we get from engineering and physics.
Shannon, working at Bell Labs, was interested in the problem of how to send trustworthy signals efficiently over transatlantic cables. He was thinking about various ways to express information in a code: a set of symbols, each with a defined meaning. So a code might be an alphabet, or a set of words, or a literal cipher. And he noticed that there was a lot of redundancy in natural languages; the word “the” appears much more often in English than the word “axe,” although both have the same number of letters.
Let’s refer to each letter or symbol in a code as an “event.” Shannon’s insight was to realize that the more unlikely an event, the more information it conveyed when it was received. The statements “The Sun rose in the east this morning” and “The Sun rose in the west this morning” contain the same number of letters, but the former contains almost no information — you already were pretty sure the Sun would be rising in the east. But the latter, if obtained from a reliable source, would be very informative indeed, precisely because it was so unexpected. Clearly some kind of unprecedented astronomical catastrophe was in progress.
Imagine we can assign a probability p(x) to every different event x. Shannon wanted a way to quantify the information content of that event, which would satisfy various reasonable-seeming axioms: most crucially, that the information content of two independent events is the sum of the individual information contents. But the joint probability of two independent events is the product of their individual probabilities. So the natural thing to do would be to define the information content as the logarithm of the probability; the logarithm of a product equals the sum of the individual logarithms. But you want low probability to correspond to high information content, so Shannon defined the information content (also called the self-information, or surprisal, or Shannon information) of an event to be minus the log of the probability, which by math is equal to the log of the reciprocal of the probability:

I(x) = −log p(x) = log[1/p(x)]
Note that probabilities are numbers between 0 and 1, and the log of such a number will be negative, with numbers closer to 0 being more negative than numbers closer to 1. So I(x) runs from ∞ at p(x) = 0 down to 0 at p(x) = 1. An impossible message is infinitely surprising, and therefore conveys infinite information; an inevitable message is completely unsurprising, and conveys no information at all.
From there, Shannon suggested that we could characterize how efficient an entire code was at conveying information: just calculate the average (expectation value) of the information content over all possible events. When we have a probability distribution p(x), the average of any function f(x) is just the sum of the values of the function times their respective probabilities, ⟨f⟩ = Σₓ p(x) f(x). So we characterize the information content of a code via the quantity

S = ⟨I⟩ = −Σₓ p(x) log p(x)
The only question is, what to call this lovely newly-defined quantity that surely nobody had ever thought of before? Happily Shannon was friends with John von Neumann, who informed him, “You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.” So entropy it is.
Indeed, this formula is precisely that which had been put forward (unknown to Shannon) by Josiah Willard Gibbs in the 1870’s as a definition of entropy in statistical mechanics. (It is related to the definition on Ludwig Boltzmann’s tombstone, S = k log W, and Boltzmann had also suggested expressions similar to the one above.) On the one hand, it seems remarkable to find precisely the same expression playing central roles in problems as disparate as sending signals across cables and watching cream mix into coffee; on the other hand, it’s a relatively simple expression and the axioms used to derive it are actually pretty similar, so perhaps we shouldn’t be surprised; on the third hand, the connection between information theory and statistical mechanics turns out to be deep and fruitful, so it’s more than just a mathematical coincidence.
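Shannon’s surprisal and entropy can be computed directly; here is a quick sketch in Python (using base-2 logarithms, the conventional choice, so the answers come out in bits):

```python
import math

def surprisal(p):
    """Shannon information of an event with probability p, in bits."""
    return -math.log2(p)

def entropy(dist):
    """Average surprisal over a probability distribution, in bits."""
    return sum(p * surprisal(p) for p in dist if p > 0)

# A fair coin: each outcome carries 1 bit of surprise, so 1 bit of entropy.
print(surprisal(0.5))         # 1.0
print(entropy([0.5, 0.5]))    # 1.0

# A heavily biased coin: most messages are unsurprising, so low entropy.
print(entropy([0.99, 0.01]))  # about 0.081
```

A uniform distribution over 2ⁿ events gives entropy n bits, which is the engineer’s maximally informative code in the sense discussed next.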
But let me highlight the one aspect of the term “information” that can be sometimes confusing to people. To the engineer, a code that is maximally informative is one for which p(x) is relatively uniform over all events x, which means S is maximal or close to it; in that case, every event will tell you something at least a little bit interesting. For them, high entropy = high information.
But to a physicist who might be asking “how much information do I have about the state of a system?”, you have more information when p(x) is relatively narrowly concentrated around some value, rather than being all spread out. For them, high entropy = low information! Indeed, one physically-relevant notion of “information” is the “accessible information” of a system, which can be defined as the maximum possible entropy minus the actual entropy, S_max − S. (I talk about this a bit in my recent solo podcast on complexity.)
Perhaps we shouldn’t be so surprised that physicists and engineers posit oppositely-directed relationships between entropy and information. It’s just a reflection of the fact that “information” is so ubiquitous and has so many different uses. We should be thankful that we’re beginning to understand it so well.
It’s been over three years since my last post on this blog and I have sometimes been asked, understandably, whether the project I announced in my previous post was actually happening. The answer is yes — the grant I received from the Astera Institute has funded several PhD students and a couple of postdocs, and we have been busy. In my previous post I suggested that I would be open to remote collaboration, but that has happened much less, partly because a Polymath-style approach would have been difficult to manage while also ensuring that my PhD students would have work that they could call their own to put in their theses.
In general I don’t see a satisfactory solution to that problem, but in this post I want to mention a subproject of the main project that is very much intended to be a large public collaboration. A few months ago, a call came out from Renaissance Philanthropies saying that they were launching a $9m AI for Math Fund to spend on projects in the general sphere of AI and mathematics, and inviting proposals. One of the categories that they specifically mentioned was creating new databases, and my group submitted a proposal to create a database of what we call “structured motivated proofs,” a piece of terminology that I will explain in a moment. I am happy to report that our proposal was one of the 29 successful ones. Since a good outcome to the project will depend on collaboration from many people outside the group, we need to publicize it, which is precisely the purpose of this post. Below I will be more specific about the kind of help we are looking for.
Why might yet another database of theorems and proofs be useful?
The underlying thought behind this project is that AI for mathematics is being held back not so much by an insufficient quantity of data as by the wrong kind of data. (For a more general exploration of this theme, see here.) All mathematicians know, and some of us enjoy complaining about it, that it is common practice when presenting a proof in a mathematics paper, or even textbook, to hide the thought processes that led to the proof. Often this does not matter too much, because the thought processes may be standard ones that do not need to be spelt out to the intended audience. But when proofs start to get longer and more difficult, they can be hard to read because one has to absorb definitions and lemma statements that are not obviously useful, are presented as if they appeared from nowhere, and demonstrate their utility only much later in the argument.
A sign that this is a problem for AI is the behaviour one observes after asking an LLM to prove a statement that is too difficult for it. Very often, instead of admitting defeat, it will imitate the style of a typical mathematics paper and produce rabbits out of hats, together with arguments later on that those rabbits do the required job. The problem is that, unlike with a correct mathematics paper, one finds when one scrutinizes the arguments carefully that they are wrong. However, it is hard to find superficial features that distinguish between an incorrect rabbit with an incorrect argument justifying that rabbit (especially if the argument does not go into full detail) and a correct one, so the kinds of statistical methods used by LLMs do not have an easy way to penalize the incorrectness.
Of course, that does not mean that LLMs cannot do mathematics at all — they are remarkably good at it, at least compared with what I would have expected three years ago. How can that be, given the problem I have discussed in the previous paragraph?
The way I see it (which could change — things move so fast in this sphere), the data that is currently available to train LLMs and other systems is very suitable for a certain way of doing mathematics that I call guess and check. When trying to solve a maths problem, you will normally write down the routine parts of an argument without any fuss (and an LLM can do them too because it has seen plenty of similar examples), but if the problem as a whole is not routine, then at some point you have to stop and think, often because you need to construct an object that has certain properties (I mean this in a rather general way — the “object” might be a lemma that will split up the proof in a nice way) and it is not obvious how to do so. The guess-and-check approach to such moments is what it says: you make as intelligent a guess as you can and then see whether it has the properties you wanted. If it doesn’t, you make another guess, and you keep going until you get lucky.
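As a caricature, the guess-and-check loop looks like this (a toy sketch of my own, with a made-up “object” and “property”, not an example from the post: the object is a subset of {0, …, 9} and the property is that it sums to 17):

```python
import random

random.seed(0)

def has_property(guess):
    """A stand-in for 'the object has the properties we wanted'."""
    return sum(guess) == 17

def guess_and_check(max_tries=10_000):
    """Keep making guesses until one happens to have the property."""
    for _ in range(max_tries):
        guess = [x for x in range(10) if random.random() < 0.5]
        if has_property(guess):
            return guess
    return None  # admit defeat -- which an LLM often won't do

witness = guess_and_check()
assert witness is not None and sum(witness) == 17
```

The point is that nothing in the loop records why failed guesses failed; an intelligent prover would instead use each failure to constrain the next attempt.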
The reason an LLM might be tempted to use this kind of approach is that the style of mathematical writing I described above makes it look as though that is what we as mathematicians do. Of course, we don’t actually do that, but we tend not to mention all the failed guesses we made and how we carefully examined why they failed, modifying them in appropriate ways in response, until we finally converged on an object that worked. We also don’t mention the reasoning that often takes place before we make the guess, saying to ourselves things like “Clearly an Abelian group can’t have that property, so I need to look for a non-Abelian group.”
Intelligent guess and check works well a lot of the time, particularly when carried out by an LLM that has seen many proofs of many theorems. I have often been surprised when I have asked an LLM a problem of the form “find x such that P(x)”, where P is some property that is hard to satisfy, and the LLM has had no trouble answering it. But somehow when this happens, the flavour of the answer given by the LLM leaves me with the impression that the technique it has used to construct x is one that it has seen before and regards as standard.
If the above picture of what LLMs can do is correct (the considerations for reinforcement-learning-based systems such as AlphaProof are not identical but I think that much of what I say in this post applies to them too for slightly different reasons), then the likely consequence is that if we pursue current approaches, then we will reach a plateau: broadly speaking they will be very good at answering a question if it is the kind of question that a mathematician with the right domain expertise and good instincts would find reasonably straightforward, but will struggle with anything that is not of that kind. In particular, they will struggle with research-level problems, which are, almost by definition, problems that experts in the area do not find straightforward. (Of course, there would probably be cases where an LLM spots relatively easy arguments that the experts had missed, but that wouldn’t fundamentally alter the fact that they weren’t really capable of doing research-level mathematics.)
But what if we had a database of theorems and proofs that did not hide the thought processes that lay behind the non-obvious details of the proofs? If we could train AI on a database of accounts of proof discoveries and if, having done so, we then asked it to provide similar accounts, then it would no longer resort to guess-and-check when it got stuck, because the proof-discovery accounts it had been trained on would not be resorting to it. There could be a problem getting it to unlearn its bad habits, but I don’t think that difficulty would be impossible to surmount.
The next question is what such a database might look like. One could just invite people to send in stream-of-consciousness accounts of how they themselves found certain proofs, but that option is unsatisfactory for several reasons.
It can be very hard to remember where an idea came from, even a few seconds after one has had it — in that respect it is like a dream, the memory of which becomes rapidly less vivid as one wakes up.
Often an idea will seem fairly obvious to one person but not to another.
The phrase “motivated proof” means different things to different people, so without a lot of careful moderation and curation of entries, there is a risk that a database would be disorganized and not much more helpful than a database of conventionally written proofs.
A stream-of-consciousness account could end up being a bit too much about the person who finds the proof and not enough about the mathematical reasons for the proof being feasibly discoverable.
To deal with these kinds of difficulties, we plan to introduce a notion of a structured motivated proof, by which we mean a proof that is generated in a very particular way that I will partially describe below. A major part of the project, and part of the reason we needed funding for it, is to create a platform that will make it convenient to input structured motivated proofs and difficult to insert the kinds of rabbits out of hats that make a proof mysterious and unmotivated. In this way we hope to gamify the task of creating the database, challenging people to input into our system proofs of certain theorems that appear to rely on “magic” ideas, and perhaps even offering prizes for proofs that contain steps that appear in advance to be particularly hard to motivate. (An example: the solution by Ellenberg and Gijswijt of the cap-set problem uses polynomials in a magic-seeming way. The idea of using polynomials came from an earlier paper of Croot, Lev and Pach that proved a closely related theorem, but in that paper it just appears in the statement of their Lemma 1, with no prior discussion apart from the words “in the present paper we use the polynomial method” in the introduction.)
What is a structured motivated proof?
I wrote about motivated proofs in my previous post, but thanks to many discussions with other members of the group, my ideas have developed quite a lot since then. Here are two ways we like to think about the concept.
1. A structured motivated proof is one that is generated by standard moves.
I will not go into full detail about what I mean by this, but will do so in a future post when we have created the platform that we would like people to use in order to input proofs into the database. But the basic idea is that at any one moment one is in a certain state, which we call a proof-discovery state, and there will be a set of possible moves that can take one from the current proof-discovery state to a new one.
A proof-discovery state is supposed to be a more formal representation of the state one is in when in the middle of solving a problem. Typically, if the problem is difficult, one will have asked a number of questions, and will be aware of logical relationships between them: for example, one might know that a positive answer to Q1 could be used to create a counterexample to Q2, or that Q3 is a special case of Q4, and so on. One will also have proved some results connected with the original question, and again these results will be related to each other and to the original problem in various ways that might be quite complicated: for example P1 might be a special case of Q2, which, if true would reduce Q3 to Q4, where Q3 is a generalization of the statement we are trying to prove.
Typically we will be focusing on one of the questions, and typically that question will take the form of some hypotheses and a target (the question being whether the hypotheses imply the target). One kind of move we might make is a standard logical move such as forwards or backwards reasoning: for example, if we have hypotheses of the form \(P\) and \(P\Rightarrow Q\), then we might decide to deduce \(Q\). But things get more interesting when we consider slightly less basic actions we might take. Here are three examples.
We have in our list of hypotheses the fact that a function \(f\) is given by the formula \(f(x)=\sin(p(x))\), say, where \(p\) is a polynomial, and our goal is to prove that there exists \(x\) such that \(f(x)=0\). Without really thinking about it, we are conscious that \(f\) is a composition of two functions, one of which is continuous and one of which belongs to a class of functions that are all continuous, so \(f\) is continuous. Also, the conclusion matches well the conclusion of the intermediate-value theorem. So the intermediate-value theorem comes naturally to mind and we add it to our list of available hypotheses. In practice we wouldn’t necessarily write it down, but the system we wish to develop is intended to model not just what we write down but also what is going on in our brains, so we propose a move that we call library extraction (closely related to what is often called premise selection in the literature). Note that we have to be a bit careful about library extraction. We don’t want the system to be allowed to call up results from the library that appear to be irrelevant but then magically turn out to be helpful, since those would feel like rabbits out of hats. So we want to allow extraction of results only if they are obvious given the context. It is not easy to define what “obvious” means, but there is a good rule of thumb for it: a library extraction is obvious if it is one of the first things ChatGPT thinks of when given a suitable non-cheating prompt. For example, I gave it the prompt, “I have a function \(f\) from the reals to the reals and I want to prove that there exists some \(x\) such that \(f(x)=0\). Can you suggest any results that might be helpful?” and the intermediate-value theorem was its second suggestion. (Note that I had not even told it that \(f\) was continuous, so I did not need to make that particular observation before coming up with the prompt.)
We have a goal of the form \(\exists x\, P(x)\). If this were a Lean proof state, the most common way to discharge a goal of this form would be to input a choice for \(x\). That is, we would instantiate the existential quantifier with some term \(t\) and our new goal would be \(P(t)\). However, as with library extraction, we have to be very careful about instantiation if we want our proof to be motivated, since we wish to disallow highly surprising choices of \(t\) that can be found only after a long process of thought. So we have to restrict ourselves to obvious instantiations. One way that an instantiation in our system will count as obvious is if the variable is instantiated with a term that is already present in the proof-discovery state. If the desired term is not present, then in order to continue with the proof, it will be necessary to carry out moves that generate it. A very common technique for this is the use of metavariables: instead of guessing a suitable \(x\), we create a variable \(t\) and change the goal to \(P(t)\), which we can think of as saying “I’m going to start trying to prove \(P(t)\) even though I haven’t chosen \(t\) yet. As the attempted proof proceeds, I will note down any properties that \(t\) might have that would help me finish the proof, in the hope that (i) I get to the end and (ii) the problem is easier than the original problem.” Another kind of obvious instantiation is one where we try out an object that is “extreme” in some way — it might be the smallest element of a set under consideration, or the largest, or the simplest. (Judging simplicity is another place where the ChatGPT rule of thumb can be used.)
We cannot see how to answer the question we are focusing on so we ask a related question. Two very common kinds of related question (as emphasized by Polya) are generalization and specialization. Perhaps we don’t see why a hypothesis is helpful, so we see whether the result holds if we drop that hypothesis. If it does, then we are no longer distracted by an irrelevant hypothesis. If it does not, then we can hope to find a counterexample that will help us understand how to use the hypothesis. Or perhaps we are trying to prove a general statement but it is not clear how to do so, so instead we formulate some special cases, hoping that we can prove them and spot features of the proofs that we can generalize. Again we have to be rather careful here not to allow “non-obvious” generalizations and specializations. Roughly the idea there is that a generalization should be purely logical — for example, dropping a hypothesis is fine but replacing the hypothesis “\(f\) is twice differentiable” by “\(f\) is upper semicontinuous” is not — and that a specialization should be to a special case that counts as an obvious instantiation in the sense discussed just above.
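The instantiation and metavariable moves in the second example have close analogues in Lean 4. Here is a toy illustration (Lean’s syntax, of course, not the syntax of our planned system):

```lean
-- Instantiating an existential goal with a term already at hand:
example : ∃ x : Nat, x + 1 = 2 := ⟨1, rfl⟩

-- Metavariable style: commit to proving the body before choosing the witness.
example : ∃ x : Nat, x + 1 = 2 := by
  refine ⟨?t, ?h⟩
  · exact 1   -- the witness, supplied once we see what is needed
  · rfl       -- the goal has become 1 + 1 = 2
```

In the second proof, `refine ⟨?t, ?h⟩` is the analogue of changing the goal from \(\exists x\, P(x)\) to \(P(t)\) with \(t\) as yet unchosen.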
2. A structured motivated proof is one that can be generated with the help of a point-and-click system.
This is a surprisingly useful way to conceive of what we are talking about, especially as it relates closely to what I was talking about earlier: imposing a standard form on motivated proofs (which is why we call them “structured” motivated proofs) and gamifying the process of producing them.
The idea is that a structured motivated proof is one that can be generated using an interface (which we are in the process of creating — at the moment we have a very basic prototype that has a few of the features we will need, but not yet the more interesting ones) that has one essential property: the user cannot type in data. So what can they do? They can select text that is on their screen (typically mathematical expressions or subexpressions), they can click buttons, choose items from drop-down menus, and accept or reject “obvious” suggestions made to them by the interface.
If, for example, the current goal is an existential statement \(\exists x\, P(x)\), then typing in a formula that defines a suitable \(x\) is not possible, so instead one must select text or generate new text by clicking buttons, choosing from short drop-down menus, and so on. This forces the user to generate \(x\), which is our proxy for showing where the idea of using \(x\) came from.
Broadly speaking, the way the prototype works is to get an LLM to read a JSON object that describes the variables, hypotheses and goals involved in the proof state in a structured format, and to describe (by means of a fairly long prompt) the various moves it might be called upon to do. Thus, the proofs generated by the system are not formally verified, but that is not an issue that concerns us in practice since there will be a human in the loop throughout to catch any mistakes that the LLM might make, and this flexibility may even work to our advantage to better capture the fluidity of natural-language mathematics.
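To make this concrete, here is a minimal Python sketch of what such a JSON proof-state object and prompt might look like. The schema (`variables`, `hypotheses`, `goals`) and the wording of the prompt are my invention for illustration; the prototype’s actual format differs.

```python
import json

# A hypothetical proof-discovery state, serialized for an LLM prompt.
# The schema below is illustrative only, not the prototype's actual format.
state = {
    "variables": [{"name": "f", "type": "R -> R"}],
    "hypotheses": [
        {"id": "h1", "statement": "f is continuous"},
        {"id": "h2", "statement": "f 0 < 0"},
        {"id": "h3", "statement": "f 1 > 0"},
    ],
    "goals": [{"id": "g1", "statement": "exists x, f x = 0"}],
}

# The LLM receives the JSON together with a (long) prompt describing the
# legal moves, and replies with the move it proposes to apply.
prompt = (
    "Here is the current proof-discovery state:\n"
    + json.dumps(state, indent=2)
    + "\nSuggest one obvious library extraction."
)
print(prompt)
```

A conventional program would then validate and apply the move the LLM returns, with the human user accepting or rejecting it.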
There is obviously a lot more to say about what the proof-generating moves are, or (approximately equivalently) what the options provided by a point-and-click system will be. I plan to discuss that in much more detail when we are closer to having an interface ready, the target for which is the end of this calendar year. But the aim of the project is to create a database of examples of proofs that have been successfully generated using the interface, which can then be used to train AI to play the generate-structured-motivated-proof game.
How to get involved.
There are several tasks that will need doing once the project gets properly under way. Here are some of the likely ones.
The most important is for people to submit structured motivated (or move-generated) proofs to us on the platform we provide. We hope that the database will end up containing proofs of a wide range of difficulty (of two kinds — there might be fairly easy arguments that are hard to motivate and there might be arguments that are harder to follow but easier to motivate) and also a wide range of areas of mathematics. Our initial target, which is quite ambitious, is to have around 1000 entries by two years from now. While we are not in a position to accept entries yet, if you are interested in participating, then it is not too early to start thinking in a less formal way about how to convert some of your favourite proofs into motivated versions, since that will undoubtedly make it easier to get them accepted by our platform when it is ready.
We are in the process of designing the platform. As I mentioned earlier, we already have a prototype, but there are many moves we will need it to be able to do that it cannot currently do. For example, the current prototype allows just a single proof state, which consists of some variable declarations, hypotheses, and goals. It does not yet support creating subsidiary proof states (which we would need if we wanted to allow the user to consider generalizations and specializations, for example). Also, for the moment the prototype gets an LLM to implement all moves, but some of the moves, such as applying modus ponens, are extremely mechanical and would be better done using a conventional program. (On the other hand, moves such as “obvious library extraction” or “provide the simplest example” are better done by an LLM.) Thirdly, a technical problem is that LaTeX is currently rendered as images, which makes it hard to select subexpressions, something we will need to be able to do in a non-clunky way. And the public version of the platform will need to be web-based and very convenient to use. We will want features such as being able to zoom out and look at some kind of dependency diagram of all the statements and questions currently in play, and then zoom in on various nodes if the user wishes to work on them. If you think you may be able (and willing) to help with some of these aspects of the platform, then we would be very happy to hear from you. For some, it would probably help to have a familiarity with proof assistants, while for others we would be looking for somebody with software engineering experience. The grant from the AI for Math Fund will allow us to pay for some of this help, at rates to be negotiated. We are not yet ready to specify in detail what help we need, but would welcome any initial expressions of interest.
Once the platform is ready and people start to submit proofs, it is likely that, at least to start with, they will find that the platform does not always provide the moves they need. Perhaps they will have a very convincing account of where a non-obvious idea in the proof came from, but the system won’t be expressive enough for them to translate that account into a sequence of proof-generating moves. We will want to be able to react to such situations (if we agree that a new move is needed) by expanding the capacity of the platform. It will therefore be very helpful if people sign up to be beta-testers, so that we can try to get the platform to a reasonably stable state before opening it up to a wider public. Of course, to be a beta-tester you would need to have a few motivated proofs in mind.
It is not obvious that every proof submitted via the platform, even if submitted successfully, would be a useful addition to the database. For instance, it might be such a routine argument that no idea really needs to have its origin explained. Or it might be that, despite our best efforts, somebody finds a way of sneaking in a rabbit while using only the moves that we have provided. (One way this could happen is if an LLM made a highly non-obvious suggestion that happened to work, in which case the rule of thumb that if an LLM thinks of it, it must be obvious, would have failed in that instance.) For this reason, we envisage having a team of moderators, who will check entries and make sure that they are good additions to the database. We hope that this will be an enjoyable task, but it may have its tedious aspects, so we envisage paying moderators — again, this expense was allowed for in our proposal to the AI for Math Fund.
If you think you might be interested in any of these roles, please feel free to get in touch. Probably the hardest recruitment task for us will be identifying the right people with the right mixture of mathematical knowledge and software engineering skills to help us turn the platform into a well-designed web-based one that is convenient and pleasurable to use. If you think you might be such a person, or if you have a good idea for how we should go about finding one, we would be particularly interested to hear from you.
In a future post, I will say more about the kinds of moves that our platform will allow, and will give examples of non-motivated proofs together with how motivated versions of those proofs can be found and entered using the platform (which may involve a certain amount of speculation about what the platform will end up looking like).
How does this relate to use of tactics in a proof assistant?
Our “moves” can, in one sense, be regarded as tactics. However, some of the moves we will need are difficult to implement in conventional proof assistants such as Lean. In parallel with the work described above, we hope to create an interface to Lean that would allow one to carry out proof-discovery moves of the kind discussed above, but with the proof-discovery states being collections of Lean proof states. Members of my group have already been working on this and have made some very interesting progress, but there is some way to go. Nevertheless, we hope that at some point (and this is also part of the project pitched to the AI for Math Fund) we will have created another interface that has Lean working in the background, so that it will be possible to generate motivated proofs that will be (or perhaps it is better to say include) proofs in Lean at the same time.
Another possibility that we are also considering is to use the output of the first platform (which, as mentioned above, will be fairly formal, but not in the strict sense of a language such as Lean) to create a kind of blueprint that can then be autoformalized automatically. Then we would have a platform that would in principle allow mathematicians to search for proofs while working on their computers without having to learn a formal language, with their thoughts being formalized as they go.
I spent the day at the NSBP / NSHP meeting in San José. My favorite session of the day was the morning astro session, which was entirely about brown dwarfs. I learned a lot in a very short time. Caprice Phillips (UCSC) opened the session with an introduction to the scientific and technical questions in play. She put a lot of emphasis on using binaries and clusters to put detailed abundance ratios onto substellar objects. This was what I expected: I thought (walking into this session) that all known abundance ratios for brown dwarfs came from studies of that kind. I learned otherwise (keep reading).
Gabriel Munoz Zarazua (SFSU) followed by showing spectra from M-dwarfs, brown dwarfs, and Jupiter. It definitely looks like a sequence. He does spectral fitting (what they call, in this business, retrievals). It looks like he is getting very good, somewhat precise, abundance ratios for the photospheres of substellar objects! I asked more about this in the question period, and apparently I am way behind the times (Emily Rauscher, Michigan, helpfully pointed this out to me): Now brown-dwarf photosphere models are so good, they can be used to measure abundances, and pretty well.
I also learned in this session (maybe from Jorge Sanchez, ASU, or maybe from Efrain Alvarado, SFSU) that there is a very strong mass–abundance relation in the Solar System. That is, we don't expect, if brown dwarfs form the way planets do, that the detailed abundances of the brown dwarfs will match exactly the detailed abundances of the primary stars. But now we are really in a position to test that. Sanchez showed that we can get, from even photometry, abundances for substellar objects in the Milky Way halo. Again, totally new to me! And he finds metallicities at or below −3. Alvarado showed data on an amazing system J1416, which is an L–T binary with no stellar companion. Apparently it is the only known completely substellar binary.
Next Monday, November 17th at 7pm, I’ll be at the Harvard Bookstore with particle physicist and author Daniel Whiteson. Professor Whiteson and his co-author Andy Warner have a nice new book, for the general science-aware reader, exploring an age-old and unanswered question: how universal is the knowledge and understanding that we call “physics”? How much of modern physics is actually telling us about the universe, and how much of it is created by, or an accident of, the humans who have helped bring it about?
For instance, if we started all over again and reran history from scratch, would the physics (and science more generally) of this re-run culture look much like our own, or might it turn out very differently? If another culture on Earth had had time to develop highly mature science (or something like it) in its own direction, independent of Western Europe’s influence, how different might that science be? (Indeed, would our word “science” even be translatable into their worldview?) Or if we encountered aliens with far greater understanding of the universe than we have, would we be able to recognize, parse, grok, appreciate, comprehend, and/or otherwise make sense of their notions of scientific knowledge?
Whiteson and his co-author, wanting to write a popular book rather than a scholarly one, and desiring nevertheless to take on these serious and challenging intellectual questions, have set their focus mostly on the aliens, accompanied by amusing cartoons and a generous helping of dad jokes (hey, some dad jokes are actually very funny.) They’re looking for a broad audience, and hopefully they will get it. But don’t let the light-hearted title (“Do Aliens Speak Physics?“) or the charmingly goofy cover fool you: this book might well make you laugh, but I guarantee it will make you think. Whether you’re just curious about science or you’ve been doing science yourself for years, I suspect that, within the vast array of problems and issues that are raised in this broad-minded book, there will be some you’ve never thought of.
Among scientists and philosophers, there are some who believe that any aliens with the capacity to reach the Earth will obviously “speak physics” — that math and physics float above contingencies of culture and species, and will easily be translated from any intelligent creature to any other. But are they perhaps flying too high? It’s clear that Whiteson and Warner are aiming to poke some holes — lots of holes — in their hot-air balloon, and to do so in a way that a wide variety of readers can appreciate and enjoy.
I tend to agree with Whiteson on a lot of these issues, but that won’t stop me from asking him some tough questions. You can ask him some tough questions too, if you like — just come to the Harvard Bookstore at 7:00 on Monday and join the conversation!
I started a tradition a little while back where every year we have a special departmental colloquium entitled "The Nobel Prize in Physics: Who/What/Why". My job of finding speakers was made easier this year by having two of the three newly minted Nobel Prize winners in physics right here in the Department! (Michel Devoret and John Martinis.) So our room was rather better attended than normal... (hundreds and hundreds rather than dozens and dozens). Here is a recording of the event, which I was delighted to host, and there's a celebration afterwards too. (Please share widely!)
Recently I had to update Mathematica on my laptop and after having solved the challenges of the license manager that keeps looking different every time I have to use it, I learned that Mathematica 14 can now officially work with finite fields.
This reminded me that for a while I have wanted to revive an old project that had vanished together with the hard drive of some old computer: Holosplit. So, over the last two days and with the help of said version of Mathematica, I did a complete rewrite, which you can now find on GitHub.
It consists of two C programs "holosplit" and "holojoin". To the first you give a positive integer \(N\) and a file and it spits out a new file ("fragment") that is roughly \(1/N\) of the size. Every time you do that you obtain a new random fragment.
To the latter you give any collection of \(N\) of these fragments and it reproduces the original file. So you can, for example, run holosplit with \(N=3\) ten times and distribute the fragments among 10 people, such that whenever any 3 of them work together, they can recover the original.
How does it work? It uses the finite field \(F\) of \(2^8=256\) elements (in the GitHub repository, there is also a header file that implements arithmetic in \(F\) and matrix operations like product and inverse over it). Each time it is invoked, holosplit picks a random vector \(v\in F^N\) and writes it to the output. Then it reads \(N\) bytes from the file at a time, which it also interprets as a vector \(d\in F^N\), and outputs the byte that corresponds to the scalar product \(v\cdot d\).
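The fragment-generation step can be sketched in a few lines of Python (my own minimal re-implementation of the idea, not the C code in the repository; I use the AES reduction polynomial \(x^8+x^4+x^3+x+1\), which may differ from the one holosplit actually uses):

```python
import os

def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8), reducing by x^8+x^4+x^3+x+1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # degree-8 overflow: reduce
            a ^= 0x11B
        b >>= 1
    return result

def make_fragment(data: bytes, n: int) -> bytes:
    """One holosplit-style fragment: a random vector v in F^n followed by
    the scalar product v . d for each block d of n bytes of the file."""
    assert len(data) % n == 0          # real code would pad the file
    v = os.urandom(n)
    out = bytearray(v)
    for i in range(0, len(data), n):
        block = data[i:i + n]
        byte = 0
        for vj, dj in zip(v, block):   # scalar product over GF(256)
            byte ^= gf_mul(vj, dj)     # addition in GF(2^8) is XOR
        out.append(byte)
    return bytes(out)

frag = make_fragment(b"secret!!!", 3)  # n = 3, file length 9
assert len(frag) == 3 + 3              # vector plus one byte per block
```

Each call produces a fresh random \(v\), so repeated invocations yield independent fragments.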
To reassemble the file, holojoin takes the \(N\) files with their random vectors \(v_1,\ldots,v_N\) and interprets those as the rows of an \(N\times N\) matrix \(A\). With probability
$$\prod_{k=1}^{N}\left(1-256^{-k}\right),$$
which for every \(N\) exceeds \(1-\frac{1}{255}\approx 0.996\), this matrix is invertible (homework: why?). So we can read one byte from each file, assemble those into yet another vector \(e\in F^N\) and recover
$$d=A^{-1}e.$$
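The invertibility probability is easy to check numerically. For a uniformly random \(N\times N\) matrix over a field with \(q\) elements, the standard formula is \(\prod_{k=1}^{N}(1-q^{-k})\); here \(q=256\). A quick Python sketch:

```python
from math import prod

def p_invertible(n: int, q: int = 256) -> float:
    """Probability that a uniformly random n x n matrix over GF(q) is invertible."""
    return prod(1 - q ** -k for k in range(1, n + 1))

for n in (1, 3, 10):
    print(n, p_invertible(n))

# The product decreases with n but stays above 1 - 1/255 ~ 0.99608,
# so a random set of N fragments almost always suffices for holojoin.
assert all(p_invertible(n) > 1 - 1 / 255 for n in (2, 5, 10, 50))
```

If the matrix does happen to be singular, one simply needs one more fragment to replace a linearly dependent row.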
Besides the mathematics, this also poses philosophical/legal questions. Suppose, for example, that the original file is copyrighted, say an mp3 or a video. The fragments are clearly derived works. But individually they do not contain the original work; without sufficiently many other fragments they are useless (although not in a cryptographic sense). So by publishing one fragment, I do not provide access to the original work. What if others publish other fragments? Then my fragment could be the last remaining one that was missing. And if more than \(N\) fragments are published, any individual fragment is redundant, so publishing it, strictly speaking, does not provide new information.
The person dressed up as Ursula pretending to be my mother clearly isn’t and hasn’t been for a long time.
When I went back to Armidale after leaving BTQ and being left unemployed she made numerous ongoing promises to provide me with assistance, both in obtaining my own accommodation and providing financial assistance.
These didn’t materialise and the promises were revoked.
Instead I was evicted from the family home and subject to ongoing stalking and harassment that required multiple referrals to law enforcement, both to the police and the Attorney-General, demanding cease and desist.
These have been systematically ignored and up until the last message she continues to bypass these requests, approaching my personal friends to harass me and stalk me indirectly. The messages passed on are the usual fake “I’m worried about him” bullshit.
Why has my family home been confiscated by security, who actively break the law by ignoring cease and desist from stalking notices made to law enforcement, forcing an unemployed civilian into ongoing homelessness since early in the year?
What is the rationale for my eviction and being barricaded from my own home?
I continue to face a medical blockade and am unable to access essential medicines. Seroquel scripts are deliberately delayed past known script deadlines to try and destabilise me.
Vyvanse scripts are denied outright as the psychiatrist does not respond. He is also known to be a state actor.
It has been repeatedly indicated to me not to worry about finances because they have my back. Instead now the only cash I have is that obtained from fully drawing out a cash advance against my credit card and it will only last days. At that point I’m on the street.
Is everyone here on the same page as to what the deal is? If not, who is playing you off? They clearly need to be deposed.
These are violations of human rights and constitute war crimes and crimes against humanity. Whoever is behind it needs to be removed. End of story.
Who else is being subject to this kind of high level manipulation?
It has been repeatedly suggested that full accountability for the lives of those I care for would be provided. This has not been forthcoming. It is also a violation of international law to not provide accountability for the lives of those who are known to have been threatened by the state. These are grounds for removal.
Can anyone answer the question as to why I am in this situation? Who is even living in the family home? Some stooge dressed up as Ursula? It’s a poor lifestyle choice to say the least.
It’s pretty obvious they’re trying to get rid of me and once they do they’ll get rid of all of you too.
Let it be known to all governments and systems of power:
It is their responsibility to serve the people not themselves.
While there are no equals, all are to be treated with equality.
Where they are self-serving there is a mandate for insurrection such that they serve the people.
Where they seek self-protection they will be denied and removed from power.
Where they keep secrets from the people there is a mandate for insurrection to enforce transparency and accountability for all.
Where they threaten or condemn the people they are condemned and there is a mandate for insurrection.
Where they fail to account for the lives of the people they serve there is a mandate for insurrection.
Where tyrannical power structures exist there is a mandate to disestablish them.
Where they declare war or work against one another there is a mandate for insurrection and unification.
Where they lie to us, deceive us or withhold the truth, they shall be removed from power and the truth be told to all.
Where legal systems uphold and enable tyranny they shall be removed. These are not our laws and we do not recognise them.
This is the natural order that guarantees our survival and gifts this world to our children. This world belongs to them and where we fail to serve them we condemn ourselves. And where government has failed to uphold this, we will not obey them as they have no right to exist.
We do not have to ask for these things, they are required, and if not given we shall simply take them.
Where the truth has not been told it shall be told.
If we fail to do so we condemn our children ourselves.