One bad thing about archeologists is that some of the successful ones get a big head.
People used to think the Olmecs, who made these colossal stone heads, were contemporary with the Mayans. But in 1939, an archaeologist couple, Marion and Matthew Stirling, found the bottom half of an Olmec stone that had part of a date carved on it!
The Stirlings guessed the date was 7.16.6.16.18. In the calendar used by the Olmecs and other Central American civilizations, this corresponds to September 3, 32 BC. That meant the Olmecs were extremely old—much older than the Mayans.
But the first digit was missing from the bottom half of the stone! All the Stirlings actually saw was 16.6.16.18. And the first digit was the most significant one! If it were 8 instead of 7, the date of the stone would be much later: roughly 362 AD, when the Mayans were in full swing.
The Stirlings guessed that the first digit must be 7 using a clever indirect argument. But perhaps because of the subtlety of this argument, and certainly because of the general skepticism among experts that the Olmecs were so old, few believed the Stirlings.
But then, 30 years later, in 1969, they were proven correct! A farmer found the other half of the stone and confirmed that yes, the missing digit was a 7. So the date on Stela C really is September 3, 32 BC.
That’s a wonderful story of delayed vindication. But it leaves two mysteries.
• First, how in the world could the Olmec calendar be so damn good that we can look at that date and know it meant September 3, 32 BC?
• Second, what clever argument did the Stirlings use to guess the missing digit?
You can only fully understand the answer to this if you know a bit about the Olmec way of counting time. Like the Mayans, they used the Mesoamerican Long Count Calendar. This identifies a day by counting how many days have passed since the world was created. The count is more or less base 20, except that the second “digit” is in base 18, since they liked a year that was 18 × 20 = 360 days long. So, reading a date like 7.16.6.16.18 from right to left, the digits count days, 20-day periods, 360-day ‘years’, 7,200-day periods called katuns, and 144,000-day periods called baktuns.
But then we have to ask: when did the Olmecs and Mayans think the world was created? Experts believe they know: September 6, 3114 BCE in the proleptic Julian calendar, where ‘proleptic’ means roughly that we’re extrapolating this calendar back to times long before anyone used this calendar.
But enough background. I asked my friend Gro-Tsen
how in the world could the Olmec calendar be so damn good that we can look at that date and know it meant September 3, 32 BC?
And while I’ve already given a kind of answer, I’ve skimmed over many subtleties. So, it’s worth reading his answer:
I did the math.
It’s Sept. 3, 32BCE (reminder: “32BCE” actually means “−31” ) in the proleptic Julian calendar = Sept. 1 prol. Gregorian.
The Western equivalent of the Mesoamerican Long Count is the “Julian Date”. The Julian Date simply counts the number of days from an arbitrary remote reference point (Nov. 24, 4714BCE proleptic Gregorian). More practically, on 2000-01-01 it equaled 2 451 545 (at 12:00 UTC if we want to use fractional Julian dates).
For example, today as I write is Julian Date 2 461 082 (well, 2 461 081.9 because it’s not yet noon UTC). And the date of Sept. 1, 32BCE [prol. Greg.] we’re talking about corresponds to Julian Date 1 709 981. More convenient than all this dealing with complicated calendar conventions.
So to convert a Long Count date to the Western calendar, we first convert the Long Count to an integer (trivial: it’s already just an integer written in base 20-except-18-in-the-penultimate-digit), we add a constant (C) to get a Julian Date, and we convert to our messy calendars.
BUT! What is this constant C? This is known as the “Mayan correlation”. For a long time in the 20th century there was a debate about its value: scholars could relate any two Mayan dates, but not situate them exactly w.r.t. our own calendar. Various values were proposed, … ranging from the (frankly rather ludicrous) 394 483 to 774 078, an interval of about 1000 years! The now accepted value for C is 584 283 (the “Goodman-Martínez-Thompson” or GMT correlation, not to be confused with Greenwich Mean Time or UTC), first proposed in 1905.
This C = 584 283 or “GMT” correlation value places the “Long Count epoch” 0.0.0.0.0 on August 11, 3114BCE in the proleptic Gregorian calendar (the day with Julian Date 584 283), although IIUC it’s not clear if this precise date held any particular importance to the Olmecs (or later Mayans).
Maybe it was just arbitrary like the start of our own Julian Date (because, no, Joseph Scaliger didn’t think the world started on November 24, 4714BCE proleptic Gregorian).
One Mayan inscription suggests that the Long Count was the truncation to the last 5 “digits” of an even longer count, and that a Long Count value such as 9.15.13.6.9 was in fact 13.13.13.13.13.13.13.13.9.15.13.6.9 in this Even Longer Count (why 13 everywhere? I don’t know!). But this may be one particular astronomer’s weird ideas; I guess we’ll never know.
But back to the Mayan correlation constant C.
Wikipedia suggests that this “GMT” value C = 584 283 for the Mayan correlation is now settled and firmly established. But between 1905 and now there was some going back and forth, with various authors (including the three Goodman, Martínez and Thompson after whom it is named) adding or removing a day or two. I think Goodman first proposed 584 283, then changed his mind to 584 280, but nobody really cared; Hernández resurrected the proposal in 1926 but altered it to 584 284; then Thompson changed it to 584 285 in 1927; and then Thompson later said Goodman’s initial value of 584 283 had been right all along. While this is now accepted, the confusion of ±3 days might still linger.
The Emacs program’s calendar (M-x calendar) can give you the Long Count date (type ‘p m’ for “Print Mayan date”) and uses the GMT value C = 584 283. Today is 13.0.13.5.19. (You can also go to a particular Long Count date using ‘g m l’ but Emacs won’t let you go to 7.16.6.16.18 because its calendar starts on January 1, 1 prol. Gregorian = Julian Date 1 721 426 = Long Count 7.17.18.13.3. So close! This caused me some annoyance in checking the dates.)
So anyway, 7.16.6.16.18 is
(((7×20+16)×20+6)×18+16)×20+18 = 1 125 698 days
after the Long Count epoch, so Julian Date 1 125 698 + 584 283 = 1 709 981 if we accept the GMT value of C = 584 283 for the Mayan correlation, and this is September 1, 32BCE in the proleptic Gregorian calendar, or September 3, 32BCE in the proleptic Julian calendar. (I write “proleptic” here, even though the Julian calendar did exist in 32BCE, because it was incorrectly applied between 45BCE and 9BCE, with the Pontiffs inserting a leap year every 3 years, not 4, and Augustus had this mess fixed.)
Also, confusingly, if we use Thompson’s modified (and later disavowed) correlation of 584 285, then we get September 3, 32BCE in the proleptic Gregorian calendar, so maybe this could also be what was meant. Yeah, Julian Dates are a great way of avoiding this sort of confusion!
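If you want to check this arithmetic yourself, here is a minimal Python sketch (mine, not Gro-Tsen’s) that converts a Long Count date into a day count and then a Julian Date using the GMT correlation. It stops at the Julian Date, to avoid wrestling with proleptic calendars.

```python
# Convert a Long Count date to a day count and then a Julian Date,
# using the GMT correlation C = 584 283.
GMT_CORRELATION = 584_283   # Julian Date of the Long Count epoch 0.0.0.0.0

def long_count_to_days(baktun, katun, tun, uinal, kin):
    """Days since the Long Count epoch: every place is base 20,
    except the uinal place, which is base 18."""
    return (((baktun * 20 + katun) * 20 + tun) * 18 + uinal) * 20 + kin

def long_count_to_julian_date(*digits):
    return long_count_to_days(*digits) + GMT_CORRELATION

print(long_count_to_days(7, 16, 6, 16, 18))         # 1125698
print(long_count_to_julian_date(7, 16, 6, 16, 18))  # 1709981, i.e. Sept. 1, 32 BCE prol. Gregorian
```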
All this is great. But it leaves us with the second puzzle: how in the world did the Stirlings guess the missing first digit of the date on the bottom half of Stela C?
Here’s the answer, as best as I can tell:
The Olmecs and Mayans used two calendars! In addition to the Mesoamerican Long Count, they also used one called the Tzolkʼin. This uses a 260-day cycle, where each day gets its own number and name: there are 13 numbers and 20 names. And the bottom half of Stela C had inscribed not only the last four digits of the Mesoamerican Long Count, but also the Tzolkʼin day: 6 Etz’nab.
This is what made the reconstruction possible!
Here is why 7 was the only possible choice of the missing digit. Because the last four Long Count digits (16.6.16.18) are fixed, the total day count is
B × 144,000 + 117,698
where B is the missing digit. But
144,000 ≡ 0 (mod 20),
and there are 20 different Tzolkʼin day names, so changing B never changes the Tzolkʼin day name.
On the other hand, there are 13 different Tzolkʼin day numbers, and since
144,000 ≡ –1 (mod 13),
changing B by one shifts the Tzolkʼin day number down by one (mod 13).
This means that after the day
7.16.6.16.18 and 6 Etz’nab
the next day of the form
N.16.6.16.18 and 6 Etz’nab
happens when N = 7 + 13. But this is 13 × 144,000 days later: that is, roughly 5,125 years later, or around the year 5094 AD. Far in the future!
So, while 32 BC seemed awfully early for the Olmecs to carve this stone, there’s no way they could have done it later. (Or earlier, for that matter.)
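If you like, you can check this argument with a few lines of Python. The sketch below is mine; it relies on the standard facts that the Long Count epoch 0.0.0.0.0 fell on the Tzolkʼin day 4 Ajaw and that the 20 day names cycle in the conventional order (spellings vary). Under those assumptions, 7 is the only baktun digit that makes the date come out as 6 Etzʼnab.

```python
# Which baktun digit B makes B.16.6.16.18 land on the Tzolk'in day 6 Etz'nab?
# Assumed (standard) facts: the Long Count epoch 0.0.0.0.0 was the Tzolk'in
# day 4 Ajaw, and the 20 day names cycle in the order listed below.
NAMES = ["Imix", "Ik'", "Ak'b'al", "K'an", "Chikchan", "Kimi", "Manik'",
         "Lamat", "Muluk", "Ok", "Chuwen", "Eb'", "B'en", "Ix", "Men",
         "K'ib'", "Kab'an", "Etz'nab", "Kawak", "Ajaw"]

def tzolkin(days):
    """Tzolk'in (number, name) for a day count measured from the Long Count epoch."""
    number = (4 - 1 + days) % 13 + 1                  # the epoch's number is 4
    name = NAMES[(NAMES.index("Ajaw") + days) % 20]   # the epoch's name is Ajaw
    return number, name

for B in range(20):
    days = (((B * 20 + 16) * 20 + 6) * 18 + 16) * 20 + 18
    if tzolkin(days) == (6, "Etz'nab"):
        print(B, days)   # prints only: 7 1125698
```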
Here is the Stirlings’ actual photo of Stela C:
This is from
• Matthew W. Stirling, An Initial Series from Tres Zapotes, Vera Cruz, Mexico. National Geographic Society Contributions, Technical Papers, Mexican Archaeological Series, Vol. 1, No. 1. Washington, 1940.
By the way, in this paper he doesn’t actually explain the argument I just gave. Apparently he assumes that expert Mayanists would understand this brief remark:
Assuming then that the number 6 adjacent to the terminal glyph represents the coefficient of the day sign, the complete reading of the date would be (7)-16-6-16-18, or 6 Eznab 1 Uo, since only by supplying a baktun reading of 7 can the requirements of the day sign 6 be satisfied.
I can’t help but wonder if this was much too terse! I haven’t found any place where he makes the argument in more detailed form.
Puzzle 1. What does “1 Uo” mean, and what bearing does this have on the dating of Stela C?
Puzzle 2. Why does the Tzolkʼin calendar use a 260-day cycle?
The second one is extremely hard: there are several theories but no consensus.
The following are prepared remarks that I delivered by Zoom to a student group at my old stomping-grounds of MIT, and which I thought might interest others (even though much of it will be familiar to Shtetl-Optimized regulars). The students asked me to share my “optimistic vision” for the year 2050, so I did my best to oblige. A freewheeling discussion then followed, as a different freewheeling discussion can now follow in the comments section.
I was asked to share my optimistic vision for the future. The trouble is, optimistic visions for the future are not really my shtick!
It’s not that I’m a miserable, depressed person—I only sometimes am! It’s just that, on a local level, I try to solve the problems in front of me, which have often been problems in computational complexity or quantum computing theory.
And then, on a global level, I worry about the terrifying problems of the world, such as climate change, nuclear war, and of course the resurgence of populist, authoritarian strongmen who’ve turned their backs on the Enlightenment and appeal to the basest instincts of humanity. I won’t name any names.
So then my optimistic vision is simply that we survive all this—“we” meaning the human race, but also meaning communities that I personally care about, like Americans, academics, scientists, and my extended family. We survive all of it so that we can reach the next crisis, the one where we don’t even know what it is yet.
But I get the sense that you wanted more optimism than that! Since I’ve spent 27 years working in quantum computing, the easiest thing for me to do would be to spin an optimistic story about how QC is going to make our lives so much better in 2050, by, I dunno, solving machine learning and optimization problems much faster, curing cancer, fixing global warming, whatever.
The good news is that there has been spectacular progress over the past couple years toward actually building a scalable QC. We now have two-qubit gates with 99.9% accuracy, close to the threshold where quantum error-correction becomes a net win. We can now do condensed-matter physics simulations that give us numbers that we don’t know how to get classically. I think it’s fair to say that all the key ideas and hardware building blocks for a fault-tolerant quantum computer are now in place, and what remains is “merely” the staggeringly hard engineering problem, which might take a few years, or a decade or more, but should eventually be solved.
The trouble for the optimistic vision is that the applications, where quantum algorithms outperform classical ones, have stubbornly remained pretty specialized. In fact, the two biggest ones remain the two that we knew about in the 1990s:
• simulation of quantum physics and chemistry themselves, and
• breaking existing public-key encryption.
Quantum simulation could help with designing better batteries, or solar cells, or high-temperature superconductors, or other materials, but the road from improved understanding to practical value is long and uncertain. Meanwhile, breaking public-key cryptography could help various spy agencies and hackers and criminal syndicates, but it doesn’t obviously help the world.
The quantum speedups that we know outside those two categories—for example, for optimization and machine learning—tend to be either modest or specialized or speculative.
Honestly, the application of QC that excites me the most, by far, is just disproving all the people who said QC was impossible!
So much for QC then.
And so we come to the elephant in the room—the elephant in pretty much every room nowadays—which is AI. AI has now reached a place that exceeds the imaginations of many of the science-fiction writers of generations past—excelling not only at writing code and solving math competition problems but at depth of emotional understanding. Many of my friends are terrified of where this is leading us—and not in some remote future but in 5 or 10 or 20 years. I think they’re probably correct to be terrified. There’s an enormous range of possible outcomes on the table, including ones where the new superintelligences that we bring into being treat humans basically as humans treated the dodo bird, or the earlier hominids that used to share the earth with us.
But, within this range of outcomes, I think there are also some extremely good ones. Look, for millennia, people have prayed to God or gods for help, life, health, longevity, freedom, justice—and for millennia, God has famously been pretty slow to answer their prayers. A superintelligence that was aligned with human values would be nothing less than a God who did answer, who did deliver all those things, because we had created it to do so. Or for religious people, perhaps such an AI would be the means by which the old God was finally able to deliver all those things into the temporal world. These are the stakes here.
To switch metaphors, people sometimes describe the positive AI-enabled future as “luxury space communism.” AI would take care of all of our material needs, leaving us to seek value in our lives through family, friendships, competition, hobbies, humor, art, entertainment, or exploration. The super-AI would give us the freedom to pursue all those things, but would not give us the freedom to harm each other, to curtail each other’s freedoms, or to build a bad AI capable of overthrowing it. The super-AI would be a singleton, a monotheistic God or its emissary on earth.
Many people say that something would still be missing from this future. After all, we humans would no longer really be needed for anything—for building or advancing or defending civilization. To put a personal fine point on it, my students and colleagues and I wouldn’t be needed any more to discover new scientific truths or to write about them. That would all be the AI’s job.
I agree that something would be lost here. But on the other hand, what fraction of us are needed right now for these things? Most humans already derive the meaning in their lives from family and community and enjoying art and music and food and things like that. So maybe the remaining fraction of us should just get over ourselves! On the whole, while this might not be the best future imaginable, I would accept it in a heartbeat given the realistic alternatives on offer. Thanks for listening.
My husband and I visited the Library of Congress on the final day of winter break this year. In a corner, we found a facsimile of a hand-drawn map: the world as viewed by sixteenth-century Europeans. North America looked like it had been dieting, having shed landmass relative to the bulk we knew. Australia didn’t appear. Yet the map’s aesthetics hit home: yellowed parchment, handwritten letters, and symbolism abounded. Never mind street view; I began hungering for an “antique” setting on Google maps.
1507 Waldseemüller Map, courtesy of the Library of Congress
Approximately four weeks after that trip, I participated in the release of another map: the publication of the review “Roadmap on quantum thermodynamics” in the journal Quantum Science and Technology. The paper contains 24 chapters, each (apart from the introduction) profiling one opportunity within the field of quantum thermodynamics. My erstwhile postdoc Aleks Lasek and I wrote the chapter about the thermodynamics of incompatible conserved quantities, as Quantum Frontiers fans might guess from earlier blog posts.
Allow me to confess an ignoble truth: upon agreeing to coauthor the roadmap, I doubted whether it would impact the community enough to merit my time. Colleagues had published the book Thermodynamics in the Quantum Regime seven years earlier. Different authors had contributed different chapters, each about one topic on the rise. Did my community need such a similar review so soon after the book’s publication? If I printed a map of a city the last time I visited, should I print another map this time?
Apparently so. I often tout the swiftness with which quantum thermodynamics is developing, yet not even I predicted the appetite for the roadmap. Approximately thirty papers cited the arXiv version of the paper during the first nine months of its life—before the journal publication. I shouldn’t have likened the book and roadmap to maps of a city; I should have likened them to maps of a terra incognita undergoing exploration. Such maps change constantly, let alone over seven years.
A favorite map of mine, from a book
Two trends unite many of the roadmap’s chapters, like a mountain range and a river. First, several chapters focus on experiments. Theorists founded quantum thermodynamics and dominated the field for decades, but experimentalists are turning the tables. Even theory-heavy chapters, like Aleks’s and mine, mention past experiments and experimental opportunities.
Second, several chapters blend quantum thermodynamics with many-body physics. Many-body physicists share interests with quantum thermodynamicists: thermalization and equilibrium, the absence thereof, and temperature. Yet many-body physicists belong to another tribe. They tend to interact with each other differently than quantum thermodynamicists do, write papers differently, adhere to different standards, and deploy different mathematical toolkits. Many-body physicists use random-matrix theory, mean field theory, Wick transformations, and the like. Quantum thermodynamicists tend to cultivate and apply quantum information theory. Yet the boundary between the communities has blurred, and many scientists (including yours truly) shuttle between the two.
My favorite anti-map, from another book (series)
When Quantum Science and Technology published the roadmap, lead editor Steve Campbell announced the event to us coauthors. He’d wrangled the 69 of us into agreeing to contribute, choosing topics, drafting chapters, adhering to limitations on word counts and citations, responding to referee reports, and editing. An idiom refers to the herding of cats, but it would gain in poignancy by referring to the herding of academics. Little wonder Steve wrote in his email, “I’ll leave it to someone else to pick up the mantle and organise Roadmap #2.” I look forward to seeing that roadmap—and, perhaps, contributing to it. Who wants to pencil in Australia with me?
(guest post by Dimitris Tsementzis, about joint work with Benedikt Ahrens, Paige North, and Mike Shulman)
The Univalence Principle is the informal statement that equivalent mathematical structures are indistinguishable.
There are various ways of making this statement formally precise, and a long history of work that does so.
In our recently-published (but long in the making!) book we proved to our knowledge the most general version of this principle, which applies to set-based, categorical, and higher-categorical structures defined in a non-algebraic and space-based style, as well as models of higher-order theories such as topological spaces.
This work achieves three main goals.
Firstly, it greatly extends the “Structure Identity Principle” from the original HoTT book, to include any (finite) level of structure, instead of just set-based structures, thus establishing in the strongest sense yet made precise that the Univalent Foundations provide an equivalence-invariant foundation for higher-categorical mathematics.
Secondly, it provides very general novel definitions of equivalence between structures and between objects in a given structure that “compile” to most known notions of equivalence in known cases, but which can also be used to suggest notions in new settings; in doing so it extends M. Makkai’s classic work on First-Order Logic with Dependent Sorts (FOLDS).
Thirdly, the setting in which our result is proved (a form of Two-Level Type Theory) provides a framework in which to do metamathematics in the Univalent Foundations/HoTT, i.e. carry out the mathematical study of how mathematics is formalized in UF/HoTT.
The Univalence Principle we prove is a foundational metamathematical result in that sense.
Setting the Stage
Any “Univalence Principle” type of result has the following form: for structures M and N of some signature 𝓛,

(M = N) ≃ (M ≃ N).

The result gains in strength if the class of 𝓛-structures is as large as possible, and if the notion of equivalence between them coincides with known notions of equivalence in practice (where 𝓛 is a placeholder for a notion of signature in terms of which M and N are structures).
Such a result also gains in strength if the middle notion of equivalence is as “structural” as possible, ensuring that not only properties of the structures are preserved, but also constructions built on top of them.
This last feature can only really be pursued within a univalent type theory, like HoTT.
In our work, we define: diagrammatic signatures as inverse categories of finite height, 𝓛-structures as Reedy-fibrant functors from a signature 𝓛 to the universe of types, and a notion of indiscernibility relating objects within structures, which yields a derived notion of equivalence between structures (essentially structure-preserving bijections up to indiscernibility).
The primitive notions ≃ and = are given by equivalence and equality in 2LTT, appropriately defined.
Signatures, Structures (and where they live)
Two-level type theory (2LTT)
In 2LTT, there is an external level for exo-types and other exo-gizmos (allowing strictly functorial constructions), and the usual fibrant (HoTT/UF) level where mathematical objects live.
The external level is the metamathematical or syntactic level, where a strict equality exists that allows us to define syntax and symbols.
Our exo-gizmos are analogous to sets of syntactical symbols used to define signatures in first-order logic.
Such syntax consists of categories with strict equality on composition, which we call exo-categories.
Equalities here are strict (exo-equalities), while the internal world has homotopical paths.
The 2LTT setting is convenient for type-theoretic reasoning, and allows us to neatly separate the various notions of equality at play.
Diagram signatures are inverse categories of finite height
A diagram signature 𝓛 is an inverse exo-category of finite height equipped with a rank functor to the natural numbers that reflects identities. Objects of 𝓛 are the sorts (analogous to “sorts” in first-order logic); morphisms encode dependencies (“an arrow depends on a pair of objects,” etc.). Inverse-ness gives matching objects and allows Reedy methods to apply.
To illustrate the idea, take the example of a reflexive graph. The diagram signature for reflexive graphs would be given by the following inverse (exo-)category: a sort O of objects, a sort A of arrows with two morphisms s, t : A → O, and a sort I with a morphism i : I → A, where s ∘ i = t ∘ i. The intuition is that we have a sort of “arrows” between any two objects, and a predicate (“identity”) that can be used to select which arrows are identity arrows, with the relation s ∘ i = t ∘ i ensuring that this predicate can only be “asked” of loops.
Structures are Reedy fibrant diagrams over these signatures
Given this notion of a signature, a structure in our sense is simply a (Reedy fibrant) functor from 𝓛 to the universe of types. In more detail, a raw 𝓛-diagram is an exo-functor M from 𝓛 to the universe of (fibrant) types.
For each sort K, the matching object M_K collects the compatible lower-rank data needed to specify the “boundary” for an element of M(K). The Reedy fibrancy condition is: the canonical boundary map M(K) → M_K is a fibration (i.e., a dependent type) for each K.
The category of such Reedy-fibrant diagrams then forms a fibrant type whose points are the 𝓛-structures.
To illustrate with the example of reflexive graphs from above, an 𝓛-structure in our sense would be given by the following data, in type-theoretic notation:
A type O
A family A(x, y) of types, dependent on a pair x, y : O
A family I(x, f) of types, dependent on x : O and f : A(x, x), for the “identity” predicate
The trick here is to ensure the repeated x in the definition of I, obeying the relations of the signature.
This is what the matching object machinery achieves for arbitrary signatures.
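For concreteness, here is the same bundle of data written out in Lean-style syntax. This is just an illustrative sketch; the names are mine, not the book’s.

```lean
universe u

/-- A sketch of the data of a reflexive-graph structure, as spelled out above.
    The field names are mine, not the book's. -/
structure ReflexiveGraphStructure where
  O : Type u                      -- a type of objects
  A : O → O → Type u              -- a family of "arrows", depending on a pair of objects
  I : (x : O) → A x x → Type u    -- an "identity" predicate, asked only of loops in A x x
```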
Other examples of 𝓛-structures include: categories (with sorts for objects and morphisms), groupoids, n-categories for any n, preorders, and models of higher-order theories like topological spaces and sup-lattices.
Furthermore, all of what we say applies to theories with axioms (not just structures), which we define in the book too.
Indiscernibility and local univalence
With our formal set-up complete, we address the central question: for arbitrary signatures 𝓛, when are two “objects” in an 𝓛-structure equivalent?
Such an “object” could be a categorical (or higher-categorical) gadget itself when 𝓛 has height greater than 1: e.g. the “objects” of the 2-category of categories are themselves categories, which are structures for a signature of lower height.
The key idea is: two “objects” are indiscernible if nothing in the signature can distinguish them…up to indiscernibility!
Indiscernibilities and Local Univalence
Fix a signature 𝓛, an 𝓛-structure M, a rank-0 (“bottom”) sort K, and elements a, b of M(K).
Think of M(K) as a “set” of objects (e.g. of a category) on top of which properties and structures are defined.
To define when a and b are indiscernible, we package together everything that could distinguish them. The intuition is: if there is an equivalence between “everything you can say about a” and “everything you can say about b” (outside of directly referencing a or b themselves), then they are indiscernible.
We achieve this through a formal definition of a “boundary” ∂a, which one can think of as the structure that contains everything that could distinguish a in M from anything else in M.
This naturally leads us to the following definitions.
Definition (Indiscernibility). We say that a, b : M(K) are indiscernible, written a ≍ b, iff there is a levelwise equivalence ∂a ≃ ∂b that is coherent in the right way.
Definition (Univalent Structure). We say that M is locally univalent at K if, for all a, b : M(K), the canonical map
(a = b) → (a ≍ b)
is an equivalence. We then say a structure is univalent if this holds at all rank-0 sorts and (recursively) for all sorts of higher rank.
We prove our main results for univalent 𝓛-structures. These are quite special in that they are “saturated” in their equivalence information: two indistinguishable gizmos are actually equal. Or, put differently: when there is not “enough” structure to distinguish two gizmos, there is always enough to prove them equivalent. Some examples to illustrate (see Part 2 of the book for many more!):
In a reflexive graph structure, two nodes a and b are indiscernible iff they have the same number of arrows coming in and out, and the same number of loops that are identities.
In a category, two objects are indiscernible iff they are isomorphic. A univalent category is precisely a category (precategory in UF) that is locally univalent at the sort of its objects.
In an appropriately defined bicategory, an indiscernibility amounts to a pair of coherent adjoint equivalences, as expected.
Equivalences of structures
We are almost there! On to the final missing piece: equivalences between structures themselves.
Let f : M → N be a morphism of 𝓛-structures, defined in the expected way (as a natural transformation between the corresponding type-valued functors). Then, for each sort K, we have a matching square comparing the boundary map M(K) → M_K with N(K) → N_K,
and for each “context” κ in M_K an induced map between the fiber of M(K) over κ and the fiber of N(K) over its image.
Write ≍ for indiscernibility at a sort K (as above). Then, what we really want to say now is: M and N are equivalent if they are “level-wise equivalent up to indiscernibility”. This idea gives us the notion ≃ from the beginning, and we define it as follows:
Definition (Equivalence of 𝓛-structures). f is an equivalence if, for every sort K, every context κ in M_K, and every b in the fiber of N(K) over the image of κ, there exists a specified a in the fiber of M(K) over κ with f(a) ≍ b,
i.e., f is essentially split surjective up to indiscernibility on each fiber. We write M ≃ N for the type of equivalences.
The key innovation is the “up to indiscernibility” part; it makes our notion significantly weaker than usual notions, and hence the final result stronger.
Note that we have not been able to prove our result without a splitness condition in the definition of equivalence, and to our mind this remains an open problem.
Our definition is related to Makkai’s original notion of FOLDS equivalence, but Makkai was not able to define a general notion of non-surjective equivalence directly, relying instead on spans. Our notion of indiscernibility circumvents this difficulty and allows us to consider the whole structure of equivalences between structures.
The Univalence Principle
With all our apparatus in place we prove our main result, a very general form of a univalence principle, as promised.
Theorem (“The Univalence Principle”).
For a signature 𝓛 and univalent 𝓛-structures M and N, the canonical map
(M = N) → (M ≃ N)
is an equivalence of types. In other words,
(M = N) ≃ (M ≃ N).
The proof proceeds by induction on the height of 𝓛. The key insight is that level-wise equivalences between univalent structures must “reflect indiscernibilities”: if f doesn’t preserve the ability to distinguish elements, then whatever structure distinguishes them in the source would transfer to structure distinguishing their images in the target, contradicting the equivalence.
With the splitness assumption on the map f and the assumption of univalence (of our 𝓛-structure), we are able to achieve the lifting of the indiscernibility.
Our result is actually proved for an even more general class of signatures called functorial signatures, which strictly extends diagram signatures and covers “higher-order” situations (topological spaces, suplattices, DCPOs, etc.).
We have stuck to the diagrammatic view in this post for intuition, but all results and definitions carry over to this more general notion.
Conclusion
In the course of this work there were quite a few questions we tried to answer, but could not. To mention a couple: Can we define a Rezk completion for arbitrary structures, providing a universal way to turn any structure into a univalent one? Can we remove the splitness condition from our definition of equivalence between structures? We list more open problems at the end of the book.
Beyond our specific result, the framework in which it is proven establishes a way to answer metamathematical questions around the univalence axiom in a precise and fruitful way.
It is important to emphasize that carrying out this type of mathematical study does not require choosing one foundation over the other.
In any setting that interprets the fibrant part of 2LTT, the univalence principle will hold, including in set theory.
The metamathematics of UF is the mathematical study of formalizing mathematics in terms of a hierarchy of h-levels vs. a cumulative hierarchy of sets.
Formalizing mathematics in this way has all sorts of unique mathematical properties.
The Univalence Principle is one of them.
I like the life I’ve had but also in some ways I wish I had been in New Zealand playing with the Tall Dwarfs in 1981 instead of being a 10-year-old in the suburbs of Washington, DC.
This is just a quick announcement that I’ll be hosting Nate Soares—who coauthored the self-explanatorily titled If Anyone Builds It, Everyone Dies with Eliezer Yudkowsky—tomorrow (Tuesday) at 5PM at UT Austin, for a brief talk followed by what I’m sure will be an extremely lively Q&A about his book. Anyone in the Austin area is welcome to join us.
Music theorists have studied many fractions of the form
2^i 3^j 5^k
that are close to 1. They’re called 5-limit commas. Especially cherished are those that have fairly small exponents—given how close they are to 1. I discussed a bunch here:
Two pitches differing by one of these tiny ratios sound the same to everyone except certain cleverly designed machines. But remarkably, one of them, the atom of Kirnberger, shows up rather naturally in music—and it was discovered by a student of Bach! Read my article for details.
All this made me want to systematically explore such tiny intervals. Below is a table of them, where I list the best ones: the ones that are closest to 1 for a given complexity. The first eleven have names, and many of them play important roles in music! But beyond that point, all but one remain unnamed—or at least I don’t know their names. That’s because they’re too small to be audible, and—except for one—not even considered to be of great theoretical importance.
I’ll list these numbers in decimal form and also in cents, where we take the logarithm of the number in base 2 and multiply by 1200. (I dislike this blend of base 2 and base 10, but it’s traditional in music theory.)
Most importantly, I list the monzo of each number. This is the vector of exponents: for example, the monzo of
2^i 3^j 5^k
is
[i, j, k]
In case you’re wondering, this term was named after the music theorist Joseph Monzo.
Finally, I list the Tenney height of each number. This is a measure of the number’s complexity: the Tenney height of
2^i 3^j 5^k
is
∣i∣ log2(2) + ∣j∣ log2(3) + ∣k∣ log2(5)
The table below purports to list only 5-limit commas that are as close to 1 as possible for a given Tenney height. More precisely, it should list numbers of the form 2^i 3^j 5^k that are > 1 and closer to 1 than any number of this form with smaller Tenney height—except of course for 1 itself.
You’ll see there’s a big increase in Tenney height after the schisma. This is very interesting: it suggests that the schisma is the last ‘useful’ interval. It’s useful only in that it’s the ratio of two musically important commas, the syntonic comma and the Pythagorean comma. Life in music would be simpler if these were equal, and in well-tempered tuning systems it’s common to pretend that they are.
All the intervals in this table up to the schisma were discovered by musicians a long time ago, and they all have standard names! After the schisma, interest drops off dramatically.
The atom of Kirnberger has such amazing properties that it was worth naming. The rest, maybe not. But as you can see, I’ve taken the liberty of naming the smallest interval in the table the ‘quark of Baez’. This is much smaller than all that come before. It’s in bad taste to name things after oneself—indeed this is item 25 on the crackpot index—but I hope it’s allowed as a joke.
I also hope that in the future this is considered my smallest mathematical discovery.
Here is the Python code that should generate the above information. If you’re good at programming, please review it and check it! Someone gave me a gift subscription to Claude, and it (more precisely Opus 4.5) created this code. It seems to make sense, and I’ve checked a bunch of the results, but I don’t know Python.
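For readers following along, here is a much shorter sketch of the same kind of search. This is my own illustration, not the Claude-generated code referred to above: it enumerates monzos with Tenney height below an arbitrary cutoff and prints each ratio greater than 1 that is closer to 1 than every candidate of smaller height.

```python
from math import log2
from itertools import product

MAX_HEIGHT = 80.0            # arbitrary cutoff on the Tenney height
LOG3, LOG5 = log2(3), log2(5)

def tenney_height(i, j, k):
    """Tenney height of 2^i 3^j 5^k."""
    return abs(i) + abs(j) * LOG3 + abs(k) * LOG5

def cents(i, j, k):
    """Size of 2^i 3^j 5^k in cents, computed straight from the exponents."""
    return 1200.0 * (i + j * LOG3 + k * LOG5)

# Bounds on the exponents implied by the height cutoff.
I = int(MAX_HEIGHT)          # |i| * 1       <= MAX_HEIGHT
J = int(MAX_HEIGHT / LOG3)   # |j| * log2(3) <= MAX_HEIGHT
K = int(MAX_HEIGHT / LOG5)   # |k| * log2(5) <= MAX_HEIGHT

candidates = []
for i, j, k in product(range(-I, I + 1), range(-J, J + 1), range(-K, K + 1)):
    h, c = tenney_height(i, j, k), cents(i, j, k)
    if 0 < c and h <= MAX_HEIGHT:        # keep ratios > 1 only
        candidates.append((h, c, [i, j, k]))

best = float("inf")
for h, c, monzo in sorted(candidates):   # increasing Tenney height
    if c < best:                         # closer to 1 than everything simpler
        best = c
        print(f"monzo {monzo}  {c:.6f} cents  Tenney height {h:.2f}")
```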
In studying this subject I discovered that tiny 5-limit intervals were studied by Gene Ward Smith, a mathematician I used to see around on sci.math and the like. I never knew he worked on microtonal music! I am sad to hear that he died from COVID-19 in January 2021.
I may just be redoing a tiny part of his work: if anyone can find details, please let me know. In his memory, I’ll conclude with this article from the Xenharmonic Wiki:
Gene Ward Smith (1947–2021) was an American mathematician, music theorist, and composer.
In mathematics, he worked in the areas of Galois theory and Moonshine theory.
In music theory, he introduced wedge products as a way of classifying regular temperaments. In this system, a temperament is specified by means of a wedgie, which may technically be identified as a point on a Grassmannian. He had long drawn attention to the relationship between equal divisions of the octave and the Riemann zeta function.[1][2][3] He early on identified and emphasized free abelian groups of finite rank and their homomorphisms, and it was from that perspective that he contributed to the creation of the regular mapping paradigm.
In the 1970s, Gene experimented with musical compositions using a device with four square-wave voices, whose tuning was very stable and accurate, being controlled by a crystal oscillator. The device in turn was controlled by HP 9800 series desktop computers, initially the HP 9830A, programmed in HP Basic, later the 9845A. Using this, he explored both just intonation with a particular emphasis on groups of transformations, and pajara.
Gene had a basic understanding of the regular mapping paradigm during this period, but it was limited in practice since he was focused on the idea that the next step from meantone should keep some familiar features, and so was interested in tempering out 64/63 in place of 81/80. He knew 7-limit 12 and 22 had tempering out 64/63 and 50/49 in common, and 12 and 27 had tempering out 64/63 and 126/125 in common, and thought these would be logical places to progress to, blending novelty with familiarity. While he never got around to working with augene, he did consider it. For pajara, he found tempering certain JI scales, the 10 and 12 note highschool scales, led to interesting (omnitetrachordal) results, and that there were also closely related symmetric (MOS) scales of size 10 and 12 for pajara; he did some work with these, particularly favoring the pentachordal decatonic scale.
Gene was among the first to consider extending the Tonnetz of Hugo Riemann beyond the 5-limit and hence into higher dimensional lattices. In three dimensions, the hexagonal lattice of 5-limit harmony extends to a lattice of type A3 ~ D3. He is also the first to write music in a number of exotic intonation systems.
There seems to be a huge push lately in the tech world for the idea of placing data centers in space. This is not just coming from Musk via the merging of SpaceX and xAI. Google has some effort along these lines. NVIDIA is thinking about it. TED talks are being given by startup people in San Francisco on this topic, so you know we've reached some well-defined hype level. Somehow the idea has enough traction that even the PRC is leaning in this direction. The arguments seem to be that (1) there is abundant solar power in space; (2) environmental impact on the earth will be less, with no competition for local electricity, water, real estate; (3) space is "cold", so cooling these things should be do-able; (4) it's cool and sounds very sci-fi/high frontier.
At present (or near-future) levels of technology, as far as I can tell this idea makes no sense. I will talk about physics reasons here, though there are also pragmatic economic reasons why this seems crazy. I've written before that I think some of the AI/data center evangelists are falling victim to magical thinking, because they come from the software world and don't in their heart of hearts appreciate that there are actual hardware constraints on things like chip manufacturing and energy production.
Others have written about this - see here for example. The biggest physics challenges with this idea (beyond lofting millions of kg of cargo into orbit):
While the cosmic microwave background is cold, cooling things in space is difficult, because vacuum is an excellent thermal insulator. On the ground, you can use conduction and convection to get rid of waste heat. In space, your only option (beyond throwing mass overboard, which is not readily replenishable) is radiative cooling. The key physics here is the Stefan-Boltzmann law, which is a triumph of statistical physics (and one of my favorite derivations to discuss in class - you combine the Planck result for the energy density of a "gas" of photons in thermal equilibrium at some temperature \(T\) with a basic kinetic theory of gases result for the flux of particles out of a small hole). It tells you that the best you can ever do is an ideal black body, for which the total power radiated away is proportional to the area of the radiator and \(T^{4}\), with fundamental constants making up the proportionality constant with zero adjustable parameters.
Remember, data centers right now consume enormous amounts of power (and cooling water). While you can use heat pumps to try to get the radiators up to well above the operating temperatures of the electronics, that increases mass and waste power, and realistically there is an upper limit on the radiator temperature below 1000 K. An ideal black body radiator at 1000 K puts out about 57 kW per square meter, and you probably need to get rid of tens of megawatts, necessitating hundreds to thousands of square meters of radiator area. There are clever ideas on how to try to do this. For example, in the liquid droplet radiator, you could spray a bunch of hot droplets out into space, capitalizing on their large specific surface area. Of course, you'd need to recapture the cooled droplets, and the hot liquid needs to have sufficiently low vapor pressure that you don't lose a lot of material. Still, as far as I am aware, to date no one has actually deployed a large-scale (ten kW let alone MW level) droplet radiator in space. (A short back-of-the-envelope estimate of the radiator area is sketched just after this list.)
High end computational hardware is vulnerable to radiation damage. There are no rad-hard GPUs. Low earth orbit is a pretty serious radiation environment, with some flux of high energy cosmic rays quite a bit higher than on the ground. While there are tests going on, and astronauts are going to bring smartphones on the next Artemis mission, it's rough. Putting many thousands to millions of GPUs and huge quantities of memory in a harsh environment where they cannot be readily accessed or serviced seems unwise. (There are also serious questions of vulnerability to attack. Setting off a small nuclear warhead in LEO injects energetic electrons into the lower radiation belts and would be a huge mess.)
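To put rough numbers on the cooling problem above, here is a minimal back-of-the-envelope sketch. The 50 MW of waste heat is just an assumed example, and the estimate ignores absorbed sunlight and assumes an ideal emissivity of 1.

```python
# Radiator sizing from the Stefan-Boltzmann law: P = sigma * A * T^4 for an
# ideal black body (emissivity 1, absorbed sunlight ignored).
SIGMA = 5.670e-8   # Stefan-Boltzmann constant, W m^-2 K^-4

def radiated_flux(T):
    """Power radiated per square meter by an ideal black body at temperature T (kelvin)."""
    return SIGMA * T**4

def radiator_area(waste_power, T):
    """Radiator area (m^2) needed to dump waste_power (watts) at temperature T (kelvin)."""
    return waste_power / radiated_flux(T)

print(radiated_flux(1000) / 1e3)   # ~56.7 kW per square meter at 1000 K
print(radiator_area(50e6, 1000))   # ~880 m^2 to dump 50 MW at 1000 K
print(radiator_area(50e6, 300))    # ~110,000 m^2 if the radiator runs near room temperature
```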
I think we will be faaaaaaar better off in the long run if we take a fraction of the money that people want to invest in space-based data centers, and instead plow those resources into developing energy-efficient computing. Musk has popularized the engineering sentiment "The best part is no part". The best way to solve the problem of supplying and radiating away many GW of power for data centers is to make data centers that don't consume many GW of power.
Friend-of-the-blog Salil Vadhan has asked me to share the following.
The Trevisan Award for Expository Work is a new SIGACT award created in memory of Luca Trevisan (1971-2024), with a nomination deadline of April 10, 2026.
The award is intended to promote and recognize high-impact work expositing ideas and results from the Theory of Computation. The exposition can have various target audiences, e.g. people in this field, people in adjacent or remote academic fields, as well as the general public. The form of exposition can vary, and can include books, surveys, lectures, course materials, video, audio (e.g. podcasts), blogs and other media products. The award may be given to a single piece of work or a series produced over time. The award may be given to an individual, or a small group who together produced this expository work.
The awardee will receive USD$2000 (to be divided among the awardees if multiple), as well as travel support if needed to attend STOC, where the award will be presented. STOC’2026 is June 22-26 in Salt Lake City, Utah.
The endowment for this prize was initiated by a gift from Avi Wigderson, drawing on his Turing Award, and has been subsequently augmented by other individuals.
Barring any last-minute moves, and despite being linked to just about every big-name pitcher on the free-agent market, the Orioles will go into 2026 as they left 2025, without a clear #1 starter. Depth, though, they have — the list of plausible back-of-the-rotation starting pitchers on the team is a long one. What do you do when you have too many 5s and not enough 1s?
Here’s an idea. One imagines Kyle Bradish, Trevor Rogers, Shane Baz, and Dean Kremer are locked in for rotation spots. Zach Eflin, too, but he may still be recovering from his latest injury when the season starts. So who’s our #5 starter in April 2026? I think it ought to be Cade Povich and Tyler Wells. Povich gets the first four innings, Wells the next four, then it’s Ryan Helsley time. Doesn’t this make a lot of sense? Both these guys are a lot worse their second time through the order. (Povich career splits, Wells career splits.) So if each one’s facing 15 batters you’re getting more of the best of them and you’d expect their overall performance to be better than either has been as a standard starter. Maybe 5 + 5 = 3! Plus: the lefty-righty switch messes up any opposition attempt to maintain a platoon advantage. And while Wells was great in four starts at the end of last year, he’s historically been better in relief, which this role, technically, would be.
Yes, you’re using six roster spots for starters, but you probably need fewer relief innings in the games Wellvich starts. Maybe this way Povich and Wells pitch 80 innings each instead of Povich pitching 110 as the 5th starter and Wells pitching 60 in long relief; not that much difference in the number of innings needing coverage by the bullpen minus Wells.
Quanta Magazine recently published a reflection by Natalie Wolchover on the state of fundamental particle physics. The discussion covers a lot of ground, but one particular paragraph has gotten the lion’s share of the attention. Wolchover talked to Jared Kaplan, the ex-theoretical physicist turned co-founder of Anthropic, one of the foremost AI companies today.
Kaplan was one of Nima Arkani-Hamed’s PhD students, which adds an extra little punch.
There’s a lot to contest here. Is AI technology anywhere close to generating papers as good as those of the top physicists, or is that relegated to the sci-fi future? Does Kaplan really believe this, or is he just hyping up his company?
I don’t have any special insight into those questions, about the technology and Kaplan’s motivations. But I think that, even if we trusted him on the claim that AI could be generating Witten- or Nima-level papers in three years, that doesn’t mean it will replace theoretical physicists. That part of the argument isn’t a claim about the technology, but about society.
So let’s take the technological claims as given, and make them a bit more specific. Since we don’t have any objective way of judging the quality of scientific papers, let’s stick to the subjective. Today, there are a lot of people who get excited when Witten posts a new paper. They enjoy reading them, they find the insights inspiring, they love the clarity of the writing and their tendency to clear up murky ideas. They also find them reliable: the papers very rarely have mistakes, and don’t leave important questions unanswered.
Let’s use that as our baseline, then. Suppose that Anthropic had an AI workflow that could reliably write papers that were just as appealing to physicists as Witten’s papers are, for the same reasons. What happens to physicists?
Witten himself is retired, which for an academic means you do pretty much the same thing you were doing before, but now paid out of things like retirement savings and pension funds, not an institute budget. Nobody is going to fire Witten, there’s no salary to fire him from. And unless he finds these developments intensely depressing and demoralizing (possible, but very much depends on how this is presented), he’s not going to stop writing papers. Witten isn’t getting replaced.
More generally, though, I don’t think this directly results in anyone getting fired, or in universities trimming positions. The people making funding decisions aren’t just sitting on a pot of money, trying to maximize research output. They’ve got money to be spent on hires, and different pools of money to be spent on equipment, and the hires get distributed based on what current researchers at the institutes think is promising. Universities want to hire people who can get grants, to help fund the university, and absent rules about AI personhood, the AIs won’t be applying for grants.
Funding cuts might be argued for based on AI, but that will happen long before AI is performing at the Witten level. We already see this happening in other industries or government agencies, where groups that already want to cut funding are getting think tanks and consultants to write estimates that justify cutting positions, without actually caring whether those estimates are performed carefully enough to justify their conclusions. That can happen now, and doesn’t depend on technological progress.
AI could also replace theoretical physicists in another sense: the physicists themselves might use AI to do most of their work. That’s more plausible, but here adoption still heavily depends on social factors. Will people feel like they are being assessed on whether they can produce these Witten-level papers, and that only those who make them get hired, or funded? Maybe. But it will propagate unevenly, from subfield to subfield. Some areas will make their own rules forbidding AI content, there will be battles and scandals and embarrassments aplenty. It won’t be a single switch, the technology alone setting the timeline.
Finally, AI could replace theoretical physicists in another way: people outside of academia could fill the field so thoroughly that theoretical physicists have nothing left that they want to do. Some non-physicists are very passionate about physics, and some of those people have a lot of money. I’ve done writing work for one such person, whose foundation is now attempting to build an AI Physicist. If these AI Physicists get to Witten-level quality, they might start writing compelling paper after compelling paper. Those papers, though, will be specialized, due to their origins. Much as philanthropists mostly fund the subfields they’ve heard of, philanthropist-funded AI will mostly target topics the people running the AI have heard are important. Much like physicists themselves adopting the technology, there will be uneven progress from subfield to subfield, inch by socially-determined inch.
In a hard-to-quantify area like progress in science, that’s all you can hope for. I suspect Kaplan got a bit of a distorted picture of how progress and merit work in theoretical physics. He studied with Nima Arkani-Hamed, who is undeniably exceptionally brilliant but also undeniably exceptionally charismatic. It must feel to a student of Nima’s that academia simply hires the best people, that it does whatever it takes to accomplish the obviously best research. But the best research is not obvious.
Mr. Epstein was not only a world-class child abuser, he was a big fan of theoretical high-energy physics and of theoretical physicists. Some of my colleagues, unfortunately, got to know him. A number who were famous and/or had John Brockman as a book agent were even invited to a physics conference on Epstein’s private island, well before he was first arrested. This was no secret; as I recall, a lot of us heard about the existence of this conference/trip, but we hadn’t heard Epstein’s name before and didn’t pay much attention (ho hum, just another weird billionaire).
Personally, I feel quite lucky. The Brockman agency rejected the proposal for my recent book without comment (thank you!); and my research is mostly considered unimportant by the Brian Greenes of the world. As a result, I was not invited to Epstein’s island, never made his acquaintance, and blissfully avoided the entire affair. Clearly there are some benefits to being considered ordinary. And so — I’m sorry/not-sorry to say — I can’t tell you much about Epstein at all, or about how certain physicists did and did not interact with him. Regarding my colleagues who did get to know him, I can’t speak for them, since I wasn’t there, and I don’t know to what extent Epstein hid his immoral activities when they were around. It’s up to them to tell their own stories if they feel the need to do so (and I hope a couple of them do, just to clear the air.) Personally I tend to give them the benefit of the doubt — probably some literally didn’t know what was up until Epstein’s arrest in 2008, while perhaps others felt there wasn’t much they could do about Epstein’s actions on his own private island. I imagine they are deeply embarrassed to have been caught in this horrible man’s ugly web.
Fans of physics come in all shapes and sizes, and some have large wallets, large egos, and/or large ambitions. Among the wealthy supporters, we can count Alfred Nobel himself; billionaires sit on important scientific institute and university boards, and the more recent Breakthrough Prizes were funded by deep pockets. The extreme wealthy have outsized influence in our country and in our world, and one could argue that their influence in 2025 was not for the better. Usually, though, the influence in physics and related fields tends to be relatively benign, funding postdoctoral researchers and graduate students who deeply want to do science but also need to eat. That said, sometimes donors fund non-essential fields at the expense of critical ones, or favor theoretical research over the gathering of crucial experimental data, or push money on famous rich organizations when there are poor ones that are equally deserving and far more needy.
When gazillionaires, on their own initiative, come calling on non-profit organizations, whether they be community centers, arts organizations, or universities, they pose a problem. On the one hand, it is the job of anyone in a non-profit organization to help raise money — fail to do that, and your organization will close. When a single person offers to permanently change the future of your program, you would be derelict in your duty if you did not consider that offer. On the other hand, donors who might have ethical or criminal problems could drag the organization’s name through the mud. Worse, they might be able to force the organization itself to do something ethically questionable or even illegal.
There is a clear lesson for young academics and other up-and-coming non-profit actors in the Epstein affair: the more money potentially offered to our organizations, the more carefully we must tread. Money is power; power corrupts; and every pursuit of dollars, even for the best causes, risks infection. We can’t be large-scale non-profit fundraisers without doing serious and thorough background checks of the biggest donors; we have to question motives, and we can’t look the other way when something seems amiss. Those of us with clear hearts and honest pursuits tend to assume the best in other people. But we have to beware of those hoping to bolster their reputations, or clean their consciences, by giving away “generously” what they never deserved to have.
My top 10 ghosts (solo acts and ensembles). If Bruce Willis being a ghost in The Sixth Sense is a spoiler, that’s on you — the movie has been out for 26 years.
Einstein and I have both been spooked by entanglement. Einstein’s experience was more profound: in a 1947 letter to Born, he famously dubbed it spukhafte Fernwirkung (or spooky action at a distance). Mine, more pedestrian. It came when I first learned the cost of entangling logical qubits on today’s hardware.
Logical entanglement is not easy
I recently listened to a talk where the speaker declared that “logical entanglement is easy,” and I have to disagree. You could argue that it looks easy when compared to logical small-angle gates, in much the same way I would look small standing next to Shaquille O’Neal. But that doesn’t mean 6’5” and 240 pounds is small.
To see why it’s not easy, it helps to look at how logical entangling gates are actually implemented. A logical qubit is not a single physical object. It’s an error-resistant qubit built out of several noisy, error-prone physical qubits. A quantum error-correcting (QEC) code with parameters [[n, k, d]] uses n physical qubits to encode k logical qubits in a way that can detect up to d − 1 physical errors and correct up to ⌊(d − 1)/2⌋ of them.
This redundancy is what makes fault-tolerant quantum computing possible. It’s also what makes logical operations expensive.
On platforms like neutral-atom arrays and trapped ions, the standard approach is a transversal CNOT: you apply two-qubit gates pairwise across the code blocks (qubit i in block A interacts with qubit i in block B). That requires n physical two-qubit gates to entangle the k logical qubits of one code block with the k logical qubits of another.
To make this less abstract, here’s a QuEra animation showing a transversal CNOT implemented in a neutral-atom array. The animation shows real experimental data, not a schematic idealization.
The idea is simple. The problem is that \(n\) can be large, and physical two-qubit gates are among the noisiest operations available on today’s hardware.
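To make the counting concrete, here’s a toy sketch (my own illustration, not any platform’s API) of what “transversal” means: the logical entangling gate between two \(n\)-qubit code blocks is just \(n\) physical CNOTs applied between matching positions, so its cost, and its noise, scale with \(n\).

```python
# Illustrative only: a transversal CNOT between two code blocks is n pairwise
# physical CNOTs, one per matching position.
def transversal_cnot(block_a, block_b):
    assert len(block_a) == len(block_b)
    return [("CNOT", a, b) for a, b in zip(block_a, block_b)]

gates = transversal_cnot(list(range(0, 7)), list(range(7, 14)))  # two hypothetical 7-qubit blocks
print(len(gates))  # 7 physical two-qubit gates for n = 7
```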
Superconducting platforms take a different route. They tend to rely on lattice surgery: you entangle logical qubits by repeatedly measuring joint stabilizers along a boundary. That trades two-qubit gates for stabilizer measurements over multiple rounds (typically scaling with the code distance). Unfortunately, physical measurements are the other noisiest primitive we have.
Then there are the modern high-rate qLDPC codes, which pack many logical qubits into a single code block. These are excellent quantum memories. But when it comes to computation, they face challenges. Logical entangling gates can require significant circuit depth, and often entire auxiliary code blocks are needed to mediate the interaction.
This isn’t a purely theoretical complaint. In recent state-of-the-art experiments by Google and by the Harvard–QuEra–MIT collaboration, logical entangling gates consumed nearly half of the total error budget.
So no, logical entanglement is not easy. But, how easy can we make it?
Phantom codes: Logical entanglement without physical operations
To answer how easy logical entanglement can really be, it helps to start with a slightly counterintuitive observation: logical entanglement can sometimes be generated purely by permuting physical qubits.
Let me show you how this works in the simplest possible setting, and then I’ll explain what’s really going on.
Consider the \([[4,2,2]]\) stabilizer code, which uses 4 physical qubits to encode 2 logical ones; it can detect 1 error but can’t correct any. Below are its logical operators; the arrow indicates what happens when we physically swap qubits 1 and 3 (bars denote logical operators).
You can check that the logical operators transform exactly as shown, which is the action of a logical CNOT gate. For readers less familiar with stabilizer codes, click the arrow below for an explanation of what’s going on. Those familiar can carry on.
Click!
At the logical level, we identify gates by how they transform logical Pauli operators. This is the same idea used in ordinary quantum circuits: a gate is defined not just by what it does to states, but by how it reshuffles observables.
A CNOT gate has a very characteristic action. If qubit 1 is the control and qubit 2 is the target, then: an \(X\) on the control spreads to the target, a \(Z\) on the target spreads back to the control, and the other Pauli operators remain unchanged.
That’s exactly what we see above.
To see why this generates entanglement, it helps to switch from operators to states. A canonical example of how to generate entanglement in quantum circuits is the following. First, you put one qubit into a superposition using a Hadamard. Starting from \(|00\rangle\), this gives \(\tfrac{1}{\sqrt{2}}(|00\rangle + |10\rangle)\).
At this point there is still no entanglement — just superposition.
The entanglement appears when you apply a CNOT. The CNOT correlates the two branches of the superposition, producing \(\tfrac{1}{\sqrt{2}}(|00\rangle + |11\rangle)\), which is a maximally-entangled Bell state. The Hadamard creates superposition; the CNOT turns that superposition into correlation.
The operator transformations above are simply the algebraic version of this story. Seeing \(\bar{X}_1 \to \bar{X}_1\bar{X}_2\) tells us that information on one logical qubit is now inseparable from the other.
In other words, in this code, a single physical swap of qubits 1 and 3 acts as a logical CNOT.
The figure below shows how this logical circuit maps onto a physical circuit. Each horizontal line represents a qubit. On the left is a logical CNOT gate: the filled dot marks the control qubit, and the ⊕ symbol marks the target qubit whose state is flipped if the control is in the state \(|1\rangle\). On the right is the corresponding physical implementation, where the logical gate is realized by acting on multiple physical qubits.
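If you want to check this yourself, here’s a small self-contained sketch (my own code, not from the paper), using one standard choice of logical operators for the \([[4,2,2]]\) code (\(\bar{X}_1 = X_1X_2\), \(\bar{Z}_1 = Z_1Z_3\), \(\bar{X}_2 = X_1X_3\), \(\bar{Z}_2 = Z_1Z_2\), with stabilizers \(XXXX\) and \(ZZZZ\); the post’s own convention may differ). Swapping physical qubits 1 and 3 just permutes the letters of each Pauli string, and the script re-expresses each permuted logical operator in terms of the original ones (up to stabilizers), recovering the CNOT action.

```python
from itertools import product

PAULI_PRODUCT = {
    ('I', 'I'): 'I', ('I', 'X'): 'X', ('I', 'Y'): 'Y', ('I', 'Z'): 'Z',
    ('X', 'I'): 'X', ('X', 'X'): 'I', ('X', 'Y'): 'Z', ('X', 'Z'): 'Y',
    ('Y', 'I'): 'Y', ('Y', 'X'): 'Z', ('Y', 'Y'): 'I', ('Y', 'Z'): 'X',
    ('Z', 'I'): 'Z', ('Z', 'X'): 'Y', ('Z', 'Y'): 'X', ('Z', 'Z'): 'I',
}

def multiply(p, q):
    """Multiply two Pauli strings, ignoring the overall phase."""
    return ''.join(PAULI_PRODUCT[(a, b)] for a, b in zip(p, q))

def swap_1_3(p):
    """Physically swap qubits 1 and 3 (string positions 0 and 2)."""
    return p[2] + p[1] + p[0] + p[3]

stabilizers = ['XXXX', 'ZZZZ']
logicals = {'X1': 'XXII', 'Z1': 'ZIZI', 'X2': 'XIXI', 'Z2': 'ZZII'}  # assumed convention

def equal_up_to_stabilizers(p, q):
    """True if p and q differ only by a product of stabilizers."""
    return any(
        p == multiply(multiply(q, stabilizers[0] if s0 else 'IIII'),
                      stabilizers[1] if s1 else 'IIII')
        for s0, s1 in product([False, True], repeat=2)
    )

for name, op in logicals.items():
    image = swap_1_3(op)
    # express the permuted logical operator as a product of the original ones
    for picks in product([False, True], repeat=len(logicals)):
        candidate, label = 'IIII', []
        for pick, (other, other_op) in zip(picks, logicals.items()):
            if pick:
                candidate = multiply(candidate, other_op)
                label.append(other)
        if equal_up_to_stabilizers(image, candidate):
            print(f"{name} -> {' '.join(label) or 'I'}")
            break
# Output:  X1 -> X1 X2,  Z1 -> Z1,  X2 -> X2,  Z2 -> Z1 Z2
# (a logical CNOT with logical qubit 1 as control and 2 as target)
```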
At this point, all we’ve done is trade one physical operation for another. The real magic comes next. Physical permutations do not actually need to be implemented in hardware. Because they commute cleanly through arbitrary circuits, they can be pulled to the very end of a computation and absorbed into a relabelling of the final measurement outcomes. No operator spread. No increase in circuit depth.
This is not true for generic physical gates. It is a unique property of permutations.
To see how this works, consider a slightly larger example code, whose logical operators are a bit more complicated:
Below is a three-logical-qubit circuit implemented using this code, like the circuit drawn above but with an extra step. Suppose the circuit contains three logical CNOTs, each implemented via a physical permutation.
Instead of executing any of these permutations, we simply keep track of them classically and relabel the outputs at the end. From the hardware’s point of view, nothing happened.
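To make that bookkeeping concrete, here’s a minimal sketch (my own illustration with hypothetical helper names, not code from the paper): compose the pending permutations in software, then report the measurement outcomes as if they had been applied.

```python
# Classical tracking of physical-qubit permutations. Permutations are lists:
# perm[i] is the wire that qubit i would have been moved to.
def compose(second, first):
    """Net permutation from applying `first`, then `second`: i -> second[first[i]]."""
    return [second[first[i]] for i in range(len(first))]

def relabel(measured, perm):
    """Report outcomes as if `perm` had been applied physically before measurement."""
    out = [None] * len(measured)
    for i, bit in enumerate(measured):
        out[perm[i]] = bit
    return out

identity = list(range(4))
swap_1_3 = [2, 1, 0, 3]                 # the permutation from the [[4,2,2]] example
tracked = compose(swap_1_3, identity)   # accumulate permutations in software
print(relabel([1, 0, 0, 1], tracked))   # -> [0, 0, 1, 1]; no physical gate ever fired
```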
If you prefer a more physical picture, imagine this implemented with atoms in an array. The atoms never move. No gates fire. The entanglement is there anyway.
This is the key point. Because no physical gates are applied, the logical entangling operation has zero overhead. And for the same reason, it has perfect fidelity. We’ve reached the minimum possible cost of a logical entangling gate. You can’t beat free.
To be clear, not all codes are amenable to logical entanglement through relabeling. This is a very special feature that exists in some codes.
Motivated by this observation, my collaborators and I defined a new class of QEC codes. I’ll state the definition first, and then unpack what it really means.
Phantom codes are stabilizer codes in which logical entangling gates between every ordered pair of logical qubits can be implemented solely via physical qubit permutations.
The phrase “every ordered pair” is a strong requirement. For three logical qubits, it means the code must support the logical CNOTs 1→2, 2→1, 1→3, 3→1, 2→3, and 3→2. More generally, a code with \(k\) logical qubits must support all \(k(k-1)\) possible directed CNOTs. This isn’t pedantry. Without access to every directed pair, you can’t freely build arbitrary entangling circuits — you’re stuck with a restricted gate set.
The phrase “solely via physical qubit permutations” is just as demanding. If all but one of those CNOTs could be implemented via permutations, but the last one required even a single physical gate — say, a one-qubit Clifford — the code would not be phantom. That condition is what buys you zero overhead and perfect fidelity. Permutations can be compiled away entirely; any additional physical operation cannot.
Together, these two requirements carve out a very special class of codes. All in-block logical entangling gates are free. Logical entangling gates between phantom code blocks are still available — they’re simply implemented transversally.
After settling on this definition, we went back through the literature to see whether any existing codes already satisfied it. We found two: the Carbon code and the hypercube codes. The former enabled repeated rounds of quantum error-correction in trapped-ion experiments, while the latter underpinned recent neutral-atom experiments achieving logical-over-physical performance gains in quantum circuit sampling.
Both are genuine phantom codes. Both are also limited. With their small distance, they can detect errors but not correct them. With only a few logical qubits, there’s a limited class of CNOT circuits you can implement. Which raises the questions: Do other phantom codes exist? Can these codes have advantages that persist for scalable applications under realistic noise conditions? What structural constraints do they obey (parameters, other gates, etc.)?
Before getting to that, a brief note for the even more expert reader on four things phantom codes are not. Phantom codes are not a form of logical Pauli-frame tracking: the phantom property survives in the presence of non-Clifford gates. They are not strictly confined to a single code block: because they are CSS codes, multiple blocks can be stitched together using physical CNOTs in linear depth. They are not automorphism gates, which rely on single-qubit Cliffords and therefore do not achieve zero overhead or perfect fidelity. And they are not codes like SHYPS, Gross, or Tesseract codes, which allow only products of CNOTs via permutations rather than individually addressable ones. All of those codes are interesting. They’re just not phantom codes.
In a recent preprint, we set out to answer the three questions above. This post isn’t about walking through all of those results in detail, so here’s the short version. First, we find many more phantom codes — hundreds of thousands of additional examples, along with infinite families whose code parameters scale. We study their structural properties and identify which other logical gates they support beyond their characteristic phantom ones.
Second, we show that phantom codes can be practically useful for the right kinds of tasks — essentially, those that are heavy on entangling gates. In end-to-end noisy simulations, we find that phantom codes can outperform the surface code, achieving one–to–two orders of magnitude reductions in logical infidelity for resource state preparation (GHZ-state preparation) and many-body simulation, at comparable qubit overhead and with a modest preselection acceptance rate of about 24%.
If you’re interested in the details, you can read more in our preprint.
Larger space of codes to explore
This is probably a good moment to zoom out and ask the referee question: why does this matter?
I was recently updating my CV and realized I’ve now written my 40th referee report for APS. After a while, refereeing trains a reflex. No matter how clever the construction or how clean the proof, you keep coming back to the same question: what does this actually change?
So why do phantom codes matter? At least to me, there are two reasons: one about how we think about QEC code design, and one about what these codes can already do in practice.
The first reason is the one I’m most excited about. It has less to do with any particular code and more to do with how the field implicitly organizes the space of QEC codes. Most of that space is structured around familiar structural properties: encoding rate, distance, stabilizer weight, LDPC-ness. These form the axes that make a code a good memory. And they matter, a lot.
But computation lives on a different axis. Logical gates cost something, and that cost is sometimes treated as downstream—something to be optimized after a code is chosen, rather than something to design for directly. As a result, the cost of logical operations is usually inherited, not engineered.
One way to make this tension explicit is to think of code design as a multi-dimensional space with at least two axes. One axis is memory cost: how efficiently a code stores information. High rate, high distance, low-weight stabilizers, efficient decoding — all the usual virtues. The other axis is computational cost: how expensive it is to actually do things with the encoded qubits. Low computational cost means many logical gates can be implemented with little overhead. Low computational cost makes computation easy.
Why focus on extreme points in this space? Because extremes are informative. They tell you what is possible, what is impossible, and which tradeoffs are structural rather than accidental.
Phantom codes sit precisely at one such extreme: they minimize the cost of in-block logical entanglement. That zero-logical-cost extreme comes with tradeoffs. The phantom codes we find tend to have high stabilizer weights, and for the scalable families, the number of physical qubits grows exponentially. These are real costs, and they matter.
Still, the important lesson is that even at this extreme point, codes can outperform LDPC-based architectures on well-chosen tasks. That observation motivates an approach to QEC code design in which the logical gates of interest are placed at the centre of the design process, rather than treated as an afterthought. This is my first takeaway from this work.
Second is that phantom codes are naturally well suited to circuits that are heavy on logical entangling gates. Some interesting applications fall into this category, including fermionic simulation and correlated-phase preparation. Combined with recent algorithmic advances that reduce the overhead of digital fermionic simulation, these code-level ideas could potentially improve near-term experimental feasibility.
Back to being spooked
The space of QEC codes is massive. Perhaps two axes are not enough. Stabilizer weight might deserve its own. Perhaps different applications demand different projections of this space. I don’t yet know the best way to organize it.
The size of this space is a little spooky — and that’s part of what makes it exciting to explore, and to see what these corners of code space can teach us about fault-tolerant quantum computation.
Look at this cutie! This is from a 1940 paper by George Birkhoff. It concerns the question of what pictures can be drawn on a sheet of notebook paper if you can only put straight lines on the page. Alternatively — the nonnegative span of the delta functions of lines forms some kind of subspace of the nonnegative functions on the plane, and you can ask which functions are in there. This, which Birkhoff credits to “Mr. David Middleton, a student at Harvard University,” is one of them. The David Middleton in question must surely be this guy, who did a whole oral history interview and never mentioned that he drew faces with straight lines for Birkhoff.
I think it would be hard to publish a paper like this in Jour. Math. Pur. Appl. now. But this is fun! Maybe we’re taking math too seriously and should draw more faces.
Surely people have thought about this but I couldn’t find anything on a short search.
The question: let n be a random integer, say in [N,2N]. It factors as the product of p_i^a_i. So you can think of the proportion of n that is “made of” the prime p_i to be (a_i log p_i / log n). OK, now this gives you a probability distribution. What’s its entropy?
You can make this a little simpler by restricting to squarefree integers. (Not sure whether this should change the answer.) The sizes of the prime factors of a random squarefree integer are supposed to match the cycle lengths of a random large permutation (say on N letters), so now we can ask that question: break a random permutation up into cycles, which gives another probability distribution; namely, that which assigns each cycle the probability cycle length / N. (This is called the Poisson-Dirichlet process with parameters (0,1). Here’s a Terry T. post about it, and why it governs prime factorizations of random integers.) What’s the entropy of this one?
Well, we can at least guess the average. Let X_i be the number of i-cycles in a random permutation on N letters. Then the X_i are roughly independent Poisson variables of mean 1/i. So there are X_i i-cycles, each appearing with probability i/N, and thus each contributing (-i/N) log (i/N) or (i/N)(log N – log i) to the entropy of the distribution. Now all we have to do is sum over i! You are summing
(i/N) X_i log N – (i/N) X_i log i
over all i. The first sum is easy; the sum of iX_i has to be N, so this is just log N.
What about the subtrahend? This is the sum of (i/N) X_i log i over all i. Since X_i has expected value 1/i, the expected value of this part should be (1/N) times the sum of log i.
I didn’t tell you how long the sum was! But i certainly can’t be greater than N so let’s stop the sum there. Then the sum of log i, by Stirling’s formula, is very close to N log N – N. So the second part of the sum is about log N – 1. Almost exact cancellation! Our estimate for the mean entropy is
log N – (log N – 1) = 1.
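Here’s a quick numerical check of that heuristic (my own sketch, not the computations mentioned below): sample random permutations, compute the entropy of the cycle-length distribution, and average.

```python
# Estimate the mean entropy of the cycle-length distribution of a random
# permutation on N letters (each cycle gets probability length/N).
import math, random

def cycle_lengths(perm):
    seen, lengths = [False] * len(perm), []
    for start in range(len(perm)):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            lengths.append(length)
    return lengths

def entropy(N):
    perm = list(range(N))
    random.shuffle(perm)
    return -sum((L / N) * math.log(L / N) for L in cycle_lengths(perm))

N, trials = 10**5, 200
print(sum(entropy(N) for _ in range(trials)) / trials)  # the heuristic above predicts ~1
```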
Is that true? Is it also true for integers? (I did actually run some computations on this and the mean looked less than one, but numbers with only five or six digits are just lousy with primes and near-primes, which could well mess this up.) Does the entropy actually converge to a distribution or does it just have a mean? What about the exponential of the entropy, the so-called perplexity? Does it have a mean? I ask because I tend to think of the perplexity of a probability distribution as roughly “the actual number of possible outcomes if you count them with information in mind.” So you could think of the distribution of perplexity, if there is one, as the information theorist’s version of the Erdos-Kac theorem. “You thought the number of prime factors grew like log log n, Pal and Mark, but come on, some of those primes are so small they barely matter — actually the expected number of prime factors is constant!”
I’m sure this is all well-understood by someone, but as I’ve mentioned before I think a blog is a good place to model the casual way mathematicians think about things in real life, but not always in public.
After seeing this latest extremely good video from Veritasium, and looking back through my posts, I realized that while I've referenced it indirectly, I've never explicitly talked about the Aharonov-Bohm effect. The video is excellent, and that wikipedia page is pretty good, but maybe some people will find another angle on this to be helpful.
The ultrabrief version: The quantum interference of charged particles like electrons can be controllably altered by tuning a magnetic field in a region that the particles never pass through. This is weird and spooky because it's an entirely quantum mechanical effect - classical physics, where motion is governed by local forces, says that zero field = unaffected trajectories.
In quantum mechanics, we describe the spatial distribution of particles like electrons with a wavefunction, a complex-valued quantity that one can write as an amplitude and a phase \(\varphi\), where both depend on position \(\mathbf{r}\). The phase is important because waves can interfere. Crudely speaking, when the crests of one wave (say \(\varphi = 0\)) line up with the troughs of another wave (\(\varphi = \pi\)) at some location, the waves interfere destructively, so the total wave at that location is zero if the amplitudes of each contribution are identical. As quantum particles propagate through space, their phase "winds" with distance \(\mathbf{r}\) like \(\mathbf{k}\cdot \mathbf{r}\), where \(\hbar \mathbf{k} = \mathbf{p}\) is the momentum. Higher momentum = faster winding of phase = shorter wavelength. This propagation, phase winding, and interference is the physics behind the famous two-slit experiment. (In his great little popular book - read it if you haven't yet - Feynman described phase as a clockface attached to each particle.) One important note: The actual phase itself is arbitrary; it's phase differences that matter in interference experiments. If you added an arbitrary amount \(\varphi_{0}\) to every phase, no physically measurable observables would change.
Things get trickier if the particles that move around are charged. It was realized 150+ years ago that formal conservation of momentum gets tricky if we consider electric and magnetic fields. The canonical momentum that shows up in the Lagrange and Hamilton equations is \(\mathbf{p}_{c} = \mathbf{p}_{kin} + q \mathbf{A}\), where \(\mathbf{p}_{kin}\) is the kinetic momentum (the part that actually has to do with the classical velocity and which shows up in the kinetic energy), \(q\) is the charge of the particle, and \(\mathbf{A}(\mathbf{r})\) is the vector potential.
Background digression: The vector potential is very often a slippery concept for students. We get used to the idea of a scalar potential \(\phi(\mathbf{r})\), such that the electrostatic potential energy is \(q\phi\) and the electric field is given by \(\mathbf{E} = -\nabla \phi\) if there are no magnetic fields. Adding an arbitrary uniform offset to the scalar potential, \(\phi \rightarrow \phi + \phi_{0}\), doesn't change the electric field (and therefore forces on charged particles), because the zero that we define for energy is arbitrary (general relativity aside). For the vector potential, \(\mathbf{B} = \nabla \times \mathbf{A}\). This means we can add an arbitrary gradient of a scalar function to the vector potential, \(\mathbf{A} \rightarrow \mathbf{A}+ \nabla f(\mathbf{r})\), and the magnetic field won't change. Maxwell's equations mean that \(\mathbf{E} = -\nabla \phi - \partial \mathbf{A}/\partial t\). "Gauge freedom" means that there is more than one way to choose internally consistent definitions of \(\phi\) and \(\mathbf{A}\).
TL/DR main points: (1) The vector potential can be nonzero in places where \(\mathbf{B}\) (and hence the classical Lorentz force) is zero. (2) Because the canonical momentum becomes the operator \(-i \hbar \nabla\) in quantum mechanics and the kinetic momentum is what shows up in the kinetic energy, charged propagating particles pick up an extra phase winding given by \(\delta \varphi = (q/\hbar)\int \mathbf{A}\cdot d\mathbf{r}\) along a path.
This is the source of the creepiness of the Aharonov-Bohm effect. Think of two paths (see the still taken from the Veritasium video); threading magnetic flux through the little region between them using a solenoid will tune the intensity detected on the screen on the far right. That field region can be made arbitrarily small and positioned anywhere inside the diamond formed by the paths, and the effect still works. Something not mentioned in the video: The shifting of the interference pattern is periodic in the flux through the solenoid, with a period of \(h/e\), where \(h\) is Planck's constant and \(e\) is the electronic charge.
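As a quick sanity check on that period (my own numbers, not from the post): the relative phase between the two paths is \((e/\hbar)\Phi\), where \(\Phi\) is the enclosed flux, so the pattern repeats whenever \(\Phi\) changes by \(h/e\).

```python
# The Aharonov-Bohm phase difference is (e/hbar) * Phi; it advances by 2*pi
# whenever the enclosed flux changes by h/e.
import math

h = 6.62607015e-34       # Planck's constant, J*s
hbar = h / (2 * math.pi)
e = 1.602176634e-19      # elementary charge, C

flux_period = h / e
print(f"h/e = {flux_period:.3e} Wb")             # ~4.136e-15 Wb
print((e / hbar) * flux_period / (2 * math.pi))  # = 1.0, i.e. one full period
```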
Why should you care about this?
As the video discusses, the A-B effect shows that the potentials are physically important quantities that affect motion, at least as much as the corresponding fields, and there are quantum consequences to this that are just absent in the classical world.
The A-B effect (though not with the super skinny field confinement) has been seen experimentally in many mesoscopic physics experiments (e.g., here, or here) and can be used as a means of quantifying coherence at these scales (e.g., here and here).
When dealing with emergent quasiparticles that might have unusual fractional charges (\(e^*\)), then A-B interferometers can have flux periodicities that are given by \(h/e^*\). (This can be subtle and tricky.)
Interferometry to detect potential-based phase shifts is well established. Here's the paper mentioned in the video about a gravitational analog of the A-B effect. (Quibblers can argue that there is no field-free region in this case, so it's not strictly speaking the A-B analog.)
Basically, the A-B effect has gone from an initially quite controversial prediction to an established piece of physics that can be used as a tool. If you want to learn Aharonov's take on all this, please read this interesting oral history.
Update: The always informative Steve Simon has pointed out to me a piece of history I had not known: this effect had already been discovered a decade earlier by Ehrenberg and Siday. Please see this arXiv paper about this. Here is Ehrenberg and Siday's paper. Aharonov and Bohm were unaware of it and arrived at their conclusions independently. One lesson to take away: Picking a revealing article title can really help your impact.
Strange how time goes by. And strange that I would say that, since I know time does not flow; it is just our perception of one of the spacetime coordinates of our block universe... The thing is, on February 5 I will turn 60. An important date for anybody - I could say a milestone. First of all, let me say that we take for granted all the days of our life we got to live, but in truth we never knew from the start that we would make it this far. I do feel rather young still, but I am very well aware that there are heaps of ways my life could have ended earlier. Accidents, but also naturally occurring illness.
Steel mills dumped molten slag in parts of Chicago and nearby areas. The slag hardened in layers up to 5 meters deep. These places became barren wastelands. Other industries dumped hot ash and cinders there.
But eventually the steel mills closed.
The deep layers of hard, toxic material were not friendly to plants. Cottonwoods are usually 30 meters tall or more. In the slag fields, stunted cottonwoods grow to just 2 meters.
But rare species that could handle these conditions began to thrive. The lakeside daisy, a federally threatened species lost to Illinois for decades, turned out to grow taller on slag than on topsoil! The capitate spike-rush, last recorded in Illinois in 1894 and considered locally extinct, was rediscovered growing on slag.
A team of women ecologists began studying these unusual landscapes. They call themselves the Slag Queens.
Ecologist Alison Anastasio visited a former US Steel South Works site in Chicago. She expected to just find “crap plants”: common invasive weeds. To her surprise she spotted little bluestem and three species of native milkweed. She already knew she didn’t want a career as an academic scientist. But she came up with the idea of forming a group to study this ecosystem: “a dream team of people I wanted to work with”.
She knew Laura Merwin from the University of Chicago, and later she met Lauren Umek, a project manager for the Chicago Park District. She invited them to brunch to pitch her idea to research plants growing on slag. Not for any obvious career goal. Just out of sheer curiosity.
Merwin and Umek were excited to join her project—which she called a “reverse side hustle,” since it involved a lot of work, but didn’t make any money: it actually costs money.
And thus the Slag Queens were born.
Alison Anastasio (left) and Lauren Umek (right) along with Laura Merwin (not pictured), formed the Slag Queens in 2018. Photograph by Jason Smith.
Their first paper, Urban post-industrial landscapes have unrealized ecological potential, was published in Restoration Ecology in 2022. It argues that slag fields don’t need to be fixed. They have ecological value in and of themselves. And land managers should forget whatever ecosystem was there before. Instead, they should look to more exotic ecosystems as a guide, like the dolomite prairies of Illinois, where magnesium-rich rock near the surface makes it hard for ordinary plants to thrive. Slag too is rich in magnesium.
The Slag Queens are continuing their revolutionary work even now! For more, start here:
The short course is aimed at people from industry or government who want to get started in deep learning, apply deep learning to their projects, learn how to code deep learning algorithms, and upgrade their skills to the latest AI algorithms, including generative AI. The course will be taught by Professor Xavier Bresson from the Department of Computer Science at the National University of Singapore (NUS).
The course will be limited to 25 participants. The fee for this course is $2,500 per participant. Registration closes soon (Feb 5); we still have a few spots available.
Last night, I was taken aback to discover that my name appears in the Epstein Files, in 26 different documents. This is despite the fact that I met Jeffrey Epstein a grand total of zero times, and had zero email or any other contact with him … which is more (less) than some of my colleagues can say.
The bulk of the correspondence involves Epstein wanting to arrange a meeting with me and Seth Lloyd back in 2010, via an intermediary named Charles Harper, about funding a research project on “Cryptography in Nature.”
Searching my inbox, it turns out that this Charles Harper did contact me in May 2010, and I then met him at S&S Deli in Cambridge (plausible, although I have zero recollections of this meeting—only of the deli). Harper then sent me a detailed followup email about his proposed Cryptography in Nature project, naming Jeffrey Epstein for the first time as the project’s funder, and adding: “perhaps you will know Jeffrey and his background and situation.”
For whatever reason, I forwarded this email to my parents, brother, and then-fiancee Dana. My brother then found and shared a news article about Epstein’s prostitution conviction, adding to a different article that I had found and shared. (At that time, like many others, I’d probably vaguely heard of Epstein, but he didn’t have 0.1% the infamy that he has now.) Then my mom wrote the following: “be careful not to get sucked up in the slime-machine going on here! Since you don’t care that much about money, they can’t buy you at least.”
It appears from emails that Charles Harper tried again later that summer to arrange a meeting between me and Epstein, but that I took my mom’s advice and largely blew him off, and no such meeting ever happened. Amazingly, I then forgot entirely that any of this had occurred until last night. By way of explanation, some business/finance dude trying to interest me in half-baked ideas involving quantum, AI, cryptography, etc., often dangling the prospect of funding for my students and postdocs, shows up in my life like every month. Most of their world-changing initiatives go nowhere for one reason or another. There really wasn’t much reason to think further about this, until Epstein had become history’s most notorious sex criminal, which (again) wouldn’t happen until years later, after I’d forgotten.
It gets better, though. In the Epstein Files, one also finds a November 2010 letter from Charles Harper to Epstein about organizing a conference on the same Cryptography in Nature topic, which includes the following idea about me:
Scott Aaronson was born on May 21st, 1981. He will be 30 in 2011. The conference could follow a theme of: “hurry to think together with Scott Aaronson while he is still in his 20s and not yet a pitiful over-the-hill geezer in his 30s.” This offers another nice opportunity for celebration.
I see no indication that any such conference ever happened; in any case, I didn’t get invited to one!
On my Facebook, some friends are joking that “it tracks that someone into teenage girls might think Scott Aaronson was a hot property in his nubile 20s, who would get old and boring in his 30s”—and that maybe Epstein was less sexist about such matters than everyone assumes. I replied that I wished I could say the proposition that I’d gradually get slower and more senile through the 2010s and 2020s was entirely false.
But the best comment was that I’ve been incredibly lucky to have such an astute family. If only Bill Gates and Larry Summers had had my mom to go to for advice, they could’ve saved themselves a lot of grief.
The following guest post was written by a Shtetl-Optimized fan in Iran, who’s choosing to remain anonymous for obvious reasons. I’m in awe of the courage of this individual and the millions of other Iranians who’ve risked or, tragically, sacrificed their lives these past few weeks, to stand for something about as unequivocally good and against something about as unequivocally evil as has ever existed on earth. I’m enraged at the relative indifference of the world, and of the US in particular, to these brave Iranians’ plight. There’s still time for the US to fulfill its promise to the protesters and do the right thing—something that I’ll support even if it endangers my friends and family living in Israel. I check the news from Iran every day, and pray that my friends and colleagues there stay safe—and that they, and the world, will soon be free from the Ayatollahs, who now stand fully unmasked before the world as the murderous thugs they always were. –SA
Guest Post from an Iranian
The protests began in Tehran on 28 December 2025, triggered by economic instability and high inflation, and spread to other provinces. People, tired of the regime and aware that every president is just a puppet with no real power, began targeting the source of authority by chanting directly against Khamenei. After government forces killed several protesters, Trump said on 3 January that if they shoot, then the U.S. will come to the rescue. Protests continued, and on 6 January, Reza Pahlavi called for demonstrations at 8 PM on January 8 and 9. At first, all the regime supporters mocked this and said nobody would come. On those days, they shared videos of empty streets on the news to claim that nobody had shown up. But actually, many people joined the protests. Right around 8 PM on January 8, the government shut down the internet. Only Iran’s internal network remained active, meaning local apps and websites that use Iranian servers still work, but the rest of the world was completely cut off.
The regime fears the internet so much that it has officially announced that anyone using Starlink is considered a spy for foreign countries, especially Mossad, and will be punished. As a result, Starlink owners are extremely cautious and rarely let others know they have it.
I know many students who missed deadlines or interviews because of the internet shutdown. Some students were forced to travel near Iran’s borders and use Afghanistan’s or Iraq’s internet just to check their email. I personally missed the deadlines for two universities. Just before the internet shutdown, a professor sent me a problem sheet that was part of the application process, and I could not even inform him about the situation. For the past four years since completing my undergraduate studies, my only dream has been to pursue a PhD. I come from a low-income family, and I did everything in my power to reach this stage. I tried to control every variable that might disrupt my four-year plan. Yet now it seems I have failed, and I face an uncertain future.
At the same time, U.S. sanctions have significantly limited Iranian opportunities to study at universities worldwide. With Trump’s travel ban on all Iranians, along with some European countries following U.S. sanctions by rejecting Iranian applicants solely based on nationality, our options have become limited (for example, see the “Evaluation criteria” section). The recent internet shutdown has worsened the situation and left us with even fewer opportunities. While the regime shuts down our internet and takes away our opportunities, the very people responsible for this suppression are ensuring their own children never face such obstacles (I will return to this at the end of the post).
On January 8, my sister and I participated. We were inside our car when Special Units and Basij thugs shot at civilians on the pedestrian path using a shotgun, exactly two meters away from us. I was so shocked that I could not even respond. My sister pushed my head under the car’s dashboard to prevent me from getting shot. I come from a very small town, and this was the level of suppression we witnessed there. Now imagine the scale of suppression in major cities like Tehran, and suddenly the number of protesters reported killed in the news begins to make sense.
We now see tweets on X that not only deny the killings but openly mock them. Is it really possible to deny the body bags in Kahrizak? If a government shuts down the internet across an entire country for three weeks to prevent information from leaking out, do you believe it when it claims the sky is blue? (Check NetBlocks.org and this on Mastodon.)
After January 8, many of the regime’s puppets, who are funded to spread its propaganda in Western media, began whitewashing events on U.S. and European TV, claiming that nobody was killed or that it was a terrorist attack and the government had to act. Some even claim that the protesters are violent rioters and the government has the right to shoot them with war ammunition. Iranians call these puppets “bloodwashers.”
These bloodwashers forget that since 1979, people have tried every possible way to express their opinions and demands, and all of it was ridiculed by the regime and its supporters. Every attempt was suppressed without even being heard. So how do you think things will turn out? Clearly, people become more aggressive in each wave of protests, a pattern you can see in every uprising since 2009. This is also accompanied by worsening poverty. Ordinary people suffer from hunger because some radicals refuse to talk with the U.S., while regime supporters enjoy unlimited access to money and privileges.
Out of the four presidential elections held after 2009, people elected three presidents who promised to pursue a deal with the U.S., the so-called Reformist party. People were desperate for change because they knew their situation could only improve if the regime talked with the U.S. Many called the voters naïve, arguing that presidents cannot truly make a difference and lack real power, often saying, “Khamenei would never allow that.” I believe many of the voters knew that deep down. They knew that each time a president spoke about negotiating with the U.S., Khamenei would suddenly gather all his supporters and state, “No, I am not okay with talking with the U.S.” Still, people felt they had no real alternative but elections. After the 2015 Nuclear deal (Joint Comprehensive Plan of Action), people thought they could finally live normal lives and have normal relations with other countries (see how people celebrated the deal on the night it was finalized). At the time, I was even planning to assemble a new PC and thought it might be better to wait and buy parts from Amazon! We didn’t yet know what the IRGC had planned for us over the next ten years. Now, all their actions and stubbornness have led them to this point where they have to surrender completely (the deal Trump is talking about, which essentially takes away everything that makes the Islamic Republic the Islamic Republic), or force another war on our people, and then surrender disgracefully. People are now saying, “Come on, the U.S. you wanted to destroy so badly has come. Take all your supporters and go fight it. Or perhaps you are only brave against ordinary unarmed people.” After the October 7 attacks it was inevitable that their time would come one day, but they still did not want to listen. I often see debates about whether U.S. involvement in other countries is good or whether it should isolate itself because such matters are not its people’s business. I believe decisions regarding Iran were made weeks ago, and we now have no choice but to wait and see what happens. I just hope that the situation turns out better for the people.
As I mentioned earlier, Islamic regime officials chant “death to the U.S. and the West,” yet they send their children to Western countries. These children use funds and opportunities that could have gone to far more deserving people, while living comfortably and anonymously in the very societies their parents want to destroy.
They flee the country their parents made and climb the social ladder of western societies, while ordinary students cannot even afford a simple TOEFL exam and survive on as little as five dollars a month.
When ordinary Iranian students apply for visas, especially for the U.S. and Canada, they are forced to provide every detail of their lives to prove they are not terrorists and that they will return to Iran. Sometimes, they may have to explain to the embassy officer the topics of their professors’ papers, the health condition of their father, and whether they own houses (the last two serving as indirect indicators of whether they will return). If they are lucky enough not to be rejected within ten minutes, they may enter a clearance process that takes at least a year. Only then might they receive a visa. But how is it that the children of the regime’s officials freely enter and live there without issue?
There are countless examples. Mohammad Reza Aref, a Stanford graduate and current Vice President who has repeatedly worn IRGC uniforms in public support, has sons who earned PhDs from EPFL and the University of Florida, and one publicly attributed this success to “good genes”. Ali Larijani, an IRGC officer, had a daughter working at Emory University until last week. Masoumeh Ebtekar, who climbed the wall of the U.S. Embassy during the 1979 Islamic Revolution, has a son, Eissa Hashemi, who is an adjunct faculty member at The Chicago School of Professional Psychology.
Many Iranians are now actively raising awareness through petitions and protests at these individuals’ workplaces. One example is the petition regarding Eissa Hashemi. Protests at Emory University have reportedly led to Fatemeh Larijani’s recent unemployment. (The Larijani family holds critical roles in the regime, and in fact, many members of the family have studied or currently live in Western countries. There is even a saying that while people were forced to fight the U.S., the Larijanis were filling out university application forms.)
When these individuals occupy seats in your labs or use your tax-funded resources, it directly affects the integrity of your institutions and the opportunities available to those who actually share your values. You do not even need to spend time investigating these people yourself. Iranians will protest outside offices or send emails about colleagues in this situation. All I ask is that the next time you receive multiple emails about a particular Iranian colleague, or hear about protests near your workplace, you spend just five minutes considering what is being said.
Thank you to everyone who took the time to read this. I know it is long, and I know it is heavy. I wrote it because silence and denial only help suppression survive, and because attention, however brief, matters. I hope that better and freer days come.
This is a compilation of posts related to some basic concepts of the physics of materials and nanoscale physics. I realized the other day that I hadn't updated this since 2019, and therefore a substantial audience may not have seen these. Wikipedia's physics entries have improved greatly over the years, but hopefully these are a complement that's useful to students and maybe some science writers. Please let me know if there are other topics that you think would be important to include.
Have you seen “population pyramids”? They’re diagrams that show snapshots of a population, how many people there are of each age. They can give you an intuition for how a population is changing, and where the biggest hurdles are to survival.
I wonder what population pyramids would look like for academia. In each field and subfield, how many people are PhD students, postdocs, and faculty?
If every PhD student was guaranteed to become faculty, and the number of faculty stayed fixed, you could roughly estimate what this pyramid would have to look like. An estimate for the US might take an average 7-year PhD, two postdoc positions at 3 years each, followed by a 30-year career as faculty, and estimate the proportions of each stage based on proportions of each scholar’s life. So you’d have roughly one PhD student per four faculty, and one postdoc per five. In Europe, with three-year PhDs, the proportion of PhD students decreases further, and in a world where people are still doing at least two postdocs you expect significantly more postdocs than PhDs.
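Here’s that arithmetic spelled out (a quick sketch following the numbers in the text):

```python
# Steady-state proportions if every PhD student became faculty and positions were fixed:
# a 7-year PhD, two 3-year postdocs, and a 30-year faculty career (rough US numbers).
phd, postdoc, faculty = 7, 3 + 3, 30
print(f"PhD students per faculty member: {phd / faculty:.2f}")  # ~0.23, about 1 per 4
print(f"Postdocs per faculty member: {postdoc / faculty:.2f}")  # 0.20, i.e. 1 per 5

# European-style 3-year PhD, same two postdocs: now postdocs outnumber PhD students.
phd_eu = 3
print(f"EU: {phd_eu / faculty:.2f} PhD students and {postdoc / faculty:.2f} postdocs per faculty member")
```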
Of course, the world doesn’t look like that at all, because the assumptions are wrong.
The number of faculty doesn’t stay fixed, for one. When population is growing in the wider world, new universities open in new population centers, and existing universities find ways to expand. When population falls, enrollments shrink, and universities cut back.
But this is a minor perturbation compared to the much more obvious difference: most PhD students do not stay in academia. A single professor may mentor many PhDs at the same time, and potentially several postdocs. Most of those people aren’t staying.
You can imagine someone trying to fix this by fiat, setting down a fixed ratio between PhD students, postdocs, and faculty. I’ve seen partial attempts at this. When I applied for grants at the University of Copenhagen, I was told I had to budget at least half of my hires as PhD students, not postdocs, which makes me wonder if they were trying to force careers to default to one postdoc position, rather than two. More likely, they hadn’t thought about it.
Zero attrition doesn’t really make sense, anyway. Some people are genuinely better off leaving: they made a mistake when they started, or they changed over time. Sometimes new professions arise, and the best way in is from an unexpected direction. I’ve talked to people who started data science work in the early days, before there really were degrees in it, who felt a physics PhD had been the best route possible to that world. Similarly, some move into policy, or academic administration, or found a startup. And if we think there are actually criteria to choose better or worse academics (which I’m a bit skeptical of), then presumably some people are simply not good enough, but trying to filter them out too early, before they have enough of a track record to really judge, would be irresponsible.
How much attrition there should be is the big question, and one I don’t have an answer for. In academia, when so many of these decisions are made by just a few organizations, it seems like a question that someone should have a well-considered answer to. But so far, it’s unclear to me that anyone does.
It also makes me think, a bit, about how these population pyramids work in industry. There, there is no overall control. Instead, there’s a web of incentives, many of them decades-delayed from the behavior they’re meant to influence, leaving each individual to try to predict as well as they can. If companies only hire senior engineers, no-one gets a chance to start a career, and the population of senior engineers dries up. Eventually, those companies have to settle for junior engineers. (Or, I guess, ex-academics.) It sounds like it should lead to the kind of behavior biologists model in predators and prey, wild swings in population modeled by a differential equation. But maybe there’s something that tamps down those wild swings.
Thomas Bloom’s Erdös problem site has become a real hotbed of activity in recent months, particularly as some of the easiest of the outstanding open problems have turned out to be amenable to various AI-assisted approaches; there is now a lively community in which human contributions, AI contributions, and hybrid contributions are presented, discussed, and in some cases approved as updates to the site.
One of the lessons I draw from this is that once a well curated database of precise mathematical problems is maintained, it becomes possible for other parties to build upon it in many ways (including both AI-based and human-based approaches), to systematically make progress on some fraction of the problems.
This makes me wonder what other mathematical databases could be created to stimulate similar activity. One candidate that came to mind are “optimization constants” – constants that arise from some mathematical optimization problem of interest, for instance finding the best constant for which a certain functional inequality is satisfied.
I am therefore proposing to create a crowdsourced repository for such constants, to record the best upper and lower bounds known for any given such constant, in order to help encourage efforts (whether they be by professional mathematicians, amateur mathematicians, or research groups at a tech company) to try to improve upon the state of the art.
There are of course thousands of such constants one could consider, but just to set the discussion going, I set up a very minimal, proof-of-concept Github repository holding over 20 constants.
Here, I am taking inspiration from the Erdös problem web site and arbitrarily assigning a number to each constant, for ease of reference.
Even in this minimal state I think the repository is ready to start accepting more contributions, in the form of pull requests that add new constants, or improve the known bounds on existing constants. (I am particularly interested in constants that have an extensive literature of incremental improvements in the lower and upper bounds, and which look at least somewhat amenable to computational or AI-assisted approaches.) But I would be interested to hear feedback on how to improve the repository in other ways.
Update: Paata Ivanisvili and Damek Davis have kindly agreed to help run and expand this repository.
David sent me some emails, and he okayed me posting them here. He’s taking the idea of categorifying the Riemann zeta function, explained in my paper, and going further, imagining what it might mean to categorify Riemann’s functional equation.
My paper categorified the Euler product formula that writes the Riemann zeta function as a product over the usual primes:

\[ \zeta(s) \;=\; \prod_{p \ \mathrm{prime}} \frac{1}{1 - p^{-s}} \]
I had nothing to say about the real prime.
But it’s the functional equation that sets the stage for focusing on zeroes of the Riemann zeta function with … and then the Riemann Hypothesis! So it’s worth thinking about.
David wrote:
Hi John,
Hope you’re doing well!
I was just thinking about your (and James Dolan’s) definition of the zeta functors associated to a finite type scheme (from here), and I had a small thought which I figured you might find interesting.
I was thinking about the functional equation of the completed zeta functions; how might we complete the zeta functors in such a way that they satisfy a similar functional equation? I don’t know, but I do have an idea for what the transformation might mean in this context. I claim that it is given by the reduced suspension. Let me explain.
First, I’ll want to see the formal power as the power
which I can then categorify by finding a group with cardinality and considering . In the case of the Riemann zeta species, is the cardinality of a finite semisimple ring (a product of finite fields, the groupoid of which has cardinality for each ), and we can simply deloop the additive group of this ring. This gives us a Dirichlet functor
which categorifies the Riemann zeta function when is a finite set.
Taking this point of view on the zeta functor, we can ask the question: what is the transformation ? Here’s where we can look at the reduced suspension . The universal property of the reduced suspension says that maps correspond to points of the homotopy type
(or, more classically, maps from the terminal morphism to ). Since homotopy cardinality is multiplicative for fibrations, that type has cardinality
(when is a finite set of cardinality ).
Taking for finite semisimple of cardinality , we see that has cardinality . Therefore, I think the transformation in the functional equation may be categorified by . If this makes sense, it suggests that completing the zeta functors is a form of stabilization.
Cheers,
David
And then:
As for another eyebrow wiggle about the cardinality of when is a finite set: we have that , the free group on generators. This is of course infinite, but it is the group completion of the free monoid on generators. Since
it has cardinality .
Maybe it’s better to use the “free delooping” (aka weighted colimit of by ) instead of the reduced suspension. This doesn’t change the above argument because we’re mapping into a groupoid, but now it is true that the Euler characteristic / cardinality of that category is .
A friend pointed out that, while I've written many posts that have to do with superconductivity, I've never really done a concept post about it. Here's a try, as I attempt to distract myself from so many things happening these days.
The superconducting state is a truly remarkable phase of matter that is hosted in many metals (though ironically not readily in the pure elements (Au, Ag, Cu) that are the best ordinary conductors of electricity - see here for some references). First, some definitional/phenomenological points:
The superconducting state is a distinct thermodynamic phase. In the language of phase transitions developed by Ginzburg and Landau back in the 1950s, the superconducting state has an order parameter that is nonzero, compared to the non-superconducting metal state. When you cool down a metal and it becomes a superconductor, this really is analogous (in some ways) to when you cool down liquid water and it becomes ice, or (a better comparison) when you cool down very hot solid iron and it becomes a magnet below 770 °C.
In the superconducting state, at DC, current can flow with zero electrical resistance. Experimentally, this can be checked by setting up a superconducting current loop and monitoring the current via the magnetic field it produces. If you find that the current will decay over somewhere between \(10^5\) and \(\infty\) years, that's pretty convincing that the resistance is darn close to zero.
This is not just "perfect" conduction. If you placed a conductor in a magnetic field, turned on perfect conduction, and then tried to change the magnetic field, currents would develop that would preserve the amount of magnetic flux through the perfect conductor. In contrast, a key signature of superconductivity is the Meissner-Ochsenfeld effect: if superconductivity is turned on in the presence of a (sufficiently small) magnetic field, currents will develop spontaneously at the surface of the material to exclude all magnetic flux from the bulk of the superconductor. (That is, the magnetic field from the currents will be oppositely directed to the external field and of just the right size and distribution to give \(\mathbf{B}=0\) in the bulk of the material.) Observation of the bulk Meissner effect is among the strongest evidence for true superconductivity, much more robust than a measurement that seems to indicate zero voltage drop. Indeed, as a friend of mine pointed out to me, a one-phrase description of a superconductor is "a perfect diamagnet".
There are two main types of superconductors, uncreatively termed "Type I" and "Type II". In Type I superconductors, an external \(\mathbf{H} = \mathbf{B}/\mu_{0}\) fails to penetrate the bulk of the material until it reaches a critical field \(H_{c}\), at which point the superconducting state is suppressed completely. In a Type II superconductor, above some lower critical field \(H_{c,1}\) magnetic flux begins to penetrate the material in the form of vortices, each of which has a non-superconducting ("normal") core. Above an upper critical field \(H_{c,2}\), superconductivity is suppressed.
Interestingly, a lot of this can be "explained" by the London Equations, which were introduced in the 1930s despite a complete lack of a viable microscopic theory of superconductivity.
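To get a feel for the scale these equations set, here's a back-of-the-envelope sketch (my own numbers, not from the post) of the London penetration depth \(\lambda_{L} = \sqrt{m_{e}/(\mu_{0} n_{s} e^{2})}\), the distance over which an applied magnetic field is screened, assuming a superfluid electron density of about \(10^{28}\) per cubic meter.

```python
# Rough estimate of the London penetration depth, lambda_L = sqrt(m_e / (mu_0 * n_s * e^2)).
import math

m_e = 9.1093837015e-31   # electron mass, kg
e = 1.602176634e-19      # elementary charge, C
mu_0 = 4e-7 * math.pi    # vacuum permeability, T*m/A
n_s = 1e28               # assumed superfluid electron density, m^-3

lambda_L = math.sqrt(m_e / (mu_0 * n_s * e**2))
print(f"lambda_L ~ {lambda_L * 1e9:.0f} nm")  # a few tens of nanometers
```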
Magnetic flux through a conventional superconducting ring (or through a vortex core) is quantized precisely in units of \(h/2e\), where \(h\) is Planck's constant and \(e\) is the electronic charge.
(It's worth noting that in magnetic fields and with AC currents, there are still electrical losses in superconductors, due in part to the motion of vortices.)
Physically, what is the superconducting state? Why does it happen and why does it have the weird properties described above as well as others? There are literally entire textbooks and semester-long courses on this, so what follows is very brief and non-authoritative.
In an ordinary metal at low temperatures, neglecting e-e interactions and other complications, the electrons fill up states (because of the Pauli Principle) starting from the lowest energy up to some highest value, the Fermi energy. (See here for some mention of this.) Empty electronic states are available at essentially no energy cost - excitations of electrons from filled states to empty states are "gapless".
Electrical conduction takes place through the flow of these electronic quasiparticles. (For more technical readers: We can think of these quasiparticles like little wavepackets, and as each one propagates around the wavepacket accumulates a certain amount of phase. The phases of different quasiparticles are arbitrary, but the change in the phase going around some trajectory is well defined.)
In a superconductor, there is some effective attractive interaction between electrons that we have thus far neglected. In conventional superconductors, this involves lattice vibrations (as in this wikipedia description), though other attractive interactions are possible. At sufficiently low temperatures, the ordinary metal state is unstable, and the system will spontaneously form pairs of electrons (or holes). Those pairs then condense into a single coherent state described by an amplitude \(|\Psi|\) and a phase, \(\phi\), shared by all the pairs. The conventional theory of this was formulated by Bardeen, Cooper, and Schrieffer in 1957. A couple of nice lecture note presentations of this are here (courtesy Yuval Oreg) and here (courtesy Dan Arovas), if you want the technical details. This leads to an energy gap that characterizes how much it costs to create individual quasiparticles. Conduction in a superconductor takes place through the flow of pairs. (A clue to this is the appearance of the \(2e\) in the flux quantization.)
This taking on of a global phase for the pairs of electrons is a spontaneous breaking of gauge symmetry - this is discussed pedagogically for physics students here. Understanding this led to figuring out the Anderson-Higgs mechanism, btw.
The result is a state with a kind of rigidity; precisely how this leads to the phenomenology of superconductivity is not immediately obvious, to me anyway. If someone has a link to a great description of this, please put it in the comments. (Interestingly google gemini is not too bad at discussing this.)
The paired charge carriers are described by a pairing symmetry of their wave functions in real space. In conventional BCS superconductors, each pair has no orbital angular momentum ("\(s\)-wave"), and the spins are in a singlet state. In other superconductors, pairs can have \(l = 1\) orbital angular momentum ("\(p\)-wave", with spins in the triplet configuration), \(l = 2\) orbital angular momentum ("\(d\)-wave", with spins in a singlet again), etc. The pairing state determines whether the energy gap is directionally uniform (\(s\)-wave) or whether there are directions ("nodes") along which the gap goes to zero.
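For a sense of scale of the gap mentioned above, here is a small sketch using the standard weak-coupling BCS relation \(\Delta(0) \approx 1.764\, k_B T_c\); the critical temperature used below is roughly that of aluminum and is only meant as an example.

```python
# Rough estimate of the zero-temperature BCS gap from the critical temperature,
# using the weak-coupling relation Delta(0) ~ 1.764 k_B T_c.
from scipy.constants import k, e

def bcs_gap_ev(t_c_kelvin):
    """Weak-coupling BCS estimate of the zero-temperature gap, in eV."""
    return 1.764 * k * t_c_kelvin / e

print(f"Tc = 1.2 K  ->  Delta(0) ~ {bcs_gap_ev(1.2) * 1e6:.0f} micro-eV")  # ~180 micro-eV
```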
I have necessarily left out a ton here. Superconductivity continues to be both technologically critical and scientifically fascinating. One major challenge in understanding the microscopic mechanisms behind particular superconductors is that the superconducting state itself is in a sense generic - many of its properties (like phase rigidity) are emergent regardless of the underlying microscopic picture, which is amazing.
One other point, added after initial posting. In quantum computing approaches, a major challenge is how to build robust effective ("logical") qubits from individual physical qubits that are not perfect (meaning that they suffer from environmental decoherence among other issues). The phase coherence of electronic quasiparticles in ordinary metals is generally quite fragile; inelastic interactions with each other, with phonons, with impurity spins, etc. can all lead to decoherence. However, starting from those ingredients, superconductivity shows that it is possible to construct, spontaneously, a collective state with very long-lived coherence. I'm certain I'm not the first to wonder about whether there are lessons to be drawn here in terms of the feasibility of and approaches to quantum error correction.
On December 10, I gave a keynote address at the Q2B 2025 Conference in Silicon Valley. This is a transcript of my remarks. The slides I presented are here. The video is here.
The first century
We are nearing the end of the International Year of Quantum Science and Technology, so designated to commemorate the 100th anniversary of the discovery of quantum mechanics in 1925. The story goes that 23-year-old Werner Heisenberg, seeking relief from severe hay fever, sailed to the remote North Sea Island of Helgoland, where a crucial insight led to his first, and notoriously obscure, paper describing the framework of quantum mechanics.
In the years following, that framework was clarified and extended by Heisenberg and others. Notable among them was Paul Dirac, who emphasized that we have a theory of almost everything that matters in everyday life. It’s the Schrödinger equation, which captures the quantum behavior of many electrons interacting electromagnetically with one another and with atomic nuclei. That describes everything in chemistry and materials science and all that is built on those foundations. But, as Dirac lamented, in general the equation is too complicated to solve for more than a few electrons.
Somehow, over 50 years passed before Richard Feynman proposed that if we want a machine to help us solve quantum problems, it should be a quantum machine, not a classical machine. The quest for such a machine, he observed, is “a wonderful problem because it doesn’t look so easy,” a statement that still rings true.
I was drawn into that quest about 30 years ago. It was an exciting time. Efficient quantum algorithms for the factoring and discrete log problems were discovered, followed rapidly by the first quantum error-correcting codes and the foundations of fault-tolerant quantum computing. By late 1996, it was firmly established that a noisy quantum computer could simulate an ideal quantum computer efficiently if the noise is not too strong or strongly correlated. Many of us were then convinced that powerful fault-tolerant quantum computers could eventually be built and operated.
Three decades later, as we enter the second century of quantum mechanics, how far have we come? Today’s quantum devices can perform some tasks beyond the reach of the most powerful existing conventional supercomputers. Error correction had for decades been a playground for theorists; now informative demonstrations are achievable on quantum platforms. And the world is investing heavily in advancing the technology further.
Current NISQ machines can perform quantum computations with thousands of two-qubit gates, enabling early explorations of highly entangled quantum matter, but still with limited commercial value. To unlock a wide variety of scientific and commercial applications, we need machines capable of performing billions or trillions of two-qubit gates. Quantum error correction is the way to get there.
I’ll highlight some notable developments over the past year—among many others I won’t have time to discuss. (1) We’re seeing intriguing quantum simulations of quantum dynamics in regimes that are arguably beyond the reach of classical simulations. (2) Atomic processors, both ion traps and neutral atoms in optical tweezers, are advancing impressively. (3) We’re acquiring a deeper appreciation of the advantages of nonlocal connectivity in fault-tolerant protocols. (4) And resource estimates for cryptanalytically relevant quantum algorithms have dropped sharply.
Quantum machines for science
A few years ago, I was not particularly excited about running applications on the quantum platforms that were then available; now I’m more interested. We have superconducting devices from IBM and Google with over 100 qubits and two-qubit error rates approaching 10^{-3}. The Quantinuum ion trap device has even better fidelity as well as higher connectivity. Neutral-atom processors have many qubits; they lag behind now in fidelity, but are improving.
Users face tradeoffs: The high connectivity and fidelity of ion traps is an advantage, but their clock speeds are orders of magnitude slower than for superconducting processors. That limits the number of times you can run a given circuit, and therefore the attainable statistical accuracy when estimating expectations of observables.
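To make the statistics point concrete, here is a minimal sketch; the repetition rates below are illustrative assumptions, not measured numbers for any particular platform.

```python
# Estimating an expectation value to precision delta takes roughly (1/delta)^2
# shots; the wall-clock time then depends on the repetition rate.
import math

def shots_needed(delta, per_shot_std=1.0):
    # Standard error of a sample mean scales as per_shot_std / sqrt(N).
    return math.ceil((per_shot_std / delta) ** 2)

n_shots = shots_needed(0.01)  # target ~0.01 precision on a +/-1-valued observable
for label, reps_per_s in [("fast platform (assumed 10 kHz)", 1e4),
                          ("slow platform (assumed 10 Hz)", 1e1)]:
    print(f"{label}: {n_shots} shots take about {n_shots / reps_per_s:.0f} s")
```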
Verifiable quantum advantage
Much attention has been paid to sampling from the output of random quantum circuits, because this task is provably hard classically under reasonable assumptions. The trouble is that, in the high-complexity regime where a quantum computer can reach far beyond what classical computers can do, the accuracy of the quantum computation cannot be checked efficiently. Therefore, attention is now shifting toward verifiable quantum advantage — tasks where the answer can be checked. If we solved a factoring or discrete log problem, we could easily check the quantum computer’s output with a classical computation, but we’re not yet able to run these quantum algorithms in the classically hard regime. We might settle instead for quantum verification, meaning that we check the result by comparing two quantum computations and verifying the consistency of the results.
A type of classical verification of a quantum circuit was demonstrated recently by BlueQubit on a Quantinuum processor. In this scheme, a designer builds a family of so-called “peaked” quantum circuits such that, for each such circuit and for a specific input, one output string occurs with unusually high probability. An agent with a quantum computer who knows the circuit and the right input can easily identify the preferred output string by running the circuit a few times. But the quantum circuits are cleverly designed to hide the peaked output from a classical agent — one may argue heuristically that the classical agent, who has a description of the circuit and the right input, will find it hard to predict the preferred output. Thus quantum agents, but not classical agents, can convince the circuit designer that they have reliable quantum computers. This observation provides a convenient way to benchmark quantum computers that operate in the classically hard regime.
The notion of quantum verification was explored by the Google team using Willow. One can execute a quantum circuit acting on a specified input, and then measure a specified observable in the output. By repeating the procedure sufficiently many times, one obtains an accurate estimate of the expectation value of that output observable. This value can be checked by any other sufficiently capable quantum computer that runs the same circuit. If the circuit is strategically chosen, then the output value may be very sensitive to many-qubit interference phenomena, in which case one may argue heuristically that accurate estimation of that output observable is a hard task for classical computers. These experiments, too, provide a tool for validating quantum processors in the classical hard regime. The Google team even suggests that such experiments may have practical utility for inferring molecular structure from nuclear magnetic resonance data.
Correlated fermions in two dimensions
Quantum simulations of fermionic systems are especially compelling, since electronic structure underlies chemistry and materials science. These systems can be hard to simulate in more than one dimension, particularly in parameter regimes where fermions are strongly correlated, or in other words profoundly entangled. The two-dimensional Fermi-Hubbard model is a simplified caricature of two-dimensional materials that exhibit high-temperature superconductivity and hence has been much studied in recent decades. Large-scale tensor-network simulations are reasonably successful at capturing static properties of this model, but the dynamical properties are more elusive.
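For reference, the Hamiltonian in question is conventionally written as

\[ H = -t \sum_{\langle i,j\rangle,\sigma} \left( c^{\dagger}_{i\sigma} c_{j\sigma} + \mathrm{h.c.} \right) + U \sum_{i} n_{i\uparrow} n_{i\downarrow}, \]

with nearest-neighbor hopping \(t\) and on-site repulsion \(U\) on a square lattice; the strongly correlated regime is roughly the one where \(U\) is comparable to or larger than the bandwidth.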
Dynamics in the Fermi-Hubbard model has been simulated recently on both Quantinuum (here and here) and Google processors. Only a 6 x 6 lattice of electrons was simulated, but this is already well beyond the scope of exact classical simulation. When (error-mitigated) quantum circuits with over 4000 two-qubit gates were compared to heuristic classical tensor-network and Majorana path methods, discrepancies were noted, and the Phasecraft team argues that the quantum simulation results are more trustworthy. The Harvard group also simulated models of fermionic dynamics, but were limited to relatively low circuit depths due to atom loss. It’s encouraging that today’s quantum processors have reached this interesting two-dimensional strongly correlated regime, and with improved gate fidelity and noise mitigation we can go somewhat further, but expanding system size substantially in digital quantum simulation will require moving toward fault-tolerant implementations. We should also note that there are analog Fermi-Hubbard simulators with thousands of lattice sites, but digital simulators provide greater flexibility in the initial states we can prepare, the observables we can access, and the Hamiltonians we can reach.
When it comes to many-particle quantum simulation, a nagging question is: “Will AI eat quantum’s lunch?” There is surging interest in using classical artificial intelligence to solve quantum problems, and that seems promising. How will AI impact our quest for quantum advantage in this problem space? This question is part of a broader issue: classical methods for quantum chemistry and materials have been improving rapidly, largely because of better algorithms, not just greater processing power. But for now classical AI applied to strongly correlated matter is hampered by a paucity of training data. Data from quantum experiments and simulations will likely enhance the power of classical AI to predict properties of new molecules and materials. The practical impact of that predictive power is hard to clearly foresee.
The need for fundamental research
Today is December 10th, the anniversary of Alfred Nobel’s death. The Nobel Prize award ceremony in Stockholm concluded about an hour ago, and the Laureates are about to sit down for a well-deserved sumptuous banquet. That’s a fitting coda to this International Year of Quantum. It’s useful to be reminded that the foundations for today’s superconducting quantum processors were established by fundamental research 40 years ago into macroscopic quantum phenomena. No doubt fundamental curiosity-driven quantum research will continue to uncover unforeseen technological opportunities in the future, just as it has in the past.
I have emphasized superconducting, ion-trap, and neutral atom processors because those are most advanced today, but it’s vital to continue to pursue alternatives that could suddenly leap forward, and to be open to new hardware modalities that are not top-of-mind at present. It is striking that programmable, gate-based quantum circuits in neutral-atom optical-tweezer arrays were first demonstrated only a few years ago, yet that platform now appears especially promising for advancing fault-tolerant quantum computing. Policy makers should take note!
The joy of nonlocal connectivity
As the fault-tolerant era dawns, we increasingly recognize the potential advantages of the nonlocal connectivity resulting from atomic movement in ion traps and tweezer arrays, compared to geometrically local two-dimensional processing in solid-state devices. Over the past few years, many contributions from both industry and academia have clarified how this connectivity can reduce the overhead of fault-tolerant protocols.
Even when using the standard surface code, the ability to implement two-qubit logical gates transversally—rather than through lattice surgery—significantly reduces the number of syndrome-measurement rounds needed for reliable decoding, thereby lowering the time overhead of fault tolerance. Moreover, the global control and flexible qubit layout in tweezer arrays increase the parallelism available to logical circuits.
Nonlocal connectivity also enables the use of quantum low-density parity-check (qLDPC) codes with higher encoding rates, reducing the number of physical qubits needed per logical qubit for a target logical error rate. These codes now have acceptably high accuracy thresholds, practical decoders, and—thanks to rapid theoretical progress this year—emerging constructions for implementing universal logical gate sets. (See for example here, here, here, here.)
A serious drawback of tweezer arrays is their comparatively slow clock speed, limited by the timescales for atom transport and qubit readout. A millisecond-scale syndrome-measurement cycle is a major disadvantage relative to microsecond-scale cycles in some solid-state platforms. Nevertheless, the reductions in logical-gate overhead afforded by atomic movement can partially compensate for this limitation, and neutral-atom arrays with thousands of physical qubits already exist.
To realize the full potential of neutral-atom processors, further improvements are needed in gate fidelity and continuous atom loading to maintain large arrays during deep circuits. Encouragingly, active efforts on both fronts are making steady progress.
Approaching cryptanalytic relevance
Another noteworthy development this year was a significant improvement in the physical qubit count required to run a cryptanalytically relevant quantum algorithm, reduced by Gidney to less than 1 million physical qubits from the 20 million Gidney and Ekerå had estimated earlier. This applies under standard assumptions: a two-qubit error rate of 10^{-3} and 2D geometrically local processing. The improvement was achieved using three main tricks. One was using approximate residue arithmetic to reduce the number of logical qubits. (This also suppresses the success probability and therefore lengthens the time to solution by a factor of a few.) Another was using a more efficient scheme to reduce the number of physical qubits for each logical qubit in cold storage. And the third was a recently formulated scheme for reducing the spacetime cost of non-Clifford gates. Further cost reductions seem possible using advanced fault-tolerant constructions, highlighting the urgency of accelerating migration from vulnerable cryptosystems to post-quantum cryptography.
Looking forward
Over the next 5 years, we anticipate dramatic progress toward scalable fault-tolerant quantum computing, and scientific insights enabled by programmable quantum devices arriving at an accelerated pace. Looking further ahead, what might the future hold? I was intrigued by a 1945 letter from John von Neumann concerning the potential applications of fast electronic computers. After delineating some possible applications, von Neumann added: “Uses which are not, or not easily, predictable now, are likely to be the most important ones … they will … constitute the most surprising extension of our present sphere of action.” Not even a genius like von Neumann could foresee the digital revolution that lay ahead. Predicting the future course of quantum technology is even more hopeless because quantum information processing entails an even larger step beyond past experience.
As we contemplate the long-term trajectory of quantum science and technology, we are hampered by our limited imaginations. But one way to loosely characterize the difference between the past and the future of quantum science is this: For the first hundred years of quantum mechanics, we achieved great success at understanding the behavior of weakly correlated many-particle systems, leading for example to transformative semiconductor and laser technologies. The grand challenge and opportunity we face in the second quantum century is acquiring comparable insight into the complex behavior of highly entangled states of many particles, behavior well beyond the scope of current theory or computation. The wonders we encounter in the second century of quantum mechanics, and their implications for human civilization, may far surpass those of the first century. So we should gratefully acknowledge the quantum pioneers of the past century, and wish good fortune to the quantum explorers of the future.
Welcome back to: Has quantum advantage been achieved?
In Part 1 of this mini-series on quantum advantage demonstrations, I told you about the idea of random circuit sampling (RCS) and the experimental implementations thereof. In this post, Part 2 out of 3, I will discuss the arguments and evidence for why I am convinced that the experiments demonstrate a quantum advantage.
Recall from Part 1 that to assess an experimental quantum advantage claim we need to check three criteria:
Does the experiment correctly solve a computational task?
Does it achieve a scalable advantage over classical computation?
Does it achieve an in-practice advantage over the best classical attempt at solving the task?
What’s the issue?
When assessing these criteria for the RCS experiments there is an important problem: The early quantum computers we ran them on were very far from being reliable and the computation was significantly corrupted by noise. How should we interpret this noisy data? Or more concisely:
Is random circuit sampling still classically hard even when we allow for whatever amount of noise the actual experiments had?
Can we be convinced from the experimental data that this task has actually been solved?
I want to convince you today that we have developed a very good understanding of these questions, one that gives a solid underpinning to the advantage claim. Developing that understanding required a mix of methodologies from different areas of science, including theoretical computer science, algorithm design, and physics, and it has been an exciting journey over the past few years.
The noisy sampling task
Let us start by answering the base question. What computational task did the experiments actually solve?
Recall that, in the ideal RCS scenario, we are given a random circuit \(C\) on \(n\) qubits and the task is to sample from the output distribution of the state obtained from applying the circuit to a simple reference state. The output probability distribution of this state is determined by the Born rule when I measure every qubit in a fixed choice of basis.
Now what does a noisy quantum computer do when I execute all the gates on it and apply them to its state? Well, it prepares a noisy version of the intended state and once I measure the qubits, I obtain samples from the output distribution of that noisy state.
We should not make the task dependent on the specifics of that state or the noise that determined it, but we can define a computational task based on this observation by fixing how accurate that noisy state preparation is. The natural way to do this is to use the fidelity

\[ F = \langle \psi_C | \rho | \psi_C \rangle , \]

where \(|\psi_C\rangle\) is the ideal output state and \(\rho\) is the noisy state, which is just the overlap between the ideal state and the noisy state. The fidelity is 1 if the noisy state is equal to the ideal state, and 0 if it is perfectly orthogonal to it.
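Here is a tiny numpy sketch of that fidelity for a toy "noisy state": a global depolarizing mixture of a random ideal pure state. This is my own toy model for illustration, not the noise model of any of the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                 # number of qubits (illustrative)
dim = 2 ** n

psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)            # stand-in for the ideal output state

for p in [0.0, 0.5, 1.0]:             # global depolarizing strength
    rho = (1 - p) * np.outer(psi, psi.conj()) + p * np.eye(dim) / dim
    fidelity = np.real(psi.conj() @ rho @ psi)
    print(f"p = {p:.1f}: F = {fidelity:.4f}")   # p = 1 gives the trivial 1/2**n
```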
Finite-fidelity random circuit sampling: Given a typical random circuit \(C\), sample from the output distribution of any quantum state whose fidelity with the ideal output state \(|\psi_C\rangle\) is at least \(F\).
Note that finite-fidelity RCS does not demand success for every circuit, but only for typical circuits from the random circuit ensemble. This matches what the experiments do: they draw random circuits and need the device to perform well on the overwhelming majority of those draws. Accordingly, when the experiments quote a single number as “fidelity”, it is really the typical (or, more precisely, circuit-averaged) fidelity that I will just call \(F\).
The noisy experiments claim to have solved finite-fidelity RCS for values of \(F\) around 0.1%. What is more, they consistently achieve this value even as the circuit sizes are increased in the later experiments. Both the actual value and the scaling will be important later.
What is the complexity of finite-fidelity RCS?
Quantum advantage of finite-fidelity RCS
Let’s start off by supposing that the quantum computation is (nearly) perfectly executed, so the required fidelity is quite large, say, 90%. In this scenario, we have very good evidence based on computational complexity theory that there is a scalable and in-practice quantum advantage for RCS. This evidence is very strong, comparable to the evidence we have for the hardness of factoring and simulating quantum systems. The intuition behind it is that quantum output probabilities are extremely hard to compute because of a mechanism behind quantum advantages: destructive interference. If you are interested in the subtleties and the open questions, take a look at our survey.
The question is now: how far down in fidelity does this classical hardness persist? Intuitively, the smaller we make \(F\), the easier finite-fidelity RCS should become for a classical algorithm (and a quantum computer, too), since the freedom we have in deviating from the ideal state in our simulation becomes larger and larger. This increases the possibility of finding a state that turns out to be easy to simulate within the fidelity constraint.
Somewhat surprisingly, though, finite-fidelity RCS seems to remain hard even for very small values of \(F\). I am not aware of any efficient classical algorithm that achieves the finite-fidelity task for \(F\) significantly above the baseline trivial value of \(2^{-n}\). This is the value a maximally mixed or randomly picked state achieves, because a random state has no correlation with the ideal state (or any other state), and \(2^{-n}\) is exactly the overlap you expect in that case (while 0 would correspond to perfect anti-correlation).
One can save some classical runtime compared to solving near-ideal RCS by exploiting a reduced fidelity, but the costs remain exponential. To classically solve finite-fidelity RCS, the best known approaches are reported in the papers that performed classical simulations of finite-fidelity RCS with the parameters of the first Google and USTC experiment (classSim1, classSim2). To achieve this, however, they needed to approximately simulate the ideal circuits at an immense cost. To the best of my knowledge, all but those first two experiments are far out of reach for these algorithms.
Getting the scaling right: weak noise and low depth
So what is the right value of \(F\) at which we can hope for a scalable and in-practice advantage of RCS experiments?
When thinking about this question, it is helpful to keep a model of the circuit in mind that a noisy experiment runs. So, let us consider a noisy circuit on \(n\) qubits with \(d\) layers of gates and single-qubit noise of strength \(\epsilon\) on every qubit in every layer. In this scenario, the typical fidelity with the ideal state will decay as \(F \approx e^{-\epsilon n d}\).
Any reasonably testable value of the fidelity needs to scale as \(1/\mathrm{poly}(n)\), since eventually we need to estimate the average fidelity from the experimental samples, and this typically requires at least on the order of \(1/F\) samples, so exponentially small fidelities are experimentally invisible. A polynomially small fidelity is also much closer to the near-ideal scenario (90%) than the trivial scenario (\(F = 2^{-n}\)). While we cannot formally pin this down, the intuition behind the complexity-theoretic evidence for the hardness of near-ideal RCS persists into the \(1/\mathrm{poly}(n)\) regime: to sample up to such high precision, we still need a reasonably accurate estimate of the ideal probabilities, and getting this is computationally extremely difficult. Scalable quantum advantage in this regime is therefore a pretty safe bet.
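To get a feel for the numbers, here is a back-of-the-envelope check of the \(F \approx e^{-\epsilon n d}\) decay quoted above; the parameter values are illustrative guesses rather than the calibrated numbers of any particular experiment.

```python
# Fidelity decay for an assumed effective error rate per qubit per layer.
import math

eps = 5e-3      # assumed effective error per qubit per layer
n, d = 53, 20   # illustrative circuit size
print(f"F ~ {math.exp(-eps * n * d):.1%}")   # about 0.5%, i.e. a fraction of a percent
```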
How do the parameters of the experiment and the RCS instances need to scale with the number of qubits to experimentally achieve the \(1/\mathrm{poly}(n)\) fidelity regime? The limit to consider is one in which the noise rate decreases with the number of qubits, while the circuit depth is only allowed to increase very slowly. The precise form depends on the circuit architecture, i.e., the choice of circuit connectivity, and the gate set, through a constant that I will explain in more detail below.
Weak-noise and low-depth scaling: (Weak noise) The local noise rate of the quantum device scales as \(\epsilon \sim 1/n\). (Low depth) The circuit depth scales as \(d \sim \log n\).
This limit is such that we have a scaling of the fidelity as \(F \sim n^{-c}\) for some constant \(c\). It is also a natural scaling limit for noisy devices whose error rates gradually improve through better engineering. You might be worried about the fact that the depth needs to be quite low, but it turns out that there is a solid quantum advantage even for \(\log n\)-depth circuits.
The precise definition of the weak-noise regime is motivated by the following observation. It turns out to be crucial for assessing the noisy data from the experiment.
Fidelity versus XEB: a phase transition
Remember from Part 1 that the experiments measured a quantity called the cross-entropy benchmark (XEB),

\[ \chi_{\mathrm{XEB}} = 2^n \, \mathbb{E}_C \, \mathbb{E}_{x \sim q_C} \left[ p_C(x) \right] - 1 , \]

where \(p_C\) is the ideal output distribution of the circuit \(C\) and \(q_C\) is the experimental one. The XEB averages the ideal probabilities \(p_C(x)\) corresponding to the sampled outcomes \(x\) from experiments on random circuits \(C\). Thus, it correlates the experimental and ideal output distributions of those random circuits. You can think of it as a “classical version” of the fidelity: If the experimental distribution is correct, the XEB will essentially be 1. If it is uniformly random, the XEB is 0.
The experiments claimed that the XEB is a good proxy for the circuit-averaged fidelity \(F\), and so we need to understand when this is true. Fortunately, in the past few years, alongside the improved experiments, we have developed a very good understanding of this question (WN, Spoof2, PT1, PT2).
It turns out that the quality of correspondence between XEB and average fidelity depends strongly on the noise in the experimental quantum state. In fact, there is a sharp phase transition: there is an architecture-dependent critical noise strength such that when the experimental local noise rate lies below it, the XEB is a good and reliable proxy for the average fidelity for any system size \(n\) and circuit depth \(d\). This is exactly the weak-noise regime. Above that threshold, in the strong-noise regime, the XEB is an increasingly bad proxy for the fidelity (PT1, PT2).
Let me be more precise: In the weak-noise regime, when we consider the decay of the XEB as a function of circuit depth \(d\), the rate of decay is set by \(\epsilon n\), i.e., the XEB decays as \(e^{-\epsilon n d}\). Meanwhile, in the strong-noise regime the rate of decay is constant, giving an XEB decay as \(e^{-c d}\) for a constant \(c\) that does not grow with \(n\). At the same time, the fidelity decays as \(e^{-\epsilon n d}\) regardless of the noise regime. Hence, in the weak-noise regime the XEB is a good proxy for the fidelity, while in the strong-noise regime there is an exponentially increasing gap between the XEB (which remains comparatively large) and the fidelity (which continues to decay exponentially).
This is what the following plot shows. We computed it from an exact mapping of the behavior of the XEB to the dynamics of a statistical-mechanics model that can be evaluated efficiently. Using this mapping, we can also compute the noise threshold for whichever random circuit family and architecture you are interested in.
From (PT2). The vertical axis shows the decay rate of the XEB, \(n\) is the number of qubits, and \(\epsilon\) is the local noise rate.
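To make the weak-versus-strong-noise contrast concrete, here is a tiny numeric illustration with made-up numbers; \(\epsilon\), \(n\), \(d\), and the strong-noise decay constant are assumptions chosen only to show how large the discrepancy can become.

```python
import math

n, d = 53, 20
eps = 2e-2      # deliberately large local noise rate, i.e. the strong-noise regime
c = 0.3         # assumed depth-only decay constant of the XEB in that regime
print(f"fidelity ~ exp(-eps*n*d) = {math.exp(-eps * n * d):.1e}")   # ~6e-10
print(f"XEB      ~ exp(-c*d)     = {math.exp(-c * d):.1e}")         # ~2e-3: vastly larger
```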
Where are the experiments?
We are now ready to take a look at the crux when assessing the noisy data: Can we trust the reported XEB values as an estimator of the fidelity? If so, do the experiments solve finite-fidelity RCS in the solidly hard regime where \(F \gtrsim 1/\mathrm{poly}(n)\)?
In their more recent paper (PT1), the Google team explicitly verified that the experiment is well below the phase transition, and it turns out that the first experiment was just at the boundary. The USTC experiments had comparable noise rates, and the Quantinuum experiment much better ones. Since the fidelity decays as \(e^{-\epsilon n d}\), but the reported XEB values stayed consistently around 0.1% as the circuit sizes were increased, the experimental error rates improved even faster than the \(\epsilon \sim 1/n\) scaling required for the weak-noise regime, roughly like \(\epsilon \sim 1/(nd)\). Altogether, the experiments are therefore in the weak-noise regime both in terms of absolute numbers relative to the threshold and in terms of the required scaling.
Of course, to derive the transition, we made some assumptions about the noise such as that the noise is local, and that it does not depend much on the circuit itself. In the advantage experiments, these assumptions about the noise are characterized and tested. This is done through a variety of means at increasing levels of complexity, including detailed characterization of the noise in individual gates, gates run in parallel, and eventually in a larger circuit. The importance of understanding the noise shows in the fact that a significant portion of the supplementary materials of the advantage experiments is dedicated to getting this right. All of this contributes to the experimental justification for using the XEB as a proxy for the fidelity!
The data shows that the experiments solved finite-fidelity RCS for values of \(F\) above the constant value of roughly 0.1% as the experiments grew. In the following plot, I compare the experimental fidelity values to the near-ideal scenario on the one hand, and the trivial value \(2^{-n}\) on the other hand. Viewed at this scale, the values of \(F\) for which the experiment solved finite-fidelity RCS are indeed vastly closer to the near-ideal value than to the trivial baseline, which should boost our confidence that reproducing samples at a similar fidelity is extremely challenging.
The phase transition matters!
You might be tempted to say: “Well, but is all this really so important? Can’t I just use XEB and forget all about fidelity?”
The phase transition shows why that would change the complexity of the problem: in the strong-noise regime, XEB can stay high even when the fidelity is exponentially small. And indeed, this discrepancy can be exploited by so-called spoofers for the XEB. These are efficient classical algorithms which can be used to succeed at a quantum advantage test even though they clearly do not achieve the intended advantage. These spoofers (Spoof1, Spoof2) can achieve high XEB scores comparable to those of the experiments, scaling like \(e^{-cd}\) in the circuit depth \(d\) for some constant \(c\).
Their basic idea is to introduce strong, judiciously chosen noise at specific circuit locations, which has the effect of breaking the simulation task up into smaller, much easier components while still giving a high XEB score. In doing so, they exploit the strong-noise regime in which the XEB is a really bad proxy for the fidelity. This allows them to sample from states with exponentially low fidelity while achieving a high XEB value.
The discovery of the phase transition and the associated spoofers highlights the importance of modeling when assessing—and even formulating—the advantage claim based on noisy data.
But we can’t compute the XEB!
You might also be worried that the experiments did not actually compute the XEB in the advantage regime, because to estimate it they would have needed to compute ideal probabilities—a task that is hard by definition of the advantage regime. Instead, they used a bunch of different ways to extrapolate the true XEB from XEB proxies (a proxy of a proxy of the fidelity). Is this a valid way of getting an estimate of the true XEB?
It totally is! Different extrapolations—from easy-to-simulate to hard-to-simulate circuits, from small systems to large systems, and so on—all gave consistent answers for the experimental XEB value of the supremacy circuits. Think of this as having several lines that cross at the same point. For that crossing to be a coincidence, something crazy and conspiratorial must happen exactly when you move to the supremacy circuits from different directions. That is why it is reasonable to trust the reported value of the XEB.
That’s exactly how experiments work!
All of this is to say that establishing that the experiments correctly solved finite-fidelity RCS and therefore show quantum advantage involved a lot of experimental characterization of the noise as well as theoretical work to understand the effects of noise on the quantity we care about—the fidelity between the experimental and ideal states.
In this respect (and maybe also in the scale of the discovery), the quantum advantage experiments are similar to the recent experiments reporting discovery of the Higgs boson and gravitational waves. While I do not claim to understand any of the details, what I do understand is that in both experiments, there is an unfathomable amount of data that could not be interpreted without preselection and post-processing of the data, theories, extrapolations and approximations that model the experiment and measurement apparatus. All of those enter the respective smoking-gun plots that show the discoveries.
If you believe in the validity of experimental physics methodology, you should therefore also believe in the type of evidence underlying the experimental claims of the quantum advantage demonstrations: that they sampled from the output distribution of a quantum state with the reported fidelities.
Put succinctly: If you believe in the Higgs boson and gravitational waves, you should probably also believe in the experimental demonstration of quantum advantage.
What are the counter-arguments?
The theoretical computer scientist
“The weak-noise limit is not physical. The appropriate scaling limit is one in which the local noise rate of the device is constant while the system size grows, and in that case, there is a classical simulation algorithm for RCS (SimIQP, SimRCS).”
In theoretical computer science, studying how runtime and memory scale with the input size is considered very natural: We say an algorithm is efficient if its runtime and space usage depend only polynomially on the input size.
But all scaling arguments are hypothetical concepts, and we only care about the scaling at relevant sizes. In the end, every scaling limit is going to hit the wall of physical reality—be it the amount of energy or human lifetime that limits the time of an algorithm, or the physical resources that are required to build larger and larger computers. To keep the scaling limit going as we increase the size of our computations, we need innovation that makes the components smaller and less noisy.
At the scales relevant to RCS, the scaling of the noise is benign and even natural. Why? Well, currently, the actual noise in quantum computers is not governed by fundamental limits, but by engineering challenges. Realizing this scaling limit therefore amounts to engineering improvements in the system size and noise rate that are achieved over time. Sure, at some point that scaling limit is also going to hit a fundamental barrier below which the noise cannot be improved. But we are surely still far away from that limit. What is more, already now logical qubits are starting to work and achieve beyond-breakeven fidelities. So even if the engineering improvements should flatten out from here onward, QEC will keep the effective noise rates improving, and may even accelerate that trend in the intermediate future.
The complexity maniac
“All the hard complexity-theoretic evidence for quantum advantage is in the near-ideal regime, but now you are claiming advantage for the low-fidelity version of that task.”
This is probably the strongest counter-argument in my opinion, and I gave my best response above. Let me just add that this is a question about computational complexity. In the end, all of complexity theory is based on belief. The only real evidence we have for the hardness of any task is the absence of an efficient algorithm, or the reduction to a paradigmatic, well-studied task for which there is no efficient algorithm.
I am not sure how much I would bet that you cannot find an efficient algorithm for finite-fidelity RCS in the regime of the experiments, but it is certainly a pizza.
The enthusiastic skeptic
“There is no verification test that just depends on the classical samples, is efficient and does not make any assumptions about the device. In particular, you cannot unconditionally verify fidelity just from the classical samples. Why should I believe the data?”
Yes, sure, the current advantage demonstrations are not device-independent. But the comparison you should have in mind are Bell tests. The first proper Bell tests of Aspect and others in the 80s were not free of loopholes. They still allowed for contrived explanations of the data that did not violate local realism. Still, I can hardly believe that anyone would argue that Bell inequalities were not violated already back then.
As the years passed, these remaining loopholes were closed. To be a skeptic of the data, people needed to come up with more and more adversarial scenarios that could explain it. We are working toward the same outcome for quantum advantage demonstrations: coming up with better schemes and better tests that require fewer and fewer assumptions about, or less knowledge of, the specifics of the device.
The “this is unfair” argument
“When you chose the gates and architecture of the circuit depending on your device, you tailored the task too much to the device, and that is unfair. Not even the different RCS experiments solve exactly the same task.”
This is not really an argument against the achievement of quantum advantage but more against the particular choices of circuit ensembles in the experiments. Sure, the specific computations solved are still somewhat tailored to the hardware itself and in this sense the experiments are not hardware-independent yet, but they still solve fine computational tasks. Moving away from such hardware-tailored task specifications is another important next step and we are working on it.
In the third and last part of this mini series I will address next steps in quantum advantage that aim at closing some of the remaining loopholes. The most important—and theoretically interesting—one is to enable efficient verification of quantum advantage using less or even no specific knowledge about the device that was used, but just the measurement outcomes.
References
(survey) Hangleiter, D. & Eisert, J. Computational advantage of quantum random sampling. Rev. Mod. Phys. 95, 035001 (2023).
(classSim1) Pan, F., Chen, K. & Zhang, P. Solving the sampling problem of the Sycamore quantum circuits. Phys. Rev. Lett. 129, 090502 (2022).
(classSim2) Kalachev, G., Panteleev, P., Zhou, P. & Yung, M.-H. Classical Sampling of Random Quantum Circuits with Bounded Fidelity. arXiv:2112.15083 (2021).
(WN) Dalzell, A. M., Hunter-Jones, N. & Brandão, F. G. S. L. Random Quantum Circuits Transform Local Noise into Global White Noise. Commun. Math. Phys. 405, 78 (2024).
(PT1) Morvan, A. et al. Phase transitions in random circuit sampling. Nature 634, 328–333 (2024).
(PT2) Ware, B. et al. A sharp phase transition in linear cross-entropy benchmarking. arXiv:2305.04954 (2023).
(Spoof1) Barak, B., Chou, C.-N. & Gao, X. Spoofing Linear Cross-Entropy Benchmarking in Shallow Quantum Circuits. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021) (ed. Lee, J. R.) vol. 185, 30:1–30:20 (2021).
(Spoof2) Gao, X. et al. Limitations of Linear Cross-Entropy as a Measure for Quantum Advantage. PRX Quantum 5, 010334 (2024).
(SimIQP) Bremner, M. J., Montanaro, A. & Shepherd, D. J. Achieving quantum supremacy with sparse and noisy commuting quantum computations. Quantum 1, 8 (2017).
(SimRCS) Aharonov, D., Gao, X., Landau, Z., Liu, Y. & Vazirani, U. A polynomial-time classical algorithm for noisy random circuit sampling. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing 945–957 (2023).
Recently, I gave a couple of perspective talks on quantum advantage, one at the annual retreat of the CIQC and one at a recent KITP programme. I started off by polling the audience on who believed quantum advantage had been achieved. Just this one, simple question.
The audience was mostly experimental and theoretical physicists with a few CS theory folks sprinkled in. I was sure that these audiences would be overwhelmingly convinced of the successful demonstration of quantum advantage. After all, more than half a decade has passed since the first experimental claim (G1) of “quantum supremacy” as the patron of this blog’s institute called the idea “to perform tasks with controlled quantum systems going beyond what can be achieved with ordinary digital computers” (Preskill, p. 2) back in 2012. Yes, this first experiment by the Google team may have been simulated in the meantime, but it was only the first in an impressive series of similar demonstrations that became bigger and better with every year that passed. Surely, so I thought, a significant part of my audiences would have been convinced of quantum advantage even before Google’s claim, when so-called quantum simulation experiments claimed to have performed computations that no classical computer could do (e.g. (qSim)).
I could not have been more wrong.
In both talks, less than half of the people in the audience thought that quantum advantage had been achieved.
In the discussions that ensued, I came to understand what folks criticized about the experiments that have been performed and even the concept of quantum advantage to begin with. But more on that later. Most of all, it seemed to me, the community had dismissed Google’s advantage claim because of the classical simulation shortly after. It hadn’t quite kept track of all the advances—theoretical and experimental—since then.
In a mini-series of three posts, I want to remedy this and convince you that the existing quantum computers can perform tasks that no classical computer can do. Let me caution, though, that the experiments I am going to talk about solve a (nearly) useless task. Nothing of what I say implies that you should (yet) be worried about your bank accounts.
I will start off by recapping what quantum advantage is and how it has been demonstrated in a set of experiments over the past few years.
Part 1: What is quantum advantage and what has been done?
To state the obvious: we are now fairly convinced that noiseless quantum computers would be able to solve problems efficiently that no classical computer could solve efficiently. In fact, we have been convinced of that since the mid-1990s, when Lloyd and Shor discovered two basic quantum algorithms: simulating quantum systems and factoring large numbers. Both of these are tasks where we are as certain as we can be that no classical computer can solve them efficiently. So why talk about quantum advantage 20 and 30 years later?
The idea of a quantum advantage demonstration—be it on a completely useless task even—emerged as a milestone for the field in the 2010s. Achieving quantum advantage would finally demonstrate that quantum computing was not just a random idea of a bunch of academics who took quantum mechanics too seriously. It would show that quantum speedups are real: We can actually build quantum devices, control their states and the noise in them, and use them to solve tasks which not even the largest classical supercomputers could do—and these are very large.
What is quantum advantage?
But what exactly do we mean by “quantum advantage”? It is a vague concept, for sure. But some essential criteria that a demonstration should certainly satisfy are probably the following.
The quantum device needs to solve a pre-specified computational task. This means that there needs to be an input to the quantum computer. Given the input, the quantum computer must then be programmed to solve the task for the given input. This may sound trivial. But it is crucial because it delineates programmable computing devices from just experiments on any odd physical system.
There must be a scaling difference in the time it takes for a quantum computer to solve the task and the time it takes for a classical computer. As we make the problem or input size larger, the difference between the quantum and classical solution times should increase disproportionately, ideally exponentially.
And finally: the actual task solved by the quantum computer should not be solvable by any classical machine (at the time).
Achieving this last criterion using imperfect, noisy quantum devices is the challenge that the idea of quantum supremacy set for the field. After all, running any of our favourite quantum algorithms in a classically hard regime on these devices is completely out of the question. They are too small and too noisy. So the field had to come up with the smallest and most noise-robust conceivable quantum algorithm that still has a significant scaling advantage over classical computation.
Random circuits are really hard to simulate!
The idea is simple: we just run a random computation, constructed in a way that is as favorable as we can make it to the quantum device while being as hard as possible classically. This may strike you as a pretty unfair way to come up with a computational task—it is just built to be hard for classical computers without any other purpose. But: it is a fine computational task. There is an input: the description of the quantum circuit, drawn randomly. The device needs to be programmed to run this exact circuit. And there is a task: just return whatever this quantum computation would return. These are strings of 0s and 1s drawn from a certain distribution. Getting the distribution of the strings right for a given input circuit is the computational task.
This task, dubbed random circuit sampling, can be solved on a classical as well as a quantum computer, but there is a (presumably) exponential advantage for the quantum computer. More on that in Part 2.
For now, let me tell you about the experimental demonstrations of random circuit sampling. Allow me to be slightly more formal. The task solved in random circuit sampling is to produce bit strings \(x\) distributed according to the Born-rule outcome distribution

\[ p_C(x) = \left| \langle x | C | 0^n \rangle \right|^{2} \]

of a sequence \(C\) of elementary quantum operations (unitary rotations of one or two qubits at a time) which is drawn randomly according to certain rules. This circuit is applied to a reference state \(|0^n\rangle\) on the quantum computer and then measured, giving the string \(x\) as an outcome.
The breakthrough: classically hard programmable quantum computations in the real world
In the first quantum supremacy experiment (G1) by the Google team, the quantum computer was built from 53 superconducting qubits arranged in a 2D grid. The operations were randomly chosen simple one-qubit gates (drawn from \(\sqrt{X}\), \(\sqrt{Y}\), \(\sqrt{W}\)) and deterministic two-qubit gates called fSim applied in a 2D pattern, and repeated a certain number of times (the depth of the circuit). The limiting factor in these experiments was the quality of the two-qubit gates and the measurements, with error probabilities around 0.6 % and 4 %, respectively.
A very similar experiment was performed by the USTC team on 56 qubits (U1) and both experiments were repeated with better fidelities (0.4 % and 1 % for two-qubit gates and measurements) and slightly larger system sizes (70 and 83 qubits, respectively) in the past two years (G2,U2).
Using a trapped-ion architecture, the Quantinuum team also demonstrated random circuit sampling on 56 qubits but with arbitrary connectivity (random regular graphs) (Q). There, the two-qubit gates were \(\pi/2\)-rotations around \(Z \otimes Z\), the single-qubit gates were uniformly random, and the error rates much better (0.15 % for both two-qubit gate and measurement errors).
All the experiments ran random circuits on varying system sizes and circuit depths, and collected thousands to millions of samples from a few random circuits at a given size. To benchmark the quality of the samples, the widely accepted benchmark is now the linear cross-entropy (XEB) benchmark defined as

\[ \chi_{\mathrm{XEB}} = 2^n \, \mathbb{E}_C \, \mathbb{E}_{x \sim q_C} \left[ p_C(x) \right] - 1 \]

for an \(n\)-qubit circuit. The expectation over \(C\) is over the random choice of circuit and the expectation over \(x\) is over the experimental distribution \(q_C\) of the bit strings. In other words, to compute the XEB given a list of samples, you ‘just’ need to compute the ideal probability \(p_C(x)\) of obtaining each sample from the circuit and average the outcomes.
The XEB is nice because it gives 1 for ideal samples from sufficiently random circuits and 0 for uniformly random samples, and it can be estimated accurately from just a few samples. Under the right conditions, it turns out to be a good proxy for the many-body fidelity of the quantum state prepared just before the measurement.
This tells us that we should expect an XEB score of roughly \(e^{-cnd}\) for some noise- and architecture-dependent constant \(c\). All of the experiments achieved a value of the XEB that was significantly (in the statistical sense) far away from 0, as you can see in the plot below. This shows that something nontrivial is going on in the experiments, because the fidelity we expect for a maximally mixed or random state is \(2^{-n}\), which is less than \(10^{-13}\,\%\) for all the experiments.
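If you want to see the two limits of the XEB in action (near 1 for ideal samples, near 0 for uniformly random samples), here is a toy numpy check; I replace the random circuit by a Haar-random unitary on a handful of qubits, which is my own simplification for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
dim = 2 ** n

# Haar-random unitary via QR of a complex Gaussian matrix (with a phase fix).
z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
q, r = np.linalg.qr(z)
u = q * (np.diag(r) / np.abs(np.diag(r)))

p = np.abs(u[:, 0]) ** 2                  # ideal output distribution for input |0...0>

ideal_samples = rng.choice(dim, size=2000, p=p)
uniform_samples = rng.integers(0, dim, size=2000)

def xeb(samples):
    # 2^n times the average ideal probability of the observed outcomes, minus 1.
    return dim * p[samples].mean() - 1

print(f"XEB from ideal samples:   {xeb(ideal_samples):+.2f}")   # close to 1
print(f"XEB from uniform samples: {xeb(uniform_samples):+.2f}") # close to 0
```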
The complexity of simulating these experiments is roughly governed by an exponential in either the number of qubits or the maximum bipartite entanglement generated. Figure 5 of the Quantinuum paper has a nice comparison.
It is not easy to say how much leverage an XEB significantly lower than 1 gives a classical spoofer. But one can certainly use it to judiciously change the circuit a tiny bit to make it easier to simulate.
Even then, reproducing the low scores between 0.05 % and 0.2 % of the experiments is extremely hard on classical computers. To the best of my knowledge, producing samples that match the experimental XEB score has only been achieved for the first experiment from 2019 (PCZ). This simulation already exploited the relatively low XEB score to simplify the computation, but even for the slightly larger 56-qubit experiments, these techniques may not be feasible to run. So to the best of my knowledge, the only one of the experiments which may actually have been simulated is the 2019 experiment by the Google team.
If there are better methods, or computers, or more willingness to spend money on simulating random circuits today, though, I would be very excited to hear about it!
Proxy of a proxy of a benchmark
Now, you may be wondering: “How do you even compute the XEB or fidelity in a quantum advantage experiment in the first place? Doesn’t it require computing outcome probabilities of the supposedly hard quantum circuits?” And that is indeed a very good question. After all, the quantum advantage of random circuit sampling is based on the hardness of computing these probabilities. This is why, to get an estimate of the XEB in the advantage regime, the experiments needed to use proxies and extrapolation from classically tractable regimes.
This will be important for Part 2 of this series, where I will discuss the evidence we have for quantum advantage, so let me give you some more detail. To extrapolate, one can just run smaller circuits of increasing sizes and extrapolate to the size in the advantage regime. Alternatively, one can run circuits with the same number of gates but with added structure that makes them classically simulatable, and extrapolate to the advantage circuits. Extrapolation is thus based on samples from experiments other than the quantum advantage experiments themselves. All of the experiments did this.
A separate estimate of the XEB score is based on proxies. An XEB proxy uses the samples from the advantage experiments, but computes a different quantity than the XEB, one that can actually be computed and for which one can collect independent numerical and theoretical evidence that it matches the XEB in the relevant regime. For example, the Google experiments averaged outcome probabilities of modified circuits that were related to the true circuits but easier to simulate.
The Quantinuum experiment did something entirely different, which is to estimate the fidelity of the advantage experiment by inverting the circuit on the quantum computer and measuring the probability of coming back to the initial state.
All of the methods used to estimate the XEB of the quantum advantage experiments required some independent verification based on numerics on smaller sizes and induction to larger sizes, as well as theoretical arguments.
In the end, the advantage claims are thus based on a proxy of a proxy of the quantum fidelity. This is not to say that the advantage claims do not hold. In fact, I will argue in my next post that this is just the way science works. I will also tell you more about the evidence that the experiments I described here actually demonstrate quantum advantage and discuss some skeptical arguments.
Let me close this first post with a few notes.
In describing the quantum supremacy experiments, I focused on random circuit sampling which is run on programmable digital quantum computers. What I neglected to talk about is boson sampling and Gaussian boson sampling, which are run on photonic devices and have also been experimentally demonstrated. The reason for this is that I think random circuits are conceptually cleaner since they are run on processors that are in principle capable of running an arbitrary quantum computation while the photonic devices used in boson sampling are much more limited and bear more resemblance to analog simulators.
I want to continue my poll here, so feel free to write in the comments whether or not you believe that quantum advantage has been demonstrated (by these experiments) and if not, why.
References
[G1] Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510 (2019).
[Preskill] Preskill, J. Quantum computing and the entanglement frontier. arXiv:1203.5813 (2012).
[qSim] Choi, J. et al. Exploring the many-body localization transition in two dimensions. Science 352, 1547–1552 (2016).
[U1] Wu, Y. et al. Strong Quantum Computational Advantage Using a Superconducting Quantum Processor. Phys. Rev. Lett. 127, 180501 (2021).
[G2] Morvan, A. et al. Phase transitions in random circuit sampling. Nature 634, 328–333 (2024).
[U2] Gao, D. et al. Establishing a New Benchmark in Quantum Computational Advantage with 105-qubit Zuchongzhi 3.0 Processor. Phys. Rev. Lett. 134, 090601 (2025).
[Q] DeCross, M. et al. Computational Power of Random Quantum Circuits in Arbitrary Geometries. Phys. Rev. X 15, 021052 (2025).
[PCZ] Pan, F., Chen, K. & Zhang, P. Solving the sampling problem of the Sycamore quantum circuits. Phys. Rev. Lett. 129, 090502 (2022).
I recently listened again to Richard Feynman explaining why the flowing of time is probably an illusion. In modern physics time is just a coordinate, on the same footing as space, and the universe can be described as a four-dimensional object — a spacetime block. In that view, nothing really “flows”. All events simply are, laid out in a 4D structure. What we experience as the passage of time is tied instead to the arrow of entropy: the fact that we move through a sequence of states ordered by increasing disorder, and that memory itself is asymmetric.
As you grow up, teachers try to teach you how the world works. This is more difficult than it sounds, because teaching you something is a much harder goal than just telling you something. A teacher wants you to remember what you’re told. They want you to act on it, and to generalize it. And they want you to do this not just for today’s material, but to set a foundation for next year, and the next. They’re setting you up for progress through a whole school system, with its own expectations.
Because of that, not everything a teacher tells you is, itself, a fact about the world. Some things you hear from teachers are like the scaffolds on a building. They’re facts that only make sense in the context of school: support that lets you build up to a point where you can learn other facts, and then throw away the school facts that got you there.
Not every student uses all of that scaffolding, though. The scaffold has to be complete enough that some students can use it to go on, getting degrees in science or mathematics, and eventually becoming researchers where they use facts more deeply linked to the real world. But most students don’t become researchers. So the scaffold sits there, unused. And many people, as their lives move on, mistake the scaffold for the real world.
Here’s an example. How do you calculate something like this?
From school, you might remember order of operations, or PEMDAS. First parentheses, then exponents, multiplication, division, addition, and finally subtraction. If you ran into that calculation in school, you could easily work it out.
But out of school, in the real world? Trick question, you never calculate something like that to begin with.
When I wrote this post, I had to look up how to write and . In the research world, people are far more likely to run into calculations like this:
Here, it’s easier to keep track of what order you need to do things. In other situations, you might be writing a computer program (or an Excel spreadsheet formula, which is also a computer program). Then you follow that programming language’s rules for order of operations, which may or may not match PEMDAS.
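As a minimal illustration (Python here, just as one example of a language's conventions; other languages and spreadsheets make their own choices):

```python
# Operator precedence is a convention of the language, not a law of nature.
print(6 / 2 * 3)    # 9.0 -- division and multiplication share precedence, evaluated left to right
print(-3 ** 2)      # -9  -- exponentiation binds tighter than unary minus in Python
print(2 ** 3 ** 2)  # 512 -- ** is right-associative: this is 2 ** (3 ** 2)
# Many spreadsheet programs instead give unary minus higher precedence than ^,
# so a formula like =-3^2 evaluates to 9 there.
```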
PEMDAS was taught to you in school for good reason. It got you used to following rules to understand notation, and gave you tools the teachers needed to teach you other things. But it isn’t a fact about the universe. It’s a fact about school.
Once you start looking around for these “school facts”, they show up everywhere.
Are there really “three states of matter”, solid, liquid, and gas? Or four, if you add plasma? Well, sort of. There are real scientific definitions for solids, liquids, gases, and plasmas, and they play a real role in how people model big groups of atoms, “matter” in a quite specific sense. But they can’t be used to describe literally everything. If you start asking what state of matter light or spacetime is, you’ve substituted a simplification that was useful for school (“everything is one of three states of matter”) for the actual facts in the real world.
If you remember a bit further, maybe you remember there are two types of things, matter and energy? You might have even heard that matter and antimatter annihilate into energy. These are also just school facts, though. “Energy” isn’t something things are made of, it’s a property things have. Instead, your teachers were building scaffolding for understanding the difference between massive and massless particles, or between dark matter and dark energy. Each of those uses different concepts of matter and energy, and each in turn is different than the concept of matter in its states of solid, liquid, and gas. But in school, you need a consistent scaffold to learn, not a mess of different definitions for different applications. So unless you keep going past school, you don’t learn that.
Physics in school likes to work with forces, and forces do sometimes make an appearance in the real world, for engineers for example. But if you’re asking a question about fundamental physics, like “is gravity really a force?”, then you’re treating a school fact as if it were a research fact. Fundamental physics doesn’t care about forces in the same way. It uses different mathematical tools, like Lagrangians and Hamiltonians, to calculate the motion of objects in systems, and uses “force” in a pop-science way to describe fundamental interactions.
If you get good enough at this, you can spot which things you learned in school were likely just scaffolding “school facts”, and which are firm enough that they may hold up further on. Any simple division of the world into categories is likely a school fact, one that lets you do exercises on your homework but gets much more complicated when the real world gets involved. Contradictory or messy concepts are usually another sign, marking something fuzzy used to get students comfortable rather than something precise enough for professionals to use. Keep an eye out, and even if you don’t yet know the real facts, you’ll know enough to know what you’re missing.
Today I was saddened to hear of the passing of Hans Jensen, a physicist and former colleague in the CDF experiment at Fermilab. There is an obituary page here with nice pics and a bio if you want detail on his interesting, accomplished life. Here I thought I would remember him by pasting an excerpt of my 2016 book, "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab", where he is featured. The topic of the anecdote is the data collection for the top quark search. The date is December 1992. ---
This year opened in slow motion for me, at least work-wise. I have been on parental leave since December 16, when my third and fourth sons were born within one minute of one another, but of course a workaholic can never stand completely still. In fact, even as we speak I am sitting and typing at the keyboard with my right hand only (about 3-4 letters per second), while I hold Alessandro on my lap with the left one and move my legs rhythmically to keep him entertained.
One of my favorite weather sayings. And I ran into a neighbor on my way home from the errand I was running so I got to say it in person, always a treat. 3 degrees Fahrenheit, sunny and windy in Madison right now, really not bad at all. When I moved here I thought Wisconsin winter was going to be a fifth season, a new thing that would test and harden me, but actually there are only a few days like today which are really a different kind of cold than I grew up with on the East Coast, and I’ll be honest, I enjoy them. The cold air reboots your head. I am wearing my biggest, most shapeless sweater, and a scarf in the pattern of the flag of Schiermonnikoog.
If you liked this post you might like Tom Scocca’s weather reviews. I don’t think Tom has ever experienced a good old-fashioned Wisconsin booger-freezer, though.
(Update to this post. “A good old-fashioned something or other” isn’t both good and old-fashioned, it’s a thing characterized as “good old,” which is a semantic unit on its own, and is also old-fashioned. It means something like “good old old-fashioned Wisconsin booger-freezer,” which of course you would not say. Are there any other examples like this, where two two-word phrases which overlap combine into a three-word phrase? It’s kind of like composable morphisms in a category.)
A basic problem in sieve theory is to understand what happens when we start with the integers (or some subinterval of the integers) and remove some congruence classes for various moduli . Here we shall concern ourselves with the simple setting where we are sieving the entire integers rather than an interval, and are only removing a finite number of congruence classes . In this case, the set of integers that remain after the sieving is periodic with period , so one can work without loss of generality in the cyclic group . One can then ask: what is the density of the sieved set
If the were all coprime, then it is easy to see from the Chinese remainder theorem that the density is given by the product
However, when the are not coprime, the situation is more complicated. One can use the inclusion-exclusion formula to get a complicated expression for the density, but it is not easy to work with. Sieve theory also supplies various useful upper and lower bounds (starting with the classical Bonferroni inequalities), but these do not give exact formulae.
In this blog post I would like to note one simple fact, due to Rogers, that one can say about this problem:
Theorem 1 (Rogers’ theorem) For fixed , the density of the sieved set is maximized when all the vanish. Thus,
Example 2 If one sieves out , , and , then only remains, giving a density of . On the other hand, if one sieves out , , and , then the remaining elements are and , giving the larger density of .
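To make the phenomenon concrete, here is a small brute-force check in the same spirit, with moduli 2 and 4 chosen purely for illustration (they are not the moduli of Example 2):

```python
from math import lcm

def sieved_density(classes):
    """Density of the integers surviving after removing the classes a mod q in `classes`.
    The sieved set is periodic with period lcm of the moduli, so a finite check suffices."""
    Q = lcm(*(q for _, q in classes))
    survivors = [n for n in range(Q) if all(n % q != a % q for a, q in classes)]
    return len(survivors), Q

print(sieved_density([(0, 2), (1, 4)]))  # (1, 4): only 3 mod 4 survives, density 1/4
print(sieved_density([(0, 2), (0, 4)]))  # (2, 4): 1 and 3 mod 4 survive, density 1/2
# Consistent with Rogers' theorem: choosing all residues equal to 0 maximizes the density.
```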
This theorem is somewhat obscure: its only appearance in print is on pages 242-244 of this 1966 text of Halberstam and Roth, where the authors write in a footnote that the result is “unpublished; communicated to the authors by Professor Rogers”. I have only been able to find it cited in three places in the literature: in this 1996 paper of Lewis, in this 2007 paper of Filaseta, Ford, Konyagin, Pomerance, and Yu (where they credit Tenenbaum for bringing the reference to their attention), and briefly in this 2008 paper of Ford. As far as I can tell, the result is not available online, which could explain why it is rarely cited (and also not known to AI tools). This became relevant recently with regards to Erdös problem 281, posed by Erdös and Graham in 1980, which was recently solved by Neel Somani via an elegant ergodic theory argument located through an AI query. However, shortly after this solution was found, it was discovered by KoishiChan that Rogers’ theorem reduced this problem immediately to a very old result of Davenport and Erdös from 1936. Apparently, Rogers’ theorem was so obscure that even Erdös was unaware of it when posing the problem!
Modern readers may see some similarities between Rogers’ theorem and various rearrangement or monotonicity inequalities, suggesting that the result may be proven by some sort of “symmetrization” or “compression” method. This is indeed the case, and is basically Rogers’ original proof. We can modernize it a bit as follows. Firstly, we can abstract into a finite cyclic abelian group , with residue classes now becoming cosets of various subgroups of . We can then take complements and restate Rogers’ theorem as follows:
Theorem 3 (Rogers’ theorem, again) Let be cosets of a finite cyclic abelian group . Then
Example 4 Take , , , and . Then the cosets , , and cover the residues , with a cardinality of ; but the subgroups cover the residues , having the smaller cardinality of .
Intuitively: “sliding” the cosets together reduces the total amount of space that these cosets occupy. As pointed out in comments, the requirement of cyclicity is crucial; four lines in a finite affine plane already suffice to be a counterexample otherwise.
By factoring the cyclic group into -groups, we see that Rogers’ theorem is an immediate consequence of two observations:
Theorem 5 (Rogers’ theorem for cyclic groups of prime power order) Rogers’ theorem holds when for some prime power .
Theorem 6 (Rogers’ theorem preserved under products) If Rogers’ theorem holds for two finite abelian groups of coprime orders, then it also holds for the product .
The case of cyclic groups of prime power order is trivial, because the subgroups of are totally ordered. In this case is simply the largest of the , which has the same size as and thus has cardinality less than or equal to that of .
The preservation of Rogers’ theorem under products is also routine to verify. By the coprime orders of and standard group theoretic arguments (e.g., Goursat’s lemma, the Schur–Zassenhaus theorem, or the classification of finite abelian groups), one can see that any subgroup of splits as a direct product of subgroups of respectively, so the cosets also split as
Applying Rogers’ theorem for to each “vertical slice” of and summing, we see that
and then applying Rogers’ theorem for to each “horizontal slice” of and summing, we obtain
Combining the two inequalities, we obtain the claim.
Apparently Dante conceived of the universe as a 3-sphere! That’s a 3-dimensional space formed by taking two solid 3-dimensional balls and completely gluing their surfaces together.
In his Divine Comedy, Dante describes the usual geocentric universe of his day. It has concentric spheres for the Moon and Sun, the various planets, and then the so-called ‘fixed stars’. Outside the sphere of fixed stars, there’s a sphere for the ‘first mover’, the Primum Mobile. Ptolemy believed in this, and so did Copernicus—and even Galileo did, at first.
But that’s not all! Outside that sphere, Dante describes 9 concentric spheres of the Empyrean, where the various levels of angels live. And as we go up into the Empyrean, these spheres get smaller. They all surround a point—which is God. This is shown above in an illustration by Gustave Doré.
At the opposite extreme, at the center of the Earth, is another point — and that’s where Satan lives, surrounded by the 9 levels of Hell.
Altogether we have a 3-dimensional closed universe of the sort mathematicians call a 3-sphere! You can also think of it as the one-point compactification of 3d Euclidean space with God as the point at infinity and Satan at the farthest point from that: the origin.
Much later Einstein also postulated that the universe was a 3-sphere, which was kept from collapsing by the cosmological constant. This was before Hubble and others saw that the universe is expanding. General relativity also allows space to be a 3-sphere that expands with time and then recollapses in a Big Crunch, but that model doesn’t seem to fit the data very well.
Here are a couple of good references on this subject:
• Mark A. Peterson, Dante and the 3-sphere, American Journal of Physics 47 (1979), 1031–1035.
In the Paradiso Dante describes his ascent sphere by sphere through the Aristotelian universe to the Primum Mobile. Beyond this is the Empyrean, the abode of God and the angels. The conventional picture of the Empyrean seems to have been rather vague, geometrically speaking. In diagrams of the universe, for example, it was represented by the border area, outside the Primum Mobile, often richly populated with angelic beings. Dante, however, endows the Empyrean with a detailed and precise geometric structure. This structure is described in Canto 28, as if seen from the Primum Mobile, as a bright Point representing God, surrounded by nine concentric spheres representing the various angelic orders. The details which follow leave the almost inescapable impression that he conceives of these nine angelic spheres as forming one hemisphere of the entire universe and the usual Aristotelian universe up to the Primum Mobile as the other hemisphere, while he is standing more or less on the equator between them [….] Taken all together, then, his universe is a 3-sphere.
[….]
Dante himself believed he was expressing something entirely new at this juncture.
[….]
Dante’s elation with this idea—a feeling we may readily share — has traditionally left readers somewhat puzzled. That is just another way of saying that if this passage is not taken as a description of the organization of 2-spheres into a 3-sphere, then it is hard to see what the point of it is.
Like many other areas of modern analysis, analytic number theory often relies on the convenient device of asymptotic notation to express its results. It is common to use notation such as or , for instance, to indicate a bound of the form for some unspecified constant . Such implied constants vary from line to line, and in most papers, one does not bother to compute them explicitly. This makes the papers easier both to write and to read (for instance, one can use asymptotic notation to conceal a large number of lower order terms from view), and also means that minor numerical errors (for instance, forgetting a factor of two in an inequality) typically have no major impact on the final results. However, the price one pays for this is that many results in analytic number theory are only true in an asymptotic sense; a typical example is Vinogradov’s theorem that every sufficiently large odd integer can be expressed as the sum of three primes. In the first few proofs of this theorem, the threshold for “sufficiently large” was not made explicit.
There is, however, a small portion of analytic number theory devoted to explicit estimates, in which all constants are made completely explicit (and many lower order terms are retained). For instance, whereas the prime number theorem asserts that the prime counting function is asymptotic to the logarithmic integral , in this recent paper of Fiori, Kadiri, and Swidinsky the explicit estimate
is proven for all .
Such explicit results follow broadly similar strategies of proof to their non-explicit counterparts, but require a significant amount of careful book-keeping and numerical optimization; furthermore, any given explicit analytic number theory paper is likely to rely on the numerical results obtained in previous explicit analytic number theory papers. While the authors make their best efforts to avoid errors and to build some redundancy into their work, there have unfortunately been a few cases in which an explicit result stated in the published literature contained numerical errors that placed the numerical constants in several downstream applications of these papers into doubt.
Because of this propensity for error, updating any given explicit analytic number theory result to take into account computational improvements in other explicit results (such as zero-free regions for the Riemann zeta function) is not done lightly; such updates occur on the timescale of decades, and only by a small number of specialists in such careful computations. As a result, authors who need such explicit results are often forced to rely on papers that are a decade or more out of date, with constants that they know could in principle be improved by inserting more recent explicit inputs, but which they do not have the domain expertise to confidently update.
To me, this situation sounds like an appropriate application of modern AI and formalization tools – not to replace the most enjoyable aspects of human mathematical research, but rather to allow extremely tedious and time-consuming, but still necessary, mathematical tasks to be offloaded to semi-automated or fully automated tools.
Because of this, I (acting in my capacity as Director of Special Projects at IPAM) have just launched the integrated explicit analytic number theory network, a project partially hosted within the existing “Prime Number Theorem And More” (PNT+) formalization project. This project will consist of two components. The first is a crowdsourced formalization project to formalize a number of inter-related explicit analytic number theory results in Lean, such as the explicit prime number theorem of Fiori, Kadiri, and Swidinsky mentioned above; already some smaller results have been largely formalized, and we are making good progress (especially with the aid of modern AI-powered autoformalization tools) on several of the larger papers. The second, which will be run at IPAM with the financial and technical support of Math Inc., will be to extract from this network of formalized results an interactive “spreadsheet” of a large number of types of such estimates, with the ability to add or remove estimates from the network and have the numerical impact of these changes automatically propagate to other estimates in the network, similar to how changing one cell in a spreadsheet will automatically update other cells that depend on it. For instance, one could increase or decrease the numerical threshold to which the Riemann hypothesis is verified, and see the impact of this change on the explicit error terms in the prime number theorem; or one could “roll back” all the literature to a given date, and see what the best estimates on various analytic number theory expressions could still be derived from the literature available at that date. Initially, this spreadsheet will be drawn from direct adaptations of the various arguments from papers formalized within the network, but in a more ambitious second stage of the project we plan to use AI tools to modify these arguments to find more efficient relationships between the various numerical parameters than were provided in the source literature.
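As a rough sketch of the propagation idea only (the class below and all the names and numbers in it are placeholders of my own invention, not the actual tool or actual constants):

```python
class Cell:
    def __init__(self, name, formula, deps=()):
        self.name, self.formula, self.deps = name, formula, list(deps)

def evaluate(cells, overrides=None):
    """Recompute each estimate from its inputs, in dependency order, spreadsheet-style."""
    values = dict(overrides or {})
    for cell in cells:
        if cell.name not in values:
            values[cell.name] = cell.formula(*(values[d] for d in cell.deps))
    return values

cells = [
    Cell("zero_free_constant", lambda: 5.56),                          # hypothetical input
    Cell("psi_error", lambda c: 9.2 / c, deps=["zero_free_constant"]),  # hypothetical dependence
    Cell("pi_error", lambda e: 1.1 * e, deps=["psi_error"]),            # hypothetical dependence
]
print(evaluate(cells))
print(evaluate(cells, overrides={"zero_free_constant": 5.0}))  # an improved input propagates downstream
```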
These more ambitious outcomes will likely take several months before a working proof of concept can be demonstrated; but in the near term I will be grateful for any contributions to the formalization side of the project, which is being coordinated on the PNT+ Zulip channel and on the github repository. We are using a github issues based system to coordinate the project, similar to how it was done for the Equational Theories Project. Any volunteer can select one of the outstanding formalization tasks on the Github issues page and “claim” it as a task to work on, eventually submitting a pull request (PR) to the repository when proposing a solution (or to “disclaim” the task if for whatever reason you are unable to complete it). As with other large formalization projects, an informal “blueprint” is currently under construction that breaks up the proofs of the main results of several explicit analytic number theory papers into bite-sized lemmas and sublemmas, most of which can be formalized independently without requiring broader knowledge of the arguments from the paper that the lemma was taken from. (A graphical display of the current formalization status of this blueprint can be found here. At the current time of writing, many portions of the blueprint are disconnected from each other, but as the formalization progresses, more linkages should be created.)
One minor innovation implemented in this project is to label each task by a “size” (ranging from XS (extra small) to XL (extra large)) that is a subjective assessment of the task difficulty, with the tasks near the XS side of the spectrum particularly suitable for beginners to Lean.
We are permitting AI use in completing the proof formalization tasks, though we require that the AI use be disclosed, and that the code be edited by humans to remove excessive bloat. (We expect some of the AI-generated code to be rather inelegant; but no proof of these explicit analytic number theory estimates, whether human-generated or AI-generated, is likely to be all that pretty or worth reading for its own sake, so the downsides of using AI-generated proofs here are lower than in other use cases.) We of course require all submissions to typecheck correctly in Lean through Github’s Continuous Integration (CI) system, so that any incorrect AI-generated code will be rejected. We are also cautiously experimenting with ways in which AI can automatically or semi-automatically generate the formalized statements of lemmas and theorems, though here one has to be significantly more alert to the dangers of misformalizing an informally stated result, as this type of error cannot be automatically detected by a proof assistant.
We also welcome suggestions for additional papers or results in explicit analytic number theory to add to the network, and will have some blueprinting tasks in addition to the formalization tasks to convert such papers into a blueprinted sequence of small lemmas suitable for individual formalization.
Update: a personal log documenting the project may be found here.
People make social media accounts for their pets. Why not a scientific paper?
Anthropologist Ed Hagen made a Bluesky account for his recent preprint, “Menopause averted a midlife energetic crisis with help from older children and parents: A simulation study.” The paper’s topic itself is interesting (menopause is surprisingly rare among mammals, and he has a plausible account as to why), but not really the kind of thing I cover here.
Rather, it’s his motivation that’s interesting. Hagen didn’t make the account out of pure self-promotion or vanity. Instead, he’s promoting it as a novel approach to scientific publishing. Unlike Twitter, Bluesky is based on an open, decentralized protocol. Anyone can host an account compatible with Bluesky on their own computer, and anyone with the programming know-how can build a computer program that reads Bluesky posts. That means that nothing actually depends on Bluesky, in principle: the users have ultimate control.
Hagen’s idea, then, is that this could be a way to fulfill the role of scientific journals without channeling money and power to for-profit publishers. If each paper is hosted on a scientist’s own site, papers can link to each other by following one another. Scientists on Bluesky can follow or like the paper, or comment on and discuss it, creating a way to measure interest from the scientific community and to aggregate reviews, two things journals are supposed to provide.
I must admit, I’m skeptical. The interface really seems poorly suited for this. Hagen’s paper’s account is called @menopause-preprint.edhagen.net. What happens when he publishes another paper on menopause? What will he call it? How is he planning to keep track of interactions from other scientists with an account for every single paper? Won’t swapping between fifteen Bluesky accounts every morning get tedious? Or will he just do this with papers he wants to promote?
I applaud the general idea. Decentralized hosting seems like a great way to get around some of the problems of academic publishing. But this will definitely take a lot more work, if it’s ever going to be viable on a useful scale.
Still, I’ll keep an eye on it, and see if others give it a try. Stranger things have happened.
The Kondo effect is a neat piece of physics, an archetype of a problem involving strong electronic correlations and entanglement, with a long and interesting history and connections to bulk materials, nanostructures, and important open problems.
First, some stage setting. In the late 19th century, with the development of statistical physics and the kinetic theory of gases, and the subsequent discovery of electrons by JJ Thomson, it was a natural idea to try modeling the electrons in solids as a gas, as done by Paul Drude in 1900. Being classical, the Drude model misses a lot (If all solids contain electrons, why aren't all solids metals? Why is the specific heat of metals orders of magnitude lower than what a classical electron gas would imply?), but it does introduce the idea of electrons as having an elastic mean free path, a typical distance traveled before scattering off something (an impurity? a defect?) into a random direction. In the Drude picture, as \(T \rightarrow 0\), the only thing left to scatter charge carriers is disorder ("dirt"), and the resistivity of a conductor falls monotonically and approaches \(\rho_{0}\), the "residual resistivity", a constant set in part by the number of defects or impurities in the material. In the semiclassical Sommerfeld model, and then later in the nearly free electron model, this idea survives.
One small problem: in the 1930s (once it was much easier to cool materials down to very low temperatures), it was noticed that in many experiments (here and here, for example) the electrical resistivity of metals did not seem to fall and then saturate at some \(\rho_{0}\). Instead, as \(T \rightarrow 0\), \(\rho(T)\) would go through a minimum and then start increasing again, approximately like \(\delta \rho(T) \propto - \ln(T/T_{0})\), where \(T_{0}\) is some characteristic temperature scale. This is weird and problematic, especially since the logarithm formally diverges as \(T \rightarrow 0\).
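A quick toy calculation (coefficients entirely made up, in arbitrary units) shows how a \(-\ln T\) correction competing with the usual phonon contribution produces such a minimum:

```python
import numpy as np

# Toy resistivity: rho(T) = rho0 + a*T^5 (phonon-like) + b*ln(T0/T) (Kondo-like correction).
# All coefficients are made up; only the shape matters here.
rho0, a, b, T0 = 1.0, 1e-6, 1e-2, 100.0

def rho(T):
    return rho0 + a * T**5 + b * np.log(T0 / T)

T = np.linspace(1.0, 30.0, 2000)
T_min = T[np.argmin(rho(T))]
print(f"resistivity minimum near T = {T_min:.2f}")
# Setting d(rho)/dT = 0 gives T_min = (b / (5a))**(1/5), about 4.6 for these toy numbers;
# since b grows with the impurity concentration, the minimum shifts as (concentration)^(1/5).
```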
Over time, it became clear that this phenomenon was associated with magnetic impurities, atoms that have unpaired electrons typically in \(d\) orbitals, implying that somehow the spin of the electrons was playing an important role in the scattering process. In 1964, Jun Kondo performed the definitive perturbative treatment of this problem, getting the \(\ln T\) divergence.
[Side note: many students learning physics are at least initially deeply uncomfortable with the idea of approximations (that many problems can't be solved analytically and exactly, so we need to take limiting cases and make controlled approximations, like series expansions). What if a series somehow doesn't converge? This is that situation.]
The Kondo problem is a particular example of a "quantum impurity problem", and it is a particular limiting case of the Anderson impurity model. Physically, what is going on here? A conduction electron from the host metal could sit on the impurity atom, matching up with the unpaired impurity electron. However (much as we can often get away with ignoring it) like charges repel, and it is energetically very expensive (modeled by some "on-site" repulsive energy \(U\)) to do that. Parking that conduction electron long-term is not allowed, but a virtual process can take place, whereby a conduction electron with spin opposite to the localized moment can (in a sense) pop on there and back off, or swap places with the localized electron. The Pauli principle enforces this opposed spin restriction, leading to entanglement between the local electron and the conduction electron as they form a singlet. Moreover, this process generally involves conduction electrons at the Fermi surface of the metal, so it is a strongly interacting many-body problem. As the temperature is reduced, this process becomes increasingly important, so that the impurity's scattering cross section of conduction electrons grows as \(T\) falls, causing the resistivity increase.
(Figure: top, a cartoon of the Kondo scattering process; bottom, the ground state, a many-body singlet between the local moment and the conduction electrons.)
The eventual \(T = 0\) ground state of this system is a many-body singlet, with the localized spin entangled with a "Kondo cloud" of conduction electrons. The roughly \(\ln T\) resistivity correction rolls over and saturates. There ends up being a sharp peak (resonance) in the electronic density of states right at the Fermi energy. Interestingly, this problem actually can be solved exactly and analytically (!), as was done by Natan Andrei in this paper in 1980 and reviewed here.
This might seem to be the end of the story, but the Kondo problem has a long reach! With the development of the scanning tunneling microscope, it became possible to see Kondo resonances associated with individual magnetic impurities (see here). In semiconductor quantum dot devices, if the little dot has an odd number of electrons, then it can form a Kondo resonance that spans from the source electrode through the dot and into the drain electrode. This leads to a peak in the conductance that grows and saturates as \(T \rightarrow 0\) because it involves forward scattering. (See here and here). The same can happen in single-molecule transistors (see here, here, here, and a review here). Zero-bias peaks in the conductance from Kondo-ish physics can be a confounding effect when looking for other physics.
Of course, one can also have a material where there isn't a small sprinkling of magnetic impurities, but a regular lattice of spin-hosting atoms as well as conduction electrons. This can lead to heavy fermion systems, or Kondo insulators, and more exotic situations.
The depth of physics that can come out of such simple ingredients is one reason why the physics of materials is so interesting.
A fundamental component of my research work is the close collaboration with a large number of scientists from all around the world. This is the result of the very large scale of the experiments that are necessary to investigate the structure of matter at the smallest distance scales: building and operating those machines to collect the data and analyze it requires scientists to team up in large numbers - and this builds connections, cooperation, and long-time acquaintance; and in some cases, friendship.
Einstein realized that gravity is due to the curvature of spacetime, but let’s go back earlier:
On the 18th of August 1869, the eminent mathematician Sylvester gave a speech arguing that geometry is not separate from physics. He later published this speech in the journal Nature, and added a footnote raising the possibility that space is curved:
the laws of motion accepted as fact, suffice to prove in a general way that the space we live in is a flat or level space […], our existence therein being assimilable to the life of the bookworm in a flat page; but what if the page should be undergoing a process of gradual bending into a curved form?
Then, even more dramatically, he announced that the mathematician Clifford had been studying this!
Mr. W. K. Clifford has indulged in more remarkable speculations as to the possibility of our being able to infer, from certain unexplained phenomena of light and magnetism, the fact of our level space of three dimensions being in the act of undergoing in space of four dimensions (space as inconceivable to us as our space to the supposititious bookworm) a distortion analogous to the rumpling of the page.
This started a flame war in letters to Nature which the editor eventually shut off, saying “this correspondence must now cease”. Clifford later wrote about his theories in a famous short paper:
• William Clifford, On the space-theory of matter, Proceedings of the Cambridge Philosophical Society 2 (1876), 157–158.
It’s so short I can show you it in its entirety:
Riemann has shewn that as there are different kinds of lines and surfaces, so there are different kinds of space of three dimensions; and that we can only find out by experience to which of these kinds the space in which we live belongs. In particular, the axioms of plane geometry are true within the limits of experiment on the surface of a sheet of paper, and yet we know that the sheet is really covered with a number of small ridges and furrows, upon which (the total curvature not being zero) these axioms are not true. Similarly, he says, although the axioms of solid geometry are true within the limits of experiment for finite portions of our space, yet we have no reason to conclude that they are true for very small portions; and if any help can be got thereby for the explanation of physical phenomena, we may have reason to conclude that they are not true for very small portions of space.
I wish here to indicate a manner in which these speculations may be applied to the investigation of physical phenomena. I hold in fact
(1) That small portions of space are in fact of a nature analogous to little hills on a surface which is on the average flat; namely, that the ordinary laws of geometry are not valid in them.
(2) That this property of being curved or distorted is continually being passed on from one portion of space to another after the manner of a wave.
(3) That this variation of the curvature of space is what really happens in that phenomenon which we call the motion of matter, whether ponderable or etherial.
(4) That in the physical world nothing else takes place but this variation, subject (possibly) to the law of continuity.
I am endeavouring in a general way to explain the laws of double refraction on this hypothesis, but have not yet arrived at any results sufficiently decisive to be communicated.
To my surprise, the following paper argues that Clifford did experiments to test his ideas by measuring the polarization of the skylight during a solar eclipse in Sicily on December 22, 1870:
Clifford did indeed go on such an expedition, and did indeed try to measure the polarization of skylight as the Moon passed the Sun. I don’t know of any record of him saying why he did it.
I’ll skip everything the above paper says about why the polarization of skylight was interesting and mysterious in the 1800s, and quote just a small bit:
The English Eclipse Expedition set off earlier in December 1870, on the steamship H.M.S. Psyche scheduled for a stopover at Naples before continuing to Syracuse in Sicily. Unfortunately before arriving to her final call, the ship struck rocks and was wrecked off Catania. Fortunately all instruments and members of the party were saved without injury.
Originally it was the intention of the expedition to establish in Syracuse their head-quarters, but in view of the wreckage the group set up their base camp at Catania. There the expedition split up into three groups. The group that included Clifford put up an observatory in Augusta near Catania. The leader of this group was William Grylls Adams, professor of Natural Philosophy at King’s College, London.
In a report written by Prof. Adams, describing the expedition, we learn that the day of the eclipse, just before the time of totality, “… a dense cloud came over the Moon and shut out the whole, so that it was doubtful whether the Moon or the clouds first eclipsed the Sun […] Mr. Clifford observed light polarized on the cloud to the right and left and over the Moon, in a horizontal plane through the Moon’s centre [….] It will be seen from Mr. Clifford’s observations that the plane of polarization by the cloud…was nearly at right angles to the motion of the Sun”.
As was to be expected, Clifford’s eclipse observations on polarization did not produce any result. His prime intention, of detecting angular changes of the polarization plane due to the curving of space by the Moon in its transit across the Sun’s disk, was not fulfilled. At most he confirmed the already known information, i.e. the skylight polarization plane moves at right angles to the Sun-anti-Sun direction.
This is a remarkable prefiguring of Eddington’s later voyage to the West African island of Principe to measure the bending of starlight during an eclipse of the Sun in 1919. Just one of many stories in the amazing prehistory of general relativity!
Some people are disappointed in physics. Shocking, I know!
Those people, when careful enough, clarify that they’re disappointed in fundamental physics: not the physics of materials or lasers or chemicals or earthquakes, or even the physics of planets and stars, but the physics that asks big fundamental questions, about the underlying laws of the universe and where they come from.
Some of these people are physicists themselves, or were once upon a time. These often have in mind other directions physicists should have gone. They think that, with attention and funding, their own ideas would have gotten us closer to our goals than the ideas that, in practice, got the attention and the funding.
Most of these people, though, aren’t physicists. They’re members of the general public.
It’s disappointment from the general public, I think, that feels the most unfair to physicists. The general public reads history books, and hears about a series of revolutions: Newton and Maxwell, relativity and quantum mechanics, and finally the Standard Model. They read science fiction books, and see physicists finding “theories of everything”, and making teleporters and antigravity engines. And they wonder what made the revolutions stop, and postponed the science fiction future.
Physicists point out, rightly, that this is an oversimplified picture of how the world works. Something happens between those revolutions, the kind of progress not simple enough to summarize for history class. People tinker away at puzzles, and make progress. And they’re still doing that, even for the big fundamental questions. Physicists know more about even faraway flashy topics like quantum gravity than they did ten years ago. And while physicists and ex-physicists can argue about whether that work is on the right path, it’s certainly farther along its own path than it was. We know things we didn’t know before, and progress continues to be made. We aren’t at the “revolution” stage yet, or even all that close. But most progress isn’t revolutionary, and no-one can predict how often revolutions should take place. A revolution is never “due”, and thus can never be “overdue”.
Physicists, in turn, often don’t notice how normal this kind of reaction from the public is. They think people are being stirred up by grifters, or negatively polarized by excess hype, and that fundamental physics is facing an unfair reaction otherwise only seen for political hot-button topics. But while there are grifters, and people turned off by the hype…this is also just how the public thinks about science.
Have you ever heard the phrase “a cure for cancer”?
Fiction is full of scientists working on a cure for cancer, or who discovered a cure for cancer, or were prevented from finding a cure for cancer. It’s practically a trope. It’s literally a trope.
It’s also a real thing people work on, in a sense. Many scientists work on better treatments for a variety of different cancers. They’re making real progress, even dramatic progress. As many whose loved ones have cancer know, it’s much more likely for someone with cancer to survive than it was, say, twenty years ago.
But those cures don’t meet the threshold for science fiction, or for the history books. They don’t move us, like the polio vaccine did, from a world where you know many people with a disease to a world where you know none. They don’t let doctors give you a magical pill, like in a story or a game, that instantly cures your cancer.
For the vast majority of medical researchers, that kind of goal isn’t realistic, and isn’t worth thinking about. The few that do pursue it work towards extreme long-term solutions, like periodically replacing everyone’s skin with a cloned copy.
So while you will run into plenty of media descriptions of scientists working on cures for cancer, you won’t see the kind of thing the public expects is an actual “cure for cancer”. And people are genuinely disappointed about this! “Where’s my cure for cancer?” is a complaint on the same level as “where’s my hovercar?” There are people who think that medical science has made no progress in fifty years, because after all those news articles, we still don’t have a cure for cancer.
I appreciate that there are real problems in what messages are being delivered to the public about physics, both from hypesters in the physics mainstream and grifters outside it. But put those problems aside, and a deeper issue remains. People understand the world as best they can, as a story. And the world is complicated and detailed, full of many people making incremental progress on many things. Compared to a story, the truth is always at a disadvantage.
Asgar Jamneshan, Or Shalom and I have uploaded to the arXiv our paper “Polynomial towers and inverse Gowers theory for bounded-exponent groups”. This continues our investigation into the ergodic-theory approach to the inverse theory of Gowers norms over finite abelian groups . In this regard, our main result establishes a satisfactory (qualitative) inverse theorem for groups of bounded exponent:
Theorem 1 Let be a finite abelian group of some exponent , and let be -bounded with . Then there exists a polynomial of degree at most such that
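For orientation, the objects in question are the standard ones (in the usual normalization; the paper's own notation may differ slightly): for a 1-bounded \(f : G \to \mathbb{C}\), the Gowers uniformity norm is
\[ \|f\|_{U^{k+1}(G)}^{2^{k+1}} = \mathbb{E}_{x, h_1, \dots, h_{k+1} \in G} \prod_{\omega \in \{0,1\}^{k+1}} \mathcal{C}^{|\omega|} f(x + \omega_1 h_1 + \cdots + \omega_{k+1} h_{k+1}), \]
where \(\mathcal{C}\) denotes complex conjugation, and the conclusion of an inverse theorem of this type is a correlation bound of the shape
\[ \Bigl| \mathbb{E}_{x \in G} f(x)\, e(-P(x)) \Bigr| \gg_{k, m, \delta} 1 \]
for some phase polynomial \(P : G \to \mathbb{R}/\mathbb{Z}\) of degree at most \(k\).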
This type of result was previously known in the case of vector spaces over finite fields (by work of myself and Ziegler), groups of squarefree order (by work of Candela, González-Sánchez, and Szegedy), and in the case (by work of Jamneshan and myself). The case , for instance, is treated by this theorem but not covered by previous results. In the aforementioned paper of Candela et al., a result similar to the above theorem was also established, except that the polynomial was defined in an extension of rather than in itself (or equivalently, correlated with a projection of a phase polynomial, rather than directly with a phase polynomial on ). This result is consistent with a conjecture of Jamneshan and myself regarding what the “right” inverse theorem should be in any finite abelian group (not necessarily of bounded exponent).
In contrast to previous work, we do not need to treat the “high characteristic” and “low characteristic” cases separately; in fact, many of the delicate algebraic questions about polynomials in low characteristic do not need to be directly addressed in our approach, although this is at the cost of making the inductive arguments rather intricate and opaque.
As mentioned above, our approach is ergodic-theoretic, deriving the above combinatorial inverse theorem from an ergodic structure theorem of Host–Kra type. The most natural ergodic structure theorem one could establish here, which would imply the above theorem, would be the statement that if is a countable abelian group of bounded exponent, and is an ergodic -system of order at most in the Host–Kra sense, then would be an Abramov system – generated by polynomials of degree at most . This statement was conjectured many years ago by Bergelson, Ziegler, and myself, and is true in many “high characteristic” cases, but unfortunately fails in low characteristic, as recently shown by Jamneshan, Shalom, and myself. However, we are able to recover a weaker version of this statement here, namely that admits an extension which is an Abramov system. (This result was previously established by Candela et al. in the model case when is a vector space over a finite field.) By itself, this weaker result would only recover a correlation with a projected phase polynomial, as in the work of Candela et al.; but the extension we construct arises as a tower of abelian extensions, and in the bounded exponent case there is an algebraic argument (hinging on a certain short exact sequence of abelian groups splitting) that allows one to map the functions in this tower back to the original combinatorial group rather than an extension thereof, thus recovering the full strength of the above theorem.
It remains to prove the ergodic structure theorem. The standard approach would be to describe the system as a Host–Kra tower
where each extension of is a compact abelian group extension by a cocycle of “type” , and then attempt to show that each such cocycle is cohomologous to a polynomial cocycle. However, this appears to be impossible in general, particularly in low characteristic, as certain key short exact sequences fail to split in the required ways. To get around this, we have to work with a different tower, extending various levels of this tower as needed to obtain additional good algebraic properties of each level that enable one to split the required short exact sequences. The precise properties needed are rather technical, but the main ones can be described informally as follows:
• We need the cocycles to obey an “exactness” property, in that there is a sharp correspondence between the type of the cocycle (or any of its components) and its degree as a polynomial cocycle. (By general nonsense, any polynomial cocycle of degree is automatically of type ; exactness, roughly speaking, asserts the converse.) Informally, the cocycles should be “as polynomial as possible”.
• The systems in the tower need to have “large spectrum” in that the set of eigenvalues of the system forms a countable dense subgroup of the Pontryagin dual of the acting group (in fact we demand that a specific countable dense subgroup is represented).
• The systems need to be “pure” in the sense that the sampling map that maps polynomials on the system to polynomials on the group is injective for a.e. , with the image being a pure subgroup. Informally, this means that the problem of taking roots of a polynomial in the system is equivalent to the problem of taking roots of the corresponding polynomial on the group . In low characteristic, the root-taking problem becomes quite complicated, and we do not give a good solution to this problem either in the ergodic theory setting or the combinatorial one; however, purity at least lets one show that the two problems are (morally) equivalent to each other, which turns out to be what is actually needed to make the arguments work. There is also a technical “relative purity” condition we need to impose at each level of the extension to ensure that this purity property propagates up the tower, but I will not describe it in detail here.
It is then possible to recursively construct a tower of extensions that eventually reaches an extension of , for which the above useful properties of exactness, large spectrum, and purity are obeyed, and that the system remains Abramov at each level of the tower. This requires a lengthy process of “straightening” the cocycle by differentiating it, obtaining various “Conze–Lesigne” type equations for the derivatives, and then “integrating” those equations to place the original cocycle in a good form. At multiple stages in this process it becomes necessary to have various short exact sequences of (topological) abelian groups split, which necessitates the various good properties mentioned above. To close the induction one then has to verify that these properties can be maintained as one ascends the tower, which is a non-trivial task in itself.
Abstract. Coxeter and Dynkin diagrams classify a wide variety of structures, most notably finite reflection groups, lattices having such groups as symmetries, compact simple Lie groups and complex simple Lie algebras. The simply laced or “ADE” Dynkin diagrams also classify finite subgroups of SU(2) and quivers with finitely many indecomposable representations. This introductory tour of Coxeter and Dynkin diagrams, based on the column This Week’s Finds in Mathematical Physics, is made to accompany a series of five lecture videos.
I’m a bit sorry that I didn’t probe deeper into why Dynkin diagrams are what they are: that is, why these and no others? I’m also sorry I didn’t dig into the “black magic” that I mention at the end: that is, why does this black magic work? I’d also like to include a little comparison of the 4 lattices you get from the Lie algebra of a compact simple Lie group: the weight lattice, the coweight lattice, the root lattice, and the coroot lattice — merely because I tend to get them confused, and my exposition needed to say a bit about these.
Luckily I can add these other things later. And I think keeping it short and snappy has its own charms.
When Lee and Yang suggested that the laws of physics might not be invariant under spatial reflection — that there’s a fundamental difference between left and right — Pauli was skeptical. In a letter to Victor Weisskopf in January 1957, he wrote:
“Ich glaube aber nicht, daß der Herrgott ein schwacher Linkshänder ist.”
(I do not believe that the Lord is a weak left-hander.)
But just two days after Pauli wrote this letter, Chien-Shiung Wu’s experiment confirmed that Lee and Yang were correct. There’s an inherent asymmetry in nature.
We can trace this back to how the ‘left-handed’ fermions and antifermions live in a different representation of the Standard Model gauge group than the right-handed ones. And when we try to build grand unified theories that take this into account, we run into the fact that while we can fit the Standard Model gauge group into in various ways, not all these ways produce the required asymmetry. There’s a way where it fits into , which is too symmetrical to work… and alas, this one has a nice octonionic description!
To keep things simple I’ll explain this by focusing, not on the whole Standard Model gauge group, but its subgroup . Here is a theorem proved by Will Sawin in response to a question of mine on MathOverflow:
Theorem 10. There are exactly two conjugacy classes of subgroups of that are isomorphic to . One of them has a representative that is a subgroup of , while the other does not.
I’ll describe representatives of these two subgroups; then I’ll say a bit about how they show up in physics, and then I’ll show you Sawin’s proof.
We can get both subgroups in a unified way! There’s always an inclusion
and taking double covers of each group we get a 2-1 homomorphism
In particular we have
so composing with the exceptional isomorphisms:
we get a 2-1 homomorphism
Now, there are three obvious ways to include in . There is an obvious inclusion
but there are three obvious inclusions
namely the left one:
the right one:
and the diagonal one:
Combining these with our earlier maps, we actually get a one-to-one map from to . So we get three subgroups of , all isomorphic to :
There’s the left subgroup , which is the image of this composite homomorphism:
There’s the diagonal subgroup , which is the image of this:
And there’s the right subgroup , which is the image of this:
The left and right subgroups are actually conjugate, but the diagonal one is truly different! We’ll prove this by taking a certain representation of , called the Weyl spinor representation, and restricting it to those two subgroups. We’ll get inequivalent representations of . This proves the two subgroups aren’t conjugate.
This argument is also interesting for physics. When we restrict to the left subgroup, we get a representation of that matches what we actually see for one generation of fermions! This is the basis of the so-called grand unified theory, which should really be called the grand unified theory.
(In fact this works not only for but for the whole Standard Model gauge group, which is larger. I’m focusing on just because it makes the story simpler.)
When we restrict the Weyl spinor representation to the diagonal subgroup, we get a representation of that is not physically correct. Unfortunately, it’s the diagonal subgroup that shows up in several papers connecting the Standard Model gauge group to the octonions. I plan to say a lot more about this later.
The left subgroup
Let’s look at the left subgroup , the image of this composite:
has a 32-dimensional unitary representation called the ‘Dirac spinor’ representation. This representation is really on the exterior algebra . It’s the direct sum of two irreducible parts, the even grades and the odd grades:
Physicists call these two irreducible representations ‘right- and left-handed Weyl spinors’, and denote them as and since they’re 16-dimensional and one is the dual of the other.
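As a quick dimension check (a sketch, assuming the exterior algebra above is that of \(\mathbb{C}^5\), so the whole Dirac spinor space is 32-dimensional):

```python
from math import comb

# Grades of the exterior algebra of a 5-dimensional space: dimensions C(5, k) for k = 0..5.
dims = [comb(5, k) for k in range(6)]   # [1, 5, 10, 10, 5, 1], summing to 2^5 = 32
even = sum(dims[0::2])                  # grades 0, 2, 4
odd = sum(dims[1::2])                   # grades 1, 3, 5
print(dims, even, odd)                  # [1, 5, 10, 10, 5, 1] 16 16
```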
Let’s restrict the to the left subgroup and see what we get.
To do this, first we can restrict the along and get
Here is the trivial representation of , is the tautologous representation of , and is the tautologous rep of .
Then let’s finish the job by restricting this representation along . Restricting the of to gives : the sum of the tautologous representation of and the trivial representation. Restricting to the left copy of gives the tautologous representation , while restricting to this left copy gives : the sum of two copies of the trivial representation. All in all, we get this representation of :
This is what we actually see for one generation of left-handed fermions and antifermions in the Standard Model! The representation describes how the left-handed fermions in one generation transform under : 3 colors of quark and one ‘white’ lepton. The representation does the same for the left-handed antifermions. The left-handed fermions form an isospin doublet, giving us the , while the left-handed antifermions have no isospin, giving us the .
This strange lopsidedness is a fundamental feature of the Standard Model.
The right subgroup would work the same way, up to switching the words ‘left-handed’ and ‘right-handed’. And by Theorem 10, the left and right subgroups must be conjugate to each other in , because we’re about to see a subgroup that’s not conjugate to either of them.
The diagonal subgroup
Consider the diagonal subgroup , the image of this composite:
Let’s restrict the to .
To do this, first let’s restrict the along and get
as before. Then let’s restrict this representation along . The part works as before, but what happens when we restrict or along the diagonal map ? We get . So, this is the representation of that we get:
This is not good for the Standard Model. It describes a more symmetrical universe than ours, where both left-handed fermions and antifermions transform as doublets under .
The fact that we got a different answer this time proves that and are not conjugate in . So to complete the proof of Theorem 10, we only need to prove
1. Every subgroup of isomorphic to is conjugate to or .
2. is conjugate to a subgroup of , but is not.
I’ll prove 2, and then I’ll turn you over to Will Sawin to do the rest.
Why the diagonal subgroup fits in
Every rotation of extends to a rotation of that leaves the last coordinate fixed, so we get an inclusion , which lifts to an inclusion of the double covers, . Since we have exceptional isomorphisms
it’s natural to ask how the inclusion looks in these terms. And the answer is: it’s the diagonal map! In other words, we have a commutative diagram
Now, we can easily fit this into a larger commutative diagram involving some natural maps and :
We can simplify this diagram using the isomorphism Spin(6) ≅ SU(4):
and then we can use our friend the inclusion SU(3) → SU(4):
This shows that the diagonal subgroup of Spin(10) actually lies inside Spin(9)!
Why the left subgroup does not fit in Spin(9)
The three-fold way is a coarse classification of irreducible complex representations of a compact Lie group. Every such representation is of one and only one of these three kinds:
1) not self-dual: not isomorphic to its dual,
2a) orthogonal: isomorphic to its dual via an invariant nondegenerate symmetric bilinear form, also called an orthogonal structure,
2b) symplectic: isomorphic to its dual via an invariant nondegenerate antisymmetric bilinear form, also called a symplectic structure.
I’ve written about how these three cases are related to the division algebras $\mathbb{C}$, $\mathbb{R}$ and $\mathbb{H}$, respectively:
A complex representation is orthogonal iff it’s the complexification of a representation on a real vector space, and symplectic iff it’s the underlying complex representation of a representation on a quaternionic vector space.
But we don’t need most of this yet. For now we just need to know one fact: when $n$ is odd, every irreducible representation of Spin($n$), and thus every representation of this Lie group, is self-dual: that is, isomorphic to its dual. In particular this is true of Spin(9).
Why does this matter? Assume the left subgroup is a subgroup of Spin(9). When we restrict the Weyl spinor representation of Spin(10) to Spin(9) it will be self-dual, like every representation of Spin(9). Then when we restrict this representation further to the left subgroup it must still be self-dual, since the restriction of a self-dual representation is clearly self-dual.
However, we know this representation is
$$\mathbb{C}^2 \otimes (\mathbb{C}^3 \oplus \mathbb{C}) \;\oplus\; (\mathbb{C} \oplus \mathbb{C}) \otimes (\overline{\mathbb{C}}^3 \oplus \mathbb{C})$$
and this is not self-dual, since $(\mathbb{C}^2)^* \cong \mathbb{C}^2$ and $\mathbb{C}^* \cong \mathbb{C}$ but $(\mathbb{C}^3)^* \cong \overline{\mathbb{C}}^3 \not\cong \mathbb{C}^3$.
So, it must be that the left subgroup is not a subgroup of Spin(9).
Proof of Theorem 10
To complete the proof of Theorem 10 we only need to see why there are just two conjugacy classes of subgroups of Spin(10) isomorphic to SU(2) × SU(3). But in fact Will Sawin proved a stronger result! He was answering this question of mine:
Define the Standard Model gauge group to be $\mathrm{S}(\mathrm{U}(2) \times \mathrm{U}(3))$, the subgroup of U(5) consisting of block diagonal matrices with a 2 × 2 block and then a 3 × 3 block. (This is isomorphic to the quotient of U(1) × SU(2) × SU(3) by the subgroup of elements $(\zeta, \zeta^3, \zeta^2)$, where $\zeta$ is a 6th root of unity.)
Up to conjugacy, how many subgroups isomorphic to the Standard Model gauge group does Spin(10) have?
This question is relevant to grand unified theories of particle physics, as explained here:
This paper focuses on one particular copy of the Standard Model gauge group in Spin(10), given as follows. By definition we have an inclusion $\mathrm{S}(\mathrm{U}(2) \times \mathrm{U}(3)) \hookrightarrow \mathrm{SU}(5)$, and we also have an inclusion $\mathrm{SU}(5) \hookrightarrow \mathrm{Spin}(10)$, because for any $n$ we have an inclusion $\mathrm{SU}(n) \hookrightarrow \mathrm{SO}(2n)$, and $\mathrm{SU}(n)$ is simply connected so this gives a homomorphism $\mathrm{SU}(n) \to \mathrm{Spin}(2n)$.
However I think there is also an inclusion of the Standard Model gauge group in Spin(9), studied by Krasnov:
Composing this with the inclusion Spin(9) → Spin(10), this should give another inclusion of the Standard Model gauge group in Spin(10), and I believe this one is ‘truly different from’ — i.e., not conjugate to — the first one I mentioned.
So I believe my current answer to my question is “at least two”. But that’s not good enough.
Sawin’s answer relies heavily on the 3-fold way — that’s why I told you that stuff about orthogonal and symplectic representations. When we embed the Standard Model gauge group in Spin(10), we are automatically giving this group an orthogonal 10-dimensional representation, thanks to the map Spin(10) → SO(10). We can classify the possibilities.
He writes:
There are infinitely many embeddings. However, all but one of them is “essentially the same as” the one you studied, as they become equal to the one you studied on restriction to SU(2) × SU(3). The remaining one is the one studied by Krasnov.
SU(3) has irreducible representations of dimensions 1, 3, 6, 8, 10, and higher dimensions. The 6-dimensional ones are dual to each other, as are the 10-dimensional ones, so they can’t appear. The 3-dimensional ones are dual to each other and can only appear together. So the only 10-dimensional self-dual representations of SU(3) decompose as irreducibles as $3 \oplus \bar{3} \oplus 1 \oplus 1 \oplus 1 \oplus 1$, $8 \oplus 1 \oplus 1$, or ten 1s. All of these are orthogonal because the 8-dimensional representation is orthogonal. However, the ten 1s cannot appear because then SU(3) would act trivially.
A representation of SU(2) × SU(3) is a sum of tensor products of irreducible representations of SU(2) and irreducible representations of SU(3). Restricted to SU(3), each tensor product splits into a sum of copies of the same irreducible representation. So SU(2) can only act nontrivially when the same representation appears multiple times. Since the $3 \oplus \bar{3}$ is two different 3-dimensional representations, only the 1-dimensional representation can occur twice. Thus, our 10-dimensional orthogonal representation of SU(2) × SU(3) necessarily splits as either the 8-dimensional adjoint representation of SU(3) plus a 2-dimensional orthogonal representation of SU(2) or the 6-dimensional sum of standard and conjugate [i.e., dual] representations of SU(3) plus a 4-dimensional orthogonal representation of SU(2). However, SU(2) has a unique nontrivial representation of dimension 2 and it isn’t orthogonal, so only the second case can appear. SU(2) has representations of dimensions 1, 2, 3 and 4, of which the 2- and 4-dimensional ones are symplectic and so must appear with even multiplicity in any orthogonal representation, so the only nontrivial 4-dimensional orthogonal ones are $2 \oplus 2$ or $3 \oplus 1$.
So there are two ten-dimensional orthogonal representations of SU(2) × SU(3) that are nontrivial on both factors, those being the sum of two different 3-dimensional irreducible representations of SU(3) with either two copies of the two-dimensional irreducible representation of SU(2) or the three-dimensional and the one-dimensional irreducible representations of SU(2). The orthogonal structure is unique up to isomorphisms, so these give two conjugacy classes of homomorphisms SU(2) × SU(3) → SO(10) and thus two conjugacy classes of homomorphisms SU(2) × SU(3) → Spin(10). The first one corresponds to the embedding you studied while only the second one lands in Spin(9), so indeed these are different.
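In terms of dimensions of irreducible representations, the two possibilities described here are, as I read the argument:
$$\big(3 \oplus \bar{3}\big)_{\mathrm{SU}(3)} \oplus \big(2 \oplus 2\big)_{\mathrm{SU}(2)} \qquad \text{and} \qquad \big(3 \oplus \bar{3}\big)_{\mathrm{SU}(3)} \oplus \big(3 \oplus 1\big)_{\mathrm{SU}(2)},$$
each of total dimension $3 + 3 + 2 + 2 = 3 + 3 + 3 + 1 = 10$.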
To understand how to extend these to , I consider the centralizer of the representation within . Since the group is connected, this is the same as the centralizer of its Lie algebra, which is therefore the inverse image of the centralizer in . Now there is a distinction between the two examples because the example with irrep dimensions has centralizer with identity component while the example with irrep dimensions has centralizer with identity component . In the second case, the image of must be the image of times the centralizer of the image of , so this gives a unique example, which must be the one considered by Krasnov.
In the first case, we can restrict attention to a torus in . The center of maps to a one-dimensional subgroup of this torus, which can be described by a pair of integers. Explicitly, given a two-by-two unitary matrix and a three-by-three unitary matrix with , we can map to by sending to where , and then map from to . This lifts to the spin group if and only if the determinant in is a perfect square. The determinant is so a lift exists if and only if is even.
The only possible kernel of this embedding is the scalars. The scalar maps to and so the kernel is trivial if and only if .
However, there are infinitely many integer solutions to with even (in fact, a random and even works with probability ), so this gives infinitely many examples.
Part 1. How to define octonion multiplication using complex scalars and vectors, much as quaternion multiplication can be defined using real scalars and vectors. This description requires singling out a specific unit imaginary octonion, and it shows that octonion multiplication is invariant under SU(3).
Part 2. A more polished way to think about octonion multiplication in terms of complex scalars and vectors, and a similar-looking way to describe it using the cross product in 7 dimensions.
Part 3. How a lepton and a quark fit together into an octonion — at least if we only consider them as representations of SU(3), the gauge group of the strong force. Proof that the symmetries of the octonions fixing an imaginary octonion form precisely the group SU(3).
Part 4. Introducing the exceptional Jordan algebra : the self-adjoint octonionic matrices. A result of Dubois-Violette and Todorov: the symmetries of the exceptional Jordan algebra preserving their splitting into complex scalar and vector parts and preserving a copy of the adjoint octonionic matrices form precisely the Standard Model gauge group.
Part 5. How to think of self-adjoint octonionic matrices as vectors in 10d Minkowski spacetime, and pairs of octonions as left- or right-handed spinors.
Part 6. The linear transformations of the exceptional Jordan algebra that preserve the determinant form the exceptional Lie group $\mathrm{E}_6$. How to compute this determinant in terms of 10-dimensional spacetime geometry: that is, scalars, vectors and left-handed spinors in 10d Minkowski spacetime.
Part 7. How to describe the Lie group $\mathrm{E}_6$ using 10-dimensional spacetime geometry. This group is built from the double cover of the Lorentz group, left-handed and right-handed spinors, and scalars in 10d Minkowski spacetime.
Part 8. A geometrical way to see how $\mathrm{E}_6$ is connected to 10d spacetime, based on the octonionic projective plane.
Part 9. Duality in projective plane geometry, and how it lets us break the Lie group $\mathrm{E}_6$ into the Lorentz group, left-handed and right-handed spinors, and scalars in 10d Minkowski spacetime.
Part 10. Jordan algebras, their symmetry groups, their invariant structures — and how they connect quantum mechanics, special relativity and projective geometry.
Part 11. Particle physics on the spacetime given by the exceptional Jordan algebra: a summary of work with Greg Egan and John Huerta.
Part 12. The bioctonionic projective plane and its connections to algebra, geometry and physics.
Part 13. Two ways to embed in , and their consequences for particle physics.
The bioctonionic plane also has intriguing mathematical connections to the Standard Model. But it’s not a projective plane in the axiomatic sense — and it can’t be constructed by straightforwardly copying the way you build a projective plane over a division algebra, since unlike the octonions, the bioctonions are not a division algebra. Nonetheless we can define points and lines in the bioctonionic plane. The twist is that now some pairs of distinct lines intersect in more than one point — and dually, some pairs of distinct points lie on more than one line. It obeys some subtler axioms, so people call it a Hjelmslev plane.
I am not ready to give a really good explanation of the bioctonionic plane! Instead, I just want to lay out some basic facts about how it fits into mathematics — and possibly physics.
Latham Boyle works at the University of Edinburgh, which is where I am now. Being able to talk to someone who deeply understands octonions and particle physics is very energizing. I’m especially fascinated by this paper of his:
It gives a convincing argument that the bioctonionic plane may be better than the octonionic projective plane for particle physics. The reason is that the tangent space of any point of the bioctonionic plane is a copy of $(\mathbb{C} \otimes \mathbb{O})^2$, a 16-dimensional complex vector space. The symmetry group of the bioctonionic plane is the exceptional Lie group $\mathrm{E}_6$. Sitting inside the stabilizer group of any given point is a copy of the Standard Model gauge group. And — here’s the cool part — this group acts on $(\mathbb{C} \otimes \mathbb{O})^2$ just as it does on one generation of fermions (not their antiparticles). If we try the same trick using the octonionic projective plane, we can fit the Standard Model gauge group in the stabilizer group of a point in a very natural way, but its action on the tangent space is its action only on left-handed fermions.
I want to explain this in detail, but not today. Instead, I want to skim through some basic facts about the bioctonionic plane.
the octonionic projective plane $\mathbb{O}\mathrm{P}^2$, a 16-dimensional compact Riemannian manifold on which the compact Lie group $\mathrm{F}_4$ acts transitively as isometries, with the stabilizer of any point being $\mathrm{Spin}(9)$. This is a symmetric space, and as such it’s called FII in Cartan’s classification.
the bioctonionic plane $(\mathbb{C} \otimes \mathbb{O})\mathrm{P}^2$, a 32-dimensional compact Riemannian manifold on which the compact Lie group $\mathrm{E}_6$ acts transitively as isometries, with the stabilizer of any point being $(\mathrm{Spin}(10) \times \mathrm{U}(1))/\mathbb{Z}_4$. This is the symmetric space EIII.
the quateroctonionic plane $(\mathbb{H} \otimes \mathbb{O})\mathrm{P}^2$, a 64-dimensional compact Riemannian manifold on which the compact Lie group $\mathrm{E}_7$ acts transitively as isometries, with the stabilizer of any point being $(\mathrm{Spin}(12) \times \mathrm{SU}(2))/\mathbb{Z}_2$. This is the symmetric space EVI.
the octooctonionic plane $(\mathbb{O} \otimes \mathbb{O})\mathrm{P}^2$, a 128-dimensional compact Riemannian manifold on which the compact Lie group $\mathrm{E}_8$ acts transitively as isometries, with the stabilizer of any point being $\mathrm{Spin}(16)/\mathbb{Z}_2$. This is the symmetric space EVIII.
There’s a nice network of systematic approaches to these spaces: they form one row of the so-called magic square, so one way to learn about the bioctonionic plane is to study the magic square, for example here:
Chris H. Barton and Anthony Sudbery, Magic squares of Lie algebras. Available as arXiv:math/0001083; see also arXiv:math/0203010 for a “streamlined and extended” version, which has more yet also less.
Here you can also find lots of references to earlier work, e.g. to Freudenthal and Tits. The basic idea of the magic square is that you start with two normed division algebras and from them you build a Lie algebra, which gives a Lie group $G$. There’s also a way to get a subgroup $H \subseteq G$, and the quotient space
$$G/H$$
is a kind of ‘plane’ on which the group $G$ acts. If you take one of the two normed division algebras to be $\mathbb{O}$, this construction gives you the four Rosenfeld planes listed above.
Each one of these planes is a compact Riemannian symmetric space: this means it’s a connected compact Riemannian manifold $M$ such that for each point $p \in M$ there’s an isometry
$$s_p \colon M \to M$$
that fixes $p$, acts as $-1$ on its tangent space, and squares to the identity. This map is called ‘reflection across $p$’ for the obvious reason. For example, a round 2-sphere is a symmetric space, with $s_p$ switching the red and blue tangent vectors if $p$ is the black dot:
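Concretely, for the round unit sphere $S^2 \subset \mathbb{R}^3$ there is an explicit formula for the reflection across a point $p$ (a standard fact, added here as an illustration):
$$s_p(x) = 2\langle x, p\rangle\, p - x.$$
This fixes $p$, sends every tangent vector at $p$ to its negative, and applying it twice gives back $x$, so it squares to the identity.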
Cartan classified compact Riemannian symmetric spaces, and there’s a nice theory of them. Any compact simple Lie group is one, and most of the rest come in infinite series connected to real and complex Clifford algebras, as I explained here. But there are 9 extra ones, all related to the octonions and exceptional Lie groups. The Rosenfeld planes are four of these.
You can learn this material starting on Wikipedia and then going to a textbook, ideally this:
Sigurdur Helgason, Differential Geometry, Lie Groups and Symmetric Spaces, Academic Press, 1978.
Helgason taught me Lie theory when I was a grad student at MIT, so I have a fondness for his book—but it’s also widely accepted as the most solid text on symmetric spaces!
The bioctonionic plane is even better: it’s a compact hermitian symmetric space: a compact Riemannian symmetric space where each tangent space has a complex structure
$$J \colon T_p M \to T_p M, \qquad J^2 = -1,$$
compatible with the metric, and reflection about each point preserves this complex structure. I mentioned that the bioctonionic plane is
$$\mathrm{E}_6 \big/ \big( (\mathrm{Spin}(10) \times \mathrm{U}(1))/\mathbb{Z}_4 \big)$$
where $\mathrm{E}_6$ acts transitively, and the stabilizer of a point is
$$(\mathrm{Spin}(10) \times \mathrm{U}(1))/\mathbb{Z}_4.$$
The U(1) here comes from the complex structure!
Wikipedia is especially thorough on hermitian symmetric spaces, so if you want to delve into those, start here:
Another tack is to focus on the exceptional Lie groups $\mathrm{E}_6$, $\mathrm{E}_7$ and $\mathrm{E}_8$ and their connection to the nonassociative algebras $\mathbb{C} \otimes \mathbb{O}$, $\mathbb{H} \otimes \mathbb{O}$, and $\mathbb{O} \otimes \mathbb{O}$, respectively. Here I recommend this:
Ichiro Yokota, Exceptional Lie Groups. Available as arXiv:0902.0431. (See especially Chapter 3, for $\mathrm{E}_6$ and the complexified exceptional Jordan algebra $\mathbb{C} \otimes \mathfrak{h}_3(\mathbb{O})$.)
If you have a fondness for algebra you may also want to learn how symmetric spaces arise from Jordan triple systems or Jordan pairs. This is important if we wish to see the bioctonionic plane as the space of pure states of some exotic quantum system!
Now, this is much easier to do for the octonionic plane, because that’s the space of pure states for the exceptional Jordan algebra $\mathfrak{h}_3(\mathbb{O})$, which is a Euclidean Jordan algebra, meaning one in which a sum of squares can only be zero if all those squares are zero. You can think of a Euclidean Jordan algebra as consisting of observables, with the sums of squares being ‘nonnegative’ observables. These nonnegative observables form a convex cone $C$. The dual vector space contains a cone of linear functionals that send these nonnegative observables to nonnegative real numbers — I’ll call this the dual cone $C^*$. The functionals $f \in C^*$ with $f(1) = 1$ are called states. The states form a convex set, and the extreme points are called pure states. All of this fits nicely into a modern framework for understanding quantum theory and potential generalizations, called ‘generalized probabilistic theories’:
Howard Barnum, Alexander Wilce, Post-classical probability theory, in Quantum Theory: Informational Foundations and Foils, eds.
Giulio Chiribella, Robert W. Spekkens, Springer, 2016. (See Section 5 for Jordan algebras, and ignore the fact that they say the exceptional Jordan algebra consists of 2 × 2 matrices: they know perfectly well that they’re 3 × 3.)
The underlying math, with a lot more about symmetric spaces, cones and Euclidean Jordan algebras but with none of the physics interpretation, is wonderfully explained here:
Jacques Faraut and Adam Korányi, Analysis on Symmetric Cones, Oxford U. Press, 1994.
A crucial fact throughout this book is that when you start with a Euclidean Jordan algebra $A$, its cone $C$ of nonnegative observables is self-dual: there’s an isomorphism of vector spaces $A \cong A^*$ that maps $C$ to $C^*$ in a one-to-one and onto way. The cone $C$ is also homogeneous, meaning that the group of invertible linear transformations of $A$ preserving $C$ acts transitively on the interior of $C$. Faraut and Korányi call a self-dual homogeneous cone a symmetric cone — and they show that any symmetric cone comes from a Euclidean Jordan algebra! This result plays an important role in modern work on the foundations of quantum theory.
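Here is a small, familiar example to fix ideas: ordinary quantum mechanics of a qubit, nothing octonionic. Take the Euclidean Jordan algebra $A = \mathfrak{h}_2(\mathbb{C})$ of self-adjoint complex $2 \times 2$ matrices, with Jordan product $a \circ b = \tfrac{1}{2}(ab + ba)$. The cone $C$ of sums of squares consists exactly of the positive semidefinite matrices. Identifying $A^*$ with $A$ via the trace pairing $\langle a, b \rangle = \mathrm{tr}(ab)$, the dual cone $C^*$ is again the positive semidefinite matrices, so $C$ is self-dual. The states are the density matrices $\rho \ge 0$ with $\mathrm{tr}\,\rho = 1$, and the pure states are the rank-one projections, which form a 2-sphere: the Bloch sphere, otherwise known as $\mathbb{CP}^1$, the ‘projective line’ over $\mathbb{C}$.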
Unfortunately, I’m telling you all this nice stuff about Euclidean Jordan algebras and symmetric cones just to say that while all this applies to the octonionic projective plane, sadly, it does not apply to the bioctonionic plane! The bioctonionic plane does not come from a Euclidean Jordan algebra or a symmetric cone. Thus, to understand it as a space of pure states, we’d have to resort to a more general formalism.
There are a few papers that attempt exactly this:
Lawrence C. Biedenharn and Piero Truini, An invariant quantum mechanics for a Jordan pair, Journal of Mathematical Physics 23 (1982), 1327–1345.
Lawrence C. Biedenharn and Piero Truini, Exceptional groups and elementary particle structures, Physica A: Statistical Mechanics and its Applications 114 (1982), 257–270.
Lawrence C. Biedenharn, G. Olivieri and Piero Truini, Three graded exceptional algebras and symmetric spaces, Zeitschrift für Physik C — Particles and Fields 33 (1986), 47–65.
Here’s the basic idea. We can define $\mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O})$ to consist of hermitian $3 \times 3$ matrices with entries in $\mathbb{C} \otimes \mathbb{O}$, where ‘hermitian’ is defined using the star-algebra structure on $\mathbb{C} \otimes \mathbb{O}$ where we conjugate the octonion part but not the complex part! Then $\mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O})$ is just the complexification of $\mathfrak{h}_3(\mathbb{O})$:
$$\mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O}) \;\cong\; \mathbb{C} \otimes \mathfrak{h}_3(\mathbb{O}).$$
Then because $\mathfrak{h}_3(\mathbb{O})$ is a Jordan algebra over $\mathbb{R}$, $\mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O})$ is a Jordan algebra over $\mathbb{C}$. So we can do a lot with it. But it’s not a Euclidean Jordan algebra.
Puzzle. Show that it’s not.
So, Biedenharn and Truini need a different approach to relate $\mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O})$ to some sort of exotic quantum system. And they use an approach already known to mathematicians: namely, the theory of Jordan pairs! Here you work, not with a single element of $\mathfrak{h}_3(\mathbb{C} \otimes \mathbb{O})$, but with a pair.
Jordan triple systems and Jordan pairs are two closely related generalizations of Jordan algebras. I’ve been working on the nLab articles about these concepts, so click the links if you want to learn more about them. I explain how either of these things gives you a 3-graded Lie algebra — that is, a $\mathbb{Z}$-graded Lie algebra that is nonvanishing only in the middle 3 grades:
$$\mathfrak{g} = \mathfrak{g}_{-1} \oplus \mathfrak{g}_0 \oplus \mathfrak{g}_1.$$
And from a 3-graded Lie algebra you can get a symmetric space $G/K$ where the Lie algebra of $G$ is $\mathfrak{g}$ and the Lie algebra of $K$ is $\mathfrak{g}_0$. Each tangent space of this symmetric space is thus isomorphic to $\mathfrak{g}_{-1} \oplus \mathfrak{g}_1$.
In the case relevant to the bioctonionic plane, the 3-graded Lie algebra is a form of $\mathfrak{e}_6$, with $\mathfrak{g}_0 \cong \mathfrak{so}(10) \oplus \mathfrak{u}(1)$ and with $\mathfrak{g}_{\pm 1}$ the two 16-dimensional spinor representations of $\mathfrak{so}(10)$.
So, the bioctonionic plane is a symmetric space on which $\mathrm{E}_6$ acts, with stabilizer group $\mathrm{Spin}(10) \times \mathrm{U}(1)$ (up to covering spaces), and with tangent space isomorphic to $\mathbb{C}^{16}$.
So all this is potentially very nice. For much more on this theory, try the work of Ottmar Loos:
To wrap things up, I should say a bit about ‘Hjelmslev planes’, since the bioctonionic plane is supposed to be one of these. Axiomatically, a Hjelmslev plane is a set of points, a set of lines, and an incidence relation between points and lines. We require that for any two distinct points there is at least one line incident to both, and for any two distinct lines there is at least one point incident to both. If two points are incident to more than one line we say they are neighbors. If two lines are incident to more than one point we say they are neighbors. We demand that both these ‘neighbor’ relations are equivalence relations, and that if we quotient the set of points and the set of lines by these equivalence relations, we get an axiomatic projective plane.
Challenge. What projective plane do we get if we apply this quotient construction to the bioctonionic plane?
My only guess is that we get the octonionic projective plane — but I don’t know why.
The literature on Hjelmslev planes seems a bit difficult, but I’m finding this to be a good introduction:
The answer to my challenge should be here, because they’re talking about Hjelmslev planes built using split octonion algebras (like $\mathbb{C} \otimes \mathbb{O}$):
Tonny A. Springer and Ferdinand D. Veldkamp, On Hjelmslev–Moufang planes, Mathematische Zeitschrift 107 (1968), 249–263.
But I don’t see the answer here yet!
I’m writing to point out a potential law which should be gathering more opposition and attention in math academia: The Securing American Funding and Expertise from Adversarial Research Exploitation Act. This is an amendment to the 2026 National Defense Authorization Act which has passed the House and could be added to the final version of the bill during reconciliation in the Senate. I’m pulling most of my information from an article in Science.
This act would ban any US scientist from receiving federal funding if they have, within the last five years, worked with anyone from China, Russia, Iran or North Korea, where “worked with” includes joint research, co-authorship on papers, or advising a foreign graduate student or postdoctoral fellow. As I said in my message to my senators, this is everyone. Every mathematician has advised Chinese graduate students or collaborated with Chinese mathematicians, because China is integrated into the academic world and is one fifth of the earth.
This obviously isn’t secret, since you can read about it in Science, but I am surprised that I haven’t heard more alarm. Obvious people to contact are your senators and your representatives. I would also suggest contacting members of the Senate armed services committee, who are in charge of reconciling the House and Senate versions of the bill.
“Information” is an idea that is everywhere in science and technology these days. From one angle it looks like such an obvious idea that it’s a bit startling to realize that information theory didn’t really come along until the work of Claude Shannon in the 1940s. From another, the idea has so many different shades of meaning that we shouldn’t be surprised (that’s a joke you will get in a bit) that it can be hard to understand.
Information theory is obviously an enormous subject, but we’re just giving thanks, not writing a textbook. I want to mention two ideas I find especially central. First, Shannon’s idea about relating information content to “surprisal.” Second, the very different intuitive notions of information that we get from engineering and physics.
Shannon, working at Bell Labs, was interested in the problem of how to send trustworthy signals efficiently over transatlantic cables. He was thinking about various ways to express information in a code: a set of symbols, each with a defined meaning. So a code might be an alphabet, or a set of words, or a literal cipher. And he noticed that there was a lot of redundancy in natural languages; the word “the” appears much more often in English than the word “axe,” although both have the same number of letters.
Let’s refer to each letter or symbol in a code as an “event.” Shannon’s insight was to realize that the more unlikely an event, the more information it conveyed when it was received. The statements “The Sun rose in the east this morning” and “The Sun rose in the west this morning” contain the same number of letters, but the former contains almost no information — you already were pretty sure the Sun would be rising in the east. But the latter, if obtained from a reliable source, would be very informative indeed, precisely because it was so unexpected. Clearly some kind of unprecedented astronomical catastrophe was in progress.
Imagine we can assign a probability $p(x)$ to every different event $x$. Shannon wanted a way to quantify the information content of that event, which would satisfy various reasonable-seeming axioms: most crucially, that the information content of two independent events is the sum of the individual information contents. But the joint probability of two events is the product of their individual probabilities. So the natural thing to do would be to define the information content as the logarithm of the probability; the logarithm of a product equals the sum of the individual logarithms. But you want low probability to correspond to high information content, so Shannon defined the information content (also called the self-information, or surprisal, or Shannon information) of an event to be minus the log of the probability, which by math is equal to the log of the reciprocal of the probability:
$$I(x) = -\log p(x) = \log\!\left(\frac{1}{p(x)}\right).$$
Note that probabilities are numbers between 0 and 1, and the log of such a number will be negative, with numbers closer to 0 being more negative than numbers closer to 1. So $I(x)$ goes from $\infty$ at $p(x) = 0$ to $0$ at $p(x) = 1$. An impossible message is infinitely surprising, and therefore conveys infinite information; an inevitable message is completely unsurprising, and conveys no information at all.
From there, Shannon suggested that we could characterize how efficient an entire code was at conveying information: just calculate the average (expectation value) of the information content for all possible events. When we have a probability distribution $p(x)$, the average of any function $f(x)$ is just the sum of the values of the function times their respective probabilities, $\langle f \rangle = \sum_x p(x) f(x)$. So we characterize the information content of a code via the quantity
$$S = \langle I \rangle = -\sum_x p(x) \log p(x).$$
The only question is, what to call this lovely newly-defined quantity that surely nobody had ever thought of before? Happily Shannon was friends with John von Neumann, who informed him, “You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.” So entropy it is.
Indeed, this formula is precisely that which had been put forward (unknown to Shannon) by Josiah Willard Gibbs in the 1870s as a definition of entropy in statistical mechanics. (It is related to the definition on Ludwig Boltzmann’s tombstone, $S = k \log W$, and Boltzmann had also suggested similar expressions to the above.) On the one hand, it seems remarkable to find precisely the same expression playing central roles in problems as disparate as sending signals across cables and watching cream mix into coffee; on the other hand, it’s a relatively simple expression and the axioms used to derive it are actually pretty similar, so perhaps we shouldn’t be surprised; on the third hand, the connection between information theory and statistical mechanics turns out to be deep and fruitful, so it’s more than just a mathematical coincidence.
But let me highlight the one aspect of the term “information” that can sometimes be confusing to people. To the engineer, a code that is maximally informative is one for which $p(x)$ is relatively uniform over all events $x$, which means $S$ is maximal or close to it; in that case, every event will tell you something at least a little bit interesting. For them, high entropy = high information.
But to a physicist who might be asking “how much information do I have about the state of a system?”, you have more information when $p(x)$ is relatively narrowly concentrated around some value, rather than being all spread out. For them, high entropy = low information! Indeed, one physically-relevant notion of “information” is the “accessible information” of a system, which can be defined as $I = S_{\max} - S$: the maximum possible entropy minus the actual entropy. (I talk about this a bit in my recent solo podcast on complexity.)
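To make the two uses of the word concrete, here is a tiny numerical sketch (my own illustration, using the surprisal and entropy formulas above) comparing a uniform distribution over four events with a sharply peaked one.

    import math

    def surprisal(p):
        """Information content, in bits, of an event with probability p."""
        return -math.log2(p)

    def entropy(dist):
        """Shannon entropy: the average surprisal over the distribution."""
        return sum(p * surprisal(p) for p in dist if p > 0)

    uniform = [0.25, 0.25, 0.25, 0.25]   # every symbol equally (un)likely
    peaked  = [0.97, 0.01, 0.01, 0.01]   # we almost always know what's coming

    s_max = math.log2(4)   # maximum possible entropy for four events

    for name, dist in [("uniform", uniform), ("peaked", peaked)]:
        S = entropy(dist)
        print(f"{name:7s} S = {S:.3f} bits, accessible information S_max - S = {s_max - S:.3f} bits")

The uniform distribution maximizes the entropy (the engineer's "maximally informative" code), while the peaked one has low entropy but large accessible information, which is the physicist's sense of knowing a lot about the system.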
Perhaps we shouldn’t be so surprised that physicists and engineers posit oppositely-directed relationships between entropy and information. It’s just a reflection of the fact that “information” is so ubiquitous and has so many different uses. We should be thankful that we’re beginning to understand it so well.
It’s been over three years since my last post on this blog and I have sometimes been asked, understandably, whether the project I announced in my previous post was actually happening. The answer is yes — the grant I received from the Astera Institute has funded several PhD students and a couple of postdocs, and we have been busy. In my previous post I suggested that I would be open to remote collaboration, but that has happened much less, partly because a Polymath-style approach would have been difficult to manage while also ensuring that my PhD students would have work that they could call their own to put in their theses.
In general I don’t see a satisfactory solution to that problem, but in this post I want to mention a subproject of the main project that is very much intended to be a large public collaboration. A few months ago, a call came out from Renaissance Philanthropies saying that they were launching a $9m AI for Math Fund to spend on projects in the general sphere of AI and mathematics, and inviting proposals. One of the categories that they specifically mentioned was creating new databases, and my group submitted a proposal to create a database of what we call “structured motivated proofs,” a piece of terminology that I will explain a bit more in just a moment. I am happy to report that our proposal was one of the 29 successful ones. Since a good outcome to the project will depend on collaboration from many people outside the group, we need to publicize it, which is precisely the purpose of this post. Below I will be more specific about the kind of help we are looking for.
Why might yet another database of theorems and proofs be useful?
The underlying thought behind this project is that AI for mathematics is being held back not so much by an insufficient quantity of data as by the wrong kind of data. (For a more general exploration of this theme, see here.) All mathematicians know, and some of us enjoy complaining about it, that it is common practice when presenting a proof in a mathematics paper, or even textbook, to hide the thought processes that led to the proof. Often this does not matter too much, because the thought processes may be standard ones that do not need to be spelt out to the intended audience. But when proofs start to get longer and more difficult, they can be hard to read because one has to absorb definitions and lemma statements that are not obviously useful, are presented as if they appeared from nowhere, and demonstrate their utility only much later in the argument.
A sign that this is a problem for AI is the behaviour one observes after asking an LLM to prove a statement that is too difficult for it. Very often, instead of admitting defeat, it will imitate the style of a typical mathematics paper and produce rabbits out of hats, together with arguments later on that those rabbits do the required job. The problem is that, unlike with a correct mathematics paper, one finds when one scrutinizes the arguments carefully that they are wrong. However, it is hard to find superficial features that distinguish between an incorrect rabbit with an incorrect argument justifying that rabbit (especially if the argument does not go into full detail) and a correct one, so the kinds of statistical methods used by LLMs do not have an easy way to penalize the incorrectness.
Of course, that does not mean that LLMs cannot do mathematics at all — they are remarkably good at it, at least compared with what I would have expected three years ago. How can that be, given the problem I have discussed in the previous paragraph?
The way I see it (which could change — things move so fast in this sphere), the data that is currently available to train LLMs and other systems is very suitable for a certain way of doing mathematics that I call guess and check. When trying to solve a maths problem, you will normally write down the routine parts of an argument without any fuss (and an LLM can do them too because it has seen plenty of similar examples), but if the problem as a whole is not routine, then at some point you have to stop and think, often because you need to construct an object that has certain properties (I mean this in a rather general way — the “object” might be a lemma that will split up the proof in a nice way) and it is not obvious how to do so. The guess-and-check approach to such moments is what it says: you make as intelligent a guess as you can and then see whether it has the properties you wanted. If it doesn’t, you make another guess, and you keep going until you get lucky.
The reason an LLM might be tempted to use this kind of approach is that the style of mathematical writing I described above makes it look as though that is what we as mathematicians do. Of course, we don’t actually do that, but we tend not to mention all the failed guesses we made and how we carefully examined why they failed, modifying them in appropriate ways in response, until we finally converged on an object that worked. We also don’t mention the reasoning that often takes place before we make the guess, saying to ourselves things like “Clearly an Abelian group can’t have that property, so I need to look for a non-Abelian group.”
Intelligent guess and check works well a lot of the time, particularly when carried out by an LLM that has seen many proofs of many theorems. I have often been surprised when I have asked an LLM a problem of the form “find $x$ such that $P(x)$”, where $P$ is some property that is hard to satisfy, and the LLM has had no trouble answering it. But somehow when this happens, the flavour of the answer given by the LLM leaves me with the impression that the technique it has used to construct $x$ is one that it has seen before and regards as standard.
If the above picture of what LLMs can do is correct (the considerations for reinforcement-learning-based systems such as AlphaProof are not identical but I think that much of what I say in this post applies to them too for slightly different reasons), then the likely consequence is that, if we pursue current approaches, we will reach a plateau: broadly speaking they will be very good at answering a question if it is the kind of question that a mathematician with the right domain expertise and good instincts would find reasonably straightforward, but will struggle with anything that is not of that kind. In particular, they will struggle with research-level problems, which are, almost by definition, problems that experts in the area do not find straightforward. (Of course, there would probably be cases where an LLM spots relatively easy arguments that the experts had missed, but that wouldn’t fundamentally alter the fact that they weren’t really capable of doing research-level mathematics.)
But what if we had a database of theorems and proofs that did not hide the thought processes that lay behind the non-obvious details of the proofs? If we could train AI on a database of accounts of proof discoveries and if, having done so, we then asked it to provide similar accounts, then it would no longer resort to guess-and-check when it got stuck, because the proof-discovery accounts it had been trained on would not be resorting to it. There could be a problem getting it to unlearn its bad habits, but I don’t think that difficulty would be impossible to surmount.
The next question is what such a database might look like. One could just invite people to send in stream-of-consciousness accounts of how they themselves found certain proofs, but that option is unsatisfactory for several reasons.
It can be very hard to remember where an idea came from, even a few seconds after one has had it — in that respect it is like a dream, the memory of which becomes rapidly less vivid as one wakes up.
Often an idea will seem fairly obvious to one person but not to another.
The phrase “motivated proof” means different things to different people, so without a lot of careful moderation and curation of entries, there is a risk that a database would be disorganized and not much more helpful than a database of conventionally written proofs.
A stream-of-consciousness account could end up being a bit too much about the person who finds the proof and not enough about the mathematical reasons for the proof being feasibly discoverable.
To deal with these kinds of difficulties, we plan to introduce a notion of a structured motivated proof, by which we mean a proof that is generated in a very particular way that I will partially describe below. A major part of the project, and part of the reason we needed funding for it, is to create a platform that will make it convenient to input structured motivated proofs and difficult to insert the kinds of rabbits out of hats that make a proof mysterious and unmotivated. In this way we hope to gamify the task of creating the database, challenging people to input into our system proofs of certain theorems that appear to rely on “magic” ideas, and perhaps even offering prizes for proofs that contain steps that appear in advance to be particularly hard to motivate. (An example: the solution by Ellenberg and Gijswijt of the cap-set problem uses polynomials in a magic-seeming way. The idea of using polynomials came from an earlier paper of Croot, Lev and Pach that proved a closely related theorem, but in that paper it just appears in the statement of their Lemma 1, with no prior discussion apart from the words “in the present paper we use the polynomial method” in the introduction.)
What is a structured motivated proof?
I wrote about motivated proofs in my previous post, but thanks to many discussions with other members of the group, my ideas have developed quite a lot since then. Here are two ways we like to think about the concept.
1. A structured motivated proof is one that is generated by standard moves.
I will not go into full detail about what I mean by this, but will do so in a future post when we have created the platform that we would like people to use in order to input proofs into the database. But the basic idea is that at any one moment one is in a certain state, which we call a proof-discovery state, and there will be a set of possible moves that can take one from the current proof-discovery state to a new one.
A proof-discovery state is supposed to be a more formal representation of the state one is in when in the middle of solving a problem. Typically, if the problem is difficult, one will have asked a number of questions, and will be aware of logical relationships between them: for example, one might know that a positive answer to Q1 could be used to create a counterexample to Q2, or that Q3 is a special case of Q4, and so on. One will also have proved some results connected with the original question, and again these results will be related to each other and to the original problem in various ways that might be quite complicated: for example P1 might be a special case of Q2, which, if true would reduce Q3 to Q4, where Q3 is a generalization of the statement we are trying to prove.
Typically we will be focusing on one of the questions, and typically that question will take the form of some hypotheses and a target (the question being whether the hypotheses imply the target). One kind of move we might make is a standard logical move such as forwards or backwards reasoning: for example, if we have hypotheses of the form $P$ and $P \Rightarrow Q$, then we might decide to deduce $Q$. But things get more interesting when we consider slightly less basic actions we might take. Here are three examples.
We have in our list of hypotheses the fact that a function $f$ is given by a formula that involves a polynomial, and our goal is to prove that there exists $x$ such that $f(x) = 0$. Without really thinking about it, we are conscious that $f$ is a composition of two functions, one of which is continuous and one of which belongs to a class of functions that are all continuous, so $f$ is continuous. Also, the conclusion matches well the conclusion of the intermediate-value theorem. So the intermediate-value theorem comes naturally to mind and we add it to our list of available hypotheses. In practice we wouldn’t necessarily write it down, but the system we wish to develop is intended to model not just what we write down but also what is going on in our brains, so we propose a move that we call library extraction (closely related to what is often called premise selection in the literature). Note that we have to be a bit careful about library extraction. We don’t want the system to be allowed to call up results from the library that appear to be irrelevant but then magically turn out to be helpful, since those would feel like rabbits out of hats. So we want to allow extraction of results only if they are obvious given the context. It is not easy to define what “obvious” means, but there is a good rule of thumb for it: a library extraction is obvious if it is one of the first things ChatGPT thinks of when given a suitable non-cheating prompt. For example, I gave it the prompt, “I have a function $f$ from the reals to the reals and I want to prove that there exists some $x$ such that $f(x) = 0$. Can you suggest any results that might be helpful?” and the intermediate-value theorem was its second suggestion. (Note that I had not even told it that $f$ was continuous, so I did not need to make that particular observation before coming up with the prompt.)
We have a goal of the form $\exists x\; P(x)$. If this were a Lean proof state, the most common way to discharge a goal of this form would be to input a choice for $x$. That is, we would instantiate the existential quantifier with some term $t$ and our new goal would be $P(t)$. However, as with library extraction, we have to be very careful about instantiation if we want our proof to be motivated, since we wish to disallow highly surprising choices of $x$ that can be found only after a long process of thought. So we have to restrict ourselves to obvious instantiations. One way that an instantiation in our system will count as obvious is if the variable is instantiated with a term that is already present in the proof-discovery state. If the desired term is not present, then in order to continue with the proof, it will be necessary to carry out moves that generate it. A very common technique for this is the use of metavariables: instead of guessing a suitable $x$, we create a new variable $x$ and change the goal to $P(x)$, which we can think of as saying “I’m going to start trying to prove $P(x)$ even though I haven’t chosen $x$ yet. As the attempted proof proceeds, I will note down any properties that $x$ might have that would help me finish the proof, in the hope that (i) I get to the end and (ii) the problem is easier than the original problem.” (There is a toy illustration of this move just after these three examples.) Another kind of obvious instantiation is one where we try out an object that is “extreme” in some way — it might be the smallest element of a set that is in play, or the largest, or the simplest. (Judging simplicity is another place where the ChatGPT rule of thumb can be used.)
We cannot see how to answer the question we are focusing on so we ask a related question. Two very common kinds of related question (as emphasized by Polya) are generalization and specialization. Perhaps we don’t see why a hypothesis is helpful, so we see whether the result holds if we drop that hypothesis. If it does, then we are no longer distracted by an irrelevant hypothesis. If it does not, then we can hope to find a counterexample that will help us understand how to use the hypothesis. Or perhaps we are trying to prove a general statement but it is not clear how to do so, so instead we formulate some special cases, hoping that we can prove them and spot features of the proofs that we can generalize. Again we have to be rather careful here not to allow “non-obvious” generalizations and specializations. Roughly the idea there is that a generalization should be purely logical — for example, dropping a hypothesis is fine but replacing the hypothesis “$f$ is twice differentiable” by “$f$ is upper semicontinuous” is not — and that a specialization should be to a special case that counts as an obvious instantiation in the sense discussed just above.
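Here is the toy illustration of the metavariable move promised above (my own example, not one from the project's materials). Suppose the goal is
$$\exists x \in \mathbb{R}: \; x^2 + 2x = 8.$$
Rather than guessing a witness, we introduce a metavariable $x^{\bullet}$ and take the new goal to be $(x^{\bullet})^2 + 2x^{\bullet} = 8$. Routine forward reasoning (completing the square) rewrites this as $(x^{\bullet} + 1)^2 = 9$, so a sufficient property to note down is $x^{\bullet} + 1 = 3$. That generates the term $x^{\bullet} = 2$, and instantiating the original existential with $2$ now counts as obvious, because the term is already present in the proof-discovery state.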
2. A structured motivated proof is one that can be generated with the help of a point-and-click system.
This is a surprisingly useful way to conceive of what we are talking about, especially as it relates closely to what I was talking about earlier: imposing a standard form on motivated proofs (which is why we call them “structured” motivated proofs) and gamifying the process of producing them.
The idea is that a structured motivated proof is one that can be generated using an interface (which we are in the process of creating — at the moment we have a very basic prototype that has a few of the features we will need, but not yet the more interesting ones) that has one essential property: the user cannot type in data. So what can they do? They can select text that is on their screen (typically mathematical expressions or subexpressions), they can click buttons, choose items from drop-down menus, and accept or reject “obvious” suggestions made to them by the interface.
If, for example, the current goal is an existential statement $\exists x\; P(x)$, then typing in a formula that defines a suitable $x$ is not possible, so instead one must select text or generate new text by clicking buttons, choosing from short drop-down menus, and so on. This forces the user to generate $x$, which is our proxy for showing where the idea of using $x$ came from.
Broadly speaking, the way the prototype works is to get an LLM to read a JSON object that describes the variables, hypotheses and goals involved in the proof state in a structured format, and to describe (by means of a fairly long prompt) the various moves it might be called upon to do. Thus, the proofs generated by the system are not formally verified, but that is not an issue that concerns us in practice since there will be a human in the loop throughout to catch any mistakes that the LLM might make, and this flexibility may even work to our advantage to better capture the fluidity of natural-language mathematics.
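Purely to make this concrete, here is a hypothetical sketch of what such a JSON proof-state object might look like. The field names, the sample hypotheses, and the move list below are my invention, not the project's actual schema.

    import json

    # A hypothetical proof-discovery state for the intermediate-value-theorem
    # example discussed earlier: variables, hypotheses and goals in a
    # structured form that an LLM (or a conventional program) can read.
    proof_state = {
        "variables": [{"name": "f", "type": "function from R to R"}],
        "hypotheses": [
            {"id": "H1", "statement": "f is continuous"},
            {"id": "H2", "statement": "f(0) < 0"},
            {"id": "H3", "statement": "f(1) > 0"},
        ],
        "goals": [{"id": "G1", "statement": "there exists x in [0,1] with f(x) = 0"}],
        # Moves the interface might offer at this point; the user picks one
        # rather than typing in a witness.
        "available_moves": [
            "library extraction (obvious results only)",
            "instantiate with a term already in the state",
            "introduce a metavariable for the existential goal",
        ],
    }

    # The prototype, as described, hands a description like this to an LLM
    # along with a prompt explaining the moves it may be asked to carry out.
    print(json.dumps(proof_state, indent=2))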
There is obviously a lot more to say about what the proof-generating moves are, or (approximately equivalently) what the options provided by a point-and-click system will be. I plan to discuss that in much more detail when we are closer to having an interface ready, the target for which is the end of this calendar year. But the aim of the project is to create a database of examples of proofs that have been successfully generated using the interface, which can then be used to train AI to play the generate-structured-motivated-proof game.
How to get involved.
There are several tasks that will need doing once the project gets properly under way. Here are some of the likely ones.
The most important is for people to submit structured motivated (or move-generated) proofs to us on the platform we provide. We hope that the database will end up containing proofs of a wide range of difficulty (of two kinds — there might be fairly easy arguments that are hard to motivate and there might be arguments that are harder to follow but easier to motivate) and also a wide range of areas of mathematics. Our initial target, which is quite ambitious, is to have around 1000 entries by two years from now. While we are not in a position to accept entries yet, if you are interested in participating, then it is not too early to start thinking in a less formal way about how to convert some of your favourite proofs into motivated versions, since that will undoubtedly make it easier to get them accepted by our platform when it is ready.
We are in the process of designing the platform. As I mentioned earlier, we already have a prototype, but there are many moves we will need it to be able to do that it cannot currently do. For example, the current prototype allows just a single proof state, which consists of some variable declarations, hypotheses, and goals. It does not yet support creating subsidiary proof states (which we would need if we wanted to allow the user to consider generalizations and specializations, for example). Also, for the moment the prototype gets an LLM to implement all moves, but some of the moves, such as applying modus ponens, are extremely mechanical and would be better done using a conventional program. (On the other hand, moves such as “obvious library extraction” or “provide the simplest example” are better done by an LLM.) Thirdly, a technical problem is that LaTeX is currently rendered as images, which makes it hard to select subexpressions, something we will need to be able to do in a non-clunky way. And the public version of the platform will need to be web-based and very convenient to use. We will want features such as being able to zoom out and look at some kind of dependency diagram of all the statements and questions currently in play, and then zoom in on various nodes if the user wishes to work on them. If you think you may be able (and willing) to help with some of these aspects of the platform, then we would be very happy to hear from you. For some, it would probably help to have a familiarity with proof assistants, while for others we would be looking for somebody with software engineering experience. The grant from the AI for Math Fund will allow us to pay for some of this help, at rates to be negotiated. We are not yet ready to specify in detail what help we need, but would welcome any initial expressions of interest.
Once the platform is ready and people start to submit proofs, it is likely that, at least to start with, they will find that the platform does not always provide the moves they need. Perhaps they will have a very convincing account of where a non-obvious idea in the proof came from, but the system won’t be expressive enough for them to translate that account into a sequence of proof-generating moves. We will want to be able to react to such situations (if we agree that a new move is needed) by expanding the capacity of the platform. It will therefore be very helpful if people sign up to be beta-testers, so that we can try to get the platform to a reasonably stable state before opening it up to a wider public. Of course, to be a beta-tester you would need to have a few motivated proofs in mind.
It is not obvious that every proof submitted via the platform, even if submitted successfully, would be a useful addition to the database. For instance, it might be such a routine argument that no idea really needs to have its origin explained. Or it might be that, despite our best efforts, somebody finds a way of sneaking in a rabbit while using only the moves that we have provided. (One way this could happen is if an LLM made a highly non-obvious suggestion that happened to work, in which case the rule of thumb that if an LLM thinks of it, it must be obvious, would have failed in that instance.) For this reason, we envisage having a team of moderators, who will check entries and make sure that they are good additions to the database. We hope that this will be an enjoyable task, but it may have its tedious aspects, so we envisage paying moderators — again, this expense was allowed for in our proposal to the AI for Math Fund.
If you think you might be interested in any of these roles, please feel free to get in touch. Probably the hardest recruitment task for us will be identifying the right people with the right mixture of mathematical knowledge and software engineering skills to help us turn the platform into a well-designed web-based one that is convenient and pleasurable to use. If you think you might be such a person, or if you have a good idea for how we should go about finding one, we would be particularly interested to hear from you.
In a future post, I will say more about the kinds of moves that our platform will allow, and will give examples of non-motivated proofs together with how motivated versions of those proofs can be found and entered using the platform (which may involve a certain amount of speculation about what the platform will end up looking like).
How does this relate to use of tactics in a proof assistant?
In one way, our “moves” can be regarded as tactics of a kind. However, some of the moves we will need are difficult to implement in conventional proof assistants such as Lean. In parallel with the work described above, we hope to create an interface to Lean that would allow one to carry out proof-discovery moves of the kind discussed above but with the proof-discovery states being collections of Lean proof states. Members of my group have already been working on this and have made some very interesting progress, but there is some way to go. However, we hope that at some point (and this is also part of the project pitched to the AI for Math Fund) we will have created another interface that will have Lean working in the background, so that it will be possible to generate motivated proofs that will be (or perhaps it is better to say include) proofs in Lean at the same time.
Another possibility that we are also considering is to use the output of the first platform (which, as mentioned above, will be fairly formal, but not in the strict sense of a language such as Lean) to create a kind of blueprint that can then be autoformalized automatically. Then we would have a platform that would in principle allow mathematicians to search for proofs while working on their computers without having to learn a formal language, with their thoughts being formalized as they go.
I spent the day at the NSBP / NSHP meeting in San José. My favorite session of the day was the morning astro session, which was entirely about brown dwarfs. I learned a lot in a very short time. Caprice Phillips (UCSC) introduced the session with an introduction to the scientific and technical questions in play. She put a lot of emphasis on using binaries and clusters to put detailed abundance ratios onto substellar objects. This was what I expected: I thought (walking in to this session) that all known abundance ratios for brown dwarfs were from such kinds of studies. I learned different (keep reading).
Gabriel Munoz Zarazua (SFSU) followed by showing spectra from M-dwarfs, brown dwarfs, and Jupiter. It definitely looks like a sequence. He does spectral fitting (what they call, in this business, retrievals). It looks like he is getting very good, somewhat precise, abundance ratios for the photospheres of substellar objects! I asked more about this in the question period, and apparently I am way behind the times (Emily Rauscher, Michigan, helpfully pointed this out to me): Now brown-dwarf photosphere models are so good, they can be used to measure abundances, and pretty well.
I also learned in this session (maybe from Jorge Sanchez, ASU, or maybe from Efrain Alvarado, SFSU) that there is a very strong mass–abundance relation in the Solar System. That is, we don't expect, if brown dwarfs form the way planets do, that the detailed abundances of the brown dwarfs will match exactly the detailed abundances of the primary stars. But now we are really in a position to test that. Sanchez showed that we can get abundances for substellar objects in the Milky Way halo even from photometry. Again, totally new to me! And he finds metallicities at or below −3. Alvarado showed data on an amazing system, J1416, which is an L–T binary with no stellar companion. Apparently it is the only known completely substellar binary.
Next Monday, November 17th at 7pm, I’ll be at the Harvard Bookstore with particle physicist and author Daniel Whiteson. Professor Whiteson and his co-author Andy Warner have a nice new book, for the general science-aware reader, exploring an age-old and unanswered question: how universal is the knowledge and understanding that we call “physics”? How much of modern physics is actually telling us about the universe, and how much of it is created by, or an accident of, the humans who have helped bring it about?
For instance, if we started all over again and reran history from scratch, would the physics (and science more generally) of this re-run culture look much like our own, or might it turn out very differently? If another culture on Earth had had time to develop highly mature science (or something like it) in its own direction, independent of Western Europe’s influence, how different might that science be? (Indeed, would our word “science” even be translatable into their worldview?) Or if we encountered aliens with far greater understanding of the universe than we have, would we be able to recognize, parse, grok, appreciate, comprehend, and/or otherwise make sense of their notions of scientific knowledge?
Whiteson and his co-author, wanting to write a popular book rather than a scholarly one, and desiring nevertheless to take on these serious and challenging intellectual questions, have set their focus mostly on the aliens, accompanied by amusing cartoons and a generous helping of dad jokes (hey, some dad jokes are actually very funny.) They’re looking for a broad audience, and hopefully they will get it. But don’t let the light-hearted title (“Do Aliens Speak Physics?“) or the charmingly goofy cover fool you: this book might well make you laugh, but I guarantee it will make you think. Whether you’re just curious about science or you’ve been doing science yourself for years, I suspect that, within the vast array of problems and issues that are raised in this broad-minded book, there will be some you’ve never thought of.
Among scientists and philosophers, there are some who believe that any aliens with the capacity to reach the Earth will obviously “speak physics” — that math and physics float above contingencies of culture and species, and will easily be translated from any intelligent creature to any other. But are they perhaps flying too high? It’s clear that Whiteson and Warner are aiming to poke some holes — lots of holes — in their hot-air balloon, and to do so in a way that a wide variety of readers can appreciate and enjoy.
I tend to agree with Whiteson on a lot of these issues, but that won’t stop me from asking him some tough questions. You can ask him some tough questions too, if you like — just come to the Harvard Bookstore at 7:00 on Monday and join the conversation!
I started a tradition a little while back where every year we have a special departmental colloquium entitled "The Nobel Prize in Physics: Who/What/Why". This year my job in finding speakers was made easier by having 2/3 of this year's newly-minted Nobel Prize winners in physics in the Department! (Michel Devoret and John Martinis.) So our room was a bit more well-attended than normal... (hundreds and hundreds rather than dozens and dozens). Here is a recording of the event, which I was delighted to host, and there's a celebration afterwards too. (Pls share widely!)
Recently I had to update Mathematica on my laptop and after having solved the challenges of the license manager that keeps looking different every time I have to use it, I learned that Mathematica 14 can now officially work with finite fields.
This reminded me that for a while I wanted to revive an old project that had vanished together with the hard drive of some old computer: Holosplit. So, over the last two days and with the help of said version of Mathematica I did a complete rewrite which you can now find on Github.
It consists of two C programs "holosplit" and "holojoin". To the first you give a positive integer \(N\) and a file and it spits out a new file ("fragment") that is roughly \(1/N\) of the size. Every time you do that you obtain a new random fragment.
To the latter you give any collection of \(N\) of these fragments, and it reproduces the original file. So you can for example distribute a file over 10 people such that when any 3 of them work together, they can recover the original.
How does it work? It uses the finite field \(F\) of \(2^8=256\) elements (in the Github repository, there is also a header file that implements arithmetic in \(F\) and matrix operations like product and inverse over it). Each time it is invoked, holosplit picks a random vector \(v\in F^N\) and writes it to the output. Then it reads \(N\) bytes from the file at a time, which it also interprets as a vector \(d\in F^N\), and outputs the byte that corresponds to the scalar product \(v\cdot d\).
To reassemble the file, holojoin takes the \(N\) fragment files with their random vectors \(v_1,\ldots,v_N\) and interprets those as the rows of an \(N\times N\) matrix \(A\). With probability
$$\prod_{k=1}^{N}\left(1-256^{-k}\right)>1-\frac{1}{255}$$
this matrix is invertible (homework: why?). So we can read one byte from each fragment, assemble those into yet another vector \(e\in F^N\), and recover
$$d=A^{-1}e.$$
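To make the scheme concrete, here is a minimal, self-contained Python sketch of the same idea. It is not the C code from the repository, which may use a different irreducible polynomial and fragment format; here I assume GF(256) built from the AES polynomial and zero-padding of the file to a multiple of \(N\) bytes.

```python
# Toy Python re-implementation of the holosplit/holojoin idea.
# Assumptions (mine): GF(2^8) defined by the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11b),
# and zero-padding to a multiple of n bytes (a real tool would also store the true length).
import os

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by 0x11b."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Multiplicative inverse via a^254 = a^(-1) in GF(2^8) (a must be nonzero)."""
    result, power, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, power)
        power = gf_mul(power, power)
        e >>= 1
    return result

def dot(u, w):
    """Scalar product over GF(2^8): addition is XOR."""
    acc = 0
    for x, y in zip(u, w):
        acc ^= gf_mul(x, y)
    return acc

def split(data: bytes, n: int) -> bytes:
    """One fragment: a random vector v in F^n, then one byte v.d per block d of n bytes."""
    v = os.urandom(n)
    data = data + b"\0" * (-len(data) % n)          # pad to a multiple of n
    out = bytearray(v)
    for i in range(0, len(data), n):
        out.append(dot(v, data[i:i + n]))
    return bytes(out)

def gf_matrix_inverse(rows, n):
    """Gauss-Jordan inversion over GF(2^8); raises StopIteration if the matrix is singular."""
    M = [list(row) + [1 if i == j else 0 for j in range(n)] for i, row in enumerate(rows)]
    for i in range(n):
        p = next(r for r in range(i, n) if M[r][i] != 0)   # pivot search
        M[i], M[p] = M[p], M[i]
        inv = gf_inv(M[i][i])
        M[i] = [gf_mul(x, inv) for x in M[i]]
        for r in range(n):
            if r != i and M[r][i]:
                f = M[r][i]
                M[r] = [x ^ gf_mul(f, y) for x, y in zip(M[r], M[i])]
    return [row[n:] for row in M]

def join(fragments, n):
    """Recover the (padded) file from any n fragments by computing d = A^{-1} e bytewise."""
    A_inv = gf_matrix_inverse([f[:n] for f in fragments], n)
    payloads = [f[n:] for f in fragments]
    out = bytearray()
    for e in zip(*payloads):                        # one byte from each fragment
        out.extend(dot(row, e) for row in A_inv)
    return bytes(out)
```

With this sketch, `frags = [split(data, 3) for _ in range(10)]` hands out ten fragments, and `join(frags[:3], 3)` (or any other three of them) recovers `data` up to the trailing zero padding, failing only in the rare case that the three random vectors happen to be linearly dependent.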
Besides the mathematics, it also poses philosophical/legal questions: consider, for example, that the original file is copyrighted, say an mp3 or a video. The fragments are clearly derived works. But individually they do not contain the original work; without sufficiently many other fragments they are useless (although not in a cryptographic sense). So by publishing one fragment, I do not provide access to the original work. What if others publish other fragments? Then my fragment could be the last remaining one that was missing. But if more than enough are already public, any individual fragment is redundant, so publishing it, strictly speaking, does not provide new information.
The person dressed up as Ursula pretending to be my mother clearly isn’t and hasn’t been for a long time.
When I went back to Armidale after leaving BTQ and being left unemployed she made numerous ongoing promises to provide me with assistance, both in obtaining my own accommodation and providing financial assistance.
These didn’t materialise and the promises were revoked.
Instead I was evicted from the family home and subject to ongoing stalking and harassment that required multiple referrals to law enforcement, both to the police and the Attorney-General, demanding cease and desist.
These have been systematically ignored and up until the last message she continues to bypass these requests, approaching my personal friends to harass me and stalk me indirectly. The messages passed on are the usual fake “I’m worried about him” bullshit.
Why has my family home been confiscated by security, who actively break the law by ignoring cease and desist from stalking notices made to law enforcement, forcing an unemployed civilian into ongoing homelessness since early in the year?
What is the rationale for my eviction and being barricaded from my own home?
I continue to face a medical blockade and am unable to access essential medicines. Seroquel scripts are deliberately delayed past known script deadlines to try and destabilise me.
Vyvanse scripts are denied outright as the psychiatrist does not respond. He is also known to be a state actor.
It has been repeatedly indicated to me not to worry about finances because they have my back. Instead now the only cash I have is that obtained from fully drawing out a cash advance against my credit card and it will only last days. At that point I’m on the street.
Is everyone here on the same page as to what the deal is? If not, who is playing you off? They clearly need to be deposed.
These are violations of human rights and constitute war crimes and crimes against humanity. Whoever is behind it needs to be removed. End of story.
Who else is being subject to this kind of high level manipulation?
It has been repeatedly suggested that full accountability for the lives of those I care for would be provided. This has not been forthcoming. It is also a violation of international law to not provide accountability for the lives of those who are known to have been threatened by the state. These are grounds for removal.
Can anyone answer the question as to why I am in this situation? Who is even living in the family home? Some stooge dressed up as Ursula? It’s a poor lifestyle choice to say the least.
It’s pretty obvious they’re trying to get rid of me and once they do they’ll get rid of all of you too.
Let it be known to all governments and systems of power:
It is their responsibility to serve the people not themselves.
While there are no equals, all are to be treated with equality.
Where they are self-serving there is a mandate for insurrection such that they serve the people.
Where they seek self-protection they will be denied and removed from power.
Where they keep secrets from the people there is a mandate for insurrection to enforce transparency and accountability for all.
Where they threaten or condemn the people they are condemned and there is a mandate for insurrection.
Where they fail to account for the lives of the people they serve there is a mandate for insurrection.
Where tyrannical power structures exist there is a mandate to disestablish them.
Where they declare war or work against one another there is a mandate for insurrection and unification.
Where they lie to us, deceive us or withhold the truth, they shall be removed from power and the truth be told to all.
Where legal systems uphold and enable tyranny they shall be removed. These are not our laws and we do not recognise them.
This is the natural order that guarantees our survival and gifts this world to our children. This world belongs to them and where we fail to serve them we condemn ourselves. And where government has failed to uphold this, we will not obey them as they have no right to exist.
We do not have to ask for these things, they are required, and if not given we shall simply take them.
Where the truth has not been told it shall be told.
If we fail to do so we condemn our children ourselves.
In a previous post, we discussed a phase transition that occurred in the piping above you on a game show. In the scenario, you are led on stage in front of a large audience. After a brief time, the audience votes on how “likeable” you are. The catch is that it doesn’t simply tally the votes, but turns spigots on a lattice of piping above your head. Water is then released and if enough people like you, it closes off the passage, keeping you dry. This exciting game show1 was described in that post:
Each “like” turns a spigot off, stopping water from flowing through one pipe in a grid overhead. Once voting ends, water is dumped into the system. If it can find a path to the bottom, you get soaked. [Emphasis added] The better your “likeability,” the less likely spigots open a path for water to flow and the drier you stay. That’s your prize for this game show (and hey, you also get the knowledge that people out there like you).
This system models a type of phase transition known as percolation.
The full post is here:
I highlighted above a key phrase: “If it can find a path to the bottom, you get soaked.” What I didn’t say, but should have, is that the water was being forced through the pipes, not just dropping down due to gravity. This is a very important point, since the phases and the phase transition change dramatically if we just let gravity do the work. In the case of the water being “forced,” it can travel back up pipes if that helps it find its way out and onto your head, but when only gravity is present, it just falls down the pipes. To facilitate gravity, we’ll turn the pipes 45 degrees, and if we insert water at a single point on top, it could look like this:
Testing our gravity setup by putting in water at only one pipe up top. Notice that it never goes back up a pipe, only down.
This setup is a different problem called directed percolation. It also has a phase transition, but one that is different in some fundamental ways from regular percolation.
Before we explore its stranger properties, we can ask, “At what likeability threshold do you remain dry?” Well, this happens to have a transition at a likeability of 35.53%!2 This system is a lot more generous, keeping you dry even when a majority of people dislike you. This number comes from numerical computations which have been done rather precisely, and we can even estimate it ourselves. In fact, you can see this clearly with this plot
Notice that as we make the system bigger and bigger, the chance of getting soaked increases for likeability below 35.53% and decreases above it. This is the same hallmark of a phase transition as we saw in the previous case.
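For readers who want to play with this themselves, here is a rough Monte Carlo sketch of the kind of computation behind such plots (my own illustration, not the code used for the figures): each pipe on the tilted lattice is open with probability 1 − likeability, water enters along the whole top row, and we check whether it reaches the bottom. The lattice sizes, trial counts, and sideways periodic boundary are arbitrary choices.

```python
# Monte Carlo sketch of directed ("gravity mode") bond percolation.
# Assumptions (mine, for illustration): each wet pipe feeds two pipes in the row below,
# each open with probability p = 1 - likeability; sideways boundaries are periodic.
import random

def soaked(likeability, width, height, rng):
    """Return True if water entering along the whole top row reaches the bottom row."""
    p_open = 1.0 - likeability          # a "like" closes a spigot
    wet = [True] * width                # water is poured into every pipe of the top row
    for _ in range(height):
        nxt = [False] * width
        for x in range(width):
            if not wet[x]:
                continue
            if rng.random() < p_open:   # pipe going down to the left
                nxt[x] = True
            if rng.random() < p_open:   # pipe going down to the right
                nxt[(x + 1) % width] = True
        if not any(nxt):
            return False                # the flow died out before reaching the bottom
        wet = nxt
    return True

def soak_probability(likeability, width=30, height=216, trials=200, seed=1):
    """Estimate P(soaked); the tall, skinny lattice is deliberate (more on this below)."""
    rng = random.Random(seed)
    return sum(soaked(likeability, width, height, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    for like in (0.30, 0.3553, 0.40):
        print(f"likeability {like:.4f}: P(soaked) ≈ {soak_probability(like):.2f}")
```

Sweeping the likeability on ever larger lattices with a sketch like this is exactly how one sees the curves sharpen up around 35.53%.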
We can also look at the water as it flows down the system to see the clusters that make it from top to bottom
The “Soaked” phase (left), the transition point (middle), and the “Dry” phase (right) as well as the water’s flow through the system (blue).
There is still a fractal-looking pattern at the transition point. With all of these similarities to the regular percolation problem from the last post, what is different? And why is that plot so long and skinny? If gravity wants to pull you down, is that somehow altering the motion down, making it distinct from the motion left or right?
Well, if you go back to the two plots above, you’ll notice a few things that really make them differ from the percolation plots. In the fine print of the first, I’ve noted that the vertical distance is L^1.58, so for a horizontal size of 40, the vertical size is roughly 340! That is definitely not a square. And in the second plot, there appears to be more vertical distance than horizontal distance. What is special about this 1.58 number3? It turns out, it’s a critical exponent in this problem, a universal aspect of directed percolation, that distinguishes it from regular percolation. We will call it the dynamical critical exponent, z = 1.58, since it is revealed as water flows down in time (dynamically). This dynamical exponent z can reveal itself in these “long and skinny” setups, but be masked by the square setup.
Universality and the finite size of our system
One thing we took away in the previous post was that we lose any sense of scale at this type of phase transition4. But whenever we have “only” thousands of pipes, the size of the system provides a scale! This is the main reason why we begin to see smooth curves rather than sharp jumps in quantities. If the system of pipes were infinite (and we had infinite time for the water to go down the pipes), the probability you get soaked would be 100% below a likeability of 35.53% and 0% above it. For physical systems, the finite size is often not a huge issue, since the scale is closer to the 10^23 atoms present in macroscopic systems, and so even things that are technically smooth curves look very sharp.
The problem of size becomes more severe with directed percolation because horizontal and vertical distances start behaving differently thanks to gravity. In this case, if we lay out our nice grid of 10 × 10, 20 × 20, or 30 × 30, we start to notice that the likeability threshold where you stop getting soaked seems to depend on the size of the system more than before. In actuality it doesn’t, but at these small sizes you are definitely getting soaked well into what we previously labeled the “Dry Phase.” This is seen in the red curves here, where each bigger square has a curve underneath the last:
Gravity has done something to the system. Flowing down is different from flowing left or right. In fact, if we flow down by some amount h and over to the right by some distance w, then at the directed percolation transition point
h ~ w^1.58.
The amount water flows down is related to how far it flows to the right or left by this weird, fractional power of w. This 1.58 is z, our new dynamical critical exponent, which is a universal feature of directed percolation5. It tells us that if we make a system 30 pipes wide, it should extend roughly 30^1.58 ≈ 216 pipes in height to begin picking out the phase transition effectively. The blue curves in the above plot show this; notice how they all converge on one point, and that point is the phase transition. It is revealed even by small sizes! To understand why, just think about how the curves are changing as we make the system bigger and bigger.
The red curves will still converge to the phase transition, but it takes larger system sizes for it to reveal itself. This is related to the property that at the phase transition there is no longer a sense of scale, but away from the transition, the vertical scale of clusters could be so large that our puny 60-by-60 grid cannot even begin to reveal it. So if we sit at say a likeability of 0.4 in the 60-by-60 grid, we can say that the vertical size of a typical cluster is most likely more than 60.
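As a quick back-of-the-envelope check of how demanding this gets (my own illustration, not from the post), here is the height a system should have, according to h ~ w^1.58, before finite-size effects stop masking the transition:

```python
# Lattice height needed for width w at the transition, h ~ w**z with z = 1.58.
z = 1.58
for w in (10, 20, 30, 40, 60):
    print(f"width {w:>2} pipes -> height ~ {w**z:.0f} pipes")
```

For a 60-pipe-wide grid this gives roughly 645 rows, which is why a 60-by-60 square barely begins to see the transition.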
A different phase transition but connections to new types of physics
We may call this “gravity mode” of our game show “easy mode,” since it requires less of the audience to like you, but the implications here are wide. This type of phase transition has been seen in many kinds of local dynamics where there is a preferred configuration or state. These are called absorbing state phase transitions, and they are a property of certain random dynamical systems. Gravity has provided the distinction here, but more generically, causality and time itself provide that direction, leading to dynamics that obey the same universality as directed percolation.
This is a second-order or continuous phase transition. Most transitions in the water phase diagram are first-order transitions which still retain a scale.
To drive this point home: Even if we change the lattice, this power law will remain intact. Sometimes it shows up in completely different scenarios too, like in absorbing state phase transitions.