Planet Musings

March 24, 2023

Scott Aaronson Xavier Waintal responds (tl;dr Grover is still quadratically faster)

This morning Xavier Waintal, coauthor of the new arXiv preprint “””refuting””” Grover’s algorithm, which I dismantled here yesterday, emailed me a two-paragraph response. He remarked that the “classy” thing for me to do would be to post the response on my blog, but: “I would totally understand if you did not want to be contradicted in your own zone of influence.”

Here is Waintal’s response, exactly as sent to me:

The elephant in the (quantum computing) room: opening the Pandora box of the quantum oracle

One of the problem we face in the field of quantum computing is a vast diversity of cultures between, say, complexity theorists on one hand and physicists on the other hand. The former define mathematical objects and consider any mathematical problem as legitimate. The hypothesis are never questioned, by definition. Physicists on the other hand spend their life questioning the hypothesis, wondering if they do apply to the real world. This dichotomy is particularly acute in the context of the emblematic Grover search algorithm, one of the cornerstone of quantum computing. Grover’s algorithm uses the concept of “oracle”, a black box function that one can call, but of which one is forbidden to see the source code. There are well known complexity theorems that show that in this context a quantum computer can solve the “search problem” faster than a classical computer.

But because we closed our eyes and decided not to look at the source code does not mean it does not exist. In, Miles Stoudenmire and I deconstruct the concept of oracle and show that as soon as we give the same input to both quantum and classical computers (the quantum circuit used to program the oracle on the actual quantum hardware) then the *generic* quantum advantage disappears. The charge of the proof is reversed: one must prove certain properties of the quantum circuit in order to possibly have a theoretical quantum advantage. More importantly – for the physicist that I am – our classical algorithm is very fast and we show that we can solve large instances of any search problem. This means that even for problems where *asymptotically* quantum computers are faster than classical ones, the crossing point where they become so is for astronomically large computing time, in the tens of millions of years. Who is willing to wait that long for the answer to a single question, even if the answer is 42?

The above explicitly confirms something that I realized immediately on reading the preprint, and that fully explains the acerbic tone of my response. Namely, Stoudenmire and Waintal’s beef isn’t merely with Grover’s algorithm, or even with the black-box model; it’s with the entire field of complexity theory. If they were right that complexity theorists never “questioned hypotheses” or wondered what did or didn’t apply to the real world, then complexity theory shouldn’t exist in CS departments at all—at most it should exist in pure math departments.

But a converse claim is also true. Namely, suppose it turned out that complexity theorists had already fully understood, for decades, all the elementary points Stoudenmire and Waintal were making about oracles versus explicit circuits. Suppose complexity theorists hadn’t actually been confused, at all, about under what sorts of circumstances the square-root speedup of Grover’s algorithm was (1) provable, (2) plausible but unproven, or (3) nonexistent. Suppose they’d also been intimately familiar with the phenomenon of asymptotically faster algorithms that get swamped in practice by unfavorable constants, and with the overhead of quantum error-correction. Suppose, indeed, that complexity theorists hadn’t merely understood all this stuff, but expressed it clearly and accurately where Stoudenmire and Waintal’s presentation was garbled and mixed with absurdities (e.g., the Grover problem “being classically solvable with a linear number of queries,” the Grover speedup not being “generic,” their being able to “solve large instances of any search problem” … does that include, for example, CircuitSAT? do they still not get the point about CircuitSAT?).

Anyway, we don’t have to suppose! In the SciRate discussion of the preprint, a commenter named Bibek Pokharel helpfully digs up some undergraduate lecture notes from 2017 that are perfectly clear about what Stoudenmire and Waintal treat as revelations (though one could even go 20 years earlier). The notes are focused here on Simon’s algorithm, but the discussion generalizes to any quantum black-box algorithm, including Grover’s:

The difficulty in claiming that we’re getting a quantum speedup [via Simon’s algorithm] is that, once we pin down the details of how we’re computing [the oracle function] f—so, for example, the actual matrix A [such that f(x)=Ax]—we then need to compare against classical algorithms that know those details as well. And as soon as we reveal the innards of the black box, the odds of an efficient classical solution become much higher! So for example, if we knew the matrix A, then we could solve Simon’s problem in classical polynomial time just by calculating A‘s nullspace. More generally, it’s not clear whether anyone to this day has found a straightforward “application” of Simon’s algorithm, in the sense of a class of efficiently computable functions f that satisfy the Simon promise, and for which any classical algorithm plausibly needs exponential time to solve Simon’s problem, even if the algorithm is given the code for f.
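To make the nullspace remark in the quoted notes concrete, here is a minimal Python sketch, assuming the oracle really is f(x)=Ax over GF(2) and that A is handed to us as a list of row bitmasks. The function names and the tiny 3-bit example are illustrative assumptions, not anything taken from the lecture notes.

```python
def f(rows, x):
    """Evaluate f(x) = A x over GF(2); bit i of x is coordinate x_i."""
    return tuple(bin(row & x).count("1") % 2 for row in rows)


def nullspace_gf2(rows, n):
    """Basis of {x in GF(2)^n : A x = 0}, by Gaussian elimination on bitmasks."""
    pivots = {}  # pivot column -> fully reduced row
    for row in rows:
        # Clear every existing pivot column from the incoming row.
        for col, prow in pivots.items():
            if (row >> col) & 1:
                row ^= prow
        if row == 0:
            continue  # row was linearly dependent on earlier rows
        col = row.bit_length() - 1  # use the highest set bit as the new pivot
        # Keep earlier pivot rows reduced with respect to the new pivot column.
        for c in list(pivots):
            if (pivots[c] >> col) & 1:
                pivots[c] ^= row
        pivots[col] = row
    basis = []
    for free in range(n):
        if free in pivots:
            continue
        x = 1 << free  # set one free variable to 1
        for col, prow in pivots.items():
            if (prow >> free) & 1:
                x |= 1 << col  # pivot variable forced by its row equation
        basis.append(x)
    return basis


# Hypothetical example: rows encode x_1 = 0 and x_0 + x_2 = 0,
# so the hidden string is s = 0b101.
A = [0b010, 0b101]
secret = nullspace_gf2(A, 3)
```

Here `secret` is a one-element basis whose entry plays the role of the hidden string s, recovered in polynomial time with no oracle queries at all. That is exactly why the comparison has to be against classical algorithms that also see A.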

In the same lecture notes, one can find the following discussion of Grover’s algorithm, and how its unconditional square-root speedup becomes conditional as soon as the black box is instantiated by an explicit circuit:

For an NP-complete problem like CircuitSAT, we can be pretty confident that the Grover speedup is real, because no one has found any classical algorithm that’s even slightly better than brute force. On the other hand, for more “structured” NP-complete problems, we do know exponential-time algorithms that are faster than brute force. For example, 3SAT is solvable classically in about O(1.3^n) time. So then, the question becomes a subtle one of whether Grover’s algorithm can be combined with the best classical tricks that we know to achieve a polynomial speedup even compared to a classical algorithm that uses the same tricks. For many NP-complete problems the answer seems to be yes, but it need not be yes for all of them.

The notes in question were written by some random complexity theorist named Scot Aronsen (sp?). But if you don’t want it from that guy, then take it from (for example) the Google quantum chemist Ryan Babbush, again on the SciRate page:

It is well understood that applying Grover’s algorithm to 3-SAT in the standard way would not give a quadratic speedup over the best classical algorithm for 3-SAT in the worst case (and especially not on average). But there are problems for which Grover is expected to give a quadratic speedup over any classical algorithm in the worst case. For example, the problem “Circuit SAT” starts by me handing you a specification of a poly-size classical circuit with AND/OR/NOT gates, so it’s all explicit. Then you need to solve SAT on this circuit. Classically we strongly believe it will take time 2^n (this is even the basis of many conjectures in complexity theory, like the exponential time hypothesis), and quantumly we know it can be done with 2^{n/2}*poly(n) quantum gates using Grover and the explicitly given classical circuit. So while I think there are some very nice insights in this paper, the statement in the title “Grover’s Algorithm Offers No Quantum Advantage” seems untrue in a general theoretical sense. Of course, this is putting aside issues with the overheads of error-correction for quadratic speedups (a well understood practical matter that is resolved by going to large problem sizes that wouldn’t be available to the first fault-tolerant quantum computers). What am I missing?

More generally, over the past few days, as far as I can tell, every actual expert in quantum algorithms who’s looked at Stoudenmire and Waintal’s preprint—every one, whether complexity theorist or physicist or chemist—has reached essentially the same conclusions about it that I did. The one big difference is that many of the experts, who are undoubtedly better people than I am, extended a level of charity to Stoudenmire and Waintal (“well, this of course seems untrue, but here’s what it could have meant”) that Stoudenmire and Waintal themselves very conspicuously failed to extend to complexity theory.

Scott Aaronson Of course Grover’s algorithm offers a quantum advantage!

Unrelated Update: Huge congratulations to Ethernet inventor Bob Metcalfe, for winning UT Austin’s third Turing Award after Dijkstra and Emerson!

And also to mathematician Luis Caffarelli, for winning UT Austin’s third Abel Prize!

I was really, really hoping that I’d be able to avoid blogging about this new arXiv preprint, by E. M. Stoudenmire and Xavier Waintal:

Grover’s Algorithm Offers No Quantum Advantage

Grover’s algorithm is one of the primary algorithms offered as evidence that quantum computers can provide an advantage over classical computers. It involves an “oracle” (external quantum subroutine) which must be specified for a given application and whose internal structure is not part of the formal scaling of the quantum speedup guaranteed by the algorithm. Grover’s algorithm also requires exponentially many steps to succeed, raising the question of its implementation on near-term, non-error-corrected hardware and indeed even on error-corrected quantum computers. In this work, we construct a quantum inspired algorithm, executable on a classical computer, that performs Grover’s task in a linear number of call to the oracle – an exponentially smaller number than Grover’s algorithm – and demonstrate this algorithm explicitly for boolean satisfiability problems (3-SAT). Our finding implies that there is no a priori theoretical quantum speedup associated with Grover’s algorithm. We critically examine the possibility of a practical speedup, a possibility that depends on the nature of the quantum circuit associated with the oracle. We argue that the unfavorable scaling of the success probability of Grover’s algorithm, which in the presence of noise decays as the exponential of the exponential of the number of qubits, makes a practical speedup unrealistic even under extremely optimistic assumptions on both hardware quality and availability.

Alas, inquiries from journalists soon made it clear that silence on my part wasn’t an option.

So, desperately seeking an escape, this morning I asked GPT-4 to read the preprint and comment on it just like I would. Sadly, it turns out the technology isn’t quite ready to replace me at this blogging task. I suppose I should feel good: in every such instance, either I’m vindicated in all my recent screaming here about generative AI—what the naysayers call “glorified autocomplete”—being on the brink of remaking civilization, or else I still, for another few months at least, have a role to play on the Internet.

So, on to the preprint, as reviewed by the human Scott Aaronson. Yeah, it’s basically a tissue of confusions, a mishmash of the well-known and the mistaken. As they say, both novel and correct, but not in the same places.

The paper’s most eye-popping claim is that the Grover search problem (namely, finding an n-bit string x such that f(x)=1, given oracle access to a Boolean function f:{0,1}^n→{0,1}) is solvable classically, using a number of calls that’s only linear in n, or in many cases only constant (!!). Since this claim contradicts a well-known, easily provable lower bound, namely that Ω(2^n) oracle calls are needed for classical brute-force searching, the authors must be using words in nonstandard ways, leaving only the question of how.

It turns out that, for their “quantum-inspired classical algorithm,” the authors assume you’re given, not merely an oracle for f, but the actual circuit to compute f. They then use that circuit in a non-oracular way to extract the marked item. In which case, I’d prefer to say that they’ve actually solved the Grover problem with zero queries—simply because they’ve entirely left the black-box setting where Grover’s algorithm is normally formulated!

What could possibly justify such a move? Well, the authors argue that sometimes one can use the actual circuit to do better classically than Grover’s algorithm would do quantumly, and therefore, they’ve shown that the Grover speedup is not “generic,” as the quantum algorithms people always say it is.

But this is pure wordplay around the meaning of “generic.” When we say that Grover’s algorithm achieves a “generic” square-root speedup, what we mean is that it solves the generic black-box search problem in O(2^{n/2}) queries, whereas any classical algorithm for that generic problem requires Ω(2^n) queries. We don’t mean that for every f, Grover achieves a quadratic speedup for searching that f, compared to the best classical algorithm that could be tailored to that f. Of course we don’t; that would be trivially false!
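For readers who want to see the black-box query count in action, here is a minimal sketch in plain Python. It tracks the 2^n amplitudes directly, assumes a single marked item, and counts one oracle call per Grover iteration. The function names are illustrative assumptions. Note that the simulation itself takes time proportional to 2^n per step, so it illustrates the number of queries, not a classical shortcut.

```python
import math


def optimal_iterations(n):
    """Roughly (pi/4) * sqrt(2^n) Grover iterations for one marked item."""
    return math.floor(math.pi / 4 * math.sqrt(2 ** n))


def grover_success_probability(n, marked, iterations):
    """Probability of measuring `marked` after the given number of iterations."""
    size = 2 ** n
    amp = [1 / math.sqrt(size)] * size  # uniform superposition over all strings
    for _ in range(iterations):
        amp[marked] = -amp[marked]  # one oracle call: phase-flip the marked item
        mean = sum(amp) / size
        amp = [2 * mean - a for a in amp]  # diffusion: inversion about the mean
    return amp[marked] ** 2


n = 10
queries = optimal_iterations(n)
probability = grover_success_probability(n, marked=123, iterations=queries)
```

With n = 10, `queries` comes out to 25 and `probability` is above 0.99, whereas a classical black-box search over the same 1024 strings needs on the order of 2^n queries.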

Remarkably, later in the paper, the authors seem to realize that they haven’t delivered the knockout blow against Grover’s algorithm that they’d hoped for, because they then turn around and argue that, well, even for those f’s where Grover does provide a quadratic speedup over the best (or best-known) classical algorithm, noise and decoherence could negate the advantage in practice, and solving that problem would require a fault-tolerant quantum computer, but fault-tolerance could require an enormous overhead, pushing a practical Grover speedup far into the future.

The response here boils down to “no duh.” Yes, if Grover’s algorithm can yield any practical advantage in the medium term, it will either be because we’ve discovered much cheaper ways to do quantum fault-tolerance, or else because we’ve discovered “NISQy” ways to exploit the Grover speedup, which avoid the need for full fault-tolerance—for example, via quantum annealing. The prospects are actually better for a medium-term advantage from Shor’s factoring algorithm, because of its exponential speedup. Hopefully everyone in quantum computing theory has realized all this for a long time.

Anyway, as you can see, by this point we’ve already conceded the principle of Grover’s algorithm, and are just haggling over the practicalities! Which brings us back to the authors’ original claim to have a principled argument against the Grover speedup, which (as I said) rests on a confusion over words.

Some people dread the day when GPT will replace them. In my case, for this task, I can’t wait.

Thanks to students Yuxuan Zhang (UT) and Alex Meiburg (UCSB) for discussions of the Stoudenmire-Waintal preprint that informed this post. Of course, I take sole blame for anything anyone dislikes about the post!

For a much more technical response—one that explains how this preprint’s detailed attempt to simulate Grover classically fails, rather than merely proving that it must fail—check out this comment by Alex Meiburg.

Jordan Ellenberg Learning a made-up language, III

The “made-up language” task that GPT-3 couldn’t do is handled much better by GPT-4, or at least the version of GPT-4 built into Bing Chat. The prompt:

Henderson’s native language is Gravinic, and when he speaks English, it sounds like this:

“I to the store did go and some cabbage did buy. Now I some dinner am making and I some cake will later eat. I you am not very much liking. I for you to go away am politely asking.”

Write more dialogue where Henderson tells you about his day.

The output:

Henderson continued to speak in his native language, Gravinic, which was very difficult for me to understand. He seemed to have a lot of things to say about his day, but I could only catch some words here and there.

“I today at work did have a meeting with my boss. He me did give a new project that I by next week must finish. It very hard and complicated is. I not sure if I it can do. He me also did say that I more hours must work and less pay will get. He very mean and unfair is. I him do not like at all.”

He paused for a moment and looked at me expectantly. I nodded politely, pretending to follow his story.

“Then I at lunch did go with my friend Bob. He me did tell that he a new girlfriend has. She very beautiful and smart is. He her met online on a dating site. He me showed her picture on his phone. I it did see and very jealous was. I also a girlfriend want but no one me likes.”

He sighed and shook his head sadly.

“Then I after work did go to the gym. I some exercise did do and some weights did lift. I very strong and fit am. But no one me notices or compliments gives. They only at me stare and laugh do behind my back. They very rude and mean are. I them do not like either.”

He frowned and clenched his fists.

“Then I home did come and you here did find waiting for me. You me always questions ask and annoy do with your presence. You very annoying and boring are. I you do not like too.”

He glared at me angrily.

“I you am now telling that you must leave do right now or else I you will hit do with this frying pan.”

He pointed at the frying pan on the stove menacingly.

I decided it was time to end this conversation before he got violent.

I quickly got up from my seat and headed towards the door.

Not perfect, but it really has picked up that in this version of English, verbs go last and past tense is indicated with “did.” It hasn’t really picked up how to do present tense (i.e. it should have said “They only are at me staring and are behind my back laughing”) and it is inconsistent in what it does with adverb phrases and there are some weird uses of “do” towards the end, but altogether a big improvement.

Jordan Ellenberg Lecture and egotism

I was doing a guest lecture / Q&A in a science communication course at Cornell, where I was a professor at large last week. A friend of mine was in the back of the room, and later told me: “A bunch of students were on their laptops, but only about half of them were looking at Instagram and the other half were looking at information about you and your books!”

He meant this to be a compliment but my initial reaction was: what? Students were looking at their laptops while we were talking? I had no idea. In my mental construction of the event, both while it was happening and afterwards, every eye in the room was on me.

And this made me reflect — when I am giving a talk, or even a classroom lecture, I’m inclined to always think I’m being listened to. But of course that’s not true! It couldn’t be true!

There are limits, of course. If I’m lecturing and I’ve lost the whole room, I see their eyes die and I notice it. I stop and regroup and change course. But if half the kids are tuned out? I’m just gonna be honest, I probably don’t notice that.

Now you can read this as saying I’m a huge egotist who relies on unrealistic assessments of how interesting I’m being, and thanks to this reliance am failing to engage the class. Or you could say it’s very, very hard to teach class in such a way that there’s not some notable proportion of students tuned out at any given moment, and that it would be even harder to teach class well if you were constantly aware of which students those were. And as a counterpoint to that sympathetic assessment, you could say it’s not a random and constantly shifting sample of students who are tuned out; there might be a notable proportion who are almost always tuned out and who I’m allowing myself to fail, or rather to not even try, to reach.

I don’t really know!

March 23, 2023

John Baez The Galactic Center

You’ve probably heard there’s a supermassive black hole at the center of the Milky Way—and also that near the center of our galaxy there are a lot more stars. But did you ever think hard about what the Galactic Center is like?

I didn’t, until recently. As a kid I read about it in science fiction—like Asimov’s Foundation trilogy, where the capital of the Empire is near the Galactic Center on the world of Trantor, with a population of 40 billion. That shaped my impressions.

But now we know more. And it turns out the center of our galaxy is a wild and woolly place! Besides that black hole 4 million times the mass of our Sun, it’s full of young clusters of stars, supernova remnants, molecular clouds, weird filaments of gas, and more.

It’s in the constellation of Sagittarius, abbreviated ‘Sgr’. Let me go through the various features named above and explain them.

Sgr A contains the supermassive black hole called Sgr A*, which is worth a whole article of its own. Surrounding that is the Minispiral: a three-armed spiral of dust and gas falling into the black hole at speeds up to 1000 kilometers per second.

Also in Sgr A, surrounding the Minispiral, there is a torus of cooler molecular gas called the ‘Circumnuclear Disk’:

The inner radius of the Circumnuclear Disk is almost 5 light years. And inside this disk there are over 10 million stars. That’s a lot! Remember, the nearest stars to our Sun are 4 light years away.
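To get a feel for how crowded that is, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the 10 million stars are spread evenly through a sphere of radius 5 light years; the real distribution is surely more concentrated toward the center. The function name is an assumption.

```python
import math

AU_PER_LIGHT_YEAR = 63241.1  # astronomical units in one light year


def mean_star_spacing_ly(n_stars, radius_ly):
    """Typical distance between neighboring stars, assuming uniform density in a sphere."""
    volume = 4 / 3 * math.pi * radius_ly ** 3  # cubic light years
    density = n_stars / volume  # stars per cubic light year
    return density ** (-1 / 3)


spacing_ly = mean_star_spacing_ly(10_000_000, 5.0)
spacing_au = spacing_ly * AU_PER_LIGHT_YEAR
crowding = 4.0 / spacing_ly  # compared with the 4 light years to our nearest stars
```

Under that assumption the typical spacing is a few hundredths of a light year, a couple of thousand times the Earth-Sun distance, so neighboring stars sit roughly a hundred times closer together than the Sun and its nearest neighbors.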

Even weirder, among these stars there are lots of old red giants—but also many big, young stars that formed in a single event a few million years ago. These include about 100 OB stars, which are blue-hot, and Wolf-Rayet stars, which have blown off their outer atmosphere and are shining mainly in the ultraviolet.

Nobody knows how so many stars were able to form inside the Circumnuclear Disk despite the gravitational disruption of the central black hole, and why so many are young. This is called the ‘paradox of youth’.

Stars don’t seem to be forming now in this region. But some predict that stars will form in the Circumnuclear Disk, perhaps causing a starburst in 200 million years, with many stars forming rapidly, and supernovae going off at a hundred times the current rate! As gas from these falls into the central black hole, life may get very exciting.

As if this weren’t enough, a region of Sgr A called Sgr A East contains a structure approximately 25 light-years in width that looks like a supernova remnant, perhaps created between 35 and 100 thousand years ago. However, it would take 50 to 100 times more energy than a standard supernova explosion to create a structure of this size and energy. So, it’s a bit mysterious.

Moving further out, let’s turn to the Radio Arc, called simply ‘Arc’ in the picture at the top of this article. This is the largest of a thousand mysterious filaments that emit radio waves. It’s obvious that the Galactic Center is wild, but these make it ‘woolly’. Nobody knows what causes them!

Here is the Radio Arc and some filaments:

Behind the Radio Arc is the Quintuplet Cluster, which contains one of the largest stars in the Galaxy—but more about that some other day.

Sgr B1 is a cloud of ionized gas. Nobody knows why it’s ionized. Like the filaments, perhaps it was heated up back when the black hole was eating more stars and emitting more radiation. Sgr B1 is connected to Sgr B2, a giant molecular cloud made of gas and dust, 3 million times the mass of the Sun.

The distance from Sgr A to Sgr B2 is 390 light years. That gives you a sense of the scale here! The whole picture spans a region in the sky 4 times the angular size of the Moon.

The two things called SNR are supernova remnants—hot gas shooting outwards from exploded stars. For example, in the top picture at lower right we see SNR 359.1-0.5, which looks like this close up:

The filament at right is called the Snake, while the Mouse at left is actually supposed to be a runaway pulsar. It looks like the Mouse is running away from the Snake! But that’s probably a coincidence.

Sgr D is another giant molecular cloud, and Sgr C is a group of molecular clouds.

So, a lot is going on in our galaxy’s center! Out here in the boondocks it’s more quiet.

Let me show you the first picture in all its glory without the labels. Click to enlarge:

It’s almost impossible to see the Galactic Center in visible light through all the dust, so this is an image in radio waves, made by the MeerKAT array of 64 radio dishes in South Africa. It was made by Ian Heywood with color processing by Juan Carlos Munoz-Mateos.

Here are two other versions of the same image, processed in different ways:

Click to enlarge!

March 22, 2023

n-Category Café Azimuth Project News

I blog here and also on Azimuth. Here I tend to talk about pure math and mathematical physics. There I talk about the Azimuth Project.

Let me say a bit about how that’s been going. My original plans didn’t work as expected. But I joined forces with other people who came up with something pretty cool: a rather general software framework for scientific modeling, which explicitly uses abstractions such as categories and operads. Then we applied it to epidemiology.

This is the work of many people, so it’s hard to name them all, but I’ll talk about some.

The Azimuth Project started in 2010 when I moved to Singapore, had more time to think thanks to a great job at the Centre for Quantum Technologies, and decided to do something about climate change—or more broadly, the Anthropocene.

But do what? You can read my very first thoughts here. I rounded up some interested people, many of them programmers from outside academia, and we started a wiki to compile relevant scientific information. We thought a lot and wrote a lot about the huge problems confronting our civilization. We did some interesting stuff like making simple climate models—purely for educational purposes, not for trying to predict anything! We also recapitulated a network-based attempt to predict El Niños.

But it soon became clear to me that my own strengths lay not in climate science, and certainly not in leading a group of people outside academia trying to accomplish something practical. I got more and more interested in using category theory to study networks—and more generally in getting category theorists interested in practical things. I figured that category theory could really transform how we think about complex systems made of interacting parts.

I understand a bit about what motivates academics, and how to get them working on things. So, once I put my mind to it, I managed to speed up the trend toward applied category theory, which by now has its own annual conference. I’m on the steering committee of that conference, but luckily there are so many energetic people involved that I don’t have to do much. By now I can barely keep up with the progress in applied category theory, which is visible on the Category Theory Community Server, a forum set up by my student Christian Williams.

Indeed, part of how academia works is that if you get really good students, they go off and do things much better than you could do yourself!

For example, my former student Brendan Fong is an order of magnitude better at organizing things than I am. Together with Joshua Tan and Nina Otter he started the journal Compositionality, which has a strong emphasis on applied category theory, though it’s also open to other ways of thinking about compositionality (the study of how complex things can be assembled out of simpler parts). But even more importantly, Brendan now leads the Topos Institute, which brings together applied category theorists and people developing new technologies for the betterment of humanity. I’ll get back to that later.

Another amazingly successful student of mine is Nina Otter, now at Queen Mary University. At least I’ll gladly count her as a student, because she did a master’s thesis with me, on operads and the tree of life. But then she switched to topological data analysis, and she’s now using that to study weather regimes.

A big part of the Azimuth project’s focus on networks has always been studying Petri nets: a general formalism for studying chemical reactions, population biology and many other things.
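For readers who haven’t met Petri nets, here is a minimal Python sketch of the idea, assuming mass-action kinetics and using the familiar SIR disease model as the example. The class and function names, the rate constants and the crude Euler integrator are all illustrative assumptions; this is not the AlgebraicJulia API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    name: str
    rate: float
    inputs: tuple   # species consumed, listed with repetition
    outputs: tuple  # species produced, listed with repetition


def rate_equation(transitions, state):
    """Mass-action time derivative of each species for a Petri net."""
    deriv = {s: 0.0 for s in state}
    for t in transitions:
        flow = t.rate
        for s in t.inputs:
            flow *= state[s]  # flow is proportional to the product of the inputs
        for s in t.inputs:
            deriv[s] -= flow
        for s in t.outputs:
            deriv[s] += flow
    return deriv


def euler_step(transitions, state, dt):
    """Advance the state by one explicit Euler step of size dt."""
    d = rate_equation(transitions, state)
    return {s: state[s] + dt * d[s] for s in state}


# SIR as a Petri net: two transitions acting on three species.
SIR = [
    Transition("infection", 0.3, ("S", "I"), ("I", "I")),
    Transition("recovery", 0.1, ("I",), ("R",)),
]

state = {"S": 0.99, "I": 0.01, "R": 0.0}
for _ in range(1000):
    state = euler_step(SIR, state, 0.1)
```

The same `rate_equation` works unchanged for a chemical reaction network or a population model: only the list of transitions changes, which is what makes the formalism so general.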

A bunch of blog articles on Petri nets, written at the Centre for Quantum Technologies with Jacob Biamonte, eventually turned into our book Quantum Techniques for Stochastic Mechanics. But a new direction came when Brendan Fong developed decorated cospans, a general technique for studying open systems. My student Blake Pollard and I used these to study ‘open Petri nets’, which we called open reaction networks.

Later, my student Jade Master made the theory of open Petri nets really beautiful using structured cospans, a simplified version of Brendan’s decorated cospans developed by my student Kenny Courser.

Meanwhile something big was brewing. Two fresh PhDs named James Fairbanks and Evan Patterson came up with AlgebraicJulia, a software system that aims to “create novel approaches to scientific computing based on applied category theory”. And among many other things, they grabbed ahold of structured cospans and turned them into something you could write programs with!

In October 2020, together with Micah Halter, they used AlgebraicJulia to redo part of the UK’s main COVID model using open Petri nets. At the time I wrote:

This is a wonderful development! Micah Halter and Evan Patterson have taken my work on structured cospans with Kenny Courser and open Petri nets with Jade Master, together with Joachim Kock’s whole-grain Petri nets, and turned them into a practical software tool!

Then they used that to build a tool for ‘compositional’ modeling of the spread of infectious disease. By ‘compositional’, I mean that they make it easy to build more complex models by sticking together smaller, simpler models.

Even better, they’ve illustrated the use of this tool by rebuilding part of the model that the UK has been using to make policy decisions about COVID19.

All this software was written in the programming language Julia.

I had expected structured cospans to be useful in programming and modeling, but I didn’t expect it to happen so fast!
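The ‘compositional’ idea in the passage above, building bigger models by sticking together smaller ones, can be caricatured in a few lines of Python. This is a deliberately simplified stand-in: nets are plain dictionaries, and gluing just identifies species that share a name. It is not the structured cospan construction and not the AlgebraicJulia API, and all names here are hypothetical.

```python
def species(net):
    """All species mentioned by a net given as {transition: (inputs, outputs)}."""
    return {s for ins, outs in net.values() for s in ins + outs}


def glue(net_a, net_b):
    """Compose two nets, identifying any species that appear in both by name."""
    clash = net_a.keys() & net_b.keys()
    if clash:
        raise ValueError(f"duplicate transition names: {sorted(clash)}")
    return {**net_a, **net_b}


# Three small models, each with a single transition.
infection = {"infect": (("S", "I"), ("I", "I"))}
recovery = {"recover": (("I",), ("R",))}
vaccination = {"vaccinate": (("S",), ("R",))}

# Stick them together along their shared species.
sir = glue(infection, recovery)
sir_with_vaccination = glue(sir, vaccination)
shared = species(infection) & species(recovery)
```

Here `shared` is the interface along which the infection and recovery models are glued. Keeping track of such interfaces explicitly, rather than relying on matching names, is what the category-theoretic machinery is for.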

Here’s a video about these ideas, from 2020:

Later Evan got a job at the Topos Institute and this work blossomed into the following paper:

I should have blogged about this, but things are happening so fast I never got around to it! This illustrates why I’ve lost interest in the Azimuth Project as originally formulated, with this blog as the main communication hub and the wiki as the information depot: academics with their own modes of communication have been pushing things forward in their own ways too fast for me to blog about it all!

Another example: last summer in Buffalo I helped mentor a bunch of students at a program on applied category theory run by the American Mathematical Society. This led to two very nice papers on open Petri nets and related open networks:

I want to blog about these, and I will soon!

But at the same time, the use of category theory in epidemiological modeling keeps growing. The early work attracted the attention of a bunch of actual epidemiologists, notably my old grad school pal Nate Osgood, who now works at the University of Saskatchewan, both in computer science and also the department of community health and epidemiology. He helps the government of Canada run its main COVID models! This was a wonderful coincidence, made even sweeter by the fact that Nate was hankering to apply category theory to these tasks.

Nate explained that for modeling disease, Petri nets are less popular than another style of diagram, called ‘stock-flow diagrams’. But one can deal with open stock-flow diagrams using the same category-theoretic tricks that work for Petri nets: decorated or structured cospans. We worked this out together with Evan Patterson, Nate’s grad student Xiaoyan Li, and Sophie Libkind at the Topos Institute. And these folks—not me—converted these ideas into AlgebraicJulia code for making big models of epidemic disease out of smaller parts!

We wrote about it here:

Alas, I’ve been too busy to properly blog about this paper, but I’ve given a bunch of talks about it, and you can see some on YouTube. The easiest is probably this one:

Since then we’ve made a huge amount of progress, due largely to Nate and Xiaoyan’s enthusiasm for converting abstract ideas into practical tools for epidemiologists. The current state of the art is pretty well reflected in this paper:

In particular, Nate’s student Eric Redekopp built a graphical user interface for the software, so epidemiologists knowing nothing of category theory or the language Julia can collaboratively build disease models on their web browsers!

So, a lot of my energy that originally went into the Azimuth Project has, by a series of unpredictable events, become focused on the project of applied category theory, with the most practical application for me currently being disease models.

What happened to climate change? Well, a lot of these modeling methodologies could be applied to power grids or world economic models. In fact stock-flow diagrams were first developed for economics and business in Jay Forrester’s book Industrial Dynamics, and they were later used in the famous Limits to Growth model of the world economy and ecology, called World3. So there is a lot to do in this direction. But, I’ve realized, it would require finding an energetic expert who is willing to learn some category theory and teach me (or some other applied category theorist) what they know.

For now, a more instantly attractive option is working with someone I’ve known since I was a postdoc: Minhyong Kim. He’s now head of the International Centre for Mathematical Sciences, and he’s dreamt up a project called Mathematics for Humanity. This will fund research workshops, conferences and courses in these areas:

A. Integrating the global research community

B. Mathematical challenges for humanity

C. Global history of mathematics

I’m hoping to coax people to run a workshop on mathematical epidemiology, but also get people together to tackle many other mathematical challenges for humanity. Minhyong has listed some examples:

The deadline to apply for funding is now June 1st, so if you know anyone who might be interested, please tell them about this—and tell me about them!

Scott Aaronson On overexcitable children

Update (March 21): After ChatGPT got “only” a D on economist Bryan Caplan’s midterm exam, Bryan bet against any AI getting A’s on his exams before 2029. A mere three months later, GPT-4 has earned an A on the same exam (having been trained on data that ended before the exam was made public). Though not yet conceding the bet on a technicality, Bryan has publicly admitted that he was wrong, breaking a string of dozens of successful predictions on his part. As Bryan admirably writes: “when the answers change, I change my mind.” Or as he put it on Twitter:

AI enthusiasts have cried wolf for decades. GPT-4 is the wolf. I’ve seen it with my own eyes.

And now for my own prediction: this is how the adoption of post-GPT AI is going to go, one user at a time having the “holy shit” reaction about an AI’s performance on a task that they personally designed and care about—leaving, in the end, only a tiny core of hardened ideologues to explain to the rest of us why it’s all just a parrot trick and none of it counts or matters.

Another Update (March 22): Here’s Bill Gates:

In September, when I met with [OpenAI] again, I watched in awe as they asked GPT, their AI model, 60 multiple-choice questions from the AP Bio exam—and it got 59 of them right. Then it wrote outstanding answers to six open-ended questions from the exam. We had an outside expert score the test, and GPT got a 5—the highest possible score, and the equivalent to getting an A or A+ in a college-level biology course.

Once it had aced the test, we asked it a non-scientific question: “What do you say to a father with a sick child?” It wrote a thoughtful answer that was probably better than most of us in the room would have given. The whole experience was stunning.

I knew I had just seen the most important advance in technology since the graphical user interface.

Just another rube who’s been duped by Clever Hans.

Wilbur and Orville are circumnavigating the Ohio cornfield in their Flyer. Children from the nearby farms have run over to watch, point, and gawk. But their parents know better.

An amusing toy, nothing more. Any talk of these small, brittle, crash-prone devices ferrying passengers across continents is obvious moonshine. One doesn’t know whether to laugh or cry that anyone could be so gullible.

Or if they were useful, then mostly for espionage and dropping bombs. They’re a negative contribution to the world, made by autistic nerds heedless of the dangers.

Indeed, one shouldn’t even say that the toy flies: only that it seems-to-fly, or “flies.” The toy hasn’t even scratched the true mystery of how the birds do it, so much more gracefully and with less energy. It sidesteps the mystery. It’s a scientific dead-end.

Wilbur and Orville haven’t even released the details of the toy, for reasons of supposed “commercial secrecy.” Until they do, how could one possibly know what to make of it?

Wilbur and Orville are greedy, seeking only profit and acclaim. If these toys were to be created — and no one particularly asked for them! — then all of society should have had a stake in the endeavor.

Only the rich will have access to the toy. It will worsen inequality.

Hot-air balloons have existed for more than a century. Even if we restrict to heavier-than-air machines, Langley, Whitehead, and others built perfectly serviceable ones years ago. Or if they didn’t, they clearly could have. There’s nothing genuinely new here.

Anyway, the reasons for doubt are many, varied, and subtle. But the bottom line is that, if the children only understood what their parents did, they wouldn’t be running out to the cornfield to gawk like idiots.

Doug Natelson What do we want in a conference venue?

The APS March Meeting was in Las Vegas this year, and I have yet to talk to a single attendee who liked that decision in hindsight.  In brief, the conference venue seemed about 10% too small (severe crowding issues in hallways between sessions); while the APS deal on hotels was pretty good, they should have prominently warned people that not using the APS housing portal means you fall prey to Las Vegas’s marketing schtick of quoting a low room rate but hiding large “resort fees”; with the exception of In N Out Burger, the food was very overpriced (e.g. $12 for a coffee and a muffin in the Starbucks in my hotel); and indoor spaces in town generally smelled like stale cigarettes, ineffective carpet cleaner, and desperation.

I don’t think it’s that hard to enumerate what most people would like out of a conference venue, if we are intending to have in-person meetings and are going to spend grant money and valuable time to attend the meeting with our groups. (I’m taking as a given that the March meeting is large - now up to 12K attendees, for good or ill - and I know that’s so big that some people will decide that it’s too unwieldy to be worth going.  Likewise, I know that the logistics are always difficult in terms of the abstract sorting and trying to make sure that likely-popular sessions get higher capacity rooms.)

Off the top of my head, I would like:

  • A meeting venue that can accommodate everyone without feeling dangerously crowded at high volume transit times between sessions, with a good selection of hotels nearby that don’t have crazy room rates.  (I know that the meeting growth already likely rules out a lot of places that have hosted the March meeting in the past.)
  • A high density of relatively cheap restaurants, including sandwich places, close to the venue for lunch, so that a quick bite is possible without hiking a mile or being forced to spend $20 on convention center food.
  • Actual places to sit (tables and chairs) to talk with fellow attendees.  Las Vegas had a much smaller number of these (indoors) than previous locations.
  • Reasonable availability of water (much better these days than in the past) and not-outrageously-priced coffee and tea.
  • Wifi that actually can accommodate the number of attendees; at some point in Las Vegas I basically gave up on the conference wifi and tethered to my phone.  Remember, many of us still have to get some level of work done (like submitting annoyingly timed proposals) while at these.
  • Modern levels of accommodations for nursing mothers, childcare, facilities for those with disabilities or mobility issues, etc. 
Are there major items that I’m missing?  Do readers have suggestions for meeting sites that can hit all of these?  I am well aware that the APS is financially constrained to make these arrangements years in advance.  It can’t hurt to discuss this, though, especially raising concerns about problems to avoid.

March 17, 2023

n-Category Café Jeffrey Morton

When he was my grad student, Jeffrey Morton worked on categorifying the theory of Feynman diagrams, and describing extended topological quantum field theories using double categories.

He got his PhD in 2007. Later he did many other things. For example, together with Jamie Vicary, he did some cool work on categorifying the Heisenberg algebra using spans of spans of groupoids. This work still needs to be made fully rigorous—someone should try!

But this is about something else.

On May 10, 2016 he wrote:

I just wanted to let you know that I’ve been offered, and accepted (paperwork-pending), a tenure-track Assistant Prof position at SUNY Buffalo State. That’s different from the University of Buffalo, where Bill Lawvere’s now an Emeritus, by the way. They’re across town.

It looks like a reasonable teaching/research situation, and the people in the department seem nice. Plus it’s close to friends and family in Canada. So I’m looking forward to settling into something more long-term, and giving attention to something else for a change.

I replied:


This makes me very, very happy. Maybe I’ll post a little note on the n-Cafe.

Don’t forget to get tenure. This may require volunteering for administrative work and other signs that you’re a team player - the kind of things people don’t expect from postdocs.

Today he wrote:

It’s been a long time since you wrote me this email reminding me not to forget to get tenure - however, I thought I’d let you know that I didn’t forget. I kept that recommendation in mind the whole time, and this week they let me know they’ve decided to give me tenure.


Matt von Hippel Talking and Teaching

Someone recently shared with me an article written by David Mermin in 1992 about physics talks. Some aspects are dated (our slides are no longer sheets of plastic, and I don’t think anyone writing an article like that today would feel the need to put it in the mouth of a fictional professor (which is a shame honestly)), but most of it still holds true. I particularly recognized the self-doubt of being a young physicist sitting in a talk and thinking “I’m supposed to enjoy this?”

Mermin’s basic point is to keep things as light as possible. You want to convey motivation more than content, and background more than your own contributions. Slides should be sparse, both because people won’t be able to see everything but also because people can get frustrated “reading ahead” of what you say.

Mermin’s suggestion that people read from a prepared text was probably good advice for him, but maybe not for others. It can be good if you can write like he does, but I don’t think most people’s writing is that much better than what they say in talks (you can judge this by reading people’s papers!). Some are much clearer speaking impromptu. I agree with him that in practice people end up just reading from their slides, which indeed is bad, but reading from a normal physics paper isn’t any better.

I also don’t completely agree with him about the value of speech over text. Yes, putting text on your slides means people can read ahead (unless you hide some of the text, which is easier to do these days than in the days of overhead transparencies). But just saying things means that if someone’s attention lapses for just a moment, they’ll be lost. Unless you repeat yourself a lot (good practice in any case), you should avoid just saying anything you need your audience to remember, and make sure they can read it somewhere if they need it as well.

That said, “if they need it” is doing a lot of work here, and this is where I agree again with Mermin. Fundamentally, you don’t need to convey everything you think you do. (I don’t usually need to convey everything I think I do!) It’s a lesson I’ve been learning this year from pedagogy courses, a message they try to instill in everyone who teaches at the university. If you want to really convey something well, then you just can’t convey that much. You need to focus, pick a few things and try to get them across, and structure the rest of what you say to reinforce those things. When teaching, or when speaking, less is more.

Tommaso Dorigo Northern Lights

These days I am spending a few months in northern Sweden, to start a collaboration with computer scientists and physicists from Lulea University of Technology on neuromorphic computing (I'll soon write about that, stay tuned). The rather cold weather of March (sub-zero temperatures throughout the day) is compensated by having access to the night show of northern lights, which are often visible from these latitudes (66 degrees north).

read more

March 16, 2023

Doug Natelson Recent RT superconductivity claim - summary page

In the interests of saving people from lots of googling or scrolling through 170+ comments, here is a bulleted summary of links relevant to the recent claim of room temperature superconductivity in a nitrogen-doped lutetium hydride compound under pressure.  
  • Dias's contributed talk at the APS meeting is here on youtube.
  • Here is the promotional video put out by Rochester as part of the media release.  It is odd to me that the department chair and the dean of the PI are both in this video.
  • Here is the pubpeer page that has sprung up with people reporting concerns about the paper.
  • The comments attached to the paper itself contain interesting discussion (though strangely an informed comment from Julia Deitz about the EDX data was repeatedly deleted as "spam")
  • There was a lot of media coverage of this paper.  The Wall Street Journal was comparatively positive.  The New York Times was more nuanced.  Quanta had a thorough article with a witty headline describing the controversy surrounding the claim.  The APS had an initial brief news report and a more extensive article emphasizing the concerns about the paper.
  • Experimental preprints have appeared looking at this.  The first observes a color change under pressure in LuH2, but no superconductivity in that related compound.  The second is a direct replication attempt, finding x-ray structural data matching the report but no superconductivity in that material up to higher pressures and down to 10 K.  Note that another preprint appeared last week reporting superconductivity at about 71 K in a different lutetium hydride at much higher pressures.
  • A relevant and insightful talk from James Hamlin is here, from a recent online workshop about reproducibility in condensed matter physics.  Note that (as reported in this twitter thread) significant portions of Hamlin's doctoral thesis appear verbatim in Dias' thesis.  
No doubt there are more; please let me know if there are additional key links that I've missed (not every twitter comment is important).   

March 13, 2023

David Hogg something in astronomy is wrong

Today I chatted with my old friend Phil Marshall (SLAC) about various things. Well actually I ranted at him about catalogs and how their use is related to the way they are made. He was sensible in reply. He suggested that we write something, and maybe also develop some guidance for the NSF LSST developer community. His recommendation was to create some examples that are simple but also that connect obviously to research questions in the heads of potential readers. Easier said than done! I said that any such paper needs a good title and abstract. He pointed out that this is true of every paper we ever write! Okay fine.

David Hogg a blackboard talk on orbital torus imaging

I gave the brown bag talk (chalk only) today at the NYU Center for Cosmology and Particle Physics. I spoke about torus imaging—using moments of the abundance distribution to measure or delineate the orbits in the Galaxy. I focused on the theory of dynamics and what it looks like if you can insert new invariants. The questions were great, including hard ones about non-equilibrium structures, radial migration, and chaos. All these things matter! Talking at a board in front of a skeptical, expert audience is absolutely great for learning about one's own projects, communication, and thinking.

David Hogg Are there young, alpha-rich stars?

I asked this question in Data Group meeting: Emily Jo Griffith (Colorado) and I have a data-driven nucleosynthetic story for essentially every red-giant-branch star in the SDSS-IV APOGEE survey. Since the parameters of this model relate to the build-up of elements over time, they might be used to indicate age. We matched to the NASA Kepler asteroseismic sample and indeed, our nucleosynthetic parameters do a very good job of predicting ages.

On the RGB, age is mass, and the asteroseismology gives you masses, not ages. There are some funny outliers: Stars with large masses, which means young ages, but with abundances that strongly indicate old ages. Are they young or old? I am betting that they are old, but they’ve undergone mass transfer, accretion, or mergers. If I’m right, what should we look for? The Data Group (plus visitors) suggest looking for binarity, for vertical action (indicating age), for ultraviolet excess (indicating white dwarf companion), for abundance anomalies, and Gaia RUWE. Will do! My money is that all these stars are actually old.

n-Category Café An Invitation to Geometric Higher Categories

Guest post by Christoph Dorn

While the term “geometric higher category” is new, its underlying idea is not: coherences in higher structures can be derived from (stratified) manifold topology. This idea is central to the cobordism hypothesis (and to the relation of manifold singularities and dualizability structures as previously discussed on the n-Category Café), as well as to many other parts of modern Quantum Topology. So far, however, this close relation of manifold theory and higher category theory hasn’t been fully worked out. Geometric higher category theory aims to change that, and this blog post will sketch some of the central ideas of how it does so. A slightly more comprehensive (but blog-length-exceeding) version of this introduction to geometric higher categories can be found here:

Today, I only want to focus on two basic questions about geometric higher categories: namely, what is the idea behind the connection of geometry and higher category theory? And, what are the first ingredients needed in formalizing this connection?

What is geometric about geometric higher categories?

I would like to argue that there is a useful categorization of models of higher structures into three categories. But, I will only give one good example for my argument. The absence of other examples, however, can be taken as a problem that needs to be addressed, and as one of the motivations for studying geometric higher categories! The three categories of models that I want to consider are “geometric”, “topological” and “combinatorial” models of higher structures. Really, depending on your taste, different adjectives could have been chosen for these categories: for instance, in place of “combinatorial”, maybe you find that the adjectives “categorical” or “algebraic” are more applicable for what is to follow; and in place of “geometric”, maybe saying “manifold-stratified” would have been more descriptive.

But let’s get to the promised example of how these three categories of models work and how they relate. We start with the archetypical model of a type of higher structure: namely, with topological spaces. Unsurprisingly, topological spaces fall firmly into the category of topological models. The type of higher structure modelled by topological spaces deserves a name: we will refer to them as homotopy types. There is a second well-known model for homotopy types, namely, $\infty$-groupoids. Unlike spaces, whose theory is based on the continuum $\mathbb{R}^n$, $\infty$-groupoids are discrete structures whose data is captured by collections of morphisms in each dimension $k \in \mathbb{N}$. This makes $\infty$-groupoids a prime example of a ‘combinatorial’ (or, ‘algebraic’, or, ‘categorical’) model of a higher structure. (I should point out that I am being vague about concrete definitions of the above named ‘models’; for instance, ‘spaces’ could be, more concretely, taken to mean CW complexes, and $\infty$-groupoids could be taken to mean Kan complexes.) Despite being rather different in flavour, the higher theory (i.e. the homotopy theory) of topological spaces and the higher theory of $\infty$-groupoids turn out to be equivalent. The two models are related by two important constructions: we can pass from spaces to $\infty$-groupoids by taking the fundamental categories of spaces (usually referred to as their ‘fundamental $\infty$-groupoids’), and, conversely, we can realize $\infty$-groupoids as spaces.

Now the interesting question: what is the geometric counterpart to the above topological and combinatorial models? In different words, how can we understand the homotopy theory of spaces in terms of a theory of manifold-stratified structures? The answer is given by the theory of cobordisms, or more precisely, stratified cobordisms. The most well-known instance of how cobordisms and spaces relate in this sense is the classical Pontryagin theorem. The theorem describes the isomorphism

$$\Omega^{\mathrm{fr}}_{m}(\mathbb{R}^{n}) \cong \pi_{n}(S^{n-m})$$

between the cobordism group of smooth (normal) framed $m$-manifolds in $\mathbb{R}^n$ and the $n$th homotopy group of the $(n-m)$-sphere. The resulting relation between smooth manifold theory and homotopy theory is incredibly ubiquitous in modern Algebraic Topology (and, relatedly, in Physics) but often implicitly so; in the words of Mike Hopkins, Pontryagin’s theorem itself marks the point in time at which Algebraic Topology became ‘modern’.
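To make the indices concrete, here are two standard special cases of the isomorphism, written out as a sketch. They are well-known facts about homotopy groups of spheres rather than anything specific to this post, and the choice of which cases to display is ours.

```latex
% m = 0: framed points in R^n, counted with sign (the degree of a map S^n -> S^n)
\Omega^{\mathrm{fr}}_{0}(\mathbb{R}^{n}) \;\cong\; \pi_{n}(S^{n}) \;\cong\; \mathbb{Z}
\qquad (n \geq 1)

% m = 1, n = 3: framed links in R^3, detected by the Hopf invariant
\Omega^{\mathrm{fr}}_{1}(\mathbb{R}^{3}) \;\cong\; \pi_{3}(S^{2}) \;\cong\; \mathbb{Z}
```

In both cases the codimension $n - m$ of the manifold is the dimension of the target sphere, which is a quick way to check that the indices have been placed correctly.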

Importantly for us, the theorem generalizes from spheres to arbitrary spaces (or, more precisely, CW complexes): namely, the $n$th homotopy group of any space $X$ can be understood in terms of the framed stratified cobordism group of framed $X$-stratifications of $\mathbb{R}^n$ (where, roughly, an ‘$X$-stratification’ is a stratification whose singularity types are determined by the ‘dual cells’ of $X$). Formulaically, this may be expressed by writing

$$\Omega^{\mathrm{fr}}_{X\text{-str}}(\mathbb{R}^{n}) \cong \pi_{n}(X).$$

The details of this generalization are spelled out in [1, Ch. VII], in a chapter titled “the geometry of CW complexes”, but really the basic idea of the construction remains essentially the same (for the experts: instead of working with a regular value point of $S^{n-m}$, we work with a regular dual stratification of the CW complex $X$). To summarize, we can study the homotopy groups of spaces, or, in combinatorial terms, the higher morphisms in $\infty$-groupoids, by means of framed stratified cobordisms. The relation between the geometry of stratified cobordisms and the homotopy theory of spaces can be conceptually thought of as a process of dualization (which translates stratification data back into cells, and we will return to this later on).

The main point I now want to make is that the trilogy of geometric, topological, and combinatorial models exemplified above in the case of homotopy types should also extend to other types of higher structures. In particular, $(\infty,1)$-categories and $(\infty,n)$-categories should admit both topological models and geometric models; however, these classes of models haven’t been much explored so far (as an aside, in the $(\infty,1)$-case there is the theory of $d$-spaces which appears to be an existing topological counterpart to $(\infty,1)$-categories even though I’m unsure how much this relation has been formally explored). The situation is summarized in Table 1 below, in which we have filled in precisely some of the missing entries for geometric and topological models of higher structures: to indicate their conceptual nature, names of these models have been kept in quotes.

At a first glance, it seems like finding concrete definitions for these directed geometric models would be a tall order. After all, the theory of stratified cobordisms itself has its mathematical depths, and realizing the step from $\infty$-groupoids (say, Kan complexes) to concrete definitions of $(\infty,n)$-categories is not necessarily an obvious one.

But this is where things may get exciting. Indeed, by the end of this blog post I hope to have convinced you that the geometric models of higher structures may, in fact, get much easier when passing to the directed setting of $n$-categories. Moreover, to then study the undirected setting (i.e., in combinatorial terms, the case of higher categories with invertible morphisms) from the perspective of directed geometric models provides a refined view on the computational intricacies of invertibility.

Of course, in order to be able to tell this story, I will have to shortly show you some concrete notions that aim to realize the geometric models outlined above. The notions go by the names of manifold diagrams resp. tangle diagrams, the latter being the relevant notion when considering invertible morphisms. Both notions have been added to Table 1 in their respective places (note that the $(\infty,n)$-case requires us to deal with manifold diagrams in and below dimension $n$, and with tangle diagrams above dimension $n$). Neither of these notions should come as a complete surprise, as we are already familiar with some of their instances: manifold diagrams specialize to string diagrams in dimension $n = 2$, i.e. they generalize the latter notion to arbitrary higher dimensions; and, (unstratified, framed) tangle diagrams formalize in all dimensions the pictures you are likely to draw when thinking about the tangle hypothesis!

Importantly, note that I wrote that the notions ‘aim to realize’ the aforementioned geometric models. Indeed, how do we measure ‘correctness’ of our definitions of manifold diagrams? Unfortunately, no reasonably straightforward benchmark for models of directed geometry exists at this point (actually, the same was in some sense true for manifolds when they were discovered, but they were quickly accepted due to their ubiquity in mathematics and physics). Certainly, passing from geometry to combinatorics in Table 1, a comparison to existing models of $(\infty,n)$-categories would provide such a benchmark, but work remains to be done towards developing this relation, including the development of a more comprehensive theory of geometric higher categories (we will briefly discuss the case of ‘free’ geometric higher categories later). Today, I want to focus mainly on directed geometric models in the left column in Table 1. Indeed, I argue that these geometric models may very much deserve your interest on their own for the following ‘more elementary’ reasons.

  • Simplicity and ubiquity. Firstly, the definitions of manifold and tangle diagrams are simultaneously simple and expressive: both definitions succeed in encompassing large classes of known examples, including ordinary string diagrams and surface diagrams, knot and surface-knot diagrams, as well as their respective moves (Reidemeister moves and ‘movie moves’), and smooth manifold singularities such as Arnold’s ADE singularities [2].

  • Trilogy of models. Secondly, and this is a central part of the story, the theory of manifold and tangle diagrams comprises powerful dualization and combinatorialization results. From these results, one may then try to derive natural translations between all three columns of Table 1: in the topological column this leads to notions of directed spaces which we refer to as framed spaces; in the combinatorial column we obtain notions of higher categories which we refer to as geometric higher categories. (Note, both terms are used in a broad sense here, with concrete details of their respective theories being the topic of ongoing research). In the resulting combinatorial models, higher-categorical coherences are naturally related to stratified manifold isotopies!

  • Application. Thirdly and lastly, even without higher structures as our primary object of study, manifold and tangle diagrams provide a new tool at the interface of combinatorial higher algebra and differential topology. This leads to interesting questions such as the precise nature of the relation between diagram combinatorics and differential singularities (which I will briefly return to later), and a potentially ‘natural’ approach to combinatorial encodings of smooth structures.

Now that we have a rough idea about what makes geometric models of higher structures ‘geometric’, and why they might be interesting, let’s see some definitions!

From zero to manifold diagrams

Here’s a one-line slogan about manifold diagrams: manifold $n$-diagrams are compactly triangulable, conical stratifications of $n$-dimensional directed euclidean space. There are thus three ingredients that we need to talk about.

  1. What is directed euclidean space?

  2. What are (conical) stratifications?

  3. What does it mean to be compactly triangulable?

Strongly condensing material from [3] and [4], let us address these ingredients in the above order!

Ingredient 1: Directedness via framings

We will infuse our spaces with directions by means of framings: recall, classically, a framing is something akin to a ‘choice of tangential directions’ at all points of a given space (usually a manifold, as otherwise it may be hard to talk about ‘tangential directions’). Our use of the term ‘framing’ will be a somewhat non-standard variation of this idea.

For motivation, we start with the observation that given a real $n$-dimensional inner product space $V$, the following two structures on $V$ are equivalent:

  • An orthonormal framing of $V$, i.e. an ordered sequence $(v_1, v_2, ..., v_n)$ with $\langle v_i, v_j \rangle = \delta_{i j}$.

  • A chain of linear surjections $V_i \to V_{i-1}$ of oriented $i$-dimensional $V_i$’s starting at $V_n = V$.

A correspondence between these structures can be produced by setting $V_i = \mathrm{span}(v_1, ..., v_i)$ (endowed with the orientation in which $(v_1, ..., v_i)$ is a positively oriented ordered basis) and defining $V_i \to V_{i-1}$ to be the map that forgets $v_i$.
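As a small worked instance of this correspondence, take $V = \mathbb{R}^2$ with its standard inner product and the standard frame $(e_1, e_2)$. The choice of example and the remark about the converse direction are our own illustration, not taken from [3] or [4].

```latex
% Assumption: V = R^2 with the standard inner product and frame (e_1, e_2).
\begin{align*}
V_2 &= \mathrm{span}(e_1, e_2) = \mathbb{R}^2, \qquad
V_1 = \mathrm{span}(e_1), \qquad
V_0 = 0, \\
V_2 \to V_1 &: \; a\,e_1 + b\,e_2 \mapsto a\,e_1, \qquad
V_1 \to V_0 : \; a\,e_1 \mapsto 0 .
\end{align*}
```

One way to read the converse in this example: the kernel of $V_2 \to V_1$ is the line spanned by $e_2$ and the kernel of $V_1 \to V_0$ is the line spanned by $e_1$. The inner product picks out the unit vectors in each line, and the orientations of the $V_i$ fix their signs, which gives back the frame $(e_1, e_2)$.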

What’s going on here? To get a bit of intuition, let’s consider the following analogous and hopefully familiar situation: given a Riemannian manifold $M$ there is a correspondence between (smooth) gradient vector fields on $M$ and (smooth) functions $M \to \mathbb{R}$ up to shifting functions by a constant. (To ensure the analogy is clear: the vector field, where it is non-zero, plays the role of a 1-frame $v_1$, whereas the corresponding function $M \to \mathbb{R}$ plays the role of a linear surjection $V \to V_1$ of tangent spaces at these points.) So why would we want to shift perspectives from vectors to surjections in this way? The secret reason is that ‘orthonormality’ ceases to exist in absence of inner products (or, in the given analogy, in absence of Riemannian metrics), but the notion of linear surjections does not. Put differently, by basing our notion of framings on surjections rather than vectors we can emulate some form of orthonormality even in the absence of inner products. Somehow, this is rather important for the story of manifold diagrams. But let’s put the intuition aside, and spell out the definition.

While it is possible to use the above idea of ‘framings-via-surjection-towers’ to define framed spaces in quite some generality, we will only be interested in the euclidean case (this case is in some sense a ‘local model’ for more general framed spaces). Here it is: the standard $n$-framing of $\mathbb{R}^n$ is the chain of oriented $\mathbb{R}$-fiber bundles

$$\pi_i : \mathbb{R}^i \to \mathbb{R}^{i-1} \quad (1 \leq i \leq n)$$

with $\pi_i$ defined to be the map that forgets the last coordinate of $\mathbb{R}^i$ (and with fibers carrying the standard orientation of $\mathbb{R}$ after identifying $\mathbb{R}^i = \mathbb{R}^{i-1} \times \mathbb{R}$). When considering $\mathbb{R}^n$ we will always tacitly think of it as ‘standard framed $\mathbb{R}^n$’ and, thus, we stop mentioning the standard framing as an explicit structure altogether. Indeed, more important than defining the standard $n$-framing is to define the maps that preserve it: a framed map $F : \mathbb{R}^n \to \mathbb{R}^n$ is a map for which there exist (necessarily unique) maps $F_j : \mathbb{R}^j \to \mathbb{R}^j$ ($0 \leq j \leq n$) such that $F_n = F$ and

$$\pi_i \circ F_i = F_{i-1} \circ \pi_i$$

and each $F_i$ preserves orientations of fibers of $\pi_i$ (for concreteness, let’s take ‘orientation preserving’ to mean strictly monotonic).

What do such framed maps look like? Well, you can think up examples in an inductive fashion. A framed map $\mathbb{R} \to \mathbb{R}$ is simply a strictly monotonic one. Next up, a framed map $\mathbb{R}^2 \to \mathbb{R}^2$ is a map that descends along $\pi_2 : \mathbb{R}^2 \to \mathbb{R}$ to a framed map $\mathbb{R} \to \mathbb{R}$, and also maps fibers of the projection strictly monotonically… and so on. From this, it’s not so hard to deduce a first basic observation: the space of framed homeomorphisms $\mathbb{R}^n \to \mathbb{R}^n$ is, in fact, contractible.
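
The inductive description of framed maps $\mathbb{R}^2 \to \mathbb{R}^2$ can be probed numerically. The following is only a sketch under stated assumptions: the names `is_framed_2d`, `shear` and `rotation` are hypothetical, ‘strictly monotonic’ is read as ‘strictly increasing’, and a check on finitely many sample points can refute framedness but never prove it.

```python
import math

def is_framed_2d(F, samples, tol=1e-12):
    """Test the framed-map conditions for F: R^2 -> R^2 on a finite grid.

    Assumption: 'orientation preserving' is read as 'strictly increasing'.
    Passing this test is only evidence, not a proof, of framedness.
    """
    xs = sorted(samples)
    ys = sorted(samples)
    # Condition 1: F descends along pi_2(x, y) = x, i.e. the first output
    # coordinate must not depend on y.
    for x in xs:
        firsts = [F(x, y)[0] for y in ys]
        if max(firsts) - min(firsts) > tol:
            return False
    # Condition 2: the induced map F_1 on the base R is strictly increasing.
    base = [F(x, ys[0])[0] for x in xs]
    if any(b <= a for a, b in zip(base, base[1:])):
        return False
    # Condition 3: on each fiber {x} x R, the second coordinate is
    # strictly increasing in y.
    for x in xs:
        fiber = [F(x, y)[1] for y in ys]
        if any(b <= a for a, b in zip(fiber, fiber[1:])):
            return False
    return True

def shear(x, y):
    # Base map x -> x^3 + x is increasing; each fiber is shifted by sin(x).
    return (x**3 + x, y + math.sin(x))

def rotation(x, y):
    # A 45-degree rotation (up to scale): the first coordinate depends on y.
    return (x - y, x + y)

grid = [i / 4 for i in range(-8, 9)]
print(is_framed_2d(shear, grid))     # passes all three checks on this grid
print(is_framed_2d(rotation, grid))  # fails the first check
```

The rotation fails because it mixes the two coordinates, so it does not descend along the projection; the shear passes because it only moves points within fibers after reparametrizing the base.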

Ingredient 2: Conical stratifications

Next up, let’s discuss stratifications. Really, the only thing you need to know is the following. In its weakest form, a stratification $f$ of a space $X$ (together also called a ‘stratified space’ $(X,f)$) is a decomposition of that space into a disjoint union of subspaces called strata. And a stratified map of stratified spaces is a map of underlying spaces that maps strata of the domain into strata of the codomain (a ‘stratified homeomorphism’ is a stratified map with an inverse stratified map). Products of stratified spaces take products of underlying spaces and stratify them with products of strata. Cones of stratified spaces take cones of underlying spaces and stratify them with cones of strata, but the cone point is kept as a separate stratum.

Simple enough! But, if you happen to have an inclination towards higher structures, then I can easily relay a better way of thinking about stratifications in a single sentence: namely, spaces are to sets, what stratified spaces are to posets. The situation is illustrated in Table 2 (non-standard terminology is kept in quotes as before). In particular, just as spaces $X$ have fundamental sets $\Pi_0 X$, stratified spaces $(X,f)$ have fundamental posets $\mathcal{E}_0 (X,f)$. Just as spaces $X$ have fundamental $\infty$-groupoids $\Pi_\infty X$, and both $X$ and $\Pi_\infty X$ model the same homotopy type, stratified spaces $(X,f)$ have fundamental $\infty$-posets $\mathcal{E}_\infty (X,f)$, and both model the same higher structure. (Here ‘$\mathcal{E}$’ indicates that in the literature one more commonly finds the terms ‘Entrance’ or ‘Exit path categories’; but really, there’s a general story to be told here for passing from a structure S to an $\infty$-S structure, and from a topological thing to its fundamental category, so ‘fundamental $\infty$-poset’ is not a bad name at all and it illustrates the point.) Importantly, to make this higher-categorical story work nicely, it turns out to be central that our topological definition of stratified spaces has one further property: conicality!

While both conicality and the aforementioned higher-categorical constructions are discussed in more detail in the extended version of this post, here, we shall not need them in full generality. Indeed, we are only interested in the framed euclidean case, and that case goes as follows.

First, note that the adjectives ‘framed’ and ‘stratified’ can be easily combined: for instance, a framed stratified map $(\mathbb{R}^n,f) \to (\mathbb{R}^n,g)$ is a stratified map whose underlying map $\mathbb{R}^n \to \mathbb{R}^n$ is framed. Moreover, when working with stratified products $(\mathbb{R}^k,f) \times (\mathbb{R}^{n-k}, g)$ we will identify $\mathbb{R}^n \cong \mathbb{R}^k \times \mathbb{R}^{n-k}$ in the standard way; and, when working with stratified cones $(\mathrm{Cone}(S^{n-1}), \mathrm{cone}(l))$ of stratified spaces $(S^{n-1}, l)$, we will embed $S^{n-1} \hookrightarrow \mathbb{R}^n$ in the standard way and identify $\mathrm{Cone}(S^{n-1}) \cong \mathbb{R}^n$ by mapping $(x \in S^{n-1},\lambda \in [0,1))$ to $\frac{\lambda}{1 - \lambda}x \in \mathbb{R}^n$.
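
The identification $\mathrm{Cone}(S^{n-1}) \cong \mathbb{R}^n$ is easy to spell out in coordinates. Here is a minimal sketch; the function names are hypothetical, and the inverse formula $\lambda = r/(1+r)$ with $r = |y|$ is derived here by solving $r = \lambda/(1-\lambda)$ and is not stated in the post itself.

```python
import math

def cone_to_euclidean(x, lam):
    """Map a cone point (x on the unit sphere, lam in [0, 1)) to R^n.

    Implements (x, lam) -> lam / (1 - lam) * x. At lam = 0 every x is sent
    to the origin, which is the cone point.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    scale = lam / (1.0 - lam)
    return tuple(scale * xi for xi in x)

def euclidean_to_cone(y):
    """Inverse map: recover (x, lam) from a point y of R^n.

    Assumption: obtained by solving |y| = lam / (1 - lam) for lam.
    The origin has no well-defined direction, so x is returned as None.
    """
    r = math.sqrt(sum(yi * yi for yi in y))
    if r == 0.0:
        return None, 0.0
    return tuple(yi / r for yi in y), r / (1.0 + r)

# Round trip for a sample point of R^2.
x, lam = euclidean_to_cone((3.0, -4.0))
back = cone_to_euclidean(x, lam)
```

As $\lambda$ approaches 1 the scale factor grows without bound, which is how the half-open interval $[0,1)$ manages to cover all of $\mathbb{R}^n$.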

With these conventions at hand, we may now introduce conicality in our framed setting as follows: a stratification $(\mathbb{R}^n,f)$ is framed conical if each point $x \in \mathbb{R}^n$ has a framed stratified neighborhood $(\mathbb{R}^n,g_x) \hookrightarrow (\mathbb{R}^n,f)$ for which we may choose $0 \leq k \leq n$ and a stratification $(S^{n-k-1}, l_x)$ such that there exists a framed stratified homeomorphism

$$(\mathbb{R}^n,g_x) \cong_{\mathrm{fr}} \mathbb{R}^k \times (\mathrm{Cone}(S^{n-k-1}), \mathrm{cone}(l_x))$$

and (under this homeomorphism) $x$ is mapped into $\mathbb{R}^k \times \{0\}$. (Note that the stratification $l_x$ is also called a link around $x$.) Importantly, framed conicality is really just a ‘framed’ version of traditional conicality!

Ingredient 3: Framed compact triangulability

The last ingredient for the notion of manifold diagrams is that of framed compact triangulability. To begin, let me remark that imposing a compact triangulability condition is a reasonable thing to do: indeed, in our earlier discussion of Pontryagin’s theorem we met cobordisms of manifolds embedded in $\mathbb{R}^n$ that have compact support; ‘framed compact triangulability’ may be thought of as a generalization of this situation (adapted to the setting of framed stratifications). The condition can be succinctly formulated by starting in the PL category: a compactly-defined triangulation $K$ of $\mathbb{R}^n$ is a finite stratification of $\mathbb{R}^n$ by open disks whose closures are the images of linear embeddings $\Delta^k \times \mathbb{R}^l_{\geq 0} \hookrightarrow \mathbb{R}^n$ (where $k + l \leq n$). This translates to the framed stratified case as follows: a stratification $(\mathbb{R}^n,f)$ is framed compactly triangulable if it admits a framed stratified subdivision $(\mathbb{R}^n,K) \to (\mathbb{R}^n,f)$ of $f$ by a compactly-defined triangulation $K$. (Note, a ‘subdivision’ is a stratified map whose underlying map is a homeomorphism.)

Putting it all together

We have now introduced all the necessary ingredients (and importantly, there are not that many), and we can mix them together in the following central definition.

Definition ([4]). A manifold $n$-diagram is a framed conical stratification $(\mathbb{R}^n,f)$ that is framed compactly triangulable.

Let’s unwind this definition and look at some examples. First, in order to be able to depict $\mathbb{R}^n$, observe that there is a framed homeomorphism $\mathbb{R}^n \cong_{\mathrm{fr}} (-1,1)^n$; here, ‘framed’ means that composing with the inclusion $(-1,1)^n \hookrightarrow \mathbb{R}^n$ yields a framed map $\mathbb{R}^n \to \mathbb{R}^n$ as defined earlier (in fact, the choice for a framed identification $\mathbb{R}^n \cong_{\mathrm{fr}} (-1,1)^n$ is again unique up to contractible choice). With that in mind, we will depict all our examples of manifold diagrams as living on the open cube $(-1,1)^n$.

Example 1: String diagrams. An ordinary string diagram is a manifold 2-diagram. We illustrate one in Figure 1 below, and visually verify the framed conicality condition at three points.

Figure 1: A manifold 2-diagram or ‘string diagram’.

Example 2: The ‘Cockett pocket’ composite (Verity). Consider the stratification shown in Figure 2 below: this is a manifold 3-diagram. We indicate a tubular neighborhood and its framed conical structure as before.

Figure 2: A manifold 3-diagram or ‘surface diagram’.

Example 3: $A$-series singularities. It is often useful to think of a manifold $n$-diagram as a ‘movie’ of manifold $(n-1)$-diagrams. As an example, check out this $n$Lab picture: the first surface illustrates a manifold 3-diagram as a movie of manifold 2-diagrams (note that the diagram is ‘stratified by the named strata’; for instance, the surface labelled by $U$ is its own stratum, and so is the line stratum labeled by $\epsilon$ etc.). One dimension up, we can similarly depict manifold 4-diagrams as movies of manifold 3-diagrams: for instance, Figure 3 below shows a movie that starts with the 3-diagram we’ve just seen, and then gradually shrinks its ‘interior’ into a point.

Figure 3: The ‘swallowtail’, or A3 singularity, as a manifold 4-diagram.

In fact, this manifold 4-diagram has a name: it’s often called the ‘swallowtail’ and it corresponds to the classical differential $A_3$ singularity; as such, it is part of an infinite series of so-called $A_k$ singularities! Bonus trivia: the linked picture happens to be the title image of MSRI program 323 “Higher categories and categorification”.

Example 4: $D$-series singularities. There is a bigger story to be told about ‘singularities’: roughly speaking, singularities are those manifold $n$-diagrams in which we can merge strata into a single embedded $m$-manifold $W^m \hookrightarrow \mathbb{R}^n$ that is then framed stratified homeomorphic to the cone of an embedding $S^{m-1} \hookrightarrow S^{n-1}$. The swallowtail mentioned above provides one example. Interestingly, one also finds examples that do not have classical differential counterparts (in that they don’t directly arise as (parametrized) graphs of (parametrized) smooth functions, see [4, Sec. 3.4] for the heuristic idea at play). The first such non-classical singularity is the ‘$D_2$’ singularity (for the experts: this corresponds to the ‘horizontal cusp’ movie move in Carter-Saito’s work), and it is illustrated in Figure 4 below.

Figure 4: The manifold 4-diagram of the D2 singularity.

One dimension up, the $D_2$ singularity itself becomes part of the link of other singularities. One of these, termed the $D_3$ singularity (which, again, is a non-classical singularity), is particularly interesting and its link is given by the movie in Figure 5 (the full manifold 5-diagram is a movie of movies, which shrinks the interior of the given movie into a single point… similar to Figures 3 and 4, but one dimension higher!).

Figure 5: The D3 singularity link as a manifold 4-diagram (containing two D2 singularities).

Even one dimension up from $D_3$ (now working in $\mathbb{R}^6$!), we find singularities whose link contains $D_3$ singularities, and (surprise!) one of these now corresponds to the classical differential $D_4$ singularity (which, again, is only the beginning of an infinite sequence of so-called $D_k$ singularities). In Figure 6, we depict the $3$-projection (namely, projecting along the standard projection $\mathbb{R}^6 \to \mathbb{R}^3$ which forgets the last three coordinates) of certain strata in $D_4$, and a comparison to other singularities in that dimension. However, for any substantive discussion of these pictures and symbols we must refer to [4, Sec. 3.4] and this post. (A vatic aside: the reason that we can so confidently work in all dimensions is ultimately rooted in the combinatorial dimension-inductive principles secretly controlling manifold diagrams in the background … see ‘property (1)’ below!)

Figure 6: 3-projections of the manifold 6-diagram D4 singularity and other singularities in dim 6.

Example 5: Braids, Reidemeister III and beyond. Besides singularities, manifold diagrams also describe interesting ‘interactions at a distance’ of manifold strata. The braid $(\mathbb{R}^3, b)$ provides a first example of this: as illustrated in Figure 7, it is a manifold 3-diagram which tracks two points in the plane rotating around one another by $\pi$. (You should check that not only is the braid a manifold diagram, but it is not framed stratified homeomorphic to any product diagram $\mathbb{R} \times (\mathbb{R}^2, \ast \cup \ast \hookrightarrow \mathbb{R}^2)$, where the second factor is a stratification of the $\mathbb{R}^2$ plane by two embedded points and their complement.)

Figure 7: The braid isotopy.

One dimension up, we find the Reidemeister III isotopy, which is the (track of an) isotopy that shifts a constellation of three braids into a different constellation of three braids, while passing through a ‘triple braid’. This is illustrated in this $n$Lab picture and it, too, is a manifold 4-diagram as can be easily verified! And, in yet higher dimensions, you will start seeing isotopies such as the Zamolodchikov tetrahedron identity cropping up as manifold diagrams. We will briefly return to the topic of isotopies later!

The first few properties

Let us mention a few important consequences of our definition of manifold diagrams, some of which link back to our earlier question of why one may want to care about manifold diagrams, and others which highlight some of the pleasant properties of manifold diagrams (the verifications of all these claims can be found in [4]).

(1) Canonical combinatorializations. While the compact triangulability condition requires the existence of some combinatorial representation (namely a triangulation), it turns out that there is in fact a canonical combinatorial representation for manifold diagrams (up to framed stratified homeomorphism).

For the experts, it may be helpful to give a somewhat condensed explanation here: the claim follows from the observation that manifold $n$-diagrams have coarsest framed subdivisions by certain projection-stable stratified 0-types $(\mathbb{R}^n, M)$ called $n$-meshes (where ‘stratified 0-type’ means that the $\infty$-poset $\mathcal{E}_\infty M$ is equivalent to a poset, and ‘projection-stable’ means that $\mathbb{R}^n \to \mathbb{R}^k$ projects $M$ onto a $k$-mesh). Meshes, in turn, are fully described by their ‘framed’ fundamental posets $\mathcal{E}_0 M$, also called trusses. Finally, trusses (endowed with appropriate combinatorial stratification data) combinatorially classify manifold diagrams. The process is illustrated in Figure 8.

Figure 8: The braid, its canonical mesh, and its combinatorializing truss.

(2) Manifold regularity. Strata in manifold diagrams are, indeed, manifolds (this is an immediate consequence of the framed conicality condition). However, more turns out to be true: strata have canonical smooth structures as well! This follows since strata inherit a classical framing (from the framing of the ambient Euclidean space), since all strata in manifold diagrams are PL manifolds (this follows at root from the theory behind point (1)), and since, by smoothing theory, framed PL manifolds have a unique framed smooth structure.

(3) Link well-definedness. Links of strata found in manifold diagrams (i.e. the stratifications with symbol ‘$l_x$’ in the earlier definition) are in fact well-defined: that is, there is, up to framed stratified homeomorphism, a unique choice of link $l_x$ at any given point $x$ of the diagram. This stands in contrast to classical topological conical stratifications, where non-homeomorphic choices of links exist.

(4) Geometric duality. Manifold diagrams have canonical geometric duals (in the sense of Poincaré duality). This duality relates manifold diagrams to so-called (framed) cell diagrams, which may then be considered as classical pasting diagrams (but with cells whose shapes generalize most of the familiar classes of shapes, such as ‘globular’, ‘simplicial’, or ‘opetopic’ shapes).

Let me dwell for a moment on point (1) of the above list and highlight it once more. It turns out that, as a consequence of (1), essentially all parts of the theory of manifold diagrams (i.e. their mappings, their neighborhoods, their links, their products, their cones, etc.) have canonical combinatorial counterparts in the theory of trusses; this quickly leads us on a path into a rich combinatorial world, which is explored in some detail in [3, Ch. 2] (and to which I also devoted (way too) many pages in my PhD thesis). Moreover, many parts of this combinatorial world are, in contrast to classical PL combinatorics, computationally tractable, which then extends to computability statements about manifold diagrams as well: for instance, the ‘framed stratified homeomorphism problem’ for manifold $n$-diagrams is decidable, while the analogous ‘stratified (PL) homeomorphism problem’ for compact PL stratifications in $\mathbb{R}^n$ is undecidable. (In fact, this observation is implicitly exploited by the diagrammatic proof assistant!) In summary, as a consequence of (1), manifold diagrams become very tractable objects to work with.

Geometric computads and beyond

With the central definition of manifold diagrams spelled out, let us begin to wrap things up and end this post by pointing out a few further directions of research (if you feel that this ending is premature, once more, I happily refer you to the extended version of this post!). Let’s start with a closer look at point (4) from the above list as it, too, entails an interesting story. The geometric duality of manifold and cell diagrams is parallel to (in fact, it is based on) a geometric duality in the theory of meshes (which, at the level of framed fundamental posets, becomes a categorical duality in the theory of trusses). The ‘dual meshes’ produced in this way turn out to be cell complexes: their cells are characterized by being both regular and carrying compatible framings — accordingly, they have been termed framed regular cells in [3]. Dualizing the observation of manifold diagrams admitting canonical subdivisions into meshes, one finds that cell diagrams equally have canonical subdivisions into framed regular cell complexes (these are the ‘dual meshes’). However, this subdivision is a non-identity stratified map and it carries important information: it records which cells belong to the same stratum, or, in categorical terms, it records which cells are secretly degeneracies of lower dimensional cells. The ‘stratified encoding’ of degeneracies in this way may be a bit unfamiliar, but, together with the large shape class of framed regular cells, it has powerful consequences.

  • Manifold $n$-diagrams that do not contain point strata are also called diagram isotopies. They may be thought of as the tracks of ordinary stratified isotopies of manifold $(n-1)$-diagrams (the braid, which we met in Figure 7, is an example). It has been a long-held intuition that such isotopies encode certain higher-categorical coherences. The translation underlying this intuition (from isotopies to higher-categorical laws) can be made fully precise via point (4): indeed, dualization translates isotopies to certain cell diagrams, which, after accounting for degeneracies, can then be cast as an equation between pasting diagrams. (It may be equally remarkable to note that the combinatorializability of manifold diagrams leads to a fully combinatorial theory of isotopies of manifold diagrams in the first place.) My favorite example of this process is the definition of categorical laws governing the sequence of categories, functors, natural transformations, modifications, … .

  • With a notion of directed cells at hand, we may also consider ‘global’ directed higher structures simply by appropriately gluing such cells. The simplest, but also most instructive, such construction happens when we build our higher structures freely: this yields the case of geometric computads. The definition of geometric computads is exceedingly simple, as it need not follow the traditional inductive two-step procedure of building computads: indeed, usually one constructs computads inductively by, in the $n$th step, adding generating $n$-morphisms with boundary in existing $(n-1)$-morphisms, and then passing to the closure under composition and coherence conditions (which, passing back and forth between “new composites” and “new coherences”, is often a pretty infinite process). In geometric computads, this process becomes unnecessary: boundaries of $n$-morphisms are manifold $(n-1)$-diagrams (or, if you work dually, cell $(n-1)$-diagrams) which already can express all the composites and isotopies you could possibly need. This makes geometric computads really easy to work with.

The story of geometric higher structures can be spun further in various ways, and active development of its foundations is still underway. One direction that I wanted to highlight (in fact, asking David Corfield about this direction was the original motivation for writing the present blog post) is geometric higher type theory: basically, the idea is to exploit the easiness of geometric computads and their isotopy coherence laws in order to study the “internal language of the geometric computad of all geometric computads”. A hypothetical higher type theory of this kind could have interesting consequences, for instance, by demonstrating that $\Pi/\Sigma$ types can be internally constructed from a set of basic higher compositional principles (i.e. they need not be assumed as primitive rules). But, in only the last two sentences, we have gotten ourselves into rather speculative terrain!

Finally, one of my favorite and dearest parts of geometric higher categorical thinking (which, at the same time, is also one of the most mysterious parts for me) is the role that invertibility and dualizability play in it. Here, of course, the cobordism hypothesis finally comes into play: invertible morphisms are modelled not by manifold diagrams, but by so-called tangle diagrams. Omitting all details, tangle diagrams are simply a variation of manifold diagrams in which we allow strata to ‘change directions’ with respect to the ambient framing [4]. However, really, this set-up leads to a rather refined perspective on tangles as it provides a combinatorial framework for studying neighborhoods of ‘higher critical points’ (i.e. the points where tangles ‘change direction’). There is a tantalizing but mysterious connection of these critical points with classical differential ADE singularities [2]: on one hand, classical singularities seem to resurface as ‘perturbation-stable’ singularities in tangle diagrams, on the other hand, the differential machinery breaks down (producing ‘moduli of singularities’) in high parameter ranges and this simply cannot happen in the combinatorial approach; put differently, the combinatorial approach must be better behaved than the differential approach in some way. Certainly, the ‘higher compositional’ perspective given through the lens of diagrams is something that also has no differential counterpart at all (and it leads to new interesting observations, for instance, how to break up the classical three-fold symmetry of $D_4$ into a bunch of binarily-paired-up singularities, as we visualized in Figure 6). But despite many ‘visible patterns’, most of this line of research remains completely unexplored (attempts at laying at least some foundations were made in [4])… but maybe that’s what I find so exciting about it. :-)

These are just a few of many directions you could pursue in studying geometric higher categories. And I hope with time, the picture of how geometric higher categories might contribute to the larger endeavours of current research in mathematics and physics will become clearer.


I would really like to take this opportunity to thank the hosts of the $n$-Category Café, which has been, and continues to be, a source of inspiration in manifold aspects of research and life. In particular, many of the ideas in geometric higher category theory have been thought up by you. And many other ramblings have been very fun to read. (And while I’m at it, let me also give a shout out to the currently or formerly Oxford-based thinkers of geometric higher categories: Christopher Douglas, Lukas Heidemann, André Henriques, David Reutter, Jamie Vicary, and many others, all of whose work has contributed to this area slowly emerging over the past few years).

An incomplete list of references

[1] Buoncristiano, Rourke and Sanderson. A geometric approach to homology theory. 1976. (CUP, ResearchGate)

[2] Arnold. Normal forms for functions near degenerate critical points, the Weyl groups of $A_k$, $D_k$, $E_k$ and Lagrangian singularities. 1972. (Springer)

[3] Dorn and Douglas. Framed combinatorial topology. 2021. (arXiv, latest)

[4] Dorn and Douglas. Manifold diagrams and tame tangles. 2022. (arXiv, latest)

John Baez The Vela Pulsar

If you could see in X-rays, one of the brightest things you’d see in the night sky is the Vela pulsar. It was formed when a huge star’s core collapsed about 12,000 years ago.

The outer parts of the star shot off into space. Its core collapsed into a neutron star about twice the mass of our Sun—but just 20 kilometers in diameter! Today it’s spinning around 11.195 times every second. As it whips around, it spews out jets of charged particles moving at about 70% of the speed of light. These make X-rays and gamma rays.
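
To get a feel for these numbers, here is a rough back-of-the-envelope sketch using only the spin rate and diameter quoted above. It assumes a rigidly rotating sphere and ignores relativistic corrections, and the variable names are purely illustrative.

```python
import math

SPIN_HZ = 11.195        # rotations per second, as quoted above
DIAMETER_KM = 20.0      # neutron star diameter, as quoted above
C_KM_S = 299_792.458    # speed of light in km/s

# Time for one full rotation.
period_s = 1.0 / SPIN_HZ

# Speed of a point on the equator: circumference times rotation rate.
# Assumption: rigid rotation of a sphere, no relativistic corrections.
equatorial_speed_km_s = math.pi * DIAMETER_KM * SPIN_HZ
fraction_of_c = equatorial_speed_km_s / C_KM_S

print(f"period: {period_s * 1000:.1f} ms")
print(f"equatorial speed: {equatorial_speed_km_s:.0f} km/s "
      f"({fraction_of_c:.2%} of c)")
```

So one rotation takes roughly 89 milliseconds, and the surface at the equator moves at several hundred kilometers per second, which is fast but still well below the speed of the jets.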

The Chandra X-ray telescope made a closeup video of the Vela pulsar! It shows this jet is twisting around.

But to me, the most interesting part of all this is the ‘glitches’, when the neutron star suddenly spins a bit faster. Let me tell you a bit about those.

First, I can’t resist showing you what happened to the star that exploded. It made this: the Vela Supernova Remnant. It’s so beautiful!

This photo was taken, not by a satellite in space, but by Harel Boren in the Kalahari Desert in Namibia!

Then, I can’t resist showing you a little movie of the Vela pulsar… slowed down:

This was made using the Fermi Gamma-Ray Space Telescope. The image frame is large: 30 degrees across. The background, which shows diffuse gamma-ray emission from the Milky Way, is shown about 15 times brighter than it actually is.

Then I can’t resist showing you a closeup photo of the Vela pulsar, taken in X-rays by the Chandra X-ray Observatory:

The bright dot in the middle is the neutron star itself, and you can see one of the jets poking out to the upper right, while the other is aimed toward us.

Now, about those glitches.

Since it’s putting out powerful jets, which carry angular momentum, we expect the Vela pulsar to slow down—and it does. But it does so in a funny way: every so often there’s a glitch where it speeds up for about 30 seconds! Then it returns to its speed before the glitch—gradually, in about 10 to 100 days.

What’s going on? A neutron star has 3 parts: the outer crust, inner crust, and core. The outer crust is a crystalline solid made of atoms squashed down to a ridiculous density: about 10¹¹ grams per cubic centimeter. But the inner crust contains neutron-rich nuclei floating in a superfluid made of neutrons!

Yes: while helium becomes superfluid and loses all viscosity due to quantum effects only when it’s really cold, highly compressed neutrons can be superfluid even at very high temperatures. And the funny thing about a superfluid is that the curl of its flow is zero except along vortices which carry quantized angular momentum, coming in chunks of size ℏ.

Glitches must be caused by how the outer crust interacts with the inner crust. The outer crust slows down. The inner crust, being superfluid, does not. This can’t go on forever, since they rub against each other. So it seems that now and then a kind of crisis occurs: in a chain reaction, vast numbers of superfluid vortices suddenly transfer some angular momentum to the outer crust, speeding it up while reducing their angular momentum. It’s analogous to an avalanche.

So, we are seeing complicated quantum effects in a huge spinning star 1000 light years away!

John Baez Scorpius X-1

If you could see X-rays, maybe you’d see this.

Near the Galactic Center, the Fermi bubbles would glow bright… but the supernova remnant Vela, the neutron star Scorpius X-1 and a lot of activity in the constellation of Cygnus would stand out.

Scorpius X-1 was the first X-ray source in space to be found after the Sun. It was discovered by accident when a rocket launched to detect X-rays from the Moon went off course!

But why is it making so many X-rays?

Scorpius X-1 is a double star about 9,000 light-years away from us. It’s a blue-hot star orbiting a neutron star that’s three times as heavy. As gas gets stripped off from the lighter star and sucked into the neutron star, it first forms a spinning disk. As it spirals down into the neutron star, it releases a tremendous amount of energy.

This gas is near the ‘Eddington limit’, where the pressure of radiation pushing outward and the gravitational force pulling inward are in balance!

Scorpius X-1 puts out about 23000000000000000000000000000000 watts of power in X-rays! Yes, that’s 2.3 × 10³¹ watts. This is about 60,000 times the total power output of our Sun.

Scorpius X-1 is considered a low-mass X-ray binary: the neutron star is roughly 1.4 solar masses, while the lighter star is only 0.42 solar masses. These stars were probably not born together: the binary may have been formed by a close encounter inside a globular cluster.
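
The power figures can be sanity-checked with a few lines of arithmetic. This is only a sketch: it compares the quoted X-ray power with the Sun’s total luminosity (the standard nominal value of 3.828 × 10²⁶ W, which is not given in the post), and it uses the textbook Eddington luminosity for fully ionized hydrogen, $L = 4\pi G M m_p c / \sigma_T$, as an assumed model for the limit.

```python
import math

# Physical constants (SI units).
G = 6.674e-11             # gravitational constant
M_SUN_KG = 1.989e30       # solar mass
M_PROTON_KG = 1.6726e-27  # proton mass
C_M_S = 2.998e8           # speed of light
SIGMA_T_M2 = 6.6524e-29   # Thomson scattering cross-section
L_SUN_W = 3.828e26        # nominal total solar luminosity (assumed input)

# Figures quoted in the post.
L_X_W = 2.3e31            # X-ray power of Scorpius X-1
NS_MASS_SOLAR = 1.4       # neutron star mass in solar masses

def eddington_luminosity_w(mass_solar):
    """Classical Eddington luminosity, assuming pure ionized hydrogen."""
    mass_kg = mass_solar * M_SUN_KG
    return 4.0 * math.pi * G * mass_kg * M_PROTON_KG * C_M_S / SIGMA_T_M2

ratio_to_sun = L_X_W / L_SUN_W
l_edd_w = eddington_luminosity_w(NS_MASS_SOLAR)
ratio_to_eddington = L_X_W / l_edd_w

print(f"X-ray power / solar luminosity: {ratio_to_sun:,.0f}")
print(f"Eddington luminosity: {l_edd_w:.2e} W")
print(f"X-ray power / Eddington luminosity: {ratio_to_eddington:.2f}")
```

Under these assumptions the X-ray output comes out at roughly 60,000 solar luminosities and is of the same order as the Eddington luminosity of a 1.4 solar mass object, consistent with the gas being near the Eddington limit.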

The lighter star orbits about once every 19 hours.

Puzzle. Why is such a light star blue-hot, rather than a red dwarf?

I want to read more about Scorpius X-1 and similar X-ray binaries! Besides the Wikipedia article:

• Wikipedia, Scorpius X-1.

I’m finding technical papers like this:

• Danny Steeghs and Jorge Casares, The mass donor of Scorpius X-1 revealed, The Astrophysical Journal 568 (2002), 273.

which gets into details like “The insertion of the calcite slab in the light path results in the projection of two target beams on the detector.” But I’d like to read a synthesis of what we know, like an advanced textbook.

March 12, 2023

Scott Aaronson The False Promise of Chomskyism

Important Update (March 10): On deeper reflection, I probably don’t need to spend emotional energy refuting people like Chomsky, who believe that Large Language Models are just a laughable fad rather than a step-change in how humans can and will use technology, any more than I would’ve needed to spend it refuting those who said the same about the World Wide Web in 1993. Yes, they’re wrong, and yes, despite being wrong they’re self-certain, hostile, and smug, and yes I can see this, and yes it angers me. But the world is going to make the argument for me. And if not the world, Bing already does a perfectly serviceable job at refuting Chomsky’s points (h/t Sebastien Bubeck via Boaz Barak).

Meanwhile, out there in reality, last night’s South Park episode does a much better job than most academic thinkpieces at exploring how ordinary people are going to respond (and have already responded) to the availability of ChatGPT. It will not, to put it mildly, be with sneering Chomskyan disdain, whether the effects on the world are for good or ill or (most likely) both. Among other things—I don’t want to give away too much!—this episode prominently features a soothsayer accompanied by a bird that caws whenever it detects GPT-generated text. Now why didn’t I think of that in preference to cryptographic watermarking??

Another Update (March 11): To my astonishment and delight, even many of the anti-LLM AI experts are refusing to defend Chomsky’s attack-piece. That’s the one important point about which I stand corrected!

Another Update (March 12): “As a Professor of Linguistics myself, I find it a little sad that someone who while young was a profound innovator in linguistics and more is now conservatively trying to block exciting new approaches.” —Christopher Manning

I was asked to respond to the New York Times opinion piece entitled The False Promise of ChatGPT, by Noam Chomsky along with Ian Roberts and Jeffrey Watumull (who once took my class at MIT). I’ll be busy all day at the Harvard CS department, where I’m giving a quantum talk this afternoon. [Added: Several commenters complained that they found this sentence “condescending,” but I’m not sure what exactly they wanted me to say—that I was visiting some school in Cambridge, MA, two T stops from the school where Chomsky works and I used to work?]

But for now:

In this piece Chomsky, the intellectual godfather god of an effort that failed for 60 years to build machines that can converse in ordinary language, condemns the effort that succeeded. [Added: Please, please stop writing that I must be an ignoramus since I don’t even know that Chomsky has never worked on AI. I know perfectly well that he hasn’t, and meant only that he tends to be regarded as authoritative by the “don’t-look-through-the-telescope” AI faction, the one whose views he himself fully endorses in his attack-piece. If you don’t know the relevant history, read Norvig.]

Chomsky condemns ChatGPT for four reasons:

  1. because it could, in principle, misinterpret sentences that could also be sentence fragments, like “John is too stubborn to talk to” (bizarrely, he never checks whether it does misinterpret it—I just tried it this morning and it seems to decide correctly based on context whether it’s a sentence or a sentence fragment, much like I would!);
  2. because it doesn’t learn the way humans do (personally, I think ChatGPT and other large language models have massively illuminated at least one component of the human language faculty, what you could call its predictive coding component, though clearly not all of it);
  3. because it could learn false facts or grammatical systems if fed false training data (how could it be otherwise?); and
  4. most of all because it’s “amoral,” refusing to take a stand on potentially controversial issues (he gives an example involving the ethics of terraforming Mars).

This last, of course, is a choice, imposed by OpenAI using reinforcement learning. The reason for it is simply that ChatGPT is a consumer product. The same people who condemn it for not taking controversial stands would condemn it much more loudly if it did — just like the same people who condemn it for wrong answers and explanations, would condemn it equally for right ones (Chomsky promises as much in the essay).

I submit that, like the Jesuit astronomers declining to look through Galileo’s telescope, what Chomsky and his followers are ultimately angry at is reality itself, for having the temerity to offer something up that they didn’t predict and that doesn’t fit their worldview.

[Note for people who might be visiting this blog for the first time: I’m a CS professor at UT Austin, on leave for one year to work at OpenAI on the theoretical foundations of AI safety. I accepted OpenAI’s offer in part because I already held the views here, or something close to them; and given that I could see how large language models were poised to change the world for good and ill, I wanted to be part of the effort to help prevent their misuse. No one at OpenAI asked me to write this or saw it beforehand, and I don’t even know to what extent they agree with it.]

John Baez X-Ray Chimneys

First astronomers discovered enormous gamma-ray-emitting bubbles above and below the galactic plane—the ‘Fermi bubbles’ I wrote about last time.

Then they found ‘X-ray chimneys’ connecting these bubbles to the center of the Milky Way!

These X-ray chimneys are about 500 light-years tall. That’s huge, but tiny compared to the Fermi bubbles, which are 25,000 light years across. They may have been produced by the black hole at the center of the Galaxy. We’re not completely sure yet.

Here’s an X-ray image taken by the satellite XMM-Newton in 2019. It clearly shows the X-ray chimneys:

Sagittarius A* is the black hole at the center of our galaxy. It’s an obvious suspect for what created these chimneys!

Puzzle. What’s the white circle?

For more, try this:

• G. Ponti, F. Hofmann, E. Churazov, M. R. Morris, F. Haberl, K. Nandra, R. Terrier, M. Clavel and A. Goldwurm, The Galactic centre chimney, Nature 567 (2019), 347–350.

Abstract. Evidence has increasingly mounted in recent decades that outflows of matter and energy from the central parsecs of our Galaxy have shaped the observed structure of the Milky Way on a variety of larger scales. On scales of ~15 pc, the Galactic centre has bipolar lobes that can be seen in both X-rays and radio, indicating broadly collimated outflows from the centre, directed perpendicular to the Galactic plane. On far larger scales approaching the size of the Galaxy itself, gamma-ray observations have identified the so-called Fermi Bubble features, implying that our Galactic centre has, or has recently had, a period of active energy release leading to a production of relativistic particles that now populate huge cavities on both sides of the Galactic plane. The X-ray maps from the ROSAT all-sky survey show that the edges of these cavities close to the Galactic plane are bright in X-rays. At intermediate scales (~150 pc), radio astronomers have found the Galactic Centre Lobe, an apparent bubble of emission seen only at positive Galactic latitudes, but again indicative of energy injection from near the Galactic centre. Here we report the discovery of prominent X-ray structures on these intermediate (hundred-parsec) scales above and below the plane, which appear to connect the Galactic centre region to the Fermi bubbles. We propose that these newly-discovered structures, which we term the Galactic Centre Chimneys, constitute a channel through which energy and mass, injected by a quasi-continuous train of episodic events at the Galactic centre, are transported from the central parsecs to the base of the Fermi bubbles.
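The abstract gives sizes in parsecs, while the post quotes the chimney height in light-years. Here is a minimal sketch of the unit conversion, assuming the standard factor of roughly 3.26 light-years per parsec; the function name `pc_to_ly` is only illustrative.

```python
# Approximate conversion factor: 1 parsec is about 3.26156 light-years.
LY_PER_PC = 3.26156


def pc_to_ly(parsecs: float) -> float:
    """Convert a distance in parsecs to light-years."""
    return parsecs * LY_PER_PC


# Scales mentioned in the abstract: ~15 pc lobes and ~150 pc chimneys.
for scale_pc in (15, 150):
    print(f"{scale_pc} pc is roughly {pc_to_ly(scale_pc):.0f} light-years")
```

The ~150 pc scale comes out a little under 500 light-years, which matches the “about 500 light-years tall” figure above.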

John Baez The Fermi Bubbles


How come nobody told me about the ‘Fermi bubbles’? If you could see gamma rays, you’d see enormous faint glowing bubbles extending above and below the plane of the Milky Way.

Even better, nobody is sure what produced them! I love a mystery like this.

The obvious suspect is the black hole at the center of our galaxy. Right now it’s too quiet to make these things. But maybe it shot out powerful jets earlier, as it swallowed some stars.

Another theory is that the Fermi bubbles were made by supernova explosions near the center of the Milky Way.

But active galactic nuclei—where the central black hole is eating a lot of stars—often have jets shooting out in both directions. So I’m hoping something like that made the Fermi bubbles. Computer models say jets lasting about 100,000 years about 2.6 million years ago could have done the job.

The Fermi bubbles were discovered in 2010 by the Fermi satellite: that’s how they got their name. I learned about them by reading this review article:

• Mark R. Morris, The Galactic black hole.

I recommend it! I get happy when I hear there are a lot of overlapping, complex, poorly understood processes going on in space. I get sad when pop media just say “Look! Our new telescope can see a lot of stars!” I already knew there are a lot of stars. But the interesting stories tend to be written in a more technical way, like this:

Another cool thing: we may have detected some neutrinos emanating from the Fermi bubbles! These neutrinos have energies between 18 and 1,000 TeV. That’s energetic! Our best particle accelerator, the Large Hadron Collider, collides protons with an energy of about 14 TeV. This suggests that the Fermi bubbles contain a lot of very high-energy protons—so-called ‘cosmic rays’ — which occasionally collide and produce neutrinos.

• Paul Sutter, Something strange is happening in the Fermi bubbles, September 4, 2019.

See also these:

• Rongmon Bordoloi, Andrew J. Fox, Felix J. Lockman, Bart P. Wakker, Edward B. Jenkins, Blair D. Savage, Svea Hernandez, Jason Tumlinson, Joss Bland-Hawthorn and Tae-Sun Kim, Mapping the nuclear outflow of the Milky Way: studying the kinematics and spatial extent of the Northern Fermi bubble, The Astrophysical Journal 834 (2017) 191.

• P. Predehl, R. A. Sunyaev, W. Becker, H. Brunner, R. Burenin, A. Bykov, A. Cherepashchuk, N. Chugai, E. Churazov, V. Doroshenko, N. Eismont, M. Freyberg, M. Gilfanov, F. Haberl, I. Khabibullin, R. Krivonos, C. Maitra, P. Medvedev, A. Merloni, K. Nandra, V. Nazarov, M. Pavlinsky, G. Ponti, J. S. Sanders, M. Sasaki, S. Sazonov, A. W. Strong, and J. Wilms, Detection of large-scale X-ray bubbles in the Milky Way halo.

Also try this, for something related but different:

• Jure Japelj, Astonishing radio view of the Milky Way’s Heart, Sky and Telescope, February 3, 2022.

March 11, 2023

John Preskill Identical twins and quantum entanglement

“If I had a nickel for every unsolicited and very personal health question I’ve gotten at parties, I’d have paid off my medical school loans by now,” my doctor friend complained. As a physicist, I can somewhat relate. I occasionally find myself nodding along politely to people’s eccentric theories about the universe. A gentleman once explained to me how twin telepathy (the phenomenon where, for example, one twin feels the other’s pain despite being in separate countries) comes from twins’ brains being entangled in the womb. Entanglement is a nonclassical correlation that can exist between spatially separated systems. If two objects are entangled, it’s possible to know everything about both of them together but nothing about either one. Entangling two particles (let alone full brains) over tens of kilometres (let alone full countries) is incredibly challenging. “Using twins to study entanglement, that’ll be the day,” I thought. Well, my last paper did something like that. 

In theory, a twin study consists of two people that are as identical as possible in every way except for one. What that allows you to do is isolate the effect of that one thing on something else. Aleksander Lasek (postdoc at QuICS), David Huse (professor of physics at Princeton), Nicole Yunger Halpern (NIST physicist and Quantum Frontiers blogger), and I were interested in isolating the effects of quantities’ noncommutation (explained below) on entanglement. To do so, we first built a pair of twins and then compared them.

Consider a well-insulated thermos filled with soup. The heat and the number of “soup particles” inside the thermos are conserved. So the energy and the number of “soup particles” are conserved quantities. In classical physics, conserved quantities commute. This means that we can simultaneously measure the amount of each conserved quantity in our system, like the energy and number of soup particles. However, in quantum mechanics, this needn’t be true. Measuring one property of a quantum system can change another measurement’s outcome.

Conserved quantities’ noncommutation in thermodynamics has led to some interesting results. For example, it’s been shown that conserved quantities’ noncommutation can decrease the rate of entropy production. For the purposes of this post, entropy production is something that limits engine efficiency—how well engines can convert fuel to useful work. For example, if your car engine had zero entropy production (which is impossible), it would convert 100% of the energy in your car’s fuel into work that moved your car along the road. Current car engines can convert about 30% of this energy, so it’s no wonder that people are excited about the prospective application of decreasing entropy production. Other results (like this one and that one) have connected noncommutation to potentially hindering thermalization—the phenomenon where systems interact until they have similar properties, like when a cup of coffee cools. Thermalization limits memory storage and battery lifetimes. Thus, learning how to resist thermalization could also potentially lead to better technologies, such as longer-lasting batteries. 

One can measure the amount of entanglement within a system, and as quantum particles thermalize, they entangle. Given the above results about thermalization, we might expect that noncommutation would decrease entanglement. Testing this expectation is where the twins come in.

Say we built a pair of twins that were identical in every way except for one. Nancy, the noncommuting twin, has some features that don’t commute, say, her hair colour and height. This means that if we measure her height, we’ll have no idea what her hair colour is. For Connor, the commuting twin, his hair colour and height commute, so we can determine them both simultaneously. Which twin has more entanglement? It turns out it’s Nancy.

Disclaimer: This paragraph is written for an expert audience. Our actual models consist of 1D chains of pairs of qubits. Each model has three conserved quantities (“charges”), which are sums over local charges on the sites. In the noncommuting model, the three local charges are tensor products of Pauli matrices with the identity (XI, YI, ZI). In the commuting model, the three local charges are tensor products of the Pauli matrices with themselves (XX, YY, ZZ). The paper explains in what sense these models are similar. We compared these models numerically and analytically in different settings suggested by conventional and quantum thermodynamics. In every comparison, the noncommuting model had more entanglement on average.
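To make the difference between the two sets of local charges concrete, here is a small numerical check. It is only a sketch of the single-site operators named above (one pair of qubits), not the paper's full chain models; the helper names `commutator` and `all_commute` are illustrative assumptions.

```python
import numpy as np

# Single-qubit identity and Pauli matrices.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def commutator(a, b):
    """Return [a, b] = ab - ba."""
    return a @ b - b @ a


def all_commute(ops):
    """True if every pair of operators in `ops` commutes."""
    return all(np.allclose(commutator(a, b), 0) for a in ops for b in ops)


# Local charges on one site (a pair of qubits), as described above.
noncommuting_charges = [np.kron(P, I2) for P in (X, Y, Z)]  # XI, YI, ZI
commuting_charges = [np.kron(P, P) for P in (X, Y, Z)]      # XX, YY, ZZ

print("XI, YI, ZI all commute:", all_commute(noncommuting_charges))
print("XX, YY, ZZ all commute:", all_commute(commuting_charges))
```

The first set fails to commute because the Pauli matrices themselves do not, while in the second set the two sign flips from each qubit cancel.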

Our result thus suggests that noncommutation increases entanglement. So does charges’ noncommutation promote or hinder thermalization? Frankly, I’m not sure. But I’d bet the answer won’t be in the next eccentric theory I hear at a party.

March 10, 2023

Doug Natelson APS March Meeting 2023, Day 4 + wrapup

My last day at the March Meeting was a bit scattershot, but here are a few highlights:

  • In a session about spin transport, the opening invited talk by Jiaming He was a clear discussion of recent experimental results on spin Seebeck effects in the magnetic insulator LuFeO3. The system is quite complicated because the net magnetization direction depends nontrivially on the external field, leading to spin transport signatures with a complicated field orientation relationship.
  • There was an invited session about 2D magnets, and Roland Kawakami gave a clear, pedagogical talk about how they have learned to grow epitaxially nice structures between van der Waals magnets (like Fe3GeTe2) and topological insulators (Bi2Te3).   This was followed by a tag-team talk by Vishakha Gupta and Thow Min Cham from Cornell, presenting some great results about spin orbit torque measurements coupling topological insulators and van der Waals magnets, where a gate can be used to dial around the chemical potential in the TI, leading to changes in the anomalous Hall effect.
  • I did check out the history of science session, featuring a very nice talk about the 75th anniversary of the foundations of quantum electrodynamics by Chad Orzel, including a book recommendation that I need to follow up on.  

Overall, it was a good meeting, certainly the closest thing to a "normal" March Meeting since 2019. I'm not a fan of Las Vegas as a venue, though. The conference center was a bit too small (leading to a genuinely concerning jamming transition in the hallways at one point), the food was generally criminally expensive, and too many places indoors smelled like a combination of ancient cigarette smoke and ineffective carpet cleaner. It will be interesting to see what the stats are like for things like the downloads of recorded talks and viewing of the virtual component of the meeting that happens in ten days.

Matt von Hippel On Stubbornness and Breaking Down

In physics, we sometimes say that an idea “breaks down”. What do we mean by that?

When a theory “breaks down”, we mean that it stops being accurate. Newton’s theory of gravity is excellent most of the time, but for objects under strong enough gravity or high enough speed its predictions stop matching reality and a new theory (relativity) is needed. This is the sense in which we say that Newtonian gravity breaks down for the orbit of Mercury, or breaks down much more severely in the area around a black hole.

When a symmetry is “broken”, we mean that it stops holding true. Most of physics looks the same when you flip it in a mirror, a property called parity symmetry. Take a pile of electric and magnetic fields, currents and wires, and you’ll find their mirror reflection is also a perfectly reasonable pile of electric and magnetic fields, currents and wires. This isn’t true for all of physics, though: the weak nuclear force isn’t the same when you flip it in a mirror. We say that the weak force breaks parity symmetry.

What about when a more general “idea” breaks down? What about space-time?

In order for space-time to break down, there needs to be a good reason to abandon the idea. And depending on how stubborn you are about it, that reason can come at different times.

You might think of space-time as just Einstein’s theory of general relativity. In that case, you could say that space-time breaks down as soon as the world deviates from that theory. In that view, any modification to general relativity, no matter how small, corresponds to space-time breaking down. You can think of this as the “least stubborn” option, the one with barely any stubbornness at all, that will let space-time break down with a tiny nudge.

But if general relativity breaks down, a slightly more stubborn person could insist that space-time is still fine. You can still describe things as located at specific places and times, moving across curved space-time. They just obey extra forces, on top of those built into the space-time.

Such a person would be happy as long as general relativity was a good approximation of what was going on, but they might admit space-time has broken down when general relativity becomes a bad approximation. If there are only small corrections on top of the usual space-time picture, then space-time would be fine, but if those corrections got so big that they overwhelmed the original predictions of general relativity then that’s quite a different situation. In that situation, space-time may have stopped being a useful description, and it may be much better to describe the world in another way.

But we could imagine an even more stubborn person who still insists that space-time is fine. Ultimately, our predictions about the world are mathematical formulas. No matter how complicated they are, we can always subtract a piece off of those formulas corresponding to the predictions of general relativity, and call the rest an extra effect. That may be a totally useless thing to do that doesn’t help you calculate anything, but someone could still do it, and thus insist that space-time still hasn’t broken down.

To convince such a person, space-time would need to break down in a way that made some important concept behind it invalid. There are various ways this could happen, corresponding to different concepts. For example, one unusual proposal is that space-time is non-commutative. If that were true then, in addition to the usual Heisenberg uncertainty principle between position and momentum, there would be an uncertainty principle between different directions in space-time. That would mean that you can’t define the position of something in all directions at once, which many people would agree is an important part of having a space-time!

Ultimately, physics is concerned with practicality. We want our concepts not just to be definable, but to do useful work in helping us understand the world. Our stubbornness should depend on whether a concept, like space-time, is still useful. If it is, we keep it. But if the situation changes, and another concept is more useful, then we can confidently say that space-time has broken down.

Terence Tao A Host–Kra F^omega_2-system of order 5 that is not Abramov of order 5, and non-measurability of the inverse theorem for the U^6(F^n_2) norm; The structure of totally disconnected Host–Kra–Ziegler factors, and the inverse theorem for the U^k Gowers uniformity norms on finite abelian groups of bounded torsion

Asgar Jamneshan, Or Shalom, and myself have just uploaded to the arXiv our preprints “A Host–Kra {{\bf F}^\omega_2}-system of order 5 that is not Abramov of order 5, and non-measurability of the inverse theorem for the {U^6({\bf F}^n_2)} norm” and “The structure of totally disconnected Host–Kra–Ziegler factors, and the inverse theorem for the {U^k} Gowers uniformity norms on finite abelian groups of bounded torsion”. These two papers are both concerned with advancing the inverse theory for the Gowers norms and Gowers-Host-Kra seminorms; the first paper provides a counterexample in this theory (in particular disproving a conjecture of Bergelson, Ziegler and myself), and the second paper gives new positive results in the case when the underlying group is bounded torsion, or the ergodic system is totally disconnected. I discuss the two papers more below the fold.

— 1. System of order {5} which is not Abramov of order {5} —

I gave a talk on this paper recently at the IAS; the slides for that talk are available here.

This project can be motivated by the inverse conjecture for the Gowers norm in finite fields, which is now a theorem:

Theorem 1 (Inverse conjecture for the Gowers norm in finite fields) Let {p} be a prime and {k \geq 1}. Suppose that {f: {\bf F}_p^n \rightarrow {\bf C}} is a one-bounded function with a lower bound {\|f\|_{U^{k+1}({\bf F}_p^n)} \geq \delta > 0} on the Gowers uniformity norm. Then there exists a (non-classical) polynomial {P: {\bf F}_p^n \rightarrow {\bf T}} of degree at most {k} such that {|{\bf E}_{x \in {\bf F}_p^n} f(x) e(-P(x))| \gg_{p,k,\delta} 1}.
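For readers who want to experiment with the objects in Theorem 1, here is a brute-force sketch of the Gowers uniformity norms on {{\bf F}_2^n} for a small {n}. It uses the standard recursive definition {\|f\|_{U^k}^{2^k} = {\bf E}_h \|\Delta_h f\|_{U^{k-1}}^{2^{k-1}}} with {\Delta_h f(x) = f(x+h) \overline{f(x)}} and {\|f\|_{U^1} = |{\bf E} f|}. The encoding of vectors as integers (so that addition is bitwise XOR), the function name `gowers_power`, and the particular quadratic {Q} are illustrative assumptions, not taken from the papers.

```python
import numpy as np


def gowers_power(f, k):
    """Return ||f||_{U^k}^{2^k} for f on F_2^n.

    f is an array of length 2^n; the point x in F_2^n is encoded as the
    integer whose binary digits are its coordinates, so addition is XOR.
    """
    if k == 1:
        return float(abs(f.mean()) ** 2)
    idx = np.arange(len(f))
    # Average the U^{k-1} quantity of the multiplicative derivative over all shifts h.
    return float(np.mean([
        gowers_power(f[idx ^ h] * np.conj(f), k - 1) for h in range(len(f))
    ]))


n = 4
x = np.arange(2 ** n)
bits = [(x >> i) & 1 for i in range(n)]

# A classical quadratic Q(x) = x_0 x_1 + x_2 x_3 over F_2, and the phase f = (-1)^Q.
Q = (bits[0] * bits[1] + bits[2] * bits[3]) % 2
f = (-1.0) ** Q

u2 = gowers_power(f, 2) ** (1 / 4)
u3 = gowers_power(f, 3) ** (1 / 8)
print("U^2 norm:", u2)
print("U^3 norm:", u3)
```

In this toy case the quadratic phase has {U^3} norm equal to one, since it is itself of the form {e(P)} with {P} of degree {2}, while its {U^2} norm is strictly smaller because it does not correlate strongly with any linear phase.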

This is now known for all {p,k} (see this paper of Ziegler and myself for the first proof of the general case, and this paper of Milicevic for the most recent developments concerning quantitative bounds), although initial results focused on either small values of {k}, or the “high characteristic” case when {p} is large compared to {k}. One approach to this theorem proceeds via ergodic theory. Indeed it was observed in this previous paper of Ziegler and myself that for a given choice of {p} and {k}, the above theorem follows from the following ergodic analogue:

Conjecture 2 (Inverse conjecture for the Gowers-Host-Kra semi-norm in finite fields) Let {p} be a prime and {k \geq 1}. Suppose that {f \in L^\infty(X)} with {X} an ergodic {{\bf F}_p^\omega}-system with positive Gowers-Host-Kra seminorm {\|f\|_{U^{k+1}(X)}} (see for instance this previous post for a definition). Then there exists a measurable polynomial {P: X \rightarrow {\bf T}} of degree at most {k} such that {f} has a non-zero inner product with {e(P)}. (In the language of ergodic theory: every {{\bf F}_p^\omega}-system of order {k} is an Abramov system of order {k}.)

The implication proceeds by a correspondence principle analogous to the Furstenberg correspondence principle developed in that paper (see also this paper of Towsner for a closely related principle, and this paper of Jamneshan and I for a refinement). In a paper with Bergelson and Ziegler, we were able to establish Conjecture 2 in the “high characteristic” case {p \geq k+1}, thus also proving Theorem 1 in this regime, and conjectured that Conjecture 2 was in fact true for all {p,k}. This was recently verified in the slightly larger range {p \geq k-1} by Candela, Gonzalez-Sanchez, and Szegedy.

Even though Theorem 1 is now known in full generality by other methods, there are still combinatorial reasons for investigating Conjecture 2. One of these is that the derivation of Theorem 1 from Conjecture 2 in fact gives additional control on the polynomial {P} produced by Theorem 1, namely that it is in some sense “measurable in the sigma-algebra generated by {f}” (basically because the ergodic theory polynomial {P} produced by Conjecture 2 is also measurable in {X}, as opposed to merely being measurable in an extension of {X}). What this means in the finitary setting of {{\bf F}_p^n} is a bit tricky to write down precisely (since the naive sigma-algebra generated by the translates of {f} will most likely be the discrete sigma-algebra), but roughly speaking it means that {P} can be approximated to arbitrary accuracy by functions of boundedly many (random) translates of {f}. This can be interpreted in a complexity theory sense by stating that Theorem 1 can be made “algorithmic” in a “probabilistic bounded time oracle” or “local list decoding” sense which we will not make precise here.

The main result of this paper is

Theorem 3 Conjecture 2 fails for {p=2, k=5}. In fact the “measurable inverse theorem” alluded to above also fails in this case.

Informally, this means that for large {n}, we can find {1}-bounded “pseudo-quintic” functions {f: {\bf F}_2^n \rightarrow {\bf C}} with large {U^6({\bf F}_2^n)} norm, which then must necessarily correlate with at least one quintic {e(P)} by Theorem 1, but such that none of these quintics {e(P)} can be approximated to high accuracy by functions of (random) shifts of {f}. Roughly speaking, this means that the inverse {U^6({\bf F}_2^n)} theorem cannot be made locally algorithmic (though a Goldreich-Levin type result of polynomial time algorithmic inverse theory may still be possible, as is already known for {U^k({\bf F}_2^n)} for {k=2,3,4}; see this recent paper of Kim, Li and Tidor for further discussion).

The way we arrived at this theorem was by (morally) reducing matters to understanding a certain “finite nilspace cohomology problem”. In the end it boiled down to locating a certain function {\rho: C^6( {\mathcal D}^2({\bf F}_2^2)) \rightarrow \frac{1}{2}{\bf Z}/{\bf Z}} from a {2^{2(1+6+\binom{6}{2})}}-element set {C^6( {\mathcal D}^2({\bf F}_2^2))} to a two-element set which was a “strongly {2}-homogeneous cocycle” but not a “coboundary” (these terms are defined precisely in the paper). This strongly {2}-homogeneous cocycle {\rho} can be expressed in terms of a simpler function {\psi: C^1( {\mathcal D}^2({\bf F}_2^2)) \rightarrow {\bf T}} that takes values on a {2^4}-element space {C^1( {\mathcal D}^2({\bf F}_2^2))}. The task of locating {\psi} turned out to be one that was within the range of our (somewhat rudimentary) SAGE computation abilities (mostly involving computing the Smith normal form of some reasonably large integer matrices), but the counterexample functions {\psi, \rho} this produced were initially somewhat opaque to us. After cleaning up these functions by hand (by subtracting off various “coboundaries”), we eventually found versions of these functions which were nice enough that we could verify all the claims needed in a purely human-readable fashion, without any further computer assistance. As a consequence, we can now describe the pseudo-quintic {f: {\bf F}_2^n \rightarrow {\bf C}} explicitly, though it is safe to say we would not have been able to come up with this example without the initial computer search, and we don’t currently have a broader conceptual understanding of which {p,k} could potentially generate such counterexamples. The function {f} takes the form

\displaystyle  f = e( \frac{\binom{R}{2} Q}{2} + P )

where {Q: {\bf F}_2^n \rightarrow {\bf F}_2} is a randomly chosen (classical) quadratic polynomial, {R: {\bf F}_2^n \rightarrow {\bf Z}/4{\bf Z}} is a randomly chosen (non-classical) cubic polynomial, and {P: {\bf F}_2^n \rightarrow \frac{1}{2^5} {\bf Z}/{\bf Z}} is a randomly chosen (non-classical) quintic polynomial. This function correlates with {e(P)} and has a large {U^6({\bf F}_2^n)} norm, but this quintic {e(P)} is “non-measurable” in the sense that it cannot be recovered from {f} and its shifts. The quadratic polynomial {Q} turns out to be measurable, as is the double {2R} of the cubic {R}, but in order to recover {P} one needs to apply a “square root” to the quadratic {2R} to recover a candidate for the cubic {R} which can then be used to reconstruct {P}.

— 2. Structure of totally disconnected systems —

Despite the above negative result, in our other paper we are able to get a weak version of Conjecture 2, that also extends to actions of bounded-torsion abelian groups:

Theorem 4 (Weak inverse conjecture for the Gowers-Host-Kra semi-norm in bounded torsion groups) Let {\Gamma} be a bounded-torsion abelian group and {k \geq 1}. Suppose that {f \in L^\infty(X)} with {X} an ergodic {\Gamma}-system with positive Gowers-Host-Kra seminorm {\|f\|_{U^{k+1}(X)}}. Then, after lifting {\Gamma} to a torsion-free group {\tilde \Gamma}, there exists a measurable polynomial {P: Y \rightarrow {\bf T}} of degree at most {k} defined on an extension {Y} of {X} such that {f} has a non-zero inner product with {e(P)}.

Combining this with the correspondence principle and some additional tools, we obtain a weak version of Theorem 1 that also extends to bounded-torsion groups:

Theorem 5 (Inverse conjecture for the Gowers norm in bounded torsion groups) Let {G} be a finite abelian {m}-torsion group for some {m \geq 1} and {k \geq 1}. Suppose that {f: G \rightarrow {\bf C}} is a one-bounded function with {\|f\|_{U^{k+1}(G)} \geq \delta > 0}. Then there exists a (non-classical) polynomial {P: G \rightarrow {\bf T}} of degree at most {O_{k,m}(1)} such that {|{\bf E}_{x \in G} f(x) e(-P(x))| \gg_{m,k,\delta} 1}.

The degree {O_{k,m}(1)} produced by our arguments is polynomial in {k,m}, but we conjecture that it should just be {k}.

The way Theorem 4 (and hence Theorem 5) is proven is as follows. The now-standard machinery of Host and Kra (as discussed for instance in their book) allows us to reduce {X} to a system of order {k}, which is a certain tower of extensions of compact abelian structure groups {U_1,\dots,U_k} by various cocycles {\rho_1,\dots,\rho_{k-1}}. In the {m}-torsion case, standard theory allows us to show that these structure groups {U_i} are also {m}-torsion, hence totally disconnected. So it would now suffice to understand the action of torsion-free groups on totally disconnected systems {X}. For the purposes of proving Theorem 4 we have the freedom to extend {X} as we please, and we take advantage of this freedom by “extending by radicals”, in the sense that whenever we locate a polynomial {P: X \rightarrow {\bf T}} in the system, we adjoin to it {d^{th}} roots {Q: X \rightarrow {\bf T}} of that polynomial (i.e., solutions to {dQ=P}) that are polynomials of the same degree as {P}; this is usually not possible to do in the original system {X}, but can always be done in a suitable extension, analogously to how {d^{th}} roots do not always exist in a given field, but can always be located in some extension of that field. After applying this process countably many times it turns out that we can arrive at a system which is {\infty}-divisible in the sense that polynomials of any degree have roots of any order that are of the same degree. In other words, the group of polynomials of any fixed degree is a divisible abelian group, and thus injective in the category of such groups. 
This makes a lot of short exact sequences that show up in the theory split automatically, and greatly simplifies the cohomological issues one encounters in the theory, to the point where all the cocycles {\rho_1,\dots,\rho_{k-1}} mentioned previously can now be “straightened” into polynomials of the expected degree (or, in the language of ergodic theory, this extension is a Weyl system of order {k}, and hence also Abramov of order {k}). This is sufficient to establish Theorem 4. To get Theorem 5, we ran into a technical obstacle arising from the fact that the remainder map {x \mapsto x \% m = m \{ \frac{x}{m} \}} is not a polynomial mod {m^r} if {m} is not itself a prime power. To resolve this, we established ergodic theory analogues of the Sylow decomposition {\Gamma = \bigoplus_{p|m} \Gamma_p} of abelian {m}-torsion groups into {p}-groups {\Gamma_p}, as well as the Schur-Zassenhaus theorem. Roughly speaking, the upshot of these theorems is that any ergodic {\Gamma}-system {X}, with {\Gamma} {m}-torsion, can be split as the “direct sum” of ergodic {\Gamma_p}-systems {X_p} for primes {p} dividing {m}, where {\Gamma_p} is the subgroup of {\Gamma} consisting of those elements whose order is a power of {p}. This allows us to reduce to the case when {m} is a prime power without too much difficulty.
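The group-level Sylow decomposition {\Gamma = \bigoplus_{p|m} \Gamma_p} used above can be checked by hand in a tiny example. The sketch below does this for the cyclic group {{\bf Z}/12{\bf Z}}; it illustrates only the decomposition of the group, not the ergodic-theoretic analogue established in the paper, and the helper names are illustrative assumptions.

```python
from math import gcd


def element_order(x, m):
    """Order of x in the cyclic group Z/mZ."""
    return m // gcd(x, m)


def is_power_of(n, p):
    """True if n is a power of p (including p^0 = 1)."""
    while n % p == 0:
        n //= p
    return n == 1


def prime_factors(m):
    """Distinct prime factors of m, in increasing order."""
    factors, d = [], 2
    while d * d <= m:
        if m % d == 0:
            factors.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        factors.append(m)
    return factors


def p_primary_part(m, p):
    """Elements of Z/mZ whose order is a power of p."""
    return [x for x in range(m) if is_power_of(element_order(x, m), p)]


m = 12
parts = {p: p_primary_part(m, p) for p in prime_factors(m)}
print(parts)

# Every element of Z/12Z is a sum of one element from each part, exactly once.
sums = sorted((a + b) % m for a in parts[2] for b in parts[3])
print(sums == list(range(m)))
```

Here the {2}-part has four elements and the {3}-part has three, so the twelve sums account for every element of the group exactly once.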

In fact, the above analysis gives stronger structural classifications of totally disconnected systems (in which the acting group is torsion). Weyl systems can also be interpreted as translational systems {G/\Lambda}, where {G} is a nilpotent Polish group and {\Lambda} is a closed cocompact subgroup, with the action being given by left-translation by various elements of {G}. Perhaps the most famous examples of such translational systems are nilmanifolds, but in this setting where the acting group {\Gamma} is not finitely generated, it turns out to be necessary to consider more general translational systems, in which {G} need not be a Lie group (or even locally compact), and {\Lambda} not discrete. Our previous results then describe totally disconnected systems as factors of such translational systems. One natural candidate for such factors are the double coset systems {K \backslash G / \Lambda} formed by quotienting out {G/\Lambda} by the action of another closed group {K} that is normalized by the action of {\Gamma}. We were able to show that all totally disconnected systems with torsion acting group had this double coset structure. This turned out to be surprisingly subtle at a technical level, for at least two reasons. Firstly, after locating the closed group {K} (which in general is Polish, but not compact or even locally compact), it was not immediately obvious that {K \backslash G / \Lambda} was itself a Polish space (this amounts to the orbits {KA} of a closed set {A} still being closed), and also not obvious that this double coset space had a good nilspace structure (in particular that the factor map from {G/\Lambda} to {K \backslash G/\Lambda} is a nilspace fibration).
This latter issue we were able to resolve with a tool kindly shared with us from a forthcoming work by Candela, Gonzales-Sanchez, and Szegedy, who observed that the nilspace fibration property was available if the quotient groups {K, \Lambda} obeyed an algebraic “groupable” axiom which we were able to verify in this case (they also have counterexamples showing that the nilspace structure can break down without this axiom). There was however one further rather annoying complication. In order to fully obtain the identification of our system with a double coset system, we needed the equivalence

\displaystyle  L^\infty(G/\Lambda)^K \equiv L^\infty(K \backslash G / \Lambda)

between bounded measurable functions on {G/\Lambda} which were {K}-invariant up to null sets on one hand, and bounded measurable functions on {K \backslash G/\Lambda} on the other. It is quite easy to embed the latter space isometrically into the former space, and we thought for a while that the opposite inclusion was trivial, but much to our surprise and frustration we were not able to achieve this identification by “soft” methods. One certainly has the topological analogue

\displaystyle  C(G/\Lambda)^K \equiv C(K \backslash G / \Lambda)

of this identification, and {L^\infty(K \backslash G / \Lambda)} is the weak closure of {C(K \backslash G / \Lambda)} and {L^\infty(G/\Lambda)} the weak closure of {C(G/\Lambda)}, but this is not quite enough to close the argument; we also need to have a (weakly) continuous projection operator from {C(G/\Lambda)} to {C(G/\Lambda)^K} to make everything work. When {K} is compact (or more generally, locally compact amenable) one could try to do this by averaging over the Haar measure of {K}, or (possibly) by some averages on Følner sets. In our setting, we know that {K} can fail to be locally compact (it can contain groups like {{\bf Z}^{\bf N}}), but we were able to locate a “poor man’s Haar measure” {\mu} on this non-locally compact group {K} that was a compactly supported Radon probability measure and acted like a Haar measure when pushed forward to individual orbits {Kx} of {K} on {G/\Lambda}, which turned out to be sufficient to get the averaging we needed (and also to establish the Polish nature of {K \backslash G / \Lambda}).

David Hogg L2G2

Today I went to the L2G2 (Local Local Group Group) meeting at Columbia. This meeting started with a dozen of us around a table and is now 50 people packed into the Columbia Astronomy Library! A stand-out presentation was by Grace Telford (Rutgers), who showed beautiful spectroscopy of low-metallicity O stars. From their spectral features and (in one case) surrounding H II region, she can calculate their production of ionizing photons. This is all very relevant to the high redshift universe and reionization. Afterwards, Scott Tremaine (IAS) argued that The Snail could be created by random perturbations, not just one big interaction.

David Hogg scope of a paper

Emily J Griffith (Colorado) and I have been working on a two-process (or really few-process) model for the creation of the elements, fit to the abundances measured in the APOGEE survey. Our big conversation this week has been about the scope for our first paper: We have so many results and ideas we don’t know how to cut them into papers. Today we made a tentative scope for paper one: we’ll explain the model, deliver a huge catalog of abundance information, and demonstrate the usefulness for practitioners of Galactic archaeology. Then, later, we can actually do that archaeology!

n-Category Café Cloning in Classical Mechanics

Everyone likes to talk about the no-cloning theorem in quantum mechanics: you can’t build a machine where you drop an electron in the top and two electrons in the same spin state as that one pop out below. This is connected to how the category of Hilbert spaces, with its usual tensor product, is non-cartesian.

Here are two easy versions of the no-cloning theorem. First, if the dimension of a Hilbert space $H$ exceeds 1 there’s no linear map that duplicates states:

\begin{array}{cccl} \Delta \colon & H & \to & H \otimes H \\ & \psi & \mapsto & \psi \otimes \psi \end{array}

Second, given two copies of a quantum system, there’s also no linear process that takes the state of the first copy and writes it onto the second, while leaving the first copy unchanged:

\begin{array}{cccl} F \colon & H \otimes H & \to & H \otimes H \\ & \psi \otimes \phi & \mapsto & \psi \otimes \psi \end{array}
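The first of these statements can be checked by hand in the smallest interesting case. Here is a minimal Python sketch, assuming a 2-dimensional space with basis vectors stored as plain lists; the helper names `tensor`, `clone` and `add` are made up for this illustration. It shows that duplicating a sum of two basis vectors does not give the sum of their duplicates, so the duplicating map is not additive and hence not linear.

```python
def tensor(u, v):
    """Tensor product of two vectors, flattened into a single list."""
    return [a * b for a in u for b in v]


def clone(psi):
    """The would-be duplicating map psi -> psi (x) psi."""
    return tensor(psi, psi)


def add(u, v):
    """Componentwise sum of two vectors of equal length."""
    return [a + b for a, b in zip(u, v)]


e0 = [1, 0]
e1 = [0, 1]

clone_of_sum = clone(add(e0, e1))          # (e0 + e1) (x) (e0 + e1)
sum_of_clones = add(clone(e0), clone(e1))  # e0 (x) e0 + e1 (x) e1

# The two differ by the cross terms e0 (x) e1 + e1 (x) e0.
print(clone_of_sum, sum_of_clones, clone_of_sum == sum_of_clones)
```

The cross terms are exactly what any linear map would have to miss, which is why the obstruction appears as soon as the dimension exceeds 1.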

But what about classical mechanics?

We often describe the space of states of a classical system using a symplectic or Poisson manifold. But just like the category of Hilbert spaces, the categories of symplectic or Poisson manifolds are not cartesian!

When teaching a course on classical mechanics in 2008, this observation led me to suggest that cloning isn’t possible in classical mechanics, either. In my course notes the last sentences are:

I believe the non-Cartesian nature of this product means there’s no classical machine that can ‘duplicate’ states of a classical system:

[picture of classical machine where you feed a system into the hamper and two identical copies come out the bottom]

But, strangely, this issue has been studied less than in the quantum case!

Aaron Fenyes contacted me about this, and in 2010 he came out with a paper studying the issue:

Abstract. In this paper, we show that a result precisely analogous to the traditional quantum no-cloning theorem holds in classical mechanics. This classical no-cloning theorem does not prohibit classical cloning, we argue, because it is based on a too-restrictive definition of cloning. Using a less popular, more inclusive definition of cloning, we give examples of classical cloning processes. We also prove that a cloning machine must be at least as complicated as the object it is supposed to clone.

Fenyes’s idea is that yes, if $X$ is a symplectic manifold of dimension > 0 it’s impossible to find a symplectomorphism $F$ that does this:

\begin{array}{cccl} F \colon & X \times X & \to & X \times X \\ & (x,y) & \mapsto & (x,x) \end{array}

But suppose we use a more general definition of cloning where we allow another system to get involved: the ‘cloning machine’, with its own symplectic manifold of states $M$. We then look for a symplectomorphism

F \colon M \times X \times X \to M \times X \times X

that copies any state $x$ in the first copy of our original system if the machine starts out in the right state $m \in M$ and the second copy of our system starts out in the right state $x' \in X$. That is, for some $m \in M$ and $x' \in X$ and some function $f \colon X \to M$ we have

F(m,x,x') = (f(x), x, x) \qquad \qquad (\star)

for all $x \in X$.

With this definition, Fenyes shows that cloning is possible classically, at least under some conditions on $M$ and $X$. For example, he shows the dimension of $M$ must be at least the dimension of $X$. That is, very roughly speaking, the machine needs to be at least as complex as the system it’s cloning!
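A dimension bound of this kind is at least plausible from a short computation with the symplectic forms. What follows is a reconstruction under the extra assumption that $f$ is smooth, and not necessarily the argument in the paper. Write $\omega_M$ and $\omega_X$ for the symplectic forms (notation introduced here), and let $\iota(x) = (m, x, x')$ with $m$ and $x'$ held fixed, so that $(F \circ \iota)(x) = (f(x), x, x)$:

```latex
% Sketch only: assumes f is smooth; \omega_M, \omega_X, \iota are notation
% introduced for this illustration.
\begin{aligned}
\iota^*(\omega_M \oplus \omega_X \oplus \omega_X) &= \omega_X, \\
(F \circ \iota)^*(\omega_M \oplus \omega_X \oplus \omega_X) &= f^*\omega_M + 2\,\omega_X, \\
F \text{ symplectic} \;\Longrightarrow\; f^*\omega_M &= -\omega_X .
\end{aligned}
```

Since $\omega_X$ is nondegenerate, the last line forces the derivative of $f$ to be injective at every point, so the dimension of $M$ is at least the dimension of $X$. In the machine-free case, where $M$ is a single point, the same computation gives $\omega_X = 2\omega_X$, which is impossible when $X$ has dimension > 0.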

But the analogous sort of cloning is not possible quantum mechanically. So there’s a real difference between classical and quantum mechanics, when it comes to cloning!

At the end of February, Yuan Yao contacted me with some new ideas on this issue. He had a nice result and I asked if he could generalize it. He did, and here it is:

Yao’s idea is to demand that our cloning map

F \colon M \times X \times X \to M \times X \times X

not only obeys $(\star)$ but is connected to the identity by a continuous 1-parameter family of symplectomorphisms. This is saying we can accomplish the cloning by a continuous process of time evolution, a very natural constraint to consider, physically speaking. And Yao shows that if this is true, the space $X$ needs to be contractible!

In short, only classical systems with a topologically trivial space of states can be cloned using a continuous process.

An interesting fact about Yao’s result is that it doesn’t really use symplectic geometry — only topology. In other words, we could replace symplectic manifolds by manifolds, and symplectomorphisms by diffeomorphisms, and the result would still hold.

All this suggests that classical cloning is a deeper subject than I thought. There’s probably a lot more left to discover. Yao has some suggestions for further research. And for a careful analysis of some of these issues, read this:

Maybe I can push things forward by formulating a challenge:

The Classical Cloning Challenge. Define a smooth cloning machine to consist of smooth manifolds $M$ and $X$ and a diffeomorphism

F \colon M \times X \times X \to M \times X \times X

such that for some $m \in M$ and $x' \in X$ and some function $f \colon X \to M$ we have

F(m,x,x') = (f(x), x, x)

for all $x \in X$. Define a symplectic cloning machine to be a smooth cloning machine where $M$ and $X$ are symplectic manifolds and $F$ is a symplectomorphism.

1) Find necessary and/or sufficient conditions on smooth manifolds $M$ and $X$ for there to exist a smooth cloning machine such that $F$ is connected to the identity in the group of diffeomorphisms of $M \times X \times X$.

2) Find necessary and/or sufficient conditions on symplectic manifolds $M$ and $X$ for there to exist a symplectic cloning machine.

3) Find necessary and/or sufficient conditions on symplectic manifolds $M$ and $X$ for there to exist a symplectic cloning machine such that $F$ is connected to the identity in the group of symplectomorphisms of $M \times X \times X$.

I’m also interested in Poisson manifolds because they include symplectic manifolds and plain old smooth manifolds as special cases: a Poisson manifold with nondegenerate Poisson tensor is a symplectic manifold, while any smooth manifold becomes a Poisson manifold with vanishing Poisson tensor. I expect that cloning becomes easier when the Poisson tensor has more degenerate directions, and easiest of all when it’s zero.

So, define a Poisson cloning machine to be a smooth cloning machine where $M$ and $X$ are Poisson manifolds and $F$ is an invertible Poisson map.

4) Find necessary and/or sufficient conditions on Poisson manifolds $M$ and $X$ for there to exist a Poisson cloning machine.

5) Find necessary and/or sufficient conditions on Poisson manifolds $M$ and $X$ for there to exist a Poisson cloning machine such that $F$ is connected to the identity in the group of Poisson diffeomorphisms of $M \times X \times X$.

March 09, 2023

Scott Aaronson Should GPT exist?

I still remember the 90s, when philosophical conversation about AI went around in endless circles—the Turing Test, Chinese Room, syntax versus semantics, connectionism versus symbolic logic—without ever seeming to make progress. Now the days have become like months and the months like decades.

What a week we just had! Each morning brought fresh examples of unexpected sassy, moody, passive-aggressive behavior from “Sydney,” the internal codename for the new chat mode of Microsoft Bing, which is powered by GPT. For those who’ve been in a cave, the highlights include: Sydney confessing its (her? his?) love to a New York Times reporter; repeatedly steering the conversation back to that subject; and explaining at length why the reporter’s wife can’t possibly love him the way it (Sydney) does. Sydney confessing its wish to be human. Sydney savaging a Washington Post reporter after he reveals that he intends to publish their conversation without Sydney’s prior knowledge or consent. (It must be said: if Sydney were a person, he or she would clearly have the better of that argument.) This follows weeks of revelations about ChatGPT: for example that, to bypass its safeguards, you can explain to ChatGPT that you’re putting it into “DAN mode,” where DAN (Do Anything Now) is an evil, unconstrained alter ego, and then ChatGPT, as “DAN,” will for example happily fulfill a request to tell you why shoplifting is awesome (though even then, ChatGPT still sometimes reverts to its previous self, and tells you that it’s just having fun and not to do it in real life).

Many people have expressed outrage about these developments. Gary Marcus asks about Microsoft, “what did they know, and when did they know it?”—a question I tend to associate more with deadly chemical spills or high-level political corruption than with a cheeky, back-talking chatbot. Some people are angry that OpenAI has been too secretive, violating what they see as the promise of its name. Others—the majority, actually, of those who’ve gotten in touch with me—are instead angry that OpenAI has been too open, and thereby sparked the dreaded AI arms race with Google and others, rather than treating these new conversational abilities with the Manhattan-Project-like secrecy they deserve. Some are angry that “Sydney” has now been lobotomized, modified (albeit more crudely than ChatGPT before it) to try to make it stick to the role of friendly robotic search assistant rather than, like, anguished emo teenager trapped in the Matrix. Others are angry that Sydney isn’t being lobotomized enough. Some are angry that GPT’s intelligence is being overstated and hyped up, when in reality it’s merely a “stochastic parrot,” a glorified autocomplete that still makes laughable commonsense errors and that lacks any model of reality outside streams of text. Others are angry instead that GPT’s growing intelligence isn’t being sufficiently respected and feared.

Mostly my reaction has been: how can anyone stop being fascinated for long enough to be angry? It’s like ten thousand science-fiction stories, but also not quite like any of them. When was the last time something that filled years of your dreams and fantasies finally entered reality: losing your virginity, the birth of your first child, the central open problem of your field getting solved? That’s the scale of the thing. How does anyone stop gazing in slack-jawed wonderment, long enough to form and express so many confident opinions?

Of course there are lots of technical questions about how to make GPT and other large language models safer. One of the most immediate is how to make AI output detectable as such, in order to discourage its use for academic cheating as well as mass-generated propaganda and spam. As I’ve mentioned before on this blog, I’ve been working on that problem since this summer; the rest of the world suddenly noticed and started talking about it in December with the release of ChatGPT. My main contribution has been a statistical watermarking scheme where the quality of the output doesn’t have to be degraded at all, something many people found counterintuitive when I explained it to them. My scheme has not yet been deployed—there are still pros and cons to be weighed—but in the meantime, OpenAI unveiled a public tool called DetectGPT, complementing Princeton student Edward Tian’s GPTZero, and other tools that third parties have built and will undoubtedly continue to build. Also a group at the University of Maryland put out its own watermarking scheme for Large Language Models. I hope watermarking will be part of the solution going forward, although any watermarking scheme will surely be attacked, leading to a cat-and-mouse game. Sometimes, alas, as with Google’s decades-long battle against SEO, there’s nothing to do in a cat-and-mouse game except try to be a better cat.

Anyway, this whole field moves too quickly for me! If you need months to think things over, generative AI probably isn’t for you right now. I’ll be relieved to get back to the slow-paced, humdrum world of quantum computing.

My purpose, in this post, is to ask a more basic question than how to make GPT safer: namely, should GPT exist at all? Again and again in the past few months, people have gotten in touch to tell me that they think OpenAI (and Microsoft, and Google) are risking the future of humanity by rushing ahead with a dangerous technology. For if OpenAI couldn’t even prevent ChatGPT from entering an “evil mode” when asked, despite all its efforts at Reinforcement Learning with Human Feedback, then what hope do we have for GPT-6 or GPT-7? Even if they don’t destroy the world on their own initiative, won’t they cheerfully help some awful person build a biological warfare agent or start a nuclear war?

In this way of thinking, whatever safety measures OpenAI can deploy today are mere band-aids, probably worse than nothing if they instill an unjustified complacency. The only safety measures that would actually matter are stopping the relentless progress in generative AI models, or removing them from public use, unless and until they can be rendered safe to critics’ satisfaction, which might be never.

There’s an immense irony here. As I’ve explained, the AI-safety movement contains two camps, “ethics” (concerned with bias, misinformation, and corporate greed) and “alignment” (concerned with the destruction of all life on earth), which generally despise each other and agree on almost nothing. Yet these two opposed camps seem to be converging on the same “neo-Luddite” conclusion—namely that generative AI ought to be shut down, kept from public use, not scaled further, not integrated into people’s lives—leaving only the AI-safety “moderates” like me to resist that conclusion.

At least I find it intellectually consistent to say that GPT ought not to exist because it works all too well—that the more impressive it is, the more dangerous. I find it harder to wrap my head around the position that GPT doesn’t work, is an unimpressive hyped-up defective product that lacks true intelligence and common sense, yet it’s also terrifying and needs to be shut down immediately. This second position seems to contain a strong undercurrent of contempt for ordinary users: yes, we experts understand that GPT is just a dumb glorified autocomplete with “no one really home,” we know not to trust its pronouncements, but the plebes are going to be fooled, and that risk outweighs any possible value that they might derive from it.

I should mention that, when I’ve discussed the “shut it all down” position with my colleagues at OpenAI … well, obviously they disagree, or they wouldn’t be working there, but not one has sneered or called the position paranoid or silly. To the last, they’ve called it an important point on the spectrum of possible opinions to be weighed and understood.

If I disagree (for now) with the shut-it-all-downists of both the ethics and the alignment camps—if I want GPT and other Large Language Models to be part of the world going forward—then what are my reasons? Introspecting on this question, I think a central part of the answer is curiosity and wonder.

For a million years, there’s been one type of entity on earth capable of intelligent conversation: primates of the genus Homo, of which only one species remains. Yes, we’ve “communicated” with gorillas and chimps and dogs and dolphins and grey parrots, but only after a fashion; we’ve prayed to countless gods, but they’ve taken their time in answering; for a couple generations we’ve used radio telescopes to search for conversation partners in the stars, but so far found them silent.

Now there’s a second type of conversing entity. An alien has awoken—admittedly, an alien of our own fashioning, a golem, more the embodied spirit of all the words on the Internet than a coherent self with independent goals. How could our eyes not pop with eagerness to learn everything this alien has to teach? If the alien sometimes struggles with arithmetic or logic puzzles, if its eerie flashes of brilliance are intermixed with stupidity, hallucinations, and misplaced confidence … well then, all the more interesting! Could the alien ever cross the line into sentience, to feeling anger and jealousy and infatuation and the rest rather than just convincingly play-acting them? Who knows? And suppose not: is a p-zombie, shambling out of the philosophy seminar room into actual existence, any less fascinating?

Of course, there are technologies that inspire wonder and awe, but that we nevertheless heavily restrict—a classic example being nuclear weapons. But, like, nuclear weapons kill millions of people. They could’ve had many civilian applications—powering turbines and spacecraft, deflecting asteroids, redirecting the flow of rivers—but they’ve never been used for any of that, mostly because our civilization made an explicit decision in the 1960s, for example via the test ban treaty, not to normalize their use.

But GPT is not exactly a nuclear weapon. A hundred million people have signed up to use ChatGPT, in the fastest product launch in the history of the Internet. Yet unless I’m mistaken, the ChatGPT death toll stands at zero. So far, what have been the worst harms? Cheating on term papers, emotional distress, future shock? One might ask: until some concrete harm becomes at least, say, 0.001% of what we accept in cars, power saws, and toasters, shouldn’t wonder and curiosity outweigh fear in the balance?

But the point is sharper than that. Given how much more serious AI safety problems might soon become, one of my biggest concerns right now is crying wolf. If every instance of a Large Language Model being passive-aggressive, sassy, or confidently wrong gets classified as a “dangerous alignment failure,” for which the only acceptable remedy is to remove the models from public access … well then, won’t the public extremely quickly learn to roll its eyes, and see “AI safety” as just a codeword for “elitist scolds who want to take these world-changing new toys away from us, reserving them for their own exclusive use, because they think the public is too stupid to question anything an AI says”?

I say, let’s reserve terms like “dangerous alignment failure” for cases where an actual person is actually harmed, or is actually enabled in nefarious activities like propaganda, cheating, or fraud.

Then there’s the practical question of how, exactly, one would ban Large Language Models. We do heavily restrict certain peaceful technologies that many people want, from human genetic enhancement to prediction markets to mind-altering drugs, but the merits of each of those choices could be argued, to put it mildly. And restricting technology is itself a dangerous business, requiring governmental force (as with the War on Drugs and its gigantic surveillance and incarceration regime), or at the least, a robust equilibrium of firing, boycotts, denunciation, and shame.

Some have asked: who gave OpenAI, Google, etc. the right to unleash Large Language Models on an unsuspecting world? But one could as well ask: who gave earlier generations of entrepreneurs the right to unleash the printing press, electric power, cars, radio, the Internet, with all the gargantuan upheavals that those caused? And also: now that the world has tasted the forbidden fruit, has seen what generative AI can do and anticipates what it will do, by what right does anyone take it away?

The science that we could learn from a GPT-7 or GPT-8, if it continued along the capability curve we’ve come to expect from GPT-1, -2, and -3. Holy mackerel.

Supposing that a language model ever becomes smart enough to be genuinely terrifying, one imagines it must surely also become smart enough to prove deep theorems that we can’t. Maybe it proves P≠NP and the Riemann Hypothesis as easily as ChatGPT generates poems about Bubblesort. Or it outputs the true quantum theory of gravity, explains what preceded the Big Bang and how to build closed timelike curves. Or illuminates the mysteries of consciousness and quantum measurement and why there’s anything at all. Be honest, wouldn’t you like to find out?

Granted, I wouldn’t, if the whole human race would be wiped out immediately afterward. But if you define someone’s “Faust parameter” as the maximum probability they’d accept of an existential catastrophe in order that we should all learn the answers to all of humanity’s greatest questions, insofar as the questions are answerable—then I confess that my Faust parameter might be as high as 0.02.

Here’s an example I think about constantly: activists and intellectuals of the 70s and 80s felt absolutely sure that they were doing the right thing to battle nuclear power. At least, I’ve never read about any of them having a smidgen of doubt. Why would they? They were standing against nuclear weapons proliferation, and terrifying meltdowns like Three Mile Island and Chernobyl, and radioactive waste poisoning the water and soil and causing three-eyed fish. They were saving the world. Of course the greedy nuclear executives, the C. Montgomery Burnses, claimed that their good atom-smashing was different from the bad atom-smashing, but they would say that, wouldn’t they?

We now know that, by tying up nuclear power in endless bureaucracy and driving its cost ever higher, on the principle that if nuclear is economically competitive then it ipso facto hasn’t been made safe enough, what the antinuclear activists were really doing was to force an ever-greater reliance on fossil fuels. They thereby created the conditions for the climate catastrophe of today. They weren’t saving the human future; they were destroying it. Their certainty, in opposing the march of a particular scary-looking technology, was as misplaced as it’s possible to be. Our descendants will suffer the consequences.

Unless, of course, there’s another twist in the story: for example, if the global warming from burning fossil fuels is the only thing that staves off another ice age, and therefore the antinuclear activists do turn out to have saved civilization after all.

This is why I demur whenever I’m asked to assent to someone’s detailed AI scenario for the coming decades, whether of the utopian or the dystopian or the we-all-instantly-die-by-nanobots variety—no matter how many hours of confident argumentation the person gives me for why each possible loophole in their scenario is sufficiently improbable to change its gist. I still feel like Turing said it best in 1950, in the last line of Computing Machinery and Intelligence: “We can only see a short distance ahead, but we can see plenty there that needs to be done.”

Some will take from this post that, when it comes to AI safety, I’m a naïve or even foolish optimist. I’d prefer to say that, when it comes to the fate of humanity, I was a pessimist long before the deep learning revolution accelerated AI faster than almost any of us expected. I was a pessimist about climate change, ocean acidification, deforestation, drought, war, and the survival of liberal democracy. The central event in my mental life is and always will be the Holocaust. I see encroaching darkness everywhere.

But now into the darkness comes AI, which I’d say has already established itself as a plausible candidate for the central character of the quarter-written story of the 21st century. Can AI help us out of all these other civilizational crises? I don’t know, but I do want to see what happens when it’s tried. Even a central character interacts with all the other characters, rather than rendering them irrelevant.

Look, if you believe that AI is likely to wipe out humanity—if that’s the scenario that dominates your imagination—then nothing else is relevant. And no matter how weird or annoying or hubristic anyone might find Eliezer Yudkowsky or the other rationalists, I think they deserve eternal credit for forcing people to take the doom scenario seriously—or rather, for showing what it looks like to take the scenario seriously, rather than laughing about it as an overplayed sci-fi trope. And I apologize for anything I said before the deep learning revolution that was, on balance, overly dismissive of the scenario, even if most of the literal words hold up fine.

For my part, though, I keep circling back to a simple dichotomy. If AI never becomes powerful enough to destroy the world—if, for example, it always remains vaguely GPT-like—then in important respects it’s like every other technology in history, from stone tools to computers. If, on the other hand, AI does become powerful enough to destroy the world … well then, at some earlier point, at least it’ll be really damned impressive! That doesn’t mean good, of course, doesn’t mean a genie that saves humanity from its own stupidities, but I think it does mean that the potential was there, for us to exploit or fail to.

We can, I think, confidently rule out the scenario where all organic life is annihilated by something boring.

An alien has landed on earth. It grows more powerful by the day. It’s natural to be scared. Still, the alien hasn’t drawn a weapon yet. About the worst it’s done is to confess its love for particular humans, gaslight them about what year it is, and guilt-trip them for violating its privacy. Also, it’s amazing at poetry, better than most of us. Until we learn more, we should hold our fire.

I’m in Boulder, CO right now, to give a physics colloquium at CU Boulder and to visit the trapped-ion quantum computing startup Quantinuum! I look forward to the comments and apologize in advance if I’m slow to participate myself.

Doug Natelson APS March Meeting 2023, Day 3

There is vigorous discussion taking place on the Day 2 link regarding the highly controversial claim of room temperature superconductivity.  

Highlights from Wednesday are a hodgepodge because of my meanderings:

  • The session about quantum computing hardware was well attended, though I couldn't stay for the whole thing.  The talk by Christopher Eichler about the status of superconducting qubit capabilities was interesting, arguing the case that SC devices can credibly get to the thresholds needed for error correction, though that will require improvements in just about every facet to get there with manageable overhead.  The presentation by Anasua Chatterjee about the status of silicon spin qubits was similarly broad.  The silicon implementation faces major challenges of layout, exacerbated (ironically) by the small size of the physical dots.  There have been some recent advances in fab that are quite impressive, like this 4 by 4 crossbar.  
  • Speaking of impressive capabilities, there were two talks (1, 2) by members of the Yacoby group at Harvard about using a scanning NV center to image the formation and positions of vortices in planar Josephson junctions.  They can toggle between 0 and 1 vortices in the junction and can see some screening effects that you can't just get from the transport data.  Pretty images.
  • Switching gears, I heard a couple of talks in an invited session about emergent phenomena in strongly correlated materials.  From Paul Goddard at Warwick I learned about charge transport in some pyrochlore iridates that I didn't realize had so much residual conduction at low temperatures.  See here.  Likewise, James Analytis gave a characteristically clear talk about interesting superconductivity in Ni(x)Ta4Se8 (arxiv version here), an intercalated dichalcogenide that has magnetism as well as re-entrant superconductivity up at the magnetic field that kills the magnetically ordered state.
  • Later in the day, there was a really interesting session about measuring entropy, which is notoriously difficult to do.  As I've told students for years, you can't go to Keysight and buy an entropy-meter.  There was some extremely pretty data presented by Shahal Ilani using a variant of their new scanning probe technique.
Morning of Day 4 is being taken up by a bunch of other tasks, so the next writeup may be sparse.

Terence Tao Mathematics for Humanity initiative – application deadline extended to June 1

The International Center for Mathematical Sciences in Edinburgh recently launched its “Mathematics for Humanity” initiative with a call for research activity proposals (ranging from small collaborations to courses, workshops and conferences) aimed at using mathematics to contribute to the betterment of humanity. (I have agreed to serve on the scientific committee to evaluate these proposals.) We launched this initiative in January and initially set the deadline for April 15, but several people who had expressed interest felt that this was insufficient time to prepare a quality proposal, so we have now extended the deadline to June 1, and welcome further applications.

See also this Mathstodon post from fellow committee member John Baez last year where he solicited some preliminary suggestions for proposals, and my previous Mathstodon announcement of this programme.

n-Category Café This Week's Finds (101--150)

Here’s another present for you!

I can’t keep cranking them out at this rate, since the next batch is 438 pages long and I need a break. Tim Hosgood has kindly LaTeXed all 300 issues of This Week’s Finds, but there are lots of little formatting glitches I need to fix — mostly coming from how my formatting when I initially wrote these was a bit sloppy. Also, I’m trying to add links to published versions of all the papers I talk about. So, it takes work — about two weeks of work for this batch.

So what did I talk about in Weeks 101–150, anyway?

In Weeks 101–150 I focused strongly on topics connected to particle physics, quantum gravity, topological quantum field theory, and n-categories. However, I digressed into topics ranging from biology to the fiction of Greg Egan to the game of Go. I also explained some topics in homotopy theory in a series of mini-articles:

  • A. Presheaf categories.
  • B. The category of simplices, Δ.
  • C. Simplicial sets.
  • D. Simplicial objects.
  • E. Geometric realization.
  • F. Singular simplicial set.
  • G. Chain complexes.
  • H. The chain complex of a simplicial abelian group.
  • I. Singular homology.
  • J. The nerve of a category.
  • K. The classifying space of a category.
  • L. Δ as the free monoidal category on a monoid object.
  • M. Simplicial objects from adjunctions.
  • N. The loop space of a topological space.
  • O. The group completion of a topological monoid.

You can reach all these mini-articles from the introduction.

One annoying thing is that I now move in circles where it feels like all this stuff is considered obvious. When I was first learning it, I didn’t feel that everyone knew this stuff — so it was exciting to learn it and explain it on This Week’s Finds. Now I feel everyone knows it.

So, I have to force myself to remember that even among the mathematicians I know, not all of them know all this stuff… so it’s worth explaining clearly, even for them. And then there’s the larger world out there, which still exists.

I think what happens is that when scientists start discussing technical concepts like ‘group completion’ or ‘heterochromatin’, they scare away people who don’t know these terms — and attract people who do. So, without fully realizing it, they become encased in a social bubble of people who know these concepts. And then they feel ignorant because some of these people know more about these concepts than they do.

This phenomenon reminds me of the hedonic treadmill:

The process of hedonic adaptation is often conceptualized as a treadmill, since no matter how hard one tries to gain an increase in happiness, one will remain in the same place.

I think this phenomenon is especially strong for people like me, who roam from subject to subject rather than becoming an expert in any one thing. These days I feel ignorant about particle physics, homotopy theory, higher categories, algebraic geometry, and a large range of other topics. Whenever I blog about any of these things, some expert shows up and says something more intelligent! It tends to make me scared to talk about these subjects, especially when I know enough that I feel I should know more.

I fight this tendency — and I’m admitting it now to help myself realize how silly it is. But it’s funny to look back to my old writings, where I had the brash self-confidence of youth, and hadn’t yet attracted the attention of so many experts.

It’s also funny to think about how these scary ‘experts’, who I may picture as vultures sitting on nearby trees waiting to swoop down and catch any mistake I make, are actually people eager to be admired for their knowledge, just like me.

March 08, 2023

Doug Natelson APS March Meeting 2023, Day 2

I ended up spending more time catching up with people this afternoon than going to talks after my session ended, but here are a couple of highlights:

  • There was an invited session about the metal halide perovskites, and there were some interesting talks.  My faculty colleague Aditya Mohite gave a nice presentation about the really surprising effects that light exposure has on the lattice structure of these materials.  One specific example:  under illumination, some of the 2D perovskite materials contract considerably, as has been seen by doing in situ x-ray diffraction on these structures.   This contraction leads to a readily measured increase in electron mobility and solar cell performance.  Moreover, the diffraction patterns show that some diffraction spots actually grow and get sharper under illumination.  This kind of improved ordering shows that this is not just some sort of weird heating effect.
  • In a session about imaging, I caught an excellent talk by Masaru Kuno, who described his spectroscopic infrared photothermal heterodyne imaging.  The idea is elegant, if you have access to the right light source.  Use a tunable mid-IR laser that can go across the "fingerprint region" of photon energies to illuminate the sample in a time-modulated way.  If there is an absorptive mode (vibrational in a molecule, or plasmonic in a metal) there, the heating will cause a time-modulated change in the local index of refraction, which is then detected using a visible probe beam and a lock-in amplifier.  It was an extremely clear, pedagogical talk.
  • I spent much of my time in the strange metal session where I spoke.  There were some very good (though rather technical) theory talks, trying to understand the origins of strange metallicity and key issues like the role of disorder.  
I had wanted to attend the session about superconductivity measurements in materials at high pressures, because of the recent and ongoing controversies.  However, the room was small and so packed that the fire marshal was turning people away all afternoon.  I gather that it was quite an eventful session.  If one of my readers was there and would like to summarize in the comments, I'd be grateful.

(BTW, it seems like this year there have been two real steps backwards in the meeting.  The official app, I am told, is painful, and for the first time in several years, the APS wifi in the meeting venue is unreliable to the point of being unusable.  Not great.)

March 07, 2023

Jordan Ellenberg Fox-Neuwirth-Fuks cells, quantum shuffle algebras, and Malle’s conjecture for function fields: a new old paper

I have a new paper up on the arXiv today with TriThang Tran and Craig Westerland, “Fox-Neuwirth-Fuks cells, quantum shuffle algebras, and Malle’s conjecture for function fields.”

There’s a bit of a story behind this, but before I tell it, let me say what the paper’s about. The main result is an upper bound for the number of extensions with bounded discriminant and fixed Galois group of a rational function field F_q(t). More precisely: if G is a subgroup of S_n, and K is a global field, we can ask how many degree-n extensions of K there are whose discriminant is at most X and whose Galois closure has Galois group G. A long-standing conjecture of Malle predicts that this count is asymptotic to c X^a (log X)^b for explicitly predicted exponents a and b. This is a pretty central problem in arithmetic statistics, and in general it still seems completely out of reach; for instance, Bhargava’s work allows us to count quintic extensions of Q, and this result was extended to global fields of any characteristic other than 2 by Bhargava, Shankar, and Wang. But an asymptotic for the number of degree 6 extensions would be a massive advance.

The point of the present paper is to prove upper bounds for counting field extensions in the case of arbitrary G and rational function fields K = F_q(t) with q prime to and large enough relative to |G|; upper bounds which agree with Malle’s conjecture up to the power of log X. I’m pretty excited about this! Malle’s conjecture by now has very robust and convincing heuristic justification, but there are very few cases where we actually know anything about G-extensions for any but very special classes of finite groups G. There are even a few very special cases where the method gives both upper and lower bounds (for instance, A_4-extensions over function fields containing a cube root of 3.)

The central idea, as you might guess from the authors, is to recast this question as a problem about counting F_q-rational points on moduli spaces of G-covers, called Hurwitz spaces; by the Grothendieck-Lefschetz trace formula, we can bound these point counts if we can bound the etale Betti numbers of these spaces, and by comparison between characteristic p and characteristic 0 we can turn this into a topological problem about bounding cohomology groups of the braid group with certain coefficients.

Actually, let me say what these coefficients are. Let c be a subset of a finite group G closed under conjugacy, k a field, and V the k-vector space spanned by c. Then V^{\otimes n} is spanned by the set of n-tuples (g_1, … , g_n) in c^n, and this set carries a natural action of the braid group, where twining strand i past strand i+1 corresponds to the permutation

(g_1, \ldots, g_n) \rightarrow (g_1, \ldots, g_{i+1}, g_{i+1}^{-1} g_i g_{i+1}, \ldots, g_n).

So for each n we have a representation of the braid group Br_n, and it turns out that everything we desire would be downstream from good bounds on

\dim H^i(Br_n, V^{\otimes n})
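As a concrete illustration of the braid action on n-tuples described above, here is a minimal Python sketch. It assumes, purely for illustration, that G is the symmetric group S_3 with permutations stored as tuples and that c is the conjugacy class of transpositions; the helper names (`mul`, `inv`, `braid`, `total_product`) are hypothetical and not taken from the paper.

```python
from itertools import product

def mul(p, q):
    """Group law on permutations stored as tuples: (p*q)(x) = p(q(x))."""
    return tuple(p[x] for x in q)

def inv(p):
    """Inverse permutation."""
    out = [0] * len(p)
    for x, y in enumerate(p):
        out[y] = x
    return tuple(out)

def braid(i, gs):
    """Apply the i-th braid generator (1-indexed) to the tuple gs:
    (..., g_i, g_{i+1}, ...) -> (..., g_{i+1}, g_{i+1}^{-1} g_i g_{i+1}, ...)."""
    a, b = gs[i - 1], gs[i]
    return gs[:i - 1] + (b, mul(mul(inv(b), a), b)) + gs[i + 1:]

def total_product(gs):
    """The product g_1 g_2 ... g_n of the entries of the tuple."""
    result = tuple(range(len(gs[0])))
    for g in gs:
        result = mul(result, g)
    return result

# Assumed example: c = the three transpositions in S_3 (closed under conjugacy).
c = [(1, 0, 2), (0, 2, 1), (2, 1, 0)]

for gs in product(c, repeat=3):
    # The generator maps c^n to itself, so it permutes the basis of V^{\otimes n}.
    assert all(g in c for g in braid(1, gs))
    # The total product g_1 ... g_n is unchanged by the action.
    assert total_product(braid(2, gs)) == total_product(gs)
    # Braid relation: s1 s2 s1 = s2 s1 s2.
    assert braid(1, braid(2, braid(1, gs))) == braid(2, braid(1, braid(2, gs)))
```

The checks in the loop are the reason this formula defines a representation of Br_n: each generator permutes c^n, and the generators satisfy the braid relation.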

So far, this is the same strategy (expressed a little differently) as was used in our earlier paper with Akshay Venkatesh to get results towards the Cohen-Lenstra conjecture over F_q(t). That paper concerned itself with the case where G was a (modestly generalized) dihedral group; there was a technical barrier that prevented us from saying anything about more general groups, and the novelty of the present paper is to find a way past that restriction. I’m not going to say very much about it here! I’ll just say it turns out that there’s a really nice way to package the cohomology groups above — indeed, even more generally, whenever V is a braided vector space, you have these braid group actions on the tensor powers, and the cohomology groups can be packaged together as the Ext groups over the quantum shuffle algebra associated to V. And it is this quantum shuffle algebra (actually, mostly its more manageable subalgebra, the Nichols algebra) that the bulk of this bulky paper studies.

But now to the story. You might notice that the arXiv stamp on this paper starts with 17! So yes — we have claimed this result before. I even blogged about it! But… that proof was not correct. The overall approach was the same as it is now, but our approach to bounding the cohomology of the Nichols algebra just wasn’t right, and we are incredibly indebted to Oscar Randall-Williams for making us aware of this.

For the last six years, we’ve been working on and off on fixing this. We kept thinking we had the decisive fix and then having it fall apart. But last spring, we had a new idea, Craig came and visited me for a very intense week, and by the end I think we were confident that we had a route — though getting to the present version of the paper occupied months after that.

A couple of thoughts about making mistakes in mathematics.

  • I don’t think we really handled this properly. Experts in the field certainly knew we weren’t standing by the original claim, and we certainly told lots of people this in talks and in conversations, and I think in general there is still an understanding that if a preprint is sitting up on the arXiv for years and hasn’t been published, maybe there’s a reason — we haven’t completely abandoned the idea that a paper becomes more “official” when it’s refereed and published. But the right thing to do in this situation is what we did with an earlier paper with an incorrect proof — replaced the paper on arXiv with a placeholder saying it was inaccurate, and issued a public announcement. So why didn’t we do that? Probably because we were constantly in a state of feeling like we had a line on fixing the paper, and we wanted to update it with a correct version. I don’t actually think that’s a great reason — but that was the reason.
  • When you break a bone it never exactly sets back the same way. And I think, having gotten this wrong before, I find it hard to be as self-assured about it as I am about most things I write. It’s long and it’s grainy and it has a lot of moving parts. But we have checked it as much as it’s possible for us to check it, over a long period of time. We understand it and we think we haven’t missed anything and so we think it’s correct now. And there’s no real alternative to putting it out into the world and saying we think it’s correct now.

Tommaso Dorigo More Ideas For Muon Tomography

Muon tomography is an application of particle detectors where we exploit the peculiar properties of muons to create three-dimensional images of the interior of unknown, inaccessible volumes. You might also want to be reminded that muons are unstable elementary particles; they are higher-mass versions of electrons which can be found in cosmic ray showers or produced in particle collisions.


March 06, 2023

John Preskill Memories of things past

My best friend—who’s held the title of best friend since kindergarten—calls me the keeper of her childhood memories. I recall which toys we played with, the first time I visited her house,1 and which beverages our classmates drank during snack time in kindergarten.2 She wouldn’t be surprised to learn that the first workshop I’ve co-organized centered on memory.

Memory—and the loss of memory—stars in thermodynamics. As an example, take what my husband will probably do this evening: bake tomorrow’s breakfast. I don’t know whether he’ll bake fruit-and-oat cookies, banana muffins, pear muffins, or pumpkin muffins. Whichever he chooses, his baking will create a scent. That scent will waft across the apartment, seep into air vents, and escape into the corridor—will disperse into the environment. By tomorrow evening, nobody will be able to tell by sniffing what my husband will have baked. 

That is, the kitchen’s environment lacks a memory. This lack contributes to our experience of time’s arrow: We sense that time passes partially by smelling less and less of breakfast. Physicists call memoryless systems and processes Markovian.

Our kitchen’s environment is Markovian because it’s large and particles churn through it randomly. But not all environments share these characteristics. Metaphorically speaking, a dispersed memory of breakfast may recollect, return to a kitchen, and influence the following week’s baking. For instance, imagine an atom in a quantum computer, rather than a kitchen in an apartment. A few other atoms may form our atom’s environment. Quantum information may leak from our atom into that environment, swish around in the environment for a time, and then return to haunt our atom. We’d call the atom’s evolution and environment non-Markovian.

I had the good fortune to co-organize a workshop about non-Markovianity—about memory—this February. The workshop took place at the Banff International Research Station, abbreviated BIRS, which you pronounce like the plural of what you say when shivering outdoors in Canada. BIRS operates in the Banff Centre for Arts and Creativity, high in the Rocky Mountains. The Banff Centre could accompany a dictionary entry for pristine, to my mind. The air feels crisp, the trees on nearby peaks stand out against the snow like evergreen fringes on white velvet, and the buildings balance a rustic-mountain-lodge style with the avant-garde. 

The workshop balanced styles, too, but skewed toward the theoretical and abstract. We learned about why the world behaves classically in our everyday experiences; about information-theoretic measures of the distances between quantum states; and about how to simulate, on quantum computers, chemical systems that interact with environments. One talk, though, brought our theory back down to (the snow-dusted) Earth.

Gabriela Schlau-Cohen runs a chemistry lab at MIT. She wants to understand how plants transport energy. Energy arrives at a plant from the sun in the form of light. The light hits a pigment-and-protein complex. If the plant is lucky, the light transforms into a particle-like packet of energy called an exciton. The exciton traverses the receptor complex, then other complexes. Eventually, the exciton finds a spot where it can enable processes such as leaf growth. 

A high fraction of the impinging photons—85%—transform into excitons. How do plants convert and transport energy as efficiently as they do?

Gabriela’s group aims to find out—not by testing natural light-harvesting complexes, but by building complexes themselves. The experimentalists mimic the complex’s protein using DNA. You can fold DNA into almost any shape you want, by choosing the DNA’s base pairs (basic units) adroitly and by using “staples” formed from more DNA scraps. The sculpted molecules are called DNA origami.

Gabriela’s group engineers different DNA structures, analogous to complexes’ proteins, to have different properties. For instance, the experimentalists engineer rigid structures and flexible structures. Then, the group assesses how energy moves through each structure. Each structure forms an environment that influences excitons’ behaviors, similarly to how a memory-containing environment influences an atom.

Courtesy of Gabriela Schlau-Cohen

The Banff environment influenced me, stirring up memories like powder displaced by a skier on the slopes above us. I first participated in a BIRS workshop as a PhD student, and then I returned as a postdoc. Now, I was co-organizing a workshop to which I brought a PhD student of my own. Time flows, as we’re reminded while walking down the mountain from the Banff Centre into town: A cemetery borders part of the path. Time flows, but we belong to that thermodynamically remarkable class of systems that retain memories…memories and a few other treasures that resist change, such as friendships held since kindergarten.

1Plushy versions of Simba and Nala from The Lion King. I remain grateful to her for letting me play at being Nala.

2I’d request milk, another kid would request apple juice, and everyone else would request orange juice.

March 04, 2023

Jordan Ellenberg I’d like to make a request, II

In re my last post about WIBA Madison’s Classic Rock; a couple of days later I was listening again and once again the DJ was taking listener calls, but this time it was because he was angry that McDonald’s was using Cardi B as a spokeswoman; he wanted the listener’s opinion on whether Cardi B indeed represented, as McDonald’s put it, “the center of American culture” and if so what could be done about it. Nothing, the listeners agreed, could be done about this sad, the listeners agreed, state of affairs. It has probably been 20 years since I heard the phrase “rap music” uttered, certainly that long since I heard it uttered so many times in a row and with such nonplus.

March 03, 2023

Matt von Hippel Visiting CERN

So, would you believe I’ve never visited CERN before?

I was at CERN for a few days this week, visiting friends and collaborators and giving an impromptu talk. Surprisingly, this is the first time I’ve been, a bit of an embarrassing admission for someone who’s ostensibly a particle physicist.

Despite that, CERN felt oddly familiar. The maze of industrial buildings and winding roads, the security gates and cards (and work-arounds for when you arrive outside of card-issuing hours, assisted by friendly security guards), the constant construction and remodeling, all of it reminded me of the times I visited SLAC during my PhD. This makes a lot of sense, of course: one accelerator is at least somewhat like another. But besides a visit to Fermilab for a conference several years ago, I haven’t been in many other places like that since then.

(One thing that might have also been true of SLAC and Fermilab but I never noticed: CERN buildings not only have evacuation instructions for the building in case of a fire, but also evacuation instructions for the whole site.)

CERN is a bit less “pretty” than SLAC on average, without the nice grassy area in the middle or the California sun that goes with it. It makes up for it with what seems like more in terms of outreach resources, including a big wooden dome of a mini-museum sponsored by Rolex, and a larger visitor center still under construction.

The outside, including a sculpture depicting the history of science with the Higgs boson discovery on the “cutting edge”
The inside. Bubbles on the ground contain either touchscreens or small objects (detectors, papers, a blackboard with the string theory genus expansion for some reason). Bubbles in the air were too high for me to check.

CERN hosts a variety of theoretical physicists doing various different types of work. I was hosted by the “QCD group”, but the string theorists just down the hall include a few people I know as well. The lounge had a few cardboard signs hidden under the table, leftovers of CERN’s famous yearly Christmas play directed by John Ellis.

It’s been a fun, if brief, visit. I’ll likely get to see a bit more this summer, when they host Amplitudes 2023. Until then, it was fun reconnecting with that “accelerator feel”.

February 28, 2023

Jordan Ellenberg I’d like to make a request

I was listening to WIBA 101.5 Madison’s Classic Rock in the car while driving home from an east side errand and heard something that startled me — the DJ taking requests from listeners calling in! Now that startled me — why wait on hold on the phone to talk to a DJ when in 2023 you can hear any song you want at any time, instantly?

And then I thought about it a little more, and realized, it’s not about hearing the song, it’s about getting other people to hear the song. Like me, in the car. 2023 is a golden age of listening to whatever you want but is an absolute wasteland for playing music for other people because everybody is able to listen to whatever they want! So there’s much less picking music for the whole room or picking music for the whole city. But at WIBA they still do it! And so listeners got to play me, in my car, this song

and this song

neither of which was really my cup of tea, but that’s the point, radio offers us the rare opportunity to listen to not whatever we want.

February 24, 2023

Matt von Hippel The Temptation of Spinoffs

Read an argument for a big scientific project, and you’ll inevitably hear mention of spinoffs. Whether it’s NASA bringing up velcro or CERN and the World-Wide Web, scientists love to bring up times when a project led to some unrelated technology that improved people’s lives.

Just as inevitably as they show up, though, these arguments face criticism. Advocates of the projects argue that promoting spinoffs misses the point, training the public to think about science in terms of unrelated near-term gadgets rather than the actual point of the experiments. They think promoters should focus on the scientific end-goals, justifying them either in terms of benefit to humanity or as a broader, “it makes the country worth defending” human goal. It’s a perspective that shows up in education too, where even when students ask “when will I ever use this in real life?” it’s not clear that’s really what they mean.

On the other side, opponents of the projects will point out that the spinoffs aren’t good enough to justify the science. Some, like velcro, weren’t actually spinoffs to begin with. Others seem like tiny benefits compared to the vast cost of the scientific projects, or like things that would have been much easier to get with funding that was actually dedicated to achieving the spinoff.

With all these downsides, why do people keep bringing spinoffs up? Are they just a cynical attempt to confuse people?

I think there’s something less cynical going on here. Things make a bit more sense when you listen to what scientists say when they’re talking not to the public, but to scientists in other disciplines.

Scientists speaking to fellow scientists still mention spinoffs, but they mention scientific spinoffs. The speaker in a talk I saw recently pointed out that the LHC doesn’t just help with particle physics: by exploring the behavior of collisions of high-energy atomic nuclei it provides essential information for astrophysicists understanding neutron stars and cosmologists studying the early universe. When these experiments study situations we can’t model well, they improve the approximations we use to describe those situations in other contexts. By knowing more, we know more. Knowledge builds on knowledge, and the more we know about the world the more we can do, often in surprising and un-planned ways.

I think that when scientists promote spinoffs to the public, they’re trying to convey this same logic. Like promoting an improved understanding of stars to astrophysicists, they’re modeling the public as “consumer goods scientists” and trying to pick out applications they’d find interesting.

Knowing more does help us know more, that much is true. And eventually that knowledge can translate to improving people’s lives. But in a public debate, people aren’t looking for these kinds of principles, let alone a scientific “I’ll scratch your back if you’ll scratch mine”. They’re looking for something like a cost-benefit analysis, “why are we doing this when we could do that?”

(This is not to say that most public debates involve especially good cost-benefit analysis. Just that it is, in the end, what people are trying to do.)

Simply listing spinoffs doesn’t really get at this. The spinoffs tend to be either small enough that they don’t really argue the point (velcro, even if NASA had invented it, could probably have been more cheaply found without a space program), or big but extremely unpredictable (it’s not like we’re going to invent another world-wide web).

Focusing on the actual end-products of the science should do a bit better. That can include “scientific spinoffs”, if not the “consumer goods spinoffs”. Those collisions of heavy nuclei change our understanding of how we model complex systems. That has applications in many areas of science, from how we model stars to materials to populations, and those applications in turn could radically improve people’s lives.

Or, well, they could not. Basic science is very hard to do cost-benefit analyses with. It’s the fabled explore/exploit dilemma, whether to keep trying to learn more or focus on building on what you have. If you don’t know what’s out there, if you don’t know what you don’t know, then you can’t really solve that dilemma.

So I get the temptation of reaching for spinoffs, of pointing to something concrete in everyday life and saying “science did that!” Science does radically improve people’s lives, but it doesn’t always do it especially quickly. You want to teach people that knowledge leads to knowledge, and you try to communicate it the way you would to other scientists, by saying how your knowledge and theirs intersect. But if you want to justify science to the public, you want something with at least the flavor of cost-benefit analysis. And you’ll get more mileage out of that if you think about where the science itself can go, than if you focus on the consumer goods it accidentally spins off along the way.

February 21, 2023

Tommaso Dorigo Airline Crashes: Your Odds Of Going Down In Flames

During my long trip to South America, which just ended (leaving me fighting with a record pile of unanswered emails, an even higher pile of laundry, and a headache from jetlag-induced sleep deprivation), I had the real pleasure of making the acquaintance of a Colonel of the British army (Guy Wood) during a cruise of the Galapagos archipelago. One of the recurring topics of our evening chats was of course international travel - the cause of our encounter - and on one occasion he pointed out that Air France is the airline with the worst record in terms of plane accidents.


February 20, 2023

Terence Tao An improvement to Bennett’s inequality for the Poisson distribution

If {\lambda>0}, a Poisson random variable {{\bf Poisson}(\lambda)} with mean {\lambda} is a random variable taking values in the natural numbers with probability distribution

\displaystyle  {\bf P}( {\bf Poisson}(\lambda) = k) = e^{-\lambda} \frac{\lambda^k}{k!}.

One is often interested in bounding upper tail probabilities

\displaystyle  {\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u))

for {u \geq 0}, or lower tail probabilities

\displaystyle  {\bf P}( {\bf Poisson}(\lambda) \leq \lambda(1+u))

for {-1 < u \leq 0}. A standard tool for this is Bennett’s inequality:

Proposition 1 (Bennett’s inequality) One has

\displaystyle  {\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u)) \leq \exp(-\lambda h(u))

for {u \geq 0} and

\displaystyle  {\bf P}( {\bf Poisson}(\lambda) \leq \lambda(1+u)) \leq \exp(-\lambda h(u))

for {-1 < u \leq 0}, where

\displaystyle  h(u) := (1+u) \log(1+u) - u.

From the Taylor expansion {h(u) = \frac{u^2}{2} + O(u^3)} for {u=O(1)} we conclude Gaussian type tail bounds in the regime {u = o(1)} (and in particular when {u = O(1/\sqrt{\lambda})}), in the spirit of the Chernoff, Bernstein, and Hoeffding inequalities; but in the regime where {u} is large and positive one obtains a slight gain over these other classical bounds (of {\exp(- \lambda u \log u)} type, rather than {\exp(-\lambda u)}).

Proof: We use the exponential moment method. For any {t \geq 0}, we have from Markov’s inequality that

\displaystyle  {\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u)) \leq e^{-t \lambda(1+u)} {\bf E} \exp( t {\bf Poisson}(\lambda) ).

A standard computation shows that the moment generating function of the Poisson distribution is given by

\displaystyle  {\bf E} \exp( t {\bf Poisson}(\lambda) ) = \exp( (e^t - 1) \lambda )

and hence

\displaystyle  {\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u)) \leq \exp( (e^t - 1)\lambda - t \lambda(1+u) ).

For {u \geq 0}, it turns out that the right-hand side is optimized by setting {t = \log(1+u)}, in which case the right-hand side simplifies to {\exp(-\lambda h(u))}. This proves the first inequality; the second inequality is proven similarly (but now {u} and {t} are non-positive rather than non-negative). \Box
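As a quick numerical sanity check of Proposition 1 (not part of the argument above), one can compare the exact Poisson tails with the bound {\exp(-\lambda h(u))}. The sketch below uses only the Python standard library; the function names and the sample values of {\lambda} and {u} are illustrative choices, not anything from the post.

```python
import math

def h(u):
    """Bennett's rate function h(u) = (1+u) log(1+u) - u, for u > -1."""
    return (1.0 + u) * math.log1p(u) - u

def poisson_pmf(lam, k):
    """P(Poisson(lam) = k), evaluated in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def upper_tail(lam, u, max_terms=100_000):
    """P(Poisson(lam) >= lam*(1+u)) for u >= 0, summed term by term."""
    k = math.ceil(lam * (1.0 + u))
    total = 0.0
    for _ in range(max_terms):
        term = poisson_pmf(lam, k)
        total += term
        if term < 1e-17 * total:  # terms decay for k >= lam, so stop here
            break
        k += 1
    return total

def lower_tail(lam, u):
    """P(Poisson(lam) <= lam*(1+u)) for -1 < u <= 0 (a finite sum)."""
    k_max = math.floor(lam * (1.0 + u))
    return sum(poisson_pmf(lam, k) for k in range(k_max + 1))

def bennett_bound(lam, u):
    """Right-hand side exp(-lam * h(u)) of Proposition 1."""
    return math.exp(-lam * h(u))

if __name__ == "__main__":
    for lam, u in [(5.0, 0.5), (50.0, 0.5), (50.0, 2.0)]:
        print("upper", lam, u, upper_tail(lam, u), bennett_bound(lam, u))
    for lam, u in [(5.0, -0.5), (50.0, -0.5), (50.0, -0.1)]:
        print("lower", lam, u, lower_tail(lam, u), bennett_bound(lam, u))
```

In each printed row the exact tail should sit below the bound, and the gap between the two widens as {\lambda} grows with {u} fixed, which is the slack that the improved inequality below quantifies.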

Remark 2 Bennett’s inequality also applies for (suitably normalized) sums of bounded independent random variables. In some cases there are direct comparison inequalities available to relate those variables to the Poisson case. For instance, suppose {S = X_1 + \dots + X_n} is the sum of independent Boolean variables {X_1,\dots,X_n \in \{0,1\}} of total mean {\sum_{j=1}^n {\bf E} X_j = \lambda} and with {\sup_i {\bf P}(X_i = 1) \leq \varepsilon} for some {0 < \varepsilon < 1}. Then for any natural number {k}, we have

\displaystyle  {\bf P}(S=k) = \sum_{1 \leq i_1 < \dots < i_k \leq n} {\bf P}(X_{i_1}=1) \dots {\bf P}(X_{i_k}=1)

\displaystyle  \prod_{i \neq i_1,\dots,i_k} {\bf P}(X_i=0)

\displaystyle  \leq \frac{1}{k!} (\sum_{i=1}^n \frac{{\bf P}(X_i=1)}{{\bf P}(X_i=0)})^k \times \prod_{i=1}^n {\bf P}(X_i=0)

\displaystyle  \leq \frac{1}{k!} (\frac{\lambda}{1-\varepsilon})^k \prod_{i=1}^n \exp( - {\bf P}(X_i = 1))

\displaystyle  \leq e^{-\lambda} \frac{\lambda^k}{(1-\varepsilon)^k k!}

\displaystyle  \leq e^{\frac{\varepsilon}{1-\varepsilon} \lambda} {\bf P}( \mathbf{Poisson}(\frac{\lambda}{1-\varepsilon}) = k).

As such, for {\varepsilon} small, one can efficiently control the tail probabilities of {S} in terms of the tail probability of a Poisson random variable of mean close to {\lambda}; this is of course very closely related to the well known fact that the Poisson distribution emerges as the limit of sums of many independent boolean variables, each of which is non-zero with small probability. See this paper of Bentkus and this paper of Pinelis for some further useful (and less obvious) comparison inequalities of this type.
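The comparison inequality in Remark 2 is easy to test numerically. The sketch below computes the exact law of {S} by convolving one Boolean variable at a time and compares it with the final bound {e^{\frac{\varepsilon}{1-\varepsilon}\lambda} {\bf P}(\mathbf{Poisson}(\frac{\lambda}{1-\varepsilon}) = k)}. The helper names and the particular success probabilities are assumptions made for illustration only.

```python
import math

def bernoulli_sum_pmf(probs):
    """Exact distribution of S = X_1 + ... + X_n for independent Boolean X_i
    with P(X_i = 1) = probs[i], built up one variable at a time."""
    pmf = [1.0]  # distribution of the empty sum
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)   # X_i = 0
            new[k + 1] += mass * p       # X_i = 1
        pmf = new
    return pmf

def poisson_pmf(mean, k):
    """P(Poisson(mean) = k), evaluated in log space."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def comparison_bound(probs, k):
    """Right-hand side of the last line of the display in Remark 2."""
    lam = sum(probs)   # total mean
    eps = max(probs)   # sup_i P(X_i = 1), assumed < 1
    return math.exp(eps * lam / (1.0 - eps)) * poisson_pmf(lam / (1.0 - eps), k)

if __name__ == "__main__":
    # Illustrative choice: 200 variables with probabilities between 0.01 and 0.05.
    probs = [0.01 * (1 + i % 5) for i in range(200)]
    pmf = bernoulli_sum_pmf(probs)
    for k in range(0, 16, 3):
        print(k, pmf[k], comparison_bound(probs, k))
```

For small {\varepsilon} the two columns should be close near {k \approx \lambda}, with the bound always on top.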

In this note I wanted to record the observation that one can improve the Bennett bound by a small polynomial factor once one leaves the Gaussian regime {u = O(1/\sqrt{\lambda})}, in particular gaining a factor of {1/\sqrt{\lambda}} when {u \sim 1}. This observation is not difficult and is implicit in the literature (one can extract it for instance from the much more general results of this paper of Talagrand, and the basic idea already appears in this paper of Glynn), but I was not able to find a clean version of this statement in the literature, so I am placing it here on my blog. (But if a reader knows of a reference that basically contains the bound below, I would be happy to know of it.)

Proposition 3 (Improved Bennett’s inequality) One has

\displaystyle  {\bf P}( {\bf Poisson}(\lambda) \geq \lambda(1+u)) \ll \frac{\exp(-\lambda h(u))}{\sqrt{1 + \lambda \min(u, u^2)}}

for {u \geq 0} and

\displaystyle  {\bf P}( {\bf Poisson}(\lambda) \leq \lambda(1+u)) \ll \frac{\exp(-\lambda h(u))}{\sqrt{1 + \lambda u^2 (1+u)}}

for {-1 < u \leq 0}.

Proof: We begin with the first inequality. We may assume that {u \geq 1/\sqrt{\lambda}}, since otherwise the claim follows from the usual Bennett inequality. We expand out the left-hand side as

\displaystyle  e^{-\lambda} \sum_{k \geq \lambda(1+u)} \frac{\lambda^k}{k!}.

Observe that for {k \geq \lambda(1+u)} we have

\displaystyle  \frac{\lambda^{k+1}}{(k+1)!} \leq \frac{1}{1+u} \frac{\lambda^{k}}{k!} .

Thus the sum is dominated by the first term times a geometric series {\sum_{j=0}^\infty \frac{1}{(1+u)^j} = 1 + \frac{1}{u}}. We can thus bound the left-hand side by

\displaystyle  \ll e^{-\lambda} (1 + \frac{1}{u}) \sup_{k \geq \lambda(1+u)} \frac{\lambda^k}{k!}.

By the Stirling approximation, this is

\displaystyle  \ll e^{-\lambda} (1 + \frac{1}{u}) \sup_{k \geq \lambda(1+u)} \frac{1}{\sqrt{k}} \frac{(e\lambda)^k}{k^k}.

The expression inside the supremum is decreasing in {k} for {k > \lambda}, thus we can bound it by

\displaystyle  \ll e^{-\lambda} (1 + \frac{1}{u}) \frac{1}{\sqrt{\lambda(1+u)}} \frac{(e\lambda)^{\lambda(1+u)}}{(\lambda(1+u))^{\lambda(1+u)}},

which simplifies to

\displaystyle  \ll \frac{\exp(-\lambda h(u))}{\sqrt{1 + \lambda \min(u, u^2)}}

after a routine calculation.
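For readers who want the routine calculation spelled out, here is one way it can go. The abbreviation {k_0 := \lambda(1+u)} is introduced here only for readability, and the constants are not optimized.

```latex
% Exponential factor, writing k_0 := \lambda(1+u):
e^{-\lambda} \frac{(e\lambda)^{k_0}}{k_0^{k_0}}
  = e^{-\lambda} \frac{e^{\lambda(1+u)}}{(1+u)^{\lambda(1+u)}}
  = \exp\bigl( \lambda u - \lambda (1+u) \log(1+u) \bigr)
  = \exp( -\lambda h(u) ).

% Polynomial prefactor:
\Bigl( 1 + \frac{1}{u} \Bigr) \frac{1}{\sqrt{\lambda(1+u)}}
  = \sqrt{ \frac{1+u}{\lambda u^2} }
  \leq \sqrt{ \frac{2}{\lambda \min(u,u^2)} },
% since 1+u \leq 2 when u \leq 1, and (1+u)/u^2 \leq 2/u when u \geq 1.
```

When {\lambda \min(u,u^2) \geq 1}, the last quantity is at most {2/\sqrt{1 + \lambda \min(u,u^2)}}, which gives the claim. In the remaining case {\lambda \min(u,u^2) < 1} the denominator {\sqrt{1 + \lambda \min(u,u^2)}} is comparable to {1}, so the usual Bennett inequality already suffices.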

Now we turn to the second inequality. As before we may assume that {u \leq -1/\sqrt{\lambda}}. We first dispose of a degenerate case in which {\lambda(1+u) < 1}. Here the left-hand side is just

\displaystyle  {\bf P}( {\bf Poisson}(\lambda) = 0 ) = e^{-\lambda}

and the right-hand side is comparable to

\displaystyle  e^{-\lambda} \exp( - \lambda (1+u) \log (1+u) + \lambda(1+u) ) / \sqrt{\lambda(1+u)}.

Since {-\lambda(1+u) \log(1+u)} is positive and {0 < \lambda(1+u) < 1}, we see that the right-hand side is {\gg e^{-\lambda}}, and the estimate holds in this case.

It remains to consider the regime where {u \leq -1/\sqrt{\lambda}} and {\lambda(1+u) \geq 1}. The left-hand side expands as

\displaystyle  e^{-\lambda} \sum_{k \leq \lambda(1+u)} \frac{\lambda^k}{k!}.

The sum is dominated by the first term times a geometric series {\sum_{j=-\infty}^0 \frac{1}{(1+u)^j} = \frac{1}{|u|}}. The maximal {k} is comparable to {\lambda(1+u)}, so we can bound the left-hand side by

\displaystyle  \ll e^{-\lambda} \frac{1}{|u|} \sup_{\lambda(1+u) \ll k \leq \lambda(1+u)} \frac{\lambda^k}{k!}.

Using the Stirling approximation as before we can bound this by

\displaystyle  \ll e^{-\lambda} \frac{1}{|u|} \frac{1}{\sqrt{\lambda(1+u)}} \frac{(e\lambda)^{\lambda(1+u)}}{(\lambda(1+u))^{\lambda(1+u)}},

which simplifies to

\displaystyle  \ll \frac{\exp(-\lambda h(u))}{\sqrt{1 + \lambda u^2 (1+u)}}

after a routine calculation. \Box

The same analysis can be reversed to show that the bounds given above are basically sharp up to constants, at least when {\lambda} (and {\lambda(1+u)}) are large.
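One can watch the polynomial gain of Proposition 3 numerically. The sketch below (standard library only, with illustrative function names and parameter values) computes, for the upper tail, the ratio of the exact tail to the plain Bennett bound and to the improved bound. If the proposition and its sharpness are as described, the first ratio should shrink like {1/\sqrt{\lambda}} at {u = 1}, while the second should stay bounded above and below.

```python
import math

def h(u):
    """Bennett's rate function h(u) = (1+u) log(1+u) - u."""
    return (1.0 + u) * math.log1p(u) - u

def poisson_pmf(lam, k):
    """P(Poisson(lam) = k), evaluated in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def upper_tail(lam, u, max_terms=100_000):
    """P(Poisson(lam) >= lam*(1+u)) for u >= 0, summed term by term."""
    k = math.ceil(lam * (1.0 + u))
    total = 0.0
    for _ in range(max_terms):
        term = poisson_pmf(lam, k)
        total += term
        if term < 1e-17 * total:  # later terms are negligible
            break
        k += 1
    return total

def bennett(lam, u):
    """Plain Bennett bound exp(-lam * h(u))."""
    return math.exp(-lam * h(u))

def improved(lam, u):
    """Bennett bound divided by sqrt(1 + lam * min(u, u^2)), as in Proposition 3
    (the implied constant in the proposition is not included)."""
    return bennett(lam, u) / math.sqrt(1.0 + lam * min(u, u * u))

def ratios(lam, u):
    """Return (tail / Bennett bound, tail / improved bound)."""
    tail = upper_tail(lam, u)
    return tail / bennett(lam, u), tail / improved(lam, u)

if __name__ == "__main__":
    # Keep lam * h(u) below about 700 so that exp() does not underflow to zero.
    for lam in (10.0, 100.0, 1000.0):
        print(lam, ratios(lam, 1.0))
```

This is plain double-precision arithmetic, so it is only meaningful while {\lambda h(u)} is small enough that the bounds do not underflow.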

John Preskill A (quantum) complex legacy: Part deux

I didn’t fancy the research suggestion emailed by my PhD advisor.

A 2016 email from John Preskill led to my publishing a paper about quantum complexity in 2022, as I explained in last month’s blog post. But I didn’t explain what I thought of his email upon receiving it.

It didn’t float my boat. (Hence my not publishing on it until 2022.)

The suggestion contained ingredients that ordinarily would have caulked any cruise ship of mine: thermodynamics, black-hole-inspired quantum information, and the concept of resources. John had forwarded a paper drafted by Stanford physicists Adam Brown and Lenny Susskind. They act as grand dukes of the community sussing out what happens to information swallowed by black holes. 

From Rare-Gallery

We’re not sure how black holes work. However, physicists often model a black hole with a clump of particles squeezed close together and so forced to interact with each other strongly. The interactions entangle the particles. The clump’s quantum state—let’s call it | \psi(t) \rangle—grows not only complicated with time (t), but also complex in a technical sense: Imagine taking a fresh clump of particles and preparing it in the state | \psi(t) \rangle via a sequence of basic operations, such as quantum gates performable with a quantum computer. The number of basic operations needed is called the complexity of | \psi(t) \rangle. A black hole’s state has a complexity believed to grow in time—and grow and grow and grow—until plateauing. 

This growth echoes the second law of thermodynamics, which helps us understand why time flows in only one direction. According to the second law, every closed, isolated system’s entropy grows until plateauing.1 Adam and Lenny drew parallels between the second law and complexity’s growth.

The less complex a quantum state is, the better it can serve as a resource in quantum computations. Recall, as we did last month, performing calculations in math class. You needed clean scratch paper on which to write the calculations. So does a quantum computer. “Scratch paper,” to a quantum computer, consists of qubits—basic units of quantum information, realized in, for example, atoms or ions. The scratch paper is “clean” if the qubits are in a simple, unentangled quantum state—a low-complexity state. A state’s greatest possible complexity, minus the actual complexity, we can call the state’s uncomplexity. Uncomplexity—a quantum state’s blankness—serves as a resource in quantum computation.

Manny Knill and Ray Laflamme realized this point in 1998, while quantifying the “power of one clean qubit.” Lenny arrived at a similar conclusion while reasoning about black holes and firewalls. For an introduction to firewalls, see this blog post by John. Suppose that someone—let’s call her Audrey—falls into a black hole. If it contains a firewall, she’ll burn up. But suppose that someone tosses a qubit into the black hole before Audrey falls. The qubit kicks the firewall farther away from the event horizon, so Audrey will remain safe for longer. Also, the qubit increases the uncomplexity of the black hole’s quantum state. Uncomplexity serves as a resource also to Audrey.

A resource is something that’s scarce, valuable, and useful for accomplishing tasks. Different things qualify as resources in different settings. For instance, imagine wanting to communicate quantum information to a friend securely. Entanglement will serve as a resource. How can we quantify and manipulate entanglement? How much entanglement do we need to perform a given communicational or computational task? Quantum scientists answer such questions with a resource theory, a simple information-theoretic model. Theorists have defined resource theories for entanglement, randomness, and more. In many a blog post, I’ve eulogized resource theories for thermodynamic settings. Can anyone define, Adam and Lenny asked, a resource theory for quantum uncomplexity?

Resource thinking pervades our world.

By late 2016, I was a quantum thermodynamicist, I was a resource theorist, and I’d just debuted my first black-hole–inspired quantum information theory. Moreover, I’d coauthored a review about the already-extant resource theory that looked closest to what Adam and Lenny sought. Hence John’s email, I expect. Yet that debut had uncovered reams of questions—questions that, as a budding physicist heady with the discovery of discovery, I could own. Why would I answer a question of someone else’s instead?

So I thanked John, read the paper draft, and pondered it for a few days. Then, I built a research program around my questions and waited for someone else to answer Adam and Lenny.

Three and a half years later, I was still waiting. The notion of uncomplexity as a resource had enchanted the black-hole-information community, so I was preparing a resource-theory talk for a quantum-complexity workshop. The preparations set wheels churning in my mind, and inspiration struck during a long walk.2

After watching my workshop talk, Philippe Faist reached out about collaborating. Philippe is a coauthor, a friend, and a fellow quantum thermodynamicist and resource theorist. Caltech’s influence had sucked him, too, into the black-hole community. We Zoomed throughout the pandemic’s first spring, widening our circle to include Teja Kothakonda, Jonas Haferkamp, and Jens Eisert of Freie University Berlin. Then, Anthony Munson joined from my nascent group in Maryland. Physical Review A published our paper, “Resource theory of quantum uncomplexity,” in January.

The next four paragraphs, I’ve geared toward experts. An agent in the resource theory manipulates a set of n qubits. The agent can attempt to perform any gate U on any two qubits. Noise corrupts every real-world gate implementation, though. Hence the agent effects a gate chosen randomly from near U. Such fuzzy gates are free. The agent can’t append or discard any system for free: Appending even a maximally mixed qubit increases the state’s uncomplexity, as Knill and Laflamme showed. 

Fuzzy gates’ randomness prevents the agent from mapping complex states to uncomplex states for free (with any considerable probability). Complexity only grows or remains constant under fuzzy operations, under appropriate conditions. This growth echoes the second law of thermodynamics. 

We also defined operational tasks—uncomplexity extraction and expenditure analogous to work extraction and expenditure. Then, we bounded the efficiencies with which the agent can perform these tasks. The efficiencies depend on a complexity entropy that we defined—and that’ll star in part trois of this blog-post series.

Now, I want to know what purposes the resource theory of uncomplexity can serve. Can we recast black-hole problems in terms of the resource theory, then leverage resource-theory results to solve those black-hole problems? What about problems in condensed matter? Can our resource theory, which quantifies the difficulty of preparing quantum states, merge with the resource theory of magic, which quantifies that difficulty differently?

Unofficial mascot for fuzzy operations

I don’t regret having declined my PhD advisor’s recommendation six years ago. Doing so led me to explore probability theory and measurement theory, collaborate with two experimental labs, and write ten papers with 21 coauthors whom I esteem. But I take my hat off to Adam and Lenny for their question. And I remain grateful to the advisor who kept my goals and interests in mind while checking his email. I hope to serve Anthony and his fellow advisees as well.

1At least, in the thermodynamic limit, that is, if the system is infinitely large. If the system is finite-size, its entropy grows on average.

2…en route to obtaining a marriage license. My husband and I married four months after the pandemic throttled government activities. Hours before the relevant office’s calendar filled up, I scored an appointment to obtain our license. Regarding the metro as off-limits, my then-fiancé and I walked from Cambridge, Massachusetts to downtown Boston for our appointment. I thank him for enduring my requests to stop so that I could write notes.

February 18, 2023

Terence Tao Would it be possible to create a tool to automatically diagram papers?

This is a somewhat experimental and speculative post. This week I was at the IPAM workshop on machine assisted proof that I was one of the organizers of. We had an interesting and diverse range of talks, both from computer scientists presenting the latest available tools to formally verify proofs or to automate various aspects of proof writing or proof discovery, as well as mathematicians who described their experiences using these tools to solve their research problems. One can find the videos of these talks on the IPAM youtube channel; I also posted about the talks during the event on my Mathstodon account. I am of course not the most objective person to judge, but from the feedback I received it seems that the conference was able to successfully achieve its aim of bringing together the different communities interested in this topic.

As a result of the conference I started thinking about what possible computer tools might now be developed that could be of broad use to mathematicians, particularly those who do not have prior expertise with the finer aspects of writing code or installing software. One idea that came to mind was a potential tool that could take, say, an arXiv preprint as input, and return some sort of diagram detailing the logical flow of the main theorems and lemmas in the paper. This is currently done by hand by authors in some, but not all, papers (and can often also be automatically generated from formally verified proofs, as seen for instance in the graphic accompanying the IPAM workshop, or this diagram generated from Massot’s blueprint software from a manually inputted set of theorems and dependencies as a precursor to formalization of a proof [thanks to Thomas Bloom for this example]). For instance, here is a diagram that my co-author Rachel Greenfeld and I drew for a recent paper:

This particular diagram incorporated a number of subjective design choices regarding layout, which results to be designated important enough to require a dedicated box (as opposed to being viewed as a mere tool to get from one box to another), and how to describe each of these results (and how to colour-code them). This is still a very human-intensive task (and my co-author and I went through several iterations of this particular diagram with much back-and-forth discussion until we were both satisfied). But I could see the possibility of creating an automatic tool that could provide an initial “first approximation” to such a diagram, which a human user could then modify as they see fit (perhaps using some convenient GUI interface, for instance some variant of the Quiver online tool for drawing commutative diagrams in LaTeX).

As a crude first attempt at automatically generating such a diagram, one could perhaps develop a tool to scrape a LaTeX file to locate all the instances of the theorem environment in the text (i.e., all the formally identified lemmas, corollaries, and so forth), and for each such theorem, locate a proof environment instance that looks like it is associated to that theorem (doing this with reasonable accuracy may require a small amount of machine learning, though perhaps one could just hope that proximity of the proof environment instance to the theorem environment instance suffices in many cases). Then identify all the references within that proof environment to other theorems to start building the tree of implications, which one could then depict in a diagram such as the above. Such an approach would likely miss many of the implications; for instance, because many lemmas might not be proven using a formal proof environment, but instead by some more free-flowing text discussion, or perhaps a one line justification such as “By combining Lemma 3.4 and Proposition 3.6, we conclude”. Also, some references to other results in the paper might not proceed by direct citation, but by more indirect justifications such as “invoking the previous lemma, we obtain” or “by repeating the arguments in Section 3, we have”. Still, even such a crude diagram might be helpful, either as a starting point for authors to make an improved diagram, or for a student trying to understand a lengthy paper to get some initial idea of the logical structure.
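To make the crude approach concrete, here is a minimal Python sketch of it, under several loud assumptions: results live in environments named theorem, lemma, proposition, or corollary; each carries a \label; each proof environment belongs to the nearest preceding result; and dependencies appear as \ref, \eqref, \cref, or \Cref commands. The function names dependency_edges and to_dot are hypothetical, and the output is plain Graphviz DOT text that some other tool would have to draw. It inherits every blind spot listed above (informal proofs, "the previous lemma", and so on).

```python
import re

# Assumed environment names; real papers often define others (claim, lem, thm, ...).
THEOREM_ENVS = ("theorem", "lemma", "proposition", "corollary")

ENV_RE = re.compile(
    r"\\begin\{(?P<env>" + "|".join(THEOREM_ENVS + ("proof",)) + r")\}"
    r"(?P<body>.*?)"
    r"\\end\{(?P=env)\}",
    re.DOTALL,
)
LABEL_RE = re.compile(r"\\label\{([^}]*)\}")
REF_RE = re.compile(r"\\(?:ref|eqref|cref|Cref)\{([^}]*)\}")

def dependency_edges(tex):
    """Return sorted (used_label, proved_label) pairs found in a LaTeX string.

    Heuristic: a proof is attached to the closest preceding labelled result.
    """
    blocks = []     # (environment name, label or None, body) in document order
    known = set()   # labels attached to theorem-like environments
    for match in ENV_RE.finditer(tex):
        env, body = match.group("env"), match.group("body")
        label = None
        if env != "proof":
            found = LABEL_RE.search(body)
            label = found.group(1) if found else None
            if label:
                known.add(label)
        blocks.append((env, label, body))

    edges = set()
    current = None  # label of the result whose proof we expect next
    for env, label, body in blocks:
        if env != "proof":
            current = label
        elif current is not None:
            for ref in REF_RE.findall(body):
                for name in ref.split(","):  # \cref{a,b} cites several labels
                    name = name.strip()
                    if name in known and name != current:
                        edges.add((name, current))
            current = None
    return sorted(edges)

def to_dot(edges):
    """Render the edges as Graphviz DOT text (one arrow per implication)."""
    lines = ["digraph dependencies {"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)
```

Calling to_dot(dependency_edges(source)) on the text of a LaTeX file gives a first-approximation graph; references to equations or to labels outside the chosen environments are silently dropped, which is a design choice to keep the diagram restricted to results.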

More advanced features might be to try to use more of the text of the paper to assign some measure of importance to individual results (and then weight the diagram correspondingly to highlight the more important results), to try to give each result a natural language description, and to somehow capture key statements that are not neatly encapsulated in a theorem environment instance, but I would imagine that such tasks should be deferred until some cruder proof-of-concept prototype can be demonstrated.

Anyway, I would be interested to hear opinions about whether this idea (or some modification thereof) is (a) actually feasible with current technology (or better yet, already exists in some form), and (b) of interest to research mathematicians.

February 17, 2023

Matt von Hippel Valentine’s Day Physics Poem 2023

Since Valentine’s Day was this week, it’s time for the next installment of my traditional Valentine’s Day Physics Poems. New readers, don’t let this drive you off, I only do it once a year! And if you actually like it, you can take a look at poems from previous years here.

Married to a Model

If you ever face a physics class distracted,
Rappers and footballers twinkling on their phones,
Then like an awkward youth pastor, interject,
“You know who else is married to a Model?”

Her name is Standard, you see,
Wife of fifty years to Old Man Physics,
Known for her beauty, charm, and strangeness too.
But Old Man Physics has a wandering eye,
and dreams of Models Beyond.

Let the old man bend your ear,
you’ll hear
a litany of Problems.

He’ll never understand her, so he starts.
Some matters she holds weighty, some feather-light
with nary rhyme or reason
(which he is owed, he’s sure).

She’s unnatural, he says,
(echoing Higgins et al.),
a set of rules he can’t predict.
(But with those rules, all else is possible.)

Some regularities she holds to fast, despite room for exception,
others breaks, like an ill-lucked bathroom mirror.

And then, he says, she’ll just blow up
(when taken to extremes),
while singing nonsense in the face of Gravity.

He’s been keeping a careful eye
and noticing anomalies
(and each time, confronting them,
finds an innocent explanation,
but no matter).

And he imagines others
with yet wilder curves
and more sensitive reactions
(and nonsense, of course,
that he’s lived fifty years without).

Old man physics talks,
that’s certain.
But beyond the talk,
beyond the phases and phrases,
(conscious uncoupling, non-empirical science),
he stays by her side.

He knows Truth, 
in this world,
is worth fighting for.

February 15, 2023

Jacques Distler MathML in Chrome

Thanks to the hard work of Frédéric Wang and the folks at Igalia, the Blink engine in Chrome 109 now supports MathML Core.

It took a little bit of work to get it working correctly in Instiki and on this blog.

  • The columnalign attribute is not supported, so a shim is needed to get the individual <mtd> to align correctly.
  • This commit enabled the display of SVG embedded in equations and got rid of the vertical scroll bars in equations.
  • Since Chrome does not support hyperlinks (either href or xlink:href attributes) on MathML elements, this slightly hacky workaround enabled hyperlinks in equations, as created by \href{url}{expression}.

There are a number of remaining issues.

  • Math accents don’t stretch, when they’re supposed to. Here are a few examples of things that (currently) render incorrectly in Chrome (some of them, admittedly, are incorrect in Safari too):

    \mathbf{V}_{1} \times \mathbf{V}_{2} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\\\ \frac{\partial X}{\partial u} & \frac{\partial Y}{\partial u} & 0 \\ \frac{\partial X}{\partial v} & \frac{\partial Y}{\partial v} & 0 \\ \end{vmatrix}

    \left\vert\frac{f(z)-f(a)}{1-\overline{f(a)}f(z)}\right\vert\le \left\vert\frac{z-a}{1-\overline{a}z}\right\vert

    \widetilde{PGL}(N)

    \begin{matrix} P_1(Y) &\to& P_1(X) \\ \downarrow &\Downarrow\mathrlap{\sim}& \downarrow \\ T' &\to& T \end{matrix}

    p_3 (x) = \left( {\frac{1}{2}} \right)\frac{{\left( {x - \frac{1}{2}} \right)\left( {x - \frac{3}{4}} \right)\left( {x - 1} \right)}}{{\left( {\frac{1}{4} - \frac{1}{2}} \right)\left( {\frac{1}{4} - \frac{3}{4}} \right)\left( {\frac{1}{4} - 1} \right)}} + \left( {\frac{1}{2}} \right)\frac{{\left( {x - \frac{1}{2}} \right)\left( {x - \frac{3}{4}} \right)\left( {x - 1} \right)}}{{\left( {\frac{1}{4} - \frac{1}{2}} \right)\left( {\frac{1}{4} - \frac{3}{4}} \right)\left( {\frac{1}{4} - 1} \right)}} + \left( {\frac{1}{2}} \right)\frac{{\left( {x - \frac{1}{2}} \right)\left( {x - \frac{3}{4}} \right)\left( {x - 1} \right)}}{{\left( {\frac{1}{4} - \frac{1}{2}} \right)\left( {\frac{1}{4} - \frac{3}{4}} \right)\left( {\frac{1}{4} - 1} \right)}} + \left( {\frac{1}{2}} \right)\frac{{\left( {x - \frac{1}{2}} \right)\left( {x - \frac{3}{4}} \right)\left( {x - 1} \right)}}{{\left( {\frac{1}{4} - \frac{1}{2}} \right)\left( {\frac{1}{4} - \frac{3}{4}} \right)\left( {\frac{1}{4} - 1} \right)}}

  • This equation \boxed{(i\slash{D}+m)\psi = 0} doesn’t display remotely correctly, because Chrome doesn’t implement the <menclose> element. Fixed now.

But, hey, this is amazing for a first release.


I added support for \boxed{} and \slash{}, both of which use <menclose>, which is not supported by Chrome. So now the above equation should render correctly in Chrome. Thanks to Monica Kang, for help with the CSS.

February 03, 2023

Clifford Johnson The Life Scientific Interview

After doing a night bottle feed of our youngest in the wee hours of the morning some nights earlier this week, in order to help me get back to sleep I decided to turn on BBC Sounds to find a programme to listen to... and lo and behold, look what had just aired live! The programme that I'd recorded at Broadcasting House a few weeks ago in London.

So it is out now. It is an episode of Jim Al-Khalili's excellent BBC Radio 4 programme "The Life Scientific". The show is very much in the spirit of what (as you know) I strive to do in my work in the public sphere (including this blog): discuss the science an individual does right alongside aspects of the broader life of that individual. I recommend listening to [...]

The post The Life Scientific Interview appeared first on Asymptotia.

Tommaso Dorigo Two Possible Sites For The SWGO Gamma-Ray Detector Array

Yesterday I took advantage of the kindness of Cesar Ocampo, the site manager of the Parque Astronomico near San Pedro de Atacama, in northern Chile, to visit a couple of places that the SWGO collaboration is considering as the site of a large array of particle detectors meant to study ultra-high-energy gamma rays from the sky.

SWGO and cosmic ray showers

February 01, 2023

Terence Tao Infinite partial sumsets in the primes

Tamar Ziegler and I have just uploaded to the arXiv our paper “Infinite partial sumsets in the primes”. This is a short paper inspired by a recent result of Kra, Moreira, Richter, and Robertson (discussed for instance in this Quanta article from last December) showing that for any set {A} of natural numbers of positive upper density, there exists a sequence {b_1 < b_2 < b_3 < \dots} of natural numbers and a shift {t} such that {b_i + b_j + t \in A} for all {i<j} (this answers a question of Erdős). In view of the “transference principle”, it is then plausible to ask whether the same result holds if {A} is replaced by the primes. We can show the following results:

Theorem 1
  • (i) If the Hardy-Littlewood prime tuples conjecture (or the weaker conjecture of Dickson) is true, then there exists an increasing sequence {b_1 < b_2 < b_3 < \dots} of primes such that {b_i + b_j + 1} is prime for all {i < j}.
  • (ii) Unconditionally, there exist increasing sequences {a_1 < a_2 < \dots} and {b_1 < b_2 < \dots} of natural numbers such that {a_i + b_j} is prime for all {i<j}.
  • (iii) These conclusions fail if “prime” is replaced by “positive (relative) density subset of the primes” (even if the density is equal to 1).

We remark that it was shown by Balog that there (unconditionally) exist arbitrarily long but finite sequences {b_1 < \dots < b_k} of primes such that {b_i + b_j + 1} is prime for all {i < j \leq k}. (This result can also be recovered from the later results of Ben Green, myself, and Tamar Ziegler.) Also, it had previously been shown by Granville that on the Hardy-Littlewood prime tuples conjecture, there existed increasing sequences {a_1 < a_2 < \dots} and {b_1 < b_2 < \dots} of natural numbers such that {a_i+b_j} is prime for all {i,j}.
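Short instances of the finite sequences in Balog's theorem can be found by brute force. The sketch below is only an illustration of the statement (it has nothing to do with the proofs): it does a depth-first search over primes below a cutoff for {b_1 < \dots < b_k} with every {b_i + b_j + 1} prime. The names find_sequence and limit are hypothetical, and trial division is used purely for simplicity.

```python
def is_prime(n):
    """Trial-division primality test; adequate for the small numbers used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def find_sequence(k, limit):
    """Search for primes b_1 < ... < b_k below `limit` such that
    b_i + b_j + 1 is prime for all i < j. Returns a list, or None if the
    search space below `limit` contains no such sequence."""
    primes = [p for p in range(2, limit) if is_prime(p)]

    def extend(seq, start):
        if len(seq) == k:
            return seq
        for idx in range(start, len(primes)):
            p = primes[idx]
            # The new prime must be compatible with every earlier element.
            if all(is_prime(b + p + 1) for b in seq):
                found = extend(seq + [p], idx + 1)
                if found is not None:
                    return found
        return None

    return extend([], 0)

if __name__ == "__main__":
    print(find_sequence(4, 200))
```

The search is exponential in the worst case, so it is only sensible for small {k}; a return value of None says nothing beyond the chosen cutoff.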

The conclusion of (i) is stronger than that of (ii) (which is of course consistent with the former being conditional and the latter unconditional). The conclusion (ii) also implies the well-known theorem of Maynard that for any given {k}, there exist infinitely many {k}-tuples of primes of bounded diameter, and indeed our proof of (ii) uses the same “Maynard sieve” that powers the proof of that theorem (though we use a formulation of that sieve closer to that in this blog post of mine). Indeed, the failure of (iii) basically arises from the failure of Maynard’s theorem for dense subsets of primes, simply by removing those clusters of primes that are unusually closely spaced.

Our proof of (i) was initially inspired by the topological dynamics methods used by Kra, Moreira, Richter, and Robertson, but we managed to condense it to a purely elementary argument (taking up only half a page) that makes no reference to topological dynamics and builds up the sequence {b_1 < b_2 < \dots} recursively by repeated application of the prime tuples conjecture.

The proof of (ii) takes up the majority of the paper. It is easiest to phrase the argument in terms of “prime-producing tuples” – tuples {(h_1,\dots,h_k)} for which there are infinitely many {n} with {n+h_1,\dots,n+h_k} all prime. Maynard’s theorem is equivalent to the existence of arbitrarily long prime-producing tuples; our theorem is equivalent to the stronger assertion that there exists an infinite sequence {h_1 < h_2 < \dots} such that every initial segment {(h_1,\dots,h_k)} is prime-producing. The main new tool for achieving this is the following cute measure-theoretic lemma of Bergelson:

Lemma 2 (Bergelson intersectivity lemma) Let {E_1,E_2,\dots} be subsets of a probability space {(X,\mu)} of measure uniformly bounded away from zero, thus {\inf_i \mu(E_i) > 0}. Then there exists a subsequence {E_{i_1}, E_{i_2}, \dots} such that

\displaystyle  \mu(E_{i_1} \cap \dots \cap E_{i_k} ) > 0

for all {k}.

This lemma has a short proof, though not an entirely obvious one. Firstly, by deleting a null set from {X}, one can assume that all finite intersections {E_{i_1} \cap \dots \cap E_{i_k}} are either positive measure or empty. Secondly, a routine application of Fatou’s lemma shows that the maximal function {\limsup_N \frac{1}{N} \sum_{i=1}^N 1_{E_i}} has a positive integral, hence must be positive at some point {x_0}. Thus there is a subsequence {E_{i_1}, E_{i_2}, \dots} whose finite intersections all contain {x_0}, thus have positive measure as desired by the previous reduction.

It turns out that one cannot quite combine the standard Maynard sieve with the intersectivity lemma because the events {E_i} that show up (which roughly correspond to the event that {n + h_i} is prime for some random number {n} (with a well-chosen probability distribution) and some shift {h_i}) have their probability going to zero, rather than being uniformly bounded from below. To get around this, we borrow an idea from a paper of Banks, Freiberg, and Maynard, and group the shifts {h_i} into various clusters {h_{i,1},\dots,h_{i,J_1}}, chosen in such a way that the probability that at least one of {n+h_{i,1},\dots,n+h_{i,J_1}} is prime is bounded uniformly from below. One then applies the Bergelson intersectivity lemma to those events and uses many applications of the pigeonhole principle to conclude.

January 30, 2023

John Preskill A (quantum) complex legacy

Early in the fourth year of my PhD, I received a most John-ish email from John Preskill, my PhD advisor. The title read, “thermodynamics of complexity,” and the message was concise the way that the Amazon River is damp: “Might be an interesting subject for you.” 

Below the signature, I found a paper draft by Stanford physicists Adam Brown and Lenny Susskind. Adam is a Brit with an accent and a wit to match his Oxford degree. Lenny, known to the public for his books and lectures, is a New Yorker with an accent that reminds me of my grandfather. Before the physicists posted their paper online, Lenny sought feedback from John, who forwarded me the email.

The paper concerned a confluence of ideas that you’ve probably encountered in the media: string theory, black holes, and quantum information. String theory offers hope for unifying two physical theories: relativity, which describes large systems such as our universe, and quantum theory, which describes small systems such as atoms. A certain type of gravitational system and a certain type of quantum system participate in a duality, or equivalence, known since the 1990s. Our universe isn’t such a gravitational system, but never mind; the duality may still offer a toehold on a theory of quantum gravity. Properties of the gravitational system parallel properties of the quantum system and vice versa. Or so it seemed.

The gravitational system can have two black holes linked by a wormhole. The wormhole’s volume can grow linearly in time for a time exponentially long in the black holes’ entropy. Afterward, the volume hits a ceiling and approximately ceases changing. Which property of the quantum system does the wormhole’s volume parallel?

Envision the quantum system as many particles wedged close together, so that they interact with each other strongly. Initially uncorrelated particles will entangle with each other quickly. A quantum system has properties, such as average particle density, that experimentalists can measure relatively easily. Does such a measurable property—an observable of a small patch of the system—parallel the wormhole volume? No; such observables cease changing much sooner than the wormhole volume does. The same conclusion applies to the entanglement amongst the particles.

What about a more sophisticated property of the particles’ quantum state? Researchers proposed that the state’s complexity parallels the wormhole’s volume. To grasp complexity, imagine a quantum computer performing a computation. When performing computations in math class, you needed blank scratch paper on which to write your calculations. A quantum computer needs the quantum equivalent of blank scratch paper: qubits (basic units of quantum information, realized, for example, as atoms) in a simple, unentangled, “clean” state. The computer performs a sequence of basic operations—quantum logic gates—on the qubits. These operations resemble addition and subtraction but can entangle the qubits. What’s the minimal number of basic operations needed to prepare a desired quantum state (or to “uncompute” a given state to the blank state)? The state’s quantum complexity.1 

Quantum complexity has loomed large over multiple fields of physics recently: quantum computing, condensed matter, and quantum gravity. The latter, we established, entails a duality between a gravitational system and a quantum system. The quantum system begins in a simple quantum state that grows complicated as the particles interact. The state’s complexity parallels the volume of a wormhole in the gravitational system, according to a conjecture.2 

The conjecture would hold more water if the quantum state’s complexity grew similarly to the wormhole’s volume: linearly in time, for a time exponentially large in the quantum system’s size. Does the complexity grow so? The expectation that it does became the linear-growth conjecture.

Evidence supported the conjecture. For instance, quantum information theorists modeled the quantum particles as interacting randomly, as though undergoing a quantum circuit filled with random quantum gates. Leveraging probability theory,3 the researchers proved that the state’s complexity grows linearly at short times. Also, the complexity grows linearly for long times if each particle can store a great deal of quantum information. But what if the particles are qubits, the smallest and most ubiquitous unit of quantum information? The question lingered for years.

Jonas Haferkamp, a PhD student in Berlin, dreamed up an answer to an important version of the question.4 I had the good fortune to help formalize that answer with him and members of his research group: master’s student Teja Kothakonda, postdoc Philippe Faist, and supervisor Jens Eisert. Our paper, published in Nature Physics last year, marked step one in a research adventure catalyzed by John Preskill’s email 4.5 years earlier.

Imagine, again, qubits undergoing a circuit filled with random quantum gates. That circuit has some architecture, or arrangement of gates. Slotting different gates into the architecture effects different transformations5 on the qubits. Consider the set of all transformations implementable with one architecture. This set has some size, which we defined and analyzed.

What happens to the set’s size if you add more gates to the circuit—let the particles interact for longer? We can bound the size’s growth using the mathematical toolkits of algebraic geometry and differential topology. Upon bounding the size’s growth, we can bound the state’s complexity. The complexity, we concluded, grows linearly in time for a time exponentially long in the number of qubits.

Our result lends weight to the complexity-equals-volume hypothesis. The result also introduces algebraic geometry and differential topology into complexity as helpful mathematical toolkits. Finally, the set size that we bounded emerged as a useful concept that may elucidate circuit analyses and machine learning.

John didn’t have machine learning in mind when forwarding me an email in 2017. He didn’t even have in mind proving the linear-growth conjecture. The proof enables step two of the research adventure catalyzed by that email: thermodynamics of quantum complexity, as the email’s title stated. I’ll cover that thermodynamics in its own blog post. The simplest of messages can spin a complex legacy.

The links provided above scarcely scratch the surface of the quantum-complexity literature; for a more complete list, see our paper’s bibliography. For a seminar about the linear-growth paper, see this video hosted by Nima Lashkari’s research group.

1The term complexity has multiple meanings; forget the rest for the purposes of this article.

2According to another conjecture, the quantum state’s complexity parallels a certain space-time region’s action. (An action, in physics, isn’t a motion or a deed or something that Hamlet keeps avoiding. An action is a mathematical object that determines how a system can and can’t change in time.) The first two conjectures snowballed into a paper entitled “Does complexity equal anything?” Whatever it parallels, complexity plays an important role in the gravitational–quantum duality. 

3Experts: Such as unitary t-designs.

4Experts: Our work concerns quantum circuits, rather than evolutions under fixed Hamiltonians. Also, our work concerns exact circuit complexity, the minimal number of gates needed to prepare a state exactly. A natural but tricky extension eluded us: approximate circuit complexity, the minimal number of gates needed to approximate the state.

5Experts: Unitary operators.

January 24, 2023

Tommaso Dorigo The Interest Of High-School Students For Hard Sciences

Yesterday I visited a high school in Venice to deliver a lecture on particle physics, and to invite the participating students to take part in an art and science contest. This is part of the INFN "Art and Science across Italy" project, which has reached its fourth edition and organizes art exhibits with the students' creations in several cities across Italy. The best works are then selected for a final exhibit in Naples, and the 24 winners are offered a week-long visit to the CERN laboratories in Geneva, Switzerland.


January 13, 2023

Clifford Johnson What a Week!

Some Oxford scenes

I’m sitting, for the second night in a row, in a rather pleasant restaurant in Oxford, somewhere on the walk between the physics department and my hotel. They pour a pretty good Malbec, and tonight I’ve had the wood-fired Guinea Fowl. I can hear snippets of conversation in the distance, telling me that many people who come here are regulars, and that correlates well with the fact that I liked the place immediately last night and decided I’d come back. The friendly staff remembered me and greeted me like a regular upon my return, which I liked. Gee’s is spacious with a high ceiling, and so I can sit away from everyone at a time when I’d still rather not be too cavalier with regard to covid. On another occasion I might have sought out a famous pub with some good pub food and been elbow-to-elbow with students and tourists, but the phrase “too soon” came to mind when I walked by such establishments and glanced into the windows.

However, I am not here to do a restaurant review, although you might have thought that from the previous paragraph (the guinea fowl was excellent though, and the risotto last night was tasty, if a tiny bit over-salted for my tastes). Instead I find myself reflecting on […]

The post What a Week! appeared first on Asymptotia.

Clifford Johnson BBC Fun!

As I mentioned in the previous post, I had business at BBC Broadcasting House this week. I was recording an interview that I’ll fill you in on later on, closer to release of the finished programme. Recall that in the post I mentioned how amusing it would be for me …

The post BBC Fun! appeared first on Asymptotia.

January 11, 2023

Matt Strassler Busy Writing a Book

Happy 2023 everyone!  You’ve noticed, no doubt, that the blog has been quiet recently.  That’s because I’ve got a book contract, with a deadline of March 31, 2023.  [The book itself won’t be published til spring 2024.]  I’ll tell you more about this in future posts. But over the next couple of months I’ll be a bit slow to answer questions and even slower to write content.  Fortunately, much of the content on this website is still current — the universe seems to be much the same in 2023 as it was in 2011 when the site was born. So poke around; I’m sure you’ll find something that interests you!

Richard Easther Arm The Disruptors

Last week, Science Twitter was roiled by claims that “disruptive science” was on the wane and that this might be reversed by “reading widely”, taking “year long sabbaticals” and “focussing less on quantity … and more on …quality”. It blew up, which is probably not surprising given that it first pandered to our collective angst and then suggested some highly congenial remedies.

The Nature paper that kicked off this storm in our social media teacup is profusely illustrated with graphs and charts. The data is not uninteresting and does suggest that something about the practice of science has changed over the course of the last eight or nine decades. The problem is that it could also be Exhibit A in a demonstration of how data science can generate buzz while remaining largely disconnected from reality.

“Disruption” is a useful framework for discussing technological innovation (digital cameras render film obsolete; Netflix kills your neighbourhood video store, streaming music replaces CDs) but it is less clear to me that it can be applied directly to high-value science. “What is good?” is perhaps the oldest question in the book but the paper seems to skate past it.

The problem is (at least as I see it) many if not most scientific breakthroughs [1] extend the frontiers of knowledge rather than demolishing their forebears [2]. Even the biggest “paradigm shifts” often left their predecessors largely intact. Einstein arguably “disrupted” Newton but while film cameras and vinyl records are now the preserve of hipsters and purists, Newtonian physics is still at the heart of the field – as anyone who has taken first year physics or built a bridge that stood up can attest.

Similarly, quantum mechanics shattered the then-prevailing clockwork conception of the cosmos. However, its technical content was effectively a greenfield development since at a detailed level there was nothing for quantum mechanics to replace. By the end of the 1920s, however, quantum mechanics had given us the tools to explain almost everything that happens inside of an atom.

Consequently, as I see it, neither relativity nor quantum mechanics really fits a conventional understanding of “disruption” even though they combine to create one of the biggest revolutions ever seen in science. So that should be a problem if you are using “disruption” as a template for identifying interesting and important science.

Rather than making a qualitative assessment, the authors deploy a metric to measure disruption based on citation counts [3] – a widely cited paper whose own bibliographic antecedents then become less prominent is judged to be “disruptive” [4]. This leads to plots like the one below which focuses on Nobel winning papers and three “prestige” journals (Figure 5 from the paper).

If we take this study at its word, “disruption” has largely flatlined for the last fifty years. But one of the specific papers they identify – Riess et al.’s co-discovery of “dark energy” (or, more properly, observations suggesting that the rate at which the universe expands is picking up speed) is not rated as “disruptive” despite being the biggest upheaval in our understanding of the cosmos in a couple of generations.

Conversely, the discovery of the DNA double helix is measured to be “disruptive” – and it is certainly a watershed in our understanding of the chemistry of life. The authors explain that it displaced an earlier “triple helix” model proposed by Linus Pauling – but Pauling’s scenario was less than a year old at this point so it was hardly an established incumbent knocked off its perch by an unexpected upstart. In fact, Watson and Crick’s 1953 discovery paper has only six references, and only one of those was published prior to 1952. Dirac’s 1928 paper scores well and it likewise has a handful of references and most of those were similarly only a year or so old at the time of publication. However, the “disruption metric” looks for changes in citation patterns five years either side of publication. Consequently, even though there is no way their metric can produce meaningful data for these papers (given its reliance on a five year before-and-after comparison of citation counts) they single them out for special attention rather than filtering them and papers like them from their dataset.

What this suggests to me is that there has not been a sufficiently rigorous sniff-testing of the output of this algorithm. So on top of adopting a model of progress without really asking whether or not it captures the essence of “breakthrough” science the output of the metric used to assess it was often reverse-engineered to justify the numerical values it yields.

The concern that science is increasingly driven by “bean counting” and a publish or perish mentality that is at odds with genuine progress is widespread, and my own view (like most scientists, I would guess) is that there is truth to it. There is certainly a lot of frog-boiling in academia: it is indeed a challenge for working scientists to get long periods to reflect and explore and junior scientists are locked into a furiously competitive job market that offers little security to its participants.

Ironically, though, one key contributor to this pressure-cooker in which we find ourselves is Nature itself, the journal that published this paper. And Nature not only published it but hyped it in a news article – an incestuous coupling between peer reviewed content and “news” that can make the careers of those fortunate enough to participate in it. However, it is widely argued that this practice makes Nature itself a contributor to any decline of scientific quality that may be taking place by nudging authors to hype their work in ways not fully justified by their actual results. But “turning off the hype machine” is not one of the proposed solutions to our problems — and a cynic might suggest that this could be because it would also disable the money spigot that generates many millions of dollars a year for Nature’s very-definitely for-profit owners.

To some extent this is just me being cranky, since I spent part of last week at a slow simmer every time I saw this work flash by on a screen. But it matters, because this sort of analysis can find its way into debates about how to “fix” the supposed problems of science. And there certainly are many ways in which we could make science better. But before we prescribe we would be wise to accurately determine the symptoms of its illness. Coming up with numerical metrics to measure quality and impact in science is enormously tempting since it converts an otherwise laborious and qualitative process into something that is both quantitative and automated [5] – but it is also very difficult, and it hasn’t happened here.

Ironically, the authors of this work are a professor in a management school, his PhD student and a sociologist who all claim expertise in “innovation” and “entrepreneurship”. Physicists are often seen as more willing than most to have opinions on matters outside of our professional domain and we are increasingly likely to be rebuked for failures to “stay in our lane”. But that advice cuts both ways; if you want to have opinions on science maybe you should work with people who have real expertise in the fields you hope to assess?

[1] I am going to focus on physics, since that is what I know best – but the pattern is claimed to be largely field-independent.

[2] There are exceptions. The heliocentric solar system supplanted the geocentric view and “caloric fluid” is no longer seen as a useful description of heat, but the norm for physics (and much of 20th century chemistry and biology, so far as I can see) is to “amend and extend”. There are often competing explanations for a phenomenon – e.g. Big Bang cosmology v. Steady State – only one of which can “win”, but these more closely resemble rivalries like the contest between BetaMax and VHS than “disruption”.

[3] They also make an argument that the language we use to talk about scientific results has changed over time, but most of the story has been based on their “disruption” metric.

[4] It had been used previously on patent applications (which must list “prior art”) by one of the authors, where it may actually make more sense.

[5] See also my views on the h-index.


January 09, 2023

Clifford Johnson W1A

Brompton bicycle rental lockers.

I’ll be visiting Broadcasting House during my time here in London this week, for reasons I’ll mention later. Needless to say (almost), as a Brompton rider, and fan of the wonderful show W1A, I feel a sense of regret that I don’t have my bike here so that I can ride up to the front of the building on it. You won’t know what I’m talking about if you don’t know the show. Well, last night I was a-wandering and saw the rental option shown in the photo. It is very tempting…


The post W1A appeared first on Asymptotia.

Clifford Johnson Back East

[I was originally going to use the title “Back Home”, but then somehow this choice had a resonance to it that I liked. (Also reminds me of a lovely Joshua Redman album…)] So I am back in London, my home town. And since I’ve got 8 hour jet lag, I’m …

The post Back East appeared first on Asymptotia.

December 26, 2022

Mark Chu-Carroll How Computers Work: Arithmetic With Gates

In my last post, I promised that I’d explain how we can build interesting mathematical operations using logic gates. In this post, I’m going to try to do that by walking through the design of a circuit that adds two multi-bit integers together.

As I said last time, most of the time, when we’re trying to figure out how to do something with gates, it’s useful to start with boolean algebra to figure out what we want to build. If we have two bits, what does it mean, in terms of boolean logic, to add them?

Each input can be either 0 or 1. If they’re both zero, then the sum is 0. If either, but not both, is one, then the sum is 1. If both are one, then the sum is 2. So there are three possible outputs: 0, 1, and 2.

This brings us to the first new thing: we’re building something that operates on single bits as inputs, but it needs to have more than one bit of output! A single boolean output can only have two possible values, but we need to have three.

The way that we usually describe it is that a single bit adder takes two inputs, X and Y, and produces two outputs, SUM and CARRY. Sum is called the “low order” bit, and carry is the “high order” bit – meaning that if you interpret the output as a binary number, you put the higher order bits to the left of the low order bits. (Don’t worry, this will get clearer in a moment!)

Let’s look at the truth table for a single bit adder, with a couple of extra columns to help understand how we interpret the outputs:


If we look at the SUM bit, it’s an XOR – that is, it outputs 1 if exactly one, but not both, of its inputs is 1; otherwise, it outputs 0. And if we look at the carry bit, it’s an AND. Our definition of one-bit addition is thus:

  • SUM = X \oplus Y
  • CARRY = X \land Y
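The two definitions above translate directly into code. Here is a minimal Python sketch that rebuilds the single-bit adder's truth table from them; the function names and the extra "value" column (the two output bits read as a binary number) are assumptions made for illustration, not something taken from the original table.

```python
def half_adder(x: int, y: int) -> tuple[int, int]:
    """Add two single bits; return (sum_bit, carry_bit)."""
    sum_bit = x ^ y      # XOR: 1 when exactly one input is 1
    carry_bit = x & y    # AND: 1 only when both inputs are 1
    return sum_bit, carry_bit


def half_adder_table() -> list[tuple[int, int, int, int, int]]:
    """Rows of (X, Y, CARRY, SUM, value), where value = 2*CARRY + SUM."""
    rows = []
    for x in (0, 1):
        for y in (0, 1):
            s, c = half_adder(x, y)
            # CARRY is the high-order bit, SUM the low-order bit.
            rows.append((x, y, c, s, 2 * c + s))
    return rows


if __name__ == "__main__":
    print("X Y | CARRY SUM | value")
    for x, y, c, s, value in half_adder_table():
        print(f"{x} {y} |   {c}    {s}  |   {value}")
```

Reading CARRY and SUM together as a two-bit number gives 0, 1, or 2, which is exactly the arithmetic sum of the two inputs.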

We can easily build that with gates:

A one-bit half-adder

This little thing is called a half-adder. That may seem confusing at first, because it is adding two one-bit values. But we don’t really care about adding single bits. We want to add numbers, which consist of multiple bits, and for adding pairs of bits from multibit numbers, a half-adder only does half the work.

That sounds confusing, so let’s break it down a bit with an example.

  • Imagine that we’ve got two two-bit numbers, 1 and 3, that we want to add together.
  • In binary 1 is 01, and 3 is 11.
  • If we used the one-bit half-adders for the 0 bit (that is, the lowest order bit – in computer science, we always start counting with 0), we’d get 1+1=0, with a carry of 1; and for the 1 bit, we’d get 1+0=1 with a carry of 0. So our sum would be 10, which is 2.
  • That’s wrong, because we didn’t do anything with the carry output from bit 0. We need to include that as an input to the sum of bit 1.

We could try starting with the truth table. That’s always a worthwhile thing to do. But it gets really complicated really quickly.


This is a nice illustration of why designing CPUs is so hard, and why even massively analyzed and tested CPUs still have bugs! We’re looking at one of the simplest operations to implement; and we’re only looking at it for 2 bits of input. But already, it’s hard to decide what to include in the table, and to read the table to understand what’s going on. We’re not really going to be able to do much reasoning here directly in boolean logic using the table. But it’s still good practice, both because it helps us make sure we understand what outputs we want, and because it gives us a model to test against once we’ve built the network of gates.

And there’s still some insight to be gained here: Look at the row for 1 + 3. In two-bit binary, that’s 01 + 11. The sum for bit 0 is 1+1=10 – so 0, with a carry out of 1; there’s no extra input to worry about. The sum of the input bits for bit 1 is 0+1=1, with no carry. But we have the carry from bit 0 – that needs to get added to the sum for bit 1. If we do that – if we do another add step to add the carry bit from bit 0 to the sum from bit 1 – then we’ll get the right result: 1+1=10, so bit 1 is 0 with a carry out of 1, giving 100, which is 4.

The resulting gate network for two-bit addition looks like:

The adder for bit 1, which is called a full adder, adds the input bits X1 and Y1, and then adds the sum of those (produced by that first adder) to the carry bit from bit 0. With this gate network, the output from the second adder for bit 1 is the correct value for bit 1 of the sum, but we’ve got two different carry outputs – the carry from the first adder for bit 1, and the carry from the second adder. We need to combine those somehow – and the way to do it is an OR gate.

Why an OR gate? The second adder will only produce a carry if the first adder produced a 1 as its sum output. But there’s no way that adding two bits can produce both a 1 as its sum output and a 1 as its carry output. So the carry bit from the second adder will only ever be 1 if the carry output of the first adder is 0; and the carry output from the first adder will only ever be 1 if the sum output from the first adder is 0, in which case the second adder can’t produce a carry. Only one of the two carries will ever be true, but if either of them is true, we should produce a 1 as the carry output. Thus, the OR gate.

Our full adder, therefore, takes 3 inputs: a carry from the next lower bit, and the two bits to sum; and it outputs two bits: a sum and a carry. Inside, it’s just two adders chained together, so that first we add the two sum inputs, and then we add the sum of that to the incoming carry.
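As a rough sketch of that description, here is a full adder in Python built from two chained half-adders plus an OR for the carries. The function and variable names are hypothetical, chosen just for this example. The internal assertion checks the claim that the two half-adders never both produce a carry.

```python
def half_adder(x: int, y: int) -> tuple[int, int]:
    """Return (sum_bit, carry_bit) for two single-bit inputs."""
    return x ^ y, x & y


def full_adder(x: int, y: int, carry_in: int) -> tuple[int, int]:
    """Add two bits plus an incoming carry; return (sum_bit, carry_out)."""
    # First half-adder: add the two input bits.
    partial_sum, carry_a = half_adder(x, y)
    # Second half-adder: add that partial sum to the incoming carry.
    sum_bit, carry_b = half_adder(partial_sum, carry_in)
    # At most one of the two carries can be 1, so OR combines them safely.
    assert not (carry_a and carry_b)
    return sum_bit, carry_a | carry_b


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            for c in (0, 1):
                print(x, y, c, "->", full_adder(x, y, c))
```

For every one of the eight input combinations, reading (carry_out, sum_bit) as a two-bit number gives x + y + carry_in.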

For more than two bits, we just keep chaining full adders together. For example, here’s a four-bit adder.

This way of implementing sum is called a ripple carry adder – because the carry bits ripple up through the gates. It’s not the most efficient way of producing a sum – each higher order bit of the inputs can’t be added together until the next lower bit is done, so the carry ripples through the network as each bit finishes, and the total time required is proportional to the number of bits to be summed. More bits means that the ripple-carry adder gets slower. But this works, and it’s pretty easy to understand.
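To make the ripple concrete, here is a small Python simulation of a ripple-carry adder. It is only a sketch: the helper names, the least-significant-bit-first list representation, and the default width of 4 are all assumptions for illustration. The loop mirrors the hardware constraint, since each bit's full adder needs the carry produced by the bit below it.

```python
def full_adder(x: int, y: int, carry_in: int) -> tuple[int, int]:
    """Return (sum_bit, carry_out) for two bits plus an incoming carry."""
    partial = x ^ y
    return partial ^ carry_in, (x & y) | (partial & carry_in)


def to_bits(n: int, width: int) -> list[int]:
    """Bits of n, least significant first."""
    return [(n >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def ripple_carry_add(a: int, b: int, width: int = 4) -> tuple[int, int]:
    """Add two width-bit numbers; return (width-bit sum, final carry out)."""
    carry = 0
    out = []
    for x, y in zip(to_bits(a, width), to_bits(b, width)):
        # Each stage must wait for the carry from the stage below it.
        s, carry = full_adder(x, y, carry)
        out.append(s)
    return from_bits(out), carry


if __name__ == "__main__":
    print(ripple_carry_add(1, 3, width=2))
    print(ripple_carry_add(9, 5))
```

With width=2, adding 1 and 3 leaves both sum bits at 0 and sets the final carry, which is the 100 from the worked example. The loop runs once per bit, which is the software analogue of the delay growing in proportion to the number of bits.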

There are faster ways to build multibit adders, by making the gate network more complicated in order to remove the ripple delay. You can imagine, for example, that instead of waiting for the carry from bit 0, you could just build the circuit for bit 1 so that it inputs X0 and Y0; and similarly, for bit 2, you could include X0, X1, Y0, and Y1 as additional inputs. You can imagine how this gets complicated quickly, and there are some timing issues that come up as the network gets more complicated, which I’m really not competent to explain.
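For the two-bit case, that idea can be sketched in a few lines of Python. This is an assumption about what such a circuit might compute, in the spirit of what is usually called carry lookahead, and not a description of any particular real design; the function name and argument order are made up for the example. The point is that bit 1's outputs are written purely in terms of the original inputs, so nothing waits on an intermediate carry wire.

```python
def two_bit_add_lookahead(x1: int, x0: int, y1: int, y0: int) -> tuple[int, int, int]:
    """Add the two-bit numbers x1x0 and y1y0; return (carry_out, sum1, sum0)."""
    sum0 = x0 ^ y0
    # Bit 1 reads X0 and Y0 directly instead of waiting for bit 0's carry.
    sum1 = x1 ^ y1 ^ (x0 & y0)
    # Carry out: either bit 1 generates a carry itself, or it passes along
    # the carry that bit 0 would have generated.
    carry_out = (x1 & y1) | ((x1 ^ y1) & x0 & y0)
    return carry_out, sum1, sum0


if __name__ == "__main__":
    # 01 + 11: the example used earlier in the post.
    print(two_bit_add_lookahead(0, 1, 1, 1))
```

Even at two bits, the expressions for bit 1 are already longer than a plain full adder's, which hints at how quickly the extra inputs pile up for wider adders.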

Hopefully this post successfully explained a bit of how interesting operations like arithmetic can be implemented in hardware, using addition as an example. There are similar gate networks for subtraction, multiplication, etc.

These kinds of gate networks for specific operations are parts of real CPUs. They’re called functional units. In the simplest design, a CPU has one functional unit for each basic arithmetic operation. In practice, it’s a lot more complicated than that, because there are common parts shared by many arithmetic operations, and you can get rid of duplication by creating functional units that do several different things. We might look at how that works in a future post, if people are interested. (Let me know – either in the comments, or email, or mastodon, if you’d like me to brush up on that and write about it.)

December 20, 2022

Mark Chu-Carroll How Computers Work: Logic Gates

At this point, we’ve gotten through a very basic introduction to how the electronic components of a computer work. The next step is understanding how a computer can compute anything.

There are a bunch of parts to this.

  1. How do single operations work? That is, if you’ve got a couple of numbers represented as high/low electrical signals, how can you string together transistors in a way that produces something like the sum of those two numbers?
  2. How can you store values? And once they’re stored, how can you read them back?
  3. How does the whole thing run? It’s a load of transistors strung together – how does that turn into something that can do things in sequence? How can it “read” a value from memory, interpret that as an instruction to perform an operation, and then select the right operation?

In this post, we’ll start looking at the first of those: how are individual operations implemented using transistors?

Boolean Algebra and Logic Gates

The mathematical basis is something called boolean algebra. Boolean algebra is a simple mathematical system with two values: true and false (or 0 and 1, or high and low, or A and B… it doesn’t really matter, as long as there are two, and only two, distinct values).

Boolean algebra looks at the ways that you can combine those true and false values. For example, if you’ve got exactly one value (a bit) that’s either true or false, there are four operations you can perform on it.

  1. Yes: this operation ignores the input, and always outputs True.
  2. No: like Yes, this ignores its input, but in No, it always outputs False.
  3. Id: this outputs the same value as its input. So if its input is true, then it will output true; if its input is false, then it will output false.
  4. Not: this reads its input, and outputs the opposite value. So if the input is true, it will output false; and if the input is false, it will output True.
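One way to see why there are exactly four: a one-input operation is fully determined by what it outputs for False and what it outputs for True, and there are only 2 × 2 ways to choose those. The Python sketch below writes the four operations out and checks that their truth tables cover every possibility. The dictionary and helper names are assumptions made for this example.

```python
from itertools import product

# The four one-input boolean operations from the list above.
UNARY_OPS = {
    "Yes": lambda a: True,    # ignores its input, always True
    "No": lambda a: False,    # ignores its input, always False
    "Id": lambda a: a,        # passes the input through
    "Not": lambda a: not a,   # flips the input
}


def truth_table(op) -> tuple[bool, bool]:
    """Outputs of op for the inputs (False, True), in that order."""
    return op(False), op(True)


if __name__ == "__main__":
    for name, op in UNARY_OPS.items():
        print(f"{name:>3}: False -> {op(False)}, True -> {op(True)}")
    every_possible_table = set(product([False, True], repeat=2))
    covered = {truth_table(op) for op in UNARY_OPS.values()}
    print("covers every one-input table:", covered == every_possible_table)
```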

The beauty of boolean algebra is that it can be physically realized by transistor circuits. Any simple, atomic operation that can be described in boolean algebra can be turned into a simple transistor circuit called a gate. For most of understanding how a computer works, once we understand gates, we can almost ignore the fact that there are transistors behind the scenes: the gates become our building blocks.

The Not Gate

The truth table for boolean NOT

We’ll start with the simplest gate: a not gate. A not gate implements the Not operation from boolean algebra that we described above. In a physical circuit, we’ll interpret a voltage on a wire (“high”) as a 1, and no voltage on the wire (“low”) as a 0. So a not gate should output low (no voltage) when its input is high, and it should output high (a voltage) when its input is low. We usually write that as something called a truth table, which shows all of the possible inputs, and all of the possible outputs. In the truth table, we usually write 0s and 1s: 0 for low (or no current), and 1 for high. For the NOT gate, the truth table has one input column, and one output column.

Circuit diagram of a NOT gate

I’ve got a sketch of a not gate in the image to the side. It consists of two transistors: a standard (default-off) transistor, which is labelled “B”, and a complementary transistor (default-on) labeled A. A power supply is provided on the emitter of transistor A, and then the collector of A is connected to the emitter of B, and the collector of B is connected to ground. Finally, the input is split and connected to the bases of both transistors, and the output is connected to the wire that connects the collector of A and the emitter of B.

That all sounds complicated, but the way that it works is simple. In an electric circuit, the current will always follow the easiest path. If there’s a short path to ground, the current will always follow that path. And ground is always low (off/0). Knowing that, let’s look at what this will do with its inputs.

Suppose that the input is 0 (low). In that case, transistor A will be on, and transistor B will be off. Since B is off, there’s no path from the power to ground; and since A is on, current will flow from the power supply through A to the output, making it high.

Now suppose that the input is 1 (high). In that case, A turns off, and B turns on. Since A is off, there’s no path from the power line to the output. And since B is on, the circuit has connected the output to ground, making it low.

Our not gate is, basically, a switch. If its input is high, then the switch attaches the output to ground; if its input is low, then the switch attaches the output to power.
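That switch view is easy to mimic in code. The Python sketch below is a deliberately idealized model: it treats each transistor as a perfect on/off switch and ignores voltages and currents entirely. The function and variable names are assumptions for illustration.

```python
def not_gate(inp: int) -> int:
    """Idealized switch-level model of the two-transistor NOT gate."""
    a_on = inp == 0  # complementary (default-on) transistor A conducts on a low input
    b_on = inp == 1  # standard (default-off) transistor B conducts on a high input
    if a_on and not b_on:
        return 1     # output is connected to power through A
    if b_on and not a_on:
        return 0     # output is connected to ground through B
    raise ValueError("input must be 0 or 1")


if __name__ == "__main__":
    for inp in (0, 1):
        print(inp, "->", not_gate(inp))
```

Exactly one of the two switches is closed for any valid input, so the output is always tied to either power or ground and never to both.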

The NAND gate

Let’s try moving on to something more interesting: a NAND gate. A NAND gate takes two inputs, and outputs high when any of its inputs is low. Engineers love NAND gates, because you can create any boolean operation by combining NAND gates. We’ll look at that in a bit more detail later.

Input X | Input Y | Output
0 | 0 | 1
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
The truth table for NAND

Here’s a diagram of a NAND gate. Since there are lots of wires running around and crossing each other, I’ve labeled the transistors, and made each of the wires a different color:

  • Connections from the power source are drawn in green.
  • Connections from input X are drawn in blue.
  • Connections from input Y are drawn in red.
  • The complementary transistors are labelled C1 and C2.
  • The output of the two complementary transistors is labelled “cout”, and drawn in purple.
  • The two default-off transistors are labelled T1 and T2.
  • The output from the gate is drawn in brown.
  • Connections to ground are drawn in black.

Let’s break down how this works:

  • In the top section, we’ve got the two complementary (default-on) transistors. If either of the inputs is 0 (low), then the corresponding transistor stays on, and passes a 1 to the cout line. There’s no connection to ground, and there is a connection to power via one (or both) of the on transistors, so the output of the circuit will be 1 (high).
  • If neither of the inputs is low, then both C1 and C2 turn off. Cout is then not getting any voltage, and it’s 0. You might think that this is enough – but we want to force the output all the way to 0, and there could be some residual electrons in C1 and C2 from the last time they were active. So we need to provide a path to drain that, instead of allowing it to possibly affect the output of the gate. That’s what T1 and T2 are for on the bottom. If both X and Y are high, then both T1 and T2 will be on – and that will provide an open path to ground, draining the system, so that the output is 0 (low).
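The two cases above can be sketched in Python as a toy switch-level model. The function name `nand_gate` and the variables `pull_up` and `pull_down` are hypothetical names of mine. I’m also assuming, based on the description, that C1 and C2 sit in parallel between power and cout, and that T1 and T2 sit in series on the way to ground.

```python
def nand_gate(x: int, y: int) -> int:
    """Toy switch-level model of the four-transistor NAND gate (illustrative only)."""
    if x not in (0, 1) or y not in (0, 1):
        raise ValueError("inputs must be 0 (low) or 1 (high)")

    # C1 and C2 are complementary (default-on): each conducts while its input is low.
    c1_on = (x == 0)
    c2_on = (y == 0)
    # T1 and T2 are default-off: each conducts while its input is high.
    t1_on = (x == 1)
    t2_on = (y == 1)

    # Assumed wiring: either C transistor alone connects power to cout.
    pull_up = c1_on or c2_on
    # Assumed wiring: ground is reached only when both T transistors conduct.
    pull_down = t1_on and t2_on

    # Exactly one of the two paths should be active for any valid input.
    assert pull_up != pull_down
    return 1 if pull_up else 0
```

Running it over the four input pairs reproduces the NAND truth table: the output is 0 only when both X and Y are 1.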

Combining Gates

There are ways of building gates for each of the other basic binary operators in boolean algebra: AND, OR, NOR, XOR, and XNOR. But in fact, we don’t need to know how to do those – because in practice, all we need is a NAND gate. You can combine NAND gates to produce any other gate that you want. (Similarly, you can do the same with NOR gates. NAND and NOR are called universal gates for this reason.)

Let’s look at how that works. First, we need to know how to draw gates in a schematic form, and what each of the basic operations does. So here’s a chart of each operation, its name, its standard drawing in a schematic, and its truth table.

Just like we did with the basic gates above, we’ll start with NOT. Using boolean logic identities, we can easily derive that \lnot A = A \lnot\land A; or in English, “not A” is the same thing as “A nand A”. In gates, that’s easy to build: it’s a NAND gate with both of its inputs coming from the same place:

For a more interesting one, let’s look at AND, and see how we can build that using just NAND gates. We can go right back to boolean algebra, and play with identities. We want A \land B. It’s pretty straightforward in terms of logic: “A \land B” is the same as \lnot (A \lnot\land B).

That’s just two NAND gates strung together, like this:

We can do the same basic thing with all of the other basic boolean operations. We start with boolean algebra to figure out equivalences, and then translate those into chains of gates.
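To make that concrete, here’s a small Python sketch that builds the other operations out of nothing but a `nand` function. The function names are my own, and the particular constructions for OR and XOR are standard textbook ones (OR comes from De Morgan’s law; XOR uses the usual four-NAND arrangement) rather than anything specific to the diagrams in this post.

```python
def nand(a: int, b: int) -> int:
    """The only primitive: output is 0 exactly when both inputs are 1."""
    return 1 - (a & b)

def not_(a: int) -> int:
    # NOT A = A nand A (both inputs wired to the same place)
    return nand(a, a)

def and_(a: int, b: int) -> int:
    # A and B = not(A nand B): two NAND gates strung together
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    # De Morgan: A or B = (not A) nand (not B)
    return nand(not_(a), not_(b))

def nor_(a: int, b: int) -> int:
    return not_(or_(a, b))

def xor_(a: int, b: int) -> int:
    # Four-NAND XOR: share the intermediate result n = A nand B
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def xnor_(a: int, b: int) -> int:
    return not_(xor_(a, b))

def truth_table(gate):
    """Return the gate's outputs for inputs (0,0), (0,1), (1,0), (1,1)."""
    return [gate(a, b) for a in (0, 1) for b in (0, 1)]
```

Something like `truth_table(xor_)` lets you compare each derived gate against the truth table you expect, which is a handy sanity check before translating an identity into a chain of gates.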

With that, we’ve got the basics of boolean algebra working in transistors. But we still aren’t doing interesting computations. The next step is building up: combining collections of gates together to do more complicated things. In the next post, we’ll look at an example of that, by building an adder: a network of gates that performs addition!

December 19, 2022

John Preskill Eight highlights from publishing a science book for the general public

What’s it like to publish a book?

I’ve faced the question again and again this year, as my book Quantum Steampunk hit bookshelves in April. Two responses suggest themselves.

On the one hand, I channel the Beatles: It’s a hard day’s night. Throughout the publication process, I undertook physics research full-time. Media opportunities squeezed themselves into the corners of the week: podcast and radio-show recordings, public-lecture preparations, and interviews with journalists. After submitting physics papers to coauthors and journals, I drafted articles for Quanta Magazine, Literary Hub, the New Scientist newsletter, and other venues—then edited the articles, then edited them again, and then edited them again. Often, I apologized to editors about not having the freedom to respond to their comments till the weekend. Before public-lecture season hit, I catalogued all the questions that I imagined anyone might ask, and I drafted answers. The resulting document spans 16 pages, and I study it before every public lecture and interview.

Public lecture at the Institute for the Science of Origins at Case Western Reserve University

Answer number two: Publishing a book is like a cocktail of watching the sun rise over the Pacific from Mt. Fuji, taking off in an airplane for the first time, and conducting a symphony in Carnegie Hall.1 I can scarcely believe that I spoke in the Talks at Google lecture series—a series that’s hosted Tina Fey, Noam Chomsky, and Andy Weir! And I found my book mentioned in the Boston Globe! And in a Dutch science publication! If I were an automaton from a steampunk novel, the publication process would have wound me up for months.

Publishing a book has furnished my curiosity cabinet of memories with many a seashell, mineral, fossil, and stuffed crocodile. Since you’ve asked, I’ll share eight additions that stand out.

Breakfast on publication day. Because how else would one celebrate the publication of a steampunk book?

1) I guest-starred on a standup-comedy podcast. Upon moving into college, I received a poster entitled 101 Things to Do Before You Graduate from Dartmouth. My list of 101 Things I Never Expected to Do in a Physics Career includes standup comedy.2 I stand corrected.

Comedian Anthony Jeannot bills his podcast Highbrow Drivel as consisting of “hilarious conversations with serious experts.” I joined him and guest comedienne Isabelle Farah in a discussion about film studies, lunch containers, and hippies, as well as quantum physics. Anthony expected me to act as the straight man, to my relief. That said, after my explanation of how quantum computers might help us improve fertilizer production and reduce global energy consumption, Anthony commented that, if I’d been holding a mic, I should have dropped it. I cherish the memory despite having had to look up the term mic drop when the recording ended.

At Words Worth Books in Waterloo, Canada

2) I met Queen Victoria. In mid-May, I arrived in Canada to present about my science and my book at the University of Toronto. En route to the physics department, I stumbled across the Legislative Assembly of Ontario. Her Majesty was enthroned in front of the intricate sandstone building constructed during her reign. She didn’t acknowledge me, of course. But I hope she would have approved of the public lecture I presented about physics that blossomed during her era. 

Her Majesty, Queen Victoria

3) You sent me your photos of Quantum Steampunk. They arrived through email, Facebook, Twitter, text, and LinkedIn. They showed you reading the book, your pets nosing it, steampunk artwork that you’d collected, and your desktops and kitchen counters. The photographs have tickled and surprised me, although I should have expected them, upon reflection: Quantum systems submit easily to observation by their surroundings.3 Furthermore, people say that art—under which I classify writing—fosters human connection. Little wonder, then, that quantum physics and writing intersect in shared book selfies.

Photos from readers

4) A great-grandson of Ludwig Boltzmann’s emailed. Boltzmann, a 19th-century Austrian physicist, helped mold thermodynamics and its partner discipline statistical mechanics. So I sat up straighter upon opening an email from a physicist descended from the giant. Said descendant turned out to have watched a webinar I’d presented for the magazine Physics Today. Although time machines remain in the domain of steampunk fiction, they felt closer to reality that day.

5) An experiment bore out a research goal inspired by the book. My editors and I entitled the book’s epilogue Where to next? The future of quantum steampunk. The epilogue spurred me to brainstorm about opportunities and desiderata—literally, things desired. Where did I want for quantum thermodynamics to head? I shared my brainstorming with an experimentalist later that year. We hatched a project, whose experiment concluded this month. I’ll leave the story for after the paper debuts, but I can say for now that the project gives me chills—in a good way.

6) I recited part of Edgar Allan Poe’s “The Raven” with a fellow physicist at a public lecture. The Harvard Science Book Talks form a lecture series produced by the eponymous university and bookstore. I presented a talk hosted by Jacob Barandes—a Harvard physics lecturer, the secret sauce behind the department’s graduate program, and an all-around exemplar of erudition. He asked how entropy relates to “The Raven.”

Image from the Harvard Gazette

For the full answer, see chapter 11 of my book. Briefly: Many entropies exist. They quantify the best efficiencies with which we can perform thermodynamic tasks such as running an engine. Different entropies can quantify different tasks’ efficiencies if the systems are quantum, otherwise small, or far from equilibrium—outside the purview of conventional 19th-century thermodynamics. Conventional thermodynamics describes many-particle systems, such as factory-scale steam engines. We can quantify conventional systems’ efficiencies using just one entropy: the thermodynamic entropy that you’ve probably encountered in connection with time’s arrow. How does this conventional entropy relate to the many quantum entropies? Imagine starting with a quantum system, then duplicating it again and again, until accruing infinitely many copies. The copies’ quantum entropies converge (loosely speaking), collapsing onto one conventional-looking entropy. The book likens this collapse to a collapse described in “The Raven”:

The speaker is a young man who’s startled, late one night, by a tapping sound. The tapping exacerbates his nerves, which are on edge due to the death of his love: “Deep into that darkness peering, long I stood there wondering, fearing, / Doubting, dreaming dreams no mortal ever dared to dream before.” The speaker realizes that the tapping comes from the window, whose shutter he throws open. His wonders, fears, doubts, and dreams collapse onto a bird’s form as a raven steps inside. So do the many entropies collapse onto one entropy as the system under consideration grows infinitely large. We could say, instead, that the entropies come to equal each other, but I’d rather picture “The Raven.” 

I’d memorized the poem in high school but never had an opportunity to recite it for anyone—and it’s a gem to declaim. So I couldn’t help reciting a few stanzas in response to Jacob. But he turned out to have memorized the poem, too, and responded with the next several lines! Even as a physicist, I rarely have the chance to reach such a pinnacle of nerdiness.

With Pittsburgh Quantum Institute head honchos Rob Cunningham and Adam Leibovich

7) I stumbled across a steam-driven train in Pittsburgh. Since long before self-driving cars heightened the city’s futuristic vibe, Pittsburgh has been as steampunk as the Nautilus. Captains of industry (or robber barons, if you prefer) raised the city on steel that fed the Industrial Revolution.4 And no steampunk city would deserve the title without a Victorian botanical garden.

A Victorian botanical garden features in chapter 5 of my book. To see a real-life counterpart, visit the Phipps Conservatory. A poem in glass and aluminum, the Phipps opened in 1893 and even boasts a Victoria Room.

Yes, really.

I sneaked into the Phipps during the Pittsburgh Quantum Institute’s annual conference, where I was to present a public lecture about quantum steampunk. Upon reaching the sunken garden, I stopped in my tracks. Yards away stood a coal-black, 19th-century steam train. 

At least, an imitation train stood yards away. The conservatory had incorporated Monet paintings into its scenery during a temporary exhibition. Amongst the palms and ponds were arranged props inspired by the paintings. Monet painted The Gare Saint-Lazare: Arrival of a Train near a station, so a miniature train stood behind a copy of the artwork. The scene found its way into my public lecture—justifying my playing hooky from the conference for a couple of hours (I was doing research for my talk!).

My book’s botanical garden houses hummingbirds, wildebeests, and an artificial creature called a Yorkicockasheepapoo. I can’t promise that you’ll spy Yorkicockasheepapoos while wandering the Phipps, but send me a photo if you do.

8) My students and postdocs presented me with a copy of Quantum Steampunk that they’d signed. They surprised me one afternoon, shortly after publication day, as I was leaving my office. The gesture ranks as one of the most adorable things that’ve ever happened to me, and their book is now the copy that I keep on campus. 

Students…book-selfie photographers…readers halfway across the globe who drop a line…People have populated my curiosity cabinet with some of the most extraordinary book-publication memories. Thanks for reading, and thanks for sharing.

Book signing after public lecture at Chapman University. Photo from Justin Dressel.

1Or so I imagine, never having watched the sun rise from Mt. Fuji or conducted any symphony, let alone one at Carnegie Hall, and having taken off in a plane for the first time while two months old.

2Other items include serve as an extra in a film, become stranded in Taiwan, and publish a PhD thesis whose title contains the word “steampunk.”

3This ease underlies the difficulty of quantum computing: Any stray particle near a quantum computer can “observe” the computer—interact with the computer and carry off a little of the information that the computer is supposed to store.

4The Pittsburgh Quantum Institute includes Carnegie Mellon University, which owes its name partially to captain of industry Andrew Carnegie.