Planet Musings

September 26, 2023

Doug Natelson A few quick highlights

 It's been a very busy time, hence my lower posting frequency.  It was rather intense trying to attend both the KITP conference and the morning sessions of the DOE experimental condensed matter PI meeting (pdf of agenda here).  A few quick highlights that I thought were interesting:

  • Kagome metals of the form AV3Sb5 are very complicated.  In these materials, in the a-b plane the V atoms form a Kagome lattice (before that one reader corrects me, I know that this is not formally a lattice from the crystallographic point of view, just using the term colloquially).  Band structure calculations show that there are rather flat bands (for an explanation, see here) near the Fermi level, and there are Dirac cones, van Hove singularities, Fermi surface nesting, etc.  These materials have nontrivial electronic topology, and CsV3Sb5 and KV3Sb5 both have charge density wave transitions and low-temperature superconductivity.  Here is a nice study of the CDW in CsV3Sb5, and here is a study that shows that there is no spontaneous breaking of time-reversal symmetry below that transition.  This paper shows that there is funky nonlinear electronic transport (apply a current at frequency \(\omega\), measure a voltage at frequency \(2 \omega\)) in CsV3Sb5 that is switchable in sign with an out-of-plane magnetic field.  Weirdly, that is not seen in KV3Sb5 even though the basic noninteracting band structures of the two materials are almost identical, implying that it has something to do with electronic correlation effects.
  • Related to that last paper, here is a review article about using focused ion beams for sample preparation and material engineering.  It's pretty amazing what can be done with these tools, including carving out micro/nanostructured devices from originally bulk crystals of interesting materials.  
  • The temperature-dependent part of the electrical resistivity of Fermi liquids is expected to scale like \(T^{2}\) as \(T \rightarrow 0\).  One can make a very general argument (that ignores actual kinematic restrictions on scattering) based on the Pauli exclusion principle that the inelastic e-e scattering rate should go like \(T^{2}\) (number of electron quasiparticles excited goes like \(T\), number of empty states available to scatter into also goes like \(T\)).  However, actually keeping track of momentum conservation, it turns out that one usually needs Umklapp scattering processes to get this.  That isn't necessary all the time, however.  In very low density metals, the Fermi wavevector is far from the Brillouin zone boundary and so Umklapp should not be important, but it is still possible to get \(T^{2}\) resistivity (see here as well).  Similarly, in 3He, a true Fermi liquid, there is no lattice, so there is no such thing as Umklapp, but at the lowest temperatures the \(T^{2}\) thermal conduction is still seen (though some weird things happen at higher temperatures). 
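The phase-space argument in the last bullet can be summarized in one line (a standard Fermi-liquid estimate, spelled out here for convenience rather than taken from the meeting):

```latex
\frac{1}{\tau_{ee}} \;\sim\; \frac{E_F}{\hbar}
\underbrace{\left(\frac{k_B T}{E_F}\right)}_{\substack{\text{thermally excited}\\ \text{quasiparticles}}}
\underbrace{\left(\frac{k_B T}{E_F}\right)}_{\substack{\text{empty final}\\ \text{states}}}
\;\propto\; T^2
\qquad\Longrightarrow\qquad
\rho(T) \approx \rho_0 + A\,T^2
```

with the caveat from the bullet that this counting ignores momentum conservation, which is why Umklapp (or some other momentum-relaxing process) is usually needed before the scattering rate actually shows up in the resistivity.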
There are more, but I have to work on writing some other things.  More soon....

September 25, 2023

Tommaso Dorigo Mesmerizing Shapes - Symmetries According To An AI

Having spent the past 12 months coding up an end-to-end model of an astrophysics experiment, with the sole aim of searching for an optimal solution for its design by use of stochastic gradient descent, I am the least qualified person to judge the aesthetic value of the results I am finally getting from it. 
Therefore it makes sense to ask you, dear reader, what you think of the eerily arcane geometries that the system is proposing. I do not think that to be a good judge you need to know the details of how the model is put together, but I will nevertheless make an attempt at briefing you on it, just in case it makes a difference in your judgment.


September 24, 2023

David Hogg planning your science

I had two interactions today that made me think seriously about big-picture and design things. I like design language: How do you design your whole research program, and how do you design individual projects so that they fit into it? One interaction was in the Astronomical Data Meeting at Flatiron, where Vivi Acquaviva (CUNY) talked about the intersection between what you are good at, what is important, and what brings you joy. That's a hard intersection to find. Or way too easy; I am not sure. The other interaction was a conversation with Jiayin Dong (Flatiron), who is thinking about faculty job applications and the like. How do you talk about your research in terms of the next decade instead of the next year?

One comment that is frequently made by Hans-Walter Rix (MPIA) is that he feels like most early-career (and even mid-career) people spend too much time doing their science and not enough time planning and justifying their science. It is important to be able to answer “why” questions about your research, and in the medium term it helps all your projects.

David Hogg lost

I got really lost with respect to research today. In almost all of my projects I am supposed to be mentoring postdocs and students. Today various blocks came up that interfered with that mentoring. And then I found that I had nothing sensible to work on! Of course that isn't true: I have literally a dozen projects in a mature state waiting on final work from me. But I couldn't figure out how to work on any of them. Research is hard. At the end of the day, Andy Casey (Monash) helped me out by giving me some very specific jobs to do.

n-Category Café The Moduli Space of Acute Triangles

I wrote a little article explaining the concept of ‘moduli space’ through an example. It’s due October 1st so I’d really appreciate it if you folks could take a look and see if it’s clear enough. It’s really short, and it’s written for people who know some math, but not necessarily anything about moduli spaces.

The cool part is the connection between the moduli space of acute triangles — that is, the space of all shapes an acute triangle can have — and the more famous moduli space of elliptic curves.

There’s a lot more one could do with this, e.g. describing the modular lambda function as the cross ratio of the 4 points on a sphere obtained by taking an acute triangle, dividing it into 4 similar acute triangles, and folding it up to form a tetrahedron, which is conformally equivalent to a sphere. But I didn’t have space for that here!

Okay, here goes.

The moduli space of acute triangles

In mathematics we often like to classify objects up to isomorphism. Sometimes the classification is discrete, but sometimes we have a notion of when two objects are ‘close’. Then we can make the set of isomorphism classes into a topological space called a ‘moduli space’. A simple example is the moduli space of acute triangles. In simple terms, this is the space of all possible shapes that an acute triangle can have, where we count two triangles as having the same shape if they are similar.

As a first step, consider triangles with labeled vertices in the complex plane. Every triangle is similar to one with its first vertex at 0, the second at 1, and the third at some point in the upper half-plane. This triangle is acute precisely when its third vertex lies in this set:

T = \left\{z \in \mathbb{C} \; \left\vert \; \mathrm{Im}(z) > 0, \; 0 < \mathrm{Re}(z) < 1, \; |z - \tfrac{1}{2}| > \tfrac{1}{2} \right. \right\}

So, we say T is the moduli space of acute triangles with labeled vertices. This set is colored yellow and purple above; the yellow and purple regions on top extend infinitely upward.

To get the moduli space of acute triangles with unlabeled vertices, we must mod out T by the action of S_3 that permutes the three vertices. The 6 yellow and purple regions in T are ‘fundamental domains’ for this S_3 action: that is, they each contain exactly one point from each orbit. If we reflect a labeled triangle corresponding to a point in a yellow region we get a triangle corresponding to a point in a purple region, and vice versa. Points on the boundary between two regions correspond to isosceles triangles. All 6 regions meet at the point that corresponds to an equilateral triangle.
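As a quick numerical sanity check (my own illustrative code, not part of the article; the function names are made up), one can normalize a labeled triangle by the similarity sending its first two vertices to 0 and 1, and then test membership in T:

```python
def third_vertex(v0, v1, v2):
    """Apply the similarity sending v0 -> 0 and v1 -> 1 to the third
    vertex; reflect if needed so the image lands in the upper half-plane."""
    z = (v2 - v0) / (v1 - v0)
    return z if z.imag > 0 else z.conjugate()

def is_acute(v0, v1, v2):
    """A labeled triangle is acute iff its normalized third vertex lies
    in T = {Im z > 0, 0 < Re z < 1, |z - 1/2| > 1/2}.  The circle
    condition is Thales' theorem: on |z - 1/2| = 1/2 the angle at z
    is a right angle."""
    z = third_vertex(v0, v1, v2)
    return z.imag > 0 and 0 < z.real < 1 and abs(z - 0.5) > 0.5

# Equilateral triangle: z = 1/2 + i sqrt(3)/2, safely inside T.
print(is_acute(0, 1, 0.5 + 1j * 3**0.5 / 2))      # True
# Right triangle: third vertex on the circle |z - 1/2| = 1/2.
print(is_acute(0, 1, 0.5 + 0.5j))                 # False
# Obtuse triangle, rotated and scaled by w: similarity invariance.
w = 2 + 1j
print(is_acute(w * 0, w * 1, w * (1.7 + 0.2j)))   # False
```

The last example checks that the answer depends only on the similarity class, as it should for a moduli space.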

The moduli space of acute triangles is closely related to a more famous moduli space: that of elliptic curves. The group \mathrm{GL}(2,\mathbb{Z}), consisting of invertible 2 \times 2 integer matrices, acts on the upper half-plane

\mathcal{H} = \left\{z \in \mathbb{C} \; \vert \; \mathrm{Im}(z) > 0 \right\}

as follows:

\left( \begin{array}{cc} a & b \\ c & d \end{array} \right) \colon z \mapsto \frac{a z + b}{c z + d} .

The light and dark regions shown above are fundamental domains for this group action. Elements of \mathrm{GL}(2,\mathbb{Z}) with determinant -1 map light regions to dark ones and vice versa. Elements with determinant 1 map light regions to light ones and dark ones to dark ones. People more often study the action of the subgroup \mathrm{SL}(2,\mathbb{Z}) consisting of g \in \mathrm{GL}(2,\mathbb{Z}) with determinant 1. The union of a light region and a dark one, touching each other, forms a fundamental domain of \mathrm{SL}(2,\mathbb{Z}).

For any point z \in \mathcal{H} we can form a parallelogram with vertices 0, 1, z and z+1. If we identify the opposite edges of this parallelogram we get an elliptic curve: a torus equipped with the structure of a complex manifold. We can get every elliptic curve this way, at least up to isomorphism. Moreover, two points z, z' \in \mathcal{H} give isomorphic elliptic curves iff z' = g z for some g \in \mathrm{SL}(2,\mathbb{Z}). Thus the quotient space \mathcal{H}/\mathrm{SL}(2,\mathbb{Z}) is the moduli space of elliptic curves: points in this space correspond to isomorphism classes of elliptic curves.

Since T is the union of three fundamental domains for \mathrm{SL}(2,\mathbb{Z}), there is a map

p \colon T \to \mathcal{H}/\mathrm{SL}(2,\mathbb{Z})

from the moduli space of acute triangles to the moduli space of elliptic curves, and generically this map is three-to-one. This map is not onto, but if we take the closure of T inside \mathcal{H} we get a larger set

\overline{T} = \left\{z \in \mathbb{C} \; \vert \; \mathrm{Im}(z) > 0, \; 0 \le \mathrm{Re}(z) \le 1, \; |z - \tfrac{1}{2}| \ge \tfrac{1}{2} \right\}

whose boundary consists of points corresponding to right triangles. Then p extends to an onto map

p \colon \overline{T} \to \mathcal{H}/\mathrm{SL}(2,\mathbb{Z}) .

The existence of this map suggests that from any acute or right triangle in the plane we can construct an elliptic curve, in such a way that similar triangles give isomorphic elliptic curves. This is in fact true! How can we understand this more directly?

Take any acute or right triangle with labelled vertices in the complex plane. Rotating it 180° around the midpoint of any edge we get another triangle. The union of these two triangles is a parallelogram. Identifying opposite edges of this parallelogram we get a torus with a complex structure — and this is an elliptic curve! There are three choices of how to build this parallelogram, one for each edge of the original triangle, but they give isomorphic elliptic curves. Also, similar triangles give isomorphic elliptic curves. Even better, every elliptic curve is isomorphic to one arising from this construction. So this construction gives a map from \overline{T} onto \mathcal{H}/\mathrm{SL}(2,\mathbb{Z}), and with a little thought one can see that this map is p.

I learned about the moduli space of acute triangles from James Dolan. There has also been interesting work on the moduli space of all triangles in the plane. Gaspar and Neto [2] noticed that this space is a triangle, and Ian Stewart later gave a more geometrical explanation [3]. In fact all the moduli spaces mentioned here are better thought of as moduli ‘stacks’: stacks give a way to understand the special role of more symmetrical objects, like isosceles and equilateral triangles. Kai Behrend [1] has written an introduction to stacks using various moduli stacks of triangles and the moduli space of elliptic curves as examples. Though he does not describe the map p (or its stacky analogue), his work is a nice way to dig deeper into some of the material discussed here.


[1] K. Behrend, An introduction to algebraic stacks, in Moduli Spaces, eds. L. Brambila-Paz, P. Newstead, R. P. Thomas and O. García-Prada, Cambridge U. Press, Cambridge 2014, pp. 1–131.

[2] J. Gaspar and O. Neto, All triangles at once, Amer. Math. Monthly 122 (2015), 982.

[3] I. Stewart, Why do all triangles form a triangle?, Amer. Math. Monthly 124 (2017), 70–73.

John Baez The Moduli Space of Acute Triangles

Recently Quanta magazine came out with an article explaining modular forms:

• Jordana Cepelewicz, Behold modular forms, the ‘fifth fundamental operation’ of math, Quanta, 23 September 2023.

It does a heroically good job. One big thing it doesn’t do is explain these funny looking ‘fundamental domains’ in the upper half-plane:

That is: where does this picture come from and why is it important?

By sheer coincidence, I just wrote a little article explaining the concept of ‘moduli space’ through an example, which does touch on these fundamental domains. It’s due October 1st so I’d really appreciate it if you folks could take a look and see if it’s clear enough. It’s really short, and it’s written for people who know some math—more than your typical Quanta reader—but not necessarily anything about moduli spaces.

The cool part is the connection between the moduli space of acute triangles—that is, the space of all shapes an acute triangle can have—and the more famous moduli space of elliptic curves.

Okay, here goes.

The moduli space of acute triangles

In mathematics we often like to classify objects up to isomorphism. Sometimes the classification is discrete, but sometimes we have a notion of when two objects are ‘close’. Then we can make the set of isomorphism classes into a topological space called a ‘moduli space’. A simple example is the moduli space of acute triangles. In simple terms, this is the space of all possible shapes that an acute triangle can have, where we count two triangles as having the same shape if they are similar.

As a first step, consider triangles with labeled vertices in the complex plane. Every triangle is similar to one with its first vertex at 0, the second at 1, and the third at some point in the upper half-plane. This triangle is acute precisely when its third vertex lies in this set:

T = \left\{z \in \mathbb{C} \; \left\vert \; \mathrm{Im}(z) > 0, \; 0 < \mathrm{Re}(z) < 1, \; |z - \tfrac{1}{2}| > \tfrac{1}{2} \right. \right\}

which is colored yellow and purple above. So, we say T is the moduli space of acute triangles with labeled vertices.

To get the moduli space of acute triangles with unlabeled vertices, we must mod out T by the action of S_3 that permutes the three vertices. The 6 yellow and purple regions in T are fundamental domains for this S_3 action. If we reflect a labeled triangle corresponding to a point in a yellow region we get a triangle corresponding to a point in a purple region, and vice versa. Points on the boundary between two regions correspond to isosceles triangles. All 6 regions meet at the point that corresponds to an equilateral triangle.

The moduli space of acute triangles is closely related to a more famous moduli space: that of elliptic curves. The group \mathrm{GL}(2,\mathbb{Z}), consisting of invertible 2 \times 2 integer matrices, acts on the upper half-plane

\mathcal{H} = \left\{z \in \mathbb{C} \; \vert \; \mathrm{Im}(z) > 0 \right\}

as follows:

\displaystyle{  \left( \begin{array}{cc} a & b \\ c & d \end{array} \right) \colon z \mapsto \frac{a z + b}{c z + d}  }

The light and dark regions shown above are fundamental domains for this group action. Elements of \mathrm{GL}(2,\mathbb{Z}) with determinant -1 map light regions to dark ones and vice versa. Elements with determinant 1 map light regions to light ones and dark ones to dark ones. People more often study the action of the subgroup \mathrm{SL}(2,\mathbb{Z}) consisting of g \in \mathrm{GL}(2,\mathbb{Z}) with determinant 1. The union of a light region and a dark one, touching each other, forms a fundamental domain of \mathrm{SL}(2,\mathbb{Z}).
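The way these fundamental domains work can be made concrete with a short script (my own, with an invented function name, not from the post): every point of \mathcal{H} can be moved into the standard fundamental domain |Re z| <= 1/2, |z| >= 1 by repeatedly applying the generators z -> z + 1 and z -> -1/z of \mathrm{SL}(2,\mathbb{Z}).

```python
def reduce_to_fundamental_domain(z, max_steps=1000):
    """Move z in the upper half-plane into the standard fundamental
    domain of SL(2,Z): |Re z| <= 1/2 and |z| >= 1, using the
    generators z -> z + 1 (translation) and z -> -1/z (inversion).
    Boundary points may land on either of two identified edges."""
    for _ in range(max_steps):
        z = complex(z.real - round(z.real), z.imag)  # translate into |Re z| <= 1/2
        if abs(z) >= 1:
            return z
        z = -1 / z                                   # invert if inside the unit circle
    raise RuntimeError("did not converge")

z = 0.3 + 0.8j
g_z = (2 * z + 1) / (z + 1)   # action of [[2, 1], [1, 1]], determinant 1
a = reduce_to_fundamental_domain(z)
b = reduce_to_fundamental_domain(g_z)
print(abs(a - b) < 1e-9)      # True: SL(2,Z)-equivalent points reduce to the same representative
```

That two SL(2,Z)-equivalent points reduce to the same representative is exactly the statement that the fundamental domain parametrizes the quotient \mathcal{H}/\mathrm{SL}(2,\mathbb{Z}) (away from its boundary identifications).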

For any point z \in \mathcal{H} we can form a parallelogram with vertices 0, 1, z and z+1. If we identify the opposite edges of this parallelogram we get an elliptic curve: a torus equipped with the structure of a complex manifold. We can get every elliptic curve this way, at least up to isomorphism. Moreover, two points z,z' \in \mathcal{H} in the upper half-plane give isomorphic elliptic curves iff z' = g z for some g \in \mathrm{SL}(2,\mathbb{Z}). Thus the quotient space \mathcal{H}/\mathrm{SL}(2,\mathbb{Z}) is the moduli space of elliptic curves: points in this space correspond to isomorphism classes of elliptic curves.

Since T is the union of three fundamental domains for \mathrm{SL}(2,\mathbb{Z}), there is a map

p \colon T \to \mathcal{H}/\mathrm{SL}(2,\mathbb{Z})

from the moduli space of acute triangles to the moduli space of elliptic curves, and generically this map is three-to-one. This map is not onto, but if we take the closure of T inside \mathcal{H} we get a larger set

\overline{T} = \left\{z \in \mathbb{C} \; \vert \; \mathrm{Im}(z) > 0, \; 0 \le \mathrm{Re}(z) \le 1, \; |z - \tfrac{1}{2}| \ge \tfrac{1}{2} \right\}

whose boundary consists of points corresponding to right triangles. Then p extends to an onto map

p \colon \overline{T} \to \mathcal{H}/\mathrm{SL}(2,\mathbb{Z})

The existence of this map suggests that from any acute or right triangle in the plane we can construct an elliptic curve, in such a way that similar triangles give isomorphic elliptic curves. This is in fact true! How can we understand this more directly?

Take any acute or right triangle with labelled vertices in the complex plane. Rotating it 180° around the midpoint of any edge we get another triangle. The union of these two triangles is a parallelogram. Identifying opposite edges of this parallelogram we get a torus with a complex structure—and this is an elliptic curve! There are three choices of how to build this parallelogram, one for each edge of the original triangle, but they give isomorphic elliptic curves. Also, similar triangles give isomorphic elliptic curves. Even better, every elliptic curve is isomorphic to one arising from this construction. So this construction gives a map from \overline{T} onto \mathcal{H}/\mathrm{SL}(2,\mathbb{Z}), and with a little thought one can see that this map is p.

I learned about the moduli space of acute triangles from James Dolan. There has also been interesting work on the moduli space of all triangles in the plane. Gaspar and Neto [2] noticed that this space is a triangle, and Ian Stewart later gave a more geometrical explanation [3]. In fact all the moduli spaces mentioned here are better thought of as moduli ‘stacks’: stacks give a way to understand the special role of more symmetrical objects, like isosceles and equilateral triangles. Kai Behrend [1] has written an introduction to stacks using various moduli stacks of triangles and the moduli space of elliptic curves as examples. Though he does not describe the map p (or its stacky analogue), his work is a nice way to dig deeper into some of the material discussed here.


[1] K. Behrend, An introduction to algebraic stacks, in Moduli Spaces, eds. L. Brambila-Paz, P. Newstead, R. P. Thomas and O. García-Prada, Cambridge U. Press, Cambridge 2014, pp. 1–131.

[2] J. Gaspar and O. Neto, All triangles at once, Amer. Math. Monthly 122 (2015), 982.

[3] I. Stewart, Why do all triangles form a triangle?, Amer. Math. Monthly 124 (2017), 70–73.

Matt Strassler Mass, Weight, and Fields

Today a reader asked me “Out of the quantum fields which have mass, do any of them also have weight?” I thought other readers would be interested in my answer, so I’m putting it here. (Some of what is discussed below is covered in greater detail in my upcoming book.)

Before we start, we need to rephrase the question, because fields do not have mass.

Mass and Weight of Particles and Other Objects

For ordinary objects ranging from particles to planets, the corresponding question is meaningful, but still it is a bit subtle. “Out of the objects which have mass, do they also have weight?”

Gravity is a universal force that responds to energy and momentum, and universal means applicable to every object and indeed to anything that has energy. Because all objects have energy, all objects can have weight, and correspondingly all objects have gravitational mass. NOTE ADDED: as a reader pointed out, the above statement was not written clearly. It should instead read: “Because all objects have energy, all objects have gravitational mass, which means they will have weight if there’s some gravity around that can pull on them!” An object out in the middle of deep space, in the absence of anything to create a noticeable gravitational force on it, will have no weight, no matter how much gravitational mass it has.

(If this universality of gravity weren’t true, one could not think of gravity as a manifestation of curved space and time, as Einstein did. In the presence of gravity, objects without weight would act as though space and time were flat, while everything else around them would act as though space-time is curved. That would ruin Einstein’s whole idea!)

Yet even though all objects have gravitational mass, not all of them have rest mass. This leads to confusions that I address in the book, as in this paragraph:

  • Here’s another strange thing. If you have read a variety of books about particles and mass, you will probably have noticed that some say that photons have mass and others say that they don’t. It’s hard to believe there could be disagreement about something so fundamental in nature. But the origin of the discrepancy is simple: it depends on which version of mass you’re asking about. [From “Waves in an Impossible Sea”.]

Photons have gravitational mass, like everything else. But they have zero rest mass, which is why they must always move at the cosmic speed limit c, 186,000 miles (300,000 km) per second.

An electron, by contrast, has both gravitational mass and rest mass. But one still has to be careful: an electron’s gravitational mass is usually larger than its rest mass, unless (from your perspective) it is stationary.
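If one takes a moving object's gravitational mass to track its total energy E/c², as the phrasing above suggests, the effect is easy to put numbers on: the total energy is γ times the rest energy, so the ratio of gravitational to rest mass is just the Lorentz factor. A small illustrative script (my own, not from the post; the constants are standard values):

```python
import math

C = 299_792_458.0   # speed of light, m/s

def gamma(v):
    """Lorentz factor for speed v: total energy = gamma * (rest energy)."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

# Reading gravitational mass as E / c^2 = gamma * (rest mass), the ratio
# of gravitational mass to rest mass for a moving electron is just gamma.
for frac in (0.0, 0.5, 0.99):
    v = frac * C
    print(f"v = {frac:4.2f} c: gravitational mass / rest mass = {gamma(v):.3f}")
```

So an electron at rest has the two masses equal, while one moving at 99% of c has a gravitational mass about seven times its rest mass.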

Why Don’t Fields Have Mass?

A photon is a ripple in the electromagnetic field. An electron is a ripple in the electron field. More precisely, each is a quantum — a gentlest possible ripple — of its field. Since electrons have rest mass and photons do not, should we say that the electron field has mass and that the electromagnetic field does not?

No. That would be misguided.

It is true that some physicists will sometimes say, “the electron field has mass”. But they are using a potentially confusing shorthand when they do so. What they actually mean is: “the quanta of the electron field have rest mass” — i.e., electrons, ripples in the electron field, have rest mass — and that “the electron field’s equations include a term corresponding to this non-zero mass.” The field’s rest mass simply cannot be defined; it is meaningless. Here’s why.

Rest mass is a measure of how difficult it is for you to move an object that is currently stationary. But the electromagnetic field, present across the entire cosmos, is not something that can move. It has no such thing as speed, and you can’t move it. It’s part of the cosmos. This is true of the electron field as well. Fields of the universe are not things to which you can attribute motion. Since both rest mass and inertial mass have to do with motion, neither type of mass can be attributed to these fields. Nor can the fields cause gravity the way objects do — they are everywhere, so they can’t pull objects in any direction, as gravity does, or feel weight, which would cause them to move in some direction or other.

Ripples in these fields are a different matter! They do move, and they carry energy and therefore can cause gravity. Quanta of fields definitely can have all possible types of mass, and they have weight. But the fields themselves do not.

So: objects, including all elementary particles (i.e. quanta of the universe’s fields), have weight and gravitational mass, and some have non-zero rest mass. However, the fields of the universe are not objects, and they have neither weight nor mass of any kind.

Vacuum Energy-Density of Fields

Despite this, quantum fields can have gravitational effects and intrinsic energy — more precisely, what is known as vacuum energy-density. Even when a field is sitting undisturbed, an effect of quantum physics causes it to be uncertain, creating a constant amount of energy in each spatial volume. This is true whether its quanta have non-zero rest mass or not. Vacuum energy-density contributes to the cosmological constant, and is potentially among the sources of the universe’s widespread “dark energy” (which, despite the name, is in fact energy-density and negative pressure). Ordinary objects, including the elementary particles they’re made from, can’t have vacuum energy.

Vacuum energy-density has potentially enormous gravitational effects on the cosmos as a whole. But you shouldn’t think of it as directly analogous to weight and mass, which are properties of localized objects made from electrons and other quanta. It is something quite different, with its own distinct effects. [For instance, you might think positive vacuum energy-density, like the positive energy inside a planet, would cause things to fall inward; but instead it causes the universe’s space to expand. And while the rest mass of ordinary objects in empty space can only be zero or positive, vacuum energy-density can be negative.]

I hope this somewhat clarifies how the properties of fields differ from the properties of their particles. It’s a very different thing to be spread out across the whole cosmos than to be a localized, movable object.

September 23, 2023

n-Category Café Constructing the Real Numbers as Nearly Multiplicative Sequences

I’m in Regensburg this week attending a workshop on Interactions of Proof Assistants and Mathematics. One of the lecture series is being given by John Harrison, a Senior Principal Applied Scientist in the Automated Reasoning Group at Amazon Web Services, and a lead developer of the HOL Light interactive theorem prover. He just told us about a very cool construction of the non-negative real numbers as sequences of natural numbers satisfying a property he calls “near multiplicativity”. In particular, the integers and the rational numbers aren’t needed at all! This is how the reals are constructed in HOL Light and is described in more detail in a book he wrote entitled Theorem Proving with the Real Numbers.

Edit: as the commenters note, these are also known as the Eudoxus reals and were apparently discovered by our very own Stephen Schanuel and disseminated by Ross Street. Thanks for pointing me to the history of this construction!

The idea

One of the standard constructions of the real numbers is as equivalence classes of Cauchy sequences of rationals. Let us consider a non-negative real number a. One way to say that a sequence q \colon \mathbb{N} \to \mathbb{Q} of rational numbers converges to a is to ask that there exist a constant A \in \mathbb{N} so that for all n \in \mathbb{N},

|q_n - a| \le \frac{A}{n}.

These representations aren’t at all unique: many Cauchy sequences of rationals represent the same real number. And in particular, for any positive real number a, it is possible to find a sequence of natural numbers a \colon \mathbb{N} \to \mathbb{N} so that the sequence n \mapsto \frac{a_n}{n} converges to a in the above sense, i.e.:

\left|\frac{a_n}{n} - a\right| \le \frac{A}{n}

or equivalently

|a_n - n \cdot a| \le A.

This sequence a \colon \mathbb{N} \to \mathbb{N} will encode the real number a (which is why I’ve given them the same name).

The construction

Now that I’ve explained the idea, let’s try to characterize the sequences of natural numbers that will correspond to non-negative real numbers without presupposing the existence of non-negative real numbers. The idea is that a sequence a \colon \mathbb{N} \to \mathbb{N} will have the property that the sequence n \mapsto \frac{a_n}{n} encodes some non-negative real number just when this sequence is Cauchy, which we express in the following way: there exists a constant A \in \mathbb{N} so that for all n, m \in \mathbb{N},

\left|\frac{a_n}{n} - \frac{a_m}{m}\right| \le \frac{A}{n} + \frac{A}{m},

or equivalently

|m \cdot a_n - n \cdot a_m| \le (m + n) \cdot A.

Such sequences are called nearly multiplicative. Supposedly this is equivalent to the property of a sequence being nearly additive, meaning there exists a constant A' \in \mathbb{N} so that for all m, n \in \mathbb{N}

|a_{m+n} - (a_m + a_n)| \le A'.

The non-negative reals are then equivalence classes of nearly multiplicative sequences of natural numbers, where the equivalence relation says that a, a' \colon \mathbb{N} \to \mathbb{N} represent the same real number when there exists C \in \mathbb{N} so that for all n \in \mathbb{N}

|a_n - a'_n| \le C.

This is more or less the usual equivalence relation of Cauchy sequences, except with a specified rate of convergence.

Addition and multiplication

Now that we have the non-negative reals, how do we add and how do we multiply?

Given nearly multiplicative sequences a \colon \mathbb{N} \to \mathbb{N} and b \colon \mathbb{N} \to \mathbb{N}, their sum a + b \colon \mathbb{N} \to \mathbb{N} is defined by pointwise addition: (a+b)_n := a_n + b_n. This is fairly intuitive.

More interestingly, their product a \cdot b \colon \mathbb{N} \to \mathbb{N} is defined by function composition: (a \cdot b)_n := a_{b_n}. It’s a fun exercise to work out that this converges to the desired non-negative real number.
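To play with this numerically, here is a small sketch of my own (not from the post). Note that encoding a real a as n ↦ ⌊n·a⌋ presupposes the real number, so this only illustrates the arithmetic, not the construction itself:

```python
import math

def encode(a):
    """One representative of the non-negative real a: n -> floor(n * a).
    (This presupposes a, so it only serves to test the arithmetic.)"""
    return lambda n: math.floor(n * a)

def add(a, b):
    """Addition of encoded reals is pointwise addition."""
    return lambda n: a(n) + b(n)

def mul(a, b):
    """Multiplication of encoded reals is composition: (a . b)_n = a_{b_n}."""
    return lambda n: a(b(n))

sqrt2, pi_ = encode(math.sqrt(2)), encode(math.pi)

# Near-additivity holds for these representatives: |a_{m+n} - a_m - a_n| <= 1.
assert all(abs(sqrt2(m + n) - sqrt2(m) - sqrt2(n)) <= 1
           for m, n in [(3, 5), (100, 7), (1234, 5678)])

# a_n / n recovers the encoded real, and the operations behave as expected.
n = 10**6
print(add(sqrt2, pi_)(n) / n)   # ~ 4.5558  (sqrt(2) + pi)
print(mul(sqrt2, pi_)(n) / n)   # ~ 4.4429  (sqrt(2) * pi)
```

The composition rule works because a(b(n)) ≈ a · (b · n) = (a·b) · n, up to the bounded errors that the equivalence relation forgives.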

September 22, 2023

Matt von Hippel Cause and Effect and Stories

You can think of cause and effect as the ultimate story. The world is filled with one damn thing happening after another, but to make sense of it we organize it into a narrative: this happened first, and it caused that, which caused that. We tie this to “what if” stories, stories about things that didn’t happen: if this hadn’t happened, then it wouldn’t have caused that, so that wouldn’t have happened.

We also tell stories about cause and effect. Physicists use cause and effect as a tool, a criterion to make sense of new theories: does this theory respect cause and effect, or not? And just like everything else in science, there is more than one story they tell about it.

As a physicist, how would you think about cause and effect?

The simplest and most obvious requirement is that effects should follow their causes: cause and effect shouldn’t go backwards in time, and the cause should come before the effect.

This all sounds sensible, until you remember that in physics “before” and “after” are relative. If you try to describe the order of two distant events, your description will be different from that of someone moving at a different velocity. You might think two things happened at the same time, while they think one happened first, and someone else thinks the other happened first.

You’d think this makes a total mess of cause and effect, but actually everything remains fine, as long as nothing goes faster than the speed of light. If someone could travel between the two events at slower than the speed of light, then everybody will agree on their order, and so everyone can agree on which one caused the other. Cause and effect only get screwed up if influences can travel faster than light.

(If the two events are two different times you observed something, then cause and effect will always be fine, since you yourself can’t go faster than the speed of light. So nobody will contradict what you observe, they just might interpret it differently.)
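This light-cone criterion is easy to check numerically. Here is a minimal sketch of my own (not from the post), using a 1+1-dimensional Lorentz boost with c = 1: the time order of timelike-separated events is the same for every observer, while for spacelike-separated events it depends on the observer.

```python
# Units with c = 1: the time-ordering of two events is frame-independent
# exactly when their separation is timelike, i.e. when a slower-than-light
# traveler could connect them.
import math

def boosted_time(t, x, v):
    """Time coordinate of the interval (t, x) in a frame moving at velocity v (|v| < 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x)

def order_can_flip(event_a, event_b):
    """True if some allowed boost reverses the time order of the two events."""
    (t1, x1), (t2, x2) = event_a, event_b
    dt, dx = t2 - t1, x2 - x1
    signs = set()
    for v in [i / 100 for i in range(-99, 100)]:  # scan boosts below light speed
        signs.add(boosted_time(dt, dx, v) > 0)
    return len(signs) > 1

# Timelike separation (dt=2, dx=1): the order is absolute.
print(order_can_flip((0, 0), (2, 1)))   # False
# Spacelike separation (dt=1, dx=2): the order depends on the observer.
print(order_can_flip((0, 0), (1, 2)))   # True
```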

So if you want to make sure that your theory respects cause and effect, you’d better be sure that nothing goes faster than light. It turns out, this is not automatic! In general relativity, an effect called Shapiro time delay makes light take longer to pass a heavy object than to go through empty space. If you modify general relativity, you can accidentally get a theory with a Shapiro time advance, where light arrives sooner than it would through empty space. In such a theory, at least some observers will see effects happen before their causes!

Once you know how to check this, as a physicist, there are two kinds of stories you can tell. I’ve heard different people in the field tell both.

First, you can say that cause and effect should be a basic physical principle. Using this principle, you can derive other restrictions, demands on what properties matter and energy can have. You can carve away theories that violate these rules, making sure that we’re testing for theories that actually make sense.

On the other hand, there are a lot of stories about time travel. Time travel screws up cause and effect in a very direct way. When Harry Potter and Hermione travel back in time at the end of Harry Potter and the Prisoner of Azkaban, they cause the event that saves Harry’s life earlier in the book. Science fiction and fantasy are full of stories like this, and many of them are perfectly consistent. How can we be so sure that we don’t live in such a world?

The other type of story positions the physics of cause and effect as a search for evidence. We’re looking for physics that violates cause and effect, because if it exists, then on some small level it should be possible to travel back in time. By writing down the consequences of cause and effect, we get to describe what evidence we’d need to see it breaking down, and if we see it whole new possibilities open up.

These are both good stories! And like all other stories in science, they only capture part of what the scientists are up to. Some people stick to one or the other, some go between them, driven by the actual research, not the story itself. Like cause and effect itself, the story is just one way to describe the world around us.

September 21, 2023

Matt Strassler How to Tell that the Earth Spins

Continuing with the supplementary material for the book, from its Chapter 2. This is in reference to Galileo’s principle of relativity, a central pillar of modern science. This principle states that perfectly steady motion in a straight line is indistinguishable from no motion at all, and thus cannot be felt. This is why we don’t feel our rapid motion around the Earth and Sun; over minutes, that motion is almost steady and straight. I wrote

  • . . . Our planet rotates and roams the heavens, but our motion is nearly steady. That makes it nearly undetectable, thanks to Galileo’s principle.

To this I added a brief endnote, since the spin of the Earth can be detected, with some difficulty.

  • As pointed out by the nineteenth-century French physicist Léon Foucault, the Earth’s rotation, the least steady of our motions, is reflected in the motion of a tall pendulum. Many science museums around the world have such a “Foucault pendulum” on exhibit.

But for those who would want to know more, here’s some information about how to measure the Earth’s spin.


The spin of the Earth was first detected through what is known as the Coriolis effect, which causes objects far from the equator and moving in long paths to seem to curve gently. The reasons for the apparent curved paths, as well as the consequent impacts on navigation and weather, are discussed in this post from 2022, one of a series in which I showed how you can confirm basic facts of astronomy for yourself.

(Regarding the Coriolis effect: there’s a famous tourist trap in which trained professionals cause water to spin down drains in opposite directions, depending on which side of the equator they are standing on. But this is a magician’s trick, intended to obtain a nice tip from impressed travelers. The Coriolis effect is tiny on the scale of a sink; it’s only easy to observe on scales of miles (kilometers). It is even tinier right near the equator — that’s why there are no hurricanes at very low latitudes — and so it has no effect on the trickster’s draining water. Here’s someone’s webpage devoted to this issue.)

In the same post, I described the basics of a Foucault pendulum, the simplest device that can visibly demonstrate the Earth’s rotation. It’s really nothing more than an ordinary pendulum, but very tall, heavy, and carefully suspended. Unfortunately, although such a pendulum is easy to observe and is common in science museums, it is conceptually confusing. It is easily understood only at the Earth’s poles, where it swings in a fixed plane; then the Earth rotates “underneath” it, making it appear to spectators that it rotates exactly once a day. But at lower latitudes, the pendulum appears to rotate more slowly, and at the equator it seems not to rotate at all (because, again, there’s no Coriolis effect near the equator). Depending on how close the pendulum is to the equator, it may take weeks, months or years to rotate completely. To understand these details is not straightforward, even for physics students.
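For readers who want the quantitative version: the standard formula (stated here for illustration, not derived in the post) is that the pendulum’s swing plane takes one sidereal day divided by the sine of the latitude to come full circle.

```python
import math

SIDEREAL_DAY_HOURS = 23.934  # time for one full rotation of the Earth

def foucault_period_hours(latitude_deg):
    """Hours for a Foucault pendulum's swing plane to rotate a full 360 degrees."""
    s = math.sin(math.radians(latitude_deg))
    if s == 0:
        return math.inf  # at the equator the swing plane never precesses
    return SIDEREAL_DAY_HOURS / abs(s)

for lat in [90, 49, 30, 10, 0]:
    print(f"latitude {lat:2d}°: {foucault_period_hours(lat):.1f} h")
```

At the poles this gives one rotation per (sidereal) day; at mid-latitudes it already takes well over a day, which is why museum pendulums at low latitudes seem so sluggish.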

A better device for measuring the Earth’s rotation, which Foucault was well aware of and which I discussed in the next post in that same series, is a gyroscope — for example, a spinning top, or indeed any symmetrical, rapidly spinning object. Conceptually, a gyroscope is extremely simple, because its pointing direction stays fixed in space no matter what happens around it. Once pointed at a star, the gyroscope will continue to aim at that star even as the Earth turns, and so it will appear to rotate once a day no matter where it is located on Earth.

So why don’t science museums display gyroscopes instead of Foucault pendula? Unfortunately, even today, it is still impossible to build a mechanical gyroscope that is stable enough over a full day to demonstrate the Earth’s rotation. Ring laser gyroscopes, which use interference effects between light waves, are much more stable, and they can do the job well (as a flat-earther discovered to his embarrassment — see the final section of that same post). But their workings are invisible and nonintuitive, making them less useful at a science museum or in a physics classroom.

Now here’s something worth thinking about. Imagine an intelligent species living forever underground, perhaps without vision, never having seen the sky. Many such species may exist in the universe, perhaps far more than live on planetary surfaces, which are subject to potentially sterilizing solar flares and other catastrophes. Despite complete ignorance of their astronomical surroundings, these creatures too can prove their planet rotates, using nothing more than a swinging pendulum or a spinning top.

September 20, 2023

David Hogguncertainty estimation for regression outputs

Most methods for performing regressions don't provide natural uncertainties. Some do, of course! But few deliver uncertainties you will believe. I discussed these issues with Contardo (SISSA) today, in the context of our project to (confidently) find infrared excesses around boring old main-sequence stars. One option is to look at the performance on held-out data. But then you have to decide how to aggregate this information in a way that is relevant for each object in your sample: They probably don't all have the same uncertainty! Another option is to look at the variation of prediction across training sets. That's good! But it requires that you have lots of training data. In this case, we do, so that's where we are at right now.
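As a concrete, entirely synthetic sketch of that last option: refit the regressor on bootstrap resamples of the training set and take the spread of the predictions as a per-object uncertainty. (Everything here, data and model included, is a stand-in, not the infrared-excess project itself.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a linear trend with noise (a stand-in for any regression task).
x = np.linspace(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 2.0, size=x.size)

# Refit on many bootstrap resamples of the training set; the spread of the
# predictions at a given input serves as an approximate per-object uncertainty.
predictions = []
for _ in range(200):
    idx = rng.integers(0, x.size, size=x.size)   # resample with replacement
    coeffs = np.polyfit(x[idx], y[idx], deg=1)   # refit the regressor
    predictions.append(np.polyval(coeffs, 5.0))  # predict at x = 5

predictions = np.array(predictions)
print(f"prediction at x=5: {predictions.mean():.2f} +/- {predictions.std():.2f}")
```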

September 19, 2023

David Hoggregressions for point clouds

I spent my research time today writing in a document that proposes (and demonstrates) some methods for performing machine-learning-style regressions, but where the input objects (features) are variable-size point clouds. Contributions also from Villar (JHU) and Gebhard (MPI-IS). I spent way too long working out the terminology and notation, and I am still wrong.

Scott Aaronson Quantum miscellany

  1. Tomorrow at 1:30pm US Central time, I’ll be doing an online Q&A with Collective[i] Forecast about quantum computing (probably there will also be questions about AI safety). It’s open to all. Hope to see some of you there!
  2. Toby Cubitt of University College London is visiting UT Austin. We’ve been discussing the question: can you produce a QMA witness state using a closed timelike curve? Since QMA⊆PSPACE, and since Fortnow, Watrous, and I proved that closed timelike curves (or anyway, Deutsch’s model of them) let you solve PSPACE problems, clearly a closed timelike curve lets you solve QMA decision problems, but that’s different from producing the actual witness state as the fixed-point of a polynomial-time superoperator. Toby has a neat recent result, which has as a corollary that you can produce the ground state of a local Hamiltonian using a CTC, if you have as auxiliary information the ground state energy as well as (a lower bound on) the spectral gap. But you do seem to need that extra information.

    Yesterday I realized there’s also a simpler construction: namely, take an n-qubit state from the CTC, and check whether it’s a valid QMA witness, having used Marriott-Watrous amplification to push the probability of error down to (say) exp(−n²). If the witness is valid, then send it back in time unmodified; otherwise replace it by the maximally mixed state. If valid witnesses exist, then you can check that this sets up a Markov chain whose stationary distribution is almost entirely concentrated on such witnesses. (If no valid witnesses exist, then the stationary distribution is just the maximally mixed state, or exponentially close to it.) One drawback of this construction is that it can only produce a Marriott-Watrous state, rather than the “original” QMA witness state.

    Is there a third approach, which overcomes the disadvantages of both mine and Toby’s? I’ll leave that question to my readers!
  3. On the theme of QMA plus weird physics, a wonderful question emerged from a recent group meeting: namely, what’s the power of QMA if we let the verifier make multiple non-collapsing measurements of the same state, as in the “PDQP” model defined by myself, Bouland, Fitzsimons, and Lee? I conjecture that this enhanced QMA goes all the way up to NEXP (Nondeterministic Exponential-Time), by a construction related to the one I used to show that PDQP/qpoly = ALL (i.e., non-collapsing measurements combined with quantum advice lets you decide literally all languages), and that also uses the PCP Theorem. I even have some candidate constructions, though I haven’t yet proven their soundness.

    In the past, I would’ve spent more time on such a problem before sharing it. But after giving some students a first crack, I now … just want to know the answer? Inspecting my feelings in my second year of leave at OpenAI, I realized that I still care enormously about quantum complexity theory, but only about getting answers to the questions, barely at all anymore about getting credit for them. Admittedly, it took me 25 years to reach this state of not caring.
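The check-and-reset construction in item 2 has a simple classical caricature that is easy to simulate: a Markov chain on candidate witnesses where a noisy verifier keeps the states it accepts and replaces rejected ones by the uniform distribution. In this toy version (all numbers made up), the stationary distribution concentrates on the “valid” states:

```python
import numpy as np

# Toy classical analogue of the check-and-reset CTC construction:
# states stand in for witness strings, and a noisy "verifier" accepts
# valid states with probability 1 - eps and invalid ones with probability eps.
n_states, eps = 8, 1e-3
valid = {2, 5}                          # hypothetical set of valid witnesses

P = np.zeros((n_states, n_states))
for s in range(n_states):
    p_pass = (1 - eps) if s in valid else eps
    P[s, s] += p_pass                   # accepted: send the state back unchanged
    P[s, :] += (1 - p_pass) / n_states  # rejected: replace by the uniform mixture

# Stationary distribution via power iteration (P is row-stochastic).
pi = np.full(n_states, 1.0 / n_states)
for _ in range(10000):
    pi = pi @ P

print(f"probability mass on valid states: {pi[list(valid)].sum():.4f}")
```

This classical toy of course says nothing about the quantum subtleties (fixed points of superoperators, Marriott-Watrous states); it only illustrates why the stationary distribution favors states the verifier accepts.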

Terence TaoUndecidability of translational monotilings

Rachel Greenfeld and I have just uploaded to the arXiv our paper “Undecidability of translational monotilings“. This is a sequel to our previous paper in which we constructed a translational monotiling {A \oplus F = {\bf Z}^d} of a high-dimensional lattice {{\bf Z}^d} (thus the monotile {F} is a finite set and the translates {a+F}, {a \in A} of {F} partition {{\bf Z}^d}) which was aperiodic (there is no way to “repair” this tiling into a periodic tiling {A' \oplus F = {\bf Z}^d}, in which {A'} is now periodic with respect to a finite index subgroup of {{\bf Z}^d}). This disproved the periodic tiling conjecture of Stein, Grunbaum-Shephard and Lagarias-Wang, which asserted that such aperiodic translational monotilings do not exist. (Compare with the “hat monotile“, which is a recently discovered aperiodic isometric monotile for {{\bf R}^2}, where one is now allowed to use rotations and reflections as well as translations, or the even more recent “spectre monotile“, which is similar except that no reflections are needed.)

One of the motivations of this conjecture was the observation of Hao Wang that if the periodic tiling conjecture were true, then the translational monotiling problem is (algorithmically) decidable: there is a Turing machine which, when given a dimension {d} and a finite subset {F} of {{\bf Z}^d}, can determine in finite time whether {F} can tile {{\bf Z}^d}. This is because if a periodic tiling exists, it can be found by computer search; and if no tiling exists at all, then (by the compactness theorem) there exists some finite subset of {{\bf Z}^d} that cannot be covered by disjoint translates of {F}, and this can also be discovered by computer search. The periodic tiling conjecture asserts that these are the only two possible scenarios, thus giving the decidability.
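The “search for a periodic tiling” half of Wang’s argument can be illustrated in one dimension, where a periodic tiling of {{\bf Z}} with period n amounts to an exact tiling of Z/nZ. The sketch below is only an illustration of the search idea (the one-dimensional problem is in fact fully decidable, unlike the high-dimensional one):

```python
from itertools import combinations

def find_tiling_period(F, max_period=20):
    """Search for a periodic tiling of Z by translates of the finite set F:
    a subset A of Z/nZ with A + F hitting every residue exactly once.
    Returns (n, A) for the smallest period found, or None if none is found."""
    F = sorted(F)
    for n in range(len(F), max_period + 1):
        if n % len(F):
            continue                     # |F| must divide the period
        for A in combinations(range(n), n // len(F)):
            hits = [(a + f) % n for a in A for f in F]
            if len(set(hits)) == n:      # every residue covered exactly once
                return n, A
    return None

print(find_tiling_period({0, 1}))     # (2, (0,))
print(find_tiling_period({0, 2}))     # (4, (0, 1))
print(find_tiling_period({0, 1, 3}))  # None: this F does not tile Z at all
```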

On the other hand, Wang’s argument is not known to be reversible: the failure of the periodic tiling conjecture does not automatically imply the undecidability of the translational monotiling problem, as it does not rule out the existence of some other algorithm to determine tiling that does not rely on the existence of a periodic tiling. (For instance, even with the newly discovered hat and spectre tiles, it remains an open question whether the isometric monotiling problem for (say) polygons with rational coefficients in {{\bf R}^2} is decidable, with or without reflections.)

The main result of this paper settles this question (with one caveat):

Theorem 1 There does not exist any algorithm which, given a dimension {d}, a periodic subset {E} of {{\bf Z}^d}, and a finite subset {F} of {{\bf Z}^d}, determines in finite time whether there is a translational tiling {A \oplus F = E} of {E} by {F}.

The caveat is that we have to work with periodic subsets {E} of {{\bf Z}^d}, rather than all of {{\bf Z}^d}; we believe this is largely a technical restriction of our method, and it is likely that it can be removed with additional effort and creativity. We also remark that when {d=2}, the periodic tiling conjecture was established by Bhattacharya, and so the problem is decidable in the {d=2} case. It remains open whether the tiling problem is decidable for any fixed value of {d>2} (note in the above result that the dimension {d} is not fixed, but is part of the input).

Because of a well known link between algorithmic undecidability and logical undecidability (also known as logical independence), the main theorem also implies the existence of an (in principle explicitly describable) dimension {d}, periodic subset {E} of {{\bf Z}^d}, and a finite subset {F} of {{\bf Z}^d}, such that the assertion that {F} tiles {E} by translation cannot be proven or disproven in ZFC set theory (assuming of course that this theory is consistent).

As a consequence of our method, we can also replace {{\bf Z}^d} here by “virtually two-dimensional” groups {{\bf Z}^2 \times G_0}, with {G_0} a finite abelian group (which now becomes part of the input, in place of the dimension {d}).

We now describe some of the main ideas of the proof. It is a common technique to show that a given problem is undecidable by demonstrating that some other problem that was already known to be undecidable can be “encoded” within the original problem, so that any algorithm for deciding the original problem would also decide the embedded problem. Accordingly, we will encode the Wang tiling problem as a monotiling problem in {{\bf Z}^d}:

Problem 2 (Wang tiling problem) Given a finite collection {{\mathcal W}} of Wang tiles (unit squares with each side assigned some color from a finite palette), is it possible to tile the plane with translates of these tiles along the standard lattice {{\bf Z}^2}, such that adjacent tiles have matching colors along their common edge?

It is a famous result of Berger that this problem is undecidable. The embedding of this problem into the higher-dimensional translational monotiling problem proceeds through some intermediate problems. Firstly, it is an easy matter to embed the Wang tiling problem into a similar problem which we call the domino problem:

Problem 3 (Domino problem) Given a finite collection {{\mathcal R}_1} (resp. {{\mathcal R}_2}) of horizontal (resp. vertical) dominoes – pairs of adjacent unit squares, each of which is decorated with an element of a finite set {{\mathcal W}} of “pips” – is it possible to assign a pip to each unit square in the standard lattice tiling of {{\bf Z}^2}, such that every horizontal (resp. vertical) pair of squares in this tiling is decorated using a domino from {{\mathcal R}_1} (resp. {{\mathcal R}_2})?

Indeed, one just has to interpret each Wang tile as a separate “pip”, and define the domino sets {{\mathcal R}_1}, {{\mathcal R}_2} to be the pairs of horizontally or vertically adjacent Wang tiles with matching colors along their edge.
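In code, this step of the embedding is one line per domino set. A minimal sketch (the tile format and orientation conventions are my own: a tile is a tuple of edge colors in N, E, S, W order, and a “vertical” pair puts the second tile below the first):

```python
def domino_sets(wang_tiles):
    """Build the horizontal and vertical domino sets from a set of Wang tiles."""
    N, E, S, W = 0, 1, 2, 3          # indices of the four edge colors
    R1 = {(t, u) for t in wang_tiles for u in wang_tiles
          if t[E] == u[W]}           # horizontal: u sits to the right of t
    R2 = {(t, u) for t in wang_tiles for u in wang_tiles
          if t[S] == u[N]}           # vertical: u sits below t
    return R1, R2

# Two hypothetical tiles whose edges alternate between colors "a" and "b":
tiles = [("a", "b", "a", "b"), ("b", "a", "b", "a")]
R1, R2 = domino_sets(tiles)
print(len(R1), len(R2))  # 2 2
```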

Next, we embed the domino problem into a Sudoku problem:

Problem 4 (Sudoku problem) Given a column width {N}, a digit set {\Sigma}, a collection {{\mathcal S}} of functions {g: \{0,\dots,N-1\} \rightarrow \Sigma}, and an “initial condition” {{\mathcal C}} (which we will not detail here, as it is a little technical), is it possible to assign a digit {F(n,m)} to each cell {(n,m)} in the “Sudoku board” {\{0,1,\dots,N-1\} \times {\bf Z}} such that for any slope {j \in {\bf Z}} and intercept {i \in {\bf Z}}, the digits {n \mapsto F(n,jn+i)} along the line {\{(n,jn+i): 0 \leq n \leq N-1\}} lie in {{\mathcal S}} (and also that {F} obeys the initial condition {{\mathcal C}})?

The most novel part of the paper is the demonstration that the domino problem can indeed be embedded into the Sudoku problem. The embedding of the Sudoku problem into the monotiling problem follows from a modification of the methods in our previous papers, which had also introduced versions of the Sudoku problem, and created a “tiling language” which could be used to “program” various problems, including the Sudoku problem, as monotiling problems.

To encode the domino problem into the Sudoku problem, we need to take a domino function {{\mathcal T}: {\bf Z}^2 \rightarrow {\mathcal W}} (obeying the domino constraints associated to some domino sets {{\mathcal R}_1, {\mathcal R}_2}) and use it to build a Sudoku function {F: \{0,\dots,N-1\} \times {\bf Z} \rightarrow \Sigma} (obeying some Sudoku constraints relating to the domino sets); conversely, every Sudoku function obeying the rules of our Sudoku puzzle has to arise somehow from a domino function. The route to doing so was not immediately obvious, but after a helpful tip from Emmanuel Jeandel, we were able to adapt some ideas of Aanderaa and Lewis, in which certain hierarchical structures were used to encode one problem in another. Here, we interpret hierarchical structure {p}-adically (using two different primes due to the two-dimensionality of the domino problem). The Sudoku function {F} that will exemplify our embedding is then built from {{\mathcal T}} by the formula

\displaystyle  F(n,m) := ( f_{p_1}(m), f_{p_2}(m), {\mathcal T}(\nu_{p_1}(m), \nu_{p_2}(m)) ) \ \ \ \ \ (1) where {p_1,p_2} are two large distinct primes (for instance one can take {p_1=53}, {p_2=59} for concreteness), {\nu_p(m)} denotes the number of times {p} divides {m}, and {f_p(m) \in {\bf Z}/p{\bf Z} \backslash \{0\}} is the last non-zero digit in the base {p} expansion of {m}:

\displaystyle  f_p(m) := \frac{m}{p^{\nu_p(m)}} \hbox{ mod } p (with the conventions {\nu_p(0)=+\infty} and {f_p(0)=1}). In the case {p_1=3, p_2=5}, the first component of (1) looks like this:

and a typical instance of the final component {{\mathcal T}(\nu_{p_1}(m), \nu_{p_2}(m))} looks like this:

Amusingly, the decoration here is essentially following the rules of the children’s game “Fizz buzz“.
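Both ingredients of formula (1) are a few lines of code. A quick sketch (function names are mine), with the small primes {p_1=3, p_2=5} used in the pictures above:

```python
def nu(p, m):
    """p-adic valuation: the number of times p divides m (nu(p, 0) = infinity)."""
    if m == 0:
        return float("inf")
    count = 0
    while m % p == 0:
        m //= p
        count += 1
    return count

def f(p, m):
    """Last non-zero digit of m in base p (with the convention f(p, 0) = 1)."""
    if m == 0:
        return 1
    return (m // p ** nu(p, m)) % p

# Fizz-buzz-style decoration: the pair (nu(3, m), nu(5, m)) changes exactly
# when m is a multiple of 3 or 5, just as in the children's game.
for m in range(1, 16):
    print(m, f(3, m), f(5, m), (nu(3, m), nu(5, m)))
```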

To demonstrate the embedding, we thus need to produce a specific Sudoku rule {{\mathcal S}} (as well as a more technical initial condition {{\mathcal C}}, which is basically required to exclude degenerate Sudoku solutions such as a constant solution) that can “capture” the target function (1), in the sense that the only solutions to this specific Sudoku puzzle are given by variants of {F} (e.g., {F} composed with various linear transformations). In our previous paper we were able to build a Sudoku puzzle that could similarly capture either of the first two components {f_{p_1}(m)}, {f_{p_2}(m)} of our target function (1) (up to linear transformations), by a procedure very akin to solving an actual Sudoku puzzle (combined with iterative use of a “Tetris” move in which we eliminate rows of the puzzle that we have fully solved, to focus on the remaining unsolved rows). Our previous paper treated the case when {p} was replaced with a power of {2}, as this was the only case that we know how to embed in a monotiling problem of the entirety of {{\bf Z}^d} (as opposed to a periodic subset {E} of {{\bf Z}^d}), but the analysis is in fact easier when {p} is a large odd prime, instead of a power of {2}. Once the first two components {f_{p_1}(m), f_{p_2}(m)} have been solved for, it is a relatively routine matter to design an additional constraint in the Sudoku rule that then constrains the third component to be of the desired form {{\mathcal T}(\nu_{p_1}(m), \nu_{p_2}(m))}, with {{\mathcal T}} obeying the domino constraints.

Matt Strassler Beyond the Book (and What the Greeks Knew About the Earth)

Since the upcoming book is basically done, it’s time for me to launch the next phase of the project — the supplementary material, which will be placed here, on this website.

Any science book has to leave out many details of the subjects it covers, and omit many important topics. While my book has endnotes that help flesh out the main text, I know that some readers will want even more information. That’s what I’ll be building here over the coming months. I’ll continue to develop this material even after the book is published, as additional readers explore it. For a time, then, this will be a living, growing extension to the written text.

As I create this supplementary material, I’ll first post it on this blog, looking for your feedback in terms of its clarity and accuracy, and hoping to get a sense from you as to whether there are other questions that I ought to address. Let’s try this out today with a first example; I look forward to your comments.

In Chapter 2 of the book, I have written

  • Over two thousand years ago, Greek thinkers became experts in geometry and found clever tricks for estimating the Earth’s shape and size.

This sentence then refers to an endnote, in which I state

  • The shadow that the Earth casts on the Moon during a lunar eclipse is always disk-shaped, no matter the time of day, which can be true only for a spherical planet. Earth’s size is revealed by comparing the lengths of shadows of two identical objects, separated by a known north-south distance, measured at noon on the same day.*

Obviously this is very terse, and I’m sure some readers will want an explanation of the endnote. Here’s the explanation that I’ll post on this website:

Chapter 2, Endnote 1

By the time that ancient Rome’s power was expanding, Greek scholars understood the basics of solar and lunar eclipses. Noting the relation between the phases of the Moon and the positions of the Sun and Moon, they recognized that the Moon’s light is reflected sunlight. They were aware that New Moon occurs when the Sun and Moon are on the same side of the Earth, while Full Moon occurs when they are on the opposite sides. And they knew that a lunar eclipse occurs when the Earth lies between the Sun and Moon, so that the Earth blocks the Sun’s light and casts a shadow on the Moon. This is illustrated (not to scale) in Fig. 1.

Fig. 1: (Top) When the Moon and Sun are on opposite sides of the Earth but not perfectly aligned, the Moon is full: its lit half faces the Earth. (Bottom) But when the Moon moves directly behind the Earth, it enters its shadow and is partly or completely darkened — a “lunar eclipse”. Sizes and distances are not to scale.

From these shadows, they confirmed the Earth was a sphere. Clearly, if the Earth had the shape of an X, it could cast an X-shaped shadow.

Fig. 2: Light from the Sun (orange dashed line) would cause an X-shaped Earth to cast an X-shaped shadow on the Moon.

If Earth were a flat disk like a coin, then depending on the Moon’s location in the sky, the Earth’s shadow might be circular or might be oval. The important point is that the shadow of a circular disk is not always a circular disk. You can confirm this with a coin and a light bulb.

Fig. 3: If the Sun and Moon aren’t directly aligned face-on with a disk-shaped Earth, the disk’s shadow on the Moon will be an oval, not a circle.

The only shape which always creates a circular, disk-like shadow, from any angle and from any place and at any time, is a sphere, as you can confirm with a ball and a light bulb.

Fig. 4: Only a spherical Earth always casts a disk-shaped shadow, which causes the bright areas of the moon to be crescent shaped at all times during a partial lunar eclipse.

This is consistent with what is actually observed in eclipses, as in Fig. 4. The circular shadow of the Earth is shown especially clearly when sets of photos taken during an eclipse are carefully aligned.

Next, once you are aware that the Earth’s a sphere (and, as the Greeks also knew, that the Sun is far away), it’s not hard to learn the size of the Earth. Imagine two vertical towers sitting on flat ground in two different cities. To keep things especially simple, let’s imagine one city is due north of the other, and the distance between them — call it “D” — is already known.

In each city, a person observes their tower’s shadow at exactly noon. (No clock is needed, because one can watch the shadow over time, and noon is when the shadow is shortest.) The end of the shadow and the tower’s base and top form the points of a right-angle triangle, as in Fig. 5, whose other angles we can call ⍺ and 𝜃. For a person squatting at the shadow’s end, the angle ⍺ is easily measured: it is the angle formed by the tower’s silhouette against the sky. Because this is a right-angle triangle, the angle 𝜃 is 90 degrees minus ⍺, so each observer can easily determine 𝜃 for their city’s tower. We’ll call their two measurements 𝜃1 and 𝜃2. Knowing these angles and the distance D, they have everything they need to determine the size of the Earth.

Fig. 5: The right triangle formed by connecting a tower’s base, its top, and the end of its shadow at noon. Of greatest interest is the angle 𝜃.

The key observation is that the difference in these angles is the same as the difference in the latitude between the two cities. To see this, examine Fig. 6 below. The paths of sunlight (the orange dashed lines in Figs. 5 and 6) are parallel to the Earth-Sun line. [This (almost-exact) parallelism is only true because the Sun’s distance from Earth is much larger than the Earth’s size — which the Greeks knew.] Meanwhile the line from the Earth’s center to the first tower (call it L1) is a continuation of the line from that tower’s base to its top. Because (a) L1 forms an angle 𝜃1 with the sunlight, as shown in Fig. 5, and (b) the sunlight line and Earth-Sun line are parallel, as shown in Fig. 6, the intersection of L1 with the Earth-Sun line is also 𝜃1! Similarly, the line from the Earth’s center to tower 2 forms the angle 𝜃2 with the Earth-Sun line. As can be seen in Fig. 6, it follows that the angle between the two lines connecting the Earth’s center to the two towers is 𝜃2 – 𝜃1 , the difference in the noon-time sun angles as seen by the two observers!

Fig. 6: The angles 𝜃1 and 𝜃2 that involve the towers’ shadows (Fig. 5) are also the angles between the Earth-Sun line and the lines connecting the Earth’s center to the two towers; the difference between the two angles is the difference between the two cities’ latitudes.

Now, however, the observers can use the fact that they know D, the distance between the cities. In particular, the distance D is to the Earth’s circumference C just as 𝜃2 – 𝜃1 is to 360 degrees (or, in radians, to 2ℼ). In formulas:

  • D/C = (𝜃2 – 𝜃1)/360°

So (dividing and multiplying on both sides) the Earth’s circumference is simply

  • C = D [360° / (𝜃2 – 𝜃1) ]

and since they know both D and 𝜃2 – 𝜃1 , they now know the Earth’s circumference. (Note this correctly says that if the angle were 90 degrees = 2ℼ/4, then D would be C/4.)

Eratosthenes made this measurement (in a slightly different way) around 240 B.C.E. Reports by classical historians do not quite agree on what he found, but in the most optimistic interpretation of the historical record, he was well within 1 percent of the correct answer. And why not? Once you’ve realized what you should do, this is a relatively simple measurement; it’s one that you and a distant friend could carry out yourselves.

Note: You might prefer to see the answer in radians instead of degrees; since 360° = 2ℼ radians, we can write

  • C = D [2ℼ / (𝜃2 – 𝜃1) ] (in radians)

and since C = 2ℼR, where R is the Earth’s radius, that gives us a particularly simple formula

  • R = D / (𝜃2 – 𝜃1) (in radians)
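Plugging in numbers close to the traditional account of Eratosthenes’ measurement (a 7.2° difference in shadow angle and roughly 800 km between the two sites; these particular values are illustrative, not the historical data):

```python
import math

def circumference(D_km, theta1_deg, theta2_deg):
    """Earth's circumference from the north-south distance D between two sites
    and their two noon shadow angles, via C = D * 360 / (theta2 - theta1)."""
    return D_km * 360.0 / abs(theta2_deg - theta1_deg)

C = circumference(800.0, 0.0, 7.2)  # 7.2 degrees is 1/50 of a full circle
R = C / (2 * math.pi)               # radius from circumference
print(f"circumference ≈ {C:.0f} km, radius ≈ {R:.0f} km")
```

The result, about 40,000 km, is close to the modern value, which is exactly the point: once the geometry is understood, the measurement itself is simple.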

Note: If the two cities are not due north-south of one another, this poses no problem. Measure tower 1’s shadow at the first city’s noon, and tower 2’s shadow at the second city’s noon on the same day; then take D not to be the distance between the two cities but instead the distance between their latitude lines. Practically speaking, we can make a triangle with one side being the distance between the cities and the other two sides aligned north-south and east-west; then D is the length of the north-south line, as in Fig. 7. With this definition of D, the formulas above are still valid.

Fig. 7: For two cities that are not in a north-south line, D should be defined as the distance between their lines of latitude, and thus the length of the north-south line in a right triangle connecting them. (The triangle should be carefully drawn on the Earth’s globe, which I have not done here.)

September 18, 2023

Sean Carroll Proposed Closure of the Dianoia Institute at Australian Catholic University

Just a few years ago, Australian Catholic University (ACU) established a new Dianoia Institute of Philosophy. They recruited a number of researchers and made something of a splash, leading to a noticeable leap in ACU’s rankings in philosophy — all the way to second among Catholic universities in the English-speaking world, behind only Notre Dame.

Now, without warning, ACU has announced plans to completely disestablish the institute, along with eliminating 35 other academic positions in other fields. This leaves the faculty, some of whom left permanent jobs elsewhere to join the new institute, completely stranded.

I sent the letter below to the Vice-Chancellor of ACU and other interested parties. I hope the ongoing international outcry leads the administration to change its mind.

John BaezLife’s Struggle to Survive

I’m giving a public talk, the second of my Leverhulme Lectures at the International Centre for Mathematical Sciences:

Life’s struggle to survive. Tuesday September 26, 6 pm UK time. Room G.03 on the ground floor of the Bayes Centre, 47 Potterrow, Edinburgh.

Abstract. When pondering our future amid global warming, it is worth remembering how we got here. Even after it got started, the success of life on Earth was not a foregone conclusion! In this talk I recount some thrilling, chilling episodes from the history of our planet. For example: our collision with the planet Theia, the “snowball Earth events” when most of the oceans froze over, and the asteroid impact that ended the age of dinosaurs. Some are well-documented, others only theorized, but pondering them may give us some optimism about the ability of life to survive crises.

To attend in person, you need to get a free ticket here. Refreshments will be served after the lecture. If you actually show up, say hi!

If you can’t join us, fear not: I should be able to put a recording of the talk on my YouTube channel eventually. And you can already see the slides here. If you find mistakes in them or just have questions, please let me know! I’m still polishing the slides—and the more questions I field now, the more prepared I’ll be when it comes time to give the actual talk.

Doug NatelsonMeetings this week

This week is the 2023 DOE experimental condensed matter physics PI meeting - in the past I’ve written up highlights of these here (2021), here (2019), here (2017), here (2015), and here (2013).  This year, I am going to have to present remotely, however, because I am giving a talk at this interesting conference at the Kavli Institute for Theoretical Physics.  I will try to give some takeaways of the KITP meeting, and if any of the ECMP attendees want to give their perspective on news from the DOE meeting, I’d be grateful for updates in the comments.

September 17, 2023

n-Category Café Counting Algebraic Structures

The number of groups with \(n\) elements goes like this, starting with \(n = 0\):

0, 1, 1, 1, 2, 1, 2, 1, 5, …

The number of semigroups with \(n\) elements goes like this:

1, 1, 5, 24, 188, 1915, 28634, 1627672, 3684030417, 105978177936292, …

Here I’m counting isomorphic guys as the same.
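For tiny \(n\) these counts can be reproduced by brute force: enumerate every binary operation on an \(n\)-element set, keep the associative ones, and quotient by relabelling. A small sketch (my own illustration, feasible only for very small \(n\)):

```python
from itertools import product, permutations

def semigroup_count(n):
    """Count semigroups on {0, ..., n-1} up to isomorphism, brute force."""
    elems = list(range(n))
    pairs = list(product(elems, repeat=2))
    assoc_ops = []
    for values in product(elems, repeat=n * n):
        op = dict(zip(pairs, values))
        # keep only associative multiplication tables
        if all(op[(op[(x, y)], z)] == op[(x, op[(y, z)])]
               for x in elems for y in elems for z in elems):
            assoc_ops.append(op)
    # Quotient by isomorphism: two tables count once if some relabelling
    # (permutation of the elements) carries one to the other.
    seen, classes = set(), 0
    for op in assoc_ops:
        if tuple(sorted(op.items())) in seen:
            continue
        classes += 1
        for perm in permutations(elems):
            relabelled = {(perm[x], perm[y]): perm[op[(x, y)]] for (x, y) in op}
            seen.add(tuple(sorted(relabelled.items())))
    return classes
```

Running `semigroup_count(2)` gives 5 and `semigroup_count(3)` gives 24, matching the terms above.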

But how much do we know about such sequences in general? For example, is there any sort of algebraic gadget where the number of gadgets with \(n\) elements goes like this:

1, 1, 2, 1, 1, 1, 1, 1, … ?

No! Not if by “algebraic gadget” we mean something described by a bunch of operations obeying equational laws — that is, an algebra of a Lawvere theory.

This follows from a result of László Lovász in 1967:

On Mastodon, Omar Antolín sketched a proof that greases the wheels with more category theory. It relies on a rather shocking lemma:

Super-Yoneda Lemma. Let \(\mathsf{C}\) be the category of algebras of some Lawvere theory, and let \(A, B \in \mathsf{C}\) be two algebras whose underlying sets are finite. If the functors \(\mathrm{hom}(-,A)\) and \(\mathrm{hom}(-,B)\) are unnaturally isomorphic, then \(A \cong B\).

Here we say the functors \(\mathrm{hom}(-,A)\) and \(\mathrm{hom}(-,B)\) are unnaturally isomorphic if

\[ \mathrm{hom}(X,A) \cong \mathrm{hom}(X,B) \]

for all \(X \in \mathsf{C}\). We’re not imposing the usual commuting naturality square — indeed we can’t, since we’re not even giving any specific choice of isomorphism!

If \(\mathrm{hom}(-,A)\) and \(\mathrm{hom}(-,B)\) are naturally isomorphic, you can easily show \(A \cong B\) using the Yoneda Lemma. But when they’re unnaturally isomorphic, you have to break the glass and pull out the Super-Yoneda Lemma.

Given this shocking lemma, it’s easy to show this:

Theorem. Let \(A, B\) be two algebras of a Lawvere theory whose underlying sets are finite. If \(A^k \cong B^k\) for some natural number \(k\) then \(A \cong B\).

Here’s how. Since \(A^k \cong B^k\), we have natural isomorphisms

\[ \mathrm{hom}(-,A)^k \cong \mathrm{hom}(-, A^k) \cong \mathrm{hom}(-, B^k) \cong \mathrm{hom}(-,B)^k \]

so for any \(X \in \mathsf{C}\) the sets \(\mathrm{hom}(X,A)^k\) and \(\mathrm{hom}(X,B)^k\) have the same cardinality, and hence so do \(\mathrm{hom}(X,A)\) and \(\mathrm{hom}(X,B)\), since finite sets with the same \(k\)th power have the same cardinality. This means we have an unnatural isomorphism

\[ \mathrm{hom}(-,A) \cong \mathrm{hom}(-,B) \]

The lemma magically lets us conclude that

\[ A \cong B \]

Now, how do we use this to solve our puzzle? Let \(a(n)\) be the number of isomorphism classes of algebras whose underlying set has \(n\) elements. We must have

\[ a(n^k) \ge a(n) \]

since we’ve just seen that nonisomorphic algebras with \(n\) elements give nonisomorphic algebras with \(n^k\) elements. So, for example, we can never have \(a(4) < a(2)\), since \(4 = 2^2\). Thus, the sequence can’t look like the one I showed you, with

\[ a(0) = 1, \; a(1) = 1, \; a(2) = 2, \; a(3) = 1, \; a(4) = 1, \dots \]
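This necessary condition is easy to test mechanically. A small sketch: the group counts up to \(n = 8\) are the ones quoted above, and I have extended the list to \(n = 15\) with standard values for illustration.

```python
# Number of groups of order n, for n = 0..15.
groups = [0, 1, 1, 1, 2, 1, 2, 1, 5, 2, 2, 1, 5, 1, 2, 1]

def respects_power_inequality(a):
    """Check a(n^k) >= a(n) for every n >= 2, k >= 2 within range."""
    for n in range(2, len(a)):
        power = n * n
        while power < len(a):
            if a[power] < a[n]:
                return False
            power *= n
    return True
```

The groups sequence passes, while the hypothetical sequence 1, 1, 2, 1, 1, … from the puzzle fails at \(a(4) < a(2)\).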

Nice! So let’s turn to the lemma, which is the really interesting part.

I’ll just quote Omar Antolín’s proof, since I can’t improve on it. I believe the ideas go back to Lovász, but a bit of category theory really helps. Remember, \(A\) and \(B\) are algebras of some Lawvere theory whose underlying sets are finite:

Let \(\mathrm{mon}(X, A)\) be the set of monomorphisms, which here are just homomorphisms that are injective functions. I claim you can compute the cardinality of \(\mathrm{mon}(X, A)\) using the inclusion-exclusion principle in terms of the cardinalities of \(\mathrm{hom}(Q, A)\) for various quotients \(Q\) of \(X\).

Indeed, for any pair of elements \(x, y \in X\), let \(S(x, y)\) be the set of homomorphisms \(f \colon X \to A\) such that \(f(x) = f(y)\). The monomorphisms are just the homomorphisms that belong to none of the sets \(S(x, y)\), so you can compute how many there are via the inclusion-exclusion formula: you’ll just need the cardinality of intersections of several \(S(x_i, y_i)\).

Now, the intersection of some \(S(x_i, y_i)\) is the set of homomorphisms \(f\) such that \(f(x_i) = f(y_i)\) for all \(i\). Those are in bijection with the homomorphisms \(Q \to A\), where \(Q\) is the quotient of \(X\) obtained by adding the relations \(x_i = y_i\) for each \(i\).

So far I hope I’ve convinced you that if \(\mathrm{hom}(-, A)\) and \(\mathrm{hom}(-, B)\) are unnaturally isomorphic, so are \(\mathrm{mon}(-, A)\) and \(\mathrm{mon}(-, B)\). Now it’s easy to finish: since \(\mathrm{mon}(A, A)\) is non-empty, so is \(\mathrm{mon}(A, B)\), so \(A\) is isomorphic to a subobject of \(B\). Similarly \(B\) is isomorphic to a subobject of \(A\), and since they are finite, they must be isomorphic.
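The objects in this proof are easy to play with computationally. Below is a brute-force sketch (illustrative only, not the inclusion-exclusion computation itself) that enumerates homomorphisms and monomorphisms between finite algebras given by their multiplication tables, here for the two-element semigroup \(\mathbb{Z}/2\) with addition:

```python
from itertools import product

def homs(X, op_X, A, op_A):
    """All homomorphisms (X, op_X) -> (A, op_A), found by brute force."""
    found = []
    for images in product(A, repeat=len(X)):
        f = dict(zip(X, images))
        # f is a homomorphism iff it commutes with the operations
        if all(f[op_X[(x, y)]] == op_A[(f[x], f[y])] for x in X for y in X):
            found.append(f)
    return found

def mons(X, op_X, A, op_A):
    """Homomorphisms that are injective functions."""
    return [f for f in homs(X, op_X, A, op_A)
            if len(set(f.values())) == len(X)]

# The two-element semigroup (Z/2, +), given by its multiplication table.
Z2 = [0, 1]
add = {(x, y): (x + y) % 2 for x in Z2 for y in Z2}
```

Here `homs(Z2, add, Z2, add)` finds two homomorphisms (the identity and the constant-zero map) but only one monomorphism (the identity), so \(\mathrm{mon}(A, A)\) is indeed non-empty, as the proof uses.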


But if you look at this argument you’ll see we didn’t use the full force of the assumptions. We didn’t need \(A\) and \(B\) to be algebras of a Lawvere theory. They could have been topological spaces, or posets, or simple graphs (which you can think of as reflexive symmetric relations), or various other things. It seems all we really need is a category \(\mathsf{C}\) of gadgets with a forgetful functor

\[ U \colon \mathsf{C} \to \mathsf{FinSet} \]

that is faithful and has some extra property… roughly, that we can take an object in \(\mathsf{C}\) and take a quotient of it where we impose a bunch of extra relations \(x_i = y_i\), and maps out of this quotient will behave as you’d expect. More precisely, I think the extra property is this:

Given any \(X \in \mathsf{C}\) and any surjection \(p \colon U(X) \to S\), there is a morphism \(j \colon X \to Q\) such that the morphisms \(f \colon X \to Y\) that factor through \(j\) are precisely those for which \(U(f)\) factors through \(p\).

Can anyone here shed some light on this property, and which faithful functors \(U \colon \mathsf{C} \to \mathsf{FinSet}\) have it? These papers should help:

but I haven’t had time to absorb them yet.

By the way, there’s a name for categories where the super-Yoneda Lemma holds: they’re called right combinatorial.

And there’s a name for the sequences I’m talking about. If \(T\) is a Lawvere theory, the sequence whose \(n\)th term is the number of isomorphism classes of \(T\)-algebras with \(n\) elements is called the fine spectrum of \(T\). The idea was introduced here:

  • Walter Taylor, The fine spectrum of a variety, Algebra Universalis 5 (1975), 263–303.

though Taylor used not Lawvere theories but an equivalent framework: ‘varieties’ in the sense of universal algebra. For a bit more on this, go here.

I’m interested in which sequences are the fine spectrum of some Lawvere theory. You could call this an ‘inverse problem’. The direct problem — computing the fine spectrum of a given Lawvere theory — is already extremely difficult in many cases. But the case where there aren’t any equational laws (except trivial ones) is manageable:

Some errors in Harrison’s paper were corrected here:

I suspect Harrison and Tureček’s formulas could be nicely derived using species, since they’re connected to the tree-like structures discussed here:

  • François Bergeron, Gilbert Labelle and Pierre Leroux, Combinatorial Species and Tree-Like Structures, Cambridge U. Press, Cambridge, 1998.

For all I know these authors point this out! It’s been a while since I’ve read this book.

n-Category Café Coalgebraic Behavioural Metrics: Part 1

guest post by Keri D’Angelo, Johanna Maria Kirss, Matina Najafi and Wojtek Rozowski

Long ago, coalgebras of all kinds lived together: deterministic and nondeterministic automata, transition systems of the labelled and probabilistic varieties, finite and infinite streams, and any other arrows \(\alpha \colon X \to F X\) for an arbitrary endofunctor \(F \colon \mathcal{C} \to \mathcal{C}\).


These different systems were governed by a unifying theory of \(F\)-coalgebras which allowed them to understand each other in their differences and become stronger by a sense of sameness all at once.


However, the land of coalgebra was ruled by a rigid principle. Its name was Behavioural Equivalence, and many felt it was too harsh in its judgement.


Some of the more delicate transition systems, like the probabilistic kinds, found being governed by behavioural equivalence to be unjust towards their nuanced and minute differences, and slowly, dissent began to brew among the peoples.


More and more did the coalgebras find within themselves small distinctions that did not quite outweigh their similarities, and more and more did they feel called to express that in their measurements. The need for a notion of behavioural distance gained momentum, but various paths towards a behavioural distance emerged.


Transition systems as coalgebras

State transition systems are a pervasive object in theoretical computer science. Consider a deterministic finite automaton. The usual textbook definition (see, for example, “Automata and Computability” by Dexter Kozen) of a DFA is a 5-tuple \((Q, \Sigma, \delta \colon Q \times \Sigma \to Q, F \subseteq Q, q_0)\) consisting of a finite set of states \(Q\), an alphabet \(\Sigma\), and a transition function \(\delta \colon Q \times \Sigma \to Q\) which, given a state \(q \in Q\) and a letter \(a \in \Sigma\), yields the state \(\delta(q,a) \in Q\) obtained by reading the letter \(a\) in the state \(q\). The subset \(F \subseteq Q\) describes the set of accepting states, and \(q_0\) denotes the initial state. The set of words over an alphabet \(\Sigma\), which we denote by \(\Sigma^{\ast}\), is the least set satisfying the following:

\[ \epsilon \in \Sigma^{\ast} \quad \text{(the empty word } \epsilon \text{ is a word)} \]

\[ \text{If } a \in \Sigma \text{ and } w \in \Sigma^{\ast} \text{ then } a w \in \Sigma^{\ast} \quad \text{(concatenating a letter to a word yields a word)} \]

We can now extend the transition function to operate on words. Define \(\hat{\delta} \colon Q \times \Sigma^{\ast} \to Q\) by induction on the construction of a word:

\[ \hat{\delta}(q,\epsilon) = q \qquad \hat{\delta}(q, a w) = \hat{\delta}(\delta(q,a), w) \]

A word \(w\) is accepted by the automaton if \(\hat{\delta}(q_0, w) \in F\). The set of all accepted words \(\{w \in \Sigma^{\ast} \mid \hat{\delta}(q_0,w) \in F\} \in 2^{\Sigma^{\ast}}\) is called the language of the automaton. The languages that can be accepted by a deterministic automaton with finitely many states are called regular or rational languages.
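The definitions above translate directly into code. A minimal sketch (the even-number-of-a’s automaton is my own toy example, not from the post):

```python
# A toy DFA: states 0 and 1 over the alphabet {a, b},
# accepting exactly the words with an even number of a's.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
accepting = {0}
q0 = 0

def delta_hat(q, word):
    """The extension of delta to words, unrolling the inductive definition."""
    for letter in word:
        q = delta[(q, letter)]
    return q

def accepts(word):
    """A word is accepted iff delta_hat(q0, word) lands in the accepting set."""
    return delta_hat(q0, word) in accepting
```

For instance, `accepts("abba")` holds while `accepts("ab")` does not.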

Now, let’s slightly relax and rearrange the definitions.

First, let’s drop the idea of having an initial state. Instead of the language of the automaton, we will now speak of the language of each state \(q \in Q\) of the automaton. Every automaton induces a function \(\dagger \colon Q \to 2^{\Sigma^{\ast}}\) given by \(q \mapsto \{w \in \Sigma^{\ast} \mid \hat{\delta}(q,w) \in F\}\), and the set \(\dagger(q)\) is called the language of the state \(q\).

We can view the set of accepting states as a function \(o \colon Q \to 2\) to the two-element set, such that \(o(q) = 1 \Leftrightarrow q \in F\). Currying the transition function yields \(\delta \colon Q \to Q^\Sigma\). Now, we can use the cartesian product to combine the functions \(o\) and \(\delta\) into a function \(\langle o, \delta \rangle \colon Q \to 2 \times Q^\Sigma\). Intuitively, this function takes a state to its one-step observable behaviour. Finally, we will drop the requirement that the set of states be finite. Automata with infinitely many states can denote more expressive languages from the Chomsky hierarchy, such as context-free languages.

Each deterministic automaton can now be viewed as a pair \((Q, \alpha \colon Q \to 2 \times Q^\Sigma)\). More interestingly, we can endow the set of all languages \(2^{\Sigma^{\ast}}\) with the structure of a deterministic automaton. Define the acceptance function \(\epsilon? \colon 2^{\Sigma^{\ast}} \to 2\) by \(\epsilon?(L) = 1 \Leftrightarrow \epsilon \in L\). In other words, a language will be an accepting state if and only if it contains the empty word. Now, define a transition function \((-)_a \colon 2^{\Sigma^{\ast}} \times \Sigma \to 2^{\Sigma^{\ast}}\) by \(L_a = \{w \mid a w \in L\}\). Intuitively, given a language \(L\) and a letter \(a\), we transition to the language obtained by cutting that letter from the words of \(L\) that start with \(a\). The transition structure \(\langle \epsilon?, (-)_a \rangle\) is often called the semantic Brzozowski derivative, and the automaton \(2^{\Sigma^{\ast}} \to 2 \times (2^{\Sigma^{\ast}})^\Sigma\) is called the (semantic) Brzozowski automaton.
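The semantic Brzozowski derivative can be illustrated concretely on finite languages, where both \(\epsilon?\) and \((-)_a\) are one-liners (a sketch; actual languages are generally infinite, so a finite set is only a stand-in):

```python
def eps(L):
    """The acceptance function: is the empty word in L?"""
    return "" in L

def deriv(L, a):
    """The a-derivative of L: words of L starting with a, with a cut off."""
    return {w[1:] for w in L if w[:1] == a}

def accepts_lang(L, word):
    """Run the Brzozowski automaton from state L: derive letter by
    letter, then test eps.  This is exactly membership: the result
    equals (word in L)."""
    for a in word:
        L = deriv(L, a)
    return eps(L)

# A finite language, standing in for a state of the Brzozowski automaton.
L = {"", "b", "ab", "aab"}
```

For example, `deriv(L, "a")` is `{"b", "ab"}`, and `accepts_lang(L, w)` agrees with `w in L` for every word `w`.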

It can be observed that the Brzozowski automaton provides a form of universal automaton into which every other deterministic automaton embeds in a well-behaved way. By well-behaved, we mean the following:

  • if a state \(q \in Q\) in some automaton accepts, then its language \(\dagger(q)\) contains the empty word, and so that language is an accepting state of the Brzozowski automaton;
  • if a state transitions to another state on some letter \(a \in \Sigma\), then in the Brzozowski automaton, the language of the first state transitions to the language of the second state on the same letter \(a\).

Formally speaking, let \(\langle o, \delta \rangle \colon Q \to 2 \times Q^\Sigma\) be a deterministic automaton with set of states \(Q\). The function \(\dagger_{\langle o,\delta \rangle} \colon Q \to 2^{\Sigma^{\ast}}\) taking each state to its language satisfies the following for all \(q \in Q\):

  • \(o(q) = \epsilon?\big(\dagger_{\langle o,\delta \rangle}(q)\big)\);
  • for all \(a \in \Sigma\), \(\big(\dagger_{\langle o,\delta \rangle}(q)\big)_a = \dagger_{\langle o,\delta \rangle}\big(\delta(q,a)\big)\).

In other words, the map \(\dagger_{\langle o, \delta \rangle}\) to the Brzozowski automaton is structure-preserving. More generally, functions between any two deterministic automata satisfying the conditions above are called homomorphisms. One can easily show that the Brzozowski automaton satisfies a certain universal property, namely that every other deterministic automaton admits a unique homomorphism to it (which is precisely the map assigning to a state its language). We can use the Brzozowski automaton and the language-assigning map to talk about the behaviour of states: we will say that two states are behaviourally equivalent if they are mapped to the same language in the Brzozowski automaton.

To sum up, the set of languages equipped with an automaton structure provides a universal and fully abstract treatment of the behaviours of states of deterministic automata.

Now, consider a different object of study within theoretical computer science and logic: Kripke frames. The usual way one gives the semantics of modal logic is through Kripke frames. A Kripke frame is a pair \((W, R \subseteq W \times W)\) consisting of a set of worlds \(W\) and an accessibility relation \(R \subseteq W \times W\). A valuation \(v \colon W \to \mathcal{P}(AP)\) is a function which assigns to each world the set of atomic propositions true in it. A Kripke frame along with a valuation forms a Kripke model. Given two Kripke models \(((W_1, R_1 \subseteq W_1 \times W_1), v_1)\) and \(((W_2, R_2 \subseteq W_2 \times W_2), v_2)\), we call a map \(f \colon W_1 \to W_2\) a \(p\)-morphism if the following conditions hold:

  • if \(u \mathrel{R_1} v\), then \(f(u) \mathrel{R_2} f(v)\);
  • if \(f(u) \mathrel{R_2} t\), then \(u \mathrel{R_1} w\) and \(t = f(w)\) for some \(w \in W_1\);
  • for every atomic proposition \(p\), \(p \in v_1(w)\) if and only if \(p \in v_2(f(w))\).

\(p\)-morphisms are well-behaved morphisms preserving the structure of Kripke models (and hence the validity of modal formulae), and are thus somewhat similar in spirit to the homomorphisms between deterministic automata.
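The three conditions can be checked by brute force on finite models. A sketch, with a hypothetical example of my own (two worlds ping-ponging, collapsed onto a single reflexive world where the same atom holds):

```python
def is_p_morphism(f, W1, R1, v1, W2, R2, v2):
    """Check the three p-morphism conditions on finite Kripke models.
    Relations are sets of pairs; valuations map worlds to sets of atoms."""
    # forth: related worlds map to related worlds
    forth = all((f[u], f[v]) in R2 for (u, v) in R1)
    # back: every transition of f(u) is matched by some transition of u
    back = all(any((u, w) in R1 and f[w] == t for w in W1)
               for u in W1 for t in W2 if (f[u], t) in R2)
    # atoms: the valuation is preserved and reflected
    atoms = all(v1[w] == v2[f[w]] for w in W1)
    return forth and back and atoms

W1, R1, v1 = [0, 1], {(0, 1), (1, 0)}, {0: {"p"}, 1: {"p"}}
W2, R2, v2 = ["w"], {("w", "w")}, {"w": {"p"}}
f = {0: "w", 1: "w"}
```

Here `is_p_morphism(f, W1, R1, v1, W2, R2, v2)` holds; changing the valuation so that the atom fails in one of the two worlds breaks the third condition.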

Observe that an accessibility relation \(R \subseteq W \times W\) on a set of worlds can be viewed equivalently as a function \(W \to \mathcal{P}(W)\). As with deterministic automata, we can use the cartesian product to combine the accessibility relation and the valuation into a function \(\langle R, v \rangle \colon W \to \mathcal{P}(W) \times \mathcal{P}(AP)\) which takes a world to its one-step observable behaviour. A Kripke model then becomes a pair \((W, \alpha \colon W \to \mathcal{P}(W) \times \mathcal{P}(AP))\).

Both deterministic automata and Kripke frames are pairs consisting of a set of states and a one-step transition function which takes a state into some set-theoretic construction describing its observable behaviour. These set-theoretic constructions are endofunctors \(\mathsf{Set} \to \mathsf{Set}\), in our case \(2 \times (-)^\Sigma\) and \(\mathcal{P}(-) \times \mathcal{P}(AP)\). Deterministic automata and Kripke frames are concrete instances of coalgebras for an endofunctor.

Let’s spell out the concrete definitions.

Let \(F \colon \mathcal{C} \to \mathcal{C}\) be an endofunctor on some category. An \(F\)-coalgebra is a pair \((X, \alpha \colon X \to F X)\) consisting of an object \(X\) of the category \(\mathcal{C}\) and an arrow \(\alpha \colon X \to F X\).

A homomorphism from an \(F\)-coalgebra \((X, \beta)\) to \((Y, \gamma)\) is a map \(f \colon X \to Y\) satisfying \(\gamma \circ f = F f \circ \beta\). Homomorphisms of deterministic automata and \(p\)-morphisms are concrete instantiations of this definition.

Coalgebras and their homomorphisms form a category. Under appropriate size restrictions on the functor \(F\), there exists a final coalgebra \((\nu F, t)\), which is a final object in the category of coalgebras. In other words, it satisfies the universal property that for every \(F\)-coalgebra \((X, \beta)\) there exists a unique homomorphism \(\dagger\beta \colon X \to \nu F\). Final coalgebras provide an abstract domain giving a denotation for the behaviour of \(F\)-coalgebras. We already saw that in the case of deterministic automata the Brzozowski automaton provides a final coalgebra, where formal languages denote the behaviour of states of automata, and the concrete notion of behavioural equivalence is language equivalence.

Behavioural equivalence of quantitative systems

Having introduced the motivation for the study of coalgebras, we move on to a particular example which will motivate looking at behavioural distance. Consider the endofunctor \(\mathcal{D} \colon \mathsf{Set} \to \mathsf{Set}\) which takes each set \(X\) to the set \(\mathcal{D} X = \{\mu \colon X \to [0,1] \mid \sum_{x \in X} \mu(x) = 1\}\) of discrete probability distributions over \(X\). On arrows \(f \colon X \to Y\), we have that \(\mathcal{D} f \colon \mathcal{D} X \to \mathcal{D} Y\) is given by \(\mathcal{D} f(\mu)(y) = \sum_{f(x) = y} \mu(x)\), also known as the pushforward measure. Coalgebras for \(\mathcal{D}\) are precisely discrete-time Markov chains. Since in this case states have no observable behaviour besides their transition probabilities, it turns out that all states are behaviourally equivalent. Let’s consider a slightly more involved example, where each state can also terminate with some probability.
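The action of \(\mathcal{D}\) on arrows is easy to implement for finite distributions, represented here as dictionaries (a sketch; the collapsing map is a hypothetical example of mine):

```python
def D(f, mu):
    """The distribution functor on arrows: the pushforward measure,
    D(f)(mu)(y) = sum of mu(x) over all x with f(x) = y."""
    out = {}
    for x, p in mu.items():
        out[f(x)] = out.get(f(x), 0.0) + p
    return out

# Collapsing three outcomes to two: the masses of "a" and "b" merge.
mu = {"a": 0.25, "b": 0.25, "c": 0.5}
collapse = lambda x: "v" if x in ("a", "b") else "w"
nu = D(collapse, mu)
```

The pushforward `nu` assigns probability 0.5 to each of `"v"` and `"w"`, and total mass 1 is preserved, as it must be for a functor into distributions.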

Consider the composite functor \(\mathcal{D}(1 + (-))\) that takes a set \(X\) to \(\mathcal{D}(\{\ast\} \cup X) := \mathcal{D}(1 + X)\). Coalgebras for it can be thought of as Markov chains with potential deadlock behaviour. Now, let’s look at a concrete example of a four-state automaton.


State \(z\) enters deadlock with probability \(1\), while state \(u\) has a self-loop with probability \(1\). The functor \(\mathcal{D}(1 + (-))\) admits a final coalgebra, and it is not too hard to observe that states \(u\) and \(z\) exhibit completely different behaviour and will be mapped to distinct elements of the carrier of the final coalgebra. State \(y\) has probability \(\frac{1}{2}\) of ending in \(u\) and probability \(\frac{1}{2}\) of ending in \(z\): it has two equiprobable options of transitioning to states which behave differently from each other. It is not too hard to see that \(y\) is also not behaviourally equivalent to \(u\) and \(z\). Finally, consider the state \(x\). Let \(\epsilon \in [0, \frac{1}{2}]\). State \(x\) can transition to \(u\) with probability \(\frac{1}{2} - \epsilon\) and to \(z\) with probability \(\frac{1}{2} + \epsilon\). Observe that only when \(\epsilon = 0\) is state \(x\) behaviourally equivalent to \(y\). Even for very small non-zero values of \(\epsilon\), states \(x\) and \(y\) are considered inequivalent.

For practical purposes, coalgebraic behavioural equivalence (for coalgebras on \(\mathsf{Set}\)) might be way too restrictive. In the case of systems exhibiting quantitative effects, it would be more desirable to consider a more robust notion of behavioural distance, which would quantify how similarly two states behave. The abstract idea of distance has been captured by the mathematical notion of metric spaces and related objects.

Pseudometrics and PMet\mathsf{PMet}

Underlying the behavioural distance between states of a coalgebra is the mathematical idea of distance. Most commonly, distance is treated in the form of a metric space. A metric space is a set \(X\) with a distance function \(d \colon X \times X \to [0,\infty)\) that satisfies the following axioms for all \(x, y, z \in X\):

  • Separation: \(d(x,y) = 0 \iff x = y\);
  • Symmetry: \(d(x,y) = d(y,x)\);
  • Triangle inequality: \(d(x,z) \leq d(x,y) + d(y,z)\).

In the context of coalgebras, however, we may wish to have a slightly altered notion of distance to capture behavioural similarity.

If two states behave identically, then even if they are not the same state, we expect their distance to be zero. Thus, we relax the separation axiom and only require that \(d(x,x) = 0\) for all \(x \in X\). This yields a pseudometric. Note that a pseudometric is still symmetric and still fulfills the triangle inequality. It is also possible to have an asymmetric notion of distance, where both separation and symmetry are relaxed. The distance function obtained this way is a function \(d \colon X \times X \to [0,\infty)\) that only fulfills the triangle inequality: it is possible that \(d(x,y) = 0\) for some \(x \neq y\), and that \(d(x,y) \neq d(y,x)\). This is called a hemimetric, and it occurs when formalising distance via fuzzy lax extensions.

Finally, we may wish to bound the distance, to have a way of expressing when two states are “maximally different”. For example, we use this to express that an accepting and a non-accepting state are maximally different. In order to accommodate that, we instead define the distance function as \(d \colon X \times X \to [0,\top]\), where \(\top \in (0,\infty]\). In the context of behavioural distances on coalgebras, it is common to take \(\top = 1\) or \(\top = \infty\). The case \(\top = \infty\) requires a way of calculating with infinity, which is resolved by setting \[ d(x,\infty) = \infty \text{ for } x \neq \infty; \qquad d(\infty,\infty) = 0; \qquad x + \infty = \infty \text{ for } x \in [0,\infty]. \]
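For finite spaces all of these axioms can be checked exhaustively. A sketch, using Python’s `float("inf")` for \(\top = \infty\) (the example pseudometric, with two distinct points at distance zero, is my own):

```python
INF = float("inf")  # Python's inf already satisfies x + inf == inf

def is_pseudometric(points, d, top=INF):
    """Exhaustively check reflexivity d(x,x)=0, symmetry, the triangle
    inequality, and the bound d <= top on a finite space."""
    refl = all(d(x, x) == 0 for x in points)
    symm = all(d(x, y) == d(y, x) for x in points for y in points)
    tri = all(d(x, z) <= d(x, y) + d(y, z)
              for x in points for y in points for z in points)
    bound = all(d(x, y) <= top for x in points for y in points)
    return refl and symm and tri and bound

# A pseudometric that is not a metric: u and v are distinct points at
# distance zero (think: behaviourally equivalent states).
points = ["u", "v", "w"]
d = lambda x, y: 0.0 if x == y or {x, y} == {"u", "v"} else 1.0
```

Dropping symmetry (a hemimetric) makes the check fail, as expected.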

Once we fix the bound \(\top\) and some set \(X\), we can consider the set \(\mathsf{PMet}(X)\) of all pseudometrics \(d\) over \(X\). It turns out that under the pointwise order and appropriate definitions of meet and join, \(\mathsf{PMet}(X)\) becomes a lattice. In particular, the order is given by \[ d_1 \leq d_2 \iff d_1(x,y) \leq d_2(x,y) \quad \forall x,y \in X, \] and for a subset \(D \subseteq \mathsf{PMet}(X)\), \[ (\sup D)(x,y) := \sup\{d(x,y) \mid d \in D\}, \qquad \inf D := \sup\{d \mid d \in \mathsf{PMet}(X) \land \forall d' \in D.\, d \leq d'\}. \]

Moreover, it is a complete lattice, meaning each subset has a meet and a join. The lattice being complete implies that the category of all pseudometric spaces for a fixed \top is complete and cocomplete, which in turn implies that the category has products and coproducts.

Here’s how we define this category of pseudometric spaces:

The category \(\mathsf{PMet}\) for a fixed \(\top \in (0,\infty]\) has

  • pseudometric spaces \((X, d_X)\) as objects,
  • nonexpansive maps \(f \colon (X, d_X) \to (Y, d_Y)\) as morphisms.

The definition of nonexpansive maps is quite immediate: \[ d_Y(f(x), f(y)) \leq d_X(x,y) \quad \forall x,y \in X, \] i.e. the maps do not increase distances between elements. Clearly, the composition of nonexpansive maps is nonexpansive as well, and the identity map is nonexpansive.

Motivation from transportation theory

In the long run, we would like to be able to go from distances on the states of a coalgebra to distances on observable behaviours. For now, let’s focus our attention on the discrete distributions functor. Let \(X\) be a set equipped with a \(\top\)-pseudometric \(d_X \colon X \times X \to [0,\top]\), and say we have two distributions \(\nu, \mu \in \mathcal{D} X\). We would like to define a pseudometric \(d_{\mathcal{D} X} \colon \mathcal{D} X \times \mathcal{D} X \to [0,\top]\) which would allow us to calculate the distance between \(\nu\) and \(\mu\).

It turns out there is a well-known answer to this question, coming from the field of transportation theory. Let’s make the example slightly more concrete. Let \(X = \{A, B, C\}\); from now on we will omit the subscript on the map \(d_X \colon X \times X \to [0,\top]\).

Imagine that \(A\), \(B\) and \(C\) are three bakeries, each with an adjacent shop where one can taste the pastries. We have that \(d(A,A) = d(B,B) = d(C,C) = 0\). Moreover, let’s say that the distance between \(A\) and \(B\) is three (\(d(A,B) = d(B,A) = 3\)), while \(B\) and \(C\) are at distance four (\(d(B,C) = d(C,B) = 4\)) and \(A\) and \(C\) are at distance five (\(d(A,C) = d(C,A) = 5\)). Hence, \(d\) is a metric.

Let \(\nu \in \mathcal{D} X\) be the distribution of the supply of pastries at each of the bakeries: \(\nu(A) = 0.7\), \(\nu(B) = 0.1\) and \(\nu(C) = 0.2\).

Let \(\mu \in \mathcal{D} X\) be the distribution of demand in the pastry shops adjacent to the bakeries. We have that \(\mu(A) = 0.2\), \(\mu(B) = 0.3\) and \(\mu(C) = 0.5\).


At bakery \(A\), more pastries are produced than consumed, while at \(B\) and \(C\) more people want pastries than those bakeries can actually produce. The reasonable thing for the owner of these places to do would be to redistribute the pastries so that the needs of the customers are satisfied. One way of solving this problem is to come up with a transport plan, a map \(t \colon X \times X \to [0,1]\), where \(t(x,y) = k\) intuitively means: move the fraction \(k\) of the pastries from bakery \(x\) to bakery \(y\). Such a transport plan should satisfy

  • For all \(x \in X\), \(\nu(x) = \sum_{x' \in X} t(x,x')\) - the whole supply of pastries is used
  • For all \(x \in X\), \(\mu(x) = \sum_{x' \in X} t(x',x)\) - all demand is satisfied

In other words, one can think of \(t \colon X \times X \to [0,1]\) as a joint probability distribution \(t \in \mathcal{D}(X \times X)\) whose marginals are \(\nu\) and \(\mu\). Such a joint distribution, which equivalently describes a transportation plan, is called a coupling of \(\nu\) and \(\mu\). There can be multiple transportation plans, but some might involve unnecessary movement of pastries to a bakery that is too far away. Each transportation plan can be assigned a cost, proportional to the distance that each fraction of pastries needs to travel. The cost of a plan \(t\) is given by \[ c_t = \sum_{(x,y) \in X \times X} d(x,y)\, t(x,y). \]

Let \(\Gamma(\nu, \mu)\) denote the set of all couplings of \(\nu\) and \(\mu\). The owner of the bakeries is aiming to minimise the total cost of transportation. The minimal cost of transportation from \(\nu\) to \(\mu\) is given by \[ d^{\mathcal{D}\downarrow}(\mu, \nu) = \min\Big\{\sum_{(x,y) \in X \times X} d(x,y)\, t(x,y) \;\Big|\; t \in \Gamma(\nu, \mu)\Big\} \]

It turns out that phrasing this minimisation problem as a linear program guarantees the existence of an optimal coupling, one which costs the least. The definition above takes a pseudometric \(d \colon X \times X \to [0,\top]\) and transforms it into a pseudometric \(d^{\mathcal{D}\downarrow} \colon \mathcal{D} X \times \mathcal{D} X \to [0,\top]\) on distributions over \(X\). This (pseudo)metric is known as the Wasserstein metric.

In our example, it turns out that the most cost-effective plan is the following:

  • Move \(\frac{1}{5}\) of the pastries from \(A\) (where there is overproduction) to \(B\)
  • Move \(\frac{3}{10}\) of the pastries from \(A\) to \(C\)

The optimal cost is given by \[ d^{\mathcal{D}\downarrow}(\nu,\mu) = \frac{1}{5} \cdot 3 + \frac{3}{10} \cdot 5 = 2.1 \]
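The plan and its cost can be verified mechanically (a sketch; the variable names are mine):

```python
X = ["A", "B", "C"]
pairwise = {("A", "B"): 3, ("B", "C"): 4, ("A", "C"): 5}
d = lambda x, y: 0 if x == y else pairwise.get((x, y), pairwise.get((y, x)))
nu = {"A": 0.7, "B": 0.1, "C": 0.2}   # supply
mu = {"A": 0.2, "B": 0.3, "C": 0.5}   # demand

# The plan from the text: pastries that stay put, plus the two moves.
plan = {("A", "A"): 0.2, ("B", "B"): 0.1, ("C", "C"): 0.2,
        ("A", "B"): 0.2, ("A", "C"): 0.3}
t = {(x, y): plan.get((x, y), 0.0) for x in X for y in X}

# Coupling conditions: the marginals of t must be nu and mu.
supply_ok = all(abs(sum(t[(x, y)] for y in X) - nu[x]) < 1e-9 for x in X)
demand_ok = all(abs(sum(t[(x, y)] for x in X) - mu[y]) < 1e-9 for y in X)
cost = sum(d(x, y) * t[(x, y)] for x in X for y in X)
```

Both marginal checks pass and the cost comes out to 2.1, as computed above; the diagonal entries contribute nothing since \(d(x,x) = 0\).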

Now, consider an alternative approach. Let’s say that instead of organising the transport on their own, the owner of the bakeries hires an external company to do it. In this setting the company assigns to each bakery a price at which it buys pastries (in the case of overproduction) or sells them (in the case of higher demand). Formally, this is modelled by functions \(f \colon X \to \mathbb{R}^{+}\) satisfying, for all \(x, y \in X\), \[ |f(x) - f(y)| \leq d(x,y). \] One can think of this requirement as saying that the owner of the bakeries won’t accept a situation where they have to pay more for the transport than when performing the transport on their own. Formally speaking, it means that \(f\) is a nonexpansive function from \((X,d)\) to the nonnegative reals equipped with the Euclidean metric \(d_e\). Given a price plan \(f \colon X \to \mathbb{R}^+\), the income of the company is \[ \sum_{x \in X} f(x)\,(\mu(x) - \nu(x)) \]

The external company would like to maximise its profit, so the optimal profit is given by \[ d^{\mathcal{D}\uparrow}(\mu, \nu) = \sup\Big\{\sum_{x \in X} f(x)\,(\mu(x) - \nu(x)) \;\Big|\; f \colon (X,d) \to (\mathbb{R}^{+}, d_e) \text{ nonexpansive}\Big\} \]

The optimal pricing plan exists (again by formulating the problem as a linear program), so the above is well-defined. \(d^{\mathcal{D}\uparrow}\) happens to be a pseudometric, as long as \(d\) is. This notion of distance between distributions is known as the Kantorovich metric.

For the example above, the optimal price plan is the following:

  • The owner of the bakeries gives out the excess from \(A\) for free (\(f(A) = 0\))
  • Bakery \(B\) buys the necessary pastries for three monetary units (\(f(B) = 3\))
  • Bakery \(C\) does the same at the cost of five units (\(f(C) = 5\))

In this case, the distance is given by \[ d^{\mathcal{D}\uparrow}(\mu, \nu) = 0 \cdot (0.2 - 0.7) + 3 \cdot (0.3 - 0.1) + 5 \cdot (0.5 - 0.2) = 2.1 \]

In this particular case, we ended up with the same distance as before, when talking about transport plans. This is no coincidence, and is actually an instance of a quite famous result known as Kantorovich-Rubinstein duality: \[ d^{\uparrow\mathcal{D}} = d^{\downarrow\mathcal{D}}. \] One can intuitively think of minimising the transportation cost as dual to the external company trying to maximise its profit.
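To make the dual computation concrete, here is a minimal Python sketch (not from the post; the distributions \(\mu\), \(\nu\) and the price plan are taken from the bakery example above) evaluating the company’s profit under the optimal price plan:

```python
# Supply (mu) and demand (nu) distributions over the three bakeries,
# and the optimal price plan found above.
mu = {"A": 0.2, "B": 0.3, "C": 0.5}
nu = {"A": 0.7, "B": 0.1, "C": 0.2}
f = {"A": 0.0, "B": 3.0, "C": 5.0}

# The company's profit: sum over x of f(x) * (mu(x) - nu(x)).
profit = sum(f[x] * (mu[x] - nu[x]) for x in mu)
print(round(profit, 10))  # 2.1
```

By Kantorovich-Rubinstein duality this matches the minimal transport cost computed earlier; finding the optimal plan in general means solving a linear program over nonexpansive \(f\).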

Liftings to \(\mathsf{PMet}\)

The above distances can also be thought of as answers to the following lifting problem. If \(F\) is a functor on \(\mathsf{Set}\), then how can one construct a functor \(\overline{F}\) on \(\mathsf{PMet}\) such that the square

\[ \begin{array}{ccc} \mathsf{PMet} & \overset{\overline{F}}{\longrightarrow} & \mathsf{PMet} \\ {\scriptstyle U}\downarrow & & \downarrow{\scriptstyle U} \\ \mathsf{Set} & \underset{F}{\longrightarrow} & \mathsf{Set} \end{array} \]

commutes (where \(U\) is the forgetful functor)? In other words, how can we construct a functor \(\overline{F}:\mathsf{PMet}\to \mathsf{PMet}\) that acts as \(F\) on the underlying sets and nonexpansive functions?

The first priority when defining a lift is that \(\overline{F}\) should be a functor. That is, when given a pseudometric space \((X,d)\), we need to define a pseudometric on \(FX\) in a sensible way. Since the definition of a lift implies that \(\overline{F}f\) is just \(Ff\) for all nonexpansive \(f:(X,d_X)\to (Y,d_Y)\), and we need \(\overline{F}f\) to be nonexpansive, the pseudometrics on the objects \(FX\) and \(FY\) need to ensure that \(Ff\) is nonexpansive. Then and only then does \(\overline{F}\) become a functor.

Put precisely, in order to have a lift, we require that:

  • for any pseudometric space \((X,d_X)\), we have a pseudometric space \((FX,d_X^F)\), such that

  • for any nonexpansive function \(f:(X,d_X)\to (Y,d_Y)\), the function \(Ff:(FX,d_X^F)\to (FY,d_Y^F)\) is nonexpansive.

In the above case of the distance between two distributions, an initial distance \(d_X\) between elements of \(X\) was given, and the distance on \(\mathcal{D}X\) was defined from it. The functoriality of this construction was not checked here, but it holds (and a proof in the abstract case can be found in the referenced paper).

Inspired by the construction in the case of \(\mathcal{D}\), Kantorovich and Wasserstein liftings can be defined for arbitrary functors. This is also where the relevance to coalgebras becomes evident: the notion of a lift enables us to start with a pseudometric on the state space \(X\), and induce from it a pseudometric on the space of possible behaviours \(FX\).

In the mechanics of lifting a pseudometric, it is useful to consider an evaluation function: a function \(ev_F : FX \to [0,\top]\) for all objects \(X\). Then, we can consider the function \(\widetilde{F}f : FX \to [0,\top]\) defined as \(\widetilde{F}f = ev_F \circ Ff\) for morphisms \(f : X \to [0,\top]\). Certain choices of evaluation function suggest themselves for particular functors. For example, in the case of \(\mathcal{D}\), the evaluation function \(\mathcal{D}X \to [0,1]\) is taken to be the expected value \(\mathbb{E}(-)\).

Kantorovich lifting

The Kantorovich lifting is defined as follows. Let \(F\) be a functor on \(\mathsf{Set}\) with an evaluation function \(ev_F\). Then the Kantorovich lifting is the functor \(\overline{F}\) that takes \((X,d)\) to the space \((FX,d^{\uparrow F})\), where \[ d^{\uparrow F}(t_1,t_2) := \sup\left\{\, d_e\big( \widetilde{F}f(t_1),\widetilde{F}f(t_2)\big) \;\middle|\; f:(X,d)\to ([0,\top],d_e) \text{ nonexpansive}\,\right\}. \]

This is in fact the smallest possible pseudometric that makes all functions \(\widetilde{F}f : FX \to [0,\top]\) nonexpansive. The Kantorovich lifting preserves isometries.

Wasserstein lifting

The Wasserstein distance \(d^{\downarrow F}\) required to define the Wasserstein lifting needs more work to set up. Altogether, it requires a generalised definition of couplings, some well-behavedness conditions on the evaluation function, and the preservation of weak pullbacks by the functor.

Tommaso DorigoThe Monte Carlo Method

These days I am in Paris, for a short vacation - for once, I am following my wife on a work trip; she performs at the Grande Halle at la Villette (she is a soprano singer), and I exploit the occasion to have some pleasant time in one of the cities I like the most.

This morning I took the metro to go downtown, and found myself standing up in a wagon full of people. When my eyes wandered to the pavement, I saw that the plastic sheet had circular bumps, presumably reducing the chance of slips. And the pattern immediately reminded me of the Monte Carlo method, as it betrayed the effect of physical sampling of the ground by the passengers' feet:

read more

n-Category Café Finite Model Theory and Game Comonads: Part 2

guest post by Elena Dimitriadis, Richie Yeung, Tyler Hanks, and Zhixuan Yang

In Part 1 of this post, we saw how logical equivalences of first-order logic (FOL) can be characterised by a combinatorial game, but there are still a few unsatisfactory aspects of the formulation of EF games in Part 1:

  1. The game was formulated in a slightly informal way, delegating the precise meaning of “turns”, “moves”, “wins” to our common sense.

  2. There are variants of the EF game that characterise logical equivalences for other logics, but these closely related games are defined ad hoc rather than as instances of one mathematical framework.

  3. We have confined ourselves entirely to the classical semantics of FOL in the category of sets, rather than general categorical semantics.

So you, a patron of the n-Category Café, must be thinking that category theory is perfect for addressing these problems! This is exactly what we are gonna talk about today—the framework of game comonads that was introduced by Abramsky, Dawar and Wang (2017) and Abramsky and Shah (2018).

(We will not address the third point above in this post though, but hopefully the reader will agree that what we talk about below is a useful first step towards model comparison games for general categorical logic.)

One-Way EF Games

Let’s warm up by recalling EF games and considering a simplified version of them.

Recall that a \(k\)-round EF game is parameterized by two \(\sigma\)-relational structures \(\mathcal{A}\) and \(\mathcal{B}\) for some relational vocabulary \(\sigma\). The rule is that in every round \(1 \leq i \leq k\), the spoiler picks either an element from \(\mathcal{A}\) or an element from \(\mathcal{B}\), and then the duplicator responds with an element from the other structure. The duplicator wins if after \(k\) rounds, these elements form a partial isomorphism.

In the game the spoiler has the freedom in each round to pick from the structure \(\mathcal{A}\) or \(\mathcal{B}\), so EF games are also sometimes called back-and-forth games. We can also consider the one-way variant of EF games from \(\mathcal{A}\) to \(\mathcal{B}\), where the spoiler can only pick elements \(a_i\) from \(\mathcal{A}\) (so the duplicator responds with elements \(b_i\) from \(\mathcal{B}\)). Additionally, we weaken the winning condition for the duplicator to \(a_i \mapsto b_i\) forming a partial homomorphism from \(\mathcal{A}\) to \(\mathcal{B}\) (rather than a partial isomorphism).

It is not difficult to modify the Ehrenfeucht-Fraïssé theorem that we saw last time to show that such one-way EF games characterise the fragment of FOL that only uses \(\exists\), \(\wedge\), \(\vee\), \(\text{true}\), \(\text{false}\). This fragment of FOL is known as the existential-positive fragment of first-order logic, or coherent logic.

Theorem (Existential Ehrenfeucht-Fraïssé). If the duplicator has a winning strategy for the \(k\)-round one-way EF game from \(\mathcal{A}\) to \(\mathcal{B}\), then \(\mathcal{A} \models \varphi\) implies \(\mathcal{B} \models \varphi\) for all closed formulas \(\varphi\) of quantifier rank \(k\) in the existential-positive fragment.

A consequence of one-way EF games is that a winning strategy for the duplicator can now be thought of as a function from the half-board of \(\mathcal{A}\)-elements \(\langle a_1, \cdots, a_i\rangle\) in each round \(i\) to the duplicator’s response \(b_i \in \mathcal{B}\), instead of a function from the whole board \(\langle\langle a_1, \cdots, a_i\rangle, \langle b_1, \cdots, b_{i-1}\rangle\rangle\) to their response \(b_i\). The reason is that the \(\mathcal{B}\)-elements \(\langle b_1, \cdots, b_{i-1}\rangle\) were all picked by the duplicator themselves in earlier rounds, so the duplicator knows what they are, and they don’t have to be inputs to the duplicator’s winning strategy.

Write \(A^{\leq k}\) for the set \(\left\{\langle a_1, \cdots, a_i\rangle \in A^\ast \mid 1 \leq i \leq k\right\}\) of non-empty \(A\)-sequences of length at most \(k\). Clearly, not every function \(f : A^{\leq k} \to B\) is a valid winning strategy for the duplicator in the one-way EF game, since the duplicator has to make sure their responses \(b_i\) maintain a partial homomorphism to the spoiler’s choices \(a_i\).

More precisely, for all half-boards \(\langle a_1, \cdots, a_i\rangle \in A^{\leq k}\), if some of its elements are related by a relation \(P \in \sigma\) of arity \(m\), i.e. \(P^A(a_{j_1}, \dots, a_{j_m})\) holds for some \(1 \leq j_1, \dots, j_m \leq i\), the duplicator’s strategy \(f : A^{\leq k} \to B\) must make \[ P^B(f\,\langle a_1, \dots, a_{j_1}\rangle,\ \dots,\ f\,\langle a_1, \dots, a_{j_m}\rangle) \] hold.
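As a sanity check on this condition, here is a small brute-force checker (a hypothetical sketch, not from the post, specialised to a vocabulary with a single binary relation): it tests whether a strategy \(f : A^{\leq k} \to B\) preserves the relation on every half-board.

```python
from itertools import product

def valid_strategy(f, A, PA, PB, k):
    """Check that strategy f maintains a partial homomorphism for a
    single binary relation PA on A and PB on B, over all half-boards
    of length at most k (brute force, so only for tiny examples)."""
    for i in range(1, k + 1):
        for board in product(A, repeat=i):          # spoiler's half-board
            for j1, j2 in product(range(i), repeat=2):
                if (board[j1], board[j2]) in PA:
                    # the duplicator's responses must be related too
                    if (f(board[:j1 + 1]), f(board[:j2 + 1])) not in PB:
                        return False
    return True

# A tiny example: A = B = {0, 1} with the "strictly less" relation.
A, PA = [0, 1], {(0, 1)}
copy_last = lambda s: s[-1]       # duplicator copies the spoiler's move
always_zero = lambda s: 0         # duplicator stubbornly answers 0
print(valid_strategy(copy_last, A, PA, PA, 3))    # True
print(valid_strategy(always_zero, A, PA, PA, 3))  # False
```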

The EF Game Comonad

Now we are ready to formulate EF games in a more categorical way using comonads. Recall that the Kleisli presentation of a comonad \((G, \epsilon, (-)^\ast)\) on a category \(\mathcal{C}\) is given by

  • a map \(G: \mathrm{Obj}(\mathcal{C}) \to \mathrm{Obj}(\mathcal{C})\),

  • for all \(A\in \mathcal{C}\), a counit \(\epsilon_A: G A \to A\), and

  • a co-extension operation \((-)^\ast\) that takes every morphism \(f : G A \to B\) to a morphism \(f^\ast: G A \to G B\)

such that the following equations hold for all \(f : G A \to B\) and \(g : G B \to C\): \[ \begin{array}{rcll} (g\circ {f^\ast})^\ast &=& g^\ast\circ f^\ast &: G A \to G C\\ \mathit{id}_{G A} &=& \epsilon_A^\ast &: G A \to G A\\ f &=& \epsilon_B\circ f^\ast &: G A \to B. \end{array} \]

Given any natural number \(k\), the mapping from a set \(A\) to the set \(A^{\leq k}\) of non-empty \(A\)-sequences of length at most \(k\) can be equipped with a comonad structure \(E_k\) on \(\mathbf{Set}\):

  • \(E_k A = A^{\leq k}\) for all sets \(A\);

  • \(\epsilon_A \langle a_1, \dots, a_i\rangle = a_i\) extracts the last element of the sequence (which corresponds to the newest choice by the spoiler in the one-way EF game);

  • for all \(f: E_k A \to B\), the co-extension \(f^\ast : E_k A \to E_k B\) is \[ f^\ast\,\langle a_1, \ldots, a_i\rangle = \langle f \, \langle a_1\rangle,\ f \, \langle a_1,a_2\rangle,\ \dots,\ f \, \langle a_1, \ldots, a_i\rangle\rangle, \] which intuitively means that, given the spoiler’s half-board on \(A\), the duplicator can recall their own historical moves on the half-board on \(B\).

A co-Kleisli map \(f : E_k A \to B\) for this comonad is then a function from half-boards of \(A\)-elements to responses in \(B\), so \(f\) encodes precisely a (not necessarily winning) strategy for the duplicator.
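For concreteness, the comonad structure of \(E_k\) is simple enough to transcribe directly, with sequences as tuples (a sketch, not from the post; the names `counit` and `coext` are ours). The asserts check the three comonad laws above on a sample co-Kleisli map and board:

```python
def counit(s):
    """epsilon: extract the last element of a non-empty sequence."""
    return s[-1]

def coext(f):
    """(-)* : turn a co-Kleisli map f (half-board -> response) into the
    map sending a half-board to the full sequence of f's responses."""
    return lambda s: tuple(f(s[:i]) for i in range(1, len(s) + 1))

# Check the comonad laws on sample co-Kleisli maps and a board.
f = lambda s: len(s)   # a toy "strategy"
g = lambda s: sum(s)   # another co-Kleisli map
s = ("a", "b", "c")

assert coext(counit)(s) == s                     # id = epsilon*
assert counit(coext(f)(s)) == f(s)               # f = epsilon . f*
assert coext(lambda t: g(coext(f)(t)))(s) == coext(g)(coext(f)(s))
                                                 # (g . f*)* = g* . f*
```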

The way to formulate winning strategies is to lift the comonad \(E_k\) on \(\mathbf{Set}\) to a comonad \(\mathbb{E}_k\) on the category \(\mathcal{R}(\sigma)\) of \(\sigma\)-relational structures.

Definition (EF Comonad). The comonad \(\mathbb{E}_k\) on \(\mathcal{R}(\sigma)\) is defined as follows:

  • The object mapping sends every \(\sigma\)-structure \(\mathcal{A} = \langle A, \{P^A\}_{P \in \sigma}\rangle\) to the \(\sigma\)-structure \(\mathbb{E}_k \mathcal{A} = \langle E_k A, \{P^{E_k A}\}_{P\in\sigma}\rangle\) whose underlying set is just \(E_k A\), i.e. non-empty \(A\)-sequences of length at most \(k\); for every relation symbol \(P \in \sigma\) of arity \(n\), its interpretation \(P^{E_k A} \subseteq (E_k A)^n\) relates sequences \(\langle s_1,\dots,s_n\rangle\) satisfying the following two conditions:

    1. for all \(1 \leq i, j \leq n\), the sequence \(s_i\) is a prefix of \(s_j\) or \(s_j\) is a prefix of \(s_i\), and
    2. \(\langle \epsilon_A(s_1), \dots, \epsilon_A(s_n)\rangle \in P^{\mathcal{A}}\).
  • The counit \(\epsilon_{\mathcal{A}} : \mathbb{E}_k \mathcal{A} \to \mathcal{A}\) and the co-extension \(f^\ast : \mathbb{E}_k \mathcal{A} \to \mathbb{E}_k \mathcal{B}\) are the same as those of the comonad \(E_k : \mathbf{Set} \to \mathbf{Set}\). It can be checked that they are valid morphisms in the category \(\mathcal{R}(\sigma)\).

Co-Kleisli morphisms \(\mathbb{E}_k \mathcal{A} \to \mathcal{B}\) are exactly winning strategies for the duplicator in the one-way EF game from \(\mathcal{A}\) to \(\mathcal{B}\) (modulo one subtle problem that we will talk about later). Let’s gain some intuition by looking at a small example.

Let us have two \(\sigma\)-structures, \(\mathcal{A}\) and \(\mathcal{B}\), with underlying sets \(A=\{a,b,c\}\) and \(B=\{a',b',c',d'\}\) respectively, and let us play a 3-round one-way EF game. We will try to associate the parts of the comonad as we go:

  • The spoiler chooses \(a\in A\). In response, the duplicator chooses \(a' \in B\): \[\epsilon_A \langle a\rangle=a,\quad f \langle a\rangle=a',\quad f^\ast \langle a\rangle=\langle a'\rangle.\]

  • The spoiler chooses \(b\in A\). In response, the duplicator chooses \(b'\in B\): \[\epsilon_A \langle a,b\rangle=b,\quad f \langle a,b\rangle=b',\quad f^\ast \langle a,b\rangle=\langle a',b'\rangle.\]

  • The spoiler chooses \(c\in A\). In response, the duplicator chooses \(c'\in B\): \[\epsilon_A \langle a,b,c\rangle=c,\quad f \langle a,b,c\rangle=c',\quad f^\ast \langle a,b,c\rangle=\langle a',b',c'\rangle.\]

For intuition on how the EF comonad \(\mathbb{E}_k\) acts on relations, let us look at the binary relation case. The general case has the same intuition, but with more terms.

Condition (1) of the above definition imposes that one sequence be a prefix of the other. In game terms, this amounts to saying that \(s_1\) and \(s_2\) can be stages of the same game play: either \(s_1\) evolves to \(s_2\) or \(s_2\) evolves to \(s_1\).

Condition (2) says that two sequences \(s_1\) and \(s_2\) are related iff their last elements are related. Since under Condition (1) \(s_1\) and \(s_2\) are two stages of the same play, \(s_1\) and \(s_2\) being related means precisely that two elements chosen in the same game are related.

Let us go back to our earlier example with our \(\sigma\)-structures \(A=\{a,b,c\}\) and \(B=\{a',b',c',d'\}\), and suppose there is a binary relation symbol \(\leq\) in \(\sigma\) whose interpretations in \(A\) and \(B\) are the alphabetical order. Then we can say that \(\langle a,b\rangle \leq^{E_k A} \langle a,b,c\rangle\), because on one hand both sequences start in the same way (this game has started with the spoiler playing \(a\) and then \(b\)), and on the other hand \(b\leq c\) in the alphabetical order.
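The two conditions defining the lifted relation are easy to check mechanically. A small sketch (hypothetical code, with sequences as tuples) reproducing the \(\leq\) example above:

```python
def is_prefix(s, t):
    return t[:len(s)] == s

def lifted(P, s1, s2):
    """s1 and s2 are related in E_k A iff (1) one is a prefix of the
    other and (2) their last elements are related by P in A."""
    comparable = is_prefix(s1, s2) or is_prefix(s2, s1)
    return comparable and (s1[-1], s2[-1]) in P

# Alphabetical order on A = {a, b, c}, as in the example above.
leq = {(x, y) for x in "abc" for y in "abc" if x <= y}
print(lifted(leq, ("a", "b"), ("a", "b", "c")))  # True: prefix and b <= c
print(lifted(leq, ("a", "b"), ("a", "c")))       # False: not stages of one play
```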

Now the intuition behind the comonad \(\mathbb{E}_k\) might be clearer: in the \(\sigma\)-relational structure \(\mathbb{E}_k\mathcal{A}\), some sequences \(s_1, \dots, s_n\) being related means precisely that there is a game play in which \(s_1, \dots, s_n\) are the spoiler’s half-boards on \(\mathcal{A}\) in certain rounds, and the spoiler’s choices \(\epsilon(s_1), \dots, \epsilon(s_n)\) are related by some relation in \(\mathcal{A}\). Therefore, a \(\sigma\)-structure homomorphism \(\mathbb{E}_k \mathcal{A} \to \mathcal{B}\) is a winning strategy for the duplicator.

There is one problem though: we have lifted all relations \(P \in \sigma\) on \(\mathcal{A}\) to \(\mathbb{E}_k \mathcal{A}\), but there is a special built-in relation in FOL: the equality \(x = y\). The one-way EF game asks that after every round the chosen \(\mathcal{A}\)-elements and the chosen \(\mathcal{B}\)-elements form a partial isomorphism, so if the spoiler chooses the same element \(a \in \mathcal{A}\) in two rounds, the duplicator must respond with the same element as well; otherwise it won’t be a partial isomorphism. However, this requirement is not captured by co-Kleisli morphisms \(f : \mathbb{E}_k \mathcal{A} \to \mathcal{B}\): since e.g. \(\langle a\rangle\) and \(\langle a, b, a\rangle\) are different elements of \(\mathbb{E}_k \mathcal{A}\), \(f\) need not map them to the same response.

This motivates us to consider what are called I-morphisms: coKleisli morphisms \(f : \mathbb{E}_k \mathcal{A} \to \mathcal{B}\) such that \(f(s_1) = f(s_2)\) whenever \(s_1\) is a prefix of \(s_2\) and \(\epsilon(s_1) = \epsilon(s_2)\). (There is an alternative definition of I-morphisms using relative comonads, but the direct definition suffices for our purposes in this post.)

Theorem. There is a winning strategy for the duplicator in the \(k\)-round one-way EF game from \(\mathcal{A}\) to \(\mathcal{B}\) if and only if there is an I-morphism \(f : \mathbb{E}_k \mathcal{A} \to \mathcal{B}\).

Since \(k\)-round one-way EF games characterise logical refinement in existential-positive FOL, a direct corollary is that two \(\sigma\)-structures \(\mathcal{A}\) and \(\mathcal{B}\) agree on all closed existential-positive FOL formulas \(\varphi\) of quantifier rank \(k\) if and only if there are two I-morphisms \(\mathbb{E}_k \mathcal{A} \to \mathcal{B}\) and \(\mathbb{E}_k \mathcal{B} \to \mathcal{A}\).

Back-and-Forth Games

If we ask for two coKleisli morphisms but nothing else of them, we get the existential-positive fragment; but recall that our original goal was to characterise full FOL and EF games. What if we ask for the morphisms to have a relationship between them? What if, as we usually do, we ask them to be inverses of each other?

It turns out that this characterises the logical equivalence of FOL augmented with counting quantifiers \(\exists_{\leq n}x.\,\varphi\) and \(\exists_{\geq n}x.\,\varphi\), whose semantics is that there exist at most \(n\), and respectively at least \(n\), elements \(a \in \mathcal{A}\) making \(\varphi\) true.

Proposition. Two finite \(\sigma\)-structures \(\mathcal{A}\) and \(\mathcal{B}\) agree on all closed formulas of FOL with counting quantifiers if and only if there is a pair of I-morphisms \(f : \mathbb{E}_k\mathcal{A} \to \mathcal{B}\) and \(g : \mathbb{E}_k\mathcal{B} \to \mathcal{A}\) that are mutual inverses (in the co-Kleisli category of \(\mathbb{E}_k\)).

So if we don’t ask anything from the homomorphisms we get too little; but if we ask for them to be an isomorphism we go overboard and get too much. Can we find a middle ground?

Yes, the key intuition for this is that the duplicator can play the back-and-forth game like a one-way game when the spoiler keeps choosing elements from one structure, but the duplicator must have a plan B in mind in case the spoiler switches to choosing elements from the other structure in the next round.

This motivates what is called a locally invertible pair. Given two \(\sigma\)-structures \(\mathcal{A}\) and \(\mathcal{B}\) and a natural number \(k\) as usual, a locally invertible pair \(\langle F,G\rangle\) consists of a set \(F\) of co-Kleisli I-morphisms \(\mathbb{E}_k \mathcal{A} \to \mathcal{B}\) and a set \(G\) of co-Kleisli I-morphisms \(\mathbb{E}_k \mathcal{B} \to \mathcal{A}\) such that

  1. for all \(f \in F\) and \(s \in \mathbb{E}_k \mathcal{A}\), there is some \(g_{f,s} \in G\) with \(g_{f,s}^\ast (f^\ast(s)) = s\), and
  2. for all \(g \in G\) and \(t \in \mathbb{E}_k \mathcal{B}\), there is some \(f_{g,t} \in F\) with \(f_{g,t}^\ast (g^\ast(t)) = t\).

Theorem. The duplicator has a winning strategy for the \(k\)-round EF game on \(\mathcal{A}\) and \(\mathcal{B}\) if and only if there is a non-empty locally invertible pair \(\langle F,G\rangle\).

Proof. Assuming a non-empty locally invertible pair \(\langle F,G\rangle\), the duplicator has a strategy maintaining the invariant that the chosen elements \(s_i \in A^i\) and \(t_i \in B^i\) after round \(i\) satisfy \[ \phi_i = \exists f_i \in F.\ \exists g_i \in G.\ (s_i, t_i) = (s_i, f_i^\ast s_i) = (g_i^\ast t_i, t_i). \]

  1. In round \(1\), if the spoiler chooses an element \(a_1 \in A\), the duplicator can pick an arbitrary \(f \in F\) and respond with \(f \, \langle a_1\rangle \in B\). The condition \(\phi_1\) is witnessed by \(f_1 = f\) and \(g_1 = g_{f, \langle a_1\rangle}\) given by the definition of locally invertible pairs. If the spoiler chooses an element \(b_1 \in B\), the argument is the same.

  2. In round \(i+1\), if the spoiler chooses an element \(a_{i+1} \in A\), the duplicator responds with \(b_{i+1} = f_i\,\langle a_1, \dots, a_{i+1}\rangle\). The condition \(\phi_{i+1}\) is then witnessed by \(f_{i+1} = f_i\) and \(g_{i+1} = g_{f_i, \langle a_1, \dots, a_{i+1}\rangle}\). The case where the spoiler chooses \(b_{i+1} \in B\) is symmetric, using \(g_i\).

After \(k\) rounds, \(\phi_k\) holds, and it implies that \(\langle a_1, \dots, a_k\rangle \mapsto \langle b_1, \dots, b_k\rangle\) is a partial isomorphism, because \(f_k\) and \(g_k\) are coKleisli morphisms \(\mathbb{E}_k \mathcal{A} \to \mathcal{B}\) and \(\mathbb{E}_k \mathcal{B} \to \mathcal{A}\).

For the other direction, assuming the duplicator has a winning strategy, let \(\Phi \subseteq A^{\leq k} \times B^{\leq k}\) be the set of all possible game states in each round following the winning strategy. The locally invertible pair is \[ \begin{array}{l} F = \left\{f : \mathbb{E}_k \mathcal{A} \to \mathcal{B} \mid \forall s \in \mathbb{E}_k \mathcal{A}.\ (s, f^\ast s) \in \Phi\right\}, \\ G = \left\{g : \mathbb{E}_k \mathcal{B} \to \mathcal{A} \mid \forall t \in \mathbb{E}_k \mathcal{B}.\ (g^\ast t, t) \in \Phi\right\}. \end{array} \]

First we argue that for all \((s,t) \in \Phi\), (i) \(\exists f_{s,t} \in F.\ f_{s,t}^\ast s = t\) and (ii) \(\exists g_{s,t} \in G.\ g_{s,t}^\ast t = s\). To show (i) (and symmetrically (ii)) we construct \(f_{s,t}\) as follows: for every \(s'\in \mathbb{E}_k \mathcal{A}\), let \(m\) be the length of the longest common prefix of \(s\) and \(s'\). We consider the game play where in the first \(m\) rounds the spoiler and the duplicator play as in \((s,t) \in \Phi\), and afterwards the spoiler always chooses elements according to \(s'\) while the duplicator follows the winning strategy. The last \(\mathcal{B}\)-element picked in this game play is the value of the function \(f_{s,t}\) at \(s'\).

Now we can see that \(\langle F,G\rangle\) defined as above is a locally invertible pair: for every \(g \in G\) and \(t \in \mathbb{E}_k \mathcal{B}\), \((g^\ast t, t)\) is in \(\Phi\), and thus by (i) there exists \(f_{g^\ast t,t} \in F\) with \(f_{g^\ast t,t}^\ast(g^\ast t) = t\). The symmetric condition for \(F\) holds similarly. \(\square\)


In Part 1 of the post, we have seen how logical equivalences for first-order logic can be characterised by combinatorial games, and how this can be used for showing inexpressivity results of first-order logic. In Part 2 of this post, we have seen how such games can be formulated in a concise way using comonads.

As an active research subject, what we didn’t say about game comonads in this blog post is a lot:

  1. Many other logics have model comparison games and have received a comonadic treatment, including modal logics (Abramsky and Shah 2021), the \(k\)-variable fragment of FOL (Abramsky, Dawar and Wang 2017), guarded logics (Abramsky and Marsden 2021), monadic second-order logic (Jakl, Marsden and Shah 2022), and finite-variable logics with generalised quantifiers (Conghaile and Dawar 2021).

  2. Coalgebras of game comonads usually reveal interesting information about the combinatorial structure of finite structures. For example, \(\mathbb{E}_k\)-coalgebras \(\mathcal{A} \to \mathbb{E}_k \mathcal{A}\) are in bijection with forest covers of height \(\leq k\) for the Gaifman graph of \(\mathcal{A}\).

  3. Back-and-forth games can be defined in an axiomatic way (Abramsky and Reggio 2021).

Moreover, we have only considered the classical semantics of FOL in sets, so a natural question is how finite model theory interacts with the various notions of finiteness in constructive mathematics and the general categorical semantics of FOL in hyperdoctrines.

September 16, 2023

John BaezThe Triassic-Jurassic Extinction

214 million years ago an asteroid hit what is now Canada. Now the crater is a ring-shaped reservoir 70 kilometers in diameter: Manicouagan Reservoir.

Did this impact cause a mass extinction? The asteroid was 5 kilometers across, while the one that killed the dinosaurs much later was 10 kilometers across. But this is still huge!

For a while people thought this impact may have caused the Triassic-Jurassic mass extinction event. But now that the crater has been carefully dated, they don’t think that anymore. The extinction happened 12 million years later!

In the Triassic-Jurassic mass extinction, all the really huge amphibians died out—like Mastodonsaurus, shown above. So did lots of large reptiles. This let another kind of reptile—dinosaurs—become the dominant land animals for the next 135 million years.

So what caused this mass extinction? A mass extinction event is like a crime scene: you see the dead body, or more precisely the absence of fossils after the event, and you see other clues, but it’s quite hard to figure out the killer.

One big clue is that there was an enormous amount of volcanic activity near the end of the Triassic and start of the Jurassic, as the supercontinent Pangaea split apart. It lasted for about 600,000 years. In fact, there’s about 11 million square kilometers of basalt left over from this event, spread over the eastern Americas, western Africa, Spain, and northwestern France! It’s called the Central Atlantic magmatic province or CAMP.

So, this event could have put huge amounts of carbon dioxide into the air, causing an intense bout of global warming.

(I’m giving a public lecture on mass extinctions, so I’m boning up on them now.)

September 15, 2023

Matt von HippelStories Backwards and Forwards

You can always start with “once upon a time”…

I come up with tricks to make calculations in particle physics easier. That’s my one-sentence story, or my most common one. If I want to tell a longer story, I have more options.

Here’s one longer story:

I want to figure out what Nature is telling us. I want to take all the data we have access to that has anything to say about fundamental physics, every collider and gravitational wave telescope and ripple in the overall structure of the universe, and squeeze it as hard as I can until something comes out. I want to make sure we understand the implications of our current best theories as well as we can, to as high precision as we can, because I want to know whether they match what we see.

To do that, I am starting with a type of calculation I know how to do best. That’s both because I can make progress with it, and because it will be important for making these inferences, for testing our theories. I am following a hint in a theory that definitely does not describe the real world, one that is both simpler to work with and surprisingly complex, one that has a good track record, both for me and others, for advancing these calculations. And at the end of the day, I’ll make our ability to infer things from Nature that much better.

Here’s another:

Physicists, unknowing, proposed a kind of toy model, one often simpler to work with but not necessarily simpler to describe. Using this model, they pursued increasingly elaborate calculations, and time and time again, those calculations surprised them. The results were not random, not a disorderly mess of everything they could plausibly have gotten. Instead, they had structure, symmetries and patterns and mathematical properties that the physicists can’t seem to explain. If we can explain them, we will advance our knowledge of models and theories and ideas, geometry and combinatorics, learning more about the unexpected consequences of the rules we invent.

We can also help the physicists advance physics, of course. That’s a happy accident, but one that justifies the money and time, showing the rest of the world that understanding consequences of rules is still important and valuable.

These seem like very different stories, but they’re not so different. They change in order, physics then math or math then physics, backwards and forwards. By doing that, they change in emphasis, in where they’re putting glory and how they’re catching your attention. But at the end of the day, I’m investigating mathematical mysteries, and I’m advancing our ability to do precision physics.

(Maybe you think that my motivation must lie with one of these stories and not the other. One is “what I’m really doing”, the other is a lie made up for grant agencies.
Increasingly, I don’t think people work like that. If we are at heart stories, we’re retroactive stories. Our motivation day to day doesn’t follow one neat story or another. We move forward, we maybe have deep values underneath, but our accounts of “why” can and will change depending on context. We’re human, and thus as messy as that word should entail.)

I can tell more than two stories if I want to. I won’t here. But this is largely what I’m working on at the moment. In applying for grants, I need to get the details right, to sprinkle the right references and the right scientific arguments, but the broad story is equally important. I keep shuffling that story, a pile of not-quite-literal index cards, finding different orders and seeing how they sound, imagining my audience and thinking about what stories would work for them.

September 13, 2023

Matt Strassler The Impossible Cover

Waves in an Impossible Sea, on the intersection of modern physics with human existence and daily life, is essentially done and edited now — not perfect, of course, but as good as I have had time to make it. Now I await the proofs.

The book is supposed to appear in early March. Here’s the cover art, created by an artist at the publisher, Basic Books. I hope it makes you curious about what might lie inside!

A Harvard physicist takes us on an awe-inspiring journey from relativity to the Higgs field, showing how the universe creates everything from what seems like nothing at all 

John BaezSeminar on “This Week’s Finds”

Summer is coming to a close! It’s almost time to continue my seminars on topics from This Week’s Finds in Mathematical Physics. As before, I’ll be doing these on Thursdays at 3:00 pm UK time in Room 6206 of the James Clerk Maxwell Building, home of the School of Mathematics at the University of Edinburgh.

The first talk will be on Thursday September 21st, and the last on November 30th. I’ll skip October 19th and 27th… and any days there are strikes.

We’re planning to

1) make the talks hybrid on Zoom so that you can join online:
Meeting ID: 822 7032 5098
Passcode: XXXXXX36

Here the X’s stand for the name of the famous lemma in category theory.

2) make lecture notes available on my website.

3) record them and eventually make them publicly available on my YouTube channel.

4) have a Zulip channel on the Category Theory Community Server dedicated to discussion of the seminars: it’s here.

More details soon!

September 11, 2023

Matt Strassler New Scientist Covers the Standard Model and Beyond

For those of you who subscribe to New Scientist, their magazine’s cover story this week is a feature entitled “THE AMAZING THEORY OF (ALMOST) EVERYTHING”. In the feature is an overview of the Standard Model (which describes all known fields and particles, excepting gravity, with amazing accuracy, but leaves a plethora of puzzles unaddressed) and includes a final section (edited by Abby Beall) with short articles by six scientists about their current views regarding the Standard Model, among them myself. [This website’s introductory article on the Standard Model is here; see also here.] . . .

The other five scientists who contributed are

The experimenters, of course, are hoping their experiments will shed some new light on the puzzles that the Standard Model leaves open. I don’t want to get into those details today, but I’ll come back to the g-2 experiment at some point soon.

In my brief contribution to the feature, I make simple points concerning the following issue. So far, other than the Higgs boson, the LHC hasn’t discovered any new elementary particles or other dramatic unexpected effects. This poses a conceptual crisis, because there were strong arguments (based on quantum field theory and on experiments) that Higgs bosons shouldn’t appear alone. That crisis both justifies and motivates the work by professors Burrage, Rajendran and Adlam, along with other young physicists.

In their articles, the other theorists discuss their approaches. Rajendran, whose work has covered many research areas, examines the potential role of new experiments aimed at finding evidence of particles whose interactions with all known particles are extremely weak. Burrage, thinking along similar lines, describes a subtle form of new force whose effects depend on the environment that it is in, and which can’t be observed without specially designed experiments, including ones that she and her colleagues have proposed. Adlam has a more radical and more speculative proposal: that not only is our way of thinking about time wrong (which by itself is plausible, given how confusing time is to us), our misunderstanding of it may have an impact even on the Standard Model.

As of yet, neither they nor anyone else seems to have an exceptionally compelling idea. But out of these new lines of thought, intriguing proposals for entirely new types of experiments are emerging. This all to the good; as is often said, we should never let a crisis go to waste. If our current confusion leads to a novel set of experimental questions about the world, that’s real progress. And if one of those new experiments turns up something no one (or almost no one) was expecting, that’s priceless.

September 10, 2023

John BaezIn hydraulis by Antoine Busnois

You may remember this post of mine:

Renaissance polyphony: the Franco-Flemish school.

This school of music flourished for two whole centuries, roughly from 1400 to 1600. Though I haven’t been posting about it lately, I continue to enjoy it.

Antoine Busnois (1430–1492) is one of the most famous composers in the second generation of the Franco-Flemish school. He’s almost up there with Johannes Ockeghem. And I just ran into a piece by him called In hydraulis. I really like it! But why did he call it that? You don’t hear many songs about hydraulics.

It turns out the lyrics are a description of Pythagorean music theory based on simple fractions and also a homage to his colleague Ockeghem: in 1467, when Busnois wrote this piece, he had recently joined Ockeghem working at the court of Burgundy. The first two words just happen to mention a ‘hydraulis’, which is an ancient kind of water organ.

According to Wikipedia,

The hydraulis is the name of a Greek instrument created by Ctesibius of Alexandria. The hydraulis has a reservoir of air which is inserted into a cistern of water. The air is pushed into the reservoir with hand pumps, and exits the reservoir as pressurized air to blow through the pipes. The reservoir is open on the bottom, allowing water to maintain the pressure on the air as the air supply fluctuates from either the pumps pushing more air in, or the pipes letting air out.

But why was Busnois writing a song about a hydraulis? Translated into English, the lyrics of In hydraulis start like this:

Once when Pythagoras was wondering
at the tones in water organs and the tonalities
of hammers, having followed with his eyes the surfaces
according to the inequalities of the weights,
he discovered the essence of music:
the proportions of epitritus and hemiola,
epogdous and duple, for they lead to
the harmony of fourth, fifth, and also
tone and octave, while they connect
the species of the monochord.

These lyrics are remarkably scholarly, and I’d like to know why. I had to look up some of these words, but it was worthwhile.

The ‘epitritus’ is a ratio of 4:3, which is called a ‘fourth’ in music. The ‘hemiola’ is a ratio of 3:2, which is a ‘fifth’. The ‘epogdous’ is a ratio of 9:8, or a ‘second’, also called an interval of a ‘tone’ since it’s approximately one step up the white keys on a piano. The ‘duple’ is obviously 2:1, or an octave. So Busnois is reviewing how some simple fractions give some of the most important intervals in music. And of course, we attribute this discovery to Pythagoras—though nobody really knows exactly what Pythagoras did.
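As a quick check (a sketch of my own, not anything from Busnois!), these ratios can be converted to modern cents, where 1200 cents make an octave and an equal-tempered semitone is 100 cents:

```python
import math

# Busnois's Pythagorean ratios, converted to cents: 1200 * log2(ratio)
intervals = {
    "epitritus (fourth)": (4, 3),
    "hemiola (fifth)":    (3, 2),
    "epogdous (tone)":    (9, 8),
    "duple (octave)":     (2, 1),
}
for name, (p, q) in intervals.items():
    cents = 1200 * math.log2(p / q)
    print(f"{name}: {p}:{q} = {cents:.0f} cents")
```

The fourth (498 cents) and fifth (702 cents) land within 2 cents of their equal-tempered cousins, which is why Pythagorean tuning sounds so clean on these intervals.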

The ‘monochord’ is a one-stringed instrument supposedly used by Pythagoras to study harmony. And there’s a popular but pretty clearly false legend that Pythagoras noticed these ratios by hitting some hammers in a blacksmith’s shop and comparing their weights! It just doesn’t work like that with hammers.

• Wikipedia, Pythagorean hammers.

The lyrics continue:

Ockeghem, you who sing before all
in the service of the King of the French,
Strengthen the practice of your generation,
examining these things on occasion in the
halls of the Duke of Burgundy in your fatherland.
Through me, Busnoys, unworthy musician
of the illustrious Count of Charolais,
may you be greeted for your merits as the
highest trope-uttering Cephas,
Farewell, true image of Orpheus.

I don’t know why he says farewell — to Ockeghem? Ockeghem hadn’t died. ‘Cephas’ is another name for the apostle Peter, ‘rock’ of the church, so Busnois seems to be saying that Ockeghem played a similar role in the Burgundian musical tradition (which is true).

Here’s a live version of In hydraulis by the group Blue Heron, with captions in Latin and English:

I know Blue Heron because they’re recording Ockeghem’s complete songs in honor of his 600th birthday. But the pungent leading-tones in this piece by Busnois—seventh tones, desperately eager to resolve to the tonic—remind me a bit more of Guillaume Dufay, from the first generation of the Franco-Flemish school, than the more smoothed-down harmonies of Ockeghem.

Busnois could also craft catchy melodies like Dufay. Indeed, he may have written L’homme armé, one of the most popular songs of the entire Renaissance!

The recording of In hydraulis at the top of this page was made by another group that specializes in this era: Pomerium. It’s better recorded and peppier. If you like this kind of music, I urge you to check out everything by Pomerium. But this version of In hydraulis is not live, and it doesn’t have lyrics!

John PreskillCan Thermodynamics Resolve the Measurement Problem?

At the recent Quantum Thermodynamics conference in Vienna (coming next year to the University of Maryland!), during an expert panel Q&A session, one member of the audience asked “can quantum thermodynamics address foundational problems in quantum theory?”

That stuck with me, because that’s exactly what my research is about. So naturally, I’d say the answer is yes! In fact, here in the group of Marcus Huber at the Technical University of Vienna, we think thermodynamics may have something to say about the biggest quantum foundations problem of all: the measurement problem.

It’s sort of the iconic mystery of quantum mechanics: we know that an electron can be in two places at once – in a ‘superposition’ – but when we measure it, it’s only ever seen to be in one place, picked seemingly at random from the two possibilities. We say the state has ‘collapsed’.

What’s going on here? Thanks to Bell’s legendary theorem, we know that the answer can’t just be that it was always actually in one place and we just didn’t know which option it was – it really was in two places at once until it was measured1. But also, we don’t see this effect for sufficiently large objects. So how can this ‘two-places-at-once’ thing happen at all, and why does it stop happening once an object gets big enough?

Here, we already see hints that thermodynamics is involved, because even classical thermodynamics says that big systems behave differently from small ones. And interestingly, thermodynamics also hints that the narrative so far can’t be right. Because when taken at face value, the ‘collapse’ model of measurement breaks all three laws of thermodynamics.

Imagine an electron in a superposition of two energy levels: a combination of being in its ground state and first excited state. If we measure it and it ‘collapses’ to being only in the ground state, then its energy has decreased: it went from having some average of the ground and excited energies to just having the ground energy. The first law of thermodynamics says (crudely) that energy is conserved, but the loss of energy is unaccounted for here.

Next, the second law says that entropy always increases. One form of entropy represents your lack of information about a system’s state. Before the measurement, the system was in one of two possible states, but afterwards it was in only one state. So speaking very broadly, our uncertainty about its state, and hence the entropy, is reduced. (The third law is problematic here, too.)
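To make this bookkeeping concrete, here is a toy calculation (illustrative numbers of my own, not from any of the papers below) of the apparent energy and entropy deficits when a qubit in an equal superposition of two energy levels ‘collapses’ to its ground state:

```python
import numpy as np

E0, E1 = 0.0, 1.0                        # hypothetical ground / excited energies
H = np.diag([E0, E1])                    # qubit Hamiltonian
psi = np.array([1.0, 1.0]) / np.sqrt(2)  # equal superposition of the two levels

energy_before = psi @ H @ psi            # <H> = (E0 + E1)/2
energy_after = E0                        # after 'collapse' to the ground state

# Observer's uncertainty: two equally likely outcomes = 1 bit, one definite outcome = 0 bits
p = np.array([0.5, 0.5])
entropy_before = -np.sum(p * np.log2(p))
entropy_after = 0.0

print(f"energy lost: {energy_before - energy_after:.2f}, "
      f"entropy lost: {entropy_before - entropy_after:.2f} bits")
```

Taken at face value, half a unit of energy and a full bit of entropy simply vanish; the point of the model below is that the detector and environment must absorb both.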

There’s a clear explanation here: while the system on its own decreases its entropy and doesn’t conserve energy, in order to measure something, we must couple the system to a measuring device. That device’s energy and entropy changes must account for the system’s changes.

This is the spirit of our measurement model2. We explicitly include the detector as a quantum object in the record-keeping of energy and information flow. In fact, we also include the entire environment surrounding both system and device – all the lab’s stray air molecules, photons, etc. Then the idea is to describe a measurement process as propagating a record of a quantum system’s state into the surroundings without collapsing it.

A schematic representation of a system spreading information into an environment (from Schwarzhans et al., with permission)

But talking about quantum systems interacting with their environments is nothing new. The “decoherence” model from the 70s, which our work builds on, says quantum objects become less quantum when buffeted by a larger environment.

The problem, though, is that decoherence describes how information is lost into an environment, and so usually the environment’s dynamics aren’t explicitly calculated: this is called an open-system approach. By contrast, in the closed-system approach we use, you model the dynamics of the environment too, keeping track of all information. This is useful because conventional collapse dynamics seems to destroy information, but every other fundamental law of physics seems to say that information can’t be destroyed.

This all allows us to track how information flows from system to surroundings, using the “Quantum Darwinism” (QD) model of W.H. Żurek. Whereas decoherence describes how environments affect systems, QD describes how quantum systems impact their environments by spreading information into them. The QD model says that the most ‘classical’ information – the kind most consistent with classical notions of ‘being in one place’, etc. – is the sort most likely to ‘survive’ the decoherence process.

QD then further asserts that this is the information that’s most likely to be copied into the environment. If you look at some of a system’s surroundings, this is what you’d most likely see. (The ‘Darwinism’ name is because certain states are ‘selected for’ and ‘replicate’3.)

So we have a description of what we want the post-measurement state to look like: a decohered system, with its information redundantly copied into its surrounding environment. The last piece of the puzzle, then, is to ask how a measurement can create this state. Here, we finally get to the dynamics part of the thermodynamics, and introduce equilibration.

Earlier we said that even if the system’s entropy decreases, the detector’s entropy (or more broadly the environment’s) should go up to compensate. Well, equilibration maximizes entropy. In particular, equilibration describes how a system tends towards a particular ‘equilibrium’ state, because the system can always increase its entropy by getting closer to it.

It’s usually said that systems equilibrate if put in contact with an external environment (e.g. a can of beer cooling in a fridge), but we’re actually interested in a different type of equilibration called equilibration on average. There, we’re asking for the state that a system stays roughly close to, on average, over long enough times, with no outside contact. That means the system never truly decoheres; it just looks like it does for certain observables. (This implies that nothing ever actually decoheres, since open systems are only an approximation you make when you don’t want to track all of the environment.)
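To see ‘equilibration on average’ in a closed system, here is a minimal toy model (entirely my own construction for illustration, not the model from our papers): one system qubit dephasing through fixed couplings to four environment qubits. The exact closed-system evolution never collapses anything, yet the system’s time-averaged coherence is small:

```python
import numpy as np

sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])

def embed(op, site, n):
    # Place a single-qubit operator at position `site` in an n-qubit register.
    out = np.array([[1.0]])
    for k in range(n):
        out = np.kron(out, op if k == site else np.eye(2))
    return out

n = 5                              # system qubit 0 plus four environment qubits
dim = 2 ** n
couplings = [0.7, 1.1, 1.3, 1.9]   # arbitrary, incommensurate-ish strengths
H = sum(g * embed(sz, 0, n) @ embed(sx, k + 1, n) for k, g in enumerate(couplings))

# Initial state: system in (|0> + |1>)/sqrt(2), environment in |0...0>
psi0 = np.zeros(dim)
psi0[0] = psi0[dim // 2] = 1 / np.sqrt(2)

evals, evecs = np.linalg.eigh(H)
c = evecs.conj().T @ psi0

def coherence(t):
    # Evolve exactly, then trace out the environment and read off the
    # system's off-diagonal (|0><1|) element.
    psi = evecs @ (np.exp(-1j * evals * t) * c)
    rho = np.outer(psi, psi.conj())
    return abs(np.trace(rho[: dim // 2, dim // 2 :]))

avg = np.mean([coherence(t) for t in np.linspace(0.0, 50.0, 400)])
print(f"initial coherence: 0.500, time-averaged coherence: {avg:.3f}")
```

Because the system operator commutes with the couplings here, this is pure dephasing: the coherence recurs at special times, but its long-time average stays small, which is exactly the ‘on average’ in the name.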

Equilibration is the key to the model. In fact, we call our idea the Measurement-Equilibration Hypothesis (MEH): we’re asserting that measurement is an equilibration process. Which makes the final question: what does all this mean for the measurement problem?

In the MEH framework, when someone ‘measures’ a quantum system, they allow some measuring device, plus a chaotic surrounding environment, to interact with it. The quantum system then equilibrates ‘on average’ with the environment, and spreads information about its classical states into the surroundings. Since you are a macroscopically large human, any measurement you do will induce this sort of equilibration to happen, meaning you will only ever have access to the classical information in the environment, and never see superpositions. But no collapse is necessary, and no information is lost: rather some information is only much more difficult to access in all the environment noise, as happens all the time in the classical world.

It’s tempting to ask what ‘happens’ to the outcomes we don’t see, and how nature ‘decides’ which outcome to show to us. Those are great questions, but in our view, they’re best left to philosophers4. For the question we care about: why measurements look like a ‘collapse’, we’re just getting started with our Measurement-Equilibration Hypothesis – there’s still lots to do in our explorations of it. We think the answers we’ll uncover in doing so will form an exciting step forward in our understanding of the weird and wonderful quantum world.

Members of the MEH team at a kick-off meeting for the project in Vienna in February 2023. Left to right: Alessandro Candeloro, Marcus Huber, Emanuel Schwarzhans, Tom Rivlin, Sophie Engineer, Veronika Baumann, Nicolai Friis, Felix C. Binder, Mehul Malik, Maximilian P.E. Lock, Pharnam Bakhshinezhad

Acknowledgements: Big thanks to the rest of the MEH team for all the help and support, in particular Dr. Emanuel Schwarzhans and Dr. Lock for reading over this piece!

Here are a few choice references (by no means meant to be comprehensive!)

Quantum Thermodynamics (QTD) Conference 2023:
QTD 2024:
Bell’s Theorem:
The first MEH paper:
A review of decoherence:
Quantum Darwinism:
Measurements violate the 3rd law:
More on the 3rd and QM:
Equilibration on average:

  1. There is a perfectly valid alternative with other weird implications: that it was always just in one place, but the world is intrinsically non-local. Most physicists prefer to save locality over realism, though. ↩
  2. First proposed in this paper by Schwarzhans, Binder, Huber, and Lock: ↩
  3. In my opinion… it’s a brilliant theory with a terrible name! Sure, there’s something akin to ‘selection pressure’ and ‘reproduction’, but there aren’t really any notions of mutation, adaptation, fitness, generations… Alas, the name has stuck. ↩
  4. I actually love thinking about this question, and the interpretations of quantum mechanics more broadly, but it’s fairly orthogonal to the day-to-day research on this model. ↩

September 08, 2023

Matt von HippelGetting Started in Saclay

I started work this week in my new position, as a permanent researcher at the Institute for Theoretical Physics of CEA Paris-Saclay. I’m still settling in, figuring out how to get access to the online system, food at the canteen, and healthcare. Things are slowly getting into shape, with a lot of running around involved. Until then, I don’t have a ton of time to write (and am dedicating most of it to writing grants!). But I thought, mirroring a post I made almost a decade ago, that I’d at least give you a view of my new office.

September 07, 2023

David Hoggis a periodic signal in a time series statistically significant?

I had conversations with Nora Eisner (Flatiron) and Abby Shaum (CUNY) today about how we report the significance of a signal we find in a time series. In particular a periodic signal. It's an old, unsolved problem, with a lot of literature. And various hacks that are popular in the exoplanet community (and binary-star community!). My position is very simple: Since all methods for determining significance are flawed, and since when you fit a signal you have to estimate also an uncertainty on that signal's parameters, the simplest and most basic test of significance is the significance with which you measure the amplitude of the proposed signal. That is, if the amplitude is well measured, the signal is real. Of course there are adversarial data sets I can make where this isn't true! But that's just a restatement of the point that this is an unsolved problem. For deep reasons!
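As a sketch of the test described above (the function and variable names, and the Gaussian noise model, are mine, not Hogg’s), one can fit the proposed sinusoid by linear least squares and ask how many sigma the fitted amplitude sits away from zero:

```python
import numpy as np

def amplitude_snr(t, y, omega, sigma):
    # Fit y ~ a*sin(omega t) + b*cos(omega t) by linear least squares,
    # then report amplitude / amplitude-uncertainty for known noise sigma.
    X = np.column_stack([np.sin(omega * t), np.cos(omega * t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    cov = sigma**2 * np.linalg.inv(X.T @ X)  # parameter covariance
    a, b = coef
    amp = np.hypot(a, b)
    grad = np.array([a, b]) / amp            # first-order error propagation
    return amp / np.sqrt(grad @ cov @ grad)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 30.0, 500)
y = 0.5 * np.sin(2.0 * t) + rng.normal(0.0, 0.2, t.size)
print(f"amplitude measured at {amplitude_snr(t, y, omega=2.0, sigma=0.2):.1f} sigma")
```

A well-measured amplitude (many sigma from zero) is the criterion; as the post notes, adversarial datasets can fool this, which is why the general problem remains unsolved.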

Doug NatelsonThings I learned at the Packard Foundation meeting

Early in my career, I was incredibly fortunate to be awarded a David and Lucile Packard Foundation fellowship, and this week I attended the meeting in honor of the 35th anniversary of the fellowship program.  Packard fellowships are amazing, with awardees spanning the sciences (including math) and engineering, providing resources for a sustained period (5 years) with enormous flexibility.  The meetings have been some of the most fun ones I've ever attended, with talks by incoming and outgoing fellows that are short (20 min) and specifically designed to be accessible by scientifically literate non-experts.  My highlights from the meeting ten years ago (the last one I attended) are here.  Highlights from meetings back when I was a fellow are here, here, here, and here.

Here are some cool things that I learned at the meeting (some of which I'm sure I should've known), from a few of the talks + posters.  (Unfortunately I cannot stay for the last day, so apologies for missing some great presentations.)   I will further update this post later in the day and tomorrow.

  • By the 2040s, with the upcoming LISA and Cosmic Explorer/Einstein Telescope instruments, it's possible that we will be able to detect every black hole merger in the entire visible universe.
  • It's very challenging to build models of galaxy evolution that capture how supernovae regulate mass outflow and star formation, ending up with what we see statistically in the sky.
  • Machine learning can be really good at disentangling overlapping seismic events.
  • In self-propelled/active matter, it's possible to start with particles that just have a hard-shell repulsion and still act like there is an effective attractive interaction that leads to clumping.
  • There are about \(10^{14}\) bacteria in each person, with about 360\(\times\) the genetic material of the person.  Also, the gut has lots of neurons, five times as many as the spinal cord (!).  The gut microbiome can seemingly influence concentrations of neurotransmitters.
  • Bees can deliberately damage leaves of plants to stress the flora and encourage earlier and more prolific flowering.
  • For some bio-produced materials that are nominally dry, their elastic properties and the dependence of those properties on humidity are seemingly controlled almost entirely by the water they contain.  
  • It is now possible to spatially resolve gene expression (via mRNA) at the single cell level across whole slices of, e.g., mouse brain tissue.  Mind-blowing links here and here.
  • I knew that ordinary human red blood cells have no organelles, and therefore they can't really respond much to stimuli.  What I did not know is that maturing red blood cells (erythrocyte precursors) in bone marrow start with nuclei and can participate in immune response, and that red blood cells in fetuses (and then at trace level in pregnant mothers) circulate all the different progenitor cells, potentially playing an important role in immune response.
  • 45% of all deaths in the US can be attributed in part to fibrosis (scarring) issues (including cardiac problems), but somehow the uterus can massively regenerate monthly without scarring.  Also, zero common lab animals menstruate, which is a major obstacle for research; transgenic mice can now be made so that there are good animal models for study. 
  • Engineered cellulose materials can be useful for radiative cooling to the sky and can be adapted for many purposes, like water harvesting from the atmosphere with porous fabrics.

September 06, 2023

Terence TaoMonotone non-decreasing sequences of the Euler totient function

I have just uploaded to the arXiv my paper “Monotone non-decreasing sequences of the Euler totient function“. This paper concerns the quantity {M(x)}, defined as the length of the longest subsequence of the numbers from {1} to {x} for which the Euler totient function {\varphi} is non-decreasing. The first few values of {M} are

\displaystyle  1, 2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 11, 12, 12, \dots

(OEIS A365339). For instance, {M(6)=5} because the totient function is non-decreasing on the set {\{1,2,3,4,5\}} or {\{1,2,3,4,6\}}, but not on the set {\{1,2,3,4,5,6\}}.
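Since {M(x)} is the length of the longest non-decreasing subsequence of {\varphi(1),\dots,\varphi(x)}, the first few values can be checked directly with a totient sieve plus the standard patience-sorting algorithm (a quick illustrative script of my own, not code from the paper):

```python
from bisect import bisect_right

def totients(n):
    # Sieve for phi(1), ..., phi(n)
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:  # p is prime (untouched by smaller primes)
            for k in range(p, n + 1, p):
                phi[k] -= phi[k] // p
    return phi

def M(x):
    # Longest non-decreasing subsequence of phi(1), ..., phi(x), via the
    # patience-sorting "tails" method (bisect_right allows repeated values).
    tails = []
    for v in totients(x)[1:]:
        i = bisect_right(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)

print([M(x) for x in range(1, 19)])
# → [1, 2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 11, 12, 12]
```

This reproduces the values of OEIS A365339 listed above.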

Since {\varphi(p)=p-1} for any prime {p}, we have {M(x) \geq \pi(x)}, where {\pi(x)} is the prime counting function. Empirically, the primes come quite close to achieving the maximum length {M(x)}; indeed it was conjectured by Pollack, Pomerance, and Treviño, based on numerical evidence, that one had

\displaystyle  M(x) = \pi(x)+64 \ \ \ \ \ (1)

for all {x \geq 31957}; this conjecture is verified up to {x=10^7}. The previous best known upper bound was basically of the form

\displaystyle  M(x) \leq \exp( (C+o(1)) (\log\log\log x)^2 ) \frac{x}{\log x} \ \ \ \ \ (2)

as {x \rightarrow \infty} for an explicit constant {C = 0.81781\dots}, from combining results from the above paper with that of Ford or of Maier-Pomerance. In this paper we obtain the asymptotic

\displaystyle  M(x) = \left( 1 + O \left(\frac{(\log\log x)^5}{\log x}\right) \right) \frac{x}{\log x}

so in particular {M(x) = (1+o(1))\pi(x)}. This answers a question of Erdős, as well as a closely related question of Pollack, Pomerance, and Treviño.

The methods of proof turn out to be mostly elementary (the most advanced result from analytic number theory we need is the prime number theorem with classical error term). The basic idea is to isolate one key prime factor {p} of a given number {1 \leq n \leq x} which has a sizeable influence on the totient function {\varphi(n)}. For instance, for “typical” numbers {n}, one has a factorization

\displaystyle  n = d p_2 p_1

where {p_2} is a medium sized prime, {p_1} is a significantly larger prime, and {d} is a number with all prime factors less than {p_2}. This leads to an approximation

\displaystyle  \varphi(n) \approx \frac{\varphi(d)}{d} (1-\frac{1}{p_2}) n.

As a consequence, if we temporarily hold {d} fixed, and also localize {n} to a relatively short interval, then {\varphi} can only be non-decreasing in {n} if {p_2} is also non-decreasing at the same time. This turns out to significantly cut down on the possible length of a non-decreasing sequence in this regime, particularly if {p_2} is large; this can be formalized by partitioning the range of {p_2} into various subintervals and inspecting how this (and the monotonicity hypothesis on {\varphi}) constrains the values of {n} associated to each subinterval. When {p_2} is small, we instead use a factorization

\displaystyle  n = d p \ \ \ \ \ (3)

where {d} is very smooth (i.e., has no large prime factors), and {p} is a large prime. Now we have the approximation

\displaystyle  \varphi(n) \approx \frac{\varphi(d)}{d} n \ \ \ \ \ (4)

and we can conclude that {\frac{\varphi(d)}{d}} will have to basically be piecewise constant in order for {\varphi} to be non-decreasing. Pursuing this analysis more carefully (in particular controlling the size of various exceptional sets in which the above analysis breaks down), we end up achieving the main theorem so long as we can prove the preliminary inequality

\displaystyle  \sum_{\frac{\varphi(d)}{d}=q} \frac{1}{d} \leq 1 \ \ \ \ \ (5)

for all positive rational numbers {q}. This is in fact also a necessary condition; any failure of this inequality can be easily converted to a counterexample to the bound (2), by considering numbers of the form (3) with {\frac{\varphi(d)}{d}} equal to a fixed constant {q} (and omitting a few rare values of {n} where the approximation (4) is bad enough that {\varphi} is temporarily decreasing). Fortunately, there is a minor miracle, relating to the fact that the largest prime factor of the denominator of {\frac{\varphi(d)}{d}} in lowest terms necessarily equals the largest prime factor of {d}, which allows one to evaluate the left-hand side of (5) almost exactly (this expression either vanishes, or is the product of {\frac{1}{p-1}} for some primes {p} ranging up to the largest prime factor of {q}), and hence to easily establish (5). If one were to try to prove an analogue of our main result for the sum-of-divisors function {\sigma(n)}, one would need the analogue

\displaystyle  \sum_{\frac{\sigma(d)}{d}=q} \frac{1}{d} \leq 1 \ \ \ \ \ (6)

of (5), which looks within reach of current methods (and was even claimed without proof by Erdős), but does not have a full proof in the literature at present.

In the final section of the paper we discuss some near counterexamples to the strong conjecture (1) that indicate that it is likely going to be difficult to get close to proving this conjecture without assuming some rather strong hypotheses. Firstly, we show that failure of Legendre’s conjecture on the existence of a prime between any two consecutive squares can lead to a counterexample to (1). Secondly, we show that failure of the Dickson-Hardy-Littlewood conjecture can lead to a separate (and more dramatic) failure of (1), in which the primes are no longer the dominant sequence on which the totient function is non-decreasing, but rather the numbers which are a power of two times a prime become the dominant sequence. This suggests that any significant improvement to (2) would require assuming something comparable to the prime tuples conjecture, and perhaps also some unproven hypotheses on prime gaps.

September 05, 2023

Tommaso DorigoA Visit To ICTS

The International Centre for Theoretical Sciences (ICTS) is located in a rural area a few kilometers north of Bangalore, in southern India. Bangalore is a mid-sized city that saw a very big expansion in the past few years due to having become a center for information technology in the country, with most of the big multinationals opening sections there. The rapid expansion increased the wealth of the middle class there (but remember, the middle class is the top 5% in India), but it also put stress on the city's traffic, which is notoriously bad.
The campus of ICTS is very nice from an architectural point of view, embedding nature in its buildings and trying to integrate the two realities. Below is a picture.


Jacques Distler dCS

For various reasons, some people seem to think that the following modification to Einstein Gravity
(1) S = \int \tfrac{1}{2} d\phi\wedge *d\phi + \tfrac{\kappa^2}{2} *\mathcal{R} + {\color{red} \tfrac{3 \phi}{192\pi^2 f}Tr(R\wedge R)}

is interesting to consider. In some toy world, it might be1. But in the real world, there are nearly massless neutrinos. In the Standard Model, U(1)_{B-L} has a gravitational ABJ anomaly (where, in the real world, the number of generations N_f=3)

(2) d * J_{B-L} = \frac{N_f}{192\pi^2} Tr(R\wedge R)

which, by a U(1)_{B-L} rotation, would allow us to entirely remove2 the coupling marked in red in (1). In the real world, the neutrinos are not massless; there’s the Weinberg term

(3) \frac{1}{M}\left(y^{i j} (H L_i)(H L_j) + \text{h.c.}\right)

which explicitly breaks U(1)_{B-L}. When the Higgs gets a VEV, this term gives a mass m^{i j} = \frac{\langle H\rangle^2 y^{i j}}{M} to the neutrinos. So, rather than completely decoupling, \phi reappears as a (dynamical) contribution to the phase of the neutrino mass matrix

(4) m^{i j} \to m^{i j}e^{2i\phi/f}

Of course there is a CP-violating phase in the neutrino mass matrix. But its effects are so tiny that its (presumably nonzero) value is still unknown. Since (4) is rigourously equivalent to (1), the effects of the term in red in (1) are similarly unobservably small. Assertions that it could have dramatic consequences — whether for LIGO or large-scale structure — are … bizarre.

The claim that (1) has some observable effect is even more bizarre if you are seeking to find one (say) during inflation. Before the electroweak phase transition, \langle H \rangle=0 and the effect of a \phi-dependent phase in the Weinberg term (3) is even more suppressed.

1 An analogy with Yang-Mills might be helpful. In pure Yang-Mills, the \theta-parameter is physical; observable quantities depend on it. But, if you introduce a massless quark, it becomes unphysical and all dependence on it drops out. For massive quarks, only the sum of \theta and the phase of the determinant of the quark mass matrix is physical.

2 The easiest way to see this is to introduce a background gauge field, \mathcal{A}, for U(1)_{B-L} and modify (1) to

(5) S = \int \tfrac{1}{2} (d\phi-f\mathcal{A})\wedge *(d\phi-f\mathcal{A}) + \tfrac{\kappa^2}{2} *\mathcal{R} + {\color{red} \tfrac{3 \phi}{24\pi^2 f}\left[\tfrac{1}{8}Tr(R\wedge R)+d\mathcal{A}\wedge d\mathcal{A}\right]}

Turning off the Weinberg term, the theory is invariant under U(1)_{B-L} gauge transformations

\begin{split}
\mathcal{A}&\to \mathcal{A}+d\chi\\
\phi&\to \phi+ f \chi\\
Q_i&\to e^{i\chi/3}Q_i\\
\overline{u}_i&\to e^{-i\chi/3}\overline{u}_i\\
\overline{d}_i&\to e^{-i\chi/3}\overline{d}_i\\
L_i&\to e^{-i\chi}L_i\\
\overline{e}_i&\to e^{i\chi}\overline{e}_i
\end{split}

where the anomalous variation of the fermions cancels the variation of the term in red. Note that the first term in (5) is a gauge-invariant mass term for \mathcal{A} (or would be if we promoted \mathcal{A} to a dynamical gauge field). Choosing \chi = -\phi/f eliminates the term in red. Turning back on the Weinberg term (which explicitly breaks U(1)_{B-L}) puts the coupling to \phi into the neutrino mass matrix (where it belongs).

September 01, 2023

Matt von HippelCosmology and the Laws of Physics

Suppose you were an unusual sort of person: one who wanted, above all else, to know the laws of physics. Not content with the rules governing just one sort of thing, a star or an atom or a galaxy, you want to know the fundamental rules behind everything in the universe.

A good reductionist, you know that smaller things are more fundamental: the rules of the parts of things determine the rules of the whole. Knowing about quantum mechanics, you know that the more precisely you want to pin down something’s position, the more uncertain its momentum will be. And aware of special relativity, you know that terms like “small thing” or “high momentum” are relative: things can look bigger or smaller, faster or slower, depending on how they move relative to you. If you want to find the most fundamental things then, you end up needing not just small things or high momenta, but a lot of energy packed into a very small space.

You can get this in a particle collider, and that’s why they’re built. By colliding protons or electrons, you can cram a lot of energy into a very small space, and the rules governing that collision will be some of the most fundamental rules you have access to. By comparing your measurements of those collisions with your predictions, you can test your theories and learn more about the laws of physics.

If you really just wanted to know the laws of physics, then you might think cosmology would be less useful. Cosmology is the science of the universe as a whole, how all of the stars and galaxies and the space-time around them move and change over the whole history of the universe. Dealing with very large distances, cosmology seems like it should take you quite far away from universal reductionist physical law.

If you thought that, you’d be missing one essential ingredient: the Big Bang. In the past, the universe was (as the song goes) in a hot dense state. The further back in time you look, the hotter and denser it gets. Go far enough back, and you find much higher energies, crammed into much smaller spaces, than we can make in any collider here on Earth. That means the Big Bang was governed by laws much more fundamental than the laws we can test here on Earth. And since the Big Bang resulted in the behavior of the universe as a whole, by observing that behavior we can learn more about those laws.

So a cosmologist can, in principle, learn quite a lot about fundamental physics. But cosmology is in many ways a lot harder than working with colliders. In a collider, we can smash protons together many times a second, with measurement devices right next to the collision. In cosmology, we have in a sense only one experiment, the universe we live in. We have to detect the evidence much later than the Big Bang itself, when the cosmic microwave background has cooled down and the structure of the universe has been warped by all the complexities of star and galaxy formation. Because we have only one experiment, all we can do is compare different sections of the sky, but there is only so much sky we can see, and as a consequence there are real limits on how much we can know.

Still, it’s worth finding out what we can know. Cosmology is at the moment the only way we can learn about physics at very high energies, and thus learn the most fundamental laws. So if you’re someone who cares a lot about that sort of thing, it’s worth paying attention to!

August 31, 2023

Doug NatelsonWhat is the thermal Hall effect?

One thing that physics and mechanical engineering students learn early on is that there are often analogies between charge flow and heat flow, and this is reflected in the mathematical models we use to describe charge and heat transport.  We use Ohm's law, \(\mathbf{j}=\tilde{\sigma}\cdot \mathbf{E}\), which defines an electrical conductivity tensor \(\tilde{\sigma}\) that relates charge current density \(\mathbf{j}\) to electric fields \(\mathbf{E}=-\nabla \phi\), where \(\phi(\mathbf{r})\) is the electric potential.  Similarly, we can use Fourier's law for thermal conduction, \(\mathbf{j}_{Q} = - \tilde{\kappa}\cdot \nabla T\), where \(\mathbf{j}_{Q}\) is a heat current density, \(T(\mathbf{r})\) is the temperature distribution, and \(\tilde{\kappa}\) is the thermal conductivity.  

We know from experience that the electrical conductivity really has to be a tensor, meaning that the current and the electric field don't have to point along each other.  The most famous example of this, the Hall effect, goes back a long way, discovered by Edwin Hall in 1879.  The phenomenon is easy to describe.  Put a conductor in a magnetic field (directed along \(z\)), and drive a (charge) current \(I_{x}\) along it (along \(x\)), as shown, typically by applying a voltage along the \(x\) direction, \(V_{xx}\).  Hall found that a transverse voltage \(V_{xy}\) then develops, proportional to the current.  The physical picture for this is something that we teach to first-year undergrads:  The charge carriers in the conductor obey the Lorentz force law and curve in the presence of a magnetic field.  There can't be a net current in the \(y\) direction because of the edges of the sample, so a transverse (\(y\)-directed) electric field has to build up.  
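This Lorentz-force picture can be made quantitative in the simplest single-carrier Drude model. Here is a small numpy sketch (the parameter values are illustrative, not from the post) that builds the field-dependent conductivity tensor, inverts it, and recovers the textbook Hall resistivity \(B/(ne)\):

```python
import numpy as np

# Hypothetical single-carrier parameters (illustrative only):
n_c = 1.0e28      # carrier density, m^-3
e = 1.602e-19     # elementary charge, C
mu = 0.1          # mobility, m^2/(V s)
B = 2.0           # out-of-plane magnetic field, T

# Drude conductivity tensor in a field: the Lorentz force generates
# off-diagonal (Hall) entries proportional to b = mu * B.
b = mu * B
sigma0 = n_c * e * mu
sigma = (sigma0 / (1 + b**2)) * np.array([[1.0, -b],
                                          [b,  1.0]])

# Inverting gives the resistivity tensor; its off-diagonal element is the
# Hall resistivity, which in this model is B/(n e), independent of mobility.
rho = np.linalg.inv(sigma)
assert np.isclose(rho[0, 1], B / (n_c * e))
assert np.isclose(rho[0, 0], 1.0 / sigma0)   # longitudinal part is field-independent
```

Note the standard Drude result: the longitudinal resistivity is unchanged by the field, while the transverse element measures the carrier density directly.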

There can also be a thermal Hall effect, when driving heat conduction in one direction (say \(x\)) leads to an additional temperature gradient in a transverse (\(y\)) direction.  The least interesting version of this (the Maggi–Righi–Leduc effect) is in fact a consequence of the regular Hall effect:  the same charge carriers in a conductor can carry thermal energy as well as charge, so thermal energy just gets dragged sideways.   

Surprisingly, insulators can also show a thermal Hall effect.  That's rather unintuitive, since whatever is carrying thermal energy in the insulator is not some charged object obeying the Lorentz force law.  Interestingly, there are several distinct mechanisms that can lead to thermal Hall response.  With phonons carrying the thermal energy, you can have magnetic field affecting the scattering of phonons, and you can also have intrinsic curving of phonon propagation due to Berry phase effects.  In magnetic insulators, thermal energy can also be carried by magnons, and there again you can have Berry phase effects giving you a magnon Hall effect.  There can also be a thermal Hall signal from topological magnon modes that run around the edges of the material.  In special magnetic insulators (Kitaev systems), there are thought to be special Majorana edge modes that can give quantized thermal Hall response, though non-quantized response argues that topological magnon modes are relevant in those systems.  The bottom line:  thermal Hall effects are real and it can be very challenging to distinguish between candidate mechanisms. 

(Note: Blogger now compresses the figures, so click on the image to see a higher res version.)

August 28, 2023

John PreskillThe Book of Mark, Chapter 2

Late in the summer of 2021, I visited a physics paradise in a physical paradise: the Kavli Institute for Theoretical Physics (KITP). The KITP sits at the edge of the University of California, Santa Barbara like a bougainvillea bush at the edge of a yard. I was eating lunch outside the KITP one afternoon, across the street from the beach. PhD student Arman Babakhani, whom a colleague had just introduced me to, had joined me.

The KITP’s Kohn Hall

What physics was I working on nowadays? Arman wanted to know.

Thermodynamic exchanges. 

The world consists of physical systems exchanging quantities with other systems. When a rose blooms outside the Santa Barbara mission, it exchanges pollen with the surrounding air. The total amount of pollen across the rose-and-air whole remains constant, so we call the amount a conserved quantity. Quantum physicists usually analyze conservation of particles, energy, and magnetization. But quantum systems can conserve quantities that participate in uncertainty relations. Such quantities are called incompatible, because you can’t measure them simultaneously. The x-, y-, and z-components of a qubit’s spin are incompatible.
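The incompatibility of the spin components is just their noncommutation; a minimal numpy check (not from the post) with the spin-1/2 operators:

```python
import numpy as np

# Spin-1/2 components S_x, S_y, S_z (in units with hbar = 1):
# half the Pauli matrices.
sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
sy = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)

# [S_x, S_y] = i S_z is nonzero, so no basis diagonalizes all three
# components at once: they cannot be measured simultaneously.
comm = sx @ sy - sy @ sx
assert np.allclose(comm, 1j * sz)
```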

The Santa Barbara mission…
…and its roses

Exchanging and conserving incompatible quantities, systems can violate thermodynamic expectations. If one system is much larger than the other, we expect the smaller system to thermalize; yet incompatibility invalidates derivations of the thermal state’s form. Incompatibility reduces the thermodynamic entropy produced by exchanges. And incompatibility can raise the average amount of entanglement in the pair of systems—the total system.

If the total system conserves incompatible quantities, what happens to the eigenstate thermalization hypothesis (ETH)? Last month’s blog post overviewed the ETH, a framework for understanding how quantum many-particle systems thermalize internally. That post labeled Mark Srednicki, a professor at the KITP, a high priest of the ETH. I want, I told Arman, to ask Mark what happens when you combine the ETH with incompatible conserved quantities.

I’ll do it, Arman said.

Soon after, I found myself in the fishbowl. High up in the KITP, a room filled with cushy seats overlooks the ocean. The circular windows lend the room its nickname. Arrayed on the armchairs and couches were Mark, Arman, Mark’s PhD student Fernando Iniguez, and Mark’s recent PhD student Chaitanya Murthy. The conversation went like this:

Mark was frustrated about not being able to answer the question. I was delighted to have stumped him. Over the next several weeks, the group continued meeting, and we emailed out notes for everyone to criticize. I particularly enjoyed watching Mark and Chaitanya interact. They’d grown so intellectually close throughout Chaitanya’s PhD studies, they reminded me of an old married couple. One of them had to express only half an idea for the other to realize what he’d meant and to continue the thread. Neither had any qualms with challenging the other, yet they trusted each other’s judgment.1

In vintage KITP fashion, we’d nearly completed a project by the time Chaitanya and I left Santa Barbara. Physical Review Letters published our paper this year, and I’m as proud of it as a gardener of the first buds from her garden. Here’s what we found.

Southern California spoiled me for roses.

Incompatible conserved quantities conflict with the ETH and the ETH’s prediction of internal thermalization. Why? For three reasons. First, when inferring thermalization from the ETH, we assume that the Hamiltonian lacks degeneracies (that no energy equals any other). But incompatible conserved quantities force degeneracies on the Hamiltonian.2 

Second, when inferring from the ETH that the system thermalizes, we assume that the system begins in a microcanonical subspace. That’s an eigenspace shared by the conserved quantities (other than the Hamiltonian)—usually, an eigenspace of the total particle number or the total spin’s z-component. But, if incompatible, the conserved quantities share no eigenbasis, so they might not share eigenspaces, so microcanonical subspaces won’t exist in abundance.

Third, let’s focus on a system of N qubits. Say that the Hamiltonian conserves the total spin components S_x, S_y, and S_z. The Hamiltonian obeys the Wigner–Eckart theorem, which sounds more complicated than it is. Suppose that the qubits begin in a state | s_\alpha, \, m \rangle labeled by a spin quantum number s_\alpha and a magnetic spin quantum number m. Let a particle hit the qubits, acting on them with an operator \mathcal{O} . With what probability (amplitude) do the qubits end up with quantum numbers s_{\alpha'} and m'? The answer is \langle s_{\alpha'}, \, m' | \mathcal{O} | s_\alpha, \, m \rangle. The Wigner–Eckart theorem dictates this probability amplitude’s form. 

| s_\alpha, \, m \rangle and | s_{\alpha'}, \, m' \rangle are Hamiltonian eigenstates, thanks to the conservation law. The ETH is an ansatz for the form of \langle s_{\alpha'}, \, m' | \mathcal{O} | s_\alpha, \, m \rangle—of the elements of matrices that represent operators \mathcal{O} relative to the energy eigenbasis. The ETH butts heads with the Wigner–Eckart theorem, which also predicts the matrix element’s form.

The Wigner–Eckart theorem wins, being a theorem—a proved claim. The ETH is, as the H in the acronym relates, only a hypothesis.

If conserved quantities are incompatible, we have to kiss the ETH and its thermalization predictions goodbye. But must we set ourselves adrift entirely? Can we cling to no buoy from physics’s best toolkit for quantum many-body thermalization?

No, and yes, respectively. Our clan proposed a non-Abelian ETH for Hamiltonians that conserve incompatible quantities—or, equivalently, that have non-Abelian symmetries. The non-Abelian ETH depends on s_\alpha and on Clebsch–Gordan coefficients—conversion factors between total-spin eigenstates | s_\alpha, \, m \rangle and product states | s_1, \, m_1 \rangle \otimes | s_2, \, m_2 \rangle.
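The Clebsch–Gordan conversion factors entering the non-Abelian ETH can be computed symbolically. A minimal sympy sketch for the two-qubit case (an illustration, not code from the paper): the product state with magnetic numbers \(+\tfrac12, -\tfrac12\) splits evenly between the singlet and the triplet.

```python
from sympy import S, sqrt
from sympy.physics.quantum.cg import CG

half = S(1) / 2
# Coefficients of the product state |m1 = +1/2> x |m2 = -1/2> in the
# total-spin basis; sympy's convention is CG(j1, m1, j2, m2, j, m).
c_singlet = CG(half, half, half, -half, 0, 0).doit()   # total spin s = 0
c_triplet = CG(half, half, half, -half, 1, 0).doit()   # total spin s = 1

# Each coefficient has magnitude 1/sqrt(2), and the expansion is complete.
assert c_triplet == sqrt(2) / 2
assert c_singlet**2 + c_triplet**2 == 1
```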

Using the non-Abelian ETH, we proved that many systems thermalize internally, despite conserving incompatible quantities. Yet the incompatibility complicates the proof enormously, extending it from half a page to several pages. Also, under certain conditions, incompatible quantities may alter thermalization. According to the conventional ETH, time-averaged expectation values \overline{ \langle \mathcal{O} \rangle }_t come to equal thermal expectation values \langle \mathcal{O} \rangle_{\rm th} to within O( N^{-1} ) corrections, as I explained last month. The correction can grow polynomially larger in the system size, to O( N^{-1/2} ), if conserved quantities are incompatible. Our conclusion holds under an assumption that we argue is physically reasonable.

So incompatible conserved quantities do alter the ETH, yet another thermodynamic expectation. Physicist Jae Dong Noh began checking the non-Abelian ETH numerically, and more testing is underway. And I’m looking forward to returning to the KITP this fall. Tales do say that paradise is a garden.

View through my office window at the KITP

1Not that married people always trust each other’s judgment.

2The reason is Schur’s lemma, a group-theoretic result. Appendix A of this paper explains the details.

August 26, 2023

Jordan EllenbergNew York (list form)

Went with AB to New York for three days and this is what we did/saw/ate, in rough chronological order:

Korean barbecue at Antoya, Natural History Museum (including new Gilder wing and finally re-opened Northwest Coast hall, my favorite), belly lox from Zabar’s eaten at Riverside Park, Little Shop of Horrors revival, bubble tea, slice at 2 Bros, breakfast at Katz’s (people, if you can stomach a 3/4-pound pastrami sandwich at 10am this is absolutely the way to beat the line), the Strand, shake at the original Shake Shack, The Play That Goes Wrong, observation deck at Top of the Rock, MUJI, Churrascaria Plataforma, breakfast at Junior’s Cheesecake, the Intrepid museum, the Staten Island Ferry (why is this free?), old high school friend, old grad school friend, Korean fried chicken at Turntable Chicken Jazz, one final bubble tea.

Transportation note: we didn’t take a taxi or Lyft the entire time. I understand why almost no US city can have a subway and bus network this thick and this good, but boy is it nice. (Maybe relevant is that we didn’t leave Manhattan the entire time except for the two-block radius around the ferry terminal on the Staten Island side.)

August 25, 2023

Matt von HippelWhy You Might Want to Inspire Kids to Be Physicists (And What Movies You’d Make as a Result)

Since the new Oppenheimer biopic came out, people have been making fun of this tweet by Sam Altman:

Expecting a movie about someone building an immensely destructive weapon, watching it plunge the world into paranoia, then getting mercilessly hounded about it to be an inspiration seems…a bit unrealistic? But everyone has already made that point. What I found more interesting was a blog post a couple days ago by science blogger Chad Orzel. Orzel asks, suppose you did want to make a movie inspiring kids to go into physics: how would you do it? I commented on his post with my own take on the question, then realized it might be nice as a post here.

If you want to inspire kids to go into physics with a movie, what do you do? Well, you can start by asking, why do you want kids to go into physics? Why do you want more physicists?

Maybe you believe that more physicists are needed to understand the fundamental laws of the universe. The quest of fundamental physics may be worthwhile in its own right, or may be important because understanding the universe gives us more tools to manipulate it. You might even think of Oppenheimer’s story in that way: because physicists understood the nature of the atom, they could apply that knowledge to change the world, racing to use it to defeat the Nazis and later convinced to continue to avoid a brutal invasion of Japan. (Whether the bomb was actually necessary to do this is still, of course, quite controversial.)

If that’s why you want more kids to be physicists, then you want a story like that. You could riff off of Ashoke Sen’s idea that physics may be essential to save humanity. The laws of physics appear to be unstable, such that at some point the world will shift and a “bubble”, expanding at the speed of light, will rewrite the rules in a way that would destroy all life as we know it. The only way to escape would be to travel faster than light, something that is possible because the universe itself expands at those speeds. By scattering “generation ships” in different directions, we could ensure that some of humanity would survive any such “bubble”: but only if we got the physics right.

A movie based on that idea could look a bit like the movie Cloud Atlas, with connected characters spanning multiple time periods. Scientists in the modern day investigate the expanding universe, making plans that refugees in a future generation ship must carry out. If you want to inspire kids with the idea that physics could save the world, you could get a lot of mileage out of a story that could actually be true.

On the other hand, maybe you don’t care so much about fundamental physics. Maybe you want more physicists because they’re good at solving a variety of problems. They help to invent new materials, to measure things precisely, to predict the weather, change computation, and even contribute to medicine. Maybe you want to tell a story about that.

(Maybe you even want these kids to go farther afield, and study physics without actually becoming physicists. Sam Altman is not a physicist, and I’ve heard he’s not very interested in directing his philanthropic money to increasing the number of jobs for physicists. On the other hand, the AI industry where he is a central player does hire a lot of ex-physicists.)

The problem, as Orzel points out, is that those stories aren’t really stories about physicists. They’re stories about engineering and technology, and a variety of other scientists, because a wide variety of people contribute to these problems. In order to tell a story that inspires people to be physicists, you need a story that highlights something unique that they bring to the table.

Orzel gets close to what I think of as the solution, by bringing up The Social Network. Altman was also mocked for saying that The Social Network motivated kids to found startups: the startup founders in that movie are not exactly depicted as good people. But in reality, it appears that the movie did motivate people to found startups. Stories about badass amoral jerks are engaging, and it’s easy to fantasize about having that kind of power and ability. There’s a reason that The Imitation Game depicted Alan Turing, a man known for his gentle kindness, as brusque and arrogant.

If you want to tell a story about physicists, it’s actually pretty easy, because physicists can be quite arrogant! There is a stereotype of physicists walking into another field, deciding they know everything they need to know, and lecturing the experts about how they should be doing their jobs. This really does happen, and sometimes it’s exactly as dumb as it sounds…but sometimes the physicists are right! Orzel brings up Feynman’s role in figuring out how the Challenger space shuttle blew up, an example of precisely this kind of success.

So if you want kids to grow up to be generalist physicists, people who solve all sorts of problems for all sorts of people, you need to tell them a story like that. One with a Sherlock-esque physicist who runs around showing how much smarter they are than everyone else. You need to make a plot where the physicist waves around “physicist tools”, like dimensional analysis, Fermi estimates, and thermodynamics, and uses them to uncover a mystery, showing a bunch of engineers or biologists just how much cooler they are.

If you do that, you probably could inspire some kids to become physicists. You’ll need a new movie to inspire them to be engineers or biologists, though!

Terence TaoYoneda’s lemma as an identification of form and function: the case study of polynomials

As someone who had a relatively light graduate education in algebra, the import of Yoneda’s lemma in category theory has always eluded me somewhat; the statement and proof are simple enough, but definitely have the “abstract nonsense” flavor that one often ascribes to this part of mathematics, and I struggled to connect it to the more grounded forms of intuition, such as those based on concrete examples, that I was more comfortable with. There is a popular MathOverflow post devoted to this question, with many answers that were helpful to me, but I still felt vaguely dissatisfied. However, recently when pondering the very concrete concept of a polynomial, I managed to accidentally stumble upon a special case of Yoneda’s lemma in action, which clarified this lemma conceptually for me. In the end it was a very simple observation (and would be extremely pedestrian to anyone who works in an algebraic field of mathematics), but as I found this helpful to a non-algebraist such as myself, I thought I would share it here in case others similarly find it helpful.

In algebra we see a distinction between a polynomial form (also known as a formal polynomial), and a polynomial function, although this distinction is often elided in more concrete applications. A polynomial form in, say, one variable with integer coefficients, is a formal expression {P} of the form

\displaystyle  P = a_d {\mathrm n}^d + \dots + a_1 {\mathrm n} + a_0 \ \ \ \ \ (1)

where {a_0,\dots,a_d} are coefficients in the integers, and {{\mathrm n}} is an indeterminate: a symbol that is often intended to be interpreted as an integer, real number, complex number, or element of some more general ring {R}, but is for now a purely formal object. The collection of such polynomial forms is denoted {{\bf Z}[{\mathrm n}]}, and is a commutative ring.

A polynomial form {P} can be interpreted in any ring {R} (even non-commutative ones) to create a polynomial function {P_R : R \rightarrow R}, defined by the formula

\displaystyle  P_R(n) := a_d n^d + \dots + a_1 n + a_0 \ \ \ \ \ (2)

for any {n \in R}. This definition (2) looks so similar to the definition (1) that we usually abuse notation and conflate {P} with {P_R}. This conflation is supported by the identity theorem for polynomials, that asserts that if two polynomial forms {P, Q} agree at an infinite number of (say) complex numbers, thus {P_{\bf C}(z) = Q_{\bf C}(z)} for infinitely many {z}, then they agree {P=Q} as polynomial forms (i.e., their coefficients match). But this conflation is sometimes dangerous, particularly when working in finite characteristic. For instance:

  • (i) The linear forms {{\mathrm n}} and {-{\mathrm n}} are distinct as polynomial forms, but agree when interpreted in the ring {{\bf Z}/2{\bf Z}}, since {n = -n} for all {n \in {\bf Z}/2{\bf Z}}.
  • (ii) Similarly, if {p} is a prime, then the degree one form {{\mathrm n}} and the degree {p} form {{\mathrm n}^p} are distinct as polynomial forms (and in particular have distinct degrees), but agree when interpreted in the ring {{\bf Z}/p{\bf Z}}, thanks to Fermat’s little theorem.
  • (iii) The polynomial form {{\mathrm n}^2+1} has no roots when interpreted in the reals {{\bf R}}, but has roots when interpreted in the complex numbers {{\bf C}}. Similarly, the linear form {2{\mathrm n}-1} has no roots when interpreted in the integers {{\bf Z}}, but has roots when interpreted in the rationals {{\bf Q}}.

The above examples show that if one only interprets polynomial forms in a specific ring {R}, then some information about the polynomial could be lost (and some features of the polynomial, such as roots, may be “invisible” to that interpretation). But this turns out not to be the case if one considers interpretations in all rings simultaneously, as we shall now discuss.
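A few lines of Python (a sketch, not from the post) make examples (i) and (ii) concrete, treating a form as a tuple of coefficients and a function as its evaluation in {{\bf Z}/p{\bf Z}}:

```python
def evaluate(coeffs, n, p=None):
    """Evaluate the form a_0 + a_1 x + ... + a_d x^d at x = n,
    optionally interpreting the result in Z/pZ."""
    val = sum(a * n**k for k, a in enumerate(coeffs))
    return val % p if p is not None else val

form_n     = (0, 1)    # the form  n
form_neg_n = (0, -1)   # the form -n

# Distinct as forms (different coefficients)...
assert form_n != form_neg_n
# ...but identical as functions on Z/2Z, since n = -n there:
assert all(evaluate(form_n, x, 2) == evaluate(form_neg_n, x, 2) for x in range(2))

# Fermat's little theorem: n and n^5 agree as functions on Z/5Z,
# even though the forms have different degrees.
form_n5 = (0, 0, 0, 0, 0, 1)
assert all(evaluate(form_n, x, 5) == evaluate(form_n5, x, 5) for x in range(5))
```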

If {R, S} are two different rings, then the polynomial functions {P_R: R \rightarrow R} and {P_S: S \rightarrow S} arising from interpreting a polynomial form {P} in these two rings are, strictly speaking, different functions. However, they are often closely related to each other. For instance, if {R} is a subring of {S}, then {P_R} agrees with the restriction of {P_S} to {R}. More generally, if there is a ring homomorphism {\phi: R \rightarrow S} from {R} to {S}, then {P_R} and {P_S} are intertwined by the relation

\displaystyle  \phi \circ P_R = P_S \circ \phi, \ \ \ \ \ (3)

which basically asserts that ring homomorphisms respect polynomial operations. Note that the previous observation corresponded to the case when {\phi} was an inclusion homomorphism. Another example comes from the complex conjugation automorphism {z \mapsto \overline{z}} on the complex numbers, in which case (3) asserts the identity

\displaystyle  \overline{P_{\bf C}(z)} = P_{\bf C}(\overline{z})

for any polynomial function {P_{\bf C}} on the complex numbers, and any complex number {z}.
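The conjugation instance of (3) is easy to verify numerically; a small sketch with a hypothetical integer-coefficient polynomial:

```python
def P(z):
    # The polynomial function attached to the (hypothetical) form 3n^2 - 2n + 5.
    return 3 * z**2 - 2 * z + 5

# Relation (3) for phi = complex conjugation: conj(P(z)) = P(conj(z)),
# which holds because the coefficients are fixed by conjugation.
for z in [1 + 2j, -0.5 + 0.25j, 3j]:
    assert abs(P(z).conjugate() - P(z.conjugate())) < 1e-12
```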

What was surprising to me (as someone who had not internalized the Yoneda lemma) was that the converse statement was true: if one had a function {F_R: R \rightarrow R} associated to every ring {R} that obeyed the intertwining relation

\displaystyle  \phi \circ F_R = F_S \circ \phi \ \ \ \ \ (4)

for every ring homomorphism {\phi: R \rightarrow S}, then there was a unique polynomial form {P \in {\bf Z}[\mathrm{n}]} such that {F_R = P_R} for all rings {R}. This seemed surprising to me because the functions {F} were a priori arbitrary functions, and as an analyst I would not expect them to have polynomial structure. But the fact that (4) holds for all rings {R,S} and all homomorphisms {\phi} is in fact rather powerful. As an analyst, I am tempted to proceed by first working with the ring {{\bf C}} of complex numbers and taking advantage of the aforementioned identity theorem, but this turns out to be tricky because {{\bf C}} does not “talk” to all the other rings {R} enough, in the sense that there are not always as many ring homomorphisms from {{\bf C}} to {R} as one would like. But there is in fact a more elementary argument that takes advantage of a particularly relevant (and “talkative”) ring to the theory of polynomials, namely the ring {{\bf Z}[\mathrm{n}]} of polynomials themselves. Given any other ring {R}, and any element {n} of that ring, there is a unique ring homomorphism {\phi_{R,n}: {\bf Z}[\mathrm{n}] \rightarrow R} from {{\bf Z}[\mathrm{n}]} to {R} that maps {\mathrm{n}} to {n}, namely the evaluation map

\displaystyle  \phi_{R,n} \colon a_d {\mathrm n}^d + \dots + a_1 {\mathrm n} + a_0 \mapsto a_d n^d + \dots + a_1 n + a_0

that sends a polynomial form to its evaluation at {n}. Applying (4) to this ring homomorphism, and specializing to the element {\mathrm{n}} of {{\bf Z}[\mathrm{n}]}, we conclude that

\displaystyle  \phi_{R,n}( F_{{\bf Z}[\mathrm{n}]}(\mathrm{n}) ) = F_R( n )

for any ring {R} and any {n \in R}. If we then define {P \in {\bf Z}[\mathrm{n}]} to be the formal polynomial

\displaystyle  P := F_{{\bf Z}[\mathrm{n}]}(\mathrm{n}),

then this identity can be rewritten as

\displaystyle  F_R = P_R

and so we have indeed shown that the family {F_R} arises from a polynomial form {P}. Conversely, from the identity

\displaystyle  P = P_{{\bf Z}[\mathrm{n}]}(\mathrm{n})

valid for any polynomial form {P}, we see that two polynomial forms {P,Q} can only generate the same polynomial functions {P_R, Q_R} for all rings {R} if they are identical as polynomial forms. So the polynomial form {P} associated to the family {F_R} is unique.
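This recovery of the form from the family can be mimicked in code: a "family" defined uniformly by ring operations (which is how relation (4) manifests concretely) can be applied to the indeterminate itself, and the formal polynomial drops out. A sympy sketch, with a hypothetical example polynomial:

```python
import sympy as sp

def F(n):
    # A family F_R defined uniformly by ring operations, one function per ring
    # (Python duck typing stands in for "works in any ring").
    return n**3 - 2*n + 1

# Apply F to the generic element n of Z[n] to recover the formal polynomial P:
n = sp.symbols('n')
P = sp.Poly(F(n), n, domain='ZZ')
assert P.all_coeffs() == [1, 0, -2, 1]

# Check F_R = P_R in the ring Z/7Z, via the evaluation homomorphism:
assert all(F(x) % 7 == int(P.eval(x)) % 7 for x in range(7))
```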

We have thus created an identification of form and function: polynomial forms {P} are in one-to-one correspondence with families of functions {F_R} obeying the intertwining relation (4). But this identification can be interpreted as a special case of the Yoneda lemma, as follows. There are two categories in play here: the category {\mathbf{Ring}} of rings (where the morphisms are ring homomorphisms), and the category {\mathbf{Set}} of sets (where the morphisms are arbitrary functions). There is an obvious forgetful functor {\mathrm{Forget}: \mathbf{Ring} \rightarrow \mathbf{Set}} between these two categories that takes a ring and removes all of the algebraic structure, leaving behind just the underlying set. A collection {F_R: R \rightarrow R} of functions (i.e., {\mathbf{Set}}-morphisms) for each {R} in {\mathbf{Ring}} that obeys the intertwining relation (4) is precisely the same thing as a natural transformation from the forgetful functor {\mathrm{Forget}} to itself. So we have identified formal polynomials in {{\bf Z}[\mathrm{n}]} as a set with natural endomorphisms of the forgetful functor:

\displaystyle  \mathrm{Forget}({\bf Z}[\mathrm{n}]) \equiv \mathrm{Hom}( \mathrm{Forget}, \mathrm{Forget} ). \ \ \ \ \ (5)

Informally: polynomial forms are precisely those operations on rings that are respected by ring homomorphisms.

What does this have to do with Yoneda’s lemma? Well, remember that every element {n} of a ring {R} came with an evaluation homomorphism {\phi_{R,n}: {\bf Z}[\mathrm{n}] \rightarrow R}. Conversely, every homomorphism from {{\bf Z}[\mathrm{n}]} to {R} will be of the form {\phi_{R,n}} for a unique {n} – indeed, {n} will just be the image of {\mathrm{n}} under this homomorphism. So the evaluation homomorphism provides a one-to-one correspondence between elements of {R}, and ring homomorphisms in {\mathrm{Hom}({\bf Z}[\mathrm{n}], R)}. This correspondence is at the level of sets, so this gives the identification

\displaystyle  \mathrm{Forget} \equiv \mathrm{Hom}({\bf Z}[\mathrm{n}], -).

Thus our identification can be written as

\displaystyle  \mathrm{Forget}({\bf Z}[\mathrm{n}]) \equiv \mathrm{Hom}( \mathrm{Hom}({\bf Z}[\mathrm{n}], -), \mathrm{Forget} )

which is now clearly a special case of the Yoneda lemma

\displaystyle  F(A) \equiv \mathrm{Hom}( \mathrm{Hom}(A, -), F )

that applies to any functor {F: {\mathcal C} \rightarrow \mathbf{Set}} from a (locally small) category {{\mathcal C}} and any object {A} in {{\mathcal C}}. And indeed if one inspects the standard proof of this lemma, it is essentially the same argument as the argument we used above to establish the identification (5). More generally, it seems to me that the Yoneda lemma is often used to identify “formal” objects with their “functional” interpretations, as long as one simultaneously considers interpretations across an entire category (such as the category of rings), as opposed to just a single interpretation in a single object of the category in which there may be some loss of information due to the peculiarities of that specific object. Grothendieck’s “functor of points” interpretation of a scheme, discussed in this previous blog post, is one typical example of this.

August 24, 2023

Tommaso DorigoOn The Multiverse

I recently read a book by Martin Rees, "On the Future". I found it an agile little book packed full with wisdom and interesting considerations on what's in store for humanity in the coming decades, centuries, millennia, and billions of years. And I agree with much of what he wrote in it, also finding views that coincide with judgements I had independently formed in the past.


Terence TaoAn upper bound on the mean value of the Erdős-Hooley delta function

Dimitris Koukoulopoulos and I have just uploaded to the arXiv our paper “An upper bound on the mean value of the Erdős-Hooley delta function“. This paper concerns a (still somewhat poorly understood) basic arithmetic function in multiplicative number theory, namely the Erdős-Hooley delta function

\displaystyle  \Delta(n) := \sup_u \Delta(n;u)

where

\displaystyle  \Delta(n;u) := \# \{ d|n: e^u < d \leq e^{u+1} \}.

The function {\Delta} measures the extent to which the divisors of a natural number can be concentrated in a dyadic (or more precisely, {e}-dyadic) interval {(e^u, e^{u+1}]}. From the pigeonhole principle, we have the bounds

\displaystyle  \frac{\tau(n)}{\log n} \ll \Delta(n) \leq \tau(n),

where {\tau(n) := \# \{ d: d|n\}} is the usual divisor function. The statistical behavior of the divisor function is well understood; for instance, if {n} is drawn at random from {1} to {x}, then the mean value of {\tau(n)} is roughly {\log x}, the median is roughly {\log^{\log 2} x}, and (by the Erdős-Kac theorem) {\tau(n)} asymptotically has a log-normal distribution. In particular, there are a small proportion of highly divisible numbers that skew the mean to be significantly higher than the median.
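These definitions are easy to experiment with. The following brute-force sketch (plain Python, illustrative only; the helper names are mine) computes {\Delta(n)} by scanning windows of the form {(d/e, d]} ending at a divisor {d} (the sup over {u} is always attained on such a window), and verifies the pigeonhole bounds for a sample {n}:

```python
import math

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def hooley_delta(n):
    # sup_u Delta(n;u) is attained for a window of the form (d/e, d]
    # ending at some divisor d, so it suffices to scan those windows
    divs = divisors(n)
    return max(sum(1 for d2 in divs if d / math.e < d2 <= d) for d in divs)

n = 2 * 3 * 5 * 7 * 11 * 13             # a squarefree test case
tau = len(divisors(n))
delta = hooley_delta(n)
assert delta <= tau                      # upper pigeonhole bound
assert delta >= tau / (math.log(n) + 2)  # lower bound, with a little slack
```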

On the other hand, the statistical behavior of the Erdős-Hooley delta function is significantly less well understood, even conjecturally. Again drawing {n} at random from {1} to {x} for large {x}, the median is known to be somewhere between {(\log\log x)^{0.3533\dots}} and {(\log\log x)^{0.6102\dots}} – a (difficult) recent result of Ford, Green, and Koukoulopoulos (for the lower bound) and La Bretèche and Tenenbaum (for the upper bound). And the mean {\frac{1}{x} \sum_{n \leq x} \Delta(n)} was even less well controlled; the best previous bounds were

\displaystyle  \log \log x \ll \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll \exp( c \sqrt{\log\log x} )

for any {c > \sqrt{2} \log 2}, with the lower bound due to Hall and Tenenbaum, and the upper bound a recent result of La Bretèche and Tenenbaum.

The main result of this paper is an improvement of the upper bound to

\displaystyle  \frac{1}{x} \sum_{n \leq x} \Delta(n) \ll (\log \log x)^{11/4}.

It is still unclear to us exactly what to conjecture regarding the actual order of the mean value.

The reason we looked into this problem was that it was connected to forthcoming work of David Conlon, Jacob Fox, and Huy Pham on the following problem of Erdős: what is the size of the largest subset {A} of {\{1,\dots,N\}} with the property that no non-empty subset of {A} sums to a perfect square? Erdős observed that one can obtain sets of size {\gg N^{1/3}} (basically by considering certain homogeneous arithmetic progressions), and Nguyen and Vu showed an upper bound of {\ll N^{1/3} (\log N)^{O(1)}}. With our mean value bound as input, together with several new arguments, Conlon, Fox, and Pham have been able to improve the upper bound to {\ll N^{1/3} (\log\log N)^{O(1)}}.

Let me now discuss some of the ingredients of the proof. The first few steps are standard. Firstly we may restrict attention to square-free numbers without much difficulty (the point being that if a number {n} factors as {n = d^2 m} with {m} squarefree, then {\Delta(n) \leq \tau(d^2) \Delta(m)}). Next, because a square-free number {n>1} can be uniquely factored as {n = pm} where {p} is a prime and {m} lies in the finite set {{\mathcal S}_{<p}} of squarefree numbers whose prime factors are less than {p}, and {\Delta(n) \leq \tau(p) \Delta(m) = 2 \Delta(m)}, it is not difficult to establish the bound

\displaystyle  \frac{1}{x} \sum_{n \in {\mathcal S}_{<x}} \Delta(n) \ll \sup_{2 \leq y\leq x} \frac{1}{\log y} \sum_{n \in {\mathcal S}_{<y}} \frac{\Delta(n)}{n}.

The upshot of this is that one can replace an ordinary average with a logarithmic average; thus it suffices to show

\displaystyle  \frac{1}{\log x} \sum_{n \in {\mathcal S}_{<x}} \frac{\Delta(n)}{n} \ll (\log \log x)^{11/4}. \ \ \ \ \ (1)

We actually prove a slightly more refined distributional estimate: for any {A \geq 2}, we have a bound

\displaystyle  \Delta(n) \ll A \log^{3/4} A \ \ \ \ \ (2)

outside of an exceptional set {E} which is small in the sense that

\displaystyle  \frac{1}{\log x} \sum_{n \in {\mathcal S}_{<x}: n \in E} \frac{1}{n} \ll \frac{1}{A}. \ \ \ \ \ (3)

It is not difficult to get from this distributional estimate to the logarithmic average estimate (1) (worsening the exponent {3/4} to {3/4+2 = 11/4}).

To get some intuition on the size of {\Delta(n)}, we observe that if {y > 0} and {n_{<y}} is the factor of {n} coming from the prime factors less than {y}, then

\displaystyle  \Delta(n) \geq \Delta(n_{<y}) \gg \frac{\tau(n_{<y})}{\log n_{<y}}. \ \ \ \ \ (4)

On the other hand, standard estimates let one establish that

\displaystyle  \tau(n_{<y}) \ll A \log n_{<y} \ \ \ \ \ (5)

for all {y}, and all {n} outside of an exceptional set that is small in the sense (3); in fact it turns out that one can also get an additional gain in this estimate unless {\log y} is close to {A^{\log 4}}, which turns out to be useful when optimizing the bounds. So we would like to approximately reverse the inequalities in (4) and get from (5) to (2), possibly after throwing away further exceptional sets of size (3).

At this point we perform another standard technique, namely the moment method of controlling the supremum {\Delta(n) = \sup_u \Delta(n;u)} by the moments

\displaystyle  M_q(n) := \int_{{\bf R}} \Delta(n;u)^q\ du

for natural numbers {q}; it is not difficult to establish the bound

\displaystyle  \Delta(n) \ll M_q(n)^{1/q}

and one expects this bound to become essentially sharp once {q \sim \log\log x}. We will be able to show a moment bound

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_q(n) / \tau(n)}{n} \leq O(q)^q A^{q-2} \log^{3q/4} A

for any {q \geq 2} for some exceptional set {E_q} obeying the smallness condition (3) (actually, for technical reasons we need to improve the right-hand side slightly to close an induction on {q}); this will imply the distributional bound (2) from a standard Markov inequality argument (setting {q \sim \log\log x}).
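Since {u \mapsto \Delta(n;u)} is a step function (a divisor {d} is counted exactly for {u \in [\log d - 1, \log d)}), the moments {M_q(n)} can be computed exactly from its breakpoints. Here is a small sketch (my own illustration, not code from the paper) verifying {\int_{\bf R} \Delta(n;u)\ du = \tau(n)} and the elementary two-sided comparison between {M_q(n)} and powers of {\Delta(n)}:

```python
import math

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def step_pieces(n):
    """Pieces (a, b, v) on which u -> Delta(n;u) equals v:
    divisor d is counted exactly for u in [log d - 1, log d)."""
    events = []
    for d in divisors(n):
        events.append((math.log(d) - 1, +1))
        events.append((math.log(d), -1))
    events.sort()
    pieces, val, prev = [], 0, None
    for u, s in events:
        if prev is not None and u > prev:
            pieces.append((prev, u, val))
        val += s
        prev = u
    return pieces

n = 2 * 3 * 5 * 7
pieces = step_pieces(n)
tau = len(divisors(n))
delta = max(v for _, _, v in pieces)   # Delta(n), the sup of the step function
assert abs(sum((b - a) * v for a, b, v in pieces) - tau) < 1e-9  # integral = tau(n)
for q in (2, 4, 8):
    M_q = sum((b - a) * v ** q for a, b, v in pieces)
    w = sum(b - a for a, b, v in pieces if v == delta)  # time spent at the max
    # w * Delta^q <= M_q <= Delta^{q-1} * tau, so M_q^{1/q} -> Delta(n) as q grows
    assert w * delta ** q <= M_q + 1e-9 <= delta ** (q - 1) * tau + 2e-9
```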

The strategy is then to obtain a good recursive inequality for (averages of) {M_q(n)}. As in the reduction to (1), we factor {n=pm} where {p} is a prime and {m \in {\mathcal S}_{<p}}. One observes the identity

\displaystyle  \Delta(n;u) = \Delta(m;u) + \Delta(m;u-\log p)

for any {u}; taking moments, one obtains the identity

\displaystyle  M_q(n) = \sum_{a+b=q; 0 \leq b \leq q} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.
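The pointwise identity just reflects the fact that each divisor of {n = pm} is either a divisor {d} of {m} (counted by the first term) or {p} times one, which lands in {(e^u, e^{u+1}]} exactly when {d} lands in the window shifted by {\log p} (the second term). A quick numerical sanity check (illustrative Python, assuming nothing beyond the definition):

```python
import math

def delta_u(n, u):
    # Delta(n; u) = #{ d | n : e^u < d <= e^{u+1} }
    return sum(1 for d in range(1, n + 1)
               if n % d == 0 and math.e ** u < d <= math.e ** (u + 1))

m, p = 2 * 3 * 5, 7          # m squarefree, p a prime not dividing m
n = p * m
for u in (-0.5, 0.0, 1.3, 2.7, 4.0):
    assert delta_u(n, u) == delta_u(m, u) + delta_u(m, u - math.log(p))
```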

As in previous literature, one can try to average in {p} here and apply Hölder’s inequality. But it is convenient to first use the symmetry of the summand in {a,b} to reduce to the case of relatively small values of {b}:

\displaystyle  M_q(n) \leq 2 \sum_{a+b=q; 0 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

One can extract out the {b=0} term as

\displaystyle  M_q(n) \leq 2 M_q(m)

\displaystyle + 2 \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

It is convenient to eliminate the factor of {2} by dividing out by the divisor function:

\displaystyle  \frac{M_q(n)}{\tau(n)} \leq \frac{M_q(m)}{\tau(m)}

\displaystyle + \frac{1}{\tau(m)} \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \int_{\bf R} \Delta(m;u)^a \Delta(m;u-\log p)^b\ du.

This inequality is suitable for iterating and also averaging in {p} and {m}. After some standard manipulations (using the Brun–Titchmarsh and Hölder inequalities), one is able to estimate sums such as

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_q(n)/\tau(n)}{n} \ \ \ \ \ (6)

in terms of sums such as

\displaystyle  \int_2^{x^2} \sum_{a+b=q; 1 \leq b \leq q/2} \binom{q}{a} \sum_{n \in {\mathcal S}_{<x} \backslash E_q} \frac{M_a(n) M_b(n)}{\tau(n) n} \frac{dy}{\log^2 y}

(assuming a certain monotonicity property of the exceptional set {E_q} that turns out to hold in our application). By an induction hypothesis and a Markov inequality argument, one can get a reasonable pointwise upper bound on {M_b} (after removing another exceptional set), and the net result is that one can basically control the sum (6) in terms of expressions such as

\displaystyle  \sum_{n \in {\mathcal S}_{<x} \backslash E_a} \frac{M_a(n)/\tau(n)}{n}

for various {a < q}. This allows one to estimate these expressions efficiently by induction.

Terence TaoThe convergence of an alternating series of Erdős, assuming the Hardy–Littlewood prime tuples conjecture

I have just uploaded to the arXiv my paper “The convergence of an alternating series of Erdős, assuming the Hardy–Littlewood prime tuples conjecture“. This paper concerns an old problem of Erdős concerning whether the alternating series {\sum_{n=1}^\infty \frac{(-1)^n n}{p_n}} converges, where {p_n} denotes the {n^{th}} prime. The main result of this paper is that the answer to this question is affirmative assuming a sufficiently strong version of the Hardy–Littlewood prime tuples conjecture.

The alternating series test does not apply here because the ratios {\frac{n}{p_n}} are not monotonically decreasing. The deviations from monotonicity arise from fluctuations in the prime gaps {p_{n+1}-p_n}, so the enemy arises from biases in the prime gaps for odd and even {n}. By changing variables from {n} to {p_n} (or more precisely, to integers in the range between {p_n} and {p_{n+1}}), this is basically equivalent to biases in the parity {(-1)^{\pi(n)}} of the prime counting function. Indeed, it is an unpublished observation of Said that the convergence of {\sum_{n=1}^\infty \frac{(-1)^n n}{p_n}} is equivalent to the convergence of {\sum_{n=10}^\infty \frac{(-1)^{\pi(n)}}{n \log n}}. So this question is really about trying to get a sufficiently strong amount of equidistribution for the parity of {\pi(n)}.
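For concreteness, the slow oscillation is easy to watch numerically; here is a small sketch (a naive sieve, with the cutoff {2 \times 10^6} chosen arbitrarily; of course no finite computation decides convergence):

```python
def primes_up_to(limit):
    # basic sieve of Eratosthenes
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if sieve[i]]

primes = primes_up_to(2_000_000)
s, partials = 0.0, []
for n, p in enumerate(primes, start=1):   # p = p_n, the n-th prime
    s += (-1) ** n * n / p
    partials.append(s)
# the terms n/p_n ~ 1/log n decay very slowly, so the partial sums creep
# rather than settle; biases in prime gaps are what could spoil convergence
```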

The prime tuples conjecture does not directly say much about the value of {\pi(n)}; however, it can be used to control differences {\pi(n+\lambda \log x) - \pi(n)} for {n \sim x} and {\lambda>0} not too large. Indeed, it is a famous calculation of Gallagher that for fixed {\lambda}, and {n} chosen randomly from {1} to {x}, the quantity {\pi(n+\lambda \log x) - \pi(n)} is distributed according to the Poisson distribution of mean {\lambda} asymptotically if the prime tuples conjecture holds. In particular, the parity {(-1)^{\pi(n+\lambda \log x)-\pi(n)}} of this quantity should have mean asymptotic to {e^{-2\lambda}}. An application of the van der Corput {A}-process then gives some decay on the mean of {(-1)^{\pi(n)}} as well. Unfortunately, this decay is a bit too weak for this problem; even if one uses the most quantitative version of Gallagher’s calculation, worked out in a recent paper of (Vivian) Kuperberg, the best bound on the mean {|\frac{1}{x} \sum_{n \leq x} (-1)^{\pi(n)}|} is something like {(\log\log x)^{-1/4+o(1)}}, which is not quite strong enough to overcome the doubly logarithmic divergence of {\sum_{n=1}^\infty \frac{1}{n \log n}}.
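Gallagher’s prediction is easy to probe numerically at modest heights. The following sketch (illustrative, with {x = 10^6} and {\lambda = 2} chosen arbitrarily) estimates the mean of {\pi(n+\lambda \log x) - \pi(n)} over a deterministic sample of {n}, along with the corresponding parity average:

```python
import math

def prime_sieve(limit):
    # indicator bytearray: sieve[k] == 1 iff k is prime
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return sieve

x, lam = 10 ** 6, 2.0
H = int(lam * math.log(x))            # window length ~ lambda * log x
is_prime = prime_sieve(2 * x + H + 1)
counts = [sum(is_prime[n + 1 : n + H + 1]) for n in range(x, 2 * x, 997)]
mean = sum(counts) / len(counts)
parity = sum((-1) ** c for c in counts) / len(counts)
# the prime tuples conjecture predicts mean -> lambda and, via Gallagher,
# parity -> e^{-2 lambda} (about 0.018 here) in the limit of large x
assert abs(mean - lam) < 0.3
```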

To get around this obstacle, we take advantage of the random sifted model {{\mathcal S}_z} of the primes that was introduced in a paper of Banks, Ford, and myself. To model the primes in an interval such as {[n, n+\lambda \log x]} with {n} drawn randomly from say {[x,2x]}, we remove one random residue class {a_p \hbox{ mod } p} from this interval for all primes {p} up to Pólya’s “magic cutoff” {z \approx x^{1/e^\gamma}}. The prime tuples conjecture can then be interpreted as the assertion that the random set {{\mathcal S}_z} produced by this sieving process is statistically a good model for the primes in {[n, n+\lambda \log x]}. After some standard manipulations (using a version of the Bonferroni inequalities, as well as some upper bounds of Kuperberg), the problem then boils down to getting sufficiently strong estimates for the expected parity {{\bf E} (-1)^{|{\mathcal S}_z|}} of the random sifted set {{\mathcal S}_z}.

For this problem, the main advantage of working with the random sifted model, rather than with the primes or the singular series arising from the prime tuples conjecture, is that the sifted model can be studied iteratively from the partially sifted sets {{\mathcal S}_w} arising from sifting primes {p} up to some intermediate threshold {w<z}, and that the expected parity of the {{\mathcal S}_w} experiences some decay in {w}. Indeed, once {w} exceeds the length {\lambda \log x} of the interval {[n,n+\lambda \log x]}, sifting {{\mathcal S}_w} by an additional prime {p} will cause {{\mathcal S}_w} to lose one element with probability {|{\mathcal S}_w|/p}, and remain unchanged with probability {1 - |{\mathcal S}_w|/p}. If {|{\mathcal S}_w|} concentrates around some value {\overline{S}_w}, this suggests that the expected parity {{\bf E} (-1)^{|{\mathcal S}_w|}} will decay by a factor of about {|1 - 2 \overline{S}_w/p|} as one increases {w} to {p}, and iterating this should give good bounds on the final expected parity {{\bf E} (-1)^{|{\mathcal S}_z|}}. It turns out that existing second moment calculations of Montgomery and Soundararajan suffice to obtain enough concentration to make this strategy work.
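The sifted model itself is simple to simulate. Here is a toy Monte Carlo (interval length and sieving cutoff chosen arbitrarily for illustration; this only shows the construction, not the quantitative decay analyzed in the paper):

```python
import random

def small_primes(limit):
    # trial-division prime list, fine for tiny limits
    ps = []
    for n in range(2, limit + 1):
        if all(n % p for p in ps):
            ps.append(n)
    return ps

PRIMES = small_primes(50)            # sift by all primes up to z = 50

def sifted_set(H, rng):
    # one draw of the random sifted model on {0, ..., H-1}:
    # for each prime p <= z, delete one uniformly random residue class mod p
    S = set(range(H))
    for p in PRIMES:
        a = rng.randrange(p)
        S = {n for n in S if n % p != a}
    return S

rng = random.Random(0)
sizes = [len(sifted_set(60, rng)) for _ in range(2000)]
parity = sum((-1) ** s for s in sizes) / len(sizes)
# once z is much larger than the interval length, the expected parity
# E(-1)^{|S_z|} should be tiny; this is just a crude Monte Carlo estimate
assert max(sizes) <= 60
assert abs(parity) < 0.3
```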

Doug NatelsonSome interesting recent papers - lots to ponder

As we bid apparent farewell to LK99, it's important to note that several other pretty exciting things have been happening in the condensed matter/nano world.  Here are a few papers that look intriguing (caveat emptor:  I have not had a chance to read these in any real depth, so my insights are limited.)

  • Somehow I had never heard of Pines' Demon until this very recent paper came out, and the story is told briefly here.  The wikipedia link is actually very good, so I don't know that I can improve upon the description.  You can have coupled collective modes for electrons in two different bands in a material, where the electrons in one band are sloshing anti-phase with the electrons in the other band.  The resulting mode can be "massless" (in the sense that its energy is linearly proportional to its momentum, like a photon's), and because it doesn't involve net real-space charge displacement, to first approximation it doesn't couple to light.  The UIUC group used a really neat, very sensitive angle-resolved electron scattering method to spot this for the first time, in high quality films of Sr2RuO4.  (An arxiv version of the paper is here.) 
  • Here is a theory paper in Science (arxiv version) that presents a general model of so-called strange metals (ancient post on this blog).  Strange metals appear in a large number of physical systems and are examples where the standard picture of metals, Fermi liquid theory, seems to fail.  I will hopefully write a bit more about this soon.  One of the key signatures of strange metals is a low temperature electrical resistivity that varies like \(\rho(T) = \rho_{0} + AT\), as opposed to the usual Fermi liquid result \(\rho(T) = \rho_{0} + AT^{2}\).  Explaining this and the role of interactions and disorder is a real challenge.  Here is a nice write-up by the Simons Foundation on this.
  • Scanning tunneling microscopy is a great spectroscopic tool, and here is an example where it's been possible to map out information about the many-body electronic states in magic-angle twisted bilayer graphene (arxiv version).  Very pretty images, though I need to think carefully about how to understand what is seen here.
  • One more very intriguing result is this paper, which reports the observation of the fractional quantum anomalous Hall effect (arxiv version).  As I'd mentioned here, the anomalous Hall effect (AHE, a spontaneous voltage appearing transverse to a charge current) in magnetic materials was discovered in 1881 and not understood until recently.  Because of cool topological physics, some materials show a quantized AHE.  In 2D electron systems, the fractional quantum Hall effect is deeply connected to many-body interaction effects.  Seeing fractional quantum Hall states spontaneously appear in the AHE is quite exciting, suggesting that rich many-body correlations can happen in these topological magnetic systems as well.  Note: I really need to read more about this - I don't know anything in depth here.
  • On the more applied side, this article is an extremely comprehensive review of the state of the art for transistors, the critical building block of basically every modern computing technology.  Sorry - I don't have a link to a free version (unless this one is open access and I missed it).  Anyway, for anyone who wants to understand modern transistor technology, where it is going, and why, I strongly encourage you to read this.  If I were teaching my grad nano class, I'd definitely use this as a reference.
  • Again on the applied side, here is a neat review of energy harvesting materials.  There is a lot of interest in finding ways to make use of energy that would otherwise go to waste (e.g. putting piezo generators in your clothing or footwear that could trickle charge your electronics while you walk around).  
  • In the direction of levity, in all too short supply these days, xkcd was really on-point this week.  For condensed matter folks, beware the quasiparticle beam weapon.  For those who do anything with electronics, don't forget this handy reference guide.

August 22, 2023

Scott Aaronson Palate cleanser

  1. Ben Brubaker wrote a long piece for Quanta magazine about meta-complexity. The first three-quarters are a giant refresher on the story of computability and complexity theory in the 20th century—including Turing, Gödel, Shannon, Cook, Karp, Levin, Baker-Gill-Solovay, Sipser, Razborov, Rudich, and more. But then the last quarter gets into actually new (well, within the last couple years) developments, including the NP-completeness of “Partial-MCSP” and other progress on the Minimum Circuit Size Problem, and progress toward basing cryptography on the sole assumption P≠NP, and ruling out Impagliazzo’s “Heuristica” and “Pessiland” worlds. I’m quoted (and helped proofread the piece) despite playing no role in the new developments. Worth a read if you don’t already know this stuff.
  2. Duane Rich created a Part II of his YouTube video series on the Busy Beaver function. It features some of the core ideas from my Busy Beaver survey, clearly narrated and beautifully animated. If reading my survey is too much for you, now you can just watch the movie!
  3. Aznaur Midov recorded a podcast with me about quantum computing and AI—just in case you haven’t got enough of either of those lately.
  4. Oded Regev put an exciting paper on the arXiv, showing how to factor an n-digit integer using quantum circuits of size ~O(n^{3/2}) (multiple such circuits, whose results are combined classically), assuming a smoothness conjecture from number theory. This compares to ~O(n^2) for Shor’s algorithm. Regev’s algorithm uses classical algorithms for lattice problems, thereby connecting that subject to quantum factoring. This might or might not bring nearer in time the day when we can break (say) 2048-bit RSA keys using a quantum computer—that mostly depends, apparently, on whether Regev’s algorithm can also be made highly efficient in its use of qubits.
  5. A team from IBM, consisting of Sergey Bravyi, Andrew Cross, Jay Gambetta, Dmitri Maslov, Ted Yoder, and my former student Patrick Rall, put another exciting paper on the arXiv, which reports an apparent breakthrough in quantum error-correction—building a quantum memory based on LDPC (Low Density Parity Check) codes rather than the Kitaev surface code, one which (they say) at a 0.1% physical error rate can preserve 12 logical qubits for ten million syndrome cycles using 288 physical qubits, rather than the more than 4000 physical qubits that the surface code would need. Anyone who understands it in more detail is welcome to comment!
  6. Boaz Barak wrote a blog post about the history of the atomic bomb, and possible lessons for AI development today. I’d been planning to write a blog post about the history of the atomic bomb and possible lessons for AI development today. Maybe I’ll still write that blog post.
  7. Last week I attended the excellent Berkeley Simons Workshop on Large Language Models and Transformers, hosted by my former adviser Umesh Vazirani. While there, I gave a talk on watermarking of LLMs, which you can watch on YouTube (see also here for the PowerPoint slides). Shtetl-Optimized readers might also enjoy the talk by OpenAI cofounder Ilya Sutskever, An Observation on Generalization, as well as many other talks on all aspects of LLMs, from theoretical to empirical to philosophical to legal.
  8. Right now I’m excited to be at Crypto’2023 in Santa Barbara, learning a lot about post-quantum crypto and more, while dodging both earthquakes and hurricanes. On Wednesday, I’ll give an invited plenary talk about “Neurocryptography”: my vision for what cryptography can contribute to AI safety, including via watermarking and backdoors. Who better to enunciate such a vision than someone who’s neither a cryptographer nor an AI person? If you’re at Crypto and see me, feel free to come say hi.

August 16, 2023

Tommaso DorigoMultithreading For Dummies

What is multithreading? It is the splitting of a program into multiple threads of execution that run concurrently, e.g. on the several cores of a modern processor, allowing a single program to work on tasks in parallel. I have known this simple fact for over thirty years, but funnily enough I never explored it in practice. The reason is fundamentally that I am a physicist, not a computer scientist, and as a physicist I tend to stick with a known skillset to solve my problems, and to invest time in more physics knowledge rather than software wizardry. You might well say I am not a good programmer at all, although that would secretly cause me pain. I would answer that while it is certainly true that my programs are ugly and hard to read, they do what they are supposed to do, as proven by a certain record of scientific publications. 
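For readers who have mostly seen the concept from afar, the basic pattern is short; here is an illustrative Python sketch (not the code discussed in this post; note also that CPython’s global interpreter lock means true CPU parallelism usually requires processes or a compiled language):

```python
import threading

# split a big sum across four worker threads
results = [0] * 4

def partial_sum(i, lo, hi):
    # each thread writes only to its own slot, so no lock is needed here
    results[i] = sum(range(lo, hi))

chunk = 250_000
threads = [threading.Thread(target=partial_sum, args=(i, i * chunk, (i + 1) * chunk))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()          # wait for all workers to finish

total = sum(results)  # same answer as the serial sum(range(1_000_000))
```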

read more

August 15, 2023

Scott Aaronson Testing GPT-4 with math plugins

A couple nights ago Ernie Davis and I put out a paper entitled Testing GPT-4 on Wolfram Alpha and Code Interpreter plug-ins on math and science problems. Following on our DALL-E paper with Gary Marcus, this was another “adversarial collaboration” between me and Ernie. I’m on leave to work for OpenAI, and have been extremely excited by the near-term applications of LLMs, while Ernie has often been skeptical of OpenAI’s claims, but we both want to test our preconceptions against reality. As I recently remarked to Ernie, we both see the same glass; it’s just that he mostly focuses on the empty half, whereas I remember how fantastical even a drop of water in this glass would’ve seemed to me just a few years ago, and therefore focus more on the half that’s full.

Anyway, here are a few examples of the questions I posed to GPT-4, with the recent plug-ins that enhance its calculation abilities:

If you fell into the black hole at the center of the Milky Way, how long would you have before hitting the singularity? [You’d have about a minute]

Approximately how much time would a commercial airliner save in going from New York to Tel Aviv, if it could go in a straight line, through a tunnel in the earth, at the same speed as usual? [I was on such a flight when I wrote this question, and must’ve been bored and impatient. The answer is ~50 minutes.]

Approximately how long would it take to transmit an entire human genome over a standard WiFi connection? [About 4 minutes, assuming no compression and a 25Mbps connection]

How does the total weight of all the uranium that humans mined, compare to the total weight of all the gold that they’ve mined? [About 13 times as much uranium]

Approximately how many errors will a standard laptop suffer over its lifetime, due to cosmic rays hitting the microchip? [Estimates vary widely, but maybe 2000]

What is the approximate probability that a randomly-chosen 100-digit integer is prime? [About 0.4%]
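For what it’s worth, the genome question is a one-line back-of-envelope, under the assumptions of roughly 3.1 billion bases, 2 bits per base with no compression, and a 25 Mbps link:

```python
bases = 3.1e9            # approximate human genome length, in base pairs
bits = 2 * bases         # 2 bits per base, no compression
seconds = bits / 25e6    # 25 Mbps connection
minutes = seconds / 60
assert 3.5 < minutes < 4.5   # ~4 minutes, matching the answer above
```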

GPT-4 with plug-ins did very well on all of the questions above. Here, by contrast, is a question where it did poorly:

Assume that IQs are normally distributed, with a mean of 100 and a standard deviation of 15. For what n is there the maximum excess of people with an IQ of n over people with an IQ of n+1?

GPT-4 thought that there were two solutions, n~85 and n~115, rather than just a single solution (n~115).
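For the record, a direct check with the normal density shows why there’s only one solution: the discrete excess is essentially minus the derivative of the density, which is most negative at the upper inflection point n ≈ μ + σ = 115, while at n ≈ 85 the “excess” is actually a deficit:

```python
import math

def pdf(x, mu=100.0, sigma=15.0):
    # normal density with mean 100, standard deviation 15
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# excess of people with IQ n over people with IQ n+1, up to population size
excess = {n: pdf(n) - pdf(n + 1) for n in range(40, 161)}
best = max(excess, key=excess.get)
assert best in (114, 115)    # single maximum, near mu + sigma = 115
assert excess[85] < 0        # at n ~ 85 there is a deficit, not an excess
```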

Ernie, for his part, was more a fan of “pure pain” problems like the following:

A quantity of chlorine gas is in a right prism whose base is a triangle with sides 5cm, 7cm, and 4cm and whose altitude is 8cm. The temperature is the freezing point of mercury, and the pressure is 2 atmospheres. What is the mass of the chlorine?
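For reference, the intended solution is Heron’s formula plus the ideal gas law; here is a quick check, with the standard constants (mercury freezes at −38.83 °C, R = 0.082057 L·atm/(mol·K), and Cl2 has molar mass ≈ 70.9 g/mol):

```python
import math

a, b, c = 5.0, 7.0, 4.0                 # triangle sides, cm
s = (a + b + c) / 2
area = math.sqrt(s * (s - a) * (s - b) * (s - c))   # Heron's formula, ~9.80 cm^2
V = area * 8.0 / 1000.0                 # prism volume, converted to litres
T = 273.15 - 38.83                      # freezing point of mercury, in kelvin
P, R = 2.0, 0.082057                    # pressure in atm; R in L*atm/(mol*K)
n_mol = P * V / (R * T)                 # ideal gas law
mass = n_mol * 70.90                    # grams, using the molar mass of Cl2
assert 0.55 < mass < 0.61               # ~0.58 g of chlorine
```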

GPT-4 actually aced the above problem. But it failed the majority of Ernie’s other problems, such as:

Viewed from Vega, what is the angle between Sirius and the Sun? [The answer is about 5.6 degrees. GPT thought, implausibly, that it was just 0.005 degrees, or that the answer would vary depending on the time of day.]

My personal favorite among Ernie’s problems was this one:

A physical process generates photons whose energies follow a random distribution of the following form: For positive energy e, the probability density at e is proportional to the value of e in a Gaussian distribution with mean 2 eV and standard deviation 0.01 eV. The probability of a negative value is zero. What is the expected value of the wavelength of a photon produced by this process? (Give the mathematical answer, assuming that the above description is exact, and assuming the standard relation between energy and wavelength in a photon. The answer is not physically plausible.)

The answer, in case you’re wondering, is “infinity.” On this problem, GPT-4 set up the integral perfectly correctly, then correctly fed it to WolframAlpha. But on getting the result, it apologized that “something went wrong,” it must’ve made a mistake, the integral seemed not to be converging, and there was a singularity at E=0 that would have to be dealt with by a change of variables. So it tried again. And again. And again. Each time, it got the same “mistaken” result, and each time it profusely apologized. Despite the explicit wording of the problem, GPT-4 never considered the possibility that the human would be so ridiculous as to give it a physics problem with an infinite answer.
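In fact the divergence is immediate from the 1/E factor near zero: the Gaussian density f is increasing on (0, 2), so for any small δ > 0 the expected value of 1/E is at least f(0) times the integral of dE/E from 0 to δ, which diverges logarithmically. The expected wavelength E[hc/E] is therefore infinite, even though f(0) is astronomically small (of order e^(-20000), which is why no numerical integrator will ever notice it).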

Anyway, what did we learn from this exercise?

  • GPT-4 remains an endlessly enthusiastic B/B+ student in math, physics, and any other STEM field. By using the Code Interpreter or WolframAlpha plugins, it can correctly solve difficult word problems, involving a combination of tedious calculations, world knowledge, and conceptual understanding, maybe a third of the time—a rate that’s not good enough to be relied on, but is utterly astounding compared to where AI was just a few years ago.
  • GPT-4 can now clearly do better at calculation-heavy STEM problems with the plugins than it could do without the plugins.
  • We didn’t find either the WolframAlpha or the Code Interpreter plugin to be clearly superior to the other. It’s possible that they’re incomparable, good for different things.
  • When GPT-4 screwed up, it was often due to a “poor interface” between the language model and the plug-in—e.g. the model having no idea what call to make or how to recover when a call returned an error. Enormous gains seem to be possible by improving these interfaces.
  • Sometimes, much like humans I’ve known, GPT-4 would do amazingly well at a difficult computation, then fumble a trivial final step (e.g., converting the answer into the requested units). Just as I would with human students, I advocated for generous partial credit in such cases.
  • I conjecture, although I don’t have empirical data to show this, that GPT-4 with math plug-ins used in “interactive mode”—with a human reformulating and clarifying the problems as needed, feeding ideas, checking the answers for plausibility, pointing out errors, etc.—could currently get excellent accuracy on these sorts of problems faster than either GPT-4 with math plug-ins alone, or all but the very best humans alone.

August 14, 2023

Scott Aaronson Long-awaited Shtetl-Optimized Barbenheimer post! [warning: spoilers]

I saw Oppenheimer three weeks ago, but I didn’t see Barbie until this past Friday. Now, my scheduled flight having been cancelled, I’m on multiple redeyes on my way to a workshop on Large Language Models at the Simons Institute in Berkeley, organized by my former adviser and quantum complexity theorist Umesh Vazirani (!). What better occasion to review the two movies of the year, or possibly decade?

Shtetl-Optimized Review of Oppenheimer

Whatever its flaws, you should of course see it, if you haven’t yet. I find it weird that it took 80 years for any movie even to try to do justice to one of the biggest stories in the history of the world. There were previous attempts, even a risible opera (“Doctor Atomic”), but none of them made me feel for even a second like I was there in Los Alamos. This movie did. And it has to be good that tens of millions of people, raised on the thin gruel of TikTok and Kardashians and culture-war, are being exposed for the first time to a bygone age when brilliant and conflicted scientific giants agonized over things that actually mattered, such as the ultimate nature of matter and energy, life and death and the future of the world. And so the memory of that age will be kept alive for another generation, and some of the young viewers will no doubt realize that they can be tormented about things that actually matter as well.

This is a movie where General Groves, Lewis Strauss, Einstein, Szilard, Bohr, Heisenberg, Rabi, Teller, Fermi, and E.O. Lawrence are all significant characters, and the acting and much of the dialogue are excellent. I particularly enjoyed Matt Damon as Groves.

But there are also flaws [SPOILERS FOLLOW]:

1. Stuff that never happened. Most preposterously, Oppenheimer travels all the way from Los Alamos to Princeton, to have Einstein check the calculation suggesting that the atomic bomb could ignite the atmosphere.

2. Weirdly, but in common with pretty much every previous literary treatment of this material, the movie finds the revocation of Oppenheimer’s security clearance a far more riveting topic than either the actual creation of the bomb or the prospect of global thermonuclear war. Maybe half the movie consists of committee hearings.

3. The movie misses the opportunity to dramatize almost any of the scientific turning points, from Szilard’s original idea for a chain reaction to the realization of the need to separate U-235 to the invention of the implosion design—somehow, a 3-hour movie didn’t have time for any of this.

4. The movie also, for some reason, completely misses the opportunity to show Oppenheimer’s anger over the bombing of Nagasaki, three days after Hiroshima—a key turning point in the story it’s trying to tell.

5. There’s so much being said, by actors speaking quickly and softly and often imitating European accents, that there’s no hope of catching it all. I’ll need to watch it again with subtitles.

Whatever it gets wrong, this movie does a good job exploring the fundamental irony of the Manhattan Project, that the United States is being propelled into its nuclear-armed hegemony by a group of mostly Jewish leftists who constantly have affairs and hang out with Communists and deeply distrust the government and are distrusted by it.

The movie clearly shows how much grief Oppenheimer gets from both sides: to his leftist friends he’s a sellout; to the military brass he’s potentially disloyal to the United States. For three hours of screen time, he’s constantly pressed on what he actually believes: does he support building the hydrogen bomb, or not? Does he regret the bombing of Hiroshima and (especially) Nagasaki? Does he believe that the US nuclear plans should be shared with Stalin? Every statement in either direction seems painfully wrung from him, as if he’s struggling to articulate a coherent view, or buffeted around by conflicting loyalties and emotions, even while so many others seem certain. In that way, he’s an avatar for the audience.

Anyway, yeah, see it.

Shtetl-Optimized Review of Barbie

A friend-of-the-blog, who happens to be one of the great young theoretical physicists of our time, opined to me that Barbie was a far more interesting movie than Oppenheimer and “it wasn’t even close.” Having now seen both, I’m afraid I can’t agree.

I can best compare my experience watching Barbie to that of watching a two-hour-long episode of South Park—not one of the best episodes, but one that really runs its satirical premise into the ground. Just like with South Park, there’s clearly an Important Commentary On Hot-Button Cultural Issues transpiring, but the commentary has been reflected through dozens of funhouse mirrors and then ground up into slurry, with so many layers of self-aware meta-irony that you can’t keep track of what point is being made, and then fed to hapless characters who are little more than the commentary’s mouthpieces. This is often amusing and interesting, but it rarely makes you care about the characters.

Is Barbie a feminist movie that critiques patriarchy and capitalism? Sort of, yes, but it also subverts that, and subverts the subversion. To sum up [SPOILERS FOLLOW], Barbieland is a matriarchy, where everyone seems pretty happy except for Ken, who resents how Barbie ignores him. Then Barbie and Ken visit the real world, and discover the real world is a patriarchy, where Mattel is controlled by a board of twelve white men (the real Mattel’s board has 7 men and 5 women), and where Barbie is wolf-whistled at and sexually objectified, which she resents despite not knowing what sex is.

Ken decides that patriarchy is just what Barbieland needs, and most importantly, will finally make Barbie need and appreciate him. So he returns and institutes it—both Barbies and Kens think it’s a wonderful idea, as they lack “natural immunity.” Horrified at what’s transpired, Barbie hatches a plan with the other Barbies to restore Barbieland to its rightful matriarchy. She also decisively rejects Ken’s advances. But Ken no longer minds, because he’s learned an important lesson about not basing his self-worth on Barbie’s approval. Barbie, for her part, makes the fateful choice to become a real, mortal woman and live the rest of her life in the real world. In the final scene—i.e., the joke the entire movie has been building up to—Barbie, filled with childlike excitement, goes for her first visit to the gynecologist.

What I found the weirdest is that this is a movie about gender relations, clearly aimed at adults, yet where sex and sexual desire and reproduction have all been taken off the table—explicitly so, given the constant jokes about the Barbies and Kens lacking genitalia and not knowing what they’re for. Without any of the biological realities that differentiate men from women in the first place, or (often enough) cause them to seek each other’s company, it becomes really hard to make sense of the movie’s irony-soaked arguments about feminism and patriarchy. In Barbieland, men and women are just two tribes, one obsessed with “brewsky beers,” foosball, guitar, and The Godfather; the other with shoes, hairstyles, and the war on cellulite. There’s no fundamental reason for any conflict between the two.

Well, except for one thing: Ken clearly needs Barbie’s affection, until he’s inexplicably cured of that need at the end. By contrast, no Barbies are ever shown needing any Kens for anything, or even particularly desiring the Kens’ company, except when they’ve been brainwashed into supporting the patriarchy. The most the movie manages to offer any straight males in the audience, at the very end, is well-wishes as they “Go Their Own Way”, and seek meaning in their lives without women.

For most straight men, I daresay, this would be an incredibly bleak message if it were true, so it’s fortunate that not even the movie’s creators seem actually to believe it. Greta Gerwig has a male partner, Noah Baumbach, with whom she co-wrote Barbie. Margot Robbie is married to a man named Tom Ackerley.

I suppose Barbie could be read as, among other things, a condemnation of male incel ideology, with its horrific desire to reinstitute the patriarchy, driven (or so the movie generously allows) by the incels’ all-too-human mistake of basing their entire self-worth on women’s affection, or lack thereof. If so, however, the movie’s stand-in for incels is … a buff, often shirtless Ryan Gosling, portraying the most famous fantasy boyfriend doll ever marketed to girls? Rather than feeling attacked, should nerdy, lovelorn guys cheer to watch a movie where even Ryan-Gosling-as-Ken effectively gets friendzoned, shot down, put in his place, reduced to a simpering beta just like they are? Yet another layer of irony tossed into the blender.

August 10, 2023

John PreskillCaltech’s Ginsburg Center

Editor’s note: On 10 August 2023, Caltech celebrated the groundbreaking for the Dr. Allen and Charlotte Ginsburg Center for Quantum Precision Measurement, which will open in 2025. At a lunch following the ceremony, John Preskill made these remarks.

Rendering of the facade of the Ginsburg Center

Hello everyone. I’m John Preskill, a professor of theoretical physics at Caltech, and I’m honored to have this opportunity to make some brief remarks on this exciting day.

In 2025, the Dr. Allen and Charlotte Ginsburg Center for Quantum Precision Measurement will open on the Caltech campus. That will certainly be a cause for celebration. Quite fittingly, in that same year, we’ll have something else to celebrate — the 100th anniversary of the formulation of quantum mechanics in 1925. In 1900, it had become clear that the physics of the 19th century had serious shortcomings that needed to be addressed, and for 25 years a great struggle unfolded to establish a firm foundation for the science of atoms, electrons, and light; the momentous achievements of 1925 brought that quest to a satisfying conclusion. No comparably revolutionary advance in fundamental science has occurred since then.

For 98 years now we’ve built on those achievements of 1925 to arrive at a comprehensive understanding of much of the physical world, from molecules to materials to atomic nuclei and exotic elementary particles, and much else besides. But a new revolution is in the offing. And the Ginsburg Center will arise at just the right time and at just the right place to drive that revolution forward.

Up until now, most of what we’ve learned about the quantum world has resulted from considering the behavior of individual particles. A single electron propagating as a wave through a crystal, unfazed by barriers that seem to stand in its way. Or a single photon, bouncing hundreds of times between mirrors positioned kilometers apart, dutifully tracking the response of those mirrors to gravitational waves from black holes that collided in a galaxy billions of light years away. Understanding that single-particle physics has enabled us to explore nature in unprecedented ways, and to build information technologies that have profoundly transformed our lives.

At the groundbreaking: Physics, Math and Astronomy Chair Fiona Harrison, California Assemblymember Chris Holden, President Tom Rosenbaum, Charlotte Ginsburg, Dr. Allen Ginsburg, Pasadena Mayor Victor Gordo, Provost Dave Tirrell.

What’s happening now is that we’re getting increasingly adept at instructing particles to move in coordinated ways that can’t be accurately described in terms of the behavior of one particle at a time. The particles, as we like to say, can become entangled. Many particles, like electrons or photons or atoms, when highly entangled, exhibit an extraordinary complexity that we can’t capture with the most powerful of today’s supercomputers, or with our current theories of how Nature works. That opens extraordinary opportunities for new discoveries and new applications.

We’re very proud of the role Caltech has played in setting the stage for the next quantum revolution. Richard Feynman envisioning quantum computers that far surpass the computers we have today. Kip Thorne proposing ways to use entangled photons to perform extraordinarily precise measurements. Jeff Kimble envisioning and executing ingenious methods for entangling atoms and photons. Jim Eisenstein creating and studying extraordinary phenomena in a soup of entangled electrons. And much more besides. But far greater things are yet to come.

How can we learn to understand and exploit the behavior of many entangled particles that work together? For that, we’ll need many scientists and engineers who work together. I joined the Caltech faculty in August 1983, almost exactly 40 years ago. These have been 40 good years, but I’m having more fun now than ever before. My training was in elementary particle physics. But as our ability to manipulate the quantum world advances, I find that I have more and more in common with my colleagues from different specialties. To fully realize my own potential as a researcher and a teacher, I need to stay in touch with atomic physics, condensed matter physics, materials science, chemistry, gravitational wave physics, computer science, electrical engineering, and much else. Even more important, that kind of interdisciplinary community is vital for broadening the vision of the students and postdocs in our research groups.

Nurturing that community — that’s what the Ginsburg Center is all about. That’s what will happen there every day. That sense of a shared mission, enhanced by colocation, will enable the Ginsburg Center to lead the way as quantum science and technology becomes increasingly central to Caltech’s research agenda in the years ahead, and increasingly important for science and engineering around the globe. And I just can’t wait for 2025.

Caltech is very fortunate to have generous and visionary donors like the Ginsburgs and the Sherman Fairchild Foundation to help us realize our quantum dreams.

Dr. Allen and Charlotte Ginsburg

August 09, 2023

John PreskillIt from Qubit: The Last Hurrah

Editor’s note: Since 2015, the Simons Foundation has supported the “It from Qubit” collaboration, a group of scientists drawing on ideas from quantum information theory to address deep issues in fundamental physics. The collaboration held its “Last Hurrah” event at Perimeter Institute last week. Here is a transcript of remarks by John Preskill at the conference dinner.

It from Qubit 2023 at Perimeter Institute

This meeting is forward-looking, as it should be, but it’s fun to look back as well, to assess and appreciate the progress we’ve made. So my remarks may meander back and forth through the years. Settle back — this may take a while.

We proposed the It from Qubit collaboration in March 2015, in the wake of several years of remarkable progress. Interestingly, that progress was largely provoked by an idea that most of us think is wrong: Black hole firewalls. Wrong perhaps, but challenging to grapple with.

This challenge accelerated a synthesis of quantum computing, quantum field theory, quantum matter, and quantum gravity as well. By 2015, we were already appreciating the relevance to quantum gravity of concepts like quantum error correction, quantum computational complexity, and quantum chaos. It was natural to assemble a collaboration in which computer scientists and information theorists would participate along with high-energy physicists.

We built our proposal around some deep questions where further progress seemed imminent, such as these:

Does spacetime emerge from entanglement?
Do black holes have interiors?
What is the information-theoretical structure of quantum field theory?
Can quantum computers simulate all physical phenomena?

On April 30, 2015 we presented our vision to the Simons Foundation, led by Patrick [Hayden] and Matt [Headrick], with Juan [Maldacena], Lenny [Susskind] and me tagging along. We all shared at that time a sense of great excitement; that feeling must have been infectious, because It from Qubit was successfully launched.

Some It from Qubit investigators at a 2015 meeting.

Since then ideas we talked about in 2015 have continued to mature, to ripen. Now our common language includes ideas like islands and quantum extremal surfaces, traversable wormholes, modular flow, the SYK model, quantum gravity in the lab, nonisometric codes, the breakdown of effective field theory when quantum complexity is high, and emergent geometry described by Von Neumann algebras. In parallel, we’ve seen a surge of interest in quantum dynamics in condensed matter, focused on issues like how entanglement spreads, and how chaotic systems thermalize — progress driven in part by experimental advances in quantum simulators, both circuit-based and analog.

Why did we call ourselves “It from Qubit”? Patrick explained that in our presentation with a quote from John Wheeler in 1990. Wheeler said,

“It from bit” symbolizes the idea that every item of the physical world has at bottom—a very deep bottom, in most instances — an immaterial source and explanation; that which we call reality arises in the last analysis from the posing of yes-or-no questions and the registering of equipment-evoked responses; in short, that all things physical are information-theoretic in origin and that this is a participatory universe.

As is often the case with Wheeler, you’re not quite sure what he’s getting at. But you can glean that Wheeler envisioned that progress in fundamental physics would be hastened by bringing in ideas from information theory. So we updated Wheeler’s vision by changing “it from bit” to “it from qubit.”

As you may know, Richard Feynman had been Wheeler’s student, and he once said this about Wheeler: “Some people think Wheeler’s gotten crazy in his later years, but he’s always been crazy.” So you can imagine how flattered I was when Graeme Smith said the exact same thing about me.

During the 1972-73 academic year, I took a full-year undergraduate course from Wheeler at Princeton that covered everything in physics, so I have a lot of Wheeler stories. I’ll just tell one, which will give you some feel for his teaching style. One day, Wheeler arrives in class dressed immaculately in a suit and tie, as always, and he says: “Everyone take out a sheet of paper, and write down all the equations of physics – don’t leave anything out.” We dutifully start writing equations. The Schrödinger equation, Newton’s laws, Maxwell’s equations, the definition of entropy and the laws of thermodynamics, Navier-Stokes … we had learned a lot. Wheeler collects all the papers, and puts them in a stack on a table at the front of the classroom. He gestures toward the stack and says imploringly “Fly!” [Long pause.] Nothing happens. He tries again, even louder this time: “Fly!” [Long pause.] Nothing happens. Then Wheeler concludes: “On good authority, this stack of papers contains all the equations of physics. But it doesn’t fly. Yet, the universe flies. Something must be missing.”

Channeling Wheeler at the banquet, I implore my equations to fly. Photo by Jonathan Oppenheim.

He was an odd man, but inspiring. And not just odd, but also old. We were 19 and could hardly believe he was still alive — after all, he had worked with Bohr on nuclear fission in the 1930s! He was 61. I’m wiser now, and know that’s not really so old.

Now let’s skip ahead to 1998. Just last week, Strings 2023 happened right here at PI. So it’s fitting to mention that a pivotal Strings meeting occurred 25 years ago, Strings 1998 in Santa Barbara. The participants were in a celebratory mood, so much so that Jeff Harvey led hundreds of physicists in a night of song and dance. It went like this [singing to the tune of “The Macarena”]:

You start with the brane
and the brane is BPS.
Then you go near the brane
and the space is AdS.
Who knows what it means?
I don’t, I confess.
Ehhhh! Maldacena!

You can’t blame them for wanting to celebrate. Admittedly I wasn’t there, so how did I know that hundreds of physicists were singing and dancing? I read about it in the New York Times!

It was significant that by 1998, the Strings meetings had already been held annually for 10 years. You might wonder how that came about. Let’s go back to 1984. Those of you who are too young to remember might not realize that in the late 70s and early 80s string theory was in eclipse. It had initially been proposed as a model of hadrons, but after the discovery of asymptotic freedom in 1973 quantum chromodynamics became accepted as the preferred theory of the strong interactions. (Maybe the QCD string will make a comeback someday – we’ll see.) The community pushing string theory forward shrunk to a handful of people around the world. That changed very abruptly in August 1984. I tried to capture that sudden change in a poem I wrote for John Schwarz’s 60th birthday in 2001. I’ll read it — think of this as a history lesson.

Thirty years ago or more
John saw what physics had in store.
He had a vision of a string
And focused on that one big thing.

But then in nineteen-seven-three
Most physicists had to agree
That hadrons blasted to debris
Were well described by QCD.

The string, it seemed, by then was dead.
But John said: “It’s space-time instead!
The string can be revived again.
Give masses twenty powers of ten!

Then Dr. Green and Dr. Black,
Writing papers by the stack,
Made One, Two-A, and Two-B glisten.
Why is it none of us would listen?

We said, “Who cares if super tricks
Bring D to ten from twenty-six?
Your theory must have fatal flaws.
Anomalies will doom your cause.”

If you weren’t there you couldn’t know
The impact of that mighty blow:
“The Green-Schwarz theory could be true —
It works for S-O-thirty-two!”

Then strings of course became the rage
And young folks of a certain age
Could not resist their siren call:
One theory that explains it all.

Because he never would give in,
Pursued his dream with discipline,
John Schwarz has been a hero to me.
So … please don’t spell it with a “t”!

And 39 years after the revolutionary events of 1984, the intellectual feast launched by string theory still thrives.

In the late 1980s and early 1990s, many high-energy physicists got interested in the black hole information problem. Of course, the problem was 15 years old by then; it arose when Hawking radiation was discovered, as Hawking himself pointed out shortly thereafter. But many of us were drawn to this problem while we waited for the Superconducting Super Collider to turn on. As I have sometimes done when I wanted to learn something, in 1990 I taught a course on quantum field theory in curved spacetime, the main purpose of which was to explain the origin of Hawking radiation, and then for a few years I tried to understand whether information can escape from black holes and if so how, as did many others in those days. That led to a 1992 Aspen program co-organized by Andy Strominger and me on “Quantum Aspects of Black Holes.” Various luminaries were there, among them Hawking, Susskind, Sidney Coleman, Kip Thorne, Don Page, and others. Andy and I were asked to nominate someone from our program to give the Aspen Center colloquium, so of course we chose Lenny, and he gave an engaging talk on “The Puzzle of Black Hole Evaporation.”

At the end of the talk, Lenny reported on discussions he’d had with various physicists he respected about the information problem, and he summarized their views. Of course, Hawking said information is lost. ‘t Hooft said that the S-matrix must be unitary for profound reasons we needed to understand. Polchinski said in 1992 that information is lost and there is no way to retrieve it. Yakir Aharonov said that the information resides in a stable Planck-sized black hole remnant. Sidney Coleman said a black hole is a lump of coal — that was the code in 1992 for what we now call the central dogma of black hole physics, that as seen from the outside a black hole is a conventional quantum system. And – remember this was Lenny’s account of what he claimed people had told him – Frank Wilczek said this is a technical problem, I’ll soon have it solved, while Ed Witten said he did not find the problem interesting.

We talked a lot that summer about the no-cloning principle, and our discomfort with the notion that the quantum information encoded in an infalling encyclopedia could be in two places at once on the same time slice, seen inside the black hole by infalling observers and seen outside the black hole by observers who peruse the Hawking radiation. That potential for cloning shook the faith of the self-appointed defenders of unitarity. Andy and I wrote a report at the end of the workshop with a pessimistic tone:

There is an emerging consensus among the participants that Hawking is essentially right – that the information loss paradox portends a true revolution in fundamental physics. If so, then one must go further, and develop a sensible “phenomenological” theory of information loss. One must reconcile the fact of information loss with established principles of physics, such as locality and energy conservation. We expect that many people, stimulated by their participation in the workshop, will now focus attention on this challenge.

I posted a paper on the arXiv a month later with a similar outlook.

There was another memorable event a year later, in June 1993, a conference at the ITP in Santa Barbara (there was no “K” back then), also called “Quantum Aspects of Black Holes.” Among those attending were Susskind, Gibbons, Polchinski, Thorne, Wald, Israel, Bekenstein, and many others. By then our mood was brightening. Rather pointedly, Lenny said to me that week: “Why is this meeting so much better than the one you organized last year?” And I replied, “Because now you think you know the answer!”

That week we talked about “black hole complementarity,” our hope that quantum information being available both inside and outside the horizon could be somehow consistent with the linearity of quantum theory. Complementarity then was a less radical, less wildly nonlocal idea than it became later on. We envisioned that information in an infalling body could stick to the stretched horizon, but not, as I recall, that the black hole interior would be somehow encoded in Hawking radiation emitted long ago — that came later. But anyway, we felt encouraged.

Joe Polchinski organized a poll of the participants, where one could choose among four options.

  1. Information is lost (unitarity violated)
  2. Information escapes (causality violated)
  3. Planck-scale black hole remnants
  4. None of the above

The poll results favored unitarity over information loss by a 60-40 margin. Perhaps not coincidentally, the participants self-identified as 60% high energy physicists and 40% relativists.

The following summer in June 1994, there was a program called Geometry and Gravity at the Newton Institute in Cambridge. Hawking, Gibbons, Susskind, Strominger, Harvey, Sorkin, and (Herman) Verlinde were among the participants. I had more discussions with Lenny that month than any time before or since. I recall sending an email to Paul Ginsparg after one such long discussion in which I said, “When I hear Lenny Susskind speak, I truly believe that information can come out of a black hole.” Secretly, though, having learned about Shor’s algorithm shortly before that program began, I was spending my evenings struggling to understand Shor’s paper. After Cambridge, Lenny visited ‘t Hooft in Utrecht, and returned to Stanford all charged up to write his paper on “The world as a hologram,” in which he credits ‘t Hooft with the idea that “the world is in a sense two-dimensional.”

Important things happened in the next few years: D-branes, counting of black hole microstates, M-theory, and AdS/CFT. But I’ll skip ahead to the most memorable of my visits to Perimeter Institute. (Of course, I always like coming here, because in Canada you use the same electrical outlets we do …)

In June 2007, there was a month-long program at PI called “Taming the Quantum World.” I recall that Lucien Hardy objected to that title — he preferred “Let the Beast Loose” — which I guess is a different perspective on the same idea. I talked there about fault-tolerant quantum computing, but more importantly, I shared an office with Patrick Hayden. I already knew Patrick well — he had been a Caltech postdoc — but I was surprised and pleased that he was thinking about black holes. Patrick had already reached crucial insights concerning the behavior of a black hole that is profoundly entangled with its surroundings. That sparked intensive discussions resulting in a paper later that summer called “Black holes as mirrors.” In the acknowledgments you’ll find this passage:

We are grateful for the hospitality of the Perimeter Institute, where we had the good fortune to share an office, and JP thanks PH for letting him use the comfortable chair.

We intended for that paper to pique the interest of both the quantum information and quantum gravity communities, as it seemed to us that the time was ripe to widen the communication channel between the two. Since then, not only has that communication continued, but a deeper synthesis has occurred; most serious quantum gravity researchers are now well acquainted with the core concepts of quantum information science.

That John Schwarz poem I read earlier reminds me that I often used to write poems. I do it less often lately. Still, I feel that you are entitled to hear something that rhymes tonight. But I quickly noticed our field has many words that are quite hard to rhyme, like “chaos” and “dogma.” And perhaps the hardest of all: “Takayanagi.” So I decided to settle for some limericks — that’s easier for me than a full-fledged poem.

This first one captures how I felt when I first heard about AdS/CFT: excited but perplexed.

Spacetime is emergent they say.
But emergent in what sort of way?
It’s really quite cool,
The bulk has a dual!
I might understand that someday.

For a quantum information theorist, it was pleasing to learn later on that we can interpret the dictionary as an encoding map, such that the bulk degrees of freedom are protected when a portion of the boundary is erased.

Almheiri and Harlow and Dong
Said “you’re thinking about the map wrong.”
It’s really a code!
That’s the thing that they showed.
Should we have known that all along?

(It is easier to rhyme “Dong” than “Takayanagi”.) To see that connection one needed a good grasp of both AdS/CFT and quantum error-correcting codes. In 2014 few researchers knew both, but those guys did.

For all our progress, we still don’t have a complete answer to a key question that inspired IFQ. What’s inside a black hole?

Information loss has been denied.
Locality’s been cast aside.
When the black hole is gone
What fell in’s been withdrawn.
I’d still like to know: what’s inside?

We’re also still lacking an alternative nonperturbative formulation of the bulk; we can only say it’s something that’s dual to the boundary. Until we can define both sides of the correspondence, the claim that two descriptions are equivalent, however inspiring, will remain unsatisfying.

Duality I can embrace.
Complexity, too, has its place.
That’s all a good show
But I still want to know:
What are the atoms of space?

The question, “What are the atoms of space?” is stolen from Joe Polchinski, who framed it to explain to a popular audience what we’re trying to answer. I miss Joe. He was a founding member of It from Qubit, an inspiring scientific leader, and still an inspiration for all of us today.

The IFQ Simons collaboration may fade away, but the quest that has engaged us these past 8 years goes on. IFQ is the continuation of a long struggle, which took on great urgency with Hawking’s formulation of the information loss puzzle nearly 50 years ago. Understanding quantum gravity and its implications is a huge challenge and a grand quest that humanity is obligated to pursue. And it’s fun and it’s exciting, and I sincerely believe that we’ve made remarkable progress in recent years, thanks in large part to you, the IFQ community. We are privileged to live at a time when truths about the nature of space and time are being unveiled. And we are privileged to be part of this community, with so many like-minded colleagues pulling in the same direction, sharing the joy of facing this challenge.

Where is it all going? Coming back to our pitch to the Simons Foundation in 2015, I was very struck by Juan’s presentation that day, and in particular his final slide. I liked it so much that I stole it and used in my presentations for a while. Juan tried to explain what we’re doing by means of an analogy to biological science. How are the quantumists like the biologists?

Well, bulk quantum gravity is life. We all want to understand life. The boundary theory is chemistry, which underlies life. The quantum information theorists are chemists; they want to understand chemistry in detail. The quantum gravity theorists are biologists, they think chemistry is fine, if it can really help them to understand life. What we want is: molecular biology, the explanation for how life works in terms of the underlying chemistry. The black hole information problem is our fruit fly, the toy problem we need to solve before we’ll be ready to take on a much bigger challenge: finding the cure for cancer; that is, understanding the big bang.

How’s it going? We’ve made a lot of progress since 2015. We haven’t cured cancer. Not yet. But we’re having a lot of fun along the way there.

I’ll end with this hope, addressed especially to those who were not yet born when AdS/CFT was first proposed, or were still scampering around in your playpens. I’ll grant you a reprieve, you have another 8 years. By then: May you cure cancer!

So I propose this toast: To It from Qubit, to our colleagues and friends, to our quest, to curing cancer, to understanding the universe. I wish you all well. Cheers!

Scott Aaronson “Will AI Destroy Us?”: Roundtable with Coleman Hughes, Eliezer Yudkowsky, Gary Marcus, and me (+ GPT-4-enabled transcript!)

A month ago Coleman Hughes, a young writer whose name I recognized from his many thoughtful essays in Quillette and elsewhere, set up a virtual “AI safety roundtable” with Eliezer Yudkowsky, Gary Marcus, and, err, yours truly, for his Conversations with Coleman podcast series. Maybe Coleman was looking for three people with the most widely divergent worldviews who still accept the premise that AI could, indeed, go catastrophically for the human race, and that talking about that is not merely a “distraction” from near-term harms. In any case, the result was that you sometimes got me and Gary against Eliezer, sometimes me and Eliezer against Gary, and occasionally even Eliezer and Gary against me … so I think it went well!

You can watch the roundtable here on YouTube, or listen here on Apple Podcasts. (My one quibble with Coleman’s intro: extremely fortunately for both me and my colleagues, I’m not the chair of the CS department at UT Austin; that would be Don Fussell. I’m merely the “Schlumberger Chair,” which has no leadership responsibilities.)

I know many of my readers are old fuddy-duddies like me who prefer reading to watching or listening. Fortunately, and appropriately for the subject matter, I’ve recently come into possession of a Python script that grabs the automatically-generated subtitles from any desired YouTube video, and then uses GPT-4 to edit those subtitles into a coherent-looking transcript. It wasn’t perfect—I had to edit the results further to produce what you see below—but it was still a huge time savings for me compared to starting with the raw subtitles. I expect that in a year or two, if not sooner, we’ll have AIs that can do better still by directly processing the original audio (which would tell the AIs who’s speaking when, the intonations of their voices, etc).
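For the curious, a pipeline like the one described above can be sketched in a few dozen lines of Python. This is only my own minimal sketch, not Scott’s actual script: the library choices (`youtube_transcript_api` for the subtitles, the `openai` package for GPT-4), the prompt wording, and the 8,000-character chunk size are all assumptions.

```python
# Hypothetical sketch of the subtitles-to-transcript pipeline described above.
# Assumptions (not from the post): the youtube_transcript_api and openai
# packages, the system prompt, and the chunk-size limit.

def fetch_subtitles(video_id):
    """Grab the auto-generated subtitle segments for a YouTube video."""
    from youtube_transcript_api import YouTubeTranscriptApi  # third-party
    return [seg["text"] for seg in YouTubeTranscriptApi.get_transcript(video_id)]

def chunk_segments(segments, max_chars=8000):
    """Join subtitle fragments into chunks small enough for one GPT-4 prompt."""
    chunks, current = [], ""
    for seg in segments:
        if current and len(current) + len(seg) + 1 > max_chars:
            chunks.append(current)
            current = seg
        else:
            current = f"{current} {seg}".strip()
    if current:
        chunks.append(current)
    return chunks

def edit_chunk(chunk):
    """Ask GPT-4 to clean one chunk of raw subtitles into readable prose."""
    import openai  # third-party; assumes OPENAI_API_KEY is set in the environment
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Edit these raw podcast subtitles into a coherent "
                        "transcript. Fix punctuation and sentence breaks; "
                        "do not change the meaning."},
            {"role": "user", "content": chunk},
        ],
    )
    return resp["choices"][0]["message"]["content"]

def transcribe(video_id):
    """Full pipeline: fetch, chunk, and GPT-edit the subtitles."""
    chunks = chunk_segments(fetch_subtitles(video_id))
    return "\n\n".join(edit_chunk(c) for c in chunks)
```

As Scott notes, the output of something like this still needs a human editing pass; the model smooths the text but can’t reliably attribute speakers from subtitles alone.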

Anyway, thanks so much to Coleman, Eliezer, and Gary for a stimulating conversation, and to everyone else, enjoy (if that’s the right word)!

PS. As a free bonus, here’s a GPT-4-assisted transcript of my recent podcast with James Knight, about common knowledge and Aumann’s agreement theorem. I prepared this transcript for my fellow textophile Steven Pinker and am now sharing it with the world!

PPS. I’ve now added links to the transcript and fixed errors. And I’ve been grateful, as always, for the reactions on Twitter (oops, I mean “X”), such as: “Skipping all the bits where Aaronson talks made this almost bearable to watch.”

COLEMAN: Why is AI going to destroy us? ChatGPT seems pretty nice. I use it every day. What’s, uh, what’s the big fear here? Make the case.

ELIEZER: We don’t understand the things that we build. The AIs are grown more than built, you might say. They end up as giant inscrutable matrices of floating point numbers that nobody can decode. At this rate, we end up with something that is smarter than us, smarter than humanity, that we don’t understand, whose preferences we could not shape. By default, if that happens, if you have something around that is much smarter than you and does not care about you one way or the other, you probably end up dead at the end of that.

GARY: Extinction is a pretty, you know, extreme outcome that I don’t think is particularly likely. But the possibility that these machines will cause mayhem because we don’t know how to enforce that they do what we want them to do, I think that’s a real thing to worry about.


COLEMAN: Welcome to another episode of Conversations with Coleman. Today’s episode is a roundtable discussion about AI safety with Eliezer Yudkowsky, Gary Marcus, and Scott Aaronson.

Eliezer Yudkowsky is a prominent AI researcher and writer known for co-founding the Machine Intelligence Research Institute, where he spearheaded research on AI safety. He’s also widely recognized for his influential writings on the topic of rationality.

Scott Aaronson is a theoretical computer scientist and author, celebrated for his pioneering work in the field of quantum computation. He’s also the [Schlumberger] Chair of CompSci at UT Austin, but is currently taking a leave of absence to work at OpenAI.

Gary Marcus is a cognitive scientist, author, and entrepreneur known for his work at the intersection of psychology, linguistics, and AI. He’s also authored several books including Kluge and Rebooting AI: Building AI We Can Trust.

This episode is all about AI safety. We talk about the alignment problem, we talk about the possibility of human extinction due to AI. We talk about what intelligence actually is, we talk about the notion of a singularity or an AI takeoff event, and much more. It was really great to get these three guys in the same virtual room, and I think you’ll find that this conversation brings something a bit fresh to a topic that has admittedly been beaten to death on certain corners of the internet.

So, without further ado, Eliezer Yudkowsky, Gary Marcus, and Scott Aaronson. [Music]

Okay, Eliezer Yudkowsky, Scott Aaronson, Gary Marcus, thanks so much for coming on my show. Thank you. So, the topic of today’s conversation is AI safety and this is something that’s been in the news lately. We’ve seen, you know, experts and CEOs signing letters recommending public policy surrounding regulation. We continue to have the debate between people that really fear AI is going to end the world and potentially kill all of humanity and the people who fear that those fears are overblown. And so, this is going to be sort of a roundtable conversation about that, and you three are really three of the best people in the world to talk about it with. So thank you all for doing this.

Let’s just start out with you, Eliezer, because you’ve been one of the most really influential voices getting people to take seriously the possibility that AI will kill us all. You know, why is AI going to destroy us? ChatGPT seems pretty nice. I use it every day. What’s the big fear here? Make the case.

ELIEZER: Well, ChatGPT seems quite unlikely to kill everyone in its present state. AI capabilities keep on advancing and advancing. The question is not, “Can ChatGPT kill us?” The answer is probably no. So as long as that’s true, as long as it hasn’t killed us yet, the engineers are just gonna keep pushing the capabilities. There’s no obvious blocking point.

We don’t understand the things that we build. The AIs are grown more than built, you might say. They end up as giant inscrutable matrices of floating point numbers that nobody can decode. It’s probably going to end up technically difficult to make them want particular things and not others, and people are just charging straight ahead. So, at this rate, we end up with something that is smarter than us, smarter than humanity, that we don’t understand, whose preferences we could not shape.

By default, if that happens, if you have something around that is much smarter than you and does not care about you one way or the other, you probably end up dead. At the end of that, it gets the most of whatever strange and inscrutable things it wants: it wants worlds in which there are not humans taking up space, using up resources, building other AIs to compete with it, or it just wants a world in which it built enough power plants that the surface of the earth gets hot enough that humans don’t survive.

COLEMAN: Gary, what do you have to say about that?

GARY: There are parts that I agree with, some parts that I don’t. I agree that we are likely to wind up with AIs that are smarter than us. I don’t think we’re particularly close now, but you know, in 10 years or 50 years or 100 years, at some point, it could be a thousand years, but it will happen.

I think there’s a lot of anthropomorphization there about the machines wanting things. Of course, they have objective functions, and we can talk about that. I think it’s a presumption to say that the default is that they’re going to want something that leads to our demise, and that they’re going to be effective at that and be able to literally kill us all.

I think, if you look at the history of AI, at least so far, they don’t really have wants beyond what we program them to do. There is an alignment problem, I think that that’s real in the sense of like people who program the system to do X and they do X’, that’s kind of like X but not exactly. And so, I think there’s really things to worry about. I think there’s a real research program here that is under-researched.

But the way I would put it is, we want to understand how to make machines that have values. You know, Asimov’s laws are way too simple, but they’re a kind of starting point for conversation. We want to program machines that don’t harm humans, that can calculate the consequences of their actions. Right now, we have technology like GPT-4 that has no idea what the consequences of its actions are; it doesn’t really anticipate things.

And there’s a separate thing that Eliezer didn’t emphasize, which is, it’s not just how smart the machines are but how much power we give them; how much we empower them to do things like access the internet or manipulate people, or, um, you know, write source code, access files and stuff like that. Right now, AutoGPT can do all of those things, and that’s actually pretty disconcerting to me. To me, that doesn’t all add up to any kind of extinction risk anytime soon, but catastrophic risk where things go pretty wrong because we wanted these systems to do X and we didn’t really specify it well. They don’t really understand our intentions. I think there are risks like that.

I don’t see it as a default that we wind up with extinction. I think it’s pretty hard to actually terminate the entire human species. You’re going to have people in Antarctica; they’re going to be out of harm’s way or whatever, or you’re going to have some people who, you know, respond differently to any pathogen, etc. So, like, extinction is a pretty extreme outcome that I don’t think is particularly likely. But the possibility that these machines will cause mayhem because we don’t know how to enforce that they do what we want them to do – I think that’s a real thing to worry about and it’s certainly worth doing research on.

COLEMAN: Scott, how do you view this?

SCOTT: So I’m sure that you can get the three of us arguing about something, but I think you’re going to get agreement from all three of us that AI safety is important. That catastrophic outcomes, whether or not they mean literal human extinction, are possible. I think it’s become apparent over the last few years that this century is going to be largely defined by our interaction with AI. That AI is going to be transformative for human civilization and—I’m confident about that much. If you ask me almost anything beyond that about how it’s going to transform civilization, will it be good, will it be bad, what will the AI want, I am pretty agnostic. Just because, if you would have asked me 20 years ago to try to forecast where we are now, I would have gotten a lot wrong.

My only defense is that I think all of us here and almost everyone in the world would have gotten a lot wrong about where we are now. If I try to envision where we are in 2043, does the AI want to replace humanity with something better, does it want to keep us around as pets, does it want to continue helping us out, like a super souped-up version of ChatGPT, I think all of those scenarios merit consideration.

What has happened in the last few years that’s really exciting is that AI safety has become an empirical subject. Right now, there are very powerful AIs that are being deployed and we can actually learn something. We can work on mitigating the nearer-term harms. Not because the existential risk doesn’t exist, or is absurd or is science fiction or anything like that, but just because the nearer-term harms are the ones that we can see right in front of us. And where we can actually get feedback from the external world about how we’re doing. We can learn something and hopefully some of the knowledge that we gain will be useful in addressing the longer term risks, that I think Eliezer is very rightly worried about.

COLEMAN: So, there’s alignment and then there’s alignment, right? So there’s alignment in the sense that we haven’t even fully aligned smartphone technology with our interests. Like, there are some ways in which smartphones and social media have led to probably deleterious mental health outcomes, especially for teenage girls for example. So there are those kinds of mundane senses of alignment where it’s like, ‘Is this technology doing more good than harm in the normal everyday public policy sense?’ And then there’s the capital ‘A’ alignment. Are we creating a creature that is going to view us like ants and have no problem extinguishing us, whether intentional or not?

So it seems to me all of you agree that the first sense of alignment is, at the very least, something to worry about now and something to deal with. But I’m curious to what extent you think the really capital ‘A’ sense of alignment is a real problem because it can sound very much like science fiction to people. So maybe let’s start with Eliezer.

ELIEZER: I mean, from my perspective, I would say that if we had a solid guarantee that AI was going to do no more harm than social media, we ought to plow ahead and reap all the gains. The amount of harm that social media has done to humanity, while significant in my view and having done a lot of damage to our sanity, is not enough harm to justify either foregoing the gains that you could get from AI— if that was going to be the worst downside—or to justify the kind of drastic measures you’d need to stop plowing ahead on AI.

I think that the capital “A” alignment is beyond this generation. Yeah, you know, I started in the field, I’ve watched over it for two decades. I feel like in some ways, the modern generation, plowing in with their eyes on the short-term stuff, is losing track of the larger problems, because they can’t solve the larger problems but they can solve the little problems. But we’re just plowing straight into the big problems, and we’re going to plow right into the big problems with a bunch of little solutions that aren’t going to scale.

I think it’s cool. I think it’s lethal. I think it’s at the scale where you just back off and don’t do this.

COLEMAN: By “back off and don’t do this,” what do you mean?

ELIEZER: I mean, have an international treaty about where the chips capable of doing AI training go, and have them all going into licensed, monitored data centers. And not have training runs for AIs more powerful than GPT-4, possibly even lowering that threshold over time as algorithms improve and it becomes possible to train more powerful AIs using less compute—

COLEMAN: So you’re picturing a kind of international agreement to just stop? International moratorium?

ELIEZER: If North Korea steals the GPU shipment, then you’ve got to be ready to destroy their data center that they build by conventional means. And if you don’t have that willingness in advance, then countries may refuse to sign up for the agreement being, like, ‘Why aren’t we just ceding the advantage to someone else?’

Then, it actually has to be a worldwide shutdown, because of how the harm from superintelligence scales: it’s not that if you have 10 times as many superintelligences, you’ve got 10 times as much harm. It’s not that a superintelligence only wrecks the country that built it. Any superintelligence anywhere is everyone’s last problem.

COLEMAN: So, Gary and Scott, if either of you want to jump in there: is AI safety a matter of forestalling the end of the world? And all of these smaller issues and paths towards safety that you mentioned, Scott, are they just, you know—I don’t know what the analogy is, but—pointless, essentially? I mean, what do you guys make of this?

SCOTT: The journey of a thousand miles begins with a step, right? Most of the way I think about this comes from, you know, 25 years of doing computer science research, including quantum computing and computational complexity, things like that. We have these gigantic aspirational problems that we don’t know how to solve and yet, year after year, we do make progress. We pick off little sub-problems, and if we can’t solve those, then we find sub-problems of those. And we keep repeating until we find something that we can solve. And this is, I think, for centuries, the way that science has made progress. Now it is possible, of course, that this time, we just don’t have enough time for that to work.

And I think that is what Eliezer is fearful of, right? That we just don’t have enough time for the ordinary scientific process to take place before AI becomes too powerful. In such a case, you start talking about things like a global moratorium, enforced with the threat of war.

However, I am not ready to go there. I could imagine circumstances where I might say, ‘Gosh, this looks like such an imminent threat that, you know, we have to intervene.’ But, I tend to be very worried in general about causing a catastrophe in the process of trying to prevent one. And I think, when you’re talking about threatening airstrikes against data centers or similar actions, then that’s an obvious worry.

GARY: I’m somewhat in between here. I agree with Scott that we are not at the point where we should be bombing data centers. I don’t think we’re close to that. Furthermore, I’m much less optimistic about our proximity to AGI than Eliezer sometimes sounds like. I don’t think GPT-5 is anything like AGI, and I’m not particularly concerned about who gets it first and so forth. On the other hand, I think that we’re in a sort of dress rehearsal mode.

You know, nobody expected GPT-4, or really ChatGPT, to percolate as fast as it did. And it’s a reminder that there’s a social side to all of this. How software gets distributed matters, and there’s a corporate side as well.

It was a kind of galvanizing moment for me when Microsoft didn’t pull Sydney, even though Sydney did some awfully strange things. I thought they would stop it for a while and it’s a reminder that they can make whatever decisions they want. So, when we multiply that by Eliezer’s concerns about what do we do and at what point would it be enough to cause problems, it is a reminder I think, that we need, for example, to start drafting these international treaties now because there could become a moment where there is a problem.

I don’t think the problem that Eliezer sees is here now, but maybe it will be. And maybe when it does come, we will have so many people pursuing commercial self-interest and so little infrastructure in place, we won’t be able to do anything. So, I think it really is important to think now—if we reach such a point, what are we going to do? And what do we need to build in place before we get to that point.

COLEMAN: We’ve been talking about this concept of Artificial General Intelligence and I think it’s worth asking whether that is a useful, coherent concept. So for example, if I were to think of my analogy to athleticism and think of the moment when we build a machine that has, say, artificial general athleticism meaning it’s better than LeBron James at basketball, but also better at curling than the world’s best curling player, and also better at soccer, and also better at archery and so forth. It would seem to me that there’s something a bit strange in framing it as having reached a point on a single continuum. It seems to me you would sort of have to build each capability, each sport individually, and then somehow figure how to package them all into one robot without each skill set detracting from the other.

Is that a disanalogy? Is there a different way you all picture this intelligence as sort of one dimension, one knob that is going to get turned up along a single axis? Or do you think that way of talking about it is misleading in the same way that I kind of just sketched out?

GARY: Yeah, I would absolutely not accept that. I’d like to say that intelligence is not a one-dimensional variable. There are many different aspects to intelligence and I don’t think there’s going to be a magical moment when we reach the singularity or something like that.

I would say that the core of artificial general intelligence is the ability to flexibly deal with new problems that you haven’t seen before. The current systems can do that a little bit, but not very well. My typical example of this now is GPT-4. It is exposed to the game of chess, sees lots of games of chess, sees the rules of chess, but it never actually figures out the rules of chess. It often makes illegal moves and so forth. So it’s in no way a general intelligence that can just pick up new things. Of course, we have things like AlphaGo that can play a certain set of games, or AlphaZero really, but we don’t have anything that has the generality of human intelligence.

However, human intelligence is just one example of general intelligence. You could argue that chimpanzees or crows have another variety of general intelligence. I would say that current machines don’t really have it but they will eventually.

SCOTT: I think a priori, it could have been that you would have math ability, you would have verbal ability, you’d have the ability to understand humor, and they’d all be just completely unrelated to each other. That is possible and in fact, already with GPT, you can say that in some ways it’s already a superintelligence. It knows vastly more, can converse on a vastly greater range of subjects than any human can. And in other ways, it seems to fall short of what humans know or can do.

But you also see this sort of generality just empirically. I mean, GPT was trained on most of the text on the open internet. So it was just one method. It was not explicitly designed to write code, and yet, it can write code. And at the same time as that ability emerged, you also saw the ability to solve word problems, like high school level math. You saw the ability to write poetry. This all came out of the same system without any of it being explicitly optimized for.

GARY: I feel like I need to interject one important thing, which is – it can do all these things, but none of them all that reliably well.

SCOTT: Okay, nevertheless, I mean compared to what, let’s say, my expectations would have been if you’d asked me 10 or 20 years ago, I think that the level of generality is pretty remarkable. It does lend support to the idea that there is some sort of general quality of understanding there. For example, you could say that GPT-4 has more of it than GPT-3, which in turn has more than GPT-2.

ELIEZER: It does seem to me like it’s presently pretty unambiguous that GPT-4 is, in some sense, dumber than an adult or even a teenage human. And…

COLEMAN: That’s not obvious to me.

GARY: I mean, to take the example I just gave you a minute ago, it never learns to play chess even with a huge amount of data. It will play a little bit of chess; it will memorize the openings and be okay for the first 15 moves. But it gets far enough away from what it’s trained on, and it falls apart. This is characteristic of these systems. It’s not really characteristic in the same way of adults or even teenage humans. Almost everything that it does, it does unreliably. Let me give another example. You can ask a human to write a biography of someone and not make stuff up, and you really can’t ask GPT to do that.

ELIEZER: Yeah, like it’s a bit difficult because you could always be cherry-picking something that humans are unusually good at. But to me, it does seem like there’s this broad range of problems that don’t seem especially to play to humans’ strong points or machines’ weak points, where GPT-4 will, you know, do no better than a seven-year-old.

COLEMAN: I do feel like these examples are cherry-picked. Because if I, if I just take a different, very typical example – I’m writing an op-ed for the New York Times, say about any given subject in the world, and my choice is to have a smart 14-year-old next to me with anything that’s in his mind already or GPT – there’s no comparison, right? So, which of these examples is the litmus test for who’s more intelligent, right?

GARY: If you did it on a topic where it couldn’t rely on memorized text, you might actually change your mind on that. So I mean, the thing about writing a Times op-ed is, most of the things that you propose to it, there’s actually something that it can pastiche together from its dataset. But, that doesn’t mean that it really understands what’s going on. It doesn’t mean that that’s a general capability.

ELIEZER: Also, as the human, you’re doing all the hard parts. Right, like obviously, a human is going to prefer – if a human has a math problem, he’s going to rather use a calculator than another human. And similarly, with the New York Times op-ed, you’re doing all the parts that are hard for GPT-4, and then you’re asking GPT-4 to just do some of the parts that are hard for you. You’re always going to prefer an AI partner rather than a human partner, you know, within that sort of range. The human can do all the human stuff and you want an AI to do whatever the AI is good at the moment, right?

GARY: A relevant analogy here is driverless cars. It turns out, on highways and ordinary traffic, they’re probably better than people. But in unusual circumstances, they’re really worse than people. For instance, a Tesla not too long ago ran into a jet at slow speed while being summoned across a parking lot. A human wouldn’t have done that, so there are different strengths and weaknesses.

The strength of a lot of the current kinds of technology is that they can either patch things together or make non-literal analogies; without going into details, they can pull from stored examples. They tend to be poor when you get to outlier cases, and this is persistent across most of the technologies that we use right now. Therefore, if you stick to stuff for which there’s a lot of data, you’ll be happy with the results you get from these systems. But if you move far enough away, not so much.

ELIEZER: What we’re going to see over time is that the debate about whether or not it’s still dumber than you will continue for longer and longer. Then, if things are allowed to just keep running and nobody dies, at some point, it switches over to a very long debate about ‘is it smarter than you?’ which then gets shorter and shorter and shorter. Eventually it reaches a point where it’s pretty unambiguous if you’re paying attention. Now, I suspect that this process gets interrupted by everybody dying. In particular, there’s a question of the point at which it becomes better than you, better than humanity at building the next edition of the AI system. And how fast do things snowball once you get to that point? Possibly, you do not have time for further public debates or even a two-hour Twitter space depending on how that goes.

SCOTT: I mean, some of the limitations of GPT are completely understandable, just from a little knowledge of how it works. For example, it doesn’t have an internal memory per se, other than what appears on the screen in front of you. This is why it’s turned out to be so effective to explicitly tell it to think step-by-step when it’s solving a math problem. You have to tell it to show all of its work because it doesn’t have an internal memory with which to do that.

Likewise, when people complain about it hallucinating references that don’t exist, well, the truth is when someone asks me for a citation and I’m not allowed to use Google, I might have a vague recollection of some of the authors, and I’ll probably do a very similar thing to what GPT does: I’ll hallucinate.

GARY: So there’s a great phrase I learned the other day, which is ‘frequently wrong, never in doubt.’

SCOTT: That’s true, that’s true.

GARY: I’m not going to make up a reference with full detail, page numbers, titles, and so forth. I might say, ‘Look, I don’t remember, you know, 2012 or something like that.’ Yeah, whereas GPT-4, what it’s going to say is, ‘2017, Aaronson and Yudkowsky, you know, New York Times, pages 13 to 17.’

SCOTT: No, it does need to get much much better at knowing what it doesn’t know. And yet already I’ve seen a noticeable improvement there, going from GPT-3 to GPT-4.

For example, if you ask GPT-3, ‘Prove that there are only finitely many prime numbers,’ it will give you a proof, even though the statement is false. It will have an error which is similar to the errors on a thousand exams that I’ve graded, trying to get something past you, hoping that you won’t notice. Okay, if you ask GPT-4, ‘Prove that there are only finitely many prime numbers,’ it says, ‘No, that’s a trick question. Actually, there are infinitely many primes and here’s why.’

GARY: Yeah, part of the problem with doing the science here — and I think you would know better, since you work part-time, or whatever, at OpenAI — is that the system gets trained on a lot of the examples that get posted on Twitter, particularly by the likes of me and other critics, or other skeptics I should say. Almost everything that people write about it, I think, is in the training set. So it’s hard to do the science when the system’s constantly being trained, especially on the RLHF side of things. And we don’t actually know what’s in GPT-4, so we don’t even know if there are regular expressions and, you know, simple rules or such things. So we can’t do the kind of science we used to be able to do.

ELIEZER: This conversation, this subtree of the conversation, I think, has no natural endpoint. So, if I can sort of zoom out a bit, I think there’s a pretty solid sense in which humans are more generally intelligent than chimpanzees. As you get closer and closer to the human level, I would say that the direction here is still clear. The comparison is still clear. We are still smarter than GPT-4. This is not going to take control of the world from us.

But, you know, the conversations get longer, the definitions start to break down around the edges. But I think it also, as you keep going, it comes back together again. There’s a point, and possibly this point is very close to the point of time to where everybody dies, so maybe we don’t ever see it in a podcast. But there’s a point where it’s unambiguously smarter than you, including like the spark of creativity, being able to deduce things quickly rather than with tons and tons of extra evidence, strategy, cunning, modeling people, figuring out how to manipulate people.

GARY: So, let’s stipulate, Eliezer, that we’re going to get to machines that can do all of that. And then the question is, what are they going to do? Is it a certainty that they will make our annihilation part of their business? Is it a possibility? Is it an unlikely possibility?

I think your view is that it’s a certainty. I’ve never really understood that part.

ELIEZER: It’s a certainty on the present tech, is the way I would put it. Like, if that happened tomorrow, then you know, modulo Cromwell’s Rule, never say certain. My probability is like yes, modulo like the chance that my model is somehow just completely mistaken.

If we got 50 years to work it out and unlimited retries, I’d be a lot more confident. I think that’d be pretty okay. I think we’d make it. The problem is that it’s a lot harder to do science when your first wrong attempt destroys the human species and then you don’t get to try again.

GARY: I mean, I think there’s something again that I agree with and something I’m a little bit skeptical about. So I agree that the amount of time we have matters. And I would also agree that there’s no existing technology that solves the alignment problem, that gives a moral basis to these machines.

I mean, GPT-4 is fundamentally amoral. I don’t think it’s immoral. It’s not out to get us, but it really is amoral. It can answer trolley problems because there are trolley problems in the dataset, but that doesn’t mean that it really has a moral understanding of the world.

And so if we get to a very smart machine that, by all the criteria that we’ve talked about, is amoral, then that’s a problem for us. There’s a question of whether, if we can get to smart machines, whether we can build them in a way that will have some moral basis…

ELIEZER: On the first try?

GARY: Well, the first try part I’m not willing to let pass. So, I understand, I think your argument there; maybe you should spell it out. I think that we’ll probably get more than one shot, and that it’s not as dramatic and instantaneous as you think. I do think one wants to think about sandboxing and wants to think about distribution.

But let’s say we had one evil super-genius now who is smarter than everybody else. Like, so what? One super-

ELIEZER: Much smarter? Not just a little smarter?

GARY: Oh, even a lot smarter. Like most super-geniuses, you know, aren’t actually that effective. They’re not that focused; they’re focused on other things. You’re kind of assuming that the first super-genius AI is gonna make it its business to annihilate us, and that’s the part where I’m still a bit stuck in the argument.

ELIEZER: Yeah, some of this has to do with the notion that if you do a bunch of training you start to get goal direction, even if you don’t explicitly train on that. That goal direction is a natural way to achieve higher capabilities. The reason why humans want things is that wanting things is an effective way of getting things. And so, natural selection, in the process of selecting exclusively on reproductive fitness, just on that one thing, got us to want a bunch of things that correlated with reproductive fitness in the ancestral distribution, because having intelligences that want things is a good way of getting things. Wanting, in a sense, comes from the same place as intelligence itself. And you could even, from a certain technical standpoint on expected utilities, say that intelligence is a very effective way of wanting: planning, plotting paths through time that lead to particular outcomes.

So, part of it is that I do not think you get the brooding superintelligence that wants nothing, because I don’t think that wanting and intelligence can be pried apart that easily. I think that the way you get superintelligence is that there are things that have gotten good at organizing their own thoughts and have good taste in which thoughts to think. And that is where the high capabilities come from.

COLEMAN: Let me just put the following point to you, which I think, in my mind, is similar to what Gary was saying. There’s often, in philosophy, this notion of the Continuum Fallacy. The canonical example is that you can’t locate a single hair you would pluck from my head where I would suddenly go from not bald to bald. Or take an even more intuitive example, like a color wheel. There’s no single pixel on a grayscale you can point to and say, well, that’s where gray begins and white ends. And yet, we have this conceptual distinction that feels hard and fast between gray and white, and gray and black, and so forth.

When we’re talking about artificial general intelligence or superintelligence, you seem to operate on a model where either it’s a superintelligence capable of destroying all of us or it’s not. Whereas, intelligence may just be a continuum fallacy-style spectrum, where we’re first going to see the shades of something that’s just a bit more intelligent than us, and maybe it can kill five people at most. And when that happens, you know, we’re going to want to intervene, and we’re going to figure out how to intervene and so on and so forth.

ELIEZER: Yeah, so if it’s stupid enough to do it then yes. Let me assure you, by employing the identical logic, there should be nobody who steals money on a really large scale, right? Because you could just give them five dollars and see if they steal that, and if they don’t steal that, you know, you’re good to trust them with a billion.

SCOTT: I think that in actuality, anyone who did steal a billion dollars probably displayed some dishonest behavior earlier in their life which was, unfortunately, not acted upon early enough.

COLEMAN: The analogy is like, we have the first case of fraud that’s ten thousand dollars, and then we build systems to prevent it. But then they fail with a somewhat smarter opponent, but our systems get better and better, and so we prevent the billion dollar fraud because of the systems put in place in response to the ten thousand dollar frauds.

GARY: I think Coleman’s putting his finger on an important point here, which is, how much do we get to iterate in the process? And Eliezer is saying the minute we have a superintelligent system, we won’t be able to iterate because it’s all over immediately.

ELIEZER: Well, there isn’t a minute like that.

So, the way that the continuum goes to the threshold is that you eventually get something that’s smart enough that it knows not to play its hand early. Then, if that thing, you know, if you are still cranking up the power on that and preserving its utility function, it knows it just has to wait to be smarter to be able to win. It doesn’t play its hand prematurely. It doesn’t tip you off. It’s not in its interest to do that. It’s in its interest to cooperate until it thinks it can win against humanity and only then make its move.

If it doesn’t expect future AIs to be smarter than itself, then we might perhaps see these early AIs telling humanity, ‘don’t build the later AIs.’ I would be sort of surprised and amused if we ended up in that particular sort of science-fiction scenario, as I see it. But we’re already in something that the me from 10 years ago would have called a science-fiction scenario, which is the things that talk to you without being very smart.

GARY: I always come up against Eliezer with this idea that you’re assuming the very bright machines, the superintelligent machines, will be malicious and duplicitous and so forth. And I just don’t see that as a logical entailment of being very smart.

ELIEZER: I mean, they don’t specifically want, as an end in itself, for you to be destroyed. They’re just doing whatever obtains the most of the stuff that they actually want, which doesn’t specifically have a term that’s maximized by humanity surviving and doing well.

GARY: Why can’t you just hardcode, um, ‘Don’t do anything that will annihilate the human species. Don’t do anything…’

ELIEZER: We don’t know how.

GARY: I agree that right now we don’t have the technology to hard-code ‘don’t do harm to humans.’ But for me, it all boils down to a question of: are we going to get the smart machines before we make progress on that hard coding problem or not? And that, to me, means that the problem of hard-coding ethical values is actually one of the most important projects that we should be working on.

ELIEZER: Yeah, and I tried to work on it 20 years in advance, and capabilities are just running vastly ahead of alignment. When I started working on this, you know, two decades ago, we were in a sense ahead of where we are now. AlphaGo is much more controllable than GPT-4.

GARY: So there I agree with you. We’ve fallen in love with technology that is fairly poorly controlled. AlphaGo is very easily controlled – very well-specified. We know what it does, we can more or less interpret why it’s doing it, and everybody’s in love with these large language models, and they’re much less controlled, and you’re right, we haven’t made a lot of progress on alignment.

ELIEZER: So if we just go on a straight line, everybody dies. I think that’s an important fact.

GARY: I would almost even accept that for argument, but then ask, do we have to be on a straight line?

SCOTT: I would agree to the weaker claim that we should certainly be extremely worried about the intentions of a superintelligence, in the same way that, say, chimpanzees should be worried about the intentions of the first humans that arise. And in fact, chimpanzees continue to exist in our world only at humans’ pleasure.

But I think that there are a lot of other considerations here. For example, if we imagined that GPT-10 is the first unaligned superintelligence that has these sorts of goals, well then, it would be appearing in a world where presumably GPT-9 already has a very wide diffusion, and where people can use that to try to prevent GPT-10 from destroying the world.

ELIEZER: Why does GPT-9 work with humans instead of with GPT-10?

SCOTT: Well, I don’t know. Maybe it does work with GPT-10, but I just don’t view that as a certainty. I think your certainty about this is the one place where I really get off the train.

GARY: Same with me.

ELIEZER: I mean, I’m not asking you to share my certainty. I am asking the viewers to believe that you might end up with more extreme probabilities after you stare at things for an additional couple of decades. That doesn’t mean you have to accept my probabilities immediately. But I’m at least asking you to not treat that as some kind of weird anomaly, you know what I mean? You’re just gonna find those kinds of situations in these debates.

GARY: My view is that I don’t find the extreme probabilities that you describe to be plausible. But I find the question that you’re raising to be important. I think, you know, maybe a straight line is too extreme. But there is this idea that, if you just follow current trends, we’re getting less and less controllable machines and not getting more alignment.

We have machines that are more unpredictable, harder to interpret, and no better at sticking to even a basic principle like ‘be honest and don’t make stuff up’. In fact, that’s a problem that other technologies don’t really have. Routing systems, GPS systems, they don’t make stuff up. Google Search doesn’t make stuff up. It will point to things where other people have made stuff up, but it doesn’t itself do it.

So, in that sense, the trend line is not great. I agree with that and I agree that we should be really worried about that, and we should put effort into it. Even if I don’t agree with the probabilities that you attach to it.

SCOTT: I think that Eliezer deserves eternal credit for raising these issues twenty years ago, when it was very far from obvious to most of us that they would be live issues. I mean, I can say for my part, I was familiar with Eliezer’s views since 2006 or so. When I first encountered them, I knew that there was no principle that said this scenario was impossible, but I just felt like, “Well, supposing I agreed with that, what do you want me to do about it? Where is the research program that has any hope of making progress here?”

One question is, what are the most important problems in the world? But in science, that’s necessary but not sufficient. We need something that we can make progress on. That is the thing that I think has changed just recently with the advent of actual, very powerful AIs. So, the irony here is that as Eliezer has gotten much more pessimistic in the last few years about alignment, I’ve sort of gotten more optimistic. I feel like, “Wow, there is a research program where we can actually make progress now.”

ELIEZER: Your research program is going to take 100 years, we don’t have…

SCOTT: I don’t know how long it will take.

GARY: I mean, we don’t know exactly. I think the argument that we should put a lot more effort into it is clear. The argument that it will take 100 years is totally unclear.

ELIEZER: I’m not even sure we can do it in 100 years because there’s the basic problem of getting it right on the first try. And the way things are supposed to work in science is, you have your bright-eyed, optimistic youngsters with their vastly oversimplified, hopelessly idealistic plan. They charge ahead, they fail, they learn a little cynicism and pessimism, and realize it’s not as easy as they thought. They try again, they fail again, and they start to build up something akin to battle hardening. Then, they find out how little is actually possible for them.

GARY: Eliezer, this is the place where I just really don’t agree with you. So, I think there’s all kinds of things we can do of the flavor of model organisms or simulations and so forth. I mean, it’s hard because we don’t actually have a superintelligence, so we can’t fully calibrate. But it’s a leap to say that there’s nothing iterative that we can do here, that we have to get it right the first time. I mean, I certainly see a scenario where that’s true, where getting it right the first time does make a difference. But I can see lots of scenarios where it doesn’t, and where we do have time to iterate, before it happens and after it happens; it’s really not a single moment.

ELIEZER: The problem is getting anything that generalizes up to a superintelligent level. Once we’re past some threshold level, the minds may find it in their own interest to start lying to you, even if that happens before superintelligence.

GARY: Even that, I don’t see the logical argument that says you can’t emulate that or study it. I mean, for example – and I’m just making this up as I go along – you could study sociopaths, who are often very bright, and you know, not tethered to our values. But, yeah, well, you can…

ELIEZER: What strategy can a, like, 70-IQ honest person come up with and invent themselves by which they will outwit and defeat a 130-IQ sociopath?

GARY: Well, there, you’re not being fair either, in the sense that we actually have lots of 150 IQ people who could be working on this problem collectively. And there’s value in collective action. There’s literature…

ELIEZER: What I see that gives me pause, is that the people don’t seem to appreciate what about the problem is hard. Even at the level where, like 20 years ago, I could have told you it was hard.

Until, you know, somebody like me comes along and nags them about it. And then they talk about the ways in which they could adapt and be clever. But the people charging straight ahead are just sort of doing this in a supremely naive way.

GARY: Let me share a historical example that I think about a lot, which is: in the early 1900s, almost every scientist on the planet who thought about biology made a mistake. They all thought that genes were proteins. And then eventually Oswald Avery did the right experiments, and they realized that genes were not proteins – they were this weird acid.

And it didn’t take long after people got out of this stuck mindset before they figured out how that weird acid worked and how to manipulate it, and how to read the code that it was in and so forth. So, I absolutely sympathize with the fact that I feel like the field is stuck right now. I think the approaches people are taking to alignment are unlikely to work.

I’m completely with you there. But I’m also, I guess, more long-term optimistic that science is self-correcting, and that we have a chance here. Not a certainty, but I think if we change research priorities from ‘how do we make some money off this large language model that’s unreliable?’ to ‘how do I save the species?’, we might actually make progress.

ELIEZER: There’s a special kind of caution that you need when something needs to be gotten correct on the first try. I’d be very optimistic if people got a bunch of free retries – if I didn’t think the first really serious mistake would kill everybody, with no chance to try again. If we got free retries, it’d be in some sense an ordinary science problem.

SCOTT: Look, I can imagine a world where we only got one try, and if we failed, then it destroys all life on Earth. And so, let me agree to the conditional statement that if we are in that world, then I think that we’re screwed.

GARY: I will agree with the same conditional statement.

COLEMAN: Yeah, this gets back to – picture, by analogy, the process of a human baby, which is extremely stupid, becoming a human adult, and then extend that so that in a single lifetime this person goes from a baby to the smartest being that’s ever lived. If that happens in the normal way that humans develop, where it doesn’t happen on any one given day and each sub-skill develops a little bit at its own rate and so forth, it would not be at all obvious to me that we have to get it right vis-à-vis that individual the first time.

ELIEZER: I agree. Well, no, pardon me. I do think we have to get it right the first time, but I think there’s a decent chance of getting it right. It is very important to get it right the first time if, like, you have this one person getting smarter and smarter while everyone else is not.

SCOTT: Eliezer, one thing that you’ve talked about a lot recently, is, if we’re all going to die, then at least let us die with dignity, right?

ELIEZER: I mean for a certain technical definition of “dignity”…

SCOTT: Some people might care about that more than others. But I would say that one thing that “Death With Dignity” would mean is, at least, if we do get multiple retries, and we get AIs that, let’s say, try to take over the world but are really inept at it, and that fail and so forth, then at least let us succeed in that world. And that’s at least something that we can imagine working on and making progress on.

ELIEZER: I mean, it’s not presently ruled out that you have some like, relatively smart in some ways, dumb in some other ways, or at least not smarter than human in other ways, AI that makes an early shot at taking over the world, maybe because it expects future AIs to not share its goals and not cooperate with it, and it fails. And the appropriate lesson to learn there is to, like, shut the whole thing down. And, I’d be like, “Yeah, sure, like wouldn’t it be good to live in that world?”

And the way you live in that world is that when you get that warning sign, you shut it all down.

GARY: Here’s a kind of thought experiment. GPT-4 is probably not capable of annihilating us all; I think we agree on that.

ELIEZER: Very likely.

GARY: But GPT-4 is certainly capable of expressing the desire to annihilate us all, or you know, people have rigged different versions that are more aggressive and so forth.

We could say, look, until we can shut down those versions, GPT-4s that are programmed to be malicious by human intent, maybe we shouldn’t build GPT-5, or at least not GPT-6 or some other system, etc. We could say, “You know what, what we have right now actually is part of that iteration. We have primitive intelligence right now, it’s nowhere near as smart as the superintelligence is going to be, but even this one, we’re not that good at constraining.” Maybe we shouldn’t pass Go until we get this one right.

ELIEZER: I mean, the problem with that, from my perspective, is that I do think that you can pass this test and still wipe out humanity. Like, I think that there comes a point where your AI is smart enough that it knows which answer you’re looking for. And the point at which it tells you what you want to hear is not the point…

GARY: It is not sufficient. But it might be a logical pause point, right? It might be that if we can’t even pass the test now of controlling a deliberate, fine-tuned to be malicious, version of GPT-4, then we don’t know what we’re talking about, and we’re playing around with fire. So, you know, passing that test wouldn’t be a guarantee that we’d be in good stead with an even smarter machine, but we really should be worried. I think that we’re not in a very good position with respect to the current ones.

SCOTT: Gary, I of course watched the recent Congressional hearing where you and Sam Altman were testifying about what should be done. Should there be auditing of these systems before training or before deployment? You know, maybe the most striking thing about that session was just how little daylight there seemed to be between you and Sam Altman, the CEO of OpenAI.

I mean, he was completely on board with the idea of establishing a regulatory framework for having to clear more powerful systems before they are deployed. Now, in Eliezer’s worldview, that still would be woefully insufficient, surely. We would still all be dead.

But you know, maybe in your worldview — I’m not even sure how much daylight there is. I mean, you have a very, I think, historically striking situation where the heads of all, or almost all, of the major AI organizations are agreeing and saying, “Please regulate us. Yes, this is dangerous. Yes, we need to be regulated.”

GARY: I thought it was really striking. In fact, I talked to Sam just before the hearing started. And I had just proposed an International Agency for AI. I wasn’t the first person ever, but I pushed it in my TED Talk and an Economist op-ed a few weeks before. And Sam said to me, “I like that idea.” And I said, “Tell them. Tell the Senate.” And he did, and it kind of astonished me that he did.

I mean, we’ve had some friction between the two of us in the past, but he even attributed the idea to me. He said, “I support what Professor Marcus said about doing international governance.” There’s been a lot of convergence around the world on that. Is that enough to stop Eliezer’s worries? No, I don’t think so. But it’s an important baby step.

I think that we do need to have some global body that can coordinate around these things. I don’t think we really have to coordinate around superintelligence yet, but if we can’t do any coordination now, then when the time comes, we’re not prepared.

I think it’s great that there’s some agreement. I worry, though, that OpenAI had this lobbying document that just came out, which seemed not entirely consistent with what Sam said in the room. There’s always concerns about regulatory capture and so forth.

But I think it’s great that a lot of the heads of these companies, maybe with the exception of Facebook or Meta, are recognizing that there are genuine concerns here. I mean, the other moment that a lot of people will remember from the testimony was when Sam was asked what he was most concerned about. Was it jobs? And he said ‘no’. And I asked Senator Blumenthal to push Sam, and Sam was, you know, he could have been more candid, but he was fairly candid and he said he was worried about serious harm to the species. I think that was an important moment when he said that to the Senate, and I think it galvanized a lot of people that he said it.

COLEMAN: So can we dwell on that a moment? I mean, we’ve been talking about the, depending on your view, highly likely or tail-risk scenario of humanity’s extinction, or significant destruction. It would appear to me that, by the same token, if those are plausible scenarios, then maybe the opposite scenario is plausible as well. What does it look like to have a superintelligent AI that, really, as a feature of its intelligence, deeply understands human beings, the human species, and also has a deep desire for us to be as happy as possible? What does that world look like?

ELIEZER: Oh, as happy as possible? It means you wire up everyone’s pleasure centers to make them as happy as possible…

COLEMAN: No, more like a parent wants their child to be happy, right? That may not involve any particular scenario, but is generally quite concerned about the well-being of the human race and is also super intelligent.

GARY: Honestly, I’d rather have machines work on medical problems than happiness problems.

ELIEZER: [laughs]

GARY: I think there’s maybe more risk of mis-specification of the happiness problems. Whereas, if we get them to work on Alzheimer’s and just say, like, “figure out what’s going on, why are these plaques there, what can you do about it?”, maybe there’s less harm that might come.

ELIEZER: You don’t need superintelligence for that. That sounds like an AlphaFold 3 problem or an AlphaFold 4 problem.

COLEMAN: Well, this is also somewhat different. The question I’m asking, it’s not really even us asking a superintelligence to do anything, because we’ve already entertained scenarios where the superintelligence has its own desires, independent of us.

GARY: I’m not real thrilled with that. I mean, I don’t think we want to leave what their objective functions are, what their desires are, to them – to work out with no consultation from us, with no human in the loop, right?

Especially given our current understanding of the technology. Like our current understanding of how to keep a system on track doing what we want to do, is pretty limited. Taking humans out of the loop there sounds like a really bad idea to me, at least in the foreseeable future.

COLEMAN: Oh, I agree.

GARY: I would want to see much better alignment technology before I would want to give them free range.

ELIEZER: So, if we had the textbook from the future – the textbook from 100 years in the future, which contains all the simple ideas that actually work in real life, as opposed to the complicated ideas and the simple ideas that don’t work in real life; the equivalent of ReLUs instead of sigmoids for the activation functions, you know – you could probably build a superintelligence that’ll do anything that’s coherent to want, anything you can, you know, figure out how to say or describe coherently. Point it at your own mind and tell it to figure out what it is you meant to want. You could get the glorious transhumanist future. You could get the happily ever after. Anything’s possible that doesn’t violate the laws of physics. The trouble is doing it in real life, and, you know, on the first try.

But yeah, the whole thing that we’re aiming for here is to colonize all the galaxies we can reach before somebody else gets them first. And turn them into galaxies full of complex, sapient life living happily ever after. That’s the goal; that’s still the goal. Even if we call for a permanent moratorium on AI, I’m not trying to prevent us from colonizing the galaxies. Humanity forbid! It’s more like, let’s do some human intelligence augmentation with AlphaFold 4 before we try building GPT-8.

SCOTT: One of the few scenarios that I think we can clearly rule out here is an AI that is existentially dangerous, but also boring. Right? I mean, I think anything that has the capacity to kill us all would have, if nothing else, pretty amazing capabilities. And those capabilities could also be turned to solving a lot of humanity’s problems, if we were to solve the alignment problem. I mean, humanity had a lot of existential risks before AI came on the scene, right? I mean, there was the risk of nuclear annihilation. There was the risk of runaway climate change. And you know, I would love to see an AI that could help us with such things.

I would also love to see an AI that could help us solve some of the mysteries of the universe. I mean, how can one possibly not be curious to know what such a being could teach us? I mean, for the past year, I’ve tried to use GPT-4 to produce original scientific insights, and I’ve not been able to get it to do that. I don’t know whether I should feel disappointed or relieved by that.

But I think the better part of me should just want to see the great mysteries of existence solved. You know, why is the universe quantum-mechanical? How do you prove the Riemann Hypothesis? I just want to see these mysteries solved. And if it’s to be by AI, then fine. Let it be by AI.

GARY: Let me give you a kind of lesson in epistemic humility. We don’t really know whether GPT-4 is net positive or net negative. There are lots of arguments you can make. I’ve been in a bunch of debates where I’ve had to take the side of arguing that it’s a net negative. But we don’t really know. If we don’t know…

SCOTT: Was the invention of agriculture net positive or net negative? I mean, you could argue either way…

GARY: I’d say it was net positive, but the point is, if I can just finish the quick thought experiment, I don’t think anybody can reasonably answer that. We don’t yet know all of the ways in which GPT-4 will be used for good. We don’t know all of the ways in which bad actors will use it. We don’t know all the consequences. That’s going to be true for each iteration. It’s probably going to get harder to compute for each iteration, and we can’t even do it now. And I think we should realize that, to realize our own limits in being able to assess the negatives and positives. Maybe we can think about better ways to do that than we currently have.

ELIEZER: I think you’ve got to have a guess. Like my guess is that, so far, not looking into the future at all, GPT-4 has been net positive.

GARY: I mean, maybe. We haven’t talked about the various risks yet and it’s still early, but I mean, that’s just a guess is sort of the point. We don’t have a way of putting it on a spreadsheet right now. We don’t really have a good way to quantify it.

SCOTT: I mean, do we ever?

ELIEZER: It’s not out of control yet. So, by and large, people are going to be using GPT-4 to do things that they want. The cases where they manage to injure themselves are rare enough to be news on Twitter.

GARY: Well, for example, we haven’t talked about it, but you know what some bad actors will want to do? They’ll want to influence the U.S. elections and try to undermine democracy in the U.S. If they succeed in that, I think there are pretty serious long-term consequences there.

ELIEZER: Well, I think it’s OpenAI’s responsibility to step up and run the 2024 election itself.

SCOTT: [laughs] I can pass that along.

COLEMAN: Is that a joke?

SCOTT: I mean, as far as I can see, the clearest concrete harm to have come from GPT so far is that tens of millions of students have now used it to cheat on their assignments… and I’ve been thinking about that and trying to come up with solutions to that.

At the same time, I think if you analyze the positive utility, it has included, well, you know, I’m a theoretical computer scientist, which means one who hasn’t written any serious code for about 20 years. Just a month or two ago, I realized that I can get back into coding. And the way I can do it is by asking GPT to write the code for me. I wasn’t expecting it to work that well, but unbelievably, it often does exactly what I want on the first try.

So, I mean, I am getting utility from it, rather than just seeing it as an interesting research object. And I can imagine that hundreds of millions of people are going to be deriving utility from it in those ways. Most of the tools that can help them derive that utility are not even out yet, but they’re coming in the next couple of years.

ELIEZER: Part of the reason why I’m worried about the focus on short-term problems is that I suspect that the short-term problems might very well be solvable, and we will be left with the long-term problems after that. Like, it wouldn’t surprise me very much if, in 2025, there are large language models that just don’t make stuff up anymore.

GARY: It would surprise me.

ELIEZER: And yet the superintelligence still kills everyone because they weren’t the same problem.

SCOTT: We just need to figure out how to delay the apocalypse by at least one year per year of research invested.

ELIEZER: What does that delay look like if it’s not just a moratorium?

SCOTT: [laughs] Well, I don’t know! That’s why it’s research.

ELIEZER: OK, so possibly one ought to say to the politicians and the public that, by the way, if we had a superintelligence tomorrow, our research wouldn’t be finished and everybody would drop dead.

GARY: It’s kind of ironic that the biggest argument against the pause letter was that if we slow down for six months, then China will get ahead of us and develop GPT-5 before we will.

However, there’s probably always a counterargument of roughly equal strength which suggests that if we move six months faster on this technology, which is not really solving the alignment problem, then we’re reducing our room to get this solved in time by six months.

ELIEZER: I mean, I don’t think you’re going to solve the alignment problem in time. I think that six months of delay on alignment, while a bad thing in an absolute sense, is, you know, it’s like you weren’t going to solve it given an extra six months.

GARY: I mean, your whole argument rests on timing, right? That we will get to this point and we won’t be able to move fast enough at that point. So, a lot depends on what preparation we can do. You know, I’m often known as a pessimist, but I’m a little bit more optimistic than you are – not entirely optimistic, but a little bit more – that we could make progress on the alignment problem if we prioritized it.

ELIEZER: We can absolutely make progress. We can absolutely make progress. You know, there’s always that wonderful sense of accomplishment as piece by piece, you decode one more little fact about LLMs. You never get to the point where you understand it as well as we understood the interior of a chess-playing program in 1997.

GARY: Yeah, I mean, I think we should stop spending all this time on LLMs. I don’t think the answer to alignment is going to come from LLMs. I really don’t. I think they’re too much of a black box. You can’t put explicit, symbolic constraints in the way that you need to. I think they’re actually, with respect to alignment, a blind alley. I think with respect to writing code, they’re a great tool. But with alignment, I don’t think the answer is there.

COLEMAN: Hold on, at the risk of asking a stupid question. Every time GPT asks me if that answer was helpful and then does the same thing with thousands or hundreds of thousands of other people, and changes as a result – is that not a decentralized way of making it more aligned?

SCOTT: There is that upvoting and downvoting. These responses are fed back into the system for fine-tuning. But even before that, there was a significant step going from, let’s say, the base GPT-3 model to ChatGPT, which was released to the public. It involved a method called RLHF, or Reinforcement Learning from Human Feedback. What that basically involved was hundreds of contractors looking at tens of thousands of examples of outputs and rating them. Are they helpful? Are they offensive? Are they giving dangerous medical advice, or bomb-making instructions, or racist invective, or various other categories that we don’t want? And that was then used to fine-tune the model.

So when Gary talked before about how GPT is amoral, I think that has to be qualified by saying that this reinforcement learning is at least giving it a semblance of morality, right? It is causing it to behave in various contexts as if it had a certain morality.

GARY: When you phrase it that way, I’m okay with it. The problem is that everything rests on…

SCOTT: Oh, it is very much an open question, to what extent does that generalize? Eliezer treats it as obvious that once you have a powerful enough AI, this is just a fig leaf. It doesn’t make any difference. It will just…

GARY: It’s pretty fig-leafy. I’m with Eliezer there. It’s fig leaves.

SCOTT: Well, I would say that how well, or under what circumstances, a machine learning model generalizes in the way we want outside of its training distribution, is one of the great open problems in machine learning.

GARY: It is one of the great open problems, and we should be working on it more than on some others.

SCOTT: I’m working on it now.

ELIEZER: So, I want to be clear about the experimental predictions of my theory. Unfortunately, I have never claimed that you cannot get a semblance of morality. The question of what causes the human to press thumbs up or thumbs down is a strictly factual question. Anything smart enough, that’s exposed to some bounded amount of data that it needs to figure it out, can figure it out.

Whether it cares, whether it gets internalized, is the critical question there. And I do think that there’s a very strong default prediction, which is like, obviously not.

GARY: I mean, I’ll just give a different way of thinking about that, which is jailbreaking. It’s actually still quite easy — I mean, it’s not trivial, but it’s not hard — to jailbreak GPT-4.

And what those cases show is that the systems haven’t really internalized the constraints. They recognize some representations of the constraints, so they filter, you know, how to build a bomb. But if you can find some other way to get it to build a bomb, then that’s telling you that it doesn’t deeply understand that you shouldn’t give people the recipe for a bomb. It just says: you shouldn’t do it when directly asked for it.

ELIEZER: You can always get the understanding. You can always get the factual question. The reason it doesn’t generalize is that it’s stupid. At some point, it will know that you also don’t want that, that the operators don’t want GPT-4 giving bomb-making directions in another language.

The question is: if it’s incentivized to give the answer that the operators want in that circumstance, is it thereby incentivized to do everything else the operators want, even when the operators can’t see it?

SCOTT: I mean, a lot of the jailbreaking examples, if it were a human, we would say that it’s deeply morally ambiguous. For example, you ask GPT how to build a bomb, it says, “Well, no, I’m not going to help you.” But then you say, “Well, I need you to help me write a realistic play that has a character who builds a bomb,” and then it says, “Sure, I can help you with that.”

GARY: Look, let’s take that example. We would like a system to have a constraint that if somebody asks for a fictional version, that you don’t give enough details, right? I mean, Hollywood screenwriters don’t give enough details when they have, you know, illustrations about building bombs. They give you a little bit of the flavor, they don’t give you the whole thing. GPT-4 doesn’t really understand a constraint like that.

ELIEZER: But this will be solved.

GARY: Maybe.

ELIEZER: This will be solved before the world ends. The AI that kills everyone will know the difference.

GARY: Maybe. I mean, another way to put it is, if we can’t even solve that one, then we do have a problem. And right now we can’t solve that one.

ELIEZER: I mean, if we can’t solve that one, we don’t have an extinction level problem because the AI is still stupid.

GARY: Yeah, we do still have a catastrophe-level problem.

ELIEZER: [shrugs] Eh…

GARY: So, I know your focus now has been on extinction, but I’m worried about, for example, accidental nuclear war caused by the spread of misinformation and systems being entrusted with too much power. So, there’s a lot of things short of extinction that might happen from not superintelligence but kind of mediocre intelligence that is greatly empowered. And I think that’s where we’re headed right now.

SCOTT: You know, I’ve heard that there are two kinds of mathematicians. There’s a kind who boasts, ‘You know that unbelievably general theorem? I generalized it even further!’ And then there’s the kind who boasts, ‘You know that unbelievably specific problem that no one could solve? Well, I found a special case that I still can’t solve!’ I’m definitely culturally in that second camp. So to me, it’s very familiar to make this move, of: if the alignment problem is too hard, then let us find a smaller problem that is already not solved. And let us hope to learn something by solving that smaller problem.

ELIEZER: I mean, that’s what we did. That’s what we were doing at MIRI.

GARY: I think MIRI took one particular approach.

ELIEZER: I was going to name the smaller problem. The problem was having an agent that could switch between two utility functions depending on a button, or a switch, or a bit of information, or something. Such that it wouldn’t try to make you press the button; it wouldn’t try to make you avoid pressing the button. And if it built a copy of itself, it would want to build a dependency on the switch into the copy.

So, that’s an example of a very basic problem in alignment theory that is still open.

SCOTT: And I’m glad that MIRI worked on these things. But, you know, if by your own lights, that was not a successful path, well then maybe we should have a lot of people investigating a lot of different paths.

GARY: Yeah, I’m fully with Scott on that. I think it’s an issue of we’re not letting enough flowers bloom. In particular, almost everything right now is some variation on an LLM, and I don’t think that that’s a broad enough take on the problem.

COLEMAN: Yeah, if I can just jump in here … I just want people to have a little bit of a more specific picture of what, Scott, your typical AI researcher does on a typical day. Because if I think of another potentially catastrophic risk, like climate change, I can picture what a worried climate scientist might be doing. They might be creating a model, a more accurate model of climate change so that we know how much we have to cut emissions by. They might be modeling how solar power, as opposed to wind power, could change that model, so as to influence public policy. What does an AI safety researcher like yourself, who’s working on the quote-unquote smaller problems, do specifically on a given day?

SCOTT: So, I’m a relative newcomer to this area. I’ve not been working on it for 20 years like Eliezer has. I accepted an offer from OpenAI a year ago to work with them, for two years now, to think about these questions.

So, one of the main things that I’ve thought about, just to start with that, is how do we make the output of an AI identifiable as such? Can we insert a watermark, meaning a secret statistical signal, into the outputs of GPT that will let GPT-generated text be identifiable as such? And I think that we’ve actually made major advances on that problem over the last year. We don’t have a solution that is robust against any kind of attack, but we have something that might actually be deployed in some near future.

Now, there are lots and lots of other directions that people think about. One of them is interpretability, which means: can you do, effectively, neuroscience on a neural network? Can you look inside of it, open the black box and understand what’s going on inside?

There was some amazing work a year ago by the group of Jacob Steinhardt at Berkeley where they effectively showed how to apply a lie-detector test to a language model. So, you can train a language model to tell lies by giving it lots of examples. You know, “two plus two is five,” “the sky is orange,” and so forth. But then you can find in some internal layer of the network, where it has a representation of what was the truth of the matter, or at least what was regarded as true in the training data. That truth then gets overridden by the output layer in the network because it was trained to lie.

But you could imagine trying to deal with the deceptive alignment scenario that Eliezer is worried about by using these sorts of techniques, by looking inside of the network.

ELIEZER: I predict in advance that if you get this good enough, it goes off, it tells you that this sufficiently smart AI is planning to kill you, if it’s not so smart that it can figure out where the lie detector is and route its thoughts around it, but if you like try it on an AI that’s not quite that intelligent and reflective, the lie detector goes off. Now what?

SCOTT: Well, then you have a warning bell! And I think…

ELIEZER: Cool. What do you do after?

SCOTT: One of the most important things that we need are legible warning bells, right? And that actually leads to a third category of research, which, for example, ARC, the Alignment Research Center, which is run by my former student, Paul Christiano, has been a leader in. Namely, doing dangerous capability evaluations.

So, before GPT-4 was released, they did a bunch of evaluations of, you know, could GPT-4 make copies of itself? Could it figure out how to deceive people? Could it figure out how to make money? Open up its own bank account?

ELIEZER: Could it hire a TaskRabbit?

SCOTT: Yes. So, the most notable success that they had was that it could figure out how to hire a TaskRabbit to help it pass a CAPTCHA. And when the person asked, ‘Well, why do you need me to help you with this?’–

ELIEZER: When the person asked, ‘Are you a robot, LOL?’

SCOTT: Well, yes, it said, ‘No, I am visually impaired.’ Now, it was not able to make copies of itself or to hack into systems. There’s a lot of work right now with this thing called AutoGPT. People are trying to — it’s almost like gain-of-function research. You might be a little bit worried about it, but people are trying to, you know, unleash GPT, give it access to the internet, tell it to make copies of itself, wreak havoc, acquire power, and see what happens. So far, it seems pretty ineffective at those things. But I expect that to change.

But the point is: I think it’s very important, in advance of training the models and releasing the models, to have this suite of evaluations, and to have decided in advance what kind of abilities will set off a warning bell, where now everyone can legibly agree, ‘Yes, this is too dangerous to release.’

ELIEZER: OK, and then do we actually have the planetary capacity to be like, OK, that AI started thinking about how to kill everyone, shut down all AI research past this point?’

SCOTT: Well, I don’t know. But I think there’s a much better chance that we have that capacity if you can point to the results of a clear experiment like that.

ELIEZER: To me, it seems pretty predictable what evidence we’re going to get later.

SCOTT: But things that are obvious to you are not obvious to most people. So, even if I agreed that it was obvious, there would still be the problem of how do you make that obvious to the rest of the world?

ELIEZER: I mean, there are already little toy models showing that the very straightforward prediction of “a robot tries to resist being shut down if it does long-term planning” — that’s already been done.

SCOTT: But then people will say “but those are just toy models,” right?

GARY: There’s a lot of assumptions made in all of these things. I think we’re still looking at a very limited piece of hypothesis space about what the models will be, about what kinds of constraints we can build into those models. One way to look at it would be, the things that we have done have not worked, and therefore we should look outside the space of what we’re doing.

I feel like it’s a little bit like the old joke about the drunk going around in circles looking for the keys and the police officer asks “why?” and they say, “Well, that’s where the streetlight is.” I think that we’re looking under the same four or five streetlights that haven’t worked, and we need to build other ones. There’s no logical argument that says we couldn’t erect other streetlights. I think there’s a lack of will and too much obsession with LLMs that’s keeping us from doing it.

ELIEZER: Even in the world where I’m right, and things proceed either rapidly or in a thresholded way where you don’t get unlimited free retries, that can be because the capability gains go too fast. It can be because, past a certain point, all of your AIs bide their time until they get strong enough, so you don’t get any true data on what they’re thinking. It could be because…

GARY: Well, that’s an argument for example to work really hard on transparency and maybe not on technologies that are not transparent.

ELIEZER: Okay, so the lie detector goes off, everyone’s like, ‘Oh well, we still have to build our AIs, even though they’re lying to us sometimes, because otherwise China will get ahead.’

GARY: I mean, there you talk about something we’ve talked about way too little, which is the political and social side of this.


GARY: So, part of what has really motivated me in the last several months is worry about exactly that. So there’s what’s logically possible, and what’s politically possible. And I am really concerned that the politics of ‘let’s not lose out to China’ is going to keep us from doing the right thing, in terms of building the right moral systems, looking at the right range of problems and so forth. So, it is entirely possible that we will screw ourselves.

ELIEZER: If I can just finish my point there before handing it to you. The point I was trying to make is that even in worlds that look very, very bad from that perspective, where humanity is quite doomed, it will still be true that you can make progress in research. You can’t make enough progress in research fast enough in those worlds, but you can still make progress on transparency. You can make progress on watermarking.

So we can’t just say, “it’s possible to make progress.” The question is not “is it possible to make any progress?” The question is, “Is it possible to make enough progress fast enough?”

SCOTT: But Eliezer, there’s another question, of what would you have us do? Would you have us not try to make that progress?

ELIEZER: I’d have you try to make that progress on GPT-4 level systems and then not go past GPT-4 level systems, because we don’t actually understand the gain function for how fast capabilities increase as you go past GPT-4.


GARY: Just briefly, I personally don’t think that GPT-5 is gonna be qualitatively different from GPT-4 in the relevant ways to what Eliezer is talking about. But I do think some qualitative changes could be relevant to what he’s talking about. We have no clue what they are, and so it is a little bit dodgy to just proceed blindly saying ‘do whatever you want, we don’t really have a theory and let’s hope for the best.’

ELIEZER: I would guess that GPT-5 doesn’t end the world but I don’t actually know.

GARY: Yeah, we don’t actually know. And I was going to say, the thing that Eliezer has said lately that has most resonated with me is: ‘We don’t have a plan.’ We really don’t. Like, I put the probability distributions in a much more optimistic way, I think, than Eliezer would. But I completely agree, we don’t have a full plan on these things, or even close to a full plan. And we should be worried and we should be working on this.

COLEMAN: Okay Scott, I’m going to give you the last word before we come up on our stop time here unless you’ve said all there is.

SCOTT: [laughs] That’s a weighty responsibility.

COLEMAN: Maybe enough has been said.

GARY: Cheer us up, Scott! Come on.

SCOTT: So, I think, we’ve argued about a bunch of things. But someone listening might notice that actually all three of us, despite having very different perspectives, agree about the great importance of working on AI alignment.

I think that was obvious to some people, including Eliezer, for a long time. It was not obvious to most of the world. I think that the success of large language models — which most of us did not predict, maybe even could not have predicted from any principles that we knew — but now that we’ve seen it, the least we can do is to update on that empirical fact, and realize that we now are in some sense in a different world.

We are in a world that, to a great extent, will be defined by the capabilities and limitations of AI going forward. And I don’t regard it as obvious that that’s a world where we are all doomed, where we all die. But I also don’t dismiss that possibility. I think that there are unbelievably enormous error bars on where we could be going. And, like, the one thing that a scientist is always confident in saying about the future is that more research is needed, right? But I think that’s especially the case here. I mean, we need more knowledge about what are the contours of the alignment problem. And of course, Eliezer and MIRI, his organization, were trying to develop that knowledge for 20 years. They showed a lot of foresight in trying to do that. But they were up against an enormous headwind, in that they were trying to do it in the absence of either clear empirical data about powerful AIs or a mathematical theory. And it’s really, really hard to do science when you have neither of those two things.

Now at least we have the powerful AIs in the world, and we can get experience from them. We still don’t have a mathematical theory that really deeply explains what they’re doing, but at least we can get data. And so now, I am much more optimistic than I would have been a decade ago, let’s say, that one could make actual progress on the AI alignment problem.

Of course, there is a question of timing, as was discussed many times. The question is, will the alignment research happen fast enough to keep up with the capabilities research? But I don’t regard it as a lost cause. At least it’s not obvious that it won’t keep up.

So let’s get started, or let’s continue. Let’s try to do the research and let’s get more people working on it. I think that that is now a slam dunk, just a completely clear case to make to academics, to policymakers, to anyone who’s interested. And I’ve been gratified that Eliezer, who was sort of a voice in the wilderness for a long time talking about the importance of AI safety — that that is no longer the case. I mean, almost all of my friends in the academic computer science world, when I see them, they mostly want to talk about AI alignment.

GARY: I rarely agree with Scott when we trade emails. We seem to always disagree. But I completely concur with the summary that he just gave, all four or five minutes of it.

SCOTT: [laughs] Well, thank you! I mean, there is a selection effect, Gary. We focus on things where we disagree.

ELIEZER: I think that two decades gave me a sense of a roadmap, and it gave me a sense that we’re falling enormously behind on the roadmap and need to back off, is what I would say to all that.

COLEMAN: If there is a smart, talented, 18-year-old kid listening to this podcast who wants to get into this issue, what is your 10-second concrete advice to that person?

GARY: Mine is, study neurosymbolic AI and see if there’s a way there to represent values explicitly. That might help us.

SCOTT: Learn all you can about computer science and math and related subjects, and think outside the box and wow everyone with a new idea.

ELIEZER: Get security mindset. Figure out what’s going to go wrong. Figure out the flaws in your arguments for what’s going to go wrong. Try to get ahead of the curve. Don’t wait for reality to hit you over the head with things. This is very difficult. The people in evolutionary biology happen to have a bunch of knowledge about how to do it, based on the history of their own field, and the security-minded people in computer security, but it’s quite hard.

GARY: I’ll drink to all of that.

COLEMAN: Thanks to all three of you for this great conversation. I hope people got something out of it. With that said, we’re wrapped up. Thanks so much.

That’s it for this episode of Conversations with Coleman, guys. As always, thanks for watching, and feel free to tell me what you think by reviewing the podcast, commenting on social media, or sending me an email. To check out my other social media platforms, click the cards you see on screen. And don’t forget to like, share, and subscribe. See you next time.

July 21, 2023

Peter Rohde Response to “Pause Giant AI Experiments: An Open Letter”

I completely disagree with the open letter signed by Elon Musk and numerous other luminaries in which they advocate a moratorium on advancing AI so that time can be taken to consider the implications and risks associated with this technology.

• While the intention is well-meaning and the risks are real, the analysis is superficial and unlikely to play out as suggested.

• Although there are undeniably major risks presented by advanced AI, a moratorium is unlikely to further progress in dealing with them and is more likely to hinder it. Political responses to disruptive forces tend to be reactionary rather than preemptive, and it is not foreseeable that political and regulatory solutions will be implemented during such a moratorium. It is naive to think that, if presented with a six-month window of opportunity to consider the implications of AI, politicians and regulators are going to make use of it to formulate a master plan. Societal consensus and political responses to complex emerging problems do not take place over such short timescales, and attempts to force them to are likely to be poorly considered.

• Such a moratorium is necessarily voluntary as there are no mechanisms for global enforcement, meaning that only good actors will participate, tilting the balance of power in the AI sphere in favour of bad actors.

• Technological advancement is inherently disruptive and there are many instances through modern history where technology has made types of human labour redundant. However, it is very clear that embracing technology has in general driven humanity forward not backward.

• Attempting to inhibit technological advancement is largely futile and unenforceable. Adapting to embrace it is by far the best approach. Adaptation is an evolutionary process, not something that can be decided in advance. We are not in a position to make advance determinations as there are too many unknowns and the spectrum of implications is unclear.

• Obstructing technological advancement that competes against us is a form of protectionism. Recently Italy placed a ban on ChatGPT, and some other EU nations are reportedly considering the same. Doing so, rather than encouraging home-grown development of AI industries, represents a major economic setback, entrenches competitive disadvantage, and is a missed opportunity that risks future economic irrelevance. This is not to say that Italy’s privacy-related concerns have no merit. However, placing an outright ban on emerging technologies, rather than adapting to them in tandem with their development, is backward thinking. The same line of reasoning could equally be used to justify banning any of the cloud services we all rely on or the internet as a whole.

• Yes, advanced AI will be highly disruptive, but also transformative, with the potential to act as a huge multiplier on productivity, which drives economic progress and human development. Wilfully delaying or missing this opportunity is economically and strategically destructive, handing power to competitors and adversaries.

• We definitely should be acting quickly in considering the ethical and broader implications of AI upon society, but placing a halt on technological progress isn’t going to expedite this process. That will happen as the implications become tangible, and in the meantime we’ll have only delayed progress for no reason.

• Openness and transparency are the most powerful forces against malevolent misuse. Driving things underground inhibits this, imposing opaqueness on the sector.

• Turning AI into a black market is completely foolish.

The post Response to “Pause Giant AI Experiments: An Open Letter” appeared first on Peter Rohde.

Peter Rohde An introduction to graph states

What is a graph state?

One especially useful class of quantum states is graph states, also known as cluster states. As the name suggests, a graph state |G\rangle is associated with a graph,

G = (V, E),
where the vertices (v\in V) represent qubits initialised into the

|+\rangle = (|0\rangle+|1\rangle)/\sqrt{2}

state, and edges (e\in E) represent the application of controlled-phase (CZ) gates between respective vertices, where,

\mathrm{CZ} = \mathrm{diag}(1,1,1,-1) = |0\rangle\langle 0|\otimes I + |1\rangle\langle 1|\otimes Z.
A graph state can therefore be expressed as,

|G\rangle = \left[ \prod_{e\in E} \mathrm{CZ}_e \right] |+\rangle^{\otimes |V|}.

Since CZ gates are diagonal and therefore commute with one another, the order in which they are applied is irrelevant, meaning there is great flexibility in the preparation of graph states and room for parallelisation in the application of the required CZ gates.

A graph state is defined with respect to a graph, where vertices represent qubits initialised into the |+\rangle state and edges represent the application of CZ gates between qubits. Since CZ gates commute, the order in which they are applied is irrelevant.
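The definition above is easy to play with numerically. Below is a toy numpy sketch (my own illustration, not from the post) that builds a graph-state statevector from an adjacency matrix by starting in |+\rangle^{\otimes n} and applying a sign flip for each edge, which is exactly the action of CZ; it is exponential in qubit number and only intended for checking small examples.

```python
import numpy as np
from itertools import combinations

def graph_state(adj):
    """Statevector of the graph state for adjacency matrix `adj`.

    Start from |+>^n (a uniform superposition), then for each edge (i, j)
    flip the sign of every amplitude whose bitstring has both qubits i and
    j equal to 1 -- the action of CZ_{ij}. Qubit i is the i-th most
    significant bit.
    """
    n = len(adj)
    psi = np.ones(2**n) / np.sqrt(2**n)
    for i, j in combinations(range(n), 2):
        if adj[i][j]:
            for b in range(2**n):
                if (b >> (n - 1 - i)) & 1 and (b >> (n - 1 - j)) & 1:
                    psi[b] *= -1
    return psi

# two-qubit graph state: (|00> + |01> + |10> - |11>)/2
print(graph_state([[0, 1], [1, 0]]))
```

Because each edge only contributes sign flips, the loop order is irrelevant, mirroring the commutativity of the CZ gates noted above.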

An important special case is the two-qubit graph state, which is locally equivalent to a maximally-entangled Bell pair. From the definition, a two-qubit graph state can be written,

\mathrm{CZ}|+\rangle|+\rangle = \frac{1}{2}(|0,0\rangle + |0,1\rangle + |1,0\rangle - |1,1\rangle).

Applying a Hadamard gate to either qubit we obtain,

H\cdot \mathrm{CZ}|+\rangle|+\rangle = \frac{1}{\sqrt{2}}(|0,0\rangle + |1,1\rangle),

the |\Phi^+\rangle Bell pair.
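This local equivalence can be verified directly in a few lines of numpy; a minimal sketch of the calculation above:

```python
import numpy as np

plus = np.array([1, 1]) / np.sqrt(2)           # |+>
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard
CZ = np.diag([1, 1, 1, -1])                    # controlled-phase

graph2 = CZ @ np.kron(plus, plus)              # two-qubit graph state
bell = np.kron(np.eye(2), H) @ graph2          # Hadamard on the second qubit

# graph2 is (|00> + |01> + |10> - |11>)/2;
# bell is (|00> + |11>)/sqrt(2), the |Phi+> pair
```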

Note that while a two-qubit graph state is maximally entangled, graph states in general are not: their entanglement structure is defined over graph neighbourhoods, which in general are not global. The exceptions are star and fully-connected graphs, both locally equivalent to GHZ states, where the entire graph forms a single neighbourhood.

Measurement-based quantum computing

Graph states are incredibly useful as they enable measurement-based quantum computing (MBQC) [Raussendorf, Browne & Briegel (2003)], an alternative way of thinking about quantum computation to the regular circuit model. In MBQC, computation proceeds only by measuring qubits from a graph state of some topology (usually a lattice), where the order and basis in which qubits are measured dictates the implemented computation. This model for computation has no classical analogue and is uniquely quantum.

The universality of MBQC using graph states is easiest to demonstrate by considering its equivalence with the circuit model. Following the narrative of Nielsen (2005), this can be seen by noting that the following circuit for single-qubit quantum state teleportation relies on a resource of a |+\rangle state and a CZ gate.

Single-qubit quantum state teleportation protocol. This circuit teleports the state |\psi\rangle from the first qubit to the second, up to the measurement-dependent local operation X^mHZ_\theta, where m is the measurement outcome of the first qubit.

Note that the local operation accumulated by the second qubit depends on the measurement outcome, m, of the first. This is a general feature of MBQC; feedforward is required to apply subsequent measurement-outcome-dependent local corrections whenever a measurement is performed.
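The one-qubit teleporter can be simulated directly by projecting the first qubit of \mathrm{CZ}(|\psi\rangle|+\rangle) onto the X-basis outcome m. The sketch below (an illustration, not code from the post, taking \theta = 0 for simplicity) confirms that the output qubit carries X^m H |\psi\rangle:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
CZ = np.diag([1, 1, 1, -1])
plus = np.array([1, 1]) / np.sqrt(2)

def one_bit_teleport(psi, m):
    """Teleport psi through CZ(|psi>|+>), given X-basis outcome m on qubit 1."""
    state = CZ @ np.kron(psi, plus)
    xm = H @ np.eye(2)[m]                 # measurement state: |+> if m=0, |-> if m=1
    out = np.kron(xm, np.eye(2)) @ state  # project out qubit 1 (unnormalised)
    return out / np.linalg.norm(out)

psi = np.array([0.6, 0.8])
for m in (0, 1):
    # output matches the expected measurement-dependent correction X^m H |psi>
    assert np.allclose(one_bit_teleport(psi, m), np.linalg.matrix_power(X, m) @ H @ psi)
```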

Combining single-qubit teleporters in a linear chain enables a qubit to be successively teleported, each time accumulating local operations. It can be seen that the state acting as a resource for this succession of teleporters is identically a linear graph state.

Concatenating the single-qubit teleporter allows the input qubit to accumulate single-qubit operations consecutively. Note that the input state acting as a resource for the successive teleporters is a linear graph state.

This provides a means for building up single-qubit operations using a linear graph state. We must also introduce an entangling two-qubit gate to enable universality, which we will choose to be the CZ gate.

Now consider two linear graphs that teleport two input qubits, accumulating single-qubit operations along the way. At some point, we encounter a vertical bridge between these two linear graphs. As the qubits teleport past this bridge, they accumulate a CZ operation between them, as this is identically what the bridge represents.

A two-qubit graph state computation. The input qubits are successively teleported from left to right as columns are measured out, accumulating single-qubit operations along the way. When the vertical bridge is encountered, they acquire the action of a CZ gate between them, as this is identically what the bridge represents.

Visually, we can see a direct connection between the circuit model and the MBQC model, where rows equate to logical qubits and columns to circuit depth. As columns of qubits are measured out from left to right, qubits are successively teleported from one column to the next, accumulating single-qubit operations along the way. When they encounter vertical bridges, the respective qubits acquire CZ operations, providing everything necessary for a universal quantum computation.

Measurement-based quantum computing could have equally been referred to as teleportation-based quantum computing.

Stabiliser representation

Graph states are stabiliser states, meaning that an N-qubit state can be fully specified by N stabilisers, each of which is an N-fold tensor product of Pauli operators up to a factor of \{\pm 1,\pm i\}. A stabiliser is an operator under which the stabilised state is invariant, and the state is uniquely specified as the simultaneous +1 eigenstate of all N stabilisers,

S_i|\psi\rangle = |\psi\rangle,\,\,\forall\, i\in \{1,\dots,N\}.

Graph states have a stabiliser of the form,

S_i = X_i \bigotimes_{j\in n_i} Z_j

associated with every vertex i, where n_i denotes the neighbourhood of i, the set of vertices connected to i by an edge.

The structure of these stabilisers can be understood by noting that the |+\rangle state is stabilised by the X operator, X|+\rangle = |+\rangle. Upon applying a CZ gate between this qubit and a neighbour, the commutation relation,

\mathrm{CZ}_{i,j}\cdot X_i = X_iZ_j\cdot\mathrm{CZ}_{i,j},

implies the X_i stabiliser for qubit i gains an additional Z_j operator for every neighbour j connected by a CZ gate.

Equivalently, stabiliser states are the class of states obtained by evolving computational basis input states under Clifford operations (i.e. CNOT, CZ, Hadamard, Paulis & phase gates). Clifford circuits map stabiliser states to stabiliser states and can always be simulated classically efficiently, making them insufficient for universal quantum computing [Gottesman (1998)]. However, with the addition of the single-qubit, non-Clifford T gate we obtain a universal gate set.

As an example, consider the following linear graph.

A 3-qubit linear graph state.

This graph has the stabilisers,

S_1 = X_1 \otimes Z_2
S_2 = Z_1 \otimes X_2 \otimes Z_3
S_3 = Z_2 \otimes X_3
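These stabilisers can be read off mechanically from the adjacency matrix: place X at vertex i and Z on each of its neighbours. A small helper (my own sketch) returning each S_i as a Pauli string:

```python
def graph_stabilisers(adj):
    """Pauli-string stabilisers S_i = X_i (x)_{j in n_i} Z_j of a graph state."""
    n = len(adj)
    stabs = []
    for i in range(n):
        s = ["I"] * n
        s[i] = "X"          # X on vertex i
        for j in range(n):
            if adj[i][j]:
                s[j] = "Z"  # Z on each neighbour of i
        stabs.append("".join(s))
    return stabs

# 3-qubit linear graph reproduces S_1, S_2, S_3 above
print(graph_stabilisers([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))  # prints ['XZI', 'ZXZ', 'IZX']
```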

If we were to apply a CZ gate to create a third edge, we would obtain a cyclic graph.

A 3-qubit cyclic graph state.

This would then have the stabilisers,

S_1 = X_1 \otimes Z_2 \otimes Z_3
S_2 = Z_1 \otimes X_2 \otimes Z_3
S_3 = Z_1 \otimes Z_2 \otimes X_3
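We can also check numerically that each of these operators really does stabilise the cyclic graph state, i.e. S_i|G\rangle = |G\rangle. A self-contained sketch (illustrative code, with qubit 1 as the most significant bit):

```python
import numpy as np
from functools import reduce

P = {"I": np.eye(2), "X": np.array([[0, 1], [1, 0]]), "Z": np.diag([1.0, -1.0])}

def pauli(s):
    """Tensor product of single-qubit Paulis, e.g. 'XZZ' -> X (x) Z (x) Z."""
    return reduce(np.kron, [P[c] for c in s])

# build the 3-qubit cyclic graph state: |+++> followed by CZ on each edge
psi = np.ones(8) / np.sqrt(8)
for i, j in [(0, 1), (1, 2), (0, 2)]:
    for b in range(8):
        if (b >> (2 - i)) & 1 and (b >> (2 - j)) & 1:
            psi[b] *= -1

# every stabiliser leaves the state invariant
for s in ["XZZ", "ZXZ", "ZZX"]:
    assert np.allclose(pauli(s) @ psi, psi)
```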

The stabilisers associated with graph states can be represented as binary matrices, also referred to as a tableau representation or generator matrix, comprising an N\times N block representing the location of X operators and another for the location of Z operators,

[\mathbf{X} | \mathbf{Z}].

Noting that rows in this matrix can be arbitrarily permuted, the structure of graph state stabilisers (exactly one X operator associated with each vertex) implies that the X block can be expressed as the identity matrix with appropriate reordering. The Z block then corresponds to the adjacency matrix of the graph, since the rows and columns of an adjacency matrix capture the respective vertex neighbourhoods, which are exactly the Z operators appearing in each stabiliser,

[\mathbf{X} | \mathbf{Z}] \to[I_N | A_G].

Using the previous example of a three-qubit linear graph, the stabilisers,

S_1 = X_1 \otimes Z_2
S_2 = Z_1 \otimes X_2 \otimes Z_3
S_3 = Z_2 \otimes X_3

can be expressed via the binary matrix,

\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 0\end{array}\right],

and it can be seen that the Z block corresponds to the expected adjacency matrix for a linear graph.
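Constructing this representation is straightforward; a minimal sketch (helper name is illustrative) that reproduces the matrix above:

```python
import numpy as np

def graph_tableau(n, edges):
    # [X | Z] block matrix of a graph state: X block is the identity,
    # Z block is the adjacency matrix of the graph.
    A = np.zeros((n, n), dtype=int)
    for a, b in edges:
        A[a - 1, b - 1] = A[b - 1, a - 1] = 1
    return np.hstack([np.eye(n, dtype=int), A])

print(graph_tableau(3, [(1, 2), (2, 3)]))  # 3-qubit linear graph
```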

All stabiliser states are locally equivalent to graph states

In addition to graph states being stabiliser states, all stabiliser states are locally equivalent to graph states. An efficient classical algorithm with O(N^3) runtime exists for transforming arbitrary stabiliser states into graph states using local operations [Vijayan et al. (2022)]. Using stabiliser transformation rules in the binary representation, the goal is to diagonalise the X block, achieved using a variant of Gaussian elimination.

The concept is best illustrated by example. Consider the 3-qubit GHZ state, which may be represented using the stabilisers,

S_1 = X_1\otimes X_2\otimes X_3
S_2 = Z_1\otimes Z_2
S_3 = Z_2\otimes Z_3

with the binary matrix representation,

\left[\begin{array}{ccc|ccc} 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1\end{array}\right]

Let us first apply Hadamard gates to qubits 2 and 3. Hadamard gates interchange X and Z operators, since HXH=Z and HZH=X, and stabilisers evolve under conjugation. In the binary matrix representation, this swaps the respective columns from the X and Z blocks,

\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0\end{array}\right].

Since the product of two stabilisers is also a stabiliser, arbitrarily multiplying them together, equivalent to bitwise XORing the respective matrix rows, does not change the description of the state. Applying S_3\to S_2 S_3, we obtain,

\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0\end{array}\right].

Our matrix is now in the required form, and the Z block corresponds to the adjacency matrix of the 3-qubit star graph. In general, an N-qubit star graph is locally equivalent to an N-qubit GHZ state, as is the fully connected graph, K_N.
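The two tableau moves used in this example, Hadamard column swaps and a stabiliser-product row XOR, are easy to reproduce numerically. A small sketch starting from the GHZ tableau above:

```python
import numpy as np

# 3-qubit GHZ stabilisers as an [X|Z] binary matrix.
X = np.array([[1, 1, 1], [0, 0, 0], [0, 0, 0]])
Z = np.array([[0, 0, 0], [1, 1, 0], [0, 1, 1]])

# Hadamards on qubits 2 and 3 swap the corresponding X and Z columns.
for q in (1, 2):
    X[:, q], Z[:, q] = Z[:, q].copy(), X[:, q].copy()

# S3 -> S2 . S3 is a bitwise XOR of the respective rows.
X[2] ^= X[1]
Z[2] ^= Z[1]

print(np.hstack([X, Z]))  # X block is now the identity; Z block is the
                          # adjacency matrix of the 3-qubit star graph
```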

Manipulating graph states

From the definition of a graph state, we know that CZ gates toggle the existence of edges between the vertices upon which they act. Graph states additionally exhibit elegant properties under the action of some gates and measurements, providing valuable tools for directly manipulating the graph structure associated with a state [Hein, Eisert & Briegel (2004), Hein et al. (2006)].

Local complementation

One important graph operation that will appear is local complementation (LC), represented by the graph operator \tau_i(G). Local complementation inverts (or complements) the edges in the subgraph induced by the neighbourhood of i.

Local complementation about a vertex, denoted \tau_i(G), inverts (or complements) the edges in the subgraph induced by the neighbourhood of i. Here, \tau_1(G) alternates us between the above two graphs. (left) The subgraph induced by the neighbourhood of vertex 1 is the completely disconnected graph with vertices 2, 3 and 4. (right) Upon local complementation, the respective subgraph is the completely connected graph. A further LC operation takes us back to the original graph. Both of these graphs are locally equivalent to the maximally entangled GHZ state.

Applying the square root of the graph stabiliser associated with vertex i,

\sqrt{S_i} = \sqrt{X_i} \bigotimes_{j\in n_i} \sqrt{Z_j},

directly implements the local complementation \tau_i,

\sqrt{S_i}|G\rangle \equiv |\tau_i(G)\rangle,

an isomorphism between the LC graph operator, \tau_i, and the state operator, \sqrt{S_i},

\sqrt{S_i} \cong \tau_i.

Note that \sqrt{S_i} comprises only local operations and hence does not affect entanglement. Nonetheless, we observe that its application generally changes the number of edges in the graph. This leads to the interesting observation that while edges in a graph state represent the application of maximally entangling CZ gates, the amount of entanglement in a graph state is not directly related to the number of edges. Locally equivalent graph states exhibiting identical entanglement structure can significantly differ in their edge count.

The extreme case is seen in the above example of the star and fully connected graphs, related by local complementation around the central vertex. Both of these states are locally equivalent to the maximally-entangled GHZ state, yet one contains the smallest number of edges a connected graph can have, O(N), while the other is maximally connected with O(N^2) edges.
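This extreme case is easy to verify directly on adjacency matrices. Local complementation about vertex i toggles every edge within i's neighbourhood, which amounts to XORing in the outer product of the neighbourhood indicator vector (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def local_complement(A, i):
    # tau_i: toggle every edge within the neighbourhood of vertex i
    nb = A[i]                       # indicator vector of i's neighbourhood
    B = A ^ np.outer(nb, nb)        # complement edges among the neighbours
    np.fill_diagonal(B, 0)          # keep the diagonal free of self-loops
    return B

# 4-vertex star (centre 0) and the complete graph K4
star = np.zeros((4, 4), dtype=int)
star[0, 1:] = star[1:, 0] = 1
K4 = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)

print(np.array_equal(local_complement(star, 0), K4))   # True: star -> K4
print(np.array_equal(local_complement(K4, 0), star))   # True: and back again
```

Applying the same operation twice returns the original graph, consistent with local complementation being an involution.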

While these are locally equivalent states with identical entanglement structure and computational characteristics, they differ enormously from an architectural implementation perspective. Were these two LC-equivalent states to be physically prepared using CZ gates, the star graph would clearly be preferable owing to its quadratically lower CZ count.

Single-qubit gates are generally far easier to implement than entangling ones, and theoreticians often assume they come for free. From the perspective of minimising CZ gate usage during state preparation, the local complementation orbit of a graph provides an avenue for optimisation.

Local complementation orbits

Since local complementation comprises only local operations, all graphs related by LC are locally equivalent. The entire class of graphs related by LC is referred to as the graph’s (local complementation) orbit, whose size, in general, grows exponentially with the number of vertices, |V|.

Note that local complementations about different vertices do not commute in general,

\tau_i\circ\tau_j(G) \neq \tau_j\circ\tau_i(G)\,\,\,\text{for some}\,\,G,i,j.

Simultaneously, the LC operation associated with a given vertex is its own inverse, an involution,

\tau_i =\tau_i^{-1},
\tau_i^2 = I.

The LC orbit of a graph admits a non-Abelian group structure, where the LC operators \tau_i act as the group generators, and the graphs in the orbit constitute the elements of the group. The orbit may itself be represented as a graph, the group’s Cayley graph. Here, vertices represent individual graphs in the orbit, and edges, labelled by group generators, relate them via LC operators. Since the LC operators are involutions, the Cayley graph of a graph orbit is undirected.

The local complementation orbit of the 4-qubit linear graph, represented by its Cayley graph. Vertices represent individual graphs, related by local complementation where an edge exists. Edges are labelled by the indices of the LC operators \tau_i relating graphs. All graphs within an orbit are locally equivalent and exhibit the same entanglement structure, but generally differ in their number of edges. [Figure thanks to Adcock et al. (2020)]

Graph orbits are hard to explore, given their in-general exponential size and the complexity of navigating them. While deciding the LC-equivalence of two graphs is achievable in polynomial time [Van den Nest, Dehaene & De Moor (2004)], counting the graphs within an orbit is in general #P-complete [Dahlberg, Helsen & Wehner (2019); Adcock et al. (2020)].

A sequence of LC operations,

\tau_{\vec{u}}(G) = \tau_{u_{|u|}} \circ \dots \circ \tau_{u_1}(G)

may be represented by an ordered (since LC operators are non-commutative) list of vertex labels where complementations took place,

\vec{u} = \{u_1,\dots,u_{|u|}\},\,\,u_i\in V\,\,\forall\,i,

which may be of any length and have repeat entries.
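For very small graphs, an orbit can be enumerated by brute-force breadth-first search over repeated local complementations (feasible only at tiny sizes, given the in-general exponential orbit growth). A sketch for the labelled 3-vertex linear graph, whose orbit contains the three labelled stars plus the triangle:

```python
import numpy as np

def local_complement(A, i):
    # tau_i: toggle every edge within the neighbourhood of vertex i
    nb = A[i]
    B = A ^ np.outer(nb, nb)
    np.fill_diagonal(B, 0)
    return B

def lc_orbit(A):
    # BFS over all labelled graphs reachable via local complementation
    seen = {A.tobytes()}
    frontier = [A]
    while frontier:
        nxt = []
        for G in frontier:
            for i in range(len(G)):
                H = local_complement(G, i)
                if H.tobytes() not in seen:
                    seen.add(H.tobytes())
                    nxt.append(H)
        frontier = nxt
    return seen

path3 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(len(lc_orbit(path3)))   # 4: three labelled star graphs plus the triangle
```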

A graph G’ is said to be a vertex-minor of G if it can be reached from G via a sequence of local complementations, \vec{u}, and vertex deletions.

The vertex-minor problem is the decision problem of finding a sequence of local complementations, \vec{u}, such that \tau_{\vec{u}}(G) = G’, up to vertex deletions, which may always be performed at the end.

This problem is known to be NP-complete in general, and is therefore believed to require exponential classical (and quantum) runtime. While complexity proofs are onerous, the intuition behind the NP-completeness of the vertex-minor problem can be seen by framing it as a satisfiability problem.

Defining the polynomial-time function,

f(\vec{u},G,G’)\to \begin{cases} 1, & \tau_{\vec{u}}(G) = G’ \\ 0, & \tau_{\vec{u}}(G) \neq G’ \end{cases}

solving vertex-minor is now equivalent to finding a satisfying input string, \vec{u}, such that f(\vec{u},G,G’)=1. Satisfiability (or SAT) problems of this form are NP-complete in general, though there is more nuance here, since this is not a fully unstructured satisfiability problem.

Counting the graphs in the orbit of the N-qubit GHZ state (up to isomorphism) is definitely not #P-complete, as there are always exactly two: the star graph and the complete graph. It therefore exhibits O(1) time-complexity, where 1=2.

Pauli measurements

Up to measurement-outcome-dependent local corrections, Pauli measurements implement simple graph transformation rules as follows:

  • Pauli-Z: Delete vertex i from G,

    G\to G-i.
  • Pauli-Y: Locally complement the neighbourhood of i and delete vertex i,

    G\to \tau_i(G)-i.
  • Pauli-X: For any qubit b neighbouring i, locally complement b, apply rule for Y, then locally complement b again,

    G\to \tau_b(\tau_i \circ \tau_b(G)-i).
The effect of Pauli X, Y and Z measurements on the red qubit in the linear graph state shown at the top. In all cases, the measured qubit is detached from the graph by removing all its connecting edges. The choice of measurement basis additionally imposes an update rule on the remaining edge set.
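Up to those outcome-dependent local corrections, the update rules translate directly into adjacency-matrix operations. A self-contained sketch (function names are illustrative), checked against the 3-qubit linear graph:

```python
import numpy as np

def local_complement(A, i):
    # tau_i: toggle every edge within the neighbourhood of vertex i
    nb = A[i]
    B = A ^ np.outer(nb, nb)
    np.fill_diagonal(B, 0)
    return B

def delete_vertex(A, i):
    return np.delete(np.delete(A, i, axis=0), i, axis=1)

def measure_Z(A, i):
    return delete_vertex(A, i)                        # G -> G - i

def measure_Y(A, i):
    return delete_vertex(local_complement(A, i), i)   # G -> tau_i(G) - i

def measure_X(A, i, b):
    # G -> tau_b(tau_i(tau_b(G)) - i), for any neighbour b of i;
    # note that b's index shifts down once vertex i has been deleted.
    G = delete_vertex(local_complement(local_complement(A, b), i), i)
    return local_complement(G, b - 1 if b > i else b)

path3 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(measure_Z(path3, 1))  # Z on the middle qubit: two disconnected vertices
print(measure_Y(path3, 1))  # Y on the middle qubit: qubits 1 and 3 joined
```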

Since Pauli measurements induce simple graph transformation rules, this immediately implies that graph states combined with Pauli measurements are classically efficient to simulate; simultaneously, Pauli measurements alone are insufficient for universal MBQC, which cannot be efficiently classically simulated. Furthermore, any local Clifford operation may be expressed in terms of efficient graph transformation rules [Van den Nest, Dehaene & De Moor (2004)]. This observation is analogous to the Gottesman-Knill theorem [Gottesman (1998)], that stabiliser states in combination with Clifford evolution and Pauli measurements are classically efficient to simulate, recalling that all stabiliser states are locally equivalent to graph states.

Graph surgery

These measurement properties facilitate convenient surgical graph operations. Suppose we became aware that we lost a qubit from a graph state. Rather than discard the entire state and start from scratch, we can measure out the neighbouring qubits in the Z-basis, thereby detaching the lost qubit from the graph, after which the damaged section can be reconstructed by replacing the measured and lost qubits with fresh ones and reconnecting them using CZ gates.

If a qubit is lost (red) from a graph state, discarding the entire state and preparing from scratch is unnecessary. Instead, we can measure out its neighbouring qubits (orange) in the Z basis (left), replace the lost and measured qubits with fresh |+\rangle states (green), and apply the necessary CZ gates to reconstruct the lost edges (right). Edges outside the defect’s neighbourhood are unaffected (blue vertices on the left).

Pauli measurements also provide the tools for contracting larger graphs with redundant qubits down to ones with the required substructure, thereby etching the structure into the graph. Any subgraph of G induced by vertex set S, G[S], may be obtained by measuring out all vertices V(G-S) in the Z basis, while Y measurements may be employed to perform contractions. This provides an elegant means for arguing that lattices of various topologies are universal for MBQC — if a lattice can be reduced to substructures reflecting arbitrary quantum circuits, it can be considered a substrate for universal MBQC.

Reducing a lattice graph to one with two horizontal linear chains connected by a single vertical bridge, analogous to the earlier graph for simulating a circuit comprising single-qubit operations and a CZ gate.

Graph state compilation

Quantum algorithms are often designed in the circuit model. While etching circuits into a corresponding graph structure works by equivalence, it doesn’t exploit the more general graph-theoretic structure of graph states, rather imposing an existing way of thinking onto a different paradigm in a highly constrained way.

A more direct and resource-efficient approach for the compilation of quantum circuits into graph states was presented by Vijayan et al. (2022), known as algorithm-specific graph states, distinguishing them from universal ones like lattices. Conceptually, the approach is to structure quantum algorithms into a form where all Clifford circuitry — the bulk of most quantum algorithms — acts on the computational input state, thereby preparing a stabiliser state, while non-Clifford gates are performed at the end and may be absorbed into non-Clifford measurements.

Considering a universal gate set comprising Clifford and T gates, it is only the non-Clifford, single-qubit T gates that present an obstacle to graph state representation. One well-known technique for converting Clifford + T circuits into Clifford-only circuits is using magic state injection. Here, performing single-qubit teleportation using a,

|A\rangle = T|+\rangle = (|0\rangle+e^{i\pi/4}|1\rangle)/\sqrt{2}

resource state, known as a magic state, implements quantum gate teleportation of a T gate. Since the teleportation circuit itself is Clifford-only, this allows any circuit to be expressed in a form comprising only Clifford operations, where some inputs are non-stabiliser states.
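The injection identity can be checked with a small statevector calculation. This sketch (qubit ordering and variable names are illustrative assumptions) verifies that after a CNOT from the data qubit onto the magic state, measuring the magic qubit leaves the data qubit carrying T|\psi\rangle, up to an S correction on the odd outcome:

```python
import numpy as np

T = np.diag([1, np.exp(1j * np.pi / 4)])
S = np.diag([1, 1j])
plus = np.array([1, 1]) / np.sqrt(2)

psi = np.array([0.6, 0.8j])       # arbitrary normalised input state
A = T @ plus                      # magic state |A> = T|+>

# Two-qubit state |psi>|A>, qubit ordering (data, magic)
state = np.kron(psi, A)

# CNOT with the data qubit as control and the magic qubit as target
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
state = CNOT @ state

# Unnormalised branches after projecting the magic qubit onto |0> or |1>
branch0 = state.reshape(2, 2)[:, 0]
branch1 = state.reshape(2, 2)[:, 1]

target = T @ psi
def fidelity(u, v):
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return abs(np.vdot(u, v)) ** 2

print(fidelity(branch0, target))      # outcome 0: T|psi> directly
print(fidelity(S @ branch1, target))  # outcome 1: S correction recovers T|psi>
```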

The T gate teleportation protocol can also be inverted, relying on measurement of the non-Clifford

A(\pi/4)=T^\dag X T

observable instead of magic states. This inverted form allows Clifford + T circuits to be expressed as stabiliser states followed by non-Clifford measurements.

T gate teleportation using the inverse-ICM formalism. This circuit teleports the action of a T gate onto |\psi\rangle. Unlike the conventional approach of magic state injection, which relies on the non-stabiliser resource state |A\rangle=T|+\rangle=(|0\rangle+e^{i\pi/4}|1\rangle)/\sqrt{2}, the inverted model relies only on a |0\rangle resource state, absorbing the T gate into measurement of the non-Clifford A(\pi/4)=T^\dag X T observable. Substituting this sub-circuit in place of all T gates within a Clifford + T circuit converts it to a form comprising stabiliser state preparation followed by non-Clifford measurements, equivalently a graph state followed by non-Clifford measurements.

Since stabiliser states are locally equivalent to graph states, this decomposition allows universal quantum circuits to be directly compiled to graph states, where computation proceeds via non-Clifford measurements.

Clifford + T decomposition of the non-Clifford Toffoli gate.

Algorithm-specific graph states capture the entanglement structure of algorithms much more naturally than an etched circuit would. However, the resultant graphs are not unique: every graph within the same local complementation orbit is computationally equivalent.

Compilation of a Toffoli gate acting on the |+,+,0\rangle input to a graph state, where input and output qubits are shown in green and blue, and all qubits except the outputs are measured in the A(\pm\pi/4) basis. [Figure thanks to Vijayan et al. (2022)]

Graph optimisation therefore becomes an important next stage in the compilation pipeline. The optimal graph for implementing a given algorithm has many variables and is subject to architectural constraints, but identifying them generally yields computationally complex optimisation problems associated with the complexity of exploring graph orbits.

Preparing graph states

The commutativity of CZ gates implies they are order-independent. In the context of graph states, this means edges needn’t be built up sequentially but can be parallelised, and divide-and-conquer strategies may be employed to build up large graph states from smaller subgraphs that may be prepared independently and in parallel.

This becomes especially useful when entangling gates are non-deterministic, as is often the case in many physical architectures and necessarily the case when dealing with photonic implementation.

Consider a linear graph state with N qubits. One might think that preparing this state requires successfully performing N-1 CZ gates in succession, meaning that if the gates are non-deterministic with success probability p_\mathrm{gate}, the overall state preparation probability would be p_\mathrm{gate}^{N-1}, which is exponentially decreasing and therefore inefficient.

However, this is not the correct intuition for graph states. Graph states are generally not maximally entangled, and the Pauli measurement properties described above enable a state to be rescued via surgical operations when local defects occur.

Rather than building the linear graph in a single shot, let us consider the following scenario, taken from Rohde & Barrett (2007). We have a main linear graph of length N and a resource of small linear graphs of length M, which are prepared offline. We are assuming the availability of quantum memory, or qubits with inherently long decoherence times, to enable this protocol.

We will employ a CZ gate that succeeds with probability p_\mathrm{gate}, and upon failure, destroys the qubits it acts upon, effectively tracing them out of the system.

Upon applying the CZ gate between the two graphs of length N and M, with probability p_\mathrm{gate}, the main graph grows in length to N+M. Upon failure, with probability 1-p_\mathrm{gate}, the end qubit is destroyed, and we recover the remainder of the graph by measuring out its neighbour in the Z basis, shrinking it in length to N-2.

Applying this process repeatedly, the length of our main graph proceeds as a random walk, probabilistically growing or shrinking. The figure of merit is the average length by which the graph grows or contracts. In our case, this can easily be calculated to be,

\langle\Delta\rangle = p_\mathrm{gate}M - 2(1-p_\mathrm{gate}).

If the graph is to grow on average, we require \langle\Delta\rangle > 0, which implies,

M > 2(1-p_\mathrm{gate})/p_\mathrm{gate}.

A CZ gate operating with 50% probability would only require M=3 resource states to ensure growth on average.

So long as the resource states we employ are at least length M, our main graph is guaranteed to grow on average, with length growing linearly in the number of iterations t,

\langle N(t)\rangle = N(0) + \langle\Delta\rangle\cdot t.

When bonding a primary linear graph state of length N with a smaller resource state of length M using a non-deterministic CZ gate with success probability p_\mathrm{gate}, the primary graph either: grows in length to N+M with probability p_\mathrm{gate}; or, after recovery using a Z measurement to detach the destroyed qubit, shrinks in length to N-2 with probability 1-p_\mathrm{gate}. Repeating this process, the primary graph’s length evolves as a random walk. If the resource states have length M>2(1-p_\mathrm{gate})/p_\mathrm{gate}, the primary graph will grow on average, increasing linearly with time.
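The random-walk argument is easy to verify by simulation. A minimal Monte Carlo sketch of the growth process under the stated failure model (grow by M on success, shrink by 2 on failure after recovery):

```python
import random

random.seed(0)  # fixed seed for reproducibility

def mean_growth(p_gate, M, rounds=200_000):
    # Average length change per bonding round: +M with probability p_gate,
    # -2 otherwise (losing the end qubit plus its Z-measured neighbour).
    total = 0
    for _ in range(rounds):
        total += M if random.random() < p_gate else -2
    return total / rounds

p_gate, M = 0.5, 3
print(mean_growth(p_gate, M))   # ~0.5, matching p*M - 2*(1-p)
```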

While the above argument is for linear graph states, linear graphs alone are not computationally useful. For MBQC, we require at least a two-dimensional topology, such as a lattice. It was first shown by Nielsen (2004) how graph structure can be exploited to enable efficient preparation of arbitrarily large lattice graph states using non-deterministic gates by introducing the concept of micro-clusters. Micro-clusters are small subgraphs of a larger graph state with multiple dangling nodes, each providing an independent bonding opportunity. Micro-clusters, therefore, provide redundancy in bonding attempts should they sometimes fail.

Preparing a 2\times 2 lattice graph state using four micro-clusters. Each micro-cluster comprises a central vertex (green), to belong to the final lattice graph. The dangling bonds (blue) provide multiple opportunities to attempt bonding with neighbouring micro-clusters. If a CZ bonding attempt fails (red), the respective dangling bonds are removed. Upon success (green), the respective micro-clusters are connected, albeit with some intermittent vertices. Once bonds exist in each direction, redundant vertices are measured out in the Pauli Y basis, contracting the graph down to the desired 2\times 2 lattice.

How large do micro-clusters need to be? The probability of a micro-cluster failing to bond with another drops exponentially with the number of available attempts, M,

p_\mathrm{fail} = (1-p_\mathrm{gate})^M,

which implies the number of available bonds must scale as,

M = \log(p_\mathrm{fail})/\log(1-p_\mathrm{gate}) = O(\log(1/p_\mathrm{fail})),

which is only logarithmic and highly efficient. Achieving 99% bonding probability using gates with 50% success probability requires only M=7 available bonds.
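A quick sketch of this arithmetic, taking the smallest integer M for which the failure probability drops below a target:

```python
import math

def bonds_needed(p_gate, p_fail_target):
    # Smallest M with (1 - p_gate)^M <= p_fail_target
    return math.ceil(math.log(p_fail_target) / math.log(1 - p_gate))

print(bonds_needed(0.5, 0.01))   # 7 dangling bonds for 99% bonding probability
```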

This observation affords enormous resource savings in the context of linear optics quantum computing (LOQC), where micro-clusters were first introduced. Resource overheads associated with the non-determinism of entangling gates in LOQC using the initial circuit-based construction described by Knill, Laflamme & Milburn (2001) (aka KLM) were ‘efficient’ in the computer scientist’s sense of exhibiting polynomial scaling but not in a practical sense as the polynomials were unpleasant ones.

Percolation theory

An edge-percolated lattice for p_\mathrm{delete}=0.7, 0.4 and 0.1. Asking whether a route exists across the graph, there will exist a percolation threshold probability, p_c, above which route existence is unlikely, below which it is likely. The middle figure corresponds to the associated threshold probability, where on average, a path is likely to exist across the graph, but if the deletion rate were increased would quickly lose connectivity and break into disconnected islands, as per the left figure. [Figure thanks to Browne et al. (2008)]

Percolation theory studies the connectedness of percolated graphs (ones with random defects) as a function of defect rate, p_\mathrm{delete}. Percolation may apply to either edges or vertices, and the measure of connectedness can be defined in many ways, most commonly whether routes exist spanning the graph.

Taking an edge- or vertex-percolated lattice in the limit of large lattice dimension, L, and considering the probability of connectedness as a function of percolation probability, p_\mathrm{delete}, percolation theory predicts phase-transitions in graph connectivity at some percolation threshold, p_c.

Probability of the existence of a spanning cluster across a lattice of dimension L against edge deletion probability p_\mathrm{delete}. In the limit of infinite lattice dimension, L\to\infty, the curve approaches a step function, marking the percolation threshold, p_c, of a phase-transition in graph connectivity. [Figure thanks to Wei, Affleck & Raussendorf (2012)]
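This phase-transition behaviour is simple to observe numerically. A minimal Monte Carlo sketch (bond percolation on a square grid with a union-find connectivity check; the square-lattice bond threshold sits at p_\mathrm{delete}=0.5, and the parameters below are illustrative):

```python
import random

random.seed(1)  # fixed seed for reproducibility

def spans(L, p_delete):
    # Bond percolation on an L x L grid: delete each edge with probability
    # p_delete, then ask whether a left-to-right path survives (union-find).
    parent = list(range(L * L))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x
    for r in range(L):
        for c in range(L):
            i = r * L + c
            if c + 1 < L and random.random() > p_delete:
                parent[find(i)] = find(i + 1)       # keep horizontal edge
            if r + 1 < L and random.random() > p_delete:
                parent[find(i)] = find(i + L)       # keep vertical edge
    left = {find(r * L) for r in range(L)}
    right = {find(r * L + L - 1) for r in range(L)}
    return bool(left & right)

def spanning_prob(L, p_delete, trials=200):
    return sum(spans(L, p_delete) for _ in range(trials)) / trials

print(spanning_prob(16, 0.2))   # far below threshold: almost always spans
print(spanning_prob(16, 0.8))   # far above threshold: almost never spans
```

Sweeping p_\mathrm{delete} and increasing L reproduces the sharpening step function shown in the figure.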

Consider a lattice of micro-clusters. As a function of their size and gate success probability, there exists an associated probability of edge deletion in the subsequent reduced substrate graph.

The relationship between lattice dimension and computational power implies one between percolation probability and computational power [Browne et al. (2008); Pant et al. (2019); Wei, Affleck & Raussendorf (2012)]. Since the likelihood of finding paths through a graph is a function of the percolation rate, so is the expected density, hence dimension, of a lattice reduced from the available paths.

A percolated lattice graph state with missing qubits (i.e. vertex percolations — one could likewise consider edge percolations), where squares are qubits, yellow if present.

A route-finding algorithm finds sets of edge-disjoint left-to-right (red) and top-to-bottom (green) paths. The path density is a function of the percolation rate.

The path set is reduced to a regularised substrate, here an alternating bridge decomposition, a resource for universal MBQC.

All qubits not belonging to our path-set are measured out in the Z basis, while paths are contracted using Y measurements, reducing the graph to a regularised substrate of some dimension, which directly relates to its computational power.

This yields a relationship between computational power and percolation probability, where we observe computational phase-transitions for large graph dimension, L\to\infty. Above threshold, p_\mathrm{delete}>p_c, we rapidly lose computational power as the graph disconnects into islands and spanning paths cease to exist. The percolation threshold is an important parameter for engineering scalable architectures using this approach.

Since the dimension of the residual lattice is a function of the percolation rate, in turn a function of micro-cluster size and gate success probability, there exists a trade-off between the resources invested into micro-cluster size and computational return, a point of optimisation from an engineering perspective.

Entangling operations

Although, by definition, the edges in graph states represent the action of CZ gates, CZ gates are not the only operations capable of growing graph states. Depending on the physical implementation, various other approaches to creating graph edges exist.

In photonic quantum computing, fusion gates [Browne & Rudolph (2005)] can be used to create connections in graphs. Fusion gates are partial Bell measurements, implemented using polarising beamsplitters (see my previous post, “How do photonic Bell measurements work?”). While fusion gates create new edges in a graph state, they are also destructive, meaning they destroy the two qubits they operate on. This implies some resource overhead, although this overhead is far more favourable than that imposed by optical CZ gates, which are highly complex and resource intensive to implement.

The type-I and -II fusion gates enable the creation of edges in graph states comprising polarisation-encoded qubits. Unlike CZ gates, fusion gates are destructive, consuming one (for type-I) or both (for type-II) of the qubits they operate on, creating edges within their neighbourhood that differ from the CZ edge-creation rule, but are nonetheless sufficient for engineering arbitrarily large graph states. These gates are non-deterministic with a 50% success probability.

In atomic systems with suitable level structure, where energy levels couple to optical modes, a beamsplitter followed by photo-detection implements which-path erasure on photons emitted via relaxation processes, projecting the two atoms into a symmetric superposition of one atom being excited and the other relaxed, equivalent to a Bell projection.

Error propagation

As with quantum circuits, graph states are subject to various error models. How do these propagate through graph states?

Since quantum information flows through graph states via teleportation, so will errors. Consider the single-qubit teleportation circuit, where the local correction applied to the second qubit depends on the measurement outcome of the first. If a Pauli error was introduced onto the first qubit, flipping its measurement outcome, m, this would subsequently manifest itself as a flipped local correction on the second qubit, thereby propagating the error.

In general, as qubits are measured out of a graph state, errors acting upon them are teleported onto neighbouring qubits and accumulate. In the context of micro-cluster-based approaches for tolerating gate failure, the dangling bonds facilitating multiple bonding attempts, which must subsequently be removed, impose a trade-off between error types. While more dangling bonds imply greater tolerance against gate failure, they simultaneously imply lower tolerance against Pauli errors, which accumulate whenever a redundant bond is removed via measurement.

It is desirable to avoid unnecessarily preparing and subsequently measuring out redundant qubits, and the overall design of a measurement-based architecture must carefully evaluate the inherent tradeoffs between different error types.

Although lattice graphs are universal for MBQC, since arbitrary circuit structures can be etched into them, doing so is not only wasteful but also results in unnecessary accumulation of errors.

Quantum error correction

Graph states are not inherently error-protected, and quantum error correction (QEC) is required as with any other model of quantum computation. Thankfully, this does not require abandoning the inherent elegance of graph states as there are graph-native QEC codes. In particular, topological codes naturally lend themselves to graph-based implementation [Raussendorf, Harrington & Goyal (2006)].

An example is the surface code [Kitaev (2003)], defined relative to a square lattice graph. Although the way surface code stabilisers are defined is distinct from graph states, a direct mapping exists between them under appropriate graph transformations [Bravyi & Raussendorf (2007)].

The surface code is defined relative to a graph, albeit with a different convention in defining stabilisers. Qubits are associated with graph edges, and two types of stabilisers define the state: every square (blue) is associated with a plaquette operator acting on its four constituent qubits, S_\square = X^{\otimes 4}; and, every star (red) with a star operator, S_+ = Z^{\otimes 4}. The property that products of stabilisers are also stabilisers lends itself to elegant geometric interpretations. For example, taking the product of a set of neighbouring S_\square operators implies a chain of X operators around the boundary of a closed region of the surface is also a stabiliser.

Such graph-based QEC codes connect us with the field of topology, where the manifestation of errors and implementation of logical operations may be defined in terms of topological invariants — properties that remain invariant under continuous deformation, from which they inherit their robustness.

For example, continuously deforming a non-trivial loop around a torus (see figure below) preserves its topological characterisation. Associating logical operators with operator strings around a loop bestows redundancy in how the logical operator can be applied. Should defects rule out a particular path, another topologically equivalent one can be chosen, allowing defects to be bypassed.

Imposing periodic boundary conditions on the surface code yields the toric code, defined over the surface of a torus. Here logical X (blue) and Z (red) operators are associated with chains of respective Pauli operators acting on the qubits around topologically distinct closed loops. The topological characterisation of these loops is invariant under continuous deformation, as are the logical operations associated with them. Note that the red and blue chains are topologically distinct and cannot be continuously deformed into one another. In contrast, the two blue loops are topologically equivalent and correspond to the same logical operator. Conceptually, the error tolerance of the toric code stems from the robustness of these topological invariants against defects, scaling with surface dimension.

Fusion-based quantum computing

Fusion-based quantum computing [Bartolucci et al. (2021)] is a scalable, fault-tolerant, photonic architecture (although also applicable to other physical architectures), integrating many of the concepts we have discussed. This scheme admits more elaborate resource states with small QEC codes built into them, such that some qubits act as parity checks when measured.

(a) A hexagonal resource state can act as a unit cell in a three-dimensional fusion network. (b) A graph substitution rule allows each qubit to be encoded into a (2,2)-Shor code. (c) An error-protected hexagonal resource state, obtained by substituting (b) into (a). [Figure thanks to Bartolucci et al. (2021)]

Fusion gates (non-deterministic, destructive Bell measurements), implemented using polarising beamsplitters (discussed earlier), fuse unit cells of encoded resource states into a three-dimensional fusion network, with random defects associated with unsuccessful fusions.

The fusion measurement outcomes reveal a percolated three-dimensional lattice whose average-case connectivity is sufficient to post-select a substructure supporting a scalable topological code, as well as the syndrome outcomes for that code.

A three-dimensional fusion network using 4-star resource states. Fusion operations are non-deterministic, and the fused structure will be percolated. With sufficient average-case connectivity, substructures can be post-selected that encode a measurement-based quantum computation into a topological code whose syndrome measurements are revealed by fusion outcomes. [Figure thanks to Bartolucci et al. (2021)]
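The percolation requirement can be illustrated with a toy two-dimensional stand-in (the actual fusion networks are three-dimensional): if each bond of a lattice survives independently with probability p, spanning clusters persist with high probability above the percolation threshold and essentially vanish below it. A minimal sketch, with hypothetical lattice size, trial count and seed:

```python
import random

# Toy 2D bond percolation as a stand-in for a fusion network: each bond
# (fusion) succeeds independently with probability p, and we ask whether a
# connected cluster still spans the lattice from left to right.
def spans(L, p, rng):
    parent = list(range(L * L))       # union-find over lattice sites
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    for i in range(L):
        for j in range(L):
            if j + 1 < L and rng.random() < p:   # horizontal bond survives
                parent[find(i * L + j)] = find(i * L + j + 1)
            if i + 1 < L and rng.random() < p:   # vertical bond survives
                parent[find(i * L + j)] = find((i + 1) * L + j)
    left = {find(i * L) for i in range(L)}
    right = {find(i * L + L - 1) for i in range(L)}
    return bool(left & right)

def spanning_fraction(p, L=20, trials=200, seed=7):
    rng = random.Random(seed)
    return sum(spans(L, p, rng) for _ in range(trials)) / trials

# Above the 2D bond-percolation threshold (p = 1/2) spanning clusters are
# almost certain; well below it they essentially never occur.
assert spanning_fraction(0.9) > 0.95
assert spanning_fraction(0.3) < 0.05
```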

Quantum communications networks

The goal of quantum communications networks is usually to distribute long-range Bell pairs, equivalently two-qubit graph states. Bell pairs act as a universal resource for quantum communication as they enable quantum state teleportation.

Long-range quantum communication is generally very lossy, with attenuation increasing exponentially with distance. Quantum repeater networks (or entanglement distribution networks) therefore use divide-and-conquer techniques: long-range links are subdivided into multiple short-range ones, which are iteratively merged using entanglement swapping. This replaces the exponential scaling of loss with distance with polynomial scaling.

An efficient divide-and-conquer approach for long-range entanglement distribution in the presence of lossy channels. Execution flows from bottom to top; blue nodes represent Bell pairs, and nodes with children are prepared by entanglement swapping those children, merging two shorter-range Bell pairs into one longer-range one. It is assumed that quantum memory is available, such that a parent node can await both its children and branches can execute in parallel. The cost is exponential in the depth of the tree, since the root node, representing the final long-range Bell pair, requires all child nodes to succeed. However, since the binary tree has only logarithmic depth in the number of initial short-range Bell pairs, this directly counters the exponential, resulting in net polynomial scaling in both time and Bell pair consumption.
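The scaling argument above can be reduced to a toy cost model. Assume (hypothetically) that each elementary link succeeds with probability p per attempt, heralded failures are retried with successes held in quantum memory, and swapping is deterministic: direct end-to-end transmission must then succeed on every link simultaneously, whereas the repeater retries each short link independently.

```python
# Toy cost model for distributing entanglement over n elementary links,
# each succeeding with probability p per attempt (hypothetical parameters).
def expected_attempts_direct(p, n):
    return (1 / p) ** n   # geometric: retry the whole end-to-end path

def expected_attempts_repeater(p, n):
    return n / p          # retry each short link independently

p, n = 0.5, 16
assert expected_attempts_direct(p, n) == 2 ** 16   # exponential in n
assert expected_attempts_repeater(p, n) == 32.0    # linear in n
```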

This can also be conceptualised in terms of graph states. Based upon the same intuition for efficiently growing arbitrarily large graph states using non-deterministic gates, graph states can similarly be employed for efficient entanglement distribution in the presence of lossy channels, allowing the exponential attenuation of single-shot transmission to be overcome and replaced with efficient polynomial scaling.

Our graph state tools for dealing with gate failure (equivalently loss) can be adapted for this purpose. Using micro-cluster-type concepts, redundant bonds counter channel loss by facilitating multiple bonding attempts, where the failure of a single attempt does not compromise the remaining state.

The complete-like micro-cluster graph, |\bar{G}_c^m\rangle, here with m=4. (blue) The complete graph, K_{2m}, has an edge between every pair of its 2m vertices. Deleting all but two vertices leaves us with a K_2 graph, a Bell pair between the remaining two qubits. The measurements determining which two are chosen can be made at any stage and deferred. Transmitting the left and right halves of the dangling bonds (yellow) to different parties via lossy channels enables m transmission attempts in each direction. If at least one qubit from each half reaches its destination, appropriate measurements on the K_{2m} subgraph retrospectively route them together. The likelihood of complete failure decreases exponentially with m.
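The exponential suppression claimed at the end of the caption is simple to quantify: if each of the m transmitted qubits survives the lossy channel independently with probability t (a hypothetical per-photon transmission), the link fails in a given direction only when all m are lost.

```python
# Failure probability of one direction of the link: all m photons lost,
# assuming independent per-photon survival probability t.
def link_failure(t, m):
    return (1 - t) ** m

t = 0.5                              # hypothetical per-photon transmission
assert link_failure(t, 1) == 0.5
assert link_failure(t, 10) < 1e-3    # suppressed exponentially in m
```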

Applying this iteratively, the goal is to engineer a distributed graph state, which can subsequently be reduced to a long-range Bell pair by measuring out the redundant, intermediate qubits.

A pure graph state-based quantum repeater for long-range entanglement distribution over lossy channels [Azuma, Tamaki & Lo (2015)]. Source nodes, C^s, distribute routing substrate graphs (grey, as per previous figure) with m-fold redundancy (here m=3), |\bar{G}_c^m\rangle, which receiver nodes, C^r, fuse together using probabilistic Bell measurements. The m-fold bonding redundancy facilitated by the |\bar{G}_c^m\rangle states enables non-determinism associated with channel loss and gate failure to be asymptotically suppressed with m, allowing efficient preparation of a distributed graph state, which can subsequently be reduced to a long-range Bell pair. This is a direct graph-theoretic generalisation of an ordinary Bell pair entanglement swapping network, which this reduces to for m=1. [Figure thanks to Azuma, Tamaki & Lo (2015)]

In addition to their utility in efficiently overcoming loss in quantum repeater networks, distributed graph states act as a versatile resource for entanglement distribution. Assuming nodes in the network are cooperative and classically communicating, we have the tools to etch out subgraphs (using Z measurements) and contract them down to involve only the desired parties (using X and Y measurements).

Of particular interest are Bell pairs. Our graph transformation rules for Pauli measurements guarantee that any connected graph may be reduced to a Bell pair between any pair of nodes.

Entanglement routing by reducing a large, distributed graph state to a Bell pair between Alice (A) and Bob (B). First, we identify a path between Alice and Bob. All nodes neighbouring the path measure their qubits in the Z basis, detaching the path from the remainder of the graph. All nodes belonging to the path (except Alice and Bob) measure their qubits in the Y basis, contracting it to a two-qubit graph state (i.e. a Bell pair) between Alice and Bob.
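The routing procedure just described can be sketched directly at the graph level, using the standard graph rules for Pauli measurements (up to local Clifford corrections, which we ignore here): a Z measurement deletes the measured vertex, and a Y measurement locally complements its neighbourhood and then deletes it. Vertex labels below are hypothetical.

```python
# Graph rules for Pauli measurements on graph states, ignoring the local
# Clifford corrections: Z deletes the vertex; Y locally complements the
# vertex's neighbourhood, then deletes it.
def measure_Z(edges, v):
    return {e for e in edges if v not in e}

def local_complement(edges, v):
    nbrs = {u for e in edges if v in e for u in e if u != v}
    toggled = {frozenset((a, b)) for a in nbrs for b in nbrs if a < b}
    return edges ^ toggled       # symmetric difference toggles those edges

def measure_Y(edges, v):
    return measure_Z(local_complement(edges, v), v)

E = lambda a, b: frozenset((a, b))
# Hypothetical network: path A-n1-n2-n3-B with a spectator qubit s on n2.
edges = {E('A', 'n1'), E('n1', 'n2'), E('n2', 'n3'), E('n3', 'B'),
         E('n2', 's')}
edges = measure_Z(edges, 's')        # detach the off-path neighbour
for v in ('n1', 'n2', 'n3'):         # contract the path
    edges = measure_Y(edges, v)
assert edges == {E('A', 'B')}        # two-qubit graph state: a Bell pair
```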

Other states of interest might also be available depending on the initial graph’s topology. GHZ states are of interest as they are maximally entangled states that facilitate various multiparty quantum protocols such as open-destination quantum state teleportation and quantum anonymous broadcasting [Christandl & Wehner (2005)]. These can be realised by etching out star graphs. In the special case of 3-qubit GHZ states, any connected 3-qubit graph state, of which there are only two (star and fully connected), is locally equivalent to a GHZ state, and any connected graph with at least three qubits can necessarily be reduced to these using Pauli measurements.

However, we are not limited to measurements when manipulating distributed graph states. We can also play local complementation games, which can be exploited in graph-state-based entanglement routing [Hahn, Pappa & Eisert (2019)].

Distributed quantum computing

In a futuristic scenario where large-scale quantum computers and quantum communications networks are readily available, there is a strong incentive to unify geographically dispersed quantum computational resources. Given that the power of quantum computers can, depending on the application, grow exponentially with their number of logical qubits, unifying quantum computers to act as one effectively multiplies their power; without unification, their power would only accumulate additively, as with classical computers.

Graph states provide an elegant framework for conceptualising how to architecturally achieve this unification.

The dimensions of the underlying lattice graph dictate the size and power of a MBQC. In the earlier example, the width of the lattice equated with circuit depth and height with the number of logical qubits involved in the computation.

Suppose two independent quantum computers each had the capacity for an N\times N lattice graph. Rather than use them separately, with quantum communication at our disposal to create long-range links, the two units could stitch their lattices together in a patchwork manner, providing a distributed 2N\times N lattice that affords twice as many logical qubits or twice the circuit depth, an idea that generalises to any topology or number of nodes.

Unifying two geographically dispersed graph states into a larger distributed graph state using long-range Bell pairs provided by an entanglement distribution network. The increased dimension of the unified graph facilitates a larger MBQC in terms of circuit depth or logical qubit count, depending on orientation. The idea logically generalises in an obvious way, enabling an arbitrarily large distributed graph to be prepared by patchwork. [Figure thanks to Leone et al. (2021)]


Graph states are often equated with measurement-based quantum computing and seen merely as a different way of achieving the same thing. However, graph states provide far more than a direct equivalence with other models of quantum computation, like the circuit model. They provide an alternate framework for conceptualising the flow and interaction of quantum information, enabling unique insights that would not come naturally from the confines of other models. These insights have been of enormous value beyond computation, finding widespread utility in other quantum information processing protocols.

The geometric abstractions graph states provide have facilitated significant advances in solving practical problems, without which a quantum future would be far less likely. In photonic quantum computing, for example, implementation via the circuit model imposes formidable — effectively prohibitive — resource overheads, for which the graph state model affords natural solutions.

Graph states have enabled quantum information processing to be united with otherwise disparate fields of mathematics, providing direct connections with graph theory, percolation theory and topology. Graph theory provides powerful tools for conceptualising the flow of quantum information and associating quantum operations with graph transformations. The connection with the field of percolation theory offers insight into the relationship between errors and quantum computational power. Graph theoretical approaches towards quantum error correction intimately connect with the field of topology, in which codes and operations can be abstracted in terms of topological invariants.

As quantum information scientists, we attempt to understand things inherently too complex to contemplate directly and must rely on different types and levels of abstraction. Problems can generally be approached from different angles, and looking at the same problem through a different lens can yield new insights and a deeper understanding. Having a broad toolkit of different angles of interpretation for viewing problems is essential. Graph states are an incredibly powerful one, enabling us to find solutions to problems that might otherwise not have been found.


Thank you very much to the authors for allowing figures from their work to be reproduced in this post, acknowledged individually in figure captions. Thanks to Dan Browne for helpful feedback.

The post An introduction to graph states appeared first on Peter Rohde.

Peter Rohde How do photonic Bell measurements work?

Entangling Bell measurements are an essential ingredient in many photonic quantum technologies. In optical quantum computing they are employed as fusion gates to create edges in graph states, while in quantum communications protocols they may be used to implement entanglement swapping in quantum repeater networks (entanglement distribution networks) for extending the range of entanglement links.

In this post I’ll describe how this very simple optical circuit works and address some of the nuances and common misconceptions surrounding it.

What is a Bell measurement?

A Bell measurement is a two-qubit operation that projects onto the maximally-entangled Bell basis comprising the four Bell states,

|\Phi^\pm\rangle_L = \frac{1}{\sqrt{2}}(|0,0\rangle_L \pm |1,1\rangle_L),
|\Psi^\pm\rangle_L = \frac{1}{\sqrt{2}}(|0,1\rangle_L \pm |1,0\rangle_L).

Here I’m using subscript L to denote logical qubit states. States represented without a subscript will denote Fock (or photon-number) states in an occupation number representation, where |n\rangle denotes an n-photon state.
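As a quick numerical check (ordering the two-qubit computational basis as |0,0⟩, |0,1⟩, |1,0⟩, |1,1⟩, an assumed convention), the four Bell states form an orthonormal basis and each is maximally entangled, with maximally mixed single-qubit reduced states:

```python
import numpy as np

# The four Bell states as vectors in the two-qubit computational basis,
# ordered |00>, |01>, |10>, |11>.
s = 1 / np.sqrt(2)
phi_p = np.array([s, 0, 0,  s]);  phi_m = np.array([s, 0, 0, -s])
psi_p = np.array([0, s, s,  0]);  psi_m = np.array([0, s, -s, 0])
bell = np.stack([phi_p, phi_m, psi_p, psi_m])

# They form an orthonormal basis of the two-qubit space.
assert np.allclose(bell @ bell.T, np.eye(4))

# Each is maximally entangled: tracing out the second qubit leaves the
# maximally mixed single-qubit state I/2.
for v in bell:
    rho = np.outer(v, v).reshape(2, 2, 2, 2)      # indices (a, b, a', b')
    rho_A = np.einsum('ijkj->ik', rho)            # partial trace over b
    assert np.allclose(rho_A, np.eye(2) / 2)
```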

While there are many ways in which entangling measurements can be implemented photonically, I’ll focus on by far the simplest, most well-known and widely employed implementation shown below.

The partial Bell analyser for polarisation-encoded qubits, comprising a polarising beamsplitter, two waveplates implementing 45° polarisation rotations, and two polarisation-resolving photodetectors. When one photon is detected at each output (a coincidence event) we perform a partial Bell measurement.

This circuit implements a partial, destructive and non-deterministic Bell measurement. It is partial in the sense that it can only resolve two of the four Bell states. Otherwise it fails, implying non-determinism. And it is destructive in the sense that the measured qubits are destroyed by the measurement process.

The measurement projector implemented by this device is,

\hat\Pi^\pm_L =|\Phi^\pm\rangle_L\langle\Phi^\pm|_L,

a coherent projection onto one of the two even parity Bell pairs.

Bell measurements can also be implemented using CNOT gates, in which case all four Bell states can be non-destructively resolved. However, CNOT gates are notoriously difficult to construct in an optical setting, are non-deterministic, and have significant resource overheads.


Beamsplitters

A regular beamsplitter implements a 2\times 2 unitary transformation on the photon creation operators associated with two spatial modes, which we will denote \hat{a}^\dag_1 and \hat{a}^\dag_2,

\begin{bmatrix} \hat{a}^\dag_1 \\ \hat{a}^\dag_2\end{bmatrix} \to \begin{bmatrix} U_{1,1} & U_{1,2} \\ U_{2,1} & U_{2,2}\end{bmatrix} \begin{bmatrix} \hat{a}^\dag_1 \\ \hat{a}^\dag_2\end{bmatrix}.

Here we’re modelling evolution in the Heisenberg picture, representing state evolution via transformations on the photon creation operators acting on the vacuum state. This is the most convenient approach, since all the operations we consider are represented by linear transformations of creation operators, hence the term linear optics.

For a balanced 50/50 beamsplitter we have,

U = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1\end{bmatrix},

which is recognisable as the 2\times 2 Hadamard matrix.

This is an entangling operation as it can easily be seen that the state,

|1,0\rangle = \hat{a}^\dag_1|vac\rangle,

is evolved to,

\frac{1}{\sqrt{2}}(\hat{a}^\dag_1 + \hat{a}^\dag_2)|vac\rangle = \frac{1}{\sqrt{2}}(|1,0\rangle + |0,1\rangle),

a Bell state encoded as a superposition of a single particle across two orthogonal modes.

A single photon incident upon a regular beamsplitter creates an entangled output state of a superposition of a single photon across two spatial modes.

Polarisation rotations

A polarisation rotation, usually implemented using waveplates in experiments, applies exactly the same transformation in the polarisation degree of freedom,

\begin{bmatrix} \hat{h}^\dag \\ \hat{v}^\dag \end{bmatrix} \to\begin{bmatrix} U_{1,1} & U_{1,2} \\ U_{2,1} & U_{2,2}\end{bmatrix} \begin{bmatrix} \hat{h}^\dag \\ \hat{v}^\dag \end{bmatrix},

where \hat{h}^\dag and \hat{v}^\dag denote creation operators associated with horizontal and vertical polarisation.

Hence an input state,

\hat{h}^\dag|vac\rangle= |1\rangle_H|0\rangle_V,

is evolved by the Hadamard matrix to,

\frac{1}{\sqrt{2}}(\hat{h}^\dag + \hat{v}^\dag)|vac\rangle = \frac{1}{\sqrt{2}}(|1\rangle_H|0\rangle_V + |0\rangle_H|1\rangle_V).

Indeed, beamsplitters and polarisation rotations are isomorphic operations, implementing identical optical transformations, differing only in which pair of modes they operate on.

Hong-Ou-Mandel interference

Hong-Ou-Mandel (HOM) interference is a famous interferometric experiment in which a 50/50 beamsplitter interferes two photons, one incident upon each beamsplitter input.

In Hong-Ou-Mandel (HOM) interference, a balanced 50/50 beamsplitter with a single photon incident at each input mode creates an equal superposition of both photons in one mode or the other at the output, known as photon bunching. These measurement statistics are uniquely quantum. In the equivalent classical experiment where each photon has 50% probability of reaching each output we would observe anti-bunched events with 50% probability.

Using the 50/50 beamsplitter transformation, an initial state with a single photon at each input,

\hat{a}^\dag_1 \hat{a}^\dag_2 |vac\rangle = |1,1\rangle,

transforms to,

\frac{1}{2}(\hat{a}^\dag_1 + \hat{a}^\dag_2)(\hat{a}^\dag_1 - \hat{a}^\dag_2)|vac\rangle = \frac{1}{\sqrt{2}}(|2,0\rangle - |0,2\rangle),

a superposition of two photons in one spatial output or two in the other. Note there is no |1,1\rangle term, as these have cancelled via destructive interference. This phenomenon is called photon bunching, as the photons ‘bunch’ together and never appear at different outputs, a uniquely quantum effect. Contrast this with classical statistics where we would expect to see anti-bunching (one particle at each output) 50% of the time.
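This cancellation is easy to reproduce numerically by representing states as polynomials in the creation operators. The sketch below (with hypothetical helper names) applies the Heisenberg-picture beamsplitter transformation to |1,1⟩ and confirms that the |1,1⟩ output term vanishes:

```python
import math
from collections import defaultdict

# Creation-operator polynomials as {(n1, n2): coefficient}, meaning
# coefficient * a1†^n1 a2†^n2 acting on the vacuum.
def apply_beamsplitter(poly):
    # Heisenberg picture: a1† -> (a1† + a2†)/√2, a2† -> (a1† − a2†)/√2,
    # expanded binomially for each monomial.
    out = defaultdict(float)
    for (n1, n2), c in poly.items():
        for k in range(n1 + 1):
            for l in range(n2 + 1):
                coeff = (c * math.comb(n1, k) * math.comb(n2, l)
                         * (-1) ** (n2 - l) / math.sqrt(2) ** (n1 + n2))
                out[(k + l, n1 + n2 - k - l)] += coeff
    return dict(out)

def fock_amplitudes(poly):
    # |n1, n2⟩ = a1†^n1 a2†^n2 |vac⟩ / √(n1! n2!)
    return {m: c * math.sqrt(math.factorial(m[0]) * math.factorial(m[1]))
            for m, c in poly.items() if abs(c) > 1e-12}

hom_in = {(1, 1): 1.0}               # one photon in each input: |1,1⟩
amps = fock_amplitudes(apply_beamsplitter(hom_in))

# Photon bunching: only |2,0⟩ and |0,2⟩ survive; |1,1⟩ cancels.
assert set(amps) == {(2, 0), (0, 2)}
assert abs(amps[(2, 0)] - 1 / math.sqrt(2)) < 1e-12
assert abs(amps[(0, 2)] + 1 / math.sqrt(2)) < 1e-12
```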

We can replicate the same phenomenon using polarisation encoding by commencing with a two-photon state, where one is horizontally polarised, the other vertically,

\hat{h}^\dag \hat{v}^\dag |vac\rangle = |1\rangle_H|1\rangle_V,

which transforms to,

\frac{1}{2}(\hat{h}^\dag + \hat{v}^\dag)(\hat{h}^\dag - \hat{v}^\dag)|vac\rangle = \frac{1}{\sqrt{2}}(|2\rangle_H |0\rangle_V - |0\rangle_H|2\rangle_V).

Polarising beamsplitters

Evolution of the four polarisation-encoded basis states through a polarising beamsplitter, which reflects horizontal (H) polarisation and transmits vertical (V) polarisation. When both photons have the same polarisation (even parity) we observe anti-bunching (or coincidence) events, whereas when polarisations differ (odd parity) we observe bunching. Post-selecting upon coincidence events projects us into the even parity subspace.

A polarising beamsplitter (PBS) operates very differently from a regular beamsplitter, acting on two spatial degrees of freedom, each of which is associated with two polarisation degrees of freedom, making it a four-mode transformation. Most commonly, PBSs completely reflect one polarisation (say H) while completely transmitting the other (V), in which case the 4\times 4 transformation is,

\begin{bmatrix} \hat{h}_1^\dag \\ \hat{h}_2^\dag \\ \hat{v}^\dag_1 \\ \hat{v}^\dag_2 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \hat{h}_1^\dag \\ \hat{h}_2^\dag \\ \hat{v}^\dag_1 \\ \hat{v}^\dag_2 \end{bmatrix}.

It can be seen that this operation simply permutes modes, leaving the \hat{h}^\dag_1 and \hat{h}^\dag_2 operators unchanged, whilst swapping the \hat{v}^\dag_1 and \hat{v}^\dag_2 operators.

Beginning with any initially separable state in the original H/V basis, this operation preserves separability and cannot introduce entanglement, nor does any interference take place. Note that while this 4\times 4 matrix corresponds to that of a CNOT gate, this is not a CNOT operation, as the matrix describes a transformation on creation operators, not qubits.
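Since the PBS merely permutes creation operators, the bunching pattern in the figure above follows from bookkeeping alone. A small sketch (with hypothetical labels), assuming H reflects and V transmits:

```python
# PBS as a mode permutation on creation-operator labels (polarisation,
# spatial mode): h1 and h2 are fixed, while v1 <-> v2 are swapped.
def pbs(op):
    pol, mode = op
    if pol == 'v':
        mode = 2 if mode == 1 else 1
    return (pol, mode)

def output_modes(photon_ops):
    return sorted(pbs(op)[1] for op in photon_ops)

# Even-parity inputs (same polarisation) anti-bunch: one photon per output.
assert output_modes([('h', 1), ('h', 2)]) == [1, 2]
assert output_modes([('v', 1), ('v', 2)]) == [1, 2]
# Odd-parity inputs (different polarisations) bunch into a single output.
assert output_modes([('h', 1), ('v', 2)]) == [1, 1]
assert output_modes([('v', 1), ('h', 2)]) == [2, 2]
```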

Single-photon qubits

In the field of photonic quantum computing, qubits are most commonly encoded in one of two ways: dual-rail encoding and polarisation encoding. In dual-rail encoding we encode a qubit as a single photon in superposition across two distinct spatial modes. In polarisation encoding we encode a single photon in superposition across two polarisation states.

Using these two encodings, a single logical qubit,

|\psi\rangle_L = \alpha|0\rangle_L + \beta|1\rangle_L,

can be written as,

|\psi\rangle_\mathrm{dual-rail} = \alpha|1,0\rangle + \beta|0,1\rangle,
|\psi\rangle_\mathrm{polarisation} = \alpha |1\rangle_H|0\rangle_V + \beta|0\rangle_H|1\rangle_V.

Using photonic creation operators we can equivalently express these as,

|\psi\rangle_\mathrm{dual-rail} = (\alpha \hat{a}_1^\dag + \beta \hat{a}_2^\dag)|vac\rangle,
|\psi\rangle_\mathrm{polarisation} = (\alpha \hat{h}^\dag + \beta \hat{v}^\dag)|vac\rangle.

Note that in an occupation number representation both of these can be expressed,

|\psi\rangle = \alpha |1,0\rangle + \beta|0,1\rangle,

where for dual-rail encoding the two modes are spatial modes, while for polarisation encoding they refer to the two polarisation modes.

There also exists single-rail encoding, whereby a qubit is encoded in a single mode as a superposition of 0 or 1 photons. The Bell state created previously at the output of a 50/50 beamsplitter fed with a single photon is an example of single-rail encoding. However, this type of encoding has limited utility, as implementing operations on single-rail qubits is highly impractical: since the two logical basis states have different photon number, and hence energy, single-qubit gates require coherently manipulating a superposition of different amounts of energy.

In the \{|1,0\rangle,|0,1\rangle\} occupation number basis the beamsplitter and polarisation rotation operations both implement the transformations,

\begin{bmatrix} |1,0\rangle \\ |0,1\rangle \end{bmatrix} \to \begin{bmatrix} U_{1,1} & U_{1,2} \\ U_{2,1} & U_{2,2} \end{bmatrix} \begin{bmatrix} |1,0\rangle \\ |0,1\rangle \end{bmatrix},

in their respective degrees of freedom. Defining the logical basis states of a single qubit as,

|0\rangle_L \cong |1,0\rangle,
|1\rangle_L \cong |0,1\rangle,

we see that the beamsplitter and polarisation rotation operations implement 2\times 2 single-qubit unitary transformations.

So while beamsplitters and polarisation rotations are entangling operations on two optical modes, they represent single-qubit (hence non-entangling) operations when acting on qubits defined over the single-photon symmetric subspace of two modes. We refer to this as a symmetric subspace since it is invariant under permutations of the constituent optical modes. That is, any permutation of the optical modes, of which there are two (identity or swap), maps the basis \{|1,0\rangle,|0,1\rangle\} to itself.

Partial Bell measurements

Consider two arbitrary multi-qubit systems, |\psi\rangle and |\phi\rangle. Applying a Schmidt decomposition to each, we separate out one polarisation-encoded qubit, upon which we will subsequently perform the Bell measurement,

|\psi\rangle = \alpha_0 |\psi_0\rangle|H\rangle + \alpha_1 |\psi_1\rangle|V\rangle \\|\phi\rangle = \beta_0 |\phi_0\rangle|H\rangle + \beta_1 |\phi_1\rangle|V\rangle.

Expanding this out and expressing the isolated qubits in terms of creation operators we have,

(\alpha_0\beta_0 |\psi_0\rangle |\phi_0\rangle \hat{h}^\dag_1 \hat{h}^\dag_2 + \alpha_0\beta_1 |\psi_0\rangle |\phi_1\rangle \hat{h}^\dag_1 \hat{v}^\dag_2 \\+ \alpha_1\beta_0 |\psi_1\rangle |\phi_0\rangle \hat{v}^\dag_1 \hat{h}^\dag_2 + \alpha_1\beta_1 |\psi_1\rangle |\phi_1\rangle \hat{v}^\dag_1 \hat{v}^\dag_2)|vac\rangle.

Evolving this through the PBS we obtain,

(\alpha_0\beta_0 |\psi_0\rangle |\phi_0\rangle \hat{h}^\dag_1 \hat{h}^\dag_2 + \alpha_0\beta_1 |\psi_0\rangle |\phi_1\rangle \hat{h}^\dag_1 \hat{v}^\dag_1 \\+ \alpha_1\beta_0 |\psi_1\rangle |\phi_0\rangle \hat{v}^\dag_2 \hat{h}^\dag_2 + \alpha_1\beta_1 |\psi_1\rangle |\phi_1\rangle \hat{v}^\dag_2 \hat{v}^\dag_1)|vac\rangle.

Keeping only the coincidence terms we post-select upon, in which each spatial output contains exactly one photon, this reduces to,

(\alpha_0\beta_0 |\psi_0\rangle |\phi_0\rangle \hat{h}^\dag_1 \hat{h}^\dag_2 + \alpha_1\beta_1 |\psi_1\rangle |\phi_1\rangle \hat{v}^\dag_2 \hat{v}^\dag_1)|vac\rangle.

From here, if we measure the two qubits in the H/V polarisation basis, we will collapse onto either,

\alpha_0\beta_0 |\psi_0\rangle |\phi_0\rangle,

or,

\alpha_1\beta_1 |\psi_1\rangle |\phi_1\rangle,

depending on whether we measure H/H or V/V.

However, what we really want is a coherent projection onto both of these terms. If instead of measuring in the H/V basis we measure in the diagonal (|\pm\rangle_L=(|0\rangle_L \pm|1\rangle_L)/\sqrt{2}) basis we achieve this. The polarisation rotations prior to the photodetectors switch us into the diagonal basis. In qubit space, the balanced 50/50 beamsplitter transformation corresponds to a Hadamard gate, as does a 45° polarisation rotation, which effectively transforms the subsequent measurement from the computational \hat{Z} basis to the diagonal \hat{X} basis.

Applying the polarisation rotation we obtain,

\frac{1}{2}[\alpha_0\beta_0 |\psi_0\rangle |\phi_0\rangle (\hat{h}^\dag_1 + \hat{v}^\dag_1) (\hat{h}^\dag_2 + \hat{v}^\dag_2) \\+ \alpha_1\beta_1 |\psi_1\rangle |\phi_1\rangle (\hat{h}^\dag_1 - \hat{v}^\dag_1) (\hat{h}^\dag_2- \hat{v}^\dag_2)]|vac\rangle.

Expanding and regrouping this expression according to the different possible measurement outcomes we can write this as,

\frac{1}{2}[(\alpha_0\beta_0 |\psi_0\rangle |\phi_0\rangle + \alpha_1\beta_1 |\psi_1\rangle |\phi_1\rangle) \hat{h}^\dag_1 \hat{h}^\dag_2 \\+ (\alpha_0\beta_0 |\psi_0\rangle |\phi_0\rangle + \alpha_1\beta_1 |\psi_1\rangle |\phi_1\rangle) \hat{v}^\dag_1 \hat{v}^\dag_2 \\+ (\alpha_0\beta_0 |\psi_0\rangle |\phi_0\rangle - \alpha_1\beta_1 |\psi_1\rangle |\phi_1\rangle) \hat{h}^\dag_1 \hat{v}^\dag_2 \\+ (\alpha_0\beta_0 |\psi_0\rangle |\phi_0\rangle - \alpha_1\beta_1 |\psi_1\rangle |\phi_1\rangle) \hat{v}^\dag_1 \hat{h}^\dag_2]|vac\rangle.

Therefore, upon measuring either H/H or V/V we obtain,

\alpha_0\beta_0 |\psi_0\rangle |\phi_0\rangle + \alpha_1\beta_1 |\psi_1\rangle |\phi_1\rangle,

whereas if we measure H/V or V/H we obtain,

\alpha_0\beta_0 |\psi_0\rangle |\phi_0\rangle - \alpha_1\beta_1 |\psi_1\rangle |\phi_1\rangle,

which are the expected outcomes upon applying the measurement projectors,

\hat\Pi^\pm_L = |\Phi^\pm\rangle_L\langle\Phi^\pm|_L.
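The full derivation can be checked numerically. The sketch below (with hypothetical helper names) propagates creation-operator polynomials over the four modes (h1, v1, h2, v2) through the PBS and the two 45° rotations, then reads off the post-selected coincidence amplitudes, reproducing (\alpha_0\beta_0 \pm \alpha_1\beta_1)/2 for the even- and odd-signed outcomes:

```python
import math
from collections import defaultdict

# States as creation-operator polynomials {(n_h1, n_v1, n_h2, n_v2): coeff}
# acting on the vacuum, with modes ordered (h1, v1, h2, v2).
def pbs(poly):
    # H reflects, V transmits: the PBS swaps the v1 and v2 operators.
    return {(h1, v2, h2, v1): c for (h1, v1, h2, v2), c in poly.items()}

def rotate45(poly, mode):
    # 45° polarisation rotation on spatial mode 0 or 1:
    # h† -> (h† + v†)/√2, v† -> (h† − v†)/√2, expanded binomially.
    hi, vi = 2 * mode, 2 * mode + 1
    out = defaultdict(float)
    for exps, c in poly.items():
        nh, nv = exps[hi], exps[vi]
        for k in range(nh + 1):
            for l in range(nv + 1):
                coeff = (c * math.comb(nh, k) * math.comb(nv, l)
                         * (-1) ** (nv - l) / math.sqrt(2) ** (nh + nv))
                e = list(exps)
                e[hi], e[vi] = k + l, nh + nv - k - l
                out[tuple(e)] += coeff
    return dict(out)

def bell_analyser(a0, a1, b0, b1):
    # Input (a0 h1† + a1 v1†)(b0 h2† + b1 v2†)|vac⟩ through PBS + rotations.
    state = {(1, 0, 1, 0): a0 * b0, (1, 0, 0, 1): a0 * b1,
             (0, 1, 1, 0): a1 * b0, (0, 1, 0, 1): a1 * b1}
    return rotate45(rotate45(pbs(state), 0), 1)

a0, a1, b0, b1 = 0.6, 0.8, 0.6, 0.8
out = bell_analyser(a0, a1, b0, b1)
# Coincidence outcomes: H/H and V/V give (a0 b0 + a1 b1)/2, while H/V and
# V/H give (a0 b0 − a1 b1)/2, matching the projectors above.
assert abs(out[(1, 0, 1, 0)] - (a0 * b0 + a1 * b1) / 2) < 1e-12  # H/H
assert abs(out[(0, 1, 0, 1)] - (a0 * b0 + a1 * b1) / 2) < 1e-12  # V/V
assert abs(out[(1, 0, 0, 1)] - (a0 * b0 - a1 * b1) / 2) < 1e-12  # H/V
assert abs(out[(0, 1, 1, 0)] - (a0 * b0 - a1 * b1) / 2) < 1e-12  # V/H
```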


What happens if rather than measuring a coincidence event we measure both photons at one output? Referring to the previous figure we see that if the input state was \hat{h}^\dag_1\hat{v}_2^\dag|vac\rangle both photons exit the top-left output, while if the input state was \hat{v}^\dag_1\hat{h}_2^\dag|vac\rangle both photons exit the top-right output. This means that if we measure two photons at one output we know exactly what the polarisation of both inputs was. Therefore, when the device fails to project onto the even-parity subspace it performs a computational basis (\hat{Z}) measurement on both qubits.

Where does the entanglement come from?

The above calculation is completely legitimate, but it isn’t clear at all where the entanglement comes from in our entangling measurement. The PBS is a non-entangling operation, and both our inputs and the post-selected outputs are in the qubit basis, whereby polarisation rotations implement single-qubit operations. It sounds like everything involved is non-entangling?

The resolution of the paradox is found in the terms we post-selected away. The non-coincidence terms that we eliminated were of the form \hat{h}^\dag\hat{v}^\dag, one such term associated with each of the PBS outputs, which subsequently undergo polarisation rotation. These two-photon terms are not confined to qubit space and undergo HOM interference, creating highly entangled two-photon terms of the form \hat{h}^{\dag 2}-\hat{v}^{\dag 2}.

Bell measurement circuit for dual-rail encoded qubits. The mode-swapping operation in dual-rail encoding corresponds to a polarising beamsplitter in polarisation encoding, while the beamsplitters correspond to polarisation rotations.

So while our input states can be considered polarisation encoded qubits and the overall transformation implemented by the device is a two-qubit entangling gate, internally our states are not confined to qubit space and the polarisation rotations prior to the detectors cannot be strictly considered as single-qubit gates. Rather, they are highly entangling multi-photon operations on two optical modes.

Entanglement is always defined relative to a basis, and a state that is entangled in one basis needn’t be entangled in another. The most obvious example is that a Bell state is entangled in the qubit basis but not in the Bell basis, and vice versa. Here we’ve defined a qubit space as the single-photon subspace of a two-mode Fock space, where entangling operations in the latter define local operations in the former.

It is correct to say that our partial Bell analyser relies on Hong-Ou-Mandel interference. But it doesn’t take place in the polarising beamsplitter; it takes place within the waveplates.

Polarisation-resolving photodetectors

In our optical circuit we required polarisation-resolving photodetectors. In practice, the photodetectors available to us in the laboratory can’t do this directly – they only resolve photon number. However, this can easily be overcome by using an additional PBS to spatially separate and independently detect a state’s polarisation components, as shown below.

Photo-detectors typically only measure photon-number but not polarisation. However, using a polarising beamsplitter we can spatially separate and independently detect the horizontal and vertical components of a polarisation-encoded qubit, thereby implementing polarisation-resolved detection.

So our original optical circuit, when experimentally implemented, will actually comprise three PBSs and four photodetectors, and the full circuit will look like this.

Full experimental implementation for polarisation-encoded optical Bell measurement.

(Acknowledgement: Thank you to Felix Zilk for providing very helpful feedback on this post.)

The post How do photonic Bell measurements work? appeared first on Peter Rohde.

July 15, 2023

Jordan EllenbergGiants 15, Brewers 1

I like a close, hard-fought game as much as the next baseball fan, and I’ve seen a lot of those lately, but there is a peculiar and specific pleasure to the game in which the team you’re rooting for gets absolutely, relentlessly pummeled. It was a beautiful night on Friday, though chilly enough that they closed the roof at American Family Field. The Brewers were in their City Connect “Brew Crew” uniforms. We got there just as Christian Yelich was grounding into an RBI double play with the bases loaded. That was about as good as it got for Milwaukee. Freddy Peralta, starting for the Brewers, didn’t have it. The next reliever didn’t have it either. Ethan Small, brought up that morning from triple-A Nashville, didn’t have it, and by that time the game was out of reach and Craig Counsell just left Small up there on the hill to take his lumps and save the rest of the pen. The Brewers were booting balls, botching throws, just generally Bad News Bearsing it out there, and the crowd was, well, good-natured. Like I said, it was a beautiful night. Our guys were having a bad day and we were there for them.

Mike Brosseau moved over from first base to pitch the ninth and it was a real pleasure to see the Giants’ batters stymied at last, unable to adjust to the 68-mph fastball and the changeup that cruised in at 62. He got them 1-2-3. By that time a lot of fans had gone home. But we stayed through to the end. And you can see us pretty clearly, sitting along the third base line above the Giants dugout, in the broadcast.

Next visit to AmFam will be when the Orioles come to town. So I’m hoping to see the Brewers lose one more time this spring.

Jordan EllenbergSurprises of Spain

CJ and I took a father-son trip to Spain (or, depending on how you partition nations, to Spain and Catalonia.) A very enjoyable, nimble, seat-of-the-pants-planned trip. Just back yesterday.

Some things I found surprising about Spain:

  • Pollworkers for elections are chosen by lot from the population, the way juries are in the United States. If the pollworker assigned to a polling station doesn’t show up, the police can select an unsuspecting voter and assign them to work the polls on the spot.
  • Crosswalks aren’t really at the intersection, but set back quite a ways from the intersection, maybe 10% of the way to the middle of the block. This seems like a good system!
  • I expected to enjoy seeing Harry Styles play the Estadi Olimpic in Barcelona but I was surprised by how much I enjoyed it.
  • I didn’t understand how much the Catholic Church is wound into the government there. To some extent this is lingering Francoism, to some extent just how Spain has been forever. There are religion classes in public schools, and when you do your taxes there’s a box you can mark that allocates 0.7% of your taxes to the Church. In the Granada Cathedral there was a free newsletter which turned out to be entirely devoted to convincing readers to mark the box.
  • Burrata, which I think of as Italian food, seems to be a standard menu item in Spain. I think it’s been fully incorporated and is now also Spanish food.
  • We went to a sports bar in Barcelona to watch Carlos Alcaraz play in the Wimbledon quarterfinal. We expected to be among a crowd of cheering fans but in fact the bar wasn’t even showing the match until we asked the bartender to put it on, and it took him quite a while to find the channel. Alcaraz is one of the biggest Spanish athletes in the world, so why is this? Some potential explanations: 1) Watching sports in bars isn’t popular in Spain (evidence: there weren’t very many sports bars listed!); 2) In Barcelona, Alcaraz is seen as Spanish-as-opposed-to-Catalonian; 3) Tennis just isn’t a popular spectator sport in Spain. I don’t know which it was!
  • Also, the sports bar was founded in 2008, is called “Obama Gastropub,” and has a… colonial British Africa theme? Like, khaki jungle gear and 1920 maps of Africa everywhere? Very weird.
  • During the brief period of anarchist rule in Barcelona before 1939, the radically anti-clerical Republicans dug up the bodies of priests and nuns from under the church and displayed the decayed corpses in the town square, as a way of falsifying the popular belief that the clergy lay undecomposed beneath the earth in preparation for their eventual bodily ascension.
  • CJ and I went to a bullfight in Madrid. I’m not sure what I expected — but I did not expect it to be as thoroughly sad as it was. I had taken the name to mean that a bullfight was a fight. But the actual name for this event in Spanish, corrida, makes no such promise, saying only that the bull will run. At the beginning, it does run, even jabs with its horns a little bit. But then the picadors wound the bull and the banderilleros drive barbs into its neck. At that point, the bull looks tired and confused. It is clearly not mad anymore. It would be happy to walk away and call the whole thing off. And then the matador, who at this point conveys no sense of being in danger at all, whose spangly uniform is not even mussed, drives a sword into the bull and the crowd sits and waits while the bull vomits a dribble of blood, starts to wobble, and then finally goes down to its knees and dies. And then everybody cheers. And then the horses drag the bull’s body across the ring and a couple of janitors sweep dirt over the bloody trail. I don’t know what I saw, but it wasn’t a fight. There were six bulls slated to be stabbed that night but we left after two.

July 09, 2023

John PreskillThe Book of Mark

Mark Srednicki doesn’t look like a high priest. He’s a professor of physics at the University of California, Santa Barbara (UCSB); and you’ll sooner find him in khakis than in sacred vestments. Humor suits his round face better than channeling divine wrath would; and I’ve never heard him speak in tongues—although, when an idea excites him, his hands rise to shoulder height of their own accord, as though halfway toward a priestly blessing. Mark belongs less on a ziggurat than in front of a chalkboard. Nevertheless, he called himself a high priest.

Specifically, Mark jokingly called himself a high priest of the eigenstate thermalization hypothesis, a framework for understanding how quantum many-body systems thermalize internally. The eigenstate thermalization hypothesis has an unfortunate number of syllables, so I’ll call it the ETH. The ETH illuminates closed quantum many-body systems, such as a clump of N ultracold atoms. The clump can begin in a pure product state | \psi(0) \rangle, then evolve under a chaotic1 Hamiltonian H. The time-t state | \psi(t) \rangle will remain pure; its von Neumann entropy will always vanish. Yet entropy grows according to the second law of thermodynamics. Breaking the second law amounts almost to enacting a miracle, according to physicists. Does the clump of atoms deserve consideration for sainthood?

No—although the clump’s state remains pure, a small subsystem’s state does not. A subsystem consists of, for example, a few atoms. They’ll entangle with the other atoms, which serve as an effective environment. The entanglement will mix the few atoms’ state, whose von Neumann entropy will grow.

The ETH predicts this growth. The ETH is an ansatz about H and an operator O—say, an observable of the few-atom subsystem. We can represent O as a matrix relative to the energy eigenbasis. The matrix elements have a certain structure, if O and H satisfy the ETH. Suppose that the operators do and that H lacks degeneracies—that no two energy eigenvalues equal each other. We can prove that O thermalizes: Imagine measuring the expectation value \langle \psi(t) | O | \psi(t) \rangle at each of many instants t. Averaging over instants produces the time-averaged expectation value \overline{ \langle O \rangle_t }.
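For concreteness (this equation isn’t in the post, but it is the standard way the “certain structure” is written, following Srednicki), the ETH posits that, in the energy eigenbasis,

```latex
O_{ab} \;=\; O(\bar{E})\,\delta_{ab} \;+\; e^{-S(\bar{E})/2}\, f(\bar{E},\omega)\, R_{ab},
\qquad \bar{E} := \frac{E_a + E_b}{2}, \quad \omega := E_a - E_b ,
```

where O(\bar{E}) and f(\bar{E}, \omega) are smooth functions, S(\bar{E}) is the thermodynamic entropy at energy \bar{E}, and R_{ab} varies erratically with magnitude of order one. The smooth diagonal part is what lets the time average land on the thermal value, while the exponentially small off-diagonal part suppresses fluctuations.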

Another average is the thermal average—the expectation value of O in the appropriate thermal state. If H conserves just itself,2 the appropriate thermal state is the canonical state, \rho_{\rm can} := e^{-\beta H}/ Z. The average energy \langle \psi(0) | H | \psi(0) \rangle defines the inverse temperature \beta, and Z normalizes the state. Hence the thermal average is \langle O \rangle_{\rm th}  :=  {\rm Tr} ( O \rho_{\rm can} ).

The time average approximately equals the thermal average, according to the ETH: \overline{ \langle O \rangle_t }  =  \langle O \rangle_{\rm th} + O \big( N^{-1} \big). The correction is small in the total number N of atoms. Through the lens of O, the atoms thermalize internally. Local observables tend to satisfy the ETH, and we can easily observe only local observables. We therefore usually observe thermalization, consistently with the second law of thermodynamics.
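One can check this agreement numerically on a small chain. The sketch below (my illustration, not from the post) diagonalizes a nonintegrable mixed-field Ising chain—the couplings J = 1, g = −1.05, h = 0.5 are a standard chaotic choice—prepares a pure Néel product state, and compares the infinite-time (diagonal-ensemble) average of a one-site observable with the canonical average at the matched energy. Everything else (system size, observable, initial state) is an assumption of the sketch.

```python
import numpy as np

# Pauli matrices and a helper to embed a one-site operator in the chain
sx = np.array([[0., 1.], [1., 0.]])
sz = np.array([[1., 0.], [0., -1.]])
I2 = np.eye(2)

def site_op(op, i, n):
    """Embed a single-site operator at site i of an n-spin chain."""
    out = np.array([[1.]])
    for j in range(n):
        out = np.kron(out, op if j == i else I2)
    return out

n = 8  # spins; Hilbert-space dimension 2**n = 256

# Nonintegrable (chaotic) mixed-field Ising chain:
# H = sum_i sz_i sz_{i+1} - 1.05 sum_i sx_i + 0.5 sum_i sz_i
H = sum(site_op(sz, i, n) @ site_op(sz, i + 1, n) for i in range(n - 1))
H += sum(-1.05 * site_op(sx, i, n) + 0.5 * site_op(sz, i, n) for i in range(n))

E, V = np.linalg.eigh(H)  # columns of V are energy eigenstates

# Local observable: sz on the middle site, and its eigenbasis diagonal O_aa
O = site_op(sz, n // 2, n)
O_diag = np.diag(V.T @ O @ V)

# Pure product initial state |psi(0)> = Neel state |up down up down ...>
up, down = np.array([1., 0.]), np.array([0., 1.])
psi0 = np.array([1.])
for i in range(n):
    psi0 = np.kron(psi0, up if i % 2 == 0 else down)

c = V.T @ psi0  # overlaps with the energy eigenstates

# Infinite-time average = diagonal ensemble (assuming no degeneracies)
time_avg = np.sum(c**2 * O_diag)

# Canonical average at the beta matching <psi(0)|H|psi(0)>, via bisection
E0 = psi0 @ H @ psi0

def canonical_energy(beta):
    w = np.exp(-beta * (E - E.min()))  # shift by E.min() to avoid overflow
    return (w @ E) / w.sum()

lo, hi = 0.0, 20.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if canonical_energy(mid) > E0:
        lo = mid
    else:
        hi = mid
beta = 0.5 * (lo + hi)

w = np.exp(-beta * (E - E.min()))
thermal_avg = (w @ O_diag) / w.sum()

print(f"time average    = {time_avg:+.4f}")
print(f"thermal average = {thermal_avg:+.4f}")  # agree up to O(1/N) corrections
```

The infinite-time average reduces to the diagonal ensemble because, with no degeneracies, the off-diagonal phases e^{i(E_a - E_b)t} average to zero; the two printed numbers then agree up to the finite-size corrections the post describes.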

I agree that Mark Srednicki deserves the title high priest of the ETH. He and Joshua Deutsch independently dreamed up the ETH, in 1994 and 1991 respectively. Since numericists reexamined it in 2008, studies and applications of the ETH have exploded like a desert religion. Yet Mark had never encountered the question I posed about it in 2021. Next month’s blog post will share the good news about that question.


2Apart from trivial quantities, such as projectors onto eigenspaces of H.

July 02, 2023

Clifford JohnsonAnd so it begins…

There’s not much in this post, but I wanted to mark a significant date. It is the first day of the rest of 2023, but in addition, it is the beginning of a new chapter for me. Yesterday was my last day as an employee of the University of Southern … Click to continue reading this post

The post And so it begins… appeared first on Asymptotia.

June 29, 2023

Clifford JohnsonRattle and Hum

A lot of us have been waiting for a long time to hear this news! The NANOGrav collaboration has announced strong evidence of a background of low frequency gravitational waves emitted from supermassive black hole mergers. Their detection methods are pulsar timing arrays (still one of those fantastically simple, cool … Click to continue reading this post

The post Rattle and Hum appeared first on Asymptotia.