Planet Musings

April 26, 2024

n-Category Café Line Bundles on Complex Tori (Part 3)

You thought this series was dead. But it was only dormant!

In Part 1, I explained how the classification of holomorphic line bundles on a complex torus X breaks into two parts:

  • the ‘discrete part’: their underlying topological line bundles are classified by elements of a free abelian group called the Néron–Severi group NS(X).

  • the ‘continuous part’: the holomorphic line bundles with a given underlying topological line bundle are classified by elements of a complex torus called the Jacobian Jac(X).

In Part 2, I explained duality for complex tori, which is a spinoff of duality for complex vector spaces. I used this to give several concrete descriptions of the Néron–Severi group NS(X).

But the fun for me lies in the examples. Today let’s actually compute a Néron–Severi group and begin seeing how it leads to this remarkable picture by Roice Nelson:

This is joint work with James Dolan.

The most interesting complex tori are the complex abelian varieties. These are not just complex manifolds: they’re projective varieties, so the ideas of algebraic geometry apply! To be precise, a complex abelian variety is an abelian group object in the category of smooth complex projective varieties.

If you want to learn the general theory, I recommend this:

  • Christina Birkenhake and Herbert Lange, Complex Abelian Varieties, Springer, Berlin, 2013.

It’s given me more pleasure than any book I’ve read for a long time. One reason is that it ties the theory nicely to ideas from physics, like the Heisenberg group and — without coming out and saying so — geometric quantization. Another is that abelian varieties are a charming, safe playground for beginners in algebraic geometry. You can easily compute things, classify things, and so on. It really amounts to linear algebra where all your vector spaces have lattices in them.

But instead of talking about general theorems, I’d like to look at an interesting example.

Every 1-dimensional complex torus can be made into an abelian variety: 1-dimensional abelian varieties are called elliptic curves, and everyone loves them. In higher dimensions the story is completely different: most complex tori can’t be made into abelian varieties! So, a lot of interesting phenomena are first seen in dimension 2. 2-dimensional abelian varieties are called complex abelian surfaces.

Here’s a cheap way to get our hands on an abelian surface: take the product of two elliptic curves. It’s tempting to use one of the two most symmetrical elliptic curves:

  • The Gaussian curve ℂ/𝔾, where

\mathbb{G} = \{ a + b i \;\vert\; a, b \in \mathbb{Z} \}

is called the Gaussian integers because it’s the ring of algebraic integers in the field ℚ[i].

  • The Eisenstein curve ℂ/𝔼, where

\mathbb{E} = \{ a + b \omega \;\vert\; a, b \in \mathbb{Z} \}

and ω is the cube root of unity exp(2πi/3). 𝔼 is called the Eisenstein integers because it’s the ring of algebraic integers in the field ℚ[ω].

The Gaussian integers form a square lattice:

while the Eisenstein integers form an equilateral triangular lattice:

There are no other lattices in the plane as symmetrical as these, though there are interesting runners-up coming from algebraic integers in other fields ℚ[√−n].

Since the Eisenstein curve has 6-fold symmetry while the Gaussian curve has only 4-fold symmetry, let’s go all out and form an abelian surface by taking a product of two copies of the Eisenstein curve! I’ll call it the Eisenstein surface:

E = \mathbb{C}/\mathbb{E} \times \mathbb{C}/\mathbb{E} = \mathbb{C}^2/\mathbb{E}^2

What are the symmetries of this? Like any complex torus, it acts on itself by translations. These are incredibly important, but they don’t preserve the group structure because they move the origin around. When we talk about morphisms of abelian varieties, we usually mean maps of varieties that also preserve the group structure. So what are the automorphisms of EE as an abelian variety?

Well, actually it’s nice to think about endomorphisms of E as an abelian variety. Suppose T ∈ M_2(𝔼) is any 2×2 matrix of Eisenstein integers. Then T acts on ℂ² in a linear way, by matrix multiplication. It obviously maps the lattice 𝔼² ⊂ ℂ² to itself. So it defines an endomorphism of ℂ²/𝔼². In other words, it gives an endomorphism of the Eisenstein surface as an abelian variety!
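If you want to see this concretely, here is a minimal numerical check (my own sketch, not from the post): it picks an arbitrary 2×2 matrix of Eisenstein integers and verifies that it sends a couple of lattice vectors back into 𝔼². The particular matrix and test vectors are just illustrative choices.

```python
import numpy as np

# Minimal sketch: a matrix T in M_2(E) acts on C^2 and carries the lattice E^2
# into itself, so it descends to an endomorphism of the Eisenstein surface.
# The matrix T and the test vectors below are arbitrary choices.

omega = np.exp(2j * np.pi / 3)            # primitive cube root of unity

def eis(a, b):
    """The Eisenstein integer a + b*omega."""
    return a + b * omega

def in_lattice(z, tol=1e-9):
    """Is the complex number z of the form a + b*omega with a, b integers?"""
    b = z.imag / omega.imag
    a = z.real - b * omega.real
    return abs(a - round(a)) < tol and abs(b - round(b)) < tol

T = np.array([[eis(2, 1), eis(0, -1)],
              [eis(1, 1), eis(3, 0)]])    # an element of M_2(E)

for v in [np.array([eis(1, 0), eis(0, 1)]),
          np.array([eis(-2, 3), eis(4, -1)])]:
    w = T @ v                             # image of a lattice vector
    print(all(in_lattice(z) for z in w))  # expect True both times
```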

It’s not hard to see that these are all we get. So

\mathrm{End}(E) = \mathrm{M}_2(\mathbb{E})

Note that these endomorphisms form a ring: not only can you multiply them (i.e. compose them), you can also add them pointwise. Indeed any abelian variety has a ring of endomorphisms for the same reason, and these rings are very important in the overall theory.

Among the endomorphisms are the automorphisms, and I believe the automorphism group of the Eisenstein surface is

\mathrm{Aut}(E) = \mathrm{GL}(2,\mathbb{E})

This is an infinite group because it contains ‘shears’ like

\left( \begin{matrix} 1 & 1 \\ 0 & 1 \end{matrix} \right)

Now, what about line bundles on the Eisenstein surface E? Let’s sketch how to figure out its Néron–Severi group NS(E). Remember, this is a coarse classification of holomorphic line bundles where two count as the same if they are topologically isomorphic. Thus we get a discrete classification, not a ‘moduli space’.

I described the Néron–Severi group in a bunch of ways in Part 2. Here’s the one we want now. If X is a complex torus we can write

X = V/L

where V is a finite-dimensional complex vector space and L is a lattice in V. The vector space V has a dual V*, defined in the usual way, and the lattice L ⊂ V also has a dual L* ⊂ V*, defined in a different way:

L^\ast = \{ f \colon V \to \mathbb{R} \;:\; f \text{ is real-linear and } f(v) \in \mathbb{Z} \text{ for all } v \in L \}

Then we saw something I called Theorem 2':

Theorem 2'. The Néron–Severi group NS(X) consists of linear maps h: V → V* that map L into L* and have h* = h.

The point here is that any linear map f: V → W has an adjoint f*: W* → V*, so the map h has an adjoint h*: V** → V*, but the double dual of V is canonically isomorphic to V itself, so with a nod and a wink we can write h*: V → V*, so it makes sense to say h* = h.

You may be slightly dazed now — are you seeing stars? Luckily, all of this becomes less confusing in our actual example where V = ℂ² and L = 𝔼², since the standard inner product on ℂ² lets us identify this vector space with its dual, and — check this out, this part is not quite trivial — that lets us identify the lattice 𝔼 with its dual!

So, the Néron–Severi group NS(E) of the Eisenstein surface E = ℂ²/𝔼² consists of 2×2 complex matrices that map 𝔼² to itself and are self-adjoint!

But it’s even simpler than that, since 2×2 complex matrices that map 𝔼² to itself are just 2×2 matrices of Eisenstein integers. The set of these is our friend M_2(𝔼). But now we want the self-adjoint ones. I’ll denote the set of these by 𝔥_2(𝔼). Here the gothic 𝔥 stands for ‘hermitian’.

So we’ve figured out the Néron–Severi group of the Eisenstein surface. It consists of 2×2 hermitian matrices of Eisenstein integers:

NS(E) = \mathfrak{h}_2(\mathbb{E}) \; !

Now let’s try to visualize it.

The fun part

I’ll dig into this more next time, but let me state the marvelous facts now, just to whet your appetite. The space of all complex 2×2 self-adjoint matrices, called 𝔥_2(ℂ), is famous in physics. It’s 4-dimensional — and it’s a nice way of thinking about Minkowski spacetime, our model of spacetime in special relativity.

Sitting inside Minkowski spacetime, we now see the lattice 𝔥_2(𝔼) of 2×2 self-adjoint matrices with Eisenstein integer entries. It’s a very nice discretization of spacetime.

It’s a bit hard to visualize 4-dimensional things. So let’s look at 2×2 self-adjoint matrices whose determinant is 1 and whose trace is positive. These form a 3-dimensional hyperboloid in Minkowski spacetime, called hyperbolic space. And it’s no hyperbole to say that this is a staggeringly beautiful alternative to 3-dimensional Euclidean space. It’s negatively curved, so lines that start out parallel get further and further apart in an exponential way as they march along. There’s a lot more room in hyperbolic space — a lot of room for fun.

What happens if we look at points in our lattice 𝔥_2(𝔼) that happen to lie in hyperbolic space? I believe we get the centers of the hexagons in this picture:

And I believe the other features of this picture arise from other relationships between 𝔥_2(𝔼) and hyperbolic space. There’s a lot to check here. Greg Egan has made a lot of progress, but I’ll talk about that next time.
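If you want to start checking numerically, here is a brute-force sketch (mine, not from the post, and not Egan’s code): it enumerates 2×2 hermitian matrices with Eisenstein integer entries, keeps the ones with determinant 1 and positive trace, and converts each to Minkowski coordinates, so the output is a finite batch of lattice points lying on the hyperbolic-space hyperboloid. The search bound R is an arbitrary assumption.

```python
import numpy as np
from itertools import product

# Sketch: elements of h_2(E) are matrices [[a, z], [conj(z), d]] with a, d
# ordinary integers and z = m + n*omega an Eisenstein integer.  We keep those
# with det = 1 and trace > 0 and map them to Minkowski coordinates.

omega = np.exp(2j * np.pi / 3)
R = 4                                          # arbitrary search bound

points = []
for a, d, m, n in product(range(-R, R + 1), repeat=4):
    det = a * d - (m * m - m * n + n * n)      # a*d - |m + n*omega|^2
    if det == 1 and a + d > 0:
        z = m + n * omega
        t, w = (a + d) / 2, (a - d) / 2        # time and one space coordinate
        x, y = z.real, z.imag                  # the other two space coordinates
        # det equals the Minkowski norm t^2 - x^2 - y^2 - w^2
        assert abs(t * t - x * x - y * y - w * w - 1) < 1e-9
        points.append((t, x, y, w))

print(len(points), "lattice points on the hyperboloid; first few:", points[:3])
```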

One last thing. I showed you that elements of 𝔥_2(𝔼) correspond to topological isomorphism classes of holomorphic line bundles on the Eisenstein surface. Then I showed you a cool picture of a subset of 𝔥_2(𝔼), namely the elements with determinant 1 and trace > 0. But what’s the importance of these? Am I focusing on them merely to get a charismatic picture in hyperbolic space?

No: it turns out that these elements correspond to something really nice: principal polarizations of the Eisenstein surface! These come from the very best line bundles, in a certain precise sense.

April 25, 2024

Terence Tao Notes on the B+B+t theorem

A recent paper of Kra, Moreira, Richter, and Robertson established the following theorem, resolving a question of Erdös. Given a discrete amenable group {G = (G,+)}, and a subset {A} of {G}, we define the Banach density of {A} to be the quantity

\displaystyle  \sup_\Phi \limsup_{N \rightarrow \infty} |A \cap \Phi_N|/|\Phi_N|,

where the supremum is over all Følner sequences {\Phi = (\Phi_N)_{N=1}^\infty} of {G}. Given a set {B} in {G}, we define the restricted sumset {B \oplus B} to be the set of all sums {b_1+b_2} where {b_1, b_2} are distinct elements of {B}.
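As a small finite illustration (my own sketch, not from the paper), here is the restricted sumset and the containment {B \oplus B + t \subset A} checked by brute force for toy subsets of the integers; the sets {A}, {B} and the shift {t} are arbitrary choices.

```python
from itertools import combinations

def restricted_sumset(B):
    """All sums b1 + b2 over pairs of distinct elements of B."""
    return {b1 + b2 for b1, b2 in combinations(B, 2)}

A = set(range(0, 1000, 2))     # the even numbers below 1000
B = {2, 6, 14, 30}             # a finite stand-in for the infinite set B
t = 0

shifted = {s + t for s in restricted_sumset(B)}
print(sorted(shifted))
print("B (+) B + t contained in A?", shifted <= A)   # True: sums of evens are even
```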

Theorem 1 Let {G} be a countably infinite abelian group with the index {[G:2G]} finite. Let {A} be a positive Banach density subset of {G}. Then there exists an infinite set {B \subset A} and {t \in G} such that {B \oplus B + t \subset A}.

Strictly speaking, the main result of Kra et al. only claims this theorem for the case of the integers {G={\bf Z}}, but as noted in the recent preprint of Charamaras and Mountakis, the argument in fact applies for all countable abelian {G} in which the subgroup {2G := \{ 2x: x \in G \}} has finite index. This condition is in fact necessary (as observed by forthcoming work of Ethan Acklesberg): if {2G} has infinite index, then one can find a subgroup {H_j} of {G} of index {2^j} for any {j \geq 1} that contains {2G} (or equivalently, {G/H_j} is {2}-torsion). If one lets {y_1,y_2,\dots} be an enumeration of {G}, one can then check that the set

\displaystyle  A := G \backslash \bigcup_{j=1}^\infty (H_{j+1} + y_j) \backslash \{y_1,\dots,y_j\}

has positive Banach density, but does not contain any set of the form {B \oplus B + t} for any {t} (indeed, from the pigeonhole principle and the {2}-torsion nature of {G/H_{j+1}} one can show that {B \oplus B + y_j} must intersect {H_{j+1} + y_j \backslash \{y_1,\dots,y_j\}} whenever {B} has cardinality larger than {j 2^{j+1}}). It is also necessary to work with restricted sums {B \oplus B} rather than full sums {B+B}: a counterexample to the latter is provided for instance by the example with {G = {\bf Z}} and {A := \bigcup_{j=1}^\infty [10^j, 1.1 \times 10^j]}. Finally, the presence of the shift {t} is also necessary, as can be seen by considering the example of {A} being the odd numbers in {G ={\bf Z}}, though in the case {G=2G} one can of course delete the shift {t} at the cost of giving up the containment {B \subset A}.

Theorem 1 resembles other theorems in density Ramsey theory, such as Szemerédi’s theorem, but with the notable difference that the pattern located in the dense set {A} is infinite rather than merely arbitrarily large but finite. As such, it does not seem that this theorem can be proven by purely finitary means. However, one can view this result as the conjunction of an infinite number of statements, each of which is a finitary density Ramsey theory statement. To see this, we need some more notation. Observe from Tychonoff’s theorem that the collection {2^G := \{ B: B \subset G \}} is a compact topological space (with the topology of pointwise convergence) (it is also metrizable since {G} is countable). Subsets {{\mathcal F}} of {2^G} can be thought of as properties of subsets of {G}; for instance, the property of a subset {B} of {G} being finite is of this form, as is the complementary property of being infinite. A property of subsets of {G} can then be said to be closed or open if it corresponds to a closed or open subset of {2^G}. Thus, a property is closed if and only if it is closed under pointwise limits, and a property is open if, whenever a set {B} has this property, then any other set {B'} that shares a sufficiently large (but finite) initial segment with {B} will also have this property. Since {2^G} is compact and Hausdorff, a property is closed if and only if it is compact.

The properties of being finite or infinite are neither closed nor open. Define a smallness property to be a closed (or compact) property of subsets of {G} that is only satisfied by finite sets; the complement to this is a largeness property, which is an open property of subsets of {G} that is satisfied by all infinite sets. (One could also choose to impose other axioms on these properties, for instance requiring a largeness property to be an upper set, but we will not do so here.) Examples of largeness properties for a subset {B} of {G} include:

  • {B} has at least {10} elements.
  • {B} is non-empty and has at least {b_1} elements, where {b_1} is the smallest element of {B}.
  • {B} is non-empty and has at least {b_{b_1}} elements, where {b_n} is the {n^{\mathrm{th}}} element of {B}.
  • {T} halts when given {B} as input, where {T} is a given Turing machine that halts whenever given an infinite set as input. (Note that this encompasses the preceding three examples as special cases, by selecting {T} appropriately.)
We will call a set obeying a largeness property {{\mathcal P}} an {{\mathcal P}}-large set.
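As a concrete illustration of openness (my own sketch, not from the post), here is the second property above written as a test on a finite initial segment of {B}: once some finite segment witnesses the property, every set sharing that segment also has it.

```python
def is_large_so_far(segment):
    """Second largeness property above: non-empty with at least b_1 elements,
    where b_1 is the smallest element.  `segment` is a sorted finite initial
    segment of the set B; returning True means the property is already
    witnessed by this segment (so any extension of it also has the property)."""
    return len(segment) > 0 and len(segment) >= segment[0]

print(is_large_so_far([3, 10, 17]))          # True: needs 3 elements, has 3
print(is_large_so_far([5, 10, 17]))          # not yet witnessed by this segment
print(is_large_so_far([5, 10, 17, 20, 31]))  # True once the segment is long enough
```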

Theorem 1 is then equivalent to the following “almost finitary” version (cf. this previous discussion of almost finitary versions of the infinite pigeonhole principle):

Theorem 2 (Almost finitary form of main theorem) Let {G} be a countably infinite abelian group with {[G:2G]} finite. Let {\Phi_n} be a Følner sequence in {G}, let {\delta>0}, and let {{\mathcal P}_t} be a largeness property for each {t \in G}. Then there exists {N} such that if {A \subset G} is such that {|A \cap \Phi_n| / |\Phi_n| \geq \delta} for all {n \leq N}, then there exists a shift {t \in G} and {A} contains a {{\mathcal P}_t}-large set {B} such that {B \oplus B + t \subset A}.

Proof of Theorem 2 assuming Theorem 1. Let {G, \Phi_n}, {\delta}, {{\mathcal P}_t} be as in Theorem 2. Suppose for contradiction that Theorem 2 failed, then for each {N} we can find {A_N} with {|A_N \cap \Phi_n| / |\Phi_n| \geq \delta} for all {n \leq N}, such that there is no {t} and {{\mathcal P}_t}-large {B} such that {B, B \oplus B + t \subset A_N}. By compactness, a subsequence of the {A_N} converges pointwise to a set {A}, which then has Banach density at least {\delta}. By Theorem 1, there is an infinite set {B} and a {t} such that {B, B \oplus B + t \subset A}. By openness, we conclude that there exists a finite {{\mathcal P}_t}-large set {B'} contained in {B}, thus {B', B' \oplus B' + t \subset A}. This implies that {B', B' \oplus B' + t \subset A_N} for infinitely many {N}, a contradiction.

Proof of Theorem 1 assuming Theorem 2. Let {G, A} be as in Theorem 1. If the claim failed, then for each {t}, the property {{\mathcal P}_t} of being a set {B} for which {B, B \oplus B + t \subset A} would be a smallness property. By Theorem 2, we see that there is a {t} and a {B} obeying the complement of this property such that {B, B \oplus B + t \subset A}, a contradiction.

Remark 3 Define a relation {R} between {2^G} and {2^G \times G} by declaring {A\ R\ (B,t)} if {B \subset A} and {B \oplus B + t \subset A}. The key observation that makes the above equivalences work is that this relation is continuous in the sense that if {U} is an open subset of {2^G \times G}, then the inverse image

\displaystyle R^{-1} U := \{ A \in 2^G: A\ R\ (B,t) \hbox{ for some } (B,t) \in U \}

is also open. Indeed, if {A\ R\ (B,t)} for some {(B,t) \in U}, then {B} contains a finite set {B'} such that {(B',t) \in U}, and then any {A'} that contains both {B'} and {B' \oplus B' + t} lies in {R^{-1} U}.

For each specific largeness property, such as the examples listed previously, Theorem 2 can be viewed as a finitary assertion (at least if the property is “computable” in some sense), but if one quantifies over all largeness properties, then the theorem becomes infinitary. In the spirit of the Paris-Harrington theorem, I would in fact expect some cases of Theorem 2 to be undecidable statements of Peano arithmetic, although I do not have a rigorous proof of this assertion.

Despite the complicated finitary interpretation of this theorem, I was still interested in trying to write the proof of Theorem 1 in some sort of “pseudo-finitary” manner, in which one can see analogies with finitary arguments in additive combinatorics. The proof of Theorem 1 that I give below the fold is my attempt to achieve this, although to avoid a complete explosion of “epsilon management” I will still use at one juncture an ergodic theory reduction from the original paper of Kra et al. that relies on such infinitary tools as the ergodic decomposition, the ergodic theorem, and the spectral theorem. Also some of the steps will be a little sketchy, and assume some familiarity with additive combinatorics tools (such as the arithmetic regularity lemma).

— 1. Proof of theorem —

The proof of Kra et al. proceeds by establishing the following related statement. Define a (length three) combinatorial Erdös progression to be a triple {(A,X_1,X_2)} of subsets of {G} such that there exists a sequence {n_j \rightarrow \infty} in {G} such that {A - n_j} converges pointwise to {X_1} and {X_1-n_j} converges pointwise to {X_2}. (By {n_j \rightarrow \infty}, we mean with respect to the cocompact filter; that is, that for any finite (or, equivalently, compact) subset {K} of {G}, {n_j \not \in K} for all sufficiently large {j}.)

Theorem 4 (Combinatorial Erdös progression) Let {G} be a countably infinite abelian group with {[G:2G]} finite. Let {A} be a positive Banach density subset of {G}. Then there exists a combinatorial Erdös progression {(A,X_1,X_2)} with {0 \in X_1} and {X_2} non-empty.

Let us see how Theorem 4 implies Theorem 1. Let {G, A, X_1, X_2, n_j} be as in Theorem 4. By hypothesis, {X_2} contains an element {t} of {G}, thus {0 \in X_1} and {t \in X_2}. Setting {b_1} to be a sufficiently large element of the sequence {n_1, n_2, \dots}, we conclude that {b_1 \in A} and {b_1 + t \in X_1}. Setting {b_2} to be an even larger element of this sequence, we then have {b_2, b_2+b_1+t \in A} and {b_2 +t \in X_1}. Setting {b_3} to be an even larger element, we have {b_3, b_3+b_1+t, b_3+b_2+t \in A} and {b_3 + t \in X_1}. Continuing in this fashion we obtain the desired infinite set {B}.
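Here is an idealized rendering of that greedy construction (my own sketch, not from the paper). To keep it runnable I use a toy Erdős progression in G = ℤ where membership is decidable — A = X_1 = the even numbers, t = 0, and n_j = 2j — so the convergence requirements hold trivially; the real argument only uses the membership tests appearing in the loop, which succeed for all sufficiently large n_j by the pointwise convergences.

```python
# Toy oracles for the sets appearing in the Erdős progression (assumptions).
in_A = lambda x: x % 2 == 0        # "x in A"
in_X1 = lambda x: x % 2 == 0       # "x in X_1"
t = 0
n_seq = (2 * j for j in range(1, 10**6))   # the sequence n_1, n_2, ... -> infinity

B = []
while len(B) < 5:
    b = next(n_seq)
    # Take b far enough along the sequence that b is in A, b + t is in X_1,
    # and b + b' + t is in A for every earlier b' in B.
    if in_A(b) and in_X1(b + t) and all(in_A(b + bp + t) for bp in B):
        B.append(b)

sums = {b1 + b2 + t for i, b1 in enumerate(B) for b2 in B[i + 1:]}
print("B =", B, "  B (+) B + t contained in A?", all(in_A(s) for s in sums))
```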

It remains to establish Theorem 4. The proof of Kra et al. converts this to a topological dynamics/ergodic theory problem. Define a topological measure-preserving {G}-system {(X,T,\mu)} to be a compact space {X} equipped with a Borel probability measure {\mu} as well as a measure-preserving action {T} of {G} on {X} by homeomorphisms. A point {a} in {X} is said to be generic for {\mu} with respect to a Følner sequence {\Phi} if one has

\displaystyle  \int_X f\ d\mu = \lim_{N \rightarrow \infty} {\bf E}_{n \in \Phi_N} f(T^n a)

for all continuous {f: X \rightarrow {\bf C}}. Define a (length three) dynamical Erdös progression to be a tuple {(a,x_1,x_2)} in {X} with the property that there exists a sequence {n_j \rightarrow \infty} such that {T^{n_j} a \rightarrow x_1} and {T^{n_j} x_1 \rightarrow x_2}.

Theorem 4 then follows from

Theorem 5 (Dynamical Erdös progression) Let {G} be a countably infinite abelian group with {[G:2G]} finite. Let {(X,T,\mu)} be a topological measure-preserving {G}-system, let {a} be a {\Phi}-generic point of {\mu} for some Følner sequence {\Phi}, and let {E} be a positive measure open subset of {X}. Then there exists a dynamical Erdös progression {(a,x_1,x_2)} with {x_1 \in E} and {x_2 \in \bigcup_{t \in G} T^t E}.

Indeed, we can take {X} to be {2^G}, {a} to be {A}, {T} to be the shift {T^n B := B-n}, {E := \{ B \in 2^G: 0 \in B \}}, and {\mu} to be a weak limit of the {\mathop{\bf E}_{n \in \Phi_N} \delta_{A-n}} for a Følner sequence {\Phi_N} with {\lim_{N \rightarrow \infty} |A \cap \Phi_N| / |\Phi_N| > 0}, at which point Theorem 4 follows from Theorem 5 after chasing definitions. (It is also possible to establish the reverse implication, but we will not need to do so here.)

A remarkable fact about this theorem is that the point {a} need not be in the support of {\mu}! (In a related vein, the elements {\Phi_j} of the Følner sequence are not required to contain the origin.)

Using a certain amount of ergodic theory and spectral theory, Kra et al. were able to reduce this theorem to a special case:

Theorem 6 (Reduction) To prove Theorem 5, it suffices to do so under the additional hypotheses that {X} is ergodic, and there is a continuous factor map to the Kronecker factor. (In particular, the eigenfunctions of {X} can be taken to be continuous.)

We refer the reader to the paper of Kra et al. for the details of this reduction. Now we specialize for simplicity to the case where {G = {\bf F}_p^\omega = \bigcup_N {\bf F}_p^N} is a countable vector space over a finite field of size equal to an odd prime {p}, so in particular {2G=G}; we also specialize to Følner sequences of the form {\Phi_j = x_j + {\bf F}_p^{N_j}} for some {x_j \in G} and {N_j \geq 1}. In this case we can prove a stronger statement:

Theorem 7 (Odd characteristic case) Let {G = {\bf F}_p^\omega} for an odd prime {p}. Let {(X,T,\mu)} be a topological measure-preserving {G}-system with a continuous factor map to the Kronecker factor, and let {E_1, E_2} be open subsets of {X} with {\mu(E_1) + \mu(E_2) > 1}. Then if {a} is a {\Phi}-generic point of {\mu} for some Følner sequence {\Phi_j = y_j + {\bf F}_p^{n_j}}, there exists an Erdös progression {(a,x_1,x_2)} with {x_1 \in E_1} and {x_2 \in E_2}.

Indeed, in the setting of Theorem 5 with the ergodicity hypothesis, the set {\bigcup_{t \in G} T^t E} has full measure, so the hypothesis {\mu(E_1)+\mu(E_2) > 1} of Theorem 7 will be verified in this case. (In the case of more general {G}, this hypothesis ends up being replaced with {\mu(E_1)/[G:2G] + \mu(E_2) > 1}; see Theorem 2.1 of this recent preprint of Kousek and Radic for a treatment of the case {G={\bf Z}} (but the proof extends without much difficulty to the general case).)

As with Theorem 1, Theorem 7 is still an infinitary statement and does not have a direct finitary analogue (though it can likely be expressed as the conjunction of infinitely many such finitary statements, as we did with Theorem 1). Nevertheless we can formulate the following finitary statement which can be viewed as a “baby” version of the above theorem:

Theorem 8 (Finitary model problem) Let {X = (X,d)} be a compact metric space, let {G = {\bf F}_p^N} be a finite vector space over a field of odd prime order. Let {T} be an action of {G} on {X} by homeomorphisms, let {a \in X}, and let {\mu} be the associated {G}-invariant measure {\mu = {\bf E}_{x \in G} \delta_{T^x a}}. Let {E_1, E_2} be subsets of {X} with {\mu(E_1) + \mu(E_2) > 1 + \delta} for some {\delta>0}. Then for any {\varepsilon>0}, there exist {x_1 \in E_1, x_2 \in E_2} such that

\displaystyle  |\{ h \in G: d(T^h a,x_1) \leq \varepsilon, d(T^h x_1,x_2) \leq \varepsilon \}| \gg_{p,\delta,\varepsilon,X} |G|.

The important thing here is that the bounds are uniform in the dimension {N} (as well as the initial point {a} and the action {T}).

Let us now give a finitary proof of Theorem 8. We can cover the compact metric space {X} by a finite collection {B_1,\dots,B_M} of open balls of radius {\varepsilon/2}. This induces a coloring function {\tilde c: X \rightarrow \{1,\dots,M\}} that assigns to each point in {X} the index {m} of the first ball {B_m} that covers that point. This then induces a coloring {c: G \rightarrow \{1,\dots,M\}} of {G} by the formula {c(h) := \tilde c(T^h a)}. We also define the pullbacks {A_i := \{ h \in G: T^h a \in E_i \}} for {i=1,2}. By hypothesis, we have {|A_1| + |A_2| > (1+\delta)|G|}, and it will now suffice by the triangle inequality to show that

\displaystyle  |\{ h \in G: c(h) = c(x_1); c(h+x_1)=c(x_2) \}| \gg_{p,\delta,M} |G|.

Now we apply the arithmetic regularity lemma of Green with some regularity parameter {\kappa>0} to be chosen later. This allows us to partition {G} into cosets of a subgroup {H} of index {O_{p,\kappa}(1)}, such that on all but {\kappa [G:H]} of these cosets {y+H}, all the color classes {\{x \in y+H: c(x) = c_0\}} are {\kappa^{100}}-regular in the Fourier ({U^2}) sense. Now we sample {x_1} uniformly from {G}, and set {x_2 := 2x_1}; as {p} is odd, {x_2} is also uniform in {G}. If {x_1} lies in a coset {y+H}, then {x_2} will lie in {2y+H}. By removing an exceptional event of probability {O(\kappa)}, we may assume that neither of these cosets {y+H}, {2y+H} is a bad coset. By removing a further exceptional event of probability {O_M(\kappa)}, we may also assume that {x_1} is in a popular color class of {y+H} in the sense that

\displaystyle  |\{ x \in y+H: c(x) = c(x_1) \}| \geq \kappa |H| \ \ \ \ \ (1)

since the set of exceptional {x_1} that fail to achieve this is only hit with probability {O(M\kappa)}. Similarly we may assume that

\displaystyle  |\{ x \in 2y+H: c(x) = c(x_2) \}| \geq \kappa |H|. \ \ \ \ \ (2)

Now we consider the quantity

\displaystyle  |\{ h \in y+H: c(h) = c(x_1); c(h+x_1)=c(x_2) \}|

which we can write as

\displaystyle  |H| {\bf E}_{h \in y+H} 1_{c^{-1}(c(x_1))}(h) 1_{c^{-1}(c(x_2))}(h+x_1).

Both factors here are {O(\kappa^{100})}-uniform in their respective cosets. Thus by standard Fourier calculations, we see that after excluding another exceptional event of probability {O(\kappa)}, this quantity is equal to

\displaystyle  |H| (({\bf E}_{h \in y+H} 1_{c^{-1}(c(x_1))}(h)) ({\bf E}_{h \in y+H} 1_{c^{-1}(c(x_2))}(h+x_1)) + O(\kappa^{10})).

By (1), (2), this expression is {\gg \kappa^2 |H| \gg_{p,\kappa} |G|}. By choosing {\kappa} small enough depending on {M,\delta}, we can ensure that {x_1 \in E_1} and {x_2 \in E_2}, and the claim follows.

Now we can prove the infinitary result in Theorem 7. Let us place a metric {d} on {X}. By sparsifying the Følner sequence {\Phi_j = y_j + {\bf F}_p^{N_j}}, we may assume that the {N_j} grow as fast as we wish. Once we do so, we claim that for each {J}, we can find {x_{1,J}, x_{2,J} \in X} such that for each {1 \leq j \leq J}, there exists {n_j \in \Phi_j} that lies outside of {{\bf F}_p^j} such that

\displaystyle  d(T^{n_j} a, x_{1,J}) \leq 1/j, \quad d(T^{n_j} x_{1,J}, x_{2,J}) \leq 1/j.

Passing to a subsequence to make {x_{1,J}, x_{2,J}} converge to {x_1, x_2} respectively, we obtain the desired Erdös progression.

Fix {J}, and let {M} be a large parameter (much larger than {J}) to be chosen later. By genericity, we know that the discrete measures {{\bf E}_{h \in \Phi_M} \delta_{T^h a}} converge vaguely to {\mu}, so any point in the support of {\mu} can be approximated by some point {T^h a} with {h \in \Phi_M}. Unfortunately, {a} does not necessarily lie in this support! (Note that {\Phi_M} need not contain the origin.) However, we are assuming a continuous factor map {\pi:X \rightarrow Z} to the Kronecker factor {Z}, which is a compact abelian group, and {\mu} pushes down to the Haar measure of {Z}, which has full support. In particular, this support contains {\pi(a)}. As a consequence, we can find {h_M \in \Phi_M} such that {\pi(T^{h_M} a)} converges to {\pi(a)}, even if we cannot ensure that {T^{h_M} a} converges to {a}. We are assuming that {\Phi_M} is a coset of {{\bf F}_p^{n_M}}, so now {{\bf E}_{h \in {\bf F}_p^{n_M}} \delta_{T^{h+h_M} a}} converges vaguely to {\mu}.

We make the random choice {x_{1,J} := T^{h_*+h_M} a}, {x_{2,J} := T^{2h_*+h_M} a}, where {h_*} is drawn uniformly at random from {{\bf F}_p^{n_M}}. This is not the only possible choice that can be made here, and is in fact not optimal in certain respects (in particular, it creates a fair bit of coupling between {x_{1,J}}, {x_{2,J}}), but is easy to describe and will suffice for our argument. (A more appropriate choice, closer to the arguments of Kra et al., would be to replace {x_{2,J}} in the above construction by {T^{2h_*+k_*+h_M} a}, where the additional shift {k_*} is a random variable in {{\bf F}_p^{n_M}} independent of {h_*} that is uniformly drawn from all shifts annihilated by the first {M} characters associated to some enumeration of the (necessarily countable) point spectrum of {T}, but this is harder to describe.)

Since we are in odd characteristic, the map {h \mapsto 2h} is a permutation on {h \in {\bf F}_p^{n_M}}, and so {x_{1,J}}, {x_{2,J}} are both distributed according to the law {{\bf E}_{h \in {\bf F}_p^{n_M}} \delta_{T^{h+h_M} a}}, though they are coupled to each other. In particular, by vague convergence (and inner regularity) we have

\displaystyle  {\bf P}( x_{1,J} \in E_1 ) \geq \mu(E_1) - o(1)

and

\displaystyle  {\bf P}( x_{2,J} \in E_2 ) \geq \mu(E_2) - o(1)

where {o(1)} denotes a quantity that goes to zero as {M \rightarrow \infty} (holding all other parameters fixed). By the hypothesis {\mu(E_1)+\mu(E_2) > 1}, we thus have

\displaystyle  {\bf P}( x_{1,J} \in E_1, x_{2,J} \in E_2 ) \geq \kappa - o(1) \ \ \ \ \ (3)

for some {\kappa>0} independent of {M}.

We will show that for each {1 \leq j \leq J}, one has

\displaystyle  |\{ h \in \Phi_j: d(T^{h} a,x_{1,J}) \leq 1/j, d(T^h x_{1,J},x_{2,J}) \leq 1/j \}| \ \ \ \ \ (4)

\displaystyle  \gg_{p,\kappa,j,X} (1-o(1)) |\Phi_j|

outside of an event of probability at most {\kappa/2^{j+1}+o(1)} (compare with Theorem 8). If this is the case, then by the union bound we can find (for {M} large enough) a choice of {x_{1,J}}, {x_{2,J}} obeying (3) as well as (4) for all {1 \leq j \leq J}. If the {N_j} grow fast enough, we can then ensure that for each {1 \leq j \leq J} one can find (again for {M} large enough) {n_j} in the set in (4) that avoids {{\bf F}_p^j}, and the claim follows.

It remains to show (4) outside of an exceptional event of acceptable probability. Let {\tilde c: X \rightarrow \{1,\dots,M_j\}} be the coloring function from the proof of Theorem 8 (with {\varepsilon := 1/j}). Then it suffices to show that

\displaystyle  |\{ h \in \Phi_j: c_0(h) = c(h_*); c(h+h_*)=c(2h_*) \}| \gg_{p,\kappa,M_j} (1-o(1)) |\Phi_j|

where {c_0(h) := \tilde c(T^h a)} and {c(h) := \tilde c(T^{h+h_M} a)}. This is a counting problem associated to the pattern {(h_*, h, h+h_*, 2h_*)}; if we concatenate the {h_*} and {2h_*} components of the pattern, this is a classic “complexity one” pattern, of the type that would be expected to be amenable to Fourier analysis (especially if one applies Cauchy-Schwarz to eliminate the {h_*} averaging and absolute value, at which point one is left with the {U^2} pattern {(h, h+h_*, h', h'+h_*)}).

In the finitary setting, we used the arithmetic regularity lemma. Here, we will need to use the Kronecker factor instead. The indicator function {1_{\tilde c^{-1}(i)}} of a level set of the coloring function {\tilde c} is a bounded measurable function of {X}, and can thus be decomposed into a function {f_i} that is measurable on the Kronecker factor, plus an error term {g_i} that is orthogonal to that factor and thus is weakly mixing in the sense that {|\langle T^h g_i, g_i \rangle|} tends to zero on average (or equivalently, that the Host-Kra seminorm {\|g_i\|_{U^2}} vanishes). Meanwhile, for any {\varepsilon > 0}, the Kronecker-measurable function {f_i} can be decomposed further as {P_{i,\varepsilon} + k_{i,\varepsilon}}, where {P_{i,\varepsilon}} is a bounded “trigonometric polynomial” (a finite sum of eigenfunctions) and {\|k_{i,\varepsilon}\|_{L^2} < \varepsilon}. The polynomial {P_{i,\varepsilon}} is continuous by hypothesis. The other two terms in the decomposition are merely measurable, but can be approximated to arbitrary accuracy by continuous functions. The upshot is that we can arrive at a decomposition

\displaystyle  1_{\tilde c^{-1}(i)} = P_{i,\varepsilon} + k_{i,\varepsilon,\varepsilon'} + g_{i,\varepsilon'}

(analogous to the arithmetic regularity lemma) for any {\varepsilon,\varepsilon'>0}, where {k_{i,\varepsilon,\varepsilon'}} is a bounded continuous function of {L^2} norm at most {\varepsilon}, and {g_{i,\varepsilon'}} is a bounded continuous function of {U^2} norm at most {\varepsilon'} (in practice we will take {\varepsilon'} much smaller than {\varepsilon}). Pulling back to {c}, we then have

\displaystyle  1_{c(h)=i} = P_{i,\varepsilon}(T^{h+h_M} a) + k_{i,\varepsilon,\varepsilon'}(T^{h+h_M}a) + g_{i,\varepsilon'}(T^{h+h_M}a). \ \ \ \ \ (5)

Let {\varepsilon,\varepsilon'>0} be chosen later. The trigonometric polynomial {h \mapsto P_{i,\varepsilon}(T^{h} a)} is just a sum of {O_{\varepsilon,M_j}(1)} characters on {G}, so one can find a subgroup {H} of {G} of index {O_{p,\varepsilon,M_j}(1)} such that these polynomials are constant on each coset of {H} for all {i}. Then {h_*} lies in some coset {a_*+H} and {2h_*} lies in the coset {2a_*+H}. We then restrict {h} to also lie in {a_*+H}, and we will show that

\displaystyle  |\{ h \in \Phi_j \cap (a_*+H): c_0(h) = c(h_*); c(h+h_*)=c(2h_*) \}| \ \ \ \ \ (6)

\displaystyle  \gg_{\kappa,p,M_j} (1-o(1)) |\Phi_j \cap (a_*+H)|

outside of an exceptional event of probability {\kappa/2+o(1)}, which will establish our claim because {\varepsilon} will ultimately be chosen to depend on {p,\kappa,M_j}.

The left-hand side can be written as

\displaystyle  \sum_{i,i'} \sum_{h \in \Phi_j \cap (a_*+H)} 1_{c_0(h)=i} 1_{c(h_*)=i, c(2h_*)=i'} 1_{c(h+h_*)=i'}.

The coupling of the constraints {c(h_*)=i} and {c(2h_*)=i'} is annoying (as {(h_*,2h_*)} is an “infinite complexity” pattern that cannot be controlled by any uniformity norm), but (perhaps surprisingly) will not end up causing an essential difficulty to the argument, as we shall see when we start eliminating the terms in this sum one at a time starting from the right.

We decompose the {1_{c(h+h_*)=i'}} term using (5):

\displaystyle  1_{c(h+h_*)=i'} = P_{i',\varepsilon}(T^{h+h_*+h_M} a) + k_{i,\varepsilon,\varepsilon'}(T^{h+h_*+h_M}a) + g_{i,\varepsilon'}(T^{h+h_*+h_M}a).

By Markov’s inequality, and removing an exceptional event of probability at most {\kappa/100}, we may assume that the {k_{i',\varepsilon,\varepsilon'}} have normalized {L^2} norm {O_{\kappa,M_j}(\varepsilon)} on both of these cosets {a_*+H, 2a_*+H}. As such, the contribution of {k_{i',\varepsilon,\varepsilon'}(T^{h+h_*+h_M}a)} to (6) becomes negligible if {\varepsilon} is small enough (depending on {\kappa,p,M_j}). From the near weak mixing of the {g_{i,\varepsilon'}}, we know that

\displaystyle {\bf E}_{h \in \Phi_j \cap (a_*+H)} |\langle T^h g_{i,\varepsilon'}, g_{i,\varepsilon'} \rangle| \ll_{p,\varepsilon,M_j} \varepsilon'

for all {i}, if we choose {\Phi_j} large enough. By genericity of {a}, this implies that

\displaystyle {\bf E}_{h \in \Phi_j \cap (a_*+H)} |{\bf E}_{l \in {\bf F}_p^{n_M}} g_{i,\varepsilon'}(T^{h+l+h_M} a) g_{i,\varepsilon'}(T^{l+h_M} a)| \ll_{p,\varepsilon,M_j} \varepsilon' + o(1).

From this and standard Cauchy-Schwarz (or van der Corput) arguments we can then show that the contribution of the {g_{i',\varepsilon'}(T^{h+h_*+h_M}a)} to (6) is negligible outside of an exceptional event of probability at most {\kappa/100+o(1)}, if {\varepsilon'} is small enough depending on {\kappa,p,M_j,\varepsilon}. Finally, the quantity {P_{i',\varepsilon}(T^{h+h_*+h_M} a)} is independent of {h}, and in fact is equal up to negligible error to the density of {c^{-1}(i')} in the coset {{\bf F}_p^{M_j}(2a_*+H)}. This density will be {\gg_{p,\kappa,M_j}} except for those {i'} which would have made a negligible impact on (6) in any event due to the rareness of the event {c(2h_*)=i'} in such cases. As such, to prove (6) it suffices to show that

\displaystyle  \sum_{i,i'} \sum_{h \in \Phi_j \cap (a_*+H)} 1_{c_0(h)=i} 1_{c(h_*)=i, c(2h_*)=i'} \gg_{\kappa,p,M_j} (1-o(1)) |\Phi_j \cap (a_*+H)|

outside of an event of probability {\kappa/100+o(1)}. Now one can sum in {i'} to simplify the above estimate to

\displaystyle  \sum_{i} 1_{c(h_*)=i} (\sum_{h \in \Phi_j \cap (a_*+H)} 1_{c_0(h)=i}) / |\Phi_j \cap (a_*+H)| \gg_{\kappa,p,M_j} 1-o(1).

If {i} is such that {(\sum_{h \in \Phi_j \cap (a_*+H)} 1_{c_0(h)=i})/|\Phi_j \cap (a_*+H)|} is small compared with {p,\kappa,M_j}, then by genericity (and assuming {\Phi_j} large enough), the probability that {c(h_*)=i} will similarly be small (up to {o(1)} errors), and thus have a negligible influence on the above sum. As such, the above estimate simplifies to

\displaystyle  \sum_{i} 1_{c(h_*)=i} \gg_{\kappa,p,M_j} 1-o(1).

But the left-hand side sums to one, and the claim follows.

Matt Strassler Why a Wave Function Can’t Hurt You

In recent talks at physics departments about my book, I have emphasized that the elementary “particles” of nature — electrons, photons, quarks and so on — are really little waves (or, to borrow a term that was suggested by Sir Arthur Eddington in the 1920s, “wavicles”.) But this notion inevitably generates confusion. That’s because of another wavy concept that arises in “quantum mechanics” —the quantum physics of the 1920s, taught to every physics student. That concept is Erwin Schrödinger’s famous “wave function”.

It’s natural to guess that wave functions and wavicles are roughly the same. In fact, however, they are generally unrelated.

Wavicles Versus Wave Functions

Before quantum physics came along, field theory was already used to predict the behavior of ordinary waves in ordinary settings. Field theory is useful for sound waves in air, seismic waves in rock, and waves on water.

Quantum field theory, the quantum physics that arose out of the 1940s and 1950s, adds something new: it tells us that waves in quantum fields are made from wavicles, the gentlest possible waves. A photon, for instance, is a wavicle of light — the dimmest possible flash of light.

By contrast, a wave function describes a system of objects operating according to quantum physics. Importantly, it’s not one wave function per object — it’s one wave function per system of interacting objects. That’s true whether the objects in the system are particles in motion, or something as simple as particles that cannot move, or something as complex as fields and their wavicles.

One of the points I like to make, to draw the distinction between these two types of waves in quantum physics, is this:

  • Wavicles can hurt you.
  • Wave functions cannot.

Daniel Whiteson, the well-known Large Hadron Collider physicist, podcaster and popular science writer, liked this phrasing so much that he quoted the second half on X/Twitter. Immediately there were protests. One person wrote “Everything that has ever hurt anyone was in truth a wave function.” Another posted a video of an unfortunate incident involving the collision between a baseball and a batter, and said: “the wave function of this baseball disagrees.”

It’s completely understandable why there’s widespread muddlement about this. We have two classes of waves floating around in quantum physics, and both of them are inherently confusing. My aim today is to make it clear why a wave function couldn’t hurt a fly, a cat, or even a particle.

The Basic Concepts

Wavicles, such as photons or electrons, are real objects. X-rays are a form of light, and are made of photons — wavicles of light. A strong beam of X-ray photons can hurt you. The photons travel across three-dimensional space carrying energy and momentum; they can strike your body, damage your DNA, and thereby cause you to develop cancer.

The wave function associated with the X-ray beam, however, is not an object. All it does is describe the beam and its possible futures. It tells us what the beam’s energy may be, but it doesn’t have any energy, and cannot inflict the beam’s energy on anything else. The wave function tells us where the beam may go, but itself goes nowhere. Though it describes a beam as it crosses ordinary three-dimensional space, the wave function does not itself exist in three-dimensional space.

In fact, if the X-ray beam is interacting with your body, then the X-ray beam cannot be said to have its own wave function. Instead, there is only one wave function — one that describes the beam of photons, your atoms, and the interactions between your atoms and the photons.

More generally, if a bunch of objects interact with each other, the multiple interacting objects form a single indivisible system, and a single wave function must describe it. The individual objects do not have separate wave functions.

Schrödinger’s Cat

This point is already illustrated by Schrödinger’s famous (albeit unrealistic) example of the cat in a box that is both dead and alive. The box contains a radioactive atom which will, via a quantum process, eventually “decay” [i.e. transform itself into a new type of atom, releasing a subatomic particle in the process], but may or may not have done so yet. If and when the atom does decay, it triggers the poisoning of the cat. The cat’s survival or demise thus depends on a quantum effect, and it becomes a party to a quantum phenomenon.

It would be a mistake to say that “the atom has a wave function” (or even worse, that “the atom is a wave function”) and that this wave function can kill the cat. To do so would miss Schrödinger’s point. Instead, the wave function includes the atom, the killing device, and the cat.

Initially, when the box is closed, the three are independent of one another, and so they have a relatively simple wave function which one may crudely sketch as

  • Wave Function = (atom undecayed) x (device off) x (cat alive)

This wave function represents our certainty that the atom has not yet decayed, the murder weapon has not been triggered, and the cat is still alive.

But this initial wave function immediately begins evolving into a more complicated form, one which depends on two time-varying complex numbers C and D, with |C|² + |D|² = 1:

  • Wave Function = C(t) x (atom undecayed) x (device off) x (cat alive) + D(t) x (atom decayed) x (device on) x (cat dead)

The wave function is now a sum of two “branches” which describe two distinct possibilities, and assigns them probabilities |C|² and |D|², the former gradually decreasing and the latter gradually increasing. [Note these two branches are added together in the wave function; its branches cannot be rearranged into wave functions for the atom, device and cat separately, nor can the two branches ever be separated from one another.]

In no sense has the wave function killed the cat; in one of its branches the cat is dead, but the other branch describes a live cat. And in no sense did the “wave function of the atom” or “of the device” kill the cat, because no such wave functions are well-defined in this interacting system.
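For concreteness, here is one toy choice (mine, not the author’s) of the time-varying amplitudes C(t) and D(t) discussed above: exponential decay of the ‘undecayed’ branch with an assumed lifetime τ, with D(t) fixed by the normalization |C|² + |D|² = 1.

```python
import numpy as np

# Toy amplitudes: C(t) = exp(-t / (2*tau)) so that |C(t)|^2 decays like
# exp(-t/tau), and D(t) chosen to keep |C|^2 + |D|^2 = 1.  tau is an assumption.

tau = 1.0                                    # mean lifetime of the atom (arbitrary units)
for t in [0.0, 0.5, 1.0, 2.0, 5.0]:
    C = np.exp(-t / (2 * tau))               # amplitude of the "alive" branch
    D = np.sqrt(1 - abs(C) ** 2)             # amplitude of the "dead" branch
    print(f"t = {t:3.1f}   P(alive) = {abs(C)**2:.3f}   P(dead) = {abs(D)**2:.3f}")
```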

A More Explicit Example

Let’s now look at an example, similar to the cat but more concrete, and easier to think about and draw.

Let’s take two particles [not wavicles] A and B. These particles travel only in a one-dimensional line, instead of in three-dimensional space.

Initially, particle B is roughly stationary and particle A comes flying toward it. There are two possible outcomes.

  • There is a 30% probability that A passes right by B without affecting it, in which case B simply says “hi” as A goes by.
  • There is a 70% probability that A strikes B head-on and bounces off of it, in which case B, recoiling from the blow, says “ow”.

In the second case, we may indeed say that A “hurts” B — at least in the sense of causing B to recoil suddenly.

The Classical Probabilities

Before we answer quantum questions, let’s first think about how one might describe this situation in a world without quantum physics. There are several ways of depicting what may happen.

Motion in One-Dimensional Physical Space

We could describe how the particles move within their one-dimensional universe, using arrows to illustrate their motions over time. In the figure below, I show both the “hi” possibility and the “ow” possibility.

Figure 1: (Top) With 30% probability, the Hi case: A (in blue) passes by B without interacting with it. (Bottom) With 70% probability, the Ow case: A strikes B, following which A rebounds and B recoils.

Or, using an animation, we can show the time-dependence more explicitly and more clearly. In the second case, I’ve assumed that B has more mass than A, so it recoils more slowly from the blow than does A.

Figure 2: Animation of Fig. 1, showing the Hi case in which A passes B, and the Ow case where A strikes B.

Motion in the Two-Dimensional Space of Possibilities

But we could also describe how the particles move as a system in their two-dimensional space of possibilities. Each point in that space tells us both where A is and where B is; the point’s location along the horizontal axis gives A’s position, and its location along the vertical axis gives B’s position. At each moment, the system is at one point in that space; over time, as A and B change their positions, the location of the system in that two-dimensional space also changes.

The motion of the system for the Hi and Ow cases is shown in Fig. 3. It has exactly the same information as Fig. 1, though depicted differently and somewhat more precisely. Instead of following the two dots that correspond to the two particles as they move in one dimension, we now depict the whole system as a single diamond that tells us where both particles are located.

In the first part of Fig. 3, we see that B’s position is at the center of the space, and remains there, while A’s position goes from the far right to the far left; compare to Fig. 1. In the second part of Fig. 3, A and B collide at the center, following which A moves to positive position, B moves to negative position, and correspondingly, within its space of possibilities, the system as a whole moves down and to the right.

Figure 3: How the A/B system moves through the space of possibilities. (Top) In the Hi case, A moves while B remains fixed at its central position. (Bottom) The Ow case is the same as the Hi case until A’s position reaches B’s position at the center; a collision then causes A to reverse course to a positive position, while B is driven to a negative position (which is downward in this graph.) The system as a whole thus moves down and to the right in the space of possibilities.

And finally, let’s look at an animation in the two-dimensional space of possibilities. Compare this to Fig. 3, and then to Fig. 2, noting that it has the same information.

Figure 4: Animation of Figure 3, showing (top) the Hi case in which A passes B and (bottom) the Ow case where A strikes B.

In Fig. 4, we see that

  • the system as a whole is represented as a single moving point in the space of possibilities
  • each of the two futures for the system are represented as separate time-dependent paths across the space of possibilities

The Quantum System

Now, what if the system is described using quantum physics? What’s up with the system’s wave function?

As noted, we do not have a wave function for particle A and a separate wave function for particle B, and so we do not have a collision of two wave functions. Instead, we have a single wave function for the A/B system, one which describes the collision of the two particles and the aftermath thereof.

It is impossible to depict the wave function using the one-dimensional universe that the particles live in. The wave function itself only exists in the space of possibilities. So in quantum physics, there are no analogues to Figs. 1 and 2.

Meanwhile, although we can depict the wave function at any one moment in the two dimensional space, we cannot simply use arrows to depict how it changes over time. This is because we cannot view the “Hi” and “Ow” cases as distinct, and as something we can draw in two separate figures, as we did in Figs. 3 and 4. In quantum physics, we have to view both possibilities as described by the same wave function; they are not distinct outcomes.

The only option we have is to do an animation in the two-dimensional space of possibilities, somewhat similar to Fig. 4, but without separating the Hi and Ow outcomes. There’s just one wave function that shows both the “Hi” and “Ow” cases together. The square of this wave function, which gives the probabilities for the system’s possible futures, is sketched in Fig. 5.

[Note that what is shown is merely a sketch! It is not the true wave function, which requires a complete solution of Schrödinger’s wave equation. While the solution is well known, it is tricky to get all the details of the math exactly right, and I haven’t had the time. I’ll try to add the complete and correct solution at a later date.]
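For readers who want to produce something like Fig. 5 themselves, here is a rough numerical sketch (mine, not the author’s code, and not the exact solution referred to above): it evolves a two-particle wave function ψ(x_A, x_B) on a periodic one-dimensional grid with an assumed short-range repulsion between A and B, using a split-step Fourier method. The grid, masses, packet widths and interaction strength are all arbitrary assumptions, so the precise 30%/70% split is not reproduced; only the qualitative branching is.

```python
import numpy as np

# Split-step evolution of psi(xA, xB) for two particles on a 1D line.
# All numbers below (grid, masses, momenta, interaction) are assumptions.

N, box = 256, 40.0
x = np.linspace(-box / 2, box / 2, N, endpoint=False)
dx = x[1] - x[0]
XA, XB = np.meshgrid(x, x, indexing="ij")     # the 2D space of possibilities

mA, mB, hbar = 1.0, 3.0, 1.0                  # B heavier than A, as in the figures

# Initial product state: A on the right moving left, B at rest near the center.
psi = np.exp(-(XA - 10.0) ** 2 / 4) * np.exp(-3.0j * XA) * np.exp(-XB ** 2 / 4)
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx * dx)

V = 8.0 * np.exp(-((XA - XB) / 0.5) ** 2)     # short-range repulsion between A and B

k = 2 * np.pi * np.fft.fftfreq(N, d=dx)
KA, KB = np.meshgrid(k, k, indexing="ij")
T = hbar ** 2 * (KA ** 2 / (2 * mA) + KB ** 2 / (2 * mB))   # kinetic energy in k-space

dt = 0.01
for _ in range(500):                          # half potential, full kinetic, half potential
    psi *= np.exp(-0.5j * V * dt / hbar)
    psi = np.fft.ifft2(np.exp(-1j * T * dt / hbar) * np.fft.fft2(psi))
    psi *= np.exp(-0.5j * V * dt / hbar)

prob = np.abs(psi) ** 2                       # the quantity sketched in Fig. 5
print("total probability:", prob.sum() * dx * dx)   # stays ~1 (the evolution is unitary)
```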

Compare Fig. 5 with Fig. 4, recalling that the probability of the Hi case is 30% and the probability of the Ow case is 70%. Both possibilities appear in the wave function, with the branch corresponding to the Ow case carrying larger weight than the branch corresponding to the Hi case.

Figure 5: A rough sketch of what the square of the wave function of the A/B system looks like; small-scale details are not modeled correctly. Note both Hi and Ow possibilities, and their relative probabilities, appear in the wave function. Compare with Fig. 4 and with the example of Schrödinger’s cat.

In contrast to Fig. 4, the key differences are that

  • the system is no longer represented as a point in the space of possibilities, but instead as a (broadened) set of possibilities
  • the wave function is complicated during the collision, and develops two distinct branches only after the collision
  • all possible futures for the system exist within the same wave function
    • this has the consequence that distinct future possibilities of the system could potentially affect each other at a later time — a concept which makes no sense in non-quantum physics
  • the probabilities of those distinct futures are given by the relative sizes of the wave function within the two branches.

Notice that even though particle A has a 70% probability of “hurting” B, the wave function itself does not, and cannot, “hurt” B. It just describes what may happen; it contains both A and B, and describes both the possibility of Hi and Ow. The wave function isn’t a part of the A/B system, and doesn’t participate in its activities. Instead, it exists outside the system, as a means for understanding that system’s behavior.

Summing Up

A system has a wave function, but individual objects in the system do not have wave functions. That’s the key point.

To be fair, it is true that when objects or groups of objects in a system interact weakly enough, we may imagine the system’s full wave function as though it were a simple combination of wave functions for each object or group of objects. That is true of the initial Schrödinger cat wave function, which is a product of separate factors for the atom, device and cat, and is also true of the wave function in Fig. 5 before the collision of A and B. But once significant interactions occur, this is no longer the case, as we see in the later-stage Schrödinger cat wave function and in Fig. 5 after the collision.

A wave function expresses how the overall system moves through the full space of its possibilities, and grows ever more complex when there are many possible paths for a system to take. This is completely unrelated to wavicles, which are objects that move through physical space and create physical phenomena, forming parts of a system that itself is described by a wave function.

A Final Note on Wave Functions

As a final comment: I’ve given this simple example because it’s one of the very few that one can draw start to finish.

Wave functions of systems with just one particle are misleading, because they make it easy to imagine that there is one wave function per particle. But with more than one particle, the only wave functions that can easily be depicted are those of two particles moving in one dimension, such as the one I have given you. Such examples offer a unique opportunity to clarify what a wave function is and isn’t, and it’s therefore crucial to appreciate them.

Any wave function more complicated than this becomes impossible to draw. Here are some things to consider.

  • I have only drawn the square of the wave function in Fig. 5. The full wave function is a complex function (i.e. a complex number at each point in the space of possibilities), and the contour plot I have used in Fig. 5 could only be used to draw its real part, its imaginary part, or its square. Thus even in this simple situation with a two-dimensional space of possibilities, the full wave function cannot easily be represented.
  • If we had four particles moving in one dimension instead of two, with positions x1, x2, x3 and x4 respectively, then the wave function would be a function of the four-dimensional space of possibilities, with coordinates x1, x2, x3, x4 . [The square of the wave function at each point in that space tells us the probability that particle 1 is at position x1, particle 2 is at position x2, and similarly for 3 and 4.] A function in four dimensions can be handled using math, but is impossible to draw.
  • If we had two particles moving in three dimensions, the first with position x1, y1, z1, and the second with position x2, y2, z2, the space of possibilities would be six-dimensional — x1, y1, z1, x2, y2, z2 . Again, this cannot be drawn.

These difficulties explain why one almost never sees a proper discussion of wave functions of complicated systems, and why wave functions of fields are almost never described and are never depicted.

April 24, 2024

John Preskill To thermalize, or not to thermalize, that is the question.

The Noncommuting-Charges World Tour (Part 3 of 4)

This is the third part of a four-part series covering the recent Perspective on noncommuting charges. I’ll post one part every ~6 weeks leading up to my PhD thesis defence. You can find Part 1 here and Part 2 here.

If Hamlet had been a system of noncommuting charges, his famous soliloquy may have gone like this…

To thermalize, or not to thermalize, that is the question:
Whether ’tis more natural for the system to suffer
The large entanglement of thermalizing dynamics,
Or to take arms against the ETH
And by opposing inhibit it. To die—to thermalize,
No more; and by thermalization to say we end
The dynamical symmetries and quantum scars
That complicate dynamics: ’tis a consummation
Devoutly to be wish’d. To die, to thermalize;
To thermalize, perchance to compute—ay, there’s the rub:
For in that thermalization our quantum information decoheres,
When our coherence has shuffled off this quantum coil,
Must give us pause—there’s the respect
That makes calamity of resisting thermalization.

Hamlet (the quantum steampunk edition)


In the original play, Hamlet grapples with the dilemma of whether to live or die. Noncommuting charges have a dilemma regarding whether they facilitate or impede thermalization. Among the five research opportunities highlighted in the Perspective article, resolving this debate is my favourite opportunity due to its potential implications for quantum technologies. A primary obstacle in developing scalable quantum computers is mitigating decoherence; here, thermalization plays a crucial role. If systems with noncommuting charges are shown to resist thermalization, they may contribute to quantum technologies that are more resistant to decoherence. Systems with noncommuting charges, such as spin systems and squeezed states of light, naturally occur in quantum computing models like quantum dots and optical approaches. This possibility is further supported by recent advances demonstrating that non-Abelian symmetric operations are universal for quantum computing (see references 1 and 2).

In this penultimate blog post of the series, I will review some results that argue both in favour of and against noncommuting charges hindering thermalization. This discussion includes content from Sections III, IV, and V of the Perspective article, along with a dash of some related works at the end—one I recently posted and another I recently found. The results I will review do not directly contradict one another because they arise from different setups. My final blog post will delve into the remaining parts of the Perspective article.

Playing Hamlet is like jury duty for actors–sooner or later, you’re getting the call (source).

Arguments for hindering thermalization

The first argument supporting the idea that noncommuting charges hinder thermalization is that they can reduce the production of thermodynamic entropy. In their study, Manzano, Parrondo, and Landi explore a collisional model involving two systems, each composed of numerous subsystems. In each “collision,” one subsystem from each system is randomly selected to “collide.” These subsystems undergo a unitary evolution during the collision and are subsequently returned to their original systems. The researchers derive a formula for the entropy production per collision within a certain regime (the linear-response regime). Notably, one term of this formula is negative if and only if the charges do not commute. Since thermodynamic entropy production is a hallmark of thermalization, this finding implies that systems with noncommuting charges may thermalize more slowly. Two other extensions support this result.

The second argument stems from an essential result in quantum computing. This result is that every algorithm you want to run on your quantum computer can be broken down into gates you run on one or two qubits (the building blocks of quantum computers). Marvian’s research reveals that this principle fails when dealing with charge-conserving unitaries. For instance, consider the charge as energy. Marvian’s results suggest that energy-preserving interactions between neighbouring qubits don’t suffice to construct all energy-preserving interactions across all qubits. The restrictions become more severe when dealing with noncommuting charges. Local interactions that preserve noncommuting charges impose stricter constraints on the system’s overall dynamics compared to commuting charges. These constraints could potentially reduce chaos, something that tends to lead to thermalization.

Adding to the evidence, we revisit the eigenstate thermalization hypothesis (ETH), which I discussed in my first post. The ETH essentially asserts that if an observable and Hamiltonian adhere to the ETH, the observable will thermalize. This means its expectation value stabilizes over time, aligning with the expectation value of the thermal state, albeit with some important corrections. Noncommuting charges cause all kinds of problems for the ETH, as detailed in these two posts by Nicole Yunger Halpern. Rather than reiterating Nicole’s succinct explanations, I’ll present the main takeaway: noncommuting charges undermine the ETH. This has led to the development of a non-Abelian version of the ETH by Murthy and collaborators. This new framework still predicts thermalization in many, but not all, cases. Under a reasonable physical assumption, the previously mentioned corrections to the ETH may be more substantial.

If this story ended here, I would have needed to reference a different Shakespearean work. Fortunately, the internal conflict inherent in noncommuting charges aligns well with Hamlet. Noncommuting charges appear to impede thermalization in various respects, yet paradoxically, they also seem to promote it in others.

Arguments for promoting thermalization

Among the many factors accompanying the thermalization of quantum systems, entanglement is one of the most studied. Last year, I wrote a blog post explaining how my collaborators and I constructed analogous models that differ in whether their charges commute. One of the paper’s results was that the model with noncommuting charges had higher average entanglement entropy. As a result of that blog post, I was invited to CBC’s “Quirks & Quarks” Podcast to explain, on national radio, whether quantum entanglement can explain the extreme similarities we see in identical twins who are raised apart. Spoilers for the interview: it can’t, but wouldn’t it be grand if it could?

Following up on that work, my collaborators and I introduced noncommuting charges into monitored quantum circuits (MQCs)—quantum circuits with mid-circuit measurements. MQCs offer a practical framework for exploring how, for example, entanglement is affected by the interplay between unitary dynamics and measurements. MQCs with no charges or with commuting charges have a weakly entangled phase (“area-law” phase) when the measurements are done often enough, and a highly entangled phase (“volume-law” phase) otherwise. However, in MQCs with noncommuting charges, this weakly entangled phase never exists. In its place, there is a critical phase marked by long-range entanglement. This finding supports our earlier observation that noncommuting charges tend to increase entanglement.

I recently looked at a different angle to this thermalization puzzle. It’s well known that most quantum many-body systems thermalize; some don’t. In those that don’t, what effect do noncommuting charges have? One paper that answers this question is covered in the Perspective. Here, Potter and Vasseur study many-body localization (MBL). Imagine a chain of spins that are strongly interacting. We can add a disorder term, such as an external field whose magnitude varies across sites on this chain. If the disorder is sufficiently strong, the system “localizes.” This implies that if we measured the expectation value of some property of each qubit at some time, it would maintain that same value for a while. MBL is one type of behaviour that resists thermalization. Potter and Vasseur found that noncommuting charges destabilize MBL, thereby promoting thermalizing behaviour.

In addition to the papers discussed in our Perspective article, I want to highlight two other studies that study how systems can avoid thermalization. One mechanism is through the presence of “dynamical symmetries,” also known as “spectrum-generating algebras.” These are operators that act similarly to ladder operators for the Hamiltonian. For any observable that overlaps with these dynamical symmetries, the observable’s expectation value will continue to evolve over time and will not thermalize in accordance with the Eigenstate Thermalization Hypothesis (ETH). In my recent work, I demonstrate that noncommuting charges remove the non-thermalizing dynamics that emerge from dynamical symmetries.

Additionally, I came across a study by O’Dea, Burnell, Chandran, and Khemani, which proposes a method for constructing Hamiltonians that exhibit quantum scars. Quantum scars are unique eigenstates of the Hamiltonian that do not thermalize despite being surrounded by a spectrum of other eigenstates that do thermalize. Their approach involves creating a Hamiltonian with noncommuting charges and subsequently breaking the non-Abelian symmetry. When the symmetry is broken, quantum scars appear; however, if the non-Abelian symmetry is restored, the quantum scars vanish. These last three results suggest that noncommuting charges impede various types of non-thermalizing dynamics.

Unlike Hamlet, the narrative of noncommuting charges is still unfolding. I wish I could conclude with a dramatic finale akin to the duel between Hamlet and Laertes, Claudius’s poisoning, and the proclamation of a new heir to the Danish throne. However, that chapter is yet to be written. “To thermalize or not to thermalize?” We will just have to wait and see.

Scott Aaronson My Passover press release

FOR IMMEDIATE RELEASE – From the university campuses of Assyria to the thoroughfares of Ur to the palaces of the Hittite Empire, students across the Fertile Crescent have formed human chains, camel caravans, and even makeshift tent cities to protest the oppression of innocent Egyptians by the rogue proto-nation of “Israel” and its vengeful, warlike deity Yahweh. According to leading human rights organizations, the Hebrews, under the leadership of a bearded extremist known as Moses or “Genocide Moe,” have unleashed frogs, wild beasts, hail, locusts, cattle disease, and other prohibited collective punishments on Egypt’s civilian population, regardless of the humanitarian cost.

Human-rights expert Asenath Albanese says that “under international law, it is the Hebrews’ sole responsibility to supply food, water, and energy to the Egyptian populace, just as it was their responsibility to build mud-brick store-cities for Pharaoh. Turning the entire Nile into blood, and plunging Egypt into neverending darkness, are manifestly inconsistent with the Israelites’ humanitarian obligations.”

Israelite propaganda materials have held these supernatural assaults to be justified by Pharaoh’s alleged enslavement of the Hebrews, as well as unverified reports of his casting all newborn Hebrew boys into the Nile. Chanting “Let My People Go,” some Hebrew counterprotesters claim that Pharaoh could end the plagues at any time by simply releasing those held in bondage.

Yet Ptahmose O’Connor, Chair of Middle East Studies at the University of Avaris, retorts that this simplistic formulation ignores the broader context. “Ever since Joseph became Pharaoh’s economic adviser, the Israelites have enjoyed a position of unearned power and privilege in Egypt. Through underhanded dealings, they even recruited the world’s sole superpower—namely Adonai, Creator of the Universe—as their ally, removing any possibility that Adonai could serve as a neutral mediator in the conflict. As such, Egypt’s oppressed have a right to resist their oppression by any means necessary. This includes commonsense measures like setting taskmasters over the Hebrews to afflict them with heavy burdens, and dealing shrewdly with them lest they multiply.”

Professor O’Connor, however, dismissed the claims of drowned Hebrew babies as unverified rumors. “Infanticide accusations,” he explained, “have an ugly history of racism, Orientalism, and Egyptophobia. Therefore, unless you’re a racist or an Orientalist, the only possible conclusion is that no Hebrew babies have been drowned in the Nile, except possibly by accident, or of course by Hebrews themselves looking for a pretext to start this conflict.”

Meanwhile, at elite academic institutions across the region, the calls for justice have been deafening. “From the Nile to the Sea of Reeds, free Egypt from Jacob’s seeds!” students chanted. Some protesters even taunted passing Hebrew slaves with “go back to Canaan!”, though others were quick to disavow that message. According to Professor O’Connor, it’s important to clarify that the Hebrews don’t belong in Canaan either, and that finding a place where they do belong is not the protesters’ job.

In the face of such stridency, a few professors and temple priests have called the protests anti-Semitic. The protesters, however, dismiss that charge, pointing as proof to the many Hebrews and other Semitic peoples in their own ranks. For example, Sa-Hathor Goldstein, who currently serves as Pithom College’s Chapter President of Jews for Pharaoh, told us that “we stand in solidarity with our Egyptian brethren, with the shepherds, goat-workers, and queer and mummified voices around the world. And every time Genocide Moe strikes down his staff to summon another of Yahweh’s barbaric plagues, we’ll be right there to tell him: Not In Our Name!”

“Look,” Goldstein added softly, “my own grandparents were murdered by Egyptian taskmasters. But the lesson I draw from my family’s tragic history is to speak up for oppressed people everywhere—even the ones who are standing over me with whips.”

“If Yahweh is so all-powerful,” Goldstein went on to ask, “why could He not devise a way to free the Israelites without a single Egyptian needing to suffer? Why did He allow us to become slaves in the first place? And why, after each plague, does He harden Pharaoh’s heart against our release? Not only does that tactic needlessly prolong the suffering of Israelites and Egyptians alike, it also infringes on Pharaoh’s bodily autonomy.”

But the strongest argument, Goldstein concluded, arching his eyebrow, is that “ever since I started speaking out on this issue, it’s been so easy to get with all the Midianite chicks at my school. That’s because they, like me, see past the endless intellectual arguments over ‘who started’ or ‘how’ or ‘why’ to the emotional truth that the suffering just has to stop, man.”

Last night, college towns across the Tigris, Euphrates, and Nile were aglow with candlelight vigils for Baka Ahhotep, an Egyptian taskmaster and beloved father of three cruelly slain by “Genocide Moe,” in an altercation over alleged mistreatment of a Hebrew slave whose details remain disputed.

According to Caitlyn Mentuhotep, a sophomore majoring in hieroglyphic theory at the University of Pi-Ramesses who attended her school’s vigil for Ahhotep, staying true to her convictions hasn’t been easy in the face of Yahweh’s unending plagues—particularly the head lice. “But what keeps me going,” she said, “is the absolute certainty that, when people centuries from now write the story of our time, they’ll say that those of us who stood with Pharaoh were on the right side of history.”

Have a wonderful holiday!

Terence Tao Erratum for “An inverse theorem for the Gowers U^{s+1}[N]-norm”

The purpose of this post is to report an erratum to the 2012 paper “An inverse theorem for the Gowers {U^{s+1}[N]}-norm” of Ben Green, myself, and Tamar Ziegler (previously discussed in this blog post). The main results of this paper have been superseded with stronger quantitative results, first in work of Manners (using somewhat different methods), and more recently in a remarkable paper of Leng, Sah, and Sawhney which combined the methods of our paper with several new innovations to obtain quite strong bounds (of quasipolynomial type); see also an alternate proof of our main results (again by quite different methods) by Candela and Szegedy. In the course of their work, they discovered some fixable but nontrivial errors in our paper. These (rather technical) issues were already implicitly corrected in this followup work which supersedes our own paper, but for the sake of completeness we are also providing a formal erratum for our original paper, which can be found here. We thank Leng, Sah, and Sawhney for bringing these issues to our attention.

Excluding some minor (mostly typographical) issues which we also have reported in this erratum, the main issues stemmed from a conflation of two notions of a degree {s} filtration

\displaystyle  G = G_0 \geq G_1 \geq \dots \geq G_s \geq G_{s+1} = \{1\}

of a group {G}, which is a nested sequence of subgroups that obey the relation {[G_i,G_j] \leq G_{i+j}} for all {i,j}. The weaker notion (sometimes known as a prefiltration) permits the group {G_1} to be strictly smaller than {G_0}, while the stronger notion requires {G_0} and {G_1} to be equal. In practice, one can often move between the two concepts, as {G_1} is always normal in {G_0}, and a prefiltration behaves like a filtration on every coset of {G_1} (after applying a translation and perhaps also a conjugation). However, we did not clarify this issue sufficiently in the paper, and there are some places in the text where results that were only proven for filtrations were applied to prefiltrations. The erratum fixes these issues, mostly by clarifying that we work with filtrations throughout (which requires some decomposition into cosets in places where prefiltrations are generated). Similar adjustments need to be made for multidegree filtrations and degree-rank filtrations, which we also use heavily in our paper.
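For instance, the prototypical example of the stronger notion is the lower central series of a group {G} that is nilpotent of step at most {s}, with the convention

\displaystyle  G_0 = G_1 = G, \qquad G_{i+1} = [G, G_i], \qquad G_{s+1} = \{1\},

which indeed obeys {[G_i,G_j] \leq G_{i+j}}; a prefiltration, by contrast, is allowed to begin with a subgroup {G_1} strictly smaller than {G_0}.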

In most cases, fixing this issue only required minor changes to the text, but there is one place (Section 8) where there was a non-trivial problem: we used the claim that the final group {G_s} was a central subgroup, which is true for filtrations, but not necessarily for prefiltrations. This fact (or more precisely, a multidegree variant of it) was used to claim a factorization for a certain product of nilcharacters, which is in fact not true as stated. In the erratum, a substitute factorization for a slightly different product of nilcharacters is provided, which is still sufficient to conclude the main result of this part of the paper (namely, a statistical linearization of a certain family of nilcharacters in the shift parameter {h}).

Again, we stress that these issues do not impact the paper of Leng, Sah, and Sawhney, as they adapted the methods in our paper in a fashion that avoids these errors.

April 23, 2024

n-Category Café Counting Points on Elliptic Curves (Part 3)

In Part 1 of this little series I showed you Wikipedia’s current definition of the L-function of an elliptic curve, and you were supposed to shudder in horror. In this definition the L-function is a product over all primes p. But what do we multiply in this product? There are 4 different cases, each with its own weird and unmotivated formula!

In Part 2 we studied the 4 cases. They correspond to 4 things that can happen when we look at our elliptic curve over the finite field \mathbb{F}_p: it can stay smooth, or it can become singular in 3 different ways. In each case we got a formula for the number of points of the resulting curve over the fields \mathbb{F}_{p^k}.

Now I’ll give a much better definition of the L-function of an elliptic curve. Using our work from last time, I’ll show that it’s equivalent to the horrible definition on Wikipedia. And eventually I may get up the nerve to improve the Wikipedia definition. Then future generations will wonder what I was complaining about.

I want to explain the L-function of an elliptic curve as simply as possible — thus, with a minimum of terminology and unmotivated nonsense.

The L-function of an elliptic curve is a slight tweak of something more fundamental: its zeta function. So we have to start there.

The zeta function of an elliptic curve

You can define the zeta function of any gadget S that assigns a finite set S(R) to any finite commutative ring R. It goes like this:

\zeta_S(s) = \sum_{n = 1}^\infty \frac{|Z_S(n)|}{n!} n^{-s}

where s is a complex number and the sum will converge if \mathrm{Re}(s) is big enough.

What’s Z_S(n)? A ring that’s a finite product of finite fields is called a finite semisimple commutative ring. An element of Z_S(n) is a way to make the set \{1, \dots, n\} into a finite semisimple commutative ring, say R, and choose an element of S(R).

So, to define the zeta function of an elliptic curve, we just need a way for an elliptic curve E to assign a finite set E(R) to any finite semisimple commutative ring R. This is not hard. By an elliptic curve I simply mean an equation

y^2 = P(x)

where P is a cubic polynomial with integer coefficients and distinct roots. When R is a finite field, this equation will have a finite set of solutions in R, and we take those and one extra ‘point at infinity’ to be the points of our set E(R). When R is a general finite semisimple ring, it’s a product of finite fields, say

R \cong F_1 \times \cdots \times F_n

and we define

E(R) = E(F_1) \times \cdots \times E(F_n)

Then the zeta function of our elliptic curve E is

\zeta_E(s) = \sum_{n = 1}^\infty \frac{|Z_E(n)|}{n!} n^{-s}
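To make this concrete, here is a small Python sketch (my own illustration, with an arbitrarily chosen curve) of the most basic ingredient: counting the points of E over a prime field \mathbb{F}_p by brute force, remembering the extra point at infinity.

```python
def count_points(a, b, p):
    """Count the points of E : y^2 = x^3 + a*x + b over the prime field F_p,
    including the single point at infinity."""
    count = 1  # the point at infinity
    # Tally how many y in F_p square to each residue.
    square_counts = {}
    for y in range(p):
        r = (y * y) % p
        square_counts[r] = square_counts.get(r, 0) + 1
    for x in range(p):
        rhs = (x * x * x + a * x + b) % p
        count += square_counts.get(rhs, 0)
    return count

# Example: the curve y^2 = x^3 - x, at a few small odd primes.
for p in [5, 7, 11, 13]:
    print(p, count_points(-1, 0, p))
```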

The L-function of an elliptic curve

Later today we will calculate the zeta function of an elliptic curve. And we’ll see that it always has a special form:

\zeta_E(s) = \frac{\zeta(s) \zeta(s - 1)}{\text{some rational function of } s}

where \zeta is the Riemann zeta function. The denominator here is called the L-function of our elliptic curve, L(E,s). That’s all there is to it!

In short:

L(E,s) = \frac{\zeta(s) \zeta(s - 1)}{\zeta_E(s)}

You should think of the L-function as the ‘interesting part’ of the zeta function of the elliptic curve — but flipped upside down, just to confuse amateurs. That’s also why we write n^{-s} in the formula for the zeta function instead of n^s: it’s a deliberately unnatural convention designed to keep out the riff-raff.

Arbitrary conventions aside, I hope you see the L-function of an elliptic curve is a fairly simple thing. You might wonder why the zeta function is defined as it is, and why the zeta function of the elliptic curve has a factor of \zeta(s) \zeta(s-1) in it. Those are very good questions, with good answers. But my point is this: all the gory complexity of the L-function arises when we actually try to compute it more explicitly.

Now let’s do that.

The Euler product formula

An elliptic curve E gives a finite set E(R) for each finite semisimple commutative ring R. We need to count these sets to compute the zeta function or L-function of our elliptic curve. But we have set things up so that

E(R \times R') \cong E(R) \times E(R')

Since every finite semisimple commutative ring is a product of finite fields, this lets us focus on counting E(R) when R is a finite field. And since every finite field has a prime power number of elements, we can tackle this counting problem ‘one prime at a time’.

If we carry this through, we get an interesting formula for the zeta function of an elliptic curve. In fact it’s a very general thing:

Euler Product Formula. Suppose S is any functor from finite commutative rings to finite sets such that S(R \times R') \cong S(R) \times S(R'). Then

\zeta_S(s) = \prod_p \exp \left( \sum_{k = 1}^\infty \frac{|S(\mathbb{F}_{p^k})|}{k} p^{-k s} \right)

where we take the product over all primes p, and \mathbb{F}_{p^k} is the field with p^k elements.

I wrote up a proof here:

so check it out if you want. I was not trying to make the argument look as simple as possible, but it’s really quite easy given what I’ve said: you can probably work it out yourself.

So: the zeta function of an elliptic curve E is a product over primes. The factor for the prime p is called the local zeta function

Z_p(E,s) = \exp \left( \sum_{k = 1}^\infty \frac{|E(\mathbb{F}_{p^k})|}{k} p^{-k s} \right)

To compute this, we need to know the numbers |E(\mathbb{F}_{p^k})|. Luckily we worked these out last time! But there are four cases.

In every case we have

|E(\mathbb{F}_{p^k})| = p^k + 1 + c(p,k)

where c(p,k) is some sort of ‘correction’. If the correction c(p,k) is zero, we get

\begin{array}{ccl} Z_p(E,s) &=& \displaystyle{ \exp \left(\sum_{k = 1}^\infty \frac{p^k + 1}{k} p^{-k s} \right) } \\ \\ &=& \displaystyle{ \exp \left( -\ln(1 - p^{-s + 1}) - \ln(1 - p^{-s}) \right) } \\ \\ &=& \displaystyle{ \frac{1}{(1 - p^{-s + 1})(1 - p^{-s}) } } \end{array}

I did the sum pretty fast, but not because I’m good at sums — merely to keep you from getting bored. To do it yourself, all you need to know is the Taylor series for the logarithm.
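If you want the intermediate step spelled out: for \mathrm{Re}(s) > 1 we can apply -\ln(1 - x) = \sum_{k = 1}^\infty x^k / k with x = p^{-s+1} and x = p^{-s}, giving

\sum_{k = 1}^\infty \frac{p^k + 1}{k} p^{-k s} \;=\; \sum_{k = 1}^\infty \frac{(p^{-s+1})^k}{k} \;+\; \sum_{k = 1}^\infty \frac{(p^{-s})^k}{k} \;=\; -\ln(1 - p^{-s + 1}) - \ln(1 - p^{-s})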

To get the zeta function of our elliptic curve we multiply all the local zeta functions Z_p(E,s). So if all the corrections c(p,k) were zero, we’d get

Z(E,s) = \prod_p \frac{1}{1 - p^{-s + 1}} \prod_p \frac{1}{1 - p^{-s} } = \zeta(s-1) \zeta(s)

Here I used the Euler product formula for the Riemann zeta function.

This is precisely why folks define the L-function of an elliptic curve to be

L(E,s)^{-1} = \frac{\zeta_E(s)}{ \zeta(s) \zeta(s - 1)}

It lets us focus on the effect of the corrections! Well, it doesn’t explain that stupid reciprocal on the left-hand side, which is just a convention — but apart from that, we’re taking the zeta function of the elliptic curve and dividing out by what we’d get if all the corrections c(p,k) were zero. So, if you think about it a bit, we have

L(E,s)^{-1} = \prod_p \exp \left( \sum_{k = 1}^\infty \frac{c(p,k)}{k} p^{-k s} \right)

It’s like the Euler product formula for the zeta function, but using only the corrections c(p,k) instead of the full count of points |E(\mathbb{F}_{p^k})|.

As you can see, the L-function is built from local L-functions, one for each prime: L(E,s) = \prod_p L_p(E,s)^{-1}, where

L_p(E,s) = \exp \left( \sum_{k = 1}^\infty \frac{c(p,k)}{k} p^{-k s}\right)

So let’s work those out! There are four cases.

The local L-function of an elliptic curve: additive reduction

If our elliptic curve gets a cusp over \mathbb{F}_p, we say it has additive reduction. In this case we saw in Theorem 2 last time that

|E(\mathbb{F}_{p^k})| = p^k + 1

So in this case the correction vanishes:

c(p,k) = 0

This makes the local L-function very simple:

L_p(E,s) = \exp \left( \sum_{k = 1}^\infty \frac{c(p,k)}{k} p^{-k s}\right) = 1

The local L-function of an elliptic curve: split multiplicative reduction

If our elliptic curve gets a node over \mathbb{F}_p and the two lines tangent to this node have slopes defined in \mathbb{F}_p, we say our curve has split multiplicative reduction. In this case we saw in Theorem 3 last time that

|E(\mathbb{F}_{p^k})| = p^k

So in this case, the correction is -1:

c(p,k) = -1

This gives

\begin{array}{ccl} L_p(E,s) &=& \displaystyle{ \exp \left( -\sum_{k = 1}^\infty \frac{1}{k} p^{-k s}\right) } \\ \\ &=& \displaystyle{ \exp \left( \ln(1 - p^{-s}) \right) } \\ \\ &=& 1 - p^{-s} \end{array}

Again I used my profound mastery of Taylor series of the logarithm to do the sum.

The local L-function of an elliptic curve: nonsplit multiplicative reduction

If our elliptic curve gets a node over \mathbb{F}_p and the two lines tangent to this node have slopes that are not defined in \mathbb{F}_p, we say our curve has nonsplit multiplicative reduction. In this case we saw in Theorem 4 last time that

|E(\mathbb{F}_{p^k})| = p^k + 1 - (-1)^k

In this case the correction is more interesting:

c(p,k) = -(-1)^k

This gives

\begin{array}{ccl} L_p(E,s) &=& \displaystyle{ \exp \left( -\sum_{k = 1}^\infty \frac{(-1)^k}{k} p^{-k s}\right) } \\ \\ &=& \displaystyle{ \exp \left( \ln(1 + p^{-s}) \right) } \\ \\ &=& 1 + p^{-s} \end{array}

Again, I just used the Taylor series of the log function.

The local L-function of an elliptic curve: good reduction

If our elliptic curve stays smooth over \mathbb{F}_p, we say it has good reduction. Ironically this gives the most complicated local L-function. In Theorem 1 last time we saw

|E(\mathbb{F}_{p^k})| = p^k - \alpha^k - \overline{\alpha}^k + 1

where \alpha is a complex number with \alpha \overline{\alpha} = p. We didn’t prove this, we literally just saw it: it’s a fairly substantial result due to Hasse.

So, in this case the correction is

c(p,k) = -\alpha^k - \overline{\alpha}^k

This gives

\begin{array}{ccl} L_p(E,s) &=& \displaystyle{ \exp \left( -\sum_{k = 1}^\infty \frac{\alpha^k + \overline{\alpha}^k}{k} p^{-k s}\right) } \\ \\ &=& \displaystyle{ \exp \left( \ln\left(1 - \alpha p^{-s}\right) \; + \; \ln\left(1 - \overline{\alpha} p^{-s}\right) \right) } \\ \\ &=& (1 - \alpha p^{-s})(1 - \overline{\alpha} p^{-s}) \end{array}

Again I just used the Taylor series of the log function. I’m sure glad I went to class that day.

But we can get a bit further using \alpha \overline{\alpha} = p:

\begin{array}{ccl} L_p(E,s) &=& (1 - \alpha p^{-s})(1 - \overline{\alpha} p^{-s}) \\ &=& 1 - (\alpha + \overline{\alpha})p^{-s} + p^{1-2s} \end{array}

At this point people usually notice that

|E(\mathbb{F}_{p})| = p - \alpha - \overline{\alpha} + 1

so

\alpha + \overline{\alpha} = p + 1 - |E(\mathbb{F}_{p})|

Thus, you can compute this number using just the number of points of our curve over \mathbb{F}_p. And to be cute, people call this number something like a_p(E). So in the end, for elliptic curves with good reduction at the prime p we have

L_p(E,s) = 1 - a_p(E) p^{-s} + p^{1-2s}

Whew, we’re done!
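Here is a short Python sketch (again my own, for an arbitrarily chosen curve with good reduction at the primes shown) that computes a_p(E) = p + 1 - |E(\mathbb{F}_p)| by brute-force counting and prints the resulting local factor:

```python
def a_p(a, b, p):
    """a_p(E) = p + 1 - #E(F_p) for E : y^2 = x^3 + a*x + b,
    at a prime p where E has good reduction."""
    points = 1  # the point at infinity
    square_counts = {}
    for y in range(p):
        r = (y * y) % p
        square_counts[r] = square_counts.get(r, 0) + 1
    for x in range(p):
        points += square_counts.get((x * x * x + a * x + b) % p, 0)
    return p + 1 - points

# Example: y^2 = x^3 - x has bad reduction only at 2, so these primes are all good.
for p in [5, 7, 11, 13, 17]:
    ap = a_p(-1, 0, p)
    print(f"p = {p}:  a_p = {ap},  local factor  1 - ({ap}) p^(-s) + p^(1-2s)")
```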

The L-function of an elliptic curve, revisited

Okay, now we can summarize all our work in an explicit formula for the L-function of an elliptic curve.

Theorem. The L-function of an elliptic curve E equals

L(E,s) = \prod_p L_p(E,s)^{-1}

where:

1) L_p(E,s) = 1 - a_p(E) p^{-s} + p^{1-2s} if E remains smooth over \mathbb{F}_p. Here a_p(E) is p + 1 minus the number of points of E over \mathbb{F}_p.

2) L_p(E,s) = 1 if E gets a cusp over \mathbb{F}_p.

3) L_p(E,s) = 1 - p^{-s} if E gets a node over \mathbb{F}_p, and the two tangent lines to this node have slopes that are defined in \mathbb{F}_p.

4) L_p(E,s) = 1 + p^{-s} if E gets a node over \mathbb{F}_p, but the two tangent lines to this node have slopes that are not defined in \mathbb{F}_p.

My god! This is exactly what I showed you in Part 1. So this rather elaborate theorem is what some people run around calling the definition of the L-function of an elliptic curve!

n-Category Café Moving On From Kent

Was it really seventeen years ago that John broke the news on this blog that I had finally landed a permanent academic job? That was a long wait – I’d had twelve years of temporary contracts after receiving my PhD.

And now it has been decided that I am to move on from the University of Kent. The University is struggling financially and has decreed that a number of programs, including Philosophy, are to be cut. Whatever the wisdom of their plan, my time here comes to an end this July.

What next? It’s a little early for me to retire. If anyone has suggestions, I’d be happy to hear them.

We started this blog just one year before I started at Kent. To help think things over, in the coming weeks I thought I’d revisit some themes developed here over the years to see how they panned out:

  1. Higher geometry: categorifying the Erlanger program
  2. Category theory meets machine learning
  3. Duality
  4. Categorifying logic
  5. Category theory applied to philosophy
  6. Rationality of (mathematical and scientific) theory change as understood through historical development

April 20, 2024

n-Category Café The Modularity Theorem as a Bijection of Sets

guest post by Bruce Bartlett

John has been making some great posts on counting points on elliptic curves (Part 1, Part 2, Part 3). So I thought I’d take the opportunity and float my understanding here of the Modularity Theorem for elliptic curves, which frames it as an explicit bijection between sets. To my knowledge, it is not stated exactly in this form in the literature. There are aspects of this that I don’t understand (the explicit isogeny); perhaps someone can assist.

Bijection statement

Here is the statement as I understand it to be, framed as a bijection of sets. My chief reference is the wonderful book Elliptic Curves, Modular Forms and their L-Functions by Álvaro Lozano-Robledo (and references therein), as well as the standard reference A First Course in Modular Forms by Diamond and Shurman.

I will first make the statement as succinctly as I can, then I will ask the question I want to ask, then I will briefly explain the terminology I’ve used.

Modularity Theorem (Bijection version). The following maps are well-defined and inverse to each other, and give rise to an explicit bijection of sets:

\left\{\begin{array}{c} \text{Elliptic curves defined over}\: \mathbb{Q} \\ \text{with conductor}\: N \end{array} \right\} \: / \: \text{isogeny} \quad \leftrightarrows \quad \left\{ \begin{array}{c} \text{Integral normalized newforms} \\ \text{of weight 2 for }\: \Gamma_0(N) \end{array} \right\}

  • In the forward direction, given an elliptic curve E defined over the rationals, we build the modular form f_E(z) = \sum_{n=1}^\infty a_n q^n, \quad q = e^{2 \pi i z}, where the coefficients a_n are obtained by expanding out the following product over all primes as a Dirichlet series, \prod_p \exp\left( \sum_{k=1}^\infty \frac{p^k + 1 - |E(\mathbb{F}_{p^k})|}{k} p^{-k s} \right) = \frac{a_1}{1^s} + \frac{a_2}{2^s} + \frac{a_3}{3^s} + \frac{a_4}{4^s} + \cdots, where |E(\mathbb{F}_{p^k})| counts the number of solutions to the equation for the elliptic curve over the finite field \mathbb{F}_{p^k} (including the point at infinity). So for example, as John taught us in Part 3, for good primes p (which is almost all of them), a_p = p + 1 - |E(\mathbb{F}_p)|. But the above description tells you how to compute a_n for any natural number n. (By the way, the nontrivial content of the theorem is proving that f_E is indeed a modular form for any elliptic curve E). A brute-force sketch of the computation of these coefficients at good primes appears just after this list.

  • In the reverse direction, given an integral normalized newform f of weight 2 for \Gamma_0(N), we interpret it as a differential form on the genus g modular curve X_0(N), and then compute its period lattice \Lambda \subset \mathbb{C} by integrating it over all the 1-cycles in the first homology group of X_0(N). Then the resulting elliptic curve is E_f = \mathbb{C}/\Lambda.
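To make the forward direction concrete, here is a minimal Python sketch (my own illustration; the curve y^2 = x^3 + 1 and the primes are chosen arbitrarily, and only primes of good reduction are treated) that produces the prime-indexed coefficients a_p = p + 1 - |E(\mathbb{F}_p)| by brute-force counting. At good primes the Euler factor then also determines the prime-power coefficients, e.g. a_{p^2} = a_p^2 - p, and a_n is multiplicative in n.

```python
def ap(a, b, p):
    """a_p = p + 1 - #E(F_p) for E : y^2 = x^3 + a*x + b, at a prime p of good reduction."""
    points = 1  # the point at infinity
    square_counts = {}
    for y in range(p):
        r = (y * y) % p
        square_counts[r] = square_counts.get(r, 0) + 1
    for x in range(p):
        points += square_counts.get((x * x * x + a * x + b) % p, 0)
    return p + 1 - points

# An arbitrary illustrative curve: y^2 = x^3 + 1 (bad reduction only at 2 and 3).
A, B = 0, 1
for p in [5, 7, 11, 13, 17, 19]:
    a_p = ap(A, B, p)
    print(f"p = {p}:  a_p = {a_p},  a_(p^2) = {a_p * a_p - p}")
```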

An explicit isogeny?

My question to the experts is the following. Suppose we start with an elliptic curve E defined over \mathbb{Q}, then compute the modular form f_E, and then compute its period lattice \Lambda to arrive at the elliptic curve E' = \mathbb{C} / \Lambda. The theorem says that E and E' are isogenous. What is the explicit isogeny?

Explanations

  • An elliptic curve is a complex curve E \subset \mathbb{C}\mathbb{P}^2 defined by a homogeneous cubic polynomial F(X,Y,Z) = 0, such that E is smooth, i.e. the gradient (\frac{\partial F}{\partial X}, \frac{\partial F}{\partial Y}, \frac{\partial F}{\partial Z}) does not vanish at any point p \in E. If the coefficients of F are all rational, then we say that E is defined over \mathbb{Q}. We can always make a transformation of variables and write the equation for E in an affine chart in Weierstrass form, y^2 = x^3 + A x + B. Importantly, every elliptic curve is isomorphic to one of the form \mathbb{C} / \Lambda where \Lambda is a rank 2 sublattice of \mathbb{C}. So, an elliptic curve is topologically a doughnut S^1 \times S^1, and it has an addition law making it into an abelian group.

  • An isogeny from E to E' is a surjective holomorphic homomorphism. Being isogenous is actually an equivalence relation on the class of elliptic curves.

  • The conductor of an elliptic curve E defined over the rationals is N = \prod_p p^{f_p} where: f_p = \begin{cases} 0 & \text{if}\:E\:\text{remains smooth over}\:\mathbb{F}_p \\ 1 & \text{if}\:E\:\text{gets a node over}\:\mathbb{F}_p \\ 2 & \text{if}\:E\:\text{gets a cusp over}\:\mathbb{F}_p\:\text{and}\: p \neq 2,3 \\ 2+\delta_p & \text{if}\:E\:\text{gets a cusp over}\:\mathbb{F}_p\:\text{and}\: p = 2\:\text{or}\:3 \end{cases} where \delta_p is a technical invariant that describes whether there is wild ramification in the action of the inertia group at p of \text{Gal}(\bar{\mathbb{Q}}/\mathbb{Q}) on the Tate module T_p(E).

  • The modular curve X_0(N) is a certain compact Riemann surface which parametrizes isomorphism classes of pairs (E,C) where E is an elliptic curve and C is a cyclic subgroup of E of order N. The genus of X_0(N) depends on N.

  • A modular form f for \Gamma_0(N) of weight k is a certain kind of holomorphic function f : \mathbb{H} \rightarrow \mathbb{C}. The number N is called the level of the modular form.

  • Every modular form f(z) can be expanded as a Fourier series f(z) = \sum_{n=0}^\infty a_n q^n, \quad q = e^{2 \pi i z}. We say that f is integral if all its Fourier coefficients a_n are integers. We say f is a cusp form if a_0 = 0. A cusp form is called normalized if a_1 = 1.

  • Geometrically, a cusp form of weight k can be interpreted as a holomorphic section of a certain line bundle L_k over X_0(N). Since X_0(N) is compact, this implies that the vector space of cusp modular forms is finite-dimensional. (In particular, this means that f is determined by only finitely many of its Fourier coefficients).

  • In particular, L_2 is the cotangent bundle of X_0(N). This means that the cusp modular forms for \Gamma_0(N) of weight 2 can be interpreted as differential forms on X_0(N). That is to say, they are things that can be integrated along curves on X_0(N).

  • If you have a modular form of level M which divides N, then there is a way to build a new modular form of level N. We call level N forms of this type old. They form a subspace of the vector space S_2(\Gamma_0(N)). If we’re at level N, then we are really interested in the new forms — these are the forms in S_2(\Gamma_0(N)) which are orthogonal to the old forms, with respect to a certain natural inner product.

  • If you have a weight 2 newform f, and you interpret it as a differential form on X_0(N), then the integrals of f along 1-cycles \gamma in X_0(N) will form a rank-2 sublattice \Lambda \subset \mathbb{C}. (This may seem strange, since X_0(N) has genus g, so you would expect the period integrals of f to give a dense subset of \mathbb{C}, but that is the magic of being a newform: it only “sees” two directions in H_1(X_0(N), \mathbb{Q})).

  • So, given a weight 2 newform f, we get a canonical integration map I : X_0(N) \rightarrow \mathbb{C}/\Lambda obtained by fixing a basepoint x_0 \in X_0(N) and then defining I(x) = \int_{\gamma} f where \gamma is any path from x_0 to x in X_0(N). The answer won’t depend on the choice of path, because different choices will differ by a 1-cycle, and we are modding out by the periods of 1-cycles!

  • The Jacobian of a Riemann surface X is the quotient group \text{Jac}(X) = \Omega^1_\text{hol}(X)^\vee / H_1(X; \mathbb{Z}). This is why one version of the Modularity Theorem says:

    Modularity Theorem (Diamond and Shurman’s Version J_C). There exists a surjective holomorphic homomorphism of the (higher-dimensional) complex torus \text{Jac}(X_0(N)) onto E.

    I would like to ask the same question here as I asked before: is there an explicit description of this map?

Terence Tao Two announcements: AI for Math resources, and erdosproblems.com

This post contains two unrelated announcements. Firstly, I would like to promote a useful list of resources for AI in Mathematics, that was initiated by Talia Ringer (with the crowdsourced assistance of many others) during the National Academies workshop on “AI in mathematical reasoning” last year. This list is now accepting new contributions, updates, or corrections; please feel free to submit them directly to the list (which I am helping Talia to edit). Incidentally, next week there will be a second followup webinar to the aforementioned workshop, building on the topics covered there. (The first webinar may be found here.)

Secondly, I would like to advertise the erdosproblems.com website, launched recently by Thomas Bloom. This is intended to be a living repository of the many mathematical problems proposed in various venues by Paul Erdős, who was particularly noted for his influential posing of such problems. For a tour of the site and an explanation of its purpose, I can recommend Thomas’s recent talk on this topic at a conference last week in honor of Timothy Gowers.

Thomas is currently issuing a call for help to develop the erdosproblems.com website in a number of ways (quoting directly from that page):

  • You know Github and could set a suitable project up to allow people to contribute new problems (and corrections to old ones) to the database, and could help me maintain the Github project;
  • You know things about web design and have suggestions for how this website could look or perform better;
  • You know things about Python/Flask/HTML/SQL/whatever and want to help me code cool new features on the website;
  • You know about accessibility and have an idea how I can make this website more accessible (to any group of people);
  • You are a mathematician who has thought about some of the problems here and wants to write an expanded commentary for one of them, with lots of references, comparisons to other problems, and other miscellaneous insights (mathematician here is interpreted broadly, in that if you have thought about the problems on this site and are willing to write such a commentary you qualify);
  • You knew Erdős and have any memories or personal correspondence concerning a particular problem;
  • You have solved an Erdős problem and I’ll update the website accordingly (and apologies if you solved this problem some time ago);
  • You have spotted a mistake, typo, or duplicate problem, or anything else that has confused you and I’ll correct things;
  • You are a human being with an internet connection and want to volunteer a particular Erdős paper or problem list to go through and add new problems from (please let me know before you start, to avoid duplicate efforts);
  • You have any other ideas or suggestions – there are probably lots of things I haven’t thought of, both in ways this site can be made better, and also what else could be done from this project. Please get in touch with any ideas!

I for instance contributed a problem to the site (#587) that Erdős himself gave to me personally (this was the topic of a somewhat well known photo of Paul and myself, and which he communicated again to me shortly afterwards on a postcard; links to both images can be found by following the above link). As it turns out, this particular problem was essentially solved in 2010 by Nguyen and Vu.

(Incidentally, I also spoke at the same conference that Thomas spoke at, on my recent work with Gowers, Green, and Manners; here is the video of my talk, and here are my slides.)

April 19, 2024

Scott Aaronson That IACR preprint

Update (April 19): Apparently a bug has been found, and the author has withdrawn the claim (see the comments).


For those who don’t yet know from their other social media: a week ago the cryptographer Yilei Chen posted a preprint, eprint.iacr.org/2024/555, claiming to give a polynomial-time quantum algorithm to solve lattice problems. For example, it claims to solve the GapSVP problem, which asks to approximate the length of the shortest nonzero vector in a given n-dimensional lattice, to within an approximation ratio of ~n^4.5. The best approximation ratio previously known to be achievable in classical or quantum polynomial time was exponential in n.

If it’s correct, this is an extremely big deal. It doesn’t quite break the main lattice-based cryptosystems, but it would put those cryptosystems into a precarious position, vulnerable to a mere further polynomial improvement in the approximation factor. And, as we learned from the recent NIST competition, if the lattice-based and LWE-based systems were to fall, then we really don’t have many great candidates left for post-quantum public-key cryptography! On top of that, a full quantum break of LWE (which, again, Chen is not claiming) would lay waste (in a world with scalable QCs, of course) to a large fraction of the beautiful sandcastles that classical and quantum cryptographers have built up over the last couple decades—everything from Fully Homomorphic Encryption schemes, to Mahadev’s protocol for proving the output of any quantum computation to a classical skeptic.

So on the one hand, this would substantially enlarge the scope of exponential quantum speedups beyond what we knew a week ago: yet more reason to try to build scalable QCs! But on the other hand, it could also fuel an argument for coordinating to slow down the race to scalable fault-tolerant QCs, until the world can get its cryptographic house into better order. (Of course, as we’ve seen with the many proposals to slow down AI scaling, this might or might not be possible.)

So then, is the paper correct? I don’t know. It’s very obviously a serious effort by a serious researcher, a world away from the P=NP proofs that fill my inbox every day. But it might fail anyway. I’ve asked the world experts in quantum algorithms for lattice problems, and they’ve been looking at it, and none of them is ready yet to render a verdict. The central difficulty is that the algorithm is convoluted, and involves new tools that seem to come from left field, including complex Gaussian functions, the windowed quantum Fourier transform, and Karst waves (whatever those are). The algorithm has 9 phases by the author’s count. In my own perusal, I haven’t yet extracted even a high-level intuition—I can’t tell any little story like for Shor’s algorithm, e.g. “first you reduce factoring to period-finding, then you solve period-finding by applying a Fourier transform to a vector of amplitudes.”

So, the main purpose of this post is simply to throw things open to commenters! I’m happy to provide a public clearinghouse for questions and comments about the preprint, if those studying it would like that. You can even embed LaTeX in your comments, as will probably be needed to get anywhere.


Unrelated Update: Connor Tabarrok and his friends just put a podcast with me up on YouTube, in which they interview me in my office at UT Austin about watermarking of large language models and other AI safety measures.

Matt von Hippel No Unmoved Movers

Economists must find academics confusing.

When investors put money in a company, they have some control over what that company does. They vote to decide a board, and the board votes to hire a CEO. If the company isn’t doing what the investors want, the board can fire the CEO, or the investors can vote in a new board. Everybody is incentivized to do what the people who gave the money want to happen. And usually, those people want the company to increase its profits, since most of them are themselves companies with their own investors.

Academics are paid by universities and research centers, funded in the aggregate by governments and student tuition and endowments from donors. But individually, they’re also often funded by grants.

What grant-givers want is more ambiguous. The money comes in big lumps from governments and private foundations, which generally want something vague like “scientific progress”. The actual decisions of who gets the money are made by committees made up of senior scientists. These people aren’t experts in every topic, so they have to extrapolate, much as investors have to guess whether a new company will be profitable based on past experience. At their best, they use their deep familiarity with scientific research to judge which projects are most likely to work, and which have the most interesting payoffs. At their weakest, though, they stick with ideas they’ve heard of, things they know work because they’ve seen them work before. That, in a nutshell, is why mainstream research prevails: not because the mainstream wants to suppress alternatives, but because sometimes the only way to guess if something will work is raw familiarity.

(What “works” means is another question. The cynical answers are “publishes papers” or “gets citations”, but that’s a bit unfair: in Europe and the US, most funders know that these numbers don’t tell the whole story. The trivial answer is “achieves what you said it would”, but that can’t be the whole story, because some goals are more pointless than others. You might want the answer to be “benefits humanity”, but that’s almost impossible to judge. So in the end the answer is “sounds like good science”, which is vulnerable to all the fads you can imagine…but is pretty much our only option, regardless.)

So are academics incentivized to do what the grant committees want? Sort of.

Science never goes according to plan. Grant committees are made up of scientists, so they know that. So while many grants have a review process afterwards to see whether you achieved what you planned, they aren’t all that picky about it. If you can tell a good story, you can explain why you moved away from your original proposal. You can say the original idea inspired a new direction, or that it became clear that a new approach was necessary. I’ve done this with an EU grant, and they were fine with it.

Looking at this, you might imagine that an academic who’s a half-capable storyteller could get away with anything they wanted. Propose a fashionable project, work on what you actually care about, and tell a good story afterwards to avoid getting in trouble. As long as you’re not literally embezzling the money (the guy who was paying himself rent out of his visitor funding, for instance), what could go wrong? You get the money without the incentives, you move the scientific world and nobody gets to move you.

It’s not quite that easy, though.

Sabine Hossenfelder told herself she could do something like this. She got grants for fashionable topics she thought were pointless, and told herself she’d spend time on the side on the things she felt were actually important. Eventually, she realized she wasn’t actually doing the important things: the faddish research ended up taking all her time. Not able to get grants doing what she actually cared about (and, in one of those weird temporary European positions that only lasts until you run out of grants), she now has to make a living from her science popularization work.

I can’t speak for Hossenfelder, but I’ve also put some thought into how to choose what to research, about whether I could actually be an unmoved mover. A few things get in the way:

First, applying for grants doesn’t just take storytelling skills, it takes scientific knowledge. Grant committees aren’t experts in everything, but they usually send grants to be reviewed by much more appropriate experts. These experts will check if your grant makes sense. In order to make the grant make sense, you have to know enough about the faddish topic to propose something reasonable. You have to keep up with the fad. You have to spend time reading papers, and talking to people in the faddish subfield. This takes work, but also changes your motivation. If you spend time around people excited by an idea, you’ll either get excited too, or be too drained by the dissonance to get any work done.

Second, you can’t change things that much. You still need a plausible story as to how you got from where you are to where you are going.

Third, you need to be a plausible person to do the work. If the committee looks at your CV and sees that you’ve never actually worked on the faddish topic, they’re more likely to give a grant to someone who’s actually worked on it.

Fourth, you have to choose what to do when you hire people. If you never hire any postdocs or students working on the faddish topic, then it will be very obvious that you aren’t trying to research it. If you do hire them, then you’ll be surrounded by people who actually care about the fad, and want your help to understand how to work with it.

Ultimately, to avoid the grant committee’s incentives, you need a golden tongue and a heart of stone, and even then you’ll need to spend some time working on something you think is pointless.

Even if you don’t apply for grants, even if you have a real permanent position or even tenure, you still feel some of these pressures. You’re still surrounded by people who care about particular things, by students and postdocs who need grants and jobs and fellow professors who are confident the mainstream is the right path forward. It takes a lot of strength, and sometimes cruelty, to avoid bowing to that.

So despite the ambiguous rules and lack of oversight, academics still respond to incentives: they can’t just do whatever they feel like. They aren’t bound by shareholders, they aren’t expected to make a profit. But ultimately, the things that do constrain them, expertise and cognitive load, social pressure and compassion for those they mentor, those can be even stronger.

I suspect that those pressures dominate the private sector as well. My guess is that for all that companies think of themselves as trying to maximize profits, the all-too-human motivations we share are more powerful than any corporate governance structure or org chart. But I don’t know yet. Likely, I’ll find out soon.

April 18, 2024

Tommaso Dorigo On Rating Universities

In a world where we live hostages of advertisement, where our email addresses and phone numbers are sold and bought by companies eager to intrude in our lives and command our actions, preferences, tastes; in a world where appearance trumps substance 10 to zero, where your knowledge and education are less valued than your looks, a world where truth is worth dimes and myths earn you millions - in this XXI century world, that is, Universities look increasingly out of place. 


n-Category Café The Quintic, the Icosahedron, and Elliptic Curves

Old-timers here will remember the days when Bruce Bartlett and Urs Schreiber were regularly talking about 2-vector spaces and the like. Later I enjoyed conversations with Bruce and Greg Egan on quintics and the icosahedron. And now Bruce has come out with a great article linking those topics to elliptic curves!

It’s expository and fun to read.

I can’t do better than quoting the start:

There is a remarkable relationship between the roots of a quintic polynomial, the icosahedron, and elliptic curves. This discovery is principally due to Felix Klein (1878), but Klein’s marvellous book misses a trick or two, and doesn’t tell the whole story. The purpose of this article is to present this relationship in a fresh, engaging, and concise way. We will see that there is a direct correspondence between:

  • “evenly ordered” roots (x_1, \dots, x_5) of a Brioschi quintic x^5 - 10b x^3 + 45b^2 x - b^2 = 0,
  • points on the icosahedron, and
  • elliptic curves equipped with a primitive basis for their 5-torsion, up to isomorphism.

Moreover, this correspondence gives us a very efficient direct method to actually calculate the roots of a general quintic! For this, we’ll need some tools both new and old, such as Cremona and Thongjunthug’s complex arithmetic geometric mean, and the Rogers–Ramanujan continued fraction. These tools are not found in Klein’s book, as they had not been invented yet!

If you are impatient, skip to the end to see the algorithm.

If not, join me on a mathematical carpet ride through the mathematics of the last four centuries. Along the way we will marvel at Kepler’s Platonic model of the solar system from 1597, witness Gauss’ excitement in his diary entry from 1799, and experience the atmosphere in Trinity College Hall during the wonderful moment Ramanujan burst onto the scene in 1913.

The prose sizzles with excitement, and the math lives up to this.

April 17, 2024

Matt Strassler Speaking Today in Seattle, Tomorrow near Portland

A quick reminder, to those in the northwest’s big cities, that I will be giving two talks about my book in the next 48 hours:

Hope to see some of you there! (You can keep track of my speaking events at my events page.)

John Baez Agent-Based Models (Part 8)

Last time I presented a class of agent-based models where agents hop around a graph in a stochastic way. Each vertex of the graph is some ‘state’ agents can be in, and each edge is called a ‘transition’. In these models, the probability per time of an agent making a transition and leaving some state can depend on when it arrived at that state. It can also depend on which agents are in other states that are ‘linked’ to that edge—and when those agents arrived.

I’ve been trying to generalize this framework to handle processes where agents are born or die—or perhaps more generally, processes where some number of agents turn into some other number of agents. There’s already a framework that does something sort of like this. It’s called ‘stochastic Petri nets’, and we explained this framework here:

• John Baez and Jacob Biamonte, Quantum Techniques for Stochastic Mechanics, World Scientific Press, Singapore, 2018. (See also blog articles here.)

However, in their simplest form, stochastic Petri nets are designed for agents whose only distinguishing information is which state they’re in. They don’t have ‘names’—that is, individual identities. Thus, even calling them ‘agents’ is a bit of a stretch: usually they’re called ‘tokens’, since they’re drawn as black dots.

We could try to enhance the Petri net framework to give tokens names and other identifying features. There are various imaginable ways to do this, such as ‘colored Petri nets’. But so far this approach seems rather ill-adapted for processes where agents have identities—perhaps because I’m not thinking about the problem the right way.

So, at some point I decided to try something less ambitious. It turns out that in applications to epidemiology, general processes where n agents come in and m go out are not often required. So I’ve been trying to minimally enhance the framework from last time to include ‘birth’ and ‘death’ processes as well as transitions from state to state.

As I thought about this, some questions kept plaguing me:

When an agent gets created, or ‘born’, which one actually gets born? In other words, what is its name? Its precise name may not matter, but if we want to keep track of it after it’s born, we need to give it a name. And this name had better be ‘fresh’: not already the name of some other agent.

There’s also the question of what happens when an agent gets destroyed, or ‘dies’. This feels less difficult: there just stops being an agent with the given name. But probably we want to prevent a new agent from having the same name as that dead agent.

Both these questions seem fairly simple, but so far they’re making it hard for me to invent a truly elegant framework. At first I tried to separately describe transitions between states, births, and deaths. But this seemed to triplicate the amount of work I needed to do.

Then I tried models that have

• a finite set S of states,

• a finite set T of transitions,

• maps u, d \colon T \to S + \{\textrm{undefined}\} mapping each transition to its upstream and downstream states.

Here S + \{\textrm{undefined}\} is the disjoint union of S and a singleton whose one element is called undefined. Maps from T to S + \{\textrm{undefined}\} are a standard way to talk about partially defined maps from T to S. We get four cases:

1) If the downstream of a transition is defined (i.e. in S) but its upstream is undefined we call this transition a birth transition.

2) If the upstream of a transition is defined but its downstream is undefined we call this transition a death transition.

3) If the upstream and downstream of a transition are both defined we call this transition a transformation. In practice most transitions will be of this sort.

4) We never need transitions whose upstream and downstream are undefined: these would describe agents that pop into existence and instantly disappear.
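Here is a minimal sketch, in Python, of how this paradigm might be encoded, with None standing in for ‘undefined’ and the four cases above recovered by a small classifier (the SIR-style names are purely illustrative, not from the post):

```python
from dataclasses import dataclass
from typing import Optional

# A minimal sketch of the partially defined upstream/downstream maps, with
# None standing in for 'undefined'. The SIR-style names are illustrative.

@dataclass(frozen=True)
class Transition:
    name: str
    upstream: Optional[str]     # None means 'undefined'
    downstream: Optional[str]   # None means 'undefined'

def kind(t: Transition) -> str:
    """Classify a transition into the four cases above."""
    if t.upstream is None and t.downstream is not None:
        return "birth"
    if t.upstream is not None and t.downstream is None:
        return "death"
    if t.upstream is not None and t.downstream is not None:
        return "transformation"
    return "disallowed"   # both undefined: never needed

transitions = [
    Transition("infection", "S", "I"),
    Transition("recovery", "I", "R"),
    Transition("birth", None, "S"),
    Transition("death", "I", None),
]
for t in transitions:
    print(t.name, "->", kind(t))
```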

This is sort of nice, except for the fourth case. Unfortunately when I go ahead and try to actually describe a model based on this paradigm, I seem still to wind up needing to handle births, deaths and transformations quite differently.

For example, last time my models had a fixed set A of agents. To handle births and deaths, I wanted to make this set time-dependent. But I need to separately say how this works for transformations, birth transitions and death transitions. For transformations we don’t change A. For birth transitions we add a new element to A. And for death transitions we remove an element from A, and maybe record its name on a ledger or drive a stake through its heart to make sure it can never be born again!

So far this is tolerable, but things get worse. Our model also needs ‘links’ from states to transitions, to say how agents present in those states affect the timing of those transitions. These are used in the ‘jump function’, a stochastic function that answers this question:

If at time t agent a arrives at the state upstream to some transition e, and the agents at states linked to the transition e form some set S_e, when will agent a make the transition e given that it doesn’t do anything else first?

This works fine for transformations, meaning transitions e that have both an upstream and downstream state. It works just a tiny bit differently for death transitions. But birth transitions are quite different: since newly born agents don’t have a previous upstream state u(e), they don’t have a time at which they arrived at that state.

Perhaps this is just how modeling works: perhaps the search for a staggeringly beautiful framework is a distraction. But another approach just occurred to me. Today I just want to briefly state it. I don’t want to write a full blog article on it yet, since I’ve already spent a lot of time writing two articles that I deleted when I became disgusted with them—and I might become disgusted with this approach too!

Briefly, this approach is exactly the approach I described last time. There are fundamentally no births and no deaths: all transitions have an upstream and a downstream state. There is a fixed set A of agents that does not change with time. We handle births and deaths using a dirty trick.

Namely, births are transitions out of an ‘unborn’ state. Agents hang around in this state until they are born.

Similarly, deaths are transitions to a ‘dead’ state.

There can be multiple ‘unborn’ states and ‘dead’ states. Having multiple unborn states makes it easy to have agents with different characteristics enter the model. Having multiple dead states makes it easy for us to keep tallies of different causes of death. We should make the unborn states distinct from the dead states to prevent ‘reincarnation’—that is, the birth of a new agent that happens to equal an agent that previously died.

I’m hoping that when we proceed this way, we can shoehorn birth and death processes into the framework described last time, without really needing to modify it at all! All we’re doing is exploiting it in a new way.
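Here is a minimal sketch of the trick, with illustrative names that are not from the post: ‘unborn’ and ‘dead’ vertices are ordinary states, every transition has both an upstream and a downstream state, and the set of agents never changes.

```python
# A minimal sketch of the 'unborn'/'dead' trick: births and deaths become
# ordinary edges of the graph, so the fixed-agent framework needs no changes.
# All names here are illustrative.

states = ["unborn", "S", "I", "R", "dead_from_I", "dead_other"]

transitions = {
    "birth":     ("unborn", "S"),
    "infection": ("S", "I"),
    "recovery":  ("I", "R"),
    "death_I":   ("I", "dead_from_I"),
    "death_S":   ("S", "dead_other"),
}

# A fixed, finite pool of agents; some begin life waiting in 'unborn'.
agents = {f"a{i}": ("unborn" if i >= 3 else "S") for i in range(10)}

# Tallying causes of death just means counting who sits in each 'dead' state.
def death_tally(agents):
    return {s: sum(1 for st in agents.values() if st == s)
            for s in states if s.startswith("dead")}

print(death_tally(agents))
```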

Here’s one possible problem: if we start with a finite number of agents in the ‘unborn’ states, the population of agents can’t grow indefinitely! But this doesn’t seem very dire. For most agent-based models we don’t feel a need to let the number of agents grow arbitrarily large. Or we can relax the requirement that the set of agents is finite, and put an infinite number of agents u_1, u_2, u_3, \dots in an unborn state. This can be done without using an infinite amount of memory: it’s a ‘potential infinity’ rather than an ‘actual infinity’.

There could be other problems. So I’ll post this now before I think of them.

April 16, 2024

Matt Strassler Why The Higgs Field is Nothing Like Molasses, Soup, or a Crowd

The idea that a field could be responsible for the masses of particles (specifically the masses of photon-like [“spin-one”] particles) was proposed in several papers in 1964. They included one by Peter Higgs, one by Robert Brout and Francois Englert, and one, slightly later but independent, by Gerald Guralnik, C. Richard Hagen, and Tom Kibble. This general idea was then incorporated into a specific theory of the real world’s particles; this was accomplished in 1967-1968 in two papers, one written by Steven Weinberg and one by Abdus Salam. The bare bones of this “Standard Model of Particle Physics” were finally confirmed experimentally in 2012.

How precisely can mass come from a field? There’s a short answer to this question, invented a couple of decades ago. It’s the kind of answer that serves if time is short and attention spans are limited; it is intended to sound plausible, even though the person delivering the “explanation” knows that it is wrong. In my recent book, I called this type of little lie, a compromise that physicists sometimes have to make between giving no answer and giving a correct but long answer, a “phib” — a physics fib. Phibs are usually harmless, as long as people don’t take them seriously. But the Higgs field’s phib is particularly problematic.

The Higgs Phib

The Higgs phib comes in various forms. Here’s a particularly short one:

There’s this substance, like a soup, that fills the universe; that’s the Higgs field. As objects move through it, the soup slows them down, and that’s how they get mass.

Some variants replace the soup with other thick substances, or even imagine the field as though it were a crowd of people.

How bad is this phib, really? Well, here’s the problem with it. This phib violates several basic laws of physics. These include foundational laws that have had a profound impact on human culture and are the first ones taught in any physics class. It also badly misrepresents what a field is and what it can do. As a result, taking the phib seriously makes it literally impossible to understand the universe, or even daily human experience, in a coherent way. It’s a pedagogical step backwards, not forwards.

What’s Wrong With The Higgs Phib

So here are my seven favorite reasons to put a flashing red warning sign next to any presentation of the Higgs phib.

1. Against The Principle of Relativity

The phib brazenly violates the principle of relativity — both Galileo’s original version and Einstein’s updates to it. That principle, the oldest law of physics that has never been revised, says that if your motion is steady and you are in a closed room, no experiment can tell you your speed, your direction of motion, or even whether you are in motion at all. The phib directly contradicts this principle. It claims that

  • if an object moves, the Higgs field affects it by slowing it down, while
  • if it doesn’t move, the Higgs field does nothing to it.

But if that were true, the action of the Higgs field could easily allow you to distinguish steady motion from being stationary, and the principle of relativity would be false.

2. Against Newton’s First Law of Motion

The phib violates Newton’s first law of motion — that an object in motion not acted on by any force will remain in steady motion. If the Higgs field slowed things down, it could only do so, according to this law, by exerting a force.

But Newton, in predicting the motions of the planets, assumed that the only force acting on the planets was that of gravity. If the Higgs field exerted an additional force on the planets simply because they have mass (or because it was giving them mass), Newton’s methods for predicting planetary motions would have failed.

Worse, the slowing from the Higgs field would have acted like friction over billions of years, and would by now have caused the Earth to slow down and spiral into the Sun.

3. Against Newton’s Second Law of Motion

The phib also violates Newton’s second law of motion, by completely misrepresenting what mass is. It makes it seem as though mass makes motion difficult, or at least has something to do with inhibiting motion. But this is wrong.

As Newton’s second law states, mass is something that inhibits changes in motion. It does not inhibit motion, or cause things to slow down, or arise from things being slowed down. Mass is the property that makes it hard both to speed something up and to slow it down. It makes it harder to throw a lead ball compared to a plastic one, and it also makes the lead ball harder to catch bare-handed than a plastic one. It also makes it difficult to change something’s direction.

To say this another way, Newton’s second law F=ma says that to make a change in an object’s motion (an acceleration a) requires a force (F); the larger the object’s mass (m), the larger the required force must be. Notice that it does not have anything to say about an object’s motion (its velocity v).

To suggest that mass has to do with motion, and not with change in motion, is to suggest that Newton’s law should be F=mv — which, in fact, many pre-Newtonian physicists once believed. Let’s not let a phib throw us back to the misguided science of the Middle Ages!
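To put the contrast in symbols (a one-line illustration, not taken from the post): with no force acting, Newton’s actual law gives

F = m \frac{d v}{d t} = 0 \quad \Rightarrow \quad v(t) = v(0),

so a moving object simply keeps moving, while the pre-Newtonian guess would give

F = m v = 0 \quad \Rightarrow \quad v(t) = 0,

so anything not being pushed would grind to a halt.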

4. Not a Universal Mass-Giver

The phib implies that the Higgs field gives mass to all objects with mass, causing all of them to slow down. After all, if there were a universal “soup” found everywhere, then every object would encounter it. If it were true that the Higgs field acted on all objects in the same way — “universally”, similar to gravity, which pulls on all objects — then every object in our world would get its mass from the Higgs field.

But in fact, the Higgs field only generates the masses of the known elementary particles. More complex particles such as protons and neutrons — and therefore the atoms, molecules, humans and planets that contain them — get most of their mass in another way. The phib, therefore, can’t be right about how the Higgs field does its job.

5. Not Like a Substance

As is true of all fields, the Higgs field is not like a substance, in contrast to soup, molasses, or a crowd. It has no density or materiality, as soup would have. Instead, the Higgs field (like any field!) is more like a property of a substance.

As an analogue, consider air pressure (which is itself an example of an ordinary field). Air is a substance; it is made of molecules, and has density and weight. But air’s pressure is not a thing; it is a property of air, and is not itself a substance. Pressure has no density or weight, and is not made from anything. It just tells you what the molecules of air are doing.

The Higgs field is much more like air pressure than it is like air itself. It simply is not a substance, despite what the phib suggests.

6. Not Filling the Universe

The Higgs field does not “fill” the universe any more than pressure fills the atmosphere. Pressure is found throughout the atmosphere, yes, but it is not what makes the atmosphere full. Air is what constitutes the atmosphere, and is the only thing that can be said, in any sense, to fill it.

While a substance could indeed make the universe more full than it would otherwise be, a field of the universe is not a substance. Like the magnetic field or any other cosmic field, the Higgs field exists everywhere — but the universe would be just as empty (and just as full) if the Higgs field did not exist.

7. Not Merely By Its Presence

Finally, the phib doesn’t mention the thing that makes the Higgs field special, and that actually allows it to affect the masses of particles. This is not merely that it is present everywhere across the universe, but that it is, in a sense, “on.” To give you a sense of what this might mean, consider the wind.

On a day with a steady breeze, we can all feel the wind. But even when the wind is calm, physicists would say that the wind exists, though it is inactive. In the language I’m using here, I would say that the wind is something that can always be measured — it always exists — but

  • on a calm day it is “off” or “zero”, while
  • on a day with a steady breeze, it is “on” or “non-zero”.

In other words, the wind is always present, whether it is calm or steady; it can always be measured.

In rough analogy, the Higgs field, though switched on in our universe, might in principle have been off. A switched-off Higgs field would not give mass to anything. The Higgs field affects the masses of elementary particles in our universe only because, in addition to being present, it is on. (Physicists would say it has a “non-zero average value” or a “non-zero vacuum expectation value”.)

Why is it on? Great question. From the theoretical point of view, it could have been either on or off, and we don’t know why the universe arranged for the former.

Beyond the Higgs Phib

I don’t think we can really view a phib with so many issues as an acceptable pseudo-explanation. It causes more problems and confusions than it resolves.

But I wish it were as easy to replace the Higgs phib as it is to criticize it. No equally short story can do the job. If such a brief tale were easy to imagine, someone would have invented it by now.

Some years ago, I found a way to explain how the Higgs field works that is non-technical and yet correct — one that I would be happy to present to my professional physics colleagues without apology or embarrassment. (In fact, I did just that in my recent talks at the physics departments at Vanderbilt and Irvine.) Although I tried delivering it to non-experts in an hour-long talk, I found that it just doesn’t fit. But it did fit quite well in a course for non-experts, in which I had several hours to lay out the basics of particle physics before addressing the Higgs field’s role.

That experience motivated me to write a book that contains this explanation. It isn’t brief, and it’s not a light read — the universe is subtle, and I didn’t want to water the explanation down. But it does deliver what it promises. It first carefully explains what “elementary particles” and fields really are [here’s more about fields] and what it means for such a “particle” to have mass. Then it gives the explanation of the Higgs field’s effects — to the extent we understand them. (Readers of the book are welcome to ask me questions about its content; I am collecting Q&A and providing additional resources for readers on this part of the website.)

A somewhat more technical explanation of how the Higgs field works is given elsewhere on this website: check out this series of pages followed by this second series, with additional technical information available in this third series. These pages do not constitute a light read either! But if you are comfortable with first-year university math and physics, you should be able to follow them. Ask questions as need be.

Between the book, the above-mentioned series of webpages, and my answers to your questions, I hope that most readers who want to know more about the Higgs field can find the explanation that best fits their interests and background.

John BaezAgent-Based Models (Part 7)

Last time I presented a simple, limited class of agent-based models where each agent independently hops around a graph. I wrote:

Today the probability for an agent to hop from one vertex of the graph to another by going along some edge will be determined the moment the agent arrives at that vertex. It will depend only on the agent and the various edges leaving that vertex. Later I’ll want this probability to depend on other things too—like whether other agents are at some vertex or other. When we do that, we’ll need to keep updating this probability as the other agents move around.

Let me try to figure out that generalization now.

Last time I discovered something surprising to me. To describe it, let’s bring in some jargon. The conditional probability per time of an agent making a transition from its current state to a chosen other state (given that it doesn’t make some other transition) is called the hazard function of that transition. In a Markov process, the hazard function is actually a constant, independent of how long the agent has been in its current state. In a semi-Markov process, the hazard function is a function only of how long the agent has been in its current state.

For example, people like to describe radioactive decay using a Markov process, since experimentally it doesn’t seem that ‘old’ radioactive atoms decay at a higher or lower rate than ‘young’ ones. (Quantum theory says this can’t be exactly true, but nobody has seen deviations yet.) On the other hand, the death rate of people is highly non-Markovian, but we might try to describe it using a semi-Markov process. Shortly after birth it’s high—that’s called ‘infant mortality’. Then it goes down, and then it gradually increases.

We definitely want our agent-based models to have the ability to describe semi-Markov processes. What surprised me last time is that I could do it without explicitly keeping track of how long the agent has been in its current state, or when it entered its current state!

The reason is that we can decide which state an agent will transition to next, and when, as soon as it enters its current state. This decision is random, of course. But using random number generators we can make this decision the moment the agent enters the given state—because there is nothing more to be learned by waiting! I described an algorithm for doing this.

I’m sure this is well-known, but I had fun rediscovering it.
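Here is a rough sketch in Python of deciding, at arrival time, when a single semi-Markov transition would fire; it is an inverse-transform version and not necessarily the exact algorithm from the earlier post. With several competing transitions, one would sample a time for each and take the earliest.

```python
import math, random

# The survival probability after residence time tau is
# S(tau) = exp(-integral_0^tau h(s) ds), so we draw u in (0,1] and return the
# first tau at which the integrated hazard reaches -log(u).

def sample_waiting_time(hazard, dt=1e-3, t_max=1e6):
    u = 1.0 - random.random()          # uniform in (0, 1]
    target = -math.log(u)
    acc, tau = 0.0, 0.0
    while acc < target and tau < t_max:
        acc += hazard(tau) * dt        # crude numerical integration
        tau += dt
    return tau

# A constant hazard reproduces the Markov (exponential) case ...
print(sample_waiting_time(lambda s: 0.5))
# ... while a bathtub-shaped hazard mimics infant mortality plus old-age risk.
print(sample_waiting_time(lambda s: 0.3 * math.exp(-s) + 0.01 * s))
```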

But today I want to allow the hazard function for a given agent to make a given transition to depend on the states of other agents. In this case, if some other agent randomly changes state, we will need to recompute our agent’s hazard function. There is probably no computationally feasible way to avoid this, in general. In some analytically solvable models there might be—but we’re simulating systems precisely because we don’t know how to solve them analytically.

So now we’ll want to keep track of the residence time of each agent—that is, how long it’s been in its current state. But William Waites pointed out a clever way to do this: it’s cheaper to keep track of the agent’s arrival time, i.e. when it entered its current state. This way you don’t need to keep updating the residence time. Whenever you need to know the residence time, you can just subtract the arrival time from the current clock time.

Even more importantly, our model should now have ‘informational links’ from states to transitions. If we want the presence or absence of agents in some state to affect the hazard function of some transition, we should draw a ‘link’ from that state to that transition! Of course you could say that anything is allowed to affect anything else. But this would create an undisciplined mess where you can’t keep track of the chains of causation. So we want to see explicit ‘links’.

So, here’s my new modeling approach, which generalizes the one we saw last time. For starters, a model should have:

• a finite set V of vertices or states,

• a finite set E of edges or transitions,

• maps u, d \colon E \to V mapping each edge to its source and target, also called its upstream and downstream,

• a finite set A of agents,

• a finite set L of links,

• maps s \colon L \to V and t \colon L \to E mapping each link to its source (a state) and its target (a transition).
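Here is a minimal sketch, in Python, of how this data might be packaged; the encoding and the SIR-style example are illustrative choices, not part of the post or the stock-flow paper.

```python
from dataclasses import dataclass

@dataclass
class Model:
    V: set            # states (vertices)
    E: set            # transitions (edges)
    u: dict           # upstream state of each transition,   u[e] in V
    d: dict           # downstream state of each transition, d[e] in V
    A: set            # agents
    L: set            # links
    s: dict           # source of each link (a state),       s[l] in V
    t: dict           # target of each link (a transition),  t[l] in E

    def links_into(self, e):
        """The set t^{-1}(e) of links affecting transition e."""
        return {l for l in self.L if self.t[l] == e}

# Example: susceptible agents become infected at a rate that depends on who
# is currently in state 'I', so we draw a link from 'I' to 'infection'.
m = Model(
    V={"S", "I", "R"},
    E={"infection", "recovery"},
    u={"infection": "S", "recovery": "I"},
    d={"infection": "I", "recovery": "R"},
    A={"alice", "bob", "carol"},
    L={"l1"},
    s={"l1": "I"},
    t={"l1": "infection"},
)
print(m.links_into("infection"))
```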

All of this stuff, except for the set of agents, is exactly what we had in our earlier paper on stock-flow models, where we treated people en masse instead of as individual agents. You can see this in Section 2.1 here:

• John Baez, Xiaoyan Li, Sophie Libkind, Nathaniel D. Osgood, Evan Patterson, Compositional modeling with stock and flow models.

So, I’m trying to copy that paradigm, and eventually unify the two paradigms as much as possible.

But they’re different! In particular, our agent-based models will need a ‘jump function’. This says when each agent a \in A will undergo a transition e \in E if it arrives at the state upstream to that transition at a specific time t \in \mathbb{R}. This jump function will not be deterministic: it will be a stochastic function, just as it was in yesterday’s formalism. But today it will depend on more things! Yesterday it depended only on a, e and t. But now the links will come into play.

For each transition e \in E, there is a set of links whose target is that transition, namely

t^{-1}(e) = \{\ell \in L \; \vert \; t(\ell) = e \}

Each link \ell \in t^{-1}(e) will have one state v as its source. We say this state affects the transition e via the link \ell.

We want the jump function for the transition e to depend on the presence or absence of agents in each state that affects this transition.

Which agents are in a given state? Well, it depends! But those agents will always form some subset of A, and thus an element of 2^A. So, we want the jump function for the transition e to depend on an element of

\prod_{\ell \in t^{-1}(e)} 2^A = 2^{A \times t^{-1}(e)}

I’ll call this element S_e. And as mentioned earlier, the jump function will also depend on a choice of agent a \in A and on the arrival time of the agent a.

So, we’ll say there’s a jump function j_e for each transition e, which is a stochastic function

j_e \colon A \times 2^{A \times t^{-1}(e)} \times \mathbb{R} \rightsquigarrow \mathbb{R}

The idea, then, is that j_e(a, S_e, t) is the answer to this question:

If at time t agent a arrived at the vertex u(e), and the agents at states linked to the edge e are described by the set S_e, when will agent a move along the edge e to the vertex d(e), given that it doesn’t do anything else first?

The answer to this question can keep changing as agents other than a move around, since the set S_e can keep changing. This is the big difference between today’s formalism and yesterday’s.
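To make the type of a jump function concrete, here is a hypothetical example in Python; the names, the ‘I’ state, and the rate constant are illustrative, and it cheats slightly by replacing the full subset S_e with a simple occupancy count of each linked state.

```python
import math, random

BETA = 0.1  # made-up infection rate constant

def j_infection(agent, counts, arrival_time):
    """Random absolute time at which `agent` would get infected, given it
    arrived upstream at `arrival_time` and `counts` records the occupancy
    of each state linked to this transition."""
    n_infected = counts.get("I", 0)
    if n_infected == 0:
        return math.inf                     # nobody to catch it from: never
    rate = BETA * n_infected                # hazard grows with exposure
    return arrival_time + random.expovariate(rate)
```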

Here’s how we run our model. At every moment in time we keep track of some information about each agent a \in A, namely:

• Which vertex is it at now? We call this vertex the agent’s state, \sigma(a).

• When did it arrive at this vertex? We call this time the agent’s arrival time, \alpha(a).

• For each edge e whose upstream is \sigma(a), when will agent a move along this edge if it doesn’t do anything else first? Call this time T(a,e).

I need to explain how we keep updating these pieces of information (supposing we already have them). Let’s assume that at some moment in time t_i an agent makes a transition. More specifically, suppose agent \underline{a} \in A makes a transition \underline{e} from the state

\underline{v} = u(\underline{e}) \in V

to the state

\underline{v}' = d(\underline{e}) \in V.

At this moment we update the following information:

1) We set

\alpha(\underline{a}) := t_i

(So, we update the arrival time of that agent.)

2) We set

\sigma(\underline{a}) := \underline{v}'

(So, we update the state of that agent.)

3) We recompute the subset of agents in the state \underline{v} (by removing \underline{a} from this subset) and in the state \underline{v}' (by adding \underline{a} to this subset).

4) For every transition f that’s affected by the state \underline{v} or the state \underline{v}', and for every agent a in the upstream state of that transition, we set

T(a,f) := j_f(a, S_f, \alpha(a))

where S_f is the element of 2^{A \times t^{-1}(f)} saying which subset of agents is in each state affecting the transition f. (So, we update our table of times at which agent a will make the transition f, given that it doesn’t do anything else first.)

Now we need to compute the next time at which something happens, namely t_{i+1}. And we need to compute what actually happens then!

To do this, we look through our table of times T(a,e) for each agent a and all transitions out of the state that agent is in, and see which time is smallest. If there’s a tie, break it. Then we reset \underline{a} and \underline{e} to be the agent-edge pair that minimizes T(a,e).

5) We set

t_{i+1} := T(\underline{a},\underline{e})

Then we loop back around to step 1), but with i+1 replacing i.
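Putting steps 1) through 5) together, here is a minimal sketch in Python of the whole loop, reusing the Model sketch above and a dict of jump functions like the hypothetical j_infection. It simplifies step 4) by rescheduling every agent after each event rather than only those whose transitions are affected.

```python
import math

def run(m, jump, sigma, t_end=100.0):
    """m: a Model; jump[e](agent, counts, arrival) -> absolute time;
    sigma: dict sending each agent to its initial state."""
    alpha = {a: 0.0 for a in m.A}                          # arrival times
    occupants = {v: {a for a in m.A if sigma[a] == v} for v in m.V}

    def counts(e):   # occupancy of each state linked to transition e
        return {m.s[l]: len(occupants[m.s[l]]) for l in m.links_into(e)}

    def schedule(a):  # candidate times T(a, e) for a's outgoing transitions
        return {e: jump[e](a, counts(e), alpha[a])
                for e in m.E if m.u[e] == sigma[a]}

    T = {a: schedule(a) for a in m.A}
    t = 0.0
    while True:
        pending = [(T[a][e], a, e) for a in m.A for e in T[a]]
        if not pending:
            break
        t, a_, e_ = min(pending)                           # step 5)
        if t > t_end or math.isinf(t):
            break
        occupants[m.u[e_]].discard(a_)                     # step 3)
        occupants[m.d[e_]].add(a_)
        sigma[a_], alpha[a_] = m.d[e_], t                  # steps 1) and 2)
        T = {a: schedule(a) for a in m.A}                  # step 4), simplified
    return sigma, alpha
```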

Whew! I hope you followed that. If not, please ask questions.

Doug Natelson The future of the semiconductor industry, + The Mechanical Universe

 Three items of interest:

  • This article is a nice review of present semiconductor memory technology.  The electron micrographs in Fig. 1 and the scaling history in Fig. 3 are impressive.
  • This article in IEEE Spectrum is a very interesting look at how some people think we will get to chips for AI applications that contain a trillion (\(10^{12}\)) transistors.  For perspective, the processor in my laptop used to write this has about 40 billion transistors.  (The article is nice, though the first figure commits the terrible sin of having no y-axis number or label; clearly it's supposed to represent exponential growth as a function of time in several different parameters.)
  • Caltech announced the passing of David Goodstein, renowned author of States of Matter and several books about the energy transition.  I'd written about my encounter with him, and I wanted to take this opportunity to pass along a working link to the youtube playlist for The Mechanical Universe.  While the animation can look a little dated, it's worth noting that when this was made in the 1980s, the CGI was cutting edge stuff that was presented at siggraph.

April 15, 2024

John Preskill How I didn’t become a philosopher (but wound up presenting a named philosophy lecture anyway)

Many people ask why I became a theoretical physicist. The answer runs through philosophy—which I thought, for years, I’d left behind in college.

My formal relationship with philosophy originated with Mr. Bohrer. My high school classified him as a religion teacher, but he co-opted our junior-year religion course into a philosophy course. He introduced us to Plato’s cave, metaphysics, and the pursuit of the essence beneath the skin of appearance. The essence of reality overlaps with quantum theory and relativity, which fascinated him. Not that he understood them, he’d hasten to clarify. But he passed along that fascination to me. I’d always loved dealing in abstract ideas, so the notion of studying the nature of the universe attracted me. A friend and I joked about growing up to be philosophers and—on account of not being able to find jobs—living in cardboard boxes next to each other.

After graduating from high school, I searched for more of the same in Dartmouth College’s philosophy department. I began with two prerequisites for the philosophy major: Moral Philosophy and Informal Logic. I adored those courses, but I adored all my courses.

As a sophomore, I embarked upon Dartmouth’s philosophy-of-science course. I was one of the course’s youngest students, but the professor assured me that I’d accumulated enough background information in science and philosophy classes. Yet he and the older students threw around technical terms, such as qualia, that I’d never heard of. Those terms resurfaced in the assigned reading, again without definitions. I struggled to follow the conversation.

Meanwhile, I’d been cycling through the sciences. I’d taken my high school’s highest-level physics course, senior year—AP Physics C: Mechanics and Electromagnetism. So, upon enrolling in college, I made the rounds of biology, chemistry, and computer science. I cycled back to physics at the beginning of sophomore year, taking Modern Physics I in parallel with Informal Logic. The physics professor, Miles Blencowe, told me, “I want to see physics in your major.” I did, too, I assured him. But I wanted to see most subjects in my major.

Miles, together with department chair Jay Lawrence, helped me incorporate multiple subjects into a physics-centric program. The major, called “Physics Modified,” stood halfway between the physics major and the create-your-own major offered at some American liberal-arts colleges. The program began with heaps of prerequisite courses across multiple departments. Then, I chose upper-level physics courses, a math course, two history courses, and a philosophy course. I could scarcely believe that I’d planted myself in a physics department; although I’d loved physics since my first course in it, I loved all subjects, and nobody in my family did anything close to physics. But my major would provide a well-rounded view of the subject.

From shortly after I declared my Physics Modified major. Photo from outside the National Academy of Sciences headquarters in Washington, DC.

The major’s philosophy course was an independent study on quantum theory. In one project, I dissected the “EPR paper” published by Einstein, Podolsky, and Rosen (EPR) in 1935. It introduced the paradox that now underlies our understanding of entanglement. But who reads the EPR paper in physics courses nowadays? I appreciated having the space to grapple with the original text. Still, I wanted to understand the paper more deeply; the philosophy course pushed me toward upper-level physics classes.

What I thought of as my last chance at philosophy evaporated during my senior spring. I wanted to apply to graduate programs soon, but I hadn’t decided which subject to pursue. The philosophy and history of physics remained on the table. A history-of-physics course, taught by cosmologist Marcelo Gleiser, settled the matter. I worked my rear off in that course, and I learned loads—but I already knew some of the material from physics courses. Moreover, I knew the material more deeply than the level at which the course covered it. I couldn’t stand the thought of understanding the rest of physics only at this surface level. So I resolved to burrow into physics in graduate school. 

Appropriately, Marcelo published a book with a philosopher (and an astrophysicist) this March.

Burrow I did: after a stint in condensed-matter research, I submerged up to my eyeballs in quantum field theory and differential geometry at the Perimeter Scholars International master’s program. My research there bridged quantum information theory and quantum foundations. I appreciated the balance of fundamental thinking and possible applications to quantum-information-processing technologies. The rigorous mathematical style (lemma-theorem-corollary-lemma-theorem-corollary) appealed to my penchant for abstract thinking. Eating lunch with the Perimeter Institute’s quantum-foundations group, I felt at home.

Craving more research at the intersection of quantum thermodynamics and information theory, I enrolled at Caltech for my PhD. As I’d scarcely believed that I’d committed myself to my college’s physics department, I could scarcely believe that I was enrolling in a tech school. I was such a child of the liberal arts! But the liberal arts include the sciences, and I ended up wrapping Caltech’s hardcore vibe around myself like a favorite denim jacket.

Caltech kindled interests in condensed matter; atomic, molecular, and optical physics; and even high-energy physics. Theorists at Caltech thought not only abstractly, but also about physical platforms; so I started to, as well. I began collaborating with experimentalists as a postdoc, and I’m now working with as many labs as I can interface with at once. I’ve collaborated on experiments performed with superconducting qubits, photons, trapped ions, and jammed grains. Developing an abstract idea, then nursing it from mathematics to reality, satisfies me. I’m even trying to redirect quantum thermodynamics from foundational insights to practical applications.

At the University of Toronto in 2022, with my experimental collaborator Batuhan Yılmaz—and a real optics table!

So I did a double-take upon receiving an invitation to present a named lecture at the University of Pittsburgh Center for Philosophy of Science. Even I, despite not being a philosopher, had heard of the cachet of Pitt’s philosophy-of-science program. Why on Earth had I received the invitation? I felt the same incredulity as when I’d handed my heart to Dartmouth’s physics department and then to a tech school. But now, instead of laughing at the image of myself as a physicist, I couldn’t see past it.

Why had I received that invitation? I did a triple-take. At Perimeter, I’d begun undertaking research on resource theories—simple, information-theoretic models for situations in which constraints restrict the operations one can perform. Hardly anyone worked on resource theories then, although they form a popular field now. Philosophers like them, and I’ve worked with multiple classes of resource theories by now.

More recently, I’ve worked with contextuality, a feature that distinguishes quantum theory from classical theories. And I’ve even coauthored papers about closed timelike curves (CTCs), hypothetical worldlines that travel backward in time. CTCs are consistent with general relativity, but we don’t know whether they exist in reality. Regardless, one can simulate CTCs, using entanglement. Collaborators and I applied CTC simulations to metrology—to protocols for measuring quantities precisely. So we kept a foot in practicality and a foot in foundations.

Perhaps the idea of presenting a named lecture on the philosophy of science wasn’t hopelessly bonkers. All right, then. I’d present it.

Presenting at the Center for Philosophy of Science

This March, I presented an ALS Lecture (an Annual Lecture Series Lecture, redundantly) entitled “Field notes on the second law of quantum thermodynamics from a quantum physicist.” Scientists formulated the second law in the early 1800s. It helps us understand why time appears to flow in only one direction. I described three enhancements of that understanding, which have grown from quantum thermodynamics and nonequilibrium statistical mechanics: resource-theory results, fluctuation theorems, and thermodynamic applications of entanglement. I also enjoyed talking with Center faculty and graduate students during the afternoon and evening. Then—being a child of the liberal arts—I stayed in Pittsburgh for half the following Saturday to visit the Carnegie Museum of Art.

With a copy of a statue of the goddess Sekhmet. She lives in the Carnegie Museum of Natural History, which shares a building with the art museum, from which I detoured to see the natural-history museum’s ancient-Egypt area (as Quantum Frontiers regulars won’t be surprised to hear).

Don’t get me wrong: I’m a physicist, not a philosopher. I don’t have the training to undertake philosophy, and I have enough work to do in pursuit of my physics goals. But my high-school self would approve—that self is still me.

Matt Strassler Update to the Higgs FAQ

Although I’ve been slowly revising the Higgs FAQ 2.0, this seemed an appropriate time to bring the Higgs FAQ on this website fully into the 2020’s. You will find the Higgs FAQ 3.0 here; it explains the basics of the Higgs boson and Higgs field, along with some of the wider context.

For deeper explanations of the Higgs field:

  • if you are comfortable with math, you can find this series of pages useful (but you will probably want to read this series first).
  • if you would prefer to avoid the math, a full and accurate conceptual explanation of the Higgs field is given in my book.

Events: this week I am speaking Tuesday in Berkeley, CA; Wednesday in Seattle, WA (at Town Hall); and Thursday outside of Portland, OR (at the Powell’s bookstore in Cedar Hills). Click here for more details.

April 14, 2024

John Baez Protonium

It looks like they’ve found protonium in the decay of a heavy particle!

Protonium is made of a proton and an antiproton orbiting each other. It lasts a very short time before they annihilate each other.

It’s a bit like a hydrogen atom where the electron has been replaced with an antiproton! But it’s much smaller than a hydrogen atom. And unlike a hydrogen atom, which is held together by the electric force, protonium is mainly held together by the strong nuclear force.

There are various ways to make protonium. One is to make a bunch of antiprotons and mix them with protons. This was done accidentally in 2002. They only realized this upon carefully analyzing the data 4 years later.

This time, people were studying the decay of the J/psi particle. The J/psi is made of a heavy quark and its antiparticle. It’s 3.3 times as heavy as a proton, so it’s theoretically able to decay into protonium. And careful study showed that yes, it does this sometimes!

The new paper on this has a rather dry title—not “We found protonium!” But it has over 550 authors, which hints that it’s a big deal. I won’t list them.

• BESIII Collaboration, Observation of the anomalous shape of X(1840) in J/ψ→γ3(π+π−), Phys. Rev. Lett. 132 (2024), 151901.

The idea here is that sometimes the J/ψ particle decays into a gamma ray and 3 pion-antipion pairs. When they examined this decay, they found evidence that an intermediate step involved a particle of mass 1880 MeV/c², a bit more than an already known intermediate of mass 1840 MeV/c².

This new particle’s mass is close to twice the mass of a proton (938 MeV/c²). So, there’s a good chance that it’s protonium!

But how did physicists make protonium by accident in 2002? They were trying to make antihydrogen, which is a positron orbiting an antiproton. To do this, they used the Antiproton Decelerator at CERN. This is just one of the many cool gadgets they keep near the Swiss-French border.

You see, to create antiprotons you need to smash particles at each other at almost the speed of light—so the antiprotons usually shoot out really fast. It takes serious cleverness to slow them down and catch them without letting them bump into matter and annihilate.

That’s what the Antiproton Decelerator does. So they created a bunch of antiprotons and slowed them down. Once they managed to do this, they caught the antiprotons in a Penning trap. This holds charged particles using magnetic and electric fields. Then they cooled the antiprotons—slowed them even more—by letting them interact with a cold gas of electrons. Then they mixed in some positrons. And they got antihydrogen!

But apparently some protons got in there too, so they also made some protonium, by accident. They only realized this when they carefully analyzed the data 4 years later, in a paper with only a few authors:

• N. Zurlo, M. Amoretti, C. Amsler, G. Bonomi, C. Carraro, C. L. Cesar, M. Charlton, M. Doser, A. Fontana, R. Funakoshi, P. Genova, R. S. Hayano, L. V. Jorgensen, A. Kellerbauer, V. Lagomarsino, R. Landua, E. Lodi Rizzini, M. Macri, N. Madsen, G. Manuzio, D. Mitchard, P. Montagna, L. G. Posada, H. Pruys, C. Regenfus, A. Rotondi, G. Testera, D. P. Van der Werf, A. Variola, L. Venturelli and Y. Yamazaki, Production of slow protonium in vacuum, Hyperfine Interactions 172 (2006), 97–105.

Protonium is sometimes called an ‘exotic atom’—though personally I’d consider it an exotic nucleus. The child in me thinks it’s really cool that there’s an abbreviation for protonium, Pn, just like a normal element.

John Preskill “Once Upon a Time”…with a twist

The Noncommuting-Charges World Tour (Part 1 of 4)

This is the first part in a four-part series covering the recent Perspectives article on noncommuting charges. I’ll be posting one part every 6 weeks leading up to my PhD thesis defence.

Thermodynamics problems have surprisingly many similarities with fairy tales. For example, most of them begin with a familiar opening. In thermodynamics, the phrase “Consider an isolated box of particles” serves a similar purpose to “Once upon a time” in fairy tales—both serve as a gateway to their respective worlds. Additionally, both have been around for a long time. Thermodynamics emerged in the Victorian era to help us understand steam engines, while Beauty and the Beast and Rumpelstiltskin, for example, originated about 4000 years ago. Moreover, each concludes with important lessons. In thermodynamics, we learn hard truths such as the futility of defying the second law, while fairy tales often impart morals like the risks of accepting apples from strangers. The parallels go on; both feature archetypal characters—such as wise old men and fairy godmothers versus ideal gases and perfect insulators—and simplified models of complex ideas, like portraying clear moral dichotomies in narratives versus assuming non-interacting particles in scientific models.1

Of all the ways thermodynamic problems are like fairy tales, one is most relevant to me: both have experienced modern reimagining. Sometimes, all you need is a little twist to liven things up. In thermodynamics, noncommuting conserved quantities, or charges, have added a twist.

Unfortunately, my favourite fairy tale, ‘The Hunchback of Notre-Dame,’ does not start with the classic opening line ‘Once upon a time.’ For a story that begins with this traditional phrase, ‘Cinderella’ is a great choice.

First, let me recap some of my favourite thermodynamic stories before I highlight the role that the noncommuting-charge twist plays. The first is the inevitability of the thermal state. For example, this means that, at most times, the state of most sufficiently small subsystems within the box will be close to a specific form (the thermal state).

The second is an apparent paradox that arises in quantum thermodynamics: How do the reversible processes inherent in quantum dynamics lead to irreversible phenomena such as thermalization? If you’ve been keeping up with Nicole Yunger Halpern’s (my PhD co-advisor and fellow fan of fairy tales) recent posts on the eigenstate thermalization hypothesis (ETH) (part 1 and part 2) you already know the answer. The expectation value of a quantum observable is often composed of a sum of basis-state contributions with various phases. As time passes, these phases tend to experience destructive interference, leading to a stable expectation value over a longer period. This stable value tends to align with that of a thermal state. Thus, despite the apparent paradox, stationary dynamics in quantum systems are commonplace.

The third story is about how concentrations of one quantity can cause flows in another. Imagine a box of charged particles that’s initially outside of equilibrium, such that there exist gradients in particle concentration and temperature across the box. The temperature gradient will cause a flow of heat (Fourier’s law) and charged particles (Seebeck effect), and the particle-concentration gradient will cause the same—a flow of particles (Fick’s law) and heat (Peltier effect). These movements are encompassed within Onsager’s theory of transport dynamics…if the gradients are very small. If you’re reading this post on your computer, the Peltier effect is likely at work for you right now by cooling your computer.

What do various derivations of the thermal state’s forms, the eigenstate thermalization hypothesis (ETH), and the Onsager coefficients have in common? Each concept is founded on the assumption that the system we’re studying contains charges that commute with each other (e.g. particle number, energy, and electric charge). It’s only recently that physicists have acknowledged that this assumption was even present.

This is important to note because not all charges commute. In fact, the noncommutation of charges leads to fundamental quantum phenomena, such as the Einstein–Podolsky–Rosen (EPR) paradox, uncertainty relations, and disturbances during measurement. This raises an intriguing question. How would the above mentioned stories change if we introduce the following twist?

“Consider an isolated box with charges that do not commute with one another.” 

This question is at the core of a burgeoning subfield that intersects quantum information, thermodynamics, and many-body physics. I had the pleasure of co-authoring a recent perspective article in Nature Reviews Physics that centres on this topic. Collaborating with me in this endeavour were three members of Nicole’s group: the avid mountain climber, Billy Braasch; the powerlifter, Aleksander Lasek; and Twesh Upadhyaya, known for his prowess in street basketball. Completing our authorship team were Nicole herself and Amir Kalev.

To give you a touchstone, let me present a simple example of a system with noncommuting charges. Imagine a chain of qubits, where each qubit interacts with its nearest and next-nearest neighbours, such as in the image below.

The figure is courtesy of the talented team at Nature. Two qubits form the system S of interest, and the rest form the environment E. A qubit’s three spin components, σ_a for a = x, y, z, form the local noncommuting charges. The dynamics locally transport and globally conserve the charges.

In this interaction, the qubits exchange quanta of spin angular momentum, forming what is known as a Heisenberg spin chain. This chain is characterized by three charges which are the total spin components in the x, y, and z directions, which I’ll refer to as Qx, Qy, and Qz, respectively. The Hamiltonian H conserves these charges, satisfying [H, Qa] = 0 for each a, and these three charges are non-commuting, [Qa, Qb] ≠ 0, for any pair a, b ∈ {x,y,z} where a≠b. It’s noteworthy that Hamiltonians can be constructed to transport various other kinds of noncommuting charges. I have discussed the procedure to do so in more detail here (to summarize that post: it essentially involves constructing a Koi pond).
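As a quick numerical sanity check (an illustration, not from the article), one can verify these commutation relations directly for a short chain:

```python
import numpy as np

# Spin-1/2 operators (Pauli matrices divided by two).
sx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
sy = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
sz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
spin = {"x": sx, "y": sy, "z": sz}

def site_op(op, site, n):
    """Embed a single-qubit operator at position `site` in an n-qubit chain."""
    out = np.array([[1.0 + 0j]])
    for i in range(n):
        out = np.kron(out, op if i == site else np.eye(2))
    return out

def coupling(i, j, n):
    """Heisenberg coupling s_i . s_j between sites i and j."""
    return sum(site_op(spin[a], i, n) @ site_op(spin[a], j, n) for a in "xyz")

n = 4  # a short chain is enough to see the algebra
Q = {a: sum(site_op(spin[a], i, n) for i in range(n)) for a in "xyz"}
H = sum(coupling(i, i + 1, n) for i in range(n - 1))       # nearest neighbours
H = H + sum(coupling(i, i + 2, n) for i in range(n - 2))   # next-nearest neighbours

comm = lambda A, B: A @ B - B @ A
for a in "xyz":
    print(f"[H, Q{a}] = 0?", np.allclose(comm(H, Q[a]), 0))        # True
print("[Qx, Qy] = 0?", np.allclose(comm(Q["x"], Q["y"]), 0))       # False
```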

This is the first in a series of blog posts where I will highlight key elements discussed in the perspective article. Motivated by requests from peers for a streamlined introduction to the subject, I’ve designed this series specifically for a target audience: graduate students in physics. Additionally, I’m gearing up to defend my PhD thesis on noncommuting-charge physics next semester, and these blog posts will double as a fun way to prepare for that.

  1. This opening text was taken from the draft of my thesis. ↩

April 13, 2024

Doug Natelson Electronic structure and a couple of fun links

Real life has been very busy recently.  Posting will hopefully pick up soon.  

One brief item. Earlier this week, Rice hosted Gabi Kotliar for a distinguished lecture, and he gave a very nice, pedagogical talk about different approaches to electronic structure calculations. When we teach undergraduate chemistry on the one hand and solid state physics on the other, we largely neglect electron-electron interactions (except for very particular issues, like Hund's Rules). Trying to solve the many-electron problem fully is extremely difficult. Often, approximating by solving the single-electron problem (e.g. finding the allowed single-electron states for a spatially periodic potential as in a crystal) and then "filling up"* those states gives decent results. As we see in introductory courses, one can try different types of single-electron states. We can start with atomic-like orbitals localized to each site, and end up doing tight binding / LCAO / Hückel (when applied to molecules). Alternately, we can do the nearly-free electron approach and think about Bloch waves. Density functional theory, discussed here, is more sophisticated but can struggle with situations when electron-electron interactions are strong.

One of Prof. Kotliar's big contributions is something called dynamical mean field theory, an approach to strongly interacting problems.  In a "mean field" theory, the idea is to reduce a many-particle interacting problem to an effective single-particle problem, where that single particle feels an interaction based on the averaged response of the other particles.  Arguably the most famous example is in models of magnetism.  We know how to write the energy of a spin \(\mathbf{s}_{i}\) in terms of its interactions \(J\) with other spins \(\mathbf{s}_{j}\) as \(\sum_{j} J \mathbf{s}_{i}\cdot \mathbf{s}_{j}\).  If there are \(z\) such neighbors that interact with spin \(i\), then we can try instead writing that energy as \(zJ \mathbf{s}_{i} \cdot \langle \mathbf{s}_{i}\rangle\), where the angle brackets signify the average.  From there, we can get a self-consistent equation for \(\langle \mathbf{s}_{i}\rangle\).  
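As a minimal illustration of that self-consistency idea, here is a sketch for the simplest case, Ising spins s_i = ±1 (rather than the vector spins above); all names and parameter values are illustrative.

```python
import numpy as np

# Each spin feels the averaged field z*J*<s>, giving <s> = tanh(z*J*<s>/kT),
# which we solve by fixed-point iteration.

def mean_field_magnetization(zJ, kT, tol=1e-10, max_iter=100_000):
    m = 1.0                                  # start from the fully ordered guess
    for _ in range(max_iter):
        m_new = np.tanh(zJ * m / kT)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

# Below the mean-field critical temperature kT_c = zJ the iteration settles
# on a nonzero magnetization; above it, only m = 0 survives.
for kT in [0.5, 0.9, 1.1, 2.0]:
    print(kT, round(mean_field_magnetization(zJ=1.0, kT=kT), 6))
```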

Dynamical mean field theory is rather similar in spirit; there are non-perturbative ways to solve some strong-interaction "quantum impurity" problems.  DMFT is like a way of approximating a whole lattice of strongly interacting sites as a self-consistent quantum impurity problem for one site.  The solutions are not for wave functions but for the spectral function.  We still can't solve every strongly interacting problem, but Prof. Kotliar makes a good case that we have made real progress in how to think about many systems, and when the atomic details matter.

*Here, "filling up" means writing the many-electron wave function as a totally antisymmetric linear combination of single-electron states, including the spin states.

PS - two fun links:

April 12, 2024

Matt von Hippel The Hidden Higgs

Peter Higgs, the theoretical physicist whose name graces the Higgs boson, died this week.

Peter Higgs, after the Higgs boson discovery was confirmed

This post isn’t an obituary: you can find plenty of those online, and I don’t have anything special to say that others haven’t. Reading the obituaries, you’ll notice they summarize Higgs’s contribution in different ways. Higgs was one of the people who proposed what today is known as the Higgs mechanism, the principle by which most (perhaps all) elementary particles gain their mass. He wasn’t the only one: Robert Brout and François Englert proposed essentially the same idea in a paper that was published two months earlier, in August 1964. Two other teams came up with the idea slightly later than that: Gerald Guralnik, Carl Richard Hagen, and Tom Kibble were published one month after Higgs, while Alexander Migdal and Alexander Polyakov found the idea independently in 1965 but couldn’t get it published till 1966.

Higgs did, however, do something that Brout and Englert didn’t. His paper doesn’t just propose a mechanism, involving a field which gives particles mass. It also proposes a particle one could discover as a result. Read the more detailed obituaries, and you’ll discover that this particle was not in the original paper: Higgs’s paper was rejected at first, and he added the discussion of the particle to make it more interesting.

At this point, I bet some of you are wondering what the big deal was. You’ve heard me say that particles are ripples in quantum fields. So shouldn’t we expect every field to have a particle?

Tell that to the other three Higgs bosons.

Electromagnetism has one type of charge, with two signs: plus, and minus. There are electrons, with negative charge, and their anti-particles, positrons, with positive charge.

Quarks have three types of charge, called colors: red, green, and blue. Each of these also has two “signs”: red and anti-red, green and anti-green, and blue and anti-blue. So for each type of quark (like an up quark), there are six different versions: red, green, and blue, and anti-quarks with anti-red, anti-green, and anti-blue.

Diagram of the colors of quarks

When we talk about quarks, we say that the force under which they are charged, the strong nuclear force, is an “SU(3)” force. The “S” and “U” there are shorthand for mathematical properties that are a bit too complicated to explain here, but the “(3)” is quite simple: it means there are three colors.

The Higgs boson’s primary role is to make the weak nuclear force weak, by making the particles that carry it from place to place massive. (That way, it takes too much energy for them to go anywhere, a feeling I think we can all relate to.) The weak nuclear force is an “SU(2)” force. So there should be two “colors” of particles that interact with the weak nuclear force…which includes Higgs bosons. For each, there should also be an anti-color, just like the quarks had anti-red, anti-green, and anti-blue. So we need two “colors” of Higgs bosons, and two “anti-colors”, for a total of four!

But the Higgs boson discovered at the LHC was a neutral particle. It didn’t have any electric charge, or any color. There was only one, not four. So what happened to the other three Higgs bosons?

The real answer is subtle, one of those physics things that’s tricky to concisely explain. But a partial answer is that they’re indistinguishable from the W and Z bosons.

Normally, the fundamental forces have transverse waves, with two polarizations. Light can wiggle along its path back and forth, or up and down, but it can’t wiggle forward and backward. A fundamental force with massive particles is different, because they can have longitudinal waves: they have an extra direction in which they can wiggle. There are two W bosons (plus and minus) and one Z boson, and they all get one more polarization when they become massive due to the Higgs.

That’s three new ways the W and Z bosons can wiggle. That’s the same number as the number of Higgs bosons that went away, and that’s no coincidence. We physicists like to say that the W and Z bosons “ate” the extra Higgs, which is evocative but may sound mysterious. Instead, you can think of it as the two wiggles being secretly the same, mixing together in a way that makes them impossible to tell apart.

The “count”, of how many wiggles exist, stays the same. You start with four Higgs wiggles, and two wiggles each for the precursors of the W+, W-, and Z bosons, giving ten. You end up with one Higgs wiggle, and three wiggles each for the W+, W-, and Z bosons, which still adds up to ten. But which fields match with which wiggles, and thus which particles we can detect, changes. It takes some thought to look at the whole system and figure out, for each field, what kind of particle you might find.

Higgs did that work. And now, we call it the Higgs boson.

Matt Strassler Peter Higgs versus the “God Particle”

The particle physics community is mourning the passing of Peter Higgs, the influential theoretical physicist and 2013 Nobel Prize laureate. Higgs actually wrote very few papers in his career, but he made them count.

It’s widely known that Higgs deeply disapproved of the term “God Particle”. That’s the nickname that has been given to the type of particle (the “Higgs boson”) whose existence he proposed. But what’s not as widely appreciated is why he disliked it, as do most other scientists I know.

It’s true that Higgs himself was an atheist. Still, no matter what your views on such subjects, it might bother you that the notion of a “God Particle” emerged neither from science nor from religion, and could easily be viewed as disrespectful to both of them. Instead, it arose out of marketing and advertising in the publishing industry, and it survives due to another industry: the news media.

But there’s something else more profound — something quite sad, really. The nickname puts the emphasis entirely in the wrong place. It largely obscures what Higgs (and his colleagues/competitors) actually accomplished, and why they are famous among scientists.

Let me ask you this. Imagine a type of particle that

  • once created, vanishes in a billionth of a trillionth of a second,
  • is not found naturally on Earth, nor anywhere in the universe for billions of years,
  • has no influence on daily life — in fact it has never had any direct impact on the human species — and
  • was only discovered when humans started making examples artificially.

This doesn’t seem very God-like to me. What do you think?

Perhaps this does seem spiritual or divine to you, and in that case, by all means call the “Higgs boson” the “God Particle”. But otherwise, you might want to consider alternatives.

For most humans, and even for most professional physicists, the only importance of the Higgs boson is this: it gives us insight into the Higgs field. This field

  • exists everywhere, including within the Earth and within every human body,
  • has existed throughout the history of the known universe,
  • has been reliably constant and steady since the earliest moments of the Big Bang, and
  • is crucial for the existence of atoms, and therefore for the existence of Earth and all its life;

It may even be capable of bringing about the universe’s destruction, someday in the distant future. So if you’re going to assign some divinity to Higgs’ insights, this is really where it belongs.

In short, what’s truly consequential in Higgs’ work (and that of others who had the same basic idea: Robert Brout and Francois Englert, and Gerald Guralnik, C. Richard Hagen and Tom Kibble) is the Higgs field. Your life depends upon the existence and stability of this field. The discovery in 2012 of the Higgs boson was important because it proved that the Higgs field really exists in nature. Study of this type of particle continues at the Large Hadron Collider, not because we are fascinated by the particle per se, but because measuring its properties is the most effective way for us to learn more about the all-important Higgs field.

Professor Higgs helped reveal one of the universe’s great secrets, and we owe him a great deal. I personally feel that we would honor his legacy, in a way that would have pleased him, through better explanations of what he achieved — ones that clarify how he earned a place in scientists’ Hall of Fame for eternity.

April 11, 2024

Scott Aaronson Avi Wigderson wins Turing Award!

Back in 2006, in the midst of an unusually stupid debate in the comment section of Lance Fortnow and Bill Gasarch’s blog, someone chimed in:

Since the point of theoretical computer science is solely to recognize who is the most badass theoretical computer scientist, I can only say:

GO HOME PUNKS!

WIGDERSON OWNS YOU!

Avi Wigderson: central unifying figure of theoretical computer science for decades; consummate generalist who’s contributed to pretty much every corner of the field; advocate and cheerleader for the field; postdoc adviser to a large fraction of all theoretical computer scientists, including both me and my wife Dana; derandomizer of BPP (provided E requires exponential-size circuits). Now, Avi not only “owns you,” he also owns a well-deserved Turing Award (on top of his well-deserved Nevanlinna, Abel, Gödel, and Knuth prizes). As Avi’s health has been a matter of concern to those close to him ever since his cancer treatment, which he blogged about a few years ago, I’m sure today’s news will do much to lift his spirits.

I first met Avi a quarter-century ago, when I was 19, at a PCMI summer school on computational complexity at the Institute for Advanced Study in Princeton. Then I was lucky enough to visit Avi in Israel when he was still a professor at the Hebrew University (and I was a grad student at Berkeley)—first briefly, but then Avi invited me back to spend a whole semester in Jerusalem, which ended up being one of my most productive semesters ever. Then Avi, having by then moved to the IAS in Princeton, hosted me for a one-year postdoc there, and later he and I collaborated closely on the algebrization paper. He’s had a greater influence on my career than all but a tiny number of people, and I’m far from the only one who can say that.

Summarizing Avi’s scientific contributions could easily fill a book, but Quanta and New Scientist and Lance’s blog can all get you started if you’re interested. Eight years ago, I took a stab at explaining one tiny little slice of Avi’s impact—namely, his decades-long obsession with “why the permanent is so much harder than the determinant”—in my IAS lecture Avi Wigderson’s “Permanent” Impact On Me, to which I refer you now (I can’t produce a new such lecture on one day’s notice!).

Huge congratulations to Avi.

Jordan Ellenberg Road trip to totality 2024

The last time we did this it was so magnificent that I said, on the spot, “see you again in 2024,” and seven years didn’t dim my wish to see the sun wink out again. It was easier this time — the path went through Indiana, which is a lot closer to home than St. Louis. More importantly, CJ can drive now, and likes to, so the trip is fully chauffeured. We saw the totality in Zionsville, IN, in a little park at the end of a residential cul-de-sac.

It was a smaller crowd than the one at Festus, MO in 2017; and unlike last time there weren’t a lot of travelers. These were just people who happened to live in Zionsville, IN and who were home in the middle of the day to see the eclipse. There were clouds, and a lot of worries about the clouds, but in the end it was just thin cirrus strips that blocked the sun, and then the non-sun, not at all.

To me it was a little less dramatic this time — because the crowd was more casual, because the temperature drop was less stark in April than it was in August, and of course because it was never again going to be the first time. But CJ and AB thought this one was better. We had very good corona. You could see a tiny red dot on the edge of the sun which was in fact a plasma prominence much bigger than the Earth.

Some notes:

  • We learned our lesson last time when we got caught in a massive traffic jam in the middle of a cornfield. We chose Zionsville because it was in the northern half of the totality, right on the highway, so we could be in the car zipping north on I-65 before the massive wave of northbound traffic out of Indianapolis caught up with us. And we were! Very satisfying, to watch on Google Maps as the traffic jam got longer and longer behind us, but was never quite where we were, as if we were depositing it behind us.
  • We had lunch in downtown Indianapolis where there is a giant Kurt Vonnegut Jr. painted on a wall. CJ is reading Slaughterhouse Five for school — in fact, to my annoyance, it’s the only full novel they’ve read in their American Lit elective. But it’s a pretty good choice for high school assigned reading. In the car I tried to explain Vonnegut’s theory of the granfalloon as it applied to “Hoosier” but neither kid was really interested.
  • We’ve done a fair number of road trips in the Mach-E and this was the first time charging created any annoyance. The Electrify America station we wanted on the way down had two chargers in use and the other two broken, so we had to detour quite a ways into downtown Lafayette to charge at a Cadillac dealership. On the way back, the station we planned on was full with one person waiting in line, so we had to change course and charge at the Whole Foods parking lot, and even there we got lucky as one person was leaving just as we arrived. The charging process probably added an hour to our trip each way.
  • While we charged at the Whole Foods in Schaumburg we hung out at the Woodfield Mall. Nostalgic feelings, for this suburban kid, to be in a thriving, functioning mall, with groups of kids just hanging out and vaguely shopping, the way we used to. The malls in Madison don’t really work like this any more. Is it a Chicago thing?
  • CJ is off to college next year. Sad to think there may not be any more roadtrips, or at least any more roadtrips where all of us are starting from home.
  • I was wondering whether total eclipses in the long run are equidistributed on the Earth’s surface and the answer is no: Ernie Wright at NASA made an image of the last 5000 years of eclipse paths superimposed:

There are more in the northern hemisphere than the southern because there are more eclipses in the summer (sun’s up longer!) and the sun is a little farther (whence visually a little smaller and more eclipsible) during northern hemisphere summer than southern hemisphere summer.

See you again in 2045!

April 09, 2024

Tommaso Dorigo Goodbye Peter Higgs, And Thanks For The Boson

Peter Higgs passed away yesterday, at the age of 94. The Scottish physicist, a winner of the 2013 Nobel Prize in Physics together with Francois Englert, hypothesized in 1964 the existence of the most mysterious elementary particle we know of, the Higgs boson, which was only discovered 48 years later by the ATLAS and CMS collaborations at the CERN Large Hadron Collider.


read more

April 05, 2024

Terence Tao Marton’s conjecture in abelian groups with bounded torsion

Tim Gowers, Ben Green, Freddie Manners, and I have just uploaded to the arXiv our paper “Marton’s conjecture in abelian groups with bounded torsion”. This paper fully resolves a conjecture of Katalin Marton (the bounded torsion case of the Polynomial Freiman–Ruzsa conjecture):

Theorem 1 (Marton’s conjecture) Let {G = (G,+)} be an abelian {m}-torsion group (thus, {mx=0} for all {x \in G}), and let {A \subset G} be such that {|A+A| \leq K|A|}. Then {A} can be covered by at most {(2K)^{O(m^3)}} translates of a subgroup {H} of {G} of cardinality at most {|A|}. Moreover, {H} is contained in {\ell A - \ell A} for some {\ell \ll (2 + m \log K)^{O(m^3 \log m)}}.

We had previously established the {m=2} case of this result, with the number of translates bounded by {(2K)^{12}} (which was subsequently improved to {(2K)^{11}} by Jyun-Jie Liao), but without the additional containment {H \subset \ell A - \ell A}. It remains a challenge to replace {\ell} by a bounded constant (such as {2}); this is essentially the “polynomial Bogolyubov conjecture”, which is still open. The {m=2} result has been formalized in the proof assistant language Lean, as discussed in this previous blog post. As a consequence of this result, many of the applications of the previous theorem may now be extended from characteristic {2} to higher characteristic.
Our proof techniques are a modification of those in our previous paper, and in particular continue to be based on the theory of Shannon entropy. For inductive purposes, it turns out to be convenient to work with the following version of the conjecture (which, up to {m}-dependent constants, is actually equivalent to the above theorem):

Theorem 2 (Marton’s conjecture, entropy form) Let {G} be an abelian {m}-torsion group, and let {X_1,\dots,X_m} be independent finitely supported random variables on {G}, such that

\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i] \leq \log K,

where {{\bf H}} denotes Shannon entropy. Then there is a uniform random variable {U_H} on a subgroup {H} of {G} such that

\displaystyle \frac{1}{m} \sum_{i=1}^m d[X_i; U_H] \ll m^3 \log K,

where {d} denotes the entropic Ruzsa distance (see previous blog post for a definition); furthermore, if all the {X_i} take values in some symmetric set {S}, then {H} lies in {\ell S} for some {\ell \ll (2 + \log K)^{O(m^3 \log m)}}.

As a first approximation, one should think of all the {X_i} as identically distributed, and having the uniform distribution on {A}, as this is the case that is actually relevant for implying Theorem 1; however, the recursive nature of the proof of Theorem 2 requires one to manipulate the {X_i} separately. It also is technically convenient to work with {m} independent variables, rather than just a pair of variables as we did in the {m=2} case; this is perhaps the biggest additional technical complication needed to handle higher characteristics.
The strategy, as with the previous paper, is to attempt an entropy decrement argument: to try to locate modifications {X'_1,\dots,X'_m} of {X_1,\dots,X_m} that are reasonably close (in Ruzsa distance) to the original random variables, while decrementing the “multidistance”

\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i]

which turns out to be a convenient metric for progress (for instance, this quantity is non-negative, and vanishes if and only if the {X_i} are all translates of a uniform random variable {U_H} on a subgroup {H}). In the previous paper we modified the corresponding functional to minimize by some additional terms in order to improve the exponent {12}, but as we are not attempting to completely optimize the constants, we did not do so in the current paper (and as such, our arguments here give a slightly different way of establishing the {m=2} case, albeit with somewhat worse exponents).
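
To make the multidistance concrete, here is a minimal numerical sketch (my own illustration, not code from the paper; the dictionary representation and the function names are conveniences I chose), using only the Python standard library. It checks the vanishing property in the simplest case, where all the {X_i} are uniform on a subgroup:

    # Multidistance H[X_1+...+X_m] - (1/m) * sum_i H[X_i] for independent,
    # finitely supported random variables on (Z/mZ)^d, with entropies in bits.
    from collections import defaultdict
    from math import log2

    def entropy(dist):
        """Shannon entropy of a distribution given as {group element: probability}."""
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    def convolve(dist_a, dist_b, m):
        """Distribution of the sum of two independent variables on (Z/mZ)^d."""
        out = defaultdict(float)
        for a, pa in dist_a.items():
            for b, pb in dist_b.items():
                out[tuple((x + y) % m for x, y in zip(a, b))] += pa * pb
        return dict(out)

    def multidistance(dists, m):
        """H[X_1+...+X_m] - (1/m) * sum_i H[X_i] for independent X_i."""
        total = dists[0]
        for d in dists[1:]:
            total = convolve(total, d, m)
        return entropy(total) - sum(entropy(d) for d in dists) / len(dists)

    # m = 2, both X_i uniform on the subgroup H = {(0,0), (1,0)} of (Z/2Z)^2:
    # the multidistance vanishes, as it should for translates of U_H.
    m = 2
    uniform_H = {(0, 0): 0.5, (1, 0): 0.5}
    print(multidistance([uniform_H] * m, m))  # 0.0
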
As before, we search for such improved random variables {X'_1,\dots,X'_m} by introducing more independent random variables – we end up taking an array of {m^2} random variables {Y_{i,j}} for {i,j=1,\dots,m}, with each {Y_{i,j}} a copy of {X_i}, and forming various sums of these variables and conditioning them against other sums. Thanks to the magic of Shannon entropy inequalities, it turns out that it is guaranteed that at least one of these modifications will decrease the multidistance, except in an “endgame” situation in which certain random variables are nearly (conditionally) independent of each other, in the sense that certain conditional mutual informations are small. In particular, in the endgame scenario, the row sums {\sum_j Y_{i,j}} of our array will end up being close to independent of the column sums {\sum_i Y_{i,j}}, subject to conditioning on the total sum {\sum_{i,j} Y_{i,j}}. Not coincidentally, this type of conditional independence phenomenon also shows up when considering row and column sums of iid Gaussian random variables, as a specific feature of the Gaussian distribution. It is related to the more familiar observation that if {X,Y} are two independent copies of a Gaussian random variable, then {X+Y} and {X-Y} are also independent of each other.
Up until now, the argument does not use the {m}-torsion hypothesis, nor the fact that we work with an {m \times m} array of random variables as opposed to some other shape of array. But now the torsion enters in a key role, via the obvious identity

\displaystyle \sum_{i,j} i Y_{i,j} + \sum_{i,j} j Y_{i,j} + \sum_{i,j} (-i-j) Y_{i,j} = 0.

In the endgame, any pair of these three random variables is close to independent (after conditioning on the total sum {\sum_{i,j} Y_{i,j}}). Applying some “entropic Ruzsa calculus” (and in particular an entropic version of the Balog–Szemerédi–Gowers inequality), one can then arrive at a new random variable {U} of small entropic doubling that is reasonably close to all of the {X_i} in Ruzsa distance, which provides the final way to reduce the multidistance.
Besides the polynomial Bogolyubov conjecture mentioned above (which we do not know how to address by entropy methods), the other natural question is to try to develop a characteristic zero version of this theory in order to establish the polynomial Freiman–Ruzsa conjecture over torsion-free groups, which in our language asserts (roughly speaking) that random variables of small entropic doubling are close (in Ruzsa distance) to a discrete Gaussian random variable, with good bounds. The above machinery is consistent with this conjecture, in that it produces lots of independent variables related to the original variable, various linear combinations of which obey the same sort of entropy estimates that Gaussian random variables would exhibit, but what we are missing is a way to get back from these entropy estimates to an assertion that the random variables really are close to Gaussian in some sense. In continuous settings, Gaussians are known to extremize the entropy for a given variance, and of course we have the central limit theorem that shows that averages of random variables typically converge to a Gaussian, but it is not clear how to adapt these phenomena to the discrete Gaussian setting (without the circular reasoning of assuming the polynomial Freiman–Ruzsa conjecture to begin with).

Matt von Hippel Making More Nails

They say when all you have is a hammer, everything looks like a nail.

Academics are a bit smarter than that. Confidently predict a world of nails, and you fall to the first paper that shows evidence of a screw. There are limits to how long you can delude yourself when your job is supposed to be all about finding the truth.

You can make your own nails, though.

Suppose there’s something you’re really good at. Maybe, like many of my past colleagues, you can do particle physics calculations faster than anyone else, even when the particles are super-complicated hypothetical gravitons. Maybe you know more than anyone else about how to make a quantum computer, or maybe you just know how to build a “quantum computer“. Maybe you’re an expert in esoteric mathematics, who can re-phrase anything in terms of the arcane language of category theory.

That’s your hammer. Get good enough with it, and anyone with a nail-based problem will come to you to solve it. If nails are trendy, then you’ll impress grant committees and hiring committees, and your students will too.

When nails aren’t trendy, though, you need to try something else. If your job is secure, and you don’t have students with their own insecure jobs banging down your door, then you could spend a while retraining. You could form a reading group, pick up a textbook or two about screwdrivers and wrenches, and learn how to use different tools. Eventually, you might find a screwdriving task you have an advantage with, something you can once again do better than everyone else, and you’ll start getting all those rewards again.

Or, maybe you won’t. You’ll get less funding to hire people, so you’ll do less research, so your work will get less impressive and you’ll get less funding, and so on and so forth.

Instead of risking that, most academics take another path. They take what they’re good at, and invent new problems in the new trendy area to use that expertise.

If everyone is excited about gravitational waves, you turn a black hole calculation into a graviton calculation. If companies are investing in computation in the here-and-now, then you find ways those companies can use insights from your quantum research. If everyone wants to know how AI works, you build a mathematical picture that sort of looks like one part of how AI works, and do category theory to it.

At first, you won’t be competitive. Your hammer isn’t going to work nearly as well as the screwdrivers people have been using forever for these problems, and there will be all sorts of new issues you have to solve just to get your hammer in position in the first place. But that doesn’t matter so much, as long as you’re honest. Academic research is expected to take time, applications aren’t supposed to be obvious. Grant committees care about what you’re trying to do, as long as you have a reasonably plausible story about how you’ll get there.

(Investors are also not immune to a nice story. Customers are also not immune to a nice story. You can take this farther than you might think.)

So, unlike the re-trainers, you survive. And some of the time, you make it work. Your hammer-based screwdriving ends up morphing into something that, some of the time, actually does something the screwdrivers can’t. Instead of delusionally imagining nails, you’ve added a real ersatz nail to the world, where previously there was just a screw.

Making nails is a better path for you. Is it a better path for the world? I’m not sure.

If all those grants you won, all those jobs you and your students got, all that money from investors or customers drawn in by a good story, if that all went to the people who had the screwdrivers in the first place, could they have done a better job?

Sometimes, no. Sometimes you happen upon some real irreproducible magic. Your hammer is Thor’s hammer, and when hefted by the worthy it can do great things.

Sometimes, though, your hammer was just the hammer that got the funding. Now every screwdriver kit has to have a space for a little hammer, when it could have had another specialized screwdriver that fit better in the box.

In the end, the world is built out of these kinds of ill-fitting toolkits. We all try to survive, both as human beings and by our sub-culture’s concept of the good life. We each have our hammers, and regardless of whether the world is full of screws, we have to convince people they want a hammer anyway. Everything we do is built on a vast rickety pile of consequences, the end-results of billions of people desperate to be wanted. For those of us who love clean solutions and ideal paths, this is maddening and frustrating and terrifying. But it’s life, and in a world where we never know the ideal path, screw-nails and nail-screws are the best way we’ve found to get things done.

Scott Aaronson And yet quantum computing continues to progress

Pissing away my life in a haze of doomscrolling, sporadic attempts to “parent” two rebellious kids, and now endless conversations about AI safety, I’m liable to forget for days that I’m still mostly known (such as I am) as a quantum computing theorist, and this blog is still mostly known as a quantum computing blog. Maybe it’s just that I spent a quarter-century on quantum computing theory. As an ADHD sufferer, anything could bore me after that much time, even one of the a-priori most exciting things in the world.

It’s like, some young whippersnappers proved another monster 80-page theorem that I’ll barely understand tying together the quantum PCP conjecture, area laws, and Gibbs states? Another company has a quantum software platform, or hardware platform, and they’ve issued a press release about it? Another hypester claimed that QC will revolutionize optimization and machine learning, based on the usual rogues’ gallery of quantum heuristic algorithms that don’t seem to outperform classical heuristics? Another skeptic claimed that scalable quantum computing is a pipe dream—mashing together the real reasons why it’s difficult with basic misunderstandings of the fault-tolerance theorem? In each case, I’ll agree with you that I probably should get up, sit at my laptop, and blog about it (it’s hard to blog with two thumbs), but as likely as not I won’t.


And yet quantum computing continues to progress. In December we saw Harvard and QuEra announce a small net gain from error-detection in neutral atoms, and accuracy that increased with the use of larger error-correcting codes. Today, a collaboration between Microsoft and Quantinuum has announced what might be the first demonstration of error-corrected two-qubit entangling gates with substantially lower error than the same gates applied to the bare physical qubits. (This is still at the stage where you need to be super-careful in how you phrase every such sentence—experts should chime in if I’ve already fallen short; I take responsibility for any failures to error-correct this post.)

You can read the research paper here, or I’ll tell you the details to the best of my understanding (I’m grateful to Microsoft’s Krysta Svore and others from the collaboration for briefing me by Zoom). The collaboration used a trapped-ion system with 32 fully-connected physical qubits (meaning, the qubits can be shuttled around a track so that any qubit can directly interact with any other). One can apply an entangling gate to any pair of qubits with ~99.8% fidelity.

What did they do with this system? They created up to 4 logical encoded qubits, using the Steane code and other CSS codes. Using logical CNOT gates, they then created logical Bell pairs — i.e., (|00⟩+|11⟩)/√2 — and verified that they did this.
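
For readers who want to see what a Bell pair is at the level of raw linear algebra, here is a toy sketch in Python/NumPy (my own illustration: it prepares a physical, unencoded Bell pair, whereas the experiment does the analogous thing on logical, error-corrected qubits):

    # Preparing the Bell pair (|00> + |11>)/sqrt(2): Hadamard on qubit 0, then CNOT.
    import numpy as np

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)            # Hadamard gate
    CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                     [0, 0, 0, 1], [0, 0, 1, 0]])           # control = qubit 0, target = qubit 1
    I2 = np.eye(2)

    psi = np.zeros(4)
    psi[0] = 1.0                                            # start in |00>
    psi = CNOT @ np.kron(H, I2) @ psi                       # H on qubit 0, then CNOT
    print(psi)                                              # ~[0.707, 0, 0, 0.707]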

That’s in the version of their experiment that uses “preselection but not postselection.” In other words, they have to try many times until they prepare the logical initial states correctly—as with magic state factories. But once they do successfully prepare the initial states, there’s no further cheating involving postselection (i.e., throwing away bad results): they just apply the logical CNOT gates, measure, and see what they got.

For me personally, that’s the headline result. But then they do various further experiments to “spike the football.” For one thing, they show that when they do allow postselected measurement outcomes, the decrease in the effective error rate can be much much larger, as large as 800x. That allows them (again, under postselection!) to demonstrate up to two rounds of error syndrome extraction and correction while still seeing a net gain, or three rounds albeit with unclear gain. The other thing they demonstrate is teleportation of fault-tolerant qubits—so, a little fancier than just preparing an encoded Bell pair and then measuring it.

They don’t try to do (e.g.) a quantum supremacy demonstration with their encoded qubits, like Harvard/QuEra did—they don’t have nearly enough qubits for that. But this is already extremely cool, and it sets a new bar in quantum error-correction experiments for others to meet or exceed (superconducting, neutral atom, and photonics people, that means you!). And I wasn’t expecting it! Indeed, I’m so far behind the times that I still imagined Microsoft as committed to a strategy of “topological qubits or bust.” While Microsoft is still pursuing the topological approach, their strategy has clearly pivoted over the last few years towards “whatever works.”

Anyway, huge congratulations to the teams at Microsoft and Quantinuum for their accomplishment!


Stepping back, what is the state of experimental quantum computing, 42 years after Feynman’s lecture, 30 years after Shor’s algorithm, 25 years after I entered the field, 5 years after Google’s supremacy experiment? There’s one narrative that quantum computing is already being used to solve practical problems that couldn’t be solved otherwise (look at all the hundreds of startups! they couldn’t possibly exist without providing real value, could they?). Then there’s another narrative that quantum computing has been exposed as a fraud, an impossibility, a pipe dream. Both narratives seem utterly disconnected from the reality on the ground.

If you want to track the experimental reality, my one-sentence piece of advice would be to focus relentlessly on the fidelity with which experimenters can apply a single physical 2-qubit gate. When I entered the field in the late 1990s, ~50% would’ve been an impressive fidelity. At some point it became ~90%. With Google’s supremacy experiment in 2019, we saw 1000 gates applied to 53 qubits, each gate with ~99.5% fidelity. Now, in superconducting, trapped ions, and neutral atoms alike, we’re routinely seeing ~99.8% fidelities, which is what made possible (for example) the new Microsoft/Quantinuum result. The best fidelities I’ve heard reported this year are more like ~99.9%.

Meanwhile, on paper, it looks like known methods for quantum fault-tolerance, for example using the surface code, should start to become practical once you have 2-qubit fidelities around ~99.99%—i.e., one more “9” from where we are now. And then there should “merely” be the practical difficulty of maintaining that 99.99% fidelity while you scale up to millions or hundreds of millions of physical qubits!

What I’m trying to say is: this looks like a pretty good trajectory! It looks like, if we plot the infidelity on a log scale, the experimentalists have already gone three-quarters of the distance. It now looks like it would be a surprise if we couldn’t have hundreds of fault-tolerant qubits and millions of gates on them within the next decade, if we really wanted that—like something unexpected would have to go wrong to prevent it.
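
To spell out the “three-quarters of the distance” arithmetic, here is a back-of-the-envelope sketch (my own numbers: ~50% fidelity as the late-1990s starting point, ~99.9% as the best reported today, and ~99.99% as the rough fault-tolerance target from the previous paragraph):

    # Fraction of the log-scale infidelity distance covered so far.
    from math import log10

    start  = 1 - 0.50      # ~50% 2-qubit gate fidelity, late 1990s
    today  = 1 - 0.999     # ~99.9%, the best fidelities reported recently
    target = 1 - 0.9999    # ~99.99%, rough ballpark where surface-code fault tolerance gets practical

    progress = (log10(start) - log10(today)) / (log10(start) - log10(target))
    print(f"fraction of the way there (log scale): {progress:.2f}")  # ~0.73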

Wouldn’t it be ironic if all that were true, but it simply mattered much less than we hoped in the 1990s? Either just because the set of problems for which a quantum computer is useful has remained stubbornly more specialized than the world wants it to be (for more on that, see the entire past 20 years of this blog) … or because advances in classical AI render what was always quantum computing’s most important killer app, the simulation of quantum chemistry and materials, increasingly superfluous (as AlphaFold may have already done for protein folding) … or simply because civilization descends further into barbarism, or the unaligned AGIs start taking over, and we all have bigger things to worry about than fault-tolerant quantum computing.

But, you know, maybe fault-tolerant quantum computing will not only work, but matter—and its use to design better batteries and drugs and photovoltaic cells and so on will pass from science-fiction fantasy to quotidian reality so quickly that much of the world (weary from the hypesters crying wolf too many times?) will barely even notice it when it finally happens, just like what we saw with Large Language Models a few years ago. That would be worth getting out of bed for.

April 04, 2024

Tommaso Dorigo Significance Of Counting Experiments With Background Uncertainty

In the course on Statistics for Data Analysis that I give every spring to PhD students in Physics, I spend some time discussing the apparently trivial problem of evaluating the significance of an excess of observed events N over an expected background B.

This is a quite common setup in many searches in Physics and Astrophysics: you have some detection apparatus that records the number of phenomena of a specified kind, and you let it run for some time, whereafter you declare that you have observed N of them. If the occurrence of each phenomenon has equal probability and they do not influence one another, that number N is understood to be sampled from a Poisson distribution of mean B. 
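
As a baseline for the simplest version of the problem (background B known exactly, with no uncertainty on it yet), here is a minimal sketch assuming SciPy; this is my own illustration, not from the post, and handling the uncertainty on B, which is the point of the title, takes more work than this:

    # Significance of observing N events when the expected background B is known exactly.
    from scipy.stats import norm, poisson

    N, B = 25, 10.0                    # example numbers, not from the post
    p_value = poisson.sf(N - 1, B)     # P(observe >= N | Poisson with mean B)
    z_score = norm.isf(p_value)        # one-sided p-value converted to a Gaussian "sigma"
    print(f"p = {p_value:.3g}, Z = {z_score:.2f} sigma")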

read more

April 03, 2024

Scott Aaronson Open Letter to Anti-Zionists on Twitter

Dear Twitter Anti-Zionists,

For five months, ever since Oct. 7, I’ve read you obsessively. While my current job is supposed to involve protecting humanity from the dangers of AI (with a side of quantum computing theory), I’m ashamed to say that half the days I don’t do any science; instead I just scroll and scroll, reading anti-Israel content and then pro-Israel content and then more anti-Israel content. I thought refusing to post on Twitter would save me from wasting my life there as so many others have, but apparently it doesn’t, not anymore. (No, I won’t call it “X.”)

At the high end of the spectrum, I religiously check the tweets of Paul Graham, a personal hero and inspiration to me ever since he wrote Why Nerds Are Unpopular twenty years ago, and a man with whom I seem to resonate deeply on every important topic except for two: Zionism and functional programming. At the low end, I’ve read hundreds of the seemingly infinite army of Tweeters who post images of hook-nosed rats with black hats and sidecurls and dollar signs in their eyes, sneering as they strangle the earth and stab Palestinian babies. I study their detailed theories about why the October 7 pogrom never happened, and also it was secretly masterminded by Israel just to create an excuse to mass-murder Palestinians, and also it was justified and thrilling (exactly the same melange long ago embraced for the Holocaust).

I’m aware, of course, that the bottom-feeders make life too easy for me, and that a single Paul Graham who endorses the anti-Zionist cause ought to bother me more than a billion sharers of hook-nosed rat memes. And he does. That’s why, in this letter, I’ll try to stay at the higher levels of Graham’s Disagreement Hierarchy.

More to the point, though, why have I spent so much time on such a depressing, unproductive reading project?

Damned if I know. But it’s less surprising when you recall that, outside theoretical computer science, I’m (alas) mostly known to the world for having once confessed, in a discussion deep in the comment section of this blog, that I spent much of my youth obsessively studying radical feminist literature. I explained that I did that because my wish, for a decade, was to confront progressivism’s highest moral authorities on sex and relationships, and make them tell me either that

(1) I, personally, deserved to die celibate and unloved, as a gross white male semi-autistic STEM nerd and stunted emotional and aesthetic cripple, or else
(2) no, I was a decent human being who didn’t deserve that.

One way or the other, I sought a truthful answer, one that emerged organically from the reigning morality of our time and that wasn’t just an unprincipled exception to it. And I felt ready to pursue progressive journalists and activists and bloggers and humanities professors to the ends of the earth before I’d let them leave this one question hanging menacingly over everything they’d ever written, with (I thought) my only shot at happiness in life hinging on their answer to it.

You might call this my central character flaw: this need for clarity from others about the moral foundations of my own existence. I’m self-aware enough to know that it is a severe flaw, but alas, that doesn’t mean that I ever figured out how to fix it.

It’s been exactly the same way with the anti-Zionists since October 7. Every day I read them, searching for one thing and one thing only: their own answer to the “Jewish Question.” How would they ensure that the significant fraction of the world that yearns to murder all Jews doesn’t get its wish in the 21st century, as to a staggering extent it did in the 20th? I confess to caring about that question, partly (of course) because of the accident of having been born a Jew, and having an Israeli wife and family in Israel and so forth, but also because, even if I’d happened to be a Gentile, the continued survival of the world’s Jews would still seem remarkably bound up with science, Enlightenment, minority rights, liberal democracy, meritocracy, and everything else I’ve ever cared about.

I understand the charges against me. Namely: that if I don’t call for Israel to lay down its arms right now in its war against Hamas (and ideally: to dissolve itself entirely), then I’m a genocidal monster on the wrong side of history. That I value Jewish lives more than Palestinian lives. That I’m a hasbara apologist for the IDF’s mass-murder and apartheid and stealing of land. That if images of children in Gaza with their limbs blown off, or dead in their parents’ arms, or clawing for bread, don’t cause me to admit that Israel is evil, then I’m just as evil as the Israelis are.

Unsurprisingly I contest the charges. As a father of two, I can no longer see any images of child suffering without thinking about my own kids. For all my supposed psychological abnormality, the part of me that’s horrified by such images seems to be in working order. If you want to change my mind, rather than showing me more such images, you’ll need to target the cognitive part of me: the part that asks why so many children are suffering, and what causal levers we’d need to push to reach a place where neither side’s children ever have to suffer like this ever again.

At risk of stating the obvious: my first-order model is that Hamas, with the diabolical brilliance of a Marvel villain, successfully contrived a situation where Israel could prevent the further massacring of its own population only by fighting a gruesome urban war, of a kind that always, anywhere in the world, kills tens of thousands of civilians. Hamas, of course, was helped in this plan by an ideology that considers martyrdom the highest possible calling for the innocents who it rules ruthlessly and hides underneath. But Hamas also understood that the images of civilian carnage would (rightly!) shock the consciences of Israel’s Western allies and many Israelis themselves, thereby forcing a ceasefire before the war was over, thereby giving Hamas the opportunity to regroup and, with God’s and of course Iran’s help, finally finish the job of killing all Jews another day.

And this is key: once you remember why Hamas launched this war and what its long-term goals are, every detail of Twitter’s case against Israel has to be reexamined in a new light. Take starvation, for example. Clearly the only explanation for why Israelis would let Gazan children starve is the malice in their hearts? Well, until you think through the logistical challenges of feeding 2.3 million starving people whose sole governing authority is interested only in painting the streets red with Jewish blood. Should we let that authority commandeer the flour and water for its fighters, while innocents continue to starve? No? Then how about UNRWA? Alas, we learned that UNRWA, packed with employees who cheered the Oct. 7 massacre in their Telegram channels and in some cases took part in the murders themselves, capitulates to Hamas so quickly that it effectively is Hamas. So then Israel should distribute the food itself! But as we’ve dramatically witnessed, Israel can’t distribute food without imposing order, which would seem to mean reoccupying Gaza and earning the world’s condemnation for it. Do you start to appreciate the difficulty of the problem—and why the Biden administration was pushed to absurd-sounding extremes like air-dropping food and then building a floating port?

It all seems so much easier, once you remove the constraint of not empowering Hamas in its openly-announced goal of completing the Holocaust. And hence, removing that constraint is precisely what the global left does.

For all that, by Israeli standards I’m firmly in the anti-Netanyahu, left-wing peace camp—exactly where I’ve been since the 1990s, as a teenager mourning the murder of Rabin. And I hope even the anti-Israel side might agree with me that, if all the suffering since Oct. 7 has created a tiny opening for peace, then walking through that opening depends on two things happening:

  1. the removal of Netanyahu, and
  2. the removal of Hamas.

The good news is that Netanyahu, the catastrophically failed “Protector of Israel,” not only can, but plausibly will (if enough government ministers show some backbone), soon be removed in a democratic election.

Hamas, by contrast, hasn’t allowed a single election since it took power in 2006, in a process notable for its opponents being thrown from the roofs of tall buildings. That’s why even my left-leaning Israeli colleagues—the ones who despise Netanyahu, who marched against him last year—support Israel’s current war. They support it because, even if the Israeli PM were Fred Rogers, how can you ever get to peace without removing Hamas, and how can you remove Hamas except by war, any more than you could cut a deal with Nazi Germany?

I want to see the IDF do more to protect Gazan civilians—despite my bitter awareness of survey data suggesting that many of those civilians would murder my children in front of me if they ever got a chance. Maybe I’d be the same way if I’d been marinated since birth in an ideology of Jew-killing, and blocked from other sources of information. I’m heartened by the fact that despite this, indeed despite the risk to their lives for speaking out, a full 15% of Gazans openly disapprove of the Oct. 7 massacre. I want a solution where that 15% becomes 95% with the passing of generations. My endgame is peaceful coexistence.

But to the anti-Zionists I say: I don’t even mind you calling me a baby-eating monster, provided you honestly field one question. Namely:

Suppose the Palestinian side got everything you wanted for it; then what would be your plan for the survival of Israel’s Jews?

Let’s assume that not only has Netanyahu lost the next election in a landslide, but is justly spending the rest of his life in Israeli prison. Waving my wand, I’ve made you Prime Minister in his stead, with an overwhelming majority in the Knesset. You now get to go down in history as the liberator of Palestine. But you’re now also in charge of protecting Israel’s 7 million Jews (and 2 million other residents) from near-immediate slaughter at the hands of those who you’ve liberated.

Granted, it seems pretty paranoid to expect such a slaughter! Or rather: it would seem paranoid, if the Palestinians’ Grand Mufti (progenitor of the Muslim Brotherhood and hence Hamas) hadn’t allied himself with Hitler in WWII, enthusiastically supported the Nazi Final Solution, and tried to export it to Palestine; if in 1947 the Palestinians hadn’t rejected the UN’s two-state solution (the one Israel agreed to) and instead launched another war to exterminate the Jews (a war they lost); if they hadn’t joined the quest to exterminate the Jews a third time in 1967; etc., or if all this hadn’t happened back before there were any settlements or occupation, when the only question on the table was Israel’s existence. It would seem paranoid if Arafat had chosen a two-state solution when Israel offered it to him at Camp David, rather than suicide bombings. It would seem paranoid if not for the candies passed out in the streets in celebration on October 7.

But if someone has a whole ideology, which they teach their children and from which they’ve never really wavered for a century, about how murdering you is a religious honor, and also they’ve actually tried to murder you at every opportunity—-what more do you want them to do, before you’ll believe them?

So, you tell me your plan for how to protect Israel’s 7 million Jews from extermination at the hands of neighbors who have their extermination—my family’s extermination—as their central political goal, and who had that as their goal long before there was any occupation of the West Bank or Gaza. Tell me how to do it while protecting Palestinian innocents. And tell me your fallback plan if your first plan turns out not to work.

We can go through the main options.


(1) UNILATERAL TWO-STATE SOLUTION

Maybe your plan is that Israel should unilaterally dismantle West Bank settlements, recognize a Palestinian state, and retreat to the 1967 borders.

This is an honorable plan. It was my preferred plan—until the horror of October 7, and then the even greater horror of the worldwide left reacting to that horror by sharing celebratory images of paragliders, and by tearing down posters of kidnapped Jewish children.

Today, you might say October 7 has sort of put a giant flaming-red exclamation point on what’s always been the central risk of unilateral withdrawal. Namely: what happens if, afterward, rather than building a peaceful state on their side of the border, the Palestinian leadership chooses instead to launch a new Iran-backed war on Israel—one that, given the West Bank’s proximity to Israel’s main population centers, makes October 7 look like a pillow fight?

If that happens, will you admit that the hated Zionists were right and you were wrong all along, that this was never about settlements but always, only about Israel’s existence? Will you then agree that Israel has a moral prerogative to invade the West Bank, to occupy and pacify it as the Allies did Germany and Japan after World War II? Can I get this in writing from you, right now? Or, following the future (October 7)² launched from a Judenfrei West Bank, will your creativity once again set to work constructing a reason to blame Israel for its own invasion—because you never actually wanted a two-state solution at all, but only Israel’s dismantlement?


(2) NEGOTIATED TWO-STATE SOLUTION

So, what about a two-state solution negotiated between the parties? Israel would uproot all West Bank settlements that prevent a Palestinian state, and resettle half a million Jews in pre-1967 Israel—in exchange for the Palestinians renouncing their goal of ending Israel’s existence, via a “right of return” or any other euphemism.

If so: congratulations, your “anti-Zionism” now seems barely distinguishable from my “Zionism”! If they made me the Prime Minister of Israel, and put you in charge of the Palestinians, I feel optimistic that you and I could reach a deal in an hour and then go out for hummus and babaganoush.


(3) SECULAR BINATIONAL STATE

In my experience, in the rare cases they deign to address the question directly, most anti-Zionists advocate a “secular, binational state” between the Jordan and Mediterranean, with equal rights for all inhabitants. Certainly, that would make sense if you believe that Israel is an apartheid state just like South Africa.

To me, though, this analogy falls apart on a single question: who’s the Palestinian Nelson Mandela? Who’s the Palestinian leader who’s ever said to the Jews, “end your Jewish state so that we can live together in peace,” rather than “end your Jewish state so that we can end your existence”? To impose a binational state would be to impose something, not only that Israelis regard as an existential horror, but that most Palestinians have never wanted either.

But, suppose we do it anyway. We place 7 million Jews, almost half the Jews who remain on Earth, into a binational state where perhaps a third of their fellow citizens hold the theological belief that all Jews should be exterminated, and that a heavenly reward follows martyrdom in blowing up Jews. The exterminationists don’t quite have a majority, but they’re the second-largest voting bloc. Do you predict that the exterminationists will give up their genocidal ambition because of new political circumstances that finally put their ambition within reach? If October-7 style pogroms against Jews turn out to be a regular occurrence in our secular binational state, how will its government respond—like the Palestinian Authority? like UNRWA? like the British Mandate? like Tsarist Russia?

In such a case, perhaps the Jews (along with those Arabs and Bedouins and Druze and others who cast their lot with the Jews) would need to form a country-within-a-country: their own little autonomous zone within the binational state, with its own defense force. But of course, such a country-within-a-country already formed, for pretty much this exact reason. It’s called Israel. A cycle has been detected in your arc of progress.


(4) EVACUATION OF THE JEWS FROM ISRAEL

We come now to the anti-Zionists who are plainspoken enough to say: Israel’s creation was a grave mistake, and that mistake must now be reversed.

This is a natural option for anyone who sees Israel as an “illegitimate settler-colonial project,” like British India or French Algeria, but who isn’t quite ready to call for another Jewish genocide.

Again, the analogy runs into obvious problems: Israelis would seem to be the first “settler-colonialists” in the history of the world who not only were indigenous to the land they colonized, as much as anyone was, but who weren’t colonizing on behalf of any mother country, and who have no obvious such country to which they can return.

Some say spitefully: then let the Jews go back to Poland. These people might be unaware that, precisely because of how thorough the Holocaust was, more Israeli Jews trace their ancestry to Muslim countries than to Europe. Is there to be a “right of return” to Egypt, Iraq, Morocco, and Yemen, for all the Jews forcibly expelled from those places and for their children and grandchildren?

Others, however, talk about evacuating the Jews from Israel with goodness in their hearts. They say: we’d love the Israelis’ economic dynamism here in Austin or Sydney or Oxfordshire, joining their many coreligionists who already call these places home. What’s more, they’ll be safer here—who wants to live with missiles raining down on their neighborhood? Maybe we could even set aside some acres in Montana for a new Jewish homeland.

Again, if this is your survival plan, I’m a billion times happier to discuss it openly than to have it as unstated subtext!

Except, maybe you could say a little more about the logistics. Who will finance the move? How confident are you that the target country will accept millions of defeated, desperate Jews, as no country on earth was the last time this question arose?

I realize it’s no longer the 1930s, and Israel now has friends, most famously in America. But—what’s a good analogy here? I’ve met various Silicon Valley gazillionaires. I expect that I could raise millions from them, right now, if I got them excited about a new project in quantum computing or AI or whatever. But I doubt I could raise a penny from them if I came to them begging for their pity or their charity.

Likewise: for all the anti-Zionists’ loudness, a solid majority of Americans continue to support Israel (which, incidentally, provides a much simpler explanation than the hook-nosed perfidy of AIPAC for why Congress and the President mostly support it). But it seems to me that Americans support Israel in the “exciting project” sense, rather than in the “charity” sense. They like that Israelis are plucky underdogs who made the deserts bloom, and built a thriving tech industry, and now produce hit shows like Shtisel and Fauda, and take the fight against a common foe to the latter’s doorstep, and maintain one of the birthplaces of Western civilization for tourists and Christian pilgrims, and restarted the riveting drama of the Bible after a 2000-year hiatus, which some believe is a crucial prerequisite to the Second Coming.

What’s important, for present purposes, is not whether you agree with any of these rationales, but simply that none of them translate into a reason to accept millions of Jewish refugees.

But if you think dismantling Israel and relocating its seven million Jews is a workable plan—OK then, are you doing anything to make that more than a thought experiment, as the Zionists did a century ago with their survival plan? Have even I done more to implement your plan than you have, by causing one Israeli (my wife) to move to the US?


Suppose you say it’s not your job to give me a survival plan for Israel’s Jews. Suppose you say the request is offensive, an attempt to distract from the suffering of the Palestinians, so you change the subject.

In that case, fine, but you can now take off your cloak of righteousness, your pretense of standing above me and judging me from the end of history. Your refusal to answer the question amounts to a confession that, for you, the goal of “a free Palestine from the river to the sea” doesn’t actually require the physical survival of Israel’s Jews.

Which means, we’ve now established what you are. I won’t give you the satisfaction of calling you a Nazi or an antisemite. Thousands of years before those concepts existed, Jews already had terms for you. The terms tended toward a liturgical register, as in “those who rise up in every generation to destroy us.” The whole point of all the best-known Jewish holidays, like Purim yesterday, is to talk about those wicked would-be destroyers in the past tense, with the very presence of live Jews attesting to what the outcome was.

(Yesterday, I took my kids to a Purim carnival in Austin. Unlike in previous years, there were armed police everywhere. It felt almost like … visiting Israel.)

If you won’t answer the question, then it wasn’t Zionist Jews who told you that their choices are either to (1) oppose you or else (2) go up in black smoke like their grandparents did. You just told them that yourself.


Many will ask: why don’t I likewise have an obligation to give you my Palestinian survival plan?

I do. But the nice thing about my position is that I can tell you my Palestinian survival plan cheerfully, immediately, with zero equivocating or changing the subject. It’s broadly the same plan that David Ben-Gurion and Yitzchak Rabin and Ehud Barak and Bill Clinton and the UN put on the table over and over and over, only for the Palestinians’ leaders to sweep it off.

I want the Palestinians to have a state, comprising the West Bank and Gaza, with a capital in East Jerusalem. I want Israel to uproot all West Bank settlements that prevent such a state. I want this to happen the instant there arises a Palestinian leadership genuinely committed to peace—one that embraces liberal values and rejects martyr values, in everything from textbooks to street names.

And I want more. I want the new Palestinian state to be as prosperous and free and educated as modern Germany and Japan are. I want it to embrace women’s rights and LGBTQ+ rights and the rest of the modern package, so that “Queers for Palestine” would no longer be a sick joke. I want the new Palestine to be as intertwined with Israel, culturally and economically, as the US and Canada are.

Ironically, if this ever became a reality, then Israel-as-a-Jewish-state would no longer be needed—but it’s certainly needed in the meantime.

Anti-Zionists on Twitter: can you be equally explicit about what you want?


I come, finally, to what many anti-Zionists regard as their ultimate trump card. Look at all the anti-Zionist Jews and Israelis who agree with us, they say. Jewish Voice for Peace. IfNotNow. Noam Chomsky. Norman Finkelstein. The Neturei Karta.

Intellectually, of course, the fact of anti-Zionist Jews makes not the slightest difference to anything. My question for them remains exactly the same as for anti-Zionist Gentiles: what is your Jewish survival plan, for the day after we dismantle the racist supremacist apartheid state that’s currently the only thing standing between half the world’s remaining Jews and their slaughter by their neighbors? Feel free to choose from any of the four options above, or suggest a fifth.

But in the event that Jewish anti-Zionists evade that conversation, or change the subject from it, maybe some special words are in order. You know the famous Golda Meir line, “If we have to choose between being dead and pitied and being alive with a bad image, we’d rather be alive and have the bad image”?

It seems to me that many anti-Zionist Jews considered Golda Meir’s question carefully and honestly, and simply decided it the other way, in favor of Jews being dead and pitied.

Bear with me here: I won’t treat this as a reductio ad absurdum of their position. Not even if the anti-Zionist Jews themselves wish to remain safely ensconced in Berkeley or New Haven, while the Israelis fulfill the “dead and pitied” part for them.

In fact, I’ll go further. Again and again in life I’ve been seized by a dark thought: if half the world’s Jews can only be kept alive, today, via a militarized ethnostate that constantly needs to defend its existence with machine guns and missiles, racking up civilian deaths and destabilizing the world’s geopolitics—if, to put a fine point on it, there are 16 million Jews in the world, but at least a half billion antisemites who wake up every morning and go to sleep every night desperately wishing those Jews dead—then, from a crude utilitarian standpoint, might it not be better for the world if we Jews vanished after all?

Remember, I’m someone who spent a decade asking myself whether the rapacious, predatory nature of men’s sexual desire for women, which I experienced as a curse and an affliction, meant that the only moral course for me was to spend my life as a celibate mathematical monk. But I kept stumbling over one point: why should such a moral obligation fall on me alone? Why doesn’t it fall on other straight men, particularly the ones who presume to lecture me on my failings?

And also: supposing I did take the celibate monk route, would even that satisfy my haters? Would they come after me anyway for glancing at a woman too long or making an inappropriate joke? And also: would the haters soon say I shouldn’t have my scientific career either, since I’ve stolen my coveted academic position from the underprivileged? Where exactly does my self-sacrifice end?

When I did, finally, start approaching women and asking them out on dates, I worked up the courage partly by telling myself: I am now going to do the Zionist thing. I said: if other nerdy Jews can risk death in war, then this nerdy Jew can risk ridicule and contemptuous stares. You can accept that half the world will denounce you as a monster for living your life, so long as your own conscience (and, hopefully, the people you respect the most) continue to assure you that you’re nothing of the kind.

This took more than a decade of internal struggle, but it’s where I ended up. And today, if anyone tells me I had no business ever forming any romantic attachments, I have two beautiful children as my reply. I can say: forget about me, you’re asking for my children never to have existed—that’s why I’m confident you’re wrong.

Likewise with the anti-Zionists. When the Twitter-warriors share their memes of hook-nosed Jews strangling the planet, innocent Palestinian blood dripping from their knives, when the global protests shut down schools and universities and bridges and parliament buildings, there’s a part of me that feels eager to commit suicide if only it would appease the mob, if only it would expiate all the cosmic guilt they’ve loaded onto my shoulders.

But then I remember that this isn’t just about me. It’s about Einstein and Spinoza and Feynman and Erdös and von Neumann and Weinberg and Landau and Michelson and Rabi and Tarski and Asimov and Sagan and Salk and Noether and Meitner, and Irving Berlin and Stan Lee and Rodney Dangerfield and Steven Spielberg. Even if I didn’t happen to be born Jewish—if I had anything like my current values, I’d still think that so much of what’s worth preserving in human civilization, so much of math and science and Enlightenment and democracy and humor, would seem oddly bound up with the continued survival of this tiny people. And conversely, I’d think that so much of what’s hateful in civilization would seem oddly bound up with the quest to exterminate this tiny people, or to deny it any means to defend itself from extermination.

So that’s my answer, both to anti-Zionist Gentiles and to anti-Zionist Jews. The problem of Jewish survival, on a planet much of which yearns for the Jews’ annihilation and much of the rest of which is indifferent, is both hard and important, like P versus NP. And so a radical solution was called for. The solution arrived at a century ago, at once brand-new and older than Homer and Hesiod, was called the State of Israel. If you can’t stomach that solution—if, in particular, you can’t stomach the violence needed to preserve it, so long as Israel’s neighbors retain their annihilationist dream—then your response ought to be to propose a better solution. I promise to consider your solution in good faith—asking, just like with P vs. NP provers, how you overcome the problems that doomed all previous attempts. But if you throw my demand for a better solution back in my face, then you might as well be pushing my kids into a gas chamber yourself, for all the moral authority that I now recognize you to have over me.


Possibly the last thing Einstein wrote was a speech celebrating Israel’s 7th Independence Day, which he died a week before he was to deliver. So let’s turn the floor over to Mr. Albert, the leftist pacifist internationalist:

This is the seventh anniversary of the establishment of the State of Israel. The establishment of this State was internationally approved and recognised largely for the purpose of rescuing the remnant of the Jewish people from unspeakable horrors of persecution and oppression.

Thus, the establishment of Israel is an event which actively engages the conscience of this generation. It is, therefore, a bitter paradox to find that a State which was destined to be a shelter for a martyred people is itself threatened by grave dangers to its own security. The universal conscience cannot be indifferent to such peril.

It is anomalous that world opinion should only criticize Israel’s response to hostility and should not actively seek to bring an end to the Arab hostility which is the root cause of the tension.

I love Einstein’s use of “anomalous,” as if this were a physics problem. From the standpoint of history, what’s anomalous about the Israeli-Palestinian conflict is not, as the Twitterers claim, the brutality of the Israelis—if you think that’s anomalous, you really haven’t studied history—but something different. In other times and places, an entity like Palestine, which launches a war of total annihilation against a much stronger neighbor, and then another and another, would soon disappear from the annals of history. Israel, however, is held to a different standard. Again and again, bowing to international pressure and pressure from its own left flank, the Israelis have let their would-be exterminators off the hook, bruised but mostly still alive and completely unrepentant, to have another go at finishing the Holocaust in a few years. And after every bout, sadly but understandably, Israeli culture drifts more to the right, becomes 10% more like the other side always was.

I don’t want Israel to drift to the right. I find the values of Theodor Herzl and David Ben-Gurion to be almost as good as any human values have ever been, and I’d like Israel to keep them. Of course, Israel will need to continue defending itself from genocidal neighbors, until the day that a leader arises among the Palestinians with the moral courage of Egypt’s Anwar Sadat or Jordan’s King Hussein: a leader who not only talks peace but means it. Then there can be peace, and an end of settlements in the West Bank, and an independent Palestinian state. And however much like dark comedy that seems right now, I’m actually optimistic that it will someday happen, conceivably even soon depending on what happens in the current war. Unless nuclear war or climate change or AI apocalypse makes the whole question moot.


Anyway, thanks for reading—a lot built up these past months that I needed to get off my chest. When I told a friend that I was working on this post, he replied “I agree with you about Israel, of course, but I choose not to die on that hill in public.” I answered that I’ve already died on that hill and on several other hills, yet am somehow still alive!

Meanwhile, I was gratified that other friends, even ones who strongly disagree with me about Israel, told me that I should not disengage, but continue to tell it like I see it, trying civilly to change minds while being open to having my own mind changed.

And now, maybe, I can at last go back to happier topics, like how to prevent the destruction of the world by AI.

Cheers,
Scott

April 02, 2024

Terence Tao AI Mathematical Olympiad – Progress Prize Competition now open

The first progress prize competition for the AI Mathematical Olympiad has now launched. (Disclosure: I am on the advisory committee for the prize.) This is a competition in which contestants submit an AI model which, after the submissions deadline on June 27, will be tested (on a fixed computational resource, without internet access) on a set of 50 “private” test math problems, each of which has an answer that is an integer between 0 and 999. Prior to the close of submission, the models can be tested on 50 “public” test math problems (where the results of the model are public, but not the problems themselves), as well as 10 training problems that are available to all contestants. As of this writing, the leaderboard shows that the best-performing model has solved 4 out of 50 of the questions (a standard benchmark, Gemma 7B, had previously solved 3 out of 50). A total of $2^{20}$ dollars ($1.048 million) has been allocated for various prizes associated with this competition. More detailed rules can be found here.

Jordan Ellenberg Orioles 13, Angels 4

I had the great privilege to be present at Camden Yards last weekend for what I believe to be the severest ass-whupping I have ever personally seen the Orioles administer. The Orioles went into the 6th winning 3-1 but the game felt like they were winning by more than that. Then suddenly they actually were — nine batters, nine runs, no outs (though in the middle of it all there was an easy double-play ball by Ramon Urias that the Angels’ shortstop Zach Neto just inexplicably dropped — it was that kind of day.) We had pitching (Grayson Rodriguez almost unhittable for six innings but for one mistake pitch), defense (Urias snagging a line drive at third almost before I saw it leave the bat) and of course a three-run homer, by Anthony Santander, to plate the 7th, 8th, and 9th of those nine runs.

Is being an Angels fan the saddest kind of fan to be right now? The Mets and the Padres, you have more of a “we spent all the money and built what should have been a superteam and didn’t win.” The A’s, you have the embarrassment of the on-field performance and the fact that your owner screwed your city and moved the team out of town. But the Angels? Somehow they just put together the two generational talents of this era of baseball and — didn’t do anything with them. There’s a certain heaviness to the sadness.

As good as the Orioles have been so far, taking three out of their first four and massively outscoring the opposition, I still think they weren’t really a 101-win team last year, and everything will have to go right again for them to be as good this year as they were last year. Our Felix Bautista replacement, Craig Kimbrel, has already blown his first and only save opportunity, which is to say he’s not really a Felix Bautista replacement. But it’s a hell of a team to watch.

The only downside — Gunnar Henderson, with a single, a triple and a home run already, is set to lead off the ninth but Hyde brings in Tony Kemp to pinch hit. Why? The fans want to see Gunnar on second for the cycle, let the fans see Gunnar on second for the cycle.

March 30, 2024

Andrew Jaffe The Milky Way

Doug Natelson Thoughts on undergrad solid-state content

Figuring out what to include in an undergraduate introduction to solid-state physics course is always a challenge.   Books like the present incarnation of Kittel are overstuffed with more content than can readily fit in a one-semester course, and because that book has grown organically from edition to edition, it's organizationally not the most pedagogical.  I'm a big fan of and have been teaching from my friend Steve Simon's Oxford Solid State Basics, which is great but a bit short for a (US) one-semester class.  Prof. Simon is interested in collecting opinions on what other topics would be good to include in a hypothetical second edition or second volume, and we thought that crowdsourcing it to this blog's readership could be fun.  As food for thought, some possibilities that occurred to me were:

  • A slightly longer discussion of field-effect transistors, since they're the basis for so much modern technology
  • A chapter or two on materials of reduced dimensionality (2D electron gas, 1D quantum wires, quantum point contacts, quantum dots; graphene and other 2D materials)
  • A discussion of fermiology (Shubnikov-de Haas, de Haas-van Alphen) - this is in Kittel, but it's difficult to explain in an accessible way
  • An introduction to the quantum Hall effect
  • Some mention of topology (anomalous velocity?  Berry connection?)
  • An intro to superconductivity (though without second quantization and the gap equation, this ends up being phenomenology)
  • Some discussion of Ginzburg-Landau treatment of phase transitions (though I tend to think of that as a topic for a statistical/thermal physics course)
  • An intro to Fermi liquid theory
  • Some additional discussion of electronic structure methods beyond the tight binding and nearly-free electron approaches in the present book (Wannier functions, an intro to density functional theory)
What do people think about this?

March 29, 2024

Matt von Hippel Generalizing a Black Box Theory

In physics and in machine learning, we have different ways of thinking about models.

A model in physics, like the Standard Model, is a tool to make predictions. Using statistics and a whole lot of data (from particle physics experiments), we fix the model’s free parameters (like the mass of the Higgs boson). The model then lets us predict what we’ll see next: when we turn on the Large Hadron Collider, what will the data look like? In physics, when a model works well, we think that model is true, that it describes the real way the world works. The Standard Model isn’t the ultimate truth: we expect that a better model exists that makes better predictions. But it is still true, in an in-between kind of way. There really are Higgs bosons, even if they’re a result of some more mysterious process underneath, just like there really are atoms, even if they’re made out of protons, neutrons, and electrons.

A model in machine learning, like the Large Language Model that fuels ChatGPT, is also a tool to make predictions. Using statistics and a whole lot of data (from text on the internet, or images, or databases of proteins, or games of chess…) we fix the model’s free parameters (called weights, numbers for the strengths of connections between metaphorical neurons). The model then lets us predict what we’ll see next: when a text begins “Q: How do I report a stolen card? A:”, how does it end?

So far, that sounds a lot like physics. But in machine learning, we don’t generally think these models are true, at least not in the same way. The thing producing language isn’t really a neural network like a Large Language Model. It’s the sum of many human brains, many internet users, spread over many different circumstances. Each brain might be sort of like a neural network, but they’re not like the neural networks sitting on OpenAI’s servers. A Large Language Model isn’t true in some in-between kind of way, like atoms or Higgs bosons. It just isn’t true. It’s a black box, a machine that makes predictions, and nothing more.

But here’s the rub: what do we mean by true?

I want to be a pragmatist here. I don’t want to get stuck in a philosophical rabbit-hole, arguing with metaphysicists about what “really exists”. A true theory should be one that makes good predictions, that lets each of us know, based on our actions, what we should expect to see. That’s why science leads to technology, why governments and companies pay people to do it: because the truth lets us know what will happen, and make better choices. So if Large Language Models and the Standard Model both make good predictions, why is only one of them true?

Recently, I saw Dan Elton of More is Different make the point that there is a practical reason to prefer the “true” explanations: they generalize. A Large Language Model might predict what words come next in a text. But it doesn’t predict what happens when you crack someone’s brain open and see how the neurons connect to each other, even if that person is the one who made the text. A good explanation, a true model, can be used elsewhere. The Standard Model tells you what data from the Large Hadron Collider will look like, but it also tells you what data from the muon g-2 experiment will look like. It also, in principle, tells you things far away from particle physics: what stars look like, what atoms look like, what the inside of a nuclear reactor looks like. A black box can’t do that, even if it makes great predictions.

It’s a good point. But thinking about it, I realized things are a little murkier.

You can’t generalize a Large Language Model to tell you how human neurons are connected. But you can generalize it in other ways, and people do. There’s a huge industry in trying to figure out what GPT and its relatives “know”. How much math can they do? How much do they know about geography? Can they predict the future?

These generalizations don’t work the way that they do in physics, or the rest of science, though. When we generalize the Standard Model, we aren’t taking a machine that makes particle physics predictions and trying to see what those particle physics predictions can tell us. We’re taking something “inside” the machine, the fields and particles, and generalizing that, seeing how the things around us could be made of those fields and those particles. In contrast, when people generalize GPT, they typically don’t look inside the “black box”. They use the Large Language Model to make predictions, and see what those predictions “know about”.

On the other hand, we do sometimes generalize scientific models that way too.

If you’re simulating the climate, or a baby star, or a colony of bacteria, you typically aren’t using your simulation like a prediction machine. You don’t plug in exactly what is going on in reality, then ask what happens next. Instead, you run many simulations with different conditions, and look for patterns. You see how a cloud of sulfur might cool down the Earth, or how baby stars often form in groups, leading them to grow up into systems of orbiting black holes. Your simulation is kind of like a black box, one that you try out in different ways until you uncover some explainable principle, something your simulation “knows” that you can generalize.

And isn’t nature that kind of black box, too? When we do an experiment, aren’t we just doing what the Large Language Models are doing, prompting the black box in different ways to get an idea of what it knows? Are scientists who do experiments that picky about finding out what’s “really going on”, or do they just want a model that works?

We want our models to be general, and to be usable. Building a black box can’t be the whole story, because a black box, by itself, isn’t general. But it can certainly be part of the story. Going from the black box of nature to the black box of a machine lets you run tests you couldn’t previously do, lets you investigate faster and ask stranger questions. With a simulation, you can blow up stars. With a Large Language Model, you can ask, for a million social media comments, whether the average internet user would call them positive or negative. And if you make sure to generalize, and try to make better decisions, then it won’t be just the machine learning. You’ll be learning too.

March 27, 2024

John Baez T Corona Borealis

 

Sometime this year, the star T Corona Borealis will go nova and become much brighter! At least that’s what a lot of astronomers think. So examine the sky between Arcturus and Vega now—and look again if you hear this event has happened. Normally this star is magnitude 10, too dim to see. When it goes nova it should reach magnitude 2 for a week—as bright as the North Star. So you will see a new star, which is the original meaning of ‘nova’.
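To put numbers on that (my own arithmetic, not from the post): a jump from magnitude 10 to magnitude 2 is 8 magnitudes, and each 5 magnitudes corresponds to a factor of 100 in brightness, so the star should brighten by a factor of roughly 1600:

```python
# Brightness ratio for an 8-magnitude brightening (magnitude 10 down to magnitude 2).
delta_m = 10 - 2
ratio = 100 ** (delta_m / 5)  # 5 magnitudes = a factor of 100 in flux
print(ratio)                  # ~1585
```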

But why do they think T Corona Borealis will go nova this year? How could they possibly know that?

It’s done this before. It’s a binary star with a white dwarf orbiting a red giant. The red giant is spewing out gas. The much denser white dwarf collects some of this gas on its surface until there’s enough fuel to cause a runaway thermonuclear reaction—a nova!

We’ve seen it happen twice. T Corona Borealis went nova on May 12, 1866 and again on February 9, 1946. What’s happening now is a lot like what happened in 1946.

In February 2015, there was a sustained brightening of T Corona Borealis: it went from magnitude 10.5 to about 9.2. The same thing happened eight years before it went nova the last time.

In June 2018, the star dimmed slightly but still remained at an unusually high level of activity. Then in April 2023 it dimmed to magnitude 12.3. The same thing happened one year before it went nova the last time.

If this pattern continues, T Corona Borealis should erupt sometime between now and September 2024. I’m not completely confident that it will follow the same pattern! But we can just wait and see.

This is one of only 5 known repeating novas in the Milky Way, so we’re lucky to have this chance.

Here’s how it might work:

The description at NASA’s blog:

A red giant star and white dwarf orbit each other in this animation of a nova. The red giant is a large sphere in shades of red, orange, and white, with the side facing the white dwarf the lightest shades. The white dwarf is hidden in a bright glow of white and yellows, which represent an accretion disk around the star. A stream of material, shown as a diffuse cloud of red, flows from the red giant to the white dwarf. The animation opens with the red giant on the right side of the screen, co-orbiting the white dwarf. When the red giant moves behind the white dwarf, a nova explosion on the white dwarf ignites, filling the screen with white light. After the light fades, a ball of ejected nova material is shown in pale orange. A small white spot remains after the fog of material clears, indicating that the white dwarf has survived the explosion.

For more details, try this:

• B. E. Schaefer, B. Kloppenborg, E. O. Waagen and the AAVSO observers, Announcing T CrB pre-eruption dip, AAVSO News and Announcements.

March 25, 2024

John Preskill My experimental adventures in quantum thermodynamics

Imagine a billiard ball bouncing around on a pool table. High-school level physics enables us to predict its motion until the end of time using simple equations for energy and momentum conservation, as long as you know the initial conditions – how fast the ball is moving at launch, and in which direction.

What if you add a second ball? This makes things more complicated, but predicting the future state of this system would still be possible based on the same principles. What about if you had a thousand balls, or a million? Technically, you could still apply the same equations, but the problem would not be tractable in any practical sense.

Billiard balls bouncing around on a pool table are a good analogy for a many-body system like a gas of molecules. Image credit

Thermodynamics lets us make precise predictions about averaged (over all the particles) properties of complicated, many-body systems, like millions of billiard balls or atoms bouncing around, without needing to know the gory details. We can make these predictions by introducing the notion of probabilities. Even though the system is deterministic – we can in principle calculate the exact motion of every ball – there are so many balls in this system that the properties of the whole will be very close to the average properties of the balls. If you throw a six-sided die, the result is in principle deterministic and predictable, based on the way you throw it, but it’s in practice completely random to you – it could be 1 through 6, equally likely. But you know that if you cast a thousand dice, the average will be close to 3.5 – the average of all possibilities. Statistical physics enables us to calculate a probability distribution over the energies of the balls, which tells us everything about the average properties of the system. And because of entropy – the tendency for the system to go from ordered to disordered configurations – even if the probability distribution of the initial system is far from the one statistical physics predicts, the final distribution, after the system is allowed to bounce around and settle, will be extremely close to a generic distribution that depends on average properties only. We call this the thermal distribution, and the process of the system mixing and settling to one of the most likely configurations – thermalization.
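To make the dice intuition concrete, here is a minimal simulation (my own sketch, not from the original post) showing the average of many fair rolls settling near 3.5:

```python
import random

random.seed(0)

def average_roll(n_dice):
    """Average of n_dice independent rolls of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(n_dice)) / n_dice

for n in (10, 1_000, 100_000):
    print(n, average_roll(n))  # the average approaches 3.5 as n grows
```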

For a practical example – instead of billiard balls, consider a gas of air molecules bouncing around. The average energy of this gas is proportional to its temperature, which we can calculate from the probability distribution of energies. Being able to predict the temperature of a gas is useful for practical things like weather forecasting, cooling your home efficiently, or building an engine. The important properties of the initial state we needed to know – energy and number of particles – are conserved during the evolution, and we call them “thermodynamic charges”. They don’t actually need to be electric charges, although it is a good example of something that’s conserved.

Let’s cross from the classical world – balls bouncing around – to the quantum one, which deals with elementary particles that can be entangled, or in a superposition. What changes when we introduce this complexity? Do systems even thermalize in the quantum world? Because of the above differences, we cannot in principle be sure that the mixing and settling of the system will happen just like in the classical cases of balls or gas molecules colliding.

A visualization of a complex pattern called a quantum scar that can develop in quantum systems. Image credit

It turns out that we can predict the thermal state of a quantum system using very similar principles and equations that let us do this in the classical case. Well, with one exception – what if we cannot simultaneously measure our critical quantities – the charges?

One of the quirks of quantum mechanics is that observing the state of the system can change it. Before the observation, the system might be in a quantum superposition of many states. After the observation, a definite classical value will be recorded on our instrument – we say that the system has collapsed to this state, and thus changed its state. There are certain observables that are mutually incompatible – we cannot know their values simultaneously, because observing one definite value collapses the system to a state in which the other observable is in a superposition. We call these observables noncommuting, because the order of observation matters – unlike in multiplication of numbers, which is a commuting operation you’re familiar with. 2 * 3 = 6, and also 3 * 2 = 6 – the order of multiplication doesn’t matter.

Electron spin is a common example that entails noncommutation. In a simplified picture, we can think of spin as an axis of rotation of our electron in 3D space. Note that the electron doesn’t actually rotate in space, but it is a useful analogy – the property is “spin” for a reason. We can measure the spin along the x-,y-, or z-axis of a 3D coordinate system and obtain a definite positive or negative value, but this observation will result in a complete loss of information about spin in the other two perpendicular directions.

An illustration of electron spin. We can imagine it as an axis in 3D space that points in a particular direction. Image from Wikimedia Commons.
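A concrete way to see this noncommutation (a small sketch of my own, not part of the experiment) is to multiply the Pauli matrices, which represent spin measurements along the x- and z-axes, in both orders:

```python
import numpy as np

# Pauli matrices for spin measured along the x- and z-axes
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

print(sigma_x @ sigma_z)                      # one order of "measurement"...
print(sigma_z @ sigma_x)                      # ...is minus the other order
print(sigma_x @ sigma_z - sigma_z @ sigma_x)  # nonzero commutator: the order matters
```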

If we investigate a system that conserves the three spin components independently, we will be in a situation where the three conserved charges do not commute. We call them “non-Abelian” charges, because they enjoy a non-Abelian, that is, noncommuting, algebra. Will such a system thermalize, and if so, to what kind of final state?

This is precisely what we set out to investigate. Noncommutation of charges breaks the usual derivations of the thermal state, but researchers have managed to show that with non-Abelian charges, a subtly different non-Abelian thermal state (NATS) should emerge. Nicole Yunger Halpern and I, at the Joint Center for Quantum Information and Computer Science (QuICS) at the University of Maryland, have collaborated with Amir Kalev from the Information Sciences Institute (ISI) at the University of Southern California, and with experimentalists from the University of Innsbruck (Florian Kranzl, Manoj Joshi, Rainer Blatt and Christian Roos) to observe thermalization in a non-Abelian system – and we’ve recently published this work in PRX Quantum.
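For orientation, the NATS has the same Gibbs-like form as an ordinary grand-canonical thermal state, except that the charges appearing in the exponent no longer commute with one another (this is the schematic form from the earlier theory papers; the notation here is mine):

$$ \rho_{\mathrm{NATS}} = \frac{1}{Z}\exp\!\Big[-\beta\Big(H - \sum_a \mu_a Q_a\Big)\Big], \qquad Z = \mathrm{Tr}\,\exp\!\Big[-\beta\Big(H - \sum_a \mu_a Q_a\Big)\Big], $$

where the \(Q_a\) are the conserved, noncommuting charges (here the three spin components) and the \(\mu_a\) are the corresponding effective chemical potentials.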

The experimentalists used a device that can trap ions with electric fields, as well as manipulate and read out their states using lasers. Only select energy levels of these ions are used, which effectively makes them behave like electrons. The laser field can couple the ions in a way that approximates the Heisenberg Hamiltonian – an interaction that conserves the three total spin components individually. We thus construct the quantum system we want to study – multiple particles coupled with interactions that conserve noncommuting charges.

We conceptually divide the ions into a system of interest and an environment. The system of interest, which consists of two particles, is what we want to measure and compare to theoretical predictions. Meanwhile, the other ions act as the effective environment for our pair of ions – the environment ions interact with the pair in a way that simulates a large bath exchanging heat and spin.

Photo of our University of Maryland group. From left to right: Twesh Upadhyaya, Billy Braasch, Shayan Majidy, Nicole Yunger Halpern, Aleks Lasek, Jose Antonio Guzman, Anthony Munson.

If we start this total system in some initial state, and let it evolve under our engineered interaction for a long enough time, we can then measure the final state of the system of interest. To make the NATS distinguishable from the usual thermal state, I designed an initial state that is easy to prepare, and has the ions pointing in directions that result in high charge averages and relatively low temperature. High charge averages make the noncommuting nature of the charges more pronounced, and low temperature makes the state easy to distinguish from the thermal background. However, we also show that our experiment works for a variety of more-arbitrary states.

We let the system evolve from this initial state for as long as possible given experimental limitations, which was 15 ms. The experimentalists then used quantum state tomography to reconstruct the state of the system of interest. Quantum state tomography makes multiple measurements over many experimental runs to approximate the average quantum state of the system measured. We then check how close the measured state is to the NATS. We have found that it’s about as close as one can expect in this experiment!

And we know this because we have also implemented a different coupling scheme, one that doesn’t have non-Abelian charges. The expected thermal state in the latter case was reached within a distance that’s a little smaller than our non-Abelian case. This tells us that the NATS is almost reached in our experiment, and so it is a good, and the best known, thermal state for the non-Abelian system – we have compared it to competitor thermal states.
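For readers wondering what a “distance” between quantum states means in practice, one standard choice is the trace distance between density matrices (a generic sketch of mine; the paper itself may quantify closeness with a different measure, such as relative entropy):

```python
import numpy as np

def trace_distance(rho, sigma):
    """Trace distance (1/2) * Tr|rho - sigma| between two density matrices."""
    eigenvalues = np.linalg.eigvalsh(rho - sigma)  # rho - sigma is Hermitian
    return 0.5 * np.sum(np.abs(eigenvalues))

# Example: a slightly polarized qubit versus the maximally mixed state
rho = np.array([[0.6, 0.1], [0.1, 0.4]], dtype=complex)
sigma = 0.5 * np.eye(2, dtype=complex)
print(trace_distance(rho, sigma))  # ~0.14; zero would mean identical states
```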

Working with the experimentalists directly has been a new experience for me. While I was focused on the theory and analyzing the tomography results they obtained, they needed to figure out practical ways to realize what we asked of them. I feel like each group has learned a lot about the tasks of the other. I have become well acquainted with the trapped ion experiment and its capabilities and limitations. Overall, it has been great collaborating with the Austrian group.

Our result is exciting, as it’s the first experimental observation within the field of non-Abelian thermodynamics! This result was observed in a realistic, non-fine-tuned system that experiences non-negligible errors due to noise. So the system does thermalize after all. We have also demonstrated that the trapped ion experiment of our Austrian friends can be used to simulate interesting many-body quantum systems. With different settings and programming, other types of couplings can be simulated in different types of experiments.

The experiment also opened avenues for future work. The distance to the NATS was greater than the analogous distance to the Abelian system. This suggests that thermalization is inhibited by the noncommutation of charges, but more evidence is needed to justify this claim. In fact, our other recent paper in Physical Review B suggests the opposite!

As noncommutation is one of the core features that distinguishes classical and quantum physics, it is of great interest to unravel the fine differences non-Abelian charges can cause. But we also hope that this research can have practical uses. If thermalization is disrupted by noncommutation of charges, engineered systems featuring them could possibly be used to build quantum memory that is more robust, or maybe even reduce noise in quantum computers. We continue to explore noncommutation, looking for interesting effects that we can pin on it. I am currently working on verifying the workings of a hypothesis that explains when and why quantum systems thermalize internally.

Doug Natelson Items of interest

The time since the APS meeting has been very busy, hence the lack of posting.  A few items of interest:

  • The present issue of Nature Physics has several articles about physics education that I really want to read. 
  • This past week we hosted N. Peter Armitage for a really fun colloquium "On Ising's Model of Magnetism" (a title that he acknowledged borrowing from Peierls).  In addition to some excellent science about spin chains, the talk included a lot of history of science about Ising that I hadn't known.  An interesting yet trivial tidbit: when he was in Germany and later Luxembourg, the pronunciation was "eeesing", while after emigrating to the US, he changed it to "eye-sing", so however you've been saying it to yourself, you're not wrong.  The fact that the Isings survived the war in Europe is amazing, given that he was a Jew in an occupied country.  Someone should write a biography....
  • When I participated in a DOD-related program 13 years ago, I had the privilege to meet General Al Gray, former commandant of the US Marine Corps.  He just passed away this week, and people had collected Grayisms (pdf), his takes on leadership and management.  I'm generally not a big fan of leadership guides and advice books, but this is good stuff, told concisely.
  • It took a while, but a Scientific American article that I wrote is now out in the April issue.
  • Integrating nitrogen-vacancy centers for magnetic field sensing directly into the diamond anvils seems like a great way to make progress on characterizing possible superconductivity in hydrides at high pressures.
  • Congratulations to Peter Woit on 20 (!!) years of blogging at Not Even Wrong.  

March 24, 2024

Tommaso Dorigo The Analogy: A Powerful Instrument For Physics Outreach

About a month ago I was contacted by a colleague who invited me to write a piece on the topic of science outreach for an electronic journal (Ithaca). I was happy to accept, but when I later pondered what I would have liked to write, I could not help thinking back to a piece on the power and limits of the use of analogies in the explanation of physics, which I wrote 12 years ago as a proceedings paper for a conference themed on physics outreach in Torino. It dawned on me that although 12 years had gone by, my understanding of what constitutes good techniques for engagement of the public and for effective communication of scientific concepts had not widened very significantly.

read more

March 22, 2024

Matt von Hippel How Subfields Grow

A commenter recently asked me about the different “tribes” in my sub-field. I’ve been working in an area called “amplitudeology”, where we try to find more efficient ways to make predictions (calculate “scattering amplitudes”) for particle physics and gravitational waves. I plan to do a longer post on the “tribes” of amplitudeology…but not this week.

This week, I’ve got a simpler goal. I want to talk about where these kinds of “tribes” come from, in general. A sub-field is a group of researchers focused on a particular idea, or a particular goal. How do those groups change over time? How do new sub-groups form? For the amplitudes fans in the audience, I’ll use amplitudeology examples to illustrate.

The first way subfields gain new tribes is by differentiation. Do a PhD or a Postdoc with someone in a subfield, and you’ll learn that subfield’s techniques. That’s valuable, but probably not enough to get you hired: if you’re just a copy of your advisor, then the field just needs your advisor: research doesn’t need to be done twice. You need to differentiate yourself, finding a variant of what your advisor does where you can excel. The most distinct such variants go on to form distinct tribes of their own. This can also happen for researchers at the same level who collaborate as Postdocs. Each has to show something new, beyond what they did as a team. In my sub-field, it’s the source of some of the bigger tribes. Lance Dixon, Zvi Bern, and David Kosower made their names working together, but when they found long-term positions they made new tribes of their own. Zvi Bern focused on supergravity, and later on gravitational waves, while Lance Dixon was a central figure in the symbology bootstrap.

(Of course, if you differentiate too far you end up in a different sub-field, or a different field altogether. Jared Kaplan was an amplitudeologist, but I wouldn’t call Anthropic an amplitudeology project, although it would help my job prospects if it was!)

The second way subfields gain new tribes is by bridges. Sometimes, a researcher in a sub-field needs to collaborate with someone outside of that sub-field. These collaborations can just be one-and-done, but sometimes they strike up a spark, and people in each sub-field start realizing they have a lot more in common than they realized. They start showing up to each other’s conferences, and eventually identifying as two tribes in a single sub-field. An example from amplitudeology is the group founded by Dirk Kreimer, with a long track record of interesting work on the boundary between math and physics. They didn’t start out interacting with the “amplitudeology” community itself, but over time they collaborated with them more and more, and now I think it’s fair to say they’re a central part of the sub-field.

A third way subfields gain new tribes is through newcomers. Sometimes, someone outside of a subfield will decide they have something to contribute. They’ll read up on the latest papers, learn the subfield’s techniques, and do something new with them: applying them to a new problem of their own interest, or applying their own methods to a problem in the subfield. Because these people bring something new, either in what they work on or how they do it, they often spin off new tribes. Many new tribes in amplitudeology have come from this process, from Edward Witten’s work on the twistor string bringing in twistor approaches to Nima Arkani-Hamed’s idiosyncratic goals and methods.

There are probably other ways subfields gain new tribes, but these are the ones I came up with. If you think of more, let me know in the comments!

March 18, 2024

John Preskill The quantum gold rush

Even if you don’t recognize the name, you probably recognize the saguaro cactus. It’s the archetype of the cactus, a column from which protrude arms bent at right angles like elbows. As my husband pointed out, the cactus emoji is a saguaro: 🌵. In Tucson, Arizona, even the airport has a saguaro crop sufficient for staging a Western short film. I didn’t have a film to shoot, but the garden set the stage for another adventure: the ITAMP winter school on quantum thermodynamics.

Tucson airport

ITAMP is the Institute for Theoretical Atomic, Molecular, and Optical Physics (the Optical is silent). Harvard University and the Smithsonian Institution share ITAMP, where I worked as a postdoc. ITAMP hosted the first quantum-thermodynamics conference to take place on US soil, in 2017. Also, ITAMP hosts a winter school in Arizona every February. (If you lived in the Boston area, you might want to escape to the southwest then, too.) The winter school’s topic varies from year to year. 

How about a winter school on quantum thermodynamics? ITAMP’s director, Hossein Sadeghpour, asked me when I visited Cambridge, Massachusetts last spring.

Let’s do it, I said. 

Lecturers came from near and far. Kanu Sinha, of the University of Arizona, spoke about how electric charges fluctuate in the quantum vacuum. Fluctuations feature also in extensions of the second law of thermodynamics, which helps explain why time flows in only one direction. Gabriel Landi, from the University of Rochester, lectured about these fluctuation relations. ITAMP Postdoctoral Fellow Ceren Dag explained why many-particle quantum systems register time’s arrow. Ferdinand Schmidt-Kaler described the many-particle quantum systems—the trapped ions—in his lab at the University of Mainz.

Ronnie Kosloff, of Hebrew University in Jerusalem, lectured about quantum engines. Nelly Ng, an Assistant Professor at Nanyang Technological University, has featured on Quantum Frontiers at least three times. She described resource theories—information-theoretic models—for thermodynamics. Information and energy both serve as resources in thermodynamics and computation, I explained in my lectures.

The 2024 ITAMP winter school

The winter school took place at the conference center adjacent to Biosphere 2. Biosphere 2 is an enclosure that contains several miniature climate zones, including a coastal fog desert, a rainforest, and an ocean. You might have heard of Biosphere 2 due to two experiments staged there during the 1990s: in each experiment, a group of people was sealed in the enclosure. The experimentalists harvested their own food and weren’t supposed to receive any matter from outside. The first experiment lasted for two years. The group, though, ran out of oxygen, which a support crew pumped in. Research at Biosphere 2 contributes to our understanding of ecosystems and space colonization.

Fascinating as the landscape inside Biosphere 2 is, so is the landscape outside. The winter school included an afternoon hike, and my husband and I explored the territory around the enclosure.

Did you see any snakes? my best friend asked after I returned home.

No, I said. But we were chased by a vicious beast. 

On our first afternoon, my husband and I followed an overgrown path away from the biosphere to an almost deserted-looking cluster of buildings. We eventually encountered what looked like a warehouse from which noises were emanating. Outside hung a sign with which I resonated.

Scientists, I thought. Indeed, a researcher emerged from the warehouse and described his work to us. His group was preparing to seal off a building where they were simulating a Martian environment. He also warned us about the territory we were about to enter, especially the creature that roosted there. We were too curious to retreat, though, so we set off into a ghost town.

At least, that’s what the other winter-school participants called the area, later in the week—a ghost town. My husband and I had already surveyed the administrative offices, conference center, and other buildings used by biosphere personnel today. Personnel in the 1980s used a different set of buildings. I don’t know why one site gave way to the other. But the old buildings survive—as what passes for ancient ruins to many Americans. 

Weeds have grown up in the cracks in an old parking lot’s tarmac. A sign outside one door says, “Classroom”; below it is a sign that must not have been correct in decades: “Class in progress.” Through the glass doors of the old visitors’ center, we glimpsed cushioned benches and what appeared to be a diorama exhibit; outside, feathers and bird droppings covered the ground. I searched for a tumbleweed emoji, to illustrate the atmosphere, but found only a tumbler one: 🥃.

After exploring, my husband and I rested in the shade of an empty building, drank some of the water we’d brought, and turned around. We began retracing our steps past the defunct visitors’ center. Suddenly, a monstrous Presence loomed on our right. 

I can’t tell you how large it was; I only glimpsed it before turning and firmly not running away. But the Presence loomed. And it confirmed what I’d guessed upon finding the feathers and droppings earlier: the old visitors’ center now served as the Lair of the Beast.

The Mars researcher had warned us about the aggressive male turkey who ruled the ghost town. The turkey, the researcher had said, hated men—especially men wearing blue. My husband, naturally, was wearing a blue shirt. You might be able to outrun him, the researcher added pensively.

My husband zipped up his black jacket over the blue shirt. I advised him to walk confidently and not too quickly. Hikes in bear country, as well as summers at Busch Gardens Zoo Camp, gave me the impression that we mustn’t run; the turkey would probably chase us, get riled up, and excite himself to violence. So we walked, and the monstrous turkey escorted us. For surprisingly and frighteningly many minutes. 

The turkey kept scolding us in monosyllabic squawks, which sounded increasingly close to the back of my head. I didn’t turn around to look, but he sounded inches away. I occasionally responded in the soothing voice I was taught to use on horses. But my husband and I marched increasingly quickly.

We left the old visitors’ center, curved around, and climbed most of a hill before ceasing to threaten the turkey—or before he ceased to threaten us. He squawked a final warning and fell back. My husband and I found ourselves amid the guest houses of workshops past, shaky but unmolested. Not that the turkey wreaks much violence, according to the Mars researcher: at most, he beats his wings against people and scratches up their cars (especially blue ones). But we were relieved to return to civilization.

Afternoon hike at Catalina State Park, a drive away from Biosphere 2. (Yes, that’s a KITP hat.)

The ITAMP winter school reminded me of Roughing It, a Mark Twain book I finished this year. Twain chronicled the adventures he’d experienced out West during the 1860s. The Gold Rush, he wrote, attracted the top young men of all nations. The quantum-technologies gold rush has been attracting the top young people of all nations, and the winter school evidenced their eagerness. Yet the winter school also evidenced how many women have risen to the top: 10 of the 24 registrants were women, as were four of the seven lecturers.1 

The winter-school participants in the shuttle I rode from the Tucson airport to Biosphere 2

We’ll see to what extent the quantum-technologies gold rush plays out like Mark Twain’s. Ours at least involves a ghost town and ferocious southwestern critters.

1For reference, when I applied to graduate programs, I was told that approximately 20% of physics PhD students nationwide were women. The percentage of women drops as one progresses up the academic chain to postdocs and then to faculty members. And primarily PhD students and postdocs registered for the winter school.

March 16, 2024

David Hogg submitted!

OMG I actually just submitted an actual paper, with me as first author. I submitted to the AAS Journals, with a preference for The Astronomical Journal. I don't write all that many first-author papers, so I am stoked about this. If you want to read it: It should come out on arXiv within days, or if you want to type pdflatex a few times, it is available at this GitHub repo. It is about how to combine many shifted images into one combined, mean image.

David Hogg IAIFI Symposium, day two

Today was day two of a meeting on generative AI in physics, hosted by MIT. My favorite talks today were by Song Han (MIT) and Thea Aarestad (ETH), both of whom are working on making ML systems run ultra-fast on extremely limited hardware. Themes were: Work at low precision. Even 4-bit number representations! Radical. And bandwidth is way more expensive than compute: Never move data, latents, or weights to new hardware; work as locally as you can. They both showed amazing performance on terrible, tiny hardware. In addition, Han makes really cute 3d-printed devices! A conversation at the end that didn't quite happen is about how Aarestad's work might benefit from equivariant methods: Her application area is triggers in the CMS device at the LHC; her symmetry group is the Lorentz group (and permutations, etc.). The day started with me on a panel in which my co-panelists said absolutely unhinged things about the future of physics and artificial intelligence. I learned that many people think we are only years away from having independently operating, fully functional artificial physicists that are more capable than we are.
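As a toy illustration of what "work at 4-bit precision" means (my own sketch, not code from either talk), here is symmetric uniform quantization of a weight array to integers in [-8, 7] plus a single scale factor:

```python
import numpy as np

def quantize_int4(weights):
    """Symmetric uniform quantization to 4-bit integers in [-8, 7] plus one scale."""
    scale = np.max(np.abs(weights)) / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=8).astype(np.float32)
q, scale = quantize_int4(w)
print(w)
print(dequantize(q, scale))  # close to w, but stored as 4-bit integers plus one float
```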

David Hogg IAIFI Symposium, day one

Today was the first day of a two-day symposium on the impact of Generative AI in physics. It is hosted by IAIFI and A3D3, two interdisciplinary and inter-institutional entities working on things related to machine learning. I really enjoyed the content today. One example was Anna Scaife (Manchester) telling us that all the different methods they have used for uncertainty quantification in astronomy-meets-ML contexts give different and inconsistent answers. It is very hard to know your uncertainty when you are doing ML. Another example was Simon Batzner (DeepMind) explaining that equivariant methods were absolutely required for the materials-design projects at DeepMind, and that introducing the equivariance absolutely did not bork optimization (as many believe it will). Those materials-design projects have been ridiculously successful. He said the amusing thing “Machine learning is IID, science is OOD”. I couldn't agree more. In a panel at the end of the day I learned that learned ML controllers now beat hand-built controllers in some robotics applications. That's interesting and surprising.

March 14, 2024

John Baez The Probability of the Law of Excluded Middle

The Law of Excluded Middle says that for any statement P, “P or not P” is true.

Is this law true? In classical logic it is. But in intuitionistic logic it’s not.

So, in intuitionistic logic we can ask what’s the probability that a randomly chosen statement obeys the Law of Excluded Middle. And the answer is “at most 2/3—or else your logic is classical”.

This is a very nice new result by Benjamin Bumpus and Zoltan Kocsis:

• Benjamin Bumpus, Degree of classicality, Merlin’s Notebook, 27 February 2024.

Of course they had to make this more precise before proving it. Just as classical logic is described by Boolean algebras, intuitionistic logic is described by something a bit more general: Heyting algebras. They proved that in a finite Heyting algebra, if more than 2/3 of the statements obey the Law of Excluded Middle, then it must be a Boolean algebra!

Interestingly, nothing like this is true for “not not P implies P”. They showed this can hold for an arbitrarily high fraction of statements in a Heyting algebra that is still not Boolean.
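To see why 2/3 is the natural threshold for excluded middle itself, note that the smallest non-Boolean Heyting algebra already saturates it. Here is a tiny check of my own (not from the paper) using the three-element chain 0 < 1/2 < 1:

```python
# The three-element chain is a Heyting algebra that is not Boolean.
# Exactly 2 of its 3 elements satisfy the Law of Excluded Middle.
chain = [0.0, 0.5, 1.0]

def implies(a, b):
    # Heyting implication on a chain: (a -> b) is 1 if a <= b, and b otherwise.
    return 1.0 if a <= b else b

def neg(a):
    # Negation is implication into the bottom element.
    return implies(a, 0.0)

lem_holds = [a for a in chain if max(a, neg(a)) == 1.0]  # join on a chain is max
print(len(lem_holds), "of", len(chain))  # prints "2 of 3"
```

The middle element is the lone offender: its negation is 0, so "P or not P" evaluates to 1/2 rather than 1.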

Here’s a piece of the free Heyting algebra on one generator, which some call the Rieger–Nishimura lattice:



Taking the principle of excluded middle from the mathematician would be the same, say, as proscribing the telescope to the astronomer or to the boxer the use of his fists. — David Hilbert

I disagree with this statement, but boy, Hilbert sure could write!

March 13, 2024

Tommaso Dorigo On The Utility Function Of Future Experiments

At a recent meeting of the board of editors of a journal I am an editor of, it was decided to produce a special issue (to commemorate an important anniversary). As I liked the idea I got carried away a bit, and proposed to write an article for it. 

read more

March 12, 2024

David Hogg black holes as the dark matter

Today Cameron Norton (NYU) gave a great brown-bag talk on the possibility that the dark matter might be asteroid-mass-scale black holes. This is allowed by all constraints at present: If the masses are much smaller, the black holes evaporate or emit observably. If the masses are much larger, they would create observable microlensing or dynamical signatures.

She and Kleban (NYU) are working on methods for creating such black holes primordially, by modifying the potential at inflation, creating opportunities for bubble nucleations in inflation that would subsequently collapse into small black holes after the Universe exits inflation. It's speculative obviously, but not ruled out at present!

An argument broke out during and after the talk about whether you would be injured if you were intersected by a 10^20 g black hole! My position is that you would be totally fine! Everyone else in the room disagreed with me, for many different reasons. Time to get calculating.
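For a sense of scale (my own back-of-envelope numbers, not from the talk), the Schwarzschild radius of such a black hole is roughly atomic:

```python
# Back-of-envelope Schwarzschild radius of a 10^20 g (i.e., 10^17 kg) black hole.
G = 6.674e-11    # gravitational constant, m^3 kg^-1 s^-2
c = 2.998e8      # speed of light, m/s
M = 1e20 * 1e-3  # 10^20 grams expressed in kilograms

r_s = 2 * G * M / c**2
print(r_s)  # ~1.5e-10 m, comparable to the size of an atom
```

Which is presumably why reasonable people can disagree about how much damage something that small, moving fast, would actually do.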

Another great idea: Could we find stars that have captured low-mass black holes by looking for the radial-velocity signal? I got really interested in this one at the end.

David Hogg The Cannon and El Cañon

At the end of the day I got a bit of quality time in with Danny Horta (Flatiron) and Adrian Price-Whelan (Flatiron), who have just (actually just before I met with them) created a new implementation of The Cannon (the data-driven model of stellar photospheres originally created by Melissa Ness and me back in 2014/2015). Why!? Not because the world needs another implementation. We are building a new implementation because we plan to extend out to El Cañon, which will extend the probabilistic model into the label domain: It will properly generate or treat noisy and missing labels. That will permit us to learn latent labels, and de-noise noisy labels.
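For readers who haven't met The Cannon: at its core, it models the flux in each wavelength pixel as a low-order polynomial in the stellar labels, with coefficients fit to a training set of stars whose labels are known. Here is a toy version of that training step (my own sketch; the real implementations also model per-pixel scatter and noise):

```python
import numpy as np

def design_matrix(labels):
    """Quadratic-in-labels features: a constant, the labels, and their pairwise products."""
    n, k = labels.shape
    iu = np.triu_indices(k)
    quad = np.einsum('ni,nj->nij', labels, labels)[:, iu[0], iu[1]]
    return np.hstack([np.ones((n, 1)), labels, quad])

# Toy training set: 50 stars, 100 spectral pixels, 3 labels (e.g. Teff, logg, [Fe/H]).
rng = np.random.default_rng(0)
labels = rng.normal(size=(50, 3))    # pretend these are the known training labels
fluxes = rng.normal(size=(50, 100))  # pretend these are continuum-normalized spectra

A = design_matrix(labels)
coefficients, *_ = np.linalg.lstsq(A, fluxes, rcond=None)  # one least-squares fit per pixel
print(coefficients.shape)  # (number of polynomial terms, number of pixels)
```

The probabilistic extension described above would then treat noisy or missing training labels as quantities to be inferred along with the coefficients, rather than as fixed inputs.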

March 07, 2024

Doug Natelson APS March Meeting 2024, Day 4 and wrap-up

Because of the timing of my flight back to Houston, I really only went to one session today, in which my student spoke as did some collaborators.  It was a pretty interesting collection of contributed talks.  

  • The work that's been done on spin transport in multiferroic insulators is particularly interesting to me.  A relevant preprint is this one, in which electric fields are used to reorient \(\mathbf{P}\) in BiFeO3, which correspondingly switches the magnetization in this system (which is described by a complicated spin cycloid order) and therefore modulates the transmission of spin currents (as seen in ferromagnetic resonance).  
  • Similarly adding a bit of La to BiFeO3 to favor single ferroelectric domain formation was a neat complement to this.
  • There were also multiple talks showing the utility of the spin Hall magnetoresistance as a way to characterize spin transport between magnetic insulators and strong spin-orbit coupled metals.
Some wrap-up thoughts:
  • This meeting venue and environment was superior in essentially every way relative to last year's mess in Las Vegas.  Nice facilities, broadly good rooms, room sizes, projectors, and climate control.  Lots of hotels.  Lots of restaurants that are not absurdly expensive.  I'd be very happy to have the meeting in Minneapolis again at some point.  There was even a puppy-visiting booth at the exhibit hall on Tuesday and Thursday.
  • Speaking of the exhibit hall, I think this is the first time I've been at a meeting where a vendor was actually running a dilution refrigerator on the premises.  
  • Only one room that I was in had what I would describe as a bad projector (poor color balance, loud fan, not really able to be focused crisply).  I also did not see any session chair this year blow it by allowing speakers to blow past their allotted times.
  • We really lucked out on the weather.  
  • Does anyone know what happens if someone ignores the "Warning: Do Not Drive Over Plate" label on the 30 cm by 40 cm yellow floor plate in the main lobby?  Like, does it trigger a self-destruct mechanism, or the apocalypse or something?
  • Next year's combined March/April meeting in Anaheim should be interesting - hopefully the venue is up to the task, and likewise I hope there are good, close housing and food options.

February 19, 2024

Mark Goodsell Rencontres de Physique des Particules 2024

Just over a week ago the annual meeting of theoretical particle physicists (RPP 2024) was held at Jussieu, the campus of Sorbonne University where I work. I wrote about the 2020 edition (held just outside Paris) here; in keeping with tradition, this year's version also contained similar political sessions with the heads of the CNRS' relevant physics institutes and members of CNRS committees, although they were perhaps less spicy (despite rumours of big changes in the air). 

One of the roles of these meetings is as a shop window for young researchers looking to be hired in France, and a great way to demonstrate that they are interested and have a connection to the system. Of course, this isn't and shouldn't be obligatory by any means; I wasn't really aware of this prior to entering the CNRS though I had many connections to the country. But that sort of thing seems especially important after the problems described by 4gravitons recently, and his post about getting a permanent job in France -- being able to settle in a country is non-trivial, it's a big worry for both future employers and often not enough for candidates fighting tooth and nail for the few jobs there are. There was another recent case of someone getting a (CNRS) job -- to come to my lab, even -- who much more quickly decided to leave the entire field for personal reasons. Both these stories saddened me. I can understand -- there is the well-known Paris syndrome for one thing -- and the current political anxiety about immigration and the government's response to the rise of the far right (across the world), coupled with Brexit, is clearly leading to things getting harder for many. These stories are especially worrying because we expect to be recruiting for university positions in my lab this year.

I was obviously very lucky and my experience was vastly different; I love both the job and the place, and I'm proud to be a naturalised citizen. Permanent jobs in the CNRS are amazing, especially in terms of the time and freedom you have, and there are all sorts of connections between the groups throughout the country such as via the IRN Terascale or GdR Intensity Frontier; or IRN Quantum Fields and Strings and French Strings meetings for more formal topics. I'd recommend anyone thinking about working here to check out these meetings and the communities built around them, as well as taking the opportunity to find out about life here. For those moving with family, France also offers a lot of support (healthcare, childcare, very generous holidays, etc) once you have got into the system.

The other thing to add that was emphasised in the political sessions at the RPP (reinforcing the message that we're hearing a lot) is that the CNRS is very keen to encourage people from under-represented groups to apply and be hired. One of the ways they see to help this is to put pressure on the committees to hire researchers (even) earlier after their PhD, in order to reduce the length of the leaky pipeline.

Back to physics

Coming back to the RPP, this year was particularly well attended and had an excellent program of reviews of hot topics, invited and contributed talks, put together very carefully by my colleagues. It was particularly poignant for me because two former students in my lab with whom I worked a lot, one of whom recently got a permanent job, were speaking; in addition, both a former student of mine and his current PhD student were giving talks, which made me feel old. (All these talks were fascinating, of course!) 

One review that stood out as relevant for this blog was Bogdan Malaescu's review of progress in understanding the problem with muon g-2. As I discussed here, there is currently a lot of confusion about what the Standard Model prediction should be for that quantity. This is obviously very concerning for the experiments measuring muon g-2, which in a paper last year reduced their uncertainty by a factor of 2 to $$a_\mu (\mathrm{exp}) = 116\,592\,059(22)\times 10^{-11}. $$

The lattice calculation (which has now been confirmed by several groups) disagrees with the prediction using the data-driven R-ratio method, however, and there is a race on to understand why. New data from the CMD-3 experiment seem to agree with the lattice result, but combining all global data on measurements of \(e^+ e^- \rightarrow \pi^+ \pi^- \) still gives a discrepancy of more than \(5\sigma\). There is clearly a significant disagreement within the data samples used (indeed, CMD-3 significantly disagrees with their own previous measurement, CMD-2). The confusion is summarised by this plot:

As can be seen, the finger of blame is often pointed at the KLOE data; excluding it but including the others in the plot gives agreement with the lattice result, and a significance of non-zero \(\Delta a_\mu\) compared to experiment of \(2.8\sigma\) (or, for just the dispersive method without the lattice data, \( \Delta a_\mu \equiv a_\mu^{\rm SM} - a_\mu^{\rm exp} = -123 \pm 33 \pm 29 \pm 22 \times 10^{-11} \), a discrepancy of \(2.5\sigma\)). In Bogdan's talk (see also his recent paper) he discusses these tensions and also the tensions between the data and the evaluation of \(a_\mu^{\rm win}\), which is the contribution coming from a narrow "window" (when the total contribution to the Hadronic Vacuum Polarisation is split into short-, medium- and long-distance pieces, the medium-range part should be the one most reliable for lattice calculations -- at short distances the lattice spacing may not be fine enough, and at long distances the lattice may not be large enough). There he shows that, if we exclude the KLOE data and just include the BABAR, CMD-3 and Tau data, then while the overall result agrees with the BMW lattice result, the window quantity disagrees by \(2.9 \sigma\) [thanks Bogdan for the correction to the original post]. It's clear that there is still a lot to be understood in the discrepancies of the data, and perhaps, with the added experimental precision on muon g-2, there is even still a hint of new physics ...
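As a quick back-of-the-envelope check (mine, not from Bogdan's talk), the quoted \(2.5\sigma\) follows from the central value above once the three quoted uncertainties on \(\Delta a_\mu\) are combined in quadrature -- assuming, as a rough approximation, that they are independent:

```python
# Rough significance check for Delta a_mu = -123 +/- 33 +/- 29 +/- 22 (units of 1e-11),
# assuming the three uncertainties can simply be combined in quadrature.
central = -123.0
errors = [33.0, 29.0, 22.0]

sigma_tot = sum(e**2 for e in errors) ** 0.5
print(f"combined uncertainty ~ {sigma_tot:.0f} x 1e-11")       # ~49 x 1e-11
print(f"significance ~ {abs(central) / sigma_tot:.1f} sigma")  # ~2.5 sigma
```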

February 13, 2024

Jordan EllenbergAlphabetical Diaries

Enough of this. Enough. Equivocal or vague principles, as a rule, will make your life an uninspired, undirected, and meaningless act.

This is taken from Alphabetical Diaries, a remarkable book I am reading by Sheila Heti, composed of many thousands of sentences drawn from her decades of diaries and presented in alphabetical order. It starts like this:

A book about how difficult it is to change, why we don’t want to, and what is going on in our brain. A book can be about more than one thing, like a kaleidoscope, it can have many things that coalesce into one thing, different strands of a story, the attempt to do several, many, more than one thing at a time, since a book is kept together by the binding. A book like a shopping mart, all the selections. A book that does only one thing, one thing at a time. A book that even the hardest of men would read. A book that is a game. A budget will help you know where to go.

How does a simple, one might even say cheap, technique, one might even say gimmick, work so well? I thrill to the aphorisms even when I don’t believe them, as with the aphorism above: principles must be equivocal or at least vague to work as principles; without the necessary vagueness they are axioms, which are not good for making one’s life a meaningful act, only good for arguing on the Internet. I was reading Alphabetical Diaries while I walked home along the southwest bike path. I stopped for a minute and went up a muddy slope into the cemetery where there was a gap in the fence, and it turned out this gap opened on the area of infant graves, graves about the size of a book, graves overlaying people who were born and then did what they did for a week and then died — enough of this.

January 24, 2024

Robert HellingHow do magnets work?

I came across this excerpt from a Christian home-schooling book:

which is of course funny in so many ways, not least because the whole process of "seeing" is electromagnetic at its very core, and of course most people will have felt electricity at some point in their life. Even historically, this is pretty much how it was discovered by Galvani (using frogs' legs) at a time when electricity was about cat skins and amber.

It also brings to mind this quite famous YouTube video that shows Feynman being interviewed by the BBC, first getting somewhat angry about the question of how magnets work and then actually going into a quite deep explanation of what it means to explain something.
 

But how do magnets work? When I look at what my kids are taught in school, it basically boils down to "a magnet is made up of tiny magnets that all align" which if you think about it is actually a non-explanation. Can we do better (using more than layman's physics)? What is it exactly that makes magnets behave like magnets?

I would define magnetism as the force that moving charges feel in an electromagnetic field (the part proportional to the velocity) or said the other way round: The magnetic field is the field that is caused by moving charges. Using this definition, my interpretation of the question about magnets is then why permanent magnets feel this force.  For the permanent magnets, I want to use the "they are made of tiny magnets" line of thought but remove the circularity of the argument by replacing it by "they are made of tiny spins". 

This transforms the question to "Why do the elementary particles that make up matter feel the same force as moving charges even if they are not moving?".

And this question has an answer: Because they are Dirac particles! At small energies, the Dirac equation reduces to the Pauli equation, which involves the term (thanks to minimal coupling)
$$(\vec\sigma\cdot(\vec p+q\vec A))^2$$
and when you expand the square, it contains (in Coulomb gauge)
$$(\vec\sigma\cdot \vec p)(\vec\sigma\cdot q\vec A)= q\vec A\cdot\vec p + i(\vec p\times q\vec A)\cdot\vec\sigma$$
Here, the first term is the one responsible for the interaction of the magnetic field with moving charges, while the second one couples $$\nabla\times\vec A$$ to the operator $$\vec\sigma$$, i.e. the spin. And since you need to have both terms, this links the force on moving charges to this property we call spin. If you like, the fact that the g-factor is non-vanishing is the core of the explanation of how magnets work.
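The step from the minimal-coupling term to the spin coupling rests on the Pauli-matrix identity \((\vec\sigma\cdot\vec a)(\vec\sigma\cdot\vec b) = \vec a\cdot\vec b + i(\vec a\times\vec b)\cdot\vec\sigma\). Here is a minimal sympy sketch (my addition, not part of the original post) verifying it for commuting components; for \(\vec p\) and \(\vec A\) the ordering matters, which is where the Coulomb gauge condition \(\nabla\cdot\vec A = 0\) comes in:

```python
# Check the identity (sigma.a)(sigma.b) = (a.b) I + i sigma.(a x b) symbolically,
# for ordinary commuting components a_i, b_i.
import sympy as sp

sx = sp.Matrix([[0, 1], [1, 0]])
sy = sp.Matrix([[0, -sp.I], [sp.I, 0]])
sz = sp.Matrix([[1, 0], [0, -1]])
sigma = [sx, sy, sz]

a = sp.symbols('a1 a2 a3')
b = sp.symbols('b1 b2 b3')

def dot_sigma(v):
    """Return sigma . v as a 2x2 matrix."""
    return sum((vi * si for vi, si in zip(v, sigma)), sp.zeros(2, 2))

lhs = dot_sigma(a) * dot_sigma(b)

a_dot_b = sum(ai * bi for ai, bi in zip(a, b))
a_cross_b = (a[1]*b[2] - a[2]*b[1],
             a[2]*b[0] - a[0]*b[2],
             a[0]*b[1] - a[1]*b[0])
rhs = a_dot_b * sp.eye(2) + sp.I * dot_sigma(a_cross_b)

print(sp.simplify(lhs - rhs))  # the zero matrix: the identity holds
```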

And if you want, you can add spin-statistics, which then brings in the full "stability of matter" story -- in the end that is what is responsible for the fact that you can form macroscopic objects out of Dirac particles that can be magnets.


January 20, 2024

Jacques Distler Responsibility

Many years ago, when I was an assistant professor at Princeton, there was a cocktail party at Curt Callan’s house to mark the beginning of the semester. There, I found myself in the kitchen, chatting with Sacha Polyakov. I asked him what he was going to be teaching that semester, and he replied that he was very nervous because — for the first time in his life — he would be teaching an undergraduate course. After my initial surprise that he had gotten this far in life without ever having taught an undergraduate course, I asked which course it was. He said it was the advanced undergraduate Mechanics course (chaos, etc.) and we agreed that would be a fun subject to teach. We chatted some more, and then he said that, on reflection, he probably shouldn’t be quite so worried. After all, it wasn’t as if he was going to teach Quantum Field Theory, “That’s a subject I’d feel responsible for.”

This remark stuck with me, but it never seemed quite so poignant until this semester, when I find myself teaching the undergraduate particle physics course.

The textbooks (and I mean all of them) start off by “explaining” that relativistic quantum mechanics (e.g. replacing the Schrödinger equation with Klein-Gordon) makes no sense (negative probabilities and all that …). And they then proceed to use it anyway (supplemented by some Feynman rules pulled out of thin air).

This drives me up the #@%^ing wall. It is precisely wrong.

There is a perfectly consistent quantum mechanical theory of free particles. The problem arises when you want to introduce interactions. In Special Relativity, there is no interaction-at-a-distance; all forces are necessarily mediated by fields. Those fields fluctuate and, when you want to study the quantum theory, you end up having to quantize them.

But the free particle is just fine. Of course it has to be: free field theory is just the theory of an (indefinite number of) free particles. So it better be true that the quantum theory of a single relativistic free particle makes sense.

So what is that theory?

  1. It has a Hilbert space, \(\mathcal{H}\), of states. To make the action of Lorentz transformations as simple as possible, it behoves us to use a Lorentz-invariant inner product on that Hilbert space. This is most easily done in the momentum representation $$\langle\chi|\phi\rangle = \int \frac{d^3\vec{k}}{{(2\pi)}^3\, 2\sqrt{\vec{k}^2+m^2}}\, \chi(\vec{k})^* \phi(\vec{k})$$
  2. As usual, the time-evolution is given by a Schrödinger equation
$$i\partial_t |\psi\rangle = H_0 |\psi\rangle \qquad (1)$$

where \(H_0 = \sqrt{\vec{p}^2+m^2}\). Now, you might object that it is hard to make sense of a pseudo-differential operator like \(H_0\). Perhaps. But it’s not any harder than making sense of \(U(t)= e^{-i \vec{p}^2 t/2m}\), which we routinely pretend to do in elementary quantum. In both cases, we use the fact that, in the momentum representation, the operator \(\vec{p}\) is represented as multiplication by \(\vec{k}\).
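As a quick sanity check (my own sketch, not part of the post), one can verify symbolically that the measure \(d^3\vec{k}/2\sqrt{\vec{k}^2+m^2}\) appearing in the inner product of item 1 above is unchanged by a boost of rapidity \(\lambda\) along the 3-axis: the Jacobian of \(k_3 \mapsto k_3'\) exactly compensates the change in the energy denominator.

```python
# Lorentz invariance of d^3k / (2 E(k)): under a boost of rapidity lambda along the 3-axis,
# k3 -> k3' = cosh(l) k3 + sinh(l) E and E -> E' = cosh(l) E + sinh(l) k3, and the Jacobian
# dk3'/dk3 (at fixed k1, k2) equals E'/E, so d^3k'/E' = d^3k/E.
import sympy as sp

k1, k2, k3, lam = sp.symbols('k1 k2 k3 lambda', real=True)
m = sp.symbols('m', positive=True)

E = sp.sqrt(k1**2 + k2**2 + k3**2 + m**2)
k3p = sp.cosh(lam) * k3 + sp.sinh(lam) * E   # boosted k3
Ep = sp.cosh(lam) * E + sp.sinh(lam) * k3    # boosted energy

jac = sp.diff(k3p, k3)                        # dk3'/dk3 at fixed k1, k2

print(sp.simplify(jac / Ep - 1 / E))          # 0: the measure d^3k / E is invariant
```

(The constant factor \(2{(2\pi)}^3\) obviously plays no role in the invariance.)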

I could go on, but let me leave the rest of the development of the theory as a series of questions.

  1. The self-adjoint operator, \(\vec{x}\), satisfies $$[x^i,p_j] = i \delta^{i}_j$$ Thus it can be written in the form $$x^i = i\left(\frac{\partial}{\partial k_i} + f_i(\vec{k})\right)$$ for some real function \(f_i\). What is \(f_i(\vec{k})\)?
  2. Define \(J^0(\vec{r})\) to be the probability density. That is, when the particle is in state \(|\phi\rangle\), the probability for finding it in some Borel subset \(S\subset\mathbb{R}^3\) is given by $$\text{Prob}(S) = \int_S d^3\vec{r}\, J^0(\vec{r})$$ Obviously, \(J^0(\vec{r})\) must take the form $$J^0(\vec{r}) = \int\frac{d^3\vec{k}\,d^3\vec{k}'}{{(2\pi)}^6\, 4\sqrt{\vec{k}^2+m^2}\sqrt{{\vec{k}'}^2+m^2}}\, g(\vec{k},\vec{k}')\, e^{i(\vec{k}-\vec{k}')\cdot\vec{r}}\,\phi(\vec{k})\phi(\vec{k}')^*$$ Find \(g(\vec{k},\vec{k}')\). (Hint: you need to diagonalize the operator \(\vec{x}\) that you found in problem 1.)
  3. The conservation of probability says $$0=\partial_t J^0 + \partial_i J^i$$ Use the Schrödinger equation (1) to find \(J^i(\vec{r})\).
  4. Under Lorentz transformations, \(H_0\) and \(\vec{p}\) transform as the components of a 4-vector. For a boost in the \(z\)-direction, of rapidity \(\lambda\), we should have $$\begin{split} U_\lambda \sqrt{\vec{p}^2+m^2}\, U_\lambda^{-1} &= \cosh(\lambda) \sqrt{\vec{p}^2+m^2} + \sinh(\lambda)\, p_3\\ U_\lambda\, p_1\, U_\lambda^{-1} &= p_1\\ U_\lambda\, p_2\, U_\lambda^{-1} &= p_2\\ U_\lambda\, p_3\, U_\lambda^{-1} &= \sinh(\lambda) \sqrt{\vec{p}^2+m^2} + \cosh(\lambda)\, p_3 \end{split}$$ and we should be able to write \(U_\lambda = e^{i\lambda B}\) for some self-adjoint operator, \(B\). What is \(B\)? (N.B.: by contrast the \(x^i\), introduced above, do not transform in a simple way under Lorentz transformations.)

The Hilbert space of a free scalar field is now \(\bigoplus_{n=0}^\infty \text{Sym}^n\mathcal{H}\). That’s perhaps not the easiest way to get there. But it is a way …

Update:

Yike! Well, that went south pretty fast. For the first time (ever, I think) I’m closing comments on this one, and calling it a day. To summarize, for those who still care,

  1. There is a decomposition of the Hilbert space of a Free Scalar field as $$\mathcal{H}_\phi = \bigoplus_{n=0}^\infty \mathcal{H}_n$$ where $$\mathcal{H}_n = \text{Sym}^n \mathcal{H}$$ and \(\mathcal{H}\) is the 1-particle Hilbert space described above (also known as the spin-\(0\), mass-\(m\), irreducible unitary representation of Poincaré).
  2. The Hamiltonian of the Free Scalar field is the direct sum of the induced Hamiltonians on \(\mathcal{H}_n\), induced from the Hamiltonian, \(H=\sqrt{\vec{p}^2+m^2}\), on \(\mathcal{H}\). In particular, it (along with the other Poincaré generators) is block-diagonal with respect to this decomposition (see the short numerical sketch after this list).
  3. There are other interesting observables which are also block-diagonal with respect to this decomposition (i.e., don’t change the particle number), and hence we can discuss their restriction to \(\mathcal{H}_n\).
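To make points 2 and 3 concrete, here is a small numerical toy (my own illustration, not from the post): with a handful of discretized momentum modes and a truncated occupation number per mode, a free Hamiltonian \(\sum_k E_k\, a_k^\dagger a_k\) commutes with the number operator \(N=\sum_k a_k^\dagger a_k\), i.e. it is block-diagonal with respect to the \(n\)-particle decomposition, and so is a particle-number-conserving "hopping" observable, even though the latter is not diagonal.

```python
# Toy Fock space: 3 momentum modes, occupation numbers truncated at 0..3 per mode.
import numpy as np

ncut = 4                                   # truncation of each mode's occupation number
m = 1.0
ks = [0.0, 1.0, 2.0]                       # illustrative momentum modes
energies = [np.sqrt(k**2 + m**2) for k in ks]

a = np.diag(np.sqrt(np.arange(1, ncut)), k=1)   # single-mode annihilation operator
I = np.eye(ncut)

def embed(op, mode, nmodes):
    """Place a single-mode operator at position `mode` in the tensor product of all modes."""
    factors = [I] * nmodes
    factors[mode] = op
    out = factors[0]
    for f in factors[1:]:
        out = np.kron(out, f)
    return out

nmodes = len(ks)
n_ops = [embed(a.conj().T @ a, i, nmodes) for i in range(nmodes)]

H = sum(E * n for E, n in zip(energies, n_ops))   # free Hamiltonian, sum_k E_k n_k
N = sum(n_ops)                                    # total particle number

hop = embed(a.conj().T, 0, nmodes) @ embed(a, 1, nmodes)
hop = hop + hop.conj().T                          # moves a particle between modes 0 and 1

print(np.allclose(H @ N - N @ H, 0))       # True: H is block-diagonal in particle number
print(np.allclose(hop @ N - N @ hop, 0))   # True: so is this observable, though not diagonal
```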

Gotta keep reminding myself why I decided to foreswear blogging…

December 20, 2023

Richard EastherA Bigger Sky

Amongst everything else that happened in 2023, a key anniversary of a huge leap in our understanding of the Universe passed largely unnoticed – the centenary of the realisation that not only was our Sun one of many stars in the Milky Way galaxy but that our galaxy was one of many galaxies in the Universe.

I had been watching the approaching anniversary for over a decade, thanks to teaching the cosmology section of the introductory astronomy course at the University of Auckland. My lectures come at the end of the semester and each October finds me showing this image – with its “October 1923” inscription – to a roomful of students.

The image was captured by the astronomer Edwin Hubble, using the world’s then-largest telescope, on top of Mt Wilson, outside Los Angeles. At first glance, it may not even look like a picture of the night sky: raw photographic images are reversed, so stars show up as dark spots against a light background. However, this odd-looking picture changed our sense of where we live in the Universe.

My usual approach when I share this image with my students is to ask for a show of hands from people with a living relative born before 1923. It’s a decent-sized class and this year a few of them had a centenarian in the family. However, I would get far more hands a decade ago when I asked about mere 90-year-olds. And sometime soon no hands will rise at this prompt and I will have to come up with a new shtick. But it is remarkable to me that there are people alive today who were born before we understood the overall arrangement of the Universe.

For tens of thousands of years, the Milky Way – the band of light that stretches across the dark night sky – would have been one of the most striking sights in the sky on a dark night once you stepped away from the fire.

Milky Way — via Unsplash

Ironically, the same technological prowess that has allowed us to explore the farthest reaches of the Universe also gives us cities and electric lights. I always ask, with another show of hands, whether my students have seen the Milky Way for themselves, and each year quite a few of them disclose that they have not. I encourage them (and everyone) to find chances to sit out under a cloudless, moonless sky and take in the full majesty of the heavens as it slowly reveals itself to you as your eyes adapt to the dark.

In the meantime, though, we make do with a projector and a darkened lecture theatre.

It was over 400 years ago that Galileo pointed the first, small telescope at the sky. In that moment the apparent clouds of the Milky Way revealed themselves to be composed of many individual stars. By the 1920s, we understood that our Sun is a star and that the Milky Way is a collection of billions of stars, with our Sun inside it. But the single biggest question in astronomy in 1923 — which, with hindsight, became known as the “Great Debate” — was whether the Milky Way was an isolated island of stars in an infinite and otherwise empty ocean of space, or if it was one of many such islands, sprinkled across the sky.

In other words, for Hubble and his contemporaries the question was whether our galaxy was the galaxy, or one of many?

More specifically, the argument was whether nebulae, which are visible as extended patches of light in the night sky, were themselves galaxies or contained within the Milky Way. These objects, almost all of which are only detectable in telescopes, had been catalogued by astronomers as they mapped the sky with increasingly capable instruments. There are many kinds of nebulae, but the white nebulae had the colour of starlight and looked like little clouds through the eyepiece. Since the 1750s these had been proposed as possible galaxies. But until 1923 nobody knew with certainty whether they were small objects on the outskirts of our galaxy – or much larger, far more distant objects on the same scale as the Milky Way itself.

To human observers, the largest and most impressive of the nebulae is Andromeda. This was the object at which Hubble had pointed his telescope in October 1923. Hubble was renowned for his ability to spot interesting details in complex images [1] and after the photographic plate was developed his eye alighted on a little spot that had not been present in an earlier observation [2].

Hubble’s original guess was that this was a nova, a kind of star that sporadically flares in brightness by a factor of 1,000 or more, so he marked it and a couple of other candidates with an “N”. However, after looking back at images that he had already taken and monitoring the star through the following months Hubble came to realise that he had found a Cepheid variable – a star whose brightness changes rhythmically over weeks or months.

Stars come in a huge range of sizes and big stars are millions of times brighter than little ones, so simply looking at a star in the sky tells us little about its distance from us. But Cepheids have a useful property [3]: brighter Cepheids take longer to pass through a single cycle than their smaller siblings.

Imagine a group of people holding torches (flashlights if you are North American), each of which has a bulb with its own distinctive brightness. If this group fans out across a field at night and turns on their torches, we cannot tell how far away each person is simply by looking at the resulting pattern of lights. Is that torch faint because it is further from us than most, or because its bulb is dimmer than most? But if each person were to flash the wattage of their bulb in Morse code we could estimate distances by comparing their apparent brightness (since distant objects appear fainter) to their actual intensity (which is encoded in the flashing light).

In the case of Cepheids they are not flashing in Morse code; instead, nature provides us with the requisite information via the time it takes for their brightness to vary from maximum to minimum and back to a maximum again.

Hubble used this knowledge to estimate the distance to Andromeda. While the number he found was lower than the best present-day estimates, it was still large enough to show that Andromeda lies far outside the Milky Way and is thus roughly the same size as our galaxy.
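Here is a toy version of that calculation (my own sketch, not Hubble’s actual numbers): a period-luminosity relation turns the period into an absolute magnitude \(M\), and comparing \(M\) with the apparent magnitude \(m\) gives the distance via the distance modulus \(m - M = 5\log_{10}(d/10\,\mathrm{pc})\). The coefficients and the example period and magnitude below are illustrative values only.

```python
import math

def cepheid_distance_pc(period_days, apparent_mag, slope=-2.43, intercept=-4.05):
    """Rough Cepheid distance estimate in parsecs.

    slope/intercept are illustrative V-band period-luminosity coefficients,
    M = slope * (log10(P) - 1) + intercept; real calibrations differ in detail.
    """
    absolute_mag = slope * (math.log10(period_days) - 1.0) + intercept
    # distance modulus: m - M = 5 log10(d / 10 pc)
    return 10.0 ** ((apparent_mag - absolute_mag + 5.0) / 5.0)

# An illustrative ~31-day Cepheid seen at apparent magnitude ~19 comes out at several
# hundred kiloparsecs -- far outside the Milky Way, which is only tens of kiloparsecs across.
print(f"{cepheid_distance_pc(31.4, 19.0):.2e} pc")   # ~7e5 pc
```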

The immediate implication, given that Andromeda is the brightest of the many nebulae we see in big telescopes, was that our Milky Way was neither alone nor unique in the Universe. Thus we confirmed that our galaxy was just one of an almost uncountable number of islands in the ocean of space – and the full scale of the cosmos yielded to human measurement for the first time, through Hubble’s careful lens on a curious star.

A modern image (made by Richard Gentler) of the Andromeda galaxy with a closeup on what is now called “Hubble’s star” taken using the (appropriately enough) Hubble Space Telescope, in the white circle. A “positive” image from Hubble’s original plate is shown at the bottom right.

Illustration Credit: NASA, ESA and Z. Levay (STScI). Credit: NASA, ESA and the Hubble Heritage Team (STScI/AURA)


[1] Astronomers in Hubble’s day used a gizmo called a “Blink Comparator” that chops quickly between two images viewed through an eyepiece, so objects changing in brightness draw attention to themselves by flickering.

[2] In most reproductions of the original plate I am hard put to spot it at all, even more so when it is projected on a screen in a lecture theatre. A bit of mild image processing makes it a little clearer, but it hardly calls attention to itself.


[3] This “period-luminosity law” had been described just 15 years earlier by Henrietta Swan Leavitt and it is still key to setting the overall scale of the Universe.

December 18, 2023

Jordan EllenbergShow report: Bug Moment, Graham Hunt, Dusk, Disq at High Noon Saloon

I haven’t done a show report in a long time because I barely go to shows anymore! Actually, though, this fall I went to three. First, The Beths, opening for The National, but I didn’t stay for The National because I don’t know or care about them; I just wanted to see the latest geniuses of New Zealand play “Expert in a Dying Field”

Next was the Violent Femmes, playing their self-titled debut in order. They used to tour a lot and I used to see them a lot, four or five times in college and grad school I think. They never really grow old and Gordon Gano never stops sounding exactly like Gordon Gano. A lot of times I go to reunion shows and there are a lot of young people who must have come to the band through their back catalogue. Not Violent Femmes! 2000 people filling the Sylvee and I’d say 95% were between 50 and 55. One of the most demographically narrowcast shows I’ve ever been to. Maybe beaten out by the time I saw Black Francis at High Noon and not only was everybody exactly my age they were also all men. (Actually, it was interesting to me there were a lot of women at this show! I think of Violent Femmes as a band for the boys.)

But I came in to write about the show I saw this weekend, four Wisconsin acts playing the High Noon. I really came to see Disq, whose single “Daily Routine” I loved when it came out and I still haven’t gotten tired of. Those chords! Sevenths? They’re something:

Dusk was an Appleton band that played funky/stompy/indie; Bug Moment had an energetic frontwoman named Rosenblatt and were one of those bands where no two members looked like they were in the same band. But the real discovery of the night, for me, was Graham Hunt, who has apparently been a Wisconsin scene fixture forever. Never heard of the guy. But wow! Indie power-pop of the highest order. When Hunt’s voice cracks and scrapes the high notes he reminds me a lot of the other great Madison noisy-indie genius named Graham, Graham Smith, aka Kleenex Girl Wonder, who recorded the last great album of the 1990s in his UW-Madison dorm room. Graham Hunt’s new album, Try Not To Laugh, is out this week. ”Emergency Contact” is about as pretty and urgent as this kind of music gets. 

And from his last record, If You Knew Would You Believe it, “How Is That Different,” which rhymes blanket, eye slit, left it, and orbit. Love it! Reader, I bought a T-shirt.

November 27, 2023

Sean Carroll New Course: The Many Hidden Worlds of Quantum Mechanics

In past years I’ve done several courses for The Great Courses/Wondrium (formerly The Teaching Company): Dark Matter and Dark Energy, Mysteries of Modern Physics: Time, and The Higgs Boson and Beyond. Now I’m happy to announce a new one, The Many Hidden Worlds of Quantum Mechanics.

This is a series of 24 half-hour lectures, given by me with impressive video effects from the Wondrium folks.

The content will be somewhat familiar if you’ve read my book Something Deeply Hidden — the course follows a similar outline, with a few new additions and elaborations along the way. So it’s both a general introduction to quantum mechanics, and also an in-depth exploration of the Many Worlds approach in particular. It’s meant for absolutely everybody — essentially no equations this time! — but 24 lectures is plenty of time to go into depth.

Check out this trailer:

As I type this on Monday 27 November, I believe there is some kind of sale going on! So move quickly to get your quantum mechanics at unbelievably affordable prices.