Planet Musings

June 12, 2024

John Baez The Grotthuss Mechanism

If you could watch an individual water molecule, once in a while you’d see it do this.

As it bounces around, every so often it hits another water molecule hard enough for one to steal a hydrogen nucleus (that is, a proton) from the other!

The water molecule with the missing proton is called a hydroxide ion, OH⁻. The one with an extra proton is called a hydronium ion, H₃O⁺.

This process is called the ‘autoionization’ of water. Thanks to this, a few molecules in a glass of water are actually OH⁻ or H₃O⁺, not the H₂O you expect.

And this gives a cool way for protons to move through water. Here’s a little movie of how it works, made by Mark Petersen:

A positively charged proton gets passed from one molecule to another! This is called the ‘Grotthuss mechanism’, because Theodor Grotthuss proposed this theory in his paper “Theory of decomposition of liquids by electrical currents” back in 1806. It was quite revolutionary at the time, since ions were not well understood.

Something like this theory is true. But in fact, I believe all the pictures I’ve shown so far are oversimplified! A hydronium ion is too powerfully positive to remain a lone H₃O⁺. It usually attracts a bunch of other water molecules by the van der Waals force and creates larger structures. You can see these here:

• Water, Azimuth, 29 November 2013.

Water with even trace amounts of salts in it conducts electricity vastly better than pure water, because when salts dissolve in water they create free ions. So, the Grotthuss mechanism seems to be the dominant form of electrical conduction in water only when the water is extremely pure. According to Wikipedia:

Pure water containing no exogenous ions is an excellent electronic insulator, but not even “deionized” water is completely free of ions. Water undergoes autoionization in the liquid state when two water molecules form one hydroxide anion (OH⁻) and one hydronium cation (H₃O⁺). Because of autoionization, at ambient temperatures pure liquid water has a similar intrinsic charge carrier concentration to the semiconductor germanium and an intrinsic charge carrier concentration three orders of magnitude greater than the semiconductor silicon, hence, based on charge carrier concentration, water can not be considered to be a completely dielectric material or electrical insulator but to be a limited conductor of ionic charge.

Because water is such a good solvent, it almost always has some solute dissolved in it, often a salt. If water has even a tiny amount of such an impurity, then the ions can carry charges back and forth, allowing the water to conduct electricity far more readily.

It is known that the theoretical maximum electrical resistivity for water is approximately 18.2 MΩ·cm (182 kΩ·m) at 25 °C. This figure agrees well with what is typically seen on reverse osmosis, ultra-filtered and deionized ultra-pure water systems used, for instance, in semiconductor manufacturing plants. A salt or acid contaminant level exceeding even 100 parts per trillion (ppt) in otherwise ultra-pure water begins to noticeably lower its resistivity by up to several kΩ·m.
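As a rough consistency check on the 18.2 MΩ·cm figure, one can estimate the resistivity of ideally pure water from the ion product given in Puzzle 1. This sketch assumes approximate textbook limiting molar conductivities, λ(H₃O⁺) ≈ 349.8 and λ(OH⁻) ≈ 198.0 S·cm²/mol at 25 ℃ (these values are not from the post), and treats the two ions as conducting independently.

```julia
# Sketch: resistivity of ideally pure water at 25 ℃ estimated from Kw.
# Assumed (not from the post): limiting molar conductivities of the two ions.
Kw = 1.006e-14                  # [H₃O⁺][OH⁻], concentrations in mol/L
c = sqrt(Kw) / 1000             # [H₃O⁺] = [OH⁻], converted from mol/L to mol/cm³
λ_H = 349.8                     # S·cm²/mol for H₃O⁺ (assumed textbook value)
λ_OH = 198.0                    # S·cm²/mol for OH⁻ (assumed textbook value)

κ = (λ_H + λ_OH) * c            # conductivity in S/cm
ρ = 1 / κ                       # resistivity in Ω·cm
println("ρ ≈ ", round(ρ / 1e6, digits = 1), " MΩ·cm")
```

This lands at about 18.2 MΩ·cm. The hydronium term supplies nearly two-thirds of the conductivity, and that unusually high mobility is commonly attributed to the Grotthuss mechanism.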

I have a couple of questions:

Puzzle 1. What fraction of water molecules are autoionized at any time? It should be possible to compute this for water at 25℃ knowing that

[H₃O⁺] [OH⁻] = 1.006 × 10⁻¹⁴

at this temperature.
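Here is one back-of-the-envelope way to attack Puzzle 1, offered as a sketch rather than a definitive answer. It assumes the concentrations are in mol/L, that pure water has [H₃O⁺] = [OH⁻] so each equals the square root of the product above, and it uses a density of 997.05 g/L and a molar mass of 18.015 g/mol at 25 ℃; those two values are assumptions, not numbers from the post.

```julia
# Back-of-the-envelope estimate for Puzzle 1 (a sketch under stated assumptions).
Kw = 1.006e-14          # [H₃O⁺][OH⁻] at 25 ℃, concentrations assumed in mol/L
molar_mass = 18.015     # g/mol for H₂O (assumed value)
density = 997.05        # g/L for water at 25 ℃ (assumed value)

ion_conc = sqrt(Kw)                  # pure water: [H₃O⁺] = [OH⁻] = √Kw, in mol/L
water_conc = density / molar_mass    # moles of water molecules per liter

frac_h3o = ion_conc / water_conc           # fraction of molecules that are H₃O⁺
frac_ionized = 2 * ion_conc / water_conc   # fraction that are H₃O⁺ or OH⁻

println("fraction H₃O⁺:    ", frac_h3o)
println("fraction ionized: ", frac_ionized)
println("about 1 in ", round(Int, 1 / frac_ionized), " molecules")
```

With these assumptions the ionized fraction comes out around 3.6 × 10⁻⁹, roughly one molecule in 276 million; counting only H₃O⁺ halves that.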

Puzzle 2. How often, on average, does an individual water molecule autoionize? Wikipedia says it happens about once every 10 hours, and cites this paper:

• Manfred Eigen and L. De Maeyer, Untersuchungen über die Kinetik der Neutralisation I, Z. Elektrochem. 59 (1955), 986.

But I don’t know how this was estimated, so I don’t know how seriously to take it.

If we knew answers to Puzzles 1 and 2, maybe we could compute how long an individual molecule remains ionized each time it autoionizes, on average. But I’m worried about a lot of subtleties that I don’t really understand.
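One naive way to combine the two puzzles, offered only as a sketch: if the ions are in a steady state, the fraction of time a molecule spends ionized equals the rate at which it ionizes times the mean time it stays ionized. The code assumes that "once every 10 hours" means each molecule gives up a proton, becoming OH⁻, at that rate, and it uses an assumed density of 997.05 g/L and molar mass of 18.015 g/mol. It deliberately ignores the subtleties mentioned above, including that Grotthuss hopping passes the charge to other molecules, so "the same molecule staying ionized" is not sharply defined.

```julia
# Steady-state lifetime estimate (a hedged sketch, not a settled answer).
Kw = 1.006e-14                        # [H₃O⁺][OH⁻] at 25 ℃, in (mol/L)²
water_conc = 997.05 / 18.015          # mol/L, assumed density / molar mass
frac_oh = sqrt(Kw) / water_conc       # fraction of molecules that are OH⁻ at a given moment

seconds_between = 10 * 3600.0         # "about once every 10 hours", per molecule
rate = 1 / seconds_between            # ionization events per molecule per second

# Balance of formation and recombination: fraction = rate × mean lifetime.
lifetime = frac_oh / rate             # seconds
println("mean lifetime ≈ ", round(lifetime * 1e6, digits = 1), " μs")
```

Under these assumptions the estimate is on the order of 65 μs.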

For more, read:

• Wikipedia, Self-ionization of water.

• Wikipedia, Grotthuss mechanism.

Matt Strassler Virtual Tour of Two LHC Experiments TODAY!

[Update: unfortunately, the link below was taken down before the tour, with no explanation. If anyone knows why, please let me know. Apologies to anyone who got their hopes up. I’m sure there will be other tours in the future, and I’ll try to make sure I have more stable information next time.]

Would anyone like a tour of the ATLAS and CMS experiments, the general-purpose particle detectors at the Large Hadron Collider that were used to discover the particle known as the Higgs boson? A live, virtual tour is being given today (Tuesday June 11) on YouTube at 1700 CERN time (1600 London time, 11:00 New York time, 8:00 San Francisco time). Find out how these enormous, complex, magnificent devices are constructed, and learn how their various parts work together, 25 million times every second, to allow scientists to track the tiniest objects in the universe. Includes a Q&A at the end for participants.

June 10, 2024

John Preskill Quantum Frontiers salutes an English teacher

If I ever mention a crazy high-school English teacher to you, I might be referring to Mr. Lukacs. One morning, before the first bell rang, I found him wandering among the lockers, wearing a white beard and a mischievous grin. (The school had pronounced the day “Dress Up as Your Favorite Writer” Day, or some such designation, but still.1) Mr. Lukacs was carrying a copy of Leaves of Grass, a book by the nineteenth-century American poet Walt Whitman, and yawping. To yawp is to cry out, and Whitman garnered acclaim for weaving such colloquialisms into his poetry. “I sound my barbaric yawp over the roofs of the world,” he wrote in Leaves of Grass—as Mr. Lukacs illustrated until the bells rang for class. And, for all I know, until the final bell.

I call Mr. Lukacs one of my crazy high-school English teachers despite never having taken any course of his.2 He served as the faculty advisor for the school’s literary magazine, on whose editorial board I served. As a freshman and sophomore, I kept my head down and scarcely came to know Mr. Lukacs. He wore small, round glasses and a bowtie. As though to ham up the idiosyncrasy, he kept a basket of bowties in his classroom. His hair had grayed, he spoke slowly, and he laughed in startling little bursts that resembled gasps. 

Junior year, I served as co-editor-in-chief of the literary magazine; and, senior year, as editor-in-chief. I grew to conjecture that Mr. Lukacs spoke slowly because he was hunting for the optimal word to use next. Finding that word cost him a pause, but learning his choice enriched the listener. And Mr. Lukacs adored literature. You could hear, when he read aloud, how he invested himself in it. 

I once submitted to the literary magazine a poem about string theory, inspired by a Brian Greene book.3 As you might expect, if you’ve ever read about string theory, the poem invoked music. Mr. Lukacs pretended to no expertise in science; he even had a feud with the calculus teacher.4 But he wrote that the poem made him feel like dancing.

You might fear that Mr. Lukacs too strongly echoed the protagonist of Dead Poets Society to harbor any originality. The 1989 film Dead Poets Society stars Robin Williams as an English teacher who inspires students to discover their own voices, including by yawping à la Whitman. But Mr. Lukacs leaned into the film, with a gleeful sort of exultation. He even interviewed one of the costars, who’d left acting to teach, for a job. The interview took place beside a cardboard-cutout advertisement for Dead Poets Society—a possession, I’m guessing, of Mr. Lukacs’s.

This winter, friends of Mr. Lukacs’s helped him create a Youtube video for his former students. He sounded as he had twenty years before. But he said goodbye, expecting his cancer journey to end soon. Since watching the video, I’ve been waffling between reading Goodbye, Mr. Chips—a classic novella I learned of around the time the video debuted—and avoiding it. I’m not sure what Mr. Lukacs would advise—probably to read, rather than not to read. But I like the thought of saluting a literary-magazine advisor on Quantum Frontiers. We became Facebook friends years ago; and, although I’ve rarely seen activity by him, he’s occasionally effused over some physics post of mine.

Physics brought me to the Washington, DC area, where a Whitman quote greets entrants to the Dupont Circle metro station. The DC area also houses Abraham Lincoln’s Cottage, where the president moved with his wife. They sought quietude to mourn their son Willie, who’d succumbed to an illness. Lincoln rode from the cottage to the White House every day. Whitman lived along his commute, according to a panel in the visitors’ center. I was tickled to learn that the two men used to exchange bows during that commute—one giant of politics and one giant of literature.

I wrote the text above this paragraph, as well as the text below, within a few weeks of watching the Youtube video. The transition between the two bothered me; it felt too abrupt. But I asked Mr. Lukacs via email whether he’d mind my posting the story. I never heard back. I learned why this weekend: he’d passed away on Friday. The announcement said, “please consider doing something that reminds you of George in the coming days. Read a few lines of a cherished text. Marvel at a hummingbird…” So I determined to publish the story without approval. I can think of no tribute more fitting than a personal essay published on a quantum blog that’s charted my intellectual journey of the past decade.

Here’s to another giant of literature. Goodbye, Mr. Lukacs.


1. I was too boring to dress up as anyone.

2. I call him one of my crazy high-school English teachers because his wife merits the epithet, too. She called herself senile, enacted the climax of Jude the Obscure with a student’s person-shaped pencil case, and occasionally imitated a chipmunk; but damn, do I know my chiasmus from my caesura because of her.

3. That fact sounds hackneyed to me now. But I’m proud never to have entertained grand dreams of discovering a theory of everything.

4. AKA my crazy high-school calculus teacher. My high school had loads of crazy teachers, but it also had loads of excellent teachers, and the crazy ones formed a subset of the excellent ones.

June 08, 2024

Scott Aaronson Situational Awareness

My friend Leopold Aschenbrenner, who I got to know and respect on OpenAI’s now-disbanded Superalignment team before he left the company under disputed circumstances, just released “Situational Awareness,” one of the most extraordinary documents I’ve ever read. With unusual clarity, concreteness, and seriousness, and with a noticeably different style than the LessWrongers with whom he shares some key beliefs, Leopold sets out his vision of how AI is going to transform civilization over the next 5-10 years. He makes a case that, even after ChatGPT and all that followed it, the world still hasn’t come close to “pricing in” what’s about to hit it. We’re still treating this as a business and technology story like personal computing or the Internet, rather than (also) a national security story like the birth of nuclear weapons, except more so. And we’re still indexing on LLMs’ current capabilities (“fine, so they can pass physics exams, but they still can’t do original physics research”), rather than looking at the difference between now and five years ago, and then trying our best to project forward an additional five years.

Leopold makes an impassioned plea for the US to beat China and its other autocratic adversaries in the race to superintelligence, and to start by preventing frontier model weights from being stolen. He argues that the development of frontier AI models will inevitably be nationalized, once governments wake up to the implications, so we might as well start planning for that now. Parting ways from the Yudkowskyans despite their obvious points of agreement, Leopold is much less worried about superintelligence turning us all into paperclips than he is about it doing the bidding of authoritarian regimes, although he does worry about both.

Leopold foresaw the Covid lockdowns, as well as the current AI boom, before most of us did, and apparently made a lot of money as a result. I don’t know how his latest predictions will look from the standpoint of 2030. In any case, though, it’s very hard for me to imagine anyone in the US national security establishment reading Leopold’s document without crapping their pants. Is that enough to convince you to read it?

Scott Aaronson My “museums designed by blankfaces” series: London Science Museum

Update (June 8): I’ve closed comments on this post. Thanks again to everyone who wrote to offer sympathy and advice. And to those who wrote to sneer at me—may you enter a hell where every day, for all eternity, you have to travel with kids and experience precisely what I have.

Update (June 2): To make this into something positive, here is my advice for those traveling to London with kids. Forget the museums and just book one West End musical after another. This is counterintuitive, since one thinks of kids as loving museums full of interactive exhibits and unable to sit through plays. In London, however, the museums are very much not interactive, while a family-friendly musical (like Back to the Future, which we saw last night) will engage even kids who struggle to sit through five minutes of a movie. With my 7-year-old son, the issue wasn’t that he was bored, but on the contrary, that he kept jumping up and excitedly commenting on everything. He couldn’t get over that Marty McFly needed to teach his own former father to be brave.

Also, to pay for the Tube in London, you literally just tap your credit card on the card reader when you enter and exit. And if you have kids, you bring them through with you for free in the wide lane. No one will explain this. We met friends living in London who literally spent a year here before they figured it out.

Update (June 1): I’m pleased to say that we had a vastly better time in Oxford—visiting fine museums, giving talks in the CS department, touring Balliol College, and meeting Toby Ord and David Deutsch. No matter where in the world you travel, friends and colleagues make an enormous difference.

I’m grateful also to all the people who wrote in with London travel advice, and to confirm that I was not insane to notice what I noticed: the crowds are immense, the London Science Museum kind of sucks, bathrooms/lavatories are hard to find, and trash cans/bins are few and far between (apparently that’s because they were used for terrorist attacks a quarter century ago). On the positive side, just about every bathroom in every AirBnB has a rack to heat the towels (but it doesn’t have shampoo).

I was wrong, of course, to get so emotional about it. Given that this is what reliably happens, however, whenever I become absolutely certain that I’m standing face to face with a blankface, who’s making my and my kids’ lives miserable just because they can, writing about it helps me a lot to get over it. I am over it.

Now we’re back in London and at the Natural History Museum, which I’m reliably told is better than the science museum!

Three times in my life, I’ve gone to museums where I had such a horrifying experience, and was treated by museum staff with such blankfaced contempt, that the only way I could restore any feeling of cosmic justice was to use this blog to impose some cost on the museum for what it did to me. The first time was when my family was turned away from the Mark Twain House in Hartford, CT, after we’d made a long detour there. Turns out it was guided tours only (idiotic policy right there), and while the last guided tour hadn’t yet left and wasn’t full, the blankface behind the desk arbitrarily decided that we’d come too late, and that therefore we should just leave. The second time, a couple months ago, was when my son and I were kicked out of the only decent room in Washington DC’s “Planet Word” — when we were singled out as criminals despite not having passed any sign telling us not to enter — after my son had found the only exhibit in the museum that held his interest.

Today, alas, it was the London Science Museum, which was the very first thing we chose to visit on my kids’ very first visit to the UK. Every positive review of this abomination on the Internet is a lie, every British person should feel ashamed to have the museum represent him or her, and every visitor should avoid it at all costs.

For an hour, my son desperately needed to use a bathroom. But there’s apparently only one set of bathrooms in the whole museum, hidden away in the basement with no signs leading to them. I asked multiple employees. Not one of them could clearly answer the “where is the bathroom?” question — as if I’d asked them for an interdimensional platypus, or as if the English don’t speak English. Eventually I called out loudly: “WOW, OH MY GOD, THERE ARE NO BATHROOMS ON THIS ENTIRE FLOOR! WHAT KIND OF MUSEUM IS THIS? WHAT IDIOTS DESIGNED IT? HOW HORRIBLE COULD IT POSSIBLY BE?” Many employees heard; not one offered to help. May they feel shame until the day they die.

Beyond that, each exhibit was a depressing touch-screen job designed by morons, like the iPad games that otherwise fill my kids’ lives except even less educational and certainly less fun. And I haven’t even mentioned the neverending lines (sorry, “queues”). Belying the English reputation for politeness, other patrons constantly butted ahead of me and my son, so that the queue for each exhibit got longer the longer we stood. In an hour, my son got to see a grand total of two exhibits. They both sucked.

Apparently there are amazing historical artifacts elsewhere in the museum, including a Babbage Analytical Engine. Alas, we felt forced to leave before we got to see any of those.

There’s one other part of the museum that apparently doesn’t suck, called “WonderLab.” But our extra-cost WonderLab tickets were for 4pm, and the rest of the museum sucked so badly that my son and I stormed out beforehand. We instead took a long walk through Hyde Park, talking about all the plants and birds we encountered and ending up at the Diana Memorial Playground. It was an infinitely better experience than the one we’d paid for.

Speaking of which, we got scammed. Just like at Planet Word, the entrance fee — we might as well call it that, for all you’ll realistically avoid paying — is called a “suggested donation.” But then, after the blankfaces twist the knife and ruin your family’s entire vacation, you’re not entitled to a refund, because after all, you technically never bought anything … you just “donated”!

I feel like my standards for museums are rock-bottom. Just provide me and my kids a non-horrible experience. Have stuff for my kids to play with, right now, without directing me or them to go through your blankfaced processes or systems. Have chicken tenders if my kids are hungry, water fountains if they’re thirsty, benches if they’re tired, and bathrooms if they need bathrooms. And most importantly: if you see guests suffering because of the idiocies of your museum’s design, be helpful and apologetic rather than blankfaced and contemptuous.

The London Science Museum failed on each of these counts.

And yes, I know, I know: I’m the crazy one. To paraphrase a famous Londoner, the reasonable man adapts himself to blankfaced museums, while the unreasonable man persists in trying to adapt those museums to himself. Therefore all progress in making museums non-horrible depends on unreasonable men.

We have another week in London and Oxford. I hope and expect it will be better than this!

June 07, 2024

Matt von Hippel Gravity-Defying Theories

Universal gravitation was arguably Newton’s greatest discovery. Newton realized that the same laws could describe the orbits of the planets and the fall of objects on Earth, and that bodies like the Moon can be fully understood only if you take into account both the Earth’s and the Sun’s gravity. In a Newtonian world, every mass attracts every other mass in a tiny but detectable way.

Einstein, in turn, explained why. In Einstein’s general theory of relativity, gravity comes from the shape of space and time. Mass attracts mass, but energy affects gravity as well. Anything that can be measured has a gravitational effect, because the shape of space and time is nothing more than the rules by which we measure distances and times. So gravitation really is universal, and has to be universal.

…except when it isn’t.

It turns out, physicists can write down theories with some odd properties. Including theories where things are, in a certain sense, immune to gravity.

The story started with two mathematicians, Shiing-Shen Chern and Jim Simons. Chern and Simons weren’t trying to say anything in particular about physics. Instead, they cared about classifying different types of mathematical space. They found a formula that, when added up over one of these spaces, counted some interesting properties of that space. A bit more specifically, it told them about the space’s topology: rough details, like the number of holes in a donut, that stay the same even if the space is stretched or compressed. Their formula was called the Chern-Simons Form.

The physicist Albert Schwarz saw this Chern-Simons Form, and realized it could be interpreted another way. He looked at it as a formula describing a quantum field, like the electromagnetic field, describing how the field’s energy varied across space and time. He called the theory describing the field Chern-Simons Theory, and it was one of the first examples of what would come to be known as topological quantum field theories.

In a topological field theory, every question you might want to ask can be answered in a topological way. Write down the chance you observe the fields at particular strengths in particular places, and you’ll find that the answer you get only depends on the topology of the space the fields occupy. The answers are the same if the space is stretched or squished together. That means that nothing you ask depends on the details of how you measure things, that nothing depends on the detailed shape of space and time. Your theory is, in a certain sense, independent of gravity.
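For concreteness, here is the standard textbook form of the Chern-Simons action for a gauge field A (a Lie-algebra-valued 1-form) on a three-dimensional manifold M at level k; the post does not write it out. The thing to notice is that it is built only from wedge products, the exterior derivative d and a trace, with no metric anywhere, which is one way to see, at least for the classical action, why nothing in the theory depends on distances or on the shape of space and time. Quantum subtleties, such as a choice of framing, complicate this picture somewhat.

```latex
S_{\mathrm{CS}}[A] = \frac{k}{4\pi} \int_M \operatorname{tr}\!\left( A \wedge dA + \frac{2}{3}\, A \wedge A \wedge A \right)
```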

Others discovered more theories of this kind. Edward Witten found theories that at first looked like they depend on gravity, but where the gravity secretly “cancels out”, making the theory topological again. It turned out that there were many ways to “twist” string theory to get theories of this kind.

Our world is for the most part not described by a topological theory: gravity matters! (Though it can be a good approximation for describing certain materials.) These theories are most useful, though, in how they allow physicists and mathematicians to work together. Physicists don’t have a fully mathematically rigorous way of defining most of their theories, just a series of approximations and an overall picture that’s supposed to tie them together. For a topological theory, though, that overall picture has a rigorous mathematical meaning: it counts topological properties! As such, topological theories allow mathematicians to prove rigorous results about physical theories. It means they can take a theory of quantum fields or strings that has a particular property that physicists are curious about, and find a version of that property that they can study in fully rigorous mathematical detail. It’s been a boon both to mathematicians interested in topology, and to physicists who want to know more about their theories.

So while you won’t have antigravity boots any time soon, theories that defy gravity are still useful!

June 06, 2024

John Baez Agent-Based Models (Part 12)

Today I’d like to wrap up my discussion of how to implement the Game of Life in our agent-based model software called AlgebraicABMs.

Kris Brown’s software for the Game of Life is here:

• game_of_life: code and explanation of the code.

He has now carefully documented the code to help you walk through it; to see it in a beautiful format, I recommend clicking on ‘explanation of the code’.

A fair amount of the rather short program is devoted to building the grid on which the Game of Life runs, and displaying the game as it runs. Instead of talking about this in detail—for that, read Kris Brown’s documentation!—I’ll just explain some of the underlying math.

In Part 10, I explained ‘C-sets’, which we use to represent ‘combinatorial’ information about the state of the world in our agent-based models. By ‘combinatorial’ I mean things that can be described using finite sets and maps between finite sets, like:

• what is the set of people in the model?
• for each person, who is their father and mother?
• for each pair of people, are they friends?
• what are the social networks by which people interact?

and so on.

But in addition to combinatorial information, our models need to include quantitative information about the state of the world. For example, entities can have real-valued attributes, integer-valued attributes and so on:

• people have ages and incomes,
• reservoirs have water levels,

and so on. To represent all of these we use ‘attributed C-sets’.

Attributed C-sets are an important data structure available in AlgebraicJulia. They have already been used to handle various kinds of networks that crucially involve quantitative information, e.g.

• Petri nets where each species has a ‘value’ and each transition has a ‘rate constant’,

• stock-flow diagrams where each stock has a ‘value’ and each flow has a ‘flow function’.

In the Game of Life we are using attributed C-sets in a milder way. Our approach to the Game of Life lets the cells be vertices of an arbitrary graph. But suppose we want that graph to be a square grid, like this:

Yes, this is a bit unorthodox: the cells are shown as circles rather than squares, and we’re drawing edges between them to say which are neighbors of which. Green cells are live; red cells are dead.

But my point here is that to display this picture, we want the cells to have x and y coordinates! And we can treat these coordinates as ‘attributes’ of the cells.

We’re using attributes in a ‘mild’ way here because the cells’ coordinates don’t change with time—and they don’t even show up in the rules for the Game of Life, so they don’t affect how the state of the world changes with time. We’re only using them to create a picture of the state of the world. But in most agent-based models, attributes will play a more significant role. So it’s good to talk about attributes.
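To make the "mild" role of the coordinates concrete, here is a hypothetical sketch in plain Julia, without AlgebraicJulia: a square grid whose vertices carry a `coords` attribute in ℤ², next to the purely combinatorial edge data. The names `GridGraph`, `square_grid` and `life_step` are made up for illustration, and the sketch assumes Conway's usual eight-neighbor (Moore) neighborhood; Kris Brown's game_of_life code may build its grid differently.

```julia
# Hypothetical sketch: combinatorial data (vertices, edges) plus an attribute.
struct GridGraph
    nv::Int                          # vertices are 1:nv
    src::Vector{Int}                 # edge e joins src[e] ...
    tgt::Vector{Int}                 # ... and tgt[e]
    coords::Vector{Tuple{Int,Int}}   # attribute: coords[v] is (row, column)
end

# Build an n×n grid, assuming the eight-neighbor (Moore) neighborhood.
function square_grid(n::Int)
    idx(i, j) = (i - 1) * n + j                  # vertex number of cell (i, j)
    coords = [(i, j) for i in 1:n for j in 1:n]  # coords[idx(i, j)] == (i, j)
    src, tgt = Int[], Int[]
    for i in 1:n, j in 1:n
        # Record each neighbor pair once: look right and into the next row.
        for (di, dj) in ((0, 1), (1, -1), (1, 0), (1, 1))
            i2, j2 = i + di, j + dj
            if 1 <= i2 <= n && 1 <= j2 <= n
                push!(src, idx(i, j))
                push!(tgt, idx(i2, j2))
            end
        end
    end
    return GridGraph(n * n, src, tgt, coords)
end

# One Game of Life step. It reads only src and tgt, never coords:
# the attribute matters for drawing the picture, not for the dynamics.
function life_step(g::GridGraph, alive::AbstractVector{Bool})
    counts = zeros(Int, g.nv)
    for (s, t) in zip(g.src, g.tgt)
        counts[s] += alive[t]
        counts[t] += alive[s]
    end
    return [alive[v] ? (counts[v] in (2, 3)) : (counts[v] == 3) for v in 1:g.nv]
end
```

For example, a horizontal "blinker" of three live cells in the middle row of `square_grid(5)` turns vertical after one call to `life_step`, and the coordinates are never consulted along the way.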

Here’s how we get cells to have coordinates in AlgebraicJulia. First we do this:

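@present SchLifeCoords <: SchLifeGraph begin
  Coords::AttrType          # attribute type, later fixed to pairs of integers
  coords::Attr(V, Coords)   # attribute: each vertex gets coordinates
end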

Here we are taking the schema SchLifeGraph, which I explained in Part 10 and which looks like this:

and we’re making this schema larger by giving the object V (for ‘vertex’) an attribute called coords:

Note that Coords looks like just another object in our schema, and it looks like our schema has another morphism

coords: V → Coords

However, Coords is not just any old object in our schema: it’s an ‘attribute type’. And coords: V → Coords is not just any old morphism: it’s an ‘attribute’. And now I need to tell you what these things mean!

Simply put, while an instance of our schema will assign arbitrary finite sets to V and E (since a graph can have an arbitrary finite set of vertices and edges), Coords will be forced to be a particular set, which happens not to be finite, namely the set of pairs of integers, ℤ².

In the code, this happens here:

@acset_type LifeStateCoords(SchLifeCoords){Tuple{Int,Int}} <: 

You can see that the type ‘pair of integers’ is getting invoked. There’s also some more mysterious stuff going on. But instead of explaining that stuff, let me say more about the math of attributed C-sets. What are they, really?

Attributed C-sets

Attributed C-sets were introduced here:

• Evan Patterson, Owen Lynch and James Fairbanks, Categorical data structures for technical computing, Compositionality 4, 5 (2022).

and further explained here:

• Owen Lynch, The categorical scoop on attributed C-sets, AlgebraicJulia blog, 5 October 2020.

The first paper gives two ways of thinking about attributed C-sets, and Owen’s blog post gives a third, more sophisticated way. I will go in the other direction and give a less sophisticated way.

I defined schemas and their instances in Part 10; now let me generalize all that stuff.

Remember, I said that a schema consists of:

1) a finite set of objects,

2) a finite set of morphisms, where each morphism goes from some object to some other object: e.g. if x and y are objects in our schema, we can have a morphism f: x → y, and

3) a finite set of equations between formal composites of morphisms in our schema: e.g. if we have morphisms f: x → y, g: y → z and h: x → z in our schema, we can have an equation h = g ∘ f.

Now we will add on an extra layer of structure, namely:

4) a subset of objects called attribute types, and

5) a subset of morphisms f: x → y called attributes where y is an attribute type and x is not, and

6) a set K(x) for each attribute type x.

Mathematically K(x) is often an infinite set, like the integers ℤ or real numbers ℝ. But in AlgebraicJulia, K(x) can be any data type that has elements, e.g. Int (for integers) or Float32 (for single-precision floating-point numbers).

People still call this more elaborate thing a schema, though as a mathematician that makes me nervous.

An instance of this more elaborate kind of schema consists of:

1) a set F(x) for each object x in the schema, which is finite except possibly when x is an attribute type, and

2) a function F(f): F(x) → F(y) for each morphism in the schema, such that

3) whenever composites of morphisms in the schema obey an equation, their corresponding functions obey the corresponding equation, e.g. if h = g ∘ f in the schema then F(h) = F(g) ∘ F(f), and

4) F(x) = K(x) when x is an attribute type.
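The definition above can be made concrete with a toy check in plain Julia. The schema here is hypothetical, chosen only so that conditions 1) to 4) all have something to check: objects V and E, morphisms src, tgt: E → V and inv: E → E with equations inv ∘ inv = 1_E and src ∘ inv = tgt, an attribute type Coords with K(Coords) = ℤ², modeled by Julia's `Tuple{Int,Int}`, and an attribute coords: V → Coords. The names `ToyInstance`, `is_function` and `is_instance` are made up; this is not AlgebraicJulia's implementation, and not necessarily the shape of SchLifeGraph.

```julia
# Hypothetical toy instance: finite sets are encoded as 1:n, functions as vectors.
struct ToyInstance
    nV::Int                          # F(V) = 1:nV
    nE::Int                          # F(E) = 1:nE
    src::Vector{Int}                 # F(src): F(E) → F(V)
    tgt::Vector{Int}                 # F(tgt): F(E) → F(V)
    inv::Vector{Int}                 # F(inv): F(E) → F(E)
    coords::Vector{Tuple{Int,Int}}   # F(coords): F(V) → K(Coords)
end

# Condition 2: a vector f encodes a function 1:n → 1:m.
is_function(f, n, m) = length(f) == n && all(x -> 1 <= x <= m, f)

function is_instance(F::ToyInstance)
    # Condition 2 for the ordinary morphisms.
    ok = is_function(F.src, F.nE, F.nV) && is_function(F.tgt, F.nE, F.nV) &&
         is_function(F.inv, F.nE, F.nE)
    ok || return false
    # Condition 3: the functions obey the schema's equations.
    ok = all(e -> F.inv[F.inv[e]] == e, 1:F.nE) &&
         all(e -> F.src[F.inv[e]] == F.tgt[e], 1:F.nE)
    # Condition 2 for the attribute: one coordinate pair per vertex.
    # Condition 4 (F(Coords) = K(Coords)) is enforced by the element type.
    return ok && length(F.coords) == F.nV
end
```

For instance, `ToyInstance(2, 2, [1, 2], [2, 1], [2, 1], [(0, 0), (0, 1)])` passes the check, while replacing `inv` by the identity `[1, 2]` violates src ∘ inv = tgt and fails it.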

If our schema presents some category C, we also call an instance of it an attributed C-set.

But I hope you understand the key point. This setup gives us a way to ‘nail down’ the set F(x) when x is an attribute type, forcing it to equal the same set K(x) for every instance F. In the Game of Life, we choose

K(Coords) = ℤ²

This forces

F(Coords) = ℤ²

for every instance F. This in turn forces the coordinates of every vertex v ∈ F(V) to be a pair of integers for every instance F—that is, for every state of the world in the Game of Life.

This is all I will say about our implementation of the Game of Life. It’s rather atypical as agent-based models go, so while it illustrates many aspects of our methodology, for others we’ll need to turn to some other models. Xiaoyan Li has been working hard on some models of pertussis (whooping cough), so I should talk about those.

John Preskill Watch out for geese! My summer in Waterloo

It’s the beginning of another summer, and I’m looking forward to outdoor barbecues, swimming in lakes and pools, and sharing my home-made ice cream with friends and family. One thing that I won’t encounter this summer, but I did last year, is a Canadian goose. In summer 2023, I ventured north from the University of Maryland – College Park to Waterloo, Canada, for a position at the University of Waterloo. The university houses the Institute for Quantum Computing (IQC), and the Perimeter Institute (PI) for Theoretical Physics is nearby. I spent my summer at these two institutions because I was accepted into the IQC’s Undergraduate School on Experimental Quantum Information Processing (USEQIP) and received an Undergraduate Research Award. I’ll detail my experiences in the program and the fun social activities I participated in along the way.

For my first two weeks in Waterloo, I participated in USEQIP. This program is an intense boot camp in quantum hardware. I learned about many quantum-computing platforms, including trapped ions, superconducting circuits, and nuclear magnetic resonance systems. There were interactive lab sessions where I built a low-temperature thermometer, assembled a quantum key distribution setup, and designed an experiment demonstrating the quantum Zeno effect with a nuclear magnetic resonance system. We also toured the IQC’s numerous research labs and its nanofabrication clean room. I learned a lot from these two weeks, and I settled into life in goose-filled Waterloo, trying to avoid goose poop on my daily walks around campus.

I pour liquid nitrogen into a low-temperature container.

Once USEQIP ended, I began the work for my Undergraduate Research Award, joining Dr. Raymond Laflamme’s group. My job was to read Dr. Laflamme’s soon-to-be-published textbook about quantum hardware, which he co-wrote with graduate student Shayan Majidy and Dr. Chris Wilson. I read through the sections, checking for clarity and for errors in the equations. I also worked through the textbook’s exercises to ensure they were appropriate for the book. Additionally, I contributed figures to the book.

The most challenging part of this work was completing the exercises. I would become frustrated with the complex problems, sometimes toiling over a single problem for more than three hours. My frustration grew when I asked Shayan for help and found that what had cost me hours of bitter labor was, to him, a simple trick I had not seen. I had to remind myself that I had been asked to test-drive this textbook because I am its target audience. I offered an authentic undergraduate perspective on the material that would be valuable to the book’s development. Despite the challenges, I successfully completed my book review, and Shayan sent the textbook for publication at the beginning of August.

Afterward, I moved on to another project: the quantum thermodynamics research that I conduct with Dr. Nicole Yunger Halpern. My work with Dr. Yunger Halpern concerns systems with noncommuting charges. I run numerical calculations on these systems to understand how they thermalize internally. I enjoyed working at both the IQC and the Perimeter Institute, with their wonderful office views and free coffee.

Dr. Laflamme and I at the Perimeter Institute on my last day in Waterloo.

Midway through the summer, Dr. Laflamme’s former and current students celebrated his 60th birthday with a birthday conference. As one of his newest students, I had a wonderful time meeting many of his past students who’ve had exciting careers following their graduation from the group. During the birthday conference, we had six hours of talks daily, but these were not traditional research talks. The talks were on any topic the speaker wanted to share with the audience. I learned how a senior data scientist at TD Bank uses machine learning, heard about a museum exhibit organized by the University of Waterloo called Quantum: The Exhibition, and learned about photonic quantum science at the Raman Research Institute. For the socializing portion, we played street hockey and enjoyed delicious sushi, sandwiches, and pastries. By coincidence, Dr. Laflamme’s birthday and mine are one day apart!

Outside of my work, I spent almost every weekend exploring Ontario. I beheld the majesty of Niagara Falls for the first time; I visited Canada’s wine country, Niagara-on-the-Lake; I met with friends and family in Toronto; I stargazed with the hope of seeing the aurora borealis (unfortunately, the Northern Lights did not appear). I also joined a women’s ultimate frisbee team, PPF (sorry, we can’t tell you what it stands for), during my stay in Canada. I had a blast getting to play while sharpening my skills for the collegiate ultimate frisbee season. Finally, my summer would not have been great without the friendships that I formed with my fellow USEQIP undergraduates. We shared more than just meals; we shared our hopes and dreams, and I am so lucky to have met such inspiring people.

I spent my first weekend in Canada at Niagara Falls.

Though my summer in Waterloo has come to an end now, I’ll never forget the incredible experiences I had. 

Matt Strassler Today: Panel Discussion at the Boston Public Library

This evening, Thursday June 6th at 6:30 pm, I’ll be joined at the Boston Public Library by Sarah Demers, professor at Yale and member of the ATLAS experiment, and Katrina Miller, Ph.D. in neutrino physics and writer for, among other publications, the New York Times. We’ll serve on a panel entitled “Particle Physics: Where the Universe and Humanity Collide”, talking about the future of particle physics and about how we got into physics in the first place. This event, intended for the general public, is part of the international scientific meeting that I’m attending this week, the 12th annual Large Hadron Collider Physics conference. I hope to see some of you there!

June 05, 2024

John Baez Agent-Based Models (Part 10)

We’ve been hard at work here in Edinburgh. Kris Brown has created Julia code to implement the ‘stochastic C-set rewriting systems’ I described last time. I want to start explaining this code, along with examples of how we use it.

I’ll start with an easy example of how we can use it. Kris decided to implement the famous cellular automaton called the Game of Life, so I’ll explain that. I won’t get very far today because there are a lot of prerequisites I want to cover, and I don’t want to rush through them. But let’s get started!

Choosing the Game of Life as an example may seem weird, because I’ve been talking about stochastic C-set rewriting systems, and the Game of Life doesn’t look stochastic. There’s no randomness: the state of each cell gets updated once each time step, deterministically, according to the states of its neighbors.

But in fact, determinism is a special case of randomness! It’s just randomness where every event happens with probability 0 or 1. A stochastic C-set rewriting system lets us specify that an event happens with probability 1 at a fixed time in the future as soon as the conditions become right. Thus, we can fit the Game of Life into this framework. And once we write the code to do this, it’s easy to tweak the code slightly and get a truly stochastic variant of the Game of Life which incorporates randomness.
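To illustrate the point that determinism is just the probability-0-or-1 case, here is a minimal Python sketch. It is not Kris Brown's Julia code and does not use AlgebraicABMs or its timers; it simply assumes a hypothetical parameter `p`, the probability that a change the usual Life rule calls for actually happens in a given step. With `p = 1.0` this is the ordinary deterministic Game of Life; with `p < 1.0` it is one possible stochastic variant.

```python
import random
from collections import Counter

def neighbors(cell):
    """The 8 cells surrounding `cell` on an unbounded square grid."""
    x, y = cell
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

def step(live, p=1.0, rng=random):
    """One time step of Life in which each change the usual rule calls for
    actually happens only with probability p (p = 1.0 gives Conway's game)."""
    counts = Counter(n for cell in live for n in neighbors(cell))
    new_live = set()
    for cell in set(live) | set(counts):
        alive = cell in live
        n = counts[cell]  # Counter returns 0 for cells with no live neighbors
        # Conway's rule: a live cell survives with 2 or 3 live neighbors,
        # a dead cell comes alive with exactly 3.
        should_live = n == 3 or (alive and n == 2)
        # If the rule calls for a change, let it fire with probability p.
        if should_live != alive and rng.random() >= p:
            should_live = alive
        if should_live:
            new_live.add(cell)
    return new_live

blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(step(blinker)))  # [(1, 0), (1, 1), (1, 2)]
```

A per-cell coin flip is only one way to add randomness; in a stochastic C-set rewriting system the randomness would instead enter through the timers attached to the rewrite rules.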

Let’s look at the program Kris wrote, called game_of_life. It’s in the language called Julia. I’ll start at the beginning.

# # Game of Life
# First we want to load our package with `using`

using AlgebraicABMs, Catlab, AlgebraicRewriting

This calls up AlgebraicABMs, which is the core piece of code used to implement stochastic C-set rewriting models. I need to explain this! But I wanted to start with something easier.

It also calls up Catlab, which is a framework for doing applied and computational category theory in Julia. This is the foundation of everything we're doing.

It also calls up AlgebraicRewriting, which is a program developed by Kris Brown and others that implements C-set rewriting in Julia.

# # Schema 
# We define a network of cells that can be alive or dead (alive cells are in 
# the image of the `live` function, which picks out a subset of the vertices.)

@present SchLifeGraph <: SchSymmetricGraph begin 

This code is defining a schema called SchLifeGraph. Last time I spoke of C-sets, which are functors from a category C to the category of sets. To describe a category in Catlab we use a ‘schema’. A schema consists of

1) a finite set of objects,

2) a finite set of morphisms, where each morphism goes from some object to some object (possibly the same one): e.g. if x and y are objects in our schema, we can have a morphism f: x → y, and

3) a finite set of equations between formal composites of morphisms in our schema: e.g. if we have morphisms f: x → y, g: y → z and h: x → z in our schema, we can have an equation h = g ∘ f.

What we care about, ultimately, are the ‘instances’ of a schema. An instance F of a schema consists of:

1) a finite set F(x) for each object x in the schema, and

2) a function F(f): F(x) → F(y) for each morphism f: x → y in the schema, such that

3) whenever composites of morphisms in the schema obey an equation, their corresponding functions obey the corresponding equation, e.g. if h = g ∘ f in the schema then F(h) = F(g) ∘ F(f).

(Mathematically, the objects and morphisms of a schema are sometimes called generators, while the equations are sometimes called relations, and we say that a schema is a way of presenting a category using generators and relations. If a schema presents some category C, an instance of this schema is a functor F: C → Set. Thus, we also call an instance of this schema a C-set. Much of what we do with schemas takes advantage of this more mathematical point of view.)

The command @present SchLifeGraph <: SchSymmetricGraph says we're going to create a schema called SchLifeGraph by taking a previously defined schema called SchSymmetricGraph and throwing in more objects, morphisms and/or equations.

The schema SchSymmetricGraph was already defined in Catlab. It's the schema whose instances are symmetric graphs: roughly, directed graphs where you can ‘turn around’ any edge going from a vertex v to a vertex w and get an edge from w to v. The extra stuff in the schema SchLifeGraph will pick out which vertices are ‘live’. And this is exactly what we want in the Game of Life, if we treat the square ‘cells’ in this game as vertices, and treat neighboring cells as vertices connected by edges. In fact we will implement a more general version of the Game of Life which makes sense for any graph! Then we will implement a square grid and run the game on that.
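Since the rule only asks how many neighbors of a vertex are live, it makes sense on any graph. The Python sketch below is an illustration under our own assumptions, not the AlgebraicABMs implementation: a graph is a dictionary from each vertex to its set of neighbors (built here from source and target lists, as in a symmetric graph), and the hypothetical helper `king_grid` builds the usual square grid in which each cell touches its 8 surrounding cells.

```python
def adjacency_from_edges(vertices, src, tgt):
    """Neighbors of v are the targets of the edges whose source is v."""
    adj = {v: set() for v in vertices}
    for s, t in zip(src, tgt):
        adj[s].add(t)
    return adj

def life_step(adjacency, live):
    """Apply Conway's thresholds on an arbitrary graph: a vertex is live next
    step if it has exactly 3 live neighbors, or is live now and has 2."""
    new_live = set()
    for v, nbrs in adjacency.items():
        n = sum(1 for w in nbrs if w in live)
        if n == 3 or (v in live and n == 2):
            new_live.add(v)
    return new_live

def king_grid(width, height):
    """Finite square grid; each cell is adjacent to its (up to) 8 surrounding cells."""
    adjacency = {}
    for x in range(width):
        for y in range(height):
            adjacency[(x, y)] = {
                (x + dx, y + dy)
                for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                if (dx, dy) != (0, 0)
                and 0 <= x + dx < width and 0 <= y + dy < height
            }
    return adjacency

grid = king_grid(5, 5)
print(sorted(life_step(grid, {(1, 2), (2, 2), (3, 2)})))  # [(2, 1), (2, 2), (2, 3)]
```

On a triangle graph with every vertex live, each vertex has 2 live neighbors, so the whole triangle survives forever: a "still life" that has no square-grid analogue of that shape.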

More precisely, SchSymmetricGraph is the schema with two objects E and V, two morphisms src, tgt: E → V, and a morphism inv: E → E obeying

src ∘ inv = tgt
tgt ∘ inv = src
inv ∘ inv = 1_E
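To see what these three equations demand of an instance, here is a small Python check. It assumes a representation of our own choosing (not Catlab's data structure): edges are numbered 0, 1, 2, …, and the lists `src`, `tgt` and `inv` give the value of each function on each edge. A single undirected connection between vertices 0 and 1 then becomes two directed edges, each the ‘turn-around’ of the other.

```python
def is_symmetric_graph(src, tgt, inv):
    """Check src ∘ inv = tgt, tgt ∘ inv = src and inv ∘ inv = 1_E, edge by edge."""
    edges = range(len(src))
    return (len(tgt) == len(inv) == len(src)
            and all(src[inv[e]] == tgt[e] for e in edges)
            and all(tgt[inv[e]] == src[e] for e in edges)
            and all(inv[inv[e]] == e for e in edges))

# Edge 0 goes 0 -> 1, edge 1 goes 1 -> 0, and inv swaps them.
print(is_symmetric_graph(src=[0, 1], tgt=[1, 0], inv=[1, 0]))  # True
```

Note that the equations allow a self-loop to be its own turn-around, while a non-loop edge can never be.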

AlgebraicJulia can draw schemas, and if you ask it to draw SchSymmetricGraph it will show you this:

This picture doesn’t show the equations.

An instance of the schema SchSymmetricGraph is

• a set of edges,
• a set of vertices,
• two maps from the set of edges to the set of vertices (specifying the source and target of each edge),
• a map that ‘turns around’ each edge, switching its source and target, such that
• turning around an edge twice gives you the original edge again.

This is a symmetric graph!

We want to take the schema SchSymmetricGraph and throw in a new object called Life and a new morphism live: Life → V. We do this with the lines

  Life::Ob
  live::Hom(Life, V)
end

Now we’ve defined our schema SchLifeGraph. If you ask AlgebraicJulia to draw it, you’ll see this:

I hope you can see what an instance of this schema is! It’s a symmetric graph together with a set and a function from this set to the set of vertices of our graph. This picks out which vertices are ‘live’. And this is exactly what we want in the Game of Life, if what we usually call ‘cells’ are treated as vertices, and neighboring cells are connected by edges.

The schema SchLifeGraph presents some category C. A state of the world in the Game of Life is then a C-set, i.e. an instance of the schema SchLifeGraph. This is just the first step in describing a stochastic C-set rewriting system for the Game of Life. As explained in Part 9, next we need to specify

• the rewrite rules which say how the state of the world changes with time,


• the ‘timers’ which say when it changes.

I’ll do that next time!

Matt Strassler From the LHCP Conference, a Step Forward

At a conference like LHCP12, covering all of Large Hadron Collider [LHC] physics and beyond, there’s far too much to summarize: hundreds of talks, with thousands of incremental experimental results and theoretical insights. So instead, today I’ll draw attention to one of the longest-running puzzles of the LHC era, and to a significant step that’s been made toward resolving it. The puzzle in question involves a rare decay of bottom quarks.

[All figures in this post are taken from LHCP12 talks by Zhangqier Wang and Eluned Smith.]

The Decay of a Bottom Quark to a Lepton-Antilepton Pair

In the Standard Model of particle physics, bottom quarks most often decay to charm quarks. They do so via a “virtual W boson” — a general disturbance in the W field — which subsequently is converted either

  • to a quark-antiquark pair, or
  • to a “lepton” (an electron, muon or tau) and an anti-neutrino.

[See for instance Figure 1 of this post.]

But rarely, a bottom quark can decay to a strange quark and to a lepton-antilepton pair (an electron and a positron, or a muon and an anti-muon, or a tau and an anti-tau.) The example of a muon-antimuon pair is shown below.

Figure 1: The general form of a rare process in which a bottom quark b decays to a strange quark s, a muon μ⁻ and an anti-muon μ⁺.

Within the Standard Model such a process can occur through quantum physics, involving subtle interactions of the known elementary fields. It is very rare; less than one in a million bottom quarks decays this way. But it can be measured in detail.

Figure 2: In the Standard Model, the decay shown in Fig. 1 occurs through a quantum effect, involving the up, charm or top quark field (u,c,t), the W field, and the electromagnetic (γ) or Z fields. The rate is small but measurable. The details depend on the invariant mass-squared, called q² here, of the muon and anti-muon.

Because it is rare, the rate for this process is easily altered by new particles and fields that aren’t included in the Standard Model, making it an interesting target for theorists to explore. And since the process is relatively easy to measure, it has been a key target for experimentalists at the LHC experiments, especially LHCb and CMS, and to some degree ATLAS.

The Discrepancy at the LHCb Experiment

For a decade, theorists’ predictions for this decay have been in conflict with the measurements made by the LHCb experiment. This is quantified in the plot below, which shows a certain aspect of the process as a function of the invariant mass-squared (q²) of the muon/anti-muon pair. (More precisely, as shown at the bottom of Figure 1 of this post from 2013, the measurement involves a B meson decaying to a K meson plus a muon/anti-muon pair.)

Figure 3: Older data from the LHCb experiment (black crosses), showing a certain measure of the process in the previous figure as a function of q². It shows significant disagreement with a theoretical prediction (orange bars). Light gray bars are regions where predictions are not possible and are excluded from the comparison.
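For readers who want the formula behind q²: for a muon with energy E₁ and 3-momentum p₁, and an anti-muon with E₂ and p₂, in units where c = 1, q² = (E₁ + E₂)² − |p₁ + p₂|². A minimal Python sketch, with made-up four-momenta in GeV chosen only to make the arithmetic easy (these are not LHCb or CMS data):

```python
def invariant_mass_squared(p1, p2):
    """q^2 = (E1 + E2)^2 - |p1 + p2|^2 for four-momenta (E, px, py, pz), with c = 1."""
    energy = p1[0] + p2[0]
    momentum = [p1[i] + p2[i] for i in (1, 2, 3)]
    return energy ** 2 - sum(c ** 2 for c in momentum)

# Two illustrative massless four-momenta (GeV) at right angles to each other.
q2 = invariant_mass_squared((3, 3, 0, 0), (4, 0, 4, 0))
print(q2)  # 24, in GeV^2
```

Because q² is built from a Lorentz-invariant combination, it comes out the same in every reference frame, which is why it is a natural variable for plotting the measurements.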

What are we to make of this disagreement? Well, as always in such situations, there are three possibilities:

  • The experimental measurements have a mistake somewhere,
  • The theoretical calculations behind the prediction have a mistake somewhere, or
  • The Standard Model of particle physics is missing something

The third case would be of enormous importance in particle physics: a discovery of something fundamentally new, and a cause for celebration. The first two options would be far less exciting, and we must rule both of them out convincingly before celebrating.

If there is an error in the measurements or calculations, it is unlikely to be something simple. The people involved are experienced professionals, and their work has by now been checked by many other experts. Still, subtle mistakes — an underestimate of a complex quantum effect, or a feature of the experimental detector that hasn’t been properly modeled — do happen, and are more common than true discoveries.

The Contribution of the CMS Experiment

Importantly, we can now rule out the first possibility: there’s no mistake in the LHCb measurement. The CMS experiment has now repeated the measurement, with much improved precision compared to their previous efforts. As shown in Fig. 4, the LHCb and CMS measurements match. (CMS and LHCb are so different in their design that there’s no reasonable possibility that they have correlated detector issues.)

Figure 4: New data from CMS (black) compared to LHCb’s data (orange) and CMS’s older, less precise data (maroon). (The right-hand panel is the update of Fig. 3.) Agreement of the new higher-precision data from CMS with that of LHCb is now clear and compelling.

Since the experiments agree, focus now moves squarely to the theorists. Are their predictions correct? We have at least two sets of predictions; they appear as blue and orange bars in Fig. 5, which shows they agree with each other but disagree, in the center and right panels, with CMS data (and therefore, from Fig. 4, with LHCb data.)

Figure 5: The new data from CMS (black) compared to two theoretical predictions in blue and orange. The two theoretical predictions agree, but disagree with the CMS data, which (as seen in Fig. 4) agrees with LHCb.

The two theoretical calculations agree, but they are based on similar assumptions. Perhaps those assumptions are flawed?

There are certainly things to worry about. Anything involving the strong nuclear force, when it acts at distances comparable to the size of a proton, has to be subjected to heavy scrutiny. Far too often, discrepancies between theory and experiment have dissolved when potential theoretical uncertainties from the strong nuclear force were reconsidered. (For a recent example, see this one.) We will have to let the theory experts hash this out… which could take some time. I would not plan to order champagne any time soon. Nevertheless, this bears watching over the next few years.

Tommaso Dorigo Acknowledging Giorgio's Mentoring Superpowers

Yesterday I gladly attended a symposium in honor of Giorgio Bellettini, who just turned 90. The Italian physicist, who had a very big impact on particle physics over his long and illustrious career, is still very active; for example, he asks all the hard questions at the conferences he attends, as he always has. The symposium included recollections of Giorgio's career and achievements by colleagues who collaborated with him and/or shared part of his path, among them talks by Samuel Ting, Paul Grannis, Michelangelo Mangano, Hans Grasmann, and Mario Greco.
I was also allowed to give a short recollection of a couple of episodes that underline Giorgio's exceptional disposition toward students. Here is a quick-and-dirty English translation of my speech (it was in Italian).


June 04, 2024

n-Category Café 3d Rotations and the 7d Cross Product (Part 2)

On Mathstodon, Paul Schwahn raised a fascinating question connected to the octonions. Can we explicitly describe an irreducible representation of SO(3) on 7d space that preserves the 7d cross product?

I explained this question here:

This led to an intense conversation involving Layra Idarani, Greg Egan, and Paul Schwahn himself. The result was a shocking new formula for the 7d cross product in terms of the 3d cross product.

Let me summarize.

There are two equivalent ways to say what’s been done:

Theorem 1. We can explicitly describe an SO(3) subgroup of G₂ such that the 7d irreducible representation of G₂ remains irreducible when restricted to this subgroup.

Theorem 2. We can explicitly describe an irreducible representation of SO(3) on the imaginary octonions that preserves their dot product and cross product.

These are equivalent thanks to several well-known facts. The group of automorphisms of the octonions is G₂. Its action on the space of imaginary octonions, Im(𝕆), is the unique 7d irreducible representation of G₂. This action preserves the usual dot product and cross product of imaginary octonions, given by

$$ v \cdot w = -\frac{1}{2}(vw + wv), \qquad v \times w = \frac{1}{2}(vw - wv) $$

Conversely, any linear transformation of Im(𝕆) that preserves the cross product also preserves the dot product, and all such transformations come from the action of G₂.

Either way we state the theorem, the only novelty — if any — is that we now have an explicit description. The existence seems to go back to old work by Dynkin:

  • E. B. Dynkin, Semisimple subalgebras of semisimple Lie algebras, American Mathematical Society Translations, Series 2, Volume 6, 1957.

In fact we seem to have two explicit descriptions. Unfortunately it takes some serious calculation to prove that either of them actually works. I think a truly conceptual proof still awaits us, though the second description points a way forward.

Our first description will start by building the unique 7-dimensional irreducible representation of SO(3) in a familiar way. Then we will equip it with an isomorphism to Im(𝕆), and prove that the resulting action of SO(3) on Im(𝕆) preserves the dot product and cross product. Layra Idarani proved this using fairly brutal calculations, which Greg Egan checked using Mathematica.

We start with the 3-dimensional inner product space V with orthonormal basis x, y, z. Then we let W be the space of harmonic homogeneous degree-3 polynomials in x, y, and z. This is 7-dimensional, since it has a basis

$$ x^3 - 3xy^2, \quad y^3 - 3yx^2, \quad y^3 - 3yz^2, \quad z^3 - 3zy^2, \quad z^3 - 3zx^2, \quad x^3 - 3xz^2, \quad xyz $$
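Both claims about this list (each polynomial is harmonic, i.e. killed by the Laplacian ∂²/∂x² + ∂²/∂y² + ∂²/∂z², and the seven are linearly independent) can be checked mechanically. Here is a small pure-Python sketch; storing polynomials as dictionaries from exponent triples (a, b, c), meaning xᵃyᵇzᶜ, to rational coefficients is our own choice of representation.

```python
from fractions import Fraction

def poly(*terms):
    """Build {(a, b, c): coeff} from (coeff, (a, b, c)) pairs, meaning coeff * x^a y^b z^c."""
    p = {}
    for coeff, exps in terms:
        p[exps] = p.get(exps, 0) + Fraction(coeff)
    return p

def laplacian(p):
    """Apply d²/dx² + d²/dy² + d²/dz², dropping zero coefficients."""
    out = {}
    for exps, coeff in p.items():
        for i in range(3):
            if exps[i] >= 2:
                new = list(exps)
                new[i] -= 2
                key = tuple(new)
                out[key] = out.get(key, 0) + coeff * exps[i] * (exps[i] - 1)
    return {k: v for k, v in out.items() if v != 0}

def rank(rows):
    """Rank of a list of rational row vectors, by Gaussian elimination."""
    rows = [list(r) for r in rows]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

basis = [
    poly((1, (3, 0, 0)), (-3, (1, 2, 0))),  # x^3 - 3xy^2
    poly((1, (0, 3, 0)), (-3, (2, 1, 0))),  # y^3 - 3yx^2
    poly((1, (0, 3, 0)), (-3, (0, 1, 2))),  # y^3 - 3yz^2
    poly((1, (0, 0, 3)), (-3, (0, 2, 1))),  # z^3 - 3zy^2
    poly((1, (0, 0, 3)), (-3, (2, 0, 1))),  # z^3 - 3zx^2
    poly((1, (3, 0, 0)), (-3, (1, 0, 2))),  # x^3 - 3xz^2
    poly((1, (1, 1, 1))),                    # xyz
]

monomials = sorted({m for p in basis for m in p})
print(all(laplacian(p) == {} for p in basis))                  # True: all harmonic
print(rank([[p.get(m, 0) for m in monomials] for p in basis]))  # 7: linearly independent
```

Since harmonic homogeneous cubics in three variables form a space of dimension 2·3 + 1 = 7, seven independent harmonic cubics are indeed a basis.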

Since V has an inner product we get an isomorphism V ≅ V*, so we can also think of these polynomials as functions on V. SO(3) acts on functions on V, preserving the conditions of being harmonic and homogeneous of degree 3, so it acts on W. This is well known to give the unique 7-dimensional irreducible representation of SO(3).

The next step is to choose a vector space isomorphism between W and Im(𝕆). For this, we use a well-known orthonormal basis of the imaginary octonions:


This picture shows the Fano plane, with 7 points and 7 lines (one of which is drawn as a circle). Each line contains 3 points, and the arrows indicate a cyclic ordering of these 3 points. Each point corresponds to a basis element eᵢ of the imaginary octonions. The cross product obeys

$$ e_i \times e_j = e_k $$

whenever i, j, k are a cyclically ordered triple of points on a line. For example, e₁ × e₂ = e₄ and e₅ × e₆ = e₁, but e₆ × e₅ = −e₁ because the cross product is anticommutative.
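The Fano-plane rule is easy to turn into code. This Python sketch assumes the seven oriented lines are the triples (i, i+1, i+3) mod 7, i.e. (1,2,4), (2,3,5), (3,4,6), (4,5,7), (5,6,1), (6,7,2), (7,1,3). That matches the examples e₁ × e₂ = e₄ and e₅ × e₆ = e₁ in the text, but since the picture is not reproduced here, treat the full list as our assumption. Vectors are lists of 7 coefficients, with index 0 standing for e₁.

```python
# Oriented lines of the Fano plane, assumed to be (i, i+1, i+3) mod 7 with labels 1..7.
LINES = [tuple((i + k - 1) % 7 + 1 for k in (0, 1, 3)) for i in range(1, 8)]

# TABLE[(i, j)] = (sign, k) means e_i x e_j = sign * e_k.
TABLE = {}
for a, b, c in LINES:
    for i, j, k in ((a, b, c), (b, c, a), (c, a, b)):  # the cyclic orderings
        TABLE[(i, j)] = (1, k)
        TABLE[(j, i)] = (-1, k)  # anticommutativity

def cross7(u, v):
    """7d cross product of coefficient lists u, v (index 0 <-> e_1)."""
    out = [0] * 7
    for i in range(1, 8):
        for j in range(1, 8):
            if i != j:  # e_i x e_i = 0
                sign, k = TABLE[(i, j)]
                out[k - 1] += sign * u[i - 1] * v[j - 1]
    return out

def dot7(u, v):
    return sum(a * b for a, b in zip(u, v))

def e(i):
    """Basis vector e_i as a coefficient list."""
    return [1 if k == i - 1 else 0 for k in range(7)]

print(cross7(e(1), e(2)) == e(4), cross7(e(5), e(6)) == e(1))  # True True
```

Every pair of distinct points lies on exactly one line, so the table covers all 42 ordered pairs. One can then spot-check the identity (a × b) × a = (a · a) b − (a · b) a on random integer vectors, which holds because these lines define the octonions.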

Layra Idarani chose this isomorphism between Im(𝕆) and W:

$$ e_1 \mapsto \sqrt{\tfrac{3}{5}}\,(2x^3 - 3xy^2 - 3xz^2), \quad e_2 \mapsto \sqrt{\tfrac{3}{5}}\,(2z^3 - 3x^2 z - 3y^2 z), \quad e_4 \mapsto \sqrt{\tfrac{3}{5}}\,(2y^3 - 3yz^2 - 3x^2 y) $$

$$ e_3 \mapsto 3xz^2 - 3xy^2, \quad e_5 \mapsto 3yx^2 - 3yz^2, \quad e_6 \mapsto 3zy^2 - 3zx^2 $$

$$ e_7 \mapsto 6xyz $$
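A quick sanity check on this choice, which is ours rather than part of the original argument: since an orthonormal basis of Im(𝕆) is sent to these seven polynomials, they should be harmonic, pairwise orthogonal, and all of the same length. Which inner product on W is intended is not spelled out above; this sketch assumes the SO(3)-invariant ‘Fischer’ inner product, ⟨xᵃyᵇzᶜ, xᵃ′yᵇ′zᶜ′⟩ = a! b! c! when the exponents agree and 0 otherwise. To stay in exact arithmetic it stores the squared prefactor 3/5 instead of √(3/5).

```python
from fractions import Fraction
from math import factorial

def laplacian(p):
    """Laplacian of {(a, b, c): coeff}, meaning sum of coeff * x^a y^b z^c."""
    out = {}
    for (a, b, c), coeff in p.items():
        for exps, n in (((a - 2, b, c), a), ((a, b - 2, c), b), ((a, b, c - 2), c)):
            if n >= 2:
                out[exps] = out.get(exps, 0) + coeff * n * (n - 1)
    return {k: v for k, v in out.items() if v != 0}

def fischer(p, q):
    """Fischer inner product: distinct monomials orthogonal, <x^a y^b z^c, itself> = a! b! c!."""
    return sum(coeff * q.get(exps, 0) * factorial(exps[0]) * factorial(exps[1]) * factorial(exps[2])
               for exps, coeff in p.items())

# Image of each e_i: (polynomial without any square-root prefactor, squared prefactor).
images = {
    1: ({(3, 0, 0): 2, (1, 2, 0): -3, (1, 0, 2): -3}, Fraction(3, 5)),  # 2x^3 - 3xy^2 - 3xz^2
    2: ({(0, 0, 3): 2, (2, 0, 1): -3, (0, 2, 1): -3}, Fraction(3, 5)),  # 2z^3 - 3x^2z - 3y^2z
    4: ({(0, 3, 0): 2, (0, 1, 2): -3, (2, 1, 0): -3}, Fraction(3, 5)),  # 2y^3 - 3yz^2 - 3x^2y
    3: ({(1, 0, 2): 3, (1, 2, 0): -3}, 1),                              # 3xz^2 - 3xy^2
    5: ({(2, 1, 0): 3, (0, 1, 2): -3}, 1),                              # 3yx^2 - 3yz^2
    6: ({(0, 2, 1): 3, (2, 0, 1): -3}, 1),                              # 3zy^2 - 3zx^2
    7: ({(1, 1, 1): 6}, 1),                                             # 6xyz
}

harmonic = all(laplacian(p) == {} for p, _ in images.values())
orthogonal = all(fischer(images[i][0], images[j][0]) == 0
                 for i in images for j in images if i < j)
squared_norms = {i: s * fischer(p, p) for i, (p, s) in images.items()}
print(harmonic, orthogonal, all(n == 36 for n in squared_norms.values()))  # True True True
```

Orthogonality does not depend on the prefactors, so only the length comparison needs them; with them included, all seven images have the same squared length.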

Using rather lengthy calculations, Layra and Greg checked that if we use this isomorphism to transfer the cross product on Im(𝕆) to W, we get a cross product on W that is SO(3)-invariant.

Besides the calculations required, the main downside to this argument is that it relies on cleverly choosing an isomorphism Im(𝕆) ≅ W. Luckily, Paul Schwahn came up with a second approach that elegantly defines a cross product on W without choosing an isomorphism Im(𝕆) ≅ W. Alas, it still requires a hard calculation to show this cross product is isomorphic to the usual cross product on Im(𝕆), but Layra says he has done that calculation.

Here’s the cool part: this second approach defines the 7d cross product as a kind of ‘cube’ of the 3d cross product! That came as a big surprise to me.

Here’s how it works.

Let S³V be the space of homogeneous degree-3 polynomials on our 3d inner product space V. This gets an inner product from V, so let

$$ p : S^3 V \to W $$

be the orthogonal projection onto the subspace of harmonic homogeneous degree-3 polynomials. If we pick an orientation on V, we can define the usual 3d cross product

$$ \times : V \times V \to V $$

using the right-hand rule.

Then Schwahn defines a bilinear operation

$$ \bullet : S^3 V \times S^3 V \to S^3 V $$

in a cunning way. First, note that we can cube any element v ∈ V and get an element v³ ∈ S³V. Then, let

$$ u^3 \bullet v^3 = (u \times v)^3 $$

for all u, v ∈ V. It would take work to show that there exists a unique bilinear operation • obeying this formula. I haven’t done all this work. But for uniqueness, it’s enough to note that any degree-3 polynomial on V is a linear combination of cubes, which follows from the ‘polarization identity’ for cubic maps.
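To make the polarization identity concrete: for instance, 24xyz = (x+y+z)³ − (−x+y+z)³ − (x−y+z)³ − (x+y−z)³, and 6x²y = (x+y)³ − (x−y)³ − 2y³. Together with x³, y³, z³ themselves, these two identities (with the variables permuted) write all ten cubic monomials as combinations of cubes of linear forms. The Python sketch below is our own check, not part of the discussion; it confirms both identities exactly, because each side is a polynomial of degree at most 3 in each variable, so agreement on the grid {0, 1, 2, 3}³ forces the two sides to be the same polynomial.

```python
from itertools import product

def xyz_from_cubes(x, y, z):
    """Polarization identity: xyz as a combination of four cubes of linear forms."""
    return ((x + y + z)**3 - (-x + y + z)**3 - (x - y + z)**3 - (x + y - z)**3) / 24

def x2y_from_cubes(x, y):
    """x^2 y as a combination of cubes of linear forms."""
    return ((x + y)**3 - (x - y)**3 - 2 * y**3) / 6

# Both sides have degree <= 3 in each variable, so agreement on {0, 1, 2, 3}^n
# means they agree as polynomials.
grid = range(4)
ok_xyz = all(xyz_from_cubes(x, y, z) == x * y * z for x, y, z in product(grid, repeat=3))
ok_x2y = all(x2y_from_cubes(x, y) == x * x * y for x, y in product(grid, repeat=2))
print(ok_xyz, ok_x2y)  # True True
```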

Next Schwahn defines a cross product on W ⊆ S³V by

$$ a \times b = p(a \bullet b) $$

for all a, b ∈ W.
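A small computational sketch may help make these definitions concrete. It is our own illustration, not Schwahn’s or Layra’s code, and it rests on one assumption not spelled out in the post: we identify a cubic polynomial P with the symmetric 3-tensor T such that P(w) = Σ Tᵢⱼₖ wᵢ wⱼ wₖ. Then v³ is v ⊗ v ⊗ v, and one natural bilinear candidate for • is ‘take the 3d cross product in each of the three tensor slots’, which sends u³, v³ to (u × v)³; by the uniqueness argument above, any bilinear operation with that property must be this one. Harmonic cubics correspond to trace-free tensors, so the orthogonal projection p subtracts (1/5)(δᵢⱼtₖ + δᵢₖtⱼ + δⱼₖtᵢ), where tₖ = Σₗ Tₗₗₖ is the trace. The sketch checks the defining formula and antisymmetry; it does not test the two harder vector product algebra axioms.

```python
from fractions import Fraction
from itertools import product

R = range(3)

def eps(i, j, k):
    """Levi-Civita symbol for indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

def cross(u, v):
    return [sum(eps(i, j, k) * u[j] * v[k] for j in R for k in R) for i in R]

def cube(v):
    """v^3 in S^3 V, stored as the symmetric tensor v (x) v (x) v."""
    return {(i, j, k): v[i] * v[j] * v[k] for i, j, k in product(R, repeat=3)}

def bullet(A, B):
    """Apply the 3d cross product in each of the three tensor slots."""
    out = {}
    for i, j, k in product(R, repeat=3):
        total = 0
        for a, b, c, d, e, f in product(R, repeat=6):
            s = eps(i, a, b) * eps(j, c, d) * eps(k, e, f)
            if s:
                total += s * A[a, c, e] * B[b, d, f]
        out[i, j, k] = total
    return out

def project(T):
    """Orthogonal projection onto trace-free (i.e. harmonic) symmetric 3-tensors."""
    t = [sum(T[l, l, k] for l in R) for k in R]
    def delta(i, j):
        return 1 if i == j else 0
    return {(i, j, k): T[i, j, k]
            - (delta(i, j) * t[k] + delta(i, k) * t[j] + delta(j, k) * t[i]) * Fraction(1, 5)
            for i, j, k in product(R, repeat=3)}

def cross_W(a, b):
    """The product on W: a x b = p(a . b)."""
    return project(bullet(a, b))

u, v = [1, 2, -1], [0, 3, 2]
print(bullet(cube(u), cube(v)) == cube(cross(u, v)))  # True
```

Swapping the two arguments flips the sign in each of the three slots, so • (and hence the product on W) is antisymmetric, matching the first axiom discussed below.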

With this definition, it’s obvious that this cross product

$$ \times : W \times W \to W $$

is SO(3)-invariant. The work comes when we try to choose an isomorphism W ≅ Im(𝕆) that carries this cross product to the usual cross product of imaginary octonions! And currently this seems to require a hard computation.

An alternative approach would be to check that this cross product on W, together with the inner product on W, obeys the axioms of a vector product algebra:

$$ a \times b = -b \times a $$

$$ a \cdot (b \times c) = b \cdot (c \times a) $$

$$ (a \times b) \times a = (a \cdot a)\, b - (a \cdot b)\, a $$

The first one is obvious because • is already antisymmetric… but I haven’t figured out how to show the next two!

June 03, 2024

Matt Strassler Panel Discussion This Week in Boston

The 12th Large Hadron Collider Physics conference is taking place this week in Boston, and for the first time in several years, I’ll be able to attend in person. I’ll post about it all week.

As part of the conference activities, I will be participating in a public event Thursday night at the Boston Public Library, a panel discussion entitled “Where the Universe and Humanity Collide.” The other panelists are Yale Professor Sarah Demers, a member of the ATLAS experiment at the Large Hadron Collider, and Dr. Katrina Miller, a particle physicist and a writer and essayist for the New York Times and other publications. We’ll be discussing the future of particle physics, talking about how we got into the field, and answering whatever questions the audience might have for us. If you’re in Boston, please consider attending!

June 02, 2024

Doug Natelson Materials families: Halide perovskites

Looking back, I realized that I haven't written much about halide perovskites, which is quite an oversight given how much research impact they're having.  I'm not an expert, and there are multiple extensive review articles out there (e.g. here, here, here, here, here), so this will only be a very broad strokes intro, trying to give some context to why these systems are important, remarkable, and may have plenty of additional tricks to play.

From ACS Energy Lett. 5, 2, 604–610 (2020).

Perovskites are a class of crystals based on a structural motif (an example is ABX₃, originally identified in the mineral CaTiO₃, though there are others) involving octahedrally coordinated metal atoms. As shown in the figure, each B atom is in the center of an octahedron defined by six X atoms. There are many flavors of purely inorganic perovskites, including the copper oxide superconductors and various piezoelectric and ferroelectric oxides.

The big excitement in recent years, though, involves halide perovskites, in which the X atom is Cl, Br, or I and the B atom is most often Pb or Sn. These materials are quite ionic, in the sense that the B atom is in the 2+ oxidation state, the X atom is in the 1− oxidation state, and whatever is in the A site is in the 1+ oxidation state (whether it's Cs⁺ or a molecular ion like methylammonium (MA = [CH₃NH₃]⁺) or formamidinium (FA = [HC(NH₂)₂]⁺)).

From Chem. Rev. 123, 13, 8154–8231 (2023).

There is an enormous zoo of materials based on these building blocks, made even richer by the capability of organic chemists to toss in various small organic, covalent ligands to alter spacings between the components (and hence electronic overlap and bandwidths), tilt or rotate the octahedra, add in chirality, etc. Forms that are 3D, effectively 2D (layers of corner-sharing octahedra), 1D, and "0D" (with isolated octahedra) exist. Remarkably:

  • These materials can be processed in solution form, and it's possible to cast highly crystalline films.
  • Despite the highly ionic character of much of the bonding, many of these materials are semiconductors, with bandgaps in the visible.
  • Despite the differences in what chemists and semiconductor physicists usually mean by "pure", these materials can be sufficiently clean and free of the wrong kinds of defects that it is possible to make solar cells with efficiencies greater than 26% (!) (and very bright light emitting diodes).  
These features make the halide perovskites extremely attractive for possible applications, especially in photovoltaics and potentially light sources (even quantum emitters). They seem much more forgiving than most organic semiconductors, in terms of high carrier mobility, low vulnerability to disorder, and a high dielectric polarizability (and hence lower exciton binding energy and easier charge extraction). The halide perovskites do face some serious challenges (chemical stability under UV illumination and air/moisture exposure; the unpleasantness of Pb), but their promise is enormous.

Sometimes nature seems to provide materials with particularly convenient properties. Examples include water and the fact that ordinary ice is less dense than the liquid form; silicon and its outstanding oxide; gallium arsenide and the fact that it can be grown with great purity and stoichiometry even in an extremely As-rich environment; I'm sure commenters can provide many more. The halide perovskites seem to be another addition to this catalog, and as material properties continue to improve, condensed matter physicists are going to be looking for interesting things to do in these systems.

Tommaso Dorigo A Workshop You Should Not Miss

... if you are a researcher in physics or astrophysics and you are working with machine learning, that is.

Between September 23 and 25, just when summer is over, we will meet in Valencia, Spain, to discuss the latest developments in deep learning applications to optimization of experiments in fundamental science. This is the fourth workshop of the MODE Collaboration, which focuses on a new frontier of application of deep learning: co-design and high-level optimization, and the tools to pull it off.


June 01, 2024

Jordan Ellenberg Bagel, cream cheese, and kimchi

That’s it. No more to say. A bagel with cream cheese and kimchi is a great combination and I recommend it.

John Baez Transition Metals (Part 2)

Why is copper red? Why is it so soft compared to, say, nickel—the element right next to it in the periodic table? Why is it such a good conductor of electricity?

All of this stems from a violation of Hund’s rules. Let me explain.

In Part 1, I explained the basic math of transition metals. Now I just want to talk about how the first row of transition metals fills up the 10 slots for electrons in the 3d subshell, and what’s special about copper:

These elements have all the electrons that argon does: that’s the [Ar] in this chart. Most also have two electrons in the 4s subshell: one spin up, and one spin down. So the action mainly happens in the 3d subshell. This has 10 slots for electrons: 5 spin up, and 5 spin down. If you don’t know why 5, read Part 1.

Hund’s rules predict how these 10 slots are filled. They predict that we first get 5 metals with 1, 2, 3, 4, and 5 spin-up electrons, and then 5 more metals that add in 1, 2, 3, 4, and 5 spin-down electrons. And that’s almost what we see.

But notice: when we hit chromium we get an exception! Chromium steals an electron from the 4s subshell to get 5 spin-up electrons in the 3d subshell. And we get another exception when we hit copper. Can you guess why?

The short answer is that in every atom, its electrons are arranged so as to minimize energy. Hund’s rules are a good guess about how this works. But they’re not the whole story. It turns out that electrons in the d subshell can lower their energy if we have the maximum number possible, namely 5, with spins pointing in the same direction. (We arbitrarily call this direction ‘up’, but there’s nothing special about the ‘up’ direction, so don’t lose sleep over that.)

So: Hund’s rules predict that we get 5 spin-up electrons when we hit manganese, with 5 electrons in the d subshell. And that’s true. But the energy-lowering effect is strong enough that chromium ‘jumps the gun’ and steals one electron from the somewhat lower-energy s subshell to put 5 spin-up electrons in the d subshell! So chromium also has 5 electrons in the d subshell.

Similarly, Hund’s rules predict that we get 5 spin-up and 5 spin-down electrons when we reach zinc, with 10 electrons in the d subshell. And that’s true. But the element before zinc, namely copper, jumps the gun and steals an electron from the s subshell, so it also has 10 electrons in the d subshell.
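To make the comparison concrete, here is a small Python sketch that tabulates the naive pattern (two 4s electrons, then one more 3d electron per element) against the observed configurations, splitting the 3d electrons into spin up and spin down by Hund’s rule. The chromium and copper exceptions are hard-coded from the discussion above, not derived from any energy calculation, and the helper names are made up for illustration.

```python
# First-row transition metals, scandium (Z = 21) through zinc (Z = 30).
ELEMENTS = ["Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn"]

# Observed ground states that break the naive pattern: one 4s electron moves
# into the 3d subshell. Hard-coded here, not computed.
EXCEPTIONS = {"Cr": (1, 5), "Cu": (1, 10)}


def naive_config(symbol):
    """(4s electrons, 3d electrons) if the 4s subshell always holds 2 electrons."""
    return 2, ELEMENTS.index(symbol) + 1


def observed_config(symbol):
    """Naive filling, except for the hard-coded chromium and copper exceptions."""
    return EXCEPTIONS.get(symbol, naive_config(symbol))


def hund_spins(n_d):
    """Split n_d electrons in the five 3d orbitals into (spin up, spin down).

    Hund's rule: all five orbitals get a parallel-spin electron before any pairing.
    """
    up = min(n_d, 5)
    return up, n_d - up


for sym in ELEMENTS:
    s_pred, d_pred = naive_config(sym)
    s_obs, d_obs = observed_config(sym)
    flag = "  <- exception" if (s_obs, d_obs) != (s_pred, d_pred) else ""
    print(f"{sym:2}  predicted 4s{s_pred} 3d{d_pred} {hund_spins(d_pred)}"
          f"  observed 4s{s_obs} 3d{d_obs} {hund_spins(d_obs)}{flag}")
```

For chromium the sketch shows the predicted (4, 0) spin split becoming (5, 0), and for copper (5, 4) becoming (5, 5): in both cases a half-filled or filled 3d subshell is reached one element early.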

The lone electron in its 4s subshell makes copper a great conductor of electricity: such electrons can easily hop from atom to atom. And that in turn means that blue and green light are energetic enough to push those electrons around, so copper absorbs blue and green light… while still reflecting the lower-energy red light!

Similar things happen with the elements directly below copper in the periodic table: silver and gold. Wikipedia explains it a bit more technically:

Copper, silver, and gold are in group 11 of the periodic table; these three metals have one s-orbital electron on top of a filled d-electron shell and are characterized by high ductility, and electrical and thermal conductivity. The filled d-shells in these elements contribute little to interatomic interactions, which are dominated by the s-electrons through metallic bonds. Unlike metals with incomplete d-shells, metallic bonds in copper are lacking a covalent character and are relatively weak. This observation explains the low hardness and high ductility of single crystals of copper.

Copper is one of a few metallic elements with a natural color other than gray or silver. Pure copper is orange-red and acquires a reddish tarnish when exposed to air. This is due to the low plasma frequency of the metal, which lies in the red part of the visible spectrum, causing it to absorb the higher-frequency green and blue colors.

It would take more work to understand why copper, silver and gold have such different colors! People often blame the color of gold on relativistic effects, but this of course is not a full explanation:

• Physics FAQ, Relativity in chemistry: the color of gold.

May 31, 2024

Jordan EllenbergI dream of Gunnar

Last night I dreamed I found Gunnar Henderson’s apartment unlocked and started hanging out there. It was a really nice apartment. Dr. Mrs. Q was there too, we were watching TV, eating out of his fridge, etc. Suddenly I started to feel that what we were doing was really dangerous and that Henderson was likely to come back at any time. In a huge rush I packed up everything I’d left around and got myself out the door, but try as I might I couldn’t get Dr. Mrs. Q. to have the same level of urgency, and she was a little behind me. And as I was leaving, there was Gunnar Henderson coming up the stairs! I tried to distract him by asking for his autograph, but it was no use — he went into his apartment and found my wife there. I was freaking out, pretty sure we were going to be arrested, but in fact Gunnar Henderson was very cool about it and invited us to a party some guys on the Orioles were having in a few months’ time.

Henderson really has been as good as I could have dreamed, not just in an “overlooking breaking and entering if the perpetrator is a true fan” kind of way but by leading the American League in home runs while playing spectacular defense. I was pretty pessimistic at the end of last season about the Orioles’ chances of getting close to a title again. I was both right and wrong. Wrong, in that I wrote

with an ownership willing to add expensive free agents to fill the holes, it could be a championship team. But we have an ownership that’s ecstatic that the 2023 team lucked into 101 regular season wins, and that will be perfectly happy to enjoy 90-win seasons and trips to the Wild Card game for the next few years, until the unextended players mentioned above peel off into free agency one by one.

That changed: now we do have new ownership, and a new expensive #1 starter in Corbin Burnes, and that makes a huge difference in how well set-up we are for a playoff series. You just don’t have to win many games started by anybody other than Burnes, Grayson Rodriguez, and Kyle Bradish, as long as those three stay healthy, and that’s a good position to be in.

But I was right about

But this year, both the Yankees and Red Sox were kind of bad, and content to be kind of bad, and didn’t make gigantic talent adds in a bid for the playoffs. That hasn’t been the case for years and it won’t be the case again anytime soon.

The Yankees added Juan Soto and are not the same Yankees we finished comfortably ahead of last year.

One of my main points at the end of last year was that the Orioles got really lucky in one-run games and probably weren’t really a 101-win team. This year, so far, we’re whaling the tar out of the ball and actually are playing like a 100-win team. That’s the big thing I didn’t predict — not just that Gunnar would be this good but that guys like Jordan Westburg and Colton Cowser would be raking too.

I don’t think there’s any question the Orioles have made a real change to their hitting approach. It’s much more aggressive. Adley Rutschman, who used to battle for the league lead in walks, has only 12 in 51 games. But he’s still hitting better than last year, because some of those walks have turned into homers. In fact, the Orioles are second in the AL in home runs and dead last in walks. That’s just weird! Usually teams with power get pitched around a lot; and I think the Orioles are just refusing to be pitched around, and swinging at pitches they can drive in the air, even if they might be balls. Elevation is key; the Orioles have hit into only 20 double plays in their first 54 games, a pace of 60 for a full season; the lowest team total ever is the 1945 St. Louis Cardinals with 75, and that was in a 154-game season. Only two teams have ever had that few GIDP in their first 54 games, both matching the Orioles’ 20 exactly: the 2019 Mariners (finished with 84) and the 2016 Rays (87).
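The pace arithmetic can be checked in a couple of lines. This Python sketch uses only the numbers quoted above; rescaling the 1945 Cardinals’ total to 162 games by simple proportion is an added assumption, just to put both figures on the same footing.

```python
# Numbers quoted in the post.
gidp, games = 20, 54          # Orioles' double plays through 54 games
season_games = 162
record, record_games = 75, 154  # 1945 St. Louis Cardinals, 154-game season

# Full-season pace: scale the per-game rate up to 162 games.
pace = gidp * season_games / games

# Assumption: rescale the record proportionally to a 162-game season.
record_per_162 = record * season_games / record_games

print(f"Orioles pace: {pace:.1f} GIDP per 162 games")
print(f"1945 Cardinals, rescaled: {record_per_162:.1f} GIDP per 162 games")
```

Even after the rescaling (about 79), the record total stays well above the Orioles’ pace of 60.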

Matt von HippelDoes Science Require Publication?

Seen on Twitter:

As is traditional, twitter erupted into dumb arguments over this. Some made fun of Yann LeCun for implying that Elon Musk will be forgotten, which despite any other faults of his seems unlikely. Science popularizer Sabine Hossenfelder pointed out that there are two senses of “publish” getting confused here: publish as in “make public” and publish as in “put in a scientific journal”. The latter tends to be necessary for scientists in practice, but is not required in principle. (The way journals work has changed a lot over just the last century!) The former, Sabine argued, is still 100% necessary.

Plenty of people on twitter still disagreed (this always happens). It got me thinking a bit about the role of publication in science.

When we talk about what science requires or doesn’t require, what are we actually talking about?

“Science” is a word, and like any word its meaning is determined by how it is used. Scientists use the word “science” of course, as do schools and governments and journalists. But if we’re getting into arguments about what does or does not count as science, then we’re asking about a philosophical problem, one in which philosophers of science try to understand what counts as science and what doesn’t.

What do philosophers of science want? Many things, but a big one is to explain why science works so well. Over a few centuries, humanity went from understanding the world in terms of familiar materials and living creatures to decomposing them into molecules and atoms and cells and proteins. In doing this, we radically changed what we were capable of: building computers that were out of reach of blacksmiths, and curing diseases that previously couldn’t even be told apart. And while other human endeavors have seen some progress over this time (democracy, human rights…), science’s accomplishment demands an explanation.

Part of that explanation, I think, has to include making results public. Alchemists were interested in many of the things later chemists were, and had started to get some valuable insights. But alchemists were fearful of what their knowledge would bring (especially the ones who actually thought they could turn lead into gold). They published almost exclusively in code. As such, the pieces of progress they made didn’t build up, didn’t aggregate, didn’t become overall progress. It was only when a new scientific culture emerged, when natural philosophers and physicists and chemists started writing to each other as clearly as they could, that knowledge began to build on itself.

Some on twitter pointed out the example of the Manhattan project during World War II. A group of scientists got together and made progress on something almost entirely in secret. Does that not count as science?

I’m willing to bite this bullet: I don’t think it does! When the Soviets tried to replicate the bomb, they mostly had to start from scratch, aside from some smuggled atomic secrets. Today, nations trying to build their own bombs know more, but they still must reinvent most of it. We may think this is a good thing, we may not want more countries to make progress in this way. But I don’t think we can deny that it genuinely does slow progress!

At the same time, to contradict myself a bit: I think you can have science that happens within a particular community. The scientists of the Manhattan project didn’t publish in journals the Soviets could read. But they did write internal reports, they did publish to each other. I don’t think science by its nature has to include the whole of humanity (if it does, then perhaps studying the inside of black holes really is unscientific). You probably can do science sticking to just your own little world. But it will be slower. Better, for progress’s sake, if you can include people from across the world.

May 30, 2024

n-Category Café Lanthanides and the Exceptional Lie Group G2

The lanthanides are the 14 elements defined by the fact that their electrons fill up, one by one, the 14 orbitals in the so-called f subshell. Here they are:

lanthanum, cerium, praseodymium, neodymium, promethium, samarium, europium, gadolinium, terbium, dysprosium, holmium, erbium, thulium, ytterbium.

They are also called ‘rare earths’, but that term is often also applied to 3 more elements. Why? That’s a fascinating puzzle in its own right. But what matters to me now is something else: an apparent connection between the lanthanides and the exceptional Lie group G2!

Alas, this connection remains profoundly mysterious to me, so I’m pleading for your help.

Why are there 14 lanthanides? It’s because

  • the electrons in the f subshell have orbital angular momentum 3,
  • the irreducible representation of SO(3) corresponding to angular momentum j = 3 has dimension 2j + 1 = 7, and
  • each electron can also come in 2 spin states, for a total of 2 × 7 = 14 states.
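The same counting works for every subshell: 2l + 1 values of the magnetic quantum number (the dimension of the angular momentum l irreducible representation of SO(3)), times 2 spin states. A minimal Python sketch of that arithmetic:

```python
# Orbital angular momentum l for the s, p, d, f subshells.
SUBSHELLS = {"s": 0, "p": 1, "d": 2, "f": 3}


def orbitals(l):
    """Dimension of the angular momentum l irrep of SO(3): 2l + 1."""
    return 2 * l + 1


def states(l):
    """Electron states in the subshell: 2l + 1 orbitals times 2 spin states."""
    return 2 * orbitals(l)


for name, l in SUBSHELLS.items():
    print(f"{name}: l = {l}, {orbitals(l)} orbitals, {states(l)} electron states")
```

The d line gives the 10 slots that the first row of transition metals fills, and the f line gives the 14 lanthanides.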

What does this have to do with the exceptional Lie group G₂? The aforementioned 7-dimensional representation of SO(3) can also be thought of as the space of imaginary octonions, since (rather amazingly) the key algebraic structure on the imaginary octonions, their cross product, is invariant under this representation of SO(3). Indeed, the 7-dimensional representation of G₂ on the imaginary octonions remains irreducible when restricted to a certain SO(3) subgroup of G₂, sometimes called SO(3)ᵢᵣᵣ, and this gives our friend the j = 3 representation of SO(3).
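For readers who want to poke at the cross product on the imaginary octonions, here is a Python sketch. It assumes one common multiplication convention, e_i e_(i+1) = e_(i+3) with indices mod 7 (the lines of the Fano plane), and checks numerically that the resulting product on R⁷ is orthogonal to both factors and satisfies |x × y|² = |x|²|y|² − (x·y)². It does not check invariance under G₂ or SO(3)ᵢᵣᵣ.

```python
import random

# Assumed convention: e_i * e_(i+1) = e_(i+3), indices mod 7 (Fano-plane lines).
TRIPLES = [(i, (i + 1) % 7, (i + 3) % 7) for i in range(7)]


def cross7(x, y):
    """Cross product of two vectors in R^7, viewed as imaginary octonions."""
    z = [0] * 7
    for a, b, c in TRIPLES:
        # On each line (a, b, c): e_a x e_b = e_c, e_b x e_c = e_a, e_c x e_a = e_b;
        # the reversed orders give the negatives (antisymmetry).
        z[c] += x[a] * y[b] - x[b] * y[a]
        z[a] += x[b] * y[c] - x[c] * y[b]
        z[b] += x[c] * y[a] - x[a] * y[c]
    return z


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


# Spot-check with integer vectors so the identities hold exactly.
rng = random.Random(0)
for _ in range(1000):
    x = [rng.randint(-5, 5) for _ in range(7)]
    y = [rng.randint(-5, 5) for _ in range(7)]
    z = cross7(x, y)
    assert dot(z, x) == 0 and dot(z, y) == 0                    # orthogonal to both
    assert dot(z, z) == dot(x, x) * dot(y, y) - dot(x, y) ** 2  # Lagrange identity
print("cross product identities hold on all samples")
```

The Lagrange identity here is the statement that the octonions are a normed division algebra, restricted to imaginary elements.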

All these facts were noticed and apparently put to some use by the mathematician and physicist Giulio Racah, famous for his work on the quantum mechanics of angular momentum. This was recently brought to my attention by Paul Schwahn, who is working to better understand the underlying math.

But Racah’s thoughts remain deeply mysterious, because Schwahn found them in a fragmentary second-hand account, and we haven’t been able to find more details!

Schwahn writes:

I thought me mentioning the f-orbital was just a crackpot idea.

But in the AMS volume Selected Papers of E. B. Dynkin with Commentary (which also contains Dynkin’s original discovery of SO(3)ᵢᵣᵣ) one finds a short review by Yuval Ne’eman, titled “Dynkin Diagrams in the Physics of Particles, Fields and Strings”. The whole thing is a delight to read, but he writes something particularly interesting about an idea of physicist Giulio Racah:

“Racah found ways of applying various simple algebras in classifying higher spectra. His methods, later developed and extended by such as L. Biedenharn and M. Moshinsky, exploited higher rank Lie algebras applied to the representation spaces of SO(3). I recall Racah enjoying (anecdotically) the fact that he had found an application for Cartan’s exceptional G(2), in studying the f-subshell in atomic spectra. One defines an SO(7) algebra acting on some constructs involving the 7-dimensional f-subshell representation of SO(3) - and the inclusion G(2) ⊂ SO(7) does it. In these very complicated atomic spectra of the lanthanides, it provides some physical insights.”

I really wonder what these physical insights are…

It’s possible that even if Racah’s thoughts are lost in the dark mist of time, later researchers on group representation theory and the quantum mechanics of atoms have used the Lie group G₂ to understand something about f subshell electrons. For example, Biedenharn’s book may contain some clues. But I haven’t turned up any clues yet.

May 29, 2024

Doug NatelsonInteresting reading - resonators, quantum geometry w/ phonons, and fractional quantum anomalous Hall

 Real life continues to be busy, but I wanted to point out three recent articles that I found interesting:

  • Mechanical resonators are a topic with a long history, going back to the first bells and the tuning fork.  I've written about micromachined resonators before, and the quest to try to get very high quality resonators.  This recent publication is very impressive.  The authors have succeeded in fabricating suspended Si3N4 resonators that are 70 nm thick but 3 cm (!!) long.  In terms of aspect ratio, that'd be like a diving board 3 cm thick and 12.8 km long.  By varying the shape of the suspended "string" along its length, they create phononic band gaps, so that some vibrations are blocked from propagating along the resonator, leading to reduced losses.  They are able to make such resonators that work at acoustic frequencies at room temperature (in vacuum) and have quality factors as high as \(6.5 \times 10^{9}\), which is amazing.  
  • Speaking of vibrations, this paper in Nature Physics is a thought-provoking piece of work.  Electrons in solids are coupled to lattice vibrations (phonons), and that's not particularly surprising.  The electronic band structure depends on how the atoms are stacked in space, and a vibration like a phonon is a particular perturbation of that atomic arrangement.  The new insight here is to look at what is being called quantum geometry and how that affects the electron-phonon coupling.  As I wrote here, electrons in crystals can be described by Bloch waves which include a function \(u_{\mathbf{k}}(\mathbf{r})\) that has the real-space periodicity of the crystal lattice.  How that function varies over \(\mathbf{k}\)-space is called quantum geometry and has all kinds of consequences (e.g., here and here).  It turns out that this piece of the band structure can have a big and sometimes dominant influence on the coupling between mobile electrons and phonons.
  • Speaking of quantum geometry and all that, here is a nice article in Quanta about the observation of the fractional quantum anomalous Hall effect in different 2D material systems.  In the "ordinary" fractional quantum Hall effect, topology and interactions combine at low temperatures and (usually) high magnetic fields in clean 2D materials to give unusual electronic states with, e.g., fractionally charged low energy excitations.  Recent exciting advances have found related fractional Chern insulator states in various 2D materials at zero magnetic field.  The article does a nice job capturing the excitement of these recent works.
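The diving-board comparison in the first item is a quick ratio. A Python sketch using only the dimensions quoted there:

```python
# Dimensions quoted for the suspended Si3N4 resonator.
thickness_m = 70e-9   # 70 nm thick
length_m = 3e-2       # 3 cm long

aspect_ratio = length_m / thickness_m

# Keep the same aspect ratio but make the "diving board" 3 cm thick.
board_thickness_m = 3e-2
board_length_km = board_thickness_m * aspect_ratio / 1e3

print(f"aspect ratio ~ {aspect_ratio:.3g}")
print(f"a 3 cm thick board would be ~ {board_length_km:.2f} km long")
```

The result, about 12.86 km, matches the 12.8 km figure in the text.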

Matt Strassler Celebrating the Standard Model: The Twins We’re Made Of

At the core of every atom lies its nucleus, where protons and neutrons are found. As their names suggest, these two subatomic particles are profoundly different.

  • Protons carry positive electric charge, and can attract negatively-charged electrons, making atoms possible.
  • Neutrons have no electric charge and are thus electrically neutral, hence their name; they have no impact on the electrons in atoms.

The distinctions extend to their magnetic effects. Both protons and neutrons have a “magnetic moment,” meaning that in a magnetic field, they will point like compasses. But neutrons point in the opposite direction from protons, and less aggressively.

Nevertheless, the proton and neutron have almost identical masses, differing by less than two tenths of a percent! If we ignored their electric and magnetic effects, they’d almost be twins. Why are they so different in some ways and so similar in others? What does it reflect about nature?

Table 1: The masses (specifically the intrinsic, speed-independent “rest masses”, in units of GeV/c²) of the proton and neutron are almost identical, but their electric charge (in units of e) and magnetic responses (in units of e ℏ / 2mₚ, where mₚ is the proton’s mass) are quite different.

To resolve this puzzle required three stages of enlightenment…

Step 1: The Nuclear Force is Strong in These Ones

Atomic nuclei were a puzzle for several decades. The proton was discovered, and identified as the nucleus of hydrogen, before 1920. But other nuclei had larger electric charge and mass; for instance, the helium nucleus has double the charge and about four times the mass of a proton. Only in 1932 was the neutron discovered, after which point it soon became clear that nuclei are made of protons and neutrons combined together. Physicists then realized that to prevent the protons’ mutual electric repulsion from blowing a nucleus apart, there must exist an additional attractive force between the protons and neutrons, now known to be an effect of the “strong nuclear force”, that pulls harder and holds the nucleus together.

Almost immediately following the discovery of the neutron, and noting its similar mass to that of the proton, Heisenberg proposed that perhaps they were the same particle in two different manifestations, despite their different electric charges. Soon it was learned that small atomic nuclei that differ only in the replacing of one proton with one neutron often have remarkably similar masses. For example,

nucleus              Magnesium 27   Aluminum 27   Silicon 27
# protons            12             13            14
# neutrons           15             14            13
mass (in GeV/c²)     25.1357        25.1331       25.1380

Thus, not only are protons and neutrons in isolation almost interchangeable (excepting electromagnetism), they remain so when bound together by the strong nuclear force. This is a clue that the strong nuclear force treats them identically, or nearly so. Meanwhile, although their different electromagnetic properties seem of great importance to us at first, they are actually little more than a shiny but irrelevant detail, akin to two different paint colors on cars of exactly the same make.
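To put numbers on “remarkably similar”, this Python sketch computes the relative spread of the three A = 27 masses in the table. For comparison it also computes the relative neutron–proton mass difference; the proton and neutron masses used there (0.938272 and 0.939565 GeV/c²) are standard values that are not given in this post.

```python
# Masses in GeV/c^2 from the table of A = 27 nuclei.
mirror = {"Mg-27": 25.1357, "Al-27": 25.1331, "Si-27": 25.1380}

spread = max(mirror.values()) - min(mirror.values())
mean = sum(mirror.values()) / len(mirror)
rel_spread = spread / mean

# Standard proton and neutron rest masses in GeV/c^2 (not from the post).
m_p, m_n = 0.938272, 0.939565
rel_pn = (m_n - m_p) / m_p

print(f"A = 27 spread: {spread:.4f} GeV/c^2 ({rel_spread:.3%} of the mean)")
print(f"neutron - proton: {m_n - m_p:.6f} GeV/c^2 ({rel_pn:.3%})")
```

Both come out at a fraction of a percent: about 0.02% across the three nuclei and about 0.14% for the free proton and neutron, consistent with “less than two tenths of a percent”.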

It turns out the proton and neutron are not quite the same object. But their similarities can still be attributed to similarities in their contents.

Step 2: Bags of Three

Based on the properties of many other particles discovered in the 1940s and 1950s, both Murray Gell-Mann and George Zweig (see also work by A. Petermann) proposed an idea that I’ll refer to as “kuarqs”, in which

  • the proton involves two up kuarqs and one down kuarq;
  • the neutron involves two down kuarqs and one up kuarq;
  • the reason that the proton and neutron are twins is that the up kuarq and down kuarq are twins, differing only in their electric and magnetic effects.

You should note, in addition to my odd spelling, that I did not say “the proton is made of two up kuarqs and one down kuarq”. That’s for a very good reason.

Some physicists, including Zweig, considered that these kuarqs might truly be particles inside a proton. In this view, much as a helium nucleus is a bag made of two protons and two neutrons, each carrying about a quarter of the nucleus’s mass, a proton would be a bag made of three kuarqs, each kuarq carrying a third of the proton’s mass. The neutron would be the same except with one up kuarq replaced with one down kuarq.

Fig. 1: A naive picture: protons and neutrons made from three kuarqs each.

These physicists were able to make quite a lot of successful predictions using this viewpoint, in which:

quantity                                         up kuarq       down kuarq
mass (in units of GeV/c²)                        0.30 – 0.33    0.30 – 0.33
electric charge (in units of e)                  2/3            –1/3
magnetic moment (in units of e ℏ / 2 m_kuarq)    1.9            –0.9
Table 3: The simplistic picture of protons and neutrons made from three kuarqs requires they have the above properties; specifically, their masses are roughly 1/3 that of a proton or neutron.

But Gell-Mann (and to some extent Zweig also) emphasized that it would be a mistake to literally view the proton as a simple bag of three objects. The strong nuclear force is too strong for this; such a simplistic view would make the picture inconsistent. Most importantly, other types of related particles, especially pions, would be impossible to explain in a simple way using this method; so how could one expect protons and neutrons to be so simple?

Gell-Mann therefore argued that his kuarqs were mainly a mathematical trick, an organizing device, and were unlikely to exist as actual particles. Even if they did exist, he reasoned, they should have very large masses, with the proton mass reduced by the strong nuclear force (due to binding energy, which makes an atom’s mass slightly less than the combined mass of its electrons, protons and neutrons, and similarly reduces the mass of a nucleus below that of its protons and neutrons).

Step 3: Bags of Plenty

The full story only began to become clear ten years later, in the early 1970s. It turned out that Gell-Mann was right: his kuarqs do not exist. And yet they reflect something that does: a subset of the elementary particles that we call “quarks”.

There are indeed up and down quarks, just as there are up and down kuarqs. But in contrast to kuarqs,

  • quarks are real particles, not mere mathematical tools;
  • the up and down quarks are not twins;
  • protons and neutrons are not made from three quarks.

quantity                                         up quark       down quark
mass (in units of GeV/c²)
electric charge (in units of e)                  2/3            –1/3
magnetic moment (in units of e ℏ / 2 m_quark)
Table 4: The elementary up and down quarks. Their masses cannot be precisely determined, but are small and quite different. Their electric charges are the same as for the kuarqs. The magnetic properties of individual quarks are both simple — that of elementary particles — and complex — thanks to the strong nuclear force — but they are certainly very different from those of the kuarqs, thanks to their small masses.

As you see, quarks are very different from kuarqs; their masses are very small compared to a proton’s mass, and the down quark mass is more than double that of the up quark. (Actually it took several decades for the table shown above to stabilize, because quarks are never seen individually and their masses must be inferred indirectly.)

The picture of a proton and neutron is then also very different. Instead of imagining three kuarqs moving slowly around a proton, one finds large numbers of fast-moving particles inside. The proton and neutron have almost identical interiors; they contain essentially the same combinations of quarks, anti-quarks and gluons. Their only difference is that a single up quark of the former is exchanged for a single down quark in the latter. More about this viewpoint is explained here or, more carefully, in my book chapter 6.3.

Fig. 3: A more realistic, though still quite imperfect, snapshot of a proton and neutron: full of quarks (u, d, s), anti-quarks (with an overbar) and gluons (g), moving around at high speed. Just a single quark distinguishes a proton from a neutron (note the arrow).

What this means is that the proton and neutron are twins not because the up and down quarks are twins, but rather in spite of the fact that the up and down quarks are not twins. If we convert a proton to a neutron by trading an up quark for a down quark, the neutron’s mass remains almost the same as the proton’s, because the difference between the up and down quark masses is much smaller than the proton’s mass, and is thus almost irrelevant.
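A rough numerical version of this argument, as a Python sketch. The light-quark masses below are approximate Particle Data Group central values (about 2.16 and 4.67 MeV/c² in the MS-bar scheme at 2 GeV); they are assumptions brought in for illustration, not numbers from this post, and quark masses are scheme dependent.

```python
# Approximate light-quark masses in MeV/c^2 (assumed PDG central values,
# MS-bar scheme at 2 GeV). Not from the post; for illustration only.
m_up, m_down = 2.16, 4.67

# Proton rest mass in MeV/c^2 (standard value).
m_proton = 938.272

diff = m_down - m_up
print(f"m_down / m_up          = {m_down / m_up:.2f}")
print(f"(m_down - m_up) / m_p  = {diff / m_proton:.3%}")
```

The down quark comes out a bit more than twice as heavy as the up quark, yet their mass difference is only about a quarter of a percent of the proton’s mass.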

Essentially, the strong nuclear force brings about the proton and neutron as bags of many fast-moving particles. So strong is that force that any differences in the quarks’ electric effects, magnetic effects, and even their masses are minor details, all of which combine together to explain the very small difference between the proton and neutron masses, as well as their electric and magnetic differences.

With protons and neutrons so complicated, you might well wonder why all protons are the same, all neutrons are the same, and why protons and neutrons are so similar inside. Some discussion of this quantum-physics effect is given in my book’s final chapters.

Kuarqs and Quarks

When quarks of very low mass were discovered in experiments and confirmed in theory, Gell-Mann was quick to insist that he’d known his kuarqs were real particles all along. Clearly this is revisionist history. Not to take much away from the great man, who deserved his Nobel prize, but he was right the first time. His kuarqs were mathematical objects, and the reason that his kuarq approach (and that of Zweig) worked so well for protons, neutrons and other similar particles is indeed due to the existence of somewhat obscure mathematical symmetries, as pointed out in a wonderful 1994 paper of Dashen, Jenkins and Manohar. This paper does not settle all the issues (specifically it does not address pions and other “mesons”), but it does help make clear the senses in which kuarqs differ from quarks. It also explains why models of protons and neutrons that have no kuarqs in them at all (cf. the “Skyrme model”) can make just as good predictions as those that do, as long as they contain the same obscure mathematical symmetries. Kuarqs, in short, are useful but not necessary concepts.

This is in contrast to quarks, which are elementary particles appearing directly and explicitly in the equations of the Standard Model of particle physics. There are six types, only three of which are reflected in Gell-Mann and Zweig’s kuarqs. They are fundamental ingredients to modern computer simulations that can directly compute the difference between the proton and neutron masses. We can’t do particle physics without them.

May 26, 2024

Clifford JohnsonTumble Science Podcast Episode

For some weekend listening, there’s a fun and informative podcast for youngsters called Tumble Science Podcast. I learned of it recently because they asked to interview me for an episode, and it is now available! It is all about time travel, and I hope you (and/or yours) have fun listening …

The post Tumble Science Podcast Episode appeared first on Asymptotia.

n-Category Café Wild Knots are Wildly Difficult to Classify

In the real world, the rope in a knot has some nonzero thickness. In math, knots are made of infinitely thin stuff. This allows mathematical knots to be tied in infinitely complicated ways — ways that are impossible for knots with nonzero thickness! These are called ‘wild’ knots.

Check out the wild knot in this video by Henry Segerman. There’s just one point where it needs to have zero thickness. So we say it’s wild at just one point. But some knots are wild at many points.

There are even knots that are wild at every point! To build these you need to recursively put in wildness at more and more places, forever. I would like to see a good picture of such an everywhere wild knot. I haven’t seen one.

Wild knots are extremely hard to classify. This is not just a feeling — it’s a theorem. Vadim Kulikov showed that wild knots are harder to classify than any sort of countable structure that you can describe using first-order classical logic with just countably many symbols!

Very roughly speaking, this means wild knots are so complicated that we can’t classify them using anything we can write down. This makes them very different from ‘tame’ knots: knots that aren’t wild. Yeah, tame knots are hard to classify, but nowhere near that hard.

Let me say a bit more about this paper:

As I mentioned, he proved wild knots are harder to classify than any sort of countable structure describable using first-order classical logic with countably many symbols. And it’s interesting how he proved this. He proved it by studying the space of all knots.

So he used logic to prove a topology problem is hard — but he also used topology to study logic!

More precisely:

Kulikov studied the topological space of all knots, which are topological embeddings K of the circle in the 3-sphere. He also studied the equivalence relation on knots saying K ∼ K′ if there’s a homeomorphism of the 3-sphere mapping K to K′.

This is an example of a ‘Borel relation on a Polish space’. A Polish space is a topological space X homeomorphic to a complete separable metric space. A Borel relation is a relation R ⊆ X × X that’s a Borel set. For more about the definitions, click the links.

A lot of classification problems can be thought of this way: you give a Polish space of things you’re trying to classify, and an equivalence relation saying when two count as ‘the same’, which is a Borel relation. We then say a Borel relation R ⊆ X × X is Borel reducible to a Borel relation S ⊆ Y × Y if there’s a Borel function f: X → Y such that

R(x, x′) ⟺ S(f(x), f(x′)) for all x, x′ ∈ X

In this situation people say the classification problem (X, R) can be Borel reduced to the classification problem (Y, S).
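The defining condition of a reduction can be illustrated with a finite toy in Python. On finite sets every function is Borel, so only the “iff” condition is being checked; the sets, relations and function names here are hypothetical examples, not anything from Kulikov’s paper.

```python
from itertools import product


def is_reduction(X, R, S, f):
    """Finite toy check that R(x, x') <=> S(f(x), f(x')) for all x, x' in X."""
    return all(R(x, y) == S(f(x), f(y)) for x, y in product(X, repeat=2))


X = range(12)
same_mod_4 = lambda x, y: x % 4 == y % 4   # R: congruence mod 4 on X
equal = lambda a, b: a == b                # S: equality on Y = {0, 1, 2, 3}

# x -> x % 4 is a reduction: it gives a complete invariant for R.
print(is_reduction(X, same_mod_4, equal, lambda x: x % 4))
# x -> x % 2 is not: it identifies 0 and 2, which R distinguishes.
print(is_reduction(X, same_mod_4, equal, lambda x: x % 2))
```

A reduction to plain equality like the first one is the simplest kind of classification: a complete invariant. Kulikov’s results say that for wild knots even countable models of first-order theories are not enough as invariants.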

This is what Kulikov used to state and prove his result. As far as I can tell, he showed:

1) Equivalence of countable models of any first-order theory with countably many symbols can be Borel reduced to equivalence of (possibly wild) knots.

2) Equivalence of knots is not Borel reducible to the equivalence of countable models of any first-order theory with countably many symbols.

At this point you start noticing that the word ‘logical’ is hiding inside the word ‘topological’.

It’s interesting to see how Kulikov proved his result — his paper is so well-written that you can follow the overall logic without sinking into the weeds of detail.

H. Friedman and L. Stanley showed that equivalence of countable models of any first-order theory with countably many symbols is Borel reducible to equivalence of countable models of a single one of these theories: the theory of linear orders.

This is pretty surprising to me: I wouldn’t have guessed that classifying countable linear orders was maximally difficult in this sense.

But thanks to this, to prove 1) Kulikov just needs to show:

1^\prime) Equivalence of countable linear orders can be Borel reduced to equivalence of (possibly wild) knots.

For 2), he uses a general result due to Hjorth. Suppose that a Polish group G (a group in the category of Polish spaces) acts on a Polish space X in a ‘turbulent’ way: some sort of highly chaotic way, defined in Kulikov’s paper. Then the Borel relation

x ∼ x′ ⟺ ∃ g ∈ G such that g x = x′

is not Borel reducible to equivalence of countable models of any first-order theory with countably many symbols!

So Kulikov just needs to show

2′) The group of homeomorphisms of the 3-sphere acts in a turbulent way on the space of topological embeddings of the circle in the 3-sphere.
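To make the shape of the orbit relation x ∼ x′ ⟺ ∃ g ∈ G such that g x = x′ concrete, here is a toy sketch in Julia with a finite group: ℤ/n acting on length-n bit strings by rotation. A finite group acting on a finite set is nowhere near turbulent, so this says nothing about Hjorth’s theorem. It only shows what an orbit equivalence relation looks like. The function names are hypothetical. The orbits here are the classical binary ‘necklaces’.

```julia
# Toy orbit equivalence relation: the cyclic group ℤ/n acts on length-n
# bit strings by rotation, and x ∼ y iff some rotation g sends x to y.
rotate(x::Vector{Int}, g::Int) = circshift(x, g)

orbit_equivalent(x, y) = any(rotate(x, g) == y for g in 0:length(x)-1)

# Count orbits (binary necklaces) by keeping one representative per class.
function count_orbits(n::Int)
    reps = Vector{Vector{Int}}()
    for m in 0:(2^n - 1)
        s = digits(m, base = 2, pad = n)   # the bits of m as a length-n string
        any(orbit_equivalent(s, r) for r in reps) || push!(reps, s)
    end
    return length(reps)
end

count_orbits(4)   # 6
count_orbits(6)   # 14
```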

Connections to category theory

How do, or could, categorical logicians think about questions like this?

For example, what do categorical logicians think about the problem of classifying countable linear orders? Is there a sense, similar to the one sketched above, in which it’s maximally hard among some class of problems? Or does dropping the axiom of choice dramatically change its status?

Also: what do they think about the topology of the space X of countable models of a first-order theory (which Kulikov says is homeomorphic to the Cantor set)?

I imagine X is the space of objects of a topological groupoid, where the isomorphisms are the usual isomorphisms of models. But Kulikov merely equips X with the relation of “isomorphicness”. That’s how he makes it into a Polish space with a Borel equivalence relation.

Similarly, since we have the Polish group of homeomorphisms of S³ acting on the Polish space of embeddings K: S¹ → S³, the action groupoid of this action should be a ‘Polish groupoid’. But Kulikov instead treats it as a Polish space with a Borel equivalence relation.

May 25, 2024

Terence Tao: On product representations of squares

I’ve just uploaded to the arXiv my paper “On product representations of squares”. This short paper answers (in the negative) a (somewhat obscure) question of Erdős. Namely, for any {k \geq 1}, let {F_k(N)} be the size of the largest subset {A} of {\{1,\dots,N\}} with the property that no {k} distinct elements of {A} multiply to a square. In a paper by Erdős, Sárközy, and Sós, the following asymptotics were shown for fixed {k}:

  • {F_1(N) = (1+o(1)) N}.
  • {F_2(N) = (\frac{6}{\pi^2} + o(1)) N}.
  • {F_3(N) = (1+o(1)) N}.
  • {F_{4k}(N) = (1+o(1)) \frac{N}{\log N}} for {k \geq 1}.
  • {F_{4k+2}(N) = (\frac{3}{2}+o(1)) \frac{N}{\log N}} for {k \geq 1}.
  • {(\log 2 + o(1)) N \leq F_{2k+1}(N) \leq N} for {k \geq 2}.
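These asymptotics concern N → ∞, but for very small N the quantity F_k(N) can be computed by brute force, which makes the definition concrete. Two easy sanity checks: for k = 1 the extremal set is just the non-squares, so F_1(N) = N − ⌊√N⌋. For k = 2, two numbers multiply to a square iff they have the same squarefree part, so F_2(N) counts the squarefree numbers up to N. The following Julia sketch uses hypothetical helper names and is exponential in N, so it is only meant for toy sizes.

```julia
issquare(m::Integer) = isqrt(m)^2 == m

# Does A contain k distinct elements (at positions ≥ start) whose product,
# times acc, is a perfect square?  Plain recursion over k-element subsets.
function has_square_product(A::Vector{Int}, k::Int, start::Int = 1, acc::Int = 1)
    k == 0 && return issquare(acc)
    for i in start:length(A)
        has_square_product(A, k - 1, i + 1, acc * A[i]) && return true
    end
    return false
end

# F_k(N) by exhaustive search over all 2^N subsets of {1, ..., N}.
function F(k::Int, N::Int)
    best = 0
    for mask in 0:(2^N - 1)
        A = [n for n in 1:N if (mask >> (n - 1)) & 1 == 1]
        if length(A) > best && !has_square_product(A, k)
            best = length(A)
        end
    end
    return best
end

F(1, 10)   # 7 = 10 - 3: drop the squares 1, 4, 9
F(2, 10)   # 7: one element for each squarefree part 1, 2, 3, 5, 6, 7, 10
```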
Thus the asymptotics for {F_k(N)} for odd {k \geq 5} were not completely settled. Erdős asked if one had {F_k(N) = (1-o(1)) N} for odd {k \geq 5}. The main result of this paper is that this is not the case; that is to say, there exists {c_k>0} such that any subset {A} of {\{1,\dots,N\}} of cardinality at least {(1-c_k) N} will contain {k} distinct elements that multiply to a square, if {N} is large enough. In fact, the argument works for all {k \geq 4}, although it is not new in the even case. I will also note that there are now quite sharp upper and lower bounds on {F_k} for even {k \geq 4}, using methods from graph theory: see this recent paper of Pach and Vizer for the latest results in this direction. Thanks to the results of Granville and Soundararajan, we know that the constant {c_k} cannot exceed the Hall-Montgomery constant

\displaystyle  1 - \log(1+\sqrt{e}) + 2 \int_1^{\sqrt{e}} \frac{\log t}{t+1}\ dt = 0.171500\dots

and I (very tentatively) conjecture that this is in fact the optimal value for this constant. This looks somewhat difficult, but a more feasible conjecture would be that the {c_k} asymptotically approach the Hall-Montgomery constant as {k \rightarrow \infty}, since the aforementioned result of Granville and Soundararajan morally corresponds to the {k=\infty} case.
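The numerical value of the Hall-Montgomery constant quoted above is easy to confirm. Here is a minimal Julia sketch that evaluates the integral with composite Simpson’s rule. The quadrature method and the names are choices made for this check, not anything from the paper.

```julia
# Composite Simpson's rule on [a, b] with an even number n of subintervals.
function simpson(f, a, b; n = 1_000)
    h = (b - a) / n
    s = f(a) + f(b)
    for i in 1:n-1
        s += (isodd(i) ? 4 : 2) * f(a + i * h)
    end
    return s * h / 3
end

sqrt_e = exp(0.5)
hall_montgomery = 1 - log(1 + sqrt_e) + 2 * simpson(t -> log(t) / (t + 1), 1.0, sqrt_e)
# ≈ 0.1715, in line with the value 0.171500... quoted above
```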

In the end, the argument turned out to be relatively simple; no advanced results from additive combinatorics, graph theory, or analytic number theory were required. I found it convenient to proceed via the probabilistic method (although the more combinatorial technique of double counting would also suffice here). The main idea is to generate a tuple {(\mathbf{n}_1,\dots,\mathbf{n}_k)} of distinct random natural numbers in {\{1,\dots,N\}} which multiply to a square, and which are reasonably uniformly distributed throughout {\{1,\dots,N\}}, in that each individual number {1 \leq n \leq N} is attained by one of the random variables {\mathbf{n}_i} with a probability of {O(1/N)}. If one can find such a distribution, then if the density of {A} is sufficiently close to {1}, it will happen with positive probability that each of the {\mathbf{n}_i} will lie in {A}, giving the claim.
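The last step is a union bound. Write C for the implied constant in the O(1/N) bound (notation introduced here for this sketch). Since |A| ≥ (1 − c_k)N, the complement of A in {1, …, N} has at most c_k N elements, so

```latex
\mathbb{P}\bigl(\exists\, i:\ \mathbf{n}_i \notin A\bigr)
  \;\le\; \sum_{i=1}^{k} \sum_{n \in \{1,\dots,N\} \setminus A} \mathbb{P}(\mathbf{n}_i = n)
  \;\le\; k \cdot c_k N \cdot \frac{C}{N} \;=\; k C c_k .
```

If c_k < 1/(kC), this probability is less than 1. So with positive probability all of the distinct numbers n_1, …, n_k lie in A, and A contains k distinct elements multiplying to a square.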

When {k=3}, this strategy cannot work, as it would contradict the results of Erdős, Sárközy, and Sós. The reason can be explained as follows. The most natural way to generate a triple {(\mathbf{n}_1,\mathbf{n}_2,\mathbf{n}_3)} of random natural numbers in {\{1,\dots,N\}} which multiply to a square is to set

\displaystyle  \mathbf{n}_1 := \mathbf{d}_{12} \mathbf{d}_{13}, \mathbf{n}_2 := \mathbf{d}_{12} \mathbf{d}_{23}, \mathbf{n}_3 := \mathbf{d}_{13} \mathbf{d}_{23}

for some random natural numbers {\mathbf{d}_{12}, \mathbf{d}_{13}, \mathbf{d}_{23}}. But if one wants all these numbers to have magnitude {\asymp N}, one sees on taking logarithms that one would need

\displaystyle  \log \mathbf{d}_{12} + \log \mathbf{d}_{13}, \log \mathbf{d}_{12} + \log \mathbf{d}_{23}, \log \mathbf{d}_{13} + \log \mathbf{d}_{23} = \log N + O(1)

which by elementary linear algebra forces

\displaystyle  \log \mathbf{d}_{12}, \log \mathbf{d}_{13}, \log \mathbf{d}_{23} = \frac{1}{2} \log N + O(1),

so in particular each of the {\mathbf{n}_i} would have a factor comparable to {\sqrt{N}}. However, it follows from known results on the “multiplication table problem” (how many distinct integers are there in the {n \times n} multiplication table?) that most numbers up to {N} do not have a factor comparable to {\sqrt{N}}. (Quick proof: by the Hardy–Ramanujan law, a typical number of size {N} or of size {\sqrt{N}} has {(1+o(1)) \log\log N} prime factors, whereas a product of two typical numbers of size {\sqrt{N}} has {(2+o(1)) \log\log N} prime factors; hence typically a number of size {N} will not factor into two factors of size {\sqrt{N}}.) So the above strategy cannot work for {k=3}.
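The ‘elementary linear algebra’ in the k = 3 argument amounts to inverting a 3 × 3 matrix. Here is a minimal Julia sketch that measures each log d_ij in units of log N and drops the O(1) terms. The matrix name is just for illustration.

```julia
using LinearAlgebra

# Unknowns (log d12, log d13, log d23) in units of log N; one row per n_i.
M3 = [1 1 0;   # n1 = d12 d13
      1 0 1;   # n2 = d12 d23
      0 1 1]   # n3 = d13 d23

rank(M3)            # 3: as many independent equations as unknowns
M3 \ ones(3)        # the unique solution: each log d_ij is (1/2) log N
```

Since M3 is invertible, bounded errors on the right-hand side only produce bounded errors in the solution, which is why the conclusion holds up to O(1).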

However, the situation changes for larger {k}. For instance, for {k=4}, we can try the same strategy with the ansatz

\displaystyle \mathbf{n}_1 = \mathbf{d}_{12} \mathbf{d}_{13} \mathbf{d}_{14}; \quad \mathbf{n}_2 = \mathbf{d}_{12} \mathbf{d}_{23} \mathbf{d}_{24}; \quad \mathbf{n}_3 = \mathbf{d}_{13} \mathbf{d}_{23} \mathbf{d}_{34}; \quad \mathbf{n}_4 = \mathbf{d}_{14} \mathbf{d}_{24} \mathbf{d}_{34}.
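Here is a quick sanity check that the k = 4 ansatz always produces a square. Each d_ij appears in exactly two of the n_i, so n_1 n_2 n_3 n_4 is the square of the product of all six d_ij. The six values below are arbitrary illustrative choices in Julia, not taken from the paper.

```julia
# Arbitrary illustrative values for the six d_ij, keyed by (i, j) with i < j.
d = Dict((1, 2) => 6, (1, 3) => 5, (1, 4) => 7,
         (2, 3) => 11, (2, 4) => 2, (3, 4) => 3)
dval(i, j) = d[(min(i, j), max(i, j))]

# n_i is the product of d_ij over the three indices j ≠ i.
n = [prod(dval(i, j) for j in 1:4 if j != i) for i in 1:4]   # [210, 132, 165, 42]

P = prod(n)
isqrt(P)^2 == P                # true: n_1 n_2 n_3 n_4 is a perfect square
isqrt(P) == prod(values(d))    # true: its square root is the product of all d_ij
```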

Whereas before there were three (approximate) equations constraining three unknowns, now we would have four equations and six unknowns, and so we no longer have strong constraints on any of the {\mathbf{d}_{ij}}. So in principle we now have a chance to find a suitable random choice of the {\mathbf{d}_{ij}}. The most significant remaining obstacle is the Hardy–Ramanujan law: since the {\mathbf{n}_i} typically have {(1+o(1))\log\log N} prime factors, it is natural in this {k=4} case to choose each {\mathbf{d}_{ij}} to have {(\frac{1}{3}+o(1)) \log\log N} prime factors. As it turns out, if one does this (basically by requiring each prime {p \leq N^{\varepsilon^2}} to divide {\mathbf{d}_{ij}} with an independent probability of about {\frac{1}{3p}}, for some small {\varepsilon>0}, and then also adding in one large prime to bring the magnitude of the {\mathbf{n}_i} to be comparable to {N}), the calculations all work out, and one obtains the claimed result.
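The same bookkeeping for k = 4 shows why the constraints become weak. This is again a Julia sketch in units of log N with the O(1) terms dropped. The 4 × 6 system has full rank 4, so its solutions form a 2-dimensional family rather than a single point.

```julia
using LinearAlgebra

# Unknowns (log d12, log d13, log d14, log d23, log d24, log d34) in units of log N.
M4 = [1 1 1 0 0 0;   # n1 = d12 d13 d14
      1 0 0 1 1 0;   # n2 = d12 d23 d24
      0 1 0 1 0 1;   # n3 = d13 d23 d34
      0 0 1 0 1 1]   # n4 = d14 d24 d34

rank(M4)                  # 4 independent equations...
size(M4, 2) - rank(M4)    # ...leaving a 2-dimensional family of solutions
```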

May 24, 2024

Matt von Hippel: At Quanta This Week, and Some Bonus Material

When I moved back to Denmark, I mentioned that I was planning to do more science journalism work. The first fruit of that plan is up this week: I have a piece at Quanta Magazine about a perennially trendy topic in physics, the S-matrix.

It’s been great working with Quanta again. They’ve been thorough, attentive to the science, and patient with my still-uncertain life situation. I’m quite likely to have more pieces there in future, and I’ve got ideas cooking with other outlets as well, so stay tuned!

My piece with Quanta is relatively short, the kind of thing they used to label a “blog” rather than say a “feature”. Since the S-matrix is a pretty broad topic, there were a few things I couldn’t cover there, so I thought it would be nice to discuss them here. You can think of this as a kind of “bonus material” section for the piece. So before reading on, read my piece at Quanta first!

Welcome back!

At Quanta I wrote a kind of cartoon of the S-matrix, asking you to think about it as a matrix of probabilities, with rows for input particles and columns for output particles. There are a couple of different simplifications I snuck in there, the pop physicist’s “lies to children”. One, I already flag in the piece: the entries aren’t really probabilities, they’re complex numbers, probability amplitudes.

There’s another simplification that I didn’t have space to flag. The rows and columns aren’t just lists of particles, they’re lists of particles in particular states.

What do I mean by states? A state is a complete description of a particle. A particle’s state includes its energy and momentum, including the direction it’s traveling in. It includes its spin, and the direction of its spin: for example, clockwise or counterclockwise? It also includes any charges, from the familiar electric charge to the color of a quark.

This makes the matrix even bigger than you might have thought. I was already describing an infinite matrix, one where you can have as many columns and rows as you can imagine numbers of colliding particles. But the number of rows and columns isn’t just infinite, it’s uncountable: there are as many rows and columns as there are different numbers you can use for energy and momentum.

For some of you, an uncountably infinite matrix doesn’t sound much like a matrix. But for mathematicians familiar with vector spaces, this is totally reasonable. Even if your matrix is infinite, or even uncountably infinite, it can still be useful to think about it as a matrix.

Another subtlety, which I’m sure physicists will be howling at me about: the Higgs boson is not supposed to be in the S-matrix!

In the article, I alluded to the idea that the S-matrix lets you “hide” particles that only exist momentarily inside of a particle collision. The Higgs is precisely that sort of particle, an unstable particle. And normally, the S-matrix is supposed to only describe interactions between stable particles, particles that can survive all the way to infinity.

In my defense, if you want a nice table of probabilities to put in an article, you need an unstable particle: interactions between stable particles depend on their energy and momentum, sometimes in complicated ways, while a single unstable particle will decay into a reliable set of options.

More technically, there are also contexts in which it’s totally fine to think about an S-matrix between unstable particles, even if it’s not usually how we use the idea.

My piece also didn’t have a lot of room to discuss new developments. I thought at minimum I’d say a bit more about the work of the young people I mentioned. You can think of this as an appetizer: there are a lot of people working on different aspects of this subject these days.

Part of the initial inspiration for the piece was when an editor at Quanta noticed a recent paper by Christian Copetti, Lucía Cordova, and Shota Komatsu. The paper shows an interesting case, where one of the “logical” conditions imposed in the original S-matrix bootstrap doesn’t actually apply. It ended up being too technical for the Quanta piece, but I thought I could say a bit about it, and related questions, here.

Some of the conditions imposed by the original bootstrappers seem unavoidable. Quantum mechanics makes no sense if it doesn’t compute probabilities, and probabilities can’t be negative, or larger than one, so we’d better have an S-matrix that obeys those rules. Causality is another big one: we probably shouldn’t have an S-matrix that lets us send messages back in time and change the past.

Other conditions came from a mixture of intuition and observation. Crossing is a big one here. Crossing tells you that you can take an S-matrix entry with incoming particles, and relate it to a different S-matrix entry with outgoing antiparticles, using techniques from the calculus of complex numbers.

Crossing may seem quite obscure, but after some experience with S-matrices it feels obvious and intuitive. That’s why for an expert, results like the paper by Copetti, Cordova, and Komatsu seem so surprising. What they found was that a particularly exotic type of symmetry, called a non-invertible symmetry, was incompatible with crossing symmetry. They could find consistent S-matrices for theories with these strange non-invertible symmetries, but only if they threw out one of the basic assumptions of the bootstrap.

This was weird, but upon reflection not too weird. In theories with non-invertible symmetries, the behaviors of different particles are correlated together. One can’t treat far away particles as separate, the way one usually does with the S-matrix. So trying to “cross” a particle from one side of a process to another changes more than it usually would, and you need a more sophisticated approach to keep track of it. When I talked to Cordova and Komatsu, they related this to another concept called soft theorems, aspects of which have been getting a lot of attention and funding of late.

In the meantime, others have been trying to figure out where the crossing rules come from in the first place.

There were attempts in the 1970’s to understand crossing in terms of other fundamental principles. They slowed in part because, as the original S-matrix bootstrap was overtaken by QCD, there was less motivation to do this type of work anymore. But they also ran into a weird puzzle. When they tried to use the rules of crossing more broadly, only some of the things they found looked like S-matrices. Others looked like stranger, meaningless calculations.

A recent paper by Simon Caron-Huot, Mathieu Giroux, Holmfridur Hannesdottir, and Sebastian Mizera revisited these meaningless calculations, and showed that they aren’t so meaningless after all. In particular, some of them match well to the kinds of calculations people wanted to do to predict gravitational waves from colliding black holes.

Imagine a pair of black holes passing close to each other, then scattering away in different directions. Unlike particles in a collider, we have no hope of catching the black holes themselves. They’re big classical objects, and they will continue far away from us. We do catch gravitational waves, emitted from the interaction of the black holes.

This different setup turns out to give the problem a very different character. It ends up meaning that instead of the S-matrix, you want a subtly different mathematical object, one related to the original S-matrix by crossing relations. Using crossing, Caron-Huot, Giroux, Hannesdottir and Mizera found many different quantities one could observe in different situations, linked by the same rules that the original S-matrix bootstrappers used to relate S-matrix entries.

The work of these two groups is just some of the work done in the new S-matrix program, but it’s typical of where the focus is going. People are trying to understand the general rules found in the past. They want to know where they came from, and as a consequence, when they can go wrong. They have a lot to learn from the older papers, and a lot of new insights come from diligent reading. But they also have a lot of new insights to discover, based on the new tools and perspectives of the modern day. For the most part, they don’t expect to find a new unified theory of physics from bootstrapping alone. But by learning how S-matrices work in general, they expect to find valuable knowledge no matter how the future goes.

John Baez: Agent-Based Models (Part 11)

Last time I began explaining how to run the Game of Life on our software for stochastic C-set rewriting systems. Remember that a stochastic C-set rewriting system consists of three parts:

• a category C that describes the type of data that’s stochastically evolving in time

• a collection of ‘rewrite rules’ that say how this data is allowed to change

• for each rewrite rule, a ‘timer’ that says the probability that we apply the rule as a function of time.

I explained all this with more mathematical precision in Part 9.

Now let’s return to an example of all this: the Game of Life. To see the code, go here.

Specifying the category C

Last time we specified a category C for the Game of Life. This takes just a tiny bit of code:

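using AlgebraicABMs, Catlab, AlgebraicRewriting

# Extend the schema for symmetric graphs (E, V, src, tgt, inv)
# with an object Life and a map saying which cell each living element marks.
@present SchLifeGraph <: SchSymmetricGraph begin 
  Life::Ob
  live::Hom(Life, V)
end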

This code actually specifies a ‘schema’ for C, as explained last time, and it calls this schema SchLifeGraph. The schema consists of three objects:

E, V, Life

four morphisms:

src: E → V
tgt: E → V
inv: E → E
live: Life → V

and three equations:

src ∘ inv = tgt
tgt ∘ inv = src
inv ∘ inv = 1_E

We can automatically visualize the schema, though this doesn’t show the equations:

An instance of this schema, called a C-set, is a functor F: C → Set. In other words, it’s:

• a set of edges F(E),
• a set of vertices F(V), also called cells in the Game of Life,
• a map F(src): F(E) → F(V) specifying the source of each edge,
• a map F(tgt): F(E) → F(V) specifying the target of each edge,
• a map F(inv): F(E) → F(E) that turns around each edge, switching its source and target, such that turning around an edge twice gives you the original edge again,
• a set F(Life) of living cells, and
• a map F(live): F(Life) → F(V) saying which cells are alive.

More precisely, cells in the image of F(live) are called alive and those not in its image are called dead.
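To make the list above concrete without any libraries, here is a hand-rolled sketch in plain Julia of one such C-set. It is not Catlab’s actual ACSet machinery, and the struct and function names are hypothetical. The C-set is a triangle of three cells with each undirected edge stored in both directions, and cells 1 and 3 alive. The code checks the three equations of the schema.

```julia
# A hypothetical hand-rolled instance F of SchLifeGraph (no Catlab involved).
struct LifeGraphData
    nV::Int             # F(V) = 1:nV, the cells
    src::Vector{Int}    # F(src): F(E) → F(V)
    tgt::Vector{Int}    # F(tgt): F(E) → F(V)
    inv::Vector{Int}    # F(inv): F(E) → F(E)
    live::Vector{Int}   # F(live): F(Life) → F(V)
end

# Check that all maps land in range and the three schema equations hold.
function satisfies_schema(g::LifeGraphData)
    nE = length(g.src)
    (length(g.tgt) == nE && length(g.inv) == nE) || return false
    all(e -> 1 <= g.inv[e] <= nE, 1:nE) || return false
    all(v -> 1 <= v <= g.nV, vcat(g.src, g.tgt, g.live)) || return false
    return all(1:nE) do e
        g.src[g.inv[e]] == g.tgt[e] &&   # src ∘ inv = tgt
        g.tgt[g.inv[e]] == g.src[e] &&   # tgt ∘ inv = src
        g.inv[g.inv[e]] == e             # inv ∘ inv = 1_E
    end
end

alive(g::LifeGraphData) = Set(g.live)   # cells in the image of F(live)

# Triangle: edges 1, 2, 3 go 1→2, 2→3, 3→1; edges 4, 5, 6 are their reverses.
tri = LifeGraphData(3, [1, 2, 3, 2, 3, 1], [2, 3, 1, 1, 2, 3], [4, 5, 6, 1, 2, 3], [1, 3])
satisfies_schema(tri)   # true
alive(tri)              # the set {1, 3}; cell 2 is dead
```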

Specifying the rewrite rules and timers

Next we’ll specify 3 rewrite rules for the Game of Life, and their timers. The code looks like this; it’s terse, but it will take some time to explain:

# ## Create model by defining update rules

# A cell dies due to underpopulation if it has 
# < 2 living neighbors

underpop = 
  TickRule(:Underpop, to_life, id(Cell); 
           ac=[NAC(living_neighbors(2))]);

# A cell dies due to overpopulation if it has 
# > 3 living neighbors

overpop = 
  TickRule(:Overpop, to_life, id(Cell); 
           ac=[PAC(living_neighbors(4))]);

# A cell is born if it has 3 living neighbors

birth = TickRule(:Birth, id(Cell), to_life; 
                 ac=[PAC(living_neighbors(3; alive=false)),
                     NAC(living_neighbors(4; alive=false)),
                     NAC(to_life)]);

These are the three rewrite rules:

underpop says a vertex in our graph switches from being alive to dead if it has fewer than 2 living neighbors

overpop says a vertex switches from being alive to dead if it has more than 3 living neighbors

birth says a vertex switches from being dead to alive if it has exactly 3 living neighbors.

Each of these rewrite rules comes with a timer that says the rule is applied wherever possible at each tick of the clock. This is specified by invoking TickRule, which I’ll explain in more detail elsewhere.

In Part 9 I said a bit about what a ‘rewrite rule’ actually is. I said it’s a diagram of C-sets

L ↩ I → R

where ℓ: I ↪ L is monic and r: I → R. The idea is roughly that we can take any C-set, find a map from L into it, and replace that copy of L with a copy of R. This deserves to be explained more clearly, but right now I just want to point out that in our software, we specify each rewrite rule by giving its morphisms ℓ and r.

For example,

underpop = TickRule(:Underpop, to_life, id(Cell);

says that underpop gives a rule where ℓ is a morphism called to_life and r is a morphism called id(Cell). to_life is a way of picking out a living cell, and id(Cell) is a way of picking out a dead cell. So, this rewrite rule kills off a living cell. But I will explain this in more detail later.

Similarly,

overpop = TickRule(:Overpop, to_life, id(Cell);

kills off a living cell, and

birth = TickRule(:Birth, id(Cell), to_life;

makes a dead cell become alive.

But there’s more in the description of each of these rewrite rules, starting with a thing called ac. This stands for application conditions. To give our models more expressivity, we can require that some conditions hold for each rewrite rule to be applied! This goes beyond the framework described in Part 9.

Namely: we can impose positive application conditions, saying that certain patterns must be present for a rewrite rule to be applied. We can also impose negative application conditions, saying that some patterns must not be present. We denote the former by PAC and the latter by NAC. You can see both in our Game of Life example:

# ## Create model by defining update rules

# A cell dies due to underpopulation if it has 
# < 2 living neighbors

underpop = 
  TickRule(:Underpop, to_life, id(Cell); 
           ac=[NAC(living_neighbors(2))]);

# A cell dies due to overpopulation if it has 
# > 3 living neighbors

overpop = 
  TickRule(:Overpop, to_life, id(Cell); 
           ac=[PAC(living_neighbors(4))]);

# A cell is born if it has 3 living neighbors

birth = TickRule(:Birth, id(Cell), to_life; 
                 ac=[PAC(living_neighbors(3; alive=false)),
                     NAC(living_neighbors(4; alive=false)),
                     NAC(to_life)]);

For underpop, the negative application condition says we cannot kill off a cell if it has 2 distinct living neighbors (or more).

For overpop, the positive application condition says we can only kill off a cell if it has 4 distinct living neighbors (or more).

For birth, the positive application condition says we can only bring a cell to life if it has 3 distinct living neighbors (or more), and the negative application conditions say we cannot bring it to life if it has 4 distinct living neighbors (or more) or if it is already alive.
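As a cell-by-cell sanity check, here is a small plain-Julia sketch that encodes these application conditions as predicates on a cell’s state and its number n of distinct living neighbors. It then confirms that, together, they reproduce Conway’s usual rule. The names are hypothetical, and this is not how AlgebraicABMs evaluates PACs and NACs, which it does by looking for pattern matches in the C-set. The sketch also assumes the three rules fire simultaneously at each tick.

```julia
# Application conditions as predicates; `alive` is the cell's current state and
# `n` its number of distinct living neighbors.  The left-hand side of underpop
# and overpop only matches a living cell, and that of birth only a dead one.
underpop_applies(alive, n) = alive && !(n >= 2)               # NAC: 2 living neighbors
overpop_applies(alive, n)  = alive && n >= 4                  # PAC: 4 living neighbors
birth_applies(alive, n)    = !alive && n >= 3 && !(n >= 4)    # PAC 3, NAC 4, NAC alive

# State after one tick if each rule fires wherever its conditions hold.
function next_state(alive, n)
    (underpop_applies(alive, n) || overpop_applies(alive, n)) && return false
    birth_applies(alive, n) && return true
    return alive
end

# Conway's rule: alive next tick iff exactly 3 living neighbors,
# or alive now with exactly 2.
conway(alive, n) = n == 3 || (alive && n == 2)

all(next_state(a, n) == conway(a, n) for a in (false, true), n in 0:8)   # true
```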

There’s a lot more to explain. Don’t be shy about asking questions! But I’ll stop here for now, because I’ve shown you the core aspects of Kris Brown’s code that expresses the Game of Life as a stochastic C-set rewriting system.

May 23, 2024

John Preskill: Film noir and quantum thermo

The Noncommuting-Charges World Tour (Part 4 of 4)

This is the final part of a four-part series covering the recent Perspective on noncommuting charges. I’ve been posting one part every ~5 weeks leading up to my PhD thesis defence. You can find Part 1 here, Part 2 here, and Part 3 here.

In four months, I’ll embark on the adventure of a lifetime—fatherhood.

To prepare, I’ve been honing a quintessential father skill—storytelling. If my son inherits even a fraction of my tastes, he’ll soon develop a passion for film noir detective stories. And really, who can resist the allure of a hardboiled detective, a femme fatale, moody chiaroscuro lighting, and plot twists that leave you reeling? For the uninitiated, here’s a quick breakdown of the genre.

To sharpen my storytelling skills, I’ve decided to channel my inner noir writer and craft this final blog post—the opportunities for future work, as outlined in the Perspective—in that style.

I wouldn’t say film noir needs to be watched in black and white, any more than I’d say jazz needs to be listened to on vinyl. But it adds a charm that’s hard to replicate.

Theft at the Quantum Frontier

Under the dim light of a flickering bulb, private investigator Max Kelvin leaned back in his creaky chair, nursing a cigarette. The steady patter of rain against the window was interrupted by the creak of the office door. In walked trouble. Trouble with a capital T.

She was tall, moving with a confident stride that barely masked the worry lines etched into her face. Her dark hair was pulled back in a tight bun, and her eyes were as sharp as the edges of the papers she clutched in her gloved hand.

“Mr. Kelvin?” she asked, her voice a low, smoky whisper.

“That’s what the sign says,” Max replied, taking a long drag of his cigarette, the ember glowing a fiery red. “What can I do for you, Miss…?”

“Doctor,” she corrected, her tone firm, “Shayna Majidy. I need your help. Someone’s about to scoop my research.”

Max’s eyebrows arched. “Scooped? You mean someone stole your work?”

“Yes,” Shayna said, frustration seeping into her voice. “I’ve been working on noncommuting charge physics, a topic recently highlighted in a Perspective article. But someone has stolen my paper. We need to find who did it before they send it to the local rag, The Ark Hive.”

Max leaned forward, snuffing out his cigarette and grabbing his coat in one smooth motion. “Alright, Dr. Majidy, let’s see where your work might have wandered off to.”

They started their investigation with Joey “The Ant” Guzman, an experimental physicist whose lab was a tangled maze of gleaming equipment. Superconducting qubits, quantum dots, ultracold atoms, quantum optics, and optomechanics cluttered the room, each device buzzing with the hum of cutting-edge science. Joey earned his nickname due to his meticulous and industrious nature, much like an ant in its colony.

Guzman was a prime suspect, Shayna had whispered as they approached. His experiments could validate the predictions of noncommuting charges. “The first test of noncommuting-charge thermodynamics was performed with trapped ions,” she explained, her voice low and tense. “But there’s a lot more to explore—decreased entropy production rates, increased entanglement, to name a couple. There are many platforms to test these results, and Guzman knows them all. It’s a major opportunity for future work.”

Guzman looked up from his work as they entered, his expression guarded. “Can I help you?” he asked, wiping his hands on a rag.

Max stepped forward, his eyes scanning the room. “A rag? I guess you really are a quantum mechanic.” He paused for laughter, but only silence answered. “We’re investigating some missing research,” he said, his voice calm but edged with intensity. “You wouldn’t happen to know anything about noncommuting charges, would you?”

Guzman’s eyes narrowed, a flicker of suspicion crossing his face. “Almost everyone is interested in that right now,” he replied cautiously.

Shayna stepped forward, her eyes boring into Guzman’s. “So what’s stopping you from doing experimental tests? Do you have enough qubits? Long enough decoherence times?”

Guzman shifted uncomfortably but kept his silence. Max took another drag of his cigarette, the smoke curling around his thoughts. “Alright, Guzman,” he said finally. “If you think of anything that might help, you know where to find us.”

As they left the lab, Max turned to Shayna. “He’s hiding something,” he said quietly. “But whether it’s your work or how noisy and intermediate scale his hardware is, we need more to go on.”

Shayna nodded, her face set in grim determination. The rain had stopped, but the storm was just beginning.

I bless the night my mom picked up “Who Framed Roger Rabbit” at Blockbuster. That, along with the criminally underrated “Dog City,” likely ignited my love for the genre.

Their next stop was the dimly lit office of Alex “Last Piece” Lasek, a puzzle enthusiast with a sudden obsession with noncommuting charge physics. The room was a chaotic labyrinth, papers strewn haphazardly, each covered with intricate diagrams and cryptic scrawlings. The stale aroma of old coffee and ink permeated the air.

Lasek was hunched over his desk, scribbling furiously, his eyes darting across the page. He barely acknowledged their presence as they entered. “Noncommuting charges,” he muttered, his voice a gravelly whisper, “they present a fascinating puzzle. They hinder thermalization in some ways and enhance it in others.”

“Last Piece Lasek, I presume?” Max’s voice sliced through the dense silence.

Lasek blinked, finally lifting his gaze. “Yeah, that’s me,” he said, pushing his glasses up the bridge of his nose. “Who wants to know?”

“Max Kelvin, private eye,” Max replied, flicking his card onto the cluttered desk. “And this is Dr. Majidy. We’re investigating some missing research.”

Shayna stepped forward, her eyes sweeping the room like a hawk. “I’ve read your papers, Lasek,” she said, her tone a blend of admiration and suspicion. “You live for puzzles, and this one’s as tangled as they come. How do you plan to crack it?”

Lasek shrugged, leaning back in his creaky chair. “It’s a tough nut,” he admitted, a sly smile playing at his lips. “But I’m no thief, Dr. Majidy. I’m more interested in solving the puzzle than in academic glory.”

As they exited Lasek’s shadowy lair, Max turned to Shayna. “He’s a riddle wrapped in an enigma, but he doesn’t strike me as a thief.”

Shayna nodded, her expression grim. “Then we keep digging. Time’s slipping away, and we’ve got to find the missing pieces before it’s too late.”

Their third stop was the office of Billy “Brass Knuckles,” a classical physicist infamous for his no-nonsense attitude and a knack for punching holes in established theories.

Max’s skepticism was palpable as they entered the office. “He’s a classical physicist; why would he give a damn about noncommuting charges?” he asked Shayna, raising an eyebrow.

Billy, overhearing Max’s question, let out a gravelly chuckle. “It’s not as crazy as it sounds,” he said, his eyes glinting with amusement. “Sure, the noncommutation of observables is at the core of quantum quirks like uncertainty, measurement disturbances, and the Einstein-Podolsky-Rosen paradox.”

Max nodded slowly, “Go on.”

“However,” Billy continued, leaning forward, “classical mechanics also deals with quantities that don’t commute, like rotations around different axes. So, how unique is noncommuting-charge thermodynamics to the quantum realm? What parts of this new physics can we find in classical systems?”

Shayna crossed her arms, a devious smile playing on her lips. “Wouldn’t you like to know?”

“Wouldn’t we all?” Billy retorted, his grin mirroring hers. “But I’m about to retire. I’m not the one sneaking around your work.”

Max studied Billy for a moment longer, then nodded. “Alright, Brass Knuckles. Thanks for your time.”

As they stepped out of the shadowy office and into the damp night air, Shayna turned to Max. “Another dead end?”

Max nodded and lit a cigarette, the smoke curling into the misty air. “Seems so. But the clock’s ticking, and we can’t afford to stop now.”

If you want contemporary takes on the genre, Sin City (2005), Memento (2000), and L.A. Confidential (1997) each deliver in their own distinct ways.

Their fourth suspect, Tony “Munchies” Munsoni, was a specialist in chaos theory and thermodynamics, with an insatiable appetite for both science and snacks.

“Another non-quantum physicist?” Max muttered to Shayna, raising an eyebrow.

Shayna nodded, a glint of excitement in her eyes. “The most thrilling discoveries often happen at the crossroads of different fields.”

Dr. Munsoni looked up from his desk as they entered, setting aside his bag of chips with a wry smile. “I’ve read the Perspective article,” he said, getting straight to the point. “I agree—every chaotic or thermodynamic phenomenon deserves another look under the lens of noncommuting charges.”

Max leaned against the doorframe, studying Munsoni closely.

“We’ve seen how they shake up the Eigenstate Thermalization Hypothesis, monitored quantum circuits, fluctuation relations, and Page curves,” Munsoni continued, his eyes alight with intellectual fervour. “There’s so much more to uncover. Think about their impact on diffusion coefficients, transport relations, thermalization times, out-of-time-ordered correlators, operator spreading, and quantum-complexity growth.”

Shayna leaned in, clearly intrigued. “Which avenue do you think holds the most promise?”

Munsoni’s enthusiasm dimmed slightly, his expression turning regretful. “I’d love to dive into this, but I’m swamped with other projects right now. Give me a few months, and then you can start grilling me.”

Max glanced at Shayna, then back at Munsoni. “Alright, Munchies. If you hear anything or stumble upon any unusual findings, keep us in the loop.”

As they stepped back into the dimly lit hallway, Max turned to Shayna. “I saw his calendar; he’s telling the truth. His schedule is too packed to be stealing your work.”

Shayna’s shoulders slumped slightly. “Maybe. But we’re not done yet. The clock’s ticking, and we’ve got to keep moving.”

Finally, they turned to a pair of researchers dabbling in the peripheries of quantum thermodynamics. One was Twitch Uppity, an expert on non-Abelian gauge theories. The other, Jada LeShock, specialized in hydrodynamics and heavy-ion collisions.

Max leaned against the doorframe, his voice casual but probing. “What exactly are non-Abelian gauge theories?” he asked (setting up the exposition for the Quantum Frontiers reader’s benefit).

Uppity looked up, his eyes showing the weary patience of someone who had explained this concept countless times. “Imagine different particles interacting, like magnets and electric charges,” he began, his voice steady. “We describe the rules for these interactions using mathematical objects called ‘fields.’ These rules are called field theories. Electromagnetism is one example. Gauge theories are a class of field theories where the laws of physics are invariant under certain local transformations. This means that a gauge theory includes more degrees of freedom than the physical system it represents. We can choose a ‘gauge’ to eliminate the extra degrees of freedom, making the math simpler.”

Max nodded slowly, his eyes fixed on Uppity. “Go on.”

“These transformations form what is called a gauge group,” Uppity continued, taking a sip of his coffee. “Electromagnetism is described by the gauge group U(1). Other interactions are described by more complex gauge groups. For instance, quantum chromodynamics, or QCD, uses an SU(3) symmetry and describes the strong force that binds quarks together inside protons and neutrons. QCD is a non-Abelian gauge theory because its gauge group is noncommutative. This leads to many intriguing effects.”

“I see the noncommuting part,” Max stated, trying to keep up. “But, what’s the connection to noncommuting charges in quantum thermodynamics?”

“That’s the golden question,” Shayna interjected, excitement in her voice. “Particle physics uses non-Abelian gauge groups, as in QCD, so it may exhibit phenomena related to noncommuting charges in thermodynamics.”

“May is the keyword,” Uppity replied. “In QCD, the symmetry is local, unlike the global symmetries described in the Perspective. An open question is how much noncommuting-charge quantum thermodynamics applies to non-Abelian gauge theories.”

Max turned his gaze to Jada. “How about you? What are hydrodynamics and heavy-ion collisions?” he asked, setting up more exposition.

Jada dropped her pencil and raised her head. “Hydrodynamics is the study of fluid motion and the forces acting on fluids,” she began. “We focus on large-scale properties, assuming that even if the fluid isn’t in equilibrium as a whole, small regions within it are. Hydrodynamics can explain systems in condensed matter and stages of heavy-ion collisions, that is, collisions between large atomic nuclei at high speeds.”

“Where does the non-Abelian part come in?” Max asked, his curiosity piqued.

“Hydrodynamics researchers have identified specific effects caused by non-Abelian symmetries,” Jada answered. “These include non-Abelian contributions to conductivity, effects on entropy currents, and shortening neutralization times in heavy-ion collisions.”

“Are you looking for more effects due to non-Abelian symmetries?” Shayna asked, her interest clear. “A long-standing question is how heavy-ion collisions thermalize. Maybe the non-Abelian ETH would help explain this?”

Jada nodded, a faint smile playing on her lips. “That’s the hope. But as with all cutting-edge research, the answers are elusive.”

Max glanced at Shayna, his eyes thoughtful. “Let’s wrap this up. We’ve got some thinking to do.”

After hearing from each researcher, Max and Shayna found themselves back at the office. The dim light of the flickering bulb cast long shadows on the walls. Max poured himself a drink. He offered one to Shayna, who declined, her eyes darting around the room, betraying her nerves.

“So,” Max said, leaning back in his chair, the creak of the wood echoing in the silence. “Everyone seems to be minding their own business. Well…” Max paused, taking a slow sip of his drink, “almost everyone.”

Shayna’s eyes widened, a flicker of panic crossing her face. “I’m not sure who you’re referring to,” she said, her voice wavering slightly. “Did you figure out who stole my work?” She took a seat, her discomfort apparent.

Max stood up and began circling Shayna’s chair like a predator stalking its prey. His eyes were sharp, scrutinizing her every move. “I couldn’t help but notice all the questions you were asking, and the way your eyes kept peeking at their desks.”

Shayna sighed, her confident façade cracking under the pressure. “You’re good, Max. Too good… No one stole my work.” Shayna looked down, her voice barely above a whisper. “I read that Perspective article. It mentioned all these promising research avenues. I wanted to see what others were working on so I could get a jump on them.”

Max shook his head, a wry smile playing on his lips. “You tried to scoop the scoopers, huh?”

Shayna nodded, looking somewhat sheepish. “I guess I got a bit carried away.”

Max chuckled, pouring himself another drink. “Science is a tough game, Dr. Majidy. Just make sure next time you play fair.”

As Shayna left the office, Max watched the rain continue to fall outside. His thoughts lingered on the strange case, a world where the race for discovery was cutthroat and unforgiving. But even in the darkest corners of competition, integrity was a prize worth keeping…

That concludes my four-part series on our recent Perspective article. I hope you had as much fun reading them as I did writing them.

n-Category Café 3d Rotations and the 7d Cross Product (Part 1)

There’s a dot product and cross product of vectors in 3 dimensions. But there’s also a dot product and cross product in 7 dimensions obeying a lot of the same identities! There’s nothing really like this in other dimensions.

The following stuff is well-known: the group of linear transformations of \(\mathbb{R}^3\) preserving the dot and cross product is called \(SO(3)\). It consists of rotations. We say \(SO(3)\) has an ‘irreducible representation’ on \(\mathbb{R}^3\) because there’s no linear subspace of \(\mathbb{R}^3\) that’s mapped to itself by every transformation in \(SO(3)\), except for \(\{0\}\) and the whole space.

Ho hum. But here’s something more surprising: it seems that \(SO(3)\) also has an irreducible representation on \(\mathbb{R}^7\) where every transformation preserves the dot product and cross product in 7 dimensions!

That’s right: no typo there. There is not an irreducible representation of \(SO(7)\) on \(\mathbb{R}^7\) that preserves the dot product and cross product. Preserving the dot product is easy. But the cross product in 7 dimensions is a strange thing that breaks rotation symmetry.

There is, apparently, an irreducible representation of the much smaller group \(SO(3)\) on \(\mathbb{R}^7\) that preserves the dot and cross product. But I only know this because people say Dynkin proved it! More technically, it seems Dynkin said there’s an \(SO(3)\) subgroup of \(\mathrm{G}_2\) for which the irreducible representation of \(\mathrm{G}_2\) on \(\mathbb{R}^7\) remains irreducible when restricted to this subgroup. I want to see one explicitly.

We can get the dot and cross product in 3 dimensions by taking the space of imaginary quaternions, which is 3 dimensional, and defining

\[ v \cdot w = -\frac{1}{2}(vw + wv), \qquad v \times w = \frac{1}{2}(vw - wv) \]

The multiplication on the right-hand side of these formulas is the usual quaternion product.
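To make the quaternion formulas concrete, here is a minimal Python sketch, assuming quaternions are stored as (real, i, j, k) tuples; the helper names `qmul`, `imaginary`, `dot` and `cross` are hypothetical, chosen for illustration. It embeds ordinary 3-vectors as imaginary quaternions and checks that the two formulas reproduce the familiar dot and cross product.

```python
def qmul(p, q):
    """Hamilton product of quaternions stored as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def imaginary(v):
    """Embed a 3-vector as the imaginary quaternion v[0] i + v[1] j + v[2] k."""
    return (0.0, *v)


def dot(v, w):
    """v . w = -(1/2)(vw + wv); the result is purely real, so keep the real part."""
    vw, wv = qmul(imaginary(v), imaginary(w)), qmul(imaginary(w), imaginary(v))
    return -0.5 * (vw[0] + wv[0])


def cross(v, w):
    """v x w = (1/2)(vw - wv); the result is purely imaginary, so keep the i, j, k parts."""
    vw, wv = qmul(imaginary(v), imaginary(w)), qmul(imaginary(w), imaginary(v))
    return tuple(0.5 * (x - y) for x, y in zip(vw[1:], wv[1:]))


print(dot((1, 2, 3), (4, 5, 6)))    # 32.0
print(cross((1, 2, 3), (4, 5, 6)))  # (-3.0, 6.0, -3.0)
```

Both results agree with the textbook component formulas for the dot and cross product in \(\mathbb{R}^3\).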

We can get the dot and cross product in 7 dimensions using formulas that look just the same! But we start with the space of imaginary octonions, which is 7 dimensional, and we use the octonion product.

In both cases we get a ‘vector product algebra’. A vector product algebra is a finite-dimensional real vector space \(V\) with an inner product I’ll call the dot product and denote by

\[ \cdot \colon V^2 \to \mathbb{R} \]

together with a bilinear operation I’ll call the cross product

\[ \times \colon V^2 \to V \]

obeying three identities:

\[ u \times v = -v \times u \]

\[ u \cdot (v \times w) = v \cdot (w \times u) \]

\[ (u \times v) \times u = (u \cdot u)\, v - (u \cdot v)\, u \]

These imply a bunch more identities.
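The same formulas can be checked in 7 dimensions. The sketch below is only one possible construction, assuming the octonions are built from pairs of quaternions with the Cayley–Dickson rule \((a,b)(c,d) = (ac - \bar{d} b,\; da + b\bar{c})\); other sign conventions give isomorphic algebras. It defines the dot and cross product on imaginary octonions by the formulas above and tests the three vector product algebra identities on random vectors in \(\mathbb{R}^7\). All function names are hypothetical.

```python
import random


def qmul(p, q):
    """Hamilton product of quaternions stored as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qconj(q):
    """Quaternion conjugate: negate the i, j, k parts."""
    return (q[0], -q[1], -q[2], -q[3])


def omul(x, y):
    """Octonion product on 8-tuples viewed as pairs of quaternions (Cayley-Dickson):
    (a, b)(c, d) = (ac - conj(d) b, d a + b conj(c))."""
    a, b, c, d = x[:4], x[4:], y[:4], y[4:]
    first = tuple(s - t for s, t in zip(qmul(a, c), qmul(qconj(d), b)))
    second = tuple(s + t for s, t in zip(qmul(d, a), qmul(b, qconj(c))))
    return first + second


def imaginary(v):
    """Embed a 7-vector as an imaginary octonion (real part 0)."""
    return (0.0, *v)


def dot7(v, w):
    vw, wv = omul(imaginary(v), imaginary(w)), omul(imaginary(w), imaginary(v))
    return -0.5 * (vw[0] + wv[0])


def cross7(v, w):
    vw, wv = omul(imaginary(v), imaginary(w)), omul(imaginary(w), imaginary(v))
    return tuple(0.5 * (s - t) for s, t in zip(vw[1:], wv[1:]))


def close(p, q, tol=1e-9):
    return all(abs(s - t) < tol for s, t in zip(p, q))


def check_vector_product_identities(trials=200, seed=0):
    """Test the three defining identities on random vectors in R^7."""
    rng = random.Random(seed)
    for _ in range(trials):
        u, v, w = ([rng.uniform(-1, 1) for _ in range(7)] for _ in range(3))
        # the octonionic dot product is the ordinary Euclidean one
        assert abs(dot7(u, v) - sum(s * t for s, t in zip(u, v))) < 1e-9
        # u x v = -v x u
        assert close(cross7(u, v), [-s for s in cross7(v, u)])
        # u . (v x w) = v . (w x u)
        assert abs(dot7(u, cross7(v, w)) - dot7(v, cross7(w, u))) < 1e-9
        # (u x v) x u = (u . u) v - (u . v) u
        rhs = [dot7(u, u) * vi - dot7(u, v) * ui for ui, vi in zip(u, v)]
        assert close(cross7(cross7(u, v), u), rhs)
    return True


print(check_vector_product_identities())  # True
```

Random testing is of course no proof, but it is a quick way to catch a wrong sign convention in the octonion product.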

You can get a vector product algebra from a normed division algebra by taking the subspace of imaginary elements, namely those orthogonal to \(1\), and defining a dot and cross product using the formulas above. You can also reverse this process. Since there are only four normed division algebras, \(\mathbb{R}\), \(\mathbb{C}\), \(\mathbb{H}\) and \(\mathbb{O}\), there are only four vector product algebras! But you can also run this argument backwards, which is nice because there’s a great string diagram proof that there are only four vector product algebras.

The four vector product algebras have dimensions 0, 1, 3, and 7. But only the last two are interesting, since in the first two the cross product is zero.

In fact the category of normed division algebras and algebra homomorphisms (which automatically preserve the inner product these algebras have) is equivalent to the category of vector product algebras. Thus the group of automorphisms of the 7-dimensional vector product algebra is isomorphic to the group of automorphisms of \(\mathbb{O}\). This group is called \(\mathrm{G}_2\).

Recently on Mastodon Paul Schwahn wrote:

The compact Lie group \(\mathrm{G}_2\), usually defined as automorphism group of the octonion algebra \(\mathbb{O}\), has (up to conjugacy) three maximal connected subgroups:

  • the subgroup preserving the algebra of quaternions \(\mathbb{H} \subset \mathbb{O}\), which is isomorphic to \(SO(4)\),
  • the subgroup preserving some imaginary element like \(i\), which is isomorphic to \(SU(3)\),
  • the subgroup \(SO(3)_{\text{irr}}\) given by the image of the irreducible, faithful 7-dimensional real representation of \(SO(3)\). This representation may be realized as the space of harmonic cubic homogeneous polynomials on \(\mathbb{R}^3\), or if you are a chemist, the space of \(f\)-orbital wavefunctions.

Now I wonder whether \(SO(3)_{\text{irr}}\) also has some interpretation in terms of the octonions. What irreducible action of \(SO(3)\) on the imaginary octonions is there?

@johncarlosbaez, do you perhaps have an idea?

Alas, I’m a bit stuck!

I know the compact Lie group \(\mathrm{G}_2\) acts irreducibly on the 7-dimensional space \(\mathrm{Im}(\mathbb{O})\) of imaginary octonions. In fact it’s precisely the group of linear transformations that preserves the cross product on \(\mathrm{Im}(\mathbb{O})\), and all these automatically preserve the dot product (by the argument here).

But I don’t know a concrete example of an \(SO(3)\) subgroup of \(\mathrm{G}_2\) that acts irreducibly on \(\mathrm{Im}(\mathbb{O})\). Apparently they’re all conjugate, so we can pick any one and call it \(SO(3)_{\text{irr}}\). Can anyone here describe one concretely?

I do know how to get \(SO(3)\) to act irreducibly on a 7-dimensional space. It’s called the spin-3 representation of \(SO(3)\), or more precisely it’s the real form of that. Up to isomorphism this is the only irreducible representation of \(SO(3)\) on a 7d real vector space, so the representation of \(SO(3)_{\text{irr}} \subset \mathrm{G}_2\) on \(\mathrm{Im}(\mathbb{O})\) must be this.
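As a quick sanity check on the dimension count (a standard computation, not taken from the post), the harmonic homogeneous polynomials of degree \(\ell\) on \(\mathbb{R}^3\) can be counted using the fact that the Laplacian maps homogeneous polynomials of degree \(\ell\) onto those of degree \(\ell - 2\):

\[
\dim \mathcal{H}_\ell = \dim \mathcal{P}_\ell - \dim \mathcal{P}_{\ell-2} = \binom{\ell+2}{2} - \binom{\ell}{2} = 2\ell + 1,
\qquad
\dim \mathcal{H}_3 = 10 - 3 = 7.
\]

Here \(\mathcal{P}_\ell\) denotes the homogeneous degree-\(\ell\) polynomials in three variables and \(\mathcal{H}_\ell = \ker \Delta\) its harmonic subspace. Since \(2\ell + 1\) is also the dimension of the spin-\(\ell\) representation, \(\ell = 3\) gives exactly the 7 dimensions of \(\mathrm{Im}(\mathbb{O})\).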

There may be an explicit description of \(SO(3)_{\text{irr}}\) in here:

  • E. B. Dynkin, Semisimple subalgebras of semisimple Lie algebras, American Mathematical Society Translations, Series 2, Volume 6, 1957.

But I haven’t dug into this text yet. I’ll try. But if you happen to know such an explicit description, please tell me!

May 22, 2024

Robert HellingWhat happens to particles after they have been interacting according to Bohm?

 Once more, I am trying to better understand the Bohmian or pilot wave approach to quantum mechanics. And I came across this technical question, which I have not been able to successfully answer from the literature:

Consider a particle, described by a wave function \(\psi(x)\) and a Bohmian position \(q\) that both happily evolve in time according to the Schrödinger equation and the Bohmian equation of motion along the flow field. Now, at some point in time, the (actual) position of that particle gets recorded, either using a photographic plate or by flying through a bubble chamber or similar.
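For readers who want to see the Bohmian equation of motion in action before any recording happens, here is a minimal sketch for one free particle. It assumes units with \(\hbar = m = 1\), a freely spreading Gaussian packet with zero mean momentum, the guidance equation \(\dot q = (\hbar/m)\,\mathrm{Im}(\psi'(q)/\psi(q))\) evaluated with a finite difference, and a plain RK4 integrator; all names and parameter values are hypothetical. For this packet the trajectory has the known closed form \(q(t) = q_0\,\sigma(t)/\sigma_0\), which the code uses as a check.

```python
import cmath
import math

# Units with hbar = m = 1 and initial width SIGMA0 = 1 (assumptions for this sketch).
HBAR, MASS, SIGMA0 = 1.0, 1.0, 1.0


def psi(x, t):
    """Freely spreading Gaussian packet with zero mean momentum.
    The normalization is omitted because it cancels in the guidance equation."""
    s = 1 + 1j * HBAR * t / (2 * MASS * SIGMA0**2)
    return cmath.exp(-x * x / (4 * SIGMA0**2 * s)) / cmath.sqrt(s)


def bohm_velocity(q, t, h=1e-5):
    """Guidance equation dq/dt = (hbar/m) Im(psi'(q)/psi(q)), via a central difference."""
    dpsi = (psi(q + h, t) - psi(q - h, t)) / (2 * h)
    return HBAR / MASS * (dpsi / psi(q, t)).imag


def bohm_trajectory(q0, t_end, steps=1000):
    """Integrate the Bohmian position from q0 at t = 0 with classical RK4."""
    dt = t_end / steps
    q, t = q0, 0.0
    for _ in range(steps):
        k1 = bohm_velocity(q, t)
        k2 = bohm_velocity(q + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = bohm_velocity(q + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = bohm_velocity(q + dt * k3, t + dt)
        q += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return q


def exact_trajectory(q0, t):
    """Closed form for this packet: the position scales with the packet width."""
    return q0 * math.sqrt(1 + (HBAR * t / (2 * MASS * SIGMA0**2)) ** 2)


print(bohm_trajectory(1.0, 4.0), exact_trajectory(1.0, 4.0))
```

This only covers the evolution up to the recording; the question in this post is precisely which \(q\) to feed back into such an integrator afterwards.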

Unless I am mistaken, following the "having a position is the defining property of a particle" mantra, what is getting recorded is \(q\). After all, the fact that exactly one spot on a photographic plate gets dark was the original motivation for introducing the particle position denoted by \(q\). So far, so good (I hope).

My question, however, is: What happens next? What value of \(q\) am I supposed to take for the further time evolution? I see three possibilities, plus a catch-all:

  1. I use the \(q\) that was recorded.
  2. Thanks to the recording, the wave function collapses to an appropriate eigenstate (possibly my measurement was not exact and I just inferred that the particle is inside some interval; then the wave function only gets projected onto that interval), and thanks to the interaction all I can know is that \(q\) is then randomly distributed according to \(|P\psi|^2\) (where \(P\) is the projector) ("new equilibrium").
  3. Anything can happen, depending on the detailed inner workings and degrees of freedom of the recording device; after all, the Bohmian flow equation is non-local and involves all degrees of freedom in the universe.
  4. Something else

All three sound somewhat reasonable, but upon further inspection, all of them have drawbacks. If option 1 were the case, the recording would just have prepared the position \(q\) for the further evolution. Allowing this opens the door to faster-than-light signalling, as I explained before in this paper. Option 2 gives up the deterministic nature of the theory and allows for random jumps of the "true" position of the particle. This is even worse for option 3: of course, you can always say this and think you are safe. But if there are other particles beyond the one recorded and their wave functions are entangled, option 3 completely gives up on making any prediction about the future of those other particles, too. Note that more orthodox interpretations of quantum mechanics (like Copenhagen, whatever you understand under this name) do make very precise predictions about those other particles after an entangled one has been measured. So that would be a shortcoming of the Bohmian approach.
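To make option 2 concrete, here is a toy sketch (not a claim about how Bohmian mechanics must work) of what "a fresh \(q\) distributed according to \(|P\psi|^2\)" would mean computationally. It assumes a hypothetical Gaussian \(|\psi|^2\) of unit width centred at 0, a recorded interval \([0.5, 2]\), and plain rejection sampling; the function names are illustrative.

```python
import math
import random


def born_density(x):
    """|psi(x)|^2 for a hypothetical Gaussian packet of unit width centred at 0 (unnormalized)."""
    return math.exp(-x * x / 2)


def resample_after_recording(a, b, rng, density=born_density, bound=1.0):
    """Toy version of option 2: after learning that q lies in [a, b], draw a fresh q
    from |P psi|^2, where P projects onto [a, b]. Uses rejection sampling, so
    `bound` must be at least the maximum of `density` on [a, b]."""
    while True:
        x = rng.uniform(a, b)
        if rng.random() * bound <= density(x):
            return x


rng = random.Random(42)
samples = [resample_after_recording(0.5, 2.0, rng) for _ in range(20000)]
print(min(samples), max(samples), sum(samples) / len(samples))
```

Every call returns a different position for the same wave function and the same record, which is exactly the loss of determinism the post points out.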

I am honestly interested in the answer to this question. So please comment if you know or have an opinion!

May 21, 2024

Clifford JohnsonWhen Worlds Collide…

This morning I had a really fantastic meeting with some filmmakers about scientific aspects of the visuals (and other content) for a film to appear on your screens one day, and also discussed finding time to chat with one of the leads in order to help them get familiar with aspects of the world (and perhaps mindset) of a theoretical physicist. (It was part of a long series of very productive meetings about which I can really say nothing more at the current time, but I'm quite sure you'll hear about this film in the fullness of time.)

Then a bit later I had a chat with my wife about logistical aspects of the day so that she can make time to go down to Los Angeles and do an audition for a role in something. So far, so routine, and I carried on with some computations I was doing (some lovely clarity had arrived earlier and various pieces of a puzzle fell together marvellously)...

But then, a bit later in the morning while doing a search, I stumbled upon some mention of the recent Breakthrough Prize ceremony, and found the video below [...]

The post When Worlds Collide… appeared first on Asymptotia.

Scott Aaronson Openness on OpenAI

I am, of course, sad that Jan Leike and Ilya Sutskever, the two central people who recruited me to OpenAI and then served as my “bosses” there—two people for whom I developed tremendous admiration—have both now resigned from the company. Ilya’s resignation followed the board drama six months ago, but Jan’s resignation last week came as a shock to me and others. The Superalignment team, which Jan and Ilya led and which I was part of, is being split up and merged into other teams at OpenAI.

See here for Ilya’s parting statement, and here for Jan’s. See here for Zvi Mowshowitz’s perspective and summary of reporting on these events. For additional takes, see pretty much the entire rest of the nerd Internet.

As for me? My two-year leave at OpenAI was scheduled to end this summer anyway. It seems pretty clear that I ought to spend my remaining months at OpenAI simply doing my best for AI safety—for example, by shepherding watermarking toward deployment. After a long delay, I’m gratified that interest in watermarking has spiked recently, not only within OpenAI and other companies but among legislative bodies in the US and Europe.

And afterwards? I’ll certainly continue thinking about how AI is changing the world and how (if at all) we can steer its development to avoid catastrophes, because how could I not think about that? I spent 15 years mostly avoiding the subject, and that now seems like a huge mistake, and probably like enough of that mistake for one lifetime.

So I’ll continue looking for juicy open problems in complexity theory that are motivated by interpretability, or scalable oversight, or dangerous capability evaluations, or other aspects of AI safety—I’ve already identified a few such problems! And without giving up on quantum computing (because how could I?), I expect to reorient at least some of my academic work toward problems at the interface of theoretical computer science and AI safety, and to recruit students who want to work on those problems, and to apply for grants about them. And I’ll presumably continue giving talks about this stuff, and doing podcasts and panels and so on—anyway, as long as people keep asking me to!

And I’ll be open to future sabbaticals or consulting arrangements with AI organizations, like the one I’ve done at OpenAI. But I expect that my main identity will always be as an academic. Certainly I never want to be in a position where I have to speak for an organization rather than myself, or censor what I can say in public about the central problems I’m working on, or sign a nondisparagement agreement or anything of the kind.

I can tell you this: in two years at OpenAI, hanging out at the office and meeting the leadership and rank-and-file engineers, I never once found a smoke-filled room where they laugh at all the rubes who take the talk about “safety” and “alignment” seriously. While my interactions were admittedly skewed toward safetyists, the OpenAI folks I met were invariably smart and earnest and dead serious about the mission of getting AI right for humankind.

It’s more than fair for outsiders to ask whether that’s enough, whether even good intentions can survive bad incentives. It’s likewise fair of them to ask: what fraction of compute and other resources ought to be set aside for alignment research? What exactly should OpenAI do on alignment going forward? What should governments force them and other AI companies to do? What should employees and ex-employees be allowed, or encouraged, to share publicly?

I don’t know the answers to these questions, but if you do, feel free to tell me in the comments!

May 20, 2024

Clifford JohnsonCatching Up

Since you asked, I should indeed say a few words about how things have been going since I left my previous position and moved to being faculty at the Santa Barbara Department of Physics.

It's Simply Wonderful!

(Well, that's really four I suppose, depending upon whether you count the contraction as one or two.)

Really though, I've been having a great time. It is such a wonderful department with welcoming colleagues doing fantastic work in so many areas of physics. There's overall a real feeling of community, and of looking out for the best for each other, and there's a sense that the department is highly valued (and listened to) across the wider campus. From the moment I arrived I've had any number of excellent students, postdocs, and faculty knocking on my door, interested in finding out what I'm working on, looking for projects, someone to bounce an idea off, to collaborate, and more.

We've restarted the habit of regular (several times a week) lunch gatherings within the group, chatting about physics ideas we're working on, things we've heard about, papers we're reading, classes we're teaching and so forth. This has been a true delight, since that connectivity with colleagues has been absent in my physics life for very many years now and I've sorely missed it. Moreover, there's a nostalgic aspect to it as well: This is the very routine (often with the same places and some of the same people) that I had as a postdoc back in the mid 1990s, and it really helped shape the physicist I was to become, so it is a delight to continue the tradition.

And I have not even got to mentioning the Kavli Institute for Theoretical Physics (KITP) [...]

The post Catching Up appeared first on Asymptotia.

Clifford JohnsonRecurrence Relations

(A more technical post follows.) By the way, in both sets of talks that I mentioned in the previous post, early on I started talking about orthogonal polynomials, and how they generically satisfy a three-term recurrence relation (or recursion relation). Someone raised their hand and asked why it truncates…
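The post’s own formula did not survive in this feed excerpt, so as a stand-in here is a minimal sketch using the Legendre polynomials, one standard family, assumed here purely for illustration. The truncation to three terms comes from orthogonality: \(x\,p_n\) has degree \(n+1\), and \(\langle x p_n, p_k\rangle = \langle p_n, x p_k\rangle\) vanishes whenever \(k < n-1\), because then \(x p_k\) has degree below \(n\). The code builds \(P_n\) from \((n+1)P_{n+1}(x) = (2n+1)\,x P_n(x) - n P_{n-1}(x)\) and checks orthogonality on \([-1, 1]\) numerically with Simpson’s rule; the function names are hypothetical.

```python
def legendre(n, x):
    """P_n(x) via the three-term recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p_prev, p = 1.0, x  # P_0 and P_1
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def inner(m, n, points=2001):
    """Approximate the integral of P_m P_n over [-1, 1] with composite Simpson's rule
    (points must be odd)."""
    h = 2 / (points - 1)
    total = 0.0
    for i in range(points):
        x = -1 + i * h
        weight = 1 if i in (0, points - 1) else (4 if i % 2 else 2)
        total += weight * legendre(m, x) * legendre(n, x)
    return total * h / 3


print(legendre(3, 0.5))  # -0.4375
print(inner(2, 2), inner(2, 3))
```

`inner(2, 2)` should come out close to \(2/5\) and `inner(2, 3)` close to zero, as orthogonality requires.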

The post Recurrence Relations appeared first on Asymptotia.

John Preskill“Once Upon a Time”…with a twist

The Noncommuting-Charges World Tour (Part 1 of 4)

This is the first part in a four-part series covering the recent Perspective article on noncommuting charges. I’ll be posting one part every ~6 weeks leading up to my PhD thesis defence.

Thermodynamics problems have surprisingly many similarities with fairy tales. For example, most of them begin with a familiar opening. In thermodynamics, the phrase “Consider an isolated box of particles” serves a similar purpose to “Once upon a time” in fairy tales: both serve as a gateway to their respective worlds. Additionally, both have been around for a long time. Thermodynamics emerged in the Victorian era to help us understand steam engines, while Beauty and the Beast and Rumpelstiltskin, for example, originated about 4000 years ago. Moreover, each concludes with important lessons. In thermodynamics, we learn hard truths such as the futility of defying the second law, while fairy tales often impart morals like the risks of accepting apples from strangers. The parallels go on; both feature archetypal characters (such as wise old men and fairy godmothers versus ideal gases and perfect insulators) and simplified models of complex ideas, like portraying clear moral dichotomies in narratives versus assuming non-interacting particles in scientific models.1

Of all the ways thermodynamic problems are like fairy tales, one is most relevant to me: both have undergone modern reimaginings. Sometimes, all you need is a little twist to liven things up. In thermodynamics, noncommuting conserved quantities, or charges, have added a twist.

Unfortunately, my favourite fairy tale, ‘The Hunchback of Notre-Dame,’ does not start with the classic opening line ‘Once upon a time.’ For a story that begins with this traditional phrase, ‘Cinderella’ is a great choice.

First, let me recap some of my favourite thermodynamic stories before I highlight the role that the noncommuting-charge twist plays. The first is the inevitability of the thermal state. For example, this means that, at most times, the state of most sufficiently small subsystems within the box will be close to a specific form (the thermal state).

The second is an apparent paradox that arises in quantum thermodynamics: How do the reversible processes inherent in quantum dynamics lead to irreversible phenomena such as thermalization? If you’ve been keeping up with Nicole Yunger Halpern’s (my PhD co-advisor and fellow fan of fairy tales) recent posts on the eigenstate thermalization hypothesis (ETH) (part 1 and part 2), you already know the answer. The expectation value of a quantum observable can be written as a sum of terms, one for each pair of energy eigenstates, each rotating with its own phase. As time passes, these phases drift apart and largely cancel through destructive interference, so the expectation value settles near a stable value for long stretches of time. This stable value tends to align with a thermal state’s. Thus, despite the apparent paradox, effectively stationary behaviour in quantum systems is commonplace.
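A small numerical illustration of this dephasing argument, under loudly hypothetical assumptions: an 8-level spectrum \(E_n = n + 0.37n^2\) (chosen only to be nondegenerate), a random initial state, a random real-symmetric observable, and \(\hbar = 1\). The expectation value is \(\langle O\rangle(t) = \sum_{m,n} \bar c_m c_n O_{mn} e^{i(E_m - E_n)t}\); averaging over a long time window suppresses the oscillating \(m \neq n\) terms and leaves the diagonal value \(\sum_n |c_n|^2 O_{nn}\). This toy shows the dephasing only; whether that diagonal value matches a thermal one is the question the ETH addresses, and this random model is not built to capture it.

```python
import cmath
import random


def toy_system(dim=8, seed=1):
    """Hypothetical model in the energy eigenbasis: spectrum E_n = n + 0.37 n^2
    (chosen only to be nondegenerate), a random normalized initial state c_n,
    and a random real-symmetric observable O_mn."""
    rng = random.Random(seed)
    energies = [n + 0.37 * n * n for n in range(dim)]
    amps = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = sum(abs(a) ** 2 for a in amps) ** 0.5
    coeffs = [a / norm for a in amps]
    obs = [[0.0] * dim for _ in range(dim)]
    for m in range(dim):
        for n in range(m, dim):
            obs[m][n] = obs[n][m] = rng.gauss(0, 1)
    return energies, coeffs, obs


def expectation(t, energies, coeffs, obs):
    """<O>(t) = sum_{m,n} conj(c_m) c_n O_mn exp(i (E_m - E_n) t), with hbar = 1."""
    total = 0j
    for m, (e_m, c_m) in enumerate(zip(energies, coeffs)):
        for n, (e_n, c_n) in enumerate(zip(energies, coeffs)):
            total += c_m.conjugate() * c_n * obs[m][n] * cmath.exp(1j * (e_m - e_n) * t)
    return total.real


def diagonal_value(coeffs, obs):
    """Only the m = n terms carry no phase, so only they survive long-time averaging."""
    return sum(abs(c) ** 2 * obs[n][n] for n, c in enumerate(coeffs))


def time_average(t_max, samples, energies, coeffs, obs):
    """Average <O>(t) over an evenly spaced grid on [0, t_max)."""
    dt = t_max / samples
    return sum(expectation(k * dt, energies, coeffs, obs) for k in range(samples)) / samples


energies, coeffs, obs = toy_system()
long_time = time_average(2000.0, 20000, energies, coeffs, obs)
dephased = diagonal_value(coeffs, obs)
print(expectation(0.0, energies, coeffs, obs), long_time, dephased)
```

The grid spacing of 0.1 is chosen so that even the largest energy gap (about 25) is sampled without aliasing.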

The third story is about how concentrations of one quantity can cause flows in another. Imagine a box of charged particles that’s initially out of equilibrium, such that there exist gradients in particle concentration and temperature across the box. The temperature gradient will cause a flow of heat (Fourier’s law) and of charged particles (Seebeck effect), and the particle-concentration gradient will cause the same: a flow of particles (Fick’s law) and of heat (Peltier effect). These movements are encompassed within Onsager’s theory of transport dynamics…if the gradients are very small. If your computer happens to use a thermoelectric cooler, the Peltier effect may be at work for you right now, cooling the machine you’re reading this post on.
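In Onsager’s linear regime, each flux is a linear combination of the thermodynamic forces conjugate to the gradients, \(J_i = \sum_j L_{ij} X_j\), with the reciprocal relations \(L_{ij} = L_{ji}\) and a positive semidefinite \(L\), so that the entropy production \(\sum_i J_i X_i\) is never negative. Here is a minimal sketch for the two flows in the box, heat and particles; the coefficient matrix and force values are hypothetical numbers, and the off-diagonal entry stands in for the cross effects described above. The function names are illustrative.

```python
def fluxes(L, forces):
    """Linear response J_i = sum_j L_ij X_j, valid only for small gradients."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, forces)) for row in L]


def entropy_production(L, forces):
    """sigma = sum_i J_i X_i; the second law requires sigma >= 0."""
    return sum(j_i * x_i for j_i, x_i in zip(fluxes(L, forces), forces))


def is_onsager_consistent(L, tol=1e-12):
    """For a 2x2 matrix: reciprocity (L_12 = L_21) and positive semidefiniteness."""
    symmetric = abs(L[0][1] - L[1][0]) < tol
    psd = L[0][0] >= 0 and L[1][1] >= 0 and L[0][0] * L[1][1] - L[0][1] * L[1][0] >= 0
    return symmetric and psd


# Hypothetical coefficients: index 0 = heat, index 1 = particles.
# The off-diagonal 0.5 couples the two flows (the cross effects).
L = [[2.0, 0.5], [0.5, 1.0]]
forces = [0.1, -0.2]  # small thermodynamic forces (hypothetical values)

print(fluxes(L, forces))              # heat flux and particle flux
print(entropy_production(L, forces))  # non-negative
print(is_onsager_consistent(L))       # True
```

Large gradients take you outside this linear framework, which is the caveat in the text.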

What do various derivations of the thermal state’s form, the ETH, and the Onsager coefficients have in common? Each concept is founded on the assumption that the system we’re studying contains charges that commute with each other (e.g. particle number, energy, and electric charge). It’s only recently that physicists have acknowledged that this assumption was even present.

This is important to note because not all charges commute. In fact, the noncommutation of charges leads to fundamental quantum phenomena, such as the Einstein–Podolsky–Rosen (EPR) paradox, uncertainty relations, and disturbances during measurement. This raises an intriguing question. How would the above-mentioned stories change if we introduce the following twist?

“Consider an isolated box with charges that do not commute with one another.” 

This question is at the core of a burgeoning subfield that intersects quantum information, thermodynamics, and many-body physics. I had the pleasure of co-authoring a recent perspective article in Nature Reviews Physics that centres on this topic. Collaborating with me in this endeavour were three members of Nicole’s group: the avid mountain climber, Billy Braasch; the powerlifter, Aleksander Lasek; and Twesh Upadhyaya, known for his prowess in street basketball. Completing our authorship team were Nicole herself and Amir Kalev.

To give you a touchstone, let me present a simple example of a system with noncommuting charges. Imagine a chain of qubits, where each qubit interacts with its nearest and next-nearest neighbours, such as in the image below.

The figure is courtesy of the talented team at Nature. Two qubits form the system S of interest, and the rest form the environment E. A qubit’s three spin components, \(\sigma_a\) for \(a = x, y, z\), form the local noncommuting charges. The dynamics locally transport and globally conserve the charges.

In this interaction, the qubits exchange quanta of spin angular momentum, forming what is known as a Heisenberg spin chain. This chain is characterized by three charges, the total spin components in the x, y, and z directions, which I’ll refer to as \(Q_x\), \(Q_y\), and \(Q_z\), respectively. The Hamiltonian \(H\) conserves these charges, satisfying \([H, Q_a] = 0\) for each \(a\), and these three charges are noncommuting: \([Q_a, Q_b] \neq 0\) for any pair \(a, b \in \{x, y, z\}\) with \(a \neq b\). It’s noteworthy that Hamiltonians can be constructed to transport various other kinds of noncommuting charges. I have discussed the procedure to do so in more detail here (to summarize that post: it essentially involves constructing a Koi pond).
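The two commutation claims can be checked numerically on a small chain. This sketch assumes 4 qubits with open boundaries, nearest-neighbour coupling 1.0 and next-nearest-neighbour coupling 0.5 (both hypothetical), the Heisenberg interaction \(\vec\sigma_i \cdot \vec\sigma_j\) on each bond, and charges \(Q_a = \tfrac12 \sum_i \sigma_a^{(i)}\). It uses plain nested lists rather than a linear-algebra library so it runs without dependencies. With this normalization the noncommutation takes the familiar form \([Q_x, Q_y] = i Q_z\), and the last line shows that a single qubit’s \(\sigma_z\) is not conserved, which matches the caption’s picture of charges transported locally but conserved globally.

```python
# Pauli matrices as nested lists.
I2 = [[1, 0], [0, 1]]
PAULI = {
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}


def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b, scale=1.0):
    """Return a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def zeros(dim):
    return [[0] * dim for _ in range(dim)]


def site_op(op, site, n):
    """op acting on qubit `site` of an n-qubit chain, identity on the others."""
    result = [[1]]
    for s in range(n):
        result = kron(result, op if s == site else I2)
    return result


def total_spin(axis, n):
    """Charge Q_a = (1/2) sum_i sigma_a^(i)."""
    q = zeros(2**n)
    for i in range(n):
        q = add(q, site_op(PAULI[axis], i, n), 0.5)
    return q


def heisenberg(n, j1=1.0, j2=0.5):
    """Open chain: j1 * sum sigma_i . sigma_{i+1} + j2 * sum sigma_i . sigma_{i+2}."""
    bonds = [(i, i + 1, j1) for i in range(n - 1)] + [(i, i + 2, j2) for i in range(n - 2)]
    h = zeros(2**n)
    for i, j, coupling in bonds:
        for p in PAULI.values():
            h = add(h, matmul(site_op(p, i, n), site_op(p, j, n)), coupling)
    return h


def commutator(a, b):
    return add(matmul(a, b), matmul(b, a), -1.0)


def max_abs(a):
    return max(abs(x) for row in a for x in row)


n = 4
H = heisenberg(n)
Q = {axis: total_spin(axis, n) for axis in "xyz"}
print([max_abs(commutator(H, Q[a])) for a in "xyz"])           # zero: [H, Q_a] = 0
print(max_abs(commutator(Q["x"], Q["y"])))                      # nonzero: charges don't commute
print(max_abs(add(commutator(Q["x"], Q["y"]), Q["z"], -1j)))    # zero: [Q_x, Q_y] = i Q_z
print(max_abs(commutator(H, site_op(PAULI["z"], 0, n))))        # nonzero: local spin is transported
```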

This is the first in a series of blog posts where I will highlight key elements discussed in the perspective article. Motivated by requests from peers for a streamlined introduction to the subject, I’ve designed this series specifically for a target audience: graduate students in physics. Additionally, I’m gearing up to defend my PhD thesis on noncommuting-charge physics next semester, and these blog posts will double as a fun way to prepare for that.

  1. This opening text was taken from the draft of my thesis. ↩

John PreskillNoncommuting charges are much like Batman

The Noncommuting-Charges World Tour (Part 2 of 4)

This is the second part of a four-part series covering the recent Perspective on noncommuting charges. I’ll post one part every ~5 weeks leading up to my PhD thesis defence. You can find part 1 here.

Understanding a character’s origins enriches their narrative and motivates their actions. Take Batman as an example: without knowing his backstory, he appears merely as a billionaire who might achieve more by donating his wealth rather than masquerading as a bat to combat crime. However, with the context of his tragic past, Batman transforms into a symbol designed to instill fear in the hearts of criminals. Another example involves noncommuting charges. Without understanding their origins, the question “What happens when charges don’t commute?” might appear contrived or simply devised to occupy quantum information theorists and thermodynamicists. However, once we understand the context of their emergence, we find that numerous established results unravel, for various reasons, in the face of noncommuting charges. In this light, noncommuting charges are much like Batman; their backstory adds to their intrigue and clarifies their motivation. Admittedly, noncommuting charges come with fewer costumes, outside the occasional steampunk top hat my advisor Nicole Yunger Halpern might sport.

Growing up, television was my constant companion. Of all the shows I’d get lost in, ‘Batman: The Animated Series’ stands the test of time. I highly recommend giving it a watch.

In the early works I’m about to discuss, a common thread emerges: the initial breakdown of some well-understood derivations and the effort to establish a new derivation that accommodates noncommuting charges. These findings will illuminate, yet not fully capture, the multitude of results predicated on the assumption that charges commute. Removing this assumption is akin to pulling a piece from a Jenga tower, triggering a cascade of other results. Critics might argue, “If you’re merely rederiving known results, this field seems uninteresting.” However, the reality is far more compelling. As researchers have worked diligently to reconstruct this theoretical framework, they have continually uncovered ways in which noncommuting charges might pave the way for new physics. That said, the exploration of these novel phenomena will be the subject of my next post, where we delve into the emerging physics. So, I invite you to stay tuned. Back to the history…

E.T. Jaynes’s 1957 formalization of the maximum entropy principle has a blink-and-you’ll-miss-it reference to noncommuting charges. Consider a quantum system, similar to the box discussed in Part 1, where our understanding of the system’s state is limited to the expectation values of certain observables. Our aim is to deduce a probability distribution for the system’s potential pure states that accurately reflects our knowledge without making unjustified assumptions. According to the maximum entropy principle, this objective is met by maximizing the entropy of the distribution, which serves as a measure of uncertainty. The resulting state is known as the generalized Gibbs ensemble. Jaynes noted that this reasoning, based on information theory for the generalized Gibbs ensemble, remains valid even when our knowledge is restricted to the expectation values of noncommuting charges. However, later scholars have highlighted that physically substantiating the generalized Gibbs ensemble becomes significantly more challenging when the charges do not commute. Due to this and other reasons, when the system’s charges do not commute, the generalized Gibbs ensemble is specifically referred to as the non-Abelian thermal state (NATS).
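In the commuting case, Jaynes’s recipe can be run numerically. This sketch assumes a single charge (energy) with hypothetical levels 0, 1, 2, 3 and a prescribed mean energy; maximizing entropy under that constraint gives the Gibbs form \(p_n \propto e^{-\beta E_n}\), and the code finds \(\beta\) by bisection, using the fact that the mean energy decreases monotonically in \(\beta\). With noncommuting charges the analogous state is an operator, \(\propto e^{-\beta(H - \sum_a \mu_a Q_a)}\), which this scalar sketch does not attempt. The function names are illustrative.

```python
import math


def gibbs(levels, beta):
    """p_n proportional to exp(-beta E_n); levels are shifted by their minimum for stability."""
    e0 = min(levels)
    weights = [math.exp(-beta * (e - e0)) for e in levels]
    z = sum(weights)
    return [w / z for w in weights]


def mean(levels, probs):
    return sum(e * p for e, p in zip(levels, probs))


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def max_entropy_distribution(levels, target, lo=-50.0, hi=50.0, iters=200):
    """Maximize entropy subject to a fixed mean energy. Assumes
    min(levels) < target < max(levels) and that the solving beta lies in [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(levels, gibbs(levels, mid)) > target:
            lo = mid  # mean too high: a larger beta is needed
        else:
            hi = mid
    return gibbs(levels, 0.5 * (lo + hi))


levels = [0.0, 1.0, 2.0, 3.0]  # hypothetical energy levels
p = max_entropy_distribution(levels, target=1.0)
print([round(x, 4) for x in p], mean(levels, p), entropy(p))
```

Any other distribution with the same mean energy, for example \((0.25, 0.5, 0.25, 0)\), has lower entropy than the one returned here.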

For approximately 60 years, discussions about noncommuting charges remained dormant, outside a few mentions here and there. This changed when two studies highlighted how noncommuting charges break commonplace thermodynamics derivations. The first of these, conducted by Matteo Lostaglio as part of his 2014 thesis, challenged expectations about a system’s free energy, a measure of the system’s capacity for performing work. Interestingly, one can define a free energy for each charge within a system. Imagine a scenario where a system with commuting charges comes into contact with an environment that also has commuting charges. We then evolve the system such that the total charges in both the system and the environment are conserved. This evolution alters the system’s information content and its correlation with the environment. This change in information content depends on a sum of terms. Each term depends on the average change in one of the environment’s charges and the change in the system’s free energy for that same charge. However, this neat distinction of terms according to each charge breaks down when the system and environment exchange noncommuting charges. In such cases, the terms cannot be cleanly attributed to individual charges, and the conventional derivation falters.

The second work delved into resource theories, a topic discussed at length in Quantum Frontiers blog posts. In short, resource theories are frameworks used to quantify how effectively an agent can perform a task subject to some constraints. For example, consider all allowed evolutions (those conserving energy and other charges) one can perform on a closed system. Using these evolutions, from which systems can you not extract any work? The answer: systems in thermal equilibrium. The method used to determine the thermal state’s structure also fails when the system includes noncommuting charges. Building on this result, three groups presented physically motivated derivations of the form of the thermal state for systems with noncommuting charges using resource-theory-related arguments. Ultimately, the form of the NATS was recovered in each work.
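The claim that no work can be extracted from thermal equilibrium can be illustrated with a simpler, closely related notion than the resource theories above: ergotropy, the maximum work a cyclic unitary can extract from a closed system. This is a swapped-in technique, not the method of those papers. The sketch assumes arbitrary unitaries are allowed (ignoring the extra charge-conservation constraints), and the function names and the three-level Hamiltonian are hypothetical. Ergotropy is the state's energy minus the energy of its "passive" rearrangement, in which the largest population sits in the lowest energy level.

```python
import numpy as np


def ergotropy(rho, h):
    """Max work a cyclic unitary can extract: <H>_rho minus the passive-state energy."""
    energies = np.linalg.eigvalsh(h)          # ascending energies
    pops = np.linalg.eigvalsh(rho)[::-1]       # descending populations
    passive_energy = float(np.dot(pops, energies))
    return float(np.trace(rho @ h).real) - passive_energy


def gibbs_state(h, beta):
    """Thermal state exp(-beta H) / Z, built from the eigendecomposition of H."""
    w, v = np.linalg.eigh(h)
    weights = np.exp(-beta * (w - w.min()))
    return (v * (weights / weights.sum())) @ v.conj().T


H = np.diag([0.0, 1.0, 2.0])                   # hypothetical three-level system
thermal = gibbs_state(H, beta=1.0)
inverted = np.diag(np.diag(thermal)[::-1])     # same populations, order reversed

print("thermal ergotropy:", ergotropy(thermal, H))
print("inverted ergotropy:", ergotropy(inverted, H))
```

The thermal state's ergotropy vanishes (up to rounding), while the population-inverted state with the very same spectrum yields positive work.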

Just as re-examining Batman’s origin story unveils a deeper, more compelling reason behind his crusade against crime, diving into the history and implications of noncommuting charges reveals their untapped potential for new physics. Behind every mask—or theory—there can lie an untold story. Earlier, I hinted at how reevaluating results with noncommuting charges opens the door to new physics. A specific example, initially veiled in Part 1, involves the violation of the Onsager coefficients’ derivation by noncommuting charges. By recalculating these coefficients for systems with noncommuting charges, we discover that their noncommutation can decrease entropy production. In Part 3, we’ll delve into other new physics that stems from charges’ noncommutation, exploring how noncommuting charges, akin to Batman, can really pack a punch.

Tommaso DorigoA Cool Rare Decay

By and large, particle physicists confronted with the need to awe and enthuse an audience of laypersons will have no hesitation in choosing to speak about the Higgs boson and its mysteries - undoubtedly a fascinating story that requires one to start with the 1960s and the intuition of a handful of theoretical physicists, and then grows epic in a crescendo of colliders that sought and missed the Higgs boson, and then the LHC which finally found the elusive signal of production and decay of that particle.


Andrew JaffeIt’s been a while

If you’re reading this, then you might realise that I haven’t posted anything substantive here since 2018, commemorating the near-end of the Planck collaboration. In fact it took us well into the covid pandemic before the last of the official Planck papers were published, and further improved analyses of our data continue, alongside the use of the results as the closest thing we have to a standard cosmological model, despite ongoing worries about tensions between data from Planck and other measurements of the cosmological parameters.

As the years have passed, it has felt more and more difficult to add to this blog, but I recently decided to move to a new host and blogging software (cheaper and better than my previous setup, which nonetheless served me well for almost two decades until I received a message from my old hosting company that the site was being used as part of a bot-net…).

So, I’m back. Topics for the near future might include:

  • The book I have just finished writing (or at least the first draft of it);
  • Meralgia paraesthetica;
  • My upcoming sabbatical (Japan, New York, Leiden);
  • Cosmology with the Simons Observatory, Euclid, LISA, and other coming missions;
  • Monte Carlo sampling;
  • The topology of the Universe;
  • Parenthood;
  • rock ‘n’ roll; and (unfortunately but unavoidably)
  • the dysfunctional politics of my adopted home in the UK and the even more dysfunctional politics of my native USA (where, because of the aforementioned sabbatical, I will probably be when the next president takes office in 2025).

Clifford JohnsonMulticritical Matrix Model Miracles

Well, that was my title for my seminar last Thursday at the KITP. My plan was to explain more the techniques behind some of the work I've been doing over the last few years, in particular the business of treating multicritical matrix models as building blocks for making more complicated theories of gravity.

chalkboard from KITP seminar

The seminar ended up being a bit scattered in places as I realised that I had to re-adjust my ambitions to match limitations of time, and so ended up improvising here and there to explain certain computational details more, partly in response to questions. This always happens of course, and I sort of knew it would at the outset (as was clear from my opening remarks of the talk). The point is that I work on a set of techniques that are very powerful at what they do, and most people of a certain generation don't know those techniques as they fell out of vogue a long time ago. In the last few years I've resurrected them and developed them to a point where they can now do some marvellous things. But when I give talks about them it means I have a choice: I can quickly summarise and then get to the new results, in which case people think I'm performing magic tricks since they don't know the methods, or I can try to unpack and review the methods, in which case I never get to the new results. Either way, you're not likely to get people to dive in and help move the research program forward, which should be the main point of explaining your results. (The same problem occurs to some extent when I write papers on this stuff: short paper getting swiftly to the point, or long paper laying out all the methods first? The last time I did the latter, tons of new results got missed inside what people thought was largely just a review paper, so I'm not doing that any more.)

Anyway, so I ended up trying at least to explain what (basic) multicritical matrix models were, since it turns out that most people don't know these days what the (often invoked) double scaling limit of a matrix model really is, in detail. This ended up taking most of the hour, so I at least managed to get that across, and whet the appetite of the younger people in the audience to learn more about how this stuff works and appreciate how very approachable these techniques are. I spent a good amount of time trying to show how to compute everything from scratch - part of the demystifying process.

I did mention (and worked out detailed notes on) briefly a different class of [...]

The post Multicritical Matrix Model Miracles appeared first on Asymptotia.

May 19, 2024

Doug NatelsonPower and computing

The Wall Street Journal last week had an article (sorry about the paywall) titled "There’s Not Enough Power for America’s High-Tech Ambitions", about how there is enormous demand for more data centers (think Amazon Web Services and the like), and electricity production can't readily keep up.  I've written about this before, and this is part of the motivation for programs like FuSE (NSF's Future of Semiconductors call).  It seems that we are going to be faced with a choice: slow down the growth of computing demand (which seems unlikely, particularly with the rise of AI-related computing, to say nothing of cryptocurrencies); develop massive new electrical generating capacity (much as I like nuclear power, it's hard for me to believe that small modular reactors will really be installed at scale at data centers); develop approaches to computing that are far more energy efficient; or some combination.  

The standard computing architecture that's been employed since the 1940s is attributed to von Neumann.  Binary numbers (1, 0) are represented by two different voltage levels (say some \(V\) for a 1 and \(V \approx 0\) for a 0); memory functions and logical operations happen in two different places (e.g., your DRAM and your CPU), with information shuttled back and forth as needed.  The key ingredient in conventional computers is the field-effect transistor (FET), a voltage-activated switch, in which a third (gate) electrode can switch the current flow between a source electrode and a drain electrode.  

The idea that we should try to lower power consumption of computing hardware is far from new.  Indeed, NSF ran a science and technology center for a decade at Berkeley about exploring more energy-efficient approaches.  The simplest approach, as Moore's Law cooked along in the 1970s, 80s, and 90s, was to steadily try to reduce the magnitude of the operating voltages on chips.  Very roughly speaking, power consumption goes as \(V^{2}\).  The losses in the wiring and transistors scale like \(I \cdot V\), and since the current \(I\) itself scales roughly with \(V\), that is again like \(V^{2}\); the losses in the capacitors that are parts of the transistors scale like some fraction of the stored energy, which is also like \(V^{2}\).  For FETs to still work, one wants to keep the same amount of gated charge density when switching.  That charge density is the gate capacitance per area times \(V\), so dropping \(V\) means the capacitance per area has to go up, which means reducing the thickness of the gate dielectric layer.  This went on for a while with SiO₂ as the insulator, and eventually in the early 2000s the switch was made to a higher dielectric constant material because SiO₂ could not be made any thinner.  Since the 1970s, the operating voltage \(V\) has fallen from 5 V to around 1 V.  There are also clever schemes now to try to vary the voltage dynamically.  For example, one might be willing to live with higher error rates in the least significant bits of some calculations (like video or audio playback) if it means lower power consumption.  With conventional architectures, voltage scaling has been taken about as far as it can go.
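To make the \(V^{2}\) scaling concrete, here is a minimal Python sketch using the textbook CMOS dynamic-power model \(P = \alpha C V^{2} f\), with activity factor \(\alpha\), switched capacitance \(C\), and clock frequency \(f\).  That formula, the function name, and the numerical values are assumptions of this sketch rather than anything from the post, and it deliberately holds \(C\), \(f\), and \(\alpha\) fixed, which real chips over those decades did not.

```python
def dynamic_power(alpha: float, capacitance: float, voltage: float, frequency: float) -> float:
    """Switching power P = alpha * C * V**2 * f, in watts for SI inputs."""
    return alpha * capacitance * voltage ** 2 * frequency


# Hypothetical values: 10% activity factor, 1 nF total switched capacitance, 1 GHz clock.
old = dynamic_power(0.1, 1e-9, 5.0, 1e9)
new = dynamic_power(0.1, 1e-9, 1.0, 1e9)
print(f"5 V: {old:.2f} W, 1 V: {new:.2f} W, ratio {old / new:.0f}x")
```

With everything else held fixed, the drop from 5 V to 1 V alone cuts switching power by a factor of 25.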

Way back in 2006, I went to a conference and Eli Yablonovitch talked at me over dinner about how we needed to be thinking about far lower voltage operations.  Basically, his argument was that if we are using voltages that are far greater than the thermal voltage noise in our wires and devices, we are wasting energy.  With conventional transistors, though, we're kind of stuck because of issues like subthreshold swing.  
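A rough sense of the scales involved comes from the thermal voltage \(k_{B}T/q\) and the standard ideal-MOSFET limit on subthreshold swing, \(\ln(10)\, k_{B}T/q\) of gate voltage per decade of current (about 60 mV/decade at room temperature).  The minimal Python sketch below evaluates both; the helper names and the \(10^{6}\) on/off ratio are illustrative assumptions, not figures from the post.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K (exact in SI)
Q_E = 1.602176634e-19   # elementary charge, C (exact in SI)


def thermal_voltage(temperature_k: float) -> float:
    """k_B T / q in volts."""
    return K_B * temperature_k / Q_E


def min_subthreshold_swing(temperature_k: float) -> float:
    """Ideal MOSFET limit ln(10) k_B T / q: gate volts per decade of current."""
    return math.log(10) * thermal_voltage(temperature_k)


def min_gate_swing(on_off_ratio: float, temperature_k: float) -> float:
    """Smallest gate-voltage swing that can deliver the requested on/off current ratio."""
    return math.log10(on_off_ratio) * min_subthreshold_swing(temperature_k)


T = 300.0
print(f"kT/q = {1e3 * thermal_voltage(T):.1f} mV")
print(f"swing limit = {1e3 * min_subthreshold_swing(T):.1f} mV/decade")
print(f"gate swing for 10^6 on/off = {min_gate_swing(1e6, T):.2f} V")
```

An operating voltage of around 1 V is roughly 40 times \(k_{B}T/q\), yet a conventional FET still needs a few tenths of a volt just to switch its current by six orders of magnitude, which is why the subthreshold swing is such a stubborn floor.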

So what are the options?  There are many ideas out there. 
  • Change materials.  There are materials that have metal-insulator transitions, for example, such that it might be possible to trigger dramatic changes in conduction (for switching purposes) with small stimuli, evading the device physics responsible for the subthreshold slope argument.  
  • Change architectures.  Having memory and logic physically separated isn't the only way to do digital computing.  The idea of "logic-in-memory" computing goes back to before I was born.  
  • Radically change architectures.  As I've written before, there is great interest in neuromorphic computing, trying to make devices with connectivity and function designed to mimic the way neurons work in biological brains.  This would likely mean analog rather than digital logic and memory, complex history-dependent responses, and trying to get vastly improved connectivity.  As was published last week in Science, 1 cubic millimeter of brain tissue contains 57,000 cells and 150,000,000 synapses.  Trying to duplicate that level of 3D integration at scale is going to be very hard.  The approach of just making something that starts with crazy but uncontrolled connectivity and training it somehow (e.g., this idea from 2002) may reappear.
  • Update: A user on twitter pointed out that the time may finally be right for superconducting electronics.  Here is a recent article in IEEE Spectrum about this, and here is a youtube video of a pretty good intro.  The technology of interest is "rapid single-flux quantum" (RSFQ) logic, where information is stored in circulating current loops in devices based on Josephson junctions.  The compelling aspects include intrinsically ultralow power dissipation because of superconductivity, and intrinsically fast timescales (clock speeds of hundreds of GHz) because of the frequency scales associated with the Josephson effect.  I'm a bit skeptical, because these ideas have been around for 30+ years and the integration challenges are still significant, but maybe now the economic motivation is finally sufficient.
A huge driving constraint on everything is economics.  We are not going to decide that computing is so important that we will sacrifice refrigeration, for example; basic societal needs will limit what fraction of total generating capacity we devote to computing, and that includes concerns about impact of power generation on climate.  Likewise, switching materials or architectures is going to be very expensive at least initially, and is unlikely to be quick.  It will be interesting to see where we are in another decade.... 

May 17, 2024

Matt von HippelThe Impact of Jim Simons

The obituaries have been weirdly relevant lately.

First, a couple weeks back, Daniel Dennett died. Dennett was someone who could have had a huge impact on my life. Growing up combatively atheist in the early 2000’s, Dennett seemed to be exploring every question that mattered: how the semblance of consciousness could come from non-conscious matter, how evolution gives rise to complexity, how to raise a new generation to grow beyond religion and think seriously about the world around them. I went to Tufts to get my bachelor’s degree based on a glowing description he wrote in the acknowledgements of one of his books, and after getting there, I asked him to be my advisor.

(One of three, because the US education system, like all good games, can be min-maxed.)

I then proceeded to be far too intimidated to have a conversation with him more meaningful than “can you please sign my registration form?”

I heard a few good stories about Dennett while I was there, and I saw him debate once. I went into physics for my PhD, not philosophy.

Jim Simons died on May 10. I never spoke to him at all, not even to ask him to sign something. But he had a much bigger impact on my life.

I began my PhD at SUNY Stony Brook with a small scholarship from the Simons Foundation. The university’s Simons Center for Geometry and Physics had just opened, a shining edifice of modern glass next to the concrete blocks of the physics and math departments.

For a student aspiring to theoretical physics, the Simons Center virtually shouted a message. It taught me that physics, and especially theoretical physics, was something prestigious, something special. That if I kept going down that path I could stay in that world of shiny new buildings and daily cookie breaks with the occasional fancy jar-based desserts, of talks by artists and a café with twenty-dollar lunches (half-price once a week for students, the only time we could afford it, and still about twice what we paid elsewhere on campus). There would be garden parties with sushi buffets and late conference dinners with cauliflower steaks and watermelon salads. If I was smart enough (and I longed to be smart enough), that would be my future.

Simons and his foundation clearly wanted to say something along those lines, if not quite as filtered by the stars in a student’s eyes. He thought that theoretical physics, and research more broadly, should be something prestigious. That his favored scholars deserved more, and should demand more.

This did have weird consequences sometimes. One year, the university charged us an extra “academic excellence fee”. The story we heard was that Simons had demanded Stony Brook increase its tuition in order to accept his donations, so that its tuition would be closer to that of more prestigious places. As a state university, Stony Brook couldn’t do that…but it could add an extra fee. And since PhD students got their tuition, but not fees, paid by the department, we were left with an extra dent in our budgets.

The Simons Foundation created Quanta Magazine. If the Simons Center used food to tell me physics mattered, Quanta delivered the same message to professors through journalism. Suddenly, someone was writing about us, not just copying press releases but with the research and care of an investigative reporter. And they wrote about everything: not just sci-fi stories and cancer cures but abstract mathematics and the space of quantum field theories. Professors who had spent their lives straining to capture the public’s interest suddenly were shown an audience that actually wanted the real story.

In practice, the Simons Foundation made its decisions through the usual experts and grant committees. But the way we thought about it, the decisions always had a Jim Simons flavor. When others in my field applied for funding from the Foundation, they debated what Simons would want: would he support research on predictions for the LHC and LIGO? Or would he favor links to pure mathematics, or hints towards quantum gravity? Simons Collaboration Grants have an enormous impact on theoretical physics, dwarfing many other sources of funding. A grant funds an army of postdocs across the US, shifting the priorities of the field for years at a time.

Denmark has big foundations that have an outsize impact on science. Carlsberg, Villum, and the bigger-than-Denmark’s GDP Novo Nordisk have foundations with a major influence on scientific priorities. But Denmark is a country of six million. It’s much harder to have that influence on a country of three hundred million. Despite that, Simons came surprisingly close.

While we did like to think of the Foundation’s priorities as Simons’, I suspect that it will continue largely on the same track without him. Quanta Magazine is editorially independent, and clearly puts its trust in the journalists that made it what it is today.

I didn’t know Simons, I don’t think I even ever smelled one of his famous cigars. Usually, that would be enough to keep me from writing a post like this. But, through the Foundation, and now through Quanta, he’s been there with me the last fourteen years. That’s worth a reflection, at the very least.

May 12, 2024

Scott Aaronson Jim Simons (1938-2024)

When I learned of Jim Simons’s passing, I was actually at the Simons Foundation headquarters in lower Manhattan, for the annual board meeting of the unparalleled Quanta Magazine, which Simons founded and named. The meeting was interrupted to share the sad news, before it became public … and then it was continued, because that’s obviously what Simons would’ve wanted. An oil portrait of Simons in the conference room took on new meaning.

See here for the Simons Foundation’s announcement, or here for the NYT’s obituary.

Although the Simons Foundation has had multiple significant influences on my life—funding my research, founding the Simons Institute for Theory of Computing in Berkeley that I often visit (including two weeks ago), and much more—I’ve exchanged all of a few sentences with Jim Simons himself. At a previous Simons Foundation meeting, I think he said he’d heard I’d moved from MIT to UT Austin, and asked whether I’d bought a cowboy hat yet. I said I did but I hadn’t yet worn it non-ironically, and he laughed at that. (My wife Dana knew him better, having spent a day at a brainstorming meeting for what became the Simons Institute, his trademark cigar smoke filling the room.)

I am, of course, in awe of what Jim Simons achieved in all three phases of his career — firstly, in mathematical research, where he introduced the Chern-Simons form and other pioneering contributions and led the math department at Stony Brook; secondly, in founding Renaissance and making insane amounts of money (“disproving the Efficient Market Hypothesis,” as some have claimed); and thirdly, in giving his money away to support basic research and the public understanding of it.

I’m glad that Simons, as a lifelong chain smoker, made it all the way to age 86. And I’m glad that the Simons Foundation, which I’m told will continue in perpetuity with no operational changes, will stand as a testament to his vision for the world.

May 11, 2024

Terence TaoTwo announcements: AI for Math resources, and

This post contains two unrelated announcements. Firstly, I would like to promote a useful list of resources for AI in Mathematics, that was initiated by Talia Ringer (with the crowdsourced assistance of many others) during the National Academies workshop on “AI in mathematical reasoning” last year. This list is now accepting new contributions, updates, or corrections; please feel free to submit them directly to the list (which I am helping Talia to edit). Incidentally, next week there will be a second followup webinar to the aforementioned workshop, building on the topics covered there. (The first webinar may be found here.)

Secondly, I would like to advertise the website, launched recently by Thomas Bloom. This is intended to be a living repository of the many mathematical problems proposed in various venues by Paul Erdős, who was particularly noted for his influential posing of such problems. For a tour of the site and an explanation of its purpose, I can recommend Thomas’s recent talk on this topic at a conference last week in honor of Timothy Gowers.

Thomas is currently issuing a call for help to develop the website in a number of ways (quoting directly from that page):

  • You know Github and could set a suitable project up to allow people to contribute new problems (and corrections to old ones) to the database, and could help me maintain the Github project;
  • You know things about web design and have suggestions for how this website could look or perform better;
  • You know things about Python/Flask/HTML/SQL/whatever and want to help me code cool new features on the website;
  • You know about accessibility and have an idea how I can make this website more accessible (to any group of people);
  • You are a mathematician who has thought about some of the problems here and wants to write an expanded commentary for one of them, with lots of references, comparisons to other problems, and other miscellaneous insights (mathematician here is interpreted broadly, in that if you have thought about the problems on this site and are willing to write such a commentary you qualify);
  • You knew Erdős and have any memories or personal correspondence concerning a particular problem;
  • You have solved an Erdős problem and I’ll update the website accordingly (and apologies if you solved this problem some time ago);
  • You have spotted a mistake, typo, or duplicate problem, or anything else that has confused you and I’ll correct things;
  • You are a human being with an internet connection and want to volunteer a particular Erdős paper or problem list to go through and add new problems from (please let me know before you start, to avoid duplicate efforts);
  • You have any other ideas or suggestions – there are probably lots of things I haven’t thought of, both in ways this site can be made better, and also what else could be done from this project. Please get in touch with any ideas!

For instance, I contributed a problem to the site (#587) that Erdős himself gave to me personally (this was the topic of a somewhat well-known photo of Paul and myself, and he communicated it to me again shortly afterwards on a postcard; links to both images can be found by following the above link). As it turns out, this particular problem was essentially solved in 2010 by Nguyen and Vu.

(Incidentally, I also spoke at the same conference that Thomas spoke at, on my recent work with Gowers, Green, and Manners; here is the video of my talk, and here are my slides.)

Scott Aaronson UmeshFest

Unrelated Announcements: See here for a long interview with me in The Texas Orator, covering the usual stuff (quantum computing, complexity theory, AI safety). And see here for a podcast with me and Spencer Greenberg about a similar mix of topics.

A couple weeks ago, I helped organize UmeshFest: Don’t Miss This Flight, a workshop at UC Berkeley’s Simons Institute to celebrate the 2⁶th (that is, 64th) birthday of my former PhD adviser Umesh Vazirani. Peter Shor, John Preskill, Manuel Blum, Madhu Sudan, Sanjeev Arora, and dozens of other luminaries of quantum and classical computation were on hand to help tell the story of quantum computing theory and Umesh’s central role in it. There was also constant roasting of Umesh—of his life lessons from the squash court, his last-minute organizational changes and phone calls at random hours. I was delighted to find that my old coinage of “Umeshisms” was simply standard usage among the attendees.

At Berkeley, many things were as I remembered them—my favorite Thai eatery, the bubble tea, the Campanile—but not everything was the same. Here I am in front of Berkeley’s Gaza encampment, a.k.a. its “Anti Zionism Zone” or what was formerly Sproul Plaza (zoom into the chalk):

I felt a need to walk through the Anti Zionism Zone day after day (albeit unassumingly, neither draped in an Israeli flag nor looking to start an argument with anyone), for more-or-less the same reasons why the US regularly sends aircraft carriers through the Strait of Taiwan.

Back in the more sheltered environment of the Simons Institute, it was great to be among friends, some of whom I hadn’t seen since before Covid. Andris Ambainis and I worked together for a bit on an open problem in quantum query complexity, for old times’ sake (we haven’t solved it yet).

And then there were talks! I thought I’d share my own talk, which was entitled The Story of BQP (Bounded-Error Quantum Polynomial-Time). Here are the PowerPoint slides, but I’ll also share screen-grabs for those of you who constantly complain that you can’t open PPTX files.

I was particularly proud of the design of my title slide:

Moving on:

The class BQP/qpoly, I should explain, is all about an advisor who’s all-wise and perfectly benevolent, but who doesn’t have a lot of time to meet with his students, so he simply doles out the same generic advice to all of them, regardless of their thesis problem x.

I then displayed my infamous “Umeshisms” blog post from 2005—one of the first posts in the history of this blog:

As I explained, now that I hang out with the rationalist and AI safety communities, which are also headquartered in Berkeley, I’ve learned that my “Umeshisms” post somehow took on a life of its own. Once, when dining at one of the rationalists’ polyamorous Berkeley group houses, I said this has been lovely but I’ll now need to leave, to visit my former PhD adviser Umesh Vazirani. “You mean the Umesh?!” the rationalists excitedly exclaimed. “Of Umeshisms? If you’ve never missed a flight?”

But moving on:

(Note that by “QBPP,” Berthiaume and Brassard meant what we now call BQP.)

Feynman and Deutsch asked exactly the right question—does simulating quantum mechanics on a classical computer inherently produce an exponential slowdown, or not?—but they lacked most of the tools to start formally investigating the question. A factor-of-two quantum speedup for the XOR function could be dismissed as unimpressive, while a much greater quantum speedup for the “constant vs. balanced” problem could be dismissed as a win against only deterministic classical algorithms, rather than randomized algorithms. Deutsch-Jozsa may have been the first time that an apparent quantum speedup faltered in an honest comparison against classical algorithms. It certainly wasn’t the last!

Ah, but this is where Bernstein and Vazirani enter the scene.

Bernstein and Vazirani didn’t merely define BQP, which remains the central object of study in quantum complexity theory. They also established its most basic properties:

And, at least in the black-box model, Bernstein and Vazirani gave the first impressive quantum speedup for a classical problem that survived in a fair comparison against the best classical algorithm:

The Recursive Bernstein-Vazirani problem, also called Recursive Fourier Sampling, is constructed as a “tree” of instances of the Bernstein-Vazirani problem, where to query the Boolean function at any given level, you need to solve a Bernstein-Vazirani problem for a Boolean function at the level below it, and then run the secret string s through a fixed Boolean function g. For more, see my old paper Quantum Lower Bound for Recursive Fourier Sampling.

Each Bernstein-Vazirani instance has classical query complexity n and quantum query complexity 1. So, if the tree of instances has depth d, then overall the classical query complexity is nᵈ, while the quantum query complexity is only 2ᵈ. Where did the 2 come from? From the need to uncompute the secret strings s at each level, to enable quantum interference at the next level up—thereby forcing us to run the algorithm twice. A key insight.
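The query counting here can be checked with a small Python sketch, offered as an illustration rather than code from the talk. `classical_bv` recovers the secret string s from f(x) = s·x mod 2 by querying the n unit vectors; `quantum_bv` simulates the one-query Bernstein-Vazirani circuit (Hadamards, one phase-oracle call, Hadamards) with a state vector; `recursive_query_counts` just evaluates the nᵈ versus 2ᵈ totals. All names are hypothetical. The classical simulation necessarily evaluates f at every basis state to build the superposition; the single oracle call is what counts as the one quantum query.

```python
def dot_mod2(s: int, x: int) -> int:
    """Inner product s.x mod 2 of two n-bit strings stored as integers."""
    return bin(s & x).count("1") % 2


def classical_bv(f, n: int):
    """Recover s from f(x) = s.x mod 2 by querying the n unit vectors."""
    s = 0
    for i in range(n):
        if f(1 << i):
            s |= 1 << i
    return s, n  # (secret, number of queries)


def quantum_bv(f, n: int):
    """State-vector simulation of H^n, one phase-oracle query, H^n."""
    size = 1 << n
    norm = size ** 0.5
    # Phase oracle (-1)^f(x) applied to the uniform superposition: one quantum query.
    amp = [(-1) ** f(x) / norm for x in range(size)]
    # In-place fast Walsh-Hadamard transform plays the role of the second Hadamard layer.
    h = 1
    while h < size:
        for start in range(0, size, 2 * h):
            for j in range(start, start + h):
                a, b = amp[j], amp[j + h]
                amp[j], amp[j + h] = a + b, a - b
        h *= 2
    probs = [(a / norm) ** 2 for a in amp]
    # All probability ends up on s, so measuring returns the secret.
    return max(range(size), key=probs.__getitem__), 1


def recursive_query_counts(n: int, d: int):
    """Classical vs quantum query totals for a depth-d tree: (n**d, 2**d)."""
    return n ** d, 2 ** d


SECRET = 0b1011  # hypothetical 4-bit secret string


def oracle(x: int) -> int:
    return dot_mod2(SECRET, x)


print(classical_bv(oracle, 4), quantum_bv(oracle, 4))
for depth in range(1, 5):
    print(depth, recursive_query_counts(8, depth))
```

Even at n = 8 the gap grows quickly with depth: the classical total multiplies by 8 per level, the quantum total only by 2.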

The Recursive Fourier Sampling separation set the stage for Simon’s algorithm, which gave a more impressive speedup in the black-box model, and thence for the famous Shor’s algorithm for factoring and discrete log:

But Umesh wasn’t done establishing the most fundamental properties of BQP! There’s also the seminal 1994 paper by Bennett, Bernstein, Brassard, and Vazirani:

In light of the BV and BBBV papers, let’s see how BQP seems to fit with classical complexity classes—an understanding that’s remained largely stable for the past 30 years:

We can state a large fraction of the research agenda of the whole field, to this day, as questions about BQP:

I won’t have time to discuss all of these questions, but let me at least drill down on the first few.

Many people hoped the list of known problems in BQP would now be longer than it is. So it goes: we don’t decide the truth, we only discover it.

As a 17-year-old just learning about quantum computing in 1998 by reading the Bernstein-Vazirani paper, I was thrilled when I managed to improve their containment BQP ⊆ P^#P to BQP ⊆ PP. I thought that would be my big debut in quantum complexity theory. I was then crushed when I learned that Adleman, DeMarrais, and Huang had proved the same thing a year prior. OK, but at least it wasn’t, like, 50 years prior! Maybe if I kept at it, I’d reach the frontier soon enough.

Umesh, from the very beginning, raised the profound question of BQP’s relation to the polynomial hierarchy. Could we at least construct an oracle relative to which BQP⊄PH—or, closely related, relative to which P=NP≠BQP? Recursive Fourier Sampling was already a candidate for such a separation. I spent months trying to prove that candidate wasn’t in PH, but failed. That led me eventually to propose a very different problem, Forrelation, which seemed like a stronger candidate, although I couldn’t prove that either. Finally, in 2018, after four years of effort, Ran Raz and Avishay Tal proved that my Forrelation problem was not in PH, thereby resolving Umesh’s question after a quarter century.

We now know three different ways by which a quantum computer can not merely solve any BQP problem efficiently, but prove its answer to a classical skeptic via an interactive protocol! Using quantum communication, using two entangled (but non-communicating) quantum computers, or using cryptography (this last a breakthrough of Umesh’s PhD student Urmila Mahadev). It remains a great open problem, first posed to my knowledge by Daniel Gottesman, whether one can do it with none of these things.

To see many of the advantages of quantum computation over classical, we’ve learned that we need to broaden our vision beyond BQP (which is a class of languages), to promise problems (like estimating the expectation values of observables), sampling problems (like BosonSampling and Random Circuit Sampling), and relational problems (like the Yamakawa-Zhandry problem, subject of a recent breakthrough). It’s conceivable that quantum advantage could remain for such problems even if it turned out that P=BQP.

A much broader question is whether BQP captures all languages that can be efficiently decided using “reasonable physical resources.” What about chiral quantum field theories, like the Standard Model of elementary particles? What about quantum theories of gravity? Good questions!

Since it was Passover during the talk, I literally said “Dayenu” to Umesh: “if you had only given us BQP, that would’ve been enough! but you didn’t, you gave us so much more!”

Happy birthday Umesh!! We look forward to celebrating again on all your subsequent power-of-2 birthdays.

May 10, 2024

Matt von HippelGetting It Right vs Getting It Done

With all the hype around machine learning, I occasionally get asked if it could be used to make predictions for particle colliders, like the LHC.

Physicists do use machine learning these days, to be clear. There are tricks and heuristics, ways to quickly classify different particle collisions and speed up computation. But if you’re imagining something that replaces particle physics calculations entirely, or even replaces the LHC itself, then you’re misunderstanding what particle physics calculations are for.

Why do physicists try to predict the results of particle collisions? Why not just observe what happens?

Physicists make predictions not in order to know what will happen in advance, but to compare those predictions to experimental results. If the predictions match the experiments, that supports existing theories like the Standard Model. If they don’t, then a new theory might be needed.

Those predictions certainly don’t need to be made by humans: most of the calculations are done by computers anyway. And they don’t need to be perfectly accurate: in particle physics, every calculation is an approximation. But the approximations used in particle physics are controlled approximations. Physicists keep track of what assumptions they make, and how they might go wrong. That’s not something you can typically do in machine learning, where you might train a neural network with millions of parameters. The whole point is to be able to check experiments against a known theory, and we can’t do that if we don’t know whether our calculation actually respects the theory.

That difference, between caring about the result and caring about how you got there, is a useful guide. If you want to predict how a protein folds in order to understand what it does in a cell, then you will find AlphaFold useful. If you want to confirm your theory of how protein folding happens, it will be less useful.

Some industries just want the final result, and can benefit from machine learning. If you want to know what your customers will buy, or which suppliers are cheating you, or whether your warehouse is moldy, then machine learning can be really helpful.

Other industries are trying, like particle physicists, to confirm that a theory is true. If you’re running a clinical trial, you want to be crystal clear about how the trial data turn into statistics. You, and the regulators, care about how you got there, not just about what answer you got. The same can be true for banks: if laws tell you you aren’t allowed to discriminate against certain kinds of customers for loans, you need to use a method where you know what traits you’re actually discriminating against.

So will physicists use machine learning? Yes, and more of it over time. But will they use it to replace normal calculations, or replace the LHC? No, that would be missing the point.

May 08, 2024

Doug NatelsonWind-up nanotechnology

When I was a kid, I used to take allowance money and occasionally buy rubber-band-powered balsa wood airplanes at a local store.  Maybe you've seen these.  You wind up the rubber band, which stretches the elastomer and stores energy in the elastic strain of the polymer, as in Hooke's Law (though I suspect the rubber band goes well beyond the linear regime when it's really wound up, because of the higher order twisting that happens).  Rhett Allain wrote about how well you can store energy like this.  It turns out that the stored energy per mass of the rubber band can get pretty substantial. 
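For orientation, here is the linear (Hookean) baseline for energy stored per unit mass. A material with Young's modulus \(Y\), loaded to stress \(\sigma\), stores \(\sigma^2/(2Y)\) per unit volume. Dividing by the density \(\rho\) gives a specific energy. This favors materials that are strong and light. The sketch below only covers the linear regime, so it deliberately ignores the nonlinear twisting mentioned above. The function name and the input numbers are hypothetical and purely illustrative, not values from the posts discussed here.

```python
def specific_elastic_energy(stress_pa: float, youngs_modulus_pa: float,
                            density_kg_per_m3: float) -> float:
    """Energy stored per unit mass (J/kg) in a linear (Hookean) elastic solid.

    Energy per unit volume is sigma^2 / (2 * Y); dividing by the density
    converts it to energy per unit mass. Valid only in the linear regime.
    """
    energy_per_volume = stress_pa ** 2 / (2.0 * youngs_modulus_pa)  # J/m^3
    return energy_per_volume / density_kg_per_m3                    # J/kg


# Hypothetical inputs: 1 GPa stress, 100 GPa modulus, 1000 kg/m^3 density.
print(specific_elastic_energy(1e9, 1e11, 1000.0))
```

Doubling the sustainable stress quadruples the stored energy, while doubling the stiffness at fixed stress halves it. That is why the strain a material can survive matters as much as its stiffness.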

Carbon nanotubes are one of the most elastically strong materials out there.  A bit over a decade ago, a group at Michigan State did a serious theoretical analysis of how much energy you could store in a twisted yarn made from single-walled carbon nanotubes.  They found that the specific energy storage could get as large as several MJ/kg, as much as four times what you get with lithium ion batteries!

Now, a group in Japan has actually put this to the test, in this Nature Nano paper.  They get up to 2.1 MJ/kg, over the lithium ion battery mark, and the specific power (when they release the energy) at about \(10^{6}\) W/kg is not too far away from "non-cyclable" energy storage media, like TNT.  Very cool!  

May 06, 2024

Tommaso DorigoMove Over - The Talk I Will Not Give

Last week I was in Amsterdam, where I attended the first European AI for Fundamental Physics conference (EUCAIF). Unfortunately I could not properly follow the proceedings there, as in the midst of it I got grounded by a very nasty bronchial bug. Then over the weekend I was able to drag myself back home, and today, still struggling with the after-effects, I am traveling to Rome for another relevant event.


May 04, 2024

n-Category Café Line Bundles on Complex Tori (Part 4)

Last time I introduced a 2-dimensional complex variety called the Eisenstein surface

\[ E = \mathbb{C}/\mathbb{E} \times \mathbb{C}/\mathbb{E} \]

where \(\mathbb{E} \subset \mathbb{C}\) is the lattice of Eisenstein integers. We worked out the Néron–Severi group \(\mathrm{NS}(E)\) of this surface: that is, the group of equivalence classes of holomorphic line bundles on this surface, where we count two as equivalent if they’re isomorphic as topological line bundles. And we got a nice answer:

\[ \mathrm{NS}(E) \cong \mathfrak{h}_2(\mathbb{E}) \]

where \(\mathfrak{h}_2(\mathbb{E})\) consists of \(2 \times 2\) hermitian matrices with Eisenstein integers as entries.

Now we’ll see how this is related to the ‘hexagonal tiling honeycomb’:

We’ll see an explicit bijection between so-called ‘principal polarizations’ of the Eisenstein surface and the centers of hexagons in this picture! We won’t prove it works — I hope to do that later. But we’ll get everything set up.

The hexagonal tiling honeycomb

This picture by Roice Nelson shows a remarkable structure: the hexagonal tiling honeycomb.

What is it? Roughly speaking, a honeycomb is a way of filling 3d space with polyhedra. The most symmetrical honeycombs are the ‘regular’ ones. For any honeycomb, we define a flag to be a chosen vertex lying on a chosen edge lying on a chosen face lying on a chosen polyhedron. A honeycomb is regular if its geometrical symmetries act transitively on flags.

The most familiar regular honeycomb is the usual way of filling Euclidean space with cubes. This cubic honeycomb is denoted by the symbol \(\{4,3,4\}\), because a square has 4 edges, 3 squares meet at each corner of a cube, and 4 cubes meet along each edge of this honeycomb. We can also define regular honeycombs in hyperbolic space. For example, the order-5 cubic honeycomb is a hyperbolic honeycomb denoted \(\{4,3,5\}\), since 5 cubes meet along each edge:

Coxeter showed there are 15 regular hyperbolic honeycombs. The hexagonal tiling honeycomb is one of these. But it does not contain polyhedra of the usual sort! Instead, it contains flat Euclidean planes embedded in hyperbolic space, each plane containing the vertices of infinitely many regular hexagons. You can think of such a sheet of hexagons as a generalized polyhedron with infinitely many faces. You can see a bunch of such sheets in the picture:

The symbol for the hexagonal tiling honeycomb is \(\{6,3,3\}\), because a hexagon has 6 edges, 3 hexagons meet at each corner in a plane tiled by regular hexagons, and 3 such planes meet along each edge of this honeycomb. You can see that too if you look carefully.
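The count of 15 can be reproduced with Coxeter's test for a Schläfli symbol \(\{p,q,r\}\). The symbol is spherical (a finite polytope) when \(\sin(\pi/p)\sin(\pi/r) > \cos(\pi/q)\), Euclidean when the two sides are equal, and hyperbolic when the left side is smaller. The sketch below adds two requirements: the cells \(\{p,q\}\) and the vertex figures \(\{q,r\}\) must each tile a sphere or a Euclidean plane, i.e. \((p-2)(q-2) \le 4\). That admissibility rule is our stated assumption; it is what lets in honeycombs like \(\{6,3,3\}\) whose "cells" are flat sheets of hexagons. The function names are our own.

```python
import math
from itertools import product


def classify(p: int, q: int, r: int, tol: float = 1e-12) -> str:
    """Classify the Schläfli symbol {p,q,r} with Coxeter's sine/cosine test."""
    lhs = math.sin(math.pi / p) * math.sin(math.pi / r)
    rhs = math.cos(math.pi / q)
    if math.isclose(lhs, rhs, abs_tol=tol):
        return "euclidean"
    return "spherical" if lhs > rhs else "hyperbolic"


def is_spherical_or_flat_tiling(p: int, q: int) -> bool:
    """{p,q} tiles a sphere or the Euclidean plane iff (p-2)(q-2) <= 4."""
    return (p - 2) * (q - 2) <= 4


def regular_hyperbolic_honeycombs() -> list:
    """All {p,q,r} with admissible cells and vertex figures that are hyperbolic."""
    # (p-2)(q-2) <= 4 with p, q >= 3 forces every entry to be at most 6.
    found = []
    for p, q, r in product(range(3, 7), repeat=3):
        if (is_spherical_or_flat_tiling(p, q)
                and is_spherical_or_flat_tiling(q, r)
                and classify(p, q, r) == "hyperbolic"):
            found.append((p, q, r))
    return found


if __name__ == "__main__":
    print(classify(4, 3, 4))  # cubic honeycomb
    print(classify(4, 3, 5))  # order-5 cubic honeycomb
    print(classify(6, 3, 3))  # hexagonal tiling honeycomb
    print(len(regular_hyperbolic_honeycombs()))
```

Four of the symbols found have both cells and vertex figures spherical: \(\{4,3,5\}\), \(\{5,3,4\}\), \(\{5,3,5\}\), and \(\{3,5,3\}\). The rest, including \(\{6,3,3\}\), have Euclidean cells or vertex figures.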

A flat Euclidean plane in hyperbolic space is called a horosphere. Here’s a picture of a horosphere tiled with regular hexagons, yet again drawn by Roice:

Unlike the previous pictures, which are views from inside hyperbolic space, this uses the Poincaré ball model of hyperbolic space. As you can see here, a horosphere is a limiting case of a sphere in hyperbolic space, where one point of the sphere has become a ‘point at infinity’.

Be careful. A horosphere is intrinsically flat, so if you draw regular hexagons on it their internal angles are

\[ 2\pi/3 = 120^\circ \]

as usual in Euclidean geometry. But a horosphere is not ‘totally geodesic’: straight lines in the horosphere are not geodesics in hyperbolic space! Thus, a hexagon in hyperbolic space with the same vertices as one of the hexagons in the horosphere actually bulges out from the horosphere a bit, and its internal angles are less than \(2\pi/3\): they are

\[ \arccos\left(-\frac{1}{3}\right) \approx 109.47^\circ \]

It’s really these hexagons in hyperbolic space that are faces of the hexagonal tiling honeycomb, not those tiling the horospheres, though perhaps you can barely see the difference. This can be quite confusing until you think about a simpler example, like the difference between a cube in Euclidean 3-space and a cube drawn on a sphere in Euclidean space.
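Here is one way to see where \(\arccos(-1/3)\) comes from. The vertex figure of \(\{6,3,3\}\) is \(\{3,3\}\), a regular tetrahedron. So the four edges at each vertex point, in the tangent space at that vertex, toward the corners of a regular tetrahedron. Every pair of those edges bounds a face, so each internal angle is the angle between two tetrahedron directions. The short check below uses plain Python; the vertex coordinates are a conventional choice (alternate corners of a cube), not taken from the post.

```python
import math

# Directions to the four vertices of a regular tetrahedron centred at the
# origin (alternate corners of a cube).
TETRAHEDRON = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def angle_deg(u, v) -> float:
    """Angle in degrees between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))


# Every pair of edge directions meets at the same angle.
angles = {round(angle_deg(u, v), 2)
          for i, u in enumerate(TETRAHEDRON)
          for v in TETRAHEDRON[i + 1:]}
print(angles)
print(round(math.degrees(math.acos(-1 / 3)), 2))
```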

Connection to special relativity

Johnson and Weiss have studied the symmetry group of the hexagonal tiling honeycomb:

They describe this group using the ring of Eisenstein integers:

\[ \mathbb{E} = \{ a + b \omega \; \vert \; a, b \in \mathbb{Z} \} \subset \mathbb{C} \]

where \(\omega\) is the cube root of unity \(\exp(2 \pi i/ 3)\). And I believe their work implies this result:

Theorem. The orientation-preserving symmetries of the hexagonal tiling honeycomb form the group \(\mathrm{PGL}(2,\mathbb{E})\).

I’ll sketch a proof later, starting from what they actually show.

For comparison, the group of all orientation-preserving symmetries of hyperbolic space forms the larger group \(\mathrm{PGL}(2,\mathbb{C})\). This group is the same as \(\mathrm{PSL}(2,\mathbb{C})\)… and this naturally brings Minkowski spacetime into the picture!

You see, in special relativity, Minkowski spacetime is \(\mathbb{R}^4\) equipped with the nondegenerate bilinear form

\[ (t,x,y,z) \cdot (t',x',y',z') = t t' - x x' - y y' - z z' \]

usually called the Minkowski metric.

Hyperbolic space sits inside Minkowski spacetime as the hyperboloid of points \(\mathbf{x} = (t,x,y,z)\) with \(\mathbf{x} \cdot \mathbf{x} = 1\) and \(t > 0\). Equivalently, we can think of Minkowski spacetime as the space \(\mathfrak{h}_2(\mathbb{C})\) of \(2 \times 2\) hermitian complex matrices, using the fact that every such matrix is of the form

\[ A = \left( \begin{array}{cc} t + z & x - i y \\ x + i y & t - z \end{array} \right) \]

with

\[ \det(A) = t^2 - x^2 - y^2 - z^2 \]

One reason this viewpoint is productive is that the group of symmetries of Minkowski spacetime that preserve the orientation and also preserve the distinction between future and past is the projective special linear group \(\mathrm{PSL}(2,\mathbb{C})\). The idea here is that any element \(g \in \mathrm{SL}(2,\mathbb{C})\) acts on \(\mathfrak{h}_2(\mathbb{C})\) by

\[ A \mapsto g A g^\ast \]

This action clearly preserves the Minkowski metric (since it preserves the determinant of \(A\)) and also the orientation and the direction of time (because \(\mathrm{SL}(2,\mathbb{C})\) is connected). However, the multiples of the identity matrix that lie in \(\mathrm{SL}(2,\mathbb{C})\), namely \(\pm I\), act trivially. So, we get an action of the quotient group \(\mathrm{PSL}(2,\mathbb{C})\).
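The claims about the action \(A \mapsto g A g^\ast\) can be checked numerically. The sketch uses plain Python complex arithmetic; every function name is our own and hypothetical. It does the following:

- builds \(A\) from \((t,x,y,z)\),
- checks that \(\det A\) equals the Minkowski norm,
- applies a random \(g\) rescaled to determinant 1,
- confirms that the norm is preserved, the time component stays positive, and \(-I\) acts trivially.

This is an illustration, not a proof.

```python
import cmath
import random


def hermitian(t, x, y, z):
    """The 2x2 hermitian matrix ((t+z, x-iy), (x+iy, t-z))."""
    return [[complex(t + z), complex(x, -y)],
            [complex(x, y), complex(t - z)]]


def coords(a):
    """Recover (t, x, y, z) from a hermitian matrix of the form above."""
    t = (a[0][0] + a[1][1]).real / 2
    z = (a[0][0] - a[1][1]).real / 2
    return (t, a[1][0].real, a[1][0].imag, z)


def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose g^*."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def act(g, a):
    """The action A -> g A g^* on hermitian matrices."""
    return matmul(matmul(g, a), dagger(g))


def minkowski(v, w):
    return v[0] * w[0] - v[1] * w[1] - v[2] * w[2] - v[3] * w[3]


def random_sl2c(rng):
    """A random complex 2x2 matrix rescaled to have determinant 1."""
    m = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
         for _ in range(2)]
    s = cmath.sqrt(det(m))
    return [[entry / s for entry in row] for row in m]


v = (2.0, 0.3, -0.7, 1.1)            # a future-pointing timelike vector
A = hermitian(*v)
g = random_sl2c(random.Random(0))
w = coords(act(g, A))
print(det(A).real, minkowski(v, v))  # agree up to rounding
print(minkowski(w, w), w[0] > 0)     # norm preserved, still future-pointing
```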

In these terms, the future cone in Minkowski spacetime is the cone of positive definite hermitian matrices:

\[ \mathcal{K} = \left\{A \in \mathfrak{h}_2(\mathbb{C}) \, \vert \, \det A > 0, \, \mathrm{tr}(A) > 0 \right\} \]

Sitting inside this we have the hyperboloid

\[ \mathcal{H} = \left\{A \in \mathfrak{h}_2(\mathbb{C}) \, \vert \, \det A = 1, \, \mathrm{tr}(A) > 0 \right\} \]

which is none other than hyperbolic space! The Minkowski metric on \(\mathfrak{h}_2(\mathbb{C})\) induces the usual Riemannian metric on hyperbolic space (up to a change of sign).

Indeed, not only is the symmetry group of the hexagonal tiling honeycomb abstractly isomorphic to the subgroup

\[ \mathrm{PGL}(2,\mathbb{E}) \subset \mathrm{PGL}(2,\mathbb{C}) = \mathrm{PSL}(2,\mathbb{C}) \]

we’ve also seen this subgroup acts as orientation-preserving isometries of hyperbolic space. So it seems almost obvious that \(\mathrm{PGL}(2,\mathbb{E})\) acts on hyperbolic space so as to preserve some hexagonal tiling honeycomb!

Constructing the hexagonal tiling honeycomb

Thus, the big question is: how can we actually construct a hexagonal tiling honeycomb inside \(\mathcal{H}\) that is preserved by the action of \(\mathrm{PGL}(2,\mathbb{E})\)? I want to answer this question.

Sitting in the complex numbers we have the ring \(\mathbb{E}\) of Eisenstein integers. This lets us define a lattice in Minkowski spacetime, called \(\mathfrak{h}_2(\mathbb{E})\), consisting of \(2 \times 2\) hermitian matrices with entries that are Eisenstein integers. James Dolan and I conjectured this:

Conjecture. The points in the lattice \(\mathfrak{h}_2(\mathbb{E})\) that lie on the hyperboloid \(\mathcal{H}\) are the centers of hexagons in a hexagonal tiling honeycomb.
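To make the conjectured set of hexagon centers concrete, the sketch below lists the lattice points of \(\mathfrak{h}_2(\mathbb{E})\) on \(\mathcal{H}\) up to a given trace. It does not test anything about hexagons. A hermitian matrix has real diagonal entries, and a real Eisenstein integer is an ordinary integer, so such a point is \(\left(\begin{smallmatrix} a & z \\ \bar z & d \end{smallmatrix}\right)\) with \(a, d \in \mathbb{Z}\), \(z = b_0 + b_1\omega\), and \(ad - |z|^2 = 1\). Here \(|z|^2 = b_0^2 - b_0 b_1 + b_1^2\), and a positive trace then forces \(a, d > 0\). The function names are hypothetical.

```python
import math


def eisenstein_norm(b0: int, b1: int) -> int:
    """|b0 + b1*omega|^2 for omega = exp(2*pi*i/3), i.e. b0^2 - b0*b1 + b1^2."""
    return b0 * b0 - b0 * b1 + b1 * b1


def elements_of_norm(n: int):
    """All Eisenstein integers (as coefficient pairs) with |z|^2 == n."""
    # |z|^2 >= 3*b1^2/4 (and likewise for b0), so this box is exhaustive.
    bound = math.isqrt(4 * n) + 1
    return [(b0, b1)
            for b0 in range(-bound, bound + 1)
            for b1 in range(-bound, bound + 1)
            if eisenstein_norm(b0, b1) == n]


def hyperboloid_points(max_trace: int):
    """Matrices ((a, z), (conj z, d)) in h_2(E) with det 1 and 0 < trace <= max_trace."""
    points = []
    for trace in range(2, max_trace + 1):
        for a in range(1, trace):
            d = trace - a
            for z in elements_of_norm(a * d - 1):
                points.append((a, d, z))
    return points


print(hyperboloid_points(2))       # just the identity matrix
print(len(hyperboloid_points(3)))
```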

I think Greg Egan has done enough work to make this clear. I will try to write up a proof here next time. Once this is done, it should be fairly easy to construct the other features of the hexagonal tiling honeycomb. They should all be related in various ways to the lattice \(\mathfrak{h}_2(\mathbb{E})\).

Connection to the Néron–Severi group of the Eisenstein surface

But what does any of this have to do with the supposed theme of these blog posts: line bundles on complex tori? To answer this, we need to remember that last time I gave an explicit isomorphism between

  • the Néron–Severi group \(\mathrm{NS}(E)\) of the Eisenstein surface \(E = \mathbb{C}^2/\mathbb{E}^2\), and
  • \(\mathfrak{h}_2(\mathbb{E})\) (viewed as an additive group).

This isomorphism wasn’t new: experts know a lot about it. For example, under this correspondence, elements \(A \in \mathfrak{h}_2(\mathbb{E})\) with \(\det(A) > 0\) and \(\mathrm{tr}(A) > 0\) correspond to elements of the Néron–Severi group coming from ample line bundles. Elements of the Néron–Severi group coming from ample line bundles are called polarizations.

Furthermore, elements \(A \in \mathfrak{h}_2(\mathbb{E})\) with \(\det(A) = 1\) and \(\mathrm{tr}(A) > 0\) are known to correspond to certain specially nice polarizations called ‘principal’ polarizations.

So, the conjecture implies this:

Main Result. There is an explicit bijection between principal polarizations of the Eisenstein surface and centers of hexagons in the hexagonal tiling honeycomb.

There’s a lot more to say, but I’ll stop here for now, at least after filling in some details that I owe you.

Symmetries of the hexagonal tiling honeycomb

As shown already by Coxeter, the group of all isometries of hyperbolic space mapping the hexagonal tiling honeycomb to itself is the Coxeter group \([6,3,3]\). That means this group has a presentation with generators \(v, e, f, p\) and relations

\[ v^2 = e^2 = f^2 = p^2 = 1 \]

\[ (v e)^6 = 1, \qquad (e f)^3 = 1, \qquad (f p)^3 = 1, \qquad (v f)^2 = 1, \qquad (v p)^2 = 1, \qquad (e p)^2 = 1 \]

Each generator corresponds to a reflection in hyperbolic space, that is, an orientation-reversing transformation that preserves angles and distances. The products of pairs of generators are orientation preserving, and they generate an index-2 subgroup of \([6,3,3]\) called \([6,3,3]^+\).
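A presentation like this can be sanity-checked in the standard geometric (Tits) representation of a Coxeter group. Each generator \(s_i\) becomes the reflection \(x \mapsto x - 2B(\alpha_i, x)\,\alpha_i\), where \(B(\alpha_i,\alpha_j) = -\cos(\pi/m_{ij})\). In the linear diagram \(v - e - f - p\), every non-adjacent pair commutes, so \(m = 2\) for \(vf\), \(vp\) and \(ep\).

The sketch below (our own function names, plain Python) checks three things:

- each product \(s_i s_j\) has order exactly \(m_{ij}\),
- each generator has determinant \(-1\), so it reverses orientation,
- the Gram matrix has negative determinant, as expected for a group acting on hyperbolic space.

```python
import math
from itertools import combinations

# Coxeter matrix of [6,3,3] for generators in the order v, e, f, p.
NAMES = "vefp"
M = [[1, 6, 2, 2],
     [6, 1, 3, 2],
     [2, 3, 1, 3],
     [2, 2, 3, 1]]
N = len(M)


def gram(m):
    """Bilinear form B(alpha_i, alpha_j) = -cos(pi / m_ij) on the simple roots."""
    return [[-math.cos(math.pi / m[i][j]) for j in range(N)] for i in range(N)]


def reflection(b, i):
    """Matrix (in the root basis) of sigma_i(x) = x - 2 B(alpha_i, x) alpha_i."""
    s = [[float(r == c) for c in range(N)] for r in range(N)]
    for c in range(N):
        s[i][c] -= 2 * b[i][c]
    return s


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def is_identity(a, tol=1e-9):
    return all(abs(a[i][j] - (i == j)) < tol for i in range(N) for j in range(N))


def order(a, limit=24):
    """Smallest k >= 1 with a^k equal to the identity (up to rounding)."""
    power = a
    for k in range(1, limit + 1):
        if is_identity(power):
            return k
        power = matmul(power, a)
    return None


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** c * a[0][c] * det([row[:c] + row[c + 1:] for row in a[1:]])
               for c in range(len(a)))


B = gram(M)
S = [reflection(B, i) for i in range(N)]
for i, j in combinations(range(N), 2):
    print(f"({NAMES[i]}{NAMES[j]}) has order {order(matmul(S[i], S[j]))}")
print("det of each generator:", [round(det(s)) for s in S])
print("det of Gram matrix:", det(B))
```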

Johnson and Weiss describe the group \([6,3,3]^+\) using the Eisenstein integers. Namely, they show it’s isomorphic to the group \(\mathrm{P}\overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E})\).

But what the heck is that?

As usual, \(\mathrm{SL}(2,\mathbb{E})\) is the group of \(2 \times 2\) matrices with entries in \(\mathbb{E}\) having determinant \(1\). But Johnson and Weiss define a slightly larger group \(\overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E})\) to consist of all \(2 \times 2\) matrices with entries in \(\mathbb{E}\) that have determinant with absolute value \(1\). This in turn has a subgroup \(\overline{\mathrm{S}}\mathrm{Z}(2,\mathbb{E})\) consisting of multiples of the identity \(\lambda I\) where \(\lambda \in \mathbb{E}\) has absolute value \(1\). Then they define \(\mathrm{P}\overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E}) = \overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E})/\overline{\mathrm{S}}\mathrm{Z}(2,\mathbb{E})\).

What does the group \(\mathrm{P}\overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E})\) actually amount to? Since all the units in \(\mathbb{E}\) have absolute value \(1\) (they’re just the 6th roots of unity), \(\overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E})\) is the same as the group \(\mathrm{GL}(2,\mathbb{E})\) consisting of all invertible \(2 \times 2\) matrices with entries in \(\mathbb{E}\). The subgroup \(\overline{\mathrm{S}}\mathrm{Z}(2,\mathbb{E})\) consists of matrices \(\lambda I\) where \(\lambda\) is a 6th root of unity. If I’m not confused, this is just the center of \(\mathrm{GL}(2,\mathbb{E})\). So what they’re calling \(\mathrm{P}\overline{\mathrm{S}}\mathrm{L}(2,\mathbb{E})\) is \(\mathrm{GL}(2,\mathbb{E})\) modulo its center. This is usually called \(\mathrm{PGL}(2,\mathbb{E})\).
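The fact that the units of \(\mathbb{E}\) are exactly the six 6th roots of unity is easy to confirm. A unit must have norm \(|a + b\omega|^2 = a^2 - ab + b^2 = 1\). Since that norm is at least \(3b^2/4\) (and likewise \(3a^2/4\)), a small search box is exhaustive. The snippet below is a plain-Python check with our own helper name.

```python
import cmath
import math

OMEGA = cmath.exp(2j * math.pi / 3)  # primitive cube root of unity


def eisenstein_norm(a: int, b: int) -> int:
    """|a + b*omega|^2 = a^2 - a*b + b^2 (always an integer)."""
    return a * a - a * b + b * b


# Norm 1 forces |a|, |b| <= 1, so the box [-2, 2]^2 contains every unit.
units = [(a, b) for a in range(-2, 3) for b in range(-2, 3)
         if eisenstein_norm(a, b) == 1]
print(len(units), units)
# Each unit u satisfies u**6 == 1 up to rounding.
print(all(abs((a + b * OMEGA) ** 6 - 1) < 1e-12 for a, b in units))
```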

All this is a bit confusing, but I think that with massive help from Johnson and Weiss we’ve shown this:

Theorem. The orientation-preserving symmetries of the hexagonal tiling honeycomb form the group \(\mathrm{PGL}(2,\mathbb{E})\).

The interplay between \(\mathrm{PSL}(2,\mathbb{E})\) and \(\mathrm{PGL}(2,\mathbb{E})\) will become clearer next time: the latter group contains some 60 degree rotations that the former group does not!

April 29, 2024

Doug NatelsonMoiré and making superlattices

One of the biggest condensed matter trends in recent years has been the stacking of 2D materials and the development of moiré lattices.  The idea is, take a layer of 2D material and stack it either (1) on itself but with a twist angle, or (2) on another material with a slightly different lattice constant.  Because of interactions between the layers, the electrons in the material have an effective potential energy that has a spatial periodicity associated with the moiré pattern that results.  Stacking hexagonal lattice materials (like graphene or many of the transition metal dichalcogenides) with a twist results in a triangular moiré lattice with a moiré lattice constant that depends on twist angle.  Some of the most interesting physics in these systems seems to pop out when the moiré lattice constant is on the order of a few nm to 10 nm or so.  The upside of the moiré approach is that it can produce such an effective lattice over large areas with really good precision and uniformity (provided that the twist angle can really be controlled - see here and here, for example.)  You might imagine using lithography to make designer superlattices, but getting the kind of cleanliness and homogeneity at these very small length scales is very challenging.
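The dependence of the moiré lattice constant on twist angle follows from simple geometry. For two hexagonal layers with lattice constants \(a\) and \((1+\delta)a\), twisted by \(\theta\), the moiré period is \((1+\delta)a/\sqrt{2(1+\delta)(1-\cos\theta)+\delta^2}\). For a twisted homobilayer (\(\delta = 0\)) this reduces to \(a/(2\sin(\theta/2))\).

The sketch below is a rigid-lattice estimate that ignores lattice relaxation. The graphene lattice constant of 0.246 nm is an assumed input, and the function name is our own.

```python
import math


def moire_period(a_nm: float, theta_deg: float, mismatch: float = 0.0) -> float:
    """Rigid-lattice moiré period (nm) of two stacked hexagonal layers.

    a_nm      lattice constant of the smaller-lattice layer
    theta_deg twist angle between the layers (must be nonzero if mismatch == 0)
    mismatch  relative lattice mismatch delta (0 for a twisted homobilayer)
    """
    theta = math.radians(theta_deg)
    d = mismatch
    return (1 + d) * a_nm / math.sqrt(2 * (1 + d) * (1 - math.cos(theta)) + d * d)


GRAPHENE_A_NM = 0.246  # assumed graphene lattice constant
for theta in (0.5, 1.1, 2.0, 5.0):
    print(f"{theta:4.1f} deg -> {moire_period(GRAPHENE_A_NM, theta):5.1f} nm")
```

For small angles the period grows roughly like \(a/\theta\). For twisted graphene, the "few nm to 10 nm" window therefore corresponds to twists of roughly one to several degrees.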

It's not surprising, then, that people are interested in somehow applying superlattice potentials to nearby monolayer systems.  Earlier this year, Nature Materials ran three papers published sequentially in one issue on this topic, and this is the accompanying News and Views article.

  • In one approach, a MoSe2/WS2 bilayer is made and the charge in the bilayer is tuned so that the bilayer system is a Mott insulator, with charges localized in exactly the moiré lattice sites.  That results in an electrostatic potential that varies on the moiré lattice scale that can then influence a nearby monolayer, which then shows cool moiré/flat band physics itself.
  • Closely related, investigators used a small-angle twisted bilayer of graphene.  That provides a moiré periodic dielectric environment for a nearby single layer of WSe2.  They can optically excite Rydberg excitons in the WSe2, excitons that are comparatively big and puffy and thus quite sensitive to their dielectric environment.  
  • Similarly, twisted bilayer WS2 can be used to apply a periodic Coulomb potential to a nearby bilayer of graphene, resulting in correlated insulating states in the graphene that otherwise wouldn't be there.

Clearly this is a growth industry.  Clever, creative ways to introduce highly ordered superlattice potentials on very small lengthscales with other symmetries besides triangular lattices would be very interesting.

April 26, 2024

Tommaso DorigoShaping The Future Of AI For Fundamental Physics

From April 30 to May 3 more than 300 researchers in fundamental physics will gather in Amsterdam for the first edition of the EUCAIF conference, an initiative supported by the APPEC, NuPecc and ECFA consortia, which is meant to structure future European research activities in fundamental physics with Artificial Intelligence technologies.


April 25, 2024

Terence TaoNotes on the B+B+t theorem

A recent paper of Kra, Moreira, Richter, and Robertson established the following theorem, resolving a question of Erdös. Given a discrete amenable group {G = (G,+)}, and a subset {A} of {G}, we define the Banach density of {A} to be the quantity

\displaystyle  \sup_\Phi \limsup_{N \rightarrow \infty} |A \cap \Phi_N|/|\Phi_N|,

where the supremum is over all Følner sequences {\Phi = (\Phi_N)_{N=1}^\infty} of {G}. Given a set {B} in {G}, we define the restricted sumset {B \oplus B} to be the set of all sums {b_1+b_2} where {b_1, b_2} are distinct elements of {B}.
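To make the distinction between the restricted sumset \(B \oplus B\) and the full sumset \(B + B\) concrete, here is a tiny sketch. For illustration it assumes \(G = \mathbb{Z}\) and a finite \(B\); the function names are our own. The only difference is whether an element may be added to itself.

```python
from itertools import combinations


def restricted_sumset(b):
    """B ⊕ B: all sums b1 + b2 with b1, b2 distinct elements of B."""
    return {x + y for x, y in combinations(set(b), 2)}


def full_sumset(b):
    """B + B: all sums b1 + b2, allowing b1 == b2."""
    s = set(b)
    return {x + y for x in s for y in s}


B = {1, 2, 4}
print(sorted(restricted_sumset(B)))
print(sorted(full_sumset(B)))
```

The doubled elements \(2b\) that appear only in \(B + B\) are exactly what the \(G = \mathbb{Z}\) counterexample exploits to rule out full sums.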

Theorem 1 Let {G} be a countably infinite abelian group with the index {[G:2G]} finite. Let {A} be a positive Banach density subset of {G}. Then there exists an infinite set {B \subset A} and {t \in G} such that {B \oplus B + t \subset A}.

Strictly speaking, the main result of Kra et al. only claims this theorem for the case of the integers {G={\bf Z}}, but as noted in the recent preprint of Charamaras and Mountakis, the argument in fact applies for all countable abelian {G} in which the subgroup {2G := \{ 2x: x \in G \}} has finite index. This condition is in fact necessary (as observed by forthcoming work of Ethan Acklesberg): if {2G} has infinite index, then one can find a subgroup {H_j} of {G} of index {2^j} for any {j \geq 1} that contains {2G} (or equivalently, {G/H_j} is {2}-torsion). If one lets {y_1,y_2,\dots} be an enumeration of {G}, one can then check that the set

\displaystyle  A := G \backslash \bigcup_{j=1}^\infty (H_{j+1} + y_j) \backslash \{y_1,\dots,y_j\}

has positive Banach density, but does not contain any set of the form {B \oplus B + t} for any {t} (indeed, from the pigeonhole principle and the {2}-torsion nature of {G/H_{j+1}} one can show that {B \oplus B + y_j} must intersect {H_{j+1} + y_j \backslash \{y_1,\dots,y_j\}} whenever {B} has cardinality larger than {j 2^{j+1}}). It is also necessary to work with restricted sums {B \oplus B} rather than full sums {B+B}: a counterexample to the latter is provided for instance by the example with {G = {\bf Z}} and {A := \bigcup_{j=1}^\infty [10^j, 1.1 \times 10^j]}. Finally, the presence of the shift {t} is also necessary, as can be seen by considering the example of {A} being the odd numbers in {G ={\bf Z}}, though in the case {G=2G} one can of course delete the shift {t} at the cost of giving up the containment {B \subset A}.

Theorem 1 resembles other theorems in density Ramsey theory, such as Szemerédi’s theorem, but with the notable difference that the pattern located in the dense set {A} is infinite rather than merely arbitrarily large but finite. As such, it does not seem that this theorem can be proven by purely finitary means. However, one can view this result as the conjunction of an infinite number of statements, each of which is a finitary density Ramsey theory statement. To see this, we need some more notation. Observe from Tychonoff’s theorem that the collection {2^G := \{ B: B \subset G \}} is a compact topological space (with the topology of pointwise convergence; it is also metrizable since {G} is countable). Subsets {{\mathcal F}} of {2^G} can be thought of as properties of subsets of {G}; for instance, the property of a subset {B} of {G} being finite is of this form, as is the complementary property of being infinite. A property of subsets of {G} can then be said to be closed or open if it corresponds to a closed or open subset of {2^G}. Thus, a property is closed if and only if it is closed under pointwise limits, and a property is open if, whenever a set {B} has this property, then any other set {B'} that shares a sufficiently large (but finite) initial segment with {B} will also have this property. Since {2^G} is compact and Hausdorff, a property is closed if and only if it is compact.

The properties of being finite or infinite are neither closed nor open. Define a smallness property to be a closed (or compact) property of subsets of {G} that is only satisfied by finite sets; the complement to this is a largeness property, which is an open property of subsets of {G} that is satisfied by all infinite sets. (One could also choose to impose other axioms on these properties, for instance requiring a largeness property to be an upper set, but we will not do so here.) Examples of largeness properties for a subset {B} of {G} include:

  • {B} has at least {10} elements.
  • {B} is non-empty and has at least {b_1} elements, where {b_1} is the smallest element of {B}.
  • {B} is non-empty and has at least {b_{b_1}} elements, where {b_n} is the {n^{\mathrm{th}}} element of {B}.
  • {T} halts when given {B} as input, where {T} is a given Turing machine that halts whenever given an infinite set as input. (Note that this encompasses the preceding three examples as special cases, by selecting {T} appropriately.)
We will call a set obeying a largeness property {{\mathcal P}} a {{\mathcal P}}-large set.
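The first three example largeness properties can be written as predicates on finite sets. Each is decided by a finite initial segment, which is the "open" half of the definition. Every infinite increasing sequence eventually produces a prefix with the property, which is the "satisfied by all infinite sets" half.

The sketch makes several assumptions not fixed by the text:

- \(B\) is a set of positive integers,
- "initial segment" and "\(b_n\)" refer to increasing order,
- the function names are hypothetical.

Each predicate plays the role of one of the Turing machines \(T\) in the fourth example.

```python
from itertools import count


def at_least_ten(b):
    """Example 1: B has at least 10 elements."""
    return len(b) >= 10


def at_least_min(b):
    """Example 2: B is non-empty and has at least b_1 elements, b_1 = min(B)."""
    return len(b) > 0 and len(b) >= min(b)


def at_least_b_of_b1(b):
    """Example 3: B is non-empty and has at least b_{b_1} elements."""
    s = sorted(b)
    if not s or len(s) < s[0]:
        return False  # the element b_{b_1} is not present yet
    return len(s) >= s[s[0] - 1]


def first_large_prefix(prop, elements, limit=10_000):
    """Shortest initial segment of an increasing sequence that satisfies prop."""
    prefix = []
    for x in elements:
        prefix.append(x)
        if prop(prefix):
            return prefix
        if len(prefix) >= limit:
            return None
    return None


print(first_large_prefix(at_least_ten, count(1)))
print(first_large_prefix(at_least_min, count(5)))
print(first_large_prefix(at_least_b_of_b1, count(2, 2)))
```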

Theorem 1 is then equivalent to the following “almost finitary” version (cf. this previous discussion of almost finitary versions of the infinite pigeonhole principle):

Theorem 2 (Almost finitary form of main theorem) Let {G} be a countably infinite abelian group with {[G:2G]} finite. Let {\Phi_n} be a Følner sequence in {G}, let {\delta>0}, and let {{\mathcal P}_t} be a largeness property for each {t \in G}. Then there exists {N} such that if {A \subset G} is such that {|A \cap \Phi_n| / |\Phi_n| \geq \delta} for all {n \leq N}, then there exists a shift {t \in G} and {A} contains a {{\mathcal P}_t}-large set {B} such that {B \oplus B + t \subset A}.

Proof of Theorem 2 assuming Theorem 1. Let {G, \Phi_n}, {\delta}, {{\mathcal P}_t} be as in Theorem 2. Suppose for contradiction that Theorem 2 failed; then for each {N} we can find {A_N} with {|A_N \cap \Phi_n| / |\Phi_n| \geq \delta} for all {n \leq N}, such that there is no {t} and {{\mathcal P}_t}-large {B} such that {B, B \oplus B + t \subset A_N}. By compactness, a subsequence of the {A_N} converges pointwise to a set {A}, which then has Banach density at least {\delta}. By Theorem 1, there is an infinite set {B} and a {t} such that {B, B \oplus B + t \subset A}. By openness, we conclude that there exists a finite {{\mathcal P}_t}-large set {B'} contained in {B}, thus {B', B' \oplus B' + t \subset A}. This implies that {B', B' \oplus B' + t \subset A_N} for infinitely many {N}, a contradiction.

Proof of Theorem 1 assuming Theorem 2. Let {G, A} be as in Theorem 1. If the claim failed, then for each {t}, the property {{\mathcal P}_t} of being a set {B} for which {B, B \oplus B + t \subset A} would be a smallness property. By Theorem 2, we see that there is a {t} and a {B} obeying the complement of this property such that {B, B \oplus B + t \subset A}, a contradiction.

Remark 3 Define a relation {R} between {2^G} and {2^G \times G} by declaring {A\ R\ (B,t)} if {B \subset A} and {B \oplus B + t \subset A}. The key observation that makes the above equivalences work is that this relation is continuous in the sense that if {U} is an open subset of {2^G \times G}, then the inverse image

\displaystyle R^{-1} U := \{ A \in 2^G: A\ R\ (B,t) \hbox{ for some } (B,t) \in U \}

is also open. Indeed, if {A\ R\ (B,t)} for some {(B,t) \in U}, then {B} contains a finite set {B'} such that {(B',t) \in U}, and then any {A'} that contains both {B'} and {B' \oplus B' + t} lies in {R^{-1} U}.

For each specific largeness property, such as the examples listed previously, Theorem 2 can be viewed as a finitary assertion (at least if the property is “computable” in some sense), but if one quantifies over all largeness properties, then the theorem becomes infinitary. In the spirit of the Paris-Harrington theorem, I would in fact expect some cases of Theorem 2 to be undecidable statements of Peano arithmetic, although I do not have a rigorous proof of this assertion.

Despite the complicated finitary interpretation of this theorem, I was still interested in trying to write the proof of Theorem 1 in some sort of “pseudo-finitary” manner, in which one can see analogies with finitary arguments in additive combinatorics. The proof of Theorem 1 that I give below the fold is my attempt to achieve this, although to avoid a complete explosion of “epsilon management” I will still use at one juncture an ergodic theory reduction from the original paper of Kra et al. that relies on such infinitary tools as the ergodic decomposition, the ergodic theorem, and the spectral theorem. Also some of the steps will be a little sketchy, and assume some familiarity with additive combinatorics tools (such as the arithmetic regularity lemma).

— 1. Proof of theorem —

The proof of Kra et al. proceeds by establishing the following related statement. Define a (length three) combinatorial Erdös progression to be a triple {(A,X_1,X_2)} of subsets of {G} such that there exists a sequence {n_j \rightarrow \infty} in {G} such that {A - n_j} converges pointwise to {X_1} and {X_1-n_j} converges pointwise to {X_2}. (By {n_j \rightarrow \infty}, we mean with respect to the cocompact filter; that is, that for any finite (or, equivalently, compact) subset {K} of {G}, {n_j \not \in K} for all sufficiently large {j}.)

Theorem 4 (Combinatorial Erdös progression) Let {G} be a countably infinite abelian group with {[G:2G]} finite. Let {A} be a positive Banach density subset of {G}. Then there exists a combinatorial Erdös progression {(A,X_1,X_2)} with {0 \in X_1} and {X_2} non-empty.

Let us see how Theorem 4 implies Theorem 1. Let {G, A, X_1, X_2, n_j} be as in Theorem 4. By hypothesis, {X_2} contains an element {t} of {G}, thus {0 \in X_1} and {t \in X_2}. Setting {b_1} to be a sufficiently large element of the sequence {n_1, n_2, \dots}, we conclude that {b_1 \in A} and {b_1 + t \in X_1}. Setting {b_2} to be an even larger element of this sequence, we then have {b_2, b_2+b_1+t \in A} and {b_2 +t \in X_1}. Setting {b_3} to be an even larger element, we have {b_3, b_3+b_1+t, b_3+b_2+t \in A} and {b_3 + t \in X_1}. Continuing in this fashion we obtain the desired infinite set {B}.

It remains to establish Theorem 4. The proof of Kra et al. converts this to a topological dynamics/ergodic theory problem. Define a topological measure-preserving {G}-system {(X,T,\mu)} to be a compact space {X} equipped with a Borel probability measure {\mu} as well as an action {n \mapsto T^n} of {G} on {X} by measure-preserving homeomorphisms {T^n: X \rightarrow X}. A point {a} in {X} is said to be generic for {\mu} with respect to a Følner sequence {\Phi} if one has

\displaystyle  \int_X f\ d\mu = \lim_{N \rightarrow \infty} {\bf E}_{n \in \Phi_N} f(T^n a)

for all continuous {f: X \rightarrow {\bf C}}. Define a (length three) dynamical Erdös progression to be a tuple {(a,x_1,x_2)} in {X} with the property that there exists a sequence {n_j \rightarrow \infty} such that {T^{n_j} a \rightarrow x_1} and {T^{n_j} x_1 \rightarrow x_2}.
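To make the two definitions above concrete, here is a toy example of our own choosing (not from the paper of Kra et al.): {G = {\bf Z}}, {X = {\bf R}/{\bf Z}} with Lebesgue measure {\mu}, the rotation {T^n x := x + n\alpha} for an irrational {\alpha} (the value {\sqrt{2}-1} below is an arbitrary assumption), and the Følner sets {\Phi_N = \{0,\dots,N-1\}}. By Weyl equidistribution every point {a} is generic, and every triple {(a,x_1,2x_1-a)} is a dynamical Erdös progression: if {T^{n_j} a \rightarrow x_1} then {n_j\alpha \rightarrow x_1 - a} mod {1}, hence {T^{n_j} x_1 \rightarrow 2x_1 - a}. The sketch below checks both facts numerically; all helper names are hypothetical.

```python
import math

ALPHA = math.sqrt(2) - 1  # irrational rotation number (an assumption of this sketch)

def rotate(x, n):
    """T^n x = x + n * ALPHA (mod 1) on the circle R/Z."""
    return (x + n * ALPHA) % 1.0

def circle_dist(x, y):
    """Distance on R/Z."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)

def ergodic_average(f, a, N):
    """E_{n in Phi_N} f(T^n a) with the Folner sets Phi_N = {0, ..., N-1}."""
    return sum(f(rotate(a, n)) for n in range(N)) / N

def progression_times(a, x1, count):
    """Greedily find increasing n_1 < n_2 < ... with d(T^{n_j} a, x1) < 1/j."""
    times, n = [], 0
    for j in range(1, count + 1):
        n += 1
        while circle_dist(rotate(a, n), x1) >= 1.0 / j:
            n += 1
        times.append(n)
    return times

# Genericity: the average of a continuous f approaches its Lebesgue integral (here 1/2).
f = lambda x: math.cos(2 * math.pi * x) ** 2
print(ergodic_average(f, a=0.3, N=100000))

# Erdos progression (a, x1, 2*x1 - a): on the circle, T^n x1 - x2 equals T^n a - x1.
a, x1 = 0.1, 0.45
x2 = (2 * x1 - a) % 1.0
for j, n in enumerate(progression_times(a, x1, 8), start=1):
    print(j, n, circle_dist(rotate(a, n), x1), circle_dist(rotate(x1, n), x2))
```

The greedy search is only one way to produce a sequence {n_j \rightarrow \infty}; any sequence with {n_j \alpha \rightarrow x_1 - a} mod {1} would do.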

Theorem 4 then follows from

Theorem 5 (Dynamical Erdös progression) Let {G} be a countably infinite abelian group with {[G:2G]} finite. Let {(X,T,\mu)} be a topological measure-preserving {G}-system, let {a} be a {\Phi}-generic point of {\mu} for some Følner sequence {\Phi}, and let {E} be a positive measure open subset of {X}. Then there exists a dynamical Erdös progression {(a,x_1,x_2)} with {x_1 \in E} and {x_2 \in \bigcup_{t \in G} T^t E}.

Indeed, we can take {X} to be {2^G}, {a} to be {A}, {T} to be the shift {T^n B := B-n}, {E := \{ B \in 2^G: 0 \in B \}}, and {\mu} to be a weak limit of the {\mathop{\bf E}_{n \in \Phi_N} \delta_{A-n}} for a Følner sequence {\Phi_N} with {\lim_{N \rightarrow \infty} |A \cap \Phi_N| / |\Phi_N| > 0}, at which point Theorem 4 follows from Theorem 5 after chasing definitions. (It is also possible to establish the reverse implication, but we will not need to do so here.)
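The “chasing definitions” step can be checked by hand in a finite window. In the sketch below (our own illustration, with {G = {\bf Z}}, a hypothetical set {A}, and an interval playing the role of {\Phi_N}), subsets of {G} are Python frozensets, {T^n B := B - n} is the shift, and {E = \{B: 0 \in B\}}. Since {0 \in A - n} exactly when {n \in A}, the empirical measure {{\bf E}_{n \in \Phi_N} \delta_{A-n}(E)} equals the density {|A \cap \Phi_N|/|\Phi_N|}; as {E} is clopen in {2^G}, this passes to the weak limit {\mu}, giving {\mu(E) > 0}.

```python
from fractions import Fraction

def shift(B, n):
    """The shift T^n B := B - n on subsets of the integers."""
    return frozenset(b - n for b in B)

def empirical_measure_of_E(A, window):
    """E_{n in window} delta_{A - n}(E), where E = {B : 0 in B}."""
    hits = sum(1 for n in window if 0 in shift(A, n))
    return Fraction(hits, len(window))

# Hypothetical set A: multiples of 3 together with the perfect squares, inside [0, 1000).
A = frozenset(range(0, 1000, 3)) | frozenset(k * k for k in range(32))
window = range(100, 400)  # a Folner-type window that does not contain the origin
density = Fraction(len(A & set(window)), len(window))
print(empirical_measure_of_E(A, window), density)
```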

A remarkable fact about this theorem is that the point {a} need not be in the support of {\mu}! (In a related vein, the elements {\Phi_j} of the Følner sequence are not required to contain the origin.)

Using a certain amount of ergodic theory and spectral theory, Kra et al. were able to reduce this theorem to a special case:

Theorem 6 (Reduction) To prove Theorem 5, it suffices to do so under the additional hypotheses that {X} is ergodic, and there is a continuous factor map to the Kronecker factor. (In particular, the eigenfunctions of {X} can be taken to be continuous.)

We refer the reader to the paper of Kra et al. for the details of this reduction. Now we specialize for simplicity to the case where {G = {\bf F}_p^\omega = \bigcup_N {\bf F}_p^N} is a countable vector space over a finite field of size equal to an odd prime {p}, so in particular {2G=G}; we also specialize to Følner sequences of the form {\Phi_j = x_j + {\bf F}_p^{N_j}} for some {x_j \in G} and {N_j \geq 1}. In this case we can prove a stronger statement:

Theorem 7 (Odd characteristic case) Let {G = {\bf F}_p^\omega} for an odd prime {p}. Let {(X,T,\mu)} be a topological measure-preserving {G}-system with a continuous factor map to the Kronecker factor, and let {E_1, E_2} be open subsets of {X} with {\mu(E_1) + \mu(E_2) > 1}. Then if {a} is a {\Phi}-generic point of {\mu} for some Følner sequence {\Phi_j = y_j + {\bf F}_p^{N_j}}, there exists a dynamical Erdös progression {(a,x_1,x_2)} with {x_1 \in E_1} and {x_2 \in E_2}.

Indeed, in the setting of Theorem 5 with the ergodicity hypothesis, the set {\bigcup_{t \in G} T^t E} is an open set of full measure, so taking {E_1 := E} and {E_2 := \bigcup_{t \in G} T^t E} we have {\mu(E_1)+\mu(E_2) = \mu(E) + 1 > 1}, and the hypothesis of Theorem 7 is verified in this case. (In the case of more general {G}, this hypothesis ends up being replaced with {\mu(E_1)/[G:2G] + \mu(E_2) > 1}; see Theorem 2.1 of this recent preprint of Kousek and Radic for a treatment of the case {G={\bf Z}} (but the proof extends without much difficulty to the general case).)
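In odd characteristic the relevant fact is that {2G = G}: doubling {h \mapsto 2h} is a bijection of {{\bf F}_p^N}, so if {h} is uniformly distributed then so is {2h}. This is what makes {x_2 := 2x_1} (and later {2h_*}) uniformly distributed in the arguments below. A brute-force check for small parameters (encoding vectors as tuples is our own choice):

```python
from itertools import product

def doubling_is_permutation(p, N):
    """Check whether h -> 2h is a bijection of F_p^N (vectors encoded as tuples)."""
    G = list(product(range(p), repeat=N))
    images = {tuple((2 * x) % p for x in h) for h in G}
    return len(images) == len(G)

for p in (2, 3, 5, 7):
    print(p, doubling_is_permutation(p, 3))
```

Only {p = 2} fails, since there doubling sends every vector to {0}.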

As with Theorem 1, Theorem 7 is still an infinitary statement and does not have a direct finitary analogue (though it can likely be expressed as the conjunction of infinitely many such finitary statements, as we did with Theorem 1). Nevertheless we can formulate the following finitary statement which can be viewed as a “baby” version of the above theorem:

Theorem 8 (Finitary model problem) Let {X = (X,d)} be a compact metric space, let {G = {\bf F}_p^N} be a finite vector space over a field of odd prime order. Let {T} be an action of {G} on {X} by homeomorphisms, let {a \in X}, and let {\mu} be the associated {G}-invariant measure {\mu = {\bf E}_{x \in G} \delta_{T^x a}}. Let {E_1, E_2} be subsets of {X} with {\mu(E_1) + \mu(E_2) > 1 + \delta} for some {\delta>0}. Then for any {\varepsilon>0}, there exist {x_1 \in E_1, x_2 \in E_2} such that

\displaystyle  |\{ h \in G: d(T^h a,x_1) \leq \varepsilon, d(T^h x_1,x_2) \leq \varepsilon \}| \gg_{p,\delta,\varepsilon,X} |G|.

The important thing here is that the bounds are uniform in the dimension {N} (as well as the initial point {a} and the action {T}).

Let us now give a finitary proof of Theorem 8. We can cover the compact metric space {X} by a finite collection {B_1,\dots,B_M} of open balls of radius {\varepsilon/2}. This induces a coloring function {\tilde c: X \rightarrow \{1,\dots,M\}} that assigns to each point in {X} the index {m} of the first ball {B_m} that covers that point. This then induces a coloring {c: G \rightarrow \{1,\dots,M\}} of {G} by the formula {c(h) := \tilde c(T^h a)}. We also define the pullbacks {A_i := \{ h \in G: T^h a \in E_i \}} for {i=1,2}. By hypothesis, we have {|A_1| + |A_2| > (1+\delta)|G|}, and it will now suffice by the triangle inequality to show that there exist {x_1 \in A_1} and {x_2 \in A_2} (which we then identify with the points {T^{x_1} a \in E_1} and {T^{x_2} a \in E_2} of {X}) such that

\displaystyle  |\{ h \in G: c(h) = c(x_1); c(h+x_1)=c(x_2) \}| \gg_{p,\delta,M} |G|.
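The coloring step can be sketched concretely. Purely as assumptions for illustration, {X} is the unit interval {[0,1]} with the usual metric, the balls {B_m} are centered on a grid of spacing {\varepsilon/2} (so that they cover {X}), and the coloring returns the first ball containing the point (indexed from {0} rather than {1}). Two points with the same color lie in a common ball of radius {\varepsilon/2}, so they are within {\varepsilon} of each other by the triangle inequality.

```python
import math
import random

def make_coloring(eps):
    """Cover [0, 1] by open balls of radius eps/2 and return the 'first covering ball' coloring."""
    centers = [k * eps / 2 for k in range(math.ceil(2 / eps) + 1)]

    def color(x):
        for m, center in enumerate(centers):
            if abs(x - center) < eps / 2:
                return m  # index of the first ball that covers x (0-based)
        raise ValueError("point not covered by any ball")

    return color, len(centers)

eps = 0.1
color, M = make_coloring(eps)
rng = random.Random(0)
points = [rng.random() for _ in range(300)]
colors = [color(x) for x in points]

# Same color => both points lie in one ball of radius eps/2 => distance < eps.
worst = max(
    abs(points[i] - points[j])
    for i in range(len(points)) for j in range(len(points))
    if colors[i] == colors[j]
)
print(M, worst)
```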

Now we apply the arithmetic regularity lemma of Green with some regularity parameter {\kappa>0} to be chosen later. This allows us to partition {G} into cosets of a subgroup {H} of index {O_{p,\kappa}(1)}, such that on all but {\kappa [G:H]} of these cosets {y+H}, all the color classes {\{x \in y+H: c(x) = c_0\}} are {\kappa^{100}}-regular in the Fourier ({U^2}) sense. Now we sample {x_1} uniformly from {G}, and set {x_2 := 2x_1}; as {p} is odd, {x_2} is also uniform in {G}. If {x_1} lies in a coset {y+H}, then {x_2} will lie in {2y+H}. By removing an exceptional event of probability {O(\kappa)}, we may assume that neither of these cosets {y+H}, {2y+H} is a bad coset. By removing a further exceptional event of probability {O_M(\kappa)}, we may also assume that {x_1} is in a popular color class of {y+H} in the sense that

\displaystyle  |\{ x \in y+H: c(x) = c(x_1) \}| \geq \kappa |H| \ \ \ \ \ (1)

since the exceptional {x_1} that fail this condition occur with probability only {O(M\kappa)}. Similarly we may assume that

\displaystyle  |\{ x \in 2y+H: c(x) = c(x_2) \}| \geq \kappa |H|. \ \ \ \ \ (2)

Now we consider the quantity

\displaystyle  |\{ h \in y+H: c(h) = c(x_1); c(h+x_1)=c(x_2) \}|

which we can write as

\displaystyle  |H| {\bf E}_{h \in y+H} 1_{c^{-1}(c(x_1))}(h) 1_{c^{-1}(c(x_2))}(h+x_1).

Both factors here are {O(\kappa^{100})}-uniform in their respective cosets. Thus by standard Fourier calculations, we see that after excluding another exceptional event of probability {O(\kappa)}, this quantity is equal to

\displaystyle  |H| (({\bf E}_{h \in y+H} 1_{c^{-1}(c(x_1))}(h)) ({\bf E}_{h \in y+H} 1_{c^{-1}(c(x_2))}(h+x_1)) + O(\kappa^{10})).

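By (1), (2) (note that {h+x_1 \in 2y+H} whenever {h \in y+H}, since {x_1 \in y+H}), this expression is {\gg \kappa^2 |H| \gg_{p,\kappa} |G|}. Since {x_1} and {x_2} are each uniformly distributed on {G}, we have {{\bf P}(x_1 \in A_1) + {\bf P}(x_2 \in A_2) > 1+\delta}, and hence {{\bf P}(x_1 \in A_1, x_2 \in A_2) > \delta}. By choosing {\kappa} small enough depending on {M,\delta} that the exceptional events above have total probability less than {\delta}, we can ensure that {x_1 \in A_1} and {x_2 \in A_2} (so that {T^{x_1} a \in E_1} and {T^{x_2} a \in E_2}), and the claim follows.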
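The “standard Fourier calculations” invoked above rest on an identity that is easy to check numerically. With the normalization {\hat f(\xi) := {\bf E}_{x \in G} f(x) e(-\xi \cdot x/p)} on {G = {\bf F}_p^N} (our choice) and real-valued {f,g}, one has {{\bf E}_{h \in G} f(h) g(h+x) = \sum_\xi \overline{\hat f(\xi)} \hat g(\xi) e(\xi \cdot x/p)}. The {\xi=0} term is the product of the means; by Parseval the remaining terms are small on average over {x} (though not necessarily for every {x}) when one of the functions is Fourier-uniform, which is consistent with the need to discard an exceptional set of shifts. The sketch verifies the identity for random indicator functions on {{\bf F}_3^2}, a hypothetical small instance rather than anything from the proof.

```python
import cmath
import random
from itertools import product

p, N = 3, 2
G = list(product(range(p), repeat=N))

def add(x, y):
    return tuple((a + b) % p for a, b in zip(x, y))

def e(xi, x):
    """The character x -> e(xi . x / p) on F_p^N."""
    return cmath.exp(2j * cmath.pi * sum(a * b for a, b in zip(xi, x)) / p)

def fourier(f):
    """hat f(xi) = E_x f(x) e(-xi . x / p)."""
    return {xi: sum(f[x] * e(xi, x).conjugate() for x in G) / len(G) for xi in G}

def correlation(f, g, x):
    """E_h f(h) g(h + x), computed directly."""
    return sum(f[h] * g[add(h, x)] for h in G) / len(G)

def correlation_via_fourier(fh, gh, x):
    """sum_xi conj(hat f(xi)) hat g(xi) e(xi . x / p); the xi = 0 term is E f * E g."""
    return sum(fh[xi].conjugate() * gh[xi] * e(xi, x) for xi in G)

rng = random.Random(1)
f = {x: rng.randint(0, 1) for x in G}  # indicators of two random subsets
g = {x: rng.randint(0, 1) for x in G}
fh, gh = fourier(f), fourier(g)
for x in G:
    print(x, correlation(f, g, x), correlation_via_fourier(fh, gh, x).real)
```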

Now we can prove the infinitary result in Theorem 7. Let us place a metric {d} on {X}. By sparsifying the Følner sequence {\Phi_j = y_j + {\bf F}_p^{N_j}}, we may assume that the {N_j} grow as fast as we wish. Once we do so, we claim that for each {J}, we can find {x_{1,J}, x_{2,J} \in X} such that for each {1 \leq j \leq J}, there exists {n_j \in \Phi_j} that lies outside of {{\bf F}_p^j} such that

\displaystyle  d(T^{n_j} a, x_{1,J}) \leq 1/j, \quad d(T^{n_j} x_{1,J}, x_{2,J}) \leq 1/j.

Passing to a subsequence to make {x_{1,J}, x_{2,J}} converge to {x_1, x_2} respectively, we obtain the desired Erdös progression.

Fix {J}, and let {M} be a large parameter (much larger than {J}) to be chosen later. By genericity, we know that the discrete measures {{\bf E}_{h \in \Phi_M} \delta_{T^h a}} converge vaguely to {\mu}, so any point in the support of {\mu} can be approximated by some point {T^h a} with {h \in \Phi_M}. Unfortunately, {a} does not necessarily lie in this support! (Note that {\Phi_M} need not contain the origin.) However, we are assuming a continuous factor map {\pi:X \rightarrow Z} to the Kronecker factor {Z}, which is a compact abelian group, and {\mu} pushes down to the Haar measure of {Z}, which has full support. In particular, the support of this pushforward contains {\pi(a)}. As a consequence, we can find {h_M \in \Phi_M} such that {\pi(T^{h_M} a)} converges to {\pi(a)}, even if we cannot ensure that {T^{h_M} a} converges to {a}. We are assuming that {\Phi_M} is a coset of {{\bf F}_p^{n_M}}, so now {{\bf E}_{h \in {\bf F}_p^{n_M}} \delta_{T^{h+h_M} a}} converges vaguely to {\mu}.

We make the random choice {x_{1,J} := T^{h_*+h_M} a}, {x_{2,J} := T^{2h_*+h_M} a}, where {h_*} is drawn uniformly at random from {{\bf F}_p^{n_M}}. This is not the only possible choice that can be made here, and is in fact not optimal in certain respects (in particular, it creates a fair bit of coupling between {x_{1,J}}, {x_{2,J}}), but is easy to describe and will suffice for our argument. (A more appropriate choice, closer to the arguments of Kra et al., would be to replace {x_{2,J}} in the above construction by {T^{2h_*+k_*+h_M} a}, where the additional shift {k_*} is a random variable in {{\bf F}_p^{n_M}} independent of {h_*} that is uniformly drawn from all shifts annihilated by the first {M} characters associated to some enumeration of the (necessarily countable) point spectrum of {T}, but this is harder to describe.)

Since we are in odd characteristic, the map {h \mapsto 2h} is a permutation on {h \in {\bf F}_p^{n_M}}, and so {x_{1,J}}, {x_{2,J}} are both distributed according to the law {{\bf E}_{h \in {\bf F}_p^{n_M}} \delta_{T^{h+h_M} a}}, though they are coupled to each other. In particular, by vague convergence (and inner regularity) we have

\displaystyle  {\bf P}( x_{1,J} \in E_1 ) \geq \mu(E_1) - o(1)

and

\displaystyle  {\bf P}( x_{2,J} \in E_2 ) \geq \mu(E_2) - o(1)

where {o(1)} denotes a quantity that goes to zero as {M \rightarrow \infty} (holding all other parameters fixed). By the hypothesis {\mu(E_1)+\mu(E_2) > 1}, we thus have

\displaystyle  {\bf P}( x_{1,J} \in E_1, x_{2,J} \in E_2 ) \geq \kappa - o(1) \ \ \ \ \ (3)

for some {\kappa>0} independent of {M}.
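Spelled out, one admissible choice is {\kappa := \mu(E_1)+\mu(E_2)-1}, which is positive by hypothesis: the elementary bound {{\bf P}(A \cap B) \geq {\bf P}(A) + {\bf P}(B) - 1} combined with the two displays preceding (3) gives

\displaystyle  {\bf P}( x_{1,J} \in E_1, x_{2,J} \in E_2 ) \geq {\bf P}( x_{1,J} \in E_1 ) + {\bf P}( x_{2,J} \in E_2 ) - 1 \geq \mu(E_1) + \mu(E_2) - 1 - o(1).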

We will show that for each {1 \leq j \leq J}, one has

\displaystyle  |\{ h \in \Phi_j: d(T^{h} a,x_{1,J}) \leq 1/j, d(T^h x_{1,J},x_{2,J}) \leq 1/j \}| \ \ \ \ \ (4)

\displaystyle  \gg_{p,\kappa,j,X} (1-o(1)) |\Phi_j|

outside of an event of probability at most {\kappa/2^{j+1}+o(1)} (compare with Theorem 8). If this is the case, then by the union bound we can find (for {M} large enough) a choice of {x_{1,J}}, {x_{2,J}} obeying (3) as well as (4) for all {1 \leq j \leq J}. If the {N_j} grow fast enough, we can then ensure that for each {1 \leq j \leq J} one can find (again for {M} large enough) {n_j} in the set in (4) that avoids {{\bf F}_p^j}, and the claim follows.
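In more detail, the exceptional events for {j=1,\dots,J} have total probability at most

\displaystyle  \sum_{j=1}^J \left(\frac{\kappa}{2^{j+1}} + o(1)\right) = \frac{\kappa}{2}\left(1 - 2^{-J}\right) + o(1) < \kappa - o(1)

for {M} large enough (since {J} is fixed, a sum of {J} terms of size {o(1)} is still {o(1)}), so by (3) some outcome avoids all of them while keeping {x_{1,J} \in E_1} and {x_{2,J} \in E_2}.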

It remains to show (4) outside of an exceptional event of acceptable probability. Let {\tilde c: X \rightarrow \{1,\dots,M_j\}} be the coloring function from the proof of Theorem 8 (with {\varepsilon := 1/j}). Then it suffices to show that

\displaystyle  |\{ h \in \Phi_j: c_0(h) = c(h_*); c(h+h_*)=c(2h_*) \}| \gg_{p,\kappa,M_j} (1-o(1)) |\Phi_j|

where {c_0(h) := \tilde c(T^h a)} and {c(h) := \tilde c(T^{h+h_M} a)}. This is a counting problem associated to the pattern {(h_*, h, h+h_*, 2h_*)}; if we concatenate the {h_*} and {2h_*} components of the pattern, this is a classic “complexity one” pattern, of the type that would be expected to be amenable to Fourier analysis (especially if one applies Cauchy-Schwarz to eliminate the {h_*} averaging and absolute value, at which point one is left with the {U^2} pattern {(h, h+h_*, h', h'+h_*)}).
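On a finite group {G = {\bf F}_p^N}, the {U^2} pattern {(h, h+k, h', h'+k)} obeys the standard identity {{\bf E}_{h,h',k \in G} f(h)\overline{f(h+k)}\,\overline{f(h')} f(h'+k) = \sum_\xi |\hat f(\xi)|^4}, which is one way to see why such complexity-one counts are governed by Fourier analysis. The sketch below checks it by brute force for a random complex-valued {f} on {{\bf F}_3^2}; the group, the function and the normalization {\hat f(\xi) := {\bf E}_x f(x) e(-\xi \cdot x/p)} are our assumptions.

```python
import cmath
import random
from itertools import product

p, N = 3, 2
G = list(product(range(p), repeat=N))

def add(x, y):
    return tuple((a + b) % p for a, b in zip(x, y))

def e(xi, x):
    """The character x -> e(xi . x / p) on F_p^N."""
    return cmath.exp(2j * cmath.pi * sum(a * b for a, b in zip(xi, x)) / p)

def u2_average(f):
    """E_{h,h',k} f(h) conj(f(h+k)) conj(f(h')) f(h'+k), i.e. ||f||_{U^2}^4, by brute force."""
    total = sum(
        f[h] * f[add(h, k)].conjugate() * f[h2].conjugate() * f[add(h2, k)]
        for h in G for h2 in G for k in G
    )
    return total / len(G) ** 3

def fourier_l4(f):
    """sum_xi |hat f(xi)|^4 with hat f(xi) = E_x f(x) e(-xi . x / p)."""
    coeffs = [sum(f[x] * e(xi, x).conjugate() for x in G) / len(G) for xi in G]
    return sum(abs(c) ** 4 for c in coeffs)

rng = random.Random(2)
f = {x: complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for x in G}
print(u2_average(f), fourier_l4(f))
```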

In the finitary setting, we used the arithmetic regularity lemma. Here, we will need to use the Kronecker factor instead. The indicator function {1_{\tilde c^{-1}(i)}} of a level set of the coloring function {\tilde c} is a bounded measurable function on {X}, and can thus be decomposed into a function {f_i} that is measurable on the Kronecker factor, plus an error term {g_i} that is orthogonal to that factor and thus is weakly mixing in the sense that {|\langle T^h g_i, g_i \rangle|} tends to zero on average (or equivalently, that the Host-Kra seminorm {\|g_i\|_{U^2}} vanishes). Meanwhile, for any {\varepsilon > 0}, the Kronecker-measurable function {f_i} can be decomposed further as {P_{i,\varepsilon} + k_{i,\varepsilon}}, where {P_{i,\varepsilon}} is a bounded “trigonometric polynomial” (a finite sum of eigenfunctions) and {\|k_{i,\varepsilon}\|_{L^2} < \varepsilon}. The polynomial {P_{i,\varepsilon}} is continuous by hypothesis. The other two terms in the decomposition are merely measurable, but can be approximated to arbitrary accuracy by continuous functions. The upshot is that we can arrive at a decomposition

\displaystyle  1_{\tilde c^{-1}(i)} = P_{i,\varepsilon} + k_{i,\varepsilon,\varepsilon'} + g_{i,\varepsilon'}

(analogous to the arithmetic regularity lemma) for any {\varepsilon,\varepsilon'>0}, where {k_{i,\varepsilon,\varepsilon'}} is a bounded continuous function of {L^2} norm at most {\varepsilon}, and {g_{i,\varepsilon'}} is a bounded continuous function of {U^2} norm at most {\varepsilon'} (in practice we will take {\varepsilon'} much smaller than {\varepsilon}). Pulling back to {c}, we then have

\displaystyle  1_{c(h)=i} = P_{i,\varepsilon}(T^{h+h_M} a) + k_{i,\varepsilon,\varepsilon'}(T^{h+h_M}a) + g_{i,\varepsilon'}(T^{h+h_M}a). \ \ \ \ \ (5)
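A finitary caricature of this decomposition (our own illustration, not the argument of Kra et al.): on {G = {\bf F}_3^3}, keep the Fourier coefficients of {f} of size at least a threshold {\eta} to form a trigonometric polynomial {f_{str}} (the analogue of {P_{i,\varepsilon}}), and call the rest {f_{unf}} (the analogue of the weakly mixing term; there is no separate {L^2}-small term here). The characters in this large spectrum have a common kernel {H} of index at most {p^k}, with {k} the number of such characters, and {f_{str}} is constant on cosets of {H}, which is the mechanism used in the next paragraph. The test function, with two deliberately flipped values, and the threshold {\eta = 0.2} are hypothetical.

```python
import cmath
from itertools import product

p, N = 3, 3
G = list(product(range(p), repeat=N))

def dot(xi, x):
    return sum(a * b for a, b in zip(xi, x)) % p

def e(xi, x):
    """The character x -> e(xi . x / p) on F_p^N."""
    return cmath.exp(2j * cmath.pi * dot(xi, x) / p)

def fourier(f):
    """hat f(xi) = E_x f(x) e(-xi . x / p)."""
    return {xi: sum(f[x] * e(xi, x).conjugate() for x in G) / len(G) for xi in G}

# Hypothetical test function: indicator of the hyperplane x0 + 2*x1 = 0, with two values flipped.
f = {x: 1 if (x[0] + 2 * x[1]) % p == 0 else 0 for x in G}
f[(1, 1, 1)] = 0
f[(2, 0, 1)] = 1

eta = 0.2
fh = fourier(f)
spec = [xi for xi in G if abs(fh[xi]) >= eta]                  # large spectrum
f_str = {x: sum(fh[xi] * e(xi, x) for xi in spec) for x in G}  # trigonometric polynomial
f_unf = {x: f[x] - f_str[x] for x in G}                        # Fourier-uniform remainder

# H = common kernel of the characters in the large spectrum.
H = {x for x in G if all(dot(xi, x) == 0 for xi in spec)}

def same_coset(x, y):
    return tuple((a - b) % p for a, b in zip(x, y)) in H

print(sorted(spec), len(G) // len(H))
print(max(abs(c) for c in fourier(f_unf).values()))
print(all(abs(f_str[x] - f_str[y]) < 1e-9 for x in G for y in G if same_coset(x, y)))
```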

Let {\varepsilon,\varepsilon'>0} be chosen later. The trigonometric polynomial {h \mapsto P_{i,\varepsilon}(T^{h} a)} is just a sum of {O_{\varepsilon,M_j}(1)} characters on {G}, so one can find a subgroup {H} of {G} of index {O_{p,\varepsilon,M_j}(1)} such that these polynomials are constant on each coset of {H} for all {i}. Then {h_*} lies in some coset {a_*+H} and {2h_*} lies in the coset {2a_*+H}. We then restrict {h} to also lie in {a_*+H}, and we will show that

\displaystyle  |\{ h \in \Phi_j \cap (a_*+H): c_0(h) = c(h_*); c(h+h_*)=c(2h_*) \}| \ \ \ \ \ (6)

\displaystyle  \gg_{\kappa,p,M_j} (1-o(1)) |\Phi_j \cap (a_*+H)|

outside of an exceptional event of probability {\kappa/2+o(1)}, which will establish our claim because {\varepsilon} will ultimately be chosen to depend on {p,\kappa,M_j}.

The left-hand side can be written as

\displaystyle  \sum_{i,i'} \sum_{h \in \Phi_j \cap (a_*+H)} 1_{c_0(h)=i} 1_{c(h_*)=i, c(2h_*)=i'} 1_{c(h+h_*)=i'}.

The coupling of the constraints {c(h_*)=i} and {c(2h_*)=i'} is annoying (as {(h_*,2h_*)} is an “infinite complexity” pattern that cannot be controlled by any uniformity norm), but (perhaps surprisingly) will not end up causing an essential difficulty to the argument, as we shall see when we start eliminating the terms in this sum one at a time starting from the right.

We decompose the {1_{c(h+h_*)=i'}} term using (5):

\displaystyle  1_{c(h+h_*)=i'} = P_{i',\varepsilon}(T^{h+h_*+h_M} a) + k_{i',\varepsilon,\varepsilon'}(T^{h+h_*+h_M}a) + g_{i',\varepsilon'}(T^{h+h_*+h_M}a).

By Markov’s inequality, and removing an exceptional event of probability at most {\kappa/100}, we may assume that the {k_{i',\varepsilon,\varepsilon'}} have normalized {L^2} norm {O_{\kappa,M_j}(\varepsilon)} on both of these cosets {a_*+H, 2a_*+H}. As such, the contribution of {k_{i',\varepsilon,\varepsilon'}(T^{h+h_*+h_M}a)} to (6) becomes negligible if {\varepsilon} is small enough (depending on {\kappa,p,M_j}). From the near weak mixing of the {g_{i,\varepsilon'}}, we know that

\displaystyle {\bf E}_{h \in \Phi_j \cap (a_*+H)} |\langle T^h g_{i,\varepsilon'}, g_{i,\varepsilon'} \rangle| \ll_{p,\varepsilon,M_j} \varepsilon'

for all {i}, if we choose {\Phi_j} large enough. By genericity of {a}, this implies that

\displaystyle {\bf E}_{h \in \Phi_j \cap (a_*+H)} |{\bf E}_{l \in {\bf F}_p^{n_M}} g_{i,\varepsilon'}(T^{h+l+h_M} a) g_{i,\varepsilon'}(T^{l+h_M} a)| \ll_{p,\varepsilon,M_j} \varepsilon' + o(1).

From this and standard Cauchy-Schwarz (or van der Corput) arguments we can then show that the contribution of the {g_{i',\varepsilon'}(T^{h+h_*+h_M}a)} to (6) is negligible outside of an exceptional event of probability at most {\kappa/100+o(1)}, if {\varepsilon'} is small enough depending on {\kappa,p,M_j,\varepsilon}. Finally, the quantity {P_{i',\varepsilon}(T^{h+h_*+h_M} a)} is independent of {h}, and in fact is equal up to negligible error to the density of {c^{-1}(i')} in {{\bf F}_p^{n_M} \cap (2a_*+H)}. This density will be {\gg_{p,\kappa,M_j} 1} except for those {i'} which would have made a negligible impact on (6) in any event due to the rareness of the event {c(2h_*)=i'} in such cases. As such, to prove (6) it suffices to show that

\displaystyle  \sum_{i,i'} \sum_{h \in \Phi_j \cap (a_*+H)} 1_{c_0(h)=i} 1_{c(h_*)=i, c(2h_*)=i'} \gg_{\kappa,p,M_j} (1-o(1)) |\Phi_j \cap (a_*+H)|

outside of an event of probability {\kappa/100+o(1)}. Now one can sum in {i'} to simplify the above estimate to

\displaystyle  \sum_{i} 1_{c(h_*)=i} (\sum_{h \in \Phi_j \cap (a_*+H)} 1_{c_0(h)=i}) / |\Phi_j \cap (a_*+H)| \gg_{\kappa,p,M_j} 1-o(1).

If {i} is such that {(\sum_{h \in \Phi_j \cap (a_*+H)} 1_{c_0(h)=i})/|\Phi_j \cap (a_*+H)|} is small compared with {p,\kappa,M_j}, then by genericity (and assuming {\Phi_j} large enough), the probability that {c(h_*)=i} will similarly be small (up to {o(1)} errors), and thus have a negligible influence on the above sum. As such, the above estimate simplifies to

\displaystyle  \sum_{i} 1_{c(h_*)=i} \gg_{\kappa,p,M_j} 1-o(1).

But the left-hand side sums to one, and the claim follows.

April 24, 2024

Terence TaoErratum for “An inverse theorem for the Gowers U^{s+1}[N]-norm”

The purpose of this post is to report an erratum to the 2012 paper “An inverse theorem for the Gowers {U^{s+1}[N]}-norm” of Ben Green, myself, and Tamar Ziegler (previously discussed in this blog post). The main results of this paper have been superseded with stronger quantitative results, first in work of Manners (using somewhat different methods), and more recently in a remarkable paper of Leng, Sah, and Sawhney which combined the methods of our paper with several new innovations to obtain quite strong bounds (of quasipolynomial type); see also an alternate proof of our main results (again by quite different methods) by Candela and Szegedy. In the course of their work, they discovered some fixable but nontrivial errors in our paper. These (rather technical) issues were already implicitly corrected in this followup work which supersedes our own paper, but for the sake of completeness we are also providing a formal erratum for our original paper, which can be found here. We thank Leng, Sah, and Sawhney for bringing these issues to our attention.

Excluding some minor (mostly typographical) issues which we also have reported in this erratum, the main issues stemmed from a conflation of two notions of a degree {s} filtration

\displaystyle  G = G_0 \geq G_1 \geq \dots \geq G_s \geq G_{s+1} = \{1\}

of a group {G}, which is a nested sequence of subgroups that obey the relation {[G_i,G_j] \leq G_{i+j}} for all {i,j}. The weaker notion (sometimes known as a prefiltration) permits the group {G_1} to be strictly smaller than {G_0}, while the stronger notion requires {G_0} and {G_1} to be equal. In practice, one can often move between the two concepts, as {G_1} is always normal in {G_0}, and a prefiltration behaves like a filtration on every coset of {G_1} (after applying a translation and perhaps also a conjugation). However, we did not clarify this issue sufficiently in the paper, and there are some places in the text where results that were only proven for filtrations were applied for prefiltrations. The erratum fixes these issues, mostly by clarifying that we work with filtrations throughout (which requires some decomposition into cosets in places where prefiltrations are generated). Similar adjustments need to be made for multidegree filtrations and degree-rank filtrations, which we also use heavily in our paper.

In most cases, fixing this issue only required minor changes to the text, but there is one place (Section 8) where there was a non-trivial problem: we used the claim that the final group {G_s} was a central group, which is true for filtrations, but not necessarily for prefiltrations. This fact (or more precisely, a multidegree variant of it) was used to claim a factorization for a certain product of nilcharacters, which is in fact not true as stated. In the erratum, a substitute factorization for a slightly different product of nilcharacters is provided, which is still sufficient to conclude the main result of this part of the paper (namely, a statistical linearization of a certain family of nilcharacters in the shift parameter {h}).
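A small example of our own (not taken from the erratum) shows how centrality of the last group can fail for prefiltrations. For a filtration, {[G, G_s] = [G_1, G_s] \leq G_{s+1} = \{1\}}, so {G_s} is central. In the Heisenberg group over {{\bf Z}/3}, written as triples {(a,b,c)} with {(a,b,c)(a',b',c') = (a+a',b+b',c+c'+ab')}, the sequence {G \geq G \geq Z(G) \geq \{1\}} is a degree {2} filtration with central last group, while {G_0 = G \geq G_1 = \{(0,b,c)\} \geq G_2 = \{1\}} obeys {[G_i,G_j] \leq G_{i+j}} (a degree {1} prefiltration) even though {G_1} is not central in {G_0}. The sketch checks both claims by brute force.

```python
from itertools import product

p = 3
G = list(product(range(p), repeat=3))
IDENTITY = (0, 0, 0)

def mul(x, y):
    """Heisenberg group law (a,b,c)(a',b',c') = (a+a', b+b', c+c'+a*b') mod p."""
    return ((x[0] + y[0]) % p, (x[1] + y[1]) % p, (x[2] + y[2] + x[0] * y[1]) % p)

def inv(x):
    """Inverse (a,b,c)^{-1} = (-a, -b, ab - c) mod p."""
    return ((-x[0]) % p, (-x[1]) % p, (x[0] * x[1] - x[2]) % p)

def comm(x, y):
    """Commutator [x, y] = x^{-1} y^{-1} x y."""
    return mul(mul(inv(x), inv(y)), mul(x, y))

def is_prefiltration(groups):
    """Check [G_i, G_j] <= G_{i+j}, with G_k = {1} beyond the end of the list."""
    def grp(k):
        return groups[k] if k < len(groups) else {IDENTITY}
    return all(
        comm(x, y) in grp(i + j)
        for i in range(len(groups)) for j in range(len(groups))
        for x in groups[i] for y in groups[j]
    )

def is_central(H):
    return all(mul(h, g) == mul(g, h) for h in H for g in G)

full = set(G)
center = {x for x in G if x[0] == 0 and x[1] == 0}
filtration = [full, full, center, {IDENTITY}]                  # G_0 = G_1 = G, degree 2
prefiltration = [full, {x for x in G if x[0] == 0}, {IDENTITY}]  # G_1 < G_0, degree 1

print(is_prefiltration(filtration), is_central(filtration[-2]))
print(is_prefiltration(prefiltration), is_central(prefiltration[-2]))
```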

Again, we stress that these issues do not impact the paper of Leng, Sah, and Sawhney, as they adapted the methods in our paper in a fashion that avoids these errors.

April 11, 2024

Jordan EllenbergRoad trip to totality 2024

The last time we did this it was so magnificent that I said, on the spot, “see you again in 2024,” and seven years didn’t dim my wish to see the sun wink out again. It was easier this time — the path went through Indiana, which is a lot closer to home than St. Louis. More importantly, CJ can drive now, and likes to, so the trip is fully chauffeured. We saw the totality in Zionsville, IN, in a little park at the end of a residential cul-de-sac.

It was a smaller crowd than the one at Festus, MO in 2017; and unlike last time there weren’t a lot of travelers. These were just people who happened to live in Zionsville, IN and who were home in the middle of the day to see the eclipse. There were clouds, and a lot of worries about the clouds, but in the end it was just thin cirrus strips that blocked the sun, and then the non-sun, not at all.

To me it was a little less dramatic this time — because the crowd was more casual, because the temperature drop was less stark in April than it was in August, and of course because it was never again going to be the first time. But CJ and AB thought this one was better. We had very good corona. You could see a tiny red dot on the edge of the sun which was in fact a plasma prominence much bigger than the Earth.

Some notes:

  • We learned our lesson last time when we got caught in a massive traffic jam in the middle of a cornfield. We chose Zionsville because it was in the northern half of the totality, right on the highway, so we could be in the car zipping north on I-65 before the massive wave of northbound traffic out of Indianapolis caught up with us. And we were! Very satisfying, to watch on Google Maps as the traffic jam got longer and longer behind us, but was never quite where we were, as if we were depositing it behind us.
  • We had lunch in downtown Indianapolis where there is a giant Kurt Vonnegut Jr. painted on a wall. CJ is reading Slaughterhouse Five for school — in fact, to my annoyance, it’s the only full novel they’ve read in their American Lit elective. But it’s a pretty good choice for high school assigned reading. In the car I tried to explain Vonnegut’s theory of the granfalloon as it applied to “Hoosier” but neither kid was really interested.
  • We’ve done a fair number of road trips in the Mach-E and this was the first time charging created any annoyance. The Electrify America station we wanted on the way down had two chargers in use and the other two broken, so we had to detour quite a ways into downtown Lafayette to charge at a Cadillac dealership. On the way back, the station we planned on was full with one person waiting in line, so we had to change course and charge at the Whole Foods parking lot, and even there we got lucky as one person was leaving just as we arrived. The charging process probably added an hour to our trip each way.
  • While we charged at the Whole Foods in Schaumburg we hung out at the Woodfield Mall. Nostalgic feelings, for this suburban kid, to be in a thriving, functioning mall, with groups of kids just hanging out and vaguely shopping, the way we used to. The malls in Madison don’t really work like this any more. Is it a Chicago thing?
  • CJ is off to college next year. Sad to think there may not be any more roadtrips, or at least any more roadtrips where all of us are starting from home.
  • I was wondering whether total eclipses in the long run are equidistributed on the Earth’s surface and the answer is no: Ernie Wright at NASA made an image of the last 5000 years of eclipse paths superimposed:

There are more in the northern hemisphere than the southern because there are more eclipses in the summer (sun’s up longer!) and the sun is a little farther (whence visually a little smaller and more eclipsible) during northern hemisphere summer than southern hemisphere summer.

See you again in 2045!

April 05, 2024

Terence TaoMarton’s conjecture in abelian groups with bounded torsion

Tim Gowers, Ben Green, Freddie Manners, and I have just uploaded to the arXiv our paper “Marton’s conjecture in abelian groups with bounded torsion”. This paper fully resolves a conjecture of Katalin Marton (the bounded torsion case of the Polynomial Freiman–Ruzsa conjecture):

Theorem 1 (Marton’s conjecture) Let {G = (G,+)} be an abelian {m}-torsion group (thus, {mx=0} for all {x \in G}), and let {A \subset G} be such that {|A+A| \leq K|A|}. Then {A} can be covered by at most {(2K)^{O(m^3)}} translates of a subgroup {H} of {G} of cardinality at most {|A|}. Moreover, {H} is contained in {\ell A - \ell A} for some {\ell \ll (2 + m \log K)^{O(m^3 \log m)}}.
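For intuition about the hypothesis {|A+A| \leq K|A|}, here is a hypothetical toy example in the {3}-torsion group {({\bf Z}/3)^3}: a subgroup {H} of size {9} with one extra point {v} adjoined. Then {A+A = H \cup (v+H) \cup \{2v\}} has {19} elements, so the doubling constant is {K = 1.9}, and {A} is covered by two translates of {H}; this is consistent with (though of course far from testing) Theorem 1.

```python
from itertools import product

m = 3  # the ambient group (Z/3)^3 is 3-torsion

def add(x, y):
    return tuple((a + b) % m for a, b in zip(x, y))

H = {(x, y, 0) for x, y in product(range(m), repeat=2)}  # subgroup of size 9
v = (0, 0, 1)
A = H | {v}

sumset = {add(a, b) for a in A for b in A}
K = len(sumset) / len(A)

# Distinct cosets of H that meet A; together these translates of H cover A.
cosets = {frozenset(add(a, h) for h in H) for a in A}
print(len(A), len(sumset), K, len(cosets))
```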

We had previously established the {m=2} case of this result, with the number of translates bounded by {(2K)^{12}} (which was subsequently improved to {(2K)^{11}} by Jyun-Jie Liao), but without the additional containment {H \subset \ell A - \ell A}. It remains a challenge to replace {\ell} by a bounded constant (such as {2}); this is essentially the “polynomial Bogolyubov conjecture”, which is still open. The {m=2} result has been formalized in the proof assistant language Lean, as discussed in this previous blog post. As a consequence of this result, many of the applications of the previous theorem may now be extended from characteristic {2} to higher characteristic.
Our proof techniques are a modification of those in our previous paper, and in particular continue to be based on the theory of Shannon entropy. For inductive purposes, it turns out to be convenient to work with the following version of the conjecture (which, up to {m}-dependent constants, is actually equivalent to the above theorem):

Theorem 2 (Marton’s conjecture, entropy form) Let {G} be an abelian {m}-torsion group, and let {X_1,\dots,X_m} be independent finitely supported random variables on {G}, such that

\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i] \leq \log K,

where {{\bf H}} denotes Shannon entropy. Then there is a uniform random variable {U_H} on a subgroup {H} of {G} such that

\displaystyle \frac{1}{m} \sum_{i=1}^m d[X_i; U_H] \ll m^3 \log K,

where {d} denotes the entropic Ruzsa distance (see previous blog post for a definition); furthermore, if all the {X_i} take values in some symmetric set {S}, then {H} lies in {\ell S} for some {\ell \ll (2 + \log K)^{O(m^3 \log m)}}.
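For readers who do not want to chase the previous post: the entropic Ruzsa distance is commonly defined as {d[X;Y] := {\bf H}[X'-Y'] - \frac{1}{2}{\bf H}[X'] - \frac{1}{2}{\bf H}[Y']}, where {X',Y'} are independent copies of {X,Y}, and we take that standard definition as given here. The sketch computes it for distributions on {{\bf Z}/9} (a hypothetical choice of group): a translate of {U_H} is at distance zero from {U_H}, while the uniform variable on {\{0,1\}} is at distance {\frac{1}{2}\log 6}.

```python
import math
from collections import Counter

n = 9  # work in Z/9, a 9-torsion group

def entropy(dist):
    """Shannon entropy (natural log) of a dict {value: probability}."""
    return -sum(q * math.log(q) for q in dist.values() if q > 0)

def difference(X, Y):
    """Law of X' - Y' for independent X' ~ X, Y' ~ Y on Z/n."""
    out = Counter()
    for x, px in X.items():
        for y, py in Y.items():
            out[(x - y) % n] += px * py
    return dict(out)

def ruzsa_distance(X, Y):
    """d[X;Y] = H[X' - Y'] - H[X]/2 - H[Y]/2."""
    return entropy(difference(X, Y)) - entropy(X) / 2 - entropy(Y) / 2

def uniform(S):
    return {s % n: 1 / len(S) for s in S}

U_H = uniform([0, 3, 6])  # uniform on the subgroup H = 3Z/9Z
print(ruzsa_distance(uniform([1, 4, 7]), U_H))  # a translate of U_H: distance 0
print(ruzsa_distance(uniform([0, 1]), U_H))     # positive
```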

As a first approximation, one should think of all the {X_i} as identically distributed, and having the uniform distribution on {A}, as this is the case that is actually relevant for implying Theorem 1; however, the recursive nature of the proof of Theorem 2 requires one to manipulate the {X_i} separately. It is also technically convenient to work with {m} independent variables, rather than just a pair of variables as we did in the {m=2} case; this is perhaps the biggest additional technical complication needed to handle higher characteristics.
The strategy, as with the previous paper, is to attempt an entropy decrement argument: to try to locate modifications {X'_1,\dots,X'_m} of {X_1,\dots,X_m} that are reasonably close (in Ruzsa distance) to the original random variables, while decrementing the “multidistance”

\displaystyle {\bf H}[X_1+\dots+X_m] - \frac{1}{m} \sum_{i=1}^m {\bf H}[X_i]

which turns out to be a convenient metric for progress (for instance, this quantity is non-negative, and vanishes if and only if the {X_i} are all translates of a uniform random variable {U_H} on a subgroup {H}). In the previous paper we added some further terms to the functional being minimized in order to improve the exponent {12}, but as we are not attempting to completely optimize the constants, we did not do so in the current paper (and as such, our arguments here give a slightly different way of establishing the {m=2} case, albeit with somewhat worse exponents).
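The multidistance is easy to compute in small examples. The sketch below (hypothetical distributions on {({\bf Z}/3)^2} with {m = 3} variables) illustrates the two properties just mentioned: it vanishes when every {X_i} is a translate of the uniform variable on a subgroup, and is positive for a uniform variable on a set that is not a coset.

```python
import math
from collections import Counter

m = 3  # (Z/3)^2 is 3-torsion; we also take m = 3 random variables

def add(x, y):
    return tuple((a + b) % m for a, b in zip(x, y))

def entropy(dist):
    """Shannon entropy (natural log) of a dict {value: probability}."""
    return -sum(q * math.log(q) for q in dist.values() if q > 0)

def law_of_sum(laws):
    """Law of X_1 + ... + X_k for independent X_i with the given laws on (Z/3)^2."""
    total = {(0, 0): 1.0}
    for law in laws:
        step = Counter()
        for s, ps in total.items():
            for x, px in law.items():
                step[add(s, x)] += ps * px
        total = dict(step)
    return total

def multidistance(laws):
    """H[X_1 + ... + X_m] - (1/m) * sum_i H[X_i]."""
    return entropy(law_of_sum(laws)) - sum(entropy(law) for law in laws) / len(laws)

def uniform(S):
    return {s: 1 / len(S) for s in S}

H = [(x, 0) for x in range(m)]  # the subgroup (Z/3) x {0}
translates = [uniform([add(h, (0, t)) for h in H]) for t in range(m)]
print(multidistance(translates))                                # essentially 0
print(multidistance([uniform([(0, 0), (1, 0), (0, 1)])] * m))  # positive
```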
As before, we search for such improved random variables {X'_1,\dots,X'_m} by introducing more independent random variables – we end up taking an array of {m^2} random variables {Y_{i,j}} for {i,j=1,\dots,m}, with each {Y_{i,j}} a copy of {X_i}, and forming various sums of these variables and conditioning them against other sums. Thanks to the magic of Shannon entropy inequalities, it turns out that at least one of these modifications is guaranteed to decrease the multidistance, except in an “endgame” situation in which certain random variables are nearly (conditionally) independent of each other, in the sense that certain conditional mutual informations are small. In particular, in the endgame scenario, the row sums {\sum_j Y_{i,j}} of our array will end up being close to independent of the column sums {\sum_i Y_{i,j}}, subject to conditioning on the total sum {\sum_{i,j} Y_{i,j}}. Not coincidentally, this type of conditional independence phenomenon also shows up when considering row and column sums of iid gaussian random variables, as a specific feature of the gaussian distribution. It is related to the more familiar observation that if {X,Y} are two independent copies of a Gaussian random variable, then {X+Y} and {X-Y} are also independent of each other.
Up until now, the argument does not use the {m}-torsion hypothesis, nor the fact that we work with an {m \times m} array of random variables as opposed to some other shape of array. But now the torsion enters in a key role, via the obvious identity

\displaystyle \sum_{i,j} i Y_{i,j} + \sum_{i,j} j Y_{i,j} + \sum_{i,j} (-i-j) Y_{i,j} = 0.

In the endgame, any two of these three random variables are close to independent (after conditioning on the total sum {\sum_{i,j} Y_{i,j}}). Applying some “entropic Ruzsa calculus” (and in particular an entropic version of the Balog–Szemerédi–Gowers inequality), one can then arrive at a new random variable {U} of small entropic doubling that is reasonably close to all of the {X_i} in Ruzsa distance, which provides the final way to reduce the multidistance.
Besides the polynomial Bogolyubov conjecture mentioned above (which we do not know how to address by entropy methods), the other natural question is to try to develop a characteristic zero version of this theory in order to establish the polynomial Freiman–Ruzsa conjecture over torsion-free groups, which in our language asserts (roughly speaking) that random variables of small entropic doubling are close (in Ruzsa distance) to a discrete Gaussian random variable, with good bounds. The above machinery is consistent with this conjecture, in that it produces lots of independent variables related to the original variable, various linear combinations of which obey the same sort of entropy estimates that gaussian random variables would exhibit, but what we are missing is a way to get back from these entropy estimates to an assertion that the random variables really are close to Gaussian in some sense. In continuous settings, Gaussians are known to extremize the entropy for a given variance, and of course we have the central limit theorem that shows that averages of random variables typically converge to a Gaussian, but it is not clear how to adapt these phenomena to the discrete Gaussian setting (without the circular reasoning of assuming the polynomial Freiman–Ruzsa conjecture to begin with).
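The continuous extremal property can be checked with closed-form differential entropies. At a fixed standard deviation {\sigma}, the Gaussian has entropy {\frac{1}{2}\log(2\pi e \sigma^2)}, which exceeds that of a uniform or Laplace distribution with the same variance. The sketch below only evaluates these textbook formulas, and its function names are illustrative. It says nothing about the discrete Gaussian case, which is exactly where the difficulty described above lies.

```python
from math import e, log, pi, sqrt


def gaussian_entropy(sigma):
    """Differential entropy (nats) of N(0, sigma^2)."""
    return 0.5 * log(2 * pi * e * sigma**2)


def uniform_entropy(sigma):
    """Uniform on an interval of width w has variance w^2/12, so w = sqrt(12)*sigma."""
    return log(sqrt(12) * sigma)


def laplace_entropy(sigma):
    """Laplace with scale b has variance 2*b^2 and entropy 1 + log(2*b)."""
    b = sigma / sqrt(2)
    return 1 + log(2 * b)


for sigma in (0.5, 1.0, 3.0):
    g, u, l = gaussian_entropy(sigma), uniform_entropy(sigma), laplace_entropy(sigma)
    # The Gaussian wins at every sigma; the gaps do not depend on sigma.
    print(f"sigma={sigma}: gaussian={g:.4f} uniform={u:.4f} laplace={l:.4f}")
```

All three entropies shift by {\log \sigma} when {\sigma} changes, so the gaps (about {0.18} nats to the uniform and {0.07} to the Laplace) are scale-invariant.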

April 02, 2024

Jordan EllenbergOrioles 13, Angels 4

I had the great privilege to be present at Camden Yards last weekend for what I believe to be the severest ass-whupping I have ever personally seen the Orioles administer. The Orioles went into the 6th winning 3-1 but the game felt like they were winning by more than that. Then suddenly they actually were — nine batters, nine runs, no outs (though in the middle of it all there was an easy double-play ball by Ramon Urias that the Angels’ shortstop Zach Neto just inexplicably dropped — it was that kind of day.) We had pitching (Grayson Rodriguez almost unhittable for six innings but for one mistake pitch), defense (Urias snagging a line drive at third almost before I saw it leave the bat) and of course a three-run homer, by Anthony Santander, to plate the 7th, 8th, and 9th of those nine runs.

Is being an Angels fan the saddest kind of fan to be right now? The Mets and the Padres, you have more of a “we spent all the money and built what should have been a superteam and didn’t win.” The A’s, you have the embarrassment of the on-field performance and the fact that your owner screwed your city and moved the team out of town. But the Angels? Somehow they just put together the two generational talents of this era of baseball and — didn’t do anything with them. There’s a certain heaviness to the sadness.

As good as the Orioles have been so far, taking three out of their first four and massively outscoring the opposition, I still think they weren’t really a 101-win team last year, and everything will have to go right again for them to be as good this year as they were last year. Our Felix Bautista replacement, Craig Kimbrel, has already blown his first and only save opportunity, which is to say he’s not really a Felix Bautista replacement. But it’s a hell of a team to watch.

The only downside — Gunnar Henderson, with a single, a triple and a home run already, is set to lead off the ninth but Hyde brings in Tony Kemp to pinch hit. Why? The fans want to see Gunnar on second for the cycle, let the fans see Gunnar on second for the cycle.

March 30, 2024

Andrew JaffeThe Milky Way

March 16, 2024

David Hoggsubmitted!

OMG I actually just submitted an actual paper, with me as first author. I submitted to the AAS Journals, with a preference for The Astronomical Journal. I don't write all that many first-author papers, so I am stoked about this. If you want to read it: It should come out on arXiv within days, or if you want to type pdflatex a few times, it is available at this GitHub repo. It is about how to combine many shifted images into one combined, mean image.

David HoggIAIFI Symposium, day two

Today was day two of a meeting on generative AI in physics, hosted by MIT. My favorite talks today were by Song Han (MIT) and Thea Aarestad (ETH), both of whom are working on making ML systems run ultra-fast on extremely limited hardware. Themes were: Work at low precision. Even 4-bit number representations! Radical. And bandwidth is way more expensive than compute: Never move data, latents, or weights to new hardware; work as locally as you can. They both showed amazing performance on terrible, tiny hardware. In addition, Han makes really cute 3d-printed devices! A conversation at the end that didn't quite happen is about how Aarestad's work might benefit from equivariant methods: Her application area is triggers in the CMS device at the LHC; her symmetry group is the Lorentz group (and permutations and etc). The day started with me on a panel in which my co-panelists said absolutely unhinged things about the future of physics and artificial intelligence. I learned that many people think we are only years away from having independently operating, fully functional artificial physicists that are more capable than we are.

David HoggIAIFI Symposium, day one

Today was the first day of a two-day symposium on the impact of Generative AI in physics. It is hosted by IAIFI and A3D3, two interdisciplinary and inter-institutional entities working on things related to machine learning. I really enjoyed the content today. One example was Anna Scaife (Manchester) telling us that all the different methods they have used for uncertainty quantification in astronomy-meets-ML contexts give different and inconsistent answers. It is very hard to know your uncertainty when you are doing ML. Another example was Simon Batzner (DeepMind) explaining that equivariant methods were absolutely required for the materials-design projects at DeepMind, and that introducing the equivariance absolutely did not bork optimization (as many believe it will). Those materials-design projects have been ridiculously successful. He said the amusing thing “Machine learning is IID, science is OOD”. I couldn't agree more. In a panel at the end of the day I learned that learned ML controllers now beat hand-built controllers in some robotics applications. That's interesting and surprising.