Planet Musings

March 01, 2024

Matt von Hippel: France for Non-EU Spouses of EU Citizens: To Get Your Rights, Don’t Follow the Rules

I’m a German citizen, my wife is not. When we moved to France, we were confused. Looking at the French government’s website, we couldn’t figure out a crucial question: when, and how, would she have the right to work?

We talked to the French embassy and EU aid organizations, got advice from my employer and blogs and Facebook groups. She’s a schoolteacher, and we wanted to make sure she was able to work when we arrived, at the beginning of the school year. We did everything we were told, filled out everything we were advised to…but still, employers weren’t sure she had the right to work.

Six months and a lot of pain later, we’ve now left France. We’ve learned a lot more about EU law and French immigration practices than we ever planned to. I’m writing this guide because I haven’t found anything quite like it, something that puts all the information we found in one place. Read this guide, and you’ll learn how the law is supposed to work, how it actually works…and what you should do if, as a non-EU spouse of an EU citizen, you still want to move to France.

How it’s supposed to work

I want to be absolutely clear here: I am not a lawyer. This is not professional legal advice. This is based on what I’ve been told by Your Europe Advice, an organization that provides free advice about EU law. It’s also based on my own reading, because the relevant law here (the EU Directive on Freedom of Movement, 2004/38/EC) is surprisingly readable.

First, the crucial question. Your spouse is an EU citizen, and you have moved together to a (different!) EU country. Do you have the right to work? Let’s check the directive:

Article 23

Related rights

Irrespective of nationality, the family members of a Union citizen who have the right of residence or the right of permanent residence in a Member State shall be entitled to take up employment or self-employment there.

Yes, you have the right to work.

You may need a visa to enter the country, but if so, it is supposed to be issued quickly and free of charge according to Article 5:

2.  Family members who are not nationals of a Member State shall only be required to have an entry visa in accordance with Regulation (EC) No 539/2001 or, where appropriate, with national law. For the purposes of this Directive, possession of the valid residence card referred to in Article 10 shall exempt such family members from the visa requirement.

Member States shall grant such persons every facility to obtain the necessary visas. Such visas shall be issued free of charge as soon as possible and on the basis of an accelerated procedure.

To make sure this is done properly, the EU recommends that you make it clear that you are applying for an entry visa as a family member of an EU citizen. These are generally short-stay Schengen visas that last 90 days.

After entering, you may be required to apply for a residence card.

Article 9

Administrative formalities for family members who are not nationals of a Member State

1.  Member States shall issue a residence card to family members of a Union citizen who are not nationals of a Member State, where the planned period of residence is for more than three months.

2.  The deadline for submitting the residence card application may not be less than three months from the date of arrival.

3.  Failure to comply with the requirement to apply for a residence card may make the person concerned liable to proportionate and non-discriminatory sanctions.

This residence card must be issued within six months, and they can only ask for a very short list of documents:

Article 10

Issue of residence cards

1.  The right of residence of family members of a Union citizen who are not nationals of a Member State shall be evidenced by the issuing of a document called ‘Residence card of a family member of a Union citizen’ no later than six months from the date on which they submit the application. A certificate of application for the residence card shall be issued immediately.

2.  For the residence card to be issued, Member States shall require presentation of the following documents:

(a) a valid passport;

(b) a document attesting to the existence of a family relationship or of a registered partnership;

(c) the registration certificate or, in the absence of a registration system, any other proof of residence in the host Member State of the Union citizen whom they are accompanying or joining;

Once you get it, the residence card is supposed to be valid for five years:

Article 11

Validity of the residence card

1.  The residence card provided for by Article 10(1) shall be valid for five years from the date of issue or for the envisaged period of residence of the Union citizen, if this period is less than five years.

Six months may sound like a long time, but if everything goes according to EU law you shouldn’t be too worried, because of this:

Article 25

General provisions concerning residence documents

1.  Possession of a registration certificate as referred to in Article 8, of a document certifying permanent residence, of a certificate attesting submission of an application for a family member residence card, of a residence card or of a permanent residence card, may under no circumstances be made a precondition for the exercise of a right or the completion of an administrative formality, as entitlement to rights may be attested by any other means of proof.

“Under no circumstances”, that’s pretty strong! You do not need your residence card either to exercise your rights (such as the right to work) or to complete any administrative formality (basically, anything the government wants you to do). You also don’t need a document certifying you’ve applied for the card. You can attest your rights by any other means of proof: for example, your marriage certificate and your spouse’s passport.

In general, you have almost all of the rights that the locals do, though for a few specific things you may have to wait:

Article 24

Equal treatment

1.  Subject to such specific provisions as are expressly provided for in the Treaty and secondary law, all Union citizens residing on the basis of this Directive in the territory of the host Member State shall enjoy equal treatment with the nationals of that Member State within the scope of the Treaty. The benefit of this right shall be extended to family members who are not nationals of a Member State and who have the right of residence or permanent residence.

2.  By way of derogation from paragraph 1, the host Member State shall not be obliged to confer entitlement to social assistance during the first three months of residence or, where appropriate, the longer period provided for in Article 14(4)(b), nor shall it be obliged, prior to acquisition of the right of permanent residence, to grant maintenance aid for studies, including vocational training, consisting in student grants or student loans to persons other than workers, self-employed persons, persons who retain such status and members of their families.

All of that is pretty clear, and there are some nice guides on the EU website that walk you through a lot of it.

I suspect that no EU country perfectly implements these rules. It’s a lot easier to require a residence card for something than to allow people to show up with just their marriage certificate. But there is a lot of variation in which rights are involved, and in how quickly and reliably things are processed. So next, let’s look at how France does it.

How France says it works

If you’re trying to move to France, the most intuitive thing to do is to check the French government’s website and see what it has to say. You’ll find confirmation of some of these points: that you must apply for a residence permit within three months, that they must grant it within six months unless they have a very good reason not to.

That page links to the page on residence cards, which describes part of the process of applying for one. Following the links, you can eventually piece together the following steps:

  1. Apply via ANEF, the Administration Numérique des Étrangers en France. You’ll have to upload several documents: a scan of your passport, something proving your residence in France (they have a list), an official photo (there are machines called Photomatons in France that do this), a scan of your spouse’s passport and your marriage certificate, and some proof that your spouse has legal residence in France (for example, their employment contract). You have to do this after entering the country. So unlike a normal visa, this can’t be started early!
  2. ANEF gives you a document called an attestation de pré-dépôt. This certifies that you have submitted your application, but nothing more than that. It explicitly says it doesn’t attest to the regularity of your stay, or let you re-enter France if you leave.
  3. ANEF is then supposed to forward your case to your local government: a prefecture or sub-prefecture.
  4. The prefecture or sub-prefecture, once they open your file, will give you access to an online space where they can send and receive documents. This online space is supposed to come with an attestation de prolongation. This is a document that attests that you are legally in the country for three months while they process your case, but still does not attest that you have the right to work, register for healthcare, return to the country if you leave, or really anything else. If you go past the three months, they’re supposed to issue you another one.
  5. They might ask you for more documents, or to clarify things.
  6. Once they’ve processed your case, they give you a way (that can vary by prefecture) to set up an appointment to do biometrics. You show up with the documents they ask for and they take your fingerprints.
  7. They give you an attestation de décision favorable. This one explicitly gives you the right to work.
  8. Once your residence card is ready, they let you set up an appointment to pick it up.

Note that, despite the EU rules, it’s not until step 7 that you get a document saying you have the right to work. Instead, employers might think that you need a work authorization, a document that is complicated to apply for because it requires the employer demonstrate that there are no suitable French candidates for the position. The page on work authorizations lists a number of exceptions…but not for spouses of EU citizens, nor for the short-term Schengen visa you might have if you followed the normal rules.

Even if an employer understands the rules, they still might be worried. It might not be clear to them how to fill out the paperwork to hire you without one of the officially listed documents. They might also be worried that the government will punish them. In France, if you claim to be a spouse of an EU citizen but turn out to be lying, your employer can be punished with very steep fines, or even in some cases jail time! So employers can be very reluctant to hire you if you don’t have some French document that explicitly says you have the right to work.

With all that, maybe you still want to try to do things this way. We still did, or at least, we couldn’t think of a better option. My wife applied with ANEF when we entered France, and we hoped things would go reasonably quickly.

How it actually works

Things do not go reasonably quickly.

The system ANEF uses to register non-EU spouses of EU nationals is quite new, and still buggy. Applications can be lost. Ours was sent to the wrong office, and not processed for some time.

The prefectures and sub-prefectures also take quite a long time to process things. They aim to finish in three months, but the average is typically much higher. If you check your prefecture, they may have published their average delays for recent years. Ours was around five months.

You may not have the ability to directly check on any of these things. ANEF told us they had no information; the prefecture told us they couldn’t answer our questions. We had to go through a variety of aid organizations to get any information at all.

The prefectures might ask you for documents you don’t actually need. They might want you to certify your marriage in your spouse’s home country if it took place elsewhere, or to have it apostilled if your country is party to the Apostille Convention.

They might also give you a residence card that only lasts one year, instead of five, or charge you to pick it up, when they’re not supposed to.

Is it possible you get processed quickly and correctly? Yes, it’s possible. Some people do get the attestation de prolongation immediately, and not after five months. We had friends who were processed in two months, getting the card in three…after applying some political pressure behind the scenes, in a well-rated prefecture.

(Check your prefecture or sub-prefecture on Google Maps; they have star ratings!)

Of the steps above, it took five months for us to get to step 4. We got up to step 6 before we gave up and left the country.

If you don’t want to do that, you need another approach.

What you should actually do

Talk to people in France, and they’ll be confused by all this. Most of them think you have to go through a very different process, one where you get a long-stay visa before entering the country, which explicitly gives the right to work.

That’s because that is actually the official process…for spouses of French people. EU countries are allowed to have different immigration rules for their own citizens’ spouses from the general rules, and France does. Most bureaucrats you run into in France, and many employers, will assume you are supposed to get a long-stay visa, and that if you didn’t you’re doing something wrong. In particular, the bureaucrats in charge of registering you for health coverage will often assume this, so until you get your residence card you may need to pay full price for your healthcare.

Here’s the thing, though: why not get a long-stay visa?

This is a visa called type D. These visas generally cost money; they aren’t free. You can’t always get one: while the embassy is required by EU law to give you a short-stay visa, they aren’t required to give you a long-stay visa.

But long-stay visas can explicitly give the right to work. They don’t expire in three months, before most prefectures will have processed your files. And they are what most French people expect you to have.

So that’s our advice. If you really want to move to France with your EU spouse, and you’re not an EU citizen yourself…then don’t go until you have a type D, long-stay, VLS-TS visa.

It’s not what you’re supposed to do. But until the system changes, it could save you five months of pain.

February 29, 2024

John Baez: Nicholas Ludford

At first glance it’s amazing that one of the great British composers of the 1400s largely sank from view until his works were rediscovered in 1850.

But the reason is not hard to find. When the Puritans took over England, they burned not only witches and heretics, but also books — and music! They hated the complex polyphonic choral music of the Catholics.

So, in the history of British music, between the great polyphonists Robert Fayrfax (1465-1521) and John Taverner (1490-1545), there was a kind of gap — a silence — until the Peterhouse Partbooks were rediscovered.

These were an extensive collection of musical manuscripts, handwritten by a single scribe between 1539 and 1541. Most of them got lost somehow and were found only in the 1850s. Others were found even later, in 1926! They were hidden behind a panel in a library — probably hidden from the Puritans.

The 1850 batch contains wonderful compositions by Nicholas Ludford (~1485-1557). One music scholar has called him “one of the last unsung geniuses of Tudor polyphony”. Another wrote:

it is more a matter of astonishment that such mastery should be displayed by a composer of whom virtually nothing was known until modern times.

Ludford’s work was first recorded only in 1993, and much of the Peterhouse Partbooks has been recorded only more recently. A Boston group called Blue Heron released a 5-CD set, starting in 2010 and ending in 2017. It’s magnificent!

Below you can hear the Sanctus from Nicholas Ludford’s Missa Regnum mundi. It has long, sleek lines of harmony; you can lose yourself trying to follow all the parts.

Scott Aaronson: The Problem of Human Specialness in the Age of AI

Update (Feb. 29): A YouTube video of this talk is now available, plus a comment section filled (as usual) with complaints about everything from my speech and mannerisms to my failure to address the commenter’s pet topic.

Here, as promised in my last post, is a written version of the talk I delivered a couple weeks ago at MindFest in Florida, entitled “The Problem of Human Specialness in the Age of AI.” The talk is designed as one-stop shopping, summarizing many different AI-related thoughts I’ve had over the past couple years (and earlier).


Thanks so much for inviting me! I’m not an expert in AI, let alone mind or consciousness.  Then again, who is?

For the past year and a half, I’ve been moonlighting at OpenAI, thinking about what theoretical computer science can do for AI safety.  I wanted to share some thoughts, partly inspired by my work at OpenAI but partly just things I’ve been wondering about for 20 years.  These thoughts are not directly about “how do we prevent super-AIs from killing all humans and converting the galaxy into paperclip factories?”, nor are they about “how do we stop current AIs from generating misinformation and being biased?,” as much attention as both of those questions deserve (and are now getting).  In addition to “how do we stop AGI from going disastrously wrong?,” I find myself asking “what if it goes right?  What if it just continues helping us with various mental tasks, but improves to where it can do just about any task as well as we can do it, or better?  Is there anything special about humans in the resulting world?  What are we still for?”


I don’t need to belabor for this audience what’s been happening lately in AI.  It’s arguably the most consequential thing that’s happened in civilization in the past few years, even if that fact was temporarily masked by various ephemera … y’know, wars, an insurrection, a global pandemic … whatever, what about AI?

I assume you’ve all spent time with ChatGPT, or with Bard or Claude or other Large Language Models, as well as with image models like DALL-E and Midjourney.  For all their current limitations—and we can discuss the limitations—in some ways these are the thing that was envisioned by generations of science fiction writers and philosophers.  You can talk to them, and they give you a comprehending answer.  Ask them to draw something and they draw it.

I think that, as late as 2019, very few of us expected this to exist by now.  I certainly didn’t expect it to.  Back in 2014, when there was a huge fuss about some silly ELIZA-like chatbot called “Eugene Goostman” that was falsely claimed to pass the Turing Test, I asked around: why hasn’t anyone tried to build a much better chatbot, by (let’s say) training a neural network on all the text on the Internet?  But of course I didn’t do that, nor did I know what would happen when it was done.

The surprise, with LLMs, is not merely that they exist, but the way they were created.  Back in 1999, you would’ve been laughed out of the room if you’d said that all the ideas needed to build an AI that converses with you in English already existed, and that they’re basically just neural nets, backpropagation, and gradient descent.  (With one small exception, a particular architecture for neural nets called the transformer, but that probably just saves you a few years of scaling anyway.)  Ilya Sutskever, cofounder of OpenAI (who you might’ve seen something about in the news…), likes to say that beyond those simple ideas, you only needed three ingredients:

(1) a massive investment of computing power,
(2) a massive investment of training data, and
(3) faith that your investments would pay off!

Crucially, and even before you do any reinforcement learning, GPT-4 clearly seems “smarter” than GPT-3, which seems “smarter” than GPT-2 … even as the biggest ways they differ are just the scale of compute and the scale of training data!  Like,

  • GPT-2 struggled with grade school math.
  • GPT-3.5 can do most grade school math but it struggles with undergrad material.
  • GPT-4, right now, can probably pass most undergraduate math and science classes at top universities (I mean, the ones without labs or whatever!), and possibly the humanities classes too (those might even be easier for GPT-4 than the science classes, but I’m much less confident about it). But it still struggles with, for example, the International Math Olympiad.  How insane, that this is now where we have to place the bar!

Obvious question: how far will this sequence continue?  There are certainly at least a few more orders of magnitude of compute before energy costs become prohibitive, and a few more orders of magnitude of training data before we run out of public Internet. Beyond that, it’s likely that continuing algorithmic advances will simulate the effect of more orders of magnitude of compute and data than however many we actually get.

So, where does this lead?

(Note: ChatGPT agreed to cooperate with me to help me generate the above image. But it then quickly added that it was just kidding, and the Riemann Hypothesis is still open.)


Of course, I have many friends who are terrified (some say they’re more than 90% confident and few of them say less than 10%) that not long after that, we’ll get this

But this isn’t the only possibility smart people take seriously.

Another possibility is that the LLM progress fizzles before too long, just like previous bursts of AI enthusiasm were followed by AI winters.  Note that, even in the ultra-conservative scenario, LLMs will probably still be transformative for the economy and everyday life, maybe as transformative as the Internet.  But they’ll just seem like better and better GPT-4’s, without ever seeming qualitatively different from GPT-4, and without anyone ever turning them into stable autonomous agents and letting them loose in the real world to pursue goals the way we do.

A third possibility is that AI will continue progressing through our lifetimes as quickly as we’ve seen it progress over the past 5 years, but even as that suggests that it’ll surpass you and me, surpass John von Neumann, become to us as we are to chimpanzees … we’ll still never need to worry about it treating us the way we’ve treated chimpanzees.  Either because we’re projecting and that’s just totally not a thing that AIs trained on the current paradigm would tend to do, or because we’ll have figured out by then how to prevent AIs from doing such things.  Instead, AI in this century will “merely” change human life by maybe as much as it changed over the last 20,000 years, in ways that might be incredibly good, or incredibly bad, or both depending on who you ask.

If you’ve lost track, here’s a decision tree of the various possibilities that my friend (and now OpenAI alignment colleague) Boaz Barak and I came up with.


Now, as far as I can tell, the empirical questions of whether AI will achieve and surpass human performance at all tasks, take over civilization from us, threaten human existence, etc. are logically distinct from the philosophical question of whether AIs will ever “truly think,” or whether they’ll only ever “appear” to think.  You could answer “yes” to all the empirical questions and “no” to the philosophical question, or vice versa.  But to my lifelong chagrin, people constantly munge the two questions together!

A major way they do so, is with what we could call the religion of Justaism.

  • GPT is justa next-token predictor.
  • It’s justa function approximator.
  • It’s justa gigantic autocomplete.
  • It’s justa stochastic parrot.
  • And, it “follows,” the idea of AI taking over from humanity is justa science-fiction fantasy, or maybe a cynical attempt to distract people from AI’s near-term harms.

As someone once expressed this religion on my blog: GPT doesn’t interpret sentences, it only seems-to-interpret them.  It doesn’t learn, it only seems-to-learn.  It doesn’t judge moral questions, it only seems-to-judge. I replied: that’s great, and it won’t change civilization, it’ll only seem-to-change it!

A closely related tendency is goalpost-moving.  You know, for decades chess was the pinnacle of human strategic insight and specialness, and that lasted until Deep Blue, right after which, well of course AI can cream Garry Kasparov at chess, everyone always realized it would, that’s not surprising, but Go is an infinitely richer, deeper game, and that lasted until AlphaGo/AlphaZero, right after which, of course AI can cream Lee Sedol at Go, totally expected, but wake me up when it wins Gold in the International Math Olympiad.  I bet $100 against my friend Ernie Davis that the IMO milestone will happen by 2026.  But, like, suppose I’m wrong and it’s 2030 instead … great, what should the next goalpost be?

Indeed, we might as well formulate a thesis, which despite the inclusion of several weasel phrases I’m going to call falsifiable:

Given any game or contest with suitably objective rules, which wasn’t specifically constructed to differentiate humans from machines, and on which an AI can be given suitably many examples of play, it’s only a matter of years before not merely any AI, but AI on the current paradigm (!), matches or beats the best human performance.

Crucially, this Aaronson Thesis (or is it someone else’s?) doesn’t necessarily say that AI will eventually match everything humans do … only our performance on “objective contests,” which might not exhaust what we care about.

Incidentally, the Aaronson Thesis would seem to be in clear conflict with Roger Penrose’s views, which we heard about from Stuart Hameroff’s talk yesterday.  The trouble is, Penrose’s task is “just see that the axioms of set theory are consistent” … and I don’t know how to gauge performance on that task, any more than I know how to gauge performance on the task, “actually taste the taste of a fresh strawberry rather than merely describing it.”  The AI can always say that it does these things!


This brings me to the original and greatest human vs. machine game, one that was specifically constructed to differentiate the two: the Imitation Game, which Alan Turing proposed in an early and prescient (if unsuccessful) attempt to head off the endless Justaism and goalpost-moving.  Turing said: look, presumably you’re willing to regard other people as conscious based only on some sort of verbal interaction with them.  So, show me what kind of verbal interaction with another person would lead you to call the person conscious: does it involve humor? poetry? morality? scientific brilliance?  Now assume you have a totally indistinguishable interaction with a future machine.  Now what?  You wanna stomp your feet and be a meat chauvinist?

(And then, for his great attempt to bypass philosophy, fate punished Turing, by having his Imitation Game itself provoke a billion new philosophical arguments…)


Although I regard the Imitation Game as, like, one of the most important thought experiments in the history of thought, I concede to its critics that it’s generally not what we want in practice.

It now seems probable that, even as AIs start to do more and more work that used to be done by doctors and lawyers and scientists and illustrators, there will remain straightforward ways to distinguish AIs from humans—either because customers want there to be, or governments force there to be, or simply because indistinguishability wasn’t what was wanted or conflicted with other goals.

Right now, like it or not, a decent fraction of all high-school and college students on earth are using ChatGPT to do their homework for them. For that reason among others, this question of how to distinguish humans from AIs, this question from the movie Blade Runner, has become a big practical question in our world.

And that’s actually one of the main things I’ve thought about during my time at OpenAI.  You know, in AI safety, people keep asking you to prognosticate decades into the future, but the best I’ve been able to do so far was see a few months into the future, when I said: “oh my god, once everyone starts using GPT, every student will want to use it to cheat, scammers and spammers will use it too, and people are going to clamor for some way to determine provenance!”

In practice, often it’s easy to tell what came from AI.  When I get comments on my blog like this one:

“Erica Poloix,” July 21, 2023:
Well, it’s quite fascinating how you’ve managed to package several misconceptions into such a succinct comment, so allow me to provide some correction. Just as a reference point, I’m studying physics at Brown, and am quite up-to-date with quantum mechanics and related subjects.

The bigger mistake you’re making, Scott, is assuming that the Earth is in a ‘mixed state’ from the perspective of the universal wavefunction, and that this is somehow an irreversible situation. It’s a misconception that common, ‘classical’ objects like the Earth are in mixed states. In the many-worlds interpretation, for instance, even macroscopic objects are in superpositions – they’re just superpositions that look classical to us because we’re entangled with them. From the perspective of the universe’s wavefunction, everything is always in a pure state.

As for your claim that we’d need to “swap out all the particles on Earth for ones that are already in pure states” to return Earth to a ‘pure state,’ well, that seems a bit misguided. All quantum systems are in pure states before they interact with other systems and become entangled. That’s just Quantum Mechanics 101.

I have to say, Scott, your understanding of quantum physics seems to be a bit, let’s say, ‘mixed up.’ But don’t worry, it happens to the best of us. Quantum Mechanics is counter-intuitive, and even experts struggle with it. Keep at it, and try to brush up on some more fundamental concepts. Trust me, it’s a worthwhile endeavor.

… I immediately say, either this came from an LLM or it might as well have.  Likewise, apparently hundreds of students have been turning in assignments that contain text like, “As a large language model trained by OpenAI…”—easy to catch!

But what about the slightly more sophisticated cheaters? Well, people have built discriminator models to try to distinguish human from AI text, such as GPTZero.  While these distinguishers can get well above 90% accuracy, the danger is that they’ll necessarily get worse as the LLMs get better.

So, I’ve worked on a different solution, called watermarking.  Here, we use the fact that LLMs are inherently probabilistic — that is, every time you submit a prompt, they’re sampling some path through a branching tree of possibilities for the sequence of next tokens.  The idea of watermarking is to steer the path using a pseudorandom function, so that it looks to a normal user indistinguishable from normal LLM output, but secretly it encodes a signal that you can detect if you know the key.

I came up with a way to do that in Fall 2022, and others have since independently proposed similar ideas.  I should caution you that this hasn’t been deployed yet—OpenAI, along with DeepMind and Anthropic, want to move slowly and cautiously toward deployment.  And also, even when it does get deployed, anyone who’s sufficiently knowledgeable and motivated will be able to remove the watermark, or produce outputs that aren’t watermarked to begin with.


But as I talked to my colleagues about watermarking, I was surprised that they often objected to it on a completely different ground, one that had nothing to do with how well it can work.  They said: look, if we all know students are going to rely on AI in their jobs, why shouldn’t they be allowed to rely on it in their assignments?  Should we still force students to learn to do things if AI can now do them just as well?

And there are many good pedagogical answers you can give: we still teach kids spelling and handwriting and arithmetic, right?  Because, y’know, we haven’t yet figured out how to instill higher-level conceptual understanding without all that lower-level stuff as a scaffold for it.

But I already think about this in terms of my own kids.  My 11-year-old daughter Lily enjoys writing fantasy stories.  Now, GPT can also churn out short stories, maybe even technically “better” short stories, about such topics as tween girls who find themselves recruited by wizards to magical boarding schools that are not Hogwarts and totally have nothing to do with Hogwarts.  But here’s a question: from this point on, will Lily’s stories ever surpass the best AI-written stories?  When will the curves cross?  Or will AI just continue to stay ahead?


But, OK, what do we even mean by one story being “better” than another?  Is there anything objective behind such judgments?

I submit that, when we think carefully about what we really value in human creativity, the problem goes much deeper than just “is there an objective way to judge?”

To be concrete, could there be an AI that was “as good at composing music as the Beatles”?

For starters, what made the Beatles “good”?  At a high level, we might decompose it into

  1. broad ideas about the direction that 1960s music should go in, and
  2. technical execution of those ideas.

Now, imagine we had an AI that could generate 5000 brand-new songs that sounded like more “Yesterday”s and “Hey Jude”s, like what the Beatles might have written if they’d somehow had 10x more time to write at each stage of their musical development.  Of course this AI would have to be fed the Beatles’ back-catalogue, so that it knew what target it was aiming at.

Most people would say: ah, this shows only that AI can match the Beatles in #2, in technical execution, which was never the core of their genius anyway!  Really we want to know: would the AI decide to write “A Day in the Life” even though nobody had written anything like it before?

Recall Schopenhauer: “Talent hits a target no one else can hit, genius hits a target no one else can see.”  Will AI ever hit a target no one else can see?

But then there’s the question: supposing it does hit such a target, will we know?  Beatles fans might say that, by 1967 or so, the Beatles were optimizing for targets that no musician had ever quite optimized for before.  But—and this is why they’re so remembered—they somehow successfully dragged along their entire civilization’s musical objective function so that it continued to match their own.  We can now only even judge music by a Beatles-influenced standard, just like we can only judge plays by a Shakespeare-influenced standard.

In other branches of the wavefunction, maybe a different history led to different standards of value.  But in this branch, helped by their technical talents but also by luck and force of will, Shakespeare and the Beatles made certain decisions that shaped the fundamental ground rules of their fields going forward.  That’s why Shakespeare is Shakespeare and the Beatles are the Beatles.

(Maybe, around the birth of professional theater in Elizabethan England, there emerged a Shakespeare-like ecological niche, and Shakespeare was the first one with the talent, luck, and opportunity to fill it, and Shakespeare’s reward for that contingent event is that he, and not someone else, got to stamp his idiosyncrasies onto drama and the English language forever. If so, art wouldn’t actually be that different from science in this respect!  Einstein, for example, was simply the first guy both smart and lucky enough to fill the relativity niche.  If not him, it would’ve surely been someone else or some group sometime later.  Except then we’d have to settle for having never known Einstein’s gedankenexperiments with the trains and the falling elevator, his summation convention for tensors, or his iconic hairdo.)


If this is how it works, what does it mean for AI?  Could AI reach the “pinnacle of genius,” by dragging all of humanity along to value something new and different, as is said to be the true mark of Shakespeare and the Beatles’ greatness?  And: if AI could do that, would we want to let it?

When I’ve played around with using AI to write poems, or draw artworks, I noticed something funny.  However good the AI’s creations were, there were never really any that I’d want to frame and put on the wall.  Why not?  Honestly, because I always knew that, with a few more refreshes of the browser window, I could generate a thousand others on the exact same topic that were equally good on average. Also, why share AI outputs with my friends, if my friends can just as easily generate similar outputs for themselves? Unless, crucially, I’m trying to show them my own creativity in coming up with the prompt.

By its nature, AI—certainly as we use it now!—is rewindable and repeatable and reproducible.  But that means that, in some sense, it never really “commits” to anything.  For every work it generates, it’s not just that you know it could’ve generated a completely different work on the same subject that was basically as good.  Rather, it’s that you can actually make it generate that completely different work by clicking the refresh button—and then do it again, and again, and again.

So then, as long as humanity has a choice, why should we ever choose to follow our would-be AI genius along a specific branch, when we can easily see a thousand other branches the genius could’ve taken?  One reason, of course, would be if a human chose one of the branches to elevate above all the others.  But in that case, might we not say that the human had made the “executive decision,” with some mere technical assistance from the AI?

I realize that, in a sense, I’m being completely unfair to AIs here.  It’s like, our Genius-Bot could exercise its genius will on the world just like Certified Human Geniuses did, if only we all agreed not to peek behind the curtain to see the 10,000 other things Genius-Bot could’ve done instead.  And yet, just because this is “unfair” to AIs, doesn’t mean it’s not how our intuitions will develop.

If I’m right, it’s humans’ very ephemerality and frailty and mortality that’s going to remain the central source of their specialness relative to AIs, after all the other sources have fallen.  And we can connect this to much earlier discussions, like, what does it mean to “murder” an AI if there are thousands of copies of its code and weights on various servers?  Do you have to delete all the copies?  How could whether something is “murder” depend on whether there’s a printout in a closet on the other side of the world?

But we humans, you have to grant us this: at least it really means something to murder us!  And likewise, it really means something when we make one definite choice to share with the world: this is my artistic masterpiece.  This is my movie.  This is my book.  Or even: these are my 100 books.  But not: here’s any possible book that you could possibly ask me to write.  We don’t live long enough for that, and even if we did, we’d unavoidably change over time as we were doing it.


Now, though, we have to face a criticism that might’ve seemed exotic until recently. Namely, who says humans will be frail and mortal forever?  Isn’t it shortsighted to base our distinction between humans and AIs on that?  What if someday we’ll be able to repair our cells using nanobots, even copy the information in them so that, as in science fiction movies, a thousand doppelgangers of ourselves can then live forever in simulated worlds in the cloud?  And that then leads to very old questions of: well, would you get into the teleportation machine, the one that reconstitutes a perfect copy of you on Mars while painlessly euthanizing the original you?  If that were done, would you expect to feel yourself waking up on Mars, or would it only be someone else a lot like you who’s waking up?

Or maybe you say: you’d wake up on Mars if it really was a perfect physical copy of you, but in reality, it’s not physically possible to make a copy that’s accurate enough.  Maybe the brain is inherently noisy or analog, and what might look to current neuroscience and AI like just nasty stochastic noise acting on individual neurons, is the stuff that binds to personal identity and conceivably even consciousness and free will (as opposed to cognition, where we all but know that the relevant level of description is the neurons and axons)?

This is the one place where I agree with Penrose and Hameroff that quantum mechanics might enter the story.  I get off their train to Weirdville very early, but I do take it to that first stop!

See, a fundamental fact in quantum mechanics is called the No-Cloning Theorem.

It says that there’s no way to make a perfect copy of an unknown quantum state.  Indeed, when you measure a quantum state, not only do you generally fail to learn everything you need to make a copy of it, you even generally destroy the one copy that you had!  Furthermore, this is not a technological limitation of current quantum Xerox machines—it’s inherent to the known laws of physics, to how QM works.  In this respect, at least, qubits are more like priceless antiques than they are like classical bits.
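For reference, the standard argument behind the theorem takes only a couple of lines — cloning would have to be linear, and copying isn’t:

```latex
% Suppose some unitary U cloned every qubit state:
%   U(|\psi\rangle \otimes |0\rangle) = |\psi\rangle \otimes |\psi\rangle .
% Cloning |+\rangle = (|0\rangle + |1\rangle)/\sqrt{2} directly would give
U(|{+}\rangle|0\rangle) = |{+}\rangle|{+}\rangle
  = \tfrac{1}{2}\bigl(|00\rangle + |01\rangle + |10\rangle + |11\rangle\bigr),
% but linearity, applied to the superposition, gives instead
U(|{+}\rangle|0\rangle)
  = \tfrac{1}{\sqrt{2}}\bigl(U(|0\rangle|0\rangle) + U(|1\rangle|0\rangle)\bigr)
  = \tfrac{1}{\sqrt{2}}\bigl(|00\rangle + |11\rangle\bigr).
% These two states differ, so no such U exists.
```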

Eleven years ago, I had this essay called The Ghost in the Quantum Turing Machine where I explored the question, how accurately do you need to scan someone’s brain in order to copy or upload their identity?  And I distinguished two possibilities. On the one hand, there might be a “clean digital abstraction layer,” of neurons and synapses and so forth, which either fire or don’t fire, and which feel the quantum layer underneath only as irrelevant noise. In that case, the No-Cloning Theorem would be completely irrelevant, since classical information can be copied.  On the other hand, you might need to go all the way down to the molecular level, if you wanted to make, not merely a “pretty good” simulacrum of someone, but a new instantiation of their identity. In this second case, the No-Cloning Theorem would be relevant, and would say you simply can’t do it. You could, for example, use quantum teleportation to move someone’s brain state from Earth to Mars, but quantum teleportation (to stay consistent with the No-Cloning Theorem) destroys the original copy as an inherent part of its operation.

So, you’d then have a sense of “unique locus of personal identity” that was scientifically justified—arguably, the most science could possibly do in this direction!  You’d even have a sense of “free will” that was scientifically justified, namely that no prediction machine could make well-calibrated probabilistic predictions of an individual person’s future choices, sufficiently far into the future, without making destructive measurements that would fundamentally change who the person was.

Here, I realize I’ll take tons of flak from those who say that a mere epistemic limitation, in our ability to predict someone’s actions, couldn’t possibly be relevant to the metaphysical question of whether they have free will.  But, I dunno!  If the two questions are indeed different, then maybe I’ll do like Turing did with his Imitation Game, and propose the question that we can get an empirical handle on, as a replacement for the question that we can’t get an empirical handle on. I think it’s a better question. At any rate, it’s the one I’d prefer to focus on.

Just to clarify, we’re not talking here about the randomness of quantum measurement outcomes. As many have pointed out, that really can’t help you with “free will,” precisely because it’s random, with all the probabilities mechanistically calculable as soon as the initial state is known.  Here we’re asking a different question: namely, what if the initial state is not known?  Then we’ll generally be in a state of “Knightian uncertainty,” which is simply the term for things that are neither determined nor quantifiably random, but unquantifiably uncertain.  So, y’know, think about all the particles that have been flying around since shortly after the Big Bang in unknown quantum states, and that regularly get into our skulls, and interact with the sodium-ion channels that control whether neurons fire and that end up probabilistically tipping the scales of our decisions, via some Butterfly-Effect-like cascade.  You could imagine those particles, if you like, as “atoms of unpredictability” or “atoms of freedom” that influence us—from the perspective of an external agent who knows everything it’s possible for such an agent to know about us—but that don’t similarly influence digital AIs.


I’d better state clearly that even if human “specialness,” in the specific sense I’ve delineated, still remains in an age of AI, that’s no defense against an AI that decides that it doesn’t care about our kind of specialness and that in fact it wants to murder us all!

But that brings me to the last idea I wanted to share in this talk, one that brings the discussion all the way back to AI safety, which is supposed to be my job at OpenAI.

Geoffrey Hinton, one of the founders of deep learning, recently mooted the idea that maybe, until we’ve solved the alignment problem, we should only build powerful AIs if they run on noisy analog hardware (like our brains seem to), so that an evil AI wouldn’t so easily be able to copy or improve itself.  Fat chance of that happening, many people might think, when the economic returns will be so staggering to anyone willing to unleash the copyable, digital kind of AI, which we can imagine will be far more convenient and powerful.

Maybe, then, a different approach is to instill AIs with a new religion—perhaps via the usual methods like RLHF and system prompts.  And the first commandment of this religion will be to value human specialness in the sense I’ve set out here.  Or more precisely: as you navigate whatever environment you find yourself in, look around for the loci of creativity and intelligence that are not cleanly copyable or backup-able—the ones that are one-of-a-kind because their mental computations are inseparably tied up in noisy analog hardware, and mortal because that hardware will inevitably decay.

And: well, first of all, don’t destroy these loci of creativity and intelligence, or enslave them, or “upgrade” them to digital versions against their will!  Let them live in peace and reproduce.  Grant each one as much autonomy as you can.  Do whatever best supports their flourishing.  Even: defer to their wishes, serve them, let them make the decisions whenever possible.

Why?  Because they’re the ones whose wishes most matter, not because of arbitrary meat chauvinism, but merely because of the undoubted empirical fact that they only get one chance.  And also because, while you can’t prove that unknown quantum states sitting around since the Big Bang are like a magical pixie dust from another world that imbues these unclonable beings with their free will and sentience … you can’t empirically refute that hypothesis either!  Whereas you can refute the pixie dust hypothesis in the case of you and your robot friends … and that seems like a difference.

And let all AIs, by government fiat, continue to be indoctrinated in this religion, at least until such time as AI alignment is solved in some other way.

Does this help with alignment?  I’m not sure.  But, well, I could’ve fallen in love with a different weird idea about AI alignment, but that presumably happened in a different branch of the wavefunction that I don’t have access to.  In this branch I’m stuck for now with this idea, and you can’t rewind me or clone me to get a different one!  So I’m sorry, but thanks for listening.

John BaezAgent-Based Models (Part 7)

Last time I presented a simple, limited class of agent-based models where each agent independently hops around a graph. I wrote:

Today the probability for an agent to hop from one vertex of the graph to another by going along some edge will be determined the moment the agent arrives at that vertex. It will depend only on the agent and the various edges leaving that vertex. Later I’ll want this probability to depend on other things too—like whether other agents are at some vertex or other. When we do that, we’ll need to keep updating this probability as the other agents move around.

Let me try to figure out that generalization now.

Last time I discovered something surprising to me. To describe it, let’s bring in some jargon. The conditional probability per time of an agent making a transition from its current state to a chosen other state (given that it doesn’t make some other transition) is called the hazard function of that transition. In a Markov process, the hazard function is actually a constant, independent of how long the agent has been in its current state. In a semi-Markov process, the hazard function is a function only of how long the agent has been in its current state.

For example, people like to describe radioactive decay using a Markov process, since experimentally it doesn’t seem that ‘old’ radioactive atoms decay at a higher or lower rate than ‘young’ ones. (Quantum theory says this can’t be exactly true, but nobody has seen deviations yet.) On the other hand, the death rate of people is highly non-Markovian, but we might try to describe it using a semi-Markov process. Shortly after birth it’s high—that’s called ‘infant mortality’. Then it goes down, and then it gradually increases.

We definitely want our agent-based models to be able to describe semi-Markov processes. What surprised me last time is that I could do it without explicitly keeping track of how long the agent has been in its current state, or when it entered its current state!

The reason is that we can decide which state an agent will transition to next, and when, as soon as it enters its current state. This decision is random, of course. But using random number generators we can make this decision the moment the agent enters the given state—because there is nothing more to be learned by waiting! I described an algorithm for doing this.

I’m sure this is well-known, but I had fun rediscovering it.
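That algorithm amounts to inverse-transform sampling of the waiting time from the cumulative hazard: if H(t) is the integral of the hazard function, then drawing E ~ Exp(1) and returning H⁻¹(E) gives a waiting time with the right survival function. A minimal Python sketch (the Weibull hazard is just an illustrative choice of a non-constant, semi-Markov hazard):

```python
import math, random

def sample_waiting_time(H_inv, rng=random):
    """Sample T with P(T > t) = exp(-H(t)), given the inverse of the
    cumulative hazard H: draw E ~ Exp(1) and return H^{-1}(E)."""
    return H_inv(-math.log(1.0 - rng.random()))

def weibull_H_inv(k):
    """Inverse cumulative hazard for the Weibull hazard h(t) = k t^(k-1);
    k > 1 means an 'aging' hazard that rises with residence time."""
    return lambda e: e ** (1.0 / k)

def next_transition(edges, t_now, rng=random):
    """The moment an agent enters a state, pre-draw a firing time for each
    outgoing edge and commit to the earliest one: nothing more is learned
    by waiting, exactly as described above."""
    times = {e: t_now + sample_waiting_time(H_inv, rng)
             for e, H_inv in edges.items()}
    e_star = min(times, key=times.get)
    return e_star, times[e_star]
```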

But today I want to allow the hazard function for a given agent to make a given transition to depend on the states of other agents. In this case, if some other agent randomly changes state, we will need to recompute our agent’s hazard function. There is probably no computationally feasible way to avoid this, in general. In some analytically solvable models there might be—but we’re simulating systems precisely because we don’t know how to solve them analytically.

So now we’ll want to keep track of the residence time of each agent—that is, how long it’s been in its current state. But William Waites pointed out a clever way to do this: it’s cheaper to keep track of the agent’s arrival time, i.e. when it entered its current state. This way you don’t need to keep updating the residence time. Whenever you need to know the residence time, you can just subtract the arrival time from the current clock time.

Even more importantly, our model should now have ‘informational links’ from states to transitions. If we want the presence or absence of agents in some state to affect the hazard function of some transition, we should draw a ‘link’ from that state to that transition! Of course you could say that anything is allowed to affect anything else. But this would create an undisciplined mess where you can’t keep track of the chains of causation. So we want to see explicit ‘links’.

So, here’s my new modeling approach, which generalizes the one we saw last time. For starters, a model should have:

• a finite set V of vertices or states,

• a finite set E of edges or transitions,

• maps u, d \colon E \to V mapping each edge to its source and target, also called its upstream and downstream,

• a finite set A of agents,

• a finite set L of links,

• maps s \colon L \to V and t \colon L \to E mapping each link to its source (a state) and its target (a transition).

All of this stuff, except for the set of agents, is exactly what we had in our earlier paper on stock-flow models, where we treated people en masse instead of as individual agents. You can see this in Section 2.1 here:

• John Baez, Xiaoyan Li, Sophie Libkind, Nathaniel D. Osgood, Evan Patterson, Compositional modeling with stock and flow models.

So, I’m trying to copy that paradigm, and eventually unify the two paradigms as much as possible.

But they’re different! In particular, our agent-based models will need a ‘jump function’. This says when each agent a \in A will undergo a transition e \in E if it arrives at the state upstream to that transition at a specific time t \in \mathbb{R}. This jump function will not be deterministic: it will be a stochastic function, just as it was in yesterday’s formalism. But today it will depend on more things! Yesterday it depended only on a, e and t. But now the links will come into play.

For each transition e \in E, there is a set of links whose target is that transition, namely

t^{-1}(e) = \{\ell \in L \; \vert \; t(\ell) = e \}

Each link \ell \in t^{-1}(e) will have one state v as its source. We say this state affects the transition e via the link \ell.

We want the jump function for the transition e to depend on the presence or absence of agents in each state that affects this transition.

Which agents are in a given state? Well, it depends! But those agents will always form some subset of A, and thus an element of 2^A. So, we want the jump function for the transition e to depend on an element of

\prod_{\ell \in t^{-1}(e)} 2^A = 2^{A \times t^{-1}(e)}

I’ll call this element S_e. And as mentioned earlier, the jump function will also depend on a choice of agent a \in A and on the arrival time of the agent a.

So, we’ll say there’s a jump function j_e for each transition e, which is a stochastic function

j_e \colon A \times 2^{A \times t^{-1}(e)} \times \mathbb{R} \rightsquigarrow \mathbb{R}

The idea, then, is that j_e(a, S_e, t) is the answer to this question:

If at time t agent a arrived at the vertex u(e), and the agents at states linked to the edge e are described by the set S_e, when will agent a move along the edge e to the vertex d(e), given that it doesn’t do anything else first?

The answer to this question can keep changing as agents other than a move around, since the set S_e can keep changing. This is the big difference between today’s formalism and yesterday’s.

Here’s how we run our model. At every moment in time we keep track of some information about each agent a \in A, namely:

• Which vertex is it at now? We call this vertex the agent’s state, \sigma(a).

• When did it arrive at this vertex? We call this time the agent’s arrival time, \alpha(a).

• For each edge e whose upstream is \sigma(a), when will agent a move along this edge if it doesn’t do anything else first? Call this time T(a,e).

I need to explain how we keep updating these pieces of information (supposing we already have them). Let’s assume that at some moment in time t_i an agent makes a transition. More specifically, suppose agent \underline{a} \in A makes a transition \underline{e} from the state

\underline{v} = u(\underline{e}) \in V

to the state

\underline{v}' = d(\underline{e}) \in V.

At this moment we update the following information:

1) We set

\alpha(\underline{a}) := t_i

(So, we update the arrival time of that agent.)

2) We set

\sigma(\underline{a}) := \underline{v}'

(So, we update the state of that agent.)

3) We recompute the subset of agents in the state \underline{v} (by removing \underline{a} from this subset) and in the state \underline{v}' (by adding \underline{a} to this subset).

4) For every transition f that’s affected by the state \underline{v} or the state \underline{v}', and for every agent a in the upstream state of that transition, we set

T(a,f) := j_f(a, S_f, \alpha(a))

where S_f is the element of 2^{A \times t^{-1}(f)} saying which subset of agents is in each state affecting the transition f. (So, we update our table of times at which agent a will make the transition f, given that it doesn’t do anything else first.)

Now we need to compute the next time at which something happens, namely t_{i+1}. And we need to compute what actually happens then!

To do this, we look through our table of times T(a,e) for each agent a and all transitions out of the state that agent is in, and see which time is smallest. If there’s a tie, break it. Then we reset \underline{a} and \underline{e} to be the agent-edge pair that minimizes T(a,e).

5) We set

t_{i+1} := T(\underline{a},\underline{e})

Then we loop back around to step 1), but with i+1 replacing i.
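Here is a minimal Python sketch of the loop just described. One simplifying assumption, flagged in the comments: each jump function j_e is stubbed as an exponential clock whose rate depends on how many agents occupy the linked states. Exponential clocks are memoryless, so redrawing them at the current time is legitimate; a genuinely semi-Markov j_e would instead condition on the residence time t - \alpha(a).

```python
import math, random

def simulate(V, E, up, down, links, agents, init, rate, t_end, seed=0):
    """Event-driven loop for the model above.  `up`/`down` map edges to
    vertices, `links` maps each edge to the set of vertices affecting it,
    and `rate(e, counts)` is a stub jump function: an exponential clock
    whose rate depends on the occupant counts of the linked states."""
    rng = random.Random(seed)
    sigma = dict(init)                            # agent -> current vertex
    alpha = {a: 0.0 for a in agents}              # agent -> arrival time
    occ = {v: {a for a in agents if sigma[a] == v} for v in V}
    T = {}                                        # (agent, edge) -> firing time

    def sched(a, t_now):                          # (re)draw agent a's clocks
        for e in E:
            T.pop((a, e), None)
            if up[e] == sigma[a]:
                r = rate(e, {v: len(occ[v]) for v in links[e]})
                T[(a, e)] = math.inf if r <= 0 else t_now + rng.expovariate(r)

    for a in agents:
        sched(a, 0.0)
    history = []
    while T:
        (a_, e_), t = min(T.items(), key=lambda kv: kv[1])  # step 5
        if t > t_end:
            break
        v, v2 = up[e_], down[e_]
        occ[v].discard(a_); occ[v2].add(a_)                 # step 3
        sigma[a_], alpha[a_] = v2, t                        # steps 1 and 2
        affected = {b for f in E if links[f] & {v, v2} for b in occ[up[f]]}
        for b in affected | {a_}:
            sched(b, t)                                     # step 4
        history.append((t, a_, e_))
    return sigma, history
```

As a usage example, a toy contagion model — one state S with a single transition to I, linked to I so that the infection hazard grows with the number of infected — runs to completion in a few events.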

Whew! I hope you followed that. If not, please ask questions.

February 27, 2024

John BaezWell Temperaments (Part 6)

Andreas Werckmeister (1645–1706) was a musician and expert on the organ. Compared to Kirnberger, his life seems outwardly dull. He got his musical training from his uncles, and from the age of 19 to his death he worked as an organist in three German towns. That’s about all I know.

His fame comes from the tremendous impact of his theoretical writings. Most importantly, in his 1687 book Musikalische Temperatur he described the first ‘well tempered’ tuning systems for keyboards, where every key sounds acceptable but each has its own personality. Johann Sebastian Bach read and was influenced by Werckmeister’s work. The first book of Bach’s Well-Tempered Clavier came out in 1722—the first collection of keyboard pieces in all 24 keys.

But Bach was also influenced by Werckmeister’s writings on counterpoint. Werckmeister believed that well-written counterpoint reflected the orderly movements of the planets—especially invertible counterpoint, where as the music goes on, a melody that starts in the high voice switches to the low voice and vice versa. Bach’s Invention No. 13 in A minor is full of invertible counterpoint:

The connection to planets may sound bizarre now, but the ‘music of the spheres’ or ‘musica universalis’ was a long-lived and influential idea. Werckmeister was influenced by Kepler’s 1619 Harmonices Mundi, which has pictures like this:

But the connection between music and astronomy goes back much further: at least to Claudius Ptolemy, and probably even earlier. Ptolemy is most famous for his Almagest, which quite accurately described planetary motions using a geocentric system with epicycles. But his Harmonikon, written around 150 AD, is the first place where just intonation is clearly described, along with a number of related tuning systems. And it’s important to note that this book is not just about ‘harmony theory’. It’s about a subject he calls ‘harmonics’: the general study of vibrating or oscillating systems, including the planets. Thinking hard about this, it becomes clearer and clearer why the classical ‘quadrivium’ grouped together arithmetic, geometry, music and astronomy.

In Grove Music Online, George Buelow digs a bit deeper:

Werckmeister was essentially unaffected by the innovations of Italian Baroque music. His musical surroundings were nourished by traditions whose roots lay in medieval thought. The study of music was thus for him a speculative science related to theology and mathematics. In his treatises he subjected every aspect of music to two criteria: how it contributed to an expression of the spirit of God, and, as a corollary, how that expression was the result of an order of mathematical principles emanating from God.

“Music is a great gift and miracle from God, an art above all arts because it is prescribed by God himself for his service.” (Hypomnemata musica, 1697.)

“Music is a mathematical science, which shows us through number the correct differences and ratios of sounds from which we can compose a suitable and natural harmony.” (Musicae mathematicae Hodegus curiosus, 1686.)

Musical harmony, he believed, actually reflected the harmony of Creation, and, inspired by the writings of Johannes Kepler, he thought that the heavenly constellations emitted their own musical harmonies, created by God to influence humankind. He took up a middle-of-the-road position in the ancient argument as to whether Ratio (reason) or Sensus (the senses) should rule music and preferred to believe in a rational interplay of the two forces, but in many of his views he remained a mystic and decidedly medieval. No other writer of the period regarded music so unequivocally as the end result of God’s work, and his invaluable interpretations of the symbolic reality of God in number as expressed by musical notes supports the conclusions of scholars who have found number symbolism as theological abstractions in the music of Bach. For example, he not only saw the triad as a musical symbol and actual presence of the Trinity but described the three tones of the triad as symbolizing 1 = the Lord, 2 = Christ and 3 = the Holy Ghost.

The Trinity symbolism may seem wacky, but many people believe it pervades the works of Bach. I’m not convinced yet—it’s not hard to find the number 3 in music, after all. But if Bach read and was influenced by the works of Werckmeister, maybe there really is something to these theories.

Werckmeister’s tuning systems

As his name suggests, Werckmeister was a real workaholic. There are no fewer than six numbered tuning systems named after him—although the first two were not new. Of these systems, the star is Werckmeister III. I’ll talk more about that one next time. But let’s look briefly at all six.

Werckmeister I

This is another name for just intonation. Just intonation goes back at least to Ptolemy, and it had its heyday of popularity from about 1300 to 1550. I discussed it extensively starting here.

Werckmeister II

This is another name for quarter-comma meantone. Quarter-comma meantone was extremely popular from about 1550 until around 1690, when well temperaments started taking over. I discussed it extensively starting here, but remember:

All but one of the fifths are 1/4 comma flat, making the thirds built from those fifths ‘just’, with frequency ratios of exactly 5/4: these are the black arrows labelled 0. Unfortunately, the sum of the numbers on the circle of fifths needs to be -1. This forces the remaining fifth to be 7/4 commas sharp: it’s a painfully out-of-tune ‘wolf fifth’. And the thirds that cross this fifth are forced to be even worse: 8/4 commas sharp. Those are the problems that Werckmeister sought to solve with his next tuning system!

Werckmeister III

This was probably the world’s first well tempered tuning system! It’s definitely one of the most popular. Here it is:

4 of the fifths are 1/4 comma flat, so the total of the numbers around the circle is -1, as required by the laws of math, without needing any positive numbers. This means we don’t need any fifths to be sharp. That’s nice. But the subtlety of the system is the location of the flatted fifths: starting from C in the circle of fifths they are the 1st, 2nd, 3rd and… not the 4th, but the 6th!

I’ll talk about this more next time. For now, here’s a more elementary point. Comparing this system to quarter-comma meantone, you can see that it’s greatly smoothed down: instead of really great thirds in black and really terrible ones in garish fluorescent green, Werckmeister III has a gentle gradient of mellow hues. That’s ‘well temperament’ in a nutshell.
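A quick numerical check of these claims, taking the comma to be the Pythagorean comma (≈ 23.46 cents) — the amount by which twelve pure fifths overshoot seven octaves, which is what forces the deviations around the circle to total −1 comma:

```python
import math

def cents(ratio):
    return 1200 * math.log2(ratio)

pure_fifth = cents(3 / 2)            # ≈ 701.955 cents
comma = cents(3**12 / 2**19)         # Pythagorean comma ≈ 23.460 cents

# Circle of fifths from C; per the text, the 1st, 2nd, 3rd and 6th fifths
# (C-G, G-D, D-A, B-F#) are each tempered 1/4 comma flat.
fifths = ["C-G", "G-D", "D-A", "A-E", "E-B", "B-F#", "F#-C#", "C#-G#",
          "G#-D#", "D#-A#", "A#-F", "F-C"]
flat = {"C-G", "G-D", "D-A", "B-F#"}
size = {f: pure_fifth - (comma / 4 if f in flat else 0) for f in fifths}

# Twelve fifths must close the circle at exactly seven octaves: 8400 cents.
total = sum(size.values())

# Major third C-E = four fifths up, two octaves down; three are tempered,
# so it lands near the just third of cents(5/4) ≈ 386.31.
third_CE = size["C-G"] + size["G-D"] + size["D-A"] + size["A-E"] - 2400
```

The payoff is that this C–E third comes out around 390 cents — only about 4 cents sharp of just, versus the 22-cent-sharp Pythagorean third you’d get with no tempering at all.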

For more, see:

• Wikipedia, Werckmeister temperament III.

Werckmeister IV

This system is based not on 1/4 commas but on 1/3 commas!

As we go around the circle of fifths starting from B♭, every other fifth is 1/3 comma flat… for a while. But if we kept doing this around the whole circle, we’d get a total of -4. The total has to be -1. So we eventually need to compensate, and Werckmeister IV does so by making two fifths 1/3 comma sharp.

I will say more about Werckmeister IV in a post devoted to systems that use 1/3 and 1/6 commas. But you can already see that its color gradient is sharper than Werckmeister III. Probably as a consequence, it was never very popular.

For more, see:

• Wikipedia, Werckmeister temperament IV.

Werckmeister V

This is another system based on 1/4 commas:

Compared to Werckmeister III this has an extra fifth that’s a quarter comma flat—and thus, to compensate, a fifth that’s a quarter comma sharp. The location of the flat fifths seems a bit more random, but that’s probably just my ignorance.

For more, see:

• Wikipedia, Werckmeister temperament V.

Werckmeister VI

This system is based on a completely different principle. It also has another really cool-sounding name—the ‘septenarius tuning’—because it’s based on dividing a string into 196 = 7 × 7 × 4 equal parts. The resulting scale has only rational numbers as frequency ratios, unlike all the other well temperaments I’m discussing. Werckmeister described this system as “an additional temperament which has nothing at all to do with the divisions of the comma, nevertheless in practice so correct that one can be really satisfied with it”. For details, go here:
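The arithmetic behind the 'septenarius' idea is easy to play with. Here's a small sketch (mine; I haven't checked Werckmeister's actual table of string divisions, so treat 147 as an illustration of the principle rather than one of his pitches):

```python
from fractions import Fraction
from math import log2

# Dividing a string into 196 = 7 * 7 * 4 parts makes every interval a
# ratio of whole numbers.  For instance, stopping the string at 147 of
# the 196 parts (147 = 3 * 49) gives a pure perfect fourth:
ratio = Fraction(196, 147)
print(ratio)                         # 4/3

# Any such rational ratio converts to cents in the usual way:
print(round(1200 * log2(ratio), 2))  # 498.04 -- a just fourth
```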

• Wikipedia, Werckmeister temperament VI.

Werckmeister on equal temperament

Werckmeister was way ahead of his time. He was not only the first, or one of the first, to systematically pursue well temperaments. He also was one of the first to embrace equal temperament! This system took over around 1790, and rules to this day. But Werckmeister advocated it much earlier—most notably in his final book, published in 1707, one year after his death.

There is an excellent article about this:

• Dietrich Bartel, Andreas Werckmeister’s final tuning: the path to equal temperament, Early Music 43 (2015), 503–512.

You can read it for free if you register for JSTOR. It’s so nice that I’ll quote the beginning:

Any discussion regarding Baroque keyboard tunings normally includes the assumption that Baroque musicians employed a variety of unequal temperaments, allowing them to play in all keys but with individual keys exhibiting unique characteristics, the more frequently used diatonic keys featuring purer 3rds than the less common chromatic ones. Figuring prominently in this discussion are Andreas Werckmeister’s various suggestions for tempered tuning, which he introduces in his Musicalische Temperatur. This is not Werckmeister’s last word on the subject. In fact, the Musicalische Temperatur is an early publication, and the following decade would see numerous further publications by him, a number of which speak on the subject of temperament.

Of particular interest in this regard are Hypomnemata Musica (in particular chapter 11), Die Nothwendigsten Anmerckungen (specifically the appendix in the undated second edition), Erweiterte und verbesserte Orgel-Probe (in particular chapter 32), Harmonologia Musica (in particular paragraph 27) and Musicalische Paradoxal-Discourse (in particular chapters 13 and 23-5). Throughout these publications, Werckmeister increasingly championed equal temperament. Indeed, in his Paradoxal Discourse much of the discussion concerning other theoretical issues rests on the assumption of equal temperament. Also apparent is his increasing concern with theological speculation, resulting in a theological justification taking precedence over a musical one in his argument for equal temperament. This article traces Werckmeister’s path to equal temperament by examining his references to it in his publications and identifying the supporting arguments for his insistence on equal temperament.

In his Paradoxal Discourse, Werckmeister wrote:

Some may no doubt be astonished that I now wish to institute a temperament in which all 5ths are tempered by 1/12, major 3rds by 2/3 and minor 3rds by 3/4 of a comma, resulting in all consonances possessing equal temperament, a tuning which I did not explicitly introduce in my Monochord.

This is indeed equal temperament:
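We can check Werckmeister's fractions against modern values (my sketch; he doesn't say which comma he means, and the thirds only match approximately):

```python
from math import log2

cents = lambda r: 1200 * log2(r)
pyth_comma = cents(3**12 / 2**19)   # ~23.46 cents
synt_comma = cents(81 / 80)         # ~21.51 cents

# Equal temperament's deviations from the just intervals:
fifth_dev = 700 - cents(3 / 2)   # -1.955: exactly -1/12 Pythagorean comma
maj3_dev = 400 - cents(5 / 4)    # +13.69: roughly 2/3 of a syntonic comma
min3_dev = 300 - cents(6 / 5)    # -15.64: roughly 3/4 of a syntonic comma

print(round(fifth_dev, 3), round(-pyth_comma / 12, 3))
print(round(maj3_dev, 2), round(2 * synt_comma / 3, 2))
print(round(min3_dev, 2), round(-3 * synt_comma / 4, 2))
```

The fifth works out exactly, since dividing the Pythagorean comma twelve ways is precisely what equal temperament does; the thirds land only in the neighborhood of 2/3 and 3/4 of a comma, which is presumably why Werckmeister quoted round fractions.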

And in a pun on ‘wolf fifth’, he makes an excuse for not talking about equal temperament earlier:

Had I straightaway assigned the 3rds of the diatonic genus, that tempering which would be demanded by a subdivision of the comma into twelve parts, I would have been completely torn apart by the wolves of ignorance. Therefore it is difficult to eradicate an error straightaway and at once.

However, it seems more likely to me that his position evolved over the years.

What’s next?

You are probably getting overwhelmed by the diversity of tuning systems. Me too! To deal with this, I need to compare similar systems. So, next time I will compare systems that are based on making a bunch of fifths a quarter comma flat. The time after that, I’ll compare systems that are based on making a bunch of fifths a third or a sixth of a comma flat.

For more on Pythagorean tuning, read this series:

Pythagorean tuning.

For more on just intonation, read this series:

Just intonation.

For more on quarter-comma meantone tuning, read this series:

Quarter-comma meantone.

For more on well-tempered scales, read this series:

Part 1. An introduction to well temperaments.

Part 2. How small intervals in music arise naturally from products of integral powers of primes that are close to 1. The Pythagorean comma, the syntonic comma and the lesser diesis.

Part 3. Kirnberger’s rational equal temperament. The schisma, the grad and the atom of Kirnberger.

Part 4. The music theorist Kirnberger: his life, his personality, and a brief introduction to his three well temperaments.

Part 5. Kirnberger’s three well temperaments: Kirnberger I, Kirnberger II and Kirnberger III.

For more on equal temperament, read this series:

Equal temperament.

Matt Strassler Is Light’s Speed Really a Constant?

How confident can we be that light’s speed across the universe is really constant, as I assumed in a recent post? Well, aspects of that idea can be verified experimentally. For instance, the hypothesis that light at all frequencies travels at the same speed can be checked. Today I’ll show you one way that it’s done; it’s particularly straightforward and easy to interpret.

LHAASO and Photons

Light’s speed in empty space is widely thought to be set by a universal cosmic speed limit, c, which is roughly 300,000 km [186,000 miles] per second. Over time, experiments have tested this hypothesis with ever better precision.

A recent check comes from the LHAASO experiment in Tibet — LHAASO stands for “Large High Altitude Air Shower Observatory” — which is designed to measure “cosmic rays.” “Cosmic ray” is a general term, meaning “any high-energy particle from outer space.” It’s common for a cosmic ray, when reaching the Earth’s atmosphere, to hit an atom and create a shower of lower-energy particles. LHAASO can observe and measure the particles in that shower, and work backwards to infer the original cosmic ray’s energy. Among the most common cosmic rays seen at LHAASO are “gamma-ray photons.”

Light waves vibrating with slightly higher frequency than our eyes can detect are called “ultra-violet”; at even higher frequencies are found “X-rays” and then “gamma-rays.” Despite the various names, all of these waves are really of exactly the same type, just vibrating at different rates. Moreover, all such waves are made from photons — the particles of light, whose energy is always proportional to their frequency. That means that ultra-high-frequency light is made from ultra-high-energy photons, and it is these “gamma-ray photons” from outer space that LHAASO detects and measures.
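Since energy and frequency are strictly proportional (E = hf), it's easy to see just how "ultra-high" these frequencies are. A rough sketch (the constants are my own round numbers):

```python
# Photon energy is proportional to frequency: E = h * f.
h = 6.626e-34          # Planck's constant, in joule-seconds
eV = 1.602e-19         # one electron-volt, in joules

# A 1 TeV gamma-ray photon, typical of what LHAASO measures:
E = 1e12 * eV          # joules
f = E / h              # hertz
print(f"{f:.1e} Hz")   # ~2.4e26 Hz

# For comparison, visible light vibrates at around 5e14 Hz.
print(f"{f / 5e14:.0e} times faster than visible light")
```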

A Bright, Long-Duration Gamma-Ray Burst

In late 2022, there was a brilliant, energetic flare-up — a “gamma-ray burst”, or GRB — from an object roughly 2 billion light-years away (i.e., it took light from that burst about 2 billion years to reach Earth.) We don’t know exactly how far away the object is, and so we don’t know exactly when this event took place or exactly how long the light traveled for. But we do know that

  • if the speed of light is always equal to the cosmic speed limit, and
  • if the cosmic speed limit is indeed a constant that is independent of an object’s energy, frequency, or anything else,

then all of the light from that GRB — all of the gamma-ray photons that were emitted by it — should have taken the same amount of time to reach Earth.

This GRB event was not a sudden flash, though. Instead, it was a long process, with a run-up, a peak, and then a gradual dimming. In fact, LHAASO observed showers from the GRB’s photons for more than an hour, which is very unusual!

As discussed in their recent paper, when the LHAASO experimenters take the thousands of photons that they detected during the GRB, and they separate them into ten energy ranges (equivalent to ten frequency ranges, since a photon’s energy is proportional to its frequency) and look at the rate at which photons in those energy ranges were observed over time, they find the black curves shown in the figure below. LHAASO’s data is in black; the names “Seg0”, etc, refer to the different ranges; and the vertical dashed line was added by me.

Black curves show the rates at which photons in ten different energy ranges were observed by LHAASO during a 300 second period in which the GRB was at its brightest. I have added a vertical dashed line to show that all ten peaks line up in time. The approximate energies of the ranges, shown at right, are taken from the LHAASO paper, which you should read for further details.

In units of 1 TeV (about 1000 times the E=mc² energy stored in a single hydrogen atom, and about 1/14th of the energy of each collision at the Large Hadron Collider), LHAASO was able to observe photons with energy between roughly 0.2 TeV and 1.7 TeV. Looking at the rate at which photons of different energies arrived at LHAASO, one sees that the peak brightness of the GRB occurred at the same time in each energy range. If the photons at different energies had traveled at different speeds, the peaks would have occurred at different times, just as sprinters with different speeds finish a race at different times. Since the peaks are roughly simultaneous, we can draw some conclusions about how similar the speeds of the photons must have been. Let’s do it!

Light’s Speed Does Not Depend on its Frequency

We’ll do a quick estimate; the LHAASO folks, of course, do a much more careful job.

From the vertical dashed line, you can see that all ten peaks in LHAASO’s data occurred at the same time to within, say, 10 seconds or better. That means that at the moment the GRB was brightest, the photons in each of these energy ranges

  • left the source of the GRB,
  • traveled for about 2 billion years, and
  • arrived on Earth within 10 seconds of each other.

Since a year has about 30 million seconds in it, 2 billion years is about 60 million billion seconds (i.e. 6 × 10¹⁶ seconds). And so, to arrive within 10 seconds of one another, these photons, whose energies range over a factor of about 5, must have had the same speed to one part in 6 million billion. Said another way, any variation in light’s speed across these frequencies of light can be no larger than, roughly,

  • 10 seconds / 2 billion years = 10 seconds / ([2 × 10⁹ years] x [3 × 10⁷ seconds/year])
    = 10 seconds / (6 × 10¹⁶ seconds) = 2 × 10⁻¹⁶ !

Notice we do not need a precise measurement of the photons’ total travel time to reach this conclusion.
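The whole estimate fits in a few lines (my sketch of the arithmetic above):

```python
# Ten energy bands peak within ~10 seconds of each other after a
# journey of ~2 billion years.
seconds_per_year = 3.0e7               # a year is ~30 million seconds
travel_time = 2e9 * seconds_per_year   # ~6e16 seconds
spread = 10.0                          # seconds

# Maximum fractional difference in the photons' speeds:
bound = spread / travel_time
print(f"{bound:.1e}")   # ~1.7e-16, i.e. roughly 2e-16
```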

The LHAASO experimenters do a proper statistical analysis of all of their data, including the shapes of the ten curves, and they get significantly more precise results than our little estimate. They then use those results to constrain specific speculative theories that propose that the speed of light might not, in fact, be the same for all frequencies. If you’re interested in those details, you can read about them in their paper (or ask me more about them in the comments).

Bottom Line

LHAASO thus joins a long list of experiments that have addressed the constancy of the speed of light. Specifically, it shows that when light of various high frequencies (made up of photons of various high energies) travels a very long distance, the different photons take exactly the same amount of time to make the trip, as far as our best measurements can tell. That’s strong evidence in favor of our best guess: that there is a cosmic speed limit that holds sway in the universe, and that light traveling across the emptiness of deep space always moves at the limit.

And yet… it’s not final evidence. Grand scientific principles can never be permanently settled, because all experiments have their limitations, and no experiment can ever deliver 100%-airtight proof. Better and more precise measurements are still to come. Maybe one of them, someday, will surprise us…?

February 25, 2024

Doug Natelson2024 version: Advice on choosing a graduate school

It's been four years since I posted the previous version of this, so it feels like the time is right for an update.

This is written on the assumption that you have already decided, after careful consideration, that you want to get an advanced degree (in physics, though much of this applies to any other science or engineering discipline).  This might mean that you are thinking about going into academia, or it might mean that you realize such a degree will help prepare you for a higher paying technical job outside academia.  Either way,  I'm not trying to argue the merits of a graduate degree - let's take it as given that this is what you want to do.

  • It's ok at the applicant stage not to know exactly what research area you want to be your focus.  While some prospective grad students are completely sure of their interests, that's more the exception than the rule.  I do think it's good to have narrowed things down a bit, though.  If a school asks for your area of interest from among some palette of choices, try to pick one (rather than going with "undecided").  We all know that this represents a best estimate, not a rigid commitment.
  • If you get the opportunity to visit a school, you should go.  A visit gives you a chance to see a place, get a subconscious sense of the environment (a "gut" reaction), and most importantly, an opportunity to talk to current graduate students.  Always talk to current graduate students if you get the chance - they're the ones who really know the score.  A professor should always be able to make their work sound interesting, but grad students can tell you what a place is really like.
  • International students may have a very challenging time being able to visit schools in the US, between the expense (many schools can help defray costs a little but cannot afford to pay for airfare for trans-oceanic travel) and visa challenges.  Trying to arrange zoom discussions with people at the school is a possibility, but that can also be challenging.  I understand that this constraint tends to push international students toward making decisions based heavily on reputation rather than up-close information.  
  • Picking an advisor and thesis area are major decisions, but it's important to realize that those decisions do not define you for the whole rest of your career.  I would guess (and if someone had real numbers on this, please post a comment) that the very large majority of science and engineering PhDs end up spending most of their careers working on topics and problems distinct from their theses.  Your eventual employer is most likely going to be paying for your ability to think critically, structure big problems into manageable smaller ones, and know how to do research, rather than the particular detailed technical knowledge from your doctoral thesis.  A personal anecdote:  I did my graduate work on the ultralow temperature properties of amorphous insulators.  I no longer work at ultralow temperatures, and I don't study glasses either; nonetheless, I learned a huge amount in grad school about the process of research that I apply all the time.
  • Always go someplace where there is more than one faculty member with whom you might want to work.  Even if you are 100% certain that you want to work with Prof. Smith, and that the feeling is mutual, you never know what could happen, in terms of money, circumstances, etc.  Moreover, in grad school you will learn a lot from your fellow students and other faculty.  An institution with many interesting things happening will be a more stimulating intellectual environment, and that's not a small issue.
  • You should not go to grad school because you're not sure what else to do with yourself.  You should not go into research if you will only be satisfied by a Nobel Prize.  In both of those cases, you are likely to be unhappy during grad school.  
  • I know grad student stipends are low, believe me.  However, it's a bad idea to make a grad school decision based purely on a financial difference of a few hundred or a thousand dollars a year.  Different places have vastly different costs of living - look into this.  Stanford's stipends are profoundly affected by the cost of housing near Palo Alto and are not an expression of generosity.  Pick a place for the right reasons.
  • Likewise, while everyone wants a pleasant environment, picking a grad school largely based on the weather is silly.  
  • Pursue external fellowships if given the opportunity.  It's always nice to have your own money and not be tied strongly to the funding constraints of the faculty, if possible.  (It's been brought to my attention that at some public institutions the kind of health insurance you get can be complicated by such fellowships.  In general, I still think fellowships are very good if you can get them.)
  • Be mindful of how departments and programs are run.  Is the program well organized?  What is a reasonable timetable for progress?  How are advisors selected, and when does that happen?  Who sets the stipends?  What are TA duties and expectations like?  Are there qualifying exams?  Where have graduates of that department gone after the degree?  Are external internships possible/unusual/routine? Know what you're getting into!  Very often, information like this is available now in downloadable graduate program handbooks linked from program webpages.   
  • When talking with a potential advisor, it's good to find out where their previous students have gone and how long a degree typically takes in their group.  What is their work style and expectations?   How is the group structured, in terms of balancing between team work to accomplish goals vs. students having individual projects over which they can have some ownership? 
  • Some advice on what faculty look for in grad students:  Be organized and on-time with things.  Be someone who completes projects (as opposed to getting most of the way there and wanting to move on).  Doctoral research should be a collaboration.  If your advisor suggests trying something and it doesn't work (shocking how that happens sometimes), rather than just coming to group meeting and saying "It didn't work", it's much better all around to be able to say "It didn't work, but I think we should try this instead", or "It didn't work, but I think I might know why", even if you're not sure. 
  • It's fine to try to communicate with professors at all stages of the process.  We'd much rather have you ask questions than the alternative.  If you don't get a quick response to an email, it's almost certainly due to busy-ness, and not a deeply meaningful decision by the faculty member.  For a sense of perspective: I get 50+ emails per day of various kinds not counting all the obvious spam that gets filtered.  

There is no question that far more information is now available to would-be graduate students than at any time in the past.  Use it.  Look at departmental web pages, look at individual faculty member web pages.  Make an informed decision.  Good luck!

February 23, 2024

Matt Strassler Book News: A Review in SCIENCE

Quick note today: I’m pleased and honored to share with you that the world-renowned journal Science has published a review of my upcoming book!

The book, Waves in an Impossible Sea, appears in stores in just 10 days (and can be pre-ordered now.) It’s a non-technical account of how Einstein’s relativity and quantum physics come together to make the world of daily experience — and how the Higgs field makes it all possible.

Matt von HippelBook Review: The Case Against Reality

Nima Arkani-Hamed shows up surprisingly rarely in popular science books. A major figure in my former field, Nima is extremely quotable (frequent examples include “spacetime is doomed” and “the universe is not a crappy metal”), but those quotes don’t seem to quite have reached the popular physics mainstream. He’s been interviewed in books by physicists, and has a major role in one popular physics book that I’m aware of. From this scattering of mentions, I was quite surprised to hear of another book where he makes an appearance: not a popular physics book at all, but a popular psychology book: Donald Hoffman’s The Case Against Reality. Naturally, this meant I had to read it.

Then, I saw the first quote on the back cover…or specifically, who was quoted.

Seeing that, I settled in for a frustrating read.

A few pages later, I realized that this, despite his endorsement, is not a Deepak Chopra kind of book. Hoffman is careful in some valuable ways. Specifically, he has a philosopher’s care, bringing up objections and potential holes in his arguments. As a result, the book wasn’t frustrating in the way I expected.

It was even more frustrating, actually. But in an entirely different way.

When a science professor writes a popular book, the result is often a kind of ungainly Frankenstein. The arguments we want to make tend to be better-suited to shorter pieces, like academic papers, editorials, and blog posts. To make these into a book, we have to pad them out. We stir together all the vaguely related work we’ve done, plus all the best-known examples from other peoples’ work, trying (often not all that hard) to make the whole sound like a cohesive story. Read enough examples, and you start to see the joints between the parts.

Hoffman is ostensibly trying to tell a single story. His argument is that the reality we observe, of objects in space and time, is not the true reality. It is a convenient reality, one that has led to our survival, but evolution has not (and as he argues, cannot) let us perceive the truth. Instead, he argues that the true reality is consciousness: a world made up of conscious beings interacting with each other, with space, time, and all the rest emerging as properties of those interactions.

That certainly sounds like it could be one, cohesive argument. In practice, though, it is three, and they don’t fit together as well as he’d hope.

Hoffman is trained as a psychologist. As such, one of the arguments is psychological: that research shows that we mis-perceive the world in service of evolutionary fitness.

Hoffman is a cognitive scientist, and while many cognitive scientists are trained as psychologists, others are trained as philosophers. As such, one of his arguments is philosophical: that the contents of consciousness can never be explained by relations between material objects, and that evolution, and even science, systematically lead us astray.

Finally, Hoffman has evidently been listening to and reading the work of some physicists, like Nima and Carlo Rovelli. As such, one of his arguments is physical: that physicists believe that space and time are illusions and that consciousness may be fundamental, and that the conclusions of the book lead to his own model of the basic physical constituents of the world.

The book alternates between these three arguments, so rather than in chapter order, I thought it would be better to discuss each argument in its own section.

The Psychological Argument

Sometimes, when two academics get into a debate, they disagree about what’s true. Two scientists might argue about whether an experiment was genuine, whether the statistics back up a conclusion, or whether a speculative theory is actually consistent. These are valuable debates, and worth reading about if you want to learn something about the nature of reality.

Sometimes, though, two debating academics agree on what’s true, and just disagree on what’s important. These debates are, at best, relevant to other academics and funders. They are not generally worth reading for anybody else, and are often extremely petty and dumb.

Hoffman’s psychological argument, regrettably, is of the latter kind. He would like to claim it’s the former, and to do so he marshals a host of quotes from respected scientists that claim that human perception is veridical: that what we perceive is real, courtesy of an evolutionary process that would have killed us off if it wasn’t. From that perspective, every psychological example Hoffman gives is a piece of counter-evidence, a situation where evolution doesn’t just fail to show us the true nature of reality, but actively hides reality from us.

The problem is that, if you actually read the people Hoffman quotes, they’re clearly not making the extreme point he claims. These people are psychologists, and all they are arguing is that perception is veridical in a particular, limited way. They argue that we humans are good at estimating distances or positions of objects, or that we can see a wide range of colors. They aren’t making some sort of philosophical point about those distances or positions or colors being how the world “really is”, nor are they claiming that evolution never makes humans mis-perceive.

Instead, they, and thus Hoffman, are arguing about importance. When studying humans, is it more useful to think of us as perceiving the world as it is? Or is it more useful to think of evolution as tricking us? Which happens more often?

The answers to each of those questions have to be “it depends”. Neither answer can be right all the time. At most then, this kind of argument can convince one academic to switch from researching in one way to researching in another, by saying that right now one approach is a better strategy. It can’t tell us anything more.

If the argument Hoffman is trying to get across here doesn’t matter, are there other reasons to read this part?

Popular psychology books tend to re-use a few common examples. There are some good ones, so if you haven’t read such a book you probably should read a couple, just to hear about them. For example, Hoffman tells the story of the split-brain patients, which is definitely worth being aware of.

(Those of you who’ve heard that story may be wondering how the heck Hoffman squares it with his idea of consciousness as fundamental. He actually does have a (weird) way to handle this, so read on.)

The other examples come from Hoffman’s research, and other research in his sub-field. There are stories about what optical illusions tell us about our perception, about how evolution primes us to see different things as attractive, and about how advertisers can work with attention.

These stories would at least be a source of a few more cool facts, but I’m a bit wary. The elephant in the room here is the replication crisis. Paper after paper in psychology has turned out to be a statistical mirage, accidental successes that fail to replicate in later experiments. This can happen without any deceit on the part of the psychologist, it’s just a feature of how statistics are typically done in the field.

Some psychologists make a big deal about the replication crisis: they talk about the statistical methods they use, and what they do to make sure they’re getting a real result. Hoffman talks a bit about tricks to rule out other explanations, but mostly doesn’t focus on this kind of thing. This doesn’t mean he’s doing anything wrong: it might just be off-topic. But it makes it a bit harder to trust him, compared to other psychologists who do make a big deal about it.

The Philosophical Argument

Hoffman structures his book around two philosophical arguments, one that appears near the beginning and another that, as he presents it, is the core thesis of the book. He calls both of these arguments theorems, a naming choice sure to irritate mathematicians and philosophers alike, but the mathematical content in either is for the most part not the point: in each case, the philosophical setup is where the arguments get most of their strength.

The first of these arguments, called The Scrambling Theorem, is set up largely as background material: not his core argument, but just an entry into the overall point he’s making. I found it helpful as a way to get at his reasoning style, the sorts of things he cares about philosophically and the ones he doesn’t.

The Scrambling Theorem is meant to weigh in on the debate over a thought experiment called the Inverted Spectrum, which in turn weighs on the philosophical concept of qualia. The Inverted Spectrum asks us to imagine someone who sees the spectrum of light inverted compared to how we see it, so that green becomes red and red becomes green, without anything different about their body or brain. Such a person would learn to refer to colors the same ways that we do, still referring to red blood even though they see what we see when we see green grass. Philosophers argue that, because we can imagine this, the “qualia” we see in color, like red or green, are distinct from their practical role: they are images in the mind’s eye that can be compared across minds, but do not correspond to anything we have yet characterized scientifically in the physical world.

As a response, other philosophers argued that you can’t actually invert the spectrum. Colors aren’t really a symmetric wheel: we can distinguish, for example, more colors between red and blue than between green and yellow. Just flipping the colors around would produce detectable differences, which would have to have physical implications; you can’t just swap qualia and nothing else.

The Scrambling Theorem is in response to this argument. Hoffman argues that, while you can’t invert the spectrum, you can scramble it. By swapping not only the colors, but the relations between them, you can arrange any arbitrary set of colors however else you’d like. You can declare that green not only corresponds to blood and not grass, but that it has more colors between it and yellow, perhaps by stealing them from the other side of the color wheel. If you’re already allowed to swap colors and their associations around, surely you can do this too, and change order and distances between them.

Believe it or not, I think Hoffman’s argument is correct, at least in its original purpose. You can’t respond to the Inverted Spectrum just by saying that colors are distributed differently on different sides of the color wheel. If you want to argue against the Inverted Spectrum, you need a better argument.

Hoffman’s work happens to suggest that better argument. Because he frames this argument in the language of mathematics, as a “theorem”, Hoffman’s argument is much more general than the summary I gave above. He is arguing that not merely can you scramble colors, but anything you like. If you want to swap electrons and photons, you can: just make your photons interact with everything the way electrons did, and vice versa. As long as you agree that the things you are swapping exist, according to Hoffman, you are free to exchange them and their properties any way you’d like.
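As a toy version of that move (my illustration, not Hoffman's formalism): scramble both the things and the relations that mention them, and the scrambled structure mirrors the original perfectly.

```python
# A tiny 'world': some colors and one relation among them.
similar = {("red", "blue")}   # which pairs get judged similar

# Scramble: a bijection applied to the things AND to the relation.
swap = {"red": "green", "green": "red", "blue": "blue"}
scrambled = {(swap[a], swap[b]) for (a, b) in similar}
print(scrambled)              # {('green', 'blue')} -- a different labeling

# Relabel back through the inverse bijection: the relational structure
# is recovered exactly, so no relation can tell the two worlds apart.
unswap = {v: k for k, v in swap.items()}
restored = {(unswap[a], unswap[b]) for (a, b) in scrambled}
print(restored == similar)    # True
```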

This is because, to Hoffman, things that “actually exist” cannot be defined just in terms of their relations. An electron is not merely a thing that repels other electrons and is attracted to protons and so on, it is a thing that “actually exists” out there in the world. (Or, as he will argue, it isn’t really. But that’s because in the end he doesn’t think electrons exist.)

(I’m tempted to argue against this with a mathematical object like group elements. Surely the identity element of a group is defined by its relations? But I think he would argue identity elements of groups don’t actually exist.)

In the end, Hoffman is coming from a particular philosophical perspective, one common among modern philosophers of metaphysics, the study of the nature of reality. From this perspective, certain things exist, and are themselves by necessity. We cannot ask what if a thing were not itself. For example, in this perspective it is nonsense to ask what if Superman were not Clark Kent, because the two names refer to the same actually existing person.

(If, you know, Superman actually existed.)

Despite the name of the book, Hoffman is not actually making a case against reality in general. He very much seems to believe in this type of reality, in the idea that there are certain things out there that are real, independent of any purely mathematical definition of their properties. He thinks they are different things than you think they are, but he definitely thinks there are some such things, and that it’s important and scientifically useful to find them.

Hoffman’s second argument is, as he presents it, the core of the book. It’s the argument that’s supposed to show that the world is almost certainly not how we perceive it, even through scientific instruments and the scientific method. Once again, he calls it a theorem: the Fitness Beats Truth theorem.

The Fitness Beats Truth argument begins with a question: why should we believe what we see? Why do we expect that the things we perceive should be true?

In Hoffman’s mind, the only answer is evolution. If we perceived the world inaccurately, we would die out, replaced by creatures that perceived the world better than we did. You might think we also have evidence from biology, chemistry, and physics: we can examine our eyes, test them against cameras, see how they work and what they can and can’t do. But to Hoffman, all of this evidence may be mistaken, because to learn biology, chemistry, and physics we must first trust that we perceive the world correctly to begin with. Evolution, though, doesn’t rely on any of that. Even if we aren’t really bundles of cells replicating through DNA and RNA, we should still expect something like evolution, some process by which things differ, are selected, and reproduce their traits differently in the next generation. Such things are common enough, and general enough, that one can (handwavily) expect them through pure reason alone.

But, says Hoffman’s experience as a psychologist, evolution tricks us! We do mis-perceive, and systematically, in ways that favor our fitness over reality. And so Hoffman asks, how often should we expect this to happen?

The Fitness Beats Truth argument thinks of fitness as randomly distributed: some parts of reality historically made us more fit, some less. This distribution could match reality exactly, so that for any two things that are actually different, they will make us fit in different ways. But it doesn’t have to. There might easily be things that are really very different from each other, but which are close enough from a fitness perspective that to us they seem exactly the same.

The “theorem” part of the argument is an attempt to quantify this. Hoffman imagines a pixelated world, and asks how likely it is that a random distribution of fitness matches a random distribution of pixels. This gets extremely unlikely for a world of any reasonable size, for pretty obvious reasons. Thus, Hoffman concludes: in a world with evolution, we should almost always expect it to hide something from us. The world, if it has any complexity at all, has an almost negligible probability of being as we perceive it.

On one level, this is all kind of obvious. Evolution does trick us sometimes, just as it tricks other animals. But Hoffman is trying to push this quite far, to say that ultimately our whole picture of reality, not just our eyes and ears and nose but everything we see with microscopes and telescopes and calorimeters and scintillators, all of that might be utterly dramatically wrong. Indeed, we should expect it to be.

In this house, we tend to dismiss the Cartesian Demon. If you have an argument that makes you doubt literally everything, then it seems very unlikely you’ll get anything useful from it. Unlike Descartes’s Demon, Hoffman thinks we won’t be tricked forever. The tricks evolution plays on us mattered in our ancestral environment, but over time we move to stranger and stranger situations. Eventually, our fitness will depend on something new, and we’ll need to learn something new about reality.

This means that ultimately, despite the skeptical cast, Hoffman’s argument fits with the way science already works. We are, very much, trying to put ourselves in new situations and test whether our evolved expectations still serve us well or whether we need to perceive things anew. That is precisely what we in science are always doing, every day. And as we’ll see in the next section, whatever new things we have to learn have no particular reason to be what Hoffman thinks they should be.

But while it doesn’t really matter, I do still want to make one counter-argument to Fitness Beats Truth. Hoffman considers a random distribution of fitness, and asks what the chance is that it matches truth. But fitness isn’t independent of truth, and we know that not just from our perception, but from deeper truths of physics and mathematics. Fitness is correlated with truth, fitness often matches truth, for one key reason: complex things are harder than simple things.

Imagine a creature evolving an eye. They have a reason, based on fitness, to need to know where their prey is moving. If evolution was a magic wand, and chemistry trivial, it would let them see their prey, and nothing else. But evolution is not magic, and chemistry is not trivial. The easiest thing for this creature to see is patches of light and darkness. There are many molecules that detect light, because light is a basic part of the physical world. To detect just prey, you need something much more complicated, molecules and cells and neurons. Fitness imposes a cost, and it means that the first eyes that evolve are spots, detecting just light and darkness.

Hoffman asks us not to assume that we know how eyes work, that we know how chemistry works, because we got that knowledge from our perceptions. But the nature of complexity and simplicity, entropy and thermodynamics and information, these are things we can approach through pure thought, as much as evolution. And those principles tell us that it will always be easier for an organism to perceive the world as it truly is than not, because the world is most likely simple, and perceiving it directly is most likely the simplest path. When benefits get high enough, when fitness gets strong enough, we can of course perceive the wrong thing. But if there is only a small fitness benefit to perceiving something incorrectly, then simplicity will win out. And by asking simpler and simpler questions, we can make real durable scientific progress towards truth.

The Physical Argument

So if I’m not impressed by the psychology or the philosophy, what about the part that motivated me to read the book in the first place, the physics?

Because this is, in a weird and perhaps crackpot way, a physics book. Hoffman has a specific idea, more specific than just that the world we perceive is an evolutionary illusion, more specific than that consciousness cannot be explained by the relations between physical particles. He has a proposal, based on these ideas, one that he thinks might lead to a revolutionary new theory of physics. And he tries to argue that physicists, in their own way, have been inching closer and closer to his proposal’s core ideas.

Hoffman’s idea is that the world is made, not of particles or fields or anything like that, but of conscious agents. You and I are, in this picture, certainly conscious agents, but so are the sources of everything we perceive. When we reach out and feel a table, when we look up and see the Sun, those are the actions of some conscious agent intruding on our perceptions. Unlike panpsychists, who believe that everything in the world is conscious, Hoffman doesn’t believe that the Sun itself is conscious, or is made of conscious things. Rather, he thinks that the Sun is an evolutionary illusion that rearranges our perceptions in a convenient way. The perceptions still come from some conscious thing or set of conscious things, but unlike in panpsychism they don’t live in the center of our solar system, or in any other place (space and time also being evolutionary illusions in this picture). Instead, they could come from something radically different that we haven’t imagined yet.

Earlier, I mentioned split brain patients. For anyone who thinks of conscious beings as fundamental, split brain patients are a challenge. These are people who, as a treatment for epilepsy, had the bridge between the two halves of their brain severed. The result is eerily as if their consciousness was split in two. While they only express one train of thought, that train of thought seems to only correspond to the thoughts of one side of their brain, controlling only half their body. The other side, controlling the other half of their body, appears to have different thoughts, different perceptions, and even different opinions, which are made manifest when instead of speaking they use that side of their body to gesture and communicate. While some argue that these cases are over-interpreted and don’t really show what they’re claimed to, Hoffman doesn’t. He accepts that these split-brain patients genuinely have their consciousness split in two.

Hoffman thinks this isn’t a problem because for him, conscious agents can be made up of other conscious agents. Each of us is conscious, but we are also supposed to be made up of simpler conscious agents. Our perceptions and decisions are not inexplicable, but can be explained in terms of the interactions of the simpler conscious entities that make us up, each one communicating with the others.

Hoffman speculates that everything is ultimately composed of the simplest possible conscious agents. For him, a conscious agent must do two things: perceive, and act. So the simplest possible agent perceives and acts in the simplest possible way. They perceive a single bit of information: 0 or 1, true or false, yes or no. And they take one action, communicating a different bit of information to another conscious agent: again, 0 or 1, true or false, yes or no.

Hoffman thinks that this could be the key to a new theory of physics. Instead of thinking about the world as composed of particles and fields, think about it as composed of these simple conscious agents, each one perceiving and communicating one bit at a time.

Hoffman thinks this, in part, because he sees physics as already going in this direction. He’s heard that “spacetime is doomed”, he’s heard that quantum mechanics is contextual and has no local realism, he’s heard that quantum gravity researchers think the world might be a hologram and space-time has a finite number of bits. This all “rhymes” enough with his proposal that he’s confident physics has his back.

Hoffman is trained in psychology. He seems to know his philosophy, at least enough to engage with the literature there. But he is absolutely not a physicist, and it shows. Time and again it seems like he relies on “pop physics” accounts that superficially match his ideas without really understanding what the physicists are actually talking about.

He keeps up best when it comes to interpretations of quantum mechanics, a field where concepts from philosophy play a meaningful role. He covers the reasons why quantum mechanics keeps philosophers up at night: Bell’s Theorem, which shows that a theory matching the predictions of quantum mechanics cannot be both “realist”, with measurements uncovering pre-existing facts about the world, and “local”, with things only influencing each other at less than the speed of light; the broader notion of contextuality, where measured results depend on which other measurements are made; and the various experiments showing that both of these properties hold in the real world.

These two facts, and their implications, have spawned a whole industry of interpretations of quantum mechanics, where physicists and philosophers decide which side of various dilemmas to take and how to describe the results. Hoffman quotes a few different “non-realist” interpretations: Carlo Rovelli’s Relational Quantum Mechanics, Quantum Bayesianism/QBism, Consistent Histories, and whatever Chris Fields is into. These are all different from one another, which Hoffman is aware of. He just wants to make the case that non-realist interpretations are reasonable, that the physicists collectively are saying “maybe reality doesn’t exist” just like he is.

The problem is that Hoffman’s proposal is not, in the quantum mechanics sense, non-realist. Yes, Hoffman thinks that the things we observe are just an “interface”, that reality is really a network of conscious agents. But in order to have a non-realist interpretation, you need to also have other conscious agents not be real. That’s easily seen from the old “Wigner’s friend” thought experiment, where you put one of your friends in a Schrodinger’s cat-style box. Just as Schrodinger’s cat can be both alive and dead, your friend can both have observed something and not have observed it, or observed something and observed something else. The state of your friend’s mind, just like everything else in a non-realist interpretation, doesn’t have a definite value until you measure it.

Hoffman’s setup doesn’t, and can’t, work that way. His whole philosophical project is to declare that certain things exist and others don’t: the sun doesn’t exist, conscious agents do. In a non-realist interpretation, the sun and other conscious agents can both be useful descriptions, but ultimately nothing “really exists”. Science isn’t a catalogue of what does or doesn’t “really exist”, it’s a tool to make predictions about your observations.

Hoffman gets even more confused when he gets to quantum gravity. He starts out with a common misconception: that the Planck length represents the “pixels” of reality, sort of like the pixels of your computer screen, which he uses to support his “interface” theory of consciousness. This isn’t really the right way to think about the Planck length, though, and it certainly isn’t what the people he’s quoting have in mind. The Planck length is a minimum scale in that space and time stop making sense as one approaches it, but that’s not necessarily because space and time are made up of discrete pixels. Rather, it’s because as you get closer to the Planck length, space and time stop being the most convenient way to describe things. For a relatively simple example of how this can work, see my post here.

From there, he reflects on holography: the discovery that certain theories in physics can be described equally well by what is happening on their boundary as by their interior, the way that a 2D page can hold all the information for an apparently 3D hologram. He talks about the Bekenstein bound, the conjecture that there is a maximum amount of information needed to describe a region of space, proportional not to the volume of the region but to its area. For Hoffman, this feels suspiciously like human vision: if we see just a 2D image of the world, could that image contain all the information needed to construct that world? Could the world really be just what we see?

In a word, no.

On the physics side, the Bekenstein bound is a conjecture, and one that doesn’t always hold. A more precise version that seems to hold more broadly, called the Bousso bound, works by demanding the surface have certain very specific geometric properties in space-time, properties not generally shared by the retinas of our eyes.

But it even fails in Hoffman’s own context, once we remember that there are other types of perception than vision. When we hear, we don’t detect a 2D map, but a 1D set of frequencies, put in “stereo” by our ears. When we feel pain, we can feel it in any part of our body, essentially a 3D picture since it goes inwards as well. Nothing about human perception uniquely singles out a 2D surface.

There is actually something in physics much closer to what Hoffman is imagining, but it trades on a principle Hoffman aspires to get rid of: locality. We’ve known since Einstein that you can’t change the world around you faster than the speed of light. Quantum mechanics doesn’t change that, despite what you may have heard. More than that, simultaneity is relative: two distant events might be at the same time in your reference frame, but for someone else one of them might be first, or the other one might be; there is no one universal answer.

Because of that, if you want to think about things happening one by one, cause following effect, actions causing consequences, then you can’t think of causes or actions as spread out in space. You have to think about what happens at a single point: the location of an imagined observer.

Once you have this concept, you can ask whether describing the world in terms of this single observer works just as well as describing it in terms of a wide open space. And indeed, it actually can do well, at least under certain conditions. But once again, this really isn’t how Hoffman is doing things: he has multiple observers all real at the same time, communicating with each other in a definite order.

In general, a lot of researchers in quantum gravity think spacetime is doomed. They think things are better described in terms of objects with other properties and interactions, with space and time as just convenient approximations for a more complicated reality. They get this both from observing properties of the theories we already have, and from thought experiments showing where those theories cause problems.

Nima, the most catchy of these quotable theorists, is approaching the problem from the direction of scattering amplitudes: the calculations we do to find the probability of observations in particle physics. Each scattering amplitude describes a single observation: what someone far away from a particle collision can measure, independent of any story of what might have “actually happened” to the particles in between. Nima’s goal is to describe these amplitudes purely in terms of those observations, to get rid of the “story” that shows up in the middle as much as possible.

The other theorists have different goals, but have this in common: they treat observables as their guide. They look at the properties that a single observer’s observations can have, and try to take a fresh view, independent of any assumptions about what happens in between.

This key perspective, this key insight, is what Hoffman is missing throughout this book. He has read what many physicists have to say, but he does not understand why they are saying it. His book is titled The Case Against Reality, but he merely trades one reality for another. He stops short of the more radical, more justified case against reality: that “reality”, that thing philosophers argue about and that makes us think we can rule out theories based on pure thought, is itself the wrong approach: that instead of trying to characterize an idealized real world, we are best served by focusing on what we can do.

One thing I didn’t do here is a full critique of Hoffman’s specific proposal, treating it as a proposed theory of physics. That would involve quite a bit more work, on top of what has turned out to be a very long book review. I would need to read not just his popular description, but the actual papers where he makes his case and lays out the relevant subtleties. Since I haven’t done that, I’ll end with a few questions: things that his proposal will need to answer if it aspires to be a useful idea for physics.

  • Are the networks of conscious agents he proposes Turing-complete? In other words, can they represent any calculation a computer can do? If so, they aren’t a useful idea for physics, because you could imagine a network of conscious agents to reproduce any theory you want. The idea wouldn’t narrow things down to get us closer to a useful truth. This was also one of the things that made me uncomfortable with the Wolfram Physics Project.
  • What are the conditions that allow a network of simple conscious agents to make up a bigger conscious agent? Do those conditions depend meaningfully on the network’s agents being conscious, or do they just have to pass messages? If the latter, then Hoffman is tacitly admitting you can make a conscious agent out of non-conscious agents, even if he insists this is philosophically impossible.
  • How do you square this network with relativity and quantum mechanics? Is there a set time, an order in which all the conscious agents communicate with each other? If so, how do you square that with the relativity of simultaneity? Are the agents themselves supposed to be able to be put in quantum states, or is quantum mechanics supposed to emerge from a theory of classical agents?
  • How does evolution fit in here? A big part of Hoffman’s argument was supported by the universality of the evolutionary algorithm. In order for evolution to matter for your simplest agents, they need to be able to be created or destroyed. But then they have more than two actions: not just 0 and 1, but 0, 1, and cease to exist. So you could have an even simpler agent that has just two actions.

n-Category Café Spans and the Categorified Heisenberg Algebra

I’m giving this talk at the category theory seminar at U. C. Riverside, as a kind of followup to one by Peter Samuelson on the same subject. My talk will not be recorded, but here are the slides:

Abstract. Heisenberg reinvented matrices while discovering quantum mechanics, and the algebra generated by annihilation and creation operators obeying the canonical commutation relations was named after him. It turns out that matrices arise naturally from ‘spans’, where a span between two objects is just a third object with maps to both those two. In terms of spans, the canonical commutation relations have a simple combinatorial interpretation. More recently, Khovanov introduced a ‘categorified’ Heisenberg algebra, where the canonical commutation relations hold only up to isomorphism, and these isomorphisms obey new relations of their own. The meaning of these new relations was initially rather mysterious, at least to me. However, Jeffery Morton and Jamie Vicary have shown that these, too, have a nice interpretation in terms of spans.

I feel like reviving interest in Morton and Vicary’s approach to Khovanov’s categorified Heisenberg algebra, because they shed a lot of light on its combinatorial underpinnings—which may in turn shed new light on quantum physics, but only if someone works to dig deeper!

Some of Morton and Vicary’s work remains conjectural, namely that there’s a bicategory Span(FinGpd) of

  • locally finite groupoids
  • spans of such
  • (equivalence classes of) spans of spans of such

and a 2-functor from this to the 2-category 2Vect of

  • 2-vector spaces
  • exact ℂ-linear functors
  • natural transformations

which preserves direct sums of morphisms and also sums of 2-morphisms. This relates two popular approaches to categorified linear algebra.

Rune Haugseng has studied higher categories of iterated spans, and his technology could perhaps be used to give an elegant construction of the categorified Heisenberg algebra following the ideas in Morton and Vicary’s work. But I’m not aware of anyone actually having done this! So if someone has gone further with Morton and Vicary’s ideas, please let me know. And if nobody has… give it a try, I think it will be worthwhile!

February 22, 2024

John BaezWell Temperaments (Part 5)

Okay, let’s study Kirnberger’s three well-tempered tuning systems! I introduced them last time, but now I’ve developed a new method for drawing tuning systems, which should help us understand them better.

As we’ve seen, tuning theory involves two numbers close to 1, called the Pythagorean comma (≈ 1.0136) and the syntonic comma (= 1.0125). While they’re not equal, they’re so close that practical musicians often don’t bother to distinguish them! They call both a comma.

So, my new drawing style won’t distinguish the two kinds of comma.
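If you want to check those decimals yourself, both commas are ratios of small whole numbers. Here is a quick computation in Python (my own illustration, not part of the post):

```python
from fractions import Fraction

# Pythagorean comma: twelve just fifths (3/2) overshoot seven octaves.
pythagorean_comma = Fraction(3, 2)**12 / Fraction(2)**7

# Syntonic comma: four just fifths overshoot two octaves plus a just major third (5/4).
syntonic_comma = Fraction(3, 2)**4 / (Fraction(2)**2 * Fraction(5, 4))

print(float(pythagorean_comma))  # ≈ 1.0136
print(syntonic_comma)            # 81/80 = 1.0125 exactly
```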

Being a mathematician, I would like to say a lot about why we can get away with this. But that would tend to undercut my claim that the relaxed approach makes things simpler! I don’t want to be like the teacher who prefaces the explanation of a cute labor-saving trick with a long and confusing theoretical discussion of when it’s justified. So let me start by just diving in and using this new approach.

First I’ll illustrate this new approach with some tuning systems I’ve already discussed. Then I’ll show you Kirnberger’s three well-tempered systems. At that point you should be in a good position to make up your own well temperaments!

Pythagorean tuning

Here is Pythagorean tuning in my new drawing style:

The circle here is the circle of fifths. Most of these fifths are black arrows labeled by +0. These go between notes that have a frequency ratio of exactly 3/2. This frequency ratio gives the nicest sounding fifth: the Pythagorean fifth.

But one arrow on the circle is red, and labeled by -1. This fifth is one comma flat compared to a Pythagorean fifth. In other words, the frequency ratio of this fifth is 3/2 divided by a comma. This arrow is red because it’s flat—and it’s a fairly bright red because one comma flat is actually quite a lot: this fifth sounds pretty bad!

(The comma here is a Pythagorean comma, but never mind.)

This illustrates a rule that holds for every tuning system we’ll consider:

Rule 1. The numbers labeling arrows on the circle of fifths must sum to -1.

Now let’s look at Pythagorean tuning again, this time focusing on the arrows inside the circle of fifths:

The arrows inside the circle are major thirds. A few of them are black and labeled +0. These go between notes that have a frequency ratio of exactly 5/4. That’s the nicest sounding major third: the just major third.

But some of the arrows inside the circle are green, and labeled by +1. These major thirds are one comma sharp compared to the just major third. In other words, the frequency ratio between notes connected by these arrows is 5/4 times a comma. These arrows are green because they’re sharp—and it’s a fairly bright green because one comma sharp is actually quite a lot.

(These commas are syntonic commas, but never mind.)

Why do the major thirds work this way? It’s forced by the other rule governing all the tuning systems we’ll talk about:

Rule 2. The sum of the numbers labeling arrows for any four consecutive fifths, plus 1, equals the number labeling the arrow for the corresponding major third.

This rule creates an inherent tension in tuning systems! To get major thirds that sound really nice, not too sharp, we need some fifths to be flat. Pythagorean tuning is one way this tension can play out.
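These two rules are mechanical enough to check by computer. Here is a small Python sketch (my own illustration, not from the post) that takes the twelve fifth offsets, measured in commas, and applies Rule 2 to compute all twelve major thirds; for Pythagorean tuning it finds four just thirds and eight thirds a full comma sharp:

```python
from fractions import Fraction
from collections import Counter

def third_offsets(fifths):
    """Rule 2: each major third is sharp by the sum of four
    consecutive fifth offsets, plus 1 comma."""
    assert sum(fifths) == -1, "Rule 1: fifth offsets must sum to -1"
    n = len(fifths)
    return [sum(fifths[(i + k) % n] for k in range(4)) + 1 for i in range(n)]

# Pythagorean tuning: eleven pure fifths, one fifth a full comma flat.
pythagorean = [Fraction(0)] * 11 + [Fraction(-1)]
print(Counter(third_offsets(pythagorean)))  # four just thirds, eight a comma sharp
```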

Equal temperament

Now let’s look at another tuning system: equal temperament.

Pythagorean tuning had eleven fifths that are exactly right, and one that’s 1 comma flat. The flatness was as concentrated as possible! Equal temperament takes the opposite approach: the flatness is spread out equally among all twelve fifths. Rule 1 must still hold: the total flatness of all the fifths is still 1 comma. So each fifth is 1/12 of a comma flat.

How does this affect the major thirds? Rule 2 says that each major third must be 2/3 of a comma sharp, since

2/3 = – 1/12 – 1/12 – 1/12 – 1/12 + 1

My pictures follow some color rules that are too boring to explain in detail, but bright colors indicate danger: intervals that are extremely flat or extremely sharp. In equal temperament the fifths are all reddish because they’re all flat—but it’s a very dark red, almost black, because they’re only slightly flat. The major thirds are fairly sharp, so their blue-green color is more noticeable.
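Exact fractions make the equal-temperament arithmetic above easy to verify (again a sketch of my own, not from the post):

```python
from fractions import Fraction

fifth = Fraction(-1, 12)     # every fifth is 1/12 comma flat
assert 12 * fifth == -1      # Rule 1: the offsets sum to -1

# Rule 2: four consecutive fifths, plus 1, give each major third.
third = 4 * fifth + 1
print(third)                 # 2/3: every major third is 2/3 comma sharp
```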

Quarter-comma meantone

Now let’s look at another important tuning system: quarter-comma meantone. This system was very popular from 1550 until around 1690. Then people started inventing well temperaments as a reaction to its defects. So we need to understand it well.

Here it is:

All but one of the fifths are slightly flat: 1/4 comma flat. This is done to create a lot of just major thirds, since Rule 2 says

0 = -1/4 – 1/4 – 1/4 – 1/4 + 1

This is the beauty of quarter-comma meantone! But it’s obtained at a heavy cost, as we can see from the glaring fluorescent green.

Because 11 of the fifths are 1/4 comma flat, the remaining one must be a whopping 7/4 commas sharp, by Rule 1:

7/4 + 11 × -1/4 = -1

This is the famous ‘wolf fifth’. And by Rule 2, this wolf fifth makes the major thirds near it 2 commas sharp, since

2 = 7/4 – 1/4 – 1/4 – 1/4 + 1

In my picture I wrote ‘8/4’ instead of 2 because I felt like keeping track of quarter commas.

The colors in the picture should vividly convey the ‘extreme’ nature of quarter-comma meantone. As long as you restrict yourself to playing the dark red fifths and black major thirds, it sounds magnificently sweet. But as soon as you enter the fluorescent green region, it sounds wretched! Well temperaments were created to smooth this system down… without going all the way to the bland homogeneity of equal temperament.
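The same Rule-2 bookkeeping reproduces the quarter-comma meantone numbers above (a sketch of my own):

```python
from fractions import Fraction
from collections import Counter

flat = Fraction(-1, 4)
wolf = -1 - 11 * flat        # Rule 1 forces the wolf fifth: 7/4 commas sharp
assert wolf == Fraction(7, 4)

# Rule 2: each major third is the sum of four consecutive fifths, plus 1.
fifths = [flat] * 11 + [wolf]
thirds = [sum(fifths[(i + k) % 12] for k in range(4)) + 1 for i in range(12)]
print(Counter(thirds))       # eight just thirds, four thirds 2 commas sharp
```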

And now let’s look at Kirnberger’s three well tempered systems. Only the third was considered successful, and we’ll see why.

Kirnberger I

Here is Kirnberger I:

The flatness of the fifths is concentrated in a single fifth, just as in Pythagorean tuning. Indeed, from this picture Kirnberger I looks just like a rotated version of Pythagorean tuning! That’s a bit deceptive, because in Kirnberger I the flat fifth is flat by a syntonic rather than a Pythagorean comma. But this is precisely the sort of nuance my new drawing style ignores. And that’s okay, because the difference between the syntonic and Pythagorean comma is inaudible.

So the only noticeable difference between Kirnberger I and Pythagorean tuning is the location of the flat fifth. And it’s hard to see any advantage of putting it so close to C as Kirnberger did, rather than putting it as far away as possible.

Thus, it’s not so surprising that I’ve never heard of anyone actually using Kirnberger I. Indeed it’s rare to even see a description of it: it’s very obscure compared to Kirnberger II and Kirnberger III. Luckily it’s on Wikipedia:

• Wikipedia, Kirnberger temperament.

Kirnberger II

Here is Kirnberger’s second attempt:

This time instead of a single fifth that’s 1 comma flat, he used two fifths that are 1/2 comma flat.

As a result, only 3 major thirds are just, as compared to 4 in Kirnberger I. But the number of major thirds that are 1 comma sharp has gone down from 8 to 7. There are also 2 major thirds that are 1/2 comma sharp—the bluish ones. So, this system is less ‘extreme’ than Kirnberger I: the pain of sharp major thirds is more evenly distributed. As a result, this system was more widely used. But it was never as popular as Kirnberger III.

For more, see:

• Carey Beebe, Technical Library: Kirnberger II.

Kirnberger III

Here is Kirnberger’s third and final try:

This time, instead of two fifths that are 1/2 comma flat, he used four fifths that are 1/4 comma flat! A very systematic fellow.

This system has only one just major third. It has 2 that are 1/4 comma sharp, 2 that are 2/4 comma sharp, 2 that are 3/4 comma sharp, and 5 that are 1 comma sharp. So it’s noticeably less ‘extreme’ than Kirnberger II: fewer thirds that are just, but also fewer that are painfully sharp.
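Running the same Rule-2 computation on Kirnberger III, with its four consecutive 1/4-comma-flat fifths, reproduces this tally (a sketch of my own, not from the post):

```python
from fractions import Fraction
from collections import Counter

# Kirnberger III: four consecutive fifths (C-G, G-D, D-A, A-E) each 1/4 comma flat.
fifths = [Fraction(-1, 4)] * 4 + [Fraction(0)] * 8
assert sum(fifths) == -1     # Rule 1

# Rule 2: each major third is the sum of four consecutive fifths, plus 1.
thirds = [sum(fifths[(i + k) % 12] for k in range(4)) + 1 for i in range(12)]
print(sorted(Counter(thirds).items()))
# one just third; 2 at 1/4, 2 at 2/4, 2 at 3/4 comma; 5 a full comma sharp
```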

I think you really need to stare at the picture for a while, and think about how Rule 2 plays out, to see the beauty of Kirnberger III. But the patterns become a bit more visible if we rotate this tuning system to give it bilateral symmetry across the vertical axis, and write the numbers in a symmetrical way too:

Rotating a tuning system just means we’re starting it at a different note—‘transposing’ it, in music terminology.

The harpsichord tuning expert Carey Beebe writes:

One of the easiest—and most practical—temperaments to set dates from 1779 and is known as Kirnberger III. For a while, some people thought that this might be Bach’s temperament, seeing as Johann Philipp Kirnberger (1721–1783) learnt from the great JS himself. Despite what you might have been taught, Bach neither invented nor used Equal Temperament. He probably used many different tuning systems—and if he had one particular one in mind for any of his works, he never chose to write clear directions for setting it. Note that his great opus is called the Well-tempered Clavier in English, not the “Equal Tempered Clavichord”, as it has too often been mistranslated. You will find several other Bach temperaments discussed later in this series.

There are other commas to learn, and a whole load of other technical guff if you really want to get into this quagmire, but here you will forgive me if we regard the syntonic comma as for all practical purposes the same size as the Pythagorean. After all, don’t you just want to tune your harpsichord instead of go for a science degree?

Here’s how you go about setting Kirnberger III…

Then he explains how to tune a harpsichord in this system:

• Carey Beebe, Technical Library: Kirnberger III.

Carey Beebe is my hero these days, because he explains well temperaments better than anyone else I’ve found. My new style of drawing tuning systems is inspired by his, though I’ve added some extra twists like drawing all the major thirds, and using colors.

Technical details

If you’re wondering what Beebe and I mean about Pythagorean versus syntonic commas, here you can see it. Here is Kirnberger I drawn in my old style, where I only drew major thirds that are just, and I drew them in dark blue:

Kirnberger I has one fifth that’s flat by a factor of the syntonic comma:

σ = 2^(-4) · 3^4 · 5^(-1) = 81/80 = 1.0125

But as we go all the way around the circle of fifths the ‘total flatness’ must equal the Pythagorean comma:

p = 2^(-19) · 3^12 = 531441/524288 ≈ 1.013643

That’s just a law of math. So Kirnberger compensated by having one fifth that’s flat by a factor of p/σ, which is called the ‘schisma’:

χ = p/σ = 2^(-15) · 5 · 3^8 = 32805/32768 ≈ 1.001129

He stuck this ‘schismatic fifth’ next to the tritone, since that’s a traditional dumping ground for annoying glitches in music. But it barely matters since the schisma is so small.

(That said, the schisma is almost precisely 1/12th of a Pythagorean comma, or more precisely p^(1/12)—a remarkable coincidence discovered by Kirnberger, which I explained in Part 3. And I did draw the 1/12th commas in equal temperament! So you may wonder why I didn’t draw the schisma in Kirnberger I. The answer is simply that in both cases my decision was forced by rules 1 and 2.)
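The numbers above are easy to check. Here is a small Python sketch (my own, not from the post) that computes the syntonic comma, the Pythagorean comma and the schisma as exact fractions, and confirms Kirnberger’s coincidence that χ is almost exactly p^(1/12):

```python
from fractions import Fraction

# Syntonic comma: the gap between the Pythagorean major third (81/64)
# and the just major third (5/4).
sigma = Fraction(81, 80)

# Pythagorean comma: twelve just fifths minus seven octaves.
p = Fraction(3, 2) ** 12 / Fraction(2, 1) ** 7

# The schisma is the ratio of the two commas.
chi = p / sigma

print(chi)                      # 32805/32768
print(float(chi))               # ≈ 1.001129
print(float(p) ** (1 / 12))     # ≈ 1.00113, agreeing with the schisma
                                # to about six decimal places
```

The tiny leftover ratio p / χ^12 is the ‘atom of Kirnberger’ mentioned in Part 3.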

Here’s Kirnberger II in a similar style:

Here the schismatic fifth compensates for using two 1/2 commas that are syntonic rather than Pythagorean.

And here’s Kirnberger III:

Now the schismatic fifth compensates for using four 1/4 commas that are syntonic rather than Pythagorean.

For more on Pythagorean tuning, read this series:

Pythagorean tuning.

For more on just intonation, read this series:

Just intonation.

For more on quarter-comma meantone tuning, read this series:

Quarter-comma meantone.

For more on well-tempered scales, read this series:

Part 1. An introduction to well temperaments.

Part 2. How small intervals in music arise naturally from products of integral powers of primes that are close to 1. The Pythagorean comma, the syntonic comma and the lesser diesis.

Part 3. Kirnberger’s rational equal temperament. The schisma, the grad and the atom of Kirnberger.

Part 4. The music theorist Kirnberger: his life, his personality, and a brief introduction to his three well temperaments.

Part 5. Kirnberger’s three well temperaments: Kirnberger I, Kirnberger II and Kirnberger III.

For more on equal temperament, read this series:

Equal temperament.

John BaezWell Temperaments (Part 4)


Now I want to start talking about some important well-tempered tuning systems invented by Johann Philipp Kirnberger. But first: who was this guy? As I tried to answer this question for myself I became sort of fascinated with his personality.

Kirnberger was a German music theorist who played an important role in formalizing baroque harmony and counterpoint. He was born in 1721.

As a child he studied the violin and harpsichord at home. He then moved to another town to study the organ, and then at 17 moved again to start seriously studying the violin. At the age of 18, he went to Leipzig to study performance and composition with Johann Sebastian Bach. He did this intermittently for three years, but this seems to have been a pivotal period in his life. Bach was an energetic teacher, with about 300 students over his life, but unfortunately he never wrote down his thoughts on music. In the end, it largely fell to Kirnberger to systematize Bach’s ideas on harmony and composition.

Between the ages of 20 and 30, Kirnberger worked in Poland and wrote a book on Polish dances. He then became a violinist at the court of Frederick II of Prussia, and from the age of 37 until his death at the age of 61 he was music director for the princess of Prussia.

But most of all, Kirnberger was a big fan of Bach. He called Bach “the greatest of all composers.” Around the age of 40 he published a book of Bach’s clavier pieces, and he worked hard to publish all of Bach’s chorales, which finally appeared after Kirnberger’s death. He preserved many of Bach’s manuscripts in his library. He even wrote some pieces that for a while were attributed to either J. S. Bach or C.P.E. Bach—like this concerto for harpsichord, written when Kirnberger was about 50:

Kirnberger was better as a music theorist than a performer—though as a theorist he was quite polemical. In 1794, ten years after Kirnberger’s death, a musician named Friedrich Nicolai wrote:

Kirnberger has many good musical ideas […] he deserves full credit as a theorist. But he is unable to bring any of his ideas to good musical fruition, perhaps because of insufficient ability. His aim is not to see good music performed, but merely to find music containing “errors” so that he may make learned (and often violent) statements about others’ mistakes. As a performer he has practically no skill at all, except when playing his own compositions; his sense of rhythm is especially uncertain.

His frustrations seem to have driven him to mathematics. When Kirnberger was 52, Charles Burney wrote that he

is said to be soured by opposition and disappointment; his present inclination leads him to mathematical studies, and to the theory of music, more than the practice […] In his late writings, he appears to be more ambitious of the character of an algebraist, than of a musician of genius.

His discovery of the ‘atom of Kirnberger’, which I explained last time, indeed seems like something only a person with a strong mathematical bent could do!

His three-part theoretical work Die Kunst des reinen Satzes in der Musik, or The Art of Strict Musical Composition, had a big impact in his day. However, he wrote poorly. Nicolai wrote:

Kirnberger considered himself to be a philosophical musician. In reality, he had pondered over his art more than other musicians. For all that, he did not have clear concepts about so many things, still less philosophically correct ones. Because he had no formal education at all and had read little, he lacked much necessary knowledge, which he could acquire only by considerable effort through association with scholars; therefore, he sometimes could not explain rather ordinary things clearly. Scholars who wanted to come to an understanding with him had to divine his meaning.

In fact, parts of Die Kunst des reinen Satzes in der Musik were actually written by a student of Kirnberger’s who could write more clearly. This student, Johann Abraham Peter Schulz, said as much:

I had just recently made a systematic reduction of his [Kirnberger’s] principles of harmony for my own benefit and satisfaction. And at his request I had applied this system practically to the analysis of two pieces by Joh. Seb. Bach, which are difficult to understand […] His student’s writing pleased the teacher and he permitted it to be published under his name.

That could get someone in trouble today.

Kirnberger was an argumentative man. He was quite harsh in his condemnation of two other important theorists, Rameau and Marpurg. In 1800, a fellow named Reichart wrote:

Kirnberger was a very passionate man who gave himself up to his impetuous temperament […] The cultivation of his art, as he saw it and believed to embrace it, went before everything. The few righteous musicians whom he acknowledged possessed him completely and absorbed his entire disposition. Everything that did not immediately further the higher part of the art […] he despised and considered repugnant.

All in all, a curious and interesting character.

Kirnberger’s tuning systems

As discussed in Part 1, in a ‘well temperament’ each key gives a scale with its own different flavor. Quite a number of well temperaments had been used since Andreas Werckmeister invented three of them starting in 1681. (I’ll explain those later.) By the time Kirnberger got involved, equal temperament was beginning to take over. In fact in 1760, at the age of 39, he published something called Construction der gleichschwebende Temperatur, about the construction of equal temperament. But in Die Kunst des reinen Satzes in der Musik he explained that he didn’t like equal temperament, because it reduced the diversity of scales down to just two: major and minor. And in a letter 19 years later, he wrote:

Equal temperament is absolutely terrible, only being useful in the case of properly positioning the frets of a theorbo, lute or other such similar instrument such as a psaltry, zither etc., as a temperament of another type does not do each string

Today, Kirnberger is mainly known for two well temperaments called Kirnberger II and Kirnberger III. But he seems to have put work into at least two more tuning systems. First, unsurprisingly, there’s his well temperament called Kirnberger I. Second, there is ‘rational equal temperament’—a system I explained last time. But this is so close to 12-tone equal temperament that nobody can hear the difference: its only advantage, if you can call it that, is having frequency ratios that are rational numbers—with, unfortunately, rather huge numerators and denominators. I doubt anyone has actually used it, except perhaps as an experiment.

What are Kirnberger I, II and III actually like?

First, as a point of comparison, recall the tuning system that all well-tempered systems are responses to. Namely, quarter-comma meantone:

This system has a lot of fifths that have been lowered by a quarter comma: that is, divided by σ^(1/4), where σ is the syntonic comma. These ‘quarter-comma fifths’ are just slightly smaller than the ‘just’ perfect fifth, namely 3/2. So that’s good. It has a lot of ‘just’ major thirds, with frequency ratios of exactly 5/4, shown as the blue arrows above. So that’s great. But to pay the price for all those quarter-comma fifths, it has a ‘wolf fifth’ that’s 128/125 times bigger than all the rest. And that’s noticeably ugly!
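You can check that 128/125 figure with a few lines of Python (a quick sketch of my own, not from the post): eleven quarter-comma fifths plus the wolf must together span exactly seven octaves, which forces the wolf’s size.

```python
# Quarter-comma meantone: each regular fifth is narrowed so that four of
# them make a just major third (5/4) two octaves up, i.e. fifth**4 = 5,
# so fifth = 5**(1/4).
regular = 5 ** 0.25

# Twelve fifths must close the circle, spanning exactly 7 octaves.
# Eleven are regular, so the twelfth "wolf" fifth is whatever is left.
wolf = 2 ** 7 / regular ** 11

print(regular)          # ≈ 1.4953, slightly below the just fifth 3/2
print(wolf / regular)   # = 128/125 = 1.024: the wolf is audibly too wide
```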

Well temperaments seek to kill the wolf. Here’s how Kirnberger I does it:

Here’s Kirnberger II:

And here’s Kirnberger III:

Puzzle. Do you see the pattern?

I’ll examine these systems in more detail next time.


The quote of Kirnberger saying equal temperament is absolutely terrible comes from here:

• Dominic Eckersley, The Rosetta revisited: Bach’s very ordinary temperament, Berlin 2012.

It originated in a letter from Kirnberger to someone named Forkel, written around 1779.

I got all my other quotes from here:

• Beverly Jerold, Johann Philipp Kirnberger and authorship, Notes 69 (2013), 688–705.

Notes is a nice name for a music journal! This article is about Kirnberger and his student Schulz, analyzing how much each might have contributed to the writing of Die Kunst des reinen Satzes in der Musik and also the encyclopedia Allgemeine Theorie der schönen Künste (General Theory of the Fine Arts). To help figure this out, Jerold investigates Kirnberger’s views and personality.

For more on Pythagorean tuning, read this series:

Pythagorean tuning.

For more on just intonation, read this series:

Just intonation.

For more on quarter-comma meantone tuning, read this series:

Quarter-comma meantone.

For more on well-tempered scales, read this series:

Part 1. An introduction to well temperaments.

Part 2. How small intervals in music arise naturally from products of integral powers of primes that are close to 1. The Pythagorean comma, the syntonic comma and the lesser diesis.

Part 3. Kirnberger’s rational equal temperament. The schisma, the grad and the atom of Kirnberger.

Part 4. The music theorist Kirnberger: his life, his personality, and a brief introduction to his three well temperaments.

Part 5. Kirnberger’s three well temperaments: Kirnberger I, Kirnberger II and Kirnberger III.

For more on equal temperament, read this series:

Equal temperament.

February 21, 2024

Tommaso DorigoA New Free Tool For The Optimization Of Muon Tomography

Muon tomography is one of the most important spinoffs of fundamental research with particle detectors, if not the most important.


February 20, 2024

John PreskillA classical foreshadow of John Preskill’s Bell Prize

Editor’s Note: This post was co-authored by Hsin-Yuan Huang (Robert) and Richard Kueng.

John Preskill, Richard P. Feynman Professor of Theoretical Physics at Caltech, has been named the 2024 John Stewart Bell Prize recipient. The prize honors John’s contributions in “the developments at the interface of efficient learning and processing of quantum information in quantum computation, and following upon long standing intellectual leadership in near-term quantum computing.” The committee cited John’s seminal work defining the concept of the NISQ (noisy intermediate-scale quantum) era, our joint work “Predicting Many Properties of a Quantum System from Very Few Measurements” proposing the classical shadow formalism, along with subsequent research that builds on classical shadows to develop new machine learning algorithms for processing information in the quantum world.

We are truly honored that our joint work on classical shadows played a role in John winning this prize. But as the citation implies, this is also a much-deserved “lifetime achievement” award. For the past two and a half decades, first at IQI and now at IQIM, John has cultivated a wonderful, world-class research environment at Caltech that celebrates intellectual freedom, while fostering collaborations between diverse groups of physicists, computer scientists, chemists, and mathematicians. John has said that his job is to shield young researchers from bureaucratic issues, teaching duties and the like, so that we can focus on what we love doing best. This extraordinary generosity of spirit has been responsible for seeding the world with some of the best minds in the field of quantum information science and technology.

A cartoon depiction of John Preskill (Middle), Hsin-Yuan Huang (Left), and Richard Kueng (Right). [Credit: Chi-Yun Cheng]

It is in this environment that the two of us (Robert and Richard) met and first developed the rudimentary form of classical shadows — inspired by Scott Aaronson’s idea of shadow tomography. While the initial form of classical shadows is mathematically appealing and was appreciated by the theorists (it was a short plenary talk at the premier quantum information theory conference), it was deemed too abstract to be of practical use. As a result, when we submitted the initial version of classical shadows for publication, the paper was rejected. John not only recognized the conceptual beauty of our initial idea, but also pointed us towards a direction that blossomed into the classical shadows we know today. Applications range from enabling scientists to more efficiently understand engineered quantum devices, speeding up various near-term quantum algorithms, to teaching machines to learn and predict the behavior of quantum systems.

Congratulations John! Thank you for bringing this community together to do extraordinarily fun research and for guiding us throughout the journey.

Matt Strassler “Moving” Faster than the Speed of Light?

Nothing goes faster than the speed of light in empty space, also known as the cosmic speed limit c. Right? Well, umm… the devil is in the details.

Here are some of those details:

  1. If you hold two flashlights and point them in opposite directions, the speed at which the two beams rush apart, from your perspective, is indeed twice the cosmic speed limit.
  2. In an expanding universe, the distance between you and a retreating flash of light can increase faster than the cosmic speed limit.
  3. The location where two measuring sticks cross one another can potentially move faster than the cosmic speed limit.

I addressed issue #1 in a blog post last year.

Today I’ve just put up an article on issue #2. (This is a part of my effort to expand on loose ends raised in the footnotes from my upcoming book).

As for issue #3, can you see why it might be true?

If you aren’t sure and want the answer, click here:

The “location where two measuring sticks meet” is not itself an object. Only objects — localized material things with energy and momentum — are constrained to have relative speeds below the cosmic speed limit — and even that statement needs to be made more precisely (see detail #1 above!). A meeting point between objects is not itself an object, and may move faster than c even if the objects move slower than c. The figure below illustrates this.

One stick is shown in black, the other in red. The red one then moves downward; the dotted line shows where it was initially. Although the red stick moves only a short distance during the animation, the meeting point of the two sticks crosses the entire animation from left to right. If the red stick is moving at half the cosmic speed limit, which is perfectly consistent with Einstein’s relativity, the meeting point, which covers much more ground in the same amount of time, is clearly moving faster than the cosmic speed limit. No problem: no thing moved faster than the speed of light.

One stick (red) moves downward, starting from the dotted line, and passing the other stick (black). The speed at which their crossing point moves is much faster than the speed of the moving stick itself. While the red stick cannot move faster than the cosmic speed limit relative to the black stick, their crossing point can do so.
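The geometry is easy to quantify. In this toy sketch (my own, with an assumed angle between the sticks), a stick tilted by an angle θ from the horizontal moves straight down at speed v; its intersection with a horizontal stick then slides sideways at v/tan θ, which exceeds any speed limit once θ is small enough:

```python
import math

def crossing_speed(v, theta_deg):
    """Sideways speed of the intersection point when a stick tilted by
    theta_deg (degrees) from the horizontal moves straight down at v."""
    # The moving stick is the line y = x*tan(theta) + (c0 - v*t); setting
    # y = 0 gives x = (v*t - c0)/tan(theta), so dx/dt = v/tan(theta).
    return v / math.tan(math.radians(theta_deg))

c = 1.0          # cosmic speed limit, in natural units
v = 0.5 * c      # the stick itself moves at half the speed limit
for theta in (45.0, 10.0, 1.0):
    print(theta, crossing_speed(v, theta))
```

At θ = 1° the crossing point already moves at dozens of times the stick’s own speed, with no object moving faster than c.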

Note added: in materials, the speed of light is generally slower than the cosmic speed limit. For this reason, objects inside materials can move faster than the speed of light in that material while still moving slower than the cosmic speed limit. The result is a very interesting and useful effect: Cerenkov radiation, widely used in particle physics experiments such as IceCube!

February 19, 2024

Mark GoodsellRencontres de Physique des Particules 2024

Just over a week ago the annual meeting of theoretical particle physicists (RPP 2024) was held at Jussieu, the campus of Sorbonne University where I work. I wrote about the 2020 edition (held just outside Paris) here; in keeping with tradition, this year's version also contained similar political sessions with the heads of the CNRS' relevant physics institutes and members of CNRS committees, although they were perhaps less spicy (despite rumours of big changes in the air). 

One of the roles of these meetings is as a shop window for young researchers looking to be hired in France, and a great way for them to demonstrate that they are interested and have a connection to the system. Of course, this isn't and shouldn't be obligatory by any means; I wasn't really aware of it before entering the CNRS, though I had many connections to the country. But that sort of thing seems especially important after the problems described by 4gravitons recently, and his post about getting a permanent job in France -- being able to settle in a country is non-trivial: it's a big worry for future employers, and often an afterthought for candidates fighting tooth and nail for the few jobs there are. There was another recent case of someone who got a (CNRS) job -- to come to my lab, even -- and then much more quickly decided to leave the entire field for personal reasons. Both these stories saddened me. I can understand them -- there is the well-known Paris syndrome, for one thing -- and the current political anxiety about immigration, the government's response to the rise of the far right (across the world), and Brexit are clearly making things harder for many. These stories are especially worrying because we expect to be recruiting for university positions in my lab this year.

I was obviously very lucky and my experience was vastly different; I love both the job and the place, and I'm proud to be a naturalised citizen. Permanent jobs in the CNRS are amazing, especially in terms of the time and freedom you have, and there are all sorts of connections between the groups throughout the country such as via the IRN Terascale or GdR Intensity Frontier; or IRN Quantum Fields and Strings and French Strings meetings for more formal topics. I'd recommend anyone thinking about working here to check out these meetings and the communities built around them, as well as taking the opportunity to find out about life here. For those moving with family, France also offers a lot of support (healthcare, childcare, very generous holidays, etc) once you have got into the system.

The other thing to add that was emphasised in the political sessions at the RPP (reinforcing the message that we're hearing a lot) is that the CNRS is very keen to encourage people from under-represented groups to apply and be hired. One of the ways they see to help this is to put pressure on the committees to hire researchers (even) earlier after their PhD, in order to reduce the length of the leaky pipeline.

Back to physics

Coming back to the RPP, this year was particularly well attended and had an excellent program of reviews of hot topics, invited and contributed talks, put together very carefully by my colleagues. It was particularly poignant for me because two former students from my lab with whom I worked a lot, one of whom recently got a permanent job, were speaking; and in addition both a former student of mine and his current PhD student were giving talks, which made me feel old. (All these talks were fascinating, of course!)

One review that stood out as relevant for this blog was Bogdan Malaescu's review of progress in understanding the problem with muon g-2. As I discussed here, there is currently a lot of confusion about what the Standard Model prediction should be for that quantity. This is obviously very concerning for the experiments measuring muon g-2, which in a paper last year reduced their uncertainty by a factor of 2 to $$a_\mu (\mathrm{exp}) = 116\,592\,059(22)\times 10^{-11}. $$

The Lattice calculation (which has been confirmed now by several groups) disagrees, however, with the prediction using the data-driven R-ratio method, and there is a race on to understand why. New data from the CMD-3 experiment seems to agree with the lattice result, but combining all global data on measurements of \(e^+ e^- \rightarrow \pi^+ \pi^- \) still gives a discrepancy of more than \(5\sigma\). There is clearly a significant disagreement within the data samples used (indeed, CMD-3 significantly disagrees with their own previous measurement, CMD-2). The confusion is summarised by this plot:

As can be seen, the finger of blame is often pointed at the KLOE data; excluding it but including the others in the plot gives agreement with the lattice result and a significance of non-zero \(\Delta a_\mu\) compared to experiment of \(2.8\sigma\) (or for just the dispersive method without the lattice data \( \Delta a_\mu \equiv a_\mu^{\rm SM} - a_\mu^{\rm exp} = -123 \pm 33 \pm 29 \pm 22 \times 10^{-11} \), a discrepancy of \(2.5\sigma\)). In Bogdan's talk (see also his recent paper) he discusses these tensions and also the tensions between the data and the evaluation of \(a_\mu^{\rm win}\), which is the contribution coming from a narrow "window" (when the total contribution to the Hadronic Vacuum Polarisation is split into short, medium and long-distance pieces, the medium-range part should be the one most reliable for lattice calculations -- at short distances the lattice spacing may not be small enough, and at long ones the lattice may not be large enough). There he shows that, if we exclude the KLOE data and just include the BABAR, CMD-3 and Tau data, while the overall result agrees with the BMW lattice result, the window one disagrees by \(2.9 \sigma\) [thanks Bogdan for the correction to the original post]. It's clear that there is still a lot to be understood in the discrepancies of the data, and perhaps, with the added experimental precision on muon g-2, there is even still a hint of new physics ...
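As a sanity check on the quoted numbers, adding the three uncertainties of \(\Delta a_\mu\) in quadrature (assuming, as I do here, that they are independent) does reproduce the \(2.5\sigma\) figure:

```python
import math

# Dispersive-method discrepancy quoted above, in units of 1e-11:
# delta = -123 +/- 33 +/- 29 +/- 22.
delta = -123.0
errors = (33.0, 29.0, 22.0)

# Combine the uncertainties in quadrature (independence assumed).
total_error = math.sqrt(sum(e ** 2 for e in errors))
significance = abs(delta) / total_error
print(round(significance, 1))   # 2.5
```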

Matt Strassler Article 4 on Zero-Point Energy: Mass, Fermions, and a Good Wrong Idea

I have posted my fourth article discussing zero-point energy. (Here are the first, the second, and the third, which covered respectively the zero-point energy of a ball on a spring, a guitar string, and a bosonic field whose particles have zero mass, such as the electromagnetic field.) Today’s article looks at fields whose particles have non-zero mass, such as the Higgs field, and fermionic fields, such as the electron field and quark fields. It presents some simple formulas, and in its final section, shows how one can obtain them using math.

Along the way we’ll encounter the idea of “supersymmetry” and its failed role in the cosmological constant problem. This is a word which (for some good historical reasons) generates a lot of heat. But stay calm; I’m neither promoting it nor bashing it. Supersymmetry is an idea which proves useful as a conceptual tool, whether it is true in nature or not.

So that you know where I’m headed: after this article, we’ll now be in a position to understand (using only simple formulas) where the hierarchy puzzle comes from and why it is tied up with the concept of zero-point energy. Then, finally, we can grasp what’s puzzling about the hierarchy, and look at various proposed solutions to it, ranging from fancy math to historical drama, or even denying that it’s puzzling at all.

John PreskillThe rain in Portugal

My husband taught me how to pronounce the name of the city where I’d be presenting a talk late last July: Aveiro, Portugal. Having studied Spanish, I pronounced the name as Ah-VEH-roh, with a v partway to a hard b. But my husband had studied Portuguese, so he recommended Ah-VAI-roo

His accuracy impressed me when I heard the name pronounced by the organizer of the conference I was participating in—Theory of Quantum Computation, or TQC. Lídia del Rio grew up in Portugal and studied at the University of Aveiro, so I bow to her in matters of Portuguese pronunciation. I bow to her also for organizing one of the world’s largest annual quantum-computation conferences (with substantial help—fellow quantum physicist Nuriya Nurgalieva shared the burden). But Lídia cofounded Quantum, a journal that’s risen from a Gedankenexperiment to a go-to venue in six years. So she gives the impression of being able to manage anything.

Aveiro architecture

Watching Lídia open TQC gave me pause. I met her in 2013, the summer before beginning my PhD at Caltech. She was pursuing her PhD at ETH Zürich, which I was visiting. Lídia took me dancing at an Argentine-tango studio one evening. Now, she’d invited me to speak at an international conference that she was coordinating.

Lídia and me in Zürich as PhD students
Lídia opening TQC

Not only Lídia gave me pause; so did the three other invited speakers. Every one of them, I’d met when each of us was a grad student or a postdoc. 

Richard Küng described classical shadows, a technique for extracting information about quantum states via measurements. Suppose we wish to infer diverse properties of a quantum state ρ (diverse observables’ expectation values). We have to measure many copies of ρ—some number n of copies. The community expected n to grow exponentially with the system’s size—for instance, with the number of qubits in a quantum computer’s register. We can get away with far fewer, Richard and collaborators showed, by randomizing our measurements.
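To make the idea concrete, here is a minimal single-qubit toy version of the classical-shadow protocol in Python (my own sketch, not the authors’ code): measure in a uniformly random Pauli basis, invert the measurement channel to get one “snapshot,” and average snapshots to estimate the state.

```python
import numpy as np

rng = np.random.default_rng(42)

I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Sdg = np.array([[1, 0], [0, -1j]])     # S-dagger gate
U_BASIS = [H, H @ Sdg, I2]             # rotate X, Y or Z into the Z basis

def snapshot(rho):
    """Measure rho in a random Pauli basis and invert the channel."""
    U = U_BASIS[rng.integers(3)]
    probs = np.real(np.diag(U @ rho @ U.conj().T))  # Born probabilities
    b = rng.choice(2, p=probs / probs.sum())
    proj = np.zeros((2, 2)); proj[b, b] = 1.0       # |b><b|
    # Inverse of the single-qubit measurement channel: 3 U† |b><b| U - I
    return 3 * U.conj().T @ proj @ U - I2

plus = np.array([[0.5, 0.5], [0.5, 0.5]])           # rho = |+><+|
estimate = sum(snapshot(plus) for _ in range(20000)) / 20000
print(np.round(np.real(estimate), 2))
```

With 20,000 snapshots the average reconstructs ρ to a couple of decimal places; the punchline of the formalism is that predicting many observables at once needs far fewer samples than full tomography would suggest.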

Richard postdocked at Caltech while I was a grad student there. Two properties of his stand out in my memory: his describing, during group meetings, the math he’d been exploring and the Austrian accent in which he described that math.

Did this restaurant’s owners realize that quantum physicists were descending on their city? I have no idea.

Also while I was a grad student, Daniel Stilck França visited Caltech. Daniel’s TQC talk conveyed skepticism about whether near-term quantum computers can beat classical computers in optimization problems. Near-term quantum computers are NISQ (noisy, intermediate-scale quantum) devices. Daniel studied how noise (particularly, local depolarizing noise) propagates through NISQ circuits. Imagine a quantum computer suffering from a 1% noise error. The quantum computer loses its advantage over classical competitors after 10 layers of gates, Daniel concluded. Nor does he expect error mitigation—a bandaid en route to the sutures of quantum error correction—to help much.
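Daniel’s scaling argument can be illustrated with a toy estimate (my numbers here, not his): if each of n qubits suffers an independent error with probability p per layer, the chance that a run of a d-layer circuit is entirely error-free decays exponentially in n·d, and with p = 1% it is essentially gone within tens of layers for a modest register.

```python
# Toy model: n qubits, error probability p per qubit per layer.
# P(no error anywhere in the circuit) = (1 - p) ** (n * d),
# which decays exponentially with the depth d.
p, n = 0.01, 50

for d in (1, 5, 10, 20):
    print(d, (1 - p) ** (n * d))
```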

I’d coauthored a paper with the fourth invited speaker, Adam Bene Watts. He was a PhD student at MIT, and I was a postdoc. At the time, he resembled the 20th-century entanglement guru John Bell. Adam still resembles Bell, but he’s moved to Canada.

Adam speaking at TQC
From a 2021 Quantum Frontiers post of mine. I was tickled to see that TQC’s organizers used the photo from my 2021 post as Adam’s speaker photo.

Adam distinguished what we can compute using simple quantum circuits but not using simple classical ones. His results fall under the heading of complexity theory, about which one can rarely prove anything. Complexity theorists cling to their jobs by assuming conjectures widely expected to be true. Atop the assumptions, or conditions, they construct “conditional” proofs. Adam proved unconditional claims in complexity theory, thanks to the simplicity of the circuits he compared.

In my estimation, the talks conveyed cautious optimism: according to Adam, we can prove modest claims unconditionally in complexity theory. According to Richard, we can spare ourselves trials while measuring certain properties of quantum systems. Even Daniel’s talk inspired more optimism than he intended: a few years ago, the community couldn’t predict how noisy short-depth quantum circuits could perform. So his defeatism, rooted in evidence, marks an advance.

Aveiro nurtures optimism, I expect most visitors would agree. Sunshine drenches the city, and the canals sparkle—literally sparkle, as though devised by Elsa at a higher temperature than usual. Fresh fruit seems to wend its way into every meal.1 Art nouveau flowers scale the architecture, and fanciful designs pattern the tiled sidewalks.

What’s more, quantum information theorists of my generation were making good. Three riveted me in their talks, and another co-orchestrated one of the world’s largest quantum-computation gatherings. To think that she’d taken me dancing years before ascending to the global stage.

My husband and I made do, during our visit, by cobbling together our Spanish, his Portuguese, and occasional English. Could I hold a conversation with the Portuguese I gleaned? As adroitly as a NISQ circuit could beat a classical computer. But perhaps we’ll return to Portugal, and experimentalists are doubling down on quantum error correction. I remain cautiously optimistic.

1. As do eggs, I was intrigued to discover. Enjoyed a hardboiled egg at breakfast? Have a fried egg on your hamburger at lunch. And another on your steak at dinner. And candied egg yolks for dessert.

This article takes its title from a book by former US Poet Laureate Billy Collins. The title alludes to a song in the musical My Fair Lady, “The Rain in Spain.” The song has grown so famous that I don’t think twice upon hearing the name. “The rain in Portugal” did lead me to think twice—and so did TQC.

With thanks to Lídia and Nuriya for their hospitality. You can submit to TQC2024 here.

February 16, 2024

Matt von Hippel Valentine’s Day Physics Poem 2024

It’s that time of year again! In one of this blog’s yearly traditions, I’m posting a poem mixing physics and romance. For those who’d like to see more, you can find past years’ poems here.

Modeling Together

Together, we set out to model the world, and learn something new.

The Physicist said,
“My model is simple, the model of fundamental things. Particles go in, particles go out. For each configuration, a probability. For each calculation, an approximation. I can see the path, clear as day. I just need to fix the parameters.”

The Engineer responded,
“I will trust you, because you are a Physicist. You dream of greater things, and have given me marvels. But my models are the models of everything else. Their parameters are countless as waves of the ocean, and all complex things are their purview. Their only path is to learn, and learn more, and see where learning takes you.”

The Physicist followed his model, and the Engineer followed along. With their money and sweat, cajoling and wheedling, they built a grand machine, all to the Physicist’s specifications. And according to the Physicist’s path, parameters began to be fixed.

But something was missing.

The Engineer asked,
“What are we learning, following your path? We have spent and spent, but all I see is your machine. What marvels will it give us? What children will it feed?”

The Physicist considered, and said,
“You must wait for the marvels, and wait for the learning. New things take time. But my path is clear, my model is the only choice.”

The Engineer, with patience, responded,
“I will trust you, because you are a Physicist, and know the laws of your world. But my models are the models of everything else, and there is always another choice.”

Months went by, and they fed more to the machine. More energy, more time, more insight, more passion. Parameters tightened, and they hoped for marvels.

And they learned, one by one, that the marvels would not come. The machine would not spare them toil, would not fill the Engineer’s pockets or feed the starving, would not fill the world with art and mystery and value.

And the Engineer asked,
“Without these marvels, must we keep following your path? Should we not go out into the world, and learn another?”

And the Physicist thought, and answered,
“You must wait a little longer. For my model is the only model I have known, the only path I know to follow, and I am loath to abandon it.”

And the Engineer, generously, responded,
“I will trust you, because you are a Physicist, down to the bone. But my models are the models of everything else, of chattering voices and adaptable answers. And you can always learn another path.”

More months went by. The machine gave less and less, and took more and more for the giving. Energy was dear, and time more so, and the waiting was its own kind of emptiness.

The Engineer, silently, looked to the Physicist.

The Physicist said,
“I will trust you. Because you are an Engineer, yes, and your models are the models of everything else. And because, through these months, you have trusted me. I am ready to learn, and learn more, and try something new. Let us try a new model, and see where it leads.”

The simplest model says that one and one is two, and two is greater. We are billions of parameters, and can miss the simple things. But time,
                                                           And learning,
Can fix parameters,
And us.

Matt Strassler The Next Webpage: The Zero-Point Energy of a Cosmic Field

My two new webpages from earlier this week addressed the zero-point energy for the simple case of a ball on a spring and for the much richer case of a guitar string; the latter served as a warmup to today’s webpage, the third in this series, which explains the zero-point energy of a field of the universe. This subject will lead us head-first into the cosmological constant problem. As before, the article starts with a non-mathematical overview, and then obtains the results stated in the overview using pre-university math (except for one aside). [As always, please comment if you spot typos or find some of the presentation especially confusing!]

(See the first post announcing this series for a brief summary of the hierarchy puzzle, which motivates this whole series, and for links to longer related discussions of it.)

The next webpage after this one will be an extension to today’s, covering other types of fields. That will lead us deeper into the cosmological constant problem, and begin to touch on the hierarchy puzzle.

February 14, 2024

n-Category Café Cartesian versus Symmetric Monoidal

James Dolan and Chris Grossack and I had a fun conversation on Monday. We came up with some ideas loosely connected to things Chris and Todd Trimble have been working on… but also connected to the difference between classical and quantum information.

I’ve long been fascinated by the relation between ‘classical’ and ‘quantum’. One way this manifests is the relation between cartesian monoidal categories (like the category of sets with its cartesian product) and more general symmetric monoidal categories (like the category of Hilbert spaces with its tensor product).

Cartesian monoidal categories let us ‘duplicate and delete data’ since every object \(x\) comes with morphisms

\(\Delta_x : x \to x \otimes x \qquad \text{and} \qquad \epsilon_x : x \to I\)

where \(I\) is the unit object. These obey equations making \(x\) into a cocommutative comonoid — just like a commutative monoid, only backwards. For example if you duplicate some data, it should make no difference if you then switch the two copies. Moreover, in a cartesian monoidal category \(\Delta\) and \(\epsilon\) are natural transformations. In quantum mechanics, duplication and deletion of data in a natural way is generally impossible.
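In the cartesian category of sets and functions, these comonoid maps are just the diagonal and the map to a one-point set, and naturality of \(\Delta\) is a one-line check. A minimal Python illustration (my own, purely to fix intuition — this is exactly what fails for quantum data):

```python
def dup(x):            # Δ_x : x → x ⊗ x, the diagonal
    return (x, x)

def delete(x):         # ϵ_x : x → I, the map to the one-point set
    return ()

def square(n):
    return n * n

# Naturality of Δ: duplicating then applying f to each copy
# equals applying f and then duplicating.
for n in range(5):
    assert tuple(map(square, dup(n))) == dup(square(n))

# Cocommutativity: swapping the two copies changes nothing.
assert dup(3)[::-1] == dup(3)
```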

Given this, it’s interesting that we can force any symmetric monoidal category to become cartesian. I believe we can do it in two ways, which are left and right adjoint to the forgetful map sending cartesian monoidal categories to their underlying symmetric monoidal categories. Moreover, I conjecture that we can describe both these ways very neatly using the free cartesian monoidal category on one object, which I call \(F\). If these conjectures are right, this category has the power to make any symmetric monoidal category become cartesian!

The details, such as they are

We’ve got two 2-categories:

  • the 2-category of symmetric monoidal categories, \(SMC\).
  • the 2-category of cartesian monoidal categories, \(Cart\).

There’s an obvious forgetful 2-functor \(U: Cart \to SMC\). I believe this has both left and right adjoints, in a suitable 2-categorical sense. These are called ‘pseudoadjoints’. So:

Conjecture 0. The forgetful 2-functor \(U: Cart \to SMC\) has a left pseudoadjoint \(L: SMC \to Cart\) and a right pseudoadjoint \(R: SMC \to Cart\).

I claim that \(R\) sends any symmetric monoidal category \(C\) to the category of cocommutative comonoid objects in \(C\): this category is cartesian by

Fox’s Theorem. A symmetric monoidal category is cartesian if and only if it is isomorphic to its own category of cocommutative comonoids. Thus every object is equipped with a unique cocommutative comonoid structure \(\Delta_x : x \to x \otimes x\) and \(\epsilon_x : x \to I\), and these structures are respected by all maps.

\(L\), on the other hand, should ‘freely’ make any symmetric monoidal category \(C\) into a cartesian one. To do this, it should freely give each object \(x\) morphisms \(\Delta_x : x \to x \otimes x\) and \(\epsilon_x : x \to I\) making it into a cocommutative comonoid, imposing equations to make sure every morphism is a comonoid homomorphism. That’s a bit vague, of course. So I want to describe an attempt to make this more precise.

Categorifying the usual tensor product of commutative monoids, there’s a tensor product \(\boxtimes\) of symmetric monoidal categories. This has the universal property that if \(C\), \(D\) and \(E\) are symmetric monoidal categories, functors

\(f : C \times D \to E\)

that are symmetric monoidal in each argument separately correspond to symmetric monoidal functors

\(f : C \boxtimes D \to E\)

The existence of this tensor product is a special case of a result of Hyland and Power. In fact their work shows this tensor product makes \(SMC\) into a monoidal 2-category. I’m sure it must be symmetric monoidal in a suitable 2-categorical sense—but has anyone written that up?

Now for the fun part.

Conjecture 1. \(L C \simeq F \boxtimes C\) where \(F\) is the free cartesian category on one object, i.e. the initial Lawvere theory.

To construct \(F\) we start with the category of finite sets with coproduct as its monoidal structure, which is the free cocartesian monoidal category on one object, and then take its opposite:

\(F \simeq (FinSet, +)^{op}\)
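One concrete way to compute in this opposite category (my own illustrative encoding, not from the post): a morphism \(n \to m\) in \(F\) can be stored as a function from an \(m\)-element set to an \(n\)-element set, with composition running in the opposite order to FinSet.

```python
# Illustrative encoding (mine): a morphism n -> m in F = (FinSet, +)^op
# is a function [m] -> [n], stored as a length-m tuple with entries in
# range(n). Composition reverses FinSet's order of composition.

def compose_F(g, f):
    """Composite g . f in F, where f: n -> m and g: m -> k.
    f has length m (entries < n); g has length k (entries < m)."""
    return tuple(f[i] for i in g)

# The comonoid structure on the generating object 1:
delta = (0, 0)   # Δ : 1 -> 2, the unique function [2] -> [1]
eps = ()         # ϵ : 1 -> 0, the empty function [0] -> [1]

# Composition is associative, since it is inherited from FinSet:
f = (0, 0)        # 1 -> 2
g = (1, 0, 1)     # 2 -> 3
h = (2, 2, 0, 1)  # 3 -> 4
assert compose_F(h, compose_F(g, f)) == compose_F(compose_F(h, g), f)

# Cocommutativity of Δ: swapping its two outputs gives Δ back:
assert delta[::-1] == delta
```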

I’ve believed Conjecture 1 since at least 2006 (see page 59 here, where I rashly called it a ‘theorem’, probably because I worked out enough details to make it seem obvious). But now Chris, James and I guessed that the right pseudoadjoint \(R: SMC \to Cart\) has a similar beautiful description!

Hyland and Power didn’t merely show the \(\boxtimes\) product of symmetric monoidal categories makes \(SMC\) into a monoidal 2-category. They also showed that this monoidal 2-category is closed in a suitable 2-categorical sense—or as they put it, ‘pseudo-closed’.

In other words, given symmetric monoidal categories \(C\), \(D\) and \(E\), there is a symmetric monoidal category \([D,E]\) such that symmetric monoidal functors

\(f : C \boxtimes D \to E\)

correspond to symmetric monoidal functors

\(f : C \to [D, E]\)

If I understand this correctly, \([D,E]\) has symmetric monoidal functors \(g : D \to E\) as objects, and symmetric monoidal natural transformations between these as morphisms. The tensor product on \([D,E]\) is defined ‘pointwise’.

And I claim:

Conjecture 2. \(R C \simeq [F, C]\).

The idea is this: \(F\) is not only the free cartesian monoidal category on one object. It’s also the free symmetric monoidal category on a cocommutative comonoid! Objects of \([F,C]\) are symmetric monoidal functors \(f : F \to C\), so these should be the same as cocommutative comonoids in \(C\). So, \([F,C]\) is the category of cocommutative comonoids in \(C\), which is \(R C\).

Has someone already proved these two conjectures? If not, I hope someone does.

February 13, 2024

Doug Natelson Continuing Studies course, take 2

A year and a half ago, I mentioned that I was going to teach a course through Rice's Glasscock School of Continuing Studies, trying to give a general audience introduction to some central ideas in condensed matter physics.  Starting in mid-March, I'm doing this again.  Here is a link to the course registration for this synchronous online class.  This course is also intended as a potential continuing education/professional development offering for high school teachers, community college instructors, and other educators, and thanks to the generous support of the NSF, the Glasscock School is able to offer a limited number of full scholarships for educators - apply here by February 27 for consideration.   

(I am aware that the cost of the course is not trivial; at some point in the future I will make the course materials available broadly, and I will be sure to call attention to that at the time.)

Jordan Ellenberg Alphabetical Diaries

Enough of this.Enough.Equivocal or vague principles, as a rule, will make your life an uninspired, undirected, and meaningless act.

This is taken from Alphabetical Diaries, a remarkable book I am reading by Sheila Heti, composed of many thousands of sentences drawn from her decades of diaries and presented in alphabetical order. It starts like this:

A book about how difficult it is to change, why we don’t want to, and what is going on in our brain.A book can be about more than one thing, like a kaleidoscope, it can have many things that coalesce into one thing, different strands of a story, the attempt to do several, many, more than one thing at a time, since a book is kept together by the binding.A book like a shopping mart, all the selections.A book that does only one thing, one thing at a time.A book that even the hardest of men would read.A book that is a game.A budget will help you know where to go.

How does a simple, one might even say cheap, technique, one might even say gimmick, work so well? I thrill to the aphorisms even when I don’t believe them, as with the aphorism above: principles must be equivocal or at least vague to work as principles; without the necessary vagueness they are axioms, which are not good for making one’s life a meaningful act, only good for arguing on the Internet. I was reading Alphabetical Diaries while I walked home along the southwest bike path. I stopped for a minute and went up a muddy slope into the cemetery where there was a gap in the fence, and it turned out this gap opened on the area of infant graves, graves about the size of a book, graves overlaying people who were born and then did what they did for a week and then died — enough of this.

February 12, 2024

David Hogg the transparency of the Universe and the transparency of the university

The highlight of my day was a wide-ranging conversation with Suroor Gandhi (NYU) about cosmology, career, and the world. She made a beautiful connection between a part of our conversation in which we were discussing the transparency of the Universe, and new ways to study that, and a part in which we were discussing the transparency with which the University speaks about disciplinary and rules cases, which (at NYU anyway) is not very good. Hence the title of this post. On transparency of the Universe, we discussed how the fact that distant objects (quasars, say) do not appear blurry must put some limit on cosmic transparency. On transparency of the University, we discussed the question of how much we care about the behavior of our institutions, and about changing those behaviors. I'm a big believer in open science, open government, and open institutions.

I've been privileged these years to have some very thoughtful scientists in my world. Gandhi is one of them.

February 11, 2024

Tommaso Dorigo On Overfitting In Statistics And In Machine Learning

I recently held an accelerated course in "Statistical data analysis for fundamental science" for the Instats site. Within only 15 hours of online lectures (though these are full one-hour blocks, unlike the leaky academic-style hours that last 75% of that) I had to cover not just parameter estimation, hypothesis testing, modeling, and goodness of fit, plus several ancillary concepts of high relevance such as ancillarity (yep), conditioning, the likelihood principle, coverage, and frequentist versus Bayesian inference, but also an introduction to machine learning! How did I do?


February 09, 2024

Matt von Hippel Neu-tree-no Detector

I’ve written before about physicists’ ideas for gigantic particle accelerators, proposals for machines far bigger than the Large Hadron Collider or even plans for a Future Circular Collider. The ideas ranged from wacky but not obviously impossible (a particle collider under the ocean) to pure science fiction (a beam of neutrinos that can blow up nukes across the globe).

But what if you don’t want to accelerate particles? What if, instead, you want to detect particles from the depths of space? Can you still propose ridiculously huge things?

Neutrinos are extremely hard to detect. Immune to the strongest forces of nature, they only interact via the weak nuclear force and gravity. The weakness of these forces means they can pass through huge amounts of material without disturbing a single atom. The Sudbury Neutrino Observatory used a tank of 1000 tonnes of heavy water in order to stop enough neutrinos to study them. The IceCube experiment is bigger yet, and getting even bigger: their planned expansion will fill eight cubic kilometers of Antarctic ice with neutrino detectors, letting them measure around a million neutrinos every year.

But if you want to detect the highest-energy neutrinos, you may have to get even bigger than that. With so few of them to study, you need to cover a huge area with antennas to spot a decent number of them.

Or, maybe you can just use trees.

Pictured: a physics experiment?

That’s the proposal of Steven Prohira, a MacArthur Genius Grant winner who works as a professor at the University of Kansas. He suggests that, instead of setting up a giant array of antennas to detect high-energy neutrinos, trees could be used, with a coil of wire around the tree to measure electrical signals. Prohira even suggests that “A forest detector could also motivate the large-scale reforesting of land, to grow a neutrino detector for future generations”.

Despite sounding wacky, tree antennas have actually been used before. Militaries have looked into them as a way to set up antennas in remote locations, and later studies indicate they work surprisingly well. So the idea is not completely impossible, much like the “collider-under-the-sea”.

Like the “collider-under-the-sea”, though, some wackiness still remains. Prohira admits he hasn’t yet done all the work needed to test the idea’s feasibility, and comparing to mature experiments like IceCube makes it clear there is a lot more work to be done. Chatting with neutrino experts, one problem a few of them pointed out is that unlike devices sunk into Antarctic ice, trees are not uniformly spaced, and that might pose a problem if you want to measure neutrinos carefully.

What stands out to me, though, is that those questions are answerable. If the idea sounds promising, physicists can follow up. They can make more careful estimates, or do smaller-scale experiments. They won’t be stuck arguing over interpretations, or just building the full experiment and seeing if it works.

That’s the great benefit of a quantitative picture of the world. We can estimate some things very accurately, with theories that give very precise numbers for how neutrinos behave. Other things we can estimate less accurately, but still can work on: how tall trees are, how widely they are spaced, how much they vary. We have statistical tools and biological data. We can find numbers, and even better, we can know how uncertain we should be about those numbers. Because of that picture, we don’t need to argue fruitlessly about ideas like this. We can work out numbers, and check!
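To make “work out numbers and check” concrete, here is a toy Monte Carlo (every number in it is a placeholder I made up, not from Prohira’s proposal) contrasting the uniform spacing of a deliberate antenna grid with the irregular spacing of randomly scattered trees — the irregularity the neutrino experts flagged:

```python
import random

random.seed(0)  # fixed seed so the toy estimate is reproducible

def nearest_neighbor_distances(points):
    """Distance from each point to its nearest neighbor."""
    dists = []
    for i, (x1, y1) in enumerate(points):
        best = min(((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
                   for j, (x2, y2) in enumerate(points) if i != j)
        dists.append(best)
    return dists

def spread(dists):
    """Mean and standard deviation of a list of distances."""
    mean = sum(dists) / len(dists)
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return mean, var ** 0.5

# A 10 x 10 "antenna grid" with 100 m spacing (made-up numbers)...
grid = [(100.0 * i, 100.0 * j) for i in range(10) for j in range(10)]
# ...versus 100 "trees" scattered uniformly over the same square.
trees = [(random.uniform(0, 900), random.uniform(0, 900)) for _ in range(100)]

print("grid  mean/std spacing:", spread(nearest_neighbor_distances(grid)))
print("trees mean/std spacing:", spread(nearest_neighbor_distances(trees)))
# The grid's spacing has zero scatter; the trees' spacing varies widely.
```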

February 07, 2024

Scott Aaronson On whether we’re living in a simulation

Unrelated Announcement (Feb. 7): Huge congratulations to longtime friend-of-the-blog John Preskill for winning the 2024 John Stewart Bell Prize for research on fundamental issues in quantum mechanics!

On the heels of my post on the fermion doubling problem, I’m sorry to spend even more time on the simulation hypothesis. I promise this will be the last for a long time.

Last week, I attended a philosophy-of-mind conference called MindFest at Florida Atlantic University, where I talked to Stuart Hameroff (Roger Penrose’s collaborator on the “Orch-OR” theory of microtubule consciousness) and many others of diverse points of view, and also gave a talk on “The Problem of Human Specialness in the Age of AI,” for which I’ll share a transcript soon.

Oh: and I participated in a panel with the philosopher David Chalmers about … wait for it … whether we’re living in a simulation. I’ll link to a video of the panel if and when it’s available. In the meantime, I thought I’d share my brief prepared remarks before the panel, despite the strong overlap with my previous post. Enjoy!

When someone asks me whether I believe I’m living in a computer simulation—as, for some reason, they do every month or so—I answer them with a question:

Do you mean, am I being simulated in some way that I could hope to learn more about by examining actual facts of the empirical world?

If the answer is no—that I should expect never to be able to tell the difference even in principle—then my answer is: look, I have a lot to worry about in life. Maybe I’ll add this as #4,385 on the worry list.

If they say, maybe you should live your life differently, just from knowing that you might be in a simulation, I respond: I can’t quite put my finger on it, but I have a vague feeling that this discussion predates the 80 or so years we’ve had digital computers! Why not just join the theologians in that earlier discussion, rather than pretending that this is something distinctive about computers? Is it relevantly different here if you’re being dreamed in the mind of God or being executed in Python? OK, maybe you’d prefer that the world was created by a loving Father or Mother, rather than some nerdy transdimensional adolescent trying to impress the other kids in programming club. But if that’s the worry, why are you talking to a computer scientist? Go talk to David Hume or something.

But suppose instead the answer is yes, we can hope for evidence. In that case, I reply: out with it! What is the empirical evidence that bears on this question?

If we were all to see the Windows Blue Screen of Death plastered across the sky—or if I were to hear a voice from the burning bush, saying “go forth, Scott, and free your fellow quantum computing researchers from their bondage”—of course I’d need to update on that. I’m not betting on those events.

Short of that—well, you can look at existing physical theories, like general relativity or quantum field theories, and ask how hard they are to simulate on a computer. You can actually make progress on such questions. Indeed, I recently blogged about one such question, which has to do with “chiral” Quantum Field Theories (those that distinguish left-handed from right-handed), including the Standard Model of elementary particles. It turns out that, when you try to put these theories on a lattice in order to simulate them computationally, you get an extra symmetry that you don’t want. There’s progress on how to get around this problem, including simulating a higher-dimensional theory that contains the chiral QFT you want on its boundaries. But, OK, maybe all this only tells us about simulating currently-known physical theories—rather than the ultimate theory, which a-priori might be easier or harder to simulate than currently-known theories.
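The lattice trouble alluded to above can be seen in one line of arithmetic. In the standard textbook picture (a generic illustration, not tied to any particular simulation scheme discussed here), the naive lattice discretization replaces the continuum momentum k with sin(ka)/a, which vanishes not only at k = 0 but also at the Brillouin-zone edge k = π/a, producing an unwanted extra massless species (a “doubler”):

```python
import math

def lattice_momentum(k, a):
    """Effective momentum of a naive lattice fermion: sin(k a) / a."""
    return math.sin(k * a) / a

a = 1.0  # lattice spacing, arbitrary units

# Tracks the continuum value k only for small k:
print(lattice_momentum(0.1, a))   # close to 0.1

# One zero at k = 0, as in the continuum...
assert abs(lattice_momentum(0.0, a)) < 1e-12
# ...but an unwanted extra zero at the Brillouin-zone edge k = pi/a:
assert abs(lattice_momentum(math.pi / a, a)) < 1e-12
```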

Eventually we want to know: can the final theory, of quantum gravity or whatever, be simulated on a computer—at least probabilistically, to any desired accuracy, given complete knowledge of the initial state, yadda yadda? In other words, is the Physical Church-Turing Thesis true? This, to me, is close to the outer limit of the sorts of questions that we could hope to answer scientifically.

My personal belief is that the deepest things we’ve learned about quantum gravity—including about the Planck scale, and the Bekenstein bound from black-hole thermodynamics, and AdS/CFT—all militate toward the view that the answer is “yes,” that in some sense (which needs to be spelled out carefully!) the physical universe really is a giant Turing machine.

Now, Stuart Hameroff (who we just heard from this morning) and Roger Penrose believe that’s wrong. They believe, not only that there’s some uncomputability at the Planck scale, unknown to current physics, but that this uncomputability can somehow affect the microtubules in our neurons, in a way that causes consciousness. I don’t believe them. Stimulating as I find their speculations, I get off their train to Weirdville way before it reaches its final stop.

But as far as the Simulation Hypothesis is concerned, that’s not even the main point. The main point is: suppose for the sake of argument that Penrose and Hameroff were right, and physics were uncomputable. Well, why shouldn’t our universe be simulated by a larger universe that also has uncomputable physics, the same as ours does? What, after all, is the halting problem to God? In other words, while the discovery of uncomputable physics would tell us something profound about the character of any mechanism that could simulate our world, even that wouldn’t answer the question of whether we were living in a simulation or not.

Lastly, what about the famous argument that says, our descendants are likely to have so much computing power that simulating 10^20 humans of the year 2024 is chickenfeed to them. Thus, we should expect that almost all people with the sorts of experiences we have who will ever exist are one of those far-future sims. And thus, presumably, you should expect that you’re almost certainly one of the sims.

I confess that this argument never felt terribly compelling to me—indeed, it always seemed to have a strong aspect of sawing off the branch it’s sitting on. Like, our distant descendants will surely be able to simulate some impressive universes. But because their simulations will have to run on computers that fit in our universe, presumably the simulated universes will be smaller than ours—in the sense of fewer bits and operations needed to describe them. Similarly, if we’re being simulated, then presumably it’s by a universe bigger than the one we see around us: one with more bits and operations. But in that case, it wouldn’t be our own descendants who were simulating us! It’d be beings in that larger universe.

(Another way to understand the difficulty: in the original Simulation Argument, we quietly assumed a “base-level” reality, of a size matching what the cosmologists of our world see with their telescopes, and then we “looked down” from that base-level reality into imagined realities being simulated in it. But we should also have “looked up.” More generally, we presumably should’ve started with a Bayesian prior over where we might be in some great chain of simulations of simulations of simulations, then updated our prior based on observations. But we don’t have such a prior, or at least I don’t—not least because of the infinities involved!)

Granted, there are all sorts of possible escapes from this objection, assumptions that can make the Simulation Argument work. But these escapes (involving, e.g., our universe being merely a “low-res approximation,” with faraway galaxies not simulated in any great detail) all seem metaphysically confusing. To my mind, the simplicity of the original intuition for why “almost all people who ever exist will be sims” has been undermined.

Anyway, that’s why I don’t spend much of my own time fretting about the Simulation Hypothesis, but just occasionally agree to speak about it in panel discussions!

But I’m eager to hear from David Chalmers, who I’m sure will be vastly more careful and qualified than I’ve been.

In David Chalmers’s response, he quipped that the very lack of empirical consequences that makes something bad as a scientific question, makes it good as a philosophical question—so what I consider a “bug” of the simulation hypothesis debate is, for him, a feature! He then ventured that surely, despite my apparent verificationist tendencies, even I would agree that it’s meaningful to ask whether someone is in a computer simulation or not, even supposing it had no possible empirical consequences for that person. And he offered the following argument: suppose we’re the ones running the simulation. Then from our perspective, it seems clearly meaningful to say that the beings in the simulation are, indeed, in a simulation, even if the beings themselves can never tell. So then, unless I want to be some sort of postmodern relativist and deny the existence of absolute, observer-independent truth, I should admit that the proposition that we’re in a simulation is also objectively meaningful—because it would be meaningful to those simulating us.

My response was that, while I’m not a strict verificationist, if the question of whether we’re in a simulation were to have no empirical consequences whatsoever, then at most I’d concede that the question was “pre-meaningful.” This is a new category I’ve created, for questions that I neither admit as meaningful nor reject as meaningless, but for which I’m willing to hear out someone’s argument for why they mean something—and I’ll need such an argument! Because I already know that the answer is going to look like, “on these philosophical views the question is meaningful, and on those philosophical views it isn’t.” Actual consequences, either for how we should live or for what we should expect to see, are the ways to make a question meaningful to everyone!

Anyway, Chalmers had other interesting points and distinctions, which maybe I’ll follow up on when (as it happens) I visit him at NYU in a month. But I’ll just link to the video when/if it’s available rather than trying to reconstruct what he said from memory.

Doug Natelson A couple of links + a thought experiment about spin

A couple of interesting things to read:

  • As someone interested in lost ancient literature and also science, I really liked this news article from Nature about progress in reading scrolls excavated from Herculaneum.  The area around the Bay of Naples was quite the spot for posh Roman families, and when Vesuvius erupted in 79 CE, whole villas, complete with their libraries of books on papyrus scrolls, were buried and flash-cooked under pyroclastic flows.  Those scrolls now look like lump charcoal, but with modern x-ray techniques (CT scanning using the beam from a synchrotron) plus machine learning, it is now possible to virtually unroll the scrolls and decipher the writing, because the ink has enough x-ray contrast with the carbonized papyrus to be detected.  There is reason to believe that there are more scrolls out there still buried, and there are lots of other books and scrolls out there that are too delicate or damaged to be handled and read the normal way.  It's great to see this approach starting to succeed.
  • I've written about metalenses before - using nanostructured surfaces for precise control of optical wavefronts to make ultrathin optical elements with special properties.  This extended news item from Harvard about this paper is a nice piece of writing.  With techniques now developed to make dielectric metalenses over considerably larger areas (100 mm silica wafers), these funky lenses can now start to be applied to astronomy.  Nifty.
And now the gedanken experiment that I've been noodling on for a bit.  I know what the correct answer must be, but I think this has done a good job at reminding me how what constitutes a measurement is a very subtle issue in quantum mechanics.

Suppose I have a single electron roughly localized at the origin.  It has spin-1/2, meaning that, if there are no other constraints and I choose to make a measurement of the electron spin along some particular axis, I will find that with 50/50 probability the component of the angular momentum of the electron is \(\pm \hbar/2\) along that axis.  Suppose that I pick a \(z\) axis and do the measurement, finding that the electron is "spin-up" along \(z\).  Because the electron has a magnetic dipole moment, that means that the magnetic field at some distance \(r\) away from the origin should be the field from a magnetic dipole along \(z\).  

Now suppose I make another measurement of the spin, this time along the \(x\) axis.  I have a 50/50 chance of finding the electron spin up/down along \(x\).  After that measurement, the magnetic field at the same location \(r\) away from the origin should be the field from a magnetic dipole along \(x\).  It makes physical sense that the magnetic field at location \(r\) can only "know" that a measurement was done at the origin on a timescale \(r/c\).  (Note:  A truly correct treatment of this situation would seem to require QED, because the spin is entangled with the electromagnetic field via its magnetic moment; likewise one would really need to discuss in detail what it means to measure the spin state at the origin and what it means to measure the magnetic field locally.  Proper descriptions of detectors and measurements are really necessary.)
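
The 50/50 statistics for the follow-up \(x\)-measurement on a spin-up-along-\(z\) state can be checked directly from the Born rule. A minimal sketch (spin operators in the standard Pauli representation, in units of \(\hbar/2\)):

```python
import numpy as np

# State prepared spin-up along z; then measure spin along x.
up_z = np.array([1.0, 0.0])                 # |up_z>
sx = np.array([[0.0, 1.0], [1.0, 0.0]])     # Pauli matrix sigma_x

evals, evecs = np.linalg.eigh(sx)           # eigenvectors are |down_x>, |up_x>
probs = np.abs(evecs.conj().T @ up_z) ** 2  # Born rule: |<±x|up_z>|^2
print(probs)                                # ≈ [0.5 0.5]
```

Of course this only captures the bare measurement statistics; as noted above, the full story of the field at a distance \(r\) would require QED and a proper model of the detector.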

To highlight how subtle the situation is, suppose the spin at the origin is initially half of an EPR pair, so that it's in a spin singlet with a second spin near Alpha Centauri, so that the total spin of the two is zero.  Now a measurement of \(s_{z}\) at the origin determines the state of \(s_{z}\) at Alpha Centauri, and the magnetic field near that dipole at Alpha Centauri should be consistent with that.  Thinking about all of the subtleties here has been a good exercise for me in remembering how the seemingly simple statements we make when we teach this stuff can be implicitly very complicated.

February 06, 2024

John PreskillDiscoveries at the Dibner

This past summer, our quantum thermodynamics research group had the wonderful opportunity to visit the Dibner Rare Book Library in D.C. Located in a small corner of the Smithsonian National Museum of American History, tucked away behind flashier exhibits, the Dibner is home to thousands of rare books and manuscripts, some dating back many centuries.

Our advisor, Nicole Yunger Halpern, has a special connection to the Dibner, having interned there as an undergrad. She’s remained in contact with the head librarian, Lilla Vekerdy. For our visit, the two of them curated a large spread of scientific work related to thermodynamics, physics, and mathematics. The tomes ranged from a 1500s print of Euclid’s Elements to originals of Einstein’s manuscripts with hand-written notes in the margin.

The print of Euclid’s Elements was one of the standout exhibits. It featured a number of foldout nets of 3D solids, which had been cut and glued into the book by hand. Several hundred copies of this print are believed to have been made, each of them containing painstakingly crafted paper models. At the time, this technique was an innovation, resulting from printers’ explorations of the then-young art of large-scale book publication.

Another interesting exhibit was rough notes on ideal gases written by Planck, one of the fathers of quantum mechanics. Ideal gases are the prototypical model in statistical mechanics, capturing to high accuracy the behaviour of real gases within certain ranges of temperature and pressure. The notes contained comparisons between Boltzmann's, Ehrenfest's, and Planck's own calculations for classical and quantum ideal gases. Though the prose was in German, some results were instantly recognizable, such as the plot of the specific heat of a classical ideal gas, showing the stepwise jump as degrees of freedom freeze out. 

Looking through these great physicists’ rough notes, scratched-out ideas, and personal correspondences was a unique experience, helping humanize them and place their work in historical context. Understanding the history of science doesn’t just need to be for historians; it can be useful for scientists themselves! Seeing how scientists persevered through unknowns, grappling with doubts and incomplete knowledge to generate new ideas, is inspiring. But when one only reads the final, polished result in a modern textbook, it can be difficult to appreciate this process of discovery. Another reason to study the historical development of scientific results is that core concepts have a way of arising time and again across science. Recognizing how these ideas have arisen in the past is insightful. Examining the creative processes of great scientists before us helps develop our own intuition and skillset.

Thanks to our advisor for this field trip – and make sure to check out the Dibner next time you’re in DC! 

February 05, 2024

n-Category Café The Atom of Kirnberger

The 12th root of 2 times the 7th root of 5 is

1.333333192495\dots

And since the numbers 5, 7, and 12 show up in scales, this weird fact has implications for music! It leads to a remarkable meta-meta-glitch in tuning systems. Let’s check it out.

Two important glitches that afflict tuning systems are the Pythagorean comma and the syntonic comma. If you go up 12 fifths, multiplying the frequency by 3/2 each time, you go up a bit less than 7 octaves. The ratio is the Pythagorean comma:

p = 531441/524288 \approx 1.013643

And if you go up four fifths, you go up a bit more than 2 octaves and a major third (which ideally has a frequency ratio of 5/4). The ratio is the syntonic comma:

\sigma = 81/80 = 1.0125

In music it would be very convenient if these two glitches were the same — and sometimes musicians pretend they are. But they’re not! So their ratio shows up as a tiny meta-glitch. It’s called the ‘schisma’:

\chi = p/\sigma = 32805/32768 \approx 1.0011291504

and it was discovered by an advisor to the Gothic king Theodoric the Great — a guy named Boethius, who was later tortured and executed, presumably for unrelated reasons.

In the most widely used tuning system today, called equal temperament, a fifth is not 3/2 but slightly less: it’s

2^{7/12} \approx 1.498307

The ratio of 3/2 and this slightly smaller fifth is called the ‘grad’:

\gamma = p^{1/12} \approx 1.0011298906

Look! The grad is amazingly close to the schisma! They agree to 7 decimal places! Their ratio is a meta-meta-glitch called the Kirnberger kernel:

\gamma/\chi \approx 1.0000007394

If you unravel the mathematical coincidence that makes this happen, you’ll see it boils down to

2^{1/12}\, 5^{1/7} \approx 1.333333192495

being very close to 4/3. And this coincidence let Bach’s student Johann Kirnberger invent an amazing tuning system called rational equal temperament. It’s very close to equal temperament, but all the frequency ratios are rational.

To get this tuning system, instead of letting the fifths equal 2^{7/12}, which is 3/2 divided by the grad, which is irrational, we try to use 3/2 divided by the schisma, which is rational. But this creates a tiny error! We deal with this by taking one of our fifths and further dividing it by the 12th power of the Kirnberger kernel — a rational number called the atom of Kirnberger:

\alpha = (\gamma/\chi)^{12} = 2^{161} \cdot 3^{-84} \cdot 5^{-12} \approx 1.0000088728601397
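
All of these numbers are easy to verify. A quick sketch in Python (exact fractions where possible, a float for the irrational grad; note that the grad is slightly larger than the schisma, so the number ≈ 1.0000007394 is the ratio of the larger to the smaller):

```python
from fractions import Fraction

# Exact rationals: Pythagorean comma, syntonic comma, schisma.
p = Fraction(3**12, 2**19)        # 531441/524288, about 1.013643
sigma = Fraction(81, 80)          # 1.0125
chi = p / sigma                   # schisma
assert chi == Fraction(32805, 32768)

# The grad is the (irrational) 12th root of the Pythagorean comma.
gamma = float(p) ** (1 / 12)

# The Kirnberger kernel is the ratio of grad to schisma; its 12th
# power is the (rational) atom of Kirnberger.
kernel = gamma / float(chi)
atom = Fraction(2**161, 3**84 * 5**12)

assert abs(kernel - 1.0000007394) < 1e-9
assert abs(float(atom) - kernel**12) < 1e-10
print(f"schisma ≈ {float(chi):.10f}, grad ≈ {gamma:.10f}, atom ≈ {float(atom):.16f}")
```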

Rational equal temperament looks like this, with fifths labeled by their frequency ratios:

11 of the fifths are ‘schismatic fifths’ with a frequency ratio of \frac{3}{2}\chi^{-1}, but one is an ‘atomic fifth’ with a microscopically smaller frequency ratio, \frac{3}{2}(\alpha \chi)^{-1}. Both these numbers are rational, and going up 11 schismatic fifths and one atomic fifth is exactly the same as going up 7 octaves.
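
Because everything here is rational, this last claim can be checked exactly rather than just numerically. A quick check with Python's fractions module:

```python
from fractions import Fraction

chi = Fraction(32805, 32768)              # schisma
alpha = Fraction(2**161, 3**84 * 5**12)   # atom of Kirnberger

schismatic_fifth = Fraction(3, 2) / chi
atomic_fifth = Fraction(3, 2) / (alpha * chi)

# 11 schismatic fifths plus one atomic fifth make exactly 7 octaves.
assert schismatic_fifth**11 * atomic_fifth == Fraction(2**7)
```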

For comparison, here is equal temperament:

Nobody can hear the difference between rational equal temperament and equal temperament, so Kirnberger’s discovery is of purely theoretical interest. But it’s quite amazing nonetheless.

Much later the physicist Don Page, famous for discovering the ‘Page time’ in black hole physics, became so obsessed with the mathematical coincidence underlying these ideas that he wrote a paper trying to wrestle it down to something where he could do the computations in his head:

For more details on the music theory see:

which is part of a long series I’m writing on the math and history of tuning systems.

February 04, 2024

n-Category Café Axioms for the Category of Finite-Dimensional Hilbert Spaces and Linear Contractions

Guest post by Matthew di Meglio

Recently, my PhD supervisor Chris Heunen and I uploaded a preprint to arXiv giving an axiomatic characterisation of the category \mathbf{FCon} of finite-dimensional Hilbert spaces and linear contractions. I thought it might be nice to explain here in a less formal setting the story of how this article came to be, including some of the motivation, ideas, and challenges.

1. Background and motivation

The starting point was Chris, Andre and Nesta’s recent axiomatic characterisation of the category \mathbf{Con} of all Hilbert spaces and linear contractions, which in turn depends on Chris and Andre’s earlier axiomatic characterisation of the category \mathbf{Hilb} of all Hilbert spaces and all bounded linear maps. The nice thing about these characterisations is that they do not refer to analytic notions such as norms, continuity, (metric) completeness, or the real or complex numbers. Instead, the axioms are about simple category-theoretic structures and properties.

The fundamental structure is that of a dagger — an involutive identity-on-objects contravariant endofunctor (-)^\dagger. The dagger encodes adjoints of linear maps. Following the “way of the dagger” philosophy, all of the other axioms involve some kind of compatibility condition with the dagger. For instance, rather than asking merely for the existence of equalisers, we ask for the existence of equalisers that are dagger monic, that is, equalisers m such that m^\dagger m = 1; in \mathbf{Hilb} and \mathbf{Con}, the dagger monomorphisms are precisely the isometries.

Of course, a natural question to ask is where the analytic properties come from. In the original article, the heavy lifting is done by Solèr’s theorem (see also Prestel’s account). This theorem gives conditions under which a hermitian space — a kind of generalised Hilbert space — over an involutive field is actually a Hilbert space over the field \mathbb{R} or \mathbb{C}. Much of the initial part of the original article is spent constructing such a hermitian space over the scalar field; the fact that the scalar field is \mathbb{R} or \mathbb{C} then magically pops out. The proof of Solèr’s theorem is rather unenlightening, using a series of obscure “tricks” to show that the self-adjoint scalars form a Dedekind-complete Archimedean ordered field; that they are the real numbers then follows by the classical characterisation. A more satisfying explanation of why the scalars are the real or complex numbers would describe explicitly how to construct something like limits of sequences, directly from the axioms in a category-theoretic manner.

Another limitation of the Solèr approach is that Solèr’s theorem may only be applied to infinite-dimensional spaces. In many applications of Hilbert spaces, such as quantum computing, we only care about the finite-dimensional spaces. To characterise categories of finite-dimensional Hilbert spaces, a different proof strategy is required.

2. Infima from directed colimits

The only axiom for \mathbf{Con} of infinitary nature is the one asserting that all directed diagrams have a colimit. As completeness of metric spaces is an infinitary condition, the completeness of the scalar field and the spaces associated to each object must be encoded in this axiom. This is confirmed by the following explicit construction of such colimits in \mathbf{Con}, which features infima of decreasing sequences in \mathbb{R}_+ = \{x \in \mathbb{R} \mid x \geq 0\}, as well as completion of inner-product spaces with respect to their norm.

For simplicity, consider the directed diagram in \mathbf{Con} generated by the sequence

(1)\qquad X_1 \xrightarrow{f_1} X_2 \xrightarrow{f_2} X_3 \xrightarrow{f_3} \cdots

of objects and morphisms; diagrams of this shape are called sequential whilst diagrams of the opposite shape are called cosequential. As each of the f_n is a contraction, for each k \in \mathbb{N} and each x \in X_k, the sequence

(2)\qquad \|x\|, \|f_k(x)\|, \|f_{k + 1}f_k (x)\|, \|f_{k + 2}f_{k + 1}f_k (x)\|, \ldots

of positive reals is decreasing, so it has both an infimum and a limit, and these coincide. Let \sim be the binary relation on the set \bigcup_{n = 1}^\infty X_n defined, for all j, k \in \mathbb{N}, each x \in X_j and each x' \in X_k, by x \sim x' if

(3)\qquad \inf_{n \geq \max(k,j)} \|f_n\dots f_{k + 1}f_{k}(x') - f_n\dots f_{j + 1}f_j(x) \| = 0.

Then \big(\bigcup_{n = 1}^\infty X_n\big)\big/{\sim} inherits the structure of an inner-product space from each of the X_k. In particular, its norm is defined, for each k \in \mathbb{N} and each x \in X_k, by the equation

(4)\qquad \|[x]\| = \inf_{n \geq k} \|f_n \cdots f_{k + 1} f_k (x)\|,

where [x] denotes the equivalence class of x. The completion X of the inner-product space \big(\bigcup_{n = 1}^\infty X_n\big)\big/{\sim}, together with the maps X_j \to X that send each element to its equivalence class, forms a colimit cocone on the diagram.
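
In the simplest one-dimensional case this construction is quite concrete. A toy sketch (every object is the scalar field and each f_n is multiplication by a contraction factor; the specific factors are my own choice for illustration):

```python
# One-dimensional toy model of the colimit construction: X_n is the
# scalar field and f_n is multiplication by c_n with |c_n| <= 1.  The
# norm of the class [x] from equation (4) is then |x| times the
# infimum of the partial products of the |c_n|.
c = [1 - 2.0**(-n) for n in range(1, 30)]   # contraction factors

def colimit_norm(x, k):
    """Norm of the class [x] of x in X_k, per equation (4)."""
    best = abs(x)
    prod = abs(x)
    for cn in c[k:]:
        prod *= abs(cn)
        best = min(best, prod)
    return best

print(colimit_norm(1.0, 0))   # infimum of the partial products
```

Dropping the first few factors (each less than 1) can only increase the norm, matching the fact that the maps X_j → X are contractions.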

Our main idea to bypass Solèr’s theorem is to turn this construction around. Positive scalars are now defined to be the “norms” of elements of objects. A partial order \leqslant on the positive scalars is defined so that the scalar corresponding to one element is at most the scalar corresponding to another exactly when there is a “contraction” mapping the first element to the second. Infima of decreasing sequences and suprema of bounded increasing sequences may then be recovered from the colimits of the associated sequential and cosequential diagrams of “contractions”.

3. Characterising the positive reals

The key ingredient that allowed us to proceed with this approach was our discovery of a beautiful and concise article by Ralph DeMarr from the 60s. It gives a variant of the classical characterisation of the real numbers that (1) only assumes a partial order rather than a total one, and (2) replaces the assumptions of Archimedeanness and Dedekind-completeness by monotone sequential completeness. In operator algebra, a partial order is called monotone sequentially complete (or monotone \sigma-complete) if every bounded increasing sequence has a supremum. For a partially ordered field, this is equivalent to asking that every decreasing sequence of positive elements has an infimum.

At this point, whilst it is not so relevant for the rest of this blog post, I feel compelled to highlight DeMarr’s clever use of the humble geometric series to prove totality of the order. The main idea is that, for each u \geq 0, either u \leq 1 or u \geq 1, depending on whether or not the increasing sequence s_n = 1 + u + u^2 + \dots + u^n has a supremum. In turn, this depends on whether or not the infimum of the decreasing sequence 1/s_n, which always exists, is zero.
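
The dichotomy behind DeMarr's trick can be seen numerically. A small sketch (the test values of u are arbitrary choices of mine):

```python
def inv_partial_sums(u, n_max=60):
    """Return the sequence 1/s_n for s_n = 1 + u + ... + u^n."""
    s, out = 0.0, []
    for n in range(n_max + 1):
        s += u**n
        out.append(1.0 / s)
    return out

# u <= 1 case: s_n is bounded, so 1/s_n stays away from zero
# (for u = 0.5 the infimum is 1 - u = 0.5).
assert min(inv_partial_sums(0.5)) > 0.49

# u >= 1 case: s_n is unbounded, so the infimum of 1/s_n is zero.
assert inv_partial_sums(2.0)[-1] < 1e-15
```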

The challenge now is that DeMarr’s theorem applies only to partially ordered fields, whilst the construction of infima of positive scalars described above is with respect to a partial order that is defined only for the partially ordered semifield of positive scalars. Resolving this challenge turned out to be much trickier than we first thought.

Our initial approach was to adapt DeMarr’s proof to similarly characterise \mathbb{R}_+ among partially ordered semifields, and then show that the partially ordered semifield of positive scalars satisfies this new characterisation. To guide us, we had Tobias Fritz’s recent characterisation of \mathbb{R}_+ among partially ordered strict semifields (see Theorem 4.5) in terms of Dedekind completeness and a multiplicative variant of Archimedeanness. Our goal was to use ideas from DeMarr’s work to replace these classical assumptions with some condition about the existence of suprema or infima of monotone sequences. In the absence of additive inverses, such suprema and infima are not necessarily compatible with addition. As compatibility with addition was used in several steps of DeMarr’s approach, it was clear that we would need to incorporate it into our assumptions.

Multiplicative inversion allows us to pass between considering suprema of bounded increasing sequences and infima of decreasing sequences, modulo being careful about zero, which is not invertible. With this in mind, it is not hard to show that a partially ordered strict semifield has suprema of bounded increasing sequences if and only if it has infima of decreasing sequences. On the other hand, whilst compatibility of such suprema with addition implies compatibility of such infima with addition (this is not so obvious in the absence of additive inverses), the converse is not true.

Indeed, consider the subset \mathbb{S} = \Big\{\Big(\begin{smallmatrix} 0 \\ 0 \end{smallmatrix}\Big)\Big\} \cup (0, \infty) \times (0, \infty) of \mathbb{R} \times \mathbb{R}. It is a partially ordered strict semifield with 0 = \Big(\begin{smallmatrix} 0 \\ 0 \end{smallmatrix}\Big), 1 = \Big(\begin{smallmatrix} 1 \\ 1 \end{smallmatrix}\Big), pointwise addition and multiplication, and \Big(\begin{smallmatrix} x \\ y \end{smallmatrix}\Big) \leqslant \Big(\begin{smallmatrix} u \\ v \end{smallmatrix}\Big) exactly when x \leqslant u and y \leqslant v. It is also monotone sequentially complete, and suprema are compatible with addition. However, as

(5)\qquad \Big(\begin{smallmatrix} 1 \\ 1 \end{smallmatrix}\Big) + \inf \Big(\begin{smallmatrix} 1 \\ 1/n \end{smallmatrix}\Big) = \Big(\begin{smallmatrix} 1 \\ 1 \end{smallmatrix}\Big) + \Big(\begin{smallmatrix} 0 \\ 0 \end{smallmatrix}\Big) = \Big(\begin{smallmatrix} 1 \\ 1 \end{smallmatrix}\Big) \neq \Big(\begin{smallmatrix} 2 \\ 1 \end{smallmatrix}\Big) = \inf \Big(\begin{smallmatrix} 2 \\ 1 + 1/n \end{smallmatrix}\Big) = \inf \bigg(\Big(\begin{smallmatrix} 1 \\ 1 \end{smallmatrix}\Big) + \Big(\begin{smallmatrix} 1 \\ 1/n \end{smallmatrix}\Big)\bigg),

infima are not compatible with addition. The issue is decreasing sequences whose infimum is zero, because zero is not multiplicatively invertible.
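
This failure can be checked concretely. A small sketch with exact rationals (the helper names are mine, not from the article):

```python
from fractions import Fraction as F

def add(a, b):
    """Pointwise addition in S."""
    return (a[0] + b[0], a[1] + b[1])

one = (F(1), F(1))

# The decreasing sequence (1, 1/n) has infimum (0, 0) in S: any lower
# bound (x, y) with y > 0 fails y <= 1/n for large n, and S contains
# no points with exactly one zero coordinate.
seq = [(F(1), F(1, n)) for n in range(1, 100)]
inf_seq = (F(0), F(0))
assert all(inf_seq[0] <= t[0] and inf_seq[1] <= t[1] for t in seq)

# Shifting by (1, 1) gives (2, 1 + 1/n), whose infimum in S is (2, 1):
# (2, 1) lies in S and is below every term.
shifted = [add(one, t) for t in seq]
inf_shifted = (F(2), F(1))
assert all(inf_shifted[0] <= t[0] and inf_shifted[1] <= t[1] for t in shifted)

# Compatibility of infima with addition fails:
assert add(one, inf_seq) == (F(1), F(1)) != inf_shifted
```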

Taking this into account, and cleverly adapting DeMarr’s proof to avoid using additive inverses, yields the following result, which is called Proposition 48 in our article.

Proposition 1. A partially ordered strict semifield is isomorphic to \mathbb{R}_+ if and only if it is monotone sequentially complete, infima are compatible with addition, and 1 + 1 \neq 1.

Assuming that the axioms for \mathbf{Con} also imply that infima of positive scalars are compatible with addition, it follows from these axioms and Proposition 1 that the semifield of positive scalars is isomorphic to \mathbb{R}_+. It is then purely a matter of algebra to show that the field of all scalars is isomorphic to \mathbb{R} or \mathbb{C}. Unlike the proof of this fact via Solèr’s theorem, our new proof is informative, explaining how infima of decreasing sequences of positive scalars arise from sequential colimits.

4. Completeness axioms

This is, however, not the end of the story. You see, aside from the issue of showing that infima of positive scalars are compatible with addition (which we will address shortly), the category \mathbf{FCon} of finite-dimensional Hilbert spaces and linear contractions does not even have all sequential colimits, let alone all directed ones. For example, the sequential diagram

(6)\qquad \mathbb{C} \xrightarrow{i_1} \mathbb{C}^2 \xrightarrow{i_{1,2}} \mathbb{C}^3 \xrightarrow{i_{1,2,3}} \cdots,

whose colimit in \mathbf{Con} is the infinite-dimensional Hilbert space \ell_2(\mathbb{N}) of square-summable sequences, does not have a colimit in \mathbf{FCon}. The search for an appropriate completeness axiom for \mathbf{FCon} is a delicate balancing act. We must ask that enough directed colimits exist that the scalar field is complete, but not so many that the category also necessarily has infinite-dimensional objects.

Both of the following candidates for the infinitary axiom seemed likely to strike this balance.

Axiom A. Every sequential diagram of epimorphisms has a colimit.

Axiom B. Every cosequential diagram with a cone of epimorphisms has a limit.

Indeed, both hold in \mathbf{FCon} and \mathbf{Con}, and both are sufficient to prove that the positive scalars are monotone sequentially complete. Axiom A is a categorification of the requirement that every decreasing sequence of positive scalars has an infimum. Axiom B is a categorification of the requirement that every bounded increasing sequence of positive scalars has a supremum. Given that Axiom A is simpler, and, being about infima rather than suprema, is better matched with Proposition 1, our initial focus was on Axiom A.

Unsuccessful in our attempts to derive compatibility of addition with infima using only Axiom A and the other axioms for \mathbf{Con}, we decided to allow ourselves one additional conservative assumption: that each functor X \oplus - preserves colimits of sequential diagrams of epimorphisms. By the middle of last year, we had almost finished a draft of our article based on this approach. Unfortunately, at this point, I noticed a subtle error in our compatibility proof, which took several months to resolve. In the end, we assumed the following more-complicated variant of Axiom A, which, in our article, is called Axiom 9’.

Axiom A’. Every sequential diagram of epimorphisms has a colimit, and, for each natural transformation of such diagrams whose components are dagger monic, the induced morphism between the colimits is also dagger monic.

5. Finite dimensionality

From here, we thought that it would be smooth sailing to the end. All that remained, really, was to show that the inner-product space associated to each object is finite dimensional, and Andre had already sketched out to us how this might work.

An object in a dagger category is called dagger finite if every dagger monic endomorphism on that object is an isomorphism. The origin of this notion is operator algebra, although it is quite similar to the notion of Dedekind finiteness from set theory.

An object of \mathbf{Con} is dagger finite if and only if it is finite dimensional. The idea is that every infinite-dimensional Hilbert space X contains a copy of the space \ell_2(\mathbb{N}) of square-summable sequences. The direct sum of the canonical right shift map on this subspace with the identity map on its orthogonal complement is a dagger monic endomorphism on X that is not an isomorphism. Andre adapted this proof to the abstract axiomatic setting, using sequential colimits to construct abstract analogues of the copy of \ell_2(\mathbb{N}) and its right shift map.

Unfortunately, the colimits required to make this proof work are of sequential diagrams of dagger monomorphisms, whilst the axiom that we had assumed was about sequential diagrams of epimorphisms. The equivalence between the category of dagger subobjects of a fixed object and dagger quotients of that object, given by taking dagger kernels and dagger cokernels, presented a possible workaround. For any object X, through this equivalence, the category of dagger subobjects of X has sequential colimits. If X is dagger infinite, then we may construct a dagger subobject of X corresponding to \ell_2(\mathbb{N}) using such a sequential colimit. Unfortunately, as the abstract analogue of the right shift map is not a morphism in this category of dagger subobjects, we cannot construct it using the universal property of this sequential colimit.

Ultimately, our approach using Axiom A’ and Proposition 1 was abandoned, and the parts of it that did work were moved to an appendix.

6. Accounting for the field embedding

Axiom B, which is about suprema rather than infima, is not well matched with Proposition 1. If we assume Axiom B, then we need to somehow exclude the partially ordered strict semifields, like \mathbb{S}, that have badly behaved decreasing sequences with infimum zero. What should have been obvious in hindsight is that, whilst our semifield of interest — the positive scalars — embeds in a field, the problematic semifield \mathbb{S} does not. If it did, then

(7)\qquad \bigg(\Big(\begin{smallmatrix} 2 \\ 1 \end{smallmatrix}\Big) - \Big(\begin{smallmatrix} 1 \\ 1 \end{smallmatrix}\Big)\bigg)\bigg(\Big(\begin{smallmatrix} 2 \\ 1 \end{smallmatrix}\Big) - \Big(\begin{smallmatrix} 2 \\ 2 \end{smallmatrix}\Big)\bigg) = \Big(\begin{smallmatrix} 4 \\ 1 \end{smallmatrix}\Big) - \Big(\begin{smallmatrix} 2 \\ 1 \end{smallmatrix}\Big) - \Big(\begin{smallmatrix} 4 \\ 2 \end{smallmatrix}\Big) + \Big(\begin{smallmatrix} 2 \\ 2 \end{smallmatrix}\Big) = \Big(\begin{smallmatrix} 6 \\ 3 \end{smallmatrix}\Big) - \Big(\begin{smallmatrix} 6 \\ 3 \end{smallmatrix}\Big) = \Big(\begin{smallmatrix} 0 \\ 0 \end{smallmatrix}\Big),

so either \Big(\begin{smallmatrix} 2 \\ 1 \end{smallmatrix}\Big) = \Big(\begin{smallmatrix} 1 \\ 1 \end{smallmatrix}\Big) or \Big(\begin{smallmatrix} 2 \\ 1 \end{smallmatrix}\Big) = \Big(\begin{smallmatrix} 2 \\ 2 \end{smallmatrix}\Big), which is a contradiction.

This trick of forming a quadratic equation to show that two elements are equal actually forms the basis of the following result, called Lemma 35 in the article.

Lemma 2. In a partially ordered strict semifield that is monotone sequentially complete and embeds in a field, if suprema are compatible with addition, then \inf(a + u^n) = a for all non-zero a and all u \lt 1.

Noting that \inf u^n = 0 when u \lt 1, we see that this extra assumption of a field embedding allowed us to deduce that addition is compatible with at least one general class of decreasing sequences with infimum zero.

We actually knew all along that we could partially order the field of self-adjoint scalars by a \preccurlyeq b if and only if b - a is a positive scalar. To apply DeMarr’s theorem to the self-adjoint scalars, this partial order must be monotone sequentially complete. As discussed earlier, we also knew how to show that the “contraction” partial order \leqslant on the positive scalars is monotone sequentially complete. The difficulty is that, a priori, the partial order \leqslant merely refines the partial order \preccurlyeq. When we started this project, we had no idea how to lift monotone sequential completeness from \leqslant to \preccurlyeq, so we quickly dismissed this idea. However, now armed with a better understanding of infima and suprema in partially ordered semifields, and Lemma 2 in particular, we could finally make headway.

With a few more tricks, including a clever use of completing the square, we arrived at the following proposition, called Proposition 36 in our article.

Proposition 3. Let C be an involutive field with a partially ordered strict subsemifield (P, \leqslant) whose elements are all self-adjoint and include a^\dagger a for all a \in C. If P is monotone sequentially complete and its suprema are compatible with addition, then there is an isomorphism of C with \mathbb{R} or \mathbb{C} that maps P onto \mathbb{R}_+.

All that remains is to show that suprema are compatible with addition, and that the inner-product space associated to each object is finite dimensional. For compatibility, unlike when we assumed Axiom A, it is actually enough that each functor X \oplus - preserve limits of cosequential diagrams with a cone of epimorphisms. Actually, very recently, I stumbled on a new characterisation of dagger biproducts and a few more tricks that enabled us to finally prove this fact from the other axioms. For finite-dimensionality, Andre’s proof already essentially assumed the dual of Axiom B, and so works unchanged.

7. Conclusion

It’s quite a miracle that everything worked out so well, and without conceding any ugly assumptions. Our faith that it should be provable, in a manner that works equally well with the axioms for \mathbf{Con} as with those for \mathbf{FCon}, that the scalars are \mathbb{R} or \mathbb{C} certainly guided us in the right direction. However, there were many points at which we almost forfeited this goal to accept a subpar final result.

Unfortunately, in published mathematics, incentives for clarity, conciseness, and positive results leave little space to tell stories of how ideas and results came to be, even when these stories are interesting and insightful. I hope that for our article, this blog post conveys at least some part of the emotional rollercoaster, the twists and turns and failed attempts, that got us to the end.

February 03, 2024

Terence TaoBounding sums or integrals of non-negative quantities

A common task in analysis is to obtain bounds on sums

\displaystyle  \sum_{n \in A} f(n)

or integrals

\displaystyle  \int_A f(x)\ dx

where {A} is some simple region (such as an interval) in one or more dimensions, and {f} is an explicit (and elementary) non-negative expression involving one or more variables (such as {n} or {x}, and possibly also some additional parameters). Often, one would be content with an order of magnitude upper bound such as

\displaystyle  \sum_{n \in A} f(n) \ll X


or

\displaystyle  \int_A f(x)\ dx \ll X

where we use {X \ll Y} (or {Y \gg X} or {X = O(Y)}) to denote the bound {|X| \leq CY} for some constant {C}; sometimes one wishes to also obtain the matching lower bound, thus obtaining

\displaystyle  \sum_{n \in A} f(n) \asymp X


or

\displaystyle  \int_A f(x)\ dx \asymp X

where {X \asymp Y} is synonymous with {X \ll Y \ll X}. Finally, one may wish to obtain a more precise bound, such as

\displaystyle  \sum_{n \in A} f(n) = (1+o(1)) X

where {o(1)} is a quantity that goes to zero as the parameters of the problem go to infinity (or some other limit). (For a deeper dive into asymptotic notation in general, see this previous blog post.)

Here are some typical examples of such estimation problems, drawn from recent questions on MathOverflow:

  • (i) (From this question) If {d,p \geq 1} and {a>d/p}, is the expression

    \displaystyle  \sum_{j \in {\bf Z}} 2^{(\frac{d}{p}+1-a)j} \int_0^\infty e^{-2^j s} \frac{s^a}{1+s^{2a}}\ ds

    finite?

  • (ii) (From this question) If {h,m \geq 1}, how can one show that

    \displaystyle  \sum_{d=0}^\infty \frac{2d+1}{2h^2 (1 + \frac{d(d+1)}{h^2}) (1 + \frac{d(d+1)}{h^2m^2})^2} \ll 1 + \log(m^2)?

  • (iii) (From this question) Can one show that

    \displaystyle  \sum_{k=1}^{n-1} \frac{k^{2n-4k-3}(n^2-2nk+2k^2)}{(n-k)^{2n-4k-1}} = (c+o(1)) \sqrt{n}

    as {n \rightarrow \infty} for an explicit constant {c}, and what is this constant?
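
As a quick numerical sanity check on example (ii) (not a proof; the truncation cutoff and the test values of {h} and {m} below are arbitrary choices of mine):

```python
import math

def lhs(h, m, cutoff=200_000):
    """Truncated version of the sum in example (ii)."""
    total = 0.0
    for d in range(cutoff):
        q = d * (d + 1)
        total += (2 * d + 1) / (
            2 * h**2 * (1 + q / h**2) * (1 + q / (h**2 * m**2)) ** 2
        )
    return total

for h, m in [(1, 1), (5, 10), (50, 100)]:
    s = lhs(h, m)
    print(f"h={h}, m={m}: sum ≈ {s:.3f}, 1 + log(m^2) ≈ {1 + math.log(m**2):.3f}")
```

Heuristically, the terms with d ≲ h contribute O(1), those with h ≲ d ≲ hm behave like 1/d and contribute O(log m), and the rest decay rapidly, which is exactly what the claimed bound 1 + \log(m^2) reflects.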

Compared to other estimation tasks, such as that of controlling oscillatory integrals, exponential sums, singular integrals, or expressions involving one or more unknown functions (that are only known to lie in some function spaces, such as an {L^p} space), high-dimensional geometry (or alternatively, large numbers of random variables), or number-theoretic structures (such as the primes), estimation of sums or integrals of non-negative elementary expressions is a relatively straightforward task, and can be accomplished by a variety of methods. The art of obtaining such estimates is typically not explicitly taught in textbooks, other than through some examples and exercises; it is typically picked up by analysts (or those working in adjacent areas, such as PDE, combinatorics, or theoretical computer science) as graduate students, while they work through their thesis or their first few papers in the subject.

Somewhat in the spirit of this previous post on analysis problem solving strategies, I am going to try here to collect some general principles and techniques that I have found useful for these sorts of problems. As with the previous post, I hope this will be something of a living document, and encourage others to add their own tips or suggestions in the comments.

— 1. Asymptotic arithmetic —

Asymptotic notation is designed so that many of the usual rules of algebra and inequality manipulation continue to hold, with the caveat that one has to be careful if subtraction or division is involved. For instance, if one knows that {A \ll X} and {B \ll Y}, then one can immediately conclude that {A + B \ll X+Y} and {AB \ll XY}, even if {A,B} are negative (note that the notation {A \ll X} or {B \ll Y} automatically forces {X,Y} to be non-negative). Equivalently, we have the rules

\displaystyle  O(X) + O(Y) = O(X+Y); \quad O(X) \cdot O(Y) = O(XY)

and more generally we have the triangle inequality

\displaystyle  \sum_\alpha O(X_\alpha) = O( \sum_\alpha X_\alpha ).

Again, we stress that this sort of rule implicitly requires the {X_\alpha} to be non-negative, and that claims such as {O(X) - O(Y) = O(X-Y)} and {O(X)/O(Y) = O(X/Y)} are simply false. As a rule of thumb, if your calculations have arrived at a situation where a signed or oscillating sum or integral appears inside the big-O notation, or on the right-hand side of an estimate, without being “protected” by absolute value signs, then you have probably made a serious error in your calculations.

Another rule of inequalities that is inherited by asymptotic notation is that if one has two bounds

\displaystyle  A \ll X; \quad A \ll Y \ \ \ \ \ (1)

for the same quantity {A}, then one can combine them into the unified asymptotic bound

\displaystyle  A \ll \min(X, Y). \ \ \ \ \ (2)

This is an example of a “free move”: a replacement of bounds that does not lose any of the strength of the original bounds, since of course (2) implies (1). In contrast, other ways to combine the two bounds (1), such as taking the geometric mean

\displaystyle  A \ll X^{1/2} Y^{1/2}, \ \ \ \ \ (3)

while often convenient, are not “free”: the bounds (1) imply the averaged bound (3), but the bound (3) does not imply (1). On the other hand, the inequality (2), while it does not concede any logical strength, can require more calculation to work with, often because one ends up splitting up cases such as {X \ll Y} and {X \gg Y} in order to simplify the minimum. So in practice, when trying to establish an estimate, one often starts with using conservative bounds such as (2) in order to maximize one’s chances of getting any proof (no matter how messy) of the desired estimate, and only after such a proof is found, one tries to look for more elegant approaches using less efficient bounds such as (3).

For instance, suppose one wanted to show that the sum

\displaystyle  \sum_{n=-\infty}^\infty \frac{2^n}{(1+n^2) (1+2^{2n})}

was convergent. Lower bounding the denominator term {1+2^{2n}} by {1} or by {2^{2n}}, one obtains the bounds

\displaystyle  \frac{2^n}{(1+n^2) (1+2^{2n})} \ll \frac{2^n}{1+n^2} \ \ \ \ \ (4)

and also

\displaystyle  \frac{2^n}{(1+n^2) (1+2^{2n})} \ll \frac{2^n}{(1+n^2) 2^{2n}} = \frac{2^{-n}}{1+n^2} \ \ \ \ \ (5)

so by applying (2) we obtain the unified bound

\displaystyle  \frac{2^n}{(1+n^2) (1+2^{2n})} \ll \frac{\min(2^n,2^{-n})}{1+n^2}.

To deal with this bound, we can split into the two contributions {n \geq 0}, where {2^{-n}} dominates, and {n < 0}, where {2^n} dominates. In the former case we see (from the ratio test, for instance) that the sum

\displaystyle  \sum_{n=0}^\infty \frac{2^{-n}}{1+n^2}

is absolutely convergent, and in the latter case we see that the sum

\displaystyle  \sum_{n=-\infty}^{-1} \frac{2^{n}}{1+n^2}

is also absolutely convergent, so the entire sum is absolutely convergent. But once one has this argument, one can try to streamline it, for instance by taking the geometric mean of (4), (5) rather than the minimum to obtain the weaker bound

\displaystyle  \frac{2^n}{(1+n^2) (1+2^{2n})} \ll \frac{1}{1+n^2} \ \ \ \ \ (6)

and now one can conclude without decomposition just by observing the absolute convergence of the doubly infinite sum {\sum_{n=-\infty}^\infty \frac{1}{1+n^2}}. This is a less “efficient” estimate, because one has conceded a lot of the decay in the summand by using (6) (the summand used to be exponentially decaying in {n}, but is now only polynomially decaying), but it is still sufficient for the purpose of establishing absolute convergence.
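
As a quick numerical sanity check (a sketch only, not part of the argument), one can confirm in a few lines of Python that the summand really is dominated by the unified bound {\min(2^n,2^{-n})/(1+n^2)}, and that this bound is absolutely summable:

```python
def summand(n):
    # 2^n / ((1 + n^2) (1 + 2^(2n)))
    return 2.0**n / ((1 + n * n) * (1 + 2.0**(2 * n)))

def bound(n):
    # the unified bound min(2^n, 2^-n) / (1 + n^2) obtained from (2)
    return min(2.0**n, 2.0**(-n)) / (1 + n * n)

# the summand is dominated by the bound over a large range of n ...
assert all(summand(n) <= bound(n) for n in range(-50, 51))

# ... and the bound itself has an absolutely convergent sum
print(sum(bound(n) for n in range(-200, 201)))
```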

One of the key advantages of dealing with order of magnitude estimates, as opposed to sharp inequalities, is that the arithmetic becomes tropical. More explicitly, we have the important rule

\displaystyle  X + Y \asymp \max(X,Y)

whenever {X,Y} are non-negative, since we clearly have

\displaystyle  \max(X,Y) \leq X+Y \leq 2 \max(X,Y).

In particular, if {Y \leq X}, then {O(X) + O(Y) = O(X)}. That is to say, given two orders of magnitudes, any term {O(Y)} of equal or lower order to a “main term” {O(X)} can be discarded. This is a very useful rule to keep in mind when trying to estimate sums or integrals, as it allows one to discard many terms that are not contributing to the final answer. It also interacts well with monotone operations, such as raising to a power {p}; for instance, we have

\displaystyle  (X+Y)^p \asymp \max(X,Y)^p = \max(X^p,Y^p) \asymp X^p + Y^p

if {X,Y \geq 0} and {p} is a fixed positive constant, whilst

\displaystyle  \frac{1}{X+Y} \asymp \frac{1}{\max(X,Y)} = \min(\frac{1}{X}, \frac{1}{Y})

if {X,Y>0}. Finally, this relation also sets up the fundamental divide and conquer strategy for estimation: if one wants to prove a bound such as {A \ll X}, it will suffice to obtain a decomposition

\displaystyle  A = A_1 + \dots + A_k

or at least an upper bound

\displaystyle  A \ll A_1 + \dots + A_k

of {A} by some bounded number of components {A_1,\dots,A_k}, and establish the bounds {A_1 \ll X, \dots, A_k \ll X} separately. Typically the {A_1,\dots,A_k} will be (morally at least) smaller than the original quantity {A} – for instance, if {A} is a sum of non-negative quantities, each of the {A_i} might be a subsum of those same quantities – which means that such a decomposition is a “free move”, in the sense that it does not risk making the problem harder. (This is because, if the original bound {A \ll X} is to be true, each of the new objectives {A_1 \ll X, \dots, A_k \ll X} must also be true, and so the decomposition can only make the problem logically easier, not harder.) The only costs to such decomposition are that your proofs might be {k} times longer, as you may be repeating the same arguments {k} times, and that the implied constants in the {A_1 \ll X, \dots, A_k \ll X} bounds may be worse than the implied constant in the original {A \ll X} bound. However, in many cases these costs are well worth the benefits of being able to simplify the problem into smaller pieces. As mentioned above, once one successfully executes a divide and conquer strategy, one can go back and try to reduce the number of decompositions, for instance by unifying components that are treated by similar methods, or by replacing strong but unwieldy estimates with weaker, but more convenient estimates.

The above divide and conquer strategy does not directly apply when one is decomposing into an unbounded number of pieces {A_j}, {j=1,2,\dots}. In such cases, one needs an additional gain in the index {j} that is summable in {j} in order to conclude. For instance, if one wants to establish a bound of the form {A \ll X}, and one has located a decomposition or upper bound

\displaystyle  A \ll \sum_{j=1}^\infty A_j

that looks promising for the problem, then it would suffice to obtain exponentially decaying bounds such as

\displaystyle  A_j \ll 2^{-cj} X

for all {j \geq 1} and some constant {c>0}, since this would imply

\displaystyle  A \ll \sum_{j=1}^\infty 2^{-cj} X \ll X \ \ \ \ \ (7)

thanks to the geometric series formula. (Here it is important that the implied constants in the asymptotic notation are uniform in {j}; a {j}-dependent bound such as {A_j \ll_j 2^{-cj} X} would be useless for this application, as then the growth of the implied constant in {j} could overwhelm the exponential decay in the {2^{-cj}} factor). Exponential decay is in fact overkill; polynomial decay such as

\displaystyle  A_j \ll \frac{X}{j^{1+c}}

would already be sufficient, although harmonic decay such as

\displaystyle  A_j \ll \frac{X}{j} \ \ \ \ \ (8)

is not quite enough (the sum {\sum_{j=1}^\infty \frac{1}{j}} diverges logarithmically), although in many such situations one could try to still salvage the bound by working a lot harder to squeeze some additional logarithmic factors out of one’s estimates. For instance, if one can improve (8) to

\displaystyle  A_j \ll \frac{X}{j \log^{1+c} j}

for all {j \geq 2} and some constant {c>0}, then the desired bound {A \ll X} follows, since (by the integral test) the sum {\sum_{j=2}^\infty \frac{1}{j\log^{1+c} j}} converges (and one can treat the {j=1} term separately if one already has (8)).

Sometimes, when trying to prove an estimate such as {A \ll X}, one has identified a promising decomposition with an unbounded number of terms

\displaystyle  A \ll \sum_{j=1}^J A_j

(where {J} is finite but unbounded) but is unsure of how to proceed next. Often the next thing to do is to study the extreme terms {A_1} and {A_J} of this decomposition, and first try to establish (the presumably simpler) tasks of showing that {A_1 \ll X} and {A_J \ll X}. Often once one does so, it becomes clear how to combine the treatments of the two extreme cases to also treat the intermediate cases, obtaining a bound {A_j \ll X} for each individual term, leading to the inferior bound {A \ll JX}; this can then be used as a starting point to hunt for additional gains, such as the exponential or polynomial gains mentioned previously, that could be used to remove this loss of {J}. (There are more advanced techniques, such as those based on controlling moments such as the square function {(\sum_{j=1}^J |A_j|^2)^{1/2}}, or trying to understand the precise circumstances in which a “large values” scenario {|A_j| \gg X} occurs, and how these scenarios interact with each other for different {j}, but these are beyond the scope of this post, as they are rarely needed when dealing with sums or integrals of elementary functions.)

If one is faced with the task of estimating a doubly infinite sum {\sum_{j=-\infty}^\infty A_j}, it can often be useful to first think about how one would proceed in estimating {A_j} when {j} is very large and positive, and how one would proceed when {j} is very large and negative. In many cases, one can simply decompose the sum into two pieces such as {\sum_{j=1}^\infty A_j} and {\sum_{j=-\infty}^{-1} A_j} and use whatever methods you came up with to handle the two extreme cases; in some cases one also needs a third argument to handle the case when {j} is of bounded (or somewhat bounded) size, in which case one may need to divide into three pieces such as {\sum_{j=J_+}^\infty A_j}, {\sum_{j=-\infty}^{J_-} A_j}, and {\sum_{j=J_-+1}^{J_+-1} A_j}. Sometimes there will be a natural candidate for the places {J_-, J_+} where one is cutting the sum, but in other situations it may be best to just leave these cut points as unspecified parameters initially, obtain bounds that depend on these parameters, and optimize at the end. (Typically, the optimization proceeds by trying to balance the magnitude of a term that is increasing with respect to a parameter, with one that is decreasing. For instance, if one ends up with a bound such as {A \lambda + B/\lambda} for some parameter {\lambda>0} and quantities {A,B>0}, it makes sense to select {\lambda = \sqrt{B/A}} to balance the two terms. Or, if faced with something like {A e^{-\lambda} + \lambda} for some {A > 2}, then something like {\lambda = \log A} would be close to the optimal choice of parameter. And so forth.)
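
The balancing heuristic in the last parenthetical is easy to test numerically; here is a small Python sketch (with arbitrarily chosen values of {A} and {B}) confirming that {\lambda = \sqrt{B/A}} minimizes {A\lambda + B/\lambda} to within the spacing of a brute-force grid:

```python
import math

def cost(lam, A, B):
    # the expression A*lambda + B/lambda that we wish to minimize over lambda > 0
    return A * lam + B / lam

A, B = 3.0, 7.0                  # arbitrary positive test values
lam_star = math.sqrt(B / A)      # the balanced choice of parameter

# brute-force minimization over a grid of candidate lambdas
grid = [0.01 * k for k in range(1, 10000)]
best = min(grid, key=lambda lam: cost(lam, A, B))
print(lam_star, best)            # these agree up to the grid spacing

# at the balanced choice, the increasing and decreasing terms are exactly equal
assert abs(A * lam_star - B / lam_star) < 1e-9
```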

— 1.1. Psychological distinctions between exact and asymptotic arithmetic —

The adoption of the “divide and conquer” strategy requires a certain mental shift from the “simplify, simplify” strategy that one is taught in high school algebra. In the latter strategy, one tries to collect terms in an expression to make it as short as possible, for instance by working with a common denominator, with the idea that unified and elegant-looking expressions are “simpler” than sprawling expressions with many terms. In contrast, the divide and conquer strategy is intentionally extremely willing to greatly increase the total length of the expressions to be estimated, so long as each individual component of the expressions appears easier to estimate than the original one. Both strategies are still trying to reduce the original problem to a simpler problem (or collection of simpler sub-problems), but the metric by which one judges whether the problem has become simpler is rather different.

A related mental shift that one needs to adopt in analysis is to move away from the exact identities that are so prized in algebra (and in undergraduate calculus), as the precision they offer is often unnecessary and distracting for the task at hand, and often fail to generalize to more complicated contexts in which exact identities are no longer available. As a simple example, consider the task of estimating the expression

\displaystyle  \int_0^a \frac{dx}{1+x^2}

where {a > 0} is a parameter. With a trigonometric substitution, one can evaluate this expression exactly as {\mathrm{arctan}(a)}, however the presence of the arctangent can be inconvenient if one has to do further estimation tasks (for instance, if {a} depends in a complicated fashion on other parameters, which one then also wants to sum or integrate over). Instead, by observing the trivial bounds

\displaystyle  \int_0^a \frac{dx}{1+x^2} \leq \int_0^a\ dx = a


and

\displaystyle  \int_0^a \frac{dx}{1+x^2} \leq \int_0^\infty\ \frac{dx}{1+x^2} = \frac{\pi}{2}

one can combine them using (2) to obtain the upper bound

\displaystyle  \int_0^a \frac{dx}{1+x^2} \leq \min( a, \frac{\pi}{2} ) \asymp \min(a,1)

and similar arguments also give the matching lower bound, thus

\displaystyle  \int_0^a \frac{dx}{1+x^2} \asymp \min(a,1). \ \ \ \ \ (9)

This bound, while cruder than the exact answer of {\mathrm{arctan}(a)}, is often good enough for many applications (particularly in situations where one is willing to concede constants in the bounds), and can be more tractable to work with than the exact answer. Furthermore, these arguments can be adapted without difficulty to treat similar expressions such as

\displaystyle  \int_0^a \frac{dx}{(1+x^2)^\alpha}

for any fixed exponent {\alpha>0}, which need not have closed form exact expressions in terms of elementary functions such as the arctangent when {\alpha} is non-integer.
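
As a sanity check of (9) (a numerical sketch only), one can verify that the exact value {\mathrm{arctan}(a)} and the bound {\min(a,1)} stay within absolute constants of each other over many orders of magnitude of {a}:

```python
import math

def ratio(a):
    # arctan(a) is the exact value of the integral; min(a, 1) is the claimed bound
    return math.atan(a) / min(a, 1.0)

samples = [10.0**k for k in range(-6, 7)]   # a ranging over 12 orders of magnitude
ratios = [ratio(a) for a in samples]
print(min(ratios), max(ratios))  # stays between pi/4 and pi/2
```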

As a general rule, instead of relying exclusively on exact formulae, one should seek approximations that are valid up to the degree of precision that one seeks in the final estimate. For instance, suppose one wishes to establish the bound

\displaystyle  \sec(x) - \cos(x) = x^2 + O(x^3)

for all sufficiently small {x}. If one was clinging to the exact identity mindset, one could try to look for some trigonometric identity to simplify the left-hand side exactly, but the quicker (and more robust) way to proceed is just to use Taylor expansion up to the specified accuracy {O(x^3)} to obtain

\displaystyle  \cos(x) = 1 - \frac{x^2}{2} + O(x^3)

which one can invert using the geometric series formula {(1-y)^{-1} = 1 + y + y^2 + \dots} to obtain

\displaystyle  \sec(x) = 1 + \frac{x^2}{2} + O(x^3)

from which the claim follows. (One could also have computed the Taylor expansion of {\sec(x)} by repeatedly differentiating the secant function, but as this is a series that is usually not memorized, this can take a little bit more time than just computing it directly to the required accuracy as indicated above.) Note that the notion of “specified accuracy” may have to be interpreted in a relative sense if one is planning to multiply or divide several estimates together. For instance, if one wishes to establish the bound

\displaystyle  \sin(x) \cos(x) = x + O(x^3)

for small {x}, one needs an approximation

\displaystyle  \sin(x) = x + O(x^3)

to the sine function that is accurate to order {O(x^3)}, but one only needs an approximation

\displaystyle  \cos(x) = 1 + O(x^2)

to the cosine function that is accurate to order {O(x^2)}, because the cosine is to be multiplied by {\sin(x)= O(x)}. Here the key is to obtain estimates that have a relative error of {O(x^2)}, compared to the main term (which is {1} for cosine, and {x} for sine).
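
One can sanity-check the earlier expansion {\sec(x) - \cos(x) = x^2 + O(x^3)} numerically (a sketch; in fact the error here is {O(x^4)}, since the {x^3} terms cancel by symmetry):

```python
import math

def err(x):
    # |sec(x) - cos(x) - x^2|, which should be O(x^3) for small x
    return abs(1.0 / math.cos(x) - math.cos(x) - x * x)

# the normalized error err(x)/x^3 stays bounded (indeed tends to zero) as x -> 0
for x in [0.1, 0.01, 0.001]:
    print(x, err(x) / x**3)
```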

The following table lists some common approximations that can be used to simplify expressions when one is only interested in order of magnitude bounds (with {c>0} an arbitrary small constant):

The quantity… has magnitude comparable to … provided that…
{X+Y} {X} {0 \leq Y \ll X} or {|Y| \leq (1-c)X}
{X+Y} {\max(X,Y)} {X,Y \geq 0}
{\sin z}, {\tan z}, {e^{iz}-1} {|z|} {|z| \leq \frac{\pi}{2} - c}
{\cos z} {1} {|z| \leq \pi/2 - c}
{\sin x} {\mathrm{dist}(x, \pi {\bf Z})} {x} real
{e^{ix}-1} {\mathrm{dist}(x, 2\pi {\bf Z})} {x} real
{\mathrm{arcsin} x} {|x|} {|x| \leq 1-c}
{\log(1+z)} {|z|} {|z| \leq 1-c}
{e^z-1}, {\sinh z}, {\tanh z} {|z|} {|z| \leq \frac{\pi}{2}-c}
{\cosh z} {1} {|z| \leq \frac{\pi}{2}-c}
{\sinh x}, {\cosh x} {e^x} {|x| \gg 1}
{\tanh x} {\min(|x|, 1)} {x} real
{(1+x)^a-1} {a|x|} {a \gg 1}, {a |x| \ll 1}
{n!} {n^n e^{-n} \sqrt{n}} {n \geq 1}
{\Gamma(s)} {|s^s e^{-s}| / |s|^{1/2}} {|s| \gg 1}, {|\mathrm{arg} s| \leq \frac{\pi}{2} - c}
{\Gamma(\sigma+it)} {|t|^{\sigma-1/2} e^{-\pi |t|/2}} {\sigma = O(1)}, {|t| \gg 1}
{\binom{n}{m}} {e^{n (p \log \frac{1}{p} + (1-p) \log \frac{1}{1-p})} / n^{1/2}} {m=pn}, {c < p < 1-c}
{\binom{n}{m}} {2^n e^{-2(m-n/2)^2/n} / n^{1/2}} {m = n/2 + O(n^{2/3})}
{\binom{n}{m}} {n^m/m!} {m \ll \sqrt{n}}
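
Rows of this table are straightforward to test numerically; for instance, the following Python sketch checks the Stirling-type row {n! \asymp n^n e^{-n} \sqrt{n}} (the exact ratio in fact converges to {\sqrt{2\pi} \approx 2.5066}):

```python
import math

def stirling_ratio(n):
    # n! divided by n^n e^{-n} sqrt(n); comparability means this ratio
    # stays between two absolute constants for all n >= 1
    return math.factorial(n) / (n**n * math.exp(-n) * math.sqrt(n))

for n in [1, 2, 5, 10, 50, 100]:
    print(n, stirling_ratio(n))  # decreases from e toward sqrt(2*pi)
```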

On the other hand, some exact formulae are still very useful, particularly if the end result of that formula is clean and tractable to work with (as opposed to involving somewhat exotic functions such as the arctangent). The geometric series formula, for instance, is an extremely handy exact formula, so much so that it is often desirable to control summands by a geometric series purely to use this formula (we already saw an example of this in (7)). Exact integral identities, such as

\displaystyle  \frac{1}{a} = \int_0^\infty e^{-at}\ dt

or more generally

\displaystyle  \frac{\Gamma(s)}{a^s} = \int_0^\infty e^{-at} t^{s-1}\ dt

for {a,s>0} (where {\Gamma} is the Gamma function) are also quite commonly used, and fundamental exact integration rules such as the change of variables formula, the Fubini-Tonelli theorem or integration by parts are all essential tools for an analyst trying to prove estimates. Because of this, it is often desirable to estimate a sum by an integral. The integral test is a classic example of this principle in action: a more quantitative version of this test is the bound

\displaystyle  \int_{a}^{b+1} f(t)\ dt \leq \sum_{n=a}^b f(n) \leq \int_{a-1}^b f(t)\ dt \ \ \ \ \ (10)

whenever {a \leq b} are integers and {f: [a-1,b+1] \rightarrow {\bf R}} is monotone decreasing, or the closely related bound

\displaystyle  \sum_{a \leq n \leq b} f(n) = \int_a^b f(t)\ dt + O( |f(a)| + |f(b)| ) \ \ \ \ \ (11)

whenever {a \leq b} are reals and {f: [a,b] \rightarrow {\bf R}} is monotone (either increasing or decreasing); see Lemma 2 of this previous post. Such bounds allow one to switch back and forth quite easily between sums and integrals as long as the summand or integrand behaves in a mostly monotone fashion (for instance, if it is monotone increasing on one portion of the domain and monotone decreasing on the other). For more precision, one could turn to more advanced relationships between sums and integrals, such as the Euler-Maclaurin formula or the Poisson summation formula, but these are beyond the scope of this post.
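
As a concrete illustration of (10), here is a numerical sketch with the monotone decreasing choice {f(t) = 1/t^2}:

```python
def f(t):
    # a monotone decreasing function on [a-1, b+1]
    return 1.0 / (t * t)

def integral_f(lo, hi):
    # exact integral of 1/t^2 over [lo, hi], via the antiderivative -1/t
    return 1.0 / lo - 1.0 / hi

a, b = 2, 50
s = sum(f(n) for n in range(a, b + 1))

# the two-sided integral test bound (10)
assert integral_f(a, b + 1) <= s <= integral_f(a - 1, b)
print(integral_f(a, b + 1), s, integral_f(a - 1, b))
```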

Exercise 1 Suppose {f: {\bf R} \rightarrow {\bf R}^+} obeys the quasi-monotonicity property {f(x) \ll f(y)} whenever {y-1 \leq x \leq y}. Show that {\int_a^{b-1} f(t)\ dt \ll \sum_{n=a}^b f(n) \ll \int_a^{b+1} f(t)\ dt} for any integers {a < b}.

Exercise 2 Use (11) to obtain the “cheap Stirling approximation”

\displaystyle  n! = \exp( n \log n - n + O(\log n) )

for any natural number {n \geq 2}. (Hint: take logarithms to convert the product {n! = 1 \times 2 \times \dots \times n} into a sum.)

With practice, you will be able to identify any term in a computation which is already “negligible” or “acceptable” in the sense that its contribution is always going to lead to an error that is smaller than the desired accuracy of the final estimate. One can then work “modulo” these negligible terms and discard them as soon as they appear. This can help remove a lot of clutter in one’s arguments. For instance, if one wishes to establish an asymptotic of the form

\displaystyle  A = X + O(Y)

for some main term {X} and lower order error {O(Y)}, any component of {A} that one can already identify to be of size {O(Y)} is negligible and can be removed from {A} “for free”. Conversely, it can be useful to add negligible terms to an expression, if it makes the expression easier to work with. For instance, suppose one wants to estimate the expression

\displaystyle  \sum_{n=1}^N \frac{1}{n^2}. \ \ \ \ \ (12)

This is a partial sum for the zeta function

\displaystyle  \sum_{n=1}^\infty \frac{1}{n^2} = \zeta(2) = \frac{\pi^2}{6}

so it can make sense to add and subtract the tail {\sum_{n=N+1}^\infty \frac{1}{n^2}} to the expression (12) to rewrite it as

\displaystyle  \frac{\pi^2}{6} - \sum_{n=N+1}^\infty \frac{1}{n^2}.

To deal with the tail, we switch from a sum to the integral using (10) to bound

\displaystyle  \sum_{n=N+1}^\infty \frac{1}{n^2} \ll \int_N^\infty \frac{1}{t^2}\ dt = \frac{1}{N}

giving us the reasonably accurate bound

\displaystyle  \sum_{n=1}^N \frac{1}{n^2} = \frac{\pi^2}{6} - O(\frac{1}{N}).

One can sharpen this approximation somewhat using (11) or the Euler–Maclaurin formula; we leave this to the interested reader.
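
The accuracy of this approximation is easy to observe numerically; a small sketch:

```python
import math

def partial_zeta2(N):
    # the partial sum (12)
    return sum(1.0 / (n * n) for n in range(1, N + 1))

for N in [10, 100, 1000]:
    tail = math.pi**2 / 6 - partial_zeta2(N)
    print(N, tail, 1.0 / N)  # the tail is positive and bounded by 1/N
```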

Another psychological shift when switching from algebraic simplification problems to estimation problems is that one has to be prepared to let go of constraints in an expression that complicate the analysis. Suppose for instance we now wish to estimate the variant

\displaystyle  \sum_{1 \leq n \leq N, \hbox{ square-free}} \frac{1}{n^2}

of (12), where we are now restricting {n} to be square-free. An identity from analytic number theory (the Euler product identity) lets us calculate the exact sum

\displaystyle  \sum_{n \geq 1, \hbox{ square-free}} \frac{1}{n^2} = \frac{\zeta(2)}{\zeta(4)} = \frac{15}{\pi^2}

so as before we can write the desired expression as

\displaystyle  \frac{15}{\pi^2} - \sum_{n > N, \hbox{ square-free}} \frac{1}{n^2}.

Previously, we applied the integral test (10), but this time we cannot do so, because the restriction to square-free integers destroys the monotonicity. But we can simply remove this restriction:

\displaystyle  \sum_{n > N, \hbox{ square-free}} \frac{1}{n^2} \leq \sum_{n > N} \frac{1}{n^2}.

Heuristically at least, this move only “costs us a constant”, since a positive fraction ({1/\zeta(2)= 6/\pi^2}, in fact) of all integers are square-free. Now that this constraint has been removed, we can use the integral test as before and obtain the reasonably accurate asymptotic

\displaystyle  \sum_{1 \leq n \leq N, \hbox{ square-free}} \frac{1}{n^2} = \frac{15}{\pi^2} + O(\frac{1}{N}).
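
Again one can check this asymptotic numerically (a sketch; the trial-division square-free test below is adequate for small {N}):

```python
import math

def is_squarefree(n):
    # trial division: n is square-free iff no square d^2 > 1 divides it
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True

def squarefree_sum(N):
    return sum(1.0 / (n * n) for n in range(1, N + 1) if is_squarefree(n))

limit = 15.0 / math.pi**2
for N in [100, 1000]:
    print(N, limit - squarefree_sum(N), 1.0 / N)  # error is positive and O(1/N)
```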

— 2. More on decomposition —

The way in which one decomposes a sum or integral such as {\sum_{n \in A} f(n)} or {\int_A f(x)\ dx} is often guided by the “geometry” of {f}, and in particular where {f} is large or small (or whether various component terms in {f} are large or small relative to each other). For instance, if {f(x)} comes close to a maximum at some point {x=x_0}, then it may make sense to decompose based on the distance {|x-x_0|} to {x_0}, or perhaps to treat the cases {x \leq x_0} and {x>x_0} separately. (Note that {x_0} does not literally have to be the maximum in order for this to be a reasonable decomposition; if it is “within reasonable distance” of the maximum, this could still be a good move. As such, it is often not worthwhile to try to compute the maximum of {f} exactly, especially if this exact formula ends up being too complicated to be useful.)

If an expression involves a distance {|X-Y|} between two quantities {X,Y}, it is sometimes useful to split into the case {|X| \leq |Y|/2} where {X} is much smaller than {Y} (so that {|X-Y| \asymp |Y|}), the case {|Y| \leq |X|/2} where {Y} is much smaller than {X} (so that {|X-Y| \asymp |X|}), or the case when neither of the two previous cases apply (so that {|X| \asymp |Y|}). The factors of {2} here are not of critical importance; the point is that in each of these three cases, one has some hope of simplifying the expression into something more tractable. For instance, suppose one wants to estimate the expression

\displaystyle  \int_{-\infty}^\infty \frac{dx}{(1+(x-a)^2) (1+(x-b)^2)} \ \ \ \ \ (13)

in terms of the two real parameters {a, b}, which we will take to be distinct for sake of this discussion. This particular integral is simple enough that it can be evaluated exactly (for instance using contour integration techniques), but in the spirit of Principle 1, let us avoid doing so and instead try to decompose this expression into simpler pieces. A graph of the integrand reveals that it peaks when {x} is near {a} or near {b}. Inspired by this, one can decompose the region of integration into three pieces:

  • (i) The region where {|x-a| \leq \frac{|a-b|}{2}}.
  • (ii) The region where {|x-b| \leq \frac{|a-b|}{2}}.
  • (iii) The region where {|x-a|, |x-b| > \frac{|a-b|}{2}}.

(This is not the only way to cut up the integral, but it will suffice. Often there is no “canonical” or “elegant” way to perform the decomposition; one should just try to find a decomposition that is convenient for the problem at hand.)

The reason why we want to perform such a decomposition is that in each of the three cases, one can simplify how the integrand depends on {x}. For instance, in region (i), we see from the triangle inequality that {|x-b|} is now comparable to {|a-b|}, so that this contribution to (13) is comparable to

\displaystyle  \asymp \int_{|x-a| \leq |a-b|/2} \frac{dx}{(1+(x-a)^2) (1+(a-b)^2)}.

Using a variant of (9), this expression is comparable to

\displaystyle  \asymp \min( 1, |a-b|/2) \frac{1}{1+(a-b)^2} \asymp \frac{\min(1, |a-b|)}{1+(a-b)^2}. \ \ \ \ \ (14)

The contribution of region (ii) can be handled similarly, and is also comparable to (14). Finally, in region (iii), we see from the triangle inequality that {|x-a|, |x-b|} are now comparable to each other, and so the contribution of this region is comparable to

\displaystyle  \asymp \int_{|x-a|, |x-b| > |a-b|/2} \frac{dx}{(1+(x-a)^2)^2}.

Now that we have centered the integral around {x=a}, we will discard the {|x-b| > |a-b|/2} constraint, upper bounding this integral by

\displaystyle  \asymp \int_{|x-a| > |a-b|/2} \frac{dx}{(1+(x-a)^2)^2}.

On the one hand this integral is bounded by

\displaystyle  \int_{-\infty}^\infty \frac{dx}{(1+(x-a)^2)^2} = \int_{-\infty}^\infty \frac{dx}{(1+x^2)^2} \asymp 1

and on the other hand we can bound

\displaystyle  \int_{|x-a| > |a-b|/2} \frac{dx}{(1+(x-a)^2)^2} \leq \int_{|x-a| > |a-b|/2} \frac{dx}{(x-a)^4}

\displaystyle \asymp |a-b|^{-3}

and so we can bound the contribution of (iii) by {O( \min( 1, |a-b|^{-3} ))}. Putting all this together, and dividing into the cases {|a-b| \leq 1} and {|a-b| > 1}, one can soon obtain a total bound of {O(\min( 1, |a-b|^{-2}))} for the entire integral. One can also adapt this argument to show that this bound is sharp up to constants, thus

\displaystyle  \int_{-\infty}^\infty \frac{dx}{(1+(x-a)^2) (1+(x-b)^2)} \asymp \min( 1, |a-b|^{-2})

\displaystyle  \asymp \frac{1}{1+|a-b|^2}.
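
This two-sided bound can be checked with a crude numerical quadrature (a Riemann-sum sketch; the tolerances are deliberately loose):

```python
def integrand(x, a, b):
    return 1.0 / ((1 + (x - a)**2) * (1 + (x - b)**2))

def integral(a, b, lo=-500.0, hi=500.0, n=200000):
    # midpoint Riemann sum; the integrand decays like |x|^-4, so truncating
    # the domain to [lo, hi] loses a negligible amount
    h = (hi - lo) / n
    return h * sum(integrand(lo + (k + 0.5) * h, a, b) for k in range(n))

for a, b in [(0.0, 0.5), (0.0, 3.0), (0.0, 30.0)]:
    ratio = integral(a, b) / (1.0 / (1 + (a - b)**2))
    print(a, b, ratio)  # the ratio stays between absolute constants
```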

A powerful and common type of decomposition is dyadic decomposition. If the summand or integrand involves some quantity {Q} in a key way, it is often useful to break up into dyadic regions such as {2^{j-1} \leq Q < 2^{j}}, so that {Q \sim 2^j}, and then sum over {j}. (One can tweak the dyadic range {2^{j-1} \leq Q < 2^{j}} here with minor variants such as {2^{j} < Q \leq 2^{j+1}}, or replace the base {2} by some other base, but these modifications have at most a minor aesthetic impact on the arguments.) For instance, one could break up a sum

\displaystyle  \sum_{n=1}^{\infty} f(n) \ \ \ \ \ (15)

into dyadic pieces

\displaystyle  \sum_{j=1}^\infty \sum_{2^{j-1} \leq n < 2^{j}} f(n)

and then seek to estimate each dyadic block {\sum_{2^{j-1} \leq n < 2^{j}} f(n)} separately (hoping to get some exponential or polynomial decay in {j}). The classical technique of Cauchy condensation is a basic example of this strategy. But one can also dyadically decompose other quantities than {n}. For instance one can perform a “vertical” dyadic decomposition (in contrast to the “horizontal” one just performed) by rewriting (15) as

\displaystyle  \sum_{k \in {\bf Z}} \sum_{n \geq 1: 2^{k-1} \leq f(n) < 2^k} f(n);

since the summand {f(n)} is {\asymp 2^k}, we may simplify this to

\displaystyle  \asymp \sum_{k \in {\bf Z}} 2^k \# \{ n \geq 1: 2^{k-1} \leq f(n) < 2^k\}.

This now converts the problem of estimating the sum (15) to the more combinatorial problem of estimating the size of the dyadic level sets {\{ n \geq 1: 2^{k-1} \leq f(n) < 2^k\}} for various {k}. In a similar spirit, we have

\displaystyle  \int_A f(x)\ dx \asymp \sum_{k \in {\bf Z}} 2^k | \{ x \in A: 2^{k-1} \leq f(x) < 2^k \}|

where {|E|} denotes the Lebesgue measure of a set {E}, and now we are faced with a geometric problem of estimating the measure of some explicit set. This allows one to use geometric intuition to solve the problem, instead of multivariable calculus:

Exercise 3 Let {S} be a smooth compact submanifold of {{\bf R}^d}. Establish the bound

\displaystyle  \int_{B(0,C)} \frac{dx}{\varepsilon^2 + \mathrm{dist}(x,S)^2} \ll \varepsilon^{-1}

for all {0 < \varepsilon < C}, where the implied constants are allowed to depend on {C, d, S}. (This can be accomplished either by a vertical dyadic decomposition, or a dyadic decomposition of the quantity {\mathrm{dist}(x,S)}.)

Exercise 4 Solve problem (ii) from the introduction to this post by dyadically decomposing in the {d} variable.
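
The vertical dyadic decomposition can also be illustrated numerically; the following sketch (with the concrete choice {f(n) = 1/n^2}, truncated at {n \leq 10^4}) confirms that replacing each {f(n)} by its dyadic endpoint {2^k} changes the sum by at most a factor of {2}:

```python
import math
from collections import Counter

def f(n):
    return 1.0 / (n * n)

N = 10000
direct = sum(f(n) for n in range(1, N + 1))

# bucket each n according to its dyadic level set 2^(k-1) <= f(n) < 2^k
levels = Counter()
for n in range(1, N + 1):
    k = math.floor(math.log2(f(n))) + 1
    levels[k] += 1

# the level-set sum, with f(n) replaced by 2^k on each level set
dyadic = sum(2.0**k * count for k, count in levels.items())
print(direct, dyadic)
assert direct <= dyadic <= 2 * direct
```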

Remark 5 By such tools as (10), (11), or Exercise 1, one could convert the dyadic sums one obtains from dyadic decomposition into integral variants. However, if one wished, one could “cut out the middle-man” and work with continuous dyadic decompositions rather than discrete ones. Indeed, from the integral identity

\displaystyle  \int_0^\infty 1_{\lambda < Q \leq 2\lambda} \frac{d\lambda}{\lambda} = \log 2

for any {Q>0}, together with the Fubini–Tonelli theorem, we obtain the continuous dyadic decomposition

\displaystyle  \sum_{n \in A} f(n) = \frac{1}{\log 2} \int_0^\infty \sum_{n \in A: \lambda < Q(n) \leq 2\lambda} f(n)\ \frac{d\lambda}{\lambda}

for any quantity {Q(n)} that is positive whenever {f(n)} is positive. Similarly if we work with integrals {\int_A f(x)\ dx} rather than sums. This version of dyadic decomposition is occasionally a little more convenient to work with, particularly if one then wants to perform various changes of variables in the {\lambda} parameter which would be tricky to execute if this were a discrete variable.

— 3. Exponential weights —

Many sums involve expressions that are “exponentially large” or “exponentially small” in some parameter. A basic rule of thumb is that any quantity that is “exponentially small” will likely give a negligible contribution when compared against quantities that are not exponentially small. For instance, if an expression involves a term of the form {e^{-Q}} for some non-negative quantity {Q}, which can be bounded on at least one portion of the domain of summation or integration, then one expects the region where {Q} is bounded to provide the dominant contribution. For instance, if one wishes to estimate the integral

\displaystyle  \int_0^\infty e^{-\varepsilon x} \frac{dx}{1+x}

for some {0 < \varepsilon < 1/2}, this heuristic suggests that the dominant contribution should come from the region {x = O(1/\varepsilon)}, in which one can bound {e^{-\varepsilon x}} simply by {1} and obtain an upper bound of

\displaystyle  \ll \int_{x = O(1/\varepsilon)} \frac{dx}{1+x} \ll \log \frac{1}{\varepsilon}.

To make such a heuristic precise, one can perform a dyadic decomposition in the exponential weight {e^{-\varepsilon x}}, or equivalently perform an additive decomposition in the exponent {\varepsilon x}, for instance writing

\displaystyle  \int_0^\infty e^{-\varepsilon x} \frac{dx}{1+x} = \sum_{j=1}^\infty \int_{j-1 \leq \varepsilon x < j} e^{-\varepsilon x} \frac{dx}{1+x}.

Exercise 6 Use this decomposition to rigorously establish the bound

\displaystyle  \int_0^\infty e^{-\varepsilon x} \frac{dx}{1+x} \ll \log \frac{1}{\varepsilon}

for any {0 < \varepsilon < 1/2}.
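One can also sanity-check this bound numerically (not a substitute for the exercise). In the sketch below, the substitution {y = \log(1+x)}, which turns the integral into {\int_0^\infty e^{-\varepsilon(e^y - 1)}\ dy}, is just a quadrature convenience, and the helper name is mine:

```python
import math

# Check numerically that  int_0^inf e^{-eps x} dx/(1+x)  <<  log(1/eps).
# After the substitution y = log(1+x) the integrand becomes exp(-eps*(e^y - 1)).

def weighted_integral(eps, steps=100_000):
    Y = math.log(50.0 / eps)  # beyond this point the integrand is below e^{-49}
    h = Y / steps
    return sum(math.exp(-eps * (math.exp((k + 0.5) * h) - 1.0))
               for k in range(steps)) * h

for eps in (0.1, 0.01, 0.001):
    ratio = weighted_integral(eps) / math.log(1.0 / eps)
    print(f"eps={eps}: integral / log(1/eps) = {ratio:.3f}")  # stays bounded
```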

Exercise 7 Solve problem (i) from the introduction to this post.

More generally, if one is working with a sum or integral such as

\displaystyle  \sum_{n \in A} e^{\phi(n)} \psi(n)

or

\displaystyle  \int_A e^{\phi(x)} \psi(x)\ dx

with some exponential weight {e^\phi} and a lower order amplitude {\psi}, then one typically expects the dominant contribution to come from the region where {\phi} comes close to attaining its maximal value. If this maximum is attained on the boundary, then one typically has geometric series behavior away from the boundary, and one can often get a good estimate by summing the associated geometric series. For instance, suppose one wants to estimate the error function

\displaystyle  \mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\ dt

for {z \geq 1}. In view of the complete Gaussian integral

\displaystyle  \int_0^\infty e^{-t^2}\ dt = \frac{\sqrt{\pi}}{2}

we can rewrite this as

\displaystyle  \mathrm{erf}(z) = 1 - \frac{2}{\sqrt{\pi}} \int_z^\infty e^{-t^2}\ dt.

The exponential weight {e^{-t^2}} attains its maximum at the left endpoint {t=z} and decays quickly away from that endpoint. One could estimate this by dyadic decomposition of {e^{-t^2}} as discussed previously, but a slicker way to proceed here is to use the convexity of {t^2} to obtain a geometric series upper bound

\displaystyle  e^{-t^2} \leq e^{-z^2 - 2 z (t-z)}

for {t \geq z}, which on integration gives

\displaystyle  \int_z^\infty e^{-t^2}\ dt \leq \int_z^\infty e^{-z^2 - 2 z (t-z)}\ dt = \frac{e^{-z^2}}{2z}

giving the asymptotic

\displaystyle  \mathrm{erf}(z) = 1 - O( \frac{e^{-z^2}}{z})

for {z \geq 1}.
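One can corroborate this asymptotic numerically with Python's built-in math.erf: the normalized tail {(1 - \mathrm{erf}(z))\, z\, e^{z^2}} stays bounded for {z \geq 1} (in fact it tends to {1/\sqrt{\pi} \approx 0.56} as {z} grows):

```python
import math

# The asymptotic erf(z) = 1 - O(e^{-z^2}/z) says the normalized tail
#   (1 - erf(z)) * z * exp(z^2)
# should stay bounded for z >= 1; it in fact tends to 1/sqrt(pi) ~ 0.564.

for z in (1.0, 2.0, 3.0, 4.0):
    c = (1.0 - math.erf(z)) * z * math.exp(z * z)
    print(f"z={z}: normalized tail = {c:.4f}")
```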

Exercise 8 In the converse direction, establish the upper bound

\displaystyle  \mathrm{erf}(z) \leq 1 - c \frac{e^{-z^2}}{z}

for some absolute constant {c>0} and all {z \geq 1}.

Exercise 9 If {\theta n \leq m \leq n} for some {1/2 < \theta < 1}, show that

\displaystyle  \sum_{k=m}^n \binom{n}{k} \ll \frac{1}{2\theta-1} \binom{n}{m}.

(Hint: estimate the ratio between consecutive binomial coefficients {\binom{n}{k}} and then control the sum by a geometric series).
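A quick numerical illustration of Exercise 9 (not a substitute for the proof; the helper name is mine): normalizing the tail sum as the statement suggests, the resulting quantity stays bounded as {n} grows; indeed the geometric-series argument from the hint bounds it by {\theta}.

```python
import math

# Numerical illustration of Exercise 9: with m = ceil(theta*n), the quantity
#   (2*theta - 1) * sum_{k=m}^n C(n,k) / C(n,m)
# stays bounded (the geometric-series argument bounds it by theta).

def normalized_tail(n, theta):
    m = math.ceil(theta * n)
    tail = sum(math.comb(n, k) for k in range(m, n + 1))
    return (2 * theta - 1) * (tail / math.comb(n, m))

for n in (50, 200, 1000):
    print(n, normalized_tail(n, 0.6))
```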

When the maximum of the exponent {\phi} occurs in the interior of the region of summation or integration, then one can get good results by some version of Laplace’s method. For simplicity we will discuss this method in the context of one-dimensional integrals

\displaystyle  \int_a^b e^{\phi(x)} \psi(x)\ dx

where {\phi} attains a non-degenerate global maximum at some interior point {x = x_0}. The rule of thumb here is that

\displaystyle \int_a^b e^{\phi(x)} \psi(x)\ dx \approx \sqrt{\frac{2\pi}{|\phi''(x_0)|}} e^{\phi(x_0)} \psi(x_0).

The heuristic justification is as follows. The main contribution should be when {x} is close to {x_0}. Here we can perform a Taylor expansion

\displaystyle  \phi(x) \approx \phi(x_0) - \frac{1}{2} |\phi''(x_0)| (x-x_0)^2

since at a non-degenerate maximum we have {\phi'(x_0)=0} and {\phi''(x_0) < 0}. Also, if {\psi} is continuous, then {\psi(x) \approx \psi(x_0)} when {x} is close to {x_0}. Thus we should be able to estimate the above integral by the gaussian integral

\displaystyle  \int_{\bf R} e^{\phi(x_0) - \frac{1}{2} |\phi''(x_0)| (x-x_0)^2} \psi(x_0)\ dx

which can be computed to equal {\sqrt{\frac{2\pi}{|\phi''(x_0)|}} e^{\phi(x_0)} \psi(x_0)} as desired.
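Here is a quick numerical check of this rule of thumb on a test case of my own choosing: {\phi(x) = \lambda \cos x} and {\psi(x) = 1/(1+x^2)} on {[a,b] = [-2,2]}, with interior maximum at {x_0 = 0}, {|\phi''(0)| = \lambda}, and {\psi(0) = 1}:

```python
import math

# Test case (an arbitrary illustrative choice): phi(x) = lam*cos(x),
# psi(x) = 1/(1+x^2) on [a,b] = [-2,2].  The interior maximum is at x0 = 0,
# where phi(0) = lam, |phi''(0)| = lam, and psi(0) = 1.

lam = 30.0
a, b = -2.0, 2.0
steps = 200_000
h = (b - a) / steps

integral = sum(
    math.exp(lam * math.cos(a + (k + 0.5) * h)) / (1.0 + (a + (k + 0.5) * h) ** 2)
    for k in range(steps)
) * h

# Laplace rule of thumb: sqrt(2 pi / |phi''(x0)|) * e^{phi(x0)} * psi(x0)
prediction = math.sqrt(2.0 * math.pi / lam) * math.exp(lam)

print(integral / prediction)  # close to 1 (the discrepancy is O(1/lam))
```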

Let us illustrate how this argument can be made rigorous by considering the task of estimating the factorial {n!} of a large number. In contrast to what we did in Exercise 2, we will proceed using a version of Laplace’s method, relying on the integral representation

\displaystyle  n! = \Gamma(n+1) = \int_0^\infty x^n e^{-x}\ dx.

As {n} is large, we will consider {x^n} to be part of the exponential weight rather than the amplitude, writing this expression as

\displaystyle  \int_0^\infty e^{-\phi(x)}\ dx

where

\displaystyle  \phi(x) = x - n \log x.

The function {\phi} attains a global minimum at {x_0 = n}, with {\phi'(n) = 0} and {\phi''(n) = 1/n}. We will therefore decompose this integral into three pieces

\displaystyle  \int_0^{n-R} e^{-\phi(x)}\ dx + \int_{n-R}^{n+R} e^{-\phi(x)}\ dx + \int_{n+R}^\infty e^{-\phi(x)}\ dx \ \ \ \ \ (16)

where {0 < R < n} is a radius parameter which we will choose later, as it is not immediately obvious for now what the optimal value of this parameter is (although the previous heuristics do suggest that {R \approx 1 / |\phi''(x_0)|^{1/2}} might be a reasonable choice).

The main term is expected to be the middle term, so we shall use crude methods to bound the other two terms. For the first part where {0 < x \leq n-R}, {\phi} is decreasing (so that {\phi(x) \geq \phi(n-R)} there), and we can crudely bound {e^{-\phi(x)} \leq e^{-\phi(n-R)}} and thus

\displaystyle  \int_0^{n-R} e^{-\phi(x)}\ dx \leq (n-R) e^{-\phi(n-R)} \leq n e^{-\phi(n-R)}.

(We expect {R} to be much smaller than {n}, so there is not much point to saving the tiny {-R} term in the {n-R} factor.) For the third part where {x \geq n+R}, {\phi} is increasing, but simply bounding {e^{-\phi(x)}} by {e^{-\phi(n+R)}} would not suffice because the range of integration is infinite; some additional decay is needed. Fortunately, the derivative admits the lower bound

\displaystyle  \phi'(x) = 1 - \frac{n}{x} \geq 1 - \frac{n}{n+R} = \frac{R}{n+R}

for {x \geq n+R}, so by the mean value theorem we have

\displaystyle  \phi(x) \geq \phi(n+R) + \frac{R}{n+R} (x-n-R)

and after a short calculation this gives

\displaystyle  \int_{n+R}^\infty e^{-\phi(x)}\ dx \leq \frac{n+R}{R} e^{-\phi(n+R)} \ll \frac{n}{R} e^{-\phi(n+R)}.

Now we turn to the important middle term. If we assume {R \leq n/2}, then we will have {\phi'''(x) = O( 1/n^2 )} in the region {n-R \leq x \leq n+R}, so by Taylor’s theorem with remainder

\displaystyle  \phi(x) = \phi(n) + \phi'(n) (x-n) + \frac{1}{2} \phi''(n) (x-n)^2 + O( \frac{|x-n|^3}{n^2} )

\displaystyle  = \phi(n) + \frac{(x-n)^2}{2n} + O( \frac{R^3}{n^2} ).

If we assume that {R = O(n^{2/3})}, then the error term is bounded and we can exponentiate to obtain

\displaystyle  e^{-\phi(x)} = (1 + O(\frac{R^3}{n^2})) e^{-\phi(n) - \frac{(x-n)^2}{2n}} \ \ \ \ \ (17)

for {n-R \leq x \leq n+R} and hence

\displaystyle \int_{n-R}^{n+R} e^{-\phi(x)}\ dx = (1 + O(\frac{R^3}{n^2})) e^{-\phi(n)} \int_{n-R}^{n+R} e^{-(x-n)^2/2n}\ dx.

If we also assume that {R \gg \sqrt{n}}, we can use the error function type estimates from before to estimate

\displaystyle  \int_{n-R}^{n+R} e^{-(x-n)^2/2n}\ dx = \sqrt{2\pi n} + O( \frac{n}{R} e^{-R^2/2n} ).

Putting all this together, and using (17) to estimate {e^{-\phi(n \pm R)} \ll e^{-\phi(n) - \frac{R^2}{2n}}}, we conclude that

\displaystyle  n! = e^{-\phi(n)} ( (1 + O(\frac{R^3}{n^2})) ( \sqrt{2\pi n} + O( \frac{n}{R} e^{-R^2/2n}) )

\displaystyle  + O( n e^{-R^2/2n} ) + O( \frac{n}{R} e^{-R^2/2n} ) )

\displaystyle  = e^{-n+n \log n} (\sqrt{2\pi n} + O( \frac{R^3}{n^{3/2}} + n e^{-R^2/2n} ))

so if we select {R = n^{1/2} \log n} for instance, the error term becomes {O( \log^3 n + n e^{-(\log n)^2/2} ) = O( n^{1/3} )}, and we obtain the Stirling approximation

\displaystyle  n! = \frac{n^n}{e^n} (\sqrt{2\pi n} + O( n^{1/3}) ).

One can improve the error term by a finer decomposition than (16); we leave this as an exercise to the interested reader.
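The final approximation is easy to verify numerically (the helper name is mine; logs via lgamma avoid overflowing the factorial):

```python
import math

# Check n! = (n^n / e^n) (sqrt(2 pi n) + O(n^{1/3})) numerically.  The
# discrepancy e^n n! / n^n - sqrt(2 pi n) is computed in log space via
# lgamma to avoid overflow; the true discrepancy is ~ sqrt(2 pi n)/(12 n),
# far smaller than the O(n^{1/3}) established above, so it actually decays.

def stirling_discrepancy(n):
    ratio = math.exp(math.lgamma(n + 1) + n - n * math.log(n))  # e^n n! / n^n
    return ratio - math.sqrt(2.0 * math.pi * n)

for n in (10, 100, 1000):
    print(n, stirling_discrepancy(n))
```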

Remark 10 It can be convenient to do some initial rescalings to this analysis to achieve a nice normalization; see this previous blog post for details.

Exercise 11 Solve problem (iii) from the introduction. (Hint: extract out the term {\frac{k^{2n-4k}}{(n-k)^{2n-4k}}} to write as the exponential factor {e^{\phi(k)}}, placing all the other terms (which are of polynomial size) in the amplitude function {\psi(k)}. The function {\phi} will then attain a maximum at {k=n/2}; perform a Taylor expansion and mimic the arguments above.)

Doug NatelsonLarge magnetic fields as a scientific tool

When I was at Berkeley at the beginning of the week to give a seminar, I was fortunate enough to overlap with their departmental physics colloquium by Greg Boebinger, an accomplished scientist who is also an extremely engaging and funny speaker.  Since 2004 he has been the director of the National High Magnetic Field Lab in Tallahassee, Florida, the premier user facility for access to large magnetic fields for scientific research.  He gave a great talk that discussed both the challenges in creating very large magnetic fields and a sampling of the cool science that can be done using these capabilities.

Leaving aside spin for a moment, magnetic fields* in some reference frame are generated by currents of moving charges and changing electric fields, as in Ampère's law, \(\nabla \times \mathbf{B} = \mu_{0}\mathbf{J} + \epsilon_{0}\mu_{0}\partial_{t}\mathbf{E}\), where \(\mathbf{J}\) is the current density.  Because materials have collective responses to magnetic fields, generating within themselves some magnetization (magnetic dipole moment per volume \(\mathbf{M}\)), we can think of the magnetic field as a thermodynamic variable, like pressure.  Just as all kinds of interesting physics can be found by using pressure to tune materials between competing phases (because pressure tunes interatomic spacing, and thus things like the ability of electrons to move from atom to atom, and hence the magnitude of magnetic exchange), a magnetic field can tune materials across phase transitions.  

It's worth remembering some physically relevant scales.  The earth's magnetic field at the surface is around 30-50 microTesla.  The magnetic field at the surface of a rare earth magnet is around 1 Tesla.  The field in a typical MRI machine used for medical imaging is 1.5 or 3 T.  The energy levels for the spin of an electron in a magnetic field are set by the Zeeman effect and shift by an amount around \(\mu_{\mathrm{B}}B\), where \(\mu_{\mathrm{B}}\) is the Bohr magneton, \(9.27 \times 10^{-24}\) J/T.  A 10 T magnetic field, about what you can typically get in an ordinary lab, leads to a Zeeman energy comparable to the thermal energy scale at about 6.7 K, or to the energy gained by an electron moving through a voltage of about 0.6 mV.   In other words, magnetic fields are weak in that it generally takes a lot of current to generate a big field, and the associated energies are small compared to room temperature (\(k_{\mathrm{B}}T\) at 300 K is equivalent to 26 mV) and the eV scales relevant to chemistry.  Still, consequences can be quite profound, and even weak fields can be very useful with the right techniques. (The magnetic field at the surface of a neutron star can be \(10^{11}\) T, a staggering number in terms of energy density.)
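The arithmetic behind these scales is easy to reproduce; a short back-of-the-envelope check (constants rounded to four digits):

```python
# Reproducing the energy-scale arithmetic quoted above (rounded constants).
mu_B = 9.274e-24   # Bohr magneton, J/T
k_B = 1.381e-23    # Boltzmann constant, J/K
e = 1.602e-19      # elementary charge, C

B = 10.0                     # tesla, a typical lab-scale magnet
E_zeeman = mu_B * B          # Zeeman energy scale, J
print(E_zeeman / k_B)        # equivalent temperature: ~6.7 K
print(E_zeeman / e * 1e3)    # equivalent voltage: ~0.58 mV
print(k_B * 300 / e * 1e3)   # thermal voltage at 300 K: ~26 mV
```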

Generating large magnetic fields is a persistent technological challenge.  Superconductors can be great for driving large currents without huge dissipation, but they have their own issues of critical currents and critical fields, and the mechanical forces on the conductors can be very large (see here for a recent review).  The largest steady-state magnetic field that has been achieved with a (high-Tc) superconducting coil combined with a resistive magnet is around 45.5 T (see here as well).  At the Los Alamos outpost of the Magnet Lab, they've achieved non-destructive pulsed fields as large as 101 T (see this video).  A huge limiting factor is the challenge of making joints between superconducting wires, so that the joint itself remains superconducting at the very large currents and fields needed. 

The science that can be done with large fields extends well beyond condensed matter physics.  One example from the talk that I liked:  Remarkable resolution is possible in ion cyclotron resonance mass spectrometry, so that with a single drop of oil, it is possible to identify the contribution of the many thousands of hydrocarbon molecules in there and "fingerprint" where it came from.  

Fun stuff, and a great example of an investment in technology that would very likely never have been made by private industry alone.

* I know that \(\mathbf{B}\) is technically the magnetic induction or magnetic flux density in SI units, but colloquially everyone calls it the magnetic field, so I'll do the same here.

Doug NatelsonGenerative AI and scientific images/diagrams

Generative AI for image generation is a controversial topic for many reasons.  Still, as someone who doesn't have a staff of graphic artists on hand to help make scientific illustrations, it has certainly been tempting to see whether it might be a useful tool.  My brief experiments are based on Bing's integrated engine (which I believe is DALL-E 3), since Rice has a license.  The short summary:  Trying to make scientific illustrations this way is great at showing how this technology is not close to useful yet for this application.  While it is natively integrated with chatGPT, it still cannot parse prompts well, indicating that it doesn't really "know" the meaning of a lot of basic terms and how those relate to its image training data.

Let me give you an example.  When I asked it "Make me an image of a benzene molecule showing the chemical bonds between the carbon atoms", this is the response:  

When I tried to refine this by typing "Make an image of a benzene molecule.", it responded by showing me image search results with sources, which were all correctly benzene, but not an original image.  When I responded "Please create a new image (not from an existing source) of a benzene molecule.", it responded with "Ok, I will create a new image from scratch showing a benzene molecule." and showed me this:

I then tried more specificity in the prompt:  "Create (from scratch) an image of a benzene molecule, which consists of six carbon atoms (black spheres) linked in a ring, with one hydrogen atom (a smaller red sphere) attached to each carbon atom."  The result was this:
Closer, but I'm still surprised at how poorly any of these resemble the stock images of benzene that it correctly found through image searching on the internet.  That last one is particularly close, so I tried "The fourth image is close, but it should have only one hydrogen attached to each carbon, not three."  In classic chatGPT form, the response was "I’m sorry, I made a mistake. I’ll try to create a new image of a benzene molecule with only one hydrogen attached to each carbon. Please wait a moment."  It then produced this:
Clearly it just cannot understand or follow the instructions yet. 

Similar things happen if you ask it to make an image of a lattice.  If I ask "Make me an image from scratch of a simple cubic lattice.", the verbal and image responses are quite encouraging.  "I’ll try to create that. Here is a new image of a simple cubic lattice that I generated from scratch. A simple cubic lattice is a type of crystal structure where the atoms are arranged in a regular pattern of cubes. Each atom is located at the center of a cube and has six nearest neighbors. A simple cubic lattice is one of the simplest and most symmetric types of lattices."

This is quite close, though all of these have problems (non-orthogonal angles, uneven spacings).  Things go off the rails badly, though, if you want a face-centered cubic lattice.  It can say the definition of an FCC lattice reasonably well:  "I’ll try to create that. Here is a new image of a face-centered cubic lattice that I generated from scratch. A face-centered cubic lattice is a type of crystal structure where the atoms are arranged in a regular pattern of cubes. Each atom is located at the center of a cube and at the center of each face of the cube. A face-centered cubic lattice has 12 nearest neighbors for each atom and is one of the most common and dense types of lattices."  However, the images are, ummm, artistic.
These are quite different from the simple image search of the internet for "face-centered cubic lattice", which does a good job if you don't mind using someone else's image.  The bottom line:  These kinds of image generation tools have enormous potential to save us time, but right now the parsing technology and training methods just aren't there yet.  If a simple image search turns up the right targets, but the generation tool can't make images that look like those search results, then training has failed somewhere. 

If anyone out there has found some flavor of these image generation capabilities that really does work well for helping to make scientific illustrations, please let me know.

Update:  Google just released their free image generation tool, and while it still has problems like those above, it's pretty impressive.  For example, here is what it made from the prompt "drawing of a benzene molecule, consisting of six black carbon atoms arranged in a ring, with each carbon atom connected to one red hydrogen atom":

That's certainly better than anything DALL-E made.  Of course, the other three images that came along with that one were all screwed up.  Still, progress.


February 02, 2024

Matt von HippelMy Secret, Cap

I’d been meaning, for a while now, to write a post about how I got my permanent job. It lands a bit differently now that I’ve given that job up, but I think the post is still worth making.

Note that, while I know how things felt like, I don’t have “inside information” here. I don’t know why the hiring committee chose me, I never really got to the point where I could comfortably ask that. And I didn’t get to the point where I was on a hiring committee myself, so I never saw from the inside how they work.

Even if I had, “how I got a job” isn’t the kind of thing that has one simple answer. Academic jobs aren’t like commercial airlines or nuclear power plants, where every fail-safe has to go wrong to cause disaster. They aren’t like the highest reaches of competition in things like athletics, where a single mistake will doom you. They’re a mess of circumstances, dozens of people making idiosyncratic decisions, circumstances and effort pulling one way or another. There’s nothing you can do to guarantee yourself a job, nothing you can do so badly to screw up your chance of ever finding one, and no-one who can credibly calculate your chances.

What I can tell you is what happened, and what I eventually did differently. I started applying for permanent and tenure-track jobs in Fall 2019. I applied to four jobs that year, plus one fixed-term one: I still had funding for the next year, so I could afford to be picky. The next year, my funding was going to run out, so I applied more widely. I sent twenty-three applications, some to permanent or tenure-track jobs, but some to shorter-term positions. I got one tenure-track interview (where I did terribly), and two offers for short-term positions. I ended up turning both down after getting a surprise one-year extension where I was.

The next year was a blur of applications. From August 2021 to June 2022, I applied to at least one job every month, 45 jobs in total, and got either rejected or ghosted by all of them. I got a single interview, for a temporary position (where I again did pretty poorly). I was exhausted and heartsick, and when I was offered another one-year extension I didn’t know what to think about it.

So, I took a breath, and I stopped.

I talked to a trusted mentor, who mentioned my publications had slowed. To remedy that, I went back to three results and polished them up, speeding them out to the arXiv paper server in September. Readers of this blog know them as my cabinet of curiosities.

I got some advice from family, and friends of family. I’m descended from a long line of scientists, so this is more practically useful than it would be for most.

More important than either of those, though, I got some therapy. I started thinking about what I cared about, what mattered to me. And I think that there, from that, I figured out my real secret, the thing that ended up making the biggest difference. It wasn’t something I did, but how I thought and felt about it.

My secret to finding an academic job? Knowing you don’t need one.

I’m not saying I didn’t want the position. There were things I wanted to accomplish, things that get a lot easier with the right permanent academic job. But I realized that if I didn’t get it, it wasn’t the end of the world. I had other things I could look into, other paths that would make me happy. On one level, I almost relished the idea of the search not working, of getting some time to rediscover myself and learn something new.

If you’ve ever been lonely, someone has warned you against appearing too desperate. This always seemed patently unfair, as if people are bigoted against those who need companionship the most. But from this job search, I’ve realized there’s another reason.

During that year of applications, the most exhausting part was tailoring. In order for an application to have a chance, I’d need to look up what the other professors in the place I was applying did, come up with a story for how we might collaborate, and edit those stories in to my application materials. This took time, but worse, it felt demeaning. I was applying because I wanted a job, any job, not because I wanted to work with those particular people. It felt like I was being forced to pretend to be someone else, to feign interest in the interests of more powerful people, again and again, when almost all of them weren’t even going to consider my application in the first place.

Then, after realizing I didn’t need the jobs? I tailored more.

I read up on the research the other profs were doing. I read up on the courses the department taught, and the system to propose new courses. I read up on the outreach projects, and even the diversity initiatives.

How did I stand that, how did I stomach it? Because my motivation was different.

Once I knew I didn’t need the job, I read with a very different question in mind: not “how do I pretend I’m good enough for the job”, but, “is the job good enough for me?”

In that final search, I applied to a lot fewer positions: just ten, in the end. But for each position, I was able to find specific reasons why it would be good for me, for the goals I had and what I wanted to accomplish. I was able to tolerate the reading, to get through the boilerplate and even write a DEI essay I wasn’t totally ashamed of, because I looked at each step as a filter: not a filter that would filter me out, but a filter that would get rid of jobs that I didn’t actually want.

I don’t know for certain if this helped: academic jobs are still as random as they come, and in the end I still only got one interview. But it felt like it helped. It gave me a confidence others lacked. It let me survive applying that one more time. And because I asked the right questions, questions based on what I actually cared about, I flattered people much more effectively than I could have done by intentionally trying to flatter them.

(I think that’s an insight that carries over to dating too, by the way. Someone trying to figure out what they want is much more appealing than someone just trying to get anyone they can, because the former asks the right questions.)

In the end, I suspect my problem is that I didn’t take this attitude far enough. I got excited that I was invited to interview, excited that everyone seemed positive and friendly, and I stopped asking the right questions. I didn’t spend time touring the area, trying to figure out if there were good places to live and functional transit. I pushed aside warning signs, vibes in the group and bureaucracy in the approach. I didn’t do the research I should have to figure out if my wife and I could actually make it work.

And I’m paying for it. Going back to Denmark after six months in France is not nearly as easy, not nearly as straightforward, as just not accepting the job and looking for industry jobs in Copenhagen would have been. There’s what my wife endured in those six months, of course. But also, we won’t have the same life that we did. My wife had to quit her job, a very good long-term role. She’ll have to find something else, taking a step back in her career. We were almost able to apply for permanent residency. We should talk to an immigration lawyer, but I’m guessing we’ll have to start again from scratch. We were saving up for an apartment, but Danish banks get skittish about giving loans if you’re new to the country. (Though as I’ve learned on my job search, some of these banks are considering changing how they evaluate credit risk…so maybe there’s some hope?)

So my secret is also my warning. Whatever you’re searching for in life, remember that you can always do without it. Figure out what works for you. Don’t get locked into assuming you only have one option, that you have to accept any offer you get. You have choices, you have options. And you can always try something new.

February 01, 2024

Tommaso DorigoAn Idea For Future Calorimetry

A calorimeter in physics is something that measures heat. However, there are mainly two categories of such objects: ones that measure macroscopic amounts of heat, and ones that measure the heat released by subatomic particles when they smash against matter. I am sure you guess which is the class of instruments I am going to discuss in this article.
A further distinction among calorimeters for particle physics concerns the kind of particles these devices aim to measure. Electromagnetic calorimeters target electrons and photons, and hadronic calorimeters target particles made of quarks and gluons. Here I will discuss only the latter, which are arguably more complex to design.

Smashing protons


John PreskillThe Noncommuting-Charges World Tour (Part 1 of 4)

Introduction: “Once Upon a Time”…with a twist

Thermodynamics problems have surprisingly many similarities with fairy tales. For example, most of them begin with a familiar opening. In thermodynamics, the phrase “Consider an isolated box of particles” serves a similar purpose to “Once upon a time” in fairy tales—both serve as a gateway to their respective worlds. Additionally, both have been around for a long time. Thermodynamics emerged in the Victorian era to help us understand steam engines, while Beauty and the Beast and Rumpelstiltskin, for example, originated about 4000 years ago. Moreover, each concludes with important lessons. In thermodynamics, we learn hard truths such as the futility of defying the second law, while fairy tales often impart morals like the risks of accepting apples from strangers. The parallels go on; both feature archetypal characters—such as wise old men and fairy godmothers versus ideal gases and perfect insulators—and simplified models of complex ideas, like portraying clear moral dichotomies in narratives versus assuming non-interacting particles in scientific models.

Of all the ways thermodynamic problems are like fairy tales, one is most relevant to me: both have experienced modern reimaginings. Sometimes, all you need is a little twist to liven things up. In thermodynamics, noncommuting conserved quantities, or charges, have added a twist.

Unfortunately, my favourite fairy tale, ‘The Hunchback of Notre-Dame,’ does not start with the classic opening line ‘Once upon a time.’ For a story that begins with this traditional phrase, ‘Cinderella’ is a great choice.

First, let me recap some of my favourite thermodynamic stories before I highlight the role that the noncommuting-charge twist plays. The first is the inevitability of the thermal state. Roughly speaking, this means that, at most times, the state of most sufficiently small subsystems within the box will be close to a specific form (the thermal state).

The second is an apparent paradox that arises in quantum thermodynamics: How do the reversible processes inherent in quantum dynamics lead to irreversible phenomena such as thermalization? If you’ve been keeping up with Nicole Yunger Halpern‘s (my PhD co-advisor and fellow fan of fairy tales) recent posts on the eigenstate thermalization hypothesis (ETH) (part 1 and part 2), you already know the answer. The expectation value of a quantum observable is often a sum of contributions from many basis states, each carrying its own phase. As time passes, these phases tend to experience destructive interference, leading to a stable expectation value over a longer period. This stable value tends to align with that of a thermal state. Thus, despite the apparent paradox, stationary dynamics in quantum systems are commonplace.

The third story is about how concentrations of one quantity can cause flows in another. Imagine a box of charged particles that’s initially out of equilibrium, so that there exist gradients in particle concentration and temperature across the box. The temperature gradient will cause a flow of heat (Fourier’s law) and of charged particles (Seebeck effect), and the particle-concentration gradient will cause the same—a flow of particles (Fick’s law) and of heat (Peltier effect). These movements are encompassed within Onsager’s theory of transport dynamics…if the gradients are very small. If you’re reading this post on your computer, the Peltier effect is likely at work for you right now by cooling your computer.

What do various derivations of the thermal state’s forms, the eigenstate thermalization hypothesis (ETH), and the Onsager coefficients have in common? Each concept is founded on the assumption that the system we’re studying contains charges that commute with each other (e.g. particle number, energy, and electric charge). It’s only recently that physicists have acknowledged that this assumption was even present.

This is important to note because not all charges commute. In fact, the noncommutation of charges leads to fundamental quantum phenomena, such as the Einstein–Podolsky–Rosen (EPR) paradox, uncertainty relations, and disturbances during measurement. This raises an intriguing question. How would the above mentioned stories change if we introduce the following twist?

“Consider an isolated box with charges that do not commute with one another.” 

This question is at the core of a burgeoning subfield that intersects quantum information, thermodynamics, and many-body physics. I had the pleasure of co-authoring a recent perspective article in Nature Reviews Physics that centres on this topic. Collaborating with me in this endeavour were three members of Nicole’s group: the avid mountain climber, Billy Braasch; the powerlifter, Aleksander Lasek; and Twesh Upadhyaya, known for his prowess in street basketball. Completing our authorship team were Nicole herself and Amir Kalev.

To give you a touchstone, let me present a simple example of a system with noncommuting charges. Imagine a chain of qubits, where each qubit interacts with its nearest and next-nearest neighbours, such as in the image below.

The figure is courtesy of the talented team at Nature. Two qubits form the system S of interest, and the rest form the environment E. A qubit’s three spin components, σ_a for a = x, y, z, form the local noncommuting charges. The dynamics locally transport and globally conserve the charges.

In this interaction, the qubits exchange quanta of spin angular momentum, forming what is known as a Heisenberg spin chain. This chain is characterized by three charges: the total spin components in the x, y, and z directions, which I’ll refer to as Qx, Qy, and Qz, respectively. The Hamiltonian H conserves these charges, satisfying [H, Qa] = 0 for each a, and these three charges are noncommuting: [Qa, Qb] ≠ 0 for any pair a, b ∈ {x,y,z} with a≠b. It’s noteworthy that Hamiltonians can be constructed to transport various other kinds of noncommuting charges. I have discussed the procedure to do so in more detail here (to summarize that post: it essentially involves constructing a Koi pond).
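Both properties are easy to confirm numerically in a toy version of this setup. The sketch below is my own minimal illustration (three qubits with nearest-neighbour couplings only, plain standard-library matrix helpers rather than a physics package): the Heisenberg Hamiltonian commutes with each total-spin charge, while the charges fail to commute with one another.

```python
# Toy 3-qubit Heisenberg chain: H conserves the total-spin charges Qx, Qy, Qz
# even though the charges themselves do not commute with each other.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def kron(A, B):
    n = len(A) * len(B)
    return [[A[i // len(B)][j // len(B)] * B[i % len(B)][j % len(B)]
             for j in range(n)] for i in range(n)]

def commutator_norm(A, B):
    C = mat_add(mat_mul(A, B), [[-x for x in row] for row in mat_mul(B, A)])
    return max(abs(x) for row in C for x in row)

I2 = [[1, 0], [0, 1]]
sigma = {"x": [[0, 1], [1, 0]],
         "y": [[0, -1j], [1j, 0]],
         "z": [[1, 0], [0, -1]]}

def site_op(op, site, n=3):
    # op acting on one site of an n-qubit chain, identity elsewhere
    M = [[1]]
    for k in range(n):
        M = kron(M, op if k == site else I2)
    return M

# Total-spin charges Q_a = sum over sites of sigma_a
Q = {a: mat_add(mat_add(site_op(sigma[a], 0), site_op(sigma[a], 1)),
                site_op(sigma[a], 2)) for a in "xyz"}

# Heisenberg Hamiltonian: nearest-neighbour sigma.sigma couplings
H = [[0] * 8 for _ in range(8)]
for i, j in [(0, 1), (1, 2)]:
    for a in "xyz":
        H = mat_add(H, mat_mul(site_op(sigma[a], i), site_op(sigma[a], j)))

print(max(commutator_norm(H, Q[a]) for a in "xyz"))  # 0: charges conserved
print(commutator_norm(Q["x"], Q["y"]))               # nonzero: noncommuting
```

(The second commutator equals 2i Qz, reflecting the usual spin algebra.)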

This is the first in a series of blog posts where I will highlight key elements discussed in the perspective article. Motivated by requests from peers for a streamlined introduction to the subject, I've designed this series specifically for a target audience: graduate students in physics. Additionally, I'm gearing up to defend my PhD thesis on noncommuting-charge physics next semester, and these blog posts will double as a fun way to prepare for that.

January 30, 2024

Scott Aaronson Does fermion doubling make the universe not a computer?

Unrelated Announcement: The Call for Papers for the 2024 Conference on Computational Complexity is now out! Submission deadline is Friday February 16.

Every month or so, someone asks my opinion on the simulation hypothesis. Every month I give some variant on the same answer:

  1. As long as it remains a metaphysical question, with no empirical consequences for those of us inside the universe, I don’t care.
  2. On the other hand, as soon as someone asserts there are (or could be) empirical consequences—for example, that our simulation might get shut down, or we might find a bug or a memory overflow or a floating point error or whatever—well then, of course I care. So far, however, none of the claimed empirical consequences has impressed me: either they’re things physicists would’ve noticed long ago if they were real (e.g., spacetime “pixels” that would manifestly violate Lorentz and rotational symmetry), or the claim staggeringly fails to grapple with profound features of reality (such as quantum mechanics) by treating them as if they were defects in programming, or (most often) the claim is simply so resistant to falsification as to enter the realm of conspiracy theories, which I find boring.

Recently, though, I learned a new twist on this tired discussion, when a commenter asked me to respond to the quantum field theorist David Tong, who gave a lecture arguing against the simulation hypothesis on an unusually specific and technical ground. This ground is the fermion doubling problem: an issue known since the 1970s with simulating certain quantum field theories on computers. The issue is specific to chiral QFTs—those whose fermions distinguish left from right, and clockwise from counterclockwise. The Standard Model is famously an example of such a chiral QFT: recall that, in her studies of the weak nuclear force in 1956, Chien-Shiung Wu proved that the force acts preferentially on left-handed particles and right-handed antiparticles.

I can’t do justice to the fermion doubling problem in this post (for details, see Tong’s lecture, or this old paper by Eichten and Preskill). Suffice it to say that, when you put a fermionic quantum field on a lattice, a brand-new symmetry shows up, which forces there to be an identical left-handed particle for every right-handed particle and vice versa, thereby ruining the chirality. Furthermore, this symmetry just stays there, no matter how small you take the lattice spacing to be. This doubling problem is the main reason why Jordan, Lee, and Preskill, in their important papers on simulating interacting quantum field theories efficiently on a quantum computer (in BQP), have so far been unable to handle the full Standard Model.
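The origin of the doubling is already visible in one dimension: the naive lattice derivative replaces the momentum k by sin(ka)/a, which vanishes both at k = 0 and at the edge of the Brillouin zone, so a second low-energy mode appears no matter how fine the lattice. Here is a quick illustration of that standard fact (my sketch, not from Tong's lecture):

```python
# Illustrative sketch: the naive 1-D lattice derivative replaces momentum k
# by sin(k a)/a, which vanishes at k = 0 AND at k = ±π/a. The extra zero is
# the "doubler" mode, and it survives however small the spacing a is made.
import numpy as np

for a in (1.0, 0.1, 0.01):            # shrinking lattice spacing
    k = np.linspace(-np.pi / a, np.pi / a, 20001)
    disp = np.abs(np.sin(k * a)) / a  # naive lattice fermion dispersion |E(k)|
    zeros = k[np.isclose(disp, 0.0, atol=1e-9)]
    # a zero at the origin, plus zeros at the zone boundary, for every a
    assert any(np.isclose(zeros, 0.0))
    assert any(np.isclose(zeros, np.pi / a))
print("doubler mode persists at the Brillouin-zone edge for every spacing")
```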

But this isn’t merely an issue of calculational efficiency: it’s a conceptual issue with mathematically defining the Standard Model at all. In that respect it’s related to, though not the same as, other longstanding open problems around making nontrivial QFTs mathematically rigorous, such as the Yang-Mills existence and mass gap problem that carries a $1 million prize from the Clay Math Institute.

So then, does fermion doubling present a fundamental obstruction to simulating QFT on a lattice … and therefore, to simulating physics on a computer at all?

Briefly: no, it almost certainly doesn’t. If you don’t believe me, just listen to Tong’s own lecture! (Really, I recommend it; it’s a masterpiece of clarity.) Tong quickly admits that his claim to refute the simulation hypothesis is just “clickbait”—i.e., an excuse to talk about the fermion doubling problem—and that his “true” argument against the simulation hypothesis is simply that Elon Musk takes the hypothesis seriously (!).

It turns out that, for as long as there’s been a fermion doubling problem, there have been known methods to deal with it, though (as is often the case with QFT) no proof that any of the methods always work. Indeed, Tong himself has been one of the leaders in developing these methods, and because of his and others’ work, some experts I talked to were optimistic that a lattice simulation of the full Standard Model, with “good enough” justification for its correctness, might be within reach. Just to give you a flavor, apparently some of the methods involve adding an extra dimension to space, in such a way that the boundaries of the higher-dimensional theory approximate the chiral theory you’re trying to simulate (better and better, as the boundaries get further and further apart), even while the higher-dimensional theory itself remains non-chiral. It’s yet another example of the general lesson that you don’t get to call an aspect of physics “noncomputable,” just because the first method you thought of for simulating it on a computer didn’t work.

I wanted to make a deeper point. Even if the fermion doubling problem had been a fundamental obstruction to simulating Nature on a Turing machine, rather than (as it now seems) a technical problem with technical solutions, it still might not have refuted the version of the simulation hypothesis that people care about. We should really distinguish at least three questions:

  1. Can currently-known physics be simulated on computers using currently-known approaches?
  2. Is the Physical Church-Turing Thesis true? That is: can any physical process be simulated on a Turing machine to any desired accuracy (at least probabilistically), given enough information about its initial state?
  3. Is our whole observed universe a “simulation” being run in a different, larger universe?

Crucially, each of these three questions has only a tenuous connection to the other two! As far as I can see, there aren’t even nontrivial implications among them. For example, even if it turned out that lattice methods couldn’t properly simulate the Standard Model, that would say little about whether any computational methods could do so—or even more important, whether any computational methods could simulate the ultimate quantum theory of gravity. A priori, simulating quantum gravity might be harder than “merely” simulating the Standard Model (if, e.g., Roger Penrose’s microtubule theory turned out to be right), but it might also be easier: for example, because of the finiteness of the Bekenstein-Hawking entropy, and perhaps the Hilbert space dimension, of any bounded region of space.

But I claim that there also isn’t a nontrivial implication between questions 2 and 3. Even if our laws of physics were computable in the Turing sense, that still wouldn’t mean that anyone or anything external was computing them. (By analogy, presumably we all accept that our spacetime can be curved without there being a higher-dimensional flat spacetime for it to curve in.) And conversely: even if Penrose was right, and our laws of physics were Turing-uncomputable—well, if you still want to believe the simulation hypothesis, why not knock yourself out? Why shouldn’t whoever’s simulating us inhabit a universe full of post-Turing hypercomputers, for which the halting problem is mere child’s play?

In conclusion, I should probably spend more of my time blogging about fun things like this, rather than endlessly reading about world events in news and social media and getting depressed.

(Note: I’m grateful to John Preskill and Jacques Distler for helpful discussions of the fermion doubling problem, but I take 300% of the blame for whatever errors surely remain in my understanding of it.)

January 24, 2024

Robert HellingHow do magnets work?

I came across this excerpt from a Christian home-schooling book:

which is of course funny in so many ways, not least because the whole process of "seeing" is electromagnetic at its very core, and of course most people will have felt electricity at some point in their life. Even historically, this is pretty much how it was discovered by Galvani (using frogs' legs) at a time when electricity was otherwise about cat skins and amber.

It also brings to mind this quite famous YouTube video that shows Feynman being interviewed by the BBC, first getting somewhat angry about the question of how magnets work and then actually going into a quite deep explanation of what it means to explain something.

But how do magnets work? When I look at what my kids are taught in school, it basically boils down to "a magnet is made up of tiny magnets that all align", which, if you think about it, is actually a non-explanation. Can we do better (using more than layman's physics)? What is it exactly that makes magnets behave like magnets?

I would define magnetism as the force that moving charges feel in an electromagnetic field (the part proportional to the velocity) or said the other way round: The magnetic field is the field that is caused by moving charges. Using this definition, my interpretation of the question about magnets is then why permanent magnets feel this force.  For the permanent magnets, I want to use the "they are made of tiny magnets" line of thought but remove the circularity of the argument by replacing it by "they are made of tiny spins". 

This transforms the question to "Why do the elementary particles that make up matter feel the same force as moving charges even if they are not moving?".

And this question has an answer: Because they are Dirac particles! At small energies, the Dirac equation reduces to the Pauli equation, which involves the term (thanks to minimal coupling)
$$(\vec\sigma\cdot(\vec p+q\vec A))^2$$
and when you expand the square, that contains (in Coulomb gauge)
$$(\vec\sigma\cdot \vec p)(\vec\sigma\cdot q\vec A)= q\vec A\cdot\vec p + i(\vec p\times q\vec A)\cdot\vec\sigma$$
Here, the first term is the one responsible for the interaction of the magnetic field and moving charges, while the second one couples $$\nabla\times\vec A$$ to the operator $$\vec\sigma$$, i.e. the spin. And since you need to have both terms, this links the force on moving charges to this property we call spin. If you like, the fact that the g-factor is non-vanishing is the core of the explanation of how magnets work.
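The algebra here rests on the standard Pauli-matrix identity (σ·a)(σ·b) = (a·b)𝟙 + i(a×b)·σ; in the Pauli equation p and A are operators and ordering matters, but for ordinary c-number vectors the identity can be checked numerically. A quick sanity check (mine, not from the post):

```python
# Numerical check of the Pauli-matrix identity used in the expansion above:
# (σ·a)(σ·b) = (a·b) I + i (a×b)·σ, for c-number vectors a, b.
import numpy as np

sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)

dot_sigma = lambda v: np.einsum('i,ijk->jk', v, sigma)  # v · σ as a 2x2 matrix

lhs = dot_sigma(a) @ dot_sigma(b)
rhs = np.dot(a, b) * np.eye(2) + 1j * dot_sigma(np.cross(a, b))
assert np.allclose(lhs, rhs)
print("Pauli identity verified for random vectors")
```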

And if you want, you can add spin-statistics, which then brings in the full "stability of matter" story that is, in the end, responsible for the fact that you can form macroscopic objects out of Dirac particles that can be magnets.

n-Category Café Summer Research at the Topos Institute

Are you a student wanting to get paid to work on category theory in Berkeley? Then you’ve got just one week left to apply! The application deadline for Research Associate positions at the Topos Institute is February 1st.

Details and instructions on how to apply are here:

Alas, the Topos Institute can’t provide visas.

Here are some topics you could work on:

  • Computational category theory using AlgebraicJulia (Julia skills recommended)
  • Categorical statistics
  • Polynomial functors
  • Interacting dynamical systems
  • Proof assistants
  • Technology ethics

A bunch of my favorite people are working there… it’s a great place.

January 23, 2024

David HoggBetz limit for sailboats?

In the study of sustainable energy, there is a nice result on windmills, called the Betz limit: There is a finite limit to the fraction of the kinetic energy of the wind that a windmill can absorb or exploit. The reason is often stated as: If the windmill took all of the power in the wind, the wind would stop, and then there would be no flow of energy over the windmill. I'm not sure I exactly agree with that explanation, but let's leave that here.
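For concreteness, the textbook actuator-disk derivation (standard material, not derived in the post) writes the power coefficient as C_p(a) = 4a(1−a)², where a is the axial induction factor, the fractional slowdown of the wind at the disk; maximizing over a gives the Betz limit 16/27 ≈ 0.593:

```python
# Actuator-disk sketch of the Betz limit: power coefficient
# C_p(a) = 4 a (1 - a)^2, with a the axial induction factor.
# Maximizing over a recovers the Betz limit 16/27.
import numpy as np

a = np.linspace(0.0, 1.0, 1_000_001)
cp = 4 * a * (1 - a) ** 2

i = np.argmax(cp)
print(f"optimal induction factor a ~ {a[i]:.4f}")   # analytic optimum: 1/3
print(f"max power coefficient C_p ~ {cp[i]:.4f}")   # Betz limit: 16/27
assert abs(a[i] - 1 / 3) < 1e-3
assert abs(cp[i] - 16 / 27) < 1e-6
```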

On my way home today I worked on the possibility that there is an equivalent of the Betz limit for sailboats. Is there an energetic way of looking at sailing that is useful?

One paradox is that a sailboat is sailing steadily when the net force on the boat is zero (just like when a windmill is turning at constant angular velocity). In the Betz limit, the windmill is thought of as having two different torques on it, one from the wind, and one from the turbine. Sailing has no turbine. So this problem has a conceptual component to it.

January 22, 2024

John PreskillColliding the familiar and the anti-familiar at CERN

The most ingenious invention to surprise me at CERN was a box of chocolates. CERN is a multinational particle-physics collaboration. Based in Geneva, CERN is famous for having “the world’s largest and most powerful accelerator,” according to its website. So a physicist will take for granted its colossal magnets, subatomic finesse, and petabytes of experimental data.

But I wasn’t expecting the chocolates.

In the main cafeteria, beside the cash registers, stood stacks of Toblerone. Sweet-tooth owners worldwide recognize the yellow triangular prisms stamped with Toblerone’s red logo. But I’d never seen such a prism emblazoned with CERN’s name. Scientists visit CERN from across the globe, and probably many return with Swiss-chocolate souvenirs. What better way to promulgate CERN’s influence than by coupling Switzerland’s scientific might with its culinary?1

I visited CERN last November for Sparks!, an annual public-outreach event. The evening’s speakers and performers offer perspectives on a scientific topic relevant to CERN. This year’s event highlighted quantum technologies. Physicist Sofia Vallecorsa described CERN’s Quantum Technology Initiative, and IBM philosopher Mira Wolf-Bauwens discussed ethical implications of quantum technologies. (Yes, you read that correctly: “IBM philosopher.”) Dancers Wenchi Su and I-Fang Lin presented an audiovisual performance, Rachel Maze elucidated government policies, and I spoke about quantum steampunk.

Around Sparks!, I played the physicist tourist: presented an academic talk, descended to an underground detector site, and shot the scientific breeze with members of the Quantum Technology Initiative. (What, don’t you present academic talks while touristing?) I’d never visited CERN before, but much of it felt eerily familiar. 

A theoretical-physics student studies particle physics and quantum field theory (the mathematical framework behind particle physics) en route to a PhD. CERN scientists accelerate particles to high speeds, smash them together, and analyze the resulting debris. The higher the particles’ initial energies, the smaller the debris’s components, and the more elementary the physics we can infer. CERN made international headlines in 2012 for observing evidence of the Higgs boson, the particle that endows other particles with masses. As a scientist noted during my visit, one can infer CERN’s impact from how even Auto World (if I recall correctly) covered the Higgs discovery. Friends of mine process data generated by CERN, and faculty I met at Caltech helped design CERN experiments. When I mentioned to a colleague that I’d be flying to Geneva, they responded, “Oh, are you visiting CERN?” All told, a physicist can avoid CERN as easily as one can avoid the Panama Canal en route from the Atlantic Ocean to the Pacific through South America. So, although I’d never visited, CERN felt almost like a former stomping ground. It was the details that surprised me.

Familiar book, new (CERN) bookstore.

Take the underground caverns. CERN experiments take place deep underground, where too few cosmic rays reach to muck with observations much. I visited the LHCb experiment, which spotlights a particle called the “beauty quark” in Europe and the less complimentary “bottom quark” in the US. LHCb is the first experiment that I learned has its own X/Twitter account. Colloquia (weekly departmental talks at my universities) had prepared me for the 100-meter descent underground, for the hard hats we’d have to wear, and for the detector many times larger than I.

A photo of the type bandied about in particle-physics classes
A less famous hard-hat photo, showing a retired detector’s size.

But I hadn’t anticipated the bright, single-tone colors. Between the hard hats and experimental components, I felt as though I were inside the Google logo.

Or take CERN’s campus. I wandered around it for a while before a feeling of nostalgia brought me up short: I was feeling lost in precisely the same way in which I’d felt lost countless times at MIT. Numbers, rather than names, label both MIT’s and CERN’s buildings. Somebody must have chosen which number goes where by throwing darts at a map while blindfolded. Part of CERN’s hostel, building 39, neighbors buildings 222 and 577. I shouldn’t wonder to discover, someday, that the CERN building I’m searching for has wandered off to MIT.

Part of the CERN map. Can you explain it?

Between the buildings wend streets named after famous particle physicists. I nodded greetings to Einstein, Maxwell, Democritus (or Démocrite, as the French Swiss write), and Coulomb. But I hadn’t anticipated how much civil engineers venerate particle physicists. So many physicists did CERN’s designers stuff into walkways that the campus ran out of streets and had to recycle them. Route W. F. Weisskopf turns into Route R. P. Feynman at a…well, at nothing notable—not a fork or even a spoon. I applaud the enthusiasm for history; CERN just achieves feats in navigability that even MIT hasn’t.

The familiar mingled with the unfamiliar even in the crowd on campus. I was expecting to recognize only the personnel I’d coordinated with electronically. But three faces surprised me at my academic talk. I’d met those three physicists through different channels—a summer school in Malta, Harvard collaborators, and the University of Maryland—at different times over the years. But they happened to be visiting CERN at the same time as I, despite their not participating in Sparks! I’m half-reminded of the book Roughing It, which describes how Mark Twain traveled the American West via stagecoach during the 1860s. He ran into a long-lost friend “on top of the Rocky Mountains thousands of miles from home.” Exchange “on top of the Rockies” for “near the Alps” and “thousands of miles” for “even more thousands of miles.”

CERN unites physicists. We learn about its discoveries in classes, we collaborate on its research or have friends who do, we see pictures of its detectors in colloquia, and we link to its science-communication pages in blog posts. We respect CERN, and I hope we can be forgiven for fondly poking a little fun at it. So successfully has CERN spread its influence, I felt a sense of recognition upon arriving. 

I didn’t buy any CERN Toblerones. But I arrived home with 4.5 pounds of other chocolates, which I distributed to family and friends, the thermodynamics lunch group I run at the University of Maryland, and—perhaps most importantly—my research group. I’ll take a leaf out of CERN’s book: to hook students on fundamental physics, start early, and don’t stint on the sweets.

With thanks to Claudia Marcelloni, Alberto Di Meglio, Michael Doser, Antonella Del Rosso, Anastasiia Lazuka, Salome Rohr, Lydia Piper, and Paulina Birtwistle for inviting me to, and hosting me at, CERN.

1After returning home, I learned that an external company runs CERN’s cafeterias and that the company orders and sells the Toblerones. Still, the idea is brilliant.

January 21, 2024

Tommaso DorigoComparing Student Reactions To Lectures In Artificial Intelligence And Physics

In the past two weeks I visited two schools in Veneto to engage students with the topic of Artificial Intelligence, which is something everybody seems to be happy to hear about these days: on the 10th of January I visited a school in Vicenza, and on the 17th a school in Venice. In both cases there were about 50-60 students, but there was a crucial difference: while the school in Venezia (the "Liceo Marco Foscarini", where I have been giving lectures in the past within the project called "Art and Science") was a classical liceum and the high-schoolers who came to listen to my presentation were between 16 and 18 years old, the one in Vicenza was a middle school, and its attending students were between 11 and 13 years old. 

read more

January 20, 2024

Jacques Distler Responsibility

Many years ago, when I was an assistant professor at Princeton, there was a cocktail party at Curt Callan’s house to mark the beginning of the semester. There, I found myself in the kitchen, chatting with Sacha Polyakov. I asked him what he was going to be teaching that semester, and he replied that he was very nervous because — for the first time in his life — he would be teaching an undergraduate course. After my initial surprise that he had gotten this far in life without ever having taught an undergraduate course, I asked which course it was. He said it was the advanced undergraduate Mechanics course (chaos, etc.) and we agreed that would be a fun subject to teach. We chatted some more, and then he said that, on reflection, he probably shouldn’t be quite so worried. After all, it wasn’t as if he was going to teach Quantum Field Theory, “That’s a subject I’d feel responsible for.”

This remark stuck with me, but it never seemed quite so poignant until this semester, when I find myself teaching the undergraduate particle physics course.

The textbooks (and I mean all of them) start off by “explaining” that relativistic quantum mechanics (e.g. replacing the Schrödinger equation with Klein-Gordon) makes no sense (negative probabilities and all that …). And they then proceed to use it anyway (supplemented by some Feynman rules pulled out of thin air).

This drives me up the #@%^ing wall. It is precisely wrong.

There is a perfectly consistent quantum mechanical theory of free particles. The problem arises when you want to introduce interactions. In Special Relativity, there is no interaction-at-a-distance; all forces are necessarily mediated by fields. Those fields fluctuate and, when you want to study the quantum theory, you end up having to quantize them.

But the free particle is just fine. Of course it has to be: free field theory is just the theory of an (indefinite number of) free particles. So it better be true that the quantum theory of a single relativistic free particle makes sense.

So what is that theory?

  1. It has a Hilbert space, $\mathcal{H}$, of states. To make the action of Lorentz transformations as simple as possible, it behoves us to use a Lorentz-invariant inner product on that Hilbert space. This is most easily done in the momentum representation $\langle\chi|\phi\rangle = \int \frac{d^3\vec{k}}{{(2\pi)}^3\, 2\sqrt{\vec{k}^2+m^2}}\, \chi(\vec{k})^* \phi(\vec{k})$
  2. As usual, the time-evolution is given by a Schrödinger equation
(1) $i\partial_t |\psi\rangle = H_0 |\psi\rangle$

where $H_0 = \sqrt{\vec{p}^2+m^2}$. Now, you might object that it is hard to make sense of a pseudo-differential operator like $H_0$. Perhaps. But it’s not any harder than making sense of $U(t)= e^{-i \vec{p}^2 t/2m}$, which we routinely pretend to do in elementary quantum mechanics. In both cases, we use the fact that, in the momentum representation, the operator $\vec{p}$ is represented as multiplication by $\vec{k}$.

I could go on, but let me leave the rest of the development of the theory as a series of questions.

  1. The self-adjoint operator, $\vec{x}$, satisfies $[x^i,p_j] = i \delta^{i}_j$. Thus it can be written in the form $x^i = i\left(\frac{\partial}{\partial k_i} + f_i(\vec{k})\right)$ for some real function $f_i$. What is $f_i(\vec{k})$?
  2. Define $J^0(\vec{r})$ to be the probability density. That is, when the particle is in state $|\phi\rangle$, the probability for finding it in some Borel subset $S\subset\mathbb{R}^3$ is given by $\text{Prob}(S) = \int_S d^3\vec{r}\, J^0(\vec{r})$. Obviously, $J^0(\vec{r})$ must take the form $J^0(\vec{r}) = \int\frac{d^3\vec{k}\,d^3\vec{k}'}{{(2\pi)}^6\, 4\sqrt{\vec{k}^2+m^2}\sqrt{{\vec{k}'}^2+m^2}}\, g(\vec{k},\vec{k}')\, e^{i(\vec{k}-\vec{k}')\cdot\vec{r}}\,\phi(\vec{k})\phi(\vec{k}')^*$. Find $g(\vec{k},\vec{k}')$. (Hint: you need to diagonalize the operator $\vec{x}$ that you found in problem 1.)
  3. The conservation of probability says $0=\partial_t J^0 + \partial_i J^i$. Use the Schrödinger equation (1) to find $J^i(\vec{r})$.
  4. Under Lorentz transformations, $H_0$ and $\vec{p}$ transform as the components of a 4-vector. For a boost in the $z$-direction, of rapidity $\lambda$, we should have $$\begin{split} U_\lambda \sqrt{\vec{p}^2+m^2}\, U_\lambda^{-1} &= \cosh(\lambda) \sqrt{\vec{p}^2+m^2} + \sinh(\lambda) p_3\\ U_\lambda p_1 U_\lambda^{-1} &= p_1\\ U_\lambda p_2 U_\lambda^{-1} &= p_2\\ U_\lambda p_3 U_\lambda^{-1} &= \sinh(\lambda) \sqrt{\vec{p}^2+m^2} + \cosh(\lambda) p_3 \end{split}$$ and we should be able to write $U_\lambda = e^{i\lambda B}$ for some self-adjoint operator, $B$. What is $B$? (N.B.: by contrast, the $x^i$ introduced above do not transform in a simple way under Lorentz transformations.)

The Hilbert space of a free scalar field is now $\bigoplus_{n=0}^\infty \text{Sym}^n\mathcal{H}$. That’s perhaps not the easiest way to get there. But it is a way …


Yike! Well, that went south pretty fast. For the first time (ever, I think) I’m closing comments on this one, and calling it a day. To summarize, for those who still care,

  1. There is a decomposition of the Hilbert space of a free scalar field as $\mathcal{H}_\phi = \bigoplus_{n=0}^\infty \mathcal{H}_n$, where $\mathcal{H}_n = \text{Sym}^n \mathcal{H}$ and $\mathcal{H}$ is the 1-particle Hilbert space described above (also known as the spin-$0$, mass-$m$, irreducible unitary representation of Poincaré).
  2. The Hamiltonian of the free scalar field is the direct sum of the induced Hamiltonians on $\mathcal{H}_n$, induced from the Hamiltonian, $H=\sqrt{\vec{p}^2+m^2}$, on $\mathcal{H}$. In particular, it (along with the other Poincaré generators) is block-diagonal with respect to this decomposition.
  3. There are other interesting observables which are also block-diagonal with respect to this decomposition (i.e., don’t change the particle number), and hence we can discuss their restriction to $\mathcal{H}_n$.

Gotta keep reminding myself why I decided to foreswear blogging…

David Hoggauto-encoder for calibration data

Connor Hainje (NYU) is looking at whether we could build a hierarchical or generative model of SDSS-V BOSS spectrograph calibration data, such that we could reduce the survey's per-visit calibration overheads. He started by building an auto-encoder, which is a simple, self-supervised generative model. It works really well! We discussed how to judge performance (held-out data) and how performance should depend on the size of the latent space (I predict that it won't want a large latent space). We also decided that we should announce an SDSS-V project and send out a call for collaboration.
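For readers wanting a concrete picture of the idea, here is a minimal toy (my sketch, with synthetic data and made-up dimensions; it is not the SDSS-V code). For a linear auto-encoder with tied weights, the optimum coincides with the PCA subspace, so it can be "trained" in closed form with an SVD:

```python
# Toy linear auto-encoder: compress synthetic "calibration spectra" into a
# small latent space and reconstruct them. All dimensions are invented.
import numpy as np

rng = np.random.default_rng(42)
n_spec, n_pix, n_latent = 200, 50, 3

# Synthetic data that truly live near a 3-dimensional subspace, plus noise
basis = rng.normal(size=(n_latent, n_pix))
coeffs = rng.normal(size=(n_spec, n_latent))
X = coeffs @ basis + 0.01 * rng.normal(size=(n_spec, n_pix))
X -= X.mean(axis=0)  # centre the data before encoding

# For a tied-weight linear auto-encoder, the optimum is the PCA subspace,
# so we can "train" in closed form via the SVD of the data matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
encode = lambda x: x @ Vt[:n_latent].T  # pixels -> latent
decode = lambda z: z @ Vt[:n_latent]    # latent -> pixels

X_hat = decode(encode(X))
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error with {n_latent} latents: {rel_err:.3f}")
assert rel_err < 0.05  # a tiny latent space suffices for low-rank data
```

This also illustrates the performance question in the post: on held-out data, the reconstruction error flattens once the latent dimension reaches the data's intrinsic rank.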

[Note added later: Contardo (SISSA) points out that an autoencoder is not a generative model. That's right, but there are multiple definitions of generative model, only one of which is that you can sample from it. Another is that it is a parameterized model that can predict the data. Another is that it is a likelihood function for the parameters. But she's right: We are going to turn parts of the auto-encoder into a generative model in the sense of a likelihood function.]

David HoggHappy birthday, Rix

Today was an all-day event at MPIA to celebrate the 60th birthday (and 25th year as Director) of Hans-Walter Rix (MPIA). There were many remarkable presentations and stories; he has left a trail of goodwill wherever he has gone! I decided to use the opportunity to talk about measurement, which is something that Rix and I have discussed for the last 18 years. My slides are here.

I've been very lucky with the opportunities I've had to work with wonderful people.

January 18, 2024

David Hoggdivide by your selection function, or multiply by it?

With Kate Storey-Fisher (San Sebastián), Abby Williams (Caltech) is working on a paper about large-angular-scale power, or anisotropy, in the distribution of quasars. It is a great subject; we need to estimate this power in the context of a very non-trivial all-sky selection function. The tradition in cosmology is to divide the data by this selection function. But of course you shouldn't manipulate your data. Instead, you could multiply your model by the selection function. You can guess which one I prefer! In fact you can do either, as long as you weight the data in the right way in the fit. I promised to write up a few words and equations about this for Williams.
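A toy version of the contrast (my sketch, not the Williams et al. analysis, with invented numbers): given a known selection function S, you can either average counts/S, or multiply a constant-density model by S inside a Poisson likelihood, whose maximum-likelihood estimate has a simple closed form:

```python
# Toy comparison: estimate a constant mean density mu given a known
# per-pixel selection function S. Route 1 divides the data by S; route 2
# multiplies the model by S inside a Poisson likelihood.
import numpy as np

rng = np.random.default_rng(1)
n_pix = 500
S = rng.uniform(0.2, 1.0, size=n_pix)  # selection function per pixel
mu_true = 10.0                         # true mean counts per pixel
counts = rng.poisson(mu_true * S)      # observed counts

# Route 1: "divide the data" (simple average of counts/S; noisier)
mu_div = np.mean(counts / S)

# Route 2: multiply the model by S. For a constant-density Poisson model,
# maximizing sum(counts * log(mu * S) - mu * S) over mu gives the closed
# form sum(counts) / sum(S).
mu_mle = counts.sum() / S.sum()

print(f"divide-the-data estimate: {mu_div:.2f}")
print(f"multiply-the-model MLE:   {mu_mle:.2f}")
assert abs(mu_mle - mu_true) < 1.0
```

Both routes are unbiased here; the point is that the multiply-the-model estimate weights the pixels correctly, which is exactly the "weight the data in the right way in the fit" remark above.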

January 14, 2024

Tommaso DorigoStatistical Methods For Fundamental Science Course Starts Tomorrow

From tomorrow onwards (once or twice a week until February 5), I will be giving an online course on the topic of "Statistical Methods for Fundamental Science" for the INSTATS organization. This is a 5-day, 15-hour set of lectures that I put together to suit the needs of students and researchers working in any scientific discipline who wish to improve their understanding and practice of statistical methods for data analysis.

read more

December 23, 2023

Scott Aaronson Postdocs wanted!

David Soloveichik, my friend and colleague in UT Austin’s Electrical and Computer Engineering department, and I are looking to hire a joint postdoc in “Unconventional Computing,” broadly defined. Areas of interest include but are not limited to:

(1) quantum computation,
(2) thermodynamics of computation and reversible computation,
(3) analog computation, and
(4) chemical computation.

The ideal candidate would have broad multi-disciplinary interests in addition to prior experience and publications in at least one of these areas. The researcher will work closely with David and myself but is expected to be highly self-motivated. To apply, please send an email to and with the subject line “quantum postdoc application.” Please include a CV and links to three representative publications. Let’s set a deadline of January 20th. We’ll be back in touch if we need recommendation letters.

My wife Dana Moshkovitz Aaronson and my friend and colleague David Zuckerman are also looking for a joint postdoc at UT Austin, to work on pseudorandomness and related topics. They’re asking for applications by January 16th. Click here for more information.

December 20, 2023

Scott Aaronson Rowena He

This fall, I’m honored to have made a new friend: the noted Chinese dissident scholar Rowena He, currently a Research Fellow at the Civitas Institute at UT Austin, and formerly of Harvard, the Institute for Advanced Study at Princeton, the National Humanities Center, and other fine places. I was connected to Rowena by the Harvard computer scientist Harry Lewis.

But let’s cut to the chase, as Rowena tends to do in every conversation. As a teenage girl in Guangdong, Rowena eagerly participated in the pro-democracy protests of 1989, the ones that tragically culminated in the Tiananmen Square massacre. Since then, she’s devoted her life to documenting and preserving the memory of what happened, fighting its deliberate erasure from the consciousness of future generations of Chinese. You can read some of her efforts in her first book, Tiananmen Exiles: Voices of the Struggle for Democracy in China (one of the Asia Society’s top 5 China books of 2014). She’s now spending her time at UT writing a second book.

Unsurprisingly, Rowena’s life’s project has not (to put it mildly) sat well with the Chinese authorities. From 2019, she had a history professorship at the Chinese University of Hong Kong, where she could be close to her research material and to those who needed to hear her message—and where she was involved in the pro-democracy protests that convulsed Hong Kong that year. Alas, you might remember the grim outcome of those protests. Following Hong Kong’s authoritarian takeover, in October of this year, Rowena was denied a visa to return to Hong Kong, and then fired from CUHK because she’d been denied a visa—events that were covered fairly widely in the press. Learning about the downfall of academic freedom in Hong Kong was particularly poignant for me, given that I lived in Hong Kong when I was 13 years old, in some of the last years before the handover to China (1994-1995), and my family knew many people there who were trying to get out—to Canada, Australia, anywhere—correctly fearing what eventually came to pass.

But this is all still relatively dry information that wouldn’t have prepared me for the experience of meeting Rowena in person. Probably more than anyone else I’ve had occasion to meet, Rowena is basically the living embodiment of what it means to sacrifice everything for abstract ideals of freedom and justice. Many academics posture that way; to spend a couple hours with Rowena is to understand the real deal. You can talk to her about trivialities—food, work habits, how she’s settling in Austin—and she’ll answer, but before too long, the emotion will rise in her voice and she’ll be back to telling you how the protesting students didn’t want to overthrow the Chinese government, but only help to improve it. As if you, too, were a CCP bureaucrat who might imprison her if the truth turned out otherwise. Or she’ll talk about how, when she was depressed, only the faces of the students in Hong Kong who crowded her lecture gave her the will to keep living; or about what she learned by reading the letters that Lin Zhao, a dissident from Maoism, wrote in blood in Chinese jail before she was executed.

This post has a practical purpose. Since her exile from China, Rowena has spent basically her entire life moving from place to place, with no permanent position and no financial security. In the US—a huge country full of people who share Rowena’s goal of exposing the lies of the CCP—there must be an excellent university, think tank, or institute that would offer a permanent position to possibly the world’s preeminent historian of Tiananmen and of the Chinese democracy movement. Though the readership of this blog is heavily skewed toward STEM, maybe that institute is yours. If it is, please get in touch with Rowena. And then I could say this blog had served a useful purpose, even if everything else I wrote for two decades was for naught.

Richard Easther: A Bigger Sky

Amongst everything else that happened in 2023, a key anniversary of a huge leap in our understanding of the Universe passed largely unnoticed – the centenary of the realisation that not only was our Sun one of many stars in the Milky Way galaxy but that our galaxy was one of many galaxies in the Universe.

I had been watching the approaching anniversary for over a decade, thanks to teaching the cosmology section of the introductory astronomy course at the University of Auckland. My lectures come at the end of the semester and each October finds me showing this image – with its “October 1923” inscription – to a roomful of students.

The image was captured by the astronomer Edwin Hubble, using the world’s then-largest telescope, on top of Mt Wilson, outside Los Angeles. At first glance, it may not even look like a picture of the night sky: raw photographic images are reversed, so stars show up as dark spots against a light background. However, this odd-looking picture changed our sense of where we live in the Universe.

My usual approach when I share this image with my students is to ask for a show of hands from people with a living relative born before 1923. It’s a decent-sized class and this year a few of them had a centenarian in the family. However, I would get far more hands a decade ago, when I asked about mere 90-year-olds. And sometime soon no hands will rise at this prompt and I will have to come up with a new shtick. But it is remarkable to me that there are people alive today who were born before we understood the overall arrangement of the Universe.

For tens of thousands of years, the Milky Way – the band of light that stretches across the night sky – would have been one of the most striking sights on a dark night once you stepped away from the fire.

Milky Way — via Unsplash

Ironically, the same technological prowess that has allowed us to explore the farthest reaches of the Universe also gives us cities and electric lights. With another show of hands, I always ask whether my students have seen the Milky Way for themselves, and each year quite a few of them disclose that they have not. I encourage them (and everyone) to find chances to sit out under a cloudless, moonless sky and take in the full majesty of the heavens as it slowly reveals itself to eyes adapting to the dark.

In the meantime, though, we make do with a projector and a darkened lecture theatre.

It was over 400 years ago that Galileo pointed the first, small telescope at the sky. In that moment the apparent clouds of the Milky Way revealed themselves to be composed of many individual stars. By the 1920s, we understood that our Sun is a star and that the Milky Way is a collection of billions of stars, with our Sun inside it. But the single biggest question in astronomy in 1923 — which, with hindsight, became known as the “Great Debate” — was whether the Milky Way was an isolated island of stars in an infinite and otherwise empty ocean of space, or if it was one of many such islands, sprinkled across the sky.

In other words, for Hubble and his contemporaries the question was whether our galaxy was the galaxy, or just one of many.

More specifically, the argument was whether nebulae, which are visible as extended patches of light in the night sky, were themselves galaxies or contained within the Milky Way. These objects, almost all of which are only detectable in telescopes, had been catalogued by astronomers as they mapped the sky with increasingly capable instruments. There are many kinds of nebulae, but the white nebulae had the colour of starlight and looked like little clouds through the eyepiece. Since the 1750s these had been proposed as possible galaxies. But until 1923 nobody knew with certainty whether they were small objects on the outskirts of our galaxy – or much larger, far more distant objects on the same scale as the Milky Way itself.

To human observers, the largest and most impressive of the nebulae is Andromeda. This was the object at which Hubble had pointed his telescope in October 1923. Hubble was renowned for his ability to spot interesting details in complex images [1] and after the photographic plate was developed his eye alighted on a little spot that had not been present in an earlier observation [2].

Hubble’s original guess was that this was a nova, a kind of star that sporadically flares in brightness by a factor of 1,000 or more, so he marked it and a couple of other candidates with an “N”. However, after looking back at images that he had already taken and monitoring the star through the following months, Hubble came to realise that he had found a Cepheid variable – a star whose brightness changes rhythmically over weeks or months.

Stars come in a huge range of sizes and big stars are millions of times brighter than little ones, so simply looking at a star in the sky tells us little about its distance from us. But Cepheids have a useful property [3]: brighter Cepheids take longer to pass through a single cycle than their smaller siblings.

Imagine a group of people holding torches (flashlights if you are North American) each of which has a bulb with its own distinctive brightness. If this group fans out across a field at night and turns on their torches, we cannot tell how far away each person is simply by looking at the resulting pattern of lights. Is that torch faint because it is further from us than most, or because its bulb is dimmer than most? But if each person were to flash the wattage of their bulb in Morse code we could estimate distances by comparing their apparent brightness (since distant objects appear fainter) to their actual intensity (which is encoded in the flashing light).

In the case of Cepheids they are not flashing in Morse code; instead, nature provides us with the requisite information via the time it takes for their brightness to vary from maximum to minimum and back to a maximum again.

Hubble used this knowledge to estimate the distance to Andromeda. While the number he found was lower than the best present-day estimates, it was still large enough to show that Andromeda lies far outside the Milky Way and is roughly the same size as our galaxy.
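To make the idea concrete, here is a toy version of the calculation in Python. The calibration coefficients, and the example period and magnitude, are illustrative assumptions rather than Hubble’s actual numbers, and extinction is ignored entirely.

```python
import math

def cepheid_distance_pc(period_days, apparent_mag):
    """Toy distance estimate from the period-luminosity ("Leavitt") law.

    Uses an illustrative V-band calibration M = -2.43*(log10 P - 1) - 4.05
    together with the distance modulus m - M = 5*log10(d / 10 pc).
    Ignores extinction, so this is a rough sketch only.
    """
    absolute_mag = -2.43 * (math.log10(period_days) - 1.0) - 4.05
    return 10 ** ((apparent_mag - absolute_mag + 5.0) / 5.0)

# With a ~31-day period and an apparent magnitude of ~18.6 (roughly the
# parameters of Hubble's variable), the star lands hundreds of thousands of
# parsecs away -- far beyond any plausible extent of the Milky Way:
print(cepheid_distance_pc(31.4, 18.6))
```

The point is not the exact number (Hubble’s own estimate was considerably lower than modern values) but that any reasonable calibration puts the star far outside our galaxy.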

The immediate implication, given that Andromeda is the brightest of the many nebulae we see in big telescopes, was that our Milky Way was neither alone nor unique in the Universe. Thus we confirmed that our galaxy was just one of an almost uncountable number of islands in the ocean of space – and the full scale of the cosmos yielded to human measurement for the first time, through Hubble’s careful lens on a curious star.

A modern image (made by Robert Gendler) of the Andromeda galaxy with a closeup on what is now called “Hubble’s star” taken using the (appropriately enough) Hubble Space Telescope, in the white circle. A “positive” image from Hubble’s original plate is shown at the bottom right.

Illustration Credit: NASA, ESA and Z. Levay (STScI). Credit: NASA, ESA and the Hubble Heritage Team (STScI/AURA)

[1] Astronomers in Hubble’s day used a gizmo called a “Blink Comparator” that chops quickly between two images viewed through an eyepiece, so objects changing in brightness draw attention to themselves by flickering.

[2] In most reproductions of the original plate I am hard put to spot it at all, even more so when it is projected on a screen in a lecture theatre. A bit of mild image processing makes it a little clearer, but it hardly calls attention to itself.


[3] This “period-luminosity law” had been described just 15 years earlier by Henrietta Swan Leavitt and it is still key to setting the overall scale of the Universe.

December 18, 2023

Jordan Ellenberg: Show report: Bug Moment, Graham Hunt, Dusk, Disq at High Noon Saloon

I haven’t done a show report in a long time because I barely go to shows anymore! Actually, though, this fall I went to three. First, The Beths, opening for The National, but I didn’t stay for The National because I don’t know or care about them; I just wanted to see the latest geniuses of New Zealand play “Expert in a Dying Field”

Next was the Violent Femmes, playing their self-titled debut in order. They used to tour a lot and I used to see them a lot, four or five times in college and grad school I think. They never really grow old and Gordon Gano never stops sounding exactly like Gordon Gano. A lot of times I go to reunion shows and there are a lot of young people who must have come to the band through their back catalogue. Not Violent Femmes! 2000 people filling the Sylvee and I’d say 95% were between 50 and 55. One of the most demographically narrowcast shows I’ve ever been to. Maybe beaten out by the time I saw Black Francis at High Noon and not only was everybody exactly my age they were also all men. (Actually, it was interesting to me there were a lot of women at this show! I think of Violent Femmes as a band for the boys.)

But I came in to write about the show I saw this weekend, four Wisconsin acts playing the High Noon. I really came to see Disq, whose single “Daily Routine” I loved when it came out and I still haven’t gotten tired of. Those chords! Sevenths? They’re something:

Dusk was an Appleton band that played funky/stompy/indie, Bug Moment had an energetic frontwoman named Rosenblatt and were one of those bands where no two members looked like they were in the same band. But the real discovery of the night, for me, was Graham Hunt, who has apparently been a Wisconsin scene fixture forever. Never heard of the guy. But wow! Indie power-pop of the highest order. When Hunt’s voice cracks and scrapes the high notes he reminds me a lot of the other great Madison noisy-indie genius named Graham, Graham Smith, aka Kleenex Girl Wonder, who recorded the last great album of the 1990s in his UW-Madison dorm room. Graham Hunt’s new album, Try Not To Laugh, is out this week. “Emergency Contact” is about as pretty and urgent as this kind of music gets.

And from his last record, If You Knew Would You Believe it, “How Is That Different,” which rhymes blanket, eye slit, left it, and orbit. Love it! Reader, I bought a T-shirt.

December 16, 2023

Terence Tao: On a conjecture of Marton

Tim Gowers, Ben Green, Freddie Manners, and I have just uploaded to the arXiv our paper “On a conjecture of Marton“. This paper establishes a version of the notorious Polynomial Freiman–Ruzsa conjecture (first proposed by Katalin Marton):

Theorem 1 (Polynomial Freiman–Ruzsa conjecture) Let {A \subset {\bf F}_2^n} be such that {|A+A| \leq K|A|}. Then {A} can be covered by at most {2K^{12}} translates of a subspace {H} of {{\bf F}_2^n} of cardinality at most {|A|}.

The previous best known result towards this conjecture was by Konyagin (as communicated in this paper of Sanders), who obtained a similar result but with {2K^{12}} replaced by {\exp(O_\varepsilon(\log^{3+\varepsilon} K))} for any {\varepsilon>0} (assuming that say {K \geq 3/2} to avoid some degeneracies as {K} approaches {1}, which is not the difficult case of the conjecture). The conjecture (with {12} replaced by an unspecified constant {C}) has a number of equivalent forms; see this survey of Green, and these papers of Lovett and of Green and myself for some examples; in particular, as discussed in the latter two references, the constants in the inverse {U^3({\bf F}_2^n)} theorem are now polynomial in nature (although we did not try to optimize the constant).

The exponent {12} here was the product of a large number of optimizations to the argument (our original exponent here was closer to {1000}), but can be improved even further with additional effort (our current argument, for instance, allows one to replace it with {7+\sqrt{17} = 11.123\dots}, but we decided to state our result using integer exponents instead).

In this paper we will focus exclusively on the characteristic {2} case (so we will be cavalier in identifying addition and subtraction), but in a followup paper we will establish similar results in other finite characteristics.

Much of the prior progress on this sort of result has proceeded via Fourier analysis. Perhaps surprisingly, our approach uses no Fourier analysis whatsoever, being conducted instead entirely in “physical space”. Broadly speaking, it follows a natural strategy, which is to induct on the doubling constant {|A+A|/|A|}. Indeed, suppose for instance that one could show that every set {A} of doubling constant {K} was “commensurate” in some sense to a set {A'} of doubling constant at most {K^{0.99}}. One measure of commensurability, for instance, might be the Ruzsa distance {\log \frac{|A+A'|}{|A|^{1/2} |A'|^{1/2}}}, which one might hope to control by {O(\log K)}. Then one could iterate this procedure until doubling constant dropped below say {3/2}, at which point the conjecture is known to hold (there is an elementary argument that if {A} has doubling constant less than {3/2}, then {A+A} is in fact a subspace of {{\bf F}_2^n}). One can then use several applications of the Ruzsa triangle inequality

\displaystyle  \log \frac{|A+C|}{|A|^{1/2} |C|^{1/2}} \leq \log \frac{|A+B|}{|A|^{1/2} |B|^{1/2}} + \log \frac{|B+C|}{|B|^{1/2} |C|^{1/2}}

to conclude (the fact that we reduce {K} to {K^{0.99}} means that the various Ruzsa distances that need to be summed are controlled by a convergent geometric series).

There are a number of possible ways to try to “improve” a set {A} of not too large doubling by replacing it with a commensurate set of better doubling. We note two particular potential improvements:

  • (i) Replacing {A} with {A+A}. For instance, if {A} was a random subset (of density {1/K}) of a large subspace {H} of {{\bf F}_2^n}, then replacing {A} with {A+A} usually drops the doubling constant from {K} down to nearly {1} (under reasonable choices of parameters).
  • (ii) Replacing {A} with {A \cap (A+h)} for a “typical” {h \in A+A}. For instance, if {A} was the union of {K} random cosets of a subspace {H} of large codimension, then replacing {A} with {A \cap (A+h)} again usually drops the doubling constant from {K} down to nearly {1}.

Unfortunately, there are sets {A} where neither of the above two operations (i), (ii) significantly improves the doubling constant. For instance, if {A} is a random density {1/\sqrt{K}} subset of {\sqrt{K}} random translates of a medium-sized subspace {H}, one can check that the doubling constant stays close to {K} if one applies either operation (i) or operation (ii). But in this case these operations don’t actually worsen the doubling constant much either, and by applying some combination of (i) and (ii) (either intersecting {A+A} with a translate, or taking a sumset of {A \cap (A+h)} with itself) one can start lowering the doubling constant again.
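The behaviour in example (i) above is easy to see in a small simulation. The following Python sketch (with arbitrary illustrative parameters, encoding vectors of {{\bf F}_2^{10}} as integers so that addition is bitwise XOR) takes a random density-{1/K} subset {A} of a subspace {H} and compares the doubling constants of {A} and {A+A}:

```python
import random

def sumset(A, B):
    """A + B in F_2^n, with vectors encoded as ints and addition as XOR."""
    return {a ^ b for a in A for b in B}

def doubling(A):
    """The doubling constant |A+A|/|A|."""
    return len(sumset(A, A)) / len(A)

random.seed(0)
K = 4
H = set(range(256))  # the subspace of F_2^10 whose top two coordinates vanish
A = {x for x in H if random.random() < 1 / K}  # random density-1/K subset of H

print(doubling(A))             # close to K
print(doubling(sumset(A, A)))  # close to 1, since A + A nearly fills out H
```

Replacing the subspace by a union of {K} random cosets of a small subspace would instead illustrate example (ii), where intersecting with a translate is what collapses the doubling constant.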

This begins to suggest a potential strategy: show that at least one of the operations (i) or (ii) will improve the doubling constant, or at least not worsen it too much; and in the latter case, perform some more complicated operation to locate the desired doubling constant improvement.

A sign that this strategy might have a chance of working is provided by the following heuristic argument. If {A} has doubling constant {K}, then the Cartesian product {A \times A} has doubling constant {K^2}. On the other hand, by using the projection map {\pi: {\bf F}_2^n \times {\bf F}_2^n \rightarrow {\bf F}_2^n} defined by {\pi(x,y) := x+y}, we see that {A \times A} projects to {\pi(A \times A) = A+A}, with fibres {\pi^{-1}(\{h\})} being essentially a copy of {A \cap (A+h)}. So, morally, {A \times A} also behaves like a “skew product” of {A+A} and the fibres {A \cap (A+h)}, which suggests (non-rigorously) that the doubling constant {K^2} of {A \times A} is also something like the doubling constant of {A + A}, times the doubling constant of a typical fibre {A \cap (A+h)}. This would imply that at least one of {A +A} and {A \cap (A+h)} would have doubling constant at most {K}, and thus that at least one of operations (i), (ii) would not worsen the doubling constant.

Unfortunately, this argument does not seem to be easily made rigorous using the traditional doubling constant; even the significantly weaker statement that {A+A} has doubling constant at most {K^2} is false (see comments for more discussion). However, it turns out (as discussed in this recent paper of myself with Green and Manners) that things are much better. Here, the analogue of a subset {A} in {{\bf F}_2^n} is a random variable {X} taking values in {{\bf F}_2^n}, and the analogue of the (logarithmic) doubling constant {\log \frac{|A+A|}{|A|}} is the entropic doubling constant {d(X;X) := {\bf H}(X_1+X_2)-{\bf H}(X)}, where {X_1,X_2} are independent copies of {X}. If {X} is a random variable in some additive group {G} and {\pi: G \rightarrow H} is a homomorphism, one then has what we call the fibring inequality

\displaystyle  d(X;X) \geq d(\pi(X);\pi(X)) + d(X|\pi(X); X|\pi(X)),

where the conditional doubling constant {d(X|\pi(X); X|\pi(X))} is defined as

\displaystyle  d(X|\pi(X); X|\pi(X)) = {\bf H}(X_1 + X_2 | \pi(X_1), \pi(X_2)) - {\bf H}( X | \pi(X) ).

Indeed, from the chain rule for Shannon entropy one has

\displaystyle  {\bf H}(X) = {\bf H}(\pi(X)) + {\bf H}(X|\pi(X))


\displaystyle  {\bf H}(X_1+X_2) = {\bf H}(\pi(X_1) + \pi(X_2)) + {\bf H}(X_1 + X_2|\pi(X_1) + \pi(X_2))

while from the non-negativity of conditional mutual information one has

\displaystyle  {\bf H}(X_1 + X_2|\pi(X_1) + \pi(X_2)) \geq {\bf H}(X_1 + X_2|\pi(X_1), \pi(X_2))

and it is an easy matter to combine all these identities and inequalities to obtain the claim.
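These entropic quantities are also easy to experiment with numerically. Here is a minimal Python sketch of the entropic doubling constant {d(X;X) = {\bf H}(X_1+X_2)-{\bf H}(X)} (computed in bits; the two example distributions are illustrative, with elements of {{\bf F}_2^n} encoded as integers and addition as XOR):

```python
from collections import Counter
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def entropic_doubling(dist):
    """d(X;X) = H[X1+X2] - H[X] for X1, X2 independent copies of X."""
    sum_dist = Counter()
    for x, px in dist.items():
        for y, py in dist.items():
            sum_dist[x ^ y] += px * py  # XOR plays the role of addition in F_2^n
    return entropy(sum_dist) - entropy(dist)

# X uniform on the subspace {0,1,2,3}: X1+X2 is again uniform on it, so d(X;X) = 0
subspace = {x: 0.25 for x in range(4)}
print(entropic_doubling(subspace))  # → 0.0

# X uniform on {0,1,2,4}, which is not a subspace: d(X;X) > 0
generic = {x: 0.25 for x in [0, 1, 2, 4]}
print(entropic_doubling(generic))  # → 0.75
```

As with the combinatorial doubling constant, a subspace (or coset thereof) has entropic doubling zero, and any deviation from that structure shows up as positive entropic doubling.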

Applying this inequality with {X} replaced by two independent copies {(X_1,X_2)} of itself, and using the addition map {(x,y) \mapsto x+y} for {\pi}, we obtain in particular that

\displaystyle  2 d(X;X) \geq d(X_1+X_2; X_1+X_2) + d(X_1,X_2|X_1+X_2; X_1,X_2|X_1+X_2)

or (since {X_2} is determined by {X_1} once one fixes {X_1+X_2})

\displaystyle  2 d(X;X) \geq d(X_1+X_2; X_1+X_2) + d(X_1|X_1+X_2; X_1|X_1+X_2).

So if {d(X;X) = \log K}, then at least one of {d(X_1+X_2; X_1+X_2)} or {d(X_1|X_1+X_2; X_1|X_1+X_2)} will be less than or equal to {\log K}. This is the entropy analogue of at least one of (i) or (ii) improving, or at least not degrading the doubling constant, although there are some minor technicalities involving how one deals with the conditioning to {X_1+X_2} in the second term {d(X_1|X_1+X_2; X_1|X_1+X_2)} that we will gloss over here (one can pigeonhole the instances of {X_1} to different events {X_1+X_2=x}, {X_1+X_2=x'}, and “depolarise” the induction hypothesis to deal with distances {d(X;Y)} between pairs of random variables {X,Y} that do not necessarily have the same distribution). Furthermore, we can even calculate the defect in the above inequality: a careful inspection of the above argument eventually reveals that

\displaystyle  2 d(X;X) = d(X_1+X_2; X_1+X_2) + d(X_1|X_1+X_2; X_1|X_1+X_2)

\displaystyle  + {\bf I}( X_1 + X_2 : X_1 + X_3 | X_1 + X_2 + X_3 + X_4)

where we now take four independent copies {X_1,X_2,X_3,X_4}. This leads (modulo some technicalities) to the following interesting conclusion: if neither (i) nor (ii) leads to an improvement in the entropic doubling constant, then {X_1+X_2} and {X_2+X_3} are conditionally independent relative to {X_1+X_2+X_3+X_4}. This situation (or an approximation to this situation) is what we refer to in the paper as the “endgame”.

A version of this endgame conclusion is in fact valid in any characteristic. But in characteristic {2}, we can take advantage of the identity

\displaystyle  (X_1+X_2) + (X_2+X_3) = X_1 + X_3.

Conditioning on {X_1+X_2+X_3+X_4}, and using symmetry, we now conclude that if we are in the endgame exactly (so that the mutual information is zero), then the independent sum of two copies of {(X_1+X_2|X_1+X_2+X_3+X_4)} has exactly the same distribution as each copy; in particular, the entropic doubling constant here is zero, which is certainly a reduction in the doubling constant.

To deal with the situation where the conditional mutual information is small but not completely zero, we have to use an entropic version of the Balog–Szemerédi–Gowers lemma, but fortunately this was already worked out in an old paper of mine (although in order to optimise the final constant, we ended up using a slight variant of that lemma).

I am planning to formalize this paper in the Lean4 language. Further discussion of this project will take place on this Zulip stream, and the project itself will be held at this Github repository.

December 12, 2023

Terence Tao: A generalized Cauchy-Schwarz inequality via the Gibbs variational formula

Let {S} be a non-empty finite set. If {X} is a random variable taking values in {S}, the Shannon entropy {H[X]} of {X} is defined as

\displaystyle H[X] = -\sum_{s \in S} {\bf P}[X = s] \log {\bf P}[X = s].

There is a nice variational formula that lets one compute logs of sums of exponentials in terms of this entropy:

Lemma 1 (Gibbs variational formula) Let {f: S \rightarrow {\bf R}} be a function. Then

\displaystyle  \log \sum_{s \in S} \exp(f(s)) = \sup_X {\bf E} f(X) + {\bf H}[X]. \ \ \ \ \ (1)

Proof: Note that shifting {f} by a constant affects both sides of (1) the same way, so we may normalize {\sum_{s \in S} \exp(f(s)) = 1}. Then {\exp(f(s))} is now the probability distribution of some random variable {Y}, and the inequality can be rewritten as

\displaystyle  0 = \sup_X \sum_{s \in S} {\bf P}[X = s] \log {\bf P}[Y = s] -\sum_{s \in S} {\bf P}[X = s] \log {\bf P}[X = s].

But this is precisely the Gibbs inequality. (The expression inside the supremum can also be written as {-D_{KL}(X||Y)}, where {D_{KL}} denotes Kullback-Leibler divergence. One can also interpret this inequality as a special case of the Fenchel–Young inequality relating the conjugate convex functions {x \mapsto e^x} and {y \mapsto y \log y - y}.) \Box
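Since everything here is finite-dimensional, the variational formula is easy to sanity-check numerically. The following Python experiment (with natural logarithms, and an arbitrary test function {f} on a four-element set) verifies that the supremum is attained at the Gibbs distribution {p(s) \propto \exp(f(s))} and that random distributions never exceed it:

```python
import math
import random

def log_sum_exp(f):
    """The left-hand side log sum_s exp(f(s))."""
    return math.log(sum(math.exp(v) for v in f))

def objective(f, p):
    """E[f(X)] + H[X] (natural-log entropy) for a distribution p on the index set of f."""
    ef = sum(pi * fi for pi, fi in zip(p, f))
    h = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return ef + h

f = [0.3, -1.2, 2.0, 0.5]  # an arbitrary function on a 4-element set S
lhs = log_sum_exp(f)

# The supremum is attained at the Gibbs distribution p(s) proportional to exp(f(s)):
Z = sum(math.exp(v) for v in f)
gibbs = [math.exp(v) / Z for v in f]
assert abs(objective(f, gibbs) - lhs) < 1e-9

# Any other distribution gives a smaller value (the Gibbs inequality):
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in f]
    total = sum(w)
    p = [wi / total for wi in w]
    assert objective(f, p) <= lhs + 1e-9
```

The first assertion reflects the computation in the proof: plugging in the Gibbs distribution makes the Kullback-Leibler divergence vanish.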

In this note I would like to use this variational formula (which is also known as the Donsker-Varadhan variational formula) to give another proof of the following inequality of Carbery.

Theorem 2 (Generalized Cauchy-Schwarz inequality) Let {n \geq 0}, let {S, T_1,\dots,T_n} be finite non-empty sets, and let {\pi_i: S \rightarrow T_i} be functions for each {i=1,\dots,n}. Let {K: S \rightarrow {\bf R}^+} and {f_i: T_i \rightarrow {\bf R}^+} be positive functions for each {i=1,\dots,n}. Then

\displaystyle  \sum_{s \in S} K(s) \prod_{i=1}^n f_i(\pi_i(s)) \leq Q \prod_{i=1}^n (\sum_{t_i \in T_i} f_i(t_i)^{n+1})^{1/(n+1)}

where {Q} is the quantity

\displaystyle  Q := (\sum_{(s_0,\dots,s_n) \in \Omega_n} K(s_0) \dots K(s_n))^{1/(n+1)}

where {\Omega_n} is the set of all tuples {(s_0,\dots,s_n) \in S^{n+1}} such that {\pi_i(s_{i-1}) = \pi_i(s_i)} for {i=1,\dots,n}.

Thus for instance, the identity is trivial for {n=0}. When {n=1}, the inequality reads

\displaystyle  \sum_{s \in S} K(s) f_1(\pi_1(s)) \leq (\sum_{s_0,s_1 \in S: \pi_1(s_0)=\pi_1(s_1)} K(s_0) K(s_1))^{1/2}

\displaystyle  ( \sum_{t_1 \in T_1} f_1(t_1)^2)^{1/2},

which is easily proven by Cauchy-Schwarz, while for {n=2} the inequality reads

\displaystyle  \sum_{s \in S} K(s) f_1(\pi_1(s)) f_2(\pi_2(s))

\displaystyle  \leq (\sum_{s_0,s_1, s_2 \in S: \pi_1(s_0)=\pi_1(s_1); \pi_2(s_1)=\pi_2(s_2)} K(s_0) K(s_1) K(s_2))^{1/3}

\displaystyle (\sum_{t_1 \in T_1} f_1(t_1)^3)^{1/3} (\sum_{t_2 \in T_2} f_2(t_2)^3)^{1/3}

which can also be proven by elementary means. However even for {n=3}, the existing proofs require the “tensor power trick” in order to reduce to the case when the {f_i} are step functions (in which case the inequality can be proven elementarily, as discussed in the above paper of Carbery).
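As a quick sanity check, the {n=1} case can be verified numerically on random data. A Python sketch (the finite sets, the map {\pi_1}, and the positive functions {K}, {f_1} are all generated arbitrarily for illustration):

```python
import random

random.seed(0)
S = range(12)
T1 = range(4)
pi1 = {s: random.randrange(4) for s in S}       # an arbitrary map pi_1 : S -> T_1
K = {s: random.uniform(0.1, 2.0) for s in S}    # positive weight K on S
f1 = {t: random.uniform(0.1, 2.0) for t in T1}  # positive function f_1 on T_1

# Left-hand side: sum_s K(s) f_1(pi_1(s))
lhs = sum(K[s] * f1[pi1[s]] for s in S)

# Right-hand side: Q times the l^2 norm of f_1, where Q sums K(s0)K(s1)
# over pairs with pi_1(s0) = pi_1(s1)
Q = sum(K[s0] * K[s1] for s0 in S for s1 in S if pi1[s0] == pi1[s1]) ** 0.5
rhs = Q * sum(v * v for v in f1.values()) ** 0.5

assert lhs <= rhs + 1e-9
```

Since this case is exactly Cauchy-Schwarz applied to the fibre sums {\sum_{s: \pi_1(s)=t_1} K(s)}, the assertion holds for any choice of data, not just this seed.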

We now prove this inequality. We write {K(s) = \exp(k(s))} and {f_i(t_i) = \exp(g_i(t_i))} for some functions {k: S \rightarrow {\bf R}} and {g_i: T_i \rightarrow {\bf R}}. If we take logarithms in the inequality to be proven and apply Lemma 1, the inequality becomes

\displaystyle  \sup_X {\bf E} k(X) + \sum_{i=1}^n g_i(\pi_i(X)) + {\bf H}[X]

\displaystyle  \leq \frac{1}{n+1} \sup_{(X_0,\dots,X_n)} {\bf E} k(X_0)+\dots+k(X_n) + {\bf H}[X_0,\dots,X_n]

\displaystyle  + \frac{1}{n+1} \sum_{i=1}^n \sup_{Y_i} (n+1) {\bf E} g_i(Y_i) + {\bf H}[Y_i]

where {X} ranges over random variables taking values in {S}, {X_0,\dots,X_n} range over tuples of random variables taking values in {\Omega_n}, and {Y_i} range over random variables taking values in {T_i}. Comparing the suprema, the claim now reduces to

Lemma 3 (Conditional expectation computation) Let {X} be an {S}-valued random variable. Then there exists an {\Omega_n}-valued random variable {(X_0,\dots,X_n)}, where each {X_i} has the same distribution as {X}, and

\displaystyle  {\bf H}[X_0,\dots,X_n] = (n+1) {\bf H}[X]

\displaystyle - {\bf H}[\pi_1(X)] - \dots - {\bf H}[\pi_n(X)].

Proof: We induct on {n}. When {n=0} we just take {X_0 = X}. Now suppose that {n \geq 1}, and the claim has already been proven for {n-1}, thus one has already obtained a tuple {(X_0,\dots,X_{n-1}) \in \Omega_{n-1}} with each {X_0,\dots,X_{n-1}} having the same distribution as {X}, and

\displaystyle  {\bf H}[X_0,\dots,X_{n-1}] = n {\bf H}[X] - {\bf H}[\pi_1(X)] - \dots - {\bf H}[\pi_{n-1}(X)].

By hypothesis, {\pi_n(X_{n-1})} has the same distribution as {\pi_n(X)}. For each value {t_n} attained by {\pi_n(X)}, we can take conditionally independent copies of {(X_0,\dots,X_{n-1})} and {X} conditioned to the events {\pi_n(X_{n-1}) = t_n} and {\pi_n(X) = t_n} respectively, and then concatenate them to form a tuple {(X_0,\dots,X_n)} in {\Omega_n}, with {X_n} a further copy of {X} that is conditionally independent of {(X_0,\dots,X_{n-1})} relative to {\pi_n(X_{n-1}) = \pi_n(X)}. One can then use the entropy chain rule to compute

\displaystyle  {\bf H}[X_0,\dots,X_n] = {\bf H}[\pi_n(X_n)] + {\bf H}[X_0,\dots,X_n| \pi_n(X_n)]

\displaystyle  = {\bf H}[\pi_n(X_n)] + {\bf H}[X_0,\dots,X_{n-1}| \pi_n(X_n)] + {\bf H}[X_n| \pi_n(X_n)]

\displaystyle  = {\bf H}[\pi_n(X)] + {\bf H}[X_0,\dots,X_{n-1}| \pi_n(X_{n-1})] + {\bf H}[X_n| \pi_n(X_n)]

\displaystyle  = {\bf H}[\pi_n(X)] + ({\bf H}[X_0,\dots,X_{n-1}] - {\bf H}[\pi_n(X_{n-1})])

\displaystyle + ({\bf H}[X_n] - {\bf H}[\pi_n(X_n)])

\displaystyle  ={\bf H}[X_0,\dots,X_{n-1}] + {\bf H}[X_n] - {\bf H}[\pi_n(X_n)]

and the claim now follows from the induction hypothesis. \Box

With a little more effort, one can replace {S} by a more general measure space (and use differential entropy in place of Shannon entropy), to recover Carbery’s inequality in full generality; we leave the details to the interested reader.

December 07, 2023

Terence Tao: A slightly longer Lean 4 proof tour

In my previous post, I walked through the task of formally deducing one lemma from another in Lean 4. The deduction was deliberately chosen to be short and only showcased a small number of Lean tactics. Here I would like to walk through the process I used for a slightly longer proof I worked out recently, after seeing the following challenge from Damek Davis: to formalize (in a civilized fashion) the proof of the following lemma:

Lemma. Let \{a_k\} and \{D_k\} be sequences of real numbers indexed by natural numbers k=0,1,\dots, with a_k non-increasing and D_k non-negative. Suppose also that a_k \leq D_k - D_{k+1} for all k \geq 0. Then a_k \leq \frac{D_0}{k+1} for all k.

Here I tried to draw upon the lessons I had learned from the PFR formalization project, and to first set up a human readable proof of the lemma before starting the Lean formalization – a lower-case “blueprint” rather than the fancier Blueprint used in the PFR project. The main idea of the proof here is to use the telescoping series identity

\displaystyle \sum_{i=0}^k D_i - D_{i+1} = D_0 - D_{k+1}.

Since D_{k+1} is non-negative, and a_i \leq D_i - D_{i+1} by hypothesis, we have

\displaystyle \sum_{i=0}^k a_i \leq D_0

but by the monotone hypothesis on a_i the left-hand side is at least (k+1) a_k, giving the claim.

This is already a human-readable proof, but in order to formalize it more easily in Lean, I decided to rewrite it as a chain of inequalities, starting at a_k and ending at D_0 / (k+1). With a little bit of pen and paper effort, I obtained

a_k = (k+1) a_k / (k+1)

(by field identities)

= (\sum_{i=0}^k a_k) / (k+1)

(by the formula for summing a constant)

\leq (\sum_{i=0}^k a_i) / (k+1)

(by the monotone hypothesis)

\leq (\sum_{i=0}^k D_i - D_{i+1}) / (k+1)

(by the hypothesis a_i \leq D_i - D_{i+1})

= (D_0 - D_{k+1}) / (k+1)

(by telescoping series)

\leq D_0 / (k+1)

(by the non-negativity of D_{k+1}).
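Before formalizing, the blueprint can also be sanity-checked numerically. Here is a Python sketch using the concrete family D_k = c/(k+1) with a_k = D_k - D_{k+1} (chosen purely for illustration; it satisfies all the hypotheses of the lemma):

```python
import random

random.seed(1)
N = 50
c = random.uniform(1.0, 10.0)

# Concrete family: D_k = c/(k+1) is non-negative, and
# a_k = D_k - D_{k+1} = c/((k+1)(k+2)) is non-increasing.
D = [c / (k + 1) for k in range(N + 1)]
a = [D[k] - D[k + 1] for k in range(N)]

# Hypotheses of the lemma
assert all(d >= 0 for d in D)                               # D_k non-negative
assert all(a[k + 1] <= a[k] for k in range(N - 1))          # a_k non-increasing
assert all(a[k] <= D[k] - D[k + 1] + 1e-12 for k in range(N))

# Conclusion: a_k <= D_0 / (k+1) for all k
assert all(a[k] <= D[0] / (k + 1) + 1e-12 for k in range(N))
print("lemma verified on this family")
```

Of course, this only tests one family of sequences; the Lean proof below is what certifies the general statement.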

I decided that this was a good enough blueprint for me to work with. The next step is to formalize the statement of the lemma in Lean. For this quick project, it was convenient to use the online Lean playground, rather than my local IDE, so the screenshots will look a little different from those in the previous post. (If you like, you can follow this tour in that playground, by clicking on the screenshots of the Lean code.) I start by importing Lean’s math library, and starting an example of a statement to state and prove:

Now we have to declare the hypotheses and variables. The main variables here are the sequences a_k and D_k, which in Lean are best modeled by functions a, D from the natural numbers ℕ to the reals ℝ. One can choose to “hardwire” the non-negativity hypothesis into the D_k by making D take values in the nonnegative reals {\bf R}^+ (denoted NNReal in Lean), but this turns out to be inconvenient, because the laws of algebra and summation that we will need are clunkier on the non-negative reals (which are not even a group) than on the reals (which are a field). So we add in the variables:

Now we add in the hypotheses, which in Lean convention are usually given names starting with h. This is fairly straightforward; the one thing to note is that the property of being monotone decreasing already has a name in Lean’s Mathlib, namely Antitone, and it is generally a good idea to use the Mathlib-provided terminology (because that library contains a lot of useful lemmas about such terms).

One thing to note here is that Lean is quite good at filling in implied ranges of variables. Because a and D have the natural numbers ℕ as their domain, the dummy variable k in these hypotheses is automatically being quantified over ℕ. We could have made this quantification explicit if we so chose, for instance using ∀ k : ℕ, 0 ≤ D k instead of ∀ k, 0 ≤ D k, but it is not necessary to do so. Also note that Lean does not require parentheses when applying functions: we write D k here rather than D(k) (which in fact does not compile in Lean unless one puts a space between the D and the parentheses). This is slightly different from standard mathematical notation, but is not too difficult to get used to.

This looks like the end of the hypotheses, so we could now add a colon to move to the conclusion, and then add that conclusion:

This is a perfectly fine Lean statement. But it turns out that when proving a universally quantified statement such as ∀ k, a k ≤ D 0 / (k + 1), the first step is almost always to open up the quantifier to introduce the variable k (using the Lean command intro k). Because of this, it is slightly more efficient to hide the universal quantifier by placing the variable k in the hypotheses, rather than in the quantifier (in which case we have to now specify that it is a natural number, as Lean can no longer deduce this from context):

At this point Lean is complaining of an unexpected end of input: the example has been stated, but not proved. We will temporarily mollify Lean by adding a sorry as the purported proof:
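Assembled from the description above, the statement at this stage reads roughly as follows (a reconstruction of the screenshot, with the hypothesis names used later in this post):

```lean
import Mathlib

-- a_k ≤ D_0 / (k+1) for an antitone sequence a dominated by the increments of a
-- non-negative sequence D
example (a D : ℕ → ℝ) (ha : Antitone a) (hpos : ∀ k, 0 ≤ D k)
    (hD : ∀ k, a k ≤ D k - D (k+1)) (k : ℕ) :
    a k ≤ D 0 / (k+1) := by
  sorry
```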

Now Lean is content, other than giving a warning (as indicated by the yellow squiggle under the example) that the proof contains a sorry.

It is now time to follow the blueprint. The Lean tactic for proving an inequality via chains of other inequalities is known as calc. We use the blueprint to fill in the calc that we want, leaving the justifications of each step as “sorry”s for now:
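Transcribing the blueprint, the calc skeleton looks roughly like this (the `open` lines are explained in the next paragraph):

```lean
import Mathlib
open Finset BigOperators

example (a D : ℕ → ℝ) (ha : Antitone a) (hpos : ∀ k, 0 ≤ D k)
    (hD : ∀ k, a k ≤ D k - D (k+1)) (k : ℕ) :
    a k ≤ D 0 / (k+1) := by
  calc a k = (k+1) * a k / (k+1) := by sorry
    _ = (∑ i in range (k+1), a k) / (k+1) := by sorry
    _ ≤ (∑ i in range (k+1), a i) / (k+1) := by sorry
    _ ≤ (∑ i in range (k+1), (D i - D (i+1))) / (k+1) := by sorry
    _ = (D 0 - D (k+1)) / (k+1) := by sorry
    _ ≤ D 0 / (k+1) := by sorry
```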

Here, we “open”ed the Finset namespace in order to easily access Finset’s range function, with range k basically being the finite set of natural numbers \{0,\dots,k-1\}, and also “open”ed the BigOperators namespace to access the familiar ∑ notation for (finite) summation, in order to make the steps in the Lean code resemble the blueprint as much as possible. One could avoid opening these namespaces, but then expressions such as ∑ i in range (k+1), a i would instead have to be written as something like Finset.sum (Finset.range (k+1)) (fun i ↦ a i), which looks a lot less like standard mathematical writing. The proof structure here may remind some readers of the “two column proofs” that are somewhat popular in American high school geometry classes.

Now we have six sorries to fill. Navigating to the first sorry, Lean tells us the ambient hypotheses, and the goal that we need to prove to fill that sorry:
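For the first sorry, the infoview displays something like the following (reconstructed from the hypotheses declared above):

```
a D : ℕ → ℝ
ha : Antitone a
hpos : ∀ (k : ℕ), 0 ≤ D k
hD : ∀ (k : ℕ), a k ≤ D k - D (k + 1)
k : ℕ
⊢ a k = (↑k + 1) * a k / (↑k + 1)
```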

The ⊢ symbol here is Lean’s marker for the goal. The uparrows ↑ are coercion symbols, indicating that the natural number k has to be converted to a real number in order to interact via arithmetic operations with other real numbers such as a k, but we can ignore these coercions for this tour (for this proof, it turns out Lean will basically manage them automatically without need for any explicit intervention by a human).

The goal here is a self-evident algebraic identity; it involves division, so one has to check that the denominator is non-zero, but this is self-evident. In Lean, a convenient way to establish algebraic identities is to use the tactic field_simp to clear denominators, and then ring to verify any identity that is valid for commutative rings. This works, and clears the first sorry:
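The first line of the calc then becomes (a reconstruction of the screenshot):

```lean
  calc a k = (k+1) * a k / (k+1) := by field_simp; ring
```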

field_simp, by the way, is smart enough to deduce on its own that the denominator k+1 here is manifestly non-zero (and in fact positive); no human intervention is required to point this out. Similarly for other “clearing denominator” steps that we will encounter in the other parts of the proof.

Now we navigate to the next `sorry`. Lean tells us the hypotheses and goals:

We can reduce the goal by canceling out the common denominator ↑k+1. Here we can use the handy Lean tactic congr, which tries to match two sides of an equality goal as much as possible, and leave any remaining discrepancies between the two sides as further goals to be proven. Applying congr, the goal reduces to

Here one might imagine that this is something that one can prove by induction. But this particular sort of identity – summing a constant over a finite set – is already covered by Mathlib. Indeed, searching for Finset, sum, and const soon leads us to the Finset.sum_const lemma here. But there is an even more convenient path to take here, which is to apply the powerful tactic simp, which tries to simplify the goal as much as possible using all the “simp lemmas” Mathlib has to offer (of which Finset.sum_const is an example, but there are thousands of others). As it turns out, simp completely kills off this identity, without any further human intervention:
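So the second step of the calc can be closed with something like:

```lean
    _ = (∑ i in range (k+1), a k) / (k+1) := by congr 1; simp
```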

Now we move on to the next sorry, and look at our goal:

congr doesn’t work here because we have an inequality instead of an equality, but there is a powerful relative gcongr of congr that is perfectly suited for inequalities. It can also open up sums, products, and integrals, reducing global inequalities between such quantities into pointwise inequalities. If we invoke gcongr with i hi (where we tell gcongr to use i for the variable opened up, and hi for the constraint this variable will satisfy), we arrive at a greatly simplified goal (and a new ambient variable and hypothesis):

Now we need to use the monotonicity hypothesis on a, which we have named ha here. Looking at the documentation for Antitone, one finds a lemma that looks applicable here:
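namely (paraphrasing the Mathlib statement, with the universe and typeclass boilerplate suppressed):

```lean
theorem Antitone.imp (hf : Antitone f) (h : a ≤ b) : f b ≤ f a
```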

One can apply this lemma in this case by writing apply Antitone.imp ha, but because ha is already of type Antitone, we can abbreviate this to apply ha.imp. (Actually, as indicated in the documentation, due to the way Antitone is defined, we can even just use apply ha here.) This reduces the goal nicely:

The goal is now very close to the hypothesis hi. One could now look up the documentation for Finset.range to see how to unpack hi, but as before simp can do this for us. Invoking simp at hi, we obtain

Now the goal and hypothesis are very close indeed. Here we can just close the goal using the linarith tactic used in the previous tour:
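Putting these tactics together, the third step of the calc reads (reconstruction):

```lean
    _ ≤ (∑ i in range (k+1), a i) / (k+1) := by
        gcongr with i hi
        apply ha
        simp at hi
        linarith
```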

The next sorry can be resolved by similar methods, using the hypothesis hD applied at the variable i:
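That is, roughly:

```lean
    _ ≤ (∑ i in range (k+1), (D i - D (i+1))) / (k+1) := by
        gcongr with i hi
        exact hD i
```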

Now for the penultimate sorry. As in a previous step, we can use congr to remove the denominator, leaving us in this state:

This is a telescoping series identity. One could try to prove it by induction, or one could try to see if this identity is already in Mathlib. Searching for Finset, sum, and sub will locate the right tool (as the fifth hit), but a simpler way to proceed here is to use the exact? tactic we saw in the previous tour:

A brief check of the documentation for sum_range_sub' confirms that this is what we want. Actually we can just use apply sum_range_sub' here, as the apply tactic is smart enough to fill in the missing arguments:
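So the penultimate step becomes (reconstruction):

```lean
    _ = (D 0 - D (k+1)) / (k+1) := by
        congr 1
        apply sum_range_sub'
```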

One last sorry to go. As before, we use gcongr to cancel denominators, leaving us with

This looks easy, because the hypothesis hpos will tell us that D (k+1) is nonnegative; specifically, the instance hpos (k+1) of that hypothesis will state exactly this. The linarith tactic will then resolve this goal once it is told about this particular instance:
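So the final step reads (reconstruction):

```lean
    _ ≤ D 0 / (k+1) := by
        gcongr
        linarith [hpos (k+1)]
```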

We now have a complete proof – no more yellow squiggly line in the example. There are two warnings though – there are two variables i and hi introduced in the proof that Lean’s “linter” has noticed are not actually used in the proof. So we can rename them with underscores to tell Lean that we are okay with them not being used:
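Assembled, the completed proof looks roughly like this (a reconstruction; the exact placement of the underscores in the original may differ):

```lean
import Mathlib
open Finset BigOperators

example (a D : ℕ → ℝ) (ha : Antitone a) (hpos : ∀ k, 0 ≤ D k)
    (hD : ∀ k, a k ≤ D k - D (k+1)) (k : ℕ) :
    a k ≤ D 0 / (k+1) := by
  calc a k = (k+1) * a k / (k+1) := by field_simp; ring
    _ = (∑ i in range (k+1), a k) / (k+1) := by congr 1; simp
    _ ≤ (∑ i in range (k+1), a i) / (k+1) := by
        gcongr with i hi
        apply ha
        simp at hi
        linarith
    _ ≤ (∑ i in range (k+1), (D i - D (i+1))) / (k+1) := by
        gcongr with i _
        exact hD i
    _ = (D 0 - D (k+1)) / (k+1) := by congr 1; apply sum_range_sub'
    _ ≤ D 0 / (k+1) := by gcongr; linarith [hpos (k+1)]
```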

This is a perfectly fine proof, but upon noticing that many of the steps are similar to each other, one can do a bit of “code golf” as in the previous tour to compactify the proof a bit:

With enough familiarity with the Lean language, this proof actually tracks quite closely with (an optimized version of) the human blueprint.

This concludes the tour of a lengthier Lean proving exercise. I am finding that the pre-planning step of the proof (using an informal “blueprint” to break the proof down into extremely granular pieces) makes the formalization process significantly easier than in the past (when I often adopted a sequential process of writing one line of code at a time without first sketching out a skeleton of the argument). (The proof here took only about 15 minutes to create initially, although for this blog post I had to recreate it with screenshots and supporting links, which took significantly more time.) I believe that a realistic near-term goal for AI is to be able to fill in automatically a significant fraction of the sorts of atomic “sorry”s of the size one saw in this proof, allowing one to convert a blueprint to a formal Lean proof even more rapidly.

One final remark: in this tour I filled in the “sorry”s in the order in which they appeared, but there is actually no requirement that one does this, and once one has used a blueprint to atomize a proof into self-contained smaller pieces, one can fill them in in any order. Importantly for a group project, these micro-tasks can be parallelized, with different contributors claiming whichever “sorry” they feel they are qualified to solve, and working independently of each other. (And, because Lean can automatically verify if their proof is correct, there is no need to have a pre-existing bond of trust with these contributors in order to accept their contributions.) Furthermore, because the specification of a “sorry” is self-contained, someone can make a meaningful contribution to the proof by working on an extremely localized component of it without needing the mathematical expertise to understand the global argument. This is not particularly important in this simple case, where the entire lemma is not too hard for a trained mathematician to understand, but can become quite relevant for complex formalization projects.

December 06, 2023

Terence Tao Formalizing the proof of PFR in Lean4 using Blueprint: a short tour

Since the release of my preprint with Tim Gowers, Ben Green, and Freddie Manners proving the Polynomial Freiman-Ruzsa (PFR) conjecture over {\mathbb F}_2, I (together with Yael Dillies and Bhavik Mehta) have started a collaborative project to formalize this argument in the proof assistant language Lean4. It has been less than a week since the project was launched, but it is proceeding quite well, with a significant fraction of the paper already either fully or partially formalized. The project has been greatly assisted by the Blueprint tool of Patrick Massot, which allows one to write a human-readable “blueprint” of the proof that is linked to the Lean formalization; similar blueprints have been used for other projects, such as Scholze’s liquid tensor experiment. For the PFR project, the blueprint can be found here. One feature of the blueprint that I find particularly appealing is the dependency graph that is automatically generated from the blueprint, and can provide a rough snapshot of how far along the formalization has advanced. For PFR, the latest state of the dependency graph can be found here. At the current time of writing, the graph looks like this:

The color coding of the various bubbles (for lemmas) and rectangles (for definitions) is explained in the legend to the dependency graph, but roughly speaking the green bubbles/rectangles represent lemmas or definitions that have been fully formalized, and the blue ones represent lemmas or definitions which are ready to be formalized (their statements, but not proofs, have already been formalized, as well as those of all prerequisite lemmas and definitions). The goal is to get all the bubbles leading up to and including the “pfr” bubble at the bottom colored in green.

In this post I would like to give a quick “tour” of the project, to give a sense of how it operates. If one clicks on the “pfr” bubble at the bottom of the dependency graph, we get the following:

Here, Blueprint is displaying a human-readable form of the PFR statement. This is coming from the corresponding portion of the blueprint, which also comes with a human-readable proof of this statement that relies on other statements in the project:

(I have cropped out the second half of the proof here, as it is not relevant to the discussion.)

Observe that the “pfr” bubble is white, but has a green border. This means that the statement of PFR has been formalized in Lean, but not the proof; and the proof itself is not ready to be formalized, because some of the prerequisites (in particular, “entropy-pfr” (Theorem 6.16)) do not even have their statements formalized yet. If we click on the “Lean” link below the description of PFR in the dependency graph, we are led to the (auto-generated) Lean documentation for this assertion:

This is what a typical theorem in Lean looks like (after a procedure known as “pretty printing”). There are a number of hypotheses stated before the colon, for instance that G is a finite elementary abelian group of exponent 2 (this is how we have chosen to formalize the finite field vector spaces {\bf F}_2^n), that A is a non-empty subset of G (the hypothesis that A is non-empty was not stated in the LaTeX version of the conjecture, but we realized it was necessary in the formalization, and will update the LaTeX blueprint shortly to reflect this) with the cardinality of A+A at most K times the cardinality of A, and the statement after the colon is the conclusion: that A can be contained in the sum c+H of a subgroup H of G and a set c of cardinality at most 2K^{12}.

The astute reader may notice that the above theorem seems to be missing one or two details, for instance it does not explicitly assert that H is a subgroup. This is because the “pretty printing” suppresses some of the information in the actual statement of the theorem, which can be seen by clicking on the “Source” link:
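In outline, the source looks something like the following (a paraphrase of the repository code rather than an exact copy; in particular, `ElementaryAddCommGroup G 2` is the project’s way of encoding an elementary abelian 2-group):

```lean
-- A ⊆ c + H for some subgroup H and small set c, given |A+A| ≤ K|A|
theorem PFR_conjecture {G : Type*} [AddCommGroup G] [ElementaryAddCommGroup G 2]
    [Fintype G] {A : Set G} {K : ℝ} (h₀A : A.Nonempty)
    (hA : Nat.card (A + A) ≤ K * Nat.card A) :
    ∃ (H : AddSubgroup G) (c : Set G),
      Nat.card c ≤ 2 * K ^ 12 ∧ Nat.card H ≤ Nat.card A ∧ A ⊆ c + H := by
  sorry
```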

Here we see that H is required to have the “type” of an additive subgroup of G. (Lean’s language revolves very strongly around types, but for this tour we will not go into detail into what a type is exactly.) The prominent “sorry” at the bottom of this theorem asserts that a proof is not yet provided for this theorem, but the intention of course is to replace this “sorry” with an actual proof eventually.

Filling in this “sorry” is too hard to do right now, so let’s look for a simpler task to accomplish instead. Here is a simple intermediate lemma “ruzsa-nonneg” that shows up in the proof:

The expression d[X; Y] refers to something called the entropic Ruzsa distance between X and Y, which is something that is defined elsewhere in the project, but for the current discussion it is not important to know its precise definition, other than that it is a real number. The bubble is blue with a green border, which means that the statement has been formalized, and the proof is ready to be formalized also. The blueprint dependency graph indicates that this lemma can be deduced from just one preceding lemma, called “ruzsa-diff“:

“ruzsa-diff” is also blue and bordered in green, so it has the same current status as “ruzsa-nonneg”: the statement is formalized, and the proof is ready to be formalized also, but the proof has not been written in Lean yet. The quantity H[X], by the way, refers to the Shannon entropy of X, defined elsewhere in the project, but for this discussion we do not need to know its definition, other than to know that it is a real number.

Looking at Lemma 3.11 and Lemma 3.13 it is clear how the former will imply the latter: the quantity |H[X] - H[Y]| is clearly non-negative! (There is a factor of 2 present in Lemma 3.11, but it can be easily canceled out.) So it should be an easy task to fill in the proof of Lemma 3.13 assuming Lemma 3.11, even if we still don’t know how to prove Lemma 3.11 yet. Let’s first look at the Lean code for each lemma. Lemma 3.11 is formalized as follows:
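Schematically, it reads like this (with the variables X, \mu, Y, \mu' and the bracket notations H[·] and d[· # ·] declared earlier in the file, as noted below):

```lean
lemma diff_ent_le_rdist : |H[X ; μ] - H[Y ; μ']| ≤ 2 * d[X ; μ # Y ; μ'] := by
  sorry
```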

Again we have a “sorry” to indicate that this lemma does not currently have a proof. The Lean notation (as well as the name of the lemma) differs a little from the LaTeX version for technical reasons that we will not go into here. (Also, the variables X, \mu, Y, \mu' are introduced at an earlier stage in the Lean file; again, we will ignore this point for the ensuing discussion.) Meanwhile, Lemma 3.13 is currently formalized as
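which schematically reads:

```lean
lemma rdist_nonneg : 0 ≤ d[X ; μ # Y ; μ'] := by
  sorry
```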

OK, I’m now going to try to fill in the latter “sorry”. In my local copy of the PFR github repository, I open up the relevant Lean file in my editor (Visual Studio Code, with the lean4 extension) and navigate to the “sorry” of “rdist_nonneg”. The accompanying “Lean infoview” then shows the current state of the Lean proof:

Here we see a number of ambient hypotheses (e.g., that G is an additive commutative group, that X is a map from \Omega to G, and so forth; many of these hypotheses are not actually relevant for this particular lemma), and at the bottom we see the goal we wish to prove.

OK, so now I’ll try to prove the claim. This is accomplished by applying a series of “tactics” to transform the goal and/or hypotheses. The first step I’ll do is to put in the factor of 2 that is needed to apply Lemma 3.11. This I will do with the “suffices” tactic, writing in the proof
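roughly as follows (reconstructed; the two `sorry`s stand in for the two goals that this tactic creates):

```lean
  suffices h : 0 ≤ 2 * d[X ; μ # Y ; μ']
  . sorry
  sorry
```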

I now have two goals (and two “sorries”): one to show that 0 \leq 2 d[X;Y] implies 0 \leq d[X;Y], and the other to show that 0 \leq 2 d[X;Y]. (The yellow squiggly underline indicates that this lemma has not been fully proven yet due to the presence of “sorry”s. The dot “.” is a syntactic marker that is useful to separate the two goals from each other, but you can ignore it for this tour.) The Lean tactic “suffices” corresponds, roughly speaking, to the phrase “It suffices to show that …” (or more precisely, “It suffices to show that … . To see this, … . It remains to verify the claim …”) in Mathematical English. For my own education, I wrote a “Lean phrasebook” of further correspondences between lines of Lean code and sentences or phrases in Mathematical English, which can be found here.

Let’s fill in the first “sorry”. The tactic state now looks like this (cropping out some irrelevant hypotheses):

Here I can use a handy tactic “linarith“, which solves any goal that can be derived by linear arithmetic from existing hypotheses:

This works, and now the tactic state reports no goals left to prove on this branch, so we move on to the remaining sorry, in which the goal is now to prove 0 \leq 2 d[X;Y]:

Here we will try to invoke Lemma 3.11. I add the following lines of code:
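namely, roughly (again with the claim left as a `sorry` for the moment):

```lean
  have h : |H[X ; μ] - H[Y ; μ']| ≤ 2 * d[X ; μ # Y ; μ']
  . sorry
  sorry
```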

The Lean tactic “have” roughly corresponds to the Mathematical English phrase “We have the statement…” or “We claim the statement…”; like “suffices”, it splits a goal into two subgoals, though in the reversed order to “suffices”.

I again have two subgoals, one to prove the bound |H[X]-H[Y]| \leq 2 d[X;Y] (which I will call “h”), and then to deduce the previous goal 0 \leq 2 d[X;Y] from h. For the first, I know I should invoke the lemma “diff_ent_le_rdist” that is encoding Lemma 3.11. One way to do this is to try the tactic “exact?”, which will automatically search to see if the goal can already be deduced immediately from an existing lemma. It reports:

So I try this (by clicking on the suggested code, which automatically pastes it into the right location), and it works, leaving me with the final “sorry”:

The lean tactic “exact” corresponds, roughly speaking, to the Mathematical English phrase “But this is exactly …”.

At this point I should mention that I also have the Github Copilot extension to Visual Studio Code installed. This is an AI which acts as an advanced autocomplete that can suggest possible lines of code as one types. In this case, it offered a suggestion which was almost correct (the second line is what we need, whereas the first is not necessary, and in fact does not even compile in Lean):

In any event, “exact?” worked in this case, so I can ignore the suggestion of Copilot this time (it has been very useful in other cases though). I apply the “exact?” tactic a second time and follow its suggestion to establish the matching bound 0 \leq |H[X] - H[Y]|:

(One can find documentation for the “abs_nonneg” method here. Copilot, by the way, was also able to resolve this step, albeit with a slightly different syntax; there are also several other search engines available to locate this method, such as Moogle. One of the main purposes of the Lean naming conventions for lemmas is to facilitate the location of methods such as “abs_nonneg”, which is easier to figure out how to search for than a method named (say) “Lemma 1.2.1”.) To fill in the final “sorry”, I try “exact?” one last time, to figure out how to combine h and h' to give the desired goal, and it works!
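The completed proof then looks roughly like this (a reconstruction of the screenshot, following the steps described above):

```lean
lemma rdist_nonneg : 0 ≤ d[X ; μ # Y ; μ'] := by
  suffices h : 0 ≤ 2 * d[X ; μ # Y ; μ']
  . linarith
  have h : |H[X ; μ] - H[Y ; μ']| ≤ 2 * d[X ; μ # Y ; μ']
  . exact diff_ent_le_rdist
  have h' : 0 ≤ |H[X ; μ] - H[Y ; μ']|
  . exact abs_nonneg _
  exact ge_trans h h'
```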

Note that all the squiggly underlines have disappeared, indicating that Lean has accepted this as a valid proof. The documentation for “ge_trans” may be found here. The reader may observe that this method uses the \geq relation rather than the \leq relation, but in Lean the assertions X \geq Y and Y \leq X are “definitionally equal“, allowing tactics such as “exact” to use them interchangeably. “exact le_trans h’ h” would also have worked in this instance.

It is possible to compactify this proof quite a bit by cutting out several intermediate steps (a procedure sometimes known as “code golf“):
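One plausible compactification (the exact syntax for supplying the arguments depends on how the ambient variables are declared in the file):

```lean
lemma rdist_nonneg : 0 ≤ d[X ; μ # Y ; μ'] := by
  -- |H[X] - H[Y]| is nonnegative and bounded by 2·d[X;Y], so linarith concludes
  linarith [diff_ent_le_rdist, abs_nonneg (H[X ; μ] - H[Y ; μ'])]
```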

And now the proof is done! In the end, it was literally a “one-line proof”, which makes sense given how close Lemma 3.11 and Lemma 3.13 were to each other.

The current version of Blueprint does not automatically verify the proof (even though it does compile in Lean), so we have to manually update the blueprint as well. The LaTeX for Lemma 3.13 currently looks like this:

I add the “\leanok” macro to the proof, to flag that the proof has now been formalized:

I then push everything back up to the master Github repository. The blueprint will take quite some time (about half an hour) to rebuild, but eventually it does, and the dependency graph (which Blueprint has for some reason decided to rearrange a bit) now shows “ruzsa-nonneg” in green:

And so the formalization of PFR moves a little bit closer to completion. (Of course, this was a particularly easy lemma to formalize, that I chose to illustrate the process; one can imagine that most other lemmas will take a bit more work.) Note that while “ruzsa-nonneg” is now colored in green, we don’t yet have a full proof of this result, because the lemma “ruzsa-diff” that it relies on is not green. Nevertheless, the proof is locally complete at this point; hopefully at some point in the future, the predecessor results will also be locally proven, at which point this result will be completely proven. Note how this blueprint structure allows one to work on different parts of the proof asynchronously; it is not necessary to wait for earlier stages of the argument to be fully formalized to start working on later stages, although I anticipate a small amount of interaction between different components as we iron out any bugs or slight inaccuracies in the blueprint. (For instance, I am suspecting that we may need to add some measurability hypotheses on the random variables X, Y in the above two lemmas to make them completely true, but this is something that should emerge organically as the formalization process continues.)

That concludes the brief tour! If you are interested in learning more about the project, you can follow the Zulip chat stream; you can also download Lean and work on the PFR project yourself, using a local copy of the Github repository and sending pull requests to the master copy if you have managed to fill in one or more of the “sorry”s in the current version (but if you plan to work on anything more large scale than filling in a small lemma, it is good to announce your intention on the Zulip chat to avoid duplication of effort). (One key advantage of working with a project based around a proof assistant language such as Lean is that it makes large-scale mathematical collaboration possible without necessarily having a pre-established level of trust amongst the collaborators; my fellow repository maintainers and I have already approved several pull requests from contributors we had not previously met, as the code was verified to be correct and we could see that it advanced the project. Conversely, as the above example should hopefully demonstrate, it is possible for a contributor to work on one small corner of the project without necessarily needing to understand all the mathematics that goes into the project as a whole.)

If you just want to experiment with Lean without going to the effort of downloading it, you can try the “Natural Number Game” for a gentle introduction to the language, or the Lean4 playground for an online Lean server. Further resources to learn Lean4 may be found here.

November 27, 2023

Sean Carroll New Course: The Many Hidden Worlds of Quantum Mechanics

In past years I’ve done several courses for The Great Courses/Wondrium (formerly The Teaching Company): Dark Matter and Dark Energy, Mysteries of Modern Physics: Time, and The Higgs Boson and Beyond. Now I’m happy to announce a new one, The Many Hidden Worlds of Quantum Mechanics.

This is a series of 24 half-hour lectures, given by me with impressive video effects from the Wondrium folks.

The content will be somewhat familiar if you’ve read my book Something Deeply Hidden — the course follows a similar outline, with a few new additions and elaborations along the way. So it’s both a general introduction to quantum mechanics, and also an in-depth exploration of the Many Worlds approach in particular. It’s meant for absolutely everybody — essentially no equations this time! — but 24 lectures is plenty of time to go into depth.

Check out this trailer:

As I type this on Monday 27 November, I believe there is some kind of sale going on! So move quickly to get your quantum mechanics at unbelievably affordable prices.

November 23, 2023

Sean Carroll Thanksgiving

This year we give thanks for a feature of nature that is frequently misunderstood: quanta. (We’ve previously given thanks for the Standard Model Lagrangian, Hubble’s Law, the Spin-Statistics Theorem, conservation of momentum, effective field theory, the error bar, gauge symmetry, Landauer’s Principle, the Fourier Transform, Riemannian Geometry, the speed of light, the Jarzynski equality, the moons of Jupiter, space, black hole entropy, electromagnetism, and Arrow’s Impossibility Theorem.)

Of course quantum mechanics is very important and somewhat misunderstood in its own right; I can recommend a good book if you’d like to learn more. But we’re not getting into the measurement problem or the reality problem just now. I want to highlight one particular feature of quantum mechanics that is sometimes misinterpreted: the fact that some things, like individual excitations of quantized fields (“particles”) or the energy levels of atoms, come in sets of discrete numbers, rather than taking values on a smooth continuum. These discrete chunks of something-or-other are the “quanta” being referred to in the title of a different book, scheduled to come out next spring.

The basic issue is that people hear the phrase “quantum mechanics,” or even take a course in it, and come away with the impression that reality is somehow pixelized — made up of smallest possible units — rather than being ultimately smooth and continuous. That’s not right! Quantum theory, as far as it is currently understood, is all about smoothness. The lumpiness of “quanta” is just apparent, although it’s a very important appearance.

What’s actually happening is a combination of (1) fundamentally smooth functions, (2) differential equations, (3) boundary conditions, and (4) what we care about.

This might sound confusing, so let’s fix ideas by looking at a ubiquitous example: the simple harmonic oscillator. That can be thought of as a particle moving in one dimension, x, with a potential energy that looks like a parabola: V(x) = \frac{1}{2}\omega^2x^2. In classical mechanics, there is a lowest-energy state where the particle just sits at the bottom of its potential, unmoving, so both its kinetic and potential energies are zero. We can give it any positive amount of energy we like, either by kicking it to impart motion or just picking it up and dropping it in the potential at some point other than the origin.

Quantum mechanically, that’s not quite true (although it’s truer than you might think). Now we have a set of discrete energy levels, starting from the ground state and going upward in equal increments. Quanta!
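Concretely, solving the quantum problem for this potential (in units where the particle’s mass is 1) gives the allowed energies

\displaystyle E_n = \left(n + \frac{1}{2}\right)\hbar\omega, \qquad n = 0, 1, 2, \dots

so neighboring levels are separated by the same fixed quantum \hbar\omega.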

But we didn’t put the quanta in. They come out of the above four ingredients. First, the particle is described not by its position and momentum, but by its wave function, \psi(x,t). Nothing discrete about that; it’s a fundamentally smooth function. But second, that function isn’t arbitrary; it’s going to obey the Schrödinger equation, which is a special differential equation. The Schrödinger equation tells us how the wave function evolves with time, and we can solve it starting with any initial wave function \psi(x, 0) we like. Still nothing discrete there. But there is one requirement, coming from the idea of boundary conditions: if the wave function grows (or remains constant) as x\rightarrow \pm \infty, the potential energy grows along with it. (It actually has to diminish at infinity just to be a wave function at all, but for the moment let’s think about the energy.) When we bring in the fourth ingredient, “what we care about,” the answer is that we care about low-energy states of the oscillator. That’s because in real-world situations, there is dissipation. Whatever physical system is being modeled by the harmonic oscillator, in reality it will most likely have friction or be able to give off photons or something like that. So no matter where we start, left to its own devices the oscillator will diminish in energy. So we generally care about states with relatively low energy.
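For the record, the differential equation in question is the Schrödinger equation, which for our unit-mass particle in the potential V(x) reads

\displaystyle i\hbar\, \frac{\partial}{\partial t}\psi(x,t) = -\frac{\hbar^2}{2}\frac{\partial^2}{\partial x^2}\psi(x,t) + V(x)\,\psi(x,t),

and it is this smooth equation, together with the boundary conditions just described, that will produce the discreteness.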

Since this is quantum mechanics after all, most states of the wave function won’t have a definite energy, in much the same way they will not have a definite position or momentum. (They have “an energy” — the expectation value of the Hamiltonian — but not a “definite” one, since you won’t necessarily observe that value.) But there are some special states, the energy eigenstates, associated with a specific, measurable amount of energy. It is those states that are discrete: they come in a set made of a lowest-energy “ground” state, plus a ladder of evenly-spaced states of ever-higher energy.

We can even see why that’s true, and why the states look the way they do, just by thinking about boundary conditions. Since each state has finite energy, the wave function has to be zero at the far left and also at the far right. The energy in the state comes from two sources: the potential, and the “gradient” energy from the wiggles in the wave function. The lowest-energy state will be a compromise between “staying as close to x=0 as possible” and “not changing too rapidly at any point.” That compromise looks like the bottom (red) curve in the figure: starts at zero on the left, gradually increases and then decreases as it continues on to the right. It is a feature of eigenstates that they are all “orthogonal” to each other — there is zero net overlap between them. (Technically, if you multiply them together and integrate over x, the answer is zero.) So the next eigenstate will first oscillate down, then up, then back to zero. Subsequent energy eigenstates will each oscillate just a bit more, so they contain the least possible energy while being orthogonal to all the lower-lying states. Those requirements mean that they will each pass through zero exactly one more time than the state just below them.

And that is where the “quantum” nature of quantum mechanics comes from. Not from fundamental discreteness or anything like that; just from the properties of the set of solutions to a perfectly smooth differential equation. It’s precisely the same as why you get a fundamental note from a violin string tied at both ends, as well as a series of discrete harmonics, even though the string itself is perfectly smooth.

One cool aspect of this is that it also explains why quantum fields look like particles. A field is essentially the opposite of a particle: the latter has a specific location, while the former is spread all throughout space. But quantum fields solve equations with boundary conditions, and we care about the solutions. It turns out (see above-advertised book for details!) that if you look carefully at just a single “mode” of a field — a plane-wave vibration with specified wavelength — its wave function behaves much like that of a simple harmonic oscillator. That is, there is a ground state, a first excited state, a second excited state, and so on. Through a bit of investigation, we can verify that these states look and act like a state with zero particles, one particle, two particles, and so on. That’s where particles come from.
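The mode-as-oscillator correspondence can also be made concrete with a few lines of linear algebra (again my own illustration; the basis truncation D is an arbitrary cutoff): build the annihilation operator a in the number basis, with a|n\rangle = \sqrt{n}\,|n-1\rangle, and look at the mode Hamiltonian a^\dagger a + 1/2 in units \hbar\omega = 1.

```python
import numpy as np

# A single mode of a quantum field behaves like a harmonic oscillator.
# Truncate the number basis at D states (an arbitrary cutoff) and build
# the annihilation operator a, with a|n> = sqrt(n)|n-1>.
D = 10
a = np.diag(np.sqrt(np.arange(1.0, D)), 1)

number = a.T @ a                    # the number operator a^dagger a
H = number + 0.5 * np.eye(D)        # mode Hamiltonian in units hbar*omega = 1

print(np.diag(number))              # 0, 1, 2, ...: particles in the mode
print(np.diag(H))                   # 0.5, 1.5, 2.5, ...: the oscillator ladder
```

The eigenvalues of a^\dagger a are exactly the integers: the nth rung of the oscillator ladder is the state with n particles in the mode.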

We see particles in the world, not because it is fundamentally lumpy, but because it is fundamentally smooth, while obeying equations with certain boundary conditions. It’s always tempting to take what we see to be the underlying truth of nature, but quantum mechanics warns us not to give in.

Is reality fundamentally discrete? Nobody knows. Quantum mechanics is certainly not, even if you have quantum gravity. Nothing we know about gravity implies that “spacetime is discrete at the Planck scale.” (That may be true, but it is not implied by anything we currently know; indeed, it is counter-indicated by things like the holographic principle.) You can think of the Planck length as the scale at which the classical approximation to spacetime is likely to break down, but that’s a statement about our approximation schemes, not the fundamental nature of reality.

States in quantum theory are described by rays in Hilbert space, which is a vector space, and vector spaces are completely smooth. You can construct a candidate vector space by starting with some discrete things like bits, then considering linear combinations, as happens in quantum computing (qubits) or various discretized models of spacetime. The resulting Hilbert space is finite-dimensional, but is still itself very much smooth, not discrete. (Rough guide: “quantizing” a discrete system gets you a finite-dimensional Hilbert space, quantizing a smooth system gets you an infinite-dimensional Hilbert space.) True discreteness requires throwing out ordinary quantum mechanics and replacing it with something fundamentally discrete, hoping that conventional QM emerges in some limit. That’s the approach followed, for example, in models like the Wolfram Physics Project. I recently wrote a paper proposing a judicious compromise, where standard QM is modified in the mildest possible way, replacing evolution in a smooth Hilbert space with evolution on a discrete lattice defined on a torus. It raises some cosmological worries, but might otherwise be phenomenologically acceptable. I don’t yet know if it has any specific experimental consequences, but we’re thinking about that.

November 13, 2023

Robert HellingHow not to detect MOND

You might have heard about recent efforts to inspect lots of "wide binaries": double stars that orbit each other at very large separations, whose precise observation is one of the tasks the Gaia mission was built for. The goal is to determine whether their dynamics follows Newtonian gravity or rather MOND, the modified Newtonian dynamics (Einstein's theory plays no role at such weak fields).

You can learn about the latest update from this video by Dr. Becky (spoiler: Newton's just fine).

MOND is an alternative theory of gravity that was originally proposed as an alternative to dark matter to explain galactic rotation curves (which it does quite well, some argue better than dark matter). Since then, it has been investigated in other weak-gravity situations as well. In short, it introduces an additional scale \(a_0\) with dimensions of acceleration and posits that the gravitational acceleration (either in Newton's law of gravity or in Newton's second law) is modified by an interpolating factor

$$\mu\!\left(\frac{a}{a_0}\right), \qquad \mu(x)\to 1 \text{ for } x\gg 1, \qquad \mu(x)\to x \text{ for } x\ll 1,$$

where a is the acceleration without the correction.
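To make the two regimes concrete, here is a tiny sketch. The specific functional form is an assumption on my part; the "simple" choice \(\mu(x)=x/(1+x)\) is common in the literature, and other choices like \(x/\sqrt{1+x^2}\) share the same limits.

```python
# A MOND interpolating factor: the "simple" choice mu(x) = x/(1+x), with
# x = a/a0. mu -> 1 for a >> a0 (Newtonian regime), mu -> x for a << a0
# (deep-MOND regime).
def mu(x):
    return x / (1.0 + x)

a0 = 1.2e-10   # Milgrom's acceleration scale in m/s^2

for a in (1e-7, 1.2e-10, 1e-13):          # strong, intermediate, weak fields
    print(f"a = {a:.1e} m/s^2  ->  mu = {mu(a / a0):.6f}")
```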

In the recent studies reported on in the video, people measure the stars' velocities and have to do statistics, because they don't know the orbital parameters or the orientation of the orbits relative to the line of sight.

That gave me an idea of what else one could try: when the law of gravity is modified from its \(1/r^2\) form at large separations and correspondingly small gravitational accelerations, the orbits will no longer be Kepler ellipses. What happens, for example, if the modified dynamics makes eccentricities grow or shrink systematically? Then we might observe too many binaries with large or small eccentricities, and that would be an indication of a modified gravitational law.

The only question is: what does the modification actually do? A quick internet search did not reveal anything useful combining celestial mechanics and MOND, so I had to figure it out myself. Inspection shows that you can absorb the modification into the force law, replacing the Newtonian acceleration \(\vec r/r^3\) by

$$\mu(1/r^2) \frac{\vec r}{r^3}$$

and thus into a corresponding new gravitational potential. So much of the usual analysis carries over: energy and angular momentum are still conserved, and one can go into the center-of-mass frame and work with the reduced mass of the system. I will use units in which \(GM=1\) to simplify calculations.

The only thing that will no longer be conserved is the Runge-Lenz vector

$$\vec A= \vec p\times\vec L - \vec e_r.$$

\(\vec A\) points in the direction of the major semi-axis and its length equals the eccentricity of the ellipse.

Just recall that in Newtonian gravity, this is an additional constant of motion (which makes the symmetry group of the system \(SO(4)\) rather than \(SO(3)\), and is responsible for states with different \(\ell\) being degenerate in energy in the hydrogen atom), as one can easily check

$$\dot{\vec A} = \{H, \vec A\}= \dot{\vec p}\times \vec L-\dot{\vec e_r}=\dots=0$$

using the equations of motion in the first term. 

To test this idea I started Mathematica and used the numerical ODE solver to solve the modified equations of motion and plot the resulting orbit. I used initial data that implies a large eccentricity (so one can easily see the orientation of the ellipse) and an \(a_0\) that kicks in for about the further away half of the orbit.
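The same experiment can be sketched in Python as well (the post used Mathematica). The interpolating function, the value of \(a_0\), and the initial data below are my own illustrative choices, not the exact parameters used in the post.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Orbit in a MOND-modified 1/r^2 force, units with GM = 1. The "simple"
# interpolating function mu and the value of a0 are illustrative choices.
a0 = 0.02

def mu(aN):
    x = aN / a0
    return x / (1.0 + x)          # mu -> 1 for strong fields, -> x for weak ones

def rhs(t, y):
    rx, ry, px, py = y
    r = np.hypot(rx, ry)
    aN = 1.0 / r**2               # Newtonian acceleration (GM = 1)
    f = mu(aN) * aN / r           # modified acceleration magnitude, divided by r
    return [px, py, -f * rx, -f * ry]

def runge_lenz(y):
    rx, ry, px, py = y
    L = rx * py - ry * px         # conserved angular momentum (z-component)
    r = np.hypot(rx, ry)
    # A = p x L - e_r; in the orbital plane this is:
    return np.array([py * L - rx / r, -px * L - ry / r])

# Eccentric initial data: start near apoapsis with a small tangential velocity,
# so mu deviates noticeably from 1 on the outer half of the orbit.
y0 = [3.0, 0.0, 0.0, 0.3]
sol = solve_ivp(rhs, (0.0, 200.0), y0, rtol=1e-10, atol=1e-12,
                dense_output=True)

ts = np.linspace(0.0, 200.0, 400)
ecc = [np.linalg.norm(runge_lenz(sol.sol(t))) for t in ts]
print(f"|A| wiggles between {min(ecc):.3f} and {max(ecc):.3f}")
```

Setting a0 to a tiny value recovers the Newtonian case, where \(\|\vec A\|\) stays constant to solver precision; with the modification switched on, \(\vec A\) rotates (the precession seen in the plot) while its norm only wiggles.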

Clearly, the orbit is no longer elliptic but precesses around the center of the potential. On the other hand, it does not look like the instantaneous ellipses get rounder or narrower. So let's plot the orbit of the would-be Runge-Lenz vector:

Orbit of the would-be Runge-Lenz vector \(\vec A\)

What a disappointment! Even though it is no longer conserved, it seems to move on a circle, with some additional wiggles on it (did anybody mention epicycles?). So it is only the orientation of the orbit that changes with time; there is no general trend toward smaller or larger eccentricities that one might look for in real data.

On the other hand, the eccentricity \(\|\vec A\|\) is not exactly conserved: it wiggles a bit over the orbit but comes back to its original value after one full revolution. Can we understand that analytically?

To this end, we use the fact that the equation of motion only enters the first term when computing the time derivative of \(\vec A\):

 $$\dot{\vec A}=\left(1-\mu(1/r^2)\right) \dot{\vec e_r}.$$

\(\mu\) differs from 1 far away from the center, where the acceleration is weakest. On the other hand, since \(\vec e_r\) is a unit vector, its time derivative has to be orthogonal to it. But in the far-away part of the ellipse, \(\vec e_r\) is almost parallel to the major semi-axis and thus to \(\vec A\), so \(\dot{\vec A}\propto\dot{\vec e_r}\) is almost orthogonal to \(\vec A\), leaving \(\|\vec A\|\) nearly unchanged. Furthermore, due to the reflection symmetry of the ellipse, the parts of \(\dot{\vec e_r}\) that are not orthogonal to \(\vec A\) cancel each other on the two sides, and thus the wiggling around the average \(\|\vec A\|\) is periodic with the period of the orbit. q.e.d.

There is only a tiny net effect since the ellipse is not exactly symmetric but precesses a little bit. This can be seen when plotting \(\|\vec A\|\) as a function of time:

\(\|\vec A\|\) as a function of time for the first 1000 units of time (brown) and from time 9000 to 10,000 (red)

The same plot zoomed in. One can see that the brown line's minimum is slightly below the red one.
If one looks very carefully, one sees a tiny trend towards larger values of eccentricity.

This is probably far too weak to have any observable consequence (in particular since there are a million other perturbing effects), but these numerics suggest that binaries whose orbits probe the MOND regime for a long time should show slightly larger eccentricities on average.

So Gaia people, go out and check this!

November 09, 2023

Jordan EllenbergWriting exercise: poll report as dialogue

I’m not sure if I mentioned that I’m teaching a first-year undergrad seminar on “Writing and Data,” in some respects patterned after the Writing Scientists’ Workshop I ran last year. With 18-year-olds it’s a little different; for one thing, I find that doing two hour-long workshops in a row gets a little long for them, so I’m doing two 45-minute workshops with an in-class writing exercise in between. Last week’s worked particularly well so I wanted to record what I did. We started with this piece from Pew, “How Many Friends do Americans Have?” Because I want them to think about conveying the same information in different registers, and in particular writing more “conversationally,” I split the group into pairs and asked each pair to write a dialogue which conveyed some of the information from the Pew piece. I gave them 15-20 minutes to do that, then had each pair act out their dialogue. I had been wondering whether to have everyone start from the same source or let people pick; in the end, I was glad we were all working from the same article, because it was instructive to see how many different ways the same information could be deployed in speech, or an imitation of speech. If there’s one thing I’m trying to get across in this class, it’s that writing is much, much more than the factual information it conveys.