Scones, Logical Relations, and Parametricity
Posted by Mike Shulman
Consider a formal system, such as logic, type theory, or some programming language. Suppose you want to prove some “meta-theoretic property” of this system, such as the following.
- Consistency: There is no proof of false. This is obviously a desirable property of a formal system.
- The existence property: if it is provable that “there exists an such that ”, then there is some particular such that is provable.
- The disjunction property: if “ or ” is provable, then either is provable or is provable. The existence and disjunction properties are not true of all formal systems: they are characteristic of constructive logics. (Excluded middle obviously violates the disjunction property, while the axiom of choice obviously violates the existence property.) In fact, arguably they are a defining feature of constructive logics, where proving that something exists ought to involve “constructing” a particular such thing.
- Canonicity: any term of natural-number type evaluates to (or, is provably equal to) some numeral (and various similar statements). This is an important property for any system that can function as a programming language: we want our programs to compute results!
- Parametricity: I’ll explain this later on.
How might you go about proving properties like this? If you’re a syntactically minded type theorist, you might try to use induction over types. That is, you first prove something for “base types” such as the natural numbers , then prove that if it holds for and then it holds for derived types such as , , and , etc. After a lot of work reformulating the thing you’re trying to prove as a sufficiently general statement for such an induction to go through, you finally make it all work and write QED. Then, because the reformulation you ended up with involves assigning “relations” (sometimes unary, sometimes binary) to types in an inductive way, you call the method logical relations.
I’ve gathered that this sort of thing is perfectly intuitive to a type theorist. However, when I first encountered it, being a category theorist, I had a very hard time understanding it. Certain steps seemed extremely unmotivated, and I didn’t see any principled reason for what was going on. I think I understand the type theorists’ point of view a little better now — but fortunately there’s another way to structure the whole argument, which I think makes much more sense to a category theorist.
As category theorists, we like to use universal properties to avoid doing work. (Or, more precisely, we like to use them to package up work into an easily reusable theorem, so that we only have to do the work once.) Fortunately, the syntax of type theory has a nice universal property: it constructs free categories with specified structure.
A little more precisely, from any type theory , we can build a category whose objects are, roughly, the types in the theory, and whose morphisms are terms with free variables. That is, a morphism is a term of type involving a free variable of type . Some more details, and a few examples, can be found at the nLab page on syntactic categories.
Now whatever structure our type theory had, it is usually visible as some categorical property of the syntactic category . For instance, if had product types, then has categorical products. If also had function types, then is cartesian closed. And so on. Moreover, is the initial such category: for any other category with the appropriate categorical structure, there is an essentially unique functor preserving that structure. (I’m sweeping a lot of details under the rug here, having to do with weakness and coherence and strictification and what categorical level this universal property lives at, but the basic idea is the important thing.)
Now the question becomes, how can we use this universal property to prove meta-theoretic properties of the type theory? Clearly, we are going to have to cook up some particular structured category (or perhaps more than one) so that the existence of a structure-preserving functor tells us something about . Thus, the construction of will have to involve somehow—but must also be different from . Moreover, we should expect to have to use the uniqueness aspect of the universal property too.
When I first wrote the above paragraph, I said that we were going to have to cleverly cook up some category . Then I felt embarrassed, because I remembered Tom’s comment that “if cleverness is the first quality that comes to mind, then it suggests to me that something is insufficiently understood.” So I thought for a bit, and now I can say more about where this comes from. (I can actually say even more than I’m about to right now, but the rest will have to wait for another post.)
Let’s take a clue from type theory itself. Consider a type like , the natural numbers type. Semantically, we want to think of this as a natural numbers object, which is an initial object in some category. More specifically, for any equipped with morphisms and , there is supposed to be a unique morphism respecting this structure.
Inside of type theory, the existence of such a morphism is easy to specify: it’s called recursion and is the natural “elimination rule” for the type . But uniqueness of that morphism is not as natural, type-theoretically: it would be asserting an equality, which is less “computational”. Instead, we generalize the recursion rule to say that not only does every such come with a morphism out of , every such over comes with a section. This turns out to be equivalent to asking to be initial. For if is initial, then when is over , the map must be a section (by uniqueness of the composite ); and conversely, for any two morphisms , if their equalizer admits a section, then they are equal. A homotopical version of this equivalence was proven recently by Awodey, Gambino, and Sojakova.
(You may wonder whether talking about “sections” is any better than talking about equality, since being a section of means that . However, dependent type theory has a way to talk about sections of particular kinds of maps—called “display maps” or “dependent projections” or “fibrations”—without referring explicitly to any kind of equality.)
This suggests that to use the uniqueness as well as the existence part of the universal property of , we might look for a structured category over , for which it would be interesting to have a section. Moreover, we might expect the functor to be some sort of category-theoretic fibration.
Now how can we build an interesting fibration over , without knowing much about itself? Well, if we had a functor for some other category , and some other functor , then we could build the comma category of over or under . The most obvious choice of , of course, would be the identity functor of . But what sort of functor can we build without knowing much about ? Well, as Tom also reminded us last year, about the only functors out of an arbitrary category that we can even mention are the representables, . And finally, what’s the most canonical object ? One very canonical object is the terminal one.
There are still a few choices that we might make differently, but I hope I’ve convinced you that at least one reasonable possibility to consider for is the scone of : the comma category of over the global sections functor . Explicitly, an object of consists of an object together with a function . Elements of are called global sections of , so we can view as a family of sets indexed by the global sections of . Similarly, a morphism from to consists of a morphism in together with a function making the evident square commute. The projection simply forgets the primed data.
Over a year ago, I talked about scones from a “geometric” point of view. At that time, I said that given a “forgetful functor” , we can think of the scone as a “concretization” of : its objects are “sets equipped with -structure”. It’s at least interesting to ponder this in our current situation: the objects of are “sets equipped with the structure of a type”. I’m not really sure what that means, but it sounds profound.
Anyway, in order to apply the universal property of to get a functor , we’ll need to know that inherits all the categorical structure that has. And to conclude that this map is a section of the projection, we’ll need to know that the projection also preserves all this structure. But suppose we take these two facts as given for a moment, so that exists and is a section of the projection. That last means that for , is the object itself equipped with a family of its global sections. Thus, what does is just to equip every object of with a family of sets indexed by its global sections—that is, a map —in a way which is functorial and structure-preserving.
In particular, insofar as the maps are injective (as often happens), what does is assign to each type a predicate—that is, a unary relation—on its set of global sections. And a global section of a type in in just a “closed term” of type (meaning a term involving no free variables). Thus, in general we may think of the elements of over as “witnesses” or “proofs” that some relation “” holds.
Finally, if we look “under the hood” into the actual construction of , we see that it’s built inductively over the structure of types (since this is how we prove the universal property of ). Thus, these unary relations are precisely the syntactically-minded type theorist’s “logical relations”. What category theory does for us is (1) fold all inductions over the structure of types into the single statement that is initial in some category, so that we only have to do it once; and even more importantly, (2) tell us automatically what each inductive step in the construction of should look like: it’s determined by the corresponding categorical structure in , which in turn is characterized by a universal property.
In order to get any mileage out of this, of course, we need an explicit description of how inherits structure from and from , in a way that is preserved by the forgetful functor . This is basically always completely straightforward. Here are some easy examples that you can check for yourself:
The terminal object of is .
The initial object of is .
The product of and is . In other words, a witness of is just a witness of together with a witness of .
A morphism is a monomorphism just when both and are so. In particular, a subobject of is given by a subobject in , together with for each , a subset .
If includes (“truncated”) existential quantification, so that is a regular category, then is a regular epi just when both and are so.
If includes binary unions of subobjects, then the union of and is .
If includes a type of natural numbers, then the natural numbers object of is , where is the ordinary set of natural numbers and the function sends each to the numeral . In other words, a witness of is a natural number such that is equal to the numeral .
This is enough to prove consistency, canonicity, and the disjunction and existence properties! Consistency is the easiest: if there were a term of type false, it would be a morphism in . But then would take this to a morphism . Since preserves all the categorical structure, is the terminal object and is the initial object . But then we would have a morphism in , which is impossible. (Of course, this is a relative consistency result, as always – if our axioms for were inconsistent, then there would be a morphism .)
For the existence property, suppose we can prove . That means exactly that the composite in is a regular epi. Since preserves the relevant categorical structure in all cases, in this case it preserves regular epis. It follows that is regular epi, and in particular is a surjective function. But also preserves the terminal object, so , and hence must be nonempty. Therefore, is nonempty, so has a global element in . In other words, there is a closed term such that is provable. The argument for the disjunction property is similar; I’ll leave it as an exercise. (I think Peter Freyd was the first to prove these two properties categorically in this way.)
Finally, for canonicity, suppose is a term of natural-number type, hence a global section in . Then is a morphism in , since preserves the terminal object and the natural numbers object. But is the usual set of natural numbers, so this means that must be equal to some numeral (namely, the primed part of ).
I think this much is already pretty awesome, but now we get to parametricity. For this we need to know, first of all, what exponentials look like in . Of course, they must be preserved by the forgetful functor, so the underlying -object of is the exponential in . But now the elements of the set which lie over must be equivalent to maps in lying over in , and hence equivalently to maps lying over . In other words, if is a morphism in , hence a global element of , then the fiber of over consists of all functions lifting .
In particular, if and are monic, hence merely predicates on the global sections of and respectively, then is also a predicate on global sections of , which holds of precisely when maps into . In other words, a witness of a function is a function exhibiting the fact that preserves witnesses. (This isn’t a proof that has exponentials, of course, only a characterization of what they must be if they exist. But as usual with this sort of thing, it’s easy to prove that this definition works.)
More generally, we’ll need local exponentials (exponentials in slice categories) and dependent products, but these work in essentially the same way: to every “function term” we assign the set of “liftings” of it to witnesses.
The other thing we need to know about is universe objects. Thus, suppose is a type of (small) types, giving a universe object in . In particular, we have the type of pointed types, with the first projection being the “universal fibration” in . That is, for any type , dependent types over are classified by maps , with the resulting display map being the corresponding pullback of .
Now the first thing we need is a family of sets indexed by the global sections of . Of course, a global section of is just a type dependent over , which is to say a type defined in the empty context. Now given such an , a witness of must correspond to a map in lifting in . But if is to be a universe object, such maps must be equivalent to objects of lying over . In other words, a witness of a type is any lifting of to an object of . Any lifting at all.
In particular, since must preserve all structure, we must have . Note that this is not merely a unary relation on : a given type has many witnesses.
We also need the universal fibration . But this is likewise easy: suppose is an object of , hence a global element of , and suppose also that is a global section of . Then we must have a commutative square of sets Elements of lying over should be the same as liftings of to which agree with . But since the object must be the pullback of along its classifying map , this means precisely to give an element of lying over .
That is, for a type with a witness , a witness of is just an element of over , i.e. a witness of in the ordinary sense defined above.
Finally, we’re ready for parametricity. Consider the type , an inhabitant of which is a function assigning to every (small) type an endomorphism of . In classical mathematics, there are lots and lots of such functions. However, in constructive type theory, there’s essentially only one such function: the one which assigns to each its identity function! The point is that without some additional assumption such as excluded middle or AC, we can’t write down such a function that behaves differently depending on what is: e.g. we can’t say “if has exactly 2 elements, then swap them, otherwise, …”. (I believe this is the origin of the word “parametricity” — we say the function has to be defined parametrically in .) And if you just have an arbitrary , without knowing anything about it, then as Tom would say, what’s the only function that you can even mention? The identity function.
Of course, you can throw other garbage into the definition of such a function, but it won’t be able to change the fact that what comes out at the end of the day acts like the identity function. This is the theorem that we can now prove using the scone.
Categorically, the type is built as the local exponential in the slice category of over , followed by the dependent product from this slice category to itself. And all these constructions must be preserved by . Unraveling this, the first step gives us an object of over , whose component in is the dependent type . The additional data assigned by consists of
- for every (this is a global section of in ), and
- for every choice of a lift of to (this is a witness of ), and
- for every term (this is a global section of lying over the classifying map of ),
the set of liftings of to witnesses. Note that these witnesses are precisely given by the specified above. If we think only about relations, then at this stage, is recording, for every term , whether it preserves each unary relation on .
Now at the second step, we consider the liftings of each to the witnesses defined above. In other words, a witness of such a consists of, for every and every , a lifting of to these witnesses. In particular, this means that the function preserves every unary relation on the set . Clearly this is only possible if is the identity function (consider the characteristic relations of singletons).
Finally, any actual term yields a global section in , hence a global section of , and hence a witness of in the above sense. This is a precise form of the first parametricity theorem: any term of type behaves like the identity function when applied to terms in the empty context.
What is the point of all this? I think the example of universe objects is where the category-theoretic point of view really shines. When you do “logical relations” from the syntactic type-theory point of view, you basically have to make up each inductive clause of the definition of the logical relation as you go. The definition for function types is not too surprising, if you think about the unary relation as a sort of “well-behavedness”: it’s not unreasonable to say that a function is well-behaved if it maps well-behaved inputs to well-behaved outputs. (I believe the usual word for “well-behaved” in this context is “reducible”.) But I think the definition for a universe is pretty opaque from this point of view: a witness to reducibility of a type is any relation on its global elements at all?
Moreover, if you insist on thinking of as merely a relation on global elements, then you can’t even phrase it like this. Basic presentations of parametricity don’t actually include a universe object; instead they work in a “polymorphic type theory”, where you can form types that behave like “” but which are not literally dependent products over any type “”. Then you basically have to make up the combined rule for defining on these types as above: is reducible if for every type , the function preserves every unary relation on .
Let me end with a few words about generalizations. There are lots of other “parametricity” theorems – have a look at this classic paper. For instance, for any definable functors and , any term of type must be a natural transformation. To prove this, you need not just unary logical relations but binary logical relations. But these is easy to do categorically too: instead of the scone of , we consider the category whose objects consist of an , a set , and a function . This category inherits all the relevant structure in essentially the same way.
The most general context that I know of in which to do this is the colax limit of a (pseudo)functor , where is an inverse category. If is the interval category and picks out , then this colax limit is the scone. If is the parallel pair and picks out twice, then this colax limit is the place for binary relations I mentioned above. We obviously have -ary relations for all , and also potentially “higher” logical relations. (Are there any uses for those?)
The assumption that is inverse is what gives things like exponentials and universe objects such a nice form in the colax limit. I made use of that same fact in this paper, focusing mainly on the special case of colax limits of constant functors — which is the same as categories of diagrams over in some fixed category . But I recently realized that the same techniques work for non-constant functors, so that the scones work homotopically as well (up to a point), and in particular, inherit things like the univalence axiom.
In principle, this means that we can extend all the above theorems to type theory with univalence. Canonicity in the presence of univalence, in particular, has been an important open question since its introduction: does the univalence axiom introduce new closed terms of natural number type whose numeric value isn’t determined? Dan Licata and Bob Harper showed that the answer is essentially no in the 1-truncated case, by adding definitional equality rules to make the type theory 1-truncated, and then giving a logical relations argument to show canonicity. The gluing approach, on the other hand, doesn’t modify the underlying type theory, but yields instead a homotopical answer: every term of natural number type is homotopic to a numeral. (That this is true was conjectured by Voevodsky when he introduced the univalence axiom.)
Unfortunately, the gluing approach also suffers from a coherence problem. In a homotopical setting, we’d like to glue along a “global sections” functor valued in simplicial sets (or some other model for ). However, it seems hard to obtain a strict functor of this sort. We do have global sections functors into or , and since and contain one and two univalent universes, respectively, we can derive homotopy canonicity for 1-truncated type theory (in which all types are 1-types) with one or two univalent universes. (The groupoid scone was also considered, without univalence, in this paper by Hofstra and Warren.) The new version of my paper (posted today) describes this; it’s an open question how any of these results could be extended to type theories without a truncation axiom, and with arbitrarily many univalent universes.
Acknowledgments: this post owes a lot to Bob Harper, from whom I first learned about logical relations at OPLSS; to Peter Lumsdaine, who first tried to tell me what they had to do with scones; and to Thierry Coquand, whose talk at IAS last term about regarding setoids as truncated semi-simplicial types finally crystallized for me the relation between inverse diagrams and logical relations.
Re: Scones, Logical Relations, and Parametricity
Here are a couple more more papers which go into the categorical end of the connection:
John Mitchell and Andrew Scedrov, Notes on Sconing and Relators
It explains the basic setup of logical relations from a categorical pov, for simple types.
Wesley Phoa. An introduction to fibrations, topos theory, the effective topos and modest sets
Realizability and PER models are basically defined using logical relations, and so many of the same ideas carry over. They also make the fact that the witness to reducibility of a type is an arbitrary relation seem rather natural, at least to me.