March 1, 2011

Characterizing the p-Norms

Posted by Tom Leinster

Some mathematical objects acquire a reputation for being important. We know they’re important because our lecturers told us so when we were students, and because we’ve observed that they’re treated as important by large groups of research mathematicians. If you stood up in public and asked exactly what was so important about them, you might fear getting laughed at as an ignoramus… but perhaps no one would have a really good answer. There’s only a social proof of importance.

I have a soft spot for theorems that take a mathematical object known socially to be important and state a precise mathematical sense in which it’s important. This might, for example, be a universal property (‘it’s the universal thing with these good properties’) or a unique characterization (‘it’s the unique thing with these good properties’).

Previously I’ve enthused about theorems that do this for the category $\Delta$, the topological space $[0, 1]$, and the Banach space $L^1$. Today I’ll enthuse about a theorem that does it for the $p$-norms $\Vert\cdot\Vert_p$. The theorem is from a recent paper of Guillaume Aubrun and Ion Nechita.

The statement is beautifully simple.

First here’s some notation.

  • For any finite set $I$, we have the real vector space $\mathbf{R}^I$.
  • For any injection $f: I \to J$ between finite sets, there is the induced linear map $f_*: \mathbf{R}^I \to \mathbf{R}^J$, got by reindexing as dictated by $f$ and padding out with $0$s.
  • For any $x \in \mathbf{R}^I$ and $y \in \mathbf{R}^J$, there is an element $x \otimes y \in \mathbf{R}^{I \times J}$, whose $(i, j)$-coordinate is $x_i y_j$. (I call it $x \otimes y$ because if you identify $\mathbf{R}^{I \times J}$ with $\mathbf{R}^I \otimes \mathbf{R}^J$, that’s what it is.)
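The two constructions in this notation can be sketched in a few lines of Python. This is only an illustration: vectors in $\mathbf{R}^I$ are modelled as dictionaries keyed by the elements of $I$, and the function names `pushforward` and `tensor` are hypothetical, chosen here for readability.

```python
def pushforward(f, x, J):
    """Sketch of f_*: R^I -> R^J for an injection f: I -> J.

    f is a dict sending each index in I to an index in J,
    x is a dict representing a vector in R^I,
    J is an iterable listing the target index set.
    """
    if len(set(f.values())) != len(f):
        raise ValueError("f must be injective")
    y = {j: 0.0 for j in J}      # pad out with zeros
    for i, xi in x.items():
        y[f[i]] = xi             # reindex as dictated by f
    return y


def tensor(x, y):
    """Sketch of x (tensor) y in R^(I x J): coordinate (i, j) is x_i * y_j."""
    return {(i, j): xi * yj for i, xi in x.items() for j, yj in y.items()}
```

For instance, pushing the vector `{'a': 1.5, 'b': -2.0}` along the injection `{'a': 2, 'b': 0}` into the index set `[0, 1, 2]` swaps the two coordinates and inserts a zero in position `1`.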

A norm system is a sensible way of assigning a norm to each vector space $\mathbf{R}^I$, where $I$ ranges over finite sets. In other words, it consists of a specified norm $\Vert\cdot\Vert$ on $\mathbf{R}^I$ for each finite set $I$, such that whenever $f: I \to J$ is an injection of finite sets and $x \in \mathbf{R}^I$, then $\Vert f_*(x) \Vert = \Vert x \Vert$. All that says is that if you pad out $x$ with some zeros, and switch the order of the coordinates round, it doesn’t change the norm.

Example:  For each $p \in [1, \infty]$ there’s a norm system $\Vert\cdot\Vert_p$ given by the usual formulas: if $p < \infty$ then $\Vert x \Vert_p = (\sum_{i \in I} |x_i|^p)^{1/p}$ (for finite sets $I$ and $x \in \mathbf{R}^I$), and $\Vert x \Vert_\infty = \max_{i \in I} |x_i|$.
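As a minimal sketch of this example, here is the $p$-norm on a vector stored as a dictionary keyed by its index set. The name `p_norm` and the dictionary representation are assumptions made for illustration. Because the formula only looks at the multiset of absolute values, relabelling the indices or adding zero coordinates cannot change the result, which is exactly the norm system axiom.

```python
import math


def p_norm(x, p):
    """p-norm of a vector x in R^I, given as a dict {index: value}.

    p may be any real number >= 1, or math.inf for the max norm.
    """
    values = [abs(v) for v in x.values()]
    if not values:
        return 0.0               # the zero vector of R^(empty set)
    if p == math.inf:
        return max(values)
    if p < 1:
        raise ValueError("p must lie in [1, infinity]")
    return sum(v ** p for v in values) ** (1.0 / p)
```

With `x = {'a': 3.0, 'b': -4.0}`, the relabelled and padded vector `{0: -4.0, 1: 0.0, 2: 3.0}` has the same norm as `x` for every `p`.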

A norm system is multiplicative if $\Vert x \otimes y \Vert = \Vert x \Vert \Vert y \Vert$ for all finite sets $I$ and $J$, $x \in \mathbf{R}^I$, and $y \in \mathbf{R}^J$.

Example:  For each $p \in [1, \infty]$, the norm system $\Vert\cdot\Vert_p$ is multiplicative.
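The verification of this example is short enough to write out. The following is a sketch using only the definitions above (with $I$ and $J$ taken non-empty in the case $p = \infty$); it is not taken from Aubrun and Nechita's paper.

```latex
% Case p < \infty: the sum over I x J factorizes.
\[
\Vert x \otimes y \Vert_p^p
  = \sum_{(i,j) \in I \times J} |x_i y_j|^p
  = \Bigl(\sum_{i \in I} |x_i|^p\Bigr) \Bigl(\sum_{j \in J} |y_j|^p\Bigr)
  = \Vert x \Vert_p^p \, \Vert y \Vert_p^p .
\]
% Case p = \infty: a maximum of products of non-negative terms factorizes.
\[
\Vert x \otimes y \Vert_\infty
  = \max_{(i,j) \in I \times J} |x_i| \, |y_j|
  = \Bigl(\max_{i \in I} |x_i|\Bigr) \Bigl(\max_{j \in J} |y_j|\Bigr)
  = \Vert x \Vert_\infty \, \Vert y \Vert_\infty .
\]
```

Taking $p$th roots in the first line gives multiplicativity for finite $p$.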

Theorem (Aubrun and Nechita) The only multiplicative norm systems are $\Vert\cdot\Vert_p$ ($p \in [1, \infty]$).

I find this amazing, both in itself and because it wasn’t known half a century ago. In fact it’s only the first of two theorems in their paper; the second concerns the $L_p$ norms (as opposed to what we’ve just been discussing, the $\ell_p$ norms). Anyway, I’ll stick to the first one.

I’ll say a bit more later about why I find this amazing, but first I should point out that my phrasing of Aubrun’s and Nechita’s theorem is a bit different from theirs.

How Aubrun and Nechita put it  Someone reading my phrasing might think: That’s a bit extravagant. A norm on $\mathbf{R}^I$ for every finite set $I$? Well, the axioms imply that if $I \cong J$ then the norm on $\mathbf{R}^I$ determines the norm on $\mathbf{R}^J$, so we might as well just consider one finite set of each cardinality. That is, instead of taking this huge system of norms, we take just one permutation-invariant norm on each of the spaces $\mathbf{R}^n$ ($n \in \mathbf{N}$), such that $\Vert (x_1, \ldots, x_n, 0) \Vert = \Vert (x_1, \ldots, x_n) \Vert$.

That’s fine, but there’s a small price to pay: in order to state the multiplicativity axiom, you have to choose a bijection between $\{1, \ldots, n\} \times \{1, \ldots, m\}$ and $\{1, \ldots, n m\}$ for each $n$ and $m$.

But you still might find that extravagant. The axioms imply that the norm on $\mathbf{R}^{n + 1}$ determines the norm on $\mathbf{R}^n$, so you might as well just work with $c_{00}$, the space of real sequences $(x_n)_{n = 1}^\infty$ that are $0$ in all but finitely many places. So instead of having a whole family of norms, you have just one norm; it’s a permutation-invariant norm on $c_{00}$.

That’s still fine, but there’s again a price to pay: in order to state the multiplicativity axiom, you have to choose a bijection between $\mathbf{Z}^+ \times \mathbf{Z}^+$ and $\mathbf{Z}^+$ (where $\mathbf{Z}^+$ is the set of positive integers). It doesn’t matter which you choose, in the sense that if multiplicativity holds for one choice then it holds for all of them. But you do have to choose one.
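To make the choice concrete, here is a sketch of one such bijection: the diagonal (Cantor-style) pairing on positive integers. This particular choice is an assumption made purely for illustration; the post does not say which bijection Aubrun and Nechita use, and the names `pair`, `unpair` and `tensor_sequence` are hypothetical.

```python
def pair(m, n):
    """Diagonal pairing Z+ x Z+ -> Z+, enumerating (1,1), (1,2), (2,1), (1,3), ..."""
    d = m + n - 2                    # index of the anti-diagonal, starting at 0
    return d * (d + 1) // 2 + m


def unpair(k):
    """Inverse of pair: recover (m, n) from a positive integer k."""
    d = 0
    while (d + 1) * (d + 2) // 2 < k:
        d += 1
    m = k - d * (d + 1) // 2
    return m, d + 2 - m


def tensor_sequence(x, y):
    """Tensor product of two finitely supported sequences, as an element of c_00.

    x and y are lists holding the entries at positions 1, 2, ...;
    the result is a dict {position: value}, and positions not listed are zero.
    """
    return {pair(m, n): xm * yn
            for m, xm in enumerate(x, start=1)
            for n, yn in enumerate(y, start=1)}
```

A different bijection would place the same products $x_m y_n$ at different positions, so a permutation-invariant norm cannot tell the two choices apart.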

That, then, is what Aubrun and Nechita do: they characterize $\Vert\cdot\Vert_p$ ($p \in [1, \infty]$) as the only norms on $c_{00}$ that are permutation-invariant and multiplicative. They don’t mention what I’ve called ‘norm systems’.

Something impressive  Here’s a consequence of Aubrun and Nechita’s theorem that is, I think, quite non-obvious. At least, I completely failed to prove it without the aid of their theorem; maybe you can do better.

The $p$-norms have a special property, which in what follows I will call the special property. Given $x^1 \in \mathbf{R}^{n_1}, \ldots, x^k \in \mathbf{R}^{n_k}$, write $x^1; \ldots; x^k \in \mathbf{R}^{n_1 + \cdots + n_k}$ for their concatenation. The special property is that $\Vert x^1; \ldots; x^k \Vert_p$ is determined by $\Vert x^1 \Vert_p, \ldots, \Vert x^k \Vert_p$ alone.

How? By the formula $$\Vert x^1; \ldots; x^k \Vert_p = \Vert (\Vert x^1 \Vert_p, \ldots, \Vert x^k \Vert_p) \Vert_p.$$ One perspective on the formula is this: it says that there’s a monoid structure on $[0, \infty)$ whose $k$-fold multiplication is $(y_1, \ldots, y_k) \mapsto \Vert (y_1, \ldots, y_k) \Vert_p$ ($y_i \in [0, \infty)$). This is kind of obvious when $p < \infty$, since the operation $y \mapsto \Vert y \Vert_p$ is what you get when you transport addition (itself a monoid structure on $[0, \infty)$) across the bijection $(\ )^p: [0, \infty) \to [0, \infty)$.
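The formula is easy to test numerically. The sketch below stores vectors as plain lists and uses hypothetical helper names (`p_norm`, `concatenate`, `norm_of_norms`); it compares the norm of a concatenation with the norm of the vector of block norms.

```python
import math


def p_norm(xs, p):
    """p-norm of a list of reals, for p in [1, inf]."""
    values = [abs(v) for v in xs]
    if not values:
        return 0.0
    if p == math.inf:
        return max(values)
    return sum(v ** p for v in values) ** (1.0 / p)


def concatenate(blocks):
    """The concatenation x^1; ...; x^k of a list of vectors."""
    return [v for block in blocks for v in block]


def norm_of_norms(blocks, p):
    """Right-hand side of the formula: the p-norm of the vector of block norms."""
    return p_norm([p_norm(block, p) for block in blocks], p)
```

For the blocks `[3.0, 4.0]` and `[12.0]` with `p = 2`, the block norms are 5 and 12, and both sides of the formula come out as 13 up to rounding.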

The Aubrun–Nechita theorem implies that any multiplicative norm system has this special property (because it must be one of the $p$-norms). But can you prove this directly, without the aid of their theorem? I couldn’t. I made a bit of progress, but the progress I made more or less reproduced the beginning of their proof of the theorem itself. (It didn’t reproduce the end, which is an application of Cramér’s large deviation theorem.)

Actually, I think that if you can prove that any multiplicative norm system has the special property, then you can probably build an alternative proof of the Aubrun–Nechita theorem, as follows.

There’s a fairly ancient body of work on ‘generalized means’, for which the classic text is Hardy, Littlewood and Pólya’s 1934 book Inequalities. A generalized mean of numbers $x_1, \ldots, x_n$ is something like the arithmetic mean, $\sum (1/n) x_i$, or the harmonic mean, $(\sum (1/n) x_i^{-1})^{-1}$, or more generally $(\sum (1/n) x_i^p)^{1/p}$ for any nonzero real $p$. The limiting case $p \to 0$ is the geometric mean, $\prod x_i^{1/n}$. You can also change the uniform weighting $1/n, \ldots, 1/n$ to some non-uniform weighting.
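For readers who like to see the formulas run, here is a sketch of these means in Python. The function name `power_mean` and its signature are assumptions made for illustration; inputs are taken to be strictly positive so that negative exponents and logarithms make sense.

```python
import math


def power_mean(xs, p, weights=None):
    """Generalized (power) mean of positive numbers xs with exponent p.

    weights defaults to the uniform weighting 1/n, ..., 1/n;
    p == 0 is treated as the limiting case, the (weighted) geometric mean.
    """
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one number")
    if weights is None:
        weights = [1.0 / n] * n
    if any(x <= 0 for x in xs):
        raise ValueError("inputs must be strictly positive")
    if p == 0:
        return math.exp(sum(w * math.log(x) for w, x in zip(weights, xs)))
    return sum(w * x ** p for w, x in zip(weights, xs)) ** (1.0 / p)
```

On the numbers 1 and 4 this gives 2.5 for `p = 1` (arithmetic), about 2 for `p = 0` (geometric) and about 1.6 for `p = -1` (harmonic). With uniform weights and $p \ge 1$, the mean of $n$ numbers is $n^{-1/p}$ times their $p$-norm, which is one way to see how closely the two notions are related.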

Part of that ancient body of work is a collection of theorems characterizing generalized means. Obviously generalized means and $p$-norms are closely related, and when I first saw Aubrun and Nechita’s paper I thought I could deduce their result from these classical theorems. But the step in the deduction that I couldn’t fill in was what I just mentioned: showing directly that a multiplicative norm system has the special property.

Incidentally, I first saw news of this paper at John’s blog. We were discussing characterizations of the Rényi entropies, which are closely related to generalized means and $p$-norms. Mark Meckes mentioned this paper. It would be nice to use Aubrun and Nechita’s characterization of the $p$-norms to produce new characterizations of generalized means and Rényi entropies.

Posted at March 1, 2011 6:02 AM UTC

15 Comments & 1 Trackback

Re: Characterizing the p-Norms

I have a dim recollection of seeing F. J. Linton’s name attached to a talk or preprint or paper about functorial aspects of l^p-spaces. Unfortunately I have no idea what the result was, but I wonder if it might be related to your take on Aubrun and Nechita’s paper.

Posted by: Yemon Choi on March 1, 2011 7:59 PM

Re: Characterizing the p-Norms

Thanks. Linton has written a lot on functorial aspects of Banach spaces, so what you say doesn’t surprise me. On the other hand, I don’t remember having seen any work of his on $\ell^p$ spaces in particular.

You say “my take”, and it’s true that I phrased Aubrun and Nechita’s theorem in a slightly different way than they did, but I think it’s a pretty trivial change in perspective.

Anyway, if you ever remember what that work of Linton was, let me know!

Posted by: Tom Leinster on March 1, 2011 8:13 PM

Re: Characterizing the p-Norms

I like your reformulation. In the mathematical culture that Guillaume and I come from (I don’t know Ion Nechita personally) it’s a common (bad?) habit to make an arbitrary choice (like fixing some bijection between $\mathbf{Z}^+ \times \mathbf{Z}^+$ and $\mathbf{Z}^+$) and leave it implicit that a more invariant formulation is possible. Other common examples are to take the setting of a theorem to be $\mathbb{R}^n$ with its standard inner product when the “correct” setting is an $n$-dimensional real inner product space, or to work with $L_p[0,1]$ and leave it understood that results apply to more general $L_p$ spaces.

Posted by: Mark Meckes on March 2, 2011 2:11 AM

Re: Characterizing the p-Norms

Looks like a nice application of the tensor power trick:

Posted by: Terence Tao on March 1, 2011 10:46 PM

Re: Characterizing the p-Norms

Indeed. Incidentally, here’s another paper I like which proves a characterization theorem by combining the tensor power trick with a large deviations theorem (Sanov’s theorem in this case).

Posted by: Mark Meckes on March 2, 2011 2:00 AM

Re: Characterizing the p-Norms

Interesting post!

Posted by: Bruce Bartlett on March 2, 2011 1:03 PM

Re: Characterizing the p-Norms

Ion Nechita sent me a nice email containing the following interesting paragraph, which I’m posting here with his permission:

In two earlier papers (here and here) we looked at the $\ell_p$ norms and at tensor products of vectors in connection with some conjecture in quantum information theory that could have also been stated using Renyi entropies. It is not clear however how the norm condition translates in terms of entropies. As for the “special property” in your post, this is slightly connected to the compositivity property (2.2.20) in the book of Aczel and Daroczy that was mentioned in the comments of John’s blog post. The open problem after (5.2.38) (same reference) might also be relevant here.

In reply to the part about compositivity, I wrote:

Thanks; I hadn’t noticed that property in their book. It’s also related to “quasi-linearity”; but I like the name “compositivity”, because I like to think of this axiom in terms of the composition in an operad. More specifically, let $\Delta_n$ be the set of probability distributions on $\{1, \ldots, n\}$, i.e. non-negative $n$-tuples summing to $1$. The sequence of spaces $(\Delta_n)$ forms an operad, which is to say that there is a natural “composition” map $\Delta_n \times \Delta_{k_1} \times \cdots \times \Delta_{k_n} \to \Delta_{k_1 + \cdots + k_n}$ for each $n$, $k_1$, …, $k_n$. The compositivity axiom says something about how entropy interacts with the composition in this operad.
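A sketch of that composition map, under the usual reading that the outer distribution chooses a block and the inner distribution then chooses a position within the block. The comment does not spell the map out, so this reading and the function name `compose` are assumptions.

```python
def compose(p, qs):
    """Composition Delta_n x Delta_{k_1} x ... x Delta_{k_n} -> Delta_{k_1 + ... + k_n}.

    p is a probability distribution of length n (a list of non-negative
    reals summing to 1), and qs is a list of n distributions. The result
    lists p[i] * q[j] for each q = qs[i], block by block.
    """
    if len(p) != len(qs):
        raise ValueError("need exactly one inner distribution per entry of p")
    return [pi * qij for pi, q in zip(p, qs) for qij in q]
```

For example, composing `[0.5, 0.5]` with the inner distributions `[1.0]` and `[0.25, 0.75]` gives `[0.5, 0.125, 0.375]`, which again sums to 1.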

Posted by: Tom Leinster on March 3, 2011 12:41 AM
Read the post Characterizing the Generalized Means
Weblog: The n-Category Café
Excerpt: A new characterization of generalized means?
Tracked: March 3, 2011 2:43 AM

Re: Characterizing the p-Norms

Carlos Palazuelos in Madrid just got in touch to tell me that the same result also appeared in a paper from a few years ago by him, C. Fernández-González, and D. Pérez-García:

The natural rearrangement invariant structure on tensor products, Journal of Mathematical Analysis and Applications 343 (2008), 40-47

(apparently not on the arXiv). The methods used appear to be very different. I’m frantically busy and haven’t had a chance to give it more than the quickest of looks yet.

Posted by: Tom Leinster on April 6, 2011 11:52 PM

Re: Characterizing the p-Norms

As you said in the original post, it seems amazing that such a characterization wasn’t known 50 years ago, so it’s particularly intriguing that it appears to have been discovered twice so recently.

However, the results are not exactly the same. The hypotheses of the main theorems are somewhat different, and if you pay close attention, you can tell that they can’t be directly comparable, because Aubrun and Nechita’s characterization includes the case $p = \infty$, whereas Fernández-González, Palazuelos, and Pérez-García only characterize $L_p$ for finite $p$. (That surprised me at first, because the hypotheses of A–N appeared stronger to me – “isometric” in character where F-G–P–P-G’s hypotheses are “isomorphic”. I think the key difference is an “onto” requirement in the latter result.) The methods are indeed very different. Aubrun and Nechita’s approach seems more elementary to me; they also have a more unified approach to the separate cases of $\ell_p$ and $L_p$.

Posted by: Mark Meckes on April 7, 2011 2:42 PM

Re: Characterizing the p-Norms

Thanks, Mark. (And: nice use of dashes and hyphens!)

One question I’d like to have answered is whether the $\ell_p$ theorem in A and N’s paper follows easily from the results of F-G, P and P-G. (And if so, how?) What you say about $p = \infty$ suggests that the answer is no. But what Carlos wrote to me suggests that the answer is yes. I’m hoping that Carlos will appear here and clarify this.

In any case, whether or not the F-G–P–P-G results imply precisely the $\ell_p$ theorem in A–N, I’d like to understand the relationship between the results in the two papers.

Posted by: Tom Leinster on April 7, 2011 8:21 PM

Re: Characterizing the p-Norms

I think I was being silly about the case $p = \infty$. The F-G–P–P-G result for $L_p$ probably has an implicit separability assumption; their $\ell_p$ result includes $c_0$ (the completion of $c_{00}$ in the $\ell_\infty$ norm) as one case. So maybe the A–N result does follow as a corollary.

Posted by: Mark Meckes on April 8, 2011 12:48 AM

Re: Characterizing the p-Norms

Hey everybody,

I’m sorry but I couldn’t write before. Mark is totally right in his two comments. In our paper, we were interested in studying (tensor product) spaces with symmetric bases (in the discrete case). That is, given two spaces $X$ and $Y$ with symmetric bases ($(x_n)_n$ and $(y_n)_n$ respectively), we studied when the product basis $(x_n \otimes y_m)_{n,m}$ is a symmetric basis of the tensor product $X \otimes_\alpha Y$ for some norm $\alpha$. Of course this requires that the spaces $X$ and $Y$ have bases, so they are separable. Thus we cannot deal with $\ell_\infty$ (but we have to work with $c_0$). On the other hand, the result one can obtain from Theorem 1.1 in A and N’s paper is exactly that (the completion of $c_{00}$ under the $\ell_\infty$ norm is $c_0$). We didn’t think about the non-separable case and I don’t even know how the problem should be stated. But one could probably find the suitable result in that case.

You were also right in the continuous case. The fact that we were looking for surjectivity in (Theorem 3.1, F-G-P-P-G) implies that we have to rule out the case $p = \infty$. This is explained in (Lemma 3.3, F-G-P-P-G). However, if one is just interested in the conditions required in A and N’s paper, the same result follows from our Proposition 3.6 (which covers the case $p = \infty$). In particular, the results given in A-N can be obtained from F-G-P-P-G.

To understand the real connection between both works, just note that A-N’s paper is dealing with multiplicative and symmetric norms (see Remark 2.2) and we are dealing with cross norms $\alpha$ in the tensor product, so multiplicative in their language, such that the product basis is symmetric (and analogously in the continuous case with rearrangement invariant structure). The problems can look slightly different at first sight. For instance, in A-N’s paper they start talking about just one norm, while in our problem we have a norm in $X$ and a norm in $Y$ and we had to show that both norms must be equal (which is completely trivial in the discrete case, but not in the continuous one). But all these things would imply minor modifications of the results and both problems are exactly the same. I have to point out that we didn’t study the non-commutative case as in Section 4.2 in A-N’s work.

Finally, I completely agree with the idea that A-N is “more self contained”. Actually, as we told Ion (Nechita), we thought their proof was really interesting because of the techniques they use. We used results in Banach space theory, which are probably more complicated but also make the problem much easier to solve. Regarding this point, one could think that, even though the result was not stated in the literature, the main specialists in the field wouldn’t find this result surprising at all.

I hope I’ve clarified something!

Posted by: Carlos Palazuelos on April 9, 2011 1:20 PM

Re: Characterizing the p-Norms

Thanks, Carlos!

Posted by: Mark Meckes on April 11, 2011 2:32 PM

Re: Characterizing the p-Norms

Thanks from me too. Sorry not to have anything more substantial to say in reply at the moment. I need to spend some time looking at your paper, but when I do, it will be very useful to have your comments above to refer to.

Posted by: Tom Leinster on April 11, 2011 2:36 PM

Re: Characterizing the p-Norms

I updated my paper on means to include references to the paper by Carlos and co.

Posted by: Tom Leinster on June 10, 2011 1:11 AM
