I’d been thinking about Stirling’s formula recently, and then remembered this Café posting. I’ll take this opportunity to record another reasonably simple proof of Stirling’s formula. The stuff in the beginning is really well-known and appears in many calculus textbooks. The end is not something I’ve seen put quite this way before, although the basic plan of attack seems fairly natural to me.
Trapezoidal sum
My own favorite way of remembering Stirling’s formula is by working through a trapezoidal sum estimate for the integral

$$\int_1^n \ln x\, dx = n \ln n - n + 1,$$

where $\ln$ is of course the natural logarithm. Partitioning the interval $[1, n]$ into $n - 1$ subintervals of unit length, the corresponding trapezoidal estimate is

$$T_n = \sum_{k=1}^{n-1} \frac{\ln k + \ln(k+1)}{2} = \ln(n!) - \frac{1}{2} \ln n.$$
$T_n$ is an underestimate of $\int_1^n \ln x\, dx$ because the graph of $\ln$ is concave down. In other words, the error terms

$$\int_k^{k+1} \ln x\, dx - \frac{\ln k + \ln(k+1)}{2} \qquad (1)$$

are positive, and it follows that the sum of these error terms from $k = 1$ to $k = n - 1$, which is $\int_1^n \ln x\, dx - T_n$, increases with $n$. However, using the error term estimate for the trapezoidal rule, the term (1) is on the order of $\frac{1}{12 k^2}$ (the maximum value of $|(\ln)''(x)| = \frac{1}{x^2}$ over $[k, k+1]$ being $\frac{1}{k^2}$). Since $\sum_k \frac{1}{k^2}$ converges, we see that the increasing sequence $\int_1^n \ln x\, dx - T_n$ has a uniform upper bound and therefore a limit. Hence

$$\lim_{n \to \infty} \left( n \ln n - n + 1 - \ln(n!) + \tfrac{1}{2} \ln n \right)$$

exists.
We have thus shown that $n! \sim C\, n^{n + 1/2} e^{-n}$ for some constant $C$.
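Numerically, one can watch the ratio $n!/(n^{n+1/2} e^{-n})$ settling down to a constant $C \approx 2.5066$. Here is a quick Python check of mine (computed through logarithms via `math.lgamma` so nothing overflows):

```python
import math

def stirling_ratio(n):
    """n! / (n^(n + 1/2) e^(-n)), computed via logarithms to avoid overflow."""
    return math.exp(math.lgamma(n + 1) - ((n + 0.5) * math.log(n) - n))

for n in (10, 100, 1000, 10000):
    print(n, stirling_ratio(n))
# The printed ratios decrease toward a constant C = 2.5066... = sqrt(2*pi).
```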
Connection with Wallis’s formula
It remains to see that $C = \sqrt{2\pi}$. By some scaling analysis, $n! \sim C\, n^{n+1/2} e^{-n}$ leads to

$$\frac{(2n)!}{4^n (n!)^2} \sim \frac{C\, (2n)^{2n+1/2} e^{-2n}}{4^n\, C^2\, n^{2n+1} e^{-2n}} = \frac{\sqrt{2}}{C \sqrt{n}}.$$

We have

$$\frac{(2n)!}{4^n (n!)^2} = \prod_{k=1}^n \frac{2k-1}{2k} \sim \frac{\sqrt{2}}{C \sqrt{n}}, \qquad \prod_{k=1}^n \frac{2k+1}{2k} = (2n+1) \prod_{k=1}^n \frac{2k-1}{2k} \sim \frac{2 \sqrt{2}\, \sqrt{n}}{C},$$

and now by multiplying the previous two lines,

$$\prod_{k=1}^n \frac{(2k-1)(2k+1)}{(2k)^2} \to \frac{4}{C^2}.$$

This last is recognizable as a product appearing in a ‘Wallis formula’ for $\pi$. One way of evaluating its limit is first by rewriting it as

$$\prod_{k=1}^n \left( 1 - \frac{1}{4k^2} \right)$$

and then by using the product formula for the sine function,

$$\sin \pi z = \pi z \prod_{k=1}^{\infty} \left( 1 - \frac{z^2}{k^2} \right),$$

which at $z = \frac{1}{2}$ gives $\prod_{k=1}^{\infty} \left( 1 - \frac{1}{4k^2} \right) = \frac{2}{\pi}$, so that $\frac{4}{C^2} = \frac{2}{\pi}$ and $C = \sqrt{2\pi}$.
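Here is a small numerical illustration (mine, not part of the post): the partial products $\prod_{k=1}^{n} \left(1 - \frac{1}{4k^2}\right)$ do indeed creep down toward $2/\pi \approx 0.6366$.

```python
import math

def wallis_partial(n):
    """Partial product prod_{k=1..n} (1 - 1/(4 k^2)), which tends to 2/pi."""
    p = 1.0
    for k in range(1, n + 1):
        p *= 1 - 1 / (4 * k * k)
    return p

for n in (10, 100, 10000):
    print(n, wallis_partial(n), 2 / math.pi)
```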
But the product formula is a whole other story, involving things like the reflection formula for the Gamma function. Let’s not get too sidetracked: remember that the original goal was to connect Stirling’s formula directly to the Gaussian integral

$$\int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi},$$
and that’s what we should do here as well.
Calculation with Gaussian integrals and the Gamma function
Let’s start afresh. The Gaussian integral is a somewhat protean object, and is inevitably connected with the Gamma function, for example with the value $\Gamma(\frac{1}{2}) = \sqrt{\pi}$. Indeed,

$$\Gamma\left( \tfrac{1}{2} \right) = \int_0^{\infty} t^{-1/2} e^{-t}\, dt = \int_{-\infty}^{\infty} e^{-x^2}\, dx$$

(substitute $x^2$ for $t$) and here I think we may take it for granted that we know the classic and beautiful proof that

$$\int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi}$$

by squaring both sides and changing to polar coordinates. We can also take for granted the functional equation

$$\Gamma(x + 1) = x\, \Gamma(x).$$

Thinking asymptotically, one may then guess the interpolation

$$\Gamma(x + 1) \sim C\, x^{x + 1/2} e^{-x} \qquad (x \to \infty \text{ through real values}).$$
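(Both facts are easy to spot-check in Python, whose `math.gamma` implements the Gamma function; again, this is just a sanity check of mine.)

```python
import math

# Gamma(1/2) equals the Gaussian integral sqrt(pi):
print(math.gamma(0.5), math.sqrt(math.pi))

# The functional equation Gamma(x + 1) = x Gamma(x), at a few sample points:
for x in (0.5, 1.7, 3.2):
    print(math.gamma(x + 1), x * math.gamma(x))
```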
Let’s see where this takes us. For large integers $n$, consider

$$\frac{\Gamma(n+1)}{n^{n+1/2} e^{-n}}. \qquad (3)$$

We know the limit of (3) as $n \to \infty$ exists, based on our similar expression for $n!$:

$$\lim_{n \to \infty} \frac{n!}{n^{n+1/2} e^{-n}} = C.$$

All that remains is to show the limit of (3) is $\sqrt{2\pi}$ (as we suspect from our asymptotic guess), because this is equivalent to $C = \sqrt{2\pi}$. For that, all we need to do is show that we get the same limit

$$\lim_{x \to \infty} \frac{\Gamma(x+1)}{x^{x+1/2} e^{-x}} \qquad (4)$$

for half-integers $x = n - \frac{1}{2}$ as we do for whole integers $x = n$, for then (using $\Gamma(n + \frac{1}{2}) = \frac{(2n)!}{4^n n!} \sqrt{\pi}$, which follows by induction from the functional equation and $\Gamma(\frac{1}{2}) = \sqrt{\pi}$)

$$C\, \big( n - \tfrac{1}{2} \big)^n e^{-(n - \frac{1}{2})} \sim \Gamma\big( n + \tfrac{1}{2} \big) = \frac{(2n)!}{4^n\, n!}\, \sqrt{\pi} \sim \frac{C\, (2n)^{2n+1/2} e^{-2n}}{4^n\, C\, n^{n+1/2} e^{-n}}\, \sqrt{\pi} = \sqrt{2\pi}\, n^n e^{-n}$$

implies $C\, n^n e^{-n} \sim \sqrt{2\pi}\, n^n e^{-n}$ (since $\big( n - \tfrac{1}{2} \big)^n e^{1/2} = n^n \big( 1 - \tfrac{1}{2n} \big)^n e^{1/2} \sim n^n$), whence $C = \sqrt{2\pi}$: exactly what we want.
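One can watch this happen numerically: the ratio in (4) approaches $\sqrt{2\pi} \approx 2.5066$ along half-integers just as it does along whole integers. A quick Python check (mine, not part of the argument):

```python
import math

def gamma_ratio(x):
    """Gamma(x + 1) / (x^(x + 1/2) e^(-x)), computed via math.lgamma."""
    return math.exp(math.lgamma(x + 1) - ((x + 0.5) * math.log(x) - x))

for n in (10, 100, 1000):
    print(n, gamma_ratio(n), gamma_ratio(n - 0.5))
print(math.sqrt(2 * math.pi))
```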
The crucial fact we need here is the log-convexity of $\Gamma$, i.e., the fact that the function $\log \Gamma$ is convex over $(0, \infty)$. For the moment we take this for granted. The immediate consequence of log-convexity is that for $0 \le t \le 1$,

$$\Gamma\big( (1-t) x + t y \big) \le \Gamma(x)^{1-t}\, \Gamma(y)^t,$$

so that

$$\Gamma\big( n + \tfrac{1}{2} \big) \le \sqrt{\Gamma(n)\, \Gamma(n+1)} \qquad \text{and} \qquad \Gamma(n) \le \sqrt{\Gamma\big( n - \tfrac{1}{2} \big)\, \Gamma\big( n + \tfrac{1}{2} \big)},$$

i.e.,

$$\frac{\Gamma(n)^2}{\Gamma\big( n - \tfrac{1}{2} \big)} \le \Gamma\big( n + \tfrac{1}{2} \big) \le \sqrt{\Gamma(n)\, \Gamma(n+1)}.$$

This means that these $\Gamma$ values for half-integers are nestled between $\Gamma$ values for whole integers on either side.
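These nestling inequalities are also pleasant to verify numerically; working with `math.lgamma` (the logarithm of $\Gamma$) keeps the numbers manageable. (Again, a check of mine, not part of the proof.)

```python
import math

# Log-convexity nestles the half-integer values of log Gamma between
# whole-integer values:
#   2 lgamma(n) - lgamma(n - 1/2) <= lgamma(n + 1/2)
#                                 <= (lgamma(n) + lgamma(n + 1)) / 2.
for n in range(2, 200):
    lower = 2 * math.lgamma(n) - math.lgamma(n - 0.5)
    mid = math.lgamma(n + 0.5)
    upper = (math.lgamma(n) + math.lgamma(n + 1)) / 2
    assert lower <= mid <= upper
print("nestling inequalities hold for n = 2, ..., 199")
```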
It follows easily that we get the same limit (4) for half-integers as we do for whole integers. And now we’re done.
Log-convexity of Gamma function
This is actually not very difficult, but the basic fact on which it rests is in some sense profound:
Theorem: Log-convex functions form a rig over the rig of nonnegative real numbers, closed under sups.
That log-convex functions are closed under pointwise multiplication is very easy. Another easy fact that can be left as an exercise is that log-convex functions are closed under taking pointwise suprema. But the real kicker is that log-convex functions are closed under pointwise addition!
I’ll just point to the nLab for that. In essence, it boils down to an application of Hölder’s inequality.
Note that for fixed positive $t$, the function $x \mapsto t^{x-1} e^{-t}$ is log-convex (it is log-linear after all). The integral

$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt$$

can be expressed as a sup over Riemann sums

$$\sum_i t_i^{\,x-1} e^{-t_i}\, \Delta t_i,$$

which are positive linear combinations of log-convex functions. It follows from the theorem that $\Gamma$ is log-convex.
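To make the Riemann-sum remark concrete, here is a little Python sketch of mine (the step size and cutoff are arbitrary choices): a Riemann sum for the $\Gamma$ integral is a positive linear combination of the log-convex functions $t^{x-1} e^{-t}$, and the midpoint form of log-convexity can be checked on it directly.

```python
import math

def riemann_gamma(x, dt=0.01, tmax=60.0):
    """Riemann-sum approximation to Gamma(x) = int_0^oo t^(x-1) e^(-t) dt:
    a positive linear combination of the log-convex functions
    x |-> t^(x-1) e^(-t), one term per partition point."""
    s, t = 0.0, dt
    while t < tmax:
        s += t ** (x - 1) * math.exp(-t) * dt
        t += dt
    return s

# Midpoint log-convexity of the sum itself:
# riemann_gamma((a + b) / 2)^2 <= riemann_gamma(a) * riemann_gamma(b).
for a, b in [(1.0, 3.0), (2.0, 7.0), (0.5, 4.5)]:
    m = riemann_gamma((a + b) / 2)
    assert m * m <= riemann_gamma(a) * riemann_gamma(b)
print("midpoint log-convexity holds for the sampled pairs")
```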
One last thing I’ll recall is that $\Gamma$ is uniquely characterized as a real-valued function defined for all real $x$ except $x = 0, -1, -2, \ldots$ and satisfying log-convexity (over $(0, \infty)$), the functional equation $\Gamma(x+1) = x\, \Gamma(x)$, and $\Gamma(1) = 1$. This is the Bohr-Mollerup theorem. Various convexity arguments are put to use with a kind of terrifying vise-like effect, to squeeze $\Gamma$ down to a single possibility. Emil Artin made the Bohr-Mollerup characterization the centerpiece of his famous notes on the Gamma function, which may be found here.
Afterword
While I was thinking about this and researching this, I came across some interesting articles. One of them, “From the Wallis formula to the Gaussian distribution” by Mark Reeder, appears to be a write-up for students that is rich in historical notes, for example following the trail of Wallis in his amazing efforts toward “quadrature of the circle”, and connecting up with Euler, Gauss, and even Stieltjes. I didn’t know, for instance, that the symbol $\pi$ was first introduced by a Welsh mathematician named William Jones in 1706, after Wallis had died. (Sheesh, really? What did the old codgers like Newton and Leibniz use?) Wallis himself denoted the quantity we’d call $\frac{4}{\pi}$ by a symbol which, if I had to name it in TeX, I’d call \boxdot! A box with a dot inside it.
Another is a very short proof of Stirling’s formula due to Dan Romik that appeared in the American Mathematical Monthly. Normally I’m not a huge fan of “World’s Shortest Proofs”, like Don Zagier’s “one-line proof” of the two-squares theorem, because (a) usually they look a little show-offy, and (b) they often leave a lot of work to the reader to actually grok the damned thing. Rabbits out of hats and so forth. [Incidentally, a much kinder explanation of Zagier’s proof can be found somewhere on YouTube, maybe by Mathologer; I forget. IIRC it’s about 30 minutes long, but it really gives the intuition, which is indeed lovely.] But anyway, in the case of Romik, I’ll admit that my jaw dropped. To me his proof has a very Euler-MacLaurin feel to it, but very slickly arranged/optimized.
Finally, I’ll mention “A Probabilistic Proof of Wallis’s Formula for $\pi$” by Stephen J. Miller, which involves Student’s t-distribution and seems to have something in common with Romik’s set-up, although I’d need to think longer to be clear on that.
Re: Stirling’s Formula
Nice! Now that you say it, I suppose it’s inevitable that the $\sqrt{2\pi}$ in Stirling’s formula comes from the integral of $e^{-x^2/2}$.
Typos: just before the heading “Laplace’s method”, there are a couple of equals signs that should be $\sim$s.