Infinite-Dimensional Exponential Families
Posted by David Corfield
Back on my old blog I posted a few times on information geometry (1, 2, 3, 4). One key idea is a duality between two projections: projecting a prior distribution onto the manifold of distributions whose specified moments match those of the empirical distribution, and projecting the empirical distribution onto the corresponding exponential family. Legendre transforms govern this duality.
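To fix ideas, here is the standard finite-dimensional picture (a sketch in my own notation, not taken from the linked posts). Given sufficient statistics $T_1, \dots, T_n$ and a prior $q$, minimising relative entropy to $q$ subject to the empirical moment constraints lands on a member of the exponential family through $q$:

$$
p_\theta(x) = q(x)\,\exp\!\Big(\sum_{i=1}^n \theta_i T_i(x) - \psi(\theta)\Big),
\qquad
\psi(\theta) = \log \int q(x)\,\exp\!\Big(\sum_{i=1}^n \theta_i T_i(x)\Big)\, dx .
$$

The Legendre transform of the cumulant function $\psi$,

$$
\varphi(\mu) = \sup_\theta \big( \langle \theta, \mu \rangle - \psi(\theta) \big),
\qquad
\mu_i = \frac{\partial \psi}{\partial \theta_i} = \mathbb{E}_{p_\theta}[T_i(x)],
$$

exchanges the natural parameters $\theta$ for the mean parameters $\mu$; the moment-matching projection from the prior and the maximum-likelihood projection of the empirical distribution onto the family pick out the same $p_\theta$.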
Now, one of the most important developments in machine learning over the past decade has been the use of kernel methods. For example, in the support vector machine (SVM) approach to classification, the data space is mapped into a feature space, a reproducing kernel Hilbert space (RKHS). A linear classifier is then chosen in this feature space to best separate points with different labels; it corresponds to a nonlinear decision boundary in the original space. The ‘Bayesian’ analogue employs Gaussian processes (GPs).
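As a concrete illustration of the SVM picture (a minimal sketch using scikit-learn; the dataset and kernel parameters are illustrative choices of mine, not anything from the post):

```python
# A Gaussian-kernel SVM: the classifier is linear in the RKHS
# feature space, which yields a nonlinear decision boundary
# in the original two-dimensional data space.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# A toy dataset that no linear boundary in the original space separates.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2); its implicit
# feature space is an infinite-dimensional RKHS.
clf = SVC(kernel="rbf", gamma=2.0, C=1.0)
clf.fit(X, y)

# The decision function is linear in the RKHS:
#   f(x) = sum_i alpha_i y_i k(x_i, x) + b,
# but nonlinear as a function of x itself.
print("training accuracy:", clf.score(X, y))
```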
Putting the two ideas together, what we need is nonparametric information geometry, involving projection onto infinite-dimensional exponential families. Various people have worked to find maximal exponential families using Orlicz spaces. But to capture SVM and GP methods, it looks like we want a more restricted form of exponential family, where the manifold of models is locally isomorphic to a reproducing kernel Hilbert space. Someone working on this, Kenji Fukumizu, is now in Tübingen, so I’m hoping to learn a few things from him over the next few days.
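To give a flavour of the restricted form, here is a sketch of what such a family might look like (my gloss on the kernel exponential family idea, not a claim about Fukumizu’s exact construction): densities obtained by tilting a base density $q_0$ by functions from an RKHS $\mathcal{H}_k$,

$$
p_f(x) = \exp\big(f(x) - A(f)\big)\, q_0(x), \qquad f \in \mathcal{H}_k,
\qquad
A(f) = \log \int \exp\big(f(x)\big)\, q_0(x)\, dx ,
$$

where $f(x) = \langle f, k(x,\cdot)\rangle_{\mathcal{H}_k}$ by the reproducing property, so $k(x,\cdot)$ plays the role of an infinite-dimensional sufficient statistic, and the cumulant functional $A$ is assumed finite on the relevant subset of $\mathcal{H}_k$.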
Information geometry, both finite- and infinite-dimensional, has a quantum equivalent; see, e.g., here. Does anyone know what the major achievements of this field are?
Re: Infinite-Dimensional Exponential Families
Hi David,
I remember trying to dig up information about this topic while I was in Tübingen.
I found some interesting work about this in
Pistone, G. and Sempi, C. (1995), ‘An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one’, The Annals of Statistics 23(5), 1543–1561.
This is a tough paper, but it seems to be a good starting point (there are other papers by the same authors going in this direction).
Of course, this does not go in the direction of RKHSs, but it at least lays down the proper foundations for infinite-dimensional information geometry (thus generalizing the stuff in Amari’s book).
Cheers,
Olivier.