Progic V
Posted by David Corfield
I’ve come across something promissing for the Progic project. Apparently there is a way to complete the analogy:
propositional logic : predicate logic :: Bayesian networks: ?
The answer, it is claimed, is ‘probabilistic relational models’. Now before we take a look a them, it would be worth considering in what sense Bayesian networks are propositional.
And before that we need to know what a Bayesian network is. Well, first of all, it’s a directed acyclic graph (DAG), so a graph with directed arrows, and no cycles. Each vertex corresponds to a variable which may take any of a set of values, e.g., , taking values in . A ‘parent’ of vertex is a vertex with an edge pointing from it to . Then the final ingredient for a Bayesian network is a conditional probability distribution for each vertex given its parents, .
A Bayesian network with vertices, say, , gives a joint probability distibution over the variables. We know we can factorise any joint distribution as, say,
This corresponds to a network with arrows from to , and , from to and , and from to . is childless.
Of course, we could permute the variables and have, say, the childless vertex. But what we hope happens is that there are some independencies between variables. So for instance, perhaps and . Or in words, is independent of conditional on , and is independent of conditional on and . Then a DAG may be drawn to represent these independences, with arrows going from to and , and arrows converging from and to . The distribution is encoded far more efficiently if the DAG is sparse.
If Bayesian networks are to be thought of as ‘propositional’, you’d imagine it’s because each variable is an atomic proposition. But this suggests that each vertex ought to be binary valued. While there are binary Bayesian networks, there is no need for them to have this property. Even restricting to discrete variables, they may take values in a set of any countable cardinality. One natural response is to wonder why in propositional logic we restrict ourselves to pairs of propositions, , precisely one of which is true. Don’t we encounter situations where we’re reasoning about an entity which may be in one of three states, say, a gear stick in drive, neutral or reverse? Why treat them as three separate atoms, only to add the premise that precisely one holds?
Another issue for us at the Café is that a Bayesian network seems to be very much like an arrow from the product of the parentless vertices to probability distributions over the product of the childless vertices, an arrow in the Kleisli category. This suggests that if we have two networks, the childless vertices of the first matching the parentless vertices of the second, then they could be composed. So with a pair of simple graphs, and , we would have:
What doesn’t quite fit however is that a Bayesian network represents a distribution over the product of all the node variables. In the case above we ought to have composed a joint distribution over and with one over and resulting in one over and .
Maybe the issue is that there’s an invisible 1, out of which arrows feed into the parentless vertices, giving an (unconditional) distribution over them. If so, one ought not to compose as I suggested as the domain and codomain would not match.
If we include these unwritten arrows, e.g., as a network representing a joint distribution over and , the final node doesn’t mention . So we might think to draw an edge out of the parent to a copy of itself at the bottom (right, here) of the network, with obvious distribution . Then we’d have a network .
Something a bit odd is going on here. Should we be after multispans, rather than arrows in a monoidal category?
Re: Progic V
I guess an important part of what’s going on here is that there is no category with sets as objects, and morphisms joint probability distributions over the product of domain and codomain.
In a category like Rel we can compose and to give a relation . But given distributions over and , there’s no obvious composition. It’s only when we have conditional distributions and that we can compose, in the Kleislian way.
So maybe the graphical representation of the Bayesian network is rather misleading.