## December 6, 2007

### Progic V

#### Posted by David Corfield I’ve come across something promissing for the Progic project. Apparently there is a way to complete the analogy:

propositional logic : predicate logic :: Bayesian networks: ?

The answer, it is claimed, is ‘probabilistic relational models’. Now before we take a look a them, it would be worth considering in what sense Bayesian networks are propositional.

And before that we need to know what a Bayesian network is. Well, first of all, it’s a directed acyclic graph (DAG), so a graph with directed arrows, and no cycles. Each vertex corresponds to a variable which may take any of a set of values, e.g., , $A$ taking values in $\{a_i\}$. A ‘parent’ of vertex $A$ is a vertex with an edge pointing from it to $A$. Then the final ingredient for a Bayesian network is a conditional probability distribution for each vertex given its parents, $Pr(a_i | Pa(A)_j)$.

A Bayesian network with vertices, say, $A, B, C, D$, gives a joint probability distibution over the variables. We know we can factorise any joint distribution as, say,

$P(a, b, c, d) = P(d) \cdot P(c | d) \cdot P(b | c, d) \cdot P(a | b, c, d).$

This corresponds to a network with arrows from $D$ to $A$, $B$ and $C$, from $C$ to $A$ and $B$, and from $B$ to $A$. $A$ is childless.

Of course, we could permute the variables and have, say, $D$ the childless vertex. But what we hope happens is that there are some independencies between variables. So for instance, perhaps $P(b | c, d) = P(b | d)$ and $P(a | b, c, d) = P(a | b, c)$. Or in words, $B$ is independent of $C$ conditional on $D$, and $A$ is independent of $D$ conditional on $B$ and $C$. Then a DAG may be drawn to represent these independences, with arrows going from $D$ to $B$ and $C$, and arrows converging from $B$ and $C$ to $A$. The distribution is encoded far more efficiently if the DAG is sparse.

If Bayesian networks are to be thought of as ‘propositional’, you’d imagine it’s because each variable is an atomic proposition. But this suggests that each vertex ought to be binary valued. While there are binary Bayesian networks, there is no need for them to have this property. Even restricting to discrete variables, they may take values in a set of any countable cardinality. One natural response is to wonder why in propositional logic we restrict ourselves to pairs of propositions, $\{P, \neg P\}$, precisely one of which is true. Don’t we encounter situations where we’re reasoning about an entity which may be in one of three states, say, a gear stick in drive, neutral or reverse? Why treat them as three separate atoms, only to add the premise that precisely one holds?

Another issue for us at the Café is that a Bayesian network seems to be very much like an arrow from the product of the parentless vertices to probability distributions over the product of the childless vertices, an arrow in the Kleisli category. This suggests that if we have two networks, the childless vertices of the first matching the parentless vertices of the second, then they could be composed. So with a pair of simple graphs, $C \to B$ and $B \to A$, we would have:

$P(a_i | c_k) = \sum_{b_j}P(a_i | b_j) \cdot P(b_j | c_k).$

What doesn’t quite fit however is that a Bayesian network represents a distribution over the product of all the node variables. In the case above we ought to have composed a joint distribution over $C$ and $B$ with one over $B$ and $A$ resulting in one over $C, B$ and $A$.

Maybe the issue is that there’s an invisible 1, out of which arrows feed into the parentless vertices, giving an (unconditional) distribution over them. If so, one ought not to compose as I suggested as the domain and codomain would not match.

If we include these unwritten arrows, e.g., $1 \to C \to B$ as a network representing a joint distribution over $B$ and $C$, the final node doesn’t mention $C$. So we might think to draw an edge out of the parent $C$ to a copy of itself at the bottom (right, here) of the network, with obvious distribution $P(c_i | c_j) = \delta_{ij}$. Then we’d have a network $1 \to C \to B \times C$.

Something a bit odd is going on here. Should we be after multispans, rather than arrows in a monoidal category?

Posted at December 6, 2007 12:16 PM UTC

TrackBack URL for this Entry:   https://golem.ph.utexas.edu/cgi-bin/MT-3.0/dxy-tb.fcgi/1524

### Re: Progic V

I guess an important part of what’s going on here is that there is no category with sets as objects, and morphisms joint probability distributions over the product of domain and codomain.

In a category like Rel we can compose $R: A \to B$ and $S: B \to C$ to give a relation $S \cdot R: A \to C$. But given distributions over $A \times B$ and $B \times C$, there’s no obvious composition. It’s only when we have conditional distributions $P(B | A)$ and $P(C | B)$ that we can compose, in the Kleislian way.

So maybe the graphical representation of the Bayesian network is rather misleading.

Posted by: David Corfield on December 7, 2007 10:11 AM | Permalink | Reply to this

### Re: Progic V

Would we ever represent a relation between, say, sets $A, B, C, and D$, as the subset of $D$ featuring in the relation, followed by for each $d$ the subset of $C$, for each $d$ and $c$ the subset of $B$, and for each $d, c, and b$ the subset of $A$?

Posted by: David Corfield on December 14, 2007 9:24 AM | Permalink | Reply to this

### Re: Progic V

Ooh, Coecke and Spekkens have brought their category theoretical picture calculus to bear on Bayesian inference here.

Posted by: David Corfield on February 14, 2011 8:33 PM | Permalink | Reply to this

### Re: Progic V

My, don’t things grow up! See what they can do with Bayesian networks and category theory these days:

• Tobias Fritz, Andreas Klingler, The d-separation criterion in Categorical Probability (arXiv:2207.05740)
Posted by: David Corfield on July 22, 2022 9:50 AM | Permalink | Reply to this

### Re: Progic V

It’s quite intriguing to be reading this post ~15 years later. David wrote in 2007:

If we include these unwritten arrows, e.g., $1 \to C \to B$ as a network representing a joint distribution over $B$ and $C$, the final node doesn’t mention $C$. So we might think to draw an edge out of the parent $C$ to a copy of itself at the bottom (right, here) of the network, with obvious distribution $P(c_i|c_j)=\delta_{ij}$. Then we’d have a network $1 \to C \to B \times C$.

That’s actually exactly how it works! It’s one of the many uses of the “copy” morphisms that seem to be so ubiquitous in probability theory. For Bayesian networks, this idea isn’t ours, but has already been implemented in Brendan Fong’s 2012 MSc thesis.

Posted by: Tobias Fritz on July 26, 2022 10:56 AM | Permalink | Reply to this

### Re: Progic V

Yes, a pleasing anticipation. Applied category theory before its time.

By the way, given your interest both in $d$-separation, and so causality, and in De Finetti’s theorem done category-theoretically, how about a category-theoretic treatment of Causal de Finetti?

Posted by: David Corfield on July 26, 2022 12:01 PM | Permalink | Reply to this

### Re: Progic V

Thank you for the relevant pointer – I didn’t know about that paper. I wonder how it’s related to A generalization of hierarchical exchangeability on trees to directed acyclic graphs. In any case, this is indeed one of the directions that we have in mind, and on which my coauthor Tomáš Gonda has some partial results.

Posted by: Tobias Fritz on July 26, 2022 12:54 PM | Permalink | Reply to this

Post a New Comment