

August 14, 2025

Safeguarded AI Meeting

Posted by John Baez

This week, 50 category theorists and software engineers working on “safeguarded AI” are meeting in Bristol. They’re being funded by £59 million from ARIA, the UK’s Advanced Research and Invention Agency.

The basic idea is to develop a mathematical box that can contain a powerful genie. More precisely:

By combining scientific world models and mathematical proofs we will aim to construct a ‘gatekeeper’, an AI system tasked with understanding and reducing the risks of other AI agents. In doing so we’ll develop quantitative safety guarantees for AI in the way we have come to expect for nuclear power and passenger aviation.

The program director is David Dalrymple, and you can get a much better description of the project from him in the first 4 minutes here:

It’s remarkable how many of the applied category theorists in the UK are involved in this. Here you can find a partial list:

If you’re wondering “why category theory?”, I think the idea is this: software based on general abstract math is more flexible, yet also easier to formally verify.

For example the Topos Institute, run by my former student Brendan Fong, now has a branch in the UK largely funded by ARIA. At the meeting, Topos is demonstrating how to build models in CatColab, their new category-based software.

I have decided not to be part of this project, though some of my math is getting used here. I’ve always preferred to avoid doing things connected to AI, for various reasons. But this project might make AI better. It could also have various bad effects. I have no idea how successful it will be, so I’m watching with fascination and profoundly mixed emotions.

Posted at August 14, 2025 12:06 PM UTC


10 Comments & 0 Trackbacks

Re: Safeguarded AI Meeting

FYI, Motivating coherence-driven inference via sheaves

Posted by: Steve Huntsman on August 14, 2025 12:38 PM | Permalink | Reply to this

Re: Safeguarded AI Meeting

John, I’d be interested to hear you expand on the last paragraph.

All AI developments are at least potentially dual use, in that a technique intended to promote safety may end up promoting capabilities including capabilities to do dangerous things. RLHF is often cited as an example of this.

Still, if the aim is to promote safety, surely on balance you’re more likely to do so than someone trying to promote capabilities. So sure, you’re only promoting safety in expected value, but most things in life are like that.

Posted by: Colin Rust on August 16, 2025 3:24 PM | Permalink | Reply to this

Re: Safeguarded AI Meeting

I hadn’t expanded on my thoughts about category theory in AI because it’s a very emotional subject for me and I didn’t want to contaminate my reporting of a potentially momentous event with too much of my own perspective. But okay:

I agree that getting category theorists to help with “safeguarded AI” is on average safer than getting them to help develop powerful AI capabilities. So I’m happy that ARIA is bankrolling lots of UK category theorists to do the former rather than the latter.

But category theorists have many other choices they could make, like staying out of AI altogether. This too can take many forms. For example they could stay out of “applied category theory” altogether, or focus on applications they consider likely to be beneficial, like developing measures of biodiversity (as Tom Leinster did, for a time), or developing math that will help people design software for modeling in epidemiology (as I’m doing).

It’s true, however, that category theory is so flexible that math developed for modeling in epidemiology can be repurposed for AI. In fact that’s approximately what’s already happening. (More precisely, my work on structured and decorated cospans, which our team uses for epidemiological modeling, also served as an inspiration for David Jaz Myers’s Double Operadic Theory of Systems, which seems poised to play a big role in ARIA’s work on safeguarded AI.)
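For readers who haven’t met these gadgets, here is the rough shape of the formalism, as a schematic sketch rather than the precise definitions in the structured and decorated cospan papers. An open system with input boundary $X$ and output boundary $Y$ is presented as a cospan

$$X \xrightarrow{\;i\;} S \xleftarrow{\;o\;} Y,$$

possibly “decorated” with extra data on $S$ (an epidemiological model, a Petri net, a dynamical system). Two open systems sharing the boundary $Y$ compose by pushout, gluing their apexes along $Y$:

$$(X \to S \leftarrow Y)\,;\,(Y \to T \leftarrow Z) \;=\; \bigl(X \to S +_Y T \leftarrow Z\bigr).$$

The gluing rule doesn’t care what decorates $S$ and $T$, which is why the same machinery can migrate from epidemiological models to other kinds of systems.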

So, one question is whether one should try to do powerful applied mathematics and try to nudge it toward beneficial applications, or avoid it altogether if the possible negative consequences are large — or just stop worrying, develop it, and let history sort out what it gets used for. I’m reluctant to take the third approach, which seems like wilfully avoiding responsibility for the consequences of my actions. So I’m wavering between the first two.

For example I could have a lot of fun working on the math of music theory, and in fact I’ve done plenty of this and plan to do more. But doing only that, and similar safe and beautiful things, feels like shirking my responsibility.

Soon the conversation becomes about utilitarianism versus virtue ethics, and also about “why do you think anything you do matters enough to worry about?” You see why I was reluctant to start talking about this stuff! There are a lot of issues here.

Posted by: John Baez on August 17, 2025 11:40 AM | Permalink | Reply to this

Re: Safeguarded AI Meeting

I quote the comment I left on John’s blog about this:

I’m part of SGAI and I believe this is the part of ‘AI’ that can realistically be transformative (for the better!) in the short to medium term, rather than just glorified chatbots and automated misinformation. AI used to be a much broader field until genAI companies seized the narrative; we should be smarter than that and refrain from throwing away the proverbial baby with the bathwater.

At the end of the day, we are doing next-gen control theory: how do we leverage the unreasonably effective methods of machine learning to manage and optimize safety-critical processes? Lots of industrial (and other) processes are chronically inefficient right now; think of balancing electric grids, a task still done mostly by expert humans, and much of the ecological transition hinges upon getting that right.

The part of the SGAI programme category theorists are involved in is producing a toolkit for compositional modeling and verification. A lot of it is putting together already existing things: classical and neural model checking, Courser-Baez compositionality for systems (albeit in the newfangled ‘double operadic’ framework), probabilistic and quantitative languages, formalization and program verification, etc. The innovative part is mostly in doing all of it at once. This can be a great showcase of the merits of category theory for cyberphysical applications.

Perhaps I’m young and delusional, and I’m certainly not so naive as to overlook the potential misuses of such a technology, but I seriously believe more good than bad things will come out of all this. As always, tech empowers people. It is the political task of us all to ensure that power is used well.

To add to John’s comment here: my belief is that we (and I mean all category theorists here) work at a level of abstraction that makes it impossible to exclude nefarious purposes from the downstream applications. And secluding ourselves in our ivory towers would be an unacceptable waste of the privilege (and the resources) society has entrusted us with. Thus the best we can do is frame our work positively, to encourage and inspire good applications rather than bad ones, and to be politically active.

Posted by: Matteo Capucci on August 18, 2025 7:15 AM | Permalink | Reply to this

Re: Safeguarded AI Meeting

Thanks for sharing this development. I am still looking into the materials, but they give me the impression that, more or less, they accept NNs as black boxes that can be trained on synthetic data (provided by the ethical and normative world models) and whose output can be filtered by the world models. Though I feel that this control is not enough: NNs can be deceptive, and their optimization may require some kind of microscopic theory of NNs.

I guess there are open questions about what happens inside NNs: whether some specific algorithms live in them (as categorical deep learning approaches suggest), or maybe full theories with lots of algorithms live in them, or maybe NNs are just models, or embeddings of models, of some theories. In the last case, knowing that toposes host every model of type theories (even HOL, HoTT), one could view NNs (or the Euclidean spaces of activations of their neuronal layers) as embeddings, or some other kind of translation, of toposes.

“Toposes and Stacks of Deep Neural Networks” was a promising article in this direction, but, sadly, it has not had follow-ups.

It may be quite impossible to get results from a bare statement like “this theory lives in that NN”, but theorems of the form “if a theory (e.g. presented as a contextual category) has such-and-such invariants and a particular NN architecture has such-and-such invariants, then (1) the learning rate is bounded by some formula in those invariants, and (2) the output has such-and-such invariants, which can be used to check compliance with norms (again a theory with such-and-such invariants)” could be attained, I guess.

The model theory / neural network interface, or the embedding into activation spaces, may be the hardest part, because Euclidean spaces cannot hold toposes: toposes cannot be embedded in Euclidean spaces. Actually, it is quite funny, because one expects neural networks to express everything, and yet the logic that lives in toposes cannot be embedded inside them.

So such a microscopic theory of NNs could be the real solution to the explainability, manageability and controllability of NNs. It could be the complete solution to the safety problem, so Prof. Baez could join in such efforts without any reservations. Of course, if such explainability were achieved for NNs it could be applied to human brains as well, and I guess there would be some benefits and a lot of problems too.

Posted by: Alex on August 16, 2025 10:17 PM | Permalink | Reply to this

Re: Safeguarded AI Meeting

From his interview, I get the feeling David Dalrymple is wisely being carefully agnostic about how powerful AI will work. His idea is that you feed the genie in a box a well-specified problem, it figures out the answer, the box checks the answer, and only if the answer is appropriate does the box let the answer out. He is not concerned with the design of the genie. He is trying to design the box.

Safeguarded AI is targeting what I call “the critical period in the middle”, where we’re dealing with systems that are too powerful to just deploy on the internet, or even for humans to talk to, because they might be very persuasive about very damaging memes, or they might emit unstoppable malware if they’re connected to the internet, but it’s not like they’re so smart that it would be impossible to contain them if you actually tried. And so the question that I’m trying to answer in safeguarded AI is how we could make use of a system in this middle level of superintelligence without creating unacceptable risk. And the answer is — in a sentence — that we create a workflow around this contained system which is a general-purpose way of using it to do autonomous R&D of special-purpose agents that have quantitative safety guarantees relative to specific contexts of use.
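To make that workflow concrete, here is a toy sketch of the “box” pattern in Python. Everything in it (the names, the spec format, the acceptance test) is a hypothetical illustration of “the box checks the answer before releasing it”, not ARIA’s actual design:

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Spec:
    """A well-specified problem: a description plus a machine-checkable
    acceptance test (a hypothetical stand-in for a formal specification)."""
    description: str
    accepts: Callable[[str], bool]

def gatekeeper(spec: Spec,
               untrusted_solver: Callable[[str], str],
               max_attempts: int = 3) -> Optional[str]:
    """Run the untrusted solver inside the 'box': only release an answer
    that passes the spec's acceptance check; otherwise release nothing."""
    for _ in range(max_attempts):
        candidate = untrusted_solver(spec.description)
        if spec.accepts(candidate):
            return candidate  # a verified answer is allowed out of the box
    return None               # nothing unverified ever leaves

# Toy usage: the 'genie' must name an even number.
spec = Spec(description="name an even number",
            accepts=lambda s: s.strip().isdigit() and int(s) % 2 == 0)
print(gatekeeper(spec, untrusted_solver=lambda _: "42"))  # prints 42

The real programme would replace the toy acceptance test with mathematical proofs checked against a scientific world model, but, as I understand it, the shape of the loop is the same: propose, verify, and only then release.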

I still can’t join in this project “without any reservations” because, well, I’m not that kind of guy. There is always something to worry about.

Posted by: John Baez on August 17, 2025 11:54 AM | Permalink | Reply to this

Re: Safeguarded AI Meeting

In 1954, Fredric Brown wrote a science fiction story titled “Answer”. It’s so short I can reproduce it here.

“Dwar Ev ceremoniously soldered the final connection with gold. The eyes of a dozen television cameras watched him and the subether bore throughout the universe a dozen pictures of what he was doing.

He straightened and nodded to Dwar Reyn, then moved to a position beside the switch that would complete the contact when he threw it. The switch that would connect, all at once, all of the monster computing machines of all the populated planets in the universe – ninety-six billion planets – into the supercircuit that would connect them all into one supercalculator, one cybernetics machine that would combine all the knowledge of all the galaxies.

Dwar Reyn spoke briefly to the watching and listening trillions. Then after a moment’s silence he said, “Now, Dwar Ev.”

Dwar Ev threw the switch. There was a mighty hum, the surge of power from ninety-six billion planets. Lights flashed and quieted along the miles-long panel.

Dwar Ev stepped back and drew a deep breath. “The honor of asking the first question is yours, Dwar Reyn.”

“Thank you,” said Dwar Reyn. “It shall be a question which no single cybernetics machine has been able to answer.”

He turned to face the machine. “Is there a God?”

The mighty voice answered without hesitation, without the clicking of a single relay. “Yes, now there is a God.”

Sudden fear flashed on the face of Dwar Ev. He leaped to grab the switch. A bolt of lightning from the cloudless sky struck him down and fused the switch shut.”

This is what many people are terrified about when it comes to AI, but it is utterly ridiculous. AI is not going to turn into a god that will enslave humanity. AI has no trace of consciousness whatsoever. It has no more consciousness than a chair in your living room. The problem is people mistaking AI for a person, when it is not a person at all. It is easy to fool people into thinking they are talking to a person. It is the very people who are raising the alarm bells about AI who are the ones being most fooled. This problem will get worse with increasingly realistic AI-generated video avatars and lifelike humanoid androids. So what do we do? We try to educate the public that this thing is not a person. There are things we should be worried about. We should not allow military drones to use lethal force without a human signing off on it. People in self-driving cars need to pay attention and be ready to take control at any time. There is a real problem in education where students use AI to cheat. Does that mean we can’t have homework, because the teacher can’t know whether the student used AI? This is a problem in physics education, which you can read about here.

https://physics.aps.org/articles/v18/147

Posted by: Jeffery Winkler on August 20, 2025 4:55 PM | Permalink | Reply to this

Re: Safeguarded AI Meeting

Here are some more links about the connections between physics and AI.

1. Physics Today - Nobel Prize
https://pubs.aip.org/physicstoday/article-abstract/77/12/12/3320663/Nobel-Prize-highlights-neural-networks-physics

2. Physics Today - Future of Discovery
https://pubs.aip.org/physicstoday/article-abstract/77/11/30/3318195/Physics-AI-and-the-future-of-discoveryLeaders-from

3. APS News - Climate Change
https://www.aps.org/apsnews/2025/06/ai-could-shape-climate-science

4. Physics Focus - Students Using AI to Cheat
https://physics.aps.org/articles/v18/147

5. APS News - Grading Tests
https://www.aps.org/apsnews/2025/07/what-happens-ai-grading

6. Wired - Drug Development
https://www.magzter.com/stories/science/WIRED/THE-DOSE-IN-THE-MACHINE

Posted by: Jeffery Winkler on September 3, 2025 6:27 AM | Permalink | Reply to this

Re: Safeguarded AI Meeting

Here is a physics paper partially written using ChatGPT.

https://arxiv.org/pdf/2508.21276

This will be more common in the future. Is this good or bad for physics?

Posted by: Jeffery Winkler on September 3, 2025 6:55 AM | Permalink | Reply to this

Re: Safeguarded AI Meeting

In all cases, the line-by-line proof details presented here were constructed by the author. It seems important to point out that GPT5 was not reliable in providing proof details. In several cases during the present project, prompting of GPT5 to produce some detailed part of a proof gave results that were sloppy or incorrect. In one situation, the model supplied two alternative “proofs” for a conjecture that turned out to be false.

I wouldn’t go so far as to say the “paper was partially written using ChatGPT”, but that it was used as a tool to help build code and suggest ideas, which the author used as a springboard.

Posted by: David Roberts on September 4, 2025 6:26 AM | Permalink | Reply to this
