

August 14, 2025

Safeguarded AI Meeting

Posted by John Baez

This week, 50 category theorists and software engineers working on “safeguarded AI” are meeting in Bristol. They’re being funded by £59 million from ARIA, the UK’s Advanced Research and Invention Agency.

The basic idea is to develop a mathematical box that can contain a powerful genie. More precisely:

By combining scientific world models and mathematical proofs we will aim to construct a ‘gatekeeper’, an AI system tasked with understanding and reducing the risks of other AI agents. In doing so we’ll develop quantitative safety guarantees for AI in the way we have come to expect for nuclear power and passenger aviation.

The program director is David Dalrymple, and you can get a much better description of the project from him in the first 4 minutes here:

It’s remarkable how many of the applied category theorists in the UK are involved in this. Here you can find a partial list:

If you’re wondering “why category theory?”, I think the idea is this: software based on general abstract math is more flexible, yet also easier to formally verify.

For example the Topos Institute, run by my former student Brendan Fong, now has a branch in the UK largely funded by ARIA. At the meeting, Topos is demonstrating how to build models in CatColab, their new category-based software.

I have decided not to be part of this project, though some of my math is getting used here. I’ve always preferred to avoid doing things connected to AI, for various reasons. But this project might make AI better. It could also have various bad effects. I have no idea how successful it will be, so I’m watching with fascination and profoundly mixed emotions.

Posted at August 14, 2025 12:06 PM UTC


10 Comments & 0 Trackbacks

Re: Safeguarded AI Meeting

FYI, Motivating coherence-driven inference via sheaves

Posted by: Steve Huntsman on August 14, 2025 12:38 PM | Permalink | Reply to this

Re: Safeguarded AI Meeting

John, I’d be interested to hear you expand on the last paragraph.

All AI developments are at least potentially dual use, in that a technique intended to promote safety may end up promoting capabilities including capabilities to do dangerous things. RLHF is often cited as an example of this.

Still, if the aim is to promote safety, surely on balance you’re more likely to do so than someone trying to promote capabilities. So sure, you’re only promoting safety in expected value, but most things in life are like that.

Posted by: Colin Rust on August 16, 2025 3:24 PM | Permalink | Reply to this

Re: Safeguarded AI Meeting

I hadn’t expanded on my thoughts about category theory in AI because it’s a very emotional subject for me and I didn’t want to contaminate my reporting of a potentially momentous event with too much of my own perspective. But okay:

I agree that getting category theorists to help with “safeguarded AI” is on average safer than getting them to help develop powerful AI capabilities. So I’m happy that ARIA is bankrolling lots of UK category theorists to do the former rather than the latter.

But category theorists have many other choices they could make, like staying out of AI altogether. This too can take many forms. For example they could stay out of “applied category theory” altogether, or focus on applications they consider likely to be beneficial, like developing measures of biodiversity (as Tom Leinster did, for a time), or developing math that will help people design software for modeling in epidemiology (as I’m doing).

It’s true, however, that category theory is so flexible that math developed for modeling in epidemiology can be repurposed for AI. In fact that’s approximately what’s already happening. (More precisely, my work on structured and decorated cospans, which our team uses for epidemiological modeling, also served as an inspiration for David Jaz Myers’s Double Operadic Theory of Systems, which seems poised to play a big role in ARIA’s work on safeguarded AI.)
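For readers who haven’t met these gadgets, here is the rough shape of the formalism, as a schematic sketch rather than the precise definitions in the structured and decorated cospan papers. An open system with input boundary $X$ and output boundary $Y$ is presented as a cospan

$$X \xrightarrow{\;i\;} S \xleftarrow{\;o\;} Y,$$

possibly “decorated” with extra data on $S$ (an epidemiological model, a Petri net, a dynamical system). Two open systems sharing the boundary $Y$ compose by pushout, gluing their apexes along $Y$:

$$(X \to S \leftarrow Y)\,;\,(Y \to T \leftarrow Z) \;=\; \bigl(X \to S +_Y T \leftarrow Z\bigr).$$

The gluing rule doesn’t care what decorates $S$ and $T$, which is why the same machinery can migrate from epidemiological models to other kinds of systems.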

So, one question is whether one should try to do powerful applied mathematics and try to nudge it toward beneficial applications, or avoid it altogether if the possible negative consequences are large — or just stop worrying, develop it, and let history sort out what it gets used for. I’m reluctant to take the third approach, which seems like wilfully avoiding responsibility for the consequences of my actions. So I’m wavering between the first two.

For example I could have a lot of fun working on the math of music theory, and in fact I’ve done plenty of this and plan to do more. But doing only that, and similar safe and beautiful things, feels like shirking my responsibility.

Soon the conversation becomes about utilitarianism versus virtue ethics, and also about “why do you think anything you do matters enough to worry about?” You see why I was reluctant to start talking about this stuff! There are a lot of issues here.

Posted by: John Baez on August 17, 2025 11:40 AM | Permalink | Reply to this

Re: Safeguarded AI Meeting

I quote the comment I left on John’s blog about this:

I’m part of SGAI and I believe this is the part of ‘AI’ that can realistically be transformative (for the better!) in the short to medium term, rather than just glorified chatbots and automated misinformation. AI used to be a much broader field until genAI companies seized the narrative; we should be smarter than that and refrain from throwing away the proverbial baby with the bathwater.

At the end of the day, we are doing next-gen control theory: how do we leverage the unreasonably effective methods of machine learning to manage and optimize safety-critical processes? Lots of industrial (and other) processes are chronically inefficient right now; think of balancing electric grids, a task still done mostly by expert humans, and much of the ecological transition hinges upon getting that right.

The part of the SGAI programme category theorists are involved in is producing a toolkit for compositional modeling and verification. A lot of it is putting together already existing things: classical and neural model checking, Courser-Baez compositionality for systems (albeit in the newfangled ‘double operadic’ framework), probabilistic and quantitative languages, formalization and program verification, etc. The innovative part is mostly in doing all of it at once. This can be a great showcase of the merits of category theory for cyberphysical applications.

Perhaps I’m young and delusional, and I’m certainly not so naive as to overlook the potential misuses of such a technology, but I seriously believe more good than bad things will come out of all this. As always, tech empowers people. It is the political task of us all to ensure that power is used well.

To add to John’s comment here: my belief is that we (and I mean all category theorists here) work at a level of abstraction that makes it impossible to exclude nefarious purposes from the downstream applications. And secluding ourselves in our ivory towers would be an unacceptable waste of the privilege (and the resources) society has entrusted us with. Thus the best we can do is frame our work positively, to encourage and inspire good applications rather than bad ones, and to be politically active.

Posted by: Matteo Capucci on August 18, 2025 7:15 AM | Permalink | Reply to this

Re: Safeguarded AI Meeting

Thanks for sharing this development. I am still looking into the materials, but they give me the impression that, more or less, they accept NNs as black boxes that can be trained on synthetic data (provided by the ethical and normative world models) and whose output can be filtered by the world models. Though I feel that this control is not enough: NNs can be deceptive, and their optimization may require some kind of microscopic theory of NNs.

I guess there are open questions about what happens inside NNs: whether some specific algorithms live in them (as categorical deep learning approaches suggest), or maybe full theories with lots of algorithms live in them, or maybe NNs are just models, or embeddings of models, of some theories. In the last case, knowing that toposes host every model of type theories (even HOL, HoTT), one could view NNs (or the Euclidean spaces of activations of their neuronal layers) as embeddings, or some other kind of translation, of toposes.

“Toposes and Stacks of Deep Neural Networks” was a promising article in this direction, but, sadly, it has not had follow-ups.

It may be quite impossible to get results from a bare statement like “this theory lives in that NN”, but theorems of the form “if a theory (e.g. presented as a contextual category) has such-and-such invariants and a particular NN architecture has such-and-such invariants, then (1) the learning rate is bounded by some formula in those invariants, and (2) the output has such-and-such invariants, which can be used to check compliance with norms (again a theory with such-and-such invariants)” could be attained, I guess.

The model theory / neural network interface, or the embedding into activation spaces, may be the hardest part, because Euclidean spaces cannot hold toposes: toposes cannot be embedded in Euclidean spaces. Actually, it is quite funny, because one expects neural networks to express everything, and yet the logic that lives in toposes cannot be embedded inside them.

So such a microscopic theory of NNs could be the real solution to the explainability, manageability and controllability of NNs. It could be the complete solution to the safety problem, so Prof. Baez could join in such efforts without any reservations. Of course, if such explainability were achieved for NNs it could be applied to human brains as well, and I guess there would be some benefits and a lot of problems too.

Posted by: Alex on August 16, 2025 10:17 PM | Permalink | Reply to this

Re: Safeguarded AI Meeting

From his interview, I get the feeling David Dalrymple is wisely being carefully agnostic about how powerful AI will work. His idea is that you feed the genie in a box a well-specified problem, it figures out the answer, the box checks the answer, and only if the answer is appropriate does the box let the answer out. He is not concerned with the design of the genie. He is trying to design the box.

Safeguarded AI is targeting what I call “the critical period in the middle”, where we’re dealing with systems that are too powerful to just deploy on the internet, or even for humans to talk to, because they might be very persuasive about very damaging memes, or they might emit unstoppable malware if they’re connected to the internet, but it’s not like they’re so smart that it would be impossible to contain them if you actually tried. And so the question that I’m trying to answer in safeguarded AI is how we could make use of a system in this middle level of superintelligence without creating unacceptable risk. And the answer is — in a sentence — that we create a workflow around this contained system which is a general-purpose way of using it to do autonomous R&D of special-purpose agents that have quantitative safety guarantees relative to specific contexts of use.
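To make that workflow concrete, here is a toy sketch of the “box” pattern in Python. Everything in it (the names, the spec format, the acceptance test) is a hypothetical illustration of “the box checks the answer before releasing it”, not ARIA’s actual design:

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Spec:
    """A well-specified problem: a description plus a machine-checkable
    acceptance test (a hypothetical stand-in for a formal specification)."""
    description: str
    accepts: Callable[[str], bool]

def gatekeeper(spec: Spec,
               untrusted_solver: Callable[[str], str],
               max_attempts: int = 3) -> Optional[str]:
    """Run the untrusted solver inside the 'box': only release an answer
    that passes the spec's acceptance check; otherwise release nothing."""
    for _ in range(max_attempts):
        candidate = untrusted_solver(spec.description)
        if spec.accepts(candidate):
            return candidate  # a verified answer is allowed out of the box
    return None               # nothing unverified ever leaves

# Toy usage: the 'genie' must name an even number.
spec = Spec(description="name an even number",
            accepts=lambda s: s.strip().isdigit() and int(s) % 2 == 0)
print(gatekeeper(spec, untrusted_solver=lambda _: "42"))  # prints 42

The real programme would replace the toy acceptance test with mathematical proofs checked against a scientific world model, but, as I understand it, the shape of the loop is the same: propose, verify, and only then release.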

I still can’t join in this project “without any reservations” because, well, I’m not that kind of guy. There is always something to worry about.

Posted by: John Baez on August 17, 2025 11:54 AM | Permalink | Reply to this

Re: Safeguarded AI Meeting

In 1954, Fredric Brown wrote a science fiction story titled “Answer”. It’s so short I can reproduce it here.

“Dwar Ev ceremoniously soldered the final connection with gold. The eyes of a dozen television cameras watched him and the subether bore throughout the universe a dozen pictures of what he was doing.

He straightened and nodded to Dwar Reyn, then moved to a position beside the switch that would complete the contact when he threw it. The switch that would connect, all at once, all of the monster computing machines of all the populated planets in the universe – ninety-six billion planets – into the supercircuit that would connect them all into one supercalculator, one cybernetics machine that would combine all the knowledge of all the galaxies.

Dwar Reyn spoke briefly to the watching and listening trillions. Then after a moment’s silence he said, “Now, Dwar Ev.”

Dwar Ev threw the switch. There was a mighty hum, the surge of power from ninety-six billion planets. Lights flashed and quieted along the miles-long panel.

Dwar Ev stepped back and drew a deep breath. “The honor of asking the first question is yours, Dwar Reyn.”

“Thank you,” said Dwar Reyn. “It shall be a question which no single cybernetics machine has been able to answer.”

He turned to face the machine. “Is there a God?”

The mighty voice answered without hesitation, without the clicking of a single relay. “Yes, now there is a God.”

Sudden fear flashed on the face of Dwar Ev. He leaped to grab the switch. A bolt of lightning from the cloudless sky struck him down and fused the switch shut.”

This is what many people are terrified about when it comes to AI, but it is utterly ridiculous. AI is not going to turn into a god that will enslave humanity. AI has no trace of consciousness whatsoever. It has no more consciousness than a chair in your living room. The problem is people mistaking AI for a person, when it is not a person at all. It is easy to fool people into thinking they are talking to a person. It is the very people who are raising the alarm bells about AI who are the ones being most fooled. This problem will get worse with increasingly realistic AI-generated video avatars and lifelike humanoid androids. So what do we do? We try to educate the public that this thing is not a person. There are things we should be worried about. We should not allow military drones to use lethal force without a human signing off on it. People in self-driving cars need to pay attention and be ready to take control at any time. There is a real problem in education where students use AI to cheat. Does that mean we can’t have homework, because the teacher can’t know whether the student used AI? This is a problem in physics education, which you can read about here.

https://physics.aps.org/articles/v18/147

Posted by: Jeffery Winkler on August 20, 2025 4:55 PM | Permalink | Reply to this

Re: Safeguarded AI Meeting

Here are some more links about the connections between physics and AI.

1. Physics Today - Nobel Prize
https://pubs.aip.org/physicstoday/article-abstract/77/12/12/3320663/Nobel-Prize-highlights-neural-networks-physics

2. Physics Today - Future of Discovery
https://pubs.aip.org/physicstoday/article-abstract/77/11/30/3318195/Physics-AI-and-the-future-of-discoveryLeaders-from

3. APS News - Climate Change
https://www.aps.org/apsnews/2025/06/ai-could-shape-climate-science

4. Physics Focus - Students Using AI to Cheat
https://physics.aps.org/articles/v18/147

5. APS News - Grading Tests
https://www.aps.org/apsnews/2025/07/what-happens-ai-grading

6. Wired - Drug Development
https://www.magzter.com/stories/science/WIRED/THE-DOSE-IN-THE-MACHINE

Posted by: Jeffery Winkler on September 3, 2025 6:27 AM | Permalink | Reply to this

Re: Safeguarded AI Meeting

Here is a physics paper partially written using ChatGPT.

https://arxiv.org/pdf/2508.21276

This will be more common in the future. Is this good or bad for physics?

Posted by: Jeffery Winkler on September 3, 2025 6:55 AM | Permalink | Reply to this

Re: Safeguarded AI Meeting

In all cases, the line-by-line proof details presented here were constructed by the author. It seems important to point out that GPT5 was not reliable in providing proof details. In several cases during the present project, prompting of GPT5 to produce some detailed part of a proof gave results that were sloppy or incorrect. In one situation, the model supplied two alternative “proofs” for a conjecture that turned out to be false.

I wouldn’t go so far as to say the “paper was partially written using ChatGPT”, but that it was used as a tool to help build code and suggest ideas, which the author used as a springboard.

Posted by: David Roberts on September 4, 2025 6:26 AM | Permalink | Reply to this
