Benjamin Sturgeon website

Me in 10 seconds

I'm a MATS fellow, currently in the MATS extension, working with David Africa, Sid Black and, more recently, Daniel Tan. I'm also collaborating with the Center on Long-Term Risk through their summer fellowship programme. I'm based in London. I'm interested in personas and in understanding phenomena like emergent misalignment in large language models, as well as how models think about themselves.

I'm also strongly involved with AI Safety South Africa as cofounder and strategic director, growing the AI safety community in Cape Town.

I welcome feedback on how I'm doing! If you'd like to share, please feel encouraged to do so using this feedback form.

Me in many seconds

link to my about page

Now

What I'm doing now

Writing

A Case for Persona Robustness as a Research Area

Contributing to Technical Research in the AI Safety End Game

Lessons from my first 10 day Vipassana

Do Models Know They're Lying When Claiming Fake Identities?

Whole Brain Emulation as an Anchor for AI Welfare

How South Africa's Electricity Catastrophe Was (Mostly) Fixed

Revisiting GSM-Symbolic: Do 2026 Frontier Models Still Fail at Confounded Grade School Math?

The Garden (Complete)

Projects

Inkhaven: 30 Days of Posts

Building and training a word embedding system

Creating a small GPT from scratch in PyTorch

Adding vision and navigation to an autonomous farm robot

Developing toy models of agency. A mechanistic interpretability project.

Talks

An intro to AI Safety

Papers

When Roleplaying, Do Models Believe What They Say?

Investigating Factored Cognition in Large Language Models For Answering Ethically Nuanced Questions

A Security Analysis of the Linux RNG Protocol in Virtual Machines

HumanAgencyBench: Do Language Models Support Human Agency?

Pictures

My Resumé

Download my resumé

Connect