Benjamin Sturgeon

Me in 10 seconds

I'm a researcher working on AI safety, with a focus on understanding agency, both in AI systems themselves and in how powerful AI will impact human agency. I'm currently pursuing an MPhil in Applied Mathematics at the University of Cape Town with Jonathan Shock, where I'm studying how mechanistic interpretability can be applied to deeply understand RL models. I also have a strong interest in evaluation frameworks and in creative ways we can learn to measure and understand LLMs.

I am also passionate about teaching and growing the field of AI safety, and co-founded AI Safety Cape Town to contribute to that effort.

Outside of research, I draw inspiration from Buddhist philosophy, Stoicism, and Kantian ethics in thinking about how to make meaningful contributions to the field of AI safety. To recharge I spend time playing beach volleyball, watching anime, reading epic fantasy/sci-fi and hiking.

I welcome feedback on how I'm doing! If you'd like to share, please feel free to use this feedback form.

Me in many seconds

link to my about page

Now Now Now

What I'm doing now

Inkhaven: 30 Days of Posts

How did the United States hand over its future to corporations?

The Physics of Great Storytelling

The Garden (Part 1)

The Garden (Complete)

Spectator Sport

How Does an Agent with Multiple Goals Choose a Target?

Spend Money to Buy Opportunity

Revisiting GSM-Symbolic: Do 2026 Frontier Models Still Fail at Confounded Grade School Math?

This Is Not Financial Advice

How South Africa's Electricity Catastrophe Was (Mostly) Fixed

How to Make a Problem Disappear

Is the Adult in the Room with Us Right Now?

Characterising the Views on Safety from Frontier AI Labs: Anthropic and DeepMind

Characterising the Views on Safety from Frontier AI Labs: OpenAI

A Case for Persona Robustness as a Research Area

All posts

Posts

Whole Brain Emulation as an Anchor for AI Welfare

Lessons from my first 10-day Vipassana

Learning how to learn

Why pursue conceptions of agency for AI safety

Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation

Projects

Building and training a word embedding system

Creating a small GPT from scratch in Pytorch

Adding vision and navigation to an autonomous farm robot

Developing toy models of agency: a mechanistic interpretability project

Talks

An intro to AI Safety

Papers

Investigating Factored Cognition in Large Language Models For Answering Ethically Nuanced Questions

A Security Analysis of the Linux RNG Protocol in Virtual Machines

HumanAgencyBench: Do Language Models Support Human Agency?

Pictures

My Resumé

Download my resumé

Connect