Those rules are often ignored by agents. Codex is known to be quite adhering, bu...

philipp-gayret · 2025-10-27T23:02:49

I'm aware of the issues with rules placed in a default prompt. I had hoped the author of the blog meant a different mechanism when they mentioned "steering rules". I mean something different: a setup where the agent self-corrects when it is seen going against the rules in the initial prompt. I have a different setup myself for Claude Code, and would call parts of it "steering": adjusting the agent's trajectory as it goes.

manmal · 2025-10-28T05:49:52

With Claude Code, you can intercept its prompts if you start it in a wrapper and mock `fetch` (someone with the GitHub handle "badlogic" did this, but I can't find the repo now). For everything else (e.g. Codex, Cursor), you'd need to heavily proxy or isolate all of its communication with the system.
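The comment doesn't say how that wrapper worked. As a rough sketch, assuming the CLI runs on Node.js 18+ and sends its API requests through the global `fetch` (an assumption, not confirmed here), a module loaded before the CLI could wrap `fetch` and record each request body before forwarding it unchanged. `installFetchLogger` and `CapturedRequest` are hypothetical names, and this is not the badlogic implementation.

```typescript
// Hypothetical sketch: wrap the global fetch so each outgoing request
// (e.g. the prompt JSON an agent CLI sends to its API) is recorded, then
// forwarded unchanged to the original fetch.

type CapturedRequest = {
  url: string;
  method: string;
  body: string | undefined;
};

function installFetchLogger(
  record: (req: CapturedRequest) => void,
): () => void {
  const originalFetch = globalThis.fetch;

  const wrapped: typeof fetch = async (input, init) => {
    // Normalise the possible input forms (string, URL, Request) to a URL string.
    const url =
      typeof input === "string"
        ? input
        : input instanceof URL
          ? input.href
          : input.url;
    const method =
      init?.method ?? (input instanceof Request ? input.method : "GET");
    // Only string bodies are captured; streams and binary bodies are skipped.
    const body = typeof init?.body === "string" ? init.body : undefined;
    record({ url, method, body });
    return originalFetch(input, init);
  };

  globalThis.fetch = wrapped;
  // Calling the returned function restores the original fetch.
  return () => {
    globalThis.fetch = originalFetch;
  };
}
```

How the module gets loaded ahead of the CLI (for example via Node's `--require` or `--import` flags) depends on how the CLI is installed and launched, and a CLI that bundles its own HTTP client would bypass the global `fetch` entirely.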

CharlesW · 2025-10-27T22:26:23

Everything related to LLMs is probabilistic, but those rules are also often followed well by agents.

manmal · 2025-10-28T05:31:44

Yes, most of the time. Then they don't. Yesterday, I told Codex that it must always run tests by invoking a make target. That target is even configurable with parameters, e.g. to filter by test name. But every time, at some point in the session, Codex started disregarding that rule and fell back to invoking the platform's native test tool directly. I used strong language to steer it back, but after roughly another 20% of the context window, it did it again.
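The thread doesn't say whether Codex can enforce such a rule outside the prompt. In the spirit of the "steering" philipp-gayret describes, one tool-agnostic option is to turn the rule into a check: a function that inspects each shell command the agent proposes and returns a corrective message when it bypasses the make target. The command patterns, the `FILTER` parameter, and `checkTestRule` are all hypothetical; wiring the check in requires an agent that exposes a hook for inspecting commands before they run (Claude Code's hooks are one such mechanism).

```typescript
// Hypothetical patterns for "native" test runners; the real list depends
// on the project's platform.
const DIRECT_TEST_COMMANDS: RegExp[] = [
  /^\s*swift\s+test\b/,
  /^\s*go\s+test\b/,
  /^\s*(python3?\s+-m\s+)?pytest\b/,
];

// Returns a steering message when a proposed shell command runs tests
// directly instead of through the make target; returns null otherwise.
function checkTestRule(command: string): string | null {
  // Check each part of a chained or piped command separately.
  for (const segment of command.split(/&&|\|\||;|\|/)) {
    if (DIRECT_TEST_COMMANDS.some((re) => re.test(segment))) {
      // FILTER is a hypothetical make variable for selecting tests by name.
      return (
        `Rule: run tests via the make target, not "${segment.trim()}". ` +
        "Use e.g. `make test FILTER=<test name>` instead."
      );
    }
  }
  return null;
}
```

A regex check like this is easy to sidestep (an env-var prefix or a wrapper script slips past it), so it works as a repeated nudge at the moment of violation rather than as strict enforcement.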

Dilettante_ · 2025-10-28T09:56:30

Once the LLM has made one mistake, it's often best to start a new context.

Since it works by predicting the next token of the conversation, once it has made one mistake it's reasonable for it to "predict" itself making more.
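The intuition can be written down with the standard autoregressive factorization, where $c$ is the context (system prompt, rules, prior conversation) and $y_t$ are the generated tokens. This only shows that an earlier mistake stays in what every later token is conditioned on; whether it actually makes further mistakes more likely is the empirical claim being made here.

```latex
p(y_1, \dots, y_T \mid c) = \prod_{t=1}^{T} p(y_t \mid c,\, y_{<t})
```

If some $y_k$ is a mistake, every factor with $t > k$ conditions on it through $y_{<t}$. Starting a new context removes it from both $c$ and $y_{<t}$.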

manmal · 2025-10-28T10:24:20

I'm not sure this is still the case with Codex. In this instance, restarting didn't make much of a difference.