Interesting that it generalizes to other pairs. That implies there is some property or characteristic of the prompt itself that could be widely used.

I don’t think using models from different families is the right approach, though, since they behave differently. It would be better to use a big and a small model from the same family. Alternatively, this could be used to decide whether to give the AI more “thinking time” via chain of thought or agents.
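To make the two options concrete, here is a rough sketch, assuming the prompt property can be boiled down to a single difficulty score between 0 and 1. The score, the thresholds, and the model names `family-small` and `family-large` are all made up for illustration and do not come from the original work.

```python
# Hypothetical sketch: route on a per-prompt difficulty score in [0, 1].
# The score, thresholds, and model names are assumptions for illustration.

def pick_model(difficulty: float, threshold: float = 0.5) -> str:
    """Option 1: choose between a small and a large model of the same family."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    return "family-large" if difficulty >= threshold else "family-small"


def needs_thinking_time(difficulty: float, threshold: float = 0.5) -> bool:
    """Option 2: keep one model, but enable chain of thought for hard prompts."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    return difficulty >= threshold


if __name__ == "__main__":
    for score in (0.2, 0.9):
        print(score, pick_model(score), needs_thinking_time(score))
```

Either way the routing signal is the same; only the thing being switched changes (model size versus extra reasoning).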