I'm amazed that out of so many good responses above, only this one mentions fuzzing. In the context of security, inputs might be non-linear things like adjacent memory, so I don't see any way to be confident about equivalence without substantial fuzzing.
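To make that concrete, here's a minimal differential-fuzzing sketch: feed identical random inputs to both implementations and flag any divergence. The `original` and `rewrite` functions are hypothetical stand-ins; a real harness would invoke the two binaries under test (e.g. via subprocess or ctypes) and use a coverage-guided fuzzer rather than plain random bytes.

    import os

    def original(data: bytes) -> bytes:
        # Placeholder for the reference implementation.
        return data[::-1]

    def rewrite(data: bytes) -> bytes:
        # Placeholder for the candidate "equivalent" implementation.
        return bytes(reversed(data))

    def fuzz_equivalence(trials: int = 100_000, max_len: int = 256) -> None:
        for _ in range(trials):
            # Random-length random input for each trial.
            length = int.from_bytes(os.urandom(2), "big") % max_len
            data = os.urandom(length)
            a, b = original(data), rewrite(data)
            if a != b:
                raise AssertionError(
                    f"divergence on input {data!r}: {a!r} != {b!r}")
        # Absence of divergence raises confidence but proves nothing.
        print(f"no divergence found in {trials} trials")

    if __name__ == "__main__":
        fuzz_equivalence()

Even a harness like this only builds confidence; it can never establish equivalence, which is exactly the problem.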
Honestly, I just don't see a way to formally verify this at all. It sounds like it could be a very useful tool, but I don't see a way for it to be fully confident. But, heck, just getting you 90% of the way toward understanding it with LLMs is still amazing and useful in real life.