My guess is that your generated code contains a lot of black boxes that the optimizer can't see through. At my work, the LLVM optimizer does an insane amount of optimization.
Rust is very optimizable by LLVM, but it still has a similar performance issue: it's costly to optimize overly verbose or inefficient LLVM IR. Rust ended up implementing its own optimization passes on MIR, which run before LLVM, so that it hands LLVM more optimized IR.
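To make the "black box" point concrete, here is a minimal Rust sketch of the same loop written two ways. The function names `sum_generic` and `sum_dyn` are made up for illustration, and I'm assuming the usual behavior that a generic closure gets monomorphized and inlined while a call through a trait object stays an indirect call. `std::hint::black_box` is only a best-effort hint, so what the optimizer actually does can vary by compiler version and flags.

```rust
use std::hint::black_box;

// Transparent version: generic over the closure type, so the compiler
// monomorphizes it and the optimizer can usually inline `f` into the loop.
fn sum_generic<F: Fn(u64) -> u64>(n: u64, f: F) -> u64 {
    let mut total = 0;
    for i in 0..n {
        total += f(i);
    }
    total
}

// Opaque version: `f` is a trait object, so each call is an indirect call
// through a vtable unless the optimizer manages to devirtualize it.
fn sum_dyn(n: u64, f: &dyn Fn(u64) -> u64) -> u64 {
    let mut total = 0;
    for i in 0..n {
        total += f(i);
    }
    total
}

fn main() {
    let double = |i: u64| i * 2;

    // The optimizer can see the whole computation here.
    let a = sum_generic(1_000, double);

    // black_box is a best-effort way to hide which closure is behind the
    // reference, which approximates a callee the optimizer can't see through.
    let opaque: &dyn Fn(u64) -> u64 = &double;
    let opaque = black_box(opaque);
    let b = sum_dyn(1_000, opaque);

    // Same result either way; only the amount the optimizer can do differs.
    assert_eq!(a, b);
    println!("{a} {b}");
}
```

If you want to check the difference yourself, compile with `rustc -O --emit=llvm-ir` and compare the IR for the two functions.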