For me the huge difference between re-compilability and re-excuteability scores is very interesting. GTP4 achieved 8x% on re-compilability (syntactically correct) but abysmal 1x% in re-excutability (schematically correct) demonstrated once again its overgrown mimicry capacity.
I don't have a toolchain, I am predicting research that will be able to detect the exact toolchain used from the binary. If you can detect the toolchain, then you can iterate to a fixed point (grind) until you recover a perfect copy of the source.