Not me, but a compiler backend would perhaps have better “materials” for optimization this way. There is no reason existing higher-level code couldn't be transformed into this “microcode”.
A compiler backend would have to do it statically, and would not be able to do it based on the workload that's being run.
The argument for exposing this sort of control is that higher-level domain knowledge could be used to tune the prefetching manually.
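
For a rough idea of what manual tuning looks like with the hints compilers already expose today, here is a minimal sketch in C. It assumes GCC or Clang, where `__builtin_prefetch` is available. The function name `sum_indirect` and the `PREFETCH_DISTANCE` constant are made up for illustration, and the right distance depends on the machine and the workload, which is exactly the kind of thing the programmer would have to know.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many iterations ahead to prefetch.
 * The best value is workload- and hardware-dependent. */
#define PREFETCH_DISTANCE 8

/* Sums data[idx[i]] for i in [0, n). The indirect access through idx
 * is hard for a hardware prefetcher to predict, but the program knows
 * which element it will need a few iterations from now. */
long sum_indirect(const long *data, const size_t *idx, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* GCC/Clang builtin: second argument 0 means "read",
             * third argument 1 means low temporal locality.
             * This is only a hint; it never changes the result. */
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 1);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

The prefetch is purely advisory, so the function returns the same sum with or without it. Only the timing changes, and whether it helps at all has to be measured on the actual workload.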