
It's weird to me that people build libraries on top of the ML stack to track provenance, when it's really the ML library's job to do that for its inputs. However, it's a right pain to build into the ML library, as it affects all the interfaces. We build data, model & evaluation provenance objects into our ML library, Tribuo (https://tribuo.org), as a first-class part of the library. You can take a provenance and emit a configuration to rerun an experiment just by querying the model object. It is built in Java though, which makes it a little easier to enforce the immutability and type safety you need in a provenance system.
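The core idea of immutable provenance objects attached to the model, from which you can emit a rerun configuration, can be sketched in Python. This is a hedged analogue of the design, not Tribuo's actual (Java) API; all class and field names here are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical frozen (immutable) provenance records, in the spirit of the
# design described above -- not Tribuo's real classes.
@dataclass(frozen=True)
class DataProvenance:
    path: str
    checksum: str

@dataclass(frozen=True)
class TrainerProvenance:
    algorithm: str
    hyperparams: tuple  # tuple of (name, value) pairs, kept immutable

@dataclass(frozen=True)
class ModelProvenance:
    data: DataProvenance
    trainer: TrainerProvenance

    def to_config(self) -> dict:
        """Emit a configuration sufficient to rerun the experiment."""
        return {
            "data": {"path": self.data.path, "checksum": self.data.checksum},
            "trainer": {"algorithm": self.trainer.algorithm,
                        "hyperparams": dict(self.trainer.hyperparams)},
        }

prov = ModelProvenance(
    DataProvenance("train.csv", "sha256:abc123"),
    TrainerProvenance("logistic_regression", (("lr", 0.1), ("epochs", 10))),
)
print(prov.to_config()["trainer"]["algorithm"])  # logistic_regression
```

Because every record is frozen, a provenance can be handed around and stored without worrying about mutation, which is the property the type system helps enforce.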

edit: I should add that I'm definitely in favour of having provenance in ML systems, and libraries layered on top are how people currently do that. It's just odd that people aren't working on adding that support directly into scikit-learn/TF/PyTorch etc.



MLflow and TFX try to add some form of provenance by polluting your code with "logging" calls. One good thing MLflow has added is auto-loggers; we also added them in our Maggy framework ( https://www.logicalclocks.com/blog/unifying-single-host-and-... ).
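The auto-logger idea is to hook the framework's training entry point so that hyperparameters are recorded without any explicit logging calls in user code. A minimal stdlib-only sketch of that pattern (the `autolog` decorator, `RUNS` store, and `TinyModel` are all hypothetical; real auto-loggers patch the framework's own classes and ship records to a tracking server):

```python
import functools
import time

RUNS = []  # stand-in for a tracking server / metadata store

def autolog(estimator_cls):
    """Wrap an estimator's fit() so every training run records its
    hyperparameters automatically -- the auto-logger idea, reduced
    to a monkey-patching sketch."""
    original_fit = estimator_cls.fit

    @functools.wraps(original_fit)
    def fit(self, *args, **kwargs):
        run = {"params": dict(getattr(self, "params", {})),
               "timestamp": time.time()}
        result = original_fit(self, *args, **kwargs)
        RUNS.append(run)  # recorded without any logging call in user code
        return result

    estimator_cls.fit = fit
    return estimator_cls

@autolog
class TinyModel:
    def __init__(self, lr=0.01):
        self.params = {"lr": lr}

    def fit(self, X, y):
        self.coef_ = sum(y) / len(y)  # trivial "training"
        return self

TinyModel(lr=0.5).fit([1, 2], [3, 4])
print(RUNS[0]["params"])  # {'lr': 0.5}
```

The user's training code stays untouched; the provenance capture lives entirely in the hook, which is what makes auto-logging attractive compared with scattering explicit logging calls.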

I totally agree that where you have framework hooks, you should have provenance. But given that there's no standard for what provenance is and no de facto open-source platform, the sklearn, TF, and PyTorch folks rightly steer clear. We've found that if you have a shared file system, you can use path-name conventions (features go in 'featurestore', training data in 'training', models in 'models', etc.) to capture a ton of provenance data.
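The path-convention approach can be sketched as a small classifier over file paths. The directory names follow the convention mentioned above; the mapping and function names are my own illustration, not any particular platform's API:

```python
from pathlib import PurePosixPath

# Infer what kind of artifact a path refers to from the directory
# conventions described above (assumed mapping, for illustration).
KIND_BY_DIR = {
    "featurestore": "feature",
    "training": "training-data",
    "models": "model",
}

def classify(path: str) -> dict:
    """Return a minimal provenance record derived purely from the path."""
    for part in PurePosixPath(path).parts:
        if part in KIND_BY_DIR:
            return {"kind": KIND_BY_DIR[part], "path": path}
    return {"kind": "unknown", "path": path}

print(classify("/data/featurestore/clicks/v3.parquet")["kind"])  # feature
print(classify("/data/models/ranker/1/model.pb")["kind"])        # model
```

Combined with file-system event notifications, a rule like this lets you attribute reads and writes to pipeline stages without instrumenting the training code at all.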



