I’m not well versed in QA/sysadmin/logs, but surely metrics are vulnerable to Simpson’s paradox in a way that properly probed questions, answerable only with access to the entirety of the logs, are not?

If you average out metrics across all log files, you risk reaching false or, worse, inverted conclusions about multiple distinct subsets of the logs.
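To make the reversal concrete, here is a minimal sketch with made-up numbers. The log file names, build names, and counts are all hypothetical and chosen only to show the effect: the new build has a lower error rate within each log file, yet a higher error rate once the files are pooled, because most of its traffic landed in the heavier, more error-prone file.

```python
# Hypothetical (errors, requests) counts per log file and build.
# All names and numbers are invented for illustration.
counts = {
    "light_traffic.log": {"new_build": (6, 87), "old_build": (36, 270)},
    "heavy_traffic.log": {"new_build": (71, 263), "old_build": (25, 80)},
}


def error_rate(errors, requests):
    """Fraction of requests that ended in an error."""
    return errors / requests


# Error rate of each build within each log file (the subsets).
per_file = {
    name: {build: error_rate(*pair) for build, pair in builds.items()}
    for name, builds in counts.items()
}


def pooled_rate(build):
    """Error rate of a build after summing counts across all log files."""
    errors = sum(builds[build][0] for builds in counts.values())
    requests = sum(builds[build][1] for builds in counts.values())
    return error_rate(errors, requests)


pooled = {build: pooled_rate(build) for build in ("new_build", "old_build")}

for name, rates in per_file.items():
    print(f"{name}: new={rates['new_build']:.3f} old={rates['old_build']:.3f}")
print(f"pooled: new={pooled['new_build']:.3f} old={pooled['old_build']:.3f}")
```

With only the pooled metric you would conclude the new build is worse; with the per-file breakdown you would conclude the opposite. Which reading is right depends on why the traffic was split unevenly, and that is exactly the information an aggregate throws away.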

It’s part of the reason why statisticians are so pedantic about the wording of their conclusions and about which subpopulation those conclusions actually apply to.