
Say I get 10 files of 3 GB each every week, which I'm supposed to filter on a certain column using a reference index and forward to a colleague. Before filtering I also want to check what the file looks like: column names, the first few records, etc.

I can use something like the following to explore a few rows and columns:

    $ awk '{print $1,$3,$5}' file | head -10

Then I can use something like sed with the reference index to filter the file. Since I plan to repeat this with different files, a database would be time-consuming (even if I automate loading every file and querying it). Given the file size, options like R or Python would be slower than Unix commands. I can also save the set of commands as a script and share or run it whenever I need it.
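For concreteness, the reference-index filter can also be done in a single awk pass instead of sed — a sketch with made-up file names (ids.txt as the index, data.tsv as the weekly file):

```shell
# Hypothetical files: ids.txt is the reference index (one ID per line),
# data.tsv is the weekly file; keep rows whose 2nd column is in the index.
printf 'a1\na3\n' > ids.txt
printf 'x\ta1\t1\ny\ta2\t2\nz\ta3\t3\n' > data.tsv

# NR==FNR is true only while reading the first file, so the index is
# loaded into a hash; the second pass filters the data file against it.
awk -F'\t' 'NR==FNR { keep[$1]; next } $2 in keep' ids.txt data.tsv > filtered.tsv
cat filtered.tsv
```

This streams the 3 GB file once and only holds the index in memory, which is why it tends to beat loading everything into a database for one-off runs.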

If there is a better way I would be happy to learn.



I think the gain you're seeing there is because it's quicker for you to do quick, dirty ad hoc work with the shell than it is to write custom Python for each file. That totally makes sense: the work's ad hoc, so use an ad hoc tool. Python being slow and grep being a marvel of optimization doesn't really matter here, compared to the dev time you're saving.


I have been doing Python for the last few years, but went back to Perl for this sort of thing recently, alongside the Unix commands mentioned. You can start with a one-liner and, if it gets complicated, just turn it into a proper script. It's just faster when you don't know what you're dealing with yet.
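A sketch of what that one-liner stage looks like (file name and column values are made up) — perl's -lane flags give you awk-style field splitting, and the same body can later be pasted into a full script:

```shell
# Hypothetical data: tab-separated, filter on the 2nd column.
printf 'x\ta1\t1\ny\ta2\t2\n' > data.tsv

# -n loops over input lines, -a autosplits each line into @F on
# whitespace (awk-like), -l handles trailing newlines.
perl -lane 'print if $F[1] eq "a2"' data.tsv > out.txt
cat out.txt
```

When the condition outgrows one line, the `print if ...` body moves into a script with `while (<>)` and nothing else about the workflow changes.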


For this kind of thing, it's easiest to bulk-load the files into SQLite and do your exploration and early analysis in SQL.
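A minimal sketch of that workflow with the sqlite3 CLI, using a hypothetical CSV — `.import` into a nonexistent table creates it from the header row:

```shell
# Hypothetical CSV with a header row.
printf 'id,val\na1,1\na2,2\n' > data.csv

# .import creates table t, taking column names from the first row.
rm -f demo.db
sqlite3 demo.db <<'SQL'
.mode csv
.import data.csv t
SQL

# Exploration then happens in plain SQL.
sqlite3 demo.db "SELECT val FROM t WHERE id = 'a2';"
```

The upside over pure pipelines is that the loaded data sticks around in the .db file, so follow-up questions don't re-read the 3 GB source.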



