Most scripting languages aren't multithreaded, and some aren't pipeline-oriented by default.
For example, working with file lines naively in Ruby means reading the whole lot into a giant array and doing transformations an array at a time, rather than in a streaming fashion.
The shell gives you fairly safe concurrency and streaming for free.
Personally, if it's a complex task, I generally write a tool such that it can be put into a shell pipeline.
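To make "a tool that can be put into a shell pipeline" concrete, here is a minimal sketch in Ruby. The name `filter_stream` and the idea of filtering by a regex are just assumptions for illustration, not anything from a real tool. The point is only that the tool reads its input one line at a time and writes as it goes, so it never holds the whole input in memory.

```ruby
# Hypothetical pipeline-friendly filter: copies lines matching `pattern`
# from `input` to `output`, one line at a time.
# `input` and `output` can be any IO-like objects ($stdin/$stdout, files, StringIO).
def filter_stream(input, output, pattern)
  input.each_line do |line|
    # Write immediately instead of collecting results in an array,
    # so downstream commands can start working right away.
    output.write(line) if line.match?(pattern)
  end
end

# As a command-line tool, the entry point would look something like:
#   filter_stream($stdin, $stdout, Regexp.new(ARGV.fetch(0)))
```

Saved as a script with that entry point enabled, it could sit in the middle of a pipeline like `cat access.log | ruby filter.rb ERROR | sort | uniq -c` (the file names here are made up).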
Knowing the command line well - so that you don't often have to look up man pages for obscure flags / functionality - has its own rewards, as these commands turn into something you use all the time in the terminal. Rather than spending a few minutes developing a script in an editor, you can incrementally build a pipeline over a few seconds. Doing your script in a REPL is a better approximation, but it's a bit less immediate.
You don't have to read all of a file into memory in Ruby. There are a number of facilities for reading only a portion of a file at a time, readpartial[1] for example. You also have access to all of the native pipe[2] functionality. There are plenty of reasons to favor shell tools over Ruby, but these aren't among them.
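As a rough sketch of what streaming reads look like in Ruby: `File.foreach` yields one line at a time, and `readpartial` returns at most a given number of bytes per call. The helper names `count_matching` and `byte_count_in_chunks` are made up for this example.

```ruby
# Count lines matching `pattern` without loading the file into an array.
# File.foreach without a block returns an enumerator; `.lazy` keeps the
# chain streaming so only one line is held at a time.
def count_matching(path, pattern)
  File.foreach(path).lazy.grep(pattern).count
end

# Walk a file in chunks of at most `chunk_size` bytes using readpartial,
# returning the total number of bytes seen.
def byte_count_in_chunks(path, chunk_size = 4096)
  total = 0
  File.open(path, "rb") do |f|
    begin
      loop { total += f.readpartial(chunk_size).bytesize }
    rescue EOFError
      # readpartial signals end of file by raising EOFError
    end
  end
  total
end
```

Compare with the naive `File.readlines(path).grep(pattern).size`, which builds the full array of lines first.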
The problem with processing big files with Ruby (in my humble experience) is usually that it's still slow enough that preprocessing with grep and uniq is worthwhile.