
Most scripting languages aren't multithreaded, and some aren't pipeline oriented by default.

For example, working with file lines naively in Ruby means reading the whole lot into a giant array and doing transformations an array at a time, rather than in a streaming fashion.

The shell gives you fairly safe concurrency and streaming for free.

Personally, if it's a complex task, I generally write a tool such that it can be put into a shell pipeline.
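As a rough sketch of what "a tool that can be put into a shell pipeline" might look like in Ruby: read lines from standard input, write results to standard output, and never hold more than one line at a time. The method name `filter_lines` and the upcase-on-match behaviour are made up for illustration, not anything from the thread.

```ruby
# Hypothetical filter: write the upcased form of every line containing `pattern`.
# Works on any IO-like objects, so it streams one line at a time.
def filter_lines(input, output, pattern)
  input.each_line do |line|
    output.write(line.upcase) if line.include?(pattern)
  end
end

# Only act as a stdin/stdout filter when run directly with a pattern argument.
if __FILE__ == $PROGRAM_NAME && (pattern = ARGV.first)
  filter_lines($stdin, $stdout, pattern)
end
```

Saved under a hypothetical name such as upcase_matches.rb, it could sit in the middle of a pipeline, e.g. `cat app.log | ruby upcase_matches.rb error | sort | uniq -c`.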

Knowing the command line well - so that you don't often have to look up man pages for obscure flags / functionality - has its own rewards, as these commands turn into something you use all the time in the terminal. Rather than spending a few minutes developing a script in an editor, you can incrementally build a pipeline over a few seconds. Doing your script in a REPL is a better approximation, but it's a bit less immediate.



You don't have to read all of a file into memory in Ruby. There are a number of facilities for reading only a portion of a file at a time, readpartial[1] for example. You also have access to all of the native pipe[2] functionality. There are plenty of reasons to favor shell tools over Ruby, but these aren't among them.

[1]: http://www.ruby-doc.org/core-2.1.2/IO.html#method-i-readpart...

[2]: http://www.ruby-doc.org/core-2.1.2/IO.html#method-c-popen
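For what it's worth, here is a minimal sketch of both facilities. The helper names are invented for the example, and the second one assumes a Unix-like system with `sort` on the PATH.

```ruby
# Read a file in fixed-size chunks with IO#readpartial, which raises
# EOFError once the end of the file is reached.
def count_bytes_in_chunks(path, chunk_size = 4096)
  total = 0
  File.open(path, "rb") do |f|
    begin
      loop { total += f.readpartial(chunk_size).bytesize }
    rescue EOFError
      # End of file: nothing left to read.
    end
  end
  total
end

# Stream the output of an external command through a pipe with IO.popen.
# Assumes `sort` exists on this machine.
def each_sorted_line(path)
  IO.popen(["sort", path]) do |pipe|
    pipe.each_line { |line| yield line }
  end
end
```

Neither helper holds the whole file in a Ruby array; the first keeps one chunk at a time, the second one line at a time.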


Not the case with Ruby at all; if you're reading the whole file into memory, there's a good chance you're doing it wrong.

Check out yield and blocks.
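A small sketch of the idea, assuming a made-up helper `each_nonblank_line`: File.foreach hands lines to a block one at a time, and `yield` passes each one on to the caller, so nothing accumulates in memory.

```ruby
# Hypothetical helper: yield each non-blank line of a file, stripped of
# surrounding whitespace, one line at a time.
def each_nonblank_line(path)
  # Without a block, return an Enumerator so callers can chain on it.
  return enum_for(:each_nonblank_line, path) unless block_given?

  File.foreach(path) do |line|
    line = line.strip
    yield line unless line.empty?
  end
end
```

A caller would write something like `each_nonblank_line('foo.txt') { |line| puts line }`.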


The problem is that the most obvious way of doing it - File.readlines('foo.txt').map { ... }.select { ... } etc. - is not stream-oriented.


Arguably, it's trivial to make that stream-oriented:

    open('tmp.rb').each_line.lazy.map {...}.select {...}
The problem with processing big files in Ruby (in my humble experience) is usually that it's still slow enough that preprocessing with grep and uniq is worthwhile.


    > open('tmp.rb').each_line.lazy
    NoMethodError: undefined method `lazy' for #<Enumerator: #<File:Procfile>:each_line>
Not everybody is using Ruby 2.0.
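One possible workaround, sketched under the assumption that you want the same map-then-select shape on both old and new interpreters: use `lazy` when the enumerator responds to it, and otherwise do both steps inside a single block. The method name and the upcase/length steps are placeholders.

```ruby
# Hypothetical helper: upcase each line, keep those longer than min_length,
# and stop after `limit` results without reading the rest of the file.
def first_long_lines(path, min_length, limit)
  lines = File.foreach(path) # an Enumerator when no block is given

  if lines.respond_to?(:lazy)
    # Ruby 2.0+: the chain is evaluated one line at a time.
    lines.lazy
         .map { |l| l.chomp.upcase }
         .select { |l| l.length > min_length }
         .first(limit)
  else
    # Older Rubies: do the map and select steps by hand in one pass.
    out = []
    lines.each do |l|
      break if out.size >= limit
      mapped = l.chomp.upcase
      out << mapped if mapped.length > min_length
    end
    out
  end
end
```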


> if you're reading the whole file into memory, there's a good chance you're doing it wrong

GP: "working with file lines _naively_ in Ruby"


Ah, my bad; I read that as "natively" and chalked it up to odd wording.



