
I still don't understand why it's important to have all these areas of expertise embodied in a single person called a "data scientist". Rather than hire one of these rare and expensive people, why couldn't a business hire a statistician and a couple of computer science people and have them work as a team? Given how few data scientists there currently are and the high demand for them, you might even be able to get these three people for less money than one data scientist.

Also, someone who has to constantly shift their attention between statistics and database servers might get less done than somebody who can concentrate on the mathematics and let their co-workers handle the implementation details.



"Rather than hire one of these rare and expensive people, why couldn't a business hire a statistician and a couple of computer science people and have them work as a team?"

This paradigm shift is happening at intelligent companies. You hire a competent software developer who took stats 101 and knows how to Google some stuff, then have him oversee a pair of interns, one who has a math/stats degree and the other who has a physics/engineering degree. In six months you'll have a trained Data Science (tm) team.


The problem with this is that people's skills combine non-linearly, just as a cluster of 100 machines is not 100 times faster than a single machine. After all, why do we need kite surfers? Just tie a surfer and a kite jumper together.

OK, so let's say I'm the scientist working alongside a couple of computer science people. Now, every time I have to remove a comma from the files they exported for me, should I ask them to please write a magic line into the shell for me? Every time I have to parse data in JSON or some other format I know nothing about, do I wait for them to do it for me? Every time data has to be loaded from a database, do I explain what data I want, then wait for the computer science person to write and execute a query for me? It just doesn't work this way.
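To make the three chores above concrete, here is a minimal Python sketch using only the standard library. The file contents, the `measurements` table, and all function names are hypothetical and chosen purely for illustration; the point is only that each task is a few lines once you know the tool exists.

```python
import json
import sqlite3


def strip_trailing_commas(lines):
    """Remove the line ending and one run of trailing commas from each line."""
    return [line.rstrip().rstrip(",") for line in lines]


def parse_records(json_text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(json_text)


def load_rows(conn, min_value):
    """Run a parameterized query against the hypothetical measurements table."""
    cur = conn.execute(
        "SELECT name, value FROM measurements WHERE value >= ? ORDER BY value",
        (min_value,),
    )
    return cur.fetchall()


def demo():
    # 1. Clean up an exported file with stray trailing commas.
    cleaned = strip_trailing_commas(["a,b,c,\n", "1,2,3,\n"])

    # 2. Parse JSON handed over by a colleague.
    records = parse_records(
        '[{"name": "x", "value": 1.5}, {"name": "y", "value": 4.0}]'
    )

    # 3. Load the records into an in-memory database and query them.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (name TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?)",
        [(r["name"], r["value"]) for r in records],
    )
    rows = load_rows(conn, 2.0)
    conn.close()
    return cleaned, records, rows


if __name__ == "__main__":
    print(demo())
```

None of this is hard, which is rather the point: waiting on a colleague for each of these steps costs far more than learning them.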

When working with data you have to be able to experiment, and to experiment you have to have an idea of what's possible and what's not. If, as a scientist, you do not understand what these practical tools can do for you, your experimentation will be severely limited. You have to be able to pair the most promising mathematical approach with the simplest techniques that get you there. This is very hard to do unless the same person knows both the maths and the computer tools.

But more fundamentally, machine learning (or data science) does not equal statistics + computer science. The skillset you need to be a powerful part of the team is not simply a union of various computer science and statistics skills. It requires a different mindset.


They do, and will continue to. However, everyone needs to be able to talk and work together. It's like hiring a programmer to write genome sequencing software: they need to know some biology! In this case the stats people need to know some CS, the CS people need to know some stats, and both need to know some business.


Not only that, "data scientist" is just a marketing gimmick for consultants. It's a standard set of skills that anyone trained in science or engineering has. It's OK to use it in front of pointy-headed bosses, but in front of nerds it's slightly dishonest.


^this. I've often wondered where the science in data science is. I don't see it. If you're an engineer or scientist, you probably don't either.


The science in data science involves testing hypotheses, like anyone following the scientific method would do. It's mandatory for validating ML models.
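As one illustration of what hypothesis testing can look like when validating a model, here is a minimal sketch of a two-sided permutation test comparing two sets of evaluation scores. The scores, the function name, and the choice of a permutation test are my own assumptions for the example; other tests would serve equally well.

```python
import random


def permutation_test(scores_a, scores_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of mean scores.

    Null hypothesis: both score lists come from the same distribution,
    so the group labels are exchangeable.
    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    n_a = len(scores_a)
    observed = sum(scores_a) / n_a - sum(scores_b) / len(scores_b)

    pooled = list(scores_a) + list(scores_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign labels at random and recompute the difference.
        rng.shuffle(pooled)
        perm_a = pooled[:n_a]
        perm_b = pooled[n_a:]
        diff = sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b)
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical per-fold accuracies for two models.
    model_a = [0.90, 0.91, 0.92, 0.93, 0.94]
    model_b = [0.70, 0.71, 0.72, 0.73, 0.74]
    print(permutation_test(model_a, model_b))
```

A small p-value here says the gap between the two models would be unlikely if the labels were interchangeable, which is the kind of check a pretty graph alone does not give you.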


But most public examples of "data science" don't do this. They just publish pretty graphs that are completely unrigorous.


OK, but that doesn't mean "if you're an engineer or scientist, you don't see the science in data science" is a true statement. It's categorically false.



