Whoever reads this - please, please, please ignore the posts that suggest just playing with numbers. That's the equivalent of telling someone who wants to learn how to code to copy-paste formulas into Excel. Just don't be that person.
To be very blunt, in 2020 most ML is still glorified statistics, except you lose the insights and explanations. The only tangible improvement is sometimes random forests. 99% of the stuff you can do with basic statistics. 99% of the coders I know don't know any statistics beyond the mean (and even with that, they do senseless things like taking means of means).
So learn statistics - basic statistics, like in the "for dummies" book series.
If you want to be a little more practical, stats "for dummies" is often found in disciplines that depend on stats but aren't very strong in math - biology, psychology, and economics are great candidates.
So just grab basic biology stats (to learn how to compare means - this gives you the A/B test superpower), then psychology's factor analysis (to learn PCA - this gives you the dimension reduction superpower), then basic econometrics regression (to learn linear regression).
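Just to make those three concrete, here's a rough sketch in Python (toy data I made up, nothing from any particular book; assumes scipy and scikit-learn are installed):

    import numpy as np
    from scipy import stats
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)

    # 1. Comparing means (the A/B test superpower): two-sample t-test
    a = rng.normal(0.10, 0.05, size=500)   # metric under variant A
    b = rng.normal(0.12, 0.05, size=500)   # metric under variant B
    t_stat, p_value = stats.ttest_ind(a, b)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

    # 2. Dimension reduction (PCA): project 10 correlated features down to 2
    X = rng.normal(size=(500, 10))
    X[:, 1] = X[:, 0] + rng.normal(scale=0.1, size=500)  # make two columns correlated
    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)
    print("explained variance ratios:", pca.explained_variance_ratio_)

    # 3. Linear regression: fit a model and look at its coefficients
    y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=500)
    reg = LinearRegression().fit(X, y)
    print("coefficients:", reg.coef_.round(2))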
With these 3 superpowers, you will be able to do more than most of the "machine learning" people. When you have mastered those, try stuff like random forests, and see if you still think it's as cool as it's hyped to be.
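If you want to check the random forest hype for yourself, a quick comparison like this is usually enough to form an opinion (a sketch, not a benchmark; scikit-learn's built-in breast cancer dataset is just a convenient stand-in for small tabular data):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Compare plain (scaled) logistic regression against a random forest
    # on a small tabular dataset; on data like this the gap is often modest.
    X, y = load_breast_cancer(return_X_y=True)

    logit = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    forest = RandomForestClassifier(n_estimators=200, random_state=0)

    print("logistic regression:", round(cross_val_score(logit, X, y, cv=5).mean(), 3))
    print("random forest:      ", round(cross_val_score(forest, X, y, cv=5).mean(), 3))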
Given that much of the data people run across is tabular, I appreciate your advice about the importance of statistics. Also, kudos for mentioning hypothesis testing (no one else in this thread mentioned it). Lastly, I'd add that ML practitioners gain a lot by listening to statisticians and economists on the issue of data quality, e.g. selection bias.
That said, I am not as cynical about "machine learning." ML and "data science" brought the importance of prediction front and center, i.e. can you fit a model that accurately predicts the target value for a previously unseen input? This point is made by the recently published stats textbook Computer Age Statistical Inference (Efron and Hastie).
In some applications, it may be beneficial to choose black box models with high predictive accuracy, as the goal for these applications is prediction, not interpreting individual model coefficients.
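To make that concrete, here's a minimal sketch (assuming scikit-learn; synthetic nonlinear data from make_friedman1, with gradient boosting standing in for a "black box"): both models are judged purely on held-out predictive accuracy, not on their coefficients.

    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    # When the goal is prediction, score models on a held-out test split
    # rather than by reading individual coefficients.
    X, y = make_friedman1(n_samples=2000, noise=1.0, random_state=0)  # nonlinear toy data
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    linear = LinearRegression().fit(X_train, y_train)
    boosted = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

    print("linear model, test R^2:     ", round(r2_score(y_test, linear.predict(X_test)), 3))
    print("gradient boosting, test R^2:", round(r2_score(y_test, boosted.predict(X_test)), 3))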