This is one of the worst explanations of probability and statistics I’ve seen. The distinction you draw between the two is incoherent, and your explanation of simple probability concepts is muddled. The entire article sounds like someone trying to explain something they know nothing about.
I've often heard about Bayesian statistics. Where does it fit in here?
Essentially, it's a way to update beliefs based on new data.
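To make "updating beliefs" concrete, here is a minimal sketch using a Beta prior on a coin's heads probability. The uniform prior and the 7 heads / 3 tails counts are made-up illustration values, and `update_beta` and `beta_mean` are hypothetical helper names, not anything from the post.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate Beta-Binomial update: add the observed counts to the prior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Assumed prior: Beta(1, 1), i.e. every heads probability is equally plausible.
prior = (1.0, 1.0)

# Assumed data: 7 heads and 3 tails.
posterior = update_beta(*prior, successes=7, failures=3)

prior_mean = beta_mean(*prior)          # belief before seeing data
posterior_mean = beta_mean(*posterior)  # belief after seeing data

print(f"prior mean: {prior_mean:.3f}, posterior mean: {posterior_mean:.3f}")
```

The posterior can serve as the prior for the next batch of data, which is the "update" part.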
Great post, Tivadar! Very clear and succinct.
In some cases, the exact same analysis is used in statistics and machine learning but the results are used differently. For example, a statistician might use a regression to make inferences about a population parameter, whereas a machine learning engineer could use the same analysis and the same data to make a prediction about a new data value.
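A rough sketch of that point, with one least-squares fit read two ways. The five data points are invented for illustration and `fit_line` is a hypothetical helper; the statistician looks at the slope and its standard error, the ML engineer at the prediction for an unseen input.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns the slope, the intercept, and the standard error of the slope.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    slope_se = math.sqrt(sigma2 / sxx)
    return slope, intercept, slope_se


# Made-up sample, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, slope_se = fit_line(xs, ys)

# Inference: what does the sample say about the population slope?
print(f"slope estimate: {slope:.3f} (standard error {slope_se:.3f})")

# Prediction: what output do we expect for a new input?
new_x = 6.0
prediction = intercept + slope * new_x
print(f"predicted y at x = {new_x}: {prediction:.3f}")
```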
Thanks! Yeah, I agree. Statistics is for explanation, machine learning is for prediction.
Great post!
"Or 4.921 ft, if you use the imperial metric system."
4'11" if you use the imperial system. And there is no such thing as the imperial metric system.
Imperial has a romantic beauty that shouldn't be sullied by easy computability.
;)
Did people use statistics to make predictions before the concept of machine learning existed?
Yes, of course.
The distinctions drawn here are not precise.
I prefer a different categorization. Prediction is part of statistics, but it is generally done within sample. ML is a branch of statistics that (a) always uses resampling or cross-validation (an invention of statistics) to demonstrate out-of-sample prediction, and (b) often tries to find the function mapping input to output automatically, through optimization and loss functions (common stats models also do this), without predefining as many model specifications or a probability distribution. Cross-validation guards against overfitting, so you can potentially use any complex model that does the job, which is what made the fields grow apart. "Stats is for explanation, ML is for prediction" is often the case in practice, but it is a false dichotomy: is a linear model with a low R² a better explanation than a feature-importance ranking from a neural network with a moderate or high R²?
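A small sketch of why the out-of-sample check matters for flexible models. It assumes a toy 1-nearest-neighbour predictor and synthetic noisy data (both chosen only for illustration; all function names are hypothetical). The model memorizes its training points, so the in-sample error looks perfect, while k-fold cross-validation shows the error on held-out points.

```python
import random


def one_nn_predict(train, x):
    """Predict y for x by copying the y of the closest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def mse(train, eval_points):
    """Mean squared error of the 1-NN model built on `train`."""
    errors = [(one_nn_predict(train, x) - y) ** 2 for x, y in eval_points]
    return sum(errors) / len(errors)


def k_fold_mse(data, k):
    """Average held-out error over k folds."""
    folds = [data[i::k] for i in range(k)]
    fold_errors = []
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        fold_errors.append(mse(train, held_out))
    return sum(fold_errors) / k


# Synthetic data: y = 2x plus Gaussian noise, with a fixed seed.
rng = random.Random(0)
data = [(i / 10, 2 * (i / 10) + rng.gauss(0, 1)) for i in range(50)]
rng.shuffle(data)

in_sample = mse(data, data)            # evaluated on the points it memorized
out_of_sample = k_fold_mse(data, k=5)  # evaluated on points it never saw

print(f"in-sample MSE: {in_sample:.3f}")
print(f"5-fold CV MSE: {out_of_sample:.3f}")
```

The in-sample number says nothing about how the model handles new inputs; the cross-validated number is the one that can be compared across models of different complexity.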