What’s all the buzz about?
Machine learning is a frequently dropped buzzword in today’s tech environment, yet it is rarely explained well. People often refer to machine learning algorithms as a black box, and while certain aspects of machine learning do lack transparency, a lot of the frustration and confusion can be mitigated with a bit of background on what machine learning really is.
Machine learning as a discipline is a lot simpler than you might think and can be broken down into a handful of component parts. Having this high-level perspective will allow you to understand and contribute to the conversation when machine learning comes up in the context of your business, project, or organization.
What Makes Up Machine Learning?
There are two major areas that machine learning falls into, and we’re going to break them down here. Most “ML” solutions you come across will fall into one of these two buckets: supervised and unsupervised learning.
The idea of a supervised machine learning problem is that you have a pre-defined dataset that contains what are called your dependent and independent variables. The dependent variable (also called the response variable) is what you want to understand or predict, and the independent variable(s) are what you use to make sense of it.
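To make the dependent/independent split concrete, here is a minimal sketch in plain Python. The records, column names, and numbers are all made up for illustration; the only point is that one column is singled out as the thing to predict and the rest are the inputs.

```python
# Hypothetical dataset: every field name and value here is invented.
records = [
    {"years_experience": 2, "years_education": 16, "income": 55_000},
    {"years_experience": 8, "years_education": 18, "income": 95_000},
    {"years_experience": 5, "years_education": 12, "income": 48_000},
]

TARGET = "income"  # the dependent (response) variable

# Independent variables: every column except the target.
X = [[value for name, value in row.items() if name != TARGET] for row in records]

# Dependent variable: the column we want to predict.
y = [row[TARGET] for row in records]

print(X)
print(y)
```

A supervised model is then anything that learns a mapping from the rows of `X` to the values in `y`.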
The world of supervised learning breaks into two more sub-classes: regression and classification. As I give examples of those, the entire picture of supervised learning should become clearer.
When we talk about regression as a sub-class of supervised learning, what we’re effectively talking about is the prediction of continuous variables. In other words, instead of predicting whether something is spam or not, whether it’s a cat or not, or whether it’s anything or not (i.e. classification), we’re predicting a number that can vary ‘continuously’.
Let’s use a couple of examples to bring it home.
Let’s say you wanted to predict individual income. You have a dataset that details each person’s experience, education, skills, and the like. From there you might use those variables to predict what someone is likely making.
Another classic example is housing prices. You want to know how much your house is worth in a given market, so you could use that market’s home sale data to create a model that predicts home value from data points like number of rooms, square feet, lot size, year built, and more.
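Here is a minimal sketch of the housing example, assuming the simplest possible setup: a single independent variable (square feet) and a straight-line fit by ordinary least squares. The sales figures are made up, and `fit_line` is a helper name I chose for this illustration; a real model would use many more features and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up home sales for one market (illustrative values only).
square_feet = [1000, 1500, 2000, 2500]
sale_price = [200_000, 275_000, 350_000, 425_000]

slope, intercept = fit_line(square_feet, sale_price)

# Predict a continuous number: the value of an 1,800 sq ft home.
predicted_price = slope * 1800 + intercept
print(round(predicted_price))  # prints 320000
```

The output is a number on a continuous scale rather than a label, which is what makes this regression.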
Hopefully that locks down regression & also tees up classification.
Remember the things I said were not regression? Those things are classification. We are still using supervised learning here, but it’s not to predict a continuous number.
We are trying to predict categories or classes. This introduces one more wrinkle: when it comes to classification, you can predict whether something does or does not belong to a single class (i.e. binary classification), or you can predict which of many classes a record belongs to (i.e. multi-class classification).
For binary classification, it could be that you want to predict whether an opportunity or deal will be won or lost.
For multi-class classification, you could predict a movie’s genre according to its rating, box-office, and budget.
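Both flavors can be sketched with the same few lines of code. The example below uses a nearest-centroid classifier, which I picked only because it is easy to read: it averages the training rows for each class and assigns a new row to the class with the closest average. All of the data, feature choices, and function names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_centroids(rows, labels):
    """Average the feature rows of each class into one centroid per class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: tuple(sum(col) / len(members) for col in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids, row):
    """Return the class whose centroid is closest to the row."""
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))


# Binary: made-up deals described by (meetings held, weeks in pipeline).
deals = [(5, 2), (6, 3), (4, 2), (1, 8), (2, 9), (1, 10)]
outcomes = ["won", "won", "won", "lost", "lost", "lost"]
deal_model = fit_centroids(deals, outcomes)
deal_guess = predict(deal_model, (5, 3))

# Multi-class: made-up movies described by
# (rating out of 10, box office in $100M, budget in $100M).
movies = [(6.5, 8.0, 2.0), (6.8, 9.0, 2.5), (8.2, 1.0, 0.3),
          (8.0, 1.5, 0.4), (5.5, 0.8, 0.1), (5.8, 1.0, 0.1)]
genres = ["action", "action", "drama", "drama", "horror", "horror"]
genre_model = fit_centroids(movies, genres)
genre_guess = predict(genre_model, (7.0, 7.5, 2.0))

print(deal_guess, genre_guess)  # prints: won action
```

Notice that nothing in the code changes between the two cases. The only difference is how many distinct labels show up in the training data.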
When it comes to unsupervised learning, rather than dictating to your model what you want to understand, you let the model identify natural patterns on its own.
Likely the most popular unsupervised learning approach is clustering. As with the regression example above, let’s say you’re looking at housing data, but rather than predicting a given variable, you want to understand the natural groupings of homes. Clustering would allow you to identify those groupings so that each home is as similar as possible to the others in its group and as dissimilar as possible to homes in the other groups.
With the housing data, you may specify that you care about square footage and the value of the home. You may then see that the natural groupings are indicators of house type, market, or age, among other things.
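As a rough sketch of what that looks like, here is a bare-bones version of k-means, one common clustering algorithm. The homes are made up, the two features are expressed in similar units so neither one dominates the distance calculation, and the starting centroids are simply the first k points to keep the example repeatable (real implementations usually pick starting points more carefully).

```python
import math


def kmeans(points, k, iterations=10):
    """Very small k-means: returns (centroids, cluster label per point)."""
    # Assumption for this sketch: start from the first k points.
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    labels = [
        min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points
    ]
    return centroids, labels


# Made-up homes: (square feet in thousands, value in $100k).
homes = [(1.0, 2.0), (1.2, 2.2), (0.9, 1.8),
         (3.0, 6.0), (3.2, 6.5), (2.9, 5.8)]

centroids, labels = kmeans(homes, k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

No labels were supplied anywhere. The algorithm was only told how many groups to look for, and it separated the smaller, cheaper homes from the larger, pricier ones on its own.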
There are also two other emergent and less prevalent categories that machine learning activities might fall under: reinforcement learning and semi-supervised learning. I won’t dive into either of those here, but if you found this helpful and would like more content like this, check out my blog at datasciencelessons.com.
Thanks for reading! Happy Data science-ing!