Foundations of Probability that Every Data Scientist Should Know

Understanding Random Events: Customers and Your Application. Let’s say there is some probability that a user will click on a call-to-action (I’ll call it a CTA from here on out; this is any time you invite the reader to buy, shop, give an email, etc.) within your application. Once they have clicked the call-to-action …

Tired of Nested ifelse in Dplyr?

Using Mutate to Feature Engineer a New Categorical. One of the most helpful functions in dplyr is mutate; it allows you to create new variables, typically by layering some logic on top of the other variables in your dataset. Quick Example: Let’s say that you’re analyzing user data and you want to categorize users according to …

A Must-have Algorithm for Your Machine Learning Toolbox: XGBoost

One of the most performant machine learning algorithms. XGBoost is a supervised learning algorithm that can be used for both regression and classification. Like all algorithms, it has its virtues and drawbacks, which we’ll be sure to walk through. For this post, we’ll just be learning about XGBoost in the context of classification problems. …