The dplyr package has a rich set of tools & functions that you can use for data wrangling, exploratory data analysis, feature engineering, and the like. In the next few minutes, we’ll run through the functions that are absolutely pivotal and that you’ll find yourself using every day as a data scientist. Select: Surface the …

# Author Archives: lessonsindatascience

## Intro to Bayesian Statistics

Bayesian Statistics at the Heart of Data Science: Data science has deep roots in Bayesian statistics. Rather than recounting the historical background of the Reverend Thomas Bayes, I’ll give you a high-level perspective on Bayesian statistics, Bayes’ theorem, and how to leverage it as a tool in your work! Bayesian statistics are rooted in …

## Introduction to Forecasting with ARIMA in R

What Makes ARIMA & XTS Objects So Useful for Forecasting. XTS Objects: If you’re not using XTS objects to perform your forecasting in R, then you are likely missing out! The major benefit that we’ll explore throughout is that these objects are a lot easier to work with when it comes to modeling, forecasting, & …

## Foundations of Probability that Every Data Scientist Should Know

Understanding Random Events. Customers and Your Application: Let’s say there is some probability that a user will click on a call-to-action within your application. (I’ll call it a CTA from here on out; this is any time you invite the reader to buy, shop, give an email, etc.) Once they have clicked the call-to-action …

## Tired of Nested ifelse in Dplyr?

Using Mutate to Feature Engineer a New Categorical. Among the most helpful functions in dplyr is mutate; it allows you to create new variables, typically by layering some logic on top of the other variables in your dataset. Quick Example: Let’s say that you’re analyzing user data and you want to categorize users according to …

## You’re Not a Data Scientist Until You Understand the Binomial Distribution

Inference at the Heart of Data Analysis. What is the point of inference? Inference is about drawing conclusions about a larger population from a sample of observed data. For example, you have a sample of the country’s opinion on the president and you’d like to draw some conclusions about the population at large. Obviously you …

## Data Visualization for Product Managers

A Few Rules of Thumb to Make You Dangerous. Chances are, if you’re reading this, you’re a product manager or in some way a contributor to a product team, and you’d like to give yourself a leg up when it comes to understanding the data that is coming your way. I’m going to give …

## Machine Learning, Simplified. Be Apart of the Conversation.

What’s all the buzz about? Machine learning is a frequently dropped buzzword in today’s tech environment, and the explanations that come with it leave a lot to be desired. People often refer to machine learning algorithms as black boxes, and while there may be certain aspects of machine learning that may lack …

## A Must-have Algorithm for Your Machine Learning Toolbox: XGBoost

One of the Most Performant Machine Learning Algorithms. XGBoost is a supervised learning algorithm that can be used for both regression & classification. Like all algorithms, it has its virtues & drawbacks, which we’ll be sure to walk through. For this post, we’ll just be learning about XGBoost in the context of classification problems. …

## What is Bootstrap Replication & How Do I Use it?

What is bootstrap replication? For those catching up here, bootstrap sampling refers to the process of sampling a given dataset ‘with replacement’, and this is where most people get lost. You take many such samples, compute your statistic on each one, and use the resulting distribution to mark your confidence interval. Let’s take a quick example. Crypto at College: Let’s say that you …
