Introduction: Assuming you already have some background with the more common types of joins (inner, left, right, and outer), adding semi and anti joins can prove incredibly useful, saving you what could otherwise have taken multiple steps. In a previous post, I outlined the benefits of semi-joins and how to use them. Here I’ll be …

# Tag Archives: R

## Getting Started with Data Science

Introduction: Getting started in data science can be a bit overwhelming. You need to know statistics, programming, machine learning… Within each of those domains there are many, many subdomains that can dominate a person’s focus, and once they’re done reading everything there is to know about one thing, …

## Leverage Semi-joins in R

Introduction: Assuming you already have some background with the more common types of joins (inner, left, right, and outer), adding semi and anti joins can prove incredibly useful, saving you what could otherwise have taken multiple steps. In this post, I’ll be focusing on just semi-joins; with that said, there is a lot of overlap …

## Kmeans clustering

Introduction: Clustering is a machine learning technique that falls into the unsupervised learning category. Without going into a ton of detail on different machine learning categories, I’ll give a high-level description of unsupervised learning. To put it simply, rather than pre-determining what we want our algorithm to find, we provide the algorithm little to …

## Why Bias in Covid-19 Reporting Will Drive New Risks & Challenges

How incomplete information & bias are driving bad assumptions and inappropriate action: Right now the world is in pandemonium about the risks associated with Covid-19, most of which appear to be less about virus symptoms and more about the larger social implications of the panic. What are the current data limitations? Our information is currently …


## Don’t Miss The Bias-Variance Tradeoff Question in Your Next Interview

Why Do Interviewers Ask About It? Questions about the bias-variance tradeoff come up very frequently in interviews for data scientist positions. They often serve to distinguish a data scientist who is seasoned and knows their stuff from one who is junior… and, more specifically, one who is unfamiliar with their options for mitigating prediction …


## Become a Master of Data Wrangling in R

The dplyr package has a rich set of tools & functions that you can use for data wrangling, exploratory data analysis, feature engineering, and the like. In the next few minutes, we’ll run through the functions that are absolutely pivotal and that you’ll find yourself using every day as a data scientist. Select: Surface the …

## Intro to Bayesian Statistics

Bayesian Statistics at the Heart of Data Science: Data science has deep roots in Bayesian statistics, and rather than giving the historical background of Thomas Bayes, I’ll give you a high-level perspective on Bayesian statistics, Bayes’ theorem, and how to leverage it as a tool in your work! Bayesian statistics is rooted in …

## Introduction to Forecasting with ARIMA in R

What Makes ARIMA & XTS Objects So Useful for Forecasting. XTS Objects: If you’re not using XTS objects to perform your forecasting in R, then you are likely missing out! The major benefit that we’ll explore throughout is that these objects are a lot easier to work with when it comes to modeling, forecasting, & …


## Foundations of Probability that Every Data Scientist Should Know

Understanding Random Events. Customers and Your Application: Let’s say that there is some random likelihood that a user will click on a call-to-action (I’ll call it a CTA from here on out; this is any time you invite the reader to buy, shop, give an email, etc.) within your application. Once they have clicked the call-to-action …
