R to Python: A Guide to Recreating dplyr’s Convenient Joins in Python

Photo by Alexander Grey. Introduction: If you are one of the many R users who are making the shift to Python, you may find yourself depending on the convenience of some of R’s most beloved libraries. On the surface, the jump from the convenience and simplicity of R can seem a bit daunting as the …

Leverage Anti-joins

Introduction: Assuming you already have some background with the more common types of joins (inner, left, right, and outer), adding semi-joins and anti-joins can prove incredibly useful, saving you what could otherwise have taken multiple steps. In a previous post, I outlined the benefits of semi-joins and how to use them. Here I’ll be …

Getting Started with Data Science

Introduction: Getting started in data science can be a bit overwhelming. You need to know statistics, programming, machine learning… Within each of those domains there are many, many subdomains that can dominate a person’s focus, and once they’re done reading everything there is to know about one thing, …

Don’t Miss The Bias-Variance Tradeoff Question in Your Next Interview

Why Do Interviewers Ask About It? Questions about the bias-variance tradeoff come up very frequently in interviews for data scientist positions. They often serve to distinguish a data scientist who is seasoned and knows their stuff from one who is junior… and, more specifically, one who is unfamiliar with their options for mitigating prediction …

Intro to Bayesian Statistics

Bayesian Statistics at the Heart of Data Science: Data science has deep roots in Bayesian statistics, and rather than giving the historical background of Thomas Bayes, I’ll give you a high-level perspective on Bayesian statistics, Bayes’ theorem, and how to leverage it as a tool in your work! Bayesian statistics are rooted in …

Data Visualization for Product Managers

A Few Rules of Thumb to Make You Dangerous: Chances are, if you’re reading this, you’re a product manager or in some way a contributor to a product team, and you would like to give yourself a leg up when it comes to understanding the data that is coming your way. I’m going to give …

Machine Learning, Simplified. Be a Part of the Conversation.

What’s All the Buzz About? Machine learning is a concept, and a frequently dropped buzzword, in today’s tech environment that leaves a lot to be desired as far as explanation goes. People often refer to machine learning algorithms as a black box, and while there may be certain aspects of machine learning that may lack …

A Must-have Algorithm for Your Machine Learning Toolbox: XGBoost

One of the Most Performant Machine Learning Algorithms: XGBoost is a supervised learning algorithm that can be used for both regression & classification. Like all algorithms, it has its virtues & drawbacks, which we’ll be sure to walk through. For this post, we’ll just be learning about XGBoost in the context of classification problems. …

Build Your First Chatbot in Three Minutes

30-Second Explanation of How Chatbots Work: Whether you’re a data scientist, data analyst, or software engineer, and whether or not you have a strong handle on NLP tools and approaches, if you’re here, you’ve likely wondered how a chatbot works and how to build one, but haven’t ever had the need or chance. Well… you’re here …

Getting Started with Experimental Design in R

This quick blog is designed to help you get off to the races quickly in the world of data science, and here specifically, experimental design. Enjoy! When it comes to experimental design, there are three main steps it can be broken down into: Planning, Design, and Analysis. Planning & Design: Planning should always begin with a well-formed hypothesis. …