Git Essentials for a Data Scientist

Version Control 101: Version control is all about managing changes to files and directories made by one or many contributors. Git is an incredibly popular version control system and the one we will be running through in this course. There are many benefits to version control, and to Git specifically, including a view of historical changes …

Don’t Miss the Bias-Variance Tradeoff Question in Your Next Interview

Why Do Interviewers Ask About It? Questions about the bias-variance tradeoff come up very frequently in interviews for data scientist positions. They often serve to distinguish a seasoned data scientist who knows their stuff from a junior one… and, more specifically, from one who is unfamiliar with their options for mitigating prediction …

Become a Master of Data Wrangling in R

The dplyr package has a rich set of tools & functions that you can use for data wrangling, exploratory data analysis, feature engineering, and the like. In the next few minutes, we’ll run through the functions that are absolutely pivotal and that you’ll find yourself using every day as a data scientist. Select: Surface the …

Intro to Bayesian Statistics

Bayesian Statistics at the Heart of Data Science: Data science has deep roots in Bayesian statistics, and rather than giving the historical background of Thomas Bayes, I’ll give you a high-level perspective on Bayesian statistics, Bayes’ theorem, and how to leverage it as a tool in your work! Bayesian statistics are rooted in …
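For quick reference, here is the standard statement of Bayes’ theorem (a textbook formula, not an excerpt from the post). H stands for a hypothesis and D for observed data; both labels are chosen here for illustration:

$$
P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}
$$

Here P(H) is the prior, P(D | H) is the likelihood of the data under the hypothesis, and P(H | D) is the posterior: the updated belief after seeing the data.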

Introduction to Forecasting with ARIMA in R

What Makes ARIMA & XTS Objects So Useful for Forecasting? XTS Objects: If you’re not using XTS objects to perform your forecasting in R, then you are likely missing out! The major benefit we’ll explore throughout is that these objects are a lot easier to work with when it comes to modeling, forecasting, & …

Foundations of Probability that Every Data Scientist Should Know

Understanding Random Events: Customers and Your Application. Let’s say that there is some probability that a user will click on a call-to-action (I’ll call it a CTA from here on out, but this is any time you invite the reader to buy, shop, give an email, etc.) within your application. Once they have clicked the call-to-action …
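To make the CTA scenario concrete, here is a minimal Python sketch. It rests on a simplifying assumption that is ours, not the post’s: each user clicks independently with the same probability p. Under that assumption each visit is a Bernoulli trial, and the number of clicks among n users follows a binomial distribution. The function name `click_probability` and the numbers used are hypothetical.

```python
from math import comb


def click_probability(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n users click the CTA.

    Hypothetical helper. Assumes each user clicks independently with
    the same probability p (a Bernoulli trial), so the total number of
    clicks follows a binomial distribution.
    """
    if not 0 <= k <= n:
        return 0.0  # impossible click counts have zero probability
    # Number of ways to pick which k users click, times the probability
    # that those k click and the remaining n - k do not.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    # Hypothetical numbers: 10 visitors, each with a 20% chance to click.
    n, p = 10, 0.2
    for k in range(4):
        print(f"P({k} clicks) = {click_probability(k, n, p):.4f}")
    # Probabilities over every possible click count should sum to 1.
    total = sum(click_probability(k, n, p) for k in range(n + 1))
    print(f"Sum over k = 0..{n}: {total:.4f}")
```

In practice, the independence and constant-p assumptions are exactly what you would want to check against real user behavior.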

Tired of Nested ifelse in dplyr?

Using Mutate to Feature Engineer a New Categorical: Among the most helpful functions from dplyr is mutate; it allows you to create new variables, typically by layering some logic on top of the other variables in your dataset. Quick Example: Let’s say that you’re analyzing user data and you want to categorize users according to …

You’re Not a Data Scientist Until You Understand the Binomial Distribution

Inference at the Heart of Data Analysis: What is the point of inference? Inference is about drawing conclusions about a larger population from a sample of observed data. For example, say you have a sample of the country’s opinion of the president and you’d like to draw some conclusions about the population at large. Obviously you …
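One common way to turn a sample into a statement about the population is to estimate the population proportion from yes/no responses and attach a normal-approximation (Wald) confidence interval. This Python sketch is an illustration of that approach, not necessarily the method the post uses. The helper `proportion_interval` and the poll numbers are hypothetical.

```python
from math import sqrt


def proportion_interval(approvals: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Estimate a population proportion from n yes/no sample responses.

    Hypothetical helper. Returns (estimate, lower, upper) using the
    normal-approximation (Wald) interval; z = 1.96 gives roughly 95%
    confidence.
    """
    if n <= 0 or not 0 <= approvals <= n:
        raise ValueError("need n > 0 and 0 <= approvals <= n")
    p_hat = approvals / n                    # sample proportion
    se = sqrt(p_hat * (1 - p_hat) / n)       # estimated standard error
    return p_hat, p_hat - z * se, p_hat + z * se


if __name__ == "__main__":
    # Hypothetical poll: 520 of 1,000 respondents approve of the president.
    est, lo, hi = proportion_interval(520, 1000)
    print(f"estimate = {est:.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```

The Wald interval is the simplest choice. It becomes unreliable for small samples or for proportions near 0 or 1, where alternatives such as the Wilson interval are usually preferred.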

Data Visualization for Product Managers

A Few Rules of Thumb to Make You Dangerous: Chances are, if you’re reading this, you’re a product manager or in some way a contributor to a product team, and you’d like to give yourself a leg up when it comes to understanding the data coming your way. I’m going to give …

Machine Learning, Simplified. Be a Part of the Conversation.

What’s all the buzz about? Machine learning is a concept and frequently dropped buzzword in today’s tech environment, yet explanations of it leave a lot to be desired. People often refer to machine learning algorithms as a black box, and while there may be certain aspects of machine learning that may lack …