Data Science Lessons – Home

Understand the Internet’s Relevance Metric in a Hurry: TF-IDF

With roots in the 1950s, TF-IDF is a cornerstone of modern applications that determine the relevance of each word in a document. At first glance, one could take the simple approach of looking at volume alone, i.e. “how many times did each term show up?” But TF-IDF takes us a big step further: not only…

Making Sense of Text in a Hurry: A Regular Expressions Primer

Whether you are brand new to regex and have text data you’d like to make sense of, or you have experience laboring over Stack Overflow questions, hoping to find the exact same use case without quite understanding the jumble of regex you’re putting into use, this introduction will prove a useful foundation as you…
