Intro to ML

WTM + Muse and Metrics Outreach

Posted on September 07, 2023 · 6 mins read

I had a blast co-hosting an Introduction to Machine Learning (ML) tutorial with Phillipa Burgess, host of the Muse and Metrics podcast and a fellow Women Techmakers Ambassador. In the tutorial, we begin by discussing what ML is and how it differs from traditional programming. Next, we delve into a typical ML workflow, which starts with data preparation. Once the data is ready, we train a model and evaluate its performance. Lastly, we discuss next steps and free resources to help you continue learning.


What is Machine Learning?

Traditional Programming

In traditional programming, you write explicit rules in code. Take a program that converts temperature from Celsius to Fahrenheit as an example. You input the temperature in Celsius, and the program uses a predefined formula to output the temperature in Fahrenheit.
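As a quick illustration of the traditional approach, here is a minimal sketch in Python. The function name is just a placeholder chosen for this example; the important part is that the rule (multiply by 9/5, then add 32) is written by hand.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Apply the explicit, hand-written conversion rule."""
    return celsius * 9 / 5 + 32


# The programmer supplied the formula; the program only applies it.
print(celsius_to_fahrenheit(100))  # 212.0
```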

Machine Learning

ML, however, works differently. Instead of giving the program a formula, you provide it with examples—pairs of input and output data. The algorithm then goes through the data to find patterns, learning to make predictions on its own. For example, you could provide a bunch of temperatures in Celsius along with their Fahrenheit equivalents, and the model would learn the conversion rule by itself.
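To make the contrast concrete, here is a rough sketch of learning the conversion from examples. It assumes the simplest possible model, a straight line fitted with ordinary least squares, which is my choice for illustration and not necessarily what the tutorial notebook uses. The example pairs and the `fit_line` helper are made up for this sketch.

```python
# Example pairs: Celsius inputs and their known Fahrenheit outputs.
celsius = [-40.0, -10.0, 0.0, 8.0, 15.0, 22.0, 38.0]
fahrenheit = [-40.0, 14.0, 32.0, 46.4, 59.0, 71.6, 100.4]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Notice that the formula never appears: it is estimated from the data.
slope, intercept = fit_line(celsius, fahrenheit)
print(f"learned rule: F = {slope:.2f} * C + {intercept:.2f}")
print(f"prediction for 100 C: {slope * 100 + intercept:.1f} F")
```

Because these examples follow the real formula exactly, the learned slope and intercept should land very close to 1.8 and 32. With noisy, real-world data the fit would only be approximate.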

This approach is powerful for complex tasks. Consider image recognition: it would be difficult to define explicit rules to determine if a photo contains a cat. But with ML, you can train a model on numerous examples of photos with and without cats, and the algorithm will figure out how to identify them.

Language processing benefits similarly from ML. Instead of feeding a program exhaustive lists of vocabulary and grammar rules, you can train a model on well-written text, allowing it to learn the language’s structure and usage patterns.

Data in Machine Learning

Understanding Your Data

Before we get started, it’s crucial to understand your data. Ask questions like:

  • Where does the data come from?
  • What type of data is it—images, numbers, text?
  • What questions can the data answer?

For this tutorial, we’ll use data about penguins from three specific islands in the Palmer Archipelago. The dataset contains tabular information like bill depth, flipper length, body mass, and sex for each penguin.

[Image: Data Explore]

Data Preparation

Preparing your data is often the most time-consuming part of an ML project. In our case, we’ll use features like bill depth and flipper length as input for our model. First, check for missing data and decide how to handle it, for example by dropping incomplete rows or filling in the gaps. The data also needs to be numerical for the algorithm to work with it. Finally, because this is a supervised learning model, every example needs an output label: the penguin’s species (Adelie, Gentoo, or Chinstrap), encoded numerically as 0, 1, or 2, respectively.
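Here is a small sketch of those preparation steps in plain Python. The rows below are hypothetical values made up for illustration (they are not taken from the real dataset), the column names are my own, and dropping incomplete rows is just one of several ways to handle missing data.

```python
# Hypothetical rows for illustration only, not real measurements.
rows = [
    {"bill_depth_mm": 18.7, "flipper_length_mm": 181.0, "species": "Adelie"},
    {"bill_depth_mm": None, "flipper_length_mm": 195.0, "species": "Adelie"},
    {"bill_depth_mm": 14.5, "flipper_length_mm": 215.0, "species": "Gentoo"},
    {"bill_depth_mm": 18.2, "flipper_length_mm": 196.0, "species": "Chinstrap"},
]

FEATURES = ["bill_depth_mm", "flipper_length_mm"]
SPECIES_TO_ID = {"Adelie": 0, "Gentoo": 1, "Chinstrap": 2}


def prepare(rows):
    """Drop rows with missing values, then return (features, labels)."""
    features, labels = [], []
    for row in rows:
        has_gap = row["species"] is None or any(
            row[name] is None for name in FEATURES
        )
        if has_gap:
            continue  # one simple policy; filling in values is another
        features.append([row[name] for name in FEATURES])
        labels.append(SPECIES_TO_ID[row["species"]])  # text label -> number
    return features, labels


X, y = prepare(rows)
print(X)
print(y)
```

The second row is skipped because its bill depth is missing, so three examples remain, each with a numeric label.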

Then, we split the data into a training set and a testing set, often using an 80-20 split. If you evaluate a model on the same data it was trained on, the results will look better than they really are, so it’s crucial to reserve a random sample that the model never sees during training.
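A minimal sketch of an 80-20 split using only the standard library is shown here. The `train_test_split` name and its parameters are placeholders for this example (ML libraries ship their own helpers with similar names), and the fixed seed is only there to make the shuffle repeatable.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the examples, then hold out a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


examples = list(range(10))  # stand-ins for 10 prepared penguin rows
train, test = train_test_split(examples)
print(len(train), len(test))  # 8 2
```

Shuffling before splitting matters: if the rows happen to be sorted by species or island, taking the last 20% without shuffling would give a test set that looks nothing like the training set.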

Models and Algorithms in Machine Learning

A Decision Tree is a simple yet effective algorithm for beginners. Think of it like a flowchart that makes decisions based on the input features. For example, if a penguin’s bill length is less than 44 mm, it’s most likely an Adelie. The model learns these rules during the training phase.

[Image: Decision Tree]
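To show what "learning a rule" can look like, here is a sketch of the smallest possible tree: a single question about bill length, sometimes called a decision stump. The measurements are hypothetical values made up for this example, and the helper names are my own. A real Decision Tree library would search over every feature and keep adding questions.

```python
# Hypothetical (bill_length_mm, species) pairs for illustration only.
data = [
    (36.5, "Adelie"), (38.9, "Adelie"), (41.1, "Adelie"), (42.7, "Adelie"),
    (45.2, "Gentoo"), (46.8, "Chinstrap"), (48.4, "Gentoo"), (50.1, "Chinstrap"),
]


def majority(labels):
    """Most common label; ties go to the alphabetically first label."""
    return max(sorted(set(labels)), key=labels.count)


def fit_stump(data):
    """Find the bill-length threshold that misclassifies the fewest examples."""
    values = sorted({value for value, _ in data})
    best = None
    # Candidate thresholds: midpoints between neighbouring distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [label for value, label in data if value < threshold]
        right = [label for value, label in data if value >= threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(label != left_label for label in left)
                  + sum(label != right_label for label in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]


def predict(stump, bill_length_mm):
    """Follow the one-question flowchart."""
    threshold, left_label, right_label = stump
    return left_label if bill_length_mm < threshold else right_label


stump = fit_stump(data)
print(stump)
print(predict(stump, 39.0))
```

On this toy data, the learned threshold falls between the longest Adelie bill and the shortest non-Adelie bill, close to the 44 mm rule mentioned above. One question cannot separate three species, which is why a full tree keeps asking follow-up questions about other features.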

A Decision Forest is a slightly more advanced algorithm that combines multiple Decision Trees for better accuracy. Each tree in the forest is trained on a different subset of the data, which generally makes the ensemble more reliable and accurate than a single tree.

[Image: Decision Forest]
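The forest idea can be sketched in the same style. This version assumes each tree is a one-question stump trained on a bootstrap sample (rows drawn at random with replacement), which is one common way of giving every tree a different subset; the tutorial's library may do this differently. The data is again hypothetical and the helper names are placeholders.

```python
import random
from collections import Counter

# Hypothetical (bill_length_mm, species) pairs for illustration only.
data = [
    (36.5, "Adelie"), (38.9, "Adelie"), (41.1, "Adelie"), (42.7, "Adelie"),
    (45.2, "Gentoo"), (46.1, "Gentoo"), (47.5, "Gentoo"), (49.0, "Gentoo"),
]


def majority(labels):
    """Most common label; ties go to the alphabetically first label."""
    counts = Counter(labels)
    return min(counts, key=lambda label: (-counts[label], label))


def fit_stump(sample):
    """One-question tree: the bill-length threshold with the fewest errors."""
    values = sorted({value for value, _ in sample})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [label for value, label in sample if value < threshold]
        right = [label for value, label in sample if value >= threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(label != left_label for label in left)
                  + sum(label != right_label for label in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # every sampled value was identical, so no split exists
        label = majority([label for _, label in sample])
        return (float("inf"), label, label)
    return best[1:]


def fit_forest(data, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(data, k=len(data))) for _ in range(n_trees)]


def predict_forest(forest, bill_length_mm):
    """Each stump votes; the forest returns the most common vote."""
    votes = [left if bill_length_mm < threshold else right
             for threshold, left, right in forest]
    return majority(votes)


forest = fit_forest(data)
print(predict_forest(forest, 38.0))
print(predict_forest(forest, 48.0))
```

Because every stump sees a slightly different sample, the stumps disagree a little about where the threshold sits. Taking a vote smooths out those differences, which is the intuition behind a forest being steadier than any single tree.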

The good news is that the mathematical and coding aspects of Decision Trees and Forests are mostly handled by existing libraries, letting you focus on the data and the problem at hand. As you delve deeper into ML, you’ll find that a strong foundation is crucial for understanding more complex algorithms and techniques.

Here are links to the Google Colab Notebook with detailed instructions and the YouTube video with the complete tutorial. This tutorial was adapted from the one developed for the 2022 Women in Machine Learning (WiML) Conference by Michelle Carney and Soo Sung from the TensorFlow team.

Learning Resources

The fun doesn’t have to stop here! There are a TON of free, online resources to help you learn ML and get started with Generative AI!

  • DeepLearning.ai: a series of 1-hour short courses for learning generative AI.
  • Introduction to Generative AI: earn badges while following a learning path with videos and exercises.
  • ML Crash Course: Google’s fast-paced, practical introduction to machine learning.
  • fast.ai: if you already have some coding background, this is a practical guide to dive into deep learning.
  • Kaggle Competitions: if you’re ready to dive in and start coding, check out the “competitions” on Kaggle! It’s a great way to apply what you’ve learned with a community of other learners.
  • SimpleML in Google Sheets: a no-code way to get started with ML.
  • Made with TFJS: a YouTube series that highlights awesome projects made for the web!