How to Find Mean, Median, and Mode in Python: A Comprehensive Guide

Understanding mean, median, and mode is crucial for anyone getting started in data science. These metrics help you understand the typical value and distribution of your data.

In this comprehensive guide, you'll become an expert at calculating mean, median, and mode in Python. I'll share my insight as a data analyst to explain not just the code, but also when and why to use each technique.

By the end of this guide, you'll be able to:

  • Explain mean, median, and mode in plain English
  • Manually calculate mean, median, and mode in Python
  • Create reusable functions to find them
  • Use Pandas and NumPy for large datasets
  • Know when to use mean vs median vs mode

Let's get started! This tutorial has everything you need to become a pro at working with mean, median, and mode.

Mean, Median, and Mode – An Analyst's Explanation

The terms mean, median, and mode get thrown around a lot in statistics. But what do they actually mean?

Here's my explanation as a data analyst:

  • Mean – The average value of a dataset. Add up all the numbers and divide by how many numbers there are.

  • Median – The middle value when a dataset is sorted. Finds the "center" of a dataset.

  • Mode – The most frequently occurring value in a dataset. Tells you the most "popular" value.

The mean incorporates every value in a dataset. But it can be skewed by outliers.

The median looks at the middle and is not affected as much by outliers.

The mode reveals common or repetitive values even if they are not in the middle.

Each one provides a different perspective on your data!
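To make the outlier effect concrete, here is a minimal sketch using Python's standard statistics module. The salary figures are invented purely for illustration.

```python
from statistics import mean, median

# Hypothetical salaries in dollars, invented for this illustration
salaries = [42_000, 45_000, 47_000, 50_000, 52_000]
with_outlier = salaries + [400_000]  # add one extreme value

# Without the outlier, mean and median sit close together
print(mean(salaries), median(salaries))          # mean 47200, median 47000

# With the outlier, the mean more than doubles; the median barely moves
print(mean(with_outlier), median(with_outlier))  # mean 106000, median 48500
```

One extreme value drags the mean far from where most of the data sits, while the median shifts by only $1,500.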

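For example, let's look at home prices in a neighborhood:

Home Prices = [$198,400, $325,100, $279,950, $210,000, $262,800, $339,900]
  • Mean Price ≈ $269,358
  • Median Price = $271,375 (the average of the two middle values, $262,800 and $279,950)
  • Mode Price = No mode (all values differ)

Here the mean and median are close because no single home is far out of line with the rest. If one multi-million-dollar home were added, the mean would jump while the median would barely move, which is why the median usually gives the best sense of a "typical" home price.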

Next let's look at customer ratings for a product:

Ratings = [2, 5, 4, 2, 5, 3, 5, 2, 3, 5]
  • Mean Rating = 3.6
  • Median Rating = 3.5
  • Mode Rating = 5 (most common rating)

Here the mode shows the most popular rating was 5 stars even though the average is brought down by lower ratings.
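As a quick check of the ratings example, the same three figures can be computed with Python's standard statistics module. This is only a sketch to confirm the arithmetic; the functions themselves are covered later in this guide.

```python
from statistics import mean, median, mode

ratings = [2, 5, 4, 2, 5, 3, 5, 2, 3, 5]

print(mean(ratings))    # 3.6  (36 / 10)
print(median(ratings))  # 3.5  (average of the middle values 3 and 4)
print(mode(ratings))    # 5    (appears four times)
```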

Understanding these metrics provides valuable insight into your data! Now let's dive into calculating them in Python.

Calculating the Mean Manually in Python

Let's first cover how to calculate the mean manually without using any libraries.

Here are the steps:

  1. Sum all the values in the dataset
  2. Count the total number of values
  3. Divide the sum by the total count

For example, to find the mean of [2, 4, 6, 8, 10]:

values = [2, 4, 6, 8, 10]

# Sum of all values: 2 + 4 + 6 + 8 + 10 = 30
total = 2 + 4 + 6 + 8 + 10

# Total number of values
count = 5

# Calculate mean: 30 / 5 = 6.0
mean = total / count

Let's translate this into reusable Python code:

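def mean(values):
  # Add up the values one at a time
  # (named total so the built-in sum() is not shadowed)
  total = 0
  for num in values:
    total += num

  # Number of values in the dataset
  count = len(values)

  return total / count

print(mean([2, 4, 6, 8, 10])) # 6.0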

By encapsulating the steps into a function, we can reuse it to get the mean of any list of values.
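Once the manual loop makes sense, the same calculation can be written more compactly with Python's built-in sum(). The sketch below also adds a guard for empty input, which is my own addition: without it, an empty list would fail with a ZeroDivisionError.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list or tuple of numbers."""
    if not values:
        # Assumption: raising an error is the desired behavior for empty input
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)

print(mean([2, 4, 6, 8, 10]))  # 6.0
```

The `/` operator always returns a float in Python 3, so the result is 6.0 rather than 6.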

Manually calculating the mean gives you an appreciation for what's happening behind the scenes. Now let's move on to the median.

Finding the Median in Python

Calculating the median requires first sorting the dataset:

  1. Sort values in ascending order
  2. Find the middle index (or the two middle indices for even-length sets)
  3. The median is the value at the middle index, or the average of the two middle values

For example, to find the median of [2, 4, 6, 8, 10]:

values = [2, 4, 6, 8, 10] 

# Sort values
sorted_values = [2, 4, 6, 8, 10]  

# Middle index: 2
# Median = 6

For a dataset with an even number of values:

values2 = [2, 4, 6, 8, 10, 12]

sorted_values2 = [2, 4, 6, 8, 10, 12]

# Middle indices: 2, 3 (values 6 and 8)
# Median = (6 + 8) / 2 = 7

Let's make this into a function:

def median(values):
  # sorted() returns a new list, so the caller's data is left unchanged
  sorted_values = sorted(values)
  length = len(sorted_values)

  if length % 2 != 0:
    # Odd number - return middle value
    return sorted_values[length // 2]
  else:
    # Even number - return average of middle two
    mid1 = sorted_values[length // 2 - 1]
    mid2 = sorted_values[length // 2]
    return (mid1 + mid2) / 2

print(median([2, 4, 6, 8, 10])) # 6
print(median([2, 4, 6, 8, 10, 12])) # 7.0

This handles both odd and even dataset lengths to calculate the median.
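A simple way to gain confidence in a hand-written median is to compare it against the standard library on many random inputs. The sketch below restates the function so the block runs on its own; the sample sizes, value range, and seed are arbitrary choices of mine.

```python
import random
from statistics import median as stdlib_median

def median(values):
    """Hand-written median, mirroring the function above."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]                      # odd length: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even length: average of two

random.seed(0)  # fixed seed so the check is repeatable
for size in range(1, 50):
    sample = [random.randint(-100, 100) for _ in range(size)]
    assert median(sample) == stdlib_median(sample)

print("all checks passed")
```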

Now let's switch gears to calculating the mode.

Finding the Mode in Python

To find the mode, we need to know the frequency of each value:

  1. Create a frequency dictionary
  2. Increment count each time a value is seen
  3. Find the key with the maximum frequency

For example, to find the mode of [1, 3, 6, 3, 7, 3]:

values = [1, 3, 6, 3, 7, 3]

freqs = {}

for num in values:
  if num in freqs:
    freqs[num] += 1
  else:
    freqs[num] = 1

print(freqs) 
# {1: 1, 3: 3, 6: 1, 7: 1}

max_freq = max(freqs.values()) # 3

modes = [num for num in freqs if freqs[num] == max_freq] 
print(modes) # [3]

We can wrap this logic in a function:

def mode(values):

  freqs = {}

  for val in values:
    if val in freqs:
      freqs[val] += 1
    else:
      freqs[val] = 1

  max_freq = max(freqs.values()) 

  modes = [val for val, freq in freqs.items() if freq == max_freq]

  return modes 

print(mode([1, 3, 6, 3, 7, 3])) # [3]
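The frequency dictionary can also be built with collections.Counter from the standard library, which does the counting for us. This is an alternative sketch, not a replacement for the function above; returning an empty list for empty input is my own choice.

```python
from collections import Counter

def mode(values):
    """Return a list of the most frequent value(s) in values."""
    counts = Counter(values)  # maps each value to how often it occurs
    if not counts:
        return []  # assumption: empty input yields no modes
    max_freq = max(counts.values())
    return [val for val, freq in counts.items() if freq == max_freq]

print(mode([1, 3, 6, 3, 7, 3]))  # [3]
print(mode([1, 1, 2, 2, 3]))     # [1, 2]
```

Returning a list means ties are reported in full, as the second call shows.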

And there we have it – calculating mean, median, and mode from scratch in Python!

Using Python's statistics Module

Writing functions yourself is great for learning. But the statistics module in Python's standard library provides simple ready-made functions:

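from statistics import mean, median, mode

# Write prices without thousands separators:
# [198,400, ...] would be read as the separate numbers 198 and 400
prices = [198400, 325100, 279950]

mean_price = mean(prices)
median_price = median(prices)

print(mean_price)    # about 267816.67
print(median_price)  # 279950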

The statistics module is great for quick, one-off analysis on small datasets.
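The module's mode() returns a single value, so it helps to know what happens when several values tie. The sketch below assumes Python 3.8 or later, where multimode() is available and mode() returns the first of the tied values it encounters. The shirt sizes are invented for illustration.

```python
from statistics import mode, multimode

# Hypothetical shirt sizes; "M" and "L" both appear twice
sizes = ["M", "L", "M", "S", "L"]

print(mode(sizes))       # M  (first of the tied values, Python 3.8+)
print(multimode(sizes))  # ['M', 'L']  (every value with the top count)
```

If ties matter for your analysis, multimode() is the safer choice.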

For large datasets, Pandas and NumPy provide vectorized functions that are typically much faster:

import pandas as pd
import numpy as np

data = pd.DataFrame({
  "Price": np.random.normal(275000, 70000, 10000) 
})

mean_price = data["Price"].mean()
median_price = data["Price"].median() 

Pandas and NumPy are designed for speed and performance on big data.
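For completeness, here is a short sketch of the equivalent calls on small, known inputs, including the mode, which the snippet above does not cover. It assumes NumPy and Pandas are installed and reuses the home prices and ratings from earlier in this guide.

```python
import numpy as np
import pandas as pd

prices = np.array([198400, 325100, 279950, 210000, 262800, 339900])
print(np.mean(prices))    # about 269358.33
print(np.median(prices))  # 271375.0

ratings = pd.Series([2, 5, 4, 2, 5, 3, 5, 2, 3, 5])
# Series.mode() returns a Series because several values can tie
print(ratings.mode().tolist())  # [5]
```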

So in summary:

  • statistics – Simple analysis on small data
  • Pandas/NumPy – Optimized for large datasets

Choose the best tool for the job!

When Should You Use Mean, Median, or Mode?

Now that you know how to calculate mean, median, and mode, when should you use each one?

Here are my tips as an analyst:

  • Mean – Use for quantitative data like prices. But beware of outliers skewing the mean.

  • Median – Use for ordinal data like ratings. Gives you the "middle" value.

  • Mode – Use for categorical data like product or brand names. Finds the most popular category.

The distribution of your data should guide which measure of central tendency you use:

  • Symmetric, normal data – Mean
  • Skewed data – Median
  • Categorical data – Mode

For example, for customer satisfaction scores, the median gives you a good sense of "average" rating unaffected by very high or low scores.

For product prices, the mean lets you analyze pricing trends even though no product costs exactly the mean price.

And for customer demographics, the mode reveals common categories like age ranges or geographic regions.
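As a rough sketch of these choices in code, the standard statistics module works on ordinal and categorical data too. The survey data below is invented for illustration. I use median_low() here, which is my own choice: it always returns a value that appears in the data rather than an average like 3.5.

```python
from statistics import median_low, mode

# Hypothetical survey data, invented for this illustration
satisfaction = [1, 2, 2, 3, 4, 4, 5, 5]               # ordinal scores, 1 to 5
regions = ["West", "South", "West", "East", "West"]   # categorical labels

print(median_low(satisfaction))  # 3  (lower of the two middle values, 3 and 4)
print(mode(regions))             # West
```

A mean of the region labels would be meaningless, which is why the mode is the natural summary for categorical data.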

Evaluate your data and choose the best metric for the job!

Let's Recap…

In this comprehensive guide you learned:

  • Mean – The average: the sum of the values divided by the count

  • Median – The middle value: sort the data, then take the middle index (or average the two middle values)

  • Mode – The most frequent value: count occurrences and take the highest

  • How to manually calculate mean, median, mode in Python

  • Creating custom functions for reuse

  • Using Python‘s statistics module for convenience

  • Optimized NumPy/Pandas functions for large data

  • When to use mean vs median vs mode based on data types

You're now an expert at working with these fundamental statistical concepts in Python!

Calculating mean, median, and mode provides valuable insight into the typical value and distribution of your data. Apply these essential skills to explore and understand any dataset you encounter.

Thanks for reading! Please let me know if you have any other questions.

Written by Alexis Kestler