Seaborn

Seaborn is a powerful Python library for statistical data visualization, built on top of Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics with minimal code. This tutorial provides a comprehensive guide to using Seaborn for data exploration and analysis.


Introduction and Philosophy

Seaborn is designed to make visualization a central part of exploring and understanding data. Its core philosophy is “dataset-oriented,” meaning its plotting functions operate on entire DataFrames and arrays, not just individual vectors. This approach allows you to focus on the meaning of the variables in your plot, rather than the mechanics of how to draw them.

Key principles of Seaborn:

  • Dataset-oriented: Work directly with DataFrames and named variables
  • Semantic mapping: Automatically translate data values into visual properties (colors, sizes, styles)
  • Statistical awareness: Built-in aggregation, error estimation, and confidence intervals
  • Aesthetic defaults: Publication-ready themes and color palettes out of the box
  • Matplotlib integration: Full compatibility with Matplotlib customization when needed

Behind the scenes, Seaborn uses Matplotlib to draw its plots, but it simplifies the process significantly by handling the translation from values in the DataFrame to arguments that Matplotlib understands.


Installation and Setup

Installing Seaborn

pip install seaborn

Or with conda:

conda install seaborn

Import Conventions

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Apply the default theme once for all plots
sns.set_theme()

Setting Up a Project-wide Theme

A consistent visual baseline makes your plots look professional and ensures readability across different contexts.

sns.set_theme(
    context="talk",      # text size scaling: paper, notebook, talk, poster
    style="whitegrid",   # clean background with light grid
    palette="deep"       # default categorical palette; "colorblind" is the colorblind-safe variant
)

Why these defaults matter:

  • context="talk" gives readable axis labels and titles for slides and reports
  • style="whitegrid" improves value reading for line and bar plots without heavy visual noise
  • palette="deep" provides distinct category colors that hold up when printed or projected

Understanding Figure-level vs. Axes-level Functions

This is a crucial concept in Seaborn. All plotting functions fall into one of two categories:

Axes-level functions draw onto a single Matplotlib Axes object:

  • Examples: scatterplot, histplot, boxplot, regplot, heatmap
  • Return the Axes object
  • Accept an ax= parameter for precise placement in complex figures
  • Can be combined with other plots in a larger Matplotlib figure

Figure-level functions manage an entire figure, usually with a FacetGrid:

  • Examples: relplot, displot, catplot, lmplot
  • Return a FacetGrid object
  • Do not accept an ax= parameter (they “own” their figure)
  • Support easy faceting with col and row parameters
  • Place legends outside the plot by default

When to use which:

  • Use figure-level functions for quick exploratory analysis and when you need faceting
  • Use axes-level functions when building complex, custom Matplotlib figures with multiple plot types
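
The difference shows up directly in what each kind of function returns. This is a minimal sketch on a small synthetic DataFrame; the column names `x`, `y`, and `group` are made up for illustration so the example runs without downloading a dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data, not one of Seaborn's bundled datasets
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
    "group": np.repeat(["a", "b", "c"], 20),
})

# Axes-level: draws onto the Axes we pass in and returns that same Axes
fig, ax = plt.subplots()
returned_ax = sns.scatterplot(data=df, x="x", y="y", ax=ax)

# Figure-level: creates its own figure and returns a FacetGrid
g = sns.relplot(data=df, x="x", y="y", col="group")

print(type(returned_ax).__name__, type(g).__name__)
n_facets = g.axes.size  # one subplot per level of "group"
```

Because the figure-level call owns its figure, customization goes through the returned grid (for example `g.set_axis_labels(...)`) rather than through an `ax=` argument.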

Color Palettes

Choosing the right color palette is critical for effective data visualization. Seaborn provides extensive tools for creating and applying palettes.

Qualitative Palettes

For categorical data where order doesn’t matter:

# Default palettes
sns.color_palette("deep", n_colors=5)
sns.color_palette("pastel", n_colors=5)
sns.color_palette("dark", n_colors=5)
sns.color_palette("bright", n_colors=5)

# Set a qualitative palette globally
sns.set_palette("Set2")
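
A palette returned by `color_palette()` behaves like a list of RGB tuples, so you can inspect it or pin specific colors to specific categories. The sketch below shows one way to do that; the `color_map` dictionary is an illustrative name, and the category labels are the three species in the penguins dataset.

```python
import seaborn as sns

# A palette is list-like: one (r, g, b) tuple per color
palette = sns.color_palette("deep", n_colors=5)
hex_codes = palette.as_hex()  # the same colors as "#rrggbb" strings
print(hex_codes)

# Pin a color to each category so it stays the same across plots,
# even when a subset of the data is missing one of the categories
species = ["Adelie", "Chinstrap", "Gentoo"]
color_map = dict(zip(species, sns.color_palette("Set2", n_colors=3)))
```

A dictionary like this can be passed as `palette=color_map` to any function that takes a `hue` variable.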

Sequential Palettes

For continuous data from low to high:

# Built-in sequential palettes
sns.color_palette("viridis", as_cmap=True)  # Perceptually uniform
sns.color_palette("rocket", as_cmap=True)
sns.color_palette("crest", as_cmap=True)

# Blend between two colors
blend = sns.blend_palette(["#0F766E", "#60A5FA"], n_colors=7)
blend_cmap = sns.blend_palette(["#0F766E", "#60A5FA"], as_cmap=True)

Diverging Palettes

For data with a meaningful midpoint (e.g., correlations):

sns.color_palette("coolwarm", as_cmap=True)
sns.color_palette("RdBu", as_cmap=True)

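# Build a custom diverging colormap from two hues.
# color_palette() has no center argument; centering on a data value
# is done at plot time, e.g. sns.heatmap(..., center=0)
sns.diverging_palette(220, 20, as_cmap=True)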

Cubehelix Palettes

For ordered categories or low-ink plots that stay readable in grayscale:

cube = sns.cubehelix_palette(
    start=0.5,    # hue start
    rot=-0.75,    # hue rotation
    gamma=1.0,    # intensity curve
    light=0.95,
    dark=0.15,
    n_colors=6
)

Quick Palette Preview

sns.palplot(sns.color_palette("viridis", n_colors=8))
plt.show()

Figure Sizing and Export

Control size and resolution from the start to avoid fuzzy labels or cramped axes.

# Global defaults via Matplotlib rcParams
plt.rcParams["figure.figsize"] = (8, 5)    # width, height in inches
plt.rcParams["figure.dpi"] = 150           # on-screen clarity

# Export high-quality outputs
plt.savefig("figure.png", dpi=300, bbox_inches="tight")
plt.savefig("figure.svg", bbox_inches="tight")  # Vector format
plt.savefig("figure.pdf", bbox_inches="tight")
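
In scripts it is often safer to save through the figure or grid object than through `plt.savefig`, which acts on whichever figure is current. The sketch below uses synthetic data and writes to a temporary directory; the file names and column names are placeholders.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=50),
    "y": rng.normal(size=50),
    "group": np.repeat(["a", "b"], 25),
})
out_dir = tempfile.mkdtemp()  # placeholder output location

# Axes-level plot: save through the Figure that owns the Axes
fig, ax = plt.subplots(figsize=(8, 5))
sns.scatterplot(data=df, x="x", y="y", ax=ax)
png_path = os.path.join(out_dir, "figure.png")
fig.savefig(png_path, dpi=300, bbox_inches="tight")

# Figure-level plot: the returned grid has its own savefig method
g = sns.relplot(data=df, x="x", y="y", col="group")
svg_path = os.path.join(out_dir, "facets.svg")
g.savefig(svg_path)

plt.close("all")  # free the figures once they are written
```

Save before calling `plt.show()`: with some backends the figure is emptied once the window is closed.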

Relational Plots

Relational plots show relationships between numeric variables and can be enhanced by mapping additional variables to hue, size, and style.

scatterplot

The scatter plot is a mainstay of statistical visualization, depicting the joint distribution of two variables.

import seaborn as sns
import matplotlib.pyplot as plt

penguins = sns.load_dataset("penguins").dropna()

# Basic scatter
sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm")
plt.show()

# With semantic mappings
sns.scatterplot(
    data=penguins,
    x="bill_length_mm",
    y="bill_depth_mm",
    hue="species",          # categorical color mapping
    style="sex",            # marker style
    size="body_mass_g",     # point size
    sizes=(30, 160),        # min and max point sizes
    alpha=0.8,
    edgecolor="w",
    linewidth=0.5
)
plt.legend(bbox_to_anchor=(1.02, 1), loc="upper left")  # legend covers species, sex, and body mass
plt.show()

When the hue variable is numeric, Seaborn switches to a sequential color palette:

sns.scatterplot(
    data=penguins,
    x="bill_length_mm", 
    y="bill_depth_mm",
    hue="body_mass_g",
    palette="viridis"
)
plt.show()

lineplot

For data with a continuous variable like time, line plots show trends and include automatic aggregation and confidence intervals.

fmri = sns.load_dataset("fmri")

# Basic line plot with confidence intervals (default: 95% CI)
sns.lineplot(data=fmri, x="timepoint", y="signal")
plt.show()

# With hue semantic
sns.lineplot(
    data=fmri,
    x="timepoint",
    y="signal",
    hue="event",
    style="event",
    markers=True,
    dashes=False
)
plt.show()

# Control uncertainty representation
sns.lineplot(
    data=fmri,
    x="timepoint",
    y="signal",
    hue="region",
    errorbar="sd",          # show standard deviation instead of CI (ci="sd" before Seaborn 0.12)
    estimator="mean"
)
plt.show()

lineplot() automatically aggregates multiple observations at each x value, plotting the mean and a bootstrapped 95% confidence interval by default. For large datasets, you can disable the bootstrap with errorbar=None (ci=None in versions before 0.12) for better performance.

relplot: Figure-level Relational Plotting

relplot() provides a figure-level interface that supports faceting.

# Faceted scatter plot
sns.relplot(
    data=penguins,
    x="bill_length_mm",
    y="bill_depth_mm",
    hue="species",
    col="sex",              # facet by sex
    kind="scatter"
)
plt.show()

# Faceted line plot
sns.relplot(
    data=fmri,
    x="timepoint",
    y="signal",
    hue="event",
    col="region",
    kind="line",
    height=4,
    aspect=1
)
plt.show()

Key parameters for relational plots:

  • x, y: Primary variables
  • hue: Color encoding for additional categorical/continuous variable
  • size: Point/line size encoding
  • style: Marker/line style encoding
  • col, row: Facet into multiple subplots (figure-level only)

Distribution Plots

Distribution plots help you understand the spread, shape, and probability density of data.

histplot

Histograms show frequency distributions with flexible binning.

# Univariate histogram
sns.histplot(data=penguins, x="flipper_length_mm")
plt.show()

# With hue and stacking
sns.histplot(
    data=penguins,
    x="flipper_length_mm",
    hue="species",
    multiple="stack",        # "layer", "stack", "dodge", "fill"
    stat="density"           # "count", "density", "probability"
)
plt.show()

# Bivariate histogram
sns.histplot(
    data=penguins,
    x="flipper_length_mm",
    y="bill_length_mm"
)
plt.show()

kdeplot

Kernel Density Estimation provides a smooth estimate of the probability density function.

# Univariate KDE
sns.kdeplot(data=penguins, x="flipper_length_mm", hue="species", fill=True)
plt.show()

# Bivariate KDE with contours
sns.kdeplot(
    data=penguins,
    x="flipper_length_mm",
    y="bill_length_mm",
    fill=True,
    levels=5,
    thresh=0.1
)
plt.show()

ecdfplot

Empirical Cumulative Distribution Functions require no parameter tuning and are excellent for comparing distributions.

sns.ecdfplot(
    data=penguins,
    x="flipper_length_mm",
    hue="species"
)
plt.show()

rugplot

rugplot adds a tick mark at each observation along the axis, usually as a complement to another distribution plot.

sns.rugplot(data=penguins, x="flipper_length_mm", hue="species")
plt.show()

displot: Figure-level Distribution Plotting

displot() provides a unified interface for distribution plots with faceting support.

# Histogram with faceting
sns.displot(
    data=penguins,
    x="flipper_length_mm",
    hue="species",
    col="sex",
    kind="hist"              # "hist", "kde", or "ecdf"
)
plt.show()

# KDE with faceting
sns.displot(
    data=penguins,
    x="flipper_length_mm",
    hue="species",
    col="sex",
    kind="kde",
    height=4,
    aspect=0.7
)
plt.show()

jointplot

Combines a bivariate plot with univariate marginal distributions.

sns.jointplot(
    data=penguins,
    x="flipper_length_mm",
    y="bill_length_mm",
    hue="species",
    kind="scatter"           # "scatter", "hist", "kde", "reg"
)
plt.show()

pairplot

Visualizes every pairwise combination of the numeric variables in a single grid.

sns.pairplot(
    data=penguins,
    hue="species",
    corner=True              # Only show lower triangle
)
plt.show()

Categorical Plots

Categorical plots compare distributions or statistics across discrete categories.

Categorical Scatterplots

  • stripplot: Points with jitter to show all observations
  • swarmplot: Non-overlapping points using a beeswarm algorithm

tips = sns.load_dataset("tips")

# Strip plot with jitter
sns.stripplot(data=tips, x="day", y="total_bill", hue="sex", dodge=True)
plt.show()

# Swarm plot - points don't overlap
sns.swarmplot(data=tips, x="day", y="total_bill", hue="sex", dodge=True)
plt.show()

Distribution Comparisons

  • boxplot: Quartiles and outliers
  • violinplot: KDE + quartile information
  • boxenplot: Enhanced boxplot for larger datasets

# Boxplot with controlled ordering
sns.boxplot(
    data=tips,
    x="day",
    y="total_bill",
    hue="sex",
    order=["Thur", "Fri", "Sat", "Sun"],
    dodge=True,
    showfliers=False          # Hide outliers for cleaner view
)
plt.show()

# Violin plot with quartiles
sns.violinplot(
    data=tips,
    x="day",
    y="total_bill",
    hue="sex",
    order=["Thur", "Fri", "Sat", "Sun"],
    split=True,               # Split hue levels side by side
    inner="quartile",         # "quartile", "box", "stick", "point"
    cut=0                     # Don't extend past observed data
)
plt.show()

Statistical Estimates

  • barplot: Mean/aggregate with confidence intervals
  • pointplot: Point estimates with connecting lines
  • countplot: Count of observations per category

# Bar plot with percentile intervals for skewed data
sns.barplot(
    data=tips,
    x="day",
    y="tip",
    hue="sex",
    estimator=np.mean,
    errorbar=("pi", 95),      # Percentile interval for skewed data
    dodge=True
)
plt.show()

# Point plot for trend visualization
sns.pointplot(
    data=tips,
    x="day",
    y="tip",
    hue="sex",
    dodge=True
)
plt.show()

catplot: Figure-level Categorical Plotting

catplot() provides a unified interface with faceting support.

sns.catplot(
    data=tips,
    x="day",
    y="total_bill",
    hue="sex",
    col="time",               # Facet by meal time
    kind="violin",            # "strip", "swarm", "box", "violin", "bar", "point"
    height=4
)
plt.show()

Regression Plots

Regression plots visualize the relationship between two variables by fitting a regression model and drawing it over the data.

regplot

Axes-level regression plot with scatter + fit line.

# Simple linear regression
sns.regplot(data=tips, x="total_bill", y="tip")
plt.show()

# Polynomial regression
sns.regplot(data=tips, x="total_bill", y="tip", order=2)
plt.show()

# Robust regression (less sensitive to outliers)
sns.regplot(data=tips, x="total_bill", y="tip", robust=True)
plt.show()

lmplot

Figure-level regression with faceting support.

sns.lmplot(
    data=tips,
    x="total_bill",
    y="tip",
    col="time",
    hue="smoker",
    ci=95,
    height=4
)
plt.show()

residplot

Check residuals for assessing model fit.

sns.residplot(data=tips, x="total_bill", y="tip")
plt.show()
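
A residual plot shows what is left after subtracting a fitted line from the data. The sketch below reproduces that computation with `np.polyfit` on synthetic data; the slope, intercept, and noise level are arbitrary choices for illustration. It then draws the residual plot for the same data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic linear relationship with noise
rng = np.random.default_rng(4)
x = rng.uniform(5, 50, size=200)
y = 1.0 + 0.15 * x + rng.normal(scale=1.0, size=200)

# Ordinary least squares fit of a straight line
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)
print(slope, residuals.mean())

# residplot fits a line to the same data and plots the residuals against x
fig, ax = plt.subplots()
sns.residplot(x=x, y=y, ax=ax)
```

If the linear model is adequate, the points scatter evenly around zero. Curvature or a funnel shape suggests trying `order=2` or a transformation.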

Matrix Plots

Matrix plots visualize rectangular data like correlation matrices.

heatmap

Color-encoded matrices with optional annotations.

# Correlation heatmap
corr = penguins.select_dtypes(include=['number']).corr()
sns.heatmap(
    corr,
    annot=True,
    fmt='.2f',
    cmap='coolwarm',
    center=0,
    square=True,
    linewidths=0.5
)
plt.show()
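
A correlation matrix is symmetric, so half of the cells are redundant. One common refinement, sketched here on synthetic data with made-up column names, is to hide the upper triangle with the `mask` argument of `heatmap`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic numeric data standing in for a real table
rng = np.random.default_rng(5)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a", "b", "c", "d"])
corr = df.corr()

# True marks cells to hide: everything strictly above the diagonal
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

fig, ax = plt.subplots()
sns.heatmap(
    corr,
    mask=mask,
    annot=True,
    fmt=".2f",
    cmap="coolwarm",
    vmin=-1, vmax=1,  # fix the color scale to the full correlation range
    center=0,
    square=True,
    ax=ax,
)
```

Fixing `vmin` and `vmax` keeps colors comparable between heatmaps of different datasets.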

clustermap

Hierarchically-clustered heatmap.

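# Cluster the numeric penguin measurements (requires SciPy; rows with NaN must be dropped first)
numeric = penguins.select_dtypes(include="number")
sns.clustermap(
    numeric,
    cmap='viridis',
    standard_scale=1,      # Rescale each column to the 0-1 range
    figsize=(10, 10)
)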
plt.show()

Multi-plot Grids

For complex multi-panel figures, Seaborn provides grid objects.

FacetGrid

Create subplots based on categorical variables.

g = sns.FacetGrid(
    tips,
    col='time',
    row='sex',
    hue='smoker',
    height=3,
    aspect=1.2
)
g.map(sns.scatterplot, 'total_bill', 'tip')
g.add_legend()
plt.show()

# Map a plotting function onto each facet
g = sns.FacetGrid(tips, col='day', height=3)
g.map(sns.histplot, 'total_bill')
g.set_axis_labels('Total Bill', 'Count')
plt.show()

PairGrid

Show pairwise relationships with customizable upper/lower/diagonal plots.

# No corner=True here: it hides the upper triangle, so map_upper would draw nothing
g = sns.PairGrid(
    penguins,
    hue='species'
)
g.map_upper(sns.scatterplot)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot, element='step')
g.add_legend()
plt.show()

JointGrid

Combine bivariate plot with marginal distributions.

g = sns.JointGrid(
    data=penguins,
    x='flipper_length_mm',
    y='bill_length_mm',
    hue='species'            # applied to both the joint and marginal plots
)
g.plot_joint(sns.scatterplot)
g.plot_marginals(sns.histplot)
plt.show()

Customizing Plots

Once you’ve created a plot, you can customize it through both the Seaborn API and by dropping down to the Matplotlib layer.

Using Matplotlib for Fine-tuning

# Create a plot
g = sns.relplot(
    data=penguins,
    x='bill_length_mm',
    y='bill_depth_mm',
    hue='body_mass_g',
    palette='crest',
    marker='x',
    s=100
)

# Customize through Matplotlib
g.set_axis_labels('Bill length (mm)', 'Bill depth (mm)', labelpad=10)
g.legend.set_title('Body mass (g)')
g.figure.set_size_inches(6.5, 4.5)
g.ax.margins(.15)
g.despine(trim=True)

plt.show()

Removing Axes Spines

Seaborn makes it easy to trim spines for a cleaner look.

sns.set_theme(style="ticks")
sns.scatterplot(data=tips, x="total_bill", y="tip")
sns.despine(trim=True)      # Remove top/right spines and trim the rest to the tick range
plt.show()

Common Pitfalls and Solutions

SettingWithCopyWarning

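This warning comes from pandas, not Seaborn. It appears when you modify a filtered slice of a DataFrame, so take an explicit copy of the subset before adding or changing columns: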

penguins = sns.load_dataset("penguins").copy()
penguins_clean = penguins.dropna().copy()

Missing Data Handling

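Most Seaborn functions silently drop observations with missing values, so a plot may show fewer points than you expect. Dropping them explicitly on the columns you plot makes the filtering visible: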

penguins = sns.load_dataset("penguins").dropna(
    subset=["bill_length_mm", "bill_depth_mm", "species"]
)

Overlapping Points in Scatter Plots

Use transparency to reveal density:

sns.scatterplot(
    data=penguins,
    x="bill_length_mm",
    y="bill_depth_mm",
    alpha=0.5,               # Transparency
    edgecolor="w",
    linewidth=0.5
)

Figure-level vs Axes-level Confusion

  • If you need to place a plot on a specific axes, use an axes-level function with ax=
  • If you need faceting, use a figure-level function

# Combining axes-level functions
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.scatterplot(data=tips, x="total_bill", y="tip", ax=axes[0])
sns.boxplot(data=tips, x="day", y="total_bill", ax=axes[1])
plt.tight_layout()
plt.show()

Performance Tips

For large datasets:

  1. Prefer histograms or ECDFs (displot with kind="hist" or kind="ecdf") over KDEs, which are more expensive to compute
  2. Sample your data for exploratory plots: df.sample(1000)
  3. Disable bootstrapping in lineplot with errorbar=None (ci=None before Seaborn 0.12)
  4. Prefer stripplot over swarmplot for large categorical datasets; swarmplot does not scale well to many observations

Integration with Matplotlib

Seaborn works seamlessly with Matplotlib, allowing you to leverage both libraries’ strengths. You can use Matplotlib for fine-grained control and Seaborn for high-level statistical plotting:

  • plt.subplots() for creating complex figure layouts
  • Matplotlib’s Axes objects for precise customization
  • Seaborn’s ax= parameter for placing plots on specific axes
  • Matplotlib’s rcParams for global figure settings