How to fix MemoryError in Jupyter Notebook for large datasets
You run a cell in Jupyter, your fan goes crazy, then: MemoryError or a dead kernel.
Let’s walk through practical ways to keep your notebook alive when working with large data.
Step 1 – Confirm it’s really a memory issue
On most machines you can quickly check memory usage:
import psutil
mem = psutil.virtual_memory()
print(f"Total: {mem.total/1024**3:.2f} GB")
print(f"Used: {mem.used/1024**3:.2f} GB")
print(f"Free: {mem.available/1024**3:.2f} GB")
Step 2 – Avoid loading everything at once
The classic trap is doing:
import pandas as pd
df = pd.read_csv("huge.csv") # >> MemoryError
Instead, use chunked reading:
import pandas as pd
reader = pd.read_csv("huge.csv", chunksize=100_000)
rows = 0
for chunk in reader:
    # process chunk here (aggregate, filter, feature engineering)
    rows += len(chunk)
print("Processed rows:", rows)
Step 3 – Work with a sample first
For EDA and prototyping, a sample is enough:
import pandas as pd
sample = pd.read_csv("huge.csv", nrows=50_000)
sample.head()
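Keep in mind that nrows takes the first 50,000 rows, which can be biased if the file is sorted (by date, for example). For a rough random sample, pandas accepts a callable for skiprows; the sketch below keeps roughly 1% of the rows:
import random
import pandas as pd

sample = pd.read_csv(
    "huge.csv",
    # keep the header (row 0) and roughly 1 in 100 of the remaining rows
    skiprows=lambda i: i > 0 and random.random() > 0.01,
)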
Step 4 – Optimize dtypes
Use smaller integer / float types to cut memory usage:
for col in sample.select_dtypes(include="int64").columns:
    sample[col] = pd.to_numeric(sample[col], downcast="integer")
for col in sample.select_dtypes(include="float64").columns:
    sample[col] = pd.to_numeric(sample[col], downcast="float")
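To confirm the downcasting actually paid off, check the frame's footprint (run this before and after the loops to compare):
# deep=True also counts the contents of object (string) columns
mb = sample.memory_usage(deep=True).sum() / 1024**2
print(f"Sample now uses about {mb:.1f} MB")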
Step 5 – Clear variables and restart smarter
In notebooks, old variables stick around. Clear everything in the current kernel with:
%reset -f
Or manually delete large objects:
del df
import gc
gc.collect()
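If you are not sure which variables are worth deleting, a rough listing of the biggest names in the kernel can help. Treat this as a hint only, since sys.getsizeof underestimates nested objects like DataFrames:
import sys

sizes = {
    name: sys.getsizeof(obj)
    for name, obj in globals().items()
    if not name.startswith("_")
}
# print the ten largest variables in the current kernel
for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{name}: {size / 1024**2:.1f} MB")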
Step 6 – Keep plots reasonable
Very large heatmaps or pairplots can explode memory. Try plotting only a subset of rows/columns or aggregate first.
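For example, instead of a scatter plot over every row, plot a few thousand sampled rows. The column names below are placeholders, and pandas plotting assumes matplotlib is installed:
plot_df = sample.sample(n=5_000, random_state=0)  # plot a subset, not the full frame
plot_df.plot.scatter(x="col_a", y="col_b", s=2, alpha=0.3)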
Snapshot your working notebook with NoteCapsule
After you’ve finally built a memory-safe workflow, capture it as a reproducible Capsule so you don’t lose this “known good” state.
pip install "git+https://github.com/YOUR_GITHUB_USERNAME/notecapsule.git@main"
from notecapsule import create_capsule
create_capsule(
    name="jupyter-large-data-safe",
    notebook_path="analysis.ipynb",
    data_dirs=["./data"],
    base_dir="."
)
Commit the Capsule folder to Git or share it as a zip for grading, viva, or collaboration.