Fix “CUDA out of memory” in Google Colab (PyTorch & TensorFlow)
“CUDA out of memory” is one of the most common errors when training deep learning models on Google Colab. The error usually means your model, batch size, or data doesn’t fit in the GPU memory Colab gave you. Let’s walk through practical fixes that students actually use.
1. First: confirm GPU and memory
Check that you really have a GPU runtime (in Colab: Runtime > Change runtime type > select a GPU) and how much memory it has:
import torch
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU name:", torch.cuda.get_device_name(0))
    print("Total memory (GB):", torch.cuda.get_device_properties(0).total_memory / 1e9)
In TensorFlow:
import tensorflow as tf
print(tf.config.list_physical_devices("GPU"))
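If you prefer a shell-level view, running nvidia-smi in a Colab cell shows the total and currently used GPU memory:
# Run in a Colab cell; the leading ! sends the command to the shell
!nvidia-smi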
2. Easiest fix: reduce batch size
Activation memory scales roughly linearly with batch size, so this is usually the quickest win. Halve the batch size and try again; keep shrinking it until the error disappears.
# PyTorch example
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)
Try 16, 8, 4… until it fits.
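If you want to automate that search, here is a minimal sketch. The find_fitting_batch_size helper is purely illustrative (not a library function) and assumes model and criterion already live on the GPU; it halves the batch size until a single forward/backward pass fits.
import torch
from torch.utils.data import DataLoader

def find_fitting_batch_size(dataset, model, criterion, start=32):
    # Illustrative helper: halve the batch size until one forward/backward pass fits
    batch_size = start
    while batch_size >= 1:
        try:
            loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
            inputs, targets = next(iter(loader))
            outputs = model(inputs.to("cuda"))
            loss = criterion(outputs, targets.to("cuda"))
            loss.backward()
            model.zero_grad(set_to_none=True)
            return batch_size
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise
            torch.cuda.empty_cache()
            batch_size //= 2
    raise RuntimeError("Even batch_size=1 does not fit on this GPU.")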
3. Use smaller inputs or models
- Resize images (e.g., from 512×512 down to 224×224).
- Use a smaller backbone (ResNet18 instead of ResNet50; DistilBERT instead of BERT-base).
- Reduce hidden sizes / number of layers in custom models.
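As a concrete illustration, here is a minimal torchvision sketch that combines the first two ideas; the 224×224 size and the ResNet18 backbone are just example choices.
from torchvision import models, transforms

# Smaller inputs: downscale images before they reach the GPU
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # instead of 512x512
    transforms.ToTensor(),
])

# Smaller backbone: ResNet18 needs far less activation memory than ResNet50
model = models.resnet18(weights=None).to("cuda")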
4. Clear unused GPU memory between runs (PyTorch)
import torch, gc

gc.collect()
torch.cuda.empty_cache()
This doesn’t magically give more memory, but it can clean up memory from old variables that are no longer referenced.
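For example, assuming model and optimizer are leftovers from a previous run, this sketch frees them and shows how much allocated memory actually went away:
import gc
import torch

print(f"Allocated before: {torch.cuda.memory_allocated() / 1e9:.2f} GB")

# Drop the Python references first; memory tied to live objects cannot be released
del model, optimizer
gc.collect()
torch.cuda.empty_cache()

print(f"Allocated after: {torch.cuda.memory_allocated() / 1e9:.2f} GB")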
5. Mixed precision & gradient checkpointing (advanced)
5.1 Mixed precision (PyTorch)
scaler = torch.cuda.amp.GradScaler()

for inputs, targets in train_loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # run the forward pass in float16 where it is safe
        outputs = model(inputs.to("cuda"))
        loss = criterion(outputs, targets.to("cuda"))
    scaler.scale(loss).backward()  # scale the loss so float16 gradients don't underflow
    scaler.step(optimizer)
    scaler.update()
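On the TensorFlow side, the rough equivalent is Keras's global mixed-precision policy; a minimal sketch (set it before building the model):
import tensorflow as tf

# Compute in float16 where safe while keeping variables in float32
tf.keras.mixed_precision.set_global_policy("mixed_float16")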
5.2 Gradient checkpointing (Transformers)
Many Hugging Face Transformers models support gradient checkpointing, which trades extra compute for lower activation memory:
model.gradient_checkpointing_enable()
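In context, a minimal sketch; the bert-base-uncased checkpoint is just an example of a model that supports it:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
# Recompute activations during the backward pass instead of storing them all
model.gradient_checkpointing_enable()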
6. Save a working configuration with NoteCapsule
Once you’ve found a combo of batch size, model, and settings that fits in Colab’s GPU, capture that state so you don’t have to rediscover it later.
from notebookcapsule import create_capsule
create_capsule(
    name="gpu-memory-ok",
    notebook_path="notebooks/train.ipynb",
    data_dirs=["./data"],
    base_dir=".",  # project root
)
The Capsule can store environment metadata and config so you know exactly what setup didn’t crash.
Don’t debug CUDA errors from scratch every semester
NoteCapsule helps you keep reproducible snapshots of GPU training setups that actually fit in memory, so you can re-run or extend them without guessing.
Join NoteCapsule early access