ML project folder structure for Jupyter & Google Colab
Most ML projects start as a single notebook and end as a mess of final2.ipynb, random CSVs, and
checkpoint files scattered across Colab or your laptop. A simple, consistent project structure makes everything
easier: debugging, collaboration, and reproducibility.
1. A simple, reusable folder layout
my-ml-project/
    notebooks/
        exploration.ipynb
        training.ipynb
    data/
        raw/
        processed/
    models/
    reports/
        figures/
    capsules/
    requirements.txt
    README.md
This layout works both for local Jupyter and Google Colab (with Drive mounted).
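If you would rather not create these folders by hand, the layout above can be scaffolded with a few lines of standard-library Python. This is a minimal sketch: the function name `scaffold_project` and the two constants are illustrative choices, not part of any tool mentioned in this post.

```python
from pathlib import Path

# Sub-folders from the layout above, relative to the project root.
PROJECT_DIRS = [
    "notebooks",
    "data/raw",
    "data/processed",
    "models",
    "reports/figures",
    "capsules",
]

# Placeholder files that live directly in the project root.
PROJECT_FILES = ["requirements.txt", "README.md"]


def scaffold_project(root):
    """Create the folder layout under `root`; safe to run more than once."""
    root = Path(root)
    for rel_dir in PROJECT_DIRS:
        # parents=True also creates data/ and reports/ on the way down
        (root / rel_dir).mkdir(parents=True, exist_ok=True)
    for rel_file in PROJECT_FILES:
        # touch() creates the file if missing and leaves existing content alone
        (root / rel_file).touch(exist_ok=True)
    return root
```

Calling `scaffold_project("my-ml-project")` from a terminal session or a notebook cell creates the tree in the current working directory. Running it again on an existing project does not overwrite anything.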
2. Keep notebooks in notebooks/, not project root
Benefits:
- Your root folder stays clean.
- You can have multiple notebooks for different phases (EDA, training, evaluation).
- It is easier to ignore checkpoints (.ipynb_checkpoints/).
3. Make data folders meaningful
Split data by purpose:
- data/raw/ – original files.
- data/processed/ – cleaned and transformed files, feature matrices.
- data/external/ – public datasets, external sources.
In notebooks, always refer to data via relative paths:
DATA_DIR = "../data"
train_path = f"{DATA_DIR}/processed/train.csv"
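The `../data` path above assumes the notebook's working directory is notebooks/. One possible way to make paths independent of the working directory is to walk upwards until a known root-level file is found. The sketch below is an assumption on top of this post: the helper name `find_project_root` is hypothetical, and using requirements.txt as the marker simply relies on it sitting in the project root in the layout above.

```python
from pathlib import Path


def find_project_root(start=None, marker="requirements.txt"):
    """Walk upwards from `start` until a folder containing `marker` is found."""
    start = Path(start) if start is not None else Path.cwd()
    start = start.resolve()
    # Check the starting folder first, then each parent in turn.
    for folder in [start, *start.parents]:
        if (folder / marker).exists():
            return folder
    raise FileNotFoundError(f"No {marker!r} found in {start} or any parent folder")


# Typical use inside a notebook (left commented out so the sketch has no side effects):
# PROJECT_ROOT = find_project_root()
# train_path = PROJECT_ROOT / "data" / "processed" / "train.csv"
```

Because the result is a `pathlib.Path`, the same notebook cell works whether it is run from notebooks/ or from the project root.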
4. Keep models and outputs separate
Save checkpoints and trained models under models/:
models/
    baseline_logreg.joblib
    cnn_epoch10.pt
    best_model.h5
Similarly, store plots and exported results under reports/:
reports/
    figures/
        roc_curve.png
        confusion_matrix.png
    metrics_summary.csv
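As an illustration of writing outputs into reports/, here is a small standard-library sketch that saves a dictionary of metrics as metrics_summary.csv. The function name `save_metrics_summary` and the two-column format (metric, value) are assumptions made for this example. Saving the models themselves depends on your framework and is not shown.

```python
import csv
from pathlib import Path


def save_metrics_summary(metrics, reports_dir="../reports"):
    """Write a {metric_name: value} dict to <reports_dir>/metrics_summary.csv."""
    reports_dir = Path(reports_dir)
    # Make sure reports/ and reports/figures/ exist before writing anything.
    (reports_dir / "figures").mkdir(parents=True, exist_ok=True)
    out_path = reports_dir / "metrics_summary.csv"
    # newline="" lets the csv module control line endings itself.
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["metric", "value"])
        for name, value in metrics.items():
            writer.writerow([name, value])
    return out_path
```

From a notebook in notebooks/, a call such as `save_metrics_summary({"accuracy": acc, "roc_auc": auc})` (with `acc` and `auc` being values you computed earlier) keeps results next to the figures instead of in the notebook folder.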
5. Using this structure in Google Colab
Mount Drive and point Colab to your project folder:
from google.colab import drive
drive.mount('/content/drive')
PROJECT_ROOT = "/content/drive/MyDrive/projects/my-ml-project"
%cd $PROJECT_ROOT/notebooks
From here, relative paths like ../data/processed/train.csv will work reliably, and your whole project
lives in one Drive folder.
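If the same notebook has to run both locally and in Colab, one option is to pick the project root based on whether the `google.colab` package can be imported. This is a sketch under that assumption: the helper names `in_colab` and `resolve_project_root` are hypothetical, and mounting Drive still has to happen separately, as shown above.

```python
from pathlib import Path


def in_colab():
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (normally only importable inside Colab)
    except ImportError:
        return False
    return True


def resolve_project_root(colab_root, local_root, running_in_colab=None):
    """Pick the Drive path in Colab and the local path everywhere else."""
    if running_in_colab is None:
        # Auto-detect unless the caller states the environment explicitly.
        running_in_colab = in_colab()
    return Path(colab_root if running_in_colab else local_root)
```

For example, `resolve_project_root("/content/drive/MyDrive/projects/my-ml-project", "..")` returns the Drive path in Colab and the parent of notebooks/ when run locally from that folder.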
6. Add NoteCapsule Capsules on top
Once your layout is in place, a Capsule becomes an easy way to preserve “snapshots” of the whole project at important milestones:
from notebookcapsule import create_capsule

create_capsule(
    name="after-eda-and-baseline",
    notebook_path="training.ipynb",
    data_dirs=["../data", "../models"],
    base_dir="..",  # project root
)
Each Capsule lives under capsules/ with its own metadata and manifests.
Want your ML project structure to survive deadlines?
NoteCapsule works best with a clean project layout. Once you have notebooks/, data/,
and models/ in place, a single function call can capture a reproducible Capsule for safekeeping.
We’ll share templates and example repos using this structure in both Jupyter and Colab.