A convolutional neural network for binary "cat vs not-cat" classification.
This repository is my playground: a CNN built from scratch with NumPy and Pillow, without relying on any deep learning framework. I use it to iterate on data loading, training, evaluation, and small UX utilities. It started as exercises from Andrew Ng's deep learning materials plus some independent reading and tutorials, and I iterated on it with a bit of automated help.
The project is intentionally simple and experimental for now, so expect it to keep changing.
The CNN processes a 64×64 grayscale image and passes it through the following architecture:
input (64×64 grayscale) -> conv 3×3 (kernel1) -> conv map 62×62 -> max-pool 2×2 (stride 2) -> 31×31
-> conv 3×3 (kernel2) -> 29×29 -> max-pool 2×2 (stride 2) -> 14×14 -> flatten (196)
-> dense (196 -> 1) -> sigmoid -> output (probability)
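The sizes in the diagram follow from two rules: a 3×3 convolution without padding shrinks each side by 2, and a 2×2 max-pool with stride 2 halves it, dropping a leftover row or column. The sketch below checks that arithmetic; the function names are illustrative and are not taken from the repository.

```python
def conv_out(size: int, kernel: int = 3) -> int:
    """Side length after a 'valid' (no padding) convolution with stride 1."""
    return size - kernel + 1


def pool_out(size: int, window: int = 2, stride: int = 2) -> int:
    """Side length after max-pooling; an incomplete trailing window is dropped."""
    return (size - window) // stride + 1


def trace_shapes(size: int = 64) -> list[int]:
    """Side lengths after each of the two conv + pool stages."""
    shapes = [size]
    for _ in range(2):
        size = conv_out(size)
        shapes.append(size)
        size = pool_out(size)
        shapes.append(size)
    return shapes


shapes = trace_shapes()
print(shapes)           # [64, 62, 31, 29, 14]
print(shapes[-1] ** 2)  # 196 values after flattening
```

The 29×29 map has an odd side length, so the last row and column fall outside every 2×2 window and the result is 14×14 rather than 15×15.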
This project is a lightweight educational implementation of a CNN for binary image classification.
The model is intentionally small:
- two convolution layers
- two max-pooling layers
- one dense output layer
- sigmoid output for probability scoring
The goal is not state-of-the-art accuracy. The goal is to show how a CNN can be implemented end-to-end using only NumPy for math and Pillow for image handling.
| Input | Conv1 |
|---|---|
| ![]() | ![]() |

| Pool1 | Conv2 |
|---|---|
| ![]() | ![]() |

| Pool2 | Dense weights |
|---|---|
| ![]() | ![]() |

| Feature grid |
|---|
| ![]() |

| Summary |
|---|
| ![]() |
Briefly, a convolutional neural network works by learning filters that respond to local patterns in an image.
The 3×3 kernels slide across the image and compute dot products with small neighborhoods. Early filters tend to learn simple patterns such as:
- edges
- corners
- light/dark transitions
- simple textures
In this project, the first convolution layer extracts local image patterns from the 64×64 grayscale input. The second convolution layer operates on pooled features and can combine earlier patterns into slightly more abstract ones.
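The sliding dot product described above can be sketched in a few lines of NumPy. This is a minimal loop-based version under stated assumptions (single channel, no padding, stride 1); `conv2d_valid` and the edge kernel are illustrative names, not the functions in `cnn_layers.py`.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the kernel over the image and take a dot product at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out


# Toy image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written filter that responds to dark-to-light vertical edges.
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

response = conv2d_valid(image, edge_kernel)
print(response[0])  # [0. 3. 3. 0.]
```

The response is zero over the flat regions and peaks where the window straddles the dark-to-light boundary, which is the kind of pattern an early learned filter can pick up.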
Max-pooling reduces spatial size by keeping the strongest activation in each small region. This helps:
- reduce computation
- reduce sensitivity to tiny shifts
- keep the most salient features
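As a rough illustration of the pooling step, the sketch below keeps the largest value in each 2×2 window with stride 2. It assumes a single 2D feature map; `max_pool2d` is a hypothetical helper, not the repository's implementation.

```python
import numpy as np


def max_pool2d(x: np.ndarray, window: int = 2, stride: int = 2) -> np.ndarray:
    """Keep the maximum of each window; incomplete trailing windows are dropped."""
    out_h = (x.shape[0] - window) // stride + 1
    out_w = (x.shape[1] - window) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + window, c:c + window].max()
    return out


feature_map = np.arange(25, dtype=float).reshape(5, 5)
pooled = max_pool2d(feature_map)
print(pooled)
# [[ 6.  8.]
#  [16. 18.]]
```

The 5×5 input has an odd side length, so its last row and column are ignored, the same way the 29×29 map in this network pools down to 14×14.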
After convolution and pooling, the feature map is flattened into a 1D vector. A dense layer maps that vector to a single output logit, and sigmoid turns that into a value between 0 and 1.
Interpretation:
- close to 1.0 → cat
- close to 0.0 → not cat
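The flatten, dense, and sigmoid steps can be sketched as follows. The weight shape `(196,)` and the 0.5 decision threshold are assumptions made for this example; the repository may store the dense weights differently or choose another cutoff.

```python
import numpy as np


def sigmoid(z):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def predict(pooled, dense_W, dense_b, threshold=0.5):
    """Flatten the pooled map, apply the dense layer, and label the result."""
    features = pooled.reshape(-1)           # 14x14 -> 196
    logit = features @ dense_W + dense_b    # single output logit
    prob = float(sigmoid(logit))
    label = "Cat" if prob >= threshold else "Not a cat"  # threshold is assumed
    return prob, label


pooled = np.ones((14, 14))
dense_W = np.zeros(196)   # assumed shape: one weight per flattened feature

prob, label = predict(pooled, dense_W, dense_b=0.0)
print(prob, label)        # 0.5 Cat

prob_neg, label_neg = predict(pooled, dense_W, dense_b=-5.0)
print(label_neg)          # Not a cat
```

With all-zero weights the logit is 0 and the probability is exactly 0.5, so the bias alone decides which side of the threshold the prediction lands on.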
- `cat_cnn.py`: CNN forward and backward pass
- `cnn_layers.py`: convolution and max-pooling operations
- `cnn_network.py`: dense layer and gradient descent optimizer
- `data_loader.py`: folder-based image loading
- `main.py`: training entrypoint, prediction loop, model save/load, metrics export
- `prepare_kaggle_data.py`: converts Kaggle raw dataset files into `data/cats` and `data/noncats`
- `utils.py`: sigmoid, loss, and helper functions
- `requirements.txt`: minimal runtime dependencies
Core runtime dependencies:
- Python 3.10+
- NumPy
- Pillow
Install the base dependencies:
pip install -r requirements.txt
The requirements file now includes the Kaggle workflow helpers as well, so the command above covers the full project setup.
If you want to install only the core runtime dependencies, use:
pip install numpy Pillow
The training script expects a folder layout like this:
data/
├── cats/
└── noncats/
The loader:
- reads images from both folders
- converts them to grayscale
- resizes them to 64×64
- normalizes pixel values to the range [0, 1]
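The loader steps listed above might look roughly like this with Pillow and NumPy. It is a sketch, not the code in `data_loader.py`: the function names, the accepted file extensions, and the `float32` dtype are assumptions.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def load_image(path, size=(64, 64)) -> np.ndarray:
    """Read one image as a grayscale array with values in [0, 1]."""
    with Image.open(path) as img:
        gray = img.convert("L").resize(size)   # "L" = 8-bit grayscale
    return np.asarray(gray, dtype=np.float32) / 255.0


def load_folder(folder, label):
    """Load every image in a folder and pair it with the given class label."""
    images, labels = [], []
    for path in sorted(Path(folder).iterdir()):
        # Extension filter is an assumption for this sketch.
        if path.suffix.lower() in {".jpg", ".jpeg", ".png"}:
            images.append(load_image(path))
            labels.append(label)
    return images, labels


# Example usage, assuming the folder layout shown earlier:
# cats, cat_labels = load_folder("data/cats", 1)
# noncats, noncat_labels = load_folder("data/noncats", 0)
```

Dividing by 255 maps the 8-bit grayscale range onto [0, 1], which keeps the first convolution's inputs on a small, consistent scale.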
I trained the model on the `sagar2522/cat-vs-non-cat` dataset from Kaggle.
To download it with the Kaggle CLI, store your API credentials in either:
- `kaggle/kaggle.json` in the repo, or
- `~/.kaggle/kaggle.json`
Example file:
{
  "username": "YOUR_USERNAME",
  "key": "YOUR_API_KEY"
}
If you use `~/.kaggle/kaggle.json`, set strict permissions:
chmod 600 ~/.kaggle/kaggle.json
Download and extract the dataset (this example reads the credentials from the repo-local `kaggle/` folder):
KAGGLE_CONFIG_DIR="$PWD/kaggle" kaggle datasets download -d sagar2522/cat-vs-non-cat -p data_raw
unzip -o data_raw/cat-vs-non-cat.zip -d data_raw
Build the training folders from the extracted files:
python prepare_kaggle_data.py --source-root data_raw --output-root data --max-per-class 300
This generates:
- `data/cats`
- `data/noncats`
Train the model and save the weights:
python main.py --train-only --epochs 10 --lr 0.01 --save-path final_cnn_model.npz --metrics-path training_metrics.json
What happens during training:
- load training images from `data/cats` and `data/noncats`
- run the CNN forward pass
- compute binary cross-entropy loss
- backpropagate gradients
- update kernels and dense layer parameters with gradient descent
- save the model to `final_cnn_model.npz`
- evaluate metrics and save them to `training_metrics.json`
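To make the loss and update steps concrete, here is a minimal sketch of binary cross-entropy and one gradient descent step, restricted to the dense layer. It assumes a single training example and a `(196,)` weight vector; the kernel gradients, which the real backward pass also computes, are left out, and the function names are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p and label y (0 or 1)."""
    p = np.clip(p, eps, 1.0 - eps)   # avoid log(0)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)))


def dense_step(features, y, W, b, lr=0.01):
    """One forward pass plus one gradient descent update of the dense layer."""
    p = sigmoid(features @ W + b)
    loss = bce_loss(p, y)
    dz = p - y                        # d(loss)/d(logit) for sigmoid + BCE
    W_new = W - lr * dz * features    # d(loss)/dW = dz * features
    b_new = b - lr * dz               # d(loss)/db = dz
    return W_new, b_new, loss


features = np.full(196, 0.5)          # stand-in for a flattened pooled map
W, b = np.zeros(196), 0.0
losses = []
for _ in range(5):
    W, b, loss = dense_step(features, 1.0, W, b)
    losses.append(loss)

print(round(losses[0], 4))            # 0.6931, i.e. ln(2) at p = 0.5
```

The convenient form `dz = p - y` comes from combining the sigmoid derivative with the cross-entropy derivative, which is why this pairing is common for binary classifiers.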
Training metrics are computed on the loaded training set after the final epoch.
They are printed to the console and saved as JSON.
Fields in the metrics report:
- `loss`
- `accuracy`
- `precision`
- `recall`
- `f1`
- `tp`
- `tn`
- `fp`
- `fn`
- `samples`
Based on the current training_metrics.json:
| Metric | Value |
|---|---|
| Loss | 0.6811 |
| Accuracy | 0.6555 |
| Precision | 0.0000 |
| Recall | 0.0000 |
| F1 | 0.0000 |
| TP | 0 |
| TN | 137 |
| FP | 0 |
| FN | 72 |
| Samples | 209 |
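The derived metrics in this table can be recomputed from the four confusion counts. The sketch below does that; reporting 0.0 when a denominator is zero is an assumption about the convention used, chosen because it matches the zeros in the table.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive accuracy, precision, recall, and F1 from confusion counts."""
    samples = tp + tn + fp + fn
    accuracy = (tp + tn) / samples if samples else 0.0
    # Zero-denominator cases are reported as 0.0 (assumed convention).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "samples": samples,
    }


report = classification_metrics(tp=0, tn=137, fp=0, fn=72)
print(round(report["accuracy"], 4))  # 0.6555
print(report["recall"])              # 0.0
```

With TP = 0 and FP = 0, the model predicted "not cat" for all 209 samples in this run, so the accuracy of 0.6555 is simply the share of non-cat images (137 of 209).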
If you run main.py without --train-only, the script trains and then enters an interactive prediction loop:
python main.py
You will be prompted for an image path. The image will be resized to 64×64, converted to grayscale, and classified as:
- Cat
- Not a cat
Press Ctrl+C or send EOF to exit.
You can generate activation and weight-map images for a single prediction, like the images presented at the top:
python visualize_model.py path/to/image.jpg --model-path final_cnn_model.npz --output-dir visualizations
Quick example using one of the prepared training images:
SAMPLE=$(find data/cats -type f | head -n 1)
python visualize_model.py "$SAMPLE" --model-path final_cnn_model.npz --output-dir visualizations
open visualizations
This creates PNG files showing:
- the input image
- `conv1` activations
- `pool1` activations
- `conv2` activations
- `pool2` activations
- the dense-layer weight map reshaped to the pooled feature size
- a summary image with the predicted label and probability
The output is saved into the chosen directory and can be opened like normal images.
You can also generate an abstract animated neuron view that uses actual values from your trained model prediction.
SAMPLE=$(find data/cats -type f | head -n 1)
python export_animation_data.py "$SAMPLE" --model-path final_cnn_model.npz --output-json visualizations/network_animation_data.json
Open the file `network_animation.html` in your browser.
- It will try to load `visualizations/network_animation_data.json` by default.
- If the browser blocks local file access, run a local server:
python -m http.server 8000
Then open http://localhost:8000/network_animation.html.
The page animates each layer as abstract neurons where:
- node size/glow tracks activation magnitude
- color encodes sign (positive/negative)
- animated links indicate signal flow between layers
- output panel shows final prediction and probability
Options for `main.py`:
- `--epochs`: number of training epochs, default `10`
- `--lr`: learning rate, default `0.01`
- `--data-cats`: path to cat images, default `data/cats`
- `--data-noncats`: path to non-cat images, default `data/noncats`
- `--save-path`: output model path, default `final_cnn_model.npz`
- `--metrics-path`: output metrics JSON path, default `training_metrics.json`
- `--load-model`: load an existing `.npz` model before training
- `--train-only`: train, save, evaluate, and exit without interactive prediction

Options for `prepare_kaggle_data.py`:
- `--source-root`: root folder containing extracted Kaggle files, default `data_raw`
- `--output-root`: output dataset root, default `data`
- `--max-per-class`: cap images per class, default `300`
- `--seed`: random seed, default `42`

Options for `visualize_model.py`:
- `image_path`: image to inspect
- `--model-path`: saved `.npz` model file, default `final_cnn_model.npz`
- `--output-dir`: folder to write PNG visualizations, default `visualizations`

Options for `export_animation_data.py`:
- `image_path`: image to inspect for animation data
- `--model-path`: saved `.npz` model file, default `final_cnn_model.npz`
- `--output-json`: output JSON data for animation, default `visualizations/network_animation_data.json`
The saved .npz file stores the learned parameters:
- `kernel1`
- `kernel2`
- `dense_W`
- `dense_b`
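A `.npz` archive with these four keys can be written and read back with `np.savez` and `np.load`. The sketch below uses random placeholder values; the array shapes (3×3 kernels, a 196-element weight vector, a scalar bias) are assumptions inferred from the architecture, and the temporary file name is hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(42)

# Placeholder parameters; shapes are assumed, not read from the real model.
params = {
    "kernel1": rng.normal(size=(3, 3)),
    "kernel2": rng.normal(size=(3, 3)),
    "dense_W": rng.normal(size=(196,)),
    "dense_b": np.array(0.0),
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_model.npz"
    np.savez(path, **params)               # one array per keyword name
    with np.load(path) as data:
        loaded = {name: data[name] for name in data.files}

print(sorted(loaded))  # ['dense_W', 'dense_b', 'kernel1', 'kernel2']
```

To inspect the real artifact, the same `np.load` pattern applied to `final_cnn_model.npz` would show the actual shapes via `loaded[name].shape`.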
- Convolution and pooling are implemented manually in Python/NumPy.
- Backpropagation is also implemented manually for learning purposes.
- `sigmoid` and other math helpers are vectorized with NumPy.
- `visualize_model.py` and `network_animation.html` are the only files written with AI assistance (vibe-coded).
- The project is small enough to understand end-to-end, but it is not optimized for performance.
The repo already includes a trained model artifact and the metrics output from a recent run.
See LICENSE for details.







