I built NerdScan because nothing I tried worked well enough for pulling individual photos out of scanned album pages. It started as a practical, open-source prototype to automate the painful part: detecting each photo, cropping it cleanly, and exporting the results without manual box-drawing for every scan.

Open Source Prototype

NerdScan is the original public version of this idea. The source code is on GitHub as a reference implementation and early prototype.

View NerdScan on GitHub

Overview

NerdScan uses object detection to find photos embedded in scanned pages, then extracts them as separate images. I designed it around a real family archiving workflow: scan full album pages, process them in batches, and avoid manually cropping hundreds of photos one by one.
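As a rough illustration of the extraction step, here is a minimal sketch of turning detector output into crop rectangles. It is not NerdScan's actual code: the `Box` type, the `crop_regions` function, the `(score, Box)` detection format, and the `min_score` and `padding` parameters are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical type for this sketch)."""
    left: int
    top: int
    right: int
    bottom: int


def crop_regions(detections, page_width, page_height, min_score=0.5, padding=0):
    """Turn assumed detector output into crop boxes clamped to the page.

    `detections` is assumed to be an iterable of (score, Box) pairs;
    a real detector may use a different output format.
    """
    regions = []
    for score, box in detections:
        # Skip low-confidence detections.
        if score < min_score:
            continue
        # Grow the box by `padding` pixels, but never past the page edges.
        left = max(0, box.left - padding)
        top = max(0, box.top - padding)
        right = min(page_width, box.right + padding)
        bottom = min(page_height, box.bottom + padding)
        # Keep only boxes that still have a positive area.
        if right > left and bottom > top:
            regions.append(Box(left, top, right, bottom))
    return regions
```

Each resulting box could then be handed to an imaging library's crop call, for example Pillow's `Image.crop((left, top, right, bottom))`, to write one file per detected photo.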

It also includes quality-of-life features such as folder-based year detection and EXIF date assignment, which make the output easier to organize in photo libraries after export.
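To make the folder-based dating idea concrete, here is a small sketch under stated assumptions: it looks for a four-digit year (1900 to 2099) in the folder names above a scan and formats it in the EXIF `YYYY:MM:DD HH:MM:SS` date style. The function names, the regular expression, and the choice of January 1 as the default date are assumptions for illustration, not NerdScan's actual behavior.

```python
import re
from pathlib import PurePath

# Assumed pattern: a standalone four-digit year between 1900 and 2099.
YEAR_PATTERN = re.compile(r"(?<!\d)(19\d{2}|20\d{2})(?!\d)")


def year_from_path(path):
    """Return the year found in the nearest enclosing folder name, or None."""
    folders = PurePath(path).parts[:-1]  # drop the file name itself
    # Check the closest folder first, then walk up toward the root.
    for name in reversed(folders):
        match = YEAR_PATTERN.search(name)
        if match:
            return int(match.group(1))
    return None


def exif_datetime(year):
    """Format a year as an EXIF-style date string, defaulting to January 1."""
    return f"{year:04d}:01:01 00:00:00"
```

With a layout like `scans/1987 Summer/page_03.jpg`, `year_from_path` returns `1987`, and `exif_datetime(1987)` gives `1987:01:01 00:00:00`. Writing that value into the image's EXIF date tags would be a separate step that needs an EXIF-capable library.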

What It Did Well

  • Automated detection and cropping for multi-photo scans
  • Batch processing for large collections of album pages
  • Folder-based year extraction for automatic EXIF dating
  • Visual output for checking detections before trusting a full run
  • CLI-first workflow for power users and local processing

Why It Mattered

NerdScan proved the workflow was worth building, but it also showed the limits of a generic open-source approach for this niche problem. It gave me the baseline, the failure cases, and the product insight that later led to a much more specialized model and user experience.

What Came Next

NerdScan is no longer the version I actively maintain. The project evolved into ScanCropper, where I trained a custom model specifically for extracting photos from scanned album pages and wrapped it in a production web app with a much smoother workflow.
