I built NerdScan because nothing I tried worked well enough for pulling individual photos out of scanned album pages. It started as a practical, open-source prototype to automate the painful part: detecting each photo, cropping it cleanly, and exporting the results without manual box-drawing for every scan.
Open Source Prototype
NerdScan is the original public version of this idea. The source code is on GitHub as a reference implementation and early prototype.
Overview
NerdScan uses object detection to find photos embedded in scanned pages, then extracts them as separate images. I designed it around a real family archiving workflow: scan full album pages, process them in batches, and avoid manually cropping hundreds of photos one by one.
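To make the detect-then-extract step concrete, here is a minimal sketch of how a detection box could be turned into a safe crop rectangle. It is not NerdScan's actual code: the `Box` type, the `crop_rect` function, and the `padding` parameter are hypothetical names chosen for illustration, and it assumes the detector returns pixel coordinates as floats.

```python
import math
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Hypothetical detector output: pixel coordinates on the scanned page."""
    left: float
    top: float
    right: float
    bottom: float


def crop_rect(box: Box, page_width: int, page_height: int,
              padding: int = 0) -> Tuple[int, int, int, int]:
    """Expand a detection box by `padding` pixels and clamp it to the page.

    Returns integer (left, top, right, bottom) coordinates.
    """
    # Round outward so the crop never cuts into the detected photo.
    left = max(0, math.floor(box.left) - padding)
    top = max(0, math.floor(box.top) - padding)
    right = min(page_width, math.ceil(box.right) + padding)
    bottom = min(page_height, math.ceil(box.bottom) + padding)
    if right <= left or bottom <= top:
        raise ValueError("detection box lies outside the page")
    return (left, top, right, bottom)


if __name__ == "__main__":
    # One detection on a 100 x 300 pixel page, padded by 5 pixels.
    print(crop_rect(Box(10.4, 20.6, 110.2, 220.9), 100, 300, padding=5))
```

The resulting tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so each rectangle could be passed to it directly to save one file per detected photo.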
It also includes quality-of-life features such as folder-based year detection and EXIF date assignment, which make the output easier to organize in photo libraries after export.
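As a rough illustration of folder-based year detection, the sketch below looks for a four-digit year in the folder names above a scan and formats it as an EXIF-style timestamp. This is an assumption about how such a feature could work, not the project's implementation: `year_from_path` and `exif_datetime` are hypothetical helpers, the 1900 to 2099 range is arbitrary, and defaulting to January 1 is a placeholder choice because a folder name carries no month or day.

```python
import re
from pathlib import PurePath
from typing import Optional

# A standalone four-digit year from 1900 to 2099 (assumed range).
YEAR_PATTERN = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def year_from_path(path: str) -> Optional[int]:
    """Return the year found in the closest parent folder name, if any.

    Only folder names are inspected; the file name itself is ignored.
    """
    for part in reversed(PurePath(path).parent.parts):
        match = YEAR_PATTERN.search(part)
        if match:
            return int(match.group(0))
    return None


def exif_datetime(year: int) -> str:
    """Format a year in the EXIF 'YYYY:MM:DD HH:MM:SS' timestamp layout."""
    return f"{year:04d}:01:01 00:00:00"


if __name__ == "__main__":
    year = year_from_path("scans/1987 Summer/page_03.jpg")
    if year is not None:
        print(exif_datetime(year))
```

Writing the timestamp into the exported file would need an EXIF-capable library or tool, which is left out here to keep the sketch dependency-free.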
What It Did Well
- Automated detection and cropping for multi-photo scans
- Batch processing for large collections of album pages
- Folder-based year extraction for automatic EXIF dating
- Visual output for checking detections before trusting a full run
- CLI-first workflow for power users and local processing
Why It Mattered
NerdScan proved the workflow was worth building, but it also showed the limits of a generic open-source approach for this niche problem. It gave me the baseline, the failure cases, and the product insight that later led to a much more specialized model and user experience.
What Came Next
NerdScan is no longer the version I actively maintain. The project evolved into ScanCropper, where I trained a custom model specifically for extracting photos from scanned album pages and wrapped it in a production web app with a much smoother workflow.