I was staring at my Lighthouse report like it owed me money.
Performance: 62.
The culprit? Images. Hundreds of them scattered across markdown files, hosted on Flickr, Imgur, GitHub… all in glorious, unoptimized JPEG and PNG formats.
The manual fix would be:
- Download each image
- Convert to WebP
- Upload to my CDN
- Find and replace every URL in every markdown file
For 500 unique images? That’s not a weekend project. That’s a prison sentence.
So I did what any lazy engineer would do: I automated it.
## The Problem
My blog uses Hugo with markdown files. Images are referenced everywhere: in frontmatter, in gallery shortcodes, and inline in the body.
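For illustration, these are the kinds of references the tool has to find (the URLs here are invented for this example):

```markdown
---
cover: https://i.imgur.com/abc123.jpg
---

![Sunset over the bay](https://live.staticflickr.com/65535/sunset_full.jpg)

<img src="https://raw.githubusercontent.com/user/repo/main/diagram.png" alt="diagram">
```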
Each image had to be:
- Downloaded from the original source
- Converted to WebP (smaller, faster)
- Uploaded to my new CDN
- URL replaced in the markdown file
Multiply by 500. No thanks.
## The Solution: An ETL Pipeline
I built `bulk-webp-url-replacer`, a Python tool that does exactly what it says.
What it does:
- Extract — Scans all `.md` files for image URLs (frontmatter, galleries, inline)
- Transform — Downloads each image and converts to WebP
- Load — Replaces all old URLs with new CDN paths
One command. 500 images. Done.
## The Technical Bits

### Regex Patterns for URL Extraction
Markdown has multiple ways to embed images: inline `![alt](url)` syntax, raw HTML `<img>` tags, and frontmatter fields. My extractor handles them all.
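A minimal sketch of such an extractor, assuming patterns along these lines (the tool's real regexes may differ):

```python
import re

# One pattern per embedding style the post describes:
IMAGE_PATTERNS = [
    re.compile(r'!\[[^\]]*\]\((https?://[^\s)]+)\)'),          # ![alt](url)
    re.compile(r'<img[^>]+src=["\'](https?://[^"\']+)["\']'),  # <img src="...">
    re.compile(r'^\s*image:\s*(https?://\S+)', re.MULTILINE),  # frontmatter field
]

def extract_image_urls(markdown: str) -> set[str]:
    """Return every image URL found by any of the patterns."""
    urls: set[str] = set()
    for pattern in IMAGE_PATTERNS:
        urls.update(pattern.findall(markdown))
    return urls
```

Deduplicating into a set matters here: the same image is often referenced from several posts, and it should only be downloaded and converted once.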
### Parallel Downloads
Downloading 500 images sequentially? Slow. With `ThreadPoolExecutor`, the downloads run on a pool of worker threads instead.
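A sketch of that pattern; `download_image` below is a stand-in for the real fetch-and-convert step:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def download_image(url: str) -> str:
    # Placeholder: the real function would fetch the bytes and
    # write a WebP file. Here it just echoes the URL.
    return url

def download_all(urls, max_workers=8):
    """Run download_image for every URL on a pool of worker threads."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_image, u): u for u in urls}
        for future in as_completed(futures):
            results.append(future.result())
    return results
```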
Eight threads means roughly eight downloads in flight at once, and since the work is I/O-bound, wall-clock time drops close to 8x.
### Rate Limiting & Retries
Imgur wasn’t happy with my enthusiasm. HTTP 429 errors everywhere.
The fix: exponential backoff with browser-like headers.
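A sketch of that retry loop using only the standard library (the exact headers, delays, and retry count in the tool may differ):

```python
import time
import urllib.error
import urllib.request

# Browser-like headers; some hosts throttle obvious bots harder.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; image-migration)"}

def backoff_delay(attempt: int, base: float = 1.0) -> float:
    """Exponential backoff schedule: 1s, 2s, 4s, 8s, ..."""
    return base * (2 ** attempt)

def fetch_with_retries(url: str, max_retries: int = 5) -> bytes:
    for attempt in range(max_retries):
        req = urllib.request.Request(url, headers=HEADERS)
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise          # real errors propagate immediately
            time.sleep(backoff_delay(attempt))  # throttled: wait and retry
    raise RuntimeError(f"gave up on {url} after {max_retries} retries")
```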
### Smart Skipping
The tool saves a `mapping.json` of old-to-new URLs after each run.
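A sketch of how that skip logic can work, assuming the file maps each original URL to its new CDN URL:

```python
import json
from pathlib import Path

def load_mapping(path: Path) -> dict[str, str]:
    """Old-URL -> new-CDN-URL pairs from a previous run, if any."""
    if path.exists():
        return json.loads(path.read_text())
    return {}

def urls_to_process(all_urls, mapping):
    """Skip anything a previous run already migrated."""
    return [u for u in all_urls if u not in mapping]
```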
Next run? It skips already-processed images. Incremental migrations FTW.
## The Results
Before:
- 612 image references across 72 markdown files
- Images scattered across Flickr, Imgur, GitHub
- Lighthouse begging for mercy
After:
- All images converted to WebP
- Hosted on a single CDN
- URLs automatically updated
- One hour of work (mostly watching the progress bar)
Performance improvement:
- Average image size: 60-80% smaller
- Lighthouse Performance: 62 → 89
## Lessons Learned
**Automation scales.** What would take days manually took an hour to build and minutes to run.

**Rate limiting is real.** Always add retries and backoff. Sites like Imgur will throttle you.

**Dry-run first.** The `--dry-run` flag saved me from accidentally breaking 72 files.

**WebP is worth it.** Same quality, fraction of the size. There’s no reason to serve JPEGs in 2026.
## Try It Yourself
The tool is open source on GitHub.
Your Lighthouse score will thank you. 🚀
## Example Output
After running the migration tool, the URLs are automatically updated to point to the optimized WebP versions.
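A hypothetical before/after (URLs invented for illustration):

```markdown
<!-- before -->
![Sunset over the bay](https://i.imgur.com/abc123.jpg)

<!-- after -->
![Sunset over the bay](https://cdn.example.com/images/abc123.webp)
```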
