Scrappy
Bulk job-board scraper with 100+ sites, email enrichment, deterministic quality scoring, and multi-format exports.
Overview
Scrappy is a high-throughput job-board scraper that covers 100+ sites with deterministic quality scoring, email enrichment, and multi-format exports. It is built for bulk-first, scheduled operation, with per-site rate limiting, proxy pools, and resume support.
Key Features
100+ Job Boards
Scrapes listings from over 100 job boards concurrently, applying a separate rate limit to each site and rotating requests across a proxy pool.
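As a rough illustration of per-site rate limiting combined with proxy rotation, the sketch below enforces a minimum interval between requests to the same site and hands out proxies round-robin. The class names (`SiteRateLimiter`, `ProxyPool`), the interval values, and the round-robin policy are assumptions made for this example, not Scrappy's actual implementation.

```python
import itertools
import time


class SiteRateLimiter:
    """Enforces a minimum interval between requests to each site (hypothetical helper)."""

    def __init__(self, min_interval_by_site, default_interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._min_interval = dict(min_interval_by_site)
        self._default = default_interval
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last_request = {}  # site -> time of the most recent permitted request

    def wait(self, site):
        """Block until `site` may be requested again; return the seconds slept."""
        interval = self._min_interval.get(site, self._default)
        now = self._clock()
        last = self._last_request.get(site)
        delay = 0.0 if last is None else max(0.0, last + interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last_request[site] = now + delay
        return delay


class ProxyPool:
    """Round-robin rotation over a fixed list of proxy URLs (assumed policy)."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy pool must not be empty")
        self._cycle = itertools.cycle(list(proxies))

    def next_proxy(self):
        return next(self._cycle)
```

A caller would invoke `limiter.wait(site)` before each request and pass `pool.next_proxy()` to whatever HTTP client is in use. Each site's interval is tracked independently, so a slow site does not throttle the others.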
Email Enrichment
Automatically enriches job listings with recruiter contact information using multiple enrichment strategies.
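One way to combine multiple enrichment strategies is a fallback chain: try each strategy in order and keep the first contact found. The two strategies below (a regex over the listing description and a lookup in a caller-supplied directory) and the field names `contact_email` and `contact_source` are hypothetical; the source does not say which strategies Scrappy actually uses.

```python
import re

# Deliberately simple pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def from_description(listing):
    """Strategy 1 (hypothetical): pull the first email address out of the description."""
    match = EMAIL_RE.search(listing.get("description", ""))
    return match.group(0) if match else None


def make_directory_lookup(directory):
    """Strategy 2 (hypothetical): look the company up in a known-contacts mapping."""
    def directory_lookup(listing):
        return directory.get(listing.get("company", "").lower())
    return directory_lookup


def enrich(listing, strategies):
    """Return a copy of `listing` with the first contact any strategy finds."""
    for strategy in strategies:
        email = strategy(listing)
        if email:
            return {**listing, "contact_email": email,
                    "contact_source": strategy.__name__}
    return {**listing, "contact_email": None, "contact_source": None}
```

Recording which strategy produced the contact makes it possible to audit or weight results later.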
Quality Scoring
A deterministic scoring algorithm ranks listings by relevance, freshness, and completeness; the same input always produces the same score.
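A minimal sketch of what deterministic scoring can look like: a pure function that combines relevance, freshness, and completeness. The weights (0.5, 0.3, 0.2), the field names, and the 30-day freshness window are assumptions for illustration only. The reference date is passed in explicitly instead of being read from the system clock, which is what keeps the result reproducible.

```python
from datetime import date

# Hypothetical set of fields used to judge completeness.
REQUIRED_FIELDS = ("title", "company", "location", "description", "url")


def score_listing(listing, keywords, as_of, max_age_days=30):
    """Return a score in [0, 1] that depends only on the arguments given."""
    text = f"{listing.get('title', '')} {listing.get('description', '')}".lower()
    wanted = [k.lower() for k in keywords]
    # Relevance: fraction of keywords that appear in the title or description.
    relevance = sum(k in text for k in wanted) / len(wanted) if wanted else 0.0

    # Freshness: decays linearly from 1.0 (posted today) to 0.0 at max_age_days.
    age_days = max((as_of - listing["posted"]).days, 0)
    freshness = max(0.0, 1.0 - age_days / max_age_days)

    # Completeness: fraction of required fields that are present and non-empty.
    filled = sum(bool(listing.get(field)) for field in REQUIRED_FIELDS)
    completeness = filled / len(REQUIRED_FIELDS)

    # Assumed weights; rounding avoids noise in the last floating-point digits.
    return round(0.5 * relevance + 0.3 * freshness + 0.2 * completeness, 4)
```

Because the function takes `as_of` as a parameter, re-scoring an archived batch with the original run date gives the original ranking.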
Multi-Format Export
Exports results to CSV, JSONL, XLSX, and Parquet, ready for analysis or pipeline consumption.
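The sketch below shows the general shape of a multi-format exporter using only the Python standard library, so it covers CSV and JSONL only. XLSX and Parquet need third-party libraries and are left out here. The function names and the `EXPORTERS` mapping are assumptions for this example, not Scrappy's API.

```python
import csv
import io
import json


def to_csv(rows, fieldnames):
    """Serialize rows (dicts) to CSV text; keys outside `fieldnames` are dropped."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fieldnames,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_jsonl(rows, fieldnames=None):
    """Serialize rows to JSON Lines: one JSON object per line, keys sorted."""
    return "".join(json.dumps(row, sort_keys=True, ensure_ascii=False) + "\n"
                   for row in rows)


# Hypothetical registry mapping a format name to its serializer.
EXPORTERS = {"csv": to_csv, "jsonl": to_jsonl}


def export(rows, fmt, fieldnames=None):
    """Dispatch to the serializer registered for `fmt`."""
    try:
        exporter = EXPORTERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported export format: {fmt!r}") from None
    return exporter(rows, fieldnames)
```

Sorting keys in the JSONL output keeps exports byte-for-byte stable across runs, which makes diffs between scheduled runs meaningful.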
Resume Support
Scrappy can resume interrupted scraping sessions, so long-running jobs survive connection drops.
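Resume support of this kind is commonly built on a checkpoint file that records which units of work have finished. The sketch below is one such approach, assuming the unit of work is a whole site and the checkpoint is a JSON file; the `Checkpoint` class and `run_sites` function are hypothetical names, and Scrappy's actual session format is not described in the source.

```python
import json
import os
import tempfile


class Checkpoint:
    """Tracks completed sites in a JSON file so a later run can skip them."""

    def __init__(self, path):
        self.path = path
        self.done = set()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.done = set(json.load(fh))

    def mark_done(self, site):
        self.done.add(site)
        # Write to a temporary file, then replace atomically, so a crash
        # mid-write cannot leave a truncated checkpoint behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(sorted(self.done), fh)
        os.replace(tmp_path, self.path)


def run_sites(sites, scrape, checkpoint):
    """Scrape every site not yet recorded as done; return the sites processed."""
    processed = []
    for site in sites:
        if site in checkpoint.done:
            continue  # finished in an earlier session
        scrape(site)  # may raise; the checkpoint then still reflects prior progress
        checkpoint.mark_done(site)
        processed.append(site)
    return processed
```

If a run dies on the third of three sites, reloading the checkpoint and calling `run_sites` again processes only that third site.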
Architecture
Scrappy uses a modular architecture with:
- Per-site scraper modules with configurable rate limits
- Proxy pool management for avoiding IP blocks
- Deterministic scoring engine (same input = same score)
- Pipeline-based export system supporting multiple formats
- Session persistence for resume capability
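To make the modular layout concrete, here is a minimal sketch of how per-site scraper modules could plug into a shared registry, each carrying its own rate-limit setting. The `SiteModule` and `Registry` names, the `requests_per_minute` field, and the `source` tag are assumptions for illustration, not Scrappy's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class SiteModule:
    """One job board: its name, its rate limit, and a function yielding listings."""
    name: str
    requests_per_minute: int
    fetch: Callable[[], Iterable[dict]]


class Registry:
    """Holds the available site modules, keyed by name (hypothetical)."""

    def __init__(self):
        self._modules: Dict[str, SiteModule] = {}

    def register(self, module: SiteModule) -> None:
        if module.name in self._modules:
            raise ValueError(f"duplicate site module: {module.name}")
        self._modules[module.name] = module

    def collect(self, names: Iterable[str]) -> List[dict]:
        """Run the named modules in order and tag each listing with its source."""
        listings = []
        for name in names:
            module = self._modules[name]
            for item in module.fetch():
                listings.append({**item, "source": module.name})
        return listings
```

Keeping the rate limit on the module itself means adding a new board requires no changes to the scheduler or the export pipeline.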
Tech Stack
- Core: TypeScript, Python
- Data: CSV, JSONL, XLSX, Parquet
- Infrastructure: Proxy pools, rate limiters