A Node.js tool that automatically downloads podcast transcripts from Podscribe.ai for entire series, handling even large collections with 700+ episodes efficiently.
Podscribe.ai hosts podcast transcripts but lacks a bulk download option. This script:
- Fetches all episodes for a podcast series via the Podscribe API
- Automates browser interactions to download each transcript
- Organizes and saves all transcripts locally
- Tracks progress to allow resuming interrupted downloads
- Node.js (v16+)
- npm or yarn
```bash
# Install dependencies
npm install

# Run with required Series ID parameter
node podcast-scraper.js --seriesId=123

# Run with additional custom settings
node podcast-scraper.js --seriesId=123 --outputDir=./my-transcripts
```

| Option | CLI Argument | Default | Description |
|---|---|---|---|
| Series ID | `--seriesId` | Required | Podcast series ID |
| Output Directory | `--outputDir` | `./transcripts` | Where transcripts will be saved |
| Download Wait Time | `--downloadWaitTime` | `5000` | Wait time for downloads (ms) |
| Request Delay | `--requestDelay` | `2000` | Delay between requests (ms) |
| Max Retries | `--maxRetries` | `3` | Retry attempts for failed downloads |
| Log File | `--logFile` | `./scraper_log.json` | Progress log location |
| Headless Mode | `--headless` | `false` | Run the browser invisibly when `true` |
For help with all options:

```bash
node podcast-scraper.js --help
```

```bash
# Basic usage with required Series ID
node podcast-scraper.js --seriesId=123

# Custom series and output location
node podcast-scraper.js --seriesId=359 --outputDir=./my-transcripts

# Avoid rate limiting
node podcast-scraper.js --seriesId=123 --requestDelay=5000

# Run without visible browser
node podcast-scraper.js --seriesId=123 --headless=true
```

- Configurable: Command-line options for all settings
- Resilient: Progress tracking and automatic retries
- Rate-Limited: Prevents overwhelming the server
- Organized: Consistent file naming with metadata
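The resilience and rate-limiting behavior can be sketched as a small retry helper that honors the `--maxRetries` and `--requestDelay` settings. `withRetries` and its backoff schedule are illustrative, not the script's exact implementation:

```javascript
// Illustrative retry helper in the spirit of --maxRetries / --requestDelay.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetries(task, { maxRetries = 3, requestDelay = 2000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      // Back off a little longer after each failed attempt.
      if (attempt < maxRetries) await sleep(requestDelay * attempt);
    }
  }
  throw lastError;
}
```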
- Failed Downloads: Increase `--requestDelay` (default: 2000 ms)
- UI Interaction Issues: Run with `--headless=false` to observe the browser
- Timeouts: Increase `--downloadWaitTime` (default: 5000 ms)
- Check Failures: Review `scraper_log.json` for error details
The script uses:
- Puppeteer: Browser automation
- Axios: API requests
- p-throttle: Rate limiting
- fs-extra: File operations
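The rate limiting that p-throttle provides can be approximated with a simple promise chain. This stand-in (not p-throttle's actual API) runs tasks one at a time with a fixed delay between them, which is the essence of the scraper's `--requestDelay` behavior:

```javascript
// Minimal serial rate limiter, similar in spirit to p-throttle.
function createLimiter(delayMs) {
  let chain = Promise.resolve();
  return (task) => {
    const result = chain.then(() => task());
    // The next task waits for this one plus the delay, even if it failed.
    chain = result
      .catch(() => {})
      .then(() => new Promise((resolve) => setTimeout(resolve, delayMs)));
    return result;
  };
}
```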
This tool is for educational purposes. Ensure you have permission to download content and respect website terms of service and rate limits.