# Parakeet ASR Server

A high-performance automatic speech recognition (ASR) server written in Go.
It uses NVIDIA's Parakeet TDT 0.6B model through ONNX Runtime to provide speech-to-text transcription via an OpenAI Whisper-compatible API.
- Overview
- Model Architecture
- Parakeet vs Whisper
- Requirements
- Installation
- Configuration
- API Reference
- Development
- Project Structure
- Troubleshooting
- License
## Overview

Parakeet ASR Server provides a lightweight, production-ready speech recognition service without Python dependencies. It exposes an API compatible with OpenAI's Whisper, making it a drop-in replacement for applications already using that interface.
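Because the API mirrors Whisper's, an existing OpenAI SDK can be pointed at the server by overriding its base URL. A minimal sketch using the community github.com/sashabaranov/go-openai client (one illustrative option, not bundled with this project; any Whisper-compatible client should work):

```go
package main

import (
	"context"
	"fmt"
	"log"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	// Point the client at the local Parakeet server instead of api.openai.com.
	cfg := openai.DefaultConfig("unused") // placeholder token; the docs list no auth requirement
	cfg.BaseURL = "http://localhost:5092/v1"
	client := openai.NewClientWithConfig(cfg)

	resp, err := client.CreateTranscription(context.Background(), openai.AudioRequest{
		Model:    openai.Whisper1, // accepted as a compatibility alias, see GET /v1/models
		FilePath: "audio.wav",
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Text)
}
```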
Key features:
- OpenAI Whisper-compatible REST API
- ONNX Runtime inference (CPU)
- No Python required at runtime; ONNX Runtime is the only native dependency
- Support for multiple response formats (JSON, text, SRT, VTT)
- Multilingual support (English and 25+ languages)
- Quantized model support for reduced memory footprint
## Model Architecture

This server uses the NVIDIA Parakeet TDT 0.6B model, converted to ONNX format by istupakov.
The architecture consists of:
- Encoder: Conformer-based encoder with 1024-dimensional output. Processes 128-dimensional mel filterbank features with 8x temporal subsampling.
- Decoder: Token-and-Duration Transducer (TDT) decoder that jointly predicts tokens and their durations. Uses a 2-layer LSTM with 640-dimensional hidden state.
- Vocabulary: 8193 SentencePiece tokens, including the blank token used by the transducer decoder.
The int8 quantized models require approximately 670MB of disk space and 2GB of RAM during inference.
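To make the data flow concrete, here is the rough shape arithmetic for a 10-second clip. The 10 ms feature hop is an assumption (the usual choice; the actual framing lives in internal/asr/mel.go), while the 128 mel bins and 8x subsampling come from the encoder description above:

```go
package main

import "fmt"

// Back-of-the-envelope feature and encoder shapes for a 10-second,
// 16 kHz clip, assuming a 10 ms mel hop.
func main() {
	const (
		sampleRate  = 16000
		hopSamples  = 160 // 10 ms hop (assumption)
		melBins     = 128 // from the encoder description
		subsampling = 8   // 8x temporal subsampling
	)
	samples := 10 * sampleRate
	melFrames := samples / hopSamples    // 1000 feature frames
	encFrames := melFrames / subsampling // 125 encoder frames
	fmt.Printf("mel: %d x %d, encoder: %d x 1024\n", melFrames, melBins, encFrames)
}
```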
## Parakeet vs Whisper

| Aspect | Parakeet TDT | OpenAI Whisper |
|---|---|---|
| Architecture | Conformer encoder + TDT decoder | Transformer encoder-decoder |
| Decoding | Non-autoregressive (parallel) | Autoregressive (sequential) |
| Speed | Faster inference due to TDT | Slower due to autoregressive decoding |
| Model size | 0.6B parameters | 0.04B - 1.5B parameters |
| Training data | NeMo ASR datasets | 680K hours web audio |
| Primary focus | Accuracy and speed balance | Multilingual robustness |
| Timestamps | Duration-based prediction | Attention-based alignment |
Parakeet TDT uses Token-and-Duration Transducer decoding, which predicts both the token and how many encoder frames to advance in a single step. This allows for faster inference compared to traditional autoregressive decoders while maintaining competitive accuracy.
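A greedy TDT decode loop looks roughly like the sketch below. This is a simplified illustration, not the project's implementation (which lives in internal/asr/transcriber.go); the blank ID value and the jointStep callback are assumptions:

```go
package asr

// blankID is assumed to be the last of the 8193 vocabulary entries.
const blankID = 8192

// jointStep stands in for one call into the decoder_joint ONNX model:
// it returns the best token for the current encoder frame plus the
// predicted duration, i.e. how many encoder frames to advance.
type jointStep func(frame []float32) (token, duration int)

// greedyTDT emits non-blank tokens and jumps ahead by each predicted
// duration, so most encoder frames are never visited. (A real TDT
// decoder also handles duration 0 with a decoder-state update;
// omitted here for clarity.)
func greedyTDT(enc [][]float32, step jointStep) []int {
	var tokens []int
	for t := 0; t < len(enc); {
		tok, dur := step(enc[t])
		if tok != blankID {
			tokens = append(tokens, tok)
		}
		if dur < 1 {
			dur = 1 // always make progress in this simplified loop
		}
		t += dur
	}
	return tokens
}
```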
## Requirements

- ONNX Runtime 1.17.0 or later (required at runtime)
- Parakeet TDT ONNX models (downloaded separately)
For building from source:
- Go 1.21 or later
## Installation

### Installing ONNX Runtime

ONNX Runtime is required at runtime for inference. Choose the installation method for your Linux distribution:
**Ubuntu / Debian**

```bash
# Option 1: Download from GitHub releases (recommended)
curl -L -o onnxruntime.tgz "https://github.com/microsoft/onnxruntime/releases/download/v1.17.0/onnxruntime-linux-x64-1.17.0.tgz"
tar xzf onnxruntime.tgz
sudo cp onnxruntime-linux-x64-1.17.0/lib/* /usr/local/lib/
sudo ldconfig

# Option 2: Using apt (if available in your version)
sudo apt update
sudo apt install libonnxruntime-dev
```

**Fedora / RHEL**

```bash
# Download from GitHub releases
curl -L -o onnxruntime.tgz "https://github.com/microsoft/onnxruntime/releases/download/v1.17.0/onnxruntime-linux-x64-1.17.0.tgz"
tar xzf onnxruntime.tgz
sudo cp onnxruntime-linux-x64-1.17.0/lib/* /usr/local/lib64/
sudo ldconfig
```

**Arch Linux**

```bash
# From the AUR
yay -S onnxruntime

# Or manually
curl -L -o onnxruntime.tgz "https://github.com/microsoft/onnxruntime/releases/download/v1.17.0/onnxruntime-linux-x64-1.17.0.tgz"
tar xzf onnxruntime.tgz
sudo cp onnxruntime-linux-x64-1.17.0/lib/* /usr/local/lib/
sudo ldconfig
```

**Alpine**

```bash
apk add onnxruntime
```

**Generic Linux (manual install)**

```bash
# Download and extract
curl -L -o onnxruntime.tgz "https://github.com/microsoft/onnxruntime/releases/download/v1.17.0/onnxruntime-linux-x64-1.17.0.tgz"
tar xzf onnxruntime.tgz

# Install to /usr/local
sudo cp -r onnxruntime-linux-x64-1.17.0/lib/* /usr/local/lib/
sudo cp -r onnxruntime-linux-x64-1.17.0/include/* /usr/local/include/
sudo ldconfig

# Or set an environment variable to use it from the current directory
export ONNXRUNTIME_LIB=$(pwd)/onnxruntime-linux-x64-1.17.0/lib/libonnxruntime.so
```

**ARM64 (aarch64)**

```bash
curl -L -o onnxruntime.tgz "https://github.com/microsoft/onnxruntime/releases/download/v1.17.0/onnxruntime-linux-aarch64-1.17.0.tgz"
tar xzf onnxruntime.tgz
sudo cp onnxruntime-linux-aarch64-1.17.0/lib/* /usr/local/lib/
sudo ldconfig
```

To verify the installation:

```bash
# Check whether the library is found
ldconfig -p | grep onnxruntime

# Or find the library manually
find /usr -name "libonnxruntime.so*" 2>/dev/null
```

If the library is installed in a non-standard location, set the ONNXRUNTIME_LIB environment variable:

```bash
export ONNXRUNTIME_LIB=/path/to/libonnxruntime.so
```

### Pre-built binaries

Download the latest release for your platform from the Releases page.
```bash
# Linux (amd64)
curl -L -o parakeet https://github.com/achetronic/parakeet/releases/latest/download/parakeet-linux-amd64
chmod +x parakeet

# Download models (using the Makefile)
make models # int8 quantized models (recommended, ~670MB)
# Or for full precision:
make models-fp32 # fp32 models (~2.5GB)

# Run (requires ONNX Runtime - see Installing ONNX Runtime above)
./parakeet -port 5092 -models ./models
```

### Building from source

```bash
# Clone the repository
git clone https://github.com/achetronic/parakeet.git
cd parakeet
# Download models
make models # int8 quantized models (recommended)
# Or: make models-fp32 # full precision models
# Build
make build
# Run
./bin/parakeet
```

### Docker

The Docker image includes ONNX Runtime but requires the models to be mounted at runtime.

```bash
# Pull the image
docker pull ghcr.io/achetronic/parakeet:latest
# Download models locally
mkdir -p models
make models
# Run the container
docker run -d \
--name parakeet \
-p 5092:5092 \
-v $(pwd)/models:/models \
  ghcr.io/achetronic/parakeet:latest
```

Or build the image locally:

```bash
make docker-build
make docker-run
```

### Docker Compose

```yaml
version: '3.8'
services:
parakeet:
image: ghcr.io/achetronic/parakeet:latest
ports:
- "5092:5092"
volumes:
- ./models:/models
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:5092/health"]
interval: 30s
timeout: 3s
      retries: 3
```

## Configuration

### Command-line flags

| Flag | Description | Default | Example |
|---|---|---|---|
| `-port` | HTTP server port | `5092` | `-port 8080` |
| `-models` | Path to models directory | `./models` | `-models /opt/parakeet/models` |
| `-debug` | Enable debug logging (verbose output for troubleshooting) | `false` | `-debug` |
Examples:

```bash
# Basic usage
./parakeet
# Custom port and models directory
./parakeet -port 8080 -models /opt/models
# Enable debug logging for troubleshooting
./parakeet -debug
# Suppress ONNX Runtime schema warnings (stderr) while keeping debug logs
./parakeet -debug 2>&1 | grep -v "Schema error"
```

### Environment variables

| Variable | Description | Default |
|---|---|---|
| `ONNXRUNTIME_LIB` | Path to `libonnxruntime.so` | Auto-detected |
### Model files

The following files are required in the models directory:

| File | Size | Description |
|---|---|---|
| `config.json` | 97 B | Model configuration |
| `vocab.txt` | 94 KB | SentencePiece vocabulary |
| `encoder-model.int8.onnx` | 652 MB | Quantized encoder |
| `decoder_joint-model.int8.onnx` | 18 MB | Quantized TDT decoder |

For full precision models, use `encoder-model.onnx` (requires `encoder-model.onnx.data`, 2.5 GB total) and `decoder_joint-model.onnx` (72 MB).
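When scripting deployments, a preflight check for these files avoids a confusing startup failure. A minimal, hypothetical helper (not part of the server):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// required lists the int8 model files documented above.
var required = []string{
	"config.json",
	"vocab.txt",
	"encoder-model.int8.onnx",
	"decoder_joint-model.int8.onnx",
}

func main() {
	dir := "./models"
	missing := 0
	for _, name := range required {
		if _, err := os.Stat(filepath.Join(dir, name)); err != nil {
			fmt.Printf("missing: %s\n", filepath.Join(dir, name))
			missing++
		}
	}
	if missing > 0 {
		os.Exit(1)
	}
	fmt.Println("all model files present")
}
```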
## API Reference

### POST /v1/audio/transcriptions
Transcribes audio into text. Compatible with OpenAI's Whisper API.
#### Request

Content-Type: `multipart/form-data`
| Parameter | Type | Required | Description |
|---|---|---|---|
| `file` | file | Yes | Audio file (WAV format, max 25 MB) |
| `model` | string | No | Model name (accepted but ignored) |
| `language` | string | No | ISO 639-1 language code (default: `en`) |
| `response_format` | string | No | Output format: `json`, `text`, `srt`, `vtt`, `verbose_json` |
| `prompt` | string | No | Accepted but ignored |
| `temperature` | float | No | Accepted but ignored |
#### Response

JSON format (default):

```json
{
"text": "transcribed text here"
}
```

Verbose JSON format:

```json
{
"task": "transcribe",
"language": "en",
"duration": 5.2,
"text": "transcribed text here",
"segments": [
{
"id": 0,
"start": 0,
"end": 5.2,
"text": "transcribed text here"
}
]
}
```

#### Example

```bash
curl -X POST http://localhost:5092/v1/audio/transcriptions \
-F file=@audio.wav \
-F language=en \
  -F response_format=json
```
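The same request from plain Go, using only the standard library; the form fields follow the parameter table above:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"mime/multipart"
	"net/http"
	"os"
)

func main() {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	// Attach the audio under the required "file" form field.
	part, err := w.CreateFormFile("file", "audio.wav")
	if err != nil {
		log.Fatal(err)
	}
	f, err := os.Open("audio.wav")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if _, err := io.Copy(part, f); err != nil {
		log.Fatal(err)
	}

	// Optional fields from the parameter table.
	w.WriteField("language", "en")
	w.WriteField("response_format", "json")
	w.Close()

	resp, err := http.Post("http://localhost:5092/v1/audio/transcriptions",
		w.FormDataContentType(), &body)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // {"text": "..."}
}
```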
### GET /v1/models

Lists the available models: `parakeet-tdt-0.6b` and `whisper-1` (an alias kept for compatibility).
### GET /health

Returns `{"status": "ok"}` if the server is running.
## Development

```bash
make help # Show all available targets
# Build
make build # Build the binary
make build-static # Build statically linked binary
# Development
make run # Build and run
make run-dev # Run with development settings
make clean # Remove build artifacts
# Code quality
make fmt # Format code
make vet # Run go vet
make lint # Run all linters
make test # Run tests
make test-coverage # Run tests with coverage report
# Models
make models # Download int8 models (default)
make models-int8 # Download int8 quantized models
make models-fp32 # Download full precision models
# Docker
make docker-build # Build Docker image
make docker-run # Run Docker container
# Release
make release # Build binaries for all platforms
```

### Running tests

```bash
make test
# With coverage
make test-coverage
open coverage.html
```

## Project Structure

```
parakeet/
├── main.go # HTTP server and API handlers
├── internal/
│ └── asr/
│ ├── transcriber.go # ONNX inference pipeline
│ ├── mel.go # Mel filterbank feature extraction
│ └── audio.go # WAV parsing and resampling
├── models/ # ONNX models (not in repository)
├── Dockerfile
├── Makefile
├── .github/
│ └── workflows/
│ ├── ci.yaml # CI pipeline
│ └── release.yaml # Release pipeline
└── README.md
```

## Troubleshooting

**ONNX Runtime library not found**

Install ONNX Runtime or set the library path:

```bash
export ONNXRUNTIME_LIB=/path/to/libonnxruntime.so
```

Common installation locations:

- `/usr/lib/libonnxruntime.so`
- `/usr/local/lib/libonnxruntime.so`
- `/opt/onnxruntime/lib/libonnxruntime.so`
**Models not found**

Download the models:

```bash
make models
```

**High memory usage**

Use the int8 quantized models (the default) instead of fp32. The int8 models require approximately 2 GB of RAM versus 6 GB for fp32.
**Unsupported audio format**

Currently only WAV format is supported. Convert other formats using ffmpeg:

```bash
ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
```

## License

- Code: MIT License
- Parakeet Model: CC-BY-4.0
Credits:

- NVIDIA - Original Parakeet TDT 0.6B model
- Ivan Stupakov (@istupakov) - ONNX conversion of the Parakeet model