The ultimate portable, offline all-in-one audio studio
Text-to-Speech · Transcription · Subtitles · Music Generation · Sound Effects · Video Production · AI Chatbot
LocalSoundsAPI gives you both a full-featured browser-based web interface and a complete local REST API — use it interactively or call it from scripts, other apps, or automation tools.
Everything runs locally from one folder — no installation, no internet needed after setup.
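Because the REST API is served locally, a script can reach it with nothing but the Python standard library. The sketch below only builds a request and does not send it. The `/api/tts` path and the `text` and `voice` fields are hypothetical placeholders rather than documented endpoints, so look up the real names in the app before relying on them. Port 5006 is the single-instance default.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5006"  # default port of the single-instance launcher


def build_json_request(path, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the local API."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/" + path.lstrip("/"),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and field names, for illustration only.
request = build_json_request(
    "/api/tts", {"text": "Hello from a script", "voice": "example"}
)
print(request.get_method(), request.full_url)

# To actually send it while the app is running:
# with urllib.request.urlopen(request, timeout=60) as response:
#     print(response.status)
```

Keeping request construction separate from sending makes it easy to inspect or log what a script is about to post before any model starts working.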
- XTTS v2 – Top-tier multilingual voice cloning with speaker embeddings
- Fish Speech – Extremely fast and expressive cloned voices
- Kokoro 82M – Lightning-fast English TTS with 20 premium built-in voices
- Stable Audio Open 1.0 – Text-to-music and sound effects (CLAP-scored variants)
- ACE-Step 3.5B – Advanced multi-line prompt music generation (style + lyrics)
- Whisper – On-demand transcription & quality verification for every generated chunk
- Local LLM Chatbot – Built-in llama.cpp assistant for writing prompts, scripts, lyrics, stories, and full projects
- OpenRouter / LM Studio support – Optional cloud or external local backends for the chatbot
- Professional post-processing on every engine
  De-reverb, de-essing, loudness normalization (-23 LUFS), intelligent silence trimming, peak limiting, and optional Whisper verification with automatic retries.
- Full project system
  Save jobs with progress tracking, automatic recovery (##recover##), and persistent `job.json` files.
- Powerful built-in Chatbot
  Helps you write perfect prompts, lyrics, stories, or entire scripts. Responses can be sent directly to any TTS or music engine with one click.
- Per-model device selection
  Every model (XTTS, Fish, Kokoro, Stable Audio, ACE-Step, Whisper, local LLM) can be loaded on CPU or any available GPU independently, which is ideal for mixing heavy and light models.
- Run multiple instances
  Use `(portable) LocalSoundsAPI-Multi.bat` to launch several copies on different ports, which is useful for parallel generation or different model setups.
- Video production tool
  Turn any audio + transcription into a subtitled video (horizontal/vertical, solid color, transparent, or image/video background).
- Settings presets – Save and load all your favorite parameters instantly.
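To make the -23 LUFS loudness target concrete, here is a minimal sketch of the arithmetic behind loudness normalization. It assumes the integrated loudness of a clip has already been measured by some meter. The function names are illustrative, and this is not a description of how the app's own post-processing is implemented.

```python
TARGET_LUFS = -23.0  # loudness target quoted in the feature list


def normalization_gain_db(measured_lufs, target_lufs=TARGET_LUFS):
    """Gain in dB that moves a clip from its measured loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Convert a gain in dB to the linear factor applied to sample amplitudes."""
    return 10 ** (gain_db / 20)


# Example: a clip measured at -18 LUFS is too loud and must be attenuated.
gain_db = normalization_gain_db(-18.0)
scale = db_to_linear(gain_db)
print(f"gain: {gain_db:+.1f} dB, amplitude scale: {scale:.3f}")
```

A clip quieter than the target gets a positive gain instead, which is one reason a peak limiter usually runs after normalization.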
- Download the repository code
  Go to the main repo → Code → Download ZIP.
  Extract it to any folder you like (e.g., Desktop, Documents, or a USB drive). This is your main project folder.
- Download the portable binaries from Releases
  Go to Releases and download:
  - `portable-python-env-v1.7z`
  - `bin.zip`
- Extract the binaries correctly
  - Extract `portable-python-env-v1.7z` directly into your main project folder → it creates the `python/` subfolder.
  - Extract `bin.zip` into the existing `bin/` folder (inside your main project folder) → it populates `bin/ffmpeg/`, `bin/rubberband/`, and `bin/espeak-ng/`.
- Launch the app
  - Single instance (recommended for most users):
    Double-click `(portable) LocalSoundsAPI-Single.bat`
    → It always starts on port 5006 and opens http://127.0.0.1:5006 in your browser.
  - Multiple instances (for running several generations in parallel):
    Double-click `(portable) LocalSoundsAPI-Multi.bat`
    → It will ask you:
    • How many instances do you want?
    • Starting from which port? (e.g., 5006, 5007, 5008...)
    Each instance gets its own port and browser tab.
- First run only: The app auto-downloads models (~8–12 GB total) the first time each one is needed. Each model is downloaded only once, and the downloads can take 10–40 minutes. Just let them finish.
That's it – completely offline and portable after the first run!
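When several instances are running, a small script can work out their addresses and check which ones are up. The sketch below assumes the multi-instance launcher assigns consecutive ports counting up from the starting port, as the 5006, 5007, 5008 example suggests. The helper names are illustrative and not part of the app.

```python
import socket


def instance_urls(count, start_port=5006, host="127.0.0.1"):
    """Return one base URL per instance, assuming consecutive ports."""
    return [f"http://{host}:{start_port + i}" for i in range(count)]


def is_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Report the state of three instances started from port 5006.
for offset, url in enumerate(instance_urls(3)):
    state = "up" if is_listening(5006 + offset) else "not reachable"
    print(url, state)
```

A plain TCP connect only shows that a port is open. It does not confirm that the listener is LocalSoundsAPI rather than some other program.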
- `models/` – Place or auto-download TTS/music models here
- `voices/` – Your reference voice samples for cloning
- `projects_output/` – All saved jobs and final outputs
- `brain/` – Chatbot history, archives, and system prompts
- `settings/` – Your saved parameter presets
- `bin/` – Bundled ffmpeg, rubberband, eSpeak-ng
- `python/` – Complete portable Python environment
project-root/
├── ACE-Step/ # Bundled ACE-Step repo (music generation)
├── bin/ # Portable tools
│ ├── ffmpeg/
│ ├── rubberband/
│ └── espeak-ng/
├── brain/ # Chatbot memory
│ ├── context_history/ # Current + archived chats
│ └── system_prompt.json
├── fish-speech/ # Bundled Fish Speech repo
├── models/ # All models (auto-downloaded or placed here)
│ ├── XTTS-v2/
│ ├── fish-speech-1.5/
│ ├── kokoro-82m/
│ ├── stable-audio-open-1.0/
│ ├── ace_step/
│ └── clap-htsat-unfused/
├── projects_output/ # Saved jobs and final outputs
├── voices/ # Your reference voice samples
├── settings/ # Saved parameter presets
├── static/ # Web UI (CSS, JS, icons)
├── templates/ # HTML pages
├── routes/ # All Flask endpoints
├── python/ # Portable Python environment (from the 7z)
├── (portable) LocalSoundsAPI-Single.bat
├── (portable) LocalSoundsAPI-Multi.bat
├── main.py
├── config.py
└── requirements.txt
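A common setup mistake is extracting an archive one level too deep. The sketch below checks that the folders created by the two archives (`python/`, `bin/ffmpeg/`, `bin/rubberband/`, `bin/espeak-ng/`) exist under a project root. It is a hypothetical helper, not something shipped with the app, and it is demonstrated here on a temporary folder rather than a real install.

```python
import tempfile
from pathlib import Path

# Folders the two archives are expected to create, relative to the project root.
EXPECTED_DIRS = ["python", "bin/ffmpeg", "bin/rubberband", "bin/espeak-ng"]


def missing_dirs(project_root, expected=EXPECTED_DIRS):
    """Return the expected folders that are absent under project_root."""
    root = Path(project_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


# Demo on a throwaway folder that only contains part of the layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "python").mkdir()
    (Path(tmp) / "bin" / "ffmpeg").mkdir(parents=True)
    print("missing:", missing_dirs(tmp))
```

Pointing `missing_dirs` at the real project folder should return an empty list once both archives have been extracted into the right place.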
- Completely self-contained – The bundled portable Python environment is isolated from your system Python. No pip installs, no conda environments, no dependency conflicts, no PATH headaches. Just extract and run.
- Truly offline – After the one-time model downloads, everything works 100% without internet.
- No admin rights needed – Perfect for work/school computers or USB stick setups.
- Instant multi-GPU support – Load heavy models on your best GPU and lighter ones (Whisper, Kokoro, Fish) on another or on CPU — all from the same interface.
- First run? Let the app auto-download the models you need (XTTS, Fish, Kokoro, Stable Audio, ACE-Step, CLAP, Whisper). It only happens once per model.
- Low VRAM? Use the per-model device selectors — keep big models on your strongest GPU and run Whisper/Kokoro on CPU or a smaller card.
- Want to generate faster? Launch multiple instances with `(portable) LocalSoundsAPI-Multi.bat`: one for TTS, one for music, one for the chatbot, etc.
- Chatbot for content creation – Stuck on a prompt or lyric? Ask the built-in assistant, then click the little icons under its reply to send the text straight to XTTS, Fish, Kokoro, Stable Audio, or ACE-Step.
- Save everything you like – Use the “Save Path” field to create permanent projects in `projects_output/`. Temporary generations disappear when you close the app (unless saved).
Enjoy a clean, powerful, completely local creative workflow — no cloud, no subscriptions, no compromises! 🎧✨