Portable offline AI audio studio with web UI & local API – XTTS, Fish Speech, Kokoro, Stable Audio, ACE-Step, voice cloning, music gen (no install)

rookiemann/LocalSoundsAPI

LocalSoundsAPI

The ultimate portable, offline all-in-one audio studio
Text-to-Speech · Transcription · Subtitles · Music Generation · Sound Effects · Video Production · AI Chatbot

LocalSoundsAPI gives you both a full-featured browser-based web interface and a complete local REST API — use it interactively or call it from scripts, other apps, or automation tools.
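
Before scripting against the local API, it can help to confirm that an instance is actually listening. The sketch below only checks that something accepts TCP connections on the port; it assumes the default single-instance address 127.0.0.1:5006 described in the Quick Start. The function name `is_listening` is made up for this example and is not part of LocalSoundsAPI.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 5006, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration; it does not call any
    LocalSoundsAPI endpoint, it only opens and closes a TCP connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("Instance reachable on port 5006:", is_listening())
```

If this prints `False`, start the app first and try again once the browser tab has opened.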

Everything runs locally from one folder — no installation, no internet needed after setup.

Included Engines (all fully local & offline)

  • XTTS v2 – Top-tier multilingual voice cloning with speaker embeddings
  • Fish Speech – Extremely fast and expressive cloned voices
  • Kokoro 82M – Lightning-fast English TTS with 20 premium built-in voices
  • Stable Audio Open 1.0 – Text-to-music and sound effects (CLAP-scored variants)
  • ACE-Step 3.5B – Advanced multi-line prompt music generation (style + lyrics)
  • Whisper – On-demand transcription & quality verification for every generated chunk
  • Local LLM Chatbot – Built-in llama.cpp assistant for writing prompts, scripts, lyrics, stories, and full projects
  • OpenRouter / LM Studio support – Optional cloud (OpenRouter) or external local (LM Studio) backends for the chatbot

Key Features

  • Professional post-processing on every engine
    De-reverb, de-essing, loudness normalization (-23 LUFS), intelligent silence trimming, peak limiting, and optional Whisper verification with automatic retries.

  • Full project system
    Save jobs with progress tracking, automatic recovery (##recover##), and persistent job.json files.

  • Powerful built-in Chatbot
    Helps you write perfect prompts, lyrics, stories, or entire scripts. Responses can be sent directly to any TTS or music engine with one click.

  • Per-model device selection
    Every model (XTTS, Fish, Kokoro, Stable Audio, ACE-Step, Whisper, local LLM) can be loaded on CPU or any available GPU independently — perfect for mixing heavy and light models.

  • Run multiple instances
    Use (portable) LocalSoundsAPI-Multi.bat to launch several copies on different ports — great for parallel generation or different model setups.

  • Video production tool
    Turn any audio + transcription into a subtitled video (horizontal/vertical, solid color, transparent, or image/video background).

  • Settings presets – Save and load all your favorite parameters instantly.
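
The "Whisper verification with automatic retries" idea from the post-processing feature can be sketched roughly as follows. This is a minimal illustration, not the app's actual implementation: the `synthesize` and `transcribe` callables, the 0.9 threshold, and the three-attempt limit are all assumptions made for the example. The similarity score uses Python's standard `difflib`, which may differ from whatever metric the app uses.

```python
from difflib import SequenceMatcher
from typing import Any, Callable, Tuple


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(kept.split())


def similarity(expected: str, heard: str) -> float:
    """Similarity ratio in [0, 1] between the prompt and the transcription."""
    return SequenceMatcher(None, normalize(expected), normalize(heard)).ratio()


def generate_verified(
    text: str,
    synthesize: Callable[[str], Any],   # hypothetical: text -> audio
    transcribe: Callable[[Any], str],   # hypothetical: audio -> text
    threshold: float = 0.9,             # assumed acceptance threshold
    max_attempts: int = 3,              # assumed retry limit
) -> Tuple[Any, float]:
    """Regenerate until the transcription matches the text well enough.

    Returns the best-scoring audio seen and its score, even if no
    attempt reached the threshold.
    """
    best_audio, best_score = None, -1.0
    for _ in range(max_attempts):
        audio = synthesize(text)
        score = similarity(text, transcribe(audio))
        if score > best_score:
            best_audio, best_score = audio, score
        if score >= threshold:
            break
    return best_audio, best_score
```

Keeping the best attempt rather than the last one means a failed verification still returns the closest result instead of nothing.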

Quick Start – Fully Portable (No Installation)

  1. Download the repository code
    Go to the main repo → Code → Download ZIP.
    Extract it to any folder you like (e.g., Desktop, Documents, or a USB drive). This is your main project folder.

  2. Download the portable binaries from Releases
    Go to Releases and download:

    • portable-python-env-v1.7z
    • bin.zip
  3. Extract the binaries correctly

    • Extract portable-python-env-v1.7z directly into your main project folder → it creates the python/ subfolder.
    • Extract bin.zip into the existing bin/ folder (inside your main project folder) → it populates bin/ffmpeg/, bin/rubberband/, and bin/espeak-ng/.
  4. Launch the app

    • Single instance (recommended for most users):
      Double-click (portable) LocalSoundsAPI-Single.bat
      → It always starts on port 5006 and opens http://127.0.0.1:5006 in your browser.

    • Multiple instances (for running several generations in parallel):
      Double-click (portable) LocalSoundsAPI-Multi.bat
      → It will ask you:
      • How many instances do you want?
      • Starting from which port? (e.g., 5006, 5007, 5008...)
      Each instance gets its own port and browser tab.
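
If you drive several instances from a script, you need the same list of addresses the Multi launcher ends up with. The sketch below assumes that instances are given consecutive ports counting up from the starting port, as the example ports above suggest. The helper name `instance_urls` is hypothetical.

```python
from typing import List


def instance_urls(count: int, start_port: int = 5006, host: str = "127.0.0.1") -> List[str]:
    """Return one base URL per instance, assuming consecutive ports."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if start_port < 1 or start_port + count - 1 > 65535:
        raise ValueError("port range falls outside 1-65535")
    return [f"http://{host}:{start_port + i}" for i in range(count)]


if __name__ == "__main__":
    for url in instance_urls(3):
        print(url)
```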

First run only: The app auto-downloads each model the first time it is needed (~8–12 GB in total). Each model is downloaded only once, and this can take 10–40 minutes. Just let it finish.

That's it – completely offline and portable after the first run!

Important Folders

  • models/ – Place or auto-download TTS/music models here
  • voices/ – Your reference voice samples for cloning
  • projects_output/ – All saved jobs and final outputs
  • brain/ – Chatbot history, archives, and system prompts
  • settings/ – Your saved parameter presets
  • bin/ – Bundled ffmpeg, rubberband, eSpeak-ng
  • python/ – Complete portable Python environment
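
The bundled ffmpeg in bin/ can also be called from your own scripts. As a rough sketch of the -23 LUFS loudness normalization mentioned under Key Features, the helper below builds a command line around ffmpeg's `loudnorm` filter. This is an assumption about one way to reach that target, not a description of the app's own pipeline. The executable path, the file names, and the -1 dBTP true-peak ceiling are placeholders chosen for the example.

```python
from typing import List


def loudnorm_command(
    ffmpeg: str,
    src: str,
    dst: str,
    target_lufs: float = -23.0,   # integrated loudness target from the README
    true_peak_db: float = -1.0,   # assumed true-peak ceiling
) -> List[str]:
    """Build an ffmpeg argument list that applies single-pass loudnorm."""
    audio_filter = f"loudnorm=I={target_lufs}:TP={true_peak_db}"
    # -y overwrites dst if it exists; -af applies the audio filter chain.
    return [ffmpeg, "-y", "-i", src, "-af", audio_filter, dst]


if __name__ == "__main__":
    # Placeholder paths; adjust to where ffmpeg actually lives in bin/ffmpeg/.
    cmd = loudnorm_command("bin/ffmpeg/ffmpeg.exe", "input.wav", "normalized.wav")
    print(" ".join(cmd))
    # To run it for real: import subprocess; subprocess.run(cmd, check=True)
```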

Project Structure

project-root/
├── ACE-Step/                  # Bundled ACE-Step repo (music generation)
├── bin/                       # Portable tools
│   ├── ffmpeg/
│   ├── rubberband/
│   └── espeak-ng/
├── brain/                     # Chatbot memory
│   ├── context_history/       # Current + archived chats
│   └── system_prompt.json
├── fish-speech/               # Bundled Fish Speech repo
├── models/                    # All models (auto-downloaded or placed here)
│   ├── XTTS-v2/
│   ├── fish-speech-1.5/
│   ├── kokoro-82m/
│   ├── stable-audio-open-1.0/
│   ├── ace_step/
│   └── clap-htsat-unfused/
├── projects_output/           # Saved jobs and final outputs
├── voices/                    # Your reference voice samples
├── settings/                  # Saved parameter presets
├── static/                    # Web UI (CSS, JS, icons)
├── templates/                 # HTML pages
├── routes/                    # All Flask endpoints
├── python/                    # Portable Python environment (from the 7z)
├── (portable) LocalSoundsAPI-Single.bat
├── (portable) LocalSoundsAPI-Multi.bat
├── main.py
├── config.py
└── requirements.txt
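
A common setup mistake is extracting the two release archives into the wrong place. The sketch below checks a project folder for the entries the Quick Start says must exist after extraction (python/ and the three bin/ subfolders), plus main.py and config.py from the tree above. The function name and the choice of which entries to check are assumptions made for this example.

```python
from pathlib import Path
from typing import List, Union

# Folders created by extracting portable-python-env-v1.7z and bin.zip.
EXPECTED_DIRS = ["python", "bin/ffmpeg", "bin/rubberband", "bin/espeak-ng"]
# Files that come with the repository ZIP.
EXPECTED_FILES = ["main.py", "config.py"]


def missing_entries(root: Union[str, Path]) -> List[str]:
    """Return the expected folders and files that are absent under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_entries(".")
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Layout looks complete.")
```

Run it from the main project folder. An empty result only means the folders exist, not that their contents are complete.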

Why This Feels So Smooth

  • Completely self-contained – The bundled portable Python environment is isolated from your system Python. No pip installs, no conda environments, no dependency conflicts, no PATH headaches. Just extract and run.
  • Truly offline – After the initial model downloads (which happen only once per model), everything works 100% without internet.
  • No admin rights needed – Perfect for work/school computers or USB stick setups.
  • Instant multi-GPU support – Load heavy models on your best GPU and lighter ones (Whisper, Kokoro, Fish) on another or on CPU — all from the same interface.
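
To make the per-model device idea concrete, here is a small sketch of a device plan with a CPU fallback. The model keys, the PyTorch-style device strings ("cpu", "cuda:0"), and the fallback rule are assumptions for illustration. In the app itself you pick devices in the web interface, and its internal names may differ.

```python
import re
from typing import Dict

# Accepts "cpu" or PyTorch-style "cuda:<index>" strings.
DEVICE_PATTERN = re.compile(r"^(cpu|cuda:\d+)$")


def resolve_devices(device_map: Dict[str, str], gpu_count: int) -> Dict[str, str]:
    """Return a copy of device_map where unavailable GPUs fall back to CPU.

    gpu_count is the number of visible GPUs, for example the value of
    torch.cuda.device_count() when PyTorch is installed.
    """
    resolved = {}
    for model, device in device_map.items():
        if not DEVICE_PATTERN.match(device):
            raise ValueError(f"unrecognized device {device!r} for {model!r}")
        if device != "cpu" and int(device.split(":")[1]) >= gpu_count:
            device = "cpu"  # requested GPU index does not exist on this machine
        resolved[model] = device
    return resolved


if __name__ == "__main__":
    plan = {"xtts": "cuda:0", "ace_step": "cuda:0", "whisper": "cuda:1", "kokoro": "cpu"}
    print(resolve_devices(plan, gpu_count=1))
```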

Tips for the Best Experience

  • First run? Let the app auto-download the models you need (XTTS, Fish, Kokoro, Stable Audio, ACE-Step, CLAP, Whisper). It only happens once per model.
  • Low VRAM? Use the per-model device selectors — keep big models on your strongest GPU and run Whisper/Kokoro on CPU or a smaller card.
  • Want to generate faster? Launch multiple instances with (portable) LocalSoundsAPI-Multi.bat, for example one for TTS, one for music, and one for the chatbot.
  • Chatbot for content creation – Stuck on a prompt or lyric? Ask the built-in assistant — then click the little icons under its reply to send the text straight to XTTS, Fish, Kokoro, Stable Audio, or ACE-Step.
  • Save everything you like – Use the “Save Path” field to create permanent projects in projects_output/. Unsaved generations are temporary and disappear when you close the app.

Enjoy a clean, powerful, completely local creative workflow — no cloud, no subscriptions, no compromises! 🎧✨