Run LLMs in the browser with automatic local/cloud fallback.
Install:

```bash
npm install @webllm-io/sdk
```

Quick start:

```js
import { createClient } from '@webllm-io/sdk';

const ai = createClient();

const response = await ai.chat.completions.create({
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(response.choices[0].message.content);
```

Configure a cloud fallback endpoint and stream the response:

```js
const ai = createClient({
  cloud: '/api/chat/completions',
});

// Uses local AI when available, falls back to cloud
const stream = await ai.chat.completions.create({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
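`process.stdout` exists only in Node. In a browser page you would accumulate the streamed deltas and render them instead; a minimal sketch of that pattern, assuming a hypothetical `#output` element on the page:

```ts
import { createClient } from '@webllm-io/sdk';

const ai = createClient({ cloud: '/api/chat/completions' });
const output = document.querySelector('#output')!; // hypothetical element

const stream = await ai.chat.completions.create({
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});

let text = '';
for await (const chunk of stream) {
  text += chunk.choices[0]?.delta?.content || '';
  output.textContent = text; // progressively render the accumulated text
}
```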
Cloud-only (disable local inference entirely):

```js
const ai = createClient({
  local: false,
  cloud: {
    baseURL: 'https://api.openai.com/v1',
    apiKey: 'sk-...',
    model: 'gpt-4o-mini',
  },
});
```
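Note that a hard-coded `apiKey` ships to every visitor in the browser bundle; the relative `cloud: '/api/chat/completions'` form above exists so a same-origin backend can hold the secret instead. A sketch of such a proxy, assuming a Node-style server runtime with Web-standard `Request`/`Response` handlers (the route and env var name are assumptions, not part of the SDK):

```ts
// Hypothetical same-origin proxy behind '/api/chat/completions'.
// Keeps the upstream API key on the server; the browser never sees it.
export async function handleChatCompletions(req: Request): Promise<Response> {
  return fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`, // server-side secret
    },
    body: await req.text(), // forward the client's JSON body unchanged
  });
}
```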
Per-tier local models, with download progress reporting:

```js
const ai = createClient({
  local: {
    tiers: {
      high: 'Llama-3-8B-q4f32', // S/A-grade devices
      medium: 'auto',           // B-grade, SDK picks
      low: null,                // C-grade, skip local
    },
  },
  cloud: {
    baseURL: 'https://api.my-backend.com/v1',
    apiKey: token,
    model: 'gpt-4o-mini',
  },
  onProgress: (p) => console.log(`${p.stage}: ${Math.round(p.progress * 100)}%`),
});
```
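`onProgress` is handy for download UI. A sketch that drives a `<progress>` element instead of the console; only `p.stage` and `p.progress` come from the API above, while the element and its `max="1"` are assumptions:

```ts
import { createClient } from '@webllm-io/sdk';

// Assumes <progress id="model-progress" max="1"> somewhere on the page.
const bar = document.querySelector<HTMLProgressElement>('#model-progress')!;

const ai = createClient({
  cloud: '/api/chat/completions',
  onProgress: (p) => {
    bar.value = p.progress; // 0..1, per the percentage math above
    bar.title = p.stage;    // surface the current stage on hover
  },
});
```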
Cancel an in-flight request with a standard `AbortController`:

```js
const controller = new AbortController();

const stream = ai.chat.completions.create({
  messages: [...],
  stream: true,
  signal: controller.signal,
});

// Cancel anytime
controller.abort();
```
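When a request is aborted, the pending promise rejects. Assuming the SDK follows the `fetch` convention of throwing a `DOMException` named `'AbortError'` (an assumption, not confirmed above), cancellation can be told apart from real failures:

```ts
import { createClient } from '@webllm-io/sdk';

const ai = createClient({ cloud: '/api/chat/completions' });
const controller = new AbortController();

try {
  const stream = await ai.chat.completions.create({
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true,
    signal: controller.signal,
  });
  for await (const chunk of stream) {
    console.log(chunk.choices[0]?.delta?.content || '');
  }
} catch (err) {
  // Assumption: aborts surface as DOMException('AbortError'), like fetch.
  if (err instanceof DOMException && err.name === 'AbortError') {
    console.log('generation cancelled');
  } else {
    throw err; // real errors still propagate
  }
}
```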
Inspect device capability directly:

```js
import { checkCapability } from '@webllm-io/sdk';

const report = await checkCapability();
console.log(report.webgpu); // true/false
console.log(report.grade);  // 'S' | 'A' | 'B' | 'C'
console.log(report.gpu);    // { vendor, name, vram }
```
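The report can also drive configuration up front, for example skipping local inference on C-grade devices before any weights download. A sketch using only the APIs shown above:

```ts
import { checkCapability, createClient } from '@webllm-io/sdk';

const report = await checkCapability();

// On C-grade devices, don't bother loading local weights at all.
const ai = report.grade === 'C'
  ? createClient({ local: false, cloud: '/api/chat/completions' })
  : createClient({ cloud: '/api/chat/completions' });
```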
Bring your own providers:

```js
import { createClient } from '@webllm-io/sdk';
import { mlc } from '@webllm-io/sdk/providers/mlc';
import { fetchSSE } from '@webllm-io/sdk/providers/fetch';

const ai = createClient({
  local: mlc({ model: 'Qwen2.5-3B-Instruct-q4f16_1-MLC' }),
  cloud: fetchSSE({ baseURL: 'https://api.example.com/v1', apiKey: '...' }),
});
```

How it works:

- Device Detection — Checks WebGPU, estimates VRAM, scores the device (S/A/B/C)
- Model Selection — Picks appropriate model based on device grade
- WebWorker Inference — Runs MLC inference in a Web Worker (UI stays responsive)
- Smart Routing — Routes to local or cloud based on device capability, battery, and network (a simplified sketch follows this list)
- Automatic Fallback — Local failure → cloud; cloud failure → local (if ready)
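The SDK's actual routing is internal; purely as an illustration, a simplified decision function over the signals named above might look like this (every name here is hypothetical):

```ts
// Hypothetical sketch of the routing decision; not the SDK's implementation.
type Route = 'local' | 'cloud';

interface Signals {
  grade: 'S' | 'A' | 'B' | 'C'; // from device detection
  localModelReady: boolean;     // weights downloaded, worker warm
  batteryLow: boolean;          // prefer offloading work when true
  offline: boolean;             // cloud unreachable
}

function pickRoute(s: Signals): Route {
  if (s.grade === 'C') return 'cloud';                 // device too weak, skip local
  if (s.localModelReady && s.offline) return 'local';  // only local works offline
  if (!s.localModelReady) return 'cloud';              // fall back until weights load
  if (s.batteryLow) return 'cloud';                    // spare the battery
  return 'local';                                      // capable device, model ready
}
```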
Development:

```bash
pnpm install
pnpm build
pnpm test
```

| App | Description | Dev Command | Port |
|---|---|---|---|
| `apps/web` | Landing page | `pnpm --filter @webllm-io/web dev` | 4321 |
| `apps/docs` | Documentation (Starlight) | `pnpm --filter @webllm-io/docs dev` | 4322 |
| `apps/playground` | Interactive playground | `pnpm --filter @webllm-io/playground dev` | 5173 |
License: MIT