Description
Name and Version
$ llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
version: 7360 (53ecd4f)
built with GNU 12.2.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
Ryzen 9 5950X + NVIDIA GeForce RTX 3090 Ti
Models
Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf
Problem description & steps to reproduce
When I run llama-server in router mode and connect Claude Code to it, the server itself works, but Claude Code immediately returns an error:
⎿ API Error: 400 {"error":{"code":400,"message":"model not found","type":"invalid_request_error"}}
When running llama-server in the normal single-model fashion (as it worked before router mode), Claude Code works fine.
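For reference, the same request can be reproduced outside Claude Code with curl. This is a minimal sketch that assumes the router is listening on the default 127.0.0.1:8080 and that Claude Code is configured with the model name "Devstral" (the name visible in the request log below); adjust host, port, and model name to your setup.

$ curl -sS http://127.0.0.1:8080/v1/messages \
    -H "Content-Type: application/json" \
    -d '{"model":"Devstral","max_tokens":32,"messages":[{"role":"user","content":[{"type":"text","text":"Hi"}]}]}'

With the router running, this should return the same 400 "model not found" body shown above; against a single-model llama-server the same request is expected to go through, matching the behaviour described here.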
First Bad Commit
This is an issue with the new router mode feature. This is not a bug with an existing feature.
Relevant log output
⎿ API Error: 400 {"error":{"code":400,"message":"model not found","type":"invalid_request_error"}}
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv operator(): all results received, terminating stream
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv operator(): http: stream ended
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv log_server_r: request: POST /v1/messages 127.0.0.1 200
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv log_server_r: request: {"model":"Devstral","messages":[{"role":"user","content":[{"type":"text","text":"Hi"}]},{"role":"assistant","content":[{"type":"text","text":"{"}]}],"system":[{"type":"text","text":"You are Claude Code, Anthropic's official CLI for Claude."},{"type":"text","text":"Analyze if this message indicates a new conversation topic. If it does, extract a 2-3 word title that captures the new topic. Format your response as a JSON object with two fields: 'isNewTopic' (boolean) and 'title' (string, or null if isNewTopic is false). Only include these fields, no other text. ONLY generate the JSON object, no other text (eg. no markdown)."}],"tools":[],"metadata":{"user_id":"user_2b56a1ab9d1f7f3d80c12ffb75a944a2f012828d1d29c5d54c7cd4a28169ad91_account__session_edae2a8c-5630-4bb6-98fc-7386afc79814"},"max_tokens":32000,"stream":true}
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv log_server_r: response:
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] res remove_waiti: remove task 0 from waiting list. current waiting = 1 (before remove)
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv stop: all tasks already finished, no need to cancel
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: srv log_server_r: request: POST /v1/messages [removed] 200
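One way to narrow this down is to compare the name Claude Code sends ("Devstral" in the request above) with the names the router actually advertises. A minimal sketch, assuming the same default 127.0.0.1:8080 address (GET /v1/models is the server's OpenAI-compatible model listing):

$ curl -s http://127.0.0.1:8080/v1/models

If the router only reports the full GGUF filename (for example Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf) rather than the short alias "Devstral", that would be consistent with the 400 "model not found" seen on the /v1/messages route.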