
Eval bug: [Router] Claude Code - model not found - invalid_request_error #17968

@isgallagher

Description

Name and Version

$ llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
version: 7360 (53ecd4f)
built with GNU 12.2.0 for Linux x86_64

Operating systems

Linux

GGML backends

CUDA

Hardware

AMD Ryzen 9 5950X + NVIDIA GeForce RTX 3090 Ti

Models

Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf

Problem description & steps to reproduce

When I run llama-server in router mode and connect Claude Code to it, the server itself works, but Claude Code immediately returns an error:

⎿  API Error: 400 {"error":{"code":400,"message":"model not found","type":"invalid_request_error"}}

When running llama-server in the normal single-model mode (as it worked before router mode existed), Claude Code works fine.
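
For reference, a rough sketch of how Claude Code is pointed at the local server. This assumes Claude Code honors the ANTHROPIC_BASE_URL and ANTHROPIC_MODEL environment variables and that the router listens on llama-server's default port 8080; both are illustrative assumptions, the exact setup may differ.

$ # point Claude Code at the local llama-server router (hypothetical sketch)
$ export ANTHROPIC_BASE_URL="http://127.0.0.1:8080"
$ export ANTHROPIC_MODEL="Devstral"
$ claude
  # the first prompt then fails immediately with the 400 "model not found" error above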

First Bad Commit

This is an issue with the new router mode feature, not a regression in an existing feature, so there is no first bad commit to bisect.

Relevant log output

⎿  API Error: 400 {"error":{"code":400,"message":"model not found","type":"invalid_request_error"}}


Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv    operator(): all results received, terminating stream
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv    operator(): http: stream ended
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv  log_server_r: request: POST /v1/messages 127.0.0.1 200
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv  log_server_r: request:  {"model":"Devstral","messages":[{"role":"user","content":[{"type":"text","text":"Hi"}]},{"role":"assistant","content":[{"type":"text","text":"{"}]}],"system":[{"type":"text","text":"You are Claude Code, Anthropic's official CLI for Claude."},{"type":"text","text":"Analyze if this message indicates a new conversation topic. If it does, extract a 2-3 word title that captures the new topic. Format your response as a JSON object with two fields: 'isNewTopic' (boolean) and 'title' (string, or null if isNewTopic is false). Only include these fields, no other text. ONLY generate the JSON object, no other text (eg. no markdown)."}],"tools":[],"metadata":{"user_id":"user_2b56a1ab9d1f7f3d80c12ffb75a944a2f012828d1d29c5d54c7cd4a28169ad91_account__session_edae2a8c-5630-4bb6-98fc-7386afc79814"},"max_tokens":32000,"stream":true}
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv  log_server_r: response:
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] res  remove_waiti: remove task 0 from waiting list. current waiting = 1 (before remove)
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv          stop: all tasks already finished, no need to cancel
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: srv  log_server_r: request: POST /v1/messages [removed] 200
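
To narrow down which request is being rejected, the logged /v1/messages body can be replayed directly against the router with curl, bypassing Claude Code. This is only a debugging sketch; the host/port (127.0.0.1:8080) and the trimmed request body are assumptions based on the log above, not an exact reproduction.

$ curl -sS http://127.0.0.1:8080/v1/messages \
    -H "content-type: application/json" \
    -d '{"model":"Devstral","max_tokens":64,"messages":[{"role":"user","content":[{"type":"text","text":"Hi"}]}]}'

Comparing this against a request that uses the exact model name listed by GET /v1/models may show whether the router simply does not register the model under the "Devstral" alias.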
