Description
Name and Version
$ llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
version: 7360 (53ecd4f)
built with GNU 12.2.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
Ryzen 9 5950X + NVIDIA GeForce RTX 3090 Ti
Models
Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf
Problem description & steps to reproduce
When I run llama-server in router mode and connect Claude Code to it, the server itself works, but Claude Code immediately returns an error:
⎿ API Error: 400 {"error":{"code":400,"message":"model not found","type":"invalid_request_error"}}
When running llama-server in the normal single-model fashion (as it worked before router mode), Claude Code works fine.
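For reference, the same request can be reproduced outside Claude Code with curl. This is a minimal sketch that assumes the router is listening on the default 127.0.0.1:8080 and that Claude Code is configured with the model name "Devstral" (the name visible in the request log below); adjust host, port, and model name to your setup.

$ curl -sS http://127.0.0.1:8080/v1/messages \
    -H "Content-Type: application/json" \
    -d '{"model":"Devstral","max_tokens":32,"messages":[{"role":"user","content":[{"type":"text","text":"Hi"}]}]}'

With the router running, this should return the same 400 "model not found" body shown above; against a single-model llama-server the same request is expected to go through, matching the behaviour described here.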
First Bad Commit
This is an issue with the new router mode feature. This is not a bug with an existing feature.
Relevant log output
⎿ API Error: 400 {"error":{"code":400,"message":"model not found","type":"invalid_request_error"}}
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv operator(): all results received, terminating stream
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv operator(): http: stream ended
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv log_server_r: request: POST /v1/messages 127.0.0.1 200
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv log_server_r: request: {"model":"Devstral","messages":[{"role":"user","content":[{"type":"text","text":"Hi"}]},{"role":"assistant","content":[{"type":"text","text":"{"}]}],"system":[{"type":"text","text":"You are Claude Code, Anthropic's official CLI for Claude."},{"type":"text","text":"Analyze if this message indicates a new conversation topic. If it does, extract a 2-3 word title that captures the new topic. Format your response as a JSON object with two fields: 'isNewTopic' (boolean) and 'title' (string, or null if isNewTopic is false). Only include these fields, no other text. ONLY generate the JSON object, no other text (eg. no markdown)."}],"tools":[],"metadata":{"user_id":"user_2b56a1ab9d1f7f3d80c12ffb75a944a2f012828d1d29c5d54c7cd4a28169ad91_account__session_edae2a8c-5630-4bb6-98fc-7386afc79814"},"max_tokens":32000,"stream":true}
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv log_server_r: response:
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] res remove_waiti: remove task 0 from waiting list. current waiting = 1 (before remove)
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: [56401] srv stop: all tasks already finished, no need to cancel
Dec 12 16:42:18 llm-api-engine-rig2 llama-server[729483]: srv log_server_r: request: POST /v1/messages [removed] 200
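One way to narrow this down is to compare the name Claude Code sends ("Devstral" in the request above) with the names the router actually advertises. A minimal sketch, assuming the same default 127.0.0.1:8080 address (GET /v1/models is the server's OpenAI-compatible model listing):

$ curl -s http://127.0.0.1:8080/v1/models

If the router only reports the full GGUF filename (for example Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf) rather than the short alias "Devstral", that would be consistent with the 400 "model not found" seen on the /v1/messages route.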