Skip to content

Conversation

@ngxson
Copy link
Owner

@ngxson ngxson commented Dec 7, 2025

…pport)

Make sure to read the contributing guidelines before submitting a PR

Summary by CodeRabbit

  • New Features

    • Enhanced console navigation: Home/End, Delete, arrow keys with Ctrl‑arrow word movement, consistent behavior across Windows and POSIX.
    • Persistent command history with per-entry backup, Up/Down navigation, and deduplication of saved entries.
  • Bug Fixes

    • More reliable UTF‑8 decoding and character-aware editing.
    • Improved mid-line edit/redraw and unified escape-sequence handling for consistent cursor positioning.

✏️ Tip: You can customize this high-level summary in your review settings.

@ngxson ngxson marked this pull request as ready for review December 7, 2025 00:14
@coderabbitai
Copy link

coderabbitai bot commented Dec 7, 2025

Walkthrough

Adds UTF‑8‑aware, in‑place console line editing: UTF‑8 decoding and navigation, per‑character byte/width tracking, cursor movement and redraw helpers, insertion/deletion at character boundaries, enhanced escape‑sequence and Windows key handling (arrows, Home/End, Delete, Ctrl‑word), and persistent input history — all internal to common/console.cpp. (47 words)

Changes

Cohort / File(s) Summary
Console UTF‑8 Input & Navigation
common/console.cpp
Adds UTF‑8 utilities (decode_utf8, prev_utf8_char_pos, next_utf8_char_pos), per‑character widths and byte/char position tracking, line editing primitives and helpers (move_cursor, move_word_left, move_word_right, move_to_line_start, move_to_line_end, delete_at_cursor, set_line_contents, clear_current_line), redraw/tail update logic, enhanced CSI/escape parsing (arrows, Home/End, Up/Down history, Delete, optional Ctrl modifiers), Windows private key constants and mapping, and a history_t for persistent history with navigation and restoration. All changes are internal to the file.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Verify byte vs char position invariants, especially for multi‑byte and malformed UTF‑8 sequences.
  • Check correctness of widths computation and redraw/tail update (wrapping, wide characters/emojis).
  • Inspect escape‑sequence parsing and Windows vs POSIX branches for consistent behavior.
  • Review history backup/restore semantics when entering/exiting history entries.

Poem

🐰
I nibble bytes with careful paws,
Hop back one char without a pause,
Cursor twirls and tails realign,
UTF‑8 hops in tidy line,
My whiskers tap — the prompt says fine.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is largely incomplete, containing only a fragment '…pport)' and the template boilerplate, with no substantive explanation of the changes, rationale, or impact. Add a detailed description explaining what features were added (arrow key support, UTF-8 handling, history persistence), why they matter, and any testing performed or known limitations.
Docstring Coverage ⚠️ Warning Docstring coverage is 8.70% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: enabling arrow left/right for line editing in the console, which matches the primary focus of the changeset.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch xsn/cli_arrow_left_right

Comment @coderabbitai help to get the list of available commands and usage tips.

@ngxson
Copy link
Owner Author

ngxson commented Dec 7, 2025

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Dec 7, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
common/console.cpp (1)

382-388: Consider using parameterized ANSI escape sequences for efficiency.

The current loop emits one escape sequence per column. For large cursor movements, using parameterized sequences like \033[%dC (move right N columns) and \033[%dD (move left N columns) would reduce output volume.

 #else
-        if (delta < 0) {
-            for (int i = 0; i < -delta; i++) fprintf(out, "\b");
-        } else {
-            for (int i = 0; i < delta; i++) fprintf(out, "\033[C");
+        if (delta < 0) {
+            fprintf(out, "\033[%dD", -delta);
+        } else {
+            fprintf(out, "\033[%dC", delta);
         }
 #endif
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db97837 and 77ca259.

📒 Files selected for processing (1)
  • common/console.cpp (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{cpp,h,hpp}: Always format C++ code using git clang-format before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary

Files:

  • common/console.cpp
**/*.{cpp,h,hpp,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible

Files:

  • common/console.cpp
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: build-cmake-pkg / linux
  • GitHub Check: android-build
  • GitHub Check: ubuntu-latest-cmake-cuda
  • GitHub Check: ubuntu-24-cmake-vulkan
  • GitHub Check: ubuntu-latest-cmake-sanitizer (UNDEFINED, Debug)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (THREAD, Debug)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (ADDRESS, Debug)
  • GitHub Check: macOS-latest-cmake-ios
  • GitHub Check: ubuntu-24-cmake-vulkan-deb
  • GitHub Check: ubuntu-latest-cmake-rpc
  • GitHub Check: ubuntu-22-cmake-sycl
  • GitHub Check: ubuntu-latest-llguidance
  • GitHub Check: macOS-latest-cmake-visionos
  • GitHub Check: ubuntu-cpu-cmake (x64, ubuntu-22.04)
  • GitHub Check: ubuntu-cpu-cmake (arm64, ubuntu-22.04-arm)
  • GitHub Check: ubuntu-24-cmake-webgpu
  • GitHub Check: ubuntu-24-wasm-webgpu
  • GitHub Check: macOS-latest-cmake-arm64
  • GitHub Check: macOS-latest-cmake-arm64-webgpu
  • GitHub Check: server-windows
🔇 Additional comments (8)
common/console.cpp (8)

4-4: LGTM!

Required header for the new assertions added later in the code.


340-347: LGTM!

The UTF-8 backward navigation is correct. The function properly skips continuation bytes (10xxxxxx pattern) to find the previous character's start position.


349-356: LGTM!

Correct forward navigation through UTF-8 sequences, properly skipping continuation bytes.


401-407: LGTM!

The position tracking state and invariant assertions are well-designed. The assertions will catch any desynchronization between byte position, character position, and the widths array during development.


427-464: LGTM with a minor note.

The arrow key handling is correct. The left/right navigation properly maintains synchronization between byte_pos, char_pos, and the visual cursor.

Note: ESC followed by a non-[ and non-ESC character (e.g., Alt+letter on some terminals) is silently consumed. This is acceptable given the current scope of arrow key support.


465-493: LGTM!

The backspace handling correctly manages UTF-8 character deletion at any position. The redraw-and-clear strategy properly handles variable-width characters (including CJK wide chars) by clearing the exact space freed by deletion.


494-531: LGTM!

The character insertion logic correctly handles both end-of-line and mid-line insertion with proper UTF-8 byte management and display width tracking. The defensive clamping of negative widths to zero is a reasonable safeguard against non-printable character edge cases.


391-571: Verify cross-platform behavior of cursor handling in readline_advanced.

The readline_advanced function uses intentionally different implementations for Windows and POSIX systems (cursor movement via console API vs. ANSI escape sequences, width estimation via fixed value vs. wcwidth()). These platform-specific approaches should be tested on Linux, macOS, and Windows with:

  • Wide character (CJK) input and deletion
  • Cursor movement across multi-byte UTF-8 characters
  • Behavior at line wrap boundaries

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
common/console.cpp (2)

377-408: Consider optimizing line-wrapping arithmetic.

The while loops (lines 388-395) work correctly but could be more efficient with division and modulo operations for larger deltas:

int newY = newCursorPosition.Y + (newCursorPosition.X + delta) / width;
int newX = (newCursorPosition.X + delta) % width;
if (newX < 0) {
    newX += width;
    newY--;
}

However, for typical navigation (delta = ±1), the current approach is fine.


484-501: Consider extracting duplicated arrow key logic.

The left/right arrow handling logic is duplicated between escape sequences (lines 450-463) and Windows key constants (lines 485-498). While this works correctly, extracting the logic into helper functions would reduce duplication:

static void handle_left_arrow(size_t& char_pos, size_t& byte_pos, 
                               const std::string& line, const std::vector<int>& widths) {
    if (char_pos > 0) {
        int w = widths[char_pos - 1];
        move_cursor(-w);
        char_pos--;
        byte_pos = prev_utf8_char_pos(line, byte_pos);
    }
}

However, the duplication is minimal and the current approach is clear.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77ca259 and e0c69b3.

📒 Files selected for processing (1)
  • common/console.cpp (6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{cpp,h,hpp}: Always format C++ code using git clang-format before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary

Files:

  • common/console.cpp
**/*.{cpp,h,hpp,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible

Files:

  • common/console.cpp
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: ggml-ci-x64-cpu-low-perf
  • GitHub Check: ubuntu-22-cmake-musa
  • GitHub Check: android-ndk-build (arm64-cpu, -D ANDROID_ABI=arm64-v8a -D ANDROID_PLATFORM=android-31 -D CMAKE_TO...
  • GitHub Check: ubuntu-22-cmake-sycl-fp16
  • GitHub Check: ubuntu-cpu-cmake (arm64, ubuntu-22.04-arm)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (THREAD, Debug)
  • GitHub Check: ubuntu-cpu-cmake (x64, ubuntu-22.04)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (UNDEFINED, Debug)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (ADDRESS, Debug)
  • GitHub Check: ubuntu-24-cmake-vulkan-deb
  • GitHub Check: ubuntu-24-cmake-webgpu
  • GitHub Check: ubuntu-latest-llguidance
  • GitHub Check: ubuntu-22-cmake-sycl
  • GitHub Check: macOS-latest-cmake-arm64-webgpu
  • GitHub Check: ubuntu-latest-cmake-rpc
  • GitHub Check: ubuntu-24-wasm-webgpu
  • GitHub Check: ubuntu-24-cmake-vulkan
  • GitHub Check: macOS-latest-cmake-x64
  • GitHub Check: server-windows
🔇 Additional comments (6)
common/console.cpp (6)

4-4: LGTM! Well-designed approach to special key handling.

Using private-use Unicode values (0xE000–0xE003) for special keys is a clean solution that avoids conflicts with real input characters and allows uniform handling across platforms.

Also applies to: 39-47


359-375: LGTM! Correct UTF-8 boundary detection.

The helpers correctly identify UTF-8 continuation bytes ((byte & 0xC0) == 0x80) and handle boundary conditions properly.


420-427: LGTM! Good use of assertions to maintain invariants.

The assertions correctly validate that byte_pos >= char_pos (multi-byte UTF-8) and char_pos <= widths.size() (per-character tracking).


446-483: LGTM! Escape sequence handling is correct.

The code properly handles ANSI escape sequences for arrow keys (ESC [ D/C for left/right) and correctly discards unknown sequences.


502-530: LGTM! Backspace handling correctly maintains UTF-8 boundaries.

The deletion logic properly:

  1. Computes UTF-8 character boundaries
  2. Updates both line content and display widths
  3. Redraws affected text
  4. Restores cursor position

531-569: LGTM! Insertion logic correctly handles both end and middle positions.

The code properly:

  1. Converts codepoint to UTF-8
  2. Inserts at the correct byte position
  3. Maintains the widths vector
  4. Redraws affected text
  5. Updates cursor and positions correctly

The negative width clamping (if (real_w < 0) real_w = 0) appropriately handles control characters.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
common/console.cpp (1)

39-48: Windows arrow key mapping looks good; remove redundant wc == 0 check (already flagged previously).

Using private-use KEY_ARROW_* codes and mapping wVirtualKeyCode when UnicodeChar == 0 is a clean way to surface arrows as logical characters. However, the second if (wc == 0) { continue; } after the new switch is now dead/unreachable: all wc == 0 paths already either return a KEY_ARROW_* or continue via the default case in the switch.

You can safely delete that second if (wc == 0) block to simplify control flow:

-                if (wc == 0) {
-                    switch (record.Event.KeyEvent.wVirtualKeyCode) {
-                        case VK_LEFT:  return KEY_ARROW_LEFT;
-                        case VK_RIGHT: return KEY_ARROW_RIGHT;
-                        case VK_UP:    return KEY_ARROW_UP;
-                        case VK_DOWN:  return KEY_ARROW_DOWN;
-                        default:       continue;
-                    }
-                }
-                if (wc == 0) {
-                    continue;
-                }
+                if (wc == 0) {
+                    switch (record.Event.KeyEvent.wVirtualKeyCode) {
+                        case VK_LEFT:  return KEY_ARROW_LEFT;
+                        case VK_RIGHT: return KEY_ARROW_RIGHT;
+                        case VK_UP:    return KEY_ARROW_UP;
+                        case VK_DOWN:  return KEY_ARROW_DOWN;
+                        default:       continue;
+                    }
+                }

Also applies to: 191-202, 486-503

🧹 Nitpick comments (3)
common/console.cpp (3)

361-377: UTF‑8 prev/next helpers are correct for valid UTF‑8; optional minor robustness tweak.

prev_utf8_char_pos / next_utf8_char_pos correctly walk backward/forward over continuation bytes using (byte & 0xC0) == 0x80, assuming pos always lands on a character boundary (which your byte_pos invariants maintain). For extra clarity/robustness, you could cast to unsigned char when masking to avoid any confusion on platforms where char is signed:

-        while (pos > 0 && (line[pos] & 0xC0) == 0x80) {
+        while (pos > 0 && (static_cast<unsigned char>(line[pos]) & 0xC0) == 0x80) {
             pos--;
         }

(and similarly in next_utf8_char_pos).

This is purely cosmetic; the current logic is fine given valid UTF‑8 input and your invariants.


379-409: move_cursor implementation looks sound; consider clamping on Windows as a safety net.

The cross‑platform move_cursor(delta) abstraction is nice, and the Windows branch correctly wraps horizontally using the console buffer width while the POSIX branch uses \b and \033[C.

One small defensive improvement: on Windows, if delta is large and the current cursor is near the top of the buffer, newY could become negative before calling SetConsoleCursorPosition. Clamping newY to 0 (and optionally to bufferInfo.dwSize.Y - 1) would guard against odd console states without complicating the logic much:

-            while (newX < 0) {
-                newX += width;
-                newY--;
-            }
+            while (newX < 0) {
+                newX += width;
+                newY--;
+            }
+            if (newY < 0) {
+                newY = 0;
+            }

Not strictly required, but it makes the helper safer under unusual cursor positions.


422-424: UTF‑8‑aware editing with byte_pos/char_pos is well‑structured; be aware of combining‑mark edge cases.

The new byte_pos/char_pos state, assertions, and use of widths plus prev_utf8_char_pos/next_utf8_char_pos give you a clean model for left/right navigation, insertion in the middle of the line, and backspace, including multi‑byte UTF‑8 and multi‑column glyphs. The redraw logic for both deletion and insertion (re‑emitting the “tail” and moving the cursor back with move_cursor) is consistent and looks correct.

One subtle edge case: characters with display width 0 (e.g., some combining marks where wcwidth returns 0) will get widths[i] == 0. In those cases, left/right/backspace will update char_pos/byte_pos, but move_cursor(±w) becomes a no‑op, so the logical cursor index can drift relative to the physical console cursor when stepping over such code points. For most use cases this is probably acceptable, but if you ever want perfect behavior with combining sequences/grapheme clusters, you’d likely need to treat them as part of the previous cluster rather than independent positions.

Given the added complexity that would entail, I’d keep the current behavior as‑is unless you explicitly target those edge cases; just worth being aware of the limitation.

Also, per the repo guidelines, it would be good to sanity‑check this path on at least one Linux/macOS terminal and a Windows console (arrow left/right, backspace, insertion in the middle, and some simple UTF‑8 like CJK characters).

Also applies to: 427-429, 448-478, 486-503, 505-532, 534-570

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0c69b3 and cd05610.

📒 Files selected for processing (1)
  • common/console.cpp (6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{cpp,h,hpp}: Always format C++ code using git clang-format before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary

Files:

  • common/console.cpp
**/*.{cpp,h,hpp,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible

Files:

  • common/console.cpp
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: windows-latest-cmake-sycl
  • GitHub Check: windows-latest-cmake (llvm-arm64-opencl-adreno, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN...
  • GitHub Check: windows-latest-cmake (openblas-x64, x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x6...
  • GitHub Check: windows-msys2 (CLANG64, clang-x86_64, Release)
  • GitHub Check: windows-latest-cmake (vulkan-x64, x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD...
  • GitHub Check: windows-latest-cmake (llvm-arm64, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/ar...
  • GitHub Check: windows-msys2 (UCRT64, ucrt-x86_64, Release)
  • GitHub Check: windows-latest-cmake (cpu-x64 (static), x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmak...
  • GitHub Check: ubuntu-latest-cmake-sanitizer (UNDEFINED, Debug)
  • GitHub Check: windows-2022-cmake-cuda (12.4)
  • GitHub Check: ubuntu-latest-llguidance
  • GitHub Check: ubuntu-latest-cmake-rpc
  • GitHub Check: ubuntu-latest-cmake-sanitizer (THREAD, Debug)
  • GitHub Check: ubuntu-cpu-cmake (x64, ubuntu-22.04)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (ADDRESS, Debug)
  • GitHub Check: ubuntu-cpu-cmake (arm64, ubuntu-22.04-arm)
  • GitHub Check: ubuntu-24-cmake-vulkan-deb
  • GitHub Check: macOS-latest-cmake-arm64-webgpu
  • GitHub Check: macOS-latest-cmake-arm64
  • GitHub Check: server-windows
🔇 Additional comments (1)
common/console.cpp (1)

4-4: Adding <cassert> for invariants is appropriate.

Including <cassert> to enforce internal invariants (e.g., on char_pos/byte_pos) is reasonable here; it will help catch any future cursor/byte index mismatches during development without affecting release builds when NDEBUG is set.

@ngxson ngxson force-pushed the xsn/cli_arrow_left_right branch from cd05610 to 77ca259 Compare December 7, 2025 12:29
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
common/console.cpp (4)

4-5: New headers, key constants, and history vector look fine; consider bounding history size

  • Adding <cassert> and <cstddef> matches the new use of assert and size_t.
  • Using private-use Unicode values (0xE000–0xE005) for Windows-only key tokens is a reasonable way to keep them out of normal input space.
  • A single static std::vector<std::string> history; is fine for this console, but it will grow unbounded over a long-lived process.

If you care about long sessions, you might cap history (e.g., keep last N entries) or offer a way to clear it; otherwise this is acceptable as-is.

Also applies to: 40-52, 62-63


344-370: UTF‑8 helpers work for well‑formed input; add minimal validation for malformed sequences

decode_utf8 correctly decodes 1–4 byte sequences and falls back to U+FFFD when the leading byte doesn’t match any known pattern, and prev_utf8_char_pos / next_utf8_char_pos are standard “walk over continuation bytes” helpers that assume valid UTF‑8.

One small robustness gap is that decode_utf8 doesn’t currently check that continuation bytes are in the 10xxxxxx range; malformed sequences like a 2‑byte lead followed by an ASCII byte will be decoded as if they were valid. If you want to keep behavior predictable on bad input, you can cheaply validate continuation bytes before constructing the codepoint, e.g.:

-        if ((c & 0xE0u) == 0xC0u && pos + 1 < input.size()) {
-            advance = 2;
-            return ((c & 0x1Fu) << 6) | (static_cast<unsigned char>(input[pos + 1]) & 0x3Fu);
-        }
+        if ((c & 0xE0u) == 0xC0u && pos + 1 < input.size()) {
+            unsigned char c1 = static_cast<unsigned char>(input[pos + 1]);
+            if ((c1 & 0xC0u) != 0x80u) {
+                advance = 1;
+                return 0xFFFD;
+            }
+            advance = 2;
+            return ((c & 0x1Fu) << 6) | (c1 & 0x3Fu);
+        }
@@
-        if ((c & 0xF0u) == 0xE0u && pos + 2 < input.size()) {
-            advance = 3;
-            return ((c & 0x0Fu) << 12) |
-                   ((static_cast<unsigned char>(input[pos + 1]) & 0x3Fu) << 6) |
-                   (static_cast<unsigned char>(input[pos + 2]) & 0x3Fu);
-        }
+        if ((c & 0xF0u) == 0xE0u && pos + 2 < input.size()) {
+            unsigned char c1 = static_cast<unsigned char>(input[pos + 1]);
+            unsigned char c2 = static_cast<unsigned char>(input[pos + 2]);
+            if ((c1 & 0xC0u) != 0x80u || (c2 & 0xC0u) != 0x80u) {
+                advance = 1;
+                return 0xFFFD;
+            }
+            advance = 3;
+            return ((c & 0x0Fu) << 12) |
+                   ((c1 & 0x3Fu) << 6) |
+                   (c2 & 0x3Fu);
+        }
@@
-        if ((c & 0xF8u) == 0xF0u && pos + 3 < input.size()) {
-            advance = 4;
-            return ((c & 0x07u) << 18) |
-                   ((static_cast<unsigned char>(input[pos + 1]) & 0x3Fu) << 12) |
-                   ((static_cast<unsigned char>(input[pos + 2]) & 0x3Fu) << 6) |
-                   (static_cast<unsigned char>(input[pos + 3]) & 0x3Fu);
-        }
+        if ((c & 0xF8u) == 0xF0u && pos + 3 < input.size()) {
+            unsigned char c1 = static_cast<unsigned char>(input[pos + 1]);
+            unsigned char c2 = static_cast<unsigned char>(input[pos + 2]);
+            unsigned char c3 = static_cast<unsigned char>(input[pos + 3]);
+            if ((c1 & 0xC0u) != 0x80u || (c2 & 0xC0u) != 0x80u || (c3 & 0xC0u) != 0x80u) {
+                advance = 1;
+                return 0xFFFD;
+            }
+            advance = 4;
+            return ((c & 0x07u) << 18) |
+                   ((c1 & 0x3Fu) << 12) |
+                   ((c2 & 0x3Fu) << 6) |
+                   (c3 & 0x3Fu);
+        }

prev_utf8_char_pos and next_utf8_char_pos then continue to behave well on the strings you build via append_utf8.

Also applies to: 393-409


415-472: Cursor helpers overall look good; consider clamping Windows coordinates in move_cursor

clear_current_line, set_line_contents, and move_to_line_start / move_to_line_end nicely centralize the display-width accounting and keep char_pos / byte_pos consistent with what’s on screen.

In move_cursor(int delta) on Windows, for large negative or positive deltas near the top or bottom of the buffer, newY can become < 0 or ≥ bufferInfo.dwSize.Y, which makes SetConsoleCursorPosition receive out-of-range coordinates. Because your deltas are in practice bounded by the current line width, this is unlikely to matter, but adding a clamp is cheap insurance:

         if (hConsole != NULL) {
             CONSOLE_SCREEN_BUFFER_INFO bufferInfo;
             GetConsoleScreenBufferInfo(hConsole, &bufferInfo);
             COORD newCursorPosition = bufferInfo.dwCursorPosition;
             int width = bufferInfo.dwSize.X;
+            int height = bufferInfo.dwSize.Y;
             int newX = newCursorPosition.X + delta;
             int newY = newCursorPosition.Y;
@@
             while (newX < 0) {
                 newX += width;
                 newY--;
             }
 
-            newCursorPosition.X = newX;
-            newCursorPosition.Y = newY;
+            if (newY < 0) {
+                newY = 0;
+                newX = 0;
+            } else if (newY >= height) {
+                newY = height - 1;
+                newX = width - 1;
+            }
+
+            newCursorPosition.X = static_cast<SHORT>(newX);
+            newCursorPosition.Y = static_cast<SHORT>(newY);
             SetConsoleCursorPosition(hConsole, newCursorPosition);
         }

761-767: History persistence works; think about multiline prompts and empty/whitespace entries

The new history persistence at the end of readline_advanced works as intended: you trim a trailing newline, then push the resulting string for non‑EOF inputs.

Two minor considerations:

  • For multiline input, you currently push an entry each time readline_advanced returns, even when has_more is true. If you’d prefer history to contain only “complete” prompts, you could gate this on has_more == false.
  • If you don’t want purely whitespace lines in history, you could trim and skip when the result is empty.

Both are optional UX tweaks; the current behavior is acceptable if it matches how you expect the console to be used.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd05610 and 440ac83.

📒 Files selected for processing (1)
  • common/console.cpp (9 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{cpp,h,hpp}: Always format C++ code using git clang-format before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary

Files:

  • common/console.cpp
**/*.{cpp,h,hpp,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible

Files:

  • common/console.cpp
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
🧬 Code graph analysis (1)
common/console.cpp (3)
tools/run/run.cpp (1)
  • line (1183-1183)
vendor/nlohmann/json.hpp (1)
  • move (6987-6990)
tools/server/public_legacy/json-schema-to-grammar.mjs (1)
  • digits (145-145)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: windows-latest-cmake-sycl
  • GitHub Check: ggml-ci-x64-cpu-high-perf
  • GitHub Check: ios-xcode-build
  • GitHub Check: windows-msys2 (CLANG64, clang-x86_64, Release)
  • GitHub Check: macOS-latest-cmake-visionos
  • GitHub Check: ubuntu-22-cmake-musa
  • GitHub Check: ubuntu-cpu-cmake (x64, ubuntu-22.04)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (ADDRESS, Debug)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (UNDEFINED, Debug)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (THREAD, Debug)
  • GitHub Check: ubuntu-24-cmake-vulkan-deb
  • GitHub Check: ubuntu-latest-llguidance
  • GitHub Check: ubuntu-latest-cmake-rpc
  • GitHub Check: ubuntu-22-cmake-sycl-fp16
  • GitHub Check: ubuntu-24-wasm-webgpu
  • GitHub Check: macOS-latest-cmake-arm64-webgpu
  • GitHub Check: ubuntu-22-cmake-sycl
  • GitHub Check: macOS-latest-cmake-x64
  • GitHub Check: macOS-latest-cmake-arm64
  • GitHub Check: server-windows
🔇 Additional comments (2)
common/console.cpp (2)

619-656: Windows virtual‑key handling integrates cleanly with the navigation and history logic

The Windows‑specific handling of KEY_ARROW_LEFT/RIGHT, KEY_HOME/END, and KEY_ARROW_UP/DOWN mirrors the CSI behavior on POSIX:

  • Left/right correctly move by the prior/next character width and adjust both char_pos and byte_pos using prev_utf8_char_pos / next_utf8_char_pos.
  • Home/End reuse move_to_line_start / move_to_line_end, so cursor semantics stay consistent.
  • Up/Down reuse the same history navigation logic as the CSI 'A' / 'B' path, including restoring the “empty current line” when moving past the latest history entry.

This is a nice way to keep the higher‑level editing logic platform‑agnostic.


657-723: Backspace and insertion logic correctly handle UTF‑8 and screen reflow

The new backspace and insertion code in readline_advanced looks sound:

  • Backspace:
    • Only acts when char_pos > 0, uses widths[char_pos - 1] for visual movement, and uses prev_utf8_char_pos / byte_pos to find the byte range to erase.
    • Updates line and widths consistently, then redraws the tail of the line and clears the trailing character cell(s) that may remain on screen.
  • Insertion:
    • Builds a UTF‑8 sequence via append_utf8, estimates its width, and either appends at the end or inserts into the middle.
    • After inserting in the middle, it prints the new codepoint, redraws the tail based on widths, then moves the cursor back over the tail so char_pos / byte_pos stay aligned with the logical cursor.
    • Uses put_codepoint in both paths so Windows gets accurate width measurements.

This gives you correct behavior for multi‑byte UTF‑8 characters, including insertion and deletion in the middle of the line.

@ServeurpersoCom
Copy link

Last commit add CTRL + left/right arrow -> word jump
https://github.com/user-attachments/assets/c2ab9123-586b-428b-85f3-4a6b860990a4

@ngxson
Copy link
Owner Author

ngxson commented Dec 7, 2025

Last commit add CTRL + left/right arrow -> word jump https://github.com/user-attachments/assets/c2ab9123-586b-428b-85f3-4a6b860990a4

Nice, thanks! I'll push it as a dedicated PR once the current PR is merged

ngxson and others added 9 commits December 8, 2025 15:45
- Add KEY_CTRL_ARROW_LEFT and KEY_CTRL_ARROW_RIGHT codes
- Windows: detect CTRL modifier via dwControlKeyState
- Linux: parse ANSI sequences with modifier (1;5D/C)
- Implement move_word_left/right with space-skipping logic
- Refactor escape sequence parsing to accumulate params
- Windows: VK_DELETE detection
- Linux: ESC[3~ sequence parsing
- Forward character deletion with UTF-8 support
- Edit any history line during UP/DOWN navigation, edits persist
- Pressing Enter appends edited version as new history entry
- Original line stay untouched in their positions
@ngxson
Copy link
Owner Author

ngxson commented Dec 8, 2025

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Dec 8, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
common/console.cpp (3)

4-7: UTF‑8 decoding and char-boundary helpers are sound; consider small robustness tweaks

  • The added headers (<cassert>, <cstddef>, <cctype>, <cwctype>) match the new usages (asserts, size_t, std::isdigit, std::iswspace), so dependency surface is consistent.
  • decode_utf8 and the prev_utf8_char_pos/next_utf8_char_pos helpers correctly walk byte boundaries assuming well‑formed UTF‑8 and never return advance == 0, so loops that consume by advance cannot hang.
  • If you ever expect malformed UTF‑8 (e.g., pasted binary), you might want to:
    • Assert pos < input.size() in decode_utf8, and
    • Optionally validate continuation bytes ((b & 0xC0) == 0x80) before treating them as part of a multi‑byte sequence, falling back to 0xFFFD when that fails.
      These aren’t correctness blockers given current controlled sources of line, but they would make the helpers more defensive if input sources broaden later.

Also applies to: 352-417


419-451: Cursor movement and redraw utilities look consistent; tail‑redraw logic is duplicated

  • delete_at_cursor, clear_current_line, set_line_contents, move_to_line_start/end, and move_cursor maintain char_pos, byte_pos, and widths invariants correctly and only move the cursor by the summed widths they themselves printed, so they shouldn’t walk off the visible line under normal usage.
  • The “redraw tail then move cursor back” pattern in delete_at_cursor is nearly identical to what the backspace handler does later in readline_advanced, so there’s some duplication of a fairly intricate sequence (recompute byte positions, print tail codepoints with stored widths, compute tail_width, then move_cursor(-(tail_width + w))).
  • Extracting that common “erase one char and redraw tail” behavior into a shared helper (parameterized by whether deletion is backward or forward) would reduce the chance of future drift between backspace and Delete handling while keeping the invariants in one place.

Also applies to: 453-508, 627-658


42-56: Windows key mapping to private char32_t codes is clean; consider tightening the abstraction

  • Mapping Windows virtual key codes (arrows, Home/End, Delete, and Ctrl+Arrow variants) to reserved private‑use char32_t values in getchar32() is a neat way to unify handling with the POSIX escape‑sequence path; readline_advanced just switches on these logical keys.
  • The implementation correctly detects Ctrl via dwControlKeyState & (LEFT_CTRL_PRESSED | RIGHT_CTRL_PRESSED) and distinguishes plain vs. Ctrl+Left/Right.
  • As a minor improvement, you could wrap these special key codes in a enum class (or at least a named enum) instead of raw char32_t constants, which would make it harder to accidentally treat them as real Unicode code points elsewhere and would self‑document the intent at call sites.

Also applies to: 187-214, 799-852

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 440ac83 and ba3f2e7.

📒 Files selected for processing (1)
  • common/console.cpp (8 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{cpp,h,hpp}: Always format C++ code using git clang-format before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary

Files:

  • common/console.cpp
**/*.{cpp,h,hpp,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible

Files:

  • common/console.cpp
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: WebUI Check
  • GitHub Check: ubuntu-latest-cmake-cuda
  • GitHub Check: macOS-latest-cmake-visionos
  • GitHub Check: windows-latest-cmake-hip
  • GitHub Check: windows-2022-cmake-cuda (12.4)
  • GitHub Check: windows-latest-cmake-sycl
  • GitHub Check: ubuntu-cpu-cmake (arm64, ubuntu-22.04-arm)
  • GitHub Check: ubuntu-cpu-cmake (x64, ubuntu-22.04)
  • GitHub Check: ubuntu-latest-cmake-rpc
  • GitHub Check: ubuntu-24-cmake-vulkan-deb
  • GitHub Check: ubuntu-24-wasm-webgpu
  • GitHub Check: ubuntu-22-cmake-hip
  • GitHub Check: macOS-latest-cmake-arm64-webgpu
  • GitHub Check: ubuntu-latest-llguidance
  • GitHub Check: ubuntu-24-cmake-vulkan
  • GitHub Check: macOS-latest-cmake-arm64
  • GitHub Check: macOS-latest-cmake-x64
  • GitHub Check: server-windows
🔇 Additional comments (3)
common/console.cpp (3)

510-541: Word navigation behavior is reasonable and locale‑aware

  • has_ctrl_modifier correctly recognizes Ctrl‑modified CSI sequences like ESC [ 1 ; 5 D / ESC [ 1 ; 5 C by treating any ;‑separated parameter with value 5 as “Ctrl pressed”.
  • is_space_codepoint uses std::iswspace on the decoded code point, so word splitting respects the active locale (after the earlier setlocale(LC_ALL, "")) and handles non‑ASCII space characters.
  • move_word_left/move_word_right implement intuitive behavior:
    • Left: skip spaces leftwards, then skip a word of non‑spaces, landing at the word start.
    • Right: skip spaces rightwards, then a word, then trailing spaces, landing at the start of the next word (or end of line).
      Given the invariants on char_pos, byte_pos, and widths, the loops respect bounds and don’t overrun line.

Also applies to: 542-625


705-791: CSI parsing now fully consumes escape sequences and fixes the Ctrl+Arrow residue issue

  • The new ESC [ handling that accumulates all intermediate bytes into params until the final byte (AZ, az, ~, or WEOF) ensures the entire CSI sequence is consumed before acting, so there are no leftover bytes to be misinterpreted as normal input.
  • Using has_ctrl_modifier(params) for the 'C'/'D' cases cleanly distinguishes plain arrows from Ctrl+Left/Right (e.g., ESC [ 1 ; 5 D), and maps them to character vs. word navigation, respectively.
  • The tilde sequences for Home/End/Delete (1~/7~/4~/8~/3~) are handled via the digits extraction from params, with unknown numeric sequences simply ignored after being fully read, which is a safe default.
    This redesign addresses the earlier issue where parts of extended CSI sequences (like ";5D") could leak into the input stream and show up as spurious characters.

854-922: Backspace and insertion paths preserve UTF‑8 and width invariants

  • The backspace branch walks to the previous UTF‑8 character boundary via prev_utf8_char_pos, erases the correct byte span, updates widths, redraws the tail with stored widths, and uses move_cursor(-(tail_width + w)) to restore the cursor — this matches the forward‑delete logic in delete_at_cursor.
  • The insertion path correctly handles both appending and mid‑line insertion:
    • append_utf8 encodes input_char into UTF‑8 bytes.
    • You measure and store real_w from put_codepoint (clamped at 0), so widths stays non‑negative.
    • For mid‑line inserts, you print the inserted codepoint, then reprint the tail and move the cursor back by tail_width, updating byte_pos/char_pos afterwards.
  • The asserts at the top of the loop (char_pos <= byte_pos and char_pos <= widths.size()) are consistent with these updates and should catch any future regressions in editing paths early in debug builds.

@ngxson
Copy link
Owner Author

ngxson commented Dec 8, 2025

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Dec 8, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba3f2e7 and ae223fc.

📒 Files selected for processing (1)
  • common/console.cpp (8 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{cpp,h,hpp}: Always format C++ code using git clang-format before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary

Files:

  • common/console.cpp
**/*.{cpp,h,hpp,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible

Files:

  • common/console.cpp
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: build-cmake-pkg / linux
  • GitHub Check: ubuntu-22-cmake-musa
  • GitHub Check: ubuntu-24-cmake-vulkan
  • GitHub Check: ubuntu-cpu-cmake (x64, ubuntu-22.04)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (THREAD, Debug)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (ADDRESS, Debug)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (UNDEFINED, Debug)
  • GitHub Check: ubuntu-24-cmake-vulkan-deb
  • GitHub Check: windows-latest-cmake (llvm-arm64-opencl-adreno, arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN...
  • GitHub Check: windows-msys2 (CLANG64, clang-x86_64, Release)
  • GitHub Check: macOS-latest-cmake-x64
  • GitHub Check: macOS-latest-cmake-arm64
  • GitHub Check: macOS-latest-cmake-tvos
  • GitHub Check: windows-msys2 (UCRT64, ucrt-x86_64, Release)
  • GitHub Check: windows-latest-cmake (vulkan-x64, x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD...
  • GitHub Check: ubuntu-24-cmake-webgpu
  • GitHub Check: windows-latest-cmake (cpu-x64 (static), x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmak...
  • GitHub Check: macOS-latest-cmake-arm64-webgpu
  • GitHub Check: server-windows
🔇 Additional comments (11)
common/console.cpp (11)

4-8: LGTM!

The added headers are appropriate for the UTF-8 and line-editing functionality. The Windows-specific key constants are well-placed in the private-use Unicode area (U+E000-U+F8FF) to avoid conflicts with legitimate input characters.

Also applies to: 43-62


200-213: LGTM!

The Windows-specific key handling correctly detects the Ctrl modifier and maps virtual key codes to the private key constants. The logic is clean and handles all the necessary navigation keys.


400-416: LGTM!

The UTF-8 character boundary navigation functions correctly identify continuation bytes (pattern 10xxxxxx) and handle edge cases at string boundaries.


424-507: LGTM!

The line editing helper functions are well-structured and correctly handle:

  • UTF-8 character boundaries during deletion
  • Width tracking for proper cursor positioning
  • Redrawing affected portions of the line
  • Clearing and replacing line contents

579-624: Verify the word-right navigation logic.

The move_word_right function has three sequential loops:

  1. Skip current whitespace (588-597)
  2. Skip the next word (599-608)
  3. Skip trailing whitespace again (610-619)

This means pressing Ctrl+Right will skip over the next word entirely and position at the start of the word after next. For example, with cursor at the start of "hello world test", Ctrl+Right would jump to the start of "test", skipping "world" entirely.

Typically, word-right navigation (bash/readline style) positions at the end of the current word or the start of the next word, not the start of the word-after-next. Can you confirm this three-loop behavior is intentional?

If this should match typical bash behavior, consider simplifying to:

 static void move_word_right(size_t & char_pos, size_t & byte_pos, const std::vector<int> & widths, const std::string & line) {
     if (char_pos >= widths.size()) {
         return;
     }
 
     size_t new_char_pos = char_pos;
     size_t new_byte_pos = byte_pos;
     int move_width = 0;
 
-    while (new_char_pos < widths.size()) {
+    // Skip the current word if we're in one
+    size_t advance = 0;
+    char32_t cp = decode_utf8(line, new_byte_pos, advance);
+    bool in_word = !is_space_codepoint(cp);
+    
+    if (in_word) {
+        while (new_char_pos < widths.size()) {
+            size_t advance = 0;
+            char32_t cp = decode_utf8(line, new_byte_pos, advance);
+            if (is_space_codepoint(cp)) {
+                break;
+            }
+            move_width += widths[new_char_pos];
+            new_char_pos++;
+            new_byte_pos += advance;
+        }
+    }
+    
+    // Skip whitespace to reach next word
+    while (new_char_pos < widths.size()) {
         size_t advance = 0;
         char32_t cp = decode_utf8(line, new_byte_pos, advance);
-        if (!is_space_codepoint(cp)) {
+        if (!is_space_codepoint(cp)) {
             break;
         }
         move_width += widths[new_char_pos];
         new_char_pos++;
         new_byte_pos += advance;
     }
-
-    while (new_char_pos < widths.size()) {
-        size_t advance = 0;
-        char32_t cp = decode_utf8(line, new_byte_pos, advance);
-        if (is_space_codepoint(cp)) {
-            break;
-        }
-        move_width += widths[new_char_pos];
-        new_char_pos++;
-        new_byte_pos += advance;
-    }
-
-    while (new_char_pos < widths.size()) {
-        size_t advance = 0;
-        char32_t cp = decode_utf8(line, new_byte_pos, advance);
-        if (!is_space_codepoint(cp)) {
-            break;
-        }
-        move_width += widths[new_char_pos];
-        new_char_pos++;
-        new_byte_pos += advance;
-    }
 
     move_cursor(move_width);
     char_pos = new_char_pos;
     byte_pos = new_byte_pos;
 }

626-711: LGTM!

The cursor movement function properly handles platform differences (Windows console wrapping vs POSIX ANSI codes). The history structure implements bash-style editing where modifications persist during navigation, as confirmed in previous review discussions.


728-748: LGTM!

The position invariant assertions (char_pos ≤ byte_pos ≤ widths.size()) are excellent defensive checks. The history navigation lambdas provide clean abstractions over the underlying history operations.


768-841: LGTM!

The escape sequence handling comprehensively parses CSI sequences with parameter support, Ctrl modifier detection, and proper handling of arrow keys, Home/End, Delete, and history navigation. Previous review discussions confirmed the escape sequence parsing issues were addressed.


877-944: LGTM!

The backspace and character insertion logic correctly handles:

  • UTF-8 multi-byte characters
  • Per-character width tracking
  • Tail redrawing for mid-line edits
  • Position synchronization (byte_pos and char_pos)

981-988: LGTM!

The history management correctly strips the trailing newline before storage and avoids adding empty entries. The TODO comment about multiline history support is a reasonable future enhancement note.


842-876: Compilation error: undefined function sync_history_line().

Line 867 calls sync_history_line(), but this function is not defined anywhere in the code. This will cause a compilation failure.

Looking at the context, this appears to be unnecessary since the history is managed elsewhere. You can simply remove this call:

             } else if (input_char == KEY_DELETE) {
                 delete_at_cursor(line, widths, char_pos, byte_pos);
-                sync_history_line();
             } else if (input_char == KEY_ARROW_UP || input_char == KEY_ARROW_DOWN) {

The history synchronization happens automatically when you press Enter (lines 981-988), so explicit syncing during editing is not needed.

Likely an incorrect or invalid review comment.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
common/console.cpp (1)

351-395: Consider adding semantic UTF-8 validation.

The decoder now correctly validates continuation byte patterns (10xxxxxx), which addresses the previous review feedback. However, to be fully UTF-8 compliant, consider adding:

  1. Overlong encoding rejection: Disallow code points encoded in more bytes than necessary (e.g., 0xC0 0x80 for U+0000 instead of 0x00).
  2. Surrogate half rejection: Reject U+D800–U+DFFF, which are invalid in UTF-8.
  3. Range validation: Reject code points > U+10FFFF.

These checks would prevent accepting technically malformed UTF-8 sequences that could cause issues downstream.

Example validation additions:

// After decoding 2-byte sequence:
char32_t cp = ((c & 0x1Fu) << 6) | (c1 & 0x3Fu);
if (cp < 0x80) { // overlong
    advance = 1;
    return 0xFFFD;
}

// After decoding 3-byte sequence:
char32_t cp = ((c & 0x0Fu) << 12) | ((c1 & 0x3Fu) << 6) | (c2 & 0x3Fu);
if (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)) { // overlong or surrogate
    advance = 1;
    return 0xFFFD;
}

// After decoding 4-byte sequence:
char32_t cp = ((c & 0x07u) << 18) | ((c1 & 0x3Fu) << 12) | ((c2 & 0x3Fu) << 6) | (c3 & 0x3Fu);
if (cp < 0x10000 || cp > 0x10FFFF) { // overlong or out of range
    advance = 1;
    return 0xFFFD;
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae223fc and 9190fab.

📒 Files selected for processing (1)
  • common/console.cpp (8 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{cpp,h,hpp}: Always format C++ code using git clang-format before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary

Files:

  • common/console.cpp
**/*.{cpp,h,hpp,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible

Files:

  • common/console.cpp
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: WebUI Check
  • GitHub Check: build-linux-cross / debian-13-loongarch64-vulkan-cross
  • GitHub Check: macOS-latest-cmake-tvos
  • GitHub Check: ubuntu-24-cmake-vulkan
  • GitHub Check: ubuntu-24-cmake-webgpu
  • GitHub Check: ubuntu-24-cmake-vulkan-deb
  • GitHub Check: ubuntu-cpu-cmake (arm64, ubuntu-22.04-arm)
  • GitHub Check: macOS-latest-cmake-ios
  • GitHub Check: ubuntu-latest-cmake-rpc
  • GitHub Check: ubuntu-24-wasm-webgpu
  • GitHub Check: macOS-latest-cmake-arm64-webgpu
  • GitHub Check: ubuntu-latest-cmake-sanitizer (THREAD, Debug)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (UNDEFINED, Debug)
  • GitHub Check: ubuntu-latest-llguidance
  • GitHub Check: macOS-latest-cmake-arm64
  • GitHub Check: ubuntu-latest-cmake-sanitizer (ADDRESS, Debug)
  • GitHub Check: ubuntu-22-cmake-hip
  • GitHub Check: macOS-latest-cmake-x64
  • GitHub Check: server-windows
🔇 Additional comments (20)
common/console.cpp (20)

4-8: LGTM!

The new standard library includes are appropriate for the UTF-8 handling, character classification, and type safety added in this PR.


43-58: LGTM!

Using private-use Unicode values (U+E000–U+E008) to represent Windows special keys is a clean approach that unifies key handling across platforms without risking collisions with actual input.


200-212: LGTM!

The Windows key handling correctly maps virtual key codes to the private-use constants and properly detects Ctrl modifiers for word-wise navigation.


418-434: LGTM!

The UTF-8 character position navigation correctly identifies character boundaries by scanning for continuation bytes. The boundary checks prevent out-of-bounds access.


442-468: LGTM!

The delete_at_cursor function correctly handles UTF-8 multi-byte characters, updates the widths vector, redraws the affected tail, and repositions the cursor.


470-481: LGTM!

The clear_current_line helper correctly calculates display width and clears the line. The defensive handling of negative widths is a nice touch.


483-505: LGTM!

The set_line_contents function correctly replaces the entire line, rebuilds the widths vector with actual display widths, and positions the cursor at the end. This is essential for history navigation.


507-525: LGTM!

The move_to_line_start and move_to_line_end functions correctly calculate cursor movement from the widths vector and update both character and byte positions.


527-553: LGTM!

The has_ctrl_modifier function correctly parses semicolon-separated CSI parameters to detect the Ctrl modifier (value 5), which is standard for sequences like ESC[1;5D (Ctrl+Left).


555-557: LGTM!

The is_space_codepoint wrapper provides a clear interface for whitespace detection using the standard library.


559-595: LGTM!

The move_word_left function implements standard readline-style word navigation: skip trailing whitespace, then move backward through the word. The UTF-8 handling is correct.


597-642: LGTM!

The move_word_right function correctly implements bash-style forward word navigation with three phases: skip current whitespace, skip the word, and skip trailing whitespace to position at the start of the next word.


644-675: LGTM!

The move_cursor function correctly handles platform-specific cursor movement with proper line-wrapping logic for Windows and standard escape sequences for POSIX.


677-729: LGTM!

The history_t structure implements clean history navigation with proper state management. The bash-style behavior (edits persist during navigation) was confirmed as intentional in previous review discussions.


746-766: LGTM!

The history navigation lambdas cleanly encapsulate the logic for moving through history and updating the displayed line. The state checks ensure correct behavior.


786-893: LGTM!

The escape sequence handling is comprehensive and correct. Both POSIX (via CSI sequences) and Windows (via private key constants) support the same navigation features including Ctrl+Arrow for word-wise movement, which addresses previous review feedback.


894-922: LGTM!

The backspace handling correctly deletes the previous UTF-8 character, updates the widths vector, redraws the affected tail, and repositions the cursor.


923-961: LGTM!

The character insertion logic correctly handles both end-of-line and mid-line insertion, properly encoding UTF-8, updating the widths vector, and redrawing affected portions of the line.


998-1005: LGTM!

The history storage correctly adds non-empty lines after removing the trailing newline. The TODO for multiline history support is noted for future enhancement.


746-747: LGTM!

The loop invariant assertions are excellent defensive programming that help catch internal logic errors. They correctly express the relationships: char_pos ≤ byte_pos (since UTF-8 characters can span multiple bytes) and char_pos ≤ widths.size() (since char_pos indexes into widths).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants