Proposal: Context File Growth: Consolidation + Smart Retrieval #19

@v0lkan

Problem

Files that grow every session — LEARNINGS.md, DECISIONS.md,
CONVENTIONS.md — accumulate entries without bound. After weeks of active
use, these files become expensive to load into agent context and dilute
signal with entries that aren't relevant to the current task.

The problem isn't storage (flat files are fine on disk). The problem is
retrieval: ctx agent --budget 4000 must decide what to include, and
today it truncates by token count rather than by relevance.

Constraints

  • ctx is intentionally file-based. SQLite or any database layer violates
    the core design principle that context is human-readable, git-tracked
    Markdown.
  • Archiving buries signal. Moving old entries to an archive folder makes
    them harder to find. A learning from 3 months ago might still be the most
    critical thing an agent needs today.
  • Pagination fragments. Splitting into LEARNINGS-01.md,
    LEARNINGS-02.md loses the single-file simplicity that makes ctx work.
    Agents would need to know which file to read, and humans lose the ability
    to grep one file.
  • Old entries aren't necessarily stale. Unlike tasks (which complete),
    learnings and decisions are often permanent. "Don't use go install in
    hooks" is as true on day 100 as day 1.

Proposal: Three-Layer Approach

1. Periodic Consolidation (reduces without burying)

A /ctx-consolidate skill that:

  • Groups related entries by topic/tag similarity
  • Merges redundant or overlapping entries into denser combined entries
  • Moves originals to .context/archive/learnings-YYYY-QN.md for reference
  • Is human-triggered, not automatic (preserves "you control the context")

Example: 5 separate learnings about hook edge cases become 1 consolidated
entry covering all the gotchas, with the 5 originals archived.

Key distinction: consolidation ≠ archival. Archival moves entries out.
Consolidation replaces verbose entries with tighter ones — the file stays
useful, just denser.

2. Relevance-Aware ctx agent Retrieval

Instead of truncating by token count, ctx agent scores entries:

  • Recency boost: entries from the last N sessions rank higher
  • Task relevance: keyword/tag overlap with active TASKS.md entries
  • Entry type weighting: conventions and active decisions rank higher
    than old learnings (conventions are always relevant; learnings are
    situational)
  • Summarize the rest: entries that don't make the budget cut get a
    one-line summary rather than full inclusion

The files stay flat and append-only. The presentation layer gets smart.

This is the highest-impact change: it solves the problem without touching
the files at all.
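
A minimal sketch of what such scoring could look like, assuming a hypothetical `Entry` struct and illustrative weights. None of these names or numbers come from ctx itself, and the substring match stands in for whatever keyword/tag matching is eventually chosen.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Entry is a hypothetical parsed context entry.
type Entry struct {
	Kind        string // "convention", "decision", or "learning"
	Text        string
	SessionsAgo int // 0 = written in the current session
}

// typeWeight values are illustrative only.
var typeWeight = map[string]float64{"convention": 3, "decision": 2, "learning": 1}

// score adds the type weight, a recency boost for entries from the last
// n sessions, and one point per task keyword found in the entry text.
func score(e Entry, taskKeywords []string, n int) float64 {
	s := typeWeight[e.Kind]
	if e.SessionsAgo < n {
		s++
	}
	lower := strings.ToLower(e.Text)
	for _, kw := range taskKeywords {
		if strings.Contains(lower, strings.ToLower(kw)) {
			s++
		}
	}
	return s
}

// rank returns a copy of entries sorted by descending score; ties keep
// their original (file) order.
func rank(entries []Entry, taskKeywords []string, n int) []Entry {
	out := append([]Entry(nil), entries...)
	sort.SliceStable(out, func(i, j int) bool {
		return score(out[i], taskKeywords, n) > score(out[j], taskKeywords, n)
	})
	return out
}

func main() {
	entries := []Entry{
		{Kind: "learning", Text: "Don't use go install in hooks", SessionsAgo: 40},
		{Kind: "learning", Text: "Prefer short commit subjects", SessionsAgo: 1},
		{Kind: "convention", Text: "Run gofmt before committing", SessionsAgo: 90},
	}
	keywords := []string{"hooks", "install"} // would come from active TASKS.md entries
	for _, e := range rank(entries, keywords, 3) {
		fmt.Printf("%.0f  %s\n", score(e, keywords, 3), e.Text)
	}
}
```

In this example the 40-session-old learning about hooks ties with the convention and outranks the recent but unrelated learning, which is the behavior token-count truncation cannot provide.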

3. Soft Caps with Nudges

ctx drift warns when a file exceeds a threshold:

⚠ LEARNINGS.md has 47 entries (recommended: ≤30)
  Run /ctx-consolidate to review and merge related entries

Not enforcement — just a nudge in the existing maintenance workflow. The
threshold is configurable via .contextrc.
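
The check itself is small. As a sketch, a hypothetical `driftWarning` helper (not an existing ctx function) could produce the message above from an entry count and a soft cap:

```go
package main

import "fmt"

// driftWarning returns the nudge text when count exceeds softCap, or ""
// when the file is within the recommended size. It never blocks anything;
// the caller only prints the result.
func driftWarning(file string, count, softCap int) string {
	if count <= softCap {
		return ""
	}
	return fmt.Sprintf(
		"⚠ %s has %d entries (recommended: ≤%d)\n  Run /ctx-consolidate to review and merge related entries",
		file, count, softCap)
}

func main() {
	if w := driftWarning("LEARNINGS.md", 47, 30); w != "" {
		fmt.Println(w)
	}
}
```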

Alternatives Considered

Approach            | Why not
--------------------|--------------------------------------------------
SQLite              | Violates file-based design; not human-readable; not git-diffable
Folder pagination   | Fragments the file; agents don't know which page to read; humans can't grep one file
Pure archival       | Buries signal; a 6-month-old learning might be the most relevant today
Automatic pruning   | Dangerous: who decides what's stale? Only the human should make that call
Tagging + filtering | Helps retrieval but doesn't reduce file size; complementary, not sufficient

Implementation Phases

Phase 1: Smart Retrieval (highest impact, no file changes)

Enhance ctx agent to score entries against current tasks before including
them. On its own, this addresses the "too much context" problem at load
time without any file format changes.

  • Modify internal/cli/agent/ to score entries
  • Add keyword extraction from TASKS.md for relevance matching
  • Entries below the relevance threshold get one-line summaries
  • Budget allocation: constitution (fixed) → tasks (fixed) → conventions
    (high weight) → decisions (medium) → learnings (scored)
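
The budget allocation order could be sketched as below. This assumes entries already carry an estimated token cost and arrive sorted in tier order; the `Entry` struct, the `allocate` function, and all token figures are hypothetical.

```go
package main

import "fmt"

// Entry is a hypothetical pre-ranked item; Tokens is an estimated cost.
type Entry struct {
	Summary string // one-line summary used when the body does not fit
	Body    string
	Tokens  int
}

// allocate includes every fixed entry in full, then walks ranked entries
// (assumed sorted: conventions, decisions, then scored learnings) and
// includes each body that still fits. Entries that do not fit are reduced
// to their one-line summary. Summaries are treated as free here, a
// simplification a real implementation would need to revisit.
func allocate(budget int, fixed, ranked []Entry) (lines []string, remaining int) {
	remaining = budget
	for _, e := range fixed {
		lines = append(lines, e.Body)
		remaining -= e.Tokens
	}
	for _, e := range ranked {
		if e.Tokens <= remaining {
			lines = append(lines, e.Body)
			remaining -= e.Tokens
		} else {
			lines = append(lines, "summary: "+e.Summary)
		}
	}
	return lines, remaining
}

func main() {
	fixed := []Entry{
		{Summary: "constitution", Body: "CONSTITUTION (full)", Tokens: 500},
		{Summary: "tasks", Body: "TASKS (full)", Tokens: 800},
	}
	ranked := []Entry{
		{Summary: "conventions", Body: "CONVENTIONS (full)", Tokens: 1200},
		{Summary: "decision: flat files", Body: "DECISION (full)", Tokens: 1000},
		{Summary: "learning: hook edge cases", Body: "LEARNING A (full)", Tokens: 900},
		{Summary: "learning: go install in hooks", Body: "LEARNING B (full)", Tokens: 300},
	}
	lines, left := allocate(4000, fixed, ranked)
	for _, l := range lines {
		fmt.Println(l)
	}
	fmt.Println("tokens left:", left)
}
```

With a budget of 4000, the 900-token learning no longer fits after the higher tiers are included, so it is reduced to a summary, while the smaller 300-token learning behind it is still included in full.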

Phase 2: Drift Nudges (low effort)

Add entry count checks to ctx drift:

  • Warn when LEARNINGS.md exceeds 30 entries
  • Warn when DECISIONS.md exceeds 20 entries
  • Configurable thresholds via .contextrc
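
Counting entries depends on the entry format, which this proposal does not pin down. The sketch below assumes, purely for illustration, that every entry starts with a level-2 Markdown heading (`## `). The `countEntries` and `overCap` helpers are hypothetical, and the `.contextrc` override is left out because its schema is not specified here.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// defaultCaps mirrors the thresholds listed above.
var defaultCaps = map[string]int{"LEARNINGS.md": 30, "DECISIONS.md": 20}

// countEntries counts lines starting with "## ", on the assumption that
// each entry begins with a level-2 heading. Headings inside fenced code
// blocks would be miscounted; a real check should skip them.
func countEntries(markdown string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(markdown))
	for sc.Scan() {
		if strings.HasPrefix(sc.Text(), "## ") {
			n++
		}
	}
	return n
}

// overCap reports whether a file's entry count exceeds its cap.
// Files without a configured cap are never flagged.
func overCap(file, markdown string) bool {
	limit, ok := defaultCaps[file]
	return ok && countEntries(markdown) > limit
}

func main() {
	doc := "# Learnings\n\n## Hooks need absolute paths\ntext\n\n## Avoid go install in hooks\ntext\n"
	fmt.Println(countEntries(doc), overCap("LEARNINGS.md", doc))
}
```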

Phase 3: Consolidation Skill (full solution)

Build /ctx-consolidate skill:

  • Analyze entries for topic similarity
  • Present consolidation suggestions to the user
  • Merge approved groups into denser entries
  • Archive originals to .context/archive/
  • Update cross-references if any entries link to consolidated ones
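
For the archive step, the quarterly file name from the proposal (`learnings-YYYY-QN.md`) can be derived from a date. A small sketch, with hypothetical helper names `archiveName` and `archivePath`:

```go
package main

import (
	"fmt"
	"path/filepath"
	"time"
)

// archiveName builds "<kind>-YYYY-QN.md" for the quarter containing the
// given month (Jan-Mar = Q1, ..., Oct-Dec = Q4).
func archiveName(kind string, year int, month time.Month) string {
	quarter := (int(month)-1)/3 + 1
	return fmt.Sprintf("%s-%d-Q%d.md", kind, year, quarter)
}

// archivePath places the archive file under .context/archive/.
func archivePath(kind string, t time.Time) string {
	return filepath.Join(".context", "archive", archiveName(kind, t.Year(), t.Month()))
}

func main() {
	t := time.Date(2025, time.May, 14, 0, 0, 0, 0, time.UTC)
	fmt.Println(archivePath("learnings", t))
}
```

Whether the archive date should be the entry's creation date or the consolidation date is left open here; the sketch works with either.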

Open Questions

  • Should consolidated entries carry a consolidated-from: metadata field
    linking back to the originals?
  • Should the archive be gitignored (like sessions) or tracked (like context
    files)?
  • Should ctx agent budget allocation weights be configurable, or are
    sensible defaults sufficient?
  • Is 30 the right soft cap for learnings? Should it vary by file type?

Labels

enhancement, context-management, agent-ux
