Proposal: Context File Growth: Consolidation + Smart Retrieval

## Problem

Files that grow every session — `LEARNINGS.md`, `DECISIONS.md`,
`CONVENTIONS.md` — accumulate entries without bound. After weeks of active
use, these files become expensive to load into agent context and dilute
signal with entries that aren't relevant to the current task.

The problem isn't storage (flat files are fine on disk). The problem is
**retrieval**: `ctx agent --budget 4000` must decide what to include, and
today it truncates by token count rather than by relevance.

## Constraints

- **ctx is intentionally file-based.** SQLite or any database layer violates
  the core design principle that context is human-readable, git-tracked
  Markdown.
- **Archiving buries signal.** Moving old entries to an archive folder makes
  them harder to find. A learning from 3 months ago might still be the most
  critical thing an agent needs today.
- **Pagination fragments.** Splitting into `LEARNINGS-01.md`,
  `LEARNINGS-02.md` loses the single-file simplicity that makes ctx work.
  Agents would need to know which file to read, and humans lose the ability
  to grep one file.
- **Old entries aren't necessarily stale.** Unlike tasks (which complete),
  learnings and decisions are often permanent. "Don't use `go install` in
  hooks" is as true on day 100 as day 1.

## Proposal: Three-Layer Approach

### 1. Periodic Consolidation (reduces without burying)

A `/ctx-consolidate` skill that:

- Groups related entries by topic/tag similarity
- Merges redundant or overlapping entries into denser combined entries
- Moves originals to `.context/archive/learnings-YYYY-QN.md` for reference
- Is human-triggered, not automatic (preserves "you control the context")

Example: 5 separate learnings about hook edge cases become 1 consolidated
entry covering all the gotchas, with the 5 originals archived.

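**Key distinction: consolidation ≠ archival.** Pure archival moves entries
out and leaves nothing behind. Consolidation *replaces* verbose entries with
tighter ones in the live file, so the signal stays where agents read it; the
archive only keeps the originals for reference.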
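
The grouping step could be sketched as follows. This is a minimal illustration, not the skill's actual logic: the `Entry` shape, the tag sets, the Jaccard measure, and the `0.5` threshold are all assumptions made for the example. It only proposes groups; merging would still wait for human approval.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical in-memory shape; the real entry format is not defined here.
    title: str
    tags: frozenset[str] = field(default_factory=frozenset)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Share of tags two entries have in common (0.0 when both are untagged)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_tags(entries: list[Entry], threshold: float = 0.5) -> list[list[Entry]]:
    """Greedily place each entry in the first group whose seed entry is similar enough."""
    groups: list[list[Entry]] = []
    for entry in entries:
        for group in groups:
            if jaccard(entry.tags, group[0].tags) >= threshold:
                group.append(entry)
                break
        else:
            # No existing group matched, so this entry seeds a new one.
            groups.append([entry])
    return groups


def consolidation_candidates(entries: list[Entry], threshold: float = 0.5) -> list[list[Entry]]:
    """Only groups with more than one entry are worth suggesting to the user."""
    return [g for g in group_by_tags(entries, threshold) if len(g) > 1]
```

Comparing against the group's first entry keeps the sketch simple and deterministic; a real implementation might compare against every member or use text similarity instead of tags.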

### 2. Relevance-Aware `ctx agent` Retrieval

Instead of truncating by token count, `ctx agent` scores entries:

- **Recency boost**: entries from the last N sessions rank higher
- **Task relevance**: keyword/tag overlap with active `TASKS.md` entries
- **Entry type weighting**: conventions and active decisions rank higher
  than old learnings (conventions are always relevant; learnings are
  situational)
- **Summarize the rest**: entries that don't make the budget cut get a
  one-line summary rather than full inclusion

The files stay flat and append-only. The *presentation layer* gets smart.

This is the highest-impact change: it addresses the retrieval problem
without touching the files at all.
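
One way the scoring signals above could combine is shown below. It is a sketch under stated assumptions: the `Entry` fields, the weight values, the stopword list, and the five-session recency window are placeholders chosen for illustration, not values from `ctx`.

```python
import re
from dataclasses import dataclass

# Small illustrative stopword list; a real one would be longer.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "for", "on", "with"}

# Assumed weights: conventions above decisions above learnings.
TYPE_WEIGHT = {"convention": 1.0, "decision": 0.7, "learning": 0.4}


@dataclass
class Entry:
    kind: str          # "convention", "decision", or "learning"
    text: str
    sessions_ago: int  # 0 means written in the current session


def keywords(text: str) -> set[str]:
    """Lowercased word tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower()) if w not in STOPWORDS}


def score(entry: Entry, task_words: set[str], recent_sessions: int = 5) -> float:
    """Entry type weight + keyword overlap with tasks + a fixed recency boost."""
    overlap = keywords(entry.text) & task_words
    relevance = len(overlap) / len(task_words) if task_words else 0.0
    recency = 1.0 if entry.sessions_ago < recent_sessions else 0.0
    return TYPE_WEIGHT.get(entry.kind, 0.0) + relevance + 0.3 * recency


def rank(entries: list[Entry], tasks_text: str) -> list[Entry]:
    """Sort entries from most to least relevant for the active tasks."""
    task_words = keywords(tasks_text)
    return sorted(entries, key=lambda e: score(e, task_words), reverse=True)
```

With these placeholder weights, an old learning that matches the active tasks outranks a recent but unrelated one, which is the behavior that truncation by token count cannot provide.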

### 3. Soft Caps with Nudges

`ctx drift` warns when a file exceeds a threshold:

```
⚠ LEARNINGS.md has 47 entries (recommended: ≤30)
  Run /ctx-consolidate to review and merge related entries
```

This is a nudge in the existing maintenance workflow, not enforcement. The
threshold is configurable via `.contextrc`.
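
The check itself is small. The sketch below assumes each entry starts with a level-2 Markdown heading and that the caps arrive as a plain mapping; how `.contextrc` would store them is not specified here, so the config format is left out.

```python
def count_entries(markdown: str, heading_prefix: str = "## ") -> int:
    """Count entries, assuming each one starts with a level-2 heading."""
    return sum(1 for line in markdown.splitlines() if line.startswith(heading_prefix))


def drift_warnings(files: dict[str, str], caps: dict[str, int]) -> list[str]:
    """Return one nudge per file whose entry count exceeds its soft cap."""
    warnings = []
    for name, content in files.items():
        cap = caps.get(name)
        if cap is None:
            continue  # no cap configured for this file
        n = count_entries(content)
        if n > cap:
            warnings.append(
                f"⚠ {name} has {n} entries (recommended: ≤{cap})\n"
                f"  Run /ctx-consolidate to review and merge related entries"
            )
    return warnings
```

The function only reports; it never edits or blocks anything, which keeps the cap soft.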

## Alternatives Considered

| Approach | Why not |
|---|---|
| **SQLite** | Violates file-based design; not human-readable; not git-diffable |
| **Folder pagination** | Fragments the file; agents don't know which page to read; humans can't grep one file |
| **Pure archival** | Buries signal; a 6-month-old learning might be the most relevant today |
| **Automatic pruning** | Dangerous — who decides what's stale? Only the human should make that call |
| **Tagging + filtering** | Helps retrieval but doesn't reduce file size; complementary, not sufficient |

## Implementation Phases

### Phase 1: Smart Retrieval (highest impact, no file changes)

Enhance `ctx agent` to score entries against current tasks before including
them. This alone solves the "too much context" problem without any file
format changes.

- Modify `internal/cli/agent/` to score entries
- Add keyword extraction from `TASKS.md` for relevance matching
- Entries below the relevance threshold get one-line summaries
- Budget allocation: constitution (fixed) → tasks (fixed) → conventions
  (high weight) → decisions (medium) → learnings (scored)
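
The budget pass might look like the sketch below. It assumes the fixed sections are always emitted, that the remaining entries arrive already ordered (conventions, then decisions, then scored learnings), and that each entry carries a pre-written one-line summary. `approx_tokens` is a whitespace word count standing in for a real tokenizer, and all names are hypothetical.

```python
def approx_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def allocate(
    fixed: list[str],
    ranked: list[tuple[str, str]],
    budget: int,
    count_tokens=approx_tokens,
) -> tuple[list[str], int]:
    """
    fixed:  texts that are always included (constitution, tasks).
    ranked: (full_text, one_line_summary) pairs, highest priority first.
    Returns the texts to emit and the number of tokens used.
    """
    out = list(fixed)
    used = sum(count_tokens(t) for t in fixed)
    for full, summary in ranked:
        # Prefer the full entry; fall back to its summary; otherwise drop it.
        for candidate in (full, summary):
            cost = count_tokens(candidate)
            if used + cost <= budget:
                out.append(candidate)
                used += cost
                break
    return out, used
```

Because the loop keeps going after an entry fails to fit, a short low-priority entry can still be included in full after a long high-priority one was reduced to its summary. Whether that is desirable is a design choice this sketch leaves open.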

### Phase 2: Drift Nudges (low effort)

Add entry count checks to `ctx drift`:

- Warn when `LEARNINGS.md` exceeds 30 entries
- Warn when `DECISIONS.md` exceeds 20 entries
- Configurable thresholds via `.contextrc`

### Phase 3: Consolidation Skill (full solution)

Build `/ctx-consolidate` skill:

- Analyze entries for topic similarity
- Present consolidation suggestions to the user
- Merge approved groups into denser entries
- Archive originals to `.context/archive/`
- Update cross-references if any entries link to consolidated ones
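
For the archive step, the quarterly file name from the `learnings-YYYY-QN.md` pattern can be derived from a date. This sketch reads `QN` as the calendar quarter; which date applies (when the entry was written or when it was consolidated) is not settled by the proposal, so the caller passes it in. The function name is hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_path(kind: str, when: date, root: str = ".context/archive") -> PurePosixPath:
    """Build e.g. .context/archive/learnings-2025-Q1.md for a given date."""
    quarter = (when.month - 1) // 3 + 1  # Jan-Mar -> 1, ..., Oct-Dec -> 4
    return PurePosixPath(root) / f"{kind}-{when.year}-Q{quarter}.md"
```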

## Open Questions

- Should consolidated entries carry a `consolidated-from:` metadata field
  linking back to the originals?
- Should the archive be gitignored (like sessions) or tracked (like context
  files)?
- Should `ctx agent` budget allocation weights be configurable, or are
  sensible defaults sufficient?
- Are 30 (learnings) and 20 (decisions) the right soft caps? Should every
  growing file have its own cap?

## Labels

`enhancement`, `context-management`, `agent-ux`

