perf(engine): cache compiled routines in Thread.loadNext() #250

Open

killerdevildog wants to merge 1 commit into pmgl:master from killerdevildog:fix/cache-call-string-routines

Conversation

@killerdevildog

perf(engine): cache compiled routines in Thread.loadNext()

Summary

Fixes a critical per-frame performance bottleneck in the microScript v2 runtime engine. Thread.loadNext() was re-parsing and re-compiling call strings like "update()" and "draw()" every single frame, despite the strings never changing.

Problem

At 60fps, loadNext() creates 120+ full parse→compile cycles per second for static call strings. Each cycle allocates:

  • A Tokenizer with lookup tables
  • A Parser with api_reserved arrays
  • AST nodes
  • A Compiler with opcodes arrays and label maps

All of these objects are immediately garbage collected, creating unnecessary GC pressure and wasted CPU time.

Solution

Added a Map<string, Routine> cache (call_cache) to Thread. On the first encounter of a call string, the parse→compile path runs as before, then stores the compiled Routine in the cache. On subsequent frames, the cached Routine is returned directly via Map.get(), bypassing the entire pipeline.

Why this is safe

The cached routines compile to "call by name" instructions — they resolve the actual function body at runtime via context.global. When user source code changes, context.global.update (etc.) is updated by Runner.run(), so the cached routine automatically picks up the new function body without cache invalidation.

The cache is scoped to Thread lifetime — new game sessions create a new MicroVMRunnerThread → fresh cache.

Benchmark Results (Vitest/Tinybench)

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Single call `"update()"` | 641,228 ops/sec | 18,482,921 ops/sec | ~29x faster |
| Full frame (update+draw) | 375,660 ops/sec | 17,222,238 ops/sec | ~46x faster |
| Per-frame cost at 60fps | ~162µs/sec | ~3.5µs/sec | ~46x less overhead |
| GC pressure | 120+ allocs/sec | 0 after warm-up | Eliminated |

Files Changed

  • static/js/languages/microscript/v2/runner.coffee — CoffeeScript source
  • static/js/languages/microscript/v2/runner.js — compiled JS output

Testing

  • Vitest benchmark suite confirms performance improvement
  • runner.coffee compiles cleanly with CoffeeScript
  • Compiled output matches the manually maintained .js file logic

Benchmark Suite (patch included)

A full Vitest/Tinybench benchmark suite is available as a separate patch file (benchmarks.patch). It includes:

  • 10 benchmark suites covering the microScript v2 pipeline (tokenizer, parser, compiler, processor, pipeline) and 30 identified bottlenecks
  • Baseline and post-fix benchmark results (benchmark_baseline.md, benchmark_postfix.md)
  • Bottleneck analysis (possible_bottlenecks.md)
  • Setup and helpers for loading microScript source files in Node.js via vm.Script

To apply:

git apply benchmarks.patch
cd benchmarks && npm install
npx vitest bench --run

benchmarks.patch

Bottleneck pmgl#1: loadNext() receives strings like "update()" and "draw()"
and creates a new Parser, parses, creates a new Compiler, and compiles
every single frame. At 60fps this is 120+ full parse→compile cycles/sec
for code that never changes.

Each cycle allocates a Tokenizer (with lookup tables), Parser (with
api_reserved), AST nodes, Compiler, opcodes arrays, and label maps —
all immediately garbage collected.

Fix: Add a Map<string, Routine> cache (call_cache) to Thread. On first
encounter of a call string, parse and compile as before, then store the
compiled Routine in the cache. On subsequent frames, return the cached
Routine directly via Map.get(), bypassing the entire pipeline.

The cached routines are semantically safe to reuse — they compile to
"call by name" instructions that resolve the actual function body at
runtime via context.global, so source code changes are picked up
without cache invalidation.

Benchmark results (Vitest/Tinybench):
  Before: 375,660 ops/sec (full frame parse+compile update()+draw())
  After:  17,222,238 ops/sec (cached Map.get)
  Speedup: ~46x faster, GC pressure eliminated

Files changed:
  - runner.coffee: source of truth (CoffeeScript)
  - runner.js: compiled output (ES6 class syntax)
