Optimize get_script() using binary search for 42x speedup #156

Copilot · 2026-01-05T20:19:45Z

Explore the codebase to understand the structure
Identify performance bottlenecks
Optimize the get_script() function using binary search (O(n) → O(log n) with 776 ranges)
Pre-compute sorted data structures at code generation time (in genscripts/generate_text.py)
Add tests for the optimization
Run existing tests to verify correctness (223 tests pass)
Run mypy type checking (no issues)
Run code review
Run security checks (no vulnerabilities found)

Performance Improvement

File: rigour/text/scripts.py

Issue: The get_script() function iterated linearly through 776 Unicode script ranges to find the script for a codepoint, so every lookup was O(n) in the number of ranges.

Solution:

Replaced linear search with binary search using Python's bisect module
Pre-computed SORTED_RANGES and RANGE_STARTS at code generation time in genscripts/generate_text.py rather than at module import time
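The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: it assumes SORTED_RANGES holds non-overlapping, inclusive (start, end, script) tuples sorted by start, and that RANGE_STARTS is the parallel list of start values. The four ranges are sample data only and are not the real Unicode script table.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical sample data: inclusive (start, end, script) tuples,
# sorted by start and non-overlapping. The real table has 776 entries.
SORTED_RANGES: List[Tuple[int, int, str]] = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
]
# Parallel list of range starts, used as the bisect key.
RANGE_STARTS: List[int] = [start for start, _, _ in SORTED_RANGES]


def get_script(cp: int) -> Optional[str]:
    """Return the script for a codepoint, or None if no range covers it."""
    # Index of the last range whose start is <= cp.
    idx = bisect_right(RANGE_STARTS, cp) - 1
    if idx < 0:
        return None  # cp lies before the first range
    _, end, script = SORTED_RANGES[idx]
    # The candidate range only matches if cp does not fall past its end.
    return script if cp <= end else None
```

Because the ranges do not overlap, the range with the greatest start not exceeding the codepoint is the only possible match, so a single bisect call plus one bounds check is enough.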

Benchmark Results:

Before: ~9.8μs per call
After: ~0.23μs per call
Speedup: ~42x faster
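
A comparison of this kind could be reproduced with something like the sketch below. It is not the benchmark used in the PR: the 776 ranges are synthetic, the function names linear_lookup, bisect_lookup and benchmark are hypothetical, and the absolute timings depend on the machine and on which codepoint is looked up.

```python
import timeit
from bisect import bisect_right
from typing import List, Optional, Tuple

# Synthetic stand-in for the script table: 776 non-overlapping ranges.
RANGES: List[Tuple[int, int, str]] = [
    (i * 100, i * 100 + 49, f"Script{i}") for i in range(776)
]
STARTS: List[int] = [start for start, _, _ in RANGES]


def linear_lookup(cp: int) -> Optional[str]:
    """O(n): scan every range until one contains cp."""
    for start, end, script in RANGES:
        if start <= cp <= end:
            return script
    return None


def bisect_lookup(cp: int) -> Optional[str]:
    """O(log n): binary search on the sorted range starts."""
    idx = bisect_right(STARTS, cp) - 1
    if idx < 0:
        return None
    _, end, script = RANGES[idx]
    return script if cp <= end else None


def benchmark(cp: int = 38_825, number: int = 10_000) -> Tuple[float, float]:
    """Return (linear, bisect) average seconds per call for one codepoint."""
    linear = timeit.timeit(lambda: linear_lookup(cp), number=number) / number
    fast = timeit.timeit(lambda: bisect_lookup(cp), number=number) / number
    return linear, fast
```

Calling benchmark() returns the two per-call averages. The ratio between them grows the further into the table the looked-up codepoint sits, since the linear scan stops at the first match.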

This optimization speeds up every function that depends on script detection, including can_latinize() and can_latinize_cp(), and indirectly benefits name processing operations.

Original prompt

Identify and suggest improvements to slow or inefficient code


Co-authored-by: pudo <41628+pudo@users.noreply.github.com>

pudo · 2026-01-05T20:31:53Z

@copilot I like this! Do we need to regenerate _SORTED_RANGES on each start or can we do this as the scripts data is computed in genscripts/generate_text.py?

Co-authored-by: pudo <41628+pudo@users.noreply.github.com>

Copilot · 2026-01-05T20:38:41Z

> @copilot I like this! Do we need to regenerate _SORTED_RANGES on each start or can we do this as the scripts data is computed in genscripts/generate_text.py?

Done in c908d6b. The SORTED_RANGES and RANGE_STARTS are now pre-computed in genscripts/generate_text.py and included in the generated rigour/data/text/scripts.py file, eliminating the runtime sorting overhead.
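
A rough sketch of what pre-computing at code generation time can look like is shown below. It is an assumption-laden illustration, not the contents of genscripts/generate_text.py: the function name render_scripts_module and the shape of its input (a dict mapping (start, end) to a script name) are hypothetical.

```python
from typing import Dict, Tuple


def render_scripts_module(ranges: Dict[Tuple[int, int], str]) -> str:
    """Render Python source with the lookup tables already sorted.

    Hypothetical helper: sorting happens once here, in the generator,
    so the generated module only contains literals and does no work
    at import time.
    """
    sorted_ranges = sorted(
        (start, end, script) for (start, end), script in ranges.items()
    )
    range_starts = [start for start, _, _ in sorted_ranges]
    lines = [
        "# Generated file, do not edit by hand.",
        f"SORTED_RANGES = {sorted_ranges!r}",
        f"RANGE_STARTS = {range_starts!r}",
        "",
    ]
    return "\n".join(lines)
```

The returned string would be written to the generated data module, which get_script() can then import and pass straight to bisect.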

Initial plan

382a453

Copilot AI assigned Copilot and pudo Jan 5, 2026

Copilot started work on behalf of pudo January 5, 2026 20:20

Optimize get_script() using binary search for 42x speedup

dee6816

Co-authored-by: pudo <41628+pudo@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Identify and suggest improvements to slow or inefficient code" to "Optimize get_script() using binary search for 42x speedup" Jan 5, 2026

Copilot AI requested a review from pudo January 5, 2026 20:30

Copilot finished work on behalf of pudo January 5, 2026 20:30

Copilot started work on behalf of pudo January 5, 2026 20:32

Pre-compute SORTED_RANGES and RANGE_STARTS at code generation time

c908d6b

Co-authored-by: pudo <41628+pudo@users.noreply.github.com>

Copilot finished work on behalf of pudo January 5, 2026 20:39
