@anderdc anderdc commented Jan 5, 2026

use tree-sitter to parse GitHub code content and calculate a 'truer' value of changed lines

  • validators now fetch full file contents, not just file patches, so they can build ASTs and compare old vs. new nodes/tokens
  • ASTs are constructed from the old and new file contents, and the changed tokens are compared
  • structural and leaf code changes are scored separately, each with its own set of weights
  • unsupported file types are not scored (their lines still count toward code density)
  • code_density = token_score / total_lines
  • base score is no longer static; it is multiplied by code_density
  • PRs with a token score < 5.0 earn no base score, but can still earn the contribution bonus
  • contribution_bonus uses token_score
  • tier configs now use token_score: each tier requires a minimum token score AND a minimum token score for n unique repos
  • PRs are no longer flagged as "low value" and excluded from scoring; because token score and code density weigh heavily in scoring, such PRs will still score low
  • low_value_pr code is left in place as deprecated, in case we flip back
  • etc...
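
To make the scoring flow above concrete, here is a rough sketch, not the code from this PR. It swaps in Python's stdlib `ast` module for tree-sitter so it runs without extra dependencies, and it only handles Python source. The weights, the `base` value, the choice of which nodes count as "structural" vs. "leaf", and all function names are assumptions for illustration; only the 5.0 cutoff and the `code_density = token_score / total_lines` relation come from the description.

```python
import ast
from collections import Counter

# Hypothetical weights; the PR does not state the real values.
STRUCTURAL_WEIGHT = 1.0
LEAF_WEIGHT = 0.5
# Cutoff taken from the PR description: below this, no base score is earned.
MIN_TOKEN_SCORE = 5.0


def node_counts(source: str) -> tuple[Counter, Counter]:
    """Return (structural, leaf) multisets of node signatures for one file.

    Assumption: statements are "structural", names and constants are "leaf".
    """
    structural, leaf = Counter(), Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.stmt):
            structural[type(node).__name__] += 1
        elif isinstance(node, ast.Name):
            leaf[("Name", node.id)] += 1
        elif isinstance(node, ast.Constant):
            leaf[("Constant", repr(node.value))] += 1
    return structural, leaf


def changed(old: Counter, new: Counter) -> int:
    """Count nodes present in one version but not the other."""
    return sum((old - new).values()) + sum((new - old).values())


def token_score(old_source: str, new_source: str) -> float:
    """Weighted count of structural and leaf changes between two versions."""
    old_struct, old_leaf = node_counts(old_source)
    new_struct, new_leaf = node_counts(new_source)
    return (STRUCTURAL_WEIGHT * changed(old_struct, new_struct)
            + LEAF_WEIGHT * changed(old_leaf, new_leaf))


def code_density(score: float, total_lines: int) -> float:
    """code_density = token_score / total_lines (0.0 when no lines changed)."""
    return score / total_lines if total_lines > 0 else 0.0


def base_score(score: float, total_lines: int, base: float = 30.0) -> float:
    """Scale a hypothetical base value by density; zero below the cutoff."""
    if score < MIN_TOKEN_SCORE:
        return 0.0
    return base * code_density(score, total_lines)


if __name__ == "__main__":
    old = "def f(x):\n    return x\n"
    new = "def f(x):\n    if x > 0:\n        return x + 1\n    return 0\n"
    score = token_score(old, new)
    print(score, code_density(score, total_lines=4), base_score(score, 4))
```

A multiset comparison like this ignores node position, so moving code within a file counts as no change; a real tree diff would likely be stricter. `total_lines` is passed in separately because, per the list above, lines from unsupported file types still count toward density even though they add nothing to the token score.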

misc. updates

  • remove db migrator and definitions of db tables

@anderdc anderdc marked this pull request as ready for review January 13, 2026 05:09
@entrius entrius merged commit f1e7d05 into test Jan 15, 2026
2 checks passed
@entrius entrius deleted the token-scoring branch January 15, 2026 04:25