@johncalesp
Contributor

In this PR I intend to add the brand field to the accuracy evaluation.
Since brand can be any string, I opted to use another package to perform the evaluation.
The rapidfuzz library compares strings and returns a numeric similarity score, which can then be checked against a threshold. In comparison, sklearn looks for exact string matches, and if we have 1,000 different brands, it treats this as a classification problem with 1,000 classes (a multi-class classification problem).
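To illustrate the difference, here is a minimal sketch of threshold-based fuzzy matching. It is not the code from this PR: `similarity`, `normalize`, and `brand_accuracy` are hypothetical names, the normalization (trim and lowercase) and the default threshold of 90 are assumptions, and the `difflib` fallback is only a stand-in so the sketch runs where rapidfuzz is not installed.

```python
# Sketch only: the function names below are hypothetical, not taken from this PR.
try:
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        # fuzz.ratio returns a similarity score between 0 and 100.
        return fuzz.ratio(a, b)
except ImportError:
    # Stand-in so the sketch runs without rapidfuzz; scores may differ slightly.
    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        return 100.0 * SequenceMatcher(None, a, b).ratio()


def normalize(text: str) -> str:
    # Assumed normalization: trim surrounding whitespace and lowercase.
    return text.strip().lower()


def brand_accuracy(truths, preds, valid_threshold=90.0):
    """Fraction of predictions whose similarity to the truth meets the threshold."""
    matches = []
    for truth, pred in zip(truths, preds):
        norm_truth, norm_pred = normalize(truth), normalize(pred)
        score = similarity(norm_truth, norm_pred)
        matches.append(1 if score >= valid_threshold else 0)
    return sum(matches) / len(matches) if matches else 0.0


if __name__ == "__main__":
    # An exact-match metric would count "Coca Cola" vs "Coca-Cola" as a miss.
    print(similarity(normalize("Coca-Cola"), normalize("Coca Cola")))
    print(brand_accuracy(["Nike", "Adidas"], [" nike", "Sony"]))
```

With an exact-match metric, every spelling variant of a brand becomes its own class, whereas here a near match only needs to clear `valid_threshold`.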

The evaluation will now look like this:

image

@johncalesp johncalesp requested a review from a team as a code owner December 6, 2025 01:06
@github-actions
Contributor

github-actions bot commented Dec 6, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Comment on lines +128 to +130
if norm_truth == norm_pred:
    matches.append(1)
    continue
Contributor

I'm just wondering: if it's an exact match, wouldn't the score from fuzz.ratio also be above valid_threshold? If so, maybe there's no need to treat the exact match as a special case that is handled differently.
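A quick way to check this is to run the loop with and without the shortcut and compare the results. The sketch below is an assumption-laden reconstruction, not the PR's code: the function names are hypothetical, the comparison is assumed to be `score >= valid_threshold`, and `difflib` is used as a stand-in when rapidfuzz is unavailable. For identical non-empty strings the score is 100, so the two variants should agree whenever the threshold is at most 100; if the PR compares with a strict `>` and a threshold of 100, or if empty strings can occur, the shortcut may still matter and is worth testing separately.

```python
# Sketch only: hypothetical helpers for comparing the two loop variants.
try:
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        return fuzz.ratio(a, b)  # score between 0 and 100
except ImportError:
    # Stand-in so the sketch runs without rapidfuzz.
    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        return 100.0 * SequenceMatcher(None, a, b).ratio()


def matches_with_shortcut(pairs, valid_threshold):
    matches = []
    for norm_truth, norm_pred in pairs:
        # Exact matches are handled as a special case, as in the reviewed lines.
        if norm_truth == norm_pred:
            matches.append(1)
            continue
        score = similarity(norm_truth, norm_pred)
        matches.append(1 if score >= valid_threshold else 0)
    return matches


def matches_without_shortcut(pairs, valid_threshold):
    # Same loop, relying on the similarity score alone.
    return [
        1 if similarity(norm_truth, norm_pred) >= valid_threshold else 0
        for norm_truth, norm_pred in pairs
    ]


if __name__ == "__main__":
    pairs = [("nike", "nike"), ("adidas", "adidas "), ("sony", "adidas")]
    print(matches_with_shortcut(pairs, 90.0))
    print(matches_without_shortcut(pairs, 90.0))
```

If both variants agree on the evaluation data, the shortcut is only a small speed optimization and could be dropped for simplicity.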

2 participants