Collaborator

@KPHippe KPHippe commented Oct 15, 2024

Minimal implementation of Parsl-based distribution for computing minhashes.

A few sticking points:

  • Did not use MinHasher as intended; I hacked around it so that the core logic stays the same.
  • Uses all available workers to compute the minhashes, then keeps the job alive and uses a single worker for minhash so that it is not computed on the pilot (deduplication/workflows:196).
  • Added a wrapper for LSHBloom (deduplication/lshbloom) so that it can run via Parsl.
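
For readers unfamiliar with the two-phase pattern in the second point, here is a minimal sketch. It is not the code in this PR: it swaps in the standard library's `ThreadPoolExecutor` for Parsl so that it runs anywhere, uses a toy MinHash instead of `MinHasher`, and uses a plain `set` as a stand-in for LSHBloom. The names `minhash_signature`, `dedup_serial`, `run`, and `NUM_PERM` are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_PERM = 16  # hypothetical signature length, chosen only for illustration


def minhash_signature(text, num_perm=NUM_PERM):
    """Toy MinHash: for each seed, keep the smallest hash over the word set."""
    tokens = set(text.lower().split()) or {""}
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # one salted hash function per seed
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(tok.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for tok in tokens
        ))
    return tuple(signature)


def dedup_serial(signatures):
    """Stand-in for the LSHBloom step: keep the index of each first-seen signature.

    A real LSH index also catches near-duplicates; this set only catches
    documents whose signatures match exactly.
    """
    seen = set()
    keep = []
    for idx, sig in enumerate(signatures):
        if sig not in seen:
            seen.add(sig)
            keep.append(idx)
    return keep


def run(docs, max_workers=4):
    # Assumption: the pool plays the role of the Parsl workers and the
    # calling thread plays the role of the pilot.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Phase 1: fan out signature computation across all workers.
        signatures = list(pool.map(minhash_signature, docs))
        # Phase 2: keep the pool alive and hand the serial step to a single
        # worker, so it does not run on the calling thread.
        return pool.submit(dedup_serial, signatures).result()
```

Calling `run(["a b c", "c b a", "d e f"])` keeps the first and third documents, because the first two share the same word set and therefore the same signature. The point of the second phase is only where the work executes: it is submitted as one task to the still-running pool rather than being called directly from the driver.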

@KPHippe KPHippe requested a review from 123epsilon October 15, 2024 14:10