Updated import db #727
Conversation
@whilo @TimoKramer @awb99 @yflim a review would be greatly appreciated 🙏
whilo left a comment
@alekcz see comment.
:total-datoms @datom-count
:remaining (- @datom-count @txn-count)})))

(comment
@alekcz Thank you so much for helping with this! Overall it looks good to me and it is great that this code performs well for you. However, I cannot evaluate this comment block, stub-server is not in scope and the alias d is already loaded. Could you turn this into a test?
I read through this again and I am fine with merging it even without the test, since it is already covered and definitely better than before. I am wondering whether it is right to export the datoms sorted by eid, though; I would have guessed it would be better to sort by tx-id. Also, sorting is slow. @alekcz Do you even use export-db?
Force-pushed from 15ca402 to 6149e16
SUMMARY
The current import-db function does not work at scale: it requires a lot of memory and takes very long. Restoring our production instance took 2 hours for 16M datoms and required a huge instance. This is not practical.
This PR changes import-db such that one thread reads the serialized data token by token while another loads the tokens into the store. This changed the restore of 16M datoms from 2 hours to 21 seconds. It introduces 3 new options:
filter-schema?:
ignores schema datoms when importing the db. This is helpful if your schema has evolved over time.
load-entities?:
uses load-entities instead of transact. This is much faster, and it is also more forgiving when initial datoms do not adhere to the current schema.
sync?:
when set to true, the old behaviour is preserved; when set to false, importing runs asynchronously in two threads, and a status function is returned to allow querying of progress.
For backwards compatibility the default mode is synchronous, uses transact, and does not filter the schema.
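The two-thread design described above (one thread tokenizing the serialized dump, another draining tokens into the store in batches) can be sketched generically as follows. This is a minimal illustration of the pattern using a bounded queue from java.util.concurrent, not the PR's actual implementation; the read-token!, load-batch!, and batch size of 1000 are illustrative placeholders.

```clojure
;; Generic producer/consumer sketch of the two-thread import pattern.
;; read-token!  -- fn returning the next token, or nil at end of input (assumed)
;; load-batch!  -- fn loading a vector of tokens into the store (assumed)
(import '[java.util.concurrent ArrayBlockingQueue])

(defn stream-import [read-token! load-batch!]
  (let [q      (ArrayBlockingQueue. 1024)   ; bounded: reader blocks if writer lags
        done   (Object.)                    ; sentinel marking end of input
        reader (future
                 (loop []
                   (if-let [tok (read-token!)]
                     (do (.put q tok) (recur))
                     (.put q done))))
        writer (future
                 (loop [batch []]
                   (let [tok (.take q)]
                     (if (identical? tok done)
                       (when (seq batch) (load-batch! batch))
                       (let [batch' (conj batch tok)]
                         (if (= (count batch') 1000)
                           (do (load-batch! batch') (recur []))
                           (recur batch')))))))]
    ;; status fn, in the spirit of the PR's :sync? false mode:
    ;; returns true once both threads have finished.
    (fn [] (and (realized? reader) (realized? writer)))))
```

The bounded queue is what keeps memory flat: the reader can never get more than 1024 tokens ahead of the writer, so the full dump is never resident at once.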
Checks
Bugfix: fixes #633
Feature: fixes #633
ADDITIONAL INFORMATION
I would appreciate a second pair of eyes on this. Here's how you can run your own test
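The author's own test snippet is not included in this excerpt. As a rough round-trip sketch only (the config shape, file path, and option map are illustrative, and exact datahike signatures for export-db/import-db vary across versions, so check the current docs before running):

```clojure
;; Hypothetical round-trip check: export a small db, re-import it elsewhere.
(require '[datahike.api :as d]
         '[datahike.migrate :refer [export-db import-db]])

(def cfg {:store {:backend :mem :id "import-test"}})   ; illustrative config
(d/create-database cfg)
(def conn (d/connect cfg))
(d/transact conn [{:db/ident       :name
                   :db/valueType   :db.type/string
                   :db/cardinality :db.cardinality/one}
                  {:name "Alice"}
                  {:name "Bob"}])

;; Dump the datoms to a file.
(export-db conn "/tmp/eavt-dump")

;; Import into a fresh db using the new async/load-entities path
;; (option map shape assumed from this PR's description).
(def cfg2 {:store {:backend :mem :id "import-test-2"}})
(d/create-database cfg2)
(def conn2 (d/connect cfg2))
(import-db conn2 "/tmp/eavt-dump" {:sync? false :load-entities? true})
```

With :sync? false the call should return quickly and hand back a status function, so a test would poll that function and then compare datom counts between the two connections.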