Commit 291234c

process updates in batches of 500
Batch processing:

* updates just the declared license in the DB documents using `collection.bulk_write()`
* updates definitions using service API `POST /definitions?force=true`

_NOTE: Updating the DB makes the fix of the declared license immediately available. When the `POST /definitions` request completes, the full DB document will be updated to be in sync with the blob definition._

Additional changes:

* moves global variable definitions based on .env to the initialize() function
* adds DRYRUN flag to check what would run and how many records would be evaluated
* adds estimated time to complete
* adds script and function level documentation
* includes timestamps to make it easier to estimate how long it will take to complete a run
* generates filename based on date range and offset to avoid overwriting output files

_NOTE: Azure only supports fetching one blob at a time, so that part of the code could not be optimized._

_NOTE: Batch size of 500 was selected because that is the max number of coordinates supported in calls to service API `POST /definitions`._
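
The batching flow described in this commit message could look roughly like the sketch below. It is not the code from this commit: `batched`, `process_in_batches`, and `estimate_remaining_seconds` are hypothetical names, and `update_db` and `post_definitions` are placeholder callables standing in for the `collection.bulk_write()` call and the `POST /definitions?force=true` request.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional

# Batch size taken from the commit message: the max number of coordinates
# accepted by the service API's POST /definitions call.
BATCH_SIZE = 500


def batched(items: Iterable, size: int = BATCH_SIZE) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def process_in_batches(
    coordinates: Iterable,
    update_db: Optional[Callable[[List], None]],
    post_definitions: Optional[Callable[[List], None]],
    dry_run: bool = False,
) -> int:
    """Run both update steps once per batch and return the batch count.

    `update_db` and `post_definitions` are hypothetical hooks; in a real
    script they would wrap the bulk DB write and the service API request.
    With dry_run=True the batches are only counted, nothing is called.
    """
    batches = 0
    for chunk in batched(coordinates):
        batches += 1
        if dry_run:
            continue
        update_db(chunk)         # DB first, so the license fix is visible at once
        post_definitions(chunk)  # then ask the service to recompute definitions
    return batches


def estimate_remaining_seconds(elapsed: float, done: int, total: int) -> Optional[float]:
    """Linear estimate of time left, based on the average time per item so far."""
    if done <= 0:
        return None  # nothing processed yet, so no basis for an estimate
    return elapsed / done * (total - done)
```

Passing the two update steps in as callables keeps the batching logic independent of MongoDB and the HTTP client, which also makes a dry run trivial to support.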
1 parent 615c940 commit 291234c

2 files changed: +169 -84 lines changed

tools/analyze_data_synchronization/.env_example

Lines changed: 2 additions & 1 deletion
@@ -2,11 +2,12 @@ MONGO_CONNECTION_STRING="mongodb://localhost:27017/"
 BASE_AZURE_BLOB_URL = "https://clearlydefineddev.blob.core.windows.net"
 AZURE_CONTAINER_NAME = "develop-definition"
 SERVICE_API_URL = "http://dev-api.clearlydefined.io/"
-OUTPUT_FILE = "invalid-data.json"
+BASE_OUTPUT_FILENAME = "invalid-data"
 # START_DATE = "2024-06-21"
 # END_DATE = "2024-06-28"
 START_MONTH = str(os.environ.get("START_MONTH", "2024-06"))
 END_MONTH = str(os.environ.get("END_MONTH", "2024-06"))
 INITIAL_SKIP = 0
 PAGE_SIZE = 1000
 REPAIR = false
+DRYRUN = false
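
As a rough illustration of how the new `BASE_OUTPUT_FILENAME` and `DRYRUN` settings might be consumed, the sketch below reads a boolean flag from the environment and builds an output filename from the date range and offset. The helper names and the exact filename pattern are assumptions made for illustration; the commit message only states that the filename is generated from the date range and offset.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag such as DRYRUN or REPAIR from the environment.

    Hypothetical helper: treats "true", "1", and "yes" (any case) as True.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("true", "1", "yes")


def output_filename(base: str, start: str, end: str, offset: int) -> str:
    """Build a per-run filename so separate runs do not overwrite each other.

    The pattern below is an assumed format, not the one used by the script.
    """
    return f"{base}_{start}_{end}_offset-{offset}.json"


if __name__ == "__main__":
    base = os.environ.get("BASE_OUTPUT_FILENAME", "invalid-data")
    start = os.environ.get("START_MONTH", "2024-06")
    end = os.environ.get("END_MONTH", "2024-06")
    skip = int(os.environ.get("INITIAL_SKIP", "0"))
    print(output_filename(base, start, end, skip), "dry run:", env_flag("DRYRUN"))
```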
