Description
From @c0b on October 31, 2017 17:48
From the BigQuery Job API, I was only aware of the complete event, which lets you attach a callback that runs when a job completes. Recently, in some code shared in a gist, I found that job.promise() is also available. Our application recently upgraded from Node v6 to v8, and the promise API fits our code better and works with the async/await model. Could you at least document it?
https://googlecloudplatform.github.io/google-cloud-node/#/docs/bigquery/0.9.6/bigquery/job
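For comparison, the event-based API can also be bridged to a promise by hand. The sketch below assumes that a job emits complete with its metadata (as the 0.9.x docs describe) and that it also emits error on failure; jobToPromise is a hypothetical helper, and a plain EventEmitter stands in for a real BigQuery Job.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: resolve on the job's 'complete' event, reject on 'error'.
// Assumes the job emits 'complete' with its metadata and 'error' on failure.
function jobToPromise(job) {
  return new Promise((resolve, reject) => {
    job.once('complete', resolve);
    job.once('error', reject);
  });
}

// Usage with a stand-in job (a plain EventEmitter, not a real BigQuery Job).
const fakeJob = new EventEmitter();
jobToPromise(fakeJob).then(metadata => console.log('done:', metadata.status.state));
setTimeout(() => fakeJob.emit('complete', { status: { state: 'DONE' } }), 10);
```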
On the other hand, I spent some time figuring out how this default job.promise() works. I traced the call down to a setTimeout in the Operation class that calls self.startPolling every 500 ms. So it polls at a hard-coded interval of 500 ms? Best practices for many Google Cloud products prefer a backoff strategy for retries.
https://github.com/GoogleCloudPlatform/google-cloud-node/blob/master/packages/common/src/operation.js#L184
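To put rough numbers on the difference, here is a small sketch that counts status checks for one job under a fixed 500 ms interval versus the backoff proposed in this issue (start at 500 ms, grow by half each time, cap at 10 s). It assumes the first check runs immediately and the job is seen as DONE on the first check at or after it finishes; the 10-minute duration and the helper names are hypothetical.

```javascript
// Hypothetical helpers for counting status checks; no BigQuery calls are made.

// Fixed schedule: every check waits the same interval (500 ms by default).
function fixedDelays(interval = 500) {
  return () => interval;
}

// Backoff schedule: start at waitbegin, grow by half each time, cap at waitmax.
function backoffDelays({ waitbegin = 500, waitmax = 10000 } = {}) {
  let wait = waitbegin;
  return () => {
    const current = wait;
    wait = Math.min(waitmax, wait * 1.5);
    return current;
  };
}

// Number of status checks for a job that finishes after durationMs, assuming
// the first check runs at t = 0 and the first check at or after durationMs sees DONE.
function countPolls(durationMs, nextDelay) {
  let t = 0;
  let polls = 1;
  while (t < durationMs) {
    t += nextDelay();
    polls += 1;
  }
  return polls;
}

const tenMinutes = 10 * 60 * 1000;
console.log('fixed 500 ms:', countPolls(tenMinutes, fixedDelays(500)));
console.log('backoff 500 ms..10 s:', countPolls(tenMinutes, backoffDelays()));
```

Under these assumptions a 10-minute job is checked 1201 times at a fixed 500 ms interval and 67 times with the backoff; with hundreds of jobs running concurrently, that gap is multiplied per job.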
This 500 ms polling may be acceptable (or even desirable) in some cases, but for our ETL scripts, which run hundreds of query jobs concurrently in BATCH mode, it is just not efficient. For this ETL purpose I have had a piece of code in production for a long while that implements a backoff strategy. It accepts an optional options object with waitbegin (default 500 ms) and waitmax (default 10 s):
// Loop on a BigQuery job until it is 'DONE' or fails,
// using a backoff strategy: waiting starts at 500 ms,
// then increases by half each time, up to a max of 10 s.
function waitJobFinish(job, {waitbegin=500, waitmax=10000, initialState='UNKNOWN'} = {}) {
  return new Promise((fulfilled, rejected) => {
    (function loop(retries=0, wait=waitbegin, state=initialState) {
      job.getMetadata()
        .then(([ metadata, apiResponse ]) => {
          if (metadata.status.state !== state) {
            console.log(`Job ${metadata.id} state changed from ${state} to ${metadata.status.state} at ${new Date().toJSON()} after ${retries} retries.`);
            state = metadata.status.state;
          }
          if (metadata.status.errorResult)
            return rejected(metadata.status.errorResult);
          if (metadata.status.state === 'DONE')
            return fulfilled([ metadata, apiResponse, retries ]);
          // Check again after `wait` ms; the next wait grows by half, capped at waitmax.
          setTimeout(loop, wait, retries + 1, Math.min(waitmax, wait * 1.5), state);
        })
        // .catch() comes after .then() so a failed getMetadata() rejects once
        // instead of falling through to .then() with an undefined result.
        .catch(rejected);
    })(); // starts as loop(0, waitbegin, initialState)
  });
}
So with this helper, which is similar to job.promise(), we can write code like the following, except that internally it retries retrieving the metadata with a backoff strategy:
bigquery.startQuery({
  query: '...',
  // more options
})
  .then(([ job ]) => waitJobFinish(job))
  .then(([ metadata, apiResponse, retries ]) => { ... })
Or with async/await:
// in an async function
const [ job ] = await bigquery.startQuery(...);
const [ metadata, apiResponse, retries ] = await waitJobFinish(job);
// ...
The console.log lines give us visibility into how healthy each job is, showing its state transitions from 'PENDING' to 'RUNNING' to 'DONE'.
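Since the application now runs on Node v8, the same backoff loop could also be written with async/await instead of an explicit Promise constructor. This is a sketch, assuming only that job.getMetadata() resolves to [metadata, apiResponse] as in the code above; waitJobFinishAsync, sleep and makeFakeJob are hypothetical names, and the fake job stands in for a real BigQuery Job.

```javascript
// Resolve after ms milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical async/await variant: same backoff (start at waitbegin, grow by
// half, cap at waitmax) and the same [metadata, apiResponse, retries] result.
async function waitJobFinishAsync(job, { waitbegin = 500, waitmax = 10000, initialState = 'UNKNOWN' } = {}) {
  let state = initialState;
  let wait = waitbegin;
  for (let retries = 0; ; retries++) {
    // A rejected getMetadata() propagates to the caller through the returned promise.
    const [metadata, apiResponse] = await job.getMetadata();
    if (metadata.status.state !== state) {
      console.log(`Job ${metadata.id} state changed from ${state} to ${metadata.status.state} after ${retries} retries.`);
      state = metadata.status.state;
    }
    if (metadata.status.errorResult) throw metadata.status.errorResult;
    if (metadata.status.state === 'DONE') return [metadata, apiResponse, retries];
    await sleep(wait);
    wait = Math.min(waitmax, wait * 1.5);
  }
}

// Hypothetical stand-in job that reports the given states in order.
function makeFakeJob(states) {
  let i = 0;
  return {
    getMetadata: async () => {
      const state = states[Math.min(i++, states.length - 1)];
      return [{ id: 'fake-job', status: { state } }, {}];
    },
  };
}

waitJobFinishAsync(makeFakeJob(['PENDING', 'RUNNING', 'RUNNING', 'DONE']), { waitbegin: 5, waitmax: 20 })
  .then(([metadata, , retries]) => console.log(metadata.status.state, 'after', retries, 'retries'));
```

The loop reads top to bottom like the synchronous logic it replaces, and errors from getMetadata() need no separate .catch() wiring.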
I'm not sure this strategy could go into the Operation class for all the @google-cloud/... packages, but at least it works for BigQuery jobs. Let me know if you like the code.
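If the strategy were generalized beyond BigQuery, one possible shape is a helper that knows nothing about jobs and only polls a check function. This is a hypothetical sketch, not an existing @google-cloud API: pollWithBackoff, waitForJob and the factor/jitter options are illustrative names, and the random jitter is an addition the original code does not have (it spreads out many concurrent pollers).

```javascript
// Hypothetical generic helper: call check(attempt) until it reports done,
// waiting between attempts with capped growth plus random jitter.
// check() must return a Promise of { done: boolean, value }.
async function pollWithBackoff(check, { waitbegin = 500, waitmax = 10000, factor = 1.5, jitter = 0.2 } = {}) {
  let wait = waitbegin;
  for (let attempt = 0; ; attempt++) {
    const { done, value } = await check(attempt);
    if (done) return value;
    // Scale the delay by a random factor in [1 - jitter, 1 + jitter].
    const delay = wait * (1 + jitter * (2 * Math.random() - 1));
    await new Promise(resolve => setTimeout(resolve, delay));
    wait = Math.min(waitmax, wait * factor);
  }
}

// Adapter for a BigQuery-style job whose getMetadata() resolves to [metadata, apiResponse].
const waitForJob = (job, opts) =>
  pollWithBackoff(async () => {
    const [metadata] = await job.getMetadata();
    if (metadata.status.errorResult) throw metadata.status.errorResult;
    return { done: metadata.status.state === 'DONE', value: metadata };
  }, opts);
```

Keeping the "is it done?" logic in check() is what would let the same loop serve other long-running operations, with each package supplying its own adapter.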
Copied from original issue: googleapis/google-cloud-node#2710