allow customization of an operation's polling interval #13

@stephenplusplus

From @c0b on October 31, 2017 17:48

From the BigQuery job API docs, I was only aware of the complete event, which lets you listen with a callback for a job's completion. Only recently did I learn, from code shared in a gist, that job.promise() is also available. Our application ran on Node v6 and was recently upgraded to v8, so the promise API fits our code better and works with the async/await model. Could you at least document it?
https://googlecloudplatform.github.io/google-cloud-node/#/docs/bigquery/0.9.6/bigquery/job

I also spent some time working out how the default job.promise() works. I traced the calls down to the Operation's setTimeout on self.startPolling (linked below), which fires every 500ms. So is it polling at a hard-coded interval of 500ms? The best practices for many gcloud products prefer a back-off retry strategy instead.
https://github.com/GoogleCloudPlatform/google-cloud-node/blob/master/packages/common/src/operation.js#L184
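For a rough sense of the difference, here is a minimal sketch that counts status checks under each policy. The helper name `pollDelays` and the 60-second job duration are assumptions for illustration only; the numbers are plain arithmetic, not measurements of any real job.

```javascript
// Hypothetical helper: list the delays (ms) a poller would use until
// `totalMs` has elapsed. `nextDelay` maps the current delay to the next one.
function pollDelays(totalMs, firstDelay, nextDelay) {
  const delays = [];
  let elapsed = 0;
  let wait = firstDelay;
  while (elapsed < totalMs) {
    delays.push(wait);
    elapsed += wait;
    wait = nextDelay(wait);
  }
  return delays;
}

// Fixed interval: a check every 500ms.
const fixed = pollDelays(60000, 500, (w) => w);

// Back-off: grow by half each time, capped at 10s (assumed parameters).
const backedOff = pollDelays(60000, 500, (w) => Math.min(10000, w * 1.5));

console.log(`fixed: ${fixed.length} checks, back-off: ${backedOff.length} checks`);
// prints "fixed: 120 checks, back-off: 12 checks"
```

With hundreds of concurrent jobs, that per-job difference is multiplied accordingly.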

Polling every 500ms may be acceptable (or even wanted) in some cases, but for our ETL scripts, which run hundreds of query jobs concurrently in BATCH mode, it is just not efficient. For this ETL purpose I have a piece of code that has been in production use for a long while and implements the back-off strategy. It takes an optional options object with waitbegin (default 500ms) and waitmax (default 10s):

// Loop on a BigQuery job until it is 'DONE' or errors,
//   using a back-off strategy: waiting starts at 500ms,
//     then increases by half each time until it reaches the 10s max
function waitJobFinish(job, {waitbegin=500, waitmax=10000, initialState='UNKNOWN'} = {}) {
  return new Promise((fulfilled, rejected) =>
    function loop(retries=0, wait=waitbegin, state=initialState) {
      job.getMetadata()
        .then(([ metadata, apiResponse ]) => {
          if (metadata.status.state !== state) {
            console.log(`Job ${metadata.id} state changed from ${state} to ${metadata.status.state} at ${(new Date).toJSON()}, after ${retries} status checks.`);
            state = metadata.status.state;
          }

          if (metadata.status.errorResult)
            return rejected(metadata.status.errorResult);

          if (metadata.status.state === 'DONE')
            return fulfilled([ metadata, apiResponse, retries ]);

          // wait `wait` ms now; the next round waits half as long again, capped at waitmax
          setTimeout(loop, wait, retries+1, Math.min(waitmax, wait * 1.5), state);
        })
        // catch goes last: placed before then(), a failed getMetadata() would
        //   still run the handler above with undefined and throw
        .catch(rejected);
    }() // (0, waitbegin, 'UNKNOWN')
  );
}

With this function, which is similar to job.promise(), we can write code like the following; internally it uses a back-off strategy when repeatedly retrieving the metadata:

  bigquery.startQuery({
    query: '...',
    // more options
  })
  .then(([ job ]) => waitJobFinish(job))
  .then(([ metadata, apiResponse, retries ]) => { ... })

Or with async/await:

  // in an async function
  const [ job ] = await bigquery.startQuery(...);
  const [ metadata, apiResponse, retries ] = await waitJobFinish(job);
  // ...
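For the case of many jobs running concurrently, a minimal sketch of waiting on all of them at once might look like this. The helper name `waitAll` is hypothetical, and `waitFn` stands for any function that returns a promise per job (for example a back-off waiter like the one above); nothing here is part of the library.

```javascript
// Hypothetical helper: wait on many jobs at once and report successes and
// failures separately, so one failed job does not hide the others.
// `waitFn(job)` must return a promise for that job's completion.
function waitAll(jobs, waitFn) {
  return Promise.all(
    jobs.map((job) =>
      waitFn(job).then(
        (result) => ({ job, ok: true, result }),
        (error) => ({ job, ok: false, error })
      )
    )
  );
}
```

Each entry in the resolved array carries either `result` or `error`, so the caller can decide which jobs to retry.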

The console.log lines give us visibility into how each job is progressing, showing the state transitions from 'PENDING' to 'RUNNING' to 'DONE'.

I'm not sure this strategy can go into the Operation for all the @google-cloud/... packages, but it at least works for BigQuery jobs. Let me know if you like the code.
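To illustrate what a customizable polling interval could look like in a generic form, here is a minimal sketch. It is not the Operation class or any existing API: `pollUntilDone`, `delayFor`, `maxAttempts`, and the `{ done, value }` result shape are all names assumed for this example.

```javascript
// Hypothetical sketch, not the library's API: call `check()` until it reports
// completion, taking the delay before each retry from a caller-supplied function.
function pollUntilDone(check, { delayFor = () => 500, maxAttempts = Infinity } = {}) {
  return new Promise((resolve, reject) => {
    function attempt(n) {
      Promise.resolve()
        .then(check)
        .then((result) => {
          if (result.done) return resolve({ value: result.value, attempts: n + 1 });
          if (n + 1 >= maxAttempts)
            return reject(new Error(`not done after ${n + 1} attempts`));
          // delayFor(n) decides how long to wait after the n-th (0-based) attempt
          setTimeout(attempt, delayFor(n), n + 1);
        })
        .catch(reject);
    }
    attempt(0);
  });
}

// A back-off policy with the defaults suggested above:
// start at 500ms, grow by half each time, cap at 10s.
const backoffDelay = (n) => Math.min(10000, 500 * Math.pow(1.5, n));
```

A caller who wants the fixed 500ms behavior keeps the default, while a batch workload could pass `{ delayFor: backoffDelay }`.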

Copied from original issue: googleapis/google-cloud-node#2710

Labels: api: bigquery, type: feature request
