# Setting GCP region

## What to consider

Google Cloud Platform services are available in [many
locations](https://cloud.google.com/about/locations/) across the globe.
You can minimize network latency and network transport costs by running your
Dataflow job in the same region where its input bucket, output dataset, and
temporary directory are located. More specifically, to run Variant Transforms
most efficiently, make sure all of the following resources are located in the
same region (see the sketch after this list):

* Your source bucket, set by the `--input_pattern` flag.
* Your pipeline's temporary location, set by the `--temp_location` flag.
* Your output BigQuery dataset, set by the `--output_table` flag.
* Your Dataflow pipeline, set by the `--region` flag.

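For example, a co-located run in `europe-west1` might pass flags along the
following lines. This is only a sketch: the bucket, dataset, and table names
are placeholders, and the actual run commands are described in the next
sections.

```bash
python -m gcp_variant_transforms.vcf_to_bq \
  --input_pattern "gs://my-eu-bucket/vcfs/*.vcf" \
  --output_table "my-project:my_eu_dataset.my_table" \
  --temp_location "gs://my-eu-bucket/temp" \
  --project "my-project" \
  --region "europe-west1"
```
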
## Running jobs in a particular region

The Dataflow API [requires](https://beam.apache.org/blog/2019/08/22/beam-2.15.0.html)
setting a [GCP
region](https://cloud.google.com/compute/docs/regions-zones/#available) via
the `--region` flag. In addition to this requirement, you might also choose to
run Variant Transforms in a specific region to follow your project's security
and compliance requirements. For example, to restrict your processing job to
Europe, set the region as follows:

```bash
COMMAND="/opt/gcp_variant_transforms/bin/vcf_to_bq ..."

docker run gcr.io/cloud-lifesciences/gcp-variant-transforms \
  --project "${GOOGLE_CLOUD_PROJECT}" \
  --region "${GOOGLE_CLOUD_REGION}" \
  "${COMMAND}"
```
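
The `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_REGION` variables above stand in
for your own project ID and chosen region; for instance, to keep the job in
Europe you could set:

```bash
GOOGLE_CLOUD_PROJECT="my-project"   # placeholder: your project ID
GOOGLE_CLOUD_REGION="europe-west1"  # any region from the list linked above
```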

Note that the values of the `--project` and `--region` flags are automatically
passed as `COMMAND` args in [`pipelines_runner.sh`](docker/pipelines_runner.sh).
Alternatively, you can set your default region using the following command:

```bash
gcloud config set compute/region "europe-west1"
```

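To confirm the default that is now in effect, you can read the property back.
This is a standard `gcloud config` command, not specific to Variant Transforms:

```bash
gcloud config get-value compute/region
```
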
In this case you no longer need to set the `--region` flag. For more
information, refer to the [Cloud SDK documentation](https://cloud.google.com/sdk/gcloud/reference/config/set).

If you are running Variant Transforms from GitHub, you only need to specify
the region for the Dataflow API, as shown below.

```bash
python -m gcp_variant_transforms.vcf_to_bq ... \
  --project "${GOOGLE_CLOUD_PROJECT}" \
  --region "${GOOGLE_CLOUD_REGION}"
```

## Setting Google Cloud Storage bucket region

You can choose your [GCS bucket's region](https://cloud.google.com/storage/docs/locations)
when you are [creating it](https://cloud.google.com/storage/docs/creating-buckets#storage-create-bucket-console).
When you create a bucket, you [permanently
define](https://cloud.google.com/storage/docs/moving-buckets#storage-create-bucket-console)
its name, its geographic location, and the project it is part of. For an
existing bucket, you can check
[its information](https://cloud.google.com/storage/docs/getting-bucket-information)
to find out its geographic location.

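For example, with the `gsutil` command-line tool you can pick the bucket's
region at creation time and look up an existing bucket's location; the bucket
name below is only a placeholder:

```bash
# Create a bucket in a specific region (name is a placeholder):
gsutil mb -l europe-west1 gs://my-variant-transforms-bucket

# Print metadata for an existing bucket, including its "Location constraint":
gsutil ls -L -b gs://my-variant-transforms-bucket
```
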
## Setting BigQuery dataset region

You can choose the region for the BigQuery dataset at dataset creation time.

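For example, with the `bq` command-line tool (the project and dataset IDs here
are placeholders):

```bash
# Create a dataset in a specific region:
bq mk --location=europe-west1 --dataset my-project:my_eu_dataset

# Inspect an existing dataset; the JSON output includes its "location" field:
bq show --format=prettyjson my-project:my_eu_dataset
```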