To install from PyPI, run:
python3 -m pip install yamlprocessor

This project provides two command line utilities, yp-data and yp-schema.
The yp-data utility allows automation of the following in a single command:
- Modularisation of YAML files via a simple include mechanism.
- Variable substitutions in string values.
- Environment and pre-defined variables.
- Date-time variables, based on the current time and/or a reference time.
- Validation using JSON schema.
The yp-schema utility is a complement to the YAML modularisation / include
functionality provided by yp-data. It allows users to break up a monolithic
JSON schema file into a set of subschema files.
Command line:
yp-data [options] input-file-name output-file-name

Type yp-data --help for a list of options, and see below for usage detail.
Python:
from yamlprocessor.dataprocess import DataProcessor
processor = DataProcessor()
# in_file_name and out_file_name are the paths of the input and output YAML files
processor.process_data(in_file_name, out_file_name)

The yp-data utility allows modularisation of YAML files using a controlled include file mechanism, backed by dividing the original JSON schema file into a set of subschema files.
Consider a YAML file hello.yaml:
hello:
  - location: earth
    targets:
      - human
      - cat
      - dog
  - location: mars
    targets:
      - martian
# And so on

And its associated JSON schema file:
{
"properties": {
"hello": {
"items": {
"properties": {
"location": {
"type": "string"
},
"targets": {
"items": {
"type": "string"
},
"minItems": 1,
"type": "array",
"uniqueItems": true
}
},
"required": ["location", "targets"],
"type": "object"
},
"type": "array"
}
},
"required": ["hello"],
"type": "object"
}

We want to modularise the YAML file as follows.
Let's call the root file hello-root.yaml:
hello:
  - INCLUDE: earth.yaml
  - INCLUDE: mars.yaml

Where earth.yaml contains:
location: earth
targets:
- human
- cat
- dog

And mars.yaml contains:
location: mars
targets:
- martian

At runtime, we can run the yp-data INFILE OUTFILE
command to process and recombine the YAML files.
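
To make the recombination step concrete, here is a minimal sketch of how INCLUDE entries could be resolved. It is not the yp-data implementation: the `FILES` dictionary and the `resolve_includes` function are hypothetical names, and the files are represented as in-memory Python data rather than being read from disk.

```python
# Hypothetical in-memory stand-ins for the YAML files in the example above.
FILES = {
    "hello-root.yaml": {
        "hello": [{"INCLUDE": "earth.yaml"}, {"INCLUDE": "mars.yaml"}],
    },
    "earth.yaml": {"location": "earth", "targets": ["human", "cat", "dog"]},
    "mars.yaml": {"location": "mars", "targets": ["martian"]},
}


def resolve_includes(node, files):
    """Return a copy of node with each {"INCLUDE": name} mapping replaced."""
    if isinstance(node, dict):
        if "INCLUDE" in node:
            # Swap the whole mapping for the included data, resolving any
            # nested INCLUDE entries on the way.
            return resolve_includes(files[node["INCLUDE"]], files)
        return {key: resolve_includes(value, files) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_includes(item, files) for item in node]
    return node  # Scalars are returned unchanged.


combined = resolve_includes(FILES["hello-root.yaml"], FILES)
print(combined)
```

The result has the same structure as the original monolithic hello.yaml. The sketch ignores the QUERY key and does no cycle detection.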
To split the schema to support these YAML files, however, we'll
use the yp-schema SCHEMA-FILE CONFIG-FILE command.
For this command to work, we need to supply it with some settings
to tell it where to split up the schema in the syntax:
{
"OUTPUT-ROOT-SCHEMA-FILENAME": "",
"OUTPUT-SUB-SCHEMA-FILENAME-1": "JMESPATH-1",
/* and so on */
}

Obviously, we must have a root schema output file name.
The rest of the entries are output file names for the subschemas.
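
As a rough illustration of what one subschema entry asks for, the sketch below cuts a subtree out of a schema and leaves a choice between a reference to the new file and an include marker in its place. It is an assumption-laden sketch, not the yp-schema code: `split_schema` is a hypothetical helper, it only understands plain dotted paths (real JMESPath is much richer), and the include marker schema is deliberately simplified.

```python
import copy

SCHEMA = {
    "properties": {
        "hello": {
            "items": {
                "properties": {"location": {"type": "string"}},
                "required": ["location", "targets"],
                "type": "object",
            },
            "type": "array",
        }
    },
    "required": ["hello"],
    "type": "object",
}


def split_schema(schema, path, sub_name):
    """Return (root_schema, subschema) after cutting at a dotted path."""
    root = copy.deepcopy(schema)  # Leave the caller's schema untouched.
    keys = path.split(".")
    parent = root
    for key in keys[:-1]:
        parent = parent[key]
    subschema = parent[keys[-1]]
    # Accept either the subschema itself or a mapping that names an include
    # file. This marker is a simplified assumption.
    parent[keys[-1]] = {
        "oneOf": [
            {"$ref": sub_name},
            {"properties": {"INCLUDE": {"type": "string"}}, "required": ["INCLUDE"]},
        ]
    }
    return root, subschema


root, sub = split_schema(SCHEMA, "properties.hello.items", "hello-location.schema.json")
print(root["properties"]["hello"]["items"]["oneOf"][0])
```

Each returned dictionary would then be written to its own output file, with the root schema under the entry whose path is the empty string.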
The JMESPath (https://jmespath.org/) syntax tells the
yp-schema command where to split the JSON schema into
subschemas. In the example above, we can use the setting:
{
"hello.schema.json": "",
"hello-location.schema.json": "properties.hello.items"
}

The resulting hello.schema.json will look like this,
which can be used to validate both hello.yaml and hello-root.yaml:
{
"properties": {
"hello": {
"items": {
"oneOf": [
{"$ref": "hello-location.schema.json"},
{
"properties": {
"INCLUDE": {
"type": "string"
},
"QUERY": {
"type": "string"
}
},
"required": ["INCLUDE"],
"type": "object"
}
]
},
"type": "array"
}
},
"required": ["hello"],
"type": "object"
}

The resulting hello-location.schema.json will look like this,
which can be used to validate earth.yaml and mars.yaml:
{
"properties": {
"location": {
"type": "string"
},
"targets": {
"items": {
"type": "string"
},
"minItems": 1,
"type": "array",
"uniqueItems": true
}
},
"required": ["location", "targets"],
"type": "object"
}

Consider an example where we want to include only a subset of the data structure from the include file. We can use a JMESPath query to achieve this.
For example, we may have something like this in hello-root.yaml:
hello:
  INCLUDE: planets.yaml
  QUERY: "[?type=='rocky'].{location: location, targets: targets}"

Where planets.yaml contains:
- location: earth
  type: rocky
  targets:
    - human
    - cat
    - dog
- location: mars
  type: rocky
  targets:
    - martian
- location: jupiter
  type: gaseous
  targets:
    - ...

Running yp-data hello-root.yaml will return:
hello:
  - location: earth
    targets:
      - human
      - cat
      - dog
  - location: mars
    targets:
      - martian

You can tell yp-data to look for a JSON schema file and validate
the current YAML file by adding a #!<SCHEMA-URI> to the beginning
of the YAML file. The SCHEMA-URI is a string pointing to the location
of a JSON schema file. Some simple assumptions apply:
- If SCHEMA-URI is a normal URI with a leading scheme, e.g. https://, it is used as-is.
- If SCHEMA-URI does not have a leading scheme and exists in the local file system, then it is also used as-is.
- Otherwise, if the YP_SCHEMA_PREFIX environment variable is defined or if --schema-prefix=PREFIX is specified, then the prefix will be added to the value of the SCHEMA-URI.
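
The three rules can be sketched as a small function. This is only an illustration: `resolve_schema_uri` is a hypothetical name, and it assumes that the prefix is joined by plain string concatenation, that an explicit prefix wins over the environment variable, and that the URI is returned unchanged when no rule applies. None of these details is stated above.

```python
import os
from urllib.parse import urlsplit


def resolve_schema_uri(schema_uri, prefix=None, exists=os.path.exists):
    """Apply the three SCHEMA-URI rules in order and return the location."""
    # Rule 1: a URI with a leading scheme is used as-is.
    if urlsplit(schema_uri).scheme:
        return schema_uri
    # Rule 2: an existing local file is used as-is.
    if exists(schema_uri):
        return schema_uri
    # Rule 3: otherwise add the prefix from the option or the environment.
    if prefix is None:
        prefix = os.environ.get("YP_SCHEMA_PREFIX")
    if prefix:
        return prefix + schema_uri  # Assumption: plain concatenation.
    return schema_uri  # Assumption: no prefix means no change.


print(resolve_schema_uri("hello.schema.json", prefix="https://example.org/schemas/",
                         exists=lambda path: False))
```

The `exists` parameter is there only so the file system check can be replaced when trying the function out.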
The yp-data utility processes variable substitution syntax in string values in YAML files. Consider:
key: ${SWEET_HOME}/sugar.txt

(Note: You can write $SWEET_HOME or ${SWEET_HOME} in here.)
If SWEET_HOME is defined in the environment and has a value /home/sweet,
then running yp-data on the above will give:
key: /home/sweet/sugar.txt

You can also use the --define=NAME=VALUE (-D NAME=VALUE) option
of yp-data to define and/or override environment variables.
E.g., yp-data -D SWEET_HOME=/home/sweet provides another way to
specify the value of a variable to use for substitution.
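
The substitution and override behaviour can be approximated with Python's standard `string.Template`, which accepts both the `$NAME` and `${NAME}` spellings. This is a sketch of the idea, not how yp-data is implemented: `substitute` is a hypothetical helper, and raising `KeyError` for an undefined variable is a property of `Template.substitute`, not a statement about yp-data.

```python
import os
from string import Template


def substitute(value, defines=None, environ=None):
    """Substitute $NAME and ${NAME} in value; defines override the environment."""
    variables = dict(os.environ if environ is None else environ)
    variables.update(defines or {})  # Entries from -D NAME=VALUE take priority.
    return Template(value).substitute(variables)


env = {"SWEET_HOME": "/home/sweet"}
print(substitute("${SWEET_HOME}/sugar.txt", environ=env))
print(substitute("$SWEET_HOME/sugar.txt", defines={"SWEET_HOME": "/opt/sweet"}, environ=env))
```

The first call uses the value from the environment mapping. The second shows a define overriding it.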
The yp-data application also supports date-time substitution using a
similar syntax, for variable names starting with YP_TIME_NOW (the time
when yp-data starts running) or YP_TIME_REF (the reference time,
specified using the YP_TIME_REF_VALUE environment variable
or the --time-ref=VALUE command line option). If no value is set
for the reference time, any reference to the reference time will
simply use the current time.
You can use one or more of these trailing suffixes to apply deltas to the date-time:
- _PLUS_XXX: adds the duration to the date-time.
- _MINUS_XXX: subtracts the duration from the date-time.
- _AT_XXX: sets individual fields of the date-time. E.g. _AT_T0H will set the hour of the day part of the date-time to 00 hour.
where XXX is date-time duration-like syntax in the form nYnMnDTnHnMnS, e.g.:
- 12Y is 12 years.
- 1M2D is 1 month and 2 days.
- 1DT12H is 1 day and 12 hours.
- T12H30M is 12 hours and 30 minutes.
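
To show how the nYnMnDTnHnMnS form can be read, here is a minimal sketch using only the Python standard library. The names `parse_duration`, `shift` and `set_fields` are hypothetical and do not come from yamlprocessor. Adding or subtracting years and months is left out, because that needs calendar-aware arithmetic.

```python
import re
from datetime import datetime, timedelta, timezone

# Date fields come before the "T", time fields after it.
DURATION = re.compile(
    r"^(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Turn e.g. "1DT12H" into {"days": 1, "hours": 12}."""
    match = DURATION.match(text)
    fields = {k: int(v) for k, v in match.groupdict().items() if v} if match else {}
    if not fields:
        raise ValueError(f"not a duration: {text!r}")
    return fields


def shift(moment, fields, sign=1):
    """Add (sign=1) or subtract (sign=-1) the day and time fields."""
    if "years" in fields or "months" in fields:
        raise NotImplementedError("year and month deltas are not covered here")
    delta = timedelta(
        days=fields.get("days", 0),
        hours=fields.get("hours", 0),
        minutes=fields.get("minutes", 0),
        seconds=fields.get("seconds", 0),
    )
    return moment + sign * delta


def set_fields(moment, fields):
    """Set individual fields; datetime.replace takes singular names."""
    return moment.replace(**{name[:-1]: value for name, value in fields.items()})


ref = datetime(2024, 12, 25, tzinfo=timezone.utc)
print(set_fields(ref, parse_duration("1DT18H")))  # day 1, hour 18
print(shift(ref, parse_duration("T6H30M")))       # 6 hours 30 minutes later
print(shift(ref, parse_duration("1D"), sign=-1))  # one day earlier
```

Note that the same letter M means months before the T and minutes after it, which is why the T separator matters.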
Examples (for argument's sake, let's assume the
current time is 2022-02-01T10:11:18Z and
we have set the reference time to 2024-12-25T00:00:00Z):
${YP_TIME_NOW} # 2022-02-01T10:11:18+0000
${YP_TIME_NOW_AT_T0H0M0S} # 2022-02-01T00:00:00+0000
${YP_TIME_NOW_AT_T0H0M0S_PLUS_T12H} # 2022-02-01T12:00:00+0000
${YP_TIME_REF} # 2024-12-25T00:00:00+0000
${YP_TIME_REF_AT_1DT18H} # 2024-12-01T18:00:00+0000
${YP_TIME_REF_PLUS_T6H30M} # 2024-12-25T06:30:00+0000
${YP_TIME_REF_MINUS_1D} # 2024-12-24T00:00:00+0000

You can control date-time output formats using the
--time-format=[NAME=]FORMAT option or YP_TIME_FORMAT[_<NAME>]
environment variables.
For example, if you set:
- --time-format='%FT%T%z' (default)
- --time-format=CTIME='%a %e %b %T %Z %Y' or export YP_TIME_FORMAT_CTIME='%a %e %b %T %Z %Y'
- --time-format=ABBR='%Y%m%dT%H%M%S%z' or export YP_TIME_FORMAT_ABBR='%Y%m%dT%H%M%S%z'
Then:
${YP_TIME_REF} # 2024-12-25T00:00:00+0000
${YP_TIME_REF_FORMAT_CTIME} # Wed 25 Dec 00:00:00 GMT 2024
${YP_TIME_REF_PLUS_T12H_FORMAT_ABBR} # 20241225T120000+0000

Finally, if a variable name is already defined in the environment
or in a --define=... option, then the defined value takes precedence,
so if you have already exported YP_TIME_REF=whatever, then you will get
the value whatever instead of the reference time.
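
The named formats and the precedence rule can be sketched together. This is an illustration under assumptions, not the yp-data lookup code: `lookup` and `FORMATS` are hypothetical names, only YP_TIME_REF without delta suffixes is handled, and the default format is written as `%Y-%m-%dT%H:%M:%S%z`, a portable spelling of `%FT%T%z`.

```python
from datetime import datetime, timezone

# Hypothetical table of named formats; the "" key holds the default.
FORMATS = {
    "": "%Y-%m-%dT%H:%M:%S%z",
    "ABBR": "%Y%m%dT%H%M%S%z",
}


def lookup(name, defined, ref_time, formats=FORMATS):
    """Resolve a variable name, letting defined values win over time variables."""
    if name in defined:
        return defined[name]  # Environment or --define value takes precedence.
    if name == "YP_TIME_REF":
        return ref_time.strftime(formats[""])
    marker = "YP_TIME_REF_FORMAT_"
    if name.startswith(marker):
        return ref_time.strftime(formats[name[len(marker):]])
    raise KeyError(name)


ref = datetime(2024, 12, 25, tzinfo=timezone.utc)
print(lookup("YP_TIME_REF", {}, ref))
print(lookup("YP_TIME_REF_FORMAT_ABBR", {}, ref))
print(lookup("YP_TIME_REF", {"YP_TIME_REF": "whatever"}, ref))
```

The last call returns the defined string rather than a formatted time, which mirrors the precedence described above.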