
Backends are inconsistent with each other #73

@nolar


Example:

Consider the following equivalent configs in YAML and XML, written to be friendly to the users who create them. The YAML is the "canonical" form and maps directly to the expected Python dicts.

```yaml
config:
  db:
    type: redis
    host: 1.2.3.4
    port: 6379
  layer:
    silent: yes
    name: abc
environment:
  mode: static
  tasks:
    - RebootBeforeStart
    - MountLocalHDD
  xservers:
    - ip: 10.20.30.40
      username: root
  pservers:
    - ip: 20.30.40.50
      username: user
```

```xml
<test-run>
  <config>
    <db type="redis" host="1.2.3.4" port="6379"/>
    <layer silent="yes">abc</layer>
  </config>
  <environment mode="static">
    <task>RebootBeforeStart</task>
    <task>MountLocalHDD</task>
    <xserver ip="10.20.30.40" username="root" />
    <pserver ip="20.30.40.50" username="user" />
  </environment>
</test-run>
```

The actual outcome:

Now, I want to use the unified parser to load multiple files from a set of expected paths, where the configs can be at any of them. As a developer, I run into lots of inconsistencies, caused both by the multiple formats per se and by how anyconfig interprets them:

The XML backend returns structures that contain meta-information about the structure of the configs, which breaks easy access to the values.

The presence of the @attrs, @text & @children keys is basically not under my control: when there are conflicts in the XML files, the backend does not raise an invalid-format error but alters the data structures instead. Text content is sometimes stored as a plain string, but if the element also has attrs, it goes into @text. E.g.:

```python
>>> anyconfig.load(xml_files, ignore_missing=True)
{'test-run': {'config': {'db': {'@attrs': {'host': '1.2.3.4', 'port': '6379', 'type': 'redis'}},
                         'layer': {'@attrs': {'silent': 'true'}, '@text': 'abc'}},
              'environment': {'@attrs': {'mode': 'static'},
                              '@children': [{'task': 'RebootBeforeStart'},
                                            {'task': 'MountLocalHDD'},
                                            {'xserver': {'@attrs': {'ip': '10.20.30.40', 'username': 'root'}}},
                                            {'pserver': {'@attrs': {'ip': '20.30.40.50', 'username': 'user'}}}]}}}
```

Even if I request merging of the attrs, the result is still not predictable: for elements that also have text or children (layer and environment), the attrs are not merged onto the same level as @text/@children but are still stored in the @attrs sub-dict.

```python
>>> anyconfig.load(files, ignore_missing=True, merge_attrs=True)
{'test-run': {'config': {'db': {'host': '1.2.3.4', 'port': '6379', 'type': 'redis'},
                         'layer': {'@attrs': {'silent': 'true'}, '@text': 'abc'}},
              'environment': {'@attrs': {'mode': 'static'},
                              '@children': [{'task': 'RebootBeforeStart'},
                                            {'task': 'MountLocalHDD'},
                                            {'xserver': {'ip': '10.20.30.40', 'username': 'root'}},
                                            {'pserver': {'ip': '20.30.40.50', 'username': 'user'}}]}}}
```

Also, if there are duplicated nodes in the XML (the two <task> nodes here), the whole set of children is turned into a list under @children, not only the duplicated nodes.

The expected outcome:

```python
{'config': {'db': {'host': '1.2.3.4', 'port': '6379', 'type': 'redis'},
            'layer': {'silent': 'true', '@text': 'abc'}},
 'environment': {'mode': 'static',
                 'task': ['RebootBeforeStart', 'MountLocalHDD'],
                 'xserver': {'ip': '10.20.30.40', 'username': 'root'},
                 'pserver': {'ip': '20.30.40.50', 'username': 'user'}}}
```

Or, for multiple <pserver> tags:

```python
{'config': {'db': {'host': '1.2.3.4', 'port': '6379', 'type': 'redis'},
            'layer': {'silent': 'true', '@text': 'abc'}},
 'environment': {'mode': 'static',
                 'task': ['RebootBeforeStart', 'MountLocalHDD'],
                 'xserver': {'ip': '10.20.30.40', 'username': 'root'},
                 'pserver': [{'ip': '20.30.40.50', 'username': 'user'},
                             {'ip': '20.30.40.60', 'username': 'user'}]}}
```

Problem:

  1. Because of such significant differences, I have to walk the whole config structure and merge the attrs/children/texts myself, even when I use merge_attrs=True. Hence, I write a lot of post-parsing code and basically do the parsing myself; the library merely "reads" the files in different formats, which is not what was expected.

  2. If there are multiple configs in different formats with such structural differences, they cannot be merged, or they merge improperly (mostly because I have to normalize them afterwards, not during anyconfig's processing and merging with its merge strategies).
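
For illustration, the post-parsing that point 1 refers to looks roughly like the sketch below. `normalize` is a hypothetical helper (not part of anyconfig) and assumes exactly the @attrs / @text / @children layout shown in the two outputs above, with and without merge_attrs=True:

```python
def normalize(node):
    """Flatten anyconfig-style XML output into plain, YAML-like dicts.

    Hypothetical workaround helper (not part of anyconfig); it assumes the
    '@attrs' / '@text' / '@children' layout shown in the outputs above.
    """
    if not isinstance(node, dict):
        return node  # a plain text value such as 'RebootBeforeStart'

    # Attrs go onto the same level as the child nodes.
    result = dict(node.get("@attrs", {}))

    # '@children' is a list of single-key dicts: group them by tag name.
    grouped = {}
    for child in node.get("@children", []):
        for tag, value in child.items():
            grouped.setdefault(tag, []).append(normalize(value))

    # Ordinary keys: attrs already merged by merge_attrs=True, or child nodes.
    for key, value in node.items():
        if key not in ("@attrs", "@text", "@children"):
            grouped.setdefault(key, []).append(normalize(value))

    for tag, values in grouped.items():
        if tag in result:
            raise ValueError(f"attr and child node both named {tag!r}")
        # Only repeated tags become lists; a single node stays as is.
        result[tag] = values[0] if len(values) == 1 else values

    if "@text" in node:
        result["@text"] = node["@text"]
    return result
```

Applied to either output shown above, `normalize(result)["test-run"]` should equal the first expected outcome. The root tag still has to be stripped by hand, and this does nothing for point 2: by the time it runs, anyconfig has already merged the raw, unnormalized structures.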

Suggestions:

Change the merging logic of the XML backend so that, by default, it is as consistent as possible with the YAML output:

  • Merge all attrs & children nodes into the parent node. Always, by default; maybe with an option to disable it.
  • If an attr and a child node have the same name, raise an error. This is obviously an error in the content.
  • If a node has text combined with attrs/children (<layer> or <environment> in the examples above), put the text content in a special (configurable) key (@text, as it is now), on the same level as the attrs/children.
  • Store children with the same name as a list of values (if there are 2+ of them). Otherwise, store the value as is.
  • Ignore the top-level tag in XML. It is required by the XML format itself and is never meaningful.
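
As a rough sketch of these rules, independent of anyconfig's actual backend code, here they are on top of the standard library's xml.etree.ElementTree. `element_to_value` and `load_xml_config` are hypothetical names; values are kept as strings (no type conversion), and text between child elements (mixed content) is not handled:

```python
import xml.etree.ElementTree as ET


def element_to_value(elem, text_key="@text"):
    """Convert one XML element to a YAML-like value (sketch of the rules above)."""
    text = (elem.text or "").strip()
    children = list(elem)

    # A leaf with neither attrs nor children is a plain scalar, as in YAML.
    if not elem.attrib and not children:
        return text

    # Merge the attrs into the node itself instead of an '@attrs' sub-dict.
    result = dict(elem.attrib)

    # Group child elements by tag; only repeated tags become lists.
    grouped = {}
    for child in children:
        grouped.setdefault(child.tag, []).append(element_to_value(child, text_key))
    for tag, values in grouped.items():
        if tag in result:  # attr and child with the same name: content error
            raise ValueError(f"<{elem.tag}>: attr and child both named {tag!r}")
        result[tag] = values[0] if len(values) == 1 else values

    # Text next to attrs/children goes under the configurable text key.
    # (Tail text between child elements is ignored in this sketch.)
    if text:
        result[text_key] = text
    return result


def load_xml_config(source, text_key="@text"):
    """Parse an XML string and drop the meaningless root tag."""
    return element_to_value(ET.fromstring(source), text_key)
```

On the XML example above, `load_xml_config(xml_text)` gives a dict shaped like the expected outcome, except that `silent` keeps its literal value "yes", since nothing is type-converted; adding a second <pserver> turns `pserver` into a list.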

Notes:

In the examples, XML uses singular tag names where YAML uses plural keys (task vs. tasks, xserver vs. xservers). This follows the semantics of each format, and I believe it cannot be handled automatically, for obvious reasons. Still, even the two possible keys cfg.environment.task and cfg.environment.tasks are easy to interpret where needed with some basic domain knowledge.

Also, there is config.layer.name (YAML) vs. config.layer.@text (XML), which should also be resolved with some domain knowledge, and can be ignored for this XML-vs-YAML inconsistency problem.
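
Assuming the XML output already has the YAML-like shape described in the suggestions, that domain knowledge could stay small. A minimal sketch with two hypothetical helpers (`get_any` and `as_list` are my own names, not anyconfig functions) that accept either spelling of a key and either a single item or a list:

```python
def get_any(mapping, *keys, default=None):
    """Return the value of the first key present, e.g. 'tasks' or 'task'.

    Hypothetical helper; the key aliases come from domain knowledge.
    """
    for key in keys:
        if key in mapping:
            return mapping[key]
    return default


def as_list(value):
    """Treat 'a single item' and 'a list of items' the same way."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]
```

For example, `as_list(get_any(env, "xservers", "xserver"))` yields the same one-element list for the YAML `xservers` list and the XML `xserver` dict, and `get_any(layer, "name", "@text")` returns "abc" for both variants of `layer`.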
