title: Specification of the UMB Format
date: 01/21/2026
description: Overall specification of the unified Markov binary format.
published: true

Introduction

This repository shall contain the official specification (hereafter called the Standard) and the reference implementation for the unified Markov binary (UMB) format.

Models specified in the UMB ("unified Markov binary") format consist of a folder structure containing a set of files with well-defined names (using only the characters [a-z], [0-9], [-_] and a single .) in well-defined locations. For transport, that folder structure is bundled in a tar file that is optionally compressed using gzip or xz. The file extension for both compressed and uncompressed tar files of this format is .umb. Tools can inspect the magic bytes of the file to detect the format (bytes listed in file order): 75 73 74 61 72 (ASCII "ustar") at offset 257 for POSIX tar, 1F 8B 08 at offset 0 for gzip, or FD 37 7A 58 5A 00 at offset 0 for xz.
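
The magic-byte check described above can be sketched as follows. This is a minimal illustration, not part of the reference implementation; the function name `detect_container` and the returned strings are assumptions made for this example.

```python
# Magic bytes as listed in the paragraph above.
GZIP_MAGIC = bytes.fromhex("1F8B08")
XZ_MAGIC = bytes.fromhex("FD377A585A00")
TAR_MAGIC = bytes.fromhex("7573746172")  # ASCII "ustar"
TAR_MAGIC_OFFSET = 257


def detect_container(data: bytes) -> str:
    """Classify the leading bytes of a file as "gzip", "xz", "tar", or "unknown".

    `data` should hold at least the first 262 bytes of the file so that the
    tar magic at offset 257 can be inspected.
    """
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    if data.startswith(XZ_MAGIC):
        return "xz"
    if data[TAR_MAGIC_OFFSET:TAR_MAGIC_OFFSET + len(TAR_MAGIC)] == TAR_MAGIC:
        return "tar"
    return "unknown"
```

The compressed formats are tested first because a compressed file has no tar header at offset 257; a Tool would decompress and then repeat the check on the inner stream.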

A detailed description of the individual files follows below. Overall, there is one central index.json file, which provides all metadata relevant for the model, e.g., its type, statistics regarding its size and what other information is attached (such as rewards, atomic propositions or state valuations). The file also acts as an index for the binary files that make up the rest of the tar file.
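
As a rough sketch of how a Tool might reach the index, the snippet below opens a .umb archive with Python's standard `tarfile` module and parses index.json. It assumes that index.json sits at the archive root under exactly that member name and that its content is plain JSON readable by `json.load`; neither detail is fixed by this section, and the helper name `read_index` is hypothetical.

```python
import json
import tarfile


def read_index(path: str) -> dict:
    """Load index.json from a .umb archive (plain, gzip-, or xz-compressed tar)."""
    # Mode "r:*" lets tarfile detect the compression transparently.
    with tarfile.open(path, mode="r:*") as archive:
        # Assumption: the member is named "index.json" at the archive root.
        # extractfile raises KeyError if no such member exists.
        member = archive.extractfile("index.json")
        if member is None:
            raise ValueError("index.json is not a regular file")
        return json.load(member)
```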

Definitions

The specification shall use the keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" as prescribed by RFC 2119.

The following terms, when capitalized, have the meanings given below:

  • Model: A Markov model such as a POMDP, MDP, CTMC, or DTMC.
  • Label: Textual or numerical information appended to a State, Choice Edge, or Branch Edge.
    • Labels may be created by the application of a stateless function (or mapping) of the State, Choice Edge, or Branch Edge which is being labeled, to the generated Label.
  • State: A unique configuration of the RS's variables.
  • State Label: A label associated with the State.
  • State Graph: A directed graph representing the state space of a Markov automaton, containing:
    • States, as vertices
    • Distribution Vertices
    • Choice Edges
    • Branch Edges
  • Initial State: A State Vertex that is a valid starting point for traces, representing an initial configuration of the system. These are simply States which contain a State Label "init", encoded as a 4-character ISO 8859-1 string.
  • Choice Edge (Choice): A directed graph edge which represents a nondeterministic choice in the RS. Choice Edges must exit from a State Vertex and lead to a Distribution Vertex.
  • Distribution Vertex: A non-State vertex connecting a nondeterministic choice to a set of probabilistic Branches. Only one Choice Edge may lead into a Distribution Vertex.
  • Distribution: The set of all Branches connected to a Distribution Vertex.
  • Branch Edge (Branch): An edge representing a probabilistic change in the system. Branches must exit from a Distribution Vertex and lead to a State Vertex.
  • Deterministic Model (DM): Any model where each State Vertex has only one Choice Edge, and thus only one Distribution Vertex.
  • Nondeterministic Model (NDM): A model where each State Vertex may have more than one Choice Edge, thus more than one Distribution Vertex.
  • Rewards: Additional information, attached as an integer to either States or Choice Edges. Negative Rewards are considered Costs.
    • State Rewards: Rewards attached to States.
    • Choice Rewards: Rewards attached to Choices.
  • Continuous Time Model: A model which evolves in continuous, as opposed to discrete, time. Graph branches in Continuous Time Models shall be stored as exponentially distributed rate parameters. For a branch from the State with index $i$ to the State with index $j$ that carries the label $\lambda$, the $(i, j)$ element of that model's $Q$-matrix shall be $\lambda$.
  • Discrete Time Model: A model which evolves in discrete step-time. Graph branches shall be interpreted as transition probabilities.
  • Partially Epistemic Model/Partially Observable Model: any Model where the underlying State or Configuration is not directly observable.
    • A Partially Epistemic Model shall have at least one Agent (Observer) which can make Observations of the model at each time step.
  • Supported models:
    • Discrete Time Markov Chain (DTMC): A Probabilistically Deterministic Model in Discrete Time.
    • Continuous Time Markov Chain (CTMC): A Probabilistically Deterministic Model in Continuous Time.
    • Discrete-Time Markov Decision Process or just Markov Decision Process (MDP): A Nondeterministic Model in Discrete Time.
    • Continuous Time Markov Decision Process (CMDP): A Nondeterministic Model in Continuous Time.
    • Partially Observable Markov Decision Process (POMDP): A Markov Decision Process (MDP) in discrete time where there exists at least one Agent making Observations.
  • A Tool shall be defined as any piece of software which reads and writes this format, including the provided reference implementation.
  • A Malformed file shall be any file which doesn't conform to this standard. We shall define two kinds of Malformed:
    • Malformed but Recoverable (MbR) files shall be any file which is Malformed but can be reasonably interpreted as a valid Markov process. If a file is Malformed but Recoverable, Tools shall warn but allow the user to open the file.
    • Critically Malformed (CM) files shall be any file which cannot be interpreted as a valid Markov process. Tools shall not attempt to open critically malformed files.
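
The State Graph terms defined above can be pictured with a small in-memory sketch. This is only an illustration of the definitions, not the on-disk layout; all class and function names are assumptions. Because only one Choice Edge may lead into a Distribution Vertex, the sketch folds each Choice Edge and its Distribution Vertex into a single `Choice` object.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Branch:
    """Branch Edge: leaves a Distribution Vertex and enters a State."""
    target: int    # index of the target State
    value: float   # probability (discrete time) or rate (continuous time)


@dataclass
class Choice:
    """A Choice Edge together with the single Distribution Vertex it leads to."""
    branches: List[Branch] = field(default_factory=list)


@dataclass
class State:
    labels: Set[str] = field(default_factory=set)
    choices: List[Choice] = field(default_factory=list)


def is_deterministic(states: List[State]) -> bool:
    """True if every State has exactly one Choice Edge (a Deterministic Model)."""
    return all(len(state.choices) == 1 for state in states)


def initial_states(states: List[State]) -> List[int]:
    """Indices of the States that carry the State Label "init"."""
    return [i for i, state in enumerate(states) if "init" in state.labels]
```

For example, a three-State chain whose first State is labeled "init" and branches with probability 0.5 to each of two absorbing States is a Deterministic Model under this check; adding a second `Choice` to any State makes it a Nondeterministic Model.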

Types referenced in this Standard

When this standard refers to "floating point" it shall be in reference to the standard defined in IEEE 754. When this standard refers to a "string" it shall be in reference to an ISO 8859-1 encoded string. An "integer" when not explicitly marked as "unsigned" shall be interpreted as a signed, two's complement integer, and when marked as "unsigned" shall be interpreted as such.
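
The following sketch shows how these types map onto Python's standard library. The byte order and the widths used here (little-endian, 64-bit floating point) are placeholders chosen for illustration only; this section does not fix them, and the helper names are hypothetical.

```python
import struct


def encode_string(text: str) -> bytes:
    """Encode a string as ISO 8859-1: exactly one byte per character."""
    return text.encode("iso-8859-1")


def decode_signed(raw: bytes, byteorder: str = "little") -> int:
    """Interpret raw bytes as a signed two's complement integer."""
    # Assumption: byte order is a placeholder, not taken from the Standard.
    return int.from_bytes(raw, byteorder=byteorder, signed=True)


def decode_unsigned(raw: bytes, byteorder: str = "little") -> int:
    """Interpret raw bytes as an unsigned integer."""
    return int.from_bytes(raw, byteorder=byteorder, signed=False)


def decode_double(raw: bytes) -> float:
    """Interpret 8 bytes as an IEEE 754 binary64 value (little-endian assumed)."""
    return struct.unpack("<d", raw)[0]
```

The same four bytes FF FF FF FF decode to -1 as a signed integer and to 4294967295 as an unsigned one, which is why the "unsigned" marking matters to a reader of the binary files.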

Versioning

This Standard shall be semantically versioned, and its version shall be defined as follows:

  • A Major Version shall be any version which requires substantial change to this Standard, as determined by a majority vote on the standards committee. A Major Version may break backwards compatibility.
  • A Minor Version shall be any version which does not break backwards compatibility, and is determined by the committee to be variant enough to not be a Patch.
  • A Patch or Revision shall be all other changes to the Standard.

Versions shall be stylized in the format MajorVersion.MinorVersion.Patch. The major version shall be limited to at most three (3) decimal digits. The minor version shall be limited to at most three (3) decimal digits. The patch or revision shall be limited to at most four (4) decimal digits. A version string shall match the following regular expression: ([0-9]{1,3})[.]([0-9]{1,3})[.]([0-9]{1,4}).

Tools shall keep track of which versions of this standard are supported, and should emit helpful error and warning messages to a user when they attempt to load an unsupported file version.
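
A Tool's version handling might look like the sketch below. The pattern encodes the digit limits stated above (up to three, three, and four digits); the set `SUPPORTED_VERSIONS` and the function names are hypothetical and would be chosen by each Tool.

```python
import re
from typing import Tuple

# Digit limits follow the prose above: major <= 3, minor <= 3, patch <= 4 digits.
VERSION_RE = re.compile(r"([0-9]{1,3})[.]([0-9]{1,3})[.]([0-9]{1,4})")

# Hypothetical: the versions of the Standard that this Tool can read.
SUPPORTED_VERSIONS = {(0, 0, 1)}


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split "Major.Minor.Patch" into integers, rejecting malformed strings."""
    match = VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid version string: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_supported(text: str) -> bool:
    """True if the given version string names a supported version."""
    return parse_version(text) in SUPPORTED_VERSIONS
```

Using `fullmatch` rather than `search` keeps strings such as "1..2" or "1234.0.0" from being accepted; a Tool would report an error for those and a warning or error when `is_supported` returns False.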

Current Version

The current version of the standard shall be:

Version Section   Value
Major Version     0
Minor Version     0
Patch/Revision    1

This is stylized as 0.0.1 in code.
