Custom column encoding #698

@pmurphy979

JerLucid, a user on the Google group, suggested adding a mechanism that lets users encode table columns as chars, shorts or ints, as an alternative to the longs produced by the standard enumeration.

Testing supports the main arguments:

  • the encoded table and mapping dictionaries tested used less disk space and memory than the enumerated table and sym file
  • the writing process was faster and used less memory
  • simple filtering and grouping queries were faster and used less memory

The main downsides are:

  • queries are more complicated, involving dictionary lookups and reverse lookups for encoding/decoding
  • encoding domains are smaller and the available domain space would potentially need to be monitored
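
To make the two downsides concrete, here is a minimal sketch of the lookup / reverse-lookup pattern and a domain-space check. It is written in Python purely for illustration and is not TorQ or kdb+ code. All names (`encode`, `decode`, `domain_usage`, `SHORT_MAX`) are hypothetical, and it assumes codes are non-negative and assigned in order of first appearance.

```python
# Hypothetical illustration of a custom column encoding; not TorQ/kdb+ code.
SHORT_MAX = 2**15 - 1  # largest value of a signed 16-bit short


def encode(values, mapping, max_code=SHORT_MAX):
    """Return the codes for `values`, extending `mapping` (value -> code) in place."""
    codes = []
    for v in values:
        if v not in mapping:
            # Codes 0..max_code are available; refuse to go past the domain.
            if len(mapping) > max_code:
                raise OverflowError("encoding domain exhausted")
            mapping[v] = len(mapping)
        codes.append(mapping[v])
    return codes


def decode(codes, mapping):
    """Reverse lookup: turn codes back into the original values."""
    reverse = {code: value for value, code in mapping.items()}
    return [reverse[c] for c in codes]


def domain_usage(mapping, max_code=SHORT_MAX):
    """Fraction of the available code domain already used."""
    return len(mapping) / (max_code + 1)


mapping = {}
sym_codes = encode(["AAPL", "MSFT", "AAPL", "GOOG"], mapping)

# A filter has to encode the constant first, then compare codes.
target = mapping["AAPL"]
rows = [i for i, c in enumerate(sym_codes) if c == target]

# Results have to be decoded before they are shown to the user.
decoded = decode(sym_codes, mapping)
```

Every query needs an encode step on the way in and a decode step on the way out, and something like `domain_usage` would need to be monitored as new values arrive.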

One way to implement this in TorQ would be to add new configuration (e.g. a .csv file with table -> column -> mapping file name -> mapping data type) and a new function, similar to .Q.en, that takes this config and a table, updates the mapping files on disk, and encodes the table. If a call to this function were inserted just before every call to .Q.en in the code, .Q.en would still enumerate any symbol columns left unencoded, so the current behaviour is preserved when nothing is configured for encoding.
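
The config-driven step could look roughly like the sketch below. It is Python for illustration only, not a proposed q implementation. The config column names (`table`, `column`, `mapfile`, `maptype`), the function names, and the capacities in `TYPE_CAPACITY` are all assumptions, and the mappings are held in memory here rather than written to disk.

```python
import csv
import io

# Hypothetical config: table -> column -> mapping file name -> mapping data type.
CONFIG_CSV = """table,column,mapfile,maptype
trade,sym,symmap,short
trade,ex,exmap,char
"""

# Assumed number of distinct codes per type (non-negative codes only).
TYPE_CAPACITY = {"char": 2**8, "short": 2**15, "int": 2**31}


def load_config(text):
    """Parse the config into {table: {column: (mapfile, maptype)}}."""
    config = {}
    for row in csv.DictReader(io.StringIO(text)):
        config.setdefault(row["table"], {})[row["column"]] = (
            row["mapfile"],
            row["maptype"],
        )
    return config


def encode_table(config, table_name, table, mappings):
    """Encode the configured columns of `table` (column name -> list of values).

    `mappings` (mapfile -> {value: code}) is updated in place and stands in
    for the mapping files on disk. Unconfigured columns are returned as-is.
    """
    encoded = dict(table)
    for column, (mapfile, maptype) in config.get(table_name, {}).items():
        if column not in table:
            continue
        mapping = mappings.setdefault(mapfile, {})
        codes = []
        for v in table[column]:
            if v not in mapping:
                if len(mapping) >= TYPE_CAPACITY[maptype]:
                    raise OverflowError(f"{mapfile}: {maptype} domain exhausted")
                mapping[v] = len(mapping)
            codes.append(mapping[v])
        encoded[column] = codes
    return encoded


config = load_config(CONFIG_CSV)
mappings = {}
trade = {
    "sym": ["AAPL", "MSFT", "AAPL"],
    "ex": ["N", "O", "N"],
    "src": ["feedA", "feedB", "feedA"],  # not in the config, left untouched
    "price": [1.0, 2.0, 3.0],
}
encoded = encode_table(config, "trade", trade, mappings)
```

In this sketch `src` passes through unchanged, which corresponds to the columns that .Q.en would still enumerate afterwards. A table with no config entries comes back unmodified.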
