Custom column encoding #698

Google group user JerLucid [suggests](https://groups.google.com/g/kdbtorq/c/x1p4nzPvXZc/m/Soc3efi4CAAJ) adding a mechanism that would let users encode table columns as chars, shorts or ints, as an alternative to the longs used by the standard enumeration.
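
As a rough, language-agnostic illustration of the idea (Python rather than q, and not TorQ or kdb+ code), this sketch assigns each distinct value a code, stores the codes in a narrower integer array and decodes them again. The helper names are hypothetical, and Python's `array` typecodes only loosely stand in for the q types: `'b'` ~ char, `'h'` ~ short, `'i'` ~ int, `'q'` ~ long.

```python
from array import array

def build_mapping(values):
    """Assign each distinct value a code, in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return mapping

def encode(values, mapping, typecode):
    """Store the codes in a typed array ('b', 'h', 'i' or 'q')."""
    return array(typecode, (mapping[v] for v in values))

def decode(codes, mapping):
    """Reverse lookup: invert the mapping to recover the original values."""
    reverse = {code: v for v, code in mapping.items()}
    return [reverse[c] for c in codes]

syms = ["AAPL", "MSFT", "AAPL", "IBM", "MSFT", "AAPL"]
mapping = build_mapping(syms)
as_long = encode(syms, mapping, "q")   # 8 bytes per code, like a long-based enumeration
as_short = encode(syms, mapping, "h")  # 2 bytes per code on typical platforms

print(len(as_long) * as_long.itemsize, len(as_short) * as_short.itemsize)  # 48 12
print(decode(as_short, mapping) == syms)  # True
```

Storing the same six codes as shorts rather than longs takes a quarter of the bytes; the trade-off is that a narrower type holds fewer distinct codes (`array` raises `OverflowError` if a code does not fit).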

[Testing](https://github.com/pmurphy979/custom-column-encoding/blob/master/colencode.md) supports the main arguments:
- in the cases tested, the encoded table and its mapping dictionaries used less disk space and memory than the enumerated table and sym file
- the writing process was faster and used less memory
- simple filtering and grouping queries were faster and used less memory

The main downsides are:
- queries are more complicated, involving dictionary lookups and reverse lookups for encoding/decoding
- encoding domains are smaller, so the remaining domain space may need to be monitored
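
The extra query work can be sketched in the same hypothetical Python terms: a filter must first encode the query value through the mapping, a grouping must decode its keys through the reverse mapping, and a simple usage ratio could serve as the domain monitoring mentioned above. The `MAX_CODE` limits and the 80% threshold are illustrative assumptions that ignore any values a real system would reserve (e.g. for nulls).

```python
from collections import Counter

# Illustrative (assumed) largest code per encoding type; real limits may differ.
MAX_CODE = {"char": 255, "short": 32767, "int": 2147483647}

def select_where_equal(codes, mapping, value):
    """Filter: look up the query value's code, then compare codes row by row."""
    code = mapping.get(value)
    if code is None:  # value was never encoded, so no row can match
        return []
    return [i for i, c in enumerate(codes) if c == code]

def count_by(codes, mapping):
    """Group: count rows per code, then decode the keys via the reverse mapping."""
    reverse = {code: v for v, code in mapping.items()}
    return {reverse[c]: n for c, n in Counter(codes).items()}

def domain_usage(mapping, coltype):
    """Fraction of the (assumed) code space already used by this mapping."""
    return len(mapping) / (MAX_CODE[coltype] + 1)

mapping = {"AAPL": 0, "MSFT": 1, "IBM": 2}
codes = [0, 1, 0, 2, 1, 0]
print(select_where_equal(codes, mapping, "MSFT"))  # [1, 4]
print(count_by(codes, mapping))                    # {'AAPL': 3, 'MSFT': 2, 'IBM': 1}
if domain_usage(mapping, "char") > 0.8:  # hypothetical alert threshold
    print("warning: char domain more than 80% used")
```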
 
One way to implement this in TorQ would be to add new configuration (e.g. a .csv file mapping table -> column -> mapping file name -> mapping data type) and a new function, similar to .Q.en, that takes this config and a table, updates the mapping files on disk and encodes the table. If a call to this function were inserted just before every call to .Q.en in the code, .Q.en would still enumerate any symbol columns left unencoded, so the current behaviour would be unchanged when nothing is configured for encoding.
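
A minimal sketch of that flow, again in Python purely for illustration: it reads a config in the suggested table -> column -> mapping file name -> mapping data type layout, appends unseen values to plain-text mapping files (one value per line, code = line index) and encodes only the configured columns. The CSV column names, the file format, `encode_table` and the type limits are all hypothetical; none of this is an existing TorQ or .Q.en API. Columns without a config entry come back untouched, which is where .Q.en would take over.

```python
import csv
import io
import os
import tempfile

# Hypothetical config: table -> column -> mapping file name -> mapping data type.
CONFIG_CSV = """table,column,mapfile,type
trade,sym,symmap,short
trade,ex,exmap,char
"""

# Illustrative (assumed) largest code per type; ignores reserved null values.
MAX_CODE = {"char": 255, "short": 32767, "int": 2147483647}

def load_config(text):
    return list(csv.DictReader(io.StringIO(text)))

def load_mapping(path):
    """Mapping file: one value per line; a value's code is its line index."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return f.read().splitlines()

def encode_table(config, tablename, table, mapdir):
    """Encode configured columns of `table` (a dict of column lists).

    New values are appended to the mapping files, so existing codes never change.
    Unconfigured columns are returned as-is for the usual enumeration step.
    """
    out = dict(table)
    for row in config:
        if row["table"] != tablename or row["column"] not in table:
            continue
        path = os.path.join(mapdir, row["mapfile"])
        values = load_mapping(path)
        index = {v: i for i, v in enumerate(values)}
        new = []
        for v in table[row["column"]]:
            if v not in index:
                index[v] = len(values)
                values.append(v)
                new.append(v)
        # Refuse to write codes that would not fit the configured type.
        if len(values) - 1 > MAX_CODE[row["type"]]:
            raise OverflowError(f"{row['mapfile']}: {row['type']} domain exhausted")
        if new:
            with open(path, "a") as f:
                f.writelines(v + "\n" for v in new)
        out[row["column"]] = [index[v] for v in table[row["column"]]]
    return out

with tempfile.TemporaryDirectory() as d:
    cfg = load_config(CONFIG_CSV)
    trade = {"sym": ["AAPL", "MSFT", "AAPL"], "ex": ["N", "O", "N"], "cond": ["A", "B", "A"]}
    enc = encode_table(cfg, "trade", trade, d)
    print(enc)  # {'sym': [0, 1, 0], 'ex': [0, 1, 0], 'cond': ['A', 'B', 'A']}
    enc2 = encode_table(cfg, "trade", {"sym": ["IBM", "AAPL"]}, d)
    print(enc2["sym"])  # [2, 0]
```

Because the function only appends to the mapping files, codes already written to disk stay stable across calls, as the second call shows (`AAPL` keeps code 0 and `IBM` gets the next free code).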
