
[Epic] Refactor dataset view APIs for improved rendering #256

@rudokemper

Description


Feature Request

Currently, the way we serve data for one of the dataset views is highly inefficient and does not scale for large datasets.

At present, we expose a single endpoint (e.g. api/[table]/map.ts) that:

  1. Fetches all records from the database
  2. Applies filtering based on config params
  3. Transforms columns and values into human-readable formats
  4. Converts the result into GeoJSON when used for maps
  5. Returns the fully processed dataset in response.data

This endpoint is called every time a dataset view page loads. This design has several drawbacks:

  • Large payloads and high latency
    For large datasets, the API response can be very large, resulting in several seconds of backend processing time and additional frontend rendering latency (especially when painting data onto the Mapbox canvas).

  • Browser memory pressure
    For very large datasets, holding the full transformed dataset in memory can cause the browser to become unresponsive or crash.

  • Over-coupled API logic
    Filtering and presentation-layer transformations are baked into the API response. As a result, other consumers such as the "download data" buttons can only access modified data, not the original raw records stored in the database. Users of these buttons want the raw, canonical data, so this behavior is undesirable.

Implementation Plan

We propose the following changes:

  • Split data access into purpose-specific endpoints - Replace the single “fetch everything” endpoint with smaller, focused endpoints:
    • GET api/[table]/map - returns only the minimal data required to render a map view: geometry, a stable record identifier, and any fields needed for styling or filtering (e.g. a color, or the column to filter by).
    • GET api/[table]/[recordId] - returns the full, raw record for a single feature. This endpoint is called on demand (e.g. when a user clicks a point on the map).
    • A "GET-many" to fetch multiple raw records in one request for views that need many details at once (like the Gallery view).
      • Option: GET api/[table]?ids=1,3,5,6,8. Downside: URLs have practical length limits, which vary by browser, server, and proxy (2,048 characters is a common conservative figure). If the IDs are 36-character UUIDs, only about 50 of them fit in one such URL, so larger requests would have to be split.
      • Option: POST api/[table]/records. Downside: the semantics are awkward, since the request does not change server data, and any intermediate caching layers would not cache these requests.
  • Add client-side caching and request de-duping - Avoid repeated fetches for the same records across map clicks and gallery cards by caching records client-side (e.g. recordId -> record).
    • Decide whether and when the cache needs to be invalidated.
  • Move filtering and transformation client-side - To decouple presentation logic from data access and reduce backend latency, shift filtering and human-readable transformations to the client. Within the client, weigh the pros and cons of transforming at render time versus caching the already-transformed data.
  • Dedicated export endpoints - Download buttons should call separate export endpoints that stream raw, untransformed records to a file on disk rather than loading everything into memory, and then return a download link to that file.
  • Enable response compression - enable gzip encoding for API responses (e.g. via @nuxtjs/precompress). This will reduce payload size over the wire.
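A minimal sketch of the projection step behind GET api/[table]/map, assuming each row already stores its geometry as a GeoJSON geometry object and that the caller passes in the id column and the styling/filtering columns (all column names here are hypothetical). The response keeps only geometry, a stable id, and the listed properties; everything else is left for the per-record endpoint.

```typescript
// Hypothetical row shape: real tables may name or store columns differently.
type RawRecord = Record<string, unknown>;

interface MinimalFeature {
  type: "Feature";
  id: string;
  geometry: { type: string; coordinates: unknown };
  properties: Record<string, unknown>;
}

interface MinimalFeatureCollection {
  type: "FeatureCollection";
  features: MinimalFeature[];
}

// Build the lean map payload: geometry, a stable id, and only the columns
// needed for styling or filtering. Assumes geometry is stored as GeoJSON.
function toMapFeatureCollection(
  rows: RawRecord[],
  idColumn: string,
  geometryColumn: string,
  keepColumns: string[],
): MinimalFeatureCollection {
  const features: MinimalFeature[] = [];
  for (const row of rows) {
    const geometry = row[geometryColumn];
    // Skip rows without a usable geometry instead of failing the whole response.
    if (geometry === null || typeof geometry !== "object") continue;

    const properties: Record<string, unknown> = {};
    for (const column of keepColumns) {
      if (column in row) properties[column] = row[column];
    }
    features.push({
      type: "Feature",
      id: String(row[idColumn]),
      geometry: geometry as MinimalFeature["geometry"],
      properties,
    });
  }
  return { type: "FeatureCollection", features };
}
```

Large free-text or media columns are dropped here by design; the client fetches them from GET api/[table]/[recordId] only when a feature is clicked.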
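If the GET-many option with a query string is chosen, the URL length limit can be handled on the client by splitting the IDs into several requests. A sketch, assuming a hypothetical comma-separated `ids` query parameter; the function name and the 2,048-character default are illustrative.

```typescript
// Split ids into as few GET URLs as possible, each no longer than maxUrlLength.
// Assumes a hypothetical comma-separated `ids` query parameter.
function buildBatchedUrls(
  baseUrl: string,
  ids: readonly string[],
  maxUrlLength = 2048,
): string[] {
  const prefix = `${baseUrl}?ids=`;
  const urls: string[] = [];
  let batch: string[] = [];
  let length = prefix.length; // always equals (prefix + batch.join(",")).length

  for (const id of ids) {
    const encoded = encodeURIComponent(id);
    if (prefix.length + encoded.length > maxUrlLength) {
      throw new Error(`id "${id}" alone exceeds the URL length limit`);
    }
    // Adding to a non-empty batch also costs one comma.
    const cost = (batch.length > 0 ? 1 : 0) + encoded.length;
    if (length + cost > maxUrlLength) {
      urls.push(prefix + batch.join(","));
      batch = [];
      length = prefix.length;
    }
    length += (batch.length > 0 ? 1 : 0) + encoded.length;
    batch.push(encoded);
  }
  if (batch.length > 0) urls.push(prefix + batch.join(","));
  return urls;
}
```

The client would then issue these requests in parallel and merge the results. The POST option avoids the splitting entirely, at the cost of the semantics and caching trade-offs noted above.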
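One way to get client-side caching and request de-duping is to store the in-flight promise per recordId, so a map click and a gallery card asking for the same record share one request. This is a sketch under assumptions: `RecordCache` is a hypothetical name, and `fetchRecord` stands in for a call to GET api/[table]/[recordId] and is injected rather than hard-coded.

```typescript
type RecordData = Record<string, unknown>;
type FetchRecord = (recordId: string) => Promise<RecordData>;

class RecordCache {
  // recordId -> in-flight or settled request. Storing the promise rather than
  // the resolved value lets concurrent callers share a single request.
  private readonly entries = new Map<string, Promise<RecordData>>();
  private readonly fetchRecord: FetchRecord;

  constructor(fetchRecord: FetchRecord) {
    this.fetchRecord = fetchRecord;
  }

  get(recordId: string): Promise<RecordData> {
    const existing = this.entries.get(recordId);
    if (existing) return existing;

    const request: Promise<RecordData> = this.fetchRecord(recordId).catch(
      (err: unknown) => {
        // Don't keep failures; the next call retries. The identity check avoids
        // evicting a newer request created after an invalidate().
        if (this.entries.get(recordId) === request) this.entries.delete(recordId);
        throw err;
      },
    );
    this.entries.set(recordId, request);
    return request;
  }

  // Drop one record, or the whole cache when the underlying table may have changed.
  invalidate(recordId?: string): void {
    if (recordId === undefined) this.entries.clear();
    else this.entries.delete(recordId);
  }
}
```

In the app, `fetchRecord` could wrap Nuxt's `$fetch`. When to call `invalidate` (e.g. after a data sync) is the open question from the invalidation bullet; this sketch only provides the hook.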
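For moving transformation client-side, the "cache the already-transformed data" option can be sketched as a transformer that memoizes its output per raw record object. This assumes a hypothetical `labels` map from column names to display labels and deliberately simple value formatting; the real views will have their own rules. Filtering runs on the raw values, so the server no longer needs config-driven filtering.

```typescript
type RawRecord = Record<string, unknown>;
type DisplayRecord = Record<string, string>;

// Hypothetical minimal formatting; real views may format dates, numbers, etc.
function formatValue(value: unknown): string {
  if (value === null || value === undefined) return "";
  if (Array.isArray(value)) return value.map(formatValue).join(", ");
  return String(value);
}

// Returns a transformer that memoizes its output per raw record object.
// A WeakMap keyed by the record lets cached entries be garbage-collected
// once the raw record itself is dropped (e.g. evicted from a record cache).
function createDisplayTransformer(labels: Record<string, string>) {
  const cache = new WeakMap<RawRecord, DisplayRecord>();
  return (record: RawRecord): DisplayRecord => {
    const cached = cache.get(record);
    if (cached) return cached;
    const display: DisplayRecord = {};
    for (const [column, value] of Object.entries(record)) {
      display[labels[column] ?? column] = formatValue(value);
    }
    cache.set(record, display);
    return display;
  };
}

// Client-side filter on a raw column value.
function filterByColumn(
  records: RawRecord[],
  column: string,
  allowed: ReadonlySet<string>,
): RawRecord[] {
  return records.filter((record) => allowed.has(formatValue(record[column])));
}
```

Transforming at render time instead would simply call the unmemoized loop on each render: less memory, more CPU per repaint. Which one wins likely depends on dataset size and how often views re-render.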
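A sketch of the streaming part of an export endpoint in Node, assuming rows arrive as a (possibly async) iterable such as a database cursor, and assuming CSV as the export format. `pipeline` from `node:stream/promises` and `Readable.from` are standard Node APIs; the function names and CSV choice are hypothetical, and generating the download link is left out.

```typescript
import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

type RawRecord = Record<string, unknown>;

// Quote a CSV field (RFC 4180 style) when it contains a comma, quote, or newline.
// Nested objects are serialized as JSON so the raw structure is preserved.
function csvField(value: unknown): string {
  if (value === null || value === undefined) return "";
  const text = typeof value === "object" ? JSON.stringify(value) : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Yields one CSV line at a time so only the current row is held in memory.
async function* csvLines(
  rows: AsyncIterable<RawRecord> | Iterable<RawRecord>,
  columns: string[],
): AsyncGenerator<string> {
  yield columns.map(csvField).join(",") + "\n";
  for await (const row of rows) {
    yield columns.map((column) => csvField(row[column])).join(",") + "\n";
  }
}

// Streams raw, untransformed rows to a file. `pipeline` handles backpressure
// and destroys the streams if either side fails.
async function exportToCsvFile(
  rows: AsyncIterable<RawRecord> | Iterable<RawRecord>,
  columns: string[],
  filePath: string,
): Promise<void> {
  await pipeline(Readable.from(csvLines(rows, columns)), createWriteStream(filePath));
}
```

Because rows are pulled one at a time, peak memory stays roughly constant regardless of table size, which is the point of a dedicated export endpoint.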

Out of scope:

  • Spatial querying - In the future, for maps, we can return only features that intersect the current map viewport. But this can be something we tackle post-epic.

If needed, we can file sub-issues to track the work. I also propose we start a branch dev/api-refactor that we can submit individual PRs against.
