
(PoC) Another datalake table format, for research


kination/vine


Vine - Datalake Format Based on Rust (WIP)

This project is a work in progress.

This project aims to build a datalake table format optimized for streaming data writes. It is built on Rust and the Vortex columnar file format.

Quick Start

Build

./build.sh

This builds:

  • vine-core: Rust library for Vine
  • vine-spark: Spark DataSource V2 connector

Usage with Spark

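// Write streaming data
val query = spark.readStream
  .format("vine")
  .load("input-path")
  .writeStream
  .format("vine")
  .option("path", "/data/my-table")
  // Spark requires a checkpoint location for most streaming sinks; this path is only an example
  .option("checkpointLocation", "/data/checkpoints/my-table")
  .start()

// Block until the streaming query stops
query.awaitTermination()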

// Read with Spark SQL
val df = spark.read.format("vine").load("/data/my-table")
df.show()

Architecture

┌─────────────────────────────────────┐
│   Query Engines (Spark, Flink..)    │
└──────────────┬──────────────────────┘
               │ DataSource API
┌──────────────▼──────────────────────┐
│  Connectors (vine-spark/vine-flink) │
└──────────────┬──────────────────────┘
               │ JNI
┌──────────────▼──────────────────────┐
│  Rust Core (vine-core)              │
│  - Fast 'vortex' writes             │
│  - Date-based partitioning          │
└──────────────┬──────────────────────┘
               │
┌──────────────▼──────────────────────┐
│  Storage (vortex files)             │
│  2024-12-26/data_143025.vtx         │
│  2024-12-27/data_091500.vtx         │
└─────────────────────────────────────┘
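The storage box in the diagram pairs a date directory with a time-stamped file name. The following is a minimal sketch of that mapping, assuming the directory is the UTC date of the write and the file name encodes the UTC time to the second. The helpers partition_path and civil_from_days are hypothetical and written only for illustration; they are not vine-core's API, and vine-core may use a date library or a different time zone. The sketch sticks to the standard library by converting days since the Unix epoch to a calendar date with Howard Hinnant's civil_from_days algorithm.

```rust
/// Days since 1970-01-01 (UTC) -> (year, month, day) in the proleptic Gregorian
/// calendar, following Howard Hinnant's civil_from_days algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097; // 400-year eras
    let doe = (z - era * 146_097) as u64; // day of era [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based [0, 365]
    let mp = (5 * doy + 2) / 153; // month index, March = 0 [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32; // [1, 31]
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32; // [1, 12]
    // January and February belong to the next civil year in the March-based count
    let year = yoe as i64 + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Hypothetical helper: relative path "YYYY-MM-DD/data_HHMMSS.vtx" for a write
/// at the given Unix time (seconds, UTC).
fn partition_path(unix_secs: i64) -> String {
    let days = unix_secs.div_euclid(86_400); // floor division also handles pre-1970 times
    let secs = unix_secs.rem_euclid(86_400); // seconds since midnight UTC
    let (year, month, day) = civil_from_days(days);
    let (hh, mm, ss) = (secs / 3_600, secs % 3_600 / 60, secs % 60);
    format!("{year:04}-{month:02}-{day:02}/data_{hh:02}{mm:02}{ss:02}.vtx")
}

fn main() {
    // 2024-12-26 14:30:25 UTC, matching the first file in the diagram
    println!("{}", partition_path(1_735_223_425)); // prints 2024-12-26/data_143025.vtx
}
```

Because YYYY-MM-DD names sort lexicographically in date order, a reader can select a date range by comparing directory names as plain strings.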

Components

Component   Language  Status   Purpose
vine-core   Rust      WIP      Write-optimized datalake table format
vine-spark  Scala     WIP      Spark DataSource V2 connector
vine-trino  Java      Planned  Trino connector (not started)

Storage Format

  • Files: Vortex
  • Partitioning: Date-based directories (YYYY-MM-DD/data_HHMMSS.vtx)
  • Metadata: JSON schema file (vine_meta.json)
  • Types: integer, string, boolean, double
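The metadata bullet says the schema lives in vine_meta.json and supports four column types. Below is a minimal sketch of how those types might be modeled and written out, assuming a simple {"fields":[{"name":"id","type":"integer"}]} layout. The JSON shape and the names VineType, Field, to_meta_json and json_str are hypothetical, since the actual vine_meta.json format is not described here. The sketch uses only the standard library, so JSON string escaping is done by hand.

```rust
use std::fmt::Write;

/// Column types listed in the Storage Format section (hypothetical Rust model).
#[derive(Debug, Clone, Copy, PartialEq)]
enum VineType {
    Integer,
    String,
    Boolean,
    Double,
}

impl VineType {
    /// Parse a type name as it might appear in vine_meta.json (assumed lowercase).
    fn parse(s: &str) -> Option<VineType> {
        match s {
            "integer" => Some(VineType::Integer),
            "string" => Some(VineType::String),
            "boolean" => Some(VineType::Boolean),
            "double" => Some(VineType::Double),
            _ => None,
        }
    }

    /// The lowercase name written back to JSON.
    fn name(self) -> &'static str {
        match self {
            VineType::Integer => "integer",
            VineType::String => "string",
            VineType::Boolean => "boolean",
            VineType::Double => "double",
        }
    }
}

/// One column of the table schema (hypothetical).
struct Field {
    name: String,
    ty: VineType,
}

/// Quote and escape a string as a JSON string literal.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            // JSON requires control characters to be escaped as \uXXXX
            c if (c as u32) < 0x20 => write!(out, "\\u{:04x}", c as u32).unwrap(),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render the schema in the assumed {"fields":[{"name":..,"type":..}]} layout.
fn to_meta_json(fields: &[Field]) -> String {
    let mut out = String::from("{\"fields\":[");
    for (i, f) in fields.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        write!(out, "{{\"name\":{},\"type\":\"{}\"}}", json_str(&f.name), f.ty.name()).unwrap();
    }
    out.push_str("]}");
    out
}

fn main() {
    let fields = vec![
        Field { name: "id".to_string(), ty: VineType::parse("integer").unwrap() },
        Field { name: "event".to_string(), ty: VineType::parse("string").unwrap() },
    ];
    // prints {"fields":[{"name":"id","type":"integer"},{"name":"event","type":"string"}]}
    println!("{}", to_meta_json(&fields));
}
```

A production implementation would more likely use serde and serde_json; hand-rolled escaping keeps this sketch dependency-free.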

Documentation

Development

Build Components Individually

Rust Core

cd vine-core
cargo build --release
cargo test

Spark Connector

cd vine-spark
sbt clean assembly

Requirements

  • Rust 1.70+
  • Scala 2.13, sbt 1.x
  • Java 11
