FauxFoundry

A powerful CLI and TUI for synthetic, domain-aware data generation powered by local LLMs.

FauxFoundry generates unique synthetic datasets from human-readable YAML specifications. It uses local LLMs, served by a runtime such as Ollama, to produce realistic, domain-aware data that respects schema constraints. Records are streamed and deduplicated so that exactly the requested number of unique records (N) is delivered, with minimal validation overhead.

Created by copyleftdev - Building tools for developers, by developers.

✨ Features

  • 🎯 YAML-Driven: Simple, human-readable specifications
  • 🤖 LLM-Powered: Uses local models (Ollama) for realistic data generation
  • 🔄 Streaming: Constant memory usage, handles large datasets efficiently
  • 🎨 Rich TUI: Interactive terminal interface for guided workflows
  • ⚡ CLI-First: Automation-friendly command-line interface
  • 🔒 Privacy-First: All processing happens locally, no data leaves your machine
  • 📊 Real-time Monitoring: Live progress tracking and statistics
  • ✅ Validation: Built-in specification validation and error handling
  • 🏥 Healthcare Ready: EDI, FHIR, HL7, and medical claims support
  • 🔄 Intelligent Retry: Advanced timeout handling with adaptive strategies
  • 🎲 Deduplication: Ensures 100% unique records with canonical hashing
  • 📈 Production Scale: Generate millions of records with constant memory usage
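
The deduplication feature above mentions canonical hashing. The README does not describe the actual implementation, so the following is only a minimal sketch of the general technique: serialize each record to a canonical form, hash it, and skip records whose hash has been seen. The names `canonicalHash` and `Deduper` are hypothetical, and the choice of JSON plus SHA-256 is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// canonicalHash returns a SHA-256 digest of a record's JSON form.
// encoding/json writes map keys in sorted order, so two records with the
// same fields and values hash identically regardless of insertion order.
func canonicalHash(record map[string]any) (string, error) {
	b, err := json.Marshal(record)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// Deduper remembers the hashes of records it has already accepted.
// (Hypothetical type for this sketch, not FauxFoundry's own.)
type Deduper struct {
	seen map[string]struct{}
}

// NewDeduper returns an empty Deduper.
func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[string]struct{})}
}

// Accept reports whether the record is new, and stores its hash if so.
func (d *Deduper) Accept(record map[string]any) (bool, error) {
	h, err := canonicalHash(record)
	if err != nil {
		return false, err
	}
	if _, dup := d.seen[h]; dup {
		return false, nil
	}
	d.seen[h] = struct{}{}
	return true, nil
}

func main() {
	d := NewDeduper()
	a := map[string]any{"email": "a@gmail.com", "age": 34}
	b := map[string]any{"age": 34, "email": "a@gmail.com"} // same record, different key order
	for _, r := range []map[string]any{a, b} {
		ok, err := d.Accept(r)
		if err != nil {
			panic(err)
		}
		fmt.Println(ok)
	}
}
```

In this sketch the set of seen hashes grows with the number of unique records, so it trades a small, fixed cost per record for exact duplicate detection.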

🚀 Quick Start

Prerequisites

  • Go 1.21 or later
  • Ollama running locally with a model (e.g., llama3.1:8b)

Installation

# Clone the repository
git clone https://github.com/copyleftdev/faux-foundry
cd faux-foundry

# Build the binary
go build -o bin/fauxfoundry ./cmd/fauxfoundry

# Or install directly
go install ./cmd/fauxfoundry

# Check the build (run `fauxfoundry --version` instead if you used go install)
./bin/fauxfoundry --version

Basic Usage

  1. Create a specification:
fauxfoundry init customer.yaml --template ecommerce
  2. Validate the specification:
fauxfoundry validate customer.yaml
  3. Generate synthetic data:
fauxfoundry generate --spec customer.yaml --output outputs/data.jsonl
  4. Launch interactive TUI:
fauxfoundry tui

📋 Specification Format

FauxFoundry uses YAML specifications to define your data generation requirements:

model:
  endpoint: "http://localhost:11434"
  name: "llama3.1:8b"
  batch_size: 32
  temperature: 0.7

dataset:
  count: 1000
  domain: "E-commerce customer data"
  fields:
    - name: "email"
      type: "email"
      required: true
      pattern: "@(gmail|yahoo|outlook)\\.com$"
    - name: "age"
      type: "integer"
      required: true
      range: [18, 80]
    - name: "status"
      type: "enum"
      required: true
      values: ["active", "inactive", "pending"]
    - name: "created_at"
      type: "datetime"
      required: true
      description: "Account creation date"
    - name: "preferences"
      type: "object"
      description: "Customer preferences and settings"

Field Types

  • string - Text strings
  • text - Longer text content
  • integer - Whole numbers
  • float - Decimal numbers
  • boolean - True/false values
  • datetime - ISO 8601 timestamps
  • date - Date values
  • time - Time values
  • email - Email addresses
  • url - URLs
  • uuid - UUID values
  • phone - Phone numbers
  • enum - Predefined values
  • object - Nested objects
  • array - Arrays of values

Field Constraints

  • required - Field must be present
  • pattern - Regex pattern for validation
  • range - Min/max values for numbers
  • values - Allowed values for enums
  • description - Field description for LLM context
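
To make the constraints above concrete, here is a minimal sketch of how a decoded record could be checked against them. It is not FauxFoundry's validator: the `Field` struct and `checkField` function are hypothetical, the range is assumed to be inclusive, and numbers are assumed to arrive as `float64` (which is how `encoding/json` decodes them into `any`).

```go
package main

import (
	"fmt"
	"regexp"
)

// Field mirrors a subset of the constraint keys listed above.
// Hypothetical type for illustration only.
type Field struct {
	Name     string
	Required bool
	Pattern  string    // regex checked against string values
	Range    []float64 // [min, max], assumed inclusive
	Values   []string  // allowed values for enums
}

// checkField returns an error if the record violates the field's constraints.
func checkField(f Field, record map[string]any) error {
	v, present := record[f.Name]
	if !present {
		if f.Required {
			return fmt.Errorf("%s: required field is missing", f.Name)
		}
		return nil
	}
	if f.Pattern != "" {
		s, ok := v.(string)
		if !ok {
			return fmt.Errorf("%s: pattern needs a string value", f.Name)
		}
		matched, err := regexp.MatchString(f.Pattern, s)
		if err != nil {
			return fmt.Errorf("%s: invalid pattern: %w", f.Name, err)
		}
		if !matched {
			return fmt.Errorf("%s: %q does not match %s", f.Name, s, f.Pattern)
		}
	}
	if len(f.Range) == 2 {
		n, ok := v.(float64)
		if !ok {
			return fmt.Errorf("%s: range needs a numeric value", f.Name)
		}
		if n < f.Range[0] || n > f.Range[1] {
			return fmt.Errorf("%s: %v is outside %v", f.Name, n, f.Range)
		}
	}
	if len(f.Values) > 0 {
		s, ok := v.(string)
		if !ok {
			return fmt.Errorf("%s: enum needs a string value", f.Name)
		}
		allowed := false
		for _, want := range f.Values {
			if s == want {
				allowed = true
				break
			}
		}
		if !allowed {
			return fmt.Errorf("%s: %q is not one of %v", f.Name, s, f.Values)
		}
	}
	return nil
}

func main() {
	fields := []Field{
		{Name: "email", Required: true, Pattern: `@(gmail|yahoo|outlook)\.com$`},
		{Name: "age", Required: true, Range: []float64{18, 80}},
		{Name: "status", Required: true, Values: []string{"active", "inactive", "pending"}},
	}
	record := map[string]any{"email": "john.doe@gmail.com", "age": 34.0, "status": "archived"}
	for _, f := range fields {
		if err := checkField(f, record); err != nil {
			fmt.Println("invalid:", err)
		}
	}
}
```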

🖥️ CLI Commands

generate - Generate synthetic data

Generate synthetic data from YAML specifications with advanced options:

# Basic generation
fauxfoundry generate --spec customer.yaml

# Override count and specify output
fauxfoundry generate --spec customer.yaml --count 5000 --output outputs/data.jsonl.gz

# Dry run validation
fauxfoundry generate --spec customer.yaml --dry-run

# Interactive mode
fauxfoundry generate --interactive

# Advanced timeout handling
fauxfoundry generate --spec complex-edi.yaml --max-retries 5 --min-batch-size 1

# Custom timeout and seed
fauxfoundry generate --spec customer.yaml --timeout 30m --seed 12345

Flags:

  • -s, --spec string - Path to YAML specification file (required)
  • -o, --output string - Output file path (stdout if not specified)
  • -n, --count int - Override record count from specification
  • -t, --timeout string - Maximum execution time (default "2h")
  • --seed int - Random seed for reproducibility
  • --dry-run - Validate specification without generating data
  • -i, --interactive - Launch interactive TUI mode
  • --max-retries int - Maximum retry attempts on timeout (default 3)
  • --min-batch-size int - Minimum batch size before giving up (default 1)
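
The README does not spell out how `--max-retries` and `--min-batch-size` interact. One plausible reading, sketched below purely as an assumption, is that each timed-out request is retried with half the batch size, never dropping below the minimum, until the retry budget is spent. `withShrinkingBatch`, `batchFn`, and `ErrTimeout` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTimeout marks a batch request that ran out of time.
var ErrTimeout = errors.New("batch timed out")

// batchFn stands in for one LLM batch request (hypothetical).
type batchFn func(batchSize int) error

// withShrinkingBatch calls fn and, on ErrTimeout, retries with half the
// batch size, never below minBatch, for at most maxRetries retries.
// It returns the batch size used by the final attempt.
func withShrinkingBatch(fn batchFn, batch, minBatch, maxRetries int) (int, error) {
	for attempt := 0; ; attempt++ {
		err := fn(batch)
		if err == nil {
			return batch, nil
		}
		// Only timeouts are retried; other errors surface immediately.
		if !errors.Is(err, ErrTimeout) || attempt >= maxRetries {
			return batch, err
		}
		batch /= 2
		if batch < minBatch {
			batch = minBatch
		}
	}
}

func main() {
	// Simulated backend: anything larger than 8 records per call times out.
	slow := func(n int) error {
		if n > 8 {
			return fmt.Errorf("batch of %d: %w", n, ErrTimeout)
		}
		return nil
	}
	size, err := withShrinkingBatch(slow, 32, 1, 5)
	fmt.Println(size, err)
}
```

Smaller batches mean shorter prompts and responses per request, which is why shrinking the batch is a common reaction to timeouts on large specifications.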

validate - Validate specifications

Validate YAML specifications for syntax and semantic correctness:

# Validate single file
fauxfoundry validate customer.yaml

# Validate multiple files
fauxfoundry validate *.yaml

# Verbose validation with detailed output
fauxfoundry validate customer.yaml --verbose

# Quiet validation (errors only)
fauxfoundry validate customer.yaml --quiet

Flags:

  • --dry-run - Equivalent to running validate itself (included for consistency with generate)
  • -v, --verbose - Enable detailed validation output
  • -q, --quiet - Show only errors

init - Create new specifications

Create new YAML specifications from templates or interactively:

# Interactive creation
fauxfoundry init customer.yaml

# From template
fauxfoundry init --template ecommerce customer.yaml

# Available templates
fauxfoundry init --list-templates

# Force overwrite existing file
fauxfoundry init --force customer.yaml --template medical

Available Templates:

  • ecommerce - E-commerce customer data
  • user - User profiles and authentication
  • product - Product catalog with pricing
  • medical - Healthcare and medical records
  • financial - Financial transactions and accounts

Flags:

  • --template string - Use predefined template
  • --list-templates - Show available templates
  • --force - Overwrite existing files

tui - Launch interactive interface

Launch the rich Terminal User Interface for guided workflows:

# Launch TUI
fauxfoundry tui

# Launch with specific spec
fauxfoundry tui --spec customer.yaml

# Launch in specific mode
fauxfoundry tui --mode generate

Flags:

  • --spec string - Load specific specification file
  • --mode string - Start in specific mode (browse, edit, generate, monitor)

doctor - System health check

Diagnose system health and Ollama connectivity:

# Full system check
fauxfoundry doctor

# Check specific endpoint
fauxfoundry doctor --endpoint http://localhost:11434

# Verbose diagnostics
fauxfoundry doctor --verbose

Flags:

  • --endpoint string - Ollama endpoint to check
  • --fix - Attempt to fix common issues
  • --models - List available models
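
For readers who want to check connectivity by hand, the sketch below queries Ollama's `GET /api/tags` endpoint, which lists locally installed models. It is an illustration of the kind of check `doctor` might run, not a description of what the command actually does; `listModels` and `parseModelNames` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// tagsResponse holds the part of Ollama's /api/tags reply used here.
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// parseModelNames extracts model names from an /api/tags response body.
func parseModelNames(body []byte) ([]string, error) {
	var tr tagsResponse
	if err := json.Unmarshal(body, &tr); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(tr.Models))
	for _, m := range tr.Models {
		names = append(names, m.Name)
	}
	return names, nil
}

// listModels asks a running Ollama server for its installed models.
func listModels(endpoint string) ([]string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(endpoint + "/api/tags")
	if err != nil {
		return nil, fmt.Errorf("ollama unreachable: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseModelNames(body)
}

func main() {
	names, err := listModels("http://localhost:11434")
	if err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Println("models:", names)
}
```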

🎨 Terminal User Interface (TUI)

The TUI provides a rich, interactive experience with:

  • Specification Editor: Visual YAML editing with validation
  • Generation Monitor: Real-time progress and statistics
  • File Browser: Manage specifications and outputs
  • Settings Panel: Configure models and preferences

Keyboard Shortcuts

  • F1 - Help
  • F2 - Specification Browser
  • F3 - Generate Data
  • F4 - Monitor Generation
  • F10 - Quit
  • Ctrl+N - New Specification
  • Ctrl+S - Save
  • Tab/Shift+Tab - Navigate components

🔧 Configuration

Global Flags

  • --config - Configuration file path
  • --verbose - Enable verbose logging
  • --quiet - Suppress non-essential output
  • --no-color - Disable colored output

Model Configuration

Configure your LLM backend in the specification:

model:
  endpoint: "http://localhost:11434"  # Ollama endpoint
  name: "llama3.1:8b"                # Model name
  batch_size: 32                     # Records per batch
  temperature: 0.7                   # Creativity (0-2)
  timeout: "30s"                     # Request timeout
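
As a rough illustration of how these settings could map onto a request, the sketch below builds a non-streaming call to Ollama's `POST /api/generate` endpoint. The `ModelConfig` type and both functions are hypothetical, and the mapping (for example, `timeout` becoming the HTTP client timeout) is an assumption rather than FauxFoundry's documented behavior.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// ModelConfig mirrors part of the model block above (hypothetical type).
type ModelConfig struct {
	Endpoint    string
	Name        string
	Temperature float64
	Timeout     time.Duration
}

// generateRequest is the JSON body sent to Ollama's /api/generate.
type generateRequest struct {
	Model   string         `json:"model"`
	Prompt  string         `json:"prompt"`
	Stream  bool           `json:"stream"`
	Options map[string]any `json:"options,omitempty"`
}

// newGenerateRequest builds a non-streaming generate call for the model.
func newGenerateRequest(cfg ModelConfig, prompt string) (*http.Request, error) {
	body, err := json.Marshal(generateRequest{
		Model:   cfg.Name,
		Prompt:  prompt,
		Stream:  false,
		Options: map[string]any{"temperature": cfg.Temperature},
	})
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(cfg.Endpoint, "/") + "/api/generate"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// generate sends the request and returns the model's text output.
func generate(cfg ModelConfig, prompt string) (string, error) {
	req, err := newGenerateRequest(cfg, prompt)
	if err != nil {
		return "", err
	}
	client := &http.Client{Timeout: cfg.Timeout}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("ollama returned %s", resp.Status)
	}
	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Response, nil
}

func main() {
	cfg := ModelConfig{
		Endpoint:    "http://localhost:11434",
		Name:        "llama3.1:8b",
		Temperature: 0.7,
		Timeout:     30 * time.Second,
	}
	// Build and print the request only; call generate(cfg, prompt) to send it.
	req, err := newGenerateRequest(cfg, "Return one JSON customer record.")
	if err != nil {
		panic(err)
	}
	body, err := io.ReadAll(req.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	fmt.Println(string(body))
}
```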

📊 Output Format

FauxFoundry generates data in JSON Lines (JSONL) format:

{"email": "john.doe@gmail.com", "age": 34, "status": "active", "created_at": "2023-05-15T10:30:00Z", "preferences": {"newsletter": true}}
{"email": "jane.smith@yahoo.com", "age": 28, "status": "pending", "created_at": "2023-06-20T14:45:00Z", "preferences": {"newsletter": false}}

Output can be:

  • Streamed to stdout
  • Saved to files (.jsonl or .jsonl.gz)
  • Piped to other tools (jq, databases, etc.)
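
Because each line is a standalone JSON object, downstream code can process the output one record at a time. The following sketch, with hypothetical helper names `readJSONL` and `statusCounts`, reads plain or gzip-compressed JSONL and tallies the `status` field from the sample records above.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// readJSONL decodes one record per line from r and passes it to handle.
// Only the current line is held in memory.
func readJSONL(r io.Reader, handle func(map[string]any) error) error {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow lines up to 1 MiB
	for sc.Scan() {
		line := bytes.TrimSpace(sc.Bytes())
		if len(line) == 0 {
			continue // skip blank lines
		}
		var rec map[string]any
		if err := json.Unmarshal(line, &rec); err != nil {
			return fmt.Errorf("decode line: %w", err)
		}
		if err := handle(rec); err != nil {
			return err
		}
	}
	return sc.Err()
}

// statusCounts tallies the "status" field across JSONL data,
// transparently decompressing it when gzipped is true.
func statusCounts(data []byte, gzipped bool) (map[string]int, error) {
	var r io.Reader = bytes.NewReader(data)
	if gzipped {
		zr, err := gzip.NewReader(r)
		if err != nil {
			return nil, err
		}
		defer zr.Close()
		r = zr
	}
	counts := make(map[string]int)
	err := readJSONL(r, func(rec map[string]any) error {
		if s, ok := rec["status"].(string); ok {
			counts[s]++
		}
		return nil
	})
	return counts, err
}

func main() {
	plain := []byte(`{"email": "john.doe@gmail.com", "status": "active"}` + "\n" +
		`{"email": "jane.smith@yahoo.com", "status": "pending"}` + "\n")

	// Compress in memory to imitate a .jsonl.gz file.
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(plain); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}

	counts, err := statusCounts(buf.Bytes(), true)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts)
}
```

For real files, the same `readJSONL` function can be given an `*os.File` (or a `gzip.Reader` wrapped around one) instead of an in-memory buffer.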

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                    FauxFoundry Interface                     │
├──────────────────────────────────────────────────────────────┤
│  CLI Layer          │  TUI Layer          │  Shared Core     │
│  ┌─────────────┐    │  ┌─────────────┐    │  ┌─────────────┐ │
│  │ Cobra CLI   │    │  │ Bubble Tea  │    │  │ Spec Parser │ │
│  │ Commands    │    │  │ Components  │    │  │ LLM Client  │ │
│  │ Flags       │    │  │ Views       │    │  │ Dedup Logic │ │
│  │ Validation  │    │  │ Models      │    │  │ Output      │ │
│  └─────────────┘    │  └─────────────┘    │  └─────────────┘ │
└──────────────────────────────────────────────────────────────┘

📁 Project Structure

faux-foundry/
├── cmd/fauxfoundry/     # Main application entry point
├── internal/            # Internal packages
│   ├── cli/             # CLI commands and logic
│   ├── tui/             # Terminal UI components
│   ├── llm/             # LLM client and Ollama integration
│   ├── spec/            # YAML specification parsing
│   ├── dedup/           # Record deduplication logic
│   └── output/          # Output writers (JSONL, compression)
├── pkg/types/           # Shared type definitions
├── examples/            # Sample YAML specifications
├── outputs/             # Generated data files (gitignored)
└── docs/                # Documentation (PRD, design specs)

🧪 Examples & Use Cases

FauxFoundry includes comprehensive example specifications for various domains:

📊 Business & E-commerce

  • customer.yaml - E-commerce customer data with demographics
  • product.yaml - Product catalog with pricing and inventory
  • user.yaml - User profiles and authentication data

🏥 Healthcare & Medical

  • medical-demo.yaml - Basic medical insurance verification
  • medical-insurance.yaml - Comprehensive 46-field insurance data
  • edi-270-271.yaml - EDI X12 healthcare eligibility transactions (53 fields)
  • rx-claims-edi.yaml - NCPDP D.0 pharmacy claims (75+ fields)
  • x12-837-core.yaml - X12 837 Professional Claims (66 fields)

💼 Enterprise & Integration

  • financial-transactions.yaml - Banking and payment data
  • api-logs.yaml - Application logs and metrics
  • inventory-management.yaml - Supply chain and logistics

🎯 Real-World Applications

Healthcare Systems:

# Generate 1000 medical insurance records
fauxfoundry generate --spec examples/medical-insurance.yaml --count 1000 --output outputs/insurance-test-data.jsonl

# Create EDI test transactions
fauxfoundry generate --spec examples/edi-270-271.yaml --count 100 --output outputs/edi-test.jsonl.gz

Development & Testing:

# Generate customer test data for QA
fauxfoundry generate --spec examples/customer.yaml --count 50000 --output outputs/qa-customers.jsonl

# Create reproducible test datasets
fauxfoundry generate --spec examples/user.yaml --seed 12345 --count 1000

Performance Testing:

# Generate large datasets with streaming
fauxfoundry generate --spec examples/product.yaml --count 1000000 --output outputs/products.jsonl.gz

# Stress test with complex specifications
fauxfoundry generate --spec examples/x12-837-core.yaml --count 10000 --max-retries 5

🀝 Contributing

We welcome contributions from the community! Here's how to get started:

  1. Fork the repository on GitHub
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes with proper tests and documentation
  4. Run tests (go test ./...)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request with a clear description

Development Setup

# Clone your fork
git clone https://github.com/yourusername/faux-foundry
cd faux-foundry

# Install dependencies
go mod download

# Run tests
go test ./...

# Build and test locally
go build -o bin/fauxfoundry ./cmd/fauxfoundry
./bin/fauxfoundry doctor

Code Guidelines

  • Follow Go best practices and gofmt formatting
  • Add tests for new functionality
  • Update documentation for user-facing changes
  • Use conventional commit messages

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Open Source Commitment

FauxFoundry is committed to being a truly open-source project:

  • ✅ No vendor lock-in or proprietary dependencies
  • ✅ Local-first processing (your data never leaves your machine)
  • ✅ Community-driven development and feature requests
  • ✅ Transparent development process

🙏 Acknowledgments & Credits

Created by copyleftdev with ❤️ for the developer community.

Technology Stack

  • Ollama - Local LLM infrastructure and model management
  • Cobra - Powerful CLI framework for Go
  • Bubble Tea - Terminal UI framework
  • Lip Gloss - Terminal styling and layout
  • Go - Systems programming language

Healthcare Standards

  • ANSI X12 - EDI transaction standards for healthcare
  • NCPDP - Pharmacy claims processing standards
  • HL7 FHIR - Healthcare interoperability standards
  • ICD-10 - International disease classification
  • CPT - Current Procedural Terminology codes

Community

Special thanks to the open-source community and all contributors who help make FauxFoundry better!


🚀 Get Started Today

# Quick start - generate your first synthetic dataset
git clone https://github.com/copyleftdev/faux-foundry
cd faux-foundry
go build -o bin/fauxfoundry ./cmd/fauxfoundry
./bin/fauxfoundry init my-data.yaml --template ecommerce
./bin/fauxfoundry generate --spec my-data.yaml --count 100

FauxFoundry - Generate synthetic data with confidence 🎯

Built by developers, for developers. Privacy-first. Open source. Production ready.