FauxFoundry lets teams generate unique synthetic datasets from human-readable YAML specifications. It uses locally hosted LLMs (e.g., served by Ollama) to produce realistic, domain-aware data that respects schema constraints, and it streams output so that exactly N unique records are delivered with minimal validation overhead.
Created by copyleftdev - Building tools for developers, by developers.
- YAML-Driven: Simple, human-readable specifications
- LLM-Powered: Uses local models (Ollama) for realistic data generation
- Streaming: Constant memory usage, handles large datasets efficiently
- Rich TUI: Interactive terminal interface for guided workflows
- CLI-First: Automation-friendly command-line interface
- Privacy-First: All processing happens locally, no data leaves your machine
- Real-time Monitoring: Live progress tracking and statistics
- Validation: Built-in specification validation and error handling
- Healthcare Ready: EDI, FHIR, HL7, and medical claims support
- Intelligent Retry: Advanced timeout handling with adaptive strategies
- Deduplication: Ensures 100% unique records with canonical hashing
- Production Scale: Generate millions of records with constant memory usage
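The feature list mentions deduplication through canonical hashing without showing what that involves. The following is a minimal Go sketch of one way to do it, assuming each record is a JSON object; `Deduper`, `NewDeduper`, `Accept`, and `canonicalHash` are hypothetical names for illustration and are not taken from FauxFoundry's source.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Deduper (hypothetical) remembers the hashes of records already accepted.
type Deduper struct {
	seen map[string]struct{}
}

// NewDeduper returns an empty Deduper.
func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[string]struct{})}
}

// canonicalHash hashes a record's canonical JSON form. encoding/json writes
// map keys in sorted order, so key order in the input does not affect the hash.
func canonicalHash(record map[string]any) (string, error) {
	canonical, err := json.Marshal(record)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(canonical)
	return hex.EncodeToString(sum[:]), nil
}

// Accept reports whether the JSON line is a record that has not been seen yet.
func (d *Deduper) Accept(line []byte) (bool, error) {
	var record map[string]any
	if err := json.Unmarshal(line, &record); err != nil {
		return false, err
	}
	hash, err := canonicalHash(record)
	if err != nil {
		return false, err
	}
	if _, duplicate := d.seen[hash]; duplicate {
		return false, nil
	}
	d.seen[hash] = struct{}{}
	return true, nil
}

func main() {
	d := NewDeduper()
	lines := []string{
		`{"email":"a@gmail.com","age":30}`,
		`{"age":30,"email":"a@gmail.com"}`, // same record, different key order
		`{"email":"b@gmail.com","age":30}`,
	}
	for _, line := range lines {
		unique, err := d.Accept([]byte(line))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(unique) // prints true, false, true
	}
}
```

In this sketch the set of hashes grows with the number of unique records, so only the records themselves are streamed; a real implementation may bound that set differently.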
- Go 1.21 or later
- Ollama running locally with a model (e.g., llama3.1:8b)
# Clone the repository
git clone https://github.com/copyleftdev/faux-foundry
cd faux-foundry
# Build the binary
go build -o bin/fauxfoundry ./cmd/fauxfoundry
# Or install directly
go install ./cmd/fauxfoundry
# Check installation
./bin/fauxfoundry --version
- Create a specification:
fauxfoundry init customer.yaml --template ecommerce
- Validate the specification:
fauxfoundry validate customer.yaml
- Generate synthetic data:
fauxfoundry generate --spec customer.yaml --output outputs/data.jsonl
- Launch interactive TUI:
fauxfoundry tui

FauxFoundry uses YAML specifications to define your data generation requirements:
model:
  endpoint: "http://localhost:11434"
  name: "llama3.1:8b"
  batch_size: 32
  temperature: 0.7

dataset:
  count: 1000
  domain: "E-commerce customer data"
  fields:
    - name: "email"
      type: "email"
      required: true
      pattern: "@(gmail|yahoo|outlook)\\.com$"
    - name: "age"
      type: "integer"
      required: true
      range: [18, 80]
    - name: "status"
      type: "enum"
      required: true
      values: ["active", "inactive", "pending"]
    - name: "created_at"
      type: "datetime"
      required: true
      description: "Account creation date"
    - name: "preferences"
      type: "object"
      description: "Customer preferences and settings"

Supported field types:
- string - Text strings
- text - Longer text content
- integer - Whole numbers
- float - Decimal numbers
- boolean - True/false values
- datetime - ISO 8601 timestamps
- date - Date values
- time - Time values
- email - Email addresses
- url - URLs
- uuid - UUID values
- phone - Phone numbers
- enum - Predefined values
- object - Nested objects
- array - Arrays of values

Field constraints:
- required - Field must be present
- pattern - Regex pattern for validation
- range - Min/max values for numbers
- values - Allowed values for enums
- description - Field description for LLM context
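To make the constraint keys concrete, here is a minimal Go sketch of how a generated record could be checked against `required`, `pattern`, `range`, and `values`. It is an illustration under assumptions: the `Field` struct and `Check` function are hypothetical, the range is treated as inclusive, and nothing here is taken from FauxFoundry's own validation code.

```go
package main

import (
	"fmt"
	"regexp"
)

// Field mirrors the constraint keys from the specification.
// The struct is hypothetical, not FauxFoundry's internal type.
type Field struct {
	Name     string
	Required bool
	Pattern  string    // regex applied to string values
	Range    []float64 // [min, max], assumed inclusive
	Values   []string  // allowed values for enums
}

// Check returns one message per constraint that the record violates.
func Check(record map[string]any, fields []Field) []string {
	var problems []string
	for _, f := range fields {
		value, present := record[f.Name]
		if !present {
			if f.Required {
				problems = append(problems, f.Name+": required field is missing")
			}
			continue
		}
		if f.Pattern != "" {
			s, isString := value.(string)
			matched, err := regexp.MatchString(f.Pattern, s)
			if !isString || err != nil || !matched {
				problems = append(problems, f.Name+": does not match pattern")
			}
		}
		if len(f.Range) == 2 {
			// encoding/json decodes every JSON number as float64.
			n, isNumber := value.(float64)
			if !isNumber || n < f.Range[0] || n > f.Range[1] {
				problems = append(problems, f.Name+": outside allowed range")
			}
		}
		if len(f.Values) > 0 {
			s, _ := value.(string)
			allowed := false
			for _, v := range f.Values {
				if v == s {
					allowed = true
				}
			}
			if !allowed {
				problems = append(problems, f.Name+": not an allowed value")
			}
		}
	}
	return problems
}

func main() {
	fields := []Field{
		{Name: "email", Required: true, Pattern: `@(gmail|yahoo|outlook)\.com$`},
		{Name: "age", Required: true, Range: []float64{18, 80}},
		{Name: "status", Required: true, Values: []string{"active", "inactive", "pending"}},
	}
	record := map[string]any{"email": "john.doe@gmail.com", "age": 17.0, "status": "active"}
	for _, problem := range Check(record, fields) {
		fmt.Println(problem) // prints "age: outside allowed range"
	}
}
```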
Generate synthetic data from YAML specifications with advanced options:
# Basic generation
fauxfoundry generate --spec customer.yaml
# Override count and specify output
fauxfoundry generate --spec customer.yaml --count 5000 --output outputs/data.jsonl.gz
# Dry run validation
fauxfoundry generate --spec customer.yaml --dry-run
# Interactive mode
fauxfoundry generate --interactive
# Advanced timeout handling
fauxfoundry generate --spec complex-edi.yaml --max-retries 5 --min-batch-size 1
# Custom timeout and seed
fauxfoundry generate --spec customer.yaml --timeout 30m --seed 12345

Flags:
- -s, --spec string - Path to YAML specification file (required)
- -o, --output string - Output file path (stdout if not specified)
- -n, --count int - Override record count from specification
- -t, --timeout string - Maximum execution time (default "2h")
- --seed int - Random seed for reproducibility
- --dry-run - Validate specification without generating data
- -i, --interactive - Launch interactive TUI mode
- --max-retries int - Maximum retry attempts on timeout (default 3)
- --min-batch-size int - Minimum batch size before giving up (default 1)
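The `--max-retries` and `--min-batch-size` flags suggest that a timed-out request is retried with a smaller batch. The Go sketch below shows one plausible reading of that behavior; the halving strategy, the `RequestBatch` function, and the `ErrTimeout` variable are assumptions made for illustration, since the text above names the flags but not the algorithm.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrTimeout stands in for a timed-out model request in this sketch.
var ErrTimeout = context.DeadlineExceeded

// RequestBatch calls generate with the given batch size. After each timeout it
// halves the size (never below minSize) and tries again, giving up once
// maxRetries retries are used or the size cannot shrink further.
// It returns the batch size that finally succeeded.
func RequestBatch(size, maxRetries, minSize int, generate func(n int) error) (int, error) {
	for attempt := 0; ; attempt++ {
		err := generate(size)
		if err == nil {
			return size, nil
		}
		if !errors.Is(err, ErrTimeout) {
			return 0, err // only timeouts are retried
		}
		if attempt >= maxRetries || size <= minSize {
			return 0, fmt.Errorf("giving up at batch size %d after %d retries: %w", size, attempt, err)
		}
		size /= 2
		if size < minSize {
			size = minSize
		}
	}
}

func main() {
	// Hypothetical model that times out on batches larger than 8 records.
	slowModel := func(n int) error {
		if n > 8 {
			return ErrTimeout
		}
		return nil
	}
	size, err := RequestBatch(32, 3, 1, slowModel)
	fmt.Println(size, err) // prints "8 <nil>"
}
```

With the defaults listed above (3 retries, minimum batch size 1) and the example batch size of 32, this sketch would try sizes 32, 16, 8, and 4 before giving up.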
Validate YAML specifications for syntax and semantic correctness:
# Validate single file
fauxfoundry validate customer.yaml
# Validate multiple files
fauxfoundry validate *.yaml
# Verbose validation with detailed output
fauxfoundry validate customer.yaml --verbose
# Quiet validation (errors only)
fauxfoundry validate customer.yaml --quiet

Flags:
- --dry-run - Same as validate (included for consistency)
- -v, --verbose - Enable detailed validation output
- -q, --quiet - Show only errors
Create new YAML specifications from templates or interactively:
# Interactive creation
fauxfoundry init customer.yaml
# From template
fauxfoundry init --template ecommerce customer.yaml
# Available templates
fauxfoundry init --list-templates
# Force overwrite existing file
fauxfoundry init --force customer.yaml --template medical

Available Templates:
- ecommerce - E-commerce customer data
- user - User profiles and authentication
- product - Product catalog with pricing
- medical - Healthcare and medical records
- financial - Financial transactions and accounts

Flags:
- --template string - Use predefined template
- --list-templates - Show available templates
- --force - Overwrite existing files
Launch the rich Terminal User Interface for guided workflows:
# Launch TUI
fauxfoundry tui
# Launch with specific spec
fauxfoundry tui --spec customer.yaml
# Launch in specific mode
fauxfoundry tui --mode generate

Flags:
- --spec string - Load specific specification file
- --mode string - Start in specific mode (browse, edit, generate, monitor)
Diagnose system health and Ollama connectivity:
# Full system check
fauxfoundry doctor
# Check specific endpoint
fauxfoundry doctor --endpoint http://localhost:11434
# Verbose diagnostics
fauxfoundry doctor --verbose

Flags:
- --endpoint string - Ollama endpoint to check
- --fix - Attempt to fix common issues
- --models - List available models
The TUI provides a rich, interactive experience with:
- Specification Editor: Visual YAML editing with validation
- Generation Monitor: Real-time progress and statistics
- File Browser: Manage specifications and outputs
- Settings Panel: Configure models and preferences
Keyboard shortcuts:
- F1 - Help
- F2 - Specification Browser
- F3 - Generate Data
- F4 - Monitor Generation
- F10 - Quit
- Ctrl+N - New Specification
- Ctrl+S - Save
- Tab/Shift+Tab - Navigate components

Global flags:
- --config - Configuration file path
- --verbose - Enable verbose logging
- --quiet - Suppress non-essential output
- --no-color - Disable colored output
Configure your LLM backend in the specification:
model:
  endpoint: "http://localhost:11434"  # Ollama endpoint
  name: "llama3.1:8b"                 # Model name
  batch_size: 32                      # Records per batch
  temperature: 0.7                    # Creativity (0-2)
  timeout: "30s"                      # Request timeout

FauxFoundry generates data in JSON Lines (JSONL) format:
{"email": "john.doe@gmail.com", "age": 34, "status": "active", "created_at": "2023-05-15T10:30:00Z", "preferences": {"newsletter": true}}
{"email": "jane.smith@yahoo.com", "age": 28, "status": "pending", "created_at": "2023-06-20T14:45:00Z", "preferences": {"newsletter": false}}

Output can be:
- Streamed to stdout
- Saved to files (.jsonl or .jsonl.gz)
- Piped to other tools (jq, databases, etc.)
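Because each output line is a standalone JSON object, downstream tools can process records one at a time without loading the whole file. As a rough illustration, the Go sketch below tallies the `status` field from a JSONL stream; `countByStatus` and `tallyString` are hypothetical helper names, and the sample input reuses fields from the records shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// countByStatus reads JSONL records from r one at a time and tallies the
// "status" field, so memory use does not grow with the number of records.
func countByStatus(r io.Reader) (map[string]int, error) {
	counts := make(map[string]int)
	decoder := json.NewDecoder(r)
	for {
		var record struct {
			Status string `json:"status"`
		}
		err := decoder.Decode(&record)
		if err == io.EOF {
			return counts, nil
		}
		if err != nil {
			return nil, err
		}
		counts[record.Status]++
	}
}

// tallyString is a convenience wrapper for in-memory input.
func tallyString(jsonl string) (map[string]int, error) {
	return countByStatus(strings.NewReader(jsonl))
}

func main() {
	sample := `{"email": "john.doe@gmail.com", "age": 34, "status": "active"}
{"email": "jane.smith@yahoo.com", "age": 28, "status": "pending"}`
	counts, err := tallyString(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(counts["active"], counts["pending"]) // prints "1 1"
}
```

Passing `os.Stdin` to `countByStatus` instead of the sample string would let the same function consume records piped from `fauxfoundry generate`.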
┌─────────────────────────────────────────────────────────────┐
│                    FauxFoundry Interface                    │
├───────────────────┬───────────────────┬─────────────────────┤
│     CLI Layer     │     TUI Layer     │     Shared Core     │
│ ┌───────────────┐ │ ┌───────────────┐ │ ┌─────────────────┐ │
│ │ Cobra CLI     │ │ │ Bubble Tea    │ │ │ Spec Parser     │ │
│ │ Commands      │ │ │ Components    │ │ │ LLM Client      │ │
│ │ Flags         │ │ │ Views         │ │ │ Dedup Logic     │ │
│ │ Validation    │ │ │ Models        │ │ │ Output          │ │
│ └───────────────┘ │ └───────────────┘ │ └─────────────────┘ │
└───────────────────┴───────────────────┴─────────────────────┘
faux-foundry/
├── cmd/fauxfoundry/      # Main application entry point
├── internal/             # Internal packages
│   ├── cli/              # CLI commands and logic
│   ├── tui/              # Terminal UI components
│   ├── llm/              # LLM client and Ollama integration
│   ├── spec/             # YAML specification parsing
│   ├── dedup/            # Record deduplication logic
│   └── output/           # Output writers (JSONL, compression)
├── pkg/types/            # Shared type definitions
├── examples/             # Sample YAML specifications
├── outputs/              # Generated data files (gitignored)
└── docs/                 # Documentation (PRD, design specs)
FauxFoundry includes comprehensive example specifications for various domains:
- customer.yaml - E-commerce customer data with demographics
- product.yaml - Product catalog with pricing and inventory
- user.yaml - User profiles and authentication data

- medical-demo.yaml - Basic medical insurance verification
- medical-insurance.yaml - Comprehensive 46-field insurance data
- edi-270-271.yaml - EDI X12 healthcare eligibility transactions (53 fields)
- rx-claims-edi.yaml - NCPDP D.0 pharmacy claims (75+ fields)
- x12-837-core.yaml - X12 837 Professional Claims (66 fields)

- financial-transactions.yaml - Banking and payment data
- api-logs.yaml - Application logs and metrics
- inventory-management.yaml - Supply chain and logistics
Healthcare Systems:
# Generate 1000 medical insurance records
fauxfoundry generate --spec examples/medical-insurance.yaml --count 1000 --output outputs/insurance-test-data.jsonl
# Create EDI test transactions
fauxfoundry generate --spec examples/edi-270-271.yaml --count 100 --output outputs/edi-test.jsonl.gz

Development & Testing:
# Generate customer test data for QA
fauxfoundry generate --spec examples/customer.yaml --count 50000 --output outputs/qa-customers.jsonl
# Create reproducible test datasets
fauxfoundry generate --spec examples/user.yaml --seed 12345 --count 1000

Performance Testing:
# Generate large datasets with streaming
fauxfoundry generate --spec examples/product.yaml --count 1000000 --output outputs/products.jsonl.gz
# Stress test with complex specifications
fauxfoundry generate --spec examples/x12-837-core.yaml --count 10000 --max-retries 5

We welcome contributions from the community! Here's how to get started:
- Fork the repository on GitHub
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes with proper tests and documentation
- Run tests (go test ./...)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request with a clear description
# Clone your fork
git clone https://github.com/yourusername/faux-foundry
cd faux-foundry
# Install dependencies
go mod download
# Run tests
go test ./...
# Build and test locally
go build -o bin/fauxfoundry ./cmd/fauxfoundry
./bin/fauxfoundry doctor
- Follow Go best practices and gofmt formatting
- Update documentation for user-facing changes
- Use conventional commit messages
This project is licensed under the MIT License - see the LICENSE file for details.
FauxFoundry is committed to being a truly open-source project:
- No vendor lock-in or proprietary dependencies
- Local-first processing (your data never leaves your machine)
- Community-driven development and feature requests
- Transparent development process
Created by copyleftdev with ❤️ for the developer community.
- Ollama - Local LLM infrastructure and model management
- Cobra - Powerful CLI framework for Go
- Bubble Tea - Terminal UI framework
- Lip Gloss - Terminal styling and layout
- Go - Systems programming language
- ANSI X12 - EDI transaction standards for healthcare
- NCPDP - Pharmacy claims processing standards
- HL7 FHIR - Healthcare interoperability standards
- ICD-10 - International disease classification
- CPT - Current Procedural Terminology codes
Special thanks to the open-source community and all contributors who help make FauxFoundry better!
# Quick start - generate your first synthetic dataset
git clone https://github.com/copyleftdev/faux-foundry
cd faux-foundry
go build -o bin/fauxfoundry ./cmd/fauxfoundry
./bin/fauxfoundry init my-data.yaml --template ecommerce
./bin/fauxfoundry generate --spec my-data.yaml --count 100

FauxFoundry - Generate synthetic data with confidence
Built by developers, for developers. Privacy-first. Open source. Production ready.
