Semantic Dropdown Search is a schema-driven, open-source framework for indexing and querying text using structured dropdown semantics instead of hashtags. It enables deterministic, explainable search across social platforms, documentation systems, and content pipelines.

Semantic Dropdown Search

A lightweight, MIT-licensed semantic indexing layer that replaces hashtags with structured, human-selected dropdown descriptors.



Quick Start

Get up and running in 60 seconds:

# 1. Clone and navigate
git clone https://github.com/dfeen87/Semantic-Dropdown-Search.git
cd Semantic-Dropdown-Search

# 2. Run tests to verify
python -m unittest discover tests

# 3. Try it out
python3 << 'EOF'
import sys
sys.path.insert(0, '.')

from core.descriptor import SemanticDescriptor
from indexer.index_text import TextIndex, IndexedText
from query.query_builder import QueryBuilder

# Create a descriptor
desc = SemanticDescriptor(
    domain="Science → Biology",
    intent="Research → Conceptual",
    tone="Analytical / Cautious",
    audience="Researchers",
    stability="Hypothesis (Not yet validated)"
)

# Index some content
index = TextIndex()
item = IndexedText(
    text="Exploring semantic classification frameworks",
    descriptor=desc
)
index.add(item)

# Query it
query = QueryBuilder().filter_domain("Science").build()
results = index.search(query)
print(f"Found {len(results)} results matching 'Science'")
EOF

That's it! You now have a working semantic search system.


Project Status

| Aspect | Status |
| --- | --- |
| Current Version | v1.0.0 (Stable) |
| Schema Stability | ✅ v1 schemas are immutable |
| API Stability | ✅ Stable, semantic versioning |
| Production Ready | ✅ Yes |
| Breaking Changes | ⚠️ Only in major versions |
| Python Support | 3.9, 3.10, 3.11, 3.12 |
| Dependencies | Zero (core functionality) |
| License | MIT |

Overview

Instead of tagging text with free-form keywords, content is described using finite, versioned semantic fields: domain, intent, tone, audience, stability, and more. This makes text easier to search, filter, and reason about for both humans and machines.

This project is designed to be embedded, not centralized.

Key Features

  • 🎯 Precision Search - Find exactly what you need using structured semantic filters
  • 🔒 Zero Dependencies - Core functionality requires only Python 3.9+
  • 📊 Explainable Results - Every query result comes with clear explanations
  • 🔄 Immutable Schemas - v1 schemas guaranteed stable forever
  • 🚫 No Black Boxes - Fully deterministic, no ML required
  • 🏗️ Hierarchical - First-class support for semantic hierarchies
  • 🔌 Embeddable - Integrate into any system, any platform
  • 📝 Open Source - MIT license, fork-friendly

Why This Exists

The Problem with Hashtags

Hashtags were a workaround. They are:

  • Ambiguous and overloaded: #design could mean graphic design, game design, or system design
  • Easy to game: spamming popular tags for visibility
  • Flat: no hierarchy or relationships
  • Hostile to serious search: finding what you actually want is difficult

The Solution

Modern text discovery needs structure without surveillance and meaning without manipulation.

Semantic Dropdown Search provides:

  • ✓ Constrained semantic choices
  • ✓ Transparent intent signaling
  • ✓ Machine-readable metadata
  • ✓ Human-legible meaning

No ranking tricks. No hidden models. No behavioral profiling.


Core Concept

Every text object is paired with a semantic descriptor object chosen from dropdown-style schemas.

Instead of This:

#ai #science #health #research #thoughts

You Get This:

{
  "domain": "Science → Biology → Systems Biology",
  "intent": "Research → Conceptual → Early-stage",
  "tone": "Analytical / Cautious",
  "audience": "Researchers",
  "stability": "Hypothesis (Not yet validated)"
}

Result: Search becomes precise, explainable, and meaningful.
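
To make the hierarchy concrete, here is a minimal sketch of how a path such as Science → Biology → Systems Biology could be matched by prefix. The helpers `split_path` and `path_matches` are hypothetical names for illustration and are not part of this project's API; the sketch only assumes that levels are separated by →.

```python
SEPARATOR = "→"  # hierarchy separator used in the descriptor values above

def split_path(value: str) -> list:
    """Split 'A → B → C' into ['A', 'B', 'C'], trimming whitespace."""
    return [part.strip() for part in value.split(SEPARATOR) if part.strip()]

def path_matches(value: str, prefix: str) -> bool:
    """Return True when `prefix` names the value itself or one of its ancestors."""
    value_parts = split_path(value)
    prefix_parts = split_path(prefix)
    return value_parts[:len(prefix_parts)] == prefix_parts

domain = "Science → Biology → Systems Biology"
print(path_matches(domain, "Science"))             # ancestor
print(path_matches(domain, "Science → Biology"))   # closer ancestor
print(path_matches(domain, "Science → Physics"))   # different branch
```

Comparing whole path segments rather than raw substrings keeps a value like "Science" from accidentally matching an unrelated branch that merely contains the same letters.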


Architecture Overview

┌──────────────────────────────────────────────────────────────────┐
│                    Your Application Layer                        │
│          (Social Network, Forum, CMS, Knowledge Base)            │
└────────────────────────┬─────────────────────────────────────────┘
                         │
                         ▼
         ┌───────────────────────────────┐
         │   Semantic Dropdown Search    │
         │     (This Framework)          │
         └───────┬───────────────────────┘
                 │
        ┌────────┼────────┬───────────────┐
        ▼        ▼        ▼               ▼
   ┌────────┐┌──────┐┌────────┐   ┌──────────┐
   │ Schema ││ Core ││Indexer │   │  Query   │
   │  v1    ││Valid.││Storage │   │ Engine   │
   └────────┘└──────┘└────────┘   └──────────┘
        │         │        │              │
        └─────────┴────────┴──────────────┘
                         │
                         ▼
              ┌──────────────────┐
              │  Structured      │
              │  Semantic Data   │
              └──────────────────┘

Data Flow

  1. Author creates content and selects semantic descriptors from dropdowns
  2. Validation ensures descriptors match schema (domain, intent, tone, etc.)
  3. Normalization converts values to canonical form
  4. Indexing stores text + semantics together
  5. Querying filters content by semantic criteria
  6. Results returned with explanations of why they matched

What This Is (and Is Not)

✓ This Is:

  • A semantic indexing layer
  • A structured metadata schema
  • A search and filter primitive
  • Embeddable infrastructure

✗ This Is Not:

  • A social network
  • A recommender algorithm
  • A crawler or scraper
  • An AI prediction system

The goal is clarity, not virality.


Semantic Dropdown Search vs. Alternatives

| Feature | Semantic Dropdown Search | Hashtags | Full-Text Search | Vector Embeddings |
| --- | --- | --- | --- | --- |
| Structure | ✅ Structured dropdowns | ❌ Unstructured text | ❌ Free-form | ⚠️ Learned vectors |
| Validation | ✅ Schema-enforced | ❌ No validation | ❌ No validation | ❌ No validation |
| Explainability | ✅ Fully explainable | ❌ Ambiguous | ⚠️ Keyword matching | ❌ Black box |
| Consistency | ✅ Guaranteed | ❌ User-dependent | ⚠️ Limited | ⚠️ Model-dependent |
| Versioning | ✅ Immutable schemas | ❌ None | ❌ None | ⚠️ Model versions |
| Hierarchy | ✅ First-class | ❌ Flat | ❌ Flat | ⚠️ Implicit |
| Precision | ✅ Exact matches | ❌ Low | ⚠️ Moderate | ⚠️ Approximate |
| ML Required | ✅ No | ✅ No | ✅ No | ❌ Yes |
| Stability | ✅ Deterministic | ⚠️ Changes over time | ✅ Stable | ⚠️ Model drift |
| Setup Complexity | ⚠️ Schema design | ✅ Simple | ✅ Simple | ❌ Complex |

Note: Semantic Dropdown Search is designed to complement full-text search and embeddings, not replace them. Use all three together for optimal results.


Design Principles

| Principle | Description |
| --- | --- |
| Finite vocabularies | All dropdowns are constrained and versioned |
| Human-first semantics | Descriptors are selected intentionally by authors |
| Machine-readable by default | Schemas are JSON-based and stable |
| No training, no tuning | No hidden models or personalization |
| Platform-agnostic | Works anywhere text exists |
| MIT-licensed | Free to embed, fork, and extend |
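
One way to picture a finite, versioned vocabulary is an enumeration whose name carries the schema version. The sketch below is an analogy only: `StabilityV1` and `parse_stability` are hypothetical names, the three values are taken from the example values listed in this README, and the project itself defines its vocabularies as JSON files rather than Python enums.

```python
from enum import Enum

class StabilityV1(Enum):
    """Illustrative closed set of stability values for a v1 vocabulary."""
    HYPOTHESIS = "Hypothesis"
    VALIDATED = "Validated"
    CANONICAL = "Canonical"

def parse_stability(raw: str) -> StabilityV1:
    """Accept only values from the closed set; anything else is rejected."""
    try:
        return StabilityV1(raw)
    except ValueError:
        allowed = [member.value for member in StabilityV1]
        raise ValueError(f"{raw!r} is not a v1 stability value; choose one of {allowed}") from None

print(parse_stability("Validated"))
```

Because the set is closed, a free-form value such as "Maybe" fails loudly at the point of entry instead of silently becoming a new tag.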

Repository Structure

semantic-dropdown-search/
│
├── 📄 README.md              # Project overview, quick start, philosophy
├── 📄 LICENSE                # MIT license
├── 📄 CITATION.cff           # Academic / research citation metadata
├── 📄 CHANGELOG.md           # Release history and notable changes
├── 📄 VERSION                # Current package version (v1.0.0)
├── 📁 .github/               # GitHub metadata (funding, workflows, templates)
│
├── 📁 docs/                  # Conceptual and integration documentation
│   ├── 📁 modules/           # Module-level technical documentation
│   │   ├── core_module.md    # Core semantics, validation, normalization
│   │   ├── indexer_module.md # Indexing and persistence layer
│   │   ├── query_module.md   # Query engine and predicates
│   │   └── api_module.md     # API surface and contracts
│   ├── philosophy.md         # Design philosophy and guiding principles
│   ├── design_principles.md  # Non-negotiable architectural rules
│   ├── schema_versioning.md  # Schema lifecycle and compatibility rules
│   ├── integration_guide.md  # How to embed in real systems
│   └── faq.md                # Common questions and guarantees
│
├── 📁 schema/                # Semantic schema definitions
│   ├── v1/                   # Stable schema version v1
│   │   ├── domain.json       # Content domain taxonomy
│   │   ├── intent.json       # Content intent taxonomy
│   │   ├── tone.json         # Tone and communication style
│   │   ├── audience.json     # Intended audience classification
│   │   ├── stability.json    # Maturity / confidence signaling
│   │   └── README.md         # Schema usage and conventions
│   └── registry.json         # Schema version registry and metadata
│
├── 📁 core/                  # Semantic foundations
│   ├── __init__.py
│   ├── validate.py           # Schema validation engine
│   ├── normalize.py          # Canonical normalization logic
│   ├── descriptor.py         # SemanticDescriptor data model
│   └── errors.py             # Core exception hierarchy
│
├── 📁 indexer/               # Text indexing and storage
│   ├── __init__.py
│   ├── index_text.py         # IndexedText + TextIndex implementations
│   ├── serialize.py          # JSON / NDJSON / CSV serialization
│   └── adapters.py           # Storage adapters (file, memory, directory)
│
├── 📁 query/                 # Query engine
│   ├── __init__.py
│   ├── query_builder.py      # Fluent query construction API
│   ├── filters.py            # High-level filter helpers
│   ├── predicates.py         # Predicate primitives and logic
│   └── explain.py            # Human-readable query explanations
│
├── 📁 api/                   # External API definitions
│   ├── openapi.yaml          # OpenAPI specification
│   └── 📁 examples/          # API request/response examples
│       ├── index_request.json
│       └── search_request.json
│
├── 📁 examples/              # End-to-end usage examples
│   ├── 📁 posts/             # Example content descriptors
│   │   ├── research_post.json
│   │   ├── blog_post.json
│   │   └── forum_post.json
│   ├── 📁 queries/           # Example query definitions
│   │   ├── cautious_research.json
│   │   └── early_stage_filter.json
│   └── end_to_end.md         # Full indexing → querying walkthrough
│
├── 📁 tests/                 # Test suite
│   ├── tests.md              # How to run and interpret tests
│   ├── __init__.py
│   ├── test_schema.py        # Schema validation tests
│   ├── test_validation.py    # Descriptor validation tests
│   ├── test_query.py         # Query engine tests
│   ├── run_tests.py          # Test runner
│   └── 📁 fixtures/
│       └── sample_descriptors.json
│
└── 📁 tools/                 # Maintenance and migration utilities
    ├── schema_linter.py      # Schema validation and consistency checks
    └── migration_helper.py   # Schema migration and compatibility tooling

Note: You can adopt only the schema, or the schema plus helpers, whichever fits your needs.


Example: Precision Search

Query:

Show me:

  • Conceptual biology posts
  • Written cautiously
  • Intended for researchers
  • Explicitly marked as unvalidated

Why This Matters:

This is impossible to do reliably with hashtags.

With Semantic Dropdown Search, this query maps directly to structured fields, returning exactly what you're looking for.
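
As a rough illustration of that mapping, the four bullets can be written as one test per descriptor field (five tests, since "conceptual biology" touches both domain and intent). The sketch below uses plain dictionaries and lambdas rather than the project's query API; the second sample post and the matching rules are assumptions made for the example.

```python
posts = [
    {
        "text": "A tentative model of cell signalling",
        "domain": "Science → Biology → Systems Biology",
        "intent": "Research → Conceptual → Early-stage",
        "tone": "Analytical / Cautious",
        "audience": "Researchers",
        "stability": "Hypothesis (Not yet validated)",
    },
    {
        "text": "Biology basics for everyone",
        "domain": "Science → Biology",
        "intent": "Documentation → Tutorial",
        "tone": "Casual / Conversational",
        "audience": "General Public",
        "stability": "Validated",
    },
]

# Each part of the request becomes a (field, test) pair; all must hold.
query = [
    ("domain", lambda v: v.startswith("Science → Biology")),     # biology posts
    ("intent", lambda v: v.startswith("Research → Conceptual")), # conceptual
    ("tone", lambda v: "Cautious" in v),                         # written cautiously
    ("audience", lambda v: v == "Researchers"),                  # for researchers
    ("stability", lambda v: v.startswith("Hypothesis")),         # unvalidated
]

matches = [p for p in posts if all(test(p[field]) for field, test in query)]
for post in matches:
    print(post["text"])
```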


Use Cases

Semantic Dropdown Search is ideal for:

  • 🔬 Research platforms: filtering by validation stage and audience
  • 💻 Developer forums: distinguishing questions from solutions
  • ✍️ Long-form blogging tools: organizing by tone and intent
  • 📚 Knowledge bases: structuring internal documentation
  • 🌐 Open-source projects: clarifying contribution types
  • 🗣️ Social platforms: enabling non-manipulative discovery

Philosophy

Semantic Dropdown Search treats text as intentional communication, not engagement bait.

Authors Are Encouraged to State:

  • What they are doing: research, documentation, opinion
  • Who it is for: experts, learners, general audience
  • How stable the content is: hypothesis, validated, canonical

Readers Are Empowered to Search Based On:

Meaning, not popularity.


Installation

Prerequisites

  • Python 3.9 or higher
  • No external dependencies required for core functionality

Basic Setup

  1. Clone the repository:
git clone https://github.com/dfeen87/Semantic-Dropdown-Search.git
cd Semantic-Dropdown-Search
  2. Verify installation:
python -m unittest discover tests
  3. Import and use:
# Add the project to your Python path or install locally
import sys
sys.path.append('/path/to/Semantic-Dropdown-Search')

from core.descriptor import SemanticDescriptor
from indexer.index_text import TextIndex
from query.query_builder import QueryBuilder

Integration Options

You can integrate Semantic Dropdown Search into your project in several ways:

  • Direct inclusion: Copy the core/, indexer/, query/, and schema/ directories
  • Submodule: Add as a git submodule: git submodule add https://github.com/dfeen87/Semantic-Dropdown-Search.git
  • Vendor: Vendor the required modules into your project

Note: This is a library/framework, not a standalone application. It's designed to be embedded into your existing systems.


Getting Started

1. Explore the Schema

Start by reviewing the semantic fields in schema/v1/:

ls schema/v1/
# domain.json  intent.json  tone.json  audience.json  stability.json

2. Tag Your Content

Describe your text using the dropdown options:

from core.descriptor import SemanticDescriptor

descriptor = SemanticDescriptor(
    domain="Science → Biology → Systems Biology",
    intent="Research → Conceptual → Early-stage",
    tone="Analytical / Cautious",
    audience="Researchers",
    stability="Hypothesis (Not yet validated)"
)

3. Index and Search

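from indexer.index_text import TextIndex, IndexedText
from query.query_builder import QueryBuilder

# Index your content (same TextIndex / IndexedText API as in the Quick Start)
index = TextIndex()
index.add(IndexedText(text="Your content here", descriptor=descriptor))

# Build precise queries; the parentheses let the method chain span several lines
query = (
    QueryBuilder()
    .filter_domain("Science → Biology")
    .filter_stability("Hypothesis")
    .filter_tone("Cautious")
    .build()
)

# Run the query against the index
results = index.search(query)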

See examples/end_to_end.md for complete workflows.


API Quick Reference

Core Components

# Create a semantic descriptor
from core.descriptor import SemanticDescriptor

descriptor = SemanticDescriptor(
    domain="Science → Biology → Systems Biology",
    intent="Research → Conceptual → Early-stage",
    tone="Analytical / Cautious",
    audience="Researchers",
    stability="Hypothesis (Not yet validated)"
)

# Validate against schema
from core.validate import validate_descriptor
is_valid, errors = validate_descriptor(descriptor, schema_version="v1")

# Normalize descriptor values
from core.normalize import normalize_descriptor
normalized = normalize_descriptor(descriptor)

Indexing

from indexer.index_text import TextIndex, IndexedText

# Create an index
index = TextIndex()

# Add content with semantics
indexed_item = IndexedText(
    text="Your content here",
    descriptor=descriptor,
    metadata={"author": "John Doe", "timestamp": "2026-02-14"}
)
index.add(indexed_item)

# Persist to disk
from indexer.serialize import serialize_to_file
serialize_to_file(index, "my_index.json")
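
For readers who want to see what line-delimited persistence looks like, here is a small NDJSON round trip using only the standard library. It is a sketch of the format, not of `indexer/serialize.py`: the functions `dump_ndjson` and `load_ndjson` and the record layout are assumptions made for illustration.

```python
import io
import json

items = [
    {
        "text": "Your content here",
        "descriptor": {"domain": "Science → Biology", "stability": "Hypothesis"},
        "metadata": {"author": "John Doe", "timestamp": "2026-02-14"},
    },
    {
        "text": "Another entry",
        "descriptor": {"domain": "Engineering → Software", "stability": "Validated"},
        "metadata": {},
    },
]

def dump_ndjson(records, stream):
    """Write one JSON object per line; ensure_ascii=False keeps '→' readable."""
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False, sort_keys=True) + "\n")

def load_ndjson(stream):
    """Read records back, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]

buffer = io.StringIO()  # stands in for a file opened with encoding="utf-8"
dump_ndjson(items, buffer)
buffer.seek(0)
restored = load_ndjson(buffer)
print(restored == items)
```

One record per line means new items can be appended without rewriting the whole file, which suits incremental indexing.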

Querying

from query.query_builder import QueryBuilder
from query.predicates import domain_matches, stability_equals

# Build structured queries
query = (QueryBuilder()
    .where(domain_matches("Science → Biology"))
    .where(stability_equals("Hypothesis"))
    .build())

# Execute query
results = index.search(query)

# Get explanations
from query.explain import explain_query
explanation = explain_query(query)
print(explanation)  # Human-readable query description
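
The idea behind a human-readable explanation can be shown without the library. In the sketch below a query is just a list of (field, operator, value) triples, and `explain` turns it into a sentence. The triple layout, the operator names, and the wording are hypothetical; the output of the project's `explain_query` may look different.

```python
# Hypothetical operator names mapped to plain-language phrases.
PHRASES = {
    "equals": "is exactly",
    "under": "is at or below",
}

def explain(conditions):
    """Describe a conjunction of field conditions in plain language."""
    parts = [f"{field} {PHRASES[op]} {value!r}" for field, op, value in conditions]
    return "Match items where " + " AND ".join(parts)

conditions = [
    ("domain", "under", "Science → Biology"),
    ("stability", "equals", "Hypothesis"),
]
print(explain(conditions))
```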

Available Schema Fields (v1)

| Field | Description | Example Values |
| --- | --- | --- |
| domain | Content subject area | Science → Biology, Engineering → Software |
| intent | Purpose of content | Research → Conceptual, Documentation → Tutorial |
| tone | Communication style | Analytical / Cautious, Casual / Conversational |
| audience | Target readers | Researchers, General Public, Experts |
| stability | Content maturity | Hypothesis, Validated, Canonical |

See schema/v1/ for complete hierarchies and valid values.


Documentation

Core Documentation

Module Documentation

Examples


Contributing

We welcome contributions! Here's how to get involved:

Ways to Contribute

  • 🐛 Bug Reports - Found an issue? Open a bug report
  • 💡 Feature Requests - Have an idea? Suggest a feature
  • 📖 Documentation - Improve guides, fix typos, add examples
  • 🧪 Tests - Add test coverage, improve test quality
  • 🔧 Code - Fix bugs, implement features
  • 🌐 Schema Extensions - Propose new semantic fields (with strong justification)

Development Guidelines

  1. Respect schema stability - v1 schemas are immutable
  2. Maintain backward compatibility - Don't break existing APIs
  3. Prioritize clarity over cleverness - Code should be readable
  4. Add tests for new functionality - Maintain test coverage
  5. Update documentation - Keep docs in sync with code
  6. Follow existing patterns - Match the project's style

Running Tests

# Run all tests
python -m unittest discover tests

# Run a specific test module (from the project root, so package imports resolve)
python -m unittest tests.test_schema

# Run with verbose output
python -m unittest discover tests -v

Contribution Process

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests to ensure nothing breaks
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to your branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Code of Conduct

  • Be respectful and constructive
  • Focus on what is best for the project
  • Show empathy towards other contributors

Please open an issue or pull request for any contributions.


License

MIT License: use it, ship it, improve it.

See LICENSE for full details.


Author

Don Michael Feeney Jr.

This project is part of a broader effort to improve epistemic clarity, safety, and trust in technical communication.


Citation

If you use this project in your research or product, please cite:

@software{semantic_dropdown_search,
  author = {Feeney, Don Michael Jr.},
  title = {Semantic Dropdown Search},
  year = {2026},
  url = {https://github.com/dfeen87/semantic-dropdown-search}
}

See CITATION.cff for more formats.


Troubleshooting

Common Issues

Q: ModuleNotFoundError when importing

# Solution: Add project to Python path
import sys
sys.path.insert(0, '/path/to/Semantic-Dropdown-Search')

Q: Validation fails with "Invalid schema value"

  • Ensure your values exactly match those in schema/v1/*.json
  • Check for typos and exact case/spacing
  • Use → (not -> or -) for hierarchy separators
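
If values arrive from a form or import script that emits ASCII arrows, they can be rewritten before validation. The helper below is a hypothetical pre-processing step, not something this project ships. It deliberately leaves single hyphens alone, because values such as "Early-stage" contain legitimate hyphens.

```python
import re

# Matches an ASCII arrow together with any whitespace around it.
ASCII_ARROW = re.compile(r"\s*->\s*")

def to_canonical_separator(value: str) -> str:
    """Replace ASCII '->' arrows with ' → ' and trim the ends."""
    return ASCII_ARROW.sub(" → ", value.strip())

print(to_canonical_separator("Science->Biology -> Systems Biology"))
print(to_canonical_separator("Research → Conceptual → Early-stage"))
```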

Q: Tests failing on import

# Ensure you're in the project root directory
cd /path/to/Semantic-Dropdown-Search
python -m unittest discover tests

Q: How do I add custom fields?

Custom fields are supported! They're stored but not validated:

descriptor = SemanticDescriptor(
    domain="Science",
    custom_field="my_value"  # This works!
)

Q: Can I modify schema values?

No. Schema v1 is immutable. However, you can:

  • Propose additions in future major versions
  • Create custom schemas for your own use
  • Use custom fields for project-specific metadata

Q: Performance issues with large indexes?

  • Use appropriate storage adapters (see indexer/adapters.py)
  • Consider filtering early in your query pipeline
  • Index incrementally rather than all at once

For more help, see FAQ or open an issue.


Links


Support

Getting Help

Community

  • ⭐ Star this project if you find it useful
  • 👁️ Watch for updates and releases
  • 🍴 Fork to create your own variants
  • 💖 Sponsor to support development

Roadmap

Current Status (v1.0.0)

✅ Stable schema (v1)
✅ Core validation and normalization
✅ Indexing and persistence layer
✅ Query engine with explanations
✅ OpenAPI specification
✅ Comprehensive documentation

Future Considerations

The following may be explored in future versions (no guarantees):

  • Additional schema fields (with community input)
  • Performance optimizations for large-scale indexing
  • Additional storage adapters (databases, cloud storage)
  • Language bindings (JavaScript, Go, Rust)
  • Schema migration tooling enhancements
  • GraphQL API specification
  • Real-time indexing support

Note: Any changes will respect semantic versioning and backward compatibility guarantees.


Performance

Design for Scale

Semantic Dropdown Search is designed to be lightweight and efficient:

  • Low-overhead schemas - Validation uses simple rule-based checks against fixed vocabularies
  • Minimal memory footprint - Only stores what you index
  • No ML inference costs - Queries are deterministic filters, with no model in the query path
  • Lazy loading - Schemas and indexes load on-demand
  • Serialization options - JSON, NDJSON, CSV formats available

Typical Performance

Test Environment: Standard developer laptop (Intel i5/Ryzen 5 class CPU, 16GB RAM, Python 3.10, Linux/macOS)

Approximate performance for typical workloads:

  • Schema validation: ~10,000 descriptors/second (5-field descriptors, averaged)
  • Indexing: ~5,000 items/second (in-memory, with 200-character text fields)
  • Query execution: Sub-millisecond for typical predicates (simple domain/stability filters)
  • Serialization: ~2,000 items/second to JSON (full descriptor objects)

Note: These are approximate figures from informal testing. Actual performance depends on your hardware, storage adapter, query complexity, descriptor field count, text length, and system resources. For production deployments, benchmark with your actual data patterns.

Scaling Strategies

For large-scale deployments:

  1. Use appropriate storage adapters - Database-backed indexes scale better than in-memory
  2. Index incrementally - Add items as they're created, not in bulk
  3. Partition by domain - Separate indexes for different content domains
  4. Cache query results - Common queries benefit from caching
  5. Combine with full-text search - Use semantic filters to narrow results, then full-text within
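
Strategies 3 and 5 can be combined in a few lines. The sketch below partitions items by top-level domain, narrows by a semantic field, and only then runs a keyword match on the survivors. It is a toy in-memory illustration with hypothetical function names, not one of the project's storage adapters.

```python
from collections import defaultdict

# One list of (text, descriptor) pairs per top-level domain.
partitions = defaultdict(list)

def top_level(domain: str) -> str:
    """'Science → Biology' -> 'Science'."""
    return domain.split("→")[0].strip()

def add(text, descriptor):
    partitions[top_level(descriptor["domain"])].append((text, descriptor))

def search(domain_root, stability, keyword):
    """Pick one partition, filter semantically, then match the keyword in the text."""
    candidates = partitions.get(domain_root, [])  # .get avoids creating empty partitions
    narrowed = [text for text, d in candidates if d["stability"] == stability]
    return [text for text in narrowed if keyword.lower() in text.lower()]

add("Notes on metabolic network models", {"domain": "Science → Biology", "stability": "Hypothesis"})
add("Metabolic pathways, a validated overview", {"domain": "Science → Biology", "stability": "Validated"})
add("Build systems for large repositories", {"domain": "Engineering → Software", "stability": "Validated"})

print(search("Science", "Hypothesis", "metabolic"))
```

The keyword scan, usually the most expensive step, only ever sees items that already passed the cheap semantic filters.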

Benchmarking Your Implementation

# Run with timing
python -m timeit -n 1000 -s "from core.validate import validate_descriptor" "validate_descriptor(...)"

# Profile indexing
python -m cProfile -o profile.stats your_indexing_script.py
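
The same measurement can be taken from inside Python with the `timeit` module. The function being timed here, `validate_stability`, is a stand-in written for this example; swap in a call to your own validation or indexing code to get figures for your data.

```python
import timeit

ALLOWED_STABILITY = frozenset({"Hypothesis", "Validated", "Canonical"})

def validate_stability(value: str) -> bool:
    """Stand-in for a real validation call: a simple set-membership check."""
    return value in ALLOWED_STABILITY

runs = 10_000
seconds = timeit.timeit(lambda: validate_stability("Hypothesis"), number=runs)
print(f"{runs / seconds:,.0f} validations/second")
```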

Security

Security Policy

This project follows responsible security practices:

  • No external dependencies for core functionality reduces attack surface
  • Schema validation restricts semantic fields to known values, which limits injection through those fields
  • Deterministic behavior - no hidden models or data exfiltration
  • Open source - all code is auditable

Reporting Security Issues

If you discover a security vulnerability, please report it responsibly:

  1. Do NOT open a public issue
  2. Preferred: Use GitHub's private vulnerability reporting
  3. Alternative: Email security concerns to the maintainer (see CITATION.cff for contact)
  4. Include:
    • Description of the vulnerability
    • Steps to reproduce
    • Potential impact
    • Suggested fix (if any)

We will respond within 48 hours and work with you to address the issue.

Security Considerations for Implementers

When embedding Semantic Dropdown Search:

  • ✅ Do validate all user inputs before descriptor creation
  • ✅ Do sanitize text content before indexing
  • ✅ Do implement access controls at your application layer
  • ❌ Don't trust descriptors from untrusted sources without validation
  • ❌ Don't expose raw file system paths via serialization adapters
  • ❌ Don't store sensitive data in semantic descriptor fields
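
As one possible shape for the first "Do", the sketch below screens an untrusted payload before any descriptor is built from it. Everything here is an assumption for illustration: the function name, the 200-character limit, and the decision to reject unknown fields (a stricter policy than the library's own handling of custom fields).

```python
EXPECTED_FIELDS = {"domain", "intent", "tone", "audience", "stability"}
MAX_VALUE_LENGTH = 200  # arbitrary illustrative limit

def screen_untrusted(payload):
    """Return a list of problems; an empty list means the payload may go on to validation."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    problems = [f"unexpected field: {key}" for key in sorted(set(payload) - EXPECTED_FIELDS)]
    for key in sorted(EXPECTED_FIELDS & set(payload)):
        value = payload[key]
        if not isinstance(value, str):
            problems.append(f"{key}: expected a string")
        elif len(value) > MAX_VALUE_LENGTH or not value.isprintable():
            problems.append(f"{key}: value too long or contains control characters")
    return problems

print(screen_untrusted({"domain": "Science → Biology", "audience": "Researchers"}))
print(screen_untrusted({"domain": 5, "callback": "x"}))
```

A payload that passes this screen still needs schema validation; the screen only rejects shapes that should never reach it.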

Built for clarity. Designed to be embedded.
