A lightweight, MIT-licensed semantic indexing layer that replaces hashtags with structured, human-selected dropdown descriptors.
- Overview
- Why This Exists
- Core Concept
- What This Is (and Is Not)
- Design Principles
- Repository Structure
- Example: Precision Search
- Use Cases
- Philosophy
- Installation
- Getting Started
- API Quick Reference
- Documentation
- Contributing
- License
- Citation
Get up and running in 60 seconds:
# 1. Clone and navigate
git clone https://github.com/dfeen87/Semantic-Dropdown-Search.git
cd Semantic-Dropdown-Search
# 2. Run tests to verify
python -m unittest discover tests
# 3. Try it out
python3 << 'EOF'
import sys
sys.path.insert(0, '.')
from core.descriptor import SemanticDescriptor
from indexer.index_text import TextIndex, IndexedText
from query.query_builder import QueryBuilder
# Create a descriptor
desc = SemanticDescriptor(
    domain="Science → Biology",
    intent="Research → Conceptual",
    tone="Analytical / Cautious",
    audience="Researchers",
    stability="Hypothesis (Not yet validated)"
)
# Index some content
index = TextIndex()
item = IndexedText(
    text="Exploring semantic classification frameworks",
    descriptor=desc
)
index.add(item)
# Query it
query = QueryBuilder().filter_domain("Science").build()
results = index.search(query)
print(f"Found {len(results)} results matching 'Science'")
EOF
That's it! You now have a working semantic search system.
| Aspect | Status |
|---|---|
| Current Version | v1.0.0 (Stable) |
| Schema Stability | v1 schemas are immutable |
| API Stability | Stable, semantic versioning |
| Production Ready | Yes |
| Breaking Changes | |
| Python Support | 3.9, 3.10, 3.11, 3.12 |
| Dependencies | Zero (core functionality) |
| License | MIT |
Instead of tagging text with free-form keywords, content is described using finite, versioned semantic fields: domain, intent, tone, audience, stability, and more. This makes text easier to search, filter, and reason about for both humans and machines.
This project is designed to be embedded, not centralized.
- Precision Search - Find exactly what you need using structured semantic filters
- Zero Dependencies - Core functionality requires only Python 3.9+
- Explainable Results - Every query result comes with clear explanations
- Immutable Schemas - v1 schemas guaranteed stable forever
- No Black Boxes - Fully deterministic, no ML required
- Hierarchical - First-class support for semantic hierarchies
- Embeddable - Integrate into any system, any platform
- Open Source - MIT license, fork-friendly
Hashtags were a workaround. They are:
- Ambiguous and overloaded: #design could mean graphic design, game design, or system design
- Easy to game: spamming popular tags for visibility
- Flat: no hierarchy or relationships
- Hostile to serious search: finding what you actually want is difficult
Modern text discovery needs structure without surveillance and meaning without manipulation.
Semantic Dropdown Search provides:
- Constrained semantic choices
- Transparent intent signaling
- Machine-readable metadata
- Human-legible meaning
No ranking tricks. No hidden models. No behavioral profiling.
Every text object is paired with a semantic descriptor object chosen from dropdown-style schemas.
Instead of free-form hashtags:
#ai #science #health #research #thoughts
the same post carries a structured descriptor:
{
  "domain": "Science → Biology → Systems Biology",
  "intent": "Research → Conceptual → Early-stage",
  "tone": "Analytical / Cautious",
  "audience": "Researchers",
  "stability": "Hypothesis (Not yet validated)"
}
Result: Search becomes precise, explainable, and meaningful.
+---------------------------------------------------------+
|                 Your Application Layer                  |
|      (Social Network, Forum, CMS, Knowledge Base)       |
+----------------------------+----------------------------+
                             |
                             v
              +-----------------------------+
              |  Semantic Dropdown Search   |
              |      (This Framework)       |
              +--------------+--------------+
                             |
     +---------------+-------+-------+---------------+
     v               v               v               v
+---------+     +---------+     +---------+     +---------+
| Schema  |     |  Core   |     | Indexer |     |  Query  |
|   v1    |     | Valid.  |     | Storage |     | Engine  |
+---------+     +---------+     +---------+     +---------+
     |               |               |               |
     +---------------+-------+-------+---------------+
                             |
                             v
                   +-------------------+
                   |    Structured     |
                   |   Semantic Data   |
                   +-------------------+
- Author creates content and selects semantic descriptors from dropdowns
- Validation ensures descriptors match schema (domain, intent, tone, etc.)
- Normalization converts values to canonical form
- Indexing stores text + semantics together
- Querying filters content by semantic criteria
- Results returned with explanations of why they matched
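The six steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's implementation: `VOCAB`, `Item`, `validate`, `normalize`, and `search` are hypothetical stand-ins, and the vocabulary holds only a couple of sample values rather than the contents of schema/v1/.

```python
from dataclasses import dataclass

# Hypothetical miniature vocabulary; the real values live in schema/v1/*.json.
VOCAB = {
    "domain": {"Science → Biology", "Engineering → Software"},
    "stability": {"Hypothesis (Not yet validated)", "Validated"},
}


@dataclass(frozen=True)
class Item:
    text: str
    descriptor: dict


def validate(descriptor):
    """Return a list of error strings; an empty list means the descriptor is valid."""
    errors = []
    for field, value in descriptor.items():
        if field not in VOCAB:
            errors.append(f"unknown field: {field}")
        elif value not in VOCAB[field]:
            errors.append(f"invalid {field}: {value!r}")
    return errors


def normalize(descriptor):
    """Canonical form for this sketch: sorted field order, single spaces."""
    return {f: " ".join(descriptor[f].split()) for f in sorted(descriptor)}


def search(index, **criteria):
    """Return (item, explanation) pairs for items matching every criterion."""
    results = []
    for item in index:
        if all(item.descriptor.get(f) == v for f, v in criteria.items()):
            why = ", ".join(f"{f} = {v!r}" for f, v in criteria.items())
            results.append((item, f"matched because {why}"))
    return results


index = []
raw = {"stability": "Hypothesis (Not yet validated)", "domain": "Science → Biology"}
errors = validate(raw)                      # step 2: validation
if not errors:
    descriptor = normalize(raw)             # step 3: normalization
    index.append(Item("Exploring semantic classification frameworks", descriptor))  # step 4

hits = search(index, domain="Science → Biology")  # steps 5 and 6
```

Each hit pairs the stored item with a plain-language reason, which is the property the last step describes.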
This is:
- A semantic indexing layer
- A structured metadata schema
- A search and filter primitive
- Embeddable infrastructure
This is not:
- A social network
- A recommender algorithm
- A crawler or scraper
- An AI prediction system
The goal is clarity, not virality.
| Feature | Semantic Dropdown Search | Hashtags | Full-Text Search | Vector Embeddings |
|---|---|---|---|---|
| Structure | Structured dropdowns | Unstructured text | Free-form | |
| Validation | Schema-enforced | No validation | No validation | No validation |
| Explainability | Fully explainable | Ambiguous | Black box | |
| Consistency | Guaranteed | User-dependent | | |
| Versioning | Immutable schemas | None | None | |
| Hierarchy | First-class | Flat | Flat | |
| Precision | Exact matches | Low | | |
| ML Required | No | No | No | Yes |
| Stability | Deterministic | Stable | | |
| Setup Complexity | Simple | Simple | Complex | |
Note: Semantic Dropdown Search is designed to complement full-text search and embeddings, not replace them. Use all three together for optimal results.
| Principle | Description |
|---|---|
| Finite vocabularies | All dropdowns are constrained and versioned |
| Human-first semantics | Descriptors are selected intentionally by authors |
| Machine-readable by default | Schemas are JSON-based and stable |
| No training, no tuning | No hidden models or personalization |
| Platform-agnostic | Works anywhere text exists |
| MIT-licensed | Free to embed, fork, and extend |
semantic-dropdown-search/
│
├── README.md                      # Project overview, quick start, philosophy
├── LICENSE                        # MIT license
├── CITATION.cff                   # Academic / research citation metadata
├── CHANGELOG.md                   # Release history and notable changes
├── VERSION                        # Current package version (v1.0.0)
├── .github/                       # GitHub metadata (funding, workflows, templates)
│
├── docs/                          # Conceptual and integration documentation
│   ├── modules/                   # Module-level technical documentation
│   │   ├── core_module.md         # Core semantics, validation, normalization
│   │   ├── indexer_module.md      # Indexing and persistence layer
│   │   ├── query_module.md        # Query engine and predicates
│   │   └── api_module.md          # API surface and contracts
│   ├── philosophy.md              # Design philosophy and guiding principles
│   ├── design_principles.md       # Non-negotiable architectural rules
│   ├── schema_versioning.md       # Schema lifecycle and compatibility rules
│   ├── integration_guide.md       # How to embed in real systems
│   └── faq.md                     # Common questions and guarantees
│
├── schema/                        # Semantic schema definitions
│   ├── v1/                        # Stable schema version v1
│   │   ├── domain.json            # Content domain taxonomy
│   │   ├── intent.json            # Content intent taxonomy
│   │   ├── tone.json              # Tone and communication style
│   │   ├── audience.json          # Intended audience classification
│   │   ├── stability.json         # Maturity / confidence signaling
│   │   └── README.md              # Schema usage and conventions
│   └── registry.json              # Schema version registry and metadata
│
├── core/                          # Semantic foundations
│   ├── __init__.py
│   ├── validate.py                # Schema validation engine
│   ├── normalize.py               # Canonical normalization logic
│   ├── descriptor.py              # SemanticDescriptor data model
│   └── errors.py                  # Core exception hierarchy
│
├── indexer/                       # Text indexing and storage
│   ├── __init__.py
│   ├── index_text.py              # IndexedText + TextIndex implementations
│   ├── serialize.py               # JSON / NDJSON / CSV serialization
│   └── adapters.py                # Storage adapters (file, memory, directory)
│
├── query/                         # Query engine
│   ├── __init__.py
│   ├── query_builder.py           # Fluent query construction API
│   ├── filters.py                 # High-level filter helpers
│   ├── predicates.py              # Predicate primitives and logic
│   └── explain.py                 # Human-readable query explanations
│
├── api/                           # External API definitions
│   ├── openapi.yaml               # OpenAPI specification
│   └── examples/                  # API request/response examples
│       ├── index_request.json
│       └── search_request.json
│
├── examples/                      # End-to-end usage examples
│   ├── posts/                     # Example content descriptors
│   │   ├── research_post.json
│   │   ├── blog_post.json
│   │   └── forum_post.json
│   ├── queries/                   # Example query definitions
│   │   ├── cautious_research.json
│   │   └── early_stage_filter.json
│   └── end_to_end.md              # Full indexing → querying walkthrough
│
├── tests/                         # Test suite
│   ├── tests.md                   # How to run and interpret tests
│   ├── __init__.py
│   ├── test_schema.py             # Schema validation tests
│   ├── test_validation.py         # Descriptor validation tests
│   ├── test_query.py              # Query engine tests
│   ├── run_tests.py               # Test runner
│   └── fixtures/
│       └── sample_descriptors.json
│
└── tools/                         # Maintenance and migration utilities
    ├── schema_linter.py           # Schema validation and consistency checks
    └── migration_helper.py        # Schema migration and compatibility tooling
Note: You can adopt only the schema, or the schema plus helpers, whichever fits your needs.
Show me:
- Conceptual biology posts
- Written cautiously
- Intended for researchers
- Explicitly marked as unvalidated
This is impossible to do reliably with hashtags.
With Semantic Dropdown Search, this query maps directly to structured fields, returning exactly what you're looking for.
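To make the mapping concrete, here is one way the four criteria could translate into field checks. It is a self-contained sketch rather than the library's query engine: `Post`, `under`, and `precision_search` are hypothetical names, and the two sample posts are invented for the example.

```python
from dataclasses import dataclass

SEP = " → "  # hierarchy separator used in the descriptor values


@dataclass(frozen=True)
class Post:
    title: str
    domain: str
    intent: str
    tone: str
    audience: str
    stability: str


def under(path, prefix):
    """True if `path` equals `prefix` or is nested somewhere below it."""
    parts, wanted = path.split(SEP), prefix.split(SEP)
    return parts[:len(wanted)] == wanted


def precision_search(posts):
    """Conceptual biology posts, written cautiously, for researchers, unvalidated."""
    return [
        p for p in posts
        if under(p.domain, "Science → Biology")
        and under(p.intent, "Research → Conceptual")
        and "Cautious" in p.tone.split(" / ")
        and p.audience == "Researchers"
        and p.stability.startswith("Hypothesis")
    ]


posts = [
    Post("Feedback loops in cell signalling", "Science → Biology → Systems Biology",
         "Research → Conceptual → Early-stage", "Analytical / Cautious",
         "Researchers", "Hypothesis (Not yet validated)"),
    Post("Intro to compilers", "Engineering → Software",
         "Documentation → Tutorial", "Casual / Conversational",
         "General Public", "Validated"),
]
matches = precision_search(posts)
```

The hierarchy check compares whole path segments, so "Science → Bio" does not accidentally match "Science → Biology".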
Semantic Dropdown Search is ideal for:
- Research platforms: filtering by validation stage and audience
- Developer forums: distinguishing questions from solutions
- Long-form blogging tools: organizing by tone and intent
- Knowledge bases: structuring internal documentation
- Open-source projects: clarifying contribution types
- Social platforms: enabling non-manipulative discovery
Semantic Dropdown Search treats text as intentional communication, not engagement bait.
It asks authors to state:
- What they are doing: research, documentation, opinion
- Who it is for: experts, learners, general audience
- How stable the content is: hypothesis, validated, canonical
Meaning, not popularity.
- Python 3.9 or higher
- No external dependencies required for core functionality
- Clone the repository:
git clone https://github.com/dfeen87/Semantic-Dropdown-Search.git
cd Semantic-Dropdown-Search
- Verify installation:
python -m unittest discover tests
- Import and use:
# Add the project to your Python path or install locally
import sys
sys.path.append('/path/to/Semantic-Dropdown-Search')
from core.descriptor import SemanticDescriptor
from indexer.index_text import TextIndex
from query.query_builder import QueryBuilder
You can integrate Semantic Dropdown Search into your project in several ways:
- Direct inclusion: Copy the core/, indexer/, query/, and schema/ directories
- Submodule: Add as a git submodule:
git submodule add https://github.com/dfeen87/Semantic-Dropdown-Search.git
- Vendor: Vendor the required modules into your project
Note: This is a library/framework, not a standalone application. It's designed to be embedded into your existing systems.
Start by reviewing the semantic fields in schema/v1/:
ls schema/v1/
# domain.json intent.json tone.json audience.json stability.json
Describe your text using the dropdown options:
from core.descriptor import SemanticDescriptor
descriptor = SemanticDescriptor(
    domain="Science → Biology → Systems Biology",
    intent="Research → Conceptual → Early-stage",
    tone="Analytical / Cautious",
    audience="Researchers",
    stability="Hypothesis (Not yet validated)"
)
Then index the text and query it:
from indexer.index_text import TextIndex, IndexedText
from query.query_builder import QueryBuilder
# Index your content (same TextIndex / IndexedText usage as the quick start)
index = TextIndex()
index.add(IndexedText(text="Your content here", descriptor=descriptor))
# Build precise queries; the parentheses let the chained calls span lines
query = (
    QueryBuilder()
    .filter_domain("Science → Biology")
    .filter_stability("Hypothesis")
    .filter_tone("Cautious")
    .build()
)
See examples/end_to_end.md for complete workflows.
# Create a semantic descriptor
from core.descriptor import SemanticDescriptor
descriptor = SemanticDescriptor(
    domain="Science → Biology → Systems Biology",
    intent="Research → Conceptual → Early-stage",
    tone="Analytical / Cautious",
    audience="Researchers",
    stability="Hypothesis (Not yet validated)"
)
# Validate against schema
from core.validate import validate_descriptor
is_valid, errors = validate_descriptor(descriptor, schema_version="v1")
# Normalize descriptor values
from core.normalize import normalize_descriptor
normalized = normalize_descriptor(descriptor)
from indexer.index_text import TextIndex, IndexedText
# Create an index
index = TextIndex()
# Add content with semantics
indexed_item = IndexedText(
    text="Your content here",
    descriptor=descriptor,
    metadata={"author": "John Doe", "timestamp": "2026-02-14"}
)
index.add(indexed_item)
# Persist to disk
from indexer.serialize import serialize_to_file
serialize_to_file(index, "my_index.json")
from query.query_builder import QueryBuilder
from query.predicates import domain_matches, stability_equals
# Build structured queries
query = (
    QueryBuilder()
    .where(domain_matches("Science → Biology"))
    .where(stability_equals("Hypothesis"))
    .build()
)
# Execute query
results = index.search(query)
# Get explanations
from query.explain import explain_query
explanation = explain_query(query)
print(explanation)  # Human-readable query description
| Field | Description | Example Values |
|---|---|---|
| domain | Content subject area | Science → Biology, Engineering → Software |
| intent | Purpose of content | Research → Conceptual, Documentation → Tutorial |
| tone | Communication style | Analytical / Cautious, Casual / Conversational |
| audience | Target readers | Researchers, General Public, Experts |
| stability | Content maturity | Hypothesis, Validated, Canonical |
See schema/v1/ for complete hierarchies and valid values.
- Philosophy - Design rationale and principles
- Design Principles - Architectural decisions
- Integration Guide - How to embed in your system
- FAQ - Frequently asked questions
- Schema Versioning - Schema lifecycle and compatibility
- Core Module - Validation, normalization, descriptors
- Indexer Module - Storage and persistence
- Query Module - Query building and filtering
- API Module - API contracts and OpenAPI spec
- End-to-End Workflow - Complete usage example
- Example Posts - Sample semantic descriptors
- Example Queries - Sample query patterns
We welcome contributions! Here's how to get involved:
- Bug Reports - Found an issue? Open a bug report
- Feature Requests - Have an idea? Suggest a feature
- Documentation - Improve guides, fix typos, add examples
- Tests - Add test coverage, improve test quality
- Code - Fix bugs, implement features
- Schema Extensions - Propose new semantic fields (with strong justification)
- Respect schema stability - v1 schemas are immutable
- Maintain backward compatibility - Don't break existing APIs
- Prioritize clarity over cleverness - Code should be readable
- Add tests for new functionality - Maintain test coverage
- Update documentation - Keep docs in sync with code
- Follow existing patterns - Match the project's style
# Run all tests
python -m unittest discover tests
# Run specific test module
python tests/test_schema.py
# Run with verbose output
python -m unittest discover tests -v
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests to ensure nothing breaks
- Commit your changes (git commit -m 'Add amazing feature')
- Push to your branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Be respectful and constructive
- Focus on what is best for the project
- Show empathy towards other contributors
Please open an issue or pull request for any contributions.
MIT License: use it, ship it, improve it.
See LICENSE for full details.
Don Michael Feeney Jr.
This project is part of a broader effort to improve epistemic clarity, safety, and trust in technical communication.
If you use this project in your research or product, please cite:
@software{semantic_dropdown_search,
author = {Feeney, Don Michael Jr.},
title = {Semantic Dropdown Search},
year = {2026},
url = {https://github.com/dfeen87/semantic-dropdown-search}
}
See CITATION.cff for more formats.
Q: ModuleNotFoundError when importing
# Solution: Add project to Python path
import sys
sys.path.insert(0, '/path/to/Semantic-Dropdown-Search')
Q: Validation fails with "Invalid schema value"
- Ensure your values exactly match those in schema/v1/*.json
- Check for typos and exact case/spacing
- Use → (not -> or -) for hierarchy separators
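If descriptor values arrive from a form or a script that types ASCII arrows, a small pre-processing step can rewrite them before validation. The helper below is a hypothetical convenience written for this README, not a function the library is known to ship; it assumes " → " (with single spaces) is the canonical separator, as in the examples above.

```python
import re

ARROW = " → "

# Matches "->", "=>", ">" or an existing "→", with any surrounding whitespace.
# A bare "-" is deliberately left alone because it also appears inside
# ordinary values such as "Early-stage".
_SEPARATOR = re.compile(r"\s*(?:->|=>|→|>)\s*")


def canonical_path(value):
    """Rewrite common ASCII arrows to ' → ' and trim the ends of the value."""
    return _SEPARATOR.sub(ARROW, value.strip())
```

Values that use a bare hyphen as a separator still need to be fixed by hand, since the helper cannot tell them apart from hyphenated words.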
Q: Tests failing on import
# Ensure you're in the project root directory
cd /path/to/Semantic-Dropdown-Search
python -m unittest discover tests
Q: How do I add custom fields?
Custom fields are supported. They are stored but not validated:
descriptor = SemanticDescriptor(
    domain="Science",
    custom_field="my_value"  # Stored as-is, not validated against the schema
)
Q: Can I modify schema values?
No. Schema v1 is immutable. However, you can:
- Propose additions in future major versions
- Create custom schemas for your own use
- Use custom fields for project-specific metadata
Q: Performance issues with large indexes?
- Use appropriate storage adapters (see indexer/adapters.py)
- Consider filtering early in your query pipeline
- Index incrementally rather than all at once
For more help, see FAQ or open an issue.
- Documentation
- API Reference
- Examples
- FAQ
- Changelog
- Issues
- Pull Requests
- Start with the FAQ for common questions
- Read the Integration Guide for implementation help
- Open a GitHub Discussion for questions
- Report bugs via GitHub Issues
- Star this project if you find it useful
- Watch for updates and releases
- Fork to create your own variants
- Sponsor to support development
- Stable schema (v1)
- Core validation and normalization
- Indexing and persistence layer
- Query engine with explanations
- OpenAPI specification
- Comprehensive documentation
The following may be explored in future versions (no guarantees):
- Additional schema fields (with community input)
- Performance optimizations for large-scale indexing
- Additional storage adapters (databases, cloud storage)
- Language bindings (JavaScript, Go, Rust)
- Schema migration tooling enhancements
- GraphQL API specification
- Real-time indexing support
Note: Any changes will respect semantic versioning and backward compatibility guarantees.
Semantic Dropdown Search is designed to be lightweight and efficient:
- Low-overhead schemas - Validation uses simple rule-based checks
- Minimal memory footprint - Only stores what you index
- No ML inference costs - Queries are deterministic filters with no model inference step
- Lazy loading - Schemas and indexes load on-demand
- Serialization options - JSON, NDJSON, CSV formats available
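For readers unfamiliar with the NDJSON option, the format is simply one JSON object per line, which lets an index be written and read back item by item. The sketch below uses only the standard `json` module; `dump_ndjson` and `load_ndjson` are illustrative names, and the item layout is an assumption rather than the project's actual on-disk format.

```python
import io
import json


def dump_ndjson(items, stream):
    """Write one JSON object per line; ensure_ascii=False keeps '→' readable."""
    for item in items:
        stream.write(json.dumps(item, ensure_ascii=False, sort_keys=True) + "\n")


def load_ndjson(stream):
    """Yield items lazily, one per non-empty line."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


items = [
    {"text": "Your content here",
     "descriptor": {"domain": "Science → Biology", "audience": "Researchers"}},
    {"text": "Another item",
     "descriptor": {"domain": "Engineering → Software", "audience": "Experts"}},
]
buffer = io.StringIO()  # stands in for a file opened with encoding="utf-8"
dump_ndjson(items, buffer)
buffer.seek(0)
restored = list(load_ndjson(buffer))
```

Because each line is independent, new items can be appended to an existing file without rewriting it.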
Test Environment: Standard developer laptop (Intel i5/Ryzen 5 class CPU, 16GB RAM, Python 3.10, Linux/macOS)
Approximate performance for typical workloads:
- Schema validation: ~10,000 descriptors/second (5-field descriptors, averaged)
- Indexing: ~5,000 items/second (in-memory, with 200-character text fields)
- Query execution: Sub-millisecond for typical predicates (simple domain/stability filters)
- Serialization: ~2,000 items/second to JSON (full descriptor objects)
Note: These are approximate figures from informal testing. Actual performance depends on your hardware, storage adapter, query complexity, descriptor field count, text length, and system resources. For production deployments, benchmark with your actual data patterns.
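A throughput figure like the ones above can be reproduced on your own hardware with the standard `timeit` module. The harness below is a sketch: `validate` is a stand-in so the example runs on its own, and `ALLOWED` and `descriptors_per_second` are names invented here. Swap in the real validation call and a representative descriptor when measuring the library.

```python
import timeit

# Stand-in vocabulary and validator, used only so the harness is runnable.
ALLOWED = {
    "audience": {"Researchers", "Experts"},
    "stability": {"Hypothesis", "Validated"},
}


def validate(descriptor):
    """True if every field in ALLOWED holds one of its permitted values."""
    return all(descriptor.get(field) in values for field, values in ALLOWED.items())


def descriptors_per_second(func, sample, repeats=5, number=10_000):
    """Best-of-N throughput: calls per second for func(sample)."""
    best = min(timeit.repeat(lambda: func(sample), repeat=repeats, number=number))
    return number / best


sample = {"audience": "Researchers", "stability": "Hypothesis"}
rate = descriptors_per_second(validate, sample)
print(f"{rate:,.0f} descriptors/second")
```

Taking the minimum of several repeats reduces noise from other processes, which is the approach the `timeit` documentation recommends.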
For large-scale deployments:
- Use appropriate storage adapters - Database-backed indexes scale better than in-memory
- Index incrementally - Add items as they're created, not in bulk
- Partition by domain - Separate indexes for different content domains
- Cache query results - Common queries benefit from caching
- Combine with full-text search - Use semantic filters to narrow results, then full-text within
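Two of the tips above, partitioning by domain and caching query results, can be combined in a few lines. This is a minimal in-memory sketch under stated assumptions: `_partitions`, `add`, and `cached_search` are hypothetical names, the top-level domain is taken as the text before the first " → ", and the sample items are invented.

```python
from collections import defaultdict
from functools import lru_cache

# Top-level domain -> list of (text, stability) pairs; one bucket per domain.
_partitions = defaultdict(list)


def add(text, domain, stability):
    """Store an item in the partition for its top-level domain."""
    root = domain.split(" → ")[0]
    _partitions[root].append((text, stability))
    cached_search.cache_clear()  # any write invalidates cached results


@lru_cache(maxsize=256)
def cached_search(root, stability):
    """Scan only the partition for `root`; returns a tuple so results are immutable."""
    return tuple(t for t, s in _partitions.get(root, ()) if s == stability)


add("Draft on gene networks", "Science → Biology", "Hypothesis")
add("Build system guide", "Engineering → Software", "Validated")
first = cached_search("Science", "Hypothesis")   # scans the Science partition
second = cached_search("Science", "Hypothesis")  # served from the cache
```

Clearing the whole cache on every write is the simplest correct policy; a write-heavy deployment would likely want finer-grained invalidation.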
# Run with timing
python -m timeit -n 1000 -s "from core.validate import validate_descriptor" "validate_descriptor(...)"
# Profile indexing
python -m cProfile -o profile.stats your_indexing_script.py
This project follows responsible security practices:
- No external dependencies for core functionality reduces attack surface
- Schema validation rejects values outside the allowed vocabulary, which limits injection through semantic fields
- Deterministic behavior - no hidden models or data exfiltration
- Open source - all code is auditable
If you discover a security vulnerability, please report it responsibly:
- Do NOT open a public issue
- Preferred: Use GitHub's private vulnerability reporting
- Alternative: Email security concerns to the maintainer (see CITATION.cff for contact)
- Include:
  - Description of the vulnerability
  - Steps to reproduce
  - Potential impact
  - Suggested fix (if any)
We will respond within 48 hours and work with you to address the issue.
When embedding Semantic Dropdown Search:
- Do validate all user inputs before descriptor creation
- Do sanitize text content before indexing
- Do implement access controls at your application layer
- Don't trust descriptors from untrusted sources without validation
- Don't expose raw file system paths via serialization adapters
- Don't store sensitive data in semantic descriptor fields
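As one way to apply the first and fourth points, an application can check an incoming descriptor payload before handing it to the library. The function below is a hedged sketch: `check_untrusted`, `ALLOWED_FIELDS`, and `MAX_VALUE_LENGTH` are assumptions for this example, and it is stricter than the library itself in that it rejects custom fields from untrusted callers.

```python
ALLOWED_FIELDS = {"domain", "intent", "tone", "audience", "stability"}
MAX_VALUE_LENGTH = 200  # arbitrary cap chosen for this sketch


def check_untrusted(payload, vocabulary):
    """Return a clean descriptor dict, or raise ValueError on the first problem.

    `vocabulary` maps each field name to the set of values the schema allows.
    """
    if not isinstance(payload, dict):
        raise ValueError("descriptor must be a JSON object")
    unknown = set(payload) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    clean = {}
    for field, value in payload.items():
        if not isinstance(value, str) or len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"{field}: expected a short string")
        if value not in vocabulary.get(field, ()):
            raise ValueError(f"{field}: value not in schema")
        clean[field] = value
    return clean
```

Only the returned dictionary should be used to build a descriptor; the original payload is discarded so that nothing unchecked reaches the index.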
Built for clarity. Designed to be embedded.