Build scalable ensemble systems for transformer-based models.
torchstack is a library designed to simplify the creation and deployment of scalable ensemble learning systems for Hugging Face transformers. It provides tools to address challenges like tokenizer mismatch, voting strategies, and model integration, making ensemble learning accessible and efficient for natural language processing tasks.
- High-Level API: A Keras-inspired interface that simplifies ensemble learning for transformers.
- Tokenizer Compatibility: Support for union vocabularies, projections (e.g., DEEPEN), and other solutions to handle tokenizer mismatches.
- Flexible Voting Strategies: Includes average voting, majority voting, and extensible custom strategies.
- Integration with Hugging Face: Seamlessly works with Hugging Face models and tokenizers.
- Production-Ready: Tools for building, testing, and deploying your ensemble systems with ease.
- Packaging: uv
- Linting/Formatting: ruff
- Testing: pytest
- Code Coverage: coverage.py
- Static Code Analysis: CodeClimate
- Transformers: Core library for transformer-based models.
- Torch: Deep learning framework for model integration and training.
- Loguru: Advanced logging with rotation, retention, and compression.
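To make the average and majority voting strategies from the feature list concrete, here is a minimal sketch in plain Python. It is not torchstack's API: the function names `average_vote` and `majority_vote` and the sample probabilities are assumptions chosen for illustration. Each ensemble member is assumed to return one probability per class for a single input.

```python
from collections import Counter
from typing import Sequence


def average_vote(member_probs: Sequence[Sequence[float]]) -> int:
    """Soft voting: average each class probability across members, then take the argmax."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean_probs = [
        sum(probs[c] for probs in member_probs) / n_members
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=mean_probs.__getitem__)


def majority_vote(member_probs: Sequence[Sequence[float]]) -> int:
    """Hard voting: each member votes for its own argmax; the most common label wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in member_probs]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical outputs of three ensemble members for one input (two classes).
probs = [
    [0.90, 0.10],  # member A is very confident in class 0
    [0.40, 0.60],  # member B leans towards class 1
    [0.45, 0.55],  # member C leans towards class 1
]

print(average_vote(probs))   # 0: the confident member dominates the mean
print(majority_vote(probs))  # 1: two of three members vote for class 1
```

The two strategies can disagree on the same input, which is why the choice of strategy matters: average voting rewards confident members, while majority voting treats every member equally.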
poetry run python examples/text-generation/run.py
poetry run python examples/text-classification/run.py
- Development Mode:
uv run
- Production Mode:
uv build
The uv tool builds a source distribution first, followed by a binary distribution (wheel). You can customize the build process:
- Build only a source distribution:
uv build --sdist
- Build only a binary distribution:
uv build --wheel
- Build both distributions from source:
uv build --sdist --wheel
By default, uv builds all packages in isolated virtual environments, following PEP 517. However, some packages (e.g., those that need PyTorch installed at build time) may require disabling build isolation. To do so, add the package to the no-build-isolation-package list under [tool.uv] in your pyproject.toml file.
- Implement remote model integration (ensemble.add_remote_member).
- Add more voting strategies and tokenization solutions.
- Publish and manage ensembles on Hugging Face Model Repository.
- Expand documentation with tutorials and advanced examples.
Contributions are welcome! Feel free to open an issue or submit a pull request. See the Contributing Guide for more details.
This project is licensed under the MIT License.