TextEmbed: Embedding Inference Server
TextEmbed is a high-throughput, low-latency REST API for serving vector embeddings. Built to be flexible and scalable, TextEmbed supports a wide range of sentence-transformer models and frameworks.
This documentation reflects the latest updates from the `main` branch. For earlier versions, visit the TextEmbed repository.
🚀 Get Started Now!
Explore the key features and setup instructions below.
🔍 Key Features
- 🌐 Flexible Model Support: Deploy any model from supported sentence-transformer frameworks, including SentenceTransformers.
- ⚡ High-Performance Inference: Leverages efficient backends like Torch for optimal performance across various devices.
- 🔄 Dynamic Batching: Processes new embedding requests as soon as resources are available, ensuring high throughput and low latency.
- ✔️ Accurate and Tested: Provides embeddings consistent with SentenceTransformers, validated with unit and end-to-end tests for reliability.
- 📜 User-Friendly API: Built with FastAPI and fully documented via Swagger, conforming to OpenAI's Embedding specs.
🛠 Getting Started
Installation via PyPI
- Install TextEmbed
- Start the Server
- View Help Options (a command sketch covering all three steps follows this list)
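A minimal command sketch for the steps above. The `textembed.server` module path, the `--models` flag, and the `--port` flag are assumptions based on typical usage; run the help command to confirm the exact options for your installed version.

```bash
# Install (or upgrade) TextEmbed from PyPI.
pip install -U textembed

# Start the server with a SentenceTransformers model.
# NOTE: the module path and flag names are assumptions; verify with --help.
python -m textembed.server --models sentence-transformers/all-MiniLM-L6-v2 --port 8000

# List all available CLI options.
python -m textembed.server --help
```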
Running with Docker (Recommended)
- Pull the Docker Image
- Run the Docker Container
- View Help Options (a Docker command sketch follows this list)
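A matching sketch for the Docker workflow. The image name `kevaldekivadiya2415/textembed` and the arguments passed to the container are assumptions; substitute the image tag and options published for the version you are running.

```bash
# Pull the TextEmbed image (image name is an assumption; use the published one).
docker pull kevaldekivadiya2415/textembed:latest

# Run the container, mapping the API port to the host.
docker run -p 8000:8000 kevaldekivadiya2415/textembed:latest \
    --models sentence-transformers/all-MiniLM-L6-v2

# Show the CLI help (assumes the server is the container's entrypoint).
docker run kevaldekivadiya2415/textembed:latest --help
```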
🌐 Accessing the API
Access the interactive API documentation via Swagger UI at http://localhost:8000/docs.
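Because the API conforms to OpenAI's Embedding specs, a request can be issued with plain `curl`. The `/v1/embeddings` path and the request fields below are assumptions drawn from that specification; check the Swagger UI for the exact schema your server exposes.

```bash
# Request embeddings for two sentences (endpoint path assumed from the OpenAI spec).
curl -X POST http://localhost:8000/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
          "model": "sentence-transformers/all-MiniLM-L6-v2",
          "input": ["TextEmbed serves embeddings.", "Batching keeps latency low."]
        }'
```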
📥 Contributing
Contributions are welcome! Please read the Contributing Guide to get started.
📄 License
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
📧 Contact
For questions or support, please reach out through GitHub Issues.
🌟 Stay Updated
Stay tuned for updates by following the TextEmbed repository. Don't forget to give us a ⭐ if you find this project helpful!