This project is based on the backend bootstrapped via create-llama: a LlamaIndex application served with FastAPI, using a locally run LLM via Ollama.
First, go to the backend directory and create a .env file with your environment variables; a sample setup can be found in backend/.env-sample.
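As a purely illustrative sketch, an Ollama-based setup might look like the following; the variable names and values here are assumptions, so defer to backend/.env-sample for the real ones:

```
# Hypothetical values, check backend/.env-sample for the actual variable names
MODEL_PROVIDER=ollama
MODEL=llama3
EMBEDDING_MODEL=nomic-embed-text
OLLAMA_BASE_URL=http://localhost:11434
```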
Next, set up the environment with Poetry:
poetry install
poetry shell
Put your data sources in the backend/data folder.
Generate embeddings for the documents in the ./data directory:
poetry run generate
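Under the hood, this step builds a vector index over the documents in ./data and persists it so the server can load it at startup. A minimal sketch of that flow with LlamaIndex and Ollama embeddings follows; the project's actual generate script may use different model names and a different storage location:

```python
# Sketch only: the real generate script lives in the backend and may differ.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.embeddings.ollama import OllamaEmbedding

# Use a local Ollama embedding model (the model name is an assumption).
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# Load every document from ./data and build a vector index over it.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Persist the index so the server can load it later ("./storage" is an assumption).
index.storage_context.persist(persist_dir="./storage")
```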
Run the development server:
python main.py
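main.py starts a uvicorn server, by default on port 8000 (which is what the curl examples below target). A rough sketch of what a create-llama entry point tends to look like; the router import path is an assumption and is left commented out:

```python
# Sketch of a FastAPI entry point; the generated main.py may differ.
import uvicorn
from fastapi import FastAPI

# The chat router module path is an assumption based on create-llama conventions:
# from app.api.routers.chat import chat_router

app = FastAPI()
# app.include_router(chat_router, prefix="/api/chat")

if __name__ == "__main__":
    # Serve on localhost:8000, matching the curl examples below.
    uvicorn.run(app, host="0.0.0.0", port=8000)
```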
There are two API endpoints:
- /api/chat/ - a streaming chat endpoint
- /api/chat/request - a non-streaming chat endpoint
You can call the streaming endpoint with curl:
curl --location 'localhost:8000/api/chat/' \
--header 'Content-Type: application/json' \
--data '{ "messages": [{ "role": "user", "content": "Hello" }] }'
The non-streaming endpoint works the same way:
curl --location 'localhost:8000/api/chat/request' \
--header 'Content-Type: application/json' \
--data '{ "messages": [{ "role": "user", "content": "Hello" }] }'
After starting the server, you can use a simple CLI assistant by running:
poetry run chat-cli
- Streaming output ✅
- CLI ✅
- VectorDB (e.g. Weaviate), see the sketch after this list
- React Frontend
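If you want to experiment with the planned VectorDB integration before it lands, LlamaIndex ships a Weaviate vector store that can replace the default local persistence. A hedged sketch, assuming a locally running Weaviate instance, the v4 weaviate client, and the llama-index-vector-stores-weaviate package; none of this is wired into the project yet:

```python
# Sketch only: the VectorDB integration is still on the roadmap; this is not project code.
import weaviate
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.weaviate import WeaviateVectorStore

# Connect to a locally running Weaviate instance (v4 client API).
client = weaviate.connect_to_local()

# Store the embeddings in Weaviate instead of the default local persist dir.
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="LlamaIndexDocs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build the index over ./data, writing vectors into Weaviate.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```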