Skip to content

🌐 Rankify Server

Deploy Rankify as a REST API for production applications.

Quick Start

Command Line

# Start server with default config
rankify serve --port 8000

# Custom configuration
rankify serve --retriever bge --reranker flashrank --port 8000

Python

from rankify.server import RankifyServer

server = RankifyServer(
    retriever="bge",
    reranker="flashrank",
    generator="basic-rag",
)

server.start(host="0.0.0.0", port=8000)

API Endpoints

Health Check

curl http://localhost:8000/health

Response:

{
  "status": "healthy",
  "version": "0.1.0",
  "retriever": "bge",
  "reranker": "flashrank",
  "generator": "basic-rag"
}


Retrieve Documents

curl -X POST http://localhost:8000/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "What is machine learning?", "n_docs": 10}'

Response:

{
  "query": "What is machine learning?",
  "documents": [
    {
      "id": "doc_1",
      "text": "Machine learning is a subset of AI...",
      "title": "Introduction to ML",
      "score": 0.89
    }
  ],
  "latency_ms": 45.2
}


Rerank Documents

curl -X POST http://localhost:8000/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is deep learning?",
    "documents": [
      {"id": "1", "text": "Deep learning uses neural networks..."},
      {"id": "2", "text": "Machine learning is a type of AI..."},
      {"id": "3", "text": "Deep neural networks have many layers..."}
    ],
    "top_k": 2
  }'

Response:

{
  "query": "What is deep learning?",
  "documents": [
    {"id": "3", "text": "Deep neural networks have many layers...", "score": 0.92},
    {"id": "1", "text": "Deep learning uses neural networks...", "score": 0.87}
  ],
  "latency_ms": 12.5
}


RAG Generation

curl -X POST http://localhost:8000/rag \
  -H "Content-Type: application/json" \
  -d '{"query": "Explain transformers", "n_contexts": 5}'

Response:

{
  "query": "Explain transformers",
  "answer": "Transformers are a neural network architecture that uses self-attention...",
  "contexts": [
    {"id": "1", "text": "The transformer was introduced in...", "score": 0.95}
  ],
  "latency_ms": 1250.3
}


Batch Retrieve

curl -X POST http://localhost:8000/retrieve/batch \
  -H "Content-Type: application/json" \
  -d '["What is AI?", "What is ML?", "What is DL?"]'

Server Configuration

from rankify.server import RankifyServer

server = RankifyServer(
    retriever="bge",              # Retriever method
    reranker="flashrank",         # Reranker method
    generator="basic-rag",        # RAG method (optional)
    retriever_model=None,         # Specific retriever model
    reranker_model="ms-marco-MiniLM-L-12-v2",
    generator_model="gpt-4o-mini",
    generator_backend="openai",
    index_type="wiki",
    n_docs=100,
)

server.start(
    host="0.0.0.0",
    port=8000,
    workers=4,      # Number of workers
    reload=False,   # Auto-reload for dev
)

Docker Deployment

FROM python:3.10-slim

WORKDIR /app
RUN pip install rankify fastapi uvicorn

EXPOSE 8000

CMD ["python", "-m", "rankify.server", "--host", "0.0.0.0", "--port", "8000"]
docker build -t rankify-server .
docker run -p 8000:8000 rankify-server

OpenAPI Documentation

Access interactive API docs at: - Swagger UI: http://localhost:8000/docs - ReDoc: http://localhost:8000/redoc


Client Libraries

Python

import requests

response = requests.post(
    "http://localhost:8000/rag",
    json={"query": "What is AI?", "n_contexts": 5}
)
print(response.json()["answer"])

JavaScript

const response = await fetch('http://localhost:8000/rag', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({query: 'What is AI?', n_contexts: 5})
});
const data = await response.json();
console.log(data.answer);