🤖 RankifyAgent - Intelligent Model Selection
RankifyAgent is an AI-powered assistant that helps you select the optimal retrieval, reranking, and RAG models for your specific use case.
Overview
With 10+ retrievers, 23+ rerankers, and 7 RAG methods, choosing the right combination can be overwhelming. RankifyAgent solves this by:
- 📊 Analyzing your requirements - Task type, hardware, latency needs
- 🎯 Recommending optimal models - Based on accuracy, speed, and constraints
- 💬 Conversational interface - Natural language interaction
- 💻 Code generation - Ready-to-use code snippets
Installation
RankifyAgent is included with Rankify:
For conversational features, install OpenAI:
Quick Start
1. Programmatic Recommendation
Get instant recommendations with a single function call:
from rankify.agent import recommend
# Recommend models for QA task with GPU
result = recommend(task="qa", gpu=True, include_rag=True)
print(f"Retriever: {result.retriever.name}")
print(f"Reranker: {result.reranker.name}")
print(f"RAG Method: {result.rag_method.name}")
Expected Output:
2. Conversational Agent
Have a natural conversation to get recommendations:
from rankify.agent import RankifyAgent
import os
# Set up Azure OpenAI (or use other backends)
os.environ["AZURE_OPENAI_ENDPOINT"] = "your-endpoint"
os.environ["AZURE_OPENAI_API_KEY"] = "your-key"
os.environ["AZURE_DEPLOYMENT_NAME"] = "gpt-4o"
agent = RankifyAgent(backend="azure")
response = agent.chat("I need a fast search system for production without GPU")
print(response.message)
Expected Output:
For a production search system without GPU, I recommend:
**Retriever: BM25**
- Ultra-fast sparse retrieval, no GPU needed
- Great for keyword-based search
**Reranker: FlashRank MiniLM**
- ONNX-based, runs on CPU
- Very low latency (~10ms)
This combination gives you sub-50ms latency with good accuracy.
Supported LLM Backends
RankifyAgent supports multiple LLM backends:
| Backend | Description | Setup |
|---|---|---|
azure |
Azure OpenAI | Set AZURE_OPENAI_* env vars |
openai |
OpenAI API | Set OPENAI_API_KEY |
litellm |
100+ providers | Model-specific setup |
local |
Local LLMs | Requires GPU, uses transformers |
Azure OpenAI Setup
import os
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://your-resource.openai.azure.com/"
os.environ["AZURE_OPENAI_API_KEY"] = "your-api-key"
os.environ["AZURE_DEPLOYMENT_NAME"] = "gpt-4o"
os.environ["AZURE_API_VERSION"] = "2024-05-01-preview"
from rankify.agent import RankifyAgent
agent = RankifyAgent(backend="azure")
OpenAI Setup
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
from rankify.agent import RankifyAgent
agent = RankifyAgent(backend="openai", model_name="gpt-4o-mini")
LiteLLM (Claude, Gemini, etc.)
from rankify.agent import RankifyAgent
# Use Claude
agent = RankifyAgent(backend="litellm", model_name="claude-3-5-sonnet-20241022")
# Use Gemini
agent = RankifyAgent(backend="litellm", model_name="gemini/gemini-pro")
API Reference
recommend() Function
Quick recommendation without conversation.
from rankify.agent import recommend
result = recommend(
task="qa", # "qa", "search", "summarization", "conversational"
gpu=True, # GPU availability
api_allowed=True, # Allow API-based models
include_rag=False, # Include RAG method recommendation
)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
task |
str | "qa" | Task type |
gpu |
bool | True | GPU available |
api_allowed |
bool | True | Allow API models |
include_rag |
bool | False | Include RAG recommendation |
Returns: RecommendationResult
RankifyRecommender Class
More control over recommendations.
from rankify.agent import RankifyRecommender
recommender = RankifyRecommender()
# Full recommendation with constraints
result = recommender.recommend(
task="search",
constraints={
"gpu": False,
"prefer_speed": True,
"no_api": True,
},
include_reranker=True,
include_rag=False,
top_k=3, # Number of alternatives
)
print(f"Best: {result.retriever.name}")
print(f"Alternatives: {[m.name for m in result.alternatives['retrievers']]}")
Expected Output:
Specialized Methods
# Optimize for latency
result = recommender.recommend_for_latency(max_latency_ms=50, task="search")
# Optimize for accuracy
result = recommender.recommend_for_accuracy(task="qa", gpu_available=True)
# Production-ready models
result = recommender.recommend_for_production(gpu_available=False, api_allowed=True)
RankifyAgent Class
Conversational interface with memory.
from rankify.agent import RankifyAgent
agent = RankifyAgent(backend="azure")
# First message
response = agent.chat("I need to build a QA system for legal documents")
print(response.message)
# Follow-up (remembers context)
response = agent.chat("What if I also need summarization?")
print(response.message)
# Get code snippet
if response.code_snippet:
print(response.code_snippet)
# Clear conversation history
agent.clear_history()
Response Object:
@dataclass
class AgentResponse:
message: str # Agent's response
recommendation: RecommendationResult # Recommendation object
code_snippet: str # Generated code
Constraint Options
| Constraint | Type | Description |
|---|---|---|
gpu |
bool | GPU availability |
max_memory_mb |
int | Maximum memory budget |
prefer_speed |
bool | Prioritize fast models |
prefer_accuracy |
bool | Prioritize accurate models |
no_api |
bool | Only local models |
api_only |
bool | Only API-based models |
language |
str | Required language ("en", "zh", etc.) |
Model Registry
Explore available models:
from rankify.agent import (
get_all_retrievers,
get_all_rerankers,
get_all_rag_methods,
)
# List all retrievers
for name, model in get_all_retrievers().items():
print(f"{model.name}: {model.speed.value}, GPU={model.gpu_required}")
Expected Output:
BM25: very_fast, GPU=False
DPR (Multi-Encoder): medium, GPU=True
ANCE: medium, GPU=True
BGE: fast, GPU=True
ColBERT: slow, GPU=True
Contriever: medium, GPU=True
HyDE: slow, GPU=True
Online Retriever: medium, GPU=False
Model Metadata
Each model has detailed metadata:
from rankify.agent import get_model
model = get_model("reranker", "flashrank-minilm")
print(f"Name: {model.name}")
print(f"Speed: {model.speed.value}")
print(f"Accuracy: {model.accuracy.value}")
print(f"GPU Required: {model.gpu_required}")
print(f"Memory: {model.memory_mb}MB")
print(f"Best For: {model.best_for}")
Expected Output:
Name: FlashRank MiniLM
Speed: very_fast
Accuracy: good
GPU Required: False
Memory: 50MB
Best For: ['low latency', 'cpu deployment', 'edge devices', 'production']
Complete Example
End-to-end usage from recommendation to execution:
from rankify.agent import recommend
from rankify.dataset.dataset import Dataset
from rankify.retrievers.retriever import Retriever
from rankify.models.reranking import Reranking
# Step 1: Get recommendation
result = recommend(task="qa", gpu=True)
print(f"Using: {result.retriever.name} + {result.reranker.name}")
# Step 2: Load data
dataset = Dataset(retriever="bm25", dataset_name="nq-dev", n_docs=100)
documents = dataset.download()[:10]
# Step 3: Use recommended retriever
retriever = Retriever(
method=result.retriever.method,
n_docs=100
)
documents = retriever.retrieve(documents)
# Step 4: Use recommended reranker
reranker = Reranking(
method=result.reranker.method,
model_name=result.reranker.model_path
)
documents = reranker.rank(documents)
print(f"Processed {len(documents)} documents")
Next Steps
- Retrieval Tutorials - Learn about retrieval methods
- Reranking Tutorials - Explore reranking options
- RAG Tutorials - Build RAG pipelines