Your agent works.
Until a customer touches it.
AI agents that pass every internal test still fail 40% of real-world tasks.
We fix the architecture that makes that number inescapable.
Free and open source. MIT License.
Star us on GitHub

Structural Security, Not Probabilistic Guardrails
Security isn't bolted on. It's architecturally guaranteed.
| Traditional AI Agents | OpenSymbolicAI |
|---|---|
| Data dumped into context | Data stays in variables |
| "Please don't access other users' data" | Code enforces boundaries |
| Hope the AI doesn't cause harm | Mutations require approval |
| Probabilistic guardrails | Structural guarantees |
| Cloud-dependent | Deploy anywhere |
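The left-vs-right distinction in the table can be sketched as a plain-Python pattern. This is illustrative only, assuming nothing about the library's internals; the names `APPROVED` and the decorator body are ours, not OpenSymbolicAI's API. The point it demonstrates: read-only actions run freely and return data into variables, while mutations are blocked by code rather than by a polite instruction in a prompt.

```python
from typing import Callable

APPROVED: set[str] = set()  # mutations a human has explicitly approved

def primitive(read_only: bool = False):
    """Mark a function as an agent action; mutations require approval."""
    def wrap(fn: Callable) -> Callable:
        def guarded(*args, **kwargs):
            if not read_only and fn.__name__ not in APPROVED:
                raise PermissionError(f"mutation '{fn.__name__}' not approved")
            return fn(*args, **kwargs)
        guarded.__name__ = fn.__name__
        return guarded
    return wrap

@primitive(read_only=True)
def search(query: str) -> list[str]:
    # Results land in a variable, not in the model's context window.
    return [f"result for {query}"]

@primitive()  # mutating: structurally blocked until approved
def delete_record(record_id: int) -> str:
    return f"deleted {record_id}"

print(search("python"))      # read-only: runs freely
try:
    delete_record(42)
except PermissionError as e:
    print(e)                 # code enforces the boundary, not the prompt
APPROVED.add("delete_record")
print(delete_record(42))     # runs only after explicit approval
```

The guarantee is structural: no amount of prompt injection can make `delete_record` run before the approval set contains it.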
Working agent in 3 steps
From install to running output — no config files, no boilerplate.
```shell
pip install opensymbolicai-core ddgs
```

```python
from ddgs import DDGS

from opensymbolicai import (
    PlanExecute, primitive, decomposition,
)
from opensymbolicai.llm import LLMConfig, Provider


class SearchAgent(PlanExecute):
    @primitive(read_only=True)
    def search(self, query: str, k: int = 5) -> list[str]:
        """Search the web via DuckDuckGo."""
        results = DDGS().text(query, max_results=k)
        return [r["body"] for r in results]

    @primitive(read_only=True)
    def answer(self, question: str, ctx: list[str]) -> str:
        """Answer a question given search context."""
        return self._llm.generate(
            "Answer based on context:\n"
            + "\n".join(ctx)
            + f"\n\nQ: {question}"
        ).text

    @decomposition(
        intent="What are the new features in Python 3.13?",
        expanded_intent="Search the web, then answer using results",
    )
    def web_qa(self) -> str:
        hits = self.search("Python 3.13 new features", k=3)
        return self.answer(
            "What are the new features in Python 3.13?", hits,
        )


agent = SearchAgent(llm=LLMConfig(
    provider=Provider.OLLAMA, model="qwen3:1.7b",
))
result = agent.run("What is Rust and why is it popular?")
print(result.result)
```

Why Agents Break, and How to Fix It
Three concepts that turn prompt spaghetti into software you can actually ship.
Define
Typed primitives: the atomic actions your agent can take, like search, retrieve, or send email.
Compose
Wire primitives into decompositions: named workflows the agent selects by matching user intent.
Run
Call agent.run() and intent matching picks the right decomposition. Guardrails are built in.
It worked in your demo. Then a customer used it.
Most enterprise AI projects never make it to production, and the blocker is reliability: 95% accuracy per step sounds great until you chain 10 steps and your end-to-end success rate drops to roughly 60%.
The problem isn't the models. It's the architecture. Everyone's running LLMs in loops, re-reading context every turn, burning tokens, compounding errors. With OpenSymbolicAI, you define steps as code functions, not paragraphs of instructions.
- ✓No more 500-token prompts
- ✓No more guessing what the AI will do
- ✓No more untestable behavior
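The compounding math behind that 60% figure is easy to check for yourself:

```python
# Per-step accuracy compounds multiplicatively across a chained workflow.
per_step = 0.95
for steps in (1, 5, 10, 20):
    print(f"{steps:>2} steps: {per_step ** steps:.1%}")
# 10 chained steps at 95% each is ~59.9% end-to-end
```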
```python
tools = [{"name": "retrieve", ...}, {"name": "rerank", ...}, ...]

prompt = f"""You are a RAG assistant. CRITICAL: Use ONLY retrieved info.

## QUERY CLASSIFICATION (classify BEFORE acting):
- Simple factual → retrieve(k=3) → extract_answer
- Complex/deep dive → retrieve(k=8) → rerank(k=3) → extract
- Comparison → retrieve(topic_A) + retrieve(topic_B) → compare

## RESPONSE FORMAT (STRICT):
Return JSON: {{"thinking": "...", "tool_calls": [...],
"final_answer": "...", "sources": [...], "confidence": 0.0-1.0}}

## TOOL PARAMETER RULES:
- retrieve: k must be 3-10, query must be <100 chars
- rerank: only after retrieve, k <= original k
- extract_answer: requires non-empty doc list

## CRITICAL CONSTRAINTS:
❌ NEVER hallucinate or make up information
❌ NEVER call extract_answer without first calling retrieve
❌ NEVER exceed confidence 0.9 without source validation
✓ ALWAYS cite sources with doc_id references
✓ ALWAYS include confidence scores

REMEMBER: You are a RETRIEVAL assistant, not a knowledge base.

Query: {query}"""

# ... (500 more tokens of examples and error handling)
response = llm.complete(prompt, tools=tools)
```

```python
class RAGAgent(PlanAndExecute):
    @primitive
    def retrieve(self, q: str, k: int = 5) -> list[Document]: ...

    @primitive
    def rerank(self, docs, q: str) -> list[Document]: ...

    @primitive
    def extract(self, docs, q: str) -> str: ...

    @decomposition(intent="What is machine learning?")
    def simple_qa(self):
        docs = self.retrieve("machine learning definition", k=3)
        return self.extract(docs, "What is machine learning?")

    @decomposition(intent="Explain the architecture of transformers")
    def deep_dive(self):
        docs = self.retrieve("transformer architecture innovations", k=8)
        ranked = self.rerank(docs, "transformer architecture")
        return self.extract(ranked, "Explain transformer architecture")

    @decomposition(intent="Compare React vs Vue")
    def compare(self):
        docs = self.retrieve("React") + self.retrieve("Vue")
        return self.extract(docs, "Compare React vs Vue")


# Intent matching happens automatically:
answer = agent.run("What is attention?")
deep_dive = agent.run("Deep dive on transformers")
comparison = agent.run("React vs Vue")
```
Engineering Certainty into AI
Production-Grade Reliability
Agents That Actually Work in Production
While LangChain hits 77.8% and CrewAI hits 73.3%, OpenSymbolicAI achieves a 100% framework pass rate on complex workflows. By replacing unpredictable prompts with type-safe primitives, you eliminate the randomness of agents that work on Tuesday but fail on Friday.
See the benchmarks

Compound Improvements
Fix Once, Improve Everywhere
Stop playing whack-a-mole with one-off prompt patches. Because the architecture uses reusable symbolic primitives, every fix automatically upgrades every workflow that uses it. Ten primitives combine in hundreds of ways. Twenty combine in thousands.
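One rough way to make "hundreds" and "thousands" concrete: count ordered pipelines of up to three distinct primitives. This is our own illustrative counting, not a claim from the framework, and real workflows compose in more ways than this.

```python
from math import perm

def pipelines(n_primitives: int, max_len: int = 3) -> int:
    """Ordered sequences of 1..max_len distinct primitives."""
    return sum(perm(n_primitives, k) for k in range(1, max_len + 1))

print(pipelines(10))  # 10 + 90 + 720 = 820 possible workflows
print(pipelines(20))  # 20 + 380 + 6840 = 7240
```

Because every workflow is built from the same shared primitives, fixing one primitive improves every pipeline in that count that uses it.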
Zero-Fail Tooling
0% Error Rate on External Actions
Standard agent frameworks face a 20% error rate when calling external tools. A symbolic boundary between planning and execution brings that to zero, so your agents never invent parameters or leak sensitive data during real-world execution.
Optimization by Design
Up to 10x Cheaper
Reliability shouldn't come with a token tax. The LLM plans once and your code executes, which makes runs 2x faster and up to 10x cheaper. These are programs that use LLMs, not LLMs as the execution engine. You don't need a frontier model.
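The cost difference follows from simple token accounting. A back-of-envelope sketch, where the context size, turn count, and per-turn output are illustrative assumptions rather than benchmark numbers:

```python
# A loop-style agent re-sends the (system prompt + tool schema) context
# on every turn; a plan-once agent pays for that context a single time.
context_tokens = 2_000   # assumed system prompt + tool schemas
turns = 10               # assumed steps in the workflow
per_turn_output = 150    # assumed tokens generated per turn

loop_cost = turns * (context_tokens + per_turn_output)      # re-read each turn
plan_once_cost = context_tokens + turns * per_turn_output   # plan once

print(loop_cost, plan_once_cost, round(loop_cost / plan_once_cost, 1))
```

With these assumed numbers the plan-once approach is about 6x cheaper; larger contexts and longer loops push the ratio higher.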
From the Blog
Technical articles and insights about building AI applications.
Third Language, Same Result: MultiHopRAG in Go
Go joins Python and C# on the MultiHopRAG benchmark. Different runtime, different vector store, single static binary. Accuracy: 81.6%. The framework holds.
Change Everything, Change Nothing: MultiHopRAG in Python and C#
We swapped the language, the vector store, the code executor, and the type system. Accuracy moved by 0.9pp. The framework is the invariant, not the infrastructure.
Agent-to-Agent Is Just Function Calls
Multi-agent systems don't need new infrastructure. They use the same patterns that connect microservices today: typed interfaces, explicit wiring, and the auth and observability stack you already have.
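The point fits in a few lines. A minimal sketch in plain Python (the class and method names are ours, not the library's API): one agent exposes a typed interface, another is wired to it explicitly and calls it like any function.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    sources: list[str]

class ResearchAgent:
    """Typed interface: other agents call this like any function."""
    def lookup(self, topic: str) -> Answer:
        return Answer(text=f"summary of {topic}", sources=["doc-1"])

class WriterAgent:
    def __init__(self, researcher: ResearchAgent):
        self.researcher = researcher  # explicit wiring, as with microservices

    def draft(self, topic: str) -> str:
        ans = self.researcher.lookup(topic)  # agent-to-agent = function call
        return f"{ans.text} (sources: {', '.join(ans.sources)})"

writer = WriterAgent(ResearchAgent())
print(writer.draft("vector databases"))
```

Swap the in-process call for an HTTP or gRPC boundary and the same typed interface carries your existing auth and observability stack unchanged.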
Make your AI engineers 10x more productive
Reduce debugging time. Version-control behavior changes. Onboard new engineers faster.