I built NerdQA out of frustration with opaque AI research tools. I wanted an engine where every claim is traceable, every source is cited, and the reasoning path is transparent. This open-source system delivers exactly that through hierarchical citations, and it shipped six months before similar major industry announcements.

🔍 Open Source Research Tool

NerdQA is ready to change your research process with transparent, verifiable citations. The full setup guide and source code are on GitHub.

Build Your Own Research Engine

Overview

NerdQA is a deep research automation system that I implemented six months before OpenAI announced its Deep Research agent. It enables verifiable web research through hierarchical citation graphs, offering a deterministic, fully transparent alternative to black-box research agents: every reasoning chain is traceable and attributable, with direct citations to original sources.

[Screenshot: NerdQA's query interface for submitting research queries with customizable parameters]

Key Features

  • Transparent Reasoning: Every conclusion is backed by a fully verifiable citation tree
  • Source Attribution: All information is directly linked to primary web sources
  • Hierarchical Citations: Complex questions are broken down into sub-questions with their own evidence trees
  • Deterministic Research: Reproducible research paths for consistent, verifiable results
  • Alternative to Black-Box Agents: Brings transparency to AI-assisted research
  • Cost-Effective: Designed to run efficiently, potentially on smaller/cheaper LLMs

Why Not Agent-Based?

Unlike common agent-based approaches, NerdQA uses an engineered LLM workflow that runs in a loop, exploring research questions both vertically (a predefined number of sequential steps) and horizontally (a defined number of parallel queries per step). This approach offers several advantages; a minimal sketch of the loop shape follows the list below:

  • Deterministic Behavior: More predictable outcomes with less randomness
  • Easier Troubleshooting: Clear workflow steps make debugging simpler
  • Cost Efficiency: Optimized prompts and controlled token usage keep costs low
  • Simplified Development: Lean enough to build as a zero-budget side project in personal time
  • Model Flexibility: Feasible to run on cheaper, smaller LLMs
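
To make the vertical/horizontal distinction concrete, here is a minimal sketch of the loop's shape. The names (`WorkflowConfig`, `explore`) and default values are illustrative assumptions, not NerdQA's actual code:

```python
from dataclasses import dataclass

@dataclass
class WorkflowConfig:
    depth: int = 3            # vertical: fixed number of sequential research steps
    breadth: int = 4          # horizontal: parallel queries issued per step
    temperature: float = 0.0  # greedy decoding keeps LLM output reproducible

def explore(question: str, cfg: WorkflowConfig) -> None:
    for step in range(cfg.depth):         # bounded vertical exploration
        for query in range(cfg.breadth):  # bounded horizontal fan-out
            ...                           # one query/search/extract pass per slot
```

Because both loop bounds are fixed up front, the number of LLM and search calls per run is known in advance, which is what makes behavior predictable and costs controllable.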

What Makes NerdQA Unique

The most novel aspect of NerdQA is its hierarchical statement architecture. When generating intermediate or final reports, the LLM can cite not only original web sources but also intermediate statements that are themselves built on other statements and sources. This creates citation trees (sketched in code after the list below) that can:

  1. Unlock complex conclusions not explicitly stated in sources
  2. Support reasoning that builds soundly on multiple pieces of evidence
  3. Be extended to generate novel solutions by producing both strict synthesis and hypotheses
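
As an illustration of what a hierarchical statement might look like, here is a hypothetical data model; the class and field names are assumptions for the sketch, not NerdQA's schema:

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    text: str
    source_urls: list[str] = field(default_factory=list)           # primary web sources
    supported_by: list["Statement"] = field(default_factory=list)  # intermediate statements

def print_citation_tree(stmt: Statement, depth: int = 0) -> None:
    """Unwind a statement's evidence chain down to its primary sources."""
    indent = "  " * depth
    print(f"{indent}- {stmt.text}")
    for url in stmt.source_urls:
        print(f"{indent}  [source] {url}")
    for parent in stmt.supported_by:
        print_citation_tree(parent, depth + 1)

# A conclusion citing an intermediate statement, which itself cites a source:
leaf = Statement("Fact A", source_urls=["https://example.com/a"])
mid = Statement("A, combined with B, implies C",
                source_urls=["https://example.com/b"], supported_by=[leaf])
print_citation_tree(Statement("Therefore C", supported_by=[mid]))
```

The key property is that `supported_by` lets a conclusion cite intermediate statements rather than only raw sources, so every claim can be unwound back to primary evidence.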

[Screenshot: Langfuse trace showing the hierarchical research process end to end]

Technical Implementation

Tech Stack

  • Core: Custom LLM framework for orchestration (no dependencies on LangChain, etc.)
  • LLM: LiteLLM proxy for reliable API access with fallbacks and Redis caching (see the client sketch after this list)
  • Search: Exa Search for web search capabilities
  • Reranking: Cohere Reranker for improved search result relevance
  • Monitoring: Langfuse for comprehensive LLM tracing
  • Frontend: Vanilla JavaScript for a lightweight, customizable UI
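
Since the LiteLLM proxy exposes an OpenAI-compatible API, the pipeline can talk to it with any standard client. This is a minimal sketch; the proxy address and model alias are assumptions about the deployment:

```python
from openai import OpenAI

# The LiteLLM proxy speaks the OpenAI API, so an OpenAI-compatible client works.
client = OpenAI(
    base_url="http://localhost:4000",  # assumed local proxy address
    api_key="sk-anything",             # real provider keys live on the proxy
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # an alias the proxy resolves; fallbacks and caching happen proxy-side
    messages=[{"role": "user", "content": "Summarize these search results: ..."}],
)
print(response.choices[0].message.content)
```

Routing every call through the proxy is what makes fallbacks and Redis caching transparent to the rest of the pipeline: they are configured once on the proxy, not at every call site.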

System Architecture

NerdQA consists of several key components:

  • Pipeline: Orchestrates the research workflow, managing statement generation, query generation, and answer synthesis
  • Statement Generator: Creates factual statements based on search results
  • Query Generator: Formulates follow-up queries based on current research state
  • Answer Generator: Synthesizes final answers with citations to statements and sources
  • Web Server: Provides API endpoints for interacting with the system (sketched after this list)
  • Frontend: Simple, customizable vanilla JavaScript interface
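
As a sketch of how the web server component might expose the pipeline, here is a hypothetical endpoint; FastAPI, the route, and the request shape are all assumptions, not NerdQA's actual API:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ResearchRequest(BaseModel):
    question: str
    depth: int = 3    # vertical steps
    breadth: int = 4  # parallel queries per step

@app.post("/research")
def research(req: ResearchRequest) -> dict:
    # In the real system this would run the Pipeline and return the final
    # answer together with its citation tree.
    return {"answer": f"(pipeline output for: {req.question})", "citations": []}
```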

The system uses a loop-based approach where each iteration (see the sketch after this list):

  1. Generates queries based on the research question
  2. Searches for relevant information
  3. Creates statements from search results
  4. Builds on previous statements to generate deeper insights
  5. Synthesizes a comprehensive answer once the loop completes
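
Here is how those five steps might wire together, with each NerdQA component reduced to a placeholder function; everything below is an illustrative stand-in, not the project's code:

```python
def generate_queries(question: str, statements: list[str]) -> list[str]:
    """Step 1: stub for the Query Generator (an LLM call in NerdQA)."""
    return [f"{question} (given {len(statements)} known statements)"]

def search(query: str) -> list[str]:
    """Step 2: stub for the search backend (e.g. Exa Search)."""
    return [f"passage matching '{query}'"]

def make_statements(passages: list[str]) -> list[str]:
    """Step 3: stub for the Statement Generator."""
    return [f"statement derived from: {p}" for p in passages]

def synthesize(question: str, statements: list[str]) -> str:
    """Step 5: stub for the Answer Generator."""
    return f"answer to '{question}' citing {len(statements)} statements"

def run(question: str, iterations: int = 3) -> str:
    statements: list[str] = []
    for _ in range(iterations):
        for query in generate_queries(question, statements):  # step 1
            passages = search(query)                          # step 2
            new = make_statements(passages)                   # step 3
            statements.extend(new)  # step 4: later passes condition on these
    return synthesize(question, statements)                   # step 5

print(run("What drives lithium-ion battery degradation?"))
```

Note how step 4 falls out of the loop structure: each pass of `generate_queries` is conditioned on the statements accumulated so far, which is what drives the research deeper rather than merely wider.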

Advanced Features

  • Citation Trees: Click on citations in the final report to explore the evidence chain
  • Feedback: Provide feedback on answers to improve future research
  • Custom Search Providers: The system supports multiple search backends including Exa Search, Brave Search, and OpenAlex
  • LLM Tracing: With Langfuse integration, you can inspect all LLM calls, prompts, token usage, timing, and more
  • Caching: Redis-based LLM response caching to improve performance and reduce API costs (illustrated below)
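
For the caching piece, LiteLLM supports Redis-backed response caching. The snippet below shows the SDK-level pattern as an illustration; NerdQA configures caching at the proxy, and the host and port here are assumed defaults:

```python
import litellm
from litellm.caching import Cache

# Enable Redis-backed response caching (assumed local Redis on the default port).
litellm.cache = Cache(type="redis", host="localhost", port=6379)

# Repeated identical calls are now served from Redis instead of the provider.
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is a citation graph?"}],
    caching=True,
)
print(response.choices[0].message.content)
```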

View on GitHub