Behind the burger

How 25 AI agents, 119 sources and 10 pipeline stages turn a question into a cited answer

The name is playful — the architecture is not. Nerd Burger is real-time evidence retrieval that surfaces existing systematic reviews, meta-analyses and RCTs — it does not conduct a new systematic review. Here is exactly what happens inside the burger, layer by layer.

What's in the brain right now
25 AI agents
125 evidence sources
143 trusted sites
10 pipeline stages
3 ground-truth layers
5 AU gov clients

Numbers read live from SOURCE_REGISTRY, REPUTABLE_SITES and DEPARTMENTS on every page load. In addition, 3 MIT-licensed npm packages from Bond IEBH have been imported and are due to be wired in during upcoming sprints.

Scope honesty

What PICO-SEARCH retrieves, and what it doesn't do

There are three tiers of evidence review. PICO-SEARCH is the fast one: it surfaces the top-tier evidence that already exists (systematic reviews, meta-analyses, RCTs, guidelines) and grades it against the CEBM hierarchy in real time. It does NOT conduct a new rapid review or a new Cochrane systematic review (those are multi-reviewer workflows measured in weeks to months). We borrow the PRISMA reporting layout so the output looks familiar, but that is a layout choice, not a methodology claim.

| Tier | Effort | Dual-reviewer | PROSPERO | Protocol | PICO-SEARCH |
| --- | --- | --- | --- | --- | --- |
| Full Cochrane systematic review | 6–18 months | Yes | Yes | Yes | Not this |
| Rapid review (WHO / Cochrane RRMG) | Days–weeks | Usually no | Optional | Yes | Not this either |
| Real-time evidence retrieval | Seconds–minutes | No | No | No | This is PICO-SEARCH |

What that means in practice. PICO-SEARCH runs the literature search, grades each study against the CEBM hierarchy, and returns a cited rapid evidence summary in a PRISMA-style layout, in under a minute. It is a literature search engine with an evidence-grading layer on top. It is not a medical device. It does not diagnose, treat, or make decisions about individual patients. When the evidence warrants a full systematic review, PICO-SEARCH is the starting point: you export the search strategy + RIS file and escalate into Covidence, Rayyan, RevMan, or DistillerSR to run the full SR workflow with two human reviewers.

The team

One Nerd Burger and 24 Slider specialist agents

Each Slider is a specialist AI agent — its own system prompt, its own ICD-11 chapter scope, its own MeSH tree roots, its own preferred sources. They are not mascots. They are twenty-four domain specialists, each trained for one clinical area.

The router

Nerd Burger (the Big Burger)

router

3-stage router: keyword/regex (24 specialist patterns) → MeSH-tree walk → LLM fallback (Haiku) with department menu. Outputs 1–5 routed Sliders. Never searches evidence directly. Synthesises cross-department answers when multiple Sliders are engaged.
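The three-stage routing order can be sketched roughly as follows. This is a minimal illustration, not the product's router: the three sample patterns, the `SliderPattern` shape, and the rule that a later stage runs only when the earlier one returns nothing are all assumptions, and the LLM fallback is passed in as a plain synchronous callback so the sketch stays offline.

```typescript
// Hypothetical types and sample data, for illustration only.
type SliderId = string;

interface SliderPattern {
  slider: SliderId;
  pattern: RegExp;      // Stage A keyword/regex pattern
  meshRoots: string[];  // Stage B MeSH tree roots, e.g. "C14"
}

const SLIDER_PATTERNS: SliderPattern[] = [
  { slider: "cardiovascular", pattern: /\b(statins?|heart failure|hypertension)\b/i, meshRoots: ["C14"] },
  { slider: "respiratory", pattern: /\b(asthma|copd|pneumonia)\b/i, meshRoots: ["C08"] },
  { slider: "neurology", pattern: /\b(stroke|epilepsy|dementia)\b/i, meshRoots: ["C10"] },
];

const MAX_SLIDERS = 5; // the router outputs one to five Sliders

// Stage A: every Slider whose regex matches the question text.
function stageAKeyword(question: string): SliderId[] {
  return SLIDER_PATTERNS.filter((p) => p.pattern.test(question)).map((p) => p.slider);
}

// Stage B: Sliders whose MeSH roots prefix any supplied tree number.
// The tree numbers are assumed to come from an upstream tagger.
function stageBMesh(treeNumbers: string[]): SliderId[] {
  return SLIDER_PATTERNS.filter((p) =>
    p.meshRoots.some((root) => treeNumbers.some((t) => t.indexOf(root) === 0))
  ).map((p) => p.slider);
}

// Stage C is injected as a callback standing in for the LLM fallback.
function routeQuestion(
  question: string,
  treeNumbers: string[],
  llmFallback: (q: string) => SliderId[]
): SliderId[] {
  let routed = stageAKeyword(question);
  if (routed.length === 0) routed = stageBMesh(treeNumbers);
  if (routed.length === 0) routed = llmFallback(question);
  return routed.slice(0, MAX_SLIDERS);
}
```

A question such as "Do statins reduce mortality after stroke?" matches two sample patterns in Stage A, so this sketch never reaches the MeSH walk or the fallback.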

The 24 Slider AI agents
  • Neurology & Nervous System
    ICD-11 8A00–8E7Z · MeSH C10

    Diseases of the central and peripheral nervous system: stroke, dementia, epilepsy, headache, neurodegeneration, neuromuscular disease.

  • Ophthalmology & Vision
    ICD-11 9A00–9E1Z · MeSH C11

    Diseases of the eye, visual system, and adnexa. Glaucoma, macular degeneration, retinopathy, cataract, refractive disorders.

  • Ear, Nose & Throat
    ICD-11 AA00–AC0Z · MeSH C09

    Diseases of the ear, nose, throat, sinus, larynx, and mastoid. Hearing loss, otitis, rhinosinusitis, head & neck infection.

  • Cardiovascular
    ICD-11 BA00–BE2Z · MeSH C14

    Heart and vascular disease: coronary disease, heart failure, arrhythmia, hypertension, lipid disorders, valvular disease, stroke overlap.

  • Respiratory
    ICD-11 CA00–CB7Z · MeSH C08

    Respiratory disease: asthma, COPD, ILD, pneumonia, sleep-disordered breathing, tuberculosis, pulmonary hypertension.

  • Gastrointestinal & Hepatology
    ICD-11 DA00–DE2Z · MeSH C06

    GI tract, liver, biliary, pancreas. IBD, IBS, viral hepatitis, fatty liver, reflux, peptic disease, functional GI.

  • Endocrine, Nutrition & Metabolic
    ICD-11 5A00–5D46 · MeSH C18, C19

    Endocrine, nutrition, metabolism: diabetes, obesity, thyroid, adrenal, pituitary, osteoporosis, lipid, weight-loss pharmacology (GLP-1 etc.).

  • Renal & Urology
    ICD-11 GA00–GC8Z · MeSH C12, C13

    Kidney disease, electrolyte and acid-base disorders, urinary tract, bladder, prostate, stone disease, nephrology overlap with CV and endocrine.

  • Musculoskeletal & Rheumatology
    ICD-11 FA00–FC0Z · MeSH C05, C17

    Bone, joint, muscle, connective tissue, rheumatology. OA, RA, spondyloarthropathy, SLE, fibromyalgia, orthopaedic trauma.

  • Dermatology
    ICD-11 EA00–EM0Z · MeSH C17

    Skin, hair, nails, subcutaneous tissue. Eczema, psoriasis, acne, skin cancer, infections, drug eruptions, paediatric dermatology.

  • Mental Health & Psychiatry
    ICD-11 6A00–6E8Z · MeSH F03

    Mood, anxiety, psychotic, neurodevelopmental, substance use, eating disorders, trauma. Psychopharmacology + psychotherapy evidence.

  • Obstetrics & Maternity
    ICD-11 JA00–JB6Z · MeSH C13.703

    Pregnancy, labour, delivery, postpartum, maternal medicine, prenatal screening, fetal medicine, breastfeeding.

  • Gynaecology & Women's Health
    ICD-11 GA00–GA4Z · MeSH C13

    Female reproductive system, menstrual disorders, PCOS, endometriosis, menopause, contraception, HRT, gynae oncology overlap.

  • Men's Health
    ICD-11 GB00–GC8Z · MeSH C12

    Male reproductive system, andrology, testosterone, erectile dysfunction, prostate, sexual health, male-specific overlap with cardiovascular.

  • Paediatrics & Child Health
    ICD-11 KA00–KD5Z · MeSH M01.060.406

    Neonatal, infant, childhood and adolescent medicine across all organ systems. Developmental, behavioural, growth, vaccination, paediatric oncology overlap.

  • Geriatrics & Older Persons
    ICD-11 * · MeSH M01.060.116

    Medicine of older adults: frailty, falls, polypharmacy, cognitive decline, functional assessment, end-of-life, multimorbidity.

  • Haematology
    ICD-11 3A00–3C0Z · MeSH C15

    Blood and blood-forming organs. Anaemia, clotting and bleeding disorders, haemoglobinopathies, transfusion, haematological malignancy overlap with oncology.

  • Oncology
    ICD-11 2A00–2F9Z · MeSH C04

    Solid tumour and haematological cancer treatment, screening, survivorship, palliative intent. Systemic therapy evidence, immunotherapy, radiation, biomarker testing.

  • Infectious Disease
    ICD-11 1A00–1H0Z · MeSH C01

    Infectious and parasitic disease, antimicrobial therapy and stewardship, global/travel health, sepsis, HIV, TB, hepatitis, emerging pathogens.

  • Immunology & Allergy
    ICD-11 4A00–4B4Z · MeSH C20

    Immune system disorders, primary immunodeficiency, autoimmunity overlap, allergy, anaphylaxis, asthma overlap, immunotherapy.

  • Emergency & Critical Care
    ICD-11 NA00–NF2Z · MeSH E02.365

    Acute resuscitation, trauma, emergency medicine, intensive care, sepsis management, ventilation, shock, mass-casualty triage.

  • Rehabilitation, Pain & Palliative
    ICD-11 * · MeSH E02.760, G11

    Physical rehabilitation, chronic pain management, palliative and end-of-life care, symptom control, hospice, functional restoration.

  • Public Health & Preventive Medicine
    ICD-11 QA00–QF4Z · MeSH N06

    Population health, screening, vaccination, epidemiology, health-promotion interventions, social determinants, cost-effectiveness.

  • Dental & Oral Health
    ICD-11 DA00–DA0Z · MeSH C07

    Teeth, gingiva, oral mucosa, salivary glands, jaw. Caries, periodontal disease, oral cancer screening, paediatric dentistry overlap, orthodontics.

The pipeline

What happens when a question comes in

Ten stages from question intake to the cited answer on screen. Most of them run in parallel. The whole thing finishes in 30–60 seconds.

  1. Stage 01 · Routing: Nerd Burger 3-stage router

    Stage A keyword/regex match against 24 specialist patterns. Stage B MeSH-tree walk for ambiguous cases. Stage C LLM fallback (Haiku) with department menu. Output: one to five routed Sliders.

  2. Stage 02 · Smart reuse cache check

    Normalised question hash lookup via find_recent_search RPC. If the same question (case + punctuation + stopword normalised) was completed within 90 days, reuse the result. Skippable with the ‘fresh search’ checkbox.

  3. Stage 03 · Literature fan-out: 11 sources in parallel

    Each routed Slider runs its own parallel search across PubMed (RCT/SR + dedicated Practice Guideline streams), Europe PMC, ClinicalTrials.gov v2, Semantic Scholar, CORE, Crossref, Epistemonikos. Each source has its own 15-second timeout. One slow source never blocks the rest.

  4. Stage 04 · Dedupe + CEBM rank

    Cross-source dedupe by DOI → PMID → NCT → fuzzy title+year. Then rank: evidence_tier × recency_decay × relevance × Jadad heuristic. Top 15 citations carried forward to synthesis.

  5. Stage 3.5 · Safety overlay: US drug ground truth

    Drug names extracted from the question, resolved to RxNorm, then OpenFDA label sections (indications, contraindications, warnings, adverse reactions, drug interactions, pregnancy) and FAERS top-5 adverse events. Special-population regex flags pregnancy, paediatric, geriatric, renal, hepatic, breastfeeding. Built into a structured promptBlock.

  6. Stage 3.6 · AU context overlay: Australian regulatory ground truth

    PBS API for authority/restriction text + subsidised brand listings, NCTS Ontoserver for SNOMED CT-AU expansion of clinical terms, TGA CKAN discovery + safety alert deep-links. Conditional firing — only the relevant arm runs per question. Built into a structured promptBlock.

  7. Stage 05 · Authoritative source matcher

    Deterministic walker over SOURCE_REGISTRY for any source whose authoritativeFor[] keywords appear in the question. Surfaces eTG, AMH, HealthPathways, PBS API, NCTS, TGA, NHMRC, health.gov.au as click-through banners above the answer. Pure registry walk, no LLM call.

  8. Stage 06 · Layered LLM synthesis

    Anthropic Claude Sonnet 4.6 → OpenAI GPT-5 → Google Gemini 2.5 Pro fallback chain via Vercel AI Gateway. Both clinician (PRISMA Zod schema) and plain-language (Y8 Zod schema with hard minimum character counts) generated in parallel. Single-provider calls are forbidden for the synthesis path.

  9. Stage 07 · Citation validator

    Every [Ref N] pointer in the generated prose is walked and validated against the top-15 citation block. Orphan references (N out of bounds) and unreferenced citations (in the block but not cited in the prose) are both logged. The job still ships, but quality metrics track hallucinations over time.

  10. Stage 08 · Render

    Authoritative banner → safety overlay banner → AU context banner → PRISMA-aligned clinician answer → CEBM pyramid + ranked citations → further reading link-outs → progress trace. Top to bottom.
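The dedupe cascade in the "Dedupe + CEBM rank" stage (DOI → PMID → NCT → fuzzy title+year) can be approximated as below. This is a sketch under stated assumptions: the `Citation` shape is hypothetical, and "fuzzy" matching is reduced to a normalised title plus year, which is cruder than a real similarity measure.

```typescript
// Hypothetical record shape; real source records carry more fields.
interface Citation {
  title: string;
  year: number;
  doi?: string;
  pmid?: string;
  nct?: string;
}

// Lowercase, strip punctuation, collapse whitespace: a crude stand-in for fuzzy matching.
function normaliseTitle(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Keys a record can be matched on, strongest identifier first.
function dedupeKeys(c: Citation): string[] {
  const keys: string[] = [];
  if (c.doi) keys.push("doi:" + c.doi.toLowerCase());
  if (c.pmid) keys.push("pmid:" + c.pmid);
  if (c.nct) keys.push("nct:" + c.nct.toUpperCase());
  keys.push("title:" + normaliseTitle(c.title) + ":" + c.year);
  return keys;
}

// Copy identifiers the kept record is missing.
function mergeIds(target: Citation, from: Citation): void {
  if (!target.doi && from.doi) target.doi = from.doi;
  if (!target.pmid && from.pmid) target.pmid = from.pmid;
  if (!target.nct && from.nct) target.nct = from.nct;
}

function dedupe(citations: Citation[]): Citation[] {
  const index = new Map<string, Citation>();
  const unique: Citation[] = [];
  for (const c of citations) {
    let kept: Citation | undefined;
    for (const key of dedupeKeys(c)) {
      kept = index.get(key);
      if (kept) break; // first (strongest) matching key wins
    }
    if (kept) {
      mergeIds(kept, c);
    } else {
      kept = { title: c.title, year: c.year };
      mergeIds(kept, c);
      unique.push(kept);
    }
    // Re-index so identifiers gained by merging also catch later duplicates.
    for (const key of dedupeKeys(kept)) index.set(key, kept);
  }
  return unique;
}
```

One limitation of this simplification: two records with different DOIs but the same normalised title and year are merged, which a production dedupe would likely guard against.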

Three kinds of structured facts

What the Nerd Burger feeds the LLM

When the synthesis layer writes an answer, it has three structured fact layers available. Each layer has its own boundary rules so the model never conflates peer-reviewed evidence with regulatory data.

Layer 1

Literature citations

Always carries `[Ref N]` pointers. The only layer that gets reference numbers in the prose. The LLM is forbidden to fabricate citations or cite N greater than the citation count.

Triggered: Every search where any literature was retrieved
Sources: PubMed + Europe PMC + ClinicalTrials.gov + Semantic Scholar + CORE + Crossref + Epistemonikos
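The literature layer depends on the fan-out rule that each source gets its own 15-second timeout and one slow source never blocks the rest. A minimal sketch of that pattern follows. The `SourceClient` and `SourceOutcome` shapes and the function names are hypothetical, and each source client is reduced to a single async function.

```typescript
// Hypothetical shapes: one async search function per source.
interface SourceClient {
  id: string;
  search: (query: string) => Promise<string[]>;
}

interface SourceOutcome {
  id: string;
  ok: boolean;
  records: string[];
  error?: string;
}

// Reject if `work` has not settled within `ms`; the timer is always cleared.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error("timeout after " + ms + " ms")), ms);
  });
  return Promise.race([work, deadline]).then(
    (value) => { clearTimeout(timer); return value; },
    (err) => { clearTimeout(timer); throw err; }
  );
}

// Run every source in parallel. A failure or timeout becomes an
// `ok: false` outcome, so Promise.all never rejects.
function fanOut(
  query: string,
  clients: SourceClient[],
  timeoutMs: number = 15000
): Promise<SourceOutcome[]> {
  const tasks = clients.map((client) =>
    withTimeout(Promise.resolve().then(() => client.search(query)), timeoutMs).then(
      (records): SourceOutcome => ({ id: client.id, ok: true, records }),
      (err): SourceOutcome => ({
        id: client.id,
        ok: false,
        records: [],
        error: String(err instanceof Error ? err.message : err),
      })
    )
  );
  return Promise.all(tasks);
}
```

Because each task resolves to an outcome object instead of rejecting, a caller can pass the successful records on to dedupe and log the failed sources separately.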

Layer 2

Safety overlay

Structured drug facts from US FDA. Treated as ground truth, NOT as new citations. Inline references like ‘FDA black box warning for…’ are allowed, but no `[Ref N]` is generated for them.

Triggered: When a known drug name is detected in the question
Sources: RxNorm (drug identity) + OpenFDA (labels) + FAERS (adverse events)
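The safety overlay's special-population flags (pregnancy, paediatric, geriatric, renal, hepatic, breastfeeding) are described as regex-driven. The sketch below shows one way such flagging could work. The patterns are illustrative guesses, not the product's actual expressions.

```typescript
type Population =
  | "pregnancy" | "paediatric" | "geriatric"
  | "renal" | "hepatic" | "breastfeeding";

// Hypothetical patterns; a real list would be broader and clinically reviewed.
const POPULATION_PATTERNS: Array<[Population, RegExp]> = [
  ["pregnancy", /\b(pregnan\w*|antenatal|prenatal|gestation\w*)\b/i],
  ["paediatric", /\b(pa?ediatric\w*|child(ren)?|infants?|neonat\w*|adolescen\w*)\b/i],
  ["geriatric", /\b(geriatric\w*|elderly|older (adults?|persons?|people)|frail\w*)\b/i],
  ["renal", /\b(renal|kidney|ckd|dialysis|egfr)\b/i],
  ["hepatic", /\b(hepatic|liver|cirrhosis)\b/i],
  ["breastfeeding", /\b(breast-?feeding|lactat\w*)\b/i],
];

// Return every population whose pattern matches the question, in list order.
function flagPopulations(question: string): Population[] {
  return POPULATION_PATTERNS
    .filter(([, pattern]) => pattern.test(question))
    .map(([population]) => population);
}
```

Keyword patterns of this kind miss implicit cues such as an age stated in years, so they can only serve as a first-pass flag.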

Layer 3

AU context overlay

Structured AU regulatory facts. Clinician mode references SNOMED concepts inline as `(SNOMED <code> |<display>|)` and PBS authority status as `(PBS authority required)`. Plain mode explains PBS authority in patient-friendly language. Never gets a `[Ref N]` either.

Triggered: When a drug OR a clinical term is detected (PBS+TGA fire on drugs, NCTS fires on terms)
Sources: PBS API v3 + NCTS Ontoserver (SNOMED CT-AU + AMT) + TGA via data.gov.au CKAN
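The conditional firing rule (PBS and TGA fire on drugs, NCTS fires on clinical terms) and the inline SNOMED format can be written down compactly. This is a sketch only: `DetectedEntities` and both function names are hypothetical, and entity detection is assumed to happen upstream.

```typescript
type AuArm = "PBS" | "TGA" | "NCTS";

// Hypothetical output of an upstream entity-detection step.
interface DetectedEntities {
  drugs: string[];
  clinicalTerms: string[];
}

// Only the relevant arms run: PBS + TGA for drugs, NCTS for clinical terms.
function selectAuArms(entities: DetectedEntities): AuArm[] {
  const arms: AuArm[] = [];
  if (entities.drugs.length > 0) arms.push("PBS", "TGA");
  if (entities.clinicalTerms.length > 0) arms.push("NCTS");
  return arms;
}

// Clinician-mode inline format: (SNOMED <code> |<display>|)
function formatSnomed(code: string, display: string): string {
  return "(SNOMED " + code + " |" + display + "|)";
}
```

A question with no detected drug or clinical term selects no arms, so the overlay is skipped entirely.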

Why some studies count more

The CEBM evidence pyramid

A large systematic review of randomised trials outranks a single case series or expert opinion. We rank studies the way the Centre for Evidence-Based Medicine specifies — and the ranking changes depending on the question type.

  • Tier 1a · SR / meta-analysis of RCTs · ×1.00
    e.g. Cochrane review of statins for primary prevention
  • Tier 1b · Single RCT (or SR of inception cohorts for prognosis) · ×0.85
    e.g. JUPITER trial for rosuvastatin
  • Tier 2a · SR of cohort studies · ×0.70
    e.g. SR of cohort studies linking PPI use to fracture
  • Tier 2b · Single cohort study · ×0.60
    e.g. Framingham Heart Study cohort analysis
  • Tier 3a · SR of case-control studies · ×0.45
    e.g. SR of case-control studies on NSAIDs and AKI
  • Tier 3b · Single case-control study · ×0.35
    e.g. Case-control of clopidogrel and bleeding
  • Tier 4 · Case series / case report · ×0.25
    e.g. Case series of rare drug interactions
  • Tier 5 · Expert opinion / narrative review · ×0.15
    e.g. Editorial in NEJM

Question-type weighted: a treatment question puts SR-of-RCTs at tier 1a; a test-accuracy question puts SR-of-validation studies at tier 1a and weights RCTs as tier 2a; a prognosis question puts SR-of-inception cohorts at tier 1a; an aetiology question puts cohort SRs first. The full mapping lives in packages/config/src/evidence-tiers.ts.
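To make the ranking arithmetic concrete, here is a sketch of the score evidence_tier × recency_decay × relevance × Jadad heuristic, using the tier multipliers listed above. Several parts are assumptions: the exponential decay with a 10-year half-life, the `qualityMultiplier` field standing in for the Jadad heuristic, and all type and function names. Question-type remapping is left out; the actual mapping lives in packages/config/src/evidence-tiers.ts and is not reproduced here.

```typescript
type Tier = "1a" | "1b" | "2a" | "2b" | "3a" | "3b" | "4" | "5";

// Multipliers as listed in the pyramid above.
const TIER_WEIGHT: Record<Tier, number> = {
  "1a": 1.0, "1b": 0.85, "2a": 0.7, "2b": 0.6,
  "3a": 0.45, "3b": 0.35, "4": 0.25, "5": 0.15,
};

// Hypothetical input shape for one citation.
interface Scorable {
  id: string;
  tier: Tier;
  year: number;
  relevance: number;          // assumed to lie in 0..1
  qualityMultiplier: number;  // stand-in for the Jadad heuristic
}

const HALF_LIFE_YEARS = 10; // assumption: the source does not state the decay form

function recencyDecay(year: number, currentYear: number): number {
  const age = Math.max(0, currentYear - year);
  return Math.pow(0.5, age / HALF_LIFE_YEARS);
}

function rankScore(c: Scorable, currentYear: number): number {
  return TIER_WEIGHT[c.tier] * recencyDecay(c.year, currentYear) * c.relevance * c.qualityMultiplier;
}

// Highest score first; only the top `limit` citations go forward to synthesis.
function rankTop(citations: Scorable[], currentYear: number, limit: number = 15): Scorable[] {
  return citations
    .slice()
    .sort((a, b) => rankScore(b, currentYear) - rankScore(a, currentYear))
    .slice(0, limit);
}
```

Under these assumptions a ten-year-old tier 1a review scores 1.00 × 0.5 = 0.50 and ranks below a current-year tier 1b trial at 0.85, which shows how recency can outweigh tier in a multiplicative score.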

The shape of the answer

PRISMA-style layout + citation validation

The clinician answer borrows the PRISMA 2020 reporting layout — the section structure developed for systematic reviews — so readers can scan methods, results and limitations in a familiar shape. PICO-SEARCH retrieves the top-tier evidence (existing systematic reviews, meta-analyses and RCTs) and presents it in PRISMA's section format — it does NOT conduct a new systematic review (that is a 6–18 month dual-reviewer workflow). Every [Ref N] pointer in the prose is checked against the citation block before the answer ships. No fabricated studies. Ever.

The clinician answer is generated via generateObject + a Zod schema modelled on PRISMA 2020 reporting standards. Every section is structurally validated:

  • Background — minimum character count, sets up the clinical question
  • Methods — the search strategy, sources searched, dates, study types
  • Results — narrative synthesis with [Ref N] pointers and an evidence-tier breakdown
  • Limitations — risk of bias, heterogeneity, gaps in the evidence
  • Conclusion — GRADE strength rating + practice recommendation
  • Authoritative sources — registry-matched click-throughs (eTG, AMH, etc.) for any paywalled references

Post-synthesis, the citation validator walks every [Ref N] in the prose against the citation block. Orphans are logged and surfaced. The job ships even with orphans (so the user always gets an answer) but the quality metric tracks hallucination rate over time.
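A minimal sketch of that validation walk is shown below. It assumes 1-based reference numbers and a report shape (`ValidationReport`) that is hypothetical. It covers both cases the pipeline logs: orphan references and citations that the prose never cites.

```typescript
// Hypothetical report shape.
interface ValidationReport {
  orphans: number[];       // [Ref N] where N falls outside 1..citationCount
  unreferenced: number[];  // citations in the block that the prose never cites
}

function validateCitations(prose: string, citationCount: number): ValidationReport {
  const cited = new Set<number>();
  const orphans: number[] = [];
  const pointer = /\[Ref (\d+)\]/g;
  let match: RegExpExecArray | null;

  // Walk every [Ref N] pointer in the prose.
  while ((match = pointer.exec(prose)) !== null) {
    const n = Number(match[1]);
    if (n >= 1 && n <= citationCount) {
      cited.add(n);
    } else if (orphans.indexOf(n) === -1) {
      orphans.push(n); // out of bounds: likely a hallucinated reference
    }
  }

  // Any citation number never seen in the prose is unreferenced.
  const unreferenced: number[] = [];
  for (let n = 1; n <= citationCount; n++) {
    if (!cited.has(n)) unreferenced.push(n);
  }
  return { orphans, unreferenced };
}
```

The function only reports. Whether to ship the answer is left to the caller, which matches the behaviour described above: the job ships and the counts feed a quality metric.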

The science

Evidence-Based Medicine, in three minutes

EBM is the discipline of applying the best current evidence in clinical practice. PICO is how the question is structured. CEBM is how the answer is graded. GRADE is how confidence in that answer is expressed. PICO-SEARCH applies all three.

PICO

Population — Intervention — Comparator — Outcome. The structure that turns a vague clinical question (‘should I give statins to my 75-year-old?’) into a searchable one (‘in adults over 70 without cardiovascular disease, do statins reduce all-cause mortality compared with placebo?’).

CEBM hierarchy

The Centre for Evidence-Based Medicine (Oxford) ranks evidence from tier 1a (systematic reviews of RCTs) down to tier 5 (expert opinion). PICO-SEARCH uses the Burns/Rohrich/Chung (2011) mapping with question-type weighting.

GRADE

The Grading of Recommendations Assessment, Development and Evaluation framework. After ranking the evidence, GRADE expresses how confident a clinician should be in the recommendation: high / moderate / low / very low. The clinician answer always includes a GRADE rating.

PRISMA 2020

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2020 standard. Defines the section layout a systematic review report should use. PICO-SEARCH borrows this LAYOUT for its rapid evidence summaries — we use PRISMA's section headings so clinicians can scan methods/results/limitations in a familiar shape. PICO-SEARCH is NOT a systematic review. We use the reporting shell; we are not claiming the methodological rigour of a full SR.

AGREE II

Appraisal of Guidelines for Research and Evaluation. The standard for assessing whether a clinical practice guideline is well-developed. Used implicitly when ranking guideline citations.

Risk of bias

Cochrane RoB 2.0 for RCTs, Newcastle-Ottawa for cohorts. PICO-SEARCH applies a Jadad heuristic on RCT abstracts as a quality multiplier in the ranking score; full RoB is on the deferred queue.

The bright lines

What the architecture never does

The product is built on a few non-negotiables — encoded in the source code, the system prompts, and the Zod schemas. Not disclaimers at the bottom; actual architectural guardrails.

Never scrape licensed content

eTG, AMH, MIMS, UpToDate, BMJ Best Practice, DynaMed and 12 other commercial references are tier ‘licensed_linkout’. We surface a click-through banner. We never fetch their content. Ever.
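The link-out rule pairs naturally with the deterministic registry walk over `authoritativeFor[]` keywords. The sketch below shows both together. Only the `licensed_linkout` tier name and the `authoritativeFor` field come from this page. The `open_api` tier, the sample entries, their keywords and the example.org URLs are placeholders.

```typescript
// "licensed_linkout" is named on this page; "open_api" is a hypothetical second tier.
type SourceTier = "licensed_linkout" | "open_api";

interface RegistrySource {
  id: string;
  label: string;
  tier: SourceTier;
  url: string;                 // placeholder URLs below
  authoritativeFor: string[];  // keywords that trigger a banner
}

const SAMPLE_REGISTRY: RegistrySource[] = [
  { id: "etg", label: "eTG", tier: "licensed_linkout", url: "https://example.org/etg",
    authoritativeFor: ["antibiotic", "prescribing"] },
  { id: "pbs", label: "PBS API", tier: "open_api", url: "https://example.org/pbs",
    authoritativeFor: ["pbs", "subsidy"] },
];

// Pure registry walk: substring match on lowercased text, no LLM call.
function matchAuthoritativeSources(question: string, registry: RegistrySource[]): RegistrySource[] {
  const q = question.toLowerCase();
  return registry.filter((s) =>
    s.authoritativeFor.some((keyword) => q.indexOf(keyword.toLowerCase()) !== -1)
  );
}

// Licensed sources are only ever linked to, never fetched.
function mayFetchContent(source: RegistrySource): boolean {
  return source.tier !== "licensed_linkout";
}

interface Banner { label: string; href: string; fetchContent: boolean; }

function toBanner(source: RegistrySource): Banner {
  return { label: source.label, href: source.url, fetchContent: mayFetchContent(source) };
}
```

Putting the guard in one function that every fetch path must consult is one way to turn the rule into an architectural check, as opposed to a convention.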

Never provide dosing in plain mode

The plain-language system prompt forbids dosing, frequencies, routes, schedules, or titration. If a regression appears, fix the prompt before shipping.

Never fabricate citations

Citation validator walks every `[Ref N]` pointer in the prose against the top-15 citation block. Orphan references are logged. Synthesis schemas carry hard min character counts so the LLM cannot produce a shallow placeholder answer.

Never give a verdict on an individual

Plain answers use ‘studies suggest’, ‘evidence indicates’, ‘researchers found’ — never ‘you have…’. The clinician answer is an evidence summary built from published literature. It is not a verdict on any individual person or case.

Never store subscription credentials

User subscription preferences (eTG / AMH / UpToDate / BMJ Best Practice / DynaMed / Cochrane / NICE) are boolean flags only. We never store passwords or tokens for licensed third parties. Ever.

Never use a single LLM provider for synthesis

The synthesis path requires the Anthropic → OpenAI → Google fallback chain. Single-provider calls are forbidden in code review. Resilience + clinical safety.
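A provider fallback chain of this kind can be sketched in a few lines. The `Provider` interface and function name are hypothetical, no real SDK or gateway call is made, and the runtime check that rejects a one-provider chain is an assumption (the page says only that single-provider calls are forbidden in code review).

```typescript
// Hypothetical provider wrapper; a real one would call a model API.
interface Provider {
  name: string;
  generate: (prompt: string) => Promise<string>;
}

interface SynthesisResult {
  provider: string;
  text: string;
}

// Try each provider in order and return the first success.
async function synthesiseWithFallback(prompt: string, chain: Provider[]): Promise<SynthesisResult> {
  if (chain.length < 2) {
    // Assumed guard: mirror the "no single provider" rule at runtime.
    throw new Error("synthesis requires a fallback chain, not a single provider");
  }
  const failures: string[] = [];
  for (const provider of chain) {
    try {
      const text = await provider.generate(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      failures.push(provider.name + ": " + String(err));
    }
  }
  throw new Error("all providers failed: " + failures.join("; "));
}
```

Recording which provider produced the answer makes it possible to track how often the chain falls past the first entry.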

The honest gaps

What the brain doesn't have yet

Every shipping product has open work. Here is what is still on the roadmap, ranked by leverage.

  1. PubTator + Unpaywall source clients

    Free PDF link button on every citation card via Unpaywall. Entity-tag chips on PubMed cards via PubTator 3.0. Both Apache/MIT, both free REST, both shippable in one commit. ETA: next sprint.

  2. NCTS Syndication TS port

    Pulls the NCTS Atom feed daily and stamps every synthesis answer with ‘Pinned to AMT v3 release YYYY-MM’. Defensible against ‘your data is stale’ critique. ETA: ~1 week.

  3. @iebh/sra-polyglot live in-browser

    Bond IEBH ship an MIT npm package that translates any PubMed search query into Ovid, Embase, Cochrane, CINAHL, Web of Science, Scopus syntaxes. Replace our Polyglot link-out with the live translator. ETA: 1 commit.

  4. BioLinkBERT local re-ranker

    Stanford LinkBERT (Apache-2.0) outperforms PubMedBERT on BLURB. A local re-ranking pass between source fan-out and LLM synthesis is expected to cut the Sonnet token bill by ~40–60% per query without hurting answer quality. ETA: ~1 week.

  5. Regression suite: EBM-NLP + MS² + MedReview

    Hard PICO span F1 + ROUGE numbers we can defend against the surveyed open SR-automation tools. ETA: 3-5 days.