Behind the burger

How 25 AI agents, 119 sources and 10 pipeline stages turn a question into a cited answer

The name is playful — the architecture is not. Nerd Burger is real-time evidence retrieval that surfaces existing systematic reviews, meta-analyses and RCTs — it does not conduct a new systematic review. Here is exactly what happens inside the burger, layer by layer.

What's in the brain right now

AI agents

125

evidence sources

143

trusted sites

pipeline stages

ground-truth layers

AU gov clients

Numbers are read live from SOURCE_REGISTRY, REPUTABLE_SITES and DEPARTMENTS on every page load. In addition, 3 MIT-licensed npm packages imported from Bond IEBH are due to be wired in over upcoming sprints.

Scope honesty

What PICO-SEARCH retrieves, and what it doesn't do

There are three tiers of evidence review. PICO-SEARCH is the fast one: it surfaces the top-tier evidence that already exists (systematic reviews, meta-analyses, RCTs, guidelines) and grades it against the CEBM hierarchy in real time. It does NOT conduct a new rapid review or a new Cochrane systematic review; those are multi-reviewer workflows measured in weeks to months. We borrow the PRISMA reporting layout so the output looks familiar, but that is a layout choice, not a methodology claim.

Tier	Effort	Dual-reviewer	PROSPERO	Protocol	PICO-SEARCH
Full Cochrane systematic review	6–18 months	Yes	Yes	Yes	Not this
Rapid review (WHO / Cochrane RRMG)	Days–weeks	Usually no	Optional	Yes	Not this either
Real-time evidence retrieval	Seconds–minutes	No	No	No	This is PICO-SEARCH

What that means in practice. PICO-SEARCH runs the literature search, grades each study against the CEBM hierarchy, and returns a cited rapid evidence summary in a PRISMA-style layout — in under a minute. It is a literature search engine with an evidence-grading layer on top. It is not a medical device. It does not diagnose, treat, or make decisions about individual patients. When the evidence warrants a full systematic review, PICO-SEARCH is the starting point: you export the search strategy + RIS file and escalate into Covidence, Rayyan, RevMan, or DistillerSR to run the full SR workflow with two human reviewers.

The team

One Nerd Burger and 24 Slider specialist agents

Each Slider is a specialist AI agent with its own system prompt, its own ICD-11 chapter scope, its own MeSH tree roots, and its own preferred sources. They are not mascots. They are twenty-four domain specialists, each configured for one clinical area.

The router

Nerd Burger (the Big Burger)

router

3-stage router: keyword/regex (24 specialist patterns) → MeSH-tree walk → LLM fallback (Haiku) with department menu. Outputs 1–5 routed Sliders. Never searches evidence directly. Synthesises cross-department answers when multiple Sliders are engaged.
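The three-stage routing order can be sketched roughly as follows. This is a minimal illustration, not the production router: `RouterStages`, `routeQuestion`, `exampleStages` and the slider ids are hypothetical names, the MeSH walk and the LLM fallback are injected as plain stub functions (a real LLM call would be asynchronous), and the sketch assumes that an empty result is what triggers the next stage.

```typescript
// Hypothetical sketch of a three-stage router: regex, then MeSH walk, then LLM fallback.
type SliderId = string;

interface RouterStages {
  patterns: Array<{ slider: SliderId; pattern: RegExp }>; // Stage A: keyword/regex
  meshWalk: (question: string) => SliderId[];             // Stage B: stub for the MeSH-tree walk
  llmFallback: (question: string) => SliderId[];          // Stage C: stub for the LLM fallback
}

const MAX_SLIDERS = 5; // the router outputs one to five Sliders

function unique<T>(items: T[]): T[] {
  return Array.from(new Set(items));
}

function routeQuestion(question: string, stages: RouterStages): SliderId[] {
  // Stage A: cheap deterministic match against each specialist pattern.
  const byKeyword = stages.patterns
    .filter(({ pattern }) => pattern.test(question))
    .map(({ slider }) => slider);
  if (byKeyword.length > 0) return unique(byKeyword).slice(0, MAX_SLIDERS);

  // Stage B: only reached when no pattern matched (assumption of this sketch).
  const byMesh = stages.meshWalk(question);
  if (byMesh.length > 0) return unique(byMesh).slice(0, MAX_SLIDERS);

  // Stage C: last resort, ask the model to pick from the department menu.
  return unique(stages.llmFallback(question)).slice(0, MAX_SLIDERS);
}

// Illustrative configuration only; the patterns and ids are made up.
const exampleStages: RouterStages = {
  patterns: [
    { slider: "cardiovascular", pattern: /\b(statins?|heart failure|hypertension)\b/i },
    { slider: "respiratory", pattern: /\b(asthma|copd)\b/i },
  ],
  meshWalk: () => [],
  llmFallback: () => ["public-health"],
};
```

Keeping the later stages injectable means the deterministic stage can be tested without any network or model access.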

The 24 Slider AI agents

Neurology & Nervous System →
ICD-11 8A00–8E7Z · MeSH C10
Diseases of the central and peripheral nervous system: stroke, dementia, epilepsy, headache, neurodegeneration, neuromuscular disease.
Ophthalmology & Vision →
ICD-11 9A00–9E1Z · MeSH C11
Diseases of the eye, visual system, and adnexa. Glaucoma, macular degeneration, retinopathy, cataract, refractive disorders.
Ear, Nose & Throat →
ICD-11 AA00–AC0Z · MeSH C09
Diseases of the ear, nose, throat, sinus, larynx, and mastoid. Hearing loss, otitis, rhinosinusitis, head & neck infection.
Cardiovascular →
ICD-11 BA00–BE2Z · MeSH C14
Heart and vascular disease: coronary disease, heart failure, arrhythmia, hypertension, lipid disorders, valvular disease, stroke overlap.
Respiratory →
ICD-11 CA00–CB7Z · MeSH C08
Respiratory disease: asthma, COPD, ILD, pneumonia, sleep-disordered breathing, tuberculosis, pulmonary hypertension.
Gastrointestinal & Hepatology →
ICD-11 DA00–DE2Z · MeSH C06
GI tract, liver, biliary, pancreas. IBD, IBS, viral hepatitis, fatty liver, reflux, peptic disease, functional GI.
Endocrine, Nutrition & Metabolic →
ICD-11 5A00–5D46 · MeSH C18, C19
Endocrine, nutrition, metabolism: diabetes, obesity, thyroid, adrenal, pituitary, osteoporosis, lipid, weight-loss pharmacology (GLP-1 etc.).
Renal & Urology →
ICD-11 GA00–GC8Z · MeSH C12, C13
Kidney disease, electrolyte and acid-base disorders, urinary tract, bladder, prostate, stone disease, nephrology overlap with CV and endocrine.
Musculoskeletal & Rheumatology →
ICD-11 FA00–FC0Z · MeSH C05, C17
Bone, joint, muscle, connective tissue, rheumatology. OA, RA, spondyloarthropathy, SLE, fibromyalgia, orthopaedic trauma.
Dermatology →
ICD-11 EA00–EM0Z · MeSH C17
Skin, hair, nails, subcutaneous tissue. Eczema, psoriasis, acne, skin cancer, infections, drug eruptions, paediatric dermatology.
Mental Health & Psychiatry →
ICD-11 6A00–6E8Z · MeSH F03
Mood, anxiety, psychotic, neurodevelopmental, substance use, eating disorders, trauma. Psychopharmacology + psychotherapy evidence.
Obstetrics & Maternity →
ICD-11 JA00–JB6Z · MeSH C13.703
Pregnancy, labour, delivery, postpartum, maternal medicine, prenatal screening, fetal medicine, breastfeeding.
Gynaecology & Women's Health →
ICD-11 GA00–GA4Z · MeSH C13
Female reproductive system, menstrual disorders, PCOS, endometriosis, menopause, contraception, HRT, gynae oncology overlap.
Men's Health →
ICD-11 GB00–GC8Z · MeSH C12
Male reproductive system, andrology, testosterone, erectile dysfunction, prostate, sexual health, male-specific overlap with cardiovascular.
Paediatrics & Child Health →
ICD-11 KA00–KD5Z · MeSH M01.060.406
Neonatal, infant, childhood and adolescent medicine across all organ systems. Developmental, behavioural, growth, vaccination, paediatric oncology overlap.
Geriatrics & Older Persons →
ICD-11 * · MeSH M01.060.116
Medicine of older adults: frailty, falls, polypharmacy, cognitive decline, functional assessment, end-of-life, multimorbidity.
Haematology →
ICD-11 3A00–3C0Z · MeSH C15
Blood and blood-forming organs. Anaemia, clotting and bleeding disorders, haemoglobinopathies, transfusion, haematological malignancy overlap with oncology.
Oncology →
ICD-11 2A00–2F9Z · MeSH C04
Solid tumour and haematological cancer treatment, screening, survivorship, palliative intent. Systemic therapy evidence, immunotherapy, radiation, biomarker testing.
Infectious Disease →
ICD-11 1A00–1H0Z · MeSH C01
Infectious and parasitic disease, antimicrobial therapy and stewardship, global/travel health, sepsis, HIV, TB, hepatitis, emerging pathogens.
Immunology & Allergy →
ICD-11 4A00–4B4Z · MeSH C20
Immune system disorders, primary immunodeficiency, autoimmunity overlap, allergy, anaphylaxis, asthma overlap, immunotherapy.
Emergency & Critical Care →
ICD-11 NA00–NF2Z · MeSH E02.365
Acute resuscitation, trauma, emergency medicine, intensive care, sepsis management, ventilation, shock, mass-casualty triage.
Rehabilitation, Pain & Palliative →
ICD-11 * · MeSH E02.760, G11
Physical rehabilitation, chronic pain management, palliative and end-of-life care, symptom control, hospice, functional restoration.
Public Health & Preventive Medicine →
ICD-11 QA00–QF4Z · MeSH N06
Population health, screening, vaccination, epidemiology, health-promotion interventions, social determinants, cost-effectiveness.
Dental & Oral Health →
ICD-11 DA00–DA0Z · MeSH C07
Teeth, gingiva, oral mucosa, salivary glands, jaw. Caries, periodontal disease, oral cancer screening, paediatric dentistry overlap, orthodontics.

The pipeline

What happens when a question comes in

Ten stages from question intake to the cited answer on screen. Most of them run in parallel. The whole thing finishes in 30–60 seconds.

01
Routing — Nerd Burger 3-stage router
Stage A keyword/regex match against 24 specialist patterns. Stage B MeSH-tree walk for ambiguous cases. Stage C LLM fallback (Haiku) with department menu. Output: one to five routed Sliders.
02
Smart reuse cache check
Normalised question hash lookup via find_recent_search RPC. If the same question (case + punctuation + stopword normalised) was completed within 90 days, reuse the result. Skippable with the ‘fresh search’ checkbox.
03
Literature fan-out — 11 sources in parallel
Each routed Slider runs its own parallel search across PubMed (RCT/SR + dedicated Practice Guideline streams), Europe PMC, ClinicalTrials.gov v2, Semantic Scholar, CORE, Crossref, Epistemonikos. Each source has its own 15-second timeout. One slow source never blocks the rest.
04
Dedupe + CEBM rank
Cross-source dedupe by DOI → PMID → NCT → fuzzy title+year. Then rank: evidence_tier × recency_decay × relevance × Jadad heuristic. Top 15 citations carried forward to synthesis.
3.5
Safety overlay — US drug ground truth
Drug names extracted from the question, resolved to RxNorm, then OpenFDA label sections (indications, contraindications, warnings, adverse reactions, drug interactions, pregnancy) and FAERS top-5 adverse events. Special-population regex flags pregnancy, paediatric, geriatric, renal, hepatic, breastfeeding. Built into a structured promptBlock.
3.6
AU context overlay — Australian regulatory ground truth
PBS API for authority/restriction text + subsidised brand listings, NCTS Ontoserver for SNOMED CT-AU expansion of clinical terms, TGA CKAN discovery + safety alert deep-links. Conditional firing — only the relevant arm runs per question. Built into a structured promptBlock.
05
Authoritative source matcher
Deterministic walker over SOURCE_REGISTRY for any source whose authoritativeFor[] keywords appear in the question. Surfaces eTG, AMH, HealthPathways, PBS API, NCTS, TGA, NHMRC, health.gov.au as click-through banners above the answer. Pure registry walk, no LLM call.
06
Layered LLM synthesis
Anthropic Claude Sonnet 4.6 → OpenAI GPT-5 → Google Gemini 2.5 Pro fallback chain via Vercel AI Gateway. Both clinician (PRISMA Zod schema) and plain-language (Y8 Zod schema with hard minimum character counts) generated in parallel. Single-provider calls are forbidden for the synthesis path.
07
Citation validator
Every [Ref N] pointer in the generated prose is walked and validated against the top-15 citation block. Orphan references (N out of bounds) and unreferenced citations (in the block but not cited in the prose) are both logged. The job still ships, but quality metrics track hallucinations over time.
08
Render
Authoritative banner → safety overlay banner → AU context banner → PRISMA-aligned clinician answer → CEBM pyramid + ranked citations → further reading link-outs → progress trace. Top to bottom.
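The cross-source dedupe in stage 04 can be sketched as below. This is a simplified illustration under stated assumptions: the `Citation` shape and function names are hypothetical, and the "fuzzy title+year" step is approximated by an exact match on a normalised title, which is weaker than real fuzzy matching. A record counts as a duplicate if any of its identifiers has been seen before, checked in the order DOI, PMID, NCT, then title and year.

```typescript
// Hypothetical citation shape; real records carry many more fields.
interface Citation {
  title: string;
  year: number;
  doi?: string;
  pmid?: string;
  nct?: string;
}

// Build the identifier keys for one citation, in priority order.
function identifierKeys(c: Citation): string[] {
  const keys: string[] = [];
  if (c.doi) keys.push("doi:" + c.doi.trim().toLowerCase());
  if (c.pmid) keys.push("pmid:" + c.pmid.trim());
  if (c.nct) keys.push("nct:" + c.nct.trim().toUpperCase());
  // Stand-in for fuzzy matching: lowercase, collapse punctuation and whitespace.
  const normalisedTitle = c.title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
  keys.push("title:" + normalisedTitle + "|" + c.year);
  return keys;
}

// Keep the first occurrence of each study; drop later records sharing any key.
function dedupe(citations: Citation[]): Citation[] {
  const seen = new Set<string>();
  const kept: Citation[] = [];
  for (const c of citations) {
    const keys = identifierKeys(c);
    if (keys.some((k) => seen.has(k))) continue;
    keys.forEach((k) => seen.add(k));
    kept.push(c);
  }
  return kept;
}
```

Registering every key for a kept record is what lets a PubMed record (DOI plus PMID) absorb a later record from another source that carries only the PMID or only the title.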

Three kinds of structured facts

What the Nerd Burger feeds the LLM

When the synthesis layer writes an answer, it has three structured fact layers available. Each layer has its own boundary rules so the model never conflates peer-reviewed evidence with regulatory data.

Layer 1

Literature citations

Always carries `[Ref N]` pointers. The only layer that gets reference numbers in the prose. The LLM is forbidden to fabricate citations or cite N greater than the citation count.

Triggered: Every search where any literature was retrieved

Sources: PubMed + Europe PMC + ClinicalTrials.gov + Semantic Scholar + CORE + Crossref + Epistemonikos

Layer 2

Safety overlay

Structured drug facts from US FDA. Treated as ground truth, NOT as new citations. Inline references like ‘FDA black box warning for…’ are allowed, but no `[Ref N]` is generated for them.

Triggered: When a known drug name is detected in the question

Sources: RxNorm (drug identity) + OpenFDA (labels) + FAERS (adverse events)

Layer 3

AU context overlay

Structured AU regulatory facts. Clinician mode references SNOMED concepts inline as `(SNOMED <code> |<display>|)` and PBS authority status as `(PBS authority required)`. Plain mode explains PBS authority in patient-friendly language. Never gets a `[Ref N]` either.

Triggered: When a drug OR a clinical term is detected (PBS+TGA fire on drugs, NCTS fires on terms)

Sources: PBS API v3 + NCTS Ontoserver (SNOMED CT-AU + AMT) + TGA via data.gov.au CKAN
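One way to picture the boundary rules is a prompt assembler that numbers only the literature layer. The sketch below is illustrative: `PromptLayers`, `buildPromptContext` and the section labels are hypothetical, and the two overlay blocks are assumed to arrive as already-rendered strings (the structured promptBlock each overlay builds).

```typescript
// Hypothetical input shape for the three structured fact layers.
interface LiteratureCitation {
  title: string;
  year: number;
}

interface PromptLayers {
  literature: LiteratureCitation[]; // Layer 1: the only layer that gets [Ref N]
  safetyPromptBlock?: string;       // Layer 2: present only when a drug was detected
  auContextPromptBlock?: string;    // Layer 3: present only when a drug or term was detected
}

function buildPromptContext(layers: PromptLayers): string {
  const sections: string[] = [];

  if (layers.literature.length > 0) {
    // Reference numbers are assigned here and nowhere else.
    const refs = layers.literature.map(
      (c, i) => "[Ref " + (i + 1) + "] " + c.title + " (" + c.year + ")"
    );
    sections.push(["LITERATURE (cite as [Ref N]):"].concat(refs).join("\n"));
  }
  if (layers.safetyPromptBlock) {
    sections.push("SAFETY OVERLAY (ground truth, never a [Ref N]):\n" + layers.safetyPromptBlock);
  }
  if (layers.auContextPromptBlock) {
    sections.push("AU CONTEXT (ground truth, never a [Ref N]):\n" + layers.auContextPromptBlock);
  }
  return sections.join("\n\n");
}
```

Because the overlays never receive a number, any `[Ref N]` in the generated prose can only point at the literature list.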

Why some studies count more

The CEBM evidence pyramid

A large systematic review of randomised trials outranks a single case series or expert opinion. We rank studies the way the Centre for Evidence-Based Medicine specifies — and the ranking changes depending on the question type.

Tier 1a · SR / meta-analysis of RCTs · ×1.00
e.g. Cochrane review of statins for primary prevention
Tier 1b · Single RCT (or SR of inception cohorts for prognosis) · ×0.85
e.g. JUPITER trial for rosuvastatin
Tier 2a · SR of cohort studies · ×0.70
e.g. SR of cohort studies linking PPI use to fracture
Tier 2b · Single cohort study · ×0.60
e.g. Framingham Heart Study cohort analysis
Tier 3a · SR of case-control studies · ×0.45
e.g. SR of case-control studies on NSAIDs and AKI
Tier 3b · Single case-control study · ×0.35
e.g. Case-control of clopidogrel and bleeding
Tier 4 · Case series / case report · ×0.25
e.g. Case series of rare drug interactions
Tier 5 · Expert opinion / narrative review · ×0.15
e.g. Editorial in NEJM

Question-type weighted: a treatment question puts SR-of-RCTs at tier 1a; a test-accuracy question puts SR-of-validation studies at tier 1a and weights RCTs as tier 2a; a prognosis question puts SR-of-inception cohorts at tier 1a; an aetiology question puts cohort SRs first. The full mapping lives in packages/config/src/evidence-tiers.ts.
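The ranking product (evidence tier × recency decay × relevance × Jadad heuristic) might look roughly like this. The tier multipliers are the ones listed above; everything else is an assumption for illustration: the function names, the exponential recency decay with a 10-year half-life, and the convention that relevance and the Jadad multiplier are already scaled to the range 0 to 1. The tier passed in is assumed to be the question-type-adjusted tier.

```typescript
type Tier = "1a" | "1b" | "2a" | "2b" | "3a" | "3b" | "4" | "5";

// Multipliers taken from the pyramid above.
const TIER_WEIGHT: Record<Tier, number> = {
  "1a": 1.0, "1b": 0.85, "2a": 0.7, "2b": 0.6,
  "3a": 0.45, "3b": 0.35, "4": 0.25, "5": 0.15,
};

interface RankInput {
  tier: Tier;        // tier after question-type weighting
  year: number;      // publication year
  relevance: number; // assumed 0..1
  jadad: number;     // assumed 0..1 quality multiplier
}

// Hypothetical decay curve: the weight halves every `halfLifeYears`.
function recencyDecay(year: number, currentYear: number, halfLifeYears = 10): number {
  const age = Math.max(0, currentYear - year);
  return Math.pow(0.5, age / halfLifeYears);
}

function rankScore(c: RankInput, currentYear: number): number {
  return TIER_WEIGHT[c.tier] * recencyDecay(c.year, currentYear) * c.relevance * c.jadad;
}

// Sort a copy by descending score and keep the top `limit` (15 in the pipeline).
function topCitations<T extends RankInput>(items: T[], currentYear: number, limit = 15): T[] {
  return items
    .slice()
    .sort((a, b) => rankScore(b, currentYear) - rankScore(a, currentYear))
    .slice(0, limit);
}
```

Because the score is a plain product, a recent single RCT can outrank an old systematic review; how aggressive that trade-off is depends entirely on the decay curve chosen.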

The shape of the answer

PRISMA-style layout + citation validation

The clinician answer borrows the PRISMA 2020 reporting layout — the section structure developed for systematic reviews — so readers can scan methods, results and limitations in a familiar shape. PICO-SEARCH retrieves the top-tier evidence (existing systematic reviews, meta-analyses and RCTs) and presents it in PRISMA's section format — it does NOT conduct a new systematic review (that is a 6–18 month dual-reviewer workflow). Every [Ref N] pointer in the prose is checked against the citation block before the answer ships. No fabricated studies. Ever.

The clinician answer is generated via generateObject + a Zod schema modelled on PRISMA 2020 reporting standards. Every section is structurally validated:

Background: minimum character count, sets up the clinical question
Methods: the search strategy, sources searched, dates, study types
Results: narrative synthesis with [Ref N] pointers and an evidence-tier breakdown
Limitations: risk of bias, heterogeneity, gaps in the evidence
Conclusion: GRADE strength rating + practice recommendation
Authoritative sources: registry-matched click-throughs (eTG, AMH, etc.) for any paywalled references

Post-synthesis, the citation validator walks every [Ref N] in the prose against the citation block. Orphans are logged and surfaced. The job ships even with orphans (so the user always gets an answer) but the quality metric tracks hallucination rate over time.
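A citation validator of this kind can be sketched in a few lines. The names `validateCitations` and `ValidationReport` are hypothetical, and the sketch assumes every pointer is written exactly as `[Ref N]` with a single number; combined forms such as `[Ref 1, 2]` would need a wider pattern.

```typescript
interface ValidationReport {
  orphans: number[];      // cited in the prose but outside 1..citationCount
  unreferenced: number[]; // present in the citation block but never cited
}

function validateCitations(prose: string, citationCount: number): ValidationReport {
  // Collect every distinct N that appears as [Ref N] in the prose.
  const cited = new Set<number>();
  const pattern = /\[Ref (\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prose)) !== null) {
    cited.add(Number(match[1]));
  }

  // Orphans point at a citation that does not exist.
  const orphans = Array.from(cited)
    .filter((n) => n < 1 || n > citationCount)
    .sort((a, b) => a - b);

  // Unreferenced citations were retrieved but never used in the prose.
  const unreferenced: number[] = [];
  for (let n = 1; n <= citationCount; n++) {
    if (!cited.has(n)) unreferenced.push(n);
  }
  return { orphans, unreferenced };
}
```

For example, validating "Statins reduce events [Ref 1] and mortality [Ref 4]." against a block of three citations reports 4 as an orphan and 2 and 3 as unreferenced. The report is only logged; in line with the behaviour described above, it does not block the answer.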

The science

Evidence-Based Medicine, in three minutes

EBM is the discipline of applying the best current evidence in clinical practice. PICO is how the question is structured. CEBM is how the answer is graded. GRADE is how confidence in that answer is expressed. PICO-SEARCH applies all three.

PICO

Population — Intervention — Comparator — Outcome. The structure that turns a vague clinical question (‘should I give statins to my 75-year-old?’) into a searchable one (‘in adults over 70 without cardiovascular disease, do statins reduce all-cause mortality compared with placebo?’).
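As a data structure, a PICO question is just four labelled fields. The sketch below encodes the statin example from the paragraph above; the `PicoQuestion` type and `formatPico` helper are hypothetical names used only for illustration.

```typescript
// The four PICO components as a plain record.
interface PicoQuestion {
  population: string;
  intervention: string;
  comparator: string;
  outcome: string;
}

// The searchable version of "should I give statins to my 75-year-old?"
const statinExample: PicoQuestion = {
  population: "adults over 70 without cardiovascular disease",
  intervention: "statins",
  comparator: "placebo",
  outcome: "all-cause mortality",
};

// Render the components as a single labelled line, e.g. for logging or display.
function formatPico(q: PicoQuestion): string {
  return [
    "P: " + q.population,
    "I: " + q.intervention,
    "C: " + q.comparator,
    "O: " + q.outcome,
  ].join(" | ");
}
```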

CEBM hierarchy

The Centre for Evidence-Based Medicine (Oxford) ranks evidence from tier 1a (systematic reviews of RCTs) down to tier 5 (expert opinion). PICO-SEARCH uses the Burns/Rohrich/Chung 2012 mapping with question-type weighting.

GRADE

The Grading of Recommendations Assessment, Development and Evaluation framework. After ranking the evidence, GRADE expresses how confident a clinician should be in the recommendation: high / moderate / low / very low. The clinician answer always includes a GRADE rating.

PRISMA 2020

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2020 standard. Defines the section layout a systematic review report should use. PICO-SEARCH borrows this LAYOUT for its rapid evidence summaries — we use PRISMA's section headings so clinicians can scan methods/results/limitations in a familiar shape. PICO-SEARCH is NOT a systematic review. We use the reporting shell; we are not claiming the methodological rigour of a full SR.

AGREE II

Appraisal of Guidelines for Research and Evaluation. The standard for assessing whether a clinical practice guideline is well-developed. Used implicitly when ranking guideline citations.

Risk of bias

Cochrane RoB 2.0 for RCTs, Newcastle-Ottawa for cohorts. PICO-SEARCH applies a Jadad heuristic on RCT abstracts as a quality multiplier in the ranking score; full RoB is on the deferred queue.

The bright lines

What the architecture never does

The product is built on a few non-negotiables — encoded in the source code, the system prompts, and the Zod schemas. Not disclaimers at the bottom; actual architectural guardrails.

✗ Never scrape licensed content

eTG, AMH, MIMS, UpToDate, BMJ Best Practice, DynaMed and 12 other commercial references are tier ‘licensed_linkout’. We surface a click-through banner. We never fetch their content. Ever.

✗ Never provide dosing in plain mode

The plain-language system prompt forbids dosing, frequencies, routes, schedules, or titration. If a regression appears, fix the prompt before shipping.

✗ Never fabricate citations

Citation validator walks every `[Ref N]` pointer in the prose against the top-15 citation block. Orphan references are logged. Synthesis schemas carry hard min character counts so the LLM cannot produce a shallow placeholder answer.

✗ Never give a verdict on an individual

Plain answers use ‘studies suggest’, ‘evidence indicates’, ‘researchers found’ — never ‘you have…’. The clinician answer is an evidence summary built from published literature. It is not a verdict on any individual person or case.

✗ Never store subscription credentials

User subscription preferences (eTG / AMH / UpToDate / BMJ Best Practice / DynaMed / Cochrane / NICE) are boolean flags only. We never store passwords or tokens for licensed third parties. Ever.

✗ Never use a single LLM provider for synthesis

The synthesis path requires the Anthropic → OpenAI → Google fallback chain. Single-provider calls are forbidden in code review. Resilience + clinical safety.
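The fallback-chain rule can be illustrated with a small generic helper. This is a sketch under assumptions, not the gateway integration itself: `Provider` and `generateWithFallback` are hypothetical names, each provider is modelled as an async function, and the minimum chain length of two is this sketch's way of expressing the "no single provider" rule.

```typescript
// A provider is anything that can turn a prompt into a result asynchronously.
interface Provider<T> {
  name: string;
  generate: (prompt: string) => Promise<T>;
}

interface FallbackResult<T> {
  provider: string; // which provider actually answered
  value: T;
}

async function generateWithFallback<T>(
  prompt: string,
  chain: Provider<T>[]
): Promise<FallbackResult<T>> {
  // Refuse to run with a single provider: the chain is the guardrail.
  if (chain.length < 2) {
    throw new Error("synthesis requires a fallback chain of at least two providers");
  }
  const failures: string[] = [];
  for (const provider of chain) {
    try {
      // First provider to succeed wins; later ones are never called.
      return { provider: provider.name, value: await provider.generate(prompt) };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(provider.name + ": " + reason);
    }
  }
  throw new Error("all providers failed: " + failures.join("; "));
}
```

Returning the provider name alongside the value makes it possible to track how often the chain falls through to the second or third provider.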

The honest gaps

What the brain doesn't have yet

Every shipping product has open work. Here is what is still on the roadmap, ranked by leverage.

1
PubTator + Unpaywall source clients
Free PDF link button on every citation card via Unpaywall. Entity-tag chips on PubMed cards via PubTator 3.0. Both Apache/MIT, both free REST, both shippable in one commit. ETA: next sprint.
2
NCTS Syndication TS port
Pulls the NCTS Atom feed daily and stamps every synthesis answer with ‘Pinned to AMT v3 release YYYY-MM’. Defensible against ‘your data is stale’ critique. ETA: ~1 week.
3
@iebh/sra-polyglot live in-browser
Bond IEBH ship an MIT npm package that translates any PubMed search query into Ovid, Embase, Cochrane, CINAHL, Web of Science, Scopus syntaxes. Replace our Polyglot link-out with the live translator. ETA: 1 commit.
4
BioLinkBERT local re-ranker
Stanford LinkBERT (Apache-2.0) outperforms PubMedBERT on BLURB. Local re-ranking pass between source fan-out and LLM synthesis cuts Sonnet token bill by ~40-60% per query without hurting answer quality. ETA: ~1 week.
5
Regression suite — EBM-NLP + MS² + MedReview
Hard PICO span F1 + ROUGE numbers we can defend against the surveyed open SR-automation tools. ETA: 3-5 days.
