How it works

How the PICO-SEARCH clinical brain works

A 25-agent federated literature search + LLM-assisted rapid evidence summary pipeline. Real-time evidence retrieval — surfaces existing systematic reviews, meta-analyses and RCTs; does NOT conduct a new systematic review (that is a 6–18 month workflow). PRISMA-style layout, citation-validated, AU regulatory context-aware.

What's in the brain right now

AI agents: 25
Evidence sources: 125
Trusted sites: 143
Pipeline stages: 10
Ground-truth layers: 3
AU gov clients

Numbers are read live from SOURCE_REGISTRY, REPUTABLE_SITES and DEPARTMENTS on every page load. Three MIT-licensed npm packages from Bond IEBH have also been imported and will be wired in over upcoming sprints.

Scope honesty

What PICO-SEARCH retrieves, and what it doesn't do

There are three tiers of evidence review. PICO-SEARCH is the fast one: it surfaces the top-tier evidence that already exists (systematic reviews, meta-analyses, RCTs, guidelines) and grades it against CEBM in real time. It does NOT conduct a new rapid review or a new Cochrane systematic review (those are human-led workflows measured in days to months). We borrow the PRISMA reporting layout so the output looks familiar, but that is a layout choice, not a methodology claim.

Tier	Typical duration	Dual-reviewer	PROSPERO	Protocol	PICO-SEARCH
Full Cochrane systematic review	6–18 months	Yes	Yes	Yes	Not this
Rapid review (WHO / Cochrane RRMG)	Days–weeks	Usually no	Optional	Yes	Not this either
Real-time evidence retrieval	Seconds–minutes	No	No	No	This is PICO-SEARCH

What that means in practice. PICO-SEARCH runs the literature search, grades each study against the CEBM hierarchy, and returns a cited rapid evidence summary in a PRISMA-style layout — in under a minute. It is a literature search engine with an evidence-grading layer on top. It is not a medical device. It does not diagnose, treat, or make decisions about individual patients. When the evidence warrants a full systematic review, PICO-SEARCH is the starting point: you export the search strategy + RIS file and escalate into Covidence, Rayyan, RevMan, or DistillerSR to run the full SR workflow with two human reviewers.

Architecture

25 specialist AI agents — Nerd Burger router + 24 ICD-11 specialists

The Nerd Burger is a 3-stage router (keyword → MeSH → LLM fallback). Each of the 24 specialists is a Claude agent with its own system prompt, ICD-11 chapter scope, MeSH tree roots, and preferred sources. Compound questions fan out to multiple specialists in parallel.

The router

Nerd Burger (router)

3-stage router: keyword/regex (24 specialist patterns) → MeSH-tree walk → LLM fallback (Haiku) with department menu. Outputs one to five routed Sliders (the specialist agents listed below). Never searches evidence directly. Synthesises cross-department answers when multiple Sliders are engaged.
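The three routing stages can be sketched as a cascade. This is a minimal sketch, not the production router: it assumes a later stage only runs when the earlier one matched nothing, and `RouterConfig`, `resolveMesh`, `llmFallback` and the Slider IDs are hypothetical names.

```typescript
// Hypothetical router config: the real Slider IDs, patterns and MeSH resolver are not published.
type SliderId = string;

interface RouterConfig {
  keywordPatterns: Record<SliderId, RegExp>; // stage A (non-global regexes, so .test() is stateless)
  meshRoots: Record<SliderId, string[]>; // stage B: MeSH tree roots per Slider, e.g. "C14"
  resolveMesh: (question: string) => string[]; // assumed helper: question -> MeSH tree numbers
  llmFallback: (question: string, menu: SliderId[]) => Promise<SliderId[]>; // stage C
}

const MAX_SLIDERS = 5;

async function route(question: string, cfg: RouterConfig): Promise<SliderId[]> {
  // Stage A: cheap keyword/regex match.
  const byKeyword = Object.entries(cfg.keywordPatterns)
    .filter(([, re]) => re.test(question))
    .map(([id]) => id);
  if (byKeyword.length > 0) return byKeyword.slice(0, MAX_SLIDERS);

  // Stage B: a Slider matches if any resolved tree number sits at or under one of its roots.
  const treeNumbers = cfg.resolveMesh(question);
  const byMesh = Object.entries(cfg.meshRoots)
    .filter(([, roots]) =>
      treeNumbers.some((tn) => roots.some((root) => tn === root || tn.startsWith(`${root}.`))),
    )
    .map(([id]) => id);
  if (byMesh.length > 0) return byMesh.slice(0, MAX_SLIDERS);

  // Stage C: LLM fallback over the department menu; discard any ID not on the menu.
  const menu = Object.keys(cfg.keywordPatterns);
  const picked = await cfg.llmFallback(question, menu);
  return picked.filter((id) => menu.includes(id)).slice(0, MAX_SLIDERS);
}

// Toy two-Slider config for illustration only.
const cfg: RouterConfig = {
  keywordPatterns: {
    cardiovascular: /\b(statins?|heart failure|atrial fibrillation)\b/i,
    respiratory: /\b(asthma|copd)\b/i,
  },
  meshRoots: { cardiovascular: ["C14"], respiratory: ["C08"] },
  resolveMesh: (q) => (/pneumonia/i.test(q) ? ["C08.381"] : []),
  llmFallback: async () => ["respiratory"],
};
```

Filtering the LLM's picks against the menu keeps a hallucinated department name from ever reaching the fan-out stage.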

The 24 Slider AI agents

Neurology & Nervous System
ICD-11 8A00–8E7Z · MeSH C10
Diseases of the central and peripheral nervous system: stroke, dementia, epilepsy, headache, neurodegeneration, neuromuscular disease.
Ophthalmology & Vision
ICD-11 9A00–9E1Z · MeSH C11
Diseases of the eye, visual system, and adnexa. Glaucoma, macular degeneration, retinopathy, cataract, refractive disorders.
Ear, Nose & Throat
ICD-11 AA00–AC0Z · MeSH C09
Diseases of the ear, nose, throat, sinus, larynx, and mastoid. Hearing loss, otitis, rhinosinusitis, head & neck infection.
Cardiovascular
ICD-11 BA00–BE2Z · MeSH C14
Heart and vascular disease: coronary disease, heart failure, arrhythmia, hypertension, lipid disorders, valvular disease, stroke overlap.
Respiratory
ICD-11 CA00–CB7Z · MeSH C08
Respiratory disease: asthma, COPD, ILD, pneumonia, sleep-disordered breathing, tuberculosis, pulmonary hypertension.
Gastrointestinal & Hepatology
ICD-11 DA00–DE2Z · MeSH C06
GI tract, liver, biliary, pancreas. IBD, IBS, viral hepatitis, fatty liver, reflux, peptic disease, functional GI.
Endocrine, Nutrition & Metabolic
ICD-11 5A00–5D46 · MeSH C18, C19
Endocrine, nutrition, metabolism: diabetes, obesity, thyroid, adrenal, pituitary, osteoporosis, lipid, weight-loss pharmacology (GLP-1 etc.).
Renal & Urology
ICD-11 GA00–GC8Z · MeSH C12, C13
Kidney disease, electrolyte and acid-base disorders, urinary tract, bladder, prostate, stone disease, nephrology overlap with CV and endocrine.
Musculoskeletal & Rheumatology
ICD-11 FA00–FC0Z · MeSH C05, C17
Bone, joint, muscle, connective tissue, rheumatology. OA, RA, spondyloarthropathy, SLE, fibromyalgia, orthopaedic trauma.
Dermatology
ICD-11 EA00–EM0Z · MeSH C17
Skin, hair, nails, subcutaneous tissue. Eczema, psoriasis, acne, skin cancer, infections, drug eruptions, paediatric dermatology.
Mental Health & Psychiatry
ICD-11 6A00–6E8Z · MeSH F03
Mood, anxiety, psychotic, neurodevelopmental, substance use, eating disorders, trauma. Psychopharmacology + psychotherapy evidence.
Obstetrics & Maternity
ICD-11 JA00–JB6Z · MeSH C13.703
Pregnancy, labour, delivery, postpartum, maternal medicine, prenatal screening, fetal medicine, breastfeeding.
Gynaecology & Women's Health
ICD-11 GA00–GA4Z · MeSH C13
Female reproductive system, menstrual disorders, PCOS, endometriosis, menopause, contraception, HRT, gynae oncology overlap.
Men's Health
ICD-11 GB00–GC8Z · MeSH C12
Male reproductive system, andrology, testosterone, erectile dysfunction, prostate, sexual health, male-specific overlap with cardiovascular.
Paediatrics & Child Health
ICD-11 KA00–KD5Z · MeSH M01.060.406
Neonatal, infant, childhood and adolescent medicine across all organ systems. Developmental, behavioural, growth, vaccination, paediatric oncology overlap.
Geriatrics & Older Persons
ICD-11 * · MeSH M01.060.116
Medicine of older adults: frailty, falls, polypharmacy, cognitive decline, functional assessment, end-of-life, multimorbidity.
Haematology
ICD-11 3A00–3C0Z · MeSH C15
Blood and blood-forming organs. Anaemia, clotting and bleeding disorders, haemoglobinopathies, transfusion, haematological malignancy overlap with oncology.
Oncology
ICD-11 2A00–2F9Z · MeSH C04
Solid tumour and haematological cancer treatment, screening, survivorship, palliative intent. Systemic therapy evidence, immunotherapy, radiation, biomarker testing.
Infectious Disease
ICD-11 1A00–1H0Z · MeSH C01
Infectious and parasitic disease, antimicrobial therapy and stewardship, global/travel health, sepsis, HIV, TB, hepatitis, emerging pathogens.
Immunology & Allergy
ICD-11 4A00–4B4Z · MeSH C20
Immune system disorders, primary immunodeficiency, autoimmunity overlap, allergy, anaphylaxis, asthma overlap, immunotherapy.
Emergency & Critical Care
ICD-11 NA00–NF2Z · MeSH E02.365
Acute resuscitation, trauma, emergency medicine, intensive care, sepsis management, ventilation, shock, mass-casualty triage.
Rehabilitation, Pain & Palliative
ICD-11 * · MeSH E02.760, G11
Physical rehabilitation, chronic pain management, palliative and end-of-life care, symptom control, hospice, functional restoration.
Public Health & Preventive Medicine
ICD-11 QA00–QF4Z · MeSH N06
Population health, screening, vaccination, epidemiology, health-promotion interventions, social determinants, cost-effectiveness.
Dental & Oral Health
ICD-11 DA00–DA0Z · MeSH C07
Teeth, gingiva, oral mucosa, salivary glands, jaw. Caries, periodontal disease, oral cancer screening, paediatric dentistry overlap, orthodontics.

Pipeline

10-stage live federation + synthesis

From query intake to validated cited answer in 30–60 seconds. Stage 3 fans out across 11 free APIs in parallel and stage 4 merges and ranks the results; stages 3.5–3.6 inject structured ground truth from US safety and AU regulatory sources; stage 6 runs the layered LLM chain with PRISMA schema enforcement.

01
Routing — Nerd Burger 3-stage router
Stage A keyword/regex match against 24 specialist patterns. Stage B MeSH-tree walk for ambiguous cases. Stage C LLM fallback (Haiku) with department menu. Output: one to five routed Sliders.
02
Smart reuse cache check
Normalised question hash lookup via find_recent_search RPC. If the same question (case + punctuation + stopword normalised) was completed within 90 days, reuse the result. Skippable with the ‘fresh search’ checkbox.
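The normalisation and reuse rule can be sketched as follows. This is a minimal sketch under stated assumptions: the stopword list is illustrative, FNV-1a stands in for whatever hash the real system uses (production would more likely use SHA-256), and the `find_recent_search` RPC lookup itself is not shown.

```typescript
// Illustrative stopword list; the production list is not published.
const STOPWORDS = new Set(["a", "an", "the", "of", "in", "for", "is", "are", "do", "does", "to", "with"]);
const DAY_MS = 24 * 60 * 60 * 1000;
const REUSE_WINDOW_MS = 90 * DAY_MS; // 90-day reuse window

function normaliseQuestion(q: string): string {
  return q
    .toLowerCase() // case
    .replace(/[^a-z0-9\s]/g, " ") // punctuation (ASCII-only for brevity)
    .split(/\s+/)
    .filter((w) => w.length > 0 && !STOPWORDS.has(w)) // stopwords
    .join(" ");
}

// FNV-1a 32-bit: a dependency-free stand-in for a cryptographic hash.
function fnv1a(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

const cacheKey = (q: string) => fnv1a(normaliseQuestion(q));

interface CachedSearch {
  completedAt: number; // epoch ms when the earlier search finished
  resultId: string;
}

// Reuse only if a hit exists, it is inside the window, and the user did not tick "fresh search".
function shouldReuse(hit: CachedSearch | undefined, now: number, freshSearch: boolean): boolean {
  if (freshSearch || hit === undefined) return false;
  return now - hit.completedAt <= REUSE_WINDOW_MS;
}
```

Two phrasings that differ only in case, punctuation or stopwords map to the same key, so the cache lookup becomes a single equality check.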
03
Literature fan-out — 11 sources in parallel
Each routed Slider runs its own parallel search across PubMed (RCT/SR + dedicated Practice Guideline streams), Europe PMC, ClinicalTrials.gov v2, Semantic Scholar, CORE, Crossref, Epistemonikos. Each source has its own 15-second timeout. One slow source never blocks the rest.
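The "one slow source never blocks the rest" property falls out of giving each source its own timeout and collecting results with `Promise.allSettled`. This is a minimal sketch; `SourceClient` is a hypothetical shape, and each client is reduced to returning record IDs.

```typescript
// Hypothetical source-client shape; real clients would call PubMed, Europe PMC, etc.
interface SourceClient {
  name: string;
  search: (query: string) => Promise<string[]>; // record IDs, for brevity
}

const SOURCE_TIMEOUT_MS = 15_000;

// Reject if `p` has not settled within `ms`; the timer is cleared whichever side wins.
function withTimeout<T>(p: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

async function fanOut(query: string, sources: SourceClient[], timeoutMs = SOURCE_TIMEOUT_MS) {
  // allSettled never short-circuits: a timeout or error in one source is just one rejected entry.
  const settled = await Promise.allSettled(
    sources.map((s) => withTimeout(s.search(query), timeoutMs, s.name)),
  );
  const hits: Record<string, string[]> = {};
  const failures: string[] = [];
  settled.forEach((r, i) => {
    if (r.status === "fulfilled") hits[sources[i].name] = r.value;
    else failures.push(sources[i].name);
  });
  return { hits, failures };
}
```

Note that `Promise.race` only stops waiting; it does not cancel the underlying request. A real client would also pass an `AbortSignal` to `fetch` so the slow connection is torn down.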
04
Dedupe + CEBM rank
Cross-source dedupe by DOI → PMID → NCT → fuzzy title+year. Then rank: evidence_tier × recency_decay × relevance × Jadad heuristic. Top 15 citations carried forward to synthesis.
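The identifier cascade for dedupe can be sketched as below. This is a minimal sketch assuming that the first identifier both records carry decides the match; the token-Jaccard title similarity and its 0.9 threshold are stand-ins for whatever fuzzy matcher the real pipeline uses.

```typescript
interface Citation {
  source: string;
  title: string;
  year?: number;
  doi?: string;
  pmid?: string;
  nct?: string;
}

// Lowercase, strip punctuation, collapse whitespace.
const normTitle = (t: string) => t.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();

// Token-set Jaccard similarity (assumed fuzzy matcher).
function jaccard(a: string, b: string): number {
  const A = new Set(a.split(" "));
  const B = new Set(b.split(" "));
  let inter = 0;
  for (const t of A) if (B.has(t)) inter++;
  return inter / (A.size + B.size - inter);
}

function isDuplicate(a: Citation, b: Citation): boolean {
  // DOI -> PMID -> NCT: the first identifier present on both records decides.
  if (a.doi && b.doi) return a.doi.toLowerCase() === b.doi.toLowerCase();
  if (a.pmid && b.pmid) return a.pmid === b.pmid;
  if (a.nct && b.nct) return a.nct.toUpperCase() === b.nct.toUpperCase();
  // Fallback: same year and near-identical title (0.9 threshold is an assumption).
  return a.year !== undefined && a.year === b.year && jaccard(normTitle(a.title), normTitle(b.title)) >= 0.9;
}

function dedupe(records: Citation[]): Citation[] {
  const kept: Citation[] = [];
  for (const r of records) {
    const match = kept.find((k) => isDuplicate(k, r));
    if (!match) {
      kept.push({ ...r }); // copy so merging never mutates the caller's objects
      continue;
    }
    // Merge identifiers so later records can still match on them.
    match.doi ??= r.doi;
    match.pmid ??= r.pmid;
    match.nct ??= r.nct;
    match.year ??= r.year;
  }
  return kept;
}
```

Merging identifiers into the kept record matters: a Crossref hit with only a DOI can later absorb a PubMed hit that carries both DOI and PMID.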
3.5
Safety overlay — US drug ground truth
Drug names extracted from the question, resolved to RxNorm, then OpenFDA label sections (indications, contraindications, warnings, adverse reactions, drug interactions, pregnancy) and FAERS top-5 adverse events. Special-population regex flags pregnancy, paediatric, geriatric, renal, hepatic, breastfeeding. Built into a structured promptBlock.
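The special-population flags can be sketched as a table of case-insensitive regexes. The patterns below are illustrative assumptions, not the production regexes; they only show the shape of the check.

```typescript
// Illustrative patterns only; the production regexes are not published.
const SPECIAL_POPULATIONS = {
  pregnancy: /\b(pregnan(t|cy)|gestational|trimester|antenatal)\b/i,
  paediatric: /\b(child(ren)?|infants?|neonat(e|es|al)|paediatric|pediatric|adolescents?)\b/i,
  geriatric: /\b(elderly|older (adults?|people|patients)|geriatric)\b/i,
  renal: /\b(renal|kidney|ckd|egfr|dialysis)\b/i,
  hepatic: /\b(hepatic|liver|cirrhosis)\b/i,
  breastfeeding: /\b(breast-?feeding|lactat(ing|ion))\b/i,
} as const;

type Population = keyof typeof SPECIAL_POPULATIONS;

// Returns every flagged population, in the order the table declares them.
function flagPopulations(question: string): Population[] {
  return (Object.keys(SPECIAL_POPULATIONS) as Population[]).filter((p) =>
    SPECIAL_POPULATIONS[p].test(question),
  );
}
```

The flags can then select which OpenFDA label sections (for example the pregnancy section) are copied into the safety promptBlock.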
3.6
AU context overlay — Australian regulatory ground truth
PBS API for authority/restriction text + subsidised brand listings, NCTS Ontoserver for SNOMED CT-AU expansion of clinical terms, TGA CKAN discovery + safety alert deep-links. Conditional firing — only the relevant arm runs per question. Built into a structured promptBlock.
05
Authoritative source matcher
Deterministic walker over SOURCE_REGISTRY for any source whose authoritativeFor[] keywords appear in the question. Surfaces eTG, AMH, HealthPathways, PBS API, NCTS, TGA, NHMRC, health.gov.au as click-through banners above the answer. Pure registry walk, no LLM call.
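A deterministic registry walk needs no model call at all. This is a minimal sketch: apart from `authoritativeFor` and the `licensed_linkout` tier, the entry fields (and the `open` tier label) are assumptions, and the keywords are made up for illustration.

```typescript
// Minimal slice of a registry entry; fields other than authoritativeFor are assumptions.
interface RegistryEntry {
  name: string;
  tier: "licensed_linkout" | "open";
  authoritativeFor: string[];
}

// Escape regex metacharacters so keywords are matched literally.
const escapeRe = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Pure function: same question + same registry always yields the same banners.
function matchAuthoritative(question: string, registry: RegistryEntry[]): RegistryEntry[] {
  return registry.filter((e) =>
    e.authoritativeFor.some((kw) => new RegExp(`\\b${escapeRe(kw)}\\b`, "i").test(question)),
  );
}

// Toy registry for illustration only.
const registry: RegistryEntry[] = [
  { name: "eTG", tier: "licensed_linkout", authoritativeFor: ["antibiotic", "empirical therapy"] },
  { name: "PBS API", tier: "open", authoritativeFor: ["pbs", "subsidised"] },
  { name: "NHMRC", tier: "open", authoritativeFor: ["guideline"] },
];
```

Word-boundary matching avoids false banners from keywords that happen to appear inside longer words.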
06
Layered LLM synthesis
Anthropic Claude Sonnet 4.6 → OpenAI GPT-5 → Google Gemini 2.5 Pro fallback chain via Vercel AI Gateway. The clinician answer (PRISMA Zod schema) and the plain-language answer (Y8 Zod schema with hard minimum character counts) are generated in parallel. Single-provider calls are forbidden on the synthesis path.
07
Citation validator
Every [Ref N] pointer in the generated prose is walked and validated against the top-15 citation block. Orphan references (N out of bounds) and unreferenced citations (in the block but not cited in the prose) are both logged. The job still ships, but quality metrics track hallucinations over time.
08
Render
Authoritative banner → safety overlay banner → AU context banner → PRISMA-aligned clinician answer → CEBM pyramid + ranked citations → further reading link-outs → progress trace. Top to bottom.

Ground truth

Three structured fact layers in the synthesis prompt

Literature citations, safety overlay, and AU context overlay. Each layer has its own boundary rules in the LLM prompt so the model never confuses regulatory data with peer-reviewed evidence.

Layer 1

Literature citations

Always carries `[Ref N]` pointers. The only layer that gets reference numbers in the prose. The LLM is forbidden to fabricate citations or cite N greater than the citation count.

Triggered: Every search where any literature was retrieved

Sources: PubMed + Europe PMC + ClinicalTrials.gov + Semantic Scholar + CORE + Crossref + Epistemonikos

Layer 2

Safety overlay

Structured drug facts from US FDA. Treated as ground truth, NOT as new citations. Inline references like ‘FDA black box warning for…’ are allowed, but no `[Ref N]` is generated for them.

Triggered: When a known drug name is detected in the question

Sources: RxNorm (drug identity) + OpenFDA (labels) + FAERS (adverse events)

Layer 3

AU context overlay

Structured AU regulatory facts. Clinician mode references SNOMED concepts inline as `(SNOMED <code> |<display>|)` and PBS authority status as `(PBS authority required)`. Plain mode explains PBS authority in patient-friendly language. Never gets a `[Ref N]` either.

Triggered: When a drug OR a clinical term is detected (PBS+TGA fire on drugs, NCTS fires on terms)

Sources: PBS API v3 + NCTS Ontoserver (SNOMED CT-AU + AMT) + TGA via data.gov.au CKAN
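The trigger rules and boundary rules for the three layers can be sketched together. This is a minimal sketch of one possible prompt layout: the tag names, rule wording and the `Detected` shape are hypothetical, not the production prompt.

```typescript
// Hypothetical shapes and tag names throughout; not the production prompt.
interface Detected {
  drugs: string[];
  clinicalTerms: string[];
}
type AuArm = "PBS" | "TGA" | "NCTS";

// Layer 3 trigger rule: PBS and TGA fire on drugs, NCTS fires on clinical terms.
function auArmsToFire(d: Detected): AuArm[] {
  const arms: AuArm[] = [];
  if (d.drugs.length > 0) arms.push("PBS", "TGA");
  if (d.clinicalTerms.length > 0) arms.push("NCTS");
  return arms;
}

// Clinician-mode inline notation for SNOMED concepts.
const formatSnomed = (code: string, display: string) => `(SNOMED ${code} |${display}|)`;

function buildGroundTruthPrompt(refTitles: string[], safetyFacts: string, auFacts: string): string {
  const parts: string[] = [];
  // Layer 1 is the only numbered layer, so it is the only thing the model may cite as [Ref N].
  if (refTitles.length > 0) {
    const numbered = refTitles.map((t, i) => `[Ref ${i + 1}] ${t}`).join("\n");
    parts.push(`<literature>\n${numbered}\nRULE: cite only [Ref 1]..[Ref ${refTitles.length}].\n</literature>`);
  }
  // Layers 2 and 3 are ground truth but never receive reference numbers.
  if (safetyFacts) parts.push(`<safety_overlay>\n${safetyFacts}\nRULE: mention inline; never as [Ref N].\n</safety_overlay>`);
  if (auFacts) parts.push(`<au_context>\n${auFacts}\nRULE: inline SNOMED/PBS notation only; never as [Ref N].\n</au_context>`);
  return parts.join("\n\n");
}
```

Keeping each layer in its own delimited block, with the numbering confined to the literature block, is what lets the citation validator treat any out-of-range `[Ref N]` as an error.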

Ranking

CEBM evidence pyramid — question-type weighted

Burns/Rohrich/Chung 2012 hierarchy. Tier weights run from 1.0 (1a, SR of RCTs) to 0.15 (5, expert opinion). Tier assignment depends on the question type: a treatment RCT is tier 1b, but for a test-accuracy question an RCT drops to tier 2a because validation studies take tier 1b instead.

Tier	Study design	Weight	Example
1a	SR / meta-analysis of RCTs	×1.00	Cochrane review of statins for primary prevention
1b	Single RCT (or individual inception cohort for prognosis)	×0.85	JUPITER trial for rosuvastatin
2a	SR of cohort studies	×0.70	SR of cohort studies linking PPI use to fracture
2b	Single cohort study	×0.60	Framingham Heart Study cohort analysis
3a	SR of case-control studies	×0.45	SR of case-control studies on NSAIDs and AKI
3b	Single case-control study	×0.35	Case-control of clopidogrel and bleeding
4	Case series / case report	×0.25	Case series of rare drug interactions
5	Expert opinion / narrative review	×0.15	Editorial in NEJM

Question-type weighted: a treatment question puts SR-of-RCTs at tier 1a; a test-accuracy question puts SR-of-validation studies at tier 1a and weights RCTs as tier 2a; a prognosis question puts SR-of-inception cohorts at tier 1a; an aetiology question puts cohort SRs first. The full mapping lives in packages/config/src/evidence-tiers.ts.
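The tier weights, a question-type lookup and the stage-4 ranking product (evidence_tier × recency_decay × relevance × Jadad heuristic) can be sketched together. This is a partial, illustrative mapping covering only two question types; the 10-year recency half-life, the `Design` labels and the fallback to tier 5 for unmapped designs are assumptions, and the real mapping is the one in packages/config/src/evidence-tiers.ts.

```typescript
// Tier weights as listed in the table above.
const TIER_WEIGHT = { "1a": 1.0, "1b": 0.85, "2a": 0.7, "2b": 0.6, "3a": 0.45, "3b": 0.35, "4": 0.25, "5": 0.15 } as const;
type Tier = keyof typeof TIER_WEIGHT;

type QuestionType = "treatment" | "test_accuracy";
type Design = "sr_rct" | "rct" | "sr_validation" | "validation" | "cohort" | "case_control" | "expert";

// Partial, illustrative mapping (hypothetical design labels).
const TIER_BY_QUESTION: Record<QuestionType, Partial<Record<Design, Tier>>> = {
  treatment: { sr_rct: "1a", rct: "1b", cohort: "2b", case_control: "3b", expert: "5" },
  test_accuracy: { sr_validation: "1a", validation: "1b", rct: "2a", expert: "5" },
};

// Exponential recency decay with an assumed 10-year half-life.
const recencyDecay = (pubYear: number, nowYear: number, halfLifeYears = 10) =>
  Math.pow(0.5, Math.max(0, nowYear - pubYear) / halfLifeYears);

interface RankInput {
  qType: QuestionType;
  design: Design;
  pubYear: number;
  nowYear: number;
  relevance: number; // assumed to be in [0, 1]
  jadadMultiplier: number; // 1.0 for non-RCTs or a perfect Jadad score
}

function rankScore(r: RankInput): number {
  // Unmapped designs fall to the bottom tier (assumption).
  const tier = TIER_BY_QUESTION[r.qType][r.design] ?? "5";
  return TIER_WEIGHT[tier] * recencyDecay(r.pubYear, r.nowYear) * r.relevance * r.jadadMultiplier;
}
```

Because the score is a product, a weak factor anywhere (an old paper, a low-relevance hit) pulls the citation down regardless of its tier.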

Synthesis schema

PRISMA-style layout + citation validator

The rapid evidence summary is generated via generateObject + a Zod schema modelled on the PRISMA 2020 reporting layout. We borrow the section structure so clinicians can scan methods/results/limitations in a familiar shape; we are NOT claiming to produce a systematic review.

Every section of the clinician answer is structurally validated:

Background — minimum character count, sets up the clinical question
Methods — the search strategy, sources searched, dates, study types
Results — narrative synthesis with [Ref N] pointers and an evidence-tier breakdown
Limitations — risk of bias, heterogeneity, gaps in the evidence
Conclusion — GRADE strength rating + practice recommendation
Authoritative sources — registry-matched click-throughs (eTG, AMH, etc.) for any paywalled references
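The structural checks on the sections above can be sketched without a dependency. This is a dependency-free stand-in for the Zod schema (where each length rule would be something like `z.string().min(n)`); the field names and minimum character counts are assumptions, not the production values.

```typescript
// Dependency-free stand-in for the Zod schema; field names and minimums are assumptions.
const GRADE_LEVELS = ["high", "moderate", "low", "very low"] as const;
type Grade = (typeof GRADE_LEVELS)[number];

interface ClinicianAnswer {
  background: string;
  methods: string;
  results: string;
  limitations: string;
  conclusion: { grade: Grade; recommendation: string };
  authoritativeSources: string[];
}

const MIN_CHARS: Record<"background" | "methods" | "results" | "limitations", number> = {
  background: 200,
  methods: 150,
  results: 400,
  limitations: 150,
};

// Returns a list of problems; an empty list means the answer passes.
function validateClinicianAnswer(a: ClinicianAnswer): string[] {
  const problems: string[] = [];
  for (const [field, min] of Object.entries(MIN_CHARS) as [keyof typeof MIN_CHARS, number][]) {
    if (a[field].trim().length < min) problems.push(`${field}: under ${min} characters`);
  }
  if (!/\[Ref \d+\]/.test(a.results)) problems.push("results: no [Ref N] pointers");
  if (!GRADE_LEVELS.includes(a.conclusion.grade)) problems.push("conclusion: invalid GRADE level");
  if (a.conclusion.recommendation.trim().length === 0) problems.push("conclusion: empty recommendation");
  return problems;
}
```

Minimum lengths are what stop a model from satisfying the schema with a one-line placeholder per section.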

Post-synthesis, the citation validator walks every [Ref N] in the prose against the citation block. Orphans are logged and surfaced. The job ships even with orphans (so the user always gets an answer) but the quality metric tracks hallucination rate over time.
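The validator's two checks (orphans and unreferenced citations) reduce to set arithmetic over the `[Ref N]` pointers. This is a minimal sketch assuming pointers always use the single-number `[Ref N]` form; `ValidationReport` is a hypothetical shape.

```typescript
interface ValidationReport {
  cited: number[]; // distinct N values found in the prose, ascending
  orphans: number[]; // cited but outside 1..citationCount
  unreferenced: number[]; // in the citation block but never cited
}

function validateCitations(prose: string, citationCount: number): ValidationReport {
  const cited = new Set<number>();
  for (const m of prose.matchAll(/\[Ref\s+(\d+)\]/g)) cited.add(Number(m[1]));

  const citedList = [...cited].sort((a, b) => a - b);
  const orphans = citedList.filter((n) => n < 1 || n > citationCount);
  const unreferenced: number[] = [];
  for (let n = 1; n <= citationCount; n++) if (!cited.has(n)) unreferenced.push(n);
  return { cited: citedList, orphans, unreferenced };
}
```

The report is logged rather than thrown, matching the ship-anyway policy: a non-empty `orphans` list feeds the hallucination-rate metric.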

Methods

Evidence-Based Medicine — the underlying discipline

PICO question structure (Population / Intervention / Comparator / Outcome). CEBM evidence hierarchy. GRADE strength of recommendation. AGREE II for guideline appraisal. PRISMA 2020 for synthesis reporting. PICO-SEARCH draws on all five, using PRISMA for layout only.

PICO

Population — Intervention — Comparator — Outcome. The structure that turns a vague clinical question (‘should I give statins to my 75-year-old?’) into a searchable one (‘in adults over 70 without cardiovascular disease, do statins reduce all-cause mortality compared with placebo?’).
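One common way to turn a structured PICO into a Boolean search string is to OR synonyms within each element and AND the elements together. This is a minimal sketch of that convention, not necessarily how PICO-SEARCH builds its queries; real strategies would also add MeSH terms and field tags.

```typescript
interface Pico {
  population: string[];
  intervention: string[];
  comparator: string[];
  outcome: string[];
}

// Quote multi-word phrases so they are searched as phrases.
const quote = (t: string) => (/\s/.test(t) ? `"${t}"` : t);

// OR synonyms inside an element, AND across elements, skip empty elements.
function picoToQuery(p: Pico): string {
  const group = (terms: string[]) =>
    terms.length === 1 ? quote(terms[0]) : `(${terms.map(quote).join(" OR ")})`;
  return [p.population, p.intervention, p.comparator, p.outcome]
    .filter((terms) => terms.length > 0)
    .map(group)
    .join(" AND ");
}

// The statin example from the paragraph above, as structured PICO.
const statinQuestion: Pico = {
  population: ["aged", "older adults"],
  intervention: ["statins"],
  comparator: ["placebo"],
  outcome: ["all-cause mortality"],
};
```

For `statinQuestion` this yields `(aged OR "older adults") AND statins AND placebo AND "all-cause mortality"`.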

CEBM hierarchy

The Centre for Evidence-Based Medicine (Oxford) ranks evidence from tier 1a (systematic reviews of RCTs) down to tier 5 (expert opinion). PICO-SEARCH uses the Burns/Rohrich/Chung 2012 mapping with question-type weighting.

GRADE

The Grading of Recommendations Assessment, Development and Evaluation framework. After ranking the evidence, GRADE expresses how confident a clinician should be in the recommendation: high / moderate / low / very low. The clinician answer always includes a GRADE rating.
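The core GRADE arithmetic is: randomised evidence starts at high certainty, observational evidence at low; each of five domains can rate it down by one or two levels, and three factors can rate it up. This is a simplified sketch of that published logic, not a claim about how PICO-SEARCH derives its rating (the source only says the answer always includes one); the input shape is hypothetical.

```typescript
type Certainty = "very low" | "low" | "moderate" | "high";
const LEVELS: Certainty[] = ["very low", "low", "moderate", "high"];

interface GradeInputs {
  randomised: boolean; // RCT evidence starts high, observational starts low
  downgrades: { riskOfBias: number; inconsistency: number; indirectness: number; imprecision: number; publicationBias: number };
  // In GRADE, rating up is normally reserved for observational evidence; not enforced here.
  upgrades: { largeEffect: number; doseResponse: number; plausibleConfounding: number };
}

function gradeCertainty(g: GradeInputs): Certainty {
  const start = g.randomised ? 3 : 1; // index into LEVELS
  const down = Object.values(g.downgrades).reduce((s, n) => s + n, 0);
  const up = Object.values(g.upgrades).reduce((s, n) => s + n, 0);
  // Clamp to the four GRADE levels.
  return LEVELS[Math.min(3, Math.max(0, start - down + up))];
}

const noDown = { riskOfBias: 0, inconsistency: 0, indirectness: 0, imprecision: 0, publicationBias: 0 };
const noUp = { largeEffect: 0, doseResponse: 0, plausibleConfounding: 0 };
```

An RCT body of evidence rated down once for risk of bias and once for imprecision lands at "low".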

PRISMA 2020

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2020 standard. Defines the section layout a systematic review report should use. PICO-SEARCH borrows this LAYOUT for its rapid evidence summaries — we use PRISMA's section headings so clinicians can scan methods/results/limitations in a familiar shape. PICO-SEARCH is NOT a systematic review. We use the reporting shell; we are not claiming the methodological rigour of a full SR.

AGREE II

Appraisal of Guidelines for Research and Evaluation. The standard for assessing whether a clinical practice guideline is well-developed. Used implicitly when ranking guideline citations.

Risk of bias

Cochrane RoB 2.0 for RCTs, Newcastle-Ottawa for cohorts. PICO-SEARCH applies a Jadad heuristic on RCT abstracts as a quality multiplier in the ranking score; full RoB is on the deferred queue.
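The Jadad scale scores an RCT from 0 to 5: a point each for being described as randomised, double-blind, and reporting withdrawals, plus or minus a point for an appropriate or inappropriate randomisation or blinding method. A keyword approximation over abstracts might look like the sketch below; the phrase lists and the 0.7 to 1.0 multiplier range are assumptions, not the production heuristic.

```typescript
// Keyword approximation of the Jadad scale; phrase lists are assumptions.
function jadadFromAbstract(abstract: string): number {
  const a = abstract.toLowerCase();
  let score = 0;
  if (/\brandomi[sz](ed|ation)\b/.test(a)) {
    score += 1;
    if (/(computer[- ]generated|random number)/.test(a)) score += 1; // appropriate method
    else if (/(alternat(e|ing) allocation|date of birth|hospital number)/.test(a)) score -= 1; // inappropriate
  }
  if (/\bdouble[- ]blind(ed)?\b/.test(a)) {
    score += 1;
    if (/(identical placebo|matching placebo|double[- ]dummy)/.test(a)) score += 1; // appropriate blinding
  }
  if (/\b(withdrawals?|drop-?outs?|lost to follow-?up)\b/.test(a)) score += 1;
  return Math.max(0, Math.min(5, score));
}

// Map a 0..5 score onto a ranking multiplier in [0.7, 1.0] (assumed range).
const jadadMultiplier = (score: number) => 0.7 + (0.3 * score) / 5;
```

Abstracts rarely report full methods, so a low score here often means "not reported" rather than "poorly done"; that is one reason full RoB assessment stays on the deferred queue.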

Compliance posture

Hard rules — what the brain never does

The legal and clinical safety boundaries. Encoded in source code, system prompts, and Zod schemas. Auditable.

✗ Never scrape licensed content

eTG, AMH, MIMS, UpToDate, BMJ Best Practice, DynaMed and 12 other commercial references are tier ‘licensed_linkout’. We surface a click-through banner. We never fetch their content. Ever.

✗ Never provide dosing in plain mode

The plain-language system prompt forbids dosing, frequencies, routes, schedules, or titration. If a regression appears, fix the prompt before shipping.
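A regression like this can be caught by scanning plain-mode output for dose-like patterns before release. This is an illustrative sketch, not the production guard; the pattern list is an assumption and will have both false positives and gaps.

```typescript
// Illustrative dose-like patterns for a plain-mode regression check (assumed list).
const DOSING_PATTERNS: RegExp[] = [
  /\b\d+(\.\d+)?\s?(mg|mcg|g|ml|units?|iu)\b/i, // amount + unit
  /\b(once|twice|three times) (a|per) (day|week)\b/i, // frequency
  /\b(bd|bid|tds|tid|qid|nocte|prn)\b/i, // prescribing abbreviations
  /\b(intravenous(ly)?|subcutaneous(ly)?|intramuscular(ly)?)\b/i, // route
  /\b(titrate|titration)\b/i, // titration
];

// Returns the first offending snippet for each pattern that matches.
function findDosingLeaks(plainAnswer: string): string[] {
  return DOSING_PATTERNS.flatMap((re) => {
    const m = plainAnswer.match(re);
    return m ? [m[0]] : [];
  });
}
```

Running this over a fixed set of plain-mode outputs in CI turns "if a regression appears" into a failing test rather than a user report.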

✗ Never fabricate citations

Citation validator walks every `[Ref N]` pointer in the prose against the top-15 citation block. Orphan references are logged. Synthesis schemas carry hard min character counts so the LLM cannot produce a shallow placeholder answer.

✗ Never give a verdict on an individual

Plain answers use ‘studies suggest’, ‘evidence indicates’, ‘researchers found’ — never ‘you have…’. The clinician answer is an evidence summary built from published literature. It is not a verdict on any individual person or case.

✗ Never store subscription credentials

User subscription preferences (eTG / AMH / UpToDate / BMJ Best Practice / DynaMed / Cochrane / NICE) are boolean flags only. We never store passwords or tokens for licensed third parties. Ever.

✗ Never use a single LLM provider for synthesis

The synthesis path requires the Anthropic → OpenAI → Google fallback chain. Single-provider calls on this path are rejected in code review, for resilience and clinical safety.
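The fallback rule can be enforced in a small provider-agnostic wrapper. This is a minimal sketch: `Provider` is a hypothetical interface, and how each provider actually calls its model (for example through an AI gateway) is deliberately left out. If the underlying call throws on a schema mismatch, that failure falls through to the next provider too.

```typescript
// Hypothetical provider wrapper; the real calls go through an AI gateway not shown here.
interface Provider<T> {
  name: string;
  generate: (prompt: string) => Promise<T>;
}

async function generateWithFallback<T>(
  prompt: string,
  chain: Provider<T>[],
): Promise<{ provider: string; output: T }> {
  // Encode the hard rule: the synthesis path never runs on a single provider.
  if (chain.length < 2) throw new Error("synthesis requires a multi-provider fallback chain");
  const errors: string[] = [];
  for (const p of chain) {
    try {
      return { provider: p.name, output: await p.generate(prompt) };
    } catch (err) {
      errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed -> ${errors.join("; ")}`);
}

// Clinician and plain-language answers run in parallel, each with its own chain.
async function synthesise<C, P>(prompt: string, clinician: Provider<C>[], plain: Provider<P>[]) {
  const [c, p] = await Promise.all([
    generateWithFallback(prompt, clinician),
    generateWithFallback(prompt, plain),
  ]);
  return { clinician: c, plain: p };
}
```

Recording which provider answered makes it possible to audit how often the primary model is actually serving the synthesis.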

Roadmap

What's deferred — ranked by leverage

Every shipping product has gaps. Here are PICO-SEARCH's, ranked. Honest disclosure is part of the trust posture.

1
PubTator + Unpaywall source clients
Free PDF link button on every citation card via Unpaywall. Entity-tag chips on PubMed cards via PubTator 3.0. Both Apache/MIT, both free REST, both shippable in one commit. ETA: next sprint.
2
NCTS Syndication TS port
Pulls the NCTS Atom feed daily and stamps every synthesis answer with ‘Pinned to AMT v3 release YYYY-MM’. Defensible against ‘your data is stale’ critique. ETA: ~1 week.
3
@iebh/sra-polyglot live in-browser
Bond IEBH ships an MIT npm package that translates a PubMed search query into Ovid, Embase, Cochrane, CINAHL, Web of Science and Scopus syntax. Replace our Polyglot link-out with the live translator. ETA: 1 commit.
4
BioLinkBERT local re-ranker
Stanford LinkBERT (Apache-2.0) outperforms PubMedBERT on BLURB. A local re-ranking pass between source fan-out and LLM synthesis is expected to cut the Sonnet token bill by ~40-60% per query without hurting answer quality. ETA: ~1 week.
5
Regression suite — EBM-NLP + MS² + MedReview
Hard PICO span F1 + ROUGE numbers we can defend against the surveyed open SR-automation tools. ETA: 3-5 days.
