How the PICO-SEARCH clinical brain works
A 25-agent federated literature search + LLM-assisted rapid evidence summary pipeline. Real-time evidence retrieval — surfaces existing systematic reviews, meta-analyses and RCTs; does NOT conduct a new systematic review (that is a 6–18 month workflow). PRISMA-style layout, citation-validated, AU regulatory context-aware.
The numbers on this page are read live from SOURCE_REGISTRY, REPUTABLE_SITES and DEPARTMENTS on every page load. Three imported MIT-licensed npm packages from Bond IEBH will be wired in over upcoming sprints.
What PICO-SEARCH retrieves, and what it doesn't do
There are three tiers of evidence review. PICO-SEARCH is the fast one: it surfaces the top-tier evidence that already exists (systematic reviews, meta-analyses, RCTs, guidelines) and grades it against the CEBM hierarchy in real time. It does NOT conduct a new rapid review or a new Cochrane systematic review (those are multi-reviewer workflows measured in weeks to months). We borrow the PRISMA reporting layout so the output looks familiar, but that is a layout choice, not a methodology claim.
| Tier | Effort | Dual-reviewer | PROSPERO | Protocol | PICO-SEARCH |
|---|---|---|---|---|---|
| Full Cochrane systematic review | 6–18 months | Yes | Yes | Yes | Not this |
| Rapid review (WHO / Cochrane RRMG) | Days–weeks | Usually no | Optional | Yes | Not this either |
| Real-time evidence retrieval | Seconds–minutes | No | No | No | This is PICO-SEARCH |
25 specialist AI agents — Nerd Burger router + 24 ICD-11 specialists
The Nerd Burger is a 3-stage router (keyword → MeSH → LLM fallback). Each of the 24 specialists is a Claude agent with its own system prompt, ICD-11 chapter scope, MeSH tree roots, and preferred sources. Compound questions fan out to multiple specialists in parallel.
Nerd Burger (router)
3-stage router: keyword/regex (24 specialist patterns) → MeSH-tree walk → LLM fallback (Haiku) with department menu. Outputs 1–5 routed Sliders. Never searches evidence directly. Synthesises cross-department answers when multiple Sliders are engaged.
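The three routing stages can be sketched as a simple escalation. This is a minimal illustration, not the production router: the two regex patterns, the `RouterDeps` interface and the `meshWalk` / `llmFallback` stubs are hypothetical, and "no match" stands in for the ambiguity test that triggers stages B and C.

```typescript
// Hypothetical sketch of the keyword -> MeSH -> LLM escalation.
type Department = string;

interface RouterDeps {
  // Stage B (assumed helper): departments whose MeSH tree roots match the question.
  meshWalk: (question: string) => Department[];
  // Stage C (assumed helper): LLM picks from the department menu.
  llmFallback: (question: string, menu: Department[]) => Department[];
}

// Only two illustrative patterns are shown; the real router has 24.
const KEYWORD_PATTERNS: Array<[Department, RegExp]> = [
  ["Cardiovascular", /\b(heart failure|arrhythmia|hypertension|statins?)\b/i],
  ["Respiratory", /\b(asthma|copd|pneumonia)\b/i],
];

const MAX_SLIDERS = 5;

function route(question: string, deps: RouterDeps): Department[] {
  // Stage A: keyword/regex match against specialist patterns.
  let hits = KEYWORD_PATTERNS.filter(([, re]) => re.test(question)).map(([dept]) => dept);
  // Stage B: MeSH-tree walk when stage A finds nothing.
  if (hits.length === 0) hits = deps.meshWalk(question);
  // Stage C: LLM fallback with the department menu.
  if (hits.length === 0) {
    hits = deps.llmFallback(question, KEYWORD_PATTERNS.map(([dept]) => dept));
  }
  // Remove repeats and cap at five routed Sliders.
  return Array.from(new Set(hits)).slice(0, MAX_SLIDERS);
}
```

A compound question such as "statins for hypertension in asthma" matches two patterns in stage A, so both specialists are engaged and the later stages never run.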
- Neurology & Nervous System →ICD-11 8A00–8E7Z · MeSH C10
Diseases of the central and peripheral nervous system: stroke, dementia, epilepsy, headache, neurodegeneration, neuromuscular disease.
- Ophthalmology & Vision →ICD-11 9A00–9E1Z · MeSH C11
Diseases of the eye, visual system, and adnexa. Glaucoma, macular degeneration, retinopathy, cataract, refractive disorders.
- Ear, Nose & Throat →ICD-11 AA00–AC0Z · MeSH C09
Diseases of the ear, nose, throat, sinus, larynx, and mastoid. Hearing loss, otitis, rhinosinusitis, head & neck infection.
- Cardiovascular →ICD-11 BA00–BE2Z · MeSH C14
Heart and vascular disease: coronary disease, heart failure, arrhythmia, hypertension, lipid disorders, valvular disease, stroke overlap.
- Respiratory →ICD-11 CA00–CB7Z · MeSH C08
Respiratory disease: asthma, COPD, ILD, pneumonia, sleep-disordered breathing, tuberculosis, pulmonary hypertension.
- Gastrointestinal & Hepatology →ICD-11 DA00–DE2Z · MeSH C06
GI tract, liver, biliary, pancreas. IBD, IBS, viral hepatitis, fatty liver, reflux, peptic disease, functional GI.
- Endocrine, Nutrition & Metabolic →ICD-11 5A00–5D46 · MeSH C18, C19
Endocrine, nutrition, metabolism: diabetes, obesity, thyroid, adrenal, pituitary, osteoporosis, lipid, weight-loss pharmacology (GLP-1 etc.).
- Renal & Urology →ICD-11 GA00–GC8Z · MeSH C12, C13
Kidney disease, electrolyte and acid-base disorders, urinary tract, bladder, prostate, stone disease, nephrology overlap with CV and endocrine.
- Musculoskeletal & Rheumatology →ICD-11 FA00–FC0Z · MeSH C05, C17
Bone, joint, muscle, connective tissue, rheumatology. OA, RA, spondyloarthropathy, SLE, fibromyalgia, orthopaedic trauma.
- Dermatology →ICD-11 EA00–EM0Z · MeSH C17
Skin, hair, nails, subcutaneous tissue. Eczema, psoriasis, acne, skin cancer, infections, drug eruptions, paediatric dermatology.
- Mental Health & Psychiatry →ICD-11 6A00–6E8Z · MeSH F03
Mood, anxiety, psychotic, neurodevelopmental, substance use, eating disorders, trauma. Psychopharmacology + psychotherapy evidence.
- Obstetrics & Maternity →ICD-11 JA00–JB6Z · MeSH C13.703
Pregnancy, labour, delivery, postpartum, maternal medicine, prenatal screening, fetal medicine, breastfeeding.
- Gynaecology & Women's Health →ICD-11 GA00–GA4Z · MeSH C13
Female reproductive system, menstrual disorders, PCOS, endometriosis, menopause, contraception, HRT, gynae oncology overlap.
- Men's Health →ICD-11 GB00–GC8Z · MeSH C12
Male reproductive system, andrology, testosterone, erectile dysfunction, prostate, sexual health, male-specific overlap with cardiovascular.
- Paediatrics & Child Health →ICD-11 KA00–KD5Z · MeSH M01.060.406
Neonatal, infant, childhood and adolescent medicine across all organ systems. Developmental, behavioural, growth, vaccination, paediatric oncology overlap.
- Geriatrics & Older Persons →ICD-11 * · MeSH M01.060.116
Medicine of older adults: frailty, falls, polypharmacy, cognitive decline, functional assessment, end-of-life, multimorbidity.
- Haematology →ICD-11 3A00–3C0Z · MeSH C15
Blood and blood-forming organs. Anaemia, clotting and bleeding disorders, haemoglobinopathies, transfusion, haematological malignancy overlap with oncology.
- Oncology →ICD-11 2A00–2F9Z · MeSH C04
Solid tumour and haematological cancer treatment, screening, survivorship, palliative intent. Systemic therapy evidence, immunotherapy, radiation, biomarker testing.
- Infectious Disease →ICD-11 1A00–1H0Z · MeSH C01
Infectious and parasitic disease, antimicrobial therapy and stewardship, global/travel health, sepsis, HIV, TB, hepatitis, emerging pathogens.
- Immunology & Allergy →ICD-11 4A00–4B4Z · MeSH C20
Immune system disorders, primary immunodeficiency, autoimmunity overlap, allergy, anaphylaxis, asthma overlap, immunotherapy.
- Emergency & Critical Care →ICD-11 NA00–NF2Z · MeSH E02.365
Acute resuscitation, trauma, emergency medicine, intensive care, sepsis management, ventilation, shock, mass-casualty triage.
- Rehabilitation, Pain & Palliative →ICD-11 * · MeSH E02.760, G11
Physical rehabilitation, chronic pain management, palliative and end-of-life care, symptom control, hospice, functional restoration.
- Public Health & Preventive Medicine →ICD-11 QA00–QF4Z · MeSH N06
Population health, screening, vaccination, epidemiology, health-promotion interventions, social determinants, cost-effectiveness.
- Dental & Oral Health →ICD-11 DA00–DA0Z · MeSH C07
Teeth, gingiva, oral mucosa, salivary glands, jaw. Caries, periodontal disease, oral cancer screening, paediatric dentistry overlap, orthodontics.
10-stage live federation + synthesis
From query intake to validated cited answer in 30–60 seconds. Stage 3 fans out across 11 free APIs in parallel and stage 4 dedupes and ranks the results; stages 3.5–3.6 inject structured ground truth from US safety and AU regulatory sources; stage 6 runs the layered LLM chain with PRISMA schema enforcement.
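The parallel fan-out with a per-source timeout (15 seconds in stage 03) can be sketched as below. The `Source` shape, `withTimeout` and `fanOut` are hypothetical names for illustration; the point is only that each source settles independently, so one slow source never blocks the rest.

```typescript
// Hypothetical sketch: every source gets its own timeout and settles on its own.
type Citation = { title: string };
type Source = { name: string; search: (question: string) => Promise<Citation[]> };

const SOURCE_TIMEOUT_MS = 15000; // per-source timeout stated for stage 03

function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function fanOut(question: string, sources: Source[], timeoutMs = SOURCE_TIMEOUT_MS) {
  // Failures and timeouts are converted to empty results instead of rejecting the batch.
  return Promise.all(
    sources.map((source) =>
      withTimeout(source.search(question), timeoutMs).then(
        (citations) => ({ source: source.name, ok: true, citations }),
        () => ({ source: source.name, ok: false, citations: [] as Citation[] }),
      ),
    ),
  );
}
```

Total wall-clock time is bounded by the timeout rather than by the slowest source, at the cost of occasionally missing results from a source that would have answered late.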
- 01
Routing — Nerd Burger 3-stage router
Stage A keyword/regex match against 24 specialist patterns. Stage B MeSH-tree walk for ambiguous cases. Stage C LLM fallback (Haiku) with department menu. Output: one to five routed Sliders.
- 02
Smart reuse cache check
Normalised question hash lookup via find_recent_search RPC. If the same question (case + punctuation + stopword normalised) was completed within 90 days, reuse the result. Skippable with the ‘fresh search’ checkbox.
- 03
Literature fan-out — 11 sources in parallel
Each routed Slider runs its own parallel search across PubMed (RCT/SR + dedicated Practice Guideline streams), Europe PMC, ClinicalTrials.gov v2, Semantic Scholar, CORE, Crossref, Epistemonikos. Each source has its own 15-second timeout. One slow source never blocks the rest.
- 04
Dedupe + CEBM rank
Cross-source dedupe by DOI → PMID → NCT → fuzzy title+year. Then rank: evidence_tier × recency_decay × relevance × Jadad heuristic. Top 15 citations carried forward to synthesis.
- 3.5
Safety overlay — US drug ground truth
Drug names extracted from the question, resolved to RxNorm, then OpenFDA label sections (indications, contraindications, warnings, adverse reactions, drug interactions, pregnancy) and FAERS top-5 adverse events. Special-population regex flags pregnancy, paediatric, geriatric, renal, hepatic, breastfeeding. Built into a structured promptBlock.
- 3.6
AU context overlay — Australian regulatory ground truth
PBS API for authority/restriction text + subsidised brand listings, NCTS Ontoserver for SNOMED CT-AU expansion of clinical terms, TGA CKAN discovery + safety alert deep-links. Conditional firing — only the relevant arm runs per question. Built into a structured promptBlock.
- 05
Authoritative source matcher
Deterministic walker over SOURCE_REGISTRY for any source whose authoritativeFor[] keywords appear in the question. Surfaces eTG, AMH, HealthPathways, PBS API, NCTS, TGA, NHMRC, health.gov.au as click-through banners above the answer. Pure registry walk, no LLM call.
- 06
Layered LLM synthesis
Anthropic Claude Sonnet 4.6 → OpenAI GPT-5 → Google Gemini 2.5 Pro fallback chain via Vercel AI Gateway. The clinician answer (PRISMA Zod schema) and the plain-language answer (Y8 Zod schema with hard minimum character counts) are generated in parallel. Single-provider calls are forbidden for the synthesis path.
- 07
Citation validator
Every [Ref N] pointer in the generated prose is walked and validated against the top-15 citation block. Orphan references (N out of bounds) and unreferenced citations (in the block but not cited in the prose) are both logged. The job still ships, but quality metrics track hallucinations over time.
- 08
Render
Authoritative banner → safety overlay banner → AU context banner → PRISMA-aligned clinician answer → CEBM pyramid + ranked citations → further reading link-outs → progress trace. Top to bottom.
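The stage 04 dedupe cascade (DOI → PMID → NCT → fuzzy title+year) can be sketched as follows. This is an assumption-laden illustration: the `Citation` shape is hypothetical, and a normalised-title comparison stands in for whatever fuzzy matcher the pipeline actually uses.

```typescript
// Hypothetical sketch: a record is a duplicate if any of its keys was already seen.
interface Citation {
  title: string;
  year: number;
  doi?: string;
  pmid?: string;
  nct?: string;
}

function dedupeKeys(c: Citation): string[] {
  const keys: string[] = [];
  if (c.doi) keys.push(`doi:${c.doi.toLowerCase()}`);
  if (c.pmid) keys.push(`pmid:${c.pmid}`);
  if (c.nct) keys.push(`nct:${c.nct.toUpperCase()}`);
  // Stand-in for fuzzy matching: lowercase, strip punctuation, pair with year.
  const normalisedTitle = c.title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
  keys.push(`title:${normalisedTitle}:${c.year}`);
  return keys;
}

function dedupe(citations: Citation[]): Citation[] {
  const seen = new Set<string>();
  const unique: Citation[] = [];
  for (const citation of citations) {
    const keys = dedupeKeys(citation);
    const isDuplicate = keys.some((key) => seen.has(key));
    // Register keys even for duplicates so later records can match on any identifier.
    keys.forEach((key) => seen.add(key));
    if (!isDuplicate) unique.push(citation);
  }
  return unique;
}
```

Registering every key of a dropped duplicate matters when sources disagree on identifiers: a record carrying only a PMID can still be matched to a paper first seen with only a DOI, provided some intermediate record linked the two.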
Three structured fact layers in the synthesis prompt
Literature citations, safety overlay, and AU context overlay. Each layer has its own boundary rules in the LLM prompt so the model never confuses regulatory data with peer-reviewed evidence.
Literature citations
Always carries `[Ref N]` pointers. The only layer that gets reference numbers in the prose. The LLM is forbidden to fabricate citations or cite N greater than the citation count.
Safety overlay
Structured drug facts from US FDA. Treated as ground truth, NOT as new citations. Inline references like ‘FDA black box warning for…’ are allowed, but no `[Ref N]` is generated for them.
AU context overlay
Structured AU regulatory facts. Clinician mode references SNOMED concepts inline as `(SNOMED <code> |<display>|)` and PBS authority status as `(PBS authority required)`. Plain mode explains PBS authority in patient-friendly language. Never gets a `[Ref N]` either.
CEBM evidence pyramid — question-type weighted
Burns/Rohrich/Chung 2012 hierarchy. Tier weights from 1.0 (1a SR-of-RCTs) to 0.15 (5 expert opinion). Mappings are question-type-specific: for a treatment question an RCT is tier 1b; for a test-accuracy question an RCT is tier 2a, and validation studies take tier 1b instead.
| Tier | Study design | Weight | Example |
|---|---|---|---|
| 1a | SR / meta-analysis of RCTs | ×1.00 | Cochrane review of statins for primary prevention |
| 1b | Single RCT (or SR of inception cohorts for prognosis) | ×0.85 | JUPITER trial for rosuvastatin |
| 2a | SR of cohort studies | ×0.70 | SR of cohort studies linking PPI use to fracture |
| 2b | Single cohort study | ×0.60 | Framingham Heart Study cohort analysis |
| 3a | SR of case-control studies | ×0.45 | SR of case-control studies on NSAIDs and AKI |
| 3b | Single case-control study | ×0.35 | Case-control of clopidogrel and bleeding |
| 4 | Case series / case report | ×0.25 | Case series of rare drug interactions |
| 5 | Expert opinion / narrative review | ×0.15 | Editorial in NEJM |
Question-type weighted: a treatment question puts SR-of-RCTs at tier 1a; a test-accuracy question puts SR-of-validation studies at tier 1a and weights RCTs as tier 2a; a prognosis question puts SR-of-inception cohorts at tier 1a; an aetiology question puts cohort SRs first. The full mapping lives in packages/config/src/evidence-tiers.ts.
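As a rough sketch of how the tier weights feed the stage 04 ranking score (evidence_tier × recency_decay × relevance × Jadad heuristic): the tier weights below come from the pyramid, but the partial question-type map, the exponential decay and its 10-year half-life are illustrative assumptions, not the contents of packages/config/src/evidence-tiers.ts.

```typescript
type Tier = "1a" | "1b" | "2a" | "2b" | "3a" | "3b" | "4" | "5";

// Weights as listed in the pyramid.
const TIER_WEIGHT: Record<Tier, number> = {
  "1a": 1.0, "1b": 0.85, "2a": 0.7, "2b": 0.6,
  "3a": 0.45, "3b": 0.35, "4": 0.25, "5": 0.15,
};

// Partial, illustrative question-type mapping (only the cases stated in the text).
type QuestionType = "treatment" | "test-accuracy";
type Design = "sr-of-rcts" | "rct" | "sr-of-validation" | "validation-study";

const TIER_BY_QUESTION: Record<QuestionType, Partial<Record<Design, Tier>>> = {
  treatment: { "sr-of-rcts": "1a", rct: "1b" },
  "test-accuracy": { "sr-of-validation": "1a", "validation-study": "1b", rct: "2a" },
};

// Assumed decay shape: the weight halves every `halfLifeYears`.
function recencyDecay(ageYears: number, halfLifeYears = 10): number {
  return Math.pow(0.5, Math.max(0, ageYears) / halfLifeYears);
}

interface RankInput {
  tier: Tier;
  ageYears: number;
  relevance: number;       // 0..1, from the search source (assumed scale)
  jadadMultiplier: number; // quality multiplier from the Jadad heuristic
}

function rankScore(input: RankInput): number {
  return (
    TIER_WEIGHT[input.tier] *
    recencyDecay(input.ageYears) *
    input.relevance *
    input.jadadMultiplier
  );
}
```

Because the factors multiply, a weak value on any one axis pulls the whole score down: under these assumptions a ten-year-old tier 1a review scores 0.5, below a brand-new tier 1b trial at 0.85 with equal relevance and quality.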
PRISMA-style layout + citation validator
The rapid evidence summary is generated via generateObject + a Zod schema modelled on the PRISMA 2020 reporting layout. We borrow the section structure so clinicians can scan methods/results/limitations in a familiar shape — we are NOT claiming to produce a systematic review. Post-synthesis, every [Ref N] pointer in the prose is walked and validated against the citation block. Orphan references are logged and surfaced on the result page for review.
The clinician answer is generated via generateObject + a Zod schema modelled on PRISMA 2020 reporting standards. Every section is structurally validated:
- Background — minimum character count, sets up the clinical question
- Methods — the search strategy, sources searched, dates, study types
- Results — narrative synthesis with [Ref N] pointers and an evidence-tier breakdown
- Limitations — risk of bias, heterogeneity, gaps in the evidence
- Conclusion — GRADE strength rating + practice recommendation
- Authoritative sources — registry-matched click-throughs (eTG, AMH, etc.) for any paywalled references
Post-synthesis, the citation validator walks every [Ref N] in the prose against the citation block. Orphans are logged and surfaced. The job ships even with orphans (so the user always gets an answer) but the quality metric tracks hallucination rate over time.
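A minimal sketch of the validator walk, assuming pointers are 1-based and always written exactly as `[Ref N]` (grouped forms such as `[Ref 1, 2]` are not handled here). The function and field names are hypothetical.

```typescript
interface CitationReport {
  orphans: number[];      // cited in the prose but outside the citation block
  unreferenced: number[]; // in the citation block but never cited in the prose
}

function validateCitations(prose: string, citationCount: number): CitationReport {
  const cited = new Set<number>();
  const pointer = /\[Ref (\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pointer.exec(prose)) !== null) {
    cited.add(Number(match[1]));
  }

  const orphans = Array.from(cited)
    .filter((n) => n < 1 || n > citationCount)
    .sort((a, b) => a - b);

  const unreferenced: number[] = [];
  for (let n = 1; n <= citationCount; n++) {
    if (!cited.has(n)) unreferenced.push(n);
  }
  return { orphans, unreferenced };
}
```

For a three-item citation block, the prose "Statins reduce events [Ref 1][Ref 4]. See also [Ref 2]." yields `orphans: [4]` and `unreferenced: [3]`. The report is logged rather than thrown, which matches the rule that the job still ships.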
Evidence-Based Medicine — the underlying discipline
PICO question structure (Population / Intervention / Comparator / Outcome). CEBM evidence hierarchy. GRADE strength of recommendation. AGREE II for guideline appraisal. PRISMA 2020 for synthesis reporting. PICO-SEARCH applies all five.
PICO
Population — Intervention — Comparator — Outcome. The structure that turns a vague clinical question (‘should I give statins to my 75-year-old?’) into a searchable one (‘in adults over 70 without cardiovascular disease, do statins reduce all-cause mortality compared with placebo?’).
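One way to picture the structure is as a typed record that gets flattened into a boolean search string. This is a hypothetical sketch: the `PicoQuestion` type and `toBooleanQuery` helper are illustrative, and a real search would also map each element to MeSH terms and synonyms.

```typescript
interface PicoQuestion {
  population: string;
  intervention: string;
  comparator?: string; // some questions have no explicit comparator
  outcome: string;
}

// Joins the non-empty PICO elements with AND, one parenthesised group each.
function toBooleanQuery(pico: PicoQuestion): string {
  return [pico.population, pico.intervention, pico.comparator, pico.outcome]
    .filter((part): part is string => typeof part === "string" && part.trim().length > 0)
    .map((part) => `(${part.trim()})`)
    .join(" AND ");
}

const statinQuestion: PicoQuestion = {
  population: "adults over 70 without cardiovascular disease",
  intervention: "statins",
  comparator: "placebo",
  outcome: "all-cause mortality",
};
const statinQuery = toBooleanQuery(statinQuestion);
```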
CEBM hierarchy
The Centre for Evidence-Based Medicine (Oxford) ranks evidence from tier 1a (systematic reviews of RCTs) down to tier 5 (expert opinion). PICO-SEARCH uses the Burns/Rohrich/Chung 2012 mapping with question-type weighting.
GRADE
The Grading of Recommendations Assessment, Development and Evaluation framework. After ranking the evidence, GRADE expresses how confident a clinician should be in the recommendation: high / moderate / low / very low. The clinician answer always includes a GRADE rating.
PRISMA 2020
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2020 standard. Defines the section layout a systematic review report should use. PICO-SEARCH borrows this LAYOUT for its rapid evidence summaries — we use PRISMA's section headings so clinicians can scan methods/results/limitations in a familiar shape. PICO-SEARCH is NOT a systematic review. We use the reporting shell; we are not claiming the methodological rigour of a full SR.
AGREE II
Appraisal of Guidelines for Research and Evaluation. The standard for assessing whether a clinical practice guideline is well-developed. Used implicitly when ranking guideline citations.
Risk of bias
Cochrane RoB 2 for RCTs, the Newcastle-Ottawa Scale for cohorts. PICO-SEARCH currently applies only a Jadad heuristic on RCT abstracts as a quality multiplier in the ranking score; full RoB assessment is on the deferred queue.
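A Jadad-style heuristic over abstract text could look like the sketch below. The five checks mirror the five Jadad items (randomisation, randomisation method, double blinding, blinding method, withdrawals), but the keyword patterns and the score-to-multiplier mapping are assumptions for illustration, not the shipped heuristic.

```typescript
// Hypothetical keyword heuristic: one point per Jadad item detected in the abstract.
function jadadHeuristic(abstract: string): number {
  const text = abstract.toLowerCase();
  let score = 0;
  if (/\brandomi[sz]ed\b|\brandomly\b/.test(text)) score += 1;           // described as randomised
  if (/computer-generated|random number|block randomi[sz]ation/.test(text)) score += 1; // method given
  if (/double[- ]blind/.test(text)) score += 1;                          // described as double blind
  if (/identical placebo|matching placebo|double[- ]dummy/.test(text)) score += 1; // blinding method given
  if (/withdr[ae]w|drop-?outs?|lost to follow-up/.test(text)) score += 1; // withdrawals reported
  return score; // 0..5
}

// Assumed mapping from score to ranking multiplier: 0.5 at score 0, 1.0 at score 5.
function jadadMultiplier(score: number): number {
  const clamped = Math.min(5, Math.max(0, score));
  return 0.5 + 0.1 * clamped;
}
```

Abstracts rarely report every item, so a keyword pass like this tends to under-score well-run trials; that is one reason it can only serve as a multiplier and not as a substitute for a full risk-of-bias assessment.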
Hard rules — what the brain never does
The legal and clinical safety boundaries. Encoded in source code, system prompts, and Zod schemas. Auditable.
✗ Never scrape licensed content
eTG, AMH, MIMS, UpToDate, BMJ Best Practice, DynaMed and 12 other commercial references are tier ‘licensed_linkout’. We surface a click-through banner. We never fetch their content. Ever.
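The link-out behaviour can be sketched as a deterministic registry walk that only ever emits a banner. The `authoritativeFor` field and the `licensed_linkout` tier come from the description above; the entry shape, the `free_api` tier name, the keywords and the example.org URLs are placeholders.

```typescript
type SourceTier = "free_api" | "licensed_linkout"; // "free_api" is an assumed name

interface RegistryEntry {
  id: string;
  name: string;
  tier: SourceTier;
  url: string;
  authoritativeFor: string[];
}

interface Banner {
  sourceId: string;
  label: string;
  href: string;
}

// Pure registry walk: no network fetch, no LLM call, only click-through banners.
function matchAuthoritativeSources(question: string, registry: RegistryEntry[]): Banner[] {
  const q = question.toLowerCase();
  return registry
    .filter((entry) => entry.authoritativeFor.some((kw) => q.includes(kw.toLowerCase())))
    .map((entry) => ({ sourceId: entry.id, label: entry.name, href: entry.url }));
}

// Placeholder registry with illustrative keywords.
const EXAMPLE_REGISTRY: RegistryEntry[] = [
  { id: "etg", name: "eTG", tier: "licensed_linkout", url: "https://example.org/etg", authoritativeFor: ["antibiotic"] },
  { id: "pbs", name: "PBS API", tier: "free_api", url: "https://example.org/pbs", authoritativeFor: ["subsidised", "authority"] },
];
```

Plain substring matching is a simplification here; short keywords would need word-boundary checks to avoid false positives.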
✗ Never provide dosing in plain mode
The plain-language system prompt forbids dosing, frequencies, routes, schedules, or titration. If a regression appears, fix the prompt before shipping.
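One way to catch such a regression is a post-generation check that scans plain-mode output for dosing language. The patterns below are an assumed, deliberately small set and the `findDosingLanguage` helper is hypothetical; a real guard would need a broader list and would still produce some false positives.

```typescript
// Hypothetical regression guard for plain-mode answers.
const DOSING_PATTERNS: RegExp[] = [
  /\b\d+(?:\.\d+)?\s?(?:mg|mcg|µg|g|ml)\b/i,                                   // dose with unit
  /\b(?:once|twice|three times|four times)\s+(?:a day|daily|a week|weekly)\b/i, // frequency
  /\bevery\s+\d+\s+(?:hours?|days?|weeks?)\b/i,                                // schedule
];

// Returns the first matching snippet for each pattern that fires.
function findDosingLanguage(plainAnswer: string): string[] {
  const hits: string[] = [];
  for (const pattern of DOSING_PATTERNS) {
    const match = plainAnswer.match(pattern);
    if (match) hits.push(match[0]);
  }
  return hits;
}
```

An empty result is the pass condition; any hit points at a prompt regression to fix before shipping.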
✗ Never fabricate citations
Citation validator walks every `[Ref N]` pointer in the prose against the top-15 citation block. Orphan references are logged. Synthesis schemas carry hard min character counts so the LLM cannot produce a shallow placeholder answer.
✗ Never give a verdict on an individual
Plain answers use ‘studies suggest’, ‘evidence indicates’, ‘researchers found’ — never ‘you have…’. The clinician answer is an evidence summary built from published literature. It is not a verdict on any individual person or case.
✗ Never store subscription credentials
User subscription preferences (eTG / AMH / UpToDate / BMJ Best Practice / DynaMed / Cochrane / NICE) are boolean flags only. We never store passwords or tokens for licensed third parties. Ever.
✗ Never use a single LLM provider for synthesis
The synthesis path requires the Anthropic → OpenAI → Google fallback chain. Single-provider calls are rejected in code review. The reasons are resilience and clinical safety.
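The fallback behaviour can be illustrated with a provider-agnostic helper. This sketch does not use the Vercel AI Gateway or any real SDK call; the `Provider` interface and `synthesiseWithFallback` are hypothetical, and the runtime length check is just one possible way to express the no-single-provider rule in code.

```typescript
interface Provider<T> {
  name: string;
  generate: (prompt: string) => Promise<T>;
}

// Tries each provider in order and returns the first successful result.
async function synthesiseWithFallback<T>(
  prompt: string,
  chain: Provider<T>[],
): Promise<{ provider: string; result: T }> {
  if (chain.length < 2) {
    throw new Error("synthesis requires a multi-provider fallback chain");
  }
  const failures: string[] = [];
  for (const provider of chain) {
    try {
      const result = await provider.generate(prompt);
      return { provider: provider.name, result };
    } catch (err) {
      // Record the failure and move on to the next provider in the chain.
      failures.push(`${provider.name}: ${String(err)}`);
    }
  }
  throw new Error(`all providers failed: ${failures.join("; ")}`);
}
```

Collecting the failure messages keeps an audit trail of which providers were tried before the answer was produced.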
What's deferred — ranked by leverage
Every shipping product has gaps. Here are PICO-SEARCH's, ranked. Honest disclosure is part of the trust posture.
- 1
PubTator + Unpaywall source clients
Free PDF link button on every citation card via Unpaywall. Entity-tag chips on PubMed cards via PubTator 3.0. Both Apache/MIT, both free REST, both shippable in one commit. ETA: next sprint.
- 2
NCTS Syndication TS port
Pulls the NCTS Atom feed daily and stamps every synthesis answer with ‘Pinned to AMT v3 release YYYY-MM’. Defensible against ‘your data is stale’ critique. ETA: ~1 week.
- 3
@iebh/sra-polyglot live in-browser
Bond IEBH ship an MIT npm package that translates any PubMed search query into Ovid, Embase, Cochrane, CINAHL, Web of Science, Scopus syntaxes. Replace our Polyglot link-out with the live translator. ETA: 1 commit.
- 4
BioLinkBERT local re-ranker
Stanford LinkBERT (Apache-2.0) outperforms PubMedBERT on BLURB. A local re-ranking pass between source fan-out and LLM synthesis is projected to cut the Sonnet token bill by ~40–60% per query without hurting answer quality. ETA: ~1 week.
- 5
Regression suite — EBM-NLP + MS² + MedReview
Hard PICO span F1 + ROUGE numbers we can defend against the surveyed open SR-automation tools. ETA: 3-5 days.