About Iatronix

Iatronix is an evidence-based clinical reference built for medical professionals. It searches real-time data from FDA, PubMed, NICE, and your own documents, formats the results with AI, and grades every claim by the evidence behind it. Your API key. Your data. Your control.

Iatronix started in March 2026 as a personal side project by Kayomarz, who built it for his own clinical use and then made it public. More at kayomarz.com.

How a search works — step by step

Every query goes through a 7-stage pipeline. No AI token is spent until real data has been retrieved and quality-checked.

Query rewriting

Before any search begins, the query is cleaned: typos fixed, abbreviations expanded, terminology standardized (e.g. HTN → hypertension, MI → myocardial infarction). This maximizes match quality against PubMed MeSH terms and FDA drug labels.
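The abbreviation-expansion part of this step can be sketched as follows. This is a minimal illustration, not the actual Iatronix code: the ABBREVIATIONS table and the rewrite_query function are hypothetical names, only the HTN and MI entries come from the text above, and typo correction is left out entirely.

```python
import re

# Hypothetical lookup table; only these two entries appear in the text above.
ABBREVIATIONS = {
    "htn": "hypertension",
    "mi": "myocardial infarction",
}


def rewrite_query(query: str) -> str:
    """Lowercase, collapse whitespace, and expand known abbreviations."""
    cleaned = re.sub(r"\s+", " ", query.strip().lower())

    def expand(match: re.Match) -> str:
        word = match.group(0)
        return ABBREVIATIONS.get(word, word)

    # Matching whole words only means "mi" inside "amiodarone" is left alone.
    return re.sub(r"\b\w+\b", expand, cleaned)


print(rewrite_query("  HTN  after MI "))
```

Matching on whole words is the important design choice here: a naive substring replace would corrupt drug names that happen to contain an abbreviation.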

Classification

Regex pattern scoring instantly classifies the query into one of five types: drug, disease, procedure, evidence (study), or comparative (drug vs. drug). For ambiguous phrasing, a lightweight LLM call (GPT-OSS 120B via Cerebras by default) resolves the type. The query type determines which APIs to call and which response schema to fill.
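As a rough sketch of regex pattern scoring, the example below counts pattern hits per query type and returns None when no type clearly wins, which is where the LLM fallback described above would take over. The patterns, the PATTERNS table, and the function names are all assumptions made for illustration; the real pattern set is not documented here.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns, one small list per query type named in the text.
PATTERNS: Dict[str, list] = {
    "drug": [r"\bdos(e|ing|age)\b", r"\bside effects?\b", r"\binteractions?\b"],
    "disease": [r"\bmanagement\b", r"\bdiagnosis\b", r"\bsymptoms?\b", r"\btreatment\b"],
    "procedure": [r"\bsurgery\b", r"\bbiopsy\b", r"\b\w+(ectomy|oscopy|plasty)\b"],
    "evidence": [r"\btrials?\b", r"\brct\b", r"\bmeta-analysis\b", r"\bstudy\b"],
    "comparative": [r"\bvs\b", r"\bversus\b", r"\bcompared (to|with)\b"],
}


def score_query(query: str) -> Dict[str, int]:
    """Count how many patterns of each type match the query."""
    q = query.lower()
    return {
        qtype: sum(1 for pattern in patterns if re.search(pattern, q))
        for qtype, patterns in PATTERNS.items()
    }


def classify(query: str) -> Optional[str]:
    """Return the winning type, or None when the query is ambiguous."""
    ranked = sorted(score_query(query).items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (_, runner_up_score) = ranked[0], ranked[1]
    if best_score == 0 or best_score == runner_up_score:
        return None  # no match or a tie: the caller would ask the LLM instead
    return best


print(classify("metformin dosing and side effects"))
print(classify("hypertension treatment trial"))
```

Returning None on a tie keeps the cheap path deterministic and reserves the LLM call for queries the patterns genuinely cannot separate.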

Semantic cache lookup

Before any API call, the query is embedded into a vector and compared against all previously answered queries using cosine similarity (threshold: 0.92). If a sufficiently similar past answer exists and is less than 7 days old, it is returned immediately. Hits older than 7 days are treated as stale: they are skipped and the full pipeline runs again, to avoid serving outdated clinical content.
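The lookup rule can be sketched in a few lines. The 0.92 threshold and the 7-day limit come from the text above; the CacheEntry type, the lookup function, and the in-memory list standing in for a vector store are assumptions made for illustration.

```python
import math
import time
from dataclasses import dataclass
from typing import List, Optional, Sequence

SIMILARITY_THRESHOLD = 0.92          # from the text above
MAX_AGE_SECONDS = 7 * 24 * 60 * 60   # 7 days


@dataclass
class CacheEntry:
    embedding: Sequence[float]
    answer: str
    created_at: float  # Unix timestamp


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup(
    query_embedding: Sequence[float],
    entries: List[CacheEntry],
    now: Optional[float] = None,
) -> Optional[str]:
    """Return a cached answer, or None to signal a full pipeline run."""
    now = time.time() if now is None else now
    best, best_sim = None, SIMILARITY_THRESHOLD
    for entry in entries:
        sim = cosine_similarity(query_embedding, entry.embedding)
        if sim >= best_sim:
            best, best_sim = entry, sim
    if best is None:
        return None  # nothing similar enough
    if now - best.created_at >= MAX_AGE_SECONDS:
        return None  # stale hit: skip it and rerun the pipeline
    return best.answer
```

Note that in this sketch a stale best match forces a rerun even if a fresher, slightly less similar entry exists. That mirrors the safety-first wording above, but it is an interpretation rather than documented behavior.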

Parallel data fetch

Relevant data is pulled in parallel from up to 10 sources with no LLM involvement at this stage.

Drug queries: OpenFDA labels, interactions, adverse events, DailyMed, RxNorm.

Disease queries: PubMed guidelines, recent RCTs (date-sorted), PMC full-text, StatPearls monographs, Unpaywall free PDFs, MedlinePlus summaries, NICE clinical guidelines.

Evidence queries: PubMed search ranked by publication date.

Each source has a 20-second timeout; failures are logged and skipped without blocking the response.
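One way to get per-source timeouts with skip-on-failure behavior is asyncio.gather combined with asyncio.wait_for, as sketched below. Only the 20-second timeout comes from the text above; fetch_one, fetch_all, and the shape of the fetcher callables are hypothetical, and real fetchers would make HTTP calls to the listed sources.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict, Optional, Tuple

SOURCE_TIMEOUT_SECONDS = 20.0  # from the text above
logger = logging.getLogger("fetch")

# A fetcher takes the query and returns that source's data (hypothetical shape).
Fetcher = Callable[[str], Awaitable[dict]]


async def fetch_one(
    name: str, fetcher: Fetcher, query: str, timeout: float
) -> Tuple[str, Optional[dict]]:
    """Run one source with its own timeout; never raise to the caller."""
    try:
        return name, await asyncio.wait_for(fetcher(query), timeout=timeout)
    except Exception as exc:  # timeouts and source errors alike
        logger.warning("source %s skipped: %r", name, exc)
        return name, None


async def fetch_all(
    query: str,
    fetchers: Dict[str, Fetcher],
    timeout: float = SOURCE_TIMEOUT_SECONDS,
) -> Dict[str, dict]:
    """Fetch every source concurrently and keep only the ones that succeeded."""
    results = await asyncio.gather(
        *(fetch_one(name, f, query, timeout) for name, f in fetchers.items())
    )
    return {name: data for name, data in results if data is not None}
```

Because each source is wrapped individually, one slow or failing API costs at most its own timeout and cannot block or fail the others.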

Evidence quality assessment

Before any LLM call, the fetched data is scored for quality. If the total evidence falls below a minimum threshold, the pipeline returns a DegradedResponse (a clear message explaining what was found) instead of generating potentially unsupported claims. This fail-closed behavior is intentional: a transparent 'insufficient data' message is always safer than a confident hallucination.
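The fail-closed gate can be illustrated like this. DegradedResponse is the name used above, but its fields, the EvidenceItem type, the source weights, and the threshold value are all assumptions chosen for the example; the actual scoring formula is not described here.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical weights and threshold, for illustration only.
SOURCE_WEIGHTS = {"guideline": 3.0, "rct": 3.0, "label": 2.0, "review": 2.0, "abstract": 1.0}
DEFAULT_WEIGHT = 0.5
MIN_EVIDENCE_SCORE = 4.0


@dataclass
class EvidenceItem:
    source_type: str
    text: str


@dataclass
class DegradedResponse:
    message: str
    sources_found: int


def score_evidence(items: List[EvidenceItem]) -> float:
    """Sum the weights of all items that actually carry text."""
    return sum(
        SOURCE_WEIGHTS.get(item.source_type, DEFAULT_WEIGHT)
        for item in items
        if item.text.strip()
    )


def quality_gate(items: List[EvidenceItem]) -> Union[List[EvidenceItem], DegradedResponse]:
    """Pass evidence through, or stop the pipeline before any LLM call."""
    score = score_evidence(items)
    if score < MIN_EVIDENCE_SCORE:
        return DegradedResponse(
            message=f"Insufficient evidence (score {score:.1f}, minimum {MIN_EVIDENCE_SCORE:.1f}).",
            sources_found=len(items),
        )
    return items
```

The caller checks the return type: a DegradedResponse goes straight back to the user, and only a list of evidence proceeds to generation.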

Adaptive LLM formatting

The LLM is prompted to prioritize fetched evidence and fill specific schema fields (BLUF headline, summary, sections, citations) without inventing data. Each claim must cite its source by index. If retrieval times out, a guarded fallback response may be generated with explicit validation warnings. The default model is GPT-OSS 120B via Cerebras (or Claude if an Anthropic key is configured in Settings).

Evidence grading & validation

After generation, each section is assigned a Level of Evidence (LOE I–III) and Class of Recommendation (COR I–IIb) based on its source type. RCT-backed guidelines earn LOE I; expert consensus earns LOE III. Citations are verified against the fetched sources. If the response is too sparse, a second-pass LLM call is triggered with a wider evidence budget. Results passing validation are stored in the semantic cache.

How hallucinations are prevented

The pipeline is designed so the LLM cannot invent clinical facts. Five mechanisms enforce this:

Evidence grounding: The LLM is instructed to anchor claims to fetched article text and cite each claim. If retrieval fails or times out, the system allows guarded fallback generation and surfaces warnings so unsupported claims are treated cautiously.

Fail-closed design: If retrieved evidence is insufficient, the pipeline stops and returns a DegradedResponse instead of proceeding to generation. A clear 'not enough data' message is safer than a confident wrong answer.

Citation validation: Every section cites specific source indices. The formatter verifies citations exist in the fetched data. Unsupported claims cannot earn a high LOE rating.

LOE/COR consistency enforcement: Levels of evidence are assigned by source type at a structural level, not inferred by the model. An expert opinion cannot be upgraded to LOE I regardless of how the LLM phrases the claim.

Query-focused retrieval: PubMed is searched using the standardized query term, not freeform prose. MeSH-matched results are more likely to be on-topic than semantic similarity alone. Date-sorted results prioritize recent guidelines over older studies.
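The citation check among these mechanisms can be sketched as below. The bracketed [n] citation format, the 1-based indexing, and the function names are assumptions; the text above only says that sections cite source indices and that those indices are verified against the fetched data.

```python
import re
from typing import List, Set

# Assumes citations are written as [1], [2], ... with 1-based indices.
CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def cited_indices(text: str) -> Set[int]:
    """Collect every source index cited in a section."""
    return {int(number) for number in CITATION_PATTERN.findall(text)}


def validate_citations(section_text: str, num_sources: int) -> List[str]:
    """Return warnings for a section; an empty list means it passes."""
    cited = cited_indices(section_text)
    warnings = []
    if not cited:
        warnings.append("section cites no sources")
    invalid = sorted(i for i in cited if not 1 <= i <= num_sources)
    if invalid:
        warnings.append(f"citations not in fetched data: {invalid}")
    return warnings


print(validate_citations("Permethrin 5% cream is first line [1][2].", num_sources=2))
print(validate_citations("Ivermectin is an alternative [5].", num_sources=2))
```

A check like this only proves that a cited source exists, not that the source supports the claim, which is why it sits alongside the other four mechanisms rather than replacing them.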

Evidence grading

Every claim is assigned a Level of Evidence and Class of Recommendation based on its source type — not inferred from phrasing:

LOE I: Randomized controlled trial (RCT). The gold standard for causal evidence.

LOE II: Prospective cohort study, systematic review of observational data, or major guideline consensus.

LOE III: Case reports, cross-sectional studies, or expert opinion. Used when no higher evidence exists.

COR I: Strong benefit; should be done. Supported by LOE I evidence.

COR IIa: Moderate benefit; reasonable to do. Supported by LOE II or consistent LOE III.

COR IIb: Weak benefit; may consider. Conflicting or limited evidence.

COR III: No benefit or harmful; should not be done.
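Assigning LOE structurally, rather than letting the model infer it, amounts to a lookup table keyed by source type. The sketch below follows the LOE definitions above; the source-type labels, the rule that a section takes the best level among its cited sources, and the LOE III default for unknown types are assumptions.

```python
from enum import Enum
from typing import Iterable


class LOE(Enum):
    I = 1
    II = 2
    III = 3


# Mapping follows the LOE definitions above; the keys are hypothetical labels.
SOURCE_TYPE_TO_LOE = {
    "rct": LOE.I,
    "prospective_cohort": LOE.II,
    "observational_systematic_review": LOE.II,
    "guideline_consensus": LOE.II,
    "case_report": LOE.III,
    "cross_sectional": LOE.III,
    "expert_opinion": LOE.III,
}


def assign_loe(source_types: Iterable[str]) -> LOE:
    """Best level among the cited sources; unknown or missing types get LOE III."""
    levels = [SOURCE_TYPE_TO_LOE.get(t, LOE.III) for t in source_types]
    if not levels:
        return LOE.III
    return min(levels, key=lambda level: level.value)


print(assign_loe(["expert_opinion"]).name)
print(assign_loe(["expert_opinion", "rct"]).name)
```

Since the function never sees the generated text, no phrasing by the LLM can raise a section's level; only the types of the sources it cites can.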

Data sources

All data is fetched in real time from these authoritative sources before any AI processing:

FDA OpenFDA: Drug labels, adverse events, recalls
PubMed / NCBI: Guidelines, RCTs, systematic reviews
PMC Open Access: Full-text articles & StatPearls monographs
Unpaywall: Free legal PDFs for open-access articles
RxNorm: Drug names & interaction data
DailyMed: FDA-approved prescribing information
MedlinePlus: Drug & disease patient-facing summaries
NICE: UK clinical practice guidelines

Lessons learnt building this

Quantity vs. quality is a harder trade-off than it looks

Fetching more sources always sounds better on paper. In practice, a noisy PubMed result set with 20 weakly relevant abstracts produces worse LLM output than 5 high-quality ones. We built evidence scoring precisely because raw retrieval count is a bad proxy for answer quality. Noisy, low-relevance data pushes the model to hedge, bury the key point, or invent a consensus that doesn't exist in the sources.

Medical research is behind paywalls — and that matters

Most impactful RCTs and meta-analyses are published in journals that don't offer open access. PubMed gives titles and abstracts; the actual trial data is paywalled. Unpaywall helps for open-access articles, but institutional guideline PDFs — NICE, ACC/AHA, ESC — are not consistently machine-readable. This means a query about a rare disease or a recent trial update will frequently hit the evidence quality floor and return a DegradedResponse, not because the answer doesn't exist, but because it exists behind a paywall.

LLMs are good editors, not good researchers

The pipeline treats the LLM purely as a formatter. Give it structured evidence and a schema to fill, and it produces clean, graded, citable output. Ask it to 'find information about X' without grounded sources and it will confabulate confidently. The fail-closed evidence gate exists because we learned early that the model will fill gaps with plausible-sounding but unsourced content if you let it.

Cache design has a correctness problem, not just a performance one

Semantic caching at 0.92 cosine similarity means 'scabies management' and 'scabies treatment guidelines' can map to the same cached response. That is usually correct — but old cache can miss guideline updates. The current policy is safety-first: stale semantic hits are skipped and the full pipeline reruns. The harder unsolved problem is detecting meaningful guideline deltas automatically.

BYOK — Your Key, Your Data

All LLM calls use your own API key. Nothing is sent to Iatronix servers for generation. Keys are encrypted at rest and in transit. Switch providers anytime from Settings.

Cerebras (GPT-OSS 120B) · Default

Default AI provider — powers evidence formatting, query classification, and section generation.

Anthropic (Claude) · Optional · Required for Waves

Alternative AI provider for users who prefer Claude. Required for Waves (medical image analysis via Claude vision).
