About Iatronix
Iatronix is an evidence-based clinical reference built for medical professionals. It retrieves real-time data from FDA, PubMed, NICE, and your own documents, formats the results with AI, and grades every claim by the evidence behind it. Your API key. Your data. Your control.
Iatronix started in March 2026 as a personal side project by Kayomarz, built for his own clinical use and made public. More at kayomarz.com.
How a search works — step by step
Every query goes through a 7-stage pipeline. Apart from an optional classification call for ambiguous queries, no LLM tokens are spent until real data has been retrieved and quality-checked.
Query rewriting
Before any search begins, the query is cleaned: typos are fixed, abbreviations expanded, and terminology standardized (e.g. HTN → hypertension, MI → myocardial infarction). This improves matching against PubMed MeSH terms and FDA drug labels.
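The abbreviation step can be sketched as a whole-word dictionary lookup. This is a minimal illustration, not Iatronix's implementation: the ABBREVIATIONS table and expand_abbreviations are hypothetical, and typo correction and MeSH mapping are not shown.

```python
import re

# Hypothetical lookup table; only HTN and MI come from the examples above.
ABBREVIATIONS = {
    "htn": "hypertension",
    "mi": "myocardial infarction",
}

# Whole-word match, longest keys first, so "mi" inside "migraine" is untouched.
_ABBREV_RE = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def expand_abbreviations(query: str) -> str:
    """Collapse repeated whitespace, then replace known abbreviations with full terms."""
    cleaned = " ".join(query.split())
    return _ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(0).lower()], cleaned)


print(expand_abbreviations("HTN  after MI"))  # hypertension after myocardial infarction
print(expand_abbreviations("migraine"))       # migraine
```

Word boundaries matter here: a naive substring replace would corrupt terms such as "migraine" or "mitral".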
Classification
Regex pattern scoring classifies the query into one of five types: drug, disease, procedure, evidence (study), or comparative (drug vs. drug). When the phrasing is ambiguous, a lightweight LLM call (GPT-OSS 120B via Cerebras by default) resolves the type. The query type determines which APIs are called and which response schema is filled.
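A minimal sketch of regex scoring with an LLM fallback, assuming each type has a list of patterns and a query counts as ambiguous when the top two scores are too close. The patterns, the margin rule and classify are hypothetical; the LLM fallback is represented only by returning None.

```python
import re

# Hypothetical patterns per query type; the real rules are not published.
PATTERNS = {
    "comparative": [r"\bvs\.?\b", r"\bversus\b", r"\bcompared (with|to)\b"],
    "drug": [r"\bdos(e|ing|age)\b", r"\bside effects?\b", r"\binteractions?\b"],
    "evidence": [r"\btrials?\b", r"\brct\b", r"\bmeta-analys[ie]s\b", r"\bstud(y|ies)\b"],
    "procedure": [r"\bprocedure\b", r"\bsurgery\b", r"\bhow to perform\b"],
    "disease": [r"\bmanagement\b", r"\btreatment\b", r"\bdiagnosis\b", r"\bsymptoms?\b"],
}


def classify(query, min_score=1, min_margin=1):
    """Return (type, scores); type is None when the regex scores are ambiguous."""
    q = query.lower()
    # One point per pattern that matches anywhere in the query.
    scores = {t: sum(bool(re.search(p, q)) for p in pats) for t, pats in PATTERNS.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    if top >= min_score and top - second >= min_margin:
        return best, scores
    return None, scores  # ambiguous: the caller would fall back to an LLM classifier


print(classify("metformin vs sitagliptin")[0])  # comparative
print(classify("scabies treatment trial")[0])   # None (tie between disease and evidence)
```

Keeping the regex pass first means most queries are typed without any model call; only ties or zero-score queries pay for the LLM.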
Semantic cache lookup
Before any API call, the query is embedded as a vector and compared against all previously answered queries using cosine similarity (threshold: 0.92). If a sufficiently similar past query has a cached answer less than 7 days old, that answer is returned immediately. If the matching entry is older than 7 days, it is treated as stale and skipped, and the full pipeline runs again to avoid serving outdated clinical content.
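The lookup rule can be sketched as follows, assuming embeddings are plain float lists and the cache is scanned linearly. CacheEntry and lookup are hypothetical names; a production system would more likely use a vector index than a full scan.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

SIMILARITY_THRESHOLD = 0.92    # from the description above
MAX_AGE = timedelta(days=7)    # from the description above


@dataclass
class CacheEntry:
    embedding: list
    answer: str
    created_at: datetime


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup(query_vec, entries, now):
    """Return the best cached answer above the threshold, or None to run the pipeline."""
    hits = [(cosine(query_vec, e.embedding), e) for e in entries]
    hits = [(s, e) for s, e in hits if s >= SIMILARITY_THRESHOLD]
    if not hits:
        return None  # cache miss
    _, best = max(hits, key=lambda h: h[0])
    if now - best.created_at > MAX_AGE:
        return None  # stale hit: skip it and rerun the full pipeline
    return best.answer


now = datetime(2026, 3, 10)
fresh = CacheEntry([1.0, 0.0], "cached answer", now - timedelta(days=1))
print(lookup([1.0, 0.05], [fresh], now))  # cached answer
```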
Parallel data fetch
Relevant data is pulled in parallel from up to 10 sources with no LLM involvement at this stage. Drug queries: OpenFDA labels, interactions, adverse events, DailyMed, RxNorm. Disease queries: PubMed guidelines, recent RCTs (date-sorted), PMC full-text, StatPearls monographs, Unpaywall free PDFs, MedlinePlus summaries, NICE clinical guidelines. Evidence queries: PubMed search ranked by publication date. Each source has a 20-second timeout; failures are logged and skipped without blocking the response.
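The fan-out with per-source timeouts can be sketched with asyncio, assuming each source is an async function taking the query. fetch_all and the fake_* fetchers are hypothetical stand-ins, not the real OpenFDA or PubMed clients.

```python
import asyncio
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("fetch")

SOURCE_TIMEOUT_S = 20.0  # per-source timeout from the description above


async def fetch_all(fetchers, query, timeout=SOURCE_TIMEOUT_S):
    """Run every source fetcher concurrently and keep only the ones that succeed."""

    async def run(name, fetch):
        try:
            return name, await asyncio.wait_for(fetch(query), timeout)
        except asyncio.TimeoutError:
            logger.warning("%s timed out after %ss", name, timeout)
        except Exception as exc:  # a failing source is logged, never raised
            logger.warning("%s failed: %s", name, exc)
        return name, None

    results = await asyncio.gather(*(run(n, f) for n, f in fetchers.items()))
    # A fetcher that legitimately returns None is treated as a failure here.
    return {name: data for name, data in results if data is not None}


# Hypothetical stand-ins for real source clients.
async def fake_label(query):
    await asyncio.sleep(0.01)
    return {"label": f"label text for {query}"}


async def fake_slow(query):
    await asyncio.sleep(5)
    return "never returned"


async def fake_broken(query):
    raise RuntimeError("HTTP 503")


DEMO_FETCHERS = {"label": fake_label, "slow": fake_slow, "broken": fake_broken}

if __name__ == "__main__":
    print(asyncio.run(fetch_all(DEMO_FETCHERS, "metformin", timeout=0.5)))
```

Because each source is wrapped individually, one slow or failing API costs at most its own timeout and never cancels the others.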
Evidence quality assessment
Before any LLM call, the fetched data is scored for quality. If the total evidence score falls below a minimum threshold, the pipeline returns a DegradedResponse (a clear message explaining what was found) instead of generating potentially unsupported claims. This fail-closed behavior is intentional: a transparent 'insufficient data' message is always safer than a confident hallucination.
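A minimal sketch of a fail-closed evidence gate, assuming each fetched item carries a source kind and a relevance score. The weights, the 4.0 floor, the Evidence type and the fields on DegradedResponse are all hypothetical; only the name DegradedResponse comes from the text.

```python
from dataclasses import dataclass, field

# Hypothetical weights and floor; the real scoring formula is not published.
SOURCE_WEIGHTS = {"guideline": 3.0, "meta_analysis": 3.0, "rct": 2.5,
                  "label": 2.0, "review": 1.5, "abstract": 0.5}
MIN_EVIDENCE_SCORE = 4.0


@dataclass
class Evidence:
    source: str       # e.g. "NICE", "PubMed"
    kind: str         # key into SOURCE_WEIGHTS
    relevance: float  # 0..1, e.g. from retrieval ranking


@dataclass
class DegradedResponse:
    message: str
    found: list = field(default_factory=list)


def evidence_score(items):
    """Weighted sum of relevance; unknown kinds contribute nothing."""
    return sum(SOURCE_WEIGHTS.get(e.kind, 0.0) * e.relevance for e in items)


def gate(items):
    """Return the evidence to format, or a DegradedResponse if it is too thin."""
    score = evidence_score(items)
    if score < MIN_EVIDENCE_SCORE:
        return DegradedResponse(
            message=f"Insufficient evidence (score {score:.1f}); no answer generated.",
            found=[e.source for e in items],
        )
    return items


print(gate([Evidence("PubMed", "abstract", 0.9)]).message)
```

The gate returns before any prompt is built, so a thin result set never reaches the model.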
Adaptive LLM formatting
The LLM is prompted to prioritize fetched evidence and fill specific schema fields (BLUF (bottom line up front) headline, summary, sections, citations) without inventing data. Each claim must cite its source by index. If retrieval times out, a guarded fallback response may be generated with explicit validation warnings. The default model is GPT-OSS 120B via Cerebras (or Claude if an Anthropic key is configured in Settings).
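Citing by index works best when each fetched source is numbered in the prompt itself. The sketch below is one hypothetical way to assemble such a prompt; the wording, SCHEMA_FIELDS and build_prompt are illustrative, not the prompt Iatronix actually sends.

```python
import json

# Hypothetical schema keys mirroring the fields named above.
SCHEMA_FIELDS = ["bluf", "summary", "sections", "citations"]


def build_prompt(query, sources):
    """Number each fetched source so every claim can be cited as [n]."""
    numbered = "\n".join(
        f"[{i}] ({s['origin']}) {s['text']}" for i, s in enumerate(sources, start=1)
    )
    return (
        "Answer using ONLY the numbered sources below. Cite every claim as [n].\n"
        "If the sources do not support a field, leave it empty; do not add facts.\n"
        f"Return JSON with exactly these keys: {json.dumps(SCHEMA_FIELDS)}\n\n"
        f"Question: {query}\n\n"
        f"Sources:\n{numbered}"
    )


print(build_prompt("scabies treatment", [
    {"origin": "NICE", "text": "example guideline excerpt"},
    {"origin": "PubMed", "text": "example RCT abstract"},
]))
```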
Evidence grading & validation
After generation, each section is assigned a Level of Evidence (LOE I–III) and Class of Recommendation (COR I–IIb) based on its source type. RCT-backed guidelines earn LOE I; expert consensus earns LOE III. Citations are verified against the fetched sources. If the response is too sparse, a second-pass LLM call is triggered with a wider evidence budget. Results passing validation are stored in the semantic cache.
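Citation verification and source-based grading can be sketched together. Only two mappings come from the text (RCT-backed guideline → LOE I, expert consensus → LOE III); the "observational" entry, the rule that a section takes its strongest valid citation, and the default to III for unknown types are hypothetical. COR assignment is not modeled.

```python
import re

# Only the first and last mappings are stated in the text; "observational" is hypothetical.
LOE_BY_SOURCE = {
    "rct_guideline": "I",
    "observational": "II",
    "expert_consensus": "III",
}
LOE_RANK = {"I": 1, "II": 2, "III": 3}
CITATION = re.compile(r"\[(\d+)\]")


def verify_citations(text, n_sources):
    """Return cited indices that do not point at a fetched source (1-based)."""
    cited = {int(m) for m in CITATION.findall(text)}
    return sorted(i for i in cited if not 1 <= i <= n_sources)


def grade_section(text, source_types, n_sources):
    """source_types[i - 1] is the type of source [i]. Returns (loe, bad_citations)."""
    bad = verify_citations(text, n_sources)
    valid = sorted({int(m) for m in CITATION.findall(text)} - set(bad))
    levels = [LOE_BY_SOURCE.get(source_types[i - 1], "III") for i in valid]
    loe = min(levels, key=LOE_RANK.get) if levels else None  # strongest valid citation
    return loe, bad


types = ["rct_guideline", "expert_consensus"]
print(grade_section("Use X [1]. Consider Y [2].", types, 2))  # ('I', [])
print(grade_section("Claim [2] and [5].", types, 2))          # ('III', [5])
```

A non-empty bad_citations list is the kind of signal that could trigger the second-pass call described above.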
How hallucinations are prevented
The pipeline is designed so the LLM cannot invent clinical facts. Five mechanisms enforce this:
Evidence grading
Every claim is assigned a Level of Evidence and Class of Recommendation based on its source type — not inferred from phrasing:
Data sources
All data is fetched in real time from these authoritative sources before any AI processing:
Lessons learnt building this
Quantity vs. quality is a harder trade-off than it looks
Fetching more sources always sounds better on paper. In practice, a noisy PubMed result set with 20 weakly relevant abstracts produces worse LLM output than 5 high-quality ones. We built evidence scoring precisely because raw retrieval count is a bad proxy for answer quality. Extra low-relevance data causes the model to hedge, bury the key point, or invent a consensus that doesn't exist in the sources.
Medical research is behind paywalls — and that matters
Most high-impact RCTs and meta-analyses are published in journals that don't offer open access. PubMed provides titles and abstracts; the full trial data is paywalled. Unpaywall helps locate open-access copies where they exist, but institutional guideline PDFs (NICE, ACC/AHA, ESC) are not consistently machine-readable. As a result, a query about a rare disease or a recent trial update will frequently hit the evidence quality floor and return a DegradedResponse, not because the answer doesn't exist, but because it sits behind a paywall.
LLMs are good editors, not good researchers
The pipeline treats the LLM purely as a formatter. Give it structured evidence and a schema to fill, and it produces clean, graded, citable output. Ask it to 'find information about X' without grounded sources and it will confabulate confidently. The fail-closed evidence gate exists because we learned early that the model will fill gaps with plausible-sounding but unsourced content if you let it.
Cache design has a correctness problem, not just a performance one
Semantic caching at a 0.92 cosine threshold means 'scabies management' and 'scabies treatment guidelines' can map to the same cached response. That is usually correct, but an older cached answer can predate a guideline update. The current policy is safety-first: stale semantic hits (older than 7 days) are skipped and the full pipeline reruns. The harder unsolved problem is detecting meaningful guideline changes automatically.
BYOK — Your Key, Your Data
All LLM calls use your own API key. Nothing is sent to Iatronix servers for generation. Keys are encrypted at rest and in transit. Switch providers anytime from Settings.
Cerebras (GPT-OSS 120B) · Default
Default AI provider — powers evidence formatting, query classification, and section generation.
Anthropic (Claude) · Optional · Required for Waves
Alternative AI provider for users who prefer Claude. Required for Waves (medical image analysis via Claude vision).