The filing problem nobody talks about
My mother-in-law is a well-organised person. Her health documents still defeat her. GP letters as PDF attachments in one email thread. Hospital discharge summary in another. Blood results on a patient portal, but only the recent ones. A medication change six months ago, rationale explained in a letter she cannot locate.
The information exists. It has been written down, sent to her, filed somewhere in the system. But from the patient's side, it is functionally inaccessible. There is no search function for your own health history. Every encounter starts from scratch because nobody, patient or clinician, has a structured overview of the whole picture.
This is not an NHS problem or a Barbados problem or an American problem. It is a universal failure of how health systems communicate with the people they serve. Patient portals have improved access to documents, but a chronological list of PDFs is a filing cabinet, not a health record you can think with.
The question I started with
Could a language model running on a phone do what no one has time to do manually: read every letter, extract the important facts, and build a structured personal health record that the patient actually owns?
Not by uploading documents to a cloud service. Not by granting access to a third party. Entirely on the device, offline, using the kind of hardware people already carry in their pockets.
This is now technically possible in a way it was not even recently. Google's Gemma 4 at 2 billion parameters is small enough for consumer hardware. It is capable enough to parse the varied mess of clinical correspondence: narrative prose, tabular blood results, abbreviated notes, formal radiology reports. The output maps into FHIR-aligned schemas and lands in a local database. The patient ends up with a queryable health wiki built from their own documents.
"When was my statin changed and why?" pulls the answer from a specific letter. "What has my eGFR done over the last year?" aggregates values from documents sent by three different services. The information was always there. It was just never assembled.
Why local matters
The moment you route patient documents through a cloud service, you enter a governance conversation with no clean resolution in the current landscape. Local inference sidesteps it entirely. The patient processes documents they already possess, on hardware they own, using a model that runs offline. No third party in the loop means no consent question beyond the patient's own decision to use the tool.
This is not an ideological stance about Big Tech. It is an architectural choice that determines what becomes possible downstream. If the data lives on someone else's infrastructure, the patient's role is to grant or withhold consent for someone else's plans. If it lives on their device, structured and queryable, they become a participant rather than a subject.
The architecture of restraint
I designed the system around what I call pass gates: a layered approach born from healthy scepticism about what language models can and cannot be trusted to do.
The model only ever processes one document at a time. It never has access to the complete record simultaneously. It extracts structured data from a single letter, then that output is handed to deterministic code, not AI, which validates, deduplicates, and stores it. The model's job is narrow and auditable. Every claim in the structured record is traceable back to a specific sentence in a specific source document.
This matters because the discourse around AI in healthcare tends toward two poles: uncritical enthusiasm or blanket refusal. The pass-gate architecture is an attempt at a third position. Use AI for what it is genuinely good at, understanding unstructured text. Constrain it rigorously. Never let it become a black box sitting between the patient and their record.
The opportunity this opens
A structured, patient-held health record is useful on its own. But the more interesting question is what it enables.
Consider clinical trial matching. The majority of patients who would qualify for a trial never hear about it. Not because trials are secret, but because no one has time to cross-reference every patient's history against every open study's eligibility criteria. It is a matching problem, and right now it is solved by human attention at a scale that cannot work.
A patient-controlled structured record, matched locally against published eligibility criteria, could surface opportunities the system currently misses. The patient decides whether to act, whether to share, and with whom. This reframes trial recruitment from something done to patients into something patients can initiate for themselves.
There are other possibilities: medication interaction checking against a complete rather than partial medication history, preparation for appointments with structured summaries a clinician could actually use, continuity of care when switching providers. Each becomes more tractable when the patient holds a structured version of their own story.
What I found and what remains
I tested this against synthetic patient histories designed to stress the extraction pipeline: complex multi-morbid cases with medication changes, conflicting information across providers, and lab values in different formats. The 2B model handles it with meaningful accuracy running entirely offline. The premise holds.
But this is a proof-of-concept, not a product, and the distance between the two is significant. Clinical safety validation, usability with real patients, regulatory positioning, and the question of what happens when extraction is wrong all remain open. These are solvable problems, but they require serious work beyond what any individual can do alone.
What I think matters now is the framing. The default trajectory for health AI is centralised: data aggregated into platforms, processed by large models in someone else's data centre, insights sold back to institutions. That architecture is not inevitable. Edge devices are powerful enough. Small models are capable enough. The patient-sovereign alternative is technically viable. Whether it becomes the norm depends on whether we build it before the centralised version becomes too entrenched to displace.
My mother-in-law still calls me. But the calls have become more interesting.