Discussion about this post

The AI Architect

Brilliant framing of why data standardization has been such a nightmare in vet med. The funnel metaphor clarifies something I've experienced but couldn't articulate: forcing precision too early just doesn't match how cases actually unfold. I ran into this constantly when trying to get my team to use structured codes mid-appointment. It always felt wrong, but now I see it's fighting against clinical cognition itself. Backend translation powered by LLMs might finally be the workaround we need.

Eric Fish, DVM

This is a great framework to start understanding pitfalls in medical data coding, though I would argue it is fundamentally incomplete: Medical labels do not exist solely on a unidirectional continuum that progresses from vague, unstructured information to precise structured data; there is bi-directionality and rich compression inherent to diagnosis.

Consider a patient with a tumor that a pathologist calls a "melanoma" in their report. Seems like a simple, unambiguous label: it either is or isn't that type of tumor, right? The challenge is that under the microscope (or these days, a computer screen), a "melanoma" can have variable levels of pigmentation ranging from heavy to absent. It can differ in appearance from "epithelioid" to "spindloid" to "round cell," or a combination! There are weird variants like "balloon cell" melanomas that don't fit the options above. One tumor might have a lot more "atypia" than another, yet be considered lower grade based on a factor like mitotic rate (the number of dividing cells in a tumor). And two melanomas with cancer cells that appear morphologically identical can differ in aggressiveness based solely on anatomic location, or on non-tumor features in the tissue section (stroma, vascular/lymphatic invasion, inflammatory infiltrate, etc.). Even if you used SNOMED or ICD coding to force a specific structured label (to reduce variation among terms like "melanoma" vs "malignant melanoma" vs "melanocytoma" vs "melanocytic neoplasia," etc.), any computer vision AI program trying to learn what "melanoma" is will face an uphill battle differentiating it from very similar-looking lesions that behave completely differently.
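The coding step described above (collapsing synonymous free-text labels into one structured term) can be sketched as a simple lookup table. The code values below are hypothetical placeholders, not real SNOMED or ICD entries, and the sketch illustrates the commenter's point: the mapping erases exactly the morphologic variation that drives behavior.

```python
# Minimal sketch of structured-label normalization. Canonical codes here
# are made-up placeholders, NOT real SNOMED/ICD identifiers.
CANONICAL = {
    "melanoma": "MEL-NOS",
    "malignant melanoma": "MEL-NOS",
    "melanocytoma": "MEL-BENIGN",
    "melanocytic neoplasia": "MEL-UNSPEC",
}

def normalize(label: str) -> str:
    """Map a free-text diagnosis label to a canonical code.

    Falls back to 'UNMAPPED' for variants the table doesn't cover
    (e.g. "balloon cell melanoma"), which is where real-world
    coding schemes start to leak.
    """
    return CANONICAL.get(label.strip().lower(), "UNMAPPED")

print(normalize("Malignant Melanoma"))   # MEL-NOS
print(normalize("balloon cell melanoma"))  # UNMAPPED
```

Note that both "melanoma" and "malignant melanoma" land on the same code, while pigmentation, grade, and anatomic site, all prognostically relevant, vanish from the structured record.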

Some folks seem to think you can simply overcome this problem with "brute force" by throwing more and more data at it. I'm skeptical. An experienced pathologist does not rely only on visual pattern recognition; they draw on knowledge of cell biology, physiology, normal and abnormal anatomy, the impacts of drugs and radiation on tissues, infectious agents that cause dysplasia, and more. A pathologist might see a possible melanoma and say to themselves, "Hmm, here is my initial set of differentials; let's order a panel of immunohistochemistry including pancytokeratin, vimentin, MelanA, and S100." Sometimes you need molecular tests beyond IHC (like PARR clonality testing for lymphoma or PCR for mutated c-kit in mast cell tumors). Then you are presented with one or more sets of additional assay data that require their own interpretation: fine discernment of real signal vs background, comparison to positive and negative control reactions, knowledge from the research literature about how well (or not) different assays perform in different situations, etc. Sometimes the key to diagnosis depends on clinical history or other lab/imaging data in the medical record. The combinatorial complexity is staggering when you really think about it!
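The "staggering combinatorial complexity" claim can be made concrete with a back-of-the-envelope count. The category sizes below are illustrative assumptions, not real figures, and the feature list is a small subset of what the comment describes, so the true space is far larger.

```python
from math import prod

# Hypothetical sizes for a few of the diagnostic dimensions mentioned
# above; real feature spaces are larger and partly continuous.
feature_space = {
    "morphology": 4,       # epithelioid, spindloid, round cell, mixed
    "pigmentation": 3,     # heavy, partial, absent
    "mitotic_grade": 3,    # low, intermediate, high
    "anatomic_site": 10,   # assumed number of distinct sites
    "ihc_panel": 2 ** 4,   # pos/neg for 4 markers (pancytokeratin, etc.)
}

combinations = prod(feature_space.values())
print(combinations)  # 4 * 3 * 3 * 10 * 16 = 5760
```

Even this toy version yields thousands of distinct presentations per diagnosis label, before adding clinical history, molecular tests, or imaging, which is why "more data" alone scales poorly against the problem.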

My point is that even something that seems extremely precise and specific, like a cancer diagnosis, often encodes a richness of information that is more challenging than it appears at first glance. Is it an impossible problem to solve? Theoretically, no. But it is going to require a *LOT* more time, data, resourcefulness, and creativity than many companies seem prepared to invest.
