Clinicians can bill, at least to an extent, to account for complexity. When a patient walks into a clinic for an annual physical, an acute upper respiratory tract infection, or an endless litany of chronic complaints (uncontrolled hypertension, diabetes, and hypercholesterolemia) plus an acute complaint, the documentation and the billing codes used can differ between a brief med check and a more demanding undertaking.
Modality ≠ Complexity
In radiology, we don’t have complexity. We have modality. MRIs earn more than CTs, which earn more than ultrasounds, which earn more than radiographs. There is no distinction between an unindicated pre-operative screening CXR on a healthy adult and an ICU plain film. There is no distinction between a negative trauma pan-scan in an 18-year-old and a grossly abnormal pan-scan in a 98-year-old with ankylosing spondylitis, multiple fractures, and a few incidental cancers.
Leave aside adjusting actual reimbursement RVUs from payors and the government; that’s beyond the scope of this essay, and such changes would likely be unhelpful anyway because assigning RVUs for reimbursement is a zero-sum game: paying more for one thing means paying less for others. Yes, the reality is that some groups and some locations do have more complex cases than others, but capturing that fairly through a third party would be a substantial challenge, one with clear winners and losers. Reimbursement has never been fair: between the wide range of complexity and payor contracts, some doctors (or at least institutions) are simply paid more on a per-effort basis.
Internally, however, a group has limitless wiggle room to adjust its accounting to reward effort and pursue fairness. Again, in an ideal world, everyone receives a combination of easy and hard cases, and therefore everyone’s efforts will, on the whole, be comparable. In practice, this may not be the case in many contexts.
For example, a community division in a large university practice may not be reading the same kinds of cases as their counterparts working in the hospital. Some attendings work in rotations with junior residents, and some don’t. Different shifts and different silos across practices that involve multiple hospitals or centers can vary widely, and even imaging centers in strip malls may draw different kinds of pathology by zip code and referral patterns. Even covering the ER may yield different sorts of cases with different issues depending on the time of day: the deep-night, high-speed MVCs at 3 am at your hospital may be a far cry from the parade of elderly falls that arrives during the late morning. If all radiologists share in the differing kinds of work equally, no biggie, but especially in larger practices, that is not always the case.
Across modalities and divisions, it can be relatively straightforward to assign internal work units according to generalized desirability or typical time spent. A group might choose to bump the internal RVUs of radiographs and decrease them for some varieties of MRI. A group might decrease single-phase abdomen/pelvis CTs and increase chest CTs. A group might bump thyroid ultrasounds but decrease right upper quadrant ultrasounds. These sorts of customized “work units” based on average time-to-dictation are common.
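To make that concrete, here’s a minimal sketch of what such a custom work-unit table might look like in code. The exam names, baseline values, and multipliers are placeholders for illustration, not anyone’s actual numbers.

```python
# Illustrative internal work-unit table (all values are hypothetical placeholders).

INTERNAL_WORK_UNITS = {
    # exam type: (baseline wRVU, internal multiplier)
    "XR chest (2 views)":        (0.22, 1.5),   # bump radiographs
    "CT abd/pelvis w/ contrast": (1.80, 0.9),   # trim single-phase A/P CTs
    "CT chest w/o contrast":     (1.10, 1.1),   # bump chest CTs
    "US thyroid":                (0.55, 1.2),   # bump thyroid US
    "US right upper quadrant":   (0.70, 0.85),  # trim RUQ US
    "MRI brain w/ and w/o":      (2.30, 0.95),
}

def internal_units(exam_type: str) -> float:
    """Return the practice's internal work units for a given exam type."""
    base, multiplier = INTERNAL_WORK_UNITS[exam_type]
    return round(base * multiplier, 2)

if __name__ == "__main__":
    for exam in INTERNAL_WORK_UNITS:
        print(f"{exam}: {internal_units(exam)} internal units")
```

The point is simply that a group already controls this mapping internally; the numbers can be whatever the group negotiates.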
But the problem of variable challenge within an exam type is thornier. Complexity varies, and preventing the peek-and-shriek cherry pick is a nontrivial task. A normal MRI of the brain for a 20-year-old with migraines is a different diagnostic challenge than a case of recurrent glioblastoma with radiation necrosis or progression.
Most of the metrics one could use to attempt this feat on a case-by-case basis are gameable and ultimately not tractable. If you use time spent reading the case, it’s very challenging to normalize across individuals with varying intrinsic speed, let alone the fact that someone can open an easy case and leave it open while dropping a deuce. And I don’t think anyone wants to live in a world where Big Brother is tracking their mouse movements or engaging in other invasive surveillance. Radiologists have a hard enough time fighting the widget-factory-worker mentality as it is.
But even when everyone behaves nicely, having a system that accounts for tough cases would help with frustration, burnout, blah blah. No one likes to get bogged down in a complex case and feel behind. What constitutes a solid day’s work depends on how hard the work is.
Enter the eRVU
Here’s an example of how the scalability of AI could make an intractable problem potentially tractable: applying LLMs to reports after the fact to create a complexity grid that helps account for case difficulty. Such a model (or wrapper) could be trained to create a holistic score using a variety of factors like patient age, number of positive or negative findings, and the ultimate diagnosis.
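As a rough sketch of what that could look like, assuming an LLM (or wrapper) that returns structured JSON: the prompt, field names, and weights below are entirely hypothetical and would need calibration by an actual practice.

```python
# Hypothetical sketch of post-hoc report complexity scoring with an LLM.
import json
from dataclasses import dataclass

@dataclass
class ComplexityInputs:
    patient_age: int
    positive_findings: int
    pertinent_negatives: int
    diagnosis_severity: float  # 0 (normal/benign) to 1 (critical), as graded by the LLM

def score_report(report_text: str, llm_call) -> float:
    """Ask an LLM to extract structured complexity factors from a finalized report,
    then combine them into a single holistic score.
    `llm_call` is any function that takes a prompt string and returns JSON text."""
    prompt = (
        "From this radiology report, return JSON with keys patient_age, "
        "positive_findings, pertinent_negatives, diagnosis_severity (0-1):\n\n"
        + report_text
    )
    fields = ComplexityInputs(**json.loads(llm_call(prompt)))
    # Arbitrary illustrative weights; a real practice would tune these.
    return round(
        0.3 * min(fields.positive_findings, 10) / 10
        + 0.1 * min(fields.pertinent_negatives, 10) / 10
        + 0.4 * fields.diagnosis_severity
        + 0.2 * (fields.patient_age / 100),
        2,
    )

# Usage (hypothetical): score = score_report(final_report_text, my_llm_function)
```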
Now, obviously, such solutions, like all things, would be imperfect. It may even be a terrible idea.
For one, you’d have to determine how to weigh essential findings against extraneous details such that dumping word salad into a report does not increase the complexity score when it does not meaningfully add value. We all know that longer is not better. But I think if people are creative, viable experiments could be run to figure out what feels fair within a practice and how to drive desired behavior. It’s possible a big PowerScribe dump of report data could yield a pretty robust solution that takes into account what “hard” and “easy” work look like historically based on report content and the time it took to create it. Or maybe you need to wait for vision-language models that can actually look at pictures.
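For instance, one version of that historical experiment might simply regress time-to-dictation on crude report features to back out effort weights; the features and numbers here are made up purely for illustration.

```python
# Illustrative calibration sketch: learn effort weights from historical
# report features and time-to-dictation (features and data are invented).
import numpy as np
from sklearn.linear_model import Ridge

# Each row: [exam_is_mri, report_word_count, num_findings, num_comparisons]
X = np.array([
    [1, 450, 6, 3],
    [0, 120, 1, 0],
    [0, 300, 4, 2],
    [1, 200, 2, 1],
])
minutes_to_dictate = np.array([28.0, 4.0, 15.0, 12.0])

model = Ridge(alpha=1.0).fit(X, minutes_to_dictate)

def effort_units(features, baseline_minutes=10.0):
    """Predicted read time expressed relative to a baseline 'average' case."""
    predicted = float(model.predict(np.asarray(features).reshape(1, -1))[0])
    return round(predicted / baseline_minutes, 2)

print(effort_units([1, 500, 8, 4]))  # a long, finding-heavy MRI
```

Whether a crude regression like this or a full LLM pipeline feels fairer is exactly the kind of thing a practice would have to test on its own data.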
Again, a non-terrible version of such a product would be for internal workload and efficiency accounting, not for reimbursement. Think of it like the customized wRVU tables already in use, but with an added layer that works across all exam types instead of just by modality.
With an effortRVU, we could account for the relative complexity of certain kinds of cases within any modality. We could account for the relative ease of an unchanged follow-up exam for a single finding, and we could account for the very heavy lift that sometimes drives certain types of cases to be ignored on the list, like temporal bone CTs or postoperative CTs of extensive spinal fusions with hardware complications.
Providing good care for the most challenging cases should never be a punishment for good citizens.
(Yes, I’m aware some institutions already use an “eRVU” for educational activities, meetings, tumor boards, etc. Accounting for non-remunerative time is also a defensible approach, but that’s not related to the challenges associated with variable case complexity itself.)
((It’s also worth noting that it’s not hard to imagine a world where payors try to do things like this without your permission. Long term, how reimbursement for work changes in a post-AI world is anyone’s guess because all the current tools suck.))
Infighting & Fun
Any attempt to differ from the status quo, any variation of customization—whether simple wRVU tweaks or something more dramatic like this—is inevitably fraught. The more such a solution is based on messy human opinions, the more contentious the discussions would likely be. Everyone has an opinion about RVUs, and no one wants to see their efforts undervalued. Every tool is just a reusable manifestation of the opinions that go into it. For example, historically common complaints about the RVUs of interventionalists (often ignoring the critical clinical role our IR colleagues play and the physical presence it requires) are a cultural and financial problem, but probably not an AI one.
It’s worth noting that the desire to not “downgrade” work or deal with infighting is probably why many practices choose to change daily targets and bonus thresholds based on subspecialty, shift type, etc., instead of creating/adjusting work units. It’s the same idea tackled less dramatically from a greater distance.
Counting every activity (phone calls, conferences, etc.) has also been deployed in some settings, but it’s easy to see how, taken to extremes, such efforts to reward behaviors can veer into counterproductive counting games and even tokenize just being a decent person.
If there is a “right” answer, it may be specific to the company and the people in it, and adding complexity to the system has its own very real costs. Nonetheless, there is a strong argument to be made that some degree of practice effort to make sure that everyone’s work gets noticed and appreciated in a “fair” way is a step in the right direction for subspecialized practices.
Internal productivity metrics help prevent low-effort output, while smart worklists and other guardrails can ensure largely ethical behavior within the list. (But sure, theoretically, if you can solve case assignment, most everything else that matters should just even out in the long run.)
Ultimately, radiology is a field where, especially in large organizations, it can become easy to feel like an anonymous cog. Individualizing productivity accounting to truly recognize the hard, challenging work many radiologists do—and reward those who are willing to develop expertise and do a good job reading complicated cases—might help humanize the work.
(Or…maybe it would just be more stressful and counterproductive to get less credit for those easy palate-cleansers; I don’t actually know. I do know that this particular food-for-thought is bound to make some people very uncomfortable. You can tell me how far you think the gulf between possible and desirable is.)