Program directors and the pass/fail USMLE

Just over a year ago, the NBME announced that Step 1 would become pass/fail in 2022. A lot of program directors complained, saying the change would make it harder to compare applicants. In this study of radiology PDs, most weren’t fans of the news:

A majority of PDs (69.6%) disagreed that the change is a good idea, and a minority (21.6%) believe the change will improve medical student well-being. Further, 90.7% of PDs believe a pass/fail format will make it more difficult to objectively compare applicants and most will place more emphasis on USMLE Step 2 scores and medical school reputation (89.3% and 72.7%, respectively).

Some students also complained, believing that a high Step score was their one chance to break into a competitive specialty.

There are two main reasons some program directors want to maintain a three-digit score for the USMLE exams.

The Bad Reason Step Scores Matter

One reason Step scores matter is that they’re a convenience metric that allows program staff to rapidly summarize a candidate’s merit across schools or other metrics that aren’t directly comparable. This is a garbage use case in all the ways you might imagine, but a few of the reasons include:

  • The test wasn’t designed for this. It’s a licensing exam, and it’s a single data point.
  • The standard error of measurement is 6. According to the NBME scoring interpretation guide, “plus and minus one SEM represents an interval that will encompass about two thirds of the observed scores for an examinee’s given true score.” As in, given your score on test day, you should expect a score in that 12-point range only 2/3 of the time. That’s quite the range for an objective summary of a student’s worth.
  • The standard error of difference is 8, which is supposed to help us figure out if two candidates are statistically different. According to the NBME, “if the scores received by two examinees differ by two or more SEDs, it is likely that the examinees are different in their proficiency.” Another way of stating this is that within 16 points, we should consider applicants as being statistically inseparable. A 235 and 250 may seem like a big difference, but our treatment of candidates as such isn’t statistically valid. Not to mention, a statistical difference doesn’t mean a real-life clinical difference (a concept tested on Step 1, naturally).
  • The standard deviation is ~20 (19 in 2019), a broad range. With a mean of 232 in 2019 and our standard errors as above, the majority of applicants are going to fall into that +/- 1SD range with lots of overlap in the error ranges. All that hard work of these students is mostly just to see the average score creep up year to year (it was 229 in 2017 and 230 in 2018). If our goal was just to find the “smartest” 10% of medical students suitable for dermatology, then we could just use a nice IQ test and forget the whole USMLE thing.
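To make the arithmetic in the bullets above concrete, here is a minimal sketch using only the figures quoted in this post (SEM of 6, SED of 8, SD of roughly 20). The function names are my own for illustration, and this is a rough rule-of-thumb check, not how the NBME or any program actually compares applicants.

```python
SEM = 6   # standard error of measurement, as quoted above
SED = 8   # standard error of difference, as quoted above
SD = 20   # approximate standard deviation, as quoted above


def one_sem_band(score, sem=SEM):
    """Return the (low, high) band of plus and minus one SEM around a score."""
    return (score - sem, score + sem)


def likely_different(score_a, score_b, sed=SED):
    """Apply the quoted rule of thumb: a gap of two or more SEDs."""
    return abs(score_a - score_b) >= 2 * sed


def gap_in_sds(score_a, score_b, sd=SD):
    """Express the gap between two scores in standard deviations."""
    return abs(score_a - score_b) / sd


print(one_sem_band(240))           # (234, 246)
print(likely_different(235, 250))  # False: a 15-point gap is under 16
print(likely_different(230, 246))  # True: a 16-point gap just clears the bar
print(gap_in_sds(200, 280))        # 4.0
```

By this rule of thumb, the 235 and the 250 are not distinguishable, while a 200 and a 280 sit about four standard deviations apart.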

It’s easier to believe in a world where candidates are both smarter and just plain better when they have higher scores than it is to acknowledge that it’s a poor proxy for picking smart, hard-working, dedicated, honest, and caring doctors. You know, the things that would actually help predict future performance. Is there a difference in raw intelligence between someone with a 200 vs 280? Almost certainly. That’s 4 standard deviations apart. But what about a 230 and 245? How much are we really just accidentally weighing the luxury of having plenty of time and money to dedicate to Step prep?

In my field of radiology, I care a lot about your attention to detail (and maybe your tolerance for eyestrain). I care about your ability to not cut corners and lose your focus when you’re busy or at the end of a long shift. I care that you’re patient with others and care about the real humans on the other side of those images.

There’s no test for that.

If there were, it wouldn’t be given by the NBME.

The Less Bad Reason Step Scores Matter

But there is one use case that unfortunately has some merit: multiple-choice exams are pretty good at predicting performance on other multiple-choice exams. That wouldn’t matter here if licensure was the end of the test-taking game, but Step performance tends to predict future board exam performance.

Some board exams are quite challenging, and programs pride themselves on high pass-rates and hate dealing with residents that can’t pass their boards. So, Step 1 helps programs screen applicants by test-taking ability.

Once upon a time, I considered a career as a neurosurgeon instead of a neuroradiologist. No denying it certainly sounded cooler. I remember attending a meeting with the chair of neurosurgery at my medical school. This is only noteworthy because of his somewhat uncommon frankness. At the meeting, he said his absolute minimum interview/rank threshold was 230 (this was back around 2010). And I remember him saying the only reason he cared was because of the boards. They’d recently had a resident that everyone loved and thought was an excellent surgeon but just couldn’t seem to pass his boards after multiple attempts. It was a blight on the program.

Now, leave aside for a moment the possible issue with test validity if a dutiful clinician and excellent operator is being screened out over some multiple-choice questions. At the end of the day, programs need their residents to pass their boards. And it’s ideal if they pass their boards without special accommodations or other back-bending (like extra study time off-service) to help enable success. So while Step 1 cutoffs may be a way to quickly filter a large number of ERAS applications to a smaller, more manageable number, they’re also a way to help programs in specialties with more challenging board exams ensure that candidates will eventually move on successfully to independent practice.

There is only one real reason a “good” Step score matters, and that is because specialty board certification exams are also broken.

One of the easiest ways a program can demonstrate high-quality and high board passage rates regardless of the underlying training quality is to select residents who can bring strong test-taking abilities to bear when it comes to another round of bullshitty multiple-choice exams.

A widely known secret is that board exams don’t exactly reflect real-life practice or real-life practical skills. Much of this type of board knowledge is learned by the trainees on their own, often through commercial prep products. A residency program in a field with a challenging board exam, like radiology, may be incentivized to pick students with high scores simply as a way to best ensure that their board pass rates will remain high. If Step 1 mania has taught us anything, it’s shown us that if you want high scores on a high-stakes exam, you pick people with high academic performance and then get out of their way.

What Are We Measuring?

When I see the work of other radiologists, I am rarely of the opinion that the quality of their work depends on their innate intelligence such as might be measured on a standardized exam. Ironically, most radiology exam questions ask about obvious findings. Almost none rely on actually making the finding or combating satisfaction of search (missing secondary or incidental findings when another finding is more obvious). And literally none test whether or not a radiologist can communicate findings in writing or verbally. When radiologists miss findings and get sued, the vast majority are for “perceptual errors” and not “interpretive ones.” As in, when I miss things, it’s relatively rare that I misinterpret the findings I make and more often that I just didn’t see something (often something that even I normally would [because I’m human]).

Obviously, it’s never a bad thing to be super smart or even hard-working. But the medical testing industrial complex has already selected sufficiently for intelligence. What it hasn’t selected for is being competent at practicing medicine.

While everyone would like to have a smarter doctor and train “smarter” residents, the key here is that board passage rates are another reflection of knowledge cached predominately in general test-taking ability and not clinical prowess. All tests are an indirect measure, for obvious reasons, but most include a wide variety of dubiously useful material largely designed to simply make exams challenging without necessarily distinguishing capable from dangerous candidates.

So when program directors complain about a pass/fail Step 1, they should also be talking with their medical boards. I don’t think we should worry about seeing less qualified doctors, but we should be proactive about ensuring trainee success in the face of exams of arbitrary difficulty.


Private Equity & the Comeback of the For-Profit Medical School

You may be used to hearing about private equity takeovers of medical practices, but you may be less familiar with the recent growth of for-profit (primarily osteopathic) medical schools, two of which are owned by Medforth Global Healthcare Education. Medforth, as you might have guessed, is a private equity firm based in New York, NY.

Given the current osteopathic tilt of these for-profit schools, can this do anything but worsen the unfair stigma already facing DO students and physicians?

Well, here is an excerpt on how a recently proposed for-profit, private-equity-backed medical school in Billings, Montana got derailed:

Billings Clinic has had concerns about many aspects of the Medforth project. These concerns, combined with three events that occurred recently, have caused Billings Clinic to cease discussions with Medforth. On two separate occasions an executive representative of the medical school cast aspersions on a proposed medical school in Great Falls, Montana, on the basis of that medical school’s Jewish affiliation. Those statements intimated that a school with a stated Jewish heritage may not belong in Montana and would not be able to assimilate in the state. In a third instance, a different executive representative of the medical school referred to a female Billings Clinic leader as a “token.” These comments are inconsistent with Billings Clinic’s core values, including a dedication to diversity, inclusion, equity and belonging.

Ew. Now, are these clowns really a bunch of abhorrent scummy sexist racist antisemites? Absolutely a possibility, though flaunting that bias would be incredibly stupid.

Is it possible that much of this display of bigotry instead reflects some poorly conceived cynical attempt to appeal to others believed to hold bigoted views? Do these private equity jokers just think that Montanans are a bunch of abhorrent scummy sexist racist antisemites?

Maybe it’s a bit of both. Maybe Medforth is just looking for kindred spirits.

When it comes to people running a medical school, neither possibility should be acceptable.

(h/t @jbcarmody)

Old Guard Medical Wisdom? Rest

From Rest: Why You Get More Done When You Work Less:

Neurosurgeon Wilder Penfield, for example, warned medical students that unless they cultivated other interests, “your specializing will expose you to an insidious disease that can shut you away from all but your occupational associates” and “imprison you in lonely solitude.” Penfield’s mentor, William Osler, warned that without care, “good men are ruined by success in practice,” and that “ever-increasing demands” can leave even the most curious person “worn out, yet not able to rest.” It was essential to develop “some intellectual pastime which may serve to keep you in touch with the world of art, of science, or of letters.”

These statements came from an era when residents literally lived in the hospital and Osler’s famous surgical colleague William Halsted’s work ethic was fueled by cocaine.

And even they thought it was important for doctors to be well-rounded, have hobbies, and get a life.

Honestly, I’m more interested in what you do for you than what boxes you’re just checking to impress me.

A Chance for Meaningful Parental Leave During Residency

Last year, the ABMS–the umbrella consortium of medical specialties–waded into the established toxic mess of medical training schedules with a new mandate to provide trainees with a nonpunitive way to be parents, caretakers, or just sick:

Starting in July 2021, all ABMS Member Boards with training programs of two or more years duration will allow for a minimum of six weeks away once during training for purposes of parental, caregiver, and medical leave, without exhausting time allowed for vacation or sick leave and without requiring an extension in training. Member Boards must communicate when a leave of absence will require an official extension to help mitigate the negative impact on a physician’s career trajectory that a training extension may have, such as delaying a fellowship or moving into a full, salaried position.

6 weeks over the course of an entire residency may not seem like much given the vagaries of life, but it’s a better floor than many programs currently offer. A graduation delay sucks, and it’s the kind of punishment for living your life that causes many doctors to put off big milestones like starting a family. Medical training already takes a long time, and ~1 in 4 female physicians struggle with infertility (and in that study, 17% of those struggling would have picked a different specialty).

This issue is being addressed across medicine, but we’re going to discuss it in the context of radiology because I am a radiologist.

The American Board of Radiology’s recent attempt at how such language should look has drawn some ire on Twitter. Here is their email to program directors that’s been making the rounds:


They proposed that a program “may” grant up to 6 weeks of leave over the course of residency for parental/caregiver/medical leave as a maximum without needing to extend residency at the tail end. The language here doesn’t even meet the ABMS mandate, which again states that a program “will” provide a “minimum” of 6 weeks (and explicitly states that said 6 weeks of leave shouldn’t be counted against regular sick time).

The ABR could have simply taken the straightforward approach of parroting the ABMS mandate. They could have–even better–taken the higher ground with an effort to trailblaze the first generous specialty-wide parental leave policy in modern medicine.

Instead, they have advocated for a maximum of six weeks, because any more and they feel they wouldn’t be able to “support the current length of required training.” As in, if a mom gets 3 months off to care for a newborn then the whole system falls apart.

I think they realized it would be prudent to ask for feedback first and then make the plan because a new softer blog post removes any specific language:

We need your input to develop a policy that appropriately balances the need for personal time including vacation as well as parental, caregiver, and/or medical leave with the need for adequate training. 

It is important to realize that the ABR is not restricting the amount of time an institution might choose to allow for parental, caregiver, and/or medical leave, nor are we limiting the amount of vacation a residency program might choose to provide. These are local decisions and the ABR does not presume to make these determinations. However, above a certain limit (not yet determined), an extension of training might be needed to satisfy the requirement for completion of the residency. 

Of course, in the original proposal, the ABR literally did want to limit program vacation (to 4 weeks, see above).

After the mishandling of the “ABR agreement” debacle and the initial we-can’t-do-remote-testing Covid pseudo-plan and now this, I hope the ABR will eventually come to the conclusion that stakeholders matter and that we can make radiology better by working together as a community.

Radiology is a “male-dominated” field, but it shouldn’t be. A public relations win here could make all the difference.

Plenty of Slack

I think there are more than six weeks of slack in our 4-year training paradigm, and it’s hard to argue otherwise.

When the ABR created the Core Exam and placed it at the end of the PGY4/R3 year, they created a system where a successful radiology resident has proven (caveat: to the ABR) that they are competent to practice radiology before their senior year. It created a system where the fourth year of residency was opened up largely to a choose-your-own-adventure style of highly variable impact.

We have ESIR residents who spend most of their fourth year doing IR, and we have accelerated nuclear medicine pathway residents who do a nuclear medicine fellowship integrated into their residency. There are folks early specializing into two-year neuroradiology fellowships during senior year, and others who take a bevy of random electives that they may never use again in clinical practice.

We have many programs with a whole host of extracurricular “tracks” where residents might spend protected time every week doing research, quality improvement, or clinician-educator activities. I would know; I did all three during my residency. We have residents doing research electives and all kinds of other interesting things that may be worthwhile but have no positive impact on their ability to practice radiology clinically, which is the primary purpose of residency training.

A hypothetical example: Take a research track resident with one half-day protected time every week for 40 weeks a year (say because of 8 weeks of night float and 4 weeks of vacation). That’s 20 days a year of reduced clinical activity. 20 working days is basically a month. If they have their R1 year to just focus on learning radiology before taking call, then over the next three years that resident would be “missing” 3 months of clinical time. But no one is seriously arguing that these tracks should postpone residency graduation.
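For anyone who wants to check the math on that hypothetical, here is a quick sketch. The constants come straight from the example above (8 weeks of night float, 4 weeks of vacation, one protected half-day per week, and the rough equivalence of 20 working days to a month); the function names are made up for illustration.

```python
WEEKS_PER_YEAR = 52
NIGHT_FLOAT_WEEKS = 8        # from the hypothetical above
VACATION_WEEKS = 4           # from the hypothetical above
WORKING_DAYS_PER_MONTH = 20  # the rough "20 working days is basically a month"


def protected_days_per_year(half_days_per_week=1):
    """Days of reduced clinical activity per year from protected half-days."""
    eligible_weeks = WEEKS_PER_YEAR - NIGHT_FLOAT_WEEKS - VACATION_WEEKS
    return eligible_weeks * half_days_per_week * 0.5


def protected_months(years, half_days_per_week=1):
    """Approximate months of 'missing' clinical time over several years."""
    total_days = protected_days_per_year(half_days_per_week) * years
    return total_days / WORKING_DAYS_PER_MONTH


print(protected_days_per_year())  # 20.0
print(protected_months(years=3))  # 3.0
```

Bump the track to two half-days a week and the same resident is "missing" six months, which is the kind of slack this section is talking about.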

We already have a system where there are minimum case requirements for residents to complete residency training. Last I checked, the ABR is certifying radiologists in the domain of clinical radiology, not their number of peer-reviewed publications or ability to do a sick root cause analysis.

Radiology residency may be four years after a clinical internship, but it’s clear that there is no standard radiology training program clinical “length” despite that fixed duration. Some residents are already doing far fewer months.

No one is adding up diagnostic work hours and saying you need 48 weeks/yr * 52 hours/wk * 4 years = 9,984 hours.

It’s not a thing, and it shouldn’t be.

Competency-based Assessment and Reasonable Limits

The core problem is that we have time-based residencies masquerading as a proxy for competency. You don’t magically become competent when you graduate. Competency is a continuum. Hiring trainees for a set number of years is convenient. It’s easy to schedule. It’s easy to budget. But it’s an artifact of convenience, not a mission-critical component of clinical growth.

There are R3 residents who are ready for the big leagues, and there are practicing doctors who should honestly move back down to the minors. No one is going to argue that a little more training makes you worse. But the logic that more is better gets us to the unsustainable current state of affairs, where doctors are accumulating more and more training to become hyper-specialized in the least efficient way possible while non-physician providers bypass our residency/fellowship paradigm to do similar jobs with zero training.

We all get better with deliberate practice. The question isn’t: is more better? The question is how much less is still enough for independent practice?

Obviously, the ABMS member boards like the ABR don’t exactly have the power to force institutions to change policies directly, and they probably don’t want to. But they do set the stage by mandating the criteria for board eligibility.

I would argue that the ABR should set a minimum threshold and no maximum. If a program is happy with that resident’s progress and they pass the Core Exam, then consider the boxes checked. Let everyone be treated with dignity and then give the programs the flexibility to compete in the marketplace of support.

When my son was born, I was able to take 4 days of sick time and then went straight into night float. That’s bullshit. You want to see motivation? Tell an expecting resident that if they’re a total champion, they can spend as much time as they need with their baby without delaying graduation.

Less than 6 weeks is unacceptable. And while a 6-week minimum is an improvement, I think the true minimum consistent with current training practices that should also have a chance of being implemented is three months.

I’d love to see six months or more. I don’t think that’s going to happen as a minimum, and there’s a very reasonable argument against it as underperforming residents really may need some of that time back. It would be nice to see language that demands 3 months, has no maximum, and strongly encourages programs to work with residents on a case-by-case basis to ensure they are ready for graduation with however much time they have.

But the first step is to have a minimum that doesn’t punish women who want to stay home with their infants until they’re done cluster feeding. Convince me otherwise.


The ABR doesn’t use the language of “fairness” in their email, but I suspect the perception of fairness is at play. It’s almost always at play when older doctors consider policies that might benefit younger physicians. It’s the I-did-it-this-way-and-I’m-amazing-so-it-must-be-an-integral-part-of-the-process. It’s the hazing.

Right now, some lucky residents across the country get varying degrees of time “off” thanks to PD support in the form of research electives, reading electives, and program staff simply looking the other way. We need to standardize a fair minimum that enables programs to provide a consistent humane process and not just put trainees solely at the mercy of their PDs and local GME office.

Yes, it’s true that if you allow parents time to be parents or people to take care of loved ones or people time to recover from illness that some residents will work fewer months than others. Every resident has their unique experience, but a policy change will also mean that every resident may not have a similar “paper” experience. That’s a fact.

Some people will say, that’s not fair. That it’s not fair to single residents or non-parents. That it’s not fair to the able-bodied. Or to those whose aging parents are healthy or have the resources to support themselves.

But let me provide a counterpoint:

I don’t think fairness means that every single resident has to have the exact same experience. They already don’t. I think fairness means we treat humans with the respect and compassion that every person deserves. I want to live in a world where everyone gets time to be a parent, even if yes, that world means that some doctors may have a career that is a few months shorter.

I think fairness means not punishing people when life happens just because making people jump through hoops makes it easier to check a box.

If you’re ready to practice, you’re ready.

If we need to reassess the validity of an exclusively time-based (instead of competency-based) training paradigm in order to do that, then let’s get to it.

The ABR is accepting feedback until April 15.

Patient Satisfaction: A Danger to be Avoided

Doctors intuitively know that the Yelpification of medicine is bad. But it’s not just toxic to the physician-patient relationship and bad for burnout, it’s actually dangerous.

The outsized and misplaced importance of patient satisfaction scores is a perfect embodiment of Goodhart’s law, well-paraphrased as “when a measure becomes a target, it ceases to be a good measure.”

If you make patient satisfaction scores a critical target—and they are—you will see consequent mismanagement. This is so blatantly apparent when it comes to urgent care and pain management that, if anything, high satisfaction scores are likely a more meaningful signal of poor care.

If a patient comes to an urgent care for a URI and wants antibiotics, they will be most “satisfied” when they receive the prescription they didn’t need. And all that over-treatment is not without risk.

Even outside of quality metrics, profit-centered health care businesses need patients to make money, and the “customer” is always right.

A study published in JAMA is a great example of the obvious negative externalities of prioritizing patient satisfaction scores. It analyzed a large number of telemedicine visits for URI:

72 percent of patients gave 5-star ratings after visits with no resulting prescriptions, 86 percent gave 5 stars when they got a prescription for something other than an antibiotic, and 90 percent gave 5 stars when they received an antibiotic prescription.

In fact, no other factor was as strongly associated with patient satisfaction as whether they received a prescription for an antibiotic.

Another study out of UC Davis analyzed more than 50,000 participants in the national Medical Expenditure Panel Survey and found that patients who were most satisfied had greater chances of being admitted to the hospital, had ~9% higher total health-care costs, and 9% higher prescription drug expenditures. Of course, if you’re a for-profit entity (and most “non-profit” hospitals certainly are), higher costs and more prescriptions often just mean more profit. A win-win-win.

But even worse, death rates also were higher: For every 100 people who died over an average period of nearly four years in the least satisfied group, about 126 people died in the most satisfied group.

Moreover, the more satisfied patients had better average physical and mental health status at baseline than the less satisfied patients, and the association between patient satisfaction and death was strongest among the healthiest patients. Perhaps the “worried well” should be worried.
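To put the 100-versus-126 comparison above into a single number, here is a tiny sketch. It uses only the two counts quoted in this post, the function name is my own, and it is a crude ratio of those counts rather than the adjusted statistic the study itself would have reported.

```python
def relative_increase(baseline_deaths, comparison_deaths):
    """Fractional increase of the comparison group over the baseline group."""
    return (comparison_deaths - baseline_deaths) / baseline_deaths


# 100 deaths in the least satisfied group vs. about 126 in the most satisfied
print(f"{relative_increase(100, 126):.0%}")  # 26%
```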

The push to satisfy patients at all costs is no secret. But some doctors are fighting back, like Dr. Eryn Alpert, who sued Kaiser Permanente in 2019:

A doctor who refused to prescribe patients unnecessary opioids has sued Kaiser Permanente, alleging the way the company used patient satisfaction scores hurt her career and incentivized doctors to over-prescribe painkillers.

“By requiring its employee physicians to achieve certain patient satisfaction scores in departments where those scores are closely related to a physician’s willingness to prescribe opioids, other addictive medications, and to order unnecessary medical testing (e.g. labs, radiology) in response to patient demand, Kaiser’s intent was to increase its profits so that … its executives and physicians would receive higher bonus compensation.”

These sorts of individual fights happen quietly all over the country, but the opioid crisis may have created an opportunity for doctors to put the focus back on patient outcomes.

Do no harm in many cases means doing less, but the combination of short visits and Press Ganey pressures makes it harder for doctors to do the right thing. Healthcare may be a business, but patient care isn’t.

This article was originally published in Physician Sense in October 2019.