Explanations for the 2021 Official Step 1 Practice Questions

This year’s set was updated in February 2021 (PDF here).

The asterisks (*) signify a new question, of which there are only 2 (#24 and 53). The 2020 set explanations and pdf are available here; the comments on that post may be helpful if you have questions.

The less similar 2019 set is still available here for those looking for more free questions, and even older sets are all listed here. The 2019 and 2020 sets, for example, differed by 36 questions (in case you were curious).



Scheduling Slack

From Alan Weiss’s classic Getting Started in Consulting:

Medical consultants advise doctors never to schedule wall-to-wall appointments during the day, because inevitably there are emergencies, late patients, complications on routine exams, and so forth. These create a domino effect by day’s end, and some very unhappy scheduled patients. Instead, they advise some built-in slack time that can absorb the contingencies. If not needed, slack time provides valuable respite.


I read this book years ago when I was a resident and came across this passage when reviewing my Kindle highlights the other day.

Perhaps there are consultants in real life operating as Dr. Weiss suggests, but this common-sense approach to sustainable medical practice is not the one many large health systems employ.

In my wife’s old outpatient academic practice, lunchtime wasn’t respite. It was an overbook slot, and her schedule was so jam-packed that there were always patients clamoring to squeeze in.

In order to make that all work, the average doctor spends 1-2 hours charting at home per day.

Contrast that with her current solo practice where she has complete autonomy: her patients aren’t scheduled wall to wall, and she has time for the inevitable emergencies, hospitalizations, collateral phone calls, prior auths, and the other vagaries of modern medical practice.

I’m proud of the practice she’s built–during a pandemic no less!–but it’s crazy that even academic medicine has become so corporatized in its paradigm that it was easier for her to build her own business than to find a job that let her practice on anything approaching the terms that would best serve her patients and herself.



A few separate passages I’ve combined from Dr. Ronald Epstein’s Attending: Medicine, Mindfulness, and Humanity:

Altogether, I saw too much harshness, mindlessness, and inhumanity. Medical school was dominated by facts, pathways, and mechanisms; residency was about learning to diagnose, treat, and do procedures, framed by a pit-of-the-stomach dread that you might kill someone by missing something or not knowing enough.

Good doctors need to be self-aware to practice at their best; self-awareness needs to be in the moment, not just Monday-morning quarterbacking; and no one had a road map.

The great physician-teacher William Osler once said, “We miss more by not seeing than by not knowing.”

The fast pace of clinical practice—accelerated by electronic records—requires juggling multiple tasks seemingly simultaneously. Although commonly thought of as multitasking, multitasking is a misnomer—we actually alternate among tasks. Each time we switch tasks we need time to recover and, during the recovery period, we are less effective. Psychologists call this interruption recovery failure, which sounds a bit like those computer error messages we all dread. We increasingly feel as if we are victims of distractions rather than in control of them.

Outside of the OR (and not always even then), it’s rare to find an environment that makes space for deep focus and self-awareness. Mindfulness, as a daily approach to medical practice, goes against the grain of one’s surroundings.

Good doctors need to be self-aware to practice at their best; self-awareness needs to be in the moment, not just Monday-morning quarterbacking.

I like that. Medicine is generally ripe for Monday-morning quarterbacking (and radiology in particular due to the permanent, accessible, and objective nature of the imaging record).

But doctors don’t work in vacuums. We are humans.

Consider for a moment the discipline of human factors engineering:

Human factors engineering is the discipline that attempts to identify and address these issues. It is the discipline that takes into account human strengths and limitations in the design of interactive systems that involve people, tools and technology, and work environments to ensure safety, effectiveness, and ease of use. A human factors engineer examines a particular activity in terms of its component tasks, and then assesses the physical demands, skill demands, mental workload, team dynamics, aspects of the work environment (e.g., adequate lighting, limited noise, or other distractions), and device design required to complete the task optimally. In essence, human factors engineering focuses on how systems work in actual practice, with real—and fallible—human beings at the controls, and attempts to design systems that optimize safety and minimize the risk of error in complex environments.

(I first found that passage plagiarized on page 8 of the American Board of Radiology’s Non-interpretive Skills Guide.)

Despite the rise of checklists and evidence-based medicine, humans have been almost designed out of healthcare entirely. Rarely is anything in the system–from the overburdened schedules, administrative tasks, constant messaging, system-wide emails, the cluttered EMR, or the byzantine billing/coding game–designed to help humans take the time and mental space to sit in front of a patient (or an imaging study, for that matter) and fully be, in that moment, a doctor.

Program directors and the pass/fail USMLE

Just over a year ago, the NBME announced that Step 1 would soon become pass/fail in 2022. A lot of program directors complained, saying the changes would make it harder to compare applicants. In this study of radiology PDs, most weren’t fans of the news:

A majority of PDs (69.6%) disagreed that the change is a good idea, and a minority (21.6%) believe the change will improve medical student well-being. Further, 90.7% of PDs believe a pass/fail format will make it more difficult to objectively compare applicants and most will place more emphasis on USMLE Step 2 scores and medical school reputation (89.3% and 72.7%, respectively).

Some students also complained, believing that a high Step score was their one chance to break into a competitive specialty.

There are two main reasons some program directors want to maintain a three-digit score for the USMLE exams.

The Bad Reason Step Scores Matter

One reason Step scores matter is that they’re a convenience metric that allows program staff to rapidly summarize a candidate’s merit across schools and other metrics that aren’t directly comparable. This is a garbage use case—in all the ways you might imagine—for several reasons:

  • The test wasn’t designed for this. It’s a licensing exam, and it’s a single data point.
  • The standard error of measurement is 6. According to the NBME scoring interpretation guide, “plus and minus one SEM represents an interval that will encompass about two thirds of the observed scores for an examinee’s given true score.” As in, given your score on test day, you should expect a score in that 12-point range only 2/3 of the time. That’s quite the range for an objective summary of a student’s worth.
  • The standard error of difference is 8, which is supposed to help us figure out if two candidates are statistically different. According to the NBME, “if the scores received by two examinees differ by two or more SEDs, it is likely that the examinees are different in their proficiency.” Another way of stating this is that within 16 points, we should consider applicants as being statistically inseparable. A 235 and 250 may seem like a big difference, but our treatment of candidates as such isn’t statistically valid. Not to mention, a statistical difference doesn’t mean a real-life clinical difference (a concept tested on Step 1, naturally).
  • The standard deviation is ~20 (19 in 2019), a broad range. With a mean of 232 in 2019 and our standard errors as above, the majority of applicants are going to fall into that +/- 1SD range with lots of overlap in the error ranges. All that hard work of these students is mostly just to see the average score creep up year to year (it was 229 in 2017 and 230 in 2018). If our goal was just to find the “smartest” 10% of medical students suitable for dermatology, then we could just use a nice IQ test and forget the whole USMLE thing.

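The arithmetic in those bullets is easy to operationalize. Here’s a minimal sketch (hypothetical code, not anything the NBME publishes) that applies the SEM and SED figures quoted above to compare two scores:

```python
# Values quoted from the NBME score interpretation guide (as cited above).
SEM = 6  # standard error of measurement
SED = 8  # standard error of difference

def one_sem_interval(score):
    """+/- 1 SEM: an interval expected to contain about 2/3 of the
    observed scores for an examinee's given true score (a 12-point range)."""
    return (score - SEM, score + SEM)

def statistically_different(score_a, score_b):
    """Per the NBME, two examinees likely differ in proficiency only if
    their scores differ by two or more SEDs (i.e., 16+ points)."""
    return abs(score_a - score_b) >= 2 * SED

print(one_sem_interval(245))              # (239, 251)
print(statistically_different(235, 250))  # False: 15-point gap < 16
print(statistically_different(230, 250))  # True: 20-point gap >= 16
```

Note that the oft-agonized-over difference between a 235 and a 250 doesn’t clear the 16-point bar, which is the whole point of the bullet above.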
It’s easier to believe in a world where candidates are both smarter and just plain better when they have higher scores than it is to acknowledge that it’s a poor proxy for picking smart, hard-working, dedicated, honest, and caring doctors. You know, the things that would actually help predict future performance. Is there a difference in raw intelligence between someone with a 200 vs 280? Almost certainly. That’s 4 standard deviations apart. But what about a 230 and 245? How much are we really accidentally weighing the luxury of having both the time and money needed in order to dedicate lots of both to Step prep?

In my field of radiology, I care a lot about your attention to detail (and maybe your tolerance for eyestrain). I care about your ability to not cut corners and lose your focus when you’re busy or at the end of a long shift. I care that you’re patient with others and care about the real humans on the other side of those images.

There’s no test for that.

If there were, it wouldn’t be given by the NBME.

The Less Bad Reason Step Scores Matter

But there is one use case that unfortunately has some merit: multiple-choice exams are pretty good at predicting performance on other multiple-choice exams. That wouldn’t matter here if licensure were the end of the test-taking game, but Step performance tends to predict future board exam performance.

Some board exams are quite challenging, and programs pride themselves on high pass rates and hate dealing with residents who can’t pass their boards. So, Step 1 helps programs screen applicants by test-taking ability.

Once upon a time, I considered a career as a neurosurgeon instead of a neuroradiologist. No denying it certainly sounded cooler. I remember attending a meeting with the chair of neurosurgery at my medical school. This is only noteworthy because of his somewhat uncommon frankness. At the meeting, he said his absolute minimum interview/rank threshold was 230 (this was back around 2010). And I remember him saying the only reason he cared was because of the boards. They’d recently had a resident that everyone loved and thought was an excellent surgeon but just couldn’t seem to pass his boards after multiple attempts. It was a blight on the program.

Now, leave aside for a moment the possible issue with test validity if a dutiful clinician and excellent operator is being screened out over some multiple-choice questions. At the end of the day, programs need their residents to pass their boards. And it’s ideal if they pass their boards without special accommodations or other back-bending (like extra study time off-service) to help enable success. So while Step 1 cutoffs may be a way to quickly filter a large number of ERAS applications to a smaller more manageable number, they’re also a way to help programs in specialties with more challenging board exams ensure that candidates will eventually move on successfully to independent practice.

There is only one real reason a “good” Step score matters, and that is because specialty board certification exams are also broken.

One of the easiest ways a program can demonstrate high-quality and high board passage rates regardless of the underlying training quality is to select residents who can bring strong test-taking abilities to bear when it comes to another round of bullshitty multiple-choice exams.

A widely known secret is that board exams don’t exactly reflect real-life practice or real-life practical skills. Much of this type of board knowledge is learned by the trainees on their own, often through commercial prep products. A residency program in a field with a challenging board exam, like radiology, may be incentivized to pick students with high scores simply as a way to best ensure that their board pass rates will remain high. If Step 1 mania has taught us anything, it’s shown us that if you want high scores on a high-stakes exam, you pick people with high academic performance and then get out of their way.

What Are We Measuring?

When I see the work of other radiologists, I am rarely of the opinion that the quality of their work depends on the kind of innate intelligence that might be measured on a standardized exam. Ironically, most radiology exam questions ask about obvious findings. Almost none rely on actually making the finding or combating satisfaction of search (missing secondary or incidental findings when another finding is more obvious). And literally none test whether or not a radiologist can communicate findings in writing or verbally. When radiologists miss findings and get sued, the vast majority of cases are for “perceptual errors” and not “interpretive” ones. As in, when I miss things, it’s relatively rare that I misinterpreted the findings I made and more often that I just didn’t see something (often something that even I normally would [because I’m human]).

Obviously, it’s never a bad thing to be super smart or even hard-working. But the medical testing industrial complex has already selected sufficiently for intelligence. What it hasn’t selected for is being competent at practicing medicine.

While everyone would like to have a smarter doctor and train “smarter” residents, the key here is that board passage rates are another reflection of knowledge rooted predominantly in general test-taking ability and not clinical prowess. All tests are an indirect measure, for obvious reasons, but most include a wide variety of dubiously useful material largely designed to simply make exams challenging without necessarily distinguishing capable from dangerous candidates.

So when program directors complain about a pass/fail Step 1, they should also be talking with their medical boards. I don’t think we should worry about seeing less qualified doctors, but we should be proactive about ensuring trainee success in the face of exams of arbitrary difficulty.