The 2019 ABR Core Exam Results, the Board Prep Arms Race, and Where It All Went Wrong

On August 15, the ABR released the 2019 Core Exam results, which included the highest failure rate since the exam’s inception in 2013: 15.9%.

(Side note: due to a “computer error,” the ABR decided to release the aggregate results before sharing individual results with trainees, resulting in entirely unnecessary extra anxiety. This itchy-trigger-finger release is in stark contrast to the Certifying Exam pass rates, which have never been released.)

 

Year    % Passed    % Failed    % Conditioned    Total Examinees
2016    91.1        8.5         0.4              1,150
2017    93.5        6.3         0.2              1,173
2018    86.2        13.0        0.8              1,189
2019    84.0        15.9        0.1              1,191

So what happened?

 

Option 1

One potential explanation is that current residents are less intelligent, less hard-working, or less prepared for the exam despite similar baseline board scores in medical school, similar training at their residency programs, and now very mature and continually improving board preparation materials. This would seem unlikely.

If it really does come down to resident “caliber” as reflected in minor variations in Step scores, then I would volunteer that we should be concerned that a minimally related test could be so predictive (i.e., what are we testing here: radiology knowledge gained over years of training or just MCQ ability?).

Option 2

Another explanation is that—despite the magical Angoff method used to determine the difficulty/fairness of questions—the ABR simply isn’t very good at figuring out how hard their test is, and we should expect to see large swings in success rates year to year because different exams are simply easier or harder than others. This is feasible but does not speak well to the ABR’s ability to fairly and accurately test residents (i.e., their primary stated purpose). In terms of psychometrics, this would make the Core exam “unreliable.”

The ABR would certainly argue that the exam is criterion-based and that a swing of 10% is within the norms of expected performance. The simple way to address this would be to have the ABR’s psychometric data evaluated by an independent third party such as the ACR. Transparency is the best disinfectant.
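For context, the Angoff standard-setting the ABR relies on reduces to simple averaging of judges’ estimates. A minimal sketch in Python (the judge ratings below are invented for illustration and are not actual ABR data):

```python
# Minimal sketch of an Angoff-style cut score, the standard-setting
# method the ABR describes. All numbers below are invented.

def angoff_cut_score(ratings_by_judge):
    """Average each judge's per-item estimates, then average across judges.

    Each rating is one judge's estimate of the probability that a
    "minimally competent" examinee answers that item correctly.
    Returns the passing threshold as a fraction of items correct.
    """
    judge_means = [sum(r) / len(r) for r in ratings_by_judge]
    return sum(judge_means) / len(judge_means)

# Three hypothetical judges rating the same four hypothetical items:
ratings = [
    [0.90, 0.60, 0.75, 0.50],
    [0.85, 0.55, 0.70, 0.60],
    [0.95, 0.50, 0.80, 0.55],
]
print(angoff_cut_score(ratings))  # roughly 0.69, i.e., a ~69% pass mark
```

A criterion-referenced cut score like this is fixed before anyone sits the exam and does not move with cohort performance, which is why pass rates can legitimately swing year to year if the judges misjudge how hard the questions really are.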

Option 3

The third and most entertaining explanation is that current residents are essentially being sacrificed in petty opposition to Prometheus Lionheart. The test got too easy a couple years back and there needed to be a course correction.

 

The Core Prep Arms Race

With the widespread availability of continually evolving high-yield board prep material, the ABR may feel the need to update the exam in unpredictable ways year to year in order to stay ahead of “the man.”

(I’ve even heard secondhand stories about persons affiliated with the ABR in some capacity making intimations to that effect, including admitting to feeling threatened by Lionheart’s materials/snarky approach and expressing a desire to “get him.” I won’t reprint such things verbatim because they seem like really stupid things for someone to admit within public earshot, and I certainly cannot vouch for their veracity.)

If you’re happy with how your exam works, and then third parties create study materials that you feel devalue the exam, then your only option is to change (at least parts of) the exam. This may necessitate more unusual questions that do not make appearances in any of the several popular books or question banks. This is also not a good long-term plan.

This scenario was not just predictable but was the inevitable outcome of creating the Core exam to replace the oral boards. If the ABR thought people “cheating” on the oral boards by using recalls was bad, replacing that live performance with an MCQ test–the single most recallable and reproducible exam format ever created–was a true fool’s errand.

A useless high-stakes MCQ test based on a large and unspecified fraction of bullshit results in residents optimizing their learning for exam preparation. I see first-year residents using Crack the Core as a primary text, annotating it like a medical student annotates First Aid for the USMLE Step 1. Look no further than undergraduate medical education to see what happens when you make a challenging test that is critically important and cannot be safely passed without a large amount of dedicated studying: you devalue the actual thing you ostensibly want to promote.

In medical school, that means swathes of students ignoring their actual curricula in favor of self-directed board prep throughout the basic sciences and third-year students who would rather study for shelf exams than see patients. The ABR has said in the past that the Core Exam should require no dedicated studying outside of daily service learning. That is blatantly untrue, and an increasing failure rate only confirms how nonsensical that statement was and continues to be. Instead, the ABR is going to drive more residents into a board prep attitude that will detract from their actual learning. Time is finite; something always has to give.

If I were running a program with recurrent Core Exam failures, I wouldn’t focus on improving teaching and service-learning, because at the system level those things are not only hard to do well but probably wouldn’t even help. The smart move would be to give struggling residents more time to study. And that is bad for radiology and bad for patients.

The underlying impression is that the ABR’s efforts to make the test feel fresh every year have forced them to abandon some of the classic Aunt Minnies and reasonable questions in favor of an increasing number of bullshit questions in either content or form in order to drive the increasing failure rates. Even if this is not actually true, those are the optics, and that’s what folks in the community are saying. It’s the ABR’s job to convince people otherwise, but they’ve shown little interest in doing so in the past.

There is no evidence that the examination has gotten more relevant to clinical practice or better at predicting clinical performance, because there has never been any data nor will there ever be any data regarding the validity of the exam to do that.

 

The Impossibility of True Exam Validity

The ABR may employ a person with the official title of “Psychometric Director” with an annual base salary of $132,151, but it’s crucial to realize the difference between psychometrics in terms of making a test reliable and reproducible (such that the same person will achieve a similar score on different days) and that score being meaningful or valid in demonstrating what you designed the test to do. The latter would mean that passing the Core Exam showed you were actually safe to practice diagnostic radiology and failing it showed you were unsafe. That isn’t going to happen. It is unlikely to happen with any multiple-choice test because real life is not a closed-book multiple-choice exam, but it’s compounded by the fact that the content choices just aren’t that great (no offense to the unpaid volunteers who do the actual work here). Case in point: there is a completely separate dedicated cardiac imaging section, giving it the same weight as all of MSK or neuroradiology. Give me a break.

The irony here is that one common way to demonstrate supposed validity is to norm results with a comparison group. In this case, to determine question fairness and passing thresholds, you wouldn’t just convene a panel of subject matter experts (self-selected mostly-academic rads) and then ask them to estimate the fraction of minimally competent radiologists you’d expect to get the question right (the Angoff method). You’d norm the test against a cohort of practicing general radiologists.

Unfortunately, this wouldn’t work, because the test includes too much material that a general radiologist would never use. Radiologists in practice would probably be more likely to fail than residents. That’s why MOC is so much easier than initial certification. Unlike the Core exam, the statement that no studying is required for MOC is actually true. Now, why isn’t the Core Exam more like MOC? That’s a question only the ABR can answer.

I occasionally hear the counter-argument that the failure rate should go up because some radiologists are terrible at their jobs. I wouldn’t necessarily dispute that last part, with the caveat that we are all human and there are weak practitioners of all ages. But this sort of callous offhand criticism only makes sense if an increasing failure rate means that the people who pass the exam are better radiologists, the people who fail the exam are worse radiologists, and those who initially fail and then pass demonstrate a measurable increase in their ability to independently practice radiology. It is likely that none of the three statements is true.

Without getting too far into the weeds discussing types of validity (e.g., content, construct, and criterion), a valid Core Exam should have content that aligns closely with the content of practicing radiology, should actually measure radiology practice ability and not just radiology “knowledge,” and should be predictive of job performance. 0 for 3, it would seem.

So, this exam is lame and apparently getting lamer with no hope in sight. And let’s not even get started on the shameless exercise in redundant futility that is the Certifying Exam. So where did everything go wrong? Right from the start.

That’s the end of the rant. But let’s end with some thoughts for the future.

What the Core Exam SHOULD Be

To the ABR, feel free to use this obvious solution. It will be relatively expensive to produce, but luckily, you have the funds.

Diagnostic radiology is a specialty of image interpretation. While some content would be reasonable to continue in a single-best-answer multiple-choice format, the bulk of the test should be composed of simulated day-to-day practice. Unlike most medical fields, where it would be impossible to objectively watch a resident perform in a standardized assortment of medical situations, the same portability of radiology that makes AIs so easy to train and cases so easy to share could just as easily be leveraged for resident testing.

Oral boards aren’t coming back. The testing software should be a PACS.

Questions would be cases, and the answers would be impressions. Instead of a selection of radio buttons to click on, there would be free-text boxes that narrow down to a list of diagnoses as you type (like when you order a lab or enter a diagnosis in the EMR); this component would be important to make grading automated.
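That narrowing text box is just typeahead search over a controlled vocabulary, which is what makes automated grading tractable. A minimal sketch in Python, with an invented diagnosis list and a hypothetical any-match credit rule:

```python
# Minimal sketch of the proposed answer-entry mechanic: free text that
# narrows to a controlled list of diagnoses, so every recorded answer is
# a standardized entry that can be graded automatically. The diagnosis
# list and the scoring rule are invented for illustration.

DIAGNOSES = [
    "Acute appendicitis",
    "Acute diverticulitis",
    "Colon cancer",
    "Crohn disease",
    "Epiploic appendagitis",
]

def narrow(query, choices=DIAGNOSES):
    """Return the choices containing the query, case-insensitively."""
    q = query.lower()
    return [c for c in choices if q in c.lower()]

def grade(entered, accepted):
    """Give credit if any entered diagnosis appears on the answer key."""
    return any(d in accepted for d in entered)

print(narrow("diver"))  # ['Acute diverticulitis']
print(grade(["Acute diverticulitis", "Colon cancer"],
            accepted={"Acute diverticulitis"}))  # True
```

A real system would also need synonym and abbreviation handling, but the key design point stands: the examinee types freely while only standardized entries from a fixed list are ever recorded, so scoring against an answer key needs no human grader.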

The exam could be anchored in everyday practice. One should present cases centered on the common and/or high-stakes pathology that we expect every radiologist to safely and consistently diagnose. We could even have differential questions by having the examinee enter two or three diagnoses for the cases where such things are important considerations (e.g., some cases of diverticulitis vs colon cancer). These real-life PACS-based cases could be tied into second-order questions about management, communication, image quality, and even radiation dose. But it should all center around how radiologists actually view real studies. It could all be a true real-world simulation that is a direct assessment of relevant practice ability and not a proxy for other potentially related measurables. Let’s just have the examinees practice radiology and see how they do.

The ABR has argued in the past that the Core exam cannot be ported to a commercial center, which is largely the fault of the ABR for producing a terrible test. But at least that argument would finally hold water if the ABR actually deployed a truly unique evaluative experience that could actually demonstrate a trainee’s ability. The current paradigm is silly and outdated, and radiology is uniquely positioned within all of medicine to do better. The exam of the future should not be rooted in the largely failed techniques of the past.

 

10 Comments

  1. Hi Ben,
    Big fan here. Been reading your stuff since med school — thanks for writing!

    I’m a PGY-5 rad onc resident, and the ABR did the same thing to us in rad onc last year for our radiation physics and cancer biology qualifying exams, with pass rates dropping to 71% and 74%, respectively (see here: https://www.theabr.org/radiation-oncology/initial-certification/the-qualifying-exam/scoring-and-results). Approximately 1/3 of the residents in the country failed one of the two exams, and we are still nervously awaiting this year’s results as well.

    Some further reading:

    Response by ARRO (resident association) to the ABR: https://www.astro.org/ASTRO/media/ASTRO/AffiliatePages/arro/PDFs/ARROLettertoABR.pdf

    ABR response to ARRO: https://www.astro.org/ASTRO/media/ASTRO/AffiliatePages/arro/PDFs/ABRResponse.pdf

    Editorial by leaders in the field in our specialty’s journal calling for changes: https://www.redjournal.org/article/S0360-3016(18)34228-7/fulltext

    • Great links, thanks for sharing!

      I heard about the rad onc situation last year, which is an order of magnitude more dramatic. I love the ABR’s response: 1) you have no idea what you’re talking about; 2) trust us, our test is awesome and our methods infallible; and 3) it’s really just the fault of those lame uncompetitive small programs because they don’t teach their intellectually challenged residents enough.

      It’s like they think they can gaslight away a 20% drop in exam passage.

      • Having had tremendously long email discussions with ABR higher-ups, I can verify that they do not like Crack the Core. When I asked what I SHOULD use to study/prep for the exam, they basically recommended my attendings, every video on the ACR website, and every textbook. I asked what they thought about the Core Radiology text and they had never heard of it. They genuinely have no clue what resources are available for their own exam and do not offer tremendous guidance.

        I’ve taken the exam multiple times now and the sections I’ve passed/failed have changed dramatically between every test. I went from getting EXTREMELY high scores in particular sections to outright failing them the next time. These did not correlate with the amount of studying for the corresponding sections. One would think that if this material were truly what the “minimally competent” radiologist should know, I wouldn’t have such large fluctuations. I also don’t see how anything I can google quickly for an answer is remotely relevant.

      • I’m sorry to hear that. It’s a frustrating state of affairs, made all the worse because the ABR really just could do better.

  2. I have long argued that the material the ABR throws out to residents, fellows, and now attendings is not at all a measure of how competent a radiologist is.

    In the worst examples it can actually make radiologists WORSE.

    Take for example the asinine MOC weekly quizzes that attending radiologists are now being forced to take. They are timed questions (1-3 minutes). Why do they need to be timed? In a true clinical practice I do not have a time limit on a study. If there is something I see I can take my time and research it or ask a colleague for input. The ABR is trying to create a mindset where a radiologist is supposed to look at an image and make a snap judgment in 1-3 minutes. That type of behavior is dangerous and does not promote good radiology practices in the real world. This is because people sitting in their ivory towers do not see front line radiologists (and don’t get me started on the money grab play from all this).

    The only genuinely useful part of the testing was the oral boards in Louisville, so guess what: the “geniuses” at the ABR decided to eliminate that part.

  3. I have a question about these results: are they only for the June version of the exam? Are stats ever released for the November iteration? As someone taking the test for the first time in November, these numbers are highly alarming. Would they also enforce a 16% fail rate for the November session?

    • My understanding is that these are always just the main June administration. They have never publicly released the November pass rates on their site, but I have heard numbers bandied about on anonymous internet forums. My impression is that the failure rate is higher, but presumably that is because many takers are those who have already failed an attempt.

      The ABR does not set a true failure threshold, and the exam is not curved (i.e. they won’t be enforcing a 16% fail rate). They set passing standards based on the perceived difficulty of the exam questions as determined by the Angoff committees. Their stated goal is that all test administrations should be of similar difficulty such that variance from year to year only reflects the different people taking it.

      I think the exam is basically garbage, but I don’t think the ABR is lying about that. They can make the exam harder year to year–even by accident–without needing to change their stated approach. I wouldn’t worry about the time of year you take it or anything truly nefarious on their part. The ABR is just misguided and casually incompetent. Chin up, do your best.

  4. Option 4: More retakes = more $$$ for the ABR. Failing the exam essentially has no negative effect on the ABR or on radiology in general. Those who fail retake the exam once, twice, three times, etc., until they pass. The only victims are the residents and the programs associated with said failures. While it’s likely the other options contribute to a multifactorial explanation, money is always part of it.

    In the words of Wu-Tang:
    C.R.E.A.M.
    (cash rules everything around me)

    • Given the relative amount of money involved, I always considered Option 4 to be an ancillary benefit of Option 3 :)

