The 2019 ABR Core Exam Results, the Board Prep Arms Race, and Where It All Went Wrong

On August 15, the ABR released the 2019 Core Exam results, which included the highest failure rate since the exam’s inception in 2013: 15.9%.

(Side note: due to a “computer error,” the ABR decided to release the aggregate results before sharing individual results with trainees, resulting in entirely unnecessary extra anxiety. This itchy-trigger-finger release stands in stark contrast to the Certifying Exam pass rates, which have never been released.)

 

| Year | Percent Passed | Percent Failed | Percent Conditioned | Total Examinees |
|------|----------------|----------------|---------------------|-----------------|
| 2016 | 91.1 | 8.5 | 0.4 | 1,150 |
| 2017 | 93.5 | 6.3 | 0.2 | 1,173 |
| 2018 | 86.2 | 13.0 | 0.8 | 1,189 |
| 2019 | 84.0 | 15.9 | 0.1 | 1,191 |

So what happened?

 

Option 1

One potential explanation is that current residents are less intelligent, less hard-working, or less prepared for the exam despite similar baseline board scores in medical school, similar training at their residency programs, and access to now-mature and continually improving board preparation materials. This seems unlikely.

If it really does come down to resident “caliber” as reflected in minor variations in Step scores, then I would volunteer that we should be concerned that a minimally related test could be so predictive (i.e., what are we actually testing here: radiology knowledge gained over years of training or just MCQ ability?).

Option 2

Another explanation is that—despite the magical Angoff method used to determine the difficulty/fairness of questions—the ABR simply isn’t very good at figuring out how hard their test is, and we should expect to see large swings in success rates year to year because different exams are simply easier or harder than others. This is feasible but does not speak well to the ABR’s ability to fairly and accurately test residents (i.e., their primary stated purpose). In terms of psychometrics, this would make the Core exam “unreliable.”

The ABR would certainly argue that the exam is criterion-based and that a swing of 10% is within the norms of expected performance. The simple way to address this would be to have the ABR’s psychometric data evaluated by an independent third party such as the ACR. Transparency is the best disinfectant.

Option 3

The third and most entertaining explanation is that current residents are essentially being sacrificed in petty opposition to Prometheus Lionheart. The test got too easy a couple years back and there needed to be a course correction.

 

The Core Prep Arms Race

With the widespread availability of continually evolving high-yield board prep material, the ABR may feel the need to update the exam in unpredictable ways year to year in order to stay ahead of “the man.”

(I’ve even heard secondhand stories about persons affiliated with the ABR in some capacity making intimations to that effect, including admitting to feeling threatened by Lionheart’s materials/snarky approach and expressing a desire to “get him.” I wouldn’t reprint such things because they seem like really stupid things for someone to admit within public earshot, and I certainly cannot vouch for their veracity.)

If you’re happy with how your exam works, and then third parties create study materials that you feel devalue the exam, then your only option is to change (at least parts of) the exam. This may necessitate more unusual questions that do not make appearances in any of the several popular books or question banks. This is also not a good long-term plan.

This scenario was not just predictable but was the inevitable outcome of creating the Core exam to replace the oral boards. If the ABR thought people “cheating” on the oral boards by using recalls was bad, replacing that live performance with an MCQ test (the single most recallable and reproducible exam format ever created) was a true fool’s errand.

A useless high-stakes MCQ test based on a large and unspecified fraction of bullshit results in residents optimizing their learning for exam preparation. I see first-year residents using Crack the Core as a primary text, annotating it like a medical student annotates First Aid for the USMLE Step 1. Look no further than undergraduate medical education to see what happens when you make a challenging test that is critically important and cannot be safely passed without a large amount of dedicated studying: you devalue the actual thing you ostensibly want to promote.

In medical school, that means swathes of students ignoring their actual curricula in favor of self-directed board prep throughout the basic sciences and third-year students who would rather study for shelf exams than see patients. The ABR has said in the past that the Core Exam should require no dedicated studying outside of daily service learning. That is blatantly untrue, and an increasing failure rate only confirms how nonsensical that statement was and continues to be. Instead, the ABR is going to drive more residents into a board prep attitude that will detract from their actual learning. Time is finite; something always has to give.

If I were running a program that had recurrent Core Exam failures, I wouldn’t focus on improving teaching and service learning, because at a system level those things are not only hard to do well but probably wouldn’t even help. The smart move would be to give struggling residents more time to study. And that is bad for radiology and bad for patients.

The underlying impression is that the ABR’s efforts to make the test feel fresh every year have forced them to abandon some of the classic Aunt Minnies and reasonable questions in favor of an increasing number of bullshit questions in either content or form in order to drive the increasing failure rates. Even if this is not actually true, those are the optics, and that’s what folks in the community are saying. It’s the ABR’s job to convince people otherwise, but they’ve shown little interest in doing so in the past.

There is no evidence that the examination has gotten more relevant to clinical practice or better at predicting clinical performance, because there has never been any data regarding the exam’s validity for those purposes, nor will there ever be.

 

The Impossibility of True Exam Validity

The ABR may employ a person with the official title of “Psychometric Director” at an annual base salary of $132,151, but it’s crucial to recognize the difference between psychometrics in the sense of making a test reliable and reproducible (such that the same person will achieve a similar score on different days) and that score being meaningful or valid in demonstrating what you designed the test to do. The latter would mean that passing the Core Exam showed you were actually safe to practice diagnostic radiology and that failing it meant you were unsafe. That isn’t going to happen. It is unlikely to happen with any multiple-choice test, because real life is not a closed-book multiple-choice exam, but it’s compounded by the fact that the content choices just aren’t that great (no offense to the unpaid volunteers who do the actual work here). Case in point: there is a completely separate, dedicated cardiac imaging section, giving it the same weight as all of MSK or neuroradiology. Give me a break.

The irony here is that one common way to demonstrate supposed validity is to norm results against a comparison group. In this case, to determine question fairness and passing thresholds, you wouldn’t just convene a panel of subject matter experts (self-selected, mostly academic rads) and ask them to estimate the fraction of minimally competent radiologists you’d expect to get each question right (the Angoff method). You’d norm the test against a cohort of practicing general radiologists.
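
For the curious, here is a minimal sketch of how an Angoff-style cut score falls out of those judge estimates. The panel size, items, and probabilities below are made up for illustration; the ABR’s actual ratings, item pool, and any adjustments are not public.

```python
# Minimal sketch of an Angoff-style cut score calculation (illustrative numbers only).
# Each judge estimates the probability that a "minimally competent" examinee
# answers each item correctly; the passing threshold is the average expected score.

judge_ratings = [
    # one list per judge: estimated P(correct) for each of five hypothetical items
    [0.90, 0.75, 0.60, 0.85, 0.40],
    [0.95, 0.70, 0.55, 0.80, 0.50],
    [0.85, 0.80, 0.65, 0.90, 0.45],
]

# Average the judges' estimates for each item, then sum across items.
num_items = len(judge_ratings[0])
item_means = [
    sum(judge[i] for judge in judge_ratings) / len(judge_ratings)
    for i in range(num_items)
]

cut_score = sum(item_means)                  # expected raw score for a borderline examinee
cut_percent = 100 * cut_score / num_items    # as a percentage of the test

print(f"Angoff cut score: {cut_score:.2f} / {num_items} ({cut_percent:.1f}%)")
```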

Unfortunately, this wouldn’t work, because the test includes too much material that a general radiologist would never use. Radiologists in practice would probably be more likely to fail than residents. That’s why MOC is so much easier than initial certification. Unlike the Core exam, the statement that no studying is required for MOC is actually true. Now, why isn’t the Core Exam more like MOC? That’s a question only the ABR can answer.

I occasionally hear the counter-argument that the failure rate should go up because some radiologists are terrible at their jobs. I wouldn’t necessarily argue that last part, with the caveat that we are all human and there are weak practitioners of all ages. But this sort of callous offhand criticism only makes sense if an increasing failure rate means that the people who pass the exam are better radiologists, the people who fail the exam are worse radiologists, and those who initially fail and then pass demonstrate a measurable increase in their ability to independently practice radiology. It is likely that none of the three statements are true.

Without getting too far into the weeds discussing types of validity (e.g., content, construct, and criterion), a valid Core Exam should have content that aligns closely with the content of practicing radiology, should actually measure radiology practice ability and not just radiology “knowledge,” and should be predictive of job performance. 0 for 3, it would seem.

So, this exam is lame and apparently getting lamer with no hope in sight. And let’s not even get started on the shameless exercise in redundant futility that is the Certifying Exam. So where did everything go wrong? Right from the start.

That’s the end of the rant. But let’s end with some thoughts for the future.

What the Core Exam SHOULD Be

To the ABR, feel free to use this obvious solution. It will be relatively expensive to produce, but luckily, you have the funds.

Diagnostic radiology is a specialty of image interpretation. While some content would be reasonable to continue in a single-best-answer multiple-choice format, the bulk of the test should be composed of simulated day-to-day practice. Unlike most medical fields, where it would be impossible to objectively watch a resident perform in a standardized assortment of medical situations, the same portability that makes radiology AIs so easy to train and cases so easy to share could just as easily be used for resident testing.

Oral boards aren’t coming back. The testing software should be a PACS.

Questions would be cases, and the answers would be impressions. Instead of a selection of radio buttons to click on, there would be free-text boxes that narrow down to a list of diagnoses as you type (like when you order a lab or enter a diagnosis in the EMR); this component would be important to make grading automated.
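
As a rough sketch of that idea, constraining answers to a controlled vocabulary with type-ahead matching is what makes automated grading feasible. The diagnosis list and naive substring matching below are hypothetical; a real system would also need synonym and abbreviation handling.

```python
# Minimal sketch of the type-ahead diagnosis box described above.
# The diagnosis list and matching rule are hypothetical; a real system would
# also need synonym/abbreviation mapping (e.g., "HCC" -> "hepatocellular carcinoma").

DIAGNOSES = [
    "acute appendicitis",
    "acute diverticulitis",
    "colon cancer",
    "hepatocellular carcinoma",
    "pulmonary embolism",
    "subdural hematoma",
]

def suggest(partial: str, limit: int = 5) -> list[str]:
    """Return diagnoses matching what the examinee has typed so far."""
    query = partial.strip().lower()
    return [dx for dx in DIAGNOSES if query in dx][:limit]

# Constraining answers to a controlled vocabulary is what makes grading automatable:
# the submitted string can be compared directly against the case's accepted answers.
def grade(submitted: str, accepted: set[str]) -> bool:
    return submitted.strip().lower() in accepted

print(suggest("diver"))                                          # ['acute diverticulitis']
print(grade("Acute Diverticulitis", {"acute diverticulitis"}))   # True
```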

The exam could be anchored in everyday practice, presenting cases centered on the common and/or high-stakes pathology that we expect every radiologist to safely and consistently diagnose. We could even have differential questions by having the examinee enter two or three diagnoses for the cases where such things are important considerations (e.g., some cases of diverticulitis vs colon cancer). These real-life PACS-based cases could be tied into second-order questions about management, communication, image quality, and even radiation dose. But it should all center around how radiologists actually view real studies. It could all be a true real-world simulation that is a direct assessment of relevant practice ability and not a proxy for other potentially related measurables. Let’s just have the examinees practice radiology and see how they do.
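
To make that concrete, here is one hypothetical way a single case, its accepted impressions, a required differential, and its second-order questions might be bundled together. The fields and grading rule are assumptions for illustration, not a proposed specification.

```python
# Hypothetical sketch of how one PACS-based exam case might be structured,
# bundling the images, accepted impressions, and second-order questions together.
from dataclasses import dataclass, field

@dataclass
class ExamCase:
    study_images: list[str]                 # references to the series shown in the viewer
    accepted_impressions: set[str]          # any of these counts as a correct primary read
    required_differential: set[str] = field(default_factory=set)  # e.g., diverticulitis vs colon cancer
    followup_questions: list[str] = field(default_factory=list)   # management, communication, dose, etc.

    def grade_impressions(self, answers: list[str]) -> bool:
        given = {a.strip().lower() for a in answers}
        primary_ok = bool(given & self.accepted_impressions)
        # If a differential is required, the examinee must list all of it.
        differential_ok = self.required_differential <= given
        return primary_ok and differential_ok

case = ExamCase(
    study_images=["ct_abd_pelvis_axial"],
    accepted_impressions={"acute diverticulitis"},
    required_differential={"acute diverticulitis", "colon cancer"},
    followup_questions=["What follow-up imaging, if any, is needed to exclude malignancy?"],
)
print(case.grade_impressions(["acute diverticulitis", "colon cancer"]))  # True
```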

The ABR has argued in the past that the Core exam cannot be ported to a commercial testing center, which is largely the fault of the ABR for producing a terrible test. But at least that argument would finally hold water if the ABR actually deployed a truly unique evaluative experience that could demonstrate a trainee’s ability. The current paradigm is silly and outdated, and radiology is uniquely positioned within all of medicine to do better. The exam of the future should not be rooted in the largely failed techniques of the past.

 

Class Action Lawsuit Against the ABR

Radiology joined the ranks of physician-led class action lawsuits against the ABMS member boards last week when interventional radiologist Sadhish K. Siva filed a complaint on behalf of radiologists against the ABR for (and I’m paraphrasing) running an illegal anticompetitive monopoly and generally being terrible.

You can read the full 30-page suit if you’re interested. Legal writing is generally not of the page-turning variety, but there are still some great lines.

Regarding MOC (emphasis mine):

[The] ABR admits that no studying will be necessary for [the new MOC program] OLA and that ABR “doesn’t anticipate” incorrect answers “will happen often.” ABR also confirms on its website that “[t]he goal with all OLA content is that diplomates won’t have to study.” When a question is answered incorrectly, an explanation of the correct answer is provided so that when a similar question is asked in the future it can be answered correctly. Unsurprisingly, ABR admits it does “not anticipate a high failure rate.”

In short, to maintain ABR certification under OLA, a radiologist need only spend as little as 52 minutes per year (one minute for each of 52 questions) answering questions designed so as not to require studying, and for which ABR anticipates neither incorrect answers nor a high failure rate.

Because OLA has been designed so that all or most radiologists will pass, it validates nothing more than ABR’s ability to force radiologists to purchase MOC and continue assessing MOC fees.

Burn!

Though not called out in the lawsuit, this argument also applies to the Certifying Exam (a second, superfluous exam taken after the Core Exam, after graduating residency, and after already practicing independently as a radiologist). This may be in part because the angriest radiologists are the ones who paid for and then passed what should have been a 10-year recertification exam only to be told they had to start shelling out and doing questions right after. But the main reason is likely that the suit primarily asserts that the monopolistic behavior at play includes the ABR illegally tying mandatory MOC to its “initial certification product,” and the Certifying Exam, though suspect, is part of the initial certification process.

Interesting fact that I did not know about MOC & the insurance market:

In addition, patients whose doctors have been denied coverage by BCBS because they have not complied with MOC requirements, are typically required to pay a higher “out of network” coinsurance rate (for example, 10% in network versus 30% out of network) to their financial detriment.

It’s amazing how these organizations, which are completely unaccountable, have become such integral parts of so many different components of the healthcare machine from hospital credentialing to insurance coverage.

Speaking of that power:

The American Medical Association (“AMA”) has adopted “AMA Policy H-275.924, Principles on Maintenance of Certification (MOC),” which states, among other things, that “MOC should be based on evidence,” “should not be a mandated requirement for licensure, credentialing, reimbursement, network participation or employment,” “should be relevant to clinical practice,” “not present barriers to patient care,” and “should include cost effectiveness with full financial transparency, respect for physician’s time and their patient care commitments, alignment of MOC requirements with other regulator and payer requirements, and adherence to an evidence basis for both MOC content and processes.” ABR’s MOC fails in all of these respects.

And lastly:

[The] ABR is not a “self”-regulatory body in any meaningful sense for, among other reasons, its complete lack of accountability. Unlike the medical boards of the individual States, for example, as alleged above, ABR is a revenue-driven entity beholden to its own financial interests and those of its officers, governors, trustees, management, and key employees. ABR itself is not subject to legislative, regulatory, administrative, or other oversight by any other person, entity, or organization. It answers to no one, much less to the radiologist community which it brazenly claims to self-regulate.

Final burn!

Whether or not the suit will convince a jury that an illegal monopoly is at play, I don’t know, but I can take a pretty confident educated guess as to what radiologists are rooting for. It’s pretty clear that while MOC can engender controversy, the ABR’s efforts can’t meaningfully impact the quality of radiology practiced by its diplomates or have a significant effect on patient care.

 

Stop Free-Dictating

There are many institutions/practices with well-defined “normal” templates for all types of studies, which help provide a reasonable approximation of a house style. A clinician (or the next radiologist) has a reasonable chance of knowing where to find the information in the report. The reader can see something in the impression and quickly find the longer description in the body of the report for more information.

Templates can be brief skeletal outlines or include more thorough components containing pertinent negative verbiage. A section for the Kidneys could say “Normal” or it could say, “No parenchymal lesions. No calculi. No hydronephrosis.” Some groups have diagnosis-specific templates that build off a generic foundation to better address specific concerns like renal mass characterization or appendicitis.
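
To illustrate (the section names and verbiage below are placeholders, not any group’s actual house style), a generic power normal and a diagnosis-specific variant built on top of it might be stored something like this:

```python
# Illustrative only: one way a generic "power normal" and a diagnosis-specific
# variant built on top of it might be stored. Section names and verbiage are
# examples, not any particular group's house style.

GENERIC_CT_ABDOMEN = {
    "Liver":    "No focal lesions.",
    "Kidneys":  "No parenchymal lesions. No calculi. No hydronephrosis.",
    "Bowel":    "No obstruction or pneumatosis.",
    "Appendix": "Normal.",
}

# A diagnosis-specific template overrides only the sections it cares about.
APPENDICITIS_PROTOCOL = {
    **GENERIC_CT_ABDOMEN,
    "Appendix": "Diameter: ___ mm. Periappendiceal fat stranding: ___. Appendicolith: ___.",
}

def render(template: dict[str, str]) -> str:
    return "\n".join(f"{section}: {text}" for section, text in template.items())

print(render(APPENDICITIS_PROTOCOL))
```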

Either way, some form of templating is critical to creating a readable report. After all, radiology, for better or worse, is a field where the report is the primary product, and creating reports that are concise, organized, and readable should be a goal.

Some institutions and practices do not have these baseline templates. There are (often but not always older) attendings who seem to not only practice but also respect the freewheeling, old-school transcriptionist style of reporting. To them, a resident who doesn’t “need” a template is to be prized and congratulated.

This isn’t 100% wrong either. It’s a useful ability in the sense that it’s important to be able to summarize findings in cohesive English. It’s largely the same casemanship skill used during the hot-seat conferences that the recent Core exam generation of residents has largely lost, and so I can appreciate this perspective. However, at least from a reporting perspective, this approach is wrong in the 21st century.

 

The purpose of the radiology report

The first attending I ever worked with in radiology was a neuroradiologist who posed a semi-rhetorical question on my first day. He used to ask:

What is the purpose of the radiology report?

The answer, he argued, was to create the right frame of mind in the reader.

I think this view is exactly right.

Defined in a narrow sense, this means that the reader should come away with the impression that you intend for them to have. If something is bad and scary, that should be clear. If something is of no consequence, that should also be clear. Items in the impression are there because we want those impressed on the minds of our readers, not just because we saw them.

With increasing patient access to radiology reports, we now have a second audience. While doing away with all medical and radiological jargon is probably misguided and unnecessary, we need to at least be cognizant of how our reports might read to a layperson (or non-specialist, for that matter). If we can be more clear and more direct, we have a greater chance of communicating effectively to all involved parties.

Templates make reports more organized, scannable, and readable. Not even debatable.

But while the primary intent of “frame of mind”-creation may relate to the significant radiological findings, it’s also about creating the right frame of mind about you, the radiologist. Thorough, thoughtful, organized, conscientious? Or rushed, disorganized, careless, apathetic?

There may be some perks to blinding readers with science and drowning them in long-winded descriptions of even benign and irrelevant incidental findings. At least you won’t look lazy! But for the less verbose among us, we can show we care by creating reports that reflect our systematic approach and clear writing style. Templating is critical to creating digestible reports.

Lastly, as quality metrics rise in importance and resource utilization re-enters the arena as a responsibility of the radiologist, we also need our reports to be readable and indexable by computers. The easier our reports are to parse, the easier we can extract meaningful data about our findings, link these up with patient data from the EMR, and draw high-powered conclusions about patient impact, outcomes, and (of special importance to me) the utility of certain exams in specific clinical contexts.
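
As a small illustration of why that matters (the report text, section labels, and regex below are assumptions, not any existing pipeline), a consistently templated report can be split into labeled, queryable sections with almost no code, whereas free-dictated prose offers no such anchors:

```python
# Minimal sketch: a consistently templated report can be split into labeled
# sections with a few lines of code, which is what makes downstream indexing,
# EMR linkage, and outcomes research tractable. Section names are illustrative.
import re

report = """FINDINGS:
Liver: No focal lesions.
Kidneys: No parenchymal lesions. No calculi. No hydronephrosis.

IMPRESSION:
1. No acute abdominal pathology."""

# Capture "Label: text" pairs line by line (free-dictated prose offers no such anchors).
sections = dict(re.findall(r"^(\w[\w ]*):\s*(.*)$", report, flags=re.MULTILINE))

print(sections["Kidneys"])   # "No parenchymal lesions. No calculi. No hydronephrosis."
```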

 

Dictation software is a tool, not a recorder

If you’re a resident somewhere and your institution doesn’t have power normals to frame out your reports, make some. If you find yourself saying the exact same things over and over again every single day, then you’re doing it wrong: it should either be part of the template or an auto-text macro. If nothing else, it will reduce your rate of transcription errors.
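
As a toy illustration only (this is not any dictation vendor’s actual macro syntax), the idea is simply that a short trigger expands into full, consistent verbiage:

```python
# Toy illustration (not any vendor's actual macro syntax): phrases you repeat
# every day become named macros that expand into full, consistent text.
MACROS = {
    "normal chest": "No focal consolidation, pleural effusion, or pneumothorax. "
                    "Heart size is normal.",
    "no acute abd": "No acute intra-abdominal process.",
}

def expand(trigger: str) -> str:
    """Return the expanded text for a known trigger, or the trigger itself."""
    return MACROS.get(trigger.lower(), trigger)

print(expand("normal chest"))
```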

No one needs to reinvent the wheel on every case!