ABR totally botches 2017 Core Exam

This email belies how royally the ABR botched the 2017 Core Exam.

What the ABR should have done is what any accountable organization should do when they mess up.

  • Express regret and acknowledge responsibility
  • Be transparent and describe the mistake
  • Give an action plan and steps to correct the problem
  • Ask for forgiveness

Instead, examinees received the lip service version.

“Technical issue” is not a satisfactory explanation for the cause.

“Problems with the display of some questions” is not what happened.

“Those questions will NOT be counted toward your exam results” is a grossly incomplete solution.

So what did happen?

Well, the ABR still hasn’t offered a technical explanation. It would seem there was an issue with the mammo module of the exam. If I had to guess, the larger image files in that module probably overwhelmed a throttled or undersized server and couldn’t be delivered to every workstation before the requests timed out.

But who knows? Apparently not the ABR.

The result of whatever happened is that some examinees in Chicago couldn’t start the exam. Some of them waited nervously, without explanation, in the holding room at the hotel for the shuttle. Others already at the center just had to sit at their desks wondering when they would be able to start. For two hours. Which of course turns an already long day into a hellishly long one: nerves racked, tummy grumbling, caffeine wearing off, etc.

Once the exam began, some test-takers had the mammo questions. Others did not. And some had them added to the end of the test mid-way through, suddenly increasing their day by another hour. In all cases, the ABR has suggested that “those questions” won’t adversely affect their scores. This presumably means that no one in Chicago will have mammo graded. But then why add it to some people’s tests and not others? Why make someone whose test-day is already two hours delayed stay another hour for questions that won’t count? How are they going to reconcile the fact that there are psychological and fatigue effects from this mistake that have nothing to do with the “display of some questions,” and that some of this could have simply been mitigated by upfront transparency?

In the grand scheme of things, given that nobody has ever conditioned the mammo section, I imagine the ABR feels confident saying that leaving those questions ungraded will not have a meaningful impact on the exam results themselves. With around 103 total fails last year, one imagines only a fraction of those would even hinge on mammo. And the vast majority of people affected are probably nowhere near the failing mark, unfair psychological BS notwithstanding.

A follow-up email on June 14 (almost a week later) said this (emphasis mine):

The ABR sincerely regrets the problems with the administration of the Core Exam in Chicago on Thursday, June 8, 2017. We are taking this matter very seriously and are working hard to identify the sources of the problem and the impact on affected candidates.

We don’t yet have all the information needed to determine how many candidates have been affected and to what extent. Staff worked very hard over the weekend to ensure that the Core exams administered in Chicago and Tucson this week would go smoothly, and we have had no issues.

I want to emphasize that any candidate impacted by last Thursday’s difficulties with the breast imaging content will not have those items counted against their scores. We don’t expect anyone to have problems qualifying for MQSA.

How can you not know who was affected? The nature of this problem should have made it obvious who was affected during the examination itself. What they mean is that—despite getting into the business of test administration—the ABR never anticipated technical difficulties, has no meaningful system in place for troubleshooting or identifying issues, and had no contingency plans formed to deal with this eventuality.

Also missing: acknowledgment of any of the issues outlined above outside of the “difficulties with the breast imaging content.”

And: you don’t “expect” problems with MQSA? The MQSA requirements only state that the radiologist be board-certified, not that the boards actually contain mammography. Of course this shouldn’t be a problem. But if you anticipate that there could be an issue, perhaps you should get some clarification before dropping a half-baked position statement.1

 

Let’s go back to the underlying arguments for how we got here in the first place.

From the ABR FAQ:

Why do I have to go to Chicago or Tucson instead of a local testing center for diagnostic radiology exams?
With the transition to more image-rich exams with advanced item types, the ABR has built two exam centers in Chicago and Tucson to administer all diagnostic radiology exams. At this time, commercial test centers do not have the technology or means available to support these kinds of exams.

More detail from the 2014 Core Exam FAQ & misconceptions presentation:

Why can’t I just go to a PearsonVUE center to take this test?
• Modular content difficult for PV
• PV can’t handle case structure on their software
• PV monitors aren’t calibrated, can’t control lighting
• Aim: to have distributed exam. We are working on system to implement

So, now in 2017, we can firmly debunk these arguments:

1. Modular Content

The content is not bizarrely or uniquely modular. First, this doesn’t really matter (even the very long Step exams are broken up into multiple modules). In years past, the modules for different sections were given in succession (breast, then cardiac, then GI), though lumped seamlessly into one large mega-module as you progressed through the day. This year the modules were jumbled and topics jumped around, so there are really just two days of relatively unmodular content.

2. PV can’t handle case structure on their software

This is only plausible if the ABR’s software is particularly poorly written. The USMLE also uses multiple case-structure formats, including videos, images, and interactive fake physical exams, not to mention Step 3’s ludicrous choose-your-own-adventure CCS program. If we need to get rid of the two or three “drag the X” questions per test in order to have a distributed exam, I think we can all agree the collective radiology hivemind would acquiesce.

3. PV monitors aren’t calibrated, can’t control lighting

After this year’s difficulties, one can easily argue that there is no point in having a “well-calibrated” monitor that can’t even show the carefully curated “Angoff-validated” questions in the first place. I’ll admit, the lighting is nicely dim. As a practical matter, few images are of sufficient quality for the lighting to be a plausible limiting factor. Most of the MR looks photocopied from books published in the 1980s. Residents take the ACR in-service exam in droves every year. The criticism there has always been the exam itself, not the testing software or the ambiance of the venue.

4. Aim: to have distributed exam. We are working on system to implement

2018 sounds like a great year to start.

 

The costs of the ABR’s exam paradigm are absurd

There are almost 1,200 graduating radiology residents every year (1,149 took the Core in 2016; 91% passed). Every class contributes $640 per person per year for a total of $3 million per graduating class over the course of a four-year residency ($4.6 million total when including the extra two years to take the Certifying Examination). That also means the ABR rakes in around $750k per class per year and $3 million per year from residents alone. Not to mention the $340/year for every single radiologist in the MOC phase. Or the $3,000+ to take subspecialty exams like neuro or VIR.

To reiterate: the class that just took this failed exam gave the ABR on the order of $3,000,000 to take this test. This figure doesn’t include the additional costs for the honor of traveling across the country to spend two days in a hotel to actually take the exam (at least another $500,000 per year).
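If you want to check the math, here’s a quick back-of-the-envelope sketch using the figures above. The class size and annual fee are the numbers quoted earlier; the rest is simple multiplication, not official ABR accounting.

```python
# Back-of-the-envelope check of the fee figures cited above.
# Assumes 1,149 residents per class and a $640/year ABR fee, as quoted earlier.

residents_per_class = 1149
annual_fee = 640  # dollars per resident per year

per_class_per_year = residents_per_class * annual_fee   # one class, one year
per_class_residency = per_class_per_year * 4             # one class over a four-year residency
per_year_from_residents = per_class_per_year * 4         # four classes paying simultaneously

print(f"Per class per year:          ${per_class_per_year:,}")       # $735,360  (~$750k)
print(f"Per class over residency:    ${per_class_residency:,}")      # $2,941,440 (~$3 million)
print(f"Per year from all residents: ${per_year_from_residents:,}")  # $2,941,440 (~$3 million)
```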

If you can’t get photos and radio buttons working consistently on an operating budget of millions, then you’re doing it wrong.

 

Having a decent test is an important noninterpretive skill

When the ABR decided to start from scratch and write a new exclusively computer-based exam, they chose to become not just test-writers but test-administrators. No one forced the ABR to write a test that no high-volume testing center could implement. When you take over something this important, you have to do it right, and you should be completely accountable for your performance. Transparency should not be optional. The way the Core and Certifying exams were created, graded, and handled is a poorly conceived and unnecessarily obfuscated embarrassment (e.g. why does the Certifying exam even exist?).

You don’t just say things like2

we had a mysterious technical difficulty but also we totally fixed it we promise though actually we don’t know what happened or exactly to whom it happened but also don’t worry about those questions they won’t count for anyone because for real we don’t know who had them or didn’t have them or if they had them how pretty they looked so trust us also by the way your annual fee is due.

Since noninterpretive skills are an important part of the Core Exam, let’s just say that a 6% failure rate for Core Exam administrations is a far cry from Six Sigma.3

ACGME reaffirms independent call for radiology is okay

They didn’t actually do that. That is just my subjective interpretation, as a random person, of the language in the current ACGME Common Program Requirements (emphasis mine):

For many aspects of patient care, the supervising physician may be a more advanced resident or fellow. Other portions of care provided by the resident can be adequately supervised by the immediate availability of the supervising faculty member, fellow, or senior resident physician, either on site in the institution or by means of telephonic and/or electronic modalities. Some activities require the physical presence of the supervising faculty member. In some circumstances, supervision may include post-hoc review of resident-delivered care with feedback.

I think imaging has and should continue to fall under “some circumstances.” Until the machines take over, hold-out radiology programs should strive to maintain their status quos of “post-hoc review.” Efforts should absolutely be made to improve that review process and help residents learn and iterate toward improvement, but the last thing we need in the era of increasing mid-level autonomy is to have graduating residents unable to make a call.

The danger (?) of intravenous contrast media

Another study piling on the mounting evidence that at least modern contrast agents put into people’s veins (and not arteries) for CT scans might not be bad for your kidneys after all.

The biggest single-center study of EM patients was just published in the Annals of Emergency Medicine; it studied 17,934 patient encounters and compared renal function across 7,201 contrast-enhanced scans, 5,499 non-con scans, and 5,234 patients who had no CT at all.

The rates of AKI were 6.8%, 8.9%, and 8.1%, respectively. As in, folks who received either no contrast or no CT imaging were more likely to have a significant rise in creatinine than people who got contrast. As in, contrast was protective (statistically). Using different cutoff guidelines for AKI, the three groups were all statistically equivalent.
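As a rough sanity check on those raw numbers, here’s a quick sketch comparing the reported AKI proportions with a simple chi-square test. The counts are back-calculated from the percentages and group sizes quoted above, and this is only an illustration; it is not the paper’s actual (adjusted) analysis, which controlled for baseline renal function among other things.

```python
# Crude comparison of the reported AKI rates across the three groups.
# Counts are back-calculated from the quoted percentages and group sizes;
# illustrative only, not the study's actual (adjusted) analysis.
from scipy.stats import chi2_contingency

groups = {
    "contrast-enhanced CT": (7201, 0.068),  # 6.8% AKI
    "non-contrast CT":      (5499, 0.089),  # 8.9% AKI
    "no CT":                (5234, 0.081),  # 8.1% AKI
}

table = []
for name, (n, rate) in groups.items():
    aki = round(n * rate)
    table.append([aki, n - aki])  # [AKI, no AKI]
    print(f"{name:22s} {aki:4d}/{n} = {aki / n:.1%} AKI")

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.1f}, p = {p:.2g}")
# The contrast group has the *lowest* crude AKI rate, which is why the
# unadjusted comparison reads as "protective."
```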

Practice patterns here still get in the way. Patients with low GFRs are more likely to get fluids prior to receiving contrast, possibly explaining the pseudo-protective effect of contrast. Patients with poor renal function are less likely to get contrast in the first place, reducing the power for evaluating contrast’s effects on those with CKD. However, controlling for baseline GFR didn’t change the story: there wasn’t an increased risk associated with receiving intravenous contrast in this controlled retrospective study regardless of underlying renal disease.

Historically, randomized controlled trials designed to elucidate the true incidence of contrast-induced nephropathy have been perceived as unethical because of the presumption that contrast media administration is a direct cause of acute kidney injury. To date, all controlled studies of contrast-induced nephropathy have been observational, and conclusions from these studies are severely limited by selection bias associated with the clinical decision to administer contrast media.

Maybe with all this mounting evidence it’s time to do an RCT.

It was the best of exams. It was the worst of exams.

From the awesome and scathing “What Went Wrong With the ABR Examinations?” in JACR:

The new examination format also does a poor job of emulating how radiology needs to be practiced. Each candidate is alone in a cubicle, interacting strictly with a computer. There is no one to talk to and no opportunity to formulate a differential diagnosis, suggest additional imaging options, or provide suggestions for further patient management. The examination consists entirely of multiple-choice questions, a highly inauthentic form of assessment.

Only partially true. Questions can ask you for further management. Additionally, it’s possible to formulate questions (via checkbox) that allow you to select reasonable inclusions for a differential. This isn’t the same as having a list memorized, but it is in some ways more accurate in the world of Google, StatDX, etc. Of course, this kind of question isn’t meaningfully present on the current exam, but the multiple-choice format itself doesn’t necessarily preclude all meaningful lines of testing.

Another rationale for the new examination regimen was integrity. Yet instead of reducing candidate reliance on recalled examination material, the new regimen has increased it, spawning at least six commercial online question bank sites. The fact that one of the most widely used print examination preparation resources is pseudonymously authored is a powerful indicator that the integrity of the examination process has been undermined, effectively institutionalizing mendacity.

Every board exam has qbank products. The reason Crack the Core is pseudonymously authored isn’t just the recalls; it’s presumably also the author’s amusing but completely unprofessional teaching style. I very much doubt the Core Exam is more “recalled” than anything available for the prior exams. What we should be doing is acknowledging that people will prepare for any standardized test this way, via facsimile questions, and there is literally no way to avoid it. It’s not as though Step 1 is any different.

Many of the residents we speak with regard the core examination not as a legitimate assessment of their ability to practice radiology but as a test of arcana. When we recently asked a third-year resident hunkered down over a book what he was studying, he replied, “Not studying radiology, that’s for sure. I am studying multiple-choice tests.” The fact that this sentiment has become so widespread should give pause to anyone concerned about the future of the field.

Yes, this is true. But it also strikes me that the old school boards wrapped a useful and worthwhile skill in a bunch of gamesmanship, BS, and pomp. Nonetheless I can’t dispute that casemanship skills have real-world parallels and that the loss of them may have resulted in some young radiologists sounding like idiots when describing a novel case in front of a group of their peers.

In essence, the ABR jettisoned a highly effective oral board examination that did a superb job of preparing candidates for the real-world practice of radiology and replaced it with an examination that merely magnifies the defects of the old physics and written examinations. The emphasis is now on memorization rather than real-time interaction and problem solving. In our judgment, candidates are becoming less well prepared to practice radiology.

It seems increasingly true that anyone more than a couple years out of residency has now fully fetishized the oral boards. It’s definitely true that traditional case taking skills have rapidly atrophied; residency may feel long but institutional memory is short. Old school casemanship isn’t really the same thing as talking to clinicians, but it certainly has more in common with that than selecting the “best” answer from a list of choices.

The ability to succinctly and correctly describe a finding and its associated diagnosis is an important skill. Some residents now can still get the diagnosis but may struggle to describe the findings appropriately on the spot. But I don’t know how much that matters in the long term or whether this deficit self-corrects over time. I would be interested in seeing whether any of the old-versus-new debate has an impact on the quality of written reports, the fundamental currency of our field in the 21st century. I’ve seen plenty of terrible reports and unclear language from older radiologists, so the oral boards barrier couldn’t have been that formative.

The fact is that neither exam is a good (or even reasonable) metric. Frankly, a closed-book exam is in and of itself an unrealistic departure from daily practice. And any exam that trades in antiquated “Aunt Minnies” or relies on demonstrating “common pathology in unusual ways” is dealing in academic mind games, not testing baseline radiologic competence.