NBME Shelf Exam scores, with a grain of salt

The NBME Shelf exams are enjoyable standardized tests that every first-year looks forward to with almost unbearable glee. Each tests a single subject (“Anatomy”) and, for the preclinical years, is made up of old or junior-varsity questions from the USMLE Step 1, a test that makes the MCAT look like the GRE and the SAT look like building with Lincoln Logs.

Some schools force their students to take a variety of Shelf exams (spending/wasting $30 a pop) to help measure how well their students have mastered the material (AKA how they are doing compared to their national counterparts). What is a bit amusing and misleading about the whole ordeal is that the national norms are probably a big crock.

Different schools use the “shelves” differently. Some use them as a just-for-fun intellectual exercise, others as extra credit, and still others as a true final exam. Don’t get me wrong, it’s not a bad thing to get some USMLE Step 1 experience, but how you score is highly dependent on the circumstances: if you take five shelf exams in a single week, you are clearly not going to be prepared or even particularly focused. If it’s your final exam, you are going to do your best to rock it.

So if the national average is computed from all of these groups together, the score distribution is going to have a huge unseen left tail: people taking the exam who don’t care how they perform drag the average down from where it would otherwise be. So while the test is technically normalized, it’s not the same normal as a regular standardized test: unlike the MCAT, not every student has something riding on the exam. I personally knew people who filled in all C’s on an exam that was for extra credit only.
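To put rough numbers on that argument, here is a minimal sketch in Python. Every figure in it is made up for illustration (the two group averages, their spreads, and the 30% share of people who aren't trying); I have no idea what the real proportions are. It treats the national pool as a blend of two bell curves and asks where a score of 70 lands in the blend versus among serious test takers only.

```python
from statistics import NormalDist

# All numbers below are hypothetical, chosen only to illustrate the effect.
SERIOUS = NormalDist(mu=70, sigma=8)   # students for whom the exam counts
CASUAL = NormalDist(mu=55, sigma=12)   # students with nothing riding on it
CASUAL_SHARE = 0.3                     # assumed fraction of casual test takers


def mixed_mean(casual_share: float) -> float:
    """Mean of the pooled group: a weighted average of the two group means."""
    return (1 - casual_share) * SERIOUS.mean + casual_share * CASUAL.mean


def mixed_percentile(score: float, casual_share: float) -> float:
    """Percent of the pooled group scoring at or below `score`."""
    pooled_cdf = (
        (1 - casual_share) * SERIOUS.cdf(score)
        + casual_share * CASUAL.cdf(score)
    )
    return 100 * pooled_cdf


if __name__ == "__main__":
    score = 70
    print(f"pooled mean: {mixed_mean(CASUAL_SHARE):.1f}")
    print(f"percentile among everyone: {mixed_percentile(score, CASUAL_SHARE):.0f}")
    print(f"percentile among serious takers only: {mixed_percentile(score, 0.0):.0f}")
```

Under these assumptions the pooled mean sits below the serious group's mean, and the same raw score earns a higher percentile in the pooled group than it would among people who were actually trying. How big the gap is depends entirely on the made-up inputs.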

While your school receives the group’s average and your grade relative to your test group (classmates), the theoretically more interesting numbers a student receives are the grade based on the national average and the corresponding percentile. I’m curious as to how far off those national numbers really are. If all those people who weren’t making a good-faith effort actually tried (as they do on the USMLE Steps 1, 2, and 3), then I’d wager it’d be a different ball game. It’s essentially an unstandardized standardized test.

Further reading: How NBME Shelf Scores Work

7 Comments

  1. hi,

    Thanks for the information. I’m a final-year medical student in Saudi Arabia.
    I wish to get into a residency program in the US, so I came over for “FALCON Physician reviews” …

    I’m done studying; my exam is in 2 weeks.
    Well, I’ve read what you wrote about the scores, and that is a problem!

    I think I’m gonna take the NBME self exam anyway, but I found many forms. Is there a difference in level between them? Which one should I choose?
    A couple of my friends are telling me to choose 6 … well, I don’t know!

  2. I’m not sure what you mean by form 6. The shelves are done by subject, both in basic science (anatomy) and clinical science (pediatrics). These are often used internally for grading purposes by medical schools but are not required for residency.

    They are related but distinct from the Steps (1, 2, and 3), which are all required to be licensed to practice medicine in the US (steps 1 and 2 are done prior to beginning residency, Step 3 during).

  3. I agree with you, but they are still good for assessment. Since most people take these exams when they are not at their best, if you just scraped through the NBME you are less likely to pass, so I think for FMGs this is a good tool to assess whether to take the exam or not. In short, if you can’t beat the scores of people who didn’t do their best, it is unlikely that you will beat their scores when they are at their best.

  4. Sure, but that’s very rough and certainly doesn’t tell you where you’ll land on the big day unless you are a) at the bottom or b) at the top. I suppose those in the middle might mostly remain in the middle, but I’d be hesitant to say how laziness scales up. How does your worst compare to someone else’s worst? It’s not necessarily a 1-to-1 kinda thing.

    They’re useful but expensive. I wouldn’t claim to know what proportion of students taking the exam are well-prepared or not—that’s why the statistics feature doesn’t impress me very much. Thanks for commenting!
