Whether to help yourself or your classmates practice, produce learning materials for your students, make money, or perhaps create a large free question bank the likes of which have never been seen, knowing how to write a USMLE-style board question may be a skill you’re interested in cultivating.
In fact, even students who only plan to take the USMLE (not write for it) may find that understanding the qualities of a Step question helps them approach, and hopefully answer or at least guess correctly, the ones on the actual exams.
The NBME actually publishes an extremely detailed Item Writing Manual (181 pages!). It’s long but full of examples, so I’ve compiled some highlights below.
Section I // Technical Flaws in Question-writing
Section I concerns structure and format. The bottom line is that single-best-answer multiple-choice questions are essentially the only format used. The real money starts on page 19, which discusses technical flaws. Besides testing niche or overly specific content, poorly written questions often contain several common mistakes:
- Grammatical flaws are disproportionately present in the wrong answer choices (e.g., a choice that doesn’t follow grammatically from the lead-in)
- A subset of the choices covers all reasonable possibilities (so the remaining choices are blatant add-ons that can be eliminated)
- Absolutes such as “always” and “never” are a huge red flag: they are almost never (get it?) correct and are therefore (rightly) discounted
- The correct answer choice is noticeably longer or more detailed than the distractors
- The question stem and the correct answer choice contain the same word or a very similar one
- “Convergence”: when in doubt, a clever examinee will select the answer choice with the most elements in common with the other choices (a subtle flaw, but surprisingly common once pointed out; p. 22)
Other things to avoid in writing answer choices:
- Ridiculously long answer choices (as opposed to ridiculously long question stems, which are delightful)
- Internal inconsistency (e.g., mixing ranges with specific numbers, or percentages with absolute numbers): answer choices should be presented in the same format
- Vague frequency qualifiers (usually, often, etc.) are imprecise and debatable, and should be avoided
- Nonparallel language (the choices should read and sound similar; otherwise some can be ruled out out of hand)
- Awkward ordering (choices should be grouped logically; alphabetical order is also a safe default)
- “None of the above”: just don’t.
Section I is a pretty good read, especially pages 19-26.
Section II // Qualities of a strong question
Section II details how to write a one-best-answer question, including these five basic rules to which all questions should adhere:
- Each item should focus on an important concept, typically a common or potentially catastrophic clinical problem.
- Each item should assess application of knowledge, not recall of an isolated fact. (Step 1 vignettes may be shorter than Step 2 vignettes, which focus more on the clinical presentation of disease.)
- The stem of the item must pose a clear question, and it should be possible to arrive at an answer with the options covered.
- All distractors (ie, incorrect options) should be homogeneous, falling into the same category as the correct answer (eg, all diagnoses or all treatments). [as above]
- Avoid technical item flaws that provide special benefit to test-wise examinees or that pose irrelevant difficulty (as above). [p. 33]
While the knowledge required to answer simple recall questions isn’t all that different from what vignette-based questions require, placing a question in a clinical context is an important component of a “good” question. The common vignette structure goes something like this:
- Age, gender (eg, A 45-year-old man)
- Site of care (eg, comes to the emergency department)
- Chief complaint (eg, because of a headache)
- HPI (duration, quality, etc)
- +/- Relevant PMH, PSH, FH, and SH
- +/- Physical findings
- +/- Diagnostic/laboratory studies
- +/- Treatment, follow-up, clinical course, subsequent findings [p. 38]
The higher the Step number, the longer the vignette typically is. Step 1 vignettes are often brief, framing a straightforward first- or second-order question in a clinical context. In many cases, elements of the vignette, while relevant, are entirely unnecessary for arriving at the correct answer. This is called window dressing, and it is abundant. The additional information does help de-emphasize “key” words that would otherwise be telegraphic in very short vignettes. The NBME does, however, discourage “red herrings”: realistic but purposefully distracting data that lead the examinee away from the correct answer. Ultimately, exams test the ability to take exams; they do not mirror real life. The guide includes some simple, short Step 1-style templates on page 39.
General rules for constructing a strong question:
- Lead-ins should be phrased as questions ending in a question mark, not as incomplete statements ending in a preposition or colon: “The effects of which medication best explain this patient’s symptoms?” is correct. “The patient is likely taking:” is not.
- Information should be concentrated in the stem, not in the answer choices. Long stem, short choices is good. Short stem, two-line answer choices is not.
- The best distractors (wrong choices) are the ones based on common misconceptions and mistakes. If no one would choose it, it’s not a good distractor.
- When using a single case for multiple questions (a cluster), avoid “cueing,” where the content or answers of one question hint at the correct answer to another. Also avoid “hinging,” where missing one question guarantees missing another. Each question should stand alone, even if several questions are based on one vignette.
- A generally successful examinee should be able to answer the question without looking at the answer choices; the choices should merely confirm their answer. If a student can’t hazard a reasonable guess from the stem and lead-in, the question is poorly worded. “Which of the following is true?” questions fail this rule, which is why they are generally weak questions.
- Never use negative phrasing (e.g., “Which of these is NOT involved in…?”)
Step 1 questions do not test knowledge of a fact alone. They test the application of that knowledge in a clinical context, for example:
- Guess the culprit (drug, exposure, diet, mood)
- Predict the associated finding given a presentation of a disease (physical exam, history, lab, etc)
- Identify the disease, bug, etc
- Treatment of choice
Step 2 CK questions typically address:
- Home and health maintenance
- Mechanisms of disease
- Next step in management (to the frustration of students everywhere)
- Definitive management
- Nuances of management (complications of management, contraindications to management, etc)
Sections III/IV/Appendix // The rest
Section III contains examples of how to construct the extended-matching question type, and Section IV largely concerns evaluating question performance, test composition, and setting passing thresholds. These sections are somewhat interesting reading, but unless you’re administering your questions in an official capacity (e.g., you’re teaching a course and writing its exam), they can be safely skipped. The key takeaway is that a question’s fairness is judged by the pattern of responses (e.g., whether stronger examinees choose the correct answer more often than weaker ones), not simply by the percentage of examinees who answer correctly. The appendix is a “graveyard” of retired question formats.
From personal experience, writing questions is a fantastic way to really learn material, and it helps you home in on key distinctions rather than simply accumulating facts.
And it can be pretty good money.