| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |


* From the Departments of Anesthesiology and Surgery, Washington University School of Medicine, St. Louis, MO; and the Department of Anesthesia and Critical Care, University of Chicago, Chicago, Illinois, USA;
(see Appendix II).
Address correspondence to: Dr. Eric Jacobsohn, Department of Anesthesiology, Washington University School of Medicine, 660 S. Euclid, MC 8054, St. Louis, MO 63110, USA. Phone: 314-747-4155; Fax: 314-362-4551; E-mail: jacobsoe{at}msnotes.wustl.edu
| Abstract |
|---|
Methods: This was a randomized, pretest-posttest trial. Twenty-five residents took an initial oral examination (E1) resembling the American Board of Anesthesiology (ABA) examination. They were then randomized into two groups: a routine education group and an intervention group that was taught oral examination skills. Six weeks later they took a second oral examination (E2). The videotaped examinations were subsequently scored by six experienced graders from the Royal College of Physicians and Surgeons of Canada (RCPSC) and the ABA.
Results: Inter-rater reliability (IRR) was very poor on E1 (weighted Kappa = 0.166, intraclass correlation coefficient = 0.243) and improved only slightly on E2 (weighted Kappa = 0.275, P = NS; intraclass correlation coefficient = 0.405, P < 0.01). The pass rate for grader pairs increased from E1 to E2 (15% vs 43%, P = 0.01). The improved pass rate on E2 occurred in both the routine education group and the intervention group. There was no significant difference between RCPSC and ABA graders. Teaching examination skills per se did not improve performance, but this conclusion may be limited by the poor IRR. Practice orals do appear to improve performance on future examinations.
Conclusions: Inter-rater reliability may be poor when graders score an oral examination in true isolation. Teaching candidates an oral examination communication and presentation technique did not appear to improve performance. Oral examination practice may be of value in training for future examinations.
| Introduction |
|---|
There were two primary aims of this study: 1) to test the IRR of experienced oral examiners who had previously examined at the ABA or RCPSC; 2) to test the effect of teaching candidates certain oral examination skills, such as presentation and communication techniques. The secondary aim was to compare scores given by graders from the ABA and RCPSC.
| Methods |
|---|
After E1, the residents were stratified by year of training and randomized to one of two groups: Group T (new educational intervention, n = 13) and Group R (routine residency education, n = 14). The residents in Group T received a package on oral examination presentation and communication techniques. This package included the information that was to be presented at a communication skills workshop they were required to attend. The workshop was designed to improve the residents' oral examination communication and presentation skills. It consisted of a presentation on the structure and pitfalls of the oral examinations, a review of the ABA scoring system, a review of the information that the ABA sends to oral examination candidates (including a typical examination scenario and the accompanying examiners' grid), and a discussion of oral examination communication techniques such as clarity and speed of speaking, intonation, eye contact, body language, and dress. The workshop also taught the participants a comprehensive system for presenting and communicating cases in an oral examination. After the workshop, the participants were divided into small groups to practice the system. They were urged to start using it in their daily practice when communicating or presenting cases to their attending physicians. To reinforce learning, individual one-hour follow-up sessions were held with Group T. Group T continued to attend all the usual residency educational seminars. Group R did not attend the educational intervention but continued their usual residency educational and reading schedules. The residents agreed not to communicate about the examination content. They understood that there was no gain or loss from a good or bad performance, and they acknowledged that, having accepted a stipend, communicating with each other would be unethical and would invalidate the study.
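The allocation step (stratification by year of training, then randomization within each stratum) can be sketched as follows. This is a minimal illustration, not the procedure the authors used: the roster format, the training-year labels, the seed, and the alternating split within each shuffled stratum are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_randomize(residents, seed=None):
    """Randomize residents to Group T or Group R within training-year strata.

    residents: list of (resident_id, training_year) tuples; this roster
    format is hypothetical. Returns a dict mapping resident_id -> "T" or "R".
    """
    rng = random.Random(seed)

    # Group residents by stratum (year of training).
    strata = defaultdict(list)
    for resident_id, year in residents:
        strata[year].append(resident_id)

    assignment = {}
    for year in sorted(strata):
        ids = list(strata[year])
        rng.shuffle(ids)  # random order within the stratum
        # Alternate arms down the shuffled list so each stratum splits as
        # evenly as possible; a random choice decides which arm gets the
        # extra resident when the stratum size is odd.
        first = rng.choice(["T", "R"])
        second = "R" if first == "T" else "T"
        for i, resident_id in enumerate(ids):
            assignment[resident_id] = first if i % 2 == 0 else second
    return assignment


if __name__ == "__main__":
    # Hypothetical roster: IDs and year labels are invented for illustration.
    roster = [(f"res{i:02d}", f"CA{1 + i % 3}") for i in range(12)]
    groups = stratified_randomize(roster, seed=42)
    for year in ("CA1", "CA2", "CA3"):
        members = [r for r, y in roster if y == year]
        n_t = sum(groups[r] == "T" for r in members)
        print(year, "T:", n_t, "R:", len(members) - n_t)
```

Alternating within a shuffled stratum keeps the two arms balanced by year of training, which is the purpose of stratifying; simple unstratified randomization could, by chance, place most senior residents in one arm.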
Six weeks after the workshop, the residents in both groups took another oral examination (posttest, E2). The same questions used in E1 were administered, but in a crossover design (Figure).
The videotaped performances were randomly copied to a series of videotapes. They were "scrambled" with respect to the sequence of candidates and to E1 or E2; this was done to minimize the confounding variable of grader calibration. The videotapes were then sent to the six oral examination graders: three with current experience examining for the ABA, and three with current or very recent experience examining for the RCPSC. These graders had been selected on the basis of recommendations from peer examiners at the ABA and RCPSC as being neither too difficult nor too lenient. The graders were paid for their time and knew nothing of the study design. The graders used a scoring sheet similar to that used by the ABA, which has an ordinal scale of 70, 73, 77, and 80 (Supplement 1: Scoring sheet A, available as Additional Material at www.cja-jca.org). To further standardize scoring, the graders were also given an educational module on scoring oral examinations (which included the information that ABA examiners are given regarding the ABA oral examination). They also received an examiners' grid for each question that listed the topics that had to be addressed. A candidate passed if the average score given by two graders was > 75. To collect more detailed data that could not be captured by the ABA scoring system, the investigators designed a second scoring sheet that included an overall percentage score (0 to 100) and several ten-point ordinal scores (Supplement 2: Scoring sheet B, available as Additional Material at www.cja-jca.org).
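Because the pass rule takes the mean of two ordinal scores with a strict > 75 cutoff, some split decisions fail. The sketch below enumerates every pair of scores on the 70/73/77/80 scale. It assumes the rule is applied exactly as stated (strictly greater than 75); the function and constant names are illustrative, not taken from the study.

```python
from itertools import combinations_with_replacement

# Ordinal scale of the ABA-style scoring sheet (scoring sheet A).
ORDINAL_SCALE = (70, 73, 77, 80)
PASS_THRESHOLD = 75  # pass if the two-grader mean is strictly above this


def passes(score_a, score_b):
    """Return True if the mean of two graders' ordinal scores exceeds 75."""
    for score in (score_a, score_b):
        if score not in ORDINAL_SCALE:
            raise ValueError(f"{score} is not on the scale {ORDINAL_SCALE}")
    return (score_a + score_b) / 2 > PASS_THRESHOLD


if __name__ == "__main__":
    # Enumerate every unordered pair of grader scores.
    for a, b in combinations_with_replacement(ORDINAL_SCALE, 2):
        verdict = "pass" if passes(a, b) else "fail"
        print(f"{a} + {b}: mean {(a + b) / 2:.1f} -> {verdict}")
```

Under this reading, a 73/77 or 70/80 split averages exactly 75 and fails, and any 70 guarantees a fail; a pass requires either both graders at 77 or above, or a 73 paired with an 80.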
After E1 and E2, the residents completed several self-assessment questionnaires covering examination fairness, difficulty, examiner intimidation, and a self-assessed score. An anonymous questionnaire after E2 was used to detect whether there had been any communication between residents about the examination.
| Statistical analysis |
|---|
Statistical significance was set at 0.05. Intraclass correlation coefficients (ICC) were used to calculate IRR between the graders for interval scores. Weighted Kappa tests were used to assess agreement between two methods, raters, or observers when the observations were measured on an ordinal scale. The degree of agreement is indicated by the weighted Kappa statistic, which can be roughly interpreted as follows: 0, trivial; 0.1, small; 0.3, moderate; 0.5, large; 0.7, very large; 0.9, nearly perfect.
| Results |
|---|
| Discussion |
|---|
The IRR of graders scoring in true isolation (i.e., scoring videotaped performances) was very poor on both examinations. Although some improvement was noted on the second examination, the IRR never reached an acceptable level, that is, > 0.7. Most certifying boards strive for IRR levels of 0.8 to 0.9. In a four-year prospective study of anesthesiology residents, 441 practice oral examinations resembling the ABA exam were administered to 190 residents.2 Using the ABA ordinal score and pairs of examiners, the IRR was found to be 0.68. However, in that study the examiners were faculty members in the residency program and therefore knew the residents.2 This could have led to examiner bias through the "halo effect," in which a candidate is scored high or low on the basis of previous encounters with, or knowledge of, the resident. In a study of practice oral examinations using the RCPSC oral examination format, the mean IRR ranged from 0.477 to 0.791 (measured at two study sites and at two different times).3 However, the examiners were not truly grading in isolation, and not all were experienced RCPSC examiners.
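The two agreement statistics named in the statistical analysis can be computed as below. This is a sketch under stated assumptions: the article does not say which Kappa weights or which ICC form were used, so the code assumes linear disagreement weights (treating 70, 73, 77, and 80 as equally spaced ranks) and the two-way random-effects, absolute-agreement, single-rater ICC, often labelled ICC(2,1) after Shrout and Fleiss. All scores in the example are invented.

```python
from itertools import product


def weighted_kappa(rater1, rater2, categories, weighting="linear"):
    """Weighted Cohen's Kappa for two raters scoring on an ordinal scale.

    Disagreement weights depend on the distance between category ranks:
    |i - j| / (k - 1) for "linear", its square for "quadratic".
    """
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two equal-length, non-empty rating lists")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater1)

    # Observed joint proportions and their marginals.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    cells = list(product(range(k), repeat=2))
    obs_disagree = sum(weight(i, j) * observed[i][j] for i, j in cells)
    exp_disagree = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    if exp_disagree == 0:
        raise ValueError("Kappa is undefined when chance disagreement is zero")
    return 1.0 - obs_disagree / exp_disagree


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one per candidate, each with one score per grader.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    scale = (70, 73, 77, 80)
    # Hypothetical ordinal scores from two graders for six candidates.
    g1 = [70, 73, 77, 80, 73, 77]
    g2 = [73, 73, 80, 77, 70, 73]
    print("linear weighted Kappa:", round(weighted_kappa(g1, g2, scale), 3))
    # Hypothetical 0-100 percentage scores (interval data) from three graders.
    pct = [[62, 70, 55], [75, 80, 68], [58, 66, 60],
           [81, 85, 72], [66, 60, 64], [70, 78, 61]]
    print("ICC(2,1):", round(icc_2_1(pct), 3))
```

Kappa treats the scores as ordered categories, whereas the ICC treats them as interval data, which matches the article's use of weighted Kappa for the ordinal score and the ICC for interval scores. The absolute-agreement ICC form penalizes a grader who is systematically strict or lenient, which is relevant to the grader stringency discussed below.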
There are several possible explanations for the poor IRR in our study. Examiners were evaluating residents-in-training who were probably not as well prepared as true candidates. Given that IRR is weaker for poor performances than for good performances, poorer IRR might be expected for practice orals than for real certifying examinations. This may be supported by the observation that as the scores improved significantly from E1 to E2, the IRR for the component scores, the overall continuous score, and the ordinal score also improved. The IRR might have been better had all the residents been in their CA3 year; we did not attempt a subgroup analysis for the CA3 residents because the group was too small. Another possible explanation for the poor IRR is that the examinations were inconsistent in content and quality. However, this is unlikely, as each question had previously been used on an ABA certifying examination, and the graders felt that the questions were fair and similar to those on a certifying ABA examination. Also, the examiner (P.A.K.) had been on faculty for five years, was trained in administering oral examinations, and had developed the grid himself.
A plausible explanation for the poor IRR is that conscious or subconscious consensus building between graders was not possible with this study design. In 1966, it was first suggested that "subtle communication" occurs between examiners.6 The graders in our study scored the examinations without being aware of the study aims, and there was no opportunity for consensus building. Most examination boards report good IRR, and in the two practice oral examination studies2,3 the IRR was reported as fair to good. However, in those studies the graders were not scoring in true isolation, and non-verbal communication could have occurred.
Another factor that may have affected the IRR is the type of grader; it is well known that there are strict and lenient graders. It appears that one of the US graders (US1) was lenient and one of the Canadian graders (Can3) was strict. This exposes one of the difficulties with oral examinations, namely that they are difficult to grade reliably even when they have been standardized. Standardizing the oral examination and providing examiner grids and scoring sheets helps, but it does not eliminate this problem. In our study, when using the ordinal score, 44% of candidates failed irrespective of grader and only 8% passed irrespective of grader; for the remaining 48%, passing or failing depended on the examiner. To minimize IRR problems, many examining boards provide training for examiners and monitor their examiners on an ongoing basis. The graders in our study were all trained and experienced in giving oral examinations. Another way to improve the process would be for the same examiners to ask the same question of all candidates, that is, to have the candidates move from one room to another during their examination.3 The examiners in a given room would then ask only one question of multiple candidates, which would standardize both the examination and, to some degree, the scoring. However, this makes the examination logistically more challenging, especially for large numbers of candidates.
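The split into candidates who fail or pass irrespective of grader and those whose outcome depends on the examiner can be reproduced from a score table. The sketch assumes, as one plausible reading of the text, that "irrespective of grader" means the pass rule (mean of two ordinal scores > 75) gives the same result for every possible pair of graders; the data layout, the grader label US2, and all scores are hypothetical.

```python
from itertools import combinations


def classify_candidates(scores, threshold=75):
    """Sort candidates by whether the pass/fail outcome depends on the graders.

    scores: dict mapping candidate -> dict mapping grader -> ordinal score
    (hypothetical layout). Each pair of graders is treated as one possible
    examining pair, and the candidate passes with that pair if the mean of
    the two scores exceeds the threshold.
    """
    result = {"always_pass": [], "always_fail": [], "grader_dependent": []}
    for candidate, by_grader in scores.items():
        if len(by_grader) < 2:
            raise ValueError(f"{candidate}: need scores from at least two graders")
        outcomes = {
            (by_grader[g1] + by_grader[g2]) / 2 > threshold
            for g1, g2 in combinations(sorted(by_grader), 2)
        }
        if outcomes == {True}:
            result["always_pass"].append(candidate)
        elif outcomes == {False}:
            result["always_fail"].append(candidate)
        else:
            result["grader_dependent"].append(candidate)
    return result


if __name__ == "__main__":
    # Invented scores from three graders for three candidates.
    example = {
        "cand1": {"US1": 73, "US2": 70, "Can3": 70},
        "cand2": {"US1": 80, "US2": 77, "Can3": 77},
        "cand3": {"US1": 80, "US2": 77, "Can3": 70},
    }
    groups = classify_candidates(example)
    total = len(example)
    for label, members in groups.items():
        print(f"{label}: {members} ({100 * len(members) / total:.0f}%)")
```

A candidate becomes grader-dependent as soon as any one pair of graders disagrees on the outcome, so adding graders can only shrink the always-pass and always-fail groups.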
We thought that comparing RCPSC and ABA graders would be interesting as both countries have high standards in anesthesiology training, similar certification processes, and the competencies expected of a certified anesthesiologist are very similar.7 However, the analysis was complicated by the variability in grader stringency. One ABA grader was lenient (pass rate 80%), and one RCPSC grader was strict (pass rate 16%). When these two were removed from the analysis, there was no significant difference in the pass rates between the ABA and RCPSC graders. When the variability of the grader pass rates is considered, we would have needed 25 residents per group and 12 graders per country to detect a 10% difference in pass rates with a power of 80%.
There are several possible explanations for the improvement in scores from the first to the second examination. Both groups reported significantly less anxiety at the time of the second examination. This is an important finding, as it has been shown that less confident and more anxious examinees do worse.8 Anesthesiology residents who were exposed to repeated mock oral examinations became more confident on subsequent examinations, even though their anxiety remained unchanged.9 We speculate that the residents in our study felt less anxious on the second examination because they knew from their experience on E1 that the examiner (P.A.K.) was non-intimidating and fair. The combination of less anxiety and improved confidence could partly explain the improvement in the graders' scores. The intervening time was only six weeks, making it unlikely that the improvement could be attributed to maturation in knowledge. The fact that both groups improved suggests that a repeat oral examination may have a powerful effect on examination performance. Practice orals in anesthesiology are well established and have been proposed to enhance exam-taking skills.
One of the primary goals of this study was to establish whether a presentation and communication method could improve performance. We could not show this. Inadequate dosage (strength of intervention) is an unlikely explanation, as the seminar and follow-up sessions were comprehensive. The intervention may have been ineffective if the residents were not motivated to use it, but the questionnaire suggests that residents liked the system and used it. Alternatively, the poor IRR may have made it difficult for the effect of the intervention to be seen, especially given the powerful effect of a repeat oral examination. A reassuring explanation may be that experienced graders were not swayed by elegant communication and presentation skills in the absence of a substantive knowledge base. Although communication skills were once believed to be a discrete component in assessing clinical competence, it is now accepted that they depend on the context in which they are performed.10 Using standardized patients, Colliver et al. showed that clinical competence and communication skills were related only in the clinical context in which they were presented.11 In their work, students who did badly in the clinical context were also rated poorly on communication. Since most of the residents in our study did not receive overall passing grades, this association between poor overall competence and poorly rated communication may have been another factor in addition to the poor IRR. The sample consisted of residents in their CA1-CA3 years, and we might have shown an effect had we studied only residents with a more developed knowledge base (CA3). Finally, the method we taught may simply not have been good enough.
The study has several limitations. The residents were a "convenience" sample that included residents from the CA1-CA3 years. Junior residents were unlikely to do well, and it can be assumed that their performance would be below that of candidates presenting themselves for a certifying examination. Ratings of poor performances are less reliable than ratings of good performances.11 The IRR results of the study may therefore have been biased by the participation of the junior residents. However, potentially incompetent candidates do sometimes present themselves for certifying examinations, and it would be concerning if some of them passed. Another limitation was the examiners' grid. Although the questions were standardized, each examination could take a slightly different course depending on how the resident responded, which may at times have made the scoring on the grid difficult to follow. However, this also occurs in certifying examinations. Finally, the self-assessment scores may be significantly biased: because the investigator could identify the residents, they may have been less likely to criticize the intervention.
In conclusion, we have shown that on a mock anesthesiology oral examination resembling session B of the ABA certifying examination: 1) IRR between experienced examiners was poor when they graded in isolation, and many candidates would have passed or failed depending on the examiner; 2) practice oral examinations have a powerful effect on improving performance, possibly by reducing candidate anxiety and improving confidence; 3) teaching oral examination presentation and communication skills per se did not improve scores; and 4) graders from the ABA and RCPSC appeared to give similar scores.
| APPENDIX I: Examination cases and examiners' grids |
|---|
SC1: examiners' grid
A 68-yr-old male with steroid-dependent rheumatoid arthritis is scheduled for a revision total hip replacement. He has diabetes, a hiatal hernia and coronary artery disease. He had a myocardial infarction in the past, which was complicated by congestive heart failure. Medications include: digoxin, furosemide, captopril, atenolol, prednisone, ranitidine and glyburide. Vital signs: P 59, BP 130/70, R 14, T 36.9
What are your concerns about this patient?
PREOPERATIVE MANAGEMENT
Coronary disease
Medications:
Rheumatoid arthritis (RA):
Hiatal hernia:
Diabetes:
Drug interactions: polypharmacy
ANESTHETIC PLAN AND INTRAOPERATIVE MANAGEMENT
Monitors: art line, CVP vs PAC, TEE
Induction plan
Maintenance plan
Blood loss
Heat conservation
Positioning
PA catheter is placed
Critical incident: Bradycardia
Critical incident: Hypoxemia
Emergence
Postoperative disposition: ICU vs PACU
Decision to extubate.
POSTOPERATIVE MANAGEMENT
Pain control. Epidural management
Hypothermia (34.5°C)
Stem case 2 (SC2): resident worksheet
A 47-yr-old obese female is scheduled for a pelvic exenteration (radical hysterectomy, oophorectomy, possible partial cystectomy and/or partial colectomy). She has a history of diabetes, hypertension, and asthma. Her medications include diltiazem, captopril, insulin, hydrochlorothiazide, cimetidine, and an albuterol inhaler. Her examination reveals P 68, BP 145/90, R 14, T 37.0
SC2: examiners' grid
A 47-yr-old obese female is scheduled for a pelvic exenteration (radical hysterectomy, oophorectomy, possible partial cystectomy and/or partial colectomy). She has a history of diabetes, hypertension, and asthma, and has recently started treatment for reflux esophagitis. Her medications include diltiazem, captopril, insulin, hydrochlorothiazide, cimetidine, and an albuterol inhaler. Her examination reveals P 68, BP 145/90, R 14, T 37.0
What are your concerns about this patient?
PREOPERATIVE MANAGEMENT
Hypertension:
Test:
Asthma:
Obesity:
Reflux esophagitis:
Diabetes:
Drug interactions: polypharmacy
ANESTHETIC PLAN AND INTRAOPERATIVE MANAGEMENT
Monitors: art line, CVP
Induction plan
Maintenance plan
Blood loss
Heat conservation
Positioning
Central line is placed
Critical incident: increased peak airway pressure.
Critical incident: hypotension
Emergence
Postoperative disposition: ICU vs PACU
Decision to extubate
POSTOPERATIVE
Pain control
Hypothermia (34.5°C)
Additional case 1 (AC1)
A 60-yr-old man has a spinal anesthetic for transurethral prostatectomy (TURP). Six minutes after the administration of the spinal, his BP decreases from 120/60 to 80/50, and HR decreases from 85 to 45. He has difficulty breathing and is becoming somnolent. What has happened? Pathophysiology? Management?
Additional case 2 (AC2)
A 26-yr-old IV drug abuser requires an exploratory laparotomy. He has a history of hepatitis. What are your concerns? What anesthetic technique is the best? What agents to use? What are the risks?
| APPENDIX II |
|---|
| Acknowledgments |
|---|
| Footnotes |
|---|
Accepted for publication January 19, 2006. Revision accepted January 30, 2006.
Competing interests: None declared.
This article is accompanied by an editorial. Please see Can J Anesth 2006; 53: 639-42.
| References |
|---|
2 Schubert A, Tetzlaff JE, Tan M, Ryckman JV, Mascha E. Consistency, inter-rater reliability, and validity of 441 consecutive mock oral examinations in anesthesiology: implications for use as a tool for assessment of residents. Anesthesiology 1999; 91: 288-98.
3 Kearney RA, Puchalski SA, Yang HY, Skakun EN. The inter-rater and intra-rater reliability of a new Canadian oral examination format in anesthesia is fair to good. Can J Anesth 2002; 49: 232-6.
4 Muzzin LJ, Hart L. Oral examinations. In: Neufeld VR, Norman GR (Eds). Assessing Clinical Competence. New York: Springer Publishing Co.; 1985: 71-93.
5 Burchard KW, Rowland-Morin PA, Coe NP, Garb JL. A surgery oral examination: interrater agreement and the influence of rater characteristics. Acad Med 1995; 70: 1044-6.
6 McGuire CH. The oral examination as a measure of professional competence. J Med Educ 1966; 41: 267-74.
7 Eagle C. Anaesthesia and education. Can J Anaesth 1992; 39: 158-65.
8 Linn BS, Zeppa R. Stress in junior medical students: relationship to personality and performance. J Med Educ 1984; 59: 7-12.
9 Schubert A, Tetzlaff JE, Licina M, Mascha E, Smith MP. Organization of a comprehensive anesthesiology oral practice examination program: planning, structure, startup, administration, growth, and evaluation. J Clin Anesth 1999; 11: 504-18.
10 Newble DI, Swanson DB. Psychometric characteristics of the objective structured clinical examination. Med Educ 1988; 22: 325-34.
11 Colliver JA, Swartz MH, Robbs RS, Cohen DS. Relationship between clinical competence and interpersonal and communication skills in standardized-patient assessment. Acad Med 1999; 74: 271-4.