From the Departments of Critical Care Medicine, Medicine, Community Health Sciences, and Clinical Neurosciences, University of Calgary, Calgary, Alberta, Canada.
Address correspondence to: Dr. Christopher J. Doig, Departments of Medicine and Community Health Sciences, Faculty of Medicine, University of Calgary, Department of Critical Care, Calgary Health Region, Room EG23G, Foothills Medical Centre, 1403 - 29th Street, NW, Calgary, Alberta T2N 2T9, Canada. Phone: 403-944-1691; Fax: 403-283-9994; E-mail: cdoig{at}ucalgary.ca
| Abstract |
|---|
Methods: Prospective automated daily measurements of MOD and SOFA scores were performed in 1,436 patients admitted to a multisystem ICU in the Calgary Health Region over a one-year period. Logistic regression modeling techniques were used to describe the association of SOFA and MOD scores with mortality. Receiver operator characteristic (ROC) curves were used to assess the models' discriminatory ability.
Results: For ICU and hospital mortality, there was very little practical difference between the SOFA and MOD scores in their ability to discriminate outcome, as determined by the area under the ROC curve. However, compared with the previous literature, the discriminatory ability of both scores in this population was weak. In addition, model calibration was poor for both scores. The SOFA cardiovascular component score performed better than the MOD cardiovascular component score in the discrimination of both ICU and hospital mortality.
Conclusions: SOFA and MOD scores had only a modest ability to discriminate between survivors and non-survivors. These results question the appropriateness of using organ dysfunction scores as a surrogate for mortality in clinical trials and suggest further work is necessary to better understand the temporal relationship and course of organ failure with mortality.
| Introduction |
|---|
Although SOFA and MOD evaluate the same six organ systems, there are practical differences which may affect the operating characteristics of each system. The MOD score was developed from a critical appraisal of the literature; cut-off scores for each organ system were then derived and validated based on the probability of subsequent mortality in a sample of 692 patients from one Canadian ICU.6 The SOFA score was developed by a consensus conference at a meeting of the European Society of Intensive Care Medicine (ESICM) and has been validated against mortality in multiple subsequent studies.9–12
The other significant difference is the timing of score calculation in each system. SOFA is calculated from the most abnormal value in a 24-hr period, whereas the MOD score is calculated using physiologic values measured at the same point in time every day (first morning values) to avoid capturing momentary physiologic changes unrelated to changes in the patient's underlying physiologic status.
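The timing difference can be illustrated with a minimal Python sketch. The function names, the `lower_is_worse` flag and the platelet readings below are hypothetical and chosen only for illustration; the fixed-time rule follows the "least abnormal value at 0700 ± 2 hr" convention described in the Methods, and the study's own Visual Basic programs are not reproduced here.

```python
from datetime import datetime, timedelta

def worst_value_in_day(readings, lower_is_worse=True):
    """SOFA-style selection: the most abnormal value among all readings of the day."""
    values = [value for _, value in readings]
    if not values:
        return None
    return min(values) if lower_is_worse else max(values)

def morning_window_value(readings, lower_is_worse=True, target_hour=7, window_hours=2):
    """MOD-style selection: the least abnormal value recorded near a fixed daily time."""
    window = timedelta(hours=window_hours)
    in_window = []
    for timestamp, value in readings:
        # Reference time on the same calendar day as the reading (07:00 by default).
        target = timestamp.replace(hour=target_hour, minute=0, second=0, microsecond=0)
        if abs(timestamp - target) <= window:
            in_window.append(value)
    if not in_window:
        return None  # no reading close enough to the reference time
    return max(in_window) if lower_is_worse else min(in_window)

# Hypothetical platelet counts (x10^9/L) for one patient-day; lower is more abnormal.
readings = [
    (datetime(2003, 1, 5, 3, 10), 95),
    (datetime(2003, 1, 5, 6, 30), 140),
    (datetime(2003, 1, 5, 8, 15), 150),
    (datetime(2003, 1, 5, 14, 0), 60),
    (datetime(2003, 1, 5, 22, 0), 110),
]

sofa_input = worst_value_in_day(readings)
mod_input = morning_window_value(readings)
```

With these illustrative readings, the transient afternoon nadir drives the SOFA-style value, while the fixed-time rule ignores it. This is the practical difference the two definitions are intended to capture.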
Both scores are now being incorporated into clinical trials in critical care,7,8 even though they were developed using disparate techniques and assess the severity of physiologic derangement differently. The objective of our study was therefore to compare the correlation of the MOD and SOFA scores with mortality in a large heterogeneous population of adult multisystem ICU patients.
| Methods |
|---|
The SOFA and MOD scores were collected following the recommendations in the original publications.5,6 An electronic patient information system [Quantitative Sentinel (QS), GE-Marquette Medical Systems Inc., Chicago, IL, USA] interfaced to all bedside devices collected physiologic data. Nursing or respiratory therapy staff validated these data (accepted them into the system) at least hourly by examining their representativeness and sensibility. An HL-7 interface with the regional laboratory information system (Cerner PathNet Classic, version 306, Kansas City, MO, USA) was used to collect all laboratory data.
Two programs were developed in Visual Basic (Microsoft VBL, Microsoft Corporation, Seattle, WA, USA) to examine all physiologic and laboratory values in each 24-hr period, measured daily from 0000 to 2359 hours. For the SOFA score, one program determined the most abnormal value for each variable and calculated the appropriate SOFA value (range 0–4), which was then exported to a local longitudinal ICU database known as TRACER (Microsoft Access, Microsoft Corporation, Seattle, WA, USA). A missing SOFA value that fell between a preceding and a subsequent SOFA value was replaced with the lower of the two scores. In the absence of a preceding or subsequent SOFA value, the score was set to zero. The second program used the least abnormal value at 0700 ± 2 hr to calculate the appropriate MOD score. Missing values were imputed in the same manner as for the SOFA score.
The calculation of each component value and of the total SOFA and MOD scores was checked manually (C.J.D.) for accuracy against the laboratory or physiologic data recorded in the QS system over a one-month period (683 patient days) prior to the start of the study; no errors were found in the calculation of either score.
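The imputation rule for missing daily scores can be sketched in Python as follows. This is an illustration under two stated assumptions rather than the study's own program: a run of several missing days is bridged using the nearest recorded value on each side, and a score is set to zero whenever either neighbour is unavailable. The function name `impute_daily_scores` is hypothetical.

```python
def impute_daily_scores(scores):
    """Fill missing daily component scores (None) for one patient.

    A gap between two recorded values takes the lower of the two;
    a gap with no recorded value on one side (assumption) is set to zero.
    """
    filled = list(scores)
    for i, score in enumerate(scores):
        if score is not None:
            continue
        # Nearest recorded value before and after the missing day, if any.
        previous = next(
            (scores[j] for j in range(i - 1, -1, -1) if scores[j] is not None), None
        )
        following = next(
            (scores[j] for j in range(i + 1, len(scores)) if scores[j] is not None), None
        )
        if previous is not None and following is not None:
            filled[i] = min(previous, following)
        else:
            filled[i] = 0
    return filled

# Hypothetical seven-day series of one component score (0-4), None = missing.
example = impute_daily_scores([2, None, 4, None, None, 1, None])
```

Taking the lower of the two neighbouring scores is a conservative choice: it can understate, but not overstate, the degree of organ dysfunction on a day without data.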
Clinical data were collected on all patients at the time of entry to the ICU. Outcome data were collected at ICU discharge and at hospital discharge.
Statistical analysis
All data were analyzed using STATA-7 and STATA-8 (Stata Corporation, College Station, TX, USA). Our primary objective was to describe the relationship of SOFA and MOD scores with outcome. Consistent with previous publications, we chose admission, patient mean, delta and maximum values to describe the association of SOFA and MOD scores with outcome. Maximum values were defined as the sum of the most abnormal component scores during the patient's ICU stay. Delta scores, a measure of the degree of organ dysfunction acquired during the ICU stay, were defined as the difference between the maximum score and the admission score. In addition, we compared individual component scores. Organ failure was defined by a component score ≥ 3. Logistic regression modelling techniques were used to describe the relative strength of the relationship of SOFA and MOD scores with mortality. Model calibration was examined using the Hosmer-Lemeshow goodness-of-fit test.13 Areas under the receiver operator characteristic (AuROC) curves were used to assess model discriminatory ability. A P value of < 0.05 was considered significant.
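The admission, mean, maximum and delta summaries defined above can be sketched in Python. The data layout (one dictionary of six component scores per ICU day) and the function name are assumptions made for illustration, and the patient mean is assumed here to be the mean of the daily total scores; the component values are invented.

```python
from statistics import mean

COMPONENTS = (
    "respiratory", "coagulation", "hepatic",
    "cardiovascular", "neurological", "renal",
)

def summarize_stay(daily_components):
    """Summarize one ICU stay from per-day component scores (each 0-4)."""
    daily_totals = [sum(day[c] for c in COMPONENTS) for day in daily_components]
    admission = daily_totals[0]
    # Maximum: sum of the most abnormal score reached by each component,
    # even if the components peaked on different days.
    maximum = sum(max(day[c] for day in daily_components) for c in COMPONENTS)
    return {
        "admission": admission,
        "mean": mean(daily_totals),
        "maximum": maximum,
        "delta": maximum - admission,
    }

# Hypothetical three-day stay.
stay = [
    dict(respiratory=2, coagulation=0, hepatic=0, cardiovascular=1, neurological=1, renal=0),
    dict(respiratory=3, coagulation=1, hepatic=0, cardiovascular=3, neurological=1, renal=1),
    dict(respiratory=1, coagulation=2, hepatic=0, cardiovascular=1, neurological=0, renal=2),
]
summary = summarize_stay(stay)
```

In this example the maximum score (11) exceeds the highest total on any single day (9), because the maximum is assembled from each component's worst value across the whole stay.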
| Results |
|---|
0.5). For hospital mortality, discrimination and/or calibration was also generally poor for each of the models with little difference between the scoring systems (Figure 2
To investigate the independent association of each component score with hospital mortality, multivariable logistic regression models were created for MOD and SOFA component scores. Backwards step-wise elimination was used to produce the most parsimonious model. For both the SOFA and MOD scores, the neurological, cardiovascular and renal component scores remained significantly and independently associated with hospital mortality, while the hepatic, coagulation and respiratory component scores dropped out of both models.
| Discussion |
|---|
Marshall et al., in their initial description of the MOD score, reported excellent discrimination (AuROC of 0.93) for the maximum MOD score.6 We found an AuROC of only 0.64. It should be noted that patients in Marshall's study were all surgical patients, while our study included surgical and medical patients. In addition, it is unclear whether Marshall enrolled all consecutive patients. The admission APACHE II score in our patients was significantly higher than that in Marshall's study (25 vs 13), with a proportionately greater ICU mortality (27% vs 9%). Marshall's results are based on a validation sample of 356 patients, compared to our study of 1,400 patients. Given the marked differences in patient demographics, it is difficult to definitively resolve the disparity of the results between these studies. Further, as the MOD score was developed over a decade ago, it is possible that the change in discrimination ability observed in this study is related to changes in patient care over these years. Although a more recent study by Cook and colleagues14 examined the relation between the six components of the MOD score and time to death in a selected sample of 1,200 ICU patients, comparisons between Cook's data and ours are not valid for two important reasons. Firstly, the Cook study did not collect the MOD score as originally described but opted for a "simpler" cardiovascular component score. Neither the rationale for this change nor the ability of this unique cardiovascular score to discriminate outcome was reported. Secondly, the data were derived from patients entered into a large randomized controlled trial.15 In this trial, only 1,200 of 4,232 patients with an ICU length of stay greater than 48 hr were included. The use of this highly selected sample makes generalization of their results problematic.
Two other studies have directly compared SOFA and MOD scores. Pettilä and colleagues recently compared the ability of the SOFA score, the MOD score, the logistic organ dysfunction (LOD) score and APACHE III to predict hospital mortality.16 In this single-centre study, SOFA, MOD and LOD scores were calculated retrospectively for days one, three, five, and seven. However, the MOD score was calculated for each day using the worst single value of that day. This is contrary to the previously described method, which uses physiologic values measured at the same point in time every day (first morning values). Thus, this study did not use the MOD score as described by Marshall et al.6 Further, because this study did not collect SOFA and MOD scores for all days of ICU admission, it must be interpreted with caution. The authors found an AuROC of 0.776 for the admission SOFA score and 0.695 for the admission MOD score. The AuROC was 0.816 for the maximum SOFA score and 0.817 for the maximum MOD score. Given the lack of daily collection and the variant method of collecting the MOD score, it is difficult to comment on the comparability of these results to previous studies. Although these AuROC values are lower than previously reported, this may be secondary to methodological differences.
The second study was published in 2002 by Peres Bota and colleagues.17 In this single-centre study, they collected SOFA and MOD scores every 48 hr until discharge. They found both scores to have good discrimination in outcome prediction (maximum MOD AuROC 0.900, maximum SOFA AuROC 0.898) and concluded that they are reliable outcome measures. Several important differences between our study and the study by Peres Bota warrant discussion. First, our study calculated scores prospectively for each day of ICU stay in a manner consistent with previously reported methods. In addition, we used a population-based cohort, thereby strengthening the generalizability of our results. Another major difference in the collection of these scores was our use of an automated collection system. Automated sampling has been shown to influence the overall score when compared to manual sampling.18,19 Bosman et al. suggest that sampling of score data differs because of a higher frequency of data gathering during automatic data transfer, an absence of errors and the absence of pre-selection of data by care personnel.19 The increased density of screening possible with automated collection makes the occurrence of extremely high or low values much more likely and increases the volume of data taken into account for score calculation. Hammond et al. found that almost 20% of all health care personnel note vital data incorrectly.20 Two studies have suggested that there is a tendency for nurses to favour normal readings and ignore extreme values, resulting in reduced variability in manually collected values.21,22 Because physiological values in our study were validated by nursing or respiratory staff, our data are likely subject to a similar bias, but we would expect less so than in previous studies, as data were not generated de novo by the nurses.
Both systems had only modest ability to discriminate and most models suffered from poor calibration. Discrimination evaluates the ability of a model to distinguish dying patients from those who survive. This is commonly assessed using the AuROC curve. A value of > 0.70 for the AuROC curve is considered satisfactory for discrimination,23 but too low to be useful for mortality prediction in individual patients.24 Calibration evaluates the degree of correspondence between the predicted probabilities of mortality and the observed mortality. If calibration is poor, customization of the model can be attempted but is not always successful.25,26 Further, the practicality of such an approach is questionable.
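Both quantities can be computed directly from their definitions. The Python sketch below is illustrative only: it does not reproduce the STATA analysis used in the study, the function names are hypothetical, and the Hosmer-Lemeshow grouping is assumed to be equal-sized groups of patients sorted by predicted probability. The AuROC is computed as the proportion of non-survivor/survivor pairs in which the non-survivor has the higher score, with ties counted as one half.

```python
def auroc(scores, died):
    """Area under the ROC curve via pairwise comparison of non-survivors and survivors."""
    non_survivors = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    if not non_survivors or not survivors:
        raise ValueError("both outcomes must be present")
    concordant = 0.0
    for dead_score in non_survivors:
        for alive_score in survivors:
            if dead_score > alive_score:
                concordant += 1.0
            elif dead_score == alive_score:
                concordant += 0.5
    return concordant / (len(non_survivors) * len(survivors))

def hosmer_lemeshow_statistic(predicted, died, groups=10):
    """Hosmer-Lemeshow chi-square statistic over groups of sorted predicted risk."""
    pairs = sorted(zip(predicted, died), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(d for _, d in chunk)   # deaths observed in the group
        expected = sum(p for p, _ in chunk)   # deaths predicted by the model
        variance = expected * (1 - expected / size)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic

# Hypothetical maximum scores and outcomes (1 = died) for six patients.
example_auroc = auroc([2, 4, 6, 8, 10, 12], [0, 0, 1, 0, 1, 1])
```

In the six-patient example, eight of the nine non-survivor/survivor pairs are correctly ordered. A large Hosmer-Lemeshow statistic, compared against a chi-square distribution (conventionally with the number of groups minus two degrees of freedom), indicates poor agreement between predicted and observed mortality.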
A practical disadvantage of the MOD score is the calculation of the pressure-adjusted heart rate. In the absence of central venous pressure monitoring, this value cannot be calculated, and this is the case for a significant proportion of ICU patients. In the original study describing the MOD score, one half of the patients could not have a cardiovascular component calculated. Unlike the MOD score, the SOFA score has been described in both surgical and medical patients. The two scores also differ with respect to the timing of measurement. The SOFA score requires a review of data over a 24-hr period to identify the most abnormal value. In contrast, the MOD score's use of measurements at one particular time avoids capturing momentary physiological changes unrelated to patient condition, and is perhaps easier in practice. However, the ideal choice of data collection time and its effect on validity have not yet been investigated.
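The dependence on central venous monitoring follows from the definition of the pressure-adjusted heart rate in the original MOD publication,6 where it is the heart rate multiplied by the ratio of central venous (right atrial) pressure to mean arterial pressure. A minimal Python sketch, with a hypothetical function name and invented values, shows why the component is undefined when no central venous pressure is recorded:

```python
from typing import Optional

def pressure_adjusted_heart_rate(
    heart_rate: Optional[float],
    central_venous_pressure: Optional[float],
    mean_arterial_pressure: Optional[float],
) -> Optional[float]:
    """Heart rate x (central venous pressure / mean arterial pressure).

    Returns None when any input is unavailable, for example when the
    patient has no central venous catheter.
    """
    if heart_rate is None or central_venous_pressure is None or mean_arterial_pressure is None:
        return None
    if mean_arterial_pressure <= 0:
        raise ValueError("mean arterial pressure must be positive")
    return heart_rate * central_venous_pressure / mean_arterial_pressure

# Hypothetical patient with a central line (pressures in mmHg) and one without.
with_cvp = pressure_adjusted_heart_rate(120, 12, 60)
without_cvp = pressure_adjusted_heart_rate(120, None, 60)
```

How a missing cardiovascular component is then handled (for example, scored as zero or imputed) is a separate analytic decision that this sketch leaves open.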
| Conclusions |
|---|
| Acknowledgments |
|---|
| Footnotes |
|---|
This work was supported in part by grants from the Alberta Heritage Foundation for Medical Research Health Research Fund, and the Calgary Regional Health Authority Special Competition Fund.
Accepted for publication January 27, 2004. Revision accepted December 7, 2004.
| References |
|---|
2 Beal AL, Cerra FB. Multiple organ failure syndrome in the 1990s. Systemic inflammatory response and organ dysfunction. JAMA 1994; 271: 226–33.
3 Baue AE. Multiple organ failure, multiple organ dysfunction syndrome, and systemic inflammatory response syndrome. Why no magic bullets? Arch Surg 1997; 132: 703–7.
4 Deitch EA. Multiple organ failure. Pathophysiology and potential future therapy. Ann Surg 1992; 216: 117–34.
5 Vincent JL, Moreno R, Takala J, et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med 1996; 22: 707–10.
6 Marshall JC, Cook DJ, Christou NV, Bernard GR, Sprung CL, Sibbald WJ. Multiple organ dysfunction score: a reliable descriptor of a complex clinical outcome. Crit Care Med 1995; 23: 1638–52.
7 Bernard GR, Vincent JL, Laterre PF, et al. Efficacy and safety of recombinant human activated protein C for severe sepsis. N Engl J Med 2001; 344: 699–709.
8 Rivers E, Nguyen B, Havstad S, et al. Early goal-directed therapy in the treatment of severe sepsis and septic shock. N Engl J Med 2001; 345: 1368–77.
9 Antonelli M, Moreno R, Vincent JL, et al. Application of SOFA score to trauma patients. Sequential Organ Failure Assessment. Intensive Care Med 1999; 25: 389–94.
10 Vincent JL, de Mendonca A, Cantraine F, et al. Use of the SOFA score to assess the incidence of organ dysfunction/failure in intensive care units: results of a multicenter, prospective study. Working group on "sepsis-related problems" of the European Society of Intensive Care Medicine. Crit Care Med 1998; 26: 1793–800.
11 Moreno R, Vincent JL, Matos R, et al. The use of maximum SOFA score to quantify organ dysfunction/failure in intensive care. Results of a prospective, multicentre study. Working Group on Sepsis related Problems of the ESICM. Intensive Care Med 1999; 25: 686–96.
12 Metnitz PG, Lang T, Valentin A, Steltzer H, Krenn CG, Le Gall JR. Evaluation of the logistic organ dysfunction system for the assessment of organ dysfunction and mortality in critically ill patients. Intensive Care Med 2001; 27: 992–8.
13 Hosmer D, Lemeshow S. A goodness-of-fit test for the multiple logistic regression model. Communications in Statistics 1980; A10: 1043–69.
14 Cook R, Cook D, Tilley J, Lee K, Marshall J; Canadian Critical Care Trials Group. Multiple organ dysfunction: baseline and serial component scores. Crit Care Med 2001; 29: 2046–50.
15 Cook D, Guyatt G, Marshall J, et al. A comparison of sucralfate and ranitidine for the prevention of upper gastrointestinal bleeding in patients requiring mechanical ventilation. Canadian Critical Care Trials Group. N Engl J Med 1998; 338: 791–7.
16 Pettila V, Pettila M, Sarna S, Voutilainen P, Takkunen O. Comparison of multiple organ dysfunction scores in the prediction of hospital mortality in the critically ill. Crit Care Med 2002; 30: 1705–11.
17 Peres Bota D, Melot C, Lopes Ferreira F, Nguyen Ba V, Vincent JL. The multiple organ dysfunction score (MODS) versus the sequential organ failure assessment (SOFA) score in outcome prediction. Intensive Care Med 2002; 28: 1619–24.
18 Suistomaa M, Kari A, Ruokonen E, Takala J. Sampling rate causes bias in APACHE II and SAPS II scores. Intensive Care Med 2000; 26: 1773–8.
19 Bosman RJ, Oudemans-van Straaten HM, Zandstra DF. The use of intensive care information systems alters outcome prediction. Intensive Care Med 1998; 24: 953–8.
20 Hammond J, Ward CG, Johnson M, Varas R, Marcial E. The computerized burn unit: experience with a patient data management system. Int J Clin Monit Comput 1989; 6: 87–9.
21 Reich DL, Wood RK Jr, Mattar R, et al. Arterial blood pressure and heart rate discrepancies between handwritten and computerized anesthesia records. Anesth Analg 2000; 91: 612–6.
22 Taylor DE, Whamond JS. Reliability of human and machine measurements in patient monitoring. Eur J Intensive Care Med 1975; 1: 53–9.
23 Lemeshow S, Le Gall JR. Modeling the severity of illness of ICU patients. A systems update. JAMA 1994; 272: 1049–55.
24 Randolph AG, Guyatt GH, Richardson WS. Prognosis in the intensive care unit: finding accurate and useful estimates for counseling patients. Crit Care Med 1998; 26: 767–72.
25 Markgraf R, Deutschinoff G, Pientka L, Scholten T, Lorenz C. Performance of the score systems acute physiology and chronic health evaluation II and III at an interdisciplinary intensive care unit, after customization. Crit Care 2001; 5: 31–6.
26 Moreno R, Apolone G. Impact of different customization strategies in the performance of a general severity score. Crit Care Med 1997; 25: 2001–8.
27 Nathens AB, Marshall JC. Sepsis, SIRS, and MODS: what's in a name? World J Surg 1996; 20: 386–91.