Purpose: To externally validate and compare the performance of previously published diagnostic models developed to predict malignancy in adnexal masses. Experimental Design: We externally validated the diagnostic performance of 11 models developed by the International Ovarian Tumor Analysis (IOTA) group and 12 other (non-IOTA) models using prospectively collected data from 997 patients. The non-IOTA models included the original risk of malignancy index (RMI), three modified versions of the RMI, six logistic regression models, and two artificial neural networks. The ability of the models to discriminate between benign and malignant adnexal masses was expressed as the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive and negative likelihood ratios (LR+, LR-). Results: Seven hundred and forty-two (74%) benign and 255 (26%) malignant masses were included. The IOTA models outperformed the non-IOTA models (AUCs of 0.941 to 0.956 vs. 0.839 to 0.928). The difference in AUC between the best IOTA model and the best non-IOTA model was 0.028 [95% confidence interval (CI), 0.011-0.044]. The AUC of the RMI was 0.911 (difference from the best IOTA model, 0.044; 95% CI, 0.024-0.064). The superior performance of the IOTA models was most pronounced in premenopausal patients but was also observed in postmenopausal patients. The IOTA models were also better able to detect stage I ovarian cancer. Conclusion: External validation shows that the IOTA models outperform other models, including the current reference test, the RMI, for discriminating between benign and malignant adnexal masses. Clin Cancer Res; 18(3); 815-25. ©2011 AACR.
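
The performance measures named in the abstract (AUC, sensitivity, specificity, LR+, LR-) can be illustrated with a minimal sketch. This is not the study's analysis code: the function names `diagnostic_metrics` and `auc`, the 0.5 risk cutoff, and the eight-case toy data set are assumptions made purely for illustration, and the printed values have no relation to the results reported above. The AUC is computed here as the proportion of malignant-benign pairs in which the malignant case receives the higher predicted risk, with ties counted as one half.

```python
from typing import Dict, Sequence


def diagnostic_metrics(y_true: Sequence[int], y_score: Sequence[float],
                       cutoff: float) -> Dict[str, float]:
    """Sensitivity, specificity, LR+ and LR- at a given risk cutoff.

    y_true: 1 = malignant, 0 = benign (hypothetical coding).
    y_score: predicted risk of malignancy; a case is test-positive
    when its score is >= cutoff.
    """
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in pairs if y == 1 and s < cutoff)
    fp = sum(1 for y, s in pairs if y == 0 and s >= cutoff)
    tn = sum(1 for y, s in pairs if y == 0 and s < cutoff)

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = sensitivity / (1 - specificity); LR- = (1 - sensitivity) / specificity
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {"sensitivity": sensitivity, "specificity": specificity,
            "lr_pos": lr_pos, "lr_neg": lr_neg}


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (malignant, benign) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only: four malignant and four benign cases, invented for this sketch.
    toy_truth = [1, 1, 1, 0, 0, 0, 0, 1]
    toy_risk = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
    print(diagnostic_metrics(toy_truth, toy_risk, cutoff=0.5))
    print(auc(toy_truth, toy_risk))
```

Sensitivity, specificity, and the likelihood ratios depend on the chosen cutoff, whereas the AUC summarizes discrimination across all cutoffs, which is why the abstract compares models primarily on AUC.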