Comparison of the "Discovery" system with other Data Mining methods

Firsov N.I., Vityaev E.E. Comparison of the "Discovery" system with the Microsoft Association Rules and Decision Trees algorithms built into Microsoft SQL Server Analysis Services. Information Technologies in Humanities Research, Issue 17, IAET SB RAS, Novosibirsk, 2012, pp. 51-63.

Comparison in financial forecasting

Comparison of the "Discovery" system with neural networks, decision trees (Sipina), rule extraction from neural networks, first-order logic methods (FOIL) and other baseline methods, presented in the table and figure

Development of a breast cancer diagnostic system

Results of the comparison with neural networks, decision trees (Sipina) and linear discriminant analysis (SIGAMD)

The figure presents results for another selection criterion: the level of conditional probability. We studied three levels: 0.7, 0.85 and 0.95. A higher level of conditional probability decreases the number of rules and of diagnosed patients, but increases the accuracy of diagnosis. The corresponding results are labeled MMDR1, MMDR2 and MMDR3. We extracted 44 diagnostic rules that are statistically significant at the 0.05 level of the F-criterion and have a conditional probability of at least 0.75 (MMDR1). There were 30 rules with a conditional probability of at least 0.85 (MMDR2) and 18 rules with a conditional probability of at least 0.95 (MMDR3). With all 44 rules, the total accuracy of diagnosis is 82%. The false-negative rate was 6.5% (9 malignant cases were diagnosed as benign) and the false-positive rate was 11.9% (16 benign cases were diagnosed as malignant). The 30 most reliable rules delivered a total accuracy of 90%, and the 18 most reliable rules performed with 96.6% accuracy, with only 3 false-positive cases (3.4%).

The neural network software (“Brainmaker”, California Scientific Software) gave 100% accuracy on the training data, but in the Round-Robin test its total accuracy fell to 66%. The main reason for this low accuracy is that neural networks (NN) do not evaluate the statistical significance of their perfect (100%) performance on the training data. Poor results (76% on the training data) were also obtained with linear discriminant analysis (“SIGAMD” software, StatDialogue, Moscow). The decision tree approach (“SIPINA” software, Universite Lumiere, Lyon, France) reached an accuracy of 76%-82% on the training data. This is worse than what the MMDR method achieved in the much more demanding Round-Robin test (fig. 8).

The false-negative count, a particularly important error in cancer diagnosis because it means a missed malignancy, was 3-8 cases for MMDR, 8-9 cases for the decision tree, 19 cases for linear discriminant analysis and 26 cases for the NN.
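A minimal sketch of the trade-off described above, assuming each rule is a conjunction of binary features with an estimated conditional probability. Raising the probability level keeps fewer rules, so more cases stay undiagnosed, while the cases that are still diagnosed tend to be diagnosed more accurately. The `Rule` class, the feature names, the policy of applying the most probable firing rule when rules conflict, and the toy data are all hypothetical; this is not the MMDR implementation, and the F-criterion significance test is omitted. `round_robin` treats the Round-Robin test as leave-one-out evaluation, which is an assumption about the protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation, not the MMDR implementation.
Case = Dict[str, bool]        # binary features of one patient
Labeled = Tuple[Case, str]    # (case, true diagnosis)


@dataclass(frozen=True)
class Rule:
    conditions: Tuple[str, ...]  # features that must all be present
    diagnosis: str               # "malignant" or "benign"
    cond_prob: float             # estimated P(diagnosis | conditions)

    def fires(self, case: Case) -> bool:
        return all(case.get(f, False) for f in self.conditions)


def select_rules(rules: List[Rule], level: float) -> List[Rule]:
    """Keep only rules whose conditional probability is at least `level`."""
    return [r for r in rules if r.cond_prob >= level]


def diagnose(rules: List[Rule], case: Case) -> Optional[str]:
    """Apply the most probable firing rule (assumed policy); None = undiagnosed."""
    firing = [r for r in rules if r.fires(case)]
    if not firing:
        return None
    return max(firing, key=lambda r: r.cond_prob).diagnosis


def evaluate(rules: List[Rule], cases: List[Labeled]) -> Dict[str, int]:
    """Count diagnosed cases, correct decisions, false negatives and false positives."""
    stats = {"diagnosed": 0, "correct": 0, "false_neg": 0, "false_pos": 0}
    for case, truth in cases:
        pred = diagnose(rules, case)
        if pred is None:
            continue  # no rule fires: the case is left undiagnosed
        stats["diagnosed"] += 1
        if pred == truth:
            stats["correct"] += 1
        elif truth == "malignant":
            stats["false_neg"] += 1  # malignant case diagnosed as benign
        else:
            stats["false_pos"] += 1  # benign case diagnosed as malignant
    return stats


def round_robin(learn: Callable[[List[Labeled]], List[Rule]],
                cases: List[Labeled]) -> List[Tuple[Optional[str], str]]:
    """Leave-one-out (assumed meaning of Round-Robin): learn without case i, then diagnose it."""
    results = []
    for i, (case, truth) in enumerate(cases):
        rules = learn(cases[:i] + cases[i + 1:])
        results.append((diagnose(rules, case), truth))
    return results


# Toy, made-up rules and cases (not the data from the study).
RULES = [
    Rule(("irregular_calcifications",), "malignant", 0.96),
    Rule(("large_mass",), "malignant", 0.80),
    Rule(("smooth_margin",), "benign", 0.90),
]
CASES = [
    ({"irregular_calcifications": True}, "malignant"),
    ({"large_mass": True}, "benign"),
    ({"smooth_margin": True}, "benign"),
    ({"smooth_margin": True, "large_mass": True}, "malignant"),
    ({}, "benign"),
]

for level in (0.75, 0.85, 0.95):
    print(level, evaluate(select_rules(RULES, level), CASES))
```

Scoring rules on the same cases they were learned from corresponds to the training-data figures quoted for the neural network, discriminant analysis and SIPINA; `round_robin` never diagnoses a case that took part in learning, which is why it is the harder test.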
In these experiments, the rule-based methods (MMDR and decision trees) outperformed the other methods. Note also that only MMDR and decision trees produce diagnostic rules. These rules make the computer-aided diagnostic decision process visible and transparent to radiologists, who can then control and evaluate the decision-making process. Linear discriminant analysis, by contrast, yields an equation that separates the benign and malignant classes; for example, the expression 0.0670x1 - 0.9653x2 + … represents a case. How would one interpret a weighted number of calcifications per cm² (0.0670x1) combined with a weighted volume in cm³ (-0.9653x2)? There is no direct medical sense in this arithmetic.
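To make the interpretability problem concrete, this sketch evaluates only the two terms quoted above. The remaining terms (elided as "…" in the text) and the decision threshold are not given, so they are left out, and the function names are hypothetical. Because the score is a weighted sum, one extra cm³ of volume is exactly offset by about 14.4 additional calcifications per cm² (0.9653 / 0.0670), an exchange rate between quantities in different units that has no clinical meaning.

```python
import math

# Hypothetical sketch: only the two coefficients quoted in the text are used.
# The remaining terms ("+ ...") and the decision threshold are not given in the source.
COEF_CALC_DENSITY = 0.0670  # weight of x1, number of calcifications per cm^2
COEF_VOLUME = -0.9653       # weight of x2, volume in cm^3


def partial_discriminant(calc_per_cm2: float, volume_cm3: float) -> float:
    """Contribution of x1 and x2 to the linear discriminant score."""
    return COEF_CALC_DENSITY * calc_per_cm2 + COEF_VOLUME * volume_cm3


def equivalent_calc_density(volume_increase_cm3: float) -> float:
    """Extra calcifications per cm^2 that exactly cancel a volume increase in the score."""
    return -COEF_VOLUME * volume_increase_cm3 / COEF_CALC_DENSITY


# Two different cases receive the same partial score.
score_a = partial_discriminant(10.0, 1.0)
score_b = partial_discriminant(10.0 + equivalent_calc_density(1.0), 2.0)
print(math.isclose(score_a, score_b))          # True
print(round(equivalent_calc_density(1.0), 1))  # 14.4
```

A diagnostic rule, by contrast, states its conditions directly in the radiologist's own terms, so no such cross-unit trade-off has to be interpreted.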