Methods for data and knowledge mining

Project 1: Methods for statistical data analysis with decision trees

Project participants:

Vladimir Berikov, Alexander Litvinenko, Gennady Lbov (passed away on June 30, 2010)

Project 2: Cluster analysis of heterogeneous, incomplete and noisy data (RFBR project 11-07-00346)

Project participants:

Vladimir Berikov, Igor Pestunov, Victor Nedelko, Alexander Vikentiev, Victor Gusev, Maxim Gerasimov, Pavel Maslov, Yuri Sinyavsky, Galina Polyakova

The project aims to develop and investigate methods and algorithms for clustering problems in which the data are simultaneously heterogeneous, incomplete and noisy. The objects to be classified are described by heterogeneous (quantitative, ordinal or qualitative) variables and may be characterized by partially differing feature systems. Some characteristics have missing values, some objects are "noisy", and some variables are non-informative. Such problems arise in the analysis of biological, sociological and medical information, web data, satellite images, etc.
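
To make the data setting concrete, here is a minimal sketch of a dissimilarity between two objects described by mixed variables with missing values. It is a generic Gower-style measure, not the method developed in the project; the function name, the variable kinds and the example objects are all hypothetical.

```python
def mixed_dissimilarity(a, b, kinds, ranges):
    """Average per-variable dissimilarity over the variables observed in both objects.

    kinds[k] is "quantitative", "ordinal" or "qualitative" (hypothetical labels);
    ranges[k] is the value range used to scale numeric variables;
    a missing value is represented by None.
    """
    total = 0.0
    used = 0
    for x, y, kind, value_range in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # skip variables missing in either object
        if kind == "qualitative":
            d = 0.0 if x == y else 1.0  # simple mismatch
        else:
            # quantitative, or ordinal coded as numeric ranks
            d = abs(x - y) / value_range if value_range > 0 else 0.0
        total += d
        used += 1
    # None signals that the two objects share no observed variable
    return total / used if used else None


# Hypothetical objects: (quantitative, ordinal rank, qualitative)
kinds = ["quantitative", "ordinal", "qualitative"]
ranges = [10.0, 4.0, 0.0]  # the range is ignored for qualitative variables
obj1 = [2.0, 1, "red"]
obj2 = [7.0, 3, "blue"]
obj3 = [2.0, None, "red"]  # ordinal value is missing

d12 = mixed_dissimilarity(obj1, obj2, kinds, ranges)
d13 = mixed_dissimilarity(obj1, obj3, kinds, ranges)
```

Averaging only over the variables observed in both objects is one common way to handle missing values; it is an assumption of this sketch, not a statement about the project's algorithms.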

 

In this project, we propose combining logical, probabilistic and ensemble approaches to construct models for classification and forecasting. The novelty of the project lies in extending these approaches to cluster analysis, and in using original methods for constructing ensembles of logical-and-probabilistic models and algorithms for nonparametric cluster analysis.
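
As an illustration of the ensemble idea in clustering, the sketch below builds a co-association matrix from several partitions of the same objects and derives a consensus grouping from it. This is a standard textbook construction shown under simplifying assumptions, not the project's own algorithm; the function names, the 0.5 threshold and the example partitions are hypothetical.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Connected components of the graph linking pairs whose
    co-association exceeds the threshold (assumed value)."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical partitions of five objects; cluster labels are arbitrary
partitions = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2],
    [0, 0, 0, 1, 1],
]
matrix = co_association(partitions)
consensus = consensus_clusters(matrix)
```

The co-association matrix does not depend on how clusters are numbered in each partition, which is why partitions from different algorithms can be combined this way.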

Some recent papers:

Berikov V.B. Grouping of Objects in a Space of Heterogeneous Variables with the Use of Taxonomic Decision Trees // Pattern Recognition and Image Analysis. 2011. Vol. 21, No. 4. P. 591-598.

 

Pestunov I.A., Berikov V.B., Kulikova E.A., Rylov S.A. Ensemble of clustering algorithms for large datasets // Optoelectronics, Instrumentation and Data Processing. 2011. Vol. 47, No. 3. P. 245-252.

 

Berikov V.B. A latent variable pairwise classification model of a clustering ensemble // C. Sansone et al. (Eds.): Multiple Classifier Systems, MCS-2011. Lecture Notes in Computer Science, LNCS 6713. 2011. P. 279-288.