CLASSIFICATION OF DATA ANALYSIS PROBLEMS [1]

     Let us consider the problem of predicting elements of a two-dimensional table of the «object-property» type, in which the rows ai (i=1,2,...,m) describe m objects and the columns xj (j=1,2,...,n) correspond to n properties (characteristics) of these objects. 
     The elements to be predicted (b) may be positioned in different ways. Depending on their position, we distinguish three FAMILIES of problems: 
1) All elements bi0 are situated in one column; 
2) All elements bj0 are situated in one row; 
3) The elements bij0 belong to different rows and columns. 
Within each family we distinguish CLASSES of problems according to the number (q) of elements to be predicted. Under this classification the first family has three classes of problems: 
1.1) One element is predicted (q=1); 
1.2) Several elements are predicted at once (1 < q < m); 
1.3) All elements of the column are predicted at once (q=m). 
     In a similar way we distinguish the classes of problems in the second family: 
2.1) q =1 ;  
2.2) 1 < q < n ;  
2.3) q = n .  
     In the third family there are only two classes of problems, since a single element cannot belong to different rows and columns: 
3.2) 1< q < m*n ;  
3.3) q = m*n  
     In each of these eight classes we distinguish TYPES of problems according to the scales in which the values of the elements to be predicted are measured. We distinguish three groups of scales: names (N), order (P) and «quantitative» (K). The situation in which elements of different types are predicted is marked by the symbol (R). 
     The described classification is presented in Table 1. 
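     The scheme above can be expressed as a short routine. The following Python sketch is only an illustration and is not taken from the package: the function name classify_problem, the 0-based (row, column) cell pairs and the single_axis argument are all assumptions. The last one is needed because a single unknown element (q=1) lies in one column and in one row at the same time, so the caller has to say whether it is treated as part of a target column (family 1) or of a target row (family 2).

```python
def classify_problem(cells, m, n, scales, single_axis="column"):
    """Return a code such as '1.2.N' for a set of cells to be predicted.

    cells       -- iterable of (i, j) pairs, 0-based row and column indices
    m, n        -- number of objects (rows) and properties (columns)
    scales      -- one letter per cell: 'N', 'P' or 'K'
    single_axis -- hypothetical hint used only when q == 1
    """
    cells = set(cells)
    q = len(cells)
    if q == 0:
        raise ValueError("nothing to predict")
    kinds = set(scales)
    if not kinds or not kinds <= {"N", "P", "K"}:
        raise ValueError("scales must be letters from N, P, K")

    rows = {i for i, _ in cells}
    cols = {j for _, j in cells}

    # Family: one column (1), one row (2), or scattered cells (3).
    if q == 1:
        family, size = (1, m) if single_axis == "column" else (2, n)
    elif len(cols) == 1:
        family, size = 1, m
    elif len(rows) == 1:
        family, size = 2, n
    else:
        family, size = 3, m * n

    # Class: one element, several elements, or the whole column/row/table.
    if q == 1:
        cls = 1
    elif q < size:
        cls = 2
    else:
        cls = 3

    # Type: the common scale, or R when the scales differ.
    letter = kinds.pop() if len(kinds) == 1 else "R"
    return f"{family}.{cls}.{letter}"


print(classify_problem({(0, 3)}, 5, 4, ["N"]))                              # 1.1.N
print(classify_problem({(i, 3) for i in range(5)}, 5, 4, ["N"] * 5))        # 1.3.N
print(classify_problem({(0, 0), (1, 2)}, 5, 4, ["N", "K"]))                 # 3.2.R
```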
 

 
Table 1

     Let us give some examples of the most common types of prediction problems. 
     Problem 1.1.N consists in predicting one element of a column measured in the scale of names. This is the usual problem of pattern recognition: to indicate the name of the pattern (class) to which some new object b belongs (to determine the type of disease, to predict the presence or absence of oil, etc.). 
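     As an illustration of a 1.1.N problem, the sketch below assigns a new object b the class name of its nearest neighbour among the known objects. The nearest-neighbour rule is chosen here only for brevity and is not claimed to be the method used by the package; the function name, the toy table and the labels are hypothetical.

```python
import math


def nearest_neighbour_label(table, labels, b):
    """Return the label of the row of `table` closest to `b`.

    table  -- list of m rows, each a list of n numeric properties
    labels -- known values of the target column x0 (scale of names)
    b      -- properties of the new object
    """
    # Index of the known object with the smallest Euclidean distance to b.
    best = min(range(len(table)), key=lambda i: math.dist(table[i], b))
    return labels[best]


# Toy data (assumed): two properties per object, two pattern names.
table = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.5], [5.2, 4.9]]
labels = ["dry", "dry", "oil", "oil"]

print(nearest_neighbour_label(table, labels, [4.8, 5.0]))  # oil
```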
     In problem 1.1.P all objects are ordered by the target property x0, and it is necessary to find the position of the new object b in this order (for example, to predict that the oil capacity of deposit b is higher than that of deposit ai but lower than that of deposit ai+1). 
     In case 1.1.K it is necessary to give a quantitative estimate of the property x0 of the object b (e.g., to predict the oil reserves in millions of tonnes). If the objects in the table are ordered by time, then problem 1.1.K makes it possible to predict the values of object properties in the future. 
     Problems of class 1.2 are similar in meaning, but here a decision must be made about several elements at once: to recognise q objects (type 1.2.N), to determine the order positions of a group of objects (type 1.2.P) or to estimate the quantitative characteristic x0 for q objects at once (type 1.2.K). 
     Problems of class 1.3 are of considerable importance. To separate the objects according to the similarity of their properties, i.e. to establish some classification, means to form a new column x0 measured in the scale of names (problem of type 1.3.N). It is often called the problem of automatic classification, or taxonomy. When m objects are assessed by n experts, it is necessary to determine the summary estimate either in the scale of order (then it is a 1.3.P problem) or in a stronger scale, e.g. in percent (problem of type 1.3.K). 
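     The expert-estimation case (1.3.P) can be illustrated by a simple sum-of-ranks rule. This is a minimal sketch under stated assumptions, not the procedure of the package: each expert is assumed to supply a rank for every object (1 = best), ties in the totals are broken by object index, and the function name is hypothetical.

```python
def summary_rank_column(ranks):
    """Build the ordinal column x0 from a table of expert ranks.

    ranks[i][j] is the rank given to object i by expert j (1 = best).
    Returns a list x0 where x0[i] is the summary position of object i.
    """
    totals = [sum(row) for row in ranks]
    # Objects sorted from best (smallest total) to worst; index breaks ties.
    order = sorted(range(len(ranks)), key=lambda i: (totals[i], i))
    x0 = [0] * len(ranks)
    for position, i in enumerate(order, start=1):
        x0[i] = position
    return x0


# Three objects assessed by three experts (assumed data).
ranks = [
    [2, 3, 2],  # totals 7
    [1, 1, 3],  # totals 5
    [3, 2, 1],  # totals 6
]
print(summary_rank_column(ranks))  # [3, 1, 2]
```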
     Problems of the second family arise when it is necessary, for example, to estimate the informativeness of the properties represented in the table. If the existing properties have previously been divided into «informative» and «non-informative» classes, then determining the place of some new property among these groups is a problem of type 2.1.N. If it is required to indicate the place of a new property in a previously ordered set of properties, then problem 2.1.P is solved. And if one needs to estimate the informativeness of property b in bytes, then problem 2.1.K arises. For a group of properties, the problems 2.2.N, 2.2.P and 2.2.K are formulated. 
     The interpretation of the problems of estimating the whole set of properties at once (problems of types 2.3.N, 2.3.P and 2.3.K) is evident. 
     Imagine a table with gaps in different columns and rows. To predict the values of the missing elements one has to solve problems of different types from class 3.2, including problem 3.2.R, the prediction of elements of different types. 
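     A deliberately naive baseline for the gap-filling problems of class 3.2 is sketched below. It is not the algorithm of the package: each gap is simply replaced by a typical value of its column, chosen according to the scale of that column (mode for N, lower median for P, mean for K). The function name fill_gaps, the use of None as the gap marker and the toy table are assumptions.

```python
from statistics import mean, median_low, mode


def fill_gaps(table, scales):
    """Return a copy of `table` with every None replaced by a column statistic.

    table  -- list of rows; None marks a missing element
    scales -- one letter per column: 'N', 'P' or 'K'
    """
    filled = [list(row) for row in table]
    for j, scale in enumerate(scales):
        known = [row[j] for row in table if row[j] is not None]
        if not known:
            continue  # nothing to estimate from; leave the gaps in place
        if scale == "K":
            guess = mean(known)        # quantitative scale
        elif scale == "P":
            guess = median_low(known)  # order scale: an observed middle value
        else:
            guess = mode(known)        # scale of names: most frequent name
        for row in filled:
            if row[j] is None:
                row[j] = guess
    return filled


# Mixed-scale table with gaps in different rows and columns (a 3.2.R case).
table = [
    [1.0, "a", 3],
    [None, "a", 1],
    [3.0, None, None],
    [2.0, "b", 2],
]
for row in fill_gaps(table, "KNP"):
    print(row)
```

     Because the columns are treated independently, this baseline ignores the relations between properties and between objects that a real gap-filling method would exploit.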
     Finally, class 3.3 covers the problems of generating a table with specified properties: test tables for checking pattern recognition programs, tables of random numbers, etc. Depending on the required scale type, problems 3.3.N, 3.3.P, 3.3.K or 3.3.R arise. 
     Not all of the described types of problems are equally well studied. Some of them have a long history, are well known and have well-developed algorithms and programs for their solution, which are used in various applied areas. Others are less known but well understood, and are used occasionally. There are also some that have not yet been clearly formulated and whose interpretation is still difficult. 
     This software implements methods for solving problems of the following types: taxonomy (1.3.N), selection of a system of informative properties (2.3.N), pattern recognition (1.1.N, 1.2.N), filling of gaps (3.2.N, 3.2.P, 3.2.K) and prediction of dynamic objects (1.1.K). 


REFERENCES

1. Zagoruiko N.G., Elkina V.N., Lbov G.S., Emelianov S.V. Package of Applied Programs OTEKS. Moscow: "Finance and Statistics", 1968.