SELECTION OF THE INFORMATIVE PROPERTIES SYSTEM

     Informativeness of a property is a relative notion. The same system of properties may be informative for one recognition problem and uninformative for another: for example, candidates for mathematical and for sporting Olympiads must be selected by different features. The estimate of informativeness depends on the list of patterns to be recognised,
S = <s1, s2, ..., si, ..., sk>, i.e. on what must be distinguished from what. It also depends on the decision functions D. Thus it is impossible to point out «typical» or «frequently used» features: for every problem its own set of informative descriptive properties Xn must be found.
     The initial set of properties (the system Xg) is formed in a non-formalised way, on the basis of expert experience and intuition. Formal methods are then applied to test this initial system for sufficiency and necessity. Among all W possible systems of features we accept a system w as sufficient if, for the given values of S, D and the training set A, it yields expenses N not exceeding a given limit N0. The expenses N are understood here as the price of measuring the properties (Nx) plus the price of the losses caused by recognition errors (Nr): N = Nx + Nr. The sufficient system of minimum complexity (price) is accepted as necessary. So, in fact, we solve a running-over (search) problem of the following type:

     w = arg min N(X) | S, D, A, N0 .
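The selection problem above can be sketched as an exhaustive search over feature subsets. The measurement prices, the loss model `error_loss` and the limit `n0` below are hypothetical illustrations, not taken from the source:

```python
from itertools import combinations

def select_feature_system(features, meas_price, error_loss, n0):
    """Among all subsets whose total expense N = Nx + Nr does not
    exceed the limit n0 (the 'sufficient' systems), return the one
    with the smallest measurement price Nx (the 'necessary' system).
    error_loss(subset) is a user-supplied estimate of Nr."""
    best, best_price = None, float("inf")
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            nx = sum(meas_price[f] for f in subset)
            n = nx + error_loss(subset)
            if n <= n0 and nx < best_price:
                best, best_price = subset, nx
    return best

# Toy illustration: three features with measurement prices; the
# error loss is a hypothetical model that drops as features are added.
prices = {"x1": 1.0, "x2": 2.0, "x3": 5.0}
loss = lambda s: 10.0 / len(s)
print(select_feature_system(list(prices), prices, loss, n0=9.0))
```

This brute-force form is only feasible for small g; the heuristics mentioned at the end of the section exist precisely to avoid it.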
Criteria of property informativeness.

     The key criterion of property informativeness in pattern recognition is, of course, the value of the losses from errors, R. But even when the distributions of the general population are known, calculating R requires a large amount of computing time. For this reason simpler criteria are used instead, chosen so that they correlate closely, if not directly, with the estimate of the losses R.
     If the distribution of the realisations of each pattern obeys the normal law with diagonal and equal covariance matrices (the surfaces of equal density are spheres of equal radius), then the average Euclidean distance between the mathematical expectations of all pairs of patterns may be taken as the measure of recognition complexity Q:
     Q = [ Σ_{i<j; i,j=1..k} m_ij ] / C(k,2) ,
where m_ij is the Euclidean distance between the mathematical expectations of the i-th and j-th patterns, and C(k,2) = k(k-1)/2 is the number of pattern pairs.
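The measure Q is just the mean pairwise distance between the class mean vectors; a minimal sketch (the example mean vectors are invented for illustration):

```python
import math
from itertools import combinations

def recognition_complexity_Q(means):
    """Average Euclidean distance between the mathematical
    expectations (mean vectors) of all pattern pairs: the measure Q.
    The denominator is C(k,2), the number of pairs."""
    pairs = list(combinations(means, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Three hypothetical pattern means in a 2-D property space.
means = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(recognition_complexity_Q(means))  # (5 + 10 + 5) / 3
```

Larger Q means the pattern centres are farther apart on average, i.e. recognition is easier under the stated normality assumptions.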
     In terms of information theory, the entropy H of the probability density distributions of the patterns serves as the measure of recognition complexity.
     But in real problems the laws of distribution of pattern realisations are usually unknown, and the volume of the training set is often small. Under such conditions it is reasonable to use methods that do not require constructing distribution models and that rely on the concrete objects present in the training set A.
     From these «precedents» a decision rule is constructed (for example, the rule of k nearest neighbours), a control sequence is recognised, and the informativeness of a separate property or of a system of properties is estimated from the number of errors obtained. Other methods of estimating property informativeness are possible as well.
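This precedent-based estimate can be sketched directly: build a k-nearest-neighbours rule on the training set and count errors on a control sequence. The toy data below are invented; in practice train and control would hold realisations described by the candidate property system:

```python
import math

def knn_error_rate(train, control, k=3):
    """Estimate the informativeness of a property system from
    'precedents': classify each control object by majority vote of
    its k nearest training neighbours and return the error rate.
    train and control are lists of (feature_vector, label) pairs."""
    errors = 0
    for x, label in control:
        neigh = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
        labels = [l for _, l in neigh]
        vote = max(set(labels), key=labels.count)
        if vote != label:
            errors += 1
    return errors / len(control)

# Two well-separated 1-D patterns: the property is informative,
# so the control error rate should be low.
train = [((0.0,), "a"), ((0.2,), "a"), ((5.0,), "b"), ((5.2,), "b")]
control = [((0.1,), "a"), ((5.1,), "b")]
print(knn_error_rate(train, control, k=3))
```

A low error rate on the control sequence indicates that the tested property system is informative for the given patterns S.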
The compactness hypothesis gives a basis for estimating the informativeness of the property space by revealing its compactness characteristics. For good pattern recognition it is desirable that the distances between points of the same pattern be small and the distances between points of different patterns be as large as possible. The compactness (density) Di of pattern i, represented in the training set by mi points 1, 2, ..., t, ..., l, ..., mi, may be characterised by the average length r(t,l) of the edges of the complete graph connecting them:
     Di = [ Σ_{t<l; t,l=1..mi} r(t,l) ] / C(mi,2) .
   Similarly, the compactness of the mj points 1, 2, ..., s, ..., v, ..., mj representing pattern j will be Dj. The remoteness of patterns in the property space may be estimated by the average distance Dij between all pairs of points belonging to different patterns.
     On the basis of the above, the informativeness of the property space is the higher, the higher the value

J = Dij · |Di - Dj| / (Di + Dj) .
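The quantities Di, Dj, Dij and the criterion J can be computed straightforwardly from the training points; the code below implements the formulas as stated, with invented example point sets:

```python
import math
from itertools import combinations

def avg_pair_dist(points):
    """Compactness D of one pattern: average edge length of the
    complete graph on its training points."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def compactness_criterion(pts_i, pts_j):
    """Criterion J = Dij * |Di - Dj| / (Di + Dj), where Dij is the
    average distance between points of different patterns."""
    d_i = avg_pair_dist(pts_i)
    d_j = avg_pair_dist(pts_j)
    d_ij = sum(math.dist(a, b) for a in pts_i for b in pts_j) \
           / (len(pts_i) * len(pts_j))
    return d_ij * abs(d_i - d_j) / (d_i + d_j)

# Hypothetical 2-D training points of two patterns.
pts_i = [(0.0, 0.0), (0.0, 2.0)]    # Di = 2
pts_j = [(0.0, 10.0), (0.0, 14.0)]  # Dj = 4
print(compactness_criterion(pts_i, pts_j))
```

In this example Dij = 11, so J = 11 · |2 - 4| / (2 + 4) = 11/3.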
  An estimate of property informativeness may also be obtained directly in the process of constructing the decision rule, in the form of a tree of dichotomous separations of the sample by separate properties.
     Suppose we can divide a property into two gradations only: x < s and x >= s. Let us examine the composition of the realisations in these gradations. If m_i1 realisations of the i-th pattern and m_v1 realisations of the v-th pattern fall into the first gradation, then the non-uniformity of the composition of this gradation may be estimated by the value
     R1 = Σ_{i=1..k-1} Σ_{v=i+1..k} m_i1 · m_v1 .
     In a similar way the non-uniformity R2 of the composition of the second gradation can be found. The value Rs = R1 + R2 characterises the informativeness of the property x for the threshold s separating the two gradations. By varying the threshold s it is possible to find the value at which Rs reaches its minimum R'. If the initial indefiniteness is estimated by
     R0 = Σ_{i=1..k-1} Σ_{v=i+1..k} m_i · m_v ,
then the decrease of indefiniteness after extracting the information carried by the property x, i.e. the informativeness of the property x, is estimated by the value Ix = (R0 - R') / R0. If R' = 0, the informativeness Ix of the property is maximal and equal to 1. If R' has not decreased the initial indefiniteness (R' = R0), then Ix = 0 and it is natural to accept the property x as non-informative.
     If it is known that the properties are independent of each other, then it is possible to estimate, by one of the methods described above, the informativeness of all g properties of the initial system and then select the n most informative of them. But in real data tables dependence between properties is observed very often. If the properties are dependent, the selection of the most informative subsystem cannot be based on estimates of their individual informativeness: it is necessary to run over and test for informativeness their C(g,n) combinations. To reduce this running over, heuristic algorithms of directed running over have been developed, which provide a solution close to the optimal one in acceptable time.
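One common form of directed running over is greedy forward selection: grow the subsystem one property at a time, each time adding the property that maximises the informativeness of the whole combination. The source does not specify a particular algorithm, so this is only one plausible sketch; the joint-informativeness score below is hypothetical:

```python
def forward_selection(all_features, informativeness, n):
    """Directed (greedy forward) running over: instead of testing all
    C(g,n) combinations, evaluate at most g + (g-1) + ... subsets.
    informativeness(subset) scores a whole combination; the result
    is close to optimal, not guaranteed optimal."""
    chosen = []
    remaining = list(all_features)
    while len(chosen) < n and remaining:
        best = max(remaining, key=lambda f: informativeness(chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical score: features sharing a first letter are redundant,
# so adding a dependent feature contributes nothing jointly.
score = lambda s: len(set(f[0] for f in s))
feats = ["a1", "a2", "b1", "c1"]
print(forward_selection(feats, score, n=2))
```

Note how the greedy step skips the redundant «a2» even though, individually, it scores the same as «b1»: joint evaluation is what handles dependent properties.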