SELECTION OF THE INFORMATIVE PROPERTIES SYSTEM
Informativeness of a property is a relative concept. The same system of properties may be informative for the solution of one recognition problem and uninformative for another. For example, candidates for participation in mathematical and in sporting Olympiads must be selected according to different features. The estimation of informativeness depends upon the list of patterns to be recognised, $S = \langle s_1, s_2, \ldots, s_i, \ldots, s_k \rangle$, i.e. upon what must be distinguished from what. It also depends upon the decision functions D. Thus, it is impossible to indicate «typical» or «often used» features: for every problem its own set of informative describing properties $X_n$ must be found.
The initial set of properties (the system $X_g$) is created in a non-formalised way, on the basis of expert experience and intuition. Formal methods are then applied to test this initial system for sufficiency and necessity. Among all W possible systems of features we accept a system w as sufficient if, for the given values of S, D and the training set A, it provides expenses N not exceeding a given limit $N_0$. Expenses N are understood here as the sum of the price of measuring the properties ($N_x$) and the price of the losses caused by recognition errors ($N_r$): $N = N_x + N_r$. The system of minimum complexity (price) is accepted as necessary. So, in fact, we solve a combinatorial search problem of the following type:

$$w = \arg\min_{X} N(X) \;\big|\; S, D, A, N_0.$$
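As a minimal sketch of this cost criterion (the function and argument names here are hypothetical, and the error-rate estimate is assumed to come from recognising objects of the training set A):

```python
import numpy as np

def expenses(subset, price, error_rate, loss_per_error):
    """Total expenses N = Nx + Nr for a candidate system of properties.

    subset         -- indices of the properties included in the system
    price          -- measurement price of each of the g initial properties
    error_rate     -- recognition error rate estimated with this system
    loss_per_error -- price of the losses caused by one recognition error
    """
    n_x = float(np.sum(price[list(subset)]))  # price of measurement, Nx
    n_r = error_rate * loss_per_error         # price of losses from errors, Nr
    return n_x + n_r
```

A system w is then sufficient when its expenses do not exceed $N_0$, and among the sufficient systems the cheapest one is accepted as necessary.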
Criteria of properties informativeness.
The key criterion of properties informativeness in the problem of pattern recognition is, of course, the value of the losses from errors, R. Even if the distributions of the general population are known, the calculation of the losses R requires a large amount of computer time. For this reason more simply calculated criteria are used, which at the same time correlate closely, if not directly, with the estimate of the losses R.
If the distribution of each pattern's realisations is governed by the normal law with diagonal and equal covariance matrices (the surfaces of equal density are spheres of equal radius), then the average Euclidean distance between the mathematical expectations of all pairs of patterns may be taken as the measure of recognition complexity Q:

$$Q = \left[\sum_{i=1}^{k-1}\sum_{j=i+1}^{k} m_{ij}\right] \Big/ C_k^2,$$

where $m_{ij}$ is the Euclidean distance between the mathematical expectations of the i-th and j-th patterns.
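A minimal sketch of this measure, computing Q from the class means (the names here are illustrative, not taken from the text):

```python
import numpy as np
from itertools import combinations

def recognition_complexity(means):
    """Q: average Euclidean distance between the mathematical
    expectations (class means) of all C(k, 2) pairs of patterns."""
    pairs = list(combinations(range(len(means)), 2))
    total = sum(np.linalg.norm(means[i] - means[j]) for i, j in pairs)
    return total / len(pairs)

# Usage: k = 3 patterns in a two-dimensional property space
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
print(recognition_complexity(means))  # larger Q: patterns lie farther apart
```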
In terms of information theory, the entropy H of the probability density distributions of the patterns serves as the measure of recognition complexity.
But in real problems the laws of distribution of the pattern realisations are usually unknown, and the volume of the training set is often small. In such conditions it is reasonable to use methods that do not require the construction of distribution models and that rely upon the concrete objects present in the training set A. A decision rule (for example, the rule of k nearest neighbours) is constructed from these «precedents», a control sequence is recognised, and the informativeness of a separate property or of a system of properties is estimated from the number of errors obtained. Other methods of estimating the informativeness of properties are possible as well.
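A minimal sketch of this «precedent»-based estimate, assuming scikit-learn is available and using cross-validation in place of a separate control sequence:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def error_rate(X, y, feature_idx, k=3, cv=5):
    """Recognise held-out objects by the rule of k nearest neighbours,
    using only the listed properties; return the error rate obtained."""
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X[:, feature_idx], y, cv=cv)
    return 1.0 - scores.mean()  # lower error rate: more informative system
```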
The hypothesis of compactness gives us a basis for estimating the informativeness of the property space by revealing its compactness characteristics. It means that for good pattern recognition it is desirable that the distances between points of the same pattern be small and that the distances between points of different patterns be as large as possible. The compactness (density) $D_i$ of pattern i, presented in the training set by $m_i$ points $1, 2, \ldots, t, \ldots, l, \ldots, m_i$, may be characterised by the average length $r(t,l)$ of the edges of the full graph connecting them:

$$D_i = \left[\sum_{t=1}^{m_i-1}\sum_{l=t+1}^{m_i} r(t,l)\right] \Big/ C_{m_i}^2.$$
Similarly, the compactness of the $m_j$ points $1, 2, \ldots, s, \ldots, v, \ldots, m_j$ presenting the pattern j will be equal to $D_j$. The remoteness $D_{ij}$ of the patterns in the property space may be estimated by the average distance between all pairs of points taken from different patterns. On the basis of the aforementioned, the informativeness of the property space will be the higher, the higher is the value

$$J = D_{ij} \cdot |D_i - D_j| \,/\, (D_i + D_j).$$
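A minimal sketch of these compactness characteristics for two patterns, implementing the J formula exactly as written above:

```python
import numpy as np
from itertools import combinations

def compactness(points):
    """Average edge length of the full graph over one pattern's points."""
    pairs = list(combinations(range(len(points)), 2))
    return sum(np.linalg.norm(points[t] - points[l]) for t, l in pairs) / len(pairs)

def remoteness(a, b):
    """Average distance between all pairs of points from different patterns."""
    return float(np.mean([np.linalg.norm(p - q) for p in a for q in b]))

def space_informativeness(a, b):
    """J = D_ij * |D_i - D_j| / (D_i + D_j), as given in the text."""
    d_i, d_j = compactness(a), compactness(b)
    return remoteness(a, b) * abs(d_i - d_j) / (d_i + d_j)
```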
An estimation of the informativeness of properties may also be obtained directly in the process of constructing a decision rule in the form of a tree of dichotomous separations of the sample by separate properties. Imagine that we have the possibility to separate a property into two gradations only: $x < s$ and $x \ge s$. Let us observe the composition of the realisations in these gradations. If $m_{i1}$ realisations of the i-th pattern and $m_{v1}$ realisations of the v-th pattern are discovered in the first gradation, then the non-uniformity of the composition of this gradation may be estimated by the value

$$R_1 = \sum_{i=1}^{k-1}\sum_{v=i+1}^{k} m_{i1}\, m_{v1}.$$
In a similar way it is possible to find the non-uniformity $R_2$ of the composition of the second gradation. The value $R_s = R_1 + R_2$ characterises the informativeness of the property x for the threshold s of the separation into two gradations. Changing the threshold value s, it is possible to find the value at which $R_s$ reaches its minimum $R'$. If the initial indefiniteness is estimated by

$$R_0 = \sum_{i=1}^{k-1}\sum_{v=i+1}^{k} m_i\, m_v,$$
then the decrease of indefiniteness after recovering the information carried by the property x, i.e. the informativeness of the property x, is estimated by the value $I_x = (R_0 - R')/R_0$. If $R' = 0$, the informativeness $I_x$ of the property is maximal and equal to 1. If $R'$ has not decreased the initial indefiniteness ($R' = R_0$), then $I_x = 0$, and it is natural to accept the property x as a non-informative one.
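A minimal sketch of this estimate for a single property, scanning all candidate thresholds (function names are illustrative):

```python
import numpy as np
from itertools import combinations

def pair_nonuniformity(labels):
    """Sum of m_i * m_v over all pattern pairs i < v in one gradation;
    labels are integer pattern indices of the objects in the gradation."""
    counts = np.bincount(labels)
    return sum(counts[i] * counts[v]
               for i, v in combinations(range(len(counts)), 2))

def threshold_informativeness(x, labels):
    """I_x = (R0 - R') / R0, where R' is the minimum of Rs = R1 + R2
    over all separation thresholds s between observed values of x."""
    r0 = pair_nonuniformity(labels)
    if r0 == 0:                            # only one pattern is present
        return 0.0
    best = r0
    for s in np.unique(x)[1:]:             # candidate thresholds s
        r1 = pair_nonuniformity(labels[x < s])
        r2 = pair_nonuniformity(labels[x >= s])
        best = min(best, r1 + r2)          # R' = min over s of Rs
    return (r0 - best) / r0
```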
If it is known that the properties do not depend upon each other, then it is possible to estimate, by means of one of the methods described above, the informativeness of all g properties of the initial system and then to select from them the n most informative ones. But in real data tables dependence between properties is observed very often. And if the properties are dependent, then the selection of the most informative subsystem should not be based on the estimation of their individual informativeness: it is necessary to run over and test for informativeness the $C_g^n$ combinations of properties. To reduce this running over, heuristic algorithms of directed running over have been developed, which provide a solution close to the optimal one in an acceptable time.
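As an illustration of directed running over, here is a minimal sketch of greedy forward selection (one common heuristic of this kind, not necessarily the algorithm meant in the text), where informativeness(X, y, subset) is any estimate from the sketches above in which larger values are better:

```python
def forward_selection(X, y, n, informativeness):
    """Directed running over: at every step add the property that,
    together with those already chosen, gives the best estimate.
    Tests on the order of g*n subsets instead of all C(g, n)."""
    chosen = []
    remaining = list(range(X.shape[1]))
    while len(chosen) < n and remaining:
        best = max(remaining,
                   key=lambda f: informativeness(X, y, chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Usage (hypothetical): score = 1 - error rate of the k-nearest-neighbours rule
# subset = forward_selection(X, y, n=5,
#                            informativeness=lambda X, y, s: 1 - error_rate(X, y, s))
```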