|
DESCRIPTION
OF FOREL FAMILY ALGORITHMS
FOREL family algorithms apply criterion F, based on the
compactness
hypothesis: one taxon must include objects, similar in their
properties to some «central» object. The measure of similarity
is defined by Euclid distance in the space of properties.
If the coordinates of the j-th taxon centre will be denoted by the symbol
cj, then the sum of distances r between the centre and all lj points xj
of this taxon will be equal to
lj
rj = Sr
(cj,xj), and the sum of such internal distances for all k taxons will
j=1 k
be
F = Srj.
j=1
It is necessary to find the separation of m objects into k taxons, providing
the minimum value of F. Let us describe the basic version and some modifications
of FOREL algorithm, proposed for the search for such separation [1].
FOREL
algorithm.
The taxons,
produced by means of this algorithm has the spherical shape. In the beginning
the object properties are normalised to put their values to the range (0--1).
Then the hypersphere with minimum radius R0, including all m points, is
constructed. If we need one taxon, then it is represented by this sphere.
To obtain more taxons the radius of the sphere is gradually decreasing.
The radius of 0.9R0 is taken and the sphere centre is placed to any of
these points. Finding the points the distance to which is lower than the
radius, we calculate the coordinates of the centre of gravity of these
internal points. Then, shifting the centre of the sphere to the centre
of gravity, we find again the internal points. The sphere looks like floating
to the place of local clotting of points. This procedure of inner points
detection and shifting the sphere centre is performed until the moment,
when the sphere will stop, i.e. when the composition of inner points, and,
subsequently, their centre of gravity, will become established. It means
that the sphere stopped in the region of local maximum of point’s density
in the property space.
We declare that the points, remaining inside the stopped sphere, belong
to the taxon number 1 and exclude them from the further investigation.
For other points the prescribed procedure is repeated until inclusion of
all points into taxons.
If the amount
of obtained taxons is below the desired value, we repeat the taxonomy process
with the sphere radius by 10% lower than preceding one.
It was proved that the algorithm requires finite number of iterations.
FOREL-2
algorithm.
This modification
of the prescribed basic algorithm is proposed for taxonomy with obtaining
of strictly fixed number of taxons k. Here the sphere radius is increased
or decreased, according to the necessity, by the value of R, which at every
following i-th iteration becomes lower and lower, e.g. is decreased by
a factor of two. This method of consecutive approaching permits to quickly
find the step, providing the obtained number of taxons ki equal to k.
The functional of the taxonomy quality in this algorithm looks as follows:
k
F = f(ki)* S rj
, where f(ki) = 1, if ki=k
and f
j=1
SKAT
algorithm.
If in
the result of multiply random selection of the starting point the big number
of different taxonomies is received and if the taxons significantly differ
from each other by number of the included points, then it may mean, that
our material contains, along with few local clots of points, the single
points and their small clots, dispersed randomly in the space between large
clots. There appears the feeling, that there are few «independent»
taxons and some number of randomly formed, «dependent» taxons
and would be reasonable to attach them to the nearest «independent»
ones.
SKAT software is supplied by the multitude of m objects and the results
of its taxonomy S, performed by means of FOREL algorithm with sphere radius,
equal to R. The taxonomy procedures are repeated with the same sphere radius,
but with the difference, that the starting points are selected in the centres,
obtained in taxonomy S and new taxonomy and new taxonomy is performed with
participation of all m points.
It results
in discovery of unstable taxons, which are rolling down to taxons - predecessors.
The solution is given in a form of list of stable taxons and indication
of those unstable taxons, which are attracted to them. If we want to limit
ourselves by stable taxons only, then we must strive to the taxonomy variant,
providing the maximum number of points l in stable taxons, i.e. to maximise
the functional:
k
F = Slj*f(j),
where f(j) = 1, if j-th taxon is stable
and f(j) = 0, if
unstable.
j=1
BIGFOR
algorithm.
Sometimes
one can meet such large tables, when it is impossible to put them in a
whole into the computer RAM and it is necessary to treat them by parts.
Iterative algorithm BIGFOR is created for taxonomy of such arrays.
In the
beginning the computer memory is loaded by an array of V objects (V<<m),
which is divided into k’ taxons by FOREL-2 algorithm. Description of taxon
consists of the coordinates of its centre and number of its internal points
lj’ («weight» of the centre). Then the next portion of data
is input into the memory and the same procedure repeats. After t times
repeating of these steps (where t=m/V) we receive the array of q=t*k’ points,
representing the centres of taxons, which appeared at each step. Grouping
of these points into k taxons is performed by means of FOREL-2 algorithm
modification, accounting for the weight of the internal points in calculation
of the centre of gravity coordinates.
If the number of initial objects m is too high, the procedure of taxon
enlargement may contain not only two, but many stages, the taxons «rolling
down» into more and more large «balls» is performed few
times.
The criterion of quality for the taxonomy, obtained by BIGFOR algorithm,
is the same as for FOREL algorithm.
BIGFOR algorithm may be applied for the hierarchic taxonomy. At the first
stage the small radius R>0, giving k1<m taxons of the first level, is
selected. At the second stage the confluence of closed to each other small
taxons into more large ones occurs, resulting in the appearance of k2 taxons
of the second level (k2<k1). If we will repeat these steps, then at
some step p we will obtain the taxonomy, uniting all points into the only
one taxon.
Relations between
taxons of different levels may be represented in a form of hierarchic structure
or tree, consisting of m objects at zero level («leaves» level)
and ki taxons (i=1,…,p) at each of p-th level.
The example of FOREL algorithm operation stages may be observed at the
demonstration example.
REFERENCES
1. Zagoruiko
N.G., Elkina V.N., Lbov G.S., Emelianov S.V. OTEKS applied software. «Finances
and statistics», Moscow, 1986. |
|