DESCRIPTION OF FOREL FAMILY ALGORITHMS

     FOREL family algorithms apply criterion F, based on the 
compactness hypothesis: one taxon must include objects, similar in their properties to some «central» object. The measure of similarity is defined by Euclid distance in the space of properties.  If the coordinates of the j-th taxon centre will be denoted by the symbol cj, then the sum of distances r between the centre and all lj points xj of this taxon will be equal  to  
       lj 
rj = Sr (cj,xj), and the sum of such internal distances for all k taxons will  
     j=1            k 
be          F = Srj. 
                   j=1 
     It is necessary to find the separation of m objects into k taxons, providing the minimum value of F. Let us describe the basic version and some modifications of FOREL algorithm, proposed for the search for such separation [1]. 
  

 FOREL algorithm.

    The taxons, produced by means of this algorithm has the spherical shape. In the beginning the object properties are normalised to put their values to the range (0--1). Then the hypersphere with minimum radius R0, including all m points, is constructed. If we need one taxon, then it is represented by this sphere. To obtain more taxons the radius of the sphere is gradually decreasing. The radius of 0.9R0 is taken and the sphere centre is placed to any of these points. Finding the points the distance to which is lower than the radius, we calculate the coordinates of the centre of gravity of these internal points. Then, shifting the centre of the sphere to the centre of gravity, we find again the internal points. The sphere looks like floating to the place of local clotting of points. This procedure of inner points detection and shifting the sphere centre is performed until the moment, when the sphere will stop, i.e. when the composition of inner points, and, subsequently, their centre of gravity, will become established. It means that the sphere stopped in the region of local maximum of point’s density in the property space. 
     We declare that the points, remaining inside the stopped sphere, belong to the taxon number 1 and exclude them from the further investigation. For other points the prescribed procedure is repeated until inclusion of all points into taxons. 
   If the amount of obtained taxons is below the desired value, we repeat the taxonomy process with the sphere radius by 10% lower than preceding one. 
     It was proved that the algorithm requires finite number of iterations. 
  

FOREL-2 algorithm.

    This modification of the prescribed basic algorithm is proposed for taxonomy with obtaining of strictly fixed number of taxons k. Here the sphere radius is increased or decreased, according to the necessity, by the value of R, which at every following i-th iteration becomes lower and lower, e.g. is decreased by a factor of two. This method of consecutive approaching permits to quickly find the step, providing the obtained number of taxons ki equal to k. 
     The functional of the taxonomy quality in this algorithm looks as follows: 
                        k                  
         F = f(ki)* S rj ,  where f(ki) = 1,  if ki=k    and f  
                      j=1                

  
SKAT algorithm.

    If in the result of multiply random selection of the starting point the big number of different taxonomies is received and if the taxons significantly differ from each other by number of the included points, then it may mean, that our material contains, along with few local clots of points, the single points and their small clots, dispersed randomly in the space between large clots. There appears the feeling, that there are few «independent» taxons and some number of randomly formed, «dependent» taxons and would be reasonable to attach them to the nearest «independent» ones. 
     SKAT software is supplied by the multitude of m objects and the results of its taxonomy S, performed by means of FOREL algorithm with sphere radius, equal to R. The taxonomy procedures are repeated with the same sphere radius, but with the difference, that the starting points are selected in the centres, obtained in taxonomy S and new taxonomy and new taxonomy is performed with participation of all m points. 
   It results in discovery of unstable taxons, which are rolling down to taxons - predecessors. The solution is given in a form of list of stable taxons and indication of those unstable taxons, which are attracted to them. If we want to limit ourselves by stable taxons only, then we must strive to the taxonomy variant, providing the maximum number of points l in stable taxons, i.e. to maximise the functional: 
        k                         
 F = Slj*f(j),  where f(j) =  1, if j-th taxon is stable  and f(j) =  0, if  unstable. 
      j=1  
 

BIGFOR algorithm.

    Sometimes one can meet such large tables, when it is impossible to put them in a whole into the computer RAM and it is necessary to treat them by parts. Iterative algorithm BIGFOR is created for taxonomy of such arrays. 
    In the beginning the computer memory is loaded by an array of V objects (V<<m), which is divided into k’ taxons by FOREL-2 algorithm. Description of taxon consists of the coordinates of its centre and number of its internal points lj’ («weight» of the centre). Then the next portion of data is input into the memory and the same procedure repeats. After t times repeating of these steps (where t=m/V) we receive the array of q=t*k’ points, representing the centres of taxons, which appeared at each step. Grouping of these points into k taxons is performed by means of FOREL-2 algorithm modification, accounting for the weight of the internal points in calculation of the centre of gravity coordinates. 
     If the number of initial objects m is too high, the procedure of taxon enlargement may contain not only two, but many stages, the taxons «rolling down» into more and more large «balls» is performed few times. 
     The criterion of quality for the taxonomy, obtained by BIGFOR algorithm, is the same as for FOREL algorithm. 
     BIGFOR algorithm may be applied for the hierarchic taxonomy. At the first stage the small radius R>0, giving k1<m taxons of the first level, is selected. At the second stage the confluence of closed to each other small taxons into more large ones occurs, resulting in the appearance of k2 taxons of the second level (k2<k1). If we will repeat these steps, then at some step p we will obtain the taxonomy, uniting all points into the only one taxon. 
 Relations between taxons of different levels may be represented in a form of hierarchic structure or tree, consisting of m objects at zero level («leaves» level) and ki taxons (i=1,…,p) at each of p-th level. 
     The example of FOREL algorithm operation stages may be observed at the demonstration example. 


REFERENCES

1. Zagoruiko N.G., Elkina V.N., Lbov G.S., Emelianov S.V. OTEKS applied software. «Finances and statistics», Moscow, 1986.