EN|RU
English version:
Journal of Applied and Industrial Mathematics, 2020, 14:2, 242-248

Volume 27, No 2, 2020, P. 5-16

UDC 519.254
I. A. Borisova
Computational complexity of the problem of choosing typical representatives in a 2-clustering of a finite set of points in a metric space

Abstract:
We consider the computational complexity of one extremal problem of choosing a subset of $p$ points from some given 2-clustering of a finite set in a metric space. The chosen subset of points has to describe the given clusters in the best way from the viewpoint of some geometric criterion. This is a formalization of an applied problem of data mining which consists in finding a subset of typical representatives of a dataset composed of two classes based on the function of rival similarity. The problem is proved to be NP-hard. To this end, we polynomially reduce to the problem one of the well-known problems NP-hard in the strong sense, the $p$-median problem.
Bibliogr. 15.

Keywords: NP-hard problem, typical representative, rival similarity, $p$-median problem, data mining.

DOI: 10.33048/daio.2020.27.631

Irina A. Borisova 1,2
1. Sobolev Institute of Mathematics,
4 Koptyug Ave., 630090 Novosibirsk, Russia
2. Novosibirsk State University,
2 Pirogov St., 630090 Novosibirsk, Russia
e-mail: biamia@mail.ru

Received September 10, 2018
Revised December 24, 2019
Accepted February 19, 2020

References

[1] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness (Freeman, San Francisco, 1979).

[2] D. Aloise, A. Deshpande, P. Hansen, and P. Popat, NP-hardness of Euclidean sum-of-squares clustering, Mach. Learn. 75 (2), 245–248 (2009).

[3] S. Dasgupta, The hardness of $k$-means clustering, in Technical report CS2007-0890 (Univ. California, San Diego, 2008), pp. 1–6.

[4] C. H. Papadimitriou, Worst-case and probabilistic analysis of a geometric location problem, SIAM J. Comput. 10 (3), 542–557 (1981).

[5] S. Har-Peled and S. Mazumdar, Coresets for k-means and k-median clustering and their applications, in Proc. 36th Annu. ACM Sympos. Theory Comput., Chicago, IL, USA, June 13–15, 2004 (ACM, New York, 2004), pp. 291–300.

[6] L. Kaufman and P. J. Rousseeuw, Clustering by means of medoids, in Statistical Data Analysis Based on the $L_1$-Norm and Related Methods (North Holland, Amsterdam, 1987), pp. 405–416.

[7] A. V. Kel’manov, A. V. Pyatkin, and V. I. Khandeev, On the complexity of some max–min clustering problems, Tr. Inst. Mat. Mekh. UrO RAN 24 (4), 189–198 (2018) [Russian].

[8] A. V. Kel’manov and A. V. Pyatkin, NP-hardness of some Euclidean problems of partition of a finite set of points, Zh. Vychisl. Mat. Mat. Fiz. 58 (5), 852–856 (2018) [Russian].

[9] H. Aggarwal, N. Imai, N. Katoh, and S. Suri, Finding k points with minimum diameter and related problems, J. Algorithms 12 (1), 38–56 (1991).

[10] A. V. Zukhba, NP-completeness of the problem of prototype selection in the nearest neighbor method, Pattern Recognit. Image Anal. 20 (4), 484–494 (2010).

[11] S. Banerjee, S. Bhore, and R. Chitnis, Algorithms and hardness results for nearest neighbor problems in bicolored point sets, in Proc. 13th Latin Amer. Theor. Inform. Symp., Buenos Aires, Argentina, Apr. 16–19, 2018 (Springer, Cham, 2018), pp. 80–93 (Lect. Notes Comput. Sci., Vol. 10807).

[12] V. N. Vapnik, The Restoration of Dependencies from Empirical Data (Nauka, Moscow, 1974) [Russian].

[13] C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining Knowl. Discov. 2 (2), 121–167 (1998).

[14] N. G. Zagoruiko, I. A. Borisova, V. V. Dyubanov, and O. A. Kutnenko, Methods of recognition based on the function of rival similarity, Pattern Recognit. Image Anal. 18 (1), 1–6 (2008).

[15] O. Kariv and S. Hakimi, An algoritmic approach to network location problems. II: The $p$-medians, SIAM J. Appl. Math., 37 (3), 539–560 (1979).
 © Sobolev Institute of Mathematics, 2015