ДАИО

EN\|RU About Journal Editorial Board Publishing Ethics Author Guide Submission Prepare Your Manuscript Peer review Archive English version: Journal of Applied and Industrial Mathematics, 2020, 14:2, 242-248
	Volume 27, No 2, 2020, P. 5-16 UDC 519.254 I. A. Borisova Computational complexity of the problem of choosing typical representatives in a 2-clustering of a finite set of points in a metric space Abstract: We consider the computational complexity of one extremal problem of choosing a subset of $p$ points from some given 2-clustering of a finite set in a metric space. The chosen subset of points has to describe the given clusters in the best way from the viewpoint of some geometric criterion. This is a formalization of an applied problem of data mining which consists in finding a subset of typical representatives of a dataset composed of two classes based on the function of rival similarity. The problem is proved to be NP-hard. To this end, we polynomially reduce to the problem one of the well-known problems NP-hard in the strong sense, the $p$-median problem. Bibliogr. 15. Keywords: NP-hard problem, typical representative, rival similarity, $p$-median problem, data mining. DOI: 10.33048/daio.2020.27.631 Irina A. Borisova ^1,2 1. Sobolev Institute of Mathematics, 4 Koptyug Ave., 630090 Novosibirsk, Russia 2. Novosibirsk State University, 2 Pirogov St., 630090 Novosibirsk, Russia e-mail: biamia@mail.ru Received September 10, 2018 Revised December 24, 2019 Accepted February 19, 2020 References [1] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness (Freeman, San Francisco, 1979). [2] D. Aloise, A. Deshpande, P. Hansen, and P. Popat, NP-hardness of Euclidean sum-of-squares clustering, Mach. Learn. 75 (2), 245–248 (2009). [3] S. Dasgupta, The hardness of $k$-means clustering, in Technical report CS2007-0890 (Univ. California, San Diego, 2008), pp. 1–6. [4] C. H. Papadimitriou, Worst-case and probabilistic analysis of a geometric location problem, SIAM J. Comput. 10 (3), 542–557 (1981). [5] S. Har-Peled and S. Mazumdar, Coresets for k-means and k-median clustering and their applications, in Proc. 36th Annu. ACM Sympos. Theory Comput., Chicago, IL, USA, June 13–15, 2004 (ACM, New York, 2004), pp. 291–300. [6] L. Kaufman and P. J. Rousseeuw, Clustering by means of medoids, in Statistical Data Analysis Based on the $L_1$-Norm and Related Methods (North Holland, Amsterdam, 1987), pp. 405–416. [7] A. V. Kel’manov, A. V. Pyatkin, and V. I. Khandeev, On the complexity of some max–min clustering problems, Tr. Inst. Mat. Mekh. UrO RAN 24 (4), 189–198 (2018) [Russian]. [8] A. V. Kel’manov and A. V. Pyatkin, NP-hardness of some Euclidean problems of partition of a finite set of points, Zh. Vychisl. Mat. Mat. Fiz. 58 (5), 852–856 (2018) [Russian]. [9] H. Aggarwal, N. Imai, N. Katoh, and S. Suri, Finding k points with minimum diameter and related problems, J. Algorithms 12 (1), 38–56 (1991). [10] A. V. Zukhba, NP-completeness of the problem of prototype selection in the nearest neighbor method, Pattern Recognit. Image Anal. 20 (4), 484–494 (2010). [11] S. Banerjee, S. Bhore, and R. Chitnis, Algorithms and hardness results for nearest neighbor problems in bicolored point sets, in Proc. 13th Latin Amer. Theor. Inform. Symp., Buenos Aires, Argentina, Apr. 16–19, 2018 (Springer, Cham, 2018), pp. 80–93 (Lect. Notes Comput. Sci., Vol. 10807). [12] V. N. Vapnik, The Restoration of Dependencies from Empirical Data (Nauka, Moscow, 1974) [Russian]. [13] C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining Knowl. Discov. 2 (2), 121–167 (1998). [14] N. G. Zagoruiko, I. A. Borisova, V. V. Dyubanov, and O. A. Kutnenko, Methods of recognition based on the function of rival similarity, Pattern Recognit. Image Anal. 18 (1), 1–6 (2008). [15] O. Kariv and S. Hakimi, An algoritmic approach to network location problems. II: The $p$-medians, SIAM J. Appl. Math., 37 (3), 539–560 (1979).

	© Sobolev Institute of Mathematics, 2015

References