Volume 22, No 4, 2015, P. 50-62
UDC 519.16+519.85
Kel’manov A. V., Khandeev V. I.
An exact pseudopolynomial algorithm for a bi-partitioning problem
Abstract:
We consider the strongly NP-hard problem of partitioning a set of Euclidean vectors into two sets (clusters) under the criterion of minimum sum-of-squared distances from the elements of clusters to their centers. The center of the first cluster is the average value of the vectors in the cluster, and the center of the second one is the origin. We prove that the problem is solvable in polynomial time in the case of fixed space dimension. We also present a pseudopolynomial algorithm which finds an optimal solution in the case of integer values of the components of the vectors in the input set and fixed space dimension.
Bibliogr. 27.
Keywords: bi-partitioning, vector subset, squared Euclidean distances, NP-hardness, exact pseudopolynomial algorithm.
DOI: 10.17377/daio.2015.22.463
Alexander V. Kel’manov 1,2
Vladimir I. Khandeev 1
1. Sobolev Institute of Mathematics
4 Koptyug Ave., 630090 Novosibirsk, Russia
2. Novosibirsk State University
2 Pirogov St., 630090 Novosibirsk, Russia
e-mail: kelm@math.nsc.ru, khandeev@math.nsc.ru
Received 16 September 2014
Revised 22 February 2015
References
[1] A. A. Ageev, A. V. Kel’manov, and A. V. Pyatkin, NP-hardness of the Euclidean max-cut problem, Dokl. Akad. Nauk, 456, No. 5, 511–513, 2014. Translated in Dokl. Math., 89, No. 3, 343–345, 2014.
[2] A. E. Baburin, E. Kh. Gimadi, N. I. Glebov, and A. V. Pyatkin, The problem of finding a subset of vectors with the maximum total weight, Diskretn. Anal. Issled. Oper., Ser. 2, 14, No. 1, 32–42, 2007. Translated in
J. Appl. Ind. Math., 2, No. 1, 32–38, 2008.
[3] A. E. Galashov and A. V. Kel’manov, A 2-approximate algorithm to solve one problem of the family of disjoint vector subsets, Avtom. Telemekh., No. 4, 5–19, 2014. Translated in Autom. Remote Control, 75, No. 4, 595–606, 2014.
[4] E. Kh. Gimadi, Yu. V. Glazkov, and I. A. Rykov, On two problems of choosing some subset of vectors with integer coordinates that has maximum norm of the sum of elements in an Euclidean space, Diskretn. Anal. Issled. Oper., 15, No. 4, 30–43, 2008. Translated in J. Appl. Ind. Math., 3, No. 3, 343–352, 2009.
[5] E. Kh. Gimadi, A. V. Kel’manov, M. A. Kel’manova, and S. A. Khamidullin, A posteriori detection of a quasiperiodic fragment with a given number of repetitions in a numerical sequence, Sib. Zh. Ind. Mat., 9, No. 1, 55–74, 2006.
[6] E. Kh. Gimadi, A. V. Pyatkin, and I. A. Rykov, On polynomial solvability of some problems of a vector subset choice in a Euclidean space of fixed dimension, Diskretn. Anal. Issled. Oper., 15, No. 6, 11–19, 2008. Translated in J. Appl. Ind. Math., 4, No. 1, 48–53, 2010.
[7] A. V. Dolgushev and A. V. Kel’manov, On the algorithmic complexity of a problem in cluster analysis, Diskretn. Anal. Issled. Oper., 17, No. 2, 39–45, 2010. Translated in J. Appl. Ind. Math., 5, No. 2, 191–194, 2011.
[8] A. V. Dolgushev and A. V. Kel’manov, An approximation algorithm for solving a problem of cluster analysis, Diskretn. Anal. Issled. Oper., 18, No. 2, 29–40, 2011. Translated in J. Appl. Ind. Math., 5, No. 4, 551–558, 2011.
[9] A. V. Dolgushev, A. V. Kel’manov, and V. V. Shenmaier, A polynomial approximation scheme for a problem of cluster analysis, in Doklady 9-i mezhdunarodnoi konferentsii “Intellektualizatsiya obrabotki informatsii” (Doklady 9th Int. Conf. “Intellectualization of Information Processing”), Budva,
Montenegro, Sept. 16–22, 2012, pp. 242–244, Torus Press, Moscow, 2012.
[10] I. I. Eremin, E. Kh. Gimadi, A. V. Kel’manov, A. V. Pyatkin, and M. Yu. Khachai, 2-Approximation algorithm for finding a clique with minimum weight of vertices and edges, Tr. Inst. Mat. Mekh., 19, No. 2, 134–143, 2013. Translated in Proc. Steklov Inst. Math., 284, Suppl. 1, S87–S95, 2014.
[11] A. V. Kel’manov, Off-line detection of a quasi-periodically recurring fragment in a numerical sequence, Tr. Inst. Mat. Mekh., 14, No. 2, 81–88, 2008. Translated in Proc. Steklov Inst. Math., 263, Suppl. 2, S84–S92, 2008.
[12] A. V. Kel’manov, On the complexity of some data analysis problems, Zh. Vychisl. Mat. Mat. Fiz., 50, No. 11, 2045–2051, 2010. Translated in Comput. Math. Math. Phys., 50, No. 11, 1941–1947, 2010.
[13] A. V. Kel’manov, On the complexity of some cluster analysis problems, Zh. Vychisl. Mat. Mat. Fiz., 51, No. 11, 2106–2112, 2011. Translated in Comput. Math. Math. Phys., 51, No. 11, 1983–1988, 2011.
[14] A. V. Kel’manov and A. V. Pyatkin, On the complexity of a search for a subset of “similar” vectors, Dokl. Akad. Nauk, 421, No. 5, 590–592, 2008. Translated in Dokl. Math., 78, No. 1, 574–575, 2008.
[15] A. V. Kel’manov and A. V. Pyatkin, Complexity of certain problems of searching for subsets of vectors and cluster analysis, Zh. Vychisl. Mat. Mat. Fiz., 49, No. 11, 2059–2067, 2009. Translated in Comput. Math. Math. Phys., 49, No. 11, 1966–1971, 2009.
[16] A. V. Kel’manov and A. V. Pyatkin, On complexity of some problems of cluster analysis of vector sequences, Diskretn. Anal. Issled. Oper., 20, No. 2, 47–57, 2013. Translated in J. Appl. Ind. Math., 7, No. 3, 363–369, 2013.
[17] A. V. Kel’manov, S. M. Romanchenko, and S. A. Khamidullin, Accurate pseudopolynomial-time algorithms for some NP-hard problems of searching for a vector subsequence, Zh. Vychisl. Mat. Mat. Fiz., 53, No. 1, 143–153, 2013.
[18] A. V. Kel’manov and V. I. Khandeev, A 2-approximation polynomial algorithm for a clustering problem, Diskretn. Anal. Issled. Oper., 20, No. 4, 36–45, 2013. Translated in J. Appl. Ind. Math., 7, No. 4, 515–521, 2013.
[19] A. V. Kel’manov and V. I. Khandeev, A randomized algorithm for two-cluster partition of a set of vectors, Zh. Vychisl. Mat. Mat. Fiz., 55, No. 2, 335–344, 2015. Translated in Comput. Math. Math. Phys., 55, No. 2, 330–339, 2015.
[20] D. Aloise, A. Deshpande, P. Hansen, and P. Popat, NP-hardness of Euclidean sum-of-squares clustering, Mach. Learn., 75, No. 2, 245–248, 2009.
[21] A. K. Jain, Data clustering: 50 years beyond K-means, Pattern Recognit. Lett., 31, No. 8, 651–666, 2010.
[22] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, San Francisco, 1979.
[23] P. Hansen and B. Jaumard, Cluster analysis and mathematical programming, Math. Program., Ser. A, 79, No. 1–3, 191–215, 1997.
[24] P. Hansen, B. Jaumard, and N. Mladenovic, Minimum sum of squares clustering in a low dimensional space, J. Classif., 15, No. 1, 37–55, 1998.
[25] M. Inaba, N. Katoh, and H. Imai, Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering, in Proc. 10th Symp. Comput. Geom., Stony Brook, NY, USA, June 6–8, 1994, pp. 332–339, ACM, New York, 1994.
[26] J. B. MacQueen, Some methods for classification and analysis of multivariate observations, in L. M. Le Cam and J. Neyman, eds., Proc. 5th Berkeley Symp. Math. Stat. Probab., Berkeley, USA, June 21 – July 18, 1965 and Dec. 27, 1965 – Jan. 7, 1966, Vol. 1, pp. 281–297, Univ. of California Press, Berkeley, 1967.
[27] M. R. Rao, Cluster analysis and mathematical programming, J. Am. Stat. Assoc., 66, 622–626, 1971. |