EN|RU
English version:
Journal of Applied and Industrial Mathematics, 2016, 10:2, 209-219

Volume 23, No 2, 2016, P. 21-40

UDC 519.16+519.85
A. V. Kel’manov, S. A. Khamidullin, and V. I. Khandeev
A fully polynomial-time approximation scheme for a sequence 2-cluster partitioinig problem

Abstract:
We consider a strongly NP-hard problem of partitioning a finite sequence of points in Euclidean space into two clusters minimizing the sum over both clusters of intra-cluster sum of squared distances from the clusters elements to their centers. The sizes of the clusters are fixed. The centroid of the first cluster is defined as the mean value of all vectors in the cluster, and the center of the second one is given in advance and is equal to 0. Additionally, the partition must satisfy the restriction that for all vectors in the first cluster the difference between the indices of two consequent points from this cluster is bounded from below and above by some given constants. We present a fully polynomial-time approximation scheme for the case of fixed space dimension.
Bibliogr. 27.

Keywords: partitioning, sequence, Euclidean space, minimum sum-of-squared distances, NP-hardness, FPTAS.

DOI: 10.17377/daio.2016.23.511

Alexander V. Kel’manov 1,2
Sergey A. Khamidullin 1

Vladimir I. Khandeev 1

1. Sobolev Institute of Mathematics,
4 Koptyug Ave., 630090 Novosibirsk, Russia
2. Novosibirsk State University,
2 Pirogov St., 630090 Novosibirsk, Russia
e-mail: kelm@math.nsc.ru, kham@math.nsc.ru, khandeev@math.nsc.ru

Received 15 September 2015
Revised 12 January 2016

References

[1] A. E. Baburin, E. Kh. Gimadi, N. I. Glebov, and A. V. Pyatkin, The problem of finding a subset of vectors with the maximum total weight, Diskretn. Anal. Issled. Oper., Ser. 2, 14, No. 1, 32–42, 2007. Translated in
J. Appl. Ind. Math., 2, No. 1, 32–38, 2008.

[2] E. Kh. Gimadi, Yu. V. Glazkov, and I. A. Rykov, On two problems of choosing some subset of vectors with integer coordinates that has maximum norm of the sum of elements in Euclidean space, Diskretn. Anal. Issled. Oper., 15, No. 4, 30–43, 2008. Translated in J. Appl. Ind. Math., 3, No. 3, 343–352, 2009.

[3] E. Kh. Gimadi, A. V. Pyatkin, and I. A. Rykov, On polynomial solvability of some problems of a vector subset choice in a Euclidean space of fixed dimension, Diskretn. Anal. Issled. Oper., 15, No. 6, 11–19, 2008. Translated in J. Appl. Ind. Math., 4, No. 1, 48–53, 2010.

[4] A. V. Dolgushev and A. V. Kel’manov, An approximation algorithm for solving a problem of cluster analysis, Diskretn. Anal. Issled. Oper., 18, No. 2, 29–40, 2011. Translated in J. Appl. Ind. Math., 5, No. 4, 551–558, 2011.

[5] A. V. Dolgushev, A. V. Kel’manov, and V. V. Shenmaier, Polynomial-time approximation scheme for a problem of partitioning a finite set into two clusters, Tr. Inst. Mat. Mekh., 21, No. 3, 100–109, 2015.

[6] A. V. Kel’manov, Off-line detection of a quasi-periodically recurring fragment in a numerical sequence, Tr. Inst. Mat. Mekh., 14, No. 2, 81–88, 2008. Translated in Proc. Steklov Inst. Math., 263, Suppl. 2, S84–S92, 2008.

[7] A. V. Kel’manov, On the complexity of some data analysis problems, Zh. Vychisl. Mat. Mat. Fiz., 50, No. 11, 2045–2051, 2010. Translated in Comput. Math. Math. Phys., 50, No. 11, 1941–1947, 2010.

[8] A. V. Kel’manov, On the complexity of some cluster analysis problems, Zh. Vychisl. Mat. Mat. Fiz., 51, No. 11, 2106–2112, 2011. Translated in Comput. Math. Math. Phys., 51, No. 11, 1983–1988, 2011.

[9] A. V. Kel’manov and A. V. Pyatkin, Complexity of certain problems of searching for subsets of vectors and cluster analysis, Zh. Vychisl. Mat. Mat. Fiz., 49, No. 11, 2059–2067, 2009. Translated in Comput. Math. Math. Phys., 49, No. 11, 1966–1971, 2009.

[10] A. V. Kel’manov and A. V. Pyatkin, On complexity of some problems of cluster analysis of vector sequences, Diskretn. Anal. Issled. Oper., 20, No. 2, 47–57, 2013. Translated in J. Appl. Ind. Math., 7, No. 3, 363–369, 2013.

[11] A. V. Kel’manov and S. M. Romanchenko, An FPTAS for a vector subset search problem, Diskretn. Anal. Issled. Oper., 21, No. 3, 41–52, 2014. Translated in J. Appl. Ind. Math., 8, No. 3, 329–336, 2014.

[12] A. V. Kel’manov and S. A. Khamidullin, Posterior detection of a given number of identical subsequences in a quasi-periodic sequence, Zh. Vychisl. Mat. Mat. Fiz., 41, No. 5, 807–820, 2001. Translated in Comput. Math. Math. Phys., 41, No. 5, 762–774, 2001.

[13] A. V. Kel’manov and S. A. Khamidullin, An approximating polynomial algorithm for a sequence partitioning problem, Diskretn. Anal. Issled. Oper., 21, No. 1, 53–66, 2014. Translated in J. Appl. Ind. Math., 8, No. 2, 236–244, 2014.

[14] A. V. Kel’manov and S. A. Khamidullin, An approximation polynomial-time algorithm for a sequence bi-clustering problem, Zh. Vychisl. Mat. Mat. Fiz., 55, No. 6, 1076–1085, 2015. Translated in Comput. Math. Math. Phys., 55, No. 6, 1068–1076, 2015.

[15] A. V. Kel’manov, S. A. Khamidullin, and V. I. Khandeev, An exact pseudopolynomial algorithm for a sequence bi-clustering problem, in Tezisy dokladov XV Vserossiiskoy konferentsii “Matematicheskoe programmirovanie i prilozheniya” (Abstr. XV All-Russ. Conf. “Mathematical Programming and
Applications”), Ekaterinburg, Russia, Mar. 2–6, 2015, pp. 139–140, Inst. Mat. Mekh. UrO RAN, Ekaterinburg, 2015.

[16] A. V. Kel’manov and V. I. Khandeev, A randomized algorithm for two-cluster partition of a set of vectors, Zh. Vychisl. Mat. Mat. Fiz., 55, No. 2, 335–344, 2015. Translated in Comput. Math. Math. Phys., 55, No. 2, 330–339, 2015.

[17] A. V. Kel’manov and V. I. Khandeev, An exact pseudopolynomial algorithm for a problem of the two-cluster partitioning of a set of vectors, Diskretn. Anal. Issled. Oper., 22, No. 3, 36–48, 2015. Translated in J. Appl. Ind. Math., 9, No. 4, 497–502, 2015.

[18] A. V. Kel’manov and V. I. Khandeev, Fully polynomial-time approximation scheme for special case of a quadratic Euclidean 2-clustering problem, Zh. Vychisl. Mat. Mat. Fiz., 56, No. 2, 145–153, 2016.

[19] D. Aloise, A. Deshpande, P. Hansen, and P. Popat, NP-hardness of Euclidean sum-of-squares clustering, Mach. Learn., 75, No. 2, 245–248, 2009.

[20] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, New York, 2006.

[21] J. A. Carter [et al.], Kepler-36: A pair of planets with neighboring orbits and dissimilar densities, Science, 337, No. 6094, 556–559, 2012.

[22] P. Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of Data, Cambridge Univ. Press, New York, 2012.

[23] E. Kh. Gimadi, A. V. Kel’manov, M. A. Kel’manova, and S. A. Khamidullin, A posteriori detecting a quasiperiodic fragment in a numerical sequence, Pattern Recognit. Image Anal., 18, No. 1, 30–42, 2008.

[24] A. K. Jain, Data clustering: 50 years beyond K-means, Pattern Recognit. Lett., 31, No. 8, 651–666, 2010.

[25] G. James, D. Witten, T. Hastie, and D. Tibshirani, An Introduction to Statistical Learning: With Applications in R, Springer, New York, 2013.

[26] A. V. Kel’manov and B. Jeon, A posteriori joint detection and discrimination of pulses in a quasiperiodic pulse train, IEEE Trans. Signal Process., 52, No. 3, 645–656, 2004.

[27] C. Steger, M. Ulrich, and C. Wiedemann, Machine Vision Algorithms and Applications, Wiley-VCH, Berlin, 2007.
 © Sobolev Institute of Mathematics, 2015