[{"acknowledgement":"This   project   has   received   funding   from   the   Euro-pean  Research  Council  (ERC)  under  the  EuropeanUnion’s  Horizon  2020  research  and  innovation  programme  (Grant  agreement  No.   101019564  “The  De-sign  of  Modern  Fully  Dynamic  Data  Structures  (Mo-DynStruct)”  and  the  Austrian  Science  Fund  (FWF)project Z 422-N, project “Static and Dynamic Hierar-chical  Graph  Decompositions”,  I  5982-N,  and  project“Fast  Algorithms  for  a  Reactive  Network  Layer  (Re-actNet)”, P 33775-N, with additional funding from thenetidee SCIENCE Stiftung, 2020–2024.D.  Sauplic  has  received  funding  from  the  Euro-pean  Union’s  Horizon  2020  research  and  innovation programme under the Marie Sklodowska-Curie    grant    agreementNo 101034413.","abstract":[{"text":"For a set of points in Rd, the Euclidean k-means problems consists of finding k centers such that the sum of distances squared from each data point to its closest center is minimized. Coresets are one the main tools developed recently to solve this problem in a big data context. They allow to compress the initial dataset while preserving its structure: running any algorithm on the coreset provides a guarantee almost equivalent to running it on the full data. In this work, we study coresets in a fully-dynamic setting: points are added and deleted with the goal to efficiently maintain a coreset with which a k-means solution can be computed. Based on an algorithm from Henzinger and Kale [ESA'20], we present an efficient and practical implementation of a fully dynamic coreset algorithm, that improves the running time by up to a factor of 20 compared to our non-optimized implementation of the algorithm by Henzinger and Kale, without sacrificing more than 7% on the quality of the k-means solution.","lang":"eng"}],"doi":"10.1137/1.9781611977929.17","arxiv":1,"day":"04","external_id":{"arxiv":["2310.18034"]},"date_updated":"2025-07-15T12:51:52Z","year":"2024","citation":{"apa":"Henzinger, M. H., Saulpic, D., &#38; Sidl, L. (2024). Experimental evaluation of fully dynamic k-means via coresets. In <i>2024 Proceedings of the Symposium on Algorithm Engineering and Experiments</i> (pp. 220–233). Alexandria, VA, United States: Society for Industrial &#38; Applied Mathematics. <a href=\"https://doi.org/10.1137/1.9781611977929.17\">https://doi.org/10.1137/1.9781611977929.17</a>","ama":"Henzinger MH, Saulpic D, Sidl L. Experimental evaluation of fully dynamic k-means via coresets. In: <i>2024 Proceedings of the Symposium on Algorithm Engineering and Experiments</i>. Society for Industrial &#38; Applied Mathematics; 2024:220-233. doi:<a href=\"https://doi.org/10.1137/1.9781611977929.17\">10.1137/1.9781611977929.17</a>","chicago":"Henzinger, Monika H, David Saulpic, and Leonhard Sidl. “Experimental Evaluation of Fully Dynamic K-Means via Coresets.” In <i>2024 Proceedings of the Symposium on Algorithm Engineering and Experiments</i>, 220–33. Society for Industrial &#38; Applied Mathematics, 2024. <a href=\"https://doi.org/10.1137/1.9781611977929.17\">https://doi.org/10.1137/1.9781611977929.17</a>.","ieee":"M. H. Henzinger, D. Saulpic, and L. Sidl, “Experimental evaluation of fully dynamic k-means via coresets,” in <i>2024 Proceedings of the Symposium on Algorithm Engineering and Experiments</i>, Alexandria, VA, United States, 2024, pp. 220–233.","short":"M.H. Henzinger, D. Saulpic, L. Sidl, in:, 2024 Proceedings of the Symposium on Algorithm Engineering and Experiments, Society for Industrial &#38; Applied Mathematics, 2024, pp. 220–233.","mla":"Henzinger, Monika H., et al. “Experimental Evaluation of Fully Dynamic K-Means via Coresets.” <i>2024 Proceedings of the Symposium on Algorithm Engineering and Experiments</i>, Society for Industrial &#38; Applied Mathematics, 2024, pp. 220–33, doi:<a href=\"https://doi.org/10.1137/1.9781611977929.17\">10.1137/1.9781611977929.17</a>.","ista":"Henzinger MH, Saulpic D, Sidl L. 2024. Experimental evaluation of fully dynamic k-means via coresets. 2024 Proceedings of the Symposium on Algorithm Engineering and Experiments. ALENEX: Workshop on Algorithm Engineering and Experiments, 220–233."},"publisher":"Society for Industrial & Applied Mathematics","page":"220-233","ec_funded":1,"quality_controlled":"1","title":"Experimental evaluation of fully dynamic k-means via coresets","publication_status":"published","date_created":"2024-01-09T16:22:47Z","department":[{"_id":"MoHe"}],"article_processing_charge":"No","author":[{"id":"540c9bbd-f2de-11ec-812d-d04a5be85630","orcid":"0000-0002-5008-6530","full_name":"Henzinger, Monika H","first_name":"Monika H","last_name":"Henzinger"},{"full_name":"Saulpic, David","first_name":"David","last_name":"Saulpic","id":"f8e48cf0-b0ff-11ed-b0e9-b4c35598f964"},{"id":"8b563fd0-b441-11ee-9101-a3891c61efa6","first_name":"Leonhard","last_name":"Sidl","full_name":"Sidl, Leonhard"}],"_id":"14769","scopus_import":"1","status":"public","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","main_file_link":[{"url":"https://doi.org/10.48550/arXiv.2310.18034","open_access":"1"}],"oa":1,"publication_identifier":{"eisbn":["9781611977929"]},"date_published":"2024-01-04T00:00:00Z","type":"conference","conference":{"location":"Alexandria, VA, United States","end_date":"2024-01-08","start_date":"2024-01-07","name":"ALENEX: Workshop on Algorithm Engineering and Experiments"},"language":[{"iso":"eng"}],"month":"01","oa_version":"Preprint","project":[{"call_identifier":"H2020","_id":"bd9ca328-d553-11ed-ba76-dc4f890cfe62","name":"The design and evaluation of modern fully dynamic data structures","grant_number":"101019564"},{"name":"Wittgenstein Award - Monika Henzinger","grant_number":"Z00422","_id":"34def286-11ca-11ed-8bc3-da5948e1613c"},{"_id":"bda196b2-d553-11ed-ba76-8e8ee6c21103","grant_number":"I05982","name":"Static and Dynamic Hierarchical Graph Decompositions"},{"name":"Fast Algorithms for a Reactive Network Layer","grant_number":"P33775 ","_id":"bd9e3a2e-d553-11ed-ba76-8aa684ce17fe"},{"_id":"fc2ed2f7-9c52-11eb-aca3-c01059dda49c","call_identifier":"H2020","grant_number":"101034413","name":"IST-BRIDGE: International postdoctoral program"}],"publication":"2024 Proceedings of the Symposium on Algorithm Engineering and Experiments"},{"user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","status":"public","main_file_link":[{"open_access":"1","url":"https://doi.org/10.48550/arXiv.2310.04076"}],"type":"conference","date_published":"2023-12-22T00:00:00Z","oa":1,"publication_identifier":{"eisbn":["9798350318944"]},"language":[{"iso":"eng"}],"conference":{"start_date":"2023-11-06","name":"FOCS: Symposium on Foundations of Computer Science","location":"Santa Cruz, CA, United States","end_date":"2023-11-09"},"publication":"2023 IEEE 64th Annual Symposium on Foundations of Computer Science","month":"12","project":[{"_id":"fc2ed2f7-9c52-11eb-aca3-c01059dda49c","call_identifier":"H2020","name":"IST-BRIDGE: International postdoctoral program","grant_number":"101034413"},{"name":"The design and evaluation of modern fully dynamic data structures","grant_number":"101019564","call_identifier":"H2020","_id":"bd9ca328-d553-11ed-ba76-dc4f890cfe62"}],"oa_version":"Preprint","acknowledgement":"D. Sauplic has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 101034413, and Grant agreement No. 101019564 “The Design of Modern Fully Dynamic Data Structures (MoDynStruct)”.\r\nC. Schwiegelshohn acknowledges the support of the Independent Research Fund Denmark (DFF) under a Sapere Aude Research Leader grant No 1051-00106B.","external_id":{"arxiv":["2310.04076"]},"citation":{"apa":"Cohen-Addad, V., Saulpic, D., &#38; Schwiegelshohn, C. (2023). Deterministic clustering in high dimensional spaces: Sketches and approximation. In <i>2023 IEEE 64th Annual Symposium on Foundations of Computer Science</i> (pp. 1105–1130). Santa Cruz, CA, United States: IEEE. <a href=\"https://doi.org/10.1109/focs57990.2023.00066\">https://doi.org/10.1109/focs57990.2023.00066</a>","ama":"Cohen-Addad V, Saulpic D, Schwiegelshohn C. Deterministic clustering in high dimensional spaces: Sketches and approximation. In: <i>2023 IEEE 64th Annual Symposium on Foundations of Computer Science</i>. IEEE; 2023:1105-1130. doi:<a href=\"https://doi.org/10.1109/focs57990.2023.00066\">10.1109/focs57990.2023.00066</a>","chicago":"Cohen-Addad, Vincent, David Saulpic, and Chris Schwiegelshohn. “Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation.” In <i>2023 IEEE 64th Annual Symposium on Foundations of Computer Science</i>, 1105–30. IEEE, 2023. <a href=\"https://doi.org/10.1109/focs57990.2023.00066\">https://doi.org/10.1109/focs57990.2023.00066</a>.","ieee":"V. Cohen-Addad, D. Saulpic, and C. Schwiegelshohn, “Deterministic clustering in high dimensional spaces: Sketches and approximation,” in <i>2023 IEEE 64th Annual Symposium on Foundations of Computer Science</i>, Santa Cruz, CA, United States, 2023, pp. 1105–1130.","mla":"Cohen-Addad, Vincent, et al. “Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation.” <i>2023 IEEE 64th Annual Symposium on Foundations of Computer Science</i>, IEEE, 2023, pp. 1105–30, doi:<a href=\"https://doi.org/10.1109/focs57990.2023.00066\">10.1109/focs57990.2023.00066</a>.","short":"V. Cohen-Addad, D. Saulpic, C. Schwiegelshohn, in:, 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, IEEE, 2023, pp. 1105–1130.","ista":"Cohen-Addad V, Saulpic D, Schwiegelshohn C. 2023. Deterministic clustering in high dimensional spaces: Sketches and approximation. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science. FOCS: Symposium on Foundations of Computer Science, 1105–1130."},"year":"2023","date_updated":"2024-01-16T07:28:06Z","abstract":[{"lang":"eng","text":"In all state-of-the-art sketching and coreset techniques for clustering, as well as in the best known fixed-parameter tractable approximation algorithms, randomness plays a key role. For the classic k-median and k-means problems, there are no known deterministic dimensionality reduction procedure or coreset construction that avoid an exponential dependency on the input dimension d, the precision parameter $\\varepsilon^{-1}$ or k. Furthermore, there is no coreset construction that succeeds with probability $1-1/n$ and whose size does not depend on the number of input points, n. This has led researchers in the area to ask what is the power of randomness for clustering sketches [Feldman WIREs Data Mining Knowl. Discov’20].Similarly, the best approximation ratio achievable deterministically without a complexity exponential in the dimension are $1+\\sqrt{2}$ for k-median [Cohen-Addad, Esfandiari, Mirrokni, Narayanan, STOC’22] and 6.12903 for k-means [Grandoni, Ostrovsky, Rabani, Schulman, Venkat, Inf. Process. Lett.’22]. Those are the best results, even when allowing a complexity FPT in the number of clusters k: this stands in sharp contrast with the $(1+\\varepsilon)$-approximation achievable in that case, when allowing randomization.In this paper, we provide deterministic sketches constructions for clustering, whose size bounds are close to the best-known randomized ones. We show how to compute a dimension reduction onto $\\varepsilon^{-O(1)} \\log k$ dimensions in time $k^{O\\left(\\varepsilon^{-O(1)}+\\log \\log k\\right)}$ poly $(n d)$, and how to build a coreset of size $O\\left(k^{2} \\log ^{3} k \\varepsilon^{-O(1)}\\right)$ in time $2^{\\varepsilon^{O(1)} k \\log ^{3} k}+k^{O\\left(\\varepsilon^{-O(1)}+\\log \\log k\\right)}$ poly $(n d)$. In the case where k is small, this answers an open question of [Feldman WIDM’20] and [Munteanu and Schwiegelshohn, Künstliche Intell. ’18] on whether it is possible to efficiently compute coresets deterministically.We also construct a deterministic algorithm for computing $(1+$ $\\varepsilon)$-approximation to k-median and k-means in high dimensional Euclidean spaces in time $2^{k^{2} \\log ^{3} k / \\varepsilon^{O(1)}}$ poly $(n d)$, close to the best randomized complexity of $2^{(k / \\varepsilon)^{O(1)}}$ nd (see [Kumar, Sabharwal, Sen, JACM 10] and [Bhattacharya, Jaiswal, Kumar, TCS’18]).Furthermore, our new insights on sketches also yield a randomized coreset construction that uses uniform sampling, that immediately improves over the recent results of [Braverman et al. FOCS ’22] by a factor k."}],"day":"22","arxiv":1,"doi":"10.1109/focs57990.2023.00066","quality_controlled":"1","ec_funded":1,"page":"1105-1130","publisher":"IEEE","author":[{"last_name":"Cohen-Addad","first_name":"Vincent","full_name":"Cohen-Addad, Vincent"},{"id":"f8e48cf0-b0ff-11ed-b0e9-b4c35598f964","full_name":"Saulpic, David","last_name":"Saulpic","first_name":"David"},{"full_name":"Schwiegelshohn, Chris","last_name":"Schwiegelshohn","first_name":"Chris"}],"scopus_import":"1","_id":"14768","title":"Deterministic clustering in high dimensional spaces: Sketches and approximation","date_created":"2024-01-09T16:20:09Z","article_processing_charge":"No","department":[{"_id":"MoHe"}],"publication_status":"published"}]
