---
_id: '14769'
abstract:
- lang: eng
  text: 'For a set of points in Rd, the Euclidean k-means problems consists of finding
    k centers such that the sum of distances squared from each data point to its closest
    center is minimized. Coresets are one the main tools developed recently to solve
    this problem in a big data context. They allow to compress the initial dataset
    while preserving its structure: running any algorithm on the coreset provides
    a guarantee almost equivalent to running it on the full data. In this work, we
    study coresets in a fully-dynamic setting: points are added and deleted with the
    goal to efficiently maintain a coreset with which a k-means solution can be computed.
    Based on an algorithm from Henzinger and Kale [ESA''20], we present an efficient
    and practical implementation of a fully dynamic coreset algorithm, that improves
    the running time by up to a factor of 20 compared to our non-optimized implementation
    of the algorithm by Henzinger and Kale, without sacrificing more than 7% on the
    quality of the k-means solution.'
acknowledgement: This   project   has   received   funding   from   the   Euro-pean  Research  Council  (ERC)  under  the  EuropeanUnion’s  Horizon  2020  research  and  innovation  programme  (Grant  agreement  No.   101019564  “The  De-sign  of  Modern  Fully  Dynamic  Data  Structures  (Mo-DynStruct)”  and  the  Austrian  Science  Fund  (FWF)project
  Z 422-N, project “Static and Dynamic Hierar-chical  Graph  Decompositions”,  I  5982-N,  and  project“Fast  Algorithms  for  a  Reactive  Network  Layer  (Re-actNet)”,
  P 33775-N, with additional funding from thenetidee SCIENCE Stiftung, 2020–2024.D.  Sauplic  has  received  funding  from  the  Euro-pean  Union’s  Horizon  2020  research  and  innovation
  programme under the Marie Sklodowska-Curie    grant    agreementNo 101034413.
article_processing_charge: No
arxiv: 1
author:
- first_name: Monika H
  full_name: Henzinger, Monika H
  id: 540c9bbd-f2de-11ec-812d-d04a5be85630
  last_name: Henzinger
  orcid: 0000-0002-5008-6530
- first_name: David
  full_name: Saulpic, David
  id: f8e48cf0-b0ff-11ed-b0e9-b4c35598f964
  last_name: Saulpic
- first_name: Leonhard
  full_name: Sidl, Leonhard
  id: 8b563fd0-b441-11ee-9101-a3891c61efa6
  last_name: Sidl
citation:
  ama: 'Henzinger MH, Saulpic D, Sidl L. Experimental evaluation of fully dynamic
    k-means via coresets. In: <i>2024 Proceedings of the Symposium on Algorithm Engineering
    and Experiments</i>. Society for Industrial &#38; Applied Mathematics; 2024:220-233.
    doi:<a href="https://doi.org/10.1137/1.9781611977929.17">10.1137/1.9781611977929.17</a>'
  apa: 'Henzinger, M. H., Saulpic, D., &#38; Sidl, L. (2024). Experimental evaluation
    of fully dynamic k-means via coresets. In <i>2024 Proceedings of the Symposium
    on Algorithm Engineering and Experiments</i> (pp. 220–233). Alexandria, VA, United
    States: Society for Industrial &#38; Applied Mathematics. <a href="https://doi.org/10.1137/1.9781611977929.17">https://doi.org/10.1137/1.9781611977929.17</a>'
  chicago: Henzinger, Monika H, David Saulpic, and Leonhard Sidl. “Experimental Evaluation
    of Fully Dynamic K-Means via Coresets.” In <i>2024 Proceedings of the Symposium
    on Algorithm Engineering and Experiments</i>, 220–33. Society for Industrial &#38;
    Applied Mathematics, 2024. <a href="https://doi.org/10.1137/1.9781611977929.17">https://doi.org/10.1137/1.9781611977929.17</a>.
  ieee: M. H. Henzinger, D. Saulpic, and L. Sidl, “Experimental evaluation of fully
    dynamic k-means via coresets,” in <i>2024 Proceedings of the Symposium on Algorithm
    Engineering and Experiments</i>, Alexandria, VA, United States, 2024, pp. 220–233.
  ista: 'Henzinger MH, Saulpic D, Sidl L. 2024. Experimental evaluation of fully dynamic
    k-means via coresets. 2024 Proceedings of the Symposium on Algorithm Engineering
    and Experiments. ALENEX: Workshop on Algorithm Engineering and Experiments, 220–233.'
  mla: Henzinger, Monika H., et al. “Experimental Evaluation of Fully Dynamic K-Means
    via Coresets.” <i>2024 Proceedings of the Symposium on Algorithm Engineering and
    Experiments</i>, Society for Industrial &#38; Applied Mathematics, 2024, pp. 220–33,
    doi:<a href="https://doi.org/10.1137/1.9781611977929.17">10.1137/1.9781611977929.17</a>.
  short: M.H. Henzinger, D. Saulpic, L. Sidl, in:, 2024 Proceedings of the Symposium
    on Algorithm Engineering and Experiments, Society for Industrial &#38; Applied
    Mathematics, 2024, pp. 220–233.
conference:
  end_date: 2024-01-08
  location: Alexandria, VA, United States
  name: 'ALENEX: Workshop on Algorithm Engineering and Experiments'
  start_date: 2024-01-07
date_created: 2024-01-09T16:22:47Z
date_published: 2024-01-04T00:00:00Z
date_updated: 2025-07-15T12:51:52Z
day: '04'
department:
- _id: MoHe
doi: 10.1137/1.9781611977929.17
ec_funded: 1
external_id:
  arxiv:
  - '2310.18034'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2310.18034
month: '01'
oa: 1
oa_version: Preprint
page: 220-233
project:
- _id: bd9ca328-d553-11ed-ba76-dc4f890cfe62
  call_identifier: H2020
  grant_number: '101019564'
  name: The design and evaluation of modern fully dynamic data structures
- _id: 34def286-11ca-11ed-8bc3-da5948e1613c
  grant_number: Z00422
  name: Wittgenstein Award - Monika Henzinger
- _id: bda196b2-d553-11ed-ba76-8e8ee6c21103
  grant_number: I05982
  name: Static and Dynamic Hierarchical Graph Decompositions
- _id: bd9e3a2e-d553-11ed-ba76-8aa684ce17fe
  grant_number: 'P33775 '
  name: Fast Algorithms for a Reactive Network Layer
- _id: fc2ed2f7-9c52-11eb-aca3-c01059dda49c
  call_identifier: H2020
  grant_number: '101034413'
  name: 'IST-BRIDGE: International postdoctoral program'
publication: 2024 Proceedings of the Symposium on Algorithm Engineering and Experiments
publication_identifier:
  eisbn:
  - '9781611977929'
publication_status: published
publisher: Society for Industrial & Applied Mathematics
quality_controlled: '1'
scopus_import: '1'
status: public
title: Experimental evaluation of fully dynamic k-means via coresets
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2024'
...
---
_id: '14768'
abstract:
- lang: eng
  text: 'In all state-of-the-art sketching and coreset techniques for clustering,
    as well as in the best known fixed-parameter tractable approximation algorithms,
    randomness plays a key role. For the classic k-median and k-means problems, there
    are no known deterministic dimensionality reduction procedure or coreset construction
    that avoid an exponential dependency on the input dimension d, the precision parameter
    $\varepsilon^{-1}$ or k. Furthermore, there is no coreset construction that succeeds
    with probability $1-1/n$ and whose size does not depend on the number of input
    points, n. This has led researchers in the area to ask what is the power of randomness
    for clustering sketches [Feldman WIREs Data Mining Knowl. Discov’20].Similarly,
    the best approximation ratio achievable deterministically without a complexity
    exponential in the dimension are $1+\sqrt{2}$ for k-median [Cohen-Addad, Esfandiari,
    Mirrokni, Narayanan, STOC’22] and 6.12903 for k-means [Grandoni, Ostrovsky, Rabani,
    Schulman, Venkat, Inf. Process. Lett.’22]. Those are the best results, even when
    allowing a complexity FPT in the number of clusters k: this stands in sharp contrast
    with the $(1+\varepsilon)$-approximation achievable in that case, when allowing
    randomization.In this paper, we provide deterministic sketches constructions for
    clustering, whose size bounds are close to the best-known randomized ones. We
    show how to compute a dimension reduction onto $\varepsilon^{-O(1)} \log k$ dimensions
    in time $k^{O\left(\varepsilon^{-O(1)}+\log \log k\right)}$ poly $(n d)$, and
    how to build a coreset of size $O\left(k^{2} \log ^{3} k \varepsilon^{-O(1)}\right)$
    in time $2^{\varepsilon^{O(1)} k \log ^{3} k}+k^{O\left(\varepsilon^{-O(1)}+\log
    \log k\right)}$ poly $(n d)$. In the case where k is small, this answers an open
    question of [Feldman WIDM’20] and [Munteanu and Schwiegelshohn, Künstliche Intell.
    ’18] on whether it is possible to efficiently compute coresets deterministically.We
    also construct a deterministic algorithm for computing $(1+$ $\varepsilon)$-approximation
    to k-median and k-means in high dimensional Euclidean spaces in time $2^{k^{2}
    \log ^{3} k / \varepsilon^{O(1)}}$ poly $(n d)$, close to the best randomized
    complexity of $2^{(k / \varepsilon)^{O(1)}}$ nd (see [Kumar, Sabharwal, Sen, JACM
    10] and [Bhattacharya, Jaiswal, Kumar, TCS’18]).Furthermore, our new insights
    on sketches also yield a randomized coreset construction that uses uniform sampling,
    that immediately improves over the recent results of [Braverman et al. FOCS ’22]
    by a factor k.'
acknowledgement: "D. Sauplic has received funding from the European Union’s Horizon
  2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement
  No 101034413, and Grant agreement No. 101019564 “The Design of Modern Fully Dynamic
  Data Structures (MoDynStruct)”.\r\nC. Schwiegelshohn acknowledges the support of
  the Independent Research Fund Denmark (DFF) under a Sapere Aude Research Leader
  grant No 1051-00106B."
article_processing_charge: No
arxiv: 1
author:
- first_name: Vincent
  full_name: Cohen-Addad, Vincent
  last_name: Cohen-Addad
- first_name: David
  full_name: Saulpic, David
  id: f8e48cf0-b0ff-11ed-b0e9-b4c35598f964
  last_name: Saulpic
- first_name: Chris
  full_name: Schwiegelshohn, Chris
  last_name: Schwiegelshohn
citation:
  ama: 'Cohen-Addad V, Saulpic D, Schwiegelshohn C. Deterministic clustering in high
    dimensional spaces: Sketches and approximation. In: <i>2023 IEEE 64th Annual Symposium
    on Foundations of Computer Science</i>. IEEE; 2023:1105-1130. doi:<a href="https://doi.org/10.1109/focs57990.2023.00066">10.1109/focs57990.2023.00066</a>'
  apa: 'Cohen-Addad, V., Saulpic, D., &#38; Schwiegelshohn, C. (2023). Deterministic
    clustering in high dimensional spaces: Sketches and approximation. In <i>2023
    IEEE 64th Annual Symposium on Foundations of Computer Science</i> (pp. 1105–1130).
    Santa Cruz, CA, United States: IEEE. <a href="https://doi.org/10.1109/focs57990.2023.00066">https://doi.org/10.1109/focs57990.2023.00066</a>'
  chicago: 'Cohen-Addad, Vincent, David Saulpic, and Chris Schwiegelshohn. “Deterministic
    Clustering in High Dimensional Spaces: Sketches and Approximation.” In <i>2023
    IEEE 64th Annual Symposium on Foundations of Computer Science</i>, 1105–30. IEEE,
    2023. <a href="https://doi.org/10.1109/focs57990.2023.00066">https://doi.org/10.1109/focs57990.2023.00066</a>.'
  ieee: 'V. Cohen-Addad, D. Saulpic, and C. Schwiegelshohn, “Deterministic clustering
    in high dimensional spaces: Sketches and approximation,” in <i>2023 IEEE 64th
    Annual Symposium on Foundations of Computer Science</i>, Santa Cruz, CA, United
    States, 2023, pp. 1105–1130.'
  ista: 'Cohen-Addad V, Saulpic D, Schwiegelshohn C. 2023. Deterministic clustering
    in high dimensional spaces: Sketches and approximation. 2023 IEEE 64th Annual
    Symposium on Foundations of Computer Science. FOCS: Symposium on Foundations of
    Computer Science, 1105–1130.'
  mla: 'Cohen-Addad, Vincent, et al. “Deterministic Clustering in High Dimensional
    Spaces: Sketches and Approximation.” <i>2023 IEEE 64th Annual Symposium on Foundations
    of Computer Science</i>, IEEE, 2023, pp. 1105–30, doi:<a href="https://doi.org/10.1109/focs57990.2023.00066">10.1109/focs57990.2023.00066</a>.'
  short: V. Cohen-Addad, D. Saulpic, C. Schwiegelshohn, in:, 2023 IEEE 64th Annual
    Symposium on Foundations of Computer Science, IEEE, 2023, pp. 1105–1130.
conference:
  end_date: 2023-11-09
  location: Santa Cruz, CA, United States
  name: 'FOCS: Symposium on Foundations of Computer Science'
  start_date: 2023-11-06
date_created: 2024-01-09T16:20:09Z
date_published: 2023-12-22T00:00:00Z
date_updated: 2024-01-16T07:28:06Z
day: '22'
department:
- _id: MoHe
doi: 10.1109/focs57990.2023.00066
ec_funded: 1
external_id:
  arxiv:
  - '2310.04076'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2310.04076
month: '12'
oa: 1
oa_version: Preprint
page: 1105-1130
project:
- _id: fc2ed2f7-9c52-11eb-aca3-c01059dda49c
  call_identifier: H2020
  grant_number: '101034413'
  name: 'IST-BRIDGE: International postdoctoral program'
- _id: bd9ca328-d553-11ed-ba76-dc4f890cfe62
  call_identifier: H2020
  grant_number: '101019564'
  name: The design and evaluation of modern fully dynamic data structures
publication: 2023 IEEE 64th Annual Symposium on Foundations of Computer Science
publication_identifier:
  eisbn:
  - '9798350318944'
publication_status: published
publisher: IEEE
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Deterministic clustering in high dimensional spaces: Sketches and approximation'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
