---
_id: '14949'
abstract:
- lang: eng
  text: Many approaches have been proposed to use diffusion models to augment training
    datasets for downstream tasks, such as classification. However, diffusion models
    are themselves trained on large datasets, often with noisy annotations, and it
    remains an open question to what extent these models contribute to downstream
    classification performance. In particular, it remains unclear if they generalize
    enough to improve over directly using the additional data of their pre-training
    process for augmentation. We systematically evaluate a range of existing methods
    to generate images from diffusion models and study new extensions to assess their
    benefit for data augmentation. Personalizing diffusion models towards the target
    data outperforms simpler prompting strategies. However, using the pre-training
    data of the diffusion model alone, via a simple nearest-neighbor retrieval procedure,
    leads to even stronger downstream performance. Our study explores the potential
    of diffusion models in generating new training data, and surprisingly finds that
    these sophisticated models are not yet able to beat a simple and strong image
    retrieval baseline on simple downstream vision tasks.
acknowledgement: The authors would like to thank Varad Gunjal and Vishaal Udandarao.
  MFB thanks the International Max Planck Research School for Intelligent Systems
  (IMPRS-IS).
alternative_title:
- TMLR
article_processing_charge: No
article_type: original
author:
- first_name: Max
  full_name: Burg, Max
  last_name: Burg
- first_name: Florian
  full_name: Wenzel, Florian
  last_name: Wenzel
- first_name: Dominik
  full_name: Zietlow, Dominik
  last_name: Zietlow
- first_name: Max
  full_name: Horn, Max
  last_name: Horn
- first_name: Osama
  full_name: Makansi, Osama
  last_name: Makansi
- first_name: Francesco
  full_name: Locatello, Francesco
  id: 26cfd52f-2483-11ee-8040-88983bcc06d4
  last_name: Locatello
  orcid: 0000-0002-4850-0683
- first_name: Chris
  full_name: Russell, Chris
  last_name: Russell
citation:
  ama: Burg M, Wenzel F, Zietlow D, et al. Image retrieval outperforms diffusion models
    on data augmentation. <i>Transactions on Machine Learning Research</i>. 2023.
  apa: Burg, M., Wenzel, F., Zietlow, D., Horn, M., Makansi, O., Locatello, F., &#38;
    Russell, C. (2023). Image retrieval outperforms diffusion models on data augmentation.
    <i>Transactions on Machine Learning Research</i>. ML Research Press.
  chicago: Burg, Max, Florian Wenzel, Dominik Zietlow, Max Horn, Osama Makansi, Francesco
    Locatello, and Chris Russell. “Image Retrieval Outperforms Diffusion Models on
    Data Augmentation.” <i>Transactions on Machine Learning Research</i>. ML Research Press,
    2023.
  ieee: M. Burg <i>et al.</i>, “Image retrieval outperforms diffusion models on data
    augmentation,” <i>Transactions on Machine Learning Research</i>. ML Research Press,
    2023.
  ista: Burg M, Wenzel F, Zietlow D, Horn M, Makansi O, Locatello F, Russell C. 2023.
    Image retrieval outperforms diffusion models on data augmentation. Transactions on
    Machine Learning Research.
  mla: Burg, Max, et al. “Image Retrieval Outperforms Diffusion Models on Data Augmentation.”
    <i>Transactions on Machine Learning Research</i>, ML Research Press, 2023.
  short: M. Burg, F. Wenzel, D. Zietlow, M. Horn, O. Makansi, F. Locatello, C. Russell,
    Transactions on Machine Learning Research (2023).
date_created: 2024-02-07T14:57:39Z
date_published: 2023-12-10T00:00:00Z
date_updated: 2024-02-12T08:30:21Z
day: '10'
ddc:
- '000'
department:
- _id: FrLo
file:
- access_level: open_access
  checksum: af87ddea7908923426365347b9c87ba7
  content_type: application/pdf
  creator: ptazenko
  date_created: 2024-02-07T14:57:32Z
  date_updated: 2024-02-07T14:57:32Z
  file_id: '14950'
  file_name: Burg_et_al_2023_Image_retrieval_outperforms.pdf
  file_size: 27325153
  relation: main_file
file_date_updated: 2024-02-07T14:57:32Z
has_accepted_license: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://openreview.net/forum?id=xflYdGZMpv
month: '12'
oa: 1
oa_version: Published Version
publication: Transactions on Machine Learning Research
publication_identifier:
  eissn:
  - 2835-8856
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
status: public
title: Image retrieval outperforms diffusion models on data augmentation
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
