---
_id: '12128'
abstract:
- lang: eng
  text: We introduce a machine-learning (ML) framework for high-throughput benchmarking
    of diverse representations of chemical systems against datasets of materials and
    molecules. The guiding principle underlying the benchmarking approach is to evaluate
    raw descriptor performance by limiting model complexity to simple regression schemes
    while enforcing best ML practices, allowing for unbiased hyperparameter optimization,
    and assessing learning progress through learning curves along a series of synchronized
    train-test splits. The resulting models are intended as baselines that can inform
    future method development, in addition to indicating how easily a given dataset
    can be learnt. Through a comparative analysis of the training outcome across a
    diverse set of physicochemical, topological and geometric representations, we
    glean insight into the relative merits of these representations as well as their
    interrelatedness.
acknowledgement: 'C P acknowledges funding from Astex through the Sustaining Innovation
  Program under the Milner Consortium. B C acknowledges resources provided by the
  Cambridge Tier-2 system operated by the University of Cambridge Research Computing
  Service funded by EPSRC Tier-2 capital Grant EP/P020259/1. F A F acknowledges funding
  from the Swiss National Science Foundation (Grant No. P2BSP2_191736).'
article_number: '040501'
article_processing_charge: No
article_type: original
author:
- first_name: Carl
  full_name: Poelking, Carl
  last_name: Poelking
- first_name: Felix A
  full_name: Faber, Felix A
  last_name: Faber
- first_name: Bingqing
  full_name: Cheng, Bingqing
  id: cbe3cda4-d82c-11eb-8dc7-8ff94289fcc9
  last_name: Cheng
  orcid: 0000-0002-3584-9632
citation:
  ama: 'Poelking C, Faber FA, Cheng B. BenchML: An extensible pipelining framework
    for benchmarking representations of materials and molecules at scale. <i>Machine
    Learning: Science and Technology</i>. 2022;3(4). doi:<a href="https://doi.org/10.1088/2632-2153/ac4d11">10.1088/2632-2153/ac4d11</a>'
  apa: 'Poelking, C., Faber, F. A., &#38; Cheng, B. (2022). BenchML: An extensible
    pipelining framework for benchmarking representations of materials and molecules
    at scale. <i>Machine Learning: Science and Technology</i>. IOP Publishing. <a
    href="https://doi.org/10.1088/2632-2153/ac4d11">https://doi.org/10.1088/2632-2153/ac4d11</a>'
  chicago: 'Poelking, Carl, Felix A Faber, and Bingqing Cheng. “BenchML: An Extensible
    Pipelining Framework for Benchmarking Representations of Materials and Molecules
    at Scale.” <i>Machine Learning: Science and Technology</i>. IOP Publishing, 2022.
    <a href="https://doi.org/10.1088/2632-2153/ac4d11">https://doi.org/10.1088/2632-2153/ac4d11</a>.'
  ieee: 'C. Poelking, F. A. Faber, and B. Cheng, “BenchML: An extensible pipelining
    framework for benchmarking representations of materials and molecules at scale,”
    <i>Machine Learning: Science and Technology</i>, vol. 3, no. 4. IOP Publishing,
    2022.'
  ista: 'Poelking C, Faber FA, Cheng B. 2022. BenchML: An extensible pipelining framework
    for benchmarking representations of materials and molecules at scale. Machine
    Learning: Science and Technology. 3(4), 040501.'
  mla: 'Poelking, Carl, et al. “BenchML: An Extensible Pipelining Framework for Benchmarking
    Representations of Materials and Molecules at Scale.” <i>Machine Learning: Science
    and Technology</i>, vol. 3, no. 4, 040501, IOP Publishing, 2022, doi:<a href="https://doi.org/10.1088/2632-2153/ac4d11">10.1088/2632-2153/ac4d11</a>.'
  short: 'C. Poelking, F.A. Faber, B. Cheng, Machine Learning: Science and Technology
    3 (2022).'
date_created: 2023-01-12T12:02:21Z
date_published: 2022-11-17T00:00:00Z
date_updated: 2023-08-04T08:49:53Z
day: '17'
ddc:
- '000'
department:
- _id: BiCh
doi: 10.1088/2632-2153/ac4d11
external_id:
  isi:
  - '000886534000001'
file:
- access_level: open_access
  checksum: 8930d4ad6ed9b47358c6f1a68666adb6
  content_type: application/pdf
  creator: dernst
  date_created: 2023-01-23T10:42:04Z
  date_updated: 2023-01-23T10:42:04Z
  file_id: '12343'
  file_name: 2022_MachLearning_Poelking.pdf
  file_size: 13814559
  relation: main_file
  success: 1
file_date_updated: 2023-01-23T10:42:04Z
has_accepted_license: '1'
intvolume: '         3'
isi: 1
issue: '4'
keyword:
- Artificial Intelligence
- Human-Computer Interaction
- Software
language:
- iso: eng
license: https://creativecommons.org/licenses/by/4.0/
month: '11'
oa: 1
oa_version: Published Version
publication: 'Machine Learning: Science and Technology'
publication_identifier:
  issn:
  - 2632-2153
publication_status: published
publisher: IOP Publishing
quality_controlled: '1'
related_material:
  link:
  - relation: software
    url: https://github.com/capoe/benchml
scopus_import: '1'
status: public
title: 'BenchML: An extensible pipelining framework for benchmarking representations
  of materials and molecules at scale'
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: journal_article
user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8
volume: 3
year: '2022'
...
