---
_id: '15023'
abstract:
- lang: eng
  text: Reinforcement learning has shown promising results in learning neural network
    policies for complicated control tasks. However, the lack of formal guarantees
    about the behavior of such policies remains an impediment to their deployment.
    We propose a novel method for learning a composition of neural network policies
    in stochastic environments, along with a formal certificate which guarantees that
    a specification over the policy's behavior is satisfied with the desired probability.
    Unlike prior work on verifiable RL, our approach leverages the compositional nature
    of logical specifications provided in SpectRL, to learn over graphs of probabilistic
    reach-avoid specifications. The formal guarantees are provided by learning neural
    network policies together with reach-avoid supermartingales (RASM) for the graph’s
    sub-tasks and then composing them into a global policy. We also derive a tighter
    lower bound compared to previous work on the probability of reach-avoidance implied
    by a RASM, which is required to find a compositional policy with an acceptable
    probabilistic threshold for complex tasks with multiple edge policies. We implement
    a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment.
acknowledgement: "This work was supported in part by the ERC-2020-AdG 101020093 (VAMOS)
  and the ERC-2020-CoG 863818 (FoRM-SMArt)."
article_processing_charge: No
arxiv: 1
author:
- first_name: Dorde
  full_name: Zikelic, Dorde
  id: 294AA7A6-F248-11E8-B48F-1D18A9856A87
  last_name: Zikelic
  orcid: 0000-0002-4681-1699
- first_name: Mathias
  full_name: Lechner, Mathias
  id: 3DC22916-F248-11E8-B48F-1D18A9856A87
  last_name: Lechner
- first_name: Abhinav
  full_name: Verma, Abhinav
  id: a235593c-d7fa-11eb-a0c5-b22ca3c66ee6
  last_name: Verma
- first_name: Krishnendu
  full_name: Chatterjee, Krishnendu
  id: 2E5DCA20-F248-11E8-B48F-1D18A9856A87
  last_name: Chatterjee
  orcid: 0000-0002-4561-241X
- first_name: Thomas A
  full_name: Henzinger, Thomas A
  id: 40876CD8-F248-11E8-B48F-1D18A9856A87
  last_name: Henzinger
  orcid: 0000-0002-2985-7724
citation:
  ama: 'Zikelic D, Lechner M, Verma A, Chatterjee K, Henzinger TA. Compositional policy
    learning in stochastic control systems with formal guarantees. In: <i>37th Conference
    on Neural Information Processing Systems</i>. 2023.'
  apa: Zikelic, D., Lechner, M., Verma, A., Chatterjee, K., &#38; Henzinger, T. A.
    (2023). Compositional policy learning in stochastic control systems with formal
    guarantees. In <i>37th Conference on Neural Information Processing Systems</i>.
    New Orleans, LA, United States.
  chicago: Zikelic, Dorde, Mathias Lechner, Abhinav Verma, Krishnendu Chatterjee,
    and Thomas A Henzinger. “Compositional Policy Learning in Stochastic Control Systems
    with Formal Guarantees.” In <i>37th Conference on Neural Information Processing
    Systems</i>, 2023.
  ieee: D. Zikelic, M. Lechner, A. Verma, K. Chatterjee, and T. A. Henzinger, “Compositional
    policy learning in stochastic control systems with formal guarantees,” in <i>37th
    Conference on Neural Information Processing Systems</i>, New Orleans, LA, United
    States, 2023.
  ista: 'Zikelic D, Lechner M, Verma A, Chatterjee K, Henzinger TA. 2023. Compositional
    policy learning in stochastic control systems with formal guarantees. 37th Conference
    on Neural Information Processing Systems. NeurIPS: Neural Information Processing
    Systems.'
  mla: Zikelic, Dorde, et al. “Compositional Policy Learning in Stochastic Control
    Systems with Formal Guarantees.” <i>37th Conference on Neural Information Processing
    Systems</i>, 2023.
  short: D. Zikelic, M. Lechner, A. Verma, K. Chatterjee, T.A. Henzinger, in: 37th
    Conference on Neural Information Processing Systems, 2023.
conference:
  end_date: 2023-12-16
  location: New Orleans, LA, United States
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2023-12-10
date_created: 2024-02-25T09:23:24Z
date_published: 2023-12-15T00:00:00Z
date_updated: 2025-07-14T09:10:04Z
day: '15'
department:
- _id: ToHe
- _id: KrCh
ec_funded: 1
external_id:
  arxiv:
  - '2312.01456'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2312.01456
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: 0599E47C-7A3F-11EA-A408-12923DDC885E
  call_identifier: H2020
  grant_number: '863818'
  name: 'Formal Methods for Stochastic Models: Algorithms and Applications'
- _id: 62781420-2b32-11ec-9570-8d9b63373d4d
  call_identifier: H2020
  grant_number: '101020093'
  name: Vigilant Algorithmic Monitoring of Software
publication: 37th Conference on Neural Information Processing Systems
publication_status: epub_ahead
quality_controlled: '1'
status: public
title: Compositional policy learning in stochastic control systems with formal guarantees
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
