{"publication_status":"submitted","citation":{"short":"D. Zikelic, M. Lechner, T.A. Henzinger, K. Chatterjee, ArXiv (n.d.).","ama":"Zikelic D, Lechner M, Henzinger TA, Chatterjee K. Learning control policies for stochastic systems with reach-avoid guarantees. <i>arXiv</i>. doi:<a href=\"https://doi.org/10.48550/ARXIV.2210.05308\">10.48550/ARXIV.2210.05308</a>","apa":"Zikelic, D., Lechner, M., Henzinger, T. A., &#38; Chatterjee, K. (n.d.). Learning control policies for stochastic systems with reach-avoid guarantees. <i>arXiv</i>. <a href=\"https://doi.org/10.48550/ARXIV.2210.05308\">https://doi.org/10.48550/ARXIV.2210.05308</a>","chicago":"Zikelic, Dorde, Mathias Lechner, Thomas A Henzinger, and Krishnendu Chatterjee. “Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees.” <i>ArXiv</i>, n.d. <a href=\"https://doi.org/10.48550/ARXIV.2210.05308\">https://doi.org/10.48550/ARXIV.2210.05308</a>.","ista":"Zikelic D, Lechner M, Henzinger TA, Chatterjee K. Learning control policies for stochastic systems with reach-avoid guarantees. arXiv, <a href=\"https://doi.org/10.48550/ARXIV.2210.05308\">10.48550/ARXIV.2210.05308</a>.","ieee":"D. Zikelic, M. Lechner, T. A. Henzinger, and K. Chatterjee, “Learning control policies for stochastic systems with reach-avoid guarantees,” <i>arXiv</i>. .","mla":"Zikelic, Dorde, et al. “Learning Control Policies for Stochastic Systems with Reach-Avoid Guarantees.” <i>ArXiv</i>, doi:<a href=\"https://doi.org/10.48550/ARXIV.2210.05308\">10.48550/ARXIV.2210.05308</a>."},"date_published":"2022-11-29T00:00:00Z","ec_funded":1,"main_file_link":[{"url":"https://arxiv.org/abs/2210.05308","open_access":"1"}],"abstract":[{"text":"We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold $p\\in[0,1]$ over the infinite time horizon. Our method leverages advances in machine learning literature and it represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on $3$ stochastic non-linear reinforcement learning tasks.","lang":"eng"}],"_id":"14600","department":[{"_id":"KrCh"},{"_id":"ToHe"}],"language":[{"iso":"eng"}],"publication":"arXiv","day":"29","author":[{"id":"294AA7A6-F248-11E8-B48F-1D18A9856A87","full_name":"Zikelic, Dorde","first_name":"Dorde","last_name":"Zikelic","orcid":"0000-0002-4681-1699"},{"last_name":"Lechner","first_name":"Mathias","id":"3DC22916-F248-11E8-B48F-1D18A9856A87","full_name":"Lechner, Mathias"},{"orcid":"0000-0002-2985-7724","last_name":"Henzinger","first_name":"Thomas A","full_name":"Henzinger, Thomas A","id":"40876CD8-F248-11E8-B48F-1D18A9856A87"},{"orcid":"0000-0002-4561-241X","last_name":"Chatterjee","first_name":"Krishnendu","full_name":"Chatterjee, Krishnendu","id":"2E5DCA20-F248-11E8-B48F-1D18A9856A87"}],"doi":"10.48550/ARXIV.2210.05308","type":"preprint","license":"https://creativecommons.org/licenses/by-sa/4.0/","date_updated":"2024-01-22T14:08:29Z","tmp":{"short":"CC BY-SA (4.0)","image":"/images/cc_by_sa.png","name":"Creative Commons Attribution-ShareAlike 4.0 International Public License (CC BY-SA 4.0)","legal_code_url":"https://creativecommons.org/licenses/by-sa/4.0/legalcode"},"month":"11","article_processing_charge":"No","external_id":{"arxiv":["2210.05308"]},"status":"public","related_material":{"record":[{"relation":"dissertation_contains","id":"14539","status":"public"},{"status":"public","relation":"later_version","id":"14830"}]},"title":"Learning control policies for stochastic systems with reach-avoid guarantees","project":[{"_id":"0599E47C-7A3F-11EA-A408-12923DDC885E","name":"Formal Methods for Stochastic Models: Algorithms and Applications","grant_number":"863818","call_identifier":"H2020"},{"grant_number":"101020093","_id":"62781420-2b32-11ec-9570-8d9b63373d4d","name":"Vigilant Algorithmic Monitoring of Software","call_identifier":"H2020"},{"call_identifier":"H2020","grant_number":"665385","_id":"2564DBCA-B435-11E9-9278-68D0E5697425","name":"International IST Doctoral Program"}],"date_created":"2023-11-24T13:10:09Z","user_id":"8b945eb4-e2f2-11eb-945a-df72226e66a9","oa_version":"Preprint","oa":1,"year":"2022"}