{"date_updated":"2022-06-27T07:05:12Z","article_processing_charge":"No","month":"12","day":"06","type":"conference","main_file_link":[{"open_access":"1","url":"https://proceedings.neurips.cc/paper/2021/file/7cfd5df443b4eb0d69886a583b33de4c-Paper.pdf"}],"abstract":[{"text":"Efficiently approximating local curvature information of the loss function is a key tool for optimization and compression of deep neural networks. Yet, most existing methods to approximate second-order information have high computational\r\nor storage costs, which limits their practicality. In this work, we investigate matrix-free, linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs) for the case when the Hessian can be approximated as a sum of rank-one matrices, as in the classic approximation of the Hessian by the empirical Fisher matrix. We propose two new algorithms: the first is tailored towards network compression and can compute the IHVP for dimension d, if the Hessian is given as a sum of m rank-one matrices, using O(dm2) precomputation, O(dm) cost for computing the IHVP, and query cost O(m) for any single element of the inverse Hessian. The second algorithm targets an optimization setting, where we wish to compute the product between the inverse Hessian, estimated over a sliding window of optimization steps, and a given gradient direction, as required for preconditioned SGD. We give an algorithm with cost O(dm + m2) for computing the IHVP and O(dm + m3) for adding or removing any gradient from the sliding window. These\r\ntwo algorithms yield state-of-the-art results for network pruning and optimization with lower computational overhead relative to existing second-order methods. Implementations are available at [9] and [17].","lang":"eng"}],"department":[{"_id":"DaAl"}],"date_published":"2021-12-06T00:00:00Z","publication_status":"published","scopus_import":"1","publisher":"Curran Associates","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","date_created":"2022-06-26T22:01:35Z","oa_version":"Published Version","oa":1,"status":"public","title":"M-FAC: Efficient matrix-free approximations of second-order information","quality_controlled":"1","publication_identifier":{"isbn":["9781713845393"],"issn":["1049-5258"]},"external_id":{"arxiv":["2010.08222"]},"publication":"35th Conference on Neural Information Processing Systems","author":[{"last_name":"Frantar","first_name":"Elias","id":"09a8f98d-ec99-11ea-ae11-c063a7b7fe5f","full_name":"Frantar, Elias"},{"first_name":"Eldar","last_name":"Kurtic","full_name":"Kurtic, Eldar","id":"47beb3a5-07b5-11eb-9b87-b108ec578218"},{"first_name":"Dan-Adrian","last_name":"Alistarh","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","full_name":"Alistarh, Dan-Adrian","orcid":"0000-0003-3650-940X"}],"page":"14873-14886","_id":"11463","language":[{"iso":"eng"}],"conference":{"end_date":"2021-12-14","location":"Virtual, Online","start_date":"2021-12-06","name":"NeurIPS: Neural Information Processing Systems"},"citation":{"ama":"Frantar E, Kurtic E, Alistarh D-A. M-FAC: Efficient matrix-free approximations of second-order information. In: 35th Conference on Neural Information Processing Systems. Vol 34. Curran Associates; 2021:14873-14886.","ista":"Frantar E, Kurtic E, Alistarh D-A. 2021. M-FAC: Efficient matrix-free approximations of second-order information. 35th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems vol. 34, 14873–14886.","apa":"Frantar, E., Kurtic, E., & Alistarh, D.-A. (2021). 
M-FAC: Efficient matrix-free approximations of second-order information. In 35th Conference on Neural Information Processing Systems (Vol. 34, pp. 14873–14886). Virtual, Online: Curran Associates.","chicago":"Frantar, Elias, Eldar Kurtic, and Dan-Adrian Alistarh. “M-FAC: Efficient Matrix-Free Approximations of Second-Order Information.” In 35th Conference on Neural Information Processing Systems, 34:14873–86. Curran Associates, 2021.","short":"E. Frantar, E. Kurtic, D.-A. Alistarh, in:, 35th Conference on Neural Information Processing Systems, Curran Associates, 2021, pp. 14873–14886.","mla":"Frantar, Elias, et al. “M-FAC: Efficient Matrix-Free Approximations of Second-Order Information.” 35th Conference on Neural Information Processing Systems, vol. 34, Curran Associates, 2021, pp. 14873–86.","ieee":"E. Frantar, E. Kurtic, and D.-A. Alistarh, “M-FAC: Efficient matrix-free approximations of second-order information,” in 35th Conference on Neural Information Processing Systems, Virtual, Online, 2021, vol. 34, pp. 14873–14886."},"ec_funded":1,"year":"2021","volume":34,"intvolume":" 34","project":[{"_id":"268A44D6-B435-11E9-9278-68D0E5697425","name":"Elastic Coordination for Scalable Machine Learning","grant_number":"805223","call_identifier":"H2020"}],"acknowledgement":"We gratefully acknowledge funding the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML), as well as computational support from Amazon Web Services (AWS) EC2."}
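
Note: the abstract's rank-one structure (Hessian approximated as λI + (1/m) Σᵢ gᵢgᵢᵀ, as in the empirical Fisher) admits a matrix-free IHVP via repeated Sherman-Morrison updates. The sketch below is a minimal, unoptimized NumPy illustration of that recursion under the stated cost bounds (O(dm²) precomputation, O(dm) per query); it is not the authors' M-FAC implementation, and the function name and the dampening parameter `damp` (standing in for λ) are assumptions for illustration.

import numpy as np

def ihvp_rank_one(grads, x, damp=1e-4):
    """Compute (damp*I + (1/m) * sum_i g_i g_i^T)^{-1} @ x without ever
    materializing the d x d matrix, using recursive Sherman-Morrison updates.

    grads: (m, d) array of per-sample gradients (rank-one Fisher factors).
    x:     (d,) query vector.
    """
    m, d = grads.shape
    # Precompute v_i = H_{i-1}^{-1} g_i and the scalar denominators
    # m + g_i^T v_i; this loop costs O(d m^2) in total.
    vs = np.empty((m, d))
    denoms = np.empty(m)
    for i in range(m):
        v = grads[i] / damp                      # H_0^{-1} g_i = g_i / damp
        for j in range(i):
            v -= (vs[j] @ grads[i] / denoms[j]) * vs[j]
        vs[i] = v
        denoms[i] = m + grads[i] @ v
    # Apply the same recursion to the query vector x; O(dm) per query.
    result = x / damp
    for i in range(m):
        result -= (vs[i] @ x / denoms[i]) * vs[i]
    return result

# Hypothetical usage: 32 gradients of a 1000-dimensional model.
# g = np.random.randn(32, 1000); x = np.random.randn(1000)
# ihvp = ihvp_rank_one(g, x)

Each step uses the Sherman-Morrison identity (A + ggᵀ/m)⁻¹ = A⁻¹ − A⁻¹g gᵀA⁻¹ / (m + gᵀA⁻¹g), so only the vectors vᵢ and scalars denomsᵢ need to be stored, not the d×d inverse.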