---
_id: '10367'
abstract:
- lang: eng
  text: How information is created, shared and consumed has changed rapidly in recent
    decades, in part thanks to new social platforms and technologies on the web. With
    ever-larger amounts of unstructured data and limited labels, organizing and reconciling
    information from different sources and modalities is a central challenge in machine
    learning. This cutting-edge tutorial aims to introduce the multimodal entailment
    task, which can be useful for detecting semantic alignments when a single modality
    alone does not suffice to understand the content as a whole. Starting with a brief
    overview of natural language processing, computer vision, structured data and
    neural graph learning, we lay the foundations for the multimodal sections to follow.
    We then discuss recent multimodal learning literature covering visual, audio and
    language streams, and explore case studies focusing on tasks which require fine-grained
    understanding of visual and linguistic semantics, including question answering, veracity
    and hatred classification. Finally, we introduce a new dataset for recognizing
    multimodal entailment, exploring it in a hands-on collaborative section. Overall,
    this tutorial gives an overview of multimodal learning, introduces a multimodal
    entailment dataset, and encourages future research in the topic.
acknowledgement: "We would like to thank Abby Schantz, Abe Ittycheriah, Aliaksei Severyn,
  Allan Heydon, Aly Grealish, Andrey Vlasov, Arkaitz Zubiaga, Ashwin Kakarla, Chen
  Sun, Clayton Williams, Cong Yu, Cordelia Schmid, Da-Cheng Juan, Dan Finnie, Dani
  Valevski, Daniel Rocha, David Price, David Sklar, Devi Krishna, Elena Kochkina,
  Enrique Alfonseca, Françoise Beaufays, Isabelle Augenstein, Jialu Liu, John Cantwell,
  John Palowitch, Jordan Boyd-Graber, Lei Shi, Luis Valente, Maria Voitovich, Mehmet
  Aktuna, Mogan Brown, Mor Naaman, Natalia P, Nidhi Hebbar, Pete Aykroyd, Rahul Sukthankar,
  Richa Dixit, Steve Pucci, Tania Bedrax-Weiss, Tobias Kaufmann, Tom Boulos, Tu Tsao,
  Vladimir Chtchetkine, Yair Kurzion, Yifan Xu and Zach Hynes."
article_processing_charge: No
author:
- first_name: Cesar
  full_name: Ilharco, Cesar
  last_name: Ilharco
- first_name: Afsaneh
  full_name: Shirazi, Afsaneh
  last_name: Shirazi
- first_name: Arjun
  full_name: Gopalan, Arjun
  last_name: Gopalan
- first_name: Arsha
  full_name: Nagrani, Arsha
  last_name: Nagrani
- first_name: Blaž
  full_name: Bratanič, Blaž
  last_name: Bratanič
- first_name: Chris
  full_name: Bregler, Chris
  last_name: Bregler
- first_name: Christina
  full_name: Liu, Christina
  last_name: Liu
- first_name: Felipe
  full_name: Ferreira, Felipe
  last_name: Ferreira
- first_name: Gabriel
  full_name: Barcik, Gabriel
  last_name: Barcik
- first_name: Gabriel
  full_name: Ilharco, Gabriel
  last_name: Ilharco
- first_name: Georg F
  full_name: Osang, Georg F
  id: 464B40D6-F248-11E8-B48F-1D18A9856A87
  last_name: Osang
- first_name: Jannis
  full_name: Bulian, Jannis
  last_name: Bulian
- first_name: Jared
  full_name: Frank, Jared
  last_name: Frank
- first_name: Lucas
  full_name: Smaira, Lucas
  last_name: Smaira
- first_name: Qin
  full_name: Cao, Qin
  last_name: Cao
- first_name: Ricardo
  full_name: Marino, Ricardo
  last_name: Marino
- first_name: Roma
  full_name: Patel, Roma
  last_name: Patel
- first_name: Thomas
  full_name: Leung, Thomas
  last_name: Leung
- first_name: Vaiva
  full_name: Imbrasaite, Vaiva
  last_name: Imbrasaite
citation:
  ama: 'Ilharco C, Shirazi A, Gopalan A, et al. Recognizing multimodal entailment.
    In: <i>59th Annual Meeting of the Association for Computational Linguistics and
    the 11th International Joint Conference on Natural Language Processing, Tutorial
    Abstracts</i>. Association for Computational Linguistics; 2021:29-30. doi:<a href="https://doi.org/10.18653/v1/2021.acl-tutorials.6">10.18653/v1/2021.acl-tutorials.6</a>'
  apa: 'Ilharco, C., Shirazi, A., Gopalan, A., Nagrani, A., Bratanič, B., Bregler,
    C., … Imbrasaite, V. (2021). Recognizing multimodal entailment. In <i>59th Annual
    Meeting of the Association for Computational Linguistics and the 11th International
    Joint Conference on Natural Language Processing, Tutorial Abstracts</i> (pp. 29–30).
    Bangkok, Thailand: Association for Computational Linguistics. <a href="https://doi.org/10.18653/v1/2021.acl-tutorials.6">https://doi.org/10.18653/v1/2021.acl-tutorials.6</a>'
  chicago: Ilharco, Cesar, Afsaneh Shirazi, Arjun Gopalan, Arsha Nagrani, Blaž Bratanič,
    Chris Bregler, Christina Liu, et al. “Recognizing Multimodal Entailment.” In <i>59th
    Annual Meeting of the Association for Computational Linguistics and the 11th International
    Joint Conference on Natural Language Processing, Tutorial Abstracts</i>, 29–30.
    Association for Computational Linguistics, 2021. <a href="https://doi.org/10.18653/v1/2021.acl-tutorials.6">https://doi.org/10.18653/v1/2021.acl-tutorials.6</a>.
  ieee: C. Ilharco <i>et al.</i>, “Recognizing multimodal entailment,” in <i>59th
    Annual Meeting of the Association for Computational Linguistics and the 11th International
    Joint Conference on Natural Language Processing, Tutorial Abstracts</i>, Bangkok,
    Thailand, 2021, pp. 29–30.
  ista: 'Ilharco C, Shirazi A, Gopalan A, Nagrani A, Bratanič B, Bregler C, Liu C,
    Ferreira F, Barcik G, Ilharco G, Osang GF, Bulian J, Frank J, Smaira L, Cao Q,
    Marino R, Patel R, Leung T, Imbrasaite V. 2021. Recognizing multimodal entailment.
    59th Annual Meeting of the Association for Computational Linguistics and the 11th
    International Joint Conference on Natural Language Processing, Tutorial Abstracts.
    ACL: Association for Computational Linguistics ; IJCNLP: International Joint Conference
    on Natural Language Processing, 29–30.'
  mla: Ilharco, Cesar, et al. “Recognizing Multimodal Entailment.” <i>59th Annual
    Meeting of the Association for Computational Linguistics and the 11th International
    Joint Conference on Natural Language Processing, Tutorial Abstracts</i>, Association
    for Computational Linguistics, 2021, pp. 29–30, doi:<a href="https://doi.org/10.18653/v1/2021.acl-tutorials.6">10.18653/v1/2021.acl-tutorials.6</a>.
  short: C. Ilharco, A. Shirazi, A. Gopalan, A. Nagrani, B. Bratanič, C. Bregler,
    C. Liu, F. Ferreira, G. Barcik, G. Ilharco, G.F. Osang, J. Bulian, J. Frank, L.
    Smaira, Q. Cao, R. Marino, R. Patel, T. Leung, V. Imbrasaite, in:, 59th Annual
    Meeting of the Association for Computational Linguistics and the 11th International
    Joint Conference on Natural Language Processing, Tutorial Abstracts, Association
    for Computational Linguistics, 2021, pp. 29–30.
conference:
  end_date: 2021-08-06
  location: Bangkok, Thailand
  name: 'ACL: Association for Computational Linguistics ; IJCNLP: International Joint
    Conference on Natural Language Processing'
  start_date: 2021-08-01
date_created: 2021-11-28T23:01:30Z
date_published: 2021-08-01T00:00:00Z
date_updated: 2022-01-26T14:26:36Z
day: '01'
ddc:
- '000'
department:
- _id: HeEd
doi: 10.18653/v1/2021.acl-tutorials.6
file:
- access_level: open_access
  checksum: b14052a025a6ecf675bdfe51db98c0d7
  content_type: application/pdf
  creator: cchlebak
  date_created: 2021-11-29T08:41:00Z
  date_updated: 2021-11-29T08:41:00Z
  file_id: '10368'
  file_name: 2021_ACL_Ilharco.pdf
  file_size: 1227703
  relation: main_file
  success: 1
file_date_updated: 2021-11-29T08:41:00Z
has_accepted_license: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://aclanthology.org/2021.acl-tutorials.6/
month: '08'
oa: 1
oa_version: Published Version
page: 29-30
publication: 59th Annual Meeting of the Association for Computational Linguistics
  and the 11th International Joint Conference on Natural Language Processing, Tutorial
  Abstracts
publication_identifier:
  isbn:
  - 9-781-9540-8557-2
publication_status: published
publisher: Association for Computational Linguistics
quality_controlled: '1'
scopus_import: '1'
status: public
title: Recognizing multimodal entailment
tmp:
  image: /images/cc_by.png
  legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
  name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
  short: CC BY (4.0)
type: conference
user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9
year: '2021'
...
