Multimodal attention for neural machine translation
File(s) 1609.03976v1.pdf (870.27 KB)
Working paper
Author(s)
Caglayan, Ozan
Barrault, Loïc
Bougares, Fethi
Type
Working Paper
Abstract
The attention mechanism is an important part of neural machine
translation (NMT), where it was reported to produce richer source representations
than fixed-length-encoding sequence-to-sequence models. Recently, the
effectiveness of attention has also been explored in the context of image
captioning. In this work, we assess the feasibility of a multimodal attention
mechanism that simultaneously focuses on an image and its natural language
description in order to generate a description in another language. We train several
variants of our proposed attention mechanism on the Multi30k multilingual image
captioning dataset. We show that a dedicated attention for each modality
achieves gains of up to 1.6 BLEU and METEOR points over a textual NMT
baseline.
Date Issued
2016-09-13
Citation
2016
Publisher
arxiv
Copyright Statement
© 2016 The Authors.
Identifier
http://arxiv.org/abs/1609.03976v1
Subjects
cs.CL
cs.NE
Notes
10 pages, under review at COLING 2016
Publication Status
Published