On the correctness of electronic documents: studying, finding, and localizing inconsistency bugs in PDF readers and files
File(s)
s10664-018-9600-2.pdf (3.55 MB)
Published version
Author(s)
Kuchta, Tomasz
Lutellier, Thibaud
Wong, Edmund
Tan, Lin
Cadar, Cristian
Type
Journal Article
Abstract
Electronic documents are widely used to store and share information such as bank statements, contracts, articles, maps and tax information. Many different applications exist for displaying a given electronic document, and users rightfully assume that documents will be rendered similarly regardless of the application used. However, this is not always the case, and these inconsistencies, whether caused by bugs in the application or in the file itself, can become critical sources of miscommunication. In this paper, we present a study on the correctness of PDF documents and readers. We start by manually investigating a large number of real-world PDF documents to understand the frequency and characteristics of cross-reader inconsistencies, and find that such inconsistencies are common: 13.5% of PDF files are inconsistently rendered by at least one popular reader. We then propose an approach to detect and localize the source of such inconsistencies automatically. We evaluate our automatic approach on a large corpus of over 230K documents using 11 popular readers, and our experiments have detected 30 unique bugs in these readers and files. We also reported 33 bugs, some of which have already been confirmed or fixed by developers.
Date Issued
2018-12-01
Date Acceptance
2018-01-24
Citation
Empirical Software Engineering, 2018, 23 (6), pp.3187-3220
ISSN
1382-3256
Publisher
Springer Verlag
Start Page
3187
End Page
3220
Journal / Book Title
Empirical Software Engineering
Volume
23
Issue
6
Copyright Statement
© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Sponsor
Engineering and Physical Sciences Research Council (EPSRC)
Grant Number
EP/L002795/1
Subjects
Science & Technology
Technology
Computer Science, Software Engineering
Computer Science
Cross-software inconsistencies
Document correctness
Image comparison
Error-message clustering
Digital forensics
Software Engineering
0803 Computer Software
Publication Status
Published
Date Publish Online
2018-03-09