Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems
File(s)
live-5175-9434-jair.pdf (1.28 MB)
Accepted version
Author(s)
Jennings, N
Venanzi, M
Guiver, J
Kohli, P
Type
Journal Article
Abstract
Many aspects of the design of efficient crowdsourcing processes, such as defining workers’
bonuses, fair prices and time limits of the tasks, involve knowledge of the likely duration
of the task at hand. In this work we introduce a new time-sensitive Bayesian aggregation
method that simultaneously estimates a task’s duration and obtains reliable aggregations of
crowdsourced judgments. Our method, called BCCTime, uses latent variables to represent
the uncertainty about the workers’ completion time, the tasks’ duration and the workers’
accuracy. To relate the quality of a judgment to the time a worker spends on a task,
our model assumes that each task is completed within a latent time window within which
all workers with a propensity to genuinely attempt the labelling task (i.e., no spammers)
are expected to submit their judgments. In contrast, workers with a lower propensity
to valid labelling, such as spammers, bots or lazy labellers, are assumed to perform tasks
considerably faster or slower than the time required by normal workers. Specifically, we use
efficient message-passing Bayesian inference to learn approximate posterior probabilities of
(i) the confusion matrix of each worker, (ii) the propensity to valid labelling of each worker,
(iii) the unbiased duration of each task and (iv) the true label of each task. Using two
real-world public datasets for entity linking tasks, we show that BCCTime produces up to
11% more accurate classifications and up to 100% more informative estimates of a task’s
duration compared to state-of-the-art methods.
Date Issued
2016-07-28
Date Acceptance
2016-07-01
Citation
Journal of Artificial Intelligence Research, 2016, 56, pp.517-545
ISSN
1943-5037
Publisher
Association for the Advancement of Artificial Intelligence
Start Page
517
End Page
545
Journal / Book Title
Journal of Artificial Intelligence Research
Volume
56
Copyright Statement
© 2016 AI Access Foundation. All rights reserved.
Subjects
Artificial Intelligence & Image Processing
0102 Applied Mathematics
0801 Artificial Intelligence And Image Processing
1702 Cognitive Science
Publication Status
Published
Date Publish Online
2016-07