Towards matching user mobility traces in large-scale datasets
File(s)
1709.05772.pdf (2.9 MB)
Accepted version
Author(s)
Kondor, Daniel
Hashemian, Behrooz
de Montjoye, Yves-Alexandre
Ratti, Carlo
Type
Journal Article
Abstract
The problem of unicity and reidentifiability of records in large-scale databases has been studied in different contexts and with different approaches, with a focus on preserving privacy or on matching records from different data sources. With an increasing number of service providers now routinely collecting location traces of their users on unprecedented scales, there is pronounced interest in the possibility of matching records and datasets based on spatial trajectories. Extending previous work on reidentifiability of spatial data and trajectory matching, we present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e. between two datasets that consist of several million people's mobility traces, coming from a mobile network operator and transportation smart card usage. We extract the relevant statistical properties that influence the matching process and analyze their impact on the matchability of users. We show that for individuals with typical activity in the transportation system (those making 3-4 trips per day on average), a matching algorithm based on the co-occurrence of their activities is expected to achieve a success rate of 16.8% after only one week of observing their mobility traces, and over 55% after four weeks. We show that the main determinant of matchability is the expected number of co-occurring records in the two datasets. Finally, we discuss different scenarios in terms of data collection frequency and give estimates of matchability over time. We show that, as higher-frequency data collection becomes more common, much higher success rates can be expected over even shorter intervals.
Date Issued
2020-12-01
Date of Acceptance
2018-09-24
Citation
IEEE Transactions on Big Data, 2020, 6 (4), pp.714-726
ISSN
2332-7790
Publisher
IEEE
Start Page
714
End Page
726
Journal / Book Title
IEEE Transactions on Big Data
Volume
6
Issue
4
Copyright Statement
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subjects
08 Information and Computing Sciences
Publication Status
Published
Date Published Online
2018-09-24