Theory for socio-demographic enrichment performance using the inverse discrete choice modelling approach
File(s)
ZhaoEtAl_TR_PartB_Revised_2_Final.docx (1.23 MB)
Accepted version
Author(s)
Zhao, Yuanying
Pawlak, Jacek
Sivakumar, Aruna
Type
Journal Article
Abstract
In light of the growing availability of big data sources and the essential role of socio-demographic information in travel behaviour and transport demand modelling more broadly, the enrichment of socio-demographic attributes for anonymous big datasets is a key issue that continues to be explored. The common shortcoming of existing socio-demographic enrichment approaches concerns their lack of consistent theory that can link their enrichment performance (i.e. the ability to correctly enrich the required attribute) to the underlying covariance structure in the anonymous big datasets. In other words, existing approaches are unable to indicate, prior to the enrichment, to what extent it will be successful. Instead, they require undertaking the enrichment itself to assess and validate it post factum, incurring the effort and cost of the activity. An alternative and arguably preferable way would be to have a prior indicator as to whether an enrichment is likely to be sufficiently effective for the desired application.
To this end, this paper draws upon the Inverse Discrete Choice Modelling (IDCM) approach to demonstrate what is termed the IDCM performance theory, which systematically and tractably links the socio-demographic enrichment performance of the IDCM approach to the structure of the underlying datasets. This is achieved by recalibration of the constant, a technique adopted from conventional discrete choice modelling practice, while also drawing upon information theory employed in the context of communication systems. The established IDCM performance theory is validated in two empirical applications where the performance of the IDCM approach in enriching several socio-demographic attributes, given travel behaviour patterns, is successfully estimated. Additionally, the IDCM approach is found to perform comparably to commonly used methods in previous socio-demographic enrichment efforts. It is thus argued that the capability of the IDCM performance theory to predict and explain the approach's enrichment performance under different data conditions can facilitate informed and transparent transferability of the IDCM framework in the socio-demographic enrichment of anonymous big datasets.
Date Issued
2022-01-01
Date Acceptance
2021-11-13
Citation
Transportation Research Part B: Methodological: an international journal, 2022, 155, pp.101-134
ISSN
0191-2615
Publisher
Elsevier
Start Page
101
End Page
134
Journal / Book Title
Transportation Research Part B: Methodological: an international journal
Volume
155
Copyright Statement
© 2021 Elsevier Ltd. All rights reserved. This manuscript is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Licence http://creativecommons.org/licenses/by-nc-nd/4.0/
Identifier
https://www.sciencedirect.com/science/article/pii/S0191261521002095
Subjects
0102 Applied Mathematics
0905 Civil Engineering
1507 Transportation and Freight Services
Logistics & Transportation
Publication Status
Published
Date Publish Online
2021-11-27