Locating faults in MANET-hosted software systems
File(s)
main.pdf (1.61 MB)
Accepted version
Author(s)
Novotny, P
Ko, B-J
Wolf, AL
Type
Journal Article
Abstract
We present a method to locate faults in service-based software systems hosted on mobile ad hoc networks (MANETs). In
such systems, computations are structured as interdependent services distributed across the network, collaborating to satisfy client
requests. Faults, which may occur at either or both the service and network layers, propagate by cascading through some subset of the
services, from their root causes back to the clients that initiate requests. Fault localization in this environment is especially challenging
because the systems are typically subject to a wider variety and higher incidence of faults than those deployed in fixed networks, the
resources available to collect and store analysis data are severely limited, and many of the sources of faults are by their nature
transient. Our method makes use of service-dependence and fault data that are harvested in the network through decentralized,
run-time observations of service interactions and fault symptoms. We have designed timing- and Bayesian-based reasoning
techniques to analyze the data in the context of a specific fault propagation model. The analysis provides a ranked list of candidate fault
locations. Through extensive simulations, we evaluate the performance of our method in terms of its accuracy in correctly ranking root
causes under a wide range of operational conditions.
Date Issued
2018-05-01
Date Acceptance
2016-06-09
Citation
IEEE Transactions on Dependable and Secure Computing, 2018, 15 (3), pp.452-465
ISSN
1545-5971
Publisher
IEEE
Start Page
452
End Page
465
Journal / Book Title
IEEE Transactions on Dependable and Secure Computing
Volume
15
Issue
3
Copyright Statement
© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Sponsor
IBM United Kingdom Ltd
Grant Number
PO4603106973
Subjects
Science & Technology
Technology
Computer Science, Hardware & Architecture
Computer Science, Information Systems
Computer Science, Software Engineering
Computer Science
Fault identification and localization
mobile ad hoc networks
software services
fault propagation
Bayesian network
AD HOC NETWORKS
COMMUNICATION-NETWORKS
WEB SERVICES
IDENTIFICATION
DIAGNOSIS
LOCALIZATION
WIRELESS
PROTOCOL
0803 Computer Software
0804 Data Format
0805 Distributed Computing
Strategic, Defence & Security Studies
Publication Status
Published
Date Publish Online
2016-07-29