53
IRUS TotalDownloads
Altmetric
Efficient partial-pairs SimRank search on large graphs
File | Description | Size | Format | |
---|---|---|---|---|
partial-simrank.pdf | Accepted version | 387.96 kB | Adobe PDF | View/Open |
Title: | Efficient partial-pairs SimRank search on large graphs |
Authors: | Yu, W McCann, J |
Item Type: | Journal Article |
Abstract: | The assessment of node-to-node similarities based on graph topology arises in a myriad of applications, e.g., web search. SimRank is a notable measure of this type, with the intuition that "two nodes are similar if their in-neighbors are similar". While most existing work retrieving SimRank only considers all-pairs SimRank s(*, *) and single-source SimRank s(*, j) (scores between every node and query j), there are appealing applications for partial-pairs SimRank, e.g., similarity join. Given two node subsets A and B in a graph, partial-pairs SimRank assessment aims to retrieve only {s(a, b)}∀aεA,∀bεB. However, the best-known solution appears not self-contained since it hinges on the premise that the SimRank scores with node-pairs in an h-go cover set must be given beforehand. This paper focuses on efficient assessment of partial-pairs SimRank in a self-contained manner. (1) We devise a novel "seed germination" model that computes partial-pairs SimRank in O(k|E| min{|A|, |B|}) time and O(|E| + k|V|) memory for k iterations on a graph of |V| nodes and |E| edges. (2) We further eliminate unnecessary edge access to improve the time of partial-pairs SimRank to O(m min{|A|, |B|}), where m ≤ min{k|E|, Δ2k}, and Δ is the maximum degree. (3) We show that our partial-pairs SimRank model also can handle the computations of all-pairs and single-source SimRanks. (4) We empirically verify that our algorithms are (a) 38x faster than the best-known competitors, and (b) memory-efficient, allowing scores to be assessed accurately on graphs with tens of millions of links. |
Issue Date: | 1-Jan-2015 |
Date of Acceptance: | 1-Jan-2015 |
URI: | http://hdl.handle.net/10044/1/23679 |
DOI: | 10.14778/2735479.2735489 |
ISSN: | 2150-8097 |
Publisher: | VLDB Endowment |
Start Page: | 569 |
End Page: | 580 |
Journal / Book Title: | Proceedings of the VLDB Endowment International Conference on Very Large Data Bases |
Volume: | 8 |
Issue: | 5 |
Copyright Statement: | © ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the VLDB Endowment International Conference on Very Large Data Bases, 8(5), June 2015. https://dx.doi.org/10.14778/2735479.2735489 |
Sponsor/Funder: | NEC Corporation European Institute of Innovation and Technology - EIT European Institute of Innovation and Technology - EIT Intel Corporation (UK) Ltd Commission of the European Communities |
Funder's Grant Number: | N/A . EIT ICT Labs 2014 PO #3000922346 619795 |
Keywords: | Science & Technology Technology Computer Science, Information Systems Computer Science, Theory & Methods Computer Science SIMILARITY 0802 Computation Theory and Mathematics 0806 Information Systems 0807 Library and Information Studies |
Publication Status: | Published |
Appears in Collections: | Computing Faculty of Engineering |