Global and Local Knowledge in Citation Network Formation
Title: | Global and Local Knowledge in Citation Network Formation |
Authors: | Evans, T; Goldberg, S |
Item Type: | Dataset |
Abstract: | Talk presented at NetSci-X, Wrocław, 12th January 2016.
Models of the citation
distribution of academic papers have a long history (Price, 1965). One aim is
to establish whether certain simple processes can explain important features. In
this paper we focus on the fact that the distribution of citations for papers
of a similar age scales primarily with the average number of citations (Radicchi, Fortunato
& Castellano, 2008; Evans, Hopkins
& Kaube, 2012), with the shape otherwise largely invariant. In particular,
the width of such distributions (as measured by the s parameter of a lognormal fit to reasonably well cited papers) shows no
temporal evolution. Simple multiplicative processes or basic models such as the
Price model (Price, 1965) give dramatically different results: typically the
distributions become narrower over time.
We found that, to get
reasonable agreement, our model had to incorporate three key aspects: local
searches of the network (these generate preferential attachment), a requirement
that such local searches start from recent papers, and finally some knowledge of the
global set of papers.
To check our results we used data
from the citation network of the hep-th section of the arXiv repository (KDD
Cup 2003) as a benchmark. Our three-parameter model was able to produce an
acceptable fit to the hep-th data over 11 different years (see figure).
We find that the best fits of our
model to our data occur when around 70% to 80% of papers cited are ‘subsidiary
papers’, that is, papers found from local searches through the bibliographies of other
papers. Interestingly, similar results have been seen by Simkin and Roychowdhury
(2005), derived from an analysis of mistakes in bibliographic entries. In our
terminology these would be citations to subsidiary papers, so both sets of
results are consistent. Further support for this result comes from the
transitive reduction analysis of Clough et al. (2015).
We conclude that the citation
patterns we see are based on around 25% of papers being found through some
global source of information that favours recent papers, with the remainder then found
by local searches. |
Issue Date: | 8-Jan-2016 |
URI: | http://hdl.handle.net/10044/1/30252 |
DOI: | http://dx.doi.org/10.6084/m9.figshare.2061870 |
Keywords: | Citation network; bibliometrics; directed acyclic graphs; Digital Humanities; Library and Information Studies; Informetrics; Statistical Theory; Theoretical and Applied Mechanics |
Notes: | URL: http://dx.doi.org/10.6084/m9.figshare.1452953 URL: http://www.issi2015.org/en/Proceedings-of-ISSI-2015.html |
Appears in Collections: | Faculty of Natural Sciences - Research Data |
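
The mechanism described in the abstract (a few papers found through global knowledge that favours recent work, with the rest found by local searches through bibliographies) can be illustrated with a minimal sketch. This is not the authors' implementation or their three-parameter model: the function names, the fixed number of references per paper, the uniform choice within a `recency_window`, and the default `p_global=0.25` (chosen only to echo the roughly 25% figure quoted above) are all assumptions made for illustration.

```python
import random


def grow_citation_network(n_papers, refs_per_paper=10, p_global=0.25,
                          recency_window=50, seed=0):
    """Grow a toy citation network; returns one bibliography per paper.

    Hypothetical sketch: papers are numbered by arrival order, so every
    reference points to a smaller index and the graph stays acyclic.
    """
    rng = random.Random(seed)
    bibliographies = [[]]  # paper 0 has nothing to cite
    for new in range(1, n_papers):
        cited, seen = [], set()
        target = min(refs_per_paper, new)
        attempts = 0
        # Cap the attempts so the loop always terminates.
        while len(cited) < target and attempts < 50 * target:
            attempts += 1
            if not cited or rng.random() < p_global:
                # Global knowledge, favouring recent papers (assumed:
                # uniform choice among the last `recency_window` papers).
                oldest = max(0, new - recency_window)
                candidate = rng.randrange(oldest, new)
            else:
                # Local search: follow the bibliography of a paper that
                # has already been cited ("subsidiary" paper).
                source = rng.choice(cited)
                if not bibliographies[source]:
                    continue
                candidate = rng.choice(bibliographies[source])
            if candidate not in seen:
                seen.add(candidate)
                cited.append(candidate)
        bibliographies.append(cited)
    return bibliographies


def in_degrees(bibliographies):
    """Count how many times each paper is cited."""
    counts = [0] * len(bibliographies)
    for refs in bibliographies:
        for ref in refs:
            counts[ref] += 1
    return counts


if __name__ == "__main__":
    network = grow_citation_network(2000)
    citations = in_degrees(network)
    print("most cited paper has", max(citations), "citations")
```

Because a local search picks from existing bibliographies, a paper that is already cited often is more likely to be reached again, which is the sense in which local searches generate preferential attachment.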