With the advent of digital archives, information can be transferred between researchers far more quickly than was possible in the world of the paper press.
Authors can deposit an unrefereed pre-print and be read the same day by an international audience of like-minded researchers. Contrast this with the printed journal, which must first decide whether the paper's content is appropriate and then have the paper peer-reviewed (although there are digital archives that exercise a peer-review system, often providing the choice of papers). The journal may only be published bi-annually, so the gap between a paper being written and being published can be upwards of a year. By this time the paper may be obsolete, or the work may already have been carried out by another research team.
The effect of digital archives on citation patterns is crucial and interesting: does the rapid availability and readership of papers reduce the time gap between a paper being deposited and being cited? How does this relate to the citation of articles in the printed press? Will authors cite unrefereed preprints while waiting for the refereed postprint?
Using the Los Alamos National Laboratory Digital Physics Archive [arXiv], we can analyse citation patterns within the digital domain; that is, we can analyse the citation "links" between pairs of papers deposited in the archive.
The arXiv archive provides us with the means to build a list pairing each paper with the identifiable papers that it cites. Each paper in the archive is given a unique identifier, formed from a subject area (for example, "hep-th" is High Energy Physics - Theory), a date identifier (consisting of the year and month of deposition) and a three-digit index number. Given two arXiv papers, we can find the difference between the two deposition dates with an accuracy of one month. For example:
hep-th/0005048 - Deposited in 5/2000, cited:
hep-th/9908105 - Deposited in 8/1999.
This gives a time difference of 9 months. The same calculation can then be performed for every identified citation from this paper.
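The month arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the function names and the regular expression are hypothetical, and it assumes identifiers of the form subject/YYMMNNN, with two-digit years 91-99 read as 19xx and all others as 20xx.

```python
import re

# Assumed identifier layout: "<subject>/<YYMM><NNN>", e.g. "hep-th/0005048".
ARXIV_ID = re.compile(r"^(?P<subject>[A-Za-z.\-]+)/(?P<yy>\d{2})(?P<mm>\d{2})(?P<index>\d{3})$")


def deposit_month(arxiv_id):
    """Return the deposit date as a running month count (year * 12 + month)."""
    match = ARXIV_ID.match(arxiv_id)
    if match is None:
        raise ValueError(f"unrecognised arXiv id: {arxiv_id!r}")
    yy, mm = int(match["yy"]), int(match["mm"])
    if not 1 <= mm <= 12:
        raise ValueError(f"invalid month in arXiv id: {arxiv_id!r}")
    # Assumption: the archive began in 1991, so 91-99 means 19xx, otherwise 20xx.
    year = 1900 + yy if yy >= 91 else 2000 + yy
    return year * 12 + mm


def months_between(citing_id, cited_id):
    """Deposit month of the citing paper minus that of the cited paper."""
    return deposit_month(citing_id) - deposit_month(cited_id)


print(months_between("hep-th/0005048", "hep-th/9908105"))  # 9
```

Because only the year and month are encoded in the identifier, the result cannot be more precise than one month.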
Looking retrospectively at the archive, we obtain what look like erroneous results: negative time differences (i.e. a paper cites a paper that had not yet come into existence). These occurrences account for 6289 of the 603,460 identified citations. This can be explained by:
By finding the time differences between a number of papers and the papers they cite, a table of citation latency can be built: a list giving the arXiv id of the citing paper, the id of the paper being cited, and the citing paper's deposit date minus the cited paper's deposit date.
paper             reference         time diff
astro-ph/9501044  hep-ph/9408302    5
astro-ph/9501044  hep-ph/9408342    5
astro-ph/9501044  hep-ph/9406139    7
astro-ph/9501044  gr-qc/9302019     23
astro-ph/9501074  astro-ph/9311052  14
astro-ph/9501074  astro-ph/9311057  14
astro-ph/9501085  astro-ph/9312023  13
astro-ph/9501085  astro-ph/9311064  14
astro-ph/9501085  astro-ph/9311003  14
...
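A table of this shape could be produced along the following lines. This is a sketch only: the input structure (a mapping from each citing paper to the ids it cites) and the function names are assumptions made for illustration, and the year handling assumes two-digit years 91-99 mean 19xx.

```python
def deposit_month(arxiv_id):
    """Running month count (year * 12 + month) from an id such as 'hep-ph/9408302'."""
    yymm = arxiv_id.split("/")[1][:4]
    yy, mm = int(yymm[:2]), int(yymm[2:])
    # Assumption: 91-99 means 19xx, anything else 20xx.
    year = 1900 + yy if yy >= 91 else 2000 + yy
    return year * 12 + mm


def latency_table(citations):
    """Build (paper, reference, time diff in months) rows.

    `citations` maps a citing arXiv id to the list of arXiv ids it cites
    (a hypothetical input format).
    """
    rows = []
    for paper, references in citations.items():
        for reference in references:
            diff = deposit_month(paper) - deposit_month(reference)
            rows.append((paper, reference, diff))
    return rows


# Citation pairs taken from the first rows of the table above.
citations = {
    "astro-ph/9501044": ["hep-ph/9408302", "hep-ph/9408342",
                         "hep-ph/9406139", "gr-qc/9302019"],
    "astro-ph/9501074": ["astro-ph/9311052", "astro-ph/9311057"],
}

for paper, reference, diff in latency_table(citations):
    print(f"{paper:<18}{reference:<18}{diff}")
```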
This set can then be broken down by the year in which the citing paper was deposited, building a picture of how citation behaviour may have changed over the period the archive has been active.
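One way to perform this breakdown is to count, for each deposit year of the citing paper, how many citations fall at each time difference. The sketch below assumes latency rows of the form (paper, reference, time diff); the function names are hypothetical, and two-digit years 91-99 are read as 19xx.

```python
from collections import Counter, defaultdict


def deposit_year(arxiv_id):
    """Four-digit deposit year from an id such as 'astro-ph/9501044'."""
    yy = int(arxiv_id.split("/")[1][:2])
    # Assumption: 91-99 means 19xx, anything else 20xx.
    return 1900 + yy if yy >= 91 else 2000 + yy


def latency_by_year(rows):
    """Map deposit year of the citing paper -> Counter of time diff -> citations."""
    histogram = defaultdict(Counter)
    for paper, _reference, diff in rows:
        histogram[deposit_year(paper)][diff] += 1
    return histogram


# The nine sample rows listed in the latency table.
rows = [
    ("astro-ph/9501044", "hep-ph/9408302", 5),
    ("astro-ph/9501044", "hep-ph/9408342", 5),
    ("astro-ph/9501044", "hep-ph/9406139", 7),
    ("astro-ph/9501044", "gr-qc/9302019", 23),
    ("astro-ph/9501074", "astro-ph/9311052", 14),
    ("astro-ph/9501074", "astro-ph/9311057", 14),
    ("astro-ph/9501085", "astro-ph/9312023", 13),
    ("astro-ph/9501085", "astro-ph/9311064", 14),
    ("astro-ph/9501085", "astro-ph/9311003", 14),
]

for year, counts in sorted(latency_by_year(rows).items()):
    for diff, n in sorted(counts.items()):
        print(year, diff, n)
```

Each per-year counter is in effect a histogram of citation ages, which is the quantity plotted for each year.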
The archive has been active since 1991, so we can analyse papers deposited from 1992 through 1999, covering citations from 1991 up to the most recent.
We can see that, within arXiv, the number of citations grows quickly with the age of the cited paper up to a peak and then decreases approximately linearly as that age increases. As we move back to papers deposited in and before 1995, the peak appears to shift from 2-4 months to 12 months. This suggests that over the period of the archive author behaviour has changed from citing through the paper press (which has a typical "leading edge" of 12 months - Embryology) to citing in arXiv, where the only delay is the time the author takes to write the paper.
This should be seen in the context of the general behaviour of the archive: depositions have been growing linearly since its inception in 1991, so the older a citation is, the less "chance" it has of being to a paper that is in the archive. We can plot this linear growth alongside the citation ages and find the ratio between the two.
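One possible reading of this ratio is the number of citations of a given age divided by the number of archive papers of that age that were available to be cited. The sketch below illustrates only that division; the function name, the input format and the numbers are hypothetical and are not taken from the archive.

```python
def normalised_latency(citation_counts, background_counts):
    """Divide citations of each age by the archive papers available at that age.

    Both arguments map age in months -> count. Ages with no background
    papers are skipped to avoid dividing by zero.
    """
    ratios = {}
    for age, cited in sorted(citation_counts.items()):
        available = background_counts.get(age, 0)
        if available > 0:
            ratios[age] = cited / available
    return ratios


# Toy numbers for illustration only; they are not measurements.
citation_counts = {1: 50, 2: 25, 3: 10}
background_counts = {1: 100, 2: 100, 3: 0}

print(normalised_latency(citation_counts, background_counts))
```

A ratio that still falls with age would indicate that the decay is not simply a consequence of older papers being scarcer in the archive.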
Even after taking the background population of papers into account, there is still a significant linear decay of citations, suggesting that, with current behaviour, authors cite more recent papers, with citations peaking within 4 months.