2.0 The LANL Archive

The subset in question is the NSF/DOE-supported Los Alamos National Laboratory (LANL) Eprint Archive <http://xxx.lanl.gov>, which already contains over half the current physics journal literature. It is growing at the rate of 25,000 papers annually, and has over 35,000 users daily and 15 mirror sites around the world. LANL also contains the Computing Research Repository (CoRR), which can be accessed directly through LANL or through the more generalized and integrated interface of the Networked Computer Science Technical Reference Library (NCSTRL) (Davis & Lagoze 1999). LANL (Paul Ginsparg) and CoRR/NCSTRL (Carl Lagoze, Joe Halpern) are partners in this Project, in association with ACM (Association for Computing Machinery; William Arms).

The LANL Archive represents a substantial body of literature in Physics, Mathematics and Computer Science, but the full texts are archived in a variety of formats, ranging from HTML and TeX to PDF and PostScript (PS). The first problem to be solved is therefore to design a way to integrate and navigate them seamlessly.

One especially important feature of full texts, the reference list, is arguably the most natural and powerful way of interconnecting and navigating this literature. The "links" are already provided by the authors themselves, and users already have a long, skilled tradition of navigating with them "offline" (looking up the references on paper).

In the recently completed, JISC-funded Open Journal and CogPrints Projects, the UK partners (Wendy Hall, Stevan Harnad, Les Carr) have successfully used citation linking to interconnect a small but interdisciplinary "seed" database of full texts in the Cognitive Sciences with a much larger 10-year set of abstracts and their reference lists. That larger set was drawn from a subset of the ISI (Institute for Scientific Information <http://www.isinet.com/prodserv/citation/citsci.html>) journal citation database in the Cognitive Sciences (Psychology, Neurobiology, Computer Science, Linguistics, Philosophy). This work has already gone some way toward solving the problem of automatically recognizing and linking (within and between texts) the finite but noisy set of existing citation formats (Hitchcock et al. 1997a-c, 1998a,b; Giles et al. 1998; Bollacker et al. 1998). Users were exhilarated by citation-based navigation but frustrated at being able to access only abstracts. The obvious conclusion was that the real power of citation linking can only be realized with full-text linking. That is what the LANL Archive makes possible.
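
As a rough illustration of what recognizing a "finite but noisy" set of citation formats involves, the following Python sketch normalizes author-year citations of the kind used in this document into one (authors, year) key per cited work. It is only a sketch under stated assumptions, not the method used in the Open Journal or CogPrints Projects: the function names and the regular expression are hypothetical, and it handles only semicolon-separated author-year citations with optional letter suffixes such as "1997a-c" or "1998a,b".

```python
import re

# A four-digit year with an optional suffix list such as "a", "a-c" or "a,b".
# (Assumed pattern: covers only the author-year style used in this document.)
SUFFIXED_YEAR = re.compile(
    r"(\d{4})((?:[a-z](?:-[a-z])?)(?:\s*,\s*[a-z](?:-[a-z])?)*)?"
)


def expand_suffixes(spec):
    """Expand 'a-c' to ['a', 'b', 'c'] and 'a,b' to ['a', 'b']; '' gives ['']."""
    suffixes = []
    for part in (p.strip() for p in spec.split(",")):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            suffixes.extend(chr(c) for c in range(ord(first), ord(last) + 1))
        else:
            suffixes.append(part)
    return suffixes or [""]


def parse_citations(text):
    """Return one (authors, year) key per work cited in an author-year string."""
    keys = []
    for group in text.split(";"):
        group = group.strip()
        first_year = re.search(r"\d{4}", group)
        if first_year is None:
            continue  # no recognizable year, so nothing to link
        authors = group[:first_year.start()].strip()
        for match in SUFFIXED_YEAR.finditer(group, first_year.start()):
            year, spec = match.group(1), match.group(2) or ""
            for suffix in expand_suffixes(spec):
                keys.append((authors, year + suffix))
    return keys


if __name__ == "__main__":
    for key in parse_citations("Hitchcock et al. 1997a-c, 1998a,b; Giles et al. 1998"):
        print(key)
```

Each resulting key, for example ("Hitchcock et al.", "1997b"), could then be matched against the entries of a reference list or an archive index to produce a link. The matching step is where most of the noise (variant spellings, missing suffixes, differing journal abbreviations) would have to be handled, and it is not attempted here.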