Open access eprint archives, where authors of published research papers can self-archive their work for all to see, pose a challenge to journal publishers. Researchers want to improve access to papers while preserving the recognised quality control established by journals. Open access archives will cause journals to review their business models and focus on adding new, digital features.
What is the scale of this challenge currently? Despite the rhetoric there are no quantitative studies. It can't be that difficult to produce a list of open access archives, surely? Actually, it is harder than might be imagined, not just because of the growing scale and sheer number of open access archives, but because of the evolving structure of distributed archives and independent services. Unlike journals, which are by design distinct and bounded entities (a collection of papers bounded by an editorial framework enforced by peer review standards), Web-based open access archives are built not simply as collections for browsing but also as open data sources for powerful, automated independent services such as search, aggregation and impact measurement. For this reason open access archives do not need a user interface, although most do have one. From a prospective reader's viewpoint (or that of someone surveying these archives), an archive may have no independent presence other than through a service interface.
The critical infrastructure required to support distributed archives and independent data services was introduced by the Open Archives Initiative (OAI) with its Protocol for Metadata Harvesting (PMH) in January 2001 (Lynch 2001). Tomaiuolo and Packer (2000) provided a checklist of disciplinary 'preprint' archives that, because OAI was then in its infancy, recognised the likely influence of cross-archive services such as search but could not have detected the growth in institutional archives that OAI has subsequently motivated.
So a new checklist is warranted, but a list of open access eprint archives, and examination of their contents, is insufficient as a measure of the challenge. It is important to examine archive service providers too.
Thus, this is not a list of individual open access archives of full-text research papers, but instead lists and comments on other lists of individual archives. This list and its categorisation give a broad overview of the structure, size and progress of full-text open access archives, and are intended to be useful for further quantitative research on the open access archive phenomenon.
Until 1999 many institutionally-based archives would have had a departmental
bias and contained technical reports (TRs), the Guild
Model identified by Kling et al. (2002). Since then the Open
Archives Initiative (OAI) has given momentum to a new type of institutional
archive that contains eprints of published (refereed) journal papers produced
within research and educational institutions. OAI archives can be disciplinary
or institutional, but the OAI's primary contribution has been to motivate new
institutional archives. Not all OAI archives serve full-text papers, and
it is definitely not a pre-condition of compliance with OAI that the items
described by OAI metadata are openly or freely accessible.
Both types of archive, full-text and non-full-text, can be found in
OAI
archives.
Where TR archives were essentially separate archives that could be indexed
(see for example the Unified
Computer Science Technical Report Index (UCSTRI) list of sites, one
of the first TR indexes on the Web) but had to be accessed and searched
separately for each institution or department, the OAI-PMH enables independent
services to provide common search and browse interfaces covering many archives.
To give users an idea of scope and coverage, these automated services typically
provide useful details of the indexed archives.
Find more details of OAI archives in OAI services-based
lists of archives
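The harvesting model described above can be sketched in a few lines. This is a minimal illustration, not the implementation used by any of the services listed here: it builds an OAI-PMH request URL and extracts identifiers and titles from a ListRecords response in the oai_dc metadata format. The base URL archive.example.org, the function names and the sample response are assumptions made for the example; the verb names and XML namespaces come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_request_url(base_url, verb, **arguments):
    """Return an OAI-PMH request URL for a verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


def extract_titles(list_records_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(list_records_xml)
    pairs = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(
            f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        # Title is None when a record carries no Dublin Core title.
        title = record.findtext(f".//{{{DC_NS}}}title")
        pairs.append((identifier, title))
    return pairs


# Hypothetical response, hand-written for this example (not real data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical archive address; a real service would fetch this URL.
url = build_request_url("http://archive.example.org/oai",
                        "ListRecords", metadataPrefix="oai_dc")
records = extract_titles(SAMPLE_RESPONSE)
```

A cross-archive service would repeat the same request against each archive it covers and merge the harvested metadata into one index, which is what allows a single search interface to span many archives.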
Some lists focus on institutional archives as the most likely area for
growth of open access, OAI-based eprint archives.
See Lists of institutional archives
Institutional archives can be distinguished by the type of software
used to build the archives. As can be deduced from the lists of institutional
archives, the software most widely used for this is produced by Eprints.org
(also known as GNU EPrints from version 2 of the software, to indicate
its availability as open source software under the GNU licence). Eprints.org-based
archives are mostly institutional, but not exclusively so. The Cogprints
disciplinary archive was built with software that evolved to become Eprints.org.
Other types of archive software are becoming available, and no doubt there
will soon be lists of archives supported by these packages. Whichever software
is chosen, these packages invariably produce archives that comply with
the OAI, so this list will overlap with the OAI list above.
For now see Eprints.org archives.
It is not the intent in this paper to list individual institutional
archives extensively, although a few are chosen to highlight different
implementation
models, described by Tennant (2002), adopted within institutions to
motivate the uptake of archive services across the range of cultures and
disciplines found within academic institutions.
See Institutional archives.
OAI services were not the first to introduce unified search and browse
interfaces for archives. Various gateway services preceded these. While
not archives in their own right, these services are important for the way
in which they have enabled the structure of different archives to evolve.
Some gateways are based on the largest archives, in this case the physics,
maths and computer science archives at arXiv. For example, a number of
previously independent maths archives merged with arXiv without loss of
functionality or focus due to interfaces such as the Front for the Mathematics
ArXiv. Other services combine searches on high-energy physics and astronomy
in arXiv with bibliographic sources.
See Centralising subject-based archive gateways.
Gateways have not exerted solely a centralising influence, and in two
notable examples, RePEc (Research Papers in Economics) and NCSTRL (Networked
Computer Science Technical Reference Library), can be found forerunners
of the distributed OAI model: independent archives, indexes and databases.
RePEc
is a large database of papers, an "Open Library", open to contributions
and providing open data for user services (Krichel 2000). Interpretations
vary on the proportion of material available as full texts from the constituent
archives of 'working papers', but RePEc is claimed to be the "second-largest
source of freely downloadable scientific preprints" after arXiv. The growth
and appeal of NCSTRL appear to have been limited by the large administrative,
maintenance and metadata overhead imposed on participating institutional
archives, a lesson learnt by the OAI designers who wanted a simpler, more
widely accepted standard metadata format describing the contents of archives.
NCSTRL is being converted into an OAI-compliant index.
See Decentralising archive gateways: the
Economics network (RePEc) example.
Perhaps one of the more surprising developments in the wider context
of full-text archives is the growth of open access journal archives. Papers
in these archives are not deposited by authors but by journal publishers.
Mostly this is focussed on biomedical journals, and was initiated by PubMed
Central, the US National Library of Medicine's site, which has grown significantly
after a slow start, and makes copies of subscription-based journals available
some time after publication. HighWire Press, a large producer of biomedical
e-journals, similarly makes delayed copies of journal papers available
free. Unlike PubMed Central and HighWire, the publisher BioMed Central
has pioneered a new business model of original open access journals funded
through author and institutional payments for review and publication. For
some in this field the progress represented by these examples is not enough,
and these will be joined by new open
access journals from the Public Library of Science (PLoS). The model
adopted by PubMed Central and PLoS has been endorsed by the Budapest
Open Access Initiative (BOAI), which by supporting both open access
archives and journals has reinvigorated the cause and adoption of services
providing open access to full-text research papers. There are other distinctive
and successful journal-archive models, such as Advances in Theoretical
and Mathematical Physics, a journal 'overlay' of some arXiv physics
archives that has published high-impact papers. Open access journals per
se, without an archive connection, are not included here.
See Open access journal archives.
Although it is not intended to list individual archives, some disciplinary
archives are significant enough to be included in their own right. These
archives demonstrate a wide range of types, from the ubiquitous arXiv,
to the large Citeseer autonomously indexed collection of computer science
papers mostly cached from authors' personal Web pages, to publisher-sponsored
preprint collections, as well as smaller, specialised archives.
See Disciplinary archives (not a comprehensive
list).
For a chronological view of the development of open access institutional archives in the wider context of free online scholarship (FOS), including many of the services and archives listed here, see Suber's Timeline of the FOS Movement.
This commented version of the archives metalist is just a snapshot of an emerging new phenomenon, of distributed institutional archives with real and growing open access content including published research papers. The engine for growth of these archives is the recognition by researchers and policy-makers that the improved impact achieved through open access, demonstrated by Lawrence (2001), is not only desirable but entirely compatible with peer reviewed publication. The core metalist will be maintained and updated on the Explore Open Archives section of the Open Citation Project Web site.
This list includes sources that were considered to be either current or recently updated at the time of the investigation in March 2003.
HighWire Press, Earth's Largest Free Full-Text Science Archives (20
archives), list produced to highlight HighWire's Free Online Full-text
Articles (see Open access journal archives) as the
largest such archive
http://highwire.stanford.edu/lists/largest.dtl
University of Maryland Libraries, Virtual Technical Reports Center:
EPrints, Preprints, & Technical Reports on the Web, "Institutions listed
here provide either full-text reports, or searchable extended abstracts
of their technical reports". Alphabetical by institution name (last updated
March 05, 2003)
http://www.lib.umd.edu/ENGIN/TechReports/Virtual-TechReports.html
University of Virginia Science and Engineering Libraries, Preprint Servers and Databases (33 archives, last modified: January 13, 2003) pointers to a variety of electronic pre-print sources in all areas of science and engineering http://viva.lib.virginia.edu/science/guides/s-preprn.htm
Tardis (JISC FAIR project 2002- ), E-print and Related Archives with
Subject and Institutional Categories Identified (first posted January 2003).
Institution, Multi-institution, Subject and Multidisciplinary archives
http://tardis.eprints.org/discussion/eprintarchivessubjecttable9103.htm
Aardvark, Asian Resources for Libraries, Free preprint and full text
science archives (115 archives, viewed 20 March 2003)
http://www.aardvarknet.info/user/subject19/index.cfm?all=All
American Mathematical Society (AMS), Directory of Mathematics Preprint and e-Print Servers http://www.ams.org/global-preprints/
Open Archives Forum, List of Repositories (20 archives, viewed 20 March
2003). No reasons for selection given
http://www.oaforum.org/oaf_db/list_db/list_repositories.php
OAIster, serving 1,093,169 records from 144 institutions (updated 21 February 2003) http://oaister.umdl.umich.edu/o/oaister/viewcolls.html
Arc, an experimental cross-archive search service, List of Existing
Archives
http://arc.cs.odu.edu:8080/oai/admin.jsp
my.OAI, user customisable search engine covering selected metadata databases
from the OAI, see forms-based list of databases in guest search interface
http://www.myoai.com/search/Search.cgi/LoginForm?Login=guest&Password=guest
Open Archives Initiative - Repository Explorer, Virginia Tech interface to test archives interactively for compliance with the OAI-PMH, see forms-based predefined archive list in Explorer interface http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai
Public Knowledge Project, Open Archives Harvester (12 archives, viewed
20 March 2003; listed archives have to request harvesting)
http://www.pkp.ubc.ca/harvester/archives.php
Caltech Collection of Open Digital Archives (CODA), includes more than 10 repositories in production or in development http://library.caltech.edu/digital/
The Information Bridge provides the open source to full-text and bibliographic
records of Department of Energy (DOE) research and development reports
in physics, chemistry, materials, biology, environmental sciences, energy
technologies, engineering, computer and information science, renewable
energy, and other topics. The Information Bridge consists of full-text
documents produced and made available by the Department of Energy National
Laboratories and grantees from 1995 forward. Additional legacy documents
are also included as they become available in electronic format http://www.osti.gov/bridge/
see also PrePRINT Network
SLAC SPIRES HEP literature database contains more than 500,000 high-energy physics related articles indexed by the SLAC and DESY libraries since 1974 http://www.slac.stanford.edu/spires/hep/
Citebase, citation-ranked search and impact discovery for arXiv (also covers CogPrints and BioMed Central) http://citebase.eprints.org/help/coverage.php
NASA ADS
ArXiv Preprints Query Form http://adsabs.harvard.edu/preprint_service.html
Harvard-Smithsonian Center for Astrophysics Preprints (CfA) Preprints
Query Form http://adsabs.harvard.edu/cfa/preprints.html
CERN Document Server (CDS), searchable Web interface to over 550,000
bibliographic records, including 220,000 fulltext documents in particle
physics and related areas, covers preprints, articles, books, journals,
photographs ... http://weblib.cern.ch/
Results include reference links (including journal links to publisher
site, abstract, summary only, not OpenURL) and cited by, but cannot search
or rank by citations
CDS services include:
MPRESS, The Mathematics Preprint Search System, a searchable index of preprints from 10 servers, mostly covering geographical servers, but also disciplinary servers including Topology Atlas, Algebraic Number Theory Archives (frozen since Jan 2003) and K-theory Preprint Archives, as well as the mathematics part of the arXiv mirror at Augsburg http://mathnet.preprints.org/
PrePRINT Network, Department of Energy's searchable gateway to preprint
servers that deal with scientific and technical disciplines of concern
to DOE: physics, materials, and chemistry, as well as portions of biology,
environmental sciences and nuclear medicine. Browse sites at http://www.osti.gov/preprints/ppnbrowse.html
see also Information Bridge
NTRS, NASA Technical Reports Server, search interface for 18 databases
http://techreports.larc.nasa.gov/cgi-bin/NTRS
The following services provide access to all or part of the RePEc database for browse or search:
A more broadly based database, rclis (Research in Computing, Library and Information Science) is in development
Networked Computer Science Technical Reference Library (NCSTRL) is being developed into a sustainable OAI conformant framework in a collaborative project involving NASA Langley, Old Dominion University, University of Virginia and Virginia Tech http://www.ncstrl.org/
Networked Digital Library Of Theses And Dissertations (NDLTD) http://www.ndltd.org/
PubMed Central (PMC) is the U.S. National Library of Medicine's digital archive of life sciences journal literature (52 participating journals at 20 Feb 2003) http://pubmedcentral.nih.gov/
HighWire Press Free Online Full-text Articles (list limited to journals published online with the assistance of HighWire Press). As of 2/28/03, 472,871 free full-text articles are included in the Free Online Full-text Articles from 1,358,713 total articles http://highwire.stanford.edu/lists/freeart.dtl
Advances in Theoretical and Mathematical Physics is an overlay of the arXiv archives. All papers are archived at LANL and its mirror sites. ATMP maintains only links to the above archive thus realising the first e-journal as an overlay to the global e-print archives http://www.intlpress.com/journals/ATMP/
BBS Prints Interactive Archive of the journal Behavioral and Brain Sciences containing original refereed 'target' papers, open peer commentary and responses (OAI compliant, Eprints.org journal archive) http://www.bbsonline.org/
Psycoloquy, articles and peer commentary in all areas of psychology as well as cognitive science, neuroscience, behavioral biology, artificial intelligence, robotics/vision, linguistics and philosophy (Eprints.org archive) http://psycprints.ecs.soton.ac.uk/
Citeseer (aka ResearchIndex), indexes Postscript and PDF research articles on computer science on the Web, and provides autonomous citation indexing, caches copies of freely available papers. Developed by NEC Research Institute, it is claimed to index over 500,000 papers. Not yet OAI compliant, but planned to become so http://citeseer.nj.nec.com/cs
EbizSearch, based on Citeseer, autonomously creates citation indexes of e-commerce literature. The search engine crawls Web sites of universities, commercial organizations, research institutes and government departments to retrieve academic articles, working papers, white papers, consulting reports, magazine articles, and published statistics and facts. For certain documents, the database only stores the hyperlinks to those documents. eBizSearch performs a citation analysis of all the academic articles accessed http://gunther.smeal.psu.edu/
Kling, Rob, Lisa Spector and Geoff McKim (2002) "Locally Controlled
Scholarly Publishing via the Internet: The Guild Model". SLIS
Indiana University, Center for Social Informatics, Working Paper No. WP-02-01
http://www.slis.indiana.edu/csi/WP/WP02-01B.html
also in Proceedings of the 2002 Annual Meeting of the American Society
for Information Science and Technology, Philadelphia, PA, November,
and Journal of Electronic Publishing, Vol. 8, No. 1, August http://www.press.umich.edu/jep/08-01/kling.html
Krichel, Thomas (2000) "RePEc, an Open Library for Economics". March
http://openlib.org/home/krichel/papers/salisbury.html
Lawrence, Steve (2001) "Free Online Availability Substantially Increases
a Paper's Impact". Nature Web Debate on e-access, May
http://www.nature.com/nature/debates/e-access/Articles/lawrence.html
Lynch, Clifford A (2001) "Metadata Harvesting and the Open Archives
Initiative". ARL Bimonthly Report, No. 217, August
http://www.arl.org/newsltr/217/mhp.html
Open Citation Project, Explore Open Archives http://opcit.eprints.org/explorearchives.shtml
Public Library of Science, Journals http://www.publiclibraryofscience.org/journals.htm
Suber, Peter (2002) Timeline of the Free Online Scholarship Movement http://www.earlham.edu/~peters/fos/timeline.htm
Tennant, Roy (2002) "Institutional Repositories". Library Journal, 15 September 2002 http://libraryjournal.reviewsnews.com/index.asp?layout=article&articleid=CA242297&display=Digital+LibrariesNews&industry=Digital+Libraries&industryid=3760&verticalid=151
Tomaiuolo, Nicholas G. and Packer, Joan G. (2000) "Preprint Servers:
Pushing the Envelope of Electronic Scholarly Publishing". Searcher,
Vol. 8, No. 9, October
http://www.infotoday.com/searcher/oct00/tomaiuolo&packer.htm
Unified Computer Science Technical Report Index (UCSTRI) http://www.cs.indiana.edu/ucstri/sitelist.html