The Open Citation Project - Reference Linking and Citation Analysis for Open Archives
Explore Open Archives

Core Metalist of Open Access Eprint Archives

The original, annotated version of this metalist appeared in the ARL Bimonthly Report, No. 227, April 2003.
This version last updated June 30, 2003


Open access eprint archives are where authors of published research papers and papers destined for peer reviewed publication can self-archive the full texts of their work for all to see. Researchers who self-archive want to improve access to papers while preserving the recognised quality control established by journals (Harnad 2001). The engine for growth of these archives is the recognition by researchers and policy-makers that the improved impact achieved through open access, demonstrated by Lawrence (2001), is not only desirable but entirely compatible with peer reviewed publication.

What is the scale of open access eprint archives, and of author self-archiving, currently? Despite the rhetoric there are no quantitative studies. The context for such studies is not just the growing scale of open access archives and the sheer number of archives, but the evolving structure of distributed archives and independent services. Web-based open access archives are built not simply as collections for browsing but also as open data sources for powerful, automated independent services such as search, aggregation and impact measurement.

The enabling infrastructure for distributed archives and independent data services was introduced by the Open Archives Initiative (OAI) with its Protocol for Metadata Harvesting (PMH) in January 2001 (Lynch 2001). Tomaiuolo and Packer (2000) provided a checklist of disciplinary 'preprint' archives that, because OAI was then in its infancy, recognised the likely influence of cross-archive services such as search but could not have detected the growth in institutional archives that OAI has subsequently motivated.
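To make the harvesting model concrete, the following is a minimal sketch of how a service provider might form an OAI-PMH request and read record identifiers out of the response. It assumes version 2.0 of the protocol (the ListIdentifiers verb with the oai_dc metadata prefix). The base URL http://archive.example.org/oai, the sample response and the helper names are hypothetical and do not describe any archive or service listed here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(base_url, verb, **arguments):
    """Return the GET URL for an OAI-PMH request (hypothetical helper)."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


def parse_identifiers(xml_text):
    """Return (identifiers, resumption_token) from a ListIdentifiers response.

    The token is None when the response carries no further pages.
    """
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hand-written sample response, used here instead of a live network call.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-03-20T12:00:00Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">http://archive.example.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:archive.example.org:1</identifier>
      <datestamp>2003-01-15</datestamp>
    </header>
    <header>
      <identifier>oai:archive.example.org:2</identifier>
      <datestamp>2003-02-02</datestamp>
    </header>
    <resumptionToken>page-2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

url = build_request("http://archive.example.org/oai", "ListIdentifiers",
                    metadataPrefix="oai_dc")
identifiers, next_token = parse_identifiers(SAMPLE_RESPONSE)
print(url)
print(identifiers, next_token)
```

A real harvester would fetch the URL, then repeat the request with the returned resumption token until none is supplied. That loop is what allows one service to cover many independent archives.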

So a new checklist is warranted, but a list of open access eprint archives, and examination of their contents, is insufficient as a measure of the challenge. It is important to look at archive service providers too.

Thus, this is not a list of individual open access archives of full-text research papers; instead it lists and comments on other lists of individual archives. This list and its categorisation give a broad overview of the structure, size and progress of full-text open access eprint archives.

This list will be maintained and updated as far as is possible, and is intended to assist further quantitative research on the open access eprint phenomenon for those who want to measure the growth and quality of open access eprint archives.

For a chronological view of the development of open access institutional archives in the wider context of free online scholarship (FOS), including many of the services and archives listed here, see Suber's Timeline of the FOS Movement.

The Budapest Open Access Initiative (BOAI), which supports both open access eprint archives and journals, has reinvigorated the cause and adoption of services providing open access to full-text research papers. While this list covers eprint archives, Bosc et al. offer an overview of new models of scientific communication (in French) that is more in line with the broader BOAI agenda.


Bosc, Hélène, Simone Jérôme and Jean-Philippe Schmitt (2003) La communication scientifique revue et corrigée par Internet

Harnad, Stevan (2001) "The Self-Archiving Initiative". Nature, 410: 1024-1025

Lawrence, Steve (2001) "Free Online Availability Substantially Increases a Paper's Impact". Nature Web Debate on e-access, May

Lynch, Clifford A (2001) "Metadata Harvesting and the Open Archives Initiative". ARL Bimonthly Report, No. 217, August

Suber, Peter (2002) Timeline of the Free Online Scholarship Movement

Tomaiuolo, Nicholas G. and Packer, Joan G. (2000) "Preprint Servers: Pushing the Envelope of Electronic Scholarly Publishing". Searcher, Vol. 8, No. 9, October

Structure of the metalist

Where the number of archives given in a source is stated, this is an approximate number intended to give an estimate of size. Since the numbers can change on a daily basis these are dated for reference, either by the last-modified date claimed by the resource when viewed, or the date viewed by the compiler of this list.

1 General lists of open access eprint (full-text) archives

Open Directory Project, Free Access Online Archives (60 archives listed, last update 16 March 2003)
Electronic Archives "providing free and unrestricted access to peer reviewed scientific papers and academic publications"

HighWire Press, Earth's Largest Free Full-Text Science Archives (20 archives), list produced to highlight HighWire's Free Online Full-text Articles (see Open access journal archives) as the largest such archive

University of Maryland Libraries, Virtual Technical Reports Center: EPrints, Preprints, & Technical Reports on the Web, "Institutions listed here provide either full-text reports, or searchable extended abstracts of their technical reports". Alphabetical by institution name (last updated March 05, 2003)

University of Virginia Science and Engineering Libraries, Preprint Servers and Databases (33 archives, last modified January 13, 2003), pointers to a variety of electronic pre-print sources in all areas of science and engineering

Tardis (JISC FAIR project 2002- ), E-print and Related Archives with Subject and Institutional Categories Identified (113 archives, first posted January 2003). Institution, multi-institution, subject and multidisciplinary archives

Aardvark, Asian Resources for Libraries, Free preprint and full text science archives (115 archives, viewed 20 March 2003)

American Mathematical Society (AMS), Directory of Mathematics Preprint and e-Print Servers

  • Umbrella servers, which cover all areas of mathematics, e.g.
  • Special subject (disciplinary) servers (17 archives, covering maths, physics)
  • Mathematics departments and institutes (institutional servers, 56 archives, international)
Astronomy Preprints & Abstracts, hosted by National Radio Astronomy Observatory, Charlottesville, VA, linked list of sites, includes institutional preprint servers (56 archives, viewed 20 March 2003)

2 OAI archives

Open Archives Initiative, registered data providers, "conforming repositories" (77 archives, viewed 27 March 2003). Sites found still to be using OAI 1.1 on 2002/12/01 were purged from this list

Open Archives Forum, List of Repositories (20 archives, viewed 20 March 2003). No reasons for selection given (OAF is a focus for dissemination of information about European activity related to open archives and, in particular, to the OAI)

2.1 OAI services-based lists of archives

Celestial, Open Archives gateway that harvests and caches metadata from OAI-PMH repositories and makes these data available for other services to harvest, includes number of records in repository and metadata namespace

OAIster, serving 1,093,169 records from 144 institutions (updated 21 February 2003)

Arc, an experimental cross-archive search service, used to investigate issues in harvesting OAI compliant repositories and making them accessible through a unified search interface, List of Existing Archives (140 archives, viewed 4 April 2003)

my.OAI, user customisable search engine covering selected metadata databases from the OAI, see forms-based list of databases in guest search interface (15 archives, viewed 4 April 2003)

Public Knowledge Project, Open Archives Harvester (12 archives, viewed 20 March 2003). Listed archives have to request harvesting

Open Archives Initiative - Repository Explorer, Virginia Tech interface to test archives interactively for compliance with the OAI-PMH, see forms-based predefined archive list in Repository Explorer interface (60 archives, viewed 4 April 2003)

3 Lists of institutional archives

SPARC, Select list of Institutional Repositories, by country, lists type of content (mostly preprints, published papers), software used (13 of 26 repositories listed use, last updated February 13, 2002), URL of repositories

Signal Hill, a European partnership for academic publishing set up by the University Libraries of Utrecht and Delft and Firenze University Press, institutional archives by country (34 archives, viewed 20 March 2003)

3.1 Institutional archives

University of California, California Digital Library eScholarship Repository, offers faculty a central location for depositing any research or scholarly output deemed appropriate by their participating research unit, center, or department, including working papers and pre-publication scholarship

Caltech, Collection of Open Digital Archives (CODA), includes more than 10 repositories in production or in development

US Department of Energy (DOE), the Information Bridge, provides the open source to full-text and bibliographic records of DOE research and development reports in physics, chemistry, materials, biology, environmental sciences, energy technologies, engineering, computer and information science, renewable energy, and other topics. Contains full-text documents produced and made available by the DOE National Laboratories and grantees from 1995 forward. Legacy documents are included as they become available
see also DOE PrePRINT Network, included in section on Centralising subject-based archive gateways

4 GNU EPrints archives

GNU EPrints, software for the development of institutional eprint archives, but can also be used to build other types of archives with other types of content. All the repositories known to have been built using the first two version releases of this software are in these two lists (viewed 20 March 2003):
EPrints 2 Archives (37 archives)
EPrints 1 Archives (29 archives)

5 Gateways (indexes, unified search and browse of covered sites)

5.1 Centralising subject-based archive gateways

ArXiv search interfaces
Front for the Mathematics ArXiv, alternative arXiv interface
NASA, Astrophysics Data System (ADS) ArXiv Preprints Query Form
Die Pro-Physik Findemaschine, specialised German search engine, includes arXiv among searchable resources, uses flexible taxonomies to support thematic searching across disciplines

NASA ADS, Harvard-Smithsonian Center for Astrophysics (CfA) Preprints Query Form

The Stanford Linear Accelerator Center (SLAC), SPIRES HEP literature database contains more than 500,000 high-energy physics related articles including journal papers, preprints, e-prints, technical reports, conference papers and theses, indexed by the SLAC and Deutsches Elektronen-Synchrotron (DESY) libraries since 1974

Citebase, citation-ranked search and impact discovery for arXiv (also covers CogPrints and BioMed Central)

Elsevier, Scirus, "the most comprehensive science-specific search engine on the Internet", covers over 135 million science-related pages, consisting of 120 million Web pages from paid-for sources as well as prominent eprint archives

CERN Document Server (CDS), searchable Web interface to over 550,000 bibliographic records, including 220,000 fulltext documents in particle physics and related areas, covers preprints, articles, books, journals, photographs ...
Results include reference links (including journal links to publisher site, abstract, summary only, not OpenURL) and cited by, but cannot search or rank by citations
CDS services include:

PhysDoc - Physics Documents Worldwide - offers lists of links to document sources, such as preprints, research reports, annual reports, and list of publications of worldwide distributed physics institutions and individual physicists, ordered by continent, country and town

MPRESS, the Mathematics Preprint Search System, a searchable index of preprints from 10 servers, mostly covering geographical servers, but also disciplinary maths servers including Topology Atlas, Algebraic Number Theory Archives and K-theory Preprint Archives, as well as the mathematics part of the arXiv mirror at Augsburg

US Department of Energy (DOE), PrePRINT Network, searchable gateway to preprint servers that deal with scientific and technical disciplines of concern to DOE: physics, materials, and chemistry, as well as portions of biology, environmental sciences and nuclear medicine. Browse sites at
see also DOE Information Bridge

NTRS, NASA Technical Reports Server, search interface for 18 databases

5.2 Decentralising archive gateways

Networked Computer Science Technical Reference Library (NCSTRL) is being developed into a sustainable OAI conformant framework in a collaborative project involving NASA Langley, Old Dominion University, University of Virginia and Virginia Tech
Browse list of participating archives

Networked Digital Library Of Theses And Dissertations (NDLTD), theses rather than eprints, but included here as an example of an archive aiming to present open access to full-text research outputs

Open Language Archives Community (OLAC), creating a worldwide virtual library of language resources, 21 participating archives, three service providers including OLAC Aggregator, Swahili Language Resources, and a virtual service provider. Open Language Archives are repositories of language data, documentation and description, including texts, recordings, dictionaries, grammars and field notes, where there is an intent to make the materials openly available, includes any such repository which has an accessible digital component, even if it is just an online catalog or a few digital holdings (use of "open" is inspired by OAI). Less an eprint archive, more a preservation and rescue service for language resources

5.3 The Economics network (RePEc) example

RePEc is a large database of working papers, journal articles and software components, with records on over 177,000 items, over 86,000 of which are available online (27 Feb 2003)

The following services provide access to all or part of the RePEc database for browse or search:

RePEc Archives

Current archive providers to RePEc
Participating institutions provide over 1000 RePEc series (many of the top series are journal series or smaller databases). LogEc list of the top 25 RePEc series of the past month

Working Papers in Economics

WoPEc, all papers in WoPEc are downloadable but not necessarily free (contains over 80,000 documents in electronic format: 53035 Working Papers, 41895 Journal Articles, last updated 23 March 2003). Among the largest contributing RePEc archives are the following working paper archives:

RePEc-modelled archives, not economics

Documents in Information Science (DoIS) is a database of articles and conference proceedings published in electronic format in the area of Library and Information Science. It holds about 10042 articles and 3045 conference proceedings, of which 6928 are downloadable (28th February 2003)

A more broadly based database, rclis (Research in Computing, Library and Information Science) is in development

6 Open access journal archives

BioMed Central (120 journals at 20 Feb. 2003)

PubMed Central (PMC) is the U.S. National Library of Medicine's digital archive of life sciences journal literature (52 participating journals at 20 Feb. 2003)

HighWire Press Free Online Full-text Articles (list limited to journals published online with the assistance of HighWire Press). At 28 Feb. 2003, 472,871 full-text articles were available free from 1,358,713 total articles
Free Online Full-text Articles is the top entry in Earth's Largest Free Full-Text Science Archives (a list produced by HighWire Press)

Advances in Theoretical and Mathematical Physics is an overlay of the arXiv archives. All papers are archived at LANL and its mirror sites. ATMP maintains only links to the above archive, thus realising one of the first e-journals as an overlay to the global eprint archives

BBS Prints Interactive Archive of the journal Behavioral and Brain Sciences containing original refereed 'target' papers, open peer commentary and responses (OAI compliant, journal archive)

Psycoloquy, articles and peer commentary in all areas of psychology as well as cognitive science, neuroscience, behavioral biology, artificial intelligence, robotics/vision, linguistics and philosophy ( archive)

Open access journals per se, without an archive connection, are not included here.
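As an indication of the kind of simple measure these dated figures allow, the sketch below works out what share of a collection is openly reachable, using the HighWire Press and RePEc numbers quoted in this list. The function name is hypothetical, and the RePEc inputs are the rounded "over" figures given above, so that result is only indicative. For RePEc, "available online" does not necessarily mean free.

```python
def share_percent(available, total):
    """Percentage of items in a collection that are available."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * available / total


# HighWire Press, 28 Feb. 2003: free full-text articles out of all articles.
highwire_free = share_percent(472_871, 1_358_713)

# RePEc, 27 Feb 2003: items available online out of all records.
# Both inputs are quoted as "over" figures, so treat the result as rough.
repec_online = share_percent(86_000, 177_000)

print(f"HighWire free full text: {highwire_free:.1f}%")
print(f"RePEc available online:  {repec_online:.1f}%")
```

On these figures roughly a third of HighWire articles were free, and a little under half of RePEc items were available online.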

7 Disciplinary archives

arXiv (1991-  ), main administration site at Cornell University, multiple mirrors worldwide, manages access to over 230,000 papers. Abstracts include links to citation analysis for the paper by SLAC SPIRES and Citebase

Citeseer (1998-  , aka ResearchIndex), developed at NEC Research Institute, NJ, USA, caches openly accessible full-text research papers on computer science found on the Web in Postscript and PDF formats for autonomous citation indexing. It is claimed to index over 500,000 papers. Not yet OAI compliant, but planned to become so

eBizSearch (2001-  ), administered by the eBusiness Research Center at Pennsylvania State University, based on Citeseer software, autonomously creates citation indexes of e-commerce literature. The search engine crawls Web sites of universities, commercial organizations, research institutes and government departments to retrieve academic articles, working papers, white papers, consulting reports, magazine articles, and published statistics and facts. Not all documents are stored by eBizSearch, which performs a citation analysis of all articles accessed


* searchable via MPRESS

The International Mathematical Union adopted a resolution (May 2001) encouraging mathematicians to make their work available online: "Open access to the mathematical literature is an important goal. ... Our action will have greatly enlarged the reservoir of freely available primary mathematical material, particularly helping scientists working without adequate library access."

Cognitive Science

  • Cogprints (1997-  ), an electronic archive for self-archived papers in any area of Psychology, Neuroscience, and Linguistics, and many areas of Computer Science, Philosophy, Biology, Medicine, Anthropology, as well as any other areas pertinent to the study of cognition, initially a project in the JISC Electronic Libraries (eLib) Programme, administered by the IAM Group, University of Southampton

Library and Information Science (LIS)

  • E-LIS, E-Prints in Library and Information Science
  • DList, Digital Library of Information Science and Technology (October 2002-  ), managed by School of Information Resources and Library Science and Arizona Health Sciences Library, University of Arizona

Publisher supported (author self-archiving) preprint archives

Elsevier appears a little shy of associating itself with the latter two preprint servers. The connection is not indicated on the home pages of the Computer Science and Mathematics servers, but is made clear on the 'About' pages within the respective services (although even that has not always been the case, as this email correspondence attests). The servers are not linked from the Elsevier Science home page, nor can they be found easily if at all by browsing from this page, and search returns no results for 'preprint servers' (tried 27 March 2003). All services are searchable from Scirus, and the Mathematics preprint server is linked from Elsevier Science's Mathematics Web portal.

Many journals operate a preprint archive, making electronic copies of papers available before print publication. These are typically not based on author self-archiving, nor are they open access, and so are not covered here.

Other disciplinary archives

  • HTP Prints, the History & Theory of Psychology Eprint Archive (September 2001-  ), administered at York University, Toronto
  • Education-line (1997-  ), a freely accessible database of the full text of conference papers, working papers and electronic literature which supports educational research, policy and practice, initially a project in the JISC Electronic Libraries (eLib) Programme, administered by the Brotherton Library, University of Leeds
  • Social Science Research Network (SSRN), Social Science Electronic Publishing, Inc., working papers and abstracts are provided by journals, publishers, and institutions for distribution through SSRN's eLibrary, which consists of two parts: a database containing abstracts on over 49,200 scholarly working papers and forthcoming papers, and an Electronic Paper Collection containing over 30,800 (27 March 2003) downloadable full-text documents. SSRN is composed of specialized research networks/journals in the social sciences: Accounting, Economics, Financial Economics, Legal Scholarship, Management, Negotiations. From an eprint perspective this is a curious amalgam, not a pure eprint archive at all, more a subscription-based service. The business model and purpose are not clear. Are downloadable papers freely downloadable? Clearly some are, but what proportion, if not all, is not clear. Networks can be browsed separately, but not searched separately, it appears. It does not seem to be possible to search only for freely downloadable papers
  • ArchiveSIC (open archive on Sciences de l'Information et de la Communication), full-text papers on information and communication science (bilingual site in French/English)
  • Electronic Colloquium on Computational Complexity (papers from 1994), led by the chair of theoretical computer science and new applications at the University of Trier. Research reports, surveys and books in computational complexity
  • Cryptology ePrint Archive (2000- ), maintained by the International Association for Cryptologic Research (IACR), incorporates contents of the Theory of Cryptology Library 1996-1999
  • The Digital Library of the Commons (DLC), Indiana University, contains a Working Paper Archive of author-submitted papers, as well as full-text conference papers, dissertations, working papers and pre-prints. (The commons is a general term for shared resources in which each stakeholder has an equal interest. Studies on the commons include the information commons with issues about public knowledge, the public domain, open science, and the free exchange of ideas.)
  • Organic Eprints (September 2002-  ), established by the Danish Research Centre for Organic Farming (DARCOF), open access archive for papers related to research in organic agriculture
  • University of California International and Area Studies (UCIAS) Digital Collection (October 2002-  ), partnership of the University of California Press, the California Digital Library (CDL), and internationally oriented research units on eight UC campuses, publishes articles, monographs, and edited volumes that are peer-reviewed according to standards set by an interdisciplinary UCIAS Editorial Board and approved by the University of California Press
  • Formations, Faculty of Arts, University of Ulster, hosts eprints in Media Studies and participative 'eLearning Forums' based on short discussion papers. Initially a project in the JISC Electronic Libraries (eLib) Programme
  • Ecology Preprint Registry (papers from July 2001), hosted at the National Center for Ecological Analysis and Synthesis, dissemination of new research results destined for publication (i.e. not white papers or gray literature), only preprints with a theoretical basis can be submitted, the scope may be expanded to include submissions from the entire discipline of ecology
  • PhilSci Archive (January 2001-  ), hosted at the Departments of Philosophy and of History and Philosophy of Science, University of Pittsburgh, preprints in the philosophy of science

