Project Manager: Steve Hitchcock
Lead Institution: Southampton University
Duration of Award: 10/99-09/02
Period of Report: 10/01-12/02
Version history of this report
This version 0.9 DRAFT (internal project use only)
First year report http://opcit.eprints.org/y1report/y1report-final.pdf
Second year report http://opcit.eprints.org/y2report/y2report20.pdf
It has always been clear that awareness of open access to published research papers, which means that users can access the papers free of charge, is insufficient among the wider academic community. To widen use of open access, what is needed are real tools and services that show open access works, on terms demanded by authors of research papers. Key to this is the causal connection between research access and research impact: open access increases impact. In the UK there are signs that the next Research Assessment Exercise will use citation analysis, a means of measuring the impact of published research.
With the completion of the Open Citation (OpCit) Project, a broadly-based campaign for raising awareness of open access, embracing the Open Archives Initiative and the Budapest Open Access Initiative, can now be complemented by software for building open access eprint archives, GNU EPrints (also known as eprints.org software), and a citation-ranked search engine for open archives, Citebase. Together these tools enable authors to provide open access to their papers, and to measure impact by citation counts and usage.
The principal partners in the Open Citation Project were Southampton University's IAM Group, the Digital Library Research Group at Cornell University, and arXiv, which was based at Los Alamos at the outset of the project and is now hosted at Cornell.
The method used by the project at Southampton has been to build tools to measure and analyse citations from all 200,000+ papers stored by the arXiv physics archives, the largest eprint archive of its type. These data have been complemented, experimentally, with data on how the archives are used, e.g. which papers are viewed most. Collectively the citation and usage data are stored in Citebase, a citation database which provides a user interface for search and discovery, and a machine interface for analysis of this rich data source by other services.
With the emergence of OAI and the consequent emphasis on institutional archives, it was evident there would be a need for large numbers of archives smaller than arXiv, but which would need to operate on similar principles — low cost, largely automated deposit, indexing and dissemination of author-archived content. Software used to build CogPrints, a cognitive sciences archive modelled on arXiv, was rewritten to make it OAI-compliant, and then to make it generic. This became the basis of GNU EPrints, which was further developed within the remit of the Open Citation project to generalise the author and management interfaces for open-access archives.
Of most significance, EPrints builds archives that comply with the OAI Protocol for Metadata Harvesting (PMH). This means that any content deposited within an EPrints-based archive will become visible to users of independent OAI services, such as Citebase, immediately enhancing the chances of discovery. Authors depositing papers in an EPrints archive are not required to have any knowledge of OAI metadata: it is generated automatically.
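As a rough illustration of what harvesting involves, the sketch below builds an OAI-PMH `ListRecords` request and reads titles out of a Dublin Core (`oai_dc`) response. The verb, the `metadataPrefix` argument and the XML namespaces come from the OAI-PMH 2.0 specification; the archive address, the function names and the sample record are assumptions made up for this example, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Dublin Core element namespace used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL for an OAI-PMH data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return the text of every dc:title element in an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


# Hand-written sample response (hypothetical record, trimmed to the essentials).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://archive.example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also need to follow resumption tokens and handle protocol errors, which this sketch leaves out.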
Connecting papers in open eprint archives and a citation database is a method for automatically extracting metadata and reference lists from the papers. There are many different applications for reference linking. The project at Cornell considered the question "what would be the ideal behavior of a digital object that supported reference linking (both incoming and outgoing)?" Answering this question led to an application programming interface (API) for reference linking.
All three components have been tested, evaluated and shown demonstrably to be useful by third-party users, and will continue to be developed and integrated within new projects and products beyond the lifetime of the OpCit project.
The activities of the OpCit project were described by Hitchcock et al. (2002a).
Citebase is a citation-ranked search and impact discovery service that measures citations of scholarly research papers that are available on the Web in the larger open access, OAI disciplinary archives - currently arXiv (http://arxiv.org/), CogPrints (http://cogprints.soton.ac.uk/) and BioMed Central (http://www.biomedcentral.com/). Citebase harvests OAI metadata records for papers in these archives, automatically extracting the references from each paper. The association between document records and references is the basis for a classical citation database.
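The association between records and references can be sketched as follows: a minimal illustration, not Citebase's actual code. It assumes each reference has already been resolved to the identifier of the cited paper; the paper identifiers and function names are invented for the example.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert a {citing paper: [cited papers]} map into {cited: {citing papers}}."""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            if cited != citing:  # ignore self-references in this sketch
                cited_by[cited].add(citing)
    return dict(cited_by)


def rank_by_citations(references):
    """Order all known papers by citation count (highest first, ties by id)."""
    cited_by = build_citation_index(references)
    papers = set(references) | set(cited_by)
    return sorted(papers, key=lambda p: (-len(cited_by.get(p, ())), p))


if __name__ == "__main__":
    # Hypothetical reference lists extracted from three papers.
    refs = {"A": ["B", "C"], "B": ["C"], "D": ["C", "B"]}
    print(build_citation_index(refs))
    print(rank_by_citations(refs))
```

Counting distinct citing papers, rather than raw reference occurrences, avoids double counting when a reference list mentions the same work twice.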
The primary means by which users access this database is the Citebase Web interface (http://citebase.eprints.org/) (Figure 1). The user can classify the search query terms (typical of an advanced search interface) based on metadata in the harvested record (title, author, publication, date). In separate interfaces, users can search by archive identifier or by citation. What differentiates Citebase is that it also allows users to select the criterion for ranking results, either by data processed by Citebase (citation impact, author impact) or by terms in the records identified by the search, e.g. date. It is also possible to rank results by the number of 'hits', a measure of the number of downloads and therefore a rough measure of the usage of a paper. This is an experimental feature to analyse the quantitative and the temporal relationship between hit (i.e. usage) and citation data, as measures of impact. Hits are currently based on limited data from download frequencies at the UK arXiv mirror at Southampton only.

Figure 1. Citebase search interface showing user-selectable criteria for ranking results
The combination of data from an OAI record for a selected paper with the references from and citations to that paper is also the basis of the Citebase record for the paper. A record can be opened from a search results list. The record contains bibliographic metadata and an abstract for the paper, from the OAI record. This is supplemented with four characteristic services from Citebase:
· Graph of this Article's Citation/Hit History for the paper
· All Articles Cited by this Article (Reference List)
· Top 5 Articles Citing this Article (option to view All Articles Citing this Article)
· Top 5 Articles Co-cited with this Article (option to view All Articles Co-Cited with this Article)
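The co-citation service in the list above can be illustrated with a short sketch. Two papers are co-cited when they appear together in the reference list of a third paper. This is an assumption-laden example rather than the Citebase implementation: the identifiers, function names and tie-breaking rule are all invented here.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(references):
    """Count, for each pair of papers, how many reference lists contain both."""
    counts = Counter()
    for cited_list in references.values():
        # Sort and de-duplicate so each pair is counted once per citing paper.
        for a, b in combinations(sorted(set(cited_list)), 2):
            counts[(a, b)] += 1
    return counts


def top_cocited(paper, references, n=5):
    """Return up to n (partner, count) pairs most often co-cited with paper."""
    partners = Counter()
    for (a, b), count in cocitation_counts(references).items():
        if a == paper:
            partners[b] += count
        elif b == paper:
            partners[a] += count
    return sorted(partners.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    # Hypothetical reference lists of three citing papers.
    refs = {"P1": ["X", "Y", "Z"], "P2": ["X", "Y"], "P3": ["Y", "Z"]}
    print(top_cocited("X", refs))
```

The same pair counts could serve as a proximity measure when plotting papers on a graph.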
Another option presented to users from a results list is to open a PDF version of the paper. This option is also available from the record page for the paper. This version of the paper is enhanced with linked references to other papers identified as being within arXiv, and is produced by OpCit. Since the project began, arXiv has been producing reference-linked versions of papers. Although the methods used for linking are similar, they are not identical and OpCit versions may differ from versions of the paper available from arXiv. An important finding of the evaluation will be whether reference linking of full-text papers should be continued outside arXiv. An earlier evaluation found that arXiv papers are the most appropriate place for reference links because users overwhelmingly use arXiv for accessing full texts of papers, and references contained within papers are used to discover new works (see http://opcit.eprints.org/evaluation/v10/v10evaluation.html).
Just prior to the evaluation Citebase had records for 230,000 papers, indexing 5.6 million references. By discipline, approximately 200,000 of these papers are classified within arXiv physics archives.
The current target user group for Citebase is physicists. The impact being made by OAI should help extend coverage significantly to other disciplines, although because the emphasis of OAI is on promoting institutional archives the impact on disciplines, as measured by services such as Citebase, may take longer to emerge. For this reason there is a need to target this evaluation at prospective users, not just current users, so that Citebase can be designed for an expanding user base.
Prior to evaluation Citebase had not been formally announced and was little used. The evaluation was first announced to selected open discussion lists targeted at colleagues in digital library research programmes, advocates of open access to the scholarly literature, and librarians. The most significant contributor to increased usage was the inclusion of links, on a trial basis, from abstract pages of papers in arXiv to the corresponding Citebase records.
A notable success of the evaluation has been to increase usage of Citebase, in terms of average daily visits, by more than a factor of 10. There is still considerable scope to increase usage of Citebase by arXiv physicists. According to Paul Ginsparg, founder of arXiv: "(Citebase) is a potentially critical component of scholarly information architecture".

Figure 2. Co-citation map of the entire arXiv collection
A first attempt to extend the analysis and presentation of citation relationships has been explored with OpCit e-Services (http://opcit.eprints.org/eservices/). Like Citebase, the e-Services framework uses OAI to download metadata about papers in arXiv, for which it then provides advanced services:
· simple visualisations (e.g. number of e-print deposits each year)
· knowledge services (e.g. most significant papers)
· co-citation visualisations (uses the co-citedness of papers as a proximity measure when plotting papers on a graph) (Figure 2)
The approach needs refinement before user interface issues can be tackled. First, the large dataset causes computation to slow significantly. Second, due to erroneous or missing citations, some visualisations may not display convincing or useful patterns.
This was the first detailed investigation of the impact on users of an open access Web citation indexing service. The evaluation, including details of methodology, design and results, has been reported by Hitchcock et al. (2002b).
The following elements of Citebase were the focus of the evaluation:
Given the wide prospective user base, what was evaluated was not just the current implementation of Citebase, but the principle of citation-based navigation and ranking.
The evaluation sought to:
The evaluation used two methods to collect data:
The evaluation was open from June 2002, when the first observational tests took place, to the end of October 2002 when a closure notice was placed on the forms. Links from arXiv became active on 20th August.
Valid submissions to Form 1 were received from 195 evaluators. Although the primary target group were physicists, responses also came from mathematicians, computer scientists, information scientists, cognitive scientists, biologists, health scientists, and others.
Overall, results of the evaluation show there is much scope for improvement, but Web-based citation indexing of open access archives, as exemplified by Citebase, is closer to a state of readiness for serious use than had previously been realised.
Within the scope of its primary components, the search interface and services available from a Citebase record, it was found Citebase can be used simply and reliably for resource discovery. More data need to be collected and the process refined before it is as reliable for measuring impact. As part of this process users should be encouraged to use Citebase to compare the evaluative rankings it yields with other forms of ranking.
Citebase is a useful service that compares favourably with other bibliographic services, although it needs to do more to integrate with some of these services if it is to become the primary choice for users.
The linked PDFs are unlikely to be as useful to potential users as the main features of Citebase; among physicists linked PDFs are likely to be little used.
Although the majority of users were able to complete a task involving all the major features of Citebase, user satisfaction appeared to be markedly lower when users were invited to assess navigability than for other features of Citebase.
Perhaps one of the most important findings of the evaluation is that Citebase needs to be strengthened considerably in terms of the help and support documentation it offers to users.
The first step must be to examine the results of this evaluation to improve the services Citebase offers with a view to establishing Citebase as a service used regularly by all arXiv users.
There are wider objectives and aspirations for developing Citebase. The overarching purpose is to help increase the open-access literature. Where there are gaps in the literature - and there are very large gaps in the open-access literature currently - Citebase will motivate authors to accelerate the rate at which these gaps are filled.
EPrints is aimed at institutions and special-interest communities, and is now used by nearly 60 archives.
In its current incarnation, the name GNU EPrints reflects that it is open source and freely available under the GNU General Public License and conforms to the strict GNU guidelines for free software. The last major release of EPrints, version 2.0, appeared in February 2002, although it has been updated (now on version 2.2.1) to conform with the latest OAI-PMH (also version 2) announced in June. Features of EPrints version 2, described by Gutteridge (2002), include:
· Internationalised metadata stored as Unicode
· Support for multiple archives on one server
· An improved user interface
EPrints has new features that extend its focus on institutional research papers. It is now configurable for adoption as a journal-archive for new open access journals or established journals converting to open access, e.g. Psycoloquy. There are plans to extend EPrints for structured data handling in, for example, e-science applications.
The API uses four principal methods:
Each component produced by these methods can be seen in a typical Citebase record, but this approach is more generalisable to other reference linking applications than that used to build Citebase.
A few Java classes were defined to support reference linking in an object oriented way. These methods can be invoked on the surrogate, a special class in the API that encapsulates data regarding a particular online digital object. To use the API, a new surrogate is instantiated, passing it the URL of the online digital object for which information is to be gathered.
The bulk of the analysis within the API program is done by the surrogate constructor. This call downloads the online work, turns it into XHTML, parses the XHTML, and extracts some information, such as citations and references. The next call on the API invokes the method that returns the references in the form of an XML document, which is then converted to a string and printed.
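The call sequence described above can be sketched in outline. The project's API was written in Java; the example below uses Python only to illustrate the flow, and it is not the Cornell API. The class layout, the method name `get_reference_list`, the injected `fetch` function and the convention that references appear as `<li class="reference">` items are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET


class Surrogate:
    """Hypothetical stand-in for a surrogate: analyses a document on creation."""

    def __init__(self, url, fetch):
        self.url = url
        # 'fetch' stands in for downloading the work and converting it to XHTML.
        xhtml = fetch(url)
        root = ET.fromstring(xhtml)  # parsing fails if the markup is not well formed
        # Assumed convention: each reference is an <li class="reference"> item.
        self._references = [
            "".join(li.itertext()).strip()
            for li in root.iter("li")
            if li.get("class") == "reference"
        ]

    def get_reference_list(self):
        """Return the extracted references as an XML document in a string."""
        doc = ET.Element("references", {"source": self.url})
        for text in self._references:
            ET.SubElement(doc, "reference").text = text
        return ET.tostring(doc, encoding="unicode")


# Hand-written sample page (no XHTML namespace, to keep the sketch short).
SAMPLE_PAGE = """<html><body><ol>
<li class="reference">Author A. (2001) First cited work.</li>
<li class="reference">Author B. (2002) Second cited work.</li>
<li>Not a reference</li>
</ol></body></html>"""

if __name__ == "__main__":
    surrogate = Surrogate("http://example.org/paper.html", lambda url: SAMPLE_PAGE)
    print(surrogate.get_reference_list())
```

Doing the heavy analysis once in the constructor means later calls only format data that has already been extracted.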
It is anticipated that repositories will at some point contain reference linking data, so the API was later extended to support persistent storage of surrogates. Once a surrogate is instantiated, it can be saved to a repository, if desired. Thus one could build a repository of surrogates, which could later be re-instantiated and have the basic API methods invoked on them.
The API was used to build several applications against online journals (D-Lib Magazine, Journal of Electronic Publishing, ACM Digital Library). With five methods (the original four, plus save) the API was found to be sufficiently usable. The main limitation of the software is that not all HTML pages are equally easy to analyse, e.g. some HTML is badly written and cannot be converted into XHTML and, therefore, cannot be parsed. This is likely to remain a problem on the Web for some time. A more complete description of the reference linking API and its evaluation, including the D-Lib application, can be found in Bergmark and Lagoze (2001).
All three components described above, and a new component, Paracite, a software agent and search interface for parsing and locating raw references on the Web, are usable by others and will continue to be so beyond the conclusion of the OpCit project. What is available, the means of access, and plans for maintenance of services, are noted below:
· Citebase is now up-to-date and indexes arXiv fully. Citebase can be searched by users at http://citebase.eprints.org/. A machine interface for data sharing with other services is operational, and Citebase is listed as an OAI 2.0-conforming data provider (http://www.openarchives.org/Register/BrowseSites.pl). Researchers at Old Dominion University have harvested Citebase data as part of their Archon federated digital library on physics, and arXiv is a possible (re)harvester of Citebase data too. Due to ongoing developments with the data formats, enquiries about the machine interface should be directed to the developer, Tim Brody, tdb01r@ecs.soton.ac.uk. Both interfaces to Citebase will continue to be developed and maintained.
· GNU EPrints is available as open source software and is downloadable from http://software.eprints.org/. Machine requirements for running GNU EPrints are other open source components including Linux, Apache Web server, Perl and a MySQL database. GNU EPrints will continue to be developed and maintained.
· The Reference linking API was written in Java and is downloadable from the OpCit project site at Cornell http://www.cs.cornell.edu/cdlrg/Reference%20Linking/. The API is no longer being developed.
· Paracite is still experimental, but can be tried at http://paracite.eprints.org/. There are plans to use the reference linking API within Paracite. Paracite will continue to be developed.
The ideas and efforts that have characterised OpCit will be taken forward not just in the obvious products of the project, such as Citebase and GNU EPrints, but in new environments as well.
The JISC Focus on Access to Institutional Resources (FAIR) programme, which is just beginning, includes major projects that will seek to extend the culture of EPrints-based archives in UK universities through the provision and targeting of new archives and supplementary services:
· SHERPA (Securing a Hybrid Environment for Research Preservation and Access), lead institution: Nottingham University, will build EPrints-based archives at six major UK universities, using this experience to report on the implications for management and quality control of such archives.
· E-Prints UK, Resource Discovery Network, King's College London, plans to use Citebase software and citation data from Citebase to enhance its database for discovery of eprint papers available from Open Archives hosted at UK universities and colleges.
· TARDIS (Targeting Academic Research for Deposit and dISclosure), Southampton University, will investigate strategies 'to overcome the technical, cultural and academic barriers', which might be found to be restricting the development of institutional eprint archives, by developing a working model of a multidisciplinary institutional archive based on EPrints.
· RoMEO (Rights MEtadata for Open archiving), Loughborough University, will canvass users to identify (mis)perceptions about how rights should be formulated and protected for 'give away' works, "texts from which the author does not seek sales revenue", promoting practical approaches that can be "assigned, disclosed, harvested, and displayed" via the OAI-PMH.
To improve interoperability, scalability and reliability of OAI services, OpCit has worked with a team from Old Dominion University (USA) on infrastructure components such as proxies and caches (Liu et al. 2002). Proxies, transparent layers acting between data providers and harvesters, can be used to fix simpler encoding errors as part of the delivery process. More serious errors in the data require an intermediate storage approach: caching and aggregation. In this case a few large service providers might harvest and cache metadata from registered OAI repositories, reducing the load on those archives and serving many smaller harvesters. Celestial, an OAI aggregator, is software that harvests metadata from OAI-compliant repositories and re-exposes that metadata to other services - in effect an OAI cache. Metadata can be harvested from Celestial's aggregated collection (all the metadata from all the source repositories), or from repository-specific interfaces. The Celestial software can be downloaded from http://oai-perl.sourceforge.net/
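The caching and aggregation idea can be sketched in a few lines. This is not the Celestial code: the class, its method names and the record layout are assumptions for illustration, and each source repository is represented by a plain function instead of a real OAI-PMH harvest.

```python
class CachingAggregator:
    """Harvest each source once, then serve many readers from the cache."""

    def __init__(self, sources):
        # sources: {repository name: zero-argument function returning records}
        self._sources = sources
        self._cache = {}

    def harvest(self):
        """Contact every source repository and store a copy of its records."""
        for name, fetch in self._sources.items():
            self._cache[name] = list(fetch())

    def records(self, repository=None):
        """Serve cached records: one repository's, or the whole aggregate."""
        if repository is not None:
            return list(self._cache.get(repository, []))
        return [rec for name in sorted(self._cache) for rec in self._cache[name]]


if __name__ == "__main__":
    # Two hypothetical source repositories.
    aggregator = CachingAggregator({
        "archive-a": lambda: [{"id": "oai:a:1"}],
        "archive-b": lambda: [{"id": "oai:b:1"}, {"id": "oai:b:2"}],
    })
    aggregator.harvest()
    print(len(aggregator.records()))        # aggregated collection
    print(aggregator.records("archive-b"))  # repository-specific view
```

Downstream harvesters read only from the cache, so the load on each source archive depends on how often the aggregator harvests, not on how many services consume the data.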
EPrints software is undoubtedly the better known product of the OpCit project, and this is reflected in coverage in popular news and feature sources shown below. It could be argued that Citebase or similar services will ultimately have more impact with users, but EPrints is necessary now and plays a critical role in enabling open-access archives to be filled.
· Colin Steele, E-prints: the future of scholarly communication? InCite, October 2002 http://www.alia.org.au/incite/2002/10/eprints.html
· Konrad Lischka, Der Geist, der aus der Flasche kam, Telepolis magazine, 16th March 2002 (in German) http://www.heise.de/tp/deutsch/special/copy/12031/1.html
Citebase
· Belinda Weaver, Open archives citation tool, InCite, October 2002 http://www.alia.org.au/incite/2002/10/weaver.html
EPrints
· Roy Tennant, Institutional Repositories, Library Journal, 15th September 2002
· Georg C. F. Greve, Brave GNU World - GNU EPrints, Linux Magazin, September 2002 (in German) http://www.linux-magazin.de/Artikel/ausgabe/2002/09/bgw/bgw.html
· Raym Crow, The Case for Institutional Repositories: A SPARC Position Paper, The Scholarly Publishing & Academic Resources Coalition, August 2002 http://www.arl.org/sparc/IR/ir.html
· Kendra Mayfield, College Archives 'Dig' Deeper, Wired News, 3rd August 2002 http://www.wired.com/news/school/0,1383,54229,00.html
· Jeffrey R. Young, 'Superarchives' Could Hold All Scholarly Output, Chronicle of Higher Education, 5th July 2002 http://chronicle.com/free/v48/i43/43a02901.htm
· Anon. The ghost is out of the bottle, Higher Education & Research Opportunities in the UK, 29th March 2002 http://www.hero.ac.uk/inside_he/the_ghost_is_out_of_the_b1365.cfm
· Ivan Noble, Boost for research paper access, BBC Online News, 14th February 2002 http://news.bbc.co.uk/1/hi/sci/tech/1818652.stm
· Ed Sponsler and Eric F. Van de Velde, Eprints.org Software: A Review, SPARC E-News, August-September 2001 http://www.arl.org/sparc/core/index.asp?page=g20#6
· Kendra Mayfield, The Science of E-Publishing, Wired News, 19th October 2000 http://www.wired.com/news/culture/0,1284,39323,00.html
Citebase
GNU Eprints
Paracite
The following papers and reports were published by the project during the final year of its work from September 2001. The list should be read to include the references above. A full list of publications covering the whole of the project back to 1999 is available at http://opcit.eprints.org/opcitpapers.shtml
Bergmark, D. and Lagoze, C. (2001) “An Architecture for Automatic Reference Linking”. 5th European Conference on Research and Advanced Technology for Digital Libraries (ECDL), Darmstadt, September
http://www.cs.cornell.edu/cdlrg/Reference%20Linking/tr1842.ps
Bergmark, D., Phempoonpanich, P. and Zhao, S. (2001) “Scraping the ACM Digital Library”. SIGIR Forum, Vol. 35 No. 2, Fall
http://www.acm.org/sigir/forum/F2001/bergmarkFinal.pdf
Brody, T., Carr, L. and Harnad, S. (2002) “Evidence of Hypertext in the Scholarly Archive”. Proceedings of HT'02, the 13th ACM Conference on Hypertext, University of Maryland, June 2002
http://opcit.eprints.org/ht02-short/archiveht-ht02.pdf
Gutteridge, C. (2002) “GNU EPrints 2 Overview”. Author eprint, Dept. of Electronics and Computer Science, Southampton University, October, and in Proceedings 11th Panhellenic Academic Libraries Conference, Larissa, Greece, November
http://eprints.ecs.soton.ac.uk/archive/00006840/
Harnad, S. (2001) “Skyreading and Skywriting for Researchers: A Post-Gutenberg Anomaly and How to Resolve it”. text-e virtual symposium, 14 – 30 November
http://text-e.org/conf/index.cfm?ConfText_ID=7
Hitchcock, S., Bergmark, D., Brody, T., Gutteridge, C., Carr, L., Hall, W., Lagoze, C. and Harnad, S. (2002a) “Open Citation Linking: The Way Forward”. D-Lib Magazine, Vol. 8, No. 10, October
http://www.dlib.org/dlib/october02/hitchcock/10hitchcock.html
Hitchcock, S., Woukeu, W., Brody, T., Carr, L., Hall, W. and Harnad, S. (2002b) “Evaluating Citebase, an open access Web-based citation-ranked search and impact discovery service”. Evaluation report, IAM Dept., University of Southampton
http://opcit.eprints.org/evaluation/Citebase-evaluation/evaluation-report.html
Liu, X., Brody, T., et al. (2002) “A Scalable Architecture for Harvest-Based Digital Libraries - The ODU/Southampton Experiments”. D-Lib Magazine, Vol. 8, No. 11, November
http://www.dlib.org/dlib/november02/liu/11liu.html
Previously available as arXiv Computer Science cs.DL/0205071, May 2002
http://arxiv.org/abs/cs.DL/0205071
OpCit-related presentations were given at the following meetings during the final year of the project in 2002. The full list of presentations, including presentations from previous years, with a link to keynote presentations by Stevan Harnad, can be found at http://opcit.eprints.org/opcitpapers.shtml
November 6-8 "Academic Libraries of Open and Continuous Access", 11th Pan Hellenic Conference of Academic Libraries, Larissa, Greece
October 17-19 "Gaining independence with e-prints archives and OAI", 2nd Workshop on the Open Archives Initiative (OAI), CERN, Geneva
September 13 “Open Access Journals - will they fly?” ALPSP/OSI round table meeting, London
June 24-25 JISC/NSF Digital Libraries Initiative (DLI) All Projects Meeting, Edinburgh
May 29 “Applications of Metadata”, a one-day conference organised by the BCS Electronic Publishing Specialist Group, London
May 13-14 First Workshop of the Open Archives Forum, Pisa
April 12 “We can't go on like this: the future of journals”, ALPSP International Learned Journals Seminar, London
March 22 The Future of Journal Publishing, Nottingham University
March 4 CURL ePrints workshop, Glasgow
January 24-25 JISC All-Projects Synthesis Meeting, Manchester
The following researchers were involved with the Open Citation Project during the year reported:
Stevan Harnad, Carl Lagoze (Principal Investigators), Wendy Hall (Chair of Management Committee), Les Carr (Project Technical Director), Steve Hitchcock (Project Manager), Donna Bergmark (Linking API), Tim Brody (Citebase), Christopher Gutteridge (EPrints), Mike Jewell (Paracite), Zhuoan Jiao, Simon Kampa (e-Services), Arouna Woukeu (Evaluation)
|         | Year 1   | Year 2   | Year 3   | Year 4   | Total    |
| CONSUM  | -4278.11 | -3335.49 | -1619.39 | -656.49  | -9889.48 |
| ENTERT  | -161.2   | -636.04  | -220.53  |          | -1017.77 |
| EQUIP   | -13680   | -9611.57 | 1209.11  | -2746.75 | -24829.2 |
| S/W     | -91.2    | 0        | -1020.83 |          | -1112.03 |
| SAL     | -59456.5 | -72068.3 | -47195.4 | -26743.1 | -205463  |
| TRAVEL  | -10769.8 | -8094.37 | -10420.4 | -1165.48 | -30450.1 |
| TOTAL   | -88436.8 | -93745.8 | -59267.5 | -31311.8 | -272762  |
| BUDGET  | 100846   | 90869    | 99861    | 0        | 291576   |
| SURPLUS | 12409.22 | -2876.78 | 40593.51 | -31311.8 | 18814.11 |
Missing payments -2500
Year 1: Oct. 1999-Sept. 2000; Year 2: Oct. 2000-Sept. 2001; Year 3: Oct. 2001-Sept. 2002; Year 4: Oct. 2002-Dec. 2002
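The SURPLUS row in the table above appears to be the BUDGET row plus the (negative) TOTAL spending row for each year. The short check below reproduces that arithmetic from the figures in the table. The variable and function names are invented for the example, and the half-unit tolerance is an assumption that allows for the rounding visible in the tabulated spending totals.

```python
# Figures copied from the TOTAL, BUDGET and SURPLUS rows of the table.
SPEND = {"Year 1": -88436.8, "Year 2": -93745.8, "Year 3": -59267.5, "Year 4": -31311.8}
BUDGET = {"Year 1": 100846, "Year 2": 90869, "Year 3": 99861, "Year 4": 0}
REPORTED_SURPLUS = {"Year 1": 12409.22, "Year 2": -2876.78, "Year 3": 40593.51, "Year 4": -31311.8}


def surplus_by_year(budget, spend):
    """Surplus for each year = budget + spending (spending is negative)."""
    return {year: round(budget[year] + spend[year], 2) for year in budget}


if __name__ == "__main__":
    computed = surplus_by_year(BUDGET, SPEND)
    for year in BUDGET:
        print(year, computed[year], REPORTED_SURPLUS[year])
    # Overall surplus across the whole award.
    print("Total", round(sum(BUDGET.values()) + sum(SPEND.values()), 2))
```

Small differences between the computed and reported values are within the rounding of the tabulated totals.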
Underspend in Year 3 was principally due to salaries. The project lost a research assistant, Zhuoan Jiao, at the end of November 2001. Zhuoan was replaced by Tim Brody, who continued to develop Citebase as part of a PhD project, and was paid fees for additional work required by the project.
Year 4 covers an extension to the project to end December 2002 agreed with Rachel Bruce at JISC.
Salaries for Year 4 include a late claim for fees by Tim Brody.
Equipment spending during years 2 and 3 was mainly to expand capacity or replace faulty or damaged equipment.
Equipment spending in Year 4 was to upgrade Citebase to improve service and reliability in anticipation of increased usage due to collaboration with arXiv.
Some of the remaining budget surplus has been identified to fund part-time work into 2003 to complete the project’s publishing and dissemination activities. Versions of the evaluation report will be published, and at least two further papers on different aspects of the project will be published in 2003.
A brief record of progress against the final year work plan given in the previous report
· Evaluation, analysis, dissemination of data mining, user survey, OpCit demonstrators and other OpCit results
o See OpCit Publications 2001-2 above. The Citebase evaluation report will be edited for journal publication.
· Integrate OpCit with arXiv: develop and promote AMF
o Citebase is linked from arXiv on a trial basis. The results of the evaluation indicate there is a basis for permanent linking.
o AMF, an extended OAI-compliant metadata format for sharing rich metadata such as found in Citebase records, is being considered for this purpose along with other formats (see OAI-implementers discussion thread starting at http://www.openarchives.org/pipermail/oai-implementers/2002-June/000518.html)
· Add OpenURL services: links to OpCit linked demonstrator; work with OpenURL resolver services; build an advanced OpenURL generator to turn references in PDF/TeX/LaTeX/HTML papers to OpenURL requests when viewed
o The primary interface to OpCit links is now via Citebase rather than full-text papers in PDF or other formats.
o Experiments have been performed, with partial success, with an OpenURL resolver at VUB (Brussels). Correspondence with Herbert Van de Sompel, principal architect of OpenURL, is ongoing. Author self-archived data tends to be unstructured, and this is a problem for OpenURL. Paracite may offer a solution. The use of OpenURL for transporting Citebase data will continue to be investigated for the ePrints-UK FAIR project.
· Advanced citation analysis - new measures of impact
o OpCit e-Services provide experimental visualisations of: e.g. most significant papers, number of e-print deposits each year, and co-citations.
· Implement and test EPrints components for reference checking
o The work has migrated to Paracite, and will be developed further.
· Evaluate OpCit project software
o The evaluation focussed on Citebase, since much of the project’s work on reference linking and citation analysis has converged within this interface.
· Migrate non-OAI archives (e.g. NCSTRL)
o This became unnecessary when Virginia Tech was awarded a grant to move NCSTRL into an OAI-conformant framework using EPrints software.
Moves to establish open access for published, peer reviewed research papers have been re-invigorated and re-legitimised in 2002. Momentum has been growing because new services demonstrably prove that open access works: software that allows authors and their institutions to deposit and manage their peer reviewed journal papers in archives; services that allow others to find and access these papers through citation indexing and reference linking, at the same time improving the visibility and impact of authors. In other words, all the discovery services that journal authors and readers are familiar with are enriched through open access with newer measures of "impact", and are accessible continuously online by anyone, any time.1
The Open Citation Project (http://opcit.eprints.org/) has played a major role in this reinvigoration of open access as founder members of important initiatives, the Open Archives Initiative (OAI) and latterly the Budapest Open Access Initiative (BOAI)2, that span the lifetime of the project from 1999 to 2002. The legacy of the project is those services: GNU EPrints archive-creating software (http://software.eprints.org/), and Citebase (http://citebase.eprints.org/), “Google for the refereed literature”.
Open access works because it means that users can access research papers free, and it means that increasingly sophisticated software can be deployed that "will help scholars find what is relevant to their research, what is worthy, and what is new".
Open access works because the costs of electronic storage and maintenance are lower than for print publishing, and can be borne in new ways, in particular by institutions that share with their researchers the benefits of greater visibility and impact. Institutional archives are the way forward for many researchers who do not enjoy the benefits available to their colleagues in fields already served by large disciplinary open archives such as arXiv.3,4
Institutional archives can, like disciplinary archives, support unified, global coverage of fields because they are based on the OAI, which has been remarkably successful in motivating an open approach, as its name implies, to advertising the availability of objects and documents in digital libraries. If digital libraries store records in a form that complies with an acceptable OAI metadata format, then independent services, such as search and indexing services, can collect these data using a protocol defined by the OAI.
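As an illustration of how an independent service might collect records, the following is a minimal sketch in Python. It assumes version 2.0 of the OAI Protocol for Metadata Harvesting with unqualified Dublin Core (oai_dc) metadata. The archive address, the sample response and the function names are hypothetical and are not taken from the project's own software.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and by Dublin Core elements.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the request a harvester sends to ask an archive for its records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    results = []
    for record in root.iter("{%s}record" % OAI_NS):
        identifier = record.findtext("{%s}header/{%s}identifier" % (OAI_NS, OAI_NS))
        title = record.findtext(".//{%s}title" % DC_NS)
        results.append((identifier, title))
    return results


# Hypothetical response from a hypothetical archive, trimmed to one record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://archive.example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would fetch the URL over HTTP and follow any resumption tokens the archive returns. The sketch parses a stored response so that it runs without network access.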
Now institutions can extend their digital libraries with archives of research papers that comply with the OAI protocol and metadata simply by using open source GNU EPrints software, which is designed specifically for open access. It works: 60 leading institutions worldwide have adopted GNU EPrints; some have written about their experiences with EPrints.5-7
What these institutions most need to do next is attract authors to these archives. The incentive for authors is exemplified by Citebase, a Web-based citation-ranked search and impact discovery service. Citebase indexes OAI-compliant archives in physics, maths, computer science and biomedical science, but mostly it covers physics. That is simply the current implementation. The principle of citation-based navigation and ranking of papers in OAI-compliant open access archives has been proved8 and can be expanded to other OAI archives.
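The idea of citation-ranked search can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Citebase's actual implementation: papers are matched on a title substring and ordered by the number of distinct papers citing them. The function names and the sample data are hypothetical.

```python
from collections import Counter


def citation_counts(citations):
    """Count distinct citing papers for each cited paper.

    `citations` is an iterable of (citing_id, cited_id) pairs; duplicate
    pairs are counted once.
    """
    return Counter(cited for _citing, cited in set(citations))


def ranked_search(papers, citations, query):
    """Return ids of papers whose title contains `query`, most cited first."""
    counts = citation_counts(citations)
    matches = [pid for pid, title in papers.items()
               if query.lower() in title.lower()]
    # Sort by descending citation count, breaking ties by identifier.
    return sorted(matches, key=lambda pid: (-counts[pid], pid))


if __name__ == "__main__":
    # Hypothetical data: three papers and the citations between them.
    papers = {
        "a": "Quantum gravity",
        "b": "Quantum optics",
        "c": "Classical mechanics",
    }
    citations = [("b", "a"), ("c", "a"), ("c", "b"), ("c", "b")]
    print(ranked_search(papers, citations, "quantum"))
```

Any other measure of impact, such as usage counts, could be substituted for the citation count in the sort key.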
For authors and institutional archives, indexing, impact measurement and discovery come free with services such as Citebase, which are limited only by the ideas and talents of developers, and by their ability to access the original, raw data.
The JISC Focus on Access to Institutional Resources (FAIR) programme of new projects, which is just underway, includes several projects that will use EPrints and Citebase. Innovations from the Open Citation Project will in this way continue to inform and motivate new and improved tools and services that demonstrate open access archives as a widely applicable and powerful mode of dissemination for all scholarly journal papers.
1 Harnad, S., Why I think research access, impact and assessment are linked, Times Higher Education Supplement, 18th May 2001 http://www.cogsci.soton.ac.uk/~harnad/Tp/thes1.html
2 Noble, I., Boost for research paper access, BBC Online News, 14th February 2002 http://news.bbc.co.uk/1/hi/sci/tech/1818652.stm
3 Young, J. R., 'Superarchives' Could Hold All Scholarly Output, Chronicle of Higher Education, 5th July 2002 http://chronicle.com/free/v48/i43/43a02901.htm
4 Crow, R., The Case for Institutional Repositories: A SPARC Position Paper, July 2002 http://www.arl.org/sparc/IR/ir.html
5 Nixon, W., The evolution of an institutional e-prints archive at the University of Glasgow, Ariadne, July 2002 http://www.ariadne.ac.uk/issue32/eprint-archives/
6 Pinfield, S., et al., Setting up an institutional e-print archive, Ariadne, April 2002 http://www.ariadne.ac.uk/issue31/eprint-archives/
7 Sponsler, E. and Van de Velde, E. F., Eprints.org Software: A Review, SPARC E-News, Aug.-Sept. 2001 http://www.arl.org/sparc/core/index.asp?page=g20#6
8 Hitchcock, S., et al., Evaluating Citebase, November 2002 http://opcit.eprints.org/evaluation/Citebase-evaluation/evaluation-report.html
* Jaffe, S., Citing UK Science Quality: The next Research Assessment Exercise will probably include citation analysis, The Scientist, Vol. 16, No. 22, 11th November 2002 http://www.the-scientist.com/yr2002/nov/prof1_021111.html