The ratio or the difference, or even (P-U)/(P+U)
One suggestion for display is to show the standard error bars for your
averages (standard error is the root mean square deviation from the
mean, divided by the square root of N: could be derived by doing monthly
averages and calculating their variance).
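A minimal sketch of the error-bar suggestion and the (P-U)/(P+U) measure, assuming the monthly averages are already available as a plain list of numbers. The sample values are made up for illustration, and the sketch uses the sample standard deviation (N-1 in the denominator), which is the usual estimator but an assumption here.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    return statistics.stdev(values) / math.sqrt(n)


def normalised_difference(p, u):
    """(P - U) / (P + U), as an alternative to the ratio or the difference."""
    if p + u == 0:
        raise ValueError("P + U must be non-zero")
    return (p - u) / (p + u)


# Hypothetical monthly averages (illustrative numbers only).
monthly_averages = [12.0, 15.0, 11.0, 14.0, 13.0]
mean = statistics.mean(monthly_averages)
se = standard_error(monthly_averages)
print(f"{mean:.2f} +/- {se:.2f}")
```

The error bar for a plotted average would then run from mean - se to mean + se.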
Divide the papers into hi/med/lo impact using several measures:
(1) Total hits per paper in the archive
(2) Citations of the paper in the Archive (cannot be done in early embryo stages).
We should discuss whether/how we can get this data while Tim and Ian
are with us - sh94r
Hits are in the download stats already. Citations can come from
Les/Zhuoan's link data, supplemented by ISI/SCI (I will
contact them when we have made a bit more progress) - Harnad
(3) Citations of the paper, ISI statistics (tell me what you need and
I'll contact ISI about access to their database)
Which database? We have access to Web of Science - sh94r
...try [Web of Science] but papers in XXX may be too early to be
picked up by ISI, but the AUTHORS will have their own impact
factors, which we can calculate for a sample of lo, med and hi)
... try the BIDS citation index now ... vanishes end of July -
Harnad
See SOTON library for WoS/BIDS.
(4) Impact factor (citation ratio) of the AUTHOR (rather than the
paper): easier to get (both from the Archive, using Zhuoan's tools, and
from ISI).
Further subdivide by hi/med/lo on each of the above measures and
sector: HEP/ASTRO/COND/other
The experimental design is then Impact (3 levels) by sector (4 levels)
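One way the 3 x 4 design could be tabulated, as a rough sketch: papers are split into lo/med/hi by rank tercile on a single measure, then counted per (impact level, sector) cell. The tercile cut-off, the function names and the sample data are all assumptions for illustration; the notes do not say how the three levels should be drawn.

```python
from collections import Counter

SECTORS = ("HEP", "ASTRO", "COND", "other")
LEVELS = ("lo", "med", "hi")


def impact_levels(scores):
    """Map each paper id to lo/med/hi by rank tercile of its score."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {paper: LEVELS[3 * i // n] for i, paper in enumerate(ranked)}


def design_cells(scores, sector_of):
    """Count papers in each (impact level, sector) cell of the design."""
    levels = impact_levels(scores)
    return Counter((levels[p], sector_of[p]) for p in scores)


# Hypothetical hit counts and sectors for six papers.
scores = {"a": 1, "b": 5, "c": 20, "d": 2, "e": 9, "f": 40}
sector_of = {"a": "HEP", "b": "HEP", "c": "ASTRO",
             "d": "COND", "e": "other", "f": "HEP"}

cells = design_cells(scores, sector_of)
for level in LEVELS:
    print(level, [cells[(level, s)] for s in SECTORS])
```

The same binning could be repeated for each of the four impact measures, giving one grid per measure.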
Zhuoan: Currently working on the tools to produce the citation information. Also it would be easier to ascertain the impact factor of a paper rather than an author (considering different formats of name, and that many authors are quoted for one paper - should the position of the author be considered?).
Author's hit-rate; author's citation-ratio ("impact factor")
| Measure | Unit | Formula |
|---|---|---|
| Citations | Author | No. Citations/No. of Papers |
| Citations | Paper | No. Citations |
| Hits | Author | (No. of Papers Factor)*No. Hits/No. of Papers |
| Hits | Paper | No. Hits |
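The per-author measures in the table above could be computed along these lines. This is only a sketch: the notes do not define the "No. of Papers Factor", so it is treated here as a plain multiplier defaulting to 1.0, and the input lists are hypothetical.

```python
def author_citation_ratio(citations_per_paper):
    """No. Citations / No. of Papers for one author."""
    if not citations_per_paper:
        raise ValueError("author has no papers")
    return sum(citations_per_paper) / len(citations_per_paper)


def author_hit_rate(hits_per_paper, papers_factor=1.0):
    """(No. of Papers Factor) * No. Hits / No. of Papers for one author.

    papers_factor is an assumed weighting; the notes leave it undefined.
    """
    if not hits_per_paper:
        raise ValueError("author has no papers")
    return papers_factor * sum(hits_per_paper) / len(hits_per_paper)


# Hypothetical author with three papers.
print(author_citation_ratio([10, 0, 2]))
print(author_hit_rate([100, 50, 30]))
```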
Hypothesis: That authors who use the archive will have a higher "impact
factor" than those who don't (over period 1991 -> 2000).
| | Early | Later |
|---|---|---|
| Use Archive | Low | High |
| Not Archive | Medium | Medium |
Important. Here we can use other forms of analysis: Latent Semantic Analysis (I can contact Tom Landauer about the LSA software for research purposes), Shimon Edelman's similarity metric, shared keywords, co-citation
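Of the measures listed, shared keywords and co-citation are the simplest to prototype. The sketch below assumes keywords and reference lists are already extracted per paper; using Jaccard overlap for "shared keywords" is an assumption, and the paper ids and keywords are invented.

```python
from collections import Counter
from itertools import combinations


def keyword_similarity(keywords_a, keywords_b):
    """Jaccard overlap of two keyword sets (0.0 = disjoint, 1.0 = identical)."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cocitation_counts(reference_lists):
    """Count how often each pair of papers is cited by the same citing paper."""
    counts = Counter()
    for refs in reference_lists.values():
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing papers p1..p3 and the papers they cite.
reference_lists = {"p1": ["a", "b", "c"], "p2": ["a", "b"], "p3": ["b", "c"]}
counts = cocitation_counts(reference_lists)
print(counts.most_common(3))
print(keyword_similarity(["qcd", "lattice"], ["qcd", "string"]))
```

Pairs with high co-citation counts or keyword overlap would be candidates for belonging to the same area.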
Produce report on LSA technique.
Contact each of the other mirror sites (compose a letter and send it to me: I could edit and send for you).
LSA and other techniques
How valid is the use of LSA? To make an accurate assessment of the "spread" of an area, a physics dictionary will be needed plus a "core" set of papers that should be in the area. What will this tell us about the archive?
Area Analysis - does this answer this question? What details are needed for the kinds of updates?
XXX specifically tells authors to replace their
papers, therefore there shouldn't be any "linking" going on? Or are you
referring to changing citations? - tdb198
Re-writes of text-body (how big), re-writes of abstract, and front-matter, journal reference insertion
SLAC/SPIRES
Again, draft a text and I can liaise for you:
Heath O'Connell hoc@SLAC.Stanford.EDU
They have all the validated biblio data for all of HEP and many other
areas of physics. We MUST use that info to cross-check whether those
papers in XXX whose authors have not given journal-refs are indeed in
journals. Those stats are essential -- again subdivided by
impact-level (hi/med/lo) and sector (HEP/astro/cond/other)
Get the same info for Astro from the Astro database (pboyce@aas.org)
Hypothesis: That high-impact authors will deposit papers that get published/are published. Low-impact authors will submit articles that will never be published. Papers that aren't published - why are they submitted to XXX?
For papers that are not tech-reports/non journal-refed:
Search SPIRES - for article title [did they not replace original]
Email author sample - was article published/where?
ASTRO-PH - does astro store pre-prints, are authors using XXX to store just preprints because they can't store them in Astro? Look in astro/contact authors to find out behaviour.
and at each impact level -- and compare across the years as XXX grew and practice evolved...
and AAS and maybe even ISI
Contact authors who updated with JR, but not paper, why they didn't/whether they made changes.
This is one of many variables you will want to correlate with impact (which can be measured the 4 ways mentioned above): latency (how soon the hits occur); whether journal ref is given; sector; etc.
For hep-ph (the largest area in the archive), during the 7-month period, only 8 papers were replaced and 217 had their abstracts updated. Is there enough data to answer this question? - tdb198
i) Is a paper published?
ii) Does the author say that it is published?
iii) Did the version number change?
iv) If it is not updated what is the "diff" between submitted and
deposited papers?
Would need to obtain the published paper from Journal (ISI?) - tdb198
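For question (iv), once both versions are available as plain text, the size of the "diff" could be estimated roughly as below. This is a sketch only: it compares word sequences with Python's difflib, ignores layout and figures, and the two sample sentences are invented.

```python
import difflib


def change_ratio(old_text, new_text):
    """Fraction of the text that differs, from 0.0 (identical) to 1.0."""
    matcher = difflib.SequenceMatcher(None, old_text.split(), new_text.split())
    return 1.0 - matcher.ratio()


# Hypothetical deposited and published versions of one sentence.
deposited = "we measure the mass of the quark"
published = "we measure the mass of the top quark"
print(f"{change_ratio(deposited, published):.3f}")
```

A threshold on this ratio could separate trivial edits from substantial re-writes, though where to set it would need checking against real papers.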
Actions:
Tim: Think up preamble for questionnaire, estimate what people are going to send back.
Actions:
Tim:
Ian:
Citations:
(stored in /export/3/users/lac/CITED).
Citations of hep-th to articles in XXX
SCOOT - script to apply spotcite to hepth on arabica /export/2/XXX_PDF
DOLIST - script to take SC.OUT0 and provide list of article x cited article y [ZZZ]
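A list of "article x cited article y" pairs can be turned into per-paper citation counts with a few lines. The format of the DOLIST output is not described in these notes, so the sketch below assumes one whitespace-separated "citing cited" pair per line; the ids shown are placeholders.

```python
from collections import Counter


def count_citations(lines):
    """Count citations received per paper from 'citing cited' pair lines.

    The two-column line format is an assumption, not the actual DOLIST output.
    """
    cited = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        _citing, cited_id = parts
        cited[cited_id] += 1
    return cited


# Placeholder pairs: A cites C, B cites C, B cites D.
pairs = ["A C", "B C", "", "B D"]
citations = count_citations(pairs)
print(citations.most_common())
```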
Action:
(note these are my notes, so please don't fry me if I get anything wrong!)
Present:
Stevan Harnad
Steve Hitchcock
Zhuoan Jiao
Ian Hickman
Tim Brody
(Bits relevant to ePrint usage research:)
LSA: Could Ian research LSA technique/produce some info on how it works. Harnad: Need to have a "core" set of HEP papers to test against.
Harnad: 4 tests for impact of articles:
Citations: Author & Papers
Hits: Author & Papers
Steve Hitchcock: Where do we want to go with impact factor analysis?
Analysing Low-impact vs High-impact authors: Ideally low-impact authors
will be able to increase their impact by using XXX [difference in impact
of early papers and later papers]
Tim: SPIRES doesn't contain publication [journal-ref] entries for all papers (of sample of 10, 2 had j/r). Harnad: For papers that do not contain journal-refs, need to contact sample of authors to ascertain what has happened to these papers. Has the article been published in a book/conference etc.? Zhuoan: This is classed as published.
Astro-ph section of XXX. Why is it so popular? Low proportion of deposited papers have J-R, how does this relate to ASTRO e-Archive? Contact authors/ASTRO find out what deposits in XXX are? Tim: Large number of technical reports in astro.
Concerning updates to papers/journal-ref addition. Steve Hitchcock: Authors who update with J-R? Contact authors to find out whether they changed paper/why they didn't (didn't bother because very little change/because it was published and they want people to look in journal?)
Citation analysis: [from earlier] can Zhuoan produce some statistics on citation ratios/can Ian look at Les' code to extract this info? Use ISI to get citation ratios?
Zhuoan: Questions over author extraction; how much sharing of names is there?
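One rough way to gauge name sharing is to reduce every author string to a (surname, first initial) key and see how many distinct spellings fall under each key. The sketch assumes names arrive either as "Surname, Given" or "Given Surname"; real author fields will be messier, and the key will also merge different people who share a surname and initial.

```python
from collections import defaultdict


def name_key(name):
    """Reduce an author string to (surname, first initial), lower-cased."""
    name = name.strip()
    if not name:
        raise ValueError("empty author name")
    if "," in name:
        surname, given = [part.strip() for part in name.split(",", 1)]
    else:
        parts = name.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    return (surname.lower(), given[:1].lower())


def group_variants(names):
    """Group the distinct spellings seen under each (surname, initial) key."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name.strip())
    return groups


names = ["Harnad, S.", "Stevan Harnad", "S. Harnad", "Tim Brody"]
for key, variants in sorted(group_variants(names).items()):
    print(key, sorted(variants))
```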
Action:
Next Meeting
Date of next tech meeting: 2 Weeks From Now
Date of next general meeting: 22nd August 2000