
Analysis of Article Authors


Last updated September 07 2000 14:10:18.

(Tables are only available to Soton viewers)

Authors (about 76000 authors)

Authors 2 (Table 1), broken down by area, with first names reduced to initials (slightly fewer than 76000 authors)

Example of same name, different spelling:

4. nucl-th/9810016 [abs, src, ps, other] :

Title: Charmed Mesic Nuclei: Bound D and $\bar{D}$ states with ^{208}Pb
Authors: K. Tsushima (1), D. H. Lu (1), A. W. Thomas (1), K. Saito (2), R. H. Landau (3) ((1) CSSM and The University of Adelaide, Australia (2) Tohoku College of Pharmacy, Sendai, Japan (3) Oregon State University, USA)
Comments: RevTex, 14 pages, 3 Postscript figures, version to appear in Phys. Rev. C, title, abstract, text, references are modified
Journal-ref: Phys.Rev. C59 (1999) 2824-2828

5. hep-lat/9810005 [abs, src, ps, other] :

Title: Nucleon Magnetic Moments Beyond the Perturbative Chiral Regime
Authors: Derek B. Leinweber, Ding H. Lu, Anthony W. Thomas
Comments: Revised version accepted for publication includes a new section demonstrating extrapolations of lattice QCD results
Journal-ref: Phys.Rev. D60 (1999) 034014

Example distribution of name format:

Name format          Count
S.W._Hawking         12
S._W._Hawking        8
Stephen_W._Hawking   4
Stephen_Hawking      3
S.W.Hawking          2

[authors.txt] These names would be collapsed to S.W.Hawking, Stephen_W.Hawking and Stephen_Hawking.

[authors2.txt] These names would be collapsed to S.W.Hawking and S.Hawking.
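
The two collapsing schemes can be sketched as follows. This is a minimal illustration, not the code used to build authors.txt or authors2.txt: the function names are hypothetical, and the rules (drop the underscore after a full stop; reduce every given name to an initial) are inferred only from the Hawking example above.

```python
import re
from collections import Counter

# Counts taken from the example distribution above.
EXAMPLE = {
    "S.W._Hawking": 12,
    "S._W._Hawking": 8,
    "Stephen_W._Hawking": 4,
    "Stephen_Hawking": 3,
    "S.W.Hawking": 2,
}

def collapse_spacing(name):
    """authors.txt-style rule (assumed): remove the underscore after a full stop."""
    return name.replace("._", ".")

def collapse_to_initials(name):
    """authors2.txt-style rule (assumed): reduce every given name to an initial."""
    parts = [p for p in re.split(r"[._]+", name) if p]
    *given, surname = parts
    return "".join(p[0] + "." for p in given) + surname

def collapse_counts(counts, rule):
    """Sum the counts of all name formats that collapse to the same form."""
    merged = Counter()
    for name, n in counts.items():
        merged[rule(name)] += n
    return merged

by_spacing = collapse_counts(EXAMPLE, collapse_spacing)
by_initials = collapse_counts(EXAMPLE, collapse_to_initials)
```

Under these assumed rules the first scheme leaves three distinct names and the second leaves two, matching the collapsed forms listed above.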

This data does not include papers submitted before 10/94, as there were no author meta-tags at that time.

Author Citations

Using the citation data provided by Dr. Les Carr (hep-th, 98-00), we can build a table of the number of citations that individual authors have received (disregarding the importance or otherwise of the author). See Table 2.

Then, using Table 1, a mean number of citations per paper can be built for each author: Table 3 (Author, Citations, Papers, Citations/Papers).

Graph of the number of citations an author has received, against the number of papers that author has written. A trend (Excel: poly 2) is shown in black.

Excluding Self-Citation

Using the same technique as above the citation "impact" can be found for authors, except excluding any occasions where an author references themselves.

Source Paper - Cited Paper
AuthorsA - AuthorsB
...do not give a citation to author B if that author is in set A

This results in Table 4.
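
The exclusion rule above can be sketched as follows. This is an illustration only: the function name and the input layout (a list of pairs of author sets, one pair per source-paper/cited-paper link) are assumptions, not the format of the real citation data.

```python
from collections import Counter

def count_citations(links, exclude_self=True):
    """Count citations per author.

    `links` is a list of (source_authors, cited_authors) pairs, each a set
    of author names (hypothetical layout). With exclude_self=True an author
    in set B gets no credit if they are also in set A.
    """
    counts = Counter()
    for source_authors, cited_authors in links:
        for author in cited_authors:
            if exclude_self and author in source_authors:
                continue  # self-citation: author appears on both papers
            counts[author] += 1
    return counts

# Made-up example links, not real data.
links = [
    ({"A", "B"}, {"B", "C"}),
    ({"C"}, {"A"}),
    ({"A"}, {"A", "C"}),
]
without_self = count_citations(links)
with_self = count_citations(links, exclude_self=False)
```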

(Code to generate the mean citations per paper for each author: awk '{ print $1 "\t" $2 "\t" $3 "\t" ($2/$3) }' < d_notauthorcitations2 | sort -rn -k4 > d_notauthorcitations3)

Defining Impact

(Tim's patent-pending bear-no-relationship-to-statistics-method)

This uses Table 4, where $2 is the total citations (y axis) and $3 is the total papers (x axis).

Impact  Total  Cit's  Papers  Shell Script
High    338    61294  8411    awk '{ if( $2 >= 50 && $3 >= 10 ) print $0 }' < d_authorcitations3 > d_highimpact
Medium  2756   30009  28926   awk '{ if( ($2 < 50 || $3 < 10) && $2 > 1 && $3 > 1 ) print $0 }' < d_authorcitations3 > d_medimpact
Low     2215   3615   12269   awk '{ if( ($2 < 50 || $3 < 10) && ($2 == 1 || $3 == 1)) print $0 }' < d_authorcitations3 > d_lowimpact

Some highly-cited authors may be excluded from "High Impact" because I require a minimum number of papers. It is assumed that an author's lack of articles shows that they either do not use the archive or have not written many papers, in which case their impact may be a "one off".
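
The three awk conditions can be read as a single classification rule. The sketch below mirrors them in one function; the function name is hypothetical, and the `None` branch simply makes explicit that a row matching none of the three conditions (for example, zero citations) is written to none of the three files.

```python
def impact_level(citations, papers):
    """Mirror the three awk filters: $2 is citations, $3 is papers."""
    if citations >= 50 and papers >= 10:
        return "High"
    # From here on (citations < 50 or papers < 10) is already true.
    if citations > 1 and papers > 1:
        return "Medium"
    if citations == 1 or papers == 1:
        return "Low"
    return None  # matched by none of the three filters

# Made-up (citations, papers) pairs for illustration.
examples = {
    (120, 15): impact_level(120, 15),
    (120, 5): impact_level(120, 5),
    (1, 20): impact_level(1, 20),
}
```

Note that an author with many citations but fewer than 10 papers lands in "Medium", which is the exclusion described above.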

Splitting By Thirds

Using the citations/papers ratio as the sort key, the authors are split into three equal groups.

Third   Citations  Papers
Top     84654      16169
Middle  7113       11020
Bottom  3151       22417
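
A minimal sketch of the split, assuming each row is an (author, citations, papers) tuple as in Table 4. The function name and the handling of a row count that does not divide by three (integer cut points) are assumptions; the rows are made up for illustration.

```python
def split_into_thirds(rows):
    """Sort rows by citations/papers (descending) and total each third."""
    ranked = sorted(rows, key=lambda r: r[1] / r[2], reverse=True)
    n = len(ranked)
    cut1, cut2 = n // 3, (2 * n) // 3
    groups = {
        "Top": ranked[:cut1],
        "Middle": ranked[cut1:cut2],
        "Bottom": ranked[cut2:],
    }
    # For each third report (sum of citations, sum of papers).
    return {
        name: (sum(r[1] for r in group), sum(r[2] for r in group))
        for name, group in groups.items()
    }

# Made-up rows: (author, citations, papers).
rows = [("a", 90, 10), ("b", 40, 8), ("c", 12, 6),
        ("d", 5, 5), ("e", 2, 4), ("f", 1, 9)]
thirds = split_into_thirds(rows)
```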

Splitting Using Quartiles

The citations/papers ratio is accumulated to form quartiles, taking the top 25%, the middle 50% and the bottom 25%.

The number of deposits made for the articles that these authors have deposited is then added, and the mean taken over the number of authors. Dividing this by the mean number of papers per author gives a deposit "rate" for the sector: the mean number of deposits per paper per author.

Quartile  Total  Citations  Papers  Deposits  (Deposit Authors)  Deposits/Author  Papers/Author  Deposit Rate
Top       125    29011      1649    2787      123                22.6585          13.192         1.718
Middle    1119   49536      10279   15613     1068               14.6189          9.186          1.591
Bottom    4066   16371      37678   47809     3849               12.4211          9.267          1.340

Quartile    Citation Impact (Cites/Papers/Authors)  Hits Impact (all areas) (Hits/Papers/Authors)
Top         0.141                                   11.873
Middle      0.00431                                 8.185
Bottom      0.000107                                5.085
(Unranked)                                          4.280
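
As a worked check of the deposit rate and citation impact columns, the sketch below recomputes them from the totals in the table. The function names are hypothetical. The arithmetic is inferred from the table itself: deposits/author divides by the "Deposit Authors" column, papers/author divides by the "Total" column.

```python
# (total authors, citations, papers, deposits, deposit authors) from the table.
QUARTILES = {
    "Top": (125, 29011, 1649, 2787, 123),
    "Middle": (1119, 49536, 10279, 15613, 1068),
    "Bottom": (4066, 16371, 37678, 47809, 3849),
}

def deposit_rate(total, citations, papers, deposits, deposit_authors):
    """Mean deposits per author divided by mean papers per author."""
    deposits_per_author = deposits / deposit_authors
    papers_per_author = papers / total
    return deposits_per_author / papers_per_author

def citation_impact(total, citations, papers, deposits, deposit_authors):
    """Citations divided by papers, divided by authors."""
    return citations / papers / total

rates = {q: round(deposit_rate(*row), 3) for q, row in QUARTILES.items()}
impacts = {q: citation_impact(*row) for q, row in QUARTILES.items()}
```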

Authors Per Paper

The total number of papers is the number of abstracts from 1995 onwards, as authors cannot be obtained before that time:
grep hep-th < q1/d_papers | grep abs | restrictcol '-3/(1991)|(1992)|(1993)|(1994)/' | wc

Total number of authors:
grep hep-th < d_authors | awk '{ print $1 }' | sort | uniq | wc

Area      Papers  Authors  Authors/Papers
hep-th    14534   7036     0.484
hep-ph    19374   8266     0.427
cond-mat  20521   15425    0.752
astro-ph  20629   14027    0.680
math      7200    5255     0.730

Authors per Paper, by Impact

Using the impact level author list, a list of papers by those authors can be compiled. Using that list of papers a list of authors who are named for those papers can be built.

Authors/Paper 1 2 3 4 5 6 7 8 9 10
High Impact 383 458 240 135 27 9 1 1 0 1
Medium Impact 2024 2332 1526 608 128 46 12 3 1 5
Low Impact 8403 7695 5174 2132 689 259 127 67 45 44

Paper state by Author Impact

The state of papers, by author impact level.



For Whole Archive

Using spotcites data for all papers.

What proportion of citations does this cover?

wc SCOOT.OUT  wc d_papers  Total Citations  Total "Red Link"/"Orange Link" Citations  Antique
3,090,131     132219       2,957,912        603,460                                   836,945

This gives 100*(603460/2957912) = 20.40% (i.e. 1 in 5 citations), as a proportion of all citations in the archive. (603460/132219) = 4.56 citations/paper identified, against 2957912/132219 = 22.37 citations/paper identified from PDF source.

Analysing how many red/orange links have been picked up, by year. Using the raw citation data (paper -> citation), the number of references for a given year can be found by taking the first two digits from the paper reference. The total number of papers deposited in that year can then be found by using a listing of all papers in the archive and using the first two digits of the paper references. When taking the total number of papers, any papers from areas that did not have any references were ignored.

Total citations = 597688. Total papers = 115940 (only includes 2000 up to June).

Year  Papers Deposited  Citations  Citations/Paper
91    305               19         0.0623
92    2,891             1,291      0.447
93    6,127             7,576      1.24
94    8,901             19,171     2.15
95    11,034            39,240     3.56
96    13,709            61,019     4.45
97    17,310            100,714    5.82
98    21,040            132,096    6.28
99    24,163            142,888    5.91
00    10,460            93,674     8.96
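
The per-year counting described above can be sketched as follows. This is an illustration under stated assumptions: paper references have the form area/yymmnnn (as in hep-th/9810016), the year is taken from the citing paper in each (paper, citation) pair, and the function names and sample data are made up.

```python
from collections import Counter

def area_of(paper_ref):
    """'hep-th/9810016' -> 'hep-th'."""
    return paper_ref.split("/", 1)[0]

def year_of(paper_ref):
    """First two digits of the paper number: 'hep-th/9810016' -> '98'."""
    return paper_ref.split("/", 1)[1][:2]

def citations_per_paper_by_year(citation_pairs, all_papers):
    """Return {year: (papers deposited, citations, citations/paper)}."""
    cites = Counter(year_of(src) for src, _ in citation_pairs)
    # Papers from areas that have no references at all are ignored.
    areas_with_refs = {area_of(src) for src, _ in citation_pairs}
    papers = Counter(year_of(p) for p in all_papers
                     if area_of(p) in areas_with_refs)
    return {y: (papers[y], cites[y], cites[y] / papers[y]) for y in papers}

# Made-up sample: (citing paper, cited paper) pairs and a paper listing.
pairs = [
    ("hep-th/9801001", "hep-th/9501002"),
    ("hep-th/9801001", "hep-ph/9612003"),
    ("hep-th/9902004", "hep-th/9801001"),
]
listing = ["hep-th/9801001", "hep-th/9801005", "hep-th/9902004", "math/9903001"]
by_year = citations_per_paper_by_year(pairs, listing)
```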

Using quartiles we come up with the following split for authors:

Quartile  Total   Citations  Papers  Cites/Author/Paper  Deposits  Mean Deposits/Author  Variance
Top       798     240,092    2,732   0.110               6,720     1.47526               0.301527
Middle    9,262   733,272    37,318  0.00212             93,671    1.36982               0.218753
Bottom    28,211  251,925    67,951  0.000131            165,971   1.2665                0.189012

Mean number of citations/author (ignoring the number of papers those authors have deposited).

Quartile  Total  Sum Citations  Mean     Variance
Top       798    240092         300.867  1213.259
Middle    9262   733272         79.170   203.716
Low       28211  251925         8.930    30.281

Mean number of hits/paper (by author impact).

Impact  Total  Sum Hits  Mean   Variance
High    2732   22674     8.299  232.487
Medium  37318  144714    3.878  71.434
Low     67951  195867    2.882  37.584

This graph shows the proportion of authors with a given deposit rate for different impact levels. The number of authors for each deposit rate is shown.

Papers that have authors from different impact levels/% of all unique papers in combined area:

        High        Medium       Low
High    -           1586/4.12%   254/0.361%
Medium  1586/4.12%  -            12881/13.9%
Low     254/0.361%  12881/13.9%  -

Papers with authors from all three impact levels: 155 (155/93435 = 0.166%)

This diagram shows the approximate authorship of papers (the total area of the circles represents all the papers, and each circle represents the authors of those papers). Where the circles intersect, papers have authors from more than one impact level.




This graph shows the cumulative number of papers against the number of citations for those papers (divided into high, med, low impact authors).

Authors per paper (awk '{ print $2 }' d_highimpactauthorpapers | sort | uniq -c | awk '{ c++; s += $1 } END { print s/c }'):

Level  Mean     Variance
High   1.56442  7.50354
Med    1.81082  4.52046
Low    1.94748  4.31381

These graphs show the frequency of papers broken down by the number of citations they receive and by the impact level of their authors (so these graphs may feature the same paper more than once, as an individual paper may have more than one author).

This graph shows the age of citations (the time difference between a paper being deposited and its referenced papers being deposited), broken down by the impact factor of the paper's authors.

How long have authors been depositing?

Using the authored list (paper ref * author name), the time difference in months can be found between the first and the last paper each author deposited. This includes authors who have only one paper in the archive (defined as having a period of 0 months).

Total Authors  Mean Timediff(months)  Variance(months)
75062          13.824                 431.147
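
A minimal sketch of the calculation, assuming the authored list is a sequence of (paper ref, author name) pairs and that the yymm part of the reference follows the slash. The function names, the century rule (yy of 91 or more is read as 19yy) and the use of population variance are assumptions; the sample data is made up.

```python
from collections import defaultdict
from statistics import mean, pvariance

def months_from_ref(paper_ref):
    """Convert the yymm part of 'area/yymmnnn' to an absolute month count."""
    yymm = paper_ref.split("/", 1)[1][:4]
    yy, mm = int(yymm[:2]), int(yymm[2:])
    year = 1900 + yy if yy >= 91 else 2000 + yy  # archive starts in 1991
    return year * 12 + mm

def deposit_spans(authored):
    """Months between each author's first and last deposit (0 if one paper)."""
    months = defaultdict(list)
    for paper_ref, author in authored:
        months[author].append(months_from_ref(paper_ref))
    return {author: max(m) - min(m) for author, m in months.items()}

# Made-up authored list.
authored = [
    ("hep-th/9501001", "A"),
    ("hep-th/9707002", "A"),
    ("hep-th/0003001", "B"),
]
spans = deposit_spans(authored)
mean_span = mean(list(spans.values()))
var_span = pvariance(list(spans.values()))
```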




Author names cannot easily be extracted pre-1994, so there is a peak at 5 years of usage, made up of the authors who have deposited continually since before that period but only appear from 1995.

Looking at the time between every paper deposited by an author:

Total 2+ Papers  Mean Timediff(months)  Variance(months)
153418           6.764                  62.150

This graph is based on taking the time difference, in order, between papers deposited by authors (the yymm part of the paper reference), excluding the time difference between two papers deposited in the same month (i.e. 0).

Growth of Authors Over Time

By using the meta data "author" field, the number of unique authors of papers per year can be found (for most areas the number of authors can not be easily found at or before 1994).

Year     Authors
1991     411
1992     1152
1993     1439
1994     5958
1995     15198
1996     17762
1997     22359
1998     27785
1999     32673
2000-06  19593

Analysing number of authors by LANL subfield

Using the authors meta-data field the author list can be found for each paper. The total number of authors and total number of papers can then be found by summing each occurrence of a unique author for each area and each occurrence of a unique paper for each area. To find the variance the number of authors per paper was also stored.

Because the authors meta-data field did not exist in some areas before 1995 all these years have been ignored.

Area Authors Papers Authors/Papers Standard Deviation
acc-phys 114 43 2.651 8.828
adap-org 305 245 1.245 1.219
alg-geom 649 854 0.760 1.059
ao-sci 19 13 1.462 0.761
astro-ph 12754 17509 0.728 4.244
atom-ph 112 68 1.647 1.258
bayes-an 8 11 0.727 0.273
chao-dyn 1391 1416 0.982 1.733
chem-ph 143 89 1.607 1.551
cmp-lg 641 671 0.955 1.326
comp-gas 119 78 1.526 1.357
cond-mat 13766 17411 0.791 2.272
cs 915 661 1.384 1.960
dg-ga 398 501 0.794 1.025
funct-an 169 213 0.793 0.895
gr-qc 2959 5098 0.580 2.202
hep-ex 2493 1689 1.476 7.065
hep-lat 1299 2517 0.516 3.438
hep-ph 7597 16875 0.450 2.635
hep-th 5987 12788 0.468 1.769
math 3808 5324 0.715 1.131
math-ph 781 705 1.108 1.090
mtrl-th 213 148 1.439 1.765
neuro-dev 2 1 2.000 0.000
neuro-sys 33 13 2.538 1.321
nlin 530 303 1.749 1.365
nucl-ex 2141 432 4.956 13.016
nucl-th 2906 4078 0.713 2.490
patt-sol 435 351 1.239 1.572
phys-lib 3 2 1.500 0.500
physics 2966 2051 1.446 2.459
plasm-ph 48 28 1.714 1.321
q-alg 773 1161 0.666 1.457
quant-ph 2789 3975 0.702 1.782
solv-int 582 747 0.779 1.335
supr-con 137 64 2.141 2.097

We can also analyse the distribution of authors between archive sub-fields by finding the intersection and union of the sets of authors from different fields. The values shown are the cardinality of the intersection divided by the cardinality of the union.
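
This ratio (the Jaccard index of the two author sets) can be sketched as follows. The function name and the sample author sets are made up for illustration, and returning 0.0 for two empty sets is an assumption.

```python
def author_overlap(authors_a, authors_b):
    """Cardinality of the intersection divided by cardinality of the union."""
    union = authors_a | authors_b
    if not union:
        return 0.0  # assumed convention when both fields have no authors
    return len(authors_a & authors_b) / len(union)

# Made-up author sets for two sub-fields.
field_one = {"A.Author", "B.Writer", "C.Scribe"}
field_two = {"B.Writer", "C.Scribe", "D.Penman", "E.Quill"}
overlap = author_overlap(field_one, field_two)
```

A value of 0 means the two fields share no authors, and 1 means their author sets are identical.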
