
# Analysis of Article Authors

Last updated September 07 2000 14:10:18.

(Tables are only available to Soton viewers)

Authors 2 (Table 1), broken down by area, with first names removed (slightly fewer than 76,000 authors)

Example of same name, different spelling:

### 4. nucl-th/9810016 [abs, src, ps, other] :

Title: Charmed Mesic Nuclei: Bound D and $\bar{D}$ states with ^{208}Pb
Authors: K. Tsushima (1), D. H. Lu (1), A. W. Thomas (1), K. Saito (2), R. H. Landau (3) ((1) CSSM and The University of Adelaide, Australia (2) Tohoku College of Pharmacy, Sendai, Japan (3) Oregon State University, USA)
Comments: RevTex, 14 pages, 3 Postscript figures, version to appear in Phys. Rev. C, title, abstract, text, references are modified
Journal-ref: Phys.Rev. C59 (1999) 2824-2828

### 5. hep-lat/9810005 [abs, src, ps, other] :

Title: Nucleon Magnetic Moments Beyond the Perturbative Chiral Regime
Authors: Derek B. Leinweber, Ding H. Lu, Anthony W. Thomas
Comments: Revised version accepted for publication includes a new section demonstrating extrapolations of lattice QCD results
Journal-ref: Phys.Rev. D60 (1999) 034014

Example distribution of name format:

| Name format | Count |
|---|---|
| S.W._Hawking | 12 |
| S._W._Hawking | 8 |
| Stephen_W._Hawking | 4 |
| Stephen_Hawking | 3 |
| S.W.Hawking | 2 |

[authors.txt] These names would be collapsed to S.W.Hawking, Stephen_W.Hawking and Stephen_Hawking.

[authors2.txt] These names would be collapsed to S.W.Hawking and S.Hawking.
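The authors2.txt style of collapsing can be sketched roughly as below. This is only an illustration, not the script actually used: the function name `collapse_names` is hypothetical, and it assumes the last token of a name is always the surname, so multi-part surnames would be handled wrongly.

```shell
# Hypothetical helper: reduce every given name to an initial and keep the surname.
# Reads one name per line in the underscore format shown above.
collapse_names() {
  awk '{
    n = split($1, part, /[._]+/)        # Stephen_W._Hawking -> Stephen, W, Hawking
    out = ""
    for (i = 1; i < n; i++)             # every token except the last is a given name
      out = out substr(part[i], 1, 1) "."
    print out part[n]                   # the last token is assumed to be the surname
  }'
}

# The five formats from the example distribution.
printf '%s\n' S.W._Hawking S._W._Hawking Stephen_W._Hawking Stephen_Hawking S.W.Hawking |
  collapse_names
```

With this rule the five formats reduce to two forms, S.W.Hawking and S.Hawking.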

This data does not include papers submitted before 10/94, as there were no author meta-tags at that time.

### Author Citations

Using the citation data provided by Dr. Les Carr (hep-th, 98-00), we can build a table of the number of citations that individual authors have received (disregarding the importance or otherwise of the author). See Table 2.

Then, using Table 1, a mean number of citations per paper can be computed for each author: Table 3 (Author, Citations, Papers, Citations/Papers).

Graph of the number of citations an author has received, against the number of papers that author has written. A trend (Excel: poly 2) is shown in black.

### Excluding Self-Citation

Using the same technique as above, the citation "impact" can be found for authors, this time excluding any occasions where an author references themselves.

Source Paper - Cited Paper
AuthorsA - AuthorsB
...do not give a citation to author B if that author is in set A

This results in Table 4.
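A minimal sketch of the exclusion rule, under assumptions: the input layout (one citation per line, the citing paper's authors and the cited paper's authors as two tab-separated, comma-separated lists) and the function name `count_nonself_citations` are invented here for illustration and are not the format of the real data files.

```shell
# Hypothetical helper: count citations per author, skipping self-citations.
# Input: "authorsA<TAB>authorsB", each a comma-separated list of names.
count_nonself_citations() {
  awk -F'\t' '{
    split("", citing)                          # clear set A for this citation
    na = split($1, a, ",")
    for (i = 1; i <= na; i++) citing[a[i]] = 1
    nb = split($2, b, ",")
    for (j = 1; j <= nb; j++)
      if (!(b[j] in citing)) cites[b[j]]++     # credit B only if B is not in set A
  }
  END { for (name in cites) printf "%s\t%d\n", name, cites[name] }' | LC_ALL=C sort
}

# Made-up names: B.Jones cites an earlier B.Jones paper, so gets no credit for it.
printf 'A.Smith,B.Jones\tB.Jones,C.Wu\nD.Lee\tC.Wu,A.Smith\n' | count_nonself_citations
```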

(Code to generate mean citations/author: `awk '{print $1"\t"$2"\t"$3"\t"($2/$3)}' < d_notauthorcitations2 | sort -rn -k 4 > d_notauthorcitations3`)

### Defining Impact

(Tim's patent-pending bear-no-relationship-to-statistics-method)

Using Table 4, where `$2` is the sum of citations (y axis) and `$3` is the sum of papers (x axis).

| Impact | Total authors | Citations | Papers |
|---|---|---|---|
| High | 338 | 61294 | 8411 |
| Medium | 2756 | 30009 | 28926 |
| Low | 2215 | 3615 | 12269 |

Shell script for each level:

High: `awk '{ if ($2 >= 50 && $3 >= 10) print $0 }' < d_authorcitations3 > d_highimpact`

Medium: `awk '{ if (($2 < 50 || $3 < 10) && $2 > 1 && $3 > 1) print $0 }' < d_authorcitations3 > d_medimpact`

Low: `awk '{ if (($2 < 50 || $3 < 10) && ($2 == 1 || $3 == 1)) print $0 }' < d_authorcitations3 > d_lowimpact`

Some highly-cited authors may be excluded from "High Impact" because I require a minimum number of papers. It is assumed that an author's lack of articles shows that they either do not use the archive or have not written many papers, in which case their impact may be a "one-off".

#### Splitting By Thirds

Sorting the authors by their citations/papers ratio, then splitting them into three equal groups.

| Third | Citations | Papers |
|---|---|---|
| Top | 84654 | 16169 |
| Middle | 7113 | 11020 |
| Bottom | 3151 | 22417 |
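The split into thirds could be reproduced along these lines. This is a sketch, not the original script: the function name `split_by_thirds` is hypothetical, and the input is assumed to have the author in column 1, citations in column 2 and papers in column 3, as in the awk commands earlier on this page.

```shell
# Hypothetical helper: rank authors by citations/papers, cut the ranking into
# three groups of (near) equal size, and total citations and papers per group.
split_by_thirds() {
  awk '{ printf "%.6f\t%d\t%d\n", $2 / $3, $2, $3 }' |   # ratio, citations, papers
    LC_ALL=C sort -rn -k1,1 |
    awk -F'\t' '
      { cit[NR] = $2; pap[NR] = $3 }
      END {
        split("Top Middle Bottom", label, " ")
        for (i = 1; i <= NR; i++) {
          g = int((i - 1) * 3 / NR) + 1                  # group number 1..3 by rank
          c[g] += cit[i]; p[g] += pap[i]
        }
        for (g = 1; g <= 3; g++) printf "%s\t%d\t%d\n", label[g], c[g], p[g]
      }'
}

# Made-up data for six authors: name, citations, papers.
printf 'd 10 5\na 100 10\nf 1 2\nc 30 10\nb 40 5\ne 3 3\n' | split_by_thirds
```

Authors with identical ratios may land on either side of a group boundary, since the order of ties is left to `sort`.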

#### Splitting Using Quartiles

Using the citations/papers ratio as the cumulator for quartiles, taking the top 25%, the middle 50% and the bottom 25%.

Adding in the number of deposits belonging to the articles that these authors have deposited, and taking the mean over the number of authors. Dividing this by the mean number of papers per author generates a deposit "rate" for the sector: the mean number of deposits per paper per author.

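| Quartile | Total | Citations | Papers | Deposits | (Deposit Authors) | Deposits/Author | Papers/Author | Deposit Rate |
|---|---|---|---|---|---|---|---|---|
| Top | 125 | 29011 | 1649 | 2787 | 123 | 22.6585 | 13.192 | 1.718 |
| Middle | 1119 | 49536 | 10279 | 15613 | 1068 | 14.6189 | 9.186 | 1.591 |
| Bottom | 4066 | 16371 | 37678 | 47809 | 3849 | 12.4211 | 9.267 | 1.340 |

| Quartile | Citation Impact (Cites/Papers/Authors) | Hits Impact (all areas) (Hits/Papers/Authors) |
|---|---|---|
| Top | 0.141 | 11.873 |
| Middle | 0.0043 | 18.185 |
| Bottom | 0.000107 | 5.085 |
| (Unranked) | | 4.280 |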
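As a worked check of the deposit rate described above, the Top quartile figures combine as follows. The numbers are taken from the table; the labels in the formula are only shorthand for its column names.

```latex
\[
\text{Deposit rate}
  = \frac{\text{Deposits} / \text{Deposit Authors}}{\text{Papers} / \text{Total}}
  = \frac{2787 / 123}{1649 / 125}
  = \frac{22.6585}{13.192}
  \approx 1.718
\]
```

The Citation Impact column follows the same pattern: 29011 / 1649 / 125 is approximately 0.141 for the Top quartile.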

### Authors Per Paper

The total number of papers is the number of abstracts from 1995 onwards, since we can't get authors before that time:
`grep hep-th < q1/d_papers | grep abs | restrictcol '-3/(1991)|(1992)|(1993)|(1994)/' | wc`

Total number of authors:
`grep hep-th < d_authors | awk '{ print $1 }' | sort | uniq | wc`

| Area | Papers | Authors | Authors/Papers |
|---|---|---|---|
| hep-th | 14534 | 7036 | 0.484 |
| hep-ph | 19374 | 8266 | 0.427 |
| cond-mat | 20521 | 15425 | 0.752 |
| astro-ph | 20629 | 14027 | 0.680 |
| math | 7200 | 5255 | 0.730 |

### Authors per Paper, by Impact

Using the impact level author list, a list of papers by those authors can be compiled. Using that list of papers, a list of the authors who are named on those papers can be built.

| Authors/Paper | High Impact | Medium Impact | Low Impact |
|---|---|---|---|
| 1 | 383 | 2024 | 8403 |
| 2 | 458 | 2332 | 7695 |
| 3 | 240 | 1526 | 5174 |
| 4 | 135 | 608 | 2132 |
| 5 | 27 | 128 | 689 |
| 6 | 9 | 46 | 259 |
| 7 | 1 | 12 | 127 |
| 8 | 1 | 3 | 67 |
| 9 | 0 | 1 | 45 |
| 10 | 1 | 5 | 44 |

### Paper state by Author Impact

The state of papers, by author impact level.

### For Whole Archive

Using spotcites data for all papers. What proportion of citations does this cover?

| wc SCOOT.OUT | wc d_papers | Total Citations | Total "Red Link"/"Orange Link" Citations | Antique |
|---|---|---|---|---|
| 3,090,131 | 132219 | 2,957,912 | 603,460 | 836,945 |

This gives 100*(603460/2957912) = 20.40% (i.e. 1 in 5 citations), as a proportion of all citations in the archive. (603460/132219) = 4.56 citations/paper identified, against 2957912/132219 = 22.37 citations/paper identified from PDF source.

Analysing how many red/orange links have been picked up, by year. Using the raw citation data (paper -> citation), the number of references for a given year can be found by taking the first two digits from the paper reference. The total number of papers deposited in that year can then be found by using a listing of all papers in the archive and using the first two digits of the paper references. When taking the total number of papers, any papers from areas that did not have any references were ignored.

Total citations = 597688. Total papers = 115940 (only includes 2000 up to June).
| Year | Papers Deposited | Citations | Citations/Paper |
|---|---|---|---|
| 91 | 305 | 19 | 0.0623 |
| 92 | 2,891 | 1,291 | 0.447 |
| 93 | 6,127 | 7,576 | 1.24 |
| 94 | 8,901 | 19,171 | 2.15 |
| 95 | 11,034 | 39,240 | 3.56 |
| 96 | 13,709 | 61,019 | 4.45 |
| 97 | 17,310 | 100,714 | 5.82 |
| 98 | 21,040 | 132,096 | 6.28 |
| 99 | 24,163 | 142,888 | 5.91 |
| 00 | 10,460 | 93,674 | 8.96 |

Using quartiles we come up with the following split for authors:

| Quartile | Total | Citations | Papers | Cites/Author/Paper | Deposits | Mean Deposits/Author | Variance |
|---|---|---|---|---|---|---|---|
| Top | 798 | 240,092 | 2,732 | 0.110 | 6,720 | 1.47526 | 0.301527 |
| Middle | 9,262 | 733,272 | 37,318 | 0.00212 | 93,671 | 1.36982 | 0.218753 |
| Bottom | 28,211 | 251,925 | 67,951 | 0.000131 | 165,971 | 1.2665 | 0.189012 |

Mean number of citations/author (ignoring the number of papers those authors have deposited).

| Quartile | Total | Sum Citations | Mean | Variance |
|---|---|---|---|---|
| Top | 798 | 240092 | 300.867 | 1213.259 |
| Middle | 9262 | 733272 | 79.170 | 203.716 |
| Low | 28211 | 251925 | 8.930 | 30.281 |

Mean number of hits/paper (by author impact).

| Impact | Total (papers) | Sum Hits | Mean | Variance |
|---|---|---|---|---|
| High | 2732 | 22674 | 8.299 | 232.487 |
| Medium | 37318 | 144714 | 3.878 | 71.434 |
| Low | 67951 | 195867 | 2.882 | 37.584 |

This graph shows the proportion of authors with a given deposit rate for different impact levels. The number of authors for each deposit rate is shown.

Papers that have authors from different impact levels/% of all unique papers in combined area:

| | High | Medium | Low |
|---|---|---|---|
| High | - | 1586/4.12% | 254/0.361% |
| Medium | 1586/4.12% | - | 12881/13.9% |
| Low | 254/0.361% | 12881/13.9% | - |

Papers with authors from all three impact levels: 155 (155/93435 = 0.166%)

This diagram shows the approximate authorship of papers (the total area of the circles represents all the papers, and each circle represents the authors of those papers). Where the circles intersect is therefore where papers have authors from more than one impact level.

This graph shows the cumulative number of papers against the number of citations for those papers (divided into high, med, low impact authors).

Authors per paper (`awk '{ print $2 }' d_highimpactauthorpapers | sort | uniq -c | awk '{ c++; s += $1 } END { print s/c }'`):

| Level | Mean | Variance |
|---|---|---|
| High | 1.56442 | 7.50354 |
| Med | 1.81082 | 4.52046 |
| Low | 1.94748 | 4.31381 |

These graphs show the frequency of papers broken down by the number of citations they receive and by the impact level of their authors (so these graphs may feature the same paper more than once, as an individual paper may have more than one author).

This graph shows the age of citations (the time difference between a paper being deposited and its referenced papers being deposited), broken down by the impact factor of the paper's authors.

### How long have authors been depositing?

Using the authored list (paper ref * author name), the time difference in months can be found between the first paper the author deposited and the last. This includes authors who have only one paper in the archive (defined as having a period of 0 months).

| Total Authors | Mean Timediff (months) | Variance (months) |
|---|---|---|
| 75062 | 13.824 | 431.147 |

Author names cannot be easily extracted pre-1994, so there is a peak at 5 years of usage from all the authors who have continually deposited from before that period, but only appear in 1995.

Looking at the time between every paper deposited by an author:

| Total 2+ Papers | Mean Timediff (months) | Variance (months) |
|---|---|---|
| 153418 | 6.764 | 62.150 |

This graph is based on taking the time difference, in order, between papers deposited by authors (the yymm part of the paper reference), excluding the time difference between two papers deposited in the same month (i.e. 0).
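The month differences described above could be computed roughly as follows. This is a sketch under assumptions: the function name `deposit_gaps` and the "author reference" input layout are hypothetical, and two-digit years below 91 are treated as 2000 onwards.

```shell
# Hypothetical helper: months between consecutive deposits by the same author.
# Input: "author paper-reference" per line, e.g. "A.Smith hep-th/9810016".
deposit_gaps() {
  awk '{
    ref = $2
    sub(/^.*\//, "", ref)                       # hep-th/9810016 -> 9810016
    yy = substr(ref, 1, 2) + 0
    mm = substr(ref, 3, 2) + 0
    year = (yy >= 91) ? 1900 + yy : 2000 + yy   # assumes identifiers start in 1991
    printf "%s\t%d\n", $1, year * 12 + mm       # author, absolute month number
  }' | LC_ALL=C sort -k1,1 -k2,2n |
  awk -F'\t' '
    $1 == prev && $2 != last { printf "%s\t%d\n", $1, $2 - last }  # skip 0-month gaps
    { prev = $1; last = $2 }'
}

# Made-up references: gaps of 3 and 14 months, same-month pair ignored.
printf 'A.Smith hep-th/9901005\nB.Jones hep-ph/9512001\nA.Smith hep-th/9810016\nA.Smith hep-th/0003001\nA.Smith hep-th/9901020\n' |
  deposit_gaps
```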

### Growth of Authors Over Time

By using the meta data "author" field, the number of unique authors of papers per year can be found (for most areas the number of authors can not be easily found at or before 1994).

| Year | Authors |
|---|---|
| 1991 | 411 |
| 1992 | 1152 |
| 1993 | 1439 |
| 1994 | 5958 |
| 1995 | 15198 |
| 1996 | 17762 |
| 1997 | 22359 |
| 1998 | 27785 |
| 1999 | 32673 |
| 2000-06 | 19593 |

### Analysing number of authors by LANL subfield

Using the authors meta-data field, the author list can be found for each paper. The total number of authors and total number of papers can then be found by summing each occurrence of a unique author for each area and each occurrence of a unique paper for each area. To find the variance, the number of authors per paper was also stored.

Because the authors meta-data field did not exist in some areas before 1995 all these years have been ignored.

| Area | Authors | Papers | Authors/Papers | Standard Deviation |
|---|---|---|---|---|
| acc-phys | 114 | 43 | 2.651 | 8.828 |
| adap-org | 305 | 245 | 1.245 | 1.219 |
| alg-geom | 649 | 854 | 0.760 | 1.059 |
| ao-sci | 19 | 13 | 1.462 | 0.761 |
| astro-ph | 12754 | 17509 | 0.728 | 4.244 |
| atom-ph | 112 | 68 | 1.647 | 1.258 |
| bayes-an | 8 | 11 | 0.727 | 0.273 |
| chao-dyn | 1391 | 1416 | 0.982 | 1.733 |
| chem-ph | 143 | 89 | 1.607 | 1.551 |
| cmp-lg | 641 | 671 | 0.955 | 1.326 |
| comp-gas | 119 | 78 | 1.526 | 1.357 |
| cond-mat | 13766 | 17411 | 0.791 | 2.272 |
| cs | 915 | 661 | 1.384 | 1.960 |
| dg-ga | 398 | 501 | 0.794 | 1.025 |
| funct-an | 169 | 213 | 0.793 | 0.895 |
| gr-qc | 2959 | 5098 | 0.580 | 2.202 |
| hep-ex | 2493 | 1689 | 1.476 | 7.065 |
| hep-lat | 1299 | 2517 | 0.516 | 3.438 |
| hep-ph | 7597 | 16875 | 0.450 | 2.635 |
| hep-th | 5987 | 12788 | 0.468 | 1.769 |
| math | 3808 | 5324 | 0.715 | 1.131 |
| math-ph | 781 | 705 | 1.108 | 1.090 |
| mtrl-th | 213 | 148 | 1.439 | 1.765 |
| neuro-dev | 2 | 1 | 2.000 | 0.000 |
| neuro-sys | 33 | 13 | 2.538 | 1.321 |
| nlin | 530 | 303 | 1.749 | 1.365 |
| nucl-ex | 2141 | 432 | 4.956 | 13.016 |
| nucl-th | 2906 | 4078 | 0.713 | 2.490 |
| patt-sol | 435 | 351 | 1.239 | 1.572 |
| phys-lib | 3 | 2 | 1.500 | 0.500 |
| physics | 2966 | 2051 | 1.446 | 2.459 |
| plasm-ph | 48 | 28 | 1.714 | 1.321 |
| q-alg | 773 | 1161 | 0.666 | 1.457 |
| quant-ph | 2789 | 3975 | 0.702 | 1.782 |
| solv-int | 582 | 747 | 0.779 | 1.335 |
| supr-con | 137 | 64 | 2.141 | 2.097 |

We can also analyse the distribution of authors between archive sub-fields by finding the intersection and union of the sets of authors from different fields. The value shown is the cardinality of the intersection divided by the cardinality of the union.
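For one pair of sub-fields, the intersection-over-union value could be computed as in the sketch below. It is not the script actually used: the function name `author_overlap` and the "area author" input layout are assumptions made for the example.

```shell
# Hypothetical helper: size of the intersection divided by size of the union
# for the author sets of two areas.
# Usage: author_overlap AREA1 AREA2 < file of "area author" lines (duplicates allowed)
author_overlap() {
  awk -v a="$1" -v b="$2" '
    $1 == a { in_a[$2] = 1 }
    $1 == b { in_b[$2] = 1 }
    END {
      for (name in in_a) { union++; if (name in in_b) both++ }
      for (name in in_b) if (!(name in in_a)) union++
      if (union > 0) printf "%.3f\n", both / union
    }'
}

# Made-up data: two shared authors out of four distinct ones.
printf 'hep-th X\nhep-th Y\nhep-th Z\nhep-ph Y\nhep-ph Z\nhep-ph W\n' |
  author_overlap hep-th hep-ph
```

A value of 0 means the two areas share no authors, and 1 means their author sets are identical.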
