Hot Dates


Written by Tim Brody, last updated on August 31 2000 10:09:16.

This page will be extended once the code is "finalised"...

  • countcol (31/08/2000)
  • restrictcol (31/08/2000)
  • uniquecol (31/08/2000)

    Interpreting date formats in the Los Alamos archive

    In order to analyse the relationship between publication dates and dates of submission to the archive, the relevant data must be extracted from the "Date:" and "Journal-ref:" fields.

    Date:

    The Date field is computer-generated, but its format has changed considerably over time: there are around 10 different timestamp formats.

    Some examples:

    Tue, 28 Dec 1999 10:00:03 GMT
    Wed, 11 May 94 17:13:03 CDT
    ...

    Perl code to handle date formats:

            # Date format: mm/dd/yyyy or mm-dd-yyyy (e.g. 06/08/1999)
            ($line =~ /(\d\d)\/(\d\d)\/(\d\d\d\d)/ || $line =~ /(\d\d)-(\d\d)-(\d\d\d\d)/) && return $3.$1.$2;
     
            # Date format: mm/dd/yy or mm-dd-yy (e.g. 06/08/99)
            ($line =~ /(\d\d)\/(\d\d)\/(\d\d)/ || $line =~ /(\d\d)-(\d\d)-(\d\d)/) && return year2y($3).$1.$2;
     
            # Date format: 6-Jun-1999
            ($line =~ /(\d+)-(\w\w\w)-(\d\d\d\d)/ ) && return $3.mon2m($2).day2d($1);
     
            # Date format: 8 Jun 1999 or 8 June 1999 or ddd, 8 Jun hh:mm:ss 1999
            ($line =~ /(\d+)\s(\w\w\w)\s(\d\d\d\d)[\s,]/ || $line =~ /(\d+)\s(\w\w\w)\w+\s(\d\d\d\d)[\s,]/ ||
             $line =~ /\w+,*\s+(\d+)\s(\w\w\w)\w*\s\d\d:\d\d:\d\d\s(\d\d\d\d)/ ) &&
                    return $3.mon2m($2).day2d($1);
    
            # Date format: 8 Jun 99
            ($line =~ /(\d+)\s(\w\w\w)\s(\d\d)\s/) && return year2y($3).mon2m($2).day2d($1);
     
            # Date format: ddd Jun 8 hh:mm:ss 1999 or ddd, Jun 8 hh:mm:ss 1999
            ($line =~ /\w+\s(\w\w\w),*\s+(\d+)\s\d\d:\d\d:\d\d\s(\d\d\d\d)/) &&
                    return $3.mon2m($1).day2d($2);

            # Date format: 8 Jun 0 (single-digit year)
            ($line =~ /\s(\d+)\s(\w\w\w)\s(\d)\s/) && return year2y($3).mon2m($2).day2d($1);
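
    The date-handling code relies on three helpers, year2y, mon2m and day2d, whose definitions are not shown on this page. The following is only a sketch of what they might look like, assuming the aim is a sortable yyyymmdd string, that months are identified by their first three letters, and that one- or two-digit years above 50 belong to the 1900s (a cut-off borrowed from the Journal-ref code, and an assumption here). The %MONTHS table and the '00' fallback for unknown month names are likewise made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper definitions; the originals are not shown on this page.
my %MONTHS = (
    jan => '01', feb => '02', mar => '03', apr => '04',
    may => '05', jun => '06', jul => '07', aug => '08',
    sep => '09', oct => '10', nov => '11', dec => '12',
);

# Map a three-letter month name (any case) to a two-digit month number.
# Unknown names give '00' so that bad input is easy to spot (an assumption).
sub mon2m {
    my ($mon) = @_;
    my $key = lc $mon;
    return exists $MONTHS{$key} ? $MONTHS{$key} : '00';
}

# Zero-pad a day of the month to two digits.
sub day2d {
    my ($day) = @_;
    return sprintf('%02d', $day);
}

# Expand a one- or two-digit year to four digits.
# Assumed cut-off: values above 50 are 19xx, everything else is 20xx.
sub year2y {
    my ($year) = @_;
    return $year > 50 ? 1900 + $year : 2000 + $year;
}

# Usage: the two sample Date values quoted earlier on this page.
if ('Tue, 28 Dec 1999 10:00:03 GMT' =~ /(\d+)\s(\w\w\w)\s(\d\d\d\d)[\s,]/) {
    my $stamp = $3 . mon2m($2) . day2d($1);
    print "$stamp\n";    # prints 19991228
}
if ('Wed, 11 May 94 17:13:03 CDT' =~ /(\d+)\s(\w\w\w)\s(\d\d)\s/) {
    my $stamp = year2y($3) . mon2m($2) . day2d($1);
    print "$stamp\n";    # prints 19940511
}
```

    Returning strings of the form yyyymmdd means that dates can be compared and sorted as plain numbers or strings, which is presumably why the fields are concatenated in that order.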
    

    Journal-ref:

    The Journal-ref field is an author-entered (human-readable) value that should contain the name and, hopefully, the publication date of the journal in which the paper was published. Most commonly it contains the journal name, probably the journal number, and the year of publication. The year is normally written with four digits (e.g. 1999), especially near the year 2000. However, some authors have written the year as '99, or sometimes just 99. To avoid too many "false positives", two-digit values that stand on their own (not in brackets and not preceded by a tick, i.e. an apostrophe) are ignored.

            # Push in years in brackets
            while( s/[\(\[](\d\d\d\d)[\)\]]// ) {
                    push(@years, $1);
            }
            # Push in any years at beginning or end of line
            (s/\D(\d\d\d\d)$//g) && push(@years, $1);
            (s/^(\d\d\d\d)\D//g) && push(@years, $1);
            # Push in remaining four-digit values
            while( s/\D(\d\d\d\d)\D// ) {
                    push(@years, $1);
            }
            # Push in two-digit years e.g. '99, or (99)
            while( s/\'(\d\d)\D// || s/\((\d\d)\)// || s/\'(\d\d)$// ) {
                    if( $1 > 50 ) {
                            push(@years, 1900+$1);
                    } else {
                            push(@years, 2000+$1);
                    }
            }
            # Go through the 4 digit values until we find a likely candidate
            while( ($year = shift(@years)) ) {
                    if( ($year > 1950) && ($year < 2050) ) {
                            return $year;
                    }
            }
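
    To see how these heuristics behave, the fragment above can be wrapped in a subroutine and run on a few strings. This is only a sketch: the name journal_year, the explicit $ref variable (the fragment above works on $_) and the three sample Journal-ref values are invented for illustration and do not come from the archive.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the year-extraction steps shown above.
# Returns the first plausible publication year, or undef if none is found.
sub journal_year {
    my ($ref) = @_;
    my @years;

    # Years in round or square brackets
    while ( $ref =~ s/[\(\[](\d\d\d\d)[\)\]]// ) {
        push(@years, $1);
    }
    # Years at the end or at the beginning of the line
    push(@years, $1) if $ref =~ s/\D(\d\d\d\d)$//;
    push(@years, $1) if $ref =~ s/^(\d\d\d\d)\D//;
    # Any remaining four-digit values
    while ( $ref =~ s/\D(\d\d\d\d)\D// ) {
        push(@years, $1);
    }
    # Two-digit years written as '99 or (99)
    while ( $ref =~ s/'(\d\d)\D// || $ref =~ s/\((\d\d)\)// || $ref =~ s/'(\d\d)$// ) {
        push(@years, $1 > 50 ? 1900 + $1 : 2000 + $1);
    }
    # Return the first candidate in the accepted range
    for my $year (@years) {
        return $year if $year > 1950 && $year < 2050;
    }
    return;
}

# Invented sample values, one per case.
for my $sample (
    "J. Example Phys. 12 (1999) 345",    # bracketed four-digit year
    "Example Lett. B 7, 123 '98",        # two-digit year with a tick
    "Example Lett. B 7, 123, 98",        # bare two-digit value: ignored
) {
    my $year = journal_year($sample);
    printf "%-32s -> %s\n", $sample, defined $year ? $year : 'none';
}
```

    The three samples give 1999, 1998 and no year respectively. Note that the order of the steps matters: bracketed years are tried first, so a four-digit page number later in the string does not displace them.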
    

    Find the remainder from two files

    diff all_file subset_file | grep '^<' | grep abs | awk '{ print $2"\t"$3"\t"$4"\t"$5"\t"$6; }' > remainder_file
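
    The pipeline above keeps the lines that diff reports as present only in all_file (those prefixed with "<"), restricts them to lines containing "abs", and prints the first five data fields separated by tabs. A rough Perl equivalent is sketched below. It is not the same technique: it uses a hash lookup rather than diff, so it ignores line order, and it prints fewer than five fields if a line is short. The subroutine name remainder and the sample lines are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of @$all that do not occur in
# @$subset, keeping only lines that contain "abs" and at most their
# first five whitespace-separated fields, joined by tabs.
sub remainder {
    my ($all, $subset) = @_;

    # Remember every line of the subset for fast lookup.
    my %seen;
    $seen{$_} = 1 for @$subset;

    my @out;
    for my $line (@$all) {
        next if $seen{$line};            # present in both files
        next unless $line =~ /abs/;      # same filter as "grep abs"
        my @fields = split ' ', $line;   # awk-style whitespace splitting
        $#fields = 4 if @fields > 5;     # keep at most five fields
        push(@out, join("\t", @fields));
    }
    return @out;
}

# Invented sample data standing in for the (chomped) lines of the two files.
my @all    = ('abs/0001 a b c d e', 'abs/0002 f g h i j', 'other k l m n o');
my @subset = ('abs/0001 a b c d e');

# Prints one line: abs/0002, f, g, h and i separated by tabs.
print "$_\n" for remainder(\@all, \@subset);
```

    The diff version assumes that subset_file lists its lines in the same order as all_file; the hash version drops that requirement at the cost of holding the subset in memory.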
