26 Sep, 2008, quixadhal wrote in the 41st comment:
Votes: 0
Ok, here's a little more clever version. I don't promise it works 100%, nor do I say it's clean. It's a quick little hack that you can extend a bit without TOO much fuss. I like to call it… file_dater.pl.
To add new file types (as in source files to get dates from), edit the $source_types definition. That's part of a regex, the vertical bars are seperaters. You have to escape periods, as they're the "match any single character" magic symbol, and because it's a string, you have to escape the escape character… which is why the backslashes are doubled.
To add new archive types, edit the $file_types definition, and also go add more elsif() clauses in the get_newest_date() function. You need to do whatever command lists the table of contents of the archive, grep out the source_types, pull the date and time fields, and then possibly muck about with the date like I did for zip archives.
If it works right, it should print the full pathname of the file, followed by a TAB delimiters, and then the date of the newest file that matches the source regex (currently C and C++… adding .php, .py, .rb, etc… should be cake).
Enjoy!
#!/usr/bin/perl -w
use strict; use Cwd;
my $arg_dir = shift || "."; my $file_types = "tgz|tar\\.gz|tar\\.bz2|zip"; my $source_types = "c|h|cpp";
sub process_dirs;
sub get_newest_date { my $file = shift; my $date = undef;
return (undef,undef) if !defined $file; my $cwd = cwd(); $file = "$cwd/$file";
#print "FILE: $file\n"; if ($file =~ /\.(tgz|tar\.gz)$/) { open PIPE, "tar -ztvf '$file' | perl -ne 'print \"\$_\" if /.($source_types)\$/i;' | awk '{print \$4\" \"\$5}' | sort -r | uniq | head -1 |"; $date = <PIPE>; chomp $date if defined $date; close PIPE; } elsif ($file =~ /\.(tar\.bz2)$/) { open PIPE, "tar -jtvf '$file' | perl -ne 'print \"\$_\" if /.($source_types)\$/i;' | awk '{print \$4\" \"\$5}' | sort -r | uniq | head -1 |"; $date = <PIPE>; chomp $date if defined $date; close PIPE; } elsif ($file =~ /\.(zip)$/) { open PIPE, "unzip -l '$file' | perl -ne 'print \"\$_\" if /.($source_types)\$/i;' | awk '{print \$2\" \"\$3}' | sort -r | uniq | head -1 |"; $date = <PIPE>; chomp $date if defined $date; close PIPE;
return ($file, undef) if !defined $date; # ZIP has to be weird… date format is MM-DD-YY if ( $date =~ /(\d\d)-(\d\d)-(\d\d)(.*)/ ) { #print "PREDATE: $date\n"; my ($m,$d,$y,$rest) = ($1,$2,$3,$4); $y += 1900; $y += 100 if $y < 1938; # Why 2038? That's the rollover date for time()! $date = "$y-$m-$d$rest"; } else { $date = undef; } } else { #print "Unsupported archive!\n"; } return ($file, undef) if !defined $date; return ($file, undef) if $date !~ /^(19|20)\d\d\-/; #print "DATE: $date\n"; return ($file, $date); }
sub process_files { opendir( DIR, "." ) or die "Cannot open directory: $!"; my @files = grep { /\.($file_types)$/i && -f "$_" && -r "$_" } readdir( DIR ); closedir DIR;
foreach my $file (@files) { next if ! -r "$file"; next if -s "$file" > 100000000; # 100 megs is a BIG mud!
my ($f, $d) = get_newest_date( $file ); print "$f\t$d\n" if defined $f and defined $d; } }
sub process_dirs { my $dir = shift; return if !defined $dir;
chdir $dir; my $cwd = cwd(); #print "NOW: $cwd\n"; opendir( DIR, "." ) or die "Cannot open $dir: $!"; my @subdirs = sort grep { /^[^\.]/ && -d "$_" && !-l "$_" && -r "$_" } readdir( DIR ); closedir DIR;
process_files();
foreach my $d (@subdirs) { #print "RECURSE: $d\n"; process_dirs($d); #last; } chdir ".." if $dir ne "."; }
process_dirs($arg_dir); 1;
06 Oct, 2008, quixadhal wrote in the 42nd comment:
Votes: 0
Ok, I just discovered the pastebin! You guys are in trouble now…. *grin*
To add new file types (as in source files to get dates from), edit the $source_types definition. That's part of a regex, the vertical bars are seperaters. You have to escape periods, as they're the "match any single character" magic symbol, and because it's a string, you have to escape the escape character… which is why the backslashes are doubled.
To add new archive types, edit the $file_types definition, and also go add more elsif() clauses in the get_newest_date() function. You need to do whatever command lists the table of contents of the archive, grep out the source_types, pull the date and time fields, and then possibly muck about with the date like I did for zip archives.
If it works right, it should print the full pathname of the file, followed by a TAB delimiters, and then the date of the newest file that matches the source regex (currently C and C++… adding .php, .py, .rb, etc… should be cake).
Enjoy!