Amazon Interview Question
Software Engineer / Developers

I solved this by scanning each line for the IP address, incrementing the respective count in a hash, then sorting the hash by value at the end.
#!/usr/bin/perl
use strict;
use warnings;

open(my $fh, '<', '/path/to/log/file') or die "Sorry! $!";
my %hash;
while (my $line = <$fh>) {
    if ($line =~ m/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/) {
        $hash{$1}++;
    }
}
# Sort numerically (<=>) by count, descending; cmp would compare the
# counts as strings and rank a count of 9 above a count of 10
foreach (sort { $hash{$b} <=> $hash{$a} } keys %hash) {
    print "$_: $hash{$_}\n";
}
Here is shell script code for doing it. First get the unique IPs, assuming the IP is in the first column:
awk '{print $1}' access_log | sort -u > uniqueips.txt
for i in `cat uniqueips.txt`; do echo "$i $(grep -c "$i" access_log)"; done
Then capture the output in a file and sort numerically on the second field to get the top ten IPs.
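The loop above reruns grep over the whole log once per unique IP, which is quadratic. A single pipeline does the counting in one pass over the data; the sample log written here is a hypothetical stand-in so the command can be run as-is:

```shell
# Hypothetical sample log standing in for access_log
printf '1.1.1.1 a\n2.2.2.2 b\n1.1.1.1 c\n2.2.2.2 d\n1.1.1.1 e\n' > access_log

# One pass: extract IPs, count duplicates with uniq -c,
# sort numerically by count (descending), keep the top ten
awk '{print $1}' access_log | sort | uniq -c | sort -rn | head -10
```

Here 1.1.1.1 (3 hits) prints first, then 2.2.2.2 (2 hits), with each count left-padded by uniq -c.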
awk -F " " '{print $4}' log_file | sort -u > toptenIPadd_file
Correct me if I'm wrong.
This is a question about Perl.
The following code gives the list of IP addresses in descending order of their counts.
use List::MoreUtils qw/uniq/;

open(FILE, "logfile.txt") || die "can't open: $!\n";
my $cnt = 0;
my $i = 0;
my $index;
my @ipaddr;
my @all = <FILE>;
seek(FILE, 0, 0);
while (my $line = <FILE>) {
    my @new = split(" ", $line);
    # On the first line, work out which column holds the IP address
    if ($cnt == 0) {
        foreach my $str (@new) {
            if ($str =~ /\d+\.\d+\.\d+\.\d+/) {
                $index = $cnt;
                last;
            }
            $cnt++;
        }
    }
    $ipaddr[$i++] = $new[$index];
}
@ipaddr = uniq @ipaddr;
my @ipcount;
$i = 0;
foreach my $str (@ipaddr) {
    # \Q...\E quotes the dots so "10.1.1.1" doesn't also match "10x1x1x1"
    my $num = grep { /\Q$str\E/ } @all;
    $ipcount[$i++] = "$str $num";
}
# Sort numerically (<=>) on the count field; cmp would compare counts as strings
@ipcount = sort { (split " ", $b)[1] <=> (split " ", $a)[1] } @ipcount;
foreach my $str (@ipcount) {
    print "$str\n";
}
close(FILE);
output is:
13.12.12.10 9
10.20.30.21 6
15.12.13.30 4
Let's start with a sample log file
192.12.4.5 data 10/12/1 blat
191.12.4.1 date 10/12/a bla
191.12.4.2 date 10/12/a bla
191.12.4.3 date 10/12/a bla
191.12.4.4 date 10/12/a bla
191.12.4.5 date 10/12/a bla
191.12.4.6 date 10/12/a bla
191.12.4.7 date 10/12/a bla
191.12.4.8 date 10/12/a bla
191.12.4.9 date 10/12/a bla
191.12.4.10 date 10/12/a bla
191.12.4.11 date 10/12/a bla
191.12.4.12 date 10/12/a bla
191.12.4.5 date 10/12/a bla
191.12.4.5 date 10/12/a bla
191.12.4.5 date 10/12/a bla
191.12.4.5 date 10/12/a bla
191.12.4.5 date 10/12/a bla
191.12.4.5 date 10/12/a bla
192.12.4.5 date 10/12/a blaa
190.12.4.5 data 10/12/1 blat
191.12.4.3 date 10/12/a bla
Now, since this is Perl, let's ignore complexity for the moment. For a long file (more than 1000 lines) one can do better without sorting at all: scan for the highest count, remove it, and repeat 10 times.
Note that they ask for pseudo code, as sorting a hash by its values is somewhat tricky. I give here both the pseudo code (in a comment) and the real code.
#!/usr/bin/perl -w
use strict;
my %all_IPs;
# Read the log file
for (<>) {
my @fields = split;
$all_IPs{$fields[0]} ++;
}
# @best = %all_IPs sorted by keys in a reversed order
my @best = reverse sort {$all_IPs{$a} <=> $all_IPs{$b}} (keys %all_IPs);
# print the best 10
for (1..10) {
print $best[$_]."\n";
}
To print the best 10 you need to start from 0 and not from 1
for (0..9) {
print $best[$_]."\n";
}
Hi Ieofer,
I think by
my @best = reverse sort {$all_IPs{$a} <=> $all_IPs{$b}} (keys %all_IPs);
you meant
# @best = %all_IPs sorted by "$all_IPs{$each_key} in a reversed order,
not sort by keys, as we are sorting by frequencies.
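A related pitfall when sorting counts is string versus numeric comparison: Perl's cmp (like plain sort in the shell) compares character by character, while <=> (like sort -n) compares by value. A quick shell illustration:

```shell
# String sort compares character by character, so "10" sorts before "9"
printf '9\n10\n2\n' | sort     # -> 10, 2, 9
# Numeric sort orders by value
printf '9\n10\n2\n' | sort -n  # -> 2, 9, 10
```

With hit counts in the hundreds, a string comparison would quietly misrank the top ten.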
#!/usr/bin/perl -w
use strict;
my $file_path = "F:\\Thesis Work\\SLAMS_IP.csv";
open(FILE, $file_path) or die("Unable to open a file, $!");
open(HANDLE, ">file.txt") or die("Unable to open a file, $!");
my %ip_hash;
while(defined(my $lines = <FILE>))
{
    if($lines =~ m/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/)
    {
        # Increment the per-IP count directly; a single shared $count
        # variable would carry one IP's count over to the next IP seen
        $ip_hash{$1}++;
    }
}
foreach my $key(sort {$ip_hash{$b} <=> $ip_hash{$a}}keys %ip_hash)
{
print HANDLE "$key:$ip_hash{$key}\n";
}
close(HANDLE);
close(FILE);
+ Hash to find the number of logs per IP address
+ Use a min-heap to find the top 10 from the hash

- Code Monkey October 26, 2009
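The two steps above can be sketched in the shell, with awk's associative array playing the hash; sort -rn | head stands in for the min-heap (the shell has no heap), and the sample log is a hypothetical stand-in:

```shell
# Hypothetical sample log standing in for access_log
printf '1.1.1.1 a\n1.1.1.1 b\n2.2.2.2 c\n' > access_log

# Step 1: c[ip] is the hash of counts per IP.
# Step 2: emit "count ip" pairs and keep the ten largest.
awk '{c[$1]++} END {for (ip in c) print c[ip], ip}' access_log | sort -rn | head -10
```

A real min-heap of size 10 would find the top ten in O(n log 10) without sorting all distinct IPs, which matters when the hash is large.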