Amazon Interview Question
Software Engineer / Developers

I solved this by scanning each line for the IP address, incrementing the respective count in a hash, then sorting the hash by value at the end.
#!/usr/bin/perl
use strict;
use warnings;

open(my $fh, '<', '/path/to/log/file') or die "Sorry! $!";
my %hash;
while (my $line = <$fh>) {
    if ($line =~ m/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/) {
        $hash{$1}++;
    }
}
# Sort numerically (<=>) by count, descending; cmp would compare the
# counts as strings and rank a count of 9 above a count of 10
foreach (sort { $hash{$b} <=> $hash{$a} } keys %hash) {
    print "$_: $hash{$_}\n";
}
Here is shell script code for doing it. First get the unique IPs, assuming the IP is in the first column:
awk '{print $1}' access_log | sort -u > uniqueips.txt
for i in `cat uniqueips.txt`; do echo "$i $(grep -c "$i" access_log)"; done
Then capture the output in a file and sort numerically on the second field to get the top ten IPs.
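The loop above reruns grep over the whole log once per unique IP, which is quadratic. A single pipeline does the counting in one pass over the data; the sample log written here is a hypothetical stand-in so the command can be run as-is:

```shell
# Hypothetical sample log standing in for access_log
printf '1.1.1.1 a\n2.2.2.2 b\n1.1.1.1 c\n2.2.2.2 d\n1.1.1.1 e\n' > access_log

# One pass: extract IPs, count duplicates with uniq -c,
# sort numerically by count (descending), keep the top ten
awk '{print $1}' access_log | sort | uniq -c | sort -rn | head -10
```

Here 1.1.1.1 (3 hits) prints first, then 2.2.2.2 (2 hits), with each count left-padded by uniq -c.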
awk -F " " '{print $4}' log_file | sort -u > toptenIPadd_file
Correct me if I'm wrong.
This is a question about Perl.
The following code gives the list of IP addresses in descending order of their counts.
use List::MoreUtils qw/uniq/;

open(FILE, "logfile.txt") || die "can't open: $!\n";
my $cnt = 0;
my $i = 0;
my $index;
my @ipaddr;
my @all = <FILE>;
seek(FILE, 0, 0);
while (my $line = <FILE>) {
    my @new = split(" ", $line);
    # On the first line, work out which column holds the IP address
    if ($cnt == 0) {
        foreach my $str (@new) {
            if ($str =~ /\d+\.\d+\.\d+\.\d+/) {
                $index = $cnt;
                last;
            }
            $cnt++;
        }
    }
    $ipaddr[$i++] = $new[$index];
}
@ipaddr = uniq @ipaddr;
my @ipcount;
$i = 0;
foreach my $str (@ipaddr) {
    # \Q...\E quotes the dots so "10.1.1.1" doesn't also match "10x1x1x1"
    my $num = grep { /\Q$str\E/ } @all;
    $ipcount[$i++] = "$str $num";
}
# Sort numerically (<=>) on the count field; cmp would compare counts as strings
@ipcount = sort { (split " ", $b)[1] <=> (split " ", $a)[1] } @ipcount;
foreach my $str (@ipcount) {
    print "$str\n";
}
close(FILE);
output is:
13.12.12.10 9
10.20.30.21 6
15.12.13.30 4
Let's start with a sample log file
192.12.4.5 data 10/12/1 blat
191.12.4.1 date 10/12/a bla
191.12.4.2 date 10/12/a bla
191.12.4.3 date 10/12/a bla
191.12.4.4 date 10/12/a bla
191.12.4.5 date 10/12/a bla
191.12.4.6 date 10/12/a bla
191.12.4.7 date 10/12/a bla
191.12.4.8 date 10/12/a bla
191.12.4.9 date 10/12/a bla
191.12.4.10 date 10/12/a bla
191.12.4.11 date 10/12/a bla
191.12.4.12 date 10/12/a bla
191.12.4.5 date 10/12/a bla
191.12.4.5 date 10/12/a bla
191.12.4.5 date 10/12/a bla
191.12.4.5 date 10/12/a bla
191.12.4.5 date 10/12/a bla
191.12.4.5 date 10/12/a bla
192.12.4.5 date 10/12/a blaa
190.12.4.5 data 10/12/1 blat
191.12.4.3 date 10/12/a bla
Now, since this is Perl, let's ignore complexity for the moment. For a long file (more than 1000 lines) one can do better without sorting at all: scan for the highest count, remove it, and repeat 10 times.
Note that they ask for pseudo code, as sorting a hash by its values is somewhat tricky. I give here both the pseudo code (in a comment) and the real code.
#!/usr/bin/perl -w
use strict;
my %all_IPs;
# Read the log file
for (<>) {
my @fields = split;
$all_IPs{$fields[0]} ++;
}
# @best = %all_IPs sorted by keys in a reversed order
my @best = reverse sort {$all_IPs{$a} <=> $all_IPs{$b}} (keys %all_IPs);
# print the best 10
for (1..10) {
print $best[$_]."\n";
}
To print the best 10 you need to start from 0 and not from 1
for (0..9) {
print $best[$_]."\n";
}
Hi Ieofer,
I think by
my @best = reverse sort {$all_IPs{$a} <=> $all_IPs{$b}} (keys %all_IPs);
you meant
# @best = %all_IPs sorted by "$all_IPs{$each_key} in a reversed order,
not sort by keys, as we are sorting by frequencies.
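A related pitfall when sorting counts is string versus numeric comparison: Perl's cmp (like plain sort in the shell) compares character by character, while <=> (like sort -n) compares by value. A quick shell illustration:

```shell
# String sort compares character by character, so "10" sorts before "9"
printf '9\n10\n2\n' | sort     # -> 10, 2, 9
# Numeric sort orders by value
printf '9\n10\n2\n' | sort -n  # -> 2, 9, 10
```

With hit counts in the hundreds, a string comparison would quietly misrank the top ten.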
#!/usr/bin/perl -w
use strict;
my $file_path = "F:\\Thesis Work\\SLAMS_IP.csv";
open(FILE, $file_path) or die("Unable to open a file, $!");
open(HANDLE, ">file.txt") or die("Unable to open a file, $!");
my %ip_hash;
while(defined(my $lines = <FILE>))
{
    if($lines =~ m/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/)
    {
        # Increment the per-IP count directly; a single shared $count
        # variable would carry one IP's count over to the next IP seen
        $ip_hash{$1}++;
    }
}
foreach my $key(sort {$ip_hash{$b} <=> $ip_hash{$a}}keys %ip_hash)
{
print HANDLE "$key:$ip_hash{$key}\n";
}
close(HANDLE);
close(FILE);
+ Hash to find the number of logs per IP address
+ Use a min-heap to find the top 10 from the hash

- Code Monkey October 26, 2009
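The two steps above can be sketched in the shell, with awk's associative array playing the hash; sort -rn | head stands in for the min-heap (the shell has no heap), and the sample log is a hypothetical stand-in:

```shell
# Hypothetical sample log standing in for access_log
printf '1.1.1.1 a\n1.1.1.1 b\n2.2.2.2 c\n' > access_log

# Step 1: c[ip] is the hash of counts per IP.
# Step 2: emit "count ip" pairs and keep the ten largest.
awk '{c[$1]++} END {for (ip in c) print c[ip], ip}' access_log | sort -rn | head -10
```

A real min-heap of size 10 would find the top ten in O(n log 10) without sorting all distinct IPs, which matters when the hash is large.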