Google Interview Question
Software Engineer / Developer
Country: United States
Interview Type: In-Person
I was also thinking of using the sum of squares.
A plain sum has a high probability of colliding for different sets of the same length; the sum of squares collides far less often ;)
But there is still a chance that two different sets give the same result. So we can do a trick here (a cross check)..
NB: Since it's said that the complexity should be less than log(N), I think sorting should not be used, as sorting each line adds the typical log-factor cost.
We can create a data structure for it:

// Represents each line
class FileLine {
    // supposed to be unique for a set of the same length
    long code;            // = sum of squares of all numbers
    Set<Integer> numbers; // the set of numbers
    int length;           // = numbers.size()
}

// As the system reads each line of the file, this set gets populated with new
// entries; hashCode is based on the assigned code (sum of squares) and length.
Set<FileLine> soFarReadLines = ...;
1. Read the file line by line and populate the soFarReadLines set.
2. When the next line is read, calculate its code (sum of squares) and look it up in the set. If a match is found (same code and same length), we need to cross-check whether it is really the same set by comparing elements one by one. If no match is found, the line is unique. If a match is found but the sets differ, the line is also unique. This is the trick that handles the case where the same code comes from a different set of the same length.
3. If the line is a duplicate, there is no need to add it to soFarReadLines.
I think this is a better approach, since it avoids sorting each line.
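The cross-check approach above can be sketched in Python; the function name and structure are my own, not from the post. The (sum of squares, length) pair serves as a cheap hash key, and on a hit the actual sets are compared element by element to rule out collisions.

```python
# Sketch of the sum-of-squares idea (illustrative, not the poster's exact code).
def unique_lines_by_squares(lines):
    # (sum of squares, length) -> list of distinct sets sharing that code
    seen = {}
    out = []
    for line in lines:
        nums = frozenset(int(tok) for tok in line.split())
        code = (sum(n * n for n in nums), len(nums))
        bucket = seen.setdefault(code, [])
        if nums in bucket:       # cross check: same code AND same set -> duplicate
            continue
        bucket.append(nums)      # same code but a different set is still unique
        out.append(line)
    return out
```

For example, {1, 8} and {4, 7} both have sum of squares 65 and length 2, so they land in the same bucket, but the element-by-element cross check keeps them both.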
First, create a large table of prime numbers {P1, P2, P3, ... Pn} (where n is larger than the largest number you will encounter).
Second, when you encounter a line with the numbers {a b c d}, multiply together the primes Pa, Pb, Pc and Pd, and store the product in a hashtable.
When you encounter another line with the numbers {b d c a} and multiply the primes together, you will find that the entry already exists in the hashtable, in constant time.
@aka The point is just that a number formed by a multiplication of primes cannot be composed from any other set of primes (unique factorization). So if you multiply together a bunch of prime numbers, it does not matter in which order you multiplied them; you always get the same number. Since the factors are prime, you cannot get that number in any other way, through any other set of primes.
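A minimal sketch of the prime-signature idea, assuming the numbers on each line are non-negative and small enough to index into a precomputed prime table (the names and the table size are illustrative). Python integers are arbitrary precision, so the product cannot overflow.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

# Assumption: every number on a line is a valid index into this table.
PRIMES = primes_up_to(10000)

def prime_signature(line):
    sig = 1
    for tok in line.split():
        sig *= PRIMES[int(tok)]  # unique factorization makes the key order-independent
    return sig
```

Because multiplication is commutative and factorization is unique, "3 2 6 1 4" and "1 2 3 4 6" produce the same signature, while any different multiset of numbers produces a different one.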
Maintain a hashmap, where the key is the sorted line.
e.g. if a line is "3 2 6 1 4", it will be treated as "1 2 3 4 6" in the hashtable.
Now, for each line:
1. sort it
2. lookup in hashmap
3. if it is not present, then enter it in the hashmap and output it
4. if it is present, then do nothing and continue to the next line.
Python Code:
Tried using the same logic with the sum, as well as storing every line as a sorted array (so it becomes an n x m array).
f is the path to the file that the code opens.
def printunique_sum(f):
    temp = set()  # sums seen so far (note: plain sums can collide)
    with open(f, 'r') as fp:
        for line in fp:
            x = sum(int(tok) for tok in line.split())
            if x not in temp:
                print(line, end='')
                temp.add(x)

def printunique_sort(f):
    temp = set()  # sorted tuples seen so far (tuples are hashable)
    with open(f, 'r') as fp:
        for line in fp:
            x = tuple(sorted(int(tok) for tok in line.split()))
            if x not in temp:
                print(line, end='')
                temp.add(x)
Two lines are equal if the sum and the product of their numbers both match.
Scan the file line by line; for every line compute and store the (sum, product) pair. If the pair has not been seen before, print the line; otherwise move on to the next line.
Isn't the complexity here O(n), n being the number of lines in the file? But the OP says the solution should be better than O(log(n)), although I doubt we can do better than O(log(n)).
Can you know the data beforehand, without even parsing it?
If not, then it is not possible to give a solution in o(log n)..
You should also compare the lengths of the lists.
In addition, this algorithm would not work if the elements can be 0 or negative. Example:
[1 0 1]
[2 0 0]
[-1 0 -1]
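The collision above can be checked directly; `sum_product_key` is just an illustrative helper, not from the thread. The first two lines share the pair (2, 0), so a (sum, product) key cannot tell them apart.

```python
from math import prod  # Python 3.8+

def sum_product_key(line):
    nums = [int(tok) for tok in line.split()]
    return (sum(nums), prod(nums))

# [1 0 1] and [2 0 0] are different multisets but yield the same key:
# any zero forces the product to 0, leaving only the sum to discriminate.
```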
I guess that will also work.
I am not able to find any example where it fails.
The only thing is that squaring and adding may be a more costly operation than taking the product.
What do you mean by constant time?
- gevorgk July 15, 2013
Just to read a file with N lines and K elements in each line, you have to do O(N*K) operations. Looks like you didn't understand the problem correctly, dude.
Big minus for this.