Google Interview Question
Software Engineer / DevelopersCountry: India
Interview Type: Written Test
Well, here comes the English:
The idea of the problem is from the famous game: Tetris
Lets see how it works. Consider m = 5;
Given an array : 4 3 3 2 1 2 3 4 4 7, we treat each number as a piece in Tetris, which falls down from the ceil. Our task is to try to keep the same number stacked on the same column. Consider the moment that 7 is going to fall down. The snapshot of our game now is like :
7
4
4 3 2
4 3 2 1 _
Note that, the size of a row is designed as M, which is 5 here.
Just like Tetris, this game has a similar rule that:
if a row is full of numbers then it should be eliminated.
So when 7 goes down, it becomes:
4
4 3 2
4 3 2 1 7 //This row is full and to be eliminated
Then the bottom row is gone.
It becomes
4
4 3 2
As the numbers falls down, eventually, the game will end in a status that no row is full.
So we have at most M - 1 numbers left at the final stage.
But it is not over. We can easily prove that, if there is a solution, it must be in the number left at the final stage; but we can not guarantee all numbers left are the correct solution.
So we need to scan the array again, and count the numbers left to find the correct solutions and report their frequencies.
Now the idea is clear, but the implementation of this idea is more crucial. A bad implementation would result in a very bad complexity. In my code, I used std::map, which, more theoretically, is RB tree.
For each number, I inserted it into the map. I have the rule from the Tetris that the map size cannot reach M. If an existing number is inserted, the count of the number in map is increased by 1; if a new one comes, it is inserted with count = 1; if the map reaches the size limitation M, all the counts in map are forced to be decreased by 1. Of course, if it counts to 0, eliminate it! This is how it works like a Tetris.
You may ask why it is still O(NlogM) if I need to traverse the whole map to decrease each count by 1. That's because in total, every number is added once and erased once, so the amortized complexity is still O(N) for this part. So the total is still O(NlogM).
The remaining part of this algo is trivial then. I believe everyone here can figure it out.
Finally, btw, the variable "total" in my code plays no role but was for debugging, feel free to delete it: )
True, you nailed it, great job. while going through other solutions i felt that yours is a generalization of majority voting algorithm (correct me if thats wrong).
However, on a curious note, you could have used 2 arrays, for std::map, and would have avoid that logM factor. (i know thats constant anyway). Thanks for posting great solution.
@Fentoyal
Superb Solution..I was thinking one possible tweak though..U scan for each element left in the tetris to check if its the desired answer..For this we can do this
1) Take a variable count.When ever u elimiate one row increment count by 1.
2) Now say 3 rows were deleted(count =3) and size is 15 and we need to find which element is present say 15/3 (N=15,M=3)=5 times . So we know each element has appeared 3 times so far (value of count) . Now scan the remaining elements in the map and check if any one's count is >=2. Thats the answer. So instead of scanning the whole array again for each remaining element we just scan the remaining elements.
What if there are N elements in input and M distinct numbers and each occurs exactly n/m times. For example:
1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10
The question is to find all the elements that occur exactly 2 times.
Also you are using a hashtable, complexity of access is O(1). We are scanning the hashtable to decrease count when a row is full. This can happen N/M times in the worst case. So the complexity comes out to be O(n) + O(m*n/m) = O(n)
What if there are N elements in input and M distinct numbers and each occurs exactly n/m times. For example:
1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10
The question is to find all the elements that occur exactly 2 times.
Also you are using a hashtable, complexity of access is O(1). We are scanning the hashtable to decrease count when a row is full. This can happen N/M times in the worst case. So the complexity comes out to be O(n) + O(m*n/m) = O(n)
Obviously, this algorithm can only find the element that occurs > N/M times not >=. But it is exactly what the original question asks.
How would you know at the start that there are M different numbers in the array. In my opinion all we can know about that at the start that if there will be at one number with the count more than n/3 then there can be at most n - n/3 distinct numbers in the array. And if in the case there are where there are more than one number in the array with the count more than n/3 then all we know that there can be at most 2 of these numbers and the maximum number of distinct numbers in the list are n - 2(n/3) distinct numbers in the array....
Hi, Abdul
Please make sure you fully understand the algorithm since I did explain a lot. I do not need the assumption that there are M different numbers for this algorithm. The task is only to find the ones that appear more than N/M times.
like:
1 2 3 4 4 4 4 5 5 5 5 6.
N = 12
M = 4
There are 6 different numbers.
The goal is to find ones occurring > 12/4 = 3 times.
4, occurs 4 times.
5, occurs 4 times.
How about this case:
1 1 1 2 2 2 3 3 3
Size: 9
M: 2
Your algorithm seems to output only 3 in this case. Please correct if I'm wrong here.
So how do you explain that in array
1,2,3,4,5,6,7,8,9,11,11,11 and M=3
we still will get 11 as our answer, while we have 3 repetitions, not 4.
But seems that it said no excessive space can be used, and no hash. Using a map violated both requirements...
But seems that it said no excessive space can be used, and no hash. Using a map violated both requirements...
bellow is the java implementation
public ArrayList<Integer> findRepetedElements(int[] arr, int M) {
ArrayList<Integer> result = new ArrayList<Integer>();
TreeMap<Integer, Integer> tm = new TreeMap<Integer, Integer>();
int minCount = (int) Math.ceil(((double) arr.length) / M);
for (int i = 0; i < arr.length; i++)
add(tm, arr[i], M);
// find count of elements left in the tree
Iterator<Entry<Integer, Integer>> iter = tm.entrySet().iterator();
Entry<Integer, Integer> e;
int value;
int currCount;
while (iter.hasNext()) {
currCount = 0;
e = iter.next();
value = e.getValue();
for (int i = 0; i < arr.length; i++)
if (arr[i] == value)
currCount++;
if (currCount >= minCount)
result.add(value);
}
return result;
}
private void add(TreeMap<Integer, Integer> tm, Integer key, int maxSize) {
Integer mpValue = tm.get(key);
if (mpValue == null)
tm.put(key, 1);
else
tm.put(key, mpValue + 1);
if (tm.size() == maxSize)
clearRow(tm);
}
private void clearRow(TreeMap<Integer, Integer> tm) {
Iterator<Entry<Integer, Integer>> iter = tm.entrySet().iterator();
Entry<Integer, Integer> e;
int value;
while (iter.hasNext()) {
e = iter.next();
value = e.getValue();
if (value == 1)
iter.remove();
else
e.setValue(value - 1);
}
}
U use the std:map to store the count..which seems to violate the requirement of no excessive storage
Hi, Anonymous on February 26, 2012 ,
Plz try your input 1,2,3,4,5,6,7,8,9,11,11,11 to this algorithm. It outputs nothing, because there is no solution.
Hi, liangzhongliang on March 01, 2012 & Anonymous on April 19, 2012
As I have explained in the very beginning, the algorithm requires O(M) space. Since M for this original question is 3. so it requires O(3) space. It is technically O(1) space. In my implementation, the map has a size limit : M. So it cannot ever grow larger than 3. That is, the map i used has only 3 elements! This is not a excessive storage. It consumes just like you did in every program: int i, j, k; <--- this uses O(3) too, and people do not call constant space storage excessive storage because it is not.
@bashrc, no, I am not. But I have several friends who got its offer. Someone did go there and someone rejected the offer. I have not even interviewed with MS. Why are you asking?
The solution is good. But its space complexity is O(n) in worst case(when no element is repeated).
In this case what if you go for solution using hash? (hash function is going to be critical)
for example 111123456789 or 123123123 , this would fail as there would be no element left in any row finaly, but answer is sure, here m = 3 .
Misra-Gries Algorithm
apps.topcoder.com/forums/;jsessionid=182CD38E8BD7E48B3C15FF2B80776811?module=Thread&threadID=729518&start=0&mc=12#1463747
@Paidi @Navaneet, @etc,
After read these comments and suspects on this algorithm, I'd like to highlight some points to make it clear. So anyone questioning the complexity or correctness could firstly double check the algorithm with these points:
0. It is meant to find those that occur more than N/M times
1. not those that duplicate more than "M" times
2. Neither are those that appear "exactly" N/M times. It can only find those that repeat "more than" N/M times, which is exactly the interview question asked. And, in fact, to report those that appear "exactly" N/M times will turn this problem to a completely different one.
3. The space complexity is O(M), in any case. There is no such "worst case" thing, because for any input, I always allocate a M sized BST structure(in my implementation, it was std::map).
4. We do call O(M) space complexity as "constant storage" or "no excessive storage required"if M = O(1) i.e. M is a constant number, which holds in this interview question where M = 3, always.
Given that so many people are interesting this solution, I leave my email here:
the.easiest.name.to.remember@gmail.com
if you have any questions, you may contact me directly
@fentoyal,
Very good solution. Anyone up for C# implementation? Or I will try to implement this in C# this weekend.
Hey Fentoyal, I think your explanation is good, especially when equating to Tetris. But, I have difficulty believing that this is O(1) space or that it will find the correct answer.
Consider that you have N = 300 and M = 3. And the first 101 elements are 1 and the rest are 2 and 3. In this case, the first column goes to size 101 as there are no complete rows to delete. So, the space complexity is N/3 or O(N). Or if the numbers 1 and 2 keep repeating till 298. Then your space complexity becomes O(2n).
You can certainly make this better is if you keep the count of the numbers rather than keep a list of them in your columns, as Ankur suggested. Then this is essentially a map of the positions and their count.
Am I missing something here?
Consider the following:
N = 10
M = 4 (which is greater that 10/3)
Sequence = 1 2 3 5 1 4 1 4 1 4
Answer is 1 (as it is the only number that repeats 4 times)
Based on my understanding, which could be wrong, the states will be:
1
1 2
1 2 3
1 2 3 5
clear
1 4
1 4
1 4
And you are left with 2 elements, 1 and 4. So, how do you distinguish which one is the right answer?
PS: I really want to like and understand your solution and I hope I missed something in understanding your solution. I hate to be the messenger of bad news here :)
Yes, you did misunderstood it.
As I said " but the implementation of this idea is more crucial".
If you implemented it that way, you'll end up in a O(N) solution.
Check my third posts on how to implement it.
I did a counting.
So, for your first column, why should I bother storing all 101 same numbers? I just need to store (1, 101). That's enough.
So I need only M space, each representing a column. And each space only storing a single integer representing how high the column is.
I hope it is clear now.
Again, because I can't check this thread all the time, if anyone got any new questions, write to my email:
.
the.easiest.name.to.remember@gmail.com
There can be atmost 2 elements that appear more than n/3 times.
Suppose given array is A. Use a 2d array of size 2 ( or u can use 4 variables ) -> RES[2][2] which stores the two possible values in R[0][0] and R[1][0]
RES[0][1] -> count of RES[0][0] in array
RES[1][1] -> count of RES[1][0] in array
The trick is if we keep cancelling 3 distinct elements then the remaining one/two final values will be the possible answer ( like for majority element, we keep cancelling two distinct elements )
for each A[i] in A:
if RES[0][1] == 0:
RES[0][0] = A[i]
RES[0][1] = 1
else if RES[0][0] == A[i]:
RES[0][1]++
else if RES[1][1] == 0:
RES[1][0] = A[i]
RES[1][1] = 1
else if RES[1][0] == A[i]:
RES[1][1]++
else
RES[0][1]--
RES[1][1]--
count1 = count2 = 0
for each A[i] in A:
if A[i] == RES[0][0]:
count1++
else if A[i] == RES[1][0]:
count2++
if count1 > size(A)/3:
RES[0][0] is first desired no.
if count2 > size(A)/3:
RES[1][0] is second desired no.
Complexity : O(n)
Hey first thing which I wanna tell you is that there can b atmost 3 elements. Secondly you are not supposed to use additional memory. :)
@CodeCracker : The idea is correct... and there will be only 2 such elements..
lets say the number of elements is 99.. Then more by than by n/3 means, (n/3 + 1), which is 34...
So there 34 + 34 = 68.. the count of the third one is going to be 31.
So that leaves the possibility of just 2 numbers.
And there is no additional memory used here.
@erberuz The idea looks fine based on the majority element algo,had a confusion though
For sequence like 12312
When we encounter 3, are we not reducing count1 to 0 and count2 to 0 as well making a[maximum1]=3 and a[maximum2]=3 as well.
Are you sure its working perfectly, an example would help.Thanks
Yes, this question was posted before and that solution is correct.
This is a variation of linear time majority voting algo.
The key idea behind this method is that such n/3 majority numbers
will still remain n/3 majority if you remove three different numbers
from the set if exist. This algo keeps doing so until you have only
less than three elements left. And you need to check if any of them
are really n/3 majority numbers.
I think it should iterate twice to find two majority elements. but still achievable in O(n).
I think the idea is correct, but the code should be modified. I use the same idea, but use 3 parameters to remember the number and 3 parameters for counting. Here is my solution:
for each A[i] in A:
if A[i] == Num[0]:
Count[0]+=2;
Count[1]--;
Count[2]--;
else if A[i] == Num[1]:
Count[0]--;
Count[1]+=2;
Count[2]--;
else if A[i] == Num[2]:
Count[0]--;
Count[1]--;
Count[2]+=2;
else
Count[0]--;
Count[1]--;
Count[2]--;
if Count[0] == 0:
Num[0] = A[i];
Count[0] = 2;
else if Count[1] == 0:
Num[1] = A[i];
Count[1] = 2;
else if Count[2] == 0:
Num[2] = A[i];
Count[2] = 2;
As far as number X is more than n/3, it will has count as m*2 - (n-m) > 0, m is the appear time of X. So X will stay in the parameter when the algorithm finished.
This should works for the scenario that when there is 2 majority elements.
Made little change, it should be fine now.
@GS, only one more array iteration will do it ( store count of RES[0][0] and RES[1][0] )
Have you even tried the solution. Does not work and does not make sense. The idea is not majority voting. The idea is more than that.
PS: I am trying your solution for array sizes 6,7,8,9 and it keeps failing
Hi Cerberuz:
1. which 2 initial values should be filled in res[0][0] and res[1][0]?
2. Are they examining the same A[i] in each iteration?
3. As per Ashar's example 123123, if using your way, it will both have 2 as a result, while 1 will be missed.
Can you explain? Thanks
What if the array size 'n' is a multiple of 3? Would the maximum number of elements be repeated n/3 times is 2? I don't think so: n=3, A[a, b, c]. 3 elements. Please let me know.
@Cerberuz, I am really not sure how this will work for something like 1, 1, 2, 2, 3, 3, 1, 1
Sorry but in the question is not clear if the numbers are only 1, 2 and 3 or can be any other number (or comparable element). Es: [2,3,4,1,7,5,3,4,5,8,8,8,3,4,5]. In this case the solution has no sense. What's the difference of using 3 pairs of variables or a Map with 3 possible keys??? The constraints is don't use hashing or more (eccessive) space.
#include <iostream>
#include <vector>
#include <cstdio>
using namespace std;
void find(vector<int> data)
{
int one, two;
int num1 = 0;
int num2 = 0;
vector<int>::iterator it;
for (it = data.begin(); it != data.end(); ++it)
{
if (num1 == 0)
{
one = *it;
num1 = 1;
}
else if (one == *it)
{
++num1;
}
else if (num2 == 0)
{
two = *it;
num2 = 1;
}
else if (two == *it)
{
++num2;
}
else
{
--num1;
--num2;
if (num1 == 0 && num2 != 0)
{
one = two;
num1 = num2;
num2 = 0;
}
}
}
if (num1 > 0)
{
num1 = 0;
for (it = data.begin(); it != data.end(); ++it)
{
if (*it == one)
{
++num1;
}
}
if (num1 > data.size() / 3)
{
cout << one << endl;
}
}
if (num2 > 0)
{
num2 = 0;
for (it = data.begin(); it != data.end(); ++it)
{
if (*it == two)
{
++num2;
}
}
if (num2 > data.size() / 3)
{
cout << two << endl;
}
}
}
int main()
{
int t;
vector<int> v;
/*while (scanf("%d", &t) != EOF)
{
v.push_back(t);
}*/
while (cin >> t)
{
v.push_back(t);
}
find(v);
return 0;
}
The idea of the algorithms seems to be the same as in the previous post (by nio, who simply provides link to the solution by fentoyal), and the code is quite clean, I like it. However, the complete lack of comments makes it unreadable, unless you are aware of the idea (which fentoyal calls 'tetris rules').
EDIT: deleted incorrect paragraph.
The last paragraph in the previous comment is wrong: counting decrements would not provide useful info. Oops.
1. Find the n * 1/3 and n * 2/3 largest element using QuickSelect. In order for an element to occur more than 1/3 of the time in the array, it must be one of these two elements.
2. Do a linear pass through the array to see if the array contains these two elements.
@Dr. Andrew: What you are saying is not applicable for the array: 1,1,1,3,4,5
The value of n is 6. Hence the n/3(or 2nd) largest element is 3 and 2n/3(or 4th) largest element is4. And neither of them is repeated more than n/3 times. Instead, 1 is repeated more than n/3 times!
Correct me if I am wrong.
@alex: I haven't implemented it myself yet, but the QuickSelect algorithm (see Wikipedia), as I understand it, would give you the nth smallest element in an unordered list. So in your list of 6, I'm saying it would pick the 2nd and 4th largest elements, counting duplicate elements, so the 2nd largest is 1 and the 4th largest is 3. Any list longer than 1/3 of the list has to touch one or both of those ranks.
@Dr. Andrew: If you count the duplicates, it might be valid, provided you take the ceiling of n/3 and 2n/3. Thanks!
Of course it is correct. The sequence is larger then n/3 of the size of the vector and it has to occupy the (n/3)th or (2n/3)th position in vector. This is the correct answer.
by definition, you can only have 2 values that are in the set for more than 1/3.
therefore, if sorted (which we wont do), the 2 values would have to be visible at n/3 and/or 2n/3 (and/or because 1 or 2 can be more than 1/3)
so selection/rank (should be expected O(n)) those 2 positions.
then iterate over the the entire array counting those that match those 2 positions.
Algorithm
similar as the linear majority voting algorithm, we maintains two numbers having top two votes
if it's the number having the top votes, increment the top votes by 1
if it's the number having the second top votes, increment the second votes by 1 and swap it with the top votes number if it has more votes
if the number having the top votes is not defined, make the number top votes with vote 1
if the number having the second top votes is not defined, make the number second top votes with vote 1
otherwise, decrement the votes of top two votes
when the votes become zero, the number of the top or the second top are undefined
Time: O(n)
Space: O(1)
ideone.com/8yNNIW
public void find(int[] a)
{
int first = Integer.MIN_VALUE; // the number with the top votes
int firstVote = 0; // the top votes
int second = Integer.MIN_VALUE; // the number with the second top votes
int secondVote = 0; // the second top votes
for(int i: a) {
if(firstVote > 0 && i == first) {
firstVote++;
} else if(secondVote > 0 && i == second) {
secondVote++;
if(secondVote > firstVote) {
int t = first;
first = second;
second = t;
t = firstVote;
firstVote = secondVote;
secondVote = t;
}
} else if(firstVote == 0) {
first = i;
firstVote++;
} else if(secondVote == 0) {
second = i;
secondVote++;
} else {
firstVote--;
secondVote--;
}
}
// confirm if the number with top 2 votes occurs more than 3/n times
int firstCount = 0;
int secondCount = 0;
//System.out.println(first + "," + second);
for(int i: a) {
if(firstVote > 0 && i == first) firstCount++;
if(secondVote > 0 && i == second) secondCount++;
}
if(firstCount > a.length / 3) System.out.println(first);
if(secondCount > a.length / 3) System.out.println(second);
}
Algorithm:
The idea of the problem is from the famous game: Tetris
Lets see how it works. Consider m = 5;
Given an array : 4 3 3 2 1 2 3 4 4 7, we treat each number as a piece in Tetris,
which falls down from the ceil. Our task is to try to keep the same number stacked on the same column.
Consider the moment that 7 is going to fall down. The snapshot of our game now is like :
7
4
4 3 2
4 3 2 1 _
Note that, the size of a row is designed as M, which is 5 here.
Just like Tetris, this game has a similar rule that:
if a row is full of numbers then it should be eliminated.
So when 7 goes down, it becomes:
4
4 3 2
4 3 2 1 7 //This row is full and to be eliminated
Then the bottom row is gone.
It becomes
4
4 3 2
As the numbers falls down, eventually, the game will end in a status that no row is full.
So we have at most M - 1 numbers left at the final stage.
But it is not over. We can easily prove that, if there is a solution, it must be in the number
left at the final stage; but we can not guarantee all numbers left are the correct solution.
So we need to scan the array again, and count the numbers left to find the correct solutions and report their frequencies.
Implementation:
struct node
{
int val;
int count;
};
Create following var for keeping track
struct node tracker[3];
for each node in input array arr
{
if(tracker[0].val == arr[i])
{
tracker[0].count++;
}
else if(tracker[1].val == arr[i])
{
tracker[1].count++;
}
else if(tracker[2].val != 0)
{
tracker[0].count--;
tracker[1].count--;
tracker[2].count--;
}
if(tracker[0].val == 0)
{
tracker[0].val = arr[i];
tracker[0].count++;
}
else if(tracker[1].val == 0)
{
tracker[1].val = arr[i];
tracker[1].count++;
}
else if(tracker[2].val == 0)
{
tracker[2].val = arr[i];
tracker[2].count++;
}
}// for end
// reset the counters
tracker[0].count = 0;
tracker[1].count = 0;
tracker[2].count = 0;
for each node in input array
{
if(tracker[0].val == arr[i])
tracker[0].count++;
else if(tracker[1].val == arr[i])
tracker[1].count++;
else if(tracker[2].val == arr[i])
tracker[2].count++;
} // for end
void max_num(int *A, int l){
int n1 = -1, n2 = -1, c1 = 0, c2 = 0, i;
for(i=0; i<l; i++){
if(-1 == n1){ n1 = A[i]; c1++; }
else if(-1 == n2){ n2 = A[i]; c2++; }
else if(A[i] == n1) {c1++;}
else if(A[i] == n2) {c2++;}
else{
c1--;
c2--;
if(c1 == 0) n1 = -1;
if(c2 == 0) n2 = -1;
}
}//for()
c1 = 0; c2 = 0;
for(i=0; i<l ; i++){
if(n1 == A[i]) c1++;
else if(n2 == A[i]) c2++;
}
if(c1>(l/3)) cout << n1;
if(c2>(l/3)) cout << n2;
}
I think the best solution uses a map. You could return the map or just dump out the freqs like this does.
template <class Ty>
void ElementCounter(std::vector<Ty> inArray)
{
// Map key, value
std::map<Ty,int> tempMap;
std::vector<Ty>::iterator vecIter;
// O(n) iteration
for (vecIter = inArray.begin(); vecIter != inArray.end(); ++vecIter)
{
std::pair<std::map<Ty,int>::iterator, bool> pr = tempMap.insert(std::pair<Ty, int>(*vecIter, 1));
if (pr.second == false)
{
// Already exists, increment
(pr.first)->second++;
}
}
// Print results
std::map<Ty,int>::iterator tempIter;
for (tempIter = tempMap.begin(); tempIter != tempMap.end(); ++tempIter)
{
if (tempIter->second >= (int)(inArray.size() / 3))
std::cout << "Item " << tempIter->first << " appears " << tempIter->second << " times." << std::endl;
}
}
vector<int> FindAtLeastOneThirdFreq(vector<int> sample)
{
vector<int> result;
if (sample.size() == 0)
{
return result;
}
int mostFrequent = -1;
int count = 0;
for (vector<int>::const_iterator it = sample.begin(); it != sample.end(); it++)
{
if (count == 0 || *it == mostFrequent)
{
++count;
mostFrequent = *it;
}
else
{
--count;
}
}
int secondFreq = -1;
count = 0;
for (vector<int>::const_iterator it = sample.begin(); it != sample.end(); it++)
{
if (*it != mostFrequent)
{
if (count == 0 || *it == secondFreq)
{
++count;
secondFreq = *it;
}
else
{
--count;
}
}
}
int mostFreqCount = 0;
int seondFreqCount = 0;
for (vector<int>::const_iterator it = sample.begin(); it != sample.end(); it++)
{
if (*it == mostFrequent)
{
++mostFreqCount;
}
else if (*it == secondFreq)
{
++seondFreqCount;
}
}
int threshold = sample.size();
if (threshold % 3 == 0)
{
threshold /= 3;
++threshold;
}
else
{
threshold = (threshold + 2) / 3;
}
if (mostFreqCount >= threshold)
{
result.push_back(mostFrequent);
if (seondFreqCount >= threshold)
{
result.push_back(secondFreq);
}
}
return result;
}
This works, when an element occurs more than n/2 times, and not n/3.
Consider this: 2 2 5 4 2 6 8
Here in the end, mostFrequent comes out to be: 6, and not 2.
I too believe that because if we keep track we need extra memory and can be done in linear time if not i.e., with out extra memory it cannot be done in linear time.
Probabilistic algos exist in constant space and linear time. Consider the following. Pick 1000 entries at random. Sort them and find any entries that occur at least (for instance) 200 times. It is now very unlikely that any entry that occurs more than n/3 has less than 200 entries in this representative sample of 1000. You will find at most 5 (O(1)) values that have at least 200 occurrences in the sample. Do a pass through the full array counting the number of occurrences of each of these values. Return the ones that occur at least n/3 times.
This is correct with incredibly high probability (for large n) and is linear time. But a deterministic algo...I definitely think it's possible, but I'm not sure right now.
Also, no one said "constant space". Only "no excessive space" was mentioned. Hashing is apparently "excessive" for purposes of this problem. But I think a log-space solution or something like that would be fine...
If we assume it's an array of integers then maybe we can accomplish this by counting bits.
Note that there can be at most 2 numbers that occur more than n/3 times. So we make 2 passes through the array.
Maintain a 32 element BIT-TRACKER array. Examine each number and for any 1 bits in the number, and increment the corresponding element int the BIT-TRACKER array.
So, for example, if we encounter 25 (11001), do BIT-TRACKER[0]++, BIT-TRACKER[3]++, BIT-TRACKER[4]++.
Then check all the bits that have occurred more than n/3 times. If those bits match the current number, then that number must occur more than n/3 times.
So now we have 1 number. So we repeat the process again but only consider the remaining numbers. As we go through the array we simply skip the number we found already. Call the remaining count m. So for the 2nd pass we look for the number that occurs more than m/2 times. There can only be 1 of these.
If it's strings instead of integers then we have to map the strings to integers and do the same thing. However, the integers (and hence the number of bits) could get really large. So maybe this doesn't make sense for strings.
And then of course this whole idea could be complete nonsense.
I'll try to write it out this weekend.
Later!
I don't think your algorithm works correctly. Assume we have 9 numbers in binary as below:
1. 1001100
2. 1010011
3. 1010011
4. 1010011
5. 1000011
6. 1111111
7. 1111111
8. 1111111
9. 1111111
According to your algorithm, after we check the 5th number, the bit tracker is like [5][0][3][1][1][4][4]. The 5th number has bits [1][0][0][0][0][1][1], all of those with 1 now have 4 counts in the bit tracker. However, this number only has one copy in the whole array.
A suggestion
First Pass: Pick each element one by one. When you pick the first element increase the counter by 2. Get the next element if it is the same element increase counter by 2 otherwise decrease by 1. Get the next element and so on. If the counter hits 0 set the current character as the comparison element and increment by 2. Keep the last comparison element.
Second Pass: Counter the number of occurences of the element found from first pass. If it is bigger than n/3 than store this as one of the elements.
Third pass: If the element found has occurred more than 2n/3-1 times dont need this other wise Repeat first pass with incrementing by one and ignoring the element found from first pass. And then repeat the second pass.
int findMostFrequentNumber(int *a, int size) {
int i;
int cand1; int count=0;
int cand2; int ncount=0;
for(i=0;i<size;i++) {
if(count==0) {
if(ncount!=0){
if(cand2!=a[i]) {
cand1=a[i];
count++;
} else {
ncount++;
}
} else {
cand1=a[i];
count++;
}
} else {
if(cand1==a[i]) {
count++;
} else {
if(ncount==0) {
cand2=a[i];
ncount++;
} else {
if(cand2==a[i])
ncount++;
else {
--count;
--ncount;
}
}
}
}
}
if(count!=0)
printf("candidate:%d\n",cand1);
if(ncount!=0)
printf("candidate:%d\n",cand2);
return cand1;
}
I thought of a solution , I can neither prove it right nor prove it wrong. The solution is on the similar lines as it were for the problem where you have to find a number occuring more than n/2 number of times.
lets say we take 4 numebrs at a time. each time shifting the window by one bit.
1st to 4th , 2nd to 5th, 3rd to 6th etc...
Now we see the majority in these four numbrs, if the majority is the same as last majority, we increase a variable frequency, if it is different thn we decrease the freq. If not majority do
nothing
//edge cases not counted, jsut the sudo code
for(i = 1 to n)
{
maj = getMajority(i,i+3);
if(maj==null)
//nothing
else if(maj==last_majority)
freq++
else frq --
if(freq< 0)
{
last_majority = maj;
freq = 1 ;
}
}
answer = last_majority
}
I think it is a variation of voting problem. If you add 3 when a number is encountered and subtract 1 if not and only change a number if count hits 0 then you will get your answer.
Lets prove it. If you have 4 numbers 1 2 3 then clearly this holds. For an arbitrary case if the count is x then it can only go to 0 if following 3x numbers are different. Hence if a number is more than n/3 times then it will yield a positive count.
I didn/t understand. could you pls take an example and give step by step explanation?
--> If you add 3 when a number is encountered and subtract 1 if not and only change a number if count hits 0 then you will get your answer.
What is this for> I dont think it'l help.
If you add 2, it doesn't work for
2 2 5 6 7 8 4 2
length = 8. a no shud occur more than 8/3 = 3 times.
if you check for the above input, last no. comes out to be 4 and not 2 as expected.
I think Ashish is right about :
If you add 3 when a number is encountered and subtract 1 if not and only change a number if count hits 0 then you will get your answer.
But the only thing that we have to modify in here is we'll have to traverse the array two times because there will be atmost 2 elements with more than n/3 occurence.
So in first attempt we'll find our first number and in second attempt we'll cross out the found number as dummy and will search for another number, and print both the numbers found.
Thanks.
I think this problem is simply an extention to the class majority voting algorithm.
Instead of incrementing the counter by 1, you should increment by 2.
Code in Python:
def FindThird(A):
Num = A[1]
count = 1
for i <- 2 to n:
if A[i] == Num
count += 2
else if count > 0
count -= 1
else:
Num = A[i]
count = 1
return Num
Here is the algorithm. Maintain up to two elements and a positive counter for each. Scan through the array. If the next element x is the same as either stored element, increment the counter for that stored element. Otherwise, if there are fewer than two stored elements, store x and set its counter to 1. In the third case, if there are two stored elements and x is different from both, decrement the counter for each stored element, unstoring it if the counter would become zero (since we require the counter to be positive).
After the scan, any special element will be one of the stored elements. Then you just need to verify them.
I'm not sure whether this would work.
Type candidate1,candidate2;
void Find(Type* ID,int N)
{
int nTimes1 = 0 ;
int nTimes2 = 0 ;
int i;
for( i = 0; i < N; i++)
{
if (nTimes1 == 0)
{
candidate1 = ID[i], nTimes1 = 1;
}
else
{
if (candidate1 == ID[i])
{
nTimes1++;
}
else
{
if (nTimes2 == 0)
{
candidate2 = ID[i], nTimes2 = 1;
}
else
{
if (candidate2 == ID[i])
{
nTimes2++;
}
else
{
nTimes1--;
nTimes2--;
}
}
}
}
}
}
int main()
{
int *arr,count[128];
int n,i,size,u;
float notzero;
printf(" enter the value of n \n");
scanf("\n%d",&n);
arr=(int *)malloc(sizeof(int)*n);
printf(" enter values in array \n");
for(i=0;i<n;i++)
scanf("\n %d",&arr[i]);
if(n<0)
exit(0);
for(i=0;i<128;i++)
{
*(count+i)=0;
}
printf("\n values is ->> ");
for(i=0;i<n;i++)
{
u=*(arr+i);
*(count+u)= *(count+u)+1;
size=n/3;
if(*(count+u)>=size)
printf("\n %d ",*(arr+i));
}
getch();
}
for(int i = 0; i < sizeof(arr) - 2; i++)
{
if(arr[i] == arr[i+1] || arr[i] == arr[i+2])
{
printf("%d\n", arr[i]);
}
}
for(;i<sizeof(arr);i++)
{
if(arr[i] == arr[(i+1)%sizeof(arr)] || arr[i] == arr[(i+2)%sizeof(arr)]
{
printf("%d\n",arr[i]);
}
}
For a given list of element of length n, Take an array a, now iterate the list and for every element i , increment the value a[i]++.
At the end you have array a with only count of the respective no (I.e. Index is the actual number and value is there count)
Please refer count sort
1.use radix sort with 0(n) complexity ,and no use of extra space ..
1.1 find lowest unit digits of all the elements,and with the help of lowest unit digit swap elements ,and repeat again for 2nd lowest digit and at the find a sorted array list .
2. after sorting like elements 1,1,1,1,2,2,3,4,5 ,(n=9)
size=n/3 and in this size =3
3. check 1st and n/3th+1 element if both are same so print this element.
and over all complexity is 0(n).
a) most three elements can appear N/3 times in an array of size N.
b) Take a structure consisting of two integers to store an element value and its count in the array. Let there be 4 instances of this structure (or an array of size 4) initialized to 0.
e.g. struct { int element, int count } counter[4];
c) Iterate over the array and place each element into the counter array at index having the smallest count if the element does not exist in the array. If the element already exist, increment its count.
d) At the end of iteration, print out the elements with count >= N/3
1. Create 3 equal partition. If the 3 partitions were sorted , then one of the partition would contain the integer which occurs more than n/3.
2. On each of the partition, do the process of figuring out the minimum integer out the integers in the partition. One of these 3 numbers is a potential candidate we are looking for.
3. Create 3 separate variables, which count the number of times each of these candidate occurs in the entire array and figure out the number for which count > n/3
Heapify - O(n)
Remove max and store it... while(removeMax.equals(max) increment counter, if counter > n/3 return max.
Remove max takes at most log(n) so SUM(logn + log(n-1) + ... + 2 + 1) = O(n)
How about this
1.USE BST where duplication is allowed.
2.In order traverse of Tree.(gives sorted array such as 111 2 344455)
3.traverse the array and if count goes to 3 or above print the number.
your recommendation is welcome
We can not have more than 2 such elements. Here we start with first two unique elements and keep track of their count. we update the values when required. Implementation is self intuitive. Its a O(n) time complex algorithm and with O(1) i.e. constant memory requirement
package google;
public class FindTheElementWhichIsRepeatedMoreThanNBy3Times {
/**
* Problem Statement :- Find the element which is repeated for more than n/3 times in a given array of ints
*/
public static void main(String[] args) {
int[] input = {1, 2, 3, 2, 2, 3, 3};
int probableValue1 = input[0];
int probableVaulue1Count = 1;
int i = 1;
if(i == input.length){
System.out.println("Element : " + probableValue1 + " Count Of The Element : " + probableVaulue1Count);
return;
}
while(probableValue1 == input[i]){
i++;
probableVaulue1Count++;
continue;
}
int probableVlaue2 = input[i];
int probableValue2Count = 1;
boolean probableValue1Turn = true;
for(; i < input.length; i++){
if(input[i] == probableValue1){
probableVaulue1Count++;
}else if(input[i] == probableVlaue2){
probableValue2Count++;
}else{
if(probableValue1Turn){
probableVaulue1Count--;
if(probableVaulue1Count == 0){
probableValue1 = input[i];
probableVaulue1Count++;
probableValue1Turn = !probableValue1Turn;
}
}else{
probableValue2Count--;
if(probableValue2Count == 0){
probableVlaue2 = input[i];
probableValue2Count++;
probableValue1Turn = !probableValue1Turn;
}
}
}
}
probableVaulue1Count = 0;
probableValue2Count = 0;
for(i = 0; i< input.length; i++){
if(probableValue1 == input[i]){
probableVaulue1Count++;
}else if(probableVlaue2 == input[i]){
probableValue2Count++;
}
}
if(probableVaulue1Count > input.length/3){
System.out.println("Element : " + probableValue1 + " Count Of The Element : " + probableVaulue1Count);
}
if(probableValue2Count > input.length/3){
System.out.println("Element : " + probableVlaue2 + " Count Of The Element : " + probableValue2Count);
}
if(!(probableVaulue1Count > input.length/3) && !(probableValue2Count > input.length/3)){
System.out.println("No Such Element found");
}
}
}
Another approach would be to construct a trie with all the numbers. At each node of the trie, including non-leaf nodes, store the count of that particular number. This will need a constant space as the height of the trie is the number of digits in the largest number. Moreover, traversing a trie of constant height is linear - O(1).
When trying to find the numbers that occur at least N/M times among N numbers, this implies that there can be, at the most, N/3 - 1 such numbers. For instance, there can be only 3 numbers that repeat more than 4 times in a list of N numbers.
Along with the trie, have M separate variables that keeps track of the numbers with the highest count so far. In this case, have 2 variables that keep track of the top 2 counts.
For example:
Consider N = 10 => M = 4 (>10/3)
Sequence = 24 56 38 38 51 56 56 56 80 8
Then the trie will be
----2(0) 3(0) 5(0) 8(1)
/ / / \ \
/ / / \ \
4(1) 38(2) 51(1) 56(4) 80(1)
And max count is 4 for 56. The second max is 2, which is < 10/3 and hence, can be ignored.
using selection algorithm
1. quickselect (amorized cost is O(n))
or 2. median of median algorithmO(n))
pesudo code:
input: Array
Element elements=new Element[3];
elements[0]=selectKthElement(Array,0);
elements[1]=selectKthElement(Array,Array.size/3);
elements[2]=selectKthElement(Array,Array.size/3*2);
// each Element in elements is a candidate for appearing more than n/3 times.
for(int i=0;i<elements.size;i++)
{
if(appearMoreThanNOver3Times(elements[i]))
print(elements[i]);
}
1) Sort the array using quicksort - O(nlogn)
2) Iterate through the array once
3) if (a[i]=a[i-1] ) then cnt ++ ;
4) if (a[i]!=a[i-1] ) then if cnt > a.length/3 then print a[i-1]; // if cnt is more than third of the array, then print it as one of the elemnts that is more than 1/3rd
5) Reset cnt = 1;
6) go to step 3
We do not need std::map for the original question. Just six int's shall be enough. See the code below.
#include <stdio.h>
main() {
int elements[2];
int counters[2];
int sum[2];
int a[] = {5,6,7,8,10,10,10,10,10,10,4,4,4,4,4,1,1};
int size = sizeof(a)/sizeof(int);
int i;
counters[0] = counters[1] = sum[0] = sum[1] = 0;
for (i = 0; i < size; ++i) {
if (counters[0] == 0) {
elements[0] = a[i];
++counters[0];
sum[0] = 1;
} else if (counters[1] == 0) {
elements[1] = a[i];
++counters[1];
sum[1] = 1;
} else if (elements[0] == a[i]) {
++counters[0];
++sum[0];
} else if (elements[1] == a[i]) {
++counters[1];
++sum[1];
} else {
--counters[0];
--counters[1];
}
//printf("%d %d(%d) %d(%d)\n", a[i], elements[0], counters[0], elements[1], counters[1]);
}
if (counters[0] > 0 && sum[0] > size/3)
printf("%d\n", elements[0]);
if (counters[1] > 0 && sum[1] > size/3)
printf("%d\n", elements[1]);
return 0;
}
Use the quicksort partition method (randomized) to find the n/3 largest and 2n/3 largest element. Then partition the array based on 2n/3 element. If all the elements after index 2n/3 are same then you have found the number. If not then partition the first 2n/3 elements based on n/3 largest element. Check whether all elements in first partition is same or 2nd parition is same. If one of them is same then we have found it. This is O(n) algo with no extra space.
hi guys ,
i suggest one could sort the array using merge sort which takes O(nlogn) [worst-case].
then scan the sorted array for all elements that appear more than n/3 times , this would take O(n)
thus we achieve what we want in O(nlogn) + O(n).
let me know if this solution is efficient or not ...
thank you..
It's not terribly inefficient. Still, I imagine we're trying to do better than O(NlogN) here because it's just too easy to do what you did.
What I mean is the problem is rendered uninteresting if we admit this kind of O(NlogN) solution. However, in a practical setting it wouldn't necessarily be a bad idea.
yes ,
and i think this algo is risk free..
even if all the n elements are distinct(no duplicates) this algo will give correct answer.
and one improvement while scanning the sorted array can be : scanning (n-((n/3)-1)) elements only .
suppose if there are 20 elements ; n=20 and n/3 = 6 then scanning only till 15 elements , if in case 14th and 15th element are same then we can scan further elements (16th 17th ...) in the hope that elements 14-20 will be the same and can appear in our answer. if we find any element with different values in between 16 - 20 we stop processing further elements.
saving could be large for larger values of n
it can be done like this..array size Max 9
lets say elements are 123 262 482
we have to take array[MAX-1] extra space
scan the the given no
1st no i got 1 so a[1] 1
2nd no i got 2 so a[2] 2
after scanning all the no we end up getting
a[1] 1 a[2] 4 a[3] 1 a[4] 1 a[5] 0 a[6] 1 a[7] 0 a[8] 1 a[9] 0
print that index in which value is > n/3 (4 in this case)
so ans is 2
void printMajorElements(int[] a) {
int e1, c1, e2, c2;
foreach (var x in a) {
if (c1 == 0) {
e1 = x;
c1++;
continue;
}
if (c2 == 0 && x != e1) {
e2 = x;
c2++;
continue;
}
if (x == e1) {
c1++; continue;
}
if (x == e2) {
c2++; continue;
}
c1--; c2--;
}
// second pass to verify
int cA, cB;
foreach (int x in a) {
if (x == e1) cA++;
if (x == e2) cB++;
}
if (cA > a.length / 3)
print(e1);
if (cB > a.length / 3)
print(e2);
}
Please help me to find counter example for this solution:
import random
import time
import collections
import sys
def solve(A):
INT_MAX = sys.maxint
MAX_TABLE = 3
HASHTABLE = collections.defaultdict(int)
N = len(A)
min_element = None
min_count = INT_MAX
for i in range(0, N):
if len(HASHTABLE) >= MAX_TABLE:
break
HASHTABLE[A[i]] += 1
for i in range(i, N):
if A[i] in HASHTABLE:
HASHTABLE[A[i]]+=1
if min_element == A[i]:
min_count += 1
elif len(HASHTABLE) < MAX_TABLE:
HASHTABLE[A[i]] = 1
if min_count != 1:
min_count = 1
min_element = A[i]
else:
# make some room for new element
# first, find the least frequent
if min_count == 1:
else:
min_element = None
min_count = INT_MAX
for element,count in HASHTABLE.iteritems():
if count < min_count:
min_element, min_count = element, count
HASHTABLE[min_element] -= 1
if HASHTABLE[min_element] <= 1:
del HASHTABLE[min_element]
HASHTABLE[A[i]] = 1
#print HASHTABLE, N, N/3
ret = {}
for element, count in HASHTABLE.iteritems():
ret[element] = "%.2f%%" % (float(count)/ N)
return ret
def main(N):
N3 = N / 3 + 1
A = [1] * N3
#A.extend([2] * N3)
#A.extend([3] * (N - len(A))) # int(N3 * 0.5)
L = len(A)
garbage_size = N - L
A.extend([random.randint(3,4) for i in range(garbage_size)])
S = solve(A)
print S
random.shuffle(A)
S = solve(A)
print S
if __name__=='__main__':
MAX_TESTS = 6 if len(sys.argv) == 1 else int(sys.argv[1])
SCALE = 10e6
N = 10
for t in range(0, MAX_TESTS):
begin = time.clock()
main(N)
end = time.clock()
T = end - begin
print "test#%d: array_size: %d speed factor:%f" % (t, N, SCALE * (end - begin)/N )
N *= 10
I understood how it works but still don't understand why it works. Can anyone please help?
Why it works - this is not a trivial question. In his original post, near the bottom, @fentoyal says: "We can easily prove that, if there is a solution, it must be in the number left at the final stage; but we can not guarantee all numbers left are the correct solution. "
Well, maybe not *easily*, but this thing requires proof, it is not obvious. The idea of the proof would be, that every time you remove 3 different numbers. If some number occurs more than 1/3 of time, it should survive all the removals. The less frequent numbers not necessarily would survive it. (But if a number occurs exactly 1/3 of the times, it may not survive either, BTW).
I think, one can formulate an accurate proof along these lines.
Why it works - this is not a trivial question. In his original post, near the bottom, @fentoyal says: "We can easily prove that, if there is a solution, it must be in the number left at the final stage; but we can not guarantee all numbers left are the correct solution. "
Well, maybe not *easily*, but this thing requires proof, it is not obvious. The idea of the proof would be, that every time you remove 3 different numbers. If some number occurs more than 1/3 of time, it should survive all the removals. The less frequent numbers not necessarily would survive it. (But if a number occurs exactly 1/3 of the times, it may not survive either, BTW). I think, one can formulate an accurate proof along these lines.
Can be done using an Augmented AVL tree.
1. create an AVL tree with an additional field as count;
2. Treat numbers in an array as stream of intergers
3. start creating the AVL tree , and keep on updating the count field for each node.
4. Once the array is complete, do a traversal of tree and print nodes whose count > N/3
I have a correct solution to it. I am gonna post a small piece of code. You need a compiler that support C++ 11 to run the code.
But don't worry if you don't have such one. I know that most of people would prefer English to code. I will explain the idea in English afterward, but, excuse me for I am not a native English speaker.
The algorithm here is actually not designed dedicatedly to solve this question but to handle a more general case:
Given an array of N numbers, finds all the elements that appear more than N/M times and report the their frequencies.
The time complexity is O(2*N*logM) and space complexity is O(M)
For this question, M = 3, so the time is O(2log3 N) = O(N), space is O(3) = O(1);
- fentoyal December 06, 2011