Google Interview Question
Software Engineer / DevelopersCountry: United States
I don't understand why can't we simply have two variables for min,max an update them every time we change window boundary?
@gaurav101: How would you update the min / max efficiently with your proposal? The trouble is that we need to keep the min / max updated as elements drop off the left end of the sliding window.
Hi everyone.
First of all, I should point out that if we want to know all those pairs, we cannot do better that n-squared. Consider an array of all 1's. For this array we have to print out all pairs as the answer. So, printing alone would take quadratic amount of time. This said, from now on, I am assuming that we are only interested in the number of those pairs.
As the OP has said, we can solve this problem in O(n). I will give a hint, and start coding myself. However, let's analyze and see an O(nlogn) solution, which itself leads to the O(n) solution.
The algorithm that I am going to describe, utilizes a two-pointer approach. That is, two pointers are going to traverse the array. These pointers are going to show us a valid slicing. We are going to advance the first pointer until the slicing becomes invalid. After that we are going to advance the second pointer until we have a valid slicing, again. We are going to do this routine until the first pointer gets out of range.
Here is the algorithm.
1. Point both pointers to the first element.
2. Answer <= 0
3. While first pointer is in range:
a. If current slicing is valid, then:
1. Answer <= Answer + (Number of elements between two pointers, inclusive)
2. Advance the first pointer one step
b. Else, current slicing is not valid:
1. Advance the second pointer one step
4. Return Answer
For the moment, let's not worry about the running time of the validity check and find out if the algorithm counts all valid pairs.
We can prove this by contradiction. Suppose there is a valid pair, say (A,B), that we do not count when the first pointer points to B. That is, when the first pointer reaches B, the second pointer is pointing to an element after A. This can only happen when the second pointer points to A and the first pointer is pointing to an element C(A<C<B), and the slicing (A,C) is not valid (so the second pointer goes forward one step and we miss (A,B) when we reach B). However, a situation like this is impossible, because if (A,C) is not valid the difference between the maximum and the minimum in (A,C) is more than two; therefore, the difference between the maximum and the minimum in (A,B) is going to be at least equal to that of (A,C), because (A,C) is contained in (A,B). This shows that we are not going to miss a valid pair.
Let's go back to validity check. As we are going to do this check n times we should be able to do it in O(logn). We can do this check in O(logn) by using a balanced binary search tree like Red-Black Tree. This tree contains all elements in the slicing between the pointers. In each iteration, we compare the maximum (O(logn) ) and the minimum (O(logn) ) of the tree, to see if a slicing is valid. If we have a valid slicing, we increment the first pointer and insert the element that it points to, in the tree. If the slicing is not valid, we delete the element that the second pointer points to from the tree, and the increment the second pointer.
Here is my implementation of the algorithm in C++. I have used STL map as the balanced tree:
#include <iostream>
#include <map>
using namespace std;
long long numberOfGoodSlices(int n, int *ary)
{
int ptr1 = 0, ptr2 = 0, mx, mn;
long long ans = 0ll;
map <int, int> mp;
map <int, int> ::iterator it;
mp[ary[0]] = 1;
while (ptr1 < n)
{
mx = mp.rbegin()->first;
mn = mp.begin()->first;
if ( (mx - mn) <= 2 ) //we are in good shape :)
{
ans += (ptr1 - ptr2) + 1;
ptr1++;
if (ptr1 < n)
{
it = mp.find(ary[ptr1]);
if (it == mp.end())
mp[ary[ptr1]] = 1;
else
it->second++;
}
}
else
{
it = mp.find(ary[ptr2++]);
it->second--;
if (!(it->second))
mp.erase(it);
}
}
return ans;
}
Hint for linear solution: We never have more than 4 distinct values in the tree. I'm going to code this one, too and post it here.
Cheers!
Good approach. Small nitpick: technically you could have 5 distinct values in the tree.
If you know you'll only have 5 values you might even use an array and not bother with a tree.
The PDF document the OP linked to asks for maximum and minimum in the slice <= K (rather than <= 2). Then you can no longer assume the tree will be small. Can you solve this problem in O(n) regardless?
Thanks Eugene :-)
Could you please provide a small example for the case with 5 distinct values rather than 4?
I've actually used an array in my other post here.
Unfortunately, I have read the solution in the pdf and what comes next is not my solution:
First of all, observe that we are only interested in maximum and minimum of a given slicing to check its validity. This said, the problem is solved if we can find a way for handling maxima and minima efficiently. From here on, I will only explain how to handle the maximum (the minimum is pretty similar). Suppose the first pointer goes forward one step and points to element X. Now we have two types of numbers in the slicing. The ones that are smaller than X and the ones that are larger than or equal to X. Let us focus on the smaller ones. We can see that those numbers are never going to be the maximum. The proof is by contradiction and is pretty straightforward, so I'll skip it. Now, suppose that we have the numbers in the slicing in non-increasing order. When we want to add X, we discard the smaller ones and the append X to end of the list; therefore, we maintain the non-increasing order. Another interesting property we have is that the numbers in the set are in order of their appearance in the original array, because we always append the new one to end of the list. This property, helps us in advancing the second pointer. Right before every step of the second pointer, we simply check the first element of the list and if it is the element that the second pointer is currently pointing to, we remove it from the list, before advancing the second pointer.
We can see that, maintaining a list like the above is going to cost us O(n) total amount of time, because every element is inserted and removed at most once. Here is an implementation of the list using STL list.:
#include <iostream>
#include <list>
using namespace std;
class myDataStructure{
private:
list < pair<int, int> > maxima, minima;
public:
void insert(int value, int position)
{
while (maxima.size() && maxima.back().first < value)
maxima.pop_back();
maxima.push_back(make_pair(value, position));
while (minima.size() && minima.back().first > value)
minima.pop_back();
minima.push_back(make_pair(value, position));
}
void remove(int position)
{
if (maxima.front().second == position)
maxima.pop_front();
if (minima.front().second == position)
minima.pop_front();
}
int maxMinDifference()
{
int mx = maxima.front().first;
int mn = minima.front().first;
return mx - mn;
}
};
Just thinking here, If you hash all values as you go along where the key is the value at position i in the array and i is the index. Whenever you reach a new position in the array you check the hash for keys with values -2=> i <= 2.
pseudo code:
public void findSlices(int[] a){
NavigableMap <int, int> hash = new NavigableMap <int, int>();
for(int i=0;i<a.length;i++){
hash.put(a[i], i);
Collection<int> nearKeys = hash.subMap(-2+i,2+i);
foreach(int val : nearKeys){
System.out.println("("+val+","+i+")");
}
}
}
This can never be O(n), since all values in the array might be the same number. If all values are unique then this solution can use a regular hashmap and would be O(n).
Don't post study questions under google interview.
This gives the wrong impression of the types and calibration of google interviews for the job title you posted under.
Confused output is this one
Output:
(0,0) (1,1) (2,2) (3,3) (4,4) (5,5)
(0,1) (1,2) (1,3) (2,3)
or this one
Example slices: 3 5, 5 7, 1 3, 2 3.
Output is output and example slices is example. I don't know why you are getting confused.
It is confusing because the initial array of the example is { 3, 5, 7, 6, 3 }
and the output is not considering the last element of the array, also the example
slices include term not available in the initial array such 1 and 2
so the complete problem seems not to be well defined
1. Hash all the input array values as input, List<index>
(I use list of index to consider repeated numbers in the input array)
2. Traverse the array
For each number in array
if hash has any value in the range (number + 0 to number + 2)
we have found a pair
(additional steps could be added to remove duplicate pairs if needed)
Time complexity: see below comment for time complexity (thanks to Anonymous)
.
public static List<int[]> FindSlicedPairs(int[] input)
{
List<int[]> pairs = new List<int[]>();
// hash of the <int, indexes>
Dictionary<int, List<int>> hash = new Dictionary<int, List<int>>();
for (int i = 0; i < input.Length; i++)
{
if (hash.ContainsKey(input[i])) hash[input[i]].Add(i);
else hash.Add(input[i], new List<int> { i });
}
for (int i = 0; i < input.Length; i++)
{
for (int j = 0; j <= 2; j++)
{
int n = input[i] + j;
if (hash.ContainsKey(n))
foreach (int index in hash[n])
pairs.Add(new int[] { i, index });
}
}
return pairs;
}
This won't be O(n) rather O(n*m) where n is number of elements in the array and m is the number of slices we find. 2n to iterate through the array twice and n+(n*m) because of {{foreach (int index in hash[n])}} loop. So in case if we had an array like 1,2,1,2,1,2,1,2,1 or even 1,1,1,1,1,1,1 then each pair has to be reported which will end up in O(n^2) time complexity
C++ implementation for the algorithm described in my other post: ( O(n) running time )
#include <iostream>
#include <algorithm>
using namespace std;
class myDS{
private:
int mx, mn, distinctCount;
pair<int, int> ary[4];
public:
myDS()
{
for (int i = 0; i < 4; i++)
ary[i].first = ary[i].second = -1;
distinctCount = 0;
}
void insert(int x)
{
int emptySlot = -1;
for (int i = 0; i < 4; i++)
{
if (ary[i].second > 0)
{
if (ary[i].first == x)
{
ary[i].second++;
return;
}
continue;
}
emptySlot = i;
}
ary[emptySlot].first = x;
ary[emptySlot].second = 1;
if (distinctCount>0)
{
mx = max(mx, x);
mn = min(mn, x);
}
else
mx = mn = x;
distinctCount++;
}
void remove(int x)
{
bool isThisTheFirstOne = true;
for (int i = 0; i < 4; i++)
{
if (ary[i].first == x && ary[i].second>0)
{
if (ary[i].second == 1)
{
ary[i].first = ary[i].second = -1;
distinctCount--;
}
else
{
ary[i].second--;
}
}
if (ary[i].second > 0)
{
if (isThisTheFirstOne)
{
mx = mn = ary[i].first;
isThisTheFirstOne = false;
}
else
{
mx = max(mx, ary[i].first);
mn = min(mn, ary[i].first);
}
}
}
}
bool isGood()
{
return ((mx - mn) <= 2);
}
};
long long numberOfGoodSlices(int n, int *ary)
{
int ptr1 = 0, ptr2 = 0, mx, mn;
long long ans = 0ll;
myDS *mds = new myDS();
mds->insert(ary[0]);
while (ptr1 < n)
{
if (mds->isGood() )
{
ans += (ptr1 - ptr2) + 1;
ptr1++;
if (ptr1 < n)
mds->insert(ary[ptr1]);
}
else
mds->remove(ary[ptr2++]);
}
return ans;
}
- start from i=0, j=0 (zero indexing)
1. move j by one position until window (i to j) violets the rule (during this keep printing (i,j), i.e., (1,2) (1,3) (1,4) etc.)
2. suppose j is the position where max-min > 2 for the current window. print (i,j-1) and move i by one position and go to (1).
O(n) i assume ??
O(n) solution. Note iterators 'beg' and 'end' only one time run from 0 to size();
For every step 'beg' we move 'end' farther while max-min <= K in range [beg, end]
List 'min_track' keeps numbers in ascending order, list 'max_track' keeps numbers in descending order,
For array 3 5 7 6 3:
for 'beg'=0 and 'end'=2:
min_track: 3 5 7, max_track: 7
for 'beg'=2 and 'end'=5
min_track: 3, max_track: 7 6 3
struct Item{
int value;
int index;
};
void PrintSlices(const vector<int>& v, int k) {
list<Item> min_track;
list<Item> max_track;
int end = 0;
for (int beg = 0; beg < v.size(); beg++) {
while (end < v.size()) {
while (!max_track.empty() && max_track.back().value <= v[end])
max_track.pop_back();
max_track.push_back(Item{v[end], end});
while (!min_track.empty() && min_track.back().value >= v[end])
min_track.pop_back();
min_track.push_back(Item{v[end], end});
if (max_track.front().value - min_track.front().value > k)
break;
end++;
}
for (int i = beg; i < end; i++)
printf("(%d, %d) ", beg, i);
if (!min_track.empty() && min_track.front().index == beg)
min_track.pop_front();
if (!max_track.empty() && max_track.front().index == beg)
max_track.pop_front();
}
}
The problem in the PDF you linked to is different from the problem you describe. You ask to print the ranges; the PDF asks only to count the number of such ranges. The PDF also asks you to make the difference <= K and not just <= 2.
- eugene.yarovoi April 14, 2014The problem as you posed it requires O(n + r) time, where r is the size of the output. Since the output is up to O(n^2) in size, no algorithm is possible that will always be O(n) -- the best we can hope for is an output-sensitive algorithm that solves the problem in O(n) if the output size is O(n) or smaller. Consider that an array of all 0s will need to print O(n^2) pairs of numbers.
The problem posed in the PDF only asks you to count the number of subarrays, not enumerate them. This is, in fact, doable in O(n). The basic idea is that you can start with arr[0] and then see how many more elements you can include before you violate the max -min <= K constraint. When you reach some index j you can no longer include, you know the maximum subarray starting at index 0 is arr[0...j-1], and you can count j different subarrays there. You then already know arr[1...j-1] is valid, and you try to advance the right boundary again (try arr[1...j], then arr[1...j+1]) and so on until again you reach a point past which you can't advance. Then you'll try subarrays starting with index 2, and so on. This is what they mean when they talk about "crawling" over the array.
The key issue here is how you will check whether a particular subarray is valid efficiently. One possibility would be to use a data structure that allows you to do minimum / maximum range queries (see the Maximum Range Query problem). However, in this case, you can reach the solution more simply by recognizing that the values over which you want to find maxima / minima can be modeled by a queue. When you advance the right boundary, you append an element to the queue; when you advance the left boundary, you dequeue an element. Now all that remains is to show how to implement a queue that allows O(1) amortized append, dequeue, and max / min.
You may have heard of the classic interview question to implement a stack with O(1) push, pop, and max / min. If you know how to solve that and also know how to make a queue from 2 stacks with amortized O(1) stack operations per queue operation, you have a queue that supports O(1) amortized append, dequeue, and max / min (each of the 2 stacks will have a max / min, and the overall max / min is just the larger / smaller of the two). I find that to be the most straightforward solution.
The PDF goes with a different approach to the same problem of implementing a queue with O(1) min / max. They have 1 queue each for min and max (the min is treated analogous to the max case). Whenever a new value comes in the max queue, they remove all values less than the value, since those can never again be the max. For that reason the queue's values are in ascending order (you can prove that if you start with an ascending sequence, remove everything less than a given number, and add the given number to the front, you still have an ascending sequence). Since the queue is in ascending order, the max at any given time is the last element, and since the elements are always inserted in the front, the oldest elements are in the back, so you always dequeue from the back.