Flipkart Interview Question
Software Engineer / DevelopersCountry: India
get the occurrences for all characters in the string and store them in an array if all characters are ASCII
build the heap
decrement the root by one and sift down the root N times
sort the heap
calculate the difference between the max and min occurrence
@artist abbbbccc and we have to delete only 1 character then by your algo we got 2 answer(abbbccc) but actual answer is 1(bbbbccc)
true...thanks for the correction. in that case the heap algorithm by lfenjoy9 would also not work, will it?...got to rethink the algorithm
It would be better to decide whether we should remove the least frequent number or most frequent number. for instance
let in a string aabbbbbppppppppp
here a occurs 2 times b 5 times and p 9 times
so if we are allowed to remove 2 letters in this case it would be better to remove both a's rather than p
@cool
you are right. When we plan to remove the characters we need to keep following things in mind:
. Which one is more beneficial whether to remove characters with least frequency (or) characters with high frequency?
. If you are able to remove least frequency character such that c2 value is going up with modified array then keep doing it for iteration < n.
. Once you reach at the stage where you are no longer able to increase the value of c2 by removing characters <= n then you have to remove highest frequency character for remaining n iteration.
it becomes more complicated in a scenario like this:
input string = "aabbccccdddd" and n=2. in this case you need to remove one 'c' and 1 'd'. how do you handle all such scenarios programmatically?
Eg: aabbccccddddeee
1. Scan the string once to get the count for each alphabet (use hashing). We get a=2, b=2,c=4,d=4,e=3. While doing this scan also maintain a max and min. So we get max = 4, min = 2.
2. diff = max - min and chars_red=0, prev=0 (stores previous value of chars_red)
Now create an array of size max (arr[max]) and assign values to it in this way:
First a = 2. so arr[2]++, then b=2, so arr[2]++..so on
So at the end, arr[] ={0,0,2,1,2}
3. while(char_red <=n)&&(diff != 0) do this :
if(arr[max]!=0)
{
prev = chars_red;
chars_red+=arr[max];
diff--;
max--;
}
4. Ans is prev
I guess we can have a similar method to remove elements with minimum frequency and get prev. Then we compare the values obtained from both and return the minimum one
Eg: aabbccccddddeee
1. Scan the string once to get the count for each alphabet (use hashing). We get a=2, b=2,c=4,d=4,e=3. While doing this scan also maintain a max and min. So we get max = 4, min = 2.
2. diff = max - min and chars_red=0, prev=0 (stores previous value of chars_red)
Now create an array of size max (arr[max]) and assign values to it in this way:
First a = 2. so arr[2]++, then b=2, so arr[2]++..so on
So at the end, arr[] ={0,0,2,1,2}
3. while(char_red <=n)&&(diff != 0) do this :
if(arr[max]!=0)
{
prev = chars_red;
chars_red+=arr[max];
diff--;
max--;
}
4. Ans is prev
I guess we can have a similar method to remove elements with minimum frequency and get prev. Then we compare the values obtained from both and return the minimum one
This seems like a DP problem. This is what my take at this problem.
1. Let the RemoveCount is the max number of items that can be removed.
2. Sort the elements with their frequencies.
3. This function takes the elemArray and the maximum element to be removed and returns the min possible variation in the data.
int MinVariation(ElementFrequencyArray elemArray, int removeCount )
4. To reduce the highest frequent element, we need to check for the case when it no longer becomes a highest frequent element. Lets suppose reduceCount is more than the difference between the top two elements. Then ReduceHigherFrequentNode will reduce the the top element frequency till the time it becomes equal to the second max of the element array.
int ReduceHigherFrequentNode(ElementFrequencyArray& elemArray, int removeCount )
{
if (Max(elemArray) - SecondMax(elemArray) > removeCount)
{
// Reduce the frequency of maximum elem by remove count
// remove count = 0;
}
else
{
// Reduce max element till its equal to second max element count
// Reduce the removeCount by the difference
}
return removeCount;
}
5. To reduce element from the lower frequency side
int ReduceLowerFrequentNode(ElementFrequencyArray& elemArray, int removeCount )
{
if (Min(elemArray) < removeCount)
{
// Reduce freq of min frequent elem to 0.
// removeCount -= Min(elemArray)
}
return removeCount;
}
6. Given these functions, at any point we have following options
a1. if (removeCount > Min(elementArray)), it means there is a possibility that if we remove the min frequent element we may get the desired result. What we will do is, call ReduceLowerFrequentNode, and then with the changed elementArray and removeCount, we will find the minimum deviation. Lets call this result to be LowFreqRemovalResult.
a2. If the above condition is true, even then we may decide to reduce the highest frequency element, Call ReduceHigherFrequentNode, get the new set of elementArray and reduce count/
b. if the above condition is false, then call ReduceHigherFrequentNode repeatedly.
int MinVariation(ElementFrequencyArray elemArray, int removeCount )
{
if (removeCount == 0)
return Max(elemArray)- Min(elemArray);
if (Min(elemArray) < removeCount)
{
lowElemArray = copy(elementArray)
remCount = ReduceLowerFrequentNode(lowElemArray, removeCount);
lowFreq = MinVariation(lowElemArray, remCount);
highElemArray = copy(elementArray)
remcount2 = ReduceHigherFrequentNode(highElemArray, removeCount);
highFreq = MinVariation(lowElemArray, remCount);
return Min(lowFreq, higFreq);
}
highElemArray = copy(elementArray)
remcount2 = ReduceHigherFrequentNode(highElemArray, removeCount);
return MinVariation(lowElemArray, remCount);
}
private static int GetMinRoughness(string s, int maxLimit)
{
if (s == null) return 0;
int length = s.Length;
int[] a = new int[256];//considering ASCII not unicode
for (int i = 0; i < length; i++)
{
a[s[i]]++;
}
a = Sortings.QuickSort(a, 0, 255); // ascending
int index = 255; //interate from right to get the index of the least frequent of the character
int minIndex=-1;
while (a[index] >0)
{
if(index ==0) //rare case when all the possible ASCII characters have at least one occurance in the given string
{
minIndex=0;
break;
}
index--;
}
minIndex=(minIndex == -1) ? index+1 : minIndex;
return GetRoughness(a, maxLimit, 1, minIndex,255);
}
private static int GetRoughness(int[] a, int n, int adjustments, int minIndex, int maxIndex)
{
int length = a.Length;
if (n == 0 || (n < adjustments && n < a[minIndex])) //base condition
return a[maxIndex] - a[minIndex];
if (n >= a[minIndex] && a[minIndex + 1] > (2 * a[minIndex]) // if removing the less frequent character is favorable
&& minIndex < 254) // we need to make sure that we don't decrease the occurance of 2nd most frequent character
{
n = (n - a[minIndex]);
a[minIndex] = 0;
return GetRoughness(a, n, adjustments, ++minIndex, maxIndex);
}
else // continue removing the most frequent characters
{
int rightGain = (a[maxIndex] - a[maxIndex - 1]) * adjustments;
if (rightGain <= n && rightGain > 0)
{
n = n - rightGain;
//a[maxIndex] -= (rightGain/adjustments);
return GetRoughness(a, n, ++adjustments, minIndex, --maxIndex);
}
else
{
int adjustedRoughness = n / adjustments;
//a[maxIndex]-=adjustedRoughness;
return (a[maxIndex] - adjustedRoughness - a[minIndex]);
}
}
}
This is the solution with O(m log m ) time complexity(where (m=length of string) and O(256) space Complexity in case of ASCII characters in the string else O(512)
sort the characters by its no. of occurrence.
1) create a min heap and a max heap based on no. of occurrence.
2) now for given N number. we have total N possible ways to remove character
ex. for given n we can remove 0 char from min heap and N char from max heap
-> 1 char from min heap and N-1 char from Max heap.......
.......
removing n char from mean heap means
for i in range(0,n):
mean_heap_root -= 1
readjust_heap()
Same way removing n char from max_heap means subtract 1 from heap root and adust heap and do this for n times
say R_i_j represents remove i chars from min heap and j char from max heap
then ans is
for i in range (0,N):
R_i_(N-i)
ans = min(ans, max_heap(root) - min_heap(root))
Create a max-heap of characters present in line s, where comparison criteria among heap nodes is their frequency of occurrence. The root(A) will have highest frequency, say fMax. Remove this max occurring character, so that heap contains second max occurring character(B) as its root.Let new root's frequency be fMax2.
Look out for leaf nodes (representing minimum frequency characters). Take one of the least occurring character( there can be multiple).Say its frequency is fMin.
Now we need to minimize fMax - fMin by removing character A that many times as that frequency of A remains higher than second highest occurring number B. i.e. fMax - fMax2 times.
Let the length of the word is m at any given point and n is remaining deletes we are left with.
Define M(m,n) = min roughness value with first m letters and n remaining deletes at our disposal. Then
M(m,n) = min[ M(m-1,n-1), M(m-1,n) ]... either del mth element or leave it.
M(for all m,0)= calculated roughness from the given string s
M(2,n) = 0
The original version of this question is here:
red.cliff.jp/topcoder/RoughStrings.txt
Algo is as follows:
1) Iterate over possible values of roughness and check if given value can be achieved
a) if yes reduce the value
b) if no increase the value
2) Repeat a and b till no change is possible
3) To check if roughness can be achieved check every possible combination of min and max value counting number of needed changes which has to be done to achieved given roughness. If number of changes <= n then roughness can be achieved.
Code:
boolean check(int r, int[] freq, int n, int len) {
for (int min = 1; min <= len; min++) {
int max = r + min;
int need = 0;
for (int i = 0; i < 26; i++) {
if (freq[i] == 0) {
continue;
}
if (freq[i] < min) {
need += freq[i];
} else if (freq[i] > max) {
need += freq[i] - max;
}
}
if (need <= n) {
return true;
}
}
return false;
}
public int minRoughness(String s, int n) {
int[] freq = new int[26];
int len = s.length();
for (int i = 0; i < len; i++) {
freq[s.charAt(i) - 'a']++;
}
int left = 0;
int right = len;
while (left < right) {
int mid = (left + right) / 2;
if (check(mid, freq, n, len)) {
right = mid;
} else {
left = mid + 1;
}
}
return left;
}
public static void main(String[] args) {
int[] noOfRemoves = { 1, 5, 1, 5, 17, 2 };
String[] s = { "aaaaabbc", "aaaabbbbc", "veryeviltestcase", "gggggggooooooodddddddllllllluuuuuuuccckkk",
"zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz", "bbbccca" };
for (int i = 0; i < s.length; i++) {
System.out.println(new RoughStrings().minRoughness(s[i], noOfRemoves[i]));
}
}
What you need to remember while erasing elements is that when you start removing the char that appeared the most number of time, you will also have to remove the characters that appeared second highest number of time if the most frequent char number equalled the 2nd most frequent elements. this will repeat at every level (when 3rd highest level is reached you will have to start erasing 3 char to actually reduce the roughness by 1). I'll post the code for this one once i find some time,
- The Artist October 30, 2012