Google Interview Question
Software Engineer / DevelopersCountry: United States
Lets say strings are X = { S1, S2, ... Sn}, sum(|S1|+...+|Sn|) = L and max{ |Si| } = M.
1. Compute longest common sub-sequence between S1 and S2. O( M^2 )
2. Create a string M2 by merging S1 and S2. Include chars from LCS once. For other letters lexicographically smallest one comes first maintaining their order in the original string. It can be done considering string as a implicit linked list. O( L )
3. Remove S1, S2 from set X, and add M2 to the set.
4. Do the same thing for [S3, S4], [S5, S6]..... We have effectively reduced the set size to n/2. By successively doing this we will get n/4, n/8, ... elements, and finally X = {Mn} which is the answer. O( L^2 log n ).
There could be some other easier way to get rid of the common sub-sequences between strings except using LCS. I can't think of one though.
Imagine you sort the list of N strings, and from the sorting you create a long new string by concatenating the elements of the list. Now the requirement of the task is met. But if you have a concatenation of prefixes abcd and ecd that is abcdecd you may simplify it as abecd - both prefixes appears here as subsequences. This step might be done while concatenation. If we have string L and R, and we need to concatenate them, and R contains a substring of L, then we can remove it from L.
Need more info on this one.
1. Does that smallest string is one of the N strings? (I know that would be dumb, but not clear from question.)
2. If not, does the smallest string need to come from a dictionary? Or just and string with characters in sorted order would do?
"Small lexicographic order" is confusing me... do you mean "small" or "lexicographic order" or both?
Lets have an example.
apple, bc, plent => applentbc or applebcplent?
@mindless monk,
i think for following example of apple,bc, plent
the smallest lexicographic ordered string is
abcpplent
@geekofthegeeks
I think abcplent would be the right answer, as suggested by @anonymous. If it is not then please give some examples and explain them.
I am thinking about this approach.
Code in java:
Let String[] NStrings be given N input Strings.
String allStringsSupersetSoFar = NStrings[0] /* first input String */
for(int i = 1; i < NStrings.length; i++)
{
allStringsSupersetSoFar = formSuperSetsoFar(NStrings[i], allStringsSupersetSoFar);
}
/*
* Given two strings, finds the superset string in lexicographic order for
* which both the input strings are subsequences
*/
String formSuperSetsoFar(String ithString, String allStringsSupersetSoFar)
{
int m = allStringsSupersetSoFar.length();
int n = ithString.length();
int i, j = 0; /* i - index to allStringsSupersetSoFar and j - index to ithString
StringBuilder superSetString = new StringBuilder();
while ( i < m && j < n)
{
if(allStringsSupersetSoFar [i] < ithString[j])
{
superSetString.append(allStringsSupersetSoFar [i]);
i++;
}
else if (allStringsSupersetSoFar [i] > ithString[j])
{
superSetString.append(ithString[j]);
j++;
}
else
{
i++; j++;
superSetString.append(allStringsSupersetSoFar [j]);
}
}
if(j < n)
{
superSetString.append(ithString.substring(j+1));
}
if(i < m)
{
superSetString.append(allStringsSupersetSoFar.substring(i+1));
}
return superSetString;
}
May be I am missing something here or my interpretation of question inputs and output is wrong. This problem seems simpler to come in Google interview.
@geekofthegeeks,
Did you have this for telephonic or face-to-face?
Keep in mind that your code definitely gets the wrong answer in some situations. For example:
{"b", "c"} gives "bc". The right answer is "aaabc" since "aaabc" has as subsequences "b" and "c" as well as being < "bc" and < "abc" and < "aabc".
In fact, in all cases where an answer string consists of k letters without any 'a' in them, the true answer has k 'a' appended to it. And if the first L letters of the answer string are 'a', then you must append k - L 'a's.
How about this?
{{
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Collections;
public class StringCombineLexicographic
{
public static void main(String[] arguementList)
{
String[] strArray = {"pqr","zy","aaaa","pqrs"};
char[] result = new char[128];
int size = 0;
for(int i=0;i<strArray.length;i++)
{
for (int j =0;j<strArray[i].length();j++)
{
int index = (int)strArray[i].charAt(j);
if(result[index] != strArray[i].charAt(j))
{
result[index] = strArray[i].charAt(j);
size++;
}
}
}
Arrays.sort(result);
for(int i = result.length-size;i<result.length;i++)
System.out.print(result[i]+" ");
}
}
}}
There is something wrong with the ques..
lets say u form some string which contains all the strings as its subsequences, bcdf
then abcdf is lexicographically smaller than bcdf...aabcdf is smaller than abcdf..and so on..
hence we can keep on adding number of a's to form smaller and smaller(lexicographically) strings.
Well, it's a classic problem which can be solved by dynamic programming(DP).
Considering the final patten satisfying the condition, it must be the form "(B)(A)(D)(C)(E)... etc.", where (A) means the word A in dictionary. i.e. Just connect them maybe cause overlaps or include.
In order to apply this approach, we need some preparation--erase all words which is contained in other word, and also keep the word set unique.
Thus, we define our DP equation as dp[i][statu] denoting the smaller string with i-th word be the last word we used, and this string has contained "statu" words("0100101" means that three words are contained).
dp[j][statu|(1<<j)] = dp[i][statu] + (an extra suffix string of word j)
In the brackets, an extra suffix means we need a string concentrate with word i to get the word j. i.e. if word i is "abcdefg" and word j is "defghij", we need an extra "hij" to finish the transfer.
The initial DP statu is to define each word as a statu, dp[0][00001] = "apple", dp[1][00010] = "orange", and so on.
The answer of the question is dp[anyone(0,1,2,..,N-1)][binary statu of all one(1<<N -1)], choose a smallest lexicographic string.
Well, would this algorithm work?
1. Push all those string into a stack.
2. Peek the top value of the stack in each of the strings
3. The most minimum value lexographically is popped out (from more than one stack, if there should be duplicates) and stored into the resultant string.
4. Repeat 2 through 3 till all stacks are empty.
Of course, this takes up space proportional to number of strings (and no. of characters in each of those strings), but space concerns can alleviated, cause this problem would work well with divide and conquer method too (il est, process the strings in pairs, and merge them).
Using stack here, only because it makes the algorithm a bit easier to explain, of course you could just use an array to store the indices (of the yet-to-be-added-to-the-resultant-string characters).
N-way merge using min heap in Java. Complexity is O(m log n) where m is the overall number of chars in the input and n is the number of words in the input. The repeating chars are handled by popping all words that start with the same char and then putting them all back together...
import java.util.Arrays;
import java.util.LinkedList;
import util.MinHeap;
public class WordSuperSequence {
private static class StringSuffix implements Comparable<StringSuffix> {
String str;
int start;
StringSuffix(String string, int s){ str = string; start = s; }
public int compareTo(StringSuffix other){
// this is just a hack - need to implement properly to achieve target complexity
return -this.toString().compareTo(other.toString());
}
char peekChar() { return str.charAt(start); }
char nextChar() { return str.charAt(start++); }
boolean isEmpty () { return str.length() == start; }
public String toString(){
return str.substring(start);
}
}
public static String getLeastSupersequence(String [] words){
int n = words.length;
if(n == 0) return "";
MinHeap<StringSuffix> heap = new MinHeap<>(n);
StringBuilder sb = new StringBuilder();
for(String word : words) heap.insert(new StringSuffix(word, 0));
LinkedList<StringSuffix> sameCharList = new LinkedList<>();
while(!heap.isEmpty()){
StringSuffix minSuffix = heap.extractMin();
System.out.println("Extraced " + minSuffix);
char ch = minSuffix.nextChar();
sb.append(ch);
sameCharList.clear();
if(!minSuffix.isEmpty()) sameCharList.add(minSuffix);
while(!heap.isEmpty() && heap.peekMin().peekChar() == ch){
StringSuffix curSuffix = heap.extractMin();
curSuffix.nextChar();
if(!curSuffix.isEmpty()) sameCharList.add(curSuffix);
}
for(StringSuffix suf: sameCharList){
heap.insert(suf);
}
}
return sb.toString();
}
public static void main(String[] args) {
String [] words = {
"apple",
"house",
"horse",
"help",
"pop"
};
System.out.println("Supersequence of "+ Arrays.toString(words)+ " is "+ getLeastSupersequence(words));
}
}
well here is my solution in Java, the idea is to use Iterator and Heap (as many answers above)
public void solve(int testNumber, InputReader input, PrintWriter out) {
FastScanner in = new FastScanner(input);
int n = in.i(); // read an interger
int total = 0;
StringCharacterIterator[]strings = new StringCharacterIterator [n];
for(int i = 0; i < n; ++i) {
String s = in.sl(); // read a line
strings[i] = new StringCharacterIterator(s);
total += s.length();
}
PriorityQueue<StringCharacterIterator> heap = new PriorityQueue<>(cprt);
for(int i = 0; i < n; ++i)
heap.add(strings[i]);
StringBuilder sb = new StringBuilder();
for(int i = 0; i < total; ++i) {
StringCharacterIterator it = heap.poll();
sb.append(it.current());
if(it.next() != CharacterIterator.DONE)
heap.add(it);
}
out.println(sb);
}
Comparator<StringCharacterIterator> cprt = (o1, o2) -> Character.compare(o1.current(), o2.current());
This can be solved most efficiently using n-way merge. The basic idea is that we use a binomial heap and add a character to it one by one from each string. We also need to keep track of what is already added to the heap.
- Anonymous November 16, 2013