Given N strings, find the smal

Google Interview Question for Software Engineer / Developers

5

of 7 votes

21
Answers
Given N strings, find the smallest string in lexicographic order which contains all the given strings as subsequences
- geekofthegeeks November 15, 2013 in United States | Report Duplicate | Flag | PURGE
Google Software Engineer / Developer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 5 vote

This can be solved most efficiently using n-way merge. The basic idea is that we use a binomial heap and add a character to it one by one from each string. We also need to keep track of what is already added to the heap.

- Anonymous November 16, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

There should be one modification, if the next character is the same as the last character in the merged result, and the last character is chosen from a different string, the next character can be skipped.

- PC January 02, 2014 | Flag

Comment hidden because of low score. Click to expand.

of 4 vote

Lets say strings are X = { S1, S2, ... Sn}, sum(|S1|+...+|Sn|) = L and max{ |Si| } = M.

1. Compute longest common sub-sequence between S1 and S2. O( M^2 )

2. Create a string M2 by merging S1 and S2. Include chars from LCS once. For other letters lexicographically smallest one comes first maintaining their order in the original string. It can be done considering string as a implicit linked list. O( L )

3. Remove S1, S2 from set X, and add M2 to the set.

4. Do the same thing for [S3, S4], [S5, S6]..... We have effectively reduced the set size to n/2. By successively doing this we will get n/4, n/8, ... elements, and finally X = {Mn} which is the answer. O( L^2 log n ).

There could be some other easier way to get rid of the common sub-sequences between strings except using LCS. I can't think of one though.

- lasthope November 20, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 2 vote

Imagine you sort the list of N strings, and from the sorting you create a long new string by concatenating the elements of the list. Now the requirement of the task is met. But if you have a concatenation of prefixes abcd and ecd that is abcdecd you may simplify it as abecd - both prefixes appears here as subsequences. This step might be done while concatenation. If we have string L and R, and we need to concatenate them, and R contains a substring of L, then we can remove it from L.

- joe kidd November 15, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

abcdecd is smaller than abecd in lexicographic order, if sort them first then just append them all.

- soysauce November 16, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Need more info on this one.

1. Does that smallest string is one of the N strings? (I know that would be dumb, but not clear from question.)
2. If not, does the smallest string need to come from a dictionary? Or just and string with characters in sorted order would do?

- mindless monk November 15, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

No to both

- geekofthegeeks November 16, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

"Small lexicographic order" is confusing me... do you mean "small" or "lexicographic order" or both?

Lets have an example.

apple, bc, plent => applentbc or applebcplent?

- mindless monk November 16, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@mindless monk,

i think for following example of apple,bc, plent
the smallest lexicographic ordered string is

abcpplent

- Anonymous November 16, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

for following example of apple,bc, plent ,
answer will be:
abcplenplet

- geekofthegeeks November 16, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

@geekofthegeeks
I think abcplent would be the right answer, as suggested by @anonymous. If it is not then please give some examples and explain them.

- mindless monk November 16, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

abcplenplet is the correct answer because it is lexicographically smaller than "abcpplent" (size is not 'l' < 'p')

- Anonymous December 27, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

Hey can we apply this approach to this question.

first we sort the strings by that we can get the ordered string and then using the trie data structure we can find the string which contains the subsequences of all the strings....

- Mac.theCreator November 16, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I am thinking about this approach.

Code in java:
Let String[] NStrings be given N input Strings.

String allStringsSupersetSoFar = NStrings[0] /* first input String */

for(int i = 1; i < NStrings.length; i++)
{
	allStringsSupersetSoFar = formSuperSetsoFar(NStrings[i], allStringsSupersetSoFar);
}

/*  
 * Given two strings, finds the superset string in lexicographic order for
 * which both the input strings are subsequences
 */
String formSuperSetsoFar(String ithString, String allStringsSupersetSoFar)
{
	int m = allStringsSupersetSoFar.length();
        int n = ithString.length();
        int i, j = 0; /* i - index to allStringsSupersetSoFar and j - index to ithString 
        StringBuilder superSetString = new StringBuilder();
        while ( i < m && j < n)
	{
		if(allStringsSupersetSoFar [i] < ithString[j]) 
		{
			superSetString.append(allStringsSupersetSoFar [i]);
			i++;
		}
		else if (allStringsSupersetSoFar [i] > ithString[j])
		{
			superSetString.append(ithString[j]);
			j++;
		}
		else
		{
			i++; j++;
			superSetString.append(allStringsSupersetSoFar [j]);
		}
	}
        
        if(j < n)
	{
		superSetString.append(ithString.substring(j+1));
	}   

	if(i < m)
	{
				superSetString.append(allStringsSupersetSoFar.substring(i+1));
	}
	return superSetString;
}

May be I am missing something here or my interpretation of question inputs and output is wrong. This problem seems simpler to come in Google interview.

@geekofthegeeks,
Did you have this for telephonic or face-to-face?

- Anonymous November 16, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Keep in mind that your code definitely gets the wrong answer in some situations. For example:
{"b", "c"} gives "bc". The right answer is "aaabc" since "aaabc" has as subsequences "b" and "c" as well as being < "bc" and < "abc" and < "aabc".

In fact, in all cases where an answer string consists of k letters without any 'a' in them, the true answer has k 'a' appended to it. And if the first L letters of the answer string are 'a', then you must append k - L 'a's.

- TheRightAnswer November 21, 2013 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

I still don't understand the problem. For example for the following two strings:

ab, xya would it be abxya or xyab?

- micronix November 17, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

How about this?
{{
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Collections;

public class StringCombineLexicographic
{
public static void main(String[] arguementList)
{
String[] strArray = {"pqr","zy","aaaa","pqrs"};
char[] result = new char[128];
int size = 0;

for(int i=0;i<strArray.length;i++)
{
for (int j =0;j<strArray[i].length();j++)
{
int index = (int)strArray[i].charAt(j);
if(result[index] != strArray[i].charAt(j))
{
result[index] = strArray[i].charAt(j);
size++;
}
}
}

Arrays.sort(result);

for(int i = result.length-size;i<result.length;i++)
System.out.print(result[i]+" ");
}
}
}}

- BVT November 17, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

There is something wrong with the ques..
lets say u form some string which contains all the strings as its subsequences, bcdf
then abcdf is lexicographically smaller than bcdf...aabcdf is smaller than abcdf..and so on..
hence we can keep on adding number of a's to form smaller and smaller(lexicographically) strings.

- alex December 08, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Well, it's a classic problem which can be solved by dynamic programming(DP).
Considering the final patten satisfying the condition, it must be the form "(B)(A)(D)(C)(E)... etc.", where (A) means the word A in dictionary. i.e. Just connect them maybe cause overlaps or include.
In order to apply this approach, we need some preparation--erase all words which is contained in other word, and also keep the word set unique.
Thus, we define our DP equation as dp[i][statu] denoting the smaller string with i-th word be the last word we used, and this string has contained "statu" words("0100101" means that three words are contained).

dp[j][statu|(1<<j)] = dp[i][statu] + (an extra suffix string of word j)

In the brackets, an extra suffix means we need a string concentrate with word i to get the word j. i.e. if word i is "abcdefg" and word j is "defghij", we need an extra "hij" to finish the transfer.

The initial DP statu is to define each word as a statu, dp[0][00001] = "apple", dp[1][00010] = "orange", and so on.

The answer of the question is dp[anyone(0,1,2,..,N-1)][binary statu of all one(1<<N -1)], choose a smallest lexicographic string.

- Prowindy December 22, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Well, would this algorithm work?

1. Push all those string into a stack.
2. Peek the top value of the stack in each of the strings
3. The most minimum value lexographically is popped out (from more than one stack, if there should be duplicates) and stored into the resultant string.
4. Repeat 2 through 3 till all stacks are empty.

Of course, this takes up space proportional to number of strings (and no. of characters in each of those strings), but space concerns can alleviated, cause this problem would work well with divide and conquer method too (il est, process the strings in pairs, and merge them).

Using stack here, only because it makes the algorithm a bit easier to explain, of course you could just use an array to store the indices (of the yet-to-be-added-to-the-resultant-string characters).

- ignoramous December 29, 2013 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

N-way merge using min heap in Java. Complexity is O(m log n) where m is the overall number of chars in the input and n is the number of words in the input. The repeating chars are handled by popping all words that start with the same char and then putting them all back together...

import java.util.Arrays;
import java.util.LinkedList;

import util.MinHeap;


public class WordSuperSequence {

	private static class StringSuffix implements Comparable<StringSuffix> {
		String str;
		int start;
		StringSuffix(String string, int s){ str = string; start  = s; } 

		public int compareTo(StringSuffix other){
			// this is just a hack -  need to implement properly to achieve target complexity
			return -this.toString().compareTo(other.toString()); 
		}

		char peekChar() { return str.charAt(start); } 

		char nextChar() { return str.charAt(start++); } 
		boolean isEmpty () { return str.length() == start; }
		public String toString(){
			return str.substring(start);
		}

	}

	public static String getLeastSupersequence(String [] words){
		int  n = words.length;
		if(n == 0) return "";
		MinHeap<StringSuffix> heap = new MinHeap<>(n);
		StringBuilder sb = new StringBuilder();	


		for(String word : words) heap.insert(new StringSuffix(word, 0));
		LinkedList<StringSuffix> sameCharList = new LinkedList<>();
		while(!heap.isEmpty()){
			StringSuffix minSuffix = heap.extractMin();
			System.out.println("Extraced " + minSuffix);
			char ch = minSuffix.nextChar();
			sb.append(ch);
			sameCharList.clear();
			if(!minSuffix.isEmpty()) sameCharList.add(minSuffix);
			while(!heap.isEmpty() && heap.peekMin().peekChar() == ch){
				StringSuffix curSuffix = heap.extractMin();
				curSuffix.nextChar();	
				if(!curSuffix.isEmpty()) sameCharList.add(curSuffix);
			}
			for(StringSuffix suf: sameCharList){
				heap.insert(suf);
			}

		}

		return sb.toString();
	}
	
	
	public static void main(String[] args) {
		String [] words = {
			"apple",	
			"house",	
			"horse",	
			"help",	
			"pop"				
		};
		System.out.println("Supersequence of "+ Arrays.toString(words)+ " is "+ getLeastSupersequence(words));
	}

}

- konst June 03, 2014 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

well here is my solution in Java, the idea is to use Iterator and Heap (as many answers above)

public void solve(int testNumber, InputReader input, PrintWriter out) {
        FastScanner in = new FastScanner(input);
        int n = in.i(); // read an interger
        int total = 0;
        StringCharacterIterator[]strings = new StringCharacterIterator [n];
        for(int i = 0; i < n; ++i) {
            String s = in.sl(); // read a line
            strings[i] = new StringCharacterIterator(s);
            total += s.length();
        }

        PriorityQueue<StringCharacterIterator> heap = new PriorityQueue<>(cprt);
        for(int i = 0; i < n; ++i)
            heap.add(strings[i]);

        StringBuilder sb = new StringBuilder();
        for(int i = 0; i < total; ++i) {
            StringCharacterIterator it = heap.poll();
            sb.append(it.current());
            if(it.next() != CharacterIterator.DONE)
                heap.add(it);
        }

        out.println(sb);
    }

    Comparator<StringCharacterIterator> cprt = (o1, o2) -> Character.compare(o1.current(), o2.current());

- mt April 04, 2016 | Flag Reply

CareerCup

Google Interview Question for Software Engineer / Developers

Books

Videos

Resume Review

Mock Interviews