Amazon Interview Question
SDE1sCountry: India
Interview Type: Written Test
@Sunny
Good one, +1 for you. Though I see you need to take care not to add the same string to both hashtable1 and hashtable2. For ex: a string of the form aaaaaaaaabaaaaaaaaa might have 'a' as the longest prefix as well as the longest suffix, but they cannot be permuted to form the longest contiguous sequence.
Dumbo - that's a good catch. I lost sight of cases like these after I cleaned up my code. And for this problem I really have a hard time cleaning up the code. Even this version isn't that clean. Would love to see a cleaner solution. Perhaps I will try again myself.
Amit - my algorithm should return 5 as it also processes the strings in the "middle". Unless there's a bug of course.
I din't see your code, but in algo, you nowhere mentioned about traversing all strings. I thought You are only looking fr prefix and sufffix in the strings which contains multiple letters... If it processes character in the middle, it will be ok :)
input = {aa, aac, ba} output = a,5
output in this case can be constructed by joining the three strings as
baaaaac
If you join the strings of the array in any order you have to find the letter which is repeating continuously maximum number of time.
Here maximum is a -3
abbaac
the given set of strings is [ ab, ba, aac ]
if you consider the permutation ba-aac, this has the longest running character sequence "a-aa", so the answer is 3
For each aplphabet find max ending string,max starting string,only this aphabet strings.
condition1 :maxending,max starting should not be only this alphabet strings
condition2:if maxending,maxstarting is same string(find two more 2nd maxending,2nd max starting).
Combine maxendingstring+all only alphabet strings+max starting string(Take care of not using any string more than once).
For condition2;
try Max(maxending+all anly this alpabet strings+2ndmax starting , 2nd max ending+all only this alphabets+max starting).
Find the longest among these strings.
Ex:
aa,aac,ba
For a:max ending is ba,only a is {aa},max starting is aac->ba+aa+aac=5
For b:max ending is '',only b is{},max b starting is ba ->''++""+ba = 1
similar for all aphabets
Answer is baaaaac.
Complexity o(m*n) where m is avg length of strings.
For each character, that is either the first char or the last char of any string, compute these two details:
1. Length of longest sequence beginning with this char
2. Length of longest sequence ending with this char
If a string has same char as prefix and suffix only count the one which is longer.
Now for each character from above set,
Compute the length by summing (1) + (2) above.
Pick the character with max sum, as answer.
INPUT: ab; ba; aac
Algo:
a - begString: aac, length=2
a - endString: ba, length=1;
b - begString: ba, length =1;
b - endString; ab, length=1;
c - endString: aac, length=1;
for a, sum=2+1=3
for b, 1+1=2;
for c, 1+INF=INF
Output: a [max length=3]
Complexity: O(n). Simply read each character of each string from beginning & ending till it matches with adjacent character and find the above lengths.
Edit: To address the case where max length seq is in the middle of a string, for each char - also keep track of longest length "inside" a string globally(across all strings). This can be easily done by using a map or such data structure.
Solution by Sunny above seems to be more complete & elaborative.
What about ab; ba; aac; aaaa; aaaa; azzzzzzzzzzzzzzzzzzzza
(1) In this case, would the algorithm be able to consider using both "aaaa" to form the longest sequence?
(2) And would the algorithm be able to detect that "zzzzzzzzzzzzzzzzzzzz" is actually the longest?
I tried making a stack to test a single word. I inserted a new character in stack if the either the stack is empty or the old character is the same as this one. And emotied the stack if new character is different and checked if old character strring is greater than max string of repetitive characters.
#include <stdio.h>
#include <conio.h>
#include <malloc.h>
typedef struct stack stack;
struct stack{
int stk_arr[20];
int tos;
int max;
char max_char;
};
void insert(stack tk,char *s)
{
tk.stk_arr[++tk.tos]=*s;
printf("\n %c inserted",*s);
}
void empty_stack(stack tk)
{
while(tk.tos>=0)
{
tk.tos--;
}
}
int main()
{
stack stck;
char name[20]="sfdfdfgggggghhjgj";
char *st;
st=name;
stck.tos=0;stck.max=0;
while(*st!='\0')
{
if(stck.tos==0)
{insert(stck,st);}
else if(*st==stck.stk_arr[stck.tos])
{
insert(stck,st);
}
else {
if(*st!=(stck.stk_arr[stck.tos]))
{
if(stck.tos>=stck.max)
{stck.max=stck.tos;
stck.max_char=stck.stk_arr[stck.tos];
empty_stack(stck);
printf("\n stack empty");
insert(stck,st);
}
else{
empty_stack(stck);
printf("\n stack empty");
}
}
}
st++;
}
printf("\n exectued");
printf("\nmaximum char is %c sequence is %d",stck.max_char,stck.max);
getch();
}
Howerver every time the output is max character:';' and sequence length is 0.
Please tell me what is wrong with the above...
Maintain a map with <key, value> as <char, repetitions>.
While iterating over each word, populate a list of prefixes and another list of suffixes. Also, populate the repetitions map.
Again, iterate over both the lists and add the char and it's repetitions in the map, if the existing repetitions for that character does not exist in the map or if the repetitions are less than summation of suffix length+prefix length. While doing this make sure to avoid using the same word for calculating summation.
public class LongestSequence {
static Map<String, Integer> repeats = new HashMap<>();
static List<String> starts = new ArrayList<>();
static List<String> ends = new ArrayList<>();
static String maxRepChar = "";
public static void main(String[] args) {
String[] tests = {"ab", "ba", "aac", "dffffg"};
for (String test : tests) {
processor(test);
}
for (int i = 0; i < ends.size(); i++) {
String end = ends.get(i);
int endCharLength = end.length();
String character = end.substring(endCharLength - 1, endCharLength);
for (int j = 0; j < starts.size(); j++) {
if (j == i) continue;
else {
String start = starts.get(j);
int totalLength = endCharLength + start.length();
if ( start.startsWith(character) && (!repeats.containsKey(character) || repeats.get(character) < totalLength) ) {
repeats.put(character, totalLength);
if (totalLength > (repeats.get(maxRepChar) == null ? 0 : repeats.get(maxRepChar) )) {
maxRepChar = character;
}
}
}
}
}
System.out.println("maxRepChar = " + maxRepChar);
}
private static void processor(String test) {
Matcher repeatsMatcher = Pattern.compile("(.)\\1+").matcher(test);
while (repeatsMatcher.find()) {
String character = repeatsMatcher.group(1);
int repLength = repeatsMatcher.group(0).length();
if ( !repeats.containsKey(character) || repeats.get(character) < repLength) {
repeats.put(character, repLength);
if (repLength > (repeats.get(maxRepChar) == null ? 0 : repeats.get(maxRepChar) )) {
maxRepChar = character;
}
}
}
Matcher startsMatcher = Pattern.compile("^(.)\\1*").matcher(test);
if(startsMatcher.find()) {
starts.add(startsMatcher.group(0));
}
Matcher endsMatcher = Pattern.compile("(.)\\1*$").matcher(test);
if (endsMatcher.find()) {
ends.add(endsMatcher.group(0));
}
}
}
Append the input strings, then do the following on the resulting string:
cnt=1;
max=INT_MIN;
for(size_t i=0;i<strlen(appendStr);i++)
{
if(appendStr[i]==appendStr[i+1])
cnt++;
else
{
if(cnt>max)
{
max=cnt;
ch=appendStr[i-1];
}
}
printf("%d %c",max,ch);
free(appendStr);
}
This is my solution for the question in ruby:
def permutation_longest_sequence(str_list)
permutations = str_list.permutation.map &:join
max_sequence_count = Hash.new()
permutations.each do |combined_str|
previous_char = nil
sequence_count = 0
combined_str.split("").each do |char|
if previous_char.nil?
previous_char = char
next
end
if previous_char == char
sequence_count+=1
else
if max_sequence_count[previous_char].nil?
max_sequence_count[previous_char] = sequence_count
else
max_sequence_count[previous_char] = sequence_count if max_sequence_count[previous_char] < sequence_count
end
sequence_count = 1
end
previous_char = char
end
end
max_value = max_sequence_count.values.max
max_sequence_count.select{|k, v| v == max_value}
end
I would do it in a simple yet very efficient manner:
for each permutation of the set of strings
find out the longest consecutive sequence of the char
Below is the complete working code for the above algorithm:
#include <stdio.h>
#include <string>
#include <vector>
#include <algorithm>
using std::string;
using std::vector;
typedef vector<string> svec;
bool longest_sequence(const string& data, int& count, char& letter)
{
count = 0;
letter = 0;
if (data.length() == 0) {
printf("Data length: 0\n");
return false;
}
int slprev = -1;
char chprev;
int slcurr = -1;
char chrcurr;
for (int i = 1; i < data.length(); ++i){
if (-1 == slcurr) {
slcurr = 1;
chrcurr = data[i];
}
else {
if (data[i] == chrcurr) {
++slcurr;
}
else { // point where the current char differs from the last one
// if the previous is not initialized, then do it now
if (-1 == slprev) {
slprev = slcurr;
chprev = chrcurr;
}
else {
// we got the previous stored already, so compare the
// length and replace the previous with the current
// if the count of the current > previous
if (slprev < slcurr) {
slprev = slcurr;
chprev = chrcurr;
}
}
// Initialize the new current values
chrcurr = data[i];
slcurr = 1;
}
}
}
// Do the final processing of the current and the previous values
if (slcurr > slprev) {
count = slcurr;
letter = chrcurr;
}
else {
count = slprev;
letter = chprev;
}
return true;
}
string join_data(const svec& data)
{
string ret = "";
for(svec::const_iterator it = data.begin(); it != data.end(); ++it){
ret += *it;
}
return ret;
}
void print_longest_perm_string_seq(svec& items)
{
if (items.size() == 0) {
printf("Empty list of strings\n");
return;
}
int lchrs = 0;
char chr;
string strret = "";
string str = join_data(items);
printf("PERM:\t%s\n", str.data());
longest_sequence(str, lchrs, chr);
strret = str;
while(std::next_permutation(&items[0], &items[0] + items.size())){
int lc;
char chrc;
str = join_data(items);
printf("PERM:\t%s\n", str.data());
if (longest_sequence(str, lc, chrc)) {
if (lc > lchrs) {
lchrs = lc;
chr = chrc;
strret = str;
}
}
}
if (lchrs > 0)
printf("STRING: %s LONGEST CHAR SEQUENCE: %c %d\n", strret.data(), chr, lchrs);
}
int main(int argc, char* argv[])
{
svec items;
items.push_back("aac");
items.push_back("ab");
items.push_back("ba");
print_longest_perm_string_seq(items);
return 0;
}
Use 3 HashMaps:
(1) First one keeps track of the length of the longest prefix consisting of a given character
(2) Second one keeps track of the length of the longest suffix consisting of a given character
(3) Third one keeps track of the TOTAL length of strings that consist entirely of a given character
So for {aa, aac, ba, aaa}:
(1) First one has {a:2, b:1}
(2) Second one has {c:1, a:1}
(3) Third one has {a:5}
The algorithm consists of 2 steps:
(1) Process each string and update the 3 HashMaps above. Also process the characters in the "middle", after dealing with the prefix & suffix. The characters in the middle matter too because it might actually contain the longest string consisting of the same character.
(2) At the end, for each possible character, sum the counts returned by the 3 HashMaps and see which one is longest.
Code below. I tried cleaning it up already, but still kinda messy and not something I could have written on a whiteboard.
- Sunny June 25, 2013