## Interview Question for Senior Software Development Engineers

• 0
of 0 votes

Country: United States
Interview Type: Written Test

Comment hidden because of low score. Click to expand.
0
of 0 vote

If by tokens you mean words, keep a hash table to maintain count. If you mean characters, keep an array of size 256 to maintain count.

Comment hidden because of low score. Click to expand.
0
of 0 votes

To find k most frequent, use selection algorithm.

Comment hidden because of low score. Click to expand.
0
of 0 votes

1. Hash table: Because of the possibility of collisions, each insertion could take O(m) time, where m is the number of tokens or substrings, which is O(n). So complexity of inserting all tokens is worst-case O(m^2).
2. As for selection, selecting the kth element without sorting is O(n) in the worst case, but you have to do that O(m) times, m being the number of words/tokens.

I decided that an approach using suffix trees is the best for this problem. Suffix trees can be constructed in O(n) time, n being the size of the text string. One doesn't have to tokenize it beforehand - it is done for free when inserting the whole text into the tree. At the end node of each word, store pointers to the ends of the word in the text and a count of the current number of occurrences. At the end collect all these <word, # of occurrences> pairs, and do a radix sort. We thus end up with an O(n) algorithm.

Any comments are welcome.

Comment hidden because of low score. Click to expand.
0
of 0 votes

I dont think Alex solution is so bad, he is asking to maintain count not insert in hash table, which should be constant theoretically, practically unless I insert same set of keys, collision will hardly occur so this works as O(n). Coming to selection you can use max heap of size k, to get max k elements in a constant time. nlog(k) is complexity, in a worst case of k=n(or approaching n), heap insertion becomes constant.

By the way, what is the interviewer response to your solution

Comment hidden because of low score. Click to expand.
0
of 0 vote

Use a hash method and count each duplicate tokens upon hashing. For each hashed item count the duplicate, and the item which has maximum number of hash slot encountered should be the frequently repeated item, producing complexity of O(n).

Comment hidden because of low score. Click to expand.
0
of 0 vote

Code is as below-

``````import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

/**
* @author kedar dixit
*
*/
public class WordCount {

public static void main(String[] args) {
Scanner scanner = new Scanner(System.in);
System.err.println("Enter A String");
String string = scanner.nextLine();
Map<String, Integer> map = new HashMap<>();
String[] tokens = string.split(" ");
int maxVal = 0;
for (String token : tokens) {
map.put(token, (map.get(token) == null ? 0 : map.get(token)) + 1);
if (maxVal < map.get(token)) {
maxVal = map.get(token);
}
}
System.err.println(map);
System.out.println(maxVal);
}
}``````

Sample Input and Output-

Enter A String
Enter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A StringEnter A String
{A=28, String=1, Enter=1, StringEnter=27}
28

Comment hidden because of low score. Click to expand.
0
of 0 votes

That's not what the question is asking. It needs to print out the top K most frequent tokens where K can be 1, 2, 3, 4... Your solution just prints out the maximum occurrrence of a token.

Comment hidden because of low score. Click to expand.
0
of 0 vote

need clarification:
input: 121221312
output: 1 , 2 , 12 with tokenization 1 2 12 2 1 3 12 ??

input: 1212213
output: 1 , 2 , 3 with tokenization 1 2 1 2 2 1 3 ??

Comment hidden because of low score. Click to expand.
0
of 0 votes

The usual delimiters for tokens apply, so the string "121221312" is simply one token, whereas "1 2 1 2 2 1 3 12" would be 8 tokens, and the output would be "1 2 12 3".

I forgot to say that when two strings are equally frequent, they are reported in lexicographic order.

Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

### Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

### Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

### Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

### Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More