This is a two part question re

Google Interview Question for Software Engineers

0

of 0 votes

9
Answers
This is a two part question related to Markov string generation.

Part 1: Read a training set of strings. For each character in any of the strings, calculate the probability of seeing that character and store it for later use. (this part is about coming up with the right data structure and correct probability calculation).

Part 2: Based on the training set from Part 1, generate a random string of length N.
- Lively March 25, 2015 in United States | Report Duplicate | Flag | PURGE
Google Software Engineer

Email me when people comment.

An error occurred in subscribing you.

Country: United States
Interview Type: In-Person

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 5 vote

Construct a directed graph that starts with a space (" ") node. As you read through the training data, insert a new node for each new character (or update existing nodes). If you have the word "bat", then the graph would be " " -> "b" -> "a" -> "t" -> " ".

After you're done processing the training data, your graph will be a state machine where given any node/state, you have the probabilities to reach any other reachable states.
Ex. If you are at node corresponding to letter "c", and there are 2 links to "a", 3 links to "e", and 1 link to " ", then you know that the next letter will be P("a") = 2/6, P("e") = 3/6, P(" ") = 1/6. Calculate the one to choose, then move to it.

Note: If you have multiple links between two nodes, you wouldn't actually store the same links copied over and over. Instead store a count for each link.

This mechanism has the benefit of finding the next letter based on the current letter you're on rather than a straight up probability of overall letter occurrences, thereby better approximating the behaviour of the training data.

- JW March 26, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

How about part 2 ? Would you traverse the nodes calling a random() function and selecting the next char based on their probability match ?

- guilhebl April 17, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 2 vote

Let's say we are dealing with ASCII characters. Keep matrix 258x258. 256 for each possible character and 2 special ones: beginning of the string and the end of the string.

As you walk each string from beginning to the end, note every adjacent pair of characters, including the imaginary character before the string - beginning - and after - end.
Increment corresponding cell in the matrix [int representation of first char][int representation of second char]

Generating:
Look at the row corresponding to beginning symbol and chose a cell randomly based on its weight, which is a cell value divided by total in the row. Its column is next char.

Repeat until encounter end symbol.

- CT March 25, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

String[] strings = { "Read a set of training strings", "For each character in any of the strings", "calculate the probability of seeing that character" };
int[] counts = new int[27];
int total = 0;
for (String s : strings) {
	total += s.length();
	for (int i=0; i<s.length(); i++) {
		int index = (int)Character.toLowerCase(s.charAt(i))-97;
		index = index < 0 ? 26 : index;
		counts[index]++;
	}
}
for (int i=1; i<counts.length; i++) counts[i] += counts[i-1];
int n = 25;
StringBuilder sb = new StringBuilder();
for (int i=0; i<n; i++) {
	int r = (new Random()).nextInt(total);
	for (int j=0; j<counts.length; j++) {
		if (r > counts[j]) continue;
		int value = j == 26 ? ' ' : j+97;
		sb.append((char)(value));
		break;
	}
}
		System.out.println(sb.toString());

- JP March 25, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

- Keep a sorted list with the chars in frequency order and the number of occurrences
- Generate a random number between 0 and total number of characters
- Go through the list by decrementing the number above with the number of occurrences
- When negative - the current characters is the randomly generated one

C++:

#include <iostream>
#include <cstdlib>

using namespace std;

struct Node {
    Node *next;
    int  data, count;
    Node(int d, int c = 1) : data(d), count(c) {}
};

Node* addData(Node *list, int data) {
    if (!list) {
        return new Node(data);
    }
    
    Node *cur = list, *last = 0, *l=0;
    while (cur) {
        if (cur->data == data) {
            cur->count++;
            if (last && last->count < cur->count) {
                int v = last->data; last->data = cur->data; cur->data = v;
                v = last->count; last->count = cur->count; cur->count = v;
            }
            return list;
        }
        if (!last || last->count != cur->count) {
            last = cur;
        }
        l = cur;
        cur = cur->next;
    }
    // Not found - append
    l->next = new Node(data);
    
    return list;
}

char getRandChar(Node *list, int total) {
    
    int r = rand()%total;
    while (r >= list->count) {
        r -= list->count;
        list = list->next;
    }
    return (char)list->data;
}

int main() {

    int n = 10, count = 0;
    Node *list = 0;
    string strings[] = {"akdjfsfdf", "sdfsdfsdf", "sdfsdfsdfdsf", "aaaasasasaa"};
    
    for (string s : strings) {
        for (char c : s) {
            list = addData(list, c);
            count++;
        }
    }
    
    srand (time(NULL));
    while (n--) {
        cout << getRandChar(list, count);
    }
}

- nikolay.dimitrov March 26, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I used an array of 256 strings to store the frequencies. When character C(cur) is followed by C(next), add C(next) to the string Array[C(cur)].

When creating the output for part 2, just choose a random character in the current string to find the next one.

void main(){
   while (1){
      std::string freqencies[256], input;
      int cnt;
      std::cout << "Enter a string\n";
      std::getline(std::cin, input); 
      input += ' ';
      std::cout << "Enter output count\n";
      std::cin >> cnt; std::cin.ignore();
      
      char cur = ' ';
      for (char c : input){
         freqencies[cur] += tolower(c);
         cur = tolower(c);
      }
      cur = ' ';
      for (int i = 0; i < cnt; i++) {
         cur = freqencies[cur].at(rand() % freqencies[cur].size());
         std::cout << cur;
      }   
      std::cout << '\n';
   }
}

output:

Enter a string
This is Great! Isn't it? I think it is a great solution! That guy tjcbs2 deserves an upvote for his amazing effort.
Enter output count
100
hisort. teamazis fonk in! hit gupvot ang in is sn is erthison't guy t t thazi upvortjcbs foresn

- tjcbs2 March 27, 2015 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

funny username btw.

- tjcbs2 March 27, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

Hi, your solution is very elegant, but not scalable ... The total size of the 256 strings = the size of the input strings. And we may expect a lot of input strings, for the probabilities to be statistically significant. It makes more sense to store only the counters, or the probabilities as proposed by the other solutions.

- ranan.fraer March 27, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 votes

I don't understand your objection. I am basically making a copy of the input string, plus a bit of overhead. Hardly seems unreasonable, or unscalable.

- tjcbs2 March 27, 2015 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

class CharcaterProb {
	static CharcaterProb[] container = new CharcaterProb[26];
	static CharcaterProb getChar(char c) {
		return container[c - 'a'];
	}
	static int count = 0;
	char c;
	int frequency;
	CharcaterProb(char c) {
		this.c = c;
		container[c - 'a'] = this;
		touch();
	}

	static void touch(char c) {
		this.frequency ++;
		count++;
	}
}

void precalc(Stirng[] set) {
	for (String entry: set) {
		for (char c : entry.toCharArray()) {
			if ( CharcaterProb.getCar(c) == null ) {
				new CharacterProb(c);
			} else {
				CharcaterProb.getCar(c).touch();
			}
		}
	}
}

generateRandomString(int n) {
	char[] container = new char[n];
	int l = 0;
	for (char c = 'a'; c <= 'z'; c++)  {
		int freq = CharacterProb.getChar(c);
		while (freq--) {
			container[l++] = c;
		}
	}
	char[] ans = new char[n];

	for ( int i =0; i < n; i++ ) {
		int rnd = rand.nextInt(0, CharcaterProb.count - 1);
		ans[i] = container[rnd];
	}
}

- gime July 06, 2016 | Flag Reply

CareerCup

Google Interview Question for Software Engineers

Books

Videos

Resume Review

Mock Interviews