## Google Interview Question for Software Engineer / Developers

Country: United States

Comment hidden because of low score. Click to expand.
3
of 3 vote

Better formatting

``````Root	N1	N2	N3	N4
Root		1	1	1	1	1
N1		1	0	0	0	0
N2		1	0	0	0	0
N3		1	0	0	0	0
N4		1	0	0	0	0

Sends email to N1; Sends email to N3 (2 different emails)
Root	N1	N2	N3	N4
Root		1	2	1	2	1
N1		2	0	0	0	0
N2		1	0	0	0	0
N3		2	0	0	0	0
N4		1	0	0	0	0

Sends email to N1, N2 (same email)

Root	N1	N2	N3	N4
Root		1	3	2	2	1
N1		3	0	1	0	0
N2		2	1	0	0	0
N3		2	0	0	0	0
N4		1	0	0	0	0

Sends email to N1, N2, N3

Root	N1	N2	N3	N4
Root		1	4	3	3	1
N1		4	0	2	1	0
N2		3	2	0	1	0
N3		3	1	1	0	0
N4		1	0	0	0	0``````

Comment hidden because of low score. Click to expand.
0

How do we persist this data to disk?

Comment hidden because of low score. Click to expand.
0

If there are 10000 contacts, we need to store 10^8 values. Is there a better way?

Comment hidden because of low score. Click to expand.
0

we can use sparse matrix to contain the data.

Comment hidden because of low score. Click to expand.
0

Comment hidden because of low score. Click to expand.
1
of 1 vote

What about a Trie as the data structure ?
We can display all the nodes reachable from a traversed ancestor.

Comment hidden because of low score. Click to expand.
0

Can you elaborate?

Comment hidden because of low score. Click to expand.
0

When we compose a new mail in gmail, it simply displays the emails with matching prefixes. For this purpose trie is a good solution. i too think that was the intention of the question

Comment hidden because of low score. Click to expand.
1
of 1 vote

Here's a simple approach. For each email sent, we note the recipients. Suppose the recipients are A,B,C. We then consider all the pairs among them (order matters). In this case the pairs are (A, B) (B, A) (A, C) (C, A) (B, C) and (C, B). We then use a hashtable (call it ht) to record these pairs. For pair (A, B), we first set ht[A][B] to 0 if this value doesn't already exist. Then we increment ht[A][B] by one. Similarly for all the other pairs.

Now suppose the user is composing an email to recipients A & B. We then examine all the values in ht[A] and ht[B]. Each value will basically be a (recipient, count) pair such as (C, 3) or (D, 5). We then combine pairs with the same recipients, like (C, 3) and (C, 2) into (C, 5). Finally, we sort these pairs by the count and simply suggest say the top 5 recipients.

This is quite inefficient and also treats an email with recipients A, B, C the same as 3 emails with recipients A & B and A & C and B & C, so it discounts the significance of A, B, C occurring together in that email.

Comment hidden because of low score. Click to expand.
0
of 0 vote

Weighted connected graph.

Suppose there are n emails, the graph will have (n+1) nodes (+1 is for the sender's email). Initially all the edges have weight 0 except the ones connected to the root (which is the sender's email).

First we need to "train" the data (like in machine learning) or in simple words build the graph. For the first few emails, we keep incrementing the weights and after the training is done we will be suggesting the emails to add.

For example: Suppose we have 4 emails then the connected graph will have 5 nodes with the following weights

``````Root	N1	N2	N3	N4
Root		1	1	1	1	1
N1		1	0	0	0	0
N2		1	0	0	0	0
N3		1	0	0	0	0
N4		1	0	0	0	0

Sends email to N1; Sends email to N3 (2 different emails)
Root	N1	N2	N3	N4
Root		1	2	1	2	1
N1		2	0	0	0	0
N2		1	0	0	0	0
N3		2	0	0	0	0
N4		1	0	0	0	0

Sends email to N1, N2 (same email)

Root	N1	N2	N3	N4
Root		1	3	2	2	1
N1		3	0	1	0	0
N2		2	1	0	0	0
N3		2	0	0	0	0
N4		1	0	0	0	0

Sends email to N1, N2, N3

Root	N1	N2	N3	N4
Root		1	4	3	3	1
N1		4	0	2	1	0
N2		3	2	0	1	0
N3		3	1	1	0	0
N4		1	0	0	0	0``````

Coming to predicting/training. When the sender wants to send email to N1. You will see what N1 is connected to and suggest the highest weighted Node (in this case N2) and then N3 until there are no more nodes. And again you will add weights to the corresponding nodes.

Comment hidden because of low score. Click to expand.
0

won't there be n^2 nodes. in which every node is connected to every other node. unless we represent it as a matrix.

Comment hidden because of low score. Click to expand.
0
of 0 vote

My approach would like this: Each E-mail is represented as a node in a graph. When a new E-mail is encountered, it is added to the graph. After an E-mail is submitted to be sent, all of the recipients addresses are connected to each other in the graph via edges.

It is a good idea to assign a weight to these edges - let's say for each iteration of the above process the edges are strengthened by 1. Every time an E-mail is sent to a node (recipient), we will also check which connected nodes were NOT E-mailed, and decrement the strength of those edges by say 0.5 - this prevents situations where we keep recommending E-mails to A + B when the user rarely E-mails A+B together. We should also have a maximum limit to the edge weight, say 3.

Now, for the situations where we have A+B+C and we want to remember and suggest this association - when the user enters A+B, we can see that A,C are connected and B,C are connected. For each connection of every recipient, put the associated nodes in a bucket (say, a hash map) and aggregate the weights of the edges (in this case C gets stronger since it occurs twice). Now we iterate over the map and suggest the nodes with the highest value.

Comment hidden because of low score. Click to expand.
0
of 0 vote

I think the simplies solution for this is a dictionary of dictionaries. We need to quickly find relations for a given key (email), so dictionary with key by email. We might have selveral relations and we need to store weight for each of them, so another dictionary.
I've created a small sample in C#:

``````public class RelatedContacts
{
private readonly Dictionary<string, Dictionary<string, int>> _storage;

public RelatedContacts (IEnumerable<string> emailRecipients)
{
_storage = new Dictionary<string,Dictionary<string,int>>(StringComparer.InvariantCultureIgnoreCase);

foreach (var recipients in emailRecipients)
{
var emails = recipients.Split(new[] { ';', ',' }, StringSplitOptions.RemoveEmptyEntries);
if (emails.Length < 2) continue;

for (var i = 0; i < emails.Length; i++)
{
for (var j = i + 1; j < emails.Length; j++)
{
relate(_storage, emails[i].Trim(), emails[j].Trim());
relate(_storage, emails[j].Trim(), emails[i].Trim());
}
}
}
}

private static void relate(Dictionary<string, Dictionary<string, int>> relations, string emailA, string emailB)
{
Dictionary<string, int> contacts;
if (!relations.TryGetValue(emailA, out contacts)) contacts = new Dictionary<string, int>(StringComparer.InvariantCultureIgnoreCase);
int weight = 0;
contacts.TryGetValue(emailB, out weight);
contacts[emailB] = weight + 1;
relations[emailA] = contacts;
}

public IEnumerable<string> Get(string email)
{
Dictionary<string, int> contacts;
if (!_storage.TryGetValue(email, out contacts)) return Enumerable.Empty<string>();
return contacts.OrderBy((_) => _.Value).Select(_ => _.Key);
}

public IEnumerable<string> Get(IEnumerable<string> emails)
{
Dictionary<string, int> result = new Dictionary<string, int>();
Dictionary<string, int> contacts;
foreach (var email in emails)
{
if (!_storage.TryGetValue(email, out contacts)) continue;
foreach (var pair in contacts)
{
int weight = 0;
result.TryGetValue(pair.Key, out weight);
result[pair.Key] = weight + pair.Value;
}
}

return result.OrderBy((_) => _.Value).Select(_ => _.Key);
}
}``````

Comment hidden because of low score. Click to expand.
0

Ok, probably it's too much code in my solution without explanation.
So, it's a directed weighted graph, represented as sparse matrix where vertex is email and for each vertex we store a list of adjacent vertexes with weights.
Data may be persisted as 2 files or tables:
1. list of vertexes: (email, int key) pairs
2. one to many relation of int vertex key to pair of another vertex key and weight

Some improvements might be suggested to my initial solution:
1. If the total number of distinct emails is big it's better to create a help dictionary to associate email with int key and use it to represent vertex in a graph (i.e. dictionary in the code above). Because it's cheaper to store int than pointer to string.
2. If average number of related contacts is not big, we could use a list to store adjacent vertexes. This trade off will be slower on adding related contacts, but will consume less memory.
3. Initial solution is done as immutable class, but it's not hard to add method for adding relations.

Comment hidden because of low score. Click to expand.
0
of 0 vote

HashMap<String,Histogram> contact2contacts.

Histograms interface:
void countContact(String contact);
List<String> getContacts() // in the order from higher count

On every send and receive email, for each pair of contacts in the same email, get Histogram form HashMap for first and count the second and vice versa.

Comment hidden because of low score. Click to expand.
0
of 0 vote

I think for each person, the contact list will not so large, so I think brute string match can handle this problem.

Comment hidden because of low score. Click to expand.
0
of 0 vote

``````import java.util.*;
public class TrieNode {
public char Char;
public List<String> texts;
}

import java.util.*;
public class Trie {
public Trie()
{
if ( head == null ) {
}
}
public TrieNode constructNode(char character)
{
TrieNode aNode = new TrieNode();
aNode.Char = character;
aNode.texts = new ArrayList<>();
return aNode;
}
public void dumpString(String name)	{
String[] names = name.split(" ");
for(String aname : names){
injectIntoTrie(aname.toLowerCase(),name.toLowerCase());
}
}
public void injectIntoTrie(String element,String text)	{
for(int i=0;i<element.length();i++){
char character = element.charAt(i);
System.out.println("Cannot find char - creating new " + character);
}
else
{
System.out.println("Found char - moving on " + character);
}
}
}
public void findTextSimilar(String prefix) {
for(int i=0;i<prefix.length();i++){
char character = prefix.toLowerCase().charAt(i);
if ( start == null ){
break;
}
}
for( String text : start.texts){
System.out.println("Prefix String " + text);
}
traverseNode(start);
}
public void traverseNode(TrieNode start) {
System.out.println(String.format("%c",'a'+i));
}
for(String text : start.texts){
System.out.println("Prefix String " + text);
}
}
public static void main(String[] args)
{
Trie t = new Trie();
t.dumpString("Naveen Venkatesana");
t.dumpString("Naveeen Benkatesan");
t.dumpString("Naven");
t.dumpString("Venkatesan");
t.findTextSimilar("ben");
}
}``````

Comment hidden because of low score. Click to expand.
0
of 0 vote

Not sure it's the best solution for storage, but what you could do is have a table of the form EmailID, From, To <-- this is only one person.

So say A sends first email to P1, P2 and P3,

then you store three records:
1, A, P1
1, A, P2
1, A, P3

as A sends more emails:

1, A, P1
1, A, P2
1, A, P3
2,A,P1
2,A,P3
3,A,P2
3,A,P4
3,A,P3
4,A,P2
4,A,P3

It would then be easy to get all the weights for each edge.

As for the Data Structure, I'd probably use an Adjacency List, since using a Matrix could be a lot more expensive memory-wise (suppose a 1000 contact list, you'd have 1M fields).

Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

### Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

### Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.