## Facebook Interview Question for Software Engineers

Country: United States

Comment hidden because of low score. Click to expand.
3
of 3 vote

I think y'all are overthinking this; it says the similarity function is a given. This question is about how you manage building the similarity dictionary (whoops I gave it away).

Comment hidden because of low score. Click to expand.
1
of 1 vote

``````//Levenshtein Algo
public static void main(String [] args) {
String a [] = {"tasneem@gmail.com", "tas@hotmail.com", "abc@gmail.com", "abc@hotmail.com", "apc@yahoo.com"};
HashMap<Integer, ArrayList<String>> similarElementsMap = new HashMap<>();
for (int i = 0 ; i < a.length - 1; i ++) {
int d = distance(a[i], a[i+1]);
if (similarElementsMap.containsKey(d)) {
ArrayList<String> list = similarElementsMap.get(d);
} else {
ArrayList<String> list = new ArrayList<>();
similarElementsMap.put(d, list);
}
}

for (Entry<Integer, ArrayList<String>> map : similarElementsMap.entrySet()) {
System.out.println("For distance : "+map.getKey());
for (int i = 0 ; i < map.getValue().size(); i ++) {
System.out.println(map.getValue().get(i));
}
}

}

private static int distance(String a, String b) {

a = a.substring(0, a.indexOf('@')).toLowerCase();
b = b.substring(0, b.indexOf('@')).toLowerCase();
// base case
if (a.length() == 0) return b.length();
if (b.length() == 0)return a.length();

int cost[] = new int[b.length() + 1];
// Initialise cost with each indexes initialy
for (int i = 0 ; i < b.length(); i ++) {
cost[i] = i;
}

// Iterate through a and b string and update cost array with each a iteration
for (int i = 1; i <= a.length(); i ++) {
cost = i;
int k = i - 1;
for (int j =1; j <= b.length(); j ++) {
int c = Math.min(1 + Math.min(cost[j], cost[j-1]), a.charAt(i-1) == b.charAt(j-1) ? k : k + 1);
// Swap the min value with cost and k
k = cost[j];
cost[j] = c;
}
}
return cost[b.length()];``````

}

Comment hidden because of low score. Click to expand.
1
of 1 vote

Python simple solution:

``````def groupEmails(emails):
buckets = []
for email in emails:
for bucket in buckets:
if isSimilar(email, bucket):
bucket.append(email)
break
else:
buckets.append([email])
return buckets``````

Comment hidden because of low score. Click to expand.
0
of 0 vote

Weighted Quick Union + Path Compression
Complexity: N + M lg* N. M unions (or similarities) on N Emails.

Comment hidden because of low score. Click to expand.
0

Even if we assume transitivity of the similarity function, how do you reduce the number of unions to be performed (M in this case) to below O(N^2)?

Comment hidden because of low score. Click to expand.
0
of 0 vote

We can use the Levenshtein distance or edit distance algorithm. More the levenshtein distance, more dissimilar the email IDs are. We have to make sure, we do not compare the TLDs i.e. ignore @gmail.com and @hotmail.com

For instance, "hello" and "jello" have LD as 1. So, quite similar. However, "good" and "goodbye" will have LD as 3, so slight dissimilar.

Comment hidden because of low score. Click to expand.
0
of 0 vote

We can use Levenshtein distance algorithm. More the LD distance, more dissimilar the words.

Comment hidden because of low score. Click to expand.
0
of 0 vote

``````//Levenshtein Algo
public static void main(String [] args) {
String a [] = {"tasneem@gmail.com", "tas@hotmail.com", "abc@gmail.com", "abc@hotmail.com", "apc@yahoo.com"};
HashMap<Integer, ArrayList<String>> similarElementsMap = new HashMap<>();
for (int i = 0 ; i < a.length - 1; i ++) {
int d = distance(a[i], a[i+1]);
if (similarElementsMap.containsKey(d)) {
ArrayList<String> list = similarElementsMap.get(d);
} else {
ArrayList<String> list = new ArrayList<>();
similarElementsMap.put(d, list);
}
}

for (Entry<Integer, ArrayList<String>> map : similarElementsMap.entrySet()) {
System.out.println("For distance : "+map.getKey());
for (int i = 0 ; i < map.getValue().size(); i ++) {
System.out.println(map.getValue().get(i));
}
}

}

private static int distance(String a, String b) {

a = a.substring(0, a.indexOf('@')).toLowerCase();
b = b.substring(0, b.indexOf('@')).toLowerCase();
// base case
if (a.length() == 0) return b.length();
if (b.length() == 0)return a.length();

int cost[] = new int[b.length() + 1];
// Initialise cost with each indexes initialy
for (int i = 0 ; i < b.length(); i ++) {
cost[i] = i;
}

// Iterate through a and b string and update cost array with each a iteration
for (int i = 1; i <= a.length(); i ++) {
cost = i;
int k = i - 1;
for (int j =1; j <= b.length(); j ++) {
int c = Math.min(1 + Math.min(cost[j], cost[j-1]), a.charAt(i-1) == b.charAt(j-1) ? k : k + 1);
// Swap the min value with cost and k
k = cost[j];
cost[j] = c;
}
}
return cost[b.length()];``````

}

Comment hidden because of low score. Click to expand.
0
of 0 vote

Given people already are massively overthought this - and they did not think what should be thought about : en.wikipedia.org/wiki/Similarity_measure.
Take a pause and think this.
Is it transitive? That is :

``A similar_to B  and B similar_to C ==>(implies) A similar_to C``

does it hold true? 99.999% of the real practical cases, it won't be.
Take name for example.
ram : ok, male
rama : ok, male again perhaps
ramaa : ambiguous - perhaps it is a female name ? Now thinking backward,
ramaa --> rama :: hmm. See the problem there?
what about roma? ramaa and roma are close but is ram and roma close? Well.

That... pose a problem.
How did we get it? By working in system - where close name matching may point to same individual - facebook is another classic example. Same problem persists in the Indian addresses.

Thus, we need to ask, is the similarity function transitive? That would have clear impact on any algorithm we try to apply.

Comment hidden because of low score. Click to expand.
0
of 0 vote

I assumed that traistive rule doesnt work in this case (As NoOne mentioned it earlier). So that means I have to compare each emailaddress for similarity. so the brute force takes the complexity of O(n^2)

Here is a sample code, Is there any better solution

``````List<HashSet<string>> allSets = new List<HashSet<string>>();
for(int i = 0; i < emailIds.Length; ++i)
{
HashSet<string> currentSet = new HashSet<string>();
for (int j = 0; j < emailIds.Length; ++j)
{
if(i == j)
{
//same check
continue;
}
if(true == IsSimilar(emailIds[i], emailIds[j]))
{
}
}

if(currentSet.Count == 1)
{
// there is no similar email id to this, so let just remove it
currentSet.Remove(emailIds[i]);
}
}``````

Comment hidden because of low score. Click to expand.
0
of 0 vote

Let's suppose the interviewer says it /is/ transitive, i.e. A=B, B=C --> A=C. And let's say they only want sets with multiple emails, i.e. no singleton sets need be kept.

``````LinkedList<String> input = ...;
while (!input.isEmpty()) {
// pop one item
String current = input.remove(0);// (if array list, take last instead)
HashSet<String> equal = new HashSet<>();// (or ArrayList)
// transfer any similar emails
Iterator<String> it = input.iterator();
while (it.hasNext()) {
String s = it.next();
if (isSimilar(current,s)) {
it.remove();// fast for LinkedList (otherwise careful)
}
}
// only output non-singleton sets
if (!equal.isEmpty()) {
}
}``````

Comment hidden because of low score. Click to expand.
0
of 0 vote

Let's suppose the interviewer says it /is/ transitive, i.e. A=B, B=C --> A=C. And let's say they only want sets with multiple emails, i.e. no singleton sets need be kept.

``````LinkedList<String> input = ...;
while (!input.isEmpty()) {
// pop one item
String current = input.remove(0);// (if array list, take last instead)
HashSet<String> equal = new HashSet<>();// (or ArrayList)
// transfer any similar emails
Iterator<String> it = input.iterator();
while (it.hasNext()) {
String s = it.next();
if (isSimilar(current,s)) {
}
}
// only output non-singleton sets
if (!equal.isEmpty()) {
}
}``````

Comment hidden because of low score. Click to expand.
0
of 0 vote

``````object Solution {

import scala.collection._

type Email = String

val sc = new java.util.Scanner (System.in)
var n = sc.nextInt()
var emails = new Array[String](n)
for(i <- 0 until n) {
emails(i) = sc.next()
}

emails.toList
}

def isSimilar(email1: Email, email2: Email): Boolean = {
val maxDistance = 2

def countify(str: String): Map[Char, Int] = {
str.groupBy(identity).mapValues(_.length)
}

def distance(str1: String, str2: String): Int = {
val (c1, c2) = (countify(str1), countify(str2))
val (s1, s2) = (c1.keySet, c2.keySet)
val all = (s1 | s2).toList

all.map(x => math.abs(c1.getOrElse(x, 0) - c2.getOrElse(x, 0))).sum / 2
}

distance(email1, email2) <= maxDistance
}

def groupBySimilarity(emails: List[Email], similarFunc: (Email, Email) => Boolean): List[Set[Email]] = {

val res = emails.foldLeft(Map.empty[Email, mutable.Set[Email]])( (map, email) => {
val key = map.keys.filter(k => similarFunc(k, email)).headOption
if (key.isDefined) {
map(key.get) += email
map
} else {
val set = new mutable.HashSet[Email]()
set += email
map + (email -> set)
}
})

res.mapValues(_.toSet).values.toList
}

def main(args: Array[String]) {
val grouped = groupBySimilarity(emails, isSimilar)
grouped.foreach(g => println(g.mkString(" ")))
}

}``````

Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

### Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

### Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.