Given some email ids, and a si

Facebook Interview Question for Software Engineers

1

of 1 vote

13
Answers
Given some email ids, and a similarity function which says whether two email ids are similar, determine all the sets of email ids that are similar to each other.
- Ray March 28, 2017 in United States | Report Duplicate | Flag | PURGE
Facebook Software Engineer Algorithm

Email me when people comment.

An error occurred in subscribing you.

Country: United States

Email me when people comment.

An error occurred in subscribing you.

Comment hidden because of low score. Click to expand.

of 3 vote

I think y'all are overthinking this; it says the similarity function is a given. This question is about how you manage building the similarity dictionary (whoops I gave it away).

- Honcho March 29, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

//Levenshtein Algo
	public static void main(String [] args) {
	    String a [] = {"tasneem@gmail.com", "tas@hotmail.com", "abc@gmail.com", "abc@hotmail.com", "apc@yahoo.com"};
	    HashMap<Integer, ArrayList<String>> similarElementsMap = new HashMap<>();
	    for (int i = 0 ; i < a.length - 1; i ++) {
	      int d = distance(a[i], a[i+1]);
	      if (similarElementsMap.containsKey(d)) {
	    	  ArrayList<String> list = similarElementsMap.get(d);
	    	  list.add(a[i]);
	    	  list.add(a[i+1]);
	      } else {
	    	  ArrayList<String> list = new ArrayList<>();
	    	  list.add(a[i]);
	    	  list.add(a[i+1]);
	    	  similarElementsMap.put(d, list);
	      }
	    }
	    
	    for (Entry<Integer, ArrayList<String>> map : similarElementsMap.entrySet()) {
			System.out.println("For distance : "+map.getKey());
			for (int i = 0 ; i < map.getValue().size(); i ++) {
				System.out.println(map.getValue().get(i));
			}
		}
	    
	  }
	  
	  
	  private static int distance(String a, String b) {
	   
	    a = a.substring(0, a.indexOf('@')).toLowerCase();
	    b = b.substring(0, b.indexOf('@')).toLowerCase();
	    // base case 
	    if (a.length() == 0) return b.length();
	    if (b.length() == 0)return a.length();
	    
	    int cost[] = new int[b.length() + 1];
	    // Initialise cost with each indexes initialy 
	    for (int i = 0 ; i < b.length(); i ++) {
	      cost[i] = i;
	    }
	    
	    // Iterate through a and b string and update cost array with each a iteration
	    for (int i = 1; i <= a.length(); i ++) {
	      cost[0] = i;
	      int k = i - 1;
	      for (int j =1; j <= b.length(); j ++) {
	        int c = Math.min(1 + Math.min(cost[j], cost[j-1]), a.charAt(i-1) == b.charAt(j-1) ? k : k + 1);
	        // Swap the min value with cost and k
	        k = cost[j];
	        cost[j] = c;
	      }
	    }
	    return cost[b.length()];

}

- Tasneem March 30, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 1 vote

Python simple solution:

def groupEmails(emails):
	buckets = []
	for email in emails:
		for bucket in buckets:
			if isSimilar(email, bucket[0]):
				bucket.append(email)
				break
		else:
			buckets.append([email])
	return buckets

- Galileo April 22, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Weighted Quick Union + Path Compression
Complexity: N + M lg* N. M unions (or similarities) on N Emails.

- naren March 28, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 votes

Even if we assume transitivity of the similarity function, how do you reduce the number of unions to be performed (M in this case) to below O(N^2)?

- axg April 13, 2017 | Flag

Comment hidden because of low score. Click to expand.

of 0 vote

We can use the Levenshtein distance or edit distance algorithm. More the levenshtein distance, more dissimilar the email IDs are. We have to make sure, we do not compare the TLDs i.e. ignore @gmail.com and @hotmail.com

For instance, "hello" and "jello" have LD as 1. So, quite similar. However, "good" and "goodbye" will have LD as 3, so slight dissimilar.

- puneet March 28, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

We can use Levenshtein distance algorithm. More the LD distance, more dissimilar the words.

- puneet March 28, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

//Levenshtein Algo
	public static void main(String [] args) {
	    String a [] = {"tasneem@gmail.com", "tas@hotmail.com", "abc@gmail.com", "abc@hotmail.com", "apc@yahoo.com"};
	    HashMap<Integer, ArrayList<String>> similarElementsMap = new HashMap<>();
	    for (int i = 0 ; i < a.length - 1; i ++) {
	      int d = distance(a[i], a[i+1]);
	      if (similarElementsMap.containsKey(d)) {
	    	  ArrayList<String> list = similarElementsMap.get(d);
	    	  list.add(a[i]);
	    	  list.add(a[i+1]);
	      } else {
	    	  ArrayList<String> list = new ArrayList<>();
	    	  list.add(a[i]);
	    	  list.add(a[i+1]);
	    	  similarElementsMap.put(d, list);
	      }
	    }
	    
	    for (Entry<Integer, ArrayList<String>> map : similarElementsMap.entrySet()) {
			System.out.println("For distance : "+map.getKey());
			for (int i = 0 ; i < map.getValue().size(); i ++) {
				System.out.println(map.getValue().get(i));
			}
		}
	    
	  }
	  
	  
	  private static int distance(String a, String b) {
	   
	    a = a.substring(0, a.indexOf('@')).toLowerCase();
	    b = b.substring(0, b.indexOf('@')).toLowerCase();
	    // base case 
	    if (a.length() == 0) return b.length();
	    if (b.length() == 0)return a.length();
	    
	    int cost[] = new int[b.length() + 1];
	    // Initialise cost with each indexes initialy 
	    for (int i = 0 ; i < b.length(); i ++) {
	      cost[i] = i;
	    }
	    
	    // Iterate through a and b string and update cost array with each a iteration
	    for (int i = 1; i <= a.length(); i ++) {
	      cost[0] = i;
	      int k = i - 1;
	      for (int j =1; j <= b.length(); j ++) {
	        int c = Math.min(1 + Math.min(cost[j], cost[j-1]), a.charAt(i-1) == b.charAt(j-1) ? k : k + 1);
	        // Swap the min value with cost and k
	        k = cost[j];
	        cost[j] = c;
	      }
	    }
	    return cost[b.length()];

}

- tasneem March 30, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Given people already are massively overthought this - and they did not think what should be thought about : en.wikipedia.org/wiki/Similarity_measure.
Take a pause and think this.
Is it transitive? That is :

A similar_to B  and B similar_to C ==>(implies) A similar_to C

does it hold true? 99.999% of the real practical cases, it won't be.
Take name for example.
ram : ok, male
rama : ok, male again perhaps
ramaa : ambiguous - perhaps it is a female name ? Now thinking backward,
ramaa --> rama :: hmm. See the problem there?
what about roma? ramaa and roma are close but is ram and roma close? Well.

That... pose a problem.
How did we get it? By working in system - where close name matching may point to same individual - facebook is another classic example. Same problem persists in the Indian addresses.

Thus, we need to ask, is the similarity function transitive? That would have clear impact on any algorithm we try to apply.

- NoOne March 31, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

I assumed that traistive rule doesnt work in this case (As NoOne mentioned it earlier). So that means I have to compare each emailaddress for similarity. so the brute force takes the complexity of O(n^2)

Here is a sample code, Is there any better solution

List<HashSet<string>> allSets = new List<HashSet<string>>();
            for(int i = 0; i < emailIds.Length; ++i)
            {
                HashSet<string> currentSet = new HashSet<string>();
                currentSet.Add(emailIds[i]);
                for (int j = 0; j < emailIds.Length; ++j)
                {
                    if(i == j)
                    {
                        //same check
                        continue;
                    }
                    if(true == IsSimilar(emailIds[i], emailIds[j]))
                    {
                        currentSet.Add(emailIds[j]);
                    }
                }

                if(currentSet.Count == 1)
                {
                    // there is no similar email id to this, so let just remove it
                    currentSet.Remove(emailIds[i]);
                }
                allSets.Add(currentSet);
            }

- The-Learner March 31, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Let's suppose the interviewer says it /is/ transitive, i.e. A=B, B=C --> A=C. And let's say they only want sets with multiple emails, i.e. no singleton sets need be kept.

LinkedList<String> input = ...;
List<HashSet<String>> output = new LinkedList<>();
while (!input.isEmpty()) {
	// pop one item
	String current = input.remove(0);// (if array list, take last instead)
	HashSet<String> equal = new HashSet<>();// (or ArrayList)
	// transfer any similar emails
	Iterator<String> it = input.iterator();
	while (it.hasNext()) {
		String s = it.next();
		if (isSimilar(current,s)) {
			equal.add(s);
			it.remove();// fast for LinkedList (otherwise careful)
		}
	}
	// only output non-singleton sets
	if (!equal.isEmpty()) {
		equal.add(current);//sic: add only if not empty
		output.add(equal);
	}
}

- gforman44 April 04, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

Let's suppose the interviewer says it /is/ transitive, i.e. A=B, B=C --> A=C. And let's say they only want sets with multiple emails, i.e. no singleton sets need be kept.

LinkedList<String> input = ...;
List<HashSet<String>> output = new LinkedList<>();
while (!input.isEmpty()) {
	// pop one item
	String current = input.remove(0);// (if array list, take last instead)
    HashSet<String> equal = new HashSet<>();// (or ArrayList)
	// transfer any similar emails
	Iterator<String> it = input.iterator();
	while (it.hasNext()) {
		String s = it.next();
		if (isSimilar(current,s)) {
			equal.add(s);
			it.remove();// fast for LinkedList
		}
	}
	// only output non-singleton sets
	if (!equal.isEmpty()) {
		equal.add(current);//sic: add only if not empty
		output.add(equal);
	}
}

- gforman44 April 04, 2017 | Flag Reply

Comment hidden because of low score. Click to expand.

of 0 vote

object Solution {
  
  import scala.collection._

  type Email = String
  
  def readInput(): List[Email] = {
    val sc = new java.util.Scanner (System.in)
    var n = sc.nextInt()
    var emails = new Array[String](n)
    for(i <- 0 until n) {
      emails(i) = sc.next()
    }

    emails.toList
  }

  def isSimilar(email1: Email, email2: Email): Boolean = {
    val maxDistance = 2
    
    def countify(str: String): Map[Char, Int] = {
      str.groupBy(identity).mapValues(_.length)
    }

    def distance(str1: String, str2: String): Int = {
      val (c1, c2) = (countify(str1), countify(str2))
      val (s1, s2) = (c1.keySet, c2.keySet)
      val all = (s1 | s2).toList

      all.map(x => math.abs(c1.getOrElse(x, 0) - c2.getOrElse(x, 0))).sum / 2
    }
    
    distance(email1, email2) <= maxDistance
  }

  def groupBySimilarity(emails: List[Email], similarFunc: (Email, Email) => Boolean): List[Set[Email]] = {
    
    val res = emails.foldLeft(Map.empty[Email, mutable.Set[Email]])( (map, email) => {
      val key = map.keys.filter(k => similarFunc(k, email)).headOption
      if (key.isDefined) {
        map(key.get) += email
        map
      } else {
        val set = new mutable.HashSet[Email]()
        set += email
        map + (email -> set)
      }
    })

    res.mapValues(_.toSet).values.toList
  }
  
  def main(args: Array[String]) {
    val emails = readInput()
    val grouped = groupBySimilarity(emails, isSimilar)
    grouped.foreach(g => println(g.mkString(" ")))
  }

}

- Aleks.Sidorenko April 19, 2017 | Flag Reply

CareerCup

Facebook Interview Question for Software Engineers

Books

Videos

Resume Review

Mock Interviews