Facebook Interview Question
SDE1sCountry: United States
#solution for part 1
from collections import Counter
def findSubsetWord ( s, words):
result = []
for word in words:
w= Counter (word)
c= Counter (s)
c.subtract(w)
# print (c)
if len ( [i for i in c.values () if i <0 ])==0:
result.append(word)
return result
def main ( input, words):
result = findSubsetWord(input, words)
print (f" The following words {result} are subset of {input}")
if __name__ == '__main__':
main ("ACRAT", ["A", "BAR", "CAR", "AAA" "ACA", "ART", "RAC",])
I would take the approach of a tree structure containing has values.
Finally try to match the hash values of sub sequence anagrams for values in the tree.
Creating the tree is O(logn)
Creating sub seq anagrams for the string to be searched for - O(k2) (k length of word to be searched)
Searching is O(logN).
Total search - O(k^2) + O(logn)
Here is the working code -
Few things -
The more the tree in balanced, the better search time, like AVL tree
In place of BST, if we use tree with n nodes, depending on the no. of words in the dictionary, the better the search time.
public static void main(String[] args) {
String str = "ACRAT";
BT root = new BT(10.0, null, null, "");
// "A", "CAR", "ACA", "ART", "RAC"
add(root, hash("A".toCharArray()), "A");
add(root, hash("CAR".toCharArray()), "CAR");
add(root, hash("ACA".toCharArray()), "ACA");
add(root, hash("ART".toCharArray()), "ART");
add(root, hash("RAC".toCharArray()), "RAC");
Set<String> sub = sub(str);
Set<String> ans = new HashSet<>();
for (String string : sub) {
find(root, hash(string.toCharArray()), ans);
}
System.out.println(ans);
}
// ACRAT
public static Set<String> sub(String str) {
if(str == null || (str != null && str.length() == 0))
return new HashSet<>();
char head = str.charAt(0);
String rem = str.substring(1, str.length());
Set<String> res = sub(rem);
Set<String> ans = new HashSet<>();
ans.add(String.valueOf(head));
for (String string : res) {
ans.add(string);
for (int i = 0; i <= string.length(); i++) {
String pre = string.substring(0, i);
String post = string.substring(i, string.length());
ans.add(pre+head+post);
}
}
return ans;
}
public static void find(BT node, double hash, Set<String> ans) {
if (node == null)
return;
if (node.hval == hash) {
ans.add(node.word);
return;
}
if (hash < node.hval)
find(node.left, hash, ans);
if (hash > node.hval)
find(node.right, hash, ans);
}
// better to have height balanced AVL tree, or nTree
public static void add(BT node, double hash, String word) {
if (node == null)
return;
if (node.left == null && node.hval > hash) {
BT bt = new BT(hash, null, null, word);
node.left = bt;
} else if (node.right == null && node.hval < hash) {
BT bt = new BT(hash, null, null, word);
node.right = bt;
}
if (hash < node.hval)
add(node.left, hash, word);
if (hash > node.hval)
add(node.right, hash, word);
}
public static double hash(char... carr) {
int n = carr.length;
int i = 0;
double hash = 0.0;
double k = 2.73;
while (i < n) {
hash += carr[i] * k;
i++;
}
return hash;
}
static class BT {
double hval;
BT left;
BT right;
String word;
public BT(double hval, BT left, BT right, String word) {
this.hval = hval;
this.left = left;
this.right = right;
this.word = word;
}
}
Using a custom Trie like tree (not really a trie, since a single node can hold multiple words - this is because of the sorting)
As @ChrisK pointed out, sorting the input and the dictionary is the way to go:
My solution is basically @ChrisK solution that actually compiles :)
* The trie here is case-insensitive
* Only A-Z words are allowed
#include "CommonHeader.h"
static inline int char_to_num(char ch) { return ch - 'a'; }
class Trie
{
Trie* m_children[26];
std::unordered_set<std::string> m_words;
private:
/**
* @brief find a node for a given char, create it if it does not exist
* this function is used by "insertSorted" method
*/
Trie* getOrCreateNode(char ch)
{
int offset = char_to_num(ch);
if(m_children[offset] == nullptr) {
m_children[offset] = new Trie();
}
return m_children[offset];
}
public:
/**
* @brief find node for a given char, return null if it does not exist
*/
inline Trie* findNode(char ch) { return m_children[char_to_num(ch)]; }
public:
Trie() { memset(m_children, 0, sizeof(m_children)); }
virtual ~Trie() {}
/**
* @brief get all words for this node (there can be multiple words)
*/
void getWords(std::unordered_set<std::string>& words) const
{
if(!m_words.empty()) {
words.insert(m_words.begin(), m_words.end());
}
}
/**
* @brief add word to the trie
*/
void insertSorted(const std::string& word)
{
std::string sortedWord = word;
std::transform(sortedWord.begin(), sortedWord.end(), sortedWord.begin(), ::tolower);
std::sort(sortedWord.begin(), sortedWord.end());
Trie* node = this;
for(size_t i = 0; i < sortedWord.size(); ++i) {
char ch = sortedWord[i];
node = node->getOrCreateNode(ch);
}
// but we store the original word (unsorted)
node->m_words.insert(word);
}
};
void findInDictionary()
{
Trie trie;
trie.insertSorted("a");
trie.insertSorted("car");
trie.insertSorted("aca");
trie.insertSorted("art");
trie.insertSorted("rac");
trie.insertSorted("rac");
trie.insertSorted("tr");
trie.insertSorted("bar");
trie.insertSorted("aaa");
// Sort the result
std::string filter = "acrat";
std::sort(filter.begin(), filter.end()); // aacrt
std::unordered_set<std::string> words; // the result
std::queue<std::pair<Trie*, std::string> > q;
q.push({ &trie, filter });
while(!q.empty()) {
std::string w = q.front().second;
Trie* t = q.front().first;
q.pop();
t->getWords(words);
while(t && !w.empty()) {
Trie* child = t->findNode(w[0]);
w.erase(w.begin());
if(child) {
q.push({ child, w });
}
}
}
std::for_each(words.begin(), words.end(), [&](const std::string& w){
std::cout << w << std::endl;
});
}
/*
Makarand is right.
The best way to handle it is O(nk), where:
n is the size of the dictionary
k is the average size of the words in the dictionary by unique characters
Observe the use of counter, what we are trying to do is multiset.
en.wikipedia.org/wiki/Multiset
Given zoomba defaults into multiset using mset()
and let's set operation ( subset ) over multisets
The problem becomes trivial code.
== Appendix : Multiset Subset Operation ==
Given a list/sequence, a multiset is M = { (key,count) }
Now, M1 is subset of M2, if and only if :
for all key in M1,
1. key exists in M2
2. count of key in M1 <= count of key in M2
==
And now the solution
*/
// step 1, load dictionary and make each entry multiset :: one time
md = list ( file('/BigPackages/words.txt') ) as { [ $.o, mset( $.o.toCharArray ) ] }
// get input word
input_word = 'makarand'
test_word = mset( input_word.toCharArray )
// where each word in md is a subset ( multiset sense ) of test_word O(n*k)
subset_words = select ( md ) where { $.o.1 <= test_word } as { $.o.0 }
// we are done
println( subset_words )
Yielding:
[ a,aa,aaa,aam,ad,adam,adar,adm,adman,ak,aka,akan,akra,am,ama,amadan,amar,amarna,amra,an,ana,anam,anama,and,anda,ankara,ar,ara,arad,arak,arank,ark,arm,armada,arn,arna,d,da,dak,dam,dama,daman,damar,damn,dan,dana,dank,dar,dark,darn,dk,dkm,dm,dn,dr,dram,drama,drank,,k,ka,kaama,kam,kama,kan,kana,kanara,kand,karanda,karma,karn,km,kn,knar,kr,kra,krama,kran,krna,m,ma,maad,maana,maar,mad,makar,makara,makran,man,mana,manada,manak,mand,mandar,mandra,mank,mar,mara,mark,marka,md,mk,mn,mna,mr,n,na,naa,naam,nad,nada,nak,nam,namda,nar,nard,nark,nd,nm,nr,r,ra,raad,rad,rada,radman,rakan,ram,ramada,ramadan,ran,rana,rand,rank,rd,rm,rn,rnd, ]
Solution for the first part of the question:
1. Create a dictionary - HashMap
2. Create HashSet with all possible subsets of the input string.
3. Loop the HashSet and look in the dictionary if the key exists.
package com.cracking.facebook;
import java.util.HashMap;
import java.util.HashSet;
public class FindSubsetWords {
public static void main(String[] args) {
HashMap<String,String> dict = GetDictionary();
HashSet<String> subSets = GetSubSets("ACRAT");
PrintMatchingWords(dict, subSets);
}
public static void PrintMatchingWords(HashMap<String,String> dict,HashSet<String> subSets) {
for(String key:subSets) {
if( dict.containsKey(key) ) {
System.out.println(key);
}
}
}
public static HashSet<String> GetSubSets(String input) {
HashSet<String> subSets = new HashSet<String>();
GetSubSets(input, subSets);
return subSets;
}
public static void GetSubSets(String input,HashSet<String> subSets) {
for(int i=0;i<input.length(); i++) {
String left = Character.toString(input.charAt(i));
String right = input.substring(0,i) + input.substring(i+1);
GetSubSets(left, right, subSets);
}
}
public static void GetSubSets(String left,String right,HashSet<String> subSets) {
subSets.add(left);
subSets.add(right);
for(int i=0;i<right.length();i++) {
String newLeft = left + Character.toString(right.charAt(i));
String newRight = right.substring(0,i) + right.substring(i+1);
GetSubSets(newLeft, newRight, subSets);
}
}
public static HashMap<String,String> GetDictionary() {
HashMap<String,String> dict = new HashMap<String,String>();
dict.put("A", "A");
dict.put("AAA", "AAA");
dict.put("ACA", "ACA");
dict.put("ART", "ART");
dict.put("BAR", "A place with alcohol");
dict.put("BAD", "Not good");
dict.put("CAR", "Have 4 wheels and an engine");
dict.put("CREDIT", "Bank information");
dict.put("RAC", "RAC");
return dict;
}
}
Output:
A
ART
ACA
CAR
RAC
Solution for the follow up:
1. Use array of int[26] as a character counters.
2. Sum the character instances of the input list.
3. Search to build a list of words with the equal counters.
package com.cracking.facebook;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
public class FindEqualListOfWords {
public static void main(String[] args) {
HashMap<String,String> dict = GetDictionary();
PrintEqualList(dict, new String[]{
"DEBIT",
"CARD"
});
}
public static void PrintEqualList(HashMap<String,String> dict, String[] list) {
String inputStr = String.join("", list);
int[] inputCounter = GetWordCount(inputStr);
ArrayList<String> equalList = GetEqualList(dict, inputCounter);
System.out.println("Input List: " + Arrays.toString(list) );
System.out.println("Equal List: " + equalList.toString() );
}
public static ArrayList<String> GetEqualList(HashMap<String,String> dict,int[] inputCounter) {
ArrayList<String> equalList = new ArrayList<String>();
ArrayList<String> words = new ArrayList<String>(dict.keySet());
for(int i=0; i<words.size(); i++) {
String word = words.get(i);
int[] testCount = GetWordCount(word);
if(IsLessOrEqual(testCount, inputCounter)) {
equalList.add(word);
int j= 0;
while(j<words.size() && !IsEqual(testCount, inputCounter)) {
word = words.get(j++);
if(equalList.indexOf(word) != -1) continue; //word already in the list
int[] subTestCount = AddCounters(testCount, GetWordCount(word));
if(IsLessOrEqual(subTestCount, inputCounter)) {
equalList.add(word);
testCount = subTestCount;
}
}//End While
if( IsEqual(testCount, inputCounter) ) break;
equalList.clear();
}//Big If
}//For Loop
return equalList;
}
public static int[] GetWordCount(String input) {
int[] counters = new int[26]; //default values of 0 for each element
for(int i=0;i<input.length();i++) {
int pos = input.charAt(i)-'A';
counters[pos]++;
}
return counters;
}
public static int[] AddCounters(int[] counter1, int[] counter2) {
int[] total = new int[26];
for(int i=0; i<total.length; i++) {
total[i] = counter1[i] + counter2[i];
}
return total;
}
public static boolean IsLessOrEqual(int[] test,int[] source) {
for(int i=0;i<test.length; i++) {
if(test[i] > source[i]) return false;
}
return true;
}
public static boolean IsEqual(int[] test,int[] source) {
for(int i=0;i<test.length; i++) {
if(test[i] != source[i]) return false;
}
return true;
}
public static HashMap<String,String> GetDictionary() {
HashMap<String,String> dict = new HashMap<String,String>();
dict.put("A", "A");
dict.put("AAA", "AAA");
dict.put("ACA", "ACA");
dict.put("ART", "ART");
dict.put("BAR", "A place with alcohol");
dict.put("BAD", "Not good");
dict.put("CAR", "Have 4 wheels and an engine");
dict.put("CREDIT", "Bank information");
dict.put("RAC", "RAC");
return dict;
}
}
Output:
Input List: [DEBIT, CARD]
Equal List: [BAD, CREDIT]
My first thought was the following simple approach (easy to code):
1) sort the letter in each word and put it into a hash table / hash set
2) sort the letters of input word, create the super-set and check for each set(=string) if it's in the hash set: if word is 10 chars: 2^10: 1K lookups, with 20: 1 Mio, with 30: 1 Bio
The problem here is, there are a lot of tests into the hashtable that will not hit a "word". If I could stop trying if I knew a prefix of a subset I am trying doesn't exists ... see alternative below.
This HT approach would solve the follow-up question neatly. For the initial question it's not efficient.
Alterantive with Trie.
1) Store the words in a trie data strucutres (after sorting the chars, so CAT becomes "ACT"). The trie can be optimized to an alphabet of 26 chars and compacted (Patricia tree to save memory) but the original word must be still stored to return (since order is not preserved in the trie...)
2) Sort the chars in the search word (e.g. "ACRAT" -> "AACRT") Now, try to enter the trie with the first letter and then the basic idea is, if the letter is in the trie, I later need to backtrack and try the next letter from input string without the previous letter, but, and that's the catch, if it is not in the trie, I can stop exploring quite a lot subsets, so I do much fewer lookups that will be "empty".
Theoretically this is faster, but tries are hard to implement cache-friendly and so, it is not right obvious, if it's much faster on a modern machine. It is surely if we can add some additional constraints, like limit the number of results to return to e.g. 100, 1000, 10.000.
I would just write the search code in the trie assuming some trie interface:
- Chris October 19, 2017