Facebook Interview Question
Software Engineer InternsCountry: United States
Interview Type: Phone Interview
1. sort all the words in in inputlist in (m * nlongn). {eerst,beik,acrs,eerst,acrs}
2. sort the words (acrs,acrs,beik,eerst,eerst) mlogm
3. pass the entire list in m
complexity max of(m*mlogn,mlogm,m)
Item 2 is larger that mlogm because comparing two strings may take up to n. Both, MSD string sort and 3-way string quicksort take around m*n. Which leaves item 1 to define time complexity as 2 and 3 fit within.
If we sort the list in step 2 as you mentioned :-
2. sort the words (acrs,acrs,beik,eerst,eerst) mlogm
then the indices in new list will be different from the original list, then while making the return list how do we add the actual words into it. Since the return list should look like {{cars, arcs}, ...} Not {{acrs,acrs}, ...}.
If we follow your approach then you might also need to modify the sorting algorithm which will also return the actual position of the strings in the unsorted list ??
Runtime: O(m * n)
Assuming lower case English alphabet a-z.
Create int array size 26.
for each character in a word increment the appropriate index in the array. O(n)
iterate over the array and construct your sorted string. O(n)
for all words insert sorted string into a hashmap as a key and add the unsorted string into corresponding list. O(m)
@rizhang what else are you trying to sort? the hashmap contains a <String, List> pair. the String keys are sorted in O(n) as described above and the unsorted version is added to the sorted words List. The only sort I need takes O(n). Doing this for all m words takes O(m * n).
@rizhang more accurately, anon has speed up the asymptotic behavior of the sorting problem by using counting sort rather then the O(n*lg(n)) sorts others have used. This indeed have a O(n*m) run time...but IMHO a good interviewee would know the limitations of counting sort and the actual meaning of O notation. The real runtime of counting sort is O(n+k), where k is the number of buckets (26, in this case). So the overall runtime is O(n*m+k*m) (degenerates to k*m when n < m). Asymptotically, this will perform better when n gets very large. However, n*lg(n) run times will most likely be faster when n<<k, and wolfram alpha puts the average length of a word at 5.1. This is one of those cases where measuring would be the only way to prove things out. I'd guess either solution is fine as long as one could explain the other, why they chose the one they did, what the implementations of are of each, ect.
@nclimer i see your point, but as the problem stated "Given a list of strings", I did not assume the strings had to be dictionary valid and that the words in the example were just biased. I assumed the size of n could be on the order of m which would make a counting sort more practical than comparison sort where size of n is random and unbounded.
O(m*n*logn)
void findAnagrams(){
vector<string> svec;
vector<string>::iterator vit;
map< string , list<string> > m;
map< string , list<string> >::iterator mit;
list<string>::iterator lit;
list<string>::iterator eit;
svec.push_back("cars");
svec.push_back("arcs");
svec.push_back("trees");
svec.push_back("steer");
svec.push_back("scar");
svec.push_back("torn");
svec.push_back("link");
svec.push_back("kiln");
//O(m*n*logn)
int pos=0;
for(vit=svec.begin(); vit!=svec.end(); vit++ ) {
string ss = sorted(*vit);
mit = m.find(ss);
if( mit != m.end() ) {
mit->second.push_back(*vit);
}else {
list<string> ls;
ls.push_back(*vit);
m[ss] = ls;
}
pos++;
}
//O(m)
pos=0;
for(mit=m.begin(); mit!=m.end(); mit++) {
lit = mit->second.begin();
eit = mit->second.end();
for(;lit!=eit;lit++) {
cout << *lit << " ";
}
cout << endl;
}
return;
}
Quick couchtime solution. Wish ObjectiveC was kinder to string manipulation
+ (NSMutableDictionary *)anafy:(NSArray *)words {
NSMutableDictionary *ourWords = [NSMutableDictionary dictionary];
for (NSString *word in words) {
NSString *ourWord = [word lowercaseString];
NSMutableArray *wordExploded = [NSMutableArray array];
for (int i=0; i<ourWord.length;i++) {
NSString *str = [NSString stringWithFormat:@"%c", [ourWord characterAtIndex:i]];
[wordExploded addObject:str];
}
wordExploded = [[wordExploded sortedArrayUsingComparator:^NSComparisonResult(NSString *a, NSString *b) {
return [a compare:b];
}] mutableCopy];
ourWord = [wordExploded componentsJoinedByString:@""];
NSMutableArray *obj = [ourWords objectForKey:ourWord];
if (!obj) {
obj = [NSMutableArray array];
}
[obj addObject:word];
[ourWords setObject:obj forKey:ourWord];
}
return ourWords;
}
On PHP it's can be
$words = array('trees', 'bike', 'cars', 'steer', 'arcs');
$result = array();
$sortedWordIndexes = array();
$currentIndex = -1;
foreach ($words as $k => $word) {
$wordLetters = str_split($word);
asort($wordLetters);
$sortedWord = join('', $wordLetters);
if (!isset($sortedWordIndexes[$sortedWord]) ) {
$insertedIndex++;
$sortedWordIndexes[$sortedWord] = $insertedIndex;
}
if (!isset($result[$sortedWordIndexes[$sortedWord]])) {
$result[$sortedWordIndexes[$sortedWord]] = array();
}
$result[$sortedWordIndexes[$sortedWord]][] = $word;
}
var_dump($result);
So It's only O(n*m)
Just FYI, you've screwed up your O calculation...you have to remember to calculatione all the complexities in the sub calls. There is a hidden nlgn calculation in asort, so your solution is O(m*n*lgn).
A C++ solution. If n is the charaters in the string, and m is number of the strings, Calculating the key is O(nlgn), adding it to the hash is O(n) (for the copy and calculating the key, do that m times, overall O(m*n*lg(n));
#include <iostream>
#include <unordered_map>
#include <map>
#include <string>
#include <algorithm>
#include <vector>
using namespace std;
typedef std::unordered_map<std::string,std::vector<std::string> > MM;
template <typename ITER>
MM find_anagrams(ITER b, ITER e) {
MM ret;
for ( ; b != e; ++b) {
std::string key{*b};
std::sort(key.begin(),key.end());
ret[key].push_back(*b);
}
return ret;
}
int main()
{
std::vector<std::string> x{"trees", "bike", "cars", "steer", "arcs"} ;
auto y = find_anagrams(x.begin(),x.end());
for (auto& v : y) {
std::cout << "Bucket " << v.first << std::endl;
for (auto& s : v.second) {
std::cout << " " << s << std::endl;
}
}
return 0;
}
This is a really quick hack using dictionaries, could definitely be better
var anagrams = ['trees', 'bike', 'cars', 'steer', 'arcs'];
var anagramsWithId = {};
exe(anagrams);
function exe(anagrams){
var sortedString;
for (anagram in anagrams){
sortedString = anagrams[anagram].split('').sort(caseInsensitiveSort).join('');
if(anagramsWithId[sortedString] === undefined){
anagramsWithId[sortedString] = [anagram];
}else{
anagramsWithId[sortedString].push(anagram);
}
}
printCouples();
}
function printCouples(){
for(key in anagramsWithId){
for (val in anagramsWithId[key]){
anagramsWithId[key][val] = anagrams[anagramsWithId[key][val]];
}
console.log(anagramsWithId[key]);
}
}
function caseInsensitiveSort(a, b)
{
var ret = 0;
a = a.toLowerCase();b = b.toLowerCase();
if(a > b)
ret = 1;
if(a < b)
ret = -1;
return ret;
}
Python, O(m*n*log(n))
data = ['trees', 'bike', 'cars', 'steer', 'arcs']
from collections import defaultdict
d = defaultdict(list)
for w in data:
d[''.join(sorted(w))].append(w)
for l in d.values():
print l
@Oli: according your code and idea of hash table we don't need to sort anything. Ofcourse the hash with array might reduce the key size a lots and we don't need to sort anything
#list of strings
data = ['trees', 'bike', 'cars', 'steer', 'arcs']
from collections import defaultdict
def calcHash(w):
d = defaultdict(int)
for i in w:
o = ord(i)-97
d[o] = d[o]+1
s = ''
for i in d:
s = s + chr(i+97)+str(d[i])
return s
d = defaultdict(list)
for w in data:
d[calcHash(w)].append(w)
for l in d.values():
print l
def sortChar(str):
output = ''
for u in sorted(str):
output += u
return output
test = ["trees", "bike", "cars", "steer", "arcs"]
long = 0
dict = {}
for i in test:
if len(i) > long:
long = len(i)
tmp = sortChar(i)
if tmp not in dict.keys():
list = [i]
dict[tmp] = list
else:
dict[tmp].append(i)
finallist = []
for i in dict:
finallist.append(dict[i])
print finallist
print long
Here is a solution that uses Xor of the characters in a string as the hash value:
void GroupAnagrams(vector<string>& input)
{
map<char, vector<string> > anagrams;
for (size_t i = 0; i < input.size(); i++)
{
if (input[i].empty()) continue;
char xorVal = input[i][0];
for (size_t j = 1; j < input[i].size(); j++)
xorVal ^= input[i][j];
anagrams[xorVal].push_back(input[i]);
}
map<char, vector<string> >::iterator anagramIter = anagrams.begin();
for (; anagramIter != anagrams.end(); anagramIter++)
{
vector<string>& anagramVec = anagramIter->second;
for (size_t i = 0; i < anagramVec.size(); i++)
cout << anagramVec[i] << " ";
cout << endl;
}
}
int main ()
{
vector<string> input;
input.push_back("trees");
input.push_back("bike");
input.push_back("cars");
input.push_back("arcs");
input.push_back("steer");
GroupAnagrams(input);
return 0;
}
public class AnagramSolution
{
static class Data implements Comparable<Data>{
String sortedString;
int index;
Data(String str, int pos){
this.sortedString = str;
this.index = pos;
}//Data
@Override
public String toString(){
return "[" + sortedString + " " + index +"]";
}//toString();
@Override
public boolean equals(Object ob){
if (ob == null) return false;
if (ob == this) return true;
if(ob instanceof Data){
Data data = (Data) ob;
return this.sortedString.equals(data.sortedString);
}
return false;
}//
@Override
public int compareTo(Data o){
return this.sortedString.compareTo(o.sortedString);}
}//
public static ArrayList<ArrayList<String>> groupAnagrams(List<String> stringList){
String [] originalList = new String[stringList.size()];
stringList.toArray(originalList);
Data [] sortedStrings = new Data [originalList.length];
//sort each string in original list m * n* log(n) :m size of original list,
// n worst case length of string in original list
// result: [[eerst 0], [beik 1], [acrs 2], [eerst 3], [acrs 4]]
for(int i = 0; i<originalList.length; i++){
char [] sortedArray = originalList[i].toCharArray();
Arrays.sort(sortedArray);
String sorted = new String(sortedArray);
sortedStrings[i] = new Data(sorted,i);
}
//sort entire sortedStrings m * log(m)
//Result : [[acrs 2], [acrs 4], [beik 1], [eerst 0], [eerst 3]]
Arrays.sort(sortedStrings);
ArrayList<ArrayList<String>> finalList = new ArrayList<ArrayList<String>>();
//Iterate through sortedStrings : gather into arrayList final Solution
int i = 0;
while(i< sortedStrings.length){
Data data = sortedStrings[i];
ArrayList<String> temp = new ArrayList<String>();
temp.add(originalList[data.index]);
int nextIndex = i+1;
while(nextIndex < sortedStrings.length){
Data nextData = sortedStrings[nextIndex];
if(nextData.equals(data)){
temp.add(originalList[nextData.index]);
nextIndex++;
}
else break;}
finalList.add(temp);
i = nextIndex;
}
return finalList;
}//groupAnagrams(List<String> stringList)
#include <iostream>
#include <vector>
#include <algorithm>
#include <map>
using namespace std;
void printsorted(vector<string> input)
{
map< string, vector<string> > list;
for (vector<string>::const_iterator it = input.begin(); it != input.end(); ++it)
{
string str(*it);
sort(str.begin(), str.end());
list[str].push_back(*it);
}
cout << "{ ";
for (map< string, vector<string> >::const_iterator it = list.begin(); it != list.end(); ++it)
{
if (it != list.begin())
{
cout << ", ";
}
vector<string> sublist = it->second;
cout << "{";
for (int i = 0; i < sublist.size(); ++i)
{
if (i != 0)
{
cout << ", ";
}
cout << sublist[i];
}
cout << "}";
}
cout << " }" << endl;
}
Similar to the anagrams problem on leetcode. Use a hashmap, key = sorted word from a to z, value = list of anagrams, for example key = acrs, value = {cars, arcs}.
Java code:
public class Solution {
public ArrayList<ArrayList<String>> anagrams(String[] strs) {
ArrayList<ArrayList<String>> result = new ArrayList<ArrayList<String>>();
if (strs == null || strs.length == 0) return result;
HashMap<String, ArrayList<String>> map = new HashMap<String, ArrayList<String>>();
for (int i = 0; i < strs.length; i++) {
// sort the string: convert to char array, sort, then construct a new string
char[] temp = strs[i].toCharArray();
Arrays.sort(temp);
String sorted = new String(temp);
// check if the anagram group already exists, if so, add string to the value
if (map.containsKey(sorted))
map.get(sorted).add(strs[i]);
else {
// else create new entry of the map
ArrayList<String> entry = new ArrayList<String>();
entry.add(strs[i]);
map.put(sorted, entry);
}
}
// iterate map and add all anagrams to the result
Iterator<ArrayList<String>> iter = map.values().iterator();
while (iter.hasNext()) {
ArrayList<String> item = iter.next();
result.add(item);
}
return result;
}
}
public static List<List<String>> findAnagramsNextToEachOther(List<String> list){
List<List<String>> result = new ArrayList<List<String>>();
HashMap<String, ArrayList<String>> hashmap = new HashMap<String,ArrayList<String>>();
for(String s:list){
String key = AnagramsComparator.sortChars(s);
if(!hashmap.containsKey(key)){
hashmap.put(key, new ArrayList<String>());
}
ArrayList<String> arr = hashmap.get(key);
arr.add(s);
}
for(Entry<String,ArrayList<String>> entry:hashmap.entrySet()){
result.add(entry.getValue());
}
return result;
}
public static void main(String[] args) {
List<String> list = new ArrayList<String>();
list.add("acre");
list.add("race");
list.add("care");
list.add("usb");
list.add("sbu");
list.add("oiuy");
list.add("uiyo");
System.out.println("original");
for(String s:list){
System.out.println(s);
}
System.out.println();
System.out.println("sorted and grouped");
List<List<String>> result = findAnagramsNextToEachOther(list);
for(List<String> entry:result){
System.out.print("{");
for(String s:entry){
System.out.print(s+" ");
}
System.out.print("}\n");
}
}
}
class AnagramsComparator implements Comparator<String>{
public static String sortChars(String s){
char[] chars = s.toCharArray();
Arrays.sort(chars);
return new String(chars);
}
@Override
public int compare(String o1, String o2) {
// TODO Auto-generated method stub
return o1.compareTo(o2);
}
}
Java 8 functional style:
public static Set<Set<String>> getAllAnagrams(List<String> strings) {
Map<String, Set<String>> map = new HashMap<>();
strings.forEach(word -> {
String sorted = sort(word);
System.out.println(sorted);
Set<String> anagrams = map.containsKey(sorted) ? map.get(sorted) : new HashSet<>();
anagrams.add(word);
map.put(sorted, anagrams);
});
Set<Set<String>> results = new HashSet<>();
map.values().stream().filter(set -> set.size() > 1).forEach(set -> results.add(set));
return results;
}
private static String sort(String word) {
char[] charArray = word.toCharArray();
Arrays.sort(charArray);
return new String(charArray);
}
// sort chars in each string, then sort strings and group
// timing 22 minutes
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
public class GroupAnagrams {
static class HelperWord implements Comparable<HelperWord>{
String s, sWithCharsSorted;
HelperWord(String s){
this.s = s;
char sc[] = s.toCharArray();
Arrays.sort(sc);
sWithCharsSorted = new String(sc);
}
public String toString(){
return (s + "\t" + sWithCharsSorted);
}
@Override
public int compareTo(HelperWord that) {
int res = ((String) sWithCharsSorted).compareTo(that.sWithCharsSorted);
return res;
}
}
public GroupAnagrams() {
}
private Set<Set<String>> groupAnagrams(String[] a){
Set<Set<String>> res = new HashSet<Set<String>>();
if(a==null) {
return res;
} else if (a.length == 1){
Set<String> y = new HashSet<String>();
y.add(a[0]);
res.add(y);
return res;
}
List<HelperWord> wordsWithCharsSorted = new LinkedList<HelperWord>();
for(String s : a){
HelperWord x = new HelperWord(s);
wordsWithCharsSorted.add(x);
}
Collections.sort(wordsWithCharsSorted);
for(HelperWord x : wordsWithCharsSorted){
System.out.print(x + "\t");
}
System.out.println();
// traverse list and group anagrams
HelperWord prev = wordsWithCharsSorted.get(0);
int groupStart = 0;
for(int i = 1; i < wordsWithCharsSorted.size(); i++){
HelperWord x = wordsWithCharsSorted.get(i);
if(prev.compareTo(x) == 0){
} else {
// have a new cluster
Set<String> y = new HashSet<String>();
for(int j = groupStart; j < i; j++){
HelperWord x2 = wordsWithCharsSorted.get(j);
y.add(x2.s);
}
res.add(y);
groupStart = i;
}
prev = x;
}
Set<String> y = new HashSet<String>();
for(int j = groupStart; j < wordsWithCharsSorted.size(); j++){
HelperWord x2 = wordsWithCharsSorted.get(j);
y.add(x2.s);
}
res.add(y);
return res;
}
public static void main(String[] args){
GroupAnagrams wrapper = new GroupAnagrams();
String[] testcase = { "trees" , "car" , "bike" , "steer" , "arc" };
Set<Set<String>> groups = wrapper.groupAnagrams(testcase);
System.out.println(groups.size());
}
}
x = ["trees", "bike", "cars", "steer", "arcs", "anuj", "juna"]
a = ["c", "b", "a", "d", "z", "q"]
def quick_sort(a):
left = []
right = []
if len(a) > 0:
pivot = a[0]
for i in range(len(a)):
if a[i] < pivot:
left.append(a[i])
elif a[i] > pivot:
right.append(a[i])
return quick_sort(left) + [pivot] + quick_sort(right)
else:
return a
def group_all_anagram(x):
y = {}
for i in x:
sorted_i = "".join(quick_sort(list(i)))
if sorted_i not in y:
y[sorted_i] = []
y[sorted_i].append(i)
print [ val for key, val in y.iteritems()]
group_all_anagram(x)
x = ["trees", "bike", "cars", "steer", "arcs", "anuj", "juna"]
a = ["c", "b", "a", "d", "z", "q"]
def quick_sort(a):
left = []
right = []
if len(a) > 0:
pivot = a[0]
for i in range(len(a)):
if a[i] < pivot:
left.append(a[i])
elif a[i] > pivot:
right.append(a[i])
return quick_sort(left) + [pivot] + quick_sort(right)
else:
return a
def group_all_anagram(x):
y = {}
for i in x:
sorted_i = "".join(quick_sort(list(i)))
if sorted_i not in y:
y[sorted_i] = []
y[sorted_i].append(i)
print [ val for key, val in y.iteritems()]
group_all_anagram(x)
It can be done in O(mn), where n is the length of each string and m is the number of strings, not O(mnlogn).
Simply iterate through each string, hashing a data structure holding the count of each letter a-z in the string, along with the string itself. If you find a string with the same count, add it to the bucket. At the end, simply print out the buckets, placing the strings within each bucket in a subset.
That's O(m*n) to iterate through each string, O(m) to print the buckets, and O(m) space complexity.
You don't have to sort the words unless you're trying to do some sort of string comparison to find anagrams.
the easiest and most time efficient way would be to find a hash function in which trees and steer come to the same hash..
meaning the hash function is commutative wrt eat character at the same time decreasing one character and increasing the other character (like tr(e->d)(e->f)s) does not give the same hash.. i mean to say its not a simple addition of the asciis..
a good example would be to
take the alphabet set and assign random numbers to each character. call it random1
take the alphabet set again and assign another set of random numbers to each character. call it random2
now instead of ascii of these alphabets use these random numbers.
now characters in trees and steer would add up to the same total by adding values in random1 and also the same total using characters in random2.
trees and trdf wont add up to the same totals using random1 and random2
its not an ideal solution but can reduce the chances of collision drasticallly.
This is C++ version of solution, which run in O(m* nlog n) where M is # of words and N is the longest word length().
I am not sure the question is related to anagram.
#include<iostream>
#include<string>
#include<map>
#include<list>
#include<algorithm>
using namespace std;
map<string, list<string>>find_anagram_group(string * input, int len)
{
map<string, list<string>> result;
for (int i= 0; i<len; i++)
{
string sorted = input[i];
sort(sorted.begin(), sorted.end());
if (result.find(sorted)!= result.end())
{
result[sorted].push_back(input[i]);
} else {
list<string> l;
l.push_back(input[i]);
result[sorted] = l;
}
}
return result;
}
Done in O(n*m). For each string, we want to bring all the 'a's (if it has) to the front, then the b's, for example we wanna convert yxbaa-> aabxy. This can be done easily by keeping count of how many occurrence per character. Clearly if 2 strings are anagram of each other then they must have to same converted version.
The remaining part is using radix/trie sort to group all the string that has the same converted version together. This can be done in O(n*m).
Java
class Solution {
static String[] strings = {"trees", "bike", "cars", "steer", "arcs"};
public static void main(String[] args) {
Hashtable<String, List<String>> table = new Hashtable<String, List<String>>();
for(String s : strings)
{
char[] arr = s.toCharArray();
Arrays.sort(arr);
String key = new String(arr);
if(table.containsKey(key))
{
table.get(key).add(s);
}
else
{
ArrayList<String> x = new ArrayList<String>();
x.add(s);
table.put(key, x);
}
}
ArrayList<List<String>> returnList = new ArrayList<List<String>>();
Set<String> keys = table.keySet();
for(String key : keys)
{
List<String> list = table.get(key);
for(String s : list)
{
System.out.print(s + " ");
}
System.out.println();
returnList.add(table.get(key));
}
}
}
Create a Hashmap<Integer, ArrayList<String>> and go through the list of words hashing each of them (the hash could be a simple sum of int value of each char for the word), add the hash to the map then check the next word, if hash is in the map, then add the word to the list, otherwise, add hash as key and create new list.
In the end, print out the list.
public class PairAnagrams {
public static Map<Integer, ArrayList<String>> superSet = new HashMap<Integer, ArrayList<String>>();
public static void pairAnagrams(String[] arr) {
for (String s: arr) {
int hash = computeHash(s);
if (superSet.get(hash) == null) {
superSet.put(hash, new ArrayList<String>());
}
superSet.get(hash).add(s);
}
}
public static int computeHash(String s) {
int res = 0;
for(int i = 0; i < s.length(); i++) {
res += s.charAt(i);
}
return res;
}
public static void main(String[] args) {
String[] arr = {"trees", "bike", "cars", "steer", "arcs"};
pairAnagrams(arr);
for (ArrayList<String> a: superSet.values()) {
System.out.println(a);
}
}
}
Output:
[cars, arcs]
[bike]
[trees, steer]
You cannot use sum of string characters as hash function. For example
cars
arcs
cbps
will have the same hash, but only first and second are anagrams.
Ok, agreed, we can use something more complicated, I just used this simple function as an example :)
We can change the hash function to something like this:
Assign primary numbers to each character. While calculating hash code, get the prime number assigned to that character and multiply with to existing value.Now all anagrams produce same hash value.
ex : a - 2, c - 3 t - 7
hash value of cat = 3*2*7 = 42 hash value of act = 2*3*7 = 42 Print all strings which are having same hash value(anagrams will have same hash value)
The only problem with assigning numbers to chars is if all chars are in the list of strings. Let's say z = 101. If you have "zzzzzzzzzz" you'll have 101^10 which is a huge number. If all the strings must be a word from the dictionary, "Pneumonoultramicroscopicsilicovolcanoconiosis", which according to my google search is the longest word in the dictionary, will have a huge value
@rizhang: We can try to reduce probability of overflow by assigning the lowest prime numbers to the most common letters and vowels.
This mapping was stress tested against 100K dutch words and had no overflow (we are talking about unsigned 64-bit integers here):
unsigned char primes26[] =
{
5,71,79,19,2,83,31,43,11,53,37,23,41,3,13,73,101,17,29,7,59,47,61,97,89,67,
};
However, this approach is only applicable if we are talking about actual words (words that can be found in a dictionary). If we are using random strings like "zzzzzzzzzzz", etc... then the hashing function is not a reasonable one and you should use sorting instead like the other answer suggests.
I also have the same hashmap idea but anoynous put forth almost the code out here. So, the rub is the hash function.
My idea here for the hash function is run length encoding with a twist of course: "ABCAGR" MAPS to "A2B1C1G1R1" and so is "RABCAG". Order of the letters does not matter. The way to approach to this is to use a temporary character counter array and use that counter to generate the hash.
private static String computeHash(String string) {
//This array needs to be increased if we consider the numbers as well as other characters
long [] charCtr = new long [52]; // Radix
char[] chrArray = string.toCharArray();
for (int i = 0; i < chrArray.length; i++) {
charCtr[chrArray[i]-'A']++;
}
StringBuffer strBuff = new StringBuffer();
for (int i = 0; i < charCtr.length; i++) {
if(charCtr[i]!=0) {
strBuff.append( (char) (i+'A')+ "" + charCtr[i] );
}
}
return strBuff.toString();
}
@naren, that's actually a pretty good idea. Run Length Encoding might actually prevent any overflow occurring from associating numbers to chars.
use hash function as define in RobinK. Use Quadent as 13 or any prime number to get better value:
private int Hash(string text)
{
int hashVal = 0;
for (int i = 0; i < patternLength; i++)
hashVal = (10 * hashVal + text[i]) % Quadent;
return hashVal % Quadent;
}
@Anonymous Hashing with primes will work for most words, but again the example I posted will fail, even with a 64 bit int (I tested it).
@naren wow that's a good solution
@CT If we sort the chars and hash it, the total time complexity will be O(n*m*logm), where n = #words, m=avg # chars. If you write your own hash function, the complexity becomes O(n*m). Please let me know if you think I made a mistake
Suppose that : m = # of words and n = length of longest word
I know two ways how to solve this problem:
0) For each word sort it's letters and put sorted string into HashTable. O(m*n*log n)
1) Without HashTable.
- GK December 13, 2014i) Create two arrays: indexes[] and words[] where indexes[i] - position of words[i] in original array. O(m)
ii) For each string from words[] sort letters. O(m*n*logn)
iii) Sort words[]. At each swapping words[i] and words[j] have to swap indexes[i] and indexes[j] O(m*logm)
iv) Iterate through words[]. if (words[i] == words[i+1]) than words from original list standing on indexes[i] and indexes[i+1] position are anagrams. O(m)