Google Interview Question
Software Engineer in Testsfinding anagrams is m! so order is going to be O(n*m!). lets say n is 2000. m is 20. Order in your solution is def.ly not O(n). I gave an O(n) solution to the interviewer which was correct. but, he seemed to hint at a better solution (mine was taking 2*n operations in the worst case). I will post my solution tomorrow.
anagram_count = 0
SB = Sorted version of B
Create C that contains 1 of every character in B
for ( i = 0 .. length(A) )
{
D = A.substring( i thru ( i + length(B) );
for ( j = 0 .. length(D) )
{
// Anagram test for A[i] thru A[i + length(B)]
if ( C.indexOf( A[ j ] ) == -1 )
{
//A[j] is not in B, so skip ahead to next char in A
i = j + 1;
break;
}
}
Sort D;
if ( compare( D, SB ) == 0 )
{
anagram_count++;
}
}
return anagram_count;
anagram_count = 0
SB = Sorted version of B
Reposting as I corrected minor loop skip...
function get_anagram_count( A, B)
{
Create C that contains 1 of every character in B
for ( i = 0 .. length(A) )
{
D = A.substring( i thru ( i + length(B) );
skip = false;
for ( j = 0 .. length(D) )
{
// Anagram test for A[i] thru A[i + length(B)]
if ( C.indexOf( A[ j ] ) == -1 )
{
//A[j] is not in B, so skip ahead to next char in A
i = j + 1;
skip = true;
break;
}
}
if ( skip )
{
next;
}
Sort D;
if ( compare( D, SB ) == 0 )
{
anagram_count++;
}
}
return anagram_count;
}
void f(int A[], int aSize, int B[], int bSize)
{
int bCharTotal = 0;
int aCharTotal = 0;
if(bSize > aSize)
{
printf("Array A is smaller than B, anagram not possible");
return;
}
for(int i = 0; i < bSize; i++)
{
aCharTotal += A[i];
bCharTotal += B[i];
}
for(int i = bSize - 1; i < aSize; i++)
{
if(aCharTotal == bCharTotal)//anagram possible
{
int temp = aCharTotal;
for(int j = 0; j < bSize; j++)
temp -= b[i];
if(temp == 0)
{
printf("anagram found!\n");
printf("Substring on A from %d to %d is the anagram", i - bSize, i);
return;
}
}
//Shift the window total right by one
aCharTotal -= A[i-bSize]; //Remove the first character on the Total
if(i+1 < aSize) aCharTotal += A[i]; //add the next character on A array to the total
}
printf("No anagram found!\n");
}
When I saw the solutions here I got one doubt. Does the question says that
1. for a given B find a substring in A which is anagram of B?
Or it says
2. for a given B find a substring in A which contains anagram of B?
e.x.
1.
A=ghefdckjcaaadebclpq
B=aabedcca
here in A one of the substring is "caaadebc" which is anagram of B
2.
A=efcadckjaghadeblpcq
B=aabedcca
Here here in A one of the substring is "cadckjaghadeblpc" which contains anagram of B.
Solution posted above works for 2 only.
@Sathya, Mat, Anony please correct me if I m wrong.
Isn't this question a variation of sliding window which has all characters of B in A? If we can have unwanted characters in the window, i.e. the window size is > size of B then it's same as the sliding window problem.
Otherwise, we can have an additional check to ensure the window size = size of B.
put the string B into a hash table.
Now slide through the array A and check for each character to be present/absent ffrom the hash table. If present continue to next character and if absent start again from next character....thats linear.
IS IT CORRECT?
string A="hiiamam"
string B="army"
ur solution will return positive for amam(4 times in hash)
vs army (4 times in hash)
it depends on how u build the hash function...to check for anagrams ur string B must be sorted first and then any hash function would work...similarly in a window of length(B) in string A, first sort the window and then compute the hash function.
You can come with hash function such that anagrams will have same hashcode.
e.g. String contains only ascii characters then hash function could be adding ascii values of all the characters. Now pickup first 'm' characters from A and then compute hash function.
1. If hashcode matches with that of B, sort the substring (window of length B) and compare with sorted version
This saves you sorting each window of length (B).
2. If hashcode does not match you can compute next hashcode very easily, just substract ascii value of the first character in the window and add ascii value for the next character.
worst case running time O( n + ( n log m))
1. Each window can be sorted in O( m log m) time.
2. There would be n/m such windows. Hence total time would be O(n/m * m log m) = O( n log m)
3. Total time to traverse entire string A is O(n)
Algo:
1. Sort B
2. maintain a window of length m and slide it along A
3. for each window do:
3.1 Sort the window O(mlogm)
3.2 Compare the sorted window with sorted B
3.3 If equal anagram found
T(n) = O(n*mlogm) but since m<<n T(n)~O(n)
Nice approach
Using extra space it can be done without altering the input strings.
One Optimization to yr solution is that once we sorted the window of size "m" first time, after that no need to sort,we can call min heapify for the next entry.
so the time is (n-1)logm + mlogm = (n+m-1)logm instead of nmlogm.
i dont get the min heapify thing.please elaborate a bit.
and how this comes out to be (n-1)logm
how?suppose B="raam",A="ramaabcdefg"
now sorted B="aarm"
and sorted A for length 4 is aarm so both matches but no substring in A is aarm.plz let me know if i misunderstood the question
import java.util.Arrays;
public class AnagramSubString {
public static void containsAnagram(char a[], char b[]){
int m = b.length;
int n = a.length;
Arrays.sort(b);
for(int i = 0; i < n-m; i++){
char window[] = Arrays.copyOfRange(a, i, i+m);
Arrays.sort(window);
if(Arrays.equals(window, b)){
System.out.println("SubString Anagram at index "+i);
return;
}
}
System.out.println("Not Found");
}
public static void main(String args[]){
containsAnagram("123cabdnj265kml".toCharArray(), "abcdjn".toCharArray());
}
}
import java.util.HashMap;
public class AnagramSubstring {
static void initializeHashMap(HashMap<Character,Integer> strMap,String str2)
{
for(int k=0;k<str2.length();k++)
{
if(strMap.containsKey(str2.charAt(k)))
{
int temp = strMap.get(str2.charAt(k));
temp++;
strMap.put(str2.charAt(k), temp);
}
else
{
strMap.put(str2.charAt(k), 1);
}
}
}
static boolean checkAnagramSubstring(String str1,String str2)
{
HashMap<Character,Integer> strMap = new HashMap<Character,Integer>();
int str2_index=0,start=0;
for(int str1_index=0;str1_index<str1.length();str1_index++)
{
if(strMap.containsKey(str1.charAt(str1_index))&& strMap.get(str1.charAt(str1_index))>0)
{
int temp = strMap.get(str1.charAt(str1_index));
temp--;
strMap.put(str1.charAt(str1_index), temp);
if(str2_index==0)
{
start=str1_index;
}
str2_index++;
if(str2_index==str2.length())
{
return true;
}
}
else
{ if(str2_index!=0)
{
str1_index=start;
}
str2_index=0;
initializeHashMap(strMap,str2);
}
}
return false;
}
public static void main(String[] args) {
String str1="hellowoorldhowareyou";
String str2="oolrdw";
if(checkAnagramSubstring(str1,str2))
{
System.out.println("Present");
}
else
{
System.out.println("Not Present");
}
}
}
(1) Store all characters and their counts in string B into a hashtable.
(2) Set numZeros to 0 initially
(3) Iterate through the first |B| characters of A. For each character, check if it's in the hashtable. If not, ignore it. If yes, decrement the count. If the count is 0, increment numZeros. If numZeroes == #keys in the hashtable, then we have found a substring that's an anagram of B.
(4) For each subsequent character in A, do the same as in (3) above. Also, we need to "subtract" the character that no longer belongs in the current substring. Again, check if that character is in the hashtable. If not, ignore it. If yes, increment the count. If the count is now 1, which means it's previously 0, decrement numZeros. Repeat step (4) until we reached end of array or have numZeros == #keys in the hashtable.
unordered_map<char, int> mb;
unordered_map<char, int> ma;
int window_size(b.size());
for (int i=0; i < b.size(); i++)
mb[b[i]]++;
int cnt(0);
for (int i=0; i < b.size(); i++)
if (m[a[i]]) { ma[a[i]]++; cnt++; }
for (int i=0; i <= a.size()-b.size(); i++)
{
if (cnt == b.size()) return true;
else {
if (ma[a[i]]) { ma[a[i]]--; cnt--; }
if (i+b.size() < a.size()) {
if (mb[a[i+b.size()]]) { ma[a[i+b.size()]]++; cnt++; }
}
}
return false;
}
unordered_map<char, int> mb;
unordered_map<char, int> ma;
int window_size(b.size());
for (int i=0; i < b.size(); i++)
mb[b[i]]++;
int cnt(0);
for (int i=0; i < b.size(); i++)
if (m[a[i]]) { ma[a[i]]++; cnt++; }
for (int i=0; i <= a.size()-b.size(); i++)
{
if (cnt == b.size()) return true;
else {
if (ma[a[i]]) { ma[a[i]]--; cnt--; }
if (i+b.size() < a.size()) {
if (mb[a[i+b.size()]]) { ma[a[i+b.size()]]++; cnt++; }
}
}
return false;
}
question checks if you sliding window concept similar to rabin-karp string matching algorithm. Its particularly very useful if you window size
// a is the larger string, b is smaller and we need to check if anagram(b) exists as some substring. In anagram questions, think terms of matching the hash-tables.
bool isAnagaramExist(string& a, string& b)
{
unordered_set<char> sb;
unordered_map<char, int> ma;
int n(a.size()), m(b.size()), cnt(0);
for (int i =0 ; i < m; i++)
sb.insert(b[i]):
for (int i=0; i < m; i++)
if (sb.find(a[i]) != sb.end()) { ma[a[i]]++; cnt++; }
for (int s=0; s <= n-m; i++)
{
if (cnt == m && ma.size() == sb.size()) return true ; // hash-tables are same
if (sb.find(a[i]) != sb.end) {
cnt--;
if (--ma[a[s]] == 0)
ma.erase(a[s]); // if its becomes zero, remove from the hash-table maintained by sliding string A.
}
if (s+m < n && sb.find(a[s+m]) != sb.end()) {
ma[a[s+m]]++;
cnt++;
}
return false;
}
}
public static boolean ana_check(String a, String b)
{
int n=a.length();
int m=b.length();
boolean k;
for(int i=0;i<=(n-m);i++){
k= anagram((a.substring(i,i+m)),b);
if(k)
return true;
}
return false;
}
public static boolean anagram(String s, String t) {
// Strings of unequal lengths can't be anagrams
if(s.length() != t.length()) {
return false;
}
// They're anagrams if both produce the same 'frequency map'
return frequencyMap(s).equals(frequencyMap(t));
}
private static Map<Character, Integer> frequencyMap(String str) {
Map<Character, Integer> map = new HashMap<Character, Integer>();
for(char c : str.toLowerCase().toCharArray()) {
Integer frequency = map.get(c);
map.put(c, frequency == null ? 1 : frequency+1);
}
return map;
}
This solution runs at O(n*m) however as its is explicitly stated m<<n then the runtime is O(n).
Using Modified Rabin Karp, we can solve this problem in O(n)
Algorithm :
1. Calculate the sum of the characters (rolling hash) of string B = O(m)
2. Calculate rolling hash for the 1st window of size m for string A = O(m)
3. Progressively, calculate the rolling hash from m+1 till n-1 for string A = O(n-m)
4. At any point, if the rolling hash of string B matches the rolling hash of window under test, then we have found an anagram
Time Complexity = O(m+m+n-m) = O(n+m) = O(n) since n>>m
Space Complexity = O(1)
Find out all the anagrams and check array A for this substring. Checking one combination takes O(n). Since, number of anagrams rates to be not comparable to n, the complexity is O(n).
oh,sorry i got the point,thanks
- aryan.bit April 18, 2011