Yahoo Interview Question for Software Development Managers


Team: Y! Suite
Country: United States
Interview Type: In-Person




Comment hidden because of low score. Click to expand.
3
of 3 vote

If the two nonanagram strings are different lengths and we know the lengths ahead of time, we're done ( O(1) ) but ignore that case.

Let m be length of a string (note, they are both equal in length in interesting case).
{Note: Since the problem had no "letters" I, myself, define a letter instead of bringing "historical reasons" as a hidden assumption for what they mean}

Theoretical MIN. for solving the problem is clearly O(m) - you have to check every letter of the strings. Note, O(m) + O(m) = O(m) so I do mean checking both strings.

Can we find a solution that meets the theoretical min.?

Sure, in the abstract, let s1, s2 be our strings.

1) t1 = linearsort(s1), t2 = linearsort(s2) <=== O(m) work
2) now compare t1 and t2 letter by letter <=== O(m) work

Total O(m). We win.

=================

Details: What linear sort you use is a detail that changes the constants hidden in the big O. We can optimize step by step:

1) Counting sort works fine, but we don't need the stability, so maybe it's enough to...
2) BucketSort or basically count number of letters of each string. {Good enough I think}

{Now you can optimize 2) even further with a trick, of counting UP with one string, then DOWN with the other on the same tally array ... this saves some space and time... big deal who cares }

- S O U N D W A V E October 20, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
2
of 2 votes

#define ALPH_SIZE 256   //change as desired based on encoding
int count[ALPH_SIZE],  i;  

if( s1.length != s2.length ) return false;

for(i=0; i < s1.length ; i++) count[ s1[i] ]++, count[ s2[i] ]--;
for(i=0; i < ALPH_SIZE ) if ( count[i] ) return false;

return true;

- S O U N D W A V E October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Use unsigned int or another "rollover/under" save integer type above.

unsigned int count[ALPH_SIZE];  //or keep it as "int" if roll-under is "safe"

Or modify as necessary for your "more strict" platform.

- S O U N D W A V E October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
2
of 2 votes

#define ALPH_SIZE 256  
unsigned int count[ALPH_SIZE],  i;  

if( s1.length != s2.length ) return false;

for(i=0; i < s1.length ; i++) count[ s1[i] ]++, count[ s2[i] ]--;
for(i=0; i < ALPH_SIZE ; i++) if ( count[i] ) return false;

return true;

- S O U N D W A V E October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

The time complexity of the above code is O(m + s), where m is the length of each string and s is the size of the alphabet. If s could potentially be large in comparison to m, we can use a hash-based set instead of an array to make the complexity O(m).

- eugene.yarovoi October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Hello everyone,
Most of the time for anagram related questions we create a int array of 256 length.
Could you please comment if our solutions will work if strings are not consist of English alphabets. I mean if strings are in some other language (e.g., Chinese, Japanese). Will an array of 256 length will suffice in that case.

- Rakesh Roy October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

@Rakesh - The data structure used for storing character occurrence counts depends on the range the characters in the input string.

If the input contains only alpha-numeric then an array of constant size would suffice.
If the input contains wide range of characters then the hash table would save lot of memory.

It all depends on the assumptions which you take during the interview.

- siva October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

@Eugene, Yes correct. I have commented on the O(s) fixed being too large in a reply somewhere else on this thread.

@Siva, In this problem, getting a workable algorithm comes first, the rest is something easily _designable_ with the interviewer (or engineering team), like you say.

- S O U N D W A V E October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

That is, having a DS that is fixed O(s=encoding size), is better than having one that is O(m=stringsize) for this problem, IMO.

- S O U N D W A V E October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Considering the runtime, how is this O(m) and not O(m*logm)?

- MartyMacGyver October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Ok I'm biased to a single language alphabet in my answer above ^^^^

Hasing, I feel, is not a "algorithm." Some DSs have natural algorithms coming with them, but "use hash" seems like it's not explaning much. In this problem, I think it's better to think of Hash as a final optimization step to get memory size and/or runtime down for specific applications. And there are many hashing techniques... here is one such idea for a specific application niche (probably best application for this question: 1 language, strings on the alphabet compared) ....

One interesting way to hash to get memory size and runtime O(s) down is to strip out all parts of the encoding that are irrelevant to the problem (that is, anagrams usually refer to strings on the actual alphabet characters).
We can make due without even 256 sized count array (even if we are living in ASCII) with a one-to-one hash that maps 'a' to 'z' into integers 0 to 25, and also characters 'A' to 'Z' into 26 to 51 , irrespective of the underlying encoding size.

#define ALPH_SIZE 26   //size of real alphabet
unsigned count int[2*ALPH_SIZE + 1]; //lower, upper case & other 

//shifted hash (this is one-to-one on the actual alphabets)
unsigned int alphahash( char x )
{
	return ( x >= 'a' && x <='z'  ? x-'a' :                      //lower case
                    x >= 'A' && x <='Z' ? x-'A'+ALPH_SIZE : //upper case
                    2*ALPH_SIZE);                                 //all other characters (junk)
                                                                        //or add other ranges (numbers?)
}

Since the above hash is invertible on the alphabets, we can do the final anagram for-loop just as easily in ~C*52 fixed checks regardless of encoding size (note, this is for this special 1 language anagram niche application).
So O(s=encoding size) become C*52.

Yeah, we can optimize and modify solutions to make them super amazing, but we shouldn't simply post the most fancy one without any explanation. Rather _maybe_ we should show how to "develop" a solution.

Idea => Idea2 => Idea3 => Idea4 => Idea5 => {Optimize/improve/use tricks}

This problem has a "non-trick" chain of ideas that can lead to a good solution for even someone who has never done such a problem.

Anyways time to take a giant break from this place because I'm tired of the way people post questions and how people post responses. People are doing things to save themselves time (like rushing a question post with lack of details or posting a response hoping for validation or a code check) and causing the whole system to become extremely inefficient.
Most people post their code as a response to questions simply as a way to get other people to find bugs in their code.

- S O U N D W A V E October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

1) "If we can manage to linear sort the strings, this preprocessing step is free since we have to check every letter regardless"

2) "Keep a count of characters we've seen as we sweep each string from left to right"

Above are two such ways to explain how to get an idea for the solution:
"Use a hash" off the bat is not the way to go.

- S O U N D W A V E October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
-1
of 1 vote

typedef unsigned int uint;

#define ALPH_SIZE 26  //for English alphabet 
                             //so long as lower, upper case parts live in contiguous
                             //ranges of any underlying encoding (ascii, etc.)

uint count[2*ALPH_SIZE + 1]; //lower, upper case & other junk
uint h( char x )
{
    return (x >= 'a' && x <='z' ? x-'a' :           //lower case
            x >= 'A' && x <='Z' ? x-'A'+ALPH_SIZE : //upper case
            2*ALPH_SIZE)    //all other characters
                                              
}

/* precondition:  you supply it with strings on English alphabet 
                  that is, the rest of the code in your program
                  should not create junk strings/characters  if you want proper results    */

if( s1.length != s2.length ) return false;

for(i=0; i < s1.length ; i++) count[h(s1[i])]++, count[h(s2[i])]--;
for(i=0; i < 2*ALPH_SIZE ; i++) if ( count[i] ) return false;

return true;

- S O U N D W A V E October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
3
of 3 votes

There is much better way to check if two strings are anagrams O(n) time and O(1) space where n is the length of the longest string.

1) Assign each of the 26 letters a unique prime number i.e. {'a'=>2, 'b'=>3, 'c'=>5, 'd'=>7, 'e'=>11, ..., 'l' = 37, ..., 's' => 67, .... ,'t' => 71,.... 'z' => 101}
2) All the anagrams of a word will have the same UNIQUE product. For example: prod(TEA) = prod(ATE) = prod(ETA) = prod(TAE) = prod(AET) = prod(EAT) = 71*11*2 = 154

public static final int primes[] = {2, 3, 5, 7, 11, 13, ......, 101} ;

public boolean isAnagram (String word1, String word2)
{
    	if(word1.length() != word2.length())
        	return false;

	word1 = word1.toLower();
	word2 = word2.toLower();

	int word1Key = 1;	
	for(int i=0; i<word1.length; i++)
		word1Key*=primes[word1[i]-'a'];

	int word2Key = 1;	
	for(int i=0; i<word2.length; i++)
		word2Key*=primes[word2[i]-'a'];

	return word1Key == word2Key;
}

- zahidbuet106 May 12, 2014 | Flag
Comment hidden because of low score. Click to expand.
2
of 2 vote

Sorting two strings would take O(nlogn) at best. If you're concerned about time, this isn't the best approach. The better approach would be to use a hash map. This would give you an time of O(n). However, if you have really long strings, you could have a problem with space.

public class AnagramChecker {
	public static void main(String[] args) {
		String s1 = "this is a test";
		
		String s2 = "this test is a";		
		String s3 = "this  test is a";
		String s4 = "this is a tes1";
		if (isAnagram(s1,s2)) {
			System.out.println("these are anagrams, string 1: " + s1 + " and string 2:" + s2);
		} else {
			System.out.println("these are not anagrams, string 1: " + s1 + " and string 2:" + s2);
		}
		if (isAnagram(s1,s3)) {
			System.out.println("these are anagrams, string 1: " + s1 + " and string 2:" + s3);
		} else {
			System.out.println("these are not anagrams, string 1: " + s1 + " and string 2:" + s3);
		}
		if (isAnagram(s1,s4)) {
			System.out.println("these are anagrams, string 1: " + s1 + " and string 2:" + s4);
		} else {
			System.out.println("these are not anagrams, string 1: " + s1 + " and string 2:" + s4);
		}
	}

	private static boolean isAnagram(String s1, String s2) {
		Map<String,Integer> letters = new HashMap<String,Integer>();
		
		if (s1.length() != s2.length()) return false;
		
		
		for (int i = 0; i < s1.length(); i++) {
			String cachedLetter = String.valueOf(s1.charAt(i)); 
			Integer count = letters.get(cachedLetter);
			if ( count == null ) {
				letters.put(cachedLetter, 1);				
			} else {
				letters.put(cachedLetter, count + 1);
			}			
		}
		for (int i = 0; i < s2.length(); i++) {
			String cachedLetter = String.valueOf(s2.charAt(i)); 
			Integer count = letters.get(cachedLetter);
			if ( count == null ) {
				return false;			
			} else {
				count = count - 1;
				if (count != 0) 
					letters.put(cachedLetter, count);
				else 
					letters.remove(cachedLetter);
			}			
		}
		return true;
		
	}
}

- RC October 20, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 votes

RC, nlgn is only a minimum on < comparison sorting.
And no need for hash table or any storage that grows with string size.
Below uses fixed O(encoding size) storage regardless of strings.

Try,
// put in a function returning true if s1, s2 are anagrams

#define ALPH_SIZE 256   //change as desired based on encoding
int count[ALPH_SIZE],  i;  

if( s1.length != s2.length ) return false;

for(i=0; i < s1.length ; i++) count[ s1[i] ]++, count[ s2[i] ]--;
for(i=0; i < ALPH_SIZE ) if ( count[i] ) return false;

return true;

//please fix bugs too

Note: Above is essentially optimized bucket sort (stripping out the parts not needed from it).

That is, optimized form of abstract solution below(m= string length):

1) t1 = linearsort(s1), t2 = linearsort(s2) <=== O(m) work
2) now compare t1 and t2 letter by letter <=== O(m) work

Any linear sort followed by step through comparing strings works optimally.
The optimized form above came to me after thinking in abstract about linear sorts (then optimizing by "strip out"). The linear sort came to me because I wanted to match the theoretical minimum for the problem as best as possible.

No magic rabbit out of a hat needed. Can we start explaining how we are "thinking" of solutions? Of course, to some people that answer is "I remember this question like the 1000 other questions I remember" but .... and least reverse engineer what you remember as a solution into a thought out process?

- S O U N D W A V E October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Use unsigned int or another "rollover/under" save integer type above.

unsigned int count[ALPH_SIZE];  //or keep it as "int" if roll-under is "safe"

- S O U N D W A V E October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

Your 'bucket sort' is really just a hash table with the key being the value of the character as the index and the value in the index. The problem with this solution is the space complexity. So, while your solution doesn't grow with string size, it does grow with character base.

Hashtables that don't have perfect functions do have problems. As the taken keys increase, the table must increase. This could be a problem with run-time.

- Anonymous October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 votes

I still do not get it, what are you correcting with your reply? Any objections?

Count array method above is basically a hash table with trivial hash h(x) = x, that keeps counts of keys seen instead of storing satellite data (i.e., keeps information about duplicate keys). We all know this.

The count array comes naturally from thinking of the abstract solution using linear sorts. To call it a hash table right off the bat is not the right way to go here. We can call "any" array with some meaning given to the index a hash table, since h(index) = index is used to index into the array (implicitly, without a hash function). That's like calling a cherry bomb a WMD.

If someone says "use a hash table of size O(stringsize)" then that is pulling a rabbit out of a hat.

I have explained the process of developing the solution, which is roughly: try to match. theoretical min. => pre-process by linear sort => bucket sort is one such type of sort => strip out unnecessary bits => keep a tally of letters

If character encoding range is too big for your app., then the logical adjustment is another => hash the character counts (to get smaller memory usage). e.g., if you don't want to use a sizeof(int)x64kB count array for unicode in Java, then the adjustment is minor. For one you can use "byte" or "char" then it's only 64kBytes memory, else you can map onto a smaller chunk of memory (e.g., a type of hashing).

And you are incorrect, there is no problem at runtime. The size of the tally array (or tally hash array if you want to go that route) is a matter of design, not a runtime problem. Encoding is fixed at compile time.

- S O U N D W A V E October 21, 2013 | Flag
Comment hidden because of low score. Click to expand.
0
of 0 vote

import java.util.HashMap;

public class Anagram {

	static HashMap<Character,Integer> Anagram_table = new HashMap<Character,Integer>();
	public Anagram() {


	}

	public static boolean checkIfAnagram(String s1,String s2)
	{
		if(s1.length()!=s2.length())
			return false;




		for(int i =0;i < s1.length();i++)
		{	char c = s1.charAt(i);
		int count =0;
		if(!Anagram_table.containsKey(c))
			Anagram_table.put(c, ++count);
		else 
			Anagram_table.put(c,(Anagram_table.get(c))+1);			
		}

		for(int j =0;j<s2.length();j++)
		{
			char c = s2.charAt(j);
			if(Anagram_table.containsKey(c))
			{
				Anagram_table.put(c,Anagram_table.get(c)-1);
				if(Anagram_table.get(c)==0)
					Anagram_table.remove(c);
			}
			else 
				return false;



		}

		return(Anagram_table.isEmpty());	

	}

}

- RS October 21, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

bool checkAnagram(string s1, string s2)
{
	if(s1.size() != s2.size())
		return false;
	int A[ALPH_SIZE];
	for(int i=0; i<ALPH_SIZE; i++)
		A[i] = 0;
	for(int i=0; i<s1.size(); i++)
		A[s1[i]]++;
	for(int i=0; i<s2.size(); i++)
	{
		A[s2[i]]--;
		if(A[s2[i]] < 0)	return false;
	}
	return true;
}

- Jason October 23, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.
0
of 0 vote

For this problem, why can't we just put all the characters of the string into a SET and then try to put the characters of the second string into the set. If the SET already contains the character, we should remove it. In the end, if the set is empty, we know its an anagram :). This o(n) solution

- bhajatin November 07, 2013 | Flag Reply
Comment hidden because of low score. Click to expand.


Add a Comment
Name:

Writing Code? Surround your code with {{{ and }}} to preserve whitespace.

Books

is a comprehensive book on getting a job at a top tech company, while focuses on dev interviews and does this for PMs.

Learn More

Videos

CareerCup's interview videos give you a real-life look at technical interviews. In these unscripted videos, watch how other candidates handle tough questions and how the interviewer thinks about their performance.

Learn More

Resume Review

Most engineers make critical mistakes on their resumes -- we can fix your resume with our custom resume review service. And, we use fellow engineers as our resume reviewers, so you can be sure that we "get" what you're saying.

Learn More

Mock Interviews

Our Mock Interviews will be conducted "in character" just like a real interview, and can focus on whatever topics you want. All our interviewers have worked for Microsoft, Google or Amazon, you know you'll get a true-to-life experience.

Learn More