Zynga Interview Question
Software EngineersCountry: United States
Python :)
peopleDict = { 'John' :['movie', 'dancing', 'singing', 'bar', 'FIFA'],
'Alex': ['computer', 'programming', 'FIFA', 'bar'],
'Lisa': ['singing', 'dancing', 'computer', 'bar', 'FIFA'],
'Alexa': ['computer', 'FIFA', 'bar'],
'Rose': ['singing', 'computer', 'bar', 'FIFA']
}
def matchThese(name1, name2):
matching = set(peopleDict[name1]).intersection(set(peopleDict[name2]))
totalScore = len(set(peopleDict[name1]).union(set(peopleDict[name2])))
print('match score between {0} and {1} is {2} out of {3}'.format(name1,name2,len(matching), totalScore))
print('Common interest are: ', matching)
matchThese('Alex', 'Rose')
I assume, intentionally information is held back on the requirement. So let's say we ask more questions and come up with the following requirements
- Saurabh July 09, 2018Requirements:
===========
1. The algorithm will identify people with similar interests based on the answers they provide to the set of questions.
Questions can be like.. on a scale from 1 to 5 how important is outdoor sports to you or write briefly about your favorite historical event etc
2. The score of the people would be their similarity index (Cosine or Jaccard index)
3. Rank matching would be a predictive matching score given to a new person when matched with an existing profile.
With the above requirements, essentially we are trying to solve a "K nearest neighbor" (KNN) problem. This problem can be solved using an advanced data structure like quadtree / KD tree or by using a locality sensitive hashing (LSH).
We can use LSH to find the minHash for the answers they provided to the questions. A very simplistic example is, below are the 3 historic events written by 3 people
1. American war of 1775
2. World war II war
3. American independence war
Here based on the Jaccard index (The Jaccard index is an intersection over a union index. We count the number of common elements from two sets/sentences and divide by the number of elements that belong to either the first set or the second set) person 1 and 3 are near to each other in historical liking than 1 and 2 or 2 and 3.
(We will have to provide a much detailed example in the actual interview discussion)
The above index will be the answer to requirement #2.
For the requirement #3, we keep a group of people on 2D scale and when an unknown person appears on the scale, we calculate its distance from K nearest neighbors. For example, if we have 2 set of people, one with liking for sports and 1 with a liking for arts. When a new user is plotted on the scale, we calculate its distance from k nearest neighbors and based on the answer (like its close to 4 sports people and 1 arts person) we decide whether this new user is more likely to be a good match with people from sports category or from an arts category.