Walmart Labs Interview Question
Software Engineer / DevelopersCountry: India
Interview Type: In-Person
I somewhat agree with your comment. I think a hashmap with key as the Keyword and Value as List of Ads that have that keyword would be a good data structure over here.
HashMap<KeyWord,List<Ads>>;
You can get list of all Ads with a specific keyword in O(1). If you have multiple keywords you can get all the K keywords in O(k) and then we need to iterate over the list to make sure it does not contain duplicates. So, I think the overall complexity would be O(k*n) where n is the number of unique Ads. If anyone can knows a better solution please comment.
If the question is just that from a list of keywords, whichever ad contains most words is the right one, why not just iterate all the adds and depending on the number of hits for keywords, assign it an integer and use this integer as a key for the the ad in a dictionary. If there is already a key with this integer, ignore the current one since our only criterion is to find the one with most keywords. So even if its a tie, its still the max.
I wudnt build tries for all those candidates.
I'd preprocess the keywords a la boyer moore horspool and substring match in all candidates (case insensitive). Define objective function on a candidate as something that takes into account total macthes and total different matches from pool of keywords.
Its similar to search engine in a way that we have to associate the query string with the best possible ad(or webpage).
for each keywords populate a graph having edge between keyword node and ad node. the weight age of edge is calculated based on various factors like similarity, occurrences etc.
When getting best ad to be displayed, parse the request content which will contain keywords. based on these keywords get all the ads with minimum weight age cutoff from the graph. using the weight between different keyword and ad determine a cumulative weight age between request content and ads. get the max cumulative weight age ad and display it.