Spotify Interview Question
Software Engineer / DevelopersCountry: United States
Interview Type: Phone Interview
I suppose one way would be to break the set up into chunks so that two chunks fit into memory at the same time, then test the first chunk against every other chunk, removing duplicates. Then test the second chunk against all but the first, removing duplicates and so on. In the end you have tested every chunk against every other chunk, so there should be no duplicates.
It would take O(N^2) time.
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.ListMultimap;
import java.util.*;
import java.util.function.BiFunction;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class RemoveDuplicates {
private static final int MEMORY_SIZE = 4;
public static void main(String[] args) {
final List<String> values = Arrays.asList("1", "2", "4", "a", "2", "a", "9", "3", "1", "7", "3");
removeDuplicates(values.stream(), values.size(), hashFunction(), MEMORY_SIZE, 0);
}
private static final int[] primes = new int[] { 1, 2, 3, 5, 7, 11, 13, 17, 19, 23}; // Etc.
public static BiFunction<String, Integer, Integer> hashFunction() {
return (s, prime) -> s.hashCode() % prime;
}
public static <T> void removeDuplicates(final Stream<T> dataset, final int size, BiFunction<T, Integer, Integer> hashFunction, final int memorySize, int iteration) {
if(size > memorySize) { // We can also choose to do an external sort when iteration > MAX_ITERATION and then de-duplicate (but that is slower)
final ListMultimap<Integer, T> map = ArrayListMultimap.create();
dataset.forEach(el -> map.put(hashFunction.apply(el, primes[iteration]), el));
for(Integer key : map.keySet()) {
final Collection<T> value = map.get(key);
System.out.println("List (" + iteration + "): " + value);
removeDuplicates(value.stream(), value.size(), hashFunction, memorySize, iteration + 1);
}
} else {
final Set<T> set = dataset.distinct().collect(Collectors.toSet());
System.out.println(set);
}
}
}
Use "external sort" and find duplicates. O(n logn)
- Karthik April 15, 2013