Interview Question
Country: India
Interview Type: Written Test
Seems like there is a confusion in the wording of the question:
K-th largest element
vs.
K largest elements
For "K-th largest element":
--------------------------
Use quick select algorithm with linear time complexity.
For "K largest elements":
--------------------------
There are 4 solutions worth discussing:
Solution #1. Min-Heap solution with O(N log K) time, O(K) extra space:
Use a min heap of size at most K to store elements. Keep inserting element into the min heap. When the size is K+1, we know that the minimum element in the heap will always smaller than at least K other elements (also in the min heap now). Thus, that minimum element must be extracted from the heap, maintaining the heap size of at most K.
Solution #2. Max Heap solution, with O(K log N) time, O(N) space:
Use a max heap of size N. Making a heap (e.g., max heap) from an array of size N can be done in O(N) time, no extra space. After making such a max heap of size N, just extract K elements from it to get K largest elements, in O(K log N) time.
Note, when K<<N, K log N is also << N log K. Therefore, the max-heap solution has better time complexity than the min-heap solution.
Solution #3. Algorithm via quick select idea:
- Find the K-th largest element X using quick select, in O(N) time.
- Partition the array into 2 parts, using X as the pivot, in O(N) time. Suppose the left part contains all K numbers >= X.
- Sort the left part, using most efficient sorting algorithm, which takes f(K) time.
Time complexity: O(N + f(K)). Where f(K) = K if using linear time sorting algorithm, or f(K) = K log K if use comparison-based sorting algorithm, like merge sort.
Solution #4. Algorithm via sorting:
- Sort the array;
- Print first K largest elements.
Complexity of sorting algorithms can be:
- O(N log N), if use best comparison-based sorting algorithm like merge short;
- O(N*L) if use radix sort, where L = length of the largest element, can consider L = O(1);
- O(N + M) if use counting sort, where M = value of the largest element.
My order of preference (based on time complexity and easiness of implementation):
1. radix sort, if O(N) extra space is allowed;
2. max-heap, if modification on original array is allowed (for make_heap()), or O(N) extra space is allowed;
3. min-heap, if O(K) space is allowed;
4. partitioning via quick select (not trivial implementation).
There are 2 solutions to the problem.
- If space is not a constraint but time is, then you use the solution(Randomized Partition) that I had provided above earlier.
- now, if space is constraint, you do a trade off and use a MAX heap. yes, MAX heap and not min heap!
the approach is, if you're trying to find 4th element in a array of size 10, that means, if you sort that array, it going to be on the 4th place. (and we're not going to sort, this is just an example). That mean, this kth number is going to be the max of the kth array.
So the idea is:
- Build a kth size max heap
- run rest of the element through the 0th position of the heap.
- if the number is lesser then the 0th position of the heap, replace that at 0 position and correct the heap.
- at the end, you'll have the kth element at top of the heap.
Here is the C# implementation:
// kth number using max-heap
using System;
public class Program
{
public static void Main()
{
// 2 4 6 8 10 12 14
int[] arr = {12, 4, 14, 6, 10, 16, 2, 8};
// find 4th smallest number
int num = Find(arr, 4);
Console.Write(num);
}
public static int Find(int[] arr, int k)
{
int n= arr.Length;
int[] heap = new int[k];
for (int i = 0; i < k; i++)
{
heap[i] = arr[i];
}
// build heap
for (int p = (k - 1) / 2; p >= 0; p--)
{
MaxHeapify(heap, k, p);
}
for(int i=k; i<n; i++)
{
if(arr[i] < heap[0])
{
heap[0] = arr[i];
MaxHeapify(heap, k, 0);
}
}
return heap[0];
}
public static void MaxHeapify(int[] heap, int heapSize, int index)
{
int l = index * 2 + 1;
int r = index * 2 + 2;
int largest = (l < heapSize && heap[index] < heap[l]) ? l : index;
if (r < heapSize && heap[r] > heap[largest])
largest = r;
if (largest != index)
{
Exch(heap, index, largest);
MaxHeapify(heap, heapSize, largest);
}
}
private static void Exch(int[] arr, int l, int r)
{
try
{
int temp = arr[l];
arr[l] = arr[r];
arr[r] = temp;
}
catch
{
Console.Write(l + " ");
Console.WriteLine(r);
}
}
}
Complexity: n Log(k)
Now the beauty of this implementation is, once you have the k capacity heap, you need to access each of the rest of the elements once only. so, you can run a stream of numbers through the heap and will have the kth number at the end. :)
Hope this helps!
public class Myclass {
public static int find(int [] A, int k){
PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
for(int i=0;i<A.length;i++){
pq.offer(A[i]);
}
int n = -1;
k=pq.size()-k+1;
while(k>0){
n = pq.poll();
k--;
}
return n;
}
public static void main(String[] args) {
int[] A = { 1, 2, 10, 20, 40, 32, 44, 51, 6 };
int k = 4;
System.out.println("4th biggest element:" + find(A,4));
}
}
This is an Order statistics problem and can be solved using Randomized Partition (which is of O(n)).
Here is the implementation in C#:
public class OrderStatistics
{
// Driver
private static void Main()
{
var arr = new[] { 6, 14, 10, 2, 8, 12, 4 };
// find 3rd largest element
var value = FindKth(arr, 3);
Console.Write(value);
Console.ReadKey();
}
public static int FindKth(int[] arr, int kth)
{
if (kth < 1 || kth > arr.Length)
{
throw new Exception("Index out of range");
}
var pos = Find(arr, kth - 1, 0, arr.Length - 1);
return arr[pos];
}
private static int Find(int[] arr, int kth, int start, int end)
{
int p = Partition(arr, start, end);
if (kth == p)
return p;
if (kth < p)
p = Find(arr, kth, start, p);
else
p = Find(arr, kth, p + 1, end);
return p;
}
private static int Partition(int[] arr, int start, int end)
{
int p = start,
i = start;
for (int j = start + 1; j <= end; j++)
{
if (arr[j] < arr[p])
{
Exchange(arr, ++i, j);
}
}
Exchange(arr, p, i);
return i;
}
private static void Exchange(int[] arr, int i, int j)
{
int temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
}
MIN-HEAP..!!
Given: array of N elements.
To Find: Top K elements.
-Take first K elements from array and build-min-heap --- O(K) time.
-Now take the rest (N-K) elements and compare with root element. IF greater than Root insert into min-heap and perform min-heapify--- O(log K) time.
-In worst case we need to do (N-K) times min-heapify ---O((N-K)log K).
Total worst case time --- O(K) + O((N-K)log K)
Max heap of size K. Usefull as it sounds like the file might be not allowed to be read into memory. You browse the file element, by element adding to the heap.
- Anonymous July 27, 2014