Nightingale
BAN USERIn addition to the numerous hash table based approaches described in this problem comments, it seems the solution can also be expressed through using auxiliary Disjoint Set structure to perform the actual merge of the nodes in the node set. That is:
* Use hash table to store the nodes where each node contains information about its parent node in the disjoint set, as well as min and max value.
* When iterating over numbers, add a node for this number into the set if it is not there yet and join the node with x+1 and x-1 nodes.
* Once done, scan over the root nodes in the disjoint set forest and find the one with maximum set size.
All steps are amortized O(n).
Code:
# Given an array of random numbers. Find the longest consecutive sequence.
# For ex
# Array 100 3 200 1 2 4
# Ans 1 2 3 4
# Array 101 2 3 104 5 103 9 102
# Ans 101 102 103 104
#
# Can we do it in one go on array using extra space??
import random
data = [int(5000*random.random()) for i in xrange(1000000)]
class Node:
def __init__(self, num):
self.parent = None
self.rank = 0
self.set_size = 1
self.min_num = num
self.max_num = num
class DisjointSet:
def __init__(self):
self.nodes = {}
def add(self, num):
if num not in self.nodes:
self.nodes[num] = Node(num)
def find(self, num, add=False):
x = self.nodes.get(num)
if x is None:
return None
if x.parent is None:
return num
else:
return self.find(x.parent)
def join(self, num1, num2):
p1 = self.find(num1)
if p1 is None: return
p2 = self.find(num2)
if p2 is None: return
if p1 != p2:
# different sets - join now
n1 = self.nodes[p1]
n2 = self.nodes[p2]
assert n1.parent == None
assert n2.parent == None
if n1.rank == n2.rank:
n1.rank += 1
if n1.rank < n2.rank:
(p1, p2) = (p2, p1)
(n1, n2) = (n2, n1)
n2.parent = p1
n1.set_size += n2.set_size
n1.min_num = min(n1.min_num, n2.min_num)
n1.max_num = max(n1.max_num, n2.max_num)
#data = [6, 100, 3,200,1,2,4]
ds = DisjointSet()
for i in data:
ds.add(i)
ds.join(i, i+1)
ds.join(i, i-1)
max_size = 0
max_start_num = None
for i in ds.nodes.values():
size = i.max_num - i.min_num + 1
if size > max_size:
max_size = size
max_start_num = i.min_num
print "start=%d, finish=%d" % (max_start_num, max_start_num+max_size-1)
Algorithm:
* Initialize lo to 0, hi to 1.
* Do hi = hi * 2 while data[hi] <= x
* Then do binary search between lo and hi for the first element greater than x
The best tutorial on binary search I ever read is on TopCoder (search in google with "topcoder" and "binary search" keywords - can't do links here)
I know how to do this in O(N * ln N) - Python code below. Linear solution seems to be possible given the fact the numbers are ranged - so we should be able to map the numbers to the sorted order in linear time, but then we need to iterate over them in linear time and that will take 10^6 steps. Whether it's better or worse than O(N * ln N) will depend on how many numbers we have.
s = [1, 1, 5, 3, 6, 4]
print sum([i[0][0] * min(i[0][1], i[1]) for i in zip(
sorted(zip(s, reversed(range(len(s))))),
reversed(range(len(s))))])
This is a sliding range minimum query problem. It can be solved in O(n) time.
- Nightingale November 25, 2012