One way is to enforce a read before a write to the web page. That ensures that every write is based on the most up-to-date data. But even that is not sufficient: the RMW (read-modify-write) has to be atomic in some manner. For example, when a read is done for a subsequent write, a locking mechanism with a timeout can be used. The lock should carry some identification that the subsequent write must present in order to go through.
A plain read from the second admin is let through, but an RMW from the second admin will either return an error code (try again) or block until the first write is completed.
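A minimal sketch of that idea in Python, assuming an optimistic version-token scheme (the Record class and its method names are illustrative, not any particular framework's API):

import threading

class Record(object):
    """Toy record guarded by a version token (optimistic locking)."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()

    def read(self):
        # A read returns the value plus the token a later write must present.
        with self._lock:
            return self.value, self.version

    def write(self, new_value, token):
        # The write goes through only if nobody has written since the read.
        with self._lock:
            if token != self.version:
                return False  # stale token: caller must re-read and retry
            self.value = new_value
            self.version += 1
            return True

With two admins, both reads return the same token; whichever write lands first bumps the version, and the other write fails and has to retry on fresh data. A blocking variant would hold a per-record lock with a timeout instead of failing fast.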
The simplest hash map is an array. Assuming there is sufficient space, on a collision you increment in a predictable way to the next empty slot and put the value there. In all cases you need to put both the key and the value into the array.
As long as the increment method is the same for insert and search, and you compare the key at each probe, you can take care of conflicts that way.
But you need a randomizer function that gives the same value for the same key.
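As a sketch of the array-backed map with linear probing (the "predictable increment" above), where Python's built-in hash plays the role of the randomizer:

class ArrayHashMap(object):
    """Fixed-capacity hash map: open addressing with linear probing."""
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.slots = [None] * capacity  # each slot holds (key, value) or None

    def _probe(self, key):
        # Deterministic start slot, then step one slot at a time (wrapping);
        # insert and search must follow the same sequence.
        i = hash(key) % self.capacity
        for _ in range(self.capacity):
            yield i
            i = (i + 1) % self.capacity

    def put(self, key, value):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("map is full")

    def get(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:  # empty slot: key was never inserted
                raise KeyError(key)
            if self.slots[i][0] == key:
                return self.slots[i][1]
        raise KeyError(key)

Note that deletion would need tombstones rather than plain None, or later searches could stop early.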
I would keep the top N (minimums, in this case) in a sorted list. For each new number, check it against the max of this list; if the new number is less than that max, it deserves a place in the list and the current max is thrown out. E.g., if we keep the list in decreasing order, the first element is the max of the N minimums.
N = 5                          # example size
top_n = [float('inf')] * N     # kept in decreasing order: top_n[0] is the max

def process_number(n):
    if n < top_n[0]:
        top_n[0] = n
        # Bubble the new value into place to restore the decreasing order.
        i = 1
        while i < N and top_n[i-1] < top_n[i]:
            (top_n[i], top_n[i-1]) = (top_n[i-1], top_n[i])
            i += 1

After all numbers are processed, top_n holds the N minimum values seen.
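For what it's worth, the same result can be had from Python's heapq module by keeping a max-heap of the N smallest values (simulated with negation), which replaces the manual bubbling:

import heapq

def n_smallest(numbers, n):
    heap = []  # negated values: -heap[0] is the largest of the n smallest so far
    for x in numbers:
        if len(heap) < n:
            heapq.heappush(heap, -x)
        elif x < -heap[0]:           # x beats the worst of the current n
            heapq.heapreplace(heap, -x)
    return sorted(-v for v in heap)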
- haroud December 03, 2015

Quick thought: we may be able to do BFS instead of DFS and accumulate the paths to each node (appending the current node as we go). At the final node, add it to each of the accumulated paths and print them out; a sketch follows below.
All the paths are of equal length in this problem.
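A minimal sketch of that BFS, assuming the same adjacency-dict graph that the DFS solution below builds. It relies on the graph being layered (every edge advances the total score by one), so BFS processes all of a node's predecessors before the node itself; a general DAG would need topological order instead:

from collections import deque

def bfs_all_paths(graph, start, goal):
    paths = {start: [[start]]}      # paths[v]: every path from start to v
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            extended = [p + [nxt] for p in paths[node]]
            if nxt in paths:
                paths[nxt].extend(extended)
            else:
                paths[nxt] = extended
                queue.append(nxt)   # enqueue each node only once
    return paths.get(goal, [])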
someone: good point.
Since the data query is by date, my first instinct is to keep the data as a hash table keyed by date, and then keyed by city.
hash = {date: {city: temp}}
Getting the temp for a city on a particular date is, in the average case, one O(1) hash lookup for the date and another O(1) lookup for the city.
For finding the top 5, we first get the data by date. This gives us an unsorted collection of {city: temp} pairs. As we process it, we keep a list of the top 5 values (min or max, as required). For the hottest cities, if a new value is greater than the minimum of the five, we swap out that minimum. This can be optimized by keeping the 5-element array in sorted order.
Instead of doing any real sorting on this array (it is just a constant size), we bubble the new value up until it is in the right place: the new number first replaces top_five[0] and is then bubbled up until top_five[i] < new_val < top_five[i+1].
So every new value can potentially result in a reordering of the top-five list.
The complexity is O(N * 1), where N is the number of cities, since each value does at most a constant amount of work against the five-element list. The worst case is the values arriving in sorted order, so every value swaps out an old one and gets bubbled through the top five. The best case is still O(N), where only the first 5 cities cause a swap and no swap happens after that.
def preprocess(file, data):
    with open(file, 'r') as datafile:
        for line in datafile:
            (date, city, temp) = line.split()
            print 'Adding %s:%s to %s' % (city, temp, date)
            if date in data:
                data[date][city] = int(temp)   # store ints so temps compare numerically
            else:
                data[date] = {city: int(temp)}

def find_temp_city_date(date, city, data):
    try:
        return data[date][city]
    except KeyError:
        raise Exception("Date or city not in database")

def find_hottest_five(date, data):
    # Ascending order: top_five[0] is the coolest of the current top five.
    # Assumes temperatures are positive (the (0, "") seeds).
    top_five = [(0, ""), (0, ""), (0, ""), (0, ""), (0, "")]
    for city, temp in data[date].items():
        if temp > top_five[0][0]:
            top_five[0] = (temp, city)
            # One bubbling pass restores the ordering.
            for t in range(len(top_five) - 1):
                if top_five[t][0] > top_five[t+1][0]:
                    (top_five[t], top_five[t+1]) = (top_five[t+1], top_five[t])
    print "%s" % top_five
    for d in reversed(top_five):
        print "(%s, %s)" % (d[0], d[1])
raw_data = ('09-11-2015 Delhi 45\n'
'09-11-2015 Guwahati 51\n'
'09-11-2015 Kolkata 42\n'
'09-11-2015 Bombay 35\n'
'09-11-2015 Ahmedabad 35\n'
'09-11-2015 Chandigarh 41\n'
'09-11-2015 NewYork 25\n'
'09-11-2015 SanFrancisco 40\n'
'09-11-2015 Shanghai 23\n'
'09-11-2015 Junea 20\n'
'09-11-2015 Toronto 21\n'
'10-11-2015 Delhi 45\n'
'10-11-2015 Guwahati 51\n'
'10-11-2015 Kolkata 42\n'
'10-11-2015 Bombay 35\n'
'10-11-2015 Ahmedabad 35\n'
'10-11-2015 Chandigarh 40\n'
'10-11-2015 NewYork 25\n'
'10-11-2015 SanFrancisco 40\n'
'10-11-2015 Shanghai 23\n'
'10-11-2015 Junea 20\n'
'11-11-2015 Toronto 21\n'
'11-11-2015 Delhi 45\n'
'11-11-2015 Guwahati 51\n'
'11-11-2015 Kolkata 42\n'
'11-11-2015 Bombay 35\n'
'11-11-2015 Ahmedabad 35\n'
'11-11-2015 Chandigarh 40\n'
'11-11-2015 NewYork 25\n'
'11-11-2015 SanFrancisco 40\n'
'11-11-2015 Shanghai 23\n'
'11-11-2015 Junea 20\n'
'11-11-2015 Toronto 21\n')
with open("a.data", 'w') as f:
f.write(data)
$ python city_temp.py
Adding Delhi:45 to 09-11-2015
Adding Guwahati:51 to 09-11-2015
Adding Kolkata:42 to 09-11-2015
Adding Bombay:35 to 09-11-2015
Adding Ahmedabad:35 to 09-11-2015
Adding Chandigarh:41 to 09-11-2015
Adding NewYork:25 to 09-11-2015
Adding SanFrancisco:40 to 09-11-2015
Adding Shanghai:23 to 09-11-2015
Adding Junea:20 to 09-11-2015
Adding Toronto:21 to 09-11-2015
Adding Delhi:45 to 10-11-2015
Adding Guwahati:51 to 10-11-2015
Adding Kolkata:42 to 10-11-2015
Adding Bombay:35 to 10-11-2015
Adding Ahmedabad:35 to 10-11-2015
Adding Chandigarh:40 to 10-11-2015
Adding NewYork:25 to 10-11-2015
Adding SanFrancisco:40 to 10-11-2015
Adding Shanghai:23 to 10-11-2015
Adding Junea:20 to 10-11-2015
Adding Toronto:21 to 11-11-2015
Adding Delhi:45 to 11-11-2015
Adding Guwahati:51 to 11-11-2015
Adding Kolkata:42 to 11-11-2015
Adding Bombay:35 to 11-11-2015
Adding Ahmedabad:35 to 11-11-2015
Adding Chandigarh:40 to 11-11-2015
Adding NewYork:25 to 11-11-2015
Adding SanFrancisco:40 to 11-11-2015
Adding Shanghai:23 to 11-11-2015
Adding Junea:20 to 11-11-2015
Adding Toronto:21 to 11-11-2015
[(40, 'SanFrancisco'), (41, 'Chandigarh'), (42, 'Kolkata'), (45, 'Delhi'), (51, 'Guwahati')]
(51, Guwahati)
(45, Delhi)
(42, Kolkata)
(41, Chandigarh)
(40, SanFrancisco)
The problem is basically: what are the different ways we can reach a particular score starting from (0, 0)? From each node of the graph there are only 2 possible moves, e.g. from a score node (i, j) the possible next nodes are (i+1, j) and (i, j+1).
We can create the graph recursively first and then run a DFS to find all the paths.
In Python:
In the graph-creation part, I check whether (i+1, j) and (i, j+1) are still valid scores and, if so, recursively call the same function to populate the graph. Once the score reaches the final value in i, j, or both, that branch of the recursion returns.
For the printing, we keep track of the path to each node as a string and append to it as we progress down the graph. When we hit a node with an empty adjacency list (the final score), we print the full path at that point and return. Backtracking then takes the next potential branch and reaches the final node again, printing that path as well.
def create_score_graph(current_score, final_score, graph):
    (I, J) = final_score
    (i, j) = current_score
    if i < I:
        if (i+1, j) not in graph[(i, j)]:
            graph[(i, j)].append((i+1, j))
        if (i+1, j) not in graph:
            # Expand each node only once; revisits just add the edge above.
            graph[(i+1, j)] = []
            create_score_graph((i+1, j), final_score, graph)
    if j < J:
        if (i, j+1) not in graph[(i, j)]:
            graph[(i, j)].append((i, j+1))
        if (i, j+1) not in graph:
            graph[(i, j+1)] = []
            create_score_graph((i, j+1), final_score, graph)
def print_all_scores(graph, current_score, pstr):
    (i, j) = current_score
    mystr = pstr[:]
    if not mystr:
        mystr += "(%s, %s)" % (i, j)
    else:
        mystr += ",(%s, %s)" % (i, j)
    if not graph[current_score]:
        print mystr
    else:
        for score in graph[current_score]:
            print_all_scores(graph, score, mystr)
graph = {(0,0): []}
final_score = (2, 3)
start_score = (0, 0)
create_score_graph(start_score, final_score, graph)
import pprint
pp = pprint.PrettyPrinter()
pp.pprint(graph)
# graph should now have all possible paths to the final score
print_all_scores(graph, start_score, "")
$ python get_score_combination.py
{(0, 0): [(1, 0), (0, 1)],
(0, 1): [(1, 1), (0, 2)],
(0, 2): [(1, 2), (0, 3)],
(0, 3): [(1, 3)],
(1, 0): [(2, 0), (1, 1)],
(1, 1): [(2, 1), (1, 2)],
(1, 2): [(2, 2), (1, 3)],
(1, 3): [(2, 3)],
(2, 0): [(2, 1)],
(2, 1): [(2, 2)],
(2, 2): [(2, 3)],
(2, 3): []}
(0, 0),(1, 0),(2, 0),(2, 1),(2, 2),(2, 3)
(0, 0),(1, 0),(1, 1),(2, 1),(2, 2),(2, 3)
(0, 0),(1, 0),(1, 1),(1, 2),(2, 2),(2, 3)
(0, 0),(1, 0),(1, 1),(1, 2),(1, 3),(2, 3)
(0, 0),(0, 1),(1, 1),(2, 1),(2, 2),(2, 3)
(0, 0),(0, 1),(1, 1),(1, 2),(2, 2),(2, 3)
(0, 0),(0, 1),(1, 1),(1, 2),(1, 3),(2, 3)
(0, 0),(0, 1),(0, 2),(1, 2),(2, 2),(2, 3)
(0, 0),(0, 1),(0, 2),(1, 2),(1, 3),(2, 3)
(0, 0),(0, 1),(0, 2),(0, 3),(1, 3),(2, 3)
Requirements:
1. millions of users
2. each user can login from multiple devices (assume 1 at a time)
3. Users can send, draft, delete, trash, mark as spam, and move mail to another folder
4. Add contact, add calendar entry, send invite, ... (optional)
5. Two-step authentication for a new device
6. Long-term testing (many days), short-term (hours)
Scale:
Need many users (100s of millions)
Distributed tests with 100s of agents/slaves
Each slave runs multi-threaded agents with 1000s of email clients
Do we need geographical diversity?
High Level Test Design:
Distributed testing recommended
Master tester keeps track of all slaves' completed tests, failed tests, etc.
Asynchronous: the master polls for results or receives a notification when a test is done.
The master times out at some point with counts of tests passed, failed, and incomplete.
The master also allocates username lists to slaves (common, unique, etc.).
Mechanisms to enforce some sequencing when slaves access the same account.
Slave testers work through their lists of usernames.
Slave testers use unique identifiers to store info in the drafts folder.
Slaves log all tests and upload results/logs to the master.
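A toy sketch of the master/slave bookkeeping under the assumptions above; run_slave is a stand-in for a real agent, and threads with a local queue stand in for networked slaves:

import threading
import Queue  # `queue` in Python 3

def run_slave(slave_id, usernames, results):
    # A real slave would drive thousands of email clients; here each
    # username yields one placeholder PASS result.
    for user in usernames:
        results.put((slave_id, user, 'PASS'))

def run_master(all_usernames, num_slaves=4, timeout_secs=60):
    results = Queue.Queue()
    # Allocate username shards to slaves.
    shards = [all_usernames[i::num_slaves] for i in range(num_slaves)]
    slaves = [threading.Thread(target=run_slave, args=(s, shards[s], results))
              for s in range(num_slaves)]
    for t in slaves:
        t.start()
    # Poll for results; time out with whatever has arrived so far.
    expected = len(all_usernames)
    tally = {'PASS': 0, 'FAIL': 0}
    for _ in range(expected):
        try:
            (_, _, status) = results.get(timeout=timeout_secs)
            tally[status] += 1
        except Queue.Empty:
            break
    tally['INCOMPLETE'] = expected - tally['PASS'] - tally['FAIL']
    return tally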
I don't think just printing the possible combinations is sufficient. Each intermediate score can be reached via different paths, and this only prints each one once.
- haroud December 04, 2015