What a wicked answer, Erasmus, thank you very much! I stand corrected,
plead guilty as charged and offer a reimbursement. You're right that
splice is not O(1), neither in Perl nor in Python. To preserve the
contiguous memory locations, I offer to run an equivalent chain of swaps
whenever the iterator i hits a number that must be moved to the left.
Here is the improved rearrange function:
def rearrange(a):
    good_i = -1              # last index that is already correctly placed
    i = 0
    while i < len(a):
        v = a[i]
        # is_odd/is_even are the helpers defined below
        if is_odd(good_i) and v < 0 or is_even(good_i) and v >= 0:
            if i != good_i + 1:
                # chain of swaps: shift a[good_i+1 .. i-1] one slot to
                # the right, then drop v into the freed slot
                for j in range(i, good_i + 1, -1):
                    a[j] = a[j - 1]
                a[good_i + 1] = v
            i = good_i = good_i + 1
        i += 1
    return a
The trick is to _move_ the next element to its place in the array, not to swap. Such an operation is also known as splice. I wrote it in Python. I'd argue it is O(N) even though the iterator i sometimes decreases a bit. The good news is that every splice we do fixes two elements simultaneously.
def is_odd(num):
    return num % 2 > 0

def is_even(num):
    return not is_odd(num)

def rearrange(a):
    good_i = -1              # last index that is already correctly placed
    i = 0
    while i < len(a):
        v = a[i]
        if is_odd(good_i) and v < 0 or is_even(good_i) and v >= 0:
            if i != good_i + 1:
                # splice: remove v at position i ...
                a[i:i + 1] = []
                # ... and re-insert it right after good_i
                a[good_i + 1:good_i + 1] = [v]
            i = good_i = good_i + 1
        i += 1
    return a
print(rearrange([1, 2, 3]))
print(rearrange([-1, -2, -3]))
print(rearrange([1, 2, 3, -1, -2, -3]))
print(rearrange([1, 2, -1, -2, -3, 3]))
print(rearrange([-1, -2, -3, 1, 2, 3]))
print(rearrange([-1, 1, -2, 2, -3, 3]))
print(rearrange([1, -1, 2, -2, 3, -3]))
print(rearrange([-1, 1, -2, 2, -3, 3, 4, 5, 6]))
print(rearrange([-1, 1, -2, 2, -3, 3, -4, -5, -6]))
Enjoy
- laperla October 15, 2013

Hi Izaaz, in your example the first pass finds the critical threshold T to be 2. The second pass would then take T=2 divided by S=3 and ask all servers for all URLs that have a score >= 2/3. In other words, it would merge the complete set of URLs and thus get the URL G and the sum of the accesses to it in the second pass.
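A minimal sketch of that second-pass arithmetic. The three servers and the per-URL counts below are invented for illustration; only T=2 and S=3 come from the example:

```python
# Invented data: three servers (S = 3); URL "G" scores low on every
# single server but has a respectable aggregate.
servers = [
    {"A": 5, "G": 1},
    {"B": 4, "G": 1},
    {"C": 3, "G": 1},
]
S = len(servers)
T = 2                     # critical threshold from the first pass
threshold = T / float(S)  # 2/3

# Second pass: every server reports all URLs with score >= T/S ...
candidates = set()
for srv in servers:
    candidates |= {url for url, score in srv.items() if score >= threshold}

# ... and the merge sums the per-server scores of each candidate.
totals = {url: sum(srv.get(url, 0) for srv in servers) for url in candidates}
print(totals["G"])  # G passes the 2/3 cut on every server; its aggregate is 3
```

Because G's score of 1 beats 2/3 on each server, every server reports it, and the merge recovers its true total.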
- laperla October 07, 2013

Thanks, vik, for your thoughts on scalability. I think this shows how open-ended the question actually is. Without more knowledge about the topology of the machines, datacenters, load balancers, etc. involved, it is not possible to prove the quality and scalability of the algorithm in real life. A few things I would suggest discussing:
- is it a typical weblog file? Typical weblogs have a steeply declining start and a long flat tail, so the number of accesses at the tenth rank is usually high: if it is such a distribution, this algorithm is not bad.
- how biased are the load balancers? If the load balancers have a high degree of randomness built in, then the differences between the machines are statistically already similar enough that the first pass is quite good and the second pass is not much work.
- can clusters of machines be pre-merged? If racks or rows of racks can have a logserver that collects the full logs for the rack or row, then the number of servers to contact can be drastically reduced.
- how often are these calculations repeated? Which similar questions need to be answered? How often? How exact must the numbers be? How up to date? Each answer would influence the program and tuning we want to set up to produce fast and good numbers.
As I understand the task, the result must be one or more of the given points on the grid, not an extra point somewhere in between. In my opinion, this is backed by the hint that one should imagine a map of streets.
Given this interpretation is correct, the task is just to sum up, for each point, all distances that would have to be traveled if all people were to come together at that point. The distance between two points on a grid is trivial to calculate: just the distance on the x-axis plus the distance on the y-axis, no graph theory required. The sum must be computed for every point, so the runtime complexity is O(N^2).
Here is my solution written in Go. It contains 6 examples that can be verified with a bit of common sense by drawing the points on a grid.
package main

import "fmt"

type Point struct {
	X, Y int
}

type Solution struct {
	Indexes []int
	Min_sum uint
	Points  []Point
}

func abs(a int) uint {
	if a < 0 {
		a = -a
	}
	return uint(a)
}

func grid_distance(p1, p2 Point) uint {
	distance := uint(0)
	distance += abs(p1.X - p2.X)
	distance += abs(p1.Y - p2.Y)
	return distance
}

func median_of_grid(points ...Point) {
	sol := new(Solution)
	sol.Min_sum = ^uint(0) // largest possible uint
	for i, ivalue := range points {
		sum := uint(0)
		for _, jvalue := range points {
			sum += grid_distance(ivalue, jvalue)
		}
		if sum < sol.Min_sum {
			sol.Min_sum = sum
			sol.Indexes = []int{i}
		} else if sum == sol.Min_sum {
			sol.Indexes = append(sol.Indexes, i)
		}
	}
	for _, value := range sol.Indexes {
		sol.Points = append(sol.Points, points[value])
	}
	fmt.Println(sol)
}

func main() {
	// one solution:
	median_of_grid(Point{0, 1})
	median_of_grid(Point{0, 0}, Point{0, 1}, Point{1, 0})
	median_of_grid(Point{0, 0}, Point{3, 4}, Point{4, 3}, Point{1, 1})
	// two solutions
	median_of_grid(Point{0, 1}, Point{1, 0})
	median_of_grid(Point{0, 0}, Point{3, 4}, Point{4, 3})
	// four solutions
	median_of_grid(Point{0, 1}, Point{1, 0}, Point{2, 1}, Point{1, 2})
}
Enjoy,
- laperla September 28, 2013

Presuming a protocol exists that can ask three questions to each server:
* the score of a single URL
* the top ten
* all URLs that satisfy score >= T, for a given threshold T
We program a two-pass solution like so:
We denote the number of servers as S.
[First pass]
(1) Ask every server for its own top ten.
(2) Merge the results. For all URLs in the merged set, calculate correct values by asking all servers for their scores for each URL. Calculate a top ten from our sample.
(3) Pick the score of the now-tenth URL as the threshold that we try to beat in the second round. We denote the threshold as T.
[Second pass]
(4) Ask every server for all its URLs that satisfy score >= T/S.
(5) Merge these bigger samples again as in step (2).
(6) We now have the correct top ten with correct scores.
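Here is a compact sketch that simulates the two passes with in-memory dicts standing in for the servers. All URLs and counts are invented, and the "top ten" is shrunk to a top two to keep the example small:

```python
import heapq

# Invented per-server access counts; in reality each dict lives on its
# own machine behind the three-question protocol above.
servers = [
    {"/a": 100, "/b": 90, "/g": 40},
    {"/c": 80, "/d": 70, "/g": 45},
    {"/e": 60, "/f": 50, "/g": 49},
]
S = len(servers)
K = 2  # stand-in for "ten"

def global_score(url):
    # Steps (2) and (5): ask every server for its score for this URL.
    return sum(srv.get(url, 0) for srv in servers)

# First pass, steps (1)-(3): merge the local top K, then take the
# K-th global score as the threshold T.
seed = set()
for srv in servers:
    seed |= set(heapq.nlargest(K, srv, key=srv.get))
T = sorted(map(global_score, seed), reverse=True)[K - 1]

# Second pass, steps (4)-(6): everything scoring >= T/S anywhere.
candidates = set()
for srv in servers:
    candidates |= {u for u, v in srv.items() if v >= T / float(S)}
top = heapq.nlargest(K, candidates, key=global_score)
print([(u, global_score(u)) for u in top])
# "/g" appears in nobody's local top two, yet wins globally with 134.
```

Note how "/g" demonstrates the point of the second pass: it misses every local top list (so the first pass alone would get the answer wrong), but it clears the T/S cut on each server and is recovered in the merge.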
- can you follow the thought: every URL that has less than T/S hits on
all servers cannot be among the top ten anymore? The rationale being
that S*T/S equals T, and if they have a bit less than T/S everywhere, then
all of them together have less than T.
- if so: how about a URL that has less than T/S hits on a single
server? Sure it can be among the top ten iff the same URL has more
than T/S hits on a different server. Right?
- if so: we will get hold of such a URL on the other server that has
more than T/S hits there. Either we have already got it in the first
pass (1) or in the second pass (4).
- the exactness of the number of hits is guaranteed in steps (2) and
(5), after we have determined the candidate URLs. In the first pass just
a minimal seed of URLs, in the second pass all URLs that might
qualify.
- laperla August 16, 2015