smallchallenges
Linux newbie here...so, take this with a grain of salt. I looked into the Linux source and then I understood Stan's comment. Posting the relevant bits for reference.
struct thread_info {
    struct task_struct *task;        /* main task structure */
    __u32 flags;                     /* low level flags */
    __u32 status;                    /* thread synchronous flags */
    __u32 cpu;                       /* current CPU */
    mm_segment_t addr_limit;
    unsigned int sig_on_uaccess_error:1;
    unsigned int uaccess_err:1;      /* uaccess failed */
};
If the task ptr is not NULL, then it means that this is a thread. If NULL, it is a process.
- smallchallenges March 31, 2016
Assuming an ascending sort criterion, how about we create a min-heap of x elements and insert the first x list elements? Since every list element is at most x positions from its sorted place, the smallest element must be among the first x, i.e. in the heap. Then keep adding one additional element to the heap and extracting the next minimum.
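A sketch of that idea in C (my own illustration; the fixed heap capacity and int elements are assumptions):

```c
#include <stddef.h>

/* Sift helpers for a binary min-heap stored in an array. */
static void sift_up(int *h, int i)
{
    while (i > 0 && h[(i - 1) / 2] > h[i]) {
        int t = h[i]; h[i] = h[(i - 1) / 2]; h[(i - 1) / 2] = t;
        i = (i - 1) / 2;
    }
}

static void sift_down(int *h, int n, int i)
{
    for (;;) {
        int s = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && h[l] < h[s]) s = l;
        if (r < n && h[r] < h[s]) s = r;
        if (s == i) break;
        int t = h[i]; h[i] = h[s]; h[s] = t;
        i = s;
    }
}

/* Sort an array where every element is at most k positions from its
 * sorted position, in O(n log k): push elements; once the heap holds
 * k+1 of them, the minimum is final and can be written back. */
void sort_k_sorted(int *a, int n, int k)
{
    int heap[64];                 /* assumes k < 64 for this sketch */
    int hn = 0, out = 0;
    for (int i = 0; i < n; i++) {
        heap[hn] = a[i];
        sift_up(heap, hn);
        hn++;
        if (hn > k) {             /* pop the minimum: it is in final position */
            a[out++] = heap[0];
            heap[0] = heap[--hn];
            sift_down(heap, hn, 0);
        }
    }
    while (hn > 0) {              /* drain the remaining elements */
        a[out++] = heap[0];
        heap[0] = heap[--hn];
        sift_down(heap, hn, 0);
    }
}
```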
- smallchallenges March 31, 2016
Answer is c.
Demand paging can be considered a special case of segmentation, with a segment size == one page. That said, I do not know of any commercial UNIX that implements demand segmentation...
char * const p => only the pointer p cannot be changed; it says nothing about not changing the contents of what p points to. For example, given char b[10] = "WORLD";, the assignment p = b; will not compile.
In this example, *p = 'n'; is legal and will print out nELLO.
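A tiny sketch of both behaviors (the function and variable names are mine):

```c
#include <stdio.h>

/* char *const: the pointer is read-only, the characters are not. */
char write_through_const_ptr(void)
{
    char b[] = "HELLO";
    char *const p = b;

    *p = 'n';            /* legal: modifies b[0] through p */
    /* p = b + 1; */     /* would not compile: p itself is const */
    printf("%s\n", b);   /* prints nELLO */
    return b[0];
}
```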
ptrA - ptrB returns the number of elements of the pointed-to type between the two pointers. The assumption is that both pointers point into the same array OR one past its end...IIRC, this restriction is listed in the C standard.
If you do not cast to char *, the pointers' own type is used, i.e. for int pointers the number of ints between the two is returned. If you want to know the size of the object in bytes, then what you want is the number of char elements between the two, i.e. casting to char * provides that for you.
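A short sketch of the element-count vs byte-count difference (function names are mine):

```c
#include <stddef.h>

/* Subtracting two pointers into the same array yields an element count. */
ptrdiff_t elem_diff(const int *lo, const int *hi)
{
    return hi - lo;
}

/* Casting to char * first yields the distance in bytes instead. */
ptrdiff_t byte_diff(const int *lo, const int *hi)
{
    return (const char *)hi - (const char *)lo;
}
```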
Here is pseudo code I cooked up...
fishes_alive = 0;
fish_upstream = (dequeue(b) == 0);   // assume 0 == upstream
fish_size = dequeue(a);
while (q_not_empty(b)) {
    if (fish_upstream) {
        fishes_alive++;
        fish_upstream = (dequeue(b) == 0);
        fish_size = dequeue(a);
    } else {
        // fish is going downstream...check if it can eat anything else.
        next_fish_upstream = (peek(b) == 0);
        next_fish_size = peek(a);
        if (fish_upstream == next_fish_upstream) {
            // both fish swim in the same direction...no need to compare sizes
            fishes_alive++;
            fish_upstream = (dequeue(b) == 0);
            fish_size = dequeue(a);
        } else {
            // next fish is upstream, current fish is downstream...compare sizes
            if (fish_size < next_fish_size) {
                // current fish gets eaten by next fish
                fish_upstream = (dequeue(b) == 0);
                fish_size = dequeue(a);
            } else {
                // next fish gets eaten by current fish
                dequeue(b);
                dequeue(a); // not storing return value => fish dies
            }
        }
    }
}
// the last fish standing has nothing left to fight, so it is alive
// (whether it was swimming upstream or gobbled its way downstream to the end)
fishes_alive++;
return (fishes_alive);
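The same problem can also be phrased with an explicit stack of still-swimming downstream fish, which covers the case where one downstream fish has to fight several upstream fish in turn. A minimal C sketch, assuming dir[i] == 0 means upstream and 1 means downstream (the encoding is my assumption):

```c
#include <stdlib.h>

/* Count surviving fish. size[i] is the size of fish i; dir[i] is 0 for
 * upstream, 1 for downstream (encodings assumed for this sketch). */
int count_alive(const int *size, const int *dir, int n)
{
    int *stack = malloc((size_t)n * sizeof *stack); /* live downstream fish */
    int top = 0, alive = 0;

    for (int i = 0; i < n; i++) {
        if (dir[i] == 1) {          /* downstream: may fight later upstream fish */
            stack[top++] = size[i];
        } else {                    /* upstream: fights the stacked downstream fish */
            while (top > 0 && stack[top - 1] < size[i])
                top--;              /* it eats each smaller downstream fish */
            if (top == 0)
                alive++;            /* no downstream fish left; it survives */
        }
    }
    alive += top;                   /* remaining downstream fish all survive */
    free(stack);
    return alive;
}
```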
@emb:
Fair point.
If all the numbers are the same, then only one bucket gets filled, i.e. it is easy to bail out and pick the single repeated number as the median.
If the median bucket holds more than 2^30 numbers, the solution can be adapted to repeat the bucketing within that bucket until the wanted bucket size is <= 2^30, i.e. 1G doubles or 8GB. It would mean one pass through the file for each repetition, but the algorithm is still O(n).
Does that get me closer :-)
The first solution that comes to mind is an external sort of the 1TB file of doubles; since the number of doubles is given to be odd, we can then pick the middle number. Of course, we need a function to compare two doubles....need to go back and dig into my comp arch stuff to see how to do this. Complexity is O(n log n), with additional files to store the intermediate runs of the external sort.
The next thought that came to me was: can we use a max-heap and a min-heap to keep the first and second halves of the numbers? After reading the last number, you will know the median: either the last number OR the top of the max- or min-heap. This solution avoids file writes, but will use more memory than 8GB. So, not a good one.
The next thought that came to me was: can I use a bucket-sort-based method? Here is my thought process. A double has 2^64 possible bit patterns; the mantissa alone covers 2^52. 8GB at 8 bytes per double => 2^30 (1G) doubles fit in memory. So, to map the value range onto something manageable, I can use 22 bits of the input number as an index. The plan is to form 2^22 buckets, make one pass through the file, and build a count of numbers per bucket; note that I am not storing the actual numbers. Now you have the total count in the file and hence the "middle" position. Find the bucket that the middle position falls into. Go through the file a second time and collect all the numbers belonging to that bucket. Sort them (taking the exponent into account now) and find the exact median.
Thoughts?
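For the bucketing pass, the 22 index bits have to be chosen so that bucket order follows numeric order. One way to do that (a sketch, assuming IEEE 754 doubles and ignoring NaNs) is to flip the bit pattern into an order-preserving 64-bit key and take its top 22 bits:

```c
#include <stdint.h>
#include <string.h>

#define BUCKET_BITS 22

/* Map a double to a bucket index such that x <= y implies
 * bucket_of(x) <= bucket_of(y). IEEE 754 doubles compare like
 * sign-magnitude integers, so negative values get all bits flipped
 * and positive values get the sign bit set. */
uint32_t bucket_of(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    if (bits >> 63)
        bits = ~bits;                   /* negative: reverse the order */
    else
        bits |= UINT64_C(1) << 63;      /* positive: above all negatives */
    return (uint32_t)(bits >> (64 - BUCKET_BITS));
}
```

Pass 1 then increments count[bucket_of(x)] for each number; a running total over the counts locates the bucket containing the middle position, and pass 2 collects only that bucket's numbers for an in-memory sort.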
The songs themselves are write once, read rest of the song's lifetime. So, I would not change the way the song is stored...a sequentially written file.
The directory that maintains the list of songs could and will be randomly accessed. So, this directory data structure needs to support random directory reads. In addition, it seems that a consecutive play should not repeat a song i.e. this calls for sequential directory reads.
One good data structure that supports both efficient random reads (O(log n)) and sequential reads is a btree. Other structures like skiplists provide both random and sequential access, with their own tradeoffs.
Lemme know what you guys think...
I guess the answer might be to state an answer and defend it....here is what I would have tried...
For a set of "n" elements, can any sorting algorithm do better than O(n)? No...we have to visit each element at least once. So, imo, the best conceivable sorting algorithm is one which is O(n). So, I vote for counting sort IF cpu time were the only consideration....counting sort needs additional memory.
If additional memory cannot be used (space complexity), then any O(n log n) sort should work....as a previous poster says, qsort may be better.
But, if a worst case bound is required, then merge sort it is.
Probably a bit late on this question...but I thought I would take a shot at it.
There are a couple ways to reduce read latency.
+ On sequential reads, read ahead can reduce latency.
+ Parallelizing the read i.e. reading from multiple disks like in a raid group can reduce latency.
+ Caching blocks can improve latency as well if the hit rate is good.
+ For random reads, readahead will not help and neither will caching. So, using a disk with good random read performance like ssds or sas disks can help as well.
For reducing write latency, there are a couple options as well.
+ Write to persistent memory for lazy writes to disk at a later time
+ Parallelize the writes i.e. divide the writes to multiple disks
I see what he is hinting at....he says take the sum of all the elements in the array and compare it against the expected series sum...but that in itself would not prove a series....
After a tiny walk before coming back to my desk, I had a slightly different solution. Find the lowest two elements in pass 1. Their difference is the common difference every adjacent pair should satisfy. I would insert all the elements into a hash and then, for each element, check for the existence of element+diff in the hash. This would be pass 2. Hitting an already-paired element would invalidate the series, as would not finding the successor element.
What do you think about this? O(n) but needs some extra memory.
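A sketch of that two-pass idea in C, using a tiny open-addressing hash set (the INT_MIN sentinel and power-of-two sizing are my shortcuts; INT_MIN must not appear in the input):

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define EMPTY INT_MIN   /* sentinel: INT_MIN must not appear in the input */

/* Insert into an open-addressing hash set; returns false on duplicates. */
static bool insert(int *tab, int cap, int key)
{
    unsigned h = (unsigned)key * 2654435761u & (unsigned)(cap - 1);
    while (tab[h] != EMPTY) {
        if (tab[h] == key)
            return false;
        h = (h + 1) & (unsigned)(cap - 1);
    }
    tab[h] = key;
    return true;
}

static bool contains(const int *tab, int cap, int key)
{
    unsigned h = (unsigned)key * 2654435761u & (unsigned)(cap - 1);
    while (tab[h] != EMPTY) {
        if (tab[h] == key)
            return true;
        h = (h + 1) & (unsigned)(cap - 1);
    }
    return false;
}

/* Pass 1: the two smallest values give the common difference d (and the
 * max gives the endpoint). Pass 2: no duplicates, and every element but
 * the max must have element + d present. O(n) time, O(n) extra space. */
bool is_arith_series(const int *a, int n)
{
    if (n <= 2)
        return true;
    int lo1 = a[0] < a[1] ? a[0] : a[1];
    int lo2 = a[0] < a[1] ? a[1] : a[0];
    int hi = lo2;
    for (int i = 2; i < n; i++) {
        if (a[i] < lo1) { lo2 = lo1; lo1 = a[i]; }
        else if (a[i] < lo2) lo2 = a[i];
        if (a[i] > hi) hi = a[i];
    }
    int d = lo2 - lo1;
    if (d == 0)
        return false;                       /* duplicates cannot form a series */
    int cap = 1;
    while (cap < 4 * n)
        cap <<= 1;
    int *tab = malloc((size_t)cap * sizeof *tab);
    for (int i = 0; i < cap; i++)
        tab[i] = EMPTY;
    bool ok = true;
    for (int i = 0; i < n && ok; i++)
        ok = insert(tab, cap, a[i]);        /* duplicate => not a series */
    for (int i = 0; i < n && ok; i++)
        if (a[i] != hi && !contains(tab, cap, a[i] + d))
            ok = false;
    free(tab);
    return ok;
}
```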
If the array elements are unsorted, then it would seem that the logic of checking the difference between adjacent elements does not work. With sorting, we can have two solutions.
+ nlogn sort + check the difference between adjacent elements
+ O(n) sort....counting sort + check the difference between adjacent elements
Of course, if the array contains negative elements, then counting sort is tricky....
Let me think some more...
Two questions occur to me....
+ What happens when array is of size 1 or 2?
+ What happens if the series elements are jumbled up i.e. unsorted?
This was not a trivial problem in implementation :-)
#include <stdio.h>
#include <stdlib.h>
void
print_array(int *arr, int size)
{
    int i;

    for (i = 0; i < size; i++) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}
/* Byte-wise swap two items of size SIZE. */
#define SWAP(a, b, size)                        \
    do {                                        \
        register size_t __size = (size);        \
        register char *__a = (a), *__b = (b);   \
        do {                                    \
            char __tmp = *__a;                  \
            *__a++ = *__b;                      \
            *__b++ = __tmp;                     \
        } while (--__size > 0);                 \
    } while (0)

/* Byte-wise set one item of size SIZE. */
#define SET(a, b, size)                         \
    do {                                        \
        register size_t __size = (size);        \
        register char *__a = (a), *__b = (b);   \
        do {                                    \
            *__a++ = *__b++;                    \
        } while (--__size > 0);                 \
    } while (0)
void
insertion_sort(void *ptr, int nitems, int item_size, int (*functionp)(const void *, const void *))
{
    int i, j;
    char *val;
    char backup_val[item_size];  /* C99 VLA; a single char would overflow */
    char *arg1, *arg2, *arg3;
    char *charPtr = (char *)ptr;

    for (j = 0; j < nitems-1; j++) { // 3 4 2 1....2 3 4 1.....1 2 3 4
        // Move the first element of the unsorted sub-array to its final position.
        i = j + 1;
        val = charPtr + (i * item_size);
        SET(backup_val, val, item_size);
        i--;
        for (; i >= 0; i--) {
            arg1 = charPtr + (i * item_size);
            arg2 = backup_val;
            if (functionp((void *)arg1, (void *)arg2) > 0) {
                arg3 = charPtr + ((i+1) * item_size);
                SET(arg3, arg1, item_size); // IMP: SET(dst, src)
            } else {
                break;
            }
        }
        arg1 = charPtr + ((i+1) * item_size);
        arg2 = backup_val;
        SET(arg1, arg2, item_size); // found the final place for one element
    }
}
int
compare_ints(const void *x, const void *y)
{
    int *a, *b;

    a = (int *)x;
    b = (int *)y;
    if (*a < *b) {
        return (-1);
    } else if (*a > *b) {
        return (1);
    } else {
        return (0);
    }
}

void
GoogleSort(void *ptr, int nitems, int item_size, int (*functionp)(const void *, const void *))
{
    insertion_sort(ptr, nitems, item_size, functionp);
}

int
main(void)
{
    int arr[] = {4, 3, 2, 1};

    printf("Array contents before sorting:\n");
    print_array(arr, sizeof(arr)/sizeof(int));
    GoogleSort(arr, sizeof(arr)/sizeof(int), sizeof(int), compare_ints);
    printf("Array contents after sorting:\n");
    print_array(arr, sizeof(arr)/sizeof(int));
    return (0);
}
Is the question to prove/disprove the inequality?
+ Factorial exists only for non-negative integers....0, 1, 2, 3, .....
+ If x = 0, then (x-1)! == (-1)! is undefined.
+ So, this equation does not hold. I agree with 0xF4.
void
sum_powers_of_two(int n)
{
    int i = 1;
    long sum = 0;
    int cnt = 0;

    while (cnt < n) {
        printf("2 power %d = %d\n", cnt, i);
        sum += i;
        i = i << 1;
        cnt++;
    }
    /* sum of 2^0 .. 2^(n-1) is 2^n - 1 */
    printf("Sum of first %d powers of two is: %ld\n", n, sum);
}
One alternative to comparison-based sorting is counting sort. Here, the complexity is O(n). But the requirement is that the integer values are bounded and not very sparse, as the integers are used to index into an array.
What are the properties of the input integers in the question?
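A minimal counting-sort sketch for ints in a known range [0, max] (the bound being given is my assumption):

```c
#include <string.h>

/* Counting sort: O(n + max) time, O(max) extra space. Works only when
 * values are bounded non-negative ints and max is small enough to index. */
void counting_sort(int *a, int n, int max)
{
    int count[max + 1];               /* C99 VLA; fine for small max */
    memset(count, 0, sizeof count);
    for (int i = 0; i < n; i++)
        count[a[i]]++;                /* tally each value */
    int out = 0;
    for (int v = 0; v <= max; v++)    /* emit values in sorted order */
        while (count[v]-- > 0)
            a[out++] = v;
}
```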
In a BST, the worst case performance is O(n)...if presented with a degenerate input sequence like 5,4,3,2,1. AVL trees are balanced trees, i.e. they guarantee O(log n) performance irrespective of the input sequence. Of course, the balanced property is achieved at a cost on each insert and delete.
So, one can use AVL trees or any of the other balanced trees when a guarantee of O(log N) performance is needed.
head will not get updated with this code....you will need a double pointer as input OR return the new head ptr.
- smallchallenges January 24, 2015
One question. In the "find MODE" operation, are we talking "mode" in the mathematical sense? I.e. the most frequently occurring number at any point in time? Please clarify.
- smallchallenges January 21, 2015
How do you guarantee that the hash tables are in order of the input URLs? I mean, if you hash a string, the index can be anywhere in the hash table. So, how do you guarantee that the first url hash that has no duplicates is the first URL in the input file? What am I missing? Please clarify.
- smallchallenges January 21, 2015
Thinking about all the books I buy on Amazon, I knew that each book has a unique ISBN in the URL. Likewise, each non-book product has a unique product code in the URL. The format is something like title/uniqueid/.... So, I would ask if I can use this info. If I could, then I would extract the unique id and create a set of (unique id, file offset, num_entries). When the entire url file is processed, I would pick the earliest file offset with num_entries == 1.
Edit:
Thinking a bit more about the set, I think we need two sets: one indexed by unique id and storing num_entries, and the other indexed by file offset and storing unique id. If a btree were used to implement each set, then walking file offsets in order and checking for unique ids with num_entries == 1 would give the earliest unique url.
Nodes on different levels are a trippy case...my code broke on this test case. Thanks for the hint.
- smallchallenges January 18, 2015
Super thanks Nick! I realized my mistake. I misinterpreted the question as the sum of the two dice having to be 6...the question is about the non-occurrence of any 6 on either of the two dice. As soon as I read your answer, I realized my mistake. So, thanks much!
- smallchallenges January 15, 2015
Do you mean <2N comparisons? If so, I would propose a counting-sort-based algorithm....O(n) but extra space.
- smallchallenges January 14, 2015
I was thinking of the following idea...will test this with code a bit later. The idea is to use recursion. Let's say we have a three-element array a[] = {1, 2, 3}.
Keep a running sum of the path encountered so far.
If not at the last element of the array, recurse.
If at the last element, store the running sum into the array location AND start a running sum of the values from the bottom. The value at each index is the sum of the running sum from the top and the running sum from the bottom.
This does it with only one array.
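My reading of that idea as a sketch (the interpretation that each index ends up holding the sum of every other element is my assumption):

```c
/* Carry the running prefix sum down the recursion; return the running
 * suffix sum on the way back up; each index gets prefix + suffix, i.e.
 * the sum of all elements except itself. One array, one pass. */
int fill_sums(int *a, int n, int i, int prefix)
{
    if (i == n)
        return 0;                    /* suffix sum past the end */
    int self = a[i];
    int suffix = fill_sums(a, n, i + 1, prefix + self);
    a[i] = prefix + suffix;
    return suffix + self;
}
```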
- smallchallenges January 13, 2015
#include <assert.h>
#include <stdio.h>

int
is_num(char *ch)
{
    char c = *ch;

    return ((c >= '0') && (c <= '9'));
}

int
IsNumber(char *str)
{
    int is_number;

    assert(str);
    is_number = (*str != '\0');   /* an empty string is not a number */
    while (*str != '\0') {
        if (is_num(str) == 0) {
            is_number = 0;
            break;
        } else {
            str++;
        }
    }
    if (is_number) {
        printf("string is a number.\n");
    } else {
        printf("string is NOT a number.\n");
    }
    return (is_number);
}
For the math gurus here, I have one question. I went about this in a different way....and the answer of course is different...as with most probability problems. Can you please help me understand the mistake in my thinking? Thanks....
1.
What is the set of all possible outcomes when I roll two dice?
d1 = die 1.
If die 1 shows the value in column 1, what are the possible sums of both dice?
d1   possible sums
1    2 3 4 5 6 7
2    3 4 5 6 7 8
3    4 5 6 7 8 9
4    5 6 7 8 9 10
5    6 7 8 9 10 11
6    7 8 9 10 11 12
2.
Total outcomes = 36
Prob of the sum being 6 = 5/36
Prob of the sum not being 6 = 31/36.
This is not the same as the others...so, any help is appreciated. Thanks!
I like your idea....you always maintain the array in a compressed fashion so that a random operation is guaranteed to have a hit. I like it. Thanks.
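A sketch of that compressed-array idea, assuming servers are identified by small integer ids (with arbitrary ids, the pos[] array would become a hash map):

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_ID 1024   /* bounded-id assumption for this sketch */

/* O(1) add, remove, and random-pick over a set of server ids. */
typedef struct {
    int items[MAX_ID];   /* compact array of live ids */
    int pos[MAX_ID];     /* pos[id] = index in items, or -1 if absent */
    int count;
} server_set;

void ss_init(server_set *s)
{
    s->count = 0;
    for (int i = 0; i < MAX_ID; i++)
        s->pos[i] = -1;
}

bool ss_add(server_set *s, int id)
{
    if (s->pos[id] != -1)
        return false;            /* already present */
    s->items[s->count] = id;
    s->pos[id] = s->count++;
    return true;
}

bool ss_remove(server_set *s, int id)
{
    int i = s->pos[id];
    if (i == -1)
        return false;
    int last = s->items[--s->count];  /* swap last live id into the hole */
    s->items[i] = last;
    s->pos[last] = i;
    s->pos[id] = -1;
    return true;
}

int ss_random(const server_set *s)
{
    return s->items[rand() % s->count];  /* array is compact: always a hit */
}
```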
- smallchallenges April 10, 2014
You know...the search was not for the question...that would mean I am taking shortcuts...the search was for a data structure that can support the perf limits OR at the least generate an idea. What can you say? I am not smart that way :-)
- smallchallenges April 09, 2014
On further thought, I was hasty. The random server part is *NOT* solved by Cuckoo hashing. Sorry.
- smallchallenges April 09, 2014
Okay. Found a reasonable answer...Cuckoo hashing.
O(1) Lookup/Delete
Amortized O(1) Inserts.
Most credit to google and little credit to me...I was close when I was thinking bloom filter.
Thanks for the question once again!
This puppy is an interesting one. Thanks for posting. I do not have an answer...but I am hoping my thought process can generate a response with an answer :-)
O(1) add and remove is possible with a hash OR a doubly linked list. It is the return-a-random-server part that is the problem.
I thought about probabilistic structures like skip lists and bloom filters, but could not get a guaranteed O(1). I thought about a bitmap of remaining servers...not O(1).
Would welcome a tip from somebody else :-) Thanks in advance.
Can you clarify the following things please?
+ What is the size of the pattern and how is it presented to us?
+ What is the size of the input and how it is presented to us?
Thanks.
Let's assume some things: m == 1, n == 2, and all meetings start at the same time with varying end times. In this case, sorting by end times => the shortest meeting gets the room first. The proposed algorithm seems like one based on the "Shortest Job Next" algorithm, whose advantage is that average wait times are minimized. If we assume meeting times and durations are known, this algorithm seems to work in my opinion.
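A related greedy that is easy to sketch: sorting by end time and taking every meeting that starts after the previous one ends maximizes the number of meetings a single room can host (this maximization variant is my illustration, not necessarily the exact question asked):

```c
#include <stdlib.h>

typedef struct { int start, end; } meeting;

static int by_end(const void *x, const void *y)
{
    const meeting *a = x, *b = y;
    return (a->end > b->end) - (a->end < b->end);
}

/* Greedy: sort by end time, take every meeting that starts at or after
 * the last accepted meeting's end. Maximizes meetings in one room. */
int max_meetings(meeting *m, int n)
{
    qsort(m, n, sizeof *m, by_end);
    int taken = 0, last_end = -1;
    for (int i = 0; i < n; i++) {
        if (m[i].start >= last_end) {
            taken++;
            last_end = m[i].end;
        }
    }
    return taken;
}
```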
- smallchallenges March 31, 2014
What I was thinking is: convert the linked lists to stacks. Each becomes a reversed linked list, and we can pop the top item of each list, multiply the two, write the result to a new node (or reuse one), and free up the two input nodes. Remember to carry over using a temp variable. This should consume the least space, but more cpu to convert to stacks. What do you guys think?
- smallchallenges March 17, 2014
Nice idea...and it accounts for boundary conditions as well. The only issue that may be raised is the depth of recursion vs stack space running out if the number/list has many, many digits/nodes.
- smallchallenges March 17, 2014
IMO, we need to understand what Optimize means. For example, at a very high level, it could mean one of the following:
+ Optimize Functionality
++ Better placement of frames
++ Better relational logic
+ Optimize Performance
++ Faster loading of pages
++ Better readahead of related pages
++ More simultaneous operations.
+ Optimize Quality
++ no plugin crashes, etc etc
+ Optimize Security
++ .....
Kartik,
For 1,2,3,4 the order will be like below:
Table Deck
------------
1 2 3 4
1 3 4 2
1 3 4 2
1 3 4 2
1 4 2 3
1 4 2 3
1 4 2 3
1 2 3 4
1 2 3 4
Can you please explain how you came to that conclusion? Thanks in advance.
For me, I did two experiments...posted one of them (1-6) below. It seemed to me that once we banish 2 to the end, it has to climb back up 6-1 spots to get back to its original position (we can see that visually as well). So, n-1 seems like the correct answer. But is there an intuitive explanation for this answer? Please clarify.
1 2 3 4 5 6
Table Deck
---------------------
1 3 4 5 6 2
1 3 5 6 2 4
1 3 5 2 4 6
1 3 5 2 6 4
1 3 5 2 6 4
1 3 5 2 6 4
1 5 2 6 4 3
1 5 6 4 3 2
1 5 6 3 2 4
1 5 6 3 4 2
1 5 6 3 4 2
1 5 6 3 4 2
1 6 3 4 2 5
1 6 4 2 5 3
1 6 4 5 3 2
1 6 4 5 2 3
1 6 4 5 2 3
1 6 4 5 2 3
1 4 5 2 3 6
1 4 2 3 6 5
1 4 2 6 5 3
1 4 2 6 3 5
1 4 2 6 3 5
1 4 2 6 3 5
1 2 6 3 5 4
1 2 3 5 4 6
1 2 3 4 6 5
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
From Wikipedia: A balanced binary tree is commonly defined as a binary tree in which the depth of the left and right subtrees of every node differ by 1 or less...
So, in theory, we could maintain the depth of the left and right subtrees in each node. At the root node, we then have the degree of imbalance between the left and right subtrees. If the left subtree is deeper, we move nodes from the left to the right subtree until the imbalance has been corrected to a difference of <= 1. Perhaps a recursive approach may be useful...
I have not coded this up, but will try now...just thought I will post the idea first.
Interesting question.
+ Capacity of a floppy disk = 1.44MBytes
+ The best speed in writing to magnetic media is achieved for sequential writes. Assume no limitations in cpu and memory (1.44mb seems trivial even on older computers).
+ So, the time to fill up a floppy is proportional to sequential write speed to the magnetic media on floppy drive. This is all I could say initially.
So, I went to wikipedia and found that we can safely assume a raw write speed to floppy drives of 1000 Kbits/sec, i.e. 125 KBytes/sec. Assuming a file system sitting on top costs nothing, filling 1.44MB at a sequential write speed of 125 KB/sec takes roughly 12 secs.
qxixp: the idea is to work on the index (i=1...billion) and not the file storing the words. Hope this helps.
- smallchallenges February 28, 2014
Good one. This was what I was thinking as well. Done this way, we do not care how the 1 billion words are stored...sorted, not sorted, etc. In fact, since an index is used for the 1 billion names, it may be fair to assume that the words are not sorted; the assumption is that the index is. So, this method is good.
- smallchallenges February 27, 2014
Excellent explanation...even somebody scared of permutations/combinations would be able to appreciate it. Kudos!
- smallchallenges February 26, 2014
Thanks for the question. This was surprisingly tricky to get right...I made mistakes. I think part of the reason was that I underestimated the question :-(
The following code works for 1, 1-901, 1-901-2, 1-901-2-902.
if (head->next) {
    second_head = head->next;
} else {
    printf("Only one element in list.\n");
    return 1;
}
if (second_head->next == NULL) {
    printf("Only two elements in the list.\n");
    return 2;
}
tail = head;
second_tail = second_head;
while (tail->next && second_tail && second_tail->next) {
    tail->next = tail->next->next;
    second_tail->next = second_tail->next->next;
    tail = tail->next;
    second_tail = second_tail->next;
}
// attach the first and second lists together.
tail->next = second_head;
print_list(head);
If I have to do this exercise once, I would just do a linear walk through the characters and find the answer like below.
Assume sizea, sizeb, sizec are the respective array sizes.
// NOTE: watch out for the \0 at the end.
if ((sizea != sizeb) || ((sizec-1) != (sizea-1 + sizeb-1))) { // -1 => take out \0 at the end.
    printf("Perfect interleaving not possible.\n");
    return -1;
}
ptrA = a; ptrB = b; ptrC = c;
sizea -= 1; // cut off the \0 char
while (sizea) {
    if (*ptrA++ != *ptrC++) {
        break;
    }
    if (*ptrB++ != *ptrC++) {
        break;
    }
    sizea--;
}
if (sizea == 0) {
    printf("Perfect interleave.\n");
} else {
    printf("No interleave.\n");
}
I was thinking along similar lines as Arrantinsane. I split the problem into two parts:
+ From the binary tree root, see if the target node can be found at specified distance/depth.
+ From the target node, find all nodes at specified distance/depth.
Here is my pseudo code.
void
find_node_at_distance(root, tgt, distance)
{
    lroot = root; // local copy of root.
    ldistance = distance;
    while (lroot) {
        if (tgt) {
            if (lroot->data == tgt->data) {
                break;
            }
        }
        if (ldistance < 0) {
            break;
        }
        ldistance--;
        if (tgt && tgt->data < lroot->data) { // guard: tgt may be NULL
            lroot = lroot->left;
        } else {
            lroot = lroot->right;
        }
    }
    if (ldistance == 0) {
        printf("node %d is at distance %d from tgt\n", lroot->data, distance);
    }
}

main()
    find_node_at_distance(root, tgt, 2);          // ex: distance of 2.
    find_node_at_distance(tgt->left, NULL, 2-1);  // NULL means any node at given dist.
    find_node_at_distance(tgt->right, NULL, 2-1);
Forgot the second part of the question....I posted a link to an ITANIUM programmers manual and the post got killed :-(
- smallchallenges March 31, 2016
The thread context is usually lighter than the process context; hence the speed difference.
For example, the thread context is only the general-purpose cpu registers. Floating point registers, memory management info (like region table info on Itanium), etc. are not saved on a thread context switch...they *HAVE* to be saved, and new ones restored, if we switch processes. Look at any chipset's system programmer's manual for details.