Messi
BAN USERAssume that memory block size is M and the file is having N blocks. Then N/M times, M blocks can be brought into memory. Now maintain two hash tables in the memory, one for the sessionid and one for the <pageid,sessionid>. Second hashtable is used to remove duplicates and keep incrementing the <sessionid> hashtable values based on the number of pages visited.
All this is still O(N) operation. Now use k-selection sort to find the final session id's.
Here for hash tables, you can use some kind of boolean array if the page id and session id values are known already.
Other way to do this is to use map reduce.
From each map job, emit the <sessionid,<pageid,sessionid>> from the map.
Now for each session id, go through all the values in the reduce step and now emit <count,sessionid> from the reduce step.
Now have another map step, where you only emit the max 2 counts for each map job. And have just one reducer where you emit the top k values.
For every 2 points, find the slope of the points and store in the hash table. Now go through the hash table and print all the sets that are having the same key value(i.e
slope). The complexity will be O(N^2).
Make sure the value of the hashtable is represented as a set, to avoid duplicates.
Idea being, if two points are having the some slope, then the third point which is collinear will also have the same slope with either of the two points.
Please correct me if I am wrong.
As all the inverted lists are sorted, lets suppose there are k of them.
Now build a min-heap with the first index of all the k lists and also keep track of the max element. Now check the value of max - min and store it.
Now keep reading the indexes from each list, and check if its index > min value, then insert it into the heap and pop the min element. Keep doing this till you exhaust elements in one of the lists.
I think this will be the code.
typedef struct node {
struct node* left;
struct node* right;
struct node* parent;
int data;
}node;
node *find_next(node *cur){
node *tmp;
if (!cur) return NULL;
if (cur->right) {
tmp = cur->right;
while(tmp->left){
tmp = tmp->left;
}
return tmp;
}
else{
tmp = cur;
while(tmp->parent && (tmp->parent->right != cur && tmp->parent != root)) {
tmp = tmp->parent;
}
if(tmp->parent->right)
{
while(tmp->left){
tmp = tmp->left;
}
}
return tmp;
}
}
1. Write test cases before you write actual functionality. Make sure you have 100% coverage for the code you wrote.
2. Fix bugs by writing a test case to replicate it and then fix the bug.
Mock tools - Junit for test cases, some mocking framework for mocking objects.
In a very large file find the 3 most frequent words
Sort the words in the file, now just walk through the list. This is O(nlogn). Better algorithm will be to hash each word, and then walk through the hash and report three words with max occurences.
Also tell the number of occurrences of each word
Above idea will suffice.
Pointers and Reference looks similar but there are some difference between both of them.
POINTER
1) Its not necessary to initialize the pointer at the time of declaration. Like
int a = 10;
int *P = &a; //It is not necessary
Another way is :
int a = 10;
int *P;
P = &a;
2) You can create the array of Pointer.
3) You can assign NULL to the pointer like
int *P = NULL; //Valid
4) You can use pointer to pointer.
REFERENCE
1) Its necessary to initialize the Reference at the time of declaration. Like
int &a = 10;
int &a; //Error here but not in case of Pointer.
2) You can not create the Array of reference.
3) You can not assign NULL to the reference like
int &a = NULL; //Error
4) You can not use reference to reference.
you can just store the number of zero's right..why do you want sparse matrix?
- Messi September 09, 2010