LinhHA05
Based on the observation that all words that are anagrams of each other share the same histogram (the count of each individual letter): if the only query we need is a counting query (count the number of words in the same anagram class), then we should use a hash map.
The mapping process is:
word -> histogram -> count
Complexity:
- word -> histogram: O(l), where l is the length of the word
- histogram -> count: O(1), as we can pre-compute the value of each entry (count)
The overall complexity is O(l). The solution requires additional space O(m), where m is the number of anagram classes in the list.
Now if we are tight on space, then the best approach is a sorted list of words. We order all words by their representative anagram; for example, 'abc', 'bac', and 'cab' are all represented by 'abc'. We can then binary-search for the first element of the anagram class (lower_bound() in C++) and for the first element of the next class (upper_bound() in C++); the number of words with the same anagram is the difference of these two positions.
The complexity of this approach is O(l) * O(log N), with N = 10000 and l the length of the word, but it requires no additional space, and the list of words in the same anagram class is available immediately. Note that with the hash-map approach it takes O(N) to build this list.
From what I understand from the input: in the digit representation of a number X, if we can decompose it as
X = A A1 A2 ... AN
with Ai = Ai-1 + Ai-2, then X is a "Fibonacci number".
Some other observations to reduce the complexity:
- A = A1 (as shown in all of your examples)
- Ai > Aj if i > j, which means the number of digits representing Ai >= the number of digits representing Aj
- N is at least 2 (i.e., there are at least 3 numbers)
My solution is a function that detects whether a number is such a Fibonacci number.
bool isFibonacci_helper(const string& s, int start_length);

bool isFibonacci(long long val) {
    string s = to_string(val);
    int digits = s.length();
    int max_length = digits / 3; // as we need at least 3 numbers
    for (int l = 1; l <= max_length; ++l)
        if (isFibonacci_helper(s, l))
            return true; // stop once we find a feasible decomposition
    return false;
}

bool isFibonacci_helper(const string& s, int start_length) {
    size_t p = 0;
    string sA = s.substr(p, start_length); p += start_length;
    string sA1 = s.substr(p, start_length); p += start_length;
    if (sA.compare(sA1) != 0) // check the condition that A1 = A
        return false;
    long long A = stoll(sA); // stoll, not stoi: the blocks can exceed int range
    long long A1 = A;
    long long A2 = A + A1;
    while (p < s.length()) {
        string sA2 = to_string(A2);
        if (p + sA2.length() > s.length())
            return false;
        string sA2c = s.substr(p, sA2.length());
        if (sA2.compare(sA2c) != 0) // check the condition that A2 = A + A1
            return false;
        p += sA2.length();
        A = A1; A1 = A2; A2 = A + A1;
    }
    return true;
}
@pxe: thanks for recognizing my solution. I have received many down-votes that I have no clue about, but I think it is not important as long as someone finds my answer useful.
@Urik Lagnes: I just want to say that this problem is the problem of finding a Hamiltonian path; it is so famous that I don't think it is necessary to show how to do it. Of course, if you know how to find a Hamiltonian path faster on a complete graph, you are welcome to show how.
You can do something similar to a DFS (or BFS) traversal in which you only visit nodes that you have not visited yet, to prevent loops. You also have to keep track of the path from the source so that you can print it out when you reach the destination.
Some things to note:
- The length of the path is less than or equal to N, the number of vertices
- When you reach the destination, print the path and stop the current search branch, since the destination cannot be visited a second time
- We need to un-visit a node at the end of the recursive function to allow exploration from other branches that might come back to the same node (this is the main difference from the original DFS or BFS)
class Graph {
public:
Graph(int n) : g(n), visit(n, false), path(n, -1) {}
void add(int from, int to) {
g[from].push_back(to); g[to].push_back(from); }
void print_path(int l) const {
for (int i=0; i< l; ++i) cout << path[i] << ' ';
cout << endl;
}
void traverse(int source, int dest) {
fill(visit.begin(), visit.end(), false);
DFS(source, dest, 0);
}
private:
void DFS(int u, int dest, int level) {
path[level] = u;
if (u == dest) {
print_path(level+1);
return;
}
visit[u] = true;
for (auto v : g[u] )
if (!visit[v])
DFS(v, dest, level + 1);
visit[u] = false;
}
vector<vector<int> > g;
vector<bool> visit;
vector<int> path;
};
Besides, I don't care whether you up-vote or down-vote my answer; but if you down-vote, that means I did something wrong, and I want to know your perspective.
- LinhHA05 October 11, 2013
This is an easy problem; however, it is the kind of question that shows how careful your coding is.
One thing to note is that the count needs to be converted to a string before we write it, and the count might have several digits.
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// In-place run-length encode, e.g. "AAAABCDDD" -> "A4BCD3". The encoded
// form never exceeds the input: a run of k >= 2 writes at most k characters.
void reptonum(char* s, int n) {
    char *rp, *wp;
    rp = wp = s;
    if (n <= 1) return;
    while (rp < s + n) {
        char cur = *rp;
        int cnt = 0;
        while (rp < s + n && *rp == cur) { // bound check before dereference
            ++rp;
            ++cnt;
        }
        *wp++ = cur;
        if (cnt > 1) {
            ostringstream convert; // the count may have several digits
            convert << cnt;
            string nums = convert.str();
            for (size_t i = 0; i < nums.length(); ++i)
                *wp++ = nums[i];
        }
    }
    *wp = 0;
    cout << s << endl;
}
int main()
{
    char s[] = "AAAABCDDD";
    reptonum(s, sizeof(s) - 1); // exclude the trailing '\0'
    char s1[] = "AAAAAAAAAAAAAAABCCD";
    reptonum(s1, sizeof(s1) - 1);
    char s2[] = "ABCD";
    reptonum(s2, sizeof(s2) - 1);
}
Using a graph? Yes.
However, this is a complete graph, so it is highly likely to have cycles, and topological ordering does not work. I think you guys have recognized the wrong problem. An ordering of the vertices that satisfies the requirement forms a Hamiltonian path on the complete graph, so any algorithm that returns a Hamiltonian path on the winning graph should give you the answer.
@Felipe: I think Furquan's code works and returns false for your case, as the order of checking is 3 8 5 (I debugged it in my mind, not on a computer, but I'm pretty sure about the order). You can double-check by running it. The solutions with the min and max values are what I mentioned in my post.
- LinhHA05 October 09, 2013
The easiest way to prove that your code is wrong is a real example:
2
/ \
1 3
/ \
0 4
This tree is not a BST, but your code fails to detect it. The property of a BST is that the maximum on the left branch is less than the current value, which is less than the minimum value on the right branch; that is why your simple strategy fails.
1) Build a graph of N*N vertices in which there is an edge (v->w) if the knight can move from v to w. The maximum degree of a vertex is 8, so the maximum number of edges is 8*N*N.
We perform a BFS from the source to find:
- whether the destination is reachable from the source
- if there is a path, the BFS gives the minimum number of hops
Note: there is no need to build the graph explicitly; we only want to do the BFS on the implicit graph.
2) I believe that on an infinite chess board all positions are reachable, so:
- the solution does not extend if the destination is unreachable from the source
- otherwise, the solution on the infinite chess board is at least as good as on the NxN board, which gives us an upper bound for the search on the infinite board. We can then rerun the BFS with the bound from the previous step to check whether we can find a better solution.
The idea is to merge from the back of the b array, so the write operations of the merge never interfere with the read operations from b (this assumes b has capacity a.size() + b.size(), with its elements in the first b.size() slots):
int i_a = a.size() - 1;
int i_b = b.size() - 1;
int i_ab = a.size() + b.size() - 1;
while (i_a >= 0 && i_b >= 0)
    if (a[i_a] > b[i_b]) b[i_ab--] = a[i_a--];
    else b[i_ab--] = b[i_b--];
while (i_a >= 0) // copy leftovers from a; note >= 0, not > 0, or a[0] is lost
    b[i_ab--] = a[i_a--];
// There is nothing to do if i_b >= 0, as we merge onto b
We have to traverse the tree and check the validity at each node. This condition is often expressed as:
max(left_child) < node_value < min(right_child)
which leads to the following code:
int maxBT(Node* node);
int minBT(Node* node);
bool isBST(Node* node) {
    if (node == nullptr)
        return true;
    bool condition = (maxBT(node->left) < node->value) && (node->value < minBT(node->right));
    return condition && isBST(node->left) && isBST(node->right);
}
int maxBT(Node* node) {
    if (node == nullptr)
        return INT_MIN;
    return std::max(maxBT(node->left), std::max(node->value, maxBT(node->right)));
}
int minBT(Node* node) {
    if (node == nullptr)
        return INT_MAX;
    return std::min(minBT(node->left), std::min(node->value, minBT(node->right)));
}
The complexity of this approach is O(N^2), as minBT and maxBT are computed on the fly.
There are several ways to improve the complexity to O(N)
(1) Compute the min and max values at each node in separate passes and cache the results in a hash map. The comparison is then performed via a lookup in constant time.
unordered_map<Node*, int> max_map;
unordered_map<Node*, int> min_map;
int maxBT(Node* node) {
    if (node == nullptr) {
        max_map[node] = INT_MIN;
        return INT_MIN;
    }
    int max_value = std::max(maxBT(node->left), std::max(node->value, maxBT(node->right)));
    max_map[node] = max_value;
    return max_value;
}
// Similar function for minBT to compute min_map
bool isBST(Node* node) {
if (node == nullptr)
return true;
bool condition = (max_map[node->left] < node->value) && (node->value < min_map[node->right]);
return condition && isBST(node->left) && isBST(node->right);
}
(2) We combine the validation, minimum, and maximum computations into one post-order traversal that computes all three at the same time.
tuple<int, int, bool> MaxMinValid_helper(Node* node) {
    if (node == nullptr)
        return make_tuple(INT_MIN, INT_MAX, true);
    auto left_res = MaxMinValid_helper(node->left);
    auto right_res = MaxMinValid_helper(node->right); // right, not left
    int max_value = std::max(node->value, std::max(std::get<0>(left_res), std::get<0>(right_res)));
    int min_value = std::min(node->value, std::min(std::get<1>(left_res), std::get<1>(right_res)));
    bool valid = (std::get<0>(left_res) < node->value) && (node->value < std::get<1>(right_res))
              && std::get<2>(left_res) && std::get<2>(right_res);
    return make_tuple(max_value, min_value, valid);
}
bool isBST(Node* root) {
    return std::get<2>(MaxMinValid_helper(root));
}
(3) The simplest method, however, is based on the observation that as long as the value of the current node lies in the range determined by its parents, the BST is valid. In particular:
* The valid range for the left child is (parent->range_min, parent->value)
* The valid range for the right child is (parent->value, parent->range_max)
For the root node the range is (INT_MIN, INT_MAX), which leads to the following code:
bool isBST_helper(Node* node, int range_min, int range_max) {
if (node == nullptr) return true;
return (range_min < node->value) && (node->value < range_max)
&& isBST_helper(node->left, range_min, node->value)
&& isBST_helper(node->right, node->value, range_max);
}
bool isBST(Node* root) {
return isBST_helper(root, INT_MIN, INT_MAX);
}
(4) Another solution, as Furquan mentioned, is based on the unique property of a BST that an in-order traversal returns a sorted array. So of course you can traverse the tree to build the in-order array of values and then check that it is sorted. In practice, however, we don't need to store the in-order result: we can fold the check into the in-order traversal by comparing the value at the current node to the value of the immediately preceding node in traversal order (pvalue). I initialize pvalue with INT_MIN to remove the validity check on pvalue.
bool isBST_helper(Node* node, int& pvalue) {
    if (node == nullptr) return true;
    if (isBST_helper(node->left, pvalue))
        if (node->value > pvalue) { // must be strictly greater than the predecessor
            pvalue = node->value;
            return isBST_helper(node->right, pvalue);
        }
    return false;
}
bool isBST(Node* root) {
    int pvalue = INT_MIN;
    return isBST_helper(root, pvalue);
}
I don't think the limit on the range, O(n), is a valid assumption in general.
Even the assumption of integer input is questionable.
The method using quickselect is a much better solution in practice, since it works with any range of input and any type of input.
It sounds smart, but this approach has several problems, as horizon and alex mentioned. Actually, the solution by horizon is the better one: why compute (check) to the end of the string when we already know the condition is violated, and why risk overflow?
- LinhHA05 August 29, 2013
From the requirements we clearly need:
- a forward conversion function to map from the tree to an array
- a backward conversion function to map from the array back to the original tree
The forward conversion will be the result of one of the traversals: pre-, in-, or post-order.
While it is possible to use a single array (the pre-order) to reconstruct the input tree, this relies on the requirement that the tree is a binary tree without duplicated values.
In general, a binary tree can be reconstructed from two arrays, the in-order combined with the pre- or post-order; we cannot merge these two arrays into a single array, as there is no linear mapping between them that would allow a common index.
My solution is to create a single array of the structure
struct Node {
int value;
bool isLeaf;
};
with isLeaf indicating whether a node is a leaf node. We use a pre-order traversal to create the array; the reason is that pre-order creates the parent node before its children. The isLeaf field marks a leaf node, i.e., the point where the tree stops growing and we need to go back and build from another branch.
Good question. I don't know the answer. I believe closed-form solutions exist for a higher number of missing values, but it is non-trivial. There are two things to prove: that the closed-form solution exists, and that it is unique (if we order the result in ascending/descending order).
- LinhHA05 August 08, 2013
The idea is simple. Let
S1_100 = 1 + 2 +3 + ...100;
S2_100 = 1 ^ 2 + 2 ^ 2 + .. + 100 ^ 2;
Now we do the computation on the data that we have
S1_A = a[1] + a[2] + .. a[98]
S2_A = a[1]^2 + a[2]^2 + .. a[98]^2
Now let's say the two missing numbers are a and b; then
a + b = S1_100 - S1_A
a^2 + b^2 = S2_100 - S2_A
We then determine a and b from these equations.
Applied to your problem, we have:
S1_5 = 1 + 2 + 3 + 4 + 5 = 15
S2_5 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 55
S1_A = 4 + 1 + 2 = 7
S2_A = 21
Then
x + y = 8
x^2 + y ^ 2 = 34
The solution is x = 3, y = 5;
All the above solutions have non-linear complexity. If the question asks only for the number of steps, then I have an O(n) solution. My function returns -1 if there is no feasible solution.
#include <iostream>
#include <vector>
using namespace std;
int minJump(vector<int> jumps)
{
    size_t n = jumps.size();
    if (n <= 1) return 0;  // already at the end
    bool feasible = true;
    size_t lend = 1;       // exclusive end of the window reachable so far
    int jump_cnt = 1;
    size_t i = 0;
    while (feasible)
    {
        size_t lr = 0;
        for ( ; i < lend; ++i)
            if (i + jumps[i] + 1 > lr)  // compare against the same exclusive bound
                lr = i + jumps[i] + 1;
        if (lr >= n)
            break;
        feasible = (lend < lr);  // no progress means the end is unreachable
        lend = lr;
        ++jump_cnt;
    }
    return feasible ? jump_cnt : -1;
}
int main()
{
//int a[] = {1, 5, 4, 6, 9, 3, 0, 0, 1, 3};
int a[] = {2, 8, 3, 6, 9, 3, 0, 0, 1, 3};
std::vector<int> va(a, a + sizeof(a) / sizeof(a[0]));
std::cerr << minJump(va);
}
@gleb: a2[0] is the largest element of a2.
For your example:
a1 = [7, 5], a2 = [2, 0]
b1 = [6, 4], b2 = [3, 2]
Since a2[0] < b2[0], a2 is in the lower 4 and b1 is in the higher 4, which leaves a1 and b2:
a11 = 7, a12 = 5
b21 = 3, b22 = 2
Now we compare a12 and b22: since b22 < a12, b22 is in the remaining lower 2. The last member of the lower 2 is decided between b21 and a12.
So we need 3 weighings in total to find the 4 lower numbers, which are a2, b22, b21: [2, 0, 2, 3]
The main logic of the problem is based on DP and tree traversal.
- DP: if a string satisfies the conditions, then all of its substrings do. That means if you have a valid string of length l, then the one of length l - 1 (without the last character) also satisfies the condition.
- The feasible set forms a tree, and the length of the string indicates the level in the tree; the next level covers all strings of length + 1. We build the tree from the top (level 0); at each step we add leaves at the next level by appending a single character from the alphabet (0, 1 in the binary case), and then check the feasibility of the node.
- The infinite condition is a repeated pattern: if SS exists in the feasible set, then SSS, SSSS, ... are also feasible, so we have to check for a repeated pattern, if any.
Please correct me if my intuition is wrong.
Sounds like 2012 is the year of this test. In any case, first try a simple equivalent problem and generalize later.
Assume that you have 2^n = 32 weights, divided into 2 groups that are then ordered. The question is how to extract the smallest 16 weights.
Call them groups a and b.
Divide each group into 8 and 8 weights: a1 > a2 and b1 > b2.
- 1st: Weigh a2[0] against b2[0]. If a2[0] > b2[0], then a1, b1 > b2, so b2 is in the lower-16 group; otherwise a2 is in the lower 16. We have now identified 8 of the lower 16, and 8 of the upper 16 (a1 in the first case, b1 in the second).
- 2nd: Without loss of generality, assume a2[0] < b2[0]. What we have left is a1 and b2.
Divide a1 in two: a11 > a12. Divide b2 in two: b21 > b22. Repeat the first step to extract 4 of the lower 8.
- Repeating the process, we then identify 4, 2, and 1 weights, for a total of 15 after 4 weighings. If we need all 16, one extra weighing is required, for a total of log(2^n) = 5 weighings.
Now you see the answer to your question.
My solution is for binary strings, but it can be extended to strings over any character set. The idea is based on extending the feasible levels of a suffix tree. The infinity condition is based on a repeated substring: when a substring repeats, we can easily construct a solution of infinite length.
@Edit: changed .size() to .length() to get the length of the string
#include <iostream>
#include <string>
#include <vector>
// True if a ends with any of the forbidden substrings.
bool check_contain(const std::string& a, const std::vector<std::string>& subs) {
    for (size_t i = 0; i < subs.size(); ++i) {
        if (a.length() >= subs[i].length()) {
            std::string s = a.substr(a.length() - subs[i].length(), subs[i].length());
            if (s == subs[i])
                return true;
        }
    }
    return false;
}
// True if the last n characters of a repeat the n characters before them.
bool check_repeat(const std::string& a, size_t n) {
    if (a.length() < 2 * n)
        return false;
    std::string s1 = a.substr(a.length() - n, n);
    std::string s2 = a.substr(a.length() - 2 * n, n);
    return s1.compare(s2) == 0;
}
int main(int argc, char** argv)
{
    std::string arr[] = {"11", "00"};
    //std::string arr[] = {"101", "111", "00", "110"};
    std::vector<std::string> ref(arr, arr + 2);
    size_t max_length = 0;
    for (size_t i = 0; i < ref.size(); ++i)
        if (ref[i].length() > max_length)
            max_length = ref[i].length();
    std::vector<std::string> non_cover_set_prev;
    std::vector<std::string> non_cover_set;
    non_cover_set_prev.push_back(std::string(""));
    bool repeat = false;
    while (!repeat) {
        non_cover_set.clear();
        for (size_t i = 0; i < non_cover_set_prev.size() && !repeat; ++i) {
            std::string a0 = non_cover_set_prev[i] + '0';
            std::string a1 = non_cover_set_prev[i] + '1';
            if (!check_contain(a0, ref)) {
                non_cover_set.push_back(a0);
                repeat = repeat || check_repeat(a0, max_length);
            }
            if (!check_contain(a1, ref)) {
                non_cover_set.push_back(a1);
                repeat = repeat || check_repeat(a1, max_length);
            }
        }
        if (non_cover_set.size() == 0)
            break;
        non_cover_set.swap(non_cover_set_prev);
    }
    if (repeat) {
        std::cout << -1 << std::endl;
    }
    for (size_t i = 0; i < non_cover_set_prev.size(); ++i) {
        std::cout << non_cover_set_prev[i] << std::endl;
    }
    return 0;
}
There is a simple trick to turn it from non-decreasing to strictly increasing: change from upper_bound (find something strictly larger, so the previous element might be equal) to lower_bound (find something at least as large, so the previous element is strictly smaller). My edited code reflects that idea.
- LinhHA05 July 29, 2013
@vgeek: I think you misread it. I think @Leaner asks the right question; according to him, the maximum in -1, -3, 6, 8 is 6 + 8 = 14 (not 10 as you pointed out). And since the question is confusing, he wants to make it clear. For me, I still don't fully understand the question.
- LinhHA05 July 28, 2013
The possible reason that someone (not me) down-voted this one might be the wrong output. But anyway, this is the well-known problem "longest increasing subsequence"; you can google it. The complexity is O(n log n). Beware, it is not easy to get right. This is my version.
#include <iostream>
#include <vector>
#include <algorithm>
#include <limits>
#define STRICTLY_INCREASING 1
template<typename T>
std::vector<T> LIS(const std::vector<T>& v) {
typedef T value_type;
typedef std::pair<value_type, int> value_pos_type;
typedef std::vector< value_pos_type > value_pos_vector_type;
typedef typename value_pos_vector_type::iterator best_iterator_type;
size_t n = v.size();
std::vector<int> p(n);
std::vector< value_pos_type > best;
value_pos_type cur = std::make_pair(std::numeric_limits<value_type>::min(), -1);
best.push_back(cur);
for (int i=0; i< n; ++i) {
#ifdef STRICTLY_INCREASING
value_pos_type cur = std::make_pair(v[i], 0);
best_iterator_type it = lower_bound(best.begin(), best.end(), cur);
cur.second = i;
#else
value_pos_type cur = std::make_pair(v[i], i);
best_iterator_type it = upper_bound(best.begin(), best.end(), cur);
#endif
if (it == best.end()) {
p[i] = best.back().second;
best.push_back(cur);
} else {
best_iterator_type pIt = it-1;
p[i] = pIt->second;
*it = cur;
}
}
std::vector<value_type> ret(best.size() - 1);
int f = best.back().second;
for (int i = ret.size()-1; i >=0; --i) {
ret[i] = v[f];
f = p[f];
}
return ret;
}
int main(int argc, char** argv) {
int a[] = {5,6,3,4,1,9,9,8,9,5 };
//int a[] = {0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15};
std::vector<int> v(a, a + sizeof(a) / sizeof(a[0]));
std::vector<int> res = LIS(v);
for (int i=0; i < res.size(); ++i )
std::cout << res[i] << ' ';
std::cout << std::endl;
}
@eugene.varovoj: my analysis is incorrect, agreed, as I said to Apostle (see the discussion above). If you read the solution (not mine, but the one I refer to), you will see that O(M+N) is not the best. Actually mine is not as good as O(M+N) when M is not much different from N; however, if M >> N, it is better. A simple example: when N = 1, the saddleback complexity is still M + 1, while we know a binary search has log(M) complexity.
- LinhHA05 July 28, 2013
I'm pretty surprised by some anonymous poster here. This is what he said:
"Oh shut up. The reason O(m+n) was mentioned was because the interviewer did so. This was a comment to claim that the interviewer was a Moron. Seems like you missed the point again, idiot "
And this is my answer:
Before you call someone a moron or an idiot, read the paper that I put there first, if you can read.
@Apostle: Thank you for the constructive discussion. I don't think I have a tight bound in my analysis. Believe it or not, saddleback is not the best solution available.
Google this and you will see:
"Improving Saddleback Search: A Lesson in Algorithm Design"
Thank you (Apostle) for leading me to this paper. I will not post the solution here since it does not belong to me. BTW: if you cannot access the paper, you can find it in the book "Pearls of Functional Algorithm Design".
@Apostle: yes, my analysis falls along similar lines. However, we should see the idea as is:
- Each sub-quad has the same property as the original (the values increase from left to right and from top to bottom)
- The range is determined in constant time from the top-left corner and the bottom-right corner, so checking the range costs the same as a check in binary search
- I choose the pivot point so as to balance the sizes of the sub-quads
So in my analysis:
T(m*n) = 3 * T(m*n / 4) + c
- c is the checking cost
- m * n / 4 is the size of a sub-quad
- the constant 3 reflects the mutual exclusion of the first and fourth quads, as:
-- if a[i][j] < v, then all the values of the first quad are < v, without any further investigation
-- if a[i][j] > v, then all the values of the fourth quad are > v, without any further investigation
So we always have to check at most 3 of the 4 quads. The range check can trim branches further in constant time.
When one of the dimensions becomes 1, we fall back to binary search.
@eugen.yarovoi: You have a flaw in your analysis: you have the wrong constant, 3 (if you consider a binary tree) or n/2 (if you consider a quad tree), and that is why the runtime is incorrect. BTW: my analysis is based on the total size (m*n), as the pivot point divides the space into 4 sub-quads. See my first explanation for more details.
- LinhHA05 July 27, 2013
Actually, I should correct it: it should be called a quad (range search) tree, in which each internal node has exactly four children. The range at each node is defined by the minimum (at the top-left corner) and the maximum (at the bottom-right corner). We only traverse a node if its range contains the lookup value. Due to the mutual exclusion of the first and fourth children, we can always trim one child while searching. We choose the pivot at the middle to keep the tree balanced. So in terms of complexity, the worst case is similar to a binary tree with a (1/4, 3/4) cut where the search always goes to the 3/4 part. The complexity is still O(log N) with N = m * n.
- LinhHA05 July 26, 2013
You have a unique hash code because all anagrams share the same histogram. Your hash code is one way to map between the histogram and the count, but it is not the only hash function we can use; basically, any good hash function computed from the histogram would work. The reason I don't think your hash function is a good choice is its potential to overflow.
- LinhHA05 October 17, 2013
I would discuss with the interviewer which queries the application has to support. If counting is the only query, then we don't need to maintain the whole list.