hava.babay
public class LongestStringInDictionary {

    public static class Trie {
        char current;
        Trie[] nextChars = new Trie[26];
        boolean isWord = false;

        public Trie(char current) {
            this.current = current;
        }
    }

    public static class Dictionary {
        Trie root = new Trie('*');

        public void addWord(String s) {
            Trie currentNode = root;
            for (int i = 0; i < s.length(); ++i) {
                char c = s.charAt(i);
                Trie nextNode = currentNode.nextChars[c - 'a'];
                if (nextNode == null) {
                    nextNode = new Trie(c);
                    currentNode.nextChars[c - 'a'] = nextNode;
                }
                currentNode = nextNode;
            }
            currentNode.isWord = true;
        }

        public boolean isWord(String s) {
            Trie node = getPrefix(s);
            return (node != null) && node.isWord;
        }

        public boolean isPrefix(String s) {
            return getPrefix(s) != null;
        }

        // Walks the trie along s; returns the node for s, or null if s is
        // not a prefix of any word in the dictionary.
        private Trie getPrefix(String s) {
            Trie currentNode = root;
            for (int i = 0; i < s.length(); ++i) {
                char c = s.charAt(i);
                currentNode = currentNode.nextChars[c - 'a'];
                if (currentNode == null) {
                    return null;
                }
            }
            return currentNode;
        }
    }

    // Tries each remaining character as the next letter, pruning any branch
    // that is not a prefix of some dictionary word.
    public String getLongestValidWord(String prefix, String remaining, Dictionary root) {
        String maxStr = null;
        for (int i = 0; i < remaining.length(); ++i) {
            char c = remaining.charAt(i);
            String candidate = prefix + c;
            if (root.isPrefix(candidate)) {
                if (root.isWord(candidate) && ((maxStr == null) || (maxStr.length() < candidate.length()))) {
                    maxStr = candidate;
                }
                String newRemaining = remaining.substring(0, i) + remaining.substring(i + 1);
                String longestWithC = getLongestValidWord(candidate, newRemaining, root);
                if ((longestWithC != null) && ((maxStr == null) || (maxStr.length() < longestWithC.length()))) {
                    maxStr = longestWithC;
                }
            }
        }
        return maxStr;
    }

    public String getLongestValidWord(String s, Dictionary root) {
        return getLongestValidWord("", s, root);
    }

    public static void main(String[] args) {
        Dictionary d = new Dictionary();
        d.addWord("apl");
        d.addWord("apple");
        d.addWord("monkey");
        d.addWord("plea");
        System.out.println(new LongestStringInDictionary().getLongestValidWord("abpcplea", d));
    }
}
Requirements
hava.babay, February 17, 2017
* For each video, keep a counter of likes and a counter of dislikes.
* A user can either like or dislike a video (only one of the two), and can later cancel that like or dislike.
Single-machine design
* Users table
* Videos table
* User-to-video table, with a third column for like/dislike (since a user can choose only one of the two)
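The like/dislike/cancel semantics above can be sketched as a small in-memory model of the user-to-video table. All class and method names here are illustrative, not part of the original design:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal in-memory model of the user-to-video table plus per-video counters.
public class ReactionStore {
    public enum Reaction { LIKE, DISLIKE }

    // "userId:videoId" -> current reaction; absence means no reaction.
    private final Map<String, Reaction> reactions = new HashMap<>();
    // videoId -> {likeCount, dislikeCount}
    private final Map<Long, long[]> counters = new HashMap<>();

    // Applies a like or dislike; pressing the same button again cancels it,
    // and pressing the other button switches the reaction.
    public void react(long userId, long videoId, Reaction r) {
        String key = userId + ":" + videoId;
        long[] c = counters.computeIfAbsent(videoId, id -> new long[2]);
        Reaction previous = reactions.remove(key);
        if (previous != null) {
            c[previous.ordinal()]--;   // undo the old reaction
        }
        if (previous != r) {           // same button twice = cancel, leave no reaction
            reactions.put(key, r);
            c[r.ordinal()]++;
        }
    }

    public long likes(long videoId)    { long[] c = counters.get(videoId); return c == null ? 0 : c[0]; }
    public long dislikes(long videoId) { long[] c = counters.get(videoId); return c == null ? 0 : c[1]; }
}
```

Keeping a single row (or map entry) per user/video pair is what enforces "only one option at a time".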
Capacity in large systems
* 1 billion users (100 million active users)
* Videos: 1 billion uploaded per day => 365 billion per year => 5 * 365 = 1,825 billion in 5 years
* Average likes per video: 100 (many videos get no likes, but some get a lot). So over 5 years the like table holds 1,825 billion * 100 = 182,500 billion records.
* If we keep every user-to-movie relationship in a table:
* One record: 64 bytes for the video ID + 64 bytes for the user ID + 1 byte for like/dislike = 129 bytes per record. Total: 182,500 billion records * 129 bytes ≈ 23.5 PB, far too much for a single naive table.
* So we will do something else. For each movie, keep the counts of likes and dislikes; for each user, keep the list of movies that user liked or disliked.
* Per movie, 2 int counters: 2 * 4 bytes = 8 bytes * 1,825 billion videos = 14,600 billion bytes ≈ 14.6 TB
* Per user, the list of likes: assume up to 1,000 likes per user => 1 billion users * 1,000 likes * 64 bytes per movie ID = 64,000 billion bytes = 64 TB
* Keep the movie counters in the DB, and for recent movies also keep them in an in-memory cache
* Keep the user-to-movie data in the DB
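A quick back-of-envelope check of these estimates, using the assumptions above (64-byte IDs, two 4-byte counters per video):

```java
// Recomputes the storage estimates from the text. The sizes are the
// assumptions stated above, not measured values.
public class CapacityEstimate {
    static final long VIDEOS_5Y = 1_825_000_000_000L;   // 1 billion/day * 365 days * 5 years
    static final long LIKE_RECORDS = VIDEOS_5Y * 100;   // average 100 likes per video

    // Naive table: 64-byte video ID + 64-byte user ID + 1-byte flag per record.
    static double naiveTablePB() {
        long bytesPerRecord = 64 + 64 + 1;
        return LIKE_RECORDS * (double) bytesPerRecord / 1e15;
    }

    // Counter scheme: two 4-byte int counters per video.
    static double countersTB() {
        return VIDEOS_5Y * 8.0 / 1e12;
    }

    public static void main(String[] args) {
        System.out.printf("naive like table: %.1f PB%n", naiveTablePB());
        System.out.printf("per-video counters: %.1f TB%n", countersTB());
    }
}
```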
Handle concurrency
* What happens when two users update the counter of the same movie concurrently? The server receives requests of the form "user X, movie Y, +1" and must apply them atomically, for example using the database's "SELECT ... FOR UPDATE".
* Each time a counter is updated in the DB, make sure the in-memory cache is updated as well.
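For the in-memory side, one way to make concurrent increments safe is a lock-free counter map; this is a sketch of the cache layer only (the DB row is assumed to be updated separately under SELECT ... FOR UPDATE), and the class name is illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// In-memory like-counter cache that is safe under concurrent requests.
public class CounterCache {
    private final ConcurrentHashMap<Long, LongAdder> likeCounts = new ConcurrentHashMap<>();

    // Many request threads may call this at once; LongAdder handles the contention.
    public void increment(long videoId, int delta) {
        likeCounts.computeIfAbsent(videoId, id -> new LongAdder()).add(delta);
    }

    public long get(long videoId) {
        LongAdder a = likeCounts.get(videoId);
        return a == null ? 0 : a.sum();
    }
}
```

`LongAdder` is a good fit here because writes (likes) vastly outnumber reads of the exact total.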
How many machines?
* Say 50% of the videos are viewed daily, so we need ~7 TB of memory for the hot counters. At 70 GB of RAM per server, that is 100 servers, plus ~10 spares to handle machine failures.
* Database: at 4 TB of SSD per machine, that is 16 machines for the user likes (64 TB) and 4 machines for the movie counters (~15 TB). With a slave replica per machine, 40 machines in total.
How is the data spread across machines? Shard by location, for both the user data and the movie data.
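One possible routing scheme for such location-based sharding: pick the region first, then a shard within the region by hashing the ID. This is a hypothetical sketch; the region names and shard counts are assumptions:

```java
// Routes a user (or movie) ID to a shard within its home region.
public class ShardRouter {
    private final int shardsPerRegion;

    public ShardRouter(int shardsPerRegion) {
        this.shardsPerRegion = shardsPerRegion;
    }

    // Returns a stable shard label such as "eu/shard-3". The same ID always
    // maps to the same shard, so reads and writes land on the same machine.
    public String shardFor(String region, long id) {
        int shard = (Long.hashCode(id) & 0x7fffffff) % shardsPerRegion;
        return region + "/shard-" + shard;
    }
}
```

Hashing within a region keeps a user's data near that user while still spreading load evenly across that region's machines.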