Microsoft Interview Question
Interns
Country: United States
Interview Type: In-Person
Several terabytes fit on a modern HDD but not in RAM. We should split the database into a few files: index files and a data file. Typically the index file of a database is organized as a B-tree to reduce disk read/write operations. Search time is log_b(N), where b is the number of children per node in the B-tree.
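To make the log_b(N) claim concrete, here is a rough sketch that counts how many levels are needed for N keys at a given branching factor b. The function name and the sample numbers are only illustrative assumptions; it ignores partially filled nodes and treats one level as one disk read.

```python
def btree_levels(num_keys: int, branching: int) -> int:
    """Return the smallest h with branching**h >= num_keys (roughly log_b(N))."""
    if num_keys < 1 or branching < 2:
        raise ValueError("need num_keys >= 1 and branching >= 2")
    levels = 0
    capacity = 1  # number of keys reachable with `levels` levels
    while capacity < num_keys:
        capacity *= branching
        levels += 1
    return levels


if __name__ == "__main__":
    # Hypothetical sizes: one billion keys, binary tree vs. wide B-tree node.
    print(btree_levels(10**9, 2))
    print(btree_levels(10**9, 1000))
```

The point of the comparison is that a wide node (large b) turns dozens of disk reads per lookup into a handful.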
In such a situation, one should create indexes on the data, for example a B+ tree, where the index node size could be alpha times the internal memory size, where alpha is less than one and is chosen based on performance.
So this means the fan-out of the tree is effectively alpha times the internal memory size, which reduces the height of the tree significantly, to Log(N/(Alpha*M))/Log(Alpha*M), where N is the total size of the data and M is the internal memory size.
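As a quick sanity check, the height expression as stated in this answer can be evaluated directly. This is only a sketch: the function name and the sample sizes are assumptions, and N and M are taken to be in the same unit (bytes here).

```python
import math


def stated_height(total_size: float, memory_size: float, alpha: float) -> float:
    """Evaluate Log(N/(Alpha*M)) / Log(Alpha*M) as written in the answer."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    fanout = alpha * memory_size  # node size, and so fan-out, per the answer
    if fanout <= 1 or total_size <= 0:
        raise ValueError("need alpha * M > 1 and N > 0")
    return math.log(total_size / fanout) / math.log(fanout)


if __name__ == "__main__":
    # Hypothetical numbers: 4 TiB of data, 16 GiB of RAM, alpha = 0.5.
    n = 4 * 2**40
    m = 16 * 2**30
    print(stated_height(n, m, 0.5))
```

With these illustrative numbers the value comes out below 1, which would suggest that a single index level is enough under the answer's formula.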
Insert is similar in most cases, until we have to do some node partitioning (a split), which is very rare. The same goes for delete (where we do lazy deletion). A full scan will take the same amount of time in any case: we have to read all the data, and nothing can be done about that.
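The lazy delete mentioned above could look something like the sketch below. It uses a sorted in-memory list as a stand-in for the on-disk index, and the class and method names are made up for illustration: a delete only sets a tombstone flag, and the entry is physically removed later in a separate compaction pass.

```python
import bisect


class LazyDeleteIndex:
    """Sorted key index where delete marks a tombstone instead of removing."""

    def __init__(self):
        self._keys = []     # sorted keys
        self._values = []   # values, parallel to _keys
        self._deleted = []  # tombstone flags, parallel to _keys

    def _find(self, key):
        i = bisect.bisect_left(self._keys, key)
        found = i < len(self._keys) and self._keys[i] == key
        return i, found

    def insert(self, key, value):
        i, found = self._find(key)
        if found:
            # Reuse the slot, reviving it if it was tombstoned.
            self._values[i] = value
            self._deleted[i] = False
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)
            self._deleted.insert(i, False)

    def search(self, key):
        i, found = self._find(key)
        if found and not self._deleted[i]:
            return self._values[i]
        return None

    def delete(self, key):
        """Mark the key as deleted; no data is moved."""
        i, found = self._find(key)
        if found and not self._deleted[i]:
            self._deleted[i] = True
            return True
        return False

    def compact(self):
        """Physically drop tombstoned entries; return how many were removed."""
        live = [(k, v) for k, v, d in
                zip(self._keys, self._values, self._deleted) if not d]
        removed = len(self._keys) - len(live)
        self._keys = [k for k, _ in live]
        self._values = [v for _, v in live]
        self._deleted = [False] * len(live)
        return removed
```

The trade-off is that deletes stay cheap at the cost of some wasted space until compaction runs.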
Hmm, this strongly depends on the db. MS implies Access. If I had to choose I'd select Mnesia, but that's just my personal preference.
- nils.muellner@googlemail.com February 12, 2015
So, the question implies that you search a lot and insert/delete less. This leads to the next question: what is the content, and what are the keys? With a focus on search I'd suggest something BST-like, or key hashes. Then you could also distribute your database over several disks to gain speed (i.e. spread buckets over disks so that lookups for a key hash can run concurrently).
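A minimal sketch of the "spread buckets over disks" idea, assuming string keys, a fixed number of buckets, and a simple round-robin mapping from bucket to disk. The function names, the bucket count, and the choice of SHA-256 as a stable hash are all assumptions made for illustration.

```python
import hashlib


def bucket_for_key(key: str, num_buckets: int) -> int:
    """Map a key to a bucket using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def disk_for_bucket(bucket: int, num_disks: int) -> int:
    """Assign buckets to disks round-robin so neighbours land on different disks."""
    return bucket % num_disks


if __name__ == "__main__":
    # Hypothetical layout: 64 buckets spread over 4 disks.
    for key in ("alice", "bob", "carol"):
        b = bucket_for_key(key, 64)
        print(key, "-> bucket", b, "-> disk", disk_for_bucket(b, 4))
```

Since each key resolves to exactly one bucket and one disk, lookups for different keys can be issued to different disks in parallel.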