Microsoft Interview Question
Software Engineer in Tests
Country: United States
Interview Type: In-Person
I would like to discuss whether, and why, Huffman encoding would be the right answer. Run-length encoding is good if all the characters have roughly equal probability of occurring, because Huffman coding saves little over a fixed-width code when symbol frequencies are nearly uniform. For example, if a occurs 5 times and every other character occurs between 3 and 6 times, then run-length encoding is surely the right answer.
Huffman coding is usually used when you need to transmit text written in one alphabet (a set of symbols, in this case the English alphabet) over a communication medium that uses binary digits (usually) or some other much smaller alphabet, and some symbols are much more frequent than others. That is why I see Huffman coding as unnecessary here, although it's not wrong. I was asked this question in a Microsoft interview for the same position, and they were looking for the same answer. :)
Bleep, you are right: Huffman is overkill. Bleep, can you share more about your MS interview? You can mail me at whizz.comp at gmail
Hi Bleep, just out of curiosity, how is the original string reconstructed at the receiving end? Can you illustrate the logic with a sentence? Thanks in advance.
#include <cstdlib>
#include <sstream>
#include <iostream>

// Run-length encodes argv[1] as <char><count> pairs, e.g. "aaab" -> "a3b1".
int
main(int argc, char **argv)
{
    if(argc != 2) {
        std::cerr << "the string is required" << std::endl;
        return (EXIT_FAILURE);
    }
    std::stringstream buf;
    char *pcur = argv[1];
    while(*pcur != '\0') {
        // Start a new run at the current character.
        char cur_char = *pcur;
        int count = 1;
        pcur += 1;
        // Extend the run while the next character matches.
        while(*pcur != '\0' && *pcur == cur_char) {
            count += 1;
            pcur += 1;
        }
        // Emit the character followed by its run length.
        buf << cur_char << count;
    }
    std::cout << buf.str() << std::endl;
    return (EXIT_SUCCESS);
}
To compress a string you can use something called run-length coding. It exploits redundancies (such as repeated characters in the string) to compress it.
- bleep, February 28, 2012
For example:
If you have aaaabbbcddeeeeffff,
it can be compressed as \4a\3bc\2d\4e\4f.
The backslash is used as an escape character to mark a compressed element: \4a means "four copies of a", while a single character such as c is copied unchanged. A backslash in the original text is escaped with another backslash, just as in C/C++ string literals.
You can do this in n steps, i.e. a single O(n) pass over the input; it's a straightforward algorithm.