## Google Interview Question

Software Engineer Interns**Country:**United States

**Interview Type:**Phone Interview

I like this answer - I was working on the lines of a graph G with vertices of dogs and cats, and edges of votes (either a cat lover vote or a dog lover vote). Then:

1) identify each independent sub-graph of vertices that are connected

2) for each independent sub-graph count the dog votes and cat votes, and for whichever has the most votes in that sub-graph you would eliminate the other types of pets. Add this max value of votes to the total of satisfied votes.

Not sure if this is maximal though...

I upvoted your answer because it made me think of the problem in a new way. However, I'm pretty sure this solution won't work. The problem is that you are trying to find a maximum independent set of *constraints*, not a maximum set of voters. As the examples show, some constraints are more popular than others, so not all vertices are created equal. Therefore, all maximal independent sets must be considered in order to maximize the number of voters. This turns the problem NP-complete again and I suspect that in an interview setting the best I would come up with is an exhaustive recursive algorithm. Great idea though.

Some terminology clarification: a graph can have many *maximal* independent sets, but only one *maximum* independent set. It is possible to reduce the problem of finding the *maximum* independent set in a bipartite graphs to polynomial complexity using Konig's theorem. However, finding all maximal independent sets is NP-complete regardless.

I might not understand the problem. Can you explain why the answer to the second case in the problem statement is 3 and not 2? Seems like because no more than 2 people agree on who to get rid of, an answer of 3 should not be possible.

I agree, I believe the problem states that viewers are considered happy if both their votes are satisfied so the output should be two.

I'm ignoring the input reading part because that's not very interesting to me. If this were an in-person interview, my experience is that they would not care about this very much. If I had to implement it, I would use some sort of stream parsing, but there wouldn't be much logic.

In writing the actual algorithm that determines the number of things votes that could be satisfied, the logic is actually quite simple: Count the maximum number of voters that vote for each pet to be eliminated. Since each voter can only vote for 1 pet to be eliminated and each voter cannot vote to keep the same pet that they eliminate, those votes would be immediately satisfied. Bucket sort the votes on the what they would eliminate and count the largest bucket. O(v + (c + d) ) where v is the number of votes, c is the number of cats, and d is the number of dogs.

```
static class Vote{
boolean eliminateIsCat;
int eliminateIndex;
boolean keepIsCat;
int keepIndex;
}
//inputs are parsed into a Collection of Votes
public static int maxSatisfied(Collection<Vote> votes, int numCats, int numDogs){
int[] elimCats = new int[numCats];
int[] elimDogs = new int[numDogs];
for(Vote vote : votes){
if(vote.elimateCat){
elimCats[vote.eliminateIndex]++;
}
else{
elimDogs[vote.eliminatedIndex]++;
}
int maxElim = 0;
for(int i =0; i < numCats; i++){
if(elimCats[i] > maxElim){
maxElim = elimCats[i];
}
}
for(int i = 0; i < numDogs; i++){
if(elimDogs[i] > maxElim){
maxElim = elimDogs[i];
}
}
return maxElim;
}
```

this will not work. Just consider the case where every one votes for d1 like this

C1 d1

C2 d1

C2 d1

C3 d1

Your algorithm gives 4 viewers where it is supposed to be 2

@arun

But it should be 4 votes. See, if everyone says they want to get rid of D1 like your example, then the optimal thing would be to get rid of D1, and C1, C2, and C3 all stay on the show so everyone is happy.

Hi Zortlord,

why would you take into account for the solution just the most voted pet for the elimination?

It seems that your solution doesn't work for this kind of input:

C1 D1

C1 D1

C1 D2

C1 D2

D2 C1

D2 C1

D1 C1

with this input the output should be 4 (we can eliminate D1 and D2 and keep C1, and the first four lines will be satisfied).

instead your algorithm retrieves 3 (the last 3 lines)

Supposing that the second output is two, (as only two customers can have both their options satisfied not three)

I would parse the input into a collection of votes, I would also abstract the dogs and the cats into a participant class. Then split all of those into test cases/ballots.

```
class Vote
{
Participant whoIsLeaving;
Participant whoIsStaying;
///... implementation
}
Class Participant {
ParticipantType type; //struct, or something cached from a DB
int participantId;
}
class TestCase
{
int numberOfDogs;
int numberOfCats;
List<Vote> votes;
}
```

Then I would parse the votes and keep track of which one is max so I do not have to search on the matrix later.

```
int ProcessVotes (List<TestCase> testCases)
{
int[] output = new int[testCases.Count];
int i=0;
testCases.ForEach( tc => {
Dictionary<Tuple<Participant,Participant>,int> results = new Dictionary<Tuple<int,int>, int>();
output[i]=0;
tc.Votes.ForEach(v =>{
var tuple = Tuple.Create(v.WhoIsLeaving, v.WhoIsStaying);
if (results.ContainsKey(tuple)) //combination already has a vote
{
results[tuple]++;
if (output[i]<results[tuple])
{
output[i] = results[tuple];
}
continue;
}//end if
results.Add(tuple, 1); //combination has no vote
if (output[i]==0)
{
output[i] = 1;
}
}); //foreach vote
i++;
});//foreach test case
return output;
}
```

This is my solution, let me know test cases where this algorithm doesn't work:

I define a class Pet. Each Pet has a type(C or D) and a code (the sequential number 1 2 etc...). Each Pet knows the difference between positive and negative votes received, the number positive votes received, and the Pets that should be discarded if this Pet is chosen for the final solution.

After parsing the input into a List of Pets, I sort the list by the difference between positive and negative votes received.

Then starting with the Pet with the Highest tradeoff between positive and negative votes( the most popular pet ), I add the number of positive votes received by that pet to the solution, and then I flag all the pets that should be discarded according to the votes received by that pet, and I won't count these discarded pets in the final sum. Keep adding positive votes until the end of the list skipping Pets flagged as discarded. The total obtained in this way is the maximum number of satisfied votes.

Complexity O(v) to read the votes, where v is the number of votes; O(nlogn) to sort the list of Pets by the difference positive and negative votes (maybe this can be improved by adding the Pets directly in a TreeSet with and external Comparator; O(n) to count the final sum; n is the number of Pets;

In the code below the pets list is already sorted by the most popular pets:

```
public static int maxViewers(List<Pet> pets) {
int max = 0;
if(pets==null || pets.size()==0) return max;
Set<Pet> discarded = new HashSet<Pet>();
for(Pet p: pets) {
if(!discarded.contains(p)) {
for(Pet petout: p.getDiscards()) {
discarded.add(petout);
}
max+=p.getPositive();
}
}
return max;
}
```

I agree that arbitrary number of cats or dogs will be kept and thrown out.

I would approach this problem using a hashmap and then compare the 2 different hashmaps.

E.g. for the last testcase, we have:

HashmapKeep: C1: 3 D2: 1

HashmapThro: C1: 1 D2: 1 D1: 2

We keep C1 throw D2 D1. Based on the comparison that keep.get("C1")>throw.get("C1")

Otherwise we throw. Even if they have the same number.

And that leaves us with 3 satisfied viewers.

This might not work for all cases. Though.

You guys thought too much. We need to respect the given output, right? It is confusing that "...the pets that get to stay will be chosen so as to maximize the number of viewers who get both their opinions satisfied." - But considering the given output for testcase2 is 3, I understand the focus is ONLY "to choose which pet to stay", i.e. the pet that won the most "stay" votes (that is, it naturally satisfies its voters' both opinions: a) "stay" pet; b) not "leave" pet and it is agnostic/independent to which specific pet its like voters want to leave), so the C1 won and the result is 3.

Note: if the question adds one more condition like this ""...the pets that get to stay AND THE PETS GET TO LEAVE will be chosen so as to maximize the number of viewers who get both their opinions satisfied." - Once TV producer has to be specific which pet has to leave, he knows he will lose more audiences 'cos the 2nd contraint of "the pet got the most "leave" votes" has to be satisfied as well (it is like find an overlapping subset between Set S_max_stay and Set S_max_leave for audiences). In this case, we will have 2 as the result for the testcase2, i.e. (C1 stay, D1 leave).

An update to the 2nd half of my previous post:

Note: if the question adds one more condition like this ""...the pets that get to stay AND THE PETS GET TO LEAVE will be chosen so as to maximize the number of viewers who get both their opinions satisfied." - Once TV producer has to be specific which pet has to leave, he knows he will lose more audiences 'cos the 2nd contraint of "the pet got the most "leave" votes" has to be satisfied as well (it is like find a maximum overlapping audience subset of "stay" votes and "leave" votes). In this case, we will have 2 as the result for the testcase2, i.e. (C1 stay, D1 leave).

Number of [C1 D1] votes are 2

Number of [C1 D2] votes are 1

Number of [D2 C1] votes are 1

Note: for c cats and d dogs, all possible vote permutations are c*d+d*c=2dc (as voter is either cat lover or dog lover, mutually exclusive). For the testcasse2, c=1 and d=2, so in total 2dc=4 permutations (but no vote for the 4th permutation [D1 C1] in this experiment). Thus the maximum overlapping audience subset with both opinions satisfied is 2 (and that winning opinion is [C1, D1]).

Algo:

Coding is straightforward

Initialise the max number of audiences with both opinions satisifed to 0.

check each vote [Xi, Yj]:

if it is a new permutation, record it and set its count to 1

if it is a already recorded permutation, increment its count by 1

if the updated count number is greater than the max number of audiences with both opinions satisfied, set the max number to this count number

Return the max number of audiences with both opinions satisfied.

Following up on some of the ideas in the suggested answers, I found a solution based on the max-flow algorithm.

1. create nodes for all 'constraints' i.e. distinct voter lines.

2. divide nodes into two groups - dog lover constraints and cat lover constraints.

3. create a directed graph flow network

3.1. create a source and sink nodes.

3.2. for each dog lover constraint node (left side).

3.2.1. connect source to node with edge capacity equal to constraint votes.

3.3. for each cat lover constraint node (right side)

3.3.1. connect node to sink with edge capacity equal to constraint votes.

3.4. for every two conflicting constraints

3.4.1. create an edge from left node to right node with infinitie capacity.

4. calcluate max flow.

5. return original number of voters - max flow.

The reason this works is that according to the max-flow-min-cut theorem, the max flow is equal to the minimum capacity that when removed from the network in a specific way, causes no flow from source to sink. That specific way is by removing the min-cut edges from the graph.In this problem, since we assigned infinite capacity to the original edges, the min-cut edges must be connected to the source or the sink. Additionally, the constraint nodes in these edges correspond to the minimum weight vertex cover, i.e. exactly the constraints we need to remove in order to avoid conflicts between the remaining voters constraints and maximize the number of remaining voters.

Below is a C++ implementation. It is too long for an interview setting but I believe it is the optimal solution to this problem.

```
#pragma once
#include <memory>
#include <queue>
#include <string>
#include <iostream>
#include <sstream>
#include <list>
#include <boost/unordered_map.hpp>
#include <boost/unordered_set.hpp>
#include <boost/tokenizer.hpp>
#include <boost/algorithm/string.hpp>
namespace reality_shows
{
template <typename K, typename V>
using map = boost::unordered_map<K, V>;
template <typename T>
using set = boost::unordered_set<T>;
using std::shared_ptr;
using std::istream;
using std::list;
using std::queue;
class vote_t
{
public:
vote_t(size_t _first_id, size_t _second_id, bool _hates_cats)
: first_id(_first_id), second_id(_second_id), hates_cats(_hates_cats) {}
size_t first_id;
size_t second_id;
bool hates_cats;
};
class node;
class edge
{
public:
edge(shared_ptr<node> _from, shared_ptr<node> _to, size_t _capacity, bool _residual = false)
: from(_from), to(_to), capacity(_capacity), flow(_residual ? capacity : 0),
residual(_residual), complement(nullptr) {};
shared_ptr<node> from;
shared_ptr<node> to;
size_t capacity;
size_t flow;
// is a residual edge
bool residual;
// complement edge to->from
// for the max-flow residual network
shared_ptr<edge> complement;
// residual capacity
inline size_t cf() { return capacity - flow; }
};
class node
{
public:
node(const vote_t& _vote) : vote(_vote), count(1) {}
size_t count;
vote_t vote;
set<shared_ptr<edge>> outgoing;
};
class reality_show
{
public:
reality_show(istream& in)
{
int num_cats = 0, num_dogs = 0;
in >> num_cats;
if (num_cats <= 0) throw std::invalid_argument("invalid cat number.");
in >> num_dogs;
if (num_dogs <= 0) throw std::invalid_argument("invalid dog number.");
in >> num_voters;
if (num_voters <= 0) throw std::invalid_argument("invalid voters number.");
// skip line
std::string line;
std::getline(in, line);
for (auto v = 0; v < num_voters; ++v)
{
std::getline(in, line);
boost::trim(line);
using separator_t = boost::char_separator < char >;
separator_t sep(" ", "CD");
boost::tokenizer<separator_t> tokens(line, sep);
if (4 != std::distance(tokens.begin(), tokens.end()))
throw std::invalid_argument("invalid vote format.");
auto token = tokens.begin();
auto first_pet_type = *token;
if ("C" != first_pet_type && "D" != first_pet_type)
throw std::invalid_argument("invalid vote format.");
++token;
auto first_id = std::stoi(*token);
++token;
auto second_pet_type = *token;
if (("C" != second_pet_type && "D" != second_pet_type) ||
first_pet_type == second_pet_type)
throw std::invalid_argument("ilegal vote format.");
++token;
auto second_id = std::stoi(*token);
if (first_id < 1 || second_id < 1)
throw std::invalid_argument("invalid pet id.");
if ("C" == first_pet_type &&
(first_id > num_cats || second_id > num_dogs))
throw std::invalid_argument("invalid pet id.");
if ("D" == first_pet_type &&
(first_id > num_dogs || second_id > num_cats))
throw std::invalid_argument("invalid pet id.");
if (nodes.find(line) == nodes.end())
{
vote_t vote(first_id, second_id, "D" == first_pet_type);
nodes[line] = std::make_shared<node>(vote);
}
else
{
++nodes[line]->count;
}
}
}
size_t advance()
{
// For this problem the maximum number of voters kept
// is the original number of voters minus
// the maximum flow.
// The reason for this is that according to the
// max-flow-min-cut theorem, the max flow
// is equal to the minimum capacity that when removed
// from the network in a specific way, causes no flow
// from source to sink. That specific way is
// by removing the min-cut edges from the graph.
// In this problem, since we assigned infinite capacity
// to the original edges, the min-cut edges must be
// connected to the source or the sink. Additionally,
// the constraint nodes in these edges correspond
// to the minimum weight vertex cover, i.e. exactly
// the constraints we need to remove in order to avoid
// conflicts between the remaining voters constraints
// and maximize the number of remaining voters.
construct_bipartite_flow_graph();
auto max_flow = edmunds_karp_max_flow();
return num_voters - max_flow;
}
private:
void connect_nodes(shared_ptr<node>& from, shared_ptr<node>& to)
{
// create edge between conflicting nodes, infinite capacity
auto e = std::make_shared<edge>(from, to, std::numeric_limits<size_t>::max());
// create reverse residual edge as well
auto re = std::make_shared<edge>(to, from, std::numeric_limits<size_t>::max(), true);
// make them complements (i.e. flow updates effect each other)
e->complement = re;
re->complement = e;
// update the respective nodes
from->outgoing.insert(e);
to->outgoing.insert(re);
}
void construct_bipartite_flow_graph()
{
// mappings of pet id to the nodes
map< size_t, set<shared_ptr<node>> > dog_lovers;
map< size_t, set<shared_ptr<node>> > dog_haters;
map< size_t, set<shared_ptr<node>> > cat_lovers;
map< size_t, set<shared_ptr<node>> > cat_haters;
for (auto& p : nodes)
{
auto node = p.second;
auto vote = node->vote;
if (vote.hates_cats)
{
// hi i'm a dog lover
dog_lovers[vote.first_id].insert(node);
// and a cat hater
cat_haters[vote.second_id].insert(node);
// connect to everybody i hate
if (cat_lovers.find(vote.second_id) != cat_lovers.end())
{
for (auto cnode : cat_lovers[vote.second_id])
{
connect_nodes(node, cnode);
}
}
// connect to everybody that hates me
if (dog_haters.find(vote.first_id) != dog_haters.end())
{
for (auto cnode : dog_haters[vote.first_id])
{
connect_nodes(node, cnode);
}
}
}
else
{
// hi i'm a cat lover
cat_lovers[vote.first_id].insert(node);
// and a dog hater
dog_haters[vote.second_id].insert(node);
// connect everybody i hate to me
if (dog_lovers.find(vote.second_id) != dog_lovers.end())
{
for (auto dnode : dog_lovers[vote.second_id])
{
connect_nodes(dnode, node);
}
}
// connect everybody that hates me to me
if (cat_haters.find(vote.first_id) != cat_haters.end())
{
for (auto dnode : cat_haters[vote.first_id])
{
connect_nodes(dnode, node);
}
}
}
}
// create source and sink nodes
source = std::make_shared<node>(vote_t(0, 0, true));
sink = std::make_shared<node>(vote_t(0, 0, true));
// connect source to all dog lover nodes, with capacity equal to their count
for (auto& p : dog_lovers)
{
for (auto& node : p.second)
{
auto e = std::make_shared<edge>(source, node, node->count);
source->outgoing.insert(e);
}
}
// connect all cat lover nodes to sink, with capacity equal to their count
for (auto& p : cat_lovers)
{
for (auto& node : p.second)
{
auto e = std::make_shared<edge>(node, sink, node->count);
node->outgoing.insert(e);
}
}
}
// modified bfs to find a path from source to sink that can be improved
list<shared_ptr<edge>> find_residual_path()
{
map<shared_ptr<node>, shared_ptr<edge>> visited;
queue<shared_ptr<node>> q;
q.push(source);
while (!q.empty())
{
auto node = q.front();
q.pop();
for (auto e : node->outgoing)
{
if (e->cf() > 0 && visited.find(e->to) == visited.end())
{
visited[e->to] = e;
if (e->to == sink)
{
list<shared_ptr<edge>> path;
auto node = sink;
while (source != node)
{
path.push_back(visited[node]);
node = visited[node]->from;
}
return path;
}
q.push(e->to);
}
}
}
// no path
return list<shared_ptr<edge>>();
}
size_t edmunds_karp_max_flow()
{
auto max_flow = 0;
auto path = find_residual_path();
while (!path.empty())
{
// find minimum residual flow on path
auto mincf = std::numeric_limits<size_t>::max();
for (auto e : path)
{
mincf = std::min(e->cf(), mincf);
}
// update path flow
for (auto e : path)
{
// add flow to path edge
e->flow += mincf;
if (nullptr != e->complement)
{
// subtract flow from complement edge
auto ec = e->complement;
ec->flow -= mincf;
}
}
max_flow += mincf;
path = find_residual_path();
}
return max_flow;
}
using node_key = std::string;
map < node_key, shared_ptr<node> > nodes;
shared_ptr<node> source;
shared_ptr<node> sink;
size_t num_voters;
};
void test_reality_show()
{
std::string sin =
"3\n" \
"1 1 2\n" \
"C1 D1\n" \
"D1 C1\n" \
"1 2 4\n" \
"C1 D1\n" \
"C1 D1\n" \
"C1 D2\n" \
"D2 C1\n" \
"2 3 13\n" \
"C1 D1\n" \
"C2 D1\n" \
"C2 D1\n" \
"C2 D1\n" \
"C2 D1\n" \
"C2 D2\n" \
"C2 D2\n" \
"D1 C1\n" \
"D1 C1\n" \
"D1 C1\n" \
"D1 C2\n" \
"D3 C1\n" \
"D3 C1\n";
std::istringstream in(sin);
auto num_test_cases = 0;
in >> num_test_cases;
size_t expected[] = { 1, 3, 8 };
for (auto tc = 0; tc < num_test_cases; ++tc)
{
reality_show rs(in);
assert(expected[tc] == rs.advance());
}
}
}
```

Most graph-approaches I read here in the comments model votes as vertices and have two votes be adjacent if they are conflicting. This means that on the resulting graph you need to compute the cardinal of a maximum independent set, which is an NP-hard problem. I approached it in a slightly different way: each vote is a vertex and two vertices are adjacent if they are not conflicting. That way, what you need to look for in the resulting graph is a vertex with the highest possible degree, and you return that degree+1.

To lxfuhuo@, This problem is really interesting to work with. However, was this question really asked in a phone interview? And expected to be solved (coded up) within 30 min from a 45 min interview.

If not, then please remove the "Phone Interview" interview type from this question. It is very misleading and doing a disservice to everyone.

I suspect that you guys are all wrong. The nature of this problem can create all sorts of complex dependencies so all the sorting and counting techniques will not work (and yes, the goal of the problem is to eliminate an arbitrary number of pets - so in the second example you eliminate D_1, D_2 and keep 3 voters happy).

- Someone December 19, 2014So, I will refer to a vote (C_i, D_j) as a constraint where the constraint is satisfied if and only if C_i is kept and D_j is eliminate (C_i denoting the i-th cat and D_j denoting the j-th dog). In order to model all constraints, we create a graph G where its vertices are the constraints, and any two vertices are connected with an edge if the two constraints violate each other (by violating I mean that one voter wants to eliminate a certain pet, and another want to keep that same pet). We would like to find a maximal independent set, that is, a maximal set of constraints that none of them violate each other. That would be the maximal set of votes that one can satisfy.

The key observation here is that this graph G is a bipartite graph. The reason for this is that we have two type of constraints: (C_i, D_j) or (D_k, C_l), that is, either we want to keep a cat and eliminate a dog or the other way around. Any two constraints of type (C, D) can not violate each other, and the same goes for any two constraints of type (D, C). That gives us the sides of the bipartite graph - one side being all constraints of type (C, D) and the other all constraints of type (D, C). We want to find a maximal independent set in that graph. I won't go into details here, but that can be reduced into finding a minimal vertex cover, which can be solved by finding a maximal matching, which can be solved in time O(|E|\sqrt(|V|)) in a bipartite graph.

I hope that I got it right.