Facebook Interview Question
Software Engineer
Country: United States
Interview Type: In-Person
Good explanation, but I think you could still use the same approach with linked lists for addition as well - that gives you better performance than m*n. The result will be a sparse matrix too: for result row i you get a new linked list of length at most A[i].Length (length of the linked list for row i in matrix A) + B[i].Length (length of the linked list for row i in matrix B).
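A minimal sketch of that merge-based addition in Python, assuming each row is kept as a sorted list of (column, value) pairs (the representation and function name here are illustrative, not part of the original code):

```python
def add_sparse_rows(row_a, row_b):
    """Merge two sorted (col, value) lists in O(len(row_a) + len(row_b))."""
    result = []
    i = j = 0
    while i < len(row_a) and j < len(row_b):
        ca, va = row_a[i]
        cb, vb = row_b[j]
        if ca == cb:
            s = va + vb
            if s != 0:          # drop entries that cancel out
                result.append((ca, s))
            i += 1
            j += 1
        elif ca < cb:
            result.append((ca, va))
            i += 1
        else:
            result.append((cb, vb))
            j += 1
    # One of these is empty; append the leftovers of the other.
    result.extend(row_a[i:])
    result.extend(row_b[j:])
    return result
```

The result list is never longer than len(row_a) + len(row_b), matching the bound above.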
I decided to go with two arrays of linked lists per matrix: one for rows and one for columns.
I tested the code and it seems to work. I'm posting only the interesting part; you can derive the rest.
This is C#.
static Matrix dotProduct(Matrix a, Matrix b)
{
    if (a.nCl != b.nRows)
        throw new ArgumentException();
    Matrix res = new Matrix(a.nRows, b.nCl);
    for (int r = 0; r < a.nRows; r++)
    {
        for (int c = 0; c < b.nCl; c++)
        {
            var row = a.rows[r];    // sorted nonzero elements of row r in a
            var cl = b.columns[c];  // sorted nonzero elements of column c in b
            if (row == null || cl == null)
                continue;
            var rowIt = row.GetEnumerator();
            var clIt = cl.GetEnumerator();
            var rNext = rowIt.MoveNext() ? rowIt.Current : null;
            var cNext = clIt.MoveNext() ? clIt.Current : null;
            int x = 0;
            // Merge the two sorted lists, multiplying where the indices match.
            while (rNext != null && cNext != null)
            {
                if (rNext.cl == cNext.row)
                {
                    x += rNext.v * cNext.v;
                    rNext = rowIt.MoveNext() ? rowIt.Current : null;
                    cNext = clIt.MoveNext() ? clIt.Current : null;
                }
                else if (rNext.cl < cNext.row)
                {
                    rNext = rowIt.MoveNext() ? rowIt.Current : null;
                }
                else
                {
                    cNext = clIt.MoveNext() ? clIt.Current : null;
                }
            }
            if (x != 0)
            {
                // Link the nonzero result into both the row and column lists.
                if (res.rows[r] == null) res.rows[r] = new LinkedList<El>();
                if (res.columns[c] == null) res.columns[c] = new LinkedList<El>();
                var el = new El() { cl = c, row = r, v = x };
                res.rows[r].AddLast(el);
                res.columns[c].AddLast(el);
            }
        }
    }
    return res;
}
class SparseMatrix:
    def __init__(self, matrix):
        self.num_rows = len(matrix)
        self.num_columns = len(matrix[0])
        self.values = dict()
        for i in range(self.num_rows):
            for j in range(self.num_columns):
                if matrix[i][j] != 0:  # store nonzero entries only
                    self.values[(i, j)] = matrix[i][j]

    def shape(self):
        return (self.num_rows, self.num_columns)

    def dot(self, other):
        if self.shape() != other.shape():
            raise ValueError("Matrices must have the same shape")
        ans = 0
        # Iterate over the smaller dictionary of nonzeros.
        small, large = sorted([self.values, other.values], key=len)
        for key, v in small.items():
            if key in large:
                ans += v * large[key]
        return ans
For these types of questions, it's usually safe to assume that "sparse" implies few entries N relative to a very large dimensionality M. For example, N=100 values in a sparse matrix of size M=2^50 means that direct array storage is completely out of the question.
The related (separate) question involves dot-product computation (a linear operation), which hints at what the running time should be. Regardless, the answer to this question - as it stands - is really any structure whose space grows with the number of stored entries, independently of the matrix dimensions.
We can use one of the following:
1. Table representation
total#rows total#columns total#nonZeroValues // header, the first row of the table
// the rows below hold the actual values
// if a matrix row has two nonzero values, there will be two table rows for it
2. LinkedList representation
// this is a bit more complex than the first one
// it has three node types: HeaderNode, RowNode and ColumnNode
// HeaderNode -> total#rows, total#columns, pointer to the next row
// RowNode -> row#, pointerToNextRow, pointerToNextColumn
// ColumnNode -> column#, value, pointerToNextColumn (if a row has more than one nonzero value)
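A minimal Python sketch of representation 1 (the triplet table); the function name and exact layout are illustrative assumptions:

```python
def to_triplet_table(matrix):
    """Build the table form: a header (#rows, #cols, #nonzeros)
    followed by one (row, col, value) triplet per nonzero entry."""
    rows, cols = len(matrix), len(matrix[0])
    table = [(rows, cols, 0)]  # header; nonzero count patched in below
    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] != 0:
                table.append((r, c, matrix[r][c]))
    table[0] = (rows, cols, len(table) - 1)
    return table
```

Space is O(1 + #nonzeros), independent of the matrix dimensions.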
A data structure is driven by the operations one wants to perform on it. The question screams for clarification of these operations.
- Chris November 06, 2016
Since it's sparse, a pure array is probably not the solution, although in practice it depends on the expected size of the matrix (average, max) and the number of matrices to be stored.
For example, if we have to deal with 3 sparse matrices of size 10x10 with on average 2 nonzero elements each, the best representation I can think of is a two-dimensional array. If we need to keep tens or hundreds of thousands of such 10x10 matrices in memory, or if a single matrix has large dimensions, say 100'000 x 100'000, this representation may not be good anymore.
Thinking about the operations one may want to perform on such matrices, like addition and multiplication: we do not require random access to single elements, but rather sequential access for addition, and random access to a whole row/column (whose elements are then accessed sequentially) for multiplication.
In the case of addition we need m*n operations anyway, be it to add the elements or to initialize the result with 0.
For multiplication things are different: an m x n times n x p matrix product results in an m x p matrix. So if we have to access every element, we end up with O(m*n + n*p) just to read the inputs, compared to the sparse-optimal O(n*p), assuming for simplicity that the number of nonzero elements in both matrices is smaller than n*p.
It seems a linked list per row and per column may be a good choice for addition and multiplication. Each node would store the value and column index for a row list, or the value and row index for a column list.
This is slow for inserting/deleting and for random reads of single elements. Attention also needs to be paid to construction (how to fill the data structure initially). If we want to speed up random (read) access, we could maintain a hashmap on top. If insert/delete of single elements needs optimization, while still providing reasonable access to all elements of a column (in row order) and vice versa for a row, the linked lists could be replaced by balanced search trees.
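As a rough illustration of the hashmap-on-top idea, here is a sketch that keeps sorted per-row lists for sequential access plus a dict index for O(1) random reads. The class and member names are assumptions made for this example:

```python
class IndexedSparseMatrix:
    """Sorted (col, value) lists per row for sequential traversal,
    plus a (row, col) -> value dict for O(1) random reads."""

    def __init__(self, triplets, num_rows, num_cols):
        self.shape = (num_rows, num_cols)
        self.rows = [[] for _ in range(num_rows)]
        self.index = {}
        # Sorting the triplets keeps each row list ordered by column.
        for r, c, v in sorted(triplets):
            if v != 0:
                self.rows[r].append((c, v))
                self.index[(r, c)] = v

    def get(self, r, c):
        """Random read: O(1) via the hashmap, zero if entry is absent."""
        return self.index.get((r, c), 0)
```

The dict roughly doubles the memory per nonzero entry; that is the price of the fast random reads.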