General Programming » Algorithms » Data Structures » Creating a Sparse Matrix in .NET |

A sparse matrix is a data structure that works like a two-dimensional array but uses far less memory. It is ideally suited for situations where you need to represent a grid, but most cells in the grid will be empty.

For example, consider a spreadsheet with 1,000,000 rows and 1,000,000 columns. To represent the data for such a spreadsheet using a two-dimensional array, the array would need to store 1,000,000,000,000 elements! When you consider that most spreadsheets use only a tiny fraction of the available cells, this represents a huge waste of memory. A two-dimensional array uses memory for all cells, whether they are used or not.

Enter the sparse matrix. This is a data structure that stores grid-like data, while using far less memory when some cells are empty.

There are many different ways to implement a sparse matrix. Some methods favor fast navigation through rows and columns at the expense of using more memory. No one method is always the best, although some methods are better choices given a specific set of requirements.

In .NET, you can take advantage of the many collection classes available. I chose the Dictionary<> class to implement my sparse matrix. Dictionaries provide a fast look up given the key for the value you want to find.

At the root of my sparse matrix is a dictionary to holds all the rows. Only rows that contain non-empty cells are represented. If the matrix contains no elements, my root dictionary is also empty.

Each element in my rows dictionary holds another dictionary for all the columns in that row. Again, a column entry only exists when that row and column contain non-empty cells. The columns dictionary contains a collection of the actual grid items.

As you might guess, the key used in the row dictionary is the row number, and the key used in the columns dictionary is the column number.

Listing 1 shows my SparseMatrix class. It is implemented as a generic class, which allows you to specify the data type stored in each cell.

When creating an instance of the SparseMatrix class, you don't need to specify the dimensions of the two-dimensional grid. Just declare an instance of the class, and start setting values at any row/column position.

The class overrides the square brackets operator, making it easy to treat a SpareMatrix instance like a two-dimensional array. For example, you could set the value at row 100 and column 200 using matrix[100, 200] = value. And you can read values using the same syntax.

When you read a cell value from a cell that is empty, the class returns the default value for the cell data type. For example, if the data type is String or a class type, it would return null.

Note that this can cause problems for simple data types like integers. The default value for an integer is 0. So you wouldn't be able to distinguish between an empty cell and a cell with the value of 0 if you declared the cell type as something like an integer. Keep this in mind if you use the class. However, in most cases, your data type will probably be some type of class.

**Listing 1: The SparseMatrix Class**

class SparseMatrix<T> { // Master dictionary hold rows of column dictionary protected Dictionary<int, Dictionary<int, T>> _rows; /// <summary> /// Constructs a SparseMatrix instance. /// </summary> public SparseMatrix() { _rows = new Dictionary<int, Dictionary<int, T>>(); } /// <summary> /// Gets or sets the value at the specified matrix position. /// </summary> /// <param name="row">Matrix row</param> /// <param name="col">Matrix column</param> public T this[int row, int col] { get { return GetAt(row, col); } set { SetAt(row, col, value); } } /// <summary> /// Gets the value at the specified matrix position. /// </summary> /// <param name="row">Matrix row</param> /// <param name="col">Matrix column</param> /// <returns>Value at the specified position</returns> public T GetAt(int row, int col) { Dictionary<int, T> cols; if (_rows.TryGetValue(row, out cols)) { T value = default(T); if (cols.TryGetValue(col, out value)) return value; } return default(T); } /// <summary> /// Sets the value at the specified matrix position. /// </summary> /// <param name="row">Matrix row</param> /// <param name="col">Matrix column</param> /// <param name="value">New value</param> public void SetAt(int row, int col, T value) { if (EqualityComparer<T>.Default.Equals(value, default(T))) { // Remove any existing object if value is default(T) RemoveAt(row, col); } else { // Set value Dictionary<int, T> cols; if (!_rows.TryGetValue(row, out cols)) { cols = new Dictionary<int, T>(); _rows.Add(row, cols); } cols[col] = value; } } /// <summary> /// Removes the value at the specified matrix position. /// </summary> /// <param name="row">Matrix row</param> /// <param name="col">Matrix column</param> public void RemoveAt(int row, int col) { Dictionary<int, T> cols; if (_rows.TryGetValue(row, out cols)) { // Remove column from this row cols.Remove(col); // Remove entire row if empty if (cols.Count == 0) _rows.Remove(row); } } /// <summary> /// Returns all items in the specified row. /// </summary> /// <param name="row">Matrix row</param> public IEnumerable<T> GetRowData(int row) { Dictionary<int, T> cols; if (_rows.TryGetValue(row, out cols)) { foreach (KeyValuePair<int, T> pair in cols) { yield return pair.Value; } } } /// <summary> /// Returns the number of items in the specified row. /// </summary> /// <param name="row">Matrix row</param> public int GetRowDataCount(int row) { Dictionary<int, T> cols; if (_rows.TryGetValue(row, out cols)) { return cols.Count; } return 0; } /// <summary> /// Returns all items in the specified column. /// This method is less efficent than GetRowData(). /// </summary> /// <param name="col">Matrix column</param> /// <returns></returns> public IEnumerable<T> GetColumnData(int col) { foreach (KeyValuePair<int, Dictionary<int, T>> rowdata in _rows) { T result; if (rowdata.Value.TryGetValue(col, out result)) yield return result; } } /// <summary> /// Returns the number of items in the specified column. /// This method is less efficent than GetRowDataCount(). /// </summary> /// <param name="col">Matrix column</param> public int GetColumnDataCount(int col) { int result = 0; foreach (KeyValuePair<int, Dictionary<int, T>> cols in _rows) { if (cols.Value.ContainsKey(col)) result++; } return result; } }

As stated, you don't need to set the dimensions of the matrix. Just declare an instance and start writing and reading cell values.

To delete a cell, just set it to the default value for the data type. Alternatively, you could also use the RemoveAt() method.

I also added a couple of helper methods for examining specific rows or columns of data. Note the comments in the source code that indicate the row helper methods are much more efficient than the column helper methods because of the way the class is implemented.

The attached test project simply allows you to specify row and column values. The program will display the current value at that location. You can click the edit button to change the value. It's very trivial, but it demonstrates how to use the SparseMatrix class.

This class would be more work without the .NET collection classes. As it stands, it's a very minor class. However, it could be quite useful if you have the need to represent a sparsely populated grid.

Use of this article and any related source code or other files is governed by the terms and conditions of The Code Project Open License.