Login


Reading and Writing CSV Files in C#

By Jonathan Wood on 12/14/2010 (Updated on 3/26/2013)
Language: C#
Technology: .NET
Platform: Windows
License: CPOL
Views: 341,278
Desktop Development » Files & Directories » File Formats » Reading and Writing CSV Files in C#

Screenshot of CSV File Test Project

Download Test Project Download Test Project

Introduction

A common requirement is to have applications share data with other programs. Although there are interfaces available to work with, for example, Microsoft Excel data files, this approach is generally complex, involves a fair amount of overhead, and requires that support libraries accompany your application.

Note that the code below is a complete rewrite of the code presented when this article was first published. I decided to make the code more robust and add a number of new features that include support for multi-line values and the ability to change the characters used for delimiters and quotes. I also added several options to control how the CSV reader class handles empty lines.

Comma-Separated Values (CSV) Files

A much simpler way to have your application share data is by reading and writing Comma-Separated Values (CSV) files. CSV files can easily be read and written by many programs, including Microsoft Excel.

For the most part, reading and writing CSV files is trivial. As the name suggestions, a CSV file is simply a plain text file that contains one or more values per line, separated by commas. Each value is a field (or column in a spreadsheet), and each line is a record (or row in a spreadsheet).

However, there is slightly more work involved. Double quotes are used to wrap values that contain special characters such as commas, double quotes, new lines, etc. This is required to prevent those special characters from being interpreted as value delimiters, etc. In addition, double quotes that exist within a double quoted value must appear as two double quote characters together to distinguish them from the double quote character at the end of the value.

So this seems like a perfect task for a handy little C# class. Listing 1 shows my CsvFileReader and CsvFileWriter classes.

Listing 1: CsvFileReader and CsvFileWriter Classes

/// <summary>
/// Determines how empty lines are interpreted when reading CSV files.
/// These values do not affect empty lines that occur within quoted fields
/// or empty lines that appear at the end of the input file.
/// </summary>
public enum EmptyLineBehavior
{
    /// <summary>
    /// Empty lines are interpreted as a line with zero columns.
    /// </summary>
    NoColumns,
    /// <summary>
    /// Empty lines are interpreted as a line with a single empty column.
    /// </summary>
    EmptyColumn,
    /// <summary>
    /// Empty lines are skipped over as though they did not exist.
    /// </summary>
    Ignore,
    /// <summary>
    /// An empty line is interpreted as the end of the input file.
    /// </summary>
    EndOfFile,
}

/// <summary>
/// Common base class for CSV reader and writer classes.
/// </summary>
public abstract class CsvFileCommon
{
    /// <summary>
    /// These are special characters in CSV files. If a column contains any
    /// of these characters, the entire column is wrapped in double quotes.
    /// </summary>
    protected char[] SpecialChars = new char[] { ',', '"', '\r', '\n' };

    // Indexes into SpecialChars for characters with specific meaning
    private const int DelimiterIndex = 0;
    private const int QuoteIndex = 1;

    /// <summary>
    /// Gets/sets the character used for column delimiters.
    /// </summary>
    public char Delimiter
    {
        get { return SpecialChars[DelimiterIndex]; }
        set { SpecialChars[DelimiterIndex] = value; }
    }

    /// <summary>
    /// Gets/sets the character used for column quotes.
    /// </summary>
    public char Quote
    {
        get { return SpecialChars[QuoteIndex]; }
        set { SpecialChars[QuoteIndex] = value; }
    }
}

/// <summary>
/// Class for reading from comma-separated-value (CSV) files
/// </summary>
public class CsvFileReader : CsvFileCommon, IDisposable
{
    // Private members
    private StreamReader Reader;
    private string CurrLine;
    private int CurrPos;
    private EmptyLineBehavior EmptyLineBehavior;

    /// <summary>
    /// Initializes a new instance of the CsvFileReader class for the
    /// specified stream.
    /// </summary>
    /// <param name="stream">The stream to read from</param>
    /// <param name="emptyLineBehavior">Determines how empty lines are handled</param>
    public CsvFileReader(Stream stream,
        EmptyLineBehavior emptyLineBehavior = EmptyLineBehavior.NoColumns)
    {
        Reader = new StreamReader(stream);
        EmptyLineBehavior = emptyLineBehavior;
    }

    /// <summary>
    /// Initializes a new instance of the CsvFileReader class for the
    /// specified file path.
    /// </summary>
    /// <param name="path">The name of the CSV file to read from</param>
    /// <param name="emptyLineBehavior">Determines how empty lines are handled</param>
    public CsvFileReader(string path,
        EmptyLineBehavior emptyLineBehavior = EmptyLineBehavior.NoColumns)
    {
        Reader = new StreamReader(path);
        EmptyLineBehavior = emptyLineBehavior;
    }

    /// <summary>
    /// Reads a row of columns from the current CSV file. Returns false if no
    /// more data could be read because the end of the file was reached.
    /// </summary>
    /// <param name="columns">Collection to hold the columns read</param>
    public bool ReadRow(List<string> columns)
    {
        // Verify required argument
        if (columns == null)
            throw new ArgumentNullException("columns");

    ReadNextLine:
        // Read next line from the file
        CurrLine = Reader.ReadLine();
        CurrPos = 0;
        // Test for end of file
        if (CurrLine == null)
            return false;
        // Test for empty line
        if (CurrLine.Length == 0)
        {
            switch (EmptyLineBehavior)
            {
                case EmptyLineBehavior.NoColumns:
                    columns.Clear();
                    return true;
                case EmptyLineBehavior.Ignore:
                    goto ReadNextLine;
                case EmptyLineBehavior.EndOfFile:
                    return false;
            }
        }

        // Parse line
        string column;
        int numColumns = 0;
        while (true)
        {
            // Read next column
            if (CurrPos < CurrLine.Length && CurrLine[CurrPos] == Quote)
                column = ReadQuotedColumn();
            else
                column = ReadUnquotedColumn();
            // Add column to list
            if (numColumns < columns.Count)
                columns[numColumns] = column;
            else
                columns.Add(column);
            numColumns++;
            // Break if we reached the end of the line
            if (CurrLine == null || CurrPos == CurrLine.Length)
                break;
            // Otherwise skip delimiter
            Debug.Assert(CurrLine[CurrPos] == Delimiter);
            CurrPos++;
        }
        // Remove any unused columns from collection
        if (numColumns < columns.Count)
            columns.RemoveRange(numColumns, columns.Count - numColumns);
        // Indicate success
        return true;
    }

    /// <summary>
    /// Reads a quoted column by reading from the current line until a
    /// closing quote is found or the end of the file is reached. On return,
    /// the current position points to the delimiter or the end of the last
    /// line in the file. Note: CurrLine may be set to null on return.
    /// </summary>
    private string ReadQuotedColumn()
    {
        // Skip opening quote character
        Debug.Assert(CurrPos < CurrLine.Length && CurrLine[CurrPos] == Quote);
        CurrPos++;

        // Parse column
        StringBuilder builder = new StringBuilder();
        while (true)
        {
            while (CurrPos == CurrLine.Length)
            {
                // End of line so attempt to read the next line
                CurrLine = Reader.ReadLine();
                CurrPos = 0;
                // Done if we reached the end of the file
                if (CurrLine == null)
                    return builder.ToString();
                // Otherwise, treat as a multi-line field
                builder.Append(Environment.NewLine);
            }

            // Test for quote character
            if (CurrLine[CurrPos] == Quote)
            {
                // If two quotes, skip first and treat second as literal
                int nextPos = (CurrPos + 1);
                if (nextPos < CurrLine.Length && CurrLine[nextPos] == Quote)
                    CurrPos++;
                else
                    break;  // Single quote ends quoted sequence
            }
            // Add current character to the column
            builder.Append(CurrLine[CurrPos++]);
        }

        if (CurrPos < CurrLine.Length)
        {
            // Consume closing quote
            Debug.Assert(CurrLine[CurrPos] == Quote);
            CurrPos++;
            // Append any additional characters appearing before next delimiter
            builder.Append(ReadUnquotedColumn());
        }
        // Return column value
        return builder.ToString();
    }

    /// <summary>
    /// Reads an unquoted column by reading from the current line until a
    /// delimiter is found or the end of the line is reached. On return, the
    /// current position points to the delimiter or the end of the current
    /// line.
    /// </summary>
    private string ReadUnquotedColumn()
    {
        int startPos = CurrPos;
        CurrPos = CurrLine.IndexOf(Delimiter, CurrPos);
        if (CurrPos == -1)
            CurrPos = CurrLine.Length;
        if (CurrPos > startPos)
            return CurrLine.Substring(startPos, CurrPos - startPos);
        return String.Empty;
    }

    // Propagate Dispose to StreamReader
    public void Dispose()
    {
        Reader.Dispose();
    }
}

/// <summary>
/// Class for writing to comma-separated-value (CSV) files.
/// </summary>
public class CsvFileWriter : CsvFileCommon, IDisposable
{
    // Private members
    private StreamWriter Writer;
    private string OneQuote = null;
    private string TwoQuotes = null;
    private string QuotedFormat = null;

    /// <summary>
    /// Initializes a new instance of the CsvFileWriter class for the
    /// specified stream.
    /// </summary>
    /// <param name="stream">The stream to write to</param>
    public CsvFileWriter(Stream stream)
    {
        Writer = new StreamWriter(stream);
    }

    /// <summary>
    /// Initializes a new instance of the CsvFileWriter class for the
    /// specified file path.
    /// </summary>
    /// <param name="path">The name of the CSV file to write to</param>
    public CsvFileWriter(string path)
    {
        Writer = new StreamWriter(path);
    }

    /// <summary>
    /// Writes a row of columns to the current CSV file.
    /// </summary>
    /// <param name="columns">The list of columns to write</param>
    public void WriteRow(List<string> columns)
    {
        // Verify required argument
        if (columns == null)
            throw new ArgumentNullException("columns");

        // Ensure we're using current quote character
        if (OneQuote == null || OneQuote[0] != Quote)
        {
            OneQuote = String.Format("{0}", Quote);
            TwoQuotes = String.Format("{0}{0}", Quote);
            QuotedFormat = String.Format("{0}{{0}}{0}", Quote);
        }

        // Write each column
        for (int i = 0; i < columns.Count; i++)
        {
            // Add delimiter if this isn't the first column
            if (i > 0)
                Writer.Write(Delimiter);
            // Write this column
            if (columns[i].IndexOfAny(SpecialChars) == -1)
                Writer.Write(columns[i]);
            else
                Writer.Write(QuotedFormat, columns[i].Replace(OneQuote, TwoQuotes));
        }
        Writer.WriteLine();
    }

    // Propagate Dispose to StreamWriter
    public void Dispose()
    {
        Writer.Dispose();
    }
}

Because the .NET stream classes generally seem to be split into reading and writing, I decided to follow that same pattern with my CSV code and split it into CsvFileReader and CsvFileWriter. This also simplifies the code because neither class needs to worry about which mode the file is in or protect against the user switching modes.

Starting at the top of the code is the EmptyLineBehavior enum. After careful review, I realized there were a few valid ways to interpret an empty line within a CSV file. So the CsvFileReader class' constructor takes an argument of this type to specify how empty lines should be handled. Note that this does not affect empty lines within a quoted value, or an empty line at the end of the input file.

Next is my CsvFileCommon class. There are a few settings common to both the reader and writer classes and so I use this abstract base class to track the special characters within a CSV value that require the value to be enclosed in quotes. It also provides a way to change the characters used as delimiters and quotes.

The CsvFileReader class comes after that. This is the class that reads data from a CSV file and is the most complex class presented here. I modeled its behavior after how Microsoft Excel interprets CSV files. There are two constructors: one that accepts the name of the input file and another that accepts an input Stream. As mentioned previously, both constructors also accept an EmptyLineBehavior argument to control how empty lines are handled. The ReadRow() method is used to read a single row from the input file and populate a List<string> collection with the values read. For each value, it dispatches the appropriate parsing routine based on whether or not the first character is a quote character.

Finally, the CsvFileWriter class writes data to a CSV file. As with the CsvFileReader class, this class has two constructors. Call the WriteRow() method to write a single row to the target file using a collection of values. Each time WriteRow() is called, it checks to see if the current quote character has changed. If so, it updates the strings used to correctly format quoted output.

Both the reader and writer classes implement IDisposable, which is delegated to the StreamReader or StreamWriter class. This allows you to enclose your use of either class within a using statement to ensure the file is closed in a timely manner

Using the Code

The code was designed to be as easy as possible to use. When you call CsvFileWriter.WriteRow(), you supply a collection of the values to be written to the file. And when you call CsvFileReader.ReadRow(), the collection argument is populated with the values read in.

Listing 2 demonstrates using the classes.

Listing 2: Sample Code to Write and Read CSV files

private void WriteValues()
{
    using (var writer = new CsvFileWriter("WriteTest.csv"))
    {
        // Write each row of data
        for (int row = 0; row < 100; row++)
        {
            // TODO: Populate column values for this row
            List<string> columns = new List<string>();
            writer.WriteRow(columns);
        }
    }
}

private void ReadValues()
{
    List<string> columns = new List<string>();
    using (var reader = new CsvFileReader("ReadTest.csv"))
    {
        while (reader.ReadRow(columns))
        {
            // TODO: Do something with columns' values
        }
    }
}

Using goto Statements

One other thing that I should probably comment on is my use of the goto keyword. In general, using the goto keyword makes the flow of execution harder to see.

For this reason, the use of goto is generally avoided and rarely seen in C# code. In fact, beginning developers are generally told to avoid using goto altogether in an effort to get them thinking about better ways to structure code.

However, some developers seem to take this too far and treat it almost as a religion. Developing software is not a religion. It's about producing the best and most readable code. After trying several different type of loops in the code in question, I concluded that the goto statement was the most efficient and easiest to read. After all, that code is not really a loop. It's just one of many cases where it needs to go back and try something again.

So I may have raised a few eye brows with my use of the goto keyword, but I stand behind the decision.

Conclusion

That's about all there is to it. The attached project includes the code above in a test project that loads CSV files and displays them in a grid. The user can edit the grid and then save the results back to a file.

This code should be helpful for anyone wanting an easy way to share data with Microsoft Excel or any other program that can read or write CSV files.

Update History

12/16/2010: Someone pointed out there were a couple of problems with the code. This update fixes them.

6/26/2011: I received a few questions about how to use the code and so I've added a couple of simple examples.

1/15/2012: Corrected problem with WriteRow() where the first comma was not written if the first column was empty.

10/7/2012: Completely reworked the code to be more robust, flexible and added several new features.

End-User License

Use of this article and any related source code or other files is governed by the terms and conditions of The Code Project Open License.

Author Information

Jonathan Wood

I'm a software/website developer working out of the greater Salt Lake City area in Utah. I've developed many websites including Black Belt Coder, Insider Articles, and others.