A Simple CSS Parser

By Jonathan Wood on 8/22/2013
Language: C#
Technology: .NET
Platform: Windows
License: CPOL

[Figure: CSS Parser Test Program]

Download Source Code and Test Project

Introduction

I recently encountered a question on Stack Overflow about parsing CSS. Although I've written a bunch of parsing routines, I'd never thought to parse CSS because I'd never had a reason to do so. Still, I had a bit of time and thought it might be interesting to put something together.

While basic CSS is pretty straightforward, it is a constantly evolving syntax with some less frequently used keywords (such as @media and @import). It can be embedded in style sections within an HTML file, or it can be in a stand-alone file. I wanted to keep my CSS parsing code as simple as possible, so it assumes a string that contains a valid block of CSS. If that string comes from a style section within an HTML file, then the caller is responsible for extracting just the CSS.

My CssParser Class

Listing 1 shows my CssParser class. The class derives from my TextParser class, which provides some of the low-level parsing logic.
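The TextParser base class is not reproduced in this article. As a rough guide to what Listing 1 expects from it, here is a minimal stand-in inferred only from the members the listing calls (Reset, Peek, MoveAhead, MoveTo, MovePastWhitespace, Extract, Position and EndOfText). The class name TextParserSketch and every implementation detail are assumptions; the real TextParser may behave differently.

```csharp
using System;

// Hypothetical stand-in for TextParser, inferred from how Listing 1 uses it.
// This is an assumption, not the original class.
public class TextParserSketch
{
    public const char NullChar = '\0';

    private string _text = String.Empty;

    public int Position { get; private set; }

    public bool EndOfText
    {
        get { return Position >= _text.Length; }
    }

    // Restart at the beginning of the current text.
    public void Reset()
    {
        Position = 0;
    }

    // Replace the text and restart at the beginning.
    public void Reset(string text)
    {
        _text = text ?? String.Empty;
        Position = 0;
    }

    // Character at the current position, or NullChar past the end.
    public char Peek()
    {
        return Peek(0);
    }

    public char Peek(int ahead)
    {
        int pos = Position + ahead;
        return (pos >= 0 && pos < _text.Length) ? _text[pos] : NullChar;
    }

    public void MoveAhead()
    {
        MoveAhead(1);
    }

    // Advance, clamping at the end of the text.
    public void MoveAhead(int count)
    {
        Position = Math.Min(Position + count, _text.Length);
    }

    // Move to the next occurrence of c, or to the end if there is none.
    public void MoveTo(char c)
    {
        int i = _text.IndexOf(c, Position);
        Position = (i < 0) ? _text.Length : i;
    }

    // Move to the next occurrence of s, or to the end if there is none.
    public void MoveTo(string s, bool ignoreCase)
    {
        StringComparison cmp = ignoreCase
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;
        int i = _text.IndexOf(s, Position, cmp);
        Position = (i < 0) ? _text.Length : i;
    }

    public void MovePastWhitespace()
    {
        while (Char.IsWhiteSpace(Peek()))
            MoveAhead();
    }

    // Text from start (inclusive) to end (exclusive).
    public string Extract(int start, int end)
    {
        return _text.Substring(start, end - start);
    }
}
```

Returning a null character from Peek() at the end of the text is what lets loops such as the one in MovePastWhitespace() terminate without a separate bounds check.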

The code parses a block of CSS via the ParseAll() method. This method begins by stripping out all comments. Initially, I was scanning for comments within each of the parsing routines. Stripping out all the comments before anything else made the parsing much simpler.

The code then enters a loop that continues until the end of the text is reached. The loop starts by skipping over any whitespace. Next, it checks for an "@-rule".
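The comment-stripping pass can be illustrated without the TextParser plumbing. The following is a minimal sketch, not the code from Listing 1: the CssCommentStripper class and its Strip method are hypothetical names. It copies quoted strings through untouched so that a comment marker inside quotes survives. Like SkipOverQuote() in Listing 1, it does not handle backslash-escaped quotes.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class CssCommentStripper
{
    // Removes /* ... */ comments, leaving quoted strings intact.
    public static string Strip(string css)
    {
        StringBuilder sb = new StringBuilder(css.Length);
        int i = 0;
        while (i < css.Length)
        {
            char c = css[i];
            if (c == '/' && i + 1 < css.Length && css[i + 1] == '*')
            {
                // Skip to just past the closing marker, or to the end
                // of the text if the comment is never closed.
                int close = css.IndexOf("*/", i + 2, StringComparison.Ordinal);
                i = (close < 0) ? css.Length : close + 2;
            }
            else if (c == '"' || c == '\'')
            {
                // Copy the quoted run, including both quote characters.
                int end = css.IndexOf(c, i + 1);
                end = (end < 0) ? css.Length : end + 1;
                sb.Append(css, i, end - i);
                i = end;
            }
            else
            {
                sb.Append(c);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

For example, Strip("a{/* x */color:red}") yields "a{color:red}", while a value such as content:"/* keep */" is left as it is.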

While I did want to keep the parser as simple as possible, I decided to support the @media rule. This rule is followed by a block of CSS within curly braces. All rules within those curly braces are specific to the specified media. So I added a Media field to my data, and simply set that when parsing rules within an @media block. All other "@-rules" will throw a NotSupportedException.

In the case of an @media rule, the code parses out the media types and then extracts the media block within the curly braces. It then creates a new instance of the CssParser class and uses it to parse that block. The media types are passed to this new instance so that each rule within the block can be given a reference to those media types. The code then appends the rules returned to its own list of rules.
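Extracting the media block means finding the closing brace that matches the opening one, because the block itself contains rules with their own braces. Listing 1 does this in SkipOverBlock() with a depth counter. The same idea on a plain string might look like the sketch below, where BlockExtractor and ExtractInner are hypothetical names. For brevity, and unlike SkipOverBlock(), it ignores braces that appear inside quoted strings.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class BlockExtractor
{
    // Returns the text between the brace at openIndex and its matching
    // closing brace. 'next' receives the index just past that closing
    // brace (or the text length if the block is never closed).
    public static string ExtractInner(string text, int openIndex, out int next)
    {
        if (openIndex < 0 || openIndex >= text.Length || text[openIndex] != '{')
            throw new ArgumentException("openIndex must point at '{'");

        int depth = 1;
        int i = openIndex + 1;
        while (i < text.Length && depth > 0)
        {
            if (text[i] == '{')
                depth++;
            else if (text[i] == '}')
                depth--;
            i++;
        }
        next = i;

        // Exclude the closing brace only if the block was actually closed.
        int innerEnd = (depth == 0) ? i - 1 : i;
        return text.Substring(openIndex + 1, innerEnd - (openIndex + 1));
    }
}
```

The inner text returned here corresponds to what Listing 1 hands to a second CssParser instance, and the text before the opening brace (after the @media keyword) corresponds to the media types.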

In the case where an "@-rule" is not found (the usual case), the code scans forward for the next open curly brace ({). If one is found, the text that came before it is assumed to be a comma-delimited list of selectors, which are parsed out. Next, the code scans for the next closing curly brace (}). It then parses all the properties within those curly braces.
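The selector step amounts to splitting on commas, trimming each piece and dropping empty ones. Here is a minimal sketch of that step, with SelectorSplitter as a hypothetical name. It calls ToList() so the result is computed once, whereas Listing 1 assigns a deferred LINQ query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class SelectorSplitter
{
    // Splits the text that precedes '{' into individual selectors.
    public static List<string> Split(string selectorText)
    {
        return selectorText.Split(',')
            .Select(s => s.Trim())      // remove surrounding whitespace
            .Where(s => s.Length > 0)   // ignore empty entries
            .ToList();
    }
}
```

A plain split like this would also break apart selectors that legitimately contain a comma, such as one using :not(a, b), which is one of the trade-offs of keeping the parser simple.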

This parsing is pretty straightforward. Each property/value pair is delimited by a semicolon (;), and within each pair, the property is delimited from the value by a colon (:).
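As an illustration of that rule, the sketch below splits a declaration block on semicolons and then splits each entry at its first colon. DeclarationSplitter is a hypothetical name. It also makes one choice that differs from Listing 1: an entry with no colon is skipped here, whereas Listing 1 keeps it with an empty property name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class DeclarationSplitter
{
    // Splits the text between '{' and '}' into property/value pairs.
    public static List<KeyValuePair<string, string>> Split(string block)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (string part in block.Split(';'))
        {
            string s = part.Trim();
            if (s.Length == 0)
                continue;   // empty entry, e.g. after a trailing semicolon

            // Split at the FIRST colon so values that contain colons
            // (such as URLs) stay in one piece.
            int colon = s.IndexOf(':');
            if (colon < 0)
                continue;   // assumption: ignore entries without a colon

            result.Add(new KeyValuePair<string, string>(
                s.Substring(0, colon).TrimEnd(),
                s.Substring(colon + 1).TrimStart()));
        }
        return result;
    }
}
```

Because the split on semicolons happens first, a value that itself contains a semicolon (a data URI, for instance) would be cut in two. Handling that would require scanning the block character by character instead.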

Listing 1: The CssParser Class

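using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

// Note: TextParser, the base class of CssParser, is not shown in this listing.

/// <summary>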
/// Class to hold information for a single CSS declaration.
/// </summary>
public class CssParserDeclaration
{
    public string Property { get; set; }
    public string Value { get; set; }
}

/// <summary>
/// Class to hold information for a single CSS rule.
/// </summary>
public class CssParserRule
{
    public CssParserRule(string media)
    {
        Selectors = new List<string>();
        Declarations = new List<CssParserDeclaration>();
        Media = media;
    }

    public string Media { get; set; }
    public IEnumerable<string> Selectors { get; set; }
    public IEnumerable<CssParserDeclaration> Declarations { get; set; }
}

/// <summary>
/// Class to parse CSS text into data structures.
/// </summary>
public class CssParser : TextParser
{
    protected const string OpenComment = "/*";
    protected const string CloseComment = "*/";

    private string _media;

    public CssParser(string media = null)
    {
        _media = media;
    }

    public IEnumerable<CssParserRule> ParseAll(string css)
    {
        int start;

        Reset(css);
        StripAllComments();

        List<CssParserRule> rules = new List<CssParserRule>();

        while (!EndOfText)
        {
            MovePastWhitespace();

            if (Peek() == '@')
            {
                // Process "at-rule"
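                // Use the invariant culture so the comparison below does not
                // depend on the current culture's casing rules
                string atRule = ExtractSkippedText(MoveToWhiteSpace).ToLowerInvariant();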
                if (atRule == "@media")
                {
                    start = Position;
                    MoveTo('{');
                    string newMedia = Extract(start, Position).Trim();

                    // Parse contents of media block
                    string innerBlock = ExtractSkippedText(() => SkipOverBlock('{', '}'));

                    // Trim curly braces
                    if (innerBlock.StartsWith("{"))
                        innerBlock = innerBlock.Remove(0, 1);
                    if (innerBlock.EndsWith("}"))
                        innerBlock = innerBlock.Substring(0, innerBlock.Length - 1);

                    // Parse CSS in block
                    CssParser parser = new CssParser(newMedia);
                    rules.AddRange(parser.ParseAll(innerBlock));

                    continue;
                }
                else throw new NotSupportedException(String.Format("{0} rule is unsupported", atRule));
            }

            // Find start of next declaration block
            start = Position;
            MoveTo('{');
            if (EndOfText) // Done if no more
                break;

            // Parse selectors
            string selectors = Extract(start, Position);
            CssParserRule rule = new CssParserRule(_media);
            rule.Selectors = from s in selectors.Split(',')
                                let s2 = s.Trim()
                                where s2.Length > 0
                                select s2;

            // Parse declarations
            MoveAhead();
            start = Position;
            MoveTo('}');
            string properties = Extract(start, Position);
            rule.Declarations = from s in properties.Split(';')
                                let s2 = s.Trim()
                                where s2.Length > 0
                                let x = s2.IndexOf(':')
                                select new CssParserDeclaration
                                {
                                    Property = s2.Substring(0, (x < 0) ? 0 : x).TrimEnd(),
                                    Value = s2.Substring((x < 0) ? 0 : x + 1).TrimStart()
                                };

            // Skip over closing curly brace
            MoveAhead();

            // Add rule to results
            rules.Add(rule);
        }
        // Return rules to caller
        return rules;
    }

    /// <summary>
    /// Removes all comments from the current text.
    /// </summary>
    protected void StripAllComments()
    {
        StringBuilder sb = new StringBuilder();

        Reset();
        while (!EndOfText)
        {
            if (IsComment())
            {
                SkipOverComment();
            }
            else if (IsQuote())
            {
                sb.Append(ExtractSkippedText(SkipOverQuote));
            }
            else
            {
                sb.Append(Peek());
                MoveAhead();
            }
        }
        Reset(sb.ToString());
    }

    /// <summary>
    /// Moves to the next occurrence of the specified character, skipping
    /// over quoted values.
    /// </summary>
    /// <param name="c">Character to find</param>
    public new void MoveTo(char c)
    {
        while (Peek() != c && !EndOfText)
        {
            if (IsQuote())
                SkipOverQuote();
            else
                MoveAhead();
        }
    }

    /// <summary>
    /// Moves to the next whitespace character.
    /// </summary>
    private void MoveToWhiteSpace()
    {
        while (!Char.IsWhiteSpace(Peek()) && !EndOfText)
            MoveAhead();
    }

    /// <summary>
    /// Skips over the quoted text that starts at the current position.
    /// </summary>
    protected void SkipOverQuote()
    {
        Debug.Assert(IsQuote());
        char quote = Peek();
        MoveAhead();
        while (Peek() != quote && !EndOfText)
            MoveAhead();
        MoveAhead();
    }

    /// <summary>
    /// Skips over the comment that starts at the current position.
    /// </summary>
    protected void SkipOverComment()
    {
        Debug.Assert(IsComment());
        MoveAhead(OpenComment.Length);
        MoveTo(CloseComment, true);
        MoveAhead(CloseComment.Length);
    }

    /// <summary>
    /// Skips over a block of text bounded by the specified start and end
    /// character. Blocks may be nested, in which case the endChar of
    /// inner blocks is ignored (the entire outer block is returned).
    /// Sets the current position to just after the final end character.
    /// </summary>
    /// <param name="startChar">Character that opens a block</param>
    /// <param name="endChar">Character that closes a block</param>
    private void SkipOverBlock(char startChar, char endChar)
    {
        Debug.Assert(Peek() == startChar);
        MoveAhead();
        int depth = 1;
        while (depth > 0 && !EndOfText)
        {
            if (IsQuote())
            {
                SkipOverQuote();
            }
            else
            {
                if (Peek() == startChar)
                    depth++;
                else if (Peek() == endChar)
                    depth--;
                MoveAhead();
            }
        }
    }

    /// <summary>
    /// Calls the specified action and then returns a string of all characters
    /// that the method skipped over.
    /// </summary>
    /// <param name="a">Action to call</param>
    /// <returns>The text between the starting and ending positions</returns>
    protected string ExtractSkippedText(Action a)
    {
        int start = Position;
        a();
        return Extract(start, Position);
    }

    /// <summary>
    /// Indicates if single or double-quoted text begins at the current
    /// location.
    /// </summary>
    protected bool IsQuote()
    {
        return (Peek() == '\'' || Peek() == '"');
    }

    /// <summary>
    /// Indicates if a comment begins at the current location.
    /// </summary>
    protected bool IsComment()
    {
        return IsEqualTo(OpenComment);
    }

    /// <summary>
    /// Determines if text at the current position matches the specified string.
    /// </summary>
    /// <param name="s">String to compare against current position</param>
    protected bool IsEqualTo(string s)
    {
        Debug.Assert(!String.IsNullOrEmpty(s));
        for (int i = 0; i < s.Length; i++)
        {
            if (Peek(i) != s[i])
                return false;
        }
        return true;
    }
}

The ParseAll() method returns a list of CssParserRule objects, which describe the CSS that was parsed.
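One way to consume such a list is to walk it and write each rule back out as text. The sketch below is not the test program from the download. RuleSketch and DeclarationSketch are hypothetical stand-ins that mirror the shape of CssParserRule and CssParserDeclaration so the example compiles on its own, and the output format is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Stand-ins mirroring the classes in Listing 1 (assumption, for illustration).
public class DeclarationSketch
{
    public string Property { get; set; }
    public string Value { get; set; }
}

public class RuleSketch
{
    public string Media { get; set; }
    public List<string> Selectors { get; set; }
    public List<DeclarationSketch> Declarations { get; set; }
}

public static class CssRenderer
{
    // Writes one line of CSS per rule, wrapping rules that carry
    // media types in their own @media block.
    public static string Render(IEnumerable<RuleSketch> rules)
    {
        StringBuilder sb = new StringBuilder();
        foreach (RuleSketch rule in rules)
        {
            bool inMedia = !String.IsNullOrEmpty(rule.Media);
            if (inMedia)
                sb.Append("@media ").Append(rule.Media).Append(" { ");

            sb.Append(String.Join(", ", rule.Selectors)).Append(" { ");
            foreach (DeclarationSketch d in rule.Declarations)
                sb.Append(d.Property).Append(": ").Append(d.Value).Append("; ");
            sb.Append("}");

            if (inMedia)
                sb.Append(" }");
            sb.Append("\n");
        }
        return sb.ToString();
    }
}
```

Since each rule stores its media types individually, this sketch emits a separate @media wrapper per rule instead of regrouping rules that originally shared one block.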

Conclusion

The attached download includes the parsing code and a simple test program to test the parser. All the test program does is take a block of CSS, parse it, re-render it to a string, and copy that string to the clipboard. About the only value that provides is to test the parser. But you can use that as a starting point for your own application.

Perhaps I will return to this code to add support for some of the other keywords. But for now, it was just a fun exercise.

End-User License

Use of this article and any related source code or other files is governed by the terms and conditions of The Code Project Open License.

Author Information

Jonathan Wood

I'm a software/website developer working out of the greater Salt Lake City area in Utah. I've developed many websites including Black Belt Coder, Insider Articles, and others.