A JavaScript Formatter

By Jonathan Wood on 5/22/2011
Language: C#, JavaScript
Technology: .NET
Platform: Windows
License: Ms-PL

Screenshot of Test Project

Download Source Code and Test Project

Introduction

Note: You can try out the code from this article using our online JavaScript Formatter.

Many websites use JavaScript, which runs in the user's browser (client side) to produce a richer browsing experience. While JavaScript can easily be embedded within a web page, larger chunks of script are often placed in a separate file.

Because these files can grow quite large, it is common practice to compact them by removing unneeded whitespace characters. This can significantly reduce the amount of bandwidth required to download them to the user's browser.

Unfortunately, removing unneeded whitespace also makes the code much harder to read. For developers who want to look at the inner workings of websites they didn't create, compacted JavaScript makes the client script far more difficult to understand.

This article will present a C# class to format JavaScript. While it can be used to format any JavaScript, it is probably most useful for those wanting to browse compacted JavaScript source code.

Writing a JavaScript Formatter

Reformatting a large block of JavaScript by hand would be very tedious, so this is a perfect task for the computer. However, let me just say that the logic to implement a JavaScript formatter is not trivial. My code starts by extracting tokens from the input script. It then outputs those tokens along with appropriate whitespace characters.

The difficult part is tracking state information. For example, if an open parenthesis follows a plus sign, a space should separate the two. But if it follows a symbol name, it's probably the start of an argument list and no space should be added before the open parenthesis. Also, while braces are normally used to wrap an indent block, this is not always the case. And so my code must "unindent" after the first statement in an indent block without braces. That statement might end with a semicolon, it could end with a closing curly brace (which means an open curly brace appeared within the indent block), or (and this is what makes JavaScript so much fun) it might have neither, because trailing semicolons are optional in some cases.
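
To make the parenthesis example concrete, here is a minimal sketch of a spacing rule driven by the previous token. It is not the code from the download: TokenKind and SpacingRules are hypothetical names, and the rule ignores keywords such as if and while, which the full listing treats separately.

```csharp
using System;

// Hypothetical, trimmed-down token categories (for illustration only).
public enum TokenKind { Symbol, BinaryOperator, OpenParen, CloseParen, Other }

public static class SpacingRules
{
    // Returns true if a space should be written before an open parenthesis,
    // based on the kind of token that precedes it on the line.
    public static bool SpaceBeforeOpenParen(TokenKind previous, bool isLineStart)
    {
        if (isLineStart)
            return false;               // nothing precedes it on this line

        switch (previous)
        {
            case TokenKind.Symbol:      // foo(  probably an argument list
            case TokenKind.OpenParen:   // ((    nested parentheses
            case TokenKind.CloseParen:  // )(    call on a returned function
                return false;
            default:                    // a + ( separate from the operator
                return true;
        }
    }
}
```

With this rule, a parenthesis after a binary operator gets a leading space, while one after a symbol name does not.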

While I can imagine some more sophisticated approaches to tracking state information, I tried to keep the code as simple as possible. So I just used a couple of variables to track basic state information, and then put tests in the code to try to catch special cases.

The result is more of a brute-force approach and isn't quite as clean as I'd prefer. I spent a fair amount of time testing with JavaScript from a variety of sources. And, yes, I suspect there may be a couple of rare constructs out there that don't format exactly right. But, for the most part, the code seems to be working pretty well.

The JavaFormatter Class

Listing 1 shows a partial listing of my JavaFormatter class. The complete source code is too long to list here but is included in the attached test project.

The first thing the code does is break the JavaScript up into tokens. This turned out to be the easy part. As stated, I used a couple of variables to track state information such as the parentheses depth and various flags for the current line. I also used a separate class (the Indents class) to track the current indentation depth, and store flags associated with each indent.
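
The Indents class itself is not part of the listing, so the following is only a guess at its general shape: a stack of indent levels, each carrying flags. IndentStack, the four-space indent unit, and the meaning given to LastIndent (flags of the most recently closed level) are all assumptions, and StripTrailingIndent() is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

[Flags]
public enum IndentFlags
{
    None = 0,
    NoBraces = 1,       // block opened without curly braces
    DoBlock = 2,
    CaseBlock = 4,
    BracketBlock = 8
}

// Hypothetical stand-in for the article's Indents class.
public class IndentStack
{
    private const string Unit = "    ";   // assumed: four spaces per level
    private readonly List<IndentFlags> _levels = new List<IndentFlags>();

    // Number of open indent levels.
    public int Count { get { return _levels.Count; } }

    // Flags of the innermost open level, or None at the top level.
    public IndentFlags Current
    {
        get { return _levels.Count > 0 ? _levels[_levels.Count - 1] : IndentFlags.None; }
    }

    // Flags of the most recently closed level (assumed meaning).
    public IndentFlags LastIndent { get; private set; }

    public void Indent(IndentFlags flags)
    {
        _levels.Add(flags);
    }

    public void Unindent()
    {
        if (_levels.Count == 0)
            return;                        // already at the top level
        LastIndent = _levels[_levels.Count - 1];
        _levels.RemoveAt(_levels.Count - 1);
    }

    // Whitespace to write at the start of a line at the current depth.
    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < _levels.Count; i++)
            sb.Append(Unit);
        return sb.ToString();
    }
}
```

Storing flags per level is what allows a loop such as while (Current.HasFlag(IndentFlags.NoBraces)) Unindent(); to close every brace-less block at once when a statement ends.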

The Format() method shown in the listing is the public method called to format a script. It accepts a JavaScript string as an argument and returns the formatted result. The Format() method creates an instance of the Tokenizer class (not shown), and calls it to extract each token.
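
Since the Tokenizer class is not shown, here is a much-reduced sketch of the GetToken()/PeekToken() pattern that Format() relies on. SimpleTokenizer and SimpleToken are hypothetical names. This version only splits identifiers, digit runs, and single characters; it does not handle strings, comments, regular expressions, decimal numbers, or multi-character operators, and it returns null from PeekToken() at the end of input, which may differ from the real class.

```csharp
using System;

// Hypothetical, much-reduced token type.
public class SimpleToken
{
    public string Value;
    public bool IsSymbol;       // identifier or keyword
}

// Hypothetical tokenizer; method names mirror the calls in the listing.
public class SimpleTokenizer
{
    private readonly string _text;
    private int _pos;

    // The token most recently returned by GetToken().
    public SimpleToken Token { get; private set; }

    public SimpleTokenizer(string text)
    {
        _text = text ?? string.Empty;
    }

    // Advances to the next token; returns false at end of input.
    public bool GetToken()
    {
        Token = Read(ref _pos);
        return Token != null;
    }

    // Returns the next token without consuming it (null at end of input).
    public SimpleToken PeekToken()
    {
        int pos = _pos;         // read from a copy so the position is unchanged
        return Read(ref pos);
    }

    private SimpleToken Read(ref int pos)
    {
        // Skip whitespace between tokens
        while (pos < _text.Length && char.IsWhiteSpace(_text[pos]))
            pos++;
        if (pos >= _text.Length)
            return null;

        int start = pos;
        if (IsWordChar(_text[pos]))
        {
            while (pos < _text.Length && IsWordChar(_text[pos]))
                pos++;
        }
        else
        {
            pos++;              // any other character is a one-character token
        }

        string value = _text.Substring(start, pos - start);
        bool isSymbol = IsWordChar(value[0]) && !char.IsDigit(value[0]);
        return new SimpleToken { Value = value, IsSymbol = isSymbol };
    }

    private static bool IsWordChar(char c)
    {
        return char.IsLetterOrDigit(c) || c == '_' || c == '$';
    }
}
```

Peeking works on a copy of the read position, so the formatter can look one token ahead (for example, to detect "{}") without disturbing the main loop.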

The code writes the result to a StringBuilder object, inserting whitespace as needed. When finished, the result is converted to a string and returned to the caller.

Listing 1: Partial Listing of the JavaFormatter Class

/// <summary>
/// Formats the given JavaScript string.
/// </summary>
/// <param name="javascript">JavaScript script to format</param>
/// <returns>The formatted string</returns>
public string Format(string javascript)
{
    _builder = new StringBuilder(javascript.Length);
    _indents = new Indents();
    _parenCount = 0;
    _bracketCount = 0;
    _lineFlags = LineFlags.None;
    _nextLineFlags = LineFlags.None;

    Tokenizer tokenizer = new Tokenizer(javascript);
    bool endLine = false;        // Cause new line
    bool isLineStart = true;    // Current token is first on line
    Token peek = null;

    // Process each token in input string
    while (tokenizer.GetToken())
    {
        // Get current token
        Token token = tokenizer.Token;

        // Test for new line
        if (_builder.Length > 0)
        {
            isLineStart = endLine;
            if (endLine)
            {
                NewLine();
                endLine = false;
            }
        }

        // Process this token
        switch (token.Type)
        {
            case TokenTypes.OpenBrace:
                if (!isLineStart)
                {
                    if (OpenBraceOnNewLine && _builder.Length > 0)
                    {
                        // Put open brace on new line
                        NewLine();
                    }
                    else
                    {
                        // Put open brace on same line
                        if (token.PreviousType != TokenTypes.OpenParen &&
                            token.PreviousType != TokenTypes.OpenBracket)
                            _builder.Append(' ');
                    }
                }

                // Write token
                _builder.Append(token.Value);

                // Start new indent block
                peek = tokenizer.PeekToken();
                if (peek.Type == TokenTypes.CloseBrace)
                {
                    // Special handling for "{}"
                    tokenizer.GetToken();
                    _builder.Append(tokenizer.Token.Value);
                    peek = tokenizer.PeekToken();
                    if (peek.Type != TokenTypes.SemiColon &&
                        peek.Type != TokenTypes.Comma)
                    {
                        // Unindent if in conditional block without braces
                        while (_indents.Current.HasFlag(IndentFlags.NoBraces))
                            _indents.Unindent();
                        endLine = true;
                    }
                    else if (peek.Type == TokenTypes.Comma)
                    {
                        // Normally, we insert a new line after
                        // a closing brace and comma but not here
                        tokenizer.GetToken();
                        _builder.Append(tokenizer.Token.Value);
                    }
                }
                else
                {
                    // Increase indentation
                    IndentFlags flags = IndentFlags.None;
                    if (_lineFlags.HasFlag(LineFlags.DoKeyword))
                        flags |= IndentFlags.DoBlock;
                    else if (_lineFlags.HasFlag(LineFlags.CaseKeyword))
                        flags |= IndentFlags.CaseBlock;

                    _indents.Indent(flags);
                    endLine = true;
                }
                break;

            case TokenTypes.CloseBrace:
                // End indent block
                if (_indents.Current.HasFlag(IndentFlags.CaseBlock))
                {
                    // Extra unindent if in case/default block
                    _indents.Unindent();
                    if (isLineStart)
                        _indents.StripTrailingIndent(_builder);
                }

                // Unindent if in conditional block without braces
                while (_indents.Current.HasFlag(IndentFlags.NoBraces))
                    _indents.Unindent();

                // Regular unindent
                _indents.Unindent();
                if (isLineStart)
                    _indents.StripTrailingIndent(_builder);
                else
                    NewLine();
                _builder.Append(token.Value);

                // Don't unindent without braces for catch/finally
                peek = tokenizer.PeekToken();
                if (peek.Value != "catch" &&
                    peek.Value != "finally" &&
                    peek.Value != ":")
                {
                    // Unindent if in conditional block without braces
                    while (_indents.Current.HasFlag(IndentFlags.NoBraces))
                        _indents.Unindent();
                }

                if (_indents.LastIndent.HasFlag(IndentFlags.DoBlock))
                    _lineFlags |= LineFlags.EndDoBlock;

                // Insert new line after code block
                if (peek.Type != TokenTypes.SemiColon &&
                    peek.Type != TokenTypes.CloseParen &&
                    peek.Type != TokenTypes.CloseBracket &&
                    peek.Type != TokenTypes.Comma &&
                    peek.Type != TokenTypes.OpenParen &&
                    peek.Type != TokenTypes.Colon &&
                    !_lineFlags.HasFlag(LineFlags.EndDoBlock))
                {
                    endLine = true;
                }
                break;

            case TokenTypes.OpenParen:
                if (!isLineStart &&
                    token.PreviousType != TokenTypes.OpenParen &&
                    token.PreviousType != TokenTypes.UnaryPrefix &&
                    token.PreviousType != TokenTypes.CloseBracket &&
                    token.PreviousType != TokenTypes.CloseParen &&
                    token.PreviousType != TokenTypes.CloseBrace &&
                    (token.PreviousType != TokenTypes.Symbol ||
                    (_lineFlags.HasFlag(LineFlags.BlockKeyword) &&
                    _parenCount == 0)))
                    _builder.Append(' ');
                _builder.Append(token.Value);
                _parenCount++;
                break;

            case TokenTypes.CloseParen:
                // Append closing parenthesis
                _builder.Append(token.Value);
                _parenCount = Math.Max(_parenCount - 1, 0);

                // Test for indent block start without braces
                if (_parenCount == 0 &&
                    _lineFlags.HasFlag(LineFlags.BlockKeyword))
                {
                    // Examine next token
                    peek = tokenizer.PeekToken();
                    if (peek.Type != TokenTypes.OpenBrace)
                    {
                        // Single line indent with no conditions or braces
                        _indents.Indent(IndentFlags.NoBraces);
                        endLine = true;
                    }
                }
                break;

            case TokenTypes.OpenBracket:
                if (!isLineStart &&
                    token.PreviousType != TokenTypes.Symbol &&
                    token.PreviousType != TokenTypes.OpenParen &&
                    token.PreviousType != TokenTypes.CloseParen &&
                    token.PreviousType != TokenTypes.CloseBracket)
                    _builder.Append(' ');

                // Special handling for JSON syntax?
                peek = tokenizer.PeekToken();
                if (_lineFlags.HasFlag(LineFlags.JsonColon) &&
                    peek.Type != TokenTypes.CloseBracket &&
                    peek.Type == TokenTypes.OpenBrace &&
                    _parenCount == 0)
                {
                    if (OpenBraceOnNewLine)
                        NewLine();
                    _indents.Indent(IndentFlags.BracketBlock);
                    endLine = true;
                }
                _builder.Append(token.Value);
                _bracketCount++;
                break;

            case TokenTypes.CloseBracket:
                _bracketCount = Math.Max(_bracketCount - 1, 0);
                if (_indents.Current.HasFlag(IndentFlags.BracketBlock))
                {
                    _indents.Unindent();
                    if (isLineStart)
                    {
                        _indents.StripTrailingIndent(_builder);
                        _builder.Append(token.Value);
                    }
                    else
                    {
                        NewLine();
                        _builder.Append(token.Value);
                    }
                }
                else _builder.Append(token.Value);
                break;

            case TokenTypes.Symbol:

                bool blockKeyword = _blockKeywords.Contains(token.Value);

                // Special handling for else without if
                if (token.Value == "else" &&
                    tokenizer.PeekToken().Value != "if")
                    blockKeyword = true;

                // Special handling for switch..case..default
                if (_indents.Current.HasFlag(IndentFlags.CaseBlock) &&
                    (token.Value == "case" ||
                    token.Value == "default"))
                {
                    _indents.StripTrailingIndent(_builder);
                    _indents.Unindent();
                }

                if (_parenCount == 0 && blockKeyword)
                {
                    // Keyword that starts an indented block
                    if (!isLineStart)
                        _builder.Append(' ');
                    // Append this symbol
                    _builder.Append(token.Value);

                    if (!_lineFlags.HasFlag(LineFlags.EndDoBlock) ||
                        token.Value != "while")
                    {
                        // Test for special-case blocks
                        if (token.Value == "do")
                            _lineFlags |= LineFlags.DoKeyword;
                        // Examine next token
                        peek = tokenizer.PeekToken();
                        if (peek.Type == TokenTypes.OpenBrace ||
                            peek.Type == TokenTypes.OpenParen)
                        {
                            // Handle indentation at ')' or '{'
                            _lineFlags |= LineFlags.BlockKeyword;
                        }
                        else
                        {
                            // Single line indent with no conditions or braces
                            IndentFlags flags = IndentFlags.NoBraces;
                            if (_lineFlags.HasFlag(LineFlags.DoKeyword))
                                flags |= IndentFlags.DoBlock;

                            _indents.Indent(flags);
                            endLine = true;
                        }
                    }
                }
                else
                {
                    // All other symbols
                    if (!isLineStart &&
                        token.PreviousType != TokenTypes.OpenParen &&
                        token.PreviousType != TokenTypes.OpenBracket &&
                        token.PreviousType != TokenTypes.UnaryPrefix &&
                        token.PreviousType != TokenTypes.Dot)
                        _builder.Append(' ');

                    // Flag line for case block
                    if (token.Value == "case" || token.Value == "default")
                        _lineFlags |= LineFlags.CaseKeyword;

                    _builder.Append(token.Value);
                }
                break;

            case TokenTypes.String:
            case TokenTypes.Number:
            case TokenTypes.RegEx:
                // Emit constant
                if (!isLineStart &&
                    token.PreviousType != TokenTypes.OpenParen &&
                    token.PreviousType != TokenTypes.OpenBracket &&
                    token.PreviousType != TokenTypes.UnaryPrefix)
                    _builder.Append(' ');
                _builder.Append(token.Value);
                break;

            case TokenTypes.SemiColon:
                _builder.Append(token.Value);
                if (_parenCount == 0)
                {
                    // Unindent if in conditional block without braces
                    while (_indents.Current.HasFlag(IndentFlags.NoBraces))
                        _indents.Unindent();
                    if (_indents.LastIndent.HasFlag(IndentFlags.DoBlock))
                        _nextLineFlags |= LineFlags.EndDoBlock;

                    // Determine if end of single-line indent block
                    peek = tokenizer.PeekToken();
                    if (peek.Type == TokenTypes.LineComment ||
                        peek.Type == TokenTypes.InlineComment)
                    {
                        bool newLine;
                        if (peek.Type == TokenTypes.LineComment)
                            newLine = NewLineBeforeLineComment;
                        else
                            newLine = NewLineBeforeInlineComment;

                        tokenizer.GetToken();
                        if (newLine)
                            NewLine();
                        else
                            _builder.Append(' ');
                        _builder.Append(tokenizer.Token.Value);
                    }

                    endLine = true;
                }
                break;

            case TokenTypes.Comma:
                _builder.Append(token.Value);
                // Append newline if it looks like JSON syntax
                if (token.PreviousType == TokenTypes.CloseBrace ||
                    (_lineFlags.HasFlag(LineFlags.JsonColon) &&
                    _parenCount == 0 &&
                    _bracketCount == 0 &&
                    _indents.Count > 0))
                    endLine = true;
                break;

            case TokenTypes.Colon:
                if (!_lineFlags.HasFlag(LineFlags.CaseKeyword))
                {
                    // Standard colon handling
                    if (!isLineStart &&
                        (_lineFlags.HasFlag(LineFlags.QuestionMark) ||
                        token.PreviousType == TokenTypes.CloseBrace))
                        _builder.Append(' ');
                    _builder.Append(token.Value);
                    // May be JSON syntax
                    if (!_lineFlags.HasFlag(LineFlags.QuestionMark))
                        _lineFlags |= LineFlags.JsonColon;
                }
                else
                {
                    // Special handling for case and default
                    _builder.Append(token.Value);
                    _indents.Indent(IndentFlags.CaseBlock);
                    endLine = true;
                }
                break;

            case TokenTypes.QuestionMark:
                _lineFlags |= LineFlags.QuestionMark;
                if (!isLineStart)
                    _builder.Append(' ');
                _builder.Append(token.Value);
                break;

            case TokenTypes.BinaryOperator:
            case TokenTypes.UnaryPrefix:
                if (!isLineStart &&
                    token.PreviousType != TokenTypes.OpenParen &&
                    token.PreviousType != TokenTypes.OpenBracket &&
                    token.PreviousType != TokenTypes.UnaryPrefix)
                    _builder.Append(' ');
                _builder.Append(token.Value);
                break;

            case TokenTypes.LineComment:
                // Separate line comment from previous token
                if (!isLineStart)
                {
                    if (NewLineBeforeLineComment)
                        NewLine();                // Separate with new line
                    else
                        _builder.Append(' ');    // Separate with space
                }
                // Append comment
                _builder.Append(token.Value);
                // Line comment always followed by new line
                endLine = true;
                break;

            case TokenTypes.InlineComment:
                // Separate inline comment from previous token
                if (!isLineStart)
                {
                    if (NewLineBeforeInlineComment)
                        NewLine();                // Separate with new line
                    else
                        _builder.Append(' ');    // Separate with space
                }
                // Append comment
                _builder.Append(token.Value);
                // New line after comment
                if (NewLineAfterInlineComment)
                    endLine = true;
                break;

            default:
                _builder.Append(token.Value);
                break;
        }
    }

    _builder.AppendLine();

    return _builder.ToString();
}

/// <summary>
/// Emits a new line to the output string.
/// </summary>
protected void NewLine()
{
    _builder.AppendLine();
    _builder.Append(_indents.ToString());

    _bracketCount = _parenCount = 0;
    _lineFlags = _nextLineFlags;
    _nextLineFlags = LineFlags.None;
}

To use the code, simply create an instance of the JavaFormatter class and call its Format() method.

There are also four Boolean properties that affect how the script is formatted. OpenBraceOnNewLine determines if a new line should be inserted before an opening curly brace. NewLineBeforeLineComment determines if a new line should be inserted before a line comment. And NewLineBeforeInlineComment and NewLineAfterInlineComment determine if a new line should be inserted before and after an inline comment.

Conclusion

I haven't gone into great detail about how the code works. The fact is that there really wasn't any slick algorithm employed here. The core logic is just based on various state variables and tests for special conditions.

The test project download includes all the source code and a comprehensive test project. You can use the code as is, or you can browse the source code if you want a closer look at how it works.

Update History

12/2/2012: Added support for exponential notation and fixed issues with regular expressions. Thanks to Eric Lawrence for the feedback.

End-User License

Use of this article and any related source code or other files is governed by the terms and conditions of The Microsoft Public License.

Author Information

Jonathan Wood

I'm a software and website developer working out of the greater Salt Lake City area of Utah. I've developed many websites including Black Belt Coder, Trail Calendar, and others.

I hike each week with my dogs Suki and Sasha. You can see my hiking blog at Hiking Salt Lake.