

A JavaScript Formatter

By Jonathan Wood on 5/22/2011 (Updated on 4/8/2014)
Language: C#, JavaScript
Technology: .NET
Platform: Windows
License: Ms-PL


Download Source Code and Test Project

Introduction

Note: You can try out the code from this article using our online JavaScript Formatter.

Many websites use JavaScript, which runs in the user's browser (client side), to produce a richer browsing experience. While JavaScript can easily be embedded within a web page, larger chunks of script are often placed in separate files.

Because these files can grow quite large, it is common practice to compact them by removing unneeded whitespace characters. This can significantly reduce the amount of bandwidth required to download them to the user's browser.

Unfortunately, removing unneeded whitespace also makes the code much harder to read. For developers who want to examine the inner workings of websites they didn't create, compacted JavaScript can make the client script very difficult to follow.

This article presents a C# class that formats JavaScript. While it can format any JavaScript, it is probably most useful for those who want to browse compacted JavaScript source code.

Writing a JavaScript Formatter

Reformatting a large block of JavaScript by hand would be very tedious, so this is a perfect task for the computer. That said, the logic needed to implement a JavaScript formatter is not trivial. My code starts by extracting tokens from the input script. It then outputs those tokens along with the appropriate whitespace characters.
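As a rough illustration of the second phase (writing tokens back out with whitespace), the following sketch takes tokens that are assumed to be already split and inserts indentation and line breaks around braces and semicolons. EmitSketch and its spacing rule are hypothetical and far simpler than the JavaFormatter class; they ignore braceless blocks, JSON literals and comments entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch of the output phase only: tokens are assumed to be
// already split, and only "{", "}", ";" and parentheses get special handling.
public static class EmitSketch
{
    public static string Emit(IEnumerable<string> tokens)
    {
        var sb = new StringBuilder();
        int depth = 0;          // current indentation level
        int parens = 0;         // ';' inside "for (...)" must not end the line
        bool lineStart = true;  // true when the next token starts a new line
        string prev = "";       // previously emitted token

        foreach (string tok in tokens)
        {
            if (tok == "}")
            {
                // Close the block on its own line, one level further out
                depth = Math.Max(depth - 1, 0);
                if (!lineStart) sb.Append('\n');
                sb.Append(new string(' ', depth * 4)).Append(tok).Append('\n');
                lineStart = true;
                prev = tok;
                continue;
            }

            if (lineStart)
                sb.Append(new string(' ', depth * 4));  // indent first token on a line
            else if (NeedsSpace(prev, tok))
                sb.Append(' ');                         // separate from previous token
            sb.Append(tok);
            lineStart = false;

            if (tok == "(") parens++;
            else if (tok == ")") parens = Math.Max(parens - 1, 0);
            else if (tok == "{") { depth++; sb.Append('\n'); lineStart = true; }
            else if (tok == ";" && parens == 0) { sb.Append('\n'); lineStart = true; }
            prev = tok;
        }
        return sb.ToString();
    }

    // Crude spacing rule: no space after "(" or before ")", ";", "," or "(".
    static bool NeedsSpace(string prev, string tok) =>
        prev != "(" && tok != ")" && tok != ";" && tok != "," && tok != "(";
}
```

Tracking the parenthesis depth is what keeps the semicolons in for (;;) on one line, the same reason Format() checks _parenCount before ending a line at a semicolon. The crude spacing rule also writes "for(" without a space, which is exactly the kind of case the real formatter handles with extra state.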

The difficult part is tracking state information. For example, if an open parenthesis follows a plus sign, a space should separate the two. But if it follows a symbol name, it's probably the start of an argument list, and no space should be added before the open parenthesis. Also, while braces normally wrap an indent block, this is not always the case. So my code must "unindent" after the first statement in an indent block that has no braces. That statement might end with a semicolon, it might end with a closing curly brace (which means an open curly brace appeared within the indent block), or (and this is what makes JavaScript so much fun) it might have neither, because trailing semicolons are optional in some cases.
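To make the parenthesis example concrete, here is a hypothetical C# helper that decides whether a space belongs before an open parenthesis. The PrevKind enum, the ParenSpacing class and the keyword list are assumptions for illustration; the real check in Listing 1 also consults the line flags and the current parenthesis depth.

```csharp
using System.Collections.Generic;

// Hypothetical categories for the token that precedes "("; the article's
// TokenTypes enum is richer than this.
public enum PrevKind { Symbol, BinaryOperator, UnaryPrefix, OpenParen, CloseParen }

public static class ParenSpacing
{
    // Assumed keyword list: these take a parenthesized condition, not arguments
    static readonly HashSet<string> BlockKeywords =
        new HashSet<string> { "if", "for", "while", "switch", "catch", "with" };

    // Returns true when a space belongs between the previous token and "("
    public static bool SpaceBeforeOpenParen(PrevKind prev, string prevText)
    {
        switch (prev)
        {
            case PrevKind.BinaryOperator:   // a + (b * c)
                return true;
            case PrevKind.Symbol:           // alert(x), but if (x)
                return BlockKeywords.Contains(prevText);
            default:                        // -(x), ((x)), f(a)(b)
                return false;
        }
    }
}
```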

While I can imagine more sophisticated approaches to tracking state information, I tried to keep the code as simple as possible. So I used just a couple of variables to track basic state information, and then added tests in the code to catch special cases.

The result is more of a brute-force approach and isn't quite as clean as I'd prefer. I spent a fair amount of time testing with JavaScript from a variety of sources. And, yes, I suspect there may be a couple of rare constructs out there that don't format exactly right. But, for the most part, the code seems to work pretty well.

The JavaFormatter Class

Listing 1 shows a partial listing of my JavaFormatter class. The complete source code is too long to list here but is included in the attached test project.

The first thing the code does is break the JavaScript up into tokens. This turned out to be the easy part. As stated, I used a couple of variables to track state information such as the parenthesis depth and various flags for the current line. I also used a separate class (the Indents class) to track the current indentation depth and to store flags associated with each indent.
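The Indents class is not shown, but its use in Listing 1 (Indent(), Unindent(), Current, LastIndent, Count and StripTrailingIndent()) suggests a stack of per-level flags. The following is a minimal guess at such a class; IndentStack, the flag values and the four-space indent unit are assumptions, not the code from the download.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical flags, modeled on the names used in Listing 1
[Flags]
public enum IndentFlags
{
    None = 0,
    NoBraces = 1,      // single-statement block without { }
    DoBlock = 2,       // body of a do...while loop
    CaseBlock = 4,     // statements under case/default
    BracketBlock = 8   // multi-line [ ... ] array
}

// Hypothetical indentation stack; the real Indents class may differ.
public class IndentStack
{
    const string Unit = "    ";   // one level of indentation (assumed)
    readonly Stack<IndentFlags> _stack = new Stack<IndentFlags>();

    public int Count => _stack.Count;

    // Flags of the innermost level, or None when nothing is indented
    public IndentFlags Current => _stack.Count > 0 ? _stack.Peek() : IndentFlags.None;

    // Flags of the level most recently removed by Unindent()
    public IndentFlags LastIndent { get; private set; } = IndentFlags.None;

    public void Indent(IndentFlags flags) => _stack.Push(flags);

    public void Unindent()
    {
        // Tolerate unbalanced input rather than throwing
        LastIndent = _stack.Count > 0 ? _stack.Pop() : IndentFlags.None;
    }

    // Remove one indent unit from the end of the output, if present
    public void StripTrailingIndent(StringBuilder sb)
    {
        if (sb.Length >= Unit.Length &&
            sb.ToString(sb.Length - Unit.Length, Unit.Length) == Unit)
            sb.Length -= Unit.Length;
    }

    // Whitespace to write at the start of a new line
    public override string ToString() => new string(' ', Unit.Length * _stack.Count);
}
```

Because Current returns None on an empty stack, a loop such as while (Current.HasFlag(IndentFlags.NoBraces)) Unindent(); always terminates, which matters for the repeated "unindent if in conditional block without braces" loops in the listing.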

The Format() method shown in the listing is the public method called to format a script. It accepts a JavaScript string as an argument and returns the formatted result. The Format() method creates an instance of the Tokenizer class (not shown) and calls its GetToken() method to extract each token, using PeekToken() to look ahead at the next token without consuming it.
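Since the Tokenizer class is not shown either, the sketch below is a hypothetical mini tokenizer with the same GetToken()/Token/PeekToken() shape used in Listing 1, including a PreviousType on each token. It recognizes only identifiers, numbers, quoted strings and single-character punctuation; comments, regular expression literals and multi-character operators are left out, and all names in it are invented for the example.

```csharp
using System;

// Hypothetical token categories for this sketch only
public enum SketchType { End, Symbol, Number, String, Punct }

public class SketchToken
{
    public SketchType Type;
    public string Value = "";
    public SketchType PreviousType;   // type of the token before this one
}

// Hypothetical mini tokenizer in the GetToken/Token/PeekToken style of Listing 1
public class TokenizerSketch
{
    readonly string _text;
    int _pos;
    SketchType _prevType = SketchType.End;

    public SketchToken Token { get; private set; } = new SketchToken();

    public TokenizerSketch(string text) { _text = text; }

    // Advances to the next token; returns false at end of input
    public bool GetToken()
    {
        Token = Scan(ref _pos, _prevType);
        _prevType = Token.Type;
        return Token.Type != SketchType.End;
    }

    // Returns the next token without consuming it (scans from a copy of _pos)
    public SketchToken PeekToken()
    {
        int pos = _pos;
        return Scan(ref pos, _prevType);
    }

    SketchToken Scan(ref int pos, SketchType prevType)
    {
        while (pos < _text.Length && char.IsWhiteSpace(_text[pos])) pos++;
        var tok = new SketchToken { PreviousType = prevType };
        if (pos >= _text.Length) { tok.Type = SketchType.End; return tok; }

        int start = pos;
        char c = _text[pos];
        if (char.IsLetter(c) || c == '_' || c == '$')
        {
            while (pos < _text.Length &&
                   (char.IsLetterOrDigit(_text[pos]) || _text[pos] == '_' || _text[pos] == '$'))
                pos++;
            tok.Type = SketchType.Symbol;
        }
        else if (char.IsDigit(c))
        {
            // Digits, letters and '.' cover forms like 3.14 and 0x1F
            while (pos < _text.Length && (char.IsLetterOrDigit(_text[pos]) || _text[pos] == '.'))
                pos++;
            tok.Type = SketchType.Number;
        }
        else if (c == '"' || c == '\'')
        {
            pos++;   // opening quote
            while (pos < _text.Length && _text[pos] != c)
                pos += _text[pos] == '\\' ? 2 : 1;   // skip escaped characters
            pos = Math.Min(pos + 1, _text.Length);   // closing quote, if present
            tok.Type = SketchType.String;
        }
        else
        {
            pos++;   // any other single character
            tok.Type = SketchType.Punct;
        }
        tok.Value = _text.Substring(start, pos - start);
        return tok;
    }
}
```

Because PeekToken() scans from a copy of the current position, looking ahead never consumes input. That is what lets a loop like the one in Format() inspect the token after a "{" or ";" before deciding what whitespace to write.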

The code writes the result to a StringBuilder object, inserting whitespace as needed. When finished, the result is converted to a string and returned to the caller.

Listing 1: Partial Listing of the JavaFormatter Class

/// <summary>
/// Formats the given JavaScript string.
/// </summary>
/// <param name="javascript">JavaScript script to format</param>
/// <returns>The formatted string</returns>
public string Format(string javascript)
{
	_builder = new StringBuilder(javascript.Length);
	_indents = new Indents();
	_parenCount = 0;
	_bracketCount = 0;
	_lineFlags = LineFlags.None;
	_nextLineFlags = LineFlags.None;

	Tokenizer tokenizer = new Tokenizer(javascript);
	bool endLine = false;		// Cause new line
	bool isLineStart = true;	// Current token is first on line
	Token peek = null;

	// Process each token in input string
	while (tokenizer.GetToken())
	{
		// Get current token
		Token token = tokenizer.Token;

		// Test for new line
		if (_builder.Length > 0)
		{
			isLineStart = endLine;
			if (endLine)
			{
				NewLine();
				endLine = false;
			}
		}

		// Process this token
		switch (token.Type)
		{
			case TokenTypes.OpenBrace:
				if (!isLineStart)
				{
					if (OpenBraceOnNewLine && _builder.Length > 0)
					{
						// Put open brace on new line
						NewLine();
					}
					else
					{
						// Put open brace on same line
						if (token.PreviousType != TokenTypes.OpenParen &&
							token.PreviousType != TokenTypes.OpenBracket)
							_builder.Append(' ');
					}
				}

				// Write token
				_builder.Append(token.Value);

				// Start new indent block
				peek = tokenizer.PeekToken();
				if (peek.Type == TokenTypes.CloseBrace)
				{
					// Special handling for "{}"
					tokenizer.GetToken();
					_builder.Append(tokenizer.Token.Value);
					peek = tokenizer.PeekToken();
					if (peek.Type != TokenTypes.SemiColon &&
						peek.Type != TokenTypes.Comma)
					{
						// Unindent if in conditional block without braces
						while (_indents.Current.HasFlag(IndentFlags.NoBraces))
							_indents.Unindent();
						endLine = true;
					}
					else if (peek.Type == TokenTypes.Comma)
					{
						// Normally, we insert a new line after
						// a closing brace and comma but not here
						tokenizer.GetToken();
						_builder.Append(tokenizer.Token.Value);
					}
				}
				else
				{
					// Increase indentation
					IndentFlags flags = IndentFlags.None;
					if (_lineFlags.HasFlag(LineFlags.DoKeyword))
						flags |= IndentFlags.DoBlock;
					else if (_lineFlags.HasFlag(LineFlags.CaseKeyword))
						flags |= IndentFlags.CaseBlock;

					_indents.Indent(flags);
					endLine = true;
				}
				break;

			case TokenTypes.CloseBrace:
				// End indent block
				if (_indents.Current.HasFlag(IndentFlags.CaseBlock))
				{
					// Extra unindent if in case/default block
					_indents.Unindent();
					if (isLineStart)
						_indents.StripTrailingIndent(_builder);
				}

				// Unindent if in conditional block without braces
				while (_indents.Current.HasFlag(IndentFlags.NoBraces))
					_indents.Unindent();

				// Regular unindent
				_indents.Unindent();
				if (isLineStart)
					_indents.StripTrailingIndent(_builder);
				else
					NewLine();
				_builder.Append(token.Value);

				// Don't unindent without braces for catch/finally
				peek = tokenizer.PeekToken();
				if (peek.Value != "catch" &&
					peek.Value != "finally" &&
					peek.Value != ":")
				{
					// Unindent if in conditional block without braces
					while (_indents.Current.HasFlag(IndentFlags.NoBraces))
						_indents.Unindent();
				}

				if (_indents.LastIndent.HasFlag(IndentFlags.DoBlock))
					_lineFlags |= LineFlags.EndDoBlock;

				// Insert new line after code block
				if (peek.Type != TokenTypes.SemiColon &&
					peek.Type != TokenTypes.CloseParen &&
					peek.Type != TokenTypes.CloseBracket &&
					peek.Type != TokenTypes.Comma &&
					peek.Type != TokenTypes.OpenParen &&
					peek.Type != TokenTypes.Colon &&
					!_lineFlags.HasFlag(LineFlags.EndDoBlock))
				{
					endLine = true;
				}
				break;

			case TokenTypes.OpenParen:
				if (!isLineStart &&
					token.PreviousType != TokenTypes.OpenParen &&
					token.PreviousType != TokenTypes.UnaryPrefix &&
					token.PreviousType != TokenTypes.CloseBracket &&
					token.PreviousType != TokenTypes.CloseParen &&
					token.PreviousType != TokenTypes.CloseBrace &&
					(token.PreviousType != TokenTypes.Symbol ||
					(_lineFlags.HasFlag(LineFlags.BlockKeyword) &&
					_parenCount == 0)))
					_builder.Append(' ');
				_builder.Append(token.Value);
				_parenCount++;
				break;

			case TokenTypes.CloseParen:
				// Append closing parenthesis
				_builder.Append(token.Value);
				_parenCount = Math.Max(_parenCount - 1, 0);

				// Test for indent block start without braces
				if (_parenCount == 0 &&
					_lineFlags.HasFlag(LineFlags.BlockKeyword))
				{
					// Examine next token
					peek = tokenizer.PeekToken();
					if (peek.Type != TokenTypes.OpenBrace)
					{
						// Single line indent with no conditions or braces
						_indents.Indent(IndentFlags.NoBraces);
						endLine = true;
					}
				}
				break;

			case TokenTypes.OpenBracket:
				if (!isLineStart &&
					token.PreviousType != TokenTypes.Symbol &&
					token.PreviousType != TokenTypes.OpenParen &&
					token.PreviousType != TokenTypes.CloseParen &&
					token.PreviousType != TokenTypes.CloseBracket)
					_builder.Append(' ');

				// Special handling for JSON syntax?
				peek = tokenizer.PeekToken();
				if (_lineFlags.HasFlag(LineFlags.JsonColon) &&
					peek.Type == TokenTypes.OpenBrace &&
					_parenCount == 0)
				{
					if (OpenBraceOnNewLine)
						NewLine();
					_indents.Indent(IndentFlags.BracketBlock);
					endLine = true;
				}
				_builder.Append(token.Value);
				_bracketCount++;
				break;

			case TokenTypes.CloseBracket:
				_bracketCount = Math.Max(_bracketCount - 1, 0);
				if (_indents.Current.HasFlag(IndentFlags.BracketBlock))
				{
					_indents.Unindent();
					if (isLineStart)
					{
						_indents.StripTrailingIndent(_builder);
						_builder.Append(token.Value);
					}
					else
					{
						NewLine();
						_builder.Append(token.Value);
					}
				}
				else _builder.Append(token.Value);
				break;

			case TokenTypes.Symbol:

				bool blockKeyword = _blockKeywords.Contains(token.Value);

				// Special handling for else without if
				if (token.Value == "else" &&
					tokenizer.PeekToken().Value != "if")
					blockKeyword = true;

				// Special handling for switch..case..default
				if (_indents.Current.HasFlag(IndentFlags.CaseBlock) &&
					(token.Value == "case" ||
					token.Value == "default"))
				{
					_indents.StripTrailingIndent(_builder);
					_indents.Unindent();
				}

				if (_parenCount == 0 && blockKeyword)
				{
					// Keyword that starts an indented block
					if (!isLineStart)
						_builder.Append(' ');
					// Append this symbol
					_builder.Append(token.Value);

					if (!_lineFlags.HasFlag(LineFlags.EndDoBlock) ||
						token.Value != "while")
					{
						// Test for special-case blocks
						if (token.Value == "do")
							_lineFlags |= LineFlags.DoKeyword;
						// Examine next token
						peek = tokenizer.PeekToken();
						if (peek.Type == TokenTypes.OpenBrace ||
							peek.Type == TokenTypes.OpenParen)
						{
							// Handle indentation at ')' or '{'
							_lineFlags |= LineFlags.BlockKeyword;
						}
						else
						{
							// Single line indent with no conditions or braces
							IndentFlags flags = IndentFlags.NoBraces;
							if (_lineFlags.HasFlag(LineFlags.DoKeyword))
								flags |= IndentFlags.DoBlock;

							_indents.Indent(flags);
							endLine = true;
						}
					}
				}
				else
				{
					// All other symbols
					if (!isLineStart &&
						token.PreviousType != TokenTypes.OpenParen &&
						token.PreviousType != TokenTypes.OpenBracket &&
						token.PreviousType != TokenTypes.UnaryPrefix &&
						token.PreviousType != TokenTypes.Dot)
						_builder.Append(' ');

					// Flag line for case block
					if (token.Value == "case" || token.Value == "default")
						_lineFlags |= LineFlags.CaseKeyword;

					_builder.Append(token.Value);
				}
				break;

			case TokenTypes.String:
			case TokenTypes.Number:
			case TokenTypes.RegEx:
				// Emit constant
				if (!isLineStart &&
					token.PreviousType != TokenTypes.OpenParen &&
					token.PreviousType != TokenTypes.OpenBracket &&
					token.PreviousType != TokenTypes.UnaryPrefix)
					_builder.Append(' ');
				_builder.Append(token.Value);
				break;

			case TokenTypes.SemiColon:
				_builder.Append(token.Value);
				if (_parenCount == 0)
				{
					// Unindent if in conditional block without braces
					while (_indents.Current.HasFlag(IndentFlags.NoBraces))
						_indents.Unindent();
					if (_indents.LastIndent.HasFlag(IndentFlags.DoBlock))
						_nextLineFlags |= LineFlags.EndDoBlock;

					// Determine if end of single-line indent block
					peek = tokenizer.PeekToken();
					if (peek.Type == TokenTypes.LineComment ||
						peek.Type == TokenTypes.InlineComment)
					{
						bool newLine;
						if (peek.Type == TokenTypes.LineComment)
							newLine = NewLineBeforeLineComment;
						else
							newLine = NewLineBeforeInlineComment;

						tokenizer.GetToken();
						if (newLine)
							NewLine();
						else
							_builder.Append(' ');
						_builder.Append(tokenizer.Token.Value);
					}

					endLine = true;
				}
				break;

			case TokenTypes.Comma:
				_builder.Append(token.Value);
				// Append newline if it looks like JSON syntax
				if (token.PreviousType == TokenTypes.CloseBrace ||
					(_lineFlags.HasFlag(LineFlags.JsonColon) &&
					_parenCount == 0 &&
					_bracketCount == 0 &&
					_indents.Count > 0))
					endLine = true;
				break;

			case TokenTypes.Colon:
				if (!_lineFlags.HasFlag(LineFlags.CaseKeyword))
				{
					// Standard colon handling
					if (!isLineStart &&
						(_lineFlags.HasFlag(LineFlags.QuestionMark) ||
						token.PreviousType == TokenTypes.CloseBrace))
						_builder.Append(' ');
					_builder.Append(token.Value);
					// May be JSON syntax
					if (!_lineFlags.HasFlag(LineFlags.QuestionMark))
						_lineFlags |= LineFlags.JsonColon;
				}
				else
				{
					// Special handling for case and default
					_builder.Append(token.Value);
					_indents.Indent(IndentFlags.CaseBlock);
					endLine = true;
				}
				break;

			case TokenTypes.QuestionMark:
				_lineFlags |= LineFlags.QuestionMark;
				if (!isLineStart)
					_builder.Append(' ');
				_builder.Append(token.Value);
				break;

			case TokenTypes.BinaryOperator:
			case TokenTypes.UnaryPrefix:
				if (!isLineStart &&
					token.PreviousType != TokenTypes.OpenParen &&
					token.PreviousType != TokenTypes.OpenBracket &&
					token.PreviousType != TokenTypes.UnaryPrefix)
					_builder.Append(' ');
				_builder.Append(token.Value);
				break;

			case TokenTypes.LineComment:
				// Separate line comment from previous token
				if (!isLineStart)
				{
					if (NewLineBeforeLineComment)
						NewLine();		// Separate with new line
					else
						_builder.Append(' ');	// Separate with space
				}
				// Append comment
				_builder.Append(token.Value);
				// Line comment always followed by new line
				endLine = true;
				break;

			case TokenTypes.InlineComment:
				// Separate inline comment from previous token
				if (!isLineStart)
				{
					if (NewLineBeforeInlineComment)
						NewLine();		// Separate with new line
					else
						_builder.Append(' ');	// Separate with space
				}
				// Append comment
				_builder.Append(token.Value);
				// New line after comment
				if (NewLineAfterInlineComment)
					endLine = true;
				break;

			default:
				_builder.Append(token.Value);
				break;
		}
	}

	_builder.AppendLine();

	return _builder.ToString();
}

/// <summary>
/// Emits a new line to the output string.
/// </summary>
protected void NewLine()
{
	_builder.AppendLine();
	_builder.Append(_indents.ToString());

	_bracketCount = _parenCount = 0;
	_lineFlags = _nextLineFlags;
	_nextLineFlags = LineFlags.None;
}

To use the code, simply create an instance of the JavaFormatter class and call its Format() method.

There are also four Boolean properties that affect how the script is formatted. OpenBraceOnNewLine determines whether a new line is inserted before an opening curly brace. NewLineBeforeLineComment determines whether a new line is inserted before a line comment (// ...). And NewLineBeforeInlineComment and NewLineAfterInlineComment determine whether a new line is inserted before and after an inline comment (/* ... */).

Conclusion

I haven't gone into great detail about how the code works. The fact is that there really isn't any slick algorithm at work here. The core logic is just based on various state variables and tests for special conditions.

The download includes all the source code along with a comprehensive test project. You can use the code as is, or you can browse the source code if you want a closer look at how it works.

Update History

12/2/2012: Added support for exponential notation and fixed issues with regular expressions, thanks to feedback from Eric Lawrence.

4/8/2014: Corrected an issue where support for exponential notation caused problems with the 'e' in hexadecimal numbers.

End-User License

Use of this article and any related source code or other files is governed by the terms and conditions of The Microsoft Public License.

Author Information

Jonathan Wood

I'm a software/website developer working out of the greater Salt Lake City area in Utah. I've developed many websites including Black Belt Coder, Insider Articles, and others.