

Converting Text to a URL-Friendly Slug

By Jonathan Wood on 12/1/2010 (Updated on 12/12/2010)
Language: C#
Technology: .NET
Platform: Windows
License: CPOL
Views: 15,518

Text to Slug Demo Program

Download Source Code

Introduction

By now, many of you have seen URLs that contain text similar to the page's title. For example, the URL may look like http://www.domain.com/this-is-my-best-article.aspx instead of http://www.domain.com/bestarticle.aspx. Text that has been converted from a regular string so that it can appear within a URL this way is called a "slug."

Not only is a slug a little more human-readable, but it can also help indicate to search engines like Google and Bing what keywords are important to your page.

My TextToSlug() Method

There's no built-in .NET function to convert a string to a slug. I found a few examples on the web but didn't find any I really liked. So I decided to roll my own in C#. Listing 1 shows my TextToSlug() method. It takes any string and makes it safe to include as part of a URL.

I started by looking for an official, comprehensive list of all characters that are illegal within a valid URL. I thought I'd be slick by allowing every character that could legally be included. However, the more I thought about it, the more I felt that removing all punctuation looked much cleaner. So my code strips out punctuation, even characters that could legitimately be included in a URL, leaving only letters, digits, and hyphens.

Listing 1: TextToSlug() Method

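using System.Text;

/// <summary>
/// Creates a "slug" from text that can be used as part of a valid URL.
/// 
/// Invalid characters are converted to hyphens. Punctuation that is perfectly valid in
/// a URL is also converted to hyphens to keep the result mostly text. Apostrophes are
/// removed. Steps are taken to prevent leading, trailing, and consecutive hyphens.
/// </summary>
/// <param name="s">String to convert to a slug</param>
/// <returns>A lowercase string containing only letters, digits, and single hyphens</returns>
public static string TextToSlug(string s)
{
    // Treat a null or empty string as an empty slug
    if (string.IsNullOrEmpty(s))
        return string.Empty;

    StringBuilder sb = new StringBuilder(s.Length);
    // Start as if a hyphen was just written so leading separators are skipped
    bool wasHyphen = true;

    foreach (char c in s)
    {
        if (char.IsLetterOrDigit(c))
        {
            // Invariant culture gives the same result regardless of server culture
            sb.Append(char.ToLowerInvariant(c));
            wasHyphen = false;
        }
        else if (c != '\'' && !wasHyphen)
        {
            // Collapse each run of separators into a single hyphen;
            // apostrophes are dropped so "don't" becomes "dont"
            sb.Append('-');
            wasHyphen = true;
        }
    }

    // Avoid trailing hyphens
    if (wasHyphen && sb.Length > 0)
        sb.Length--;

    return sb.ToString();
}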

Some examples I found on the web used regular expressions. My routine is simpler. It just iterates through each character in the string, appending it (converted to lowercase) to the result if it's either a letter or a digit. Apostrophes are simply dropped, and any other character, such as a space or punctuation mark, is replaced with a hyphen (-).
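For comparison, here is a rough sketch of what a regular-expression version might look like. It is not one of the web examples mentioned above; the class name RegexSlugSketch, the method name TextToSlugRegex, and the pattern are assumptions for illustration. It aims to produce the same output as Listing 1 for typical titles: lowercase, drop apostrophes, turn each run of non-letter/non-digit characters into one hyphen, then trim hyphens from the ends.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical regex-based variant, shown only for comparison with Listing 1.
// Class and method names are illustrative assumptions.
public static class RegexSlugSketch
{
    // Matches any run of characters that are not Unicode letters (\p{L}) or
    // decimal digits (\p{Nd}), roughly the set char.IsLetterOrDigit() accepts.
    private static readonly Regex Separators = new Regex(@"[^\p{L}\p{Nd}]+");

    public static string TextToSlugRegex(string s)
    {
        if (string.IsNullOrEmpty(s))
            return string.Empty;

        // Lowercase and drop apostrophes first, as Listing 1 does.
        string text = s.ToLowerInvariant().Replace("'", "");

        // Replace each run of separators with a single hyphen, then trim
        // any hyphen left at either end.
        return Separators.Replace(text, "-").Trim('-');
    }

    public static void Main()
    {
        Console.WriteLine(TextToSlugRegex("Don't Panic: A .NET Guide!"));
        // prints dont-panic-a-net-guide
    }
}
```

The regex version is shorter, but it builds intermediate strings and hides the hyphen rules inside a pattern, whereas the loop in Listing 1 makes each rule explicit in a single pass.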

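The code takes steps to prevent consecutive hyphens, keeping the result looking cleaner: the wasHyphen flag records whether the last character appended was a hyphen, so a run of spaces or punctuation produces only one. Because wasHyphen starts out true, separators at the start of the string are skipped, which prevents a leading hyphen, and any single trailing hyphen is removed after the loop.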
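To see these hyphen rules in action, here is a small console sketch that copies the loop from Listing 1 and runs it on a few edge cases. The class name SlugEdgeCaseDemo and the sample strings are hypothetical; the copy uses char.ToLowerInvariant() so that the output does not depend on the current culture.

```csharp
using System;
using System.Text;

// Copy of Listing 1's logic so this sample compiles on its own.
// The class name SlugEdgeCaseDemo is hypothetical.
public static class SlugEdgeCaseDemo
{
    public static string TextToSlug(string s)
    {
        StringBuilder sb = new StringBuilder();
        // Starting as "true" means separators before the first letter or
        // digit are ignored, so the slug never begins with a hyphen.
        bool wasHyphen = true;

        foreach (char c in s)
        {
            if (char.IsLetterOrDigit(c))
            {
                sb.Append(char.ToLowerInvariant(c));
                wasHyphen = false;
            }
            else if (c != '\'' && !wasHyphen)
            {
                // Only the first separator in a run becomes a hyphen.
                sb.Append('-');
                wasHyphen = true;
            }
            // Apostrophes, and separators that follow a hyphen, are dropped.
        }

        // If the text ended with separators, exactly one hyphen was appended; remove it.
        if (wasHyphen && sb.Length > 0)
            sb.Length--;

        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(TextToSlug("  --Hello,   World!--  ")); // hello-world
        Console.WriteLine(TextToSlug("C# & .NET 4.0"));           // c-net-4-0
        Console.WriteLine(TextToSlug("Don't Panic"));             // dont-panic
        Console.WriteLine(TextToSlug("!!!").Length);              // 0
    }
}
```

Note the last case: a title made entirely of punctuation produces an empty slug, so a caller may want a fallback for that situation.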

Conclusion

As you can see, it's a very simple routine, but it seems to produce good results. Of course, if you decide to name your documents this way, it'll be up to you to ensure you correctly handle different titles that resolve to the same slug (for example, "My Best Article" and "My best article!" both become my-best-article).
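One common way to handle such collisions is to append a numeric suffix until the slug is unused. The sketch below assumes a hypothetical MakeUniqueSlug helper and an in-memory HashSet of slugs already in use; on a real site that set would more likely come from the database, ideally backed by a unique index on the slug column so two simultaneous requests cannot claim the same slug.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for resolving slug collisions; not part of the original article.
public static class SlugUniqueness
{
    // Returns baseSlug if it is unused, otherwise baseSlug-2, baseSlug-3, ...
    // The chosen slug is added to existingSlugs so later calls see it.
    public static string MakeUniqueSlug(string baseSlug, ISet<string> existingSlugs)
    {
        string candidate = baseSlug;
        int suffix = 2;

        while (existingSlugs.Contains(candidate))
        {
            candidate = baseSlug + "-" + suffix;
            suffix++;
        }

        existingSlugs.Add(candidate);
        return candidate;
    }

    public static void Main()
    {
        var used = new HashSet<string>();
        // Titles such as "My Best Article" and "My best article!" share this base slug.
        Console.WriteLine(MakeUniqueSlug("my-best-article", used)); // my-best-article
        Console.WriteLine(MakeUniqueSlug("my-best-article", used)); // my-best-article-2
        Console.WriteLine(MakeUniqueSlug("my-best-article", used)); // my-best-article-3
    }
}
```

Starting the suffix at 2 is only a convention; any scheme works as long as the final slug is checked against the existing ones.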

End-User License

Use of this article and any related source code or other files is governed by the terms and conditions of The Code Project Open License.

Author Information

Jonathan Wood

I'm a software/website developer working out of the greater Salt Lake City area in Utah. I've developed many websites including Black Belt Coder, Insider Articles, and others.

I hike each week with my dogs Suki and Sasha. You can see my hiking blog at Hiking Salt Lake.