Sunday, 8 May 2011

Dot.Net: How to Delete Accents from a String

Question:
What is an easy way to delete accents from a string in C#?
How do you remove diacritic marks from characters in C#?
How do you remove accent marks from a string?
How do you transform a string containing accents into a string without accents? (Each accented letter is transformed into its accent-less equivalent.)


Answer:
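In short: how do you get rid of àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ?
The easiest way is to call the string.Normalize method with NormalizationForm.FormD, which decomposes each accented letter into its base letter followed by separate combining marks, and then to filter out every character whose Unicode category is NonSpacingMark.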
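To see why this works, here is a small sketch of what FormD does to a single accented letter. The class name DecompositionDemo and the helper Describe are made up for this illustration; only Normalize and CharUnicodeInfo.GetUnicodeCategory come from the framework.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class DecompositionDemo
{
    // Hypothetical helper: lists the code point and Unicode category
    // of every char in the FormD (decomposed) version of the text.
    public static string Describe(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        return string.Join(", ", decomposed.Select(c =>
            string.Format("U+{0:X4} {1}", (int)c, CharUnicodeInfo.GetUnicodeCategory(c))));
    }

    public static void Main()
    {
        // "\u00E9" is é stored as one precomposed character.
        Console.WriteLine(Describe("\u00E9"));
        // prints: U+0065 LowercaseLetter, U+0301 NonSpacingMark
    }
}
```

The letter e survives the filter, and the combining acute accent (U+0301) is the non-spacing mark that gets dropped.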

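Here is an extension method. It needs the System.Globalization and System.Text namespaces and, like every extension method, it must live in a static class (the name StringExtensions is arbitrary).
using System.Globalization;
using System.Text;

public static class StringExtensions
{
    public static string RemoveDiacritics(this string input)
    {
        // Null or empty input yields an empty string.
        if (string.IsNullOrEmpty(input))
        {
            return "";
        }

        // FormD splits each accented letter into base letter + combining marks.
        string decomposed = input.Normalize(NormalizationForm.FormD);

        StringBuilder sb = new StringBuilder(decomposed.Length);

        for (int i = 0; i < decomposed.Length; i++)
        {
            // Keep everything except the combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(decomposed[i]) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(decomposed[i]);
            }
        }
        return sb.ToString();
    }
}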
An even shorter version, using LINQ (it additionally needs System.Linq and System.Collections.Generic):
string normalized = input.Normalize(NormalizationForm.FormD);
IEnumerable<char> selected = normalized.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
string result = new string(selected.ToArray());
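For a quick check, the LINQ approach can be wrapped in a small console program. This is only a sketch: the names DiacriticsDemo and StripMarks are invented for the example, and the sample strings are mine.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class DiacriticsDemo
{
    // Hypothetical helper: decompose, drop non-spacing marks, rebuild the string.
    public static string StripMarks(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return "";
        }

        char[] kept = input.Normalize(NormalizationForm.FormD)
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray();
        return new string(kept);
    }

    public static void Main()
    {
        Console.WriteLine(StripMarks("Crème brûlée")); // Creme brulee
        Console.WriteLine(StripMarks("àéîõü ÀÉÎÕÜ"));  // aeiou AEIOU

        // Letters without a canonical decomposition, such as ø, pass through unchanged.
        Console.WriteLine(StripMarks("ø") == "ø");     // True
    }
}
```

Note that this only handles letters that Unicode decomposes into a base letter plus a combining mark, which covers every character in the list above. Letters such as ø or ß are single code points with no decomposition, so they would need an explicit replacement table.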

Happy programming!
