Sunday, November 8, 2009

Is this string numeric in C# part 3

Following on from Is this string numeric in C# part 2...

Curiosity got the better of me and I wanted to know what the IsDigit and TryParse functions actually do. There's a post here by Shawn Burke that details how to set up Visual Studio so that you can step into the .NET source. From that I found the IsDigit() source to be exactly what we'd surmised:

public static bool IsDigit(char c) {
  if (IsLatin1(c)) {
    return (c >= '0' && c <= '9');
  }
  return (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.DecimalDigitNumber);
}
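One practical consequence of that second branch is that char.IsDigit also accepts decimal digits from other scripts, which a plain '0'..'9' range check rejects. Here is a minimal sketch of the difference; the IsAsciiDigit helper is a name I've made up for illustration and is not part of the framework:

```csharp
using System;

public static class IsDigitDemo
{
    // Hypothetical helper: the same range check as the Latin-1 branch of IsDigit.
    public static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }

    public static void Main()
    {
        char ascii = '7';
        char arabicIndic = '\u0663'; // ARABIC-INDIC DIGIT THREE (DecimalDigitNumber)

        Console.WriteLine(char.IsDigit(ascii));       // True
        Console.WriteLine(IsAsciiDigit(ascii));       // True
        Console.WriteLine(char.IsDigit(arabicIndic)); // True
        Console.WriteLine(IsAsciiDigit(arabicIndic)); // False
    }
}
```

So the two checks are only interchangeable if the input is known to contain nothing but Latin-1 characters.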

In both scenarios (numeric and non-numeric input) you'd expect Jon's function to perform better, and even more so for the numeric string. When checking a numeric string every single character has to be examined, so the additional overhead of the IsLatin1() call should slow Guy's function down more. However, the results show Guy's function performing better when the string is numeric.

When the string is non-numeric from the first character, you'd still expect Jon's function to perform better, but only just.

I can't explain why Guy's function is out-performing Jon's at the moment. There must be a flaw in my testing setup.
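For what it's worth, two things that commonly skew this kind of micro-benchmark are timing the first (JIT-compiling) call and running with a debugger attached, which can turn off JIT optimizations. Below is a rough sketch of a harness that at least does a warm-up call before timing. Assumptions: Guy's function is taken to call char.IsDigit on each character (going by the IsLatin1 discussion above), the method names are mine, and the iteration count is arbitrary.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Range check on each character, as in Jon's function.
    public static bool IsNumericRange(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c > '9' || c < '0')
                return false;
        }
        return true;
    }

    // Assumed shape of Guy's function: char.IsDigit on each character.
    public static bool IsNumericIsDigit(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            if (!char.IsDigit(s[i]))
                return false;
        }
        return true;
    }

    // Times `iterations` calls of f(input) after one untimed warm-up call,
    // so that JIT compilation of f is not included in the measurement.
    public static TimeSpan Time(Func<string, bool> f, string input, int iterations)
    {
        f(input); // warm-up
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            f(input);
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        const int iterations = 1000000; // arbitrary
        foreach (string input in new[] { "1234567890", "x234567890" })
        {
            Console.WriteLine("{0}: range={1}, IsDigit={2}",
                input,
                Time(IsNumericRange, input, iterations),
                Time(IsNumericIsDigit, input, iterations));
        }
    }
}
```

The numbers are only meaningful from a Release build run outside the debugger, and even then I'd repeat the run a few times before reading much into small differences.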

I was, however, completely wrong about the TryParse function. It does not throw an exception, but it is a much larger and more deeply nested function than I was expecting, and it runs through dozens of code paths; that is what slows it down. Also, if you think about it, TryParse is converting from one type to another, so the complete conversion takes place, whereas our IsDigit-based function only checks whether the string could be converted to a number without actually doing the conversion. So in addition to checking each character, TryParse is also writing those characters into a buffer in order to create the numeric value. That's where the performance hit comes in.
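Because TryParse performs a real conversion, it also doesn't answer quite the same question as a digits-only check. A small sketch of where the two disagree; AllAsciiDigits is a hypothetical helper name, and I've pinned the invariant culture so the results don't depend on the machine's regional settings:

```csharp
using System;
using System.Globalization;

public static class TryParseDemo
{
    // Hypothetical helper: true when every character is in '0'..'9'.
    public static bool AllAsciiDigits(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c > '9' || c < '0')
                return false;
        }
        return true;
    }

    public static bool Parses(string s)
    {
        int value; // the converted number, which a pure check never needs
        return int.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out value);
    }

    public static void Main()
    {
        // "123":  TryParse=True,  AllAsciiDigits=True
        // "-5":   TryParse=True,  AllAsciiDigits=False (leading sign)
        // " 42 ": TryParse=True,  AllAsciiDigits=False (surrounding whitespace)
        // "99999999999999999999": TryParse=False (overflows int), AllAsciiDigits=True
        string[] inputs = { "123", "-5", " 42 ", "99999999999999999999" };
        foreach (string s in inputs)
        {
            Console.WriteLine("'{0}': TryParse={1}, AllAsciiDigits={2}",
                s, Parses(s), AllAsciiDigits(s));
        }
    }
}
```

Which behaviour is "right" depends on whether you need the number afterwards or only want to know that the string consists of digits.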

Jon also suggested assigning the char to a local variable before using it, since the index operator makes a function call, so I tried modifying his function to:

Func<string, bool> isNumeric_Jon =
    delegate(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c > '9' || c < '0')
                return false;
        }
        return true;
    };

However, this made no difference in the tests. My guess is that the compiler had already optimized the two identical references into one.


