Saturday, November 7, 2009

Is this string numeric in C# part 2

I heard back from Bill and Jon with some suggestions about improving the algorithm for speed in the IsNumeric(string) function documented at Is this string numeric in C#?

Bill's idea was to use the return value from Int64.TryParse() and Jon's idea was to compare the values of each character to their digital character equivalents and check the greater or less than in the ascii sequence using if (s[i] > '9' || s[i] < '0').

My initial gut-feel reaction was that Jon's would be faster and Bill's slower. The reason that I thought that Jon's would be faster is because I am under the impression that IsDigit() checks other digits outside of the 0 to 9 range to take into account other digit sets such as Thai. Bill's I thought would be slower because I think that TryParse() catches a thrown exception when returning false and so the expense of exception handling would cause it to be slow.

The first problem that I hit was that the original numeric string that I used:

string numericString = "12345678901234567890123456";

was too large for Int64. I didn't pick this up at first and when I ran the first set of tests they completed turned my expectations on their head with Bill's being fastest and then mine and then Jon's. However, these results were invalid as the Int64.TryParse() call was not returning the correct results.

I fixed the test code to look like this and re-ran it all.

static string numericString = "1234567890123456";
static string nonNumericString = "abcdefghijklmnopqrstuvwxyz";
static void DigitTesting()
{
    Func<string, bool> isNumeric_Guy =
        delegate(string s)
        {
            for (int i = 0; i < s.Length; i++)
            {
                if (!Char.IsDigit(s[i]))
                {
                    return false;
                }
            }
            return true;
        };

    Func<string, bool> isNumeric_Jon =
        delegate(string s)
        {
            for (int i = 0; i < s.Length; i++)
            {
                if (s[i] > '9' || s[i] < '0')
                {
                    return false;
                }
            }
            return true;
        };
    Func<string, bool> isNumeric_Bill =
        delegate(string s)
        {
            Int64 number;
            if (Int64.TryParse(s, out number))
            {
                return true;
            }
            return false;
        };

    Func<string, bool>[] functions = {isNumeric_Guy, isNumeric_Jon, isNumeric_Bill };

    foreach (Func<string, bool> function in functions)
    {
        DateTime startTime = DateTime.Now;
        for (int i = 0; i < 100000000; i++)
        {
            function(numericString);
        }
        Console.WriteLine("numberic: {0} time taken: {1}",
            function.Method.Name, DateTime.Now - startTime);
    }
    Console.ReadLine();
}


Against a numeric string IsNumeric_Guy() turned out to be about 15% faster than IsNumeric_Jon() and about 31% faster than IsNumeric_Bill().

When testing against a non-numeric string IsNumeric_Jon() showed up 16% faster than IsNumeric_Guy() and 87% faster than IsNumeric_Bill(). This is how I would have expected the results to be in both cases.

This is an interesting conundrum to end up at with two algorithms both beating each other by the same amount under different data types. Luckily I know that most of the data that this algorithm will be looking at will be non-numeric strings and so I will go with Jon's algorithm.

Thanks Bill and Jon - your ideas and help are much appreciated.

Continued in Is this string numeric in C# part 3.

1 comment:

  1. The problem with my suggestions has been that I don't know the data you're expecting. If I were trying to determine whether a value coming from a varchar column called "shopper ID" was numeric versus string, then I would expect that the data would *generally* be numeric and within 64-bit integer parameters. So a TryParse would *generally* not hit an exception or yield incorrect results.
    But with a different dataset, those expectations are out the window. I don't program to the objective best algorithm but the best algorithm for a particular task. In that particular task, my algorithm might perform far better than the objective best (best for the widest, most general situation). If the task changes, then the algorithm must be re-evaluated.

    ReplyDelete