During a code review I was told that a compiled regex would run faster. I had no doubt that this was true, but I wanted to know how much faster and at what cost, so I set up and ran the following test.
// Namespaces the listings in this post rely on:
using System;
using System.Linq;                     // Enumerable.Repeat, used by the later GetStringWithGuids() versions
using System.Text;                     // StringBuilder
using System.Text.RegularExpressions;  // Regex, RegexOptions, MatchCollection

static void TestCompiledRegex()
{
    // Matches a GUID, optionally wrapped in braces.
    string regexString = @"(\{{0,1}([0-9a-fA-F]){8}-([0-9a-fA-F]){4}-([0-9a-fA-F]){4}-([0-9a-fA-F]){4}-([0-9a-fA-F]){12}\}{0,1})";
    // Both Regex objects are created once, outside the timed code.
    Regex compiledRegex = new Regex(regexString, RegexOptions.Compiled);
    Regex uncompiledRegex = new Regex(regexString, RegexOptions.None);
    double totalFaster = 0d;
    int numIterations = 0;
    // Test inputs containing 10, 20, ... 100 GUIDs.
    for (int j = 10; j <= 100; j += 10)
    {
        TimeSpan uncompiledTime = RunRegex(uncompiledRegex, j);
        TimeSpan compiledTime = RunRegex(compiledRegex, j);
        // A ratio above 1 means the compiled regex was faster.
        double timesFaster = uncompiledTime.TotalMilliseconds / compiledTime.TotalMilliseconds;
        Console.WriteLine("For {0} GUIDs compiled takes {1} and non-compiled takes {2}. Compiled is {3:0.00} times faster", j, compiledTime, uncompiledTime, timesFaster);
        totalFaster += timesFaster;
        numIterations++;
    }
    Console.WriteLine("Average times faster: {0:0.00}", totalFaster / (double)numIterations);
    Console.ReadLine();
}

static TimeSpan RunRegex(Regex regex, int numGuids)
{
    int x;
    string input = GetStringWithGuids(numGuids);
    DateTime startTime = DateTime.Now;
    for (int i = 0; i < 10000; i++)
    {
        MatchCollection mc = regex.Matches(input);
        // Reading Count forces the lazily evaluated MatchCollection to find every match.
        x = mc.Count;
    }
    return DateTime.Now - startTime;
}

static string GetStringWithGuids(int numGuids)
{
    // One match (GUID) followed by one non-match (" spacer "), repeated numGuids times.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < numGuids; i++)
    {
        sb.Append(Guid.NewGuid().ToString() + " spacer ");
    }
    return sb.ToString();
}
The result was that the compiled regex ran 1.6 times faster than the uncompiled version.
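A note on the measurement: RunRegex() times the loop with DateTime.Now, a wall-clock reading whose resolution depends on the system timer. System.Diagnostics.Stopwatch is the class intended for measuring elapsed time. A minimal sketch of the same loop timed that way might look like the following. The class name and the `input` and `iterations` parameters are hypothetical, introduced here so the sketch stands on its own, and the figures quoted in this post come from the listed code, not from this variant.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original test.
static class StopwatchTimingSketch
{
    // Same loop as RunRegex(), but the input and iteration count are passed in
    // (an assumption made for illustration) and the timing uses Stopwatch.
    public static TimeSpan RunRegex(Regex regex, string input, int iterations)
    {
        int x;
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            MatchCollection mc = regex.Matches(input);
            // Reading Count forces every match to be evaluated.
            x = mc.Count;
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

It would be called as, for example, StopwatchTimingSketch.RunRegex(compiledRegex, GetStringWithGuids(10), 10000).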
I then split the RunRegex function into two functions and moved the creation of the Regex object into them, one creating it with RegexOptions.Compiled and the other without, so that the cost of creating the Regex was included in the timing.
On text with 10 GUIDs in it the uncompiled version ran 42 times faster than the compiled version. That advantage shrank as the number of GUIDs (matches) in the string increased, until the uncompiled version was only six times faster with 100 matches in the string.
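The two split functions are not listed in this post. A minimal sketch of what they might look like follows. The class and function names and the `input` and `iterations` parameters are hypothetical, and placing the construction inside the timed loop is an assumption: the text only says that creation moved into the functions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical reconstruction for illustration only.
static class SplitRegexTimingSketch
{
    // Same GUID pattern as in TestCompiledRegex().
    const string GuidPattern = @"(\{{0,1}([0-9a-fA-F]){8}-([0-9a-fA-F]){4}-([0-9a-fA-F]){4}-([0-9a-fA-F]){4}-([0-9a-fA-F]){12}\}{0,1})";

    // Creates a compiled Regex on every iteration (assumed placement),
    // so the construction cost is part of the measured time.
    public static TimeSpan RunCompiledRegex(string input, int iterations)
    {
        int x;
        DateTime startTime = DateTime.Now;
        for (int i = 0; i < iterations; i++)
        {
            Regex regex = new Regex(GuidPattern, RegexOptions.Compiled);
            x = regex.Matches(input).Count;
        }
        return DateTime.Now - startTime;
    }

    // Identical except that the Regex is created without RegexOptions.Compiled.
    public static TimeSpan RunUncompiledRegex(string input, int iterations)
    {
        int x;
        DateTime startTime = DateTime.Now;
        for (int i = 0; i < iterations; i++)
        {
            Regex regex = new Regex(GuidPattern, RegexOptions.None);
            x = regex.Matches(input).Count;
        }
        return DateTime.Now - startTime;
    }
}
```

If the Regex were instead created once per call, before the loop, the construction cost would be spread over all iterations and the measured gap would look different.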
The next test was to lower the ratio of matches to non-matches in the text, so I adjusted the GetStringWithGuids() function to put 100 "spacers" between each GUID. Remember that the GetStringWithGuids() listed above has one match (GUID) per non-match (" spacer "). The new function looked like this:
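// Needs "using System.Linq;" for Enumerable.Repeat.
static string GetStringWithGuids(int numGuids)
{
    StringBuilder sb = new StringBuilder();
    // 100 copies of "spacer" separated by single spaces.
    string spacer = String.Join(" ", Enumerable.Repeat("spacer", 100).ToArray());
    for (int i = 0; i < numGuids; i++)
    {
        // One match (GUID) followed by 100 non-matches.
        sb.Append(Guid.NewGuid().ToString() + " " + spacer + " ");
    }
    return sb.ToString();
}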
For 10 GUIDs the uncompiled version was 2.7 times faster, but from 50 GUIDs through to 100 GUIDs the compiled version was the faster of the two.
So the only test left was the one that was truly representative of the data I was going to run this against: a block of text with a single GUID in it.
The new GetStringWithGuids() function, with its now redundant parameter, looked like this:
// numGuids is redundant here: the text always contains exactly one GUID.
static string GetStringWithGuids(int numGuids)
{
    StringBuilder sb = new StringBuilder();
    string spacer = String.Join(" ", Enumerable.Repeat("spacer", 100).ToArray());
    // A single GUID with 100 spacers on either side.
    sb.Append(spacer + " " + Guid.NewGuid().ToString() + " " + spacer);
    return sb.ToString();
}
This showed the uncompiled version to be 10 times faster than the compiled version.