Wednesday, December 8, 2010

Speed of Contains() on string list and hash set in C#

Although I was 99.9% certain that a Contains() on a HashSet would outperform the same on a List I needed to see it happen before I could sleep again.

Here is the code that I used to test it.

int iterateNum = 100000000;
string longString = "this is a very very very long string that is in the list";
string searchString = "this is the search string";

List<string> stringList = new List<string>();
for (int i = 0; i < iterateNum; i++)
{
    stringList.Add(longString);
}
stringList.Add(searchString);

Stopwatch stopwatch = Stopwatch.StartNew();
bool contains = stringList.Contains(searchString);
Console.WriteLine("contains on list took: " + stopwatch.Elapsed.ToString());

HashSet<string> hashSet = new HashSet<string>();
for (int i = 0; i < iterateNum; i++)
{
    hashSet.Add(longString);
}
hashSet.Add(searchString);
stopwatch.Reset(); // should be stopwatch.Restart();
contains = hashSet.Contains(searchString);
Console.WriteLine("contains on hashset took: " + stopwatch.Elapsed.ToString());

And here are the results:

contains on list took: 00:00:01.2830928
contains on hashset took: 00:00:00

i.e. Was not possible to measure how long the hash set took.

In the code I'm adding the string to find as the last string in the list to make sure that the entire list is searched before it finds the string.

Edit: As per Bill's suggestion in the first comment I changed the code to populate a smaller and more realistic list and then exercised the Contains() method multiple times to get an "average" of its performance. I also replaced the Reset() method on the Stopwatch object with Restart(). Here is the new code:

int iterateNum = 4000;
int testNum = 100000;
string longString = "this is a very very very long string that is in the list";
string searchString = "this is the search string";

List<string> stringList = new List<string>();
for (int i = 0; i < iterateNum; i++)
{
    stringList.Add(longString);
}
stringList.Add(searchString);

Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < testNum; i++)
{
    bool contains = stringList.Contains(searchString);
}
Console.WriteLine("contains on list took: " + stopwatch.Elapsed.ToString());

HashSet<string> hashSet = new HashSet<string>();
for (int i = 0; i < iterateNum; i++)
{
    hashSet.Add(longString);
}
hashSet.Add(searchString);

stopwatch.Restart(); // .NET 4 method on Stopwatch
for (int i = 0; i < testNum; i++)
{
    bool contains = hashSet.Contains(searchString);
}
Console.WriteLine("contains on hashset took: " + stopwatch.Elapsed.ToString());

And the results are:

contains on list took: 00:00:04.6986760
contains on hashset took: 00:00:00.0056837