Thursday, December 31, 2009

Five best things I didn't get for Christmas

I'm posting this here so that I can find it again: Lifehacker's most popular Hive Five topics of 2009.

I hadn't seen this before I bought my netbook, but it turns out that the one I bought is an updated version of the most-voted netbook in their Five Best Netbooks post.

Wednesday, December 23, 2009

Managing IP addresses as integers

I was recently involved in a discussion about IP addresses and discovered that not everyone understands what an IP address is. I was told that an IP address is not an integer. Well, it is: an IPv4 address is a 32-bit integer, but it is rarely represented as such. It is usually written as four octets (bytes) joined together with periods.

This is what most people know as an IP address: 127.0.0.1. If I showed you the value 2130706433 you probably wouldn't recognize it as the same IP address.
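To see why 2130706433 is the loopback address 127.0.0.1, read the four octets as base-256 digits, with the first octet in the most significant position:

```latex
127 \cdot 256^3 + 0 \cdot 256^2 + 0 \cdot 256^1 + 1 \cdot 256^0 = 2130706432 + 1 = 2130706433
```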

I blogged some extension methods in Converting a string IP to an int and back that will help you convert an IP between the two formats using C#. The code is fairly obvious and easily ported to other languages.

The question often asked is: why would you need to convert it?

The answer is speed and space.

Most often, when you receive an IP address it is in string notation. Storing it in this notation is inefficient because it takes between 7 bytes (x.x.x.x) and 15 bytes (xxx.xxx.xxx.xxx) to store each address at one byte per character. Compare this to storing it as an Int32, which takes 4 bytes every time: a space saving of between roughly 43% and 73%.
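Those percentages come from comparing the 4-byte integer with the shortest and longest dotted strings, assuming one byte per character:

```latex
\frac{7 - 4}{7} \approx 42.9\%, \qquad \frac{15 - 4}{15} \approx 73.3\%
```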

My advice is to convert it into an Int32 as soon as you receive it and then work with it as such from there onwards. Working with an IP as an Int32 will be faster because sorting and comparing Int32s is quicker than sorting and comparing strings. An index on an Int32 column in a database will also be faster and smaller than the same index on a varchar column.
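One caveat worth hedging on: Int32 is signed, so every address from 128.0.0.0 upward comes out negative and sorts before 0.0.0.0 (192.168.0.1, for example, sorts ahead of 10.0.0.1). The sketch below uses a hypothetical IpOrdering class to show the same 32 bits held as a uint, which keeps numeric order and address order in step, and reinterpreted as the signed value an Int32 column would hold. It assumes well-formed dotted input and does no validation.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; assumes well-formed "a.b.c.d" input.
public static class IpOrdering
{
    // Pack the four octets into an unsigned 32-bit value, first octet in the high byte.
    public static uint ToUInt32(string ip)
    {
        byte[] octets = ip.Split('.').Select(part => byte.Parse(part)).ToArray();
        return ((uint)octets[0] << 24) | ((uint)octets[1] << 16) |
               ((uint)octets[2] << 8) | octets[3];
    }

    // The same bits reinterpreted as a signed Int32, as an Int32 column would store them.
    // Addresses from 128.0.0.0 upward become negative.
    public static int ToInt32(string ip)
    {
        return unchecked((int)ToUInt32(ip));
    }
}
```

If the database has no unsigned 32-bit type, storing the uint value in a 64-bit integer column is one way to keep index order matching address order.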

When you need to show an integer IP to a user you'll need to convert it back to its string notation. This is easily done with the extension methods linked above.

The only drawback to this approach is when you list IPs directly from your database. Stored as strings they'd be easy to recognize, but listed as integers they're pretty useless.

Converting a string IP to an int and back

I often need to convert a string IP to an Int32 and then back again. To facilitate converting an Int32 to its string representation I have added this extension method to my IntExtensions class in C#:

public static class IntExtensions
{
    /// <summary>
    /// Converts an integer that represents an IPv4 address to the string equivalent
    /// Source:
    /// </summary>
    /// <param name="solidIP">The integer representation of an IPv4 address</param>
    /// <returns>A string of the format x.x.x.x representing an IPv4 address</returns>
    public static string ToIPv4(this int solidIP)
    {
        // Shift each octet down into the low byte; the byte cast keeps only those 8 bits
        byte ip1 = (byte)(solidIP >> 24);
        byte ip2 = (byte)(solidIP >> 16);
        byte ip3 = (byte)(solidIP >> 8);
        byte ip4 = (byte)(solidIP);
        return ip1 + "." + ip2 + "." + ip3 + "." + ip4;
    }
}

And to convert the string representation of an IP to an Int32 I have added this extension method to my StringExtensions class:

using System;

public static class StringExtensions
{
    /// <summary>
    /// Converts a string that holds an IPv4 address with the pattern x.x.x.x
    /// to the integer equivalent. If the string cannot be split into 4 parts
    /// using a . or if the 4 parts cannot be parsed
    /// into bytes then 0 will be returned.
    /// Source:
    /// </summary>
    /// <param name="IP">A string representing an IPv4 address</param>
    /// <returns>An integer representing an IPv4 address or zero if failure</returns>
    public static int ToIPv4(this string IP)
    {
        string[] fourIP = IP.Split('.');
        byte b0, b1, b2, b3;

        if (fourIP.Length == 4 &&
            Byte.TryParse(fourIP[0], out b0) &&
            Byte.TryParse(fourIP[1], out b1) &&
            Byte.TryParse(fourIP[2], out b2) &&
            Byte.TryParse(fourIP[3], out b3))
        {
            // The first octet goes into the high byte, the last into the low byte
            Int32 ip =
                (b0 << 24) +
                (b1 << 16) +
                (b2 << 8) +
                b3;
            return ip;
        }

        return 0;
    }
}
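If you'd rather not hand-roll the parsing, the framework's System.Net.IPAddress class can do the work. This is a swapped-in alternative, not the extension methods above, and the IPAddressConversions class and its method names are hypothetical. Be aware that IPAddress.TryParse is more lenient than a strict x.x.x.x check (it accepts shortened forms such as "127.1"), so validate separately if you need exactly four parts.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical alternative built on System.Net.IPAddress.
public static class IPAddressConversions
{
    // Returns false for anything that does not parse as an IPv4 address.
    public static bool TryToUInt32(string text, out uint value)
    {
        value = 0;
        IPAddress address;
        if (!IPAddress.TryParse(text, out address) ||
            address.AddressFamily != AddressFamily.InterNetwork)
            return false;

        // GetAddressBytes returns network (big-endian) order: first octet first.
        byte[] b = address.GetAddressBytes();
        value = ((uint)b[0] << 24) | ((uint)b[1] << 16) | ((uint)b[2] << 8) | b[3];
        return true;
    }

    public static string FromUInt32(uint value)
    {
        // Split the value back into four octets, high byte first.
        byte[] bytes =
        {
            (byte)(value >> 24), (byte)(value >> 16), (byte)(value >> 8), (byte)value
        };
        return new IPAddress(bytes).ToString();
    }
}
```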


Monday, December 21, 2009

Vertical Wireless Router

While researching a new wireless router I read a customer review of a particular router model that said it would overheat if left in the horizontal position, causing connections to be dropped. This intrigued me because my router/modem/wireless access point has always lived in the horizontal position and is prone to rebooting when a wireless device connects to it. My theory is that more of the device is in use when a wireless device is connected to it, so it might get hotter.
Could this be the problem with the flaky wireless that I have?
After reading that I tried standing the router vertically and have been using the wireless aggressively, and so far (a couple of days) we have had no disconnects. So far the evidence points to overheating in the horizontal position. The jury is still out on this one, though, because I know as well as anyone else that this could be a fluke and we may return to flakiness in the near future. I'll update this post in a couple of months with more info.

Saturday, December 19, 2009

Belkin N+ Wireless Router

The ASUS netbook that I mentioned in this post arrived yesterday and I've started playing with it. So far it's good and a bit faster than I was expecting. Before buying it I had planned to make only two changes: replace the 250GB drive with a 128GB SSD (solid state drive) and upgrade the RAM from 1GB to 2GB. However, it's a lot nippier than expected so I won't worry about that yet. It arrived with Windows 7 Starter. I stuck in a thumb drive and immediately installed Windows 7 Ultimate. Getting it to boot from the thumb drive was really tricky because the hot key that gets you into the BIOS, where you pick the "boot from USB" option, was difficult to time. You basically had to hit the Escape key twice immediately after powering it on.

It supports the new (yet to be finalized) higher-speed 802.11n wireless standard. My current wireless access point only goes up to 802.11g and is flaky (update on why it might be flaky), often knocking the modem offline if a wireless device is connected to it. As such I've been doing a whole ton of research to see what I can add to my network that will give me the best quality wireless and hopefully last several years. I hate shopping for stuff like this that I'm not an expert in, as I never know whether I'm getting a good deal. I'm also suspicious of the specs published by many of the comparison sites and even by the manufacturers, as they are sometimes inaccurate, and I never know if the customer reviews are written by the company trying to boost its product or by a competitor trying to discredit it.

Anyway, I had to make a decision and finally went with the Belkin N+ Wireless Router. I was comparing it to the Belkin N Wireless Router and the only difference that I could find was that the N+ has gigabit ports while the N has 10/100 Mbps ports. Considering that I'm all about speed and constantly want to saturate my network, I selected the former. I'll report back when it's installed.

Tuesday, December 15, 2009

Windows Powershell v2

For whatever reason it just seems impossible for any search engine to provide the correct link to download PowerShell 2 so I'm making a note of it here on my blog so that I can easily find it again:


ASUS Eee PC Seashell 1005HA

I'm in the process of buying:

ASUS Eee PC Seashell 1005HA-PU17-BU Royal Blue 10.1" WSVGA Netbook

from NewEgg. I will post more details here once I get the product.

I tried to customize the HP Mini 311 on the HP web site but it wouldn't let me order one with 3GB of RAM and Windows XP. It insisted that I order Windows 7, at an extra $50, if I wanted 3GB of RAM. According to the sales rep and the web site, Windows XP does not support 3GB of RAM. Tell that to the millions of users with 3GB of RAM and Windows XP.

If you're thinking about buying one of these then wait a couple of days until after I've bought it. The price usually plummets as soon as I've bought something like this.


12/18/2009: First impressions of the ASUS Netbook.

Monday, December 14, 2009

XMarks Bookmark Synchronization

Last week I installed XMarks Bookmark Synchronizer and I am impressed. I use 4 browsers (sometimes 6 if you include IE6 and IE7), and keeping my bookmarks in sync on all the browsers and across all machines is a real problem. I regularly use about 5 different machines, so with my 4 regular browsers that makes 20 browsers that I would like to keep in sync. XMarks makes this a breeze.

XMarks supports IE8 (probably IE6 and IE7 but haven't installed on these so don't know), Firefox, and Chrome. I don't believe that it supports Opera or at least I haven't found something that will work with Opera and XMarks yet.

Install it first on the browser that has your definitive or most comprehensive set of bookmarks, and when walking through the wizard select the option to synchronize your bookmarks. On subsequent browsers, select the option to blow away your local bookmarks and use the ones on the server.

After that, any bookmark you add, delete or modify on any browser will be synchronized with all other browsers and across multiple machines.

Very impressed with this free utility.

Wednesday, December 9, 2009

NTLM Active Directory Integration in Firefox

At work we use Active Directory to authenticate our internal websites. This is great if you're using Internet Explorer because it will pass through to the application without requiring further authentication. However, on Firefox you're required to make some changes to get this to work.

Open Firefox and go to about:config

Filter on "auth"

Set all booleans to true

Set network.automatic-ntlm-auth.trusted-uris to a comma separated list of domains that you want AD to do pass through authentication on.

That last instruction never used to work for me. I used to put in the full domain name but for some reason it didn't like it. I've now learned that you only need to put in the trailing part of the domain (TLD/Top Level Domain) and it will authenticate all domains and sub-domains for you.

Say you work for IBM and your internal domains follow a pattern of - all you need to do is add "" to the trusted-uris setting and it will work for you. In fact, all you need to add is ".ibm".

Another thing I often need to do is connect to a site by IP. This also works if you add the last octet of the IP to this list, e.g. ".164". (Obviously you could cover all IP addresses by adding all 256 possible octets.)

Here is how to generate all the octets using PowerShell:

$numbers = 0..255 | %{".{0}" -f $_}
$octets = [string]::join(",", $numbers)

Monday, December 7, 2009

The proxy server is refusing connections in Firefox

If you're a developer and you're using Firefox and you suddenly get this message:

The proxy server is refusing connections

It could be because you closed Fiddler but still have Firefox set to send traffic to Fiddler. Just click on the Fiddler link in your status bar and switch it off or restart Fiddler.

ASP.NET MVC GetTypeHashCode() no suitable method found to override

I was getting this error when running a page from an ASP.NET MVC site and it was driving me batty:

Error Message: c:\Windows\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files\root\296bde83\3fd88bdf\App_Web_create.aspx.1486a709.vkhqok-s.0.cs(291): error CS0115: 'ASP.views_mycontroller_create_aspx.GetTypeHashCode()': no suitable method found to override

I fixed it by deleting the old view, creating a new one and pasting in the bits I'd already written, testing after each paste to track down the culprit. The pasting didn't reveal anything.

I then compared the old view with the new one and discovered that the only difference was in the @ Page directive:

Old View: Inherits="System.Web.Mvc.ViewUserControl<Model.EditorViewModel>"

New View: Inherits="System.Web.Mvc.ViewPage<Model.EditorViewModel>"

Not sure how I ended up with ViewUserControl in the old view instead of ViewPage, but that was my error, and changing it fixed the problem.

Thursday, December 3, 2009

Changing Graffiti to use Google Analytics Async

Google have just released an asynchronous version of their Google Analytics site tracking code. The main advantage of this is that you can have the code start executing and collecting stats as soon as the page has started loading (because you can put it higher up in the HTML) and it won't interfere with your load times because it's asynchronous.
This blog is running on Graffiti CMS and I was pleasantly surprised at how easy it was to change to the asynchronous version of analytics.
When logged in to your Graffiti CMS site as an admin click on the Control Panel link and then on Site Options. From Site Options click on the Settings box.

At the bottom of the "Your Site Options" page you will see two text areas. If you are already using Google Analytics then you will probably have your code in the first box labeled Web Statistics. Putting your original analytics code in this box is good because this puts the analytics code at the bottom of your page just before the </body> tag.
However, now that we have an asynchronous version of analytics, we should put the new code in the second box, the Header box, because this puts the analytics JavaScript in the <head> tag so that it starts loading earlier. (Positioning it in the header won't by itself make your page load faster, but because it's asynchronous it won't hold up the rest of the page.) This is what these two boxes will look like when you've finished editing.

Before you delete your old analytics code from the Web Statistics text area you need to save your site's UA code. This is the only part of your old code that you'll need.
Then you will want to paste the following into the Header part of this page:
<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXX-X']);
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script');
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '';
    ga.setAttribute('async', 'true');
    document.documentElement.firstChild.appendChild(ga);
  })();
</script>
and replace the UA-XXXXX-X part with your value.
It really is as easy as that. I suddenly have newfound respect for Graffiti CMS.

Monday, November 30, 2009

Hard Disk speed comparison

These are five hard disks that I have in three machines that I frequently access.

Machine | Drive | MB/s (write/read) | Model | Description
Pavilion | C | 35/35 | ST3250823AS | Seagate Barracuda 7200.8 ST3250823AS 250GB 7200 RPM 8MB Cache SATA 1.5Gb/s 3.5" Hard Drive -Bare Drive
Pluto | C | 225/225 | PFZ128GS25SSDR | Patriot Torqx PFZ128GS25SSDR 2.5" Internal Solid state disk (SSD)
Pluto | E | 60/120 | ST32000542AS | Seagate Barracuda LP 2TB 3.5" SATA 3.0Gb/s Hard Drive -Bare Drive
Monster | C | 150/200 | WD3000GLFS-01F8U0 | Western Digital VelociRaptor 300GB 3.5" SATA 3.0Gb/s Internal Hard Drive -Bare Drive
Monster | D | 90/110 | ST31500341AS | Seagate Barracuda LP 1.5TB 3.5" SATA 3.0Gb/s Hard Drive -Bare Drive

I was thinking of replacing the 250GB Seagate in the Pavilion to improve performance on that machine, but wanted an idea of what sort of performance boost I might get, so I copied some hard disk benchmarking software (ATTO) onto each of the machines and ran it.

The MB/s column shows the write/read speeds measured for each disk. This is a rough average for each disk.

My surprises were as follows:

  • The solid state drive (SSD) was not much faster than the VelociRaptor.
  • The Seagate 1.5TB had a faster write time than the 2TB. (It doesn't feel faster; it feels slower.)
  • The 2TB drive is less than twice as fast as the old 250GB Seagate. (Again, the 2TB drive feels much faster.)

I've decided to replace the Pavilion's 250GB Seagate with a 2TB Seagate. The bottleneck with that machine at the moment is the hard disk and I'm confident that I'm going to get at least a triple speed improvement even though the numbers don't give that.

Since I installed my Windows Home Server nothing has crashed on my network, so I have been unable to try out recovering a hard disk from its backup. I plan on using this new disk as an opportunity to try this with the Pavilion machine. I will add the new hard disk and then set the BIOS to treat this disk as the boot drive. I think that I then have to generate a recovery CD from which this machine will boot and restore the OS etc. to the new disk.

I'll post a link to my results here when done.

Destroying a Hard Disk

A number of years ago I got myself a Maxtor 250GB external hard disk which at the time I used for all my backups. It cost me a fair penny then (2003) and I can now in 2009 buy 4TB of space for the same amount. Don't think that I'm complaining though, I think that's great progress.

Like all good hard drives it eventually died, but it died a slow, bit-by-bit, corrupting death. It started to become inconsistent (that's when I should have taken an image of it) and then started to corrupt and fail to read data. Luckily I managed to recover all my important data and just lost a few albums of songs and movies that I could easily buy again.
The drive had only lasted about 2 years and Maxtor refused to entertain the idea of replacing it. Knowing how much I'd spent on it, I didn't want to throw it away, so it went into the box of computer parts that every geek has.
Years passed, and during a cleanup I agreed with myself that it had to go. I really couldn't remember what else was on it, but I was pretty sure that a good hardware recovery technician could get data off it, so I wanted to destroy it before disposing of it.
To this end I pulled out a drill and started drilling holes in it. The white box in the photo below shows a broken drill bit; the other end of the bit is lying on top of the circuit board.

With a larger and stronger drill bit I was able to get some deeper and more crippling holes into the disk before I decided that I had sufficiently limited its data recoverability.

When you read the instructions on how to handle these devices they always tell you to discharge any static before handling and to work in a static-free environment etc. I ignored all of that when I was doing this.

Wednesday, November 25, 2009

Age Analyzer

As part of the text classification work that I do I follow the uClassify blog, and they have just released a new web site called Age Analyzer, which guesses the age of a blog's author. I plugged in the URL of this site and it put me in the 26-35 bracket. This would be flattering if you were to look at me, but I find it insulting that my writing is considered this immature. It should have put me in the 36-50 bracket.

My friend Rob Manderson's eloquent prose has aged his Ultramaroon Rises Again blog in the 65-100 range (I'm sure you're not that old Rob). Dan Esparza's Esta Nublado puts him at a very young 18-25, and Bill Brown's New Clarion puts him in the 65-100 bracket while his bblog looks a little more accurate at 26-35.

The Evil Solution User Option SUO file

I have a Visual Studio solution that had started taking longer and longer to load. In fact it was taking up to 5 minutes to load, which meant that I'd leave the solution open for days just so that I didn't have to reload it. Another symptomatic clue that all was not well in Denmark was that it took several minutes for the IDE to become usable after I stopped a debug session with Shift+F5.

My first stab at a solution was to create a new solution in a different directory and add all my projects to it. It worked like a charm: the new solution loaded lightning fast (a couple of seconds) and I could exit a debug session in about 3 seconds.

Although happy that I'd solved the problem, I was then lucky enough to notice that in the original folder, next to the solution's .sln file, there was a Solution User Options (.suo) file that was a whopping 11 MB in size. I deleted this file, opened the original solution, and discovered that it is now loading and exiting debug lickety-split again.

It might be my unorthodox way of exiting debug sessions with Shift+F5 that's causing this file to grow. I'll keep an eye on its future growth and report back here if I learn anything new.

Monday, November 23, 2009

Even Faster Web Sites

I've just been reading Even Faster Web Sites. I had never heard of Comet before I read this, but it makes sense and I'm looking forward to trying it.

Comet uses a long-held HTTP request that allows a web server to push data to a browser without the browser explicitly requesting it.

I was a little bit surprised that the book only gave two pages to CSS Sprites. My testing shows that sprites can significantly improve web site load times if the site has multiple small images coming from the server.

Web app efficiency is going to play a more and more significant role as we return to the mainframe model with cloud computing. Many cloud computing providers charge by processor cycles, so the fewer cycles your app uses, the less expensive it is to run in the cloud. On top of that, because you are processing your requests faster, your site will be more competitive.

Sunday, November 22, 2009

SQL Injection Attack part 2

Since I last wrote about a SQL injection attack that one of my sites received, I took measures to prevent it and now immediately reject any URL with @(cast in it without processing it any further. This has worked well over the last year and a half and no further attacks of that type have made it into the logs.

I have now started to see URL requests with the following pattern:


The significant part comes after the =3:

 and |+user+|=0

No idea what they're trying to achieve with this...
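A rejection like the "@(cast" check described at the start of this post can be sketched as a case-insensitive substring test on the decoded URL. The UrlFilter class and its marker list below are hypothetical illustrations, not the code running on my site. A blocklist like this only catches patterns you already know about; it is not a substitute for parameterized queries on the database side.

```csharp
using System;

// Hypothetical filter: rejects any URL whose decoded text contains a blocked marker.
public static class UrlFilter
{
    private static readonly string[] BlockedMarkers = { "@(cast" };

    public static bool ShouldReject(string rawUrl)
    {
        if (string.IsNullOrEmpty(rawUrl))
            return false;

        // Decode %xx escapes so "%40(cast" is caught as well as "@(cast".
        string decoded = Uri.UnescapeDataString(rawUrl);

        foreach (string marker in BlockedMarkers)
        {
            if (decoded.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
                return true;
        }
        return false;
    }
}
```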

Friday, November 20, 2009

Rotate or flip image through Y-axis

I've been trying to work out how to rotate an image through the Y-axis and display its mirror image on a web page. One of the problems that I faced was finding the right word(s) to search on, as "rotate" (so I discovered) usually refers to rotation around the Z-axis.
I finally found a solution in Silverlight 3. Here's the code...
<Image Source="Jellyfish.jpg">
    <Image.Projection><PlaneProjection RotationY="180" /></Image.Projection>
</Image>

...which does exactly what I'm looking for.
My favorite image editing program (Paint.Net) refers to this as Flip Horizontal which I think is completely misnamed.

(Images: the original with RotationY="0" and its mirror image with RotationY="180".)

Hard disk capacity is getting smaller

I'm just about to go to Fry's Electronics and get myself a Seagate Barracuda LP 2TB 3.5" SATA 3.0Gb/s Hard Drive -Bare Drive (that link is to the same drive but at NewEgg). I was curious to know how big this 2TB drive would really be, so I did a quick calculation and discovered that it will be 1.82 TB. As most people know, disk drive manufacturers calculate disk sizes to make them appear bigger than they actually are: they measure a KB as 1,000 bytes instead of 1,024 bytes.

This caused me to do a little calculation to see what we were getting when buying drive space:

1MB (advertised) = 0.95 MB (actual)

1GB = 0.93 GB

1TB = 0.91 TB

1PB = 0.89 PB

Do you see the progression? They're getting smaller. Eventually that number will be 0.00 Xb.

A PB is a petabyte, which I have always considered to be 1,024^5 bytes. However, according to Wikipedia I'm wrong: it's 1,000 terabytes, not 1,024 terabytes, and a pebibyte, abbreviated PiB, is what I should be using. Apparently all my other references are wrong too, and I should be using MiB, GiB, TiB, and PiB as abbreviations.
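The figures above are just the decimal unit divided by the binary unit, 1000^n / 1024^n, where n is 2 for MB, 3 for GB, 4 for TB and 5 for PB. A minimal sketch with a hypothetical DiskUnits helper:

```csharp
using System;

// Hypothetical helper: how much of one binary unit (1024^n bytes) you get
// for one advertised decimal unit (1000^n bytes).
public static class DiskUnits
{
    // power: 2 = MB, 3 = GB, 4 = TB, 5 = PB
    public static double AdvertisedToBinary(int power)
    {
        return Math.Pow(1000, power) / Math.Pow(1024, power);
    }
}
```

For the 2TB drive, 2 × AdvertisedToBinary(4) comes out at about 1.82, which matches the size calculated above.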



Tuesday, November 17, 2009

Get Powershell Version Number

I wish that they'd aliased the old DOS "ver" command to Powershell's Get-Host command.

If you type "alias" at the Powershell command prompt you'll see a ton of aliases that they've set up: "dir", "copy", "cls", "compare", "md", "cd" etc. These greatly ease the transition to Powershell's commands if you want to quickly drop into a command prompt and have a PowerDOS environment.

I know that I can easily configure Powershell to alias "ver" but the only time I usually need this is when I'm on a new machine and don't know what version is running. After that I don't need it anymore. Even "Get-Ver" would be more helpful and logical than Get-Host.

Anyway, if you haven't figured it out by now:

ver = Get-Host


Windows Management Framework Core is already installed on your system

I've been trying to install Windows Update KB968930 on my XP SP3 machine and kept on getting this error:

Windows Management Framework Core Setup Error
Setup cannot proceed. Windows Management Framework Core is already installed on your system.

This update, I believe, is Powershell 2.0 (final)

To install Powershell 2.0 you first have to uninstall Powershell 1.0

I finally solved it and got it installed by uninstalling some stuff that has to be removed before you uninstall Powershell 1.0. This is what I uninstalled and the order in which I did it:

  1. Update for Microsoft Windows (KB971513)
  2. Update for Windows Internet Explorer (KB975364)
  3. Microsoft Base Smart Card Cryptographic Service
  4. Windows Powershell 1.0 MUI Pack
  5. Windows Powershell 1.0
  6. Windows Management Framework Core

UPDATE 12/22/2009: You may find that after you've installed Powershell 2 you see an exception from mscorelib.exe when you reboot your computer, and this error repeats three times. If you look in the Event Viewer under System you will see an error from Service Control Manager: The .NET Runtime Optimization Service v2.0.50727_X86 service terminated unexpectedly.

To solve this problem you need to run the following in the new version of Powershell that you've just installed:

Set-Alias ngen (Join-Path ([Runtime.InteropServices.RuntimeEnvironment]::GetRuntimeDirectory()) ngen.exe)
ngen update

There will be pages and pages of informational text displayed, including errors and warnings, and this will take a fair amount of time to run. Be patient and don't panic.



Friday, November 13, 2009

Win7 slam the sides feature

I frequently need to view two web pages side by side, but I usually find that I have those two pages open in separate tabs in the same browser.

What I used to do would be the following:

  1. Copy the URL from one of the tabs.
  2. Minimize all windows on the desktop (Win + D)
  3. Open a second browser.
  4. Paste in the URL and hit enter.
  5. Restore the original browser.
  6. Right click the task bar and select "Show windows side by side" (or whatever the XP equivalent was).

I've found two shortcuts to improve this process. One of them from Windows 7 and the other seems to be a new(ish) feature on modern browsers that I just stumbled across.

  1. Click the title bar of the browser with the mouse and slam it onto the right or left edge of the screen. (This will cause the browser to fully occupy one vertical half of your screen.)
  2. Drag the tab off the browser and it will immediately create a new window.
  3. Slam that title bar into the other edge.

I've discovered that dragging the tab away from the tab bar will create a new browser window in Firefox, Chrome and Opera, but I haven't been able to get this to work in IE8.

In Firefox and Chrome you can drag the tab down onto the current page and it will pop up a new window. However, in Opera you need to drag the tab off the browser to get it to create a new window. That's why it's better to slam the browser against one side before dragging the tab: you then know that you have space to drag the tab onto.

Monday, November 9, 2009

Speed improvements with compiled regex

During a code review I was told that a compiled regex would work faster. I had no doubt that this was true, but I wanted to know how much faster and at what cost, so I set up and ran the following test.

using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

static void TestCompiledRegex()
{
    string regexString = @"(\{{0,1}([0-9a-fA-F]){8}-([0-9a-fA-F]){4}-([0-9a-fA-F]){4}-([0-9a-fA-F]){4}-([0-9a-fA-F]){12}\}{0,1})";
    Regex compiledRegex = new Regex(regexString, RegexOptions.Compiled);
    Regex uncompiledRegex = new Regex(regexString, RegexOptions.None);

    double totalFaster = 0d;
    int numIterations = 0;
    for (int j = 10; j <= 100; j += 10)
    {
        TimeSpan uncompiledTime = RunRegex(uncompiledRegex, j);
        TimeSpan compiledTime = RunRegex(compiledRegex, j);
        double timesFaster = uncompiledTime.TotalMilliseconds / compiledTime.TotalMilliseconds;
        Console.WriteLine("For {0} GUIDs compiled takes {1} and non-compiled takes {2}. Compiled is {3:0.00} times faster", j, compiledTime, uncompiledTime, timesFaster);
        totalFaster += timesFaster;
        numIterations++; // count the runs so the average below divides by the number of runs
    }
    Console.WriteLine("Average times faster: {0:0.00}", totalFaster / numIterations);
}

static TimeSpan RunRegex(Regex regex, int numGuids)
{
    int x;
    string input = GetStringWithGuids(numGuids);
    // Stopwatch has far finer resolution than DateTime.Now for timing
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 10000; i++)
    {
        MatchCollection mc = regex.Matches(input);
        x = mc.Count; // reading Count forces every match to be evaluated
    }
    return stopwatch.Elapsed;
}

static string GetStringWithGuids(int numGuids)
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < numGuids; i++)
        sb.Append(Guid.NewGuid().ToString() + " spacer ");
    return sb.ToString();
}

The result was that the compiled regex ran 1.6 times faster than the uncompiled version.

I then split the RunRegex function into two functions and moved the creation of the Regex object into these functions making one of them compiled and the other not.

On text with 10 GUIDs in it the uncompiled version ran 42 times faster than the compiled version. The advantage diminished as the number of GUIDs (matches) in the string increased, until the uncompiled version was six times faster with 100 matches in the string.

The next test was to decrease the ratio of matches to non-matches in the text, so I adjusted the GetStringWithGuids() function to add 100 "spacers" between each GUID. Remember that the GetStringWithGuids() listed above has 1 match (GUID) per non-match (" spacer "). The new function looked like this:

static string GetStringWithGuids(int numGuids)
{
    StringBuilder sb = new StringBuilder();
    // 100 "spacer" words between each GUID
    string spacer = String.Join(" ", Enumerable.Repeat("spacer", 100).ToArray());
    for (int i = 0; i < numGuids; i++)
        sb.Append(Guid.NewGuid().ToString() + " " + spacer + " ");
    return sb.ToString();
}

For 10 GUIDs the uncompiled version performed 2.7 times better but at 50 GUIDs the compiled version started performing better through to 100 GUIDs.

So the only test left was the one that was truly representative of the data that I was going to run this against which was a block of text with a single GUID in it.

The new GetStringWithGuids() function, now with a redundant parameter, looked like this:

static string GetStringWithGuids(int numGuids)
{
    StringBuilder sb = new StringBuilder();
    string spacer = String.Join(" ", Enumerable.Repeat("spacer", 100).ToArray());
    // numGuids is ignored: this version always embeds exactly one GUID
    sb.Append(spacer + " " + Guid.NewGuid().ToString() + " " + spacer);
    return sb.ToString();
}

This showed the uncompiled version to be 10 times faster than the compiled version.
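Several of these runs fold Regex construction into the timing, and construction is where RegexOptions.Compiled pays a large part of its up-front cost. A sketch that times construction and matching separately with System.Diagnostics.Stopwatch might look like this; the RegexCost name and the tuple it returns are hypothetical, chosen for illustration:

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper: separates the one-off construction cost of a Regex
// from the cost of running it repeatedly against the same input.
public static class RegexCost
{
    public static (TimeSpan Construct, TimeSpan Match, int MatchCount) Measure(
        string pattern, RegexOptions options, string input, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        Regex regex = new Regex(pattern, options); // Compiled does much of its extra work here
        TimeSpan construct = sw.Elapsed;

        sw.Restart();
        int matchCount = 0;
        for (int i = 0; i < iterations; i++)
        {
            // Reading Count forces every match in the input to be found.
            matchCount += regex.Matches(input).Count;
        }
        TimeSpan match = sw.Elapsed;

        return (construct, match, matchCount);
    }
}
```

Comparing Construct plus Match for each option, at the number of iterations you actually expect in production, is the trade-off these tests are circling around.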

Sunday, November 8, 2009

Is this string numeric in C# part 3

Following on from Is this string numeric in C# part 2...

Curiosity got the better of me and I wanted to know what the IsDigit and TryParse functions did. There's a post here by Shawn Burke that details how to setup Visual Studio to allow you to step into the .NET source. From that I found the IsDigit() source to be exactly what we'd surmised:

public static bool IsDigit(char c) {
  if (IsLatin1(c)) {
    return (c >= '0' && c <= '9');
  }
  return (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.DecimalDigitNumber);
}

In both scenarios you'd expect Jon's function to perform better and even more so in the numeric string. When checking the numeric string every single character needs to be checked so the additional overhead for the IsLatin1() call should slow down Guy's function more. However, the results show Guy's function performing better when the string is numeric.

When the string is non-numeric from the first character you'd expect Jon's function to perform better, but only just.

I can't explain why Guy's function is out-performing Jon's on the numeric string. There must be a flaw in my testing setup.

I was, however, completely wrong about the TryParse function. It does not throw an exception, but it is a much larger and more deeply nested function than I was expecting, and it executes dozens of code paths; this is what's slowing it down. Also, if you think about it, TryParse is converting from one type to another, so the complete conversion takes place, while our IsDigit function just checks whether it's possible to convert the string to a number without actually doing the conversion. So in addition to checking each character, TryParse is also accumulating those characters into a numeric value. That's where the performance hit comes in.

Jon also suggested modifying his function to assign the char to a local variable before using it, because the index operator makes a function call, so I tried modifying his function to:

Func<string, bool> isNumeric_Jon =
    delegate(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];  // read the character once
            if (c > '9' || c < '0')
                return false;
        }
        return true;
    };

However, this made no difference in the tests. My guess is that the compiler had already optimized the two identical references into one.



Saturday, November 7, 2009

Up or down for version number on C# 4.0?

Not sure if you're up to speed on the compiler version numbering that's been used with C#, but just in case you aren't, this is the version number the compiler reports for itself when run from the \windows\\framework\ folders...

We went from version 7 to 8 and then to 3.5...
That's not the most logical progression of numbers that I've ever seen. I haven't installed the C# 4.0 beta compiler yet so I don't know what its version number will be. If I ever go for a job interview with Microsoft I am definitely going to find out, because their job IQ test is bound to have a number progression question asking what comes after 7, 8, 3.5, __ and I will know the answer.

Standing on the shoulders of giants

As arrogant as my domain name is, "Guy Ellis Rocks Dot Com", I only chose it because Standing On The Shoulders Of Giants Dot Com was already taken. I've always attributed this quote to Sir Isaac Newton but I've just learned that it was originally coined by Bernard of Chartres.

I'll take a moment to be humble and acknowledge that almost everything you see on this blog is a synthesis of the work of giants who have helped me and whose work I have read.

Is this string numeric in C# part 2

I heard back from Bill and Jon with some suggestions about improving the algorithm for speed in the IsNumeric(string) function documented at Is this string numeric in C#?

Bill's idea was to use the return value from Int64.TryParse(), and Jon's idea was to compare each character against the digit characters, checking whether it falls outside the '0' to '9' range in the ASCII sequence using if (s[i] > '9' || s[i] < '0').

My initial gut-feel reaction was that Jon's would be faster and Bill's slower. The reason that I thought that Jon's would be faster is because I am under the impression that IsDigit() checks other digits outside of the 0 to 9 range to take into account other digit sets such as Thai. Bill's I thought would be slower because I think that TryParse() catches a thrown exception when returning false and so the expense of exception handling would cause it to be slow.
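That impression is easy to check for yourself. Thai digit three (U+0E53) is a Unicode decimal digit, so Char.IsDigit() accepts it, while a '0' to '9' range check like Jon's rejects it. A small sketch, with class and method names of my own choosing:

```csharp
using System;

static class DigitSets
{
    // Guy's test: Char.IsDigit accepts any Unicode decimal digit (Latin, Thai, ...).
    public static bool IsAnyDigit(char c)
    {
        return Char.IsDigit(c);
    }

    // Jon's test: only the ASCII characters '0' to '9'.
    public static bool IsAsciiDigit(char c)
    {
        return c >= '0' && c <= '9';
    }
}
```

For ordinary input such as '7' the two agree; they only differ on digits from other scripts.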

The first problem that I hit was that the original numeric string that I used:

string numericString = "12345678901234567890123456";

was too large for Int64. I didn't pick this up at first, and when I ran the first set of tests they completely turned my expectations on their head, with Bill's being fastest, then mine and then Jon's. However, these results were invalid because the Int64.TryParse() call was not returning the correct results.
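The reason is that Int64.MaxValue, 9223372036854775807, has only 19 digits, and TryParse() reports a value that overflows Int64 by returning false rather than throwing. This sketch (hypothetical helper names) makes the limit explicit:

```csharp
using System;

static class Int64Limits
{
    // Number of decimal digits in Int64.MaxValue (9223372036854775807).
    public static int MaxDigits()
    {
        return Int64.MaxValue.ToString().Length;
    }

    // True only when the string converts to a value that fits in an Int64.
    public static bool FitsInInt64(string s)
    {
        long number;
        return Int64.TryParse(s, out number);
    }
}
```

So the original 26-digit test string is numeric in the digit-checking sense but can never be parsed into an Int64.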

I fixed the test code to look like this and re-ran it all.

static string numericString = "1234567890123456";
static string nonNumericString = "abcdefghijklmnopqrstuvwxyz";
static void DigitTesting()
{
    Func<string, bool> isNumeric_Guy =
        delegate(string s)
        {
            for (int i = 0; i < s.Length; i++)
                if (!Char.IsDigit(s[i]))
                    return false;
            return true;
        };

    Func<string, bool> isNumeric_Jon =
        delegate(string s)
        {
            for (int i = 0; i < s.Length; i++)
                if (s[i] > '9' || s[i] < '0')
                    return false;
            return true;
        };
    Func<string, bool> isNumeric_Bill =
        delegate(string s)
        {
            Int64 number;
            if (Int64.TryParse(s, out number))
                return true;
            return false;
        };

    Func<string, bool>[] functions = { isNumeric_Guy, isNumeric_Jon, isNumeric_Bill };

    foreach (Func<string, bool> function in functions)
    {
        DateTime startTime = DateTime.Now;
        // Swap in nonNumericString for the non-numeric run
        for (int i = 0; i < 100000000; i++)
            function(numericString);
        Console.WriteLine("numeric: {0} time taken: {1}",
            function.Method.Name, DateTime.Now - startTime);
    }
}
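One thing to watch with this harness: for anonymous delegates, Method.Name returns a compiler-generated name rather than isNumeric_Guy and friends. Here is a variation, a sketch with my own class and method names, that uses named static methods so the output is readable, and System.Diagnostics.Stopwatch, which has finer resolution than DateTime.Now:

```csharp
using System;
using System.Diagnostics;

static class NumericBenchmark
{
    public static bool IsNumeric_Guy(string s)
    {
        for (int i = 0; i < s.Length; i++)
            if (!Char.IsDigit(s[i]))
                return false;
        return true;
    }

    public static bool IsNumeric_Jon(string s)
    {
        for (int i = 0; i < s.Length; i++)
            if (s[i] > '9' || s[i] < '0')
                return false;
        return true;
    }

    public static bool IsNumeric_Bill(string s)
    {
        long number;
        return Int64.TryParse(s, out number);
    }

    // Times repeated calls to one function with a high-resolution Stopwatch.
    public static TimeSpan Time(Func<string, bool> function, string input, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            function(input);
        watch.Stop();
        return watch.Elapsed;
    }

    // Named methods, so Method.Name prints "IsNumeric_Guy" and so on.
    public static void Run(string input, int iterations)
    {
        Func<string, bool>[] functions = { IsNumeric_Guy, IsNumeric_Jon, IsNumeric_Bill };
        foreach (Func<string, bool> function in functions)
            Console.WriteLine("{0} time taken: {1}",
                function.Method.Name, Time(function, input, iterations));
    }
}
```

Calling NumericBenchmark.Run with the numeric string and then with the non-numeric string covers both cases in one run.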

Against a numeric string isNumeric_Guy() turned out to be about 15% faster than isNumeric_Jon() and about 31% faster than isNumeric_Bill().

When testing against a non-numeric string isNumeric_Jon() was 16% faster than isNumeric_Guy() and 87% faster than isNumeric_Bill(). This is the ordering I had expected in both cases.

This is an interesting conundrum: two algorithms, each beating the other by about the same amount depending on the type of data. Luckily I know that most of the data this algorithm will be looking at will be non-numeric strings, so I will go with Jon's algorithm.

Thanks Bill and Jon - your ideas and help are much appreciated.

Continued in Is this string numeric in C# part 3.

Friday, November 6, 2009

Is this string numeric in C#?

On a project completely unrelated to my previous post, Optimizing a custom Trim() function in C# (where I was stripping non-numeric characters from either end of a string that may contain digits), I found myself battling with a slow-running console application that needed profiling.

Someone had recommended EQATEC Profiler to me and so I ran it against this console app to find where the slowness was coming from. It turned out that the bottleneck was in an IsNumeric function:

private readonly static Regex NumericRegex =
    new Regex(@"^[0-9]+$", RegexOptions.Compiled);
bool IsNumeric(string Text)
{
    return NumericRegex.IsMatch(Text);
}

When I discovered how frequently this was being called I went about optimizing it and came up with:

bool IsNumeric(string s)
{
    // Note: unlike the regex, this returns true for an empty string
    for (int i = 0; i < s.Length; i++)
        if (!Char.IsDigit(s[i]))
            return false;
    return true;
}

I then tested the two functions on two different strings:

string numericString = "12345678901234567890123456";
string nonNumericString = "abcdefghijklmnopqrstuvwxyz";

Pitting them against each other, the new function ran 3 times faster than the old regex one on the numeric string, and 9 times faster on the non-numeric string. The reason for the huge difference in performance gain between the two types of strings should be fairly obvious: for the non-numeric string the new function bails out immediately after inspecting the first character, while the regex would still examine the whole string. For the numeric string both functions need to examine every single character before returning.
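The two versions are not quite interchangeable, which is worth knowing before swapping one for the other. This sketch (the IsNumericVariants class name is mine) puts them side by side. The loop accepts an empty string and non-Latin digits such as U+0663 (Arabic-Indic digit three), both of which the regex rejects, and in .NET the regex's $ also matches just before a trailing newline, so "123\n" passes the regex but not the loop.

```csharp
using System;
using System.Text.RegularExpressions;

static class IsNumericVariants
{
    static readonly Regex NumericRegex = new Regex(@"^[0-9]+$", RegexOptions.Compiled);

    // Original version: at least one ASCII digit, and nothing else.
    public static bool IsNumericRegex(string text)
    {
        return NumericRegex.IsMatch(text);
    }

    // Optimized version: every character must be a Unicode decimal digit.
    public static bool IsNumericLoop(string s)
    {
        for (int i = 0; i < s.Length; i++)
            if (!Char.IsDigit(s[i]))
                return false;
        return true;
    }
}
```

If exact equivalence matters, adding an empty-string check and a '0' to '9' range test to the loop brings it into line with the regex.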

More tests with 2 new algorithms from comment and email: Part 2

Delegates and Lambdas in C#

I was just explaining delegates and lambdas to a friend and thought that I'd write up an example for him.

Here is an IEnumerable<> that we want to interrogate to get all values greater than or equal to 60:
int[] scores = { 45, 55, 59, 65, 66, 67, 68 };

Here is how we could do it using a delegate. We declare a Func<T, TResult> delegate, which is similar to a function pointer in C. In the LINQ statement that follows we pass that delegate as the "predicate" to the Where() extension method. Where() passes in each int one at a time and expects a bool true/false result back. That's why we declared the delegate as a Func<int, bool>: the int is what's passed in and the bool is the return type.
Func<int, bool> predicate = delegate(int score) { return score >= 60; };
IEnumerable<int> passingScores1 = scores.Where(predicate);

Lambda expressions, introduced in C# 3.0 alongside LINQ, make the above syntax much easier for us. In the argument to the Where() extension method we can declare the body of the function inline, so long as it matches the prototype of an int parameter and a bool return.
IEnumerable<int> passingScores2 = scores.Where((int score) => score >= 60);

The compiler can infer the type of the parameter, so if you want even more brevity you can replace (int score) with just score.
IEnumerable<int> passingScores3 = scores.Where(score => score >= 60);
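Putting the three snippets into one compilable class makes it easy to confirm that they all select the same scores, 65, 66, 67 and 68. A sketch, with class and method names of my own choosing:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PassingScores
{
    static readonly int[] scores = { 45, 55, 59, 65, 66, 67, 68 };

    // 1. Anonymous method assigned to a Func<int, bool> delegate
    public static IEnumerable<int> WithDelegate()
    {
        Func<int, bool> predicate = delegate(int score) { return score >= 60; };
        return scores.Where(predicate);
    }

    // 2. Lambda with an explicitly typed parameter
    public static IEnumerable<int> WithTypedLambda()
    {
        return scores.Where((int score) => score >= 60);
    }

    // 3. Lambda with the parameter type inferred by the compiler
    public static IEnumerable<int> WithInferredLambda()
    {
        return scores.Where(score => score >= 60);
    }
}
```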

In .NET 3.5, Func<> comes in 5 generic forms:

Func<TResult>
Func<T, TResult>
Func<T1, T2, TResult>
Func<T1, T2, T3, TResult>
Func<T1, T2, T3, T4, TResult>

In terms of best coding practice, what is the maximum number of parameters a function should have? Well, the designers of the Func<> delegate believe that it's four, and I don't disagree with them. I am guilty of using more than four, but that's mostly out of laziness about writing and populating a class to pass instead of the parameters.
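To illustrate both ends of that range, and the alternative of passing a class, here's a short sketch. Func<TResult> takes no arguments, while Func<T1, T2, T3, T4, TResult> takes four. The Order class is a hypothetical parameter object made up for this example, not from any real codebase.

```csharp
using System;

static class FuncArity
{
    // Func<TResult>: no parameters, just a return value
    public static readonly Func<int> Answer = () => 42;

    // Func<T1, T2, T3, T4, TResult>: four parameters plus the return type
    public static readonly Func<int, int, int, int, int> Sum4 =
        (a, b, c, d) => a + b + c + d;

    // Hypothetical parameter object used instead of a long parameter list
    public class Order
    {
        public int Quantity;
        public decimal UnitPrice;
        public decimal Discount;
    }

    // One parameter (the object) carries all the values
    public static readonly Func<Order, decimal> Total =
        o => o.Quantity * o.UnitPrice - o.Discount;
}
```

The parameter object also leaves room to add fields later without changing every call site.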

JavaScript library management in browsers

Yesterday Google introduced its Closure Tools, which include its Closure Library. Google's AJAX Libraries API page lists a bunch of common JavaScript libraries, including jQuery, Prototype, Scriptaculous, MooTools, Dojo, SWFObject (never heard of this one), and YUI.

One of the advantages of using Google's content distribution network (CDN) for JavaScript libraries is that if someone visiting your site had previously visited another site which used the same JavaScript library from Google's CDN, then that library will already be cached and your site will load faster. This is great, but I believe things could be even better. The problem is that you might be using a different CDN from other sites and so will not benefit from this caching.

Browsers should be able to manage a library of common JavaScript libraries along with their versions. Your JavaScript code should be able to ask for a particular library and version, and the browser should already have it: installed along with the browser or, for libraries released after that, installed with regular browser updates.

Thursday, November 5, 2009

Optimizing a custom Trim() function in C#

I needed to write a Trim() function that removed anything that wasn't a digit from the beginning and end of a string. What usually happens when I write a blog entry like this is that someone posts a comment a few days later saying "hey, you didn't need to write that, it's already in the .NET library." If it is, I haven't been able to find it.

Here is my TrimNonDigits() function first attempt:

protected string TrimNonDigits_v1(string text)
{
    while (text.Length > 0 && !Char.IsDigit(text, 0))
        text = text.Substring(1);
    while (text.Length > 0 && !Char.IsDigit(text, text.Length - 1))
        text = text.Substring(0, text.Length - 1);

    return text;
}

Now this function works, and for what I needed it for (trivially short strings) it would do the job. But I intend to add it to a string extension library and I feared that someone would pass in an enormous string, and with all the Substring() operations creating new strings this could become extremely inefficient.

To see how inefficient this function was I wrote another unit test to specifically measure the speed at which the code ran. Here is that unit test:

public void trim_non_digits_big_text()
{
    int repeat = 10000;
    UnitTestDerived utd = new UnitTestDerived();
    string lotsOfCharacters = String.Join("", Enumerable.Repeat("abcdef", repeat).ToArray());
    string text = lotsOfCharacters + "1" + lotsOfCharacters;
    string expected = "1";
    string actual = utd.TrimNotDigits(text);
    Assert.AreEqual(expected, actual);
}

The unit test passes but takes 21 seconds to run. Although it's extremely unlikely that this much non-digit text would appear on either side of a number, it's still a valid use case, so the code needed reworking so that Substring() is only called once. This is the resulting change to the code:

protected string TrimNonDigits(string text)
{
    int start = 0, end = text.Length - 1;
    while (text.Length > start && !Char.IsDigit(text, start))
        start++;
    // Only walk back from the end if there are characters left to check
    if (start != end)
        while (end > start && !Char.IsDigit(text, end))
            end--;
    return text.Substring(start, 1 + end - start);
}

The unit test runs against this new function in under 1 second. That's a dramatic difference and a good demonstration of why you should avoid calling Substring() repeatedly in a loop if you can help it.
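Since the plan is to put this in a string extension library, here is one way it might look as an extension method. This is a sketch rather than the final library code: the StringExtensions class name and the ArgumentNullException guard are my assumptions.

```csharp
using System;

static class StringExtensions
{
    // Removes non-digits from both ends of the string, calling Substring() once.
    // Non-digits between the first and last digit are left alone.
    public static string TrimNonDigits(this string text)
    {
        if (text == null)
            throw new ArgumentNullException("text");
        int start = 0, end = text.Length - 1;
        while (start <= end && !Char.IsDigit(text, start))
            start++;                      // skip leading non-digits
        while (end > start && !Char.IsDigit(text, end))
            end--;                        // skip trailing non-digits
        return text.Substring(start, end - start + 1);  // empty if no digits
    }
}
```

Usage is then simply "abc123def".TrimNonDigits(), which returns "123".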

A side note: I also had several other unit tests that checked the validity of the algorithm's results before I did the optimization. These included tests of edge cases and extremes. This made the optimization refactor a cinch because at the end I was fairly confident that the algorithm would work in all situations.

Tuesday, November 3, 2009

MultiThread testing for speed

I recently wrote this little test program to validate that multi-threading was working the way I expected it to.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

class Program
{
    private delegate void funcs();

    static void Main(string[] args)
    {
        funcs[] functions = new funcs[]
        {
            new funcs(MultiThreadTest),
            new funcs(SingleThreadTest)
        };
        foreach (funcs function in functions)
        {
            DateTime startTime = DateTime.Now;
            for (int i = 0; i < 10; i++)
                function();
            // Wait until every worker has finished
            while (ThreadWorker.ThreadCount > 0)
                Thread.Sleep(10);
            Console.WriteLine("{0} time taken: {1}",
                function.Method.Name, DateTime.Now - startTime);
        }
    }

    private static void MultiThreadTest()
    {
        ThreadWorker threadWorker = new ThreadWorker();
        ThreadStart threadDelegate =
            new ThreadStart(threadWorker.Runner);
        Thread newThread = new Thread(threadDelegate);
        newThread.Start();
    }
    private static void SingleThreadTest()
    {
        ThreadWorker threadWorker = new ThreadWorker();
        threadWorker.Runner();
    }
}

class ThreadWorker
{
    public static int ThreadCount = 0;

    public ThreadWorker()
    {
        // Count each worker as soon as it is created
        Interlocked.Increment(ref ThreadCount);
    }
    public void Runner()
    {
        IEnumerable<string> strings = Enumerable.Repeat(
            @"the not so quick brown fox was
caught by the hen and eaten with eggs", 50);
        string text = String.Join(" ",
            strings.SelectMany(a => a.Split()).ToArray());
        for (int i = 0; i < 1000; i++)
        {
            // Do some random LINQ stuff to occupy the processor
            string[] list = text.Split(' ');
            string[] l2 = list.Distinct().ToArray();
            l2 = list.OrderBy(a => a).ToArray();
            l2 = list.OrderByDescending(a => a).ToArray();
        }
        // This worker is done
        Interlocked.Decrement(ref ThreadCount);
    }
}
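Polling a static counter works, but a common alternative is to keep a reference to each Thread and call Join() on it, which blocks until that thread finishes. Here's a minimal sketch, assuming the work is passed in as an Action; the JoinExample class and RunAll method are names I've made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class JoinExample
{
    // Starts threadCount threads that each run the work, then waits for all of them.
    // Returns how many threads completed their work.
    public static int RunAll(int threadCount, Action work)
    {
        int completed = 0;
        List<Thread> threads = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            Thread t = new Thread(() =>
            {
                work();
                Interlocked.Increment(ref completed);
            });
            threads.Add(t);
            t.Start();
        }
        foreach (Thread t in threads)
            t.Join();   // blocks until that thread has finished
        return completed;
    }
}
```

For the test program above, something like RunAll(10, () => new ThreadWorker().Runner()) would take the place of the ten MultiThreadTest() calls and the wait loop.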

I tested this bit of code by running it four times on each of two machines.

The first was a dual-core, and the single-threaded test took an average of 33.4 seconds and the multi-threaded test 17.8 seconds. This makes the multi-threaded version 1.9 times faster, which is about what you'd expect if you allow two processors to work on the problem in parallel.

The second machine was a quad core with hyper-threading, so essentially eight cores. This took an average of 29.3 seconds for the single-threaded test and 4.6 seconds for the multi-threaded test, an improvement factor of 6.4: not as close to 8 as I was expecting but not that far off. If anybody knows why the eight cores do not come as close to an eight-times factor as the two cores came to a two-times factor I'd love to hear from you in the comments.

The two processors were:

  • Intel Core 2 CPU 6400 @ 2.13 GHz
  • Intel Xeon CPU L5410 @ 2.33GHz

Something that I found interesting was that the slower processor running the multi-threaded code was 1.65 times faster than the faster processor running the single-threaded code (29.3 / 17.8 seconds). This has important implications for the software that you write. The single-threaded test ran only 1.14 times (14%) faster on the faster processor (33.4 / 29.3 seconds), while the multi-threaded code on the slower processor ran 65% faster than the single-threaded code on the faster processor.

If you're looking for a performance boost there may be more performance in multi-threaded code than in a faster processor. In fact, processor speed is probably not what you're looking for. The best combination would be multi-threaded code on multi-core boxes.


Monday, November 2, 2009

Using LINQ to join two string lists without repeats

I have a list of members that have to play a game against each other and I want to generate a complete list of all the members against every other member without repeating any games. My list looks like this:

string[] members = {"Alphie", "Jerome", "Silky", "Buzz" };

My first attempt at generating a list using LINQ was this:

IEnumerable<string> q = from one in members
                        from two in members
                        select one + " plays " + two;

which resulted in:

Alphie plays Alphie
Alphie plays Jerome
Alphie plays Silky
Alphie plays Buzz
Jerome plays Alphie
Jerome plays Jerome
Jerome plays Silky
Jerome plays Buzz
Silky plays Alphie
Silky plays Jerome
Silky plays Silky
Silky plays Buzz
Buzz plays Alphie
Buzz plays Jerome
Buzz plays Silky
Buzz plays Buzz

This is not exactly what I was looking for. I wonder who the winner would have been in Buzz versus Buzz?

The secret is to put a where clause before the select statement:

where one.CompareTo(two) < 0

This eliminates a member playing themselves (where CompareTo() returns 0) and, because only the pairing where the first name sorts alphabetically before the second is kept, it also stops two players from meeting a second time. This is the complete code snippet:

IEnumerable<string> q = from one in members
                        from two in members
                        where one.CompareTo(two) < 0
                        select one + " plays " + two;

and here are the revised results:

Alphie plays Jerome
Alphie plays Silky
Alphie plays Buzz
Jerome plays Silky
Buzz plays Jerome
Buzz plays Silky
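The same query can be written with extension-method syntax. This sketch (the Fixtures class and method names are mine) uses String.CompareOrdinal so the result doesn't depend on the current culture, and adds an index-based variant that pairs each member only with those after it in the array; unlike the name comparison, it still schedules a game if two members happen to share a name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Fixtures
{
    // Method-syntax version of the query: keep a pair only when the
    // first name sorts before the second (ordinal comparison).
    public static List<string> ByName(string[] members)
    {
        return members
            .SelectMany(one => members, (one, two) => new { one, two })
            .Where(p => String.CompareOrdinal(p.one, p.two) < 0)
            .Select(p => p.one + " plays " + p.two)
            .ToList();
    }

    // Index-based alternative: pair each member with every later member.
    public static List<string> ByPosition(string[] members)
    {
        return Enumerable.Range(0, members.Length)
            .SelectMany(i => Enumerable.Range(i + 1, members.Length - i - 1),
                        (i, j) => members[i] + " plays " + members[j])
            .ToList();
    }
}
```

Either way, four members produce 4 × 3 / 2 = 6 games.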

Thursday, October 29, 2009

Unit testing and code conversion


I've just done a little exercise in converting code from Python to C#, and the icing on the cake was two unit tests in the Python code that confirmed my code had been converted correctly.

I know nothing about Python so it was lucky that this code was about 10 functions and only a few pages long. For the bits of code (mostly syntax) that weren't obvious I found an online quick reference to Python and used that to search for the unusual keywords and work out what they did.

I did the conversion inline by copy-pasting the Python code into a C# class in a Visual Studio project and then converting each line into C#, leaving the variable names and code structure intact as much as possible. I had the original Python file open in Notepad++ on a second monitor as a reference.

At the bottom of the Python file were a couple of unit tests with input parameters and expected results. I rewrote those unit tests in Visual Studio's unit testing framework using the same inputs and expected outputs, and they ran successfully. Thanks to those unit tests, that was probably the most successful code conversion I have ever done, and a very productive one as well.

Quicken Deluxe 2010 spams your desktop


I just bought and installed Intuit Quicken Deluxe 2010 from their site for $59. The first thing to disappoint me was the fact that there was nowhere during the purchasing procedure to enter my coupon code and get a discount. I was in too much of a hurry to bother with hunting around for how to do this so they got to keep my $10 discount.

The second thing that really annoyed me was that the installation procedure dumped three spam links on my desktop for other products that they sell.

I haven't even run their software yet and they've had two strikes that annoyed me sufficiently to blog negatively about them. It might be great software but so far it looks like a shady company using back-alley tactics.

Edit on 11/30/2009: The link above (Intuit Quicken Deluxe 2010) has the software priced at $54.99, so you save a few dollars by buying through NewEgg. I didn't know about it at the time so I paid the full $59 to Intuit.

Tuesday, October 27, 2009

Network Saturation Finally

I have finally achieved my goal of network saturation on my home network.
Like most people, I have a small off-the-shelf router that does the standard 100 Mbps. Most of the computers in my house are hardwired because the wireless signal is slower and weaker in the far corners of the house and also because we bought a spec home that had all the rooms pre-wired with Ethernet.
In the past, when I've been copying files from one computer to another and looked at the transfer rate over the network, I've been disappointed that only 40% to 60% of the available bandwidth was being utilized. The hardware supports 100 Mbps, so why isn't it transferring data at that rate, dammit?
The reason is the slowest component in the chain, which has always been the hard disk. Well, not anymore. I've just bought myself a new computer and with it I got a Patriot Torqx PFZ128GS25SSDR 2.5" Internal Solid state disk (SSD), which promises 260 MB/s read and 160 MB/s write.
The computer that I was copying from had the data sitting on a Seagate Barracuda LP 1.5TB 3.5" SATA 3.0Gb/s Hard Drive -Bare Drive. I don't know what the read speed of that is (yet - I'll come back and update this later) but I'm guessing that it's over 100 Mbps or I wouldn't have achieved network saturation.
Is it going to be worth getting a faster router? Not yet, I think. The times I'll be copying between computers with fast hard drives are probably going to be rare. I'll wait until my internet connection exceeds 100 Mbps. My prediction is that will happen in about 7 years' time.

DVDBurn.exe in Server 2003

I've just been battling for the last 30 minutes to try and burn an ISO as an image to a DVD. I have a version of Nero Burn and tried to line up the planets with this bit of not-so-great software and ended up creating a data DVD with the ISO as the single file on this DVD. I took a second attempt with Nero but just couldn't find the option to create it as an image.

Did I mention that I was burning this DVD image from Windows Server 2003?

I stumbled across this little gem of a utility that I must have installed on this server with the Resource Kit Tools. It's called dvdburn.exe and has a younger brother called cdburn.exe.

DVDBurn.exe takes 3 parameters:

dvdburn <drive> <image> [/Erase]

Worked like a charm:

C:\Software>dvdburn d: en_windows_7_ultimate_x64_dvd.iso
Media type: DVD-R
Preparing media...
Error setting timestamp; this error will be ignored, some drives can work without this
- 100.0% done
Finished Writing
Waiting for drive to finalize disc (this may take up to 30 minutes).............

Success: Finalizing media took 10 seconds
Burn successful!

Saturday, October 24, 2009


The genius behind NinjaCamp recently sent me a link to 10/GUI. I try and follow the zero Inbox policy but for him I made an exception and allowed this email to languish in my Inbox so that I could take a relaxed 10 minutes out of the weekend to watch what he was so abuzz about. I have to agree with him, this is an amazing concept and I want one now.

From what I can work out, this is as simple as a touch pad that recognizes up to 10 points of contact. Instead of a mouse you have a surface lying in front of the keyboard that understands all your fingers, including the gestures now recognized by handheld devices such as Android phones and the iPhone.

The only thing missing, and I think this is just a matter of time, is for a keyboard to appear on there. If there were keys that could quickly inflate themselves on this device when you needed a keyboard and then disappear when you wanted the touch pad that would be the ultimate human interface.

Friday, October 23, 2009

C# Math.Round Banker's or Mathematical Method

I wasn't even aware that there was a banker's method of rounding numbers until today, when I came across what I thought was a bug in the Math.Round() function in the .NET library.

If you call this function:

double x = Math.Round(.005, 2);

then you will discover that x is set to zero. If you call this:

double x = Math.Round(.0051, 2);

then x will be set to 0.01.

I was always taught that you round up from 5. This is still correct and is known as the mathematical method of rounding. However, I have recently learned that there's another method: with the banker's method, if the digit being rounded off is exactly 5 you round to the nearest even number, which in this case means rounding down to zero instead of up. This is the method that Math.Round() uses by default.

If you want to use the expected mathematical method for the round function then you need to add another parameter to the overloaded Round() function which specifies how you want it to handle midpoint rounding. AwayFromZero will do the trick:

double x = Math.Round(.005, 2, MidpointRounding.AwayFromZero);

This will cause x to be set to 0.01, as you would expect. The other value defined in the .NET 3.5 MidpointRounding enum is ToEven, which is the default and rounds the midpoint to the nearest even number (0.00 in this example).

Mystery solved!

1 Sep 2010, just discovered this:

Banker's Rounding

When you add rounded values together, always rounding .5 in the same direction results in a bias that grows with the more numbers you add together. One way to minimize the bias is with banker's rounding.

Banker's rounding rounds .5 up sometimes and down sometimes. The convention is to round to the nearest even number, so that both 1.5 and 2.5 round to 2, and 3.5 and 4.5 both round to 4. Banker's rounding is symmetric.
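A quick way to see both the default behavior and the bias argument is to round a run of .5 values. This is a sketch with a made-up helper name: the four values 0.5, 1.5, 2.5 and 3.5 add up to exactly 8; rounding them to even (0, 2, 2, 4) keeps the total at 8, while rounding away from zero (1, 2, 3, 4) pushes it to 10.

```csharp
using System;

static class RoundingDemo
{
    // Rounds each value to a whole number with the given midpoint rule, then sums.
    public static double SumRounded(double[] values, MidpointRounding mode)
    {
        double total = 0;
        foreach (double v in values)
            total += Math.Round(v, mode);
        return total;
    }
}
```

The values here are exactly representable as doubles, so they really are midpoints; a literal like .005 is stored as a binary approximation, so results for such values can depend on how that approximation falls.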

Predictably Irrational

I've just finished reading Predictably Irrational by Dan Ariely. This is a great book that applies to everyone and should be required reading before you leave school. Everyone is guilty of being irrational in their decisions and some of us continue to do it everyday even though we know that it's stupid and, yes, irrational.

One of the stories that I loved was the pen and the suit comparison. If you were buying a pen and saw it on sale for $30 and knew that you could buy it across town for $23, then you'd likely jump in your car, make the drive, and save yourself that $7. However, if you were buying a suit for $400 and the same applied, you're highly unlikely to make the same trip to buy the suit for $393, even though the saving is the same $7. Why would you do that?

One of my favorite experiments that they did was with Lindt Truffles and Hershey's Kisses. They priced the Truffles and Kisses at 15 and 1 cents respectively and found that 73% of customers chose the Truffles. They were willing to pay that extra 14 cents to get a superior product. However, when they dropped both prices by 1 cent and made the Kisses free, everything suddenly reversed and 69% of customers took the free Kiss over the Truffle, even though the price differential of 14 cents was still there.

Wednesday, October 21, 2009

SpamBayes performance on Outlook 2007

I installed SpamBayes for Outlook 2007 eight days ago and have been using it since then. I can understand users' reluctance to install this software because it's not trivial to understand how it works or exactly why it would be better than the spam filter built into Outlook.
Before I installed SpamBayes, Outlook was catching about half of the spam emails that I received and putting about 5 or 6 legitimate emails into the Junk Mail folder. After SpamBayes took over it got a few wrong and put a number of emails into Junk Suspects, but once it had finished training, in about 3 or 4 days, it was getting almost everything right. Since then it has averaged about one wrong or unsure email a day, and the gap between those is getting longer.
If you're an Outlook user and you haven't installed SpamBayes yet then I highly recommend that you do so. It's free and available here from SourceForge and it will repay your time-investment within the first week.
The key difference between SpamBayes and Outlook's default spam filter is that SpamBayes learns what spam means to you without you having to blacklist subjects or senders. You don't have to set up rules. For example, I get a lot of spam that purports to be from myself. I also send myself legitimate emails about things I want in my inbox. Spammers know this and spoof my email address because they know that I won't block my own address. SpamBayes learns that it should place zero significance on who the email's from and instead focuses on other markers in the email to learn what spam and ham (good email) mean to you.
If, for example, you work in the porn (or anti-porn) industry then emails with the word "porn" may be legitimate emails that you want to and need to read. They may be from co-workers of yours. However, there may also be plenty of spam about porn that you don't want to read. SpamBayes takes care of that for you because it learns that distinguishing spam from ham based (in part) on the word porn for your email is useless and will therefore automatically ignore this word and focus on other words (and markers) in your email to make this distinction. In my personal email however it would probably classify the word "porn" as a red flag that the email is spam.
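The "ignore a useless word" idea falls out of simple arithmetic. In this simplified sketch, which is my own illustration and not SpamBayes's actual scoring (that is considerably more sophisticated), a word's spam probability is its rate in trained spam divided by its combined rate in spam and ham. A word that turns up equally often in both scores 0.5, which carries no information either way, so the filter has to rely on other words.

```csharp
using System;

static class TokenSpamminess
{
    // Simplified estimate of how "spammy" a word is, NOT SpamBayes's real formula.
    // spamHits/hamHits: how many trained spam/ham messages contained the word.
    // spamTotal/hamTotal: how many spam/ham messages have been trained on.
    public static double Probability(int spamHits, int spamTotal, int hamHits, int hamTotal)
    {
        if (spamTotal <= 0 || hamTotal <= 0)
            throw new ArgumentException("Train on at least one spam and one ham message first.");
        double spamRate = (double)spamHits / spamTotal;
        double hamRate = (double)hamHits / hamTotal;
        if (spamRate + hamRate == 0)
            return 0.5;  // word never seen: no evidence either way
        return spamRate / (spamRate + hamRate);
    }
}
```

Because it works with rates rather than raw counts, a word seen in 20 of 200 spams and 10 of 100 hams is just as neutral as one seen in half of each.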

Ditto Clipboard Manager

For about the last year I've been using ClipX which is a clipboard manager. This worked well-ish with a few minor annoyances but seems to have been languishing in a vacuum of non-development and stagnation. Recently a friend put me on to the Ditto Clipboard Manager which I've installed on to a number of machines that I use and I absolutely love it and recommend that you install it immediately.
I've tested Ditto on Windows XP (32 bit), Server 2003 (32 bit) and Server 2008 (64 bit) and it works perfectly on all those machines. Today I was chastised by Ninja Camp for not blogging about this utility and spreading the wealth of knowledge so here is that post. Ninja Man tried to install it on Vista earlier today and he reported initial problems and I haven't heard back from him on this so not sure if he got it working or not. Hopefully he'll comment here on his progress.

One of the other utilities that I had running on one of my computers was Pure Text. This little applet allowed you to strip formatting from text when you paste it. You know when you copy something from a word document or a web page and you want to paste it without that bold and color? Well Pure Text does this for you. However, and this is a big however, you no longer need this utility if you're using Ditto. One of the features of Ditto is to paste any of your clippings as "Paste Plain Text Only" which is a real Godsend.
The default keyboard combination to launch Ditto is Ctrl + ` which I suggest you don't change even if you find it uncomfortable at first. I promise that this will grow on you very quickly.

Tuesday, October 13, 2009

Enabling SpamBayes on Outlook 2007 on Server 2008 x64

After installing SpamBayes Addin for Outlook 2007 on my Server 2008 x64 "workstation" I started receiving the following message when starting up Outlook 2007: "Outlook experienced a serious problem with the 'spambayes' add-in."
The fix was to disable DEP for that application. To do this on Server 2008:
  • Start > Computer > right click > Properties (System Control Panel)
  • Click Advanced System Settings
  • Under Performance click Settings...
  • Click on the Data Execution Prevention tab (have you worked out what DEP stands for yet?)
  • Click Add... and then browse to the application that was causing problems (in this case Outlook itself, OUTLOOK.EXE) and add it.
If you've already disabled the add-in in Outlook 2007, you'll need to re-enable it:
  • In Outlook 2007 select menu Tools > Trust Center
  • On left panel click Add-ins
  • Under Disabled Application Add-ins, find the add-in that was disabled.
  • From the Manage drop-down, select Disabled Items and click Go...
  • Select the disabled item and click Enable.

Friday, October 2, 2009

Acer not charging for free upgrade this time around

I bought an Acer laptop a couple of years ago, just before Vista came out. It came with the promise of a free upgrade to Vista because Vista's release was imminent. Fair enough, I thought: I'll still get Vista without having to wait, so I went ahead with the purchase.

The real story behind the free upgrade was that (1) I had to wait for months after Vista had shipped, while it was already available on a similar model laptop at Best Buy, (2) I had to fill out an online form and then mail in all sorts of proof of purchase, and (3) I had to pay $15 for shipping and handling. It was a lot of effort, a long wait and an annoying cost, and that's the last time I fall for the free upgrade story.

Interestingly, I just read Free Windows 7 upgrades not always free and it reminded me of my Vista saga. It looks like Acer may have learned from the Vista experience, as it is now the only company listed in that article that doesn't charge for the Windows 7 upgrade. Even so, I still wouldn't buy a computer with an upgrade promise, because I can guarantee that you'll spend hours fighting for that upgrade and will only get it long after it has become available on the same laptop in the same store where you bought yours.

Caveat emptor.

Wednesday, September 30, 2009

Windows 7

I'm ready to move on to Windows 7. Most of the reports and experiences that I've read have been very positive. In fact I don't recall anything negative that I've heard about Windows 7 that would impact me.

My main desktop machine is running Windows Server 2008 Enterprise. I selected this OS as a desktop operating system because (1) I wanted to develop on the same OS that my apps were targeting and (2) I wanted to use Hyper-V. This was a mistake. IIS7 comes with all modern Microsoft operating systems now, so I didn't need a server OS to develop against it, and it turns out that on a desktop machine Hyper-V works particularly badly because it virtualizes the video card, which really hurts the performance of everything you're working on.

On top of this, Windows Server 2008 makes it painfully difficult to install the run-of-the-mill applications you'd frequently use on a desktop, such as the Zune Desktop and Windows Media Player. Granted, Server 2008 was not designed to run these apps; that's what Vista and Windows 7 are for. I now understand my folly.

I now have a dilemma. Do I keep my powerful Server 2008 machine and turn it into a "remote into" machine for running and testing applications, or do I convert it to Windows 7? Using it only as a remote-into machine means I can re-enable Hyper-V; however, it also means I'll have an expensive, powerful machine sitting on my home network that's hardly being used.

My biggest problem with all of this is the amount of time it takes to set up a new machine, but while typing this I think I've come up with a solution. I'll buy a cheaper piece of hardware and restore my Windows Home Server (WHS) image of the Server 2008 machine onto it. I'll then install Windows 7 on my main power machine. This will also give me a chance to test recovering a system from WHS without any risk.

OCR with Google Docs

At last, Optical Character Recognition (OCR) online with Google Docs.

  • Go to
  • Use the link to sign in to your Google account.
  • Click browse to find your .jpg, .gif, or .png file to be converted.
  • Click 'Start Import'
  • Your image will now open as text in a Google Doc

My testing showed a fairly accurate conversion with very few mistakes. Because I was doing this in Firefox, a squiggly red line appeared under the mistakes and a right-click quickly fixed them. (I'm not sure whether it's Firefox or Google Docs that puts the squiggly red line under misspelled words. Might be both...)

One feature that's not available yet, and which I'd love to see, is the ability to import PDFs that hold scanned images of documents. For some reason I seem to have a ton of those. At the moment the only way I can find to import them is to click on each page in the PDF, copy it to the clipboard, paste it into Paint.Net, save it as an image, and then import that image into the doc. This is only practical if your PDF has a few pages; if it has many, it just isn't workable.

Testing a site for nofollow links with jQuery

I'm working on a site at the moment, and a number of the links on its pages don't need to be followed or indexed by search engine (or other) spiders. I mark these links by setting the rel attribute of the anchor tag to nofollow, as follows:

<a href="some-link-goes-here" rel="nofollow">link text</a>

Some of the pages have plenty of links and I'm never sure whether I've marked all of them correctly, but I could tell at a glance if the marked ones were shown in a different color. On a page that loads jQuery this is easily done. I open the page in Firefox, open the Firebug Console panel and type:

$('[rel~="nofollow"]').css('color', 'fuchsia')

This will cause all of the links on the page with the rel attribute set to nofollow to be colored pink. You can then easily perform a visual check to see if you've caught them all.

(It will actually color the text inside any element with rel set to nofollow, not just anchors, but in practice nofollow is almost always used on anchor tags.)
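The console one-liner colors the links that are already marked. If you'd rather have the unmarked links jump out, here is a minimal sketch in plain browser JavaScript (no jQuery needed) that treats rel as a space-separated, case-insensitive list of tokens and outlines every link that doesn't carry nofollow. The function names hasNofollow and highlightFollowedLinks and the red outline are my own choices for illustration, not part of any library.

```javascript
// Returns true when a rel attribute value contains the "nofollow" token.
// rel holds a space-separated list of link types, so rel="external nofollow"
// should count too; the comparison here is case-insensitive.
function hasNofollow(relValue) {
  if (typeof relValue !== 'string') return false; // no rel attribute at all
  return relValue
    .trim()
    .split(/\s+/)
    .some(function (token) { return token.toLowerCase() === 'nofollow'; });
}

// Outlines every link that is NOT marked nofollow and logs a count,
// so the links you may have missed are the ones that stand out.
function highlightFollowedLinks(doc) {
  var links = Array.prototype.slice.call(doc.querySelectorAll('a[href]'));
  var followed = links.filter(function (a) {
    return !hasNofollow(a.getAttribute('rel'));
  });
  followed.forEach(function (a) { a.style.outline = '2px solid red'; });
  console.log(followed.length + ' of ' + links.length + ' links are followable');
  return followed;
}

// Only run against a real page (e.g. pasted into the Firebug console).
if (typeof document !== 'undefined') {
  highlightFollowedLinks(document);
}
```

Anything left without a red outline is marked nofollow, so a quick scroll down the page shows whether any link still needs the attribute.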