Tuesday, September 29, 2009

Programming Fail: Directory.GetFiles()

I went to demo some shiny new code for a friend, and we both had a laugh when my program pretty much puked all over itself.

Admittedly, I had not run this code on this particular machine before, and it was running Vista and .NET 3.5 (neither of which I'd tested against). But upon finding the bug, I find it difficult to understand what sort of design decision would produce such arbitrary behavior.

Directory.GetFiles() seems like a pretty straightforward function. You hand it a directory to search and a search pattern (e.g. "*.exe"), and it returns an array of all the files in that directory which match the pattern. OK, I can handle that. Or so I thought.
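For reference, here is roughly what the simple case looks like. This is a minimal sketch, not my actual code: the class name is made up, and the directory is just whatever you pass on the command line (or the current directory if you pass nothing).

```csharp
using System;
using System.IO;

class ListExecutables
{
    static void Main(string[] args)
    {
        // Assumption for this sketch: search the first argument,
        // or the current directory when none is given.
        string dir = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();

        // GetFiles(path, searchPattern) returns the full paths of the
        // files in 'dir' whose names match the wildcard pattern.
        string[] files = Directory.GetFiles(dir, "*.exe");

        foreach (string file in files)
        {
            Console.WriteLine(file);
        }
    }
}
```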

The fine print of the documentation contains the gotcha:

The following list shows the behavior of different lengths for the searchPattern parameter:

* "*.abc" returns files having an extension of .abc, .abcd, .abcde, .abcdef, and so on.
* "*.abcd" returns only files having an extension of .abcd.
* "*.abcde" returns only files having an extension of .abcde.
* "*.abcdef" returns only files having an extension of .abcdef.


Ummm. OK. So, MSFT, basically your search pattern violates decades of common convention with regard to wildcard matching, and the developer gets to figure this out when the app bombs? OK, sure. Brilliant. Ship it, yo.

Let's say you're looking for TIFF files, which can have either "tif" or "tiff" as the file extension. If you call GetFiles() once for each of those extensions and combine the results, the "tiff" files get added twice: once by "*.tif" (which also matches the longer extension) and once by "*.tiff". Bzzt, fail MSFT, fail.
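The pattern that bit me looked something like the sketch below. The names are invented for illustration, and the two patterns are hard-coded here rather than pulled from the image classes. Under the documented three-character behavior, any ".tiff" file should show up in the combined list twice; I'd expect results to vary on runtimes that don't follow that rule.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class NaiveTiffSearch
{
    static void Main(string[] args)
    {
        // Assumption for this sketch: directory comes from the command line.
        string dir = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();

        // One pattern per supported extension.
        string[] patterns = { "*.tif", "*.tiff" };

        // Naive aggregation: concatenate the results of every call.
        List<string> found = new List<string>();
        foreach (string pattern in patterns)
        {
            found.AddRange(Directory.GetFiles(dir, pattern));
        }

        // If "*.tif" also matched the .tiff files, they are listed twice here.
        foreach (string file in found)
        {
            Console.WriteLine(file);
        }
    }
}
```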

So, the fix is: either go through the aggregate list and remove duplicates, or drop the "*.tiff" search pattern, since "*.tif" already picks those files up (by the way, I got my supported extensions through the image handling classes in .NET). But it'd be a lot easier to simply have this method behave in a manner that is normal and predictable and sane: if I ask for "*.tif", I only want files ending in ".tif". But I guess that's asking for too much.
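One way to get the behavior I actually wanted is to stop trusting the pattern and check each result's extension myself. The sketch below is one possible workaround, not anything blessed by the framework: GetFilesWithExactExtensions is a hypothetical helper of my own naming, and it assumes extensions are passed with the leading dot, which is how Path.GetExtension reports them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class ExactExtensionSearch
{
    // Hypothetical helper: returns only files whose extension exactly
    // equals one of 'extensions' (e.g. ".tif"), ignoring case.
    static List<string> GetFilesWithExactExtensions(string dir, string[] extensions)
    {
        // Case-insensitive set of paths already returned; this guards
        // against duplicates if the same extension is listed twice.
        HashSet<string> seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        List<string> result = new List<string>();

        foreach (string ext in extensions)
        {
            foreach (string file in Directory.GetFiles(dir, "*" + ext))
            {
                // Path.GetExtension includes the leading period, so ".tiff"
                // will not pass when we asked for ".tif".
                bool exact = string.Equals(
                    Path.GetExtension(file), ext, StringComparison.OrdinalIgnoreCase);

                if (exact && seen.Add(file))
                {
                    result.Add(file);
                }
            }
        }

        return result;
    }

    static void Main(string[] args)
    {
        // Assumption for this sketch: directory comes from the command line.
        string dir = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();

        foreach (string file in GetFilesWithExactExtensions(dir, new[] { ".tif", ".tiff" }))
        {
            Console.WriteLine(file);
        }
    }
}
```

The extension check does the real work; the set is just a belt-and-suspenders guard so that each path comes back at most once no matter what the patterns overlap on.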
