ISWIX, LLC View Christopher Painter's profile on LinkedIn profile for Christopher Painter at Stack Overflow, Q&A for professional and enthusiast programmers

June 20, 2007

Dealing with very large number of files

Allan Bommer recently asked on the WiX-Users list:

The one item I am worried about is managing a very large number of files. Our application’s Help directory contains about 1500 files (html, gif, etc). It seems the only way to get these files installed is to list them explicitly (with some help from Heat). It also appears that it is suggested to make each file a component.

Explicitly dealing with 1500 files and their component GUIDs seems likely to have a very low ROI for the programmer and company. I’m sure my clients would rather have me crank out a few useful features in the application than spend countless hours on all those individual entries. Perhaps if you have a million clients, then dealing with all those files individually makes sense, but with 100, 1000 or 10,000 clients, the cost-benefit ratio doesn’t look very good.


Allan hits on one of the weaknesses of the declarative design of Windows Installer. Rich metadata describing the behavior of the contents and behavior of the installer has the huge downside of having to author that metadata and keep it in sync with the ever changing world. The upgrade servicing patterns also require that you be absolutely perfect in how you do this. One component rule violation or primary key column change across dozens of builds will result in undesired servicing behavior.

Allan then asks ( in his own way ) about dynamically linking the files. Dynamically file linking will generally work but it has several downsides:

The most important downside is the possibility that a file should be in the package, but that it was not where it was supposed to be at build time. If the file was described declaratively this would be a build break. But with dynamic file linking, it simply isn't linked and now we have a potential runtime break.

The second downside is that it's very difficult to perform minor upgrades and patches due to the dynamic nature of the component authoring. Some tools like InstallShield have patterns for mitigating this, but it's still a crap shoot.

The third downside is that your build will take A LOT longer to perform since it's not only creating your package but that it's also authoring and validating your component definitions.

But I think there is a third way to go about this and I probably spent at least one of my five hours with Sinan Karaca touching on this subject: Use authoring tools with very rich component wizards. InstallShield has one, but it's not nearly good enough in my opinion. Here are a few attributes that would make it better:

`Deprecated Component` instead of Deleted Component. Minor upgrades can't remove a component and it's keyfile, so instead you take advantage of Transitive Components and NOOP conditional expressions along with a 0 Byte file in the media. This allows a developer to safely remove a component without worrying about which upgrade pattern will be used to service the installation.

Better definition and control over the concept of `Best Practices` when authoring the components. On MSDN you will find 2 very different definitions of best practices for componentizing files. The MSI team breaks things by directory and COM servers with all other files being allowed to share a component as a designated key file and companion files. The Visual Studio team takes a different approach that every file should be a key file. To be honest, I see ways in which they are both correct. A properly behaving component wizard shouldn't simply ask `do you want to follow best practices` as if it is a black and white decision. It should offer you a choice of those two best practices and a custom mode where you get to decide for yourself. The point is when I have some ASP.NET developer adding files I don't want him to have to understand the component rules. I don't want him to need to understand that default.ascx is the keyfile and background.png is the companionfile. He just needs to know that he can update either file and he can expect to see his changes on the server after the installation is executed.

Another feature that a good wizard should have is the ability to automatically detect changes between the currently authored components and the directory structure that it is syncronizing. We are talking easy button here: Rescan Components followed by "I see these 5 new files... would you like to add them?" or "These 5 files are missing, would you like me to deprecate them?"

MSI might be complex, maybe more then it needs to be, but there is value in it's patterns. It's just time that we have authoring tools that makes it a hell of a lot easier so that Allan doesn't have to be pointing out the obvious ROI problems associated with it.

1 comment:

Anonymous said...

Thanks for your feedback Christopher. The "Deprecated Component" and "Rescan Components" ideas seem to have a lot of value.

-Allan