Tuesday, May 5, 2009

File scheme URLs

I'm through with file scheme URLs.

An URL, as of course you know, is a Uniform Resource Locator. The notion is that, given an URL, client software (such as your web browser) can figure out how to retrieve that resource for your use. In the Build Farm, as is true in many modern web applications, the user interface is full of URLs, as there are many resources that we wish to display.

In the Build Farm, many of the resources to which we provide access are files:
  • build logs
  • test output
  • server logs
These files are recorded by individual build machines during job execution, and then are saved on the master file servers. The Build Farm UI then wants to make these files available, so I need to show links (URLs) to these files.

For a time, I tried to construct these links using file scheme URLs. URLs can use a variety of schemes. The "scheme" is the part of the URL at the start, which describes the underlying technique that is to be used to access the resource. Among the common schemes are:
  • http:// URLs, which are the most common, and which indicate that HyperTextTransportProtocol requests are to be used to access this resource.
  • https:// URLs, which are the secured variant of http URLs.
  • mailto: URLs, which are used for constructing links that initiate the sending of an email request.
  • file: URLs, which can be used to specify a file.
When you first have a look at file scheme URLs, they don't look too bad:

file://host/path
However, have a very close look at that middle paragraph in the description on Wikipedia:


On MS Windows systems, the normal colon (:) after a device letter has sometimes been replaced by a vertical bar (|) in file URLs. For example, to refer to file FOO.BAR in the top level directory of the C disk, the URL file:///C|/FOO.BAR was used. This reflected the original URL syntax, which made the colon a reserved character in a path part. For network shares, add an additional two slashes. For example, \\host\share\dir\file.txt, becomes file:////host/share/dir/file.txt.
Trying to figure out these details of how to handle device letters and network file shares is where the wheels start to come off the wagon. As I said, I tried, for a time, to make file scheme URLs work for network share references to build logs and other goodies stored on our file servers. But I could never get it to work reliably across different versions of different browsers. Firefox seemed to behave differently than Internet Explorer, and different versions of the various browsers seemed to behave differently. And the more I searched around on the net, the more confusing it got. There were seemingly authoritative articles suggesting use of 3 slashes, 4 slashes, even 5 slashes. In the bug reports, people seemed to be flailing, even using 20 or more slashes!

It was terribly frustrating, but in the end, I realized that I didn't need to care. Instead, I realized that I was trying to link to a bunch of files that were all on file servers that my web server already had access to. So, I simply changed my web server so that it was willing to serve static files via HTTP protocol (you have to fiddle with allowLinking=true in the Context definition), defined a simple set of symbolic links in my webapps/ROOT directory, thus creating a simple HTTP namespace for accessing the important files on the file server, and replaced all my nasty code which tried to build portable file scheme URLs with simple code that built simple HTTP scheme URLs.

And I was happy! So, like I said, I'm through with file scheme URLs.

No comments:

Post a Comment