Bug in 0.3

Several people have pointed out to me that there is a bug in get_enclosures 0.3: you can get a divide by zero error when a download is fast enough to come down within a single second. People are hitting this when running get_enclosures against my bittorrent RSS feed. I’ll release an update this evening, but the workaround is to not use the bittorrent feed with get_enclosures. It doesn’t download the actual thing behind the torrent, just the metadata file, so it doesn’t do anything for you anyway.
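For anyone curious what this class of bug looks like, here is a minimal sketch in Python rather than the script's Perl. The function name and the choice to clamp the elapsed time to one second are assumptions for illustration, not the actual fix:

```python
def transfer_rate(num_bytes, start_second, end_second):
    """Return bytes per second for a download timed in whole seconds.

    With whole-second timestamps, a download that starts and finishes
    inside the same second has an elapsed time of 0, so dividing by it
    blows up. Clamping the elapsed time to at least 1 avoids that.
    """
    elapsed = max(end_second - start_second, 1)
    return num_bytes / elapsed
```

A small torrent metadata file is exactly the kind of download that finishes within one second, which would explain why the bittorrent feed triggers it.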

Release Management

I can see that I need to do something about the get_enclosures release management. Yesterday the 0.2 version was downloaded 50 times, while the 0.3 was downloaded 11 times. That’s not good, so I really need to do something ASAP. I guess in the short term, I’ll just remove the links to old versions and make them point to the new one, but I don’t want to have to update every old post every time I release one. Someone suggested that I just link to a release directory rather than to individual files, which is a good idea. My problem is that at the top level I have my Apache setup configured via .htaccess to serve the weblog page as the index. Does anyone know how to override that at a lower level? I want to have a subdirectory not serve that out, instead giving a directory listing. I looked at the Apache docs last night, seeing how I could turn that off or make the DirectoryIndex directive in my .htaccess file not recurse, and couldn’t figure it out.

get_enclosures 0.3

Here is an updated release of get_enclosures. I have freely stolen AppleScript snippets from Ray Slakinski’s pyPodder to improve my iTunes integration. Now, iTunes will stay in the back if it is being newly started or will stay where it is if it is already running. Also, the feeds.txt file is no longer in the zip so you don’t have to worry about overwriting your subscriptions if upgrading. It will just write a starter file if it doesn’t already exist, otherwise it will just leave it be if it is already there.
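The "write a starter file only if it is missing" behavior can be sketched roughly like this. This is Python rather than the script's Perl, and the function name and starter contents are made up for illustration:

```python
import os

# Hypothetical starter contents; the real starter file may differ.
STARTER_FEEDS = "# Add one feed URL per line\n"


def ensure_feeds_file(path):
    """Create a starter feeds file at path unless one already exists.

    Returns True if a new file was written, False if the existing
    file was left untouched.
    """
    if os.path.exists(path):
        return False
    with open(path, "w") as handle:
        handle.write(STARTER_FEEDS)
    return True
```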

Try it out and please give me whatever feedback you can. I’m looking at perhaps trying to incorporate automatic installation of the necessary Perl modules or maybe even an installer to do the things Adam discusses, setting up the modules and the cron job interactively at install time. Does anyone have any input on good free installer programs for OS X, or even a multi-platform one?

I Hate AppleScript, I Need Help

I love most things about Macs, but I completely freaking hate AppleScript. I’ve been trying to add some functionality to get_enclosures that someone had suggested: the ability to add the URLs from an RSS enclosure feed to a playlist as something to stream rather than download. That sounds simple enough, but after 30 minutes of farting around and reading the barely existent documentation I cannot figure out how to take the URL that I have and add it to a playlist as a streaming entry. There is some magic syntax that I haven’t yet stumbled upon. I’ve been a professional programmer for most of a decade now, and trying to use the “user friendly” AppleScript language consistently drives me completely bonkers.

If any of you out there know how to do this, please throw me a bone and either email me or leave me the snippet in a comment. I have the URL, I have the playlist name, I just want this thing to be a streaming URL in that playlist. Note too that I don’t want to start playing it immediately, just have the entry added to the list.

get_enclosures 0.2

Here’s an updated 0.2 release of the get_enclosures script. Added is the more robust caching mechanism based on the dates in the RSS item tag. This release will write out two M3U playlists in every directory that it downloads new files for, one alphabetical and one in reverse chronological order. This allows for ease of use with WinAmp or XMMS. Note that the M3U playlists will be based on everything that is in the directory at the time, so new files will be added to the existing ones. If you have deleted a file, it will not be reflected in the playlists.
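A rough sketch of the playlist idea, in Python rather than the script's Perl. The playlist file names, the .mp3 filter, and using each file's modification time as the "chronological" order are all assumptions for illustration. A plain M3U file is just a list of file names, one per line:

```python
import os


def write_playlists(directory, extensions=(".mp3",)):
    """Write two M3U playlists covering every audio file in directory.

    Both lists are rebuilt from the directory contents on each run,
    so a file can never be listed twice.
    """
    files = [
        name for name in os.listdir(directory)
        if name.lower().endswith(extensions)
    ]
    alphabetical = sorted(files, key=str.lower)
    newest_first = sorted(
        files,
        key=lambda name: os.path.getmtime(os.path.join(directory, name)),
        reverse=True,
    )
    playlists = (
        ("alphabetical.m3u", alphabetical),
        ("newest_first.m3u", newest_first),
    )
    for playlist_name, entries in playlists:
        with open(os.path.join(directory, playlist_name), "w") as handle:
            for entry in entries:
                handle.write(entry + "\n")
    return alphabetical, newest_first
```

Because this version rebuilds from a directory listing each time, a deleted file simply drops out on the next run.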

You can also comment out a feed by preceding a line with #, which will keep it from being downloaded but without needing to be deleted from the file. This release should also fix issues with duplicates being added to the playlists.
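The comment handling amounts to something like the following sketch, written in Python rather than the script's Perl. The function name is made up, and skipping blank lines is an assumption:

```python
def read_feeds(lines):
    """Return the feed URLs from the lines of a feeds file.

    Lines starting with '#' are treated as commented out and skipped,
    as are blank lines (the blank-line handling is an assumption).
    """
    feeds = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        feeds.append(line)
    return feeds
```

For example, a file containing http://example.com/a.xml on one line and #http://example.com/b.xml on the next would yield only the first feed.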

If you are upgrading from a previous release, be sure not to overwrite your feeds.txt file when you unzip this. Either unzip it elsewhere or make a backup copy of feeds.txt so you don’t clobber it. Y’all probably already know this, but I just thought I’d remind you.

Update: Thanks to Gordon for pointing out the boned URL. I really need to learn to not push out these things after midnight.

Renko

Pete Prodoehl has done a script much like get_enclosures called renko. I downloaded it and it worked well. It’s easier to install than mine because he includes all the modules you need in his distribution, so if that’s an issue with you by all means go get renko.

He also points out in a post that the correct place for all this stuff to be happening is in the desktop aggregators like NetNewsWire and Shrook. It shouldn’t be too hard: have enclosure preferences that let you decide to download all enclosures automatically, only ones of specific MIME types, none, ask every time, etc. Set up a directory where you want them to go, click a check button if you want them automatically added to iTunes and away you go. For me, the only big issue here would be if NetNewsWire added this support. I’m already paid up on Shrook and I like it, but if this functionality gets added to a competitive product then I might switch.

This kind of thinking has occurred to me as well, which is why there is an upper limit to how much effort I’ll ultimately put into get_enclosures. The best script in the world pales next to even mediocre support in the desktop aggregators. There are one or two additional things I’d like to add to it, and then I’m probably going to slow or stop work, only fixing bugs as they are brought to my attention. We’ve now done our work in validating the proof of concept further, and it is time for the aggregator developers to step up next.

Changing the Caching Mechanism

I’m going to change the way the caching for the files works in get_enclosures. The way it works now is that when a file is downloaded, the current timestamp is saved. Before a file is downloaded, it is checked to see if there exists a timestamp for it. If so, it is not downloaded. I realize that this case is too simplistic, and I thought of a use case that would make this break while I was thinking of something else that I thought would be cool. But first, a digression.

Amid all this talk of the “iPod platform”: for over two years now I’ve been saving off the MP3 files from the WREK streaming archives for specific shows. I would then burn them to CD and listen to them offline. I did this with custom scripts and Windows scheduled tasks. It occurred to me that this could easily be something that reused all this infrastructure. I realized that it would be quite simple to create a cron task that would write out an RSS feed with enclosures for the various programs on that station. Then, the get_enclosures script could just download them when it was doing its thing anyway.

Here’s where the mechanism described in paragraph 1 falls apart: every week, the URL to get the MP3 archive for that same half-hour of programming is the same. With the existing mechanism, that URL would be downloaded once and only once, the first time the script ran. All subsequent runs would find that URL as one that has already been downloaded. Damn, so close yet so far.

Here’s how that can be fixed, and how perhaps it makes things more robust in all cases. The RSS 2.0 spec defines (but doesn’t require) an element for the item, pubDate. I’ve altered the caching mechanism to use this value rather than the current timestamp. Then, when deciding whether to get the file, it compares the pubDate of the item in the current feed against the one in the cache. If the feed is newer than the cache, it gets the file again. This handles cases like the WREK situation, where the file name and URL are reused every week as the contents of the file are rewritten with the new week’s stream. When assembling the RSS 2.0 feed with the enclosure, the pubDate is set to the correct value for that week and everything will work out. Conceivably, this could also allow for redownloading a file that was edited and republished with everything else the same but the pubDate updated to the new publish time.

Because these are textual times, I wrote a simple function that compares two RFC 822 dates and finds out which is the earlier, so for the individual download URLs everything will be compared and stored using those dates from the item tag. There are better, more robust ways such as using Date::Manip, but I don’t want to require people to install any more modules than they already have to. In fact, I might think about getting rid of the dependence on XML::Simple.
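The logic boils down to something like this sketch. It is Python rather than the script's Perl, it leans on the standard library's RFC 822 parser instead of a hand-written comparison, and the function names and the dictionary standing in for the cache are assumptions for illustration. It also assumes every item carries a pubDate:

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_pub_date(text):
    """Parse an RFC 822 date string into a timezone-aware datetime."""
    parsed = parsedate_to_datetime(text)
    if parsed.tzinfo is None:
        # A "-0000" offset parses as a naive datetime; treat it as UTC
        # so that every value can be compared with every other.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def should_download(url, feed_pub_date, cache):
    """Decide whether the enclosure at url needs to be fetched.

    cache maps each downloaded URL to the pubDate string it had when
    it was last fetched.
    """
    cached_pub_date = cache.get(url)
    if cached_pub_date is None:
        return True  # never downloaded before
    # Same URL seen before: fetch again only if the feed's item is newer.
    return parse_pub_date(feed_pub_date) > parse_pub_date(cached_pub_date)


def record_download(url, feed_pub_date, cache):
    """Remember the pubDate the item had when it was downloaded."""
    cache[url] = feed_pub_date
```

In the WREK case the URL stays the same from week to week, so the first check always finds a cache entry, and it is the newer pubDate in the regenerated feed that triggers the fresh download.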

This updated mechanism will be part of the 0.2 release. As well, I will pick a WREK show or three to prepare these experimental feeds for. If they like it and want to do it, I’ll let them have it and they can put it on their own site.

iPodder for Windows

Via a comment, Pieter Overbeeke informs me that he has a script for downloading files and controlling iTunes for Windows XP available! This does the same stuff as get_enclosures or iPodder on the Mac, by getting the files and also adding them to the iTunes library.

In the shower this morning, I was wondering if there were COM libraries for Perl that I could use to control Windows iTunes from get_enclosures. Now that Pieter has invented this wheel, there is no need. If you are on Windows you should definitely give his script a try. More infrastructure for the “iPod platform!”

Multiplatform it is!

I let the updated get_enclosures script run overnight on a Windows box with ActivePerl installed, and it worked just fine. Right on! There were a few minor issues, but the files all came down, so that’s good.

I tried to test it against Cygwin’s version of Perl but couldn’t get the modules installed. I recently had to wipe and reinstall my Windows 2000 OS because Windows is such a fragile piece of shit that it eats itself over time, and when I did I had to start from scratch. This Cygwin is newly installed, and I get all kinds of make errors trying to install LWP. It’s really weird because if I go into the build directories and manually run make it works. I’ve never seen this before on any Cygwin install. If some kind soul out there could test this script with Cygwin Perl and let me know how it goes, I’d highly highly appreciate it. I’m not planning on spending any time fighting with Cygwin.

Update: I did get it to work on Cygwin after all, and it seemed to work fine. If anyone has success with this on any other platforms, let me know.

get_enclosures version 0.1

Here is the updated version of get_enclosures, version 0.1. The zip now includes a changes.txt which covers the differences from the previous version. It is no longer dependent on Mac::AppleScript, which means it will run without alteration on Linux or Cygwin, etc. (it still depends on XML::Simple and LWP). It caches the RSS time so that feeds are not redownloaded unless something has changed since last time. Thanks to Brian Tol you can now get nicely formatted documentation via POD (run “perldoc get_enclosures.pl” to see it).

Thanks to all who have given suggestions and used this. I highly recommend everyone upgrade to this if you downloaded the previous one, particularly the person who had this on a cron job to download my RSS feed every minute. Thanks, anonymous friend, you reminded me that any reasonable RSS consumer should be using Last-Modified out of etiquette.
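For anyone writing their own RSS consumer, the etiquette in question is a conditional GET: send back the Last-Modified value from the previous fetch, and the server can answer 304 Not Modified instead of shipping the whole feed again. A minimal sketch in Python rather than the script's Perl, with a made-up function name and an opener parameter that exists only to make the function easy to test:

```python
import urllib.error
import urllib.request


def fetch_if_modified(url, last_modified=None, opener=urllib.request.urlopen):
    """Fetch url unless the server says it is unchanged.

    Returns (body, last_modified). body is None when the server
    answers 304 Not Modified, meaning the cached copy is still good.
    """
    request = urllib.request.Request(url)
    if last_modified:
        # Hand back the timestamp the server gave us last time.
        request.add_header("If-Modified-Since", last_modified)
    try:
        with opener(request) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None, last_modified
        raise
```

The caller stores the returned Last-Modified string alongside the feed URL and passes it in on the next run.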

Update: That enhancement of not fetching the RSS every time introduced a bug, because I was clearing out URLs from the cache if they weren’t in the RSS feeds. Well, when you don’t fetch the RSS feed at all, there are no URLs in it at all, so it was clearing out the cache when nothing was new. For the time being, I have just turned off the cache cleanup altogether. This cache is not going to be getting large relative to an audio file in any reasonable timescale anyway.
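If the cleanup ever comes back, one way to avoid the same trap is to prune only when the feed was actually fetched on this run. This is a sketch of that alternative in Python, not what the script does now, and the function and parameter names are made up:

```python
def prune_cache(cache, feed_urls, feed_was_fetched):
    """Drop cached URLs that no longer appear in the feed.

    When the feed was skipped because it had not changed, feed_urls is
    empty for the wrong reason, so the cache is returned untouched.
    """
    if not feed_was_fetched:
        return dict(cache)
    return {url: stamp for url, stamp in cache.items() if url in feed_urls}
```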

get_enclosures Category

Since this is taking off, and a little faster than I expected, I am creating a category for this on the blog. From here on forward, I’m posting everything about it in this category. I do ask that everyone who uses this, if you don’t subscribe to the whole blog RSS here, at least subscribe to this category. If there is some sort of bug fix or new release, I’ll post it here and then you’ll know about it. Through the miracle of blosxom, you can automatically subscribe to the RSS for any subcategory, and the RSS feed for just this category is here.

There will be a release of a 0.1 version (what is out there now I am retroactively calling 0.0) before I go to bed tonight. It will have enough new stuff, including a serious performance tweak, that all current users should upgrade. In addition, it includes what people like Gordon Smith suggested here and makes this work on non-Apple platforms. My original conception was that this would be specific to Mac and iTunes, but there is no reason to be that specific. Now, it will work as a downloader for anyone that can have the right Perl stuff installed, on Linux, on Windows (straight or Cygwin), etc. Cool stuff.