Wednesday 24 April 2013

Ripping your CD Collection – I. Metadata

It happens quite often.  People tell me they have started ripping their CD collections to WAV files so they can play them through their computers.  And they haven't paused to think it through before they start.  This is definitely an area where "look before you leap" or "an ounce of prevention is worth a pound of cure" apply.  Maybe I can help.

This is the first in a series of posts where I will talk about the real-world issues you will encounter when you take the plunge and commit to ripping your CD collection.  This mostly introductory post addresses the main predicament we face, and how we arrived at this juncture in the first place.

Part I    Metadata

I can't think of anybody who has successfully made the transition from CDs to computer-based audio, and abandoned it to go back to CDs.  Once your music collection is safely tucked away on a hard disk, the ability to navigate through it, to prepare playlists and collections, to browse intelligently – even to control it remotely using a mobile device such as an iPad or an iPhone – massively enriches the experience.  That is true even with a relatively mundane piece of software such as iTunes.  And serious high-end products such as Sooloos bring the user experience much closer to the incredible possibilities that the brave new world of computer audio opens up for us.

My good friend 'Richard H' has something approaching 3,000 CDs in his collection.  They live in a selection of shelving units and cupboards that dot his listening room.  Richard knows pretty much where most of his CDs live, but occasionally some are hard to track down.  (Particularly if I'm the person who last put them away…)  Extracting full value from that collection involves not only knowing exactly where every disc sits, but also having a good memory for what tracks are on every one of those discs.  I'm sure many of you will identify with that.  But it is at least manageable.  It's what we've all gotten used to.

On the other hand, my sister Barbara works for an NPR radio station, WKSU, which is one of the biggest classical music stations in the world.  Their music collection comprises MANY THOUSANDS of discs, and their ability to function as a station relies to a great degree on the people who work there knowing how to find every last piece of music they own.  It is a nightmare of a task, and I have no idea how they manage it, but it seems they do it very effectively!  The thing is, with classical music, how do you organize a library of thousands of CDs with the sole assistance of a BIG shelving unit?  Do you do it by composer, by musical style, by period, by performer, by record label … or do you just stack 'em up one by one in the order you bought 'em?  There is no natural solution.  Particularly since, with classical music, a single CD can contain works by different composers, in different musical styles, of different periods, by different performers, and so forth.

But once you rip that library into computer files, there is an immediate, and very natural solution.  All that information is just data, and computers handle data very, very well.  The challenge, then, is to get all that valuable data off the discs and into the computer.  And that is where the problems start.  Because the data isn’t on the discs in the first place.
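To illustrate why "it's just data" solves the shelving problem, here is a minimal Python sketch.  It assumes a hypothetical in-memory list of track records; the `Track` fields, the `group_by` helper and the sample entries are all made up for illustration and do not come from any particular library software.  The point is that the same collection can be regrouped by composer, period, performer or label on demand, with no physical re-shelving.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    # Hypothetical metadata fields; real tagging schemes define many more.
    title: str
    composer: str
    performer: str
    period: str
    label: str


def group_by(tracks, field):
    """Index the same collection by any metadata field, on demand."""
    index = defaultdict(list)
    for track in tracks:
        index[getattr(track, field)].append(track.title)
    return dict(index)


# Illustrative entries only: one disc can mix composers, periods and performers.
library = [
    Track("Symphony No. 5", "Beethoven", "Orchestra A", "Classical", "Label X"),
    Track("Brandenburg Concerto No. 3", "Bach", "Orchestra A", "Baroque", "Label X"),
    Track("The Four Seasons", "Vivaldi", "Ensemble B", "Baroque", "Label Y"),
]

print(group_by(library, "composer"))
print(group_by(library, "period"))
print(group_by(library, "performer"))
```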

All of the information that is relevant to the music on an audio disc is termed "metadata".  Most of it is printed on the jewel case artwork, or in the enclosed booklet, but none of it is encoded on the disc itself.  Back when the CD format was devised, more than 30 years ago, the idea of reading that information from the disc simply did not arise, and so nobody thought to standardize any method for putting it there.  Finally, in the mid-1990s, when a standard did emerge for combining audio and data on the same disc, there was no interest – let alone any sort of agreement – in establishing a standard format for storing the metadata.  So it never happened.

What did happen was what always happens when a stubborn industry fails to meet the needs of its customers.  The geeks step in and engineer a solution of their own.  In this case it was called MP3.  Techies realized that they could play their music on their computers, if only they could get their music in there in the first place.  The trouble was, music files were so darned HUGE that you couldn't fit many on the size of hard drives that were available at the time.  It is easy to forget that way back then the capacity of a CD exceeded the capacity of most computer hard drives!  You had to do something to get the size of the files down.  That something was the MP3 format.
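The arithmetic behind "so darned HUGE" is easy to sketch.  CD audio is 44,100 samples per second, 16 bits per sample, two channels.  The 128 kbit/s MP3 bitrate and the one-hour album length below are assumptions chosen for illustration (128 kbit/s was a common MP3 setting), not figures from this post.

```python
# CD audio: 44,100 samples/s, 16-bit (2-byte) samples, 2 channels.
SAMPLE_RATE = 44_100
BYTES_PER_SAMPLE = 2
CHANNELS = 2

cd_bytes_per_sec = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS  # 176,400 bytes/s
cd_kbps = cd_bytes_per_sec * 8 / 1000                         # 1,411.2 kbit/s

# Assumption: a 128 kbit/s MP3, a common bitrate for the format.
mp3_kbps = 128


def megabytes(kbps, minutes):
    """Size in megabytes (10**6 bytes) of a stream at the given bitrate."""
    return kbps * 1000 / 8 * minutes * 60 / 1e6


album_minutes = 60  # hypothetical one-hour album
print(f"CD audio: {megabytes(cd_kbps, album_minutes):.0f} MB")
print(f"128k MP3: {megabytes(mp3_kbps, album_minutes):.1f} MB")
print(f"Reduction: about {cd_kbps / mp3_kbps:.0f}:1")
```

By the same arithmetic, a full 74-minute disc holds roughly 783 MB of audio, which is why an uncompressed collection was impractical on the drives of the day.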

So it soon became possible to collect a fair-sized number of music files on your computer and play them using some custom software.  Of course, if you wanted to be able to properly manage the new music collection on your computer – or even just identify which tracks were which – you wanted access to some of that “metadata” that I described.  So the next thing the geeks developed was the ID3 “metadata” tagging system, which was a way to embed metadata into the same files that contained the music.  MP3 became a file format that would store not only the music, but also all of the information that describes the music.  It was a revolutionary development, to which the music industry responded with various enlightened practices including refusing to accept it, pretending it didn’t exist, and trying to ban it.
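The post doesn't say which version of ID3 it means, so as an assumption here is the original and simplest one, ID3v1.  It is a fixed 128-byte block appended to the end of the MP3 file: the marker "TAG", then title, artist and album (30 bytes each), year (4), comment (30) and a one-byte genre code.  Later ID3v2 tags, which sit at the start of the file and use variable-length frames, are what most modern software writes.  The helper names below are hypothetical, and this minimal Python sketch ignores the ID3v1.1 track-number variant.

```python
import struct

# "TAG" + title + artist + album + year + comment + genre = 128 bytes.
ID3V1_FORMAT = "3s30s30s30s4s30sB"


def _enc(text):
    # ID3v1 text fields are Latin-1; unencodable characters become "?".
    return text.encode("latin-1", "replace")


def _dec(raw):
    # Fields are padded with NUL bytes (some writers use spaces).
    return raw.rstrip(b"\x00 ").decode("latin-1")


def build_id3v1(title, artist, album, year, comment="", genre=255):
    """Return a 128-byte ID3v1 tag.  struct pads short fields with NUL
    bytes and truncates long ones.  Genre 255 is commonly used for 'none'."""
    return struct.pack(ID3V1_FORMAT, b"TAG", _enc(title), _enc(artist),
                       _enc(album), _enc(year), _enc(comment), genre)


def parse_id3v1(data):
    """Read an ID3v1 tag from the last 128 bytes of a file's contents."""
    tail = data[-128:]
    if len(tail) < 128 or not tail.startswith(b"TAG"):
        return None
    _, title, artist, album, year, comment, genre = struct.unpack(ID3V1_FORMAT, tail)
    return {"title": _dec(title), "artist": _dec(artist), "album": _dec(album),
            "year": _dec(year), "comment": _dec(comment), "genre": genre}


# Placeholder bytes stand in for the MP3 audio frames (not real audio).
tagged = bytes(1000) + build_id3v1("Track 1", "Some Artist", "Some Album", "1999")
print(parse_id3v1(tagged))
```

Because the tag lives inside the same file as the audio, the description travels with the music wherever the file is copied.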

With the record industry standing off to one side with its head in the sand, the next thing the geeks did was to come up with huge on-line databases which "crowd-sourced" (as we would describe the activity today) all of this metadata, together with some very clever tools that individual users could use to interact with them.  Using these on-line resources, you could insert a CD into your computer, some clever software would analyze the CD, correctly identify it, locate all of its metadata, and – Bingo! – automatically insert it into the resultant audio files as part of the ripping process.
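How can software "correctly identify" a disc that carries no metadata?  One well-known approach, used by CDDB and its open counterpart freedb (named here as an example; the post doesn't name any specific database), computes a short ID from the disc's table of contents: the start position of each track and the total playing time.  A minimal Python sketch of that freedb-style disc ID follows.  The offsets are in CD frames (1/75 s) and include the standard 150-frame lead-in.  The sample table of contents is synthetic, not a real disc.

```python
FRAMES_PER_SECOND = 75  # CD positions are counted in frames of 1/75 second


def digit_sum(n):
    """Sum of the decimal digits of n."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def cddb_disc_id(track_offsets, leadout_offset):
    """freedb/CDDB-style disc ID from a table of contents.

    track_offsets: start frame of each track; leadout_offset: frame where the
    audio ends.  Both include the 150-frame (2 s) lead-in offset.
    """
    seconds = [offset // FRAMES_PER_SECOND for offset in track_offsets]
    checksum = sum(digit_sum(s) for s in seconds) % 0xFF
    length = leadout_offset // FRAMES_PER_SECOND - seconds[0]
    return f"{(checksum << 24) | (length << 8) | len(track_offsets):08x}"


# Synthetic three-track disc: tracks start at 2 s, 202 s and 402 s; audio ends at 602 s.
print(cddb_disc_id([150, 15150, 30150], 45150))  # prints 0c025803
```

The ID is cheap to compute and needs nothing but the table of contents, but different discs can collide on it.  That is one reason later services such as MusicBrainz use longer, hash-based disc IDs.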

The "end of the beginning" (if I may channel Churchill) came when the hard disk industry started manufacturing drives big enough to hold the contents of large numbers of CDs, and in response the geeks started developing alternative formats to MP3 which could store the music in a lossless form – the FLAC file format is by far the most popular – thereby preserving intact all of the musical information.  These new formats would also support the new high-definition audio standards that were emerging at the same time.
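What makes a format like FLAC lossless rather than merely smaller?  One of its ideas is to predict each sample from the ones before it and store only the (usually small) prediction error, which can be added back exactly on playback.  The minimal Python sketch below shows FLAC's fixed second-order predictor, 2·x[n−1] − x[n−2], in isolation.  Real FLAC also chooses among other predictor orders and linear-prediction coefficients, and then entropy-codes the residuals.  All of that is omitted here, and the sample values are made up.

```python
def encode_fixed2(samples):
    """Keep the first two samples verbatim ("warm-up"), then store only the
    residual between each sample and the prediction 2*x[n-1] - x[n-2]."""
    warmup = samples[:2]
    residuals = [samples[n] - (2 * samples[n - 1] - samples[n - 2])
                 for n in range(2, len(samples))]
    return warmup, residuals


def decode_fixed2(warmup, residuals):
    """Rebuild the signal exactly by adding each residual back to the prediction."""
    out = list(warmup)
    for r in residuals:
        out.append(2 * out[-1] - out[-2] + r)
    return out


# Hypothetical 16-bit samples from a smooth waveform.
samples = [0, 1200, 2300, 3200, 3900, 4300, 4400, 4200, 3700]
warmup, residuals = encode_fixed2(samples)
print(residuals)  # [-100, -200, -200, -300, -300, -300, -300]
assert decode_fixed2(warmup, residuals) == samples  # bit-for-bit identical
```

The residuals are much smaller numbers than the samples, so they need fewer bits to store.  Because everything is integer arithmetic, decoding reproduces the original samples exactly.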

Thus, with the support of an enthusiastic, geek-driven audio hardware industry, the computer-based audio paradigm reached its first level of practical maturity.  The record industry at first staunchly refused to participate, and now that it is finally getting on board with downloading as a legitimate mainstream sales & marketing channel, it can no longer hope to control its de facto standards, which continue to evolve pretty much independently, for better or for worse.  Which is why we need a set of posts like this one – and the rest in this short series – to guide you through the perils of ripping a large collection.  Because it can be quite a frustrating business, and can take up an awful lot of your time.

Part II can be found here.