A couple of years ago, downloading a song from Soundcloud was pretty trivial. Their server would send you the complete 128kbps MP3, and the embedded player would then let you seek at will. Because the file arrived in one large chunk, it was easy both to identify in the cache and to copy somewhere else for playback. Sometimes this still works… To find out, look at the dev console and see whether it shows a huge MP3 file transfer. If so, you’re in luck! Copy it from the cache and you’re set.
Evidently they’ve changed this practice for other tunes, possibly to improve the latency of seeking at random in tracks, or possibly because they don’t want people getting music they shouldn’t be able to get. You can get Greasemonkey scripts which put the download button back, but these simply fire the URL off to a third-party site which “somehow” reconstructs the song and then sends it back your way. Very black-box magic stuff indeed.
However, if you can stream it, you can download it, as they say. Let’s take a look at how Soundcloud actually gets a song to you, and see if we can still figure out how to download something we may not really be allowed to.
Open the browser’s Developer Console and then browse to a song you want to hear. Keep an eye on the “network” activity; it will give you clues as to what is actually going on. As the song begins playing, you’ll see a lot of small network requests to magically named files.
This seems promising. Download one or two and run “file” on them, and you get:
$ file *
c5f47vUnF3Ow.128.mp3?f10880d39085a94a0418a7e163b03d5226edfe2317e6aa1445547d76cf23a7ca5b08b0b9169eed2c0a13f681ab93c51d8e788dcaa887622ee2905d7463e4fd982e918b5b687caf75047026a3429731c5010a16: MPEG ADTS, layer III, v1, 128 kbps, 44.1 kHz, JntStereo
c5f47vUnF3Ow.128.mp3?f10880d39085a94a0418a7e167b03d5249919aaf544816306c9a5e3ca05a129454accfdda2750c51705ac2f68f036a37b2c482058312ab10625db87a6e3ab6dc1d1631dbd883a3f38786db484e66359daf667314eb8f03: MPEG ADTS, layer III, v1, 128 kbps, 44.1 kHz, JntStereo
Okay, so Soundcloud has broken the file into parts and is playing them back in sequence. You can pop one of these into your media player and listen to a portion of the song. We’re close, but how do we know where to find all these parts and put them together in order? Easy: there’s an m3u8 playlist that has that for you. Check the Dev Console again! Soundcloud’s player uses it to fetch the data in order from the various URLs and then stream it to you. The playlist is a plain text file: lines starting with # are tags, and every other line is the URL of one segment, in playback order.
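To make the playlist's role a bit more concrete, here is a minimal Perl sketch that pulls the segment URLs out of playlist text, in order. It assumes the usual m3u8 layout, where lines beginning with # are tags and every other non-blank line is a segment URL. The sub name segment_urls and the example.invalid URLs are made up for illustration; they are not real Soundcloud links.

```perl
use strict;
use warnings;

# Hypothetical helper: return the segment URLs found in m3u8 text,
# in the order they appear in the playlist.
sub segment_urls {
    my ($playlist_text) = @_;
    my @urls;
    for my $line (split /\r?\n/, $playlist_text) {
        $line =~ s/^\s+|\s+$//g;    # trim stray whitespace
        next if $line eq '';        # skip blank lines
        next if $line =~ /^#/;      # skip #EXTM3U, #EXTINF and other tags
        push @urls, $line;
    }
    return @urls;
}

# Made-up playlist, for illustration only.
my $sample = join "\n",
    '#EXTM3U',
    '#EXTINF:10,',
    'https://example.invalid/media/0/1000/track.128.mp3?token=aaa',
    '#EXTINF:10,',
    'https://example.invalid/media/1000/2000/track.128.mp3?token=bbb',
    '#EXT-X-ENDLIST';

print "$_\n" for segment_urls($sample);
```

Run against the sample, this prints the two segment URLs and nothing else, which is exactly the list a downloader has to walk through.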
As an interesting aside, these URLs seem to time out after a short while, leading to 403 Forbidden errors if you try to access them again. No doubt those huge URL parameters encode some browser session or timestamp which becomes invalid after a while. If that happens, reload the page and start playing again to generate new URLs.
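If you fetch segments from a script, it helps to tell an expired URL apart from any other failure. The sketch below is one way to do that, using the HTTP::Tiny module that ships with Perl (it needs IO::Socket::SSL installed for https URLs). The helper names segment_status and fetch_segment are hypothetical, and treating every 403 as "expired" is an assumption based on the behaviour described above.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: classify an HTTP::Tiny response hash.
# Assumption: a 403 means the signed URL has timed out.
sub segment_status {
    my ($response) = @_;
    return 'ok'      if $response->{success};
    return 'expired' if $response->{status} == 403;
    return 'failed';
}

# Hypothetical helper: download one segment to $filename and
# report 'ok', 'expired' or 'failed'.
sub fetch_segment {
    my ($url, $filename) = @_;
    my $response = HTTP::Tiny->new->mirror($url, $filename);
    return segment_status($response);
}
```

When fetch_segment returns 'expired', there is no point retrying the same URL: reload the page, copy a fresh playlist, and start over.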
So to recap all this: we need to
- take the m3u8 file,
- retrieve each mp3 segment,
- and concatenate them together.
Getting the m3u8 programmatically is hard, so just copy it from the browser : ) Save it as playlist.m3u8 next to the script below. And to put the pieces together you’ll need mp3cat installed; see http://tomclegg.ca/mp3cat for info.
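use strict; use warnings;
mkdir 'outdir' unless -d 'outdir';
open(my $fh, '<', 'playlist.m3u8') or die "can't open playlist: $!";
my $piece = 0;
while (my $url = <$fh>) {
    $url =~ s/\s+\z//;                             # strip the newline (and any \r)
    next if $url =~ m/^#/ || $url eq '';           # skip m3u8 tags and blank lines
    # zero-padded names, so the shell glob below sorts the pieces in order
    my $filename = sprintf('outdir/%04d.mp3', $piece++);
    # list-form system() keeps the shell away from the '?' and '&' in the URL
    system('wget', '--no-check-certificate', '-O', $filename, $url) == 0 or die "wget failed for $url\n";
}
close($fh);
print `cat outdir/*.mp3 | mp3cat - - > output.mp3`;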