Breaking the Sound Barrier in Second Life

Published and last updated 2022-03-09.

Premise

Up until 2020, the Second Life viewer, along with its third-party derivatives, had an interesting exploit in its sound uploader code that permitted uploading sounds up to about 60 seconds in length. In preparation for a server-side update to allow native sound uploads up to 30 seconds, the "vanilla" viewer and Firestorm Viewer patched this exploit - perhaps unintentionally - so it cannot be used on current viewer versions. Regardless, I explain it below for history's sake.

Some other users have experimented with the underpinnings of this exploit, but I think I was the first to figure out a remarkably clean and efficient way to make use of it.

Background

Second Life permits sound uploads only as 44.1 kHz, 16-bit .WAV files, up to exactly 10 seconds in length. When uploading sounds individually, the viewer requires exactly this format, and sounds played by the viewer are always 44.1 kHz. I'm not sure whether sounds are stored on the server in .WAV or .OGG format, but the fact that the viewer handles .OGG files at all suggests they may be stored as Ogg Vorbis, which would make sense given its smaller file size.

When using the bulk upload tool, for whatever reason, the viewer allows you to submit an .OGG (Ogg Vorbis) file instead, and it performs the length check on that file incorrectly. The viewer will accept an .OGG file regardless of its sample rate, even though that rate should be 44.1 kHz. It then checks the length of the file not as a 44.1 kHz file, but at whatever sample rate the file itself specifies - for example, a file with an 88.2 kHz sample rate and 882,000 samples will measure exactly 10 seconds. But when the viewer actually plays the file, it assumes 44.1 kHz and plays the samples accordingly, because a sound file sent by the server should never be anything other than 44.1 kHz. Therefore, the file with 882,000 samples will play for 20 seconds, at half of its original speed. The viewer also does not check the bit depth - it will accept a 32-bit .OGG despite requiring a 16-bit .WAV. (This is a highly simplified explanation of digital audio, but it suffices to explain the bug.)
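The arithmetic behind the mismatch can be written out as a small sketch. The function names here are made up for illustration and are not taken from the viewer's code; the only assumptions are the ones stated above, namely that the length check divides by the file's own sample rate while playback always assumes 44.1 kHz.

```python
# Illustrative only: these helpers are hypothetical, not viewer code.
PLAYBACK_RATE_HZ = 44_100  # the rate the viewer assumes when playing a sound


def checked_seconds(n_samples: int, file_rate_hz: int) -> float:
    """Length as the upload check sees it: samples / the file's own rate."""
    return n_samples / file_rate_hz


def played_seconds(n_samples: int) -> float:
    """Length as actually heard: samples / the fixed playback rate."""
    return n_samples / PLAYBACK_RATE_HZ


n = 882_000
print(checked_seconds(n, 88_200))  # 10.0 - passes the 10-second check
print(played_seconds(n))           # 20.0 - what you hear in-world
```

With an 8x rate (352.8 kHz), the same reasoning turns a 59.949-second sound into one that checks as roughly 7.5 seconds.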

This exploit has, in fact, been known for quite some time. Sound files exist from the 2000s that break the ten-second barrier, and the concept fascinated me for a while. In 2013, I created a music format called Sei Media that essentially chained ten-second sound files together, and since then I have experimented with methods to use longer sound files to save money when uploading large quantities of music. So these glitched sound files sat in my inventory for years, taunting me, until I happened across the secret when playing with .OGG files in Audacity.

The above explanation should hint at how this bug is exploited, but there's a catch - if you play a sound at half of its original speed, it will also play with notably reduced quality. (Astute readers will mention at this point that playing an 88.2 kHz file at half speed should still result in a perfectly normal 44.1 kHz file, which is true, but I use "quality" in a specific sense that I'll describe later.) This isn't a huge issue at half-speed, but when you are trying to get up to 60 seconds, the quality is very poor. So as part of my research, I wanted to figure out a way to maintain the audio quality and possibly automate the process to save time when uploading long files.

Explaining Sample Rates

An .OGG file, like .WAV, is a series of audio "samples". These samples are numbers that represent a specific point on an audio waveform. If you imagine a sine wave tone, like the one in the above image, a sample is a single point somewhere on that wave. You can recreate the sound by stringing together a bunch of these points - the more points you have, the more accurately the sound is reproduced. The "sample rate", therefore, is the number of samples that constitute each second of audio. For 44.1 kHz, there are 44,100 samples per second. The image above shows the sample points on a 440 Hz sine wave sampled at 44.1 kHz. Technology Connections has an excellent video describing how this works.
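As a rough illustration of the description above, the following sketch computes the sample values of a 440 Hz sine wave at 44.1 kHz. The names are chosen for this example only; it uses floating-point samples between -1 and 1 rather than any particular file format.

```python
import math

SAMPLE_RATE_HZ = 44_100  # samples per second
TONE_HZ = 440            # frequency of the sine wave


def sine_samples(count: int) -> list[float]:
    """Return `count` consecutive sample points of the 440 Hz tone."""
    return [
        math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE_HZ)
        for i in range(count)
    ]


one_second = sine_samples(SAMPLE_RATE_HZ)      # 44,100 points for one second
samples_per_cycle = SAMPLE_RATE_HZ / TONE_HZ   # roughly 100 points per wave cycle
```

Each cycle of the tone is described by about 100 points, which is why the plotted samples trace the wave so closely.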

The Hard Way

I want to preface this explanation with a disclaimer that I somewhat independently discovered and tested this method in March 2018 and haven't used it since, so take it with a couple of grains of salt if you want to try it yourself. This was based on some vague hypothesizing on the Second Life Forums about sample rates that I used to poke around in Audacity. My recollection of how this worked may be wrong.

If you open a 44.1 kHz file in Audacity, it will let you change the sample rate of the audio to basically whatever you want. Audacity does not resample the file when you do this - it takes all of the samples it has and says "okay, you are all going to be played at X kHz now". The file simply plays faster when raising the sample rate and slower when lowering it. (If, instead, you were to resample the file, Audacity would add or remove samples and retain the "speed" of the audio.)
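The difference between relabeling and resampling can be sketched with Python's standard `wave` module. This is only an analogy to what Audacity does, using uncompressed .WAV data in memory; the helper names are invented for this example.

```python
import io
import wave


def make_silent_wav(seconds: int, rate_hz: int) -> bytes:
    """Build a mono, 16-bit WAV of silence entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(rate_hz)
        w.writeframes(b"\x00\x00" * (seconds * rate_hz))
    return buf.getvalue()


def relabel_rate(wav_bytes: bytes, new_rate_hz: int) -> bytes:
    """Copy the samples unchanged, but declare a different sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        params = r.getparams()
        frames = r.readframes(r.getnframes())
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(params.nchannels)
        w.setsampwidth(params.sampwidth)
        w.setframerate(new_rate_hz)  # only the label changes
        w.writeframes(frames)        # same samples, no resampling
    return buf.getvalue()


def duration_seconds(wav_bytes: bytes) -> float:
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        return r.getnframes() / r.getframerate()


original = make_silent_wav(20, 44_100)
relabeled = relabel_rate(original, 88_200)
print(duration_seconds(original), duration_seconds(relabeled))
```

The sample count is identical in both files; only the declared rate, and therefore the apparent duration, differs.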

If the trick isn't obvious by now, here it is. Change the sample rate of a 20-second 44.1 kHz file to 88.2 kHz, which will create a 10-second 88.2 kHz file. If you export it as an Ogg Vorbis file and upload it into Second Life, the viewer will check the length of the file as 88.2 kHz. But when played, the data in the file will be read as if it were 44.1 kHz, which causes it to play at the original 20-second speed.

The problem is compression quality. Ogg Vorbis is a lossy audio compression format. That is, when exporting an audio file, it will irretrievably throw away some of the audio data. When you export an Ogg Vorbis file, Audacity will prompt you to select a quality level from 0 to 10. Even at the highest quality level, some compression artifacts will be added to the audio. If you take this method to the extreme - for example, Audacity will let you go as high as 384 kHz - the audio will sound crushed when played in Second Life. This makes for an interesting hack, but it's not really one that should be relied on for perfect audio quality.

Unfortunately, I really don't know enough about audio compression to explain that phenomenon more precisely. In fact, I may well be completely wrong. But I can tell that the audio was messed up when I used this method. Fortunately, a different method avoids the problem entirely.

The Easy Way

The Ogg format is, charitably, a nightmare to explain, so I would certainly not do it any justice here. But the short version is that every Ogg file contains a header with metadata about what's inside it, followed by "pages" of data that are defined by whatever codec it contains. (Ogg can contain other stuff besides Vorbis-encoded audio, but I'm just talking about Vorbis here.)

As an experiment, I exported two .OGG files from an identical source file - the first was 44.1 kHz, the second was eight times faster, or 352.8 kHz. Sure enough, per the Vorbis spec, a hex editor showed the key - the 32-bit little-endian integer at offset 0x28 read 44100 in the first file and 352800 in the second.
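The hex editor check can be reproduced with a short sketch. It assumes the usual layout of an Ogg Vorbis file: a 27-byte page header plus a segment table, followed by the Vorbis identification packet (a type byte, the string "vorbis", a 4-byte version, a 1-byte channel count, then the 4-byte sample rate). When the first page has a single segment, that puts the sample rate at offset 0x28. The function name is hypothetical.

```python
import struct


def read_vorbis_sample_rate(data: bytes) -> int:
    """Read the sample rate from the first page of an Ogg Vorbis stream."""
    if data[:4] != b"OggS":
        raise ValueError("not an Ogg stream")
    n_segments = data[26]            # number of entries in the segment table
    packet_start = 27 + n_segments   # first byte after the page header
    if data[packet_start:packet_start + 7] != b"\x01vorbis":
        raise ValueError("first packet is not a Vorbis identification header")
    # Skip packet type + "vorbis" (7), version (4) and channel count (1).
    rate_offset = packet_start + 7 + 4 + 1
    (rate,) = struct.unpack_from("<I", data, rate_offset)
    return rate
```

Usage would be along the lines of `read_vorbis_sample_rate(open("sound.ogg", "rb").read())`, where `sound.ogg` is a placeholder file name.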

On 2018-03-30, I wrote a little utility that overwrites the 44.1 kHz sample rate in the header with 352.8 kHz and repairs the header checksum. When the file is uploaded into Second Life, the uploader reads the hacked header to determine the sound length, even though the sound data is actually untouched 44.1 kHz Vorbis-encoded audio. It therefore assumes that the sound is one eighth of its actual length and processes it, and the viewer then ignores the header sample rate when actually playing the file. Thus, it uses the sample rate exploit, but without compressing the audio at the wrong sample rate - the audio stream is intact and as high-quality as you like. The exploit doesn't touch the audio stream at all, so whatever you put in will sound pretty much exactly the same in Second Life, at whatever export quality level you chose.
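The idea behind the utility can be sketched in a few lines of Python. This is not the utility's actual source (which is C++), only a minimal illustration under stated assumptions: the identification header sits alone on the first Ogg page, the page checksum lives at byte 22 of that page, and it is the Ogg-style CRC-32 (polynomial 0x04C11DB7, initial value 0, no bit reflection) computed over the page with the checksum field zeroed. The function names are invented here.

```python
import struct


def ogg_crc(data: bytes) -> int:
    """Bitwise Ogg page CRC: poly 0x04C11DB7, init 0, MSB first, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def patch_sample_rate(ogg: bytes, new_rate_hz: int) -> bytes:
    """Overwrite the header sample rate and repair the first page's checksum."""
    if ogg[:4] != b"OggS":
        raise ValueError("not an Ogg stream")
    n_segments = ogg[26]
    header_len = 27 + n_segments
    page_len = header_len + sum(ogg[27:header_len])  # header + all segments
    page = bytearray(ogg[:page_len])
    if page[header_len:header_len + 7] != b"\x01vorbis":
        raise ValueError("first packet is not a Vorbis identification header")
    # Sample rate follows packet type + "vorbis" (7), version (4), channels (1).
    struct.pack_into("<I", page, header_len + 12, new_rate_hz)
    # The checksum is computed with its own field set to zero.
    page[22:26] = b"\x00\x00\x00\x00"
    struct.pack_into("<I", page, 22, ogg_crc(bytes(page)))
    # Every later page, including all audio data, is passed through untouched.
    return bytes(page) + ogg[page_len:]
```

Only the first page changes; the pages carrying the audio packets are copied byte for byte, which is the whole point of the method.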

You can download the utility here. Requires Windows. Source code is here.

To use, drag a 32-bit, 44.1 kHz, mono Ogg Vorbis file of up to 59.949 seconds onto the EXE; it will be converted in place. Please keep in mind that this is the first C++ app I wrote since taking some intro classes eight years earlier, so don't expect perfection, but it does the job well enough.

Limitations

The only limitation to this method is the sound length. I do not know for sure why this is the case, but the Second Life viewer will refuse to play sounds that are longer than exactly 59.949 seconds. Attempting to do so will throw an error message in the viewer debug log and play nothing. I presume it rounds the length to the nearest tenth and just throws an undocumented error at 60.0 seconds or more, just in case some wiseass manages to play a file with a ridiculous length.

Conclusion & TL;DR

The exploit makes use of a bug in the bulk upload code that allows .OGG files of arbitrary sample rate and bit depth. When played, however, the viewer will always assume all sounds use a 44.1 kHz sample rate.

The common way of making use of this exploit involves specifically exporting a file at an incorrectly high sample rate, which introduces compression distortions because the Ogg Vorbis codec compresses the file based on the arbitrary sample rate. When played, the audio decompression is performed at the wrong sample rate, which causes a distorted effect.

I discovered that by using a native 44.1 kHz file, if you merely overwrite the sample rate specified in the .OGG file header, the viewer will use that sample rate to perform the length check. The actual audio data, however, is the untouched 44.1 kHz stream - when the viewer plays this as a 44.1 kHz file, there is no loss of quality.

The download link is here. Drag an .OGG file onto it to convert it in place. Have fun!