3D audio file display

I would think this would be fairly straightforward in Vuo. I have a .wav file containing a wavetable: 16 single-cycle waveforms, each 2048 samples long (so the audio file is 32768 samples total, at a 44.1 kHz sample rate). How do I get the .wav file data into Vuo? I think once I have this sorted, I am good to go.

I want to display the 16 single-cycle waveforms in 3D, like many wavetable synths do. For example, Serum’s:

Screen Shot 2020-10-28 at 1.33.54 AM.png

Problem: Play Audio File appears to output something with 512 samples (and the wrong sample rate):

Screen Shot 2020-10-28 at 1.36.09 AM.png

mixed0_1-5.wav_.zip (115 KB)

Hello fellow Bidule user.

If it has not changed, Vuo only supports 48 kHz. And the 512 samples is the buffer size. That’s what my memory tells me. It’s been a while…

Hey, @Kewl I remember following along online with that video + music project you made a couple years back, very cool!

If I am dealing with actual samples, I wouldn’t think sample rate would be an issue here? The 512-sample buffer is another story – sending to Make Waveform Image appears to process 512 samples, one buffer-sized block at a time, sequentially. Working on queuing up a workable list of samples from the audio file (y coordinates in a graph).

I have hope. Right now I have a problem, probably something basic – in this comp, why is there no output from Make Line Strips Object? The zip file contains a test waveform.

Edit: the end goal is to create a list of 3D line strip objects, one object per 2048 audio samples, that can be placed on a grid as slices with Arrange 3D Objects on a Grid.

Looks like I can generate a list of 2048 items representing one single-cycle waveform by “unpacking” sequential buffer-sized blocks of samples (4 buffers = 2048 samples – though until I can see the output from Make Line Strips Object, I am not sure it is actually working). I have not quite been able to work out an intermediate step: how to use Build List to make a list of 4 audio buffers for each waveform, which amounts to list indices 1-4, 5-8, 9-12, etc., then “unpacking” to lists of 2048 items (waveform amplitudes) per waveform. Maybe this has to be by brute force, like with the Get Item From List and Append Lists section in the attached comp? This points to another end goal – being able to set wavetable parameters for number of waveforms and waveform sample length.
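Roughly the grouping I’m after, sketched outside Vuo in Python with placeholder data (the cycle length and count are the wavetable parameters I’d like to expose):

```python
# Placeholder for the decoded wavetable: 16 cycles x 2048 samples.
samples = [0.0] * 32768

CYCLE_LEN = 2048  # waveform sample length (wavetable parameter)

# One sub-list per single-cycle waveform: items 1-2048, 2049-4096, ...
cycles = [samples[i:i + CYCLE_LEN] for i in range(0, len(samples), CYCLE_LEN)]

assert len(cycles) == 16 and all(len(c) == CYCLE_LEN for c in cycles)
```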

3DAudioWaveform.zip (118 KB)

Ok, real progress. Not sure exactly why Make Line Strips Object was not working, but it is now – perhaps something to do with the Render Scene to Window node’s input port. (By the way, along the journey there has been some Vuo editor GUI weirdness: detaching a cable will leave “ghost” cables not attached to anything, node port fragments, etc. Doesn’t appear to cause any real problems yet.)

Now the issue is that the method I came up with to queue up 4 consecutive audio buffers, then unpack them into 16 waveforms of 2048 samples each, does not appear to be sample-accurate – it seems to grab only part of each waveform, and the slices are not aligned (see pic – this uses a draggable camera; also, there is a semi-attached line segment artifact in the front waveform).

The other pic shows the test .wav file, the 16x2048 sine wavetable – the goal is to get 16 complete sine wave cycles, aligned slices, stacked front to back.

How can I do this?

3DAudioWaveform.zip (559 KB)

Does it have to be line objects if you only want to display pre-existing .wav files? You can get something like the example display by using waveform images, outputting them to 3D squares, and arranging those to your liking.

This is one stream split by frequencies, but it should be easily adaptable to multiple files as well. It requires playback to display, but I would guess that’s the point?

Oh, and you have an offset from the input in your lines. Your list starts at the beginning of the line, but the control signal doesn’t come in until where the linear behaviour ends. This is probably why the waves are out of sync as well. Try cutting the input list going to the line object node from the beginning until it syncs up/starts at the wave.

Thank you, that’s helpful, I’ll try – maybe something to do with needing the file to play to the end, then enqueue, then build the graph? Maybe a properly placed Spin Off Event would solve it? By the way, the “wavetable” here is one .wav file with 16 waveforms one after the other. I’ll have another look at using Make Waveform Image. As long as the 16 slices can be extracted from the wavetable file and lined up with sample accuracy, I’ll go with whatever graphic method works best – or works, period. I’ll see how the orthographic camera performs – actually the line strip “ribbon” gets wonky and changes from start to end with the ortho camera; I’m wondering if there’s a way to flatten that view and lose the ribbon effect so all line weight and thickness is the same.

If you’re not scared of some heavy nerding, you can also just get the bytes from the wav files via the Data nodes and convert the sample range from the file to the Y-values you need. That way you get straight to the data you want instead of going through a build/process list step.

Be aware of the endianness of the wav data (see the second answer on Stack Overflow describing the canonical WAVE file format). Although the data starts at byte offset 45, it’s little-endian, meaning that you want to start at byte offset 46 and skip every other byte, as you won’t need the fine detail for a display anyway. You can probably skip even more bytes in total – just add 2 to the skip count to decrease the resolution…

I’m too slow to figure out the formula, but you also probably want to shift all values above 127 into the negative range: since the byte values are naturally unsigned, values above 127 actually represent the negative half of the signed range (they wrap around, two’s-complement style).
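For anyone wanting to check the idea outside Vuo, a rough Python sketch of the same byte skipping and sign shift (this assumes a canonical 44-byte header and 16-bit mono PCM; the file name is just a placeholder):

```python
with open("wavetable.wav", "rb") as f:
    raw = f.read()

# With a canonical 44-byte header, sample data starts at byte 45
# (1-indexed). Samples are little-endian pairs -- fine byte first,
# coarse byte second -- so take every other byte starting at byte 46
# (0-indexed: raw[45]).
coarse = raw[45::2]

# Bytes above 127 are the negative half of the signed range.
signed = [b - 256 if b > 127 else b for b in coarse]
```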

The Calculate List formula is:

(((((Height/255) * X) - (-(Height/2))) % ((Height/2) - (-(Height/2)))) + ((Height/2) - (-(Height/2)))) % ((Height/2) - (-(Height/2))) + (-(Height/2))

to deal with conversion from 8-bit to Vuo coordinates and place the parts correctly.
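A quick way to sanity-check that expression outside Vuo – a Python sketch of the same wrap, where Height is the intended display height (the double modulo just guards against negative remainders, mirroring the Vuo expression):

```python
HEIGHT = 1.0  # intended display height in Vuo coordinates (an assumption)

def byte_to_y(x, h=HEIGHT):
    """Scale a byte (0-255) by h/255, then wrap into [-h/2, h/2)."""
    v = (h / 255) * x - (-(h / 2))      # the (X - min) part
    rng = (h / 2) - (-(h / 2))          # max - min = h
    return ((v % rng) + rng) % rng + (-(h / 2))

# 0 -> 0.0, 127 -> ~h/2, 128 -> ~-h/2: high bytes land in the negative range.
print(byte_to_y(0), byte_to_y(127), byte_to_y(128))
```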

@MartinusMagneson, this is f-ing awesome! Thank you very much for doing the legwork on that. This is more what I was hoping for – being able to unpack the .wav file’s raw data. And thank you for sharing that 8-bit to Vuo coordinate conversion formula; I would have landed there somehow.

I still need some way to isolate each waveform cycle – some % op scenario. So building a list of the 16 consecutive waveform cycles in the wavetable is still a likely outcome, which is then versatile for scaling, color, manually moving through the wavetable, possibly displaying phase shifts, etc.

P.S. What tools were you using in Vuo to see what data you had? Were you monitoring the data with Console?

By accident I just found @alexmitchellmus’s wavetable synth node set in the node gallery. That’s on the mark here, too. Very cool.

I just looked at the header data with the two loose nodes at the top to confirm the data against the spec. For the data itself I looked at the output object, which confused me for a bit until I remembered the endian thing. If you know the sample count per cycle, it’s just a matter of setting the byte count (Cut Data) to double the sample count.

The wave data itself starts at byte 45 (right after the header) and, from what I understand, uses two bytes per sample. There is no direct data → samples conversion (maybe a feature request is in order?), but you might get lucky with converting a real list to audio. To do so (in theory) you can feed the output from the Cut Data node to a Deinterleave List node, where the second list gets calculated with (1/256) × ListItem and the first list with ((1/256)/256) × ListItem.

(The proper term is probably calculation, not formula. I’ve been baking too much the last 7 months…)
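In Python terms, the deinterleave-and-recombine idea might look like this (a sketch with made-up byte values, not a Vuo-verified recipe; note the two weighted lists get summed at the end):

```python
# Byte list as Get Data Bytes might output after Cut Data: little-endian
# pairs, fine (LSB) byte first, coarse (MSB) byte second.
data = [0x34, 0x12, 0xFF, 0x7F]  # two 16-bit samples: 0x1234 and 0x7FFF

fine = data[0::2]    # first deinterleaved list
coarse = data[1::2]  # second deinterleaved list

# Weight and sum: coarse contributes 1/256 per step, fine (1/256)/256,
# giving unsigned values in [0, 1).
samples = [c * (1 / 256) + f * ((1 / 256) / 256) for f, c in zip(fine, coarse)]

# For signed audio, values at or above 0.5 are the negative half;
# doubling then scales to the usual [-1, 1) range.
signed = [2 * (s - 1.0) if s >= 0.5 else 2 * s for s in samples]
```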

Awesome, thank you. I’ll see what I can put together.

(Yeah, man, these past months have been like no other… not to mention politics, so easy to let that @#$% take over the psyche…)  

You also have to add the two lists together again, forgot to mention that!

Damn Martinus, you really do have some nice skills in a broad range of domains ;)
Amazing.

No luck yet with my wavetable files. I’m a little confused at the moment about how to set up Comb List (or something else?). Viewing only the first part of the list in the output port display, it looks like there are a lot of extra zeroes in my file, and I am not sure about the pattern and what/how to skip. Why so many zeroes? @MartinusMagneson, as you mentioned, it seems that I should be able to “resample”. I need to bear down on the numbers. It would be nice to tame the data to get back to something resembling the 32768 total samples → 16x2048, i.e., straightforward power-of-2 stuff…

@bodysoulspirit I get lucky sometimes!

@jersmi

Comb List is for the visual part only; I’ll come back to the implementation, but you have to think in terms of samples and bits in relation to the bytes you work with. This might cover things you already know, but I can give a simple breakdown for others who might be interested as well. For audio, the sample rate determines the pitch resolution, and the bit depth the amplitude resolution. When we speak of a 44.1 kHz, 16-bit audio file, we mean that we have sampled the analog signal 44 100 times per second, with a quantization (rounding to nearest) of 16 bits = 65 536 different amplitude (loudness) levels.

Taking this over to byte manipulation: if 1 second of audio data is 44 100 samples, and there are 2 bytes per sample to give the 16 bits (there are 8 bits in a byte), you have 88 200 bytes for that second. Since the bytes are little-endian, the first byte of each pair contains the detail information, while the second byte contains the coarse information. In practice, the second byte places the amplitude into a coarse quantization of 256 levels (8 bits) from min (0) to max (255). The first byte places the amplitude into a fine quantization of 256 levels between two values in the second byte, giving the total of 65 536 levels of amplitude.
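In integer terms, one pair combines like this (a worked example, not a Vuo node):

```python
fine, coarse = 0x34, 0x12      # one little-endian byte pair
value = coarse * 256 + fine    # 18 * 256 + 52 = 4660 of 65 536 levels
```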

That one second of audio is then represented in Vuo by a list of 88 200 integers from the Get Data Bytes node. If you combined the bytes to get the proper 44 100 samples with a resolution of 65 536 levels each, it would require a screen resolution of 44 100 px × 65 536 px to display all of the detail. Undoubtedly cool, but unrealistic for most people at this time (and even if it were possible, it would be unnecessary, as you wouldn’t see the detail). In addition, you want to avoid processing the parts you don’t need, as that would use a lot of resources for no benefit.

This is where we can start cutting away the data we don’t need to display. First is the header, which by the spec is 44 bytes long. Then, we don’t need the fine information carried by the first byte of each pair, meaning we can skip it and start at byte 46 using the Cut Data node. Something to note here: since all the samples come in pairs of two, an odd byte number will always be fine data, while an even byte number will always be the coarse data you want if you skip further ahead.

If we now look at what we have, it still represents 44 100 samples for that second – far more than what’s needed for display – so we can use Comb List to chop away more of the data before sending it to a mesh/object. From the Cut Data node you can connect a Get Data Bytes node (which will give you a list of integers) and connect that to a Comb List node. Comb List works by specifying how many items in the list you want to keep, and how many you want to throw away. Here you want to keep 1 (as the next item would be a “fine” byte), then skip an odd number until the number of points you get is reasonable. If you keep 1 and skip 9, you effectively divide the list by 10, giving you 4 410 items; if you skip 999, you divide by about 1 000, giving you 45 items. Finding the suitable amount for your display and sample length will take some tweaking. Smaller is better, though, as huge lists can be quite resource intensive.
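The keep/skip behavior of Comb List is plain strided decimation; in Python terms (the input list is a stand-in):

```python
def comb(items, keep=1, skip=9):
    """Keep `keep` items, then drop `skip`, repeatedly."""
    stride = keep + skip
    return [x for i, x in enumerate(items) if i % stride < keep]

values = list(range(44_100))   # stand-in for one second's worth of values
points = comb(values)          # keep 1, skip 9: 44 100 / 10 = 4 410 items
assert len(points) == 4410

# With fine/coarse bytes still interleaved, an odd skip keeps the stride
# even, so every kept byte stays on the "coarse" position.
```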

As for 0s: if there is 1 millisecond of silence at the start of your wave data, it means you’ll have 44 samples with a value of 0, and thus 88 bytes with a value of 0 coming through the Get Data Bytes node. This will only show 0s at the output port. If the length of the silence is unknown, you could try scrubbing through your file using the starting point of the Cut Data node, adding an offset to the byte count input with an Add node until something shows up. Or you could try locating the starting sample in an audio editor (probably faster/more accurate).
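Scrubbing for the end of the silence can also be scripted; a rough Python sketch (it assumes the data starts right after a 44-byte header; the file name is a placeholder):

```python
with open("wavetable.wav", "rb") as f:
    raw = f.read()

# First non-zero data byte after the header (0-indexed offset).
first = next((i for i in range(44, len(raw)) if raw[i] != 0), None)
print("leading silence ends at byte offset", first)
```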

Note that this is only valid for a proper WAVE file. I’m not sure what the differences are for AIFF, but they could involve the header and the endianness of the data, so that would also have to be checked. Compressed formats (.mpX) will probably require decoding before anything useful can be pulled out with this approach.

Sharing my current stage, and my clear lack of knowledge. Still not quite getting the final result I need; digging in a little at a time. Online research has not quite given me the whole picture – fragments here and there, Stack Overflow, etc.

This is what I output right now:

I don’t understand a number of things –

In order to get at least something resembling a waveform display, I need to set the Cut Data start byte back to 45. Maybe this is because of the .wav file structure generated in the software I use (Plogue Bidule – uncompressed PCM)? Then there is the byte length of my file – 131153. How do I understand the header + remaining byte structure in relation to my 32768-sample, 16-bit mono file, etc.? I should double check, but I’d be really surprised if Bidule output anything other than clean, most-universal .wav format. What makes most numerical sense is sample length × 4 = 131072, which leaves 81 bytes left over – why? Then there is Comb List set to skip 5 – why every 6th byte in the list?

@MartinusMagneson – that 8-bit conversion calculation, I have not quite unpacked it. I want to understand the 8-bit integer conversion Get Data Bytes is outputting – is the calculation something like a µ-law conversion? Feeling the desire to simplify: generate a [-1, 1] range, then apply height scaling…

In general I am looking ahead a bit to having a reliable setup to handle a range of wavetable size and waveform cycle length.  

Take a look at this page, which is apparently the source of the previously referenced image: http://soundfile.sapp.org/doc/WaveFormat/. It has a better explanation of the header.

I think this discussion might turn into a full tutorial (the wave format should be well suited for that), but you will have to check the header to determine the specifics of your wave file. My example was a mono file; if you have a stereo file, it will interleave the data for each channel. You can read this from the “Num Channels” field in the header. Basically, you’ll have to determine the pick and skip counts from the data in the header, which should enable automatic conversion and display of the files as well. I’ll need some time to type it up, but I already have the header extraction done.
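In the meantime, the relevant header fields can be sanity-checked outside Vuo with Python’s standard wave module (the file name is a placeholder); channel count and bytes per sample usually explain surprising byte totals like 131 153:

```python
import wave

with wave.open("wavetable.wav", "rb") as w:
    print("channels:    ", w.getnchannels())  # "Num Channels" in the header
    print("bytes/sample:", w.getsampwidth())  # 2 = 16-bit, 4 = 32-bit
    print("sample rate: ", w.getframerate())
    print("frames:      ", w.getnframes())

# e.g. 32 768 frames x 4 bytes per frame = 131 072 data bytes; whatever
# remains beyond that plus the 44-byte header is extra metadata chunks.
```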