Audio level per-frame as 1D texture

This is an odd one. I'd like a plugin that records a frames-worth of audio (at the current frame-rate), and outputs the level of the audio during that time-period as a 1D texture (ie a strip) of selectable width.

So, for example, you could choose to have it output a 256px wide strip every frame. The luminosity of the pixels across the strip from left to right would represent the amplitude of the audio over the course of one frame.

How does that sound? Is it doable in a QC plugin?

alx

cwright's picture
doable, but deadly

This is actually how spurtg worked nearly a decade ago (holy crap, I need to get a real job... ;). It's a powerful mechanism; however, there are some caveats:

a "frame's worth" of audio is meaningless: how long is a frame? How many samples per second? Both vary over time (frame rates in QC in particular are all over the place; the sample rate is usually more consistent, but it's never synced with the video frame rate, so the two will drift). Instead of doing shady hand-waving with sample rates and frame rates, why not have an input that asks how many samples you're interested in fetching, and have the plugin provide a texture of that size containing the last N samples (updated asynchronously, so no audio/video stalls occur)? That way, if you asked for 600 samples at a sample rate of 1Hz (asinine, I know), it wouldn't take 10 minutes per frame :)
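
The "last N samples" scheme described above could be sketched like this (pure illustration; the class and method names are made up, not from any real plugin):

```python
# Hypothetical sketch of the "last N samples" approach: the audio
# thread appends into a fixed-size ring buffer, and the video thread
# grabs the most recent N samples whenever it renders a texture, so
# neither side ever blocks on the other.
from collections import deque

class SampleTap:
    def __init__(self, n_samples):
        # keep only the most recent n_samples; older data falls off
        self.buf = deque(maxlen=n_samples)

    def push(self, samples):   # called from the audio thread
        self.buf.extend(samples)

    def latest(self):          # called from the video thread
        return list(self.buf)

tap = SampleTap(8)
tap.push([0.1, 0.2, 0.3, 0.4, 0.5])
tap.push([0.6, 0.7, 0.8, 0.9, 1.0])
print(tap.latest())   # the 8 most recent samples: [0.3, ..., 1.0]
```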

Other details: How do we handle stereo? 5.1? 7.1? X.Y? As independent textures?

toneburst's picture
well, I was just thinking it

Well, the old tricks are the best as they say. Or is that jokes...?

Well, I was just thinking it would record audio for a frame (however long that was) into a buffer at a fixed sample rate. Then, while it was recording the next frame's worth, it would divide the number of samples recorded by the number of pixels requested, and write every nth sample into the image as a luminosity value, discarding the rest.
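
That decimation scheme, sketched in Python (names are illustrative, not from any plugin API):

```python
# Decimate one frame's worth of samples down to the requested pixel
# width by keeping every nth sample, as described above. A real plugin
# would likely average each bucket instead of dropping samples, to
# reduce aliasing, but this mirrors the description.
def samples_to_strip(samples, width):
    step = len(samples) / width           # samples per pixel
    return [samples[int(i * step)] for i in range(width)]

frame = [i / 800 for i in range(800)]     # pretend 800 samples arrived this frame
strip = samples_to_strip(frame, 256)      # 256px-wide luminosity strip
print(len(strip))                         # 256
```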

A non-programmer's idea of what should happen, of course....

alx

Quartz Composer Blog: http://machinesdontcare.wordpress.com

Music Site: http://www.toneburst.net

toneburst's picture
I'm not looking for total

I'm not looking for total accuracy here, by the way. I'm really just after a way to get around the fact that you can only get audio level data into QC once per frame, and I'm interested in trying to emulate analogue systems that can modulate video signals continuously, rather than just on a frame-by-frame basis.

Incidentally, is there a theoretical limit to the pixel length of one of these strips?

Hope this makes sense.


cwright's picture
max texture size

The image size would be limited to the maximum texture size supported by the card in use (2048 for Intel cards, 4096 for everything else, and maybe 1024 for old cards).
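
In practice a plugin would query the driver (GL_MAX_TEXTURE_SIZE) rather than hard-coding the limit; the sketch below just bakes in the figures from the post for illustration:

```python
# Clamp a requested strip width to the card's maximum texture size.
# The numbers here are the ballpark figures quoted above, not values
# queried from any actual driver.
LIMITS = {"intel": 2048, "old": 1024, "other": 4096}

def clamp_width(requested, card="other"):
    return min(requested, LIMITS[card])

print(clamp_width(8192, "intel"))   # 2048
```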

I fully understand the (un)need for total accuracy. But if it's possible, I'd like to keep data in QC as high-def as possible. :)

toneburst's picture
Multi-Channel Audio

good point...

I hadn't thought of that. Maybe have several options:

  1. Output a single image strip with the average of all channels
  2. Output separate strip for each channel
  3. Output one 2D image strip with one line for each channel (more efficient?)


cwright's picture
multistrip...

1 is easy, but not particularly useful if you're going to the effort of having a working multi-channel setup. Not sure if I'll do this.

2 was my initial thought; it's easy, and you can ignore what you're not interested in. You also don't need to deal with some of the filtering issues presented in 3...

3 was also an initial thought, but since texture lookups aren't always pixel-exact, you'd get some filter blur from adjacent pixels. This isn't bad horizontally (temporal smoothing), but it would be wrong vertically (cross-channel smoothing). If we can do some tests to refine the precision of the sampled data, this would definitely simplify things, though (fewer images to deal with, and fewer system/VRAM transactions).

toneburst's picture
Cross-Channel Blurring

I've come across this issue in the past, in fact, with 2D LUTs. You can get around the problem in a quick-and-dirty way by making the strip, say, 20 pixels high, and sampling at Y = 5px and Y = 15px (or the normalized equivalent coordinates). Probably not ideal, but I've known it to work.
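
The workaround in numbers (a sketch; the function name and the 10px-per-row figure are mine, chosen to match the 20px/5px/15px example above):

```python
# Give each channel a tall row and sample at the row's vertical centre,
# so bilinear filtering never mixes neighbouring channels. Returns the
# normalized (0..1) V coordinate a texture lookup would use.
def row_centre_v(channel, n_channels, row_height=10):
    height = n_channels * row_height              # e.g. 2 * 10 = 20px tall
    y_px = channel * row_height + row_height / 2  # 5px, 15px, ...
    return y_px / height                          # normalized V coordinate

print(row_centre_v(0, 2))   # 0.25  (5px of a 20px image)
print(row_centre_v(1, 2))   # 0.75  (15px of a 20px image)
```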


cwright's picture
good to know

Good to know this works; it was one hack-around I was considering, but I haven't gotten around to whipping up a test just yet.

Do you have any indication of how large the "landing strip" needs to be? 5 pixels seems high; maybe 3 would do? More tests to do, I guess.

toneburst's picture
Dunno.

Dunno. Might require some experimentation.

It might actually be cool to have that L-R blurring. You could even deliberately introduce some kind of vertical blending. That way, you'd have a range of options between two completely discrete channels and a mono signal. Might be useful for something....


franz's picture
the manual way....

Can't you do this manually, without any plugin? Make a 256px by 1px Render in Image patch (into which you'll have plugged some graphics responding to the audio amplitude, however you like), then feed it into an accumulator (or multiple ones). It shouldn't be that hard. What are you trying to do exactly? What do you intend to do with your 256px strip? I'm just curious.

cwright's picture
high-res

You can only accumulate one sample per frame that way, though; he wants access to a whole frame's worth of samples (typically between 200 and 800 samples, depending on the various rates).
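
Where that 200-to-800 figure comes from is simple arithmetic (the rates below are illustrative, not from any particular setup):

```python
# Samples per frame is just the audio sample rate divided by the
# video frame rate.
def samples_per_frame(sample_rate, fps):
    return sample_rate / fps

print(samples_per_frame(44100, 60))    # 735.0 samples per frame at 60fps
print(samples_per_frame(44100, 120))   # 367.5 at a fast frame rate
```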

He most likely wants them for displacement in GLSL. Though I could also think of some other interesting uses (scopes of various flavors, GPU signal analysis, among others).

toneburst's picture
That's Correct

I'm looking to get a range of values per-frame. Also it will allow me to get much closer to a traditional oscilloscope, which is continuously updated, rather than only updating 60x/sec.

I can also see it being used to emulate the kind of analogue CRT effects where audio signals are used to directly modulate the screen scan-line in various ways (think Rutt-Etra).

I can see a use for accumulation-type effects being applied to the output of such a plugin, mind you, to smooth the transitions between frames a little.


franz's picture
smooth

"Also it will allow me to get much closer to a traditional oscilloscope, which is continuously updated, rather than only updating 60x/sec."

Maybe some interpolation could do the trick and smooth out the result to give it an analogue look.

cwright's picture
insufficient data

There's not enough data at one sample per frame to get a decent scope.

For example:

This screenshot had, I believe, 800 samples of data per frame (each frame had entirely new data for all 800 points). With an accumulator in QC, it would take 10-15 seconds to acquire that much data, and all the high-frequency (>60Hz) content would be lost -- for most audio, the interesting stuff is above 60Hz :)

As an alternative, he could try reconstructing the waveform from the frequency band data, but the limited data set (16 channels) still severely limits the accuracy of the regenerated signal.

toneburst's picture
Great!!

This example looks exactly like the kind of thing I'm after, in fact. How did you do it?

Incidentally, I did consider recreating the waveform from the spectral outputs. That gives a different kind of display, though: more like a very low-res FFT than the analogue waveform display I'm after.


cwright's picture
spurtg

This was rendered in a really old graphics program I wrote in high school, called spurtg. You can find it here: http://prj.softpixel.com/spurtg/ It's Linux-only, super antiquated, and the 3rd-generation precursor to Kineme in its current form (spurtg was the beginning of my foray into interactive audio/visual applications).

toneburst's picture
Ah... Now I see...

I see. Very nice.

It's nice to know great minds really DO think alike ;) Though sadly my 'great mind' is unmatched by any great ability to actually put any of my ideas into action....


cwright's picture
iFFT

To reconstitute the output waveform from the FFT coefficients, you just need to perform an inverse FFT. This will give you the scope waveform, not the low-res FFT/stereo-esque effect you may be thinking of. It's a lot of math, but it might be possible in JS or with specially crafted math expressions.
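
A rough illustration of the idea (not QC code; the function name and the zero-phase assumption are mine, and as I understand it the band data is magnitude-only anyway, so phase information is already lost before you start):

```python
# Rebuild a time-domain waveform from a handful of magnitude bands by
# summing one cosine per band -- a crude inverse DFT with all phases
# assumed zero. The 16-band truncation plus the missing phase means
# the result is only a loose sketch of the real signal.
import math

def resynthesize(bands, n_points):
    wave = []
    for i in range(n_points):
        t = i / n_points
        s = sum(m * math.cos(2 * math.pi * k * t)
                for k, m in enumerate(bands, start=1))
        wave.append(s)
    return wave

bands = [1.0] + [0.0] * 15      # 16 bands: pure fundamental
wave = resynthesize(bands, 64)
print(round(wave[0], 3))        # 1.0, the peak of the cosine
```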

But with only 16 channels, the regenerated wave will still be a gross simplification of the actual input :(

toneburst's picture
Ah, true. I hadn't thought

Ah, true. I hadn't thought of that. You're right, though: it will give some kind of waveform. You're also right that it won't be much like the original. I think you need 256+ partials to get an anywhere-near-accurate waveform from FFT analysis/resynthesis. Having said that, the result with 16 bands and one update per frame might be enough for a visually useful output. I suspect the slowdown associated with doing all the necessary maths in a JS patch might mean it's not worth going down this route.

What do you think?


cwright's picture
think not, do

I'm prone to being completely wrong when asked what I think; I suggest you whip up an experiment, and see if it's feasible or not :)

For a 16-point iFFT, with complexity n*log(n), you're only doing something like 64 operations; that isn't terribly difficult, and could probably be performed in real time without too much of a performance impact. Famous last words, though ;)
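
That estimate, checked numerically (a back-of-the-envelope count of butterfly stages, not a real operation tally):

```python
# An n-point FFT/iFFT runs in O(n * log2 n), so 16 points costs
# roughly 16 * 4 = 64 operations.
import math

def fft_ops(n):
    return int(n * math.log2(n))

print(fft_ops(16))   # 64
```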

toneburst's picture
Unfortunately

I don't have the first idea how I'd even think about making a start on that, maths-dunce that I am...

:(


toneburst's picture
I've just been thinking about this

Surely, even if you could do the maths to recreate the waveform in JavaScript, I can't see a way to actually draw it, since the JavaScript patch can't generate images. You could send out a long array of numbers, but there aren't any patches that can turn a long array of numbers into anything useful per-frame. Or am I wrong?

It's got to be some kind of plugin, I think.


toneburst's picture
Thinking of Commissioning

I'm seriously considering commissioning this one.
