Hello from the North

rainwave1's picture

New to Quartz Composer. Been working with FCP for some time. This community looks very cool. Good folks... Trying to wrap my mind around QC and ARToolKit. System spec: G5 dual core (non-Intel), 10.4.11; will be upgrading to Leopard in the future. I've been tapped for a project that uses a particle emitter influenced by tone of speech. Sort of like negative speech would create red or shaded colors, positive speech lighter colors. Not sure if this can be done in Tiger, let alone QC. I know I am asking a lot for a newbie. Just thought I would ask, though. Thank you for listening... I won't take up any of your time... Josh


cwright's picture
Re: Hello from the North

For tone of speech (assuming you mean frequency?), you can use the Audio Input patch, and some structure index patches to pull out basic frequency information. The built-in stuff only offers ~16 channels of frequency information (of which about 12-13 are usable), so that might be limiting for what you need.
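To make the data flow concrete, here is a hypothetical sketch (in Python, not QC's patch language) of what that amounts to: the Audio Input patch hands you a structure of roughly 16 band amplitudes, and Structure Index Member patches pull individual bands out of it. The band values and the low/high "balance" idea below are illustrative inventions, not anything QC itself provides.

```python
# Hypothetical sketch: a 16-band spectrum frame as a plain list,
# with a lookup that mimics a Structure Index Member patch.

def band_energy(spectrum, index):
    """Fetch one band's amplitude; 0.0 if the index is out of range,
    like a structure index patch with a missing member."""
    if 0 <= index < len(spectrum):
        return spectrum[index]
    return 0.0

def low_high_balance(spectrum):
    """Compare energy in the low vs. high half of the bands.
    Returns a value in [-1, 1]: negative = bass-heavy, positive = treble-heavy."""
    half = len(spectrum) // 2
    low = sum(spectrum[:half])
    high = sum(spectrum[half:])
    total = low + high
    if total == 0:
        return 0.0
    return (high - low) / total

# Example: a fake 16-band frame, weighted toward the low bands.
frame = [0.9, 0.8, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1,
         0.1, 0.05, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0]
print(band_energy(frame, 0))    # 0.9
print(low_high_balance(frame))  # negative -> bass-heavy frame
```

A single number like that balance could then drive a color input on a particle emitter, which is roughly the kind of mapping the original question asks about.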

(if by tone you mean more sophisticated processing, that's a lot of processing that no one does currently, other than fancy-pants speech recognition tools)

Custom-patch-wise, we've got some (unreleased?) audio patches for Leopard that give much higher definition. Unfortunately, development on QC for Tiger doesn't happen much anymore, so you might be hard-pressed to find customizations for that.

rainwave1's picture
Re: Hello from the North

Thank You

I figure I should upgrade. I will play around and let you know how it goes. Plus I want to output to QuickTime video (H.264); I know I need Leopard for that...

Thanks>>>

brettm's picture
Re: Hello from the North

negative speech ... positive speech

As with all programming, you have to tell the machine exactly what you want, in whatever language you're working in. So you'd have to define "positive" and "negative" in the terms available in your environment: the change in amplitude of frequency bands over time. That's all you've got to work with.

As a previous poster said, you're looking at fancy-pants speech recognition... if you're lucky! That is, if anyone has even succeeded at defining (and then implementing) "negative" speech -- this gets into artificial intelligence, cognitive modelling, etc., in addition to the raw speech-recognition part itself.

There probably are libraries out there that are the speech equivalent of computer-vision packages (e.g. OpenCV).

When you're not familiar with a language or paradigm, often something that you'd think would be "do-able" or even "easy" is in fact quite difficult! I know... because I've been there!

gtoledo3's picture
Re: Hello from the North

That's pretty rough (to discern emotional content via QC)... I agree about the AI comment.

One complementary thought: maybe you could set up something that responds to pace of speech (the number of volume peaks over a certain threshold within a given period of time) and to amplitude. I don't think that really translates to positive/negative, though.
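The peak-counting idea above can be sketched in a few lines. This is a hypothetical illustration, not QC code: the envelope samples and the threshold are made-up numbers, and a real patch would feed in live amplitude values.

```python
# Sketch: estimate "pace" of speech by counting volume peaks above
# a threshold inside a time window. All values here are illustrative.

def count_peaks(envelope, threshold):
    """Count local maxima in an amplitude envelope that exceed threshold."""
    peaks = 0
    for i in range(1, len(envelope) - 1):
        if (envelope[i] > threshold
                and envelope[i] > envelope[i - 1]
                and envelope[i] >= envelope[i + 1]):
            peaks += 1
    return peaks

def pace(envelope, threshold, window_seconds):
    """Peaks per second over the window -- a crude 'speech pace' number."""
    return count_peaks(envelope, threshold) / window_seconds

# Fake 2-second envelope, coarsely sampled: three bursts over threshold.
env = [0.1, 0.6, 0.2, 0.1, 0.7, 0.1, 0.3, 0.8, 0.2, 0.1]
print(pace(env, 0.5, 2.0))  # 1.5 peaks per second
```

As the post says, a number like this tracks agitation or pace, not positive/negative sentiment.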

brettm's picture
Re: Hello from the North

the number of volume peaks over a certain threshold within a given period of time

Yes, good idea. Reminds me of something I've been thinking about lately:

    How to define "interesting" derived audio characteristics for visualization of drones?

Simple beat-detection doesn't give you much.

Instead I think we'd need characteristics over a much larger time window. And it would involve psycho-acoustic features that we can computationally identify in drones:

-- "persistent" amplitude peaks in contiguous regions of the spectrum
-- "movement" of amplitude across contiguous bands
-- "quiet" detection (we are now in a quiet period)
-- "post-crescendo" detection (we are now in a time following a short, loud period)
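Two of those features can be sketched directly from a rolling amplitude history. This is a hedged illustration only: the threshold values (`quiet_level`, `loud_level`, `calm_level`) and the two-sample "now" window are guesses, not anything from a real implementation.

```python
# Sketch: "quiet" and "post-crescendo" detection from a short
# history of amplitude samples. Thresholds are illustrative.

def is_quiet(history, quiet_level=0.1):
    """True if every recent amplitude sample sits below quiet_level."""
    return len(history) > 0 and max(history) < quiet_level

def is_post_crescendo(history, loud_level=0.8, calm_level=0.3):
    """True if we are calm now but a loud burst happened recently:
    the window contains a sample above loud_level, yet the most
    recent samples have dropped back below calm_level."""
    if len(history) < 4:
        return False
    recent = history[-2:]   # "now"
    earlier = history[:-2]  # the rest of the window
    return max(earlier) >= loud_level and max(recent) < calm_level

calm = [0.05, 0.08, 0.02, 0.06]
after_burst = [0.2, 0.9, 0.85, 0.2, 0.1]
print(is_quiet(calm))                  # True
print(is_post_crescendo(after_burst))  # True
```

The "persistent peak" and "movement across bands" features would need the same kind of windowing, but over per-band spectra rather than a single envelope.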

I'm thinking that "optical flow" techniques might be useful here, applied to the movements of peaks and valleys in the 3-D surface induced by the time-set of spectra under consideration...

This is also one of those fancy-pants areas of audio-processing/computer-music research, e.g.

http://ccrma.stanford.edu/~brg/research/pc/pitchtrack.html

Anyone here with a PhD in DSP?

usefuldesign.au's picture
Re: Hello from the North

Another angle that is theoretically within the grasp of Quartz plus OpenCV or similar is facial expression recognition. Quartz's lightweight coding environment, its speed of execution, and the high-resolution image detection possibly required may be practical limitations, though.

A psychologist, Paul Ekman, did heaps of research into what he called micro-expressions: momentary changes in the face that can be extreme but often go unnoticed, at least at the conscious level, because they last only a fraction of a second. You need a decent frame rate on your camera to pick most of them up.

Ekman documented (if I remember correctly) over 100 micro-expressions directly related to specific emotional states (more subtle than positive/negative), but if you could detect a few of them, such as contempt, disgust, anger, or admiration, you could parse for negative/positive. More success would be achieved if you had some context algorithms for the micro-expressions themselves and for the spoken content.

They do say the text content of our speech makes up only a small percentage of the whole communication, but it's a pretty important part if you want to know if somebody is truth'n or lie'n.

The TV show "Lie to Me" was loosely based on Ekman's work, although it is more of a popularisation than documentation. There's a chapter on his work in "Blink" by Malcolm Gladwell which makes it all very accessible, at the discursive level anyway.

It stands to reason that if there are so many micro-expressions on the face, the voice will also contain auditory 'tells', but it might take you a decade of research and a funded research team to discover them. Best of luck.

Come to think of it, you'd have a pretty juicy market if you did get a visual micro-expression detector going that didn't even require the person being sampled to know they were being parsed by a machine for emotional 'tells'.

First the military, then clean out the world's poker tournaments (joking, that's even more negative than the military!), throw one in every paddy wagon, broadcast media could use it on politicians in parliament; sky's the limit.

rainwave1's picture
Re: Hello from the North

OK, looks like it would be a tough one. So the particle emitters I see changing to the music are strictly random? What about bringing it down to programming certain negative words to trigger a particular outcome, for example?

Speech recognition would be involved somewhere in the loop. Biting off more than I can handle, I know.

Thank you for all the responses:)

psonice's picture
Re: Hello from the North

Like everyone else said, determining whether the speech is negative will be really hard, and it'll most likely need a fair bit of programming. If you can do that, though, the rest is pretty easy.

Quick + easy solution: use a human. They're pretty good at knowing what emotions are involved, and very capable of pushing a slider up and down :) Unless you really need this to be a permanent, automated thing, the time and effort will be much smaller if you ask somebody to do it.

gtoledo3's picture
Re: Hello from the North

Yep! Right on.

A thought in line with that: over half of the stuff I do that has something to do with music isn't strictly music-reactive; it's just been timed correctly.

My best thought is, if it is for performance.... have someone working the qtz in real time just like psonice says, unless it really HAS to be automated, etc.

In my experience, if we ignore the environment of QC and just talk about what is possible: the word recognition is possible, but who is to say what is positive or negative absolutely, since it depends upon inflection? From working with some DSP for microphone simulation, I can say this is a pretty hard thing to pull off, given that so many things besides the voice can change frequency peaks (which can denote an agitated or relaxed voice): the microphone, the room, the particular speaker's tonality, distance from the mic / proximity effect, the audio chain in general, ambient noise...

It's an awesome idea, that's for sure.

I think the audio-reactive stuff you see with QC is basically all volume-peak or frequency-spectrum stuff, or simply timed correctly via things like LFO / Interpolation / Timeline / Value Historian patches.
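The "timed correctly" approach can be sketched with the kind of sine LFO that QC's LFO patch computes. This is a hypothetical illustration (the function name and parameters are mine, not QC's API), showing a periodic value that could drive any animation input on a schedule rather than reacting to audio.

```python
import math

# Sketch: a sine LFO driving a parameter over time,
# like QC's LFO patch would. Parameters are illustrative.

def lfo(t, frequency=0.5, amplitude=1.0, offset=0.0):
    """Value of a sine LFO at time t (seconds)."""
    return offset + amplitude * math.sin(2.0 * math.pi * frequency * t)

# Sample a 0.5 Hz LFO across one full 2-second cycle.
for t in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(round(lfo(t), 3))
```

Feed the clock in, wire the output to a particle-emitter input, and the visuals look "reactive" as long as the timing matches the music.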

rainwave1's picture
Re: Hello from the North

OK, I think I know where you are all coming from. I have been tasked by a group to pull off something in line with Masaru Emoto's "The Message from Water". If you google him you'll know what I have in mind. Thanks again for brainstorming. I think we can close this thread for now... Josh

Thanks

usefuldesign.au's picture
Re: Hello from the North

"I have been tasked by a group to pull off something in line with Masaru Emoto"

They could have asked you for something easier, like composing Beethoven's 10th symphony. If you accept his work as science, Emoto discovered a previously undetected phenomenon of nature, how emotional vibrations come to affect this phenomenon, and how to document it. No small feat.

gtoledo3's picture
Re: Hello from the North

heheh, I want this to come out right.... I'm not TOTALLY bagging on him, because I think he has a good rap...

Sometimes I see visual stuff, and it's like "we've replicated the moment of the creation of the universe in inverse, and ripples in time bla bla bla"... and really it's just a particle generator that does whatever it does, and someone simply is SAYING, "oh look, this is the visual representation of ____" (whatever it is supposed to be).

That's how I feel about Masaru. Any of that stuff could have been done, and if you did it a million times, you would have a million different visual results. In that sense, I think it's highly laughable.

So, I would say, just make something that looks cool, and insist that it represents whatever it is SUPPOSED to, because that's all he's doing.

gtoledo3's picture
Re: Hello from the North

Let's see all of the pics, without culling. That's my take. Let's see the same word repeated many times, at different environmental conditions.

Anyone that knows me, knows that I am actually really open minded towards this type of thing, but that's my opinion on him and what he does, and I'm sticking to it :)

usefuldesign.au's picture
Re: Hello from the North

But the Tibetan one looks like a mandala, and the heavy metal one is so ugly it has to be true!

I would like to believe he is a genuine operator. There are a few Japanese scientists who excel at the straight science-spiritual crossover thing, in nutrition and so on; they really seem to get the 'whole', as it were, whereas so many in the West are on one side of the duality or the other. Some documentation or independent verification would be excellent though, I agree. In the end I don't rely on this kind of material for any 'world view'. *shrugs*

This guy gets wheeled out on every new-age device website as proof that mind influences matter (which I have no debate with at all), and therefore of the validity of their super-$$ unproven device. That I find {insert random negative emotion here} at times.