Depth Textures with GL Read Pixels

toneburst's picture

There doesn't seem to be enough tonal variation between near and far objects to be able to use the output of the Read Pixels patch as a depth texture in depth-of-field or SSAO shaders.

Would there be some mileage in some options for pre-processing the depth image in its original 32bit/ch form before outputting it from the plugin? I'd envisage being able to set a near and far limit, and having the full 0 to 1 range of tones between those values. Don't know if this would be possible though.
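For illustration, the near/far remap described above could look roughly like the following. This is only a minimal Python sketch of the arithmetic; the function and parameter names are hypothetical, and a real version would have to run per pixel on the GPU rather than on the CPU.

```python
def remap_depth(d, near_limit, far_limit):
    """Stretch depth values between near_limit and far_limit to the full 0..1 range.

    d, near_limit and far_limit are all depth values in the same units
    (hypothetical names, not part of any plugin).
    """
    if far_limit <= near_limit:
        raise ValueError("far_limit must be greater than near_limit")
    t = (d - near_limit) / (far_limit - near_limit)
    # Clamp so values outside the chosen limits saturate at 0 or 1.
    return min(1.0, max(0.0, t))
```

With limits of 0.9 and 1.0, for example, a depth halfway between them maps to roughly 0.5, and anything nearer than 0.9 maps to 0.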

a|x

cwright's picture
Re: Depth Textures with GL Read Pixels

toneburst wrote:
There doesn't seem to be enough tonal variation between near and far objects to be able to use the output of the Read Pixels patch as a depth texture in depth-of-field or SSAO shaders.

Correct, there are only about 256 values (despite reading the texture in 32bit floating point format... grumbles)
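To see why about 256 values is too few for a depth image, here is a rough Python sketch. It assumes the rounding behaves like plain 8-bit quantization, which the thread does not confirm, and the helper names are made up for the example.

```python
def quantize_8bit(d):
    """Round a 0..1 depth value to the nearest of 256 levels (assumed behaviour)."""
    return round(d * 255)


def count_levels(depths):
    """Number of distinct 8-bit levels that a list of depth values lands on."""
    return len({quantize_8bit(d) for d in depths})


# 1001 evenly spaced depth values packed into the range 0.99 .. 1.0
samples = [0.99 + i * 0.00001 for i in range(1001)]
print(count_levels(samples))
```

Under that assumption, a thousand distinct depths near the far end of the range collapse onto only a handful of levels, which matches the lack of tonal variation described above.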

toneburst wrote:
Would there be some mileage in some options for pre-processing the depth image in its original 32bit/ch form before outputting it from the plugin? I'd envisage being able to set a near and far limit, and having the full 0 to 1 range of tones between those values. Don't know if this would be possible though.

The problem isn't reading it as a floating point texture (it's doing that now), it's that CoreImage/QCImage (probably the latter) sucks and apparently rounds the values when creating images from GL Textures. There's an alternative way to create QCImages using GLTextures (skipping CoreImage), but it also does a bit-depth-reduction for some reason. I'm trying to figure out a fast accurate way to do this, but no ways have come forth yet :/

Preprocessing (to clamp/bias) would be absolutely dreadful for performance (unless filtered on the GPU, it'll have to download, process, and reupload, which will almost guarantee a <30fps framerate)

toneburst's picture
Re: Depth Textures with GL Read Pixels

Hi cwright,

thanks for getting back to me, and so fast....

It's a shame there isn't an easy way to do this :(

In the past, I've written shaders to write out depth values onto the geometry texture, but obviously with this method you lose at least one channel, unless you render the same geometry multiple times (which is fiddly and inefficient).

A lot of the stuff I want to do at the moment would be made much easier by the ability to render the same geometry to multiple textures. I guess in future I'll just have to do this kind of fancy business in custom plugins.

a|x

markus.helminen's picture
Re: Depth Textures with GL Read Pixels

Hi!

Good that I found a discussion about this. I already wrote a similar question in another thread ( http://kineme.net/forum/Discussion/General/GLToolsv14#comment-11351 ), but I actually got part of the answer from here.

My problem was the linearization of the depth image. There is plenty of documentation, and there are solutions with sample code ( http://www.geeks3d.com/20091216/geexlab-how-to-visualize-the-depth-buffe... ), that might help you get the nonlinear depth from eye space corrected to linear screen-space values.
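As a rough illustration of the linearization in question, here is a minimal Python sketch. It assumes a standard OpenGL perspective projection with the default depth range, and the near and far clip distances are example parameters, not values taken from any Quartz Composer patch. The same arithmetic would normally live in a fragment shader or CI kernel.

```python
def linearize_depth(z, near, far):
    """Convert a depth-buffer value z in [0, 1] to eye-space distance.

    Assumes a standard OpenGL perspective projection; near and far are
    the clip plane distances (assumed inputs).
    """
    z_ndc = 2.0 * z - 1.0  # depth buffer [0, 1] -> NDC [-1, 1]
    return (2.0 * near * far) / (far + near - z_ndc * (far - near))


def normalized_linear_depth(z, near, far):
    """Linear depth rescaled so the near plane maps to 0 and the far plane to 1."""
    return (linearize_depth(z, near, far) - near) / (far - near)
```

With near = 0.1 and far = 100, a stored depth of 0.5 corresponds to a distance of only about 0.2, which shows how much of the stored range is spent right in front of the camera.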

I can probably get shaders going that do the depth calculations and the screen-space effects (with some sweat and tears; I'm pretty new to GLSL). But I guess it is more efficient to do it after the final "Render In Image": take the depth info with GL Read Pixels, add the effects patches, and render the scene to a Billboard, or render it to a Billboard inside a shader that does the effects. The last way I might be able to pull off, but if I want an effect/filter chain before the final Billboard it wouldn't be a good solution. I would have to do another Render In Image with the shader. Or maybe Read Pixels can run as multiple instances, so that I can get the color map from the post-processing shader with GL Read Pixels and Spooky? What is the performance difference between Read Pixels and Render In Image?

Or maybe a final post-processing shader with a lot of uniform parameters would do: a good DoF, SSAO (ASSAO is nice too), good fast AA (I don't have QC4, so I don't know whether its AA is any good), plus other post-image stuff. You could turn different effects on/off in the shader. My experience with QC is that pixel shaders are far faster at processing than the image filters that come with Quartz.

By the way, has anybody made a shader library for QC? I guess people have their own supershaders and don't want to give them up. Maybe there could be a Kineme shader pack, or something like that.

Anyway, is there a way to get the depth linearization into the Read Pixels plugin, the new "Render in image with depth", or a CI filter? Thanks in advance!

cwright's picture
Re: Depth Textures with GL Read Pixels

the values from the depth stuff (Render in Image With depth is private, by the way -- don't actually depend on it for anything, as it'll likely change in the next release) are provided as they're stored on the gpu. We don't (and won't) do any post processing, as it's trivial to do that yourself with a core image filter. Extra postprocessing means more code for us to write/test/maintain, and worse performance for everyone :/

read pixels reads from the current context (the visualizer, typically mid-render), so don't use that unless you really understand what you're doing (QC's free to evaluate stuff out of order as long as the output is correct, so expecting read pixels to work identically on 10.5, 10.6, and 10.7 is a tenuous proposition at best). Read pixels is much faster than render in image (there's no extra context involved), but RII should still be quick enough for your needs (and generally more reliable)

markus.helminen's picture
Re: Depth Textures with GL Read Pixels

Thank you for the answer. I think I understand the read pixels better now.

This means I'll have to learn to write Core Image code then. Can I do the same stuff with Core Image as I would with a fragment shader, and will it be as fast? Is it more efficient to have multiple CI filters, or just one doing all the different stuff?

This is probably a good starting point? http://developer.apple.com/mac/library/documentation/GraphicsImaging/Con...

It might take a while for me to learn it well enough to write my own filters, but I'll try as always. In the meantime, could someone on this forum write the depth linearization CI filter based on fragment shader code? If this is not too much to ask.

cwright's picture
Re: Depth Textures with GL Read Pixels

CI kernel code is almost verbatim frag shader code (a few things are renamed, and there's no discard feature, but otherwise the features are mostly the same) -- CI creates a GLSL shader from the kernel, so the speed should be identical.

As for combining filters: as long as the filter's simple enough to live on the gpu, fewer is better -- however, if you do too many things in the same filter, it ends up either splitting it automatically into 2 kernels (internally -- you don't need to care, other than performance impact), or worse, makes the filter run on the CPU (this is very slow). You should see stuff in Console.app pop up if that happens, to let you know you're hitting the limit.

markus.helminen's picture
Re: Depth Textures with GL Read Pixels

Thanks!

This is good to know. Glad to have started asking questions on this forum. For some reason I always thought that Core Image would be really hardcore to write. I'll definitely check it out now. If I can write my own post-processing with CI it will make my life a lot easier, especially if combining all the pixel shader code inside one CI filter works well. I used to do post-processing with shaders inside Render In Image patches connected to a Billboard. That slowed down the frame rate a lot and apparently was not the way to do it.

gtoledo3's picture
Re: Depth Textures with GL Read Pixels

Doesn't it seem weird that even if the filter ended up splitting into 2 kernels that it makes it slower, since there isn't the extra connection/noodle-adge happening? Does every part of the filter shift to the CPU, or does the CPU only come into play to "split" the kernels?

I remember that one of the things that really puzzled me when first experimenting with QC/CI was that things are LESS efficient if you put more function into a single CI filter.

cwright's picture
Re: Depth Textures with GL Read Pixels

not weird at all -- consider how it actually works:

you've got a texture in vram. you do some filter on it (a fragment shader), and render to a second texture (now using 2x as much memory, and have done 1 read/write to every pixel). If the kernel gets split in 2, you're going to read the texture, filter (half way), write it, then take the intermediate, and filter it with the second half, and that second part becomes the "final" result -- you're doing 2x as many read/write operations, and you'll have 2x the GPU setup overhead (not cheap), and 3x the memory usage (source, dest, and intermediate).

In retrospect, I don't think splitting happens as frequently (if at all) as just dropping to CPU mode (as logically splitting filters is difficult, computationally), which is really bad because then the CPU's doing filter operations out of vram (slow) and writing it back to vram (slow), and not working in parallel nearly as much as the GPU could (slow). GPU to vram is blazing quick, vram to cpu is awful. just like CPU to ram is quick, gpu to ram is slow. (this is a major issue with trivial CL usage on 10.6 -- most of the performance is lost shuffling tiny buffers between the cpu and the gpu -- datasets smaller than several thousand elements waste more in transit than it would take for the CPU to just operate on them in-place)
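The pass-count arithmetic above can be written out as a small Python sketch. It is a deliberately simplified model (one full-image read and write per pass, one extra texture per pass); the function name and the 16 bytes per pixel for an RGBA 32-bit float texture are assumptions made for the example, not measurements.

```python
def filter_cost(width, height, passes, bytes_per_pixel=16):
    """Rough cost model for running an image filter in `passes` GPU passes.

    bytes_per_pixel=16 assumes RGBA with 32-bit floats per channel.
    """
    pixels = width * height
    textures = passes + 1  # source + one output per pass
    return {
        "pixel_reads": pixels * passes,
        "pixel_writes": pixels * passes,
        "textures": textures,
        "vram_bytes": pixels * bytes_per_pixel * textures,
    }


single = filter_cost(640, 480, passes=1)
split = filter_cost(640, 480, passes=2)
print(split["pixel_reads"] / single["pixel_reads"])  # ratio of reads, split vs single
print(split["textures"])                             # source, intermediate, dest
```

The model ignores the per-pass GPU setup overhead, which would add to the cost of the split case.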

[edit off topic mostly: hybrid CPU/GPU computation isn't all it's cracked up to be -- it'd be faster (and simpler) for the filter to drop either to CPU mode for the entire filter, or stay in GPU mode (which is how it currently works, splitting aside, which we'll treat as "academic" only :); shuffling the data around between the two is severely expensive (much more than initially expected: vram is on the order of 1/8-1/16 as fast as ram from the CPU's point of view, from the profiling I've done). Smokris and I initially conceived a sweet, hybrid particle engine that could flip between CL(GPU) and CPU as the operation dictated (some particle ops are trivially CL capable, others (boids, for example) aren't as much), but initial proof-of-concept profiling indicated worse performance than just sticking to one or the other, because the transit is just so so so expensive...]

markus.helminen's picture
Re: Depth Textures with GL Read Pixels

Ahh, I didn't read the previous posts well enough. So Read Pixels outputs just 256 levels of depth? At first I thought it behaved that way because of the linearization issue.
So the only way to do proper depth-based screen-space stuff is rendering the scene twice: first using a GLSL shader that outputs the depth image, and then the real scene. Damn, it's going to be too slow if there is a lot of stuff going on. Is there a way to do multipass with OpenCL? Or are there any other options in QC4 that would help? You guys have probably thought this over, but I'm asking anyway, sorry.

cwright's picture
Re: Depth Textures with GL Read Pixels

unfortunately, yeah, that's about as good as it gets (rendering twice) -- until render in image with depth is non-private, there aren't any faster alternatives.

nothing about CL helps you with multipass, and QC4's evaluation isn't any faster than QC3 (and sometimes half as fast -- yes, QC4 is measurably slower than qc3 in some instances...)

gtoledo3's picture
Re: Depth Textures with GL Read Pixels

I believe that the short answer is that it's pretty much useless for what you're trying to do because of the limited resolution of images in QC (Chris, feel free to expand/correct my simplistic response if I'm off-base).

markus.helminen's picture
Re: Depth Textures with GL Read Pixels

Okay.

I'm being a little stubborn with these post-processing effects, but they really make things look so much better. I'll try to get things to work with the double rendering for now. And as a pretty wet-behind-the-ears quartzer, I'm probably still going to be asking some dumb questions. :)

To minimize the load while double rendering, how would I go about it when I have a 3D scene with some terrain and animated models in it? Would it be best to make one structure with all the animated models that is fed to both shaders through Spooky? Inside the shaders the meshes would be rendered with render from structure (I don't remember the name of the patch, but anyway the new patch that renders multiple meshes from one structure). Then to get the z-buffer image out I would use the "read pixel color" inside the shader. That would read at the color bit depth, right? I would send it via Spooky to the final CI post-processing filter, where the real render from Render In Image would go too. That would make the nice screen-space magic happen.

Or could an iterator be used? Inside the iterator I would use Render In Image with a dual-function shader that would be switchable between the real render and the z render. Then I'd output the two different images from the iterator into the CI filter. (I read that iterator output is possible with QC4.) Would this work, and would it make it any faster?

I'm visiting my parents, so I'll have to wait a couple of days to test this out. Does this sound like the most cost-efficient way to do this, or do you guys have some more ideas? Is the real performance problem still going to be the fact that I'll have to render things twice, so that all this fine-tuning won't really make a difference? If this doesn't work with decent frame rates, I'd have to change to Blender to make this scene happen in my project, and that would be a lot of work for me to get the controls that I made with QC over to Blender through OSC.

Sorry about the discussion drifting away from the original topic.

cwright's picture
Re: Depth Textures with GL Read Pixels

when you're looking for "optimal" performance, the right idea isn't to ask, but to do experiments and measure it -- things that would logically seem fast can be slow in QC, and things that might sound slow (rendering a scene twice, for example) might not be as slow as you think.

avoid using spooky unless it's absolutely necessary -- it leaks and makes compositions difficult to follow. it'll also be receiving some updates in a future release, which might render it inoperable (right now we haven't broken compatibility, but I'm ok with doing so, since the patch is a few years old and we've learned a lot of design details since then to improve things.)

iterator output is possible in qc4, but it probably won't help -- iterators aren't any faster than manually rendering twice (and in many cases, it's actually slower).

beware of phrases like "most efficient" -- most efficient in terms of what? vram usage? frame rate? battery life? If you're going for all-out rendering performance (frame rate), using QC is not the way to go, at the cost of needing to write GL code yourself. ;)

markus.helminen's picture
Re: Depth Textures with GL Read Pixels

Thanks for the answers .. and the patience ;)

I will try to optimize my scene by trial and error. I just thought that there would be some guidelines on how things are usually practical to implement. I'm a very practical guy, and as long as the frame rate doesn't drop below some 30fps it's all OK.

If C was my second language I'd tackle GL and use some good programming framework for the stuff that I try to do with QC. But I'm just a simple musician, and the way I have learned QC and GLSL is just by staring at code and studying loads of examples of compositions and shader code. I experimented with them and picked the bits that I liked. And now I'm asking advice from people who are skilled with these tools. My method is similar to the way I learn music. I don't know that it's good for learning coding though. Maybe some day I'll learn it well enough to use something like OF for the interactive stuff I'd like to do. At the moment the way to go for me is QC.