Rendering huge amount of vertices (VBO)

LukeNeo

Hi all, I'm developing a Quartz Composer plugin that renders a set of vertices randomly distributed in space. I'd like to draw a huge number of vertices at the same time. To do this I tried OpenGL Vertex Buffer Objects (VBOs), because I read that they allow vertex array data to be stored in high-performance graphics memory on the server side and promote efficient data transfer.

This is my approach:
* Generate a new buffer object with glGenBuffersARB().
* Bind the buffer object with glBindBufferARB().
* Copy vertex data to the buffer object with glBufferDataARB().

So in my plugin's startExecution method I wrote:

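  // assumes #import <OpenGL/CGLMacro.h> at the top of the file, so GL calls go to cgl_ctx
  - (BOOL) startExecution:(id<QCPlugInContext>)context
  {
      CGLContextObj cgl_ctx = [context CGLContextObj];

      glGenBuffersARB(1, &VBUFFERNAME);                    // VBUFFERNAME is a GLuint
      glBindBufferARB(GL_ARRAY_BUFFER_ARB, VBUFFERNAME);
      // vcArray is defined as float vcArray[numV*3] (x, y, z per vertex);
      // sizeof(vcArray) is its size in bytes only because it is a real array, not a pointer
      glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(vcArray), vcArray, GL_DYNAMIC_DRAW_ARB);
      glVertexPointer(3, GL_FLOAT, 0, 0);                  // offset 0 into the bound VBO

      return YES;
  }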

And in my execute method I:
* Update the vertices in vcArray and upload them with glBufferSubDataARB()
* Draw them with glDrawArrays()

  - (BOOL) execute:(id<QCPlugInContext>)context atTime:(NSTimeInterval)time withArguments:(NSDictionary*)arguments
  {
      CGLContextObj cgl_ctx = [context CGLContextObj];

      // update vcArray vertices
      updateVertices();

      glEnableClientState(GL_VERTEX_ARRAY);
      glBindBufferARB(GL_ARRAY_BUFFER_ARB, VBUFFERNAME);
      // re-specify the pointer while the VBO is bound: the host may change client array state between calls
      glVertexPointer(3, GL_FLOAT, 0, 0);
      {
          // the count is in vertices (numV), not in floats (numV*3)
          glDrawArrays(GL_LINE_STRIP, 0, numV);
          [self calcVertices];
          glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, 0, sizeof(vcArray), vcArray);
      }
      glDisableClientState(GL_VERTEX_ARRAY);
      glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);

      return YES;
  }
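glDrawArrays takes its count in vertices, not in array elements. For float vcArray[numV*3], the right count is numV; a count of numV*3 asks GL for three times as many vertices as the buffer holds. A small C sketch of the arithmetic, with NUM_V standing in for numV (NUM_V's value and the helper names are illustrative, not from the post):

```c
#include <stddef.h>

/* Each vertex is three floats: x, y, z (matches glVertexPointer(3, GL_FLOAT, 0, 0)). */
#define FLOATS_PER_VERTEX 3
#define NUM_V 2500                      /* illustrative stand-in for numV */

static float vcArray[NUM_V * FLOATS_PER_VERTEX];

/* Size in bytes: what glBufferDataARB / glBufferSubDataARB expect. */
static size_t buffer_bytes(void) {
    return sizeof(vcArray);
}

/* Count for glDrawArrays: number of vertices, not number of floats. */
static size_t vertex_count(void) {
    return sizeof(vcArray) / (FLOATS_PER_VERTEX * sizeof(float));
}
```

The byte size and the vertex count come from the same array, so they cannot drift apart if the vertex count changes.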

The problem is that I can't get any noticeable performance improvement over immediate-mode rendering: the results are the same! For example, with the VBO I get 60 FPS with 2,500 vertices, 40 FPS with 5,000 vertices and 10 FPS with 20,000 vertices, which is exactly the performance I get in immediate mode with simple code like this:

  glBegin(GL_LINE_STRIP);
  for (x = 0; x < numV; x++) {
      glVertex3f(v[x][0], v[x][1], v[x][2]);
  }
  glEnd();
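Converting the frame rates reported above into frame times makes the scaling easier to see. The 2,500-vertex figure (60 FPS) is probably capped by the display refresh, so it is left out here:

```latex
\[
t_{\text{frame}} = \frac{1}{\text{FPS}}:\qquad
\frac{1}{40}\,\mathrm{s} = 25\,\mathrm{ms}\ (5{,}000\ \text{vertices}),\qquad
\frac{1}{10}\,\mathrm{s} = 100\,\mathrm{ms}\ (20{,}000\ \text{vertices})
\]
\[
\frac{25\,\mathrm{ms}}{5{,}000} = \frac{100\,\mathrm{ms}}{20{,}000} = 5\,\mu\mathrm{s}\ \text{per vertex}
\]
```

A flat cost of about 5 µs per vertex, identical with and without the VBO, suggests the time is spent in per-vertex work that the VBO does not affect.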

Am I missing something in the VBO approach? Why don't I get any performance improvement?

Thank you, Luke

Attachment: test_00 (149.99 KB)

vade
Re: Rendering huge amount of vertices (VBO)

That's tragically slow. Have you actually tried profiling your code? My gut tells me that

  [self calcVertices];

is the culprit. I would suggest moving that onto a separate thread so it does not block drawing.
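One way to act on the threading suggestion is double buffering: a worker thread computes the next frame's positions while the render side uploads and draws the previous ones. Below is a minimal pthreads sketch. It assumes the vertex math can run off the GL thread, while GL calls stay on the thread that owns the context. calc_vertices, acquire_front, VertexDoubleBuffer and the other names are hypothetical, and filling the buffer with a frame number stands in for the real update.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_V 20000                      /* illustrative vertex count */
#define NUM_FLOATS ((size_t)NUM_V * 3)   /* x, y, z per vertex */

/* Render side reads buf[front]; the worker fills buf[1 - front]. */
typedef struct {
    float buf[2][NUM_FLOATS];
    int front;
    bool back_ready;      /* worker has finished filling the back buffer */
    bool running;
    int frames_made;
    pthread_mutex_t lock;
    pthread_cond_t cond;
} VertexDoubleBuffer;

/* Hypothetical stand-in for calcVertices: writes one frame's positions. */
static void calc_vertices(float *v, int frame) {
    for (size_t i = 0; i < NUM_FLOATS; i++)
        v[i] = (float)frame;
}

static void db_init(VertexDoubleBuffer *db) {
    db->front = 0;
    db->back_ready = false;
    db->running = true;
    db->frames_made = 0;
    pthread_mutex_init(&db->lock, NULL);
    pthread_cond_init(&db->cond, NULL);
}

/* Worker thread: computes the next frame while the current one is drawn. */
static void *worker_main(void *arg) {
    VertexDoubleBuffer *db = arg;
    for (;;) {
        pthread_mutex_lock(&db->lock);
        while (db->back_ready && db->running)    /* wait until the render side swaps */
            pthread_cond_wait(&db->cond, &db->lock);
        if (!db->running) {
            pthread_mutex_unlock(&db->lock);
            return NULL;
        }
        int back = 1 - db->front;   /* front cannot change until back_ready is set */
        int frame = ++db->frames_made;
        pthread_mutex_unlock(&db->lock);

        calc_vertices(db->buf[back], frame);     /* slow part runs without the lock */

        pthread_mutex_lock(&db->lock);
        db->back_ready = true;
        pthread_mutex_unlock(&db->lock);
    }
}

/* Render side, once per frame: swap in new data if ready, never wait on the worker.
   The pointer is valid until the next call; upload it (e.g. glBufferSubDataARB) right away. */
static const float *acquire_front(VertexDoubleBuffer *db) {
    pthread_mutex_lock(&db->lock);
    if (db->back_ready) {
        db->front = 1 - db->front;
        db->back_ready = false;
        pthread_cond_signal(&db->cond);
    }
    const float *v = db->buf[db->front];
    pthread_mutex_unlock(&db->lock);
    return v;
}

static void stop_worker(VertexDoubleBuffer *db, pthread_t worker) {
    pthread_mutex_lock(&db->lock);
    db->running = false;
    pthread_cond_signal(&db->cond);
    pthread_mutex_unlock(&db->lock);
    pthread_join(worker, NULL);
}
```

Because the render side never blocks on the worker, a slow update lowers how often the positions change rather than the drawing frame rate.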

Also, if you are replacing the entire contents of your VBO, you ought to call glBufferData with a NULL data pointer first (orphaning the old storage) before updating the buffer; it is a small win.

You probably want to look into VAOs (vertex array objects) as well. Buffered VBOs should let you draw millions of points at 60 Hz, so clearly something else is causing the problem.

cwright
Re: Rendering huge amount of vertices (VBO)

You probably don't want to draw lines, and especially not antialiased lines. Those pretty much defeat the GPU in a bunch of cruel ways.

1) lines by definition don't emit significant spans (so the GPU can't sweep scanlines or blocks of contiguous fragments)

2) blending requires each primitive to be completed before the next can begin so that they blend in the correct sequence (unblended stuff has a bit more flexibility, in that multiple primitives can be processed in parallel, with the depth test used to control which fragment "wins")

You didn't provide source or define what other GL state you're using.

Did you profile? If so, where did the time go? Did you try the standard GL bottleneck deduction? For example: make the point cloud smaller so that it touches fewer pixels (if that helps, you're thrashing the drawable with blending); make the rendering use a trivial shader; change the AA, blending, alpha-test and depth-test states; try rendering that many vertices as triangles (it'll look like crap, but it'll reveal whether or not lines are the culprit).
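A minimal way to see where the frame time goes is to wrap each suspected step and log its duration. This sketch assumes a POSIX clock_gettime with CLOCK_MONOTONIC. time_step_ms, nudge_vertices and VertexSpan are illustrative names, not part of any QC or GL API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* exposes clock_gettime in strict C modes */
#endif
#include <stddef.h>
#include <time.h>

/* Milliseconds between two CLOCK_MONOTONIC timestamps. */
static double elapsed_ms(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec) * 1e3
         + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/* Runs one step of the frame and returns its wall-clock duration in ms. */
static double time_step_ms(void (*step)(void *), void *ctx) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    step(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ms(t0, t1);
}

/* Hypothetical CPU-side step to measure (a stand-in for calcVertices). */
typedef struct {
    float *coords;
    size_t count;
} VertexSpan;

static void nudge_vertices(void *ctx) {
    VertexSpan *s = ctx;
    for (size_t i = 0; i < s->count; i++)
        s->coords[i] += 0.001f;
}
```

GL calls return before the GPU has finished, so timing a draw call this way only measures command submission unless glFinish() is called before the second timestamp. That is acceptable for a one-off diagnostic, but it stalls the pipeline and should not stay in shipping code.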

From what I've observed (and I haven't taken a closer look for a while, so I could be totally outdated), GL_LINES isn't particularly well optimized. It's not frequently used, so it doesn't seem to get much attention.

gtoledo3
Re: Rendering huge amount of vertices (VBO)

vade wrote:
That's tragically slow. Have you actually attempted to profile your code? My gut tells me that

  [self calcVertices];

is the culprit. I would suggest threading that, so it does not block drawing.

Also, if you are replacing the entire contents of your VBO, you ought to set its contents to NULL before updating the buffer; it is a small win.

You probably want to look into VAOs as well. Buffered VBOs should let you draw millions of points at 60 Hz. Clearly something else is causing an issue.

Chris makes really good points about the toll that lines and blending take, and I really, really agree with Vade's point above. These can drag down what you're doing and completely negate the benefit of using a VBO.

I'd also second the suggestion of investigating VAOs, and of using glDrawElements instead of glDrawArrays, as I believe it to be more stable (caveat emptor).

LukeNeo
Re: Rendering huge amount of vertices (VBO)

Thank you for the replies. I'm currently trying another approach: I split updating the vertex positions and rendering them into two plugins. Custom plugin "A" updates the vertices and passes them, as a structure, to custom plugin "B", which renders them. But I'm still stuck on some kind of bottleneck: when the number of vertices grows beyond 3,000, plugin A seems to spend too much time preparing the output structure with the vertex coordinates.

Plugin A manages the vertex coordinates in an internal array declared as

float   v[VSIZE][3];

where VSIZE is the number of vertices. After the vertices are updated (a very simple update function that adds a small random (x, y, z) delta to each vertex; the problem persists even if I don't update the vertex positions at all), the output structure is prepared like this:

[[outStruct objectAtIndex:0] removeAllObjects];
[[outStruct objectAtIndex:1] removeAllObjects];
[[outStruct objectAtIndex:2] removeAllObjects];
 
for (int x=0;x<VSIZE;x++) {
   [ [outStruct objectAtIndex:0] addObject:[NSNumber numberWithDouble:v[x][0]] ];
   [ [outStruct objectAtIndex:1] addObject:[NSNumber numberWithDouble:v[x][1]] ];
   [ [outStruct objectAtIndex:2] addObject:[NSNumber numberWithDouble:v[x][2]] ];
}
 
self.outputStructure = outStruct;

where outStruct is declared like this in the header file:

NSMutableArray   *outStruct;
NSMutableArray   *xStruct;
NSMutableArray   *yStruct;
NSMutableArray   *zStruct;
 
@property(assign) NSArray   *outputStructure;

and initialized in this way in the startExecution method:

outStruct = [[NSMutableArray alloc] init];
xStruct   = [[NSMutableArray alloc] init];
yStruct   = [[NSMutableArray alloc] init];
zStruct   = [[NSMutableArray alloc] init];
[outStruct addObject:xStruct];
[outStruct addObject:yStruct];
[outStruct addObject:zStruct];

So the output is a structure made of three structures: one for the X coordinates, one for the Y coordinates and one for the Z coordinates (see fig1). Plugin B reads this structure, stores the coordinates in the VBO and renders them. To test performance, I disabled plugin B and connected plugin A to an imageWithString patch just to force the plugin to execute. Performance is very poor, and the frame rate appears to be inversely proportional to the number of vertices: 7 FPS with 50,000 vertices, 15 FPS with 25,000 vertices and 30 FPS with 12,500 vertices.
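With the three-array layout, plugin B has to unbox every NSNumber and re-interleave the values before they can go into the VBO, because glVertexPointer(3, GL_FLOAT, 0, ...) reads the x, y, z of one vertex from consecutive floats. A minimal C sketch of that re-interleaving step, assuming the values have already been unboxed into plain float arrays (interleave_xyz is a hypothetical helper):

```c
#include <stddef.h>

/* Packs three separate coordinate arrays (the X, Y and Z sub-structures)
   into one interleaved x, y, z array, the layout that
   glVertexPointer(3, GL_FLOAT, 0, ...) expects. `out` must hold 3 * n floats. */
static void interleave_xyz(const float *x, const float *y, const float *z,
                           size_t n, float *out) {
    for (size_t i = 0; i < n; i++) {
        out[3 * i + 0] = x[i];
        out[3 * i + 1] = y[i];
        out[3 * i + 2] = z[i];
    }
}
```

This is an extra pass over every vertex on top of boxing in plugin A and unboxing in plugin B, while plugin A's v[VSIZE][3] already holds the data in interleaved form.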

So my question is: is this the right way to create, update and pass a structure of vertex coordinates from one plugin to another?
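Since v[VSIZE][3] is already one contiguous block of interleaved x, y, z floats, an alternative to creating three NSNumber objects per vertex per frame (150,000 allocations at 50,000 vertices) is to hand the block over as raw bytes (600,000 bytes at 50,000 vertices). Below is a plain C sketch of that copy. In Cocoa the same bytes could be wrapped with [NSData dataWithBytes:v length:sizeof v], but whether a QC structure port carries an NSData through to plugin B unchanged is an assumption to test, not something this thread establishes. VSIZE's value and snapshot_vertices are illustrative.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define VSIZE 50000   /* illustrative: the largest count tested above */

/* Same layout as plugin A's array: VSIZE rows of x, y, z stored back to back. */
static float v[VSIZE][3];

/* Copies all coordinates with one memcpy instead of boxing each value in an
   NSNumber. Returns a malloc'd block the caller must free; *out_bytes gets its size. */
static float *snapshot_vertices(size_t *out_bytes) {
    size_t bytes = sizeof v;
    float *copy = malloc(bytes);
    if (copy == NULL) {
        *out_bytes = 0;
        return NULL;
    }
    memcpy(copy, v, bytes);
    *out_bytes = bytes;
    return copy;
}
```

The copied block is already in the layout glVertexPointer(3, GL_FLOAT, 0, ...) expects, so the receiving side could upload it directly with glBufferSubDataARB.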

Attachment: fig1.png (41.81 KB)