Performance measuring and Present

Post by Peter Rac » Tue, 01 Jul 2003 18:33:55



Hello,

My application is built with DX8.1 and renders static objects via the
fixed-function pipeline with hardware vertex processing. The vertex and index
buffers are static; they never change during rendering.

The hardware is a GF2 with 64 MB.

I have the following issue. The program renders about 30-200 meshes. The meshes
and textures are stored in the managed pool, and rendering is done "by hand",
that is, by calling DrawIndexedPrimitive. There is a simple culling
algorithm that uses bounding spheres around the objects. The algorithm
culls objects behind the camera.
The number of texture changes per frame is constant at 23, and the number of
VB changes per frame is constant at 2.
My own culling and rendering code consistently takes between 1 and 2
milliseconds.
Under these circumstances I measure the duration of Present: if the number of
objects is about 100 or less, Present takes 2-4 milliseconds. If the number
of objects grows beyond about 100-110, the duration of Present increases
dramatically to 20-30 milliseconds. And that is not all. If I turn the
camera to the sky, the objects are not culled(!), but none of them can be
seen because they are outside the frustum, and the duration of Present drops
back to its original value of 2-4 milliseconds.

Therefore I think it is somehow a problem with pixel fill or with
Z-buffering, or something similar. I would be very grateful for any advice.

Thank you
Peter

 
 
 


Post by Peter Rac » Tue, 01 Jul 2003 21:49:40




I have now also measured the number of primitives rendered. The limit is about
140,000 primitives per frame. Above this, Present suddenly takes very long.
Below 140,000 primitives, Present is very fast. Regarding frame rate, I should
add that my software limits the number of frames per second to 50 by measuring
the time and going to sleep. So the frame rate never gets higher than 50, but
with more than 140,000 primitives it gets lower and lower.

Is it possible that I have reached the limit of the hardware? Is rendering
140,000 primitives every 20 milliseconds a typical rate?

best regards
Peter
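
A quick sanity check of the figures in this post, sketched in Python. The arithmetic has nothing to do with Direct3D itself, and the helper names are made up for illustration. It converts the per-frame primitive count and the 50 fps cap into a per-second rate and a per-frame time budget.

```python
def triangles_per_second(triangles_per_frame, frames_per_second):
    """Throughput implied by a per-frame primitive count and a frame rate."""
    return triangles_per_frame * frames_per_second


def frame_budget_ms(frames_per_second):
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / frames_per_second


# Figures from the post: about 140,000 primitives per frame, capped at 50 fps.
rate = triangles_per_second(140_000, 50)
budget = frame_budget_ms(50)
print(f"{rate:,} triangles per second, {budget:.1f} ms per frame")
# prints: 7,000,000 triangles per second, 20.0 ms per frame
```

So the observed limit corresponds to roughly 7 million primitives per second, assuming every frame really carries the full 140,000.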

 
 
 


Post by Sean Cavanaug » Wed, 02 Jul 2003 01:40:46


Do you have vsync on?  It sounds like you are getting to the point where
the hardware is just missing the vsync and having to wait for the next
one (which will block in Present), which would add about 16 ms
assuming a 60 Hz refresh rate.
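
To make the vsync arithmetic concrete, here is a small Python sketch. It assumes, as a simplification, that with vsync on a finished frame can only be shown on a refresh boundary, so the effective frame time is rounded up to a whole number of refresh periods. The function name is hypothetical.

```python
import math


def frame_time_with_vsync(render_ms, refresh_hz=60.0):
    """Effective frame time when presentation waits for the next refresh."""
    period_ms = 1000.0 / refresh_hz
    # A frame always occupies at least one full refresh period.
    periods = max(1, math.ceil(render_ms / period_ms))
    return periods * period_ms


print(f"{frame_time_with_vsync(16.0):.2f}")  # 16.67
print(f"{frame_time_with_vsync(17.0):.2f}")  # 33.33
```

A frame that takes 17 ms instead of 16 ms just misses the refresh at 16.67 ms and is held until 33.33 ms, which is the extra wait of roughly one refresh period described above.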




 
 
 


Post by ppu » Wed, 02 Jul 2003 04:41:51





Hey Peter,

Have you tried D3DPRESENT_INTERVAL_IMMEDIATE, in order to determine
if you are bounded by vsync? My Xbox emulator is severely slowed down
if vsync is waited for, as it misses a sync and has to wait for the
next one.

caustik

 
 
 


Post by Peter Rac » Wed, 02 Jul 2003 20:24:15




> Do you have vsync on?  It sounds like you are getting to the point where
> the hardware is just missing the vsync and having to wait for the next
> one (which will block in Present), which would add about 16 ms
> assuming a 60 Hz refresh rate.

No, I have vsync off. If I switch it on, things get much "worse": the
hardware waits in nearly every frame (which is the purpose of vsync), and
I can see this in the measurements.

Peter

 
 
 


Post by Peter Rac » Wed, 02 Jul 2003 20:59:06




> Hey Peter,

> Have you tried D3DPRESENT_INTERVAL_IMMEDIATE, in order to determine
> if you are bounded by vsync? My Xbox emulator is severely slowed down
> if vsync is waited for, as it misses a sync and has to wait for the
> next one.

Hi,

D3DPRESENT_INTERVAL_IMMEDIATE is only applicable to fullscreen
applications. My application is windowed and is not bound to vsync. I am
now getting the feeling that I have reached the limit of the hardware,
with plenty of CPU time to spare.

Peter

 
 
 


Post by Max McMullen [MS » Thu, 03 Jul 2003 02:11:23


Hi Peter,

There are probably a couple things going on here:

First, Direct3D batches up rendering commands in a buffer and sends them to
the driver only when the buffer is full or some other operation requires the
driver to finish rendering (things like Lock or Present).  The time spent in
Present may not be just "Presentation" code, it might include time for the
GPU to render batched commands.

Second, the previous advice about missing a vsync causing wasted cycles is
correct.  Since your application is using DX8.1, make sure
D3DSWAPEFFECT_COPY_VSYNC isn't set.  Performance should not be measured when
the application is being synchronized to the monitor's refresh rate.  You
should also check any driver control panel that is installed to make sure
VSYNC isn't forced on.

We removed D3DSWAPEFFECT_COPY_VSYNC in DX9 and allowed
D3DPRESENT_INTERVAL_ONE in both windowed and fullscreen, so I think the
previous bit of advice was assuming DX9.

Max McMullen
Direct3D
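
The first point can be illustrated with a toy model in Python. This is not Direct3D code: the `DeferredQueue` class and `time_ms` helper are invented for illustration, and the queue simply records work and performs it at flush time, whereas a real GPU runs asynchronously. The sketch only shows why a timer wrapped around the flushing call can end up charged for work that was submitted earlier.

```python
import time


class DeferredQueue:
    """Toy command buffer: work is recorded on submit and done on flush."""

    def __init__(self):
        self._pending = []

    def submit(self, cost_ms):
        # Recording a command is cheap; nothing is executed yet.
        self._pending.append(cost_ms)

    def flush(self):
        # Stand-in for the point where the batched work must complete.
        for cost_ms in self._pending:
            time.sleep(cost_ms / 1000.0)
        self._pending.clear()


def time_ms(fn):
    """Wall-clock duration of fn() in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


queue = DeferredQueue()
submit_ms = time_ms(lambda: [queue.submit(1.0) for _ in range(10)])
flush_ms = time_ms(queue.flush)
print(f"submit: {submit_ms:.2f} ms, flush: {flush_ms:.2f} ms")
```

The submit phase measures close to zero while the flush carries the whole cost, which is the same shape as a fast BeginScene/EndScene section followed by a slow Present.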





 
 
 


Post by Peter Rac » Thu, 03 Jul 2003 17:26:00




> Hi Peter,

> There are probably a couple things going on here:

> First, Direct3D batches up rendering commands in a buffer and sends them to
> the driver only when the buffer is full or some other operation requires the
> driver to finish rendering (things like Lock or Present).  The time spent in
> Present may not be just "Presentation" code, it might include time for the
> GPU to render batched commands.

> Second, the previous advice about missing a vsync causing wasted cycles is
> correct.  Since your application is using DX8.1, make sure
> D3DSWAPEFFECT_COPY_VSYNC isn't set.  Performance should not be measured when
> the application is being synchronized to the monitor's refresh rate.  You
> should also check any driver control panel that is installed to make sure
> VSYNC isn't forced on.

Hi Max,

Thank you for your advice. I am sure D3DSWAPEFFECT_COPY_VSYNC is not set. I
can also see horizontal tearing on the screen if I rotate the camera
quickly.

I have now measured the following: the duration of Present suddenly rises
when I reach 9,000,000 triangles per second with lighting on, or
14,000,000 triangles per second with lighting off. Nvidia quotes about
20,000,000 triangles per second as the maximum for a GF2.

My software tries to render every 20 milliseconds by default (that is, 50
fps). The rendering itself (between BeginScene and EndScene) takes only 2-3
milliseconds on the CPU; after that (actually after Present) my program
sleeps until the 20 ms are up. My theory is this: as long as the GPU can
finish rendering in the remaining part of the 20 milliseconds, everything
works well and Present returns quickly. I think the effect appears when the
driver/adapter cannot finish rendering within 20 milliseconds; in that case
some stall seems to occur and Present immediately takes much longer, to allow
the GPU to finish rendering.

I repeated the measurements on a laptop with an ATI 9000 Mobility and got
very similar values. The throughput is about 20% lower than with the GF2,
but the behaviour is the same.

best regards
Peter
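
The theory in this post can be tried out with a toy simulation. The Python sketch below is not Direct3D code. It assumes, purely for illustration, that Present blocks until the GPU has finished the previous frame, that the GPU needs a fixed time per frame, and that the loop sleeps until a 20 ms budget is used up. The function name and parameters are invented.

```python
def steady_present_wait(cpu_ms, gpu_ms, budget_ms=20.0, frames=200):
    """Simulate a capped render loop; return the last frame's Present wait (ms).

    Assumption (not from the thread): Present blocks until the GPU has
    finished the previous frame.
    """
    now = 0.0          # CPU timeline in milliseconds
    gpu_free_at = 0.0  # time at which the GPU finishes all work queued so far
    present_wait = 0.0
    for _ in range(frames):
        frame_start = now
        submitted_at = frame_start + cpu_ms  # BeginScene..EndScene on the CPU
        # Present waits for whatever the GPU still has queued from earlier.
        present_wait = max(0.0, gpu_free_at - submitted_at)
        # The GPU starts this frame once it is idle and the frame is submitted.
        gpu_free_at = max(gpu_free_at, submitted_at) + gpu_ms
        after_present = submitted_at + present_wait
        # Sleep until the frame budget is used up, if any of it is left.
        now = max(after_present, frame_start + budget_ms)
    return present_wait


print(steady_present_wait(2.0, 15.0))  # 0.0
print(steady_present_wait(2.0, 25.0))  # 23.0
```

In this model the wait does not grow gradually with load. While the GPU fits into the budget, the idle time is spent in the sleep; once it no longer fits, the same wait moves into Present, which would match the sudden jump from a few milliseconds to 20-30 ms. Real drivers may buffer more than one frame, so the numbers should be read as qualitative only.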

 
 
 

