Performance problems porting from SGI O2 to NT


Post by Ville Herv » Sun, 15 Mar 1998 04:00:00



I ported an OpenGL app from an O2 to NT. The app is a kind of walk-through
with a large triangle mesh (~10000 triangles, although it has a
simplification facility that brings the number down to 500-2000).
It also uses textures and partly transparent billboard trees. The
program does a lot of geometric optimization and culling itself, so
it should be quite CPU-intensive.

The performance on the O2 was surprisingly good (to me, being new to
OpenGL, that is. This was my first OpenGL program...). The app ran in the
default window (300 x 300) with a frame rate ranging from 12 to 20 FPS,
averaging about 15.

When I first ported it to NT, I had a 200 PPro with a 4 MB Matrox
Millennium I to test it with. I was quite satisfied with the 3.5 - 4 FPS
I got; that was without any hardware acceleration, after all.
Switching between the SGI OpenGL and the MS OpenGL made no
noticeable difference.

Then I had the pleasure of testing it on a 2 x 200 PPro with a pricey
OpenGL card and, moreover, on a 2 x 300 PII with an even pricier Gloria
OpenGL card. (Please forgive the lack of exact product numbers for the
cards; I have no chance to check them now.) The latter machine was said
to be twice as fast as an O2 when it comes to OpenGL graphics.

However, neither of the two machines ran the program at more than 5 FPS.
Changing the drivers on the latter machine to some beta ones increased
the speed to something like 9 FPS, which was still far from
satisfactory. Again, there was no difference between the SGI and MS OpenGL
implementations. And yes, both machines seemed to be using
hardware acceleration, as the OpenGL card settings in the control panel
had an immediate effect on the program.

I tried disabling parts of the program in order to find the bottleneck,
but it just seemed to be running slowly everywhere. Drawing just a single
house with 5 textured quadrilaterals gave only 15-25 FPS!

So, I am totally clueless about this. What might cause this? Are there
things one should particularly avoid when optimizing performance
for NT machines? Any clue appreciated!

If the exact models of the cards are relevant, I can post them later.

-- v --


 
 
 

Post by Dirk Reiner » Mon, 16 Mar 1998 04:00:00


Hi Ville,

most of the pricey OpenGL cards are optimized for CAD applications, so
texturing is not a priority; if it is supported in HW at all, that
part of the driver will not have had as much care as, say, the antialiased
lines part. Try a game card with a Voodoo or something similar; they are
optimized for textured triangles (actually that's all some of them do),
so they might be better suited for your app.

        Dirk


-- ZGDV - AR Group                   http://www.igd.fhg.de/~reiners
-- Arabellastr. 17 (ECRC)            
-- D-81925 Muenchen                  All standard disclaimers apply.
-- Truth is stranger than fiction because fiction has to make sense.

 
 
 

Post by Ville Her » Tue, 17 Mar 1998 04:00:00



>Hi Ville,
>most of the pricey OpenGL cards are optimized for CAD applications, so
>texturing is not a priority; if it is supported in HW at all, that
>part of the driver will not have had as much care as, say, the antialiased
>lines part. Try a game card with a Voodoo or something similar; they are
>optimized for textured triangles (actually that's all some of them do),
>so they might be better suited for your app.

Thanks Dirk,

I tried not using texturing at all, but the frame rate did not rise
noticeably. Besides, I understood the Gloria card does support
texturing in HW, since there were a lot of settings in the control panel
tab, such as the quality of texturing.

The exact model of the GLoria card is

GLINT Delta R1 + GLINT MX R1
RGB640
16 MB
Elsa GLoria-XL 16 / 24

if that helps.

Since I can't define the problem any more accurately than I did in the last
posting, I'd appreciate /any/ ideas that might improve the performance on the NT
platform.

--
-- v --


 
 
 

Post by Kevin Bac » Tue, 24 Mar 1998 04:00:00


Can you post a very simple program that demonstrates this problem?  Maybe
then someone can analyze potential problems.

Offhand I would say it probably involves some sort of state change overhead
that the O2 drivers somehow handle a lot better than the Gloria drivers.  Are
you minimizing state changes?  Using strips and fans?  Using texture
objects?  I think posting the code will help.

-Kevin



--
Kevin Baca
Senior Programmer
Sony Interactive Studios America

 
 
 

Post by Frederick Haa » Sat, 28 Mar 1998 04:00:00



> Then I had the pleasure of testing it on a 2 x 200 PPro with a pricey
> OpenGL card and, moreover, on a 2 x 300 PII with an even pricier Gloria
> OpenGL card. (Please forgive the lack of exact product numbers for the
> cards; I have no chance to check them now.) The latter machine was said
> to be twice as fast as an O2 when it comes to OpenGL graphics.

According to whom?

Seriously, you may be experiencing the fact that
faster chip != better performance; there are lots of
factors in there.  While I have my problems with
SGI, those new machines are not just great CPUs,
the whole system is just completely well designed.
Very often someone will claim twice the performance
and they're really referring to one aspect, like
fill speed or something.

How big are your textures?  O2 has unified memory
which lets you use as much for texture as you want,
as long as you don't swap main memory.

A setup with dedicated texture memory will need to
swap if you go over the limit.

I would say system bandwidth would be a factor with
2000 polygons, considering how many points make up
a polygon and how much data has to be transferred
per polygon, as well as material, texture, color
and lighting information...it depends how much is actually
being done in the graphics hardware and how much is
being done ahead of time...graphics acceleration is
an ambiguous term at best.  But since your house
with only 5 or 6 polygons ran so slowly, it must be
something else, of course.

Anyway, I know I'm not much help...the texture
memory thing is the only thing I can think of
right now, and it even gets me at work on Onyx
IR's with 64MB of texture memory.

Something you may not know is that implementations
of OpenGL very often require your texture
dimensions to be powers of two.  So if you
have a 300x300 texture it may actually be taking
up 512x512 space...a significant difference.  So
you may need to check the implementation's release notes
or something.
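
A minimal sketch of that power-of-two arithmetic, for illustration only.
The helper names nextPowerOfTwo and paddedTexels are hypothetical, and the
sketch assumes the implementation pads each dimension up to the next
power of two, which may not be what a given driver does.

```cpp
#include <cassert>

// Hypothetical helper: smallest power of two that is >= n.
// Precondition keeps the shift below from overflowing.
unsigned nextPowerOfTwo(unsigned n)
{
    assert(n <= 0x80000000u);
    unsigned p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Texels actually stored if each dimension is padded to a power of two
// (an assumption about the implementation, see above).
unsigned long paddedTexels(unsigned w, unsigned h)
{
    return static_cast<unsigned long>(nextPowerOfTwo(w)) * nextPowerOfTwo(h);
}
```

For a 300x300 image this gives 512x512, i.e. 262144 texels stored for
90000 texels of actual image, close to three times the memory.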

Good luck anyway,
Fred

--
--------------------------------------------------------
- Frederick Haab - Just Your Average Software Engineer -
- Fight UCE ("Spam")! Here's how: http://www.cauce.org -
--------------------------------------------------------

 
 
 

Post by Ville Her » Tue, 31 Mar 1998 04:00:00


Kevin Baca <kb...@sonyinteractive.com> writes:
>Can you post a very simple program that demonstrates this problem?  Maybe
>then someone can analyze potential problems.

The whole source tree is rather large, but below I've tried to attach
the relevant parts. I'm aware you can't compile that. Please email
me if you'd like the whole source.

>Offhand I would say it probably involves some sort of state change overhead
>that the O2 drivers somehow handle a lot better than the Gloria drivers.  Are
>you minimizing state changes?  Using strips and fans?  Using texture
>objects?  I think posting the code will help.

I use texture objects and display lists. As I said, I also tried commenting out
almost all of the code, leaving only 5 quadrilaterals to be rendered, which
still gave only some 20-30 FPS.

types.h:
==============================================================

#ifndef TYPESH
#define TYPESH

// These should preferably be enabled from
// Makefile with -D to enable portability

// #define MSBBYTEORDER
// #define GCC_KLUDGE // kludge
// #define NOBOOL
// #define NO_AUTO_TEMPLATE_INSTATIATION
// #define TEXT_EXT // sgi named texture extension
// #define WIN // Windows kludges (MSVC 5.0)

enum PointContainment {Outside, Inside, OnBoundary};

#ifndef NULL
#define NULL 0x0
#endif

#ifdef NOBOOL
#ifndef bool
#ifndef true
//enum bool {true=1, false=0};
typedef int bool;
#define true 1
#define false 0
#endif
#endif
#endif

#ifdef GCC_KLUDGE
#define throw delete new
#endif

#ifdef WIN
#define glBindTextureEXT glBindTexture
#define glGenTexturesEXT glGenTextures
#pragma warning( disable : 4305 4244)
#endif

void fail(char* a);

#endif

GL.cc:
====================================================================
//#define GLORIA
#ifndef WIN
#include <sys/time.h>
#else
#include <time.h>
#endif
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <time.h>
#ifdef WIN
#include <windows.h>
#include <mmsystem.h>
#endif
#include <GL/gl.h>
#include <GL/glu.h>
#include <GL/glut.h>

#include "VM_files.H"
#include "texture.h"
#include "Polyline.H"
#include "GL.H"
#include "LOD.H"

View *view;  // Global just to enable callbacks.

int vp[4];

void
Start(int argc, char **argv, VMFile &_map, int lods)
{
  view = new View(_map, argc, argv, lods);

  view->Run();

  delete view;

}

///////////// Few convenience funcs

void
Normalize(double& angle)
{
  angle = fmod(angle,360.0);

}

double
Dot(double x0, double y0, double z0,
    double x1, double y1, double z1)
{
  return x0*x1+y0*y1+z0*z1;

}

//////////// Mandatory GL callback

void
Error ( GLenum errno )
{
  printf("gluTessError %s. \n",gluErrorString(errno));

}

//////////// Dummy callback funcs - class members cannot be callbacked

void
MenuFunc(int value)
{
  view->Menu(value);

}

void
Reshape(int width, int height)
{
  view->windW = width;
  view->windH = height;
  glViewport(0, 0, view->windW, view->windH);
  glGetIntegerv(GL_VIEWPORT, vp);

}

void
Draw(void)
{
  view->Render();

}

void
Key(unsigned char key, int /*x*/, int /*y*/)
{
  view->Key(key, false);

}

void
SpecialKey(int key, int /*x*/, int /*y*/)
{
  view->Key(key, true);

}

void MouseMotion2(int x, int y)
{
  view->mouseY = y; view->mouseX = x;

}

void MouseMotion(int x, int y)
{
  view->MouseMotion(x,y);

}

void
Timer(int)
{
  view->MouseMotion(view->mouseX, view->mouseY);
  if (view->mouseButton != -1)
    glutTimerFunc(7, Timer, 0);
  else
    {
      view->timer = false;
      view->Render( /*accurate*/ /*true*/);
    }

}

void
Mouse(int button, int state, int x, int y)
{
  glutDetachMenu(GLUT_RIGHT_BUTTON);

  if(state == GLUT_DOWN)
    {
      if (glutGetModifiers())
        {
          glutAttachMenu(GLUT_RIGHT_BUTTON);
          return;
        }

      if (!view->timer)
        {
          view->changed = true;
          view->timer = true;
          glutTimerFunc(7, Timer, 0);
        }

      view->mouseButton = button;
    }
  else if (state == GLUT_UP)
    view->mouseButton = -1;

  view->mouseY = y; view->mouseX = x;
  view->MouseMotion(x,y);

}

//////////// The actual View functions

View::View(VMFile &_map, int argc, char **argv, int nrlods)
  : surfaceTextureRatio(17.0/2.0), viewAngle(0.67), treeThreshold(20.0),
    first(true)
{
  InitVars();
  map = &_map;

  glutInit(&argc, argv);

  InitGL();

  lods = new LODs(this, nrlods);

  lods->DoLists();

}

View::~View()
{
  delete lods;

}

void
View::Run()
{
  glutMainLoop();

}

void
View::InitGL()
{
  glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE | GLUT_DEPTH);
  glutCreateWindow("Virtual Map");

  GLfloat mat_ambient[] = { 1.0, 1.0, 1.0, 1.0 };
  GLfloat light_position[] = {0, 00, -100, 0.0};
//  GLfloat light_position[] = {0, 100, -30, 0.0};
  GLfloat light_ambient[] = { 0.8, 0.8, 0.8, 1.0 };

  glMaterialfv(GL_FRONT, GL_AMBIENT, mat_ambient);
  glLightfv(GL_LIGHT0, GL_POSITION, light_position);
  glLightModelfv(GL_LIGHT_MODEL_AMBIENT, light_ambient);

  glEnable(GL_LIGHTING);
  glEnable(GL_LIGHT0);
//  glEnable(GL_NORMALIZE);
  glEnable(GL_ALPHA_TEST);

  glShadeModel(GL_SMOOTH);

  glDepthFunc(GL_LESS);
  glEnable(GL_DEPTH_TEST);
  glHint(GL_PERSPECTIVE_CORRECTION_HINT, GL_FASTEST);
  glHint(GL_PERSPECTIVE_CORRECTION_HINT, GL_FASTEST);

  GetTextures();

  glutReshapeFunc(Reshape);
  glutKeyboardFunc(::Key);
  glutMouseFunc(Mouse);
  glutMotionFunc(MouseMotion2);
  glutSpecialFunc(SpecialKey);
  glutTimerFunc(7, Timer, 0);

  viewP = Point3d(0,0,-100);
  zRotation = 185.0;
  xRotation = 180;
  Key(0, false);

  menu = glutCreateMenu(MenuFunc);
  glutAddMenuEntry("toggle texture (t)", 1);
  glutAddMenuEntry("toggle wireframe (l)", 2);
  glutAddMenuEntry("toggle trees (T)", 3);
  glutAddMenuEntry("toggle buildings (b)", 4);
  glutAddMenuEntry("toggle gouraud/flat (g)", 5);
  glutAddMenuEntry("toggle follow ground (f)", 6);
  glutAddMenuEntry("--------------------------", 7);
  glutAddMenuEntry("toggle view cone cutting", 8);
  glutAddMenuEntry("toggle tree distance test", 9);

  glMatrixMode(GL_MODELVIEW);
  glLoadIdentity();
  double flip_y[] = { 1,0,0,0, 0,-1,0,0, 0,0,1,0, 0,0,0,1 };

  glMultMatrixd(flip_y);

  glMatrixMode(GL_PROJECTION);

  glutDisplayFunc(Draw);

}

void
View::InitVars()
{
  windW = 300; windH = 300;
  zRotation=0; xRotation=-20;

  pine = birch = sallow = grass = water =
    spruce = field = asphalt = building = skysphere = NULL;

  wireFrame = false;
  showBuildings = true;
  showTrees = true;
  showTextures = true;
  gouraud = true;
  viewConeOptimize = true;
  treeDistanceOptimize = true;
  night = false;
  followGround = false;

  treeInterleave = 7;

  zBoost = -1;

  mouseButton = -1;

  changed = true;

}

void
View::DoBuildings()
{
  unsigned i;

  glNewList(BuildingsDL, GL_COMPILE);

  for (i=0;i<
#ifdef GLORIA
          1
#else
          map->BlockAt(VMBuilding::Oid).Length()
#endif
          ;i++)
    {  
      unsigned j;

      if (showTextures)
        glEnable(GL_TEXTURE_2D);

      PolyPoints walls;
      for(j=0; j<((VMBuilding*)map->BlockAt(VMBuilding::Oid)[i])->numVertices; j++)
        walls.Append(*(Point3d*)map->
                     BlockAt(VMPoint::Oid)[((VMBuilding*)map->
                                            BlockAt(VMBuilding::Oid).At(i))->vertices[j]]);

#ifdef TEXT_EXT
      glBindTextureEXT(GL_TEXTURE_2D, textures[8]);
#else
      enableTexture(building, buildingW, buildingH);
#endif

      Point3d pt = ((VMBuilding*)map->BlockAt(VMBuilding::Oid)[i])->roof[0];
      for (j=0; j<((VMBuilding*)map->BlockAt(VMBuilding::Oid)[i])->numVertices; j++)
        {
          VMPoint *pt1 = (VMPoint*)map->
            BlockAt(VMPoint::Oid)[((VMBuilding*)map->
                                   BlockAt(VMBuilding::Oid).At(i))->vertices[j]];
          VMPoint *pt2 = (VMPoint*)map->
            BlockAt(VMPoint::Oid)[((VMBuilding*)map->BlockAt(VMBuilding::Oid).At(i))->
                      vertices[(j+1) % ((VMBuilding*)map->
                                        BlockAt(VMBuilding::Oid)[i])->numVertices]];

          GLfloat head_diffuse[] = { .6, .6, .6, 0 };

          GLfloat normal[] = { -pt2->y + pt1->y, pt2->x - pt1->x, 0 }; // -y,x
          float len = sqrt(normal[0]*normal[0] + normal[1]*normal[1]);
          normal[0] /= len;
          normal[1] /= len;
          Point3d normalTest;

          // go 30 cm perpendicurally from the wall
          normalTest.x = (pt1->x+pt2->x)/2.0 + normal[0];
          normalTest.y = (pt1->y+pt2->y)/2.0 + normal[1];

          if (walls.PointInPoly(normalTest) == Inside)
            {
              normal[0] = -normal[0];
              normal[1] = -normal[1];
            }

      glMaterialfv(GL_FRONT, GL_AMBIENT_AND_DIFFUSE, head_diffuse);

          glColor4f(0.7, 0.7, 0.7, 1);

          glBegin(GL_QUADS);

          glTexCoord2f(0,1);
          glNormal3fv(normal);
          glVertex3f(pt1->x, pt1->y, zBoost*pt.z -
                     ((VMBuilding*)map->BlockAt(VMBuilding::Oid).At(i))->height*1.6);      

          glTexCoord2f(1,1);
          glNormal3fv(normal);
          glVertex3f(pt2->x, pt2->y, zBoost*pt.z -
                     ((VMBuilding*)map->BlockAt(VMBuilding::Oid).At(i))->height*1.6);

          glTexCoord2f(1,0);
          glNormal3fv(normal);
          glVertex3f(pt2->x, pt2->y, zBoost*pt2->z);

          glTexCoord2f(0,0);
          glNormal3fv(normal);
          glVertex3f(pt1->x, pt1->y, zBoost*pt1->z);

          glEnd();
        }

      glDisable(GL_TEXTURE_2D);
      GLfloat normal[] = {0, 0, 1};
      Point3dArray *roof = &((VMBuilding*)map->BlockAt(VMBuilding::Oid)[i])->roof;
      for (j=0; j<roof->Length(); j+=3)
        {
          glBegin(GL_TRIANGLES);
          glNormal3fv(normal);
          glVertex3f(roof->At(j).x, roof->At(j).y, zBoost*roof->At(j).z -
                     ((VMBuilding*)map->BlockAt(VMBuilding::Oid).At(i))->height*1.6);
          glVertex3f(roof->At(j+1).x, roof->At(j+1).y, zBoost*roof->At(j+1).z -
                     ((VMBuilding*)map->BlockAt(VMBuilding::Oid).At(i))->height*1.6);
          glVertex3f(roof->At(j+2).x, roof->At(j+2).y, zBoost*roof->At(j+2).z -
                     ((VMBuilding*)map->BlockAt(VMBuilding::Oid).At(i))->height*1.6);
          glEnd();
        }
    }

  glEndList();

}

void
View::DoTrees()
{
  unsigned i;
  double tx,ty,tz,dot;
  static unsigned int last = 8000;

  glNewList(TreesDL, GL_COMPILE);  

#ifdef TEXT_EXT
  glBindTextureEXT(GL_TEXTURE_2D, textures[1]);
#endif

  if (showTextures)
    glEnable(GL_TEXTURE_2D);
  glEnable(GL_ALPHA_TEST);
...


 
 
 

Post by Ville Her » Tue, 31 Mar 1998 04:00:00



>> cards; I have no chance to check them now.) The latter machine was said
>> to be twice as fast as an O2 when it comes to OpenGL graphics.
>According to whom?

http://www.specbench.org/gpc/opc.static/index.html

>Seriously, you may be experiencing the fact that
>faster chip != better performance, there's lots of
>factors in there.  While I have my problems with
>SGI, those new machines are not just great CPU's,
>the whole system is just completely well designed.
>Very often someone will claim twice the performance
>and they're really referring to one aspect, like
>fill speed or something.

But still, four times slower...?

>How big are your textures?  O2 has unified memory
>which lets you use as much for texture as you want,
>as long as you don't swap main memory.

The PC has 16 MB of texture memory, which is far more
than the size of the textures (a few megs).

>A setup with dedicated texture memory will need to
>swap if you go over the limit.
>I would say system bandwidth would be a factor with
>2000 polygons, considering how many points make up
>a polygon and how much data has to be transfered
>per polygon, as well as material, texture, color
>and lighting information...it depends how much is actually
>being done in the graphics hardware and how much is
>being done ahead of time...graphics acceleration is
>an ambiguous term at best.  But since your house
>with only 5 or 6 polygons ran so slow, it must be
>something else, of course.
>Anyway, I know I'm not much help...the texture
>memory thing is the only thing I can think of
>right now, and it even gets me at work on Onyx
>IR's with 64MB of texture memory.
>Something you may not know is that very often
>implementations of OpenGL may require your
>texture sizes to powers of two.  So if you
>have 300x300 texture it may actually be taking
>up 512x512 space...a significant difference.  So
>you may need to check implementation release notes
>or something.

The textures are 512x512.

I thought this might have something to do with linking against the
wrong libraries or using the wrong DLL. Or the wrong drivers.
Has anybody had experience with this?
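
One way to test the wrong-DLL or wrong-driver theory is to look at the
strings the running OpenGL context reports. A minimal sketch, assuming a
current context: glGetString(GL_VENDOR) and glGetString(GL_RENDERER) are
standard OpenGL calls, and Microsoft's software implementation reports
the renderer "GDI Generic". The helper name looksLikeSoftwareGL is
hypothetical, and the GL calls are shown only in a comment so that the
snippet compiles without a window.

```cpp
#include <cstring>

// Hypothetical helper: true if the renderer string looks like
// Microsoft's unaccelerated software implementation ("GDI Generic").
bool looksLikeSoftwareGL(const char *renderer)
{
    return renderer != 0 && std::strstr(renderer, "GDI Generic") != 0;
}

// Usage inside the program, once a GL context is current:
//
//   const char *r = (const char *)glGetString(GL_RENDERER);
//   printf("GL_VENDOR   = %s\n", (const char *)glGetString(GL_VENDOR));
//   printf("GL_RENDERER = %s\n", r);
//   if (looksLikeSoftwareGL(r))
//     printf("warning: running on the software renderer\n");
```

If the renderer string names the card rather than "GDI Generic", the
accelerated driver is at least being loaded, although that alone does not
show which operations it accelerates.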

--
-- v --


 
 
 

Post by Dave Shreine » Tue, 31 Mar 1998 04:00:00


Ville.He...@hut.fi (Ville Herva) writes:
> I use texture objects and display lists. As said, I also tried to comment
> off almost all of the code, leaving only 5 quadrilaterals to be rendered,
> which still gave only some 20-30 FPS.

   Using texture objects is good.  Display lists are okay, depends on the
hardware.  In general, I think that using vertex arrays may be better.

   One question.  What happens if you change the size of the window?
From what you write above, you sound fill limited ( that is, reducing
the number of primitives didn't increase the frame rate ).

   After a look around, you may very well be getting killed with state
changes.  In particular, hardware sometimes doesn't appreciate lots
of state changes.  From what I'm seeing in your code below:

   - switching texture on and off is a bad idea.  Each time you call
     glEnable() or glDisable(), there's a whole lot of shuffling
     around in the internals of the library.  If you can, sort
     your scene by state changes ( i.e. grouping all textured
     primitives together, all lit primitives ... )

   - using connected primitives would be a real win as well.  You seem
     to either draw tris or quads, but none of them in strips.
     Additionally, glBegin( GL_TRIANGLES ) or glBegin( GL_QUADS )
     allow any number of primitives to be specified between them
     and the glEnd().  Moving them outside of your loops will
     help amortize their cost across all the primitives.

   - using glColorMaterial() could also help.  Changing materials
     with glMaterial() can be expensive, and hopefully, color
     material can help reduce that expense.
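
   The state-sorting advice in the first bullet can be sketched without
any GL calls. This is an illustration only: DrawItem, sortByTexture and
countTextureBinds are hypothetical names, not part of the posted program,
and each item is assumed to need exactly one texture.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record: one primitive batch and the texture it needs.
struct DrawItem {
    unsigned texture;   // texture object name
    int      firstQuad; // index into some geometry store
};

// Ordering predicate: group items by texture name.
inline bool byTexture(const DrawItem &a, const DrawItem &b)
{
    return a.texture < b.texture;
}

// Reorder items so that all users of a texture are adjacent.
// stable_sort keeps the original order within each texture group.
void sortByTexture(std::vector<DrawItem> &items)
{
    std::stable_sort(items.begin(), items.end(), byTexture);
}

// Number of glBindTexture() calls a renderer would issue if it rebinds
// only when the texture differs from the previous item's.
std::size_t countTextureBinds(const std::vector<DrawItem> &items)
{
    std::size_t binds = 0;
    for (std::size_t i = 0; i < items.size(); ++i)
        if (i == 0 || items[i].texture != items[i - 1].texture)
            ++binds;
    return binds;
}
```

   With five items whose textures alternate 1, 2, 1, 2, 1, the unsorted
order needs five binds and the sorted order two. The same grouping idea
applies to glEnable() / glDisable() and material changes.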

   One final thing.  You might look at using either 'ogldebug' or
the 'gltrace' utilities ( both can be found at http://www.opengl.org/,
I believe ) to actually see the stream of commands you're sending.

   I'm glad to see that the O2 kicks butt here ( I'm allowed to like the
machine, even if I work for the company ;).  However, there may be a few
things which could increase performance. If you don't mind some comments,
I've stuck some in the code below.  For what they're worth.

> void
> View::InitGL()
> {
                                [ ... ]

>   glMatrixMode(GL_MODELVIEW);
>   glLoadIdentity();
>   double flip_y[] = { 1,0,0,0, 0,-1,0,0, 0,0,1,0, 0,0,0,1 };

>   glMultMatrixd(flip_y);

   You may be better off doing a glScaled( 1.0, -1.0, 1.0 );  OpenGL
can understand the difference between a scale matrix and a generic
matrix provided with glMultMatrix(), and can reduce the computations
in transforming a vertex.  If you pass in a generic matrix, regardless
of whether it's something this simple, it will do a full matrix multiply,
whereas if it's only a scale, it will only modify the vertices accordingly.
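
   That the hand-written flip_y matrix is nothing more than a scale can
be checked directly. A small sketch, with makeScale and sameMatrix as
hypothetical helpers: it builds the column-major matrix that a scale by
(x, y, z) corresponds to, so it can be compared with flip_y element by
element.

```cpp
#include <cstddef>

// Hypothetical helper: fill m (16 doubles, column-major as OpenGL expects)
// with the matrix equivalent to a scale by (x, y, z).
void makeScale(double x, double y, double z, double m[16])
{
    for (std::size_t i = 0; i < 16; ++i)
        m[i] = 0.0;
    m[0]  = x;
    m[5]  = y;
    m[10] = z;
    m[15] = 1.0;
}

// True if two 4x4 matrices are element-wise identical.
bool sameMatrix(const double a[16], const double b[16])
{
    for (std::size_t i = 0; i < 16; ++i)
        if (a[i] != b[i])
            return false;
    return true;
}
```

   makeScale(1.0, -1.0, 1.0, m) produces the same 16 values as flip_y, so
the glMultMatrixd(flip_y) call could be replaced by the scale call.
Whether a given driver actually takes a cheaper path for scales is
implementation-dependent.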

>   glMatrixMode(GL_PROJECTION);

>   glutDisplayFunc(Draw);
> }

                                [ ... ]

> void
> View::DoBuildings()
> {
>   unsigned i;

>   glNewList(BuildingsDL, GL_COMPILE);

>   for (i=0;i<
> #ifdef GLORIA
>      1
> #else
>      map->BlockAt(VMBuilding::Oid).Length()
> #endif
>      ;i++)
>     {      
>       unsigned j;

>       if (showTextures)
>    glEnable(GL_TEXTURE_2D);

>       PolyPoints walls;
>       for(j=0; j<((VMBuilding*)map->BlockAt(VMBuilding::Oid)[i])->numVertices; j++)
>    walls.Append(*(Point3d*)map->
>                 BlockAt(VMPoint::Oid)[((VMBuilding*)map->
>                                        BlockAt(VMBuilding::Oid).At(i))->vertices[j]]);

> #ifdef TEXT_EXT
>       glBindTextureEXT(GL_TEXTURE_2D, textures[8]);
> #else
>       enableTexture(building, buildingW, buildingH);
> #endif

>       Point3d pt = ((VMBuilding*)map->BlockAt(VMBuilding::Oid)[i])->roof[0];
>       for (j=0; j<((VMBuilding*)map->BlockAt(VMBuilding::Oid)[i])->numVertices; j++)
>    {
>      VMPoint *pt1 = (VMPoint*)map->
>        BlockAt(VMPoint::Oid)[((VMBuilding*)map->
>                               BlockAt(VMBuilding::Oid).At(i))->vertices[j]];
>      VMPoint *pt2 = (VMPoint*)map->
>        BlockAt(VMPoint::Oid)[((VMBuilding*)map->BlockAt(VMBuilding::Oid).At(i))->
>                  vertices[(j+1) % ((VMBuilding*)map->
>                                    BlockAt(VMBuilding::Oid)[i])->numVertices]];

>      GLfloat head_diffuse[] = { .6, .6, .6, 0 };

>      GLfloat normal[] = { -pt2->y + pt1->y, pt2->x - pt1->x, 0 }; // -y,x
>      float len = sqrt(normal[0]*normal[0] + normal[1]*normal[1]);
>      normal[0] /= len;
>      normal[1] /= len;
>      Point3d normalTest;

>      // go 30 cm perpendicurally from the wall
>      normalTest.x = (pt1->x+pt2->x)/2.0 + normal[0];
>      normalTest.y = (pt1->y+pt2->y)/2.0 + normal[1];

>      if (walls.PointInPoly(normalTest) == Inside)
>        {
>          normal[0] = -normal[0];
>          normal[1] = -normal[1];
>        }

>       glMaterialfv(GL_FRONT, GL_AMBIENT_AND_DIFFUSE, head_diffuse);

>      glColor4f(0.7, 0.7, 0.7, 1);

   You might be better off either moving the glMaterial() outside of
the loop if "head_diffuse" remains constant, or using glColorMaterial().
I don't know what the above color is attempting to do.  It'll get replaced
by the computed color from lighting in the quads below.

>      glBegin(GL_QUADS);

>      glTexCoord2f(0,1);
>      glNormal3fv(normal);
>      glVertex3f(pt1->x, pt1->y, zBoost*pt.z -
>                 ((VMBuilding*)map->BlockAt(VMBuilding::Oid).At(i))->height*1.6);      

>      glTexCoord2f(1,1);
>      glNormal3fv(normal);
>      glVertex3f(pt2->x, pt2->y, zBoost*pt.z -
>                 ((VMBuilding*)map->BlockAt(VMBuilding::Oid).At(i))->height*1.6);

>      glTexCoord2f(1,0);
>      glNormal3fv(normal);
>      glVertex3f(pt2->x, pt2->y, zBoost*pt2->z);

>      glTexCoord2f(0,0);
>      glNormal3fv(normal);
>      glVertex3f(pt1->x, pt1->y, zBoost*pt1->z);

>      glEnd();
>    }

   Depending on your data structures, you may be thrashing the cache pretty
badly.  If you can load your coordinates ( vertex, texture and normals )
into arrays, you may see some speed up from using vertex arrays.  In
particular, on machines with high function call overhead, they'll win big
time.  For the above loop, if those don't work, then:

   1) move the glBegin() / glEnd() outside of the loop
   2) use glVertex3fv() and glTexCoord2fv(), which can be a
      little faster.
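
   A rough sketch of the vertex-array route, assuming OpenGL 1.1 is
available on the target. Only the packing step is real code here: Vec3,
WallQuad and packWallQuads are hypothetical stand-ins for the program's
own VMBuilding data, and the GL calls appear in a comment because they
need a current context.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical wall description: bottom edge p1-p2, extruded up to topZ.
struct WallQuad {
    Vec3  p1, p2;
    float topZ;
};

// Append 4 vertices (12 floats) per wall, in the same order as the posted
// loop: pt1 top, pt2 top, pt2 bottom, pt1 bottom.
void packWallQuads(const std::vector<WallQuad> &walls, std::vector<float> &out)
{
    out.reserve(out.size() + walls.size() * 12);
    for (std::size_t i = 0; i < walls.size(); ++i) {
        const WallQuad &w = walls[i];
        const float v[12] = {
            w.p1.x, w.p1.y, w.topZ,
            w.p2.x, w.p2.y, w.topZ,
            w.p2.x, w.p2.y, w.p2.z,
            w.p1.x, w.p1.y, w.p1.z
        };
        out.insert(out.end(), v, v + 12);
    }
}

// Drawing, once a context is current (OpenGL 1.1 calls):
//
//   glEnableClientState(GL_VERTEX_ARRAY);
//   glVertexPointer(3, GL_FLOAT, 0, &verts[0]);
//   glDrawArrays(GL_QUADS, 0, (GLsizei)(verts.size() / 3));
```

   Normals and texture coordinates would go into parallel arrays in the
same way ( glNormalPointer(), glTexCoordPointer() ), so the whole set of
walls is submitted with a single draw call instead of one glBegin() /
glEnd() pair per wall.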

>       glDisable(GL_TEXTURE_2D);
>       GLfloat normal[] = {0, 0, 1};
>       Point3dArray *roof = &((VMBuilding*)map->BlockAt(VMBuilding::Oid)[i])->roof;
>       for (j=0; j<roof->Length(); j+=3)
>    {
>      glBegin(GL_TRIANGLES);
>      glNormal3fv(normal);
>      glVertex3f(roof->At(j).x, roof->At(j).y, zBoost*roof->At(j).z -
>                 ((VMBuilding*)map->BlockAt(VMBuilding::Oid).At(i))->height*1.6);
>      glVertex3f(roof->At(j+1).x, roof->At(j+1).y, zBoost*roof->At(j+1).z -
>                 ((VMBuilding*)map->BlockAt(VMBuilding::Oid).At(i))->height*1.6);
>      glVertex3f(roof->At(j+2).x, roof->At(j+2).y, zBoost*roof->At(j+2).z -
>                 ((VMBuilding*)map->BlockAt(VMBuilding::Oid).At(i))->height*1.6);
>      glEnd();
>    }
>     }

   Again, move the glBegin() / glEnd() out of the loop.  Even though it's
a display list, the implementation may not be smart enough to pull
the glBegin() / glEnd() outside of the stream of OpenGL commands.

>   glEndList();
> }

> void
> View::DoTrees()
> {
>   unsigned i;
>   double tx,ty,tz,dot;
>   static unsigned int last = 8000;

>   glNewList(TreesDL, GL_COMPILE);  

> #ifdef TEXT_EXT
>   glBindTextureEXT(GL_TEXTURE_2D, textures[1]);
> #endif

>   if (showTextures)
>     glEnable(GL_TEXTURE_2D);
>   glEnable(GL_ALPHA_TEST);
>   glAlphaFunc(GL_GEQUAL,0.4);

>   GLfloat matBase[] = { 1.0, 1.0, 1.0, 1.0 };  
>   GLfloat treeColor[] = { 0, 0.7, 0.3, 1.0 };

>   if (showTextures)
>     glMaterialfv(GL_FRONT, GL_AMBIENT_AND_DIFFUSE, matBase);
>   else
>     glMaterialfv(GL_FRONT, GL_AMBIENT_AND_DIFFUSE, treeColor);

>   for (i=0; i<map->BlockAt(VMTree::Oid).Length(); i++)
>     {      
>       Point3d pt1,pt2;
>       VMTree *tree = (VMTree*)map->BlockAt(VMTree::Oid)[i];

>       Point3d vi = viewP - lookAt*50;
>       tx =  tree->x - vi.x;
>       ty =  -tree->y - vi.y;
>       tz =  tree->z-tree->height/2 - vi.z;
>       dot = Dot(tx, ty, tz, lookAt.x, lookAt.y, lookAt.z);

>       if (dot < 0 ||
>           (viewConeOptimize &&
>            dot*dot < viewAngle*Dot(tx, ty, tz, tx, ty, tz)))
>         continue;
>       if (treeDistanceOptimize && timer &&
>           (dot > treeThreshold*treeThreshold) &&
>           i%4 != 0) {
>         continue;
>       }

>       double direction = atan2(ty,-tx);

> #ifdef TEXT_EXT
>       if (last!=tree->type)
>         if (tree->type == VMTree::pine)
>           glBindTextureEXT(GL_TEXTURE_2D, textures[0]);
>         else if (tree->type == VMTree::spruce)
>           glBindTextureEXT(GL_TEXTURE_2D, textures[1]);
>         else if (tree->type == VMTree::sallow)
>           glBindTextureEXT(GL_TEXTURE_2D, textures[3]);
>         else if (tree->type == VMTree::birch)
>           glBindTextureEXT(GL_TEXTURE_2D, textures[3]);
> #else
>       if (last!=tree->type)
>         if (tree->type == VMTree::pine)
>           enableTexture(pine, pineW, pineH);
>         else if (tree->type == VMTree::spruce)
>           enableTexture(spruce, spruceW, spruceH);
>         else if (tree->type == VMTree::sallow)
>           enableTexture(spruce, spruceW, spruceH);
>         else if (tree->type == VMTree::birch)
>           enableTexture(birch, birchW, birchH);
> #endif      

>       last = tree->type;

>       pt1.x = tree->x - sin(direction)*tree->height/4.0;
>       pt1.y = tree->y + cos(direction)*tree->height/4.0;
>       pt1.z = tree->z;

>       pt2.x = tree->x + sin(direction)*tree->height/4.0;
>       pt2.y = tree->y - cos(direction)*tree->height/4.0;
>       pt2.z = tree->z;

>       glBegin(GL_QUADS);

>       glTexCoord2f(0.01,0.01);
>       glVertex3f(pt1.x, pt1.y, zBoost*pt1.z);

>       glTexCoord2f(.99,0.01);
>       glVertex3f(pt2.x, pt2.y, zBoost*pt2.z);

>       if (showTextures)
>    {
>      glTexCoord2f(.99,.99);
>      glVertex3f(pt2.x, pt2.y, zBoost*pt2.z - tree->height);

>      glTexCoord2f(0.01,.99);
>      glVertex3f(pt1.x, pt1.y, zBoost*pt1.z - tree->height);
>    }
>       else
>    {
>      glVertex3f((pt2.x+pt1.x)/2, (pt2.y+pt1.y)/2, zBoost*pt2.z - tree->height);

...


 
 
 


Post by Andy Vespe » Tue, 31 Mar 1998 04:00:00


... lots of good advice omitted.

> Something you may not know is that very often
> implementations of OpenGL may require your
> texture sizes to be powers of two.  So if you
> have a 300x300 texture it may actually be taking
> up 512x512 space...a significant difference.  So
> you may need to check implementation release notes
> or something.

Actually, ALL OpenGL implementations require texture
sizes to be a power of 2.
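
To put rough numbers on the padding described in the quoted text, here is a minimal sketch. The helper names nextPowerOfTwo and paddedTexels are made up for illustration and are not OpenGL calls; the sketch simply assumes each dimension is rounded up to the next power of two.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of OpenGL): smallest power of two >= n.
std::size_t nextPowerOfTwo(std::size_t n) {
    assert(n > 0);
    std::size_t p = 1;
    while (p < n) {
        p *= 2;
    }
    return p;
}

// Texels occupied if each dimension is padded up to a power of two.
std::size_t paddedTexels(std::size_t width, std::size_t height) {
    return nextPowerOfTwo(width) * nextPowerOfTwo(height);
}
```

For a 300x300 image, paddedTexels(300, 300) is 512 * 512 = 262144 texels against 90000 in the source image, roughly 2.9 times as many.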

--
Andy V, OpenGL Alpha Geek  (never a Digital spokesperson)

(To send mail to me, change "NoSpamPlease" to "com".)

 
 
 


Post by Ville Her » Fri, 03 Apr 1998 04:00:00




>> I use texture objects and display lists. As said, I also tried to comment
>> out almost all of the code, leaving only 5 quadrilaterals to be rendered,
>> which still gave only some 20-30 FPS.
>   Using texture objects is good.  Display lists are okay, depends on the
>hardware.  In general, I think that using vertex arrays may be better.

Unfortunately, it is not trivial to divide the landscape into fans or
strips. I considered using strips, but there were many issues that made
using DLs more attractive. On the O2, the difference was not considerable.

As I understand it, using strips, fans or arrays inside a DL gives no
additional performance?

Anyway, I believe that the problem is not here in this case. I may still
dig into this later.

>   One question.  What happens if you change the size of the window?
>From what you write above, you sound fill limited ( that is, reducing
>the number of primitives didn't increase the frame rate ).

I normally use a 300x300 window to measure the frame rate. Using 150x150
(a quarter of the pixels) only gave one FPS more (7 FPS instead of 6). In
full-screen mode (1200x1024), the rate was something like 2-3, which is
relatively better than with O2.
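
The window-size test can be read as a rough rule of thumb: if rendering were purely fill limited, frame time would scale with the pixel count, so a quarter of the pixels should give about four times the frame rate. A small sketch of that arithmetic follows. The helper fillBoundPrediction is hypothetical and ignores every fixed per-frame cost, so treat it as an upper bound rather than a model of any real card.

```cpp
#include <cassert>

// Hypothetical helper: frame rate predicted for a new window size if the
// renderer were purely fill limited (frame time proportional to pixels).
double fillBoundPrediction(double fps, int oldW, int oldH, int newW, int newH) {
    assert(oldW > 0 && oldH > 0 && newW > 0 && newH > 0);
    double oldPixels = static_cast<double>(oldW) * oldH;
    double newPixels = static_cast<double>(newW) * newH;
    return fps * oldPixels / newPixels;
}
```

With the numbers above, fillBoundPrediction(6.0, 300, 300, 150, 150) comes out at about 24 FPS. The measured 7 FPS is nowhere near that, which suggests the bottleneck is somewhere other than fill rate.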

>   After a look around, you may very well be getting killed with state
>changes.  In particular, hardware sometimes doesn't appreciate lots
>of state changes.  From what I'm seeing in your code below:
>   - switching texture on and off is a bad idea.  Each time you call
>     glEnable() or glDisable(), there's a whole lot of shuffling
>     around in the internals of the library.  If you can, sort
>     your scene by state changes ( i.e. grouping all textured
>     primitives together, all lit primitives ... )

I took those glEnable(GL_TEXTURE_2D)'s out, which is admittedly a very
good idea. However, that gave no noticeable improvement in the frame rate.
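
The "sort your scene by state changes" advice can be illustrated without any GL calls. The sketch below only counts how often the texture would have to be switched for a given draw order, using the same "bind only when the type differs from the last one" idea as the tree loop. All function names here are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counts texture switches for a draw order when a bind is issued only if
// the texture id differs from the previously drawn one.
std::size_t countTextureSwitches(const std::vector<int>& textureIds) {
    std::size_t switches = 0;
    bool haveLast = false;
    int last = 0;
    for (int id : textureIds) {
        if (!haveLast || id != last) {
            ++switches;  // a real renderer would bind the texture here
            last = id;
            haveLast = true;
        }
    }
    return switches;
}

// Switches avoided by grouping the draw order by texture id first.
std::size_t switchesSaved(std::vector<int> textureIds) {
    std::size_t before = countTextureSwitches(textureIds);
    std::sort(textureIds.begin(), textureIds.end());
    std::size_t after = countTextureSwitches(textureIds);
    assert(after <= before);  // grouping can never add switches
    return before - after;
}
```

For a draw order such as {0, 1, 0, 3, 1, 0, 3, 1} every object forces a switch (8 in total), while the grouped order needs only 3. Whether that matters in practice depends on how expensive a switch is on the particular card and driver.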

>   - using connected primitives would be a real win as well.  You seem
>     to either draw tris or quads, but none of them in strips.
>     Additionally, glBegin( GL_TRIANGLES ) or glBegin( GL_QUADS )
>     allow any number of primitives to be specified between them
>     and the glEnd().  Moving them outside of your loops will
>     help amortize them across each primitive.

The display lists are only made once, so this should not be a problem?

>   - using glColorMaterial() could also help.  Changing materials
>     with glMaterial() can be expensive, and hopefully, color
>     material can help reduce that expense.

Again, I did that, but unfortunately I got no measurable speed-up.

>   One final thing.  You might look at using either 'ogldebug' or
>the 'gltrace' utilities ( both can be found at http://www.veryComputer.com/,
>I believe ) to actually see the stream of commands you're sending.

gltrace gave a huge (~10 MB) output, which can be summarized as follows:
after building the display lists, the app essentially calls each one of
them once a frame.

>   I'm glad to see that the O2 kicks *here ( I'm allowed to like the
>machine, even if I work for the company ;).  However, there may be a few
>things which could increase performance. If you don't mind some comments,
>I've stuck some in the code below.  For what they're worth.

I like the machine as well, even though I don't work for SGI...

>   You may be better off doing a glScale( 1.0, -1.0, 1.0 );  OpenGL
>understands the difference between a scale matrix, and a generic
>matrix provided with glMultMatrix(), and can reduce the computations
>in transforming a vertex.  If you pass in a generic matrix, regardless
>of whether it's something this simple, it will do a full matrix multiply,
>where if it's only a scale, it will only modify the vertices accordingly.

Good point.

>>          if (walls.PointInPoly(normalTest) == Inside)
>>            {
>>              normal[0] = -normal[0];
>>              normal[1] = -normal[1];
>>            }

>>       glMaterialfv(GL_FRONT, GL_AMBIENT_AND_DIFFUSE, head_diffuse);

>>          glColor4f(0.7, 0.7, 0.7, 1);
>   You might be better off either moving the glMaterial() outside of
>the loop if "head_diffuse" remains constant, or using glColorMaterial().
>I don't know what the above color is attempting to do.  It'll get replaced
>by the computed color from lighting in the quads below.

Good point again, but this is hardly the MAJOR problem. Unfortunately.

>   Depending on your data structures, you may be thrashing cache pretty
>good.  If you can load your coordinates ( vertex, texture and normals )
>into arrays, you may see some speed up from using vertex arrays.  In
>particular, for machines with function call overhead, they'll win big
>time.  For the above loop, if those don't work, then:
>   1) move the glBegin() / glEnd() outside of the loop
>   2) use glVertex2fv(), and glTexCoord2fv(), which can be a
>      little faster.

Hmm, this was again done during the initial DL build-up, which is done
once. I believe this should not deteriorate the frame rate.

>   Again, move the glBegin() / glEnd() out of the loop.  Even though it's
>a display list, the implementation may not be smart enough to pull
>the glBegin() / glEnd() outside of the stream of OpenGL commands.

Yep.

>   Hmmm.  The first two vertices have texture coordinates regardless,
>and the last two are tested?  Also, glBegin() / glEnd(), and vector
>forms if you can.

A potential bug; I've mostly tested with textures on.

>   Should this normal go inside of the loop?

Yep!

>Thanx,
>Dave

Thanks for the advice! I think it may well be worth some 10% in speed
(what I said about the unmeasurability involves my timer - it seems to
give only FPS values of 5.8, 6.4, 7.3 and so on on NT, while on unix it had
better accuracy. I haven't bothered to amend that, since I'm aiming for

Perhaps the Elsa is optimized for tests only. I really can't figure out
what the bottleneck is. Everything seems to be slower than expected. On
the other hand, the demo apps still run as fast as one could expect.
Weird.

I guess, the next thing I'll do is to write a small test app from scratch
and start testing with it. Then I can say something more accurate.

--
-- v --


 
 
 


Post by Steve Bake » Fri, 03 Apr 1998 04:00:00



> As I understand, using strips, fans or arrays inside a DL gives no
> additional performance?

I'd be surprised if that were true. In principle, a very clever Display
List implementation *could* optimise a non-strip/fan set of triangles into
something just as efficient as a properly stripped/fanned set - but I very
much doubt if many (if any) OpenGL implementations actually do that.
Generally, discrete triangles need three transform/lighting operations
each, whereas in long strips it drops to one transform/lighting operation
per triangle.

If you are transform bound (which you may be given what we hear from
you), using long strips could triple your performance.
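
The "triple" figure falls out of simple vertex counting. The sketch below is only that arithmetic, under the idealised assumption that all triangles fit into one long strip (a real mesh usually needs many shorter strips, so the gain is smaller). The function names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

// Vertices sent for n independent triangles (GL_TRIANGLES): three each.
std::size_t verticesAsTriangles(std::size_t n) {
    return 3 * n;
}

// Vertices sent for n triangles in a single strip (GL_TRIANGLE_STRIP):
// the first triangle costs three vertices, each further one adds one.
std::size_t verticesAsStrip(std::size_t n) {
    assert(n > 0);
    return n + 2;
}
```

For a 10000-triangle mesh that is 30000 vertices to transform and light as discrete triangles against 10002 in one ideal strip, which is where the factor of roughly three comes from.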

> >   - using connected primitives would be a real win as well.  You seem
> >     to either draw tris or quads, but none of them in strips.
> >     Additionally, glBegin( GL_TRIANGLES ) or glBegin( GL_QUADS )
> >     allow any number of primitives to be specified between them
> >     and the glEnd().  Moving them outside of your loops will
> >     help amortize them across each primitive.

> The display lists are only made once, so this should not be a problem?

Yes, it could still be a problem. You are assuming a lot about display
list optimisations that may well not be true. OpenGL only guarantees that
DL's are no less efficient than the corresponding non-DL commands. Even on
systems that *could* optimise that intelligently, it would be very easy to
trip up the optimisation by introducing commands in an unexpected way such
that the optimiser fails to spot the speedup possibilities.

> >   I'm glad to see that the O2 kicks *here ( I'm allowed to like the
> >machine, even if I work for the company ;).  However, there may be a few
> >things which could increase performance. If you don't mind some comments,
> >I've stuck some in the code below.  For what they're worth.

> I like the machine as well, even though I don't work for SGI...

Funny - I was really disappointed in it. Compared to a decent (ie $700) PC
with a reasonable (ie $150) graphics card, I found the O2 to be lack-lustre.
However, it's not truly an apples-for-apples comparison - and I know there
are areas where the O2 wins ... but not for me. In my application I saw
something like 2Hz from the O2 and 15Hz from the el-cheapo PC.

--

Steve Baker                (817)619-8776 (Vox/Vox-Mail)
Raytheon Systems Inc.      (817)619-4028 (Fax)


 
 
 


Post by Ville Her » Fri, 10 Apr 1998 04:00:00



> (Lengthy whine on lousy performance)

I finally found the solution. There was a silly little useless
function call that somehow had no effect on O2, but drastically
deteriorated the performance on PC's. I'll skip the detailed
explanation as it clearly has no value for others struggling
with poor performance.

Now the frame rates are as expected and seem to comply with
the benchmarks.

I am sorry for wasting your time -- I thought from the beginning
that this was a more exciting problem, something to do with the
special characteristics of the GL cards perhaps.

However, I got a lot of good advice, and I'm hoping that by applying
it I can still gain an additional boost.

Thank you all who were kind enough to reply!

--
-- v --


 
 
 


Post by Steve Bake » Fri, 10 Apr 1998 04:00:00




> > (Lengthy whine on lousy performance)

> I finally found the solution. There was a silly little useless
> function call that somehow had no effect on O2, but drastically
> deteriorated the performance on PC's. I'll skip the detailed
> explanation as it clearly has no value for others struggling
> with poor performance.

Oh no - you don't get off that easily :-)

You have to tell us which function call was the silly useless one
for the good of future generations.

--

Steve Baker                (817)619-8776 (Vox/Vox-Mail)
Raytheon Systems Inc.      (817)619-4028 (Fax)


 
 
 


Post by Gavin Be » Fri, 10 Apr 1998 04:00:00


>You have to tell us which function call was the silly useless one
>for the good of future generations.

Hear hear!  And until you do, maybe we should take bets on what it
was:

glEnable(GL_SLOW_MODE) ?
glSleep(1); ?
glHint(GL_128_BIT_COLOR, GL_NICEST); ?


SkyPaint: what do QuickTimeVR, 3DStudio and VRML have in common?
  http://www.wasabisoft.com/

 
 
 


Post by Ville Her » Fri, 10 Apr 1998 04:00:00



>>You have to tell us which function call was the silly useless one
>>for the good of future generations.
>Hear hear!  And until you do, maybe we should take bets on what it
>was:
>glEnable(GL_SLOW_MODE) ?
>glSleep(1); ?
>glHint(GL_128_BIT_COLOR, GL_NICEST); ?

No, it was

glEnable(GL_RAYTRACE);

although I also tried

glPollSMBNetwork(GL_THROUGHOUTLY);
glTravellingSalesManProblem(triangle_mesh);
glCrackDES^H^H^HIDEA(ciphertext);
glBlockingRequest(GL_MS_HELPDESK);

Okay, okay. You guys just seem to refuse to accept that as a good
way of saying that I messed up without losing face.

I updated certain display lists periodically. These include the
tree DL and the surface model, which is divided into a number of squares.
(This is a sort of simple map walk-through program.) The reason for the
updates was of course to drop the irrelevant objects out. Now, these
updates were taking place every frame - not every tenth or so frame.
The reason for this was a little mistake I made in the clock function
while porting to NT. The stupidest thing was that this call was not even
needed, since the update was also done after every 7 frames. I had just
forgotten to take that clock-based thing out of the original program
and, in addition, made a mistake in porting. The guilty function call
looked too innocent to be found while debugging, given that
I thought I had removed the clock-based update.
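
For anyone wanting to guard against the same thing, a frame-count gate is easy to reason about because it does not depend on any platform clock. The sketch below is hypothetical (the original clock code was not posted, and RebuildGate and rebuildsOver are names made up here); it only shows how the number of rebuilds differs between "every 7 frames" and "every frame".

```cpp
#include <cassert>

// Hypothetical frame-count gate for expensive rebuilds, for example
// recompiling a display list. Not taken from the original program.
class RebuildGate {
public:
    explicit RebuildGate(unsigned interval) : interval_(interval), frame_(0) {
        assert(interval > 0);
    }

    // Call once per frame; returns true when a rebuild is due.
    bool tick() {
        bool due = (frame_ % interval_ == 0);
        ++frame_;
        return due;
    }

private:
    unsigned interval_;
    unsigned frame_;
};

// Counts how many rebuilds happen over a run of frames.
unsigned rebuildsOver(unsigned frames, unsigned interval) {
    RebuildGate gate(interval);
    unsigned rebuilds = 0;
    for (unsigned i = 0; i < frames; ++i) {
        if (gate.tick()) {
            ++rebuilds;
        }
    }
    return rebuilds;
}
```

Over 70 frames an interval of 7 gives 10 rebuilds, while an interval of 1 (which is what a check that fires every frame amounts to) gives 70.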

HTH (although I know it can _only_ make one laugh.)

--
-- v --


 
 
 
