
INFO: Draw calls not geometry as a bottleneck in NWN.


118 replies to this topic

#51
OldTimeRadio

OldTimeRadio
  • Members
  • 1 400 messages

From looking over your methodology, I can't find anything which jumps out as flawed, but...

 

Since I don't have much time and I want to make sure I'm understanding your observations, consider the following:

 

A. A normal 10x10 area where each tile is composed of one flat plane (1,000cmx1,000cm) and one simple cube.  And, of course, a walkmesh.  "Cube" in this sense would be 2 meter squared and above the flat plane (the ground) some distance.

 

B. A 10x10 group where 9 tiles contain only walkmesh and one tile has a single uncut plane of 10,000cm x 10,000cm parented to it and also 10 cubes, which have been Attached in Max to be 1 mesh and which are also parented to it.

 

Based on your observations so far, which of those is preferable for optimal performance?

 

Are you using gDEBugger to look at this?  Any consolidated meshes should "pop" into view completely for each draw call as confirmation of consolidation.



#52
OldMansBeard

OldMansBeard
  • Members
  • 152 messages

From looking over your methodology, I can't find anything which jumps out as flawed, but...

 

Since I don't have much time and I want to make sure I'm understanding your observations, consider the following:

 

A. A normal 10x10 area where each tile is composed of one flat plane (1,000cmx1,000cm) and one simple cube.  And, of course, a walkmesh.  "Cube" in this sense would be 2 meter squared and above the flat plane (the ground) some distance.

 

B. A 10x10 group where 9 tiles contain only walkmesh and one tile has a single uncut plane of 10,000cm x 10,000cm parented to it and also 10 cubes, which have been Attached in Max to be 1 mesh and which are also parented to it.

 

Based on your observations so far, which of those is preferable for optimal performance?

 

Are you using gDEBugger to look at this?  Any consolidated meshes should "pop" into view completely for each draw call as confirmation of consolidation.

I guess you mean 100 cubes in B: a 10x10 array of cubes, one for each original tile.

Based on the limited cases I've tested so far, I would expect your case B to be marginally worse than A.

 

I'm going to do the 8x8 barracks next, then finally the whole 16x16. It's a bit slow because I'm checking the models in notepad as I go, as well as by visual inspection, to make sure the meshes are what they are supposed to be and that I'm not overlooking anything.


  • OldTimeRadio likes this

#53
rjshae

rjshae
  • Members
  • 4 488 messages

I agree with the OP. My understanding is that it is better to use a single merged texture file, or as few as possible, in order to reduce draw calls. Just by looking at the texture files used in NWN2 placeables, you can tell the NWN2 developers took a lot of effort to do this. It's inconvenient in terms of wrapping the UV map, but I'm trying to make more of an effort to follow their lead.

 

My $.02 worth.


  • OldTimeRadio likes this

#54
OldMansBeard

OldMansBeard
  • Members
  • 152 messages

Next datum point: 8x8 chunked Barracks: 69/160 fps. Definitely degrading. To summarise:

1x1 = 112/180 fps
2x2 = 112/180 fps
4x4 = 101/180 fps
8x8 =  69/160 fps

The 8x8 chunk is a 2.5 Mbyte model when compiled, with about 25k polys total. The largest single trimesh in it is 5,664 polys bitmapped with tcn_stone20. In NWN1 terms, this is a heavy model. Even though there are only four models in the whole area, typically only one of them is in camera shot at a time, so it's the weight of that one model that dominates performance.

 

On the basis of these tests, I'm concluding that in NWN1, chunking tile groups to consolidate meshes across bitmaps (to save draw calls) doesn't improve performance because you end up with heavier models with too many polys, which pulls the performance down by more than you gain.

 

This is based on observing frame-rate in-game with FRAPS.


  • Zarathustra217, rjshae, OldTimeRadio and 1 other like this

#55
OldMansBeard

OldMansBeard
  • Members
  • 152 messages

I can't do the 16x16 test. A bit disappointing, but the model exceeds the theoretical capacity of the game engine. It goes like this:

  1. After consolidating the four tiles in the Barracks2x2 group, the largest trimesh in the model is the one bitmapped with tcn01_stone20 and that has 354 faces (81+33 from tcn01_s19_01, 72+22 from tcn01_s20_01, 32+14 from tcn01_t19_01 and 82+18 from tcn01_t20_01).
  2. It takes 64 such groups to fill a 16x16 area, so after making a single model for the whole area and consolidating trimeshes by bitmap, we get a single trimesh bitmapped with tcn01_stone20, with 64x354 = 22656 faces.
  3. When a model is compiled or loaded, the tverts in each trimesh are exploded so that each face gets its own 3 tverts. So we would have a trimesh with 3x22656 =  67968 tverts.
  4. In the game engine, tverts are indexed by 16-bit unsigned integers, so you can't have more than 65536 of them in a single trimesh.
  5. 67968 > 65536.
  6. Bang! :o
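
The arithmetic in the steps above can be checked with a short sketch. Python is used purely for illustration, and the function and constant names are hypothetical. The 16-bit index limit and the three-tverts-per-face explosion are taken from the post as stated, not from any engine documentation.

```python
# Hypothetical helper names; every figure below comes from the post above.
TVERT_INDEX_BITS = 16                 # tverts indexed by 16-bit unsigned ints (per the post)
MAX_TVERTS = 2 ** TVERT_INDEX_BITS    # 65536 distinct index values
TVERTS_PER_FACE = 3                   # after "explosion" each face owns 3 tverts


def exploded_tverts(faces: int) -> int:
    """Tvert count of a trimesh once every face has its own three tverts."""
    return faces * TVERTS_PER_FACE


def fits_in_engine(faces: int) -> bool:
    """True if the exploded tvert count stays within the 16-bit index range."""
    return exploded_tverts(faces) <= MAX_TVERTS


# Faces bitmapped with tcn01_stone20 in one consolidated Barracks2x2 group.
faces_per_2x2_group = (81 + 33) + (72 + 22) + (32 + 14) + (82 + 18)
groups_in_16x16 = (16 // 2) * (16 // 2)
total_faces = groups_in_16x16 * faces_per_2x2_group

print(faces_per_2x2_group, groups_in_16x16, total_faces)
print(exploded_tverts(total_faces), fits_in_engine(total_faces))
```

Run as written, this should report 354 faces per group, 64 groups, 22656 faces and 67968 tverts, which is over the 65536 limit.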

  • Estelindis, OldTimeRadio and MerricksDad like this

#56
Carcerian

Carcerian
  • Members
  • 1 108 messages

 

I can't do the 16x16 test. A bit disappointing, but the model exceeds the theoretical capacity of the game engine. It goes like this:

  1. After consolidating the four tiles in the Barracks2x2 group, the largest trimesh in the model is the one bitmapped with tcn01_stone20 and that has 354 faces (81+33 from tcn01_s19_01, 72+22 from tcn01_s20_01, 32+14 from tcn01_t19_01 and 82+18 from tcn01_t20_01).
  2. It takes 64 such groups to fill a 16x16 area, so after making a single model for the whole area and consolidating trimeshes by bitmap, we get a single trimesh bitmapped with tcn01_stone20, with 64x354 = 22656 faces.
  3. When a model is compiled or loaded, the tverts in each trimesh are exploded so that each face gets its own 3 tverts. So we would have a trimesh with 3x22656 =  67968 tverts.
  4. In the game engine, tverts are indexed by 16-bit unsigned integers, so you can't have more than 65536 of them in a single trimesh.
  5. 67968 > 65536.
  6. Bang! :o

 

 

world-exploding-o.gif

 

Doh!


  • Estelindis, OldMansBeard and MerricksDad like this

#57
OldMansBeard

OldMansBeard
  • Members
  • 152 messages

Just a rider on that 16x16 attempt:

 

The ascii model is about 4.6MB

  • NWNExplorer displays it correctly
  • The BW compiler can't compile it and drops out reporting a "Major Error" in the model.
  • The toolset crashes if it tries to load it
  • The game itself crashes if it tries to load it

nwnmdlcomp can compile it, without exploding the tverts

The resulting (nwnmdlcomp) compiled model is about 9.6MB

  • It displays correctly in NWNExplorer
  • The toolset crashes if it tries to load it
  • The game itself crashes if it tries to load it

Moral: don't consolidate meshes beyond 21845 faces.
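
The 21845 figure follows from the 65536-tvert limit divided by three tverts per face, rounded down. A minimal sketch of how a consolidation tool could respect that budget is shown below. It is an illustration only and does not describe how CM3 works; `split_into_batches` and its greedy packing strategy are assumptions made for the example.

```python
# Hypothetical guard for a mesh-consolidation tool; not taken from CM3.
MAX_TVERTS = 2 ** 16                        # 16-bit tvert index range (per the thread)
TVERTS_PER_FACE = 3                         # exploded tverts per face
MAX_FACES = MAX_TVERTS // TVERTS_PER_FACE   # 65536 // 3 = 21845


def split_into_batches(face_counts, max_faces=MAX_FACES):
    """Greedily pack per-mesh face counts into consolidated meshes under the limit."""
    batches = []
    current = []
    current_faces = 0
    for count in face_counts:
        if count > max_faces:
            raise ValueError(f"a single mesh of {count} faces already exceeds {max_faces}")
        # Start a new consolidated mesh when adding this one would overflow.
        if current and current_faces + count > max_faces:
            batches.append(current)
            current, current_faces = [], 0
        current.append(count)
        current_faces += count
    if current:
        batches.append(current)
    return batches


# The 64 Barracks2x2 groups of 354 faces each from the 16x16 attempt.
batches = split_into_batches([354] * 64)
print([sum(b) for b in batches])
```

With the 16x16 figures from the thread, the 64 groups no longer fit in one trimesh and are split into two consolidated meshes instead.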

I think I might build that into CM3.


  • Zwerkules, Estelindis, Rolo Kipp and 1 other like this

#58
Zarathustra217

Zarathustra217
  • Members
  • 221 messages

I believe the number of drawcalls taxes the CPU rather than the GPU, meaning you would only notice the difference in situations where the CPU is the bottleneck.

 

I'm honestly a bit sceptical that it's the actual drawcalls that cause performance issues (you'd need quite a large number for that), but it could perhaps be related to some other function or feature that's called when rendering each sub-mesh, and thus in conjunction with each drawcall.

 

An interesting test could be to measure the point at which the number of meshes starts to become an issue. Ideally, such a test would also be set up so that it frequently changes texture between meshes, to give something more comparable to actual everyday use, but setting that up may be a bit time consuming.


  • OldTimeRadio likes this

#59
Bannor Bloodfist

Bannor Bloodfist
  • Members
  • 924 messages

 

I can't do the 16x16 test. A bit disappointing, but the model exceeds the theoretical capacity of the game engine. It goes like this:

  1. After consolidating the four tiles in the Barracks2x2 group, the largest trimesh in the model is the one bitmapped with tcn01_stone20 and that has 354 faces (81+33 from tcn01_s19_01, 72+22 from tcn01_s20_01, 32+14 from tcn01_t19_01 and 82+18 from tcn01_t20_01).
  2. It takes 64 such groups to fill a 16x16 area, so after making a single model for the whole area and consolidating trimeshes by bitmap, we get a single trimesh bitmapped with tcn01_stone20, with 64x354 = 22656 faces.
  3. When a model is compiled or loaded, the tverts in each trimesh are exploded so that each face gets its own 3 tverts. So we would have a trimesh with 3x22656 =  67968 tverts.
  4. In the game engine, tverts are indexed by 16-bit unsigned integers, so you can't have more than 65536 of them in a single trimesh.
  5. 67968 > 65536.
  6. Bang! :o

 

I heard that bang all the way over here, across that little thing they call a pond...

 

NWN is great, for its day, but there were all sorts of limitations, and even though we have pushed those limits far beyond what Bioware originally thought possible, we are still working with an ancient (in computer parlance) engine, one that truly does not utilize the power of our computer systems.

 

This whole experiment was an interesting thing to investigate, but we still end up with the results we started with: an engine that cannot handle high-poly objects of the sort that newer games can handle, and an engine that does not use the texture abilities provided by all of that wonderful new hardware that we all currently use.  However, having said all of that, I still believe that NWN provides much more power to a potential world builder than any other game out there, even today.  No other engine allows a single person to build with the speed and power that NWN allows.

 

Yeah, we are a bit limited in some of the graphics capabilities that some of the new games offer, as in things like Skyrim, however, I think more people have built useful and interesting worlds with NWN than anything I have actually played or seen with Skyrim.  Given that the graphics are a bit dated, and the landscape building options are limited due to the tile based system, we still have MUCH more ability with NWN than the other games I have investigated.  Granting of course, that I have NOT tried everything, nor will I, unless someone points me to something that will actually blow my socks off.

 

From what playing around I have done in Skyrim and its more powerful graphical engine, I have found it to be MUCH more difficult to actually get working models into the game, and have found no real way to build terrain systems for it similar to the way I have built tilesets for NWN.  I am not saying it is not possible, just that it is an extremely time- and brainpower-consuming process, and the tools required to do even simple things make it nearly worthless to me personally.  Why anyone would wish to use 5 or 6 different tools just to get ONE completely NEW item into the game is not something that I can fathom.

 

Thank you for your efforts OMB, and I am sure you enjoyed trying to achieve it as well.  Since I know how much you love attacking mathematical problems looking for better solutions ;)

 

P.S.  If anyone has a game or even an engine that they think might surprise me or be something that I would enjoy working with, PLEASE let me know!


  • OldMansBeard and MerricksDad like this

#60
OldMansBeard

OldMansBeard
  • Members
  • 152 messages

I've been able to do 16x8 without breaking the engine but that's the limit. No real surprise, the downward trend continued.

 1x1 = 112/180 fps
 2x2 = 112/180 fps
 4x4 = 101/180 fps
 8x8 =  69/160 fps
16x8 =  64/120 fps
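
To compare these readings on a linear scale, it can help to convert them to time per frame. The sketch below does only that. The post does not say what the two figures on each line represent, so they are treated simply as two readings, and the `readings` and `frame_ms` names are hypothetical.

```python
# FPS readings copied from the table above; the meaning of the two figures
# per row is not stated in the post, so they are kept as an anonymous pair.
readings = {
    "1x1": (112, 180),
    "2x2": (112, 180),
    "4x4": (101, 180),
    "8x8": (69, 160),
    "16x8": (64, 120),
}


def frame_ms(fps: float) -> float:
    """Milliseconds spent per frame at a given frame rate."""
    return 1000.0 / fps


for chunk, (first, second) in readings.items():
    print(f"{chunk:>5}: {frame_ms(first):6.2f} ms / {frame_ms(second):6.2f} ms")
```

Frame time is the reciprocal of frame rate, so equal drops in fps do not correspond to equal increases in cost per frame.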

It's all rather disappointing, really. I was quite looking forward to writing some wizard software for automagically baking areas into single models. But, there we are. As an aged physicist, I'm used to finding that real measurements don't support the theories that one would like to be true.

 

*OMB goes back to sleep for 100 years*


  • Zarathustra217, OldTimeRadio and MerricksDad like this

#61
_six

_six
  • Members
  • 919 messages

Baking area-wide geometry into a single mesh means the engine has no way to cull any part of that mesh that is off camera. So sure, you can render more in one go, but you're overall going to be drawing more too. I'd be tempted to wildly speculate about memory allocation costs too (but after taking part in http://allrgb.com/ using a garbage collected language, I'm maybe a little fresh from the trenches there).
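
The trade-off described above can be illustrated with a toy model. This is only a sketch under loud assumptions: the camera footprint is an axis-aligned rectangle rather than a real view frustum, every tile has the same triangle count, each chunk costs exactly one draw call, and a chunk is drawn in full whenever any part of it is in view. None of this is taken from the NWN engine; `visible_cost` and its parameters are hypothetical.

```python
def visible_cost(area_tiles, chunk_tiles, view, tris_per_tile=100):
    """Return (draw_calls, triangles) when chunks are culled only as whole units.

    area_tiles  -- side length of the square area, in tiles
    chunk_tiles -- side length of each baked chunk, in tiles
    view        -- (x0, y0, x1, y1) camera footprint in tile units, half-open
    """
    vx0, vy0, vx1, vy1 = view
    draw_calls = 0
    triangles = 0
    for cx in range(0, area_tiles, chunk_tiles):
        for cy in range(0, area_tiles, chunk_tiles):
            # A chunk is submitted in full if any part of it overlaps the view.
            overlaps = (cx < vx1 and cx + chunk_tiles > vx0 and
                        cy < vy1 and cy + chunk_tiles > vy0)
            if overlaps:
                draw_calls += 1
                triangles += tris_per_tile * chunk_tiles * chunk_tiles
    return draw_calls, triangles


# A 4x4-tile camera footprint in the middle of a 16x16 area.
view = (6, 6, 10, 10)
for chunk in (1, 2, 4, 8, 16):
    print(chunk, visible_cost(16, chunk, view))
```

In this toy setup, going from per-tile models to one area-wide model cuts the draw calls from 16 to 1 but raises the triangles submitted from 1,600 to 25,600, which is the effect the post describes.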

 

The takeaway from all this IMO should just be, minimize texture count and combine whatever seems appropriate within a single creature, placeable, tile or tilegroup. If there's any wizard software to be written, something that does the combining within a single tile (or tilegroup) would probably provide benefits overall, I reckon. I've seen tilegroups in various tilesets (CC City had plenty) that have been utterly minced by the tile slicer tool that'd definitely benefit from it.


  • Estelindis, rjshae and OldTimeRadio like this

#62
Shadooow

Shadooow
  • Members
  • 4 468 messages

P.S.  If anyone has a game or even an engine that they think might surprise me or be something that I would enjoy working with, PLEASE let me know!

Not that I know of any other game with such possibilities, but a friend of mine is working on a brand new NWN-style engine. His goal is to allow reuse of the NWN (non-Bioware) models and prefabs while extending the engine in terms of technologies and supported hardware. Unfortunately he is still at the very start; he sent me a DEMO showcase of what he currently has. Not very impressive, but there is high potential there... (BTW, in that demo the movement keys work, as do W (weapon) and A (animate), and the mouse controls the camera.)


  • Estelindis and WhiteTiger like this

#63
Bannor Bloodfist

Bannor Bloodfist
  • Members
  • 924 messages

At least he has something working there.  Not sure where/how the model(s) were created, animated etc, but it sounds as if he is starting from the Bioware format so it should be something that could be learned by the cc folks that may be interested.


  • Estelindis likes this

#64
OldTimeRadio

OldTimeRadio
  • Members
  • 1 400 messages

Thank you for the data, OldMansBeard!  Can you provide the testing materials (esp. ASCII versions of those models) you used so that I can attempt to reproduce this behavior? 

 

What I'd like to do is use that same testing material and attempt to reproduce the performance hit.  Then try to identify any other factors in the models (or their textures) which could be causing it.  Since this is solely based on consolidating geometry, I'm kind of at a loss as to what could possibly be causing that dramatic of a loss of frame rate when, for instance, the total geometry on four individual tiles and one of the consolidated 4x4 tiles should be exactly the same.

 

The performance hit you describe isn't just noticeable, it's huge.  I'd like to have the opportunity to examine what's going on in that scene with the debugger.  That kind of data could be incredibly useful.


  • OldMansBeard likes this

#65
OldMansBeard

OldMansBeard
  • Members
  • 152 messages

Thank you for the data, OldMansBeard!  Can you provide the testing materials (esp. ASCII versions of those models) you used so that I can attempt to reproduce this behavior? 

 

What I'd like to do is use that same testing material and attempt to reproduce the performance hit.  Then try to identify any other factors in the models (or their textures) which could be causing it.  Since this is solely based on consolidating geometry, I'm kind of at a loss as to what could possibly be causing that dramatic of a loss of frame rate when, for instance, the total geometry on four individual tiles and one of the consolidated 4x4 tiles should be exactly the same.

 

The performance hit you describe isn't just noticeable, it's huge.  I'd like to have the opportunity to examine what's going on in that scene with the debugger.  That kind of data could be incredibly useful.

 

Yes, I'll see what I can do. It will be tomorrow, though. I'll bundle up both the ascii and binary versions of the models. Your hardware is almost certainly different from mine, so it will be interesting to see what effect that has on the numbers.

 

I think _six hit the nail on the head when he posted "Baking area-wide geometry into a single mesh means the engine has no way to cull any part of that mesh that is off camera. So sure, you can render more in one go, but you're overall going to be drawing more too."


  • Estelindis, OldTimeRadio, henesua and 2 others like this

#66
OldTimeRadio

OldTimeRadio
  • Members
  • 1 400 messages

Thank you very much, OldMansBeard.  RL is very busy at the moment so no rush but I want to make sure I put the request in now so that when time does open up I can dive into this and see what's going on.  I also agree with _six.  Still, I feel like there's some kind of "x factor" that I'm missing in this.


  • OldMansBeard likes this

#67
Bannor Bloodfist

Bannor Bloodfist
  • Members
  • 924 messages

Thank you very much, OldMansBeard.  RL is very busy at the moment so no rush but I want to make sure I put the request in now so that when time does open up I can dive into this and see what's going on.  I also agree with _six.  Still, I feel like there's some kind of "x factor" that I'm missing in this.

 

Aren't you just reinforcing the issue that has already been proven:  Aurora does NOT like high poly objects.  We ALL know that; it has been proven so many times over the past 10 years that it is not even worth considering as a non-issue. The more polys a single object has, the more the engine chokes.  Looking over the numbers OMB quoted a page or so back, about how many faces/polys you end up with, regardless of how simple a set of models you use, the original idea of saving draw calls becomes moot.  The engine can't handle the actual poly count on anything large enough for an area to be usable, so there is no advantage to be gained from merging everything into a single object.  Add to that the issues of lighting, shadows, and what EVERYONE wants in the way of variability: a tileset of pre-set 8x8 or 16x16 tiles would get so boring so fast that few, if any, would use it more than once anyway.

 

It was a neat idea, but seriously, do you expect to find anything truly different enough to make a difference? 

 

This sort of research would have helped Bioware long before the engine was finalized, but back then, 32 bits was new, and 64 bits was a dream.  Bioware even investigated what it would take to update the Aurora engine, and decided they were better off dropping it completely and moving on to a completely new engine, which they have also dropped in favor of licensing software from some other developers.

 

Anyway, the 'x' factor you appear to keep missing is just the simple poly count issue we have always had to deal with.  No gain in speed from a single object vs split up objects, and in fact, reduced functionality with no real gain at all in speed - thus, no advantage.

 

With newer game engines, the draw call limit might be an issue, but likely it has already been designed around in various other ways.  That, and the fact that newer game engines are all written in 64bit languages even if the various programmers are NOT using the true advantages of the larger address bandwidth.



#68
MerricksDad

MerricksDad
  • Members
  • 1 608 messages

I'd personally like to know the exact limits of the engine. I didn't before, and now I know one more part of it. Whether or not it actually gets us something more useful to work with is still just speculation, but I'd like to know, and have documented for others, the exact limitations, and the why of each limitation. This kind of information can save people headaches in the future, especially users who are new to the toolset and modding in general. Without that info, people may simply run into the wall at full speed and their projects would then die from frustration.


  • Frith5 and OldTimeRadio like this

#69
henesua

henesua
  • Members
  • 3 863 messages

Bannor, it is not worthwhile to caution someone against further research - especially not when there are still questions unanswered.

 

One question we still need to answer here is what the critical thresholds for draw calls are. How many drawcalls can we have per frame before performance takes a hit?

 

OMB's test didn't address draw calls at all. It addressed a problem you will run into by endlessly combining meshes, and found a critical threshold that we cannot cross. What his test was actually about was answering the question: is there a negative side effect to combining meshes?


  • Frith5, Estelindis and OldTimeRadio like this

#70
OldMansBeard

OldMansBeard
  • Members
  • 152 messages

I've posted my test files on neverwintervault http://neverwinterva...-together-areas


  • Estelindis and OldTimeRadio like this

#71
OldTimeRadio

OldTimeRadio
  • Members
  • 1 400 messages

Thank you for sharing that, OldMansBeard!  I have made a little video in the debugger to show what I think the issue is:

 


  • Rolo Kipp likes this

#72
MerricksDad

MerricksDad
  • Members
  • 1 608 messages

I am not even going to ask why the Loch Ness Monster is sounding off in the background...


  • Shadooow and OldTimeRadio like this

#73
OldTimeRadio

OldTimeRadio
  • Members
  • 1 400 messages

Ha!  I have the window open (it's a fine day over here) and apparently that's how my laptop's microphone records the sound of a car passing on the street out front.  I was like "What...What is that sound?" and then I realized.


  • MerricksDad likes this

#74
OldMansBeard

OldMansBeard
  • Members
  • 152 messages

Interesting. Thanks, OTR. I'll try some quick tests with suppressing those fence meshes somehow and see what happens. They aren't on an 'a' node in the vanilla tiles, though they perhaps should have been, so I didn't force them.


  • Estelindis and OldTimeRadio like this

#75
OldTimeRadio

OldTimeRadio
  • Members
  • 1 400 messages

Thanks, OldMansBeard.  To clarify something I said in the video, I'd suggest leaving everything with 32-bit textures where it is (so, no consolidation of any sort) and then just consolidating the regular 24-bit stuff.   Listening to what I said, I realized it could have sounded like I was suggesting consolidating the 32-bit stuff in some way, and that wasn't my intention.  Thanks again!


  • Estelindis likes this