
INFO: Draw calls not geometry as a bottleneck in NWN.


118 replies to this topic

#101
OldTimeRadio

OldTimeRadio
  • Members
  • 1,400 messages

@OldMansBeard -  Thank you, again, for your testing.  As with previous tests, I can't find anything in your methodology which doesn't appear to be sound.  I'm puzzled by the slight FPS increase.  But I can't argue with the numbers.  Very puzzled radio over here.

 

@Zarathustra217 - The Spelljammer placeable isn't consolidated.  Or, at least, not perfectly.  There are 217 chunks (about the same as 13-14 "beggar" NPC models, draw call-wise), IIRC.  Each chunk ranges from somewhere around 1,000 tris up to 10,000 or so per mesh node.  But, yeah, something like that (or a couple of them) in non-optimized form to take down the FPS, and then consolidated versions of the same, even partly consolidated, should yield something useful.

@Rolo - First, that's a good list.  Second, I wish PeachyKeen was around as well.  I don't think it's poor form to mention this out in the open because so much time has passed but, as a person who kept an eye on Voodoo Shader development, it appeared to me that at some point NWN 1 got dropped from the list of supported games.  The project would appear to be dead, so it may be a moot point, but I remember reading a message by him, on either the NWN 1 or NWN 2 forum of the old social site, saying that there was something about the pipeline in NWN 1 that was really inefficient and that he was not happy about.  After reading the message, I was never able to find it again, and I wasn't able to get in touch with him over IRC or via PM.  I've always wished I could remember what it was.  I wish the scope of that project could have been a little more limited, because I think it could have gone much farther.  Oh well.

 

@Zwerkules - Do you happen to recall whether this was in a situation with shadows on or off?


  • OldMansBeard and Rolo Kipp like this

#102
Rolo Kipp

Rolo Kipp
  • Members
  • 2,791 messages

<peering into the...>

 

My reasoning for differentiating the static placeables and static tiles is probably on the CPU side of the pipeline, but it's there nonetheless. Put down a static placeable like my Skyball (radius is iirc 200m or 300m), load the game and from the origin, walk away. At about 40m from the origin, the placeable will fade out, while the tiles are still drawn... (Edit: That's 40m away from the no-mesh origin and 40m *closer* to the mesh)

 

Further, let me theorize that weapons and *heads* are in the *effects* pipeline (or their own) rather than the item pipeline. If you walk behind the wrong kind of effect, weapons and heads vanish :-) Of course, you get behind *some* effects and everything but tiles vanish :-P

 

<...murky murky past>



#103
Zwerkules

Zwerkules
  • Members
  • 1,321 messages

@Zwerkules - Do you happen to recall whether this was in a situation with shadows on or off?

Shadows were turned on, but neither the trees on the tiles nor the placeables had shadows.



#104
NWN_baba yaga

NWN_baba yaga
  • Members
  • 1,232 messages

I'm very late to the party, but can someone tell me, in simple words, what to do in the future to improve performance? ;)

 

thanks all:)



#105
OldMansBeard

OldMansBeard
  • Members
  • 152 messages

I'm very late to the party, but can someone tell me, in simple words, what to do in the future to improve performance? ;)

 

thanks all:)

Turn off shadows :)


  • Pstemarie, Zwerkules, Estelindis and 4 others like this

#106
virusman

virusman
  • Members
  • 282 messages

If only we had the headers... Switching from immediate mode to VBOs and replacing shadows with shaders would probably boost the performance a lot.


  • OldTimeRadio and Rolo Kipp like this

#107
Zwerkules

Zwerkules
  • Members
  • 1,321 messages

Turn off shadows :)

I wish I could have liked this post more than once! :D


  • Estelindis and Rolo Kipp like this

#108
Zarathustra217

Zarathustra217
  • Members
  • 221 messages

The end users can themselves switch off the shadows if they have performance issues, so I don't see any reason to do that on the model. On the other hand, it is a viable solution to disable shadows on complex meshes and then make simpler invisible meshes cast the shadows instead.


  • henesua likes this

#109
OldMansBeard

OldMansBeard
  • Members
  • 152 messages

The end users can themselves switch off the shadows if they have performance issues, so I don't see any reason to do that on the model. On the other hand, it is a viable solution to disable shadows on complex meshes and then make simpler invisible meshes cast the shadows instead.

Okay, if you are making complex meshes, shadows are only half the problem. Fair point. Rendering them is the other half.


  • Zwerkules, OldTimeRadio and Rolo Kipp like this

#110
OldTimeRadio

OldTimeRadio
  • Members
  • 1,400 messages

I had some time open up so I tried to reproduce this.  I installed FRAPS, got my on screen display of the FPS without having to do anything.

 

I tried the first test module (here, file is "File bundle: module + hak + uncompiled tile models") in the game, using the instructions here, which includes starting out staring at one's feet then zooming out as much as possible.  My feet shots look like this, my zoomed ones look like this

Q: OldMansBeard, are those shots representative of what you see when testing?

  

 Area    My results (zoomed out/zoomed in)    OldMansBeard's results
 1x1     148/172 fps                          112/180 fps
 2x2     151/172 fps                          112/180 fps
 4x4     132/168 fps                          101/180 fps
 8x8      83/160 fps                           69/160 fps

 

These kinds of numbers seem close enough to say I've reproduced what you saw.  But there's a problem: actually walking around in any of the test areas, including the 1x1, brings on horrible lag instantly.  Something's not right, but I can't tell what it is.  I gave a cursory inspection to your models and walkmesh and nothing stood out as being improper.
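
Because FPS is a reciprocal measure, the two sets of results are easier to compare as milliseconds per frame. A minimal sketch in Python, using only the zoomed-out figures from the table above; the function name `frame_time_ms` and the dictionary layout are just illustrative choices, not part of anyone's test setup:

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps

# Zoomed-out FPS from the table above: (OldTimeRadio, OldMansBeard)
zoomed_out = {
    "1x1": (148, 112),
    "2x2": (151, 112),
    "4x4": (132, 101),
    "8x8": (83, 69),
}

for area, (otr, omb) in zoomed_out.items():
    print(f"Area{area}: {frame_time_ms(otr):.2f} ms vs {frame_time_ms(omb):.2f} ms")
```

Read this way, a drop from 148 to 83 FPS is a rise from roughly 6.8 ms to roughly 12 ms per frame, which is the quantity that actually adds up when more work is put on screen.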

 

Q: Can you or anyone else reproduce this lag and help figure out where it's coming from?  I'm getting it just trying to walk anywhere.  I'm not sure whether that would affect the FPS tests or not.  It only seems to happen when I'm moving so my guess is not but it's still unexpected behavior.  For anyone who's interested: FRAPS can be downloaded here.  Just install it and you'll get the FPS overlay without having to change anything else.  Test module can be downloaded here, and I'm using the file called "File bundle: module + hak + uncompiled tile models".

 

Interestingly, using that same test module in the toolset, zooming all the way out there was a marked increase in FPS for consolidated tiles:

Area1x1  - 28 FPS (non-optimized, all the rest are optimized)

Area2x2  - 50 FPS

Area4x4  - 83  FPS

Area8x8  - 76 FPS

Area16x8 - 70 FPS

 

So at least up to a point (4x4), it seems to increase the FPS quite a bit. 

 

Q: Can anyone reproduce that in the toolset?

 

Q: Anyone have opinions on what the difference is?

 

------------

 

At least on the geometry (and probably texture) front, check out these scenes and the FPS I'm getting in each with the same settings on my client- namely shadows, shiny water and VSYNC off:

 

[Screenshots: xEA43lc.jpg, BG9N7Zj.jpg, 0Ij2mRJ.jpg, IRSE1wu.jpg]

 

If anyone wants to actually compare their FPS in these scenes, FRAPS can be downloaded here, and here is the module I just whipped together that the above screenshots are from.  You'll need the hak in this archive to play it.  I can absolutely understand how I can get 55 FPS in Megaton compared to the 122 FPS in Rural.  Harder to explain is why I only get 5 more FPS in Tropical and 10 fewer in TNO from those vantages.

 

Q (to anyone): So what is going on to either allow Megaton to run so smoothly or TNO to run so slowly?  I'll take either answer as long as it's something I can reproduce.  If you use NWN Explorer Reborn 1.63, with the option of "Outline Polygons in Model Meshes" turned on, you can see the raw geometry that's at play in the Megaton hak.  And the ~85 insanely hi-res (many alpha) textures involved.


  • OldMansBeard likes this

#111
OldTimeRadio

OldTimeRadio
  • Members
  • 1,400 messages

The end users can themselves switch off the shadows if they have performance issues, so I don't see any reason to do that on the model. On the other hand, it is a viable solution to disable shadows on complex meshes and then make simpler invisible meshes cast the shadows instead.

 

Also, if an area maker wanted to be sneaky, I believe that a custom environment in environment.2da could turn shadows off, even if the meshes had shadows on them and the client viewing them had shadows turned on.  See LIGHT_SHADOWS and DARK_SHADOWS.  I believe the description for DARK_SHADOWS is a typo and that it applies to night, not day, which is what LIGHT_SHADOWS covers.  YMMV, but I think that's right.



#112
OldTimeRadio

OldTimeRadio
  • Members
  • 1,400 messages

If only we had the headers... Switching from immediate mode to VBOs and replacing shadows with shaders would probably boost the performance a lot.

 

It's one of the first things I'm going to be asking Overhaul for if they ever work their magic on this game.  Trent & Cam are destined/cursed to revisit NWN.  After MDK2 HD and Baldur's Enhanced, it's the natural "next step".

 

Relevant from the Omnibus, from RTrifts, thread "Emitters & CPU usage":
 

It's a little more complicated than that though.
Emitters and creatures are inherently not optimized for the way that OpenGL works with NWN. (Tilesets, on the other hand, are optimized much more tightly, so one polygon in a tile is less than one polygon in a creature or on an emitter in game engine load terms, even if both are animated).  The current vertex pool code in NWN is designed around the old Vertex Array Range (VAR) style of rendering, where the application would be forced to do all memory management for geometry data. A little over a year ago, the Vertex Buffer Object (VBO) extension was introduced to OpenGL, which moved the burden of memory management back into the driver. This is what Bio uses in KotOR in the Odyssey engine, but Aura is still VAR style.  If VBO was implemented in NWN, it would permit creatures and emitters or even entire tilesets to be packaged up as objects. That would mean much less copying of data, being able to download static geometry to video memory for much faster rendering, etc... That's what KotOR has - but we don't have it in NWN.  Anyways - long story short: a polygon is not a polygon when you are comparing tiles vs creatures & emitters.
So sayeth roboius, and on stuff like this, I salute him smartly.

 

Examination with gDEBugger shows that at least some types of emitters are optimized (multiple particles being drawn in a single pass), but his overall point holds.
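
One toy way to picture "a polygon is not a polygon" is to treat frame cost as a fixed overhead per draw call plus a cost per triangle. The Python sketch below does that; both cost constants and the scene size are invented purely for illustration and are not measurements of NWN, Aura, or any driver. Only the 217 figure comes from this thread (the Spelljammer's chunk count).

```python
def frame_cost_us(draw_calls, triangles, per_call_us=20.0, per_triangle_us=0.01):
    """Toy linear cost model in microseconds.

    per_call_us and per_triangle_us are hypothetical constants,
    chosen only to make the arithmetic easy to follow.
    """
    return draw_calls * per_call_us + triangles * per_triangle_us

TRIANGLES = 500_000  # hypothetical scene size, identical in both cases

split = frame_cost_us(draw_calls=217, triangles=TRIANGLES)   # one call per chunk
merged = frame_cost_us(draw_calls=1, triangles=TRIANGLES)    # fully consolidated

print(f"217 calls: {split:.0f} us, 1 call: {merged:.0f} us")
```

In this model the geometry term is the same either way and only the per-call term shrinks. The test results elsewhere in this thread suggest the real engine is messier than that, so treat it as a mental picture rather than a prediction.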



#113
virusman

virusman
  • Members
  • 282 messages

It's one of the first things I'm going to be asking Overhaul for if they ever work their magic on this game.  Trent & Cam are destined/cursed to revisit NWN.  After MDK2 HD and Baldur's Enhanced, it's the natural "next step".

 

Relevant from the Omnibus, from RTrifts, thread "Emitters & CPU usage":
 

 

Examination with gDEBugger shows that at least some types of emitters are optimized (multiple particles being drawn in a single pass), but his overall point holds.

Unfortunately, NWN:EE is highly unlikely. Trent Oster has said so multiple times on Twitter: https://twitter.com/...385563095814144

Yes, some things are batched, but the majority of the calls are single vertices - you can see that in gDEBugger detailed view. Without headers, I can't easily tell what types of meshes are batched.



#114
OldMansBeard

OldMansBeard
  • Members
  • 152 messages

I had some time open up so I tried to reproduce this.  I installed FRAPS, got my on screen display of the FPS without having to do anything.
 
I tried the first test module (here, file is "File bundle: module + hak + uncompiled tile models") in the game, using the instructions here, which includes starting out staring at one's feet then zooming out as much as possible.  My feet shots look like this, my zoomed ones look like this
Q: OldMansBeard, are those shots representative of what you see when testing?

 

Yes, those screenshots are right. You are doing the same tests. Only slight difference is that I set creature shadows to high, which, with an invisible PC, gives no shadow at all at your feet rather than the blurred circle of simple shadow. But it doesn't affect the figures much.

 

 Area    My results (zoomed out/zoomed in)    OldMansBeard's results
 1x1     148/172 fps                          112/180 fps
 2x2     151/172 fps                          112/180 fps
 4x4     132/168 fps                          101/180 fps
 8x8      83/160 fps                           69/160 fps
 
These kinds of numbers seem close enough to say I've reproduced what you saw.  But there's a problem: actually walking around in any of the test areas, including the 1x1, brings on horrible lag instantly.  Something's not right, but I can't tell what it is.  I gave a cursory inspection to your models and walkmesh and nothing stood out as being improper.

 

That's interesting. I didn't get that. I found fps drops a bit when moving but not hugely. Just a thought - try the same thing with my V2 upload. Does it happen in the same way? If it does or doesn't, that might give us a clue.

 

 
If anyone wants to actually compare their FPS in these scenes, FRAPS can be downloaded here, here is the module I just whipped together that the above screenshots are from.  You'll need the hak in this archive to play it.  I can absolutely understand how I can get 55 FPS in Megaton compared to the 122 FPS in Rural.  Harder to explain is why I only get 5 more FPS in Tropical and 10 less in TNO from those vantages. 
 
Q (to anyone): So what is going on to either allow Megaton to run so smoothly or TNO to run so slowly?  I'll take either answer as long as it's something I can reproduce.  If you use NWN Explorer Reborn 1.63, with the option of "Outline Polygons in Model Meshes" turned on, you can see the raw geometry that's at play in the Megaton hak.  And the ~85 insanely hi-res (many alpha) textures involved.

 

Just looking at a small sample, TNO01 tiles seem to have about 5x the poly count of the TCN01 ones. That might account for some of the lag.


  • OldTimeRadio likes this

#115
OldTimeRadio

OldTimeRadio
  • Members
  • 1,400 messages

@Virusman - In the immediate future, like this year or 2015?  I agree with you.  But I think it's something they want to do if they can.  If KotOR can be ported to the iPad 2, it seems the technical hurdles are not as insurmountable as they might seem to bring NWN to mobile.  I do hate that the MMO might in any way slow down the possibility of Overhaul touching it, though.  :(

 

Yes, some things are batched, but the majority of the calls are single vertices - you can see that in gDEBugger detailed view. Without headers, I can't easily tell what types of meshes are batched.

 

Not single vertices but single mesh nodes.  I know you probably know this but I don't want others to get confused.  Here's one of the ~200 calls on the Spelljammer.  I'm assuming the number 33699 after GL_TRIANGLES really means 33k verts and my guess is it's only around 11,000 actual triangles, though.  If you have an ATI card, AMD acquired the code for gDEBugger and continued the project as CodeXL.  I'm still getting familiar with it, but it's very similar to gDEBugger.
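
The arithmetic behind that guess is simple: with GL_TRIANGLES, every three vertices form one independent triangle. So if 33699 really is a vertex count (still an assumption, as said above), the triangle count is a third of it. A quick Python sketch; the helper name is made up for illustration:

```python
def triangles_from_vertex_count(vertex_count):
    """Triangles drawn by GL_TRIANGLES for a given vertex count.

    GL_TRIANGLES consumes vertices in independent groups of three,
    so any leftover vertices (vertex_count % 3) draw nothing.
    """
    return vertex_count // 3

print(triangles_from_vertex_count(33699))  # 11233
```

That lands close to the "around 11,000" estimate, with no vertices left over.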

 

Out of curiosity, Virusman, I am assuming that the headers are something, like the symbols file, that either falls in your lap or is never accessible at all?  Could something like that be reverse-engineered from what I think is a sort of an older "debug" build of the engine?  Because the Bioware model viewer (can also download here if that link doesn't work) has a lot more that's...um...I guess "viewable", strings-wise at least, than anything else I know of.  For a time, I had access to IDA Pro 5.5 with HexRays and hoped to use the model viewer's weakness to my advantage but still couldn't learn all that much, sadly.

 

@OldMansBeard -

Screenshots: Good.  Thank you again for all  the observations.

 

Walklag: Yep, even on the v2.  Just did some testing because this was just too odd.  Turns out it's a function (at least on my end) of your choice of appearances.  809 is an invisible human male at 10% size.  For some reason this is what was triggering the ungodly lag.  Oddly, 298 (which I usually use for invisible) didn't have this problem nor did other creature appearances (I tried Formian, Basilisk, et al.)  Seems like it has less of an effect as I get closer to 100% size and no lag when it's close to, at or over 100% size.  This game is so much "What the hell?", sometimes.  I was thinking this might be some bizarre graphical thing but maybe the bad behavior is from my PC's movement or anims trying to play through that 10%-sized guy.

 

Diff between TNO and TCN: Maybe.  It makes sense between the two of them but I don't get the difference between TNO and Megaton, if it's based on the heaviness of geometry, anyway.



#116
OldMansBeard

OldMansBeard
  • Members
  • 152 messages

@OldMansBeard -
Screenshots: Good.  Thank you again for all  the observations.
 
Walklag: Yep, even on the v2.  Just did some testing because this was just too odd.  Turns out it's a function (at least on my end) of your choice of appearances.  809 is an invisible human male at 10% size.  For some reason this is what was triggering the ungodly lag.  Oddly, 298 (which I usually use for invisible) didn't have this problem nor did other creature appearances (I tried Formian, Basilisk, et al.)  Seems like it has less of an effect as I get closer to 100% size and no lag when it's close to, at or over 100% size.  This game is so much "What the hell?", sometimes.  I was thinking this might be some bizarre graphical thing but maybe the bad behavior is from my PC's movement or anims trying to play through that 10%-sized guy.


That's bizarre. Glad you bottomed it - I would never have thought of that.

It looks like your results generally are the same as mine - different numbers but same trend (or lack of it) - and that's reassuring. The machine I was doing the tests on is unavailable for the next few weeks and I'm reduced to using a puny notebook, so I can't do any other confirmations for a while but it looks like you are set to go with all this. :)

 

(edit - added)

 

I now have access to a laptop with integrated graphics that runs NWN about as fast as my desktop. I've re-run the V2 tests on it and something interesting has come up:

Chunk    Camera    Camera    Camera
 Size      A         B         C
-----    ------    ------    ------
 1x1      184       153        36
 2x2      185       155        50
 4x4      181       139        61
 8x8      176        88        62
16x8      176        89        62

The first two columns (zoom-in/zoom-out) are no surprise, but look at Column C (camera height 200.0, looking down on the whole area). Performance improves with consolidation!

 

So different hardware may give different results qualitatively.
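
To put a number on the Camera C trend, the frame rates can be expressed relative to the unconsolidated 1x1 case. A small Python sketch using only the Camera C column from the table above; the function and variable names are illustrative:

```python
# Camera C frame rates from the table above, keyed by chunk size
camera_c = {"1x1": 36, "2x2": 50, "4x4": 61, "8x8": 62, "16x8": 62}

def speedup_vs_baseline(fps_by_chunk, baseline="1x1"):
    """Return each chunk size's FPS as a multiple of the baseline FPS."""
    base = fps_by_chunk[baseline]
    return {chunk: fps / base for chunk, fps in fps_by_chunk.items()}

for chunk, ratio in speedup_vs_baseline(camera_c).items():
    print(f"{chunk:>4}: {ratio:.2f}x")
```

Most of the gain arrives by 4x4 (about 1.7x), and going to 8x8 or 16x8 adds almost nothing on this hardware.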

 

My original tests were on a Q6600 (4 x 2.4GHz) desktop running Win 8.1 in 6GB of ram, with a GT610 graphics card (2GB DDR3).

 

These tests are on a Core2 Duo (2 x 2.4GHz) laptop running Win 7 in 4GB, with a Mobility Radeon HD4530 (512MB) card.


  • OldTimeRadio and Rolo Kipp like this

#117
Zarathustra217

Zarathustra217
  • Members
  • 221 messages

I figure one useful thing we can derive from these numbers is that it's perfectly fine to make 2x2 tilegroups consolidated. At times this could reduce the work you have to do, as well as resulting in fewer polys when you don't have to split up faces that cross the inner tile edges.


  • Zwerkules, Shadooow, kalbaern and 2 others like this

#118
OldMansBeard

OldMansBeard
  • Members
  • 152 messages

I've been doing some more tests, using the TNO01 house 1 2x2 group, this time with the camera looking horizontally along a street of identical houses. I start 5m from the southern edge looking north and step in 10m intervals up to 155m, at which point the camera is looking at the northern sky with no houses visible, measuring the frame rate at each step. The difference in view between each step and the next is the number of houses visible in the far distance.

 

What the numbers are telling me is that consolidation up to 2x2 does improve frame rate for distant tiles (more than about 40m away) but not for tiles that are close to the camera - on those it has no effect. If this is right, then performance in large areas with long straight streets can be improved by consolidation, but in areas with short streets with buildings across the ends, it doesn't help.

 

This is quite surprising. I will do some more experimentation.

 

(edit - added) That's with fog turned off (fog distance set to 4500). With fog set to the default 45m, frame rate doesn't change until you get right to the end of the street, because the number of houses in view is limited by fog anyway, and consolidation has no effect.
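
The stepping procedure and the fog effect can be sketched in a few lines of Python. The start, step and end distances and the two fog settings are taken from this post; the "visible depth" rule (whatever street remains ahead of the camera, capped at the fog distance) is a deliberate simplification assumed here for illustration, not how the engine actually culls, and the function names are hypothetical.

```python
START_M = 5         # first camera position, metres from the southern edge
STEP_M = 10         # distance between measurements
STREET_END_M = 155  # position at which no houses remain in view

def camera_positions():
    """Camera positions along the street, south to north."""
    return list(range(START_M, STREET_END_M + 1, STEP_M))

def visible_depth(position_m, fog_distance_m):
    """Assumed depth of street in view: remaining street, capped by fog."""
    return min(fog_distance_m, STREET_END_M - position_m)

for fog in (4500, 45):
    depths = [visible_depth(p, fog) for p in camera_positions()]
    print(f"fog {fog}: {depths}")
```

Under that assumption, with fog at 45m the visible depth stays at 45m for most of the walk and only shrinks over the last few steps, which fits the frame rate staying flat until the end of the street.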


  • rjshae, OldTimeRadio and Rolo Kipp like this

#119
virusman

virusman
  • Members
  • 282 messages

Not single vertices but single mesh nodes.  I know you probably know this but I don't want others to get confused.  Here's one of the ~200 calls on the Spelljammer.  I'm assuming the number 33699 after GL_TRIANGLES really means 33k verts and my guess is it's only around 11,000 actual triangles, though.  If you have an ATI card, AMD acquired the code for gDEBugger and continued the project as CodeXL.  I'm still getting familiar with it, but it's very similar to gDEBugger.

 

Out of curiosity, Virusman, I am assuming that the headers are something, like the symbols file, that either falls in your lap or is never accessible at all?  Could something like that be reverse-engineered from what I think is a sort of an older "debug" build of the engine?  Because the Bioware model viewer (can also download here if that link doesn't work) has a lot more that's...um...I guess "viewable", strings-wise at least, than anything else I know of.  For a time, I had access to IDA Pro 5.5 with HexRays and hoped to use the model viewer's weakness to my advantage but still couldn't learn all that much, sadly.

I already have the symbols file (under non-distribution terms though), but that doesn't contain typeinfo (objects' memory layout) and other stuff, so it'd still take a lot of time to figure out how the engine works without headers. Mapping memory structures isn't easy.


  • OldTimeRadio likes this