INFO: Draw calls, not geometry, as the bottleneck in NWN.


118 replies to this topic

#26
_six
  • Members
  • 919 posts

How I would love to see the source code for NWN.

 

Draw calls being the bottleneck makes sense, though. NWN is (presumably) using fixed-function pipeline stuff, so the only thing limiting how fast geometry can be processed is how fast it can be sent off to the GPU. The physical distance it needs to travel across the hardware, and the number of times it has to travel there, are what tie down your framerate.

 

I find the draw order slightly curious insofar as it's sorting by texture, but not going so far as to actually combine the same-textured objects before drawing them. That's something I'd love to try as a first optimisation step in a crazy alternate universe where we had the source code, particularly for tiles, as that stuff is all static, and it'd be very easy to predict which tiles could be onscreen at any particular point.
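The Python sketch below illustrates the distinction _six is drawing, using a made-up list of tile meshes. The names and the idea of a flat draw list are assumptions, not NWN's actual renderer.

- Sorting by texture only cuts texture switches. Every mesh still costs its own draw call.
- Merging same-textured meshes first cuts the draw calls to one per texture.

```python
from itertools import groupby

# Hypothetical scene: (mesh_name, texture_name) pairs, e.g. static tile meshes.
scene = [
    ("tile_a_floor", "cobble"), ("tile_b_floor", "cobble"),
    ("tile_a_wall", "stone"), ("tile_c_floor", "cobble"),
    ("tile_b_wall", "stone"), ("tile_c_roof", "thatch"),
]

def sorted_draws(meshes):
    """Sort by texture: one draw call per mesh, one bind per distinct texture."""
    ordered = sorted(meshes, key=lambda m: m[1])
    # After sorting, same-textured meshes are contiguous, so the number of
    # texture binds equals the number of distinct textures.
    binds = len({tex for _, tex in ordered})
    return len(ordered), binds

def merged_draws(meshes):
    """Merge same-textured meshes first: one draw call (and bind) per texture."""
    ordered = sorted(meshes, key=lambda m: m[1])
    batches = [list(group) for _, group in groupby(ordered, key=lambda m: m[1])]
    return len(batches), len(batches)

print(sorted_draws(scene))  # (6, 3)
print(merged_draws(scene))  # (3, 3)
```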

 

I'm quite gratified to hear that fewer textures and more combined meshes is on the mark anyway, since I've kinda been assuming that's a good approach for a long time. And for reference, TNO is actually a really good tileset in that regard, compared to some of the standard ones. Waaaaaay too many textures, though.


  • Estelindis and MerricksDad like this

#27
_six
  • Members
  • 919 posts

I'd be interested to see how particle systems are constructed. I've found them notorious for slowing down the game, sometimes even apparently very simple effects.


  • MerricksDad likes this

#28
MerricksDad
  • Members
  • 1,608 posts

_six wrote: "I'd be interested to see how particle systems are constructed. I've found them notorious for slowing down the game, sometimes even apparently very simple effects."

 

I'd be more interested in creating a reuse algorithm inside the particle emitter system. One of the biggest issues I see with it, besides its create-when-needed methods, is that it doesn't clean up in a timely fashion. Still not as slow as Firefox getting rid of page entities in memory after they are closed.
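MerricksDad's reuse idea is essentially an object pool. Below is a minimal Python sketch, assuming a fixed particle budget per emitter. The class and field names are hypothetical, not NWN's emitter internals.

- Particles are allocated once, up front.
- An expired particle goes straight back on the free list during the same update, so it is reclaimed immediately.
- No new particle objects are created after construction.

```python
class Particle:
    __slots__ = ("x", "y", "vx", "vy", "life")

    def __init__(self):
        self.x = self.y = self.vx = self.vy = 0.0
        self.life = 0.0

class ParticlePool:
    """Fixed-size pool: dead particles are recycled instead of reallocated."""

    def __init__(self, capacity):
        self.free = [Particle() for _ in range(capacity)]  # preallocated once
        self.live = []
        self.allocations = capacity  # particle objects ever created; never grows

    def emit(self, x, y, vx, vy, life):
        if not self.free:
            return None  # budget exhausted: skip this particle rather than allocate
        p = self.free.pop()
        p.x, p.y, p.vx, p.vy, p.life = x, y, vx, vy, life
        self.live.append(p)
        return p

    def update(self, dt):
        still_alive = []
        for p in self.live:
            p.x += p.vx * dt
            p.y += p.vy * dt
            p.life -= dt
            if p.life > 0:
                still_alive.append(p)
            else:
                self.free.append(p)  # reclaimed in the same frame it dies
        self.live = still_alive
```

Capping the budget (returning `None` when the pool is empty) is a design choice. It trades a few dropped particles for a predictable memory and per-frame cost.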


  • Rolo Kipp likes this

#29
Rolo Kipp
  • Members
  • 2,791 posts

<still catching...>

 

I'm not quite ready to dive into this *very* tasty thread (busy with work :-/ ), but I'd like to mention a couple of things to consider. Well, one thing to consider and one response to my name being dropped ;-)

 

@ MD: While I truly do want to develop procedural, chaotic world-generation algos, I'm also very fond of *exceptions*. My original geometry engine proposal (to EA) in the '80s was a combination of DEM for world mapping and 3D mesh exceptions with persistent deformable states. With regard to NwN, my plan (mentioned a few years ago) was to build areas procedurally *except* for special tilesets that would be constructed monolithically (like OTR's Megatron) and be both single-purpose and highly optimized. (My earlier descriptions were considerably less specific and considerably more ignorant, but still on board with this idea.)

 

And that leads me to an item for consideration: batching. That's what it's called in the jME engine, at any rate. This is done both at model creation and at runtime, sometimes even during the graphics update loop. Some of the most impressive terrain systems they are working on actually calculate the geometry at runtime and batch meshes on the fly.
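Batching in this sense means concatenating several meshes that share a material into one vertex/index buffer, so they go down in a single draw call. Below is a minimal Python sketch of the one non-obvious step: offsetting each mesh's face indices by the number of vertices already in the batch. It is not jME's actual API; the `Mesh` type here is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Mesh:
    vertices: list  # [(x, y, z), ...]
    faces: list     # [(i0, i1, i2), ...] indices into this mesh's own vertices

def batch(meshes):
    """Concatenate meshes (assumed to share one material) into a single mesh."""
    merged = Mesh([], [])
    for m in meshes:
        base = len(merged.vertices)  # index where this mesh's vertices will start
        merged.vertices.extend(m.vertices)
        # Re-point each triangle at the vertices' new positions in the batch.
        merged.faces.extend((a + base, b + base, c + base) for a, b, c in m.faces)
    return merged

tri_a = Mesh([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
tri_b = Mesh([(5, 0, 0), (6, 0, 0), (5, 1, 0)], [(0, 1, 2)])
print(batch([tri_a, tri_b]).faces)  # [(0, 1, 2), (3, 4, 5)]
```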

 

Joining Heightmaps thread

Marching cubes, triplanar mapping... thread and Awesome Video

(Edit: Had the wrong link on the Awesome video :-P Fixed now)

 

While the runtime option isn't available to us, an automated utility for examining an area or module that could ultimately pull all the static meshes and textures (tiles *and* placeables) to produce optimized single-use tilesets... that's intriguing :-) If we could still design and *build* with modular bits and pieces and then have a utility go through and optimize/compile as much as practical (starting only with tiles/groups/sets)... that could be amazing :-)

 

As for shadows in NwN... mostly I feel, like others, they just aren't worth the hassle in the current engine. In the *next* engine though... ;-)

 

<...the bug>


  • Estelindis and MerricksDad like this

#30
MerricksDad
  • Members
  • 1,608 posts

Are you in any of these channels where people are talking about the potential for a NWN 3? I hear rumors, but nothing more. I only have so many wishes left.



#31
OldMansBeard
  • Members
  • 152 posts

Rolo Kipp wrote: "... While the runtime option isn't available to us, an automated utility for examining an area or module that could ultimately pull all the static meshes and textures (tiles *and* placeables) to produce optimized single-use tilesets... that's intriguing :-) If we could still design and *build* with modular bits and pieces and then have a utility go through and optimize/compile as much as practical (starting only with tiles/groups/sets)... that could be amazing :-) ..."

<thinks>

 

That could be done.

 

Read in the .git file, along with the .set and all the tile and placeable models, and assemble a 'model' of the area. Then relink the visible meshes from all the tiles onto a central one, leaving all the other tiles with just lights and walkmeshes; optimise the central tile and export the resulting tiles. I could see that working.

 

That way, it would be possible to have a tileset where most of the tiles were just lights+walkmesh with no visible geometry (reusable across areas) and a selection of central tiles each with the whole geometry for an entire area. As many tiles in the one tileset as you want special areas.
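The relinking step is mostly a change of coordinates. Below is a minimal Python sketch that rests on several assumptions:

- Tiles are 10 m squares with their model origin at the tile centre.
- Per-tile orientation is stored as counter-clockwise quarter turns.
- Tile height is ignored.
- The input has already been parsed into a hypothetical dictionary. No real .git/.set/.mdl parsing is shown.

It handles only static visible geometry. Walkmeshes and lights stay on their own tiles, and animated meshes would be left where they are.

```python
from collections import defaultdict

TILE_SIZE = 10.0  # assumption: 10 m tiles, model origin at the tile centre

def rotate_quarter(x, y, quarter_turns):
    """Rotate (x, y) counter-clockwise by 90 degrees * quarter_turns."""
    for _ in range(quarter_turns % 4):
        x, y = -y, x
    return x, y

def relink(tiles, centre):
    """Move every static mesh vertex into the centre tile's space, by texture.

    tiles: {(col, row): (quarter_turns, [(texture, [(x, y, z), ...]), ...])}
    centre: (col, row) of the tile that receives all the visible geometry.
    Returns {texture: [(x, y, z), ...]} in the centre tile's local space,
    i.e. one mesh per texture once the vertices are joined up.
    """
    merged = defaultdict(list)
    for (col, row), (turns, meshes) in tiles.items():
        # Offset from the centre tile's origin to this tile's origin.
        dx = (col - centre[0]) * TILE_SIZE
        dy = (row - centre[1]) * TILE_SIZE
        for texture, verts in meshes:
            for x, y, z in verts:
                rx, ry = rotate_quarter(x, y, turns)  # apply tile orientation
                merged[texture].append((rx + dx, ry + dy, z))
    return dict(merged)
```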

 

</thinks>

 

<afterthinks>

 

Animated bits of tile would have to be left where they were, but that's easy. You just don't move them.

 

</afterthinks>


  • Rolo Kipp and MerricksDad like this

#32
Rolo Kipp
  • Members
  • 2,791 posts

<looking at the stars...>

 

MerricksDad wrote: "Are you in any of these channels where people are talking about the potential for a NWN 3? I hear rumors, but nothing more. I only have so many wishes left."

No idea what you're talking about... *cough* (everything has to start somewhere, and the tools come first)

irc.nwn2source.net/#nwvault

 

<...and whistling idly>



#33
OldMansBeard
  • Members
  • 152 posts

I see three problems with the "supertiles" idea, if I can call it that: putting all the static meshes for a whole area onto one central tile and consolidating them by bitmap to reduce the overall number of meshes in the area.

  • Mipmapping
  • Fog
  • Tilefade

Mipmapping. Imagine a 15x15 area of plain cobblestones, like you get when you create a new area in the toolset. Imagine optimising that, so the whole 150x150 ground plane is a single mesh attached to the central tile and bitmapped all over with cobblestones.dds. Now position the camera in one corner of the area, looking towards the centre. The pivot of the mesh is a long way off (around 100m), so the engine will choose a lo-res mipmap of cobblestones to render the mesh. The whole mesh. Including the place you are standing on. You won't see hi-res cobblestones unless you walk over and stand bang on the central tile. Not good.

 

Fog. Somewhat similar. Same scenario. The centre tile is a long way off, so if the fog setting is below about 100, none of the ground plane will be rendered at all. Including the place you are standing on. Then the whole area will suddenly flip into visibility as you walk towards the centre. Not good.

 

Tilefade. If all the tilefading meshes are joined up, the whole upper story over the entire area will flip in or out as you walk past the central tile.

 

This needs thinking through, and possibly testing out.


  • Zwerkules, Shadooow, OldTimeRadio and 1 other like this

#34
OldTimeRadio
  • Members
  • 1,400 posts

@Everyone - I love the ideas being tossed around here.  So many wonderful ideas!  gDEBugger is free and the links in my original post should help you get up and running relatively quickly.  Whatever your area of interest is, gDEBugger is incredibly useful.  Even if you're only a builder and not a CC maker.  Download it, fire it up.  Give it a shot!

 

@OldMansBeard -  Good points.  I carried out some tests with a static placeable.  At least in that capacity, I find the results encouraging.


OldMansBeard wrote: "Mipmapping. Imagine a 15x15 area of plain cobblestones, like you get when you create a new area in the toolset. Imagine optimising that, so the whole 150x150 ground plane is a single mesh attached to the central tile and bitmapped all over with cobblestones.dds. Now position the camera in one corner of the area, looking towards the centre. The pivot of the mesh is a long way off (around 100m), so the engine will choose a lo-res mipmap of cobblestones to render the mesh. The whole mesh. Including the place you are standing on. You won't see hi-res cobblestones unless you walk over and stand bang on the central tile. Not good."

 

15x15 static placeable, mapped with tcn01_cobb03. GMax file for placeable here. Placed in the middle of a 20x20 area (and set to static), I get no mip-mapping when looking toward the center/pivot/root node from a corner. How do I explain this? Well, I suppose one hypothesis could be that mipmapping is based not on distance from the center/pivot/root node but on distance from the bounding box of the model itself. A quick look over my exported ASCII model didn't seem to indicate bounding box dimensions, but they are part of the binarized model header format, so I assume they're present in all compiled models. Just to make sure I wasn't fooling myself in some way, I did test with a texture-appropriate .TXI containing the lines "downsamplemax 2" and "downsamplemin 2" in order to force mipmap downsampling. Here it is without anything, and here with the forced TXI downsampling. So, at least in the way I devised to "check my work", my work seems to check out.
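OTR's hypothesis is easy to quantify. The minimal Python sketch below assumes a 150 m x 150 m placeable lying flat with its pivot at the centre; the coordinates are made up for illustration. From a camera near one corner, the pivot is over 100 m away, but the bounding box is only as far away as the camera's height above it. Whether the engine uses either measure is exactly what the test above probes; the sketch only shows how far apart the two numbers can be.

```python
import math

def dist_to_pivot(cam, pivot):
    """Straight-line distance from the camera to the model's pivot point."""
    return math.dist(cam, pivot)

def dist_to_aabb(cam, box_min, box_max):
    """Distance from a point to an axis-aligned bounding box (0 if inside)."""
    # Clamp each coordinate into the box to find the box's closest point.
    closest = [min(max(c, lo), hi) for c, lo, hi in zip(cam, box_min, box_max)]
    return math.dist(cam, closest)

cam = (-74.0, -74.0, 10.0)  # hypothetical camera near one corner, 10 m up
print(round(dist_to_pivot(cam, (0.0, 0.0, 0.0)), 1))              # 105.1
print(dist_to_aabb(cam, (-75.0, -75.0, 0.0), (75.0, 75.0, 0.0)))  # 10.0
```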

 


OldMansBeard wrote: "Fog. Somewhat similar. Same scenario. The centre tile is a long way off, so if the fog setting is below about 100, none of the ground plane will be rendered at all. Including the place you are standing on. Then the whole area will suddenly flip into visibility as you walk towards the centre. Not good."

 

This is definitely the case in non-static objects, but not static ones.  Static meshes are loaded at level load and are never unloaded. Thankfully.  Some of my better tricks exploit this aspect of the engine.

 


OldMansBeard wrote: "Tilefade. If all the tilefading meshes are joined up, the whole upper story over the entire area will flip in or out as you walk past the central tile."

I wasn't able to test this but based on my limited knowledge I would say a situation like that would be working "as designed" because, technically, a large contiguous mesh would trigger the code that thinks the camera is being blocked by it.  Can multiple (separate) meshes be connected to a tile and will a tilefade "event" trigger them all, or only those whose (whatever, bounding box?) triggers it?  Do you recall?

 

Here is a link to download my testing materials for anyone who cares to play with it, themselves.  Module goes in modules folder, the other two files go into the override folder.  In order to invoke the forced mipmapping downsample, just remove "removeme" from the extension of the .TXI file.


  • Rolo Kipp and MerricksDad like this

#35
Shadooow
  • Members
  • 4,468 posts

OldTimeRadio wrote: "I wasn't able to test this but based on my limited knowledge I would say a situation like that would be working "as designed" because, technically, a large contiguous mesh would trigger the code that thinks the camera is being blocked by it. Can multiple (separate) meshes be connected to a tile and will a tilefade "event" trigger them all, or only those whose (whatever, bounding box?) triggers it? Do you recall?"

Not sure whether I understand the question, but: whether the tilefade feature kicks in or not is based on the camera position and tiles. Every mesh with the fade option on a tile (even one that is 20 meters off the tile) will fade whenever you move the camera next to that tile.
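Below is a toy Python model of the rule as Shadooow describes it. It is not the engine's code, and the one-tile neighbourhood `radius` is an assumption.

- The fade decision is made per tile, from the camera's tile position.
- Every fade-flagged mesh attached to a triggered tile fades, wherever its geometry actually sits.

Under that rule, moving all the roofs onto one supertile makes them fade together.

```python
def faded_meshes(tiles, camera_tile, radius=1):
    """Names of fade-flagged meshes that fade for this camera position.

    tiles: {(col, row): [(mesh_name, fade_flag), ...]}
    A tile triggers if it is within `radius` tiles of the camera's tile
    (hypothetical neighbourhood); the meshes' own positions are never checked.
    """
    cc, cr = camera_tile
    faded = []
    for (col, row), meshes in tiles.items():
        if max(abs(col - cc), abs(row - cr)) <= radius:
            faded.extend(name for name, fade in meshes if fade)
    return faded

normal = {(0, 0): [("roof_a", True)], (5, 5): [("roof_b", True)]}
supertile = {(7, 7): [("roof_a", True), ("roof_b", True), ("ground", False)]}
print(faded_meshes(normal, (0, 0)))     # ['roof_a']
print(faded_meshes(supertile, (0, 0)))  # []
print(faded_meshes(supertile, (7, 6)))  # ['roof_a', 'roof_b']
```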

 

Still, this is a problem only for auto-fade, which is what I use when playing; it's a suitable option for both immersion and a decent view when using the top-down camera mode. Since the enforced fading (not camera-based) will still work, it's not *that* big an issue.


  • OldTimeRadio and WhiteTiger like this

#36
OldMansBeard
  • Members
  • 152 posts

@OTR

 

That's quite reassuring. I think it's time to bite the bullet and make some "supertiles" for testing.

 

I'm going to install NWN and 3dsMax again, so I can try some stuff out.


  • Estelindis, Tarot Redhand, OldTimeRadio and 2 others like this

#37
Lord Sullivan
  • Members
  • 559 posts

This optimization is a good way to reduce the calls; however, it would be even more significant with a texture atlas. I can see that being a mapping issue if, say, you patch four different 512x512 textures together into one 1024x1024 texture. The issue is mostly the mapping work needed for that kind of texture, unless there is a way in the material editor to pinpoint and use only a specific area of the texture that I don't know about.
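The remapping itself is a simple affine transform of each mesh's UVs. The minimal Python sketch below assumes four 512x512 textures packed into a 2x2 grid in a 1024x1024 atlas. The slot layout and the optional gutter width (`pad_px`) are illustrative choices, not a toolset feature.

```python
def atlas_uv(u, v, slot_col, slot_row, grid=2, atlas_px=1024, pad_px=0):
    """Map a UV in [0, 1] on one source texture into its slot of the atlas.

    pad_px shrinks the usable area of each slot by a gutter of pad_px texels
    on every side, so filtering is less likely to sample the neighbour.
    """
    slot_px = atlas_px / grid                  # 512 for a 2x2 grid in 1024
    inner = (slot_px - 2 * pad_px) / atlas_px  # usable slot width, atlas UV units
    offset_u = (slot_col * slot_px + pad_px) / atlas_px
    offset_v = (slot_row * slot_px + pad_px) / atlas_px
    return offset_u + u * inner, offset_v + v * inner

print(atlas_uv(0.5, 0.5, 1, 0))           # (0.75, 0.25): centre of slot (1, 0)
print(atlas_uv(1.0, 1.0, 0, 0, pad_px=8))  # (0.4921875, 0.4921875)
```

This only works for UVs inside [0, 1]. A texture that tiles (UVs beyond 1) would spill into its neighbours' slots instead of wrapping.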


  • henesua likes this

#38
_six
  • Members
  • 919 posts

Mipmapping is usually handled on the GPU end of things at per-pixel level, based on the depth of each pixel from the camera. I can't see NWN doing anything particularly unusual there (at least it'd be unnecessarily complex for it to).
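The rough Python sketch below shows why per-pixel selection avoids the problem OldMansBeard feared. Real GPUs derive the level from UV derivatives across neighbouring pixels. This approximation instead assumes the following illustrative values:

- a camera-facing surface
- a 60-degree vertical field of view
- a 1080-pixel-high screen
- a 512-pixel texture repeating every 2.5 m

Under these assumptions, the level depends only on how far each pixel is from the camera, not on where the mesh's pivot is.

```python
import math

def mip_level(depth_m, tex_px=512, metres_per_repeat=2.5,
              fov_deg=60.0, screen_h_px=1080):
    """Approximate mip level for one pixel of a camera-facing surface.

    Level = log2(texels covered by one screen pixel), clamped to the mip chain.
    """
    # World-space height covered by one pixel at this depth.
    pixel_world = depth_m * 2 * math.tan(math.radians(fov_deg) / 2) / screen_h_px
    texels_per_pixel = pixel_world * tex_px / metres_per_repeat
    max_level = math.log2(tex_px)  # 9.0 for a 512 px texture
    return min(max(0.0, math.log2(texels_per_pixel)), max_level)
```

With these numbers, a pixel 2 m away samples level 0, while one 100 m away lands between levels 4 and 5, even if both pixels belong to the same giant mesh.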

 

 

Fog does not conceal geometry - it's all still rendered, though there is a maximum draw distance, which is unfortunately related to the fog distance. I think it's something like the fog cutoff distance squared.

 

 

Good point on the tilefade, though. Personally, I find automatic tilefade so unpredictable and distracting as to be useless and tend to either have it all off or on myself. *shrug*


  • OldTimeRadio likes this

#39
_six
  • Members
  • 919 posts

I'm wary of using atlasing for NWN tilesets too much. I mean, multiple things on a texture is great, but NWN has a habit of forcibly downsampling to save on texture memory, regardless of your machine's capabilities, and I've found large textures on multiple objects to slow the game down to a surprising extent for me. Plus, tilesets tend to involve tiling textures, which are something of a no-go for atlasing (unless it's carefully planned from the start - and even then, you'll need padding to work around mipmapping).

 

I reckon Diademus' spider caves are a great reference point for most of this stuff.


  • Estelindis likes this

#40
OldMansBeard
  • Members
  • 152 posts

Well, I've done a straightforward test to see the effect of moving all of the geometry of an area onto a central tile and consolidating the meshes.

 

It works fine, no problem with mipmapping or fog.

 

Sadly, the performance is marginally worse.

 

Here's what I did:

  • Take the tms microset and change the walkmesh material in the c01 tile (basic grass) from grass (3) to dirt (1) and turn off Grass in the .set
  • Compile it with the BW model compiler, put the .mdl and .wok in override.
  • Create a test module and build a 15x15 area entirely out of that tile.
  • Add an OnEnter script to the area to turn the PC invisible: SetCreatureAppearanceType(oPC,809).
  • Fire up FRAPS
  • Load up the test module in-game, stand in the middle of the area looking straight down at the ground.
  • Zoom right out to include widest possible area and note the fps
  • Zoom right in close to the ground and note the fps again
  • Make a copy of the modified c01 tile, call it x01 and delete the ground plane entirely
  • Make another copy of c01 called y01 with the ground plane duplicated and shifted 15x15 times
  • Merge all the meshes in y01 into one giant 150.00x150.00 mesh with 7200 polys
  • Compile x01 and y01 with the BW model compiler then copy the .mdls and .woks into override.
  • Add entries for x01 and y01 to the .set file
  • Paint a new 15x15 area in the test module with a y01 in the centre surrounded by 224 x01 tiles.
  • Check in-game that the new area looks and functions the same as the old one (it does)
  • Repeat the same fps test in the new area.

Results

'c' tiles (old style - separate meshes on each tile)

zoom out = 135 fps, zoom in = 180 fps

 

'x' & 'y' tiles (all geometry on the centre tile surrounded by invisible tiles)

zoom out = 132 fps, zoom in = 170 fps

 

The difference isn't great, but it's not in the direction we were hoping for.
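Because fps is a reciprocal measure, it can help to convert the numbers above to frame times, which add up linearly. The small Python sketch below uses only the measurements reported in this post.

```python
def frame_ms(fps):
    """Milliseconds spent per frame at a given frame rate."""
    return 1000.0 / fps

# (zoom out, zoom in) fps, as measured above
results = {"c tiles": (135, 180), "x+y tiles": (132, 170)}
for name, (zoom_out, zoom_in) in results.items():
    print(f"{name}: {frame_ms(zoom_out):.2f} ms / {frame_ms(zoom_in):.2f} ms")
# c tiles: 7.41 ms / 5.56 ms
# x+y tiles: 7.58 ms / 5.88 ms

delta_out = frame_ms(132) - frame_ms(135)  # extra cost per frame, zoomed out
delta_in = frame_ms(170) - frame_ms(180)   # extra cost per frame, zoomed in
```

Measured this way, the supertile costs roughly 0.17 ms more per frame zoomed out and roughly 0.33 ms more zoomed in.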

 

Any suggestions for further tests?


  • Estelindis and OldTimeRadio like this

#41
Rolo Kipp
  • Members
  • 2,791 posts

<standing right...>

 

Could you do a test of 3x3 metatiles (the NWN2 name for them)?

Same set up, so 5x5 of 3x3 batched 1x1 tiles (I just wanted to type that :-)

 

Now I'm wondering why the slowdown happens, and whether it might be the single huge chunk vs. smaller, but still batched, chunks that might be sent opportunistically down the pipeline...

 

<...in the middle of the road>


  • OldTimeRadio and MerricksDad like this

#42
OldMansBeard
  • Members
  • 152 posts

Yes, I'll have a go at that

 


Yes, that should be easy enough to try.



#43
Rolo Kipp
  • Members
  • 2,791 posts

<dragging his monkey...>

 

One of the things I've noticed in the discussions of batching in jMonkeyEngine is there seems to be a sweet spot in batch size (they're calling them chunks) that is driving how they are setting up terrain and vegetation paging. Too many small chunks or too few giant chunks both seem to bog things down.

 

<...through the nights again>



#44
OldMansBeard
  • Members
  • 152 posts

Interesting.

 

15x15 of 1x1 (vanilla) : 135/180 fps

5x5 of 3x3 chunks      : 135/180 fps

Big chunk + 224 blanks : 132/170 fps

 

So medium chunking is neutral for performance, neither win nor lose. Big chunking appears to be bad. Does that make any sense?


  • Zarathustra217 likes this

#45
Rolo Kipp
  • Members
  • 2,791 posts

<pulling out...>

 

Only if the big chunk causes some contention for the bus that the smaller chunks don't.

 

I don't really think we are seeing an issue with # of verts or textures, yet. This so far sounds like something to do with having a huge rendering range for a single mesh. (theorizing from the seat of my seat, of course ;-P )

 

Another test, if you're willing: add a simple sphere with a different texture to the center of each tile, then batch that mesh as well:

  1. The big chunk with a single 15x15 sphere (submesh elements) and a single big ground plane
  2. The medium chunks - 5x5 3x3 metatiles
  3. The control 15x15 single tiles

 

<...the plunger>



#46
OldMansBeard
  • Members
  • 152 posts

Okay, but that will have to wait while I install 3dsMax. I've been just doing stuff in notepad today.

 

Something I might try is using more complex tiles - perhaps streets of identical houses from tcn - and going up in chunk size 1x1, 2x2, 4x4, 8x8, 16x16 in a 16x16 area.


  • cervantes35, Rolo Kipp and MerricksDad like this

#47
OldMansBeard
  • Members
  • 152 posts

I've taken the Barracks 2x2 group out of tcn and filled a 16x16 area with it.

 

Comparing:

(1) With the four individual tiles in the group, just with shadows & tilefade turned off, meshes compacted by bitmap and compiled.

(2) After moving all the trimeshes onto one of the tiles & compacting it by bitmap again, leaving the other three tiles with just walkmesh & lights.

 

The number of trimesh nodes on the four tiles was 72 in case (1), reducing to 22 in case (2). In other words, the group as a whole uses 22 distinct bitmaps.

 

Results: no difference. 112/180 fps in both cases.

 

So 2x2 chunking a moderately complex tile group is neutral with regards to performance.


  • Estelindis and OldTimeRadio like this

#48
OldMansBeard
  • Members
  • 152 posts

A thought - suppose that what matters is the combined polycount of all the distinct meshes that intersect the field of view of the camera.

 

Up to a point, chunking would make no difference; the same polys would be rendered regardless of which meshes they are part of. But if you make the chunks too big, the engine will be pipelining polys that are well out of the field of view, just because they are part of the same mesh that is partly in view. If there is a poly in view that is rendered with a particular bitmap, all the polys with that bitmap in the whole area would get rendered, even the ones that are completely out of view. So it would get worse.

 

With the Barracks group, the field of view looking straight down at widest zoom is about one group. If the theory is right, I would expect to see progressive degradation as I successively double the chunk sizes of the barracks tiles.
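OldMansBeard's theory can be turned into a toy model. The Python sketch below rests on these hypothetical assumptions:

- The engine culls whole meshes by their bounds.
- Every k x k chunk carries one mesh per texture.
- Each chunk uses 22 textures (borrowed from the Barracks count) and each tile has 500 polys, both made-up figures.

It counts the draw calls and polys submitted for a view about one 2x2 group wide.

```python
def chunks_touching(lo, hi, k):
    """Indices of size-k chunks overlapping the half-open tile range [lo, hi)."""
    return range(lo // k, (hi - 1) // k + 1)

def submitted(view, k, textures=22, polys_per_tile=500):
    """(draw calls, polys) sent for a view window under mesh-level culling.

    view: ((col_lo, col_hi), (row_lo, row_hi)) half-open tile ranges.
    Every chunk touching the view is drawn whole: one call per texture,
    and all k * k tiles' worth of polys.
    """
    (c0, c1), (r0, r1) = view
    n = len(chunks_touching(c0, c1, k)) * len(chunks_touching(r0, r1, k))
    return n * textures, n * k * k * polys_per_tile

view = ((4, 6), (4, 6))  # roughly one 2x2 group in view at widest zoom
for k in (1, 2, 4, 8, 16):
    print(k, submitted(view, k))
```

With these made-up numbers, going from 1x1 to 2x2 chunks cuts draw calls from 88 to 22 at the same 2,000 polys. Each further doubling leaves the call count at 22 and multiplies the submitted polys by four. Whether NWN actually culls this way is exactly what the doubling test would show.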


  • OldTimeRadio likes this

#49
MerricksDad
  • Members
  • 1,608 posts

I'm still going with the fact that there is a huge fundamental difference between how the aurora engine renders stuff and how other engines, including the toolset, render stuff. But I totally agree with all the testing you guys are doing.


  • OldTimeRadio likes this

#50
OldMansBeard
  • Members
  • 152 posts


 

Next measurement, following on from the 2x2 Barracks test: 4x4 chunks, 101/180 fps. Performance is starting to drop off.


  • Estelindis likes this