shadernodes.tcl
===============
13-Dec-00   floh    created

Explains how to use the new shader nodes for multitexturing effects, how to
use them EFFICIENTLY, and what's going on under the hood.


=====================================================================
What is it?
=====================================================================
nShaderNode objects contain all render states that describe the
appearance of the surface of a 3d object, or generally, all render states
which have to be set before the geometry of a 3d object can be rendered.
Those render states can be categorized into the following blocks:

- the operations on a pixel as it travels through the multitexture pipeline

- the configuration of the virtual texture units (describing how exactly
  a texel should be generated by a texture read operation), this includes
  manipulation/generation of texture coordinates
  
- attributes for dynamic lighting and fogging

- the alpha blending operations, which describe how the pixel coming
  out of the multitexture pipeline should be combined with the contents
  of the frame buffer

=====================================================================
Some implementation details:
=====================================================================
- Up to 4 texture layers can be configured (at the moment). Nebula
  tries to figure out how many render passes are required to render
  a given shadernode and transparently uses multipass rendering if
  necessary.

- Render states which are unlikely to change frequently are compiled
  into display lists (OpenGL) or state blocks (d3d7). There's a dirty
  detection mechanism which causes a recompile if one of the "static"
  attributes changes.

- There's a "current state block" tracking mechanism which prevents
  redundant render state switches. The scene graph object assists
  by sorting render nodes by shader node.

- nshadernodes have replaced nmatnodes in Nebula.  Any usage of nmatnode
  in your own code must be converted to use nshadernode.  This includes
  any corresponding usage of ntexnode, as ntexarraynode is required for
  use with the nshadernode.
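
The dirty detection for "static" render states can be pictured with a
small sketch. Python is used here purely for illustration. The class
and method names (StaticStateCache, set_state, begin_render) are
hypothetical and are not Nebula's actual C++ interface; the tuple stands
in for a display list or state block.

```python
class StaticStateCache:
    """Hypothetical model: recompile static render states only when dirty."""

    def __init__(self):
        self.states = {}        # static render states, name -> value
        self.dirty = True       # nothing has been compiled yet
        self.compiled = None    # stands in for a display list / state block
        self.compile_count = 0  # how often a (re)compile happened

    def set_state(self, name, value):
        # Only an actual change of value marks the cache dirty.
        if self.states.get(name) != value:
            self.states[name] = value
            self.dirty = True

    def begin_render(self):
        # Recompile lazily, right before the states are needed.
        if self.dirty:
            self.compiled = tuple(sorted(self.states.items()))
            self.compile_count += 1
            self.dirty = False
        return self.compiled


cache = StaticStateCache()
cache.set_state("zwrite", True)
cache.begin_render()              # first use: compiles
cache.begin_render()              # unchanged: reuses the compiled block
cache.set_state("zwrite", True)   # same value: stays clean
cache.begin_render()
cache.set_state("zwrite", False)  # real change: dirty again
cache.begin_render()
print(cache.compile_count)        # prints 2
```

In this model, setting an attribute to the value it already has costs
nothing; only a real change triggers a recompile on the next render.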

=====================================================================
Related Classes
=====================================================================

nShaderNode     - this is the "frontend" class which is linked into an
                  nvisnode hierarchy; it replaces the old nmatnode class
                  (script interface name is nshadernode)

nPixelShader    - this is the low level abstract base class which is the
                  superclass of the 3d api specific classes nD3D8PixelShader
                  and nGlPixelShader

nD3D8PixelShader   - Direct3D8 optimized pixel shader class

nGlPixelShader     - OpenGL 1.1 optimized pixel shader class (requires
                     extensions to work completely)


=====================================================================
Multitexturing and Pixel Operations
=====================================================================

nShaderNode lets you define a sequence of operations which a pixel
must pass through when traveling from the colored fragment towards the frame
buffer. This provides a very intuitive and straightforward way to configure
the multitexture pipeline.

The first thing you have to tell the shadernode is how many stages you
require:

.setnumstages [0..4]

This defines both the number of pixel operations in the sequence and the
number of virtual texture units that need to be configured.

The actual pixel operations are defined by the script commands:

.setcolorop 'stage' 'op [arg1] [arg2] [arg3]'
.setalphaop 'stage' 'op [arg1] [arg2] [arg3]'

It is possible to define independent operations for the rgb and alpha
components.  The stage is a number from 0 to 3 and defines the order of
the operations as well as the number of the virtual texture unit which
this operation will use for texture feeds.

The following operations are defined; each of them takes from one to
three arguments:

replace arg1        -> set the result to arg1 (res = arg1)

mul arg1 arg2       -> multiply (modulate) arg1 and arg2 (res = arg1 * arg2)

adds arg1 arg2      -> do a signed add (res = arg1 + arg2 - 0.5)

add arg1 arg2       -> do an unsigned add (res = arg1 + arg2)

ipol arg1 arg2 arg3 -> interpolate between arg1 and arg2 with arg3 (res = arg1*arg3 + arg2*(1-arg3))

dot arg1 arg2       -> calculate a dot3 product
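
The formulas above can be written out as a minimal sketch, one color
channel at a time, with values in the range 0..1. Python is used for
illustration only. Clamping the result to 0..1 is an assumption about
typical fixed-function hardware, not something stated here, and 'dot'
is left out because its exact formula is not given in this document.

```python
def clamp(x):
    """Clamp a channel value to the range 0..1 (assumed hardware behaviour)."""
    return max(0.0, min(1.0, x))


def op_replace(arg1):
    return clamp(arg1)                      # res = arg1


def op_mul(arg1, arg2):
    return clamp(arg1 * arg2)               # res = arg1 * arg2


def op_adds(arg1, arg2):
    return clamp(arg1 + arg2 - 0.5)         # res = arg1 + arg2 - 0.5


def op_add(arg1, arg2):
    return clamp(arg1 + arg2)               # res = arg1 + arg2


def op_ipol(arg1, arg2, arg3):
    return clamp(arg1 * arg3 + arg2 * (1.0 - arg3))


print(op_mul(0.5, 0.5))         # 0.25
print(op_adds(0.5, 0.5))        # 0.5
print(op_add(0.75, 0.5))        # 1.0 (1.25 before clamping)
print(op_ipol(1.0, 0.0, 0.25))  # 0.25
```

Note how 'adds' with two mid-gray inputs leaves the value unchanged,
while 'add' can only brighten.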


The following optional postfixes may be appended to the opcodes to manipulate
the final result:

.2  -> multiply the result by 2
.4  -> multiply the result by 4

Please note that those postfixes are not supported by all host systems.


These are the valid keywords for arg1:

tex      - the result of the texture read operation of the current texture unit
           (in short: the current texel)

prim     - the untextured "base pixel" from the lighting equation



These are the valid keywords for arg2:

prev        - the result of the previous operation

const       - the rgba constant color as defined by the .setconst command

prim        - the untextured "base pixel" from the lighting equation


These are the valid keywords for arg3:

const.a     - the alpha component of the constant color

tex.a       - the alpha component of the current texel


Argument strings can be prefixed by a '-' (minus sign) to invert the argument
before it goes into the operation.

Argument strings can be postfixed by a '.c' or '.a' to explicitly select the
rgb component (.c) or alpha component (.a) of the pixel. The default is '.c'
for .setcolorop and '.a' for .setalphaop.

If there is no alpha operation defined, the color op of the same stage will be
duplicated for the alpha component.
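
To make the operation string syntax concrete, here is a minimal parsing
sketch in Python. It is an illustration only: parse_op and parse_arg
are hypothetical helpers, not part of Nebula, and the sketch does not
check which keywords are valid in which argument position.

```python
def parse_arg(text, default_component):
    """Split one argument into (inverted, keyword, component)."""
    inverted = text.startswith("-")       # leading '-' inverts the argument
    if inverted:
        text = text[1:]
    keyword, dot, component = text.partition(".")
    if not dot:                           # no '.c' / '.a' postfix given
        component = default_component
    return inverted, keyword, component


def parse_op(op_string, default_component="c"):
    """Split e.g. 'mul.2 tex -prev.a' into opcode, scale and arguments."""
    opcode_text, *arg_texts = op_string.split()
    opcode, dot, postfix = opcode_text.partition(".")
    scale = int(postfix) if dot else 1    # '.2' / '.4' result multiplier
    args = [parse_arg(a, default_component) for a in arg_texts]
    return opcode, scale, args


# Color op: components default to 'c'.
print(parse_op("mul.2 tex -prev.a"))
# Alpha op: components default to 'a'.
print(parse_op("ipol tex prev const.a", default_component="a"))
```

The default_component parameter mirrors the rule above: 'c' for
.setcolorop and 'a' for .setalphaop.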

Please note that not all combinations are supported by all host systems
(Nebula tries its best to hide the limitations though).

For a few example operations, look at the shadernode.tcl script in the
nebula/data/tekdemos directory.


=====================================================================
Compatibility note
=====================================================================
Shader nodes generally have fewer compatibility problems under Direct3D7
than under OpenGL, since D3D7 has a richer base functionality when it
comes to multitexturing and overall driver support under D3D is better
than under OpenGL.

Under OpenGL the situation is quite simple:

nVidia and ATI      -> good
Matrox and 3dfx     -> bad

nshadernode makes use of EXT_texture_env_combine, which only nVidia and
ATI are offering in their OpenGL drivers. If this extension is not
supported, multitexturing is still used but the missing operations can
only be emulated.

Ironically, Matrox cards (at least the G400) work perfectly well under
D3D, so hardware-wise they should be able to support EXT_texture_env_combine.

3dfx is disappointing under both D3D and OpenGL: their OpenGL driver is
still a joke after all those years, and at least the Voodoo3 seems to lack
several important texture blending modes.


=====================================================================
Virtual texture units
=====================================================================

Each pixel operation stage of a shadernode has a 'virtual texture unit'
assigned, which defines how a texture read operation will actually be
performed. It's 'virtual' because there may be more texture units
defined in the shadernode (up to 4) than are present in hardware
(usually 1 or 2), so that Nebula has to remap the virtual texture units
to actual hardware texture units.
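
As a rough illustration of this remapping, the sketch below groups the
virtual texture units of a shadernode into render passes, given the
number of hardware units. It is written in Python for illustration and
is a simplification: split_into_passes is a hypothetical helper, and
Nebula's real pass splitting may differ, since not every operation can
be carried from one pass to the next.

```python
def split_into_passes(num_stages, hw_units):
    """Group virtual texture units 0..num_stages-1 into render passes."""
    if hw_units < 1:
        raise ValueError("need at least one hardware texture unit")
    stages = list(range(num_stages))
    # Each pass takes as many consecutive stages as the hardware has units.
    return [stages[i:i + hw_units] for i in range(0, num_stages, hw_units)]


print(split_into_passes(4, 2))  # [[0, 1], [2, 3]]
print(split_into_passes(3, 2))  # [[0, 1], [2]]
print(split_into_passes(2, 4))  # [[0, 1]]
```

Under this model a 4-stage shadernode needs two passes on 2-unit
hardware and four passes on single-unit hardware.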

Most render states a texture unit holds are known from nmatnode; what is
new is the access to the texture matrix.

Here are the commands which configure a texture unit:

.begintunit
.setaddress
.setminmagfilter
.settexcoordsrc
.setconst
.setenabletransform
.txyz
.rxyz
.sxyz
.endtunit

See the script documentation of nshadernode for details.

Please note that there is also a set of functions to manipulate the
translation, rotation, scaling of texture units directly, which is more
friendly for the nipol/nmixer classes:

.txyz0..3
.rxyz0..3
.sxyz0..3

Attaching interpolators to shadernode objects can provide very cool
animated texturing effects (esp. if combined with multitexturing).

If the shader node uses at least one texture, you have to provide an
ntexarraynode object as well, which defines the required texture images.
See the shadernode.tcl script in the tekdemos directory for details.


=====================================================================
Lighting and fogging parameters
=====================================================================

The following commands define the lighting parameters for the
shadernode. Lighting happens before the pixel operation sequence
is entered, while fogging happens after the pixel operation sequence
is left and before the pixel is written to the frame buffer.

.setlightenable
.setdiffuse
.setemissive    (NOTE! name changed from nmatnode!)
.setambient
.setfogenable

Again, see the script docs for nshadernode for details.


=====================================================================
Alpha blending
=====================================================================

Alpha blending defines the "final" operation of how the resulting 
lighted, multitextured, fogged pixel is written to the frame buffer.

If you want a straight write, just disable alpha blending:

.setalphaenable false

This is the default.

For alpha blending to work, enable alpha blending, and define
the blending operation:

.setalphaenable true
.setalphablend src_blend dst_blend

See the script docs for details.
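
The general form of a blending operation is: result = source pixel *
source factor + frame buffer pixel * destination factor. The Python
sketch below illustrates this for a single color channel. The factor
names used here ("zero", "one", "srcalpha", "invsrcalpha") are
placeholders chosen for the example; the names actually accepted by
.setalphablend are listed in the script docs.

```python
def blend_factor(name, src_alpha):
    """Look up a blend factor by its (placeholder) name."""
    factors = {
        "zero": 0.0,
        "one": 1.0,
        "srcalpha": src_alpha,
        "invsrcalpha": 1.0 - src_alpha,
    }
    return factors[name]


def alpha_blend(src, dst, src_alpha, src_blend, dst_blend):
    """Combine one channel of the incoming pixel with the frame buffer."""
    sf = blend_factor(src_blend, src_alpha)
    df = blend_factor(dst_blend, src_alpha)
    return min(1.0, src * sf + dst * df)   # clamp to the displayable range


# Classic transparency: a 25% opaque white pixel over black.
print(alpha_blend(1.0, 0.0, 0.25, "srcalpha", "invsrcalpha"))  # 0.25
# Additive blending saturates at 1.0.
print(alpha_blend(0.5, 0.5, 0.5, "one", "one"))                # 1.0
# "one"/"zero" behaves like a straight write.
print(alpha_blend(0.5, 0.25, 1.0, "one", "zero"))              # 0.5
```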


=====================================================================
Other render states
=====================================================================

There's a bunch of other render state commands in shadernode which
deal with some more obscure details of the rendering process:

.setzwriteenable    - turns zbuffer updates on/off for this object
.setzfunc           - defines the z-value accept/reject operation
.setcullmode        - controls backface culling
.setcolormaterial   - reroutes vertex colors into the lighting equation

Check the nshadernode script docs for details.


=====================================================================
 *** IMPORTANT OPTIMIZATION NOTE ***
=====================================================================

Nebula supports sharing of shader nodes. The more shadernodes that can
be shared in a scene, the fewer render state switches happen, which can
result in drastically improved performance, esp. with hardware t&l.

Shadernodes are shared by pointers. As a consequence, try to keep all your
shader nodes in a global library, and in the actual n3dnode hierarchy,
use nlinknodes which point to the required nshadernode. This will save
tons of runtime space and loading time as well (our Nomads dataset has
dropped from 14 MB to about 7 MB size when switching to a shadernode library,
that is without textures and other resources though).

The tekdemos 'fog.tcl' script shows an example. Note that there is actually
only one shadernode in the whole scene, while there are dozens of donuts.
Nebula is clever enough to only set the required render states once when
the first donut is rendered, each following donut will only set a new
modelview matrix but will not cause new pixelshader or texture related
render state switches. You can check the number of pixel shader switches
in a scene like this:

- in the tekdemos, go to the fogging example
- open the console and type:

> /sys/servers/console.watch gfx* [enter]

- close the console, some interesting statistics should appear

- Note that there are only 2 texture and pixelshader switches in the whole
  scene (in reality there's only one, the other is caused by flushing
  the 'current object' pointers at the beginning of a frame).
  Don't pay attention to the mesh switches line, it doesn't work yet :)
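
Why sharing and sorting help can be shown with a minimal model in
Python. It is an illustration only: count_shader_switches is a
hypothetical helper that counts how often the "current" shader changes
while a list of objects is drawn, comparing shaders by identity in the
same spirit as sharing by pointer. It does not model the per-frame
flush mentioned above.

```python
def count_shader_switches(draw_order):
    """Count how often the active shader changes while walking draw_order."""
    current = None
    switches = 0
    for shader in draw_order:
        if shader is not current:   # identity check, like a pointer compare
            switches += 1
            current = shader
    return switches


# 24 objects sharing one shader vs. 24 objects with private copies.
shared = object()
print(count_shader_switches([shared] * 24))                     # 1
print(count_shader_switches([object() for _ in range(24)]))     # 24

# Two shaders drawn interleaved vs. sorted by shader.
a, b = object(), object()
interleaved = [a, b, a, b]
print(count_shader_switches(interleaved))                       # 4
print(count_shader_switches(sorted(interleaved, key=id)))       # 2
```

Identical-looking but separate shader objects count as different
shaders in this model, which is why a shared library matters.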

=====================================================================
EOF
=====================================================================

