Perspective Corrected Texturing
[From Tom Hammersley's Graphics Coding Page
Check it out at http://www.users.globalnet.co.uk/~tomh ]
---------------------------------------------------------------------------
Introduction
Perspective texturing is an almost essential addition to any 3D engine
nowadays. People are sick of watching affine textures slip and slide all
over their polygons; they want something better, and perspective texturing
solves this problem. I'll be describing Chris Hecker-style perspective
texturing here, rather than the 3 "Magic Vectors" method.
---------------------------------------------------------------------------
Affine Texturing
Ordinary texturing is very simple to code. We specify u, v at each vertex
of our polygon (I'll be describing triangles here). We interpolate u and v
down the triangle edges, and then across scanlines. With triangles, the
delta for the scanline is constant, which speeds things up a lot. Then
it's just a case of interpolating texture co-ords across a scanline, and
sampling the texture map. Very simple, very fast. And very limited. If your
triangles get too big, or the Zs at the vertices differ too much, the
texture starts to slide. It looks horrible. We need perspective correction.
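
As a rough illustration of the scanline step described above, here is a
minimal C sketch. The function name, the 256x256 8-bit texture layout and
the parameter names are all assumptions made for the example, not part of
any particular engine.

```c
#include <stdint.h>

/* Assumed for this sketch: a 256x256, 8-bit texture stored row by row. */
#define TEX_SIZE 256

/* Affine-map one scanline from startx to endx inclusive.
   u0,v0 and u1,v1 are the texture co-ords at the two ends of the span;
   u and v are simply stepped linearly in screen space. */
void affine_span(uint8_t *dest, int startx, int endx,
                 float u0, float v0, float u1, float v1,
                 const uint8_t *texture)
{
    int width = endx - startx;
    float du = 0.0f, dv = 0.0f;

    if (width > 0) {
        du = (u1 - u0) / (float)width;   /* constant delta per pixel */
        dv = (v1 - v0) / (float)width;
    }

    float u = u0, v = v0;
    for (int x = startx; x <= endx; x++) {
        int iu = (int)u & (TEX_SIZE - 1);   /* wrap into the texture */
        int iv = (int)v & (TEX_SIZE - 1);
        dest[x] = texture[iv * TEX_SIZE + iu];
        u += du;
        v += dv;
    }
}
```

Note there is no divide inside the loop at all, which is where the speed
comes from, and also why the result is wrong whenever Z changes along the
span.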
---------------------------------------------------------------------------
Perspective Texturing
Probably the most common form of perspective texturing is done via a divide
by Z. It's a very simple algorithm. Instead of interpolating U and V, we
interpolate U/Z and V/Z. 1/Z is also interpolated. At each pixel, we take
our interpolated U/Z and V/Z, and divide them by the interpolated 1/Z. Hang
on, you're thinking - if we divide by Z and then by 1/Z, don't we get back
to where we started - like a double reciprocal? Well, sort of. 1/Z is
interpolated too, so the Z we divide out at each pixel is not the vertex Z
we divided by at the start. We then take the new U and V values, index into
our texture map, and plot the pixel. Pseudo-code might be:
    su = Screen-U = U/Z
    sv = Screen-V = V/Z
    sz = Screen-Z = 1/Z
    for x=startx to endx
        u = su / sz
        v = sv / sz
        PutPixel(x, y, texture[v][u])
        su += deltasu
        sv += deltasv
        sz += deltasz
    end
Very simple, and very slow.
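
To see that the two divides really don't cancel out, here is a small C
sketch that evaluates one texture co-ord part-way along a span both ways.
The function names and the parameter t (0 at the left end, 1 at the right)
are my own for illustration.

```c
/* Perspective-correct u at fraction t (0..1) along a span whose ends have
   texture co-ords u0, u1 and depths z0, z1 (both assumed > 0). */
float perspective_u(float u0, float z0, float u1, float z1, float t)
{
    float su0 = u0 / z0, su1 = u1 / z1;       /* U/Z at each end */
    float sz0 = 1.0f / z0, sz1 = 1.0f / z1;   /* 1/Z at each end */

    float su = su0 + t * (su1 - su0);         /* interpolated U/Z */
    float sz = sz0 + t * (sz1 - sz0);         /* interpolated 1/Z */

    return su / sz;                           /* the per-pixel divide */
}

/* Plain affine interpolation of the same co-ord, for comparison. */
float affine_u(float u0, float u1, float t)
{
    return u0 + t * (u1 - u0);
}
```

For a span running from u = 0 at z = 1 to u = 100 at z = 4, the halfway
point (t = 0.5) comes out as u = 20 with perspective_u, against u = 50 with
affine_u. That gap is exactly the "slide" you see with affine texturing.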
---------------------------------------------------------------------------
Speeding Up The Routine
The first thing that comes to mind when speeding up this routine is the two
divides - divides are a slow operation, and should be avoided. So, we'll
turn those 2 divides into a reciprocal and two multiplies:
    for x=startx to endx
        recip = 1.0 / sz
        u = su * recip
        v = sv * recip
        PutPixel(x, y, texture[v][u])
        su += deltasu
        sv += deltasv
        sz += deltasz
    end
This helps things a little. The second big way of speeding it up is to lerp
(linearly interpolate) between sets of 'correct' u, v. We calculate correct
u, v every n pixels, and interpolate between them. This cuts down on the
divides overall, but it can lead to problems: if your correction value (n)
is too high for your resolution, the texture will 'wiggle', because the
sample rate is too low, and you'll see all sorts of weird bendy patterns at
certain viewing angles. It takes a little time to find the best correction
level for a given resolution. Pseudo-code for this would look something
like:
    zinv = 1.0 / sz;                 // do the divide here
    width++;
    oddedge = width & cormask;       // test for case of raggy-edge
    zinv *= 65536.0;
    u = su * zinv;
    v = sv * zinv;                   // reciprocal then multiply
    RoundToInt(&logu1, u);
    RoundToInt(&logv1, v);
    sv += cordvdelta;                // cordvdelta etc are deltasv*correction
    su += cordudelta;
    sz += cordzdelta;
    zinv = (1.0 / sz) * 65536.0;     // muls by 65536 are used to do
                                     // u << 16 a little better
                                     // one fmul = 3 clocks
                                     // 2 shls = 2*2 clocks
    while(width > 0) {
        if(width >= correction)
            pixels = correction;
        else
            pixels = oddedge;        // we have a raggy edge
        width -= correction;         // even if edge is raggy loop will
                                     // still terminate due to this
        u = su * zinv;
        v = sv * zinv;
        RoundToInt(&logu2, u);
        RoundToInt(&logv2, v);
        luadd = (logu2 - logu1) >> corshift;   // deltas for linear
        lvadd = (logv2 - logv1) >> corshift;   // pass
        logu = logu1;                // 'logical' u and v
        logv = logv1;
        sv += cordvdelta;
        su += cordudelta;
        sz += cordzdelta;
        zinv = 1.0 / sz;             // again, do divide in parallel
        while(pixels--) {
            index = ((logv >> 8) & 0xFF00) +
                    ((logu >> 16) & 0xFF);
            PutPixel(x++, y, texture[index]);  // step x along the scanline
            logu += luadd;
            logv += lvadd;
        }
        zinv *= 65536.0;
        logu1 = logu2;
        logv1 = logv2;
    }
This is based on the loop I use. I use the idea of doing floating point
operations in parallel a lot here, because it means we can effectively get
them for free. However, it is often quite hard to persuade the compiler that
this is what you want to do; it'll take a little experimentation. Note also
that this loop doesn't have a separate if () {} statement to cover the case
of a 'raggy-edge', like most perspective texturers do. I see that in a lot
of code; this way is smaller, and easier to maintain and optimize.
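
For anyone who wants something they can compile, here is a simplified,
self-contained C sketch of the same span idea. It is not the loop above:
the names are hypothetical, it assumes a 256x256 8-bit texture and a
16-pixel correction span, it draws startx up to but not including endx, it
uses a plain truncating cast where the loop above uses RoundToInt, and it
makes no attempt to overlap the divide with the inner loop.

```c
#include <stdint.h>

#define SPAN_SHIFT 4                  /* assumed: correct every 16 pixels */
#define SPAN_LEN   (1 << SPAN_SHIFT)

/* su, sv, sz are U/Z, V/Z and 1/Z at startx; dsu, dsv, dsz are their
   per-pixel deltas. u and v are assumed to stay within 0..255, and sz is
   assumed to stay positive even one full span past the right edge. */
void perspective_span(uint8_t *dest, int startx, int endx,
                      float su, float sv, float sz,
                      float dsu, float dsv, float dsz,
                      const uint8_t *texture)
{
    int x = startx;
    int width = endx - startx;
    if (width <= 0)
        return;

    /* Exact u, v at the left end, as 16.16 fixed point. */
    float zinv = 65536.0f / sz;
    int32_t u1 = (int32_t)(su * zinv);
    int32_t v1 = (int32_t)(sv * zinv);

    while (width > 0) {
        int pixels = width < SPAN_LEN ? width : SPAN_LEN;  /* raggy edge */

        /* Exact u, v one full span further on: the only divide per span. */
        su += dsu * SPAN_LEN;
        sv += dsv * SPAN_LEN;
        sz += dsz * SPAN_LEN;
        zinv = 65536.0f / sz;
        int32_t u2 = (int32_t)(su * zinv);
        int32_t v2 = (int32_t)(sv * zinv);

        /* Linear deltas between the two exact samples. A divide by a
           power of two is used because >> on a negative int is
           implementation-defined in C. */
        int32_t du = (u2 - u1) / SPAN_LEN;
        int32_t dv = (v2 - v1) / SPAN_LEN;

        int32_t u = u1, v = v1;
        while (pixels--) {
            uint32_t index = (((uint32_t)v >> 8) & 0xFF00u) +
                             (((uint32_t)u >> 16) & 0xFFu);
            dest[x++] = texture[index];
            u += du;
            v += dv;
        }

        u1 = u2;
        v1 = v2;
        width -= SPAN_LEN;
    }
}
```

The texels at the start of each span are exact; everything in between is
affine, so SPAN_SHIFT is the knob to experiment with for your resolution.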
---------------------------------------------------------------------------
Other Considerations
A lot of people take a religious hatred to floating point calculations,
because they think they are slow. Well, they may have been slow in times
past, but now CPUs can do them very quickly indeed; they can be up to 29%
faster than integer maths on a 486DX, and 40% faster on a 586 (Intel). I
found these figures by doing 1,000,000 matrix muls, 1,000,000 *
(Add/Sub/Div/Mul). Note that conversion to integer, however, is still slow.
I know one person in particular was adamant that FPU was slower in his
tests. What was his test? Something like:
for(x=0;x<65536; x++)
array[x] = 1.0 * x;
What a stupid test! For a number of reasons:
1. It's not representative of the kind of work you'd do in a 3D engine.
2. Most compilers would optimize away a multiplication by 1.0.
3. Conversion to integer can be slow, especially in Watcom C 10.6 for
   DOS. Did you know that to convert to integer, it calls a function
   __CHP, which contains the following code:
   __CHP: push    eax
          fstcw   dword ptr [esp]
          wait
          push    dword ptr [esp]
          mov     byte ptr +1H[esp],1fH
          fldcw   dword ptr [esp]
          frndint
          fldcw   dword ptr +4H[esp]
          wait
          lea     esp,+8H[esp]
          ret
   It's amazing what can be done with WDISASM, and WLIB, and a little
   lateral thought...
FPU operations can also be done in parallel with the integer unit on Intel
chips. I don't think this can be done on Cyrix. That's not worth worrying
about. However much you might hate Intel's monopoly, Cyrix have no real
chance of ever breaking it. So you may as well optimize with an Intel chip
in mind.
The 1/z values can also be used for Z-buffering. Which is very handy. You
can then have perspective correct texturing, and perspective correct
Z-Buffering, at little speed cost. See my page on Z-Buffering for more
information on that.
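
As a rough idea of how the interpolated 1/z can double up as a depth value,
here is a minimal C sketch of a per-pixel test. The function name, buffer
layout and 8-bit frame buffer are assumptions for the example; it is not
taken from the Z-Buffering page.

```c
#include <stdint.h>

/* The buffer stores 1/z rather than z, so a LARGER value means NEARER.
   Clear the buffer to 0.0f (infinitely far away) before each frame.
   Returns 1 if the pixel was plotted, 0 if it was hidden. */
int zbuffer_plot(float *zbuf, uint8_t *frame, int offset,
                 float sz, uint8_t colour)
{
    if (sz > zbuf[offset]) {
        zbuf[offset] = sz;       /* remember the new nearest 1/z */
        frame[offset] = colour;
        return 1;
    }
    return 0;
}
```

Since sz is already being stepped across the scanline for the texturing,
the only extra work per pixel is the compare and the store.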
I also toyed with the idea of pre-perspective correcting textures. I heard
that in Quake, textures are lit in a separate pass to the texturing, due to
the lack of registers on the 586 (can Intel count higher than the number of
fingers they possess?). I wonder if it would be possible to do a similar
thing with perspective texturing? Theoretically, it shouldn't work, because
I think that the routine is dependent on the shape of the polygon being
mapped to. If anyone has any thoughts on this, I'd be very interested.
Another possible speed up would be to use an affine texturer where the
change in z is very little, and a perspective texturer where the change is
large. Hmm.. what would this look like in code?
    average-z = (z1 + z2 + z3) / 3
    zdiff = 0
    for n=1 to 3
        zdiff += (z(n) - average-z)**2
    end
    if zdiff < z-threshold**2
        Affinetexture(polygon)
    else
        PerspectiveTexture(polygon)
    end
Sound about right? Maybe I'll try this one day. The idea here is to find the
difference between the average Z and the Z of each vertex. The distance is
not square-rooted, just kept as a square, then compared against the squared
threshold. If too much Z change is present, then perspective is used. This
however may fail with large triangles.
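
The same test written out as a small C function, purely as an untested-in-
an-engine sketch. The function name and the z_threshold parameter are
hypothetical, and a sensible threshold would have to be found by
experiment.

```c
/* Returns 1 if the triangle should go to the perspective texturer,
   0 if the affine texturer should be good enough. */
int needs_perspective(float z1, float z2, float z3, float z_threshold)
{
    float average_z = (z1 + z2 + z3) / 3.0f;

    float d1 = z1 - average_z;
    float d2 = z2 - average_z;
    float d3 = z3 - average_z;

    /* Sum of squared differences; no square root is taken, so the
       comparison is made against the squared threshold. */
    float zdiff = d1 * d1 + d2 * d2 + d3 * d3;

    return zdiff >= z_threshold * z_threshold;
}
```

A caller would then pick Affinetexture(polygon) or
PerspectiveTexture(polygon) on the result, once per triangle.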
Tom Hammersley, tomh@globalnet.co.uk