tardy renderer minor 1

    Rasterization is one of the typical techniques for rendering 3D models. By following the tinyrenderer course, this project aims to explore and understand the rasterization pipeline in low-level 3D graphics libraries. Let’s start with a first basic implementation.

Important

All models (Wavefront .obj file format) are from the tinyrenderer course. See it here.

Topic of Lessons

Bresenham’s line drawing

1st attempt

2nd attempt, round 1

2nd attempt, round 2

3rd attempt

Note

All measurements in Release mode with optimizations enabled (averaged over 5 runs)

Here are the performance measurements on Windows.

PowerShell
cd ./x64/Release
Measure-Command { .\tardy_renderer_minor_1.exe }

4th attempt, round 0 (avg 4.46053706 sec)

4th attempt, round 1 (avg 3.24540542 sec)

4th attempt, round 2 (avg 4.3917408 sec)

4th attempt, round 3 (avg 3.91366674 sec)

Note

In the past, floating-point operations were significantly more expensive than integer ones (or even entirely inaccessible). This is why Jack Elton Bresenham developed his all-integer rasterization algorithm in the 1960s.

4th attempt, round 4 (avg 3.230245646 sec)

Thank you, Professor Bresenham. I see you.

Triangle rasterization

initial three triangles

Note

Since the three vertices need to be sorted, I adopted the insertion sort from C++ standard library’s introsort.

insertion sort (ascending order)

lower half border

upper half border

full border with base line

After sorting, the edges are colored as follows:

  • edge between the 1st and 2nd vertices: blue
  • edge between the 2nd and 3rd vertices: green
  • edge between the 3rd and 1st vertices: red

full border

Warning

While the top-left rule is standard practice for rasterization, it’s unnecessary here as the three triangles are non-overlapping.

lower half (top-left rule O)

lower half (top-left rule X)

upper half (top-left rule O)

upper half (top-left rule X)

filled triangles with baseline

filled triangles

AABB means Axis-Aligned Bounding Box, a type of bounding volume.

AABB rasterization

I think it is more efficient to use float instead of double. Because it provides sufficient precision for optimization with no visually perceivable loss in quality.

triangle rasterization (double)

triangle rasterization (float)

The output of scanline rendering and modern rasterization are NOT same. This discrepancy is driven by three technical factors:

  1. quantization errors from integer arithmetic,
  2. tie-breaking rules (like the Top-Left rule) for shared edges,
  3. and the shift from incremental to pixel-independent interpolation.

Ultimately, the modern rasterization approach provides superior precision and architectural consistency.

Barycentric coordinates

The second attempt code is a variation of the digital differential analyzer (DDA) algorithm.

t=xaxbxaxt = \frac{x - a_x}{b_x - a_x}

xx represents the sampling position. It is not the target being interpolated, but the input that drives interpolation. tt indicates how far along it is from the start point (axa_x) to the end point (bxb_x). If xx is at axa_x, then t=0t = 0. If xx is at bxb_x, then t=1t = 1.

y=(1t)ay+tbyy = (1 - t) \cdot a_y + t \cdot b_y

Based on the equation of a line, estimate where yy should be for the value of xx. To do this, apply the ratio tt computed earlier. This is called linear interpolation, also known as lerp.

1D barycentric coordinates example

The detailed discussion of the barycentric coordinate system will be covered separately.

2D barycentric coordinates example

Hidden faces removal

Each {Model} - add a simple trick corresponds to “Triangle rasterization with a simple trick” homework. The difference is in color:

  • solid (random) color := modern rasterization approach + back-face culling
  • gradient color := full-colored triangles by barycentric coordinates

At a glance, it is difficult to distinguish the first image (add a simple trick ver.) from the second one (painter's algorithm ver.). That’s why I visualized the comparison of two images. The (C, M, Y) colors indicate region where pixels do not exactly match, by computing pixel-wise difference between two images.

Note

The image comparison was implemented using OpenCV in a separate project, and I plan to extend it further.

I just wanted to paint all triangles in the rear-to-front order, as presented in the lecture. The volume rendering-like effect was made by reconstructing 96 frames into 4 seconds GIF with FFmpeg. In other words, it was encoded at 24 FPS to produce cinematic feeling.

Diablo - add a simple trick

Diablo - painter's algorithm

Diablo - visualization (pixel difference)

Diablo - visualization (rear-to-front order)

Boggie (body) - add a simple trick

Boggie (body) - painter's algorithm

Boggie (body) - visualization (pixel difference)

Boggie (body) - visualization (rear-to-front order)

African head (head) - add a simple trick

African head (head) - painter's algorithm

African head (head) - visualization (pixel difference)

African head (head) - visualization (rear-to-front order)

The initial, yet incomplete depth buffer appears. The depth buffer colors represent:

  • black: depth value close to 00 (far)
  • white: depth value close to 11 or 2bits12^{bits} - 1 (near)

This is because the depth buffer is initialized to 0, which corresponds to black.

Diablo - depth interpolation (Z-buffer)

Diablo - depth interpolation (Framebuffer)

Boggie (body) - depth interpolation (Z-buffer)

Boggie (body) - depth interpolation (Framebuffer)

African head (head) - depth interpolation (Z-buffer)

African head (head) - depth interpolation (Framebuffer)

Once updated to the nearest depth value per pixel, the result (Z-buffer) is finally complete. Compared to the previous incomplete depth buffer, grayscale gradation becomes even smoother. The volume rendering-like effect is made in a similar manner.

Warning

In Painter’s algorithm, the effect appears at the triangle level, whereas here it appears at the pixel level. This still differs from actual volume rendering, as all but a single depth value are discarded by the Z-buffer.

Diablo - per-pixel painter's algorithm (Z-buffer, front view)

Diablo - per-pixel painter's algorithm (Framebuffer, front view)

Diablo - per-pixel painter's algorithm (Z-buffer, left view)

Diablo - per-pixel painter's algorithm (Framebuffer, left view)

Diablo - per-pixel painter's algorithm (Z-buffer, top view)

Diablo - per-pixel painter's algorithm (Framebuffer, top view)

Diablo - visualization (rear-to-front order)

Boggie (body) - per-pixel painter's algorithm (Z-buffer, front view)

Boggie (body) - per-pixel painter's algorithm (Framebuffer, front view)

Boggie (body) - per-pixel painter's algorithm (Z-buffer, left view)

Boggie (body) - per-pixel painter's algorithm (Framebuffer, left view)

Boggie (body) - per-pixel painter's algorithm (Z-buffer, top view)

Boggie (body) - per-pixel painter's algorithm (Framebuffer, top view)

Boggie (body) - visualization (rear-to-front order)

African head (head) - per-pixel painter's algorithm (Z-buffer, front view)

African head (head) - per-pixel painter's algorithm (Framebuffer, front view)

African head (head) - per-pixel painter's algorithm (Z-buffer, left view)

African head (head) - per-pixel painter's algorithm (Framebuffer, left view)

African head (head) - per-pixel painter's algorithm (Z-buffer, top view)

African head (head) - per-pixel painter's algorithm (Framebuffer, top view)

African head (head) - visualization (rear-to-front order)

Naive camera handling

The rotation matrix that rotates vectors by an angle θ\theta (= 30°30\degree) about the yy-axis in three dimensions, following the right-hand rule, is given by:

Ry(θ)=[cosθ0sinθ010sinθ0cosθ] R_y(\theta) = \begin{bmatrix} \cos \theta & 0 & \sin \theta \\ 0 & 1 & 0 \\ -\sin \theta & 0 & \cos \theta \end{bmatrix}

There are two ways to apply this rotation.

  1. Keep the camera fixed and rotate the scene (including all models). ✅
  2. Keep the scene fixed and rotate the camera.

In this chapter, the first approach is used. These two methods are equivalent and differ only by an inverse transformation. So the result is indistinguishable to the viewer—much like in Einstein’s elevator experiment.

Diablo - rotating the camera (Z-buffer)

Diablo - rotating the camera (Framebuffer)

Boggie (full) - rotating the camera (Z-buffer)

Boggie (full) - rotating the camera (Framebuffer)

African head (full) - rotating the camera (Z-buffer)

African head (full) - rotating the camera (Framebuffer)

To express perspective—one of the key elements of visual realism—a simplified form of perspective projection is used. (Ref. 3D projection) This makes the fundamental geometry easier to understand. The camera is placed on the zz-axis at (0, 0, cc), and the projection plane is the xyxy-plane at z=0z = 0. This setup differs from the standard rendering pipeline used in OpenGL, where the camera is fixed at the origin and the projection is performed onto a near plane in front of it.

Important

In the standard rendering pipeline, projection is tightly coupled with clipping. In particular, it determines the last visible point along the viewing frustum by projecting from an out-of-view (invisible) point after all necessary transformations have been applied. As a result, the projection matrix must map the entire frustum into a canonical view volume, which leads to a significantly more complex matrix. (Ref. The Perspective Projection Matrix)

Diablo - central projection (Z-buffer)

Diablo - central projection (Framebuffer)

Boggie (full) - central projection (Z-buffer)

Boggie (full) - central projection (Framebuffer)

African head (full) - central projection (Z-buffer)

African head (full) - central projection (Framebuffer)

Better camera
Shading
More data!
Tangent space
Shadow mapping
Ambient occlusion
Bonus: toon shading

Homework

Bresenham’s line drawing (wireframe rendering)

Diablo

You can see the actual appearance in this video. (Ref. Diablo Ⅲ Cinematics)

Diablo - front view (Visual Studio Model editor)

Diablo - left view (Visual Studio Model editor)

Diablo - top view (Visual Studio Model editor)

The resolution in the Visual Studio Model editor is 500 x 500, while the others (my renderer) are 1000 x 1000.

Diablo - front view

Diablo - left view

Diablo - top view

Boggie - front view (body)

Boggie - front view (eyes)

Boggie - front view (head)

Boggie - front view (full)

African head - front view (head)

African head - front view (eye inner)

African head - front view (eye outer)

African head - front view (full)

Triangle rasterization with a simple trick

Note

All renders using random colors are shown from a fixed front view.

Diablo - straightforward rasterization

Boggie (body) - straightforward rasterization

Boggie (full) - straightforward rasterization

African head (head) - straightforward rasterization

African head (full) - straightforward rasterization

A simple trick includes the following:

  1. backface culling,
  2. avoiding division by zero,
  3. discarding triangles (< a pixel).

Diablo - add a simple trick

Boggie (body) - add a simple trick

Boggie (full) - add a simple trick

African head (head) - add a simple trick

African head (full) - add a simple trick

Barycentric coordinates (full-colored vs wireframe effect)

After sorting, the vertices are colored as follows:

  • the top vertex: blue
  • the middle vertex: green
  • the bottom vertex: red

Let’s lerp!

full-colored triangle

Then, let’s find the magic number. Here’s the method I found after a few attempts.

visualization (magenta: alpha > 0.1)

visualization (yellow: beta > 0.1)

visualization (cyan: gamma > 0.1)

visualization (white: intersection)

It looks like I can subtract the white triangle area from the whole triangle. Professor, would 0.10.1 be the magic number?

wireframe triangle

Note

All renders using specific colors are shown from a fixed front view.

For we are bound by the (R, G, B) Color. The sacred union of our every thought and emotion. :)

    “For we are bound by the Khala…the sacred union of our every thought and emotion."

Hierarch Artanis (StarCraft Wiki)

In full-colored models, I can observe consistent patterns. In contrast, in wireframe models, the thickness of wireframe effect is proportional to the size of each triangle, so it is NOT uniform.

full-colored Diablo

wireframe Diablo

full-colored Boggie (full)

wireframe Boggie (full)

full-colored African head (full)

wireframe African head (full)

Graphics mathematics

Graphics mathematics templates are based on three design choices:

  1. Adopt a column-major layout to align with the convention used by GLM, OpenGL, and Vulkan.
  2. Apply modern C++ features such as constexpr and [[nodiscard]] to make the code more robust and expressive.
  3. Take care to document each function with mathematical details so that its intent is clear.
vector
C++
template<int N> struct vec
{
    double data[N] = {};

    // i-th element
    constexpr double& operator[](int i) {
        assert(i >= 0 && i < N);
        return data[i];
    }
    constexpr double operator[](int i) const {
        assert(i >= 0 && i < N);
        return data[i];
    }
};

// addition (component-wise)
template<int N> constexpr vec<N> operator+(vec<N> a, const vec<N>& b) {
    for (int i = N; i--; a[i] += b[i])
        ;
    return a;
}

// subtraction (component-wise)
template<int N> constexpr vec<N> operator-(vec<N> a, const vec<N>& b) {
    for (int i = N; i--; a[i] -= b[i])
        ;
    return a;
}

// multiplication: vector x scalar (component-wise)
template<int N> constexpr vec<N> operator*(vec<N> v, double s) {
    for (int i = N; i--; v[i] *= s)
        ;
    return v;
}

// multiplication: scalar x vector (component-wise)
template<int N> constexpr vec<N> operator*(double s, vec<N> v) { return v * s; }

// division: vector / scalar (component-wise)
template<int N> constexpr vec<N> operator/(vec<N> v, double s) {
    for (int i = N; i--; v[i] /= s)
        ;
    return v;
}

// dot product (sum of component-wise multiplication)
template<int N> constexpr double operator*(const vec<N>& a, const vec<N>& b) {
    double sum = 0;
    for (int i = N; i--; sum += a[i] * b[i])
        ;
    return sum;
}

// stream output
template<int N> std::ostream& operator<<(std::ostream& out, const vec<N>& v) {
    for (int i = 0; i < N; ++i)
        out << v[i] << " ";
    return out;
}

// L2 norm: ||v||
template<int N> double norm(const vec<N>& v) { return std::sqrt(v * v); }

// normalization: v / ||v||
template<int N> vec<N> normalized(const vec<N>& v) { return v / norm(v); }

template<> struct vec<2>
{
    double x = 0, y = 0;

    constexpr double& operator[](int i) {
        assert(i >= 0 && i < 2);
        return i ? y : x;
    }
    constexpr double operator[](int i) const {
        assert(i >= 0 && i < 2);
        return i ? y : x;
    }
};

template<> struct vec<3>
{
    double x = 0, y = 0, z = 0;

    constexpr double& operator[](int i) {
        assert(i >= 0 && i < 3);
        return i ? (1 == i ? y : z) : x;
    }
    constexpr double operator[](int i) const {
        assert(i >= 0 && i < 3);
        return i ? (1 == i ? y : z) : x;
    }
};

// cross product: a × b (for vec3)
// perpendicular to both a and b
// magnitude: |a||b|sin(theta)
inline vec<3> cross(const vec<3>& a, const vec<3>& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

template<> struct vec<4>
{
    double x = 0, y = 0, z = 0, w = 0;

    constexpr double& operator[](int i) {
        assert(i >= 0 && i < 4);
        return i < 2 ? (i ? y : x) : (2 == i ? z : w);
    }
    constexpr double operator[](int i) const {
        assert(i >= 0 && i < 4);
        return i < 2 ? (i ? y : x) : (2 == i ? z : w);
    }

    constexpr vec<2> xy() const { return {x, y}; }
    constexpr vec<3> xyz() const { return {x, y, z}; }
};

// type alias
using vec2 = vec<2>;
using vec3 = vec<3>;
using vec4 = vec<4>;
matrix (column-major)
C++
// forward declaration
template<int N> struct recursive_det;

template<int R, int C> struct mat
{
    vec<R> col[C] = {};

    // j-th column
    constexpr vec<R>& operator[](int j) {
        assert(j >= 0 && j < C);
        return col[j];
    }
    constexpr const vec<R>& operator[](int j) const {
        assert(j >= 0 && j < C);
        return col[j];
    }

    // i-th row -> vec<C>
    [[nodiscard]] constexpr vec<C> row(int i) const {
        vec<C> row_vec;
        for (int j = C; j--; row_vec[j] = col[j][i])
            ;
        return row_vec;
    }

    // determinant (only square matrix)
    // Ref. recursive_det<C>::expand
    [[nodiscard]] double det() const {
        static_assert(R == C);
        return recursive_det<C>::expand(*this);
    }

    // cofactor matrix (only square matrix)
    // (1) minor (T: 1 / F: 0)
    // (2) det(minor) * (-1)^{r + c}
    [[nodiscard]] double cofactor(int r, int c) const {
        assert(r >= 0 && r < R && c >= 0 && c < C);
        mat<R - 1, C - 1> minor;
        for (int i = R - 1; i--;)
            for (int j = C - 1; j--; minor[j][i] = col[j + (j >= c)][i + (i >= r)])
                ;
        return minor.det() * ((r + c) % 2 ? -1 : 1);
    }

    // transpose of the inverse matrix
    // (1) inverse matrix := 1 / det(M) * C^T (C: cofactor matrix)
    // (2) 1 / det(M) * C
    // Note. Laplace expansion
    // det(M) := sum of products of the entries in that row and their respective cofactors
    [[nodiscard]] mat inverse_transpose() const {
        mat cof_mat;
        for (int i = R; i--;)
            for (int j = C; j--; cof_mat[j][i] = cofactor(i, j))
                ;
        return cof_mat / (col[0] * cof_mat[0]);
    }

    // inverse matrix
    [[nodiscard]] mat inverse() const { return inverse_transpose().transpose(); }

    // transpose matrix
    [[nodiscard]] mat<C, R> transpose() const {
        mat<C, R> tp_mat;
        for (int i = R; i--;)
            for (int j = C; j--; tp_mat[i][j] = col[j][i])
                ;
        return tp_mat;
    }
};

// addition
template<int R, int C> mat<R, C> operator+(mat<R, C> a, const mat<R, C>& b) {
    for (int j = C; j--; a[j] = a[j] + b[j])
        ;
    return a;
}

// subtraction
template<int R, int C> mat<R, C> operator-(mat<R, C> a, const mat<R, C>& b) {
    for (int j = C; j--; a[j] = a[j] - b[j])
        ;
    return a;
}

// multiplication: matrix x scalar
template<int R, int C> mat<R, C> operator*(mat<R, C> m, double s) {
    for (int j = C; j--; m[j] = m[j] * s)
        ;
    return m;
}

// multiplication: matrix x vector
template<int R, int C> vec<R> operator*(const mat<R, C>& m, const vec<C>& v) {
    vec<R> col_vec;
    for (int i = R; i--; col_vec[i] = m.row(i) * v)
        ;
    return col_vec;
}

// multiplication: vector x matrix
template<int R, int C> vec<C> operator*(const vec<R>& v, const mat<R, C>& m) {
    vec<C> row_vec;
    for (int j = C; j--; row_vec[j] = v * m[j])
        ;
    return row_vec;
}

// multiplication: matrix x matrix
template<int R, int K, int C> mat<R, C> operator*(const mat<R, K>& a, const mat<K, C>& b) {
    mat<R, C> m;
    for (int j = C; j--; m[j] = a * b[j])
        ;
    return m;
}

// division: matrix / scalar
template<int R, int C> mat<R, C> operator/(mat<R, C> m, double s) {
    for (int j = C; j--; m[j] = m[j] / s)
        ;
    return m;
}

// stream output (row-major)
template<int R, int C> std::ostream& operator<<(std::ostream& out, const mat<R, C>& m) {
    for (int i = 0; i < R; ++i)
        out << m.row(i) << "\n";
    return out;
}

// expand -> cofactor -> det -> -> expand -> ...
template<int N> struct recursive_det
{
    static double expand(const mat<N, N>& m) {
        double sum = 0;
        for (int i = N; i--; sum += m[0][i] * m.cofactor(i, 0))
            ;
        return sum;
    }
};

template<> struct recursive_det<1>
{
    static double expand(const mat<1, 1>& m) { return m[0][0]; }
};

Custom Features

  • Add PNG image output support using stb_image_write.h.
  • Add byte_per_pixel() (accessor function) to support generic color interpolation.
  • Apply clang-format based on Blender style.
  • Separate render modes (wireframe, Lerp, Z-buffer, etc.).
  • Add Mesh constructor overloading to support multi-mesh loading.
  • Visualize volume rendering-like effect (triangle-level / pixel-level).

Development Environment

Hardware specifications

  • CPU: Intel Core i7-8750H (6 cores, 12 threads @ 2.20GHz)
  • GPU: NVIDIA GeForce GTX 1060 (6GB VRAM)
  • RAM: 16GB

Software requirements

  • OS: Windows 11 Home (Build 26200.7462)
  • IDE: Visual Studio 2026
  • Compiler: MSVC (included with Visual Studio 2026)
  • C++ Standard: C++20
  • Build System: Visual Studio Solution (.sln)
  • Graphics API: None - Pure software rasterizer (CPU-based rendering)

Dependencies

  • stb_image_write.h - For PNG image export (header-only library)