Goal
• Attenuated lights
• Various types of dynamic 3D transforms (rotation, shears, transformation order, etc.)
• A static surface of revolution on the CPU
• A dynamic surface of revolution on the GPU
• Deferred rendering
• (Bonus) Tiled deferred rendering
NOTE: For maximum compatibility with student computers/drivers, we are only going to use OpenGL 2.1.
(Demo video: the graininess is due to the video compression.)
Associated Labs
• Lab 0 (required): Setting Up Your Development Environment
• Lab 10 (recommended for Task 4): Sphere with Squash and Stretch
• Lab 11 (recommended for Task 5): Surface of Revolution
• Lab 12 (recommended for Task 6): Multiple Render Targets
Task 1: Multiple Attenuated Lights
There is no base code provided for this assignment. Please start with your previous lab/assignment code. Please come see the instructor or the TA if you need help completing your previous assignment.
1. Start with your A3 or A4 code.
◦ If starting from A4, strip away the code for HUD, texture, and the top-down view.
◦ If starting from A3, replace the lighting model to be like in A4 – the light source should be defined in world coordinates rather than in camera coordinates.
2. Add at least 100 objects in the scene.
◦ Each object should have a random diffuse color.
◦ Each object should have a random scale.
◦ Replace the ambient color with the object’s “emissive” color, which will be used for rendering the lights. For these 100+ objects, the emissive color should be zero.
◦ You can hard code the specular color and exponent to the same fixed values for all objects.
• Distribute these objects over a square region on the XZ plane, centered about the origin of the world. (E.g., do not put them in a single line.)
• Add at least 10 lights, each with a random color.
• Change the background color to black with glClearColor(0.0f, 0.0f, 0.0f, 1.0f); in the init function.
• To compute the fragment color in the shader, compute the diffuse and specular RGB colors as before in A4 (but with no ambient), and then multiply the three color components by the corresponding color components of the light.
• The lights’ uniform parameters should be passed as an array of glm::vec3s to the fragment shader. You can use the following syntax: glUniform3fv(uniformID, count, value_ptr(array[0]));, where uniformID is the ID pointing to the uniform variable in the shader (e.g., prog->getUniform("foo")), count is the array length, and array is the array of glm::vec3s. In the shader, the array should be declared as uniform vec3 foo[10], assuming that count=10. (See the C++ sketch after the shader skeleton below.)
• The vertical position of each light (i.e., its y coordinate) should be around half the height of the objects. (See images above.)
• Now add attenuation to the lights. We’re going to use the standard quadratic attenuation model from OpenGL:

attenuation = 1.0 / (A0 + A1*r + A2*r^2),

where r is the distance between the fragment and the light, and A0, A1, and A2 are the constant, linear, and quadratic attenuation factors. For this part of the assignment, we want the light to fall off to a specified percentage of its strength at one distance and to a smaller percentage at a farther distance; with A0 = 1.0 (no attenuation at r = 0), these two conditions give two linear equations that determine A1 and A2. The color at the fragment should be scaled by this attenuation value.
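As a sanity check, the two falloff conditions can be solved for A1 and A2 directly. Here is a minimal C++ sketch; the values of r1, f1, r2, and f2 below are illustrative placeholders for the distances and strength fractions specified for this task:

#include <cstdio>
int main()
{
	double r1 = 1.0, f1 = 0.5; // placeholder: falls to fraction f1 at distance r1
	double r2 = 3.0, f2 = 0.1; // placeholder: falls to fraction f2 at distance r2
	// With A0 = 1, each condition reads A1*r + A2*r^2 = 1/f - 1.
	double c1 = 1.0/f1 - 1.0;
	double c2 = 1.0/f2 - 1.0;
	double det = r1*r2*(r2 - r1); // determinant of the 2x2 system
	double A1 = (c1*r2*r2 - c2*r1*r1)/det;
	double A2 = (r1*c2 - r2*c1)/det;
	printf("A1 = %f, A2 = %f\n", A1, A2);
	return 0;
}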
vec3 fragColor = ke; // emissive color of the object (see below)
for(...) { // for each light
float diffuse = ...;
float specular = ...;
vec3 color = lightColor * (kd * diffuse + ks * specular);
float attenuation = 1.0 / (A0 + ...);
fragColor += color * attenuation;
}
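As referenced above, here is a minimal C++ sketch of uploading the per-light uniforms. The names lightPositions, lightColors, lightPos, and lightColor are illustrative; adapt them to your own code, and make sure the positions are in the space your shader expects. (value_ptr requires <glm/gtc/type_ptr.hpp>.)

// Assumes the shader declares: uniform vec3 lightPos[10]; uniform vec3 lightColor[10];
std::vector<glm::vec3> lightPositions(10), lightColors(10); // filled in init()
glUniform3fv(prog->getUniform("lightPos"), (GLsizei)lightPositions.size(), glm::value_ptr(lightPositions[0]));
glUniform3fv(prog->getUniform("lightColor"), (GLsizei)lightColors.size(), glm::value_ptr(lightColors[0]));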
• Each light should be displayed as a sphere, using the same fragment shader as everything else. We’ll use the “emissive” color, ke, for this, which is the color that the light is emitting. This value should be set to the light’s color when rendering the sphere for the light, and to zero for everything else. Putting everything together, the fragment color for all objects and lights should be computed as follows:

fragColor = ke + Σ_i attenuation_i * [ lightColor_i ⊙ (kd * diffuse_i + ks * specular_i) ],

where i indexes the ith light. The ⊙ notation indicates that the multiplication should be done component-wise, for R, G, and B. This equation allows us to use a single shader to render everything in the scene – the lights will be colored using ke, and the other objects will be colored with attenuated Blinn–Phong. To summarize:
◦ When rendering the lights, set ke to be the color of the light, and set kd and ks to be zero.
◦ When rendering the other objects, set ke to be zero, and set kd and ks to be the objects’ material parameters.
Task 2: Rotating Bunnies
Rather than changing the scale of the bunny over time as in A4, rotate it around its vertical axis. As in A4, the overall scale of the bunny should still be randomized, and the bunny should touch the floor but not intersect it.
In this image (and the following), I am using only one light to better illustrate the motion.
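A minimal sketch of building a bunny’s model matrix with GLM (names and constants are illustrative; requires <glm/gtc/matrix_transform.hpp>). The key point is the order: scale first, then rotate about the vertical axis, then translate so the scaled bunny touches the floor.

// x, z: placement; t: elapsed time; s: randomized scale; yMin: min y of the bunny mesh in object space
glm::mat4 T = glm::translate(glm::mat4(1.0f), glm::vec3(x, -s * yMin, z)); // lift so the feet touch y = 0
glm::mat4 R = glm::rotate(glm::mat4(1.0f), t, glm::vec3(0.0f, 1.0f, 0.0f)); // spin about the vertical axis
glm::mat4 S = glm::scale(glm::mat4(1.0f), glm::vec3(s));
glm::mat4 M = T * R * S; // vertices see: scale, then rotate, then translate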
Task 3: Shearing Teapots
Rather than changing the scale of the teapot over time as in A4, shear it so that it sways from side to side. The teapot should look like it is glued to the floor. As in A4, the overall scale of the teapot should still be randomized.
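One way to get the swaying motion is a time-varying shear that displaces x in proportion to y; since y = 0 is unaffected, the base stays glued to the floor. A minimal GLM sketch (the 0.5 amplitude is illustrative):

glm::mat4 Sh(1.0f);
Sh[1][0] = 0.5f * sin(t); // x' = x + shear * y (GLM is column-major: Sh[column][row])
glm::mat4 M = T * Sh * S; // T and S as in the previous sketch; the shear goes between scale and translation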
Task 4: Bouncing Spheres
Add some bouncing spheres, following the steps outlined in Lab 10. The sphere should have a randomized radius, and it should touch the floor but not intersect it. When the sphere is moving up or down, its scale in X and Z should be made smaller to achieve “squash and stretch.” The geometry of the sphere should be created and stored in memory just once in the init() function. To display multiple spheres in the scene, use different transformation matrices passed in as uniform variables.
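A minimal sketch of the squash-and-stretch scale (the 0.3 constant and the velocity-based formula are illustrative; Lab 10 describes the intended behavior). Stretching in Y while shrinking X and Z keeps the apparent volume roughly constant:

// r: randomized radius; vy: current vertical velocity (requires <cmath>)
float stretch = 1.0f + 0.3f * std::fabs(vy); // > 1 while the sphere is moving up or down
glm::vec3 scale(r / std::sqrt(stretch), r * stretch, r / std::sqrt(stretch)); // X/Z shrink; volume ~ r^3 preserved
// Translate up by the sphere's scaled Y radius so it touches the floor but does not intersect it.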
Task 5: Surface of Revolution
Add some surfaces of revolution, following the steps outlined in Lab 11. First, implement a static surface on the CPU and then move the computation over to the GPU to allow a dynamic surface. Like the sphere, the vertex attributes of the surface of revolution should be created just once in the init() function. To display multiple surfaces of revolution in the scene, use different transformation matrices passed in as uniform variables. Just like the other objects in the scene, the surface-of-revolution objects should just touch the ground.
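For the GPU version, the vertex attributes can store only the grid parameters (x, θ), with the actual position computed in the vertex shader every frame. A minimal GLSL sketch with an illustrative radius function f (use the function from Lab 11, and compute the normal from the cross product of the partial derivatives, which is omitted here):

#version 120
uniform mat4 P;
uniform mat4 MV;
uniform float t;
attribute vec2 aParams; // aParams.x = x, aParams.y = theta (angle around the axis of revolution)
void main()
{
	float x = aParams.x;
	float theta = aParams.y;
	float f = 0.5 + 0.2 * sin(2.0 * x + t); // illustrative radius function
	vec4 pos = vec4(x, f * cos(theta), f * sin(theta), 1.0);
	gl_Position = P * (MV * pos);
}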
Task 6: Deferred Rendering
We are now going to implement deferred rendering. Since this step requires render-to-texture and multiple render targets, it may help to complete Lab 12 first. Deferred rendering will require a substantial overhaul of your code base, so make sure to use source control so that you can easily get back to your old code if needed.
In deferred rendering, we use two passes. In the first pass, we render to multiple render targets to create textures that hold all the information needed to compute the color of each fragment. In the second pass, we render a view-aligned quad with the textures from the first pass, and then do the actual lighting computation in the fragment shader.
First Rendering Pass
The four images below show the four textures we need to generate in the first pass. The size of these textures should be the same as the onscreen framebuffer size, which can be obtained with glfwGetFramebufferSize(...). (The default size is 640 x 480; later, support for resizing the window will be added.)
1. The first image is the camera-space position of all of the fragments. In this visualization, the position (x, y, z) is color-coded as (r, g, b). Since RGB values need to be between 0 and 1, the visualization shows a black area in the lower left, corresponding to the region where the camera-space positions are all negative. Also, since the camera-space Z coordinate of all of these fragments is negative, there is no blue component in the color output of any of the fragments.
2. The second image is the camera-space normal of all of the fragments. As before, the normal’s (x, y, z) is color-coded as (r, g, b). Fragments whose normals point to the right in camera space are colored red, those whose normals point up are colored green, and those whose normals point toward the camera are colored blue.
3. The third image is the emissive color of all the fragments. In this image, I have 200 randomly colored lights, but in your code, you may only have a small number of lights.
4. The fourth image is the diffuse color of all the fragments.
These four textures must be generated as the output of the first pass. To do so, first, change the texture format in your C++ code to use 16 bit floats: GL_RGB16F instead of GL_RGBA8, and GL_RGB instead of GL_RGBA.
// L12
glGenTextures(1, &texture);
glBindTexture(GL_TEXTURE_2D, texture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_RGBA, GL_FLOAT, NULL);
...
// A5: replace the last line above with this:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB16F, width, height, 0, GL_RGB, GL_FLOAT, NULL);
In this assignment, there are four textures, so the line above must be used four times. The fragment shader of the first pass can now write floating point values to the four textures:
#version 120
varying vec3 vPos; // in camera space
varying vec3 vNor; // in camera space
uniform vec3 ke;
uniform vec3 kd;
void main()
{
gl_FragData[0].xyz = vPos;
gl_FragData[1].xyz = vNor;
gl_FragData[2].xyz = ke;
gl_FragData[3].xyz = kd;
}
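Writing to gl_FragData[1] through gl_FragData[3] only works if the framebuffer has four color attachments enabled as draw buffers. A minimal sketch of the extra setup in init(), assuming framebufferID and the four texture IDs in textures[0..3] were created as above (a depth attachment, as in Lab 12, is still needed):

glBindFramebuffer(GL_FRAMEBUFFER, framebufferID);
for(int i = 0; i < 4; ++i) {
	glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, GL_TEXTURE_2D, textures[i], 0);
}
GLenum attachments[4] = {
	GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1,
	GL_COLOR_ATTACHMENT2, GL_COLOR_ATTACHMENT3
};
glDrawBuffers(4, attachments); // route gl_FragData[0..3] to the four textures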
The vertex shader for the first pass depends on what is being drawn. The bunny, teapot, and sphere should be drawn with a simple vertex shader that transforms the position and normal into camera space (vPos and vNor in the fragment shader above). The surface of revolution will require its own vertex shader, as in Task 5.
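For reference, a minimal sketch of that simple first-pass vertex shader (note that if MV contains shears or non-uniform scales, the normal should strictly be transformed by the inverse transpose of MV):

#version 120
uniform mat4 P;
uniform mat4 MV;
attribute vec4 aPos;
attribute vec3 aNor;
varying vec3 vPos; // camera space
varying vec3 vNor; // camera space
void main()
{
	vec4 posCam = MV * aPos;
	gl_Position = P * posCam;
	vPos = posCam.xyz;
	vNor = normalize((MV * vec4(aNor, 0.0)).xyz);
}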
Second Rendering Pass
In the second pass, we draw a view-aligned quad that completely fills the screen. In this stage of the assignment, we can simply draw a unit square somewhere close enough to the camera so that it ends up covering the whole screen. The vertex shader for the second pass is very simple:
#version 120
uniform mat4 P;
uniform mat4 MV;
attribute vec4 aPos;
void main()
{
gl_Position = P * (MV * aPos);
}
This vertex shader simply transforms the vertex position from model space to clip space. The fragment shader will use the textures created in the first pass to compute the final fragment colors that end up on the screen. Rather than using the texture coordinates of the quad, we can compute them in the fragment shader using the built-in variable gl_FragCoord, which stores the window-relative coordinates of the fragment. Dividing this by the window size gives the correct texture coordinates, which are (0, 0) at the lower left corner and (1, 1) at the upper right corner. Using these texture coordinates, read from the four textures and then calculate the color of the fragment. Additionally, you need to pass the light information to this fragment shader as uniform variables (e.g., light positions and colors).
#version 120
uniform sampler2D posTexture;
uniform sampler2D norTexture;
uniform sampler2D keTexture;
uniform sampler2D kdTexture;
uniform vec2 windowSize;
... // more uniforms for lighting
void main()
{
vec2 tex;
tex.x = gl_FragCoord.x/windowSize.x;
tex.y = gl_FragCoord.y/windowSize.y;
// Fetch shading data
vec3 pos = texture2D(posTexture, tex).rgb;
vec3 nor = texture2D(norTexture, tex).rgb;
vec3 ke = texture2D(keTexture, tex).rgb;
vec3 kd = texture2D(kdTexture, tex).rgb;
// Calculate lighting here
...
gl_FragColor = ...
}
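Before drawing the quad in the second pass, the four textures must be bound to texture units and the sampler uniforms pointed at those units. A minimal C++ sketch, using the texture IDs from the first pass and the uniform names from the shader above (width and height come from glfwGetFramebufferSize):

GLuint textures[4]; // pos, nor, ke, kd (created in init())
const char *names[4] = {"posTexture", "norTexture", "keTexture", "kdTexture"};
for(int i = 0; i < 4; ++i) {
	glActiveTexture(GL_TEXTURE0 + i);
	glBindTexture(GL_TEXTURE_2D, textures[i]);
	glUniform1i(prog->getUniform(names[i]), i); // sampler i reads from texture unit i
}
glUniform2f(prog->getUniform("windowSize"), (float)width, (float)height);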
For debugging, consider these substeps.
• In the fragment shader for the second pass, simply color the quad red (gl_FragColor.rgb = vec3(1.0, 0.0, 0.0);). This should give you a fully red screen.
• Color the fragment using the computed texture coordinates (gl_FragColor.rg = tex;). The screen should look red/green.
• Color the fragment using each texture (e.g., gl_FragColor.rgb = pos;). You should see the 4 images at the beginning of this task.
Please set up the code so that the grader can easily produce the 4 images at the top of this section. To get full points, the final output, as well as these 4 images must be correct. Please put in your README file how to produce these images (e.g., “Uncomment line XX in some shader file.”).
HINT: Debugging OpenGL & GLSL
• Set the Program class to be verbose by calling the setVerbose() function. If there is a GLSL compilation error, then you will see the error in the console. For example, if the varying variables of the vertex shader and the fragment shaders do not match up, it will tell you so. Pay attention to the line number (e.g., line 28 in the error log below). Make sure to set verbose to be false after debugging.
Shader InfoLog:
ERROR: 0:28: ...
...
• Use GLSL::checkError(GET_FILE_LINE); to find which OpenGL call caused an error. This function will assert if any OpenGL errors occurred before reaching this line. You can use this to winnow down which OpenGL function is causing an error. For example, if you put this line at the top, the middle, and the bottom of your function (shown below), and the assertion fires in the middle, you know that the error must be happening in the top half of your function. You can then keep interspersing the checkError line in more places in the code. Once you find exactly which OpenGL call is causing the error, you can Google the OpenGL function to figure out what caused it. For example, maybe one of the arguments should not have been zero or null.
void render()
{
GLSL::checkError(GET_FILE_LINE);
Some OpenGL lines
GLSL::checkError(GET_FILE_LINE);
More OpenGL lines
GLSL::checkError(GET_FILE_LINE);
}
• The GLSL compiler will silently optimize away any variables that are not used in the shader. If you try to access these variables at runtime, the program will crash, since these variables no longer exist in the shader. In this assignment, when you move the computation of the normal to the GPU, the aNor variable no longer needs to be passed to the GPU, since it is computed in the shader. Therefore, you will have to comment out any reference to aNor in your C++ runtime code. Alternatively, you can keep the GLSL compiler from optimizing away aNor by using it and then discarding the result, as follows:
vec3 nor = aNor.xyz;
nor.x = ...;
nor.y = ...;
nor.z = ...;
Bonus: Window Resizing
Add support for window resizing for deferred rendering. Use the framebuffer size callback in GLFW. Note that the code will slow down a lot if the window size is increased, since there are many more fragments to be processed.
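A minimal sketch of the resize handling (the callback name and the textures array are illustrative; the depth attachment must be resized in the same way):

static void framebuffer_size_callback(GLFWwindow *window, int width, int height)
{
	glViewport(0, 0, width, height);
	// Reallocate each deferred texture at the new size.
	for(int i = 0; i < 4; ++i) {
		glBindTexture(GL_TEXTURE_2D, textures[i]);
		glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB16F, width, height, 0, GL_RGB, GL_FLOAT, NULL);
	}
}
// In main(), after creating the window:
glfwSetFramebufferSizeCallback(window, framebuffer_size_callback);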
Bonus: Tiled Deferred Rendering
In this bonus task, we are going to increase the number of lights. First, because there will be many more lights, we need to adjust the attenuation factors for this part of the assignment; otherwise, the scene will be over-exposed (too bright). We now want each light to fall off much more quickly, so that its intensity is negligible beyond a distance of 1; as in Task 1, plugging the two new falloff conditions into the attenuation model gives the new values of A0, A1, and A2.
First, implement the naive approach, which uses a for-loop to go through all of the lights in the fragment shader. This is the ground truth, but this will become much too slow as the number of lights increases.
Therefore, we will now use tiled deferred rendering. The basic idea is to render a grid of small quads instead of one big quad that fills the screen. Each small quad will be rendered with only a subset of lights that affect the fragments within that quad. The images below show quads with black borders to visualize the grid. The various shades of red indicate the number of lights each quad is using.
For each quad, we need to know which lights need to be included. To do this, we perform frustum-sphere intersection tests.
• Each 2D quad on the screen corresponds to a 3D frustum in world space. The quad must be drawn somewhere between the near and far planes, and the size of the quad depends on exactly where it is drawn. In my implementation, I draw the quads halfway between the near and far planes.
• We will assume that each light has a sphere of influence of radius 1, since, with the attenuation factors chosen above, the intensity of each light falls to a negligible fraction of its strength at distance 1.
[Wikimedia Commons]
The image above shows a view frustum originating at the camera and sandwiched between the near and far planes. The near and far distances are stored in the Camera class, if you’re using it. (Both n and f are negated because, by convention, they are defined as positive distances away from the camera, whereas the actual near and far planes lie along the negative Z axis in camera space.) The corners of the near plane are defined by left, right, bottom, and top (l, r, b, and t). In the previous lab/assignment code, we have been using the field-of-view (fovy) and aspect arguments. To convert these to l, r, b, and t, use the following:
l = -n * tan(fovy/2) * aspect;
r = -l;
b = -n * tan(fovy/2);
t = -b;
Once the overall view frustum for the whole window is derived, divide it into smaller frustums, one for each tile. For each small frustum, perform a frustum-sphere intersection test.
[Wikimedia Commons]
The image above shows a schematic of frustum-sphere intersections. For this assignment, each light sphere is of unit radius because of the attenuation factors chosen. For each frustum, check which light spheres intersect it, and then when rendering the quad for this frustum, only include the intersecting lights.
For the intersection tests, use the code by Daniel Holden. (You may need to pass arguments by reference to get good performance.) This piece of code requires you to specify the 6 planes of the frustum. I suggest using camera space to specify both the frustum and the spheres. To specify a frustum plane, you need to specify a point and a direction. For example, the near plane of the frustum can be defined by the point (0, 0, -n) and direction (0, 0, -1), and the far plane can be defined by the point (0, 0, -f) and direction (0, 0, 1) (check which way the intersection code expects the plane normals to point). The other 4 planes all share a common point – the camera’s position, which is the origin in camera space – so you can use the cross product to compute the plane directions. Don’t forget to normalize the result of the cross product.
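For one of the four side planes, a minimal sketch of computing the plane direction in camera space (a and b are illustrative camera-space points on the corresponding edge of the tile’s quad, computed from the tile corners; the winding order determines whether the normal points into or out of the frustum):

glm::vec3 p0(0.0f); // camera position in camera space, shared by the 4 side planes
glm::vec3 dir = glm::normalize(glm::cross(a - p0, b - p0)); // plane through p0, a, and b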
Bonus: More Optimizations
Unfortunately, on many GPUs, dynamic branching can really slow down the shader. This means that even if we have a loop in the shader that processes only the lights intersecting the frustum, the GPU will not be able to speed up the code, since each quad requires a different subset of lights. The way we are going to get around this problem is to have multiple shaders that can handle increasing numbers of lights. For example, let’s say we have 200 lights in the scene. Then we create shaders that can handle 20, 40, ..., 200 lights. If a quad has 5 lights, then we use the shader that can handle up to 20 lights; for 21 lights, use the shader with 40 lights; and so on. These shaders should have the following lines at the very top:
#version 120
const int LIGHTS_MAX = 20;
The shader expects the version string (#version 120) to come first. Otherwise, the shader will be compiled against the most basic version of the language, which is almost certainly not what you want. The second line above declares the constant that defines the number of lights this shader can handle. We want to create multiple shaders with different numbers in this line. Rather than creating these shaders as separate files, we can prepend some lines to the shader in the C++ code as follows (Program.cpp):
GLuint FS = glCreateShader(GL_FRAGMENT_SHADER);
const char *fshader = GLSL::textFileRead(fShaderName.c_str());
string fstr = "";
for(auto p : prepend) {
fstr += p + "\n";
}
fstr += fshader;
const char *fstr_c = fstr.c_str();
glShaderSource(FS, 1, &fstr_c, NULL);
Here, prepend is a vector<string> that contains the lines to be added to the beginning of the shader.
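For example, one program per light-count tier could be built like this (hypothetical driver code; it assumes your modified Program class accepts the prepend vector before initialization):

// Build one fragment shader variant per tier: 20, 40, ..., 200 lights.
for(int n = 20; n <= 200; n += 20) {
	std::vector<std::string> prepend;
	prepend.push_back("#version 120");
	prepend.push_back("const int LIGHTS_MAX = " + std::to_string(n) + ";");
	// Pass `prepend` to the Program before calling init(), as in the excerpt above.
}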
Unfortunately, this might still be too slow, because switching between GLSL programs can now become the bottleneck. It is reasonable to switch programs a few times during a render call, but switching 100 times would cause the parallelism to break down. Therefore, we sort the quads by the number of intersecting lights and then draw the quads in increasing order of light count, switching the shader only when the number of lights crosses the next threshold. For example, suppose the sorted quads have the following numbers of lights: 5, 13, 22, 25, 28, 31, 35, 39, 47, .... Then we draw the first two quads using the ‘20’ shader, the next six quads using the ‘40’ shader, etc. Figuring out how many shaders to use vs. cramming a lot of complexity into fewer shaders is a difficult balancing act that depends on the hardware, the driver, and the OpenGL/GLSL code.
There are more optimizations that can be performed, but with the ones listed above, you should be able to get 200 lights at interactive rates for the default window size.
Point breakdown
• 15 points for Task 1: colored lights with attenuation.
• 10 points for Task 2: rotating bunnies.
• 15 points for Task 3: shearing teapots.
• 15 points for Task 4: bouncing sphere.
• 10 points for Task 5a: static surface of revolution on the CPU.
• 10 points for Task 5b: dynamic surface of revolution on the GPU.
◦ If the GPU code works, do not hand in the CPU code.
• 20 points for Task 6: deferred rendering.
• 5 points for coding style and general execution (e.g., loading each OBJ only once, etc.).
• +5 points for support for window resizing.
• +20 points for tiled deferred rendering.
• +10 points for optimized tiled deferred rendering.
Total: 100 plus 35 bonus points
What to hand in
Failing to follow these points may decrease your “general execution” score. On Linux/Mac, make sure that your code compiles and runs by typing:
> mkdir build
> cd build
> cmake ..
> make
> ./A5 ../resources
If you’re on Windows, make sure that you can build your code using the same procedure as in Lab 0.
For this assignment, there should be only one argument. You can hard code all your input files (e.g., obj files) in the resources directory.
• Make sure the arguments are exactly as specified.
• Include an ASCII README file that includes:
◦ Your name, UID, and email
◦ The highest task you’ve completed
◦ Citations for any downloaded code
◦ For Task 6: how to generate the 4 textures for deferred rendering
◦ Plus anything else of note
• Make sure you don’t get any compiler warnings.
• Remove unnecessary debug printouts.
• Remove unnecessary debug code that has been commented out.
• Hand in src/, resources/, CMakeLists.txt, and your readme file. The resources folder should contain the obj files and the glsl files.
• Do not hand in:
◦ The build directory
◦ The executable
◦ Old save files (*~)
◦ Object files (*.o)
◦ Visual Studio files (.vs)
◦ Git folder (.git)
• Create a single zip file of all the required files.
◦ The filename of this zip file should be UIN.zip (e.g., 12345678.zip).
◦ The zip file should extract a single top-level folder named UIN/ (e.g. 12345678/).
◦ This top-level folder should contain your README, src/, CMakeLists.txt, etc.
◦ Use the standard .zip format (not .gz, .7z, .rar, etc.).