1

 Timothy J. Purcell
 Mike Cammarano
 Pat Hanrahan
 Stanford University

2


3

 Interactive global illumination on the GPU
 Nearly have sufficient compute power and flexibility
 Explore GPUbased computation algorithms

4

 CPUbased interactive global illumination
 Supercomputers [Parker et al.]
 Clusters [Tole et al., Wald et al.]
 Global illumination on programmable GPUs
 Ray tracing [Carr et al., Purcell et al.]
 Photon mapping [Ma et al.]
 Radiosity [Carr et al., Coombe et al.]
 Translucency [Carr et al., Stamminger et al.]

5

 Photon tracing
 Emission, scattering, storing into kdtree
 Similar to ray tracing
 Rendering
 Ray tracing for direct illumination
 Photon map visualization

6

 Constructing a irregular or sparse data structure

7

 Adaptive nearest neighbor search

8

 Adaptive nearest neighbor search

9

 Balanced kdtree
 Compact storage of photons
 Efficient
 O(log n) search
 Priority queue
 Nearest neighbor search
 Incremental insertion and removal of photons

10

 Direct visualization of photon map
 Keeps rendering costs low
 Use grid instead of kdtree
 Tried kdtree…
 Kdtree construction is difficult
 Radiance estimate
 Fixed radius search works fine
 Adaptive search needs priority queue
 No priority queue

11

 Mapped complete gridbased photon mapping algorithm onto the GPU
 Including photon tracing, ray tracing, etc.
 Implemented an adaptive knearest neighbor search
 Show how to construct a sparse data structure on the GPU
 Bitonic merge sort with binary search
 Stencil routing

12

 GPU as data parallel compute engine
 Fragment programs execute compute kernels
 Screen sized quad initializes computation
 Floating point texture memory
 Rendertotexture for intermediate results
 Data structure storage
 Pointer dereferencing via dependent fetches

13

 Building a Sparse Data Structure

14

 Requires scatter
 Why don’t we have fragment scatter?
 Fragment processing has highly coherent blocked memory writes
 Extra hardware support would be needed
 Write hazards
 Memory latencies

15

 Sort photons into grid cells
 Simulate scatter with fragment programs
 Bitonic merge sort followed by binary search
 Compact grid
 O(log^{2} n) rendering passes

16


17

 Grid cell searches for self in photon list
 If none, find first element in next cell
 Empty grid cells waste compute
 Log(n) + 1 steps

18

 Grid cell searches for self in photon list
 If none, find first element in next cell
 Empty grid cells waste compute
 Log(n) + 1 steps

19

 Grid cell searches for self in photon list
 If none, find first element in next cell
 Empty grid cells waste compute
 Log(n) + 1 steps

20

 Grid cell searches for self in photon list
 If none, find first element in next cell
 Empty grid cells waste compute
 Log(n) + 1 steps

21

 Grid cell searches for self in photon list
 If none, find first element in next cell
 Empty grid cells waste compute
 Log(n) + 1 steps

22

 Grid cell searches for self in photon list
 If none, find first element in next cell
 Empty grid cells waste compute
 Log(n) + 1 steps

23

 Vertex programs can scatter

24

 Vertex programs can scatter
 Draw point to buffer
 Stencil routing
 Limit photon count per grid cell
 Preallocate grid cell space
 Draw photons as points
 Vertex program computes grid cell
 Stencil buffer controls location within cell
 Single rendering pass

25

 Fix each grid cell size to n^{2} pixels
 Draw fat points to cover each fat cell

26

 Control location written to with stencil
 Pass when stencil is n^{2} 1
 Stencil always increments
 Location written depends on draw order

27

 Adaptive Nearest Neighbor Search

28

 Iterative algorithm
 Accept or reject photons in cell visit order

29


30

 Candidate photons must be within max search radius
 Visit voxels in order of distance to sample point

31

 If current number of photons in estimate is less than number requested,
grow search radius

32

 If current number of photons in estimate is less than number requested,
grow search radius

33

 Don’t add photons outside maximum search radius
 Don’t grow search radius when photon is outside maximum radius

34

 Add photons within search radius

35

 Add photons within search radius

36

 Don’t expand search radius if enough photons already found

37

 Add photons within search radius

38

 Visit all other voxels accessible within determined search radius
 Add photons within search radius

39

 Finds all photons within a sphere centered about sample point
 May locate more than requested knearest neighbors

40

 NVIDIA GeForce FX 5900 Ultra (NV35)
 Cg compiler 1.1

41


42


43


44


45


46


47


48


49

 How to prevent program execution over a subset of pixels?
 Nonuniform pixel computation distribution
 KILL is only a write mask
 Earlyz occlusion culling
 Compute mask, branching, or stream buffer?
 Improve radiance estimate speed by 3070% over tiling

50

 Scatter
 Makes (a programmer’s) life easier
 Is it worth implementing?
 Gain factor of log^{2 }n avoiding sort

51

 Kdtrees
 Photon power redistribution
 Adaptive sampling
 Progressive refinement

52

 The GPU can compute an entire global illumination solution
 Implemented an adaptive knearest neighbor query for the GPU
 Shown how to construct sparse data structures on the GPU
 Bitonic merge sort and binary search
 Stencil routing
 Sorting and searching algorithms applicable to other computations

53

 Stanford FlashG
 Ian Buck, Mike Houston, Kekoa Proudfoot
 Stencil routing
 Kurt Akeley, Matt Papakipos
 Hardware and drivers
 David Kirk, Nick Triantos
 Funding

54


55

 Adaptive nearest neighbor search

56

 Unpredictable photon distance
 Nearby cells may have distant photons
 Unpredictable search radius enlargement
 Maximum search radius enlargement is fixed for grid
 Determined by grid cell size

57

 Readmodifywrite buffers
 Efficient save and restore over multiple passes
 Addressing modes
 2D textures vs. 1D textures
 Integer computation
 Sometimes you need a mod or a div

58

 Data dependencies
 Incremental construction
 Child node location depends on parent node
 One sorting pass per tree level
 Grid has one level of dependency
