Notes.txt 13 KB

  1. ----------------------------------------------------------------------------------------------
  2. General longterm systems:
  3. - Camera controls + world grid
  4. - Frustum culling and octree (or some other) acceleration structure
  5. - Render queue and sorting
  6. ----------------------------------------------------------------------------------------------
  7. Reminders:
  8. - Assets manifest
  9. - I need to be able to create a list of assets used in the game. Assets referenced from the editor
  10. should be easy to account for. But assets loaded from code are special. Maybe do it like Unity and allow a
  11. special Resources folder?
  12. - When displaying inspector data for a component, take into consideration that it will need to be able
  13. to display that data for user created C# classes as well. AND I will most certainly have C# versions of all my
  14. components. Therefore is there any purpose of having C++ only inspector parsing code?
  15. - When I implement sRGB writes, make sure GUI textures are handled properly. Right now they are read in gamma space and displayed
  16. as-is, but if I switch to sRGB writes then gamma would be applied twice to those textures.
  17. - Make editor background have a tileable pattern (see Mini paper app on my cell). Since tileable images wouldn't work with scale9grid, it should be an overlay over the solid background.
  18. - Maybe use the background type I used for website in WebDIP?
  19. - Use black with yellow-orange highlights (highlight = selected button, selected frame border, selected entry box, etc.)
  20. - Async callbacks. I'd like to be able to assign a callback to an async method, that will execute on the calling thread once the async operation is complete.
  21. - For example when setting PixelData for a cursor I need to get PixelData from a texture, which is an async operation, in which case I need to block the calling thread
  22. until I get the result. But I'd rather apply the result once render thread is finished.
  23. - GUI currently doesn't batch elements belonging to different GUIWidgets because each of them has its own transform. Implement some form of instancing for DX11 and GL
  24. so this isn't required. GUIManager already has the ability to properly group meshes, all that is needed is a shader.
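A rough sketch for the async callback reminder above: the thread that finishes the async operation posts the callback to a queue, and the calling thread runs the queued callbacks at a point it chooses (for example once per frame). CompletionQueue, post and runPending are made-up names for illustration, not existing engine classes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical helper: collects callbacks of finished async operations
// so the owning (calling) thread can run them later.
class CompletionQueue {
public:
    // Called from the thread that finished the async operation
    // (e.g. the render thread).
    void post(std::function<void()> callback) {
        std::lock_guard<std::mutex> guard(mMutex);
        mPending.push_back(std::move(callback));
    }

    // Called by the thread that owns the queue, e.g. once per frame.
    // Returns the number of callbacks that were run.
    std::size_t runPending() {
        std::vector<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> guard(mMutex);
            ready.swap(mPending);
        }
        for (auto& callback : ready)
            callback(); // runs outside the lock, so callbacks may post again
        return ready.size();
    }

private:
    std::mutex mMutex;
    std::vector<std::function<void()>> mPending;
};
```

With something like this the cursor case would not need to block: the texture read posts a callback that applies the PixelData, and the calling thread picks it up on its next runPending().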
  25. Potential optimizations:
  26. - bulkPixelConversion is EXTREMELY poorly optimized. For each pixel it calls a separate method that repeats redundant operations.
  27. - UI buffer updates
  28. - https://developer.nvidia.com/sites/default/files/akamai/gamedev/files/gdc12/Efficient_Buffer_Management_McDonald.pdf
  29. - See here how to avoid locks and stalls when updating just parts of a GUI buffer. I need to modify my mesh update code because it currently always updates 100% of the buffer.
  30. - Maybe get rid of CPU UI buffers completely.
  31. - UI shader resolution params for gui should be in a separate constant buffer
  32. ----------------------------------------------------------------------------------------------
  33. More detailed, thought-out system descriptions:
  34. <<<<Reducing render state changes>>>>
  35. - Transparent objects get sorted back to front, always
  36. - Opaque objects I can choose between front to back, no sort or back to front
  37. - Then sort based on material-pass combo, rendering all passes of the same material at once, then moving to next pass, then to next material, etc.
  38. - For translucent objects I need to render entire material at once, and not group by pass
  39. - Ignore individual state and texture changes, just sort based on material
  40. - Use key-based approach as described here: http://realtimecollisiondetection.net/blog/?p=86
  41. Questions/Notes:
  42. 1. Could I make use of multiple texture slots so I don't have to re-assign textures for every material when rendering translucent objects pass by pass?
  43. - When sorting back to front (or front to back) it's highly unlikely that many objects sharing the same material will end up next to each other in depth order anyway.
  44. So probably ignore this problem for now, and just change the states.
  45. 2. Should sorting be done on main or render thread?
  46. - Main thread. It's highly unlikely main thread will be using as much CPU as render thread will, so free up render thread as much as possible.
  47. 3. Should oct-tree queries be done on main or render thread?
  48. - Main thread as I would need to save and copy the state of the entire scene, in order to pass it to the render thread. Otherwise we risk race conditions.
  49. 4. Since render state and shader changes are much more expensive than shader constant/buffer/mesh (and even texture) changes, it might be a good idea to sort based on these,
  50. instead of exact material? A lot of materials might share shaders and render states but not textures.
  51. 5. This guy: http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BRenderstate%20change%20costs%5D%5D (who's a driver programmer) sorts all opaque objects based on shader/state
  52. into buckets, and then orders elements in these buckets in front to back order. This gives him the best of both worlds, early z rejection and low state changes.
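A minimal sketch of the key-based idea from the notes above, combined with the bucket approach from point 5: opaque items sort by shader/state bucket first and front to back inside a bucket, translucent items always sort back to front. The DrawItem fields and the bit layout are assumptions made up for this example, not an existing engine format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical draw item; field names are illustrative only.
struct DrawItem {
    std::uint16_t shaderStateId; // shader + render state bucket
    std::uint16_t materialId;
    float viewDepth;             // distance from camera, assumed >= 0
    bool translucent;
};

// For non-negative IEEE-754 floats the raw bit pattern orders the same
// way as the value, so it can be used directly inside an integer key.
inline std::uint32_t depthBits(float depth) {
    assert(depth >= 0.0f);
    std::uint32_t bits;
    std::memcpy(&bits, &depth, sizeof(bits));
    return bits;
}

// Key layout (assumed for this sketch):
//   opaque:      bit 63 = 0 | bits 32..47 shader/state | bits 0..31 depth
//   translucent: bit 63 = 1 | bits 16..47 inverted depth | bits 0..15 material
inline std::uint64_t makeSortKey(const DrawItem& item) {
    const std::uint64_t depth = depthBits(item.viewDepth);
    if (item.translucent) {
        const std::uint64_t invertedDepth = 0xFFFFFFFFull - depth; // far first
        return (1ull << 63) | (invertedDepth << 16) | item.materialId;
    }
    return (static_cast<std::uint64_t>(item.shaderStateId) << 32) | depth;
}

// Sorts so that all opaque items come first, then translucent ones.
inline void sortDrawItems(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return makeSortKey(a) < makeSortKey(b);
              });
}
```

Because the whole ordering lives in one integer, changing the policy (for example putting depth ahead of shader/state for opaque items) only means changing the bit layout in makeSortKey.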
  53. <<<<DirectDraw>>>>
  54. - Used for quickly drawing something, usually for debug and editor purposes.
  55. - It consists of methods like: DrawLine, DrawPolygon, DrawCube, DrawSphere, etc.
  56. - It may also contain other fancier methods like DrawWireframeMesh, DrawWorldGrid etc.
  57. - Commands get queued from various Component::update methods and get executed at the end of frame. After they're executed they are cleared and need to be re-queued next frame.
  58. - Internally DirectDraw manages dynamic meshes so it can merge multiple DrawLine calls into one and such. This can help performance, but generally performance of this class should not be a major concern.
  59. - Example uses for it:
  60. - Drawing GUI element bounds when debugging GUI
  61. - Drawing a wireframe selection effect when a mesh is selected in the scene
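A rough sketch of the queue-and-flush behaviour described above, assuming all line-based commands are merged into one vertex list. DirectDrawQueue, Vec3 and the method names are placeholders for illustration; the real DirectDraw interface is not defined in these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical queue: commands are recorded during Component::update
// and consumed once at the end of the frame.
class DirectDrawQueue {
public:
    void drawLine(const Vec3& a, const Vec3& b) {
        mLineVertices.push_back(a);
        mLineVertices.push_back(b);
    }

    // A wire cube is just 12 lines, so it lands in the same merged list.
    void drawWireCube(const Vec3& min, const Vec3& max) {
        Vec3 corners[8];
        for (int i = 0; i < 8; ++i) {
            corners[i] = { (i & 1) ? max.x : min.x,
                           (i & 2) ? max.y : min.y,
                           (i & 4) ? max.z : min.z };
        }
        // Connect each corner to the corners that differ in exactly one axis.
        for (int i = 0; i < 8; ++i)
            for (int bit = 1; bit <= 4; bit <<= 1)
                if ((i & bit) == 0)
                    drawLine(corners[i], corners[i | bit]);
    }

    std::size_t queuedLineCount() const { return mLineVertices.size() / 2; }

    // End of frame: hand the merged vertex list to the renderer and
    // clear the queue, so commands must be re-queued next frame.
    std::vector<Vec3> flush() {
        std::vector<Vec3> merged;
        merged.swap(mLineVertices);
        return merged;
    }

private:
    std::vector<Vec3> mLineVertices;
};
```

The vertex list returned by flush() would be uploaded to one dynamic mesh and drawn as a line list in a single call.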
  62. <<<<Multithreaded memory allocator>>>>
  63. - Singlethreaded implementation from Game Engine Gems 2 book
  64. - Small and medium allocators separate for each thread. Memory overhead should be minimal considering how small the pages are. But performance benefits are great.
  65. - Large allocator just uses some simple form of page allocation and reuse, using atomics?
  66. - Must ensure that memory allocated on one thread can only be freed from that thread
  67. - How do I easily tell which allocator to call based on current thread? Require a thread ID with each alloc/dealloc?
  68. - Need to think this through more
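One possible answer to the "which allocator for the current thread" question, as a sketch only: keep the allocator in thread_local storage so no thread ID has to be passed around, and store the owning allocator in a small header in front of each block so a free from the wrong thread can be detected. ThreadLocalAllocator and currentThreadAllocator are hypothetical names, and malloc stands in for the real small/medium page allocators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical per-thread allocator front end.
class ThreadLocalAllocator {
public:
    void* alloc(std::size_t size) {
        void* raw = std::malloc(sizeof(Header) + size);
        if (raw == nullptr)
            throw std::bad_alloc();
        Header* header = static_cast<Header*>(raw);
        header->owner = this; // remember which allocator owns the block
        ++mLiveAllocations;
        return header + 1;    // user memory starts right after the header
    }

    // Returns false (and frees nothing) if the block belongs to another
    // allocator, i.e. it was allocated on a different thread.
    bool free(void* ptr) {
        Header* header = static_cast<Header*>(ptr) - 1;
        if (header->owner != this)
            return false;
        --mLiveAllocations;
        std::free(header);
        return true;
    }

    std::size_t liveAllocations() const { return mLiveAllocations; }

private:
    // Aligned so the user memory after the header keeps maximum alignment.
    struct alignas(std::max_align_t) Header {
        ThreadLocalAllocator* owner;
    };

    std::size_t mLiveAllocations = 0;
};

// Each thread gets its own instance the first time it calls this.
inline ThreadLocalAllocator& currentThreadAllocator() {
    thread_local ThreadLocalAllocator instance;
    return instance;
}
```

The header costs a few bytes per allocation, which may or may not be acceptable for the small allocator. A debug-only header would be another option.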
  69. <<<<More on memory allocator>>>>
  70. - Regarding potentially often allocating large amounts of memory:
  71. - Ignore this for now. Allocating large amounts of memory (16K+) often probably won't be the case. This will only happen when modifying textures or meshes and I can assume there won't be many such updates.
  72. - (But there will be multiple such updates per frame when it comes to GUI meshes for example)
  73. - However I should implement an allocation counter in my allocator so I can know if I have a bottleneck.
  74. - For those allocations that do hit this limit I should implement a FrameAllocator. Memory is allocated during simulation step and the entire block is cleared when the frame ends.
  75. - Allocations like copying MeshData, PixelData, PassParams, etc. when queuing commands for render thread should all be using this.
  76. - The problem with such an allocator is safety
  77. - Allocations that are created and deleted in a single function should use a Stack allocator
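A minimal sketch of the FrameAllocator idea above: a bump allocator over one fixed block that is reset when the frame ends, with the allocation counter mentioned earlier. The class layout and method names are assumptions for illustration. It also shows the safety problem: nothing stops a pointer from being used after clear().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame allocator: memory is handed out linearly during
// the simulation step and released all at once at the end of the frame.
class FrameAllocator {
public:
    explicit FrameAllocator(std::size_t capacity) : mBuffer(capacity) {}

    // Returns nullptr when the frame block is exhausted.
    void* alloc(std::size_t size,
                std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t base =
            reinterpret_cast<std::uintptr_t>(mBuffer.data());
        // Round the next free address up to the requested alignment.
        const std::uintptr_t aligned =
            (base + mOffset + alignment - 1) &
            ~(static_cast<std::uintptr_t>(alignment) - 1);
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > mBuffer.size() || size > mBuffer.size() - start)
            return nullptr;
        mOffset = start + size;
        ++mAllocationCount;
        return mBuffer.data() + start;
    }

    // Called when the frame ends. Individual frees do not exist;
    // every pointer handed out this frame becomes invalid here.
    void clear() {
        mOffset = 0;
        mAllocationCount = 0;
    }

    std::size_t bytesUsed() const { return mOffset; }
    std::size_t allocationCount() const { return mAllocationCount; }

private:
    std::vector<unsigned char> mBuffer;
    std::size_t mOffset = 0;
    std::size_t mAllocationCount = 0;
};
```

A Stack allocator for allocations made and released within a single function could reuse the same bump logic, with a marker that is saved on entry and restored on exit instead of a full clear().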
  78. <<<<Resource changes and reimport>>>>
  79. Use case example:
  80. - User reimports a .gpuproginc file
  81. - Dependencies are held by CoreGpuObjectManager. Objects are required to register/unregister
  82. their dependencies in their methods manually.
  83. - Editor calls SomeClass::reimport(handle) after it detects a change
  84. - this will load a new resource, release the old one and assign the new one to Handle without changing the GUID
  85. - In order to make it thread safe force all threads to finish what they're doing and pause them until the switch is done
  86. - Should be okay considering this should only happen in edit-mode so performance isn't imperative
  87. - reimport is recursively called on all dependent objects as well.
  88. <<<<Handle multithreaded object management>>>>
  89. - Make everything that is possible immutable. Once created it can't be changed.
  90. - Example are shaders, state objects and similar
  91. - Things like Textures, Vertex, Index buffers, GpuParams may be changed
  92. - Make Vertex/Index buffers and similar only accessible from render thread. Higher level classes like meshes can have deferred methods
  93. - TODO - How to handle the remaining actually deferred methods? Like Textures?
  94. DirectX11 supports concurrent drawing and resource creation so all my resource updates should be direct calls to DX methods (I'll need a deferred context?)
  95. - DX9 doesn't so creating/updating resources should wait for render thread?
  96. - Although these are sync points which kill the whole concept of separate render thread
  97. - Updating via copy then? (DX11 driver does it internally if resource is used anyway)
  98. - OpenGL? No idea, need to study GL contexts
  99. - Although it seems DX11 also copies data when mapping/unmapping or updating on a non-immediate context. So maybe copy is the solution?
  100. So final solution:
  101. - Copy all data that will be updated on a deferred context
  102. - Make deferred context have a scratch buffer it can use for storing temporary copied data
  103. - Immediate context will execute all commands right away
  104. - This applies when rendering thread calls resource create/update internally
  105. - Or when other thread blocks and waits for rendering thread
  106. - Create a simple distinction so the user knows when something is executed deferred and when immediately?
  107. - Move resource update/create methods to DeferredContext?
  108. - Not ALL methods need to be moved, only those that are resource heavy
  109. - Smaller methods may remain and always stay async, but keep internal state?
  110. - Resource creation on DX11 should be direct though, without a queue (especially if we manage to populate a resource in the same step)
  111. - Remove & replace internal data copying in GpuParamBlock (or just use an allocator instead of new())
  112. A POSSIBLY BETTER SOLUTION THAN COPYING ALL THE DATA?
  113. Classes derive from ISharedMemoryBuffer
  114. - For example PixelData, used when setting texture pixels
  115. - They have lock, unlock & clone methods
  116. - Users can choose whether they want to lock themselves out from modifying the class, or clone it, before passing it to a threaded method
  117. - Downside is that I need to do this for every class that will be used in threaded methods
  118. - Upside is that I think that is how DX handles its buffers at the moment
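A sketch of what the lock/unlock/clone interface above might look like. Only the names ISharedMemoryBuffer, PixelData, lock, unlock and clone come from the notes; everything else (the atomic flag, setPixel/getPixel, the byte storage) is an assumption for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

class ISharedMemoryBuffer {
public:
    virtual ~ISharedMemoryBuffer() = default;

    // While locked, the owner promises not to modify the data, so another
    // thread can read it without a copy.
    void lock() { mLocked.store(true, std::memory_order_release); }
    void unlock() { mLocked.store(false, std::memory_order_release); }
    bool isLocked() const { return mLocked.load(std::memory_order_acquire); }

    // Alternative to locking: hand an independent copy to the other thread.
    virtual std::unique_ptr<ISharedMemoryBuffer> clone() const = 0;

private:
    std::atomic<bool> mLocked{false};
};

class PixelData : public ISharedMemoryBuffer {
public:
    explicit PixelData(std::size_t size) : mPixels(size, 0) {}

    // Writes are refused while the buffer is locked.
    bool setPixel(std::size_t index, std::uint8_t value) {
        if (isLocked() || index >= mPixels.size())
            return false;
        mPixels[index] = value;
        return true;
    }

    std::uint8_t getPixel(std::size_t index) const { return mPixels.at(index); }

    // The clone starts unlocked and owns its own copy of the pixels.
    std::unique_ptr<ISharedMemoryBuffer> clone() const override {
        auto copy = std::make_unique<PixelData>(mPixels.size());
        copy->mPixels = mPixels;
        return std::unique_ptr<ISharedMemoryBuffer>(std::move(copy));
    }

private:
    std::vector<std::uint8_t> mPixels;
};
```

The caller would either lock() the PixelData before passing it to a threaded method and unlock() it once the render thread is done, or pass clone() and keep modifying the original.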
  119. <<<<RenderSystem needed modifications>>>>
  120. - Texture resource views (Specifying just a subresource of a texture as a shader parameter)
  121. - UAV for textures
  122. - Stream out (write vertex buffers) (DX11 and GL)
  123. - Texture buffers
  124. - Just add a special texture type? OpenGL doesn't support getting offset from within a texture buffer anyway
  125. - Tessellation (hull/domain) shader
  126. - Detachable and readable depthstencil buffer (Window buffers not required as they behave a bit differently in OpenGL)
  127. - OpenGL provides image load/store which seems to be GL UAV equivalent (http://www.opengl.org/wiki/Image_Load_Store)
  128. - Resolving MSAA textures (i.e. copying them to non-MSAA so they can be displayed on-screen). DX has ResolveSubresource, and OpenGL might have something similar.
  129. - Single and dual channel textures (especially render textures, which are very important for effects like SSAO)
  130. - Compute pipeline
  131. - Instancing (DrawInstanced) (DX11 and GL)
  132. - OpenGL append/consume buffers
  133. - Indirect drawing via indirect argument buffers
  134. - Texture arrays
  135. - Rendertargets that aren't just 2D (Volumetric (3D) render targets in particular)
  136. - Shader support for doubles
  137. - Dynamic shader linkage (Interfaces and similar)
  138. - Multisampled texture resources
  139. - Multiple adapters (multi gpu)
  140. - Passing initial data when creating a resource (DX11, but possibly GL too)
  141. - Sample mask when setting blend state (DX11, check if equivalent exists in GL)
  142. - RGBA blend factor when setting blend state (DX11, check if equivalent exists in GL)
  143. - HLSL9/HLSL11/GLSL/Cg shaders need preprocessor defines & includes
  144. - One camera -> one task (thread) approach for multithreading
  145. - Also make sure to run off a thread pool (WorkQueue class already exists that provides needed interface)
  146. - The way I handle rendering currently is to discard simulation results if gpu thread isn't finished.
  147. - This reduces input lag, but in the worst case the benefit of multithreading might be completely eliminated as
  148. the GPU ends up waiting for the CPU, just because it was a few milliseconds late. Maybe better to wait for the GPU?