|
- Reminders:
- - When I implement sRGB write, make sure GUI textures are handled properly. Right now they are read in gamma space and displayed
- as normal, but if I switch to sRGB write then gamma would be applied twice to those textures.
- - Async callbacks. I'd like to be able to assign a callback to an async method, that will execute on the calling thread once the async operation is complete.
- - For example when setting PixelData for a cursor I need to get PixelData from a texture, which is an async operation, in which case I need to block the calling thread
- until I get the result. But I'd rather apply the result once render thread is finished.
- - GUI currently doesn't batch elements belonging to different GUIWidgets because each of them has its own transform. Implement some form of instancing for DX11 and GL
- so this isn't required. GUIManager already has the ability to properly group meshes, all that is needed is a shader.
- - When specifying GUIElement layout using (for example) GUILayoutOptions::expandableX then it would be useful if I didn't have to provide the Y height, and instead make it use
- the default value for that GUIElement type. For most elements I will only be changing width, and height will remain default (labels, buttons, toggles, drop down lists, etc.)
- so this would save me from looking up the actual element height in EngineGUI.
- - GUI currently ignores tooltips in GUIContent.
- - A way to initialize BansheeEngine without RenderSystem or any kind of UI. So that it may be used for server building as well.
- - Calls to "initialize()" should be protected. For example GUIWidget and all its derived classes require the user to separately call initialize() after construction.
- However a much better option would be to wrap construction and initialization in a create() method and make initialize protected.
- - GUIWidget needs to be added using addComponent which makes "create()" method not practical. However I have trouble seeing the need for initialize(),
- it could be replaced with a variable parameter version of addComponent.
- - Profiling: Create an easy to browse list of all loaded Resources (similar to Hierarchy in Unity, just for loaded resources)
- - Possibly also for all core objects
- - GUI ignores image in GUIContent for most elements
- - Each view (i.e. camera) of the scene should be put into its own thread
- - How do I handle multiple mesh formats? Some files need animation, others don't. Some would maybe like to use QTangent, others the proper tangent frame.
- - Asset postprocessor? Imports a regular mesh using normal importers and then postprocesses it into a specialized format?
- - Load texture mips separately so we can unload HQ textures from far away objects (like UE3)
- - Add Unified shader so I can easily switch between HLSL and GLSL shaders (they need same parameters usually, just different code)
- - Maybe just add support for Cg and force everyone to use that? - I'd like to be able to just switch out renderer in a single location and that everything keeps on working without
- further modifications.
- - Port boost threads to std threads (CmThreadDefines.h)
- - Remove HardwarePixelBuffer (DX11 doesn't use it, and DX9 and OpenGL textures can be rewritten so they have its methods internally)
- - Multihead device
- - Don't forget to check out Unity DX11 documentation on how to implement DX11 features (http://docs.unity3d.com/Documentation/Manual/DirectX11.html)
- - Go to Game Engine Architecture book and make a list of Utility systems we will need (Config files, Parsers, File I/O etc)
- - Go to GEA book and read about resource managers before implementing them
- - Actually I should re-read most of the chapters in the book, or all of it
- - OpenGL non-Win32 window files haven't been properly parsed or tested
- - Since I probably can't compile them, try adding them to VS and see what intellisense says?
- - Textures and all other buffers keep a copy of their data in system memory. If there are memory constraints we might need a way to avoid this.
- - Make sure my Log system uses XML + HTML
- - There is an issue that custom-UIs won't have their mesh shared. For example most game UIs will be advanced and will
- likely use one GUIWidget per element. However currently I only perform batching within a single widget which
- doesn't help in the mentioned case.
- - Later add InputMap class in which you can bind certain actions (like move left, fire, etc.) to Keyboard, Joystick or Mouse buttons.
- - Also ensure button combinations are possible. e.g. on keyboard I might want to press F1 to open debug menu, but on joystick it might be A+B+X
- - Add a field that tracks % of resource deserialization in BinarySerializer
- - Add GL Texture buffers (They're equivalent to DX11 buffers) - http://www.opengl.org/wiki/Buffer_Texture
- - Instead of doing setThisPtr on every CoreGpuObject, use intrusive shared_ptr instead?
- - I should consider creating two special Mesh types:
- StreamMesh - constantly updated by CPU and read by GPU
- ReadMesh - written by GPU and easily read by CPU
- - OpenGL especially has no good way of reading or streaming data. It has special STREAM and COPY buffer types which I never use.
- - (EXTREMELY LOW PRIORITY) Scripting: It might be good to make Mono classes more generic and move them to BansheeEngine.
- e.g. MonoClass -> ScriptClass, where ScriptClass is just an abstract interface. Then I don't expose any Mono stuff to actually script libraries like
- SBansheeEngine. User could then fairly easily port the system to another scripting language just by implementing another ScriptSystem.
- - This would probably come with an overhead of at least one extra function call for each script call, which is currently unacceptable
- considering that most people definitely won't be writing new script systems.
- - Perhaps add code generation functionality to the engine through Mono? I know Mono has support for it through its Embed interface
- - Add instancing to engine: All I really need to add are per-instance attributes in VertexData (and MeshData). And then RenderSystem::renderInstance method that also accepts an instance count.
- - Debug console
- - Add ability to add color tags like <color=#123>
- - When showing a debug message, also provide a (clickable?) reference to Component it was triggered on (if applicable)
- - It really helps when you get an error on a Component that hundreds of SceneObjects use
- - When displaying an error with a callstack, make each line of the callstack clickable where it opens the external editor
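For the async callback reminder in the list above, one possible shape is a queue that worker threads post completion callbacks to, which the calling thread then drains once per frame. This is only a minimal sketch: `CallbackQueue`, `post` and `runPending` are hypothetical names, not existing engine classes.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical helper: collects completion callbacks posted by other threads
// and runs them on whichever thread calls runPending() (e.g. the sim thread).
class CallbackQueue {
public:
    // May be called from any thread once an async operation has finished.
    void post(std::function<void()> callback) {
        std::lock_guard<std::mutex> lock(mMutex);
        mPending.push(std::move(callback));
    }

    // Called by the owning thread, typically once per frame.
    // Returns the number of callbacks that were executed.
    std::size_t runPending() {
        std::queue<std::function<void()>> ready;
        {
            // Swap under the lock so callbacks run without holding the mutex.
            std::lock_guard<std::mutex> lock(mMutex);
            std::swap(ready, mPending);
        }
        std::size_t count = 0;
        while (!ready.empty()) {
            ready.front()();
            ready.pop();
            ++count;
        }
        return count;
    }

private:
    std::mutex mMutex;
    std::queue<std::function<void()>> mPending;
};
```

In the cursor example, the render thread would post a callback that applies the PixelData, and the calling thread would pick it up on its next `runPending()` instead of blocking.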
- Potential optimizations:
- - bulkPixelConversion is EXTREMELY poorly optimized. For each pixel it calls a separate method that repeats redundant operations.
- - UI shader resolution params for gui should be in a separate constant buffer
- ----------------------------------------------------------------------------------------------
- More detailed thought out system descriptions:
- <<<<Memory allocation critical areas>>>>
- - Binding gpu params. It gets copied in DeferredRenderContext
- - GameObjectHandle often allocates its internal data
- - ResourceHandle often allocates its internal data
- - AsyncOp allocates AsyncOpData internally
- - Deserialization, a lot of temporary allocations going on - But how much impact on performance will allocations have considering this is probably limited by disk read?
- - Creating SceneObjects and Components - I might want to pool them, as I suspect user might alloc many per frame
- - Log logMsg
- <<Multithreaded GUI rendering>>
- - Event handling and normal "update" will still be done on the main thread
- - At the beginning of each frame a GUI mesh update is queued on the GUI thread
- - Since we're queuing the update at the beginning of the frame we will be using last frame's transforms and GUI element states.
- - When queuing we need to make sure to store GUIWidget transform, and specific element states (e.g. "text" in GUILabel)
- - At the end of simulation frame wait until GUI update is complete. After both simulation and GUI updates are complete, proceed with submitting it to render system.
- <<Figure out how to store texture references in a font>>
- - Currently I store a copy of the textures but how do I automatically update the font if they change?
- - Flesh out the dependencies system?
- - I can import texture as normal, and keep it as an actual TextureHandle, only keep it hidden
- if it was created automatically (by FontImporter) for example?
- - But then who deletes the texture?
- - Set up an "internalResource" system where resources hold references to each other and also release them?
- - In inspector they can be expanded as children of the main resource, but cannot be directly modified?
- - Deleting the main resource deletes the children too
- <<<<SceneManager/SceneObject>>>>
- Two major parts:
- SceneObject update:
- - Just takes care of updating world/local transforms
- - Transforms are only updated when requested
- - Marking a transform dirty will also mark it dirty in the SceneManager
- - Only SceneObjects that have an Interactable type component (Renderer, Collider, etc.) will reside in SceneManager
- SceneManager maintains a list of world matrices, bounds and primitives
- - We can query them for various collision (ray, frustum, etc.)
- - Internally they likely use BVH or oct-tree
- - Uses a binary-tree (unlike SceneObject hierarchy which is n-ary), which is laid out neatly in memory for quick traversal.
- - What happens when object is removed/added? Tree keeps on growing, empty nodes have their space always reserved. Possibly add a method that can reduce tree size when enough empty nodes exist.
- Adding a node might increase tree size which will involve a memcpy while we increase the size.
- - (Updating an object in SceneManager should optionally transfer its world matrix to its owner object as well)
- This separation should work fine, as scripts requesting transforms is unlikely to be something that is done often. Or not nearly as much as frustum culling and raycasting will be.
- How will Physics update objects (and when?)
- - It's unlikely there will be a massive amount of rigidbodies in the scene, so updating them should not be a huge matter of performance.
- Calling setTransform after physics simulation update (and before SceneManager update) should work fine.
- Updating children dirty whenever setPos/rot/tfrm is called is potentially slow. Can it be avoided?
- - Then again such updates will only be done from the simulation thread, usually from scripts, so it's unlikely there will be many of them
- Do I separate SceneObject and Transform?
- - I don't think I need to
- Make sure to let the user know SceneManager only gets updated after simulation, so changes to objects won't be applied right away.
- (Otherwise the user might transform something and immediately call Raycast, only for it to fail)
- - It is unlikely that query results will often be needed right after a transform change.
- So it is acceptable to implement it like this. We might add a SceneManager::forceUpdate method in case it is not acceptable.
- !!!BUG!!! - When I change parent I don't update individual local position/rotation/scale on scene object
- - Also I don't have a way of setting world pos/rot directly
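The "only update transforms when requested" idea from this section can be sketched roughly as follows. This is an illustration under heavy assumptions: `SceneNode` and `Vec3` are hypothetical stand-ins, only positions are handled (no rotation or scale), and a node can only be parented once. The early-out in `markDirty` relies on the invariant that a dirty node always has a fully dirty subtree, which keeps repeated setPos-style calls cheap.

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(const Vec3& a, const Vec3& b) {
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}

// Hypothetical stand-in for SceneObject; translation only.
class SceneNode {
public:
    // Sketch limitation: does not detach from a previous parent.
    void setParent(SceneNode& parent) {
        mParent = &parent;
        parent.mChildren.push_back(this);
        markDirty();
    }

    void setLocalPosition(const Vec3& position) {
        mLocal = position;
        markDirty();
    }

    // The world position is recomputed lazily, only when someone asks for it.
    const Vec3& worldPosition() {
        if (mDirty) {
            mWorld = mParent ? mParent->worldPosition() + mLocal : mLocal;
            mDirty = false;
            ++mRecomputeCount;
        }
        return mWorld;
    }

    int recomputeCount() const { return mRecomputeCount; }

private:
    void markDirty() {
        if (mDirty) return; // subtree is already dirty, nothing to propagate
        mDirty = true;
        for (SceneNode* child : mChildren) child->markDirty();
    }

    SceneNode* mParent = nullptr;
    std::vector<SceneNode*> mChildren;
    Vec3 mLocal{0.0f, 0.0f, 0.0f};
    Vec3 mWorld{0.0f, 0.0f, 0.0f};
    bool mDirty = true;
    int mRecomputeCount = 0;
};
```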
- <<<<Reducing render state changes>>>>
- - Transparent objects get sorted back to front, always
- - Opaque objects I can choose between front to back, no sort or back to front
- - Then sort based on material-pass combo, rendering all passes of the same material at once, then moving to next pass, then to next material, etc.
- - For translucent objects I need to render entire material at once, and not group by pass
- - Ignore individual state and textures changes, just sort based on material
- - Use key-based approach as described here: http://realtimecollisiondetection.net/blog/?p=86
- Questions/Notes:
- 1. Could I make use of multiple texture slots so I don't have to re-assign textures for every material when rendering translucent objects pass by pass?
- - When sorting back to front (or front to back) it's highly unlikely that there will be many objects sharing the same material next to the same depth level anyway.
- So probably ignore this problem for now, and just change the states.
- 2. Should sorting be done on main or render thread?
- - Main thread. It's highly unlikely main thread will be using as much CPU as render thread will, so free up render thread as much as possible.
- 3. Should oct-tree queries be done on main or render thread?
- - Main thread as I would need to save and copy the state of the entire scene, in order to pass it to the render thread. Otherwise we risk race conditions.
- 4. Since render state and shader changes are much more expensive than shader constant/buffer/mesh (and even texture) changes, it might be a good idea to sort based on these,
- instead of exact material? A lot of materials might share shaders and render states but not textures.
- 5. This guy: http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BRenderstate%20change%20costs%5D%5D (who's a driver programmer) sorts all opaque objects based on shader/state
- into buckets, and then orders elements in these buckets in front to back order. This gives him best of two worlds, early z rejection and low state changes.
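A rough sketch of the key-based approach, combining the bucket idea from note 5 with back-to-front sorting for translucent objects. The bit layout, `DrawItem` and `stateBucket` are assumptions made for illustration; the linked articles describe the general technique, not this exact key.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

struct DrawItem {
    bool translucent;
    std::uint16_t stateBucket; // hypothetical id of a shader + render state combo
    float viewDepth;           // distance from the camera, assumed >= 0
    int id;
};

// Non-negative IEEE-754 floats keep their order when compared as raw bits.
inline std::uint32_t depthBits(float depth) {
    std::uint32_t bits;
    std::memcpy(&bits, &depth, sizeof(bits));
    return bits;
}

inline std::uint64_t makeSortKey(const DrawItem& item) {
    if (item.translucent) {
        // Top bit set: drawn after all opaque items, back to front.
        std::uint64_t invDepth = 0xFFFFFFFFu - depthBits(item.viewDepth);
        return (1ull << 63) | (invDepth << 16) | item.stateBucket;
    }
    // Opaque: group by shader/state bucket, then front to back inside a bucket.
    return (static_cast<std::uint64_t>(item.stateBucket) << 32) |
           depthBits(item.viewDepth);
}

inline void sortDrawItems(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return makeSortKey(a) < makeSortKey(b);
              });
}
```

Because the whole ordering lives in one integer, changing priorities (for example sorting opaque objects by depth first) only means rearranging the bit fields.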
- <<<<Input System>>>>
- - Input is currently ignoring all axes except for mouse axes
- - Remove/Improve smoothing
- - Add ability to get raw or smoothed axis input for any axis (currently you can only get mouse X and Y axes)
- - Allow the user to map axes to custom keys. e.g. Left/right axis can have A and D keys where A returns -1, and D 1
- - Add a way to handle multiple devices (e.g. 2 or more joysticks)
- - Hook up OIS joystick callbacks and test joysticks
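Mapping an axis to custom keys could look something like the sketch below, where A returns -1 and D returns 1. `Key` and `VirtualAxis` are hypothetical names used only for this illustration and are not tied to OIS or the current Input class.

```cpp
#include <set>

// Hypothetical key identifiers, just enough for the example.
enum class Key { A, D, Left, Right };

// A virtual axis driven by two keys: negative key gives -1, positive gives +1.
struct VirtualAxis {
    Key negative;
    Key positive;

    // 'held' is the set of keys currently pressed.
    // Holding both keys (or neither) yields 0.
    float value(const std::set<Key>& held) const {
        float result = 0.0f;
        if (held.count(negative) != 0) result -= 1.0f;
        if (held.count(positive) != 0) result += 1.0f;
        return result;
    }
};
```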
- <<<<DirectDraw>>>>
- - Used for quickly drawing something, usually for debug and editor purposes.
- - It consists of methods like: DrawLine, DrawPolygon, DrawCube, DrawSphere, etc.
- - It may also contain other fancier methods like DrawWireframeMesh, DrawWorldGrid etc.
- - Commands get queued from various Component::update methods and get executed at the end of frame. After they're executed they are cleared and need to be re-queued next frame.
- - Internally DirectDraw manages dynamic meshes so it can merge multiple DrawLine calls into one and such. This can help performance, but generally performance of this class should not be a major concern.
- - Example uses for it:
- - Drawing GUI element bounds when debugging GUI
- - Drawing a wireframe selection effect when a mesh is selected in the scene
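The queue-then-clear behaviour described above might be sketched like this. `DebugDrawQueue` and the `submit` callback are hypothetical; a real version would write into a dynamic mesh and hand it to the renderer, which is left out here.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical sketch: line commands are merged into one vertex list,
// submitted once at the end of the frame, then cleared.
class DebugDrawQueue {
public:
    // Queued from update methods; each line adds two vertices.
    void drawLine(const Vec3& from, const Vec3& to) {
        mLineVertices.push_back(from);
        mLineVertices.push_back(to);
    }

    std::size_t pendingLineCount() const { return mLineVertices.size() / 2; }

    // Called at the end of the frame. 'submit' receives all queued lines as a
    // single batch; afterwards the queue is empty and must be refilled next frame.
    template <class SubmitFn>
    void flush(SubmitFn&& submit) {
        if (!mLineVertices.empty())
            submit(mLineVertices);
        mLineVertices.clear();
    }

private:
    std::vector<Vec3> mLineVertices;
};
```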
- <<<<Multithreaded memory allocator>>>>
- - Singlethreaded implementation from Game Engine Gems 2 book
- - Small and medium allocators separate for each thread. Memory overhead should be minimal considering how small the pages are. But performance benefits are great.
- - Large allocator just uses some simple form of page allocation and reuse, using atomics?
- - Must ensure that memory allocated on one thread can only be freed from that thread
- - How do I easily tell which allocator to call based on current thread? Require a thread ID with each alloc/dealloc?
- - Need to think this through more
-
- <<<<More on memory allocator>>>>
- - Regarding potentially often allocating large amounts of memory:
- - Ignore this for now. Allocating large amounts of memory (16K+) often probably won't be the case. This will only happen when modifying textures or meshes and I can assume there won't be many of such updates.
- - (But there will be multiple such updates per frame when it comes to GUI meshes for example)
- - However I should implement allocation counter in my allocator so I can know if I have a bottleneck.
- - For those allocations that do hit this limit I should implement a FrameAllocator. Memory is allocated during simulation step and the entire block is cleared when the frame ends.
- - Allocations like copying MeshData, PixelData, PassParams, etc. when queuing commands for render thread should all be using this.
- - Problem with such allocator is safety
- - Allocations that are created and deleted in a single function should use a Stack allocator
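A FrameAllocator along the lines described above could be as small as a bump pointer over one block. This is a sketch under assumptions: the class layout, `allocCount` and `bytesUsed` are hypothetical, and the allocation counter is only there to help spot a bottleneck as mentioned in the notes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: memory is handed out linearly during the simulation
// step and the whole block is released at once when the frame ends.
class FrameAllocator {
public:
    explicit FrameAllocator(std::size_t capacity) : mBuffer(capacity) {}

    // Returns nullptr if the block is exhausted. Alignment must be a power of two.
    void* alloc(std::size_t size,
                std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t base =
            reinterpret_cast<std::uintptr_t>(mBuffer.data());
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (base + mOffset + mask) & ~mask;
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start + size > mBuffer.size()) return nullptr;
        mOffset = start + size;
        ++mAllocCount;
        return mBuffer.data() + start;
    }

    // Called at the end of the frame; every pointer handed out becomes invalid.
    void clear() {
        mOffset = 0;
        mAllocCount = 0;
    }

    std::size_t bytesUsed() const { return mOffset; }
    std::size_t allocCount() const { return mAllocCount; }

private:
    std::vector<unsigned char> mBuffer;
    std::size_t mOffset = 0;
    std::size_t mAllocCount = 0;
};
```

The safety problem noted above shows up directly in `clear()`: nothing stops a caller from holding on to a pointer past the end of the frame, so a debug build would probably want to poison the block or track outstanding allocations.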
- <<<<Resource changes and reimport>>>>
- Use case example:
- - User reimports a .gpuproginc file
- - Dependencies are held by CoreGpuObjectManager. Objects are required to register/unregister
- their dependencies in their methods manually.
- - Editor calls SomeClass::reimport(handle) after it detects a change
- - this will load a new resource, release the old one and assign the new one to Handle without changing the GUID
- - In order to make it thread safe force all threads to finish what they're doing and pause them until the switch is done
- - Should be okay considering this should only happen in edit-mode so performance isn't imperative
- - reimport is recursively called on all dependent objects as well.
- <<<<Handle multithreaded object management>>>>
- - Make everything that is possible immutable. Once created it can't be changed.
- - Example are shaders, state objects and similar
- - Things like Textures, Vertex, Index buffers, GpuParams may be changed
- - Make Vertex/Index buffers and similar only accessible from render thread. Higher level classes like meshes can have deferred methods
- - TODO - How to handle the remaining actually deferred methods? Like Textures?
- DirectX11 supports concurrent drawing and resource creation so all my resource updates should be direct calls to DX methods (I'll need a deferred context?)
- - DX9 doesn't so creating/updating resources should wait for render thread?
- - Although these are sync points which kill the whole concept of separate render thread
- - Updating via copy then? (DX11 driver does it internally if resource is used anyway)
- - OpenGL? No idea, need to study GL contexts
- - Although it seems DX11 also copies data when mapping/unmapping or updating on a non-immediate context. So maybe copy is the solution?
- So final solution:
- - Copy all data that will be updated on a deferred context
- - Make deferred context have a scratch buffer it can use for storing temporary copied data
- - Immediate context will execute all commands right away
- - This applies when rendering thread calls resource create/update internally
- - Or when other thread blocks and waits for rendering thread
- - Create a simple distinction so the user knows when something is executed deferred and when immediately?
- - Move resource update/create methods to DeferredContext?
- - Not ALL methods need to be moved, only those that are resource heavy
- - Smaller methods may remain and always stay async, but keep internal state?
- - Resource creation on DX11 should be direct though, without a queue (especially if we manage to populate a resource in the same step)
- - Remove & replace internal data copying in GpuParamBlock (or just use an allocator instead of new())
- A POSSIBLY BETTER SOLUTION THAN COPYING ALL THE DATA?
- Classes derive from ISharedMemoryBuffer
- - For example PixelData, used when setting texture pixels
- - They have lock, unlock & clone methods
- - Users can choose whether they want to lock themselves out from modifying the class, or clone it, before passing it to a threaded method
- - Downside is that I need to do this for every class that will be used in threaded methods
- - Upside is that I think that is how DX handles its buffers at the moment
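To make the lock/unlock/clone idea concrete, here is a small sketch. Only the method names come from the notes; the flag-based locking, the `SketchPixelData` class and its members are assumptions, and it is not meant to mirror the engine's real PixelData. It needs C++14 for `std::make_unique`.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Sketch of the interface: a buffer can be locked against modification while
// another thread uses it, or cloned so the original stays editable.
class ISharedMemoryBuffer {
public:
    virtual ~ISharedMemoryBuffer() = default;

    void lock() { mLocked.store(true); }
    void unlock() { mLocked.store(false); }
    bool isLocked() const { return mLocked.load(); }

    // The returned copy starts out unlocked.
    virtual std::unique_ptr<ISharedMemoryBuffer> clone() const = 0;

private:
    std::atomic<bool> mLocked{false};
};

// Hypothetical stand-in for PixelData, one byte per pixel.
class SketchPixelData : public ISharedMemoryBuffer {
public:
    explicit SketchPixelData(std::size_t size) : mPixels(size, 0) {}

    // Writes are rejected while the buffer is locked.
    bool setPixel(std::size_t index, std::uint8_t value) {
        if (isLocked() || index >= mPixels.size()) return false;
        mPixels[index] = value;
        return true;
    }

    std::uint8_t getPixel(std::size_t index) const { return mPixels.at(index); }

    std::unique_ptr<ISharedMemoryBuffer> clone() const override {
        auto copy = std::make_unique<SketchPixelData>(mPixels.size());
        copy->mPixels = mPixels;
        return copy;
    }

private:
    std::vector<std::uint8_t> mPixels;
};
```

A caller would either `lock()` the data before passing it to a threaded method and wait for the render thread to `unlock()` it, or pass `clone()` and keep modifying the original.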
- <<<<RenderSystem needed modifications>>>>
- - Texture resource views (Specifying just a subresource of a texture as a shader parameter)
- - UAV for textures
- - Stream out (write vertex buffers) (DX11 and GL)
- - Texture buffers
- - Just add a special texture type? OpenGL doesn't support getting offset from within a texture buffer anyway
- - Tessellation (hull/domain) shader
- - Detachable and readable depthstencil buffer (Window buffers not required as they behave a bit differently in OpenGL)
- - OpenGL provides image load/store which seems to be GL UAV equivalent (http://www.opengl.org/wiki/Image_Load_Store)
- - Resolving MSAA textures (i.e. copying them to non-MSAA so they can be displayed on-screen). DX has ResolveSubresource, and OpenGL might have something similar.
- - Single and dual channel textures (especially render textures, which are very important for effects like SSAO)
- - Compute pipeline
- - Instancing (DrawInstanced) (DX11 and GL)
- - OpenGL append/consume buffers
- - Indirect drawing via indirect argument buffers
- - Texture arrays
- - Rendertargets that aren't just 2D (Volumetric (3D) render targets in particular)
- - Shader support for doubles
- - Dynamic shader linkage (Interfaces and similar)
- - Multisampled texture resources
- - Multiple adapters (multi gpu)
- - Passing initial data when creating a resource (DX11, but possibly GL too)
- - Sample mask when setting blend state (DX11, check if equivalent exists in GL)
- - RGBA blend factor when setting blend state(DX11, check if equivalent exists in GL)
- - HLSL9/HLSL11/GLSL/Cg shaders need preprocessor defines & includes
- - One camera -> one task (thread) approach for multithreading
- - Also make sure to run off a thread pool (WorkQueue class already exists that provides needed interface)
- - The way I handle rendering currently is to discard simulation results if gpu thread isn't finished.
- - This reduces input lag but at worst case scenario the effect of multithreading might be completely eliminated as
- GPU ends up waiting for GPU, just because it was a few milliseconds late. Maybe better to wait for GPU?
- <<<Localization notes for MUCH LATER>>>
- - It would be nice if HString identifier hash was being generated at compile time
- - I still need an easy way to edit the string table (Editor, importer or similar)
- - I might need font localization for non-standard character sets (e.g. Russian, Greek, Asian, etc.)
- - I probably don't want to use one huge set of textures containing both Latin and Asian characters but want to keep them separate
- - Also Asian sets might be too large for textures, in which case generating them at runtime might be necessary (or parsing string table and
- generating textures from only used characters)
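Generating the HString identifier hash at compile time is possible with a `constexpr` function. The sketch below uses 32-bit FNV-1a purely as an example hash; the notes do not say which hash HString uses, and `hashString` is a hypothetical name. The loop form requires C++14.

```cpp
#include <cstdint>

// 32-bit FNV-1a, usable in constant expressions (C++14 or later).
// The choice of FNV-1a is an assumption made for this sketch.
constexpr std::uint32_t hashString(const char* text) {
    std::uint32_t hash = 2166136261u; // FNV offset basis
    for (; *text != '\0'; ++text) {
        hash ^= static_cast<std::uint8_t>(*text);
        hash *= 16777619u; // FNV prime
    }
    return hash;
}

// Evaluated by the compiler; no hashing happens at runtime for literals.
static_assert(hashString("") == 2166136261u,
              "empty string hashes to the offset basis");
```

With this, an identifier such as `hashString("MainMenu.Title")` can be stored in a `constexpr` variable or used as a `switch` case label.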
|