Mirror of https://github.com/larioteo/BansheeEngine.git


			
Reminders:
  - When I implement sRGB writes, make sure GUI textures are handled properly. Right now they are read in gamma space and displayed
    as-is, but if I switch to sRGB writes then gamma would be applied twice to those textures.
  - Async callbacks. I'd like to be able to assign a callback to an async method, that will execute on the calling thread once the async operation is complete.
     - For example when setting PixelData for a cursor I need to get PixelData from a texture, which is an async operation, in which case I need to block the calling thread
	   until I get the result. But I'd rather apply the result once render thread is finished.
  - GUI currently doesn't batch elements belonging to different GUIWidgets because each of them has its own transform. Implement some form of instancing for DX11 and GL
    so this isn't required. GUIManager already has the ability to properly group meshes, all that is needed is a shader.
  - When specifying GUIElement layout using (for example) GUILayoutOptions::expandableX, it would be useful if I didn't have to provide the Y height and could instead use 
    the default value for that GUIElement type. For most elements I will only be changing width, and height will remain default (labels, buttons, toggles, drop down lists, etc.), 
	so I wouldn't need to look up the actual element height in EngineGUI.
  - GUI currently ignores tooltips in GUIContent. 
  - A way to initialize BansheeEngine without RenderSystem or any kind of UI, so that it can be used for server builds as well.
  - Calls to "initialize()" should be protected. For example GUIWidget and all its derived classes require the user to separately call initialize() after construction.
    However a much better option would be to wrap construction and initialization in a create() method and make initialize protected.
	  - GUIWidget needs to be added using addComponent which makes "create()" method not practical. However I have trouble seeing the need for initialize(),
	    it could be replaced with a variable parameter version of addComponent.
  - Profiling: Create an easy to browse list of all loaded Resources (similar to Hierarchy in Unity, just for loaded resources)
    - Possibly also for all core objects
  - GUI ignores image in GUIContent for most elements
  - Each view (i.e. camera) of the scene should be put into its own thread
  - How do I handle multiple mesh formats? Some files need animation, others don't. Some would maybe like to use QTangent, others the proper tangent frame.
    - Asset postprocessor? Imports a regular mesh using normal importers and then postprocesses it into a specialized format?
  - Load texture mips separately so we can unload HQ textures from far away objects (like UE3)
  - Add Unified shader so I can easily switch between HLSL and GLSL shaders (they need same parameters usually, just different code)
    - Maybe just add support for Cg and force everyone to use that? - I'd like to be able to just switch out renderer in a single location and that everything keeps on working without 
	  further modifications.
  - Remove HardwarePixelBuffer (DX11 doesn't use it, and DX9 and OpenGL textures can be rewritten so they have its methods internally)
  - Multihead device
  - Don't forget to check out Unity DX11 documentation on how to implement DX11 features (http://docs.unity3d.com/Documentation/Manual/DirectX11.html)
  - Go to Game Engine Architecture book and make a list of Utility systems we will need (Config files, Parsers, File I/O etc)
  - Go to GEA book and read about resource managers before implementing them
    - Actually I should re-read most of the chapters in the book, or all of it
  - OpenGL non-Win32 window files haven't been properly parsed or tested
    - Since I probably can't compile them, try adding them to VS and see what intellisense says?
  - Textures and all other buffers keep a copy of their data in system memory. If there are memory constraints we might need a way to avoid this.
  - Make sure my Log system uses XML + HTML
  - There is an issue that custom UIs won't have their meshes shared. For example, most game UIs will be advanced and will 
   likely use one GUIWidget per element. However, currently I only perform batching within a single widget, which 
   doesn't help in the mentioned case.
  - Later add InputMap class in which you can bind certain actions (like move left, fire, etc.) to Keyboard, Joystick or Mouse buttons.
    - Also ensure button combinations are possible. e.g. on keyboard I might want to press F1 to open debug menu, but on joystick it might be A+B+X
  - Add a field that tracks % of resource deserialization in BinarySerializer
  - Add GL Texture buffers (They're equivalent to DX11 buffers) - http://www.opengl.org/wiki/Buffer_Texture
  - Instead of doing setThisPtr on every CoreGpuObject, use intrusive shared_ptr instead?
  - I should consider creating two special Mesh types:
     StreamMesh - constantly updated by CPU and read by GPU
     ReadMesh - written by GPU and easily read by CPU
	  - OpenGL especially has no good way of reading or streaming data. It has special STREAM and COPY buffer types which I never use.
  - (EXTREMELY LOW PRIORITY) Scripting: It might be good to make Mono classes more generic and move them to BansheeEngine. 
      e.g. MonoClass -> ScriptClass, where ScriptClass is just an abstract interface. Then I don't expose any Mono stuff to actual script libraries like 
	  SBansheeEngine. User could then fairly easily port the system to another scripting language just by implementing another ScriptSystem. 
    - This would probably come with an overhead of at least one extra function call for each script call, which is currently unacceptable 
	  considering that most people definitely won't be writing new script systems.
  - Perhaps add code generation functionality to the engine through Mono? I know Mono has support for it through its Embed interface
  - Add instancing to engine: All I really need to add are per-instance attributes in VertexData (and MeshData). And then RenderSystem::renderInstance method that also accepts an instance count.
  - Debug console
    - Add ability to add color tags like <color=#123>
	- When showing a debug message, also provide a (clickable?) reference to Component it was triggered on (if applicable)
	  - It really helps when you get an error on a Component that hundreds of SceneObjects use
	- When displaying an error with a callstack, make each line of the callstack clickable where it opens the external editor
  - std::function allocates memory but I have no good way of using custom allocators, as I'd have to wrap std::bind and that seems non-trivial
  - Add a TaskScheduler profiler that neatly shows time slices of each task and on which thread they are run on
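
A minimal sketch of the "async callbacks" reminder above, assuming a per-thread queue that the owning thread drains once per frame. CallbackQueue, post and runPending are hypothetical names for illustration, not existing engine classes. The render thread would call post() when the async operation completes, and the calling thread applies the result later instead of blocking.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical helper, not an existing engine class: collects completion
// callbacks from any thread and runs them on the thread that owns the queue.
class CallbackQueue
{
public:
    // Called from the worker (e.g. render) thread when an async operation is done.
    void post(std::function<void()> callback)
    {
        assert(callback && "callback must be callable");
        std::lock_guard<std::mutex> lock(mMutex);
        mPending.push(std::move(callback));
    }

    // Called by the owning thread (e.g. once per frame). Runs every pending
    // callback on the calling thread and returns how many were executed.
    std::size_t runPending()
    {
        std::queue<std::function<void()>> ready;
        {
            // Hold the lock only long enough to take the pending callbacks.
            std::lock_guard<std::mutex> lock(mMutex);
            std::swap(ready, mPending);
        }

        std::size_t executed = 0;
        while (!ready.empty())
        {
            ready.front()();
            ready.pop();
            ++executed;
        }
        return executed;
    }

private:
    std::mutex mMutex;
    std::queue<std::function<void()>> mPending;
};
```

For the cursor case, the texture read would post a callback that sets the cursor PixelData, and the main thread would call runPending() at a fixed point in its update loop.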

Potential optimizations:
 - bulkPixelConversion is EXTREMELY poorly optimized. For each pixel it calls a separate method that repeats redundant operations.
 - UI shader resolution params for gui should be in a separate constant buffer

----------------------------------------------------------------------------------------------
More detailed thought out system descriptions:

 <<<<Memory allocation critical areas>>>>
 - Binding gpu params. It gets copied in DeferredRenderContext
 - GameObjectHandle often allocates its internal data
 - ResourceHandle often allocates its internal data
 - AsyncOp allocates  AsyncOpData internally
 - Deserialization, a lot of temporary allocations going on - But how much impact on performance will allocations have considering this is probably limited by disk read?
 - Creating SceneObjects and Components - I might want to pool them, as I suspect user might alloc many per frame
 - Log logMsg

<<Multithreaded GUI rendering>>
 - Event handling and normal "update" will still be done on the main thread
 - At the beginning of each frame a GUI mesh update is queued on the GUI thread
 - Since we're queuing the update at the beginning of the frame we will be using the last frame's transforms and GUI element states.
   - When queuing we need to make sure to store GUIWidget transform, and specific element states (e.g. "text" in GUILabel)
 - At the end of simulation frame wait until GUI update is complete. After both simulation and GUI updates are complete, proceed with submitting it to render system.

<<Figure out how to store texture references in a font>>
 - Currently I store a copy of the textures but how do I automatically update the font if they change?
 - Flesh out the dependencies system?
 - I can import texture as normal, and keep it as an actual TextureHandle, only keep it hidden
   if it was created automatically (by FontImporter) for example?
    - But then who deletes the texture?
	- Set up an "internalResource" system where resources hold references to each other and also release them?
	 - In inspector they can be expanded as children of the main resource, but cannot be directly modified?
	 - Deleting the main resource deletes the children too

<<<<SceneManager/SceneObject>>>>
Two major parts:
SceneObject update:
 - Just takes care of updating world/local transforms
  - Transforms are only updated when requested
 - Marking a transform dirty will also mark it dirty in the SceneManager
 - Only SceneObjects that have an Interactable type component (Renderer, Collider, etc.) will reside in SceneManager
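
A rough sketch of the "only updated when requested" idea, assuming translation-only transforms to keep it short. SceneNode and Vec3 are hypothetical stand-ins for SceneObject and the real math types. Notifying the SceneManager about dirty objects is left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, translation-only stand-in for a full transform.
struct Vec3 { float x, y, z; };

class SceneNode
{
public:
    explicit SceneNode(SceneNode* parent = nullptr) : mParent(parent)
    {
        if (mParent != nullptr) mParent->mChildren.push_back(this);
    }

    void setLocalPosition(const Vec3& position)
    {
        mLocal = position;
        markDirty();
    }

    // The world position is recomputed only when someone asks for it.
    const Vec3& getWorldPosition()
    {
        if (mDirty)
        {
            Vec3 base{0.0f, 0.0f, 0.0f};
            if (mParent != nullptr)
            {
                base = mParent->getWorldPosition();
                assert(!mParent->mDirty); // parent was just brought up to date
            }
            mWorld = Vec3{base.x + mLocal.x, base.y + mLocal.y, base.z + mLocal.z};
            mDirty = false;
            ++mRecomputeCount;
        }
        return mWorld;
    }

    int recomputeCount() const { return mRecomputeCount; }

private:
    void markDirty()
    {
        // A dirty node always has a dirty subtree, so stop early.
        if (mDirty) return;
        mDirty = true;
        for (SceneNode* child : mChildren) child->markDirty();
    }

    SceneNode* mParent;
    std::vector<SceneNode*> mChildren;
    Vec3 mLocal{0.0f, 0.0f, 0.0f};
    Vec3 mWorld{0.0f, 0.0f, 0.0f};
    bool mDirty = true;
    int mRecomputeCount = 0;
};
```

The early-out in markDirty() keeps repeated setLocalPosition calls cheap: children are only walked the first time a clean subtree becomes dirty.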

SceneManager maintains a list of world matrices, bounds and primitives
 - We can query them for various collision (ray, frustum, etc.)
 - Internally they likely use BVH or oct-tree
 - Uses a binary-tree (unlike SceneObject hierarchy which is n-ary), which is laid out neatly in memory for quick traversal.
   - What happens when object is removed/added? Tree keeps on growing, empty nodes have their space always reserved. Possibly add a method that can reduce tree size when enough empty nodes exist. 
     Adding a node might increase tree size which will involve a memcpy while we increase the size.
 - (Updating an object in SceneManager should optionally transfer its world matrix to its owner object as well)

This separation should work fine, as scripts requesting transforms is unlikely to be something that is done often, or at least not nearly as often as frustum culling and raycasting will be.

How will Physics update objects (and when?)
 - It's unlikely there will be a massive amount of rigidbodies in the scene, so updating them should not be a huge matter of performance. 
   Calling setTransform after physics simulation update (and before SceneManager update) should work fine.

Updating children dirty whenever setPos/rot/tfrm is called is potentially slow. Can it be avoided?
 - Then again, such updates will only be done from the simulation thread, usually from scripts, so it's unlikely there will be many of them

Do I separate SceneObject and Transform?
 - I don't think I need to

Make sure to let the user know SceneManager only gets updated after simulation, so changes to objects won't be applied right away. 
(So that they don't transform something and then call Raycast, only for it to fail.)
 - It is unlikely that queries needing results right after a transform change will be used much, 
   so it is acceptable to implement it like this. We might add a SceneManager::forceUpdate method in case it is not acceptable.

!!!BUG!!! - When I change parent I don't update individual local position/rotation/scale on scene object
 - Also I don't have a way of setting world pos/rot directly

<<<<Reducing render state changes>>>>
 - Transparent objects get sorted back to front, always
 - Opaque objects I can choose between front to back, no sort or back to front
 - Then sort based on material-pass combo, rendering all passes of the same material at once, then moving to next pass, then to next material, etc.
   - For translucent objects I need to render the entire material at once, and not group by pass
   - Ignore individual state and texture changes, just sort based on material
 - Use key-based approach as described here: http://realtimecollisiondetection.net/blog/?p=86
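
A sketch of how the key-based approach could encode the rules above, assuming a 64-bit key sorted in ascending order. RenderItem, its fields and the bit layout are assumptions made for illustration. Opaque items here are sorted front to back within a material/pass group, which is only one of the options listed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical render queue entry; field names are illustrative only.
struct RenderItem
{
    std::uint16_t materialId;
    std::uint8_t passIdx;
    bool translucent;
    float depth; // normalized distance from the camera, assumed in [0, 1]
};

// Packs everything the sort cares about into one integer, most significant first.
std::uint64_t makeSortKey(const RenderItem& item)
{
    assert(item.depth >= 0.0f && item.depth <= 1.0f);
    const std::uint64_t depthBits = static_cast<std::uint64_t>(item.depth * 16777215.0f); // 24 bits
    const std::uint64_t material = item.materialId;
    const std::uint64_t pass = item.passIdx;

    if (item.translucent)
    {
        // Bit 63 puts translucent items after all opaque ones. Inverted depth
        // comes first so they are always drawn back to front.
        return (std::uint64_t(1) << 63) | ((0xFFFFFFu - depthBits) << 24) | (material << 8) | pass;
    }

    // Opaque: group by material, then by pass, then front to back within a group.
    return (material << 32) | (pass << 24) | depthBits;
}

void sortRenderItems(std::vector<RenderItem>& items)
{
    std::sort(items.begin(), items.end(), [](const RenderItem& a, const RenderItem& b)
    {
        return makeSortKey(a) < makeSortKey(b);
    });
}
```

Changing the sort policy then only means changing which fields occupy the high bits, and the sort itself stays a plain integer comparison.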

Questions/Notes:
 1. Could I make use of multiple texture slots so I don't have to re-assign textures for every material when rendering translucent objects pass by pass?
   - When sorting back to front (or front to back) it's highly unlikely that there will be many objects sharing the same material at nearly the same depth level anyway. 
     So probably ignore this problem for now, and just change the states.

 2. Should sorting be done on main or render thread?
   - Main thread. It's highly unlikely main thread will be using as much CPU as render thread will, so free up render thread as much as possible.

 3. Should oct-tree queries be done on main or render thread?
   - Main thread as I would need to save and copy the state of the entire scene, in order to pass it to the render thread. Otherwise we risk race conditions.

 4. Since render state and shader changes are much more expensive than shader constant/buffer/mesh (and even texture) changes, it might be a good idea to sort based on these,
    instead of exact material? A lot of materials might share shaders and render states but not textures.

 5. This guy: http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BRenderstate%20change%20costs%5D%5D (who's a driver programmer) sorts all opaque objects based on shader/state
    into buckets, and then orders elements in these buckets in front to back order. This gives him the best of both worlds: early z rejection and few state changes.

<<<<Input System>>>>
 - Input is currently ignoring all axes except for mouse axes
 - Remove/Improve smoothing
 - Add ability to get raw or smoothed axis input for any axis (currently you can only get mouse X and Y axes)
 - Allow the user to map axes to custom keys. e.g. Left/right axis can have A and D keys where A returns -1, and D 1
 - Add a way to handle multiple devices (e.g. 2 or more joysticks)
 - Hook up OIS joystick callbacks and test joysticks
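
A small sketch of mapping an axis to custom keys (A returns -1, D returns 1), assuming key codes are plain ints. VirtualAxes and its methods are hypothetical names. Smoothing and multiple devices are not covered.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical sketch; key codes are plain ints here rather than engine enums.
class VirtualAxes
{
public:
    // Maps a named axis to a key for the negative and the positive direction.
    void bind(const std::string& axis, int negativeKey, int positiveKey)
    {
        assert(negativeKey != positiveKey);
        mBindings[axis] = Binding{negativeKey, positiveKey};
    }

    // Fed from the raw keyboard callbacks.
    void setKeyDown(int key, bool down)
    {
        if (down) mDown.insert(key);
        else mDown.erase(key);
    }

    // Returns -1, 0 or +1. Unknown axes and opposing keys held together give 0.
    float getAxis(const std::string& axis) const
    {
        auto it = mBindings.find(axis);
        if (it == mBindings.end()) return 0.0f;

        float value = 0.0f;
        if (mDown.count(it->second.negativeKey) != 0) value -= 1.0f;
        if (mDown.count(it->second.positiveKey) != 0) value += 1.0f;
        return value;
    }

private:
    struct Binding { int negativeKey; int positiveKey; };

    std::map<std::string, Binding> mBindings;
    std::set<int> mDown;
};
```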

<<<<DirectDraw>>>>
 - Used for quickly drawing something, usually for debug and editor purposes.
 - It consists of methods like: DrawLine, DrawPolygon, DrawCube, DrawSphere, etc.
 - It may also contain other fancier methods like DrawWireframeMesh, DrawWorldGrid etc.
 - Commands get queued from various Component::update methods and get executed at the end of frame. After they're executed they are cleared and need to be re-queued next frame.
 - Internally DirectDraw manages dynamic meshes so it can merge multiple DrawLine calls into one and such. This can help performance, but generally performance of this class should not be a major concern.
 - Example uses for it: 
    - Drawing GUI element bounds when debugging GUI
    - Drawing a wireframe selection effect when a mesh is selected in the scene
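
A sketch of the queue-and-merge behaviour described above, limited to lines. DebugDrawQueue and Point3 are hypothetical names. The real class would upload the merged vertices into a dynamic mesh rather than return them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point3 { float x, y, z; };

// Hypothetical sketch of the line part of DirectDraw.
class DebugDrawQueue
{
public:
    // Called from Component::update-style code during the frame.
    void drawLine(const Point3& from, const Point3& to)
    {
        mLineVertices.push_back(from);
        mLineVertices.push_back(to);
    }

    std::size_t queuedLineCount() const { return mLineVertices.size() / 2; }

    // Called at the end of the frame: returns every queued line as one merged
    // vertex list (drawable as a single line list) and clears the queue, so
    // commands have to be queued again next frame.
    std::vector<Point3> flush()
    {
        std::vector<Point3> merged;
        merged.swap(mLineVertices);
        assert(merged.size() % 2 == 0); // every line contributes two vertices
        return merged;
    }

private:
    std::vector<Point3> mLineVertices;
};
```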

<<<<Multithreaded memory allocator>>>>
 - Singlethreaded implementation from Game Engine Gems 2 book
 - Small and medium allocators separate for each thread. Memory overhead should be minimal considering how small the pages are. But performance benefits are great.
 - Large allocator just uses some simple form of page allocation and reuse, using atomics?
 - Must ensure that memory allocated on one thread can only be freed from that thread
 - How do I easily tell which allocator to call based on current thread? Require a thread ID with each alloc/dealloc?
 - Need to think this through more
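
One possible answer to the "which allocator for the current thread" question, sketched with C++11 thread_local so no thread ID has to be passed around. ThreadAllocator and localAllocator are hypothetical names, and malloc stands in for the real small/medium page allocators. Each block remembers its owner, so a free from the wrong thread can be detected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical per-thread allocator; malloc stands in for the real pools.
class ThreadAllocator
{
public:
    void* alloc(std::size_t size)
    {
        void* raw = std::malloc(sizeof(Header) + size);
        if (raw == nullptr) throw std::bad_alloc();
        new (raw) Header{this}; // remember which allocator owns this block
        ++mLiveAllocations;
        return static_cast<unsigned char*>(raw) + sizeof(Header);
    }

    // Returns false (and frees nothing) if the block belongs to another allocator,
    // i.e. it was allocated on a different thread.
    bool free(void* ptr)
    {
        assert(ptr != nullptr);
        Header* header = reinterpret_cast<Header*>(static_cast<unsigned char*>(ptr) - sizeof(Header));
        if (header->owner != this) return false;
        --mLiveAllocations;
        std::free(header);
        return true;
    }

    std::size_t liveAllocations() const { return mLiveAllocations; }

private:
    // Header is padded so user memory keeps the strictest fundamental alignment.
    struct alignas(std::max_align_t) Header { ThreadAllocator* owner; };

    std::size_t mLiveAllocations = 0;
};

// Every thread that calls this gets its own allocator instance.
inline ThreadAllocator& localAllocator()
{
    thread_local ThreadAllocator instance;
    return instance;
}
```

The header costs one aligned slot per allocation, which may be too much for very small blocks. A page-level owner tag would be the cheaper variant.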
 
<<<<More on memory allocator>>>>
 - Regarding potentially often allocating large amounts of memory:
  - Ignore this for now. Frequently allocating large amounts of memory (16K+) probably won't be the case. This will only happen when modifying textures or meshes and I can assume there won't be many such updates.
    - (But there will be multiple such updates per frame when it comes to GUI meshes for example)
  - However I should implement allocation counter in my allocator so I can know if I have a bottleneck.
  - For those allocations that do hit this limit I should implement a FrameAllocator. Memory is allocated during simulation step and the entire block is cleared when the frame ends.
   - Allocations like copying MeshData, PixelData, PassParams, etc. when queuing commands for render thread should all be using this.
   - Problem with such allocator is safety
  - Allocations that are created and deleted in a single function should use a Stack allocator
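
A minimal FrameAllocator sketch along the lines above: a fixed block, bump allocation during the simulation step, and one clear() when the frame ends. It also carries the allocation counter mentioned earlier. The class layout and method names are assumptions. The safety problem is visible here too, since nothing stops a caller from keeping a pointer past clear().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical frame allocator: one block, bump allocation, cleared once per frame.
class FrameAllocator
{
public:
    explicit FrameAllocator(std::size_t capacity)
        : mBuffer(capacity), mOffset(0), mAllocationCount(0)
    {
    }

    // Returns nullptr when the block is exhausted. Offsets are aligned relative
    // to the block start, which assumes the block itself is suitably aligned
    // (true for alignments up to the default operator new alignment).
    void* alloc(std::size_t size, std::size_t alignment = alignof(std::max_align_t))
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0); // power of two
        const std::size_t aligned = (mOffset + alignment - 1) & ~(alignment - 1);
        if (aligned + size > mBuffer.size()) return nullptr;

        mOffset = aligned + size;
        ++mAllocationCount;
        return mBuffer.data() + aligned;
    }

    // Called when the frame ends. Every pointer handed out becomes invalid.
    void clear() { mOffset = 0; }

    // Lifetime counter, useful for spotting allocation bottlenecks.
    std::size_t allocationCount() const { return mAllocationCount; }

private:
    std::vector<unsigned char> mBuffer;
    std::size_t mOffset;
    std::size_t mAllocationCount;
};
```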

<<<<Resource changes and reimport>>>>
Use case example:
 - User reimports a .gpuproginc file
   - Dependencies are held by CoreGpuObjectManager. Objects are required to register/unregister
      their dependencies in their methods manually.
 - Editor calls SomeClass::reimport(handle) after it detects a change 
   - this will load a new resource, release the old one and assign the new one to Handle without changing the GUID
   - In order to make it thread safe force all threads to finish what they're doing and pause them until the switch is done
     - Should be okay considering this should only happen in edit-mode so performance isn't imperative
   - reimport is recursively called on all dependent objects as well.
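
A sketch of the "assign the new one to Handle without changing the GUID" step, assuming all copies of a handle share one small data block. Resource, ResourceHandle, swapResource and the free function reimport are hypothetical simplifications. Pausing the other threads and recursing into dependent objects are left out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for any loaded resource.
struct Resource { std::string contents; };

// All copies of a handle share one Data block, so swapping the resource
// there updates every copy at once.
class ResourceHandle
{
public:
    ResourceHandle(std::string guid, std::shared_ptr<Resource> resource)
        : mData(std::make_shared<Data>(Data{std::move(guid), std::move(resource)}))
    {
    }

    const std::string& guid() const { return mData->guid; }
    Resource* get() const { return mData->resource.get(); }

    // Releases the old resource (once nobody else holds it) and assigns the new one.
    // The GUID is untouched.
    void swapResource(std::shared_ptr<Resource> resource)
    {
        assert(resource != nullptr);
        mData->resource = std::move(resource);
    }

private:
    struct Data
    {
        std::string guid;
        std::shared_ptr<Resource> resource;
    };

    std::shared_ptr<Data> mData;
};

// Stand-in for SomeClass::reimport(handle): "loads" the new resource and swaps it in.
void reimport(ResourceHandle& handle, const std::string& newContents)
{
    handle.swapResource(std::make_shared<Resource>(Resource{newContents}));
}
```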

<<<<Handle multithreaded object management>>>>
 - Make everything that is possible immutable. Once created it can't be changed.
  - Example are shaders, state objects and similar
 - Things like Textures, Vertex, Index buffers, GpuParams may be changed
  - Make Vertex/Index buffers and similar only accessible from render thread. Higher level classes like meshes can have deferred methods
  - TODO - How to handle the remaining actually deferred methods? Like Textures?

DirectX11 supports concurrent drawing and resource creation so all my resource updates should be direct calls to DX methods (I'll need a deferred context?)
 - DX9 doesn't so creating/updating resources should wait for render thread?
  - Although these are sync points which kill the whole concept of separate render thread
  - Updating via copy then? (DX11 driver does it internally if resource is used anyway)
 - OpenGL? No idea, need to study GL contexts
 - Although it seems DX11 also copies data when mapping/unmapping or updating on a non-immediate context. So maybe copy is the solution?

So final solution:
 - Copy all data that will be updated on a deferred context
  - Make deferred context have a scratch buffer it can use for storing temporary copied data
 - Immediate context will execute all commands right away
  - This applies when rendering thread calls resource create/update internally
  - Or when other thread blocks and waits for rendering thread
 - Create a simple distinction so the user knows when something is executed deferred and when immediately?
  - Move resource update/create methods to DeferredContext?
    - Not ALL methods need to be moved, only those that are resource heavy
    - Smaller methods may remain and always stay async, but keep internal state?
 - Resource creation on DX11 should be direct though, without a queue (especially if we manage to populate a resource in the same step)
 - Remove & replace internal data copying in GpuParamBlock (or just use an allocator instead of new())
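
A sketch of the "copy data into a scratch buffer on the deferred context" part of the solution above. This DeferredContext is a hypothetical simplification, not the engine's DeferredRenderContext. Commands store offsets into the scratch buffer rather than pointers, because the buffer may reallocate while commands are still being queued.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical deferred context: queued updates own a copy of their data.
class DeferredContext
{
public:
    typedef std::function<void(const unsigned char*, std::size_t)> ApplyFn;

    // Copies `size` bytes into the scratch buffer, so the caller can reuse or
    // free its own memory right away, and queues `apply` for later playback.
    void queueUpdate(const void* data, std::size_t size, ApplyFn apply)
    {
        assert(data != nullptr || size == 0);
        const std::size_t offset = mScratch.size();
        const unsigned char* bytes = static_cast<const unsigned char*>(data);
        mScratch.insert(mScratch.end(), bytes, bytes + size);
        mCommands.push_back(Command{offset, size, std::move(apply)});
    }

    std::size_t pendingCommands() const { return mCommands.size(); }

    // Called on the render thread: runs all commands against the copied data,
    // then drops both the commands and the scratch memory.
    void playback()
    {
        for (const Command& command : mCommands)
            command.apply(mScratch.data() + command.offset, command.size);

        mCommands.clear();
        mScratch.clear();
    }

private:
    struct Command
    {
        std::size_t offset;
        std::size_t size;
        ApplyFn apply;
    };

    std::vector<unsigned char> mScratch;
    std::vector<Command> mCommands;
};
```

Handing the context from the queuing thread to the render thread still needs synchronization, which this sketch does not show.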

A POSSIBLY BETTER SOLUTION THAN COPYING ALL THE DATA?
Classes derive from ISharedMemoryBuffer
 - For example PixelData, used when setting texture pixels
 - They have lock, unlock & clone methods
  - Users can choose whether they want to lock themselves out from modifying the class, or clone it, before passing it to a threaded method
 - Downside is that I need to do this for every class that will be used in threaded methods
 - Upside is that I think that is how DX handles its buffers at the moment
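
A sketch of what the ISharedMemoryBuffer idea might look like. Only the lock, unlock and clone methods come from the notes. The PixelData layout and the setPixel/getPixel accessors are assumptions for illustration. The lock here is a simple flag that rejects writes, so it keeps the owner from modifying the buffer but is not a full synchronization primitive.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

class ISharedMemoryBuffer
{
public:
    virtual ~ISharedMemoryBuffer() = default;

    // While locked, the owner promises not to modify the buffer.
    void lock() { mLocked.store(true); }
    void unlock() { mLocked.store(false); }
    bool isLocked() const { return mLocked.load(); }

    // Alternative to locking: hand an independent, unlocked copy to the other thread.
    virtual std::unique_ptr<ISharedMemoryBuffer> clone() const = 0;

private:
    std::atomic<bool> mLocked{false};
};

// Hypothetical, heavily simplified PixelData.
class PixelData : public ISharedMemoryBuffer
{
public:
    explicit PixelData(std::vector<std::uint32_t> pixels) : mPixels(std::move(pixels)) {}

    // Returns false instead of writing while the buffer is locked.
    bool setPixel(std::size_t index, std::uint32_t value)
    {
        assert(index < mPixels.size());
        if (isLocked()) return false;
        mPixels[index] = value;
        return true;
    }

    std::uint32_t getPixel(std::size_t index) const
    {
        assert(index < mPixels.size());
        return mPixels[index];
    }

    std::unique_ptr<ISharedMemoryBuffer> clone() const override
    {
        return std::unique_ptr<ISharedMemoryBuffer>(new PixelData(mPixels));
    }

private:
    std::vector<std::uint32_t> mPixels;
};
```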


<<<<RenderSystem needed modifications>>>>
  - Texture resource views (Specifying just a subresource of a texture as a shader parameter)
  - UAV for textures
  - Stream out (write vertex buffers) (DX11 and GL)
  - Texture buffers 
   - Just add a special texture type? OpenGL doesn't support getting offset from within a texture buffer anyway
  - Tessellation (hull/domain) shader
  - Detachable and readable depthstencil buffer (Window buffers not required as they behave a bit differently in OpenGL)
  - OpenGL provides image load/store which seems to be GL UAV equivalent (http://www.opengl.org/wiki/Image_Load_Store)
  - Resolving MSAA textures (i.e. copying them to non-MSAA so they can be displayed on-screen). DX has ResolveSubresource, and OpenGL might have something similar.
  - Single and dual channel textures (especially render textures, which are very important for effects like SSAO)
  - Compute pipeline
  - Instancing (DrawInstanced) (DX11 and GL)
  - OpenGL append/consume buffers
  - Indirect drawing via indirect argument buffers
  - Texture arrays
  - Rendertargets that aren't just 2D (Volumetric (3D) render targets in particular)
  - Shader support for doubles
  - Dynamic shader linkage (Interfaces and similar)
  - Multisampled texture resources
  - Multiple adapters (multi gpu)
  - Passing initial data when creating a resource (DX11, but possibly GL too)
  - Sample mask when setting blend state (DX11, check if equivalent exists in GL)
  - RGBA blend factor when setting blend state(DX11, check if equivalent exists in GL)
  - HLSL9/HLSL11/GLSL/Cg shaders need preprocessor defines & includes
  - One camera -> one task (thread) approach for multithreading
   - Also make sure to run off a thread pool (WorkQueue class already exists that provides needed interface)
  - The way I handle rendering currently is to discard simulation results if gpu thread isn't finished.
     - This reduces input lag, but in the worst case the effect of multithreading might be completely eliminated as
	   the GPU thread ends up waiting for the simulation, just because it was a few milliseconds late. Maybe better to wait for GPU?


<<<Localization notes for MUCH LATER>>>
 - It would be nice if HString identifier hash was being generated at compile time
 - I still need an easy way to edit the string table (Editor, importer or similar)
 - I might need font localization for non-standard character sets (e.g. Russian, Greek, Asian, etc.)
   - I probably don't want to use one huge set of textures containing both Latin and Asian characters but want to keep them separate
   - Also Asian sets might be too large for textures, in which case generating them at runtime might be necessary (or parsing string table and
     generating textures from only used characters)
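
For the compile-time HString hash, a constexpr function is one option. The sketch below uses 32-bit FNV-1a, which is an assumption, since the notes don't say which hash HString uses. hashIdentifier and the "MainMenu.Title" identifier are hypothetical. It is written as a single-return recursion so it also compiles as C++11.

```cpp
#include <cstdint>

// 32-bit FNV-1a over a null-terminated string, usable in constant expressions.
// Recursion depth equals the string length, which is fine for short identifiers.
constexpr std::uint32_t hashIdentifier(const char* str, std::uint32_t hash = 2166136261u)
{
    return *str == '\0'
        ? hash
        : hashIdentifier(str + 1,
              (hash ^ static_cast<std::uint32_t>(static_cast<unsigned char>(*str))) * 16777619u);
}

// Forcing the result into a constexpr variable guarantees the hash is computed
// by the compiler and not at runtime.
constexpr std::uint32_t kMainMenuTitleId = hashIdentifier("MainMenu.Title");
```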