# Build pipeline
This document describes the Build pipeline in Stride, its current implementation (and legacy), and the work that should be done to improve it.

## Terminology

* An **Asset** is a design-time object containing information to generate **Content** that can be loaded at runtime. For example, a **Model asset** contains the path to a source FBX file, and additional information such as an offset for the pivot point of the model, a scale factor, a list of materials to use for this model. A **Sprite font asset**  contains a path to a source font, multiple parameters such as the size, kerning, etc. and information describing in which form it should be compiled (such as pre-rasterized, or using distance field...). **Asset** are serialized on disk using the YAML format, and are part of the data that a team developing a game should be sharing on a source control system.

* **Content** is the name given to compiled data (usually generated from **Asset**s) that can be loaded at runtime. This means that in term of format, **Content** is optimized for performance and size (using binary serialization, and data structured in a way so that the runtime can consume it without re-transforming it). Therefore **Content** is the platform-specific optimized version of your game data.

## Design

Stride uses *Content-addressable storage* to store the data generated by the compilation. The main concept is that the actual name of each generated file is the hash of the file. So if, after a change, the resulting content built from the asset is different, then the file name will be different. An index map file contains the mapping between the content *URL* and the actual hash of the corresponding file. Parameters of each compilation commands are also hashed and stored in this database, so if a command is ran again with the same parameters, the build engine can easily recover the hashes of the corresponding generated files.

### Build Engine

The build engine is the part of the infrastructure that transforms data from the **assets** into actual **content** and save it to the database. It was originally designed to build content from input similar to a makefile. (eg. "compile all files in `MyModels/*.fbx` into Stride models). It has then been changed to work with individual assets when the asset layer has been implemented. Due to this legacy, this library is still not perfectly suited or optimal to build assets in an efficient way (dependencies of build steps, management of a queue for live-compiling in the Game Studio, etc.).

#### Builder

The `Builder` class is the entry point of the build engine. A `Builder` will spawn a given number of threads, each one running a `Microthread` scheduler (see `RunUntilEnd` method).

#### Build Steps

The `Builder` takes a root `BuildStep` as input. We currently have two types of `BuildStep`s:
* A `ListBuildStep` contains a sequence of `BuildStep` (Formerly we had an additional parent class called `EnumerableBuildStep`, but it has been merged into `ListBuildStep`). A `ListBuildStep` will schedule all the build steps it contains at the same time, to be run in parallel. Formerly we had a synchronization mechanism using a special `WaitBuildStep` but it has been removed. We now use `PrerequisiteSteps` with `LinkBuildSteps` to manage dependencies.
* A `CommandBuildStep` contains a single `Command` to run, which does actual work to compile asset.

> **TODO:** Currently, when compiling a graph of build steps, we need to have all steps to compile in the root `ListBuildStep`. More especially, if we have a `ListBuildStep` container in which we want to put a step A that depends on a step B and C, we need to put A, B, C in the `ListBuildStep` container. This is cumbersome and error-prone. What we would like to do is to rely only on the `PrerequisiteSteps` of a given step to find what we have to compile. If we do so, we wouldn't need to return a `ListBuildStep` in `AssetCompilerResult`, but just the final build step for the asset, the graph of dependent build steps being described by recursive `PrerequisiteSteps`. The `ListBuildStep` container could be removed. We would still need to have lists of build steps when we compile multiple asset (eg. when compiling the full game), but it would be nothing that the build engine should be aware of.

#### Commands

Most command inherits from `IndexFileCommand`, which automatically register the output of the command into the command context.

Basically, at the beginning of the command (in the `PreCommand` method), a `BuildTransaction` object is created. This transaction contains a subset of the database of objects that have been already compiled, provided by the `ICommandContext.GetOutputObjectsGroups()`. In term of implementation, this method returns all the objects that where written by prerequisite build steps, and all the objects that are already written in any of the parent `ListBuildStep`s, recursively. The objects coming from the parent `ListBuildStep` are a legacy of when we were using `WaitBuildStep` to synchronize the build steps. This hopefully should be implemented differently, relying only on prerequisite (since no synchronization can happen in the `ListBuildStep itself, everything is run in parallel).

> **TODO:** Rewrite how OutputObjects are transfered from `BuildStep`s to other `BuildStep`s. Only the output from prerequisite `BuildStep` should be transfered. A lot of legacy makes this code very convoluted and hard to maintain.

The `BuildTransaction` created during this step is mounted as a *Microthread-local database*, which is accessible only from the current microthread (which is basically the current command).

At the end of the command (in the `PostCommand` method), every object that has been written in the database by the command are extracted from the `BuildTransaction` and registered to the current `ICommandContext` (which is how the `ICommandContext` can "flow" objects from one command to the other.

It's important to keep in mind that objects accessible in a given command (in the `DoCommandOverride`) using a `ContentManager` are those provided during the `PreCommand` step, and therefore it is important that dependencies between commands (what other commmands a command needs to be completed to start) are properly set.

### Compilers

Compilers are classes that generate a set of `BuildStep`s to compile a given `Asset` in a specific context. This list could grow in the future if we have other needs, but the current different contexts are:
- compiling the asset for the game
- compiling the asset for the scene editor
- compiling the asset to display in the preview
- compiling the asset to generate a thumbnail

#### IAssetCompiler

This is the base interface for compiler. The entry point is the `Prepare` method, which takes an `AssetItem` and returns a `AssetCompilerResult`, which is a mix of a `LoggerResult` and a  `ListBuildStep`. Usually there are two implementations per asset types, one to compile asset for the game and one to compile asset for its thumbnails. Some asset types such as animations might have an additional implementation for the preview.

Each implementation of `IAssetCompiler` must have the `AssetCompilerAttribute` attached to the class, in order to be registered (compilers are registered via the `AssetCompilerRegistry`.

> **TODO:** The `AssetCompilerRegistry` could be merged into the `AssetRegistry` to have a single location where asset-related types and meta-information are registered.

Each compiler provides a set of methods to help discover the dependencies between assets and compilers. They will be covered later in this document.

#### ICompilationContext

> Not to be mistaken with `CompilerContext` and `AssetCompilerContext`.

Contexts of compilation are defined by *types*, which allow to use inheritance mechanism to fallback on a default compiler when there is no specific compiler for a given context. Each compilation context type must implement `ICompilationContext`. Currently we have:

* `AssetCompilationContext` is the context used when we compile an asset for the runtime (ie. the game).
* `EditorGameCompilationContext` is the context used when we compile an asset for the scene editor, which is a specific runtime. Therefore, it inherits from `AssetCompilationContext`.
* `PreviewCompilationContext` is the context used when we compile an asset for the preview, which is a specific runtime. Therefore, it inherits from `AssetCompilationContext`.
* `ThumbnailCompilationContext` is the context used when we compile an asset to generate a thumbnail. Generally, for thumbnails, we compile one or several assets for the runtime, and use additional steps to generate the thumbnail with the `ThumbnailCompilationContext` (see below).

> **TODO:** Currently thumbnail compilation is in a poor state. In `ThumbnailListCompiler.Compile`, we first generate the steps to compile the asset in `PreviewCompilationContext`, then generate the steps to compile the asset in `ThumbnailCompilationContext`, and finally we like the first with the latter. Dependencies from thumbnail compilers (which load a scene and take screenshots) to the runtime compiler (which compile the asset) is **not** expressed at all. It just works now because in all current cases, the `PreviewCompilationContext` does what we need for thumbnails (for example, the `AnimationAssetPreviewCompiler` adds the preview model to the normal compilation of the animation, which is needed for both preview and thumbnail).

### Dependency managers

We currently have two mechanisms that handle dependencies.

> **TODO:** Merge the `AssetDependencyManager` and the `BuildDependencyManager` together into a single dependency manager object. There is a lot of redundancy between both, one rely on the other, some code is duplicated. See `XK-4862`

#### AssetDependencyManager

The `AssetDependencyManager` was the first implementation of an mechanism to manage dependencies between assets. It works independently of the build, which is one of the main issue it had and the reason why we started to develop a new infrastructure.

It is based essentially on visiting assets with a `DataVisitorBase` to find references to other assets. There are two ways of referencing an asset:

- Having a property whose type is an implementation of `IReference`. More explicitely the only case we have currently is `AssetReference`. This type contains an `AssetId` and a `Location` corresponding to the referenced asset.
- Having a property whose type correspond to a *Content* type, ie. a type registered as being the compiled version of an asset type (for example, `Texture` is the Content type of `TextureAsset`).

The problem of that design was that once all the references are collected, there is no way to know of the referenced assets are actually consumed, which could be one of the three following way:

- the referenced asset is not needed to compile this asset, but it's needed at runtime to use the compiled content (eg. Models need Materials, who need Textures. But you can compile Models, Materials and Textures independently).
- the referenced asset needs to be compiled before this asset, and the compiler of this asset needs to load the corresponding content generated from the referenced asset (eg. A prefab model, which aggregates multiple models together, needs the compiled version of each model it's referencing to be able to merge them).
- the referenced asset is read when compiling this asset because it depends on some of its parameter, but the referenced asset itself doesn't need to be compiled first (eg. Navigation Meshes need to read the scene asset they are related to in order to gather static colliders it contains, but they don't need to compile the scene itself).

#### BuildDependencyManager

The `BuildDependencyManager` has been introduced recently to solve the problems of the `AssetDependencyManager`. It is currently not complete, and the ultimate goal is to merge it totally with the `AssetDependencyManager`.

The approach is a bit different. Rather than extracting dependencies from the asset itself, we extract them from the compilers of the assets, which are better suited to know what they exactly need to compile the asset and what will be needed to load the asset at runtime.

But one asset type can have multiple compilers associated to it (for the game, for the thumbnail, for the preview...). So the `BuildDependencyManager` works in the context of a specific compiler.

Currently there is one `BuildDependencyManager` for each type of compiler.

> **TODO:** Have a single global instance of `BuildDependencyManager` that contains all types of dependencies for all context of compilers. For example, we have thumbnail compilers that requires *game* version of assets, which means that the `BuildDependencyManager` for thumbnails will also contain a large part of the `BuildDependencyManager` to build the game. Merging everything into a single graph would reduce redundancy and risk to trigger the same operation multiple times simultaneously.

#### AssetDependenciesCompiler

The `AssetDependenciesCompiler` is the object that computes the dependencies with the `BuildDependencyManager`, and then generates the build steps for a given asset, including the runtime dependencies. It's the main entry point of compilation for the CompilerApp, the scene editor, and the preview. Thumbnails also use it, via the `ThumbnailListCompiler` class.

> **TODO:** This class should be removed, and its content moved into the `BuildDependencyManager` class. By doing so, it should be possible to make `BuildAssetNode` and `BuildAssetLink` internal - those classes are just the data of the dependency graph, they should not be exposed publicly. To do that, a method to retrieve the dependencies in a given context must be implemented in `BuildDependencyManager` in order to fix the usage of `BuildAssetNode` in `EditorContentLoader`.

### In the Game Studio

The Game Studio compiles assets in various versions all the time. It has some specific way of managing database and content depending on the context.

Remark: the Game Studio never saves index file on the disk, it keeps the url -> hash mapping in memory, always.

#### Databases

Before accessing content to load, a Microthread-local database must be mounted. Depending on the context, it can be a database containing a scene and its dependencies (scene editor), the assets needed to create a thumbnail, an asset to display in the preview...

For the scene editor, this is handled by the `GameStudioDatabase` class. Thumbnails and preview also handle database mounting internally (in `ThumbnailGenerator` for example).

> **TODO:** See if it could be possible/useful to wrap all database-mounting in the Game Studio into the GameStudioDatabase class.

#### Builder service

All compilations that occur in the Game Studio is done through the `GameStudioBuilderService`. This class creates an instance of `Builder`, a `DynamicBuilder` which allows to feed the Builder with build steps at any time. Having a single builder for the whole Game Studio allows to control the number of threads and concurrent tasks more easily.

The `DynamicBuilder` class simply creates a thread to run the Builder on, and set a special build step, `DynamicBuildStep`, as root step of this builder. This step is permanently waiting for other child build step to be posted, and execute them.

> **TODO:** Currently the dynamic build step waits arbitratly with the `CompleteOneBuildStep` method when more than 8 assets compiling. This is a poor design because if the 8 assets are for example prefabs who contains a lot of models, materials, textures, it will block until all are done, although we could complete the thumbnails of these models/materials/textures individually. Ideally, this `await` should be removed, and a way to make sure thumbnails of assets which are compiled are created as soon as possible should be implemented.

The builder service uses `AssetBuildUnit`s as unit of compilation. A build unit corresponds to a single asset, and encapsulates the compiler and the generated build step of this asset.

#### EditorContentLoader

The scene editor needs a special behavior in term of asset loading. The main issue is that any type of asset can be modified by the user (for example a texture), and then need to be reloaded. Stride use the `ContentManager` to handle reference counting of loaded assets. With a few exception (Materials, maybe Textures), it does not support hot-swapping an asset. Therefore, when an asset needs to be reloaded, we actually need to unload and reload the *first-referencer* of this asset.

The *first-referencer* is the first asset referenced by an entity, that contains a way (in term of reference) to the asset to reload. For example, in case of a texture, we will have to reload all models that use materials that use the texture to reload.

This is done by the `EditorContentLoader` class. At initialization, this class collects all *first-referencer* assets and build them. Each time an asset is built, it is then loaded into the scene editor game, and the references (from the entity to the asset) are updated. This means that this class needs to track all first-referencers on its own and update them. This is done specifically by the `LoaderReferenceManager` object. The reference are collected from the `GameEditorChangePropagator`, an object that takes the responsibility to push synchronization of changes between the assets and the game (for all properties, including non-references). There is one instance of it per entity. When a property of an entity that contains a reference to an asset (a *first-referencer*) is modified, the propagator will trigger the work to compile and update the entity. In case of a referenced asset modified by the user, `EditorContentLoader.AssetPropertiesChanged` takes the responsibility to gather, build, unload and reload what needs to be reloaded.

## Additional Todos

> **TODO:** `GetInputFiles` exists both in `Command` and in `IAssetCompiler`. It has the same signature in both case, so it's returning information using `ObjectUrl` and `UrlType` in the compiler, where we are trying to describe dependency. That signature should be changed, so it returns information using `BuildDependencyType` and `AssetCompilationContext`, just like the GetInputTypes method. Also, the method is passed to the command via the `InputFilesGetter` which is not very nice and has to be done manually (super error-prone, we had multiple commands that were missing it!). An automated way should be provided.

> **TODO:** The current design of the build steps and list build steps is a *tree*. For this reason, same build steps are often generated multiple times and appears in multiple trees. It could be possible to cache and share the build step if the structure was a *graph* rather than a *tree*. Do to that, the `Parent` property of build steps should be removed. The main difficulty is that the way output objects of build steps flow between steps has to be rewritten.