
Create rfc-77-metadata-asset-relocation.md

Signed-off-by: amzn-mike <[email protected]>
amzn-mike 2 years ago
parent
commit
e6691c051b
1 changed files with 297 additions and 0 deletions

+ 297 - 0
rfcs/rfc-77-metadata-asset-relocation.md

@@ -0,0 +1,297 @@
+### Summary:
+
+This proposal is to add metadata files (files that sit alongside a source asset, with .meta appended to its name) which store a generated asset UUID (see "Phase 4 - UUIDs and management" for generation details) along with other generic information about the asset, allowing the asset to be moved/renamed.  Since the UUID is stored on disk and checked into source control, the two files can be moved around together without the UUID being lost.  Previously the UUID had to be generated on request and was based on information that could change: the path of the file.  By saving the UUID in a metadata file, we provide a stable UUID which will not change regardless of any modifications made to the file.  Users will be required to make sure the metadata file stays with the file it describes (that means moving/renaming them together and checking them into source control together).  System designers will be required to reference assets using UUIDs to make their systems compatible with asset relocation.
+
+### What is the relevance of this feature?
+
+Our current solution for this is a command line find-and-replace tool, but this has downsides:
+
+* Users aren't aware the tool exists, yet it is the only supported method of moving files without breaking references
+* May not handle updating references contained in all file types correctly (updating binary files in particular is a problem, as a generic find-and-replace generally won't work, requiring asset-type-specific handling)
+* May require checking out and editing many files just to move/rename a single file
+* Requires every file referencing the file being moved/renamed to be on disk, making this unusable in a modular engine where not all assets are available to a single user or where projects are too large to fit on users' local storage
+
+Metadata files are intended to provide an improved solution that will allow users to rename/move files in a more natural way with fewer issues.
+
+### Feature design description:
+* Asset Processor (AP) will be updated to read/create metadata files for each source asset it processes.  Metadata files will store, at a minimum, a canonical UUID (see "Phase 4 - UUIDs and management" for generation details) and a legacy UUID (based on the path at the time of creation).
+* Metadata support will be enabled for each asset type one-at-a-time as asset types are verified to fully support relocation.
+  * This will require ensuring UUIDs are used for all references to the asset type.
+* AP will continue to support the legacy UUID and override system side-by-side for asset types which do not have metadata support enabled.
+* Intermediate assets will have their UUID scheme changed to be a hash of (SourceUUID:BuilderUuid:SubId) unless the builder overrides this by creating a metadata file with a UUID assigned. 
+  * This provides a stable UUID which is unique to each builder and builder output and allows relocation of sources which generate intermediates without having to store metadata for the intermediates.
+* Interfaces will be provided at the AzToolsFramework level for creating/reading metadata UUIDs
+* AP will interface with source control to add/modify metadata files which are generated by AP
+
+#### Example Workflows
+1. Asset of supported type is created by user
+    1. Asset processor detects the new asset, images/floor.png, and begins processing it
+    1. AP tries to read the UUID from the metadata file and doesn't find it
+    1. AP generates a new UUID and saves it to a metadata file, images/floor.png.meta
+    1. AP marks the newly generated metadata file for add to source control (but does not automatically commit the add)
+    1. AP continues processing the asset as normal, and saves it to the database using the generated UUID
+1. Asset of supported type is created by tool using API
+    1. O3DE tool generates a UUID and saves it to the metadata file where it intends to save the source, images/wall.png.meta
+    1. Tool uses UUID to listen to asset event bus to be notified when the product asset for its newly saved source asset is ready
+    1. Tool saves source asset to disk, images/wall.png
+    1. AP detects new asset and reads the UUID from the metadata file
+    1. AP processes the asset as usual and saves it to the database using the provided UUID
+    1. AP sends AssetAdded notification, tool receives and knows the asset has finished processing
+1. User moves and renames a supported asset type + metadata file pair
+    1. images/floor.png and images/floor.png.meta are moved and renamed to textures/wood.png and textures/wood.png.meta.
+    1. AP detects the old file as a delete and removes the asset and database entry.
+    1. AP detects the new file as an add and processes it, using the UUID from the metadata file.
+    1. Since the UUID is read from disk and is the same as before, all files that previously referenced the asset continue to work as though nothing has changed.
+
+### Technical design description:
+#### Key Design Tenets
+* Allow moving and renaming of files without breaking references
+* Allow enabling metadata-based UUIDs on a per asset-type basis
+* Make moving/renaming files as easy as possible
+
+#### Background - How files are referred to and the overrides system
+
+Internally, AP has 3 different ways of referring to a file:
+* by absolute path
+* by scanfolder-relative path
+* by scanfolder (path or ID) + scanfolder-relative path.
+
+In the 2nd case, AP resolves the actual file on disk by calling FindFirstMatchingFile, which resolves the relative path to an absolute path by looping over every scanfolder in priority order and returning the first one that contains a matching file on disk.
+
+UUIDs for files are currently assigned based on case 2, the scanfolder-relative path.  That means that files in the same relative path across different scanfolders are assigned the same UUID.
+
+Since having multiple files with the same UUID makes it impossible to resolve a UUID to a specific path, the AP has an overrides system.  This system works by breaking case 1 down into case 3, and then calling FindFirstMatchingFile on the scanfolder-relative path.  The returned path is assumed to be the highest priority file and is the one that gets processed.  All other files with the same relative path are never processed and never assigned a UUID.  When the highest priority file is deleted, the next highest priority file is assigned the same UUID and gets processed.
+
+The key point here is that AP relies on being able to resolve relative paths to absolute paths under the assumption that only the first match matters.
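The first-match resolution described above can be sketched roughly as follows. This is only an illustration, not AP's actual implementation: `ScanFolder`, `FindFirstMatchingFileSketch`, and the injected `fileExists` callback are hypothetical names, and priority is assumed to be the order of the vector.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a scanfolder; priority is implied by vector order.
struct ScanFolder {
    int id;
    std::string root; // absolute path of the scanfolder
};

// Returns the absolute path of the first scanfolder (in priority order) that
// contains relativePath, or std::nullopt when no scanfolder has it.
// fileExists is injected so the sketch does not touch the real file system.
std::optional<std::string> FindFirstMatchingFileSketch(
    const std::vector<ScanFolder>& scanFoldersByPriority,
    const std::string& relativePath,
    const std::function<bool(const std::string&)>& fileExists)
{
    for (const ScanFolder& folder : scanFoldersByPriority) {
        const std::string candidate = folder.root + "/" + relativePath;
        if (fileExists(candidate)) {
            return candidate; // lower-priority matches are never considered
        }
    }
    return std::nullopt;
}
```

With GemA ahead of GemB in the list, a lookup of textures/blue.png returns the GemA file even when both exist, which is exactly the "only the first match matters" assumption.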
+
+#### Phase 1 - Updating the source dependency system
+
+In order to support metadata UUIDs, work must be done to update AP systems to refer to files in a non-ambiguous way using UUIDs, absolute paths, or scanfolder ID + relative path.  The first of these systems is the source dependency system.  Currently the SourceDependency database table stores all references as RelativePath (Source) → RelativePath (DependsOnSource).
+
+The database will need to be updated to store UUID (Source) → UUID/Relative path/Absolute path (DependsOnSource), where DependsOnSource is stored in whatever format is provided by the builder.  The reasoning here is that SourceDependencies can refer to files which don't exist yet and also to files which are not source assets, so it should be left up to builders to determine the best format to use (with UUID being preferred).
+
+This change will remove the built-in ability to filter queried dependencies (since UUIDs can't be filtered by file name) but this feature is only used by the Slice Dependency Browser, which is not accessible when using Prefabs.
+
+#### Phase 2 - Talking in absolute paths
+
+The second piece of supporting work is to refactor the internal AP systems to refer to every file using either case 1 or case 3 above (**in memory only**, this doesn't involve storage on disk).  Creating a SourceAssetPath type which can provide all 3 pieces of a path instead of a string would make the intent of the string more clear and make it easier to provide the required path type (case 1 or case 3) to consumers.
+
+Reasoning - metadata UUIDs can't function when AP makes the assumption that it can talk about files using case 2.  Consider the following example:
+
+> The following 2 files exist on disk.  GemA and GemB are two different scanfolders.  A metadata file is created for both and each file is assigned a UUID:
+> GemA/textures/base/blue.png - UUID = 1234
+> GemB/textures/blue.png - UUID = 4567
+> GemA/textures/base/blue.png is now moved to GemA/textures/blue.png.
+> Both files now have the same relative path (textures/blue.png) but different UUIDs. 
+
+This creates a problem:  If the AP assumes it can just process the first file, assets which reference UUID 4567 will suddenly stop working.  Therefore AP must process both files, and to do so, it must not refer to files using case 2.
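A SourceAssetPath-style type could look something like the sketch below. The struct name, its members, and the `SameFile` helper are assumptions made for illustration; the proposal only states that the type should provide all 3 pieces of a path.

```cpp
#include <cassert>
#include <string>

// Hypothetical path type carrying every piece needed for case 1 and case 3.
struct SourceAssetPathSketch {
    int scanFolderId;           // which scanfolder the file lives in
    std::string scanFolderPath; // absolute path of that scanfolder
    std::string relativePath;   // scanfolder-relative path

    // Case 1: the unambiguous absolute path.
    std::string AbsolutePath() const { return scanFolderPath + "/" + relativePath; }
};

// Two values refer to the same file only when both the scanfolder and the
// relative path match; comparing relativePath alone would be case 2.
bool SameFile(const SourceAssetPathSketch& a, const SourceAssetPathSketch& b)
{
    return a.scanFolderId == b.scanFolderId && a.relativePath == b.relativePath;
}
```

In the GemA/GemB example, both files share the relative path textures/blue.png, but `SameFile` still tells them apart because the scanfolder is part of the identity.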
+
+##### Side-by-side support
+
+One of the design goals is to enable metadata UUIDs on a per asset-type basis.  This means AP needs to continue to support the overrides system for files which still use a legacy, path-based UUID.
+To solve this, files with a metadata UUID will always be processed.  Files without a metadata file (which should be those which have not enabled metadata UUID support yet) will continue to use a path-based UUID and the override system to resolve the highest-priority scanfolder file.
+
+##### Cache Conflicts
+
+Another part of the reason for the overrides system is because all files get copied into the same relative path in the cache.  In the above example, both files would output to Cache/textures/blue.png.
+To handle this, when outputting a product, if it is the highest priority, it outputs as usual to maintain compatibility with path-based loading systems; otherwise the file will have the scanfolder ID appended to the end of its name.
+
+Example:
+* GemA/textures/blue.png → Cache/textures/blue.png
+* GemB/textures/blue.png → Cache/textures/blue.png.3
+
+We have to use the ID because the DisplayName and Portable Key for a scanfolder can and do contain illegal characters like /.
+Systems expecting to load assets by path will still be able to grab the highest priority file at the expected path.
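The naming rule above could be expressed as a small helper along these lines. `CacheProductPathSketch` and its parameters are hypothetical; how AP would actually decide which file is highest priority is not shown here.

```cpp
#include <cassert>
#include <string>

// Returns the cache output path for a product. The highest-priority file keeps
// the plain relative path; any other file gets ".<scanFolderId>" appended.
std::string CacheProductPathSketch(
    const std::string& relativePath, int scanFolderId, bool isHighestPriority)
{
    const std::string base = "Cache/" + relativePath;
    if (isHighestPriority) {
        return base;
    }
    return base + "." + std::to_string(scanFolderId);
}
```

Assuming GemB has scanfolder ID 3 and GemA has the higher priority, this reproduces the two example paths listed above.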
+
+#### Phase 3 - Adding metadata file support
+
+Each file will have its own metadata file alongside it, having the same path, with .meta appended to the file name.
+Example:
+* File: textures/blue.png
+* Metadata: textures/blue.png.meta
+
+The file will be a simple JSON file containing any generic information associated with the asset, including the UUID.
+
+We'll need to add a component and interface to the AzToolsFramework project that allows read/writes of generic objects/data to a key path.  The proposed interface would look like so:
+```c++
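+// Writes `value` (an object of type `typeId`) at `key` in the metadata file of the given source file.
+bool Set(AZ::IO::PathView absoluteFilePath, string_view key, const void* value, Uuid typeId);
+// Reads the data stored at `key` in the metadata file into `value` (an object of type `typeId`).
+bool Get(AZ::IO::PathView absoluteFilePath, string_view key, void* value, Uuid typeId);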
+```
+This system will function similarly to the settings registry, using a JSONPath as the key and storing the values in a JSON file.
+The typeId is used for serialization purposes only and will not be stored in the file.  A simple example file may look like this:
+```json
+{
+    "FileVersion": 1,
+    "UUID": {
+        "version": 1,
+        "uuid": "{AA9DC96B-DBDB-4ADF-848B-14E5E562E209}",
+        "legacyUuid": "{CA90298C-DC6D-52F1-B09E-23D209F7F965}"
+    }
+}
+```
+
+The JSON file will have an overall `FileVersion` for the metadata file format itself and each key can optionally add a version for the data being stored.  It will be up to the individual systems storing the data to handle that version.  The UUID system will include a version for the data it stores.
+
+#### Phase 4 - UUIDs and management
+
+The AP will be primarily responsible for assigning a UUID to each asset through a UUID manager.  When a new asset is created, a metadata file will be created containing the following information:
+* A primary (canonical) UUID
+    * AP will generate UUIDs randomly.
+    * This is preferred over deterministic methods (such as using the hashed contents, path, or combination) as the default as it has the least likelihood of resulting in duplicate UUIDs.  Having content-hash or path based UUIDs be the first attempt assignment could result in conflicts when files are assigned UUIDs by different users and then later both exist on the same system.
+    > * UserA creates file GemA/images/red.png and it gets assigned UUID = HASH(images/red.png) = 1234. 
+    > * UserB creates file GemB/images/red.png and it gets assigned UUID = HASH(images/red.png) = 1234. 
+    > * UserB submits his gem to source control, UserA gets latest from source control.
+    > * UserA now has 2 files on disk that both have a canonical UUID of 1234 and a legacyUUID of 1234
+    > * AP is left with no way of resolving this conflict as both UUIDs are the same and assigning a new UUID to one of the files could mistakenly cause references intended for GemB to point to the file in GemA (or vice versa).
+    
+    * Once a conflict occurs in this situation, there is no good way out of it without redoing work.  Random UUIDs should avoid this situation as much as possible.
+* A legacy UUID
+    * This will be a path-based UUID based on the path at the time of creation of the primary UUID.  This allows for existing assets to resolve their existing path-based UUID references without breaking.  AP already has facilities for providing legacy UUIDs.
+
+The UUID manager will hook into the AP's asset scanner to monitor for changes in metadata files.  When one is updated, it will update its cache and trigger a reprocess of the old and new asset UUIDs.
+* This is needed to support cases where a file is moved and a new file created at the old path with a new UUID, then checked into source control and pulled down by someone else.  It will appear as just a UUID update for the 2nd user.
+
+##### Incremental support
+
+We'll need to enable metadata UUID support per-asset to allow incremental support.  The settings registry will be used to list out supported asset types.  When the AP encounters a file of the supported type which does not have a metadata UUID, it will create one for that file (assuming the type has been migrated, see migration plan).
+
+##### Intermediate assets
+
+Intermediate assets are currently assigned a new UUID for each output to avoid subId conflicts.  This presents a problem because intermediate assets are not checked in and we cannot expect to check in metadata files for them.  They need to support relocation though just like their source assets.  If the source asset is moved, it will cause the intermediate to move, changing its UUID.
+To handle this, we'll change the UUID generation scheme for intermediate assets to be HASH(SourceUuid:BuilderUuid:SubId).  This will provide a stable UUID which is unique to each builder and builder output.
+For backwards-compatibility, AP will provide the path-based UUID as a legacy UUID using existing facilities for this.
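As a rough illustration of the HASH(SourceUuid:BuilderUuid:SubId) idea, the sketch below joins the three parts with `:` and hashes the result. FNV-1a (64-bit) is used purely as a stand-in; the proposal does not say which hash AP would use, and a real implementation would need to produce a full 128-bit UUID. `Fnv1a64` and `IntermediateIdSketch` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FNV-1a, 64-bit. A stand-in for whatever hash the engine would actually use.
std::uint64_t Fnv1a64(const std::string& text)
{
    std::uint64_t hash = 14695981039346656037ull; // FNV offset basis
    for (unsigned char c : text) {
        hash ^= c;
        hash *= 1099511628211ull; // FNV prime
    }
    return hash;
}

// Derives a stable identifier from the source UUID, builder UUID and sub ID.
// The file path is deliberately not an input, so moving the source (and with
// it the intermediate) does not change the result.
std::uint64_t IntermediateIdSketch(
    const std::string& sourceUuid, const std::string& builderUuid, std::uint32_t subId)
{
    return Fnv1a64(sourceUuid + ":" + builderUuid + ":" + std::to_string(subId));
}
```

Because the source UUID now comes from the metadata file rather than the path, the derived ID stays the same after relocation, while a different builder or sub ID yields a different ID.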
+
+##### External tool UUID interaction
+
+We'll want to provide tools like the editor and material builder facilities for creating and getting the UUID for a given path.  This allows tools to have (reserve) a UUID for an asset before the asset is created.
+
+An interface and component should be provided at the AzToolsFramework level to handle UUID read/writing with the option for generating a UUID or specifying one. Ex:
+```c++
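+// Returns the UUID stored in the metadata file for the given source file.
+AZ::Uuid GetSourceAssetId(AZ::IO::PathView absoluteFilePath);
+// Assigns a caller-specified UUID to the given source file.
+bool CreateSourceAssetId(AZ::IO::PathView absoluteFilePath, AZ::Uuid uuid);
+// Generates a UUID for the given source file and returns it.
+AZ::Uuid CreateSourceAssetId(AZ::IO::PathView absoluteFilePath);
+```
+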
+##### Conflict resolution
+
+When creating a UUID, it should be verified to not conflict with any existing UUIDs; if it does, another random UUID should be generated.
+
+If a conflict is created externally (either by external tools or simply duplicating a file), the AP should refuse to process the files with the duplicated UUID and provide an error so the user can resolve the duplicate appropriately.
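The two rules above (regenerate on an internal conflict, refuse an externally created duplicate) might be sketched as follows. `UuidSketch`, `UuidRegistrySketch`, and `RandomUuidSketch` are hypothetical; a 128-bit value is modelled as a pair of 64-bit integers rather than a real `AZ::Uuid`, and the generator is injectable so that conflicts can be simulated.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <random>
#include <string>
#include <utility>

using UuidSketch = std::pair<std::uint64_t, std::uint64_t>; // 128 random bits

// Default generator: random 128-bit value (not a formatted RFC 4122 UUID).
UuidSketch RandomUuidSketch()
{
    static std::mt19937_64 engine{std::random_device{}()};
    const std::uint64_t high = engine();
    const std::uint64_t low = engine();
    return {high, low};
}

class UuidRegistrySketch {
public:
    explicit UuidRegistrySketch(std::function<UuidSketch()> generator = RandomUuidSketch)
        : m_generate(std::move(generator)) {}

    // Assigns a fresh ID to the file, regenerating while the candidate collides.
    UuidSketch Create(const std::string& absolutePath)
    {
        UuidSketch candidate = m_generate();
        while (m_owners.count(candidate) > 0) {
            candidate = m_generate();
        }
        m_owners[candidate] = absolutePath;
        return candidate;
    }

    // Registers an ID read from an existing metadata file. Returns false on a
    // duplicate so the caller can report an error instead of processing the file.
    bool Register(const std::string& absolutePath, const UuidSketch& id)
    {
        return m_owners.emplace(id, absolutePath).second;
    }

private:
    std::function<UuidSketch()> m_generate;
    std::map<UuidSketch, std::string> m_owners; // ID -> owning file
};
```

`Create` covers the AP-generated case, while a `false` result from `Register` corresponds to the duplicated-file case, where the user has to resolve the duplicate.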
+
+##### Name casing mismatch
+
+Some operating systems (Linux) are case-sensitive while others (Windows) are not.  Users may not take care to ensure metadata files and the owning file are given the same case, so AP will need to watch for and correct metadata filenames which do not match the owning file's case.  AP does not support having multiple files with the same name with different case anyway (due to needing to always support all platforms), so this does not introduce any new limitation.
+
+##### Orphaned files and metadata files
+
+Users may forget to move/rename metadata files when moving/renaming files.  While AP is running, the file watcher can be used to detect file changes which are not applied to the metadata and update the metadata automatically (files can be matched up by checking the content hash, which AP already stores).  Since this could be undesirable behavior at times, we should add an option to the AP GUI to disable this.
+
+##### Running AP on server builds
+
+Sometimes it's not desirable to have AP generate metadata files, such as when running AP on Jenkins.  For this case, we'll add a setreg option to entirely disable the automatic generation of metadata files.
+
+##### File moving race conditions
+
+It's possible the AP could detect a "new" file and try to create a metadata file for it before the OS (in a copy operation) or source control has a chance to create the existing metadata file on disk, which would cause UUID conflicts.  To avoid this, when the file monitor notices a new file without an associated metadata file, it should wait for a brief period before sending the file for processing.
+
+### What are the advantages of the feature?
+* Renaming/moving is supported by every file system tool.
+* Users don't need to learn (about) any new interface, they only need to know to keep the metadata files together with the parent file.
+* Renaming/moving folders requires zero extra effort/knowledge.
+* Does not require any files to be checked out except the ones being moved/renamed (this does include the metadata file, which needs to be moved/renamed as well)
+* We gain a unified way of storing general data about an asset instead of having many different bespoke designs.
+* O3DE tools do not need to communicate with the AP in order to create an asset and know its ID, they can just create a metadata file with the UUID in it.
+  * This applies to external tools like Blender as well - plugins can be created that don't need to communicate with the AP if they need to use UUIDs (or want to provide their own system for generating UUIDs)
+* This solution is not new so there is some existing exposure to and understanding of this type of system.
+
+### What are the disadvantages of the feature?
+* Users need to understand what metadata files are and be sure to keep them alongside the owning file
+* The system only works with UUID references
+  * Many existing systems do not use UUIDs and may have difficulty converting to use them (for example lua includes)
+  * This may mean supporting legacy behavior forever if other solutions are not found
+  * There is a proposal for cleaning up some of the path-based reference usages
+* Source asset folders are "polluted" with metadata files everywhere
+* Metadata files must be generated by a single user to avoid conflicting UUIDs being generated
+* AP will be generating files in source folders
+
+### How will this be implemented or integrated into the O3DE environment?
+
+#### Migration
+
+Since metadata UUIDs may not be deterministically generated, it is important that only 1 user of a project creates the metadata files and commits them to source control to avoid conflicts.  Because this work will be done in phases, enabling one asset type at a time, a single project may require multiple upgrade passes, especially when new gems are enabled.  Since we'll be supporting the legacy path-based UUIDs side-by-side, users are not required to upgrade immediately; instead, we can let them upgrade on demand by providing a UI with a list of asset types to populate metadata for.  Note that this is not designed to be a setting; it's a one-way upgrade, since removing metadata files can break references.
+
+AP will need to assess which scanfolders have been upgraded for which asset type in order to provide feedback to the user.  During the startup file scan phase, AP will take note for each scanfolder if there are any metadata files for each asset type.  If no metadata files are found for any files of a given type, it will assume that type has not been migrated and show it as such.
+
+### Are there any alternatives to this feature?
+
+Below are some alternative options that were considered.  All of these options require a special tool to handle relocation.
+
+* Centralized metadata
+  * Pros
+    * Only 1 file to keep track of
+  * Cons
+    * More merge conflicts
+    * Requires a tool for relocation to occur or tedious hand editing (more time consuming than individual metadata files)
+    * Only supports UUID references
+    * File may become very large
+* Grouped metadata files (1 metadata per folder or other such grouping)
+  * Pros
+    * Fewer metadata files than individual metadata
+    * Fewer merge conflicts than centralized metadata
+  * Cons
+    * More merge conflicts than individual metadata
+    * Requires a tool for relocation to occur or tedious hand editing (more time consuming than individual metadata files)
+    * Only supports UUID references
+    * Files can still become large
+* Metadata files mirrored in separate (hidden) directory
+  * Pros
+    * Clutter is not visible to user
+    * No merge conflicts
+  * Cons
+    * Requires a tool for relocation to occur or can be manually adjusted by knowledgeable users with extra effort (more time consuming than individual metadata files)
+    * Easier to forget to add to source control
+    * Moving/renaming folders requires updating the metadata files
+    * Only supports UUID references
+* Breadcrumbs (leaving markers redirecting old locations to new locations)
+  * Pros
+    * Only relocated files need additional files on disk
+    * Works with path-based references
+  * Cons
+    * Requires a tool for relocation to occur
+    * Could run into issues with multiple files being "previously known as" the same name
+    * Can cause slowdowns having to resolve many steps
+    * Files that have been relocated a lot will have many breadcrumbs
+    * Trying to condense/cleanup the breadcrumbs devolves into the find-and-replace case below
+* Find-and-replace (existing tool)
+  * Pros
+    * No extra files on disk
+    * Works with path-based references
+  * Cons
+    * Requires a tool for relocation to occur
+    * Requires having every referenced file on disk to do a relocation
+    * May fail to update some references
+* Zipped file pairs (source + meta stored in zip file)
+  * Pros
+    * Metadata and file data are bound together, ensuring all operations are applied to both files
+  * Cons
+    * Modifying the source file becomes complicated, requiring extracting from the zip manually or monitoring changes to the file and updating the zip
+    * Zip file may prevent merging from source control
+    * May cause users to not trust source file is up-to-date
+    * Can cause silent overrides of work trying to sync zip-source with extracted source
+
+### How will users learn this feature?
+
+The feature will need a new documentation page to explain to users what metadata files are for, why they're important, and how to manage them.
+
+AP will need a tab to manage migrations, which can include an explanation of metadata files and a link to documentation.
+
+We could present a message on first use with a link to documentation (with a setting saved to the system registry to avoid showing it again).
+
+Among existing documentation, only the old find-and-replace tool's page should need an update, with a note that metadata files are the preferred way of moving/renaming supported file types.
+
+### Are there any open questions?
+* Should metadata files be optional per-file, with the option for users to generate them on-demand?
+  * Pros: If an artist checks in a bunch of files and never runs the AP, other users that start using them don't risk generating multiple, conflicting UUIDs
+  * Cons: Legacy UUIDs are likely to stick around (making the proper handling of them very important, and increasing likelihood of edge case issues) and people need to take an extra step to ensure all metadata files are generated before trying to move/rename files
+* Should metadata files merge with .assetinfo files?
+* How do we expand support to the entire engine?  Who will take on this work?
+* Will creating a metadata file for every asset cause Linux-based systems to run out of inodes?
+* How can we mitigate issues with different AssetIds being generated for the same file by different users?