OTIO 2D-Annotations Interchange specification.
This version: May, 2025 v2
Previous version: OTIO Annotations - Review-1
Sam Richards
Introduction
The goal of this format is to facilitate the interchange of annotations and notes between different review systems, or to provide a very simple offline review format for a simple review system.
The user stories are defined in Review and Annotation User Stories and Requirements.
We can break those stories into three categories:
2-D annotations on top of movies or 2-D imagery.
2-D and 3-D annotations on top of models and 3-D environments.
Annotations recorded during a live session.
The live session needs to be an extension of https://lf-aswf.atlassian.net/wiki/spaces/PRWG/pages/11274625/OTIO-Based+Synchronized+Review+Messaging#Annotation-Schemas - Can’t find link.
Similarly, model and environment annotations should be handled as a future proposal.
For this document we are going to focus on 2-D annotations on top of movies or 2-D imagery, integrating the result into OTIO.
Why OTIO
The goal is to have an application-neutral format for sharing annotations between applications. This format should work both within a facility and between vendors and clients, even if they use different production management tools.
OTIO provides a framework for tracking a list of media assets. It is highly extensible, so adding additional metadata schemas on top is fairly straightforward, and some of the more recent developments fit the requirements for annotations well, such as:
Spatial coordinate system - https://github.com/AcademySoftwareFoundation/OpenTimelineIO/pull/1219
Color Management - https://github.com/AcademySoftwareFoundation/OpenTimelineIO/discussions/1805 and https://github.com/AcademySoftwareFoundation/OpenTimelineIO/discussions/1793
Review syncing protocol - https://lf-aswf.atlassian.net/wiki/spaces/PRWG/pages/11274625/OTIO-Based+Synchronized+Review+Messaging#Annotation-Schemas - Can’t find link
Strong time framework - https://opentimelineio.readthedocs.io/en/stable/tutorials/architecture.html#otio-opentime
MultiMediaReference - MultiMediaReference
OTIOZ files - A way to bundle movie deliverables with notes for review and, for feedback, to bundle the annotation images.
Terminology
Annotation – A layer over a 2D or 3D object, typically a drawing, for a single point in time. Typically used to give feedback, but can also be used to identify a point of interest in a piece of reference.
Notes – Text-based notes that can be associated with the overall media being reviewed, but can also be associated with a single frame.
Vendor – A company that is creating some, but not all, of the content.
Client – The client for the vendor; all media has its final reviews here. Note that there may still be multiple review levels within the client studio, e.g. VFX Supervisor and then Director for final buy-off. The client studio may also have multiple vendors.
Track based annotations
Our proposal is to take advantage of the overlaying functionality of tracks by making each review a separate track. The annotation track could ideally act as a conventional timeline (for backwards compatibility with legacy systems), where each annotation frame is defined using a Clip.2 schema that references a still frame with alpha (using a PNG file).
Additionally, each annotation-frame node would also have a representation of the annotation as a vector timeline, using the ANNOTATION_1.0 timeline defined by the sync messaging - https://lf-aswf.atlassian.net/wiki/spaces/PRWG/pages/11274625/OTIO-Based+Synchronized+Review+Messaging#Annotation-Schemas - Can’t find link
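To make the one-track-per-review idea concrete, here is a minimal sketch that assembles the structure as plain Python dictionaries and serializes it as JSON. It is an illustration only: the helper names, the track and clip names, the use of Gap.1 items to position each annotation, and the frame numbers are all assumptions, and only the annotation_note field of the annotations block is filled in.

```python
import json

RATE = 60.0  # frame rate borrowed from the example clip in this document


def rational_time(value, rate=RATE):
    return {"OTIO_SCHEMA": "RationalTime.1", "rate": rate, "value": float(value)}


def time_range(start, duration, rate=RATE):
    return {
        "OTIO_SCHEMA": "TimeRange.1",
        "start_time": rational_time(start, rate),
        "duration": rational_time(duration, rate),
    }


def gap(duration):
    """Empty space that pushes the next item later in the track (assumed usage)."""
    return {"OTIO_SCHEMA": "Gap.1", "name": "", "source_range": time_range(0, duration)}


def annotation_clip(name, source_frame, note):
    """One-frame clip carrying a (heavily abbreviated) annotations block."""
    return {
        "OTIO_SCHEMA": "Clip.2",
        "name": name,
        "source_range": time_range(source_frame, 1),
        "annotations": {"OTIO_SCHEMA": "ANNOTATION_1.0", "annotation_note": note},
    }


def track(name, items):
    return {"OTIO_SCHEMA": "Track.1", "name": name, "kind": "Video", "children": items}


def timeline(name, tracks):
    stack = {"OTIO_SCHEMA": "Stack.1", "name": "tracks", "children": tracks}
    return {"OTIO_SCHEMA": "Timeline.1", "name": name, "tracks": stack}


# Hypothetical session: one media track plus one track per review level.
media = track("media", [
    {"OTIO_SCHEMA": "Clip.2", "name": "shot_010", "source_range": time_range(44190, 100)},
])
supervisor = track("review_vfx_supervisor", [
    gap(10), annotation_clip("shot_010_note", 44200, "Fix the edge"),
])
director = track("review_director", [
    gap(10), annotation_clip("shot_010_note", 44200, "This is final"),
])

doc = timeline("review_session", [media, supervisor, director])
print(json.dumps(doc, indent=2))
```

Because each review lives in its own track, a tool that does not understand annotations can still treat the file as an ordinary multi-track timeline.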
Example Annotation Frame
The annotation is shown within the clip for the specified frame, so a clip might look like:
{
  "OTIO_SCHEMA": "Clip.2",
  "metadata": {},
  "name": "Chimera_DCI4k5994p_HDR_P3PQ_%06d",
  "source_range": {
    "OTIO_SCHEMA": "TimeRange.1",
    "duration": {
      "OTIO_SCHEMA": "RationalTime.1",
      "rate": 60.0,
      "value": 1.0
    },
    "start_time": {
      "OTIO_SCHEMA": "RationalTime.1",
      "rate": 60.0,
      "value": 44200.0
    }
  },
  "effects": [],
  "markers": [],
  "enabled": true,
  "media_references": {
    "DEFAULT_MEDIA": {
      "OTIO_SCHEMA": "ImageSequenceReference.1",
      "metadata": {},
      "name": "",
      "available_range": {
        "OTIO_SCHEMA": "TimeRange.1",
        "duration": {
          "OTIO_SCHEMA": "RationalTime.1",
          "rate": 60.0,
          "value": 1.0
        },
        "start_time": {
          "OTIO_SCHEMA": "RationalTime.1",
          "rate": 60.0,
          "value": 44200.0
        }
      },
      "available_image_bounds": null,
      "target_url_base": "annotation_overlay",
      "name_prefix": "annotation_clip1",
      "name_suffix": ".png",
      "start_frame": 1,
      "frame_step": 1,
      "rate": 60.0,
      "frame_zero_padding": 4,
      "missing_frame_policy": "error"
    }
  },
  "annotations": {
    "OTIO_SCHEMA": "ANNOTATION_1.0",
    "author": "Sam Richards",
    "canvas_size": [1920, 1080],
    "creation_timestamp": "2025-01-31T16:14:00Z",
    "annotation_note": "This is final",
    "annotation_renderer": "RV-7.1",
    "annotation_commands": [
      {
        "event": "PAINT_START",
        "payload": {
          "source_index": 0,
          "paint": {
            "OTIO_SCHEMA": "Paint.1",
            "points": [],
            "rgba": [1.0, 1.0, 0.0, 1.0],
            "type": "COLOR",
            "brush": "circle",
            "visible": true,
            "name": "Paint",
            "effect_name": "Paint",
            "layer_range": {
              "OTIO_SCHEMA": "TimeRange.1",
              "start_time": {
                "OTIO_SCHEMA": "RationalTime.1",
                "value": 0.0,
                "rate": 30.0
              },
              "duration": {
                "OTIO_SCHEMA": "RationalTime.1",
                "value": 1.0,
                "rate": 30.0
              }
            },
            "hold": false,
            "ghost": false,
            "ghost_before": 3,
            "ghost_after": 3
          }
        }
      }
    ]
  }
}
To break it down, each annotation is typically for a single frame of a clip (although it could also be for a range of frames or the whole clip). If there is an actual annotation, the clip should point to an image, typically a PNG file (TODO: Determine if there is a better spec), which needs to have an alpha channel so that the frames can be overlaid on top of the actual media.
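As an illustration of how a reader might locate the overlay image from the media reference in the example clip, the sketch below builds the file URL from the ImageSequenceReference fields. It is written to mirror our understanding of how OTIO resolves image sequence frames (base, prefix, zero-padded frame number, suffix); the function name overlay_url is hypothetical and the code uses only the Python standard library rather than the OTIO API.

```python
def overlay_url(ref, image_number=0):
    """Resolve the URL of one overlay image from ImageSequenceReference-style fields.

    image_number is the zero-based index into the sequence, not the frame
    number that ends up in the file name.
    """
    frame = ref["start_frame"] + image_number * ref["frame_step"]
    sign = "-" if frame < 0 else ""
    padded = str(abs(frame)).zfill(ref["frame_zero_padding"])
    base = ref["target_url_base"]
    separator = "" if base.endswith("/") else "/"
    return f"{base}{separator}{ref['name_prefix']}{sign}{padded}{ref['name_suffix']}"


# Field values copied from the example clip in this document.
ref = {
    "target_url_base": "annotation_overlay",
    "name_prefix": "annotation_clip1",
    "name_suffix": ".png",
    "start_frame": 1,
    "frame_step": 1,
    "frame_zero_padding": 4,
}

print(overlay_url(ref))  # annotation_overlay/annotation_clip10001.png
```

Note that a name_prefix ending in a digit runs straight into the frame number, so a prefix with a trailing separator (for example "annotation_clip1.") may be easier to read.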
In the annotations block you can have the following fields:
author - Who is making the annotations, possibly based on the username, but this might need to be overwritten for some reviews.
creation_timestamp - When this annotation was created, in ISO 8601 format (e.g. 2017-05-16T10:30:56+01:00).
original_frame_number - The original frame number of the clip. Note this only applies if start_frame_number (see below) is defined.
canvas_size - An array for the width and height of the canvas being drawn on (see below for examples).
annotation_note - This is a note associated with this frame. There might not be an actual image; it might be just the note. We recommend that this at a minimum support Markdown (see below). NOTE: this note is off-screen, and is in addition to any captioned note (see below).
clip_uuid (optional) - A UUID provided by the vendor to help ensure that we are annotating the right clip. This can be any string the vendor likes, provided that it is unique within the vendor's facility. See below.
status (optional) - This can be a single string, or an array of strings. If it's an array of strings, this is used to denote the possible values of the status. This would be used in the case of a client updating the status to pass back to the vendor, and the vendor has their own definitions of statuses.
annotation_renderer - Which protocol was used to render the annotation. While we should strive to have annotation renderers that look the same most of the time, it is possible there may be proprietary renderers where the brush strokes may not be easily emulated by other renderers. In that case it is more likely that you would want to use the pre-rendered overlay.
annotation_commands (optional) - A list of commands that can be used to re-create the annotation using the sync command set.
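The field list above could be checked on import with something like the following sketch. It assumes that every field not marked "(optional)" is required, which this document does not state explicitly, and the function names are hypothetical. Only the fields whose shape is described above (timestamp, canvas size, status) are checked in detail.

```python
from datetime import datetime

# Assumption: fields not marked "(optional)" in the list above are required.
REQUIRED_FIELDS = (
    "author",
    "creation_timestamp",
    "canvas_size",
    "annotation_note",
    "annotation_renderer",
)


def parse_timestamp(text):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def validate_annotations(block):
    """Return a list of human-readable problems; an empty list means no problems found."""
    errors = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in block]

    stamp = block.get("creation_timestamp")
    if stamp is not None:
        try:
            parse_timestamp(stamp)
        except (ValueError, AttributeError, TypeError):
            errors.append("creation_timestamp is not ISO 8601")

    size = block.get("canvas_size")
    if size is not None:
        is_pair = isinstance(size, (list, tuple)) and len(size) == 2
        if not (is_pair and all(type(v) is int and v > 0 for v in size)):
            errors.append("canvas_size must be [width, height] as positive integers")

    status = block.get("status")
    if status is not None:
        is_string = isinstance(status, str)
        is_string_list = isinstance(status, list) and all(isinstance(s, str) for s in status)
        if not (is_string or is_string_list):
            errors.append("status must be a string or an array of strings")

    return errors


example = {
    "author": "Sam Richards",
    "canvas_size": [1920, 1080],
    "creation_timestamp": "2025-01-31T16:14:00Z",
    "annotation_note": "This is final",
    "annotation_renderer": "RV-7.1",
}
print(validate_annotations(example))  # []
```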
Annotation colorspaces
The annotations are assumed to be in the sRGB colorspace. However, the hope is that the results of the ASWF color-interop-forum will eventually be merged into OTIO to make it clear what the colorspace of each set of media is, which will help when combining them later.
An example of why this is important is that an annotation color could be defined by picking a color off the media and then painting it over part of the screen. If the wrong colorspaces are used, we could easily end up with a different color, misrepresenting the artistic intent.
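To give a sense of the size of the error, the sketch below uses the standard sRGB piecewise transfer function to compare a picked value interpreted correctly (as sRGB-encoded) with the same value mistakenly treated as linear light. This is a single-channel illustration under the sRGB assumption stated above; a real pipeline (for example HDR P3 PQ media) would go through a color management library rather than these two helper functions.

```python
def srgb_to_linear(value):
    """Decode one sRGB-encoded channel value in [0, 1] to linear light."""
    if value <= 0.04045:
        return value / 12.92
    return ((value + 0.055) / 1.055) ** 2.4


def linear_to_srgb(value):
    """Encode one linear-light channel value in [0, 1] as sRGB."""
    if value <= 0.0031308:
        return value * 12.92
    return 1.055 * value ** (1 / 2.4) - 0.055


picked = 0.5  # channel value picked off sRGB-encoded media

correct_linear = srgb_to_linear(picked)  # roughly 0.214
wrong_linear = picked                    # value mistakenly treated as already linear

print(f"correct: {correct_linear:.3f}, wrong: {wrong_linear:.3f}")
print(f"the wrong interpretation emits {wrong_linear / correct_linear:.1f}x the light")
```

In this case the mid-grey stroke would be rendered more than twice as bright as the pixel it was picked from.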
Text Formatting and Tagging.
As mentioned above, we recommend using Markdown for general text formatting, and additionally recommend the following conventions:
Hashtags (#tag):
This is a very common convention, borrowed from social media.
Example: This is a note about #project-alpha and #meeting-notes.
Mentions (@user):
Allows a reviewer to flag a particular user.
Standard Markdown processors will just render this as text.
Example: Hey @jane-doe, can you review this? #review-request
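A receiving tool could pull the hashtags and mentions out of a note with a couple of regular expressions, as in the sketch below. The exact character set allowed in a tag or user name is not defined in this document, so the patterns here (letters, digits, underscore and hyphen, starting with a letter or digit) are an assumption, as are the function names.

```python
import re

# Assumed syntax: '#' or '@' directly followed by a letter or digit, then any
# run of word characters or hyphens. The lookbehind skips things like
# "sam@example.com" and "##", and a Markdown heading ("# Title") is not
# matched because a space follows the '#'.
TAG_RE = re.compile(r"(?<![\w#])#([A-Za-z0-9][\w-]*)")
MENTION_RE = re.compile(r"(?<![\w@])@([A-Za-z0-9][\w-]*)")


def extract_tags(note):
    """Return hashtags in order of appearance, without the leading '#'."""
    return TAG_RE.findall(note)


def extract_mentions(note):
    """Return mentioned user names in order of appearance, without the '@'."""
    return MENTION_RE.findall(note)


note = "Hey @jane-doe, can you review this? #review-request"
print(extract_mentions(note))  # ['jane-doe']
print(extract_tags(note))      # ['review-request']
```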
TODO: Should there be other conventions, e.g. to refer to other media?
E.g.: You could have a variable hashtag that refers to underlying media metadata, such as: