API User Guide
The Plan Model
The Aimi Sync API utilises Plans as both a representation of, and a control mechanism for, generating synchronized music and voiceovers for videos. Aimi Sync generates a human-readable Plan in JSON that details a range of music and dialogue construction plans for a supplied video. The generated Plan is returned to the API client and can be used unedited, or manipulated client-side before being used for Plan execution through the API.
The Plan Guide page provides details about working with Plans and is recommended reading before working with the Aimi Sync API.
Getting Access
To utilise the Sync API you will need an API Token. Tokens can be obtained from https://dashboard.aimi.fm under the Sync API tab. Multiple tokens can be created, and each expires after 30 days. A token can then be used as a standard Bearer token for all Sync API endpoints.
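As a minimal sketch of the token handling described above, the helpers below build a standard Bearer header and check the 30-day lifetime locally. The function names are illustrative only, and the assumption that the 30 days are counted from the token's creation time is not confirmed by this guide.

```python
from datetime import datetime, timedelta

# From the guide: each token expires after 30 days.
TOKEN_LIFETIME = timedelta(days=30)


def auth_headers(token: str) -> dict:
    """Return the HTTP header for a standard Bearer token."""
    token = token.strip()
    if not token:
        raise ValueError("API token must not be empty")
    return {"Authorization": f"Bearer {token}"}


def token_expired(created_at: datetime, now: datetime) -> bool:
    """Local expiry check.

    Assumption: the 30-day lifetime starts when the token is created.
    """
    return now >= created_at + TOKEN_LIFETIME
```

The returned dictionary can be passed as the headers argument of whichever HTTP client is used to call the Sync API endpoints.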
The Aimi Sync API Flow
Aimi Sync generates music for provided videos through five key stages. A unique taskid string is used to chain queries together through most of these stages. Documentation for the endpoints is provided on the Endpoints tab of this page. This guide is intended to provide an overview for effective pipeline development with the API.
Video Upload
The sync_upload_init, sync_upload_chunk and sync_upload_complete endpoints facilitate uploading video chunks in parallel. Files larger than 30MB should be split into chunks of at most 30MB each. Once an upload is initialised and a taskid is obtained, parallel (or sequential) calls to sync_upload_chunk can be made, each with its respective chunk index. The total number of file chunks used should then be passed to the sync_upload_complete endpoint, to combine the chunks and prepare the video for analysis.
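The chunking step can be sketched as follows. This is only an illustration of splitting a file into indexed chunks of at most 30MB. It assumes that 30MB means 30,000,000 bytes (the more conservative reading) and that chunk indices start at 0. Neither assumption is confirmed by this guide, so check the Endpoints tab before relying on them. No upload calls are made here.

```python
import io
from typing import BinaryIO, Iterator, Tuple

# Assumption: "30MB" is read as decimal megabytes, the smaller of the two
# common interpretations, so chunks never exceed the documented limit.
MAX_CHUNK_BYTES = 30 * 1000 * 1000


def chunk_count(file_size: int, chunk_size: int = MAX_CHUNK_BYTES) -> int:
    """Number of chunks needed for a file of file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


def iter_chunks(
    stream: BinaryIO, chunk_size: int = MAX_CHUNK_BYTES
) -> Iterator[Tuple[int, bytes]]:
    """Yield (index, data) pairs; index numbering from 0 is an assumption."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, data
        index += 1


# Small in-memory demonstration with a 3-byte chunk size.
demo = list(iter_chunks(io.BytesIO(b"abcdefg"), chunk_size=3))
```

Each yielded pair would be sent with one sync_upload_chunk call, and the value from chunk_count (or the number of pairs yielded) is the total to pass to sync_upload_complete.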
As the taskid is used through the other steps of the Sync API flow, multiple videos can be uploaded before moving through the flow with any particular video, simply by keeping track of the taskid string for each video.
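One way to keep track of a taskid per video is a small client-side registry. The sketch below is purely local bookkeeping: the class and field names are hypothetical and are not part of the Sync API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VideoJob:
    """Client-side record for one uploaded video (hypothetical helper)."""
    video_path: str
    taskid: str
    analysed: bool = False  # flipped once content analysis has been run


class JobRegistry:
    """Maps each taskid to its video so several uploads can be in flight."""

    def __init__(self) -> None:
        self._jobs: Dict[str, VideoJob] = {}

    def add(self, video_path: str, taskid: str) -> VideoJob:
        job = VideoJob(video_path, taskid)
        self._jobs[taskid] = job
        return job

    def get(self, taskid: str) -> VideoJob:
        return self._jobs[taskid]

    def pending_analysis(self) -> List[VideoJob]:
        """Jobs that have been uploaded but not yet analysed."""
        return [job for job in self._jobs.values() if not job.analysed]
```

Keying the registry by taskid keeps lookups simple, since the same string is reused in the later stages of the flow.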
Content Analysis
Once a video is uploaded, the video must be analysed. Video analysis processing time is affected by video length and size. See the FAQ for tips on optimising performance with Aimi Sync.
Audio analysis is required only when the video's existing audio is to be kept and mixed together with the music Aimi Sync generates. Otherwise it can be skipped.
Both video and audio analysis are deterministic and only need to be run once per video, so a single taskid is sufficient through these stages of the flow even if multiple plans or executions are desired per video.
Plan Generation
Once a video is uploaded and analysed (including audio analysis if required), a plan can be generated using the sync_generate endpoint. There are several optional plan settings that can be included in the JSON payload: with_vocals (false), with_voiceover (true), with_instrument (false) and genre ("deep"). See the Plan Guide for more on these.
An Aimi Sync Plan is returned as JSON in the response body of a successful request. This plan can be used as-is or edited as desired. The Plan Guide provides a detailed breakdown of Aimi Sync Plans and how to work with them to guide the music generation process to the desired level of detail.
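The optional settings for sync_generate could be assembled as in the sketch below. The four setting names come from this guide. Treating the values shown in parentheses as defaults is an assumption, and any other fields the endpoint requires (for example how the taskid is passed) are deliberately left out because this guide does not specify them.

```python
import json


def plan_settings(
    with_vocals: bool = False,      # assumption: parenthesised values are defaults
    with_voiceover: bool = True,
    with_instrument: bool = False,
    genre: str = "deep",
) -> dict:
    """Collect the optional plan settings named in this guide."""
    return {
        "with_vocals": with_vocals,
        "with_voiceover": with_voiceover,
        "with_instrument": with_instrument,
        "genre": genre,
    }


# Example: request vocals and keep every other setting at its assumed default.
payload = json.dumps(plan_settings(with_vocals=True))
```

Serialising with json.dumps ensures the booleans are emitted as JSON true and false rather than Python literals.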
Plan Execution
Once a plan is ready, the sync_execute endpoint can be used to start a music rendering process. For a single video and its corresponding taskid, multiple plans can be created, or the same plan can be sent for execution multiple times. Up to 10 executions can be carried out in parallel.
As noted in the Plan Guide, the same plan can generate different audio outputs, so submitting the same plan to the sync_execute endpoint multiple times can yield varied music results. It is recommended to try the same plan several times before modifying it, and parallel execution makes this approach time-efficient.
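Running the same plan several times while respecting the limit of 10 parallel executions might look like the sketch below. The run_one callable is hypothetical: it is assumed to wrap a sync_execute call and block until that execution has finished, so that the thread pool size caps the number of executions in flight.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")

# From the guide: up to 10 executions can be carried out in parallel.
MAX_PARALLEL_EXECUTIONS = 10


def execute_many(run_one: Callable[[dict], T], plan: dict, runs: int) -> List[T]:
    """Run the same plan `runs` times, never more than 10 at once.

    run_one is a caller-supplied function (hypothetical) that starts one
    execution and returns only when it has completed.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    workers = min(runs, MAX_PARALLEL_EXECUTIONS)
    # Each run gets its own copy so a callable that mutates the plan
    # cannot affect the other runs.
    plans = [copy.deepcopy(plan) for _ in range(runs)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, plans))
```

Results come back in submission order, which makes it straightforward to compare the varied outputs of an unchanged plan side by side.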
Status Check & Download
Once an execution is started, an execid is needed to check the status of the Plan execution. At the time of launch, a typical execution for a video shorter than 5 minutes takes roughly the runtime of the video, and longer videos may take longer than their runtime.
Once an execution has completed, a status check on that execid will provide a direct download link to the rendered results, which may be a muxed video or separate video and audio files in a zip archive.
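A status check loop could be sketched as below. The shape of the status response is an assumption made for illustration: the "status" and "download_url" keys and the "failed" value are hypothetical and should be replaced with the fields documented on the Endpoints tab. The check_status callable is likewise a caller-supplied wrapper around the status endpoint.

```python
import time
from typing import Callable


def wait_for_download(
    check_status: Callable[[str], dict],
    execid: str,
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until a download link is available and return it.

    Assumed response shape (hypothetical): a dict with a "status" key and,
    once complete, a "download_url" key.
    """
    deadline = clock() + timeout_seconds
    while True:
        response = check_status(execid)
        url = response.get("download_url")
        if url:
            return url
        if response.get("status") == "failed":
            raise RuntimeError(f"execution {execid} failed")
        if clock() >= deadline:
            raise TimeoutError(f"execution {execid} did not finish in time")
        sleep(poll_seconds)
```

Since a typical execution takes about as long as the video itself, a timeout of a few multiples of the video's runtime is a reasonable starting point.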