Long-running operations
Occasionally, a service may need to expose an operation that takes a significant amount of time to complete. In these situations, it is often a poor user experience to simply block while the task runs; rather, it is better to return some kind of promise to the user, and allow the user to check back in later.
The long-running request pattern is roughly analogous to a Future in Python or Java, or a Node.js Promise. Essentially, the user is given a token that can be used to track progress and retrieve the result.
Guidance
Operations that might take a significant amount of time to complete should return a 202 Accepted response along with an Operation resource that can be used to track the status of the request and ultimately retrieve the result.
Any single operation defined in an API surface must either always return 202 Accepted along with an Operation, or never do so. A service must not return a 200 OK response with the result if it is “fast enough”, and 202 Accepted if it is not fast enough, because such behavior adds significant burdens for clients.
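The "always 202" rule can be sketched as a handler that hands back an Operation regardless of how quickly the work would finish. This is a minimal illustration, not a definitive implementation: the handler name `start_export`, the `operations/<id>` path layout, and the omitted background worker are all assumptions made for the example.

```python
import uuid
from typing import Tuple


def start_export(request: dict) -> Tuple[int, dict]:
    """Hypothetical handler for a long-running method.

    It returns 202 with an Operation on every call, even when the
    work could finish almost immediately, so clients only ever
    handle one response shape.
    """
    operation = {
        # Assumed path layout; the guidance only requires a path.
        "path": f"operations/{uuid.uuid4().hex}",
        "done": False,
    }
    # The real work would be handed to a background worker here (omitted).
    return 202, operation


status, operation = start_export({"format": "csv"})
```

Because the status code never depends on timing, a client written against this method needs no branch for a "fast" 200 OK path.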
Operation representation
The response to a long-running request must be an Operation.
Protocol buffer APIs must use the common component aep.api.Operation.
OpenAPI services must use this JSON Schema Operation schema.
Querying an operation
The service must provide an endpoint to query the status of the operation, which must accept the operation path and should not include other parameters.
The endpoint must return an Operation as described above.
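From the client's side, querying an operation usually means polling it by path until it reports completion. The sketch below assumes the Operation is a JSON-like dict with the path, done, error, and response fields; `get_operation` stands in for whatever HTTP or RPC call the service exposes, and the fake service, the `operations/123` path, and the `books/1` result are hypothetical values used only to make the example runnable.

```python
import time
from typing import Callable


def wait_for_operation(
    get_operation: Callable[[str], dict],
    path: str,
    poll_interval: float = 1.0,
    max_polls: int = 60,
) -> dict:
    """Poll an operation by path until it is done, then return its response."""
    for _ in range(max_polls):
        operation = get_operation(path)
        if operation.get("done"):
            if operation.get("error"):
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation["response"]
        time.sleep(poll_interval)
    raise TimeoutError(f"{path} did not complete after {max_polls} polls")


# A fake service that finishes on the third poll, for illustration only.
_states = iter([
    {"path": "operations/123", "done": False},
    {"path": "operations/123", "done": False},
    {"path": "operations/123", "done": True, "response": {"path": "books/1"}},
])
result = wait_for_operation(lambda p: next(_states), "operations/123", poll_interval=0)
```

A fixed polling interval keeps the sketch short; a production client would more likely back off between polls.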
Standard methods
APIs may return an Operation from the Create, Update, or Delete standard methods if appropriate. In this case, the response field must be the standard and expected response type for that standard method.
When creating or deleting a resource with a long-running request, the resource should be included in List and Get calls; however, the resource should indicate that it is not usable, generally with a state enum.
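One way to picture a long-running Create is an in-memory store in which the resource becomes visible immediately but carries a state marking it as not yet usable. The `Book` resource, the `State` values, the operation path format, and both helper functions are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class State(Enum):
    CREATING = "CREATING"
    ACTIVE = "ACTIVE"


@dataclass
class Book:
    path: str
    state: State


def begin_create(store: Dict[str, Book], path: str) -> dict:
    """Register the resource right away so List and Get can see it,
    marked as not yet usable, and return an unfinished Operation."""
    store[path] = Book(path=path, state=State.CREATING)
    return {"path": f"operations/create-{len(store)}", "done": False}


def complete_create(store: Dict[str, Book], operation: dict, path: str) -> dict:
    """Mark the resource usable and put the standard Create response
    (the resource itself) into the operation's response field."""
    store[path].state = State.ACTIVE
    operation["done"] = True
    operation["response"] = {"path": path, "state": State.ACTIVE.value}
    return operation
```

Note that the response field ends up holding the same resource representation a synchronous Create would have returned.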
Parallel requests
A resource may accept multiple requests that will work on it in parallel, but is not obligated to do so:
- Resources that accept multiple parallel requests may place them in a queue rather than work on the requests simultaneously.
- Resources that do not permit multiple requests in parallel (denying any new request until the one that is in progress finishes) must return 409 Conflict if a user attempts a parallel request, and include an error message explaining the situation.
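The second case above (rejecting parallel requests) can be sketched with a small in-memory tracker. The `ResourceLock` class and the shape of the error body are assumptions for illustration; a real service would use its canonical error representation and durable storage.

```python
from typing import Dict, Tuple


class ResourceLock:
    """Tracks which resources have an operation in progress (in-memory sketch)."""

    def __init__(self) -> None:
        self._busy: Dict[str, str] = {}  # resource path -> operation path

    def start(self, resource_path: str, operation_path: str) -> Tuple[int, dict]:
        running = self._busy.get(resource_path)
        if running is not None:
            # A parallel request is denied with 409 and an explanatory message.
            message = (
                f"{resource_path} is already being modified by {running}; "
                "retry after that operation completes."
            )
            return 409, {"error": {"message": message}}
        self._busy[resource_path] = operation_path
        return 202, {"path": operation_path, "done": False}

    def finish(self, resource_path: str) -> None:
        """Release the resource once its operation has completed."""
        self._busy.pop(resource_path, None)
```

A service that queues parallel requests instead would accept the second call with 202 and defer the work rather than reject it.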
Expiration
APIs may allow their operation resources to expire once sufficient time has elapsed since the request completed.
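As a rough sketch of expiration, a service could periodically purge operations that completed longer ago than some retention period. The `complete_time` bookkeeping field, the seven-day retention used in the example, and the in-memory dict are assumptions for illustration; the guidance does not prescribe a mechanism or a duration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def purge_expired(
    operations: Dict[str, dict], now: datetime, retention: timedelta
) -> List[str]:
    """Delete completed operations older than `retention`; return their paths."""
    expired = [
        path
        for path, op in operations.items()
        # Unfinished operations never expire; `complete_time` is only
        # read once `done` is true.
        if op["done"] and now - op["complete_time"] > retention
    ]
    for path in expired:
        del operations[path]
    return expired


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
operations = {
    "operations/old": {"done": True, "complete_time": now - timedelta(days=8)},
    "operations/recent": {"done": True, "complete_time": now - timedelta(days=1)},
    "operations/running": {"done": False, "complete_time": None},
}
removed = purge_expired(operations, now, retention=timedelta(days=7))
```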
Errors
Errors that prevent a long-running request from starting must return an error response (see AEP-193), similar to any other method.
Errors that occur over the course of a request may be placed in the metadata message. The errors themselves must still be represented with a canonical error object.
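Recording errors that occur partway through a request might look like the following sketch, in which per-item failures are collected into the operation's metadata. The `partial_failures` key, the `{"message": ...}` error shape, and the `run_batch` and `process` helpers are hypothetical; a real service would emit its canonical error object here.

```python
from typing import Callable, List


def run_batch(operation: dict, items: List[str], process: Callable[[str], None]) -> dict:
    """Process each item, recording failures in metadata instead of aborting."""
    failures = []
    for item in items:
        try:
            process(item)
        except Exception as exc:
            # Placeholder error shape; substitute the canonical error object.
            failures.append({"item": item, "error": {"message": str(exc)}})
    operation["metadata"] = {"partial_failures": failures}
    operation["done"] = True
    operation["response"] = {"processed": len(items) - len(failures)}
    return operation


def process(item: str) -> None:
    """Toy worker that rejects one specific item."""
    if item == "bad":
        raise ValueError("cannot process bad")


finished = run_batch({"path": "operations/1", "done": False}, ["a", "bad", "c"], process)
```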
Interface Definitions
When using protocol buffers, the common component aep.api.Operation is used.
- The response type must be aep.api.Operation. The Operation proto definition should not be copied into individual APIs; prefer to use a single copy (in monorepo code bases), or remote dependencies via a tool like Buf.
  - The response must not be a streaming response.
- The method must include an aep.api.operation_info annotation, which must define both response and metadata types.
  - The response and metadata types must be defined in the file where the RPC appears, or a file imported by that file.
  - If the response and metadata types are defined in another package, the fully-qualified message name must be used.
  - The response type should not be google.protobuf.Empty (except for Delete methods), unless it is certain that response data will never be needed. If response data might be added in the future, define an empty message for the RPC response and use that.
  - The metadata type is used to provide information such as progress, partial failures, and similar information on each GetOperation call. The metadata type should not be google.protobuf.Empty, unless it is certain that metadata will never be needed. If metadata might be added in the future, define an empty message for the RPC metadata and use that.
- APIs with messages that return Operation must implement the GetOperation method of the Operations service, and may implement the other methods defined in that service. Individual APIs must not define their own interfaces for long-running operations to avoid inconsistency.
When using OpenAPI:
- 202 must be the only success status code defined.
- The 202 response must define an application/json response body and no other response content types.
- The response body schema must be an object with path, done, error, and response properties as described above for an Operation.
- The response body schema may contain an object property named metadata to hold service-specific metadata associated with the operation, for example progress information and common metadata such as create time. The service should define the contents of the metadata object in a separate schema, which should specify additionalProperties: true to allow for future extensibility.
- The response property must be a schema that defines the success response for the operation. For an operation that typically gives a 204 No Content response, such as a Delete, response should be defined as an empty object schema. For a standard Get/Create/Update operation, response should be a representation of the resource.
- If a service has any long-running operations, the service must define an Operation resource with a list operation to retrieve a potentially filtered list of operations and a get operation to retrieve a specific operation by its path.
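The response body requirements above can be illustrated by building the schema as a plain Python dict. This is a sketch under stated assumptions: the helper `operation_schema`, the placeholder `{"type": "object"}` used for the error property, and the `progress_percent` metadata field are hypothetical and stand in for the service's real schemas.

```python
from typing import Optional


def operation_schema(response_schema: dict, metadata_schema: Optional[dict] = None) -> dict:
    """Build an Operation-shaped JSON Schema around a method's success response."""
    properties = {
        "path": {"type": "string"},
        "done": {"type": "boolean"},
        # Placeholder; a real service would reference its canonical error schema.
        "error": {"type": "object"},
        "response": response_schema,
    }
    if metadata_schema is not None:
        properties["metadata"] = metadata_schema
    return {"type": "object", "properties": properties}


# A long-running Delete: the response is an empty object schema, and the
# metadata schema allows additional properties for future extensibility.
delete_operation = operation_schema(
    response_schema={"type": "object"},
    metadata_schema={
        "type": "object",
        "additionalProperties": True,
        "properties": {"progress_percent": {"type": "integer"}},
    },
)
```

For a long-running Create or Update, `response_schema` would instead be the schema of the resource itself.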