Data schemas

This page describes the most important data formats used by the smartextract API. Each format is given as a JSON schema. The argument and response formats of individual API endpoints are documented separately in the API reference.

The template and extraction formats are related as follows:

A template specifies the fields you wish to extract from your documents, including the field name, description, and further properties such as cardinality. A template is the main ingredient needed to define a pipeline.
An extraction reflects the data contained in a specific document and follows the rules specified in the corresponding template. An extraction results from submitting a document to a pipeline.
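
To make the relationship concrete, the following minimal sketch models a template and a matching extraction as plain Python dictionaries and checks that the extraction respects the template. The key names (`fields`, `name`, `description`, `multiple`), the example field names, and the `check_extraction` helper are assumptions made for illustration only. They are not the actual smartextract schemas, which may be structured differently.

```python
# Hypothetical shapes for illustration only; the real smartextract
# template and extraction schemas may use different keys and structure.
template = {
    "name": "invoice",
    "fields": [
        # "multiple" stands in for cardinality: one value vs. a list of values.
        {"name": "invoice_number", "description": "Unique invoice ID", "multiple": False},
        {"name": "line_items", "description": "Purchased items", "multiple": True},
    ],
}

# An extraction holds the values found in one specific document.
extraction = {
    "invoice_number": "INV-001",
    "line_items": ["Widget", "Gadget"],
}


def check_extraction(template, extraction):
    """Return a list of problems where the extraction breaks the template's rules."""
    problems = []
    known = {field["name"]: field for field in template["fields"]}

    # Every extracted field must be declared in the template.
    for name in extraction:
        if name not in known:
            problems.append(f"unexpected field: {name}")

    # Every declared field that is present must match its cardinality.
    for name, field in known.items():
        if name not in extraction:
            continue  # assumption: a field may be absent from a document
        is_list = isinstance(extraction[name], list)
        if field["multiple"] and not is_list:
            problems.append(f"{name}: expected a list of values")
        elif not field["multiple"] and is_list:
            problems.append(f"{name}: expected a single value")

    return problems


print(check_extraction(template, extraction))
```

In this sketch the template is the only source of rules and the extraction is pure data, which mirrors the division of roles described above. In practice, validating against the published JSON schemas would be the more reliable approach.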