Pipelines

A pipeline defines a document processing procedure. It combines processing components such as OCR, AI models and data validation steps to extract structured data from a document. Pipelines are highly customizable and enable flexible data extraction.

Typically, you create a pipeline by specifying the data structure you want to extract from documents. This data structure is also known as a template. After that, you can run the pipeline on individual documents, or create an inbox based on the pipeline and upload documents to it for automatic processing. Organizing document extraction around inboxes is recommended.

Templates

An extraction template is a JSON object that describes all fields to be extracted, including their name, type, textual description and whether multiple extractions of a field are possible. The template schema is documented here.
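
As a rough illustration of what such a template carries, the sketch below models the four per-field attributes named above (name, type, textual description, multiplicity) as a local Python dataclass. The class `FieldSpec`, its attribute names, the example type strings and the resulting dictionary layout are assumptions made for this example only; the actual key names and structure are defined by the template schema, not by this code.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class FieldSpec:
    """Hypothetical local model of one template field (not the official schema)."""
    name: str
    type: str               # illustrative values only, e.g. "string"
    description: str
    multiple: bool = False  # whether the field may be extracted more than once


fields = [
    FieldSpec("invoice_number", "string", "Unique number printed on the invoice"),
    FieldSpec("line_item", "string", "One billed item or service", multiple=True),
]

# Serialize to JSON to see the kind of object a template describes.
template_sketch = {"fields": [asdict(f) for f in fields]}
print(json.dumps(template_sketch, indent=2))
```

Before sending a hand-written template to the API, check it against the documented schema, since the layout shown here is only a stand-in.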

For your convenience, smartextract has many predefined extraction templates designed for specific use cases. This includes templates for invoices, receipts, bank statements, delivery notes and more. Where the API expects a TEMPLATE, you may provide either an actual template (as a JSON object) or the name of a predefined template (as a string).

To retrieve a full list of predefined templates, send a GET request to /templates:

curl -X 'GET' 'https://api.smartextract.ai/templates?lang=en' \
  -H 'Accept: application/json' \
  -H 'Authorization: Bearer API_TOKEN'

import httpx
response = httpx.get(
    url='https://api.smartextract.ai/templates?lang=en',
    headers={
        'Accept': 'application/json',
        'Authorization': 'Bearer API_TOKEN'
    }
)
print(response.json())

You may specify a different template language via the lang query parameter. At the moment, en for English and de for German are supported. To refer to a predefined template, use the id.lang notation, for example invoice.en.
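
The id.lang notation can be assembled with a small helper. This is a minimal sketch: the function name `template_ref` and the validation against the two languages mentioned above are choices made for this example, not part of the smartextract API.

```python
SUPPORTED_LANGS = {"en", "de"}  # the two languages named in the text above


def template_ref(template_id: str, lang: str = "en") -> str:
    """Return the 'id.lang' reference for a predefined template."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported template language: {lang!r}")
    return f"{template_id}.{lang}"


print(template_ref("invoice", "en"))  # invoice.en
```

The resulting string can be passed wherever the API expects a TEMPLATE given by name.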

Creating a pipeline

To create a template-based pipeline, send a POST request to /pipelines, including the template and a clear, descriptive pipeline name:

curl -X 'POST' 'https://api.smartextract.ai/pipelines' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer API_TOKEN' \
  -d '{
    "name": "PIPELINE_NAME",
    "template": "TEMPLATE"
  }'

import httpx
response = httpx.post(
    url='https://api.smartextract.ai/pipelines',
    headers={
        'Accept': 'application/json',
        'Authorization': 'Bearer API_TOKEN'
    },
    json={
        'name': 'PIPELINE_NAME',
        'template': 'TEMPLATE'
    }
)
print(response.json())

The response contains the id of the created pipeline.
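
Because TEMPLATE may be either a predefined template name or a full JSON object, it can help to build the request body in one place. The sketch below prepares the POST request with the standard library only and does not send it. The helper names are hypothetical, and reading the pipeline id from a top-level "id" key is an assumption based on the sentence above, not a documented response format.

```python
import json
import urllib.request

API_BASE = "https://api.smartextract.ai"


def build_create_pipeline_request(name, template, token):
    """Prepare (but do not send) the POST /pipelines request.

    `template` may be a predefined name such as "invoice.en" or a dict
    holding a full template; both serialize to valid JSON.
    """
    payload = {"name": name, "template": template}
    return urllib.request.Request(
        url=f"{API_BASE}/pipelines",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def pipeline_id_from_body(body: bytes) -> str:
    """Extract the pipeline id, assuming it sits under a top-level "id" key."""
    return json.loads(body)["id"]


request = build_create_pipeline_request("Invoices", "invoice.en", "API_TOKEN")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` would send it; the httpx call shown earlier is equivalent.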

Managing pipelines

Listing pipelines

When you create a pipeline, the pipeline id is returned. You can also retrieve a listing of all pipelines you have access to with a GET request to /resources:

curl -X 'GET' 'https://api.smartextract.ai/resources?type=template_pipeline' \
  -H 'Accept: application/json' \
  -H 'Authorization: Bearer API_TOKEN'

import httpx
response = httpx.get(
    url='https://api.smartextract.ai/resources?type=template_pipeline',
    headers={
        'Accept': 'application/json',
        'Authorization': 'Bearer API_TOKEN'
    }
)
print(response.json())

Note that the template_pipeline resource type is specified as a query parameter.
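
Rather than concatenating the query string by hand, the type filter can be URL-encoded programmatically. A minimal sketch using only the standard library follows; the helper name `resources_url` is hypothetical, and calling /resources without a type filter is an assumption that the text above does not confirm.

```python
from urllib.parse import urlencode

API_BASE = "https://api.smartextract.ai"


def resources_url(resource_type=None):
    """Build the /resources listing URL, optionally filtered by type."""
    if resource_type is None:
        # Assumption: the endpoint also accepts requests without a filter.
        return f"{API_BASE}/resources"
    return f"{API_BASE}/resources?{urlencode({'type': resource_type})}"


print(resources_url("template_pipeline"))
```

With httpx, the same effect is obtained by passing `params={'type': 'template_pipeline'}` to `httpx.get`.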

Viewing details about a pipeline

To view the details about a given pipeline, including its template, send a GET request to /pipelines/PIPELINE_ID:

curl -X 'GET' 'https://api.smartextract.ai/pipelines/PIPELINE_ID' \
  -H 'Accept: application/json' \
  -H 'Authorization: Bearer API_TOKEN'

import httpx
response = httpx.get(
    url='https://api.smartextract.ai/pipelines/PIPELINE_ID',
    headers={
        'Accept': 'application/json',
        'Authorization': 'Bearer API_TOKEN'
    }
)
print(response.json())

Modifying pipelines

To modify an existing pipeline, send a PATCH request to /pipelines/PIPELINE_ID:

curl -X 'PATCH' 'https://api.smartextract.ai/pipelines/PIPELINE_ID' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer API_TOKEN' \
  -d '{
    "name": "New name"
  }'

import httpx
httpx.patch(
    url='https://api.smartextract.ai/pipelines/PIPELINE_ID',
    headers={
        'Accept': 'application/json',
        'Authorization': 'Bearer API_TOKEN'
    },
    json={
        'name': 'New name'
    }
)

The request payload may include any of the following entries:

  • name: The display name of the pipeline.
  • template: A JSON object describing the desired extraction template.
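
Since a PATCH payload only needs the entries that change, a small builder can keep unrelated keys out of the request. This is a sketch under the assumption that name and template are the only accepted entries, as listed above; the function name `build_patch_payload` is hypothetical.

```python
ALLOWED_KEYS = ("name", "template")  # the two entries listed above


def build_patch_payload(**changes):
    """Return a PATCH body containing only the supplied, allowed entries."""
    unknown = set(changes) - set(ALLOWED_KEYS)
    if unknown:
        raise ValueError(f"unsupported entries: {sorted(unknown)}")
    # Drop entries explicitly set to None so they are not sent at all.
    return {key: value for key, value in changes.items() if value is not None}


print(build_patch_payload(name="New name"))  # {'name': 'New name'}
```

The returned dictionary can be passed as the `json` argument of `httpx.patch`.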

Sharing pipelines

You can share a pipeline you own with another smartextract user by sending a POST request to /resources/PIPELINE_ID/permissions:

curl -X 'POST' 'https://api.smartextract.ai/resources/PIPELINE_ID/permissions' \
  -H 'Authorization: Bearer API_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "user": "USER_EMAIL",
    "level": "view"
  }'

import httpx
httpx.post(
    url='https://api.smartextract.ai/resources/PIPELINE_ID/permissions',
    headers={
        'Authorization': 'Bearer API_TOKEN',
        'Content-Type': 'application/json'
    },
    json={
        'user': 'USER_EMAIL',
        'level': 'view'
    }
)
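
To share one pipeline with several users, the same request can be repeated once per user. The sketch below only prepares the URL and JSON body for each request and sends nothing. The generator name `permission_requests` and the example addresses are hypothetical, and "view" is the only permission level used because it is the only one that appears above.

```python
import json

API_BASE = "https://api.smartextract.ai"


def permission_requests(pipeline_id, emails, level="view"):
    """Yield (url, json_body) pairs, one per user to share with."""
    url = f"{API_BASE}/resources/{pipeline_id}/permissions"
    for email in emails:
        yield url, json.dumps({"user": email, "level": level})


recipients = ["a@example.com", "b@example.com"]  # placeholder addresses
for url, body in permission_requests("PIPELINE_ID", recipients):
    print("POST", url, body)
```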