Extract Text From Image Trigger Action
  • 07 Jun 2024

The Extract Text from Image trigger action extracts text from an image based on a query. Another way to describe this is “Query based OCR (optical character recognition)” or “Extract text from an image by asking for it.”

The trigger action only returns text that is actually present in the image. It does not add information or interpret the content. This makes it well suited for transferring data from the physical world into the digital one.

Example Use Cases:

  • Ingest data from an order form coming from an outside vendor. Instead of manually transferring a 14-character PO number from a supplier invoice into your WMS, combine a simple app with "Extract Text from Image" to pull this data in seconds.
  • Digitize paper forms. The data contained on existing paper travelers is even more valuable when it can be accessed within Tulip apps. "Extract Text from Image" actions are a good mechanism for bridging the physical and digital worlds.
  • Work reliably with text in languages foreign to your operators. The manufacturing world is global. Combine the "Extract Text from Image" and "Translate" trigger actions to turn paper-based information into something your operators can act on.

Trigger Example

Use a mobile app to take a picture of a label on a product to get the batch number.

Image: image.png
Trigger: image.png
Result: 11EP8F4WA58CCX

Extract Value from Image

Inputs and outputs

The trigger action has two inputs, Input Image and Query, and one output, the extracted text.

Input: Input Image

This is the image from which the text should be extracted. This can come from the camera input widget, Tulip Vision, or external systems.

Supported data type
Input: Image URL

Input: Query

This is the query that is used for extracting the text from the image or document.

Query best practices:

  • Where possible, use words from the document. This is particularly helpful for acronyms and abbreviations (e.g. SN, ID, SSN, Lot No., etc.). The extract text trigger actions support less complex queries than the Answer Question from Data/Document Trigger actions.
    • Ex. Great Input: "Who is the supplier?"
    • Ex. Bad Input: "Who do you think could have sent this to us?"
  • Specifying the location of information can also help (e.g. “What is the reference number on the bottom?”)

Supported data type
Input: Text

Output: Extracted Text

This is the text that was extracted from the image based on the query.

Supported data type
Output: Text

Extract Values from Image/Document

Note

Extracting values from documents is a relatively slow operation. We limit documents to 10 pages to limit execution time.

Extract Values from Image/Document works just like Extract Value from Image, but supports an array of questions. This is significantly faster than running the Extract Value from Image trigger action once for each question.

Input: Input Image/Document

This is the image or document from which the text should be extracted. This can come from the camera input widget, Tulip Vision, or external systems. Files can be set statically, provided through the file input widget, or referenced from files stored in Tables.

Supported data type
Input: Image URL

Input: Query

This is the query that is used for extracting the text from the image. This should be an array/list of text values.

Supported data type
Input: Text List

Output: Extracted Text

This is the text that was extracted from the image based on the query.

Supported data type
Output: Object Array. Each element has a "Question" and an "Answer" attribute.
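
As an illustration of the output shape described above, the following Python sketch models the object array as a list of dictionaries and builds a lookup from question to answer. The sample questions and answers are invented, and both the list-of-dictionaries representation and the empty answer for a question without a result are assumptions about how the data might look once passed to an external system. Inside Tulip, the output is handled through trigger expressions rather than Python.

```python
# Hypothetical sample of the output: one object per question, each with
# "Question" and "Answer" attributes (all values invented for illustration).
extracted = [
    {"Question": "What is the PO number?", "Answer": "PO-12345678901"},
    {"Question": "Who is the supplier?", "Answer": "Acme Corp"},
    # Assumption: a question without a result carries an empty answer.
    {"Question": "What is the lot number?", "Answer": ""},
]


def answers_by_question(results):
    """Map each question to its answer, skipping empty answers."""
    return {
        item["Question"]: item["Answer"]
        for item in results
        if item["Answer"]
    }


lookup = answers_by_question(extracted)
print(lookup.get("Who is the supplier?"))
print(lookup.get("What is the lot number?", "<no result>"))
```

Keying the results by question keeps later steps independent of the order in which the questions were supplied.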

Extract All Text from Image/Document

In some cases, the key:value paradigm of the extract value trigger actions does not fit your use case. Reading all of the text from an image gives you much more flexibility in the problems that Copilot can address. The "Extract All Text" trigger actions provide this flexibility.


Input: Input Image/Document

This is the image or document from which the text should be extracted. This can come from the camera input widget, Tulip Vision, or external systems. Files can be set statically, provided through the file input widget, or referenced from files stored in Tables.

Supported data type
Input: Image URL or File URL

Output: Extracted Text

This is all of the text found on the respective image or document. Documents will return an array of data, with each item representing the text from one page of the provided document.

Supported data type
Output: Text (for images) or Text List (for documents)
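
To show how the per-page output for documents might be used, here is a minimal Python sketch that searches a list of page texts for a lot number. The page contents, the "Lot No." label, and the `find_lot_number` helper are all hypothetical; the only detail taken from this article is that each list item holds the text of one page.

```python
import re

# Hypothetical output for a three-page document: one text item per page.
pages = [
    "Packing List\nSupplier: Acme Corp",
    "Item A  Qty 4\nItem B  Qty 2",
    "Lot No. 11EP8F4WA58CCX\nSigned: J. Doe",
]

# Assumed label format; adjust the pattern to the wording on your documents.
LOT_PATTERN = re.compile(r"Lot No\.\s*([A-Z0-9]+)")


def find_lot_number(page_texts):
    """Return (page_number, lot_number) for the first match, else None."""
    for page_number, text in enumerate(page_texts, start=1):
        match = LOT_PATTERN.search(text)
        if match:
            return page_number, match.group(1)
    return None


print(find_lot_number(pages))
```

Because the full text is available, the same list can be searched again for other values without running the trigger action a second time.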

Edge Cases

No input image and/or no query provided

If no input image or no query is provided to the trigger action, the App will show the following system error:
Your Input or Query is empty

This happens for all of the following cases:

  • The input image and/or query input do not have a value assigned. This is equivalent to “null”.
  • The query has an empty string assigned.

No result for query

If no result is found for the query, the trigger action returns an empty text value.
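
The edge cases above can be sketched as a small pre-check and post-check, for example in an external system that prepares inputs for the trigger action. This is only a model of the documented behavior, not Tulip's implementation: the function names are hypothetical, and `None` stands in for an input with no value assigned.

```python
EMPTY_INPUT_ERROR = "Your Input or Query is empty"


def check_inputs(image_url, query):
    """Return the documented error message for empty inputs, else None.

    Covers the two documented cases: the image and/or query has no value
    assigned (modeled as None), or the query is an empty string.
    """
    if image_url is None or query is None or query == "":
        return EMPTY_INPUT_ERROR
    return None


def has_result(extracted_text):
    """An empty text value means no result was found for the query."""
    return extracted_text != ""


print(check_inputs(None, "Who is the supplier?"))
print(check_inputs("https://example.com/label.png", ""))
print(check_inputs("https://example.com/label.png", "Who is the supplier?"))
print(has_result(""))
```

Checking for the empty result explicitly lets an app distinguish "nothing found" from a system error and prompt the operator to retake the picture or rephrase the query.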

Limits

Warning

The following languages are the only languages supported for documents where values are being extracted: English, Spanish, Italian, Portuguese, French, German.

The following limits currently apply to "Extract Text from Image" trigger actions. These limits are tracked at the Instance level. If a limit is exceeded, the "Extract Text from Image" trigger action fails.

Image Size: All images must be below 5MB
Monthly Limit: 10,000 Requests/Month
Rate Limit: 10 Requests/Minute

Account Usage Limit: See details here
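
For systems that send images to these trigger actions, the size and rate limits above could be checked ahead of time. The following Python sketch is one way to do that under stated assumptions: "5MB" is read as 5,000,000 bytes, the rate limit is modeled as a sliding 60-second window, and all names are hypothetical. How Tulip actually measures size or counts requests is not specified here, and because the limits are tracked per Instance, a local check like this can only approximate them.

```python
import time
from collections import deque

MAX_IMAGE_BYTES = 5 * 1000 * 1000  # assumption: "5MB" read as 5,000,000 bytes
MAX_REQUESTS_PER_MINUTE = 10


def image_within_size_limit(size_in_bytes):
    """Images must be below the size limit, so the comparison is strict."""
    return size_in_bytes < MAX_IMAGE_BYTES


class MinuteRateLimiter:
    """Allow at most max_requests within any sliding window (assumed model)."""

    def __init__(self, max_requests=MAX_REQUESTS_PER_MINUTE,
                 window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.timestamps = deque()

    def try_acquire(self):
        """Record a request and return True if it fits in the window."""
        now = self.clock()
        # Drop requests that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            return False
        self.timestamps.append(now)
        return True


limiter = MinuteRateLimiter()
if image_within_size_limit(1_200_000) and limiter.try_acquire():
    print("OK to send request")
```

The injectable `clock` parameter exists only to make the limiter easy to test without waiting for real time to pass.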

