SmartExtract API

The SmartExtract class gives you more fine-grained control over how data extraction happens in your application. It requires a little more effort than the SmartExtractSimple API, which is simpler to use and should fit most scenarios.

Usage

Creating an Instance

To start a SmartExtract session in order to extract a document or edit previously extracted data, you need to first create a SmartExtract instance.

const smex = new SmartExtract({
  baseUrl: 'https://app.clik.ai/smart-extract',
});

| Parameter | Required | Description |
| --- | --- | --- |
| baseUrl | No | Points to the SmartExtract service base URL. You normally do not need to provide this value; it defaults to the correct URL. |

Handling Events

The SmartExtract component triggers various events to communicate with the host application. Each event is explained below.

Ready Event

This event is triggered when the SmartExtract component is loaded in the iframe and is successfully authenticated. It signifies that the SmartExtract component is now ready to accept document extraction and/or data editing requests.

smex.addEventListener('ready', function () {
  console.log('Smart Extract is now ready to take extraction/editing requests');
  // Once the SmartExtract component is ready, you can trigger an extraction request
  smex.sendExtractDocumentRequest({...})
});

Data Event

This event is triggered when your user clicks the `Save` button inside the SmartExtract component. The event.detail property holds the JSON object representing the extracted data. The event.detail.workbookData property holds all the information on the spreadsheet, including formulas, colors, and formatting. You must persist this data as is so that the spreadsheet can later be edited with all these details intact. The event.detail.documentData property holds a simpler JSON object with the data extracted from the document. You can process the data in this object as your application requires. The event.detail.meta property holds the meta information that the user provided on the SmartExtract extraction form.

You must persist the event.detail object as is in your database so that the data can be re-edited in the SmartExtract component. If you intend to post-process the output data, create a copy and store it separately.

smex.addEventListener('data', function (event) {
  console.log('User clicked save button on the SmartExtract component to persist data');
  console.log('The data to be persisted is: ', event.detail);

  // event.detail = {
  //   meta: {
  //     assetType: '<The asset type of the document>',
  //     documentType: '<The document type>',
  //     osPeriod: [new Date('<start-date>'), new Date('<end-date>')],
  //     fileName: '<The document file name>',
  //     ...
  //   },
  //   workbookData: {...},
  //   documentData: {
  //     // plain text data as detected in the document
  //     source: {
  //       rows: [
  //         // First value is S.N. on each row signifying a unique row-id for each row in the document
  //         ['S.N.', '', '', '', ....], // Each row is an array of all text-tokens detected in the row
  //         [1010, ..., ..., ..., ...],
  //         // ... rest of the data rows
  //       ],
  //     },
  //     // for CASH_FLOW documents
  //    extracted: {
  //      columns: [
  //        {
  //          'period': '<value>',
  //          'type': '<value>',
  //          'periodEndDate': '<value>',
  //          'name' : 'col0' or 'col1' or 'col2' or ...
  //        }
  //         // ... rest of the columns
  //      ],
  //       rows: [
  //        {
  //          'lineItem': '<value>',
  //          'subCategory': '<value>',
  //          'category': '<value>',
  //          'col0': '<value>',
  //          'col1': '<value>',
  //          ...
  //        }
  //         // ... rest of the data rows
  //       ],
  //     },
  //     // for other documents
  //     extracted: {
  //       rows: [
  //         { '<column-name-1>': '<column-value>', '<column-name-2>': '<column-value>', /* ... ,*/}
  //         // ... rest of the data rows
  //       ],
  //     }
  //   },
  // }
});
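As an illustration of post-processing, the sketch below flattens the `extracted` section of a CASH_FLOW document into one record per line item and column. It assumes the `{ columns, rows }` shape shown in the comment above. `flattenCashFlow` is a hypothetical helper name, not part of the SmartExtract API, and the sample values are made up. The function only reads its input, so the original `event.detail` object can still be persisted as is.

```javascript
// Hypothetical helper (not part of SmartExtract): turns the CASH_FLOW
// `documentData.extracted` object into one flat record per row and column.
// Assumes the { columns, rows } shape shown in the 'data' event comment.
function flattenCashFlow(extracted) {
  const records = [];
  for (const row of extracted.rows) {
    for (const column of extracted.columns) {
      records.push({
        lineItem: row.lineItem,
        category: row.category,
        subCategory: row.subCategory,
        period: column.period,
        type: column.type,
        periodEndDate: column.periodEndDate,
        // column.name is 'col0', 'col1', ... and keys the value on the row
        value: row[column.name],
      });
    }
  }
  return records;
}

// Example with made-up values
const sampleExtracted = {
  columns: [
    { period: 'Monthly', type: 'Actual', periodEndDate: '2023-01-31', name: 'col0' },
    { period: 'Monthly', type: 'Actual', periodEndDate: '2023-02-28', name: 'col1' },
  ],
  rows: [
    { lineItem: 'Rent', subCategory: 'Rental Income', category: 'Income', col0: '1000', col1: '1100' },
  ],
};
console.log(flattenCashFlow(sampleExtracted));
```

Inside a 'data' handler this would typically be called as `flattenCashFlow(event.detail.documentData.extracted)` for CASH_FLOW documents only.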

Cancel Event

This event is triggered when the user clicks the Cancel button on the SmartExtract component. It signifies that the user wants to discard any changes they made to the data.

smex.addEventListener('cancel', function (event) {
  console.log('User cancelled the data editing');
});

Error Event

This event is triggered when SmartExtract fails to extract document data. Note: This event is only triggered if the disableRetry option is set to true. If the option is set to false, SmartExtract instead shows a retry section where users can either retry data extraction with updated meta information or cancel the extraction, which then triggers the cancel event.

smex.addEventListener('error', function (event) {
  console.log('SmartExtract failed to extract document data');
});

Data Extracted Event

This event is triggered when SmartExtract sends back the extracted data for pre-processing. Note: This event is only triggered if the shouldPreProcessData option is set to true. The data is available in the event.detail property. You can update this data and send it back to SmartExtract through the method smex.sendPreProcessedExtractractedData(updatedData).

smex.addEventListener('dataExtracted', function (event) {
  const data = event.detail;
  // Modify the extracted data as needed here, then send the result back
  const updatedData = data;
  smex.sendPreProcessedExtractractedData(updatedData);
});
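The listener above can be wrapped in a small helper so that the transformation logic lives in a separate function. This is a sketch only: `registerPreProcessor` and `transform` are hypothetical names, and falling back to the unmodified data when the transformation throws is an assumption, made because the component is described as waiting for the updated data before rendering.

```javascript
// Hypothetical helper: registers a 'dataExtracted' listener that runs
// `transform` on the extracted data and sends the result back.
// `smex` is a SmartExtract instance; the event only fires when the
// extraction request sets shouldPreProcessData to true.
function registerPreProcessor(smex, transform) {
  smex.addEventListener('dataExtracted', function (event) {
    let updatedData;
    try {
      updatedData = transform(event.detail);
    } catch (err) {
      // Assumption: sending the data back unchanged is preferable to
      // leaving the component waiting for a response.
      console.error('Pre-processing failed, sending original data', err);
      updatedData = event.detail;
    }
    smex.sendPreProcessedExtractractedData(updatedData);
  });
}
```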

Starting SmartExtract Session

Once you have created a SmartExtract instance and set up the event handlers, you are ready to start a session to extract data from a new document or edit the data of a previously extracted document.

You can use SmartExtract#startSession to start a new data extraction session. The API creates an iframe instance and mounts it at the provided DOM node. The iframe loads the SmartExtract session and validates the passed authentication token. If everything succeeds, the iframe triggers the ready event, which invokes your ready event handler. You can use the ready event handler to request document extraction or start a data editing session using the APIs documented next.

The created iframe fills the provided DOM node by default. You can provide a CSS class name for the iframe if you want to style it further.

smex.startSession(
  // the DOM element on which iframe should be mounted
  // e.g. document.getElementById('iframeWrapper'),
  mountNode,
  // The auth token you obtained after hitting the
  // authentication API,
  sessionAuthToken,
  // options, (optional)
  {
    // optional css classes to be applied to the iframe
    iframeClass: '<css classes to be applied to the iframe>',
  }
);

| Parameter | Required | Description |
| --- | --- | --- |
| mountNode | Yes | The HTML DOM node where the iframe will be appended. |
| sessionAuthToken | Yes | The auth token obtained from the authentication API. |
| options | No | Configuration options. |
| options.iframeClass | No | The CSS class applied to the added iframe. You can provide multiple classes by separating them with spaces, e.g. css-class-1 css-class-2 |
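Since event handlers are meant to be set up before the session starts, the two steps can be combined in one function. The following is a minimal sketch: `startWithHandlers` and the `handlers` map are hypothetical and not part of the SmartExtract API; only `addEventListener` and `startSession` come from this page.

```javascript
// Hypothetical helper: attaches event handlers first, then starts the
// session, following the order described above.
// `handlers` maps event names ('ready', 'data', 'cancel', 'error', ...)
// to listener functions.
function startWithHandlers(smex, mountNode, sessionAuthToken, handlers, iframeClass) {
  for (const [eventName, handler] of Object.entries(handlers)) {
    smex.addEventListener(eventName, handler);
  }
  // Only pass iframeClass when the caller supplied one
  const options = iframeClass ? { iframeClass } : {};
  smex.startSession(mountNode, sessionAuthToken, options);
}
```

A caller would pass, for example, a `ready` handler that sends the extraction request and a `data` handler that persists `event.detail`.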

Extracting Document Data

Once the SmartExtract component is loaded in the iframe and triggers the 'ready' event, you can start a document extraction process by calling the sendExtractDocumentRequest API.

smex.sendExtractDocumentRequest({
  file,
  fileName,
  data: {
    // ...
  },
  options: {
    disableRetry: true,
    shouldPreProcessData: true,
    spreadsheet: {
      showSidebar: true,
    },
    // ...
  }
})

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| file | Yes | string | The PDF or XLSX file encoded as a Data URL string. |
| fileName | Yes | string | The file name. Make sure the file name has the correct extension; otherwise your users may not see a proper file preview. |
| options | | object | Options to configure data extraction. See the Styling and Customisation section for more details on how to style and customise the extraction form and spreadsheet view. |
| options.disableRetry | | boolean | If true, SmartExtract triggers an 'error' event when data extraction fails. If false, SmartExtract shows a retry option so that the user can retry the extraction. |
| options.shouldPreProcessData | | boolean | If true, SmartExtract passes back the extracted data for pre-processing and waits for the updated data before rendering it in the spreadsheet. If false, SmartExtract renders the extracted data in the spreadsheet without any pre-processing. |
| options.spreadsheet.showSidebar | | boolean | If true, SmartExtract shows the sidebar in the Extraction view. If false, no sidebar is shown. Default is true. |
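Because `file` must be a Data URL string and `fileName` must carry the right extension, it can help to build the request object in one place. The sketch below is an illustration under assumptions: `bytesToDataUrl` and `buildExtractRequest` are hypothetical helpers, and the MIME types are the standard ones for PDF and XLSX rather than values confirmed by SmartExtract. In a browser, `FileReader.readAsDataURL` produces the same kind of string directly from a `File` object.

```javascript
// Standard MIME types for the two supported file kinds (assumed to be
// what SmartExtract expects in the Data URL).
const MIME_TYPES = new Map([
  ['pdf', 'application/pdf'],
  ['xlsx', 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'],
]);

// Hypothetical helper: encodes raw bytes (Uint8Array) as a Data URL string.
function bytesToDataUrl(bytes, mimeType) {
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return 'data:' + mimeType + ';base64,' + btoa(binary);
}

// Hypothetical helper: builds the argument for sendExtractDocumentRequest,
// rejecting file names whose extension is not pdf or xlsx.
function buildExtractRequest(bytes, fileName, options = {}) {
  const extension = fileName.split('.').pop().toLowerCase();
  const mimeType = MIME_TYPES.get(extension);
  if (!mimeType) {
    throw new Error('Unsupported file extension: ' + fileName);
  }
  return { file: bytesToDataUrl(bytes, mimeType), fileName, options };
}
```

The returned object can then be passed to `smex.sendExtractDocumentRequest(...)` from the 'ready' handler.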

Editing Previously Extracted Data

The SmartExtract component does not only let you extract data from documents; your users can also continue editing the data in the SmartExtract widget from where they left off. The sendEditDataRequest API lets you pass in previously saved data and continue editing it in the SmartExtract component.

In edit mode, the user is taken directly to the spreadsheet screen where they can edit the data. Clicking 'Save' or 'Cancel' triggers the 'data' or 'cancel' event respectively.

// extractedData must be the same data that was provided by the 'data' event

// extractedData = {
//   workbookData,
//   documentData,
//   meta,
// }

smex.sendEditDataRequest({
    data: extractedData, 
    options: {...}
})

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| data | Yes | object | The extracted data object returned by the data event. |
| options | | object | Configuration options to style the spreadsheet view. See Styling Extraction Form for more details. |
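Since the edit request depends on the payload having been stored unchanged, a quick check before sending it can catch records that were trimmed or post-processed by mistake. This is a sketch under the assumption that a valid payload always carries the three documented keys; `sendStoredDataForEditing` is a hypothetical helper, not part of the SmartExtract API.

```javascript
// Keys the 'data' event payload is documented to carry.
const REQUIRED_KEYS = ['workbookData', 'documentData', 'meta'];

// Hypothetical helper: verifies a stored payload before reopening it
// for editing, then forwards it to sendEditDataRequest.
function sendStoredDataForEditing(smex, storedData, options) {
  const missing = REQUIRED_KEYS.filter(
    (key) => storedData == null || storedData[key] == null
  );
  if (missing.length > 0) {
    throw new Error('Stored data is missing: ' + missing.join(', '));
  }
  smex.sendEditDataRequest({ data: storedData, options: options || {} });
}
```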

Ending SmartExtract Session

Once you are done with the SmartExtract extraction session, you can end it by calling the endSession API. This API removes any event listeners attached to the SmartExtract iframe events and removes the iframe from the DOM.

smex.endSession();
// Performs cleanup by removing attached event listeners and
// removing the iframe from the DOM.

It is important to end the SmartExtract session when you are done with extraction, because ending the session clears any attached event handlers.


Last updated 2 years ago