SmartExtract API
The SmartExtract class gives you more fine-grained control over how data extraction happens in your application. It requires a little more effort than the SmartExtractSimple API, which is simpler to use and should fit most scenarios.
Usage
Creating an Instance
To start a SmartExtract session, either to extract a document or to edit previously extracted data, you first need to create a SmartExtract instance.
Parameter | Required | Description
baseUrl | No | Points to the SmartExtract service base URL. You normally do not need to provide this value; it defaults to the correct URL.
Handling Events
The SmartExtract component triggers various events to communicate with the host application. Each event is explained below.
Ready Event
This event is triggered when the SmartExtract component has loaded in the iframe and successfully authenticated. It signals that the SmartExtract component is ready to accept document extraction and data editing requests.
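As a rough illustration, a ready handler can decide which request to send. The sketch below assumes you register it for the ready event yourself (this page does not show how handlers are attached) and that any previously saved event.detail object has already been loaded from your database. SmartExtractRequests, NewDocument, and onReady are hypothetical names; only the two request method names come from this page, and their parameter types are assumptions based on its tables.

```typescript
// Hypothetical subset of a SmartExtract instance. The method names come from
// this page; the parameter types are assumptions based on its tables.
interface SmartExtractRequests {
  sendExtractDocumentRequest(file: string, fileName: string, options?: object): void;
  sendEditDataRequest(data: object, options?: object): void;
}

// Hypothetical description of a new document waiting to be extracted.
interface NewDocument {
  file: string;
  fileName: string;
}

// Call this from your ready event handler. It resumes editing when saved
// data exists and otherwise starts extracting the new document.
function onReady(
  smex: SmartExtractRequests,
  savedDetail: object | null,
  doc: NewDocument,
): "edit" | "extract" {
  if (savedDetail !== null) {
    smex.sendEditDataRequest(savedDetail);
    return "edit";
  }
  smex.sendExtractDocumentRequest(doc.file, doc.fileName);
  return "extract";
}
```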
Data Event
This event is triggered when your user clicks the `Save` button inside the SmartExtract component. The event.detail property holds the JSON object representing the extracted data.
The event.detail.workbookData property holds all the information about the spreadsheet, including formulas, colors, and formatting. You must persist this data as is so that the spreadsheet can later be edited with all these details intact.
The event.detail.documentData property holds a simpler JSON object containing the data extracted from the document. Use this object for whatever processing your application needs.
The event.detail.meta property holds the meta information that was provided by the user on the SmartExtract extraction form.
You must persist the event.detail object as is in your database so that the data can be re-edited in the SmartExtract component later. If you intend to post-process the output data, create a copy and store it separately.
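A minimal sketch of a data event handler that follows these rules, assuming event.detail contains only JSON-serializable values. It stores the payload exactly as received and runs post-processing on a deep copy of documentData. DetailStore, handleDataEvent, and the postProcess callback are hypothetical; replace them with your own persistence and processing code.

```typescript
// Fields this page documents on event.detail for the data event. Their inner
// structure is not specified here, so they are typed as unknown.
interface SmartExtractDataDetail {
  workbookData: unknown;
  documentData: unknown;
  meta: unknown;
}

// Hypothetical persistence layer standing in for your database.
interface DetailStore {
  saveRaw(id: string, json: string): void;
  saveProcessed(id: string, documentData: unknown): void;
}

// Handle the data event: persist event.detail unchanged, then post-process a copy.
function handleDataEvent(
  id: string,
  event: { detail: SmartExtractDataDetail },
  store: DetailStore,
  postProcess: (documentData: unknown) => unknown,
): void {
  // Serialize the payload exactly as received so it can be re-edited later.
  store.saveRaw(id, JSON.stringify(event.detail));
  // Deep-copy documentData (plain JSON) so postProcess never mutates event.detail.
  const copy: unknown = JSON.parse(JSON.stringify(event.detail.documentData));
  store.saveProcessed(id, postProcess(copy));
}
```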
Cancel Event
This event is triggered when the user clicks the Cancel button in the SmartExtract component. It signals that the user wants to discard any changes they made to the data.
Error Event
This event is triggered when SmartExtract fails to extract document data. Note: this event is only triggered if the disableRetry option is set to true. If the option is set to false, SmartExtract instead shows a retry section where users can either retry data extraction with updated meta information or cancel the extraction, which then triggers the cancel event.
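If you set disableRetry to true, your error handler is responsible for telling the user what happened. A minimal sketch, assuming the handler receives an object with an optional detail property (the error payload is not documented here, so it is forwarded untyped) and that ending the session is the response you want; SmartExtractSessionControl, handleErrorEvent, and the report callback are hypothetical.

```typescript
// Hypothetical subset of a SmartExtract instance used by this handler.
interface SmartExtractSessionControl {
  endSession(): void;
}

// Handle the error event, which only fires when disableRetry is true.
// The payload is undocumented, so event.detail is forwarded untyped.
function handleErrorEvent(
  smex: SmartExtractSessionControl,
  event: { detail?: unknown },
  report: (message: string, detail: unknown) => void,
): void {
  report("SmartExtract could not extract data from this document.", event.detail);
  // Assumption: the host gives up on this document and closes the iframe.
  smex.endSession();
}
```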
Data Extracted Event
This event is triggered when SmartExtract sends back the extracted data for pre-processing. Note: this event is only triggered if the shouldPreProcessData option is set to true. The data is available in the event.detail property. You can update this data and send it back to SmartExtract by calling smex.sendPreProcessedExtractractedData(updatedData).
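The sketch below shows one possible pre-processing step: trimming whitespace from every string in the extracted data before handing it back. The structure of the extracted data is not documented on this page, so the transformation walks any JSON-like value instead of relying on specific fields. PreProcessTarget, trimStrings, and handleDataExtractedEvent are hypothetical names; the method name sendPreProcessedExtractractedData is copied verbatim from above.

```typescript
// Hypothetical subset of a SmartExtract instance. The method name is copied
// verbatim from this page.
interface PreProcessTarget {
  sendPreProcessedExtractractedData(updatedData: unknown): void;
}

// Example transformation: recursively trim whitespace from every string in a
// JSON-like value. It does not depend on any specific field names.
function trimStrings(value: unknown): unknown {
  if (typeof value === "string") {
    return value.trim();
  }
  if (Array.isArray(value)) {
    return value.map(trimStrings);
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = trimStrings(source[key]);
    }
    return out;
  }
  return value;
}

// Handle the data extracted event (only fired when shouldPreProcessData is true).
function handleDataExtractedEvent(smex: PreProcessTarget, event: { detail: unknown }): void {
  smex.sendPreProcessedExtractractedData(trimStrings(event.detail));
}
```

Because SmartExtract waits for the updated data before rendering the spreadsheet, the handler should always reply, even when it leaves the data unchanged.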
Starting SmartExtract Session
Once you have created a SmartExtract instance and set up your event handlers, you are ready to start a session, either to extract data from a new document or to edit data from a previously extracted document.
Use SmartExtract#startSession to start a new data extraction session. The API creates an iframe and mounts it at the provided DOM node. The iframe loads the SmartExtract session and validates the passed authentication token. If all goes well, the iframe triggers the ready event, which invokes your ready event handler. In the ready event handler you can request document extraction or start a data editing session using the APIs documented next.
By default, the created iframe fills the provided DOM node. You can provide a CSS class name for the iframe if you want to style it further.
Parameter | Required | Description
mountNode | Yes | The HTML DOM node where the iframe will be appended.
sessionAuthToken | Yes | The authentication token that the SmartExtract iframe validates when the session loads.
options | No | Configuration options.
options.iframeClass | No | The CSS class applied to the added iframe. You can provide multiple classes by separating them with spaces, e.g. css-class-1 css-class-2.
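A minimal sketch of starting a session, assuming startSession takes its parameters positionally in the order listed in the table and that the token is a string. The MountNode alias stands in for an HTMLElement so the example runs outside a browser; SmartExtractStarter, iframeClassFrom, and openSession are hypothetical names.

```typescript
// Stand-in for the mount node; in a browser this would be an HTMLElement.
type MountNode = object;

// The options documented for startSession.
interface StartSessionOptions {
  iframeClass?: string;
}

// Hypothetical subset of a SmartExtract instance. Positional parameters in
// table order and a string token are assumptions.
interface SmartExtractStarter {
  startSession(mountNode: MountNode, sessionAuthToken: string, options?: StartSessionOptions): void;
}

// Build the space-separated class list that iframeClass expects, skipping blanks.
function iframeClassFrom(classes: string[]): string {
  return classes
    .map((name) => name.trim())
    .filter((name) => name.length > 0)
    .join(" ");
}

// Start a session, passing iframeClass only when there is something to apply.
function openSession(
  smex: SmartExtractStarter,
  mountNode: MountNode,
  sessionAuthToken: string,
  classes: string[] = [],
): void {
  if (sessionAuthToken.length === 0) {
    throw new Error("sessionAuthToken is required");
  }
  const options: StartSessionOptions = {};
  const iframeClass = iframeClassFrom(classes);
  if (iframeClass.length > 0) {
    options.iframeClass = iframeClass;
  }
  smex.startSession(mountNode, sessionAuthToken, options);
}
```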
Extracting Document Data
Once the SmartExtract component has loaded in the iframe and triggered the 'ready' event, you can start a document extraction process by calling the sendExtractDocumentRequest API.
Parameter | Required | Type | Description
file | Yes | string |
fileName | Yes | string | The file name. Make sure it has the correct extension; otherwise your users may not see a proper file preview.
options | No | object | Extraction options.
options.disableRetry | No | boolean | If true, SmartExtract triggers an 'error' event when data extraction fails. If false, SmartExtract shows a retry option so that the user can retry extraction.
options.shouldPreProcessData | No | boolean | If true, SmartExtract passes the extracted data back for pre-processing and waits for the updated data before rendering it in the spreadsheet. If false, SmartExtract renders the extracted data in the spreadsheet without any pre-processing.
options.spreadsheet.showSidebar | No | boolean | If true, SmartExtract shows the sidebar in the Extraction view. If false, no sidebar is shown. Default is true.
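The options table maps naturally onto a typed object. The sketch below assumes sendExtractDocumentRequest takes file, fileName, and options positionally, and adds a host-side check that fileName has an extension, since a missing extension can break the file preview. ExtractOptions, SmartExtractExtractor, hasExtension, and requestExtraction are hypothetical names.

```typescript
// The options documented for sendExtractDocumentRequest. Only showSidebar's
// default (true) is stated on this page.
interface ExtractOptions {
  disableRetry?: boolean;
  shouldPreProcessData?: boolean;
  spreadsheet?: { showSidebar?: boolean };
}

// Hypothetical subset of a SmartExtract instance; positional parameters are an assumption.
interface SmartExtractExtractor {
  sendExtractDocumentRequest(file: string, fileName: string, options?: ExtractOptions): void;
}

// True when fileName has a non-empty base name and a non-empty extension.
function hasExtension(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  return dot > 0 && dot < fileName.length - 1;
}

// Validate the file name before sending, since a missing extension can break the preview.
function requestExtraction(
  smex: SmartExtractExtractor,
  file: string,
  fileName: string,
  options: ExtractOptions = {},
): void {
  if (!hasExtension(fileName)) {
    throw new Error(`fileName "${fileName}" has no extension`);
  }
  smex.sendExtractDocumentRequest(file, fileName, options);
}
```

For example, passing { disableRetry: true, shouldPreProcessData: true } means extraction failures arrive as error events and the data extracted event fires before the spreadsheet is rendered.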
Editing Previously Extracted Data
Besides extracting data from documents, the SmartExtract component lets your users continue editing previously extracted data from where they left off. The sendEditDataRequest API lets you pass previously saved data to the SmartExtract component so that editing can resume.
In edit mode, the user is taken directly to the spreadsheet screen, where they can edit the data. Clicking 'Save' or 'Cancel' triggers the 'data' or 'cancel' event, respectively.
Parameter | Required | Type | Description
data | Yes | object | The extracted data object returned by the data event.
options | No | object |
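A minimal sketch of resuming an editing session from data you persisted earlier, assuming you stored event.detail as a JSON string and that sendEditDataRequest accepts the parsed object unchanged. SmartExtractEditor and resumeEditing are hypothetical names.

```typescript
// Hypothetical subset of a SmartExtract instance; parameter types are assumptions.
interface SmartExtractEditor {
  sendEditDataRequest(data: object, options?: object): void;
}

// Resume editing from a stored data-event payload (event.detail saved as JSON).
function resumeEditing(smex: SmartExtractEditor, storedJson: string): void {
  const data: unknown = JSON.parse(storedJson);
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    throw new Error("stored SmartExtract data must be a JSON object");
  }
  // Pass the object as is: workbookData carries the formulas, colors, and formatting.
  smex.sendEditDataRequest(data);
}
```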
Ending SmartExtract Session
When you are done with a SmartExtract session, end it by calling the endSession API. It removes any event listeners attached to the SmartExtract iframe events and removes the iframe from the DOM. Always call endSession when you are done with the extraction so that the attached event handlers are cleared.
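This page does not say whether calling endSession more than once is safe, so a small guard can help when several code paths (the data, cancel, and error handlers, or page navigation) may all try to close the session. A minimal sketch; SmartExtractSessionControl and createSessionCloser are hypothetical helpers, not part of the SmartExtract API.

```typescript
// Hypothetical subset of a SmartExtract instance.
interface SmartExtractSessionControl {
  endSession(): void;
}

// Return a function that calls endSession at most once. It reports whether
// this particular call actually ended the session.
function createSessionCloser(smex: SmartExtractSessionControl): () => boolean {
  let ended = false;
  return () => {
    if (ended) {
      return false;
    }
    ended = true;
    smex.endSession();
    return true;
  };
}
```

Call the returned function from each handler that finishes the workflow instead of calling endSession directly.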