# SmartExtract API

The `SmartExtract` class gives you finer-grained control over how data extraction happens in your application. It requires a little more effort than the [SmartExtractSimple API](/smart-extract-documentation/smart-extract.js/smartextractsimple-api.md), which is simpler to use and should fit most scenarios.

## Usage

### Creating an Instance

To start a SmartExtract session, whether to extract a document or to edit previously extracted data, you first need to create a SmartExtract instance.

{% tabs %}
{% tab title="Production environment" %}

```javascript
const smex = new SmartExtract({
  baseUrl: 'https://app.clik.ai/smart-extract',
});
```

{% endtab %}

{% tab title="Staging environment" %}

```javascript
const smex = new SmartExtract({
  baseUrl: 'https://app.clik.ai/smart-extract-stg',
});
```

{% endtab %}
{% endtabs %}

| Parameter | Required | Description                                                                                                                        |
| --------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| baseUrl   | No       | The base URL of the SmartExtract service. You normally do not need to provide this value; it defaults to the correct URL.          |

### Handling Events

The SmartExtract component triggers various events to communicate with the host application. Each of these events is explained in the following sections.

#### Ready Event

This event is triggered when the SmartExtract component has loaded in the iframe and has been successfully authenticated. It signifies that the SmartExtract component is now ready to accept document extraction and/or data editing requests.

```javascript
smex.addEventListener('ready', function () {
  console.log('Smart Extract is now ready to take extraction/editing requests');
  // Once the SmartExtract component is ready, you can trigger an extraction request
  smex.sendExtractDocumentRequest({ /* file, fileName, ... */ });
});
```

#### Data Event

This event is triggered when your user clicks the `Save` button inside the SmartExtract component. The `event.detail` property holds the JSON object representing the extracted data.\
\
The `event.detail.workbookData` property holds all the information on the spreadsheet including formulas, colors, formatting etc. You must persist this data as is so that later the spreadsheet can be edited with all these details.\
\
The `event.detail.documentData` property holds a simpler JSON object holding data extracted from the document. You can utilise data in this object to process as per your application needs.\
\
The `event.detail.meta` property holds the meta information that was provided by the user on the SmartExtract extraction form.

You must persist the `event.detail` object as is in your database so that the data can be re-edited in the SmartExtract component. If you intend to post-process the output data, create a copy and store it separately.

```javascript
smex.addEventListener('data', function (event) {
  console.log('User clicked save button on the SmartExtract component to persist data');
  console.log('The data to be persisted is: ', event.detail);

  // event.detail = {
  //   meta: {
  //     assetType: '<The asset type of the document>',
  //     documentType: '<The document type>',
  //     osPeriod: [new Date('<start-date>'), new Date('<end-date>')],
  //     fileName: '<The document file name>',
  //     ...
  //   },
  //   workbookData: {...},
  //   documentData: {
  //     // plain text data as detected in the document
  //     source: {
  //       rows: [
  //         // First value is S.N. on each row signifying a unique row-id for each row in the document
  //         ['S.N.', '', '', '', ....], // Each row is an array of all text-tokens detected in the row
  //         [1010, ..., ..., ..., ...],
  //         // ... rest of the data rows
  //       ],
  //     },
  //     // for CASH_FLOW documents
  //     extracted: {
  //       columns: [
  //         {
  //           'period': '<value>',
  //           'type': '<value>',
  //           'periodEndDate': '<value>',
  //           'name': 'col0' or 'col1' or 'col2' or ...
  //         },
  //         // ... rest of the columns
  //       ],
  //       rows: [
  //         {
  //           'lineItem': '<value>',
  //           'subCategory': '<value>',
  //           'category': '<value>',
  //           'col0': '<value>',
  //           'col1': '<value>',
  //           ...
  //         },
  //         // ... rest of the data rows
  //       ],
  //     },
  //     // for other documents
  //     extracted: {
  //       rows: [
  //         { '<column-name-1>': '<column-value>', '<column-name-2>': '<column-value>', /* ... */ },
  //         // ... rest of the data rows
  //       ],
  //     },
  //   },
  // }
});
```
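
The storage rule above can be sketched as a pair of small helpers: one keeps `event.detail` untouched for later re-editing and hands back a deep copy for post-processing, the other reads one column out of a CASH_FLOW `extracted` object. Both function names (`splitForStorage`, `cashFlowColumnValues`) are hypothetical and not part of the SmartExtract API, and the sketch assumes a runtime that provides `structuredClone`.

```javascript
// Hypothetical helper (not part of SmartExtract): returns the object to
// persist unchanged plus a deep copy of documentData for post-processing.
function splitForStorage(detail) {
  return {
    persisted: detail, // store as is; needed later for re-editing
    working: structuredClone(detail.documentData), // independent copy
  };
}

// Hypothetical helper: collects the values of one column (e.g. 'col0')
// from a CASH_FLOW `extracted` object, keyed by line item.
function cashFlowColumnValues(extracted, columnName) {
  const values = {};
  for (const row of extracted.rows) {
    values[row.lineItem] = row[columnName];
  }
  return values;
}
```

Inside a `data` listener you would typically store `splitForStorage(event.detail).persisted` in your database and run any application-specific logic on the `working` copy only.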

#### Cancel Event

This event is triggered when the user clicks the `Cancel` button on the SmartExtract component. It signifies that the user wants to discard any changes they made to the data.

```javascript
smex.addEventListener('cancel', function (event) {
  console.log('User cancelled the data editing');
});
```

#### Error Event

This event is triggered when SmartExtract fails to extract document data. **Note: This event is only triggered if the `disableRetry` option is set to `true`**. If the option is set to `false`, SmartExtract instead shows a retry section where users can either retry data extraction with updated meta information or cancel the extraction, which then triggers the `cancel` event.

```javascript
smex.addEventListener('error', function (event) {
  console.log('SmartExtract failed to extract document data');
});
```

#### Data Extracted Event

This event is triggered when SmartExtract sends back the extracted data for pre-processing. **Note: This event is only triggered if the `shouldPreProcessData` option is set to `true`**. The data is available in the `event.detail` property. It can be updated and sent back to SmartExtract through the method `smex.sendPreProcessedExtractractedData(updatedData)`.

```javascript
smex.addEventListener('dataExtracted', function (event) {
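  const data = event.detail;
  // Apply your changes here and assign the result to updatedData.
  // As written, the data is passed back unchanged.
  const updatedData = data;
  smex.sendPreProcessedExtractractedData(updatedData);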
});
```
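
As a hedged sketch of the pre-processing round trip described above, the factory below builds a `dataExtracted` listener that applies a caller-supplied transform to a copy of the data and always sends a result back. `makePreProcessHandler` and `transform` are hypothetical names, the shape of `event.detail` for this event is not assumed, and `structuredClone` is assumed to be available.

```javascript
// Hypothetical factory (not part of SmartExtract): builds a 'dataExtracted'
// listener that applies `transform` to a copy of the extracted data and
// sends the result back through the documented method.
function makePreProcessHandler(smex, transform) {
  return function onDataExtracted(event) {
    const copy = structuredClone(event.detail);
    const updated = transform(copy);
    // Always reply: with shouldPreProcessData enabled, SmartExtract waits
    // for the updated data before rendering the spreadsheet.
    // If transform mutated the copy in place and returned nothing,
    // the mutated copy is sent.
    smex.sendPreProcessedExtractractedData(updated ?? copy);
  };
}

// Usage (assumes `smex` is an existing SmartExtract instance):
// smex.addEventListener('dataExtracted', makePreProcessHandler(smex, (data) => data));
```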

### Starting SmartExtract Session

Once you have created a SmartExtract instance and set up the event handlers, you are ready to start a session to extract data from a new document or to edit the data of a previously extracted document.

You can use `SmartExtract#startSession` to start a new data extraction session. The API creates an iframe and mounts it at the provided DOM node. The iframe loads the SmartExtract session and validates the passed authentication token. If everything succeeds, the iframe triggers the `ready` event, which invokes your ready event handler. You can use that handler to request document extraction or start a data editing session using the APIs documented next.

The created iframe fills the provided DOM node by default. You can provide a CSS class name for the iframe if you want to style it further.

```javascript
smex.startSession(
  // the DOM element on which iframe should be mounted
  // e.g. document.getElementById('iframeWrapper'),
  mountNode,
  // The auth token you obtained after hitting the
  // authentication API,
  sessionAuthToken,
  // options, (optional)
  {
    // optional css classes to be applied to the iframe
    iframeClass: '<css classes to be applied to the iframe>',
  }
);
```

| Parameter           | Required | Description                                                                                                                                |
| ------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| mountNode           | Yes      | The HTML DOM node where the iframe will be appended.                                                                                       |
| sessionAuthToken    | Yes      | The auth token obtained from the [authentication API](/smart-extract-documentation/api-reference.md#token)                                 |
| options             | No       | Configuration options                                                                                                                      |
| options.iframeClass | No       | The css class applied to the added iframe. You can provide multiple classes by separating them with spaces. E.g. `css-class-1 css-class-2` |

### Extracting Document Data

Once the SmartExtract component is loaded in the iframe and triggers the 'ready' event, you can start a document extraction process by calling the `sendExtractDocumentRequest` API.

```javascript
smex.sendExtractDocumentRequest({
  file,
  fileName,
  data: {
    // ...
  },
  options: {
    disableRetry: true,
    shouldPreProcessData: true,
    spreadsheet: {
      showSidebar: true,
    },
    // ...
  }
})
```

| Parameter                       | Required | Type    | Description                                                                                                                                                                                                                                                                                   |
| ------------------------------- | -------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| file                            | Yes      | string  | The pdf or xlsx file encoded as a [Data Url](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URIs) string                                                                                                                                                               |
| fileName                        | Yes      | string  | The file name. Make sure that the file name has the correct extension, otherwise your users may not see a proper file preview. |
| options                         | No       | object  | <p>Options to configure data extraction.<br>See <a href="/pages/-MkHMNsZmasTAm9sM9Q-">Styling and Customisation</a> section for more details on how to style and customise the extraction form and spreadsheet view.</p>                                                                      |
| options.disableRetry            | No       | boolean | <p>If <code>true</code> SmartExtract will trigger an 'error' event in case data extraction fails.</p><p>If <code>false</code> SmartExtract will show a retry option so that user can retry extraction.</p>                                                                                    |
| options.shouldPreProcessData    | No       | boolean | <p>If <code>true</code>, SmartExtract will pass back the extracted data for pre-processing and wait for the updated data before rendering it in the spreadsheet.</p><p>If <code>false</code>, SmartExtract will render the extracted data in the spreadsheet without any pre-processing.</p> |
| options.spreadsheet.showSidebar | No       | boolean | <p>If <code>true</code>, SmartExtract will show the sidebar in the Extraction view.<br>If <code>false</code>, no sidebar will be shown.<br>Default is true.</p>                                                                                                                               |

### Editing Previously Extracted Data

The SmartExtract component not only lets you extract data from documents, it also lets your users continue editing the data in the SmartExtract widget from where they left off. The `sendEditDataRequest` API allows you to pass in previously saved data and continue editing it in the SmartExtract component.

In edit mode, the user is taken directly to the spreadsheet screen where they can edit the data. Clicking 'Save' or 'Cancel' triggers the 'data' or 'cancel' event respectively.

```javascript
// extractedData must be the same data that was provided by the 'data' event

// extractedData = {
//   workbookData,
//   documentData,
//   meta,
// }

smex.sendEditDataRequest({
  data: extractedData,
  options: { /* ... */ },
});
```

| Parameter | Required | Type   | Description                                                                                                                                                                      |
| --------- | -------- | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| data      | Yes      | object | The extracted data object returned by the data event                                                                                                                             |
| options   | No       | object | Configuration options to style the spreadsheet view. See [Styling Extraction Form](/smart-extract-documentation/smart-extract.js/styling-and-customisations.md) for more details |
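
Because the edit request depends on the saved object being intact, a defensive check before calling `sendEditDataRequest` can catch storage mistakes early. The guard below is a hypothetical sketch, not part of SmartExtract; it only verifies that the three top-level properties delivered by the `data` event are present.

```javascript
// Hypothetical guard (not part of SmartExtract): verifies that the saved
// object still has the three top-level properties delivered by the 'data'
// event, then forwards it to sendEditDataRequest.
function sendEditRequestIfComplete(smex, savedDetail, options) {
  const required = ['workbookData', 'documentData', 'meta'];
  const missing = required.filter((key) => savedDetail?.[key] == null);
  if (missing.length > 0) {
    throw new Error(`Saved data is missing: ${missing.join(', ')}`);
  }
  const request = { data: savedDetail };
  if (options !== undefined) {
    request.options = options; // only include options when provided
  }
  smex.sendEditDataRequest(request);
}
```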

### Ending SmartExtract session

Once you are done with the SmartExtract extraction session, you can end it by calling the `endSession` API. This API removes any event listeners attached to the SmartExtract iframe events and removes the iframe from the DOM.

```javascript
smex.endSession();
// Performs cleanup by removing attached event listeners and
// removing the iframe from the DOM.
```

{% hint style="warning" %}
It's important to end the SmartExtract session when you are done with extraction. Ending the session clears any attached event handlers and removes the iframe from the DOM.
{% endhint %}
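
One way to make sure `endSession` is always called is to tie it to the events that finish a session. The sketch below assumes the host application treats `data`, `cancel` and `error` as terminal, which may not suit every integration; `endSessionOnFinish` and `onFinish` are hypothetical names.

```javascript
// Hypothetical helper (not part of SmartExtract): runs `onFinish` and then
// ends the session after any event that this sketch treats as terminal.
function endSessionOnFinish(smex, onFinish) {
  for (const eventName of ['data', 'cancel', 'error']) {
    smex.addEventListener(eventName, (event) => {
      // Hand the event to the host application first (e.g. to persist
      // event.detail), then clean up the iframe and listeners.
      onFinish(eventName, event.detail);
      smex.endSession();
    });
  }
}
```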


