Document

The Document Aptitude provides the ability to interact with documents (currently PDF and XLSX files).

readPDF

readPDF takes a PDF as a Uint8Array (obtained from either the Filesystem or Network Aptitude) and parses the text content inside it. The result is keyed by page number.

import { document, filesystem } from '@oliveai/ldk';

const pdfFile = await filesystem.readFile('./earnings-report.pdf');
const pdfContent = await document.readPDF(pdfFile);
const expected = {
    // "1" is the page number
    "1": {
        "Content": [
            {"Value": "Test text", "Type": "text"},
            {"Value": "", "Type": "newLine"},
            {"Value": "More text on the next line", "Type": "text"},
            {"Value": "base64 string", "Type": "photo"},
        ]
    }
}

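// `===` compares object references, so compare the serialized values instead
console.log(JSON.stringify(pdfContent) === JSON.stringify(expected));
// true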
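
The result shape above lends itself to simple post-processing. The following is a minimal sketch, not part of the LDK: `pdfPagesToText` is a hypothetical helper name, and the sample object stands in for a real `document.readPDF` result. It assumes that only the "text" and "newLine" item types carry readable text and that every other type (such as "photo") can be skipped.

```javascript
// Hypothetical helper (not part of the LDK): flattens the page map
// returned by readPDF into one plain-text string per page.
function pdfPagesToText(pdfContent) {
  const pages = {};
  for (const [pageNumber, page] of Object.entries(pdfContent)) {
    let text = '';
    for (const item of page.Content ?? []) {
      if (item.Type === 'text') {
        text += item.Value;
      } else if (item.Type === 'newLine') {
        text += '\n';
      }
      // Assumption: other item types (for example "photo") hold no readable text.
    }
    pages[pageNumber] = text;
  }
  return pages;
}

// Sample data in the documented shape; a real value would come from document.readPDF.
const samplePdfContent = {
  '1': {
    Content: [
      { Value: 'Test text', Type: 'text' },
      { Value: '', Type: 'newLine' },
      { Value: 'More text on the next line', Type: 'text' },
      { Value: 'base64 string', Type: 'photo' },
    ],
  },
};

console.log(pdfPagesToText(samplePdfContent));
```

Keeping the page number as the key preserves the grouping readPDF provides, so a caller can still search or display one page at a time.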

readPDFwithOcr

readPDFwithOcr is an enhanced version of readPDF. It takes a PDF as a Uint8Array (obtained from either the Filesystem or Network Aptitude) and parses the text content inside it, additionally running OCR on any images that contain text. NOTE: this function requires the screen permission; see the permissions tab of the Screen Aptitude.

import { document, filesystem } from '@oliveai/ldk';

const pdfFile = await filesystem.readFile('./test.pdf');
const pdfContent = await document.readPDFwithOcr(pdfFile);
const expected = {
    // "1" is the page number
    "1": {
        "Content": [
            {"Value": "Test text", "Type": "text"},
            {"Value": "", "Type": "newLine"},
            {"Value": "More text on the next line", "Type": "text"},
            {"Value": "base64 string", "Type": "photo"},
            {"Value": "Text from image", "Type": "photoText"},
        ]
    }
}

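// `===` compares object references, so compare the serialized values instead
console.log(JSON.stringify(pdfContent) === JSON.stringify(expected));
// true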
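
To work with only the OCR output, the "photoText" items can be filtered out of each page. This is a rough sketch under stated assumptions: `collectOcrText` is a hypothetical helper, not an LDK function, and the sample object stands in for a real `document.readPDFwithOcr` result. The key is assumed to be spelled `Value`; the sketch also accepts a lowercase `value` defensively, in case the casing differs for OCR items.

```javascript
// Hypothetical helper (not part of the LDK): returns, for each page,
// the list of strings recognized by OCR ("photoText" items).
function collectOcrText(pdfContent) {
  const result = {};
  for (const [pageNumber, page] of Object.entries(pdfContent)) {
    result[pageNumber] = (page.Content ?? [])
      .filter((item) => item.Type === 'photoText')
      // Assumption: the key is "Value"; fall back to "value" if it is not.
      .map((item) => item.Value ?? item.value ?? '');
  }
  return result;
}

// Sample data in the documented shape; a real value would come from document.readPDFwithOcr.
const sampleOcrContent = {
  '1': {
    Content: [
      { Value: 'Test text', Type: 'text' },
      { Value: '', Type: 'newLine' },
      { Value: 'base64 string', Type: 'photo' },
      { Value: 'Text from image', Type: 'photoText' },
    ],
  },
};

console.log(collectOcrText(sampleOcrContent));
```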

xlsxEncode

xlsxEncode encodes a workbook object into XLSX data. It returns a promise of a Uint8Array.

import { document } from '@oliveai/ldk';

// Imagine a workbook that contains a worksheet.
// To locate a single cell, we need the row number and the cell number.
// First, we need a workbook object:
const workbook = {
    worksheets: [
        {
            hidden: false,
            hiddenColumns: [],
            hiddenRows: [],
            name: 'name',
            rows: [{ cells: [{ value: 'cell value' }] }],
        },
    ],
};
const uint8ArrayData = await document.xlsxEncode(workbook);
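
Workbook objects are often built from tabular data rather than written by hand. The sketch below shows one way to do that; `tableToWorkbook` is a hypothetical helper, not part of the LDK, and the sheet name and sample rows are made up for illustration. It assumes cell values must be strings, so other values are converted with `String`.

```javascript
// Hypothetical helper (not part of the LDK): wraps a 2D array of values
// in the workbook shape that xlsxEncode accepts.
function tableToWorkbook(sheetName, table) {
  return {
    worksheets: [
      {
        hidden: false,
        hiddenColumns: [],
        hiddenRows: [],
        name: sheetName,
        // Each inner array becomes a row; each value becomes a string cell.
        rows: table.map((row) => ({
          cells: row.map((value) => ({ value: String(value) })),
        })),
      },
    ],
  };
}

// Illustrative data only.
const sampleWorkbook = tableToWorkbook('Earnings', [
  ['Quarter', 'Revenue'],
  ['Q1', 1200],
]);

console.log(JSON.stringify(sampleWorkbook, null, 2));
```

The resulting object has the same structure as the hand-written workbook above, so it could be passed to `document.xlsxEncode` in the same way.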

xlsxDecode

xlsxDecode decodes a Uint8Array into a Workbook object. It returns a promise of a Workbook.

import { document } from '@oliveai/ldk';

// Decode a Uint8Array into a Workbook object.
// uint8ArrayData has the same form as the value returned by xlsxEncode.
// xlsxDecode returns a promise, so await the result.
const workbook = await document.xlsxDecode(uint8ArrayData);

This is the structure of a workbook:

Workbook: {
    worksheets: Worksheet[];
}

Worksheet: {
    hidden: boolean;
    // Representation of a basic workbook with a single sheet where some rows and columns are hidden
    hiddenColumns: number[];
    hiddenRows: number[];
    name: string;
    rows: Row[];
}

Row: {
    cells: Cell[];
}

Cell: {
    value: string;
}
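
Given the structure above, reading values back out of a decoded workbook is a matter of indexing into nested arrays. The following sketch is illustrative only: `getCellValue` and `worksheetToTable` are hypothetical helpers, not LDK functions, and the sample object stands in for a real `document.xlsxDecode` result. Row and cell positions here are zero-based array indices.

```javascript
// Hypothetical helper (not part of the LDK): returns the string value of one
// cell, or undefined if the sheet, row, or cell does not exist.
function getCellValue(workbook, sheetName, rowIndex, cellIndex) {
  const sheet = workbook.worksheets.find((ws) => ws.name === sheetName);
  return sheet?.rows[rowIndex]?.cells[cellIndex]?.value;
}

// Hypothetical helper: converts a worksheet into a plain 2D array of strings.
function worksheetToTable(worksheet) {
  return worksheet.rows.map((row) => row.cells.map((cell) => cell.value));
}

// Sample data in the documented shape; a real value would come from document.xlsxDecode.
const decodedSample = {
  worksheets: [
    {
      hidden: false,
      hiddenColumns: [],
      hiddenRows: [],
      name: 'name',
      rows: [{ cells: [{ value: 'cell value' }] }],
    },
  ],
};

console.log(getCellValue(decodedSample, 'name', 0, 0));
console.log(worksheetToTable(decodedSample.worksheets[0]));
```

Returning undefined for a missing sheet, row, or cell lets the caller distinguish "not present" from an empty string value.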
