Screen
The Screen Aptitude provides the ability to read the user's screen with solutions like Optical Character Recognition (OCR).
Warning: screen.ocr is being deprecated in mid-April. Please update your code to use the new function, screen.listenOcrMonitor.
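For orientation, here is a minimal before-and-after sketch based on the examples in this document (both calls are covered in detail below):
import { screen } from '@oliveai/ldk';

const ocrCoordinates = { top: 1, left: 1, width: 10, height: 10 };

// Before: one-shot OCR of a screen region (deprecated)
screen.ocr(ocrCoordinates).then((ocrResults) => {
  console.log(ocrResults.map((res) => res.text).join(' '));
});

// After: listen for OCR events on the active window
screen.listenOcrMonitor((ocrEvents) => {
  console.log(ocrEvents.map((ocrEvent) => ocrEvent.new.text).join(' '));
});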
API

ocr

Optical Character Recognition (OCR) is a technology for automating data extraction from a scanned document or image file containing printed or written text, converting that text into a machine-readable format for data processing. OCR runs locally on the end user's computer, and the captured content never leaves the desktop. Loops that use it are configured to be available by customer request.
OCR processes a digital image by locating and recognizing characters, such as letters, numbers, and symbols. In Olive Helps, ocr() performs screen optical character recognition and returns the recognized text as an array of OCRResult objects.
import { screen } from '@oliveai/ldk';

// The ocrCoordinates object identifies an area on the screen
// to search for text
const ocrCoordinates = {
  top: 1,
  left: 1,
  width: 10,
  height: 10,
};

// Calling the ocr() function returns ocrResults,
// an array of OCRResult objects.
screen.ocr(ocrCoordinates).then((ocrResults) => {
  console.log(JSON.stringify(ocrResults));
  // Example output:
  // {"confidence":-1,"text":"","level":4,"page_num":1,"block_num":8,"par_num":1,"line_num":1,"word_num":0,"left":319,"top":323,"width":426,"height":21},
  // {"confidence":91,"text":"[email protected]","level":5,"page_num":1,"block_num":8,"par_num":1,"line_num":1,"word_num":1,"left":319,"top":323,"width":426,"height":21},
});
/*
 * Note that ocr's resulting output is not plain text; it is an array of OCRResult objects.
 * To extract specific text, see the Example section below for detailed usage of this function.
 */
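The shape of each result can be inferred from the example output above. Here is an illustrative TypeScript sketch of those fields (an inference from the output, not the library's published type definition):
// Illustrative sketch of an OCRResult's fields, inferred from the
// example output above (not the library's official type definition).
interface OCRResult {
  confidence: number; // recognition confidence; -1 when no text was recognized
  text: string;       // the recognized text
  level: number;      // granularity of the result (block, paragraph, line, word)
  page_num: number;
  block_num: number;
  par_num: number;
  line_num: number;
  word_num: number;
  left: number;       // bounding box position and size, in pixels
  top: number;
  width: number;
  height: number;
}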

hash

This is an experimental method and is subject to breaking changes (Available in v3.9.1-beta.1).
The hash method calculates the image hash of the specified area using the specified hashing algorithm.
import { screen, HashType } from '@oliveai/ldk';

const sensitivity = 1;
const hashType = HashType.Average; // Optional, defaults to Average
const bounds = {
  top: 0,
  left: 0,
  width: 512,
  height: 512,
};

screen.hash(bounds, sensitivity, hashType).then((hash) => {
  console.log(`Hashed area string: ${hash}`);
});

compareHash

This is an experimental method and is subject to breaking changes (Available in v3.9.1-beta.1).
The compareHash function takes two hashes and calculates the Hamming distance between them.
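For intuition, the Hamming distance counts the positions at which two equal-length strings differ. A hypothetical helper illustrating the idea (use screen.compareHash in practice; this function is not part of the LDK):
// For intuition only: counts the positions at which two equal-length
// hash strings differ. Not part of the LDK.
const hammingDistance = (a: string, b: string): number => {
  if (a.length !== b.length) {
    throw new Error('Hashes must be the same length');
  }
  let distance = 0;
  for (let i = 0; i < a.length; i += 1) {
    if (a[i] !== b[i]) {
      distance += 1;
    }
  }
  return distance;
};

console.log(hammingDistance('a1b2', 'a1c3')); // 2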
import { screen, HashType } from '@oliveai/ldk';

const sensitivity = 1;
const hashType = HashType.Average;
const boundsA = {
  top: 0,
  left: 0,
  width: 512,
  height: 512,
};
// We'll compare an area that starts at the right boundary of boundsA
const boundsB = {
  top: 0,
  left: 512,
  width: 512,
  height: 512,
};

const hashA = await screen.hash(boundsA, sensitivity, hashType);
const hashB = await screen.hash(boundsB, sensitivity, hashType);
const diff = await screen.compareHash(hashA, hashB);
console.log(`The two areas have a difference of ${diff}`);

listenHash

This is an experimental method and is subject to breaking changes (Available in v3.9.1-beta.1).
Monitors the specified screen area for changes. The callback is called when a change is detected. The screen is considered different when the distance between subsequent hashes exceeds the provided threshold.
import { screen, HashType } from '@oliveai/ldk';

const delayMs = 500;
const sensitivity = 1;
const hashType = HashType.Average; // Optional, defaults to Average
const threshold = 3;
const bounds = {
  top: 0,
  left: 0,
  width: 512,
  height: 512,
};

// distance is the Hamming distance between the previous and current hashes
function callback(distance) {
  console.log('The screen has changed!');
}

screen.listenHash(bounds, threshold, delayMs, sensitivity, hashType, callback);
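With threshold = 3 as above, the callback fires once the Hamming distance between consecutive hashes of the monitored area exceeds 3; raising the threshold makes the listener ignore smaller changes.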

listenPixelDiff

This is an experimental method and is subject to breaking changes (Available in v3.9.1-beta.1).
Monitors the specified screen area for changes. The callback is called when a change is detected. The screen is considered different when the pixel difference between subsequent captures exceeds the provided threshold. The difference is computed by counting differing pixel components and is a number between 0 and 1: a difference of 0 means the images are identical, and a difference of 1 means the images are entirely different.
import { screen } from '@oliveai/ldk';

const delayMs = 500;
const threshold = 0.2;
const bounds = {
  top: 0,
  left: 0,
  width: 512,
  height: 512,
};

// distance is the pixel difference between the previous and current captures
function callback(distance) {
  console.log('The screen has changed!');
}

screen.listenPixelDiff(bounds, threshold, delayMs, callback);
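With threshold = 0.2 as above, the callback fires when more than 20% of the pixel components in the monitored area change between captures.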

listenPixelDiffActiveWindow

This is an experimental method and is subject to breaking changes (Available in v3.9.1-beta.1).
Monitors the active window for changes. The callback is called when a change is detected. The window is considered different when the pixel difference between subsequent captures exceeds the provided threshold. The difference is computed by counting differing pixel components and is a number between 0 and 1: a difference of 0 means the images are identical, and a difference of 1 means the images are entirely different.
import { screen } from '@oliveai/ldk';

const delayMs = 500;
const threshold = 0.2;

// distance is the pixel difference between the previous and current captures
function callback(distance) {
  console.log('The active window has changed!');
}

screen.listenPixelDiffActiveWindow(threshold, delayMs, callback);

listenOcrMonitor

This function listens for active window changes. Active window recognition happens in the backend, so you don't need the Window Aptitude to capture the active window! You can run multiple listenOcrMonitor listeners at the same time. Once started, listenOcrMonitor keeps running in the background; after you close all the Whispers that have listenOcrMonitor running, it stops.
You can find the ocrEvent type here: OcrEvent
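Each event pairs the previous and latest recognition results for a changed region. A hypothetical sketch of its shape, inferred from the examples in this document and reusing the OCRResult sketch from the ocr section (not the library's published type definition):
// Illustrative sketch only, inferred from the examples in this document:
// each event pairs the previous and latest recognition results,
// each carrying bounds and text.
interface OcrEvent {
  old: OCRResult; // previous bounds and text
  new: OCRResult; // latest bounds and text
}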
import { screen } from '@oliveai/ldk';

function callback(ocrEvents) {
  console.log('The active window has changed!');
  ocrEvents.forEach((element) => {
    // This logs every newly changed piece of text, to give you a better
    // understanding of what listenOcrMonitor does. Be careful: logging
    // every change will negatively affect PC performance.
    console.log(element.new.text);
  });
}

screen.listenOcrMonitor(callback);

ocrFileEncoded

This function provides the ability to OCR an image file fetched over the network (using the Network Aptitude) or read from a local directory (using the Filesystem Aptitude). Note: this function only takes a base64-encoded string as its parameter, so you will need to encode the file bytes to a base64 string before passing them to ocrFileEncoded.
You can find OCRResult here: OCRResult
import { screen } from '@oliveai/ldk';

// encodedFileString is the base64-encoded file bytes
const encodedFileString = 'test';

// Calling the ocrFileEncoded() function returns ocrResults,
// an array of OCRResult objects.
screen.ocrFileEncoded(encodedFileString).then((ocrResults) => {
  console.log(JSON.stringify(ocrResults));
  // Example output:
  // {"confidence":-1,"text":"","level":4,"page_num":1,"block_num":8,"par_num":1,"line_num":1,"word_num":0,"left":319,"top":323,"width":426,"height":21},
  // {"confidence":91,"text":"[email protected]","level":5,"page_num":1,"block_num":8,"par_num":1,"line_num":1,"word_num":1,"left":319,"top":323,"width":426,"height":21},
});
/*
 * Note that ocrFileEncoded's resulting output is not plain text; it is an array
 * of OCRResult objects. To extract specific text, see the Example section below
 * for detailed usage of this function.
 */

Example

If you have a window that contains text that you want to read, you can use the ocr() function to read the text content in the window. In this example we create a Whisper that uses the OCR method to read the text contents.
First, we create a Whisper by defining the writeWhisper() function. When the user clicks the "Perform OCR" button, the onClick() function fires, in turn calling the performOcr() function. It may look like a lot of code, but performOcr() is the key function that makes the magic happen!
🎉
It calls ocr() to perform optical character recognition. The Window Aptitude is used to detect the active window, so we can get the active window's coordinates using activeWindow().
import { screen, whisper, window } from '@oliveai/ldk';

// Create a whisper
const writeWhisper = (label, body) =>
  whisper.create({
    label,
    onClose: () => {
      console.log(`Closed Whisper`);
    },
    components: [
      {
        body,
        type: whisper.WhisperComponentType.Markdown,
      },
      {
        type: whisper.WhisperComponentType.Button,
        label: 'Perform OCR',
        // After clicking this button, performOcr() will be called
        onClick: (error, incomingWhisper) => {
          incomingWhisper.close((e) => console.error(e));
          performOcr();
        },
      },
    ],
  });

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function performOcr() {
  await sleep(3000); // Sleep for 3s to give the user time to switch tabs
  window.activeWindow().then((windowInfo) => {
    const ocrCoordinates = {
      top: windowInfo.y,
      left: windowInfo.x,
      width: windowInfo.width,
      height: windowInfo.height,
    };

    console.log('performing ocr with coordinates...');

    screen
      .ocr(ocrCoordinates)
      .then((ocrResults) => {
        console.log('OCR Results: ');
        console.log(JSON.stringify(ocrResults));

        // ocrResults is an array of OCRResult objects.
        // This function reconstructs ocrResults into a string.
        console.log(rebuild_image(ocrResults));

        // Filter out values with a lower confidence value
        let resFilter = ocrResults.filter((res) => res.confidence > 75);
        resFilter = resFilter.map((res) => `${res.text}`);
        writeWhisper(`ocrResults`, `${resFilter.join(' ')}`);
      })
      .catch((error) => {
        console.log('error: ');
        console.log(error);
      });
  });
}
Second, since ocr() returns an array of OCRResult objects, we need to reconstruct the result into a readable string. For this we'll write the rebuild_image() function. It loops through every paragraph, every line, and every word, concatenating all the words to form a string.
// This function reconstructs ocrResults into a string
const rebuild_image = (ocrResults) => {
  const lines = [];
  for (const box of ocrResults) {
    if (box.level === undefined) {
      continue;
    }
    const text = box.text;
    const cur_line = box.line_num;
    const cur_word = box.word_num;
    const par_num = box.par_num;

    while (lines.length <= par_num) {
      lines.push([]);
    }
    while (lines[par_num].length <= cur_line) {
      lines[par_num].push([]);
    }
    while (lines[par_num][cur_line].length <= cur_word) {
      lines[par_num][cur_line].push('');
    }
    lines[par_num][cur_line][cur_word] = text;
  }

  const full_text = [];
  for (const para of lines) {
    const para_temp = [];
    for (const list_of_words of para) {
      para_temp.push(list_of_words.join(' '));
    }
    full_text.push(para_temp.join('\n'));
  }

  return full_text.join('\n\n');
};
Finally, wrap those functions up and call the top-level wrapper function OcrLoop(). And the magic will happen!
🎉
// Call the writeWhisper function to create an OCR whisper
async function OcrLoop() {
  writeWhisper('ocr', 'starting ocr app');
}

// Call OcrLoop()
OcrLoop();
console.log('starting app');

listenOcrMonitor Example

Now, let's take a look at a listenOcrMonitor example!
The listener listens for changes to the active window. ocrEvent.old provides the previous bounds and text, and ocrEvent.new provides the latest bounds and text; we then just concatenate the text we want! The listenOcrMonitor function recognizes the active window in the backend, so you don't need to integrate it with the Window Aptitude here!
import { screen, whisper } from '@oliveai/ldk';
import { OcrEvent } from '@oliveai/ldk/dist/screen/types';

console.log('Running listenOcrMonitor function...');

function sleep(ms: number) {
  // Sleep for ms milliseconds
  return new Promise((resolve) => setTimeout(resolve, ms));
}

const listener = await screen.listenOcrMonitor(async (ocrEvents) => {
  await sleep(1000);
  const resultNew: string[] = [];
  const resultOld: string[] = [];

  // For each ocrEvent, push the updated text to resultNew and the previous text to resultOld
  ocrEvents.forEach((ocrEvent) => {
    resultNew.push(ocrEvent.new.text);
    resultOld.push(ocrEvent.old.text);
  });
  const resultNewString = resultNew.join(' ');
  const resultOldString = resultOld.join(' ');

  whisper.create({
    label: 'Test Screen Monitor',
    onClose: () => {
      console.log(`Closed Whisper`);
    },
    components: [
      {
        body: `New Text: ${resultNewString}`,
        type: whisper.WhisperComponentType.Markdown,
      },
      {
        body: `Old Text: ${resultOldString}`,
        type: whisper.WhisperComponentType.Markdown,
      },
    ],
  });
  console.log('Result of the changed text is', resultNewString);
  listener.cancel();
});

ocrFileEncoded Example

Here is an example of the usage of the ocrFileEncoded function.
First, we need a way to pass the file to the Loop (this can be done with the Network or Filesystem Aptitude), then read the file and encode its bytes to a base64 string.
Second, pass the encoded file data to the ocrFileEncoded function, which returns an array of OCRResult objects.
import { network, screen, whisper } from '@oliveai/ldk';
import { Buffer } from 'buffer';

// Branch of the loop-development-kit repository hosting the test image
const branch = 'develop';

// Whisper to perform ocrFileEncoded
const writeWhisperFileEncoded = (label: string, body: string) => {
  whisper.create({
    label,
    onClose: () => {
      console.log('Closed Whisper');
    },
    components: [
      { body, type: whisper.WhisperComponentType.Markdown },
      {
        type: whisper.WhisperComponentType.Button,
        label: 'Perform testOcrFileEncoded',
        onClick: (error, incomingWhisper) => {
          incomingWhisper.close((e) => console.error(e));
          performOcrFileEncoded();
        },
      },
    ],
  });
};

// Write a whisper to show the result next to the requested image file
const writeWhisperFileEncodedResult = (label: string, body: string) => {
  whisper.create({
    label,
    onClose: () => {
      console.log('Closed Whisper');
    },
    components: [
      { body, type: whisper.WhisperComponentType.Markdown },
      {
        body: `![image](https://raw.githubusercontent.com/open-olive/loop-development-kit/${branch}/ldk/javascript/examples/self-test-loop/static/testocr.png)`,
        type: whisper.WhisperComponentType.Markdown,
      },
      {
        type: whisper.WhisperComponentType.Button,
        label: 'Perform testOcrFileEncoded',
        onClick: (error, incomingWhisper) => {
          incomingWhisper.close((e) => console.error(e));
          performOcrFileEncoded();
        },
      },
    ],
  });
};

async function performOcrFileEncoded() {
  // Request an image file over the network, or use the Filesystem Aptitude to read local files
  const request = await network.httpRequest({
    url: `https://github.com/open-olive/loop-development-kit/raw/develop/ldk/javascript/examples/self-test-loop/static/testocr.png`,
    method: 'GET',
  });

  // Encode the file bytes to a base64 string
  const encodedImage = Buffer.from(request.body).toString('base64');

  screen
    .ocrFileEncoded(encodedImage)
    .then((result) => {
      console.log('OCR Results: ');
      console.log(JSON.stringify(result));
      const concatResult = result.map((res) => res.text).join(' ');
      console.log('concatResult', concatResult);
      writeWhisperFileEncodedResult('result', concatResult);
    })
    .catch((error) => {
      console.log('error: ');
      console.log(error);
    });
}

// Wrapper function to start the app
const ocrFileEncoded = (): Promise<boolean> =>
  new Promise(async (resolve, reject) => {
    try {
      await writeWhisperFileEncoded('OcrFileEncoded', 'Starting testOcrFileEncoded');
      resolve(true);
    } catch (e) {
      console.error(e);
      reject(e);
    }
  });
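If the image lives on disk instead, here is a minimal sketch using the Filesystem Aptitude. This assumes filesystem.readFile resolves with the raw file bytes, and the path is hypothetical:
import { filesystem, screen } from '@oliveai/ldk';
import { Buffer } from 'buffer';

// Hypothetical local path, for illustration only
const imagePath = '/tmp/testocr.png';

// Assumption: filesystem.readFile resolves with the raw file bytes
const fileBytes = await filesystem.readFile(imagePath);
const encodedImage = Buffer.from(fileBytes).toString('base64');

const ocrResults = await screen.ocrFileEncoded(encodedImage);
console.log(ocrResults.map((res) => res.text).join(' '));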

Permissions

To use the Screen Aptitude, simply set the following permissions in your package.json under the ldk object.
Please see our Permissions page for more information.
...
"ldk": {
  "permissions": {
    "screen": {},
    ...
  }
},
...