Screen
The Screen aptitude provides the ability to read the user's screen using techniques such as Optical Character Recognition (OCR).
API

ocr

Optical Character Recognition (OCR) automates data extraction from printed or written text in a scanned document or image file, converting the text into a machine-readable format for further processing. OCR runs locally on the user's computer, and the captured content never leaves the desktop. A Loop will only be granted this permission by customer request.
OCR processes a digital image by locating and recognizing characters, such as letters, numbers, and symbols. In Olive Helps, ocr() performs screen optical character recognition and returns an array of recognized OCRResult objects.
import { screen } from '@oliveai/ldk';

// The ocrCoordinates object identifies an area on the screen
// to search for text
const ocrCoordinates = {
  top: 1,
  left: 1,
  width: 10,
  height: 10,
};

// Calling the ocr() function returns a promise that resolves with
// ocrResults, an array of OCRResult objects.
screen.ocr(ocrCoordinates).then((ocrResults) => {
  console.log(JSON.stringify(ocrResults));
  // Example output:
  // {"confidence":-1,"text":"","level":4,"page_num":1,"block_num":8,"par_num":1,"line_num":1,"word_num":0,"left":319,"top":323,"width":426,"height":21},
  // {"confidence":91,"text":"[email protected]","level":5,"page_num":1,"block_num":8,"par_num":1,"line_num":1,"word_num":1,"left":319,"top":323,"width":426,"height":21},
});

/*
 * Please note, ocr's resulting output is not text; it is an array of
 * OCRResult objects. To extract specific text, please refer to the
 * Example section below for detailed usage of this function.
 */
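For instance, to pull readable text out of the results, you can filter by confidence and join the recognized words, as the Example section below does. A minimal sketch (the confidence cutoff of 75 is an illustrative choice, not a required value):

screen.ocr(ocrCoordinates).then((ocrResults) => {
  // Keep only words the engine is reasonably confident about,
  // then join them into a single string
  const text = ocrResults
    .filter((res) => res.confidence > 75)
    .map((res) => res.text)
    .join(' ');
  console.log(text);
});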

hash

This is an experimental method and is subject to breaking changes (Available in v3.9.1-beta.1)
The hash method calculates the image hash of the specified area using the specified hashing algorithm.
import { screen, HashType } from '@oliveai/ldk';

const sensitivity = 1;
const hashType = HashType.Average; // Optional, defaults to Average
const bounds = {
  top: 0,
  left: 0,
  width: 512,
  height: 512,
};

screen.hash(bounds, sensitivity, hashType).then((hash) => {
  console.log(`Hashed area string: ${hash}`);
});

compareHash

This is an experimental method and is subject to breaking changes (Available in v3.9.1-beta.1)
The compareHash method takes two hashes and calculates the Hamming distance between them.
import { screen, HashType } from '@oliveai/ldk';

const sensitivity = 1;
const hashType = HashType.Average;
const boundsA = {
  top: 0,
  left: 0,
  width: 512,
  height: 512,
};
// We'll compare an area that starts at the right boundary of boundsA
const boundsB = {
  top: 0,
  left: 512,
  width: 512,
  height: 512,
};

// await must run inside an async context
(async () => {
  const hashA = await screen.hash(boundsA, sensitivity, hashType);
  const hashB = await screen.hash(boundsB, sensitivity, hashType);
  const diff = await screen.compareHash(hashA, hashB);
  console.log(`The two areas have a difference of ${diff}`);
})();
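Because the result is a Hamming distance, you can use compareHash as a simple change detector by re-hashing the same area over time. A minimal sketch, reusing the bounds, sensitivity, and hashType variables from the hash example above inside an async context (the threshold of 3 mirrors the listenHash example below and is an assumption, not a recommended value):

// Capture a baseline hash, then re-hash the same area later:
// a distance above the threshold suggests the area has changed
const threshold = 3; // assumed value for illustration
const baseline = await screen.hash(bounds, sensitivity, hashType);
// ... some time later ...
const current = await screen.hash(bounds, sensitivity, hashType);
const distance = await screen.compareHash(baseline, current);
if (distance > threshold) {
  console.log('The area appears to have changed.');
}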

listenHash

This is an experimental method and is subject to breaking changes (Available in v3.9.1-beta.1)
Monitors the specified screen area for changes. The callback is called when a change is detected. The screen is considered different when the distance between subsequent hashes exceeds the provided threshold.
import { screen, HashType } from '@oliveai/ldk';

const delayMs = 500;
const sensitivity = 1;
const hashType = HashType.Average; // Optional, defaults to Average
const threshold = 3;
const bounds = {
  top: 0,
  left: 0,
  width: 512,
  height: 512,
};

// The callback receives the hash distance that triggered it
function callback(distance) {
  console.log('The screen has changed!');
}

screen.listenHash(bounds, threshold, delayMs, sensitivity, hashType, callback);

listenPixelDiff

This is an experimental method and is subject to breaking changes (Available in v3.9.1-beta.1)
Monitors the specified screen area for changes. The callback is called when a change is detected. The screen is considered different when the pixel difference between subsequent captures exceeds the provided threshold. This method counts the number of differing pixel components in the image and reports a difference between 0 and 1: a difference of 0 means the images are identical, and a difference of 1 means the images are entirely different.
import { screen } from '@oliveai/ldk';

const delayMs = 500;
const threshold = 0.2;
const bounds = {
  top: 0,
  left: 0,
  width: 512,
  height: 512,
};

// The callback receives the pixel difference (between 0 and 1)
function callback(distance) {
  console.log('The screen has changed!');
}

screen.listenPixelDiff(bounds, threshold, delayMs, callback);

listenPixelDiffActiveWindow

This is an experimental method and is subject to breaking changes (Available in v3.9.1-beta.1)
Monitors the active window for changes. The callback is called when a change is detected. The window is considered different when the pixel difference between subsequent captures exceeds the provided threshold. This method counts the number of differing pixel components in the image and reports a difference between 0 and 1: a difference of 0 means the images are identical, and a difference of 1 means the images are entirely different.
import { screen } from '@oliveai/ldk';

const delayMs = 500;
const threshold = 0.2;

// The callback receives the pixel difference (between 0 and 1)
function callback(distance) {
  console.log('The active window has changed!');
}

screen.listenPixelDiffActiveWindow(threshold, delayMs, callback);
Example

Imagine you have a window whose contents you want to read; you can use the ocr() function to read the text in that window. We'll create the following whisper to help us run the OCR method.
First, you need to create a whisper. writeWhisper() is the function that creates one. Clicking the "Perform OCR" button triggers the onClick() handler, which calls performOcr(). It may look like a lot of code, but performOcr() is the key function that makes the magic happen!
🎉
It calls ocr() to perform optical character recognition. We use the Window aptitude here to detect the active window, so you can get the active window's coordinates using activeWindow().
import { screen, whisper, window } from '@oliveai/ldk';

// Create a whisper
const writeWhisper = (label, body) =>
  whisper.create({
    label,
    onClose: () => {
      console.log(`Closed Whisper`);
    },
    components: [
      {
        body,
        type: whisper.WhisperComponentType.Markdown,
      },
      {
        type: whisper.WhisperComponentType.Button,
        label: 'Perform OCR',
        // After clicking this button, performOcr() will be called
        onClick: (error, incomingWhisper) => {
          incomingWhisper.close((e) => console.error(e));
          performOcr();
        },
      },
    ],
  });

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function performOcr() {
  await sleep(3000); // sleeping for 3s to give the user time to switch tabs
  window.activeWindow().then((windowInfo) => {
    const ocrCoordinates = {
      top: windowInfo.y,
      left: windowInfo.x,
      width: windowInfo.width,
      height: windowInfo.height,
    };

    console.log('performing ocr with coordinates...');

    screen
      .ocr(ocrCoordinates)
      .then((ocrResults) => {
        console.log('OCR Results: ');
        console.log(JSON.stringify(ocrResults));

        // ocrResults is an array of OCRResult objects.
        // This function reconstructs ocrResults into a string.
        console.log(rebuild_image(ocrResults));

        // Filter out values with a lower confidence value
        let resFilter = ocrResults.filter((res) => res.confidence > 75);
        resFilter = resFilter.map((res) => `${res.text}`);
        writeWhisper(`ocrResults`, `${resFilter.join(' ')}`);
      })
      .catch((error) => {
        console.log('error: ');
        console.log(error);
      });
  });
}
Second, knowing that ocr() returns an array of OCRResult objects, you need to reconstruct the result into a readable string. For this we'll write the rebuild_image() function. It loops through every paragraph, every line, and every word, concatenating the words to form a string.
// This function reconstructs ocrResults into a string
const rebuild_image = (ocrResults) => {
  const lines = [];
  for (const box of ocrResults) {
    if (box.level === undefined) {
      continue;
    }
    let text = box.text;
    let cur_line = box.line_num;
    let cur_word = box.word_num;
    let par_num = box.par_num;

    // Grow the nested arrays until the paragraph, line, and word slots exist
    while (lines.length <= par_num) {
      lines.push([]);
    }
    while (lines[par_num].length <= cur_line) {
      lines[par_num].push([]);
    }
    while (lines[par_num][cur_line].length <= cur_word) {
      lines[par_num][cur_line].push('');
    }
    lines[par_num][cur_line][cur_word] = text;
  }

  // Join words with spaces, lines with newlines, and paragraphs
  // with blank lines
  let full_text = [];
  for (const para of lines) {
    let para_temp = [];
    for (const list_of_words of para) {
      para_temp.push(list_of_words.join(' '));
    }
    full_text.push(para_temp.join('\n'));
  }

  return full_text.join('\n\n');
};
Finally, wrap those functions up in a top-level function, OcrLoop(), and call it. The magic will happen!
🎉
// Call the writeWhisper function to create an OCR whisper
async function OcrLoop() {
  writeWhisper('ocr', 'starting ocr app');
}

// Call OcrLoop()
OcrLoop();
console.log('starting app');
Permissions

To use the Screen aptitude, set the following permissions in your package.json under the ldk object.
...
"ldk": {
  "permissions": {
    "screen": {},
    ...
  }
},
...
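Note that the Example above also uses the Window aptitude, so that Loop would need to declare its permission as well. A sketch, assuming the window permission takes the same empty-object form shown for screen:

...
"ldk": {
  "permissions": {
    "screen": {},
    "window": {}
  }
},
...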