The Cursor Aptitude provides the ability to retrieve information about the user's cursor.
listenPosition
Starts listening for changes to the cursor position. Takes a callback function as an argument, which receives a Position each time it is called. Returns a promise that resolves once the listener is set and provides a Cancellable, which you can use to turn off the listener.
import { cursor } from '@oliveai/ldk';

// Function that is called after position() is successful.
// Provides the cursor's position when position() was executed.
const callback = (position) => {
  console.log(`Cursor position: ${position.x}/${position.y}`);
};

cursor.position().then(callback);
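The listenPosition flow described above (register a callback, then use the Cancellable from the promise to stop listening) might be sketched as follows. The `Position`, `Cancellable`, and `CursorLike` interfaces and the `collectPositions` helper are hypothetical names introduced for illustration; they only mirror the contract described in this section and are not taken from the LDK's type definitions.

```typescript
// Hypothetical shapes that mirror the contract described above.
// The LDK's real type names and fields may differ.
interface Position { x: number; y: number; }
interface Cancellable { cancel(): void; }
interface CursorLike {
  listenPosition(callback: (position: Position) => void): Promise<Cancellable>;
}

// Collects the first `limit` positions (limit >= 1), then cancels the listener.
function collectPositions(cursor: CursorLike, limit: number): Promise<Position[]> {
  return new Promise<Position[]>((resolve, reject) => {
    const seen: Position[] = [];
    let listener: Cancellable | undefined;
    let finished = false;

    const finish = () => {
      finished = true;
      // The listener may not be available yet if updates arrived very early.
      listener?.cancel();
      resolve(seen);
    };

    cursor
      .listenPosition((position) => {
        if (finished) return; // ignore updates that arrive after we are done
        seen.push(position);
        if (seen.length >= limit) finish();
      })
      .then((cancellable) => {
        listener = cancellable;
        // If the limit was reached before the promise resolved, cancel now.
        if (finished) listener.cancel();
      })
      .catch(reject);
  });
}
```

Inside a Loop, assuming the LDK's cursor object matches this shape, the call would look like `collectPositions(cursor, 5)`. Keeping the Cancellable in scope is the important part: without it, the listener keeps firing for as long as the Loop runs.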
In this example, we are going to use the Cursor and Keyboard Aptitudes to provide the required coordinates to the Screen Aptitude's ocr() function.
We'll use listenCharacter() to listen for key presses and position() to read the cursor position when a hotkey is pressed. Move your cursor to the top-left corner of the area you want to perform OCR on, then press 's' or 'S' to capture those coordinates. Then move your cursor to the bottom-right corner of the area and press 'a' or 'A' to capture the second set of coordinates. From these two positions we calculate ocrCoordinates and pass them to the ocr() function.
import { cursor, keyboard, screen } from '@oliveai/ldk';

// rebuildImage() and writeWhisper() are helper functions that are not shown here;
// they are assumed to be defined elsewhere in your Loop.

// Set to 1 once the first corner has been captured.
let check = 0;

async function runOCRWithCursorPosition() {
  let topParam: number;
  let leftParam: number;
  let topParam1: number;
  let leftParam1: number;

  // First listener: press 's' or 'S' to capture the top-left corner.
  const listener1 = await keyboard.listenCharacter(async (char) => {
    console.debug('Hotkey pressed', 'response', char);
    if (char === 's' || char === 'S') {
      const position = await cursor.position();
      // We have to convert position.x and position.y to integers:
      // ocrCoordinates only takes integers in our backend.
      leftParam = parseInt(position.x.toString(), 10);
      topParam = parseInt(position.y.toString(), 10);
      console.log('First time cursor position:', topParam, leftParam);
      listener1.cancel();
      check = 1;
    }
  });

  // Second listener: press 'a' or 'A' to capture the bottom-right corner.
  // It is ignored until the first corner has been captured.
  const listener2 = await keyboard.listenCharacter(async (char) => {
    console.debug('Hotkey pressed', 'response', char);
    if ((char === 'a' || char === 'A') && check === 1) {
      const position1 = await cursor.position();
      leftParam1 = parseInt(position1.x.toString(), 10);
      topParam1 = parseInt(position1.y.toString(), 10);
      console.log('Second time cursor position:', topParam1, leftParam1);
      listener2.cancel();

      const width = Math.abs(leftParam1 - leftParam);
      const height = Math.abs(topParam1 - topParam);
      const ocrCoordinates = {
        top: topParam,
        left: leftParam,
        width,
        height,
      };
      console.log('got OCR coordinates:');
      console.log(
        ocrCoordinates.top,
        ocrCoordinates.left,
        ocrCoordinates.width,
        ocrCoordinates.height,
      );

      console.log('performing ocr with coordinates...');
      const result = await screen.ocr(ocrCoordinates);
      console.log('OCR Results: ');
      console.log(JSON.stringify(result));
      console.log('result: ', rebuildImage(result));
      writeWhisper(`result`, rebuildImage(result));
    }
  });
}
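The example above takes top and left from the first capture, so it only yields the intended rectangle when the first key press really is at the top-left corner. As a hedged sketch, the two captured points could instead be normalized so that the corners can be given in either order. The `Point` and `OcrBounds` interfaces and the `boundsFromCorners` function are hypothetical names for illustration, not part of the LDK.

```typescript
// Hypothetical types for illustration; the LDK's own type names may differ.
interface Point { x: number; y: number; }
interface OcrBounds { top: number; left: number; width: number; height: number; }

// Builds an integer rectangle from two opposite corners given in any order.
// Coordinates are truncated because the OCR call expects integers.
function boundsFromCorners(a: Point, b: Point): OcrBounds {
  const left = Math.trunc(Math.min(a.x, b.x));
  const top = Math.trunc(Math.min(a.y, b.y));
  const right = Math.trunc(Math.max(a.x, b.x));
  const bottom = Math.trunc(Math.max(a.y, b.y));
  return { top, left, width: right - left, height: bottom - top };
}

// The corners are deliberately passed "backwards" (bottom-right first).
const bounds = boundsFromCorners({ x: 410.6, y: 300.2 }, { x: 120.9, y: 80.7 });
console.log(bounds); // { top: 80, left: 120, width: 290, height: 220 }
```

With a helper like this, the object it returns could be passed to ocr() in place of the hand-built ocrCoordinates, and the order of the two key presses would no longer matter.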
Now we can wrap up runOCRWithCursorPosition with our Whisper!