Tutorial

Screenshot to Action: A Deep Dive Into the /v1/predict Endpoint


Rachel Kim|June 16, 2026|6 min


Most automation tools rely on brittle selectors or rigid APIs that break when a UI changes. The /v1/predict endpoint flips that model. You send a base64 screenshot and a natural language instruction, and the model returns concrete actions like click, type, and scroll. This is the core of a reliable computer use agent that reads the screen and acts like a human. This guide walks through the endpoint, request fields, pricing, and a working example.

How /v1/predict works

The endpoint takes a base64 screenshot, an instruction, and a CUA version, then returns an actions array and a status. You loop capture, predict, and act until status is done.

Request fields (POST https://coasty.ai/v1/predict):

- screenshot: base64-encoded image, e.g., a PNG or JPEG
- instruction: natural language describing what to do
- cua_version: either 'v3' or 'v4' (default 'v3')

Response fields:

- actions: array of action objects (e.g., type, click, scroll)
- status: 'pending', 'done', or an error code

Every prediction costs $0.05.
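The request fields described above can be assembled and checked before any paid call is made. The following is a minimal sketch, not an official client: `build_payload` and `ALLOWED_CUA_VERSIONS` are hypothetical names introduced here, and the only rules enforced are the ones stated in this guide (a base64 screenshot, a non-empty instruction, and a version of 'v3' or 'v4').

```python
import base64

# Versions listed in this guide; the set name is a hypothetical helper.
ALLOWED_CUA_VERSIONS = {"v3", "v4"}


def build_payload(image_bytes: bytes, instruction: str, cua_version: str = "v3") -> dict:
    """Build the JSON body for POST /v1/predict from raw image bytes."""
    if cua_version not in ALLOWED_CUA_VERSIONS:
        raise ValueError(f"cua_version must be one of {sorted(ALLOWED_CUA_VERSIONS)}")
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    return {
        # The endpoint expects the screenshot as a base64 string.
        "screenshot": base64.b64encode(image_bytes).decode("ascii"),
        "instruction": instruction,
        "cua_version": cua_version,
    }


if __name__ == "__main__":
    payload = build_payload(b"\x89PNG fake image bytes", "Click the OK button")
    print(sorted(payload))
```

Rejecting a bad version or an empty instruction locally avoids spending a prediction on a request that cannot succeed.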

bash

curl -X POST https://coasty.ai/v1/predict \
     -H "X-API-Key: $COASTY_API_KEY" \
     -H 'Content-Type: application/json' \
     -d '{
       "screenshot": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8/5+hHgAHggJ/PchI7wAAAABJRU5ErkJggg==",
       "instruction": "Click the OK button",
       "cua_version": "v3"
     }'

Response:
{
  "actions": [
    {"type": "click", "x": 200, "y": 150}
  ],
  "status": "done"
}
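Before executing anything, it can help to validate a response like the one above. This sketch assumes the action fields that appear in this guide's examples (x and y for click, text for type, delta for scroll); `Action`, `REQUIRED_FIELDS`, and `parse_prediction` are hypothetical names, and the real API may return additional action types or fields.

```python
from dataclasses import dataclass
from typing import Any

# Fields each action type is assumed to carry, based on this guide's examples.
REQUIRED_FIELDS = {
    "click": ("x", "y"),
    "type": ("text",),
    "scroll": ("delta",),
}


@dataclass(frozen=True)
class Action:
    type: str
    params: dict[str, Any]


def parse_prediction(data: dict[str, Any]) -> tuple[list[Action], str]:
    """Validate a /v1/predict response body and return (actions, status)."""
    status = data.get("status")
    if not isinstance(status, str):
        raise ValueError("response has no string 'status' field")

    actions = []
    for raw in data.get("actions", []):
        kind = raw.get("type")
        if kind not in REQUIRED_FIELDS:
            raise ValueError(f"unsupported action type: {kind!r}")
        missing = [name for name in REQUIRED_FIELDS[kind] if name not in raw]
        if missing:
            raise ValueError(f"{kind} action is missing fields: {missing}")
        actions.append(Action(kind, {name: raw[name] for name in REQUIRED_FIELDS[kind]}))
    return actions, status


if __name__ == "__main__":
    sample = {"actions": [{"type": "click", "x": 200, "y": 150}], "status": "done"}
    print(parse_prediction(sample))
```

Failing loudly on an unknown action type is a deliberate choice here: silently skipping it would let the loop keep paying for predictions while making no progress on screen.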

Full Python loop

● Capture the screen using pyautogui or a library like mss
● Encode the image to base64
● POST to /v1/predict with the screenshot, instruction, and cua_version
● Loop capture, predict, and act until status is done
● Each prediction costs $0.05

python

import base64
import io
import os

import pyautogui
import requests


def predict_and_act(instruction, cua_version="v3", max_steps=20):
    url = "https://coasty.ai/v1/predict"
    api_key = os.environ["COASTY_API_KEY"]  # fail fast if the key is not set
    headers = {"X-API-Key": api_key}

    # max_steps is a local safety cap (not an API parameter): every
    # iteration is a billed prediction, so the loop must not run forever.
    for _ in range(max_steps):
        # Capture the screen and encode it as a base64 PNG, in memory
        buf = io.BytesIO()
        pyautogui.screenshot().save(buf, format="PNG")
        base64_img = base64.b64encode(buf.getvalue()).decode()

        # Predict
        resp = requests.post(
            url,
            headers=headers,
            json={
                "screenshot": base64_img,
                "instruction": instruction,
                "cua_version": cua_version,
            },
            timeout=60,  # do not hang indefinitely on a stalled request
        )
        resp.raise_for_status()
        data = resp.json()

        actions = data.get("actions", [])
        status = data.get("status")

        # Act on each action
        for act in actions:
            if act["type"] == "click":
                pyautogui.click(act["x"], act["y"])
            elif act["type"] == "type":
                pyautogui.write(act["text"])
            elif act["type"] == "scroll":
                pyautogui.scroll(act["delta"])

        if status == "done":
            return
        if status != "pending":
            # Anything other than 'pending' or 'done' is an error code
            raise RuntimeError(f"Prediction failed with status: {status}")

    raise RuntimeError(f"Task not done after {max_steps} predictions")

if __name__ == "__main__":
    predict_and_act("Click the OK button")

The loop repeats capture, predict, and act until the endpoint returns status done. Each iteration is one prediction, so a task costs $0.05 times the number of iterations it takes.
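Because each prediction is billed at $0.05, it is worth estimating what a task will cost before running it. The sketch below only does that arithmetic; `estimate_cost` and `max_predictions` are hypothetical helper names, and the price is the flat per-prediction figure quoted in this guide.

```python
from decimal import Decimal

# Flat per-prediction price in USD, as quoted in this guide.
PRICE_PER_PREDICTION = Decimal("0.05")


def estimate_cost(predictions: int) -> Decimal:
    """Total cost in USD for a given number of predictions."""
    if predictions < 0:
        raise ValueError("predictions must be non-negative")
    return PRICE_PER_PREDICTION * predictions


def max_predictions(budget_usd: str) -> int:
    """Largest number of predictions that fits within a budget given in USD."""
    return int(Decimal(budget_usd) // PRICE_PER_PREDICTION)


if __name__ == "__main__":
    print(estimate_cost(12))        # 0.60
    print(max_predictions("1.00"))  # 20
```

Decimal is used instead of float so that repeated $0.05 increments do not accumulate rounding error. The result of `max_predictions` is a natural value for an iteration cap on the loop.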

Where this beats brittle automation

Traditional automation relies on fixed selectors or specific API endpoints. When a UI updates, those selectors break. With a computer use API, your agent sees the screen and chooses actions based on the current layout. It can handle dynamic buttons, overlapping elements, and language changes. You get a robust agent that adapts to real-world software without brittle selectors.

The /v1/predict endpoint is the foundation of a powerful computer use agent. Build a bot that reads a screen, decides what to do, and executes clicks and keystrokes just like a human. Want to try it? Get your API key at https://coasty.ai/developers and start turning screenshots into real actions.
