Send a screenshot. Get structured mouse and keyboard actions back. One REST endpoint — for automation, browser testing, and AI agents that interact with any GUI.
{ "action_type": "click", "params": { "x": 98, "y": 136 } }
Pure REST. No SDK lock-in, no extra servers, no browser drivers.
import requests, base64

img = base64.b64encode(open("screen.png", "rb").read()).decode()

r = requests.post(
    "https://coasty.ai/api/v1/cua/predict",
    headers={"X-API-Key": "cua_sk_..."},
    json={
        "screenshot": img,
        "instruction": "Click the search bar and type 'hello'",
    },
)

for a in r.json()["actions"]:
    print(a["action_type"], a["params"])
No selectors. No DOM parsing. No brittle XPath. Just vision.
Send screenshot
Base64 PNG/JPEG + plain-language intent
AI reasons visually
Vision model identifies the target UI element
Execute actions
Typed primitives: click, type, scroll, press…
Works on any UI — web, desktop, mobile, VNC. No DOM access, no selectors, no agents.
Multi-step trajectories. The model remembers what it tried, what worked, and what's next.
V3 for speed (3.5s/step, multi-action). V1 for precision (reflection, single-action).
Browser tabs, desktop apps, mobile emulators, VNC feeds — anything you can capture visually.
click, double_click, type, scroll, drag, key_press, key_combo, wait, done, fail.
Plain REST + JSON. Python, Node, Go, Ruby, PHP, Java, C#, or cURL from your terminal.
Deducted from your shared credit balance. Management endpoints always free.
POST /predict                     5 cr
POST /sessions                    10 cr
POST /sessions/{id}/predict       4 cr
POST /ground                      3 cr
POST /ocr                         3 cr
POST /parse                       Free
GET /models, /usage, /sessions    Free
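As a rough worked example of the prices above, the sketch below compares independent /predict calls with a session for a task of a given number of steps. The function and constant names are illustrative and not part of the API. The calculation uses only the listed per-call prices, so treat the totals as a lower bound if other charges apply.

```python
# Credit prices copied from the pricing table above.
PREDICT_COST = 5          # POST /predict
SESSION_CREATE_COST = 10  # POST /sessions
SESSION_STEP_COST = 4     # POST /sessions/{id}/predict


def stateless_cost(steps: int) -> int:
    """Credits for `steps` independent POST /predict calls."""
    return PREDICT_COST * steps


def session_cost(steps: int) -> int:
    """Credits for creating one session plus `steps` session predictions."""
    return SESSION_CREATE_COST + SESSION_STEP_COST * steps


if __name__ == "__main__":
    for steps in (1, 5, 10, 20):
        print(steps, stateless_cost(steps), session_cost(steps))
```

At these prices the two approaches cost the same at 10 steps (50 credits each). For longer tasks a session uses fewer credits, for example 90 instead of 100 at 20 steps.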
The CUA API gives your code the ability to see and interact with any screen. Send a screenshot and a natural language instruction — receive structured mouse clicks, keyboard inputs, and scroll commands with exact coordinates.
Every request needs an X-API-Key header. Sign up to create API keys. Credits are deducted per request from your shared balance.
X-API-Key: cua_sk_your_key_here
Choose your language. The predict endpoint is the core of the API; everything else builds on it.
pip install requests

import requests, base64

API_KEY = "cua_sk_..."

img = base64.b64encode(open("screen.png", "rb").read()).decode()

r = requests.post(
    "https://coasty.ai/api/v1/cua/predict",
    headers={"X-API-Key": API_KEY},
    json={
        "screenshot": img,
        "instruction": "Click the search bar and type 'hello'",
    },
)

for action in r.json()["actions"]:
    print(action["action_type"], action["params"])

# Create a session for multi-step tasks
s = requests.post(
    "https://coasty.ai/api/v1/cua/sessions",
    headers={"X-API-Key": API_KEY},
    json={"cua_version": "v3", "screen_width": 1920, "screen_height": 1080},
).json()
session_id = s["session_id"]

# Send screenshots in a loop
while True:
    screenshot = capture_screenshot()  # your screenshot function
    r = requests.post(
        f"https://coasty.ai/api/v1/cua/sessions/{session_id}/predict",
        headers={"X-API-Key": API_KEY},
        json={"screenshot": screenshot, "instruction": "Complete the form"},
    ).json()

    for action in r["actions"]:
        execute_action(action)  # your action executor

    if r["status"] in ("done", "fail"):
        break

Every prediction returns structured actions with exact coordinates, a status signal, and token usage.
{
  "request_id": "req_abc123",
  "actions": [
    {
      "action_type": "click",
      "params": { "x": 512, "y": 340, "button": "left", "clicks": 1 }
    },
    {
      "action_type": "type_text",
      "params": { "text": "hello world" }
    }
  ],
  "reasoning": "I see a search bar at (512, 340)...",
  "status": "continue",
  "usage": {
    "input_tokens": 1523,
    "output_tokens": 245,
    "credits_charged": 5
  }
}

click        Mouse click at (x, y)
type_text    Type a string
key_press    Press a key (enter, tab...)
key_combo    Combo (ctrl+c, cmd+v...)
scroll       Scroll at a position
drag         Drag between two points
move         Move cursor
wait         Pause execution
done         Task completed
fail         Task impossible

Only screenshot and instruction are required.
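One way to consume the action types above is a small dispatch table that maps each action_type to a handler. In this sketch the handlers only record what they would do. The handler names, the `log` list, and the default values are hypothetical; only the `click` and `type_text` parameter shapes come from the example response. A real executor would call an input-automation library in place of `log.append`.

```python
from typing import Callable

log: list[str] = []  # hypothetical stand-in for real mouse/keyboard calls


def do_click(params: dict) -> None:
    # "button" and "clicks" appear in the example response; the defaults are assumed.
    log.append(
        f"click {params['x']},{params['y']} "
        f"{params.get('button', 'left')} x{params.get('clicks', 1)}"
    )


def do_type_text(params: dict) -> None:
    log.append(f"type {params['text']!r}")


HANDLERS: dict[str, Callable[[dict], None]] = {
    "click": do_click,
    "type_text": do_type_text,
    # key_press, key_combo, scroll, drag, move, and wait would be registered here.
}

TERMINAL = {"done", "fail"}


def execute_action(action: dict) -> bool:
    """Run one action. Returns False when the action ends the task."""
    kind = action["action_type"]
    if kind in TERMINAL:
        return False
    handler = HANDLERS.get(kind)
    if handler is None:
        raise ValueError(f"unsupported action_type: {kind}")
    handler(action.get("params", {}))
    return True
```

Raising on an unknown action_type, instead of silently skipping it, keeps the executor from drifting out of sync with what the model believes has happened on screen.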
screenshot       string         required
instruction      string         required
cua_version      "v3" | "v1"
screen_width     int
screen_height    int
max_actions      int (1-10)
trajectory       array
system_prompt    string
tools            string[]

All endpoints require the X-API-Key header. Credits deducted from your shared balance.
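The request fields listed above could be assembled with a small helper that sends only the options the caller sets. This is a sketch under assumptions: `build_predict_payload` is a hypothetical name, and the range check simply mirrors the `int (1-10)` note for `max_actions`. The meaning of the other optional fields is not described here, so they are passed through untouched.

```python
import base64
from typing import Any, Optional


def build_predict_payload(
    image_bytes: bytes,
    instruction: str,
    cua_version: Optional[str] = None,
    max_actions: Optional[int] = None,
    **extra: Any,
) -> dict:
    """Build a JSON body for a predict request, omitting unset optional fields."""
    if cua_version is not None and cua_version not in ("v3", "v1"):
        raise ValueError('cua_version must be "v3" or "v1"')
    if max_actions is not None and not 1 <= max_actions <= 10:
        raise ValueError("max_actions must be between 1 and 10")

    payload: dict = {
        "screenshot": base64.b64encode(image_bytes).decode("ascii"),
        "instruction": instruction,
    }
    if cua_version is not None:
        payload["cua_version"] = cua_version
    if max_actions is not None:
        payload["max_actions"] = max_actions
    # Other optional fields (screen_width, screen_height, trajectory,
    # system_prompt, tools) are forwarded as given when they are not None.
    payload.update({k: v for k, v in extra.items() if v is not None})
    return payload
```

The result can be passed as the `json=` argument of `requests.post`, as in the Python example earlier on this page.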
/api/v1/cua/predict                  5 cr
/api/v1/cua/sessions                 10 cr
/api/v1/cua/sessions/{id}/predict    4 cr
/api/v1/cua/sessions/{id}/reset      Free
/api/v1/cua/sessions/{id}            Free
/api/v1/cua/ground                   3 cr
/api/v1/cua/ocr                      3 cr
/api/v1/cua/parse                    Free
/api/v1/cua/models                   Free
/api/v1/cua/usage                    Free
/api/v1/cua/sessions                 Free

All errors return a JSON body with error.code and error.message fields.
INVALID_API_KEY         Missing or invalid X-API-Key
INSUFFICIENT_CREDITS    Not enough credits for this request
INSUFFICIENT_SCOPE      API key lacks the required scope
RATE_LIMIT_EXCEEDED     Too many requests (check Retry-After header)
INVALID_SCREENSHOT      Bad base64 or unsupported image format
SESSION_NOT_FOUND       Session expired or does not exist

Free account, free keys, free credits to start. No card required.
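A client might turn the error codes listed above into a retry decision along these lines. This is a minimal sketch: it assumes the error body is shaped like {"error": {"code": ..., "message": ...}}, as the error.code and error.message fields suggest, and that Retry-After usually carries a number of seconds. `retry_delay` is a hypothetical helper, not part of the API.

```python
from typing import Mapping, Optional


def retry_delay(
    body: dict, headers: Mapping[str, str], default: float = 1.0
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if a retry will not help."""
    code = body.get("error", {}).get("code")
    if code != "RATE_LIMIT_EXCEEDED":
        # Bad key, missing credits or scope, bad image, unknown session:
        # the same request would fail again, so do not retry it.
        return None
    try:
        return float(headers.get("Retry-After", default))
    except ValueError:
        # Retry-After may also be an HTTP date; fall back to the default wait.
        return default
```

With the requests library, `body` would be `response.json()` and `headers` would be `response.headers`, whose lookups are case-insensitive; a plain dict needs the exact `Retry-After` spelling.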