Python SDK

Fully async Python SDK for controlling Android phones via ScreenMCP.

Installation

pip
pip install screenmcp

From source
cd sdk/python
pip install -e .

Requirements

  • Python 3.10+
  • websockets >= 12.0
  • httpx >= 0.25.0

Quick Start

example.py
import asyncio
from screenmcp import ScreenMCPClient

async def main():
    async with ScreenMCPClient(api_key="pk_your_key_here") as phone:
        # Take a screenshot
        result = await phone.screenshot()
        print(f"Got image: {len(result['image'])} base64 characters")

        # Tap on the screen
        await phone.click(540, 960)

        # Type some text
        await phone.type_text("Hello from Python!")

        # Scroll down
        await phone.scroll("down", amount=800)

        # Get the UI tree for inspection
        tree = await phone.ui_tree()
        print(tree)

asyncio.run(main())

Configuration

Python
client = ScreenMCPClient(
    api_key="pk_...",                                # required
    api_url="https://screenmcp.com",                 # default
    device_id="your-device-uuid",                    # optional; auto-selects if omitted
    command_timeout=30.0,                            # seconds (default 30)
    auto_reconnect=True,                             # default True
)

Option           Type        Default                Description
api_key          str         --                     Required. Your API key starting with pk_.
api_url          str         https://screenmcp.com  ScreenMCP API server URL.
device_id        str | None  None                   Target device ID. If omitted, the first available device is used.
command_timeout  float       30.0                   Timeout per command in seconds.
auto_reconnect   bool        True                   Automatically reconnect on WebSocket disconnect.
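
One way to keep the API key out of source code is to build the constructor arguments from environment variables. The sketch below is only an illustration: the SCREENMCP_* variable names and the client_kwargs_from_env helper are hypothetical and not part of the SDK. Only the keyword names come from the options listed above.

```python
import os


def client_kwargs_from_env(env=None):
    """Build ScreenMCPClient keyword arguments from environment variables.

    The SCREENMCP_* variable names are hypothetical, chosen for this example.
    """
    env = os.environ if env is None else env
    api_key = env.get("SCREENMCP_API_KEY", "")
    if not api_key.startswith("pk_"):
        raise ValueError("SCREENMCP_API_KEY must be set and start with 'pk_'")

    kwargs = {"api_key": api_key}
    # Only pass optional settings that are set, so the SDK defaults still apply.
    if env.get("SCREENMCP_API_URL"):
        kwargs["api_url"] = env["SCREENMCP_API_URL"]
    if env.get("SCREENMCP_DEVICE_ID"):
        kwargs["device_id"] = env["SCREENMCP_DEVICE_ID"]
    if env.get("SCREENMCP_COMMAND_TIMEOUT"):
        kwargs["command_timeout"] = float(env["SCREENMCP_COMMAND_TIMEOUT"])
    return kwargs
```

The result can then be unpacked into the constructor, for example ScreenMCPClient(**client_kwargs_from_env()).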

Context Manager Usage

The recommended way to use the SDK is with an async context manager (async with), which automatically handles connection and cleanup.

Context manager (recommended)
async with ScreenMCPClient(api_key="pk_...") as phone:
    await phone.screenshot()
    await phone.click(100, 200)
# Connection is automatically closed when exiting the block

Manual lifecycle

If you prefer not to use the context manager, you can manage the connection manually:

Manual connect/disconnect
phone = ScreenMCPClient(api_key="pk_...")
await phone.connect()
try:
    await phone.screenshot()
finally:
    await phone.disconnect()

API Reference

All 15 command methods are async and return a dict with the command result from the phone.

screenshot()

Capture the phone screen. Returns a dict with a base64-encoded image string.

async def screenshot(self) -> dict

Returns: { "image": str } -- base64-encoded WebP image data.
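
Before writing the image anywhere, it can be worth checking that the decoded payload really is a WebP file. The sketch below relies only on the documented "image" key and on the standard WebP container layout (a RIFF header with a WEBP tag). The decode_webp helper name is hypothetical and not part of the SDK.

```python
import base64


def decode_webp(result: dict) -> bytes:
    """Decode result["image"] and check that it looks like a WebP file."""
    data = base64.b64decode(result["image"], validate=True)
    # A WebP file is a RIFF container: bytes 0-3 are b"RIFF", bytes 8-11 are b"WEBP".
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WEBP":
        raise ValueError("decoded data does not look like a WebP image")
    return data
```

With a connected client this would be used as image_bytes = decode_webp(await phone.screenshot()).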

click(x, y)

Tap on the screen at the given coordinates.

async def click(self, x: int, y: int) -> dict

Parameter  Type  Description
x          int   X coordinate
y          int   Y coordinate

Returns: dict with status.
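
Hard-coded coordinates only work on one screen resolution. Assuming x and y are screen pixels, a small helper can convert fractional positions into pixel coordinates. The to_pixels name is hypothetical, and the screen size has to be supplied by the caller because nothing in this reference shows the SDK reporting it.

```python
def to_pixels(fx: float, fy: float, width: int, height: int) -> tuple[int, int]:
    """Convert fractional screen coordinates (0.0 to 1.0) to integer pixels."""
    if not (0.0 <= fx <= 1.0 and 0.0 <= fy <= 1.0):
        raise ValueError("fx and fy must be between 0.0 and 1.0")
    # Clamp so that 1.0 maps to the last valid pixel rather than one past it.
    x = min(int(fx * width), width - 1)
    y = min(int(fy * height), height - 1)
    return x, y
```

On a 1080 x 1920 screen (an assumed size), to_pixels(0.5, 0.5, 1080, 1920) gives (540, 960), so the tap becomes await phone.click(*to_pixels(0.5, 0.5, 1080, 1920)).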

long_click(x, y)

Long-press at the given coordinates for approximately 1000 ms.

async def long_click(self, x: int, y: int) -> dict

Parameter  Type  Description
x          int   X coordinate
y          int   Y coordinate

Returns: dict with status.

drag(start_x, start_y, end_x, end_y)

Perform a drag gesture from one point to another.

async def drag(self, start_x: int, start_y: int, end_x: int, end_y: int) -> dict

Parameter  Type  Description
start_x    int   Start X coordinate
start_y    int   Start Y coordinate
end_x      int   End X coordinate
end_y      int   End Y coordinate

Returns: dict with status.

scroll(direction, amount)

Scroll the screen in a given direction. Direction must be "up", "down", "left", or "right".

async def scroll(self, direction: str, amount: int = 500) -> dict

Parameter  Type               Description
direction  str                "up", "down", "left", or "right"
amount     int (default 500)  Scroll distance in pixels

Returns: dict with status.
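
Because direction is a plain string, a typo only shows up once the command reaches the phone. A thin wrapper can reject invalid values locally first. This is a sketch: checked_scroll is a hypothetical helper, and phone is assumed to be a connected ScreenMCPClient (or any object with a compatible scroll method).

```python
VALID_DIRECTIONS = ("up", "down", "left", "right")


async def checked_scroll(phone, direction: str, amount: int = 500) -> dict:
    """Validate the direction locally, then delegate to phone.scroll()."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError(
            f"direction must be one of {VALID_DIRECTIONS}, got {direction!r}"
        )
    return await phone.scroll(direction, amount=amount)
```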

type_text(text)

Type text into the currently focused input field.

async def type_text(self, text: str) -> dict

Parameter  Type  Description
text       str   The text to type

Returns: dict with status.

get_text()

Read the text content from the currently focused element.

async def get_text(self) -> dict

Returns: { "text": str }

select_all()

Select all text in the currently focused field.

async def select_all(self) -> dict

Returns: dict with status.

copy()

Copy the currently selected text to the clipboard.

async def copy(self) -> dict

Returns: dict with status.

paste()

Paste clipboard contents into the focused field.

async def paste(self) -> dict

Returns: dict with status.

back()

Press the Android back button.

async def back(self) -> dict

Returns: dict with status.

home()

Press the Android home button.

async def home(self) -> dict

Returns: dict with status.

recents()

Open the recent apps / app switcher view.

async def recents(self) -> dict

Returns: dict with status.

ui_tree()

Get the accessibility tree of the current screen. Returns a dict whose "tree" key holds a list of UI node objects.

async def ui_tree(self) -> dict

Returns: { "tree": list } -- list of accessibility node dicts.
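
A common next step is to search the tree for a node. The structure of each node is not specified here, so the sketch below makes loud assumptions: each node is a dict, visible text lives under a "text" key, and child nodes live in a list under a "children" key. Both key names and both helper names are hypothetical and should be adjusted to the real payload.

```python
def find_nodes(nodes, predicate):
    """Yield every node (depth-first) for which predicate(node) is true.

    Assumes each node is a dict and that child nodes, if any, are stored in a
    list under the "children" key. Adjust the key to match the real payload.
    """
    for node in nodes:
        if predicate(node):
            yield node
        yield from find_nodes(node.get("children") or [], predicate)


def find_by_text(tree, needle):
    """Return nodes whose (assumed) "text" field contains needle, ignoring case."""
    needle = needle.lower()
    return list(find_nodes(tree, lambda n: needle in (n.get("text") or "").lower()))
```

Usage would look like matches = find_by_text((await phone.ui_tree())["tree"], "Settings").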

camera(facing)

Take a photo using the phone camera. Defaults to the rear camera.

async def camera(self, facing: str = "rear") -> dict

Parameter  Type                  Description
facing     str (default "rear")  "rear" or "front"

Returns: { "image": str } -- base64-encoded WebP image data.

Generic Commands

Use send_command() to send any command, including future commands that may not yet have a dedicated method.

Python
resp = await phone.send_command("screenshot", {"quality": 50})
print(resp)

Error Handling

The SDK provides specific exception classes for different error types. Import them from the screenmcp package.

Python
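# Note: importing ConnectionError from screenmcp shadows Python's built-in
# ConnectionError within this module.
from screenmcp import ScreenMCPClient, AuthError, CommandError, ConnectionError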

try:
    async with ScreenMCPClient(api_key="pk_...") as phone:
        await phone.click(100, 200)
except AuthError:
    print("Invalid API key")
except ConnectionError:
    print("Could not connect to worker")
except CommandError as e:
    print(f"Command failed: {e}")

AuthError

Raised when the API key is invalid, expired, or revoked. Check your API key on the Dashboard.

ConnectionError

Raised when the SDK cannot establish a connection to the worker server. This may indicate network issues or that no worker is available.

CommandError

Raised when a command fails on the phone side or times out. The error message contains details about the failure.
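
Since CommandError also covers timeouts, transient failures can sometimes be handled by retrying. The helper below is a generic sketch, not an SDK feature: retry and its parameters are hypothetical names, and the exception type to retry on is passed in by the caller.

```python
import asyncio


async def retry(make_call, retry_on, attempts=3, base_delay=0.5):
    """Await make_call() up to `attempts` times, retrying on `retry_on` errors.

    make_call is a zero-argument callable returning a fresh awaitable each
    time, e.g. lambda: phone.screenshot(). The delay doubles after each
    failed attempt (base_delay, 2 * base_delay, ...).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)
```

For example, result = await retry(lambda: phone.screenshot(), retry_on=CommandError). Retrying is safest for read-only commands such as screenshot() or ui_tree(). A command like type_text() that timed out may already have taken effect on the phone, so repeating it could duplicate input.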

Async Patterns

Running with asyncio

All SDK methods are async. Use asyncio.run() as the entry point for scripts.

script.py
import asyncio
from screenmcp import ScreenMCPClient

async def main():
    async with ScreenMCPClient(api_key="pk_...") as phone:
        result = await phone.screenshot()
        print(f"Screenshot: {len(result['image'])} chars")

asyncio.run(main())
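
Because every method is a coroutine, the same step can in principle run on several phones at once with asyncio.gather. The sketch below assumes that several ScreenMCPClient instances (for example, one per device_id) can be connected at the same time, which this page does not confirm. run_on_all is a hypothetical helper name.

```python
import asyncio


async def run_on_all(phones, step):
    """Run the coroutine function `step(phone)` on every client concurrently.

    Returns one entry per client, in order: the step's result, or the
    exception it raised (so one failing device does not cancel the others).
    """
    return await asyncio.gather(*(step(p) for p in phones), return_exceptions=True)
```

A call such as results = await run_on_all([phone_a, phone_b], lambda p: p.home()) would then press the home button on both devices and report failures per device.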

Sequential Command Chains

Chain commands sequentially for UI automation workflows. Each command waits for the phone to complete before the next one starts.

automation.py
import asyncio
from screenmcp import ScreenMCPClient

async def main():
    async with ScreenMCPClient(api_key="pk_...") as phone:
        # Open an app by tapping its icon
        await phone.click(540, 1200)

        # Wait a moment for the app to load
        await asyncio.sleep(2)

        # Take a screenshot to verify
        result = await phone.screenshot()

        # Type in a search field
        await phone.click(540, 200)   # tap search bar
        await phone.type_text("hello world")

        # Scroll through results
        await phone.scroll("down", amount=1000)

asyncio.run(main())
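
A fixed asyncio.sleep(2) is either too short on a slow phone or wasted time on a fast one. An alternative is to poll ui_tree() until the expected content appears. The following is a sketch under stated assumptions: wait_until and wait_for_text are hypothetical helpers, and the text search runs over str() of the returned tree so that it does not depend on the exact node structure.

```python
import asyncio
import time


async def wait_until(check, timeout=10.0, interval=0.5):
    """Poll the async callable `check` until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = await check()
        if value:
            return value
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        await asyncio.sleep(interval)


async def wait_for_text(phone, text, timeout=10.0, interval=0.5):
    """Wait until `text` appears anywhere in the serialized UI tree."""
    async def check():
        result = await phone.ui_tree()
        # str() search avoids depending on the exact node structure.
        return text in str(result["tree"])
    return await wait_until(check, timeout=timeout, interval=interval)
```

In the workflow above, await wait_for_text(phone, "Search", timeout=15) could stand in for the two-second sleep, assuming the loaded app shows the word "Search" somewhere on screen.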

Save a Screenshot to Disk

save_screenshot.py
import asyncio
import base64
from screenmcp import ScreenMCPClient

async def main():
    async with ScreenMCPClient(api_key="pk_...") as phone:
        result = await phone.screenshot()
        image_bytes = base64.b64decode(result["image"])
        with open("screenshot.webp", "wb") as f:
            f.write(image_bytes)
        print(f"Saved screenshot ({len(image_bytes)} bytes)")

asyncio.run(main())

Using with Existing Event Loops

If you are running inside an existing async framework (like FastAPI or aiohttp), you can use the SDK directly without asyncio.run().

fastapi_example.py
from contextlib import asynccontextmanager

from fastapi import FastAPI
from screenmcp import ScreenMCPClient

phone = ScreenMCPClient(api_key="pk_...")

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Connect once at startup; disconnect when the app shuts down.
    # (lifespan replaces the deprecated @app.on_event hooks.)
    await phone.connect()
    try:
        yield
    finally:
        await phone.disconnect()

app = FastAPI(lifespan=lifespan)

@app.post("/screenshot")
async def take_screenshot():
    result = await phone.screenshot()
    return {"image_length": len(result["image"])}
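
In a web server, several requests can be in flight at once and all share the same client, so the steps of one request's multi-command sequence could interleave with another's. Whether the SDK serializes commands internally is not stated here, so as a precaution a sequence can be guarded with an asyncio.Lock. SerializedPhone is a hypothetical wrapper, not part of the SDK.

```python
import asyncio


class SerializedPhone:
    """Wrap a connected client so multi-step sequences do not interleave."""

    def __init__(self, phone):
        self.phone = phone
        self._lock = asyncio.Lock()

    async def run(self, steps):
        """Run `steps(phone)` while holding the lock and return its result."""
        async with self._lock:
            return await steps(self.phone)
```

A handler would define its sequence as a coroutine function, for example one that awaits phone.click(540, 960) and then returns await phone.screenshot(), and call await shared.run(that_function), where shared is a single SerializedPhone created next to the client.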