Python SDK
Fully async Python SDK for controlling Android phones via ScreenMCP.
Installation
pip install screenmcp

To install from source in editable mode:
cd sdk/python
pip install -e .

Requirements
- Python 3.10+
- websockets >= 12.0
- httpx >= 0.25.0
Quick Start
import asyncio

from screenmcp import ScreenMCPClient


async def main():
    async with ScreenMCPClient(api_key="pk_your_key_here") as phone:
        # Take a screenshot
        result = await phone.screenshot()
        print(f"Got image: {len(result['image'])} bytes base64")

        # Tap on the screen
        await phone.click(540, 960)

        # Type some text
        await phone.type_text("Hello from Python!")

        # Scroll down
        await phone.scroll("down", amount=800)

        # Get the UI tree for inspection
        tree = await phone.ui_tree()
        print(tree)


asyncio.run(main())

Configuration
client = ScreenMCPClient(
    api_key="pk_...",                 # required
    api_url="https://screenmcp.com",  # default
    device_id="your-device-uuid",     # optional; auto-selects if omitted
    command_timeout=30.0,             # seconds (default 30)
    auto_reconnect=True,              # default True
)

| Option | Type | Default | Description |
|---|---|---|---|
| api_key | str | -- | Required. Your API key starting with pk_. |
| api_url | str | https://screenmcp.com | ScreenMCP API server URL. |
| device_id | str or None | None | Target device ID. If omitted, the first available device is used. |
| command_timeout | float | 30.0 | Timeout per command in seconds. |
| auto_reconnect | bool | True | Automatically reconnect on WebSocket disconnect. |
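
The options above can also be collected from the environment rather than hard-coded. The following is a minimal sketch, not part of the SDK: the function name and the SCREENMCP_* variable names are hypothetical and chosen only for this example. It builds a dict of keyword arguments using the option names and defaults from the table.

```python
import os
from typing import Any, Mapping


def client_options_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Build keyword arguments for ScreenMCPClient from environment variables.

    The SCREENMCP_* variable names are hypothetical, not defined by the SDK.
    """
    api_key = env.get("SCREENMCP_API_KEY", "")
    if not api_key.startswith("pk_"):
        raise ValueError("SCREENMCP_API_KEY must be set and start with 'pk_'")

    options: dict[str, Any] = {
        "api_key": api_key,
        "api_url": env.get("SCREENMCP_API_URL", "https://screenmcp.com"),
        "command_timeout": float(env.get("SCREENMCP_TIMEOUT", "30.0")),
        "auto_reconnect": env.get("SCREENMCP_AUTO_RECONNECT", "true").lower() == "true",
    }

    # Only pass device_id when it is set, so the client can auto-select otherwise.
    device_id = env.get("SCREENMCP_DEVICE_ID")
    if device_id:
        options["device_id"] = device_id
    return options
```

The result can then be unpacked into the constructor, for example `ScreenMCPClient(**client_options_from_env())`.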
Context Manager Usage
The recommended way to use the SDK is with an async context manager (async with), which automatically handles connection and cleanup.
async with ScreenMCPClient(api_key="pk_...") as phone:
    await phone.screenshot()
    await phone.click(100, 200)
# Connection is automatically closed when exiting the block

Manual Lifecycle
If you prefer not to use the context manager, you can manage the connection manually:
phone = ScreenMCPClient(api_key="pk_...")
await phone.connect()
try:
    await phone.screenshot()
finally:
    await phone.disconnect()

API Reference
All 15 command methods are async and return a dict with the command result from the phone.
screenshot()
Capture the phone screen. Returns a dict with a base64-encoded image string.
async def screenshot(self) -> dict
Returns: { "image": str } -- base64-encoded WebP image data.
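
Since the image arrives as base64 text, it can be useful to decode it and sanity-check the bytes before using them. This is a minimal sketch assuming the payload is a standard WebP file (a RIFF container with a WEBP tag); the helper name `decode_webp` is hypothetical and not part of the SDK.

```python
import base64


def decode_webp(image_b64: str) -> bytes:
    """Decode a base64 string and check that it carries a WebP (RIFF) header."""
    # validate=True rejects characters outside the base64 alphabet.
    data = base64.b64decode(image_b64, validate=True)
    # A WebP file starts with "RIFF", a 4-byte size field, then "WEBP".
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WEBP":
        raise ValueError("decoded data is not a WebP image")
    return data
```

Typical use would be `image_bytes = decode_webp(result["image"])` after a `screenshot()` call.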
click(x, y)
Tap on the screen at the given coordinates.
async def click(self, x: int, y: int) -> dict

| Parameter | Type | Description |
|---|---|---|
| x | int | X coordinate |
| y | int | Y coordinate |

Returns: dict with status.

long_click(x, y)
Long-press at coordinates for approximately 1000ms.
async def long_click(self, x: int, y: int) -> dict

| Parameter | Type | Description |
|---|---|---|
| x | int | X coordinate |
| y | int | Y coordinate |

Returns: dict with status.
drag(start_x, start_y, end_x, end_y)
Perform a drag gesture from one point to another.
async def drag(self, start_x: int, start_y: int, end_x: int, end_y: int) -> dict

| Parameter | Type | Description |
|---|---|---|
| start_x | int | Start X coordinate |
| start_y | int | Start Y coordinate |
| end_x | int | End X coordinate |
| end_y | int | End Y coordinate |

Returns: dict with status.
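
A drag takes four absolute coordinates, so a small helper that derives them from the screen size can make swipe gestures easier to read. The sketch below is an illustration only: `swipe_points` is a hypothetical helper, and the screen dimensions must be supplied by the caller because this page does not describe a way to query them.

```python
def swipe_points(
    width: int, height: int, direction: str, fraction: float = 0.5
) -> tuple[int, int, int, int]:
    """Return (start_x, start_y, end_x, end_y) for a swipe centered on the screen.

    `direction` is the direction the finger moves; `fraction` is the share of
    the screen width or height that the gesture covers.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    cx, cy = width // 2, height // 2
    dx, dy = int(width * fraction / 2), int(height * fraction / 2)
    if direction == "up":
        return cx, cy + dy, cx, cy - dy
    if direction == "down":
        return cx, cy - dy, cx, cy + dy
    if direction == "left":
        return cx + dx, cy, cx - dx, cy
    if direction == "right":
        return cx - dx, cy, cx + dx, cy
    raise ValueError(f"unknown direction: {direction!r}")
```

On a screen assumed to be 1080x1920, `await phone.drag(*swipe_points(1080, 1920, "up"))` would drag from (540, 1440) to (540, 480).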
scroll(direction, amount)
Scroll the screen in a given direction. Direction must be "up", "down", "left", or "right".
async def scroll(self, direction: str, amount: int = 500) -> dict

| Parameter | Type | Description |
|---|---|---|
| direction | str | "up", "down", "left", or "right" |
| amount | int (default 500) | Scroll distance in pixels |

Returns: dict with status.
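
For long lists it may be convenient to split one large scroll into several smaller ones, validating the direction up front. The following is a sketch under that assumption; `scroll_chunks` is a hypothetical helper, and whether several short scrolls behave better than one long scroll on a given device is not something this page states.

```python
VALID_DIRECTIONS = ("up", "down", "left", "right")


def scroll_chunks(direction: str, total: int, step: int = 500) -> list[int]:
    """Split a total scroll distance in pixels into amounts of at most `step`."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"direction must be one of {VALID_DIRECTIONS}")
    if total < 0 or step <= 0:
        raise ValueError("total must be >= 0 and step must be > 0")
    full, rest = divmod(total, step)
    # Full-size steps first, then the remainder if there is one.
    return [step] * full + ([rest] if rest else [])
```

A caller could then loop over the result: `for amount in scroll_chunks("down", 1200): await phone.scroll("down", amount=amount)`.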
type_text(text)
Type text into the currently focused input field.
async def type_text(self, text: str) -> dict

| Parameter | Type | Description |
|---|---|---|
| text | str | The text to type |

Returns: dict with status.

get_text()
Read the text content from the currently focused element.
async def get_text(self) -> dict
Returns: { "text": str }

select_all()
Select all text in the currently focused field.
async def select_all(self) -> dict
Returns: dict with status.
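
The text methods above can be combined into small helpers. The sketch below chains `select_all()` and `type_text()` to overwrite a field's contents. It assumes that typing while text is selected replaces the selection, which is common Android behavior but is not confirmed by this page; `replace_text` and the `TextField` protocol are hypothetical names introduced only for this example.

```python
from typing import Any, Protocol


class TextField(Protocol):
    """The two client methods this helper relies on."""

    async def select_all(self) -> dict[str, Any]: ...

    async def type_text(self, text: str) -> dict[str, Any]: ...


async def replace_text(phone: TextField, text: str) -> dict[str, Any]:
    """Select the focused field's contents, then type the new text over them."""
    await phone.select_all()
    # Assumption: typed text replaces the current selection.
    return await phone.type_text(text)
```

Inside an `async with ScreenMCPClient(...) as phone:` block this would be called as `await replace_text(phone, "new value")`.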
copy()
Copy the currently selected text to the clipboard.
async def copy(self) -> dict
Returns: dict with status.

paste()
Paste clipboard contents into the focused field.
async def paste(self) -> dict
Returns: dict with status.

back()
Press the Android back button.
async def back(self) -> dict
Returns: dict with status.

home()
Press the Android home button.
async def home(self) -> dict
Returns: dict with status.

recents()
Open the recent apps / app switcher view.
async def recents(self) -> dict
Returns: dict with status.
ui_tree()
Get the accessibility tree of the current screen. Returns a dict whose "tree" entry is a list of UI node objects.
async def ui_tree(self) -> dict
Returns: { "tree": list } -- list of accessibility node dicts.
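
Once the tree is available, a common next step is searching it for a node. This is a minimal sketch that walks the node list depth-first and matches on text. The node schema is not documented here, so the "text" and "children" keys are assumptions, and both helper names are hypothetical.

```python
from typing import Any, Iterator


def iter_nodes(nodes: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every node depth-first.

    Assumes child nodes are stored under a "children" key (not confirmed).
    """
    for node in nodes:
        yield node
        yield from iter_nodes(node.get("children") or [])


def find_by_text(nodes: list[dict[str, Any]], needle: str) -> list[dict[str, Any]]:
    """Return nodes whose assumed "text" field contains `needle`, ignoring case."""
    needle = needle.lower()
    return [n for n in iter_nodes(nodes) if needle in str(n.get("text") or "").lower()]
```

With a real response this would be called as `find_by_text(tree["tree"], "Settings")`; the key names should be adjusted to match what the phone actually returns.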
camera(facing)
Take a photo using the phone camera. Defaults to the rear camera.
async def camera(self, facing: str = "rear") -> dict

| Parameter | Type | Description |
|---|---|---|
| facing | str (default "rear") | "rear" or "front" |

Returns: { "image": str } -- base64-encoded WebP image data.
Generic Commands
Use send_command() to send any command, including future commands that may not yet have a dedicated method.
resp = await phone.send_command("screenshot", {"quality": 50})
print(resp)

Error Handling
The SDK provides specific exception classes for different error types. Import them from the screenmcp package.
# Note: importing ConnectionError from screenmcp shadows Python's built-in
# ConnectionError within this module.
from screenmcp import ScreenMCPClient, AuthError, CommandError, ConnectionError

try:
    async with ScreenMCPClient(api_key="pk_...") as phone:
        await phone.click(100, 200)
except AuthError:
    print("Invalid API key")
except ConnectionError:
    print("Could not connect to worker")
except CommandError as e:
    print(f"Command failed: {e}")

AuthError
Raised when the API key is invalid, expired, or revoked. Check your API key on the Dashboard.
ConnectionError
Raised when the SDK cannot establish a connection to the worker server. This may indicate network issues or that no worker is available.
CommandError
Raised when a command fails on the phone side or times out. The error message contains details about the failure.
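
Because a command can fail transiently or time out, callers sometimes wrap it in a retry loop. The sketch below is a generic helper, not part of the SDK: `with_retries` is a hypothetical name, and the exception types to retry on are passed in by the caller (for example `(CommandError,)`). It waits with exponential backoff between attempts and re-raises the last error.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    action: Callable[[], Awaitable[T]],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await `action()` up to `attempts` times, retrying on the given exceptions."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return await action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A call might look like `await with_retries(lambda: phone.screenshot(), (CommandError,))`. Retrying is safest for read-only commands; repeating a tap or typed text after a timeout could apply the action twice if the first attempt actually reached the phone.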
Async Patterns
Running with asyncio
All SDK methods are async. Use asyncio.run() as the entry point for scripts.
import asyncio

from screenmcp import ScreenMCPClient


async def main():
    async with ScreenMCPClient(api_key="pk_...") as phone:
        result = await phone.screenshot()
        print(f"Screenshot: {len(result['image'])} chars")


asyncio.run(main())

Sequential Command Chains
Chain commands sequentially for UI automation workflows. Each command waits for the phone to complete before the next one starts.
import asyncio

async with ScreenMCPClient(api_key="pk_...") as phone:
    # Open an app by tapping its icon
    await phone.click(540, 1200)

    # Wait a moment for the app to load
    await asyncio.sleep(2)

    # Take a screenshot to verify
    result = await phone.screenshot()

    # Type in a search field
    await phone.click(540, 200)  # tap search bar
    await phone.type_text("hello world")

    # Scroll through results
    await phone.scroll("down", amount=1000)

Save a Screenshot to Disk
import asyncio
import base64

from screenmcp import ScreenMCPClient


async def main():
    async with ScreenMCPClient(api_key="pk_...") as phone:
        result = await phone.screenshot()
        image_bytes = base64.b64decode(result["image"])
        with open("screenshot.webp", "wb") as f:
            f.write(image_bytes)
        print(f"Saved screenshot ({len(image_bytes)} bytes)")


asyncio.run(main())

Using with Existing Event Loops
If you are running inside an existing async framework (like FastAPI or aiohttp), you can use the SDK directly without asyncio.run().
from fastapi import FastAPI

from screenmcp import ScreenMCPClient

app = FastAPI()
phone = ScreenMCPClient(api_key="pk_...")


@app.on_event("startup")
async def startup():
    await phone.connect()


@app.on_event("shutdown")
async def shutdown():
    await phone.disconnect()


@app.post("/screenshot")
async def take_screenshot():
    result = await phone.screenshot()
    return {"image_length": len(result["image"])}