PhoneAgent is an experimental mobile automation project with two operating modes:
- In-app iPhone agent (SwiftUI app + XCTest runner + OpenAI Responses API)
- External bridge to let Codex/OpenClaw control iOS/Android devices
The bridge supports both:
- iOS (XCTest-hosted actions against simulator or physical iPhone)
- Android (adb + UiAutomator + input/screencap actions against emulator or device)
- Self-contained iOS app
- OpenClaw controlling an iPhone
- OpenClaw controlling an Android phone
- Codex controlling an iPhone
- iOS app UI: API key entry, prompt input, microphone support, settings, always-on wake-word mode
- iOS test-hosted RPC server: newline-delimited JSON-RPC on port 45678
- Android RPC server: localhost JSON-RPC bridge backed by adb commands
- Helper scripts:
  - iOS bridge launcher + physical-device localhost forwarding
  - Android bridge launcher (with adb auto-discovery)
  - generic RPC CLI (rpc.py) for both platforms
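Because both bridges speak newline-delimited JSON-RPC, a client has to split the byte stream on newlines before parsing, and a single socket read may contain half a message or several. The following is a minimal sketch of that framing step, not the project's own code; the class name and the response shapes in the demo are hypothetical.

```python
import json


class LineDelimitedJSONDecoder:
    """Accumulates raw bytes and returns one parsed object per complete line."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk: bytes) -> list:
        """Add a chunk read from the socket; return the messages it completed."""
        self._buffer += chunk
        messages = []
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            if line.strip():  # skip blank lines
                messages.append(json.loads(line.decode("utf-8")))
        return messages


# Demo with made-up responses split at an awkward byte boundary.
decoder = LineDelimitedJSONDecoder()
first = decoder.feed(b'{"id": 1, "res')  # incomplete line, nothing to return yet
second = decoder.feed(b'ult": "ok"}\n{"id": 2, "result": null}\n')
print(first)
print(second)
```

Keeping the leftover bytes in a buffer means the caller can pass whatever `recv` returns without caring where message boundaries fall.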
- get_tree
- get_screen_image
- get_context
- set_api_key
- open_app
- tap
- tap_element
- enter_text
- scroll
- swipe
- stop

- submit_prompt
submit_prompt powers the in-app iPhone agent loop.
- OpenAI API key stored in Keychain
- Prompt submission from keyboard or microphone
- Optional always-on mode with custom wake word
- Notification completion + quick-reply follow-up loop
- Xcode (for iOS app/UITest bridge)
- Python 3
- Android SDK tools (adb) for Android bridge
- iOS simulator or physical iPhone (Developer setup)
- Android emulator or physical Android device (USB debugging or wireless debugging)
- Open /Users/rounak/Developer/PhoneAgent-cli/PhoneAgent.xcodeproj in Xcode.
- Run the PhoneAgent scheme on an iPhone/simulator.
- Enter OpenAI API key when prompted.
- Submit tasks via keyboard or microphone.
For Codex/OpenClaw usage, use the skill docs:
# iOS bundle identifier example
./.agents/skills/phoneagent/scripts/rpc.py open-app com.apple.Preferences
# Android package example
./.agents/skills/phoneagent/scripts/rpc.py open-app com.android.settings
# Fetch tree
./.agents/skills/phoneagent/scripts/rpc.py get-tree
# Capture screenshot (writes PNG under /tmp/phoneagent-artifacts)
./.agents/skills/phoneagent/scripts/rpc.py get-screen-image --print-metadata

The CLI supports --host and --port if you need non-default endpoint settings.
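If you want to drive the CLI from a script instead of a shell, the commands above can be assembled as argument lists. This is only a sketch: the helper names are hypothetical, and placing `--host`/`--port` before the subcommand is an assumption about how `rpc.py` parses its arguments, so adjust the order if the CLI rejects it.

```python
import subprocess

RPC_CLI = "./.agents/skills/phoneagent/scripts/rpc.py"


def rpc_command(subcommand, *args, host=None, port=None):
    """Build the argv list for one rpc.py invocation."""
    argv = [RPC_CLI]
    # Assumption: endpoint flags are accepted ahead of the subcommand.
    if host is not None:
        argv += ["--host", host]
    if port is not None:
        argv += ["--port", str(port)]
    argv += [subcommand, *args]
    return argv


def run_rpc(subcommand, *args, host=None, port=None):
    """Run rpc.py and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        rpc_command(subcommand, *args, host=host, port=port),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Building the command needs no running bridge, so it is safe to print.
print(rpc_command("open-app", "com.android.settings", port=45678))
```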
- Transport: newline-delimited JSON-RPC objects
- Endpoint: 127.0.0.1:45678 by default
- open_app request parameter is bundle_identifier:
  - iOS: pass bundle identifier (e.g. com.apple.Preferences)
  - Android: pass package name (e.g. com.android.settings)
- tap_element/enter_text use coordinate rectangles in format {{x, y}, {w, h}}
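The protocol notes above can be sketched as a small client. The method name `open_app`, the `bundle_identifier` parameter, the default endpoint, and the rectangle format come from this README; the JSON-RPC 2.0 envelope fields (`jsonrpc`, `id`) and the helper names are assumptions, so check them against `rpc.py` before relying on them.

```python
import json
import socket

HOST, PORT = "127.0.0.1", 45678  # default endpoint


def format_rect(x, y, w, h):
    """Render a rectangle in the {{x, y}, {w, h}} form."""
    return "{{%s, %s}, {%s, %s}}" % (x, y, w, h)


def build_request(method, params=None, request_id=1):
    """Serialize one request as a single newline-terminated JSON line."""
    # Assumption: the bridge accepts a standard JSON-RPC 2.0 envelope.
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return (json.dumps(message) + "\n").encode("utf-8")


def call(method, params=None, timeout=10.0):
    """Send one request to the bridge and return the first response line, parsed."""
    with socket.create_connection((HOST, PORT), timeout=timeout) as sock:
        sock.sendall(build_request(method, params))
        with sock.makefile("r", encoding="utf-8") as reader:
            line = reader.readline()
    if not line:
        raise ConnectionError("bridge closed the connection without replying")
    return json.loads(line)


# Payloads can be inspected without a running bridge.
print(build_request("open_app", {"bundle_identifier": "com.apple.Preferences"}).decode().strip())
print(format_rect(10, 20, 100, 44))
```

With a bridge running, `call("open_app", {"bundle_identifier": "com.android.settings"})` would target Android the same way, since the parameter name is shared across platforms.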
- Settings: com.apple.Preferences
- Camera: com.apple.camera
- Photos: com.apple.mobileslideshow
- Messages: com.apple.MobileSMS
- Home Screen: com.apple.springboard
# Pair (from Wireless debugging screen)
adb pair <PHONE_IP:PAIRING_PORT>
# Connect (from Wireless debugging screen)
adb connect <PHONE_IP:ADB_PORT>
# Verify
adb devices -l

Then start the Android bridge with that network serial:

./.agents/skills/phoneagent/scripts/start_android_rpc_bridge_local.sh --serial <PHONE_IP:ADB_PORT>

- RPC bridge is localhost-oriented (127.0.0.1)
- iOS physical-device workflow uses localhost forwarding
- Android bridge executes only through selected adb serial
- In-app API key is stored in iOS Keychain
- iOS app entry: PhoneAgent/PhoneAgentApp.swift
- iOS app UI/state: PhoneAgent/ContentView.swift, PhoneAgent/PromptView.swift, PhoneAgent/SettingsView.swift
- iOS bridge server: PhoneAgentUITests/SimulatorRPCServer.swift, PhoneAgentUITests/PhoneAgent.swift
- RPC CLI: .agents/skills/phoneagent/scripts/rpc.py
- iOS bridge launcher: .agents/skills/phoneagent/scripts/start_rpc_bridge_local.sh
- Android bridge launcher: .agents/skills/phoneagent/scripts/start_android_rpc_bridge_local.sh
- Android bridge server: .agents/skills/phoneagent/scripts/android_rpc_bridge.py
- Android bridge does not yet implement submit_prompt agent loop
- UI tree snapshots can be noisy/stale during animations
- Keyboard/text reliability can vary by app and platform
- Long-running tasks may require explicit polling/retries
- Experimental software
- Personal project
- App contents may be sent to OpenAI API when using agent flow
- Model/tool actions can be incorrect; verify important operations
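Since UI tree snapshots can be stale during animations and long-running tasks may need polling, one common workaround is to re-fetch until two consecutive snapshots match. The sketch below illustrates that idea only; `wait_until_stable` and its parameters are hypothetical, and `fetch` stands in for whatever call returns the tree (for example a `get_tree` request).

```python
import time


def wait_until_stable(fetch, interval=0.5, timeout=10.0):
    """Call fetch() until two consecutive results are equal, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    previous = fetch()
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = fetch()
        if current == previous:
            return current  # unchanged between two polls, treat as settled
        previous = current
    raise TimeoutError("UI tree did not settle before the timeout")


# Demo with a fake fetcher: the "tree" changes twice, then stops changing.
snapshots = iter(["tree-A", "tree-B", "tree-C", "tree-C"])
stable = wait_until_stable(lambda: next(snapshots), interval=0, timeout=5.0)
print(stable)
```

Two equal snapshots do not prove the screen is idle, so a longer interval or an extra verification step is still worthwhile before important operations.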