A comprehensive iOS automation portal that provides HTTP API access to iOS device UI state extraction and automated interactions.
The Droidrun iOS Portal is a specialized iOS application that runs UI tests to expose device automation capabilities through a RESTful HTTP API. It consists of two main components:
- Portal App (droidrun-ios-portal): A minimal SwiftUI application that serves as the host
- Portal Server (droidrun-ios-portalUITests): XCTest-based HTTP server providing automation APIs
The portal uses the iOS XCTest framework and its XCUITest capabilities to:
- Extract UI state information (accessibility trees, screenshots)
- Perform automated interactions (taps, swipes, text input)
- Launch and manage applications
- Handle device-level inputs
- DroidrunPortalServer: XCTest class that runs an HTTP server on port 6643
- DroidrunPortalHandler: HTTP route handler defining the REST API endpoints
- DroidrunPortalTools: Core automation engine implementing device interactions
- AccessibilityTree: UI state extraction and compression utilities
Returns basic device information and description.
Response:
{
  "description": "Device description string"
}

Retrieves current phone state including active app and keyboard status.
Response:
{
  "activity": "com.example.app - Screen Title",
  "keyboardShown": false
}

Extracts the accessibility tree of the current UI state.
Response:
{
  "accessibilityTree": "Compressed accessibility tree string"
}

Captures a screenshot of the current screen.
Response: PNG image data (Content-Type: image/png)
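Because this endpoint returns raw PNG bytes rather than JSON, a client may want to confirm it really received an image before writing it to disk. The following is a minimal sketch under stated assumptions: `is_png` and `fetch_screenshot` are hypothetical helper names, the `/vision/screenshot` path is the one used in this README's Python example, and the standard library's `urllib` is used only to keep the snippet dependency-free.

```python
import urllib.request

# Fixed first 8 bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data: bytes) -> bool:
    """Return True if data starts with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


def fetch_screenshot(base_url: str, timeout: float = 10.0) -> bytes:
    """GET /vision/screenshot and return the body, raising if it is not a PNG."""
    url = base_url.rstrip("/") + "/vision/screenshot"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    if not is_png(data):
        raise ValueError("portal response is not PNG data")
    return data
```

A caller would use it as `open("screenshot.png", "wb").write(fetch_screenshot("http://device-ip:6643"))`, with `device-ip` replaced by the device's address.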
Launches an application by bundle identifier.
Request Body:
{
  "bundleIdentifier": "com.example.app"
}
Response:
{
  "message": "opened com.example.app"
}

Performs tap gestures on screen coordinates.
Request Body:
{
  "rect": "{{x,y},{width,height}}",
  "count": 1,
  "longPress": false
}
Response:
{
  "message": "tapped element"
}

Performs swipe gestures from specified coordinates.
Request Body:
{
  "x": 100.0,
  "y": 200.0,
  "dir": "up"
}
Supported directions: up, down, left, right
Response:
{
  "message": "swiped"
}

Enters text into a focused input field.
Request Body:
{
  "rect": "{{x,y},{width,height}}",
  "text": "Hello World"
}
Response:
{
  "message": "entered text"
}

Presses device hardware keys.
Request Body:
{
  "key": 0
}
Supported keys:
- 0: Home button
- 4: Action button
- 5: Camera button
Response:
{
  "message": "pressed key"
}

- Accessibility Tree: Compressed representation of the UI hierarchy with memory addresses removed
- Screenshots: PNG format screen captures
- App State: Current application context and keyboard status
- App Launching: Launch any installed app by bundle identifier
- Touch Interactions: Single taps, double taps, long presses
- Gesture Recognition: Swipe gestures in four directions
- Text Input: Automated typing with keyboard handling
- Hardware Keys: Device button presses
- App Management: Automatic app switching and state management
- Keyboard Detection: Intelligent keyboard presence detection
- Focus Management: Ensures proper element focus for text input
- Error Handling: Comprehensive error reporting and validation
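To illustrate what "memory addresses removed" can mean for the accessibility tree, here is a rough Python sketch of the idea. It assumes the raw hierarchy resembles XCUITest's debug description output, where each element line carries a pointer such as `0x600000c9c000`. The function name `strip_addresses` is hypothetical, and the portal's actual Swift implementation in AccessibilityTree may work differently.

```python
import re

# Optional comma, optional whitespace, then a hexadecimal pointer like 0x600000c9c000.
_ADDRESS = re.compile(r",?\s*0x[0-9a-fA-F]+")


def strip_addresses(tree: str) -> str:
    """Remove pointer-like tokens from every line of a textual UI tree."""
    return "\n".join(_ADDRESS.sub("", line) for line in tree.splitlines())
```

Dropping the addresses shortens each line and makes two dumps of the same screen comparable, since pointers change between runs. Note that this simple pattern would also strip a hexadecimal literal appearing inside an element label.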
- iOS device or simulator
- Xcode with XCTest capabilities
- Network access to the device
- Build and run the portal app on the target iOS device
- The XCTest suite will automatically start the HTTP server on port 6643
- The server will continue running until the test session ends
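Since the HTTP server only becomes reachable once the test session is up, a client may need to wait before sending its first request. The sketch below polls until a probe succeeds; `http_probe` and `wait_for_portal` are hypothetical names, and treating an HTTP 200 from the root endpoint as "ready" is an assumption rather than documented behavior.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/ answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, malformed URL, or a broken response.
        return False


def wait_for_portal(probe: Callable[[], bool], timeout: float = 60.0,
                    interval: float = 1.0) -> bool:
    """Call probe repeatedly until it succeeds or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_portal(lambda: http_probe("http://device-ip:6643"))` returns `True` as soon as the portal answers, or `False` after a minute. Passing the probe as a callable keeps the retry loop independent of the transport.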
The portal is designed to work with automation agents that can:
- Send HTTP requests to the portal endpoints
- Process accessibility tree data for UI understanding
- Coordinate multiple automation actions
- Handle screenshot analysis for visual verification
import requests

# Get device info
response = requests.get('http://device-ip:6643/')
device_info = response.json()

# Take screenshot
screenshot = requests.get('http://device-ip:6643/vision/screenshot')
with open('screenshot.png', 'wb') as f:
    f.write(screenshot.content)

# Get accessibility tree
a11y = requests.get('http://device-ip:6643/vision/a11y').json()
print(a11y['accessibilityTree'])

# Launch app
requests.post('http://device-ip:6643/inputs/launch',
              json={'bundleIdentifier': 'com.apple.mobilesafari'})

# Perform tap
requests.post('http://device-ip:6643/gestures/tap',
              json={'rect': '{{100,200},{50,50}}', 'count': 1})

- FlyingFox: HTTP server framework for Swift
- XCTest: iOS testing framework for UI automation
- SwiftUI: User interface framework
- Port: 6643 (configurable)
- Protocol: HTTP/1.1
- Content Types: JSON, PNG images
- Threading: Async/await support
- Uses iOS coordinate system (points, not pixels)
- Rectangle format: "{{x,y},{width,height}}"
- Swipe coordinates specify starting points
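The coordinate conventions above are easy to get wrong when building request bodies by hand, so a small set of helpers can be useful. This is a sketch only: `format_rect`, `tap_payload`, and `swipe_payload` are hypothetical names, and the JSON field names are copied from the request bodies documented in this README. All values are in points.

```python
VALID_DIRECTIONS = {"up", "down", "left", "right"}


def _fmt(value: float) -> str:
    """Render whole numbers without a trailing .0, other values as-is."""
    value = float(value)
    return str(int(value)) if value.is_integer() else str(value)


def format_rect(x: float, y: float, width: float, height: float) -> str:
    """Build the "{{x,y},{width,height}}" rectangle string."""
    return "{{%s,%s},{%s,%s}}" % (_fmt(x), _fmt(y), _fmt(width), _fmt(height))


def tap_payload(rect: str, count: int = 1, long_press: bool = False) -> dict:
    """Build a tap request body for the given rectangle string."""
    return {"rect": rect, "count": count, "longPress": long_press}


def swipe_payload(x: float, y: float, direction: str) -> dict:
    """Build a swipe request body; (x, y) is the starting point of the swipe."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unsupported swipe direction: {direction!r}")
    return {"x": float(x), "y": float(y), "dir": direction}
```

For instance, `tap_payload(format_rect(100, 200, 50, 50))` yields the same body as the tap call in the Python example. Validating the direction on the client side surfaces typos before a request reaches the device.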
- Requires iOS testing environment to run
- Limited to apps accessible through XCUITest
- Network access required for remote operation
- Some system-level interactions may be restricted
- The portal provides full device automation access
- Should only be used in controlled testing environments
- Network access should be restricted to trusted clients
- Consider implementing authentication for production use
This project is part of the larger Droidrun automation framework. Contributions should focus on:
- Enhanced UI state extraction
- Additional gesture support
- Improved error handling
- Performance optimizations
This project is licensed under the MIT License - see the LICENSE file for details.
Note: This is the iOS portal component of the Droidrun framework. For complete automation workflows, integrate with the corresponding agent component that orchestrates automation tasks using this portal's API.