feat: multi Agent OS server support in AskUiControllerClient #276
mlikasam-askui wants to merge 17 commits
Replace single-controller config with a TargetComputer / TargetComputerManager abstraction. AskUiControllerClient now manages a list of local and remote controller servers, opens a gRPC connection per target on connect(), and routes agent-os actions to a single active target. Add tools for the agent to list, switch, and inspect the active target computer.
… target session GUID
| from .set_active_display_tool import ComputerSetActiveDisplayTool | ||
| from .switch_agent_os_server_tool import ComputerSwitchAgentOsServerTool |
Shouldn't these 3 new tools also be in experimental?
I see them as part of the default tool list, similar to the Android agent.
| cursor_position = self.agent_os.get_mouse_position() | ||
| return f"Mouse is at position ({cursor_position.x}, {cursor_position.y})." | ||
| return ( | ||
| f"[Server with id '{server.computer_id}']: Mouse is at position " |
I am not happy with the name "Server" in general. So far, I have been using the term "machine" when telling the agent that it is operating multiple devices.
The name of one machine could then be "server". If we use the term "server" here, it will introduce ambiguity that might break things that currently work.
| return str(self.agent_os.get_system_info().model_dump_json()) | ||
| server = self.agent_os.get_active_agent_os_server(report=False) | ||
| system_info_json = self.agent_os.get_system_info().model_dump_json() | ||
| return f"[Server with id '{server.computer_id}']: {system_info_json}" |
Same here (Server -> machine).
| agent_os=agent_os, | ||
| ) | ||
|
|
||
| def __call__(self, computer_id: str) -> str: |
How is this computer_id determined? By the user, e.g. in the init of the agent? Currently, they can set local_machine_name and remote_machine_name.
It can be set by the user, or it defaults to the session ID, which is a UUID4.
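A minimal sketch of the defaulting rule described in this reply, assuming the fallback happens when the target is constructed. `TargetComputer` and `generate_session_guid` are hypothetical names for illustration, not the PR's actual classes.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


def generate_session_guid() -> str:
    """Return a fresh random UUID4 as a string (hypothetical helper)."""
    return str(uuid.uuid4())


@dataclass
class TargetComputer:
    """Hypothetical stand-in for one target computer entry."""

    # Every target gets its own session GUID.
    session_guid: str = field(default_factory=generate_session_guid)
    # Optional user-chosen id; falls back to the session GUID when omitted.
    computer_id: Optional[str] = None

    def __post_init__(self) -> None:
        if self.computer_id is None:
            self.computer_id = self.session_guid


named = TargetComputer(computer_id="office-pc")
anonymous = TargetComputer()
print(named.computer_id)      # the user-supplied id
print(anonymous.computer_id)  # a random UUID4 string
```

With this shape, a user who never names their machines still gets unique ids, but the agent then sees opaque UUIDs instead of readable names.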
To be honest, at the moment I really like the concept of 2 dedicated toolsets to operate 2 machines. Still, I see that this approach will also scale to more machines. Have you checked whether the agent understands that it can operate multiple machines, and that it calls the switch tool before executing operations when needed? Further, we need to automatically adjust the system capabilities and device information prompts if multiple AgentOS instances are added. Also, we should definitely update the docs with this PR to explain these changes.
| self, required_tags: list[str] | ||
| ) -> AgentOs | AndroidAgentOs: | ||
| """ | ||
| Find the first registered agent OS whose tags are a superset of |
Suggested change:
| Find the first registered agent OS whose tags are a superset of | |
| Find the first registered AgentOS whose tags are a superset of |
Please find and fix all occurrences, including in strings.
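For reference, the lookup described by the docstring under review (first registered entry whose tags are a superset of the required tags) could look roughly like the sketch below. `RegisteredAgentOs` and `find_by_tags` are hypothetical names, and raising `LookupError` on a miss is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class RegisteredAgentOs:
    """Hypothetical registry entry: a name plus the tags it was registered with."""

    name: str
    tags: set[str] = field(default_factory=set)


def find_by_tags(
    registry: list[RegisteredAgentOs], required_tags: Iterable[str]
) -> RegisteredAgentOs:
    """Return the first entry whose tags are a superset of required_tags."""
    required = set(required_tags)
    for entry in registry:
        # `required <= entry.tags` is the subset test, i.e. the entry's
        # tags are a superset of what the caller asked for.
        if required <= entry.tags:
            return entry
    msg = f"No registered agent OS has all tags: {sorted(required)}"
    raise LookupError(msg)


registry = [
    RegisteredAgentOs("desktop", {"computer"}),
    RegisteredAgentOs("phone", {"android", "mobile"}),
    RegisteredAgentOs("tablet", {"android", "mobile", "tablet"}),
]
print(find_by_tags(registry, ["android"]).name)  # phone
```

Because the first match wins, registration order matters whenever several entries satisfy the same tag set.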
|
|
||
| def temporary_select(self, device_sn: str) -> AbstractContextManager[Self]: |
Suggested change:
| def temporary_select(self, device_sn: str) -> AbstractContextManager[Self]: | |
| @abstractmethod | |
| def temporary_select(self, device_sn: str) -> AbstractContextManager[Self]: |
Ok.
Why is this function not implemented?
| raise AndroidAgentOsError(msg) | ||
|
|
||
| @contextmanager | ||
| def temporary_select(self, device_sn: str) -> Iterator[Self]: |
But in general, why do we need the concept of temporary_select? How should the agent call it?
temporary_select is not intended to be called directly by the agent. It is designed to allow users to execute commands on a specific device without needing to restore the previous device context afterward.
It can also be used in custom tools to restrict a tool to a specific device. For example, an enableWifiOnCarEmulator tool could internally use temporary_select, ensuring it only operates on the car emulator. This differs from a generic enableWifi tool, which works across all devices, where the agent is responsible for switching to the appropriate device before invoking the tool.
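The behaviour described here can be sketched as a context manager that selects a device and restores the previous one on exit. `DeviceSelector` is a hypothetical stand-in for the agent OS, and `enable_wifi_on_car_emulator` only mimics the custom-tool example from the reply; neither is the PR's implementation.

```python
from contextlib import contextmanager
from typing import Iterator, List


class DeviceSelector:
    """Hypothetical stand-in for an agent OS that tracks one active device."""

    def __init__(self, device_sns: List[str], active: str) -> None:
        self._device_sns = list(device_sns)
        self.active = active

    def select(self, device_sn: str) -> None:
        """Permanently switch the active device."""
        if device_sn not in self._device_sns:
            raise ValueError(f"Unknown device: {device_sn}")
        self.active = device_sn

    @contextmanager
    def temporary_select(self, device_sn: str) -> Iterator["DeviceSelector"]:
        """Switch to device_sn for the duration of the with block only."""
        previous = self.active
        self.select(device_sn)
        try:
            yield self
        finally:
            # Restore the previous device even if the block raised.
            self.active = previous


def enable_wifi_on_car_emulator(selector: DeviceSelector) -> str:
    """Device-pinned tool: always runs against the car emulator."""
    with selector.temporary_select("car-emulator") as scoped:
        return f"wifi enabled on {scoped.active}"


selector = DeviceSelector(["phone-1", "car-emulator"], active="phone-1")
print(enable_wifi_on_car_emulator(selector))  # wifi enabled on car-emulator
print(selector.active)                        # phone-1
```

The `try`/`finally` is what frees the caller from restoring the previous device context by hand.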
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def _generate_session_guid() -> str: |
Do we already have such a function?
| return server | ||
| return None | ||
|
|
||
| def switch(self, computer_id: str) -> AgentOsServer: |
Can we switch to the same computer? Shouldn't this fail?
| Raises: | ||
| KeyError: If no server with the given computer id is registered. | ||
| """ | ||
| server = self.get(computer_id) |
Please wrap this in a proper AgentOSServerManager error.
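One way to act on this request is to catch the `KeyError` from the lookup and re-raise a domain-specific exception. The sketch below is only an illustration: the exception names and `ServerManager` are hypothetical, and it does not settle whether switching to the already-active computer should fail.

```python
from typing import Dict, Optional


class AgentOsServerManagerError(Exception):
    """Hypothetical base error for the server manager."""


class AgentOsServerNotFoundError(AgentOsServerManagerError):
    """Hypothetical error: no server is registered under the given id."""


class ServerManager:
    """Hypothetical, stripped-down manager keyed by computer_id."""

    def __init__(self) -> None:
        self._servers: Dict[str, object] = {}
        self.active_id: Optional[str] = None

    def register(self, computer_id: str, server: object) -> None:
        self._servers[computer_id] = server

    def switch(self, computer_id: str) -> object:
        try:
            server = self._servers[computer_id]
        except KeyError as exc:
            known = ", ".join(sorted(self._servers)) or "<none>"
            msg = (
                f"No server registered with computer id '{computer_id}'. "
                f"Known ids: {known}"
            )
            # `from exc` keeps the original KeyError as __cause__.
            raise AgentOsServerNotFoundError(msg) from exc
        self.active_id = computer_id
        return server
```

Callers can then catch the manager's base error without depending on the fact that the registry happens to be a dict.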
| def agent_os_server_manager(self) -> AgentOsServerManager: | ||
| """The underlying Agent-OS-server manager.""" | ||
| return self._manager |
Why do we have the server_manager inside the client?
This should not be here.
| self.is_cacheable = True | ||
|
|
||
| def __call__(self) -> str: | ||
| server = self.agent_os.get_active_agent_os_server(report=False) |
| assert _generate_session_guid() != _generate_session_guid() | ||
|
|
||
|
|
||
| class TestReplacePort: |
c20d744 to 55cdac0
…skui-controller-multi-target
Summary
AskUiControllerClient can now manage multiple Agent OS target computers (local and/or remote) at once. One is active at a time and receives the agent's actions; callers can switch at runtime or scope a switch to a with block.

New API:
- AgentOsTargetComputer (base), LocalAgentOsTargetComputer (replaces the old AskUiControllerServer; owns the local controller subprocess, auto-detects the Windows AskuiCoreService, and switches to its port), RemoteAgentOsTargetComputer (already-running remote controller, no process management).
- AgentOsTargetComputerManager: enforces invariants (at most one local target; unique session GUIDs, computer_ids, and remote addresses) and owns the gRPC connection lifecycle. Targets are addressed exclusively by computer_id (single dict lookup; no secondary indices); the connection dict is keyed by computer_id too.
- Added to AskUiControllerClient and exposed through AgentOs / ComputerAgentOsFacade: add_agent_os_target_computer, add_remote_agent_os_target_computer, list_agent_os_target_computers, get_current_computer_target_id (returns the computer_id string of the active target), switch_agent_os_target_computer, reset_agent_os_target_computers, temporary_select.
- AndroidAgentOs / PpadbAgentOs get a sibling temporary_select(device_sn).
- ComputerAgent auto-registers three new act() tools so the LLM can drive multi-target flows: ComputerListAgentOsTargetComputersTool, ComputerSwitchAgentOsTargetComputerTool, ComputerGetCurrentComputerTargetIdTool.
- LocalAgentOsTargetComputer and RemoteAgentOsTargetComputer are re-exported from the top-level askui package.

New unit tests cover the target-computer classes, the manager, the multi-target client, and the new computer tools. The e2e test was updated to the new constructor.
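To make the manager invariants from the summary concrete, here is a rough, self-contained sketch. `Target` and `TargetManager` are hypothetical simplifications rather than the PR's classes. Treating a missing address as "local" and activating the first added target are assumptions made only for this example, and no gRPC connections are opened.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    """Hypothetical target; address=None marks the local one in this sketch."""

    computer_id: str
    address: Optional[str] = None


class TargetManager:
    """Hypothetical manager: one dict keyed by computer_id, one active target."""

    def __init__(self) -> None:
        self._targets: dict[str, Target] = {}
        self._active_id: Optional[str] = None

    def add(self, target: Target) -> None:
        existing = self._targets.values()
        if target.computer_id in self._targets:
            raise ValueError(f"Duplicate computer_id: {target.computer_id}")
        if target.address is None and any(t.address is None for t in existing):
            raise ValueError("At most one local target is allowed")
        if target.address is not None and any(
            t.address == target.address for t in existing
        ):
            raise ValueError(f"Duplicate remote address: {target.address}")
        self._targets[target.computer_id] = target
        if self._active_id is None:
            # Assumption: the first target added becomes the active one.
            self._active_id = target.computer_id

    def list_ids(self) -> list[str]:
        return list(self._targets)

    def switch(self, computer_id: str) -> Target:
        target = self._targets[computer_id]  # single dict lookup
        self._active_id = computer_id
        return target

    @property
    def current_id(self) -> Optional[str]:
        return self._active_id


manager = TargetManager()
manager.add(Target("local"))
manager.add(Target("lab", address="10.0.0.5:23000"))
manager.switch("lab")
print(manager.current_id)  # lab
```

Keying everything by `computer_id` keeps lookups to one dict access, at the cost of scanning all targets when checking address uniqueness on `add`.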