Description
Vision-driven browser automation using Midscene Bridge mode. Operates entirely from screenshots; no DOM or accessibility labels are required. Can interact with all visible elements on screen regardless of technology stack. Connects to the user's desktop Chrome browser via the Midscene Chrome Extension, preserving cookies, sessions, and login state. Does NOT take over the user's mouse or keyboard; it operates through the Chrome DevTools Protocol.

Use this skill when the user wants to:
- Browse, navigate, or open web pages in the user's own Chrome browser
- Interact with pages that require login sessions, cookies, or existing browser state
- Scrape, extract, or collect data from websites using the user's real browser
- Fill out forms, click buttons, or interact with web elements
- Verify, validate, test, or QA frontend UI behavior
- Take screenshots of web pages
- Automate multi-step web workflows
- Test what was just built and validate the UI in a real browser

Powered by Midscene.js (https://midscenejs.com)