⏳ WebDriver BiDi: Revolutionizing Browser Automation Protocols for Modern Testing
A Comprehensive Guide to WebDriver BiDi and Its Impact on Automation Testing
Introduction
WebDriver BiDi is changing browser automation by introducing bidirectional communication between test scripts and the browser. Traditional WebDriver follows a strict command-and-response model: the script sends a command and the browser replies, but the browser cannot push information on its own. WebDriver BiDi lets the browser send events to the script as they happen, giving testers far more control over and insight into browser behavior. The new protocol brings together the best of both worlds, combining deep browser control with an event-driven architecture.
History of Protocols in Automation
We have two sets of automation controls here:
One is based on the WebDriver protocol
And the other is running the tests within the browser.
These two can be categorized as low-level and high-level implementations:
Based on the WebDriver protocol (Low-level implementations)
Running the tests within the browser (High-level implementations).
Low-Level Implementations
WebDriver Classic
In WebDriver Classic, the client is the automation framework (e.g., Selenium WebDriver, WebdriverIO, Nightwatch.js), and the browser drivers are developed by the browser vendors. The client sends each action, such as ‘click’, ‘send keys’, or ‘hover’, to the driver as an HTTP request with a JSON payload. The driver passes the action on to the browser binary, which performs it and returns a response to the client.
Sends commands: Tells the browser what to do (click, type, etc.) using simple commands.
Multi-browser support: Works with many different browsers.
Slowdown: Can be slow because every command is a separate HTTP request and response, and the script has to keep asking the browser to find out whether anything changed.
Limited: Can't directly access things like network activity.
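To make the command-per-request model concrete, here is a minimal sketch of how two classic commands could be expressed as HTTP requests. The endpoint paths follow the W3C WebDriver specification; the `HttpCommand` class, the helper names, and the session and element IDs are hypothetical and exist only for illustration. Nothing is sent over the network.

```python
import json
from dataclasses import dataclass


@dataclass
class HttpCommand:
    """One WebDriver Classic command expressed as an HTTP request (illustrative type)."""
    method: str
    path: str
    body: dict


def find_element(session_id: str, css: str) -> HttpCommand:
    # W3C "Find Element": POST /session/{session id}/element
    return HttpCommand(
        "POST",
        f"/session/{session_id}/element",
        {"using": "css selector", "value": css},
    )


def click_element(session_id: str, element_id: str) -> HttpCommand:
    # W3C "Element Click": POST /session/{session id}/element/{element id}/click
    return HttpCommand(
        "POST",
        f"/session/{session_id}/element/{element_id}/click",
        {},
    )


# Hypothetical IDs: a real driver returns them in earlier responses.
steps = [
    find_element("session-1", "#submit"),
    click_element("session-1", "element-42"),
]

# Each step is its own request/response round trip.
for step in steps:
    print(step.method, step.path, json.dumps(step.body))
```

Even this two-step interaction needs two round trips, and the browser has no way to report anything that happens between them.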
CDP (Chrome DevTools Protocol)
The Chrome DevTools Protocol (CDP) is primarily for developers and aids in debugging. It enables access to the browser DevTools. Google introduced Puppeteer, an automation tool that utilizes CDP to interact with browsers.
Deep control: Lets you see and change almost anything in a Chrome browser, like network traffic, console messages, and device settings.
Fast and flexible: Uses a direct connection to the browser, so it's quick and can respond to changes instantly.
Limited to Chromium: Only works with Chromium-based browsers such as Chrome and Edge.
Changing rules: CDP is not a standard, so the way to use it can change as Chrome updates.
High-Level Implementations
Native Scripting (Web APIs & Node.js)
Cypress-based implementations fall into this category. In this approach, the automation tool uses Web APIs and injects JavaScript directly into the browser to execute the tests. As a result, the tool operates within the constraints of the browser's JavaScript sandbox.
Fast and in control: Runs tests quickly inside the browser.
Limited actions: Can't fully mimic real user behavior; it mostly works by changing the page's code.
Security restrictions: Can't do everything, like opening new tabs or windows, because the browser limits what it can do.
“So far we have discussed three approaches. The JavaScript-injection approach is entirely high-level and represents a different model altogether, so we will not pursue it here. Each of the low-level tools has its own advantages and disadvantages. What if one solution brought together the best of both worlds: real user emulation, W3C standard compliance, multi-browser support, event-driven architecture, and bidirectional communication? That solution is WebDriver BiDi.”
What is WebDriver BiDi?
WebDriver BiDi (Bidirectional) is a significant evolution in the world of test automation. Unlike traditional WebDriver, which primarily allows communication from test scripts to the browser, WebDriver BiDi introduces bidirectional communication. This allows real-time data exchange between your test scripts and the browser.
Imagine you’re controlling a robot. With traditional WebDriver, the robot answers only when you give it a command; it never speaks up on its own. With WebDriver BiDi, the robot can also tell you what is happening as it happens, which makes it easier to follow events in real time.
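The difference shows up in the message shapes. In WebDriver BiDi, messages are JSON exchanged over a WebSocket connection: the script sends commands carrying an `id`, and the browser sends back either a reply tagged with that `id` or an unsolicited event. The sketch below only classifies sample messages. The `classify` helper and the sample payloads are illustrative assumptions (trimmed to a few fields), and no browser connection is opened.

```python
import json


def classify(raw: str) -> str:
    """Label an incoming BiDi message by its top-level "type" field."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "event":
        # Events carry no id: the browser sends them on its own initiative.
        return f"event {msg['method']}"
    if kind == "success":
        return f"response to command {msg['id']}"
    if kind == "error":
        return f"error for command {msg['id']}: {msg['error']}"
    return "unknown"


# A command the script would send (script -> browser).
command = {
    "id": 1,
    "method": "session.subscribe",
    "params": {"events": ["log.entryAdded"]},
}

# Messages the browser might send back (browser -> script); payloads are trimmed samples.
incoming = [
    json.dumps({"type": "success", "id": 1, "result": {}}),
    json.dumps({
        "type": "event",
        "method": "log.entryAdded",
        "params": {"type": "console", "level": "info", "text": "page ready"},
    }),
]

labels = [classify(raw) for raw in incoming]
for label in labels:
    print(label)
```

The second message is the robot "talking back": it arrives without the script having asked for it at that moment.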
Work on WebDriver BiDi was started in 2020 by the W3C Browser Testing and Tools Working Group. The work is a collaboration: the group comprises browser vendors (Chrome, Firefox, Safari), open-source browser automation projects (Selenium, WebdriverIO), and companies offering browser automation solutions (BrowserStack, Sauce Labs).
These stakeholders work together to give testers a simple, unified solution that is easy to implement.
Why Do We Need WebDriver BiDi?
1. Listening to DOM Events
• What It Does: WebDriver BiDi can listen to events happening in the DOM (Document Object Model) of a webpage, like clicks, inputs, or any changes.
• Why It Matters: This allows tests to react immediately when something happens on the page, making automation smarter and more efficient.
2. Capturing & Sending JavaScript Errors
• What It Does: If there’s a JavaScript error on a webpage, WebDriver BiDi can catch it and send it back to the test script.
• Why It Matters: This helps in identifying issues in real-time, so you can fix them quickly.
3. Reading Console Messages
• What It Does: WebDriver BiDi can read messages that are logged in the browser’s console (like warnings, errors, or logs).
• Why It Matters: This gives insights into what’s happening behind the scenes in the browser, helping you understand any issues or behavior in your web app.
4. Recording or Manipulating Network Traffic
• What It Does: It can monitor and even change the network requests and responses happening between the browser and the server.
• Why It Matters: This is useful for testing how your app behaves under different network conditions, like slow connections or specific server responses.
5. Real-Time Communication and Control Over Browser Internals
• What It Does: WebDriver BiDi allows two-way communication with the browser, meaning it can send and receive data instantly.
• Why It Matters: This real-time interaction enables more dynamic and responsive tests, making it easier to handle complex scenarios like animations or dynamic content.
These features make WebDriver BiDi a powerful tool for modern browser automation, giving you greater control and insights into how your web applications work.
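As a rough sketch of how a test script might consume console messages, JavaScript errors, and network activity, the example below registers handlers, builds a `session.subscribe` command, and routes sample events to the handlers. The event names `log.entryAdded` and `network.beforeRequestSent` come from the WebDriver BiDi specification. The `EventRouter` class, the handler functions, and the trimmed sample payloads are assumptions made for illustration, and the transport (the WebSocket connection) is left out.

```python
import json
from collections import defaultdict
from typing import Callable


class EventRouter:
    """Hypothetical helper that routes BiDi events to registered handlers."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self._next_id = 0

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event].append(handler)

    def subscribe_command(self) -> dict:
        # Command the script would send so the browser starts emitting these events.
        self._next_id += 1
        return {
            "id": self._next_id,
            "method": "session.subscribe",
            "params": {"events": sorted(self._handlers)},
        }

    def dispatch(self, raw: str) -> None:
        msg = json.loads(raw)
        if msg.get("type") != "event":
            return  # command replies are handled elsewhere
        for handler in self._handlers.get(msg["method"], []):
            handler(msg["params"])


seen: list[str] = []


def on_log(params: dict) -> None:
    # log.entryAdded covers both console output and uncaught JavaScript errors.
    if params["type"] == "javascript":
        label = "JS error"
    else:
        label = f"console.{params.get('method', 'log')}"
    seen.append(f"{label} [{params['level']}]: {params['text']}")


def on_request(params: dict) -> None:
    request = params["request"]
    seen.append(f"request: {request['method']} {request['url']}")


router = EventRouter()
router.on("log.entryAdded", on_log)
router.on("network.beforeRequestSent", on_request)
subscribe = router.subscribe_command()

# Sample messages as the browser might send them (fields trimmed for brevity).
samples = [
    {"type": "success", "id": 1, "result": {}},
    {"type": "event", "method": "log.entryAdded",
     "params": {"type": "console", "method": "warn", "level": "warn", "text": "cache miss"}},
    {"type": "event", "method": "log.entryAdded",
     "params": {"type": "javascript", "level": "error", "text": "TypeError: x is undefined"}},
    {"type": "event", "method": "network.beforeRequestSent",
     "params": {"request": {"method": "GET", "url": "https://example.test/api/items"}}},
]
for sample in samples:
    router.dispatch(json.dumps(sample))

for line in seen:
    print(line)
```

A test could assert on the collected entries, for instance failing as soon as a JavaScript error appears, instead of polling the page for symptoms.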
Conclusion
In conclusion, WebDriver BiDi is a game-changer for automation: it combines bidirectional communication, an event-driven architecture, multi-browser support, and access to low-level controls such as the console and network, all within a W3C standard. Let's start adopting these new features as they're released and experience the benefits firsthand.