⏳ WebDriver BiDi: Revolutionizing Browser Automation Protocols for Modern Testing

A Comprehensive Guide to WebDriver BiDi and Its Impact on Automation Testing


Introduction

WebDriver BiDi is revolutionizing browser automation by introducing bidirectional communication between test scripts and the browser. In traditional WebDriver, the script sends a command and waits for the reply. WebDriver BiDi also lets the browser push events to the script as they happen, which gives testers much finer control over, and insight into, browser behavior. The new protocol brings together the best of both worlds: deep browser control combined with an event-driven architecture.


History of Protocols in Automation

Browser automation tools fall into two broad categories:

  1. Low-level implementations, which drive the browser from outside through a protocol (WebDriver Classic, CDP).

  2. High-level implementations, which run the tests inside the browser itself.

[Figure: History of automation protocols]


Low-Level Implementations

WebDriver Classic

[Figure: WebDriver Classic]

In WebDriver Classic, the client is the automation framework (e.g., Selenium WebDriver, WebdriverIO, Nightwatch.js). The browser drivers, such as chromedriver for Chrome and geckodriver for Firefox, are developed by the browser vendors. The client sends each action (‘click’, ‘send keys’, ‘hover’, etc.) to the driver as an HTTP request with a JSON body. The driver performs the action in the browser and returns a JSON response to the client.
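The following minimal Python sketch makes the request/response model concrete. It only builds the HTTP requests a client would send for three WebDriver Classic commands; it does not send them. The endpoint paths, the `css selector` strategy and the web element key follow the W3C WebDriver specification. The helper names, the session id `s1` and the element id `abc123` are made-up placeholders.

```python
import json

# W3C web element identifier key used in WebDriver Classic responses
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def find_element(session_id: str, css: str) -> tuple[str, str, str]:
    """Build the HTTP request for the Find Element command."""
    body = {"using": "css selector", "value": css}
    return "POST", f"/session/{session_id}/element", json.dumps(body)


def click(session_id: str, element_id: str) -> tuple[str, str, str]:
    """Build the HTTP request for the Element Click command."""
    return "POST", f"/session/{session_id}/element/{element_id}/click", json.dumps({})


def send_keys(session_id: str, element_id: str, text: str) -> tuple[str, str, str]:
    """Build the HTTP request for the Element Send Keys command."""
    body = {"text": text}
    return "POST", f"/session/{session_id}/element/{element_id}/value", json.dumps(body)


# A Find Element response body as a driver would return it
response = json.loads('{"value": {"%s": "abc123"}}' % ELEMENT_KEY)
element_id = response["value"][ELEMENT_KEY]

# Each action is its own HTTP round-trip to the driver
for method, path, body in (
    find_element("s1", "#login"),
    click("s1", element_id),
    send_keys("s1", element_id, "hello"),
):
    print(method, path, body)
```

Each tuple corresponds to one HTTP round-trip. In Classic, the browser never sends anything the client did not ask for.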

[Figure: WebDriver Classic architecture]

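  • Sends commands: Tells the browser what to do (click, type, etc.) with simple request/response commands.

  • Multi-browser support: Works with all major browsers through vendor-provided drivers.

  • Slowdown: Every action, and every check while waiting for something to happen, is a separate HTTP round-trip, which adds latency.

  • Limited: Offers no standard way to observe browser events such as network activity or console messages.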
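The "Slowdown" point is easiest to see when a test waits for something. Here is a small simulation, assuming a hypothetical `FakeClassicDriver` that stands in for a real driver: the element only "appears" after a given number of lookups. Because Classic cannot notify the client, the client has to keep asking, and every attempt costs a request.

```python
class FakeClassicDriver:
    """Hypothetical stand-in for a driver: the element appears after N lookups."""

    def __init__(self, appears_after: int) -> None:
        self.appears_after = appears_after
        self.requests = 0  # number of simulated HTTP round-trips so far

    def find_element(self, css: str):
        # Each call models one HTTP request/response cycle
        self.requests += 1
        return {"id": "abc123"} if self.requests >= self.appears_after else None


def wait_for_element(driver: FakeClassicDriver, css: str, max_tries: int = 50):
    """Poll until the element exists, the way Classic clients implement waits."""
    for _ in range(max_tries):
        element = driver.find_element(css)
        if element is not None:
            return element
        # A real client would sleep here (e.g. a few hundred ms) before retrying
    raise TimeoutError(f"{css} not found after {max_tries} requests")


driver = FakeClassicDriver(appears_after=7)
wait_for_element(driver, "#result")
print(driver.requests)  # 7
```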


CDP (Chrome DevTools Protocol)

[Figure: CDP protocol]

The Chrome DevTools Protocol (CDP) was built primarily for developer tooling and debugging. It is the protocol Chrome DevTools itself uses to inspect the browser. Google introduced Puppeteer, an automation tool that uses CDP to control the browser.

[Figure: CDP protocol architecture]

  • Deep control: Lets you inspect and change almost anything in a Chrome browser, such as network traffic, console messages, and device settings.

  • Fast and flexible: Uses a persistent WebSocket connection to the browser, so it is quick and receives events the moment they happen.

  • Limited to Chromium: Designed for Chrome and other Chromium-based browsers.

  • Changing rules: CDP is not a standard. It evolves with Chrome, and commands can change between releases.
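CDP messages are JSON objects exchanged over that WebSocket:

- A command carries an integer `id`, a `method` and `params`.
- The response echoes the same `id`.
- An event has a `method` and `params` but no `id`.

The minimal sketch below builds and classifies such messages, leaving out the WebSocket transport. `Network.enable`, `Page.navigate` and `Network.requestWillBeSent` are real CDP names; the helper functions are illustrative only.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # every CDP command needs a unique integer id


def cdp_command(method: str, params: Optional[dict] = None) -> str:
    """Serialize a CDP command as it would be sent over the WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


def classify(raw: str) -> str:
    """Responses echo the command id; events have a method but no id."""
    msg = json.loads(raw)
    if "id" in msg:
        return "response"
    return f"event:{msg['method']}"


print(cdp_command("Network.enable"))
print(classify('{"id": 1, "result": {}}'))  # response
print(classify('{"method": "Network.requestWillBeSent", "params": {}}'))
# event:Network.requestWillBeSent
```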


High-Level Implementations

Native Scripting (Web APIs & Node.js)

Tools such as Cypress fall into this category. In this approach, the automation tool uses Web APIs and injects JavaScript directly into the browser to run the tests. As a result, it operates within the constraints of the browser's JavaScript sandbox.

[Figure: Native scripting approach]

  • Fast and in control: Runs tests quickly, directly inside the browser.

  • Limited actions: Cannot fully mimic real user behavior, because actions are simulated by JavaScript inside the page rather than generated as real user input.

  • Security restrictions: The browser's security model limits what in-page code can do, for example working across multiple tabs or windows.

So far we have discussed three approaches. The first injects JavaScript into the page. It is entirely high-level and follows a different model altogether, so we set it aside. Among the low-level implementations, each tool has its own advantages and disadvantages. What if one solution brought together the best of both worlds? It would offer real user emulation, W3C standard compliance, multi-browser support, an event-driven architecture, and bidirectional communication. That solution is WebDriver BiDi.

[Figure: CDP and WebDriver Classic]


What is WebDriver BiDi?

WebDriver BiDi (Bidirectional) is a significant evolution in test automation. In traditional WebDriver, communication flows from the test script to the browser: the script sends a command and the browser answers it. WebDriver BiDi keeps a WebSocket connection open, so the browser can also push events (console messages, network requests, navigations) to your test script as they happen. This allows real-time data exchange between your test scripts and the browser.
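On the wire, WebDriver BiDi uses three kinds of JSON messages:

- Commands sent by the client: `id`, `method`, `params`.
- Replies from the browser, with `type` set to `"success"` or `"error"` and the same `id`.
- Events the browser pushes on its own, with `type` set to `"event"`, plus a `method` and `params`.

Below is a minimal, transport-free router in Python that pairs replies with commands and dispatches events to listeners. The message shapes follow the BiDi specification. The `BiDiRouter` class is a hypothetical illustration, not any library's API.

```python
import json
from typing import Callable


class BiDiRouter:
    """Correlate replies by command id and dispatch browser-pushed events.

    The WebSocket transport is left out: feed raw messages to `handle`.
    """

    def __init__(self) -> None:
        self.next_id = 1
        self.pending: dict[int, str] = {}  # command id -> method name
        self.results: dict[int, dict] = {}  # command id -> reply message
        self.listeners: dict[str, list[Callable[[dict], None]]] = {}

    def command(self, method: str, params: dict) -> str:
        cmd_id = self.next_id
        self.next_id += 1
        self.pending[cmd_id] = method
        return json.dumps({"id": cmd_id, "method": method, "params": params})

    def on(self, event: str, callback: Callable[[dict], None]) -> None:
        self.listeners.setdefault(event, []).append(callback)

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        if msg["type"] == "event":
            # The browser sent this without being asked
            for callback in self.listeners.get(msg["method"], []):
                callback(msg["params"])
        else:
            # "success" or "error": a reply to one of our commands
            self.pending.pop(msg["id"], None)
            self.results[msg["id"]] = msg


router = BiDiRouter()
seen = []
router.on("log.entryAdded", lambda params: seen.append(params["text"]))
print(router.command("session.subscribe", {"events": ["log.entryAdded"]}))
router.handle('{"type": "success", "id": 1, "result": {}}')
router.handle('{"type": "event", "method": "log.entryAdded", "params": {"text": "hi"}}')
print(seen)  # ['hi']
```

The `event` branch is what Classic lacks: data arrives without a preceding command.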

Imagine you’re controlling a robot. Traditional WebDriver is like giving the robot orders: it confirms each one, but it never speaks up on its own. WebDriver BiDi is different. The robot can tell you the moment something happens, which makes it much easier to see what’s going on in real time.

[Figure: WebDriver BiDi protocol]

Work on WebDriver BiDi started in 2020 in the W3C Browser Testing and Tools Working Group. It is a collaboration between browser vendors (Chrome, Firefox, Safari), open-source browser automation projects (Selenium, WebdriverIO), and companies offering browser automation solutions (BrowserStack, Sauce Labs).

[Figure: The WebDriver BiDi working group]

All these stakeholders work together to deliver a simple, unified protocol that testers can use the same way across browsers.


Why Do We Need WebDriver BiDi?

1. Listening to DOM Events

What It Does: WebDriver BiDi can listen to events happening in the DOM (Document Object Model) of a webpage, like clicks, inputs, or any changes.

Why It Matters: Tests can react the moment something happens on the page instead of repeatedly polling for changes, which makes automation smarter and more efficient.
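BiDi has no single "listen to DOM clicks" event. One way to get them, sketched here under stated assumptions, is to combine three pieces:

- Register a preload script with `script.addPreloadScript`.
- Pass it a `channel` argument that the script calls from a `click` listener.
- Subscribe to `script.message`.

The message shapes follow the BiDi specification. The channel name `dom-clicks`, the listener body and the helper functions are hypothetical. The `decode` helper handles only the object and primitive RemoteValue shapes used in this example.

```python
import json

CHANNEL = "dom-clicks"  # hypothetical channel name; any string works

# Runs in each new document before page scripts; `emit` is the channel function
LISTENER = """(emit) => {
  document.addEventListener("click", (event) => {
    emit({ tag: event.target.tagName, id: event.target.id });
  });
}"""

ADD_PRELOAD = json.dumps({
    "id": 1,
    "method": "script.addPreloadScript",
    "params": {
        "functionDeclaration": LISTENER,
        "arguments": [{"type": "channel", "value": {"channel": CHANNEL}}],
    },
})
SUBSCRIBE = json.dumps({
    "id": 2,
    "method": "session.subscribe",
    "params": {"events": ["script.message"]},
})


def decode(remote: dict):
    """Decode the simple RemoteValue shapes used here (object and primitives)."""
    if remote["type"] == "object":
        return {key: decode(value) for key, value in remote["value"]}
    return remote.get("value")


def click_from_event(event: dict):
    """Return the decoded click payload if the event came from our channel."""
    if event.get("method") != "script.message":
        return None
    params = event["params"]
    if params["channel"] != CHANNEL:
        return None
    return decode(params["data"])


# A script.message event as the browser might push it after a click
event = {
    "type": "event",
    "method": "script.message",
    "params": {
        "channel": "dom-clicks",
        "data": {"type": "object", "value": [
            ["tag", {"type": "string", "value": "BUTTON"}],
            ["id", {"type": "string", "value": "submit"}],
        ]},
        "source": {"realm": "realm-1"},
    },
}
print(click_from_event(event))  # {'tag': 'BUTTON', 'id': 'submit'}
```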

2. Capturing & Sending JavaScript Errors

What It Does: If there’s a JavaScript error on a webpage, WebDriver BiDi can catch it and send it back to the test script.

Why It Matters: This helps in identifying issues in real-time, so you can fix them quickly.
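In BiDi, uncaught JavaScript errors arrive as `log.entryAdded` events whose entry has `type` set to `"javascript"`, once the session has subscribed to that event. Here is a minimal Python sketch that collects them from a stream of already-parsed events. The sample entries are trimmed: real ones also carry fields such as `source`, `timestamp` and a stack trace. The helper name is hypothetical.

```python
import json

# Ask the browser to push log entries (console messages and JavaScript errors)
SUBSCRIBE = json.dumps({
    "id": 1,
    "method": "session.subscribe",
    "params": {"events": ["log.entryAdded"]},
})


def javascript_errors(events: list) -> list:
    """Return the message text of every JavaScript error in a list of BiDi events."""
    errors = []
    for event in events:
        if event.get("method") != "log.entryAdded":
            continue
        entry = event["params"]
        # Uncaught exceptions arrive as entries of type "javascript"
        if entry["type"] == "javascript" and entry["level"] == "error":
            errors.append(entry["text"])
    return errors


events = [
    {"type": "event", "method": "log.entryAdded",
     "params": {"type": "console", "level": "info", "text": "page ready"}},
    {"type": "event", "method": "log.entryAdded",
     "params": {"type": "javascript", "level": "error",
                "text": "TypeError: cart is undefined"}},
]
print(javascript_errors(events))  # ['TypeError: cart is undefined']
```

A test could fail fast whenever this list is non-empty after a page action.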

3. Reading Console Messages

What It Does: WebDriver BiDi can read messages that are logged in the browser’s console (like warnings, errors, or logs).

Why It Matters: This gives insights into what’s happening behind the scenes in the browser, helping you understand any issues or behavior in your web app.
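Console output uses the same `log.entryAdded` event. Those entries have `type` set to `"console"` and also record which console method was called. The small Python sketch below groups already-parsed console entries by level. The sample data is trimmed and illustrative, and the helper is hypothetical.

```python
from collections import defaultdict


def console_by_level(entries: list) -> dict:
    """Group console log entries by level, e.g. {'warn': ['console.warn: ...']}."""
    grouped = defaultdict(list)
    for entry in entries:
        if entry["type"] != "console":
            continue  # skip JavaScript error entries
        grouped[entry["level"]].append(f'console.{entry["method"]}: {entry["text"]}')
    return dict(grouped)


entries = [
    {"type": "console", "level": "info", "method": "log", "text": "page ready"},
    {"type": "console", "level": "warn", "method": "warn", "text": "deprecated API"},
    {"type": "javascript", "level": "error", "text": "TypeError: x is undefined"},
]
print(console_by_level(entries))
# {'info': ['console.log: page ready'], 'warn': ['console.warn: deprecated API']}
```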

4. Recording or Manipulating Network Traffic

What It Does: It can monitor and even change the network requests and responses happening between the browser and the server.

Why It Matters: This is useful for testing how your app behaves under different network conditions, like slow connections or specific server responses.
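A minimal sketch of mocking a server error with BiDi network interception works in three steps:

1. Register an intercept with `network.addIntercept` for the `beforeRequestSent` phase.
2. Subscribe to `network.beforeRequestSent` (subscription not shown).
3. Answer each blocked request with `network.provideResponse`.

The command and field names follow the BiDi specification as I understand it. The hostname `api.example.com`, the request id `req-7` and the helper functions are hypothetical placeholders.

```python
import json
from typing import Optional


def add_intercept(cmd_id: int, hostname: str) -> str:
    """Ask the browser to pause matching requests before they are sent."""
    return json.dumps({
        "id": cmd_id,
        "method": "network.addIntercept",
        "params": {
            "phases": ["beforeRequestSent"],
            "urlPatterns": [{"type": "pattern", "hostname": hostname}],
        },
    })


def mock_server_error(cmd_id: int, event: dict) -> Optional[str]:
    """Answer a blocked request with a fake 503 instead of hitting the server."""
    params = event["params"]
    if event["method"] != "network.beforeRequestSent" or not params["isBlocked"]:
        return None
    return json.dumps({
        "id": cmd_id,
        "method": "network.provideResponse",
        "params": {
            "request": params["request"]["request"],  # the request id
            "statusCode": 503,
            "reasonPhrase": "Service Unavailable",
            "body": {"type": "string", "value": "maintenance"},
        },
    })


print(add_intercept(1, "api.example.com"))
event = {
    "type": "event",
    "method": "network.beforeRequestSent",
    "params": {"isBlocked": True,
               "request": {"request": "req-7", "url": "https://api.example.com/cart"}},
}
reply = json.loads(mock_server_error(2, event))
print(reply["params"]["statusCode"])  # 503
```

The same pattern, with `network.continueRequest` instead of `network.provideResponse`, lets a test inspect traffic without changing it.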

5. Real-Time Communication and Control Over Browser Internals

What It Does: WebDriver BiDi allows two-way communication with the browser, meaning it can send and receive data instantly.

Why It Matters: This real-time interaction enables more dynamic and responsive tests, making it easier to handle complex scenarios like animations or dynamic content.
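The difference from polling shows up in how a test waits. With pushed events, a test can await a future that resolves the moment a matching event arrives. The asyncio sketch below simulates the browser pushing `browsingContext.domContentLoaded` and then `browsingContext.load`; in a real client, `dispatch` would be called by the transport. The `EventWaiter` class is a hypothetical illustration.

```python
import asyncio


class EventWaiter:
    """Resolve futures as soon as a matching event is pushed (no polling)."""

    def __init__(self) -> None:
        self.waiting: list[tuple[str, asyncio.Future]] = []

    def wait_for(self, method: str) -> asyncio.Future:
        future = asyncio.get_running_loop().create_future()
        self.waiting.append((method, future))
        return future

    def dispatch(self, event: dict) -> None:
        # Called for every incoming event message
        still_waiting = []
        for method, future in self.waiting:
            if future.done():
                continue  # already resolved or cancelled
            if method == event["method"]:
                future.set_result(event["params"])
            else:
                still_waiting.append((method, future))
        self.waiting = still_waiting


async def main() -> str:
    waiter = EventWaiter()
    loaded = waiter.wait_for("browsingContext.load")
    # Simulate the browser pushing events shortly afterwards
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, waiter.dispatch,
                    {"method": "browsingContext.domContentLoaded", "params": {}})
    loop.call_later(0.02, waiter.dispatch,
                    {"method": "browsingContext.load", "params": {"url": "https://example.com/"}})
    params = await asyncio.wait_for(loaded, timeout=1.0)
    return params["url"]


print(asyncio.run(main()))  # https://example.com/
```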

These features make WebDriver BiDi a powerful tool for modern browser automation, giving you greater control and insights into how your web applications work.


Conclusion

In conclusion, WebDriver BiDi is a game-changer for automation. It combines bidirectional communication, an event-driven architecture, multi-browser support, and access to low-level capabilities such as the console and network, all while following a W3C standard. Let's start adopting these new features as they're released and experience the benefits firsthand.


💡
WebdriverIO v9 and Selenium have integrated WebDriver BiDi into their tools. To read more, please check this blog below. 👇👇👇


Did you find this article valuable?

Support Hardik Chotaliya by becoming a sponsor. Any amount is appreciated!