Extracting LinkedIn Post Messages with a Web Agent
In this tutorial, you’ll learn how to build a web agent that automates the extraction of LinkedIn articles from a specific organization, using your own Chrome browser and custom data for scraping.
This approach is particularly useful when a platform doesn’t provide an API for data access. Interestingly, OpenAI has recently introduced Operator, an agent that can perform tasks using its own browser. Check out the research preview at https://openai.com/index/introducing-operator/.
For this tutorial, we’ll be using the framework from https://github.com/browser-use/browser-use. I’ll make some minor adjustments to ensure seamless integration with my own Chrome browser.
Step by Step
Let’s begin by creating a Python environment using Conda.
conda create --name browser-use python=3.11
conda activate browser-use
Install the library.
pip install browser-use
Install Playwright.
playwright install
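Before going further, it can help to confirm that the environment looks the way the steps above intend. The following is a minimal sketch and not part of browser-use: the helper name check_setup is hypothetical, and it relies only on the standard library. It reports the Python version, the active Conda environment (read from the CONDA_DEFAULT_ENV variable that conda activate sets), and whether the browser_use and playwright packages can be found.

```python
import importlib.util
import os
import sys


def check_setup(packages=("browser_use", "playwright")):
    """Report basic facts about the current environment (hypothetical helper)."""
    report = {
        # The environment above was created with python=3.11.
        "python_version": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Set by `conda activate`; None when no Conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }
    for name in packages:
        # find_spec returns None for a top-level package that is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_setup().items():
        print(f"{key}: {value}")
```

If either package shows up as False, the most likely cause is that pip ran outside the activated browser-use environment.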
Next, we’ll update the browser.py file by modifying the _setup_browser_with_instance method. Without this change, I had trouble connecting to my Chrome browser.
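For background, attaching to an existing Chrome installation generally goes through the Chrome DevTools Protocol (CDP): Chrome is started with a remote debugging port, and Playwright then connects to that port. The sketch below illustrates the idea under stated assumptions and is not the library's own code. The function names, the port 9222, the fixed startup wait, and the example macOS path are assumptions chosen for illustration; the --remote-debugging-port flag and Playwright's connect_over_cdp call are real.

```python
import asyncio
import subprocess


def build_chrome_command(chrome_path, port=9222):
    """Build the command that starts Chrome with remote debugging enabled."""
    # --remote-debugging-port exposes a CDP endpoint on localhost.
    return [chrome_path, f"--remote-debugging-port={port}"]


async def connect_to_chrome(chrome_path, port=9222, startup_wait=2.0):
    """Start Chrome and attach Playwright to it over CDP (illustrative only)."""
    # Imported here so the helper above works without Playwright installed.
    from playwright.async_api import async_playwright

    subprocess.Popen(
        build_chrome_command(chrome_path, port),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Crude fixed wait for Chrome to open the port; polling would be more robust.
    await asyncio.sleep(startup_wait)

    playwright = await async_playwright().start()
    browser = await playwright.chromium.connect_over_cdp(f"http://localhost:{port}")
    return playwright, browser


if __name__ == "__main__":
    # Assumed macOS path; adjust for your system.
    path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
    print(build_chrome_command(path))
```

Chrome usually needs to be fully closed before it is started this way, since an already running instance may ignore the debugging flag. With that in mind, the updated browser.py follows.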
"""
Playwright browser on steroids.
"""

import asyncio
import logging
from dataclasses import dataclass, field

from playwright._impl._api_structures import ProxySettings
from playwright.async_api import Browser as PlaywrightBrowser
from playwright.async_api import (
    Playwright,
    async_playwright,
)

from browser_use.browser.context import BrowserContext, BrowserContextConfig

logger = logging.getLogger(__name__)


@dataclass
class BrowserConfig:
    """
    Configuration for the Browser.

    Default values:
        headless: True
            Whether to run browser in headless mode

        disable_security: False
            Disable browser security features

        extra_chromium_args: []
            Extra arguments to pass to the browser

        wss_url: None
            Connect to a browser instance via WebSocket

        cdp_url: None
            Connect to a browser instance via CDP

        chrome_instance_path: None
            Path to a Chrome instance to use to connect to your normal browser
            e.g. '/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome'
    """

    headless: bool = False…