Extracting LinkedIn Post Messages with a Web Agent

Ferry Djaja
20 min read · Jan 27, 2025

In this tutorial, you’ll learn how to build a web agent that automates the extraction of LinkedIn articles from a specific organization, using your own Chrome browser and custom data for scraping.

This approach is particularly useful when a platform doesn’t provide an API for data access. Interestingly, OpenAI has recently introduced Operator, an agent that can perform tasks using its own browser. Check out the research preview at https://openai.com/index/introducing-operator/.

For this tutorial, we’ll be using the framework from https://github.com/browser-use/browser-use. I’ll make some minor adjustments to ensure seamless integration with my own Chrome browser.

Step by Step

Let’s begin by creating a Python environment using Conda.

conda create --name browser-use python=3.11
conda activate browser-use

Install the library.

pip install browser-use

Install Playwright.

playwright install
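
Before going further, it can help to confirm that both packages landed in the active Conda environment. The following is a minimal sketch using only the standard library. The helper name `installed_version` is made up for illustration, and the distribution names `browser-use` and `playwright` are assumed to match the install commands above. It only checks the Python packages, not the browser binaries that `playwright install` downloads.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names assumed from the pip commands in this tutorial.
    for name in ("browser-use", "playwright"):
        found = installed_version(name)
        print(f"{name}: {found or 'not installed'}")
```

If either line reports "not installed", the most likely cause is that the `browser-use` environment was not activated before running `pip install`.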

Next, we’ll update the library’s browser.py file by modifying the _setup_browser_with_instance method. Without this change, I had trouble connecting to my own Chrome browser.
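
To make the idea of "connecting to my Chrome browser" concrete, here is a rough sketch of one common way to attach Playwright to a locally running Chrome: start Chrome with a remote debugging port, then connect over CDP. This is not the library's implementation and not the patch described in this tutorial. The helper names, the port 9222, and the fixed two-second wait are all assumptions chosen for illustration.

```python
import asyncio
import subprocess
from typing import List


def chrome_debug_command(chrome_path: str, port: int = 9222) -> List[str]:
    """Build the argv that starts Chrome with remote debugging enabled."""
    return [chrome_path, f"--remote-debugging-port={port}"]


def cdp_endpoint(port: int = 9222) -> str:
    """Return the local HTTP endpoint Playwright can attach to."""
    return f"http://localhost:{port}"


async def connect_to_chrome(chrome_path: str, port: int = 9222):
    """Start Chrome (hypothetical helper) and attach Playwright to it over CDP."""
    # Imported lazily so the two helpers above work without Playwright installed.
    from playwright.async_api import async_playwright

    subprocess.Popen(chrome_debug_command(chrome_path, port))
    # Crude fixed wait for Chrome to open the debugging port (an assumption;
    # polling the endpoint would be more robust).
    await asyncio.sleep(2)

    playwright = await async_playwright().start()
    browser = await playwright.chromium.connect_over_cdp(cdp_endpoint(port))
    return playwright, browser


# Example (not run here): asyncio.run(connect_to_chrome("/path/to/chrome"))
```

The caller is responsible for closing the returned browser and stopping the Playwright instance when finished.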

"""
Playwright browser on steroids.
"""

import asyncio
import logging
from dataclasses import dataclass, field

from playwright._impl._api_structures import ProxySettings
from playwright.async_api import Browser as PlaywrightBrowser
from playwright.async_api import (
    Playwright,
    async_playwright,
)

from browser_use.browser.context import BrowserContext, BrowserContextConfig

logger = logging.getLogger(__name__)


@dataclass
class BrowserConfig:
    r"""
    Configuration for the Browser.

    Default values:
        headless: False
            Whether to run browser in headless mode

        disable_security: False
            Disable browser security features

        extra_chromium_args: []
            Extra arguments to pass to the browser

        wss_url: None
            Connect to a browser instance via WebSocket

        cdp_url: None
            Connect to a browser instance via CDP

        chrome_instance_path: None
            Path to a Chrome instance to use to connect to your normal browser
            e.g. '/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome'
    """

    headless: bool = False…
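
The listing above is cut off after the first field. As a reading aid only, the stand-in below mirrors the fields named in the docstring so you can see how such a configuration might be filled in. It is not the library's class: the name `SketchBrowserConfig` is hypothetical, the type annotations are assumptions, and the defaults are copied from the docstring, except `headless`, which follows the visible field line.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchBrowserConfig:
    """Illustrative stand-in for the fields listed in the BrowserConfig docstring."""

    headless: bool = False
    disable_security: bool = False
    # default_factory gives every instance its own list instead of a shared one.
    extra_chromium_args: List[str] = field(default_factory=list)
    wss_url: Optional[str] = None
    cdp_url: Optional[str] = None
    chrome_instance_path: Optional[str] = None


# Point the config at a local Chrome binary (macOS path from the docstring example).
config = SketchBrowserConfig(
    chrome_instance_path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
)
config.extra_chromium_args.append("--window-size=1280,800")

# A second config is unaffected by changes to the first one's argument list.
other = SketchBrowserConfig()
print(config.extra_chromium_args)
print(other.extra_chromium_args)
```

Note that the path is written without backslashes here: escaping spaces is only needed in a shell, not in a Python string.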
