Experimenting with Set-of-Mark Prompting with GPT4-Vision

Ferry Djaja
5 min read · Feb 8, 2024

I came across this challenge in a GitHub issue reporting that GPT-4V has a quite high error rate when estimating the position of the mouse on the screen: https://github.com/OthersideAI/self-operating-computer/issues/3. From there, I tried to understand how Set-of-Mark Prompting works by reading the paper and running the demo code. Unfortunately, I wasn’t able to run the code on my Mac, so I decided to try it on Google Colab.
