Using Puppeteer and PixelMatch to monitor websites

Alfred Wang
5 min read · Feb 16, 2021

My thought process behind the scenes

I built this application to solidify the skills I learned during the Flatiron Bootcamp. It was also a good opportunity to experiment with new things.

Unlike our typical setup, where the client sends a request and the backend fetches the data, this project needs a Node.js backend that runs continuously and checks the queries. For each query it goes to the website’s link, captures a screenshot, compares it to the last known screenshot, and calculates the difference. If there is a difference, it sends an email to the user and records the before picture, the after picture, and the difference picture in the database.

Puppeteer —

  • Generate screenshots and PDFs of pages.
  • Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. “SSR” (Server-Side Rendering)).
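Only the screenshot feature matters for this project. A minimal sketch of capturing a full-page screenshot is shown here. The function name `captureScreenshot` and the `networkidle2` wait condition are assumptions for illustration, not necessarily what the project uses, and the `puppeteer` module is passed in as an argument so the function can be tried with a stub.

```javascript
// Sketch: capture a full-page screenshot of `url` and save it to `outputPath`.
// `puppeteer` is passed in (e.g. require('puppeteer')) instead of imported here.
async function captureScreenshot(puppeteer, url, outputPath) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until the network is mostly idle so late-loading content is rendered.
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.screenshot({ path: outputPath, fullPage: true });
  } finally {
    // Always close the browser, even if navigation or the capture fails.
    await browser.close();
  }
  return outputPath;
}
```

With Puppeteer installed, a call could look like `await captureScreenshot(require('puppeteer'), query.url, imagePath)`, where `query.url` is a hypothetical attribute name.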

Pixelmatch —

The smallest, simplest and fastest JavaScript pixel-level image comparison library, originally created to compare screenshots in tests.

First we use Sequelize to connect to the database and read from it. We need to define a “user” and its attributes, and they have to match the Ruby backend’s database model.

-We have three models: User, Alter (a change in the picture; “change” is a reserved word in Rails), and Query.

-The “has_many” and “belongs_to” relationships are also defined here.

-The timestamp column names differ (Sequelize expects createdAt, Rails uses created_at), so “timestamps: false” is needed to prevent errors.
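The three points above could be sketched roughly as follows. This is not the project’s actual model file: the table names, column names, foreign keys, and the Alter-to-Query relationship are all assumed for illustration. Only `last_image`, the user’s email, and the `timestamps: false` setting come from the description in this post.

```javascript
// Sketch: define the three models so they line up with the Rails tables.
// `sequelize` is a Sequelize instance and `DataTypes` comes from the same package.
function defineModels(sequelize, DataTypes) {
  // Rails writes created_at/updated_at itself, so Sequelize's own
  // createdAt/updatedAt handling is switched off.
  const shared = { timestamps: false };

  const User = sequelize.define('User', {
    email: DataTypes.STRING,
  }, { ...shared, tableName: 'users' }); // assumed table name

  const Query = sequelize.define('Query', {
    url: DataTypes.STRING,        // assumed column
    status: DataTypes.STRING,     // assumed column, e.g. 'Active'
    last_image: DataTypes.STRING,
  }, { ...shared, tableName: 'queries' });

  const Alter = sequelize.define('Alter', {
    before_image: DataTypes.STRING, // assumed columns
    after_image: DataTypes.STRING,
    diff_image: DataTypes.STRING,
  }, { ...shared, tableName: 'alters' });

  // Mirror the Rails has_many / belongs_to associations (foreign keys assumed).
  User.hasMany(Query, { foreignKey: 'user_id' });
  Query.belongsTo(User, { foreignKey: 'user_id' });
  Query.hasMany(Alter, { foreignKey: 'query_id' });
  Alter.belongsTo(Query, { foreignKey: 'query_id' });

  return { User, Query, Alter };
}
```

In a real setup this would be called with `const { Sequelize, DataTypes } = require('sequelize')` and a `new Sequelize(...)` instance pointing at the same database Rails uses.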

Now that the models are set up, we can test the connection.

The update call is commented out after testing.

Since the data can be read, we can filter it and fetch only what we need; for example, only the users’ queries that are currently “Active”.
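One way to do this filtering in the database rather than in JavaScript is a `where` clause on `findAll`. The sketch below assumes the flag lives in a column called `status`; that name is hypothetical. Including the User model is what makes `query.User.email` available later.

```javascript
// Sketch: load only the active queries, each with its owning user attached.
// `Query` and `User` are the Sequelize models; `status` is an assumed column name.
function findActiveQueries(Query, User) {
  return Query.findAll({
    where: { status: 'Active' },
    include: [User], // eager-load the owner so query.User.email can be read
  });
}
```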

The logic for saving and naming pictures becomes a challenge. First we need to save an initial picture; however, we don’t want to save a picture every time we check a query. So we save an initial picture, and a second picture ONLY when there is a difference.

const time = Date.now();
const imagePath = `images/${query.id}-${time}.png` // e.g. 'images/1-123.png'

This gives every picture a unique name and saves it to the right folder.

const diffImagePath = `images/${query.id}-diff-${time}.png`

This is the path used to save the difference picture when there is a difference.

Pixelmatch compares two pictures, so I have to point the function to where each image is located.

const newImage = PNG.sync.read(fs.readFileSync(`../public/${imagePath}`))
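Once both images are decoded, the comparison itself is a single `pixelmatch` call that fills a diff image and returns the number of differing pixels. The sketch below is an assumption about how that step could be wrapped, not the project’s code: `compareImages` is a hypothetical name, the `threshold` value is a setting to tune, and `pixelmatch` and `PNG` (from pngjs) are passed in as arguments so the function can be tried with stubs.

```javascript
// Sketch: compare two decoded PNGs and return the changed-pixel count plus a diff image.
// oldImage/newImage are objects like those returned by PNG.sync.read: { width, height, data }.
function compareImages(pixelmatch, PNG, oldImage, newImage) {
  const { width, height } = oldImage;
  // pixelmatch needs both images to have identical dimensions.
  if (newImage.width !== width || newImage.height !== height) {
    throw new Error('Images must have the same dimensions');
  }
  const diff = new PNG({ width, height });
  const changedPixels = pixelmatch(
    oldImage.data, newImage.data, diff.data, width, height,
    { threshold: 0.1 } // color sensitivity; assumed value
  );
  return { changedPixels, diff };
}
```

The returned `diff` can then be written to `diffImagePath` with `fs.writeFileSync(path, PNG.sync.write(diff))`. The size check matters for full-page screenshots, since a page that grows or shrinks produces images of different heights.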

There is a function that gets the latest screenshot and saves the image into a last_image attribute.

After that, I set up an if/else: the first check always captures an initial screenshot, and every later check tests whether there is a difference between the screenshots.

If there is a difference, save it. If not, the image is the same, so don’t save it; we don’t need to keep identical images on the server. We can just replace the stored picture with the newest capture, which shows the same thing.
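That branching can be written as a small pure function. This is a sketch of the decision only; the function name and the three action labels are made up for illustration.

```javascript
// Sketch: decide what to do with a freshly captured screenshot.
// lastImagePath: path stored on the query (empty before the first capture).
// changedPixels: number of differing pixels reported by the comparison.
function decideAction(lastImagePath, changedPixels) {
  if (!lastImagePath) return 'save-initial';       // first run: nothing to compare against
  if (changedPixels > 0) return 'record-change';   // keep before, after, and diff images
  return 'replace-unchanged';                      // same picture: overwrite, keep no extra copy
}
```

Keeping the decision separate from the file and database work makes it easy to check each branch on its own.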

When there is a difference, we want to keep the Pixelmatch result and save the initial picture, the second picture (the one with changes), and the calculated difference picture into the database.

When there is a difference, create a new instance of “Alter”, which holds the before image, the after image, and the difference image. Since there is a difference, we also send an email to notify the user that a change has been found.

await fetch('https://api.emailjs.com/api/v1.0/email/send', {
  method: 'POST',
  body: JSON.stringify(emailData),
  headers: { 'Content-Type': 'application/json' },
})
  .then(res => {
    console.log('success emailing to ', query.User.email)
  })
  .catch(err => {
    console.log('error emailing', err)
  })
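The two pieces feeding into that request, the new Alter record and the `emailData` payload, might look something like this. Treat it as a sketch under assumptions: the Alter column names, `query.url`, and the `template_params` keys are hypothetical, and the top-level `service_id`, `template_id`, `user_id`, and `template_params` fields follow the EmailJS REST format as I understand it, so check them against the EmailJS documentation.

```javascript
// Sketch: store the before/after/diff image paths as a new Alter row.
// Column names are assumed; `Alter` is the Sequelize model.
async function recordChange(Alter, query, paths) {
  return Alter.create({
    query_id: query.id,
    before_image: paths.before,
    after_image: paths.after,
    diff_image: paths.diff,
  });
}

// Sketch: build the JSON body for the EmailJS send request.
// `config` holds the account-specific IDs; the template_params keys depend
// on how the email template is written and are hypothetical here.
function buildEmailData(query, config) {
  return {
    service_id: config.serviceId,
    template_id: config.templateId,
    user_id: config.userId,
    template_params: {
      to_email: query.User.email,
      url: query.url,
    },
  };
}
```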

I also took an online course on accessibility. Some users don’t use a mouse and keyboard, so it is difficult for them to navigate through a long nav bar, for example the one on Amazon.

If you press “Tab” on the Amazon website, you will notice a “Skip to content” bar. A user who navigates the website by nodding, for example, does not need to nod many times; they can skip the nav bar and go straight to the main content.

Web Accessibility principles

Principles: At the top are four principles that provide the foundation for Web accessibility: perceivable, operable, understandable, and robust.

I did implement it in my project. I only have three nav items, but I still wanted the tabbing experience to be decent.

I also added a screen-reader-only message on the page that shows a difference. Since the difference is only shown as a picture, the best I can do is add a message letting the user know a difference was detected.

Known technical debt

This application comes with many future challenges. For example, I’m currently loading data the lazy way: I fetch unnecessary data into the frontend and then filter it out.

  • AWS S3

Everything runs locally, and the Rails backend, the Node backend, and the frontend public folder all need access to the images.

  • Throttling comparisons

Currently Node runs all queries through an array.map, which starts many browser sessions simultaneously.

There is a full list of TODOs that I could not complete in the time given. However, I’ve made notes on a project board about what I should focus on and work on next.
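One possible fix, sketched below, is a small helper that processes the queries with a fixed number of workers instead of launching everything at once. `runWithLimit` is a hypothetical name, and this is only one of several ways to throttle; a plain `for...of` loop with `await` would be the simplest version with a limit of one.

```javascript
// Sketch: run `worker` over `items` with at most `limit` calls in flight.
// Assumes limit >= 1. Results are returned in the same order as `items`.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each runner pulls the next unclaimed item until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

For example, `await runWithLimit(activeQueries, 3, checkQuery)` would keep at most three browser sessions open, where `activeQueries` and `checkQuery` stand in for the project’s own list and per-query check.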
