Stream report #3 from July 17th

TODOs for this stream

  • Run a crawl with 10 browsers in parallel and see if we can verify the errors described
  • Debug the data loss
  • Outline at least 3 use cases for working with a primed profile

What actually happened

Before this stream started, I fixed the errors I had encountered at the end of the last stream and ran a 1k crawl with 10 browsers in parallel. It turned out that the issue about AssertionErrors crashing crawls had been filed 4 years ago.

When the stream began, I just wanted to quickly fix the TSLint to ESLint PR by reverting all changes from function x () {} to const x = () => {}. While writing the report for that stream, I had found out that this change alters scoping rules. Finding every place where I had made this change took quite a bit of time, but I got it done in the end.

Once that PR was merged, I also wanted to merge the PR with my fix for the AssertionError handling. Then I realized that it had no tests, which I deemed unacceptable. So I spent some time writing a test that asserts two things: in testing mode, OpenWPM propagates all AssertionErrors as before, and in a non-testing crawl it simply restarts the browser.
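The behavior that test checks can be sketched roughly as follows. This is a minimal illustration in Python, not OpenWPM's actual code. CrawlState, run_command, restart_browser and failing_command are hypothetical names, and a real crawl would relaunch Firefox rather than just record the restart.

```python
# Minimal sketch of the described AssertionError handling.
# All names are hypothetical; this is not OpenWPM's actual API.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CrawlState:
    testing: bool  # hypothetical stand-in for OpenWPM's testing flag
    restarts: List[str] = field(default_factory=list)

    def restart_browser(self, reason: str) -> None:
        # A real crawl would tear down and relaunch the browser here;
        # this sketch only records that a restart happened and why.
        self.restarts.append(reason)


def run_command(state: CrawlState, command: Callable[[], None]) -> None:
    """Run one crawl command and decide what an AssertionError should do."""
    try:
        command()
    except AssertionError as exc:
        if state.testing:
            # Tests must still fail loudly when an invariant is broken.
            raise
        # In a normal crawl, recover by restarting instead of killing the crawl.
        state.restart_browser(f"AssertionError: {exc}")


def failing_command() -> None:
    # Raise explicitly so the example still works under `python -O`,
    # which strips plain assert statements.
    raise AssertionError("unexpected browser state")
```

With testing=False, run_command(state, failing_command) records one restart and returns normally. With testing=True, the same call re-raises the AssertionError and no restart is recorded.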

I then updated the default_params fixture and the default config in OpenWPMTest to set the testing flag correctly. This makes sure we still catch any assertion errors while running tests.

This, plus repeatedly running repin.sh and install.sh while switching between branches, filled the two hours.

What was learned

Our previous needs and use cases as maintainers were very much detached from the needs of the community. The only way we were running big crawls was using crawler.py and openwpm-crawler to have hundreds of OpenWPM instances run in parallel on GCP.

This meant that we never experienced the user pain of crashing crawls, and the issue didn't get prioritized. Only after experiencing the frustration of having a crawl die on me did I realize the importance of this pain point.

So maybe there is also an upside to OpenWPM no longer being one of our everyday working tools. It lets me engage with users at the level at which they use the tool, rather than assuming that everybody uses it the way we did.

How is OpenWPM better than it was before

OpenWPM has improved meaningfully throughout this stream. I merged all the work from the last streams, meaning it is now (almost) available to end users. (We recommend that researchers use the latest tag, so they shouldn’t see the changes yet.)

Moving the Extension to the top level makes it easier to discover. The move from TSLint to ESLint lets me move forward on merging the two extension folders.

Fixing this longstanding error handling issue should make OpenWPM easier to use and more reliable for researchers.

Outlook for next stream

Since a new Firefox version was released this week and some meaningful changes to OpenWPM just landed (see above), I think I should create a new release in the next stream.

Also, I should stop putting off the analysis and see if I can reproduce the issue of missing site_visit entries.

The same goes for writing documentation that helps users discover how to use primed profiles, e.g. for cookies or Add-ons/WebExtensions.