Stream report #3 from July 17th
TODOs for this stream
- Run a crawl with 10 browsers in parallel and see if we can verify the errors described
- Debug the data loss
- Outline at least 3 use cases for working with a primed profile
What actually happened
Before this stream started, I fixed the errors that I had encountered at the end of the last stream and ran a 1k crawl with 10 browsers in parallel. It turns out that the issue filed for `AssertionError`s crashing crawls was 4 years old.
When the stream began, I just wanted to quickly fix up the TSLint-to-ESLint PR by reverting all changes from `function x () {}` to `const x = () => {}`, because while writing the report for that stream I had found out that this change alters scoping rules (for example, function declarations are hoisted, while `const` bindings are not).
Finding every place where I had made this change took quite a bit of time, but I got it done in the end.
Once that PR was merged, I also wanted to merge the PR with my fix for the `AssertionError` handling. However, I realized that it had no tests, which I deemed unacceptable. So I spent some time writing a test asserting that OpenWPM propagates all `AssertionError`s as before when it is in testing mode, but simply restarts the browser during a non-testing crawl.
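The behavior that test checks can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not OpenWPM's actual code: `CrawlState`, `restart_browser`, `run_command` and `failing_command` are made-up names, and the "browser restart" is only a counter.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CrawlState:
    """Hypothetical stand-in for a crawl's configuration and browser state."""

    testing: bool  # mirrors the idea of the testing flag, not OpenWPM's real config
    restarts: int = 0


def restart_browser(state: CrawlState) -> None:
    # Placeholder: a real crawler would tear down and relaunch the browser here.
    state.restarts += 1


def run_command(state: CrawlState, command: Callable[[], None]) -> None:
    """Run one browser command, handling AssertionErrors depending on the mode."""
    try:
        command()
    except AssertionError:
        if state.testing:
            # Testing mode: propagate, so the test suite still sees the failure.
            raise
        # Normal crawl: recover by restarting the browser instead of crashing.
        restart_browser(state)


def failing_command() -> None:
    # Raise explicitly rather than using `assert`, which `python -O` strips out.
    raise AssertionError("unexpected browser state")


if __name__ == "__main__":
    crawl = CrawlState(testing=False)
    run_command(crawl, failing_command)
    print(crawl.restarts)  # 1

    test_run = CrawlState(testing=True)
    try:
        run_command(test_run, failing_command)
    except AssertionError as error:
        print(f"propagated: {error}")  # propagated: unexpected browser state
```

Keeping the decision in a single `except` branch means tests and real crawls go through the same code path and differ only in the flag; other exception types simply pass through untouched in this sketch.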
I then updated the `default_params` fixture and the default config in `OpenWPMTest` to set the testing flag correctly, to make sure we'd still catch any assertion errors while running tests. This, plus repeatedly running `repin.sh` and `install.sh` while switching between branches, filled the two hours.
What was learned
Our previous needs and use cases as maintainers were very much detached from the needs of the community. The only way we ran big crawls was by using crawler.py and openwpm-crawler to run hundreds of OpenWPM instances in parallel on GCP.
This meant we never felt the user pain of crawls crashing, so the issue didn't get prioritized. Only after experiencing the frustration of having a crawl just die on me did I realize the importance of this pain point.
So maybe there is also an upside to OpenWPM no longer being actively used as a working tool: it allows me to engage with users at the level at which they use the tool, instead of assuming that everybody uses it the way we did.
How is OpenWPM better than it was before
OpenWPM has improved meaningfully during this stream. I merged all the work from the previous streams, meaning it is now (almost) available to end users. (We recommend that researchers use the latest tag, so they shouldn't see the changes yet.)
Moving the extension to the top level makes it easier to discover. The move from TSLint to ESLint brings me a step closer to merging the two extension folders.
Fixing this longstanding error handling issue should make OpenWPM easier to use and more reliable for researchers.
Outlook for next stream
Since a new Firefox version was released this week and some meaningful changes to OpenWPM just landed (see above), I think I should create a new release in the next stream.
Also, I should stop putting off the analysis and see if I can reproduce the issue of missing `site_visit` entries.
The same goes for writing documentation that helps users discover how to use primed profiles, e.g. for cookies or add-ons/WebExtensions.