
Stream report #2 from July 9th

TODOs for this stream

  • Run a crawl with 10 browsers in parallel and see if we can verify the errors described
  • Debug the data loss
  • Outline at least 3 use cases for working with a primed profile

What actually happened

I tried running a crawl on the v0.16.0 tag to reproduce the error that was reported by taehyung222 on Matrix. (Come join us at #openwpm:mozilla.org if you want.)

However, once I had created a list with the top 1000 Alexa entries using parts of a script from the openwpm-crawler repo and started running the crawl, I discovered that running 10 browsers in parallel while streaming doesn’t work that well from a performance perspective.
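For anyone who wants to build a similar site list, the idea can be sketched roughly as follows. This is not the script from the openwpm-crawler repo. It is a minimal sketch that assumes the Alexa list is a headerless CSV with `rank,domain` rows, and `top_sites` is a hypothetical helper name chosen for illustration.

```python
import csv
import io


def top_sites(csv_text: str, limit: int = 1000) -> list[str]:
    """Return the first `limit` domains from `rank,domain` CSV text as URLs.

    Assumption: the input is ordered by rank and has no header row.
    """
    sites = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        sites.append("http://" + row[1].strip())
        if len(sites) >= limit:
            break
    return sites


# Tiny made-up sample in the assumed format
sample = "1,google.com\n2,youtube.com\n3,facebook.com\n"
print(top_sites(sample, limit=2))
```

The resulting list of URLs is what would then be handed to the crawl, one site per visit.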

Also, the crawl was way slower than I had expected, and in the end it crashed with an assertion error that stopped the entire crawl after only ~150 sites had been visited.

At that point I gave up and ended the stream.

What was learned

Streams need work outside of the time I’m streaming.

If I want to analyse a dataset on stream I should prepare it beforehand.

If I promise a stream report, I need to sit down and actually write one.

If I claim to be a responsible maintainer, I should be helping community members instead of wasting their time.

Streams should only happen when I’m actually ready to stream. In this stream I was hungry, tired and anxious. I should have cancelled the stream and worked offline.

How is OpenWPM better than it was before

It isn’t. Nothing of value was produced on this stream. I’m sorry.

Outlook for next stream

All I can promise is that I’ll try to do better.

I’ll try to run the crawl during the week, so that the dataset is ready for analysis next Friday.

Hopefully I’ll see you then.