zabka.it

Implementation Ideas

This is a loose collection of ideas in my head regarding the implementation.

General

As discussed with @ckerschb, there can’t be any overhead for release Firefox builds. All changes to the DOM code, where the WebAPIs are implemented, will therefore have to be ifdef-ed out and placed behind a preference. This should keep the overhead minimal for non-release builds and eliminate it entirely for release builds.

Producers

All relevant call sites will have to request our service and submit an event to it that contains all relevant information.

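#ifdef WEB_API_INSTRUMENTATION
    // Compiled in only when the build flag is set; the pref adds a runtime opt-in.
    // Requires #include "mozilla/Preferences.h".
    if (mozilla::Preferences::GetBool("experimental.web_api_instrumentation", false)) {
        // getApiInstrumentationService() is the proposed (not yet existing) accessor.
        auto const& webApiInstrumentationService = getApiInstrumentationService();
        if (webApiInstrumentationService.shouldSubmitEvent("window.navigator")) {
            // Construct the event and submit it to the service here
        }
    }
#endif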
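
A minimal sketch of the service side that the call-site snippet talks to, assuming a simple in-process design: producers ask `shouldSubmitEvent()` about a symbol and only then construct and hand over an event. The class name `WebApiInstrumentationService`, the `WebApiEvent` struct, `enableSymbol()` and `submitEvent()` are hypothetical; a real service would be an XPCOM component and would have to deal with threading and IPC, which this ignores.

```cpp
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical event a producer would construct at a WebAPI call site.
struct WebApiEvent {
    std::string symbol;     // e.g. "window.navigator"
    std::string operation;  // e.g. "get", "set", "call"
    std::string value;      // serialized value
};

// Hypothetical service: not thread-safe, no XPCOM plumbing.
class WebApiInstrumentationService {
public:
    // Consumers register interest in symbols (in practice derived from their settings).
    void enableSymbol(const std::string& symbol) { enabled_.insert(symbol); }

    // Cheap check so producers can skip event construction for uninteresting symbols.
    bool shouldSubmitEvent(const std::string& symbol) const {
        return enabled_.count(symbol) != 0;
    }

    // Producers hand over the fully constructed event.
    void submitEvent(WebApiEvent event) { events_.push_back(std::move(event)); }

    const std::vector<WebApiEvent>& events() const { return events_; }

private:
    std::unordered_set<std::string> enabled_;
    std::vector<WebApiEvent> events_;
};
```

Splitting the cheap `shouldSubmitEvent()` check from `submitEvent()` means an uninstrumented call site only pays for a set lookup, which fits the low-overhead requirement for non-release builds.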

Consumers

Current behavior for consumers

The OpenWPM WebExtension submits a rather complicated settings object that specifies which APIs are to be instrumented and how. For each object it instruments, the WebExtension can specify:

  • Which existing and non-existing properties to instrument, and which properties to exclude
  • Whether functions should be serialized to strings when logged
  • Whether “get” operations on properties that are functions should be logged
  • Whether properties should be protected from being overwritten, so that nested objects or functions cannot be replaced
  • Whether a property should be instrumented recursively and, if so, how many levels deep the recursion should go
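
To make the option list concrete, here is a C++ sketch of the per-object settings as a plain struct, together with a property-selection rule. The field names are loosely modeled on the options above and are assumptions, not OpenWPM's actual schema. The sketch also assumes that an empty include list means "all existing properties", which would need to be checked against the real instrument.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical mirror of the per-object options; names are assumptions.
struct ObjectInstrumentSettings {
    std::vector<std::string> propertiesToInstrument;             // empty = all existing (assumption)
    std::vector<std::string> nonExistingPropertiesToInstrument;  // properties not (yet) defined
    std::vector<std::string> excludedProperties;
    bool logFunctionsAsStrings = false;  // serialize function values to strings
    bool logFunctionGets = false;        // log "get" on function-valued properties
    bool preventSets = false;            // block overwriting of nested objects/functions
    bool recursive = false;              // also instrument nested objects
    int depth = 0;                       // recursion limit, only used when recursive is set
};

static bool listContains(const std::vector<std::string>& list, const std::string& name) {
    return std::find(list.begin(), list.end(), name) != list.end();
}

// Decide whether a property of the instrumented object should be instrumented.
bool shouldInstrumentProperty(const ObjectInstrumentSettings& s,
                              const std::string& name, bool exists) {
    if (listContains(s.excludedProperties, name)) return false;
    if (!exists) return listContains(s.nonExistingPropertiesToInstrument, name);
    return s.propertiesToInstrument.empty() || listContains(s.propertiesToInstrument, name);
}

// Whether a nested object found at `level` (0 = top level) should be instrumented too.
bool shouldRecurse(const ObjectInstrumentSettings& s, int level) {
    return s.recursive && level < s.depth;
}
```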

The instrument obtains a reference to the logger and writes out the captured calls to it.

New suggested behavior

When it comes to configuration, I think we should follow the precedent set by the current settings object. However, an XPCOM-based instrumentation doesn’t need to implement the freezing of nested objects and functions: since the instrumentation lives in the native DOM code rather than in JavaScript wrappers, it isn’t lost when such an object is overwritten. I also think that, in the interest of keeping the Rust code simple, functions should always be serialized and “get” operations on function-valued properties should always be logged.
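
Under this proposal the per-object settings shrink, because three switches become fixed behavior. The following is a sketch of the reduced struct and a conversion from the current option set. All names are hypothetical. The conversion simply drops `preventSets`, `logFunctionsAsStrings` and `logFunctionGets`, on the assumption that the new service never freezes objects, always serializes functions and always logs function gets.

```cpp
#include <string>
#include <vector>

// Current (WebExtension-side) options; hypothetical names.
struct LegacySettings {
    std::vector<std::string> propertiesToInstrument;
    std::vector<std::string> nonExistingPropertiesToInstrument;
    std::vector<std::string> excludedProperties;
    bool logFunctionsAsStrings = false;
    bool logFunctionGets = false;
    bool preventSets = false;
    bool recursive = false;
    int depth = 0;
};

// Proposed settings for the XPCOM-based service (hypothetical): no freezing option,
// and function serialization and function-get logging are always on.
struct ServiceSettings {
    std::vector<std::string> propertiesToInstrument;
    std::vector<std::string> nonExistingPropertiesToInstrument;
    std::vector<std::string> excludedProperties;
    bool recursive = false;
    int depth = 0;

    static constexpr bool kLogFunctionsAsStrings = true;
    static constexpr bool kLogFunctionGets = true;
};

// Translate an existing settings object; the dropped flags are ignored.
ServiceSettings fromLegacy(const LegacySettings& legacy) {
    ServiceSettings s;
    s.propertiesToInstrument = legacy.propertiesToInstrument;
    s.nonExistingPropertiesToInstrument = legacy.nonExistingPropertiesToInstrument;
    s.excludedProperties = legacy.excludedProperties;
    s.recursive = legacy.recursive;
    s.depth = legacy.depth;
    return s;
}
```

Accepting the old shape and discarding the unused flags would let existing OpenWPM settings objects be passed through unchanged.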

Users should be able to write JavaScript along the lines of:

const event_handler = (event) => {
    // Maybe some transformation
    logging_db.saveRecord("javascript", event);
};

// settings: modeled on the current OpenWPM settings object
const subscriber_id = browser.js_instrument.subscribe(settings, event_handler);
// visit the page
browser.js_instrument.unsubscribe(subscriber_id);
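
On the service side, `subscribe`/`unsubscribe` could map onto a small registry of handlers keyed by an id. This C++ sketch assumes events are delivered synchronously and filtered by symbol. `SubscriberRegistry`, `Event` and the string-set filter are hypothetical stand-ins for the real settings object and the WebExtension API plumbing.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical captured event, reduced to two fields.
struct Event {
    std::string symbol;
    std::string operation;
};

using Handler = std::function<void(const Event&)>;

// Hypothetical registry backing subscribe()/unsubscribe().
class SubscriberRegistry {
public:
    // Returns an id that the caller later passes to unsubscribe().
    std::uint64_t subscribe(std::unordered_set<std::string> symbols, Handler handler) {
        std::uint64_t id = nextId_++;
        subscribers_.emplace(id, Subscriber{std::move(symbols), std::move(handler)});
        return id;
    }

    // Returns false if the id is unknown (e.g. already unsubscribed).
    bool unsubscribe(std::uint64_t id) { return subscribers_.erase(id) == 1; }

    // Deliver an event to every subscriber interested in its symbol.
    void dispatch(const Event& event) const {
        for (const auto& entry : subscribers_) {
            if (entry.second.symbols.count(event.symbol) != 0) {
                entry.second.handler(event);
            }
        }
    }

private:
    struct Subscriber {
        std::unordered_set<std::string> symbols;
        Handler handler;
    };
    std::map<std::uint64_t, Subscriber> subscribers_;
    std::uint64_t nextId_ = 1;
};
```

Handing out an id, rather than requiring the caller to pass the handler back, matches the `subscriber_id` in the JavaScript example and makes a repeated `unsubscribe` harmless.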

Current data being captured

Some of these values are self-explanatory; for others I still need to ask for an explanation.

Column Name                Type    Optional  Description
incognito                  int32
crawl_id                   uint32            OpenWPM internal identifier
visit_id                   int64             OpenWPM internal identifier
instance_id                uint32  True      OpenWPM internal identifier
extension_session_uuid     string
event_ordinal              int64
page_scoped_event_ordinal  int64
window_id                  int64
tab_id                     int64
frame_id                   int64
script_url                 string
script_line                string
script_col                 string
func_name                  string
script_loc_eval            string
document_url               string
top_level_url              string
call_stack                 string
symbol                     string
operation                  string
value                      string
arguments                  string
time_stamp                 string  True
For reference, this is how the current JS instrument builds the message for each captured call:

    const msg = {
      operation,
      symbol: instrumentedVariableName,
      value: serializeObject(value, logSettings.logFunctionsAsStrings),
      scriptUrl: callContext.scriptUrl,
      scriptLine: callContext.scriptLine,
      scriptCol: callContext.scriptCol,
      funcName: callContext.funcName,
      scriptLocEval: callContext.scriptLocEval,
      callStack: callContext.callStack,
      ordinal: ordinal++,
    };
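
If the service is to produce the same records, the column list and the `msg` object map onto a plain struct. This sketch uses `std::optional` for the two columns marked optional and keeps the string columns as `std::string`, as in the current schema. The struct name `JavascriptEventRecord` and the `OrdinalCounter` helper are hypithetical; the counter only mimics the `ordinal: ordinal++` pattern above and says nothing about how `page_scoped_event_ordinal` is computed.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical C++ mirror of the captured columns; field names follow the column list.
struct JavascriptEventRecord {
    std::int32_t incognito = 0;
    std::uint32_t crawl_id = 0;
    std::int64_t visit_id = 0;
    std::optional<std::uint32_t> instance_id;  // optional column
    std::string extension_session_uuid;
    std::int64_t event_ordinal = 0;
    std::int64_t page_scoped_event_ordinal = 0;
    std::int64_t window_id = 0;
    std::int64_t tab_id = 0;
    std::int64_t frame_id = 0;
    std::string script_url;
    std::string script_line;  // a string in the current schema, not an integer
    std::string script_col;
    std::string func_name;
    std::string script_loc_eval;
    std::string document_url;
    std::string top_level_url;
    std::string call_stack;
    std::string symbol;
    std::string operation;
    std::string value;
    std::string arguments;
    std::optional<std::string> time_stamp;  // optional column
};

// Hypothetical helper mimicking `ordinal: ordinal++`: a monotonically increasing counter.
class OrdinalCounter {
public:
    std::int64_t next() { return next_++; }

private:
    std::int64_t next_ = 0;
};
```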

TODO

  • Document what the JS instrument currently collects and see if we can collect the same info
  • Understand the JS instrument settings object