Implementation Ideas

This is a loose collection of ideas in my head regarding the implementation.

General

As discussed with @ckerschb, there can’t be any overhead for release Firefox builds, so all changes to the DOM code, where the WebAPIs are implemented, will have to be ifdef-ed out and put behind a preference. This should keep the overhead minimal for non-release builds and eliminate it entirely for release builds.

Producers

Every relevant call site will have to obtain our service and submit an event to it that contains all relevant information.

#ifdef CALLMONITOR
    // Compile-time gate above, runtime gate via the preference below
    if (Preferences::GetBool("experimental.callmonitor")) {
        auto const& callMonitorEventManager = getCallMonitorEventManager();
        if (callMonitorEventManager.shouldSubmitEvent("window.navigator")) {
            callMonitorEventManager.submitEvent(
                "window.navigator"
                // Attach more context such as callstack or arguments passed
            );
        }
    }
#endif

Consumers

Current behavior for consumers

The OpenWPM WebExtension submits a rather complicated settings object that specifies which API is to be instrumented in which way. For each object it is instrumenting, the WebExtension is able to specify the following:

Which existing and non-existing properties to instrument, as well as which properties should be excluded
Whether functions should be serialized to strings
Whether “get” operations on properties that are functions should be logged
Whether the overriding of properties should be prevented, so that nested objects or functions can’t be changed
Whether a property should be recursively instrumented and how many levels deep the recursion should go

The instrument obtains a reference to the logger and writes out the captured calls to it.
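To make the options listed above more concrete, here is a rough sketch of what one entry of such a settings object could look like. The field names are modeled on the OpenWPM extension's logSettings as far as I remember them and should be treated as assumptions rather than the authoritative schema. `withDefaults` is a hypothetical helper that only exists for this illustration.

```javascript
// Hypothetical defaults; field names are assumptions modeled on OpenWPM's logSettings.
const DEFAULT_LOG_SETTINGS = Object.freeze({
  propertiesToInstrument: null, // null = instrument all existing properties
  nonExistingPropertiesToInstrument: [],
  excludedProperties: [],
  logFunctionsAsStrings: false,
  logFunctionGets: false,
  preventSets: false,
  recursive: false,
  depth: 5,
});

// Hypothetical helper: fills in the defaults for one instrumented object.
function withDefaults(request) {
  return {
    object: request.object,
    instrumentedName: request.instrumentedName ?? request.object,
    logSettings: { ...DEFAULT_LOG_SETTINGS, ...(request.logSettings ?? {}) },
  };
}

// Example: instrument window.navigator two levels deep, skipping userAgent.
const navigatorRequest = withDefaults({
  object: "window.navigator",
  logSettings: { excludedProperties: ["userAgent"], recursive: true, depth: 2 },
});
```

Every option from the list has a counterpart here, which is also what makes the object "rather complicated" once several APIs are configured at once.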

New suggested behavior

When it comes to configuration, I think we should follow the precedent set by the current WebExtension. However, an XPCOM-based instrumentation doesn’t need to implement the freezing of nested objects and functions, since we don’t lose our instrumentation when such an object is overwritten. In the interest of keeping the Rust code simple, I also think that functions should always be serialized and “get” operations should always be logged.
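One way to read this proposal is that the new settings are the existing ones minus the options that become unnecessary or fixed. The following is a minimal sketch of that reduction. The option names `preventSets` and `logFunctionGets` are assumptions about the current extension, and `toCallMonitorSettings` is a hypothetical helper, not an existing API.

```javascript
// Options the proposal would no longer expose (names are assumptions).
const DROPPED_OPTIONS = ["preventSets", "logFunctionsAsStrings", "logFunctionGets"];

// Hypothetical helper: copies every option except the dropped ones.
function toCallMonitorSettings(legacyLogSettings) {
  const reduced = {};
  for (const [key, value] of Object.entries(legacyLogSettings)) {
    if (!DROPPED_OPTIONS.includes(key)) {
      reduced[key] = value;
    }
  }
  return reduced;
}

const reducedSettings = toCallMonitorSettings({
  excludedProperties: ["userAgent"],
  recursive: true,
  depth: 2,
  preventSets: true,
  logFunctionGets: false,
});
```

What remains is only the selection of properties and the recursion depth, which is the part the native side would actually have to interpret.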

Users should be able to write JavaScript along the lines of:

const event_handler = (event) => {
    //Maybe some transformation
    logging_db.saveRecord("javascript", event)
}

let subscriber_id = browser.callMonitor.subscribe(settings, event_handler)
// visit the page
browser.callMonitor.unsubscribe(subscriber_id)
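To think through how the producer side (`shouldSubmitEvent`, `submitEvent`) and the consumer side (`subscribe`, `unsubscribe`) could fit together, here is a small in-memory model in plain JavaScript. It is only a sketch: `CallMonitorModel` and the `settings.symbols` field are hypothetical, and the real service would live in native code rather than in a class like this.

```javascript
// Hypothetical in-memory stand-in for browser.callMonitor, for illustration only.
class CallMonitorModel {
  constructor() {
    this.subscribers = new Map(); // subscriber id -> { settings, handler }
    this.nextId = 1;
  }

  subscribe(settings, handler) {
    const id = this.nextId++;
    this.subscribers.set(id, { settings, handler });
    return id;
  }

  unsubscribe(id) {
    return this.subscribers.delete(id);
  }

  // True if at least one subscriber asked for this symbol.
  shouldSubmitEvent(symbol) {
    for (const { settings } of this.subscribers.values()) {
      if (settings.symbols.includes(symbol)) return true;
    }
    return false;
  }

  // Delivers the event to every subscriber interested in the symbol.
  submitEvent(symbol, details = {}) {
    for (const { settings, handler } of this.subscribers.values()) {
      if (settings.symbols.includes(symbol)) handler({ symbol, ...details });
    }
  }
}

const monitor = new CallMonitorModel();
const received = [];
const subscriberId = monitor.subscribe(
  { symbols: ["window.navigator"] },
  (event) => received.push(event)
);
if (monitor.shouldSubmitEvent("window.navigator")) {
  monitor.submitEvent("window.navigator", { operation: "get" });
}
monitor.unsubscribe(subscriberId);
// No subscriber is left, so this event is not delivered to anyone.
monitor.submitEvent("window.navigator", { operation: "get" });
```

The cheap `shouldSubmitEvent` check before building the event is what would keep the cost at the call site low when nobody is subscribed.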

Current data being captured

Some of these values are self-explanatory, but for others I still need to ask for an explanation.

Column Name	Type	Optional	Description
incognito	int32
crawl_id	uint32		OpenWPM internal identifier
visit_id	int64		OpenWPM internal identifier
instance_id	uint32	True	OpenWPM internal identifier
extension_session_uuid	string
event_ordinal	int64
page_scoped_event_ordinal	int64
window_id	int64
tab_id	int64
frame_id	int64
script_url	string
script_line	string
script_col	string
func_name	string
script_loc_eval	string
document_url	string
top_level_url	string
call_stack	string
symbol	string
operation	string
value	string
arguments	string
time_stamp	string	True

    const msg = {
      operation,
      symbol: instrumentedVariableName,
      value: serializeObject(value, logSettings.logFunctionsAsStrings),
      scriptUrl: callContext.scriptUrl,
      scriptLine: callContext.scriptLine,
      scriptCol: callContext.scriptCol,
      funcName: callContext.funcName,
      scriptLocEval: callContext.scriptLocEval,
      callStack: callContext.callStack,
      ordinal: ordinal++,
    };
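Most fields of this message correspond to a column in the table above, just in camelCase instead of snake_case. The sketch below shows that naming relationship only. I have not checked how the extension actually performs the mapping, fields such as `ordinal` do not map one-to-one onto a column name, and both `camelToSnake` and `toRecord` are hypothetical helpers.

```javascript
// Converts a camelCase key such as "scriptUrl" to "script_url".
function camelToSnake(key) {
  return key.replace(/[A-Z]/g, (letter) => "_" + letter.toLowerCase());
}

// Hypothetical helper: renames every key of a message to the column naming style.
function toRecord(msg) {
  return Object.fromEntries(
    Object.entries(msg).map(([key, value]) => [camelToSnake(key), value])
  );
}

// Example values are made up for illustration.
const exampleRecord = toRecord({
  operation: "get",
  symbol: "window.navigator.userAgent",
  scriptUrl: "https://example.com/script.js",
  callStack: "",
});
```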

2020-10-05

https://zabka.it/bachelor/03-implementation/ Stefan Zabka

#Bachelors Thesis