Implementation Ideas
This is a loose collection of ideas in my head regarding the implementation.
General
As discussed with @ckerschb, there can’t be any overhead for release Firefox builds, so all changes to the DOM code, where the WebAPIs are implemented, will have to be ifdef-ed out and put behind a preference. This should keep the overhead minimal for non-release builds and eliminate it entirely for release builds.
Producers
Every relevant call site will have to obtain our service and submit an event to it that contains all relevant information.
#ifdef CALLMONITOR
if (Preferences::GetBool("experimental.callmonitor")) {
  const auto& callMonitorEventManager = getCallMonitorEventManager();
  if (callMonitorEventManager.shouldSubmitEvent("window.navigator")) {
    callMonitorEventManager.submitEvent(
        "window.navigator"
        // Attach more context such as callstack or arguments passed
    );
  }
}
#endif
Consumers
Current behavior for consumers
The OpenWPM WebExtension submits a rather complicated settings object that specifies which APIs are to be instrumented and in which way. For each object it instruments, the WebExtension is able to specify the following:
- Which existing and non-existing properties to instrument, as well as which properties should be excluded
- Whether function values should be serialized to strings
- Whether “get” operations on properties that are functions should be logged
- Whether properties should be protected from being overwritten, so that nested objects or functions cannot be changed
- Whether a property should be instrumented recursively, and how many levels deep the recursion should go
The instrument obtains a reference to the logger and writes out the captured calls to it.
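To make the options above concrete, here is a rough sketch of what a single entry of such a settings object might look like, plus a small helper that resolves which property names end up instrumented. The field names are assumptions modeled on the list above, not a verified copy of the OpenWPM schema, and `selectProperties` is a hypothetical helper that exists only for illustration.

```javascript
// Hypothetical settings entry for one instrumented object.
// All field names are assumptions based on the options listed above.
const exampleSetting = {
  object: "window.navigator",
  logSettings: {
    propertiesToInstrument: ["userAgent", "platform", "plugins"],
    nonExistingPropertiesToInstrument: ["madeUpProperty"],
    excludedProperties: ["plugins"],
    logFunctionsAsStrings: true, // serialize function values to strings
    logFunctionGets: true, // log "get" operations on function properties
    preventSets: false, // do not block overwriting of properties
    recursive: true, // instrument nested objects as well
    depth: 2, // how many levels deep the recursion goes
  },
};

// Illustrative helper: combine existing and non-existing properties,
// then drop everything that is explicitly excluded.
function selectProperties(logSettings) {
  const excluded = new Set(logSettings.excludedProperties);
  const requested = [
    ...logSettings.propertiesToInstrument,
    ...logSettings.nonExistingPropertiesToInstrument,
  ];
  return requested.filter((name) => !excluded.has(name));
}

console.log(selectProperties(exampleSetting.logSettings));
```

In this sketch exclusion wins over inclusion, which seems like the least surprising rule, but that is a design choice rather than documented behavior.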
New suggested behavior
When it comes to configuration, I think we should follow the precedent set. However, I think an XPCOM-based instrumentation doesn’t need to implement the freezing of nested objects and functions, since we don’t lose our instrumentation when such an object is overwritten. I also think that, in the interest of keeping the Rust code simple, functions should always be serialized and “get” operations should always be logged.
Users should be able to write JavaScript along the lines of:
const event_handler = (event) => {
  // Maybe some transformation
  logging_db.saveRecord("javascript", event);
};
let subscriber_id = browser.callMonitor.subscribe(settings, event_handler);
// visit the page
browser.callMonitor.unsubscribe(subscriber_id);
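The subscribe/unsubscribe semantics I have in mind can be modeled with a small in-memory stand-in. This is only a sketch: `createCallMonitor`, the `dispatch` method, and the `symbols` field on the settings object are all hypothetical, and in the real design the events would come from the browser rather than from a manual `dispatch` call.

```javascript
// Hypothetical in-memory stand-in for browser.callMonitor.
function createCallMonitor() {
  const subscribers = new Map();
  let nextId = 1;
  return {
    // Register a handler and return an id that can be used to unsubscribe.
    subscribe(settings, handler) {
      const id = nextId++;
      subscribers.set(id, { settings, handler });
      return id;
    },
    // Remove a subscription; returns false if the id was unknown.
    unsubscribe(id) {
      return subscribers.delete(id);
    },
    // Stand-in for the browser delivering an event to interested subscribers.
    dispatch(event) {
      for (const { settings, handler } of subscribers.values()) {
        if (settings.symbols.includes(event.symbol)) {
          handler(event);
        }
      }
    },
  };
}

const callMonitor = createCallMonitor();
const received = [];
const subscriberId = callMonitor.subscribe(
  { symbols: ["window.navigator.userAgent"] }, // assumed settings shape
  (event) => received.push(event)
);

callMonitor.dispatch({ symbol: "window.navigator.userAgent", operation: "get" });
callMonitor.dispatch({ symbol: "window.screen.width", operation: "get" }); // not subscribed
callMonitor.unsubscribe(subscriberId);
callMonitor.dispatch({ symbol: "window.navigator.userAgent", operation: "get" }); // after unsubscribe

console.log(received.length); // 1
```

Returning an id from `subscribe` keeps the API usable across the WebExtension boundary, where passing the handler itself back for comparison may not be practical.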
Current data being captured
Some of these values are self-explanatory, but for others I still need to ask for an explanation.
Column Name | Type | Optional | Description |
---|---|---|---|
incognito | int32 | | |
crawl_id | uint32 | | OpenWPM internal identifier |
visit_id | int64 | | OpenWPM internal identifier |
instance_id | uint32 | True | OpenWPM internal identifier |
extension_session_uuid | string | | |
event_ordinal | int64 | | |
page_scoped_event_ordinal | int64 | | |
window_id | int64 | | |
tab_id | int64 | | |
frame_id | int64 | | |
script_url | string | | |
script_line | string | | |
script_col | string | | |
func_name | string | | |
script_loc_eval | string | | |
document_url | string | | |
top_level_url | string | | |
call_stack | string | | |
symbol | string | | |
operation | string | | |
value | string | | |
arguments | string | | |
time_stamp | string | True | |
const msg = {
  operation,
  symbol: instrumentedVariableName,
  value: serializeObject(value, logSettings.logFunctionsAsStrings),
  scriptUrl: callContext.scriptUrl,
  scriptLine: callContext.scriptLine,
  scriptCol: callContext.scriptCol,
  funcName: callContext.funcName,
  scriptLocEval: callContext.scriptLocEval,
  callStack: callContext.callStack,
  ordinal: ordinal++,
};
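The message fields are camelCase while the table columns are snake_case, so somewhere a mapping has to happen. A minimal sketch of that mapping could look like the following. `toRecord` and its `context` parameter are hypothetical names, the assumption that `ordinal` ends up in `event_ordinal` is mine, and the sample values are made up purely for illustration.

```javascript
// Hypothetical mapping from an instrumentation message to a table row.
// `context` carries columns the message itself does not contain
// (for example visit_id or tab_id).
function toRecord(msg, context) {
  return {
    ...context,
    event_ordinal: msg.ordinal, // assumption: ordinal maps to event_ordinal
    script_url: msg.scriptUrl,
    script_line: msg.scriptLine,
    script_col: msg.scriptCol,
    func_name: msg.funcName,
    script_loc_eval: msg.scriptLocEval,
    call_stack: msg.callStack,
    symbol: msg.symbol,
    operation: msg.operation,
    value: msg.value,
  };
}

// Made-up sample message, only to show the shape of the result.
const sampleMsg = {
  operation: "get",
  symbol: "window.navigator.userAgent",
  value: "example-user-agent",
  scriptUrl: "https://example.com/script.js",
  scriptLine: "10",
  scriptCol: "5",
  funcName: "readUserAgent",
  scriptLocEval: "",
  callStack: "",
  ordinal: 0,
};

const record = toRecord(sampleMsg, { visit_id: 1, tab_id: 2 });
console.log(record.script_url, record.event_ordinal);
```

Columns such as `document_url` or `time_stamp` are not present in the message, so in this sketch they would have to be supplied through `context`.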