Musings on configurations

Configuration is hard; that’s why I don’t want to do it.

Back in September 2020 I wrote up some general thoughts on how I want the interface of the component to look. However, I very carefully avoided specifying what the settings object on the consumer side should look like. Later I realized that I didn’t understand our current js_instrument_settings at all and promised to look into it, which I have now done.

The current `js_instrument_settings`

Our current approach is to take all sorts of things from our user and try to transform them into a meaningful output. Here is an example of everything we accept, taken from the comment on clean_js_instrumentation_settings:

    // Collections
    "collection_fingerprinting",
    // APIs, with or without settings details
    "XMLHttpRequest",
    {"XMLHttpRequest": {"excludedProperties": ["send"]}},
    // APIs with shortcut to includedProperties
    {"Prop1": ["hi"], "Prop2": ["hi2"]},
    {"XMLHttpRequest": ["send"]},
    "Storage",
    // Specific instances on window
    {"window.document": ["cookie", "referrer"]},
    {"window": ["name", "localStorage", "sessionStorage"]}

We accept a list of these things in combination and try to handle the resulting mess ourselves by expanding and merging the different inputs to wrangle them into a usable structure. Once we have achieved that, we dump the resulting config as JSON and then destroy the JSON by unquoting a string to save us an eval in the WebExtension.
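To make the wrangling concrete, here is a minimal sketch of how the mixed inputs could be brought into one uniform shape. It is not the actual OpenWPM code: `normalize_entry` is a hypothetical helper, the shortcut list is assumed to map to propertiesToInstrument, and expanding collections and merging duplicate entries are left out.

```python
from typing import Any, Dict, List, Union

RawEntry = Union[str, Dict[str, Any]]


def normalize_entry(entry: RawEntry) -> List[Dict[str, Any]]:
    """Turn one user-supplied entry into a list of uniform dicts (hypothetical helper)."""
    # A bare string names an API (or a collection) with default settings.
    if isinstance(entry, str):
        return [{"object": entry, "logSettings": {}}]
    normalized = []
    for name, value in entry.items():
        if isinstance(value, list):
            # Shortcut form: the list is assumed to mean propertiesToInstrument.
            log_settings = {"propertiesToInstrument": list(value)}
        else:
            # Full form: the value already is a settings dict.
            log_settings = dict(value)
        normalized.append({"object": name, "logSettings": log_settings})
    return normalized


raw = [
    "XMLHttpRequest",
    {"XMLHttpRequest": {"excludedProperties": ["send"]}},
    {"window.document": ["cookie", "referrer"]},
]
uniform = [item for entry in raw for item in normalize_entry(entry)]
```

Even this reduced version needs three code paths for three input shapes, and it still leaves two separate XMLHttpRequest entries that would have to be merged afterwards.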

Update: The OpenWPM maintainer team has since decided that this was a bad idea and I filed #857

I think both accepting many kinds of input and choosing to break an existing serialization format are bad decisions as they lead to complicated and subtle code. (The setting looks like JSON at first glance but if you try to json.loads it, it will fail with a non-obvious error.)
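The failure mode can be reproduced with a few lines of Python. The string below is a made-up stand-in for the real output, assuming that the unquoted part is an object reference such as window.document; the exact shape of the real config may differ.

```python
import json

# Hypothetical output: the object reference is left unquoted on purpose,
# so that the JavaScript side can use it directly without an eval.
almost_json = '[{"object": window.document, "instrumentedName": "window.document"}]'

try:
    json.loads(almost_json)
    parsed = True
except json.JSONDecodeError as err:
    # The parser only reports a position, not that a quote was removed.
    parsed = False
    error_message = str(err)

# The same entry with the reference kept as a string parses fine.
valid_json = '[{"object": "window.document", "instrumentedName": "window.document"}]'
settings = json.loads(valid_json)
```

The error only points at a character offset, which is why it is hard to tell from the message alone that the format was broken deliberately.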

The expanded settings

Once all the settings have been expanded, there are a lot more properties. They are:

object
instrumentedName
logSettings
- propertiesToInstrument
- nonExistingPropertiesToInstrument
- excludedProperties
- logCallStack
- logFunctionsAsStrings
- logFunctionGets
- preventSets
- recursive
- depth

Their meaning can be looked up here.

The new settings object

Note: The new settings object is the settings object for the instrumentation written in Rust. The limitations discussed here don’t imply that OpenWPM should drop the existing capabilities. They just mean that these capabilities should remain implemented in JavaScript.

I think the format of the settings object should be valid JSON.

We can’t implement nonExistingPropertiesToInstrument, preventSets and logFunctionGets, as those happen in JS and never call out into the binding layer, so we shouldn’t include them in the new settings.

recursive and depth require further exploration and should be revisited once the implementation exists.

A minimal implementation of the settings could be:

name - The WebIDL fully qualified name
propertiesToInstrument
excludedProperties
logCallStack
logFunctionsAsStrings

Since MDN uses the fully qualified name for all WebAPIs (see e.g. cookies.CookieStore) users can read up on the interface they are interested in and just copy the name from the title.
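As a sketch of what the minimal settings could look like on the Python side, the fields listed above fit into a small dataclass. The class name `InstrumentationSetting` and all default values are assumptions for illustration; nothing here is decided yet.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class InstrumentationSetting:
    """One entry of the proposed settings list (hypothetical class name)."""

    name: str  # WebIDL fully qualified name, e.g. "Clipboard"
    # The defaults below are assumptions, not part of the proposal.
    propertiesToInstrument: List[str] = field(default_factory=list)
    excludedProperties: List[str] = field(default_factory=list)
    logCallStack: bool = False
    logFunctionsAsStrings: bool = False


settings = [
    InstrumentationSetting(name="Clipboard", propertiesToInstrument=["read"]),
    InstrumentationSetting(name="XMLHttpRequest", excludedProperties=["send"]),
]

# Plain json.dumps with no post-processing, so the result stays valid JSON.
serialized = json.dumps([asdict(s) for s in settings])
round_tripped = json.loads(serialized)
```

Because nothing is unquoted after serialization, the consumer can parse the string with any standard JSON parser and get the same structure back.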

This severely limited settings object would require that we give the user premade collections so they can easily implement common usage scenarios. However, we shouldn’t expand those collections for the user after they pass us an unstructured list; instead, the OpenWPM platform code should offer something like this:

    import openwpm.js_instrumentation as jsi

    settings = []
    # get_preset is the proposed helper; it is assumed to return a list of setting dicts
    settings.extend(jsi.get_preset("fingerprinting"))
    # Drop the parts of the preset we are not interested in
    settings = [setting for setting in settings if setting["name"] != "Storage"]
    # These two lines do the same thing
    settings.append({"name": "Clipboard", "propertiesToInstrument": ["read"]})
    settings.append({"name": "Clipboard.read"})

This way the user has full control over the settings and knows exactly what they are getting.
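If the two equivalent Clipboard lines in the example above are both to be accepted, the shorthand has to be expanded somewhere. The following is one possible sketch, not a settled design: `expand_shorthand` is a hypothetical helper, and it assumes a set of known interface names is available, because namespaced names such as cookies.Cookie also contain a dot.

```python
from typing import Any, Dict, Set


def expand_shorthand(setting: Dict[str, Any], known_interfaces: Set[str]) -> Dict[str, Any]:
    """Rewrite {"name": "Interface.property"} into the explicit form (hypothetical helper)."""
    name = setting["name"]
    # Names that already refer to an interface are passed through unchanged.
    if name in known_interfaces or "." not in name:
        return dict(setting)
    # Otherwise treat the last dotted segment as a property of the interface.
    interface, prop = name.rsplit(".", 1)
    if interface not in known_interfaces:
        raise ValueError(f"Unknown interface: {name}")
    expanded = dict(setting)
    expanded["name"] = interface
    # In this sketch the shorthand replaces any existing property list.
    expanded["propertiesToInstrument"] = [prop]
    return expanded


known = {"Clipboard", "cookies.Cookie"}
explicit = {"name": "Clipboard", "propertiesToInstrument": ["read"]}
shorthand = {"name": "Clipboard.read"}
same = expand_shorthand(shorthand, known) == expand_shorthand(explicit, known)
```

The need for a list of known interfaces is a real cost of the shorthand; dropping the shorthand and keeping only the explicit form would avoid it.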

Suspected Limitations

Unlike with the JavaScript instrumentation, I’m uncertain whether the Rust instrumentation will be able to record property access. It seems unlikely that

    const cookie = await browser.cookies.get({name: "myCookieName"})
    console.log(cookie.value)

would cause two accesses into the binding layer.

So an instrumentation like

    [{"name": "cookies.Cookie",
      "propertiesToInstrument": ["name"],
      "logCallStack": true}]

would be semantically valid but would not capture any data.

This needs to be explored during implementation.

2021-01-14

https://zabka.it/bachelor/04-configuration/ Stefan Zabka
