Musings on configurations
Configuration is hard, which is why I don't want to do it.
Back in September 2020 I wrote up some general thoughts on how I want the interface of the component to look. However, I very carefully avoided specifying what the settings object on the consumer side should look like.
Later I realized that I didn't understand our current js_instrument_settings at all and promised to look into it, which I have now done.
The current js_instrument_settings
Our current approach is to take all sorts of things from our user and try to transform them into a meaningful output.
Here is an example of everything we accept, taken from the comment on clean_js_instrumentation_settings:
// Collections
"collection_fingerprinting",
// APIs, with or without settings details
"XMLHttpRequest",
{"XMLHttpRequest": {"excludedProperties": ["send"]}},
// APIs with shortcut to includedProperties
{"Prop1": ["hi"], "Prop2": ["hi2"]},
{"XMLHttpRequest": ["send"]},
"Storage",
// Specific instances on window
{"window.document": ["cookie", "referrer"]},
{"window": ["name", "localStorage", "sessionStorage"]}
We accept a list of these things in combination and try to handle the resulting mess ourselves by expanding and
merging the different inputs to wrangle them into a usable structure.
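To make the expanding step concrete, here is a minimal sketch of what normalising these input shapes involves. It is not the code in clean_js_instrumentation_settings: the function name expand_entry, the collections lookup table and the {"name", "settings"} output shape are all assumptions made for illustration, and the merging of duplicate entries is left out entirely.

```python
from typing import Any, Dict, List


def expand_entry(entry: Any, collections: Dict[str, list]) -> List[dict]:
    """Turn one user-supplied entry into a list of uniform dicts (hypothetical shape)."""
    if isinstance(entry, str):
        if entry.startswith("collection_"):
            # A collection name stands for a whole list of further entries
            expanded: List[dict] = []
            for member in collections[entry]:
                expanded.extend(expand_entry(member, collections))
            return expanded
        # A bare API name means "instrument with default settings"
        return [{"name": entry, "settings": {}}]
    if isinstance(entry, dict):
        expanded = []
        for name, details in entry.items():
            if isinstance(details, list):
                # Shortcut form: a list is read as the properties to instrument
                details = {"propertiesToInstrument": details}
            expanded.append({"name": name, "settings": dict(details)})
        return expanded
    raise TypeError(f"Unsupported settings entry: {entry!r}")
```

Even this reduced version needs several type checks per entry before any merging happens, which hints at why the real code is hard to follow.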
Once we have achieved that, we dump the resulting config as JSON and then destroy the JSON by unquoting a string to save us an eval in the WebExtension.
Update: The OpenWPM maintainer team has since decided that this was a bad idea and I filed #857.
I think both accepting many kinds of input and choosing to break an existing serialization
format are bad decisions as they lead to complicated and subtle code.
(The setting looks like JSON at first glance, but if you try to json.loads it, it will fail with a non-obvious error.)
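The failure mode can be reproduced in a few lines. This is a sketch rather than the actual OpenWPM code path: the choice of the object field as the one that gets unquoted, the reduced set of fields, and the use of str.replace to strip the quotes are assumptions made for illustration.

```python
import json

# One expanded setting, reduced to a few fields for illustration
settings = [
    {
        "object": "window.document",
        "instrumentedName": "window.document",
        "logCallStack": False,
    }
]

valid = json.dumps(settings)
# Strip the quotes around one string value so JavaScript would see an expression
unquoted = valid.replace('"object": "window.document"', '"object": window.document')


def parses_as_json(text: str) -> bool:
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(parses_as_json(valid))
print(parses_as_json(unquoted))
```

The first string parses and the second does not, even though the two differ by only a pair of quotes.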
The expanded settings
Once all the settings have been expanded, there are a lot more properties. They are:
object
instrumentedName
logSettings
propertiesToInstrument
nonExistingPropertiesToInstrument
excludedProperties
logCallStack
logFunctionsAsStrings
logFunctionGets
preventSets
recursive
depth
Their meaning can be looked up here.
The new settings object
Note: The new settings object is the settings object for the instrumentation written in Rust. The limitations discussed here don't imply that OpenWPM should drop these existing capabilities. They just mean that these capabilities should remain implemented in JavaScript.
I think the format of the settings object should be valid JSON.
Since we can't implement nonExistingPropertiesToInstrument, preventSets and logFunctionGets, as those happen in JS and never call out into the binding layer, we shouldn't include them in the new settings.
recursive and depth require further exploration and should be revisited once the implementation exists.
A minimal implementation of the settings could be:
name - The WebIDL fully qualified name
propertiesToInstrument
excludedProperties
logCallStack
logFunctionsAsStrings
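As a sketch of how small such a settings object would be, the following validates one entry against the five fields listed above and checks that it survives a JSON round trip. The value types (lists of strings, booleans), the choice to make only name mandatory, and the function name validate_setting are assumptions; none of this exists in OpenWPM.

```python
import json
from typing import Any, Dict

ALLOWED_KEYS = {
    "name",
    "propertiesToInstrument",
    "excludedProperties",
    "logCallStack",
    "logFunctionsAsStrings",
}


def validate_setting(setting: Dict[str, Any]) -> Dict[str, Any]:
    """Check one settings entry against the minimal field list (assumed types)."""
    unknown = set(setting) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unsupported keys: {sorted(unknown)}")
    if not isinstance(setting.get("name"), str):
        raise ValueError("'name' must be a string")
    for key in ("propertiesToInstrument", "excludedProperties"):
        value = setting.get(key, [])
        if not isinstance(value, list) or not all(isinstance(p, str) for p in value):
            raise ValueError(f"'{key}' must be a list of strings")
    for key in ("logCallStack", "logFunctionsAsStrings"):
        if not isinstance(setting.get(key, False), bool):
            raise ValueError(f"'{key}' must be a boolean")
    # Valid JSON in, identical data out
    if json.loads(json.dumps(setting)) != setting:
        raise ValueError("setting does not round-trip through JSON")
    return setting
```

With a closed key set like this, a setting that still asks for preventSets or logFunctionGets is rejected up front instead of being silently ignored.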
Since MDN uses the fully qualified name for all WebAPIs (see e.g. cookies.CookieStore) users can read up on the interface they are interested in and just copy the name from the title.
This severely limited settings object would require that we give the user premade collections, so they can easily implement common usage scenarios. However, we shouldn't expand those for the user as they pass us an unstructured list, but instead have something like this in the OpenWPM platform code:
import openwpm.js_instrumentation as jsi
settings = []
# assuming get_preset returns a list of settings, extend rather than append
settings.extend(jsi.get_preset("fingerprinting"))
# drop the entries the user doesn't want
settings = [setting for setting in settings if setting["name"] != "Storage"]
# these two lines do the same thing
settings.append({"name": "Clipboard", "propertiesToInstrument": ["read"]})
settings.append({"name": "Clipboard.read"})
This way the user has full control over the settings and knows exactly what they are getting.
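To show the snippet above running end to end, here is a self-contained sketch with a stand-in for the proposed get_preset. The preset contents are made-up placeholders, not an actual OpenWPM collection, and get_preset here is a local function, not an existing API.

```python
import copy
from typing import Dict, List

# Placeholder preset contents, invented for this example
_PRESETS: Dict[str, List[dict]] = {
    "fingerprinting": [
        {"name": "Storage"},
        {"name": "CanvasRenderingContext2D", "logCallStack": True},
    ],
}


def get_preset(preset_name: str) -> List[dict]:
    """Return a fresh copy so callers can edit it without changing the preset."""
    return copy.deepcopy(_PRESETS[preset_name])


settings = get_preset("fingerprinting")
# The user removes what they don't want ...
settings = [s for s in settings if s["name"] != "Storage"]
# ... and adds what the preset lacks
settings.append({"name": "Clipboard", "propertiesToInstrument": ["read"]})

print([s["name"] for s in settings])
```

Because the preset is just a list of plain dicts, ordinary list operations are all the user needs to tailor it.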
Suspected Limitations
Unlike with the JavaScript instrumentation, I'm uncertain whether the Rust instrumentation will be able to record property access. It seems unlikely that
const cookie = await browser.cookies.get({name:"myCookieName"})
console.log(cookie.value)
would cause two accesses into the binding layer.
So an instrumentation like
[{"name": "cookies.Cookie",
"propertiesToInstrument": ["name"],
"logCallStack": true}]
would be semantically valid but would not capture any data.
This needs to be explored during implementation.