I am creating an open source project in which I am adding metrics to a desktop application using Application Insights. I want to keep privacy at the forefront of the data collection, so I am taking pains not to collect any more data than absolutely necessary and, in general, no personally identifiable information at all. I seem to have largely succeeded in scrubbing the data that is sent to the server. Here is an example data upload:
{"ver":1,"name":"Microsoft.ApplicationInsights.guid.Event","time":"2020-04-25T03:31:15.464+0200","sampleRate":100.0,"iKey":"guid","tags":{"ai.internal.nodeName":"aeb804e4-c649-4a9c-bd57-905c7e81abf3","ai.session.id":"aeb804e4-c649-4a9c-bd57-905c7e81abf3","ai.session.isNew":"true"},"data":{"baseType":"EventData","baseData":{"ver":2,"name":"application.startupMode","properties":{"mode":"help"}}}}
Other than the instrumentation key, which I scrubbed, this is the full data I'm sending. As you can see, it contains nothing about the user at all, and the session ID resets on each run of the program. However, geolocation appears to be happening anyway: I can see more detail in Application Insights than I want, down to the city. I do not have enough users for this to be anonymous, so each city (and probably even some entire countries) likely corresponds to a single user, and there is not enough overlap among users to keep individuals from being identified.
I have scrubbed the geolocation data from this image.
Therefore, I would like to prevent this data from being logged at all, or at least make it inaccessible to me. Is this possible? I would even be OK with faking the data, though I would prefer not to set up a proxy server or anything that complicated.
It is possible, by explicitly setting city/state/country yourself. If any of those are set in the incoming events, the GeoIP lookup based on the IP address is not done.
See the Bond specification; the relevant parts are here:
[Description("The IP address of the client device. IPv4 and IPv6 are supported. Information in the location context fields is always about the end user. When telemetry is sent from a service, the location context is about the user that initiated the operation in the service.")]
[MaxStringLength("46")]
200: string LocationIp = "ai.location.ip";
[Description("The country of the client device. If any of Country, Province, or City is specified, those values will be preferred over geolocation of the IP address field. Information in the location context fields is always about the end user. When telemetry is sent from a service, the location context is about the user that initiated the operation in the service.")]
[MaxStringLength("256")]
201: string LocationCountry = "ai.location.country";
[Description("The province/state of the client device. If any of Country, Province, or City is specified, those values will be preferred over geolocation of the IP address field. Information in the location context fields is always about the end user. When telemetry is sent from a service, the location context is about the user that initiated the operation in the service.")]
[MaxStringLength("256")]
202: string LocationProvince = "ai.location.province";
[Description("The city of the client device. If any of Country, Province, or City is specified, those values will be preferred over geolocation of the IP address field. Information in the location context fields is always about the end user. When telemetry is sent from a service, the location context is about the user that initiated the operation in the service.")]
[MaxStringLength("256")]
203: string LocationCity = "ai.location.city";
I'm not 100% sure how to set them in the Java SDK, but I know that the backend supports it (as I was the one who added it a long, long time ago).
Hypothetically, you could just set context.country to the string "Unknown" or something similar, so that the rest of those fields are not generated.
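As a rough sketch of the idea above, the following adds placeholder location tags to the tag map of an outgoing event. The tag keys come from the Bond specification quoted earlier; the class name, the helper `withPlaceholderLocation`, and the placeholder value are made up for illustration. It does not use the Application Insights SDK at all, so how you get hold of the tag map in the Java SDK (for example, whether the telemetry context exposes a mutable tags map) is an assumption you would need to check against the SDK version you use. Setting a single field should be enough according to the specification; setting all three is just a precaution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LocationOverride {
    // Tag keys as listed in the Bond specification quoted above.
    static final String LOCATION_COUNTRY = "ai.location.country";
    static final String LOCATION_PROVINCE = "ai.location.province";
    static final String LOCATION_CITY = "ai.location.city";

    // Hypothetical helper (not part of any SDK): returns a copy of the
    // given tags with all three location fields set to the placeholder.
    // Per the specification, explicit values are preferred over
    // geolocation of the client IP address.
    static Map<String, String> withPlaceholderLocation(Map<String, String> tags,
                                                       String placeholder) {
        Map<String, String> result = new LinkedHashMap<>(tags);
        result.put(LOCATION_COUNTRY, placeholder);
        result.put(LOCATION_PROVINCE, placeholder);
        result.put(LOCATION_CITY, placeholder);
        return result;
    }

    public static void main(String[] args) {
        // Tags from the example upload in the question.
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("ai.session.id", "aeb804e4-c649-4a9c-bd57-905c7e81abf3");
        tags.put("ai.session.isNew", "true");

        // Print the tags as they would look with the override applied.
        System.out.println(withPlaceholderLocation(tags, "Unknown"));
    }
}
```

If the SDK lets you modify the tags of the shared telemetry context once at startup, applying the same three keys there should cover every event the application sends. It is worth confirming in the Application Insights portal that the city and country columns then show the placeholder rather than a looked-up value.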