Plausible Analytics Events API - prevent manipulation

According to the Plausible Analytics docs, you can send POST requests to the /api/event endpoint to record pageviews. I am self-hosting Plausible and was surprised that I could simply send a few POST requests with dummy data to the endpoint using Postman, and they were recorded as actual pageviews.
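For illustration, here is roughly what such a request looks like, sketched in Python with only the standard library. The host is a hypothetical placeholder. The documented path is `/api/event`, and the body fields (`name`, `url`, `domain`) and the `User-Agent` / `X-Forwarded-For` headers are the ones the Plausible Events API documents, but check the docs for your version. The request is only built here, not sent.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical self-hosted instance; substitute your own Plausible host.
PLAUSIBLE_HOST = "https://plausible.example.com"


def build_pageview_request(domain: str, url: str, user_agent: str,
                           client_ip: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a pageview event for Plausible's Events API."""
    payload = {"name": "pageview", "url": url, "domain": domain}
    headers = {
        "Content-Type": "application/json",
        # Required by the Events API; Plausible uses it to tell visitors apart.
        "User-Agent": user_agent,
    }
    if client_ip is not None:
        # Optional: the client IP Plausible should attribute the event to.
        headers["X-Forwarded-For"] = client_ip
    return urllib.request.Request(
        f"{PLAUSIBLE_HOST}/api/event",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_pageview_request(
        domain="example.com",
        url="https://example.com/a-page-nobody-visited",
        user_agent="Mozilla/5.0 (dummy)",
        client_ip="203.0.113.7",
    )
    print(req.get_method(), req.full_url)  # POST https://plausible.example.com/api/event
    # urllib.request.urlopen(req) would send it. Nothing in the request proves
    # the sender ever loaded the page, which is why dummy data gets counted.
```

Nothing in the payload is secret, so anyone who can read the tracking snippet can reproduce it.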

I checked another site (using the cloud version), and it seemed I could manipulate the data there as well. Is this normal, or did I set it up wrong? How is one supposed to prevent manipulation of the analytics data, or is this simply how the technology works?

Solution

It's not about Plausible. It's about almost any front-end-based analytics tracking system. Matomo, Adobe Analytics, Google Analytics: they're all critically vulnerable in this respect, not to mention the army of third-party services that track conversions to optimize traffic segmentation.
However:
2.1 Nobody cares enough to bother spoiling others' data. Or at least, so few people do that it isn't worth worrying about; it happens rarely.
2.2 It is pretty difficult to spoil the data in a reliable way. You'd have to study the tracking patterns, get proxies, set up distributed event flooding, and plausibly randomize every dimension that is normally set organically. It is difficult.
2.3 Even if you're good enough to spoil the data, good analysts and data scientists will be able to at least detect the attack, if not clean the garbage out of the data.
2.4 An attack like this would cost more than setting up pretty good tracking, so from a business perspective it's too expensive to spoil all your competitors' data.
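Point 2.3 is easy to illustrate for the crude case. The sketch below flags hours whose pageview count spikes far above the rest, using a simple leave-one-out z-score. The technique, the threshold of 3 and the sample data are assumptions for illustration, not something any of these analytics tools ships; a careful attacker who does everything listed in 2.2 would slip past it.

```python
from statistics import mean, stdev
from typing import List


def flag_spikes(hourly_counts: List[int], threshold: float = 3.0) -> List[int]:
    """Return the indices of hours whose count sits more than `threshold`
    standard deviations above the mean of all *other* hours.

    Leaving the current hour out keeps a single huge spike from inflating
    the baseline it is compared against.
    """
    flagged = []
    for i, count in enumerate(hourly_counts):
        others = hourly_counts[:i] + hourly_counts[i + 1:]
        if len(others) < 2:
            continue  # stdev needs at least two points
        mu, sigma = mean(others), stdev(others)
        if sigma == 0:
            # Perfectly flat baseline: any increase at all is suspicious.
            if count > mu:
                flagged.append(i)
        elif (count - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical hourly pageviews with one injected flood at index 6.
    counts = [100, 95, 110, 105, 98, 102, 2000, 99]
    print(flag_spikes(counts))  # [6]
```

Real traffic has daily and weekly cycles, so in practice you would compare each hour against the same hour on previous days rather than against a flat window.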
Finally, yes, you can make it more secure, but it's currently expensive. The idea is to use a server-side tag management system (TMS). Adobe Launch (now called Tags), Matomo, Tealium and GTM all offer server-side options. Not only does this let you hide your analytics endpoint, it also lets you bypass adblockers, which typically block anywhere from 5 to 75% of all tracking, depending on the audience.

Going server-side, however, requires the tracking implementation specialist to know not only bits of JS and the DOM, but also server-side development, as well as to have some API skills. And since server-side TMSes don't let you execute arbitrary code on the server, you have to be ready to write your own back-end code.

Alternatively, you can skip the server-side TMS and use the analytics system's measurement protocol instead, sending events directly from your server endpoint to the tracking endpoint. TMSes do provide value, but in this setup a server-side TMS becomes little more than a pretty, well-documented router.

Your tracking scheme now looks like so:

event happened >> 
you generate a data object >> 
you encrypt it >> 
you send it to your generic endpoint >> 
the endpoint decrypts it >> 
it checks the validity of the event >> 
it builds a proper payload to send to the actual server-side TMS endpoint OR to the analytics system, using its measurement protocol >>
Done.

See how much more complex your tracking becomes?
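A minimal sketch of the verify-and-validate part of that pipeline, written in Python for both halves even though the signing half would really be JavaScript in the page. Plainly swapped in: instead of encryption it uses an HMAC-SHA256 signature from the standard library, which is the part that actually lets the endpoint reject altered payloads. The secret, the allowed event names, the `ts` field and the 5-minute window are all hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical values. A secret that ships to the browser can only be
# obfuscated, never truly hidden.
SECRET = b"replace-me"
ALLOWED_EVENTS = {"pageview", "signup"}
SITE_DOMAIN = "example.com"
MAX_AGE_SECONDS = 300


def sign_event(event: dict, secret: bytes = SECRET) -> dict:
    """Client side: serialise the event and attach an HMAC-SHA256 tag."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    sig = hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def verify_and_build(envelope: dict, now: Optional[float] = None,
                     secret: bytes = SECRET) -> Optional[dict]:
    """Endpoint side: check the tag, validate the event, and build the
    payload for Plausible's /api/event. Returns None to drop the event."""
    body = envelope.get("body", "")
    sig = envelope.get("sig", "")
    if not isinstance(body, str) or not isinstance(sig, str):
        return None
    expected = hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode("utf-8"), sig.encode("utf-8")):
        return None  # forged or modified in transit
    event = json.loads(body)
    if not isinstance(event, dict):
        return None
    ts = event.get("ts")
    current = time.time() if now is None else now
    if not isinstance(ts, (int, float)) or abs(current - ts) > MAX_AGE_SECONDS:
        return None  # missing or stale timestamp
    if event.get("name") not in ALLOWED_EVENTS:
        return None  # not an event this site ever sends
    url = event.get("url")
    if not isinstance(url, str) or not url.startswith(f"https://{SITE_DOMAIN}/"):
        return None  # not a page on this site
    # The domain is fixed server-side, so the client cannot choose it.
    return {"name": event["name"], "url": url, "domain": SITE_DOMAIN}


if __name__ == "__main__":
    envelope = sign_event({"name": "pageview", "url": "https://example.com/pricing",
                           "ts": time.time()})
    print(verify_and_build(envelope))
    # {'name': 'pageview', 'url': 'https://example.com/pricing', 'domain': 'example.com'}
```

The endpoint would then POST the returned dict to Plausible's /api/event, passing along the visitor's original User-Agent and IP (via X-Forwarded-For); otherwise every event would appear to come from your server.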

You will still expose your backend endpoint, if not the tracking endpoint itself. It is still possible to abuse it, but doing so now requires digging through a potentially large amount of obfuscated JS looking for the encryption logic. So you would now want to obfuscate your encryption code as much as your imagination allows: using eval and base64, or perhaps the Function constructor to conceal the eval. Or you could use some code on the backend to finish the encryption.
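One possible reading of "use some code on the backend" is to have the server that renders the page issue a short-lived signed token for that page, so the signing key never reaches the browser. The sketch below assumes that design; the key, the 10-minute lifetime and the `issued.tag` token format are hypothetical. It raises the cost of faking events, since a forger now needs a fresh token per page, but it still doesn't make manipulation impossible: a scraper can fetch pages to collect tokens.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical key that stays on the server and is never sent to the browser.
SERVER_SECRET = b"never-sent-to-the-browser"
TOKEN_TTL_SECONDS = 600


def _tag(page_url: str, issued: int) -> str:
    msg = f"{page_url}|{issued}".encode("utf-8")
    return hmac.new(SERVER_SECRET, msg, hashlib.sha256).hexdigest()


def issue_token(page_url: str, now: Optional[float] = None) -> str:
    """Called while rendering the page; the token is embedded in the HTML."""
    issued = int(time.time() if now is None else now)
    return f"{issued}.{_tag(page_url, issued)}"


def check_token(page_url: str, token: str, now: Optional[float] = None) -> bool:
    """Called by the tracking endpoint before it forwards an event."""
    try:
        issued_str, tag = token.split(".", 1)
        issued = int(issued_str)
    except ValueError:
        return False  # malformed token
    current = time.time() if now is None else now
    if not 0 <= current - issued <= TOKEN_TTL_SECONDS:
        return False  # expired, or issued "in the future"
    expected = _tag(page_url, issued)
    return hmac.compare_digest(expected.encode("utf-8"), tag.encode("utf-8"))


if __name__ == "__main__":
    token = issue_token("https://example.com/pricing")
    print(check_token("https://example.com/pricing", token))  # True
    print(check_token("https://example.com/other", token))    # False
```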

Again, it's not worth it. I've never seen anyone care enough about this kind of attack to go through all this trouble, however fun it may seem.