should you sell your personal data?

May 11, 2023

Sometime in late 2022, I came across a startup called Vana. The website's landing page explains the general idea behind the company: users put their personal data in their “Vana Vault”, at which point they can sell or rent the contents to whoever they choose in the Vana Marketplace. The buyer might be a data scientist or simply the highest bidder.

Vana's website prior to the AI shift

In early 2023, the marketing shifted from selling data to using your data for personalized AI models.

You might notice that they prefer to use the term “VNA” as opposed to “your data” now. This follows a common tactic of organizations to create a new term when they don't want to use words with negative connotations. If you're a used-car salesman, maybe you say “pre-owned” instead of “used”.

Regardless of their change in direction, I want to take a look at the language chosen by businesses similar to pre-AI Vana. That is, businesses that provide their users a way to sell their personal data (things like Amazon purchase history, Netflix watch history, etc.). I'll say up front that while I don't believe Vana or similar services are necessarily malicious, how they present themselves to prospective users is revealing.

pre-AI Vana FAQ

If you haven't come across anything like Vana before, you might be surprised to find out that the idea is relatively common. Datacoup was founded in 2012, Streamlytics in 2019, and Wibson in 2018. Interestingly, Wibson's domain and social media accounts are down, and their LinkedIn now directs to “illow”, a data deletion service. Quite the pivot!

Taking a look at Datacoup first, a common theme is established pretty quickly.

The service (which shut down in 2019) presents itself as a way for consumers to reclaim their data from “Internet behemoths”. They appeal to people's feelings that they are being exploited and propose the Datacoup data marketplace as a solution.

This sentiment of taking control of your data is echoed throughout their landing page. Their choice of the word “Rather” in the above sentence is a bit interesting (“Rather than [large companies making money off of your data], monetize your data yourself”). I'll expand later on why I think this is misleading.

Streamlytics is a little more creative in their approach. Their main domain focuses on attracting businesses that want to buy training data from users who have opted in to having their data sold. They emphasize that this is a more ethical way to buy data (as opposed to non-consensual data brokers), and that all data will go through anonymization.

On the consumer side, “Clture” seems to be the primary way for users to upload their data to the Streamlytics platform. The general idea is the same as before; people upload their data from various platforms, and then get paid for it. Similar to Datacoup, the core messaging of Clture is centered around the idea of taking control of your data.

Clture is generally targeted towards data from entertainment/streaming services like Netflix. According to their FAQ, you sign a data license which states that the user still owns the data. Unfortunately, I was unable to look at the agreement myself since the app is down. The website for their proprietary “Universal Data Interchange Format” (what they use to anonymize the data, and apparently also to do things with NFTs) also returns a 404/page not found! As you might expect, is down as well. The use of a proprietary method of data anonymization is concerning to me from a security standpoint, especially since there is so little publicly available information about it and who was involved in creating it.

Unable to connect to the actual Clture app / sign up page. Web archives did not work either.

This is a bit of a tangent, but if you're wondering how Streamlytics's Universal Data Interchange Format has anything to do with NFTs, here's a screenshot from their Discord server. I'll leave it at that since the topic of NFTs is outside of the scope here.

What's more concerning than the core app being down is the privacy policy and terms of use returning “Page not found” :(

Fortunately, prior copies can be found using the web archive. Although I'm not a lawyer, there are a couple of things that stand out to me as odd.

  • Section 12 “How can users opt out?”: After you click that you no longer authorize Streamlytics to sell your information, Streamlytics says that they will “respond to the request within 45 days of receipt” and “notify you of its decision”.
  • Section 13 “Data security”: For the type of service Streamlytics is, I would expect details on data security. There is a single sentence in this section, essentially stating that they have “reasonable [security]”.

From a non-technical standpoint, Section 12 is most concerning to me. According to Clture's FAQ, the data license agreement states that the user still owns the data. If Streamlytics holds the power to deny an opt-out request, the user does not actually have control.

This exemplifies the core of my concern with services like Clture. I don't believe any additional control over personal data is being given to the user. They should be seen for what they are: a quick way to get a few extra dollars a month.

Earlier, when Datacoup said “Rather than [companies selling your data], monetize [your data]”, it really meant “Companies are selling your data, and you can sell it to more of them for scraps if you want”.

If you know that a thief will steal your watch tomorrow, and you offer it up today in return for a dollar, the core problem has not been solved. In fact, you have shown the thief a way to steal from you without even needing to mug you in the first place.

We should not be treating the personal data of individuals as a tradable commodity. New companies should not be created to feed into a system rife with abuse. Instead, we should focus on creating a direct connection between what the user decides to give to a service they use and what the service gets. An abstraction or intermediary only serves to distance the user from their personal data, not to regain control of it.

