Should you sell your personal data?

May 11, 2023

A while ago, sometime in late 2022, I came across a startup called Vana. The website's landing page explains the general idea behind the company: users put their personal data in their “Vana Vault”, after which they can sell or rent the contents to whoever they choose in the Vana Marketplace. The buyer might be a data scientist or simply the highest bidder.

Vana's website prior to the AI shift

In early 2023, the marketing shifted from selling data to using your data for personalized AI models.

You might notice that they now prefer the term “VNA” over “your data”. This is a common tactic: organizations coin a new term when they don't want to use words with negative connotations. If you're a used-car salesman, maybe you say “pre-owned” instead of “used”. Or if you're the Russian government, you might say “special military operation” instead of “war with Ukraine” (until recently, that is).

Regardless of their change in direction, I want to take a look at Vana's pre-AI-bandwagon choice of language, as well as that of some similar businesses: businesses that give their users a way to sell their personal data (things like Amazon purchase history, Netflix watch history, etc.).

If you haven't come across anything like Vana before, you might be surprised to find out that the idea is relatively common. Datacoup was founded in 2012, Wibson in 2018, and Streamlytics in 2019. Interestingly, Wibson's domain and social media accounts are down, and their LinkedIn now redirects to “illow”, a data deletion service. Quite the pivot!

Taking a look at Datacoup first, a common theme is established pretty quickly.

The service (which shut down in 2019) presents itself as a way for consumers to reclaim their data from “Internet behemoths”. It appeals to people's sense that they are being exploited and proposes the Datacoup data marketplace as the solution.

This sentiment of taking control of your data is echoed throughout their landing page. Of particular note is their choice of the word “Rather” in the tagline above (“Rather than [large companies making money off of your data], monetize your data yourself”). I'll explain later why I think this is misleading.

Streamlytics is a little more creative in its approach. Its main domain focuses on attracting businesses that want to buy training data from users who have opted in to having their data sold. They emphasize that this is a more ethical way to buy data (as opposed to going through non-consensual data brokers), and that all data goes through anonymization.

On the consumer side, “Clture” seems to be the primary way for users to upload their data to the Streamlytics platform. The general idea is the same as before: people upload their data from various platforms and then get paid for it. Similar to Datacoup, the core messaging of Clture is centered around the idea of taking control of your data.

Clture is generally targeted towards data from entertainment and streaming services like Netflix. According to their FAQ, you sign a data license which states that you still own the data. Unfortunately, I was unable to look at the agreement myself since the app is down. The website for their proprietary “Universal Data Interchange Format” (what they use to anonymize the data, and apparently also to do things with NFTs) also returns a 404/page not found! As you might expect, is down as well. The use of a proprietary method of data anonymization concerns me from a security standpoint, especially since there is so little publicly available information about it or about who was involved in creating it.

Unable to connect to the actual Clture app / sign-up page. Web archives did not work either.

This is a bit of a tangent, but if you're wondering what Streamlytics's Universal Data Interchange Format has to do with NFTs, here's a screenshot from their Discord server. I'll leave it at that, since the topic of NFTs is outside the scope of this post.

What's more concerning than the core app being down is that the privacy policy and terms of use both return “Page not found” :(

Fortunately, prior copies can be found using the web archive. I'm not a lawyer, but a couple of things stand out to me as odd.

  • Section 12 “How can users opt out?”: After you indicate that you no longer authorize Streamlytics to sell your information, Streamlytics says that they will “respond to the request within 45 days of receipt” and “notify you of its decision”.
  • Section 13 “Data security”: For the type of service Streamlytics is, I would expect details on how data is secured. Instead, the section contains a single sentence essentially stating that they have “reasonable [security]”.

From a non-technical standpoint, Section 12 concerns me the most. According to Clture's FAQ, the data license agreement states that the user still owns the data. But if Streamlytics holds the power to deny an opt-out request, the user does not actually have control.

This exemplifies the core of my concern with services like Clture. I don't believe they give the user any additional control over their personal data. They should be seen for what they are: a quick way to get a few extra dollars a month.

Every day you walk down an alley to your favourite coffee shop. The alleyway might have some weird graffiti now and then, but for the most part it gets the job done and you quite enjoy it. In fact, the alley is pretty great, so more and more shops start to open along its walls. One day you notice cameras positioned along your walk; the next day you notice some kind of sensors.

After you confront the man who makes decisions for the alley, it turns out that he has been selling all of the pictures and sensor data to the shops so they can understand their customers better. This (probably) makes you a little uncomfortable.

A few days later another man walks up to you.
“Did you hear about the whole spying thing? It's crazy how much money the alley makes off OUR photos while we get nothing– don't you agree?”
“Yeah,” you reply.
“Hey– y'know, if you go ask the alley man for a copy of your photos and sensor info, I could sell it to some other people if you want. I'll give you a cut– let's show this alley who's in control, you with me? Don't worry, though, I'll scribble out your face first.”

Now, obviously you're not actually in control of your photos. The coffee shop still knows that you'll be more receptive to an extra-large latte if you walked over a bit more slowly that morning. Google still knows which ads you'll be most likely to engage with. When Datacoup says “Rather than [companies selling your data], monetize [your data]”, it really means “Companies are selling your data, and you can sell it to even more of them for scraps if you want”.

If you know that a thief will steal your watch tomorrow, and so you hand it over today in return for a quarter, the core problem has not been solved. In fact, you have shown the thief a way to take from you without even needing to mug you in the first place.

We should not be treating the personal data of individuals as a tradable commodity. New companies should not be created to feed into a system rife with abuse. Instead, we should focus on creating a direct connection between what users decide to give to a service they use and what that service gets. An abstraction or intermediary only serves to distance users from their personal data, not to help them regain control of it.
