I have no idea how many accounts I’ve signed up for. As a result, bits of my data are scattered all over the web. I understand why forums and shops want to know who I am, or at least connect some information to a pseudonym. This also helps me: I can set up email notifications, save my shipping address, and maintain a recognizable identity in online discussions. These are all commendable goals, without which life online could become anything from tedious to impossible. Nevertheless, I can’t help but feel wary about having usernames, passwords, email addresses, shipping addresses, preferences, age, etc. sprinkled all over.
Image by Katie
One way to deal with this is to use fake identities, but this approach brings multiple problems with it. Using a single fake identity makes that identity important to you. Losing control of it would not be as bad as having your real identity stolen, but it would still hurt. You could use a different fake identity for every site, but this requires a lot of work. It’s also worth mentioning that using fake names is often a violation of a site’s terms of service. Perhaps the biggest issue with the fake account approach is that you can’t build an online reputation that way, at least not a reputation you can use on your resume.
Every site that thinks about setting up registration needs to ask some basic questions before taking the leap:
I know what you’re thinking. Hasn’t signing in via Facebook, Twitter, Google+, etc. already solved this issue? Apps that choose this authentication strategy do not need to store your credentials. This is a great first step, but I’m talking about something else entirely.
There is no rule that says that a website must store user account information in the website’s own database. Barring some exceptions, some of which I attempt to tackle a bit later, account information is usually only needed when the user is actually on the site. This means that in the majority of cases, the users themselves could supply their account information when logging into a service.
There would be multiple benefits:
The user could choose to use a cloud provider to store the account data, or use a local disk if so inclined. Cloud storage would be preferred, as it would keep accounts accessible regardless of which device or location we access a service from.
A merchant typically trusts their own database: the owner of the software has full control over what goes into it. This is why merchants can trust the account data they save today. Data coming from the outside needs to be verified somehow, and verification becomes key if the account information resides outside the merchant’s service.
Thankfully cryptographers have already given us great tools to do just this. A basic verifiable object could be something like this:
With this verification process, the user may store the account data instead of the merchant, and yet the merchant can be sure of the integrity of the data. In fact, one might argue that data integrity is better than when the account is stored in the merchant’s database: it’s more likely that someone breaks into the merchant’s database and meddles with the data than that someone forges the digital signature on the account information. That said, breaking into the database would most likely also expose the merchant’s signing key.
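The verification idea can be sketched in a few lines. This is a minimal illustration rather than a vetted design: it uses an HMAC from Python’s standard library as a stand-in for a real digital signature, which is workable here only because the merchant both issues and verifies the data. The names `MERCHANT_KEY`, `issue_account`, and `verify_account` are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical secret known only to the merchant. A real deployment would
# more likely use an asymmetric signature scheme and proper key management.
MERCHANT_KEY = b"merchant-secret-key"


def _canonical(account: dict) -> bytes:
    # Serialize deterministically so the same data always yields the same tag.
    return json.dumps(account, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_account(account: dict) -> dict:
    """Return the blob the user stores: the account data plus a merchant tag."""
    tag = hmac.new(MERCHANT_KEY, _canonical(account), hashlib.sha256).hexdigest()
    return {"account": account, "tag": tag}


def verify_account(blob: dict) -> bool:
    """Check that the account data the user hands back was not altered."""
    expected = hmac.new(
        MERCHANT_KEY, _canonical(blob["account"]), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking how much of the tag matched.
    return hmac.compare_digest(expected, blob["tag"])


# The user stores `blob` and presents it at login.
blob = issue_account({"user_id": "u-1234", "shipping_address": "1 Example Street"})

# A blob whose data was changed after issuing keeps the old tag and fails the check.
tampered = {
    "account": {**blob["account"], "shipping_address": "Elsewhere"},
    "tag": blob["tag"],
}
```

Calling `verify_account(blob)` accepts the untouched data, while `verify_account(tampered)` rejects the modified copy. Note that the key in this sketch is exactly the kind of secret that a database break-in could expose.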
There are, of course, cases where the account information is required outside typical usage. For example, any background process that operates on the user’s information requires some data about the user.
Let’s approach this through an example. Imagine a typical web-based storefront. An account for such a site usually contains:
A typical use case for such a site is: a user opens the website, browses the selection and decides to order something. After the order has been processed, the user will want to track its progress. After receiving the product, the user will come back and give a review of the item.
In order for the above to work, the only customer-specific account information the store website needs to keep is a number of transaction records and a way to link the review text to the user’s username (or whatever else is shown beside reviews). If the user identifier is not a globally known name (an email address, for example), this data cannot be traced back to the user without the account information, which only the user holds.
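To make this concrete, here is a rough sketch of what such a store might persist. Everything in it is an assumption made for illustration: the `Store` class, its method names, and the use of a random hex string as the opaque user identifier.

```python
import secrets
from dataclasses import dataclass, field


def new_user_id() -> str:
    # Random identifier with no link to an email address or a real name.
    return secrets.token_hex(16)


@dataclass
class Store:
    """Everything the store persists about customers, keyed by opaque ID."""

    orders: dict = field(default_factory=dict)   # user_id -> list of order records
    reviews: dict = field(default_factory=dict)  # user_id -> list of review records

    def record_order(self, user_id: str, item: str) -> None:
        self.orders.setdefault(user_id, []).append(
            {"item": item, "status": "processing"}
        )

    def record_review(self, user_id: str, display_name: str, item: str, text: str) -> None:
        # The display name is whatever the user chooses to show beside reviews.
        self.reviews.setdefault(user_id, []).append(
            {"display_name": display_name, "item": item, "text": text}
        )


store = Store()
uid = new_user_id()
store.record_order(uid, "teapot")
store.record_review(uid, "tea_fan", "teapot", "Pours well.")
```

Nothing in `store` contains a shipping address, an email address, or a password; those would stay in the account data the user presents at login.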
The moral of the story: the fewer background processes a service runs, the less of the user’s data the service needs to persist. Background processes can also be designed so that only parts of the account information are needed for the processing, and so that the data is stored only for the duration of the background task.
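One way to picture a background task that holds only part of the account, and only while it runs, is a small context manager. This is a sketch under assumptions: `task_data` and `print_shipping_label` are made-up names, and clearing an in-memory dictionary only hints at what real cleanup of persisted data would involve.

```python
from contextlib import contextmanager


@contextmanager
def task_data(account: dict, needed_fields: tuple):
    """Expose only the fields a background task needs, then discard them."""
    subset = {name: account[name] for name in needed_fields}
    try:
        yield subset
    finally:
        # Drop the copy once the task finishes, whether or not it succeeded.
        subset.clear()


def print_shipping_label(account: dict) -> str:
    # This task needs the shipping address and nothing else.
    with task_data(account, ("shipping_address",)) as data:
        label = f"Ship to: {data['shipping_address']}"
    return label


account = {
    "user_id": "u-1234",
    "shipping_address": "1 Example Street",
    "email": "user@example.com",
}
label = print_shipping_label(account)
```

The task never sees the email address, and its copy of the shipping address is gone as soon as the `with` block exits.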
In a system like the one described above, surprisingly little user data needs to be available at all times! A full break-in to this system would reveal very little of value to an attacker.
This article was written as a way to figure out whether this oddball idea could actually work. I’m not fully convinced that it makes sense on a grander scale, but I am truly surprised at how well the idea has stood up to my preliminary tests. Practical implementations would naturally require extensive browser and server support.