Surrendering Users' Data at Gunpoint
Approximately a year ago, large companies in Iran began sharing their users’ information with the government.
As a result, people began boycotting these companies. Engineers working at them were criticized for handing over the data and for continuing to work there. In response, they took to social media seeking sympathy and asking: “What would you have done if army intelligence simply walked in and demanded all your data?” That is a fair question. I have to admit that, in the same situation, I would likely have done the same.
However, I was not in that situation. And that is the point.
The engineers at these companies have built an impressive body of knowledge about their users. So much so that some of them have assembled teams of AI and machine learning experts, supplying their marketing teams with predictive information and trained models to increase market share by playing with prices and outperforming their competitors. The amount of information they have gathered far exceeds what the app needs to function.
So why do these engineers continue to store data they don’t actually need? I can think of three possible reasons:
- They lack the creativity to find solutions that gather only the information the need justifies.
- They are unwilling to risk their jobs.
- They do not understand the risks of storing this data.
Let me offer an example of how things can be better:
Imagine that Tim is tasked with cleaning up inactive users from the database and moving their data into an archiving database.
How can Tim recognize the inactive users?
Tim comes up with a solution: keep track of user login dates in a giant append-only list per user, to which the date and time of every login is appended. From this data, Tim can identify users who have not logged in for over six months.
This solution does more than just the job. It also builds a body of knowledge about user habits, their internet access, and the ratio of actions per login. Combined with data from other sources, this becomes a rich resource for tracking every single step any user takes.
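To see how little it takes to turn such a log into a behavioral profile, here is a hypothetical sketch in Python. The timestamps and the `login_hour_profile` helper are invented for illustration; they are not taken from any real system.

```python
from collections import Counter
from datetime import datetime

# Invented sample data: the raw append-only login log for one user.
logins = [
    datetime(2023, 3, 1, 7, 55),
    datetime(2023, 3, 2, 8, 3),
    datetime(2023, 3, 2, 23, 40),
    datetime(2023, 3, 3, 8, 1),
]

def login_hour_profile(timestamps):
    # Count logins per hour of day; the peaks sketch out commute,
    # work, and sleep habits without any extra data sources.
    return Counter(ts.hour for ts in timestamps)

print(login_hour_profile(logins))  # Counter({8: 2, 7: 1, 23: 1})
```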
Instead of the append-only list, Tim could have implemented a simpler solution using a single boolean per user. At the start of the six-month window the value is set to `false`, and once a user logs in, the database updates it to `true`. After six months, Tim can easily generate a list of users whose flag is still `false`, i.e. users who have not logged in.
This approach is more cost-effective, less susceptible to abuse, and still achieves the same result as the more complex solution.
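A minimal sketch of that boolean-flag approach, assuming a hypothetical `users` table in SQLite; the schema and column names are my own illustration, not the original system:

```python
import sqlite3

# Hypothetical schema: a single flag column instead of a login log.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "  id INTEGER PRIMARY KEY,"
    "  name TEXT,"
    "  logged_in_this_period INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

def start_period(conn):
    # At the start of each six-month window, reset every flag to false.
    conn.execute("UPDATE users SET logged_in_this_period = 0")

def on_login(conn, user_id):
    # On login, flip the single flag; no timestamps are recorded.
    conn.execute(
        "UPDATE users SET logged_in_this_period = 1 WHERE id = ?", (user_id,)
    )

def inactive_users(conn):
    # At the end of the window, anyone whose flag is still false is inactive.
    rows = conn.execute("SELECT name FROM users WHERE logged_in_this_period = 0")
    return [name for (name,) in rows]

start_period(conn)
on_login(conn, 1)            # alice logs in at some point during the period
print(inactive_users(conn))  # ['bob'] -- a candidate for archiving
```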
Storing plaintext passwords is undeniably a mistake: it keeps more information than anyone with good intentions needs. To address this, passwords are hashed instead. The purpose is not to know the actual password but to confirm, with reasonable certainty, that a user has provided the same password before.
A salted hash achieves exactly this, ensuring the stored information is sufficient for verifying authentication and nothing more.
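As a rough sketch of that idea, using nothing beyond Python's standard library: only a salt and a derived key are stored, never the password itself. The scrypt parameters here are illustrative assumptions; a production system would likely reach for a dedicated library such as argon2 or bcrypt.

```python
import hashlib
import os
import secrets

def hash_password(password: str) -> tuple[bytes, bytes]:
    # Store only the salt and the derived key; the password itself is discarded.
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    # Re-derive the key with the stored salt and compare in constant time.
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return secrets.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))                   # False
```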
Just as storing plaintext passwords is a mistake, any other information in our databases can be misused to harm users.
So if you’re more than just digital mercenaries, solely focused on the next paycheck without giving a shit about the consequences for others, I implore you to adhere to some professional principles. Don’t blindly agree to every request from the marketing team. Instead, ask yourself whether you would be comfortable with your own data sitting in these databases when they are breached or fall into the hands of an authoritarian regime, or even your marketing team.