Waging Guerrilla Warfare Against Big Data
Organizations across the globe are amassing huge repositories of data on every living person on Earth. The data is collected by governments, non-governmental organizations, and private companies. These organizations then sell or trade this data to other organizations, which may then trade or sell it again. In addition, this data is often held insecurely, which leads to it being stolen by professional thieves who then sell it to organized crime groups for use in credit card fraud and identity theft.
The Uselessness of Data Privacy Laws
Numerous data privacy laws have been passed across the globe, but they do little more than create a false sense of security. It is almost impossible for an individual to discover that data is being collected and stored. In addition, the worst offenders are government organizations, which tend to believe themselves to be above the law. These data privacy laws are almost always useless in protecting data against people legitimately employed as IT administration staff. Lastly, no data privacy law is capable of forcing organizations to secure data completely against the possibility of theft.
A good rule of thumb is that if someone has your data, everyone has your data.
Corrupting Data is More Effective Than Destroying Data
Imagine for a moment a team of highly skilled hackers penetrating deep within the IRS information infrastructure. If these hackers delete taxpayer data, the IRS will detect the intrusion, secure its network against the hackers, and restore the lost data from backup. The end result will be nothing but a minor inconvenience for the organization, and the IRS will very likely hide the intrusion from the public in order to avoid embarrassment and potential damage to the careers of those responsible for allowing the breach to occur.
Now imagine a different scenario. In this scenario, instead of deleting data, the hackers slowly modify small portions of the data. Because the changes are minor, the hackers are not detected and kicked out of the systems. The hackers modify tax returns and other financial data held by the IRS long enough for the backups to become corrupt as well. Now, even if the hackers are detected and blocked, the IRS does not have a simple and reliable method for determining whether it is using good data or corrupt data. If the hackers make the breach public, every taxpayer who is harassed by the IRS can claim “Your data is bad!” and demand that the IRS prove, in court, that its data is good. The damage to the agency would be so severe that even government workers would have to be terminated, or at least sent on leave with full pay until their retirement dates.
Guerrilla Techniques for Corrupting Data
The previous example requires significant technical skill and commitment. However, a better technique for corrupting data is available to anyone who provides data to any organization. This technique is nothing more complex than providing bad data for the original entry.
As most of us would be terrified to provide bad data to the IRS, let’s consider a softer target for our next example: Facebook. Facebook collects a huge amount of data on its almost one billion subscribers. It then sells this data, without disclosing who the data is sold to or what it will be used for.
Facebook collects two categories of data, both of which can be compromised. The first category is profile data. To create an account, Facebook asks for your name, email address, gender, and date of birth. Once your account is created, Facebook wants to know where you work, where you live, where you were born, who your family members are, your sexual orientation, your relationship status, your religious and political views, your telephone numbers, screen names, and home address. This is a lot of data, and it is valuable for all sorts of purposes that may not be in your best interest. For example, date of birth and hometown are often used by companies to verify identity. Anyone who buys or steals this data from Facebook will have a much easier path to stealing your identity. If you believe that Facebook security is perfect and can never be compromised, do you also believe the same of every organization to which Facebook sells your data?
There is no need to give Facebook most of this data, and there is an alternative to not using Facebook or leaving the fields blank. That alternative is to fill out the fields incorrectly. Pick a new “online” birthday for yourself. That will make it more difficult for organizations to combine records for you in different databases. Use a nickname instead of your legal name. Instead of registering as Adam Kowalski, register as “Buzz Kowalski”. Enter your parents’ hometown instead of your own. Register using a free Yahoo Mail address instead of your usual personal email address. Decide which fields Facebook deserves to know and which ones it deserves to know correctly.
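To see why a separate “online” birthday and nickname make records harder to combine, here is a minimal sketch of naive record linkage. It assumes, purely for illustration, that two organizations match records on an exact name and date-of-birth key. The record layout, the dates, and the function names are all hypothetical; real data brokers may use more fields and fuzzier matching, which this sketch does not model.

```python
from collections import defaultdict


def link_key(record):
    """Build a naive linkage key from the normalized name and date of birth."""
    return (record["name"].strip().lower(), record["date_of_birth"])


def link_records(*databases):
    """Group record sources that share the same linkage key.

    Each argument is a (source_name, list_of_records) pair.
    """
    groups = defaultdict(list)
    for source, records in databases:
        for record in records:
            groups[link_key(record)].append(source)
    return dict(groups)


# Hypothetical records for the same person held by two organizations.
bank = [{"name": "Adam Kowalski", "date_of_birth": "1980-04-12"}]
social_real = [{"name": "adam kowalski ", "date_of_birth": "1980-04-12"}]
social_online = [{"name": "Buzz Kowalski", "date_of_birth": "1983-09-30"}]

# With real details, both sources collapse into one linked profile.
matched = link_records(("bank", bank), ("social", social_real))

# With the "online" persona, the two records stay separate.
unmatched = link_records(("bank", bank), ("social", social_online))

print(matched)
print(unmatched)
```

In the first case both sources land under a single key, so the two databases can be merged into one profile. In the second case each key has only one source, so an exact-match join finds nothing to combine.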
The second category of data Facebook collects is activity data. This includes people you Friend or Follow and pages you Like. Facebook has become even more intrusive recently by asking members whether they know, “outside of Facebook”, the people who send them friend requests. This suggests that personal connection data is important to Facebook, most likely because they are selling it to someone. The guerrilla strategy is to Friend a large number of complete strangers on Facebook. It’s fun to friend people with similar interests, and they may even become real friends!
Facebook is an excellent example due to its massive data collection capabilities, but these techniques are useful for any database that accepts input directly from individuals.
It Only Works If Enough of Us Play
If only a few people corrupt their Facebook data, or their data in any other database, nothing happens. All databases are imperfect; this is expected. However, if enough people input corrupt data, the people who buy and use the data will lose confidence in it. Data suddenly becomes far less valuable when users are less confident of its accuracy.
The Upsides and the Downsides
The benefits of providing synthetic (i.e. fake) data include privacy, freedom, and protection against identity theft. The negatives are that you will receive less relevant advertisements and that you will have to remember (or write down and store securely) more than one set of answers to the usual registration form questions.