The GDPR has given the business world a good shake. This is especially so among software developers and startups, two groups that we -- the GDPR Busters team -- closely identify with. What makes the GDPR especially difficult to digest for software developers is the fact it's so broad and vague, and software developers are all about implementation details. For startups the challenge GDPR presents is that, due to their small size, they already have a ton of things they need to worry about.

Prominent startup/developer forum Hacker News has shown us how shaken these two groups were. Most GDPR articles received hundreds of upvotes and comments. The reactions had a feeling of mass hysteria.

This motivated me to embark on a mission to bring knowledge and peace of mind to the software developer and startup world. I'm a founder and CTO of software company Phusion, and the maintainer of the Passenger app server. I initiated a number of AMAs -- Ask Me Anything discussions -- on a variety of forums. Here are the top questions I received.

"Where do I begin with making sense of GDPR and what to tackle first?"

If you are just starting out, here are the most important concepts to understand or to research:

  • Understand that GDPR's goal is organizational awareness and accountability.

    The GDPR is not about a set of rules to robotically follow, nor is it about specific technologies. The GDPR is about promoting organizational awareness of privacy, which is especially important in modern times with ever-increasing privacy concerns. The legislators want all organizations to care about privacy on a cultural level. They want organizations to think about privacy by default and by design, instead of as an afterthought.

    Legislators want to promote data minimization. It is tempting to collect as much data as possible because it might be useful one day. But legislators want us to think about why we collect and process data. If a data processing activity is not necessary, then we shouldn't do it.

    Conversely, the GDPR is not about prohibiting the processing of personal data. If one has a valid reason to do so, then it is allowed. This is in contrast to some previous data protection laws which required registration before one is allowed to process sensitive data. That system has been replaced by a system of accountability. There is no need to register, but one needs to be able to show that one has done the homework.

  • Understand that GDPR's vagueness is intentional and is about the spirit of the law.

    Those who have read GDPR's legislative text find that the text isn't very specific about what to do. This is intentional. Not only is the GDPR about the spirit of the law, its vagueness is also meant to keep it relevant in the face of an ever-changing technological landscape.

    This concept can be hard to grasp for North Americans. In North America, judges tend to operate according to the letter of the law. In Europe, it is more common for judges to operate according to the spirit of the law. This Hacker News comment provides an excellent description of this matter.

    Fears about maximum fines for violating minor technicalities are overblown. Data protection authorities want to see that one has done their best. Fines are seen as a last resort to force an organization into compliance, rather than as punishment.

  • Compile a data processing register (DPR).

    One should inventory all data processing activities in a document. A fancy word for such a document is a data processing register. It should be considered a living document. Inspectors will expect such a document to be available and up to date.

    For each data processing activity, one must document the following aspects:

    1. Why does one process this data? What purpose does it serve?
    2. What kind of data does one process? E.g. names, addresses, medical records.
    3. What is the lawful basis for processing? "Consent" is only one of multiple possibilities.
    4. What is the retention policy?
    5. What technical and organizational security measures are in place to protect this data?
    6. Which other parties does one share this data with? If this data is shared outside the EU, what safeguarding mechanisms exist?

    Personal data is a broad concept: anything that identifies a natural person. It could be a name or a photo. But it could even be the phrase "that tall guy in the back" if one is in a small room.

    Anything that happens to personal information -- storage, movement, lookup -- is considered "processing" of personal data.

    In order to compile a DPR it is a good idea to review all of one's systems, processes and legal documents.

  • Sign data processing agreements (DPAs).

    If one is sharing data with external parties, then it is important that one signs data processing agreements with them. A DPA is a binding legal contract that specifies how personal data is to be treated and secured.

    This is especially important if one is sharing data with parties outside the EU. Such sharing is permitted, but only if the contract specifies an amount of protection that is at least as strict as what's provided under the GDPR.
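
The data processing register described above can live in a spreadsheet, but it can also be kept as structured data next to one's code. Below is a minimal sketch, assuming Python 3.9+. The class and field names (`ProcessingActivity`, `missing_fields`, and so on) are hypothetical names chosen for illustration; the GDPR does not prescribe any particular format. The fields simply mirror the six questions listed above.

```python
from dataclasses import dataclass, field
from enum import Enum


class LawfulBasis(Enum):
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal obligation"
    VITAL_INTERESTS = "vital interests"
    PUBLIC_TASK = "public task"
    LEGITIMATE_INTERESTS = "legitimate interests"


@dataclass
class ProcessingActivity:
    """One register entry (hypothetical layout, one field per question)."""
    name: str
    purpose: str                    # 1. why is the data processed?
    data_categories: list[str]      # 2. what kind of data?
    lawful_basis: LawfulBasis       # 3. lawful basis for processing
    retention_policy: str           # 4. how long is it kept?
    security_measures: list[str]    # 5. technical/organizational measures
    shared_with: list[str] = field(default_factory=list)  # 6. other parties
    non_eu_safeguards: str = ""     # 6. safeguards if shared outside the EU

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are still empty."""
        required = {
            "purpose": self.purpose,
            "data_categories": self.data_categories,
            "retention_policy": self.retention_policy,
            "security_measures": self.security_measures,
        }
        return [name for name, value in required.items() if not value]


invoicing = ProcessingActivity(
    name="Invoicing",
    purpose="Billing customers and tax reporting",
    data_categories=["name", "address"],
    lawful_basis=LawfulBasis.LEGAL_OBLIGATION,
    retention_policy="",  # not decided yet, so the entry is incomplete
    security_measures=["access limited to finance staff"],
)
print(invoicing.missing_fields())  # ['retention_policy']
```

A check like `missing_fields` can run in CI so that the register stays up to date whenever a new processing activity is added.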

"What are the most valuable online resources for getting actionable advice?"

We've found the following resources to be especially helpful:

Your EU member state's local data protection authority probably has a lot of useful information on their website.

Needless to say, for specific advice regarding one's situation, I recommend contacting a GDPR lawyer. If you're a software developer, don't contact just any lawyer with GDPR knowledge -- you should contact a lawyer that is specialized in IT.

If you have trouble finding such a lawyer (which is understandable if you are outside the EU), then please contact us. We'll be happy to refer you.

"How should we deal with deletion/export requests? We have so many disconnected systems that process personal data! Email, support tickets, Facebook, etc."

The data processing register (which should be the first thing to work on when you begin with GDPR compliance) gives one insight into which systems process which data. That way one can systematically delete or export personal data.

What if one is unable to find the relevant personal data in one of the systems? For example, suppose a user said "my email address is a@example.com, please delete me" -- but the support ticket system only allows one to look up by name. How does one deal with that? Does one need to somehow link the different systems and pieces of personal data together?

Answer: there is no obligation to collect more personal data in order to link currently disconnected systems, with the goal of satisfying data deletion/export requests. Doing so would even go against the spirit of data minimization. So it is fine if in the above example one is unable to delete/export data from the support ticket system.

What one could do is to provide separate deletion/export request portals per logical system. Each portal would only ask for identifying information relevant to that system.

It is also important to know that one is not always obliged to satisfy data deletion/export requests! The "right to be forgotten" is not absolute. It depends on the lawful basis for processing.

Lawful basis           Right to erasure   Right to portability   Right to object
Consent                Yes                Yes                    No (but right to withdraw consent)
Contract               Yes                Yes                    No
Legal obligation       No                 No                     No
Vital interests        Yes                No                     No
Public task            No                 No                     Yes
Legitimate interests   Yes                No                     Yes

For example, invoices are necessary for tax reporting, and so they fall under "legal obligation". One is not required to delete personal data from invoices during the legally required retention period.
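
When requests are handled by software, the table above can be encoded as a simple lookup. This is only a sketch: the names `RIGHTS_BY_BASIS` and `must_honor` are hypothetical, and the values are copied from the table rather than from the legal text, so they should be reviewed by a lawyer before one relies on them.

```python
# Rights per lawful basis, transcribed from the table above.
RIGHTS_BY_BASIS = {
    "consent":              {"erasure": True,  "portability": True,  "object": False},
    "contract":             {"erasure": True,  "portability": True,  "object": False},
    "legal obligation":     {"erasure": False, "portability": False, "object": False},
    "vital interests":      {"erasure": True,  "portability": False, "object": False},
    "public task":          {"erasure": False, "portability": False, "object": True},
    "legitimate interests": {"erasure": True,  "portability": False, "object": True},
}


def must_honor(request: str, lawful_basis: str) -> bool:
    """Return True if the given request type applies under the lawful basis.

    request is one of "erasure", "portability" or "object".
    """
    return RIGHTS_BY_BASIS[lawful_basis.lower()][request]


print(must_honor("erasure", "Consent"))           # True
print(must_honor("erasure", "Legal obligation"))  # False
```

Note that under consent the data subject can still withdraw consent, which this boolean table does not capture.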

"Given that you have no business presence or interest in the EU, what is the worst thing that can happen if you're not compliant?"

According to Handbook GDPR, Compliance in practice (Dutch, p. 11) by IT lawyer Arnoud Engelfriet, the criterion for whether a non-EU company needs to comply with the GDPR is: does the company seriously intend to serve EU citizens? This is determined based on multiple factors, such as the website's language (do they have e.g. German translations?), providing pricing in euros, testimonials from EU customers, or having a contract with a parcel company with the specific intention of delivering to EU customers. The mere fact that an EU citizen can do business with this company is not enough to have it fall under the GDPR. Without knowing the specifics of your hypothetical company, my opinion is that it does not need to be GDPR compliant.

Not all lawyers agree with this interpretation. Having said that, the EU will have a difficult time enforcing the GDPR against, say, a small US-based web shop that has no EU presence and only incidentally gets an EU customer. So I think nothing will happen if they are non-compliant.

What will most definitely not happen is that they fine you €20 million because a single EU visitor came to your website. The authorities prefer to make organizations compliant over punishing them with fines. For example, the UK's Information Commissioner said in a public statement:

It's scaremongering to suggest that we’ll be making early examples of organisations for minor infringements or that maximum fines will become the norm.
The ICO's commitment to guiding, advising and educating organisations about how to comply with the law will not change under the GDPR. We have always preferred the carrot to the stick.
[..] we intend to use those powers proportionately and judiciously.

Another example: the Dutch data protection authority recently found that Windows 10 is in violation of the GDPR. They did not fine Microsoft. Instead they had a good talk with Microsoft, who in turn promised to become compliant. Later on, they published an announcement saying that Microsoft had rectified the issue.

It is a real possibility that, as time goes by, data protection authorities become stricter about enforcement. The next organization may not be let off the hook as easily as Microsoft, because organizations will have had more time to prepare and will have seen more examples of what not to do. But I think small foreign organizations with no EU presence have nothing to worry about.

"Is compliance easier for smaller companies? Are there exemptions?"

Compliance should definitely be easier for smaller companies. The initial research effort into GDPR takes a lot of time, there's no way around that. But smaller companies tend to have fewer processes, tend to collect less data, and tend to have fewer legacy systems, which should make the post-research implementation easier. For example, at Phusion (which is a company consisting of about 10 people) we didn't have to change much.

Exemptions exist, though not in a form that's useful to most organizations. Take for example the requirement to have a DPR. The UK's Information Commissioner says:

Who needs to document their processing activities?
[...]
If you have 250 or more employees, you must document all your processing activities.
There is a limited exemption for small and medium-sized organisations. If you have fewer than 250 employees, you only need to document processing activities that:

  • are not occasional; or
  • could result in a risk to the rights and freedoms of individuals; or
  • involve the processing of special categories of data or criminal conviction and offence data.

However, the Dutch data protection authority does note that processing activities are rarely occasional. If you have a customer list or even an employee salary administration, then that's already non-occasional (i.e. structural) processing.

So in general, most organizations are bound by the exact same rules, though the amount of work is usually proportional to the organization's size.

"How did your own compliance go? What were the biggest challenges and time sinks?"

Phusion's biggest challenge was creating the data processing register. This involved taking inventory of all our systems and processes (both automated and manual) and of all the SaaS products we use, and analyzing what data they process and how they are linked together. This wasn't a challenge in the sense that it was hard, but it was definitely the most boring part.

The next biggest challenge would be learning about the GDPR itself. It wasn't that hard for me personally because I took an IT law class in university. But as a software developer I want to know what the implications are for implementation details, and most of the resources out there don't really help you with that.

Phusion also had a sort of "unfair advantage" when it comes to ease of compliance. We consist mostly of software developers. Being software developers we have a natural attitude that aligns very well with GDPR: we hate surveillance capitalism and spam, we value security, etc. So the amount of training material we had to create was very light. I gave a few presentations. We have a Slack channel on which we regularly share new compliance insights.

Other companies may need more extensive training efforts. For example one company I know made a 10-minute training video that every employee had to watch.

"What techniques or technologies do I need to use to ensure compliance?"

The GDPR does not prescribe specific technical mechanisms. The GDPR is about changing one's default attitude to one of caring about and being serious about data protection. The authorities need to see that one has done one's best and that options have been carefully considered. The legislation gives one a lot of freedom in choosing security mechanisms, but one needs to be able to explain why one chose a particular setup.

For example, do you need to apply disk encryption? That depends: what attack scenarios are you trying to protect against? We at Phusion encrypt some, but not all, of our servers' disks in order to protect ourselves against vulnerabilities in the hosting provider's administration panel. Yes, this actually happened to us before: someone hacked the provider's admin panel and rebooted our server into recovery mode, which did not ask for the root password. But disk encryption results in a higher sysadmin burden and does not protect against attacks on running services, so we chose to only encrypt the disks of our most important servers.

Do you need to encrypt messages sent between servers? Again, it depends: what are you trying to protect against? If the messages go over links that are already secured at the link level, then extra encryption won't buy you anything. But if the data goes over the public Internet, then it is very much recommended to apply encryption there.

These are all arguments that you can document so that when the authorities visit you, you can show them that you've done your homework. Their main job isn't to punish you with fines. Their main job is to make you compliant.

"How do I deal with backups?"

Backups have been a big source of confusion. When one receives a deletion request, does one need to delete personal data from backups? But backups are supposed to be immutable, so deleting data from backups can cause all sorts of problems.

One viable strategy is as follows:

  • Store a separate list of deletion requests. Then, every time one restores a backup, make sure that the data referenced by the deletion request list are deleted from the live systems.
  • Ensure that old backups expire so that personal data aren't retained in old backups indefinitely.
  • Document the backup expiration time in your privacy policy.

Again, the GDPR does not prescribe specific technical mechanisms. But it is important to carefully consider the pros and cons of different strategies. The above strategy is good enough for most organizations, but if one is processing especially sensitive data (e.g. data related to sex, health, religion, crime) then the above strategy may not be good enough. The amount of "paranoia" with which one should design the backup strategy is proportional to its importance.
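
The backup strategy above can be sketched in a few lines. The function names, the record layout (a dict keyed by an internal user ID) and the 90-day retention period are all assumptions made for illustration, not requirements. It is probably wise to keep only a minimal identifier, such as an internal ID, in the deletion request list, so that the list itself holds as little personal data as possible.

```python
from datetime import date, timedelta

# Hypothetical retention period; use whatever your privacy policy documents.
BACKUP_RETENTION = timedelta(days=90)


def apply_deletion_requests(restored_records, deleted_ids):
    """Drop every restored record whose ID is on the deletion request list."""
    deleted = set(deleted_ids)
    return {rid: rec for rid, rec in restored_records.items() if rid not in deleted}


def expired_backups(backup_dates, today):
    """Return the backup dates that are older than the retention period."""
    return [d for d in backup_dates if today - d > BACKUP_RETENTION]


# Example: user 2 asked for deletion after the backup was taken.
restored = {1: "alice", 2: "bob", 3: "carol"}
live = apply_deletion_requests(restored, deleted_ids=[2])
print(sorted(live))  # [1, 3]

old = expired_backups([date(2024, 1, 1), date(2024, 5, 1)], today=date(2024, 6, 1))
print(old)  # [datetime.date(2024, 1, 1)]
```

The first function would run as the last step of every restore procedure; the second would feed a scheduled job that removes expired backups.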

"How do I deal with Git repositories?"

The Git commit log contains personal data, namely names and email addresses. Furthermore, Git is supposed to be immutable. One can rewrite history but that causes all sorts of issues. So how does one deal with Git?

In an organizational setting, the personal data stored in the commit log can be considered to fall under:

  • ...legal obligation. For example, the Dutch tax authority considers "tax administration" to be a broad concept. Anything that shows that work has been done during a specific time can be considered tax administration.
  • ...legitimate interest. After all, this information is necessary for proving copyright.

Open source projects are a special case, because the Git commit log is public and is retained indefinitely, and because often there is no managing organization involved. Steve Winslow from the Linux Foundation recently gave a webinar about this topic. He recommended that open source projects make use of the Developer Certificate of Origin (DCO). The DCO -- which should be referenced from e.g. the CONTRIBUTING file -- makes it clear to contributors that personal data pertaining to the contribution will remain public indefinitely. This way, such data would fall under the lawful basis of "contract", which does not provide data subjects with the right to be forgotten.
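
Projects that adopt the DCO commonly ask contributors to add a "Signed-off-by:" line to each commit message, which `git commit -s` does automatically. The webinar summary above does not go into this, so treat the following as an assumption about how a project might enforce it. The sketch checks a commit message for such a line; the function name `has_signoff` and the exact pattern are hypothetical and deliberately loose.

```python
import re

# Matches a trailer such as: Signed-off-by: Jane Doe <jane@example.com>
SIGNOFF = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>\s]+>$", re.MULTILINE)


def has_signoff(commit_message: str) -> bool:
    """Return True if the commit message contains a sign-off line."""
    return SIGNOFF.search(commit_message) is not None


signed = "Fix typo in README\n\nSigned-off-by: Jane Doe <jane@example.com>"
unsigned = "Fix typo in README"
print(has_signoff(signed))    # True
print(has_signoff(unsigned))  # False
```

A check like this could run in CI against every commit of a pull request, so that contributions without a sign-off are flagged before they are merged.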

Pro tip: make use of anonymization and pseudonymization

One should make use of anonymization and pseudonymization as much as possible in order to protect data.

If one anonymizes data -- that is, scrubs it in such a way that it cannot be traced back to the person it refers to, even when combined with other data -- then under the GDPR it no longer counts as personal data. This means that you can do whatever you want with that kind of data. A typical example is anonymous statistics.


If you scrub data in such a way that it cannot identify a natural person, unless you combine it with other data, then it is pseudonymous. Pseudonymous data is considered safer than the original personal data, but it still counts as personal data. Pseudonymized data should be stored separately from other data which, when combined, can identify a natural person.
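
One common way to pseudonymize an identifier is a keyed hash (HMAC). The GDPR does not prescribe this technique; it is shown here only as a sketch. The secret key plays the role of the "other data": whoever holds it can recompute the link between a person and a pseudonym, so it should be stored separately from the pseudonymized records. The function name `pseudonymize` is hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC), hex-encoded."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# The key must be kept apart from the pseudonymized data,
# e.g. in a separate secrets store with stricter access control.
key = secrets.token_bytes(32)

first = pseudonymize("a@example.com", key)
second = pseudonymize("a@example.com", key)
other = pseudonymize("b@example.com", key)

print(first == second)  # True: the same person always maps to the same pseudonym
print(first == other)   # False: different people get different pseudonyms
```

Because the mapping is stable, pseudonymized records can still be joined and counted per person. For the same reason the result still counts as personal data, as explained above.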

Reach out to us

The GDPR has raised a lot of questions. We are happy to see that many organizations take it so seriously. But understanding and implementing it can be challenging, especially for parties outside the EU.

Phusion's mission is to help organizations understand GDPR better. So if you need help with GDPR, or if you are looking for a European lawyer with both GDPR and IT knowledge, please don't hesitate to reach out to us!