For months, marketing and online-services companies have dreaded the coming of the General Data Protection Regulation (GDPR), pro-privacy rules protecting European citizens that entered into force on May 25. Yet, few understood the impact that the rules would have on another group: security researchers.
Worried about falling afoul of the regulations, a number of domain-name registrars have limited access to the formerly public database listing the contact information for the owners and technical contacts of domains. The Whois database maintained by those registrars is a useful tool for security researchers, often serving as an initial step toward tracking down malicious actors.
Similar services have shut down as well. A blockchain startup that included information about whether a wallet owner had passed a background check shuttered its service. And academic and industry researchers worry that their databases used to track down bad actors could expose them to legal liability.
In fact, the GDPR has garnered polar opposite reactions from developers and security professionals, said Guy-Vincent Jourdan, an associate professor of electrical engineering and computer science at the University of Ottawa, who described the reactions at two conferences he recently attended.
“I was at a web conference, and there, people were uncorking champagne—everyone was celebrating about GDPR, because it was so great and they were excited about it,” he said. “While at the security conference, everyone was crying and saying this was the end of the world.”
The GDPR aims to curtail the unwanted use of data and give consumers more control over their own data. Companies that use Whois data for mass emails, and spammers who use it for fraud schemes, will violate the rules. Publishing identifying data (a category that under the GDPR extends to IP addresses and the public wallet addresses of many blockchain implementations) or other sensitive information likewise violates the regulation.
Security researchers have traditionally found uses for public data in ways that were not intended. If those methods reveal the subject’s identity, the researchers could violate sections of the GDPR. It will take both time and due diligence for security researchers to determine whether their investigative methods are impacted by the regulations.
“It is important for us, as researchers, to think about the data we are gathering and collecting,” said Richard Ford, chief scientist at security software firm Forcepoint. “There are still ways to do it right, but it is just a little bit harder.”
While the GDPR is intended to protect European citizens, because researchers cannot always know whose data they are collecting, the rules will hamper research in general, experts said. Here are five areas of research that are, or could be, impacted by the EU’s General Data Protection Regulation.
1. Non-intended uses of Whois data
When companies and individuals register a domain, their information is placed in a public database known as the Whois database. Large domain name registrars, such as GoDaddy, maintain a server that provides that information to anyone who asks using a web form or a service known as the domain lookup service on port 43.
In May, with the GDPR looming, leading registrar GoDaddy removed the details of all 57 million domains registered through its service, answering so-called "port 43 queries" with only the registrant's organization, state or province, and country. Queries made through its website can still return the full Whois record, unless the registrant's address is in a country protected by the GDPR.
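A "port 43 query" of the kind GoDaddy now truncates is simply the Whois protocol: a plain-text request over TCP port 43. The sketch below shows what such a lookup looks like in Python; the server name is an assumption (Verisign operates the Whois server for .com/.net, and each registrar runs its own), and the response format varies by server.

```python
import socket

def build_query(domain):
    """A Whois request is just the bare domain name terminated by CRLF."""
    return domain.encode("ascii") + b"\r\n"

def whois_query(domain, server="whois.verisign-grs.com", port=43):
    """Send a raw Whois query over TCP port 43 and return the reply text."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(build_query(domain))
        chunks = []
        while True:
            # The server sends its answer and closes the connection.
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Since the GDPR, the text that comes back from many registrars' servers contains far fewer fields than it once did, regardless of how the query is made.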
While the lack of registration information could pose problems for researchers, the information in the database is usually not that useful for identifying bad actors but can be used for detecting patterns in ownership, said Allan Liska, senior solutions architect at Recorded Future.
“Whois has been a very valuable tool for researchers, but [has been] diminishing in value over the past few years,” he said. “Bad guys tend to use fake information, but they tend to reuse that fake information, so it can still make connections and be valuable.”
2. Finding ways to de-anonymize data
Companies have published “anonymized” data for research purposes in the past, only to find that the data actually allows the identification of some of the people whose information was included in the data set. In 2006, for example, the research arm of internet service America Online released a data set that included the search data of 658,000 subscribers. Yet, a variety of sensitive data—such as “can you adopt after a suicide attempt” and queries on incest—as well as location data, and even Social Security numbers, appeared in the data set.
Researchers have repeatedly found ways to de-anonymize such data sets; similar de-anonymizations have involved movie-rental databases, data from social networks, geolocation data and online reading preferences.
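The mechanics of these de-anonymizations are often as simple as a join: an "anonymized" release keeps quasi-identifiers (ZIP code, birth date, sex) that can be matched against a public record carrying names. The toy example below uses entirely hypothetical data to illustrate the linkage.

```python
# Hypothetical "anonymized" release: names removed, quasi-identifiers kept.
anonymized_release = [
    {"zip": "02138", "dob": "1945-07-21", "sex": "F", "diagnosis": "condition A"},
    {"zip": "90210", "dob": "1980-01-02", "sex": "M", "diagnosis": "condition B"},
]

# Hypothetical public register (e.g. a voter roll) that includes names.
public_register = [
    {"name": "Jane Doe", "zip": "02138", "dob": "1945-07-21", "sex": "F"},
]

def reidentify(release, register):
    """Link the two data sets on the quasi-identifier triple (zip, dob, sex)."""
    matches = []
    for anon in release:
        for known in register:
            if all(anon[k] == known[k] for k in ("zip", "dob", "sex")):
                matches.append((known["name"], anon["diagnosis"]))
    return matches
```

When such a join succeeds against real data, the "anonymized" record becomes personal data under the GDPR, which is exactly the exposure researchers now worry about.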
For security researchers working with network telemetry data or information harvested from PCs, the dangers of de-anonymization—and a GDPR violation—are real.
“Most types of telemetry are not impacted, but you have to be careful when you are gathering telemetry to make sure that you are anonymizing the data,” said Forcepoint’s Ford. “If it is data-centric telemetry, GDPR is most likely not an issue. But when you are doing human-centric research, with anomalies in people’s behavior, those data sets become even more difficult to manage under GDPR.”
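One common way to reduce that exposure, sketched below, is to pseudonymize direct identifiers with a keyed hash before telemetry is stored, so records stay linkable for analysis without carrying raw usernames or IP addresses. The key name and field names are assumptions, and note that under the GDPR keyed pseudonymization is still personal data as long as the key holder can link records back to a person.

```python
import hashlib
import hmac

# Assumption: in practice this key would live in a managed secret store.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(value, key=SECRET_KEY):
    """Replace an identifier with a stable keyed hash (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def scrub_event(event):
    """Return a copy of a telemetry event with direct identifiers hashed."""
    scrubbed = dict(event)
    for field in ("user", "src_ip"):  # hypothetical identifier fields
        if field in scrubbed:
            scrubbed[field] = pseudonymize(scrubbed[field])
    return scrubbed
```

Because the hash is stable, an analyst can still count events per (pseudonymous) user or correlate a source across logs, which is often all that data-centric telemetry work requires.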
3. Some blockchain implementations will disappear
Blockchain technologies that allow for information to be harvested from the ledger have already fallen afoul of the GDPR.
In late May, for example, blockchain services firm Parity shut down its Parity ICO Passport Service (PICOPS) a day before the GDPR went into force. The service allowed owners of wallets to pass an ID background check, confirming that they were not from a restricted set of countries or on a watch list. Because the wallet is seen as an identifier, the service had to comply with the GDPR.
“[A]s things stand the solutions we have identified restrict the service to a very limited set of features,” the company said in a statement. “Because of this, the significant resources required to make PICOPS GDPR-compliant, and the fact that PICOPS is not part of our core technology stack, we have decided to discontinue the service despite overwhelming market needs and demand.”
Researchers who cull blockchain data may have to take extraordinary care to avoid de-anonymizing personal information and violating GDPR.
4. Take care in mining social media
Researchers who mine social networks for a variety of information—whether for content, to create a network map or to create a profile of individuals—will have to abide by provisions of the GDPR, which has restrictions on automated profiling.
Researchers will have to be careful with research on “anything that is about mining information from social media to find cliques with the same interests or issues, or simply to determine if there is a flu outbreak somewhere,” said the University of Ottawa’s Jourdan. “The information is going to be less available and considered more private.”
In addition, researchers may have to give notice and obtain consent for any non-anonymous data included in a profile and abide by the subject’s decisions, according to an analysis by the International Association of Privacy Professionals.
5. Hunting may produce protected data
Another security research activity that will likely be impacted by the GDPR is threat hunting. Using network telemetry and other data to find threats in the network, and then investigating those threats to identify the attacker, will often involve protected data under the GDPR.
For threat-intelligence analysts, this is problematic.
“Countless stories have been shared in the industry about how finding just one email address registered to a domain used for C2 [command and control] malware led to more insights about the malware threat and those operating it,” one security firm pointed out.
Overall, threat hunters will have to maintain strong contacts with their companies’ legal teams to vet any actions that could identify EU citizens.
“I hope that security researchers will embrace privacy and find ways to work with it,” Forcepoint’s Ford said. “The security industry will look at how we gather data and practice data minimization.”
Overall, the impact of the GDPR on security research has not yet been fully felt, experts said.
“By default, organizations will close things up until they figure out what they can and cannot do,” said the University of Ottawa’s Jourdan. “For the next few months, everything that has to do with investigating incidents and determining who is behind something will be impacted by GDPR.”