Take metadata security seriously

Sending a letter is a frequently used analogy. Content is sealed inside the envelope. The obvious metadata – sender, recipient, postage stamp, and markings added by the post office – is on the outside of the envelope. Other metadata is available without opening the envelope: type, size, colour, scent, and weight of the envelope; the font or handwriting and colour of the ink. People quickly distinguish between greeting cards, expected bills, and junk mail based on this information. When mistakes occur, it is usually because an advertiser has intentionally disguised mail to avoid it being summarily discarded.

The Internet is no different; many layers of metadata exist. But security efforts usually focus on content. Almost everyone understands that it is desirable to protect emails and file attachments. Phil Zimmermann created Pretty Good Privacy (PGP) in 1991. The Internet Engineering Task Force (IETF) published the proposed Privacy Enhanced Mail (PEM) standard in 1993. The Secure/Multipurpose Internet Mail Extensions (S/MIME) standard was published in 1996. While PGP remains popular in the security community, none of these standards has been widely adopted, and none of them protect metadata.

In response to email security concerns, several vendors offer hybrid solutions in which sensitive emails are directed to an HTTPS portal for retrieval by the intended recipient. Other efforts seek to protect email content and some metadata using TLS for email transport.

Instant Messaging (IM) applications suffer from similar security issues. Many use TLS to protect messages, and an increasing number offer end-to-end encryption. However, many people use several different applications to keep in contact with colleagues, friends, and family on non-interoperable systems. Managing risk in this environment is a challenge.

Web sites and web applications generally adopt HTTPS for content security. As discussed last week, HTTPS is a good first step, but several security issues must be addressed before it can be considered a strong security control.

Metadata, on the other hand, has received comparatively little security attention. This is unfortunate because metadata is much more susceptible to automated, large-scale interception and analysis. For example, interpreting the content of every email a single individual sends and receives is complex and resource intensive. Content can be scanned for keywords and phrases, and artificial intelligence applied, but computers are not yet capable of understanding the complexity of human language, emotion, sentiment, and humour.

Oral communication is even more difficult. Telephone calls (including VoIP) can be monitored, captured, and replayed, but as anyone who uses voice recognition software can attest, accuracy is still problematic.

Metadata analysis is far easier to automate. Parsing email headers to extract the sender, recipients, date, time, and subject line is trivial. Mining that data to identify relationships and communication patterns is not difficult. Telephony is also rich in metadata: origin and destination telephone numbers, and the date, time, and length of the call. In addition, mobile phones provide location-based information and VoIP provides IP addresses that can be correlated with other metadata.
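To illustrate how little effort this takes, here is a minimal sketch using only Python's standard library. The two sample messages, the addresses, and the function names `extract_metadata` and `relationship_counts` are made up for illustration; a real analysis would read from a mailbox or a network capture. Note that the message bodies are never examined.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Two made-up messages; addresses and subjects are illustrative only.
RAW_MESSAGES = [
    "From: Alice <alice@example.com>\n"
    "To: Bob <bob@example.com>, carol@example.org\n"
    "Date: Mon, 02 Jun 2014 09:15:00 -0400\n"
    "Subject: Quarterly numbers\n"
    "\n"
    "Body text is never inspected.\n",
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Cc: Dave <dave@example.net>\n"
    "Date: Mon, 02 Jun 2014 21:40:00 -0400\n"
    "Subject: Re: Quarterly numbers\n"
    "\n"
    "Still not inspected.\n",
]


def extract_metadata(raw):
    """Return sender, recipients, date, and subject from one raw message."""
    msg = message_from_string(raw)
    sender = getaddresses(msg.get_all("From", []))[0][1].lower()
    recipients = [
        addr.lower()
        for _, addr in getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    ]
    return {
        "sender": sender,
        "recipients": recipients,
        "date": msg["Date"],
        "subject": msg["Subject"],
    }


def relationship_counts(raw_messages):
    """Count how often each (sender, recipient) pair appears."""
    pairs = Counter()
    for raw in raw_messages:
        meta = extract_metadata(raw)
        for recipient in meta["recipients"]:
            pairs[(meta["sender"], recipient)] += 1
    return pairs


if __name__ == "__main__":
    for pair, count in relationship_counts(RAW_MESSAGES).most_common():
        print(pair, count)
```

Even with two messages, the pair counts begin to show who communicates with whom and how often. Scaled to millions of messages, the same few lines produce a map of relationships.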

Internet use in general creates enormous volumes of metadata. Even if every site a person visits uses HTTPS, DNS lookups reveal the domain names requested, and the details of each HTTPS connection (the site, web server certificate, volume of data transferred, and time spent connected) reveal more about the user. In addition, cookies used by advertising networks allow individuals to be tracked across many sites, even without monitoring at the network layer.
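As a rough sketch of what an observer could do with that connection-level information, the Python below totals visits, bytes, and connected time per site. The `Flow` record, its field names, and the sample values are hypothetical; they stand in for whatever a network monitor would actually log, and nothing here requires decrypting the traffic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Flow:
    """One observed HTTPS connection (hypothetical record layout)."""
    server_name: str   # hostname learned from the DNS lookup or certificate
    bytes_total: int   # volume of data transferred
    seconds: float     # time spent connected


# Made-up observations for a single user.
FLOWS = [
    Flow("clinic.example.org", 120_000, 95.0),
    Flow("news.example.com", 2_400_000, 610.0),
    Flow("clinic.example.org", 80_000, 40.0),
    Flow("bank.example.net", 300_000, 180.0),
]


def profile_by_site(flows):
    """Total visits, bytes, and connected time for each site."""
    profile = defaultdict(lambda: {"visits": 0, "bytes": 0, "seconds": 0.0})
    for flow in flows:
        entry = profile[flow.server_name]
        entry["visits"] += 1
        entry["bytes"] += flow.bytes_total
        entry["seconds"] += flow.seconds
    return dict(profile)


if __name__ == "__main__":
    for site, totals in profile_by_site(FLOWS).items():
        print(site, totals)
```

Repeated visits to a clinic or a bank say something about a person even though every byte of content stayed encrypted.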

In response to security concerns, some IM systems have eliminated sending messages through centralized servers by adopting a peer-to-peer architecture. While this design may help protect content, it may also increase the amount of metadata observable at the network layer.

Virtual Private Networks (VPNs) and anonymization networks, such as Tor, protect some types of metadata. However, traffic analysis techniques can still derive some information by monitoring packet sizes and timing. For example, VoIP, IM, and HTTP over a VPN have different observable characteristics. Metrics as simple as data volumes versus time, date, or day of week provide insight into operational hours. Applied to an individual, the same metrics can reveal time zone and sleep-wake cycles.
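The "data volumes versus time" metric can be sketched in a few lines of Python. The timestamps and packet sizes below are made up, and the function names `bytes_per_hour` and `active_window` are illustrative. The sketch assumes the observer sees only encrypted packets with a UTC timestamp and a size, and it does not handle activity that wraps past midnight.

```python
from collections import Counter
from datetime import datetime, timezone

# Made-up (UTC timestamp, packet size in bytes) observations of encrypted traffic.
OBSERVATIONS = [
    ("2014-06-02T13:05:00", 1200),
    ("2014-06-02T14:30:00", 900),
    ("2014-06-02T18:45:00", 1500),
    ("2014-06-02T21:10:00", 700),
    ("2014-06-03T13:20:00", 1100),
    ("2014-06-03T20:55:00", 1300),
]


def bytes_per_hour(observations):
    """Sum observed bytes for each hour of the day (0-23, UTC)."""
    totals = Counter()
    for stamp, size in observations:
        hour = datetime.fromisoformat(stamp).replace(tzinfo=timezone.utc).hour
        totals[hour] += size
    return totals


def active_window(totals):
    """Return the earliest and latest hour of day with any traffic."""
    hours = sorted(hour for hour, size in totals.items() if size > 0)
    return hours[0], hours[-1]


if __name__ == "__main__":
    totals = bytes_per_hour(OBSERVATIONS)
    print(active_window(totals))
```

For this sample the active window runs from hour 13 to hour 21 UTC, which would be consistent with a 09:00 to 17:00 working day in a UTC-4 time zone, although other explanations are possible. No content was decrypted to reach that guess.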

Metadata has the potential to reveal sensitive information and deserves the same legal protections as content. The Office of the Privacy Commissioner of Canada observed in a 2014 research paper, “We continue to see notable individuals and various organizations taking the view that metadata is to be distinguished from actual communications content, and is therefore less worthy of privacy protection.” Among the “various organizations” is the Harper Government, which has adopted the absurd position that metadata collection by government agencies is legal even though collecting the content of the same communications would violate criminal and constitutional law.

In the absence of appropriate legal protection, the only practical approach to communication security is to develop and implement better technical security controls. Developers must strive to minimize metadata exposure and focus on protecting all aspects of communications. One shining example is Open Whisper Systems, the developer of open-source secure IM and VoIP clients for Android and iOS. Not only do their apps seek to minimize metadata exposure, but they openly discuss issues and limitations. While still in development, draft specifications by the Dark Mail Alliance demonstrate the desire to significantly reduce email metadata exposure as well as protect content.

It is imperative that security and IT professionals effectively address communication security issues. When assessing risks and selecting appropriate controls, all aspects of the communication must be considered. As metadata and content are part of every communication, it is time to take metadata security seriously.

Have a security question you’d like answered in a future column? Email
