How machine learning helps in privacy with data aggregation

Unacast

Published on: June 28, 2023

Last updated: October 31, 2023

Introduction

The location data industry is growing fast but is still in its technical infancy. Most products based on location data are technically simple and can be seen as implemented descriptive statistics (e.g., the average number of devices seen inside a venue or a given area). In the worst case, those products are the raw location data themselves. Machine learning can bring a lot of value to the industry by saving costs, increasing product quality, and enhancing privacy.

However, to make innovation happen, engineers and data scientists have to convince stakeholders and decision-makers of the short- and long-term benefits of such a methodological change. This is especially challenging when decision-makers are not familiar with artificial intelligence or machine learning in detail. Understandably, from a manager's perspective, it is hard to buy into a significant product change without understanding the subject matter.

Therefore, this series of posts aims to provide a high-level overview of how machine learning can provide more robust location data products while reducing costs and enhancing privacy.

The location data industry and privacy

The location data industry is a large, growing, and fragmented business area offering products that can provide unique insights for their customers. Products based on location data allow companies to analyze, for instance, how many people go to a competitor's store, where their customers are coming from, or how many people moved from one area to another. However, working with location data is far from trivial and comes with one massive problem: privacy!

Among the technical and data-related issues that need to be addressed when working with location data, individual privacy is the most important and, in the long run, probably the most challenging for the industry. It does not matter whether the location data in question is GPS data coming from mobile phones, telco data, or satellite imagery: the whole point of location data is to reveal a location. Simple products (raw data or aggregated) do not rule out the possibility of reverse engineering and, thus, of violating someone's privacy.

Even “privacy-friendly” data transformations, such as hashing the unique identifier, obfuscating the latitude and longitude, and aggregating data, do not make reverse engineering impossible. In addition, even if a third-party company aggregates that location data in a perfectly privacy-friendly way, the individually identifiable data has already been sent digitally to that company.
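
To illustrate why hashing and coordinate obfuscation alone fall short, here is a minimal Python sketch. The pings, the device names, and the helper functions (`pseudonymize`, `coarsen`, `likely_home_cell`) are hypothetical and made up for this example; they are not taken from any Unacast pipeline. The sketch assumes SHA-256 hashing and rounding to two decimals (roughly 1 km) as stand-ins for "hashing the identifier" and "obfuscating the latitude and longitude".

```python
import hashlib
from collections import Counter


def pseudonymize(device_id):
    """Replace a device identifier with its SHA-256 hex digest."""
    return hashlib.sha256(device_id.encode("utf-8")).hexdigest()


def coarsen(lat, lon, decimals=2):
    """Obfuscate a coordinate by rounding (2 decimals is roughly 1 km)."""
    return (round(lat, decimals), round(lon, decimals))


# Hypothetical raw pings: (device_id, hour_of_day, latitude, longitude)
pings = [
    ("device-A", 2, 59.91391, 10.75221),
    ("device-A", 3, 59.91402, 10.75198),
    ("device-A", 14, 59.92710, 10.71650),
    ("device-B", 2, 59.94880, 10.76800),
]

# Apply both "privacy-friendly" transformations to every ping.
transformed = [
    (pseudonymize(device), hour, coarsen(lat, lon))
    for device, hour, lat, lon in pings
]


def likely_home_cell(records, pseudonym):
    """Return the most frequent night-time cell for one pseudonym."""
    night_cells = Counter(
        cell for p, hour, cell in records if p == pseudonym and hour < 6
    )
    return night_cells.most_common(1)[0][0]


# The hash is deterministic, so all pings of one device remain linkable,
# and the coarsened night-time cell still points at a probable home area.
target = pseudonymize("device-A")
home = likely_home_cell(transformed, target)
print(home)
```

Because the same identifier always maps to the same hash, the pseudonym behaves like a stable ID: a device's movement pattern can still be reassembled and, combined with outside knowledge, potentially tied back to a person.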

At that point, the sensitive data is no longer controlled by the first-party data owner or the individual. As if this were not problematic enough, the whole process of building products this way is costly and not very robust due to underlying supply problems.

Unacast believes that the future of the location data industry lies in a combination of two things:

The early aggregation of data on the first-party side in a non-identifiable format; and
The use of machine learning on top of these aggregates to create human mobility insights.
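
As a rough illustration of the first point, the sketch below shows what early aggregation on the first-party side could look like. It is an assumption-laden example, not Unacast's method: the `aggregate` function, the grid built by rounding coordinates, and the `MIN_DEVICES` suppression threshold are all hypothetical choices made for this sketch.

```python
from collections import defaultdict

# Hypothetical threshold: buckets with fewer distinct devices are dropped.
MIN_DEVICES = 3


def aggregate(pings, decimals=2, min_devices=MIN_DEVICES):
    """Count distinct devices per (cell, hour) and drop sparse buckets.

    The result contains no device identifiers, only counts.
    """
    devices_per_bucket = defaultdict(set)
    for device_id, hour, lat, lon in pings:
        cell = (round(lat, decimals), round(lon, decimals))
        devices_per_bucket[(cell, hour)].add(device_id)
    return {
        bucket: len(ids)
        for bucket, ids in devices_per_bucket.items()
        if len(ids) >= min_devices
    }


# Hypothetical raw pings held by the first party:
# (device_id, hour_of_day, latitude, longitude)
raw_pings = [
    ("a", 9, 59.9139, 10.7522),
    ("b", 9, 59.9141, 10.7519),
    ("c", 9, 59.9102, 10.7538),
    ("a", 9, 59.9140, 10.7520),  # repeat ping, same device and bucket
    ("d", 10, 59.9270, 10.7165),  # lone device, bucket gets suppressed
]

# Only this aggregate would leave the first party's systems.
shared = aggregate(raw_pings)
print(shared)
```

In this setup only device counts per cell and hour are shared, and those counts would then serve as the input to the machine learning models. Suppressing small buckets lowers the re-identification risk but does not eliminate it, so the threshold and grid size would need careful tuning in practice.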

Summary

The location data industry is growing rapidly but is still in its early stages. Most products based on location data are simple, not robust, and lacking in privacy. Methods based on machine learning have the potential to bring additional value to this industry by reducing costs, increasing product quality, and enhancing privacy.

We at Unacast believe that the future of the location data industry lies in combining early data aggregation in a non-identifiable format with machine learning techniques applied on top of these aggregates, in order to create high-quality human mobility insights products.