How Machine Learning Will Be a Game Changer for Location Data: Part 2

This series of posts gives a high-level overview of how machine learning can deliver more robust location data products while reducing costs and enhancing privacy.

Introduction

Most location data products serve insights into human mobility and rely on fairly simple technical methods. For example, a common workflow for a product that estimates foot traffic to a retail store or other venue may look like this:


A standard data workflow for estimating foot traffic to a location using GPS data: raw GPS data is filtered, stay points are clustered into dwelling events, and supply correction and extrapolation are applied to derive the estimated visitor count for the store.

More sophisticated products within the industry bring more context, like home and work locations or area demographics, into the metric. However, the flow is always the same: first pre-process the raw data, cluster individual data points into dwelling events, correct for technical problems in the data, and aggregate all dwelling events in an area.
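To make the flow concrete, here is a minimal sketch of such a pipeline in Python (pandas assumed). The column names (device_id, lat, lon, timestamp, accuracy_m) are hypothetical, and a deliberately simplified dwell-time heuristic stands in for a production clustering step:

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

def daily_visits(pings: pd.DataFrame, venue_lat: float, venue_lon: float,
                 radius_m: float = 50, min_dwell: str = "5min") -> pd.Series:
    # 1. Pre-process: keep reasonably accurate GPS fixes only.
    pings = pings[pings["accuracy_m"] < 100]
    # 2. Keep pings within the venue radius.
    dist = haversine_m(pings["lat"], pings["lon"], venue_lat, venue_lon)
    at_venue = pings.loc[dist <= radius_m].copy()
    # 3. Collapse each device's pings per day into one dwelling event and
    #    require a minimum dwell time to filter out pass-bys.
    at_venue["day"] = at_venue["timestamp"].dt.date
    dwell = (at_venue.groupby(["device_id", "day"])["timestamp"]
             .agg(lambda t: t.max() - t.min()))
    stays = dwell[dwell >= pd.Timedelta(min_dwell)]
    # 4. Aggregate: unique visiting devices per day. Supply correction and
    #    extrapolation to a person count would follow from here.
    return stays.groupby(level="day").size()
```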

This approach is simple but effective. It allows for very accurate estimates of foot traffic, especially when someone is interested in patterns over time. The technically sophisticated, and mostly proprietary, part lies in the supply correction, as a naive aggregation would be highly affected by underlying fluctuations in supply.

In the chart below, impacted aggregated data (orange) is corrected with sophisticated supply correction techniques to recover a useful signal (cyan).
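The correction itself can be arbitrarily sophisticated. As a toy illustration only (a made-up normalization scheme, not Unacast's actual method), one could re-scale raw counts by how today's panel size compares to a rolling reference:

```python
import pandas as pd

def supply_corrected(raw_counts: pd.Series, panel_size: pd.Series) -> pd.Series:
    """Re-scale raw daily visit counts for swings in panel size.

    raw_counts: observed visits per day (the orange line).
    panel_size: active devices in the panel per day.
    """
    # Compare today's panel to a 28-day rolling reference, so that supply
    # swings do not masquerade as changes in demand (the cyan line).
    reference = panel_size.rolling(28, min_periods=7).median()
    return raw_counts * reference / panel_size
```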


Location data limitations

Even though the above methodology works, it still comes with significant limitations:

  • Supply is constantly changing, which demands continuous corrections and new product versions.
  • Acquiring and storing all the device-level data over time comes with high costs.
  • The public reputation of working with this data is poor, and, for privacy reasons, the volume of available data is decreasing.

The general setup of buying location data in raw form and re-selling it as some sort of derivative is not viable in the long run, and it will erode the robustness and quality of existing location data products.

A scalable, long-term quality product requires that aggregation already happens on the first-party data owner's side. That significantly reduces issues around data privacy, supply inconsistencies, manipulated data, and storage costs.
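What such first-party-side aggregation could look like, sketched with hypothetical column names: device identifiers never leave the data owner, and only coarse, k-anonymous counts are shared.

```python
import pandas as pd

def aggregate_for_sharing(pings: pd.DataFrame, k_anonymity: int = 5) -> pd.DataFrame:
    """Reduce raw pings to hourly counts per coarse grid cell."""
    agg = (pings
           .assign(hour=pings["timestamp"].dt.floor("h"),
                   # ~1 km grid via rounded coordinates; a real system would
                   # more likely use a spatial index such as H3 or geohash.
                   cell=pings["lat"].round(2).astype(str) + ","
                        + pings["lon"].round(2).astype(str))
           .groupby(["cell", "hour"])["device_id"]
           .nunique()
           .rename("devices")
           .reset_index())
    # Suppress small counts so no individual can be singled out.
    return agg[agg["devices"] >= k_anonymity]
```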

Aggregating data on the first-party side is a win-win for everyone, but it raises new questions: how can we build a product based on already aggregated data? How do we deal with data deduplication, assigning data to locations, or estimating foot traffic to a store? The answer is machine learning!

What is machine learning?

There are various great introductions to the basics of AI and machine learning (like this one), and a simple internet search (or asking ChatGPT or Bard) will provide a better answer than this post can. However, to keep it intuitive and simple:

Machine learning allows an artificial system to learn relationships in data without being explicitly programmed by a human.

It is important to note that the number of input features is not limited to just one. In fact, machine learning usually uses many features to learn robust relationships. The benefits are manifold. For instance, for our problem of aggregated data coming from first-party providers, machine learning allows us to learn relationships between those aggregates and a given target we would like to estimate (e.g., foot traffic to a store).
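As a bare-bones illustration of that definition, here is a minimal sketch in Python (scikit-learn assumed, toy numbers): given nothing but example input-output pairs, the system recovers the relationship on its own.

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]  # example inputs
y = [2, 4, 6, 8]          # observed outputs; the rule y = 2x is never stated

model = LinearRegression().fit(X, y)  # the system learns the relationship itself
print(model.predict([[10]]))          # -> [20.]: it generalizes to unseen inputs
```

In practice, X has many columns, one per feature, which is exactly what lets a model combine first-party aggregates into a single estimate.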

How to use machine learning with location data

There are several ways to use machine learning with location data that suit a range of industries and use cases. To keep things focused, we’ll pick one and use it as an example: estimating foot traffic to a store.

Estimating foot traffic to a store

To make things more intuitive, we will use a case study based on GPS data coming from mobile devices. The aim is to develop a reliable, high-quality product that tells customers how many people visited a specific store on a daily basis. This is a very useful insight for companies interested in a competitor's store performance or in site selection.

The current state-of-the-art methodology

As of today, companies that estimate store traffic based on GPS data either work directly on raw GPS data or aggregate that raw data and correct for supply fluctuations. The current state of the art relies on high data volume, whether the product is a raw device-level feed or an aggregation of the underlying data. With low data volume, however, this methodology is limited.


When the product comes with high enough data volumes, both product methodologies (device-level and aggregation) work, and the major concerns are data privacy, supply fluctuations, cost, and trust in the data supply. However, when the data volume is low, or the store is located in an area with a generally low market share, simple aggregation does not support a product, since it would always end up with “0” counts. Given the general decline in available location data, this is already a problem for the industry.

The better way: Using a machine-learning model

Keeping in mind the conditioning example from before, a machine-learning model simply learns relationships between conditions. Similar to the dog learning that raising a paw leads to a reward, a machine-learning model can learn that when more people are close to a venue, there are most likely also more people inside it.


Machine learning models allow training the relationship between the foot traffic surrounding a venue and the foot traffic inside it. This methodology holds even when the data supply is low in volume.

In other words, the purpose of machine learning is to train a relationship (or model) that describes how foot traffic inside a store changes based on fluctuations in traffic outside the store. For example, imagine that a grand opening on a given Saturday draws twice as many people close to the store as on a regular Saturday. In that case, it is very likely that more people will also make their way into the store.
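Here is a toy version of exactly this scenario, a minimal sketch with made-up numbers (scikit-learn assumed): fit the outside-to-inside relationship on a few observed Saturdays, then query it for the grand opening.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: devices observed near the store vs.
# ground-truth visits inside it, for a handful of Saturdays.
outside = np.array([[400], [520], [610], [450], [700]])
inside = np.array([38, 51, 60, 44, 69])

model = LinearRegression().fit(outside, inside)

regular_saturday = np.array([[500]])
grand_opening = regular_saturday * 2   # twice as many people nearby

print(model.predict(regular_saturday))  # visits on a regular Saturday
print(model.predict(grand_opening))     # a noticeably higher estimate
```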


Of course, the relationship between foot traffic outside the store and foot traffic inside need not be linear. And it is not the only relationship a model can learn. Just think about it: what else affects foot traffic to a store that can be measured? Essentially, any data that relates to store traffic improves the quality of the model. A few datasets that enhance those relationships are precipitation, area population, demographics, day of the week, holidays, and many more.

Machine learning is capable of taking all these different datasets and combining them into a single model that describes how foot traffic inside a store changes based on data describing its surroundings.
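Sketched below with scikit-learn and a handful of invented store-day rows (far too few for a real model), a gradient-boosted regressor can absorb numeric, categorical, and boolean inputs alike and capture non-linear effects such as rain suppressing visits:

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

# One hypothetical row per store-day, mixing the datasets named above.
features = pd.DataFrame({
    "outside_traffic": [400, 520, 610, 450, 700, 300],
    "precipitation_mm": [0.0, 4.2, 0.0, 12.5, 1.1, 0.0],
    "area_population": [12_000, 12_000, 12_000, 12_000, 12_000, 12_000],
    "day_of_week": pd.Categorical(["Sat", "Sat", "Sun", "Mon", "Sat", "Tue"]),
    "is_holiday": [False, False, False, True, False, False],
})
visits_inside = [38, 51, 60, 30, 69, 21]  # ground truth for training

# Tree boosting handles non-linear relationships and, in recent
# scikit-learn versions, categorical dtypes natively.
model = HistGradientBoostingRegressor(categorical_features="from_dtype")
model.fit(features, visits_inside)
```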

Summary

Even small changes in supply volume can have a massive negative impact on an aggregated data product without proper correction. We at Unacast specialize in the supply correction of GPS and telco data and have automated functions that correct data daily.

That said, it is essential to point out that scalable long-term products require that aggregation already happens on the first-party data owner's side before machine learning is applied.

Our machine-learning models train relationships between the foot traffic inside a venue and the foot traffic surrounding it. This methodology holds even when the data supply is low in volume. Our technology is safe and secure.

But even though machine learning offers a lot of opportunities, it cannot solve everything, and it comes with limitations that need to be addressed. We will cover that side of things in Part 3: Nothing is perfect, so what are the Pros and Cons?

Frequently Asked Questions

Discover how analyzing real-world movement patterns can reveal valuable trends in customer behavior, optimize business operations, and enhance strategic decision-making.

What is site selection and why is it important?

Site selection is the strategic process by which businesses identify, evaluate, and choose optimal locations for their operations. This process is paramount as the location of a business directly influences factors such as accessibility, visibility, profitability, and market longevity. For retailers, the right site can mean higher customer footfall and increased sales. In real estate, a well-selected site can promise lucrative returns on investment and tenant stability. Financial service firms leverage site selection to position their branches or ATMs in high-demand areas. Essentially, site selection plays a pivotal role in ensuring the success and growth of a business by aligning its physical presence with market opportunities and demands.

How does location intelligence enhance site selection?

Location intelligence refers to the harnessing of geospatial data to derive actionable insights, which can significantly enhance the site selection process. By analyzing data like consumer demographics, foot traffic patterns, competitor locations, trade area data, and more, businesses can make more informed decisions about where to establish or expand their operations. Location intelligence allows for a deeper understanding of market dynamics, revealing hidden opportunities or potential pitfalls. For instance, retailers can identify gaps in the market, real estate professionals can forecast property value trends, and financial service providers can assess areas with high customer demand. Advanced tools, like those offered by Unacast, further refine these insights by leveraging AI and machine learning, enabling more precise and timely decision-making.

How does Unacast support businesses in the site selection process?

Unacast provides invaluable support to businesses during the site selection process through its advanced location data and analytics software, all powered and refined by Artificial Intelligence and Machine Learning technologies. The company offers a suite of products designed to deliver accurate, actionable, and comprehensive location intelligence. This data proves crucial for businesses looking to understand consumer behavior, analyze traffic patterns, evaluate competitor locations, and much more. With Unacast’s robust tools, businesses in retail, real estate, and financial services can derive insightful information necessary for making strategic, informed site selection decisions. The platform not only provides reliable data but also ensures it is readily actionable for businesses, whether they are looking to open a new store, invest in property, or expand their financial services to new locations.

What types of location data are crucial for informed site selection?

Demographic data offers insights into the age, income, and lifestyle of people in a particular area, helping businesses understand their potential customer base. Foot traffic data provides information on the number of people visiting a location, which is crucial for retailers to estimate the store's potential popularity and for real estate professionals to assess an area's vibrancy and demand. Geographic Information System (GIS) data helps in visualizing and analyzing geographical details, supporting companies in identifying accessible and strategically located sites. Understanding the proximity to competitors, accessibility, and the socio-economic profile of the surrounding areas is also vital. Unacast’s platform aggregates and analyzes these various data types, providing a holistic view that significantly empowers businesses in their site selection endeavors.
