How accurately can Unacast measure traffic volumes in comparison to real physical measurements? In the study below, we explore how Turbine, Unacast's platform-as-a-service, compares to reference data (a.k.a. ground truth data), and examine the strengths and potential use cases of the output aggregates.
A drop in the ocean seems insignificant, but an ocean is made of a multitude of drops. Likewise, a single geolocation point in time and space may seem useless, but when it is one amongst a million comparable data points, an ocean of possibility becomes available. All that is left for us to do is to explore that ocean and map its wonders.
With data sourced from telecommunication companies, the potential is large indeed. We can routinely expect data covering a stable demographic sample at all times and locations covered by the network. In the case of data sourced from BT, this coverage encompasses the entirety of the UK, gathering data from approximately 32% of active mobile users.
During the course of a day, each device on the network will generate pings that individually allow us to locate it in space and time. These can be enriched with demographic and customer information, and are immediately anonymised via a tokenisation system to an irreversible identifier. By the time of final output, identifiers are aggregated such that no single customer can be traced and their individual behaviour cannot be determined, even through complex grouping of the data. This allows customer data to remain totally private, whilst larger trends can be disentangled. Trends and aggregates will be used in turn to inform transport enhancements, improving customer journeys.
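The tokenise-then-aggregate flow described above can be sketched as follows. This is a minimal illustration, not Turbine's actual pipeline: the keyed hash (HMAC-SHA256), the `SECRET_KEY` value, the `MIN_GROUP_SIZE` suppression threshold, and the location labels are all assumptions made for the example.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret key; in practice it would be stored separately from the data.
SECRET_KEY = b"example-secret-key"
# Hypothetical suppression threshold: cells with fewer devices are not reported.
MIN_GROUP_SIZE = 3


def tokenise(device_id: str) -> str:
    """Replace a raw device ID with a keyed one-way hash (hex string)."""
    return hmac.new(SECRET_KEY, device_id.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate(pings):
    """Count distinct tokens per (location, hour) cell and drop small cells.

    `pings` is an iterable of (device_id, location, hour) tuples.
    """
    tokens_per_cell = defaultdict(set)
    for device_id, location, hour in pings:
        tokens_per_cell[(location, hour)].add(tokenise(device_id))
    return {
        cell: len(tokens)
        for cell, tokens in tokens_per_cell.items()
        if len(tokens) >= MIN_GROUP_SIZE
    }


pings = [
    ("dev-a", "M1-J10", 8), ("dev-b", "M1-J10", 8), ("dev-c", "M1-J10", 8),
    ("dev-a", "M1-J10", 8),  # repeat ping from the same device, counted once
    ("dev-d", "A40-W", 3),   # lone device: this cell is suppressed
]
counts = aggregate(pings)
```

Only the aggregated `counts` would leave such a pipeline; the cell containing a single device is dropped so that it cannot be singled out.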
The resultant aggregated data is functionally informative for insights into footfall at a specific location, assuming that we see identifiers 100% of the time. In reality this is never the case, as identifiers will appear with significant frequency during times of high usage, then all but disappear in times of low usage. The real trick then is to be able to fill in the gaps in the paths of signals to understand movement between pings (for example, as in Figure 1). As such, our task is not to simply classify pings as a chain of point-objects, but rather to understand and describe movement that has led to pings appearing at specific points in space and time. This higher order contextualisation of the data is non-trivial but infinitely more useful for location insights, not to mention incredibly fun to disentangle.
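As a simple illustration of filling the gaps between pings, the sketch below estimates a position at an arbitrary time by straight-line interpolation between the two surrounding pings. This is an assumption made for the example, not the method Turbine uses; a realistic approach would also need to account for the road or rail network rather than assume straight-line travel. The `Ping` class and its planar coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ping:
    t: float  # seconds since the start of the day
    x: float  # hypothetical planar coordinate in metres
    y: float  # hypothetical planar coordinate in metres


def position_at(pings, t):
    """Linearly interpolate an (x, y) position at time t.

    `pings` must be sorted by time. Returns None when t lies outside
    the observed time range, since nothing can be inferred there.
    """
    for before, after in zip(pings, pings[1:]):
        if before.t <= t <= after.t:
            span = after.t - before.t
            if span == 0:
                return (before.x, before.y)
            w = (t - before.t) / span  # fraction of the gap already elapsed
            return (
                before.x + w * (after.x - before.x),
                before.y + w * (after.y - before.y),
            )
    return None


track = [Ping(0, 0.0, 0.0), Ping(600, 3000.0, 0.0), Ping(900, 3000.0, 1500.0)]
midpoint = position_at(track, 300)
```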
An example of this application would be in describing an average daily commute traced by millions of identifiers. We may see pings in the vicinity of residences; then near radio silence as people walk or drive to a station, with only a few pings appearing on average; then many pings as train commuters check their emails. In isolation, these pings may inform us about likely residential locations, then that identifiers cluster at stations, and then again identify non-residential areas. With true insights and enrichment of data, Turbine Engine can transform this into fully contextualised journey aggregates: the exact volumes of cars on roads taken to a station (and whether there was more traffic than normal); the number of times specific trains were taken; whether the trains were at full capacity or only half full; whether more identifiers walked from the station to work areas or instead took taxis. These are all extremely valuable insights that would be invisible without the contextualisation that Turbine provides.
The real test becomes the accuracy of such methods. At a point, algorithms that assume human behaviour must be tested, lest they drift from reality and no longer properly contextualise a journey. As one example of this, road data processed by Turbine was compared against physical vehicle counts to measure exactly how accurately Turbine is describing road behaviour.
Comparison to Reference Data
Highways England maintain a system of sensors across all UK motorways, run and monitored by Clearview Intelligence. These constantly measure traffic volume and are used to identify accidents and to adjust lane speeds under the UK 'smart motorway' scheme. The sensors count cars in small portions of road using inductive loops, and can provide not only the number of cars but also their speeds and directions. They give an incredibly accurate view of the volume of cars seen per day on major UK roads. Turbine, on the other hand, does not need to restrict counts to specific road segments (Figure 2). Instead, any road appearing on our maps can have counts calculated. By taking a sub-sample of Turbine data at positions that match the Clearview Intelligence sensors, we can compare our data to the actual numbers of vehicles on a given day.
Extracting the volumes for a given timestamp shows immediately promising results. Figure 3 shows a sample of monitored positions from Clearview Intelligence, with triangle orientation roughly aligning to the direction of the sensor. The reference counts show the expected pattern, in which traffic volume at 3am is extremely low whilst the same measurements at 5pm show high amounts of traffic. Comparing this to Turbine, we see remarkably similar values overall. Whilst there are local differences, the general traffic trends are well modelled.
This is better demonstrated statistically by examining the coefficient of determination (or r²) between modelled and reference values.
Definition: r². The Coefficient of Determination, hereafter r², indicates the proportion of the variation in one variable that is explained by a linear regression on the other (in our case, the reference sample and the Turbine sample). This is essentially a measure of how well our modelled data matches the real data. An r² closer to 1 suggests the two samples are very similar and we have correctly modelled the data. An r² nearer to 0 suggests a poor match between the reference and Turbine datasets.
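For a simple linear regression between two samples, the coefficient of determination equals the squared Pearson correlation. A minimal sketch of that calculation is given below; the function name and the example counts are illustrative and are not taken from the study.

```python
def r_squared(reference, modelled):
    """Squared Pearson correlation between two equal-length samples."""
    n = len(reference)
    if n != len(modelled) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_ref = sum(reference) / n
    mean_mod = sum(modelled) / n
    # Sums of cross-products and squared deviations about the means.
    cov = sum((r - mean_ref) * (m - mean_mod) for r, m in zip(reference, modelled))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_mod = sum((m - mean_mod) ** 2 for m in modelled)
    if var_ref == 0 or var_mod == 0:
        raise ValueError("r squared is undefined when a sample is constant")
    return (cov * cov) / (var_ref * var_mod)


# Illustrative values only: modelled counts that are an exact linear
# function of the reference counts give a value of 1.
perfect = r_squared([100, 200, 300, 400], [205, 405, 605, 805])
poor = r_squared([1, 2, 3, 4], [1, -1, 1, -1])
```

Note that this quantity measures the strength of the linear relationship, so a model that is consistently offset from the reference by a constant factor can still score highly. That is why it is read alongside the 1:1 line in the scatter plot rather than instead of it.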
We explore how closely our modelled data matches the reference data in Figure 4. Here we see three plots that give an indication of the quality of the Turbine road modelling. The upper left panel shows the Turbine traffic volumes in all positions and for all times, plotted directly against the same volumes in the reference data. Perfect modelling would give a 1:1 match between these values, as indicated by the white line. Colour in this plot indicates the counts within the 2D histogram bins, with the majority of road counts for both reference and Turbine data showing fewer than 1000 cars recorded for any given time bucket or position. It should be noted that the colour scale is logarithmic, and thus low volume counts occur by far the most frequently.
Remarkably, the Turbine counts are highly similar to the reference counts, with the values clustering around the 1:1 line. The correlation falls off slightly above volumes of 3000, though this represents less than 1.3% of the data points. This suggests Turbine slightly overestimates counts for roughly the busiest 1% of measurements, most likely as a result of incorrect scaling of passengers per car at certain times or days.
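The article does not describe how device counts are scaled to vehicle counts, so the following is only a sketch of why an occupancy assumption matters. It assumes one device per person, uses the 32% share of active mobile users quoted earlier, and uses placeholder occupancy values that are hypothetical rather than measured.

```python
# Share of active mobile users quoted for BT earlier in this article.
MARKET_SHARE = 0.32

# Hypothetical average number of occupants per car; placeholder values only.
OCCUPANCY = {"weekday": 1.2, "weekend": 1.8}


def estimate_vehicles(devices_seen, day_type, market_share=MARKET_SHARE):
    """Scale observed devices up to people, then down to vehicles.

    Assumes one device per person and a uniform market share.
    """
    people = devices_seen / market_share
    return people / OCCUPANCY[day_type]


weekday_vehicles = estimate_vehicles(96, "weekday")
weekend_vehicles = estimate_vehicles(96, "weekend")
```

With the same number of devices observed, the weekday estimate is higher than the weekend estimate because fewer occupants are assumed per car. If the assumed occupancy is too low for a given time or day, the vehicle estimate is inflated, which is consistent with the overestimation described above.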
This raises an interesting question. Where the Turbine outputs map less well to the reference data, is this common across time buckets? Or is it location specific and invariant in time? To explore this, the distribution of r² for both time-fixed and location-fixed buckets is shown in the upper right panel of Figure 4. Here we can compare which dimension has the better r², time or location. The tendency of the location axis to have higher r² values indicates that we predict traffic volumes very well across the country, and that there seems to be little geographic variance in our measurements. This in turn indicates that we are not biased towards certain regions, such as cities compared to rural areas. The case is not so clear when we examine the time dimension. The much more widely distributed time axis shows that there is indeed variance with time, with some specific times and dates being modelled worse than others.
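The comparison between location-fixed and time-fixed buckets can be sketched by grouping the same records along either dimension and computing the squared correlation within each group. The record layout, location names, and counts below are invented for illustration and do not come from the study.

```python
from collections import defaultdict


def r_squared(pairs):
    """Squared Pearson correlation of (reference, modelled) pairs; None if undefined."""
    n = len(pairs)
    if n < 2:
        return None
    mean_ref = sum(r for r, _ in pairs) / n
    mean_mod = sum(m for _, m in pairs) / n
    cov = sum((r - mean_ref) * (m - mean_mod) for r, m in pairs)
    var_ref = sum((r - mean_ref) ** 2 for r, _ in pairs)
    var_mod = sum((m - mean_mod) ** 2 for _, m in pairs)
    if var_ref == 0 or var_mod == 0:
        return None
    return (cov * cov) / (var_ref * var_mod)


def r_squared_by(records, key_index):
    """Group records by one column (0 = location, 1 = hour) and score each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append((record[2], record[3]))
    return {key: r_squared(pairs) for key, pairs in groups.items()}


# Invented records: (location, hour, reference count, modelled count).
records = [
    ("A", 7, 100, 110), ("A", 8, 200, 210), ("A", 9, 300, 310),
    ("B", 7, 400, 600), ("B", 8, 500, 400), ("B", 9, 600, 500),
    ("C", 7, 700, 690), ("C", 8, 800, 805), ("C", 9, 900, 880),
]
by_location = r_squared_by(records, 0)  # one score per sensor position
by_hour = r_squared_by(records, 1)      # one score per time bucket
```

Plotting the two resulting sets of scores as histograms would give a view comparable to the one described for the upper right panel of Figure 4.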
To explore this aspect further we can look to the lower panel of Figure 4. This plots the r² of all locations aggregated by time and date. Reading left to right, we can see the change in r² over the month examined. Taking the scattered r² points alone, there is not too much to see. A saw-tooth pattern with a weekly cadence can be discerned, which again points to a difference in car occupancy between weekdays and weekends that the current model does not account for perfectly.
The time pattern becomes far more interesting and informative when we gather r² values by the hour of the day. As shown by the colour and horizontal lines, we can aggregate all days and measure the r² on an hourly basis at all locations simultaneously. This shows incredible accuracy of the modelled values during the daytime hours, with the r² per hour not slipping below 88% between 04:00 and 19:00 UTC (05:00 to 20:00 BST). Even outside these hours we see great correlation on average, with the lowest r² value (at 01:00 UTC) at 77.5%.
The overall view then on Turbine’s modelling of road volumes is highly promising. Over 70% of measurements across all times and locations have an r² > 0.8, indicating a high degree of accuracy.
Uses of the Data
The examination of modelled road usage is incredibly interesting. Whilst results such as rush hour peaks are expected, there are other, much less obvious insights to be grasped. One such insight is just how busy London traffic is compared to the rest of the country.
The busiest roads seemingly emanate out from London before reaching the rest of the country, and roads only loosely associated with London are nonetheless busier than roads around other large cities such as Birmingham (see Figure 5). Due to the comprehensive coverage of Turbine road data in comparison to placed physical sensors, potential insights are broad and exciting. The accuracy of the data would allow for finely detailed contextualisation of all regions of the UK. This would be applicable in multiple industries or individual use cases. Some examples are provided below:
- Traffic-light timings at major intersections could be finely tuned to optimise flow and reduce queues, by utilising hourly expected traffic volume, extending to even smaller unmonitored junctions.
- Diversion analysis could be performed, using changes in volumes around traffic diversions to assess the quality of the diversion, and indeed the proportion of the traffic heeding directions.
Transport and Goods Companies
- Providing routes with low volumes of traffic (regardless of traffic speed) could be useful for transportation of exceptional goods. Hazardous materials or secure goods in particular could benefit from highly detailed route volume information, allowing routes with lower expected volumes to be chosen.
- Public transport companies would value information on road volumes highly, as this could be leveraged to inform new routes that are likely to capture passengers who would otherwise travel by car. The data could also inform the frequency of trains or buses that would best suit an increase in expected volume.
- Transport accounts for 27% of CO2 emissions in the UK. Monitoring and time-based trends could assist policy decisions regarding CO2 reduction both locally and regionally.
- Traffic volumes would be highly useful in decisions to widen roads. These are expensive and damaging projects that are generally informed by traffic monitoring set up in anticipation of the widening project itself. Using a historical catalogue of road volumes from Turbine, these projects could be much better informed, with traffic information stretching back to the start of the BT-Turbine collaboration.
Business and Advertising
- A plethora of clear advertising opportunities could be drawn and optimised from the road volumes data. Further to this, business placement could be well informed, especially for businesses dependent on high road volumes (gas stations, services, restaurants, tourism catchment).
This list is far from exhaustive and many further applications could be brought forward.
In the future, use cases could be expanded by combination with other easily accessible metrics such as average speeds on roads. This would give an indication of when high volumes become traffic jams, for instance. A further expansion could be a move to live data instead of daily data. This could be used to flag discrepancies in volume, which could in turn be used for accident monitoring or live road adjustments.
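One simple way such discrepancy alerts could work is to compare a live count against the historical counts for the same road and hour, and flag values that sit several standard deviations from the historical mean. The sketch below is an illustration of that idea only; the function name, the threshold of 3, and the example counts are assumptions, not part of the current product.

```python
from statistics import mean, stdev


def is_anomalous(history, live_count, threshold=3.0):
    """Flag a live count that deviates strongly from historical counts.

    `history` holds past counts for the same road segment and hour.
    `threshold` is a hypothetical cut-off in standard deviations.
    """
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return live_count != mu
    return abs(live_count - mu) / sigma > threshold


# Invented historical counts for one road segment at one hour of the day.
history = [980, 1010, 1005, 995, 1020, 990]
normal_day = is_anomalous(history, 1003)
sudden_drop = is_anomalous(history, 400)
```

A sudden drop of this kind on a normally busy segment might indicate a closure or an accident upstream, though any real alerting would need tuning against known incidents.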
The opportunities are limitless, and the use-cases are broad. These can all be built on top of the base that is currently provided, with accurate road counts across the UK.
- FON. BT Mobile Market Share. url: https://fon.com/british-telecom/.
- Clearview Intelligence. Smart Motorways. url: https://www.clearview-intelligence.com/case-studies/midas-radar-challenges-solved-with-wireless-detection/.
- UK.gov. Highways England. url: https://www.gov.uk/government/organisations/highways-england.
- UK.gov. Proportional CO2 Contributions in the UK, 2019. url: https://www.gov.uk/government/statistics/transport-and-environment-statistics-autumn-2021/transport-and-environment-statistics-autumn-2021.