Post

First new steps - Analysing the UK's busiest stations

First new steps - Analysing the UK's busiest stations

As mentioned in my previous article, over the last few months I’ve been collecting data from National Rail’s Darwin push port - a comprehensive dataset that tracks all train movements across the UK. In this series of articles, I plan to delve deeper into the dataset, uncovering insights that will interest train enthusiasts, commuters, and government bodies alike.

This first article I want to start with a simple bit of analysis - What’s the busiest station in the UK by number of services - but one that I was surprised by. Before we get to that however, let’s explain what the data we’ll be looking at is.

In the National Rail system, the definition of a “rail service” can be quite nuanced. Services may be split, rejoined, consist of empty stock movements (ECS), or even be freight trains. For this analysis, let’s focus on passenger services - excluding ECS and freight - which typically have an origin, a destination, and several intermediate locations along the way. These locations may include stations, but can also represent other points of importance on the railway, such as junctions. Each location is identified by a tpl (TIPLOC) code, while each service is assigned a unique rid.

For our first analysis, we will look at the number of unique passenger services that pass through each station in the UK, sorting from the busiest to the least busy. This will provide us a table of the number of rids passing through a given tpl. And here’s what the data looks like — for the 22nd of November.

tplLocation Namecount
LNDNBDELondon Bridge (Eastern) (South Eastern Platforms)1302
LEEDSLeeds1237
WATRLMNWaterloo Main Lines1212
WATRLWCWaterloo West Crossings1212
STFDStratford (London)1160
LIVSTLondon Liverpool Street1018
BTHNLGRBethnal Green1015
MNCRPICManchester Piccadilly1006
ARDWCKJArdwick Junction978
WDONWimbledon944
ECROYDNEast Croydon943
RAYNSPKRaynes Park943
WNDMLBJWindmill Bridge Junction940
CLPHMJMClapham Junction Main Lines935

On initial inspection, this dataset looks nothing like the 2024 figures for busiest station in the UK - I mean Bethnal Green is on there! So what’s going on?

Well, as we said above - the key aspect is this is the number of unique services passing a given station - not necessarily stopping. Bethnal Green sits on the Weaver line of the London Overground, but also has the lines to London Liverpool St passing through it’s bounds. This means services going to Liverpool St will report they’ve passed Bethnal Green. The same can be said for the non-station locations, Waterloo West Crossings (north of Clapham Junction), Ardwick Junction (south of Manchester Picadilly) and Windmill Bridge Junction (north of East Croydon) - each of these is a major junction which has many services passing through it.

There are a few odditites in this data though. Clapham Junction sits at the entrypoint for almost all services into Waterloo and Victoria - yet Waterloo is near the top and Clapham Junction is down in 14th. Why? The clue is in the name of the TPL Clapham Junction “Main Lines”. This tpl covers just on of the routes into the station, and we can see this if we search the dataset for tpl’s start with CLPHM. We find:

tplLocation Namecount
CLPHMJMClapham Junction Main Lines935
CLPHMJCClapham Junction Central Lines774
CLPHMJWClapham Junction Windsor Lines351
CLPHMJ1Clapham Junction Platform 1302

This makes a lot more sense! We now get 2362 unique services passing through Clapham Junction - which is a far more plausible number than what we had previously. There is likely many other stations which require similar aggregation - and something I will be investigating in future posts.

If you want to re-create this analysis, you soon will be able to! I intend to post my dataset to the AWS Open Data Registry, where you will be able to setup AWS infrastructure to query the dataset yourself. I will also post all queries used in these analysis, so hopefully you can improve on my work.

Appendix: SQL Query

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
WITH passenger_services AS (
	SELECT rid,
		update_id,
		year,
		month,
		day
	from services
	where services.passenger = true
),
data AS (
	SELECT rid,
		tpl
	FROM passenger_services
		INNER JOIN locations ON passenger_services.update_id = locations.service_update_id
	WHERE passenger_services.year = '2024'
		AND passenger_services.month = '11'
		AND (
			passenger_services.day = '21'
			OR passenger_services.day = '22'
		)
		AND starts_with(passenger_services.rid, '20241122')
	GROUP BY rid,
		tpl
)
SELECT
	tpl,
	count(*),
	array_agg(rid)
from data
GROUP BY tpl
ORDER BY count(*) DESC