Post

Analysing popular routes out of Bristol Temple Meads

Analysing popular routes out of Bristol Temple Meads

Analysing popular routes out of Bristol Temple Meads

Over the past few months, I have been designing a system for ingesting data from National Rail’s “Darwin Push Port Feed”. This feed contains live information about train movements, scheduling, and service alterations across the UK rail network. In this article, I will share some analysis of the service delays for three most popular (by passenger numbers) routes out of Bristol Temple Meads. These are the route to London Paddington, Cardiff Central and Clifton Down.

The Query

To obtain this data, I used the query found in the appendix, it does a few things of note:

  • It fetches the last SCHED and ACT (scheduled and actual) messages for each tpl. In this case, it’s assumed the last schedule message is the most accurate.
  • It attempts to correct for day rollover (where a service runs into the next day) when computing the delay for a service. This is done by comparing the schedule and actual arrival times, and if the difference is off by more than half a day (I assume no service is this late!) then it adjusts the calculation to ensure the correct delay is calculated.
  • It assumes that the last and first locations updates are the destination and origin respectively. This could be incorrect if there is data missing (ie I fail to capture the start of a service).

This query was then run for each three routes, pulling back the raw data into Google Sheets for analysis.

The Analysis

To analyse the data, I needed to ensure only the correct data was being pulled in, as my query cannot currently understand directionality of a route. For example, a service can be arriving in Bristol Temple Meads from either Weston (en route to Paddington) or from Paddington (either terminating or continuing to Weston). This was done by filtering the data for only services that started in the correct direction - in this example, if I want to analyse services going to London Paddington via Bristol, they would need to terminate at either Taunton, Weston or the South West of Wales.

Now that’s done, we can see some lovely heat maps!

Services to London Paddington

Alt text

Services to Cardiff Central

Alt text

Services to Clifton Down

Alt text

Instantly these charts show an interesting picture, each route has approximately the same number of services per hour (approx 2), but we can clearly see the Clifton Down route have far fewer delays than the other routes - although this could be due to the reduced distance it covers.

The Paddington services show a different story, only 5 of the days tested have a maximum delayed of less than 15 minutes! There seems to be little pattern to the delays across a given day - although the last service of the day seems especially prone to delays.

The Cardiff services show a similar amount of delays - although there are far fewer services which are not delayed at all. This however is the first route which has two different sections - Bristol to Cardiff services can come either from Taunton or Portsmouth. We can generate heatmaps for these separatley:

Services via Taunton Alt text

Services via Portsmouth Alt text

These extra heatmaps give us a little more insight into the source of the delays. It looks as though the Portsmouth route contributes slightly more delays than the Taunton one, which makes sense given the congestion services suffer around Portsmouth and Southampton.

This is just a first pass analysis. In future posts, I’d like to look further at where these delays originate - and if it can show us bottlenecks in the network.

Appendix

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
WITH data AS (
	SELECT services_v1.rid,
		ts,
		tpl,
		type,
		time_type,
		time
	FROM services_v1
		INNER JOIN locations_v1 ON services_v1.update_id = locations_v1.service_update_id
		INNER JOIN service_details ON services_v1.rid = service_details.rid
	WHERE service_details.passenger = true
		AND (services_v1.year = '2025')
		AND (services_v1.month = '03')
		AND starts_with(services_v1.rid, '202503')
		AND (locations_v1.year = '2025')
		AND (locations_v1.month = '03')
),
sched_int AS (
	SELECT rid,
		tpl,
		ts,
		type,
		time_type,
		time,
		ROW_NUMBER() OVER (
			PARTITION BY rid,
			tpl,
			type
			ORDER BY ts DESC
		) AS row_number
	from data
	WHERE time_type = 'SCHED'
),
sched AS (
	SELECT *
	FROM sched_int
	WHERE row_number = 1
	ORDER BY time ASC
),
act_int AS (
	SELECT rid,
		tpl,
		ts,
		type,
		time_type,
		time,
		ROW_NUMBER() OVER (
			PARTITION BY rid,
			tpl,
			type
			ORDER BY ts DESC
		) AS row_number
	from data
	WHERE time_type = 'ACT'
),
act AS (
	SELECT *
	FROM act_int
	WHERE row_number = 1
	ORDER BY time ASC
),
service_delays AS (
	SELECT act.rid,
		act.tpl,
		act.ts,
		sched.time AS sched_time,
		act.time AS act_time,
		CASE
			when DATE_DIFF(
				'minute',
				date_parse(sched.time, '%H:%i:%S'),
				date_parse(act.time, '%H:%i:%S')
			) < -800 then DATE_DIFF(
				'minute',
				date_parse(sched.time, '%H:%i:%S'),
				date_parse(act.time, '%H:%i:%S')
			) + 1440
			when DATE_DIFF(
				'minute',
				date_parse(sched.time, '%H:%i:%S'),
				date_parse(act.time, '%H:%i:%S')
			) > 900 then DATE_DIFF(
				'minute',
				date_parse(sched.time, '%H:%i:%S'),
				date_parse(act.time, '%H:%i:%S')
			) - 1440 else DATE_DIFF(
				'minute',
				date_parse(sched.time, '%H:%i:%S'),
				date_parse(act.time, '%H:%i:%S')
			)
		END as delay,
		sched.type
	FROM act
		LEFT JOIN sched ON act.tpl = sched.tpl
		AND act.rid = sched.rid
		AND act.type = sched.type
),
all_services AS (
	SELECT rid,
		array_agg(tpl) as tpls
	FROM data
	GROUP BY rid
),
filtered_services AS (
	SELECT rid,
		tpls
	FROM all_services
	WHERE contains(tpls, 'BRSTLTM')
		AND contains(tpls, 'CRDFCEN')
),
destination_int AS (
	SELECT ROW_NUMBER() OVER (
			PARTITION BY rid
			ORDER BY act_time ASC
		) AS destination,
		rid,
		tpl
	FROM service_delays
),
destinations AS (
	SELECT *
	FROM destination_int
	WHERE destination = 1
),
origin_int AS (
	SELECT ROW_NUMBER() OVER (
			PARTITION BY rid
			ORDER BY act_time DESC
		) AS origin,
		rid,
		tpl
	FROM service_delays
),
origins AS (
	SELECT *
	FROM origin_int
	WHERE origin = 1
)
SELECT filtered_services.rid,
	SUBSTR(filtered_services.rid, 1, 8) as date,
	service_delays.tpl,
	ts,
	sched_time,
	act_time,
	delay,
	type,
	origins.tpl,
	destinations.tpl,
	ROW_NUMBER() OVER (
		PARTITION BY SUBSTR(filtered_services.rid, 1, 8)
		ORDER BY act_time DESC
	) as s_id
FROM filtered_services
	LEFT JOIN service_delays ON service_delays.rid = filtered_services.rid
	LEFT JOIN origins ON service_delays.rid = origins.rid
	LEFT JOIN destinations ON service_delays.rid = destinations.rid
WHERE starts_with(filtered_services.rid, '202503')
	AND service_delays.tpl = 'CRDFCEN'
	AND type = 'ARR'
ORDER BY rid,
	sched_time