Adding EPG Data

One of the problems with plotting subtitle data over a 24 hour period is working out which programmes are causing problems. The answer is to add EPG data to the plots, which means working out how to get the programme text onto them in a readable form.

The result isn’t particularly elegant, because each plot is different and the text can end up overlapping the plotted data. In the end I changed the colour of the plot trace and overlaid black text on top to keep it readable. This mostly works, except for programmes lasting 10 minutes or less, where the labels overlap.
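By way of illustration, a minimal sketch of this kind of overlay might look like the following (matplotlib, with made-up delay data and a couple of hypothetical EPG entries; this is not the exact code behind my plots):

```python
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Hypothetical data standing in for the real measurements and EPG feed.
times = pd.date_range("2025-01-07", periods=24 * 60, freq="min")   # one sample per minute
delay = np.abs(np.random.normal(3, 1, len(times)))                 # dummy delay in seconds
epg = [(pd.Timestamp("2025-01-07 06:00"), "Breakfast"),
       (pd.Timestamp("2025-01-07 09:30"), "Morning Live")]

fig, ax = plt.subplots(figsize=(16, 4))
ax.plot(times, delay, color="lightsteelblue")        # lighter trace so the labels stay legible
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

for start, title in epg:
    ax.axvline(start, color="grey", linewidth=0.5)   # programme boundary
    ax.text(start, ax.get_ylim()[1], title, rotation=90,
            va="top", ha="left", fontsize=7, color="black")

ax.set_ylabel("Subtitle delay (s)")
plt.tight_layout()
plt.show()
```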

The above plot is a record of subtitle delay on BBC One for the 7th January 2025. It shows approximately zero delay up till 01:30 when BBC One joins BBC News and the characteristic 4 second delay of the BBC News ASR trial is seen. Then at 06:00 the delay becomes larger and more variable because Breakfast is subtitled by humans using a mixture of autocue and scripts along with respeaking. The delay then drops slightly for Morning Live and then back to zero for the prerecorded programmes. You can also see the delay go back up during the live News bulletins at 13:00, 18:00 and 22:00, the One Show at 19:00 and Match of the Day running up to midnight.

Plotting the average word rates over 5 minute segments for the subtitles and the transcript reveals some interesting contrasts in the plot above. While the subtitles roughly keep track with the speech most of the time, three programmes stand out as causing problems: Match of the Day, Breakfast and, most noticeably, Morning Live, where the speech rate far exceeds the subtitle word rate.

Because the programme boundaries are now available it is also possible to plot the average word rates for each programme, see above. In this plot you don’t see the very high peak speech rates in sections of Morning Live, but you can clearly see that while the average speech rate is around 200 wpm, the subtitles only manage about 160 wpm. This means that, in total, around 20% of the words spoken are missing from the subtitles, and in some segments of the programme the proportion is much higher.
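For anyone curious how the per-programme figures might be derived, here is a rough sketch using pandas to label each minute with its programme from the EPG boundaries and then average the word rates; the column names and EPG entries are illustrative, not the real data:

```python
import pandas as pd

# Hypothetical per-minute word rates; the real values come from the monitoring system.
minutes = pd.date_range("2025-01-07", periods=24 * 60, freq="min")
df = pd.DataFrame({"speech_wpm": 200.0, "subtitle_wpm": 160.0}, index=minutes)

# EPG boundaries: programme start time -> title (illustrative entries only).
epg = pd.Series({pd.Timestamp("2025-01-07 06:00"): "Breakfast",
                 pd.Timestamp("2025-01-07 09:30"): "Morning Live",
                 pd.Timestamp("2025-01-07 11:15"): "Prerecorded daytime"})

# Label every minute with the programme on air, then average per programme.
df["programme"] = epg.reindex(df.index, method="ffill")
per_programme = df.groupby("programme")[["speech_wpm", "subtitle_wpm"]].mean()
per_programme["omitted_%"] = 100 * (1 - per_programme["subtitle_wpm"]
                                    / per_programme["speech_wpm"])
print(per_programme)   # e.g. 200 vs 160 wpm gives 20% of spoken words missing
```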

The delay plot for ITV1 on the 19th January shows some similar patterns and some key differences. There are no subtitles between midnight and 03:00 during Shop on TV, and Unwind with ITV from around 03:50 to 05:00 contains no speech. The programmes at 03:00 and 05:00 are prerecorded and the delay spikes are likely to be anomalies caused by the commercial breaks. Good Morning Britain goes live at 06:00 and the delay is around 6 seconds, dropping to zero when subtitled adverts are shown, and there is a large delay peak of over 30 seconds just after 09:00.

By looking at the scatter plot of delay for 09:00 to 10:00, above, it would appear this peak was caused by a technical fault, because the delay rises almost linearly before dropping back. This must be some form of buffering or rate limiting downstream of the subtitler.

ITV1 live programming continues with Lorraine at 09:30, This Morning at 10:00 and then Loose Women at 12:30, where the subtitle delay averages around 12 seconds for about 15 minutes from 13:00 onwards. This might be down to a delay in the programme feed reaching one of the subtitlers working on the programme, or some other technical issue. The delay drops back below 10 seconds for the lunchtime news before prerecorded programmes take over at 14:00. At 18:00 the subtitle delay jumps to a very consistent 5 seconds for the regional news. This is consistent with the use of ASR-generated subtitles for all of ITV's regional news bulletins. The delay is more variable for the national news that follows and again for the news at 22:00.

The word rate plots for commercial channels have to be interpreted with caution, because most TV adverts do not contain subtitles. This reduces the average subtitle word rate below that of the programme itself and makes the programme averages less useful. However, from the 5 minute word rate averages, it is clear that the subtitles fall behind the speech most noticeably during This Morning.

The similarity with BBC One’s Morning Live is striking. These are similar programmes and clearly both present a very difficult challenge for subtitlers. They are both factual and aimed at an audience that is quite likely to rely on subtitles. These programmes, and others like them, should be the focus for research into improving subtitle quality, because any technique found to improve the subtitles for this material would benefit many other, less challenging, live programmes. This is the kind of problem that might benefit from programme-specific machine learning and helper material such as scripts, alongside the use of cutting-edge speech to text. However, because this is a good example of a messy, hard problem, I suspect it is one many academics will avoid because of the high risk of failure.

So the 24 hour plots are made far more informative by the addition of the EPG data. It helps highlight which types of programming present the biggest challenges for subtitling and is starting to indicate where future research should focus.


24 hour plots

The introduction of ASR subtitles for BBC News on the 13th November prompted me to produce plots covering a 24 hour period, from midnight to midnight. The main reason was that I spent a couple of weeks looking at one hour plots following the change to check for glitches, and this became very boring. Grouping the data into 24 hour plots made the task more manageable, as I could spot any problems and then drill down to the one hour segments where the faults occurred. The most dramatic of these one day plots is for the 13th November, where the wildly fluctuating subtitle delay typical of BBC News up to that point becomes an almost flat line.

To make the plots readable on this scale I have taken the one minute average of the subtitle delay, which does not show the maximum extent of any delay (or advance) but gives a good indication of when problems occurred.

I have also plotted the different word rates for the 24 hour period, comparing the subtitle word rate with the speech to text transcript word rate to expose times when the subtitles omit a significant proportion of the speech content. This proved more difficult, as the word rates fluctuate quite significantly from minute to minute. Eventually I settled on five minute averages for the word rates, which show up the difference reasonably well. Here is the plot of word rates for the same BBC News day.
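For the record, the averaging itself is straightforward; something along these lines would do it (pandas, with dummy data standing in for the monitoring output and column names that are illustrative rather than the real ones):

```python
import numpy as np
import pandas as pd

# Dummy inputs standing in for the monitoring output: one row per matched subtitle
# (display time, measured delay) and one timestamp per subtitle/transcript word.
subs = pd.DataFrame({"time": pd.date_range("2024-12-16", periods=5000, freq="17s"),
                     "delay_s": np.random.normal(4.0, 1.0, 5000)})
sub_words = pd.Series(pd.date_range("2024-12-16", periods=150_000, freq="600ms"))
spoken_words = pd.Series(pd.date_range("2024-12-16", periods=180_000, freq="480ms"))

# One-minute average delay: readable over 24 hours, though it hides the extremes.
delay_1min = subs.set_index("time")["delay_s"].resample("1min").mean()

# Five-minute word rates, expressed as words per minute for both streams.
sub_wpm = sub_words.groupby(sub_words.dt.floor("5min")).size() / 5
spoken_wpm = spoken_words.groupby(spoken_words.dt.floor("5min")).size() / 5

word_rates = pd.DataFrame({"subtitles": sub_wpm, "speech": spoken_wpm})
print(delay_1min.head())
print(word_rates.head())
```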

The issue with plotting a 24 hour news channel is that it has very little variation of content across the day. By contrast, a mainstream channel like BBC One has a varied output with both live and pre-recorded content, as can be seen from these day plots from the 16th December.

The delay plot shows the channel going over to BBC News on ASR at about 01:30, and then the start of BBC Breakfast at 06:00, where human subtitlers take over until about 10:45, when the channel switches to pre-recorded programmes. The news bulletins at one and six pm both have live subtitling. The peak just before 7pm is caused by the regional weather forecast, and live subtitles continue until 7:30pm with the One Show. Live subtitles then reappear for the 10pm news.

The word rate plot shows how the BBC News channel ASR tracks the spoken word rate, but once the respoken subtitles take over from 6am onwards they fall behind. The most notable problems with word loss occur between about 9:30 and 10:30am, where around 30% of the words are missing from the subtitles. The word loss between 7 and 7:30pm is also notable but less severe. These are indicative of the problems with subtitling live magazine programmes, where the speech rates are quite high and there is a great deal of interchange between the speakers.

The one thing that is lacking in these plots is any indication of the programme schedule. That is my next task.


BBC News Changes

At 14:00 on 13th November my monitoring system showed a dramatic change in the form and quality of the subtitles on the BBC News channel. The subtitles went from a mixture of block and snake subtitles to being entirely snake, and the distribution of subtitle delay narrowed from between -3 and +15 seconds in the period 13:00 to 14:00 to between +3 and +7 seconds in the period 14:00 to 15:00.

Before 14:00

After 14:00

Now I have been monitoring BBC News subtitles, on and off, for several months and have fed some of the worst problems back to the BBC and RedBee, their subtitle provider. I had seen examples of subtitle delay that looked like they were caused by one or more technical faults, along with a few incidents of subtitles turning up early which were more likely to have been caused by the subtitler. In one case I saw subtitles being buffered to the extent that they ended up over 3 minutes late before catching up, and another where pre-prepared blocks of subtitles were triggered around a minute early. I was beginning to suspect that there had been some improvements as a result of my feedback, but this recent change is clearly a complete technology refresh that must have been planned well in advance.

The other noticeable change my monitoring has picked up is that the subtitles are now more or less verbatim. Where subtitles are generated using respeaking it is generally the case that the maximum word rate that can be achieved is around 180 words per minute, and it is often as low as 160 words per minute. In this plot from 13:00 to 14:00 you can see the subtitle word rate is in places well below the speech rate, indicating that the respeaker has left words out.

By contrast between 14:00 and 15:00 the subtitles keep up with the speech. The two gaps where there is speech but no subtitles are programme trails which were not subtitled.

All of this indicates that BBC News is now trialling direct speech to text subtitles. There is no official word yet from the BBC, but having watched the subtitles for a while they are pretty impressive. They have managed to retain the colour changes between speakers, which is an improvement over ITV regional news, where a speaker change is indicated by a new line beginning with “>>”, though the colour change is not entirely reliable. One other issue with these subtitles is that the punctuation is not as good as that produced by a respeaker, something that could do with more work. On the positive side, while my system cannot directly measure word errors, from what I have watched so far the accuracy is very high, even managing to get names and places correct, which suggests that the system is being fed with scripts and other helper data.

This is a really interesting development and it will be fascinating to see if there is any response from the audience, as the change is quite obvious. There has been no official statement about this yet, as it currently has the status of a trial, but given that it appears to have eliminated many of the previous problems it is likely to continue into a full deployment, at which point there should be a press statement or blog post.


Dorkbot Bristol

A Lightning Talk

On 14th October I gave a brief talk about measuring subtitle quality as a lightning talk at Dorkbot Bristol. It was a severely cut-down version of my IBC presentation for an audience of technically and artistically minded people. When asked “who here uses subtitles?”, not quite half of the audience raised their hands, and almost all of them use same-language subtitles, so I had a receptive audience.

This was the first time I’d given a talk on this work where it wasn’t being recorded, so I felt free to give more details about which TV channels were represented in the plots, which led to audible gasps in one case. The audience response has given me confidence in the relevance of the topic for a non-specialist audience, and I will be seeking out more opportunities to talk about this work and gather feedback.

One topic that came up in the later conversations was the issue of censorship of swear words in subtitles. This is a longstanding issue, dating back to the early days of subtitles. Indeed, Prof Alan Newell, who led the early work on subtitling, has claimed that “A further interesting finding was the difference between the impact of the spoken and written word, the most obvious example being the use of swear words. These have a greater impact when read as subtitles rather than when heard.”[1] This attitude was also espoused by the likes of the infamous reactionary Mary Whitehouse, and when combined with other patronising attitudes towards disability it led to the censorship of swear words in subtitles. These attitudes are now returning, with YouTube censoring words in automatic captions, though this can be disabled.[2] Netflix has also been called out for its censorship of subtitles, with its audience complaining about the service misrepresenting, censoring and simplifying dialogue from a variety of shows.

One of the issues which both YouTube and mainstream broadcasters face is speech to text systems outputting swear words in error when there are similar sounding words in the speech. In the case of broadcasters, these systems replace swear words pre-watershed but are switched off post-watershed. The exception the BBC makes is during coverage of the Glastonbury Festival, where censorship of the subtitles causes a barrage of complaints, so the system has to be switched off for the duration.

As a result of this feedback, I’m intending to add a check for censorship in subtitles, though the initial challenge is to check that the speech to text system I am using does not have censorship trained into it.
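To give a flavour of what such a check could look like, here is a rough sketch that flags subtitles containing masked tokens (letters replaced with asterisks) where the transcript contains a word from a watch list; the pattern and word list are illustrative only and not part of my current system:

```python
import re

# Crude pattern for tokens that look censored, e.g. "b*****" or "f***ing".
MASKED = re.compile(r"^\w[\w]*\*{2,}[\w]*$")

def masked_tokens(subtitle_text: str) -> list[str]:
    """Return tokens in a subtitle that appear to have been masked."""
    return [tok for tok in subtitle_text.split() if MASKED.match(tok)]

def possibly_censored(subtitle_text: str, transcript_text: str,
                      watch_words: set[str]) -> bool:
    """True if the transcript contains a watch word for the same stretch of
    speech that the subtitle has either masked or dropped entirely."""
    transcript_words = {w.strip(".,!?").lower() for w in transcript_text.split()}
    hits = transcript_words & watch_words
    if not hits:
        return False
    sub_lower = subtitle_text.lower()
    return bool(masked_tokens(subtitle_text)) or not any(w in sub_lower for w in hits)

# Illustrative usage:
print(possibly_censored("It was a b***** nuisance",
                        "it was a bloody nuisance",
                        {"bloody"}))   # True: the subtitle masks the word
```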

[1] Design and the Digital Divide: Insights from 40 Years in Computer Support for Older and Disabled People. Alan F. Newell, Morgan & Claypool Publishers 2011 (page 33) https://link.springer.com/book/10.1007/978-3-031-01592-2

[2] How to Uncensor the Automatic Captions. Gabe the Slacker. YouTube. https://www.youtube.com/watch?v=fTdHXCXETcQ


IBC2025 Paper and Slides

Now I have presented my paper at IBC2025 I can share a version of the slide deck and the paper.

IBC2025 slides in pdf format

IBC 2025 paper in pdf format


DARCI conference

Just before departing for Amsterdam to present my paper at IBC2025 I will be doing an online presentation titled “Using AI-based tools to monitor subtitle quality” for the Conference on Disability, Accessibility and Representation in the Creative Industries (DARCI), which is taking place at the University of York.

It will be a shorter and less technical version than the one for IBC2025. The IBC paper is aimed at the broadcast engineering community, while the DARCI presentation is aimed at accessibility experts. You can attend DARCI in person or online, and registration is open on their web site https://enhancingaudiodescription.com/darci-conference. I am speaking in the 10:45 session on the 11th September, Paper Session 1B: Sign Languages and Captioning in the Creative Industries.


The future will be subtitled

The rise of the text generation

We’re all used to hearing middle class parents complain about the behaviour of their teenage offspring, from the state of their bedroom to the way they talk or the fact they are always after money. However, there is a new complaint I’m hearing these days, even turning up on Radio 4 news programmes. Teenagers insist on turning the TV subtitles on. Infuriated parents are having to learn where the “subs” button is on the remote control in order to turn them off, and arguments break out if they turn them off whilst their offspring are watching.

This is not some moral outrage story, cooked up by the tabloid press as part of the ongoing culture wars, it is a very real phenomenon. Moreover, it is not confined to teenage children, but younger adults too. A recent YouGov poll confirmed something we’ve been seeing signs of for several years. This poll shows that 61% of 18-24 year olds prefer to watch TV with the subtitles turned on, and while it doesn’t give data for people under 18, the level of subtitle use in that younger age group is probably higher still. Other surveys also indicate a clear trend of increasing preference for subtitles with decreasing age. These trends are not new: iPlayer data from 2016 showed subtitle usage of around 30% on children’s services, rising to 35% on content classified as learning. Seven years later it is no surprise to see these trends continuing to grow. Meanwhile, in the field of computer games the use of subtitles is even more widespread, with usage figures for major titles of 95% and higher.

So, what is going on? Why are young people so enamoured with subtitles? The short answer is that we don’t know. There is a lack of published research on this phenomenon, and while broadcasters are interested in answering the question, there hasn’t been sufficient motivation for them to fund further investigation. Part of the problem is that everyone seems to have their own pet theory on the topic, so they don’t see it as warranting further expenditure.

Personally, I think that subtitle use has to be seen as part of a much wider set of cultural changes that have taken place in the past 30 years which have changed the relationship between people, media and communication in general. Let’s start with television itself. What has changed in the past 25-30 years to take us from a situation where subtitles were seen as something for older people and those with hearing difficulties to something that is now a majority preference amongst younger audiences?

When I was a teenager the only subtitles available on television were ones that were burnt into the picture. These were provided for foreign language programmes and occasionally for programmes aimed at the Deaf and hard of hearing. Teletext subtitles didn’t arrive on UK television until 1979; TV sets that displayed Teletext were very expensive, and very few programmes had subtitles, all of which were pre-recorded. Live subtitles didn’t arrive until 1984, and it was only with the arrival of digital television at the end of the 1990s that the ability to display subtitles became mandatory in all new receivers. The ITC (now Ofcom) was given the duty to set minimum levels of subtitle provision for broadcasters thanks to rebel amendments to the Broadcasting Act 1996, which also mandated minimum levels of audio description and sign language. Quotas for subtitling increased over the years, with the BBC unilaterally achieving 100% subtitling for all its main television channels by 2008 and mirroring this on iPlayer by 2012. The result is that for anyone under the age of 24 there have almost always been subtitles available on television. Unlike older generations, they have grown up with subtitles, whether they used them or not. The question then is why would they choose to use them?

Another technology that may have had a big impact is the world wide web, which became available as a royalty-free technology in 1993. The web, like books and newspapers, was initially text based, with images, sound and video coming later. At the same time SMS text messaging became part of digital mobile phone services and a cheaper way of communicating short messages than phone calls. From these technologies come the smartphones, apps and near-ubiquitous connectivity we see today. These, along with the arrival of social media apps, changed the way people communicate with each other. Rather than making a phone call, communication is more often text based, giving the advantage of being both asynchronous and recorded. In the past a phone call would rudely interrupt whatever you were doing, and you would have had to write down any important information, but now a text message can be attended to later and the information saved for later use.

As a result, a whole generation has now grown up exchanging text messages and has also become used to handling multiple different exchanges interleaved with each other. There is something unnerving about the way I have seen younger colleagues able to continue typing a report whilst holding a face-to-face conversation. The ability to process more than one stream of language simultaneously is something I definitely don’t have. So how does that relate back to subtitles? Well, unlike speech, which vanishes as soon as it is uttered, subtitles stay onscreen for a period of time and can be glanced at if you miss a word or its context. This enables you to switch attention between a TV programme and other things going on, like a conversation, text messages or a game. Research is finally catching up with these issues and I hope to see the results published soon.

So as far as I can see, subtitles are here to stay as part of our media landscape. Other factors, such as not wanting to wear headphones or watching video with the sound turned off, also contribute to their popularity amongst the population as a whole. Furthermore, recent improvements in the quality of speech to text mean that almost any video can now have useful, if not entirely accurate, subtitles, making them available to anyone posting video content.

The future will be subtitled…


DARCI Abstract

The abstract for our presentation to the Conference on Disability, Accessibility and Representation in the Creative Industries (DARCI) in September 2025.

Using AI-based tools to monitor subtitle quality.

The RNID’s 2023 report, “Subtitle It,” highlights the significant challenges viewers face in accessing subtitles through on-demand platforms. While the Media Act 2024 mandates minimum quotas for subtitle provision on “tier 1” services, subtitle quality remains an issue which impacts accessibility and viewer experience.

While AI-based speech-to-text tools cannot provide broadcast-quality subtitles, because they produce different types of errors, they can be used to monitor some of the problems which affect the quality of television subtitles and degrade the audience experience. This presentation will demonstrate how, using a modified version of Whisper, OpenAI’s speech-to-text engine, combined with natural language processing and a simple statistical approach, we can usefully quantify problems with timing and word omission in subtitles in broadcast and on-demand content.

We will show how these problems vary across different types of TV programming, including archive programmes with original subtitles that omit a substantial portion of the spoken words and live programmes, where subtitlers re-speak the dialogue or manually cue pre-prepared blocks, leading to subtitles that (usually) lag the speech along with the omission and reordering of words.

These problems will be illustrated with examples from broadcast television and on-demand content, including technical faults and workflow issues. The examples will also highlight the challenges of aligning subtitles with a speech-to-text transcript, given that this work has revealed examples of subtitles omitting around 40% of the spoken words and subtitles arriving between 20 seconds early and 50 seconds late as well as an example of a programme broadcast with the subtitles for a different episode.

We will conclude with some observations on current practices and historical trends in TV subtitling and discuss the need for improved quality control and monitoring of subtitles provided for broadcast and on-demand programmes.


IBC2025 Paper

“Using generative-AI speech-to-text output to provide automated monitoring of television subtitles”

My latest paper will be published as part of the IBC Technical Papers conference. This paper will cover the technical details of a research project that uses speech to text software to track timing and word omission in television subtitles.

This is the outcome of several months of independent research carried out in association with Dr Michael Crabb, Head of Computing at the University of Dundee, and the UK Subtitling Audiences Network.

The proof-of-concept software can plot timings for individual subtitles where a match is found between the speech to text transcript and the first word of a subtitle. The resulting chart can give a feel for the amount of delay across the recording.
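As a rough indication of the idea (not the actual implementation), matching the first word of each subtitle against time-stamped transcript words might look something like this; the data structures and the matching window are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    t: float   # seconds from the start of the recording (e.g. a word timestamp from Whisper)

@dataclass
class Subtitle:
    text: str
    t: float   # time the subtitle was displayed

def first_word_delays(subs: list[Subtitle], transcript: list[Word],
                      window: float = 60.0) -> list[tuple[float, float]]:
    """Return (subtitle time, delay in seconds) for each subtitle whose first
    word is found in the transcript within +/- window seconds; positive = late."""
    delays = []
    for sub in subs:
        tokens = sub.text.split()
        if not tokens:
            continue
        first = tokens[0].strip(".,!?").lower()
        # transcript words spoken near the subtitle's display time with the same text
        candidates = [w for w in transcript
                      if abs(w.t - sub.t) <= window and w.text.lower() == first]
        if candidates:
            spoken = min(candidates, key=lambda w: abs(w.t - sub.t))
            delays.append((sub.t, sub.t - spoken.t))
    return delays

# Illustrative usage:
subs = [Subtitle("Good evening and welcome.", 12.0)]
transcript = [Word("good", 8.2), Word("evening", 8.6)]
print(first_word_delays(subs, transcript))   # roughly 3.8 seconds late
```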

The software can also plot the number of words in each minute for the transcript and subtitles to give an indication of the proportion of words missing from the subtitles.

While the technique cannot be 100% accurate, it gives a good indication of the quality of the subtitles and directs attention to where faults have occurred.
