The Loudest Voice in the Room — Raff's Reflections

Black and white photograph of newspaper front pages on a newsstand, showing the Mirror with a large image of Nigel Farage and the headline 'No Care No Clue No Thanks' alongside The Scotsman with election coverage.

Preliminary findings from a six-week linguistic study of UK media coverage ahead of the 2026 Scottish Parliament and English local council elections. Full analysis will follow post-election. These are patterns observed in the data, not conclusions. The corpus, harvesting script, and analytical tools will be made publicly available at the end of the study period on Codeberg.

A study published in May 2026 found that AI platforms disproportionately reference Nigel Farage when prompted about UK politics, more than any other political leader.¹ The researchers described this as a model bias problem.

It may not be. It may be a data problem.

Six weeks ago I started building a corpus to study how UK media frames politics in the run-up to an election. What I found has a bearing on that question too.

What I Built

Between 23 March and 7 May 2026 (polling day for the Scottish Parliament and English local councils) I harvested headlines and descriptions from 25 UK media outlets via RSS. The corpus contains 53,087 headlines. The outlets span seven ownership groups: News UK, DMGT, Reach PLC, Newsquest, the Scott Trust, public broadcasters, and The National as an independent Scottish control.
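As a rough illustration of the harvesting step, the sketch below pulls the headline and description out of each item in an RSS 2.0 feed using only Python's standard library. The sample feed, the outlet name, and the `parse_feed` helper are invented for the example. The study's own harvesting script has not been published yet and may work differently.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document standing in for a real outlet feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Outlet</title>
    <item>
      <title>Example headline one</title>
      <description>Example description one.</description>
      <pubDate>Mon, 23 Mar 2026 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Example headline two</title>
      <pubDate>Tue, 24 Mar 2026 10:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text, outlet):
    """Return one dict per <item> with headline, description and date."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        rows.append({
            "outlet": outlet,
            # findtext falls back to the default when the element is absent.
            "headline": item.findtext("title", default="").strip(),
            "description": item.findtext("description", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return rows


for row in parse_feed(SAMPLE_FEED, "Example Outlet"):
    print(row["published"], "|", row["headline"])
```

A real harvester would also need to fetch each feed on a schedule and de-duplicate items that appear in more than one fetch. Neither step is shown here.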

I didn't pre-classify any outlet by political leaning before starting the analysis. That's a deliberate choice. If you decide in advance that a paper is right-wing and then find right-wing language in it, you haven't found anything; you've confirmed your assumption. The patterns here emerge from the data, not from what I expected to find going in. This approach is consistent with corpus-driven Critical Discourse Analysis, for those who care about methodological grounding.

These are preliminary findings, published on election day. The corpus runs through Sunday. Full analysis will follow.

Finding One: Who Gets Mentioned

I counted named references to party leaders across the corpus, excluding Prime and First Ministers whose mention counts reflect office rather than editorial choice.
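A minimal sketch of how a count like this could be made, assuming a case-insensitive, whole-word match on each surname and one count per headline however often the name repeats. Both rules are assumptions, and the surnames and headlines are placeholders invented for the example, not rows from the corpus.

```python
import re
from collections import Counter

# Hypothetical surnames and headlines, invented for this example only.
LEADERS = ["Alder", "Birch", "Reed-Smith"]
HEADLINES = [
    ("Outlet A", "Alder launches campaign as Birch responds"),
    ("Outlet A", "Alderman elected to council"),   # must not match "Alder"
    ("Outlet B", "Reed-Smith: 'We can win'"),
    ("Outlet B", "ALDER AND ALDER AGAIN"),         # counted once per headline
]

# One case-insensitive, whole-word pattern per surname.
PATTERNS = {
    name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    for name in LEADERS
}


def count_mentions(headlines):
    """Count the headlines that name each leader at least once."""
    totals = Counter()
    for _outlet, text in headlines:
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                totals[name] += 1
    return totals


print(count_mentions(HEADLINES))
```

Whole-word matching keeps "Alderman" from counting as "Alder". It does nothing about different people who share a surname, which would need a manual check.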

Party leader mentions, 23 March to 7 May 2026 (53,087 headlines)

Chart: mentions per leader, split by Scottish outlets, The National, and UK-wide outlets. Labelled totals:

Leader	Party	Mentions
Farage	Reform UK	442
Sarwar	Scottish Labour	267
Polanski	English Greens	234
Badenoch	Conservatives	178
Offord	Scottish Reform	167

Also charted, without labelled totals: Findlay (Scottish Cons.), Greer (Scottish Greens), Davey (Lib Dems), Cole-Hamilton (Scot. LD), Mackay (Scottish Greens).

Excludes Prime Minister and First Minister whose mention counts reflect office rather than editorial choice.

Nigel Farage is not a candidate in this election. Reform UK Scotland holds no seats in the Scottish Parliament. He still receives 442 mentions across the corpus, including 47 in Scottish-targeted outlets, more than either Scottish Greens co-leader, both of whom lead a party with sitting MSPs.

Malcolm Offord, who became leader of Reform UK Scotland in 2026, receives 103 mentions in Scottish-targeted outlets. That puts him ahead of Findlay, whose Scottish Conservatives have a full MSP group and a long Holyrood history.

Meanwhile, the Scottish Greens have held seats in Holyrood since 1999. Their co-leaders from 2021 to 2024 served in government. On election day, some polls projected the party to become the second largest in the Scottish Parliament after the SNP. Their two current co-leaders, Gillian Mackay and Ross Greer, received 40 mentions between them across six weeks of coverage.

Offord alone got 167.

I'm not going to tell you what that means. It may reflect genuine public interest in Reform's emergence in Scottish politics. It may reflect newsrooms defaulting to a UK-wide political frame even when covering a Scottish election. Probably some of both. The post-election analysis will look at whether pre-election mention frequency predicted vote share or didn't.

Finding Two: How Language Is Used

The corpus also tracks linguistic devices associated with loaded or propagandistic framing: fear amplification, evaluative adjectives, absolutist language, false urgency, scare quotes, dehumanising metaphor, false expertise, victimhood framing, diminutive framing, and loaded possessives.

I computed a normalised combined loading index for each outlet, averaging linguistic device loading and topical domain loading on a 0–1 scale.

Headline loading index: combined normalised score (0–1 scale), by outlet (chart; per-outlet values are in the Combined column of the table below)

Combined index averages normalised linguistic device loading and topical domain loading. Equal weighting. Full raw data in the table below.


Outlet	Device raw	Domain raw	Device normalised	Domain normalised	Combined
GB News	1.84	6.12	1.000	1.000	1.000
Guardian Scotland	0.45	3.07	0.210	0.375	0.293
The Guardian	0.51	2.67	0.244	0.293	0.269
The Observer	0.25	2.78	0.097	0.316	0.206
The Sun	0.52	1.92	0.250	0.139	0.195
Scottish Daily Express	0.26	2.20	0.102	0.197	0.149
Scottish Daily Mail	0.29	1.94	0.119	0.143	0.131
Scottish Sun	0.32	1.79	0.136	0.113	0.125
Daily Mail	0.33	1.72	0.142	0.098	0.120
The National	0.17	2.06	0.051	0.168	0.110
Daily Express	0.31	1.57	0.131	0.068	0.099
Daily Mirror	0.29	1.62	0.119	0.078	0.099
Daily Record	0.19	1.75	0.062	0.105	0.084
The i	0.20	1.49	0.068	0.051	0.060
The Herald	0.11	1.74	0.017	0.102	0.060
BBC News	0.10	1.72	0.011	0.098	0.055
BBC Scotland	0.09	1.72	0.006	0.098	0.052
Evening Standard	0.13	1.60	0.028	0.074	0.051
The Independent	0.16	1.48	0.045	0.049	0.047
Financial Times	0.13	1.53	0.028	0.059	0.044
STV News	0.09	1.64	0.006	0.082	0.044
The Scotsman	0.10	1.59	0.011	0.072	0.042
The Economist	0.09	1.41	0.006	0.035	0.020
Daily Telegraph	0.08	1.37	0.000	0.027	0.013
Metro	0.12	1.24	0.023	0.000	0.011
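The normalised columns in the table are consistent with min-max scaling of each raw score across outlets, followed by an equal-weight average. That is an inference from the published numbers, not a method the study states. A minimal sketch in Python, using six rows copied from the table. The six include the highest and lowest raw score on each measure, so the scaling matches the full table.

```python
# Raw scores copied from the table: outlet -> (device raw, domain raw).
RAW = {
    "GB News": (1.84, 6.12),            # highest on both measures
    "Guardian Scotland": (0.45, 3.07),
    "The Guardian": (0.51, 2.67),
    "The Sun": (0.52, 1.92),
    "Daily Telegraph": (0.08, 1.37),    # lowest device score in the table
    "Metro": (0.12, 1.24),              # lowest domain score in the table
}


def min_max(values):
    """Rescale numbers so the smallest becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def combined_index(raw):
    """Return outlet -> equal-weight mean of the two normalised scores."""
    outlets = list(raw)
    device = min_max([raw[o][0] for o in outlets])
    domain = min_max([raw[o][1] for o in outlets])
    return {o: (d + t) / 2 for o, d, t in zip(outlets, device, domain)}


for outlet, score in combined_index(RAW).items():
    print(f"{outlet:<18} {score:.3f}")
```

Rounded to three decimals, this reproduces the Combined column for these six outlets. Under min-max scaling the top raw score is 1.000 by construction, so the size of GB News's lead is better read from the raw columns than from the 1.000 itself.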

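GB News scores 1.000 on both normalised measures, meaning it has the highest raw score on each: 1.84 for linguistic devices, where the next highest is 0.52, and 6.12 for topical domains, where the next highest is 3.07. The next highest combined score is 0.293. That is not GB News sitting at the high end of a distribution. It is GB News in a different category entirely.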

The Guardian figure (0.269) needs context. Its score is driven by topical loading: its headlines cover multiple domains at once, which reflects editorial depth rather than sensationalism. Its linguistic device score is moderate. Compare that to The Sun (0.195), which is the inverse: high device loading, narrow topic range. They're doing different things with language even when their combined scores look similar.

The Daily Mail (0.120) scores lower than you might expect given its reputation. This is a known limitation of lexical analysis. The Mail's most effective devices tend to be structural and contextual, not the vocabulary-level patterns this index captures. The full analysis will look at this in more detail.

BBC News and BBC Scotland sit close together in the lower half of the combined index (0.055 and 0.052), and near the very bottom on linguistic device loading (0.011 and 0.006). Given that both operate under Ofcom impartiality requirements, it's actually reassuring that those requirements show up as a measurable signal in the data.

The BBC Scotland Finding

Both BBC outlets operate under identical Ofcom requirements. Their pre-election coverage of Scottish party leaders looks noticeably different.

Outlet	Farage	Sarwar	Polanski	Badenoch	Offord	Findlay	Greer	Cole-H.	Mackay	Headlines
BBC News	13	3	5	6	3	1	0	0	0	1,461
BBC Scotland	1	8	1	0	4	3	0	0	1	962

BBC News mentions Farage 13 times and Sarwar 3 times. BBC Scotland does the opposite: Farage once, Sarwar 8 times. That suggests BBC News was covering this election through a Westminster lens rather than a Scottish one.
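Because the two outlets published different numbers of headlines (1,461 and 962), the raw counts are easier to compare as rates. The sketch below converts the table's counts to mentions per 1,000 headlines. The per-1,000 framing is added here for illustration and is not a figure reported by the study.

```python
# Counts and headline totals copied from the table above.
COUNTS = {
    "BBC News": {"Farage": 13, "Sarwar": 3, "headlines": 1461},
    "BBC Scotland": {"Farage": 1, "Sarwar": 8, "headlines": 962},
}


def per_thousand(mentions, headlines):
    """Mentions per 1,000 headlines."""
    return 1000 * mentions / headlines


for outlet, row in COUNTS.items():
    for leader in ("Farage", "Sarwar"):
        rate = per_thousand(row[leader], row["headlines"])
        print(f"{outlet}: {leader} {rate:.1f} per 1,000 headlines")
```

On these numbers BBC News names Farage in roughly 8.9 headlines per 1,000 and Sarwar in about 2.1. BBC Scotland is close to the mirror image, at about 1.0 and 8.3.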

The figure I keep coming back to is Greer's zero. The Scottish Greens co-leader does not appear in a single BBC Scotland headline across six weeks of pre-election coverage. Offord, whose party has never held a Scottish Parliament seat, appears four times. The two Scottish Greens co-leaders get one mention between them.

I want to be careful here. BBC Scotland may have covered the Scottish Greens thoroughly in broadcast, or covered their policies without naming their leaders directly. Headlines are not the full picture. But as a measure of how visible these leaders were in online coverage, the absence is hard to ignore.

The AI Question

A study published in May 2026 found that AI platforms are more likely to reference Nigel Farage than any other UK leader when prompted about British politics.¹ The researchers at Peec AI suggested the mechanism was Reform's social media strategy: commenting at scale on posts to drive volume and therefore model visibility. They found that LLMs cited Facebook as their top source in responses to political prompts.

There is a problem with this. Facebook requires login to access most content. LLMs can't train on login-restricted material at scale. Their training data comes from the publicly crawlable web: news sites, Wikipedia, open forums, and similar. Citing something in a response and being trained on it are different things, and the Peec AI study runs them together.

News media is a cleaner explanation. It's openly crawlable, treated as high-authority in training pipelines, and as this corpus shows, it over-represents Farage significantly relative to his actual electoral weight. If an LLM trains on UK media and UK media has mentioned Farage 442 times against Greer's 27, the model reflects that. Not because of a deliberate choice, but because that's what was in the data.

Whether you call that model bias or inherited media bias is a different question. I'd argue it's an important one.

The National: A Note

The National is the only outlet in the corpus without a UK-wide counterpart. As an editorially pro-independence, pro-SNP publication, it functions as a control — it isn't subject to the same dual-audience pressures as the ownership-group outlets, and its coverage should, in theory, reflect a distinctly Scottish political perspective.

Outlet	Swinney	Sarwar	Offord	Farage	Findlay	Polanski	Greer	Cole-H.	Mackay	Badenoch	Headlines
The National	126	84	47	34	8	39	3	3	2	5	2,511

Swinney leads the count, which makes sense. But Sarwar (84), Offord (47), Polanski (39), and Farage (34) each appear far more often than Findlay (8), Greer (3), Cole-Hamilton (3), or Mackay (2). That is 3 mentions of Greer across 2,511 headlines.

The National's editorial logic is coherent: it covers the SNP leader, the main challenger, and the political forces it sees as threats. That's normal journalism. But the cumulative picture across the whole corpus is the same pattern repeated: parties with no Scottish parliamentary presence getting more coverage than parties which held seats for decades.

How Media Bias Gets Into AI

This is the part I find most interesting.

LLMs are trained on large volumes of internet text, and news media is a significant and high-authority part of that. A model trained on this corpus, or on the broader UK media landscape it samples, will treat coverage frequency as a signal of political importance. Ask it about Scottish politics and it will give you back what the media gave it. Farage will come up. Greer probably won't.

That isn't the model being biased independently. It's the model inheriting bias that was already in the training data, then serving it back as neutral information. The person asking an LLM for an election briefing has no idea that the answer reflects editorial decisions made by journalists who weren't thinking about AI training at all.

The implication runs beyond this election. Any area where media coverage is systematically skewed will produce LLMs that reproduce that skew. The corpus doesn't tell us whether this constitutes harmful bias or just reflects what audiences were genuinely interested in. But it does demonstrate the mechanism with dated, queryable data, in a context where we'll have actual results to compare against tonight.

What Comes Next

The corpus runs through Sunday. Once results are in, I'll look at whether pre-election mention frequency predicted vote share. If parties that got very little coverage significantly outperform their media presence (or if the opposite is true), that gap becomes a concrete, quantifiable measure of the distance between editorial priorities and electoral reality.
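One way to turn that comparison into a number is to subtract each party's share of the vote from its share of mentions. This is a sketch of one possible measure, not the study's stated method. The party labels and every figure below are placeholders invented for the example, since no results exist at the time of writing.

```python
def shares(counts):
    """Convert raw counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def coverage_gap(mentions, votes):
    """Mention share minus vote share, per party.

    Positive: more coverage than votes. Negative: less coverage than votes.
    """
    mention_share = shares(mentions)
    vote_share = shares(votes)
    return {party: mention_share[party] - vote_share[party] for party in mentions}


# Placeholder inputs, invented for the example.
mentions = {"Party A": 60, "Party B": 30, "Party C": 10}
votes = {"Party A": 40_000, "Party B": 35_000, "Party C": 25_000}

for party, gap in coverage_gap(mentions, votes).items():
    print(f"{party}: {gap:+.2f}")
```

A rank correlation between mention counts and vote share would answer the "predicted or didn't" question more directly. The gap above is simply the easiest figure to read party by party.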

The full analysis will also look at selective nationality attribution, a framing device identified from the corpus that can't be caught algorithmically and requires manual work on a sample of outlets. That will be reported separately.

The Scottish Daily Express anomaly (its raw topical loading is about 40% higher than its UK counterpart's, 2.20 against 1.57) is worth watching as the corpus grows, but it's not a finding yet.

The Data Is Public

The harvesting script, classifier, device detector, and full SQLite database will be made publicly available at the end of the study period on Codeberg. Anyone will be able to query it, verify it, or extend it. That is the point.

These are preliminary findings from an ongoing study. Nothing here should be read as a conclusion. Full methodology notes are available on request.

References

¹ Aisha Down, The Guardian, 4 May 2026. "AI platforms reference Nigel Farage more than other leaders when prompted on UK politics, study shows." theguardian.com