Earth Notes: On Website Technicals (2025-06)

Updated 2025-06-16.
Tech updates: Junited - Rigby to Westin - GPTBot badness, captions, diversion delay... #Junited2025
Junited 2025 this month, then back to choosing the distribution for randomising the scope for consulting on considering a grass-roots movement to form a citizens' assembly to set terms for a Royal Commission to guide the forming of a ministerial task force to consider the scoping and creation of a focus group to sketch terms of reference for a study group to outline the agenda for a pre-meeting to form a steering group to get ready to think about upgrading the RPi server, easy does it...

2025-06-16: Junited 2025: O Westin

O Westin is a prolific creator of very, very short science fiction and fantasy stories on various social media platforms including Twitter and then the Fediverse. Flavour for my feeds!

2025-06-15: Junited 2025: NeilZone

Am I just copying Mr Rigby today, or does Neil Brown simply deserve to win the Internets (for fifteen minutes off-peak, batteries not included)?

From his legal business' diversity section: We have no religion or belief, but Neil prefers vim over emacs.

2025-06-14: Junited 2025: QC

Another of my webcomic favourites is Questionable Content which I have been reading since before it was famous... This embodies a positive vision of AI that I can get behind.

Diversion delay until after noon

2025-06-14 and following day - most diversion delayed until after solar noon

I successfully implemented a scheme to delay remaining diversion to the heat battery, once it is nearly full, until after solar noon. It was implemented with 67%-full and Z thresholds, and as shown in the chart most (~2.5kWh) diversion is delayed until after solar noon (cf ~17kWh exports) on 2025-06-14. The following day diversion is again delayed until after solar noon.

The aim is to reduce peak spill to grid at around solar noon when many other PV systems will be exporting at their maxima.
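The delay rule might be sketched roughly as below. This is a minimal illustration and not the real controller code: the function name, its parameters, and passing solar noon in as a clock time are all assumptions, and only the 67%-full figure comes from the text above.

```python
from datetime import datetime, time

# From the text above: the store counts as "nearly full" at 67%.
NEARLY_FULL_FRACTION = 0.67

def may_divert(now: datetime, solar_noon: time, fraction_full: float) -> bool:
    """Return True if diversion to the heat battery is allowed right now.

    Hypothetical rule: below the nearly-full threshold divert freely;
    once nearly full, hold the remaining diversion until after solar noon.
    """
    if fraction_full >= 1.0:
        return False  # Nothing left to fill.
    if fraction_full < NEARLY_FULL_FRACTION:
        return True
    return now.time() >= solar_noon

# Illustrative value only: real solar noon varies with date and longitude.
noon = time(12, 0)
print(may_divert(datetime(2025, 6, 14, 9, 0), noon, 0.8))
print(may_divert(datetime(2025, 6, 14, 13, 0), noon, 0.8))
```

Holding back only the last third or so of the store's capacity means that morning generation still goes into the store as usual, while headroom is kept in hand to soak up some of the midday peak.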

2025-06-13: Junited 2025: SMBC

I have had many science-based chuckles from Saturday Morning Breakfast Cereal over the years! I commend it and Zach W to you!

2025-06-12: Junited 2025: MECS and eCooking

One of my supervisors works on, and contributes occasional blog posts to, the Modern Energy Cooking Services UK International Development supported effort to accelerate a transition from biomass to genuinely ‘clean’ cooking.

I learnt just today that while an induction hob (like 16WW's) uses only ~80% of the energy of a plain old resistance hotplate, an EPC (Electric Pressure Cooker in this case) is nearer half!

2025-06-11: Junited 2025: Andrea Sella

Andrea is a UCL chemistry prof with lots of 'rizz'. He has been very kind to me, and to my daughter who he took on a 1.5h private tour of the chemistry facilities while she was deciding what to do at university.

He writes well and is an excellent science communicator, and is active in the Fediverse.

Andrea has just returned his 2014 Royal Society Faraday Prize since the Society will not eject Elon Musk for his rampage across US science.

Andrea has a heat pump also!

2025-06-10: Junited 2025: Emil Jacobs

Emil strongly advocates for nuclear electricity generation and fairly hard-left politics, which is an unusual combination. I agree that we need nukes in the generation mix, but find him a bit harsh on the main intermittent renewables (solar and wind). Emil is very active on the Fediverse, which is how I follow him.

He says, in response, trimmed a little:

As for my harsh takes on solar and wind: that is functional.

  • I simply see too much wavy takes on energy, stemming from a lack of understanding of how much energy we as a society actually use.
  • I mirror typical attacks on nuclear (waste, cost, etc etc) trying to let people consider a fuller picture.
  • Solar and wind on itself introduce all kinds of issues that feel-good vibe posters ignore (or don't know about)

So, it is a narrative of push back that I try to establish. I'm not against solar and wind and I disagree strongly with pro-nuclear advocates that say we shouldn't care about solar and wind at all (for ironically very similar reasons only-renewable people use against nuclear). We need to build it all. It is just that on fediverse I rarely have to deal with the second group, heh.

2025-06-09: Junited 2025: Michael de Podesta

Michael's blog is full of physics and renewable heating and determination to fix the climate. He used to measure things at the NPL, so he is accurate and only tends to make sweeping generalisations about politicians not moving fast enough!

Default closed captions for video and audio!

With a steer from Mr Cridland (see below) I have added the default attribute to my captions track for both AUDIO and VIDEO. Now my videos (such as on smart TRVs) will show closed captions by default if I have a .vtt file and I am displaying this under an https URL. Hurrah!
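As a rough sketch of what the markup change amounts to, here is a small Python helper of the kind a page generator might use to emit an AUDIO element with a default captions track. The function name, file names and the label text are illustrative assumptions and not taken from the site's real generator; `kind`, `srclang`, `label` and `default` are standard attributes of the HTML `track` element.

```python
from html import escape

def audio_with_captions(audio_src: str, vtt_src: str, lang: str = "en") -> str:
    """Build an HTML5 audio element whose captions track is enabled by default."""
    return (
        '<audio controls preload="none">\n'
        f'<source src="{escape(audio_src, quote=True)}">\n'
        f'<track kind="captions" src="{escape(vtt_src, quote=True)}" '
        f'srclang="{escape(lang, quote=True)}" label="Captions" default>\n'
        '</audio>'
    )

# Hypothetical file names, for illustration only.
print(audio_with_captions("episode.mp3", "episode.vtt"))
```

The same `track` line works inside a VIDEO element, where the browser draws the cues itself.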

James' solution for an HTML5 audio player with VTT captions actually does work for me (again, hurrah!), but produced protests from my current page generation pipeline (and poorly optimised CSS) mainly because of the little bit of inlined JavaScript to tie it together.

So I exported the JS for the case required for the first audio tag on the page, ie the main one for a podcast, and made it async, and now it does work. Tested on Firefox, Chrome and Safari.

Screenshot 20250609 captions being displayed under AUDIO player
Snapshot of captions in action with a min-height:5em vertical box to reduce re-formatting of the page for larger text fragments.

There is some ugly cross-origin stuff to sort out on the m-dot pages ... later.

2025-06-08: Junited 2025: Robert Birming

I am giving in to peer pressure and quietly tagging along with the cool kids on Junited 2025. I will be linking to relentlessly tech 'blogs' or adjacent-ish.

I back-filled to the start of the month: Ministry of Truth stylee...

2025-06-07: Junited 2025: James Cridland

I enjoy reading James' blogs on audio, but discovered him through Podnews, from my work on RSS Podcast Feed Inefficiency.

2025-06-06: Junited 2025: Dan Curran

A couple of years ago Dan was in the audience at the Kingston Efficient Homes Show. This year he presented his own short talk about all the work he has done to move his home towards net zero!

2025-06-05: Junited 2025: Tim Bray

Tim's blog cannot help being quite technical because he is, and has form! But it strays into real life often.

2025-06-04: Junited 2025: Ken Shirriff

Ken Shirriff's blog on retro tech stuff, including IC reverse engineering, is fun. I have ridden to the first page of Hacker News at least once, I think, by posting "Interesting BiCMOS circuits in the Pentium, reverse-engineered".

2025-06-03: Junited 2025: DESNZ Written Answers

I would like to bring to your attention a little known blogger, the UK Secretary of State for Energy Security and Net Zero, and his RSS feed of written answers to parliamentary questions.

While a lot of the questions are political bluster (and a few are indeed quite unpleasant in their antisocial implications), their general flavour and responses are useful for tracking the government's views and actions (and inactions and bluster).

2025-06-02: Junited 2025: Chris Siebenmann

Chris often has his "slice of sysadmin life" featured on Hacker News and the like, so needs no exposure from me, but here is a recent musing of several, where he states:

... For me, blocking web scrapers here on Wandering Thoughts is partly an editorial decision of whether I want any of my resources or my writing to be fed into whatever they're doing. I will certainly block scrapers for doing what I consider an abusive level of crawling, and in practice most of the scrapers that I block come to my attention due to their volume, but I will block low-volume scrapers because I simply don't like what they're doing it for.

Are you a 'brand intelligence' firm that scrapes the web and sells your services to brands and advertisers? Blocked. ...

I blocked Semrush many years ago because its business model seemed, from my point of view, to be making my content easier for lowlife to steal in order to divert search engine users from what they were actually looking for. Several parts bad.

Plus Semrush did so many technically bad things that I had to repeatedly raise the stakes legally (eg explicitly denying it any implied rights of access under the Computer Misuse Acts) before it started to adhere to even basic etiquette.

Thus Semrush earned permanent spots in several of my robots.txt files such as:

# Asked Semrush 2017/12/09 privately to stop touching my servers entirely.
User-agent: SemrushBot
Disallow: /
User-agent: SemrushBot-BA
Disallow: /
User-agent: SemrushBot-SA
Disallow: /
User-agent: SemrushBot-SI
Disallow: /

OpenAI GPTBot Misbehaviour

One of the reasons that I mark up my HTML semantically, such as with schema.org microdata, is to help search engines and others (even AI engines) absorb the information that I wish to share.

But like many others my systems have been compromised for real human users by the blundering, greedy and sometimes deceitful behaviour of the AI 'bots'.

I make available archives of the data and main HTML pages of the site, eg to facilitate off-line reading of the site. But those archives are big, and some of the more idiotic crawlers (not just AI so far as I can tell) were downloading them over and over. So after a couple of intermediate steps to try not to annoy actual humans that might benefit, I moved the big archive into a directory that has a robots.txt "Keep Out" sign on it:

# DHD20250209: avoid huge consolidated archives being redundantly spidered.
User-agent: *
Disallow: /out/monthly/archive/
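To see what a well-behaved crawler should conclude from that rule, Python's standard `urllib.robotparser` can be fed the fragment directly. This is only a sketch: it checks the three lines quoted above and nothing else in the live robots.txt, the `allowed` helper is a name made up for illustration, and the full archive URL is assembled from the host and path that appear in the server logs.

```python
from urllib.robotparser import RobotFileParser

# Only the fragment quoted above; the live file may contain other rules.
ROBOTS_TXT = """\
# DHD20250209: avoid huge consolidated archives being redundantly spidered.
User-agent: *
Disallow: /out/monthly/archive/
"""

def allowed(agent: str, url: str) -> bool:
    """Return True if the rules in ROBOTS_TXT permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

ARCHIVE = "https://www.earth.org.uk/out/monthly/archive/public-data-files.tar.xz"
print(allowed("GPTBot", ARCHIVE))                      # archive directory
print(allowed("GPTBot", "https://www.earth.org.uk/"))  # home page
```

Because the rule is under `User-agent: *`, it applies to any crawler that does not have a more specific group of its own in the file.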

However, on a couple of samplings of my logs, looking for the biggest bandwidth hogs over a recent week or so, I found:
www.earth.org.uk:443 20.171.X.X - - [07/May/2025:20:27:30 +0000] "GET /out/monthly/archive/public-data-files.tar.xz HTTP/2.0" 200 437823930 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)"

and:

www.earth.org.uk:443 20.171.X.X - [26/May/2025:18:57:40 +0000] "GET /out/monthly/archive/public-data-files.tar.xz HTTP/2.0" 200 437823930 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)"
www.earth.org.uk:80 20.171.X.X - [26/May/2025:23:05:52 +0000] "GET /out/monthly/archive/public-data-files.tar.xz HTTP/1.1" 302 547 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)"
www.earth.org.uk:443 20.171.X.X - [26/May/2025:23:05:52 +0000] "GET /out/monthly/archive/public-data-files.tar.xz HTTP/2.0" 200 437823930 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)"
www.earth.org.uk:443 85.93.X.X - [29/May/2025:17:18:20 +0000] "GET /out/monthly/archive/public-data-files.tar.xz HTTP/2.0" 200 437830560 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"

The 20.171.X.X IP accesses are from OpenAI / GPTBot's official IP address range.
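Hunting for hogs like this can be done with a few lines of script. The sketch below is not the method actually used here: it assumes the vhost-prefixed access log layout visible in the lines above, the regular expression and the `bytes_by_agent` name are illustrative, and the two sample lines are copied from the excerpts above.

```python
import re
from collections import Counter

# Assumed layout: vhost:port ip ... [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ (?P<ip>\S+) .*?\[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

def bytes_by_agent(lines, path_prefix="/out/monthly/archive/"):
    """Sum response bytes per user agent for requests under path_prefix."""
    totals = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or not m["path"].startswith(path_prefix):
            continue  # Unparseable line, or not in the "Keep Out" directory.
        if m["bytes"] != "-":
            totals[m["agent"]] += int(m["bytes"])
    return totals

SAMPLE = [
    'www.earth.org.uk:443 20.171.X.X - [26/May/2025:18:57:40 +0000] '
    '"GET /out/monthly/archive/public-data-files.tar.xz HTTP/2.0" 200 437823930 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    'www.earth.org.uk:80 20.171.X.X - [26/May/2025:23:05:52 +0000] '
    '"GET /out/monthly/archive/public-data-files.tar.xz HTTP/1.1" 302 547 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
]

for agent, total in bytes_by_agent(SAMPLE).most_common():
    print(f"{total / 1e6:.1f} MB  {agent}")
```

Restricting the tally to the disallowed directory means that anything which shows up at all is, by construction, a fetch that robots.txt asked crawlers not to make.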

What gives? This is obviously unjustifiable behaviour. Someone should start an 'unauthorised access' criminal action under the Computer Misuse Acts or similar, whatever one separately thinks of the ethics of AI scraping, etc. This is clearly "the law is for little people" and some of the same "we do not care about you" attitude that took years to resolve with Semrush, see above...

2025-06-01: Junited 2025: Thomas Rigby

I noticed that Mr Rigby was doing Junited 2025, not necessarily on the 1st, nor did he necessarily start on the 1st, but in the spirit of rewriting history, let me go with this!