Earth Notes: On Website Technicals (2023-09)

Updated 2024-02-03.
Tech updates: crashy, bandwidth, og:image, archivedAt, copyrightNotice, contentReferenceTime, cookie consent, immutable, RPi5, reutils.
A new school/university year, a new server kernel? Actually the RPi5 looks like a decent (1.8W idle) upgrade, and the SD card in the RPi looking after my Thermio seems very unhappy after a power cut, so maybe changes coming all round soon!

2023-09-28: Raspberry Pi 5 Good Off-Grid?

The RPi5 is nearly on the market. It seems to have an RTC built-in, which means that I might be able to do without HAT(s). But what is its minimum (eg idle) consumption: will it be low enough for my off-grid server when I replace my RPi3B?

According to Jeff Geerling, the idle consumption of a Raspberry Pi 5 Model B Rev 1.0 is 1.8W on an 8GB device running Debian GNU/Linux 12 (bookworm) aarch64 with kernel 6.1.32-v8+.

(Subsequent measurements suggest 1.5x to 2x the RPi3B's idle consumption, so a little over 2W. This may drop a little with firmware updates, as happened with the RPi4.)

So the RPi5 may still be good for off-grid, but with a lot more oomph on demand (~8W more CPU power).

Indeed I may be able to burn a few watts of excess solar when the battery is full, catching up on expensive stored work!

reutils V1.2.2

The BMRS system seems to be glitching more often. So reutils V1.2.2 avoids tooting (ie posting to Mastodon) if it does not have fresh data. That should eliminate low-value toots of the form "National Grid CO2 intensity probably average so don't run big appliances now if you can wait."
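As a minimal sketch of that kind of freshness gate, and not the actual reutils code: the function name should_toot and the one-hour MAX_DATA_AGE threshold below are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold: data older than this is treated as stale.
MAX_DATA_AGE = timedelta(hours=1)


def should_toot(latest_sample_time: Optional[datetime],
                now: datetime,
                max_age: timedelta = MAX_DATA_AGE) -> bool:
    """Return True only if a sample exists and is recent enough to post about."""
    if latest_sample_time is None:
        return False  # No data at all: stay quiet.
    age = now - latest_sample_time
    # Reject stale samples, and also samples apparently from the future.
    return timedelta(0) <= age <= max_age


now = datetime(2023, 9, 28, 12, 0, tzinfo=timezone.utc)
print(should_toot(now - timedelta(minutes=20), now))  # recent sample
print(should_toot(now - timedelta(hours=3), now))     # stale sample
print(should_toot(None, now))                         # no sample at all
```

Staying silent when the upstream feed is glitching trades a missed post for not publishing a guess.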

2023-09-25: AdSense GDPR Messages

Recently, Google AdSense started insisting that I set up a GDPR cookie message, even though I had turned off ad personalisation and, so far as I can tell, I do not do anything requiring permission myself.

So I set up the Google-provided mechanism ~2023-09-07.

One slight bonus is that now Google is allowing me explicitly to shut out third-party providers making dubious claims to legitimate interest in tracking me across multiple sites, establishing my inside-leg measurement, etc. At this stage excluding everything but non-personalised ads via Google itself seems not to have diminished my tiny revenue. And I have now removed JavaScript from essentially all pages other than the few with ads, so those are the only places a cookie-consent pop-up can even appear.

Google continues to complain that I'm now asking for permission but not doing ad personalisation. Well duh...

Anyhow, the stats so far are interesting even though clouded a bit with noise from me setting things up and testing.

Messages shown    EEA and UK traffic rate    Consent rate
259               85%                        58%

The consent rate has been fairly stable and higher than I expected. I have a prominent "I do not consent" button to make it easy to say no. Saying no simply means getting no ads at all (in the UK and EU) I think.

Immutable

I am reinstating the immutable directive for Cache-Control response headers for objects under /img/. Firefox and Safari, and thus ~22% of users according to Can I Use, may benefit.

Once an object exists under /img/ it may be removed. It may also be replaced with a functionally/logically identical smaller object, ie it may get optimised, but should not otherwise change. This seems a good use of immutable.

But I think that I can safely remove the public directive, having re-read specs.
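A minimal sketch of that header policy, assuming a one-year max-age for illustration (the real server configuration is not shown here, and cache_control_for is a hypothetical helper name):

```python
# Assumed cache lifetime of one year, in seconds.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def cache_control_for(path: str, max_age: int = ONE_YEAR_SECONDS) -> str:
    """Build a Cache-Control value, adding 'immutable' only under /img/."""
    directives = [f"max-age={max_age}"]
    if path.startswith("/img/"):
        # Objects here should not change meaningfully once published,
        # so supporting browsers can skip revalidation on reload.
        directives.append("immutable")
    # 'public' is deliberately left out: an explicit max-age already lets
    # shared caches store an ordinary unauthenticated response.
    return ", ".join(directives)


print(cache_control_for("/img/example.png"))
print(cache_control_for("/index.html", max_age=3600))
```

Browsers that do not understand immutable should simply ignore it and fall back to normal max-age handling.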

2023-09-22: copyrightNotice and Other Metadata

I have added a copyrightNotice to every page, wrapping the existing copyright statement.

I have manually added an archivedAt metadata link to the home page, pointing to the offline ZIP archive.

I have selectively added a contentReferenceTime to a couple of pages where it seemed relevant: there may be more. It may also be sensible to automate insertion of this in some cases, maybe even in dated entries such as this, and ins?

2023-09-18: Mastodon vs Twitter and og:image

In the olden days, Twitter strongly encouraged you to add an appropriate meta name="og:image" tag to your HTML page header. This allowed Twitter (and other systems) to show a thumbnail of the specified image in your tweet, which improved engagement.

Twitter would fetch that image once (or a small number of times) when the tweet was initially made, and occasionally thereafter, and scale it to show to clients in the tweet that they saw.

Google's instrumentation, eg for AMP, had/has some strong views on the minimum number of pixels that should be in such images, in part to enable a good experience on high-resolution desktops and mobiles based on some related uses of that image.

So I have tended to pick a decent ~1000px image and let my static site builder extract a letter-box clip for the page hero. That hero is made to weigh in at under 40kB. The original weighing an order of magnitude more does not matter to viewers of the Web page; they may never get to see it.

Under the Mastodon / ActivityPub / Fediverse distributed model, even with only a few hundred followers, posting a link to an EOU page causes tens to hundreds of requests for the page HTML and for the og:image, which often dwarfs the page in size, again by an order of magnitude.

So in a couple of significant cases, such as the grid intensity page which is posted a lot, I have made efforts to shrink the og:image (see yesterday) and/or replace the full image with a hand-crafted letterbox image. Significance was judged by turning up in the top-50-ish results from yesterday's bandwidth-hog measurements.

Unless thought about, this risks DDoSing EOU whenever I toot a link to it! (Conversely, it means that distributed stress-testing on demand just became easier for me, when I want it...)

For example, one automated carbon intensity post resulted in 70 fetches of the og:image within 7 minutes, all but 2 within the first 2 minutes. The EOU account on which the toot was posted has 311 followers currently. The image's cache life (Cache-Control max-age) is ~1 year. All of the GET requests were 200 (none 304). The previous toot with the og:image was ~1 hour before, and the image has not changed in that time.
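A burst like that can be measured from parsed log entries. The following is a minimal sketch, assuming the log has already been reduced to (timestamp, path, status) tuples; the path /img/example-og.png and the sample hits are made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def burst_summary(hits, path, start, window):
    """Count fetches of `path` in [start, start + window), split by HTTP status.

    `hits` is an iterable of (timestamp, path, status) tuples.
    """
    end = start + window
    statuses = Counter(
        status for when, p, status in hits
        if p == path and start <= when < end
    )
    return sum(statuses.values()), statuses


# Synthetic example data, not real log entries.
toot_time = datetime(2023, 9, 18, 12, 0, 0)
og_image = "/img/example-og.png"
hits = [(toot_time + timedelta(seconds=s), og_image, 200)
        for s in (5, 20, 45, 90, 110, 300)]
hits.append((toot_time + timedelta(seconds=30), "/other.html", 200))

first_two_min, _ = burst_summary(hits, og_image, toot_time, timedelta(minutes=2))
first_seven_min, by_status = burst_summary(hits, og_image, toot_time, timedelta(minutes=7))
print(first_two_min, first_seven_min, dict(by_status))
```

Splitting by status makes it obvious when every request is a full 200 rather than a cheap 304 revalidation.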

(See also 2024-01-29: Mastodon Preview Stampede.)

2023-09-17: Bandwidth Hogs

I think that I have done this before, but if so I cannot find my previous attempt...

I have put together a small script that examines site logs for a week and totals up count and total bytes per EOU site object, looking for bandwidth sucks.
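A minimal sketch of such a tally, not the script itself: it assumes Apache-style Common/Combined Log Format lines, and the sample lines and paths below are invented.

```python
import re
from collections import defaultdict

# Matches the request, status and response-size fields of a
# Common/Combined Log Format line (an assumed log format).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def bandwidth_by_object(lines):
    """Return (path, hits, total_bytes) tuples, biggest byte total first."""
    totals = defaultdict(lambda: [0, 0])  # path -> [hits, bytes]
    for line in lines:
        m = LOG_LINE.search(line)
        if m is None:
            continue  # Skip lines that do not look like requests.
        size = 0 if m.group("bytes") == "-" else int(m.group("bytes"))
        entry = totals[m.group("path")]
        entry[0] += 1
        entry[1] += size
    return sorted(((p, h, b) for p, (h, b) in totals.items()),
                  key=lambda t: t[2], reverse=True)


sample = [
    '192.0.2.1 - - [17/Sep/2023:10:00:00 +0000] "GET /a.flac HTTP/1.1" 200 5000000',
    '192.0.2.2 - - [17/Sep/2023:10:00:01 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 12000',
    '192.0.2.3 - - [17/Sep/2023:10:00:02 +0000] "GET /rss/podcast.rss HTTP/1.1" 304 -',
]
report = bandwidth_by_object(sample)
for path, hit_count, total in report:
    print(f"{total:>10} {hit_count:>6} {path}")
```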

In principle that might help me minimise/optimise objects that are the targets of (say) Mastodon / ActivityPub mass fetches when a page link is posted.

In practice I see that the top entries are mainly FLAC files that are probably being downloaded by greedy and stupid spiders.

The very top item, responsible for ~500MB last week (168 downloads) was a version of the video of me digging compost with toddler commentary.

[Video, 10s: "compost bin worm toddler commentary"]

In at number 6 is ~130MB in three downloads of a rather obscure data archive, again unlikely to have been a human driving!

In at number 9 is ~110MB for ~1500 downloads of the energy series dataset page, which suggests that many of them are being fetched without any compression:

407398 energy-series-dataset.html
 23461 energy-series-dataset.htmlbr
 31308 energy-series-dataset.htmlgz

(Indeed uncompressed transfers seem mainly to be Akkoma Fediverse bots and some Pleroma...)
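A rough back-of-envelope check on those figures, as a sketch only: it assumes 1MB is 10^6 bytes, that every fetch was a full response, and that each compressed fetch used a single encoding (all br or all gz). The helper name uncompressed_fraction is hypothetical.

```python
def uncompressed_fraction(total_bytes, fetches, plain_size, compressed_size):
    """Estimate the share of fetches that must have been uncompressed.

    Solves mean = f * plain_size + (1 - f) * compressed_size for f.
    """
    mean = total_bytes / fetches
    return (mean - compressed_size) / (plain_size - compressed_size)


# Sizes in bytes from the listing above; totals are the approximate
# ~110MB and ~1500 downloads quoted in the text.
PLAIN, BR, GZ = 407398, 23461, 31308
TOTAL_BYTES, FETCHES = 110e6, 1500

estimates = {
    label: uncompressed_fraction(TOTAL_BYTES, FETCHES, PLAIN, size)
    for label, size in (("br", BR), ("gz", GZ))
}
for label, fraction in estimates.items():
    print(f"if the rest were {label}: ~{fraction:.0%} of fetches uncompressed")
```

Under those assumptions only around one fetch in eight or nine needs to be uncompressed to account for the total, and those few would carry well over half of the bytes.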

In at number 16 is ~60MB in ~5000 polls of the podcast RSS file.

In at number 22 is ~45MB in ~3200 downloads of the banner image that goes in automated grid-intensity posts to Mastodon. So I have made as tight a .webp version as I can, a bit lossy, and smaller than the previous 8006 bytes. This should save bandwidth for browsers, but possibly not for the Fediverse bots fetching the original.

cwebp -v -m 6 -near_lossless 1 img/grid-demand-curves/gCO2perkWh-1.png -o img/grid-demand-curves/gCO2perkWh-1.png.webp
8845 img/grid-demand-curves/gCO2perkWh-1.png
4522 img/grid-demand-curves/gCO2perkWh-1.png.webp

I did another round of byte squeezing, the .png again with TinyPNG and zopflipng -m -m -m, and re-created the .webp:

7535 img/grid-demand-curves/gCO2perkWh-1.png
4478 img/grid-demand-curves/gCO2perkWh-1.png.webp

In other news

While messing around I found that I had left some bytes on the table elsewhere:

% ls -al img/tools-800w-JA.jpg*
31421 img/tools-800w-JA.jpg
11321 img/tools-800w-JA.jpg.avif
 1708 img/tools-800w-JA.jpg.avifL
20021 img/tools-800w-JA.jpg.jxl
 5104 img/tools-800w-JA.jpg.jxlL
10791 img/tools-800w-JA.jpgL
  145 img/tools-800w-JA.jpg.txt
24608 img/tools-800w-JA.jpg.webp
 3844 img/tools-800w-JA.jpg.webpL
% script/lossless_JPEG_compress img/tools-800w-JA.jpg
INFO: file 31421 shrunk to 31415 (non-progressive) img/tools-800w-JA.jpg
INFO: file 31421 shrunk to 30661 (semi-progressive) img/tools-800w-JA.jpg

A ~2.5% lossless saving on a common hero image... (The .jpgL had a little bit of fat in it also.)

IPTables to the rescue

While trying to do some of that optimisation, my logs were being sullied by several bogus repeated requests per second from a UAE data centre IP, which I blocked entirely in iptables.

2023-09-02: Crashy Server

For some reason sencha is now crashing every few days.

One crash while capturing monthly data archives, just after revision 53008, lost one of the less important monthly data sets (CPU temperature, data/RPi/cputemp/202308.log), and got the SVN client side in a severe tangle that took some hours to undo.

We will never talk about revisions 53009 to 53012 again, OK?

Things were largely sorted by revision 53019, though some clean-up is still needed of log files damaged during the crash.

(Corruption clean-up in aisle 53038...)

Maybe time for a server upgrade to a much newer kernel?