Earth Notes: On Website Technicals (2024-01)

Updated 2024-05-06.
Tech updates: data curation, sitemap ping gone, heating polling again, offline too eager, RPi upgrade, more citation, stampede DDoS.
New Year (and new month): data curation for the win! And the heat battery RPi finally got an upgrade and fix... Less so the Mastodon Preview Stampede / DDoS issue, though I have put in place reasonable mitigations.

2024-01-29: Mastodon Preview Stampede

When a link is included in a Mastodon/Fediverse toot, every single downstream instance that has one of your followers pulls down the HTML page, and then the og:image linked in the page header, for a preview.

(See previous note on 2023-09-18: Mastodon vs Twitter and og:image.)

While Twitter also pulled a preview image, it was typically 1 to 3 times, not hundreds of times in a 'stampede' that can DDoS the host server.

There is a partial Mastodon/Fediverse mitigation already, with randomised delays before the requests (eg up to 30 seconds), and more mitigations are being discussed.

But in the interim I am tempted to add some mitigation on my side: when a bot makes an image request with no Referer, as happens in the stampede, and a lower-fi version of the image is available, return that instead, along with a Vary: Referer header to help caches do the right thing.

In this case, for any /img/....jpg or .png (as such og:images are constrained to be), if there is a matching .jpgL or .pngL file alongside then that is the low-fi version and can be served instead.

This will affect all image spidering, eg by Google, but that probably does not hurt much, as EOU is not likely being found much through images. And these low-fi images should still be usable.

For the og:image for the grid intensity page (roughly hourly auto-tooted, so hit often by stampedes) there was already a .png.webp version, and I now manually created a lower-bit-depth PNG that looks fine, and saves about 4kB per hit...

% ls -al img/grid-demand-curves/gCO2perkWh-1.png*
  7535 17 Sep 12:27 img/grid-demand-curves/gCO2perkWh-1.png
  4478 17 Sep 12:27 img/grid-demand-curves/gCO2perkWh-1.png.webp
  3545 29 Jan 18:08 img/grid-demand-curves/gCO2perkWh-1.pngL

For the currently most-popular thermal imaging page and its image, nearly 200kB could be saved per hit, and the image is still OK, albeit fuzzy, at a low (~10) quality factor:

% ls -al img/thermal-images-tiled-2x1-1208w.jpg*
222997 15 Jan  2019 img/thermal-images-tiled-2x1-1208w.jpg
 60628 22 Aug  2021 img/thermal-images-tiled-2x1-1208w.jpg.webp
 26718 29 Jan 19:13 img/thermal-images-tiled-2x1-1208w.jpgL

First attempt at the code:

# Serve lower-fi (smaller) images to bots with no Referer set.
# These will typically not Accept WEBP in place of JPEG/PNG.
# These are the original request with an 'L' appended, where available.
# These will typically be manually generated for frequently tooted pages.
# This will cover most search engines (which may be unacceptable).
# This is intended to mitigate Mastodon page preview og:image jpg/png stampedes:
# Using HTTP:Referer should get Referer added to the Vary list.
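RewriteCond %{HTTP:Referer} ^$
RewriteCond %{THE_REQUEST} /img/.*.(jpg|png)
# Only rewrite when the 'L' variant exists (assumed form of the test).
RewriteCond %{DOCUMENT_ROOT}/$1.$2L -f
RewriteRule ^/(.+)\.(jpg|png)$ /$1.$2L [L]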

The RewriteCond %{THE_REQUEST} /img/.*.(jpg|png) line is an optimisation to try to avoid checking the filesystem for the 'L' variant (which is potentially relatively slow and expensive) unless the request looks plausibly the right sort of request.

This code could probably be integrated with the Save-Data part that can also serve up WEBP image variants.

The above does not seem to be setting the Vary: Referer header in the response with the low-fi image. (This may be why.) Trying an alternative formulation:

<If "%{HTTP:Referer} == ''">
  <FilesMatch "\.(jpg|png)$">
    Header merge Vary Referer
  </FilesMatch>
RewriteCond %{THE_REQUEST} /img/.*.(jpg|png)
RewriteRule ^/(.+)\.(jpg|png)$ /$1.$2L [L]
</If>

That sets Vary: Referer all the time, and does not return the low-fi image when it should! A variant with <If "-z req_novary('Referer')"> did not help either.

I have now put this fairly general code near the top:

# No-Referer requests (often bots) may be handled differently several ways.
# Make that clear to caches, though ONLY for such requests, which is naughty.
SetEnvIf Referer ^$ NO_REFERER_SET
Header merge Vary Referer env=NO_REFERER_SET

and I have reinstated:

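RewriteCond %{HTTP:Referer} ^$
RewriteCond %{THE_REQUEST} /img/.*.(jpg|png)
# Only rewrite when the 'L' variant exists (assumed form of the test).
RewriteCond %{DOCUMENT_ROOT}/$1.$2L -f
RewriteRule ^/(.+)\.(jpg|png)$ /$1.$2L [L]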

This combination seems to be working, and I should stop here, but I would like to try merging with the Save-Data rules. So that block is commented out and the image handling block is now:

# Serve alternate compact WEBP or low-fi images if possible.
# Ensure that caches handle Accept and Save-Data correctly.
# Treat an empty Referer as equivalent to "Save-Data: on",
# mainly to mitigate Mastodon page preview og:image jpg/png stampedes,
# but reducing bot load from image requests is probably a good thing also.
<IfModule mod_headers.c>
  <FilesMatch "\.(jpg|png)$">
    Header merge Vary Accept
    Header merge Vary Save-Data
  </FilesMatch>
  # Four cases for primary images (.png, .jpg):
  #      Save-Data   image/webp  Serve
  #  1   on          yes         .xxx.webpL if extant
  #  2   on          no          (or no .webpL), .xxxL if extant
  #  3   x           yes         .xxx.webp if extant
  #  0                           .xxx (no rewrite)
  # Aim to serve in priority: .webpL, L, webp, original.
  # (The file-existence tests below are of an assumed form.)
  # 1 - Save-Data, can accept WEBP, have .webpL file: serve it!
  RewriteCond %{HTTP:Save-Data} on [NC,OR]
  RewriteCond %{HTTP:Referer} ^$
  RewriteCond %{HTTP:Accept} image/webp
  RewriteCond %{DOCUMENT_ROOT}/$1.$2.webpL -f
  RewriteRule ^/(.+)\.(jpg|png)$ /$1.$2.webpL [L]
  # 2 - Save-Data, cannot accept WEBP or have no .webpL, have L file: serve it!
  RewriteCond %{HTTP:Save-Data} on [NC,OR]
  RewriteCond %{HTTP:Referer} ^$
  RewriteCond %{HTTP:Accept} !image/webp [OR]
  RewriteCond %{DOCUMENT_ROOT}/$1.$2.webpL !-f
  RewriteCond %{DOCUMENT_ROOT}/$1.$2L -f
  RewriteRule ^/(.+)\.(jpg|png)$ /$1.$2L [L]
  # 3 - can accept WEBP, have .webp file: serve it!
  RewriteCond %{HTTP:Accept} image/webp
  RewriteCond %{DOCUMENT_ROOT}/$1.$2.webp -f
  RewriteRule ^/(.+)\.(jpg|png)$ /$1.$2.webp [L]
  # 0 - by default do not rewrite.
</IfModule>

This allows me to create .webpL versions of frequently-hit files, though I suspect that the Mastodon previews in particular will not use them.

% ls -al img/grid-demand-curves/gCO2perkWh-1.png*
  7535 17 Sep 12:27 img/grid-demand-curves/gCO2perkWh-1.png
  4478 17 Sep 12:27 img/grid-demand-curves/gCO2perkWh-1.png.webp
  3545 29 Jan 18:08 img/grid-demand-curves/gCO2perkWh-1.pngL
  3484 30 Jan 13:06 img/grid-demand-curves/gCO2perkWh-1.png.webpL
% ls -al img/thermal-images-tiled-2x1-1208w.jpg*
222997 15 Jan  2019 img/thermal-images-tiled-2x1-1208w.jpg
 60628 22 Aug  2021 img/thermal-images-tiled-2x1-1208w.jpg.webp
 26718 29 Jan 19:13 img/thermal-images-tiled-2x1-1208w.jpgL
 18650 30 Jan 13:25 img/thermal-images-tiled-2x1-1208w.jpg.webpL
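
The priority order in the rules above (.webpL, then L, then .webp, then the original) can be mimicked offline as a sanity check. This is a minimal shell sketch, not the server logic itself: pick_variant and its argument order are hypothetical, and it assumes that the variants sit alongside the original under the document root.

```shell
#!/bin/sh
# Sketch only: mimic the rewrite priority (.webpL, L, .webp, original).
# pick_variant is a hypothetical helper, not part of the site's tooling.
# Usage: pick_variant DOCROOT REQUEST_PATH SAVE_DATA REFERER ACCEPT
pick_variant() {
    docroot=$1
    req=$2
    save_data=$3
    referer=$4
    accept=$5

    # Only .jpg and .png requests are candidates for rewriting.
    case "$req" in
        *.jpg|*.png) ;;
        *) printf '%s\n' "$req"; return 0 ;;
    esac

    # An empty Referer is treated like "Save-Data: on".
    lowfi=no
    case "$save_data" in [Oo][Nn]) lowfi=yes ;; esac
    if [ -z "$referer" ]; then lowfi=yes; fi

    webp=no
    case "$accept" in *image/webp*) webp=yes ;; esac

    if [ "$lowfi" = yes ] && [ "$webp" = yes ] && [ -f "$docroot$req.webpL" ]; then
        printf '%s\n' "$req.webpL"      # case 1
    elif [ "$lowfi" = yes ] && [ -f "$docroot${req}L" ]; then
        printf '%s\n' "${req}L"         # case 2
    elif [ "$webp" = yes ] && [ -f "$docroot$req.webp" ]; then
        printf '%s\n' "$req.webp"       # case 3
    else
        printf '%s\n' "$req"            # case 0: no rewrite
    fi
}
```

For example, pick_variant /path/to/docroot /img/thermal-images-tiled-2x1-1208w.jpg "" "" "" models a no-Referer bot that does not advertise WEBP support.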


The page build script now logs a message where a page's preview og:image is large and has no low-fi version, of the form:

INFO: POSSIBLE PREVIEW STAMPEDE ISSUE: no img/OpenTRV/Wembley-Stadium-Arch-under-construction-AJHD.jpgL and img/OpenTRV/Wembley-Stadium-Arch-under-construction-AJHD.jpg is   211675 bytes so larger than 20000 bytes in .whitepaper-OpenTRV-TRV1.5-North-London-trial-winter-2016.html.
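
A check of roughly this shape could produce such a message. This is only a sketch: stampede_check is a hypothetical name, not the actual build-script code, and the 20000-byte default is taken from the sample message above.

```shell
#!/bin/sh
# Sketch only: warn when an og:image is large and has no low-fi 'L' variant.
# stampede_check is a hypothetical helper, not the real build script.
# Usage: stampede_check PAGE IMAGE [LIMIT_BYTES]
stampede_check() {
    page=$1
    img=$2
    limit=${3:-20000}

    # Nothing to report if the image is missing or a low-fi version exists.
    if [ ! -f "$img" ]; then return 0; fi
    if [ -f "${img}L" ]; then return 0; fi

    # wc may pad its output; the arithmetic expansion normalises it.
    size=$(( $(wc -c < "$img") ))
    if [ "$size" -gt "$limit" ]; then
        echo "INFO: POSSIBLE PREVIEW STAMPEDE ISSUE: no ${img}L and $img is $size bytes so larger than $limit bytes in $page."
    fi
}
```

The message keeps the image path and size in the same whitespace-separated positions as the sample line, so field-based filtering of the log would still work.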

Running the following commands helps find candidates for which to provide L and WEBP versions, manually:

% egrep STAMPEDE .build/* | awk '{print $9, $11}' | sort | uniq -c | sort -n
   6 img/EGC/radiator-2845463-1920w-G.jpg 79629
   7 img/16WW/gas-meter-m3.jpg 159023
   7 img/solar-PV-panels-on-roof.jpg 182359
   9 img/OpenTRV/20150515M2MNetworkOutline-small.png 49842
   9 img/solar-cells-800w.jpg 37167
  13 img/people-meeting-2.jpg 283880
% egrep STAMPEDE .build/* | awk '{print $9, $11}' | uniq -c | awk '{print $1*$3, $0}' | sort -n
1000734    2 img/compost-bin-in-garden.jpg 500367
1113161    7 img/16WW/gas-meter-m3.jpg 159023
1262173    1 img/front-entrance-after.jpg 1262173
1305399    1 img/wind/Herne-Bay-to-Kentish-Flats-windfarm-horizon-20210728-evening.jpg 1305399
1419400    5 img/people-meeting-2.jpg 283880
1464314    1 img/audio/ambient/16WW/20230909T12Z-16WW-garden-ambient-noon-hot.jpg 1464314
2318905    5 img/tools-3200w-JA.jpg 463781
% egrep STAMPEDE .build/* | awk '{print $9, $11}' | uniq | sort -n -k 2
img/wind/North-Atlantic-extratropical-cyclone-earth20141110-NASA-JPL-Caltech-NOAA.jpg 860295
img/site/podcast/MacBook-Air-Leap-mobile-headset-Jabra-headset-Blue-Yeti-2.jpg 871015
img/front-entrance-after.jpg 1262173
img/wind/Herne-Bay-to-Kentish-Flats-windfarm-horizon-20210728-evening.jpg 1305399
img/audio/ambient/16WW/20230909T12Z-16WW-garden-ambient-noon-hot.jpg 1464314
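
Note that uniq -c only merges adjacent identical lines, so the count shown for an image depends on whether the input was sorted first. A sketch that totals per image regardless of input order, assuming that the image path and size are fields 9 and 11 as in the awk commands above (rank_by_total_bytes is a hypothetical name):

```shell
#!/bin/sh
# Sketch only: rank og:images by (pages using them) x (size in bytes).
# Reads STAMPEDE log lines on stdin; assumes path is field 9, size field 11.
# Output columns: total-bytes page-count image-path size-bytes.
rank_by_total_bytes() {
    awk '{ n[$9]++; size[$9] = $11 }
         END { for (img in n) print n[img] * size[img], n[img], img, size[img] }' |
        sort -n
}
```

Usage would be along the lines of: egrep STAMPEDE .build/* | rank_by_total_bytes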

2024-02-06: narrow

To avoid having warnings for ~50 pages where there is in practice no problem, I have narrowed to only warn where the og:image appears in the list of top bandwidth hogs. That list is currently dominated by audio files. Only ~7 out of the top 100 are JPEG or PNG.

2024-01-28: More Citation

I have added itemprop=citation metadata markup to short-form auto-inserted "References" section entries.

I have also added for full bibliography entries the creativeWorkStatus where it can be reasonably deduced. Online stuff does not count as Published in this sense, unlike books and journal articles which do. Anything @unpublished has status Draft.

2024-01-19: Heat Battery RPi Upgrade

I have a new microSD card sitting on my desk glaring at me, and I have cleared enough of the to-do list. So it is time to work on upgrading pekoe, the RPi (~2014, B+ V1.2) that listens to the Sunamp Thermino heat battery, and sometimes manages additional top-up from grid via the Eddi.

In 2022 I installed a "Lite" 32-bit version of Raspberry Pi OS. I want to do about the same this time, probably sticking to a 32-bit version of the OS even if the hardware supports a 64-bit version.

(During this process I'd also like to capture the current revision of the Thermino firmware for my records.)

Right now grid carbon intensity is far too high to be doing any top-ups, but I would like pekoe to be running when intensity is low enough, tomorrow!

I also want to avoid losing any existing logged data, and to avoid a long gap in logging during the transition.

I also want to maintain the ability to manage most system stuff via Ansible, keep the replacement on the same IP, etc, etc.

Raspberry Pi Imager

A good place to start seems to be to download the official Raspberry Pi Imager. I already had one on my MacBook Air, so I replaced it.

I selected the "lite" (no desktop) "legacy" 32-bit Debian Bullseye image, and the 128GB microSD card in a USB adapter.

I was asked if I wanted to customise OS settings (of course!), starting with grabbing WiFi passwords from my MBA account keychain.

I was asked for a hostname (pekoe), a username and password for me (though I will reconfigure it to be ssh-key-only very quickly), and whether to enable ssh (initially password authentication).

Writing the OS image to the microSD card and verifying took under a minute.

2024-01-20: upgrade!

After waiting until 4pm for the heat battery to mostly refill after my bath, it was time to upgrade.

Things to do:

  • Capture/commit partial main log for month to date.
  • Capture live logs as ZIP archive in repo.
  • Turn off pekoe (), and remove old microSD card.
  • Install new microSD card, power up, not networked, connect keyboard and monitor.
  • Configure fixed network address.
  • Configure via sudo raspi-config a few other tweaks such as minimum GPU memory.
  • Copy over private ssh key from old filesystem.
  • Power up (17:20Z) connected to wired network, discover on network ASAP, disable direct login with passwords.
  • Run ansible to bring image to expected configuration, except Thermino data gathering.
  • Add microSD-card-saving tweak (18:05Z): increased commit interval (commit=300).
  • Copy over /rw from the old SD card mounted read-only (sudo rsync -av -P --exclude things-causing-read-errors /mnt/rw/ /rw), then update the SVN working copy for /rw/docs-public/. Note that the old microSD card is likely cranky, which is why this upgrade is happening, and a forced fsck did not come back clean.
  • Symlink to Env, including latest live logs.
  • Manually run Thermino interface tools to test.
  • Compare old card (mounted on external filesystem) with new; fix up as needed. Old card was flaky enough that files and SVN metadata were rsynced in from the server and laptop to patch things up overnight.
  • Carefully reinstall Thermino (live) logs for new data to append to (Sunday 12:40Z).
  • Run ansible to bring image to expected configuration, including Thermino data gathering (Sunday 12:22Z).
  • Observe/test/fix.
  • Add microSD-card-saving tweak: disable swap (sudo systemctl disable dphys-swapfile.service and/or edit /etc/dphys-swapfile to set CONF_SWAPSIZE=0).
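
The swap-size edit in the last step could be scripted along these lines. This is a minimal sketch under stated assumptions: set_swapsize_zero is a hypothetical helper, it takes the config file path as an argument (it would be /etc/dphys-swapfile on the RPi, needing root), and it rewrites the file via a temporary copy, which may not preserve ownership or permissions.

```shell
#!/bin/sh
# Sketch only: force CONF_SWAPSIZE=0 in a dphys-swapfile style config file.
# set_swapsize_zero is a hypothetical helper.
# Usage: set_swapsize_zero FILE
set_swapsize_zero() {
    conf=$1
    tmp="$conf.tmp.$$"
    if grep -q '^CONF_SWAPSIZE=' "$conf"; then
        # Replace any active setting in place.
        sed 's/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE=0/' "$conf" > "$tmp" || return 1
    else
        # No active setting: append one.
        { cat "$conf"; echo 'CONF_SWAPSIZE=0'; } > "$tmp" || return 1
    fi
    mv "$tmp" "$conf"
}
```

Trying this on a copy of the file first, and diffing the result, seems prudent before touching the live system.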

2024-01-21: Thermino logging/top-up working again

The old microSD card was sufficiently unhappy that it took overnight repair via ssh to get the bulk of the data recovered. But the Thermino logging was working by 1pm today, and driving the Eddi over the network within the next quarter of an hour or so, so not much opportunity to fill boots with low-carbon heat was lost.

(My ~/.netrc needed copying over manually, else the Eddi actions would silently fail.)

The main remaining fixes are:

  • Re-enable DNS secondary via Ansible.
  • Disable NetworkManager?
  • Disable ModemManager?
  • Add power-saving tweaks such as powering down HDMI after boot if no one logged in?
  • Reduce syslog traffic to SD card.

2024-01-22: DNS reinstated

The right way to do this would have been to have ansible pull suitable files from the DNS master's build and install them.

However, that is a bit messy because I do not run ansible from that machine for starters.

So I took the easy way out: I diffed the old and the new unconfigured /etc/bind, and simply copied over the two relevant files once I had confirmed that only the stuff that I expected (my local zones and options) had changed.

Then I ran sudo /sbin/rndc reload.

I ran MxToolBox SuperTool DNS checker before and after.

Before the reload it complained about that DNS server being down; after the reload it was happy. So all good.

I am also trying the BIND rate-limit option to reduce abusive behaviour. I have tweaked it to fire only on what seems to be definitely bad traffic.

(Even a very simple use of tcpdump shows lots of probably-dodgy DNS activity...)

I also set up iptables for my peace of mind...

2024-01-14: Fediverse Polling on UK Home Heating

After testing the water on Mastodon, I think that I can now usefully resume my research/polling about the UK home central heating season.

Other social media platforms may also be used!

Offline page rebuild lazier

Off-line main pages were being rebuilt during the pages-incr incrementable rebuild, apparently through their HTML being validated. Such rebuilds should be monthly and batched, ie much less keen and costly. This over-eagerness has now been tamed!

2024-01-11: Sitemap Ping No More

Google no longer accepts sitemap pings, so I have stopped doing them.

Pinging atom sitemap feed...
Site ...
--2024-01-11 15:04:50--
Resolving (, 2a00:1450:4009:826::2004
Connecting to (||:443... connected.
HTTP request sent, awaiting response... 404 Sitemaps ping is deprecated. See
2024-01-11 15:04:51 ERROR 404: Sitemaps ping is deprecated. See

2024-01-02: Data Curation

While for some ... visions of sugar-plums danced in their heads ... (go easy on the psychedelics, tiger), I start to think about the capture and analysis of (mainly home) data for the full year (including the normal month-end work). There is some data not from 16WW, such as GB grid intensity.

I get some of that out of the way early where I can. Our failed gas boiler meant that I could take the gas meter reading very early knowing that it would not change before the year rolled!

One new feature of this year was my discovery of Zenodo and Dryad repositories for some of my open data, complete with DOIs, as a side-effect of my PhD research!

Eddi minute data

Another new thing this year was to fetch from the Eddi servers a full dump (1-minute resolution) of all historic data for 2023 (and 2022).

I wrote a new and slightly dodgy script to do so!

Output is one JSON array per day, each day on a line, the array being of maps of the form (with zero values omitted):


(I have not made this data public yet, as I need to think about privacy.)

% sh script/myenergi/ 2023 | xz --best -e > X.xz
% ls -l X.xz
3702160 X.xz
% xz -d < X.xz | wc -cl
     365 61135090

Lots of redundancy, which xz eats up.
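
To put numbers on that, the figures above give a compression ratio of about 16.5:1, and an average of about 167kB of JSON per day. A small sketch of the arithmetic (the function names are hypothetical):

```shell
#!/bin/sh
# Sketch only: arithmetic on the sizes reported by ls and wc above.
# Usage: compression_ratio RAW_BYTES COMPRESSED_BYTES
compression_ratio() {
    awk -v raw="$1" -v packed="$2" 'BEGIN { printf "%.1f\n", raw / packed }'
}

# Usage: mean_bytes_per_line RAW_BYTES LINES   (one line per day here)
mean_bytes_per_line() {
    awk -v raw="$1" -v lines="$2" 'BEGIN { printf "%d\n", raw / lines }'
}

compression_ratio 61135090 3702160   # prints 16.5
mean_bytes_per_line 61135090 365     # prints 167493
```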

Lost files

Once in a while it is good to check that I have not failed to check in some files. This is made trickier by my EOU mix of live data and checked-in stuff.

This is an example of how I can check (screening 'live' files, etc):

% svn status data | egrep -v '/live/' | egrep -v data/consolidated/energy
?       data/.flags/consolidated.flag
?       data/16WW-mains-water-inlet-temperature-month-cadence.csv
?       data/16WW-mains-water-inlet-temperature-month-cadence.mid
?       data/16WW-mains-water-inlet-temperature.csv
?       data/WW-PV-roof/raw/index.html

Those are all legitimately not checked in, so nothing more to do.
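
The same screening can be wrapped up as a filter. This is a sketch only: unversioned_non_live is a hypothetical name, and unlike the command above it keeps only unversioned ('?') entries, which is an added assumption about what counts as a possibly lost file.

```shell
#!/bin/sh
# Sketch only: filter `svn status` output on stdin, keeping unversioned ('?')
# entries that are not under a live/ directory or data/consolidated/energy.
unversioned_non_live() {
    awk '/^\?/ && !/\/live\// && !/data\/consolidated\/energy/'
}
```

Usage would be along the lines of: svn status data | unversioned_non_live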

Monthly automatic

At the start of each month a whole pile of stuff is turned into archive files and svn add is run, ready for me to commit, something like this:

% svn status -q
A    data/16WWHiRes/Enphase/202312.daily.production.json.gz
A    data/16WWHiRes/Enphase/202312.log.gz
A    data/FUELINST/log/202312.log.gz
A    data/OpenTRV/pubarchive/localtemp/202312.log.gz
A    data/OpenTRV/pubarchive/remote/202312.json.gz
A    data/RPi/cputemp/202312.log.xz
A    data/SunnyBeam/202312.gz
A    data/WW-PV-roof/raw/sunnybeam.dump.20231201.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231202.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231203.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231204.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231205.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231206.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231207.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231208.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231209.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231210.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231211.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231212.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231213.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231214.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231215.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231216.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231217.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231218.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231219.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231220.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231221.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231222.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231223.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231224.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231225.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231226.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231227.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231228.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231229.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231230.txt
A    data/WW-PV-roof/raw/sunnybeam.dump.20231231.txt
A    data/eddi/log/202312.daily.csv
A    data/eddi/log/202312.freqResp.log.gz
A    data/eddi/log/202312.hourly.csv.gz
A    data/eddi/log/202312.log.gz
A    data/heatBattery/log/202312.log.gz
A    data/powermng/202312.log.gz

Yearly extras

I capture yearly views of data, partly for redundancy, so usually with a different compression tool and format, typically xz rather than the more typical gzip. (Though xz turns out to be less robust than is ideal...)

I can generate these yearly archives from a catenation of (decompressed) monthly archives, but I prefer to gather the data from closer to the source or even in a different way to guard against some sorts of storage or decode failures. The new annual Eddi data dumps are in that spirit.

Here is a little shell history of manual activity to make these yearly dumps. (I usually head and tail re-decompressed output as a sense check.)

    32	14:17	pushd data/FUELINST/log/
    33	14:17	ls
    34	14:18	cat live/2023????.log | xz -v -9 > 2023.log.xz
    35	14:18	svn add 2023.log.xz
    36	14:18	xz -d < 2023.log.xz | head
    37	14:18	xz -d < 2023.log.xz | tail
    38	14:18	xz -d < 2023.log.xz | tail -1000
    39	14:19	popd
    40	14:19	svn -m "" commit
    41	14:19	more data/FUELINST/log/live/20240101.log
    42	14:20	head -3 data/FUELINST/log/live/{20231231,20240101}.log
    43	14:20	pushd data//SunnyBeam/
    44	14:20	ls
    45	14:21	cat /var/log/SunnyBeam/2023????.log | xz -v -e > 2023.xz
    46	14:23	svn add 2023.xz
    47	14:23	xz -d < 2023.xz | head
    48	14:23	xz -d < 2023.xz | tail
    49	14:23	popd
    50	14:23	svn -m "" commit
    51	14:25	push data/eddi/log
    52	14:25	pushd data/eddi/log
    53	14:25	ls -al
    54	14:25	ls 202?.*
    55	14:27	popwd
    56	14:27	popd
    57	14:27	pushd data//heatBattery/log/
    58	14:27	ls -al
    59	14:29	cat live/2022????.log | xz -v -e > 2022.log.xz
    60	14:30	svn add 2022.log.xz
    61	14:30	xz -d < 2022.log.xz | head
    62	14:30	xz -d < 2022.log.xz | tail
    63	14:30	cat live/2023????.log | xz -v -e > 2023.log.xz
    64	14:31	xz -d < 2023.log.xz | head
    65	14:31	xz -d < 2023.log.xz | tail
    66	14:31	svn -m "" commit
    67	14:31	svn add 2023.log.xz
    68	14:31	svn -m "" commit
    69	14:31	popd
    70	14:31	pushd data/powermng/
    71	14:31	ls -al
    72	14:32	cat /var/log/powermng/2022????.log | xz -v -e > 2022.log.xz
    73	14:32	svn add 2022.log.xz
    74	14:32	xz -d < 2022.log.xz | head
    75	14:33	xz -d < 2022.log.xz | tail
    76	14:33	cat /var/log/powermng/2023????.log | xz -v -e > 2023.log.xz
    77	14:33	svn add 2023.log.xz
    78	14:33	xz -d < 2023.log.xz | head
    79	14:33	xz -d < 2023.log.xz | tail

I forgot to do some of this for 2022, so caught up this time!
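
The repeated cat, xz and head/tail steps in that history could be bundled up roughly as follows. This is a sketch under stated assumptions, not what was actually run: make_year_archive is a hypothetical helper, daily logs are assumed to be named YYYYMMDD.log, and the sense check is automated as an xz integrity test plus a line-count comparison instead of eyeballing head and tail output.

```shell
#!/bin/sh
# Sketch only: build a yearly .xz archive from daily logs and sense-check it.
# make_year_archive is a hypothetical helper.
# Usage: make_year_archive SRC_DIR YEAR OUT_FILE
make_year_archive() {
    src=$1
    year=$2
    out=$3

    # Daily logs are assumed to be named YYYYMMDD.log.
    set -- "$src"/"$year"????.log
    if [ ! -f "$1" ]; then
        echo "no daily logs for $year in $src" >&2
        return 1
    fi

    cat "$@" | xz -9 > "$out" || return 1

    # Sense checks: xz integrity test, then line counts must match the inputs.
    xz -t "$out" || return 1
    want=$(( $(cat "$@" | wc -l) ))
    got=$(( $(xz -d < "$out" | wc -l) ))
    [ "$want" -eq "$got" ]
}
```

For example: make_year_archive live 2023 2023.log.xz && svn add 2023.log.xz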

And just before committing...

% svn status -q
A    data/powermng/2022.log.xz
A    data/powermng/2023.log.xz
A    data/heatBattery/log/2023.log.xz
A    data/heatBattery/log/2022.log.xz
A    data/SunnyBeam/2023.xz
A    data/FUELINST/log/2023.log.xz
A    data/16WWHiRes/Enphase/adhoc/net_energy_2021.csv.xz
A    data/16WWHiRes/Enphase/adhoc/net_energy_2022.csv.xz
A    data/16WWHiRes/Enphase/adhoc/net_energy_2023.csv.xz
A    data/OpenTRV/pubarchive/remote/2022.json.xz
A    data/OpenTRV/pubarchive/remote/2023.json.xz
A    data/OpenTRV/pubarchive/localtemp/2022.log.xz
A    data/OpenTRV/pubarchive/localtemp/2023.log.xz

