Earth Notes: On Website Technicals (2022-12) Updated 2023-01-01.
2022-12-29: Christmas Slump
2022-12-18: Static Gallery
I now have a minimal static Gallery site laid out on
sencha (the new-ish main Raspberry Pi server).
Minimal: a home page, raw exhibit files, and an HTML landing page for each exhibit. (The accession files beside the exhibits are also available.) Lots of best practice is absent at this scale.
There is also dark mode support due to some header stuff copied from EOU!
All of this is done with fairly bare scripts and Posix utilities, eg
A little more patching up of missing exhibits, and removal of bogus ones, happened. A more thorough check of possible loss and damage (eg with MD5) is needed.
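One way such an MD5 check might look is sketched below. This is only an illustration and not the scripts actually used on the Gallery: the function names are made up for the example, and it assumes that both copies of the exhibit tree are mounted locally. It builds a path-to-digest manifest for each tree and then reports what is missing, added or changed.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_manifest(root):
    """Map each regular file under root (by relative path) to its MD5 digest.

    Symlinks and other non-regular files are skipped.
    """
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and not p.is_symlink()
    }


def compare_manifests(old, new):
    """Return (missing, added, changed) relative paths going from old to new."""
    missing = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return missing, added, changed
```

A saved manifest can also be re-checked later against the same tree, which would catch damage that creeps in after the initial copy.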
In the middle of all of this
sencha rebooted, after well over 200 days uptime. I should probably
Next up was to have
sencha serve the Gallery's IP address, and have Apache serve the site and any aliases. Simplifying the cruft in DNS down to the obvious
www.gallery.hd.org and pointing them at the existing primary address seems good! (The Gallery need not have a unique IP address.)
With the very basics working, I'm doing some tidy-up at the edges. Some of this is trimming state space in the search engines' heads.
- Redirect (301, permanent) of any URL with a hostname that is not the canonical
gallery.hd.org to the canonical.
- Kill off all (now defunct) query parameters: any URL carrying them is redirected (301, permanent) to the same URL stripped of them.
- Exhibit cache life set to be one year (ie nominally forever); for now everything else (other than site furniture) is a little over a month.
- robots.txt, partly to rein in bad bots, but also to link to a sitemap in due course.
- sitemap.xml, initially containing only the loc entry per URL to keep things fast and simple. Google has been having difficulty reading anything over ~5k entries (Couldn't fetch), but Bing seems fine, and various verifiers are happy with it...
- Set up DNS and Apache to accept inbound links from many of the old aliases/mirrors, which are then redirected to the canonical URL.
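The two redirect rules in the list above can be sketched as a single function. This is a rough illustration in Python rather than the Apache configuration actually in use: the function name is invented, and keeping the incoming scheme unchanged is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "gallery.hd.org"  # the canonical hostname named in the text


def canonical_redirect(url):
    """Return the URL to 301-redirect to, or None if url is already canonical.

    A URL is treated as canonical when its host is the canonical host and it
    carries no query parameters.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == CANONICAL_HOST and not parts.query:
        return None
    # Keep scheme and path, force the canonical host, drop query and fragment.
    return urlunsplit((parts.scheme, CANONICAL_HOST, parts.path or "/", "", ""))
```

Doing both fixes in one step means a request to an old alias with stale parameters needs only a single redirect hop to reach the canonical URL.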
2022-12-12: Gallery Patching
I expect a sprawling archive such as the Gallery's to accumulate bit rot.
In an attempt to detect this, I compared the new and old copies of the exhibit database file trees byte-by-byte:
% rsync -nirc /local/galleryDB/photos/ /mnt2/galleryDB/photos
skipping non-regular file "_i18n"
skipping non-regular file "locationDB.properties"
>fc.T...... clothing/_more2012/_more09/child-clothes-4-6-four-to-six-girl-and-2-3-two-to-three-boy-to-pass-on-or-give-to-school-or-charity-shops-skirts-pyjamas-trousers-slippers-cardigans-jeans-socks-shirts-32-JR.jpg
>f+++++++++ light/_more2021/_more05/.accession.LED-multicolour-WiFi-LIFX-Mini-Colour-pendant-in-lampshade-green-1-DHD.jpg.xml
>f+++++++++ light/_more2021/_more05/LED-multicolour-WiFi-LIFX-Mini-Colour-pendant-in-lampshade-green-1-DHD.jpg
>f+++++++++ mechanoids/_more2020/_more11/phone-cordless-Siemens-Gigaset-AL180-ECO DECT-1-DHD.jpg
>f+++++++++ mechanoids/_more2020/_more11/phone-cordless-Siemens-Gigaset-AL180-ECO DECT-2-DHD.jpg
>fc.T...... places-and-sights/_more2005/_more12/England-London-Kingston-Market-Place-German-Christmas-market-stalls-shopping-gifts-handmade-presents-toys-sweets-food-trinkets-back-12-DHD.jpg
>fc.T...... places-and-sights/_more2011/_more08/England-Isle-of-Wight-Sandown-Zoo-interesting-lepidopteran-black-with-white-striped-resting-on-bullet-point-on-signboard-6-DHD.jpg
>fc.T...... places-and-sights/_more2020/_more05/England-London-Kingston-Bonner-Hill-Cemetery-on-sunny-May-bank-holiday-lockdown-quiet-bright-green-flowers-grass-trees-20200525-114652-DHD.jpg
The current copy of the
child-clothes image appears to be intact. The old one appears broken (truncated). (Others marked
>fc.T seem more subtly damaged/changed.)
The LED-multicolour-WiFi image appears to be of flowers in grass. It appears to be redundant, so has been removed along with the accession file.
The ECO DECT (note the space rather than dash) files appear to be duplicates of files with dashes, so have been manually removed.
The current copy of the
England-London-Kingston-Market-Place image seems to be intact.
The current copy of the
England-Isle-of-Wight image seems to be intact.
The current copy of the
England-London-Kingston-Bonner-Hill image seems to be intact.
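For a larger run, the itemised rsync output could be triaged mechanically rather than by eye. The sketch below is illustrative only (the function names are invented) and relies on the documented 11-character itemize string YXcstpoguax: a run of + after the first two characters marks a newly created item, and a c in the third position marks a checksum mismatch for a regular file.

```python
def parse_itemized(line):
    """Split one `rsync -i` output line into (flags, path).

    Returns None for lines that are not itemised changes, such as the
    "skipping non-regular file" messages.
    """
    if len(line) < 13 or line[11] != " " or line[0] not in "<>ch.":
        return None
    # The path is everything after the separator, so embedded spaces survive.
    return line[:11], line[12:]


def classify(flags):
    """Give a rough label for an 11-character itemize string."""
    if flags[2:] == "+" * 9:
        return "new"
    if flags[2] == "c":
        return "content-differs"
    return "other"


def summarise(output):
    """Group the paths in rsync -i output by their rough classification."""
    groups = {}
    for line in output.splitlines():
        parsed = parse_itemized(line)
        if parsed:
            flags, path = parsed
            groups.setdefault(classify(flags), []).append(path)
    return groups
```

Splitting at a fixed column rather than on whitespace matters here, because some of the exhibit filenames contain spaces.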
2022-12-11: Archive green
I have taken fresh 'solid'
LZMA2 full archives with
xz of some of the key data sets that were collected on the just-decommissioned RPi 2B+ host
- CPU temperature records of
- Frequent-sample extracts for Enphase:
- Daily full JSON samples for Enphase:
- Local raw old-format temperature records at the OpenTRV (REV2) receiver:
- Remote old-format 16WW OpenTRV data:
- Remote new-format 16WW OpenTRV data:
- 10-minute logs from
These mainly run until when
sencha became the primary data-collection and logging server.
Some data, such as the 1-minute SunnyBeam generation logs, had already been archived.
Most of these should in principle be triply-redundant. Anything valuable should already have been captured. However, undetected corruption of data, including missed logs, may have happened, and this provides another route for data recovery.
Also the solid continuous form of some of these archives may be useful.
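As a rough illustration of what 'solid' buys, the sketch below packs several files into one tar stream compressed as a single xz (LZMA2) stream, and compares that against compressing each file separately. Using tar as the container is an assumption made for the example, as are the function names. The text above only says that xz was used.

```python
import io
import lzma
import tarfile


def solid_xz_archive(files):
    """Pack {name: bytes} into one tar stream compressed as a single xz stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def per_file_xz_size(files):
    """Total compressed size if each file were compressed on its own."""
    return sum(len(lzma.compress(data)) for data in files.values())


def unpack(archive):
    """Read a tar.xz archive back into {name: bytes}."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:xz") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar if m.isfile()}
```

A solid archive lets the compressor exploit redundancy between files, which tends to suit logs with a repetitive format. The cost is that reaching a late file means decompressing everything before it.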
Card adaptor power suck
It seems that mounting the old filesystems (read-only) on the RPi 3B via my micro SD card USB adaptor chain was using as much power as keeping the entire old server running (~900mW) according to measurements with
powermng. That shows how efficient the old RPi was, and how inefficient random bits of consumer tech can be.
I've now adjusted the footer on every page back to saying server consumption is ~1W, rather than the ~2W that has been there for ~2Y!
The HTML 5 minimiser does not seem to be reliably doing one of the operations that it is set to: sorting the class names in a
Doing so may slightly improve compression. In any case it should not hurt.
I manually edited various scripts that are generating HTML so as to generate class attributes with the class names already sorted.
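A post-processing pass could do the same sorting for any generated HTML. The following is a minimal sketch and not the minimiser's own logic: it uses a regular expression rather than an HTML parser, only handles double-quoted attributes, and would also rewrite matching text outside tags. The function name is invented.

```python
import re

# Match class="..." but not, for example, data-class="...".
_CLASS_ATTR = re.compile(r'(?<![\w-])class="([^"]*)"')


def sort_class_names(html):
    """Rewrite each double-quoted class attribute with its names sorted."""
    def _sorted(match):
        return 'class="' + " ".join(sorted(match.group(1).split())) + '"'
    return _CLASS_ATTR.sub(_sorted, html)
```

The order of names within a class attribute does not affect which CSS rules apply, so the rewrite should be safe. Any compression gain comes from identical attribute strings recurring more often.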
2022-12-07: You've Got Mail!
Well, I do not! The RPi 2 (B+) that was running mail (
dovecot) and other things is now sitting on my desk. I have removed its micro SD card and mounted relevant partitions (read-only!) via a USB reader on the production RPi 3B to be able to quickly access and copy over files and configuration.
% df -h
Filesystem Size Used Avail Use% Mounted on
...
/dev/sda2 3.5G 3.0G 409M 88% /mnt
/dev/sda3 115G 107G 2.4G 98% /mnt2
Looking first at
dovecot config differences between the old (2B+) and new as-shipped main config I have decided to preserve all the changes in
/etc/dovecot/conf.d/10-master.conf to improve security and reduce resource use. The changes also turn off IMAP service, and plain unprotected POP3, and enable POP3S on port 995. I added
port = 0 to both normal and secure IMAP config to try to ensure that only POP3 is activated.
In /etc/dovecot/conf.d/10-ssl.conf I am requiring SSL. I have to provide the cert and key
.pem files to make this work.
I have obtained and modified
mkcert.sh to do this, see for example SSL certificate creation.
At this point dovecot will start, listening on port
I changed in DNS
pop3.exnet.com to point to the RPi 3B.
On my MacBook I turned WiFi off and on and restarted my mail client to flush DNS caches.
My MacBook mail client was then able to connect to the POP3 (RPi 3B) server and collect the dross that had been accumulating in the mailbox for me there!
(This process was actually more messy, and had help from the MBA client's "Connection Doctor" and
telnet and various logs, etc!)
Note that these changes feel too intricate/delicate to enforce via Ansible, at least for now. So, as things currently stand, I will have to redo these bits of config by hand when next moving mail.
I folded in my previous
/etc/aliases to that of the RPi 3B, and aliased another old admin ID to route to me.
I ported across my
sendmail configuration more-or-less as-is. There seemed to be a couple of new features, documented as 'safe', that I allowed / set up. I turned off
One gotcha that got me again is
sendmail hanging, eg when trying to rebuild aliases, due to
/etc/hosts not giving the fully-qualified domain name for itself. This is bad:
This is good, and lets
sendmail get stuff done rather than repeatedly sleeping for 60s hoping that things will change:
X.X.X.X sencha.exnet.com sencha
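Since this gotcha keeps recurring, a small pre-flight check might help before touching sendmail. The sketch below is an illustration with invented function names. It assumes that the first name after the address on a hosts line is the one treated as canonical, and that a canonical name containing a dot counts as fully qualified.

```python
def canonical_name(hosts_text, short_name):
    """Return the first name on the hosts line that lists short_name, or None."""
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and short_name in fields[1:]:
            return fields[1]
    return None


def has_fqdn(hosts_text, short_name):
    """True if the host's canonical name in hosts_text looks fully qualified."""
    name = canonical_name(hosts_text, short_name)
    return name is not None and "." in name
```

Run against the contents of /etc/hosts with the machine's short name, this would accept the good line shown above and reject a hypothetical line that lists only the short name.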
Due to the sheer volume of SPAM attempts (at least historically), and partly to reduce wear on the SD card, some mail-related logging should be reduced in
Also, an oddity appearing regularly in the mail error logs, though apparently otherwise harmless:
Dec 8 14:13:19 sencha dovecot: log: Error: Received master input for invalid service_fd XX: XX nnnnn BYE
Dec 8 14:13:20 sencha dovecot: log: Error: Received master input for invalid service_fd YY: YY mmmm BYE
Dec 8 14:13:21 sencha dovecot: log: Error: Received master input for invalid service_fd ZZ: ZZ nnnnn BYE
In any case, mail seems to be broadly working, hurrah!
2022-12-05: Machine Down!
Mail server (green) not responding after power cycling last night and this morning; may require TLC plugged into TV. It is long overdue (more than a year) to move mail and DNS away... It seems that the SD card interface is glitchy (which may explain observed behaviour over a long time). A few errors had also accumulated, which
fsck sorted. Now I am spending several hours with
fsck -c -v checking for bad blocks (I don't know if this even works)...
I took the hint to move pekoe to its newly working wired connection officially. That means, amongst other things, pekoe can take over some permanent services from green, such as being a DNS secondary. I'll aim to get that working tomorrow, with the magic of virtual network interfaces meaning that pekoe can in principle take over green's IP address for that service, so all glue records at the various registrars can stay as-is.
Even leaving the RPi running the
fsck -c -v overnight did not fix things. Some more hand-holding in the morning brought the machine back to a runnable state.
2022-12-02: [[ citationID ]]
It is now possible to drop into any main page a
[[citationID]] or cite tag, at most one per source line, and magic will happen.
Secondly, a References section is created at the end of the page, with a sorted, de-duplicated list of references. Each of these is linked back to the bibliography.html page. Each also has the title shown. If there is a URL available then the title directly links to it, to help external document access be as simple and direct as possible.
Here is an example: [hart-davis202216ww]
It is also possible to create a
<!-- GHOSTREF [[ .. ]] --> record on a line by itself to force creation of a References entry without anything else visible in the text.
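The collection step behind the References section might look something like the sketch below. It is a guess at the shape of the logic and not the site's actual build script: the function name is invented, and treating a GHOSTREF line like any other line with a tag is an assumption.

```python
import re

# A tag is [[id]], with optional spaces inside the brackets.
_CITE = re.compile(r"\[\[\s*([^\[\]\s]+)\s*\]\]")


def collect_citation_ids(source_lines):
    """Return the sorted, de-duplicated citation IDs found in the source lines."""
    ids = set()
    for line in source_lines:
        match = _CITE.search(line)  # at most one tag is expected per source line
        if match:
            ids.add(match.group(1))
    return sorted(ids)
```

Each ID returned would then be looked up in the bibliography to obtain the title and, where available, the URL for its References entry.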
These citation IDs are generally all lower-case. For accessibility (a11y), where sections consist of two or more adjacent concatenated words not separated by digits or punctuation, camel case (initial capital) should be used on the second and subsequent concatenated words. [santana2020camel] This is expected (for example) to help screen-readers.
It was mentioned that when a link is included in a Mastodon post, all instances that see that post, because of followers there,
GET the linked page, thus creating a surge of activity on the server.
I performed a little experiment at 17:42Z, and here is a slightly anonymised sample from my logs. (I do not believe Mastodon server instance names to be private.)
[04/Dec/2022:17:42:43 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://mas.to/) Bot"
[04/Dec/2022:17:42:46 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://dju.social/) Bot"
[04/Dec/2022:17:42:53 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://mastodon.green/) Bot"
[04/Dec/2022:17:42:56 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://toot.wales/) Bot"
[04/Dec/2022:17:42:57 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://mastodon.me.uk/) Bot"
[04/Dec/2022:17:43:04 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://ohai.social/) Bot"
[04/Dec/2022:17:43:10 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://mastodon.energy/) Bot"
[04/Dec/2022:17:43:13 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://macaw.social/) Bot"
[04/Dec/2022:17:43:13 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://toot.community/) Bot"
[04/Dec/2022:17:43:14 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://mastodon.scot/) Bot"
[04/Dec/2022:17:43:15 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://mastodon.art/) Bot"
[04/Dec/2022:17:43:18 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://mstdn.social/) Bot"
[04/Dec/2022:17:43:19 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://mastodon.org.uk/) Bot"
[04/Dec/2022:17:43:19 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.0.4 (Mastodon/3.5.5; +https://mastodonapp.uk/) Bot"
[04/Dec/2022:17:43:21 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://fediscience.org/) Bot"
[04/Dec/2022:17:43:24 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://c.im/) Bot"
[04/Dec/2022:17:43:25 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://fosstodon.org/) Bot"
[04/Dec/2022:17:43:26 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://bayes.club/) Bot"
[04/Dec/2022:17:43:27 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://dataprotection.social/) Bot"
[04/Dec/2022:17:43:27 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://nerdculture.de/) Bot"
[04/Dec/2022:17:43:32 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://mastodon.online/) Bot"
[04/Dec/2022:17:43:44 +0000] "GET /bibliography.html HTTP/1.1" "http.rb/5.1.0 (Mastodon/4.0.2; +https://chaos.social/) Bot"
More than 20 hits over a minute in this case. Not that intensive, but worth bearing in mind, especially as the count of following instances rises. Still, the server cache should be nice and warm for those hits!
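To keep an eye on such surges as the follower count grows, the log sample can be summarised mechanically. The sketch below assumes the slightly anonymised line layout shown in the sample above, not a full Apache log format, and the function names are invented for the example.

```python
import re
from datetime import datetime

_LINE = re.compile(
    r'\[(?P<when>[^\]]+)\] "GET (?P<path>\S+) [^"]*" '
    r'"[^"]*\(Mastodon/(?P<version>[^;]+); \+(?P<instance>[^)]+)\) Bot"'
)


def mastodon_fetches(log_text):
    """Return (timestamp, path, instance URL) for each Mastodon fetch found."""
    hits = []
    for m in _LINE.finditer(log_text):
        when = datetime.strptime(m.group("when"), "%d/%b/%Y:%H:%M:%S %z")
        hits.append((when, m.group("path"), m.group("instance")))
    return hits


def surge_summary(hits):
    """Return (distinct instance count, seconds from first to last fetch)."""
    times = [when for when, _, _ in hits]
    span = (max(times) - min(times)).total_seconds() if times else 0.0
    return len({instance for _, _, instance in hits}), span
```

Tracking the distinct-instance count and the time span per post would show whether the surge is getting wider or merely more concentrated.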
- [hart-davis202216ww] 16WW Eddi PV DHW Diverter Export Margin Analysis (2022-08)
- [santana2020camel] Why you should use camel case for your hashtags