Earth Notes: On Website Technicals (2024-03)
Updated 2024-04-26.
2024-03-31: Mobile-only View
I have added a new dmob
desktop-only CSS class that hides content on wider-than mobile screens.
@media screen and (min-width:640px) {
  ...
  /* Mobile-only (hidden on wider screens). */
  .dmob{display:none}
}
I have done this so that I can drop in a hint, just above the (first) audio or video player on a page, telling mobile (narrow-screen) users who are on the full-fat desktop pages that they may wish to switch to 'lite':
(Slow or expensive connection? Switch to the mobile/lite view.)
This dmob
-class paragraph should disappear in full/desktop (non-mobile, non-offline) site view for wide-ish screens/viewports.
This is the first update to the desktop pages CSS in over three years.
2024-03-29: & of Doom
Apparently the world ends if an HTML &name;
entity code is allowed into an RSS title
tag. Such as in Repair Café ...
for example.
So I have made an evil hack. Any entity in that case is rewritten from &Xyz;
to X
, which is often the correct unaccented form. It will get me by for the moment...
I have done the same for titles in sitemap.atom
and other Atom files.
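As a minimal sketch of the rewrite described above (not the actual site build code), the following Python replaces each named entity with its first letter. The function name `flatten_entities` is hypothetical, and leaving the five entities that XML itself predefines (`&amp;` etc) untouched is an assumption of this sketch rather than something stated above.

```python
import re

# The five entities that XML itself predefines; leaving them alone is an
# assumption of this sketch, since they are already legal in RSS/Atom.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# A named entity such as &eacute; (numeric &#233; forms are not touched).
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def flatten_entities(title: str) -> str:
    """Rewrite &Xyz; to X, which is often the unaccented form."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        return name[0]
    return _ENTITY.sub(repl, title)


print(flatten_entities("Repair Caf&eacute;"))  # Repair Cafe
```

This is lossy by design: anything such as `&nbsp;` or `&copy;` would come out as a stray letter, so it only suits titles where accented letters are the main case.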
Auto dark mode for dashboard and intensity page
I inserted the magic in-line CSS @media (prefers-color-scheme:dark){body{background-color:#000;color:#eee}img{filter:brightness(.9)}}
for those two pages.
2024-03-27: mp3L
I am happy with my automatic ~16kbps Opus audio conversion. The ffmpeg
flags are:
-codec:a libopus -ac 1 -b:a 16k -f opus
For the lo-fi mono .mp3L
auto-generation I am trying lowest VBR bit-rate, mono, 10kHz low-pass, best (and slowest) compression:
-codec:a libmp3lame -qscale:a 9 -ac 1 -cutoff 10000 -compression_level 0 -f mp3
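To show how the two flag sets might be slotted into a full command line, here is a small Python sketch. The flags are copied from above; the `-y -i` wrapper, the helper name `ffmpeg_command` and the file names are assumptions for illustration, and the real conversion script may assemble things differently.

```python
import shlex

# Flag sets copied verbatim from the text above.
OPUS_L_FLAGS = ["-codec:a", "libopus", "-ac", "1", "-b:a", "16k", "-f", "opus"]
MP3_L_FLAGS = ["-codec:a", "libmp3lame", "-qscale:a", "9", "-ac", "1",
               "-cutoff", "10000", "-compression_level", "0", "-f", "mp3"]


def ffmpeg_command(src: str, dst: str, flags: list) -> list:
    """Build an argv list; -y (overwrite) and -i (input) are assumed here."""
    return ["ffmpeg", "-y", "-i", src, *flags, dst]


# Hypothetical file names, for illustration only.
cmd = ffmpeg_command("input.flac", "output.mp3L", MP3_L_FLAGS)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the conversion without any shell quoting worries.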
At least on my MacBook Air with a very recent ffmpeg
the results sound acceptable and have a reasonable size on my first test, with a 3-second video source:
3124848 img/video/Welcome-1.mp4
  16174 img/a/a/Welcome-1.l3124848.48k.mp4.mp3L
   7551 img/video/Welcome-1.opusL
The file
utility on the MBA reports for the .mp3L
:
Audio file with ID3 version 2.4.0, contains: MPEG ADTS, layer III, v1, 64 kbps, 48 kHz, Monaural
whereas a higher nominal bit rate (128kbps vs 64kbps) is claimed for one of the supposedly-equivalent .mp3L
files generated by Audacity:
Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 128 kbps, 48 kHz, Monaural
I adjusted the RSS podcast generator to request auto-generation of .mp3L
, but other than one edge case (now fixed, where I had checked in a smaller .mp3
than could be auto-generated as .mp3L
) that does not change the RSS feed file/content.
Extending the AUDIO
tag to auto-generate .mp3L
s where needed produced ~100 files.
2024-03-26: Low-carbon
On the GB Grid Intensity page I have renamed the zero-carbon
fuels total to low-carbon
because, as overall grid intensity falls, the difference stops being a rounding error!
OPPP
I am taking the stats service from Open Podcast Prefix Project for a quick spin to see its public stats.
2024-03-28: episode 60
I published the 60th podcast episode this afternoon. It will be interesting to see if human listeners show up in the stats at all.
(For this episode, for the first time, the .mp3L
and .opusL
lower-fi, lower-bandwidth audio variants were auto-generated.)
2024-03-25: Auto-generating Opus
The RSS feed now should auto-generate a very-low bandwidth Opus audio file from the primary enclosure (or a lossless version if available) if one is not checked in.
This is done by a shiny new script script/audioBuildLossy.sh
which borrows a lot of machinery from the lo-fi / hero image generation scheme.
I think the slightly older ffmpeg
on sencha
the EOU server may make slightly larger and/or less good Opus files than on my MacBook Air, and so the checked-in versions are not redundant. I can check in a hand-crafted .opusL
at any time for best results, and it should be used in preference next time the RSS feed is rebuilt.
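The choice of what to do for a given enclosure can be sketched as below. This is only an illustration of the preference order described above (checked-in `.opusL` first, else generate from a lossless master, else from the primary enclosure); the function name `plan_opus_l`, the set-of-paths interface and the assumption that the lossless master is a `.flac` with the same stem are mine, not taken from `script/audioBuildLossy.sh`.

```python
from pathlib import PurePosixPath


def plan_opus_l(primary: str, existing: set) -> tuple:
    """Decide how to obtain the .opusL for a primary enclosure.

    `existing` is the set of paths already present (checked in).
    Returns ("use", path) or ("generate-from", source_path).
    """
    stem = str(PurePosixPath(primary).with_suffix(""))
    checked_in = stem + ".opusL"
    if checked_in in existing:
        # A hand-crafted file always wins.
        return ("use", checked_in)
    lossless = stem + ".flac"  # Assumed naming of the lossless master.
    source = lossless if lossless in existing else primary
    return ("generate-from", source)


files = {"img/audio/diary/20240128.mp3", "img/audio/diary/20240128.flac"}
print(plan_opus_l("img/audio/diary/20240128.mp3", files))
```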
This can be extended to make the .mp3L
in future, and then even the nominal primary .mp3
from a lossless master.
I have also taken the opportunity to add the lossless FLAC, if present, as an alternateEnclosure
in its own right. It will not exist for video episodes. I may regret this (and undo it) if lots of people listen to the FLAC!
I have added the same facility to the AUDIO
tag: a .opusL
will be generated if one is not checked-in. So suddenly a lot of audio files will now have a smaller version to download, and that smaller one becomes the default for 'lite' pages.
Auto-generation of .opusL
s for VIDEO
tags is done too. That works somewhat differently inside when the source is the Gallery...
Next up will be .mp3L
s for RSS and AUDIO. That will require significant logic rejigging, and some testing that the achieved output (at ~48kbps nominal) is acceptable.
(Then will come .mp3
generation (at ~144kbps, from FLAC), with still more logic change and ear-based testing!)
2024-03-24: RSS Feed Files Update Less
I have adjusted the RSS feed files to be updated only when one of their HTML page source files does.
That also implies removing the channel lastBuildDate
, since it would simply change every time.
In its place goes a channel pubDate
with the timestamp of the newest HTML page source file.
The makefile
has been updated to have no direct dependencies on the .rss
files, but instead on .rss.built
files touched whenever a rebuild is attempted even if the .rss
does not change.
In part this is to give smart RSS readers a clue to poll less often, and to receive more 304 Not Modified
responses when they do.
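The idea of leaving the feed file (and so its Last-Modified time) alone unless the content really changes, while still recording that a build was attempted, can be sketched in Python as follows. This is an illustration of the technique only; the real work is done in the makefile, and the function name `update_feed` is hypothetical.

```python
from pathlib import Path


def update_feed(rss_path: Path, new_content: str) -> bool:
    """Rewrite the .rss only if its content differs; always touch .rss.built.

    Returns True if the .rss file was (re)written.
    """
    changed = (not rss_path.exists()
               or rss_path.read_text(encoding="utf-8") != new_content)
    if changed:
        rss_path.write_text(new_content, encoding="utf-8")
    # The stamp file is what other build targets depend on, so it is
    # touched on every attempt, whether or not the feed changed.
    Path(str(rss_path) + ".built").touch()
    return changed
```

Because an unchanged feed keeps its old modification time, a conditional request with If-Modified-Since can then be answered with 304 Not Modified.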
skipHours
I currently have skipHours
from 22h to 07h (UTC) inclusive, to cover times without incoming solar power for my server, and when I am likely to be asleep or at least less likely to be updating EOU.
It occurs to me that it would also be worth skipping 4pm to 7pm local time to avoid peak grid demand hours, typically also high carbon-intensity, at least towards this end of the connection even if some visitors are not in the UK.
Allowing for winter and summer time that could be another block of skipHours
from 15h to 18h (UTC) inclusive. That would bring us to a total of 14 skipped hours.
Not that anything is paying attention to them at all so far...
: there were 904 HTTP log entries (eg GET
or HEAD
) for /rss/podcast.rss
.
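The hour counting for the two skipHours blocks discussed above (22h to 07h and 15h to 18h, both inclusive, UTC) can be checked with a short Python sketch that also emits the RSS 2.0 element. The helper names are hypothetical.

```python
def hours_inclusive(start: int, end: int) -> list:
    """Hours from start to end inclusive, wrapping past midnight."""
    count = (end - start) % 24 + 1
    return [(start + i) % 24 for i in range(count)]


def skip_hours_xml(blocks: list) -> tuple:
    """Return (sorted hours, <skipHours> element) for inclusive blocks."""
    hours = sorted({h for s, e in blocks for h in hours_inclusive(s, e)})
    items = "".join(f"<hour>{h}</hour>" for h in hours)
    return hours, f"<skipHours>{items}</skipHours>"


hours, xml = skip_hours_xml([(22, 7), (15, 18)])
print(len(hours))  # 14
print(xml)
```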
2024-03-23: Time Cues in Transcripts
In order to make transcripts slightly easier to absorb, I am making time cues just a little less salient, but still accessible. I have slightly reduced opacity from a default:
to an opacity of 0.6:
It is subtle, but I think that it helps.
For visitors from the far future, this is the current cuetime
styling in case changed:
For now I have put this in its own CSS file, explicitly imported by the few (~10) files that use the cuetime
style.
Bibliography haircut
With a view to making the 'lite' bibliography smaller, and being able to produce less huge desktop bibliography pages too, I have started chopping out less-important data and metadata from the generated HTML.
from:
498811 bibliography.html
482031 m/bibliography.html
 65001 bibliography.htmlgz
 59569 m/bibliography.htmlgz
 50099 bibliography.htmlbr
 45788 m/bibliography.htmlbr
to:
498788 bibliography.html
259093 m/bibliography.html
 64977 bibliography.htmlgz
 50124 bibliography.htmlbr
 47161 m/bibliography.htmlgz
 38086 m/bibliography.htmlbr
2024-03-07: Podcast Feeds and Transcripts, Opus Audio
Apple now supports podcast transcripts. Apparently transcripts are generated automatically by default, by Apple, but a podcast:transcript
tag can point to a pre-made transcript. This can be in .vtt
(WebVTT) format, eg as I have already provided for the OpenTRV movie mashup video episode at img/video/OpenTRV/OpenTRV-mashup-1.mp4.vtt
. Where such a file exists, it is now being added this way to the RSS feed file.
A new namespace has been added to the RSS file to allow this tag: xmlns:podcast="https://podcastindex.org/namespace/1.0"
.
An HTML-ish format is also allowed, though the use of the time
tag is not compatible with normal HTML5. Still, for where that is not a conflict, and as a little preparation, I have now added an ID mo-1-transcript
to the episode's HTML page tag containing the transcript text, so it may be possible to link to that.
Adding .vtt
transcripts where there was no full text transcript should help search engines. The cue time points should also help with accessibility (a11y) and usability generally.
It may be worth running a low-fidelity transcription just to get timing points for the HTML transcript.
2024-03-08: Transcobble and a11y
Using Transcobble local transcription in the browser, and a hacked together awk
script, I have now added significant transcripts both as .vtt
files (auto-linked into the RSS feed and the HTML pages now), but also in the body of the pages concerned. Hurrah!
Five podcasts now have .vtt
captions files.
I should possibly generate these timestamped .vtt
transcripts for all new episodes and some extant ones, to help accessibility, even where I have the words already, eg when I read from a script. I could copy over some of the key timestamps from .vtt
to the HTML also. AI making things better!
Steno.fm
displays transcripts with highlighted sections as one listens...
That is a few years-old to-do items crossed off!
Observations after completing the transcripts for all 60 extant episodes (yesterday, ).
- A common failure mode of Transcobble / Whisper is to omit a chunk of text entirely, without any indication, from a few words to a few sentences.
- It can also get 'stuck' after some interesting non-speech sounds.
- It is not entirely consistent, eg sometimes transcribing the very same lossless audio clip as Earthnotes or Earth Notes.
2024-03-09: CORS
Testing https://www.earth.org.uk/rss/podcast.rss
with the CORS Tester says that This URL will not work correctly with CORS
.
Apparently I should add the header access-control-allow-origin
with value *
for at least that RSS file, so hereby new configuration:
<Location /rss>
    # Give podcast RSS and similar feed files an expiry time of 1h.
    ExpiresDefault "access plus 1 hour"
    # Allow CORS to work.
    Header set access-control-allow-origin *
</Location>
And now This URL will work correctly with CORS.
Hurrah!
(CORS issues may explain why I have not been able to see captions when viewing videos in my pages in the filesystem.)
All transcript .vtt
files need the CORS treatment too.
I have also added podcast:location
to the RSS feed file.
I am also updating to allow text/vtt
/ .vtt
files to be automatically offered DEFLATEd / GZIPped.
As of this evening both video episodes and five audio have WebVTT transcripts.
2024-03-11: extending expiry time overnight
Since most of the feed files (eg the podcast RSS) will only update when I am at the keyboard updating something, it should be possible to set a longer expiry time for them at night, eg:
<If "%{TIME_HOUR} -lt 7 || %{TIME_HOUR} -gt 21">
    # Give podcast RSS and similar feed longer expiry out of work hours.
    ExpiresDefault "access plus 3 hour 7 minutes"
</If>
<Else>
    # Give podcast RSS and similar feed files an expiry time of 1h.
    ExpiresDefault "access plus 1 hour 7 minutes"
</Else>
This seems to work!
This may reduce futile polling by more sophisticated clients, and save a little energy and bandwidth. (I do not think that my Firefox "Brief" RSS plugin will take any notice.)
I could also force longer cacheing when system power status is LOW
.
I am also adding a ttl
(maximum time to live in a client cache, in minutes) of 367 (ie 6h7) to the RSS file, to see what difference that makes, if any!
Amazon was polling every ~3 minutes after a ttl
of 127 originally went in... (At least Amazon seems to be pulling it DEFLATEd/GZIPped.)
...
18.246.X.X - - [11/Mar/2024:17:17:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
34.210.X.X - - [11/Mar/2024:17:23:16 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
18.232.X.X - - [11/Mar/2024:17:25:27 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
54.214.X.X - - [11/Mar/2024:17:29:12 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
34.217.X.X - - [11/Mar/2024:17:35:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
107.21.X.X - - [11/Mar/2024:17:38:27 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
52.12.X.X - - [11/Mar/2024:17:41:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
34.222.X.X - - [11/Mar/2024:17:47:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8303 "-" "Amazon Music Podcast"
...
I have used 2h7 to be distinct from the default RSS and ExpiresDefault
1h poll/expiry, and the ExpiresDefault
values above, and all the values are prime-ish to avoid clashes with other activity.
I can also try (hat-tip) the skipDays
and skipHours
tags, the latter being more directly relevant for a solar-powered system!
I am initially adding skipHours
for 00h to 07h inclusive, as likely to be quiet (no updates) and off-grid battery relatively low.
None of which has slowed down Amazon, it seems...
34.236.X.X - - [12/Mar/2024:06:25:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
34.222.X.X - - [12/Mar/2024:06:29:28 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
35.166.X.X - - [12/Mar/2024:06:35:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
44.211.X.X - - [12/Mar/2024:06:38:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
35.162.X.X - - [12/Mar/2024:06:41:15 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
35.92.X.X - - [12/Mar/2024:06:47:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
3.236.X.X - - [12/Mar/2024:06:51:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
35.87.X.X - - [12/Mar/2024:06:53:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
44.242.X.X - - [12/Mar/2024:06:59:15 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
...
2024-03-12: alternateEnclosure
I have sent an email to Amazon, and a toot to OverCast, asking why no use is made of (eg) If-Modified-Since
and skipHours
to reduce bandwidth and (carbon) footprint. I may yet write to Apple, which seems equally oblivious to these signals.
2024-03-21: done: to Apple: Is there any way that I can set your RSS fetcher to honour Cache-Control (or Expires) and/or SkipHours? Currently it does not seem to.
2024-03-28: response from Apple: ... we do not provide any technical support for the implementation of your requested changes.
So I asked for contact details of their climate change director.
Meanwhile, I have added podcast:alternateEnclosure
to my podcast RSS to list the ("Low bandwidth") 'L
' version where available, which may help some end users save their bandwidth!
Note the suggested single line of code to produce a tiny 16kbps Opus (audio/opus
) [valin2013high] version:
ffmpeg -y -i input.wav -c:a libopus -ac 1 -b:a 16k output.opus
On a couple of sample files it sounds acceptable, and I have captured one, so I am now providing it in general in the RSS and via standard AUDIO
and VIDEO
tag links also.
To support this, I have generated .opusL
files for all podcast episode audio and video, and checked them in.
That gives another useful 3x reduction step in file size / bandwidth, eg:
17290019 img/audio/diary/20240128.flac
 4859775 img/audio/diary/20240128.mp3
 1515255 img/audio/diary/20240128.mp3L
  562889 img/audio/diary/20240128.opusL
A ~2kB/s (nominal 16kb/s) Opus file is ~1% of the size of the nominal 48ksps 16-bit stereo uncompressed (eg WAV) (~192kB/s) file that it encodes.
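The arithmetic behind that ~1% figure, as a quick Python check (the helper name is hypothetical):

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int,
                         channels: int) -> int:
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


pcm = pcm_bytes_per_second(48_000, 16, 2)  # 48ksps, 16-bit, stereo
opus = 16_000 // 8                         # Nominal 16kb/s as bytes/s
print(pcm, opus, f"{opus / pcm:.1%}")      # 192000 2000 1.0%
```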
2024-03-13: ByteDance
And the first spider to pull down a .opusL
file is ... TikTok / ByteDance! Then Yandex and YaCy...
[13/Mar/2024:01:02:47 +0000] "GET /img/audio/meta2/meta2.opusL HTTP/2.0" 200 557403 "-" "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)"
...
[13/Mar/2024:14:39:33 +0000] "GET /img/video/20201112/20201112-EcoHomeLab-talk-on-smart-thermostatic-radiator-valves-TRVs.opusL HTTP/1.1" 200 2121570 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
[13/Mar/2024:14:43:32 +0000] "GET /img/video/OpenTRV/OpenTRV-mashup-1.opusL HTTP/2.0" 200 190720 "-" "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)"
[13/Mar/2024:14:44:44 +0000] "GET /img/audio/statscast/statscast-202004.opusL HTTP/2.0" 200 1332099 "-" "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)"
[13/Mar/2024:14:48:35 +0000] "GET /img/audio/mkaudio/battery-sounds.opusL HTTP/2.0" 200 322273 "-" "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)"
[13/Mar/2024:14:58:47 +0000] "GET /img/audio/statscast/statscast-202005.opusL HTTP/1.1" 200 1355681 "https://www.earth.org.uk/statscast-202005.html" "yacybot (/global; amd64 Windows 10 10.0; java 1.8.0_401; America/en) http://yacy.net/bot.html"
Bots/spiders including Google have made an appearance by the end of the day, but no actual humans/browsers other than me so far as I can see.
(2024-03-19: there is some slight evidence in the logs of a human user of one of the .opusL
files today, via an explicit download link, for OpenTRV-mashup-1.opusL
; hurrah!)
2024-03-14: lite Opus
The savings from Opus are so good, and the fidelity still good, that I am making it the default for non-desktop (eg mobile/lite) AUDIO
when available.
CanIUse reports ~97% browser support.
To allow the Firefox and Chrome audio
tag to play these Opus files I have had to change the declared MIME type to be audio/ogg
and indeed more specifically I have used audio/ogg;codecs=opus
. (They are Opus in an Ogg container.)
There's now often nearly 1:100 size ratio from lowest-fi .opusL
up to lossless .flac
. The order etc of the downloads after AUDIO
and VIDEO
tags should be tidied up to be clearer and in order and help users pick the right one. Redo to be smallest to largest, and with an indicator (eg meter
) of how big they are with lowest-fi 1/5 to highest 5/5, eg a logarithmic view. The default for the current view could be highlighted (eg bold).
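One way the logarithmic 1/5 to 5/5 indicator could be computed is sketched below in Python. This is only an illustration of the idea, using the file sizes of one episode listed earlier on this page; the function name `size_meter` is hypothetical and the scaling eventually used on the site may differ.

```python
import math


def size_meter(sizes: dict, lo: float = 1.0, hi: float = 5.0) -> dict:
    """Map file sizes onto a lo..hi scale, logarithmically.

    The smallest file scores lo and the largest hi; the result is
    ordered from smallest to largest.
    """
    smallest, largest = min(sizes.values()), max(sizes.values())
    span = math.log(largest) - math.log(smallest)
    meter = {}
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
        frac = 0.0 if span == 0 else (math.log(size) - math.log(smallest)) / span
        meter[name] = lo + (hi - lo) * frac
    return meter


sizes = {".flac": 17290019, ".mp3": 4859775, ".mp3L": 1515255, ".opusL": 562889}
for ext, score in size_meter(sizes).items():
    print(f"{ext}: {score:.1f}/5")
```

The scores could feed the value attribute of a meter element directly, with min=1 and max=5.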
As of ~1pm I have something for video that on a desktop page looks like:
As of ~4pm and with the 'standard' object size normalised to half scale, and the scale linearised:
I have also written a slightly more stern rebuff for badly-behaved RSS fetchers:
<If "%{TIME_HOUR} -lt 8 || %{TIME_HOUR} -gt 21">
    # Give podcast RSS and similar feeds longer expiry out of work hours.
    ExpiresDefault "access plus 3 hours 7 minutes"
    # For RSS files (which will have skipHours from 0 to 7 inclusive),
    # if there is no Referer and no conditional fetching, back off!
    RewriteCond %{HTTP_REFERER} ^$
    RewriteCond %{http:if-modified-since} ^$
    RewriteCond %{http:if-none-match} ^$
    RewriteRule "^/rss/.*\.rss$" - [L,R=429]
</If>
In the wee hours of the morning when generally the feeds do not update, and for which some of the feeds also have explicit skipHours
, then no Referer
and no attempt to do a conditional fetch (ie only getting the feed file if it has actually changed), will result in a 429
status code: Too Many Requests
.
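The intent of that rule, expressed as a plain Python predicate for clarity. This is a sketch of the decision only, not of how Apache evaluates it; the function name and the lower-case header dictionary are assumptions of the sketch.

```python
def should_send_429(hour: int, headers: dict) -> bool:
    """True if a feed request should be knocked back with 429.

    `hour` is the server's hour of day (0..23); `headers` maps
    lower-case request header names to values.
    """
    quiet_hours = hour < 8 or hour > 21

    def missing(name: str) -> bool:
        return headers.get(name, "") == ""

    return (quiet_hours
            and missing("referer")
            and missing("if-modified-since")
            and missing("if-none-match"))


print(should_send_429(3, {"user-agent": "Amazon Music Podcast"}))  # True
print(should_send_429(3, {"if-none-match": '"ed04-613c7305e7ab1"'}))  # False
```

A client that sends either conditional header is never refused, whatever the hour.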
2024-03-15: tweaked configuration
That did not work: another go!
<Location /rss>
    # Allow CORS to work.
    Header set access-control-allow-origin *
    <If "%{TIME_HOUR} -lt 8 || %{TIME_HOUR} -gt 21">
        # Give podcast RSS and similar feeds longer expiry out of work hours.
        ExpiresDefault "access plus 3 hours 7 minutes"
        # For RSS files (which will have skipHours matching the above),
        # if there is no Referer and no conditional fetching, back off!
        RewriteCond %{HTTP_REFERER} ^$
        RewriteCond %{HTTP:If-Modified-Since} ^$
        RewriteCond %{HTTP:If-None-Match} ^$
        RewriteRule "\.rss$" - [L,R=429]
    </If>
    <Else>
        # Give podcast RSS and similar feeds an expiry time of 1h.
        ExpiresDefault "access plus 1 hour 7 minutes"
    </Else>
</Location>
I have seen no sign of a human using Opus audio from the RSS podcast feed yet. I have changed the declared MIME type in the RSS feed of the Opus files to audio/ogg
in line with the change made for Firefox and Chrome to be able to play them in VIDEO
and AUDIO
tags.
Oops: it seems that Location
and rewrite rules do not play nicely: Although rewrite rules are syntactically permitted in <Location> and <Files> sections (including their regular expression counterparts), this should never be necessary and is unsupported.
It seems to work for now, but maybe I will have to move the rewrites out of the Location
block and adjust the RewriteRule
to start with /rss
?
With all that in mind, some new Apache config:
# Allow CORS to work for RSS feeds and transcripts.
# This allows browsers to access them from non-EOU pages.
<IfModule mod_headers.c>
    <FilesMatch "\.(rss|vtt)$">
        Header set access-control-allow-origin *
    </FilesMatch>
</IfModule>
<If "%{TIME_HOUR} -lt 8 || %{TIME_HOUR} -gt 21">
    # Give podcast RSS and similar feeds longer expiry out of work hours.
    ExpiresByType application/rss+xml "access plus 7 hours 7 minutes"
    # For RSS files (which will have skipHours matching the above),
    # if there is no Referer and no conditional fetching, back off
    # when battery is low.
    RewriteCond %{HTTP_REFERER} ^$
    RewriteCond %{HTTP:If-Modified-Since} ^$
    RewriteCond %{HTTP:If-None-Match} ^$
    RewriteCond /run/EXTERNAL_BATTERY_LOW.flag -f
    RewriteRule "^/rss/.*\.rss$" - [L,R=429,E=RSS_RATE_LIMIT:1]
    Header always set Retry-After "25620" env=RSS_RATE_LIMIT
</If>
<Else>
    # Give podcast RSS and similar feeds an expiry time of 1h.
    ExpiresByType application/rss+xml "access plus 4 hours 7 minutes"
</Else>
2024-03-16: Amazon slower
To avoid conflict between the 10 skipHours
(22h to 07h inclusive) and the ttl
, I have pushed the ttl
to 1507, ie ~25h. (Not exactly 24h, so as to help spread load around the day over time.)
When being knocked back with a 429
status, Amazon reduces polling to about once every 30 minutes rather than 3, so a ~10-fold reduction. It would just be better if the cache control (etc) was followed and If-Modified-Since
was used properly. Other clients apparently do.
52.38.X.X - - [15/Mar/2024:22:04:20 +0000] "GET /rss/podcast.rss HTTP/1.1" 429 436 "-" "Amazon Music Podcast"
44.200.X.X - - [15/Mar/2024:22:11:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 429 436 "-" "Amazon Music Podcast"
35.88.X.X - - [15/Mar/2024:23:05:12 +0000] "GET /rss/podcast.rss HTTP/1.1" 429 436 "-" "Amazon Music Podcast"
54.167.X.X - - [15/Mar/2024:23:19:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 429 436 "-" "Amazon Music Podcast"
In that 2h slot at the end of yesterday there were apparently 332 RSS fetches, vs 383 in the previous 2h, and 730 for 10:00 and 11:00. 5133 for the whole day.
Way more than really makes sense given that I suspect I have few RSS followers and essentially no RSS podcast listeners.
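Counts like these can be pulled from an Apache access log with a few lines of Python. The sketch below assumes the combined log format shown in the samples on this page; the function name `fetches_per_hour` is hypothetical and this is not necessarily how the figures above were produced.

```python
import re
from collections import Counter

# Matches the timestamp, request line and status of a combined-format entry.
_LINE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):(?P<hour>\d{2}):\d{2}:\d{2} [+-]\d{4}\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def fetches_per_hour(lines, path="/rss/podcast.rss") -> Counter:
    """Count requests for `path`, keyed by (day, hour)."""
    counts = Counter()
    for line in lines:
        m = _LINE.search(line)
        if m and m.group("path") == path:
            counts[(m.group("day"), int(m.group("hour")))] += 1
    return counts


sample = [
    '52.38.X.X - - [15/Mar/2024:22:04:20 +0000] "GET /rss/podcast.rss HTTP/1.1" 429 436 "-" "Amazon Music Podcast"',
    '35.88.X.X - - [15/Mar/2024:23:05:12 +0000] "GET /rss/podcast.rss HTTP/1.1" 429 436 "-" "Amazon Music Podcast"',
]
print(fetches_per_hour(sample))
```

Feeding it an open log file (which iterates line by line) gives the per-hour totals for a whole day.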
I could add a Retry-After: 11220
or similar header to the 429
response, maybe converting the RewriteRule
to:
RewriteRule "\.rss$" - [L,R=429,E=RATE_LIMIT:1]
Header always set Retry-After "11220" env=RATE_LIMIT
Which results in:
% wget -S -O /dev/null https://www.earth.org.uk/rss/podcast.rss
--2024-03-16 13:55:58--  https://www.earth.org.uk/rss/podcast.rss
Resolving www.earth.org.uk (www.earth.org.uk)... 79.135.97.78
Connecting to www.earth.org.uk (www.earth.org.uk)|79.135.97.78|:443... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 429 Too Many Requests
  Date: Sat, 16 Mar 2024 13:55:58 GMT
  Server: Apache
  Retry-After: 11220
  Content-Length: 227
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html; charset=iso-8859-1
2024-03-16 13:55:58 ERROR 429: Too Many Requests.
A normal response looks like (not showing the body):
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Sat, 16 Mar 2024 14:03:10 GMT
  Server: Apache
  Upgrade: h2
  Connection: Upgrade, Keep-Alive
  Last-Modified: Sat, 16 Mar 2024 13:34:51 GMT
  ETag: "ed04-613c7305e7ab1"
  Accept-Ranges: bytes
  Content-Length: 60676
  Vary: Accept-Encoding,Referer
  Cache-Control: max-age=4020
  Expires: Sat, 16 Mar 2024 15:10:10 GMT
  X-Frame-Options: DENY
  access-control-allow-origin: *
  Keep-Alive: timeout=5, max=100
  Content-Type: application/rss+xml
Length: 60676 (59K) [application/rss+xml]
I am also adding these top-level fields from the http://purl.org/rss/1.0/modules/syndication/
namespace:
<sy:updatePeriod>monthly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
And:
<podcast:updateFrequency rrule="FREQ=MONTHLY">Monthly</podcast:updateFrequency>
Given that at least one RSS client claims to obey HTTP cacheing headers but seems not to use If-Modified-Since
(or If-None-Match
), I am increasing the 'normal' expiry time to 4h7 and the during-skipHours
expiry time to 7h7 (the latter being more than half the skipHours
blackout period of 10h).
skipHours
expiry time to 10h7, so as to push the next allowed poll out of skipHours
.
2024-01-17: better audio download list and MIDI
I have now updated the way that audio downloads are listed (under the audio player widget) to approximately match what I did with the video player:
For a couple of the podcast 'music' episodes in particular where I have a MIDI 'source' file available, that now appears in the download list.
I have also, in the RSS part of the EOU Apache site configuration, inserted:
RewriteCond /run/EXTERNAL_BATTERY_LOW.flag -f
just above:
RewriteRule "\.rss$" - [L,R=429,E=RSS_RATE_LIMIT:1]
Header always set Retry-After "11220" env=RSS_RATE_LIMIT
so no 429
s are sent unless the battery is LOW. All the other hints and entreaties continue to be sent!
2024-03-28: SpaceCowboys
I received a meaningful response from the "Feeder" Android RSS reader author to my suggestion Have you considered support for these (RSS-feed-specified SkipHours tag, and server-supplied HTTP expiry time/date) to reduce bandwidth and CPU?
:
regular http cache-control is already supported.
what's skiphours?
I pointed him at the skipHours
definition in the spec. He noted that:
Regarding skipHours, any implementation would result in stochastic behavior for users. The feature was designed for servers which can pick when they sync, but Feeder is not in control of when its background sync runs. This is determined by Android.
A thought: maybe during skipHours you could avoid actually doing a poll when woken when there have been no non-skipHours since your last poll. The source is telling you that you will not (likely) have missed any change in that time.
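That thought can be sketched as a small predicate that a client might evaluate when woken. This is purely illustrative Python under my own assumptions (UTC datetimes, hour-granularity checks, a hypothetical function name); it is not Feeder's code.

```python
from datetime import datetime, timedelta, timezone


def can_skip_poll(last_poll: datetime, now: datetime, skip_hours: set) -> bool:
    """True if every hour from last_poll to now (UTC) is a skipHours hour.

    In that case the feed has promised not to change since the last poll,
    so the client can avoid fetching at all.
    """
    if now.hour not in skip_hours or last_poll > now:
        return False
    t = last_poll.replace(minute=0, second=0, microsecond=0)
    while t <= now:
        if t.hour not in skip_hours:
            return False
        t += timedelta(hours=1)
    return True


skip = set(range(0, 8)) | {22, 23}  # 22h to 07h inclusive
last = datetime(2024, 3, 28, 23, 10, tzinfo=timezone.utc)
now = datetime(2024, 3, 29, 2, 0, tzinfo=timezone.utc)
print(can_skip_poll(last, now, skip))  # True
```

The client still polls as soon as it is woken in any non-skipped hour, so nothing depends on controlling when the background sync runs.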
2024-03-05: HTML Micro-optimisation
In order to make pages work properly on the m-dot
domain, www
domain, and locally in the filesystem (and offline), and the http:
and https:
online variants, and have the unprocessed source HTML be valid and usable, I have been replacing a prefix for non-top-level-page objects such as data/...
in the source of //WWW
with eg //www.earth.org.uk/
for m-dot
pages. For desktop pages, so as to work off-line that replacement has been ./
so as to keep paths relative.
That waste of two bytes in most cases has been an annoyance. It has to be there for syntactic correctness when there is nothing after the prefix, eg a href=//WWW
becomes a href=./
for desktop and a href=//www.earth.org.uk/
for m-dot
.
I have now tweaked things that when there is something starting with [0-9a-zA-Z_]
after the prefix then for desktop pages the prefix can be removed entirely. A tiny 'minification'!
(This whole thing applies to //STATIC
URLs too!)
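A rough Python sketch of the prefix handling follows. It assumes, loudly, that the placeholder is the literal text //WWW immediately followed by the rest of the relative path; the exact form in the page sources is not spelled out above, and the function and target names are hypothetical.

```python
import re

# ASSUMPTION: the placeholder is the literal string //WWW, directly
# followed by the relative path (if any).
PLACEHOLDER = "//WWW"
# Placeholder followed by a character that can safely start a relative path.
_REMOVABLE = re.compile(re.escape(PLACEHOLDER) + r"(?=[0-9a-zA-Z_])")


def expand_prefix(html: str, target: str) -> str:
    """Expand the placeholder for 'desktop' or 'mdot' pages."""
    if target == "mdot":
        return html.replace(PLACEHOLDER, "//www.earth.org.uk/")
    if target == "desktop":
        # The micro-optimisation: drop the prefix where something safe follows.
        html = _REMOVABLE.sub("", html)
        # Otherwise keep ./ so that the attribute value is not empty.
        return html.replace(PLACEHOLDER, "./")
    raise ValueError(f"unknown target: {target}")


print(expand_prefix("<a href=//WWW>home</a> <img src=//WWWdata/x.png>", "desktop"))
```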
I will have forgotten some subtle constraint I am sure, but the main HTML validates and looks OK...
2024-03-04: METERCHANGE
I am hoping to extract more from the data that I already have, such as allowing data analysis to be able to work through meter changes, eg extend current analyses back further.
Also, if we get a heat pump I would at least like the ability to cope gracefully with removal of a gas meter entirely. To this end I have added a new METERCHANGE
data record to my main 'weekly'(ish) data set.
First I will apply the notion to my 'yearly' data file and deltas. For every non-empty meter field it is an adjustment to accumulate. It is the final (old) value from the old meter minus the start (new, often near-zero) value for the new meter. The accumulated adjustments should be applied to each subsequent reading to make a new continuous 'virtual' reading. (An all zeroes METERCHANGE
record has no effect, and an empty adjustment field is equivalent to a zero field.)
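The accumulation rule above can be sketched for a single meter column as follows. The record layout, the names and the example readings are all hypothetical and chosen for illustration; the real data files have several meter fields per record, each handled the same way.

```python
def adjustment(old_final: float, new_start: float) -> float:
    """METERCHANGE value: final old-meter reading minus new-meter start."""
    return old_final - new_start


def virtual_readings(records: list) -> list:
    """Turn raw readings into one continuous 'virtual' meter series.

    `records` is a list of (kind, value) pairs in date order, where kind
    is "READ" or "METERCHANGE". An empty (None) adjustment counts as zero.
    """
    offset = 0.0
    continuous = []
    for kind, value in records:
        if kind == "METERCHANGE":
            offset += value or 0.0
        else:
            continuous.append(value + offset)
    return continuous


# Made-up readings: old meter removed at 1150.0, new one starts at 5.0.
records = [
    ("READ", 1000.0),
    ("READ", 1100.0),
    ("METERCHANGE", adjustment(1150.0, 5.0)),
    ("READ", 20.0),
    ("READ", 60.0),
]
print(virtual_readings(records))  # [1000.0, 1100.0, 1165.0, 1205.0]
```

Deltas between consecutive virtual readings then work across the meter change just as they do elsewhere.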
My excitement is tempered from realising that the electricity import meters were unratcheted until (thus running backwards for exports, though did directly reflect net flow), and the gas meter being (100) cubic feet until .
Any new meters are at least likely to be in the same units (kWh and m^3), and will never run backwards!
I can reasonably push the 'weekly' data back to when the electricity import/export pair was installed: I have gas (m^3) and the first generation meter values for then.
References
- [barbaresi2021trafilatura] Trafilatura: A Web Scraping Library and Command-Line Tool for Text Discovery and Extraction
- [RAB2009RSS] RSS 2.0 Specification
- [valin2013high] High-quality, low-delay music coding in the opus codec
(Count: 3)