Earth Notes: On Website Technicals (2023-12)

Updated 2024-04-02.
Tech updates: ShellCheck, 503 work storage, Dataset citation glitch, 2024 already, video pages lost, ffmpeg, intensities, INTVKL.
I am into what I really hope is the final straight for my first academic paper... I also hope to replace the microSD card in the RPi that controls the Thermino this month: it is sitting on my desk glowering at me.

2023-12-29: INTVKL Live

National Grid announces commercial operations of Viking Link: a few days earlier than I expected!

2023-12-28: Grid Intensity Updates

The GB National Grid ESO does not seem inclined to publish a new set of 'static' intensity figures for the countries that GB interconnects to, eg for bootstrapping a dynamic calculation.

One side-effect of that (and my calculation not seeing non-transmission-connected GB solar and wind) is that my GB grid carbon intensity value is typically 10–20% higher than National Grid ESO's.

And there is no extant weight for Denmark / INTVKL at all.

So instead I have turned to the Electricity Maps Data Portal and taken 2022 whole-year values for each country to use for 2024 onwards. In all cases other than France they are significantly lower than before; France is a little higher. I expect these values to bring my calculation closer to that of NG ESO, albeit mine are still static.

As intensities are generally falling, using a slightly old figure provides a slight margin and makes under-estimation less likely.

All the new figures are set to automatically roll in with the new year, eg:

intensity.fuel.INTVKL.2024/=0.126 # DK

The six relevant country values (IE and FR have multiple interconnectors) are:

# BE 2022 mean 0.123: Electricity Maps / ENTSO-E.
# DK 2022 mean 0.126: Electricity Maps / ENTSO-E.
# FR 2022 mean 0.062: Electricity Maps / ENTSO-E.
# IE 2022 mean 0.288: Electricity Maps / ENTSO-E.
# NL 2022 mean 0.262: Electricity Maps / ENTSO-E.
# NO 2022 mean 0.012: Electricity Maps / ENTSO-E.
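
As a minimal sketch of how static per-country figures like these can feed a calculation (this is not the actual Earth Notes code), the import side can be treated as a flow-weighted mean. The function name, the input shape, and the convention that negative flows are exports to be ignored are all assumptions made for illustration:

```python
from typing import Dict, Optional

# Static 2022 whole-year mean intensities per country, copied from the list above
# (same units as those figures).
COUNTRY_INTENSITY: Dict[str, float] = {
    "BE": 0.123,
    "DK": 0.126,
    "FR": 0.062,
    "IE": 0.288,
    "NL": 0.262,
    "NO": 0.012,
}


def import_weighted_intensity(imports_mw: Dict[str, float]) -> Optional[float]:
    """Return the flow-weighted mean intensity of current imports.

    imports_mw maps a country code to net flow into GB in MW.
    Assumption: negative or zero flows (exports) contribute nothing.
    Returns None when nothing is being imported.
    """
    importing = {c: mw for c, mw in imports_mw.items() if mw > 0}
    total = sum(importing.values())
    if total <= 0:
        return None
    weighted = sum(COUNTRY_INTENSITY[c] * mw for c, mw in importing.items())
    return weighted / total
```

With equal imports from FR and NO the result is simply the mean of the two static values; a country that GB is exporting to drops out of the sum.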

2023-12-27: avconv est Mort, Vive le ffmpeg

libav/avconv is no longer supported by Homebrew on macOS, as the project has been abandoned.

I have reverted to using ffmpeg and done some minor initial fixes, but things will break and will need to be patched up.

(See previous fun.)

Similarly, UnCSS seems unsupported and is now spouting multiple DeprecationWarnings, and my PurifyCSS backup also seems to be unloved. So I am going to try PurgeCSS:

% purgecss --css my.css --content my.html | jq --raw-output '.[].css'

At the very least PurgeCSS versioning seems off: @5.0.0 reports itself as 4.1.3 and @4.1.3 reports itself as 4.1.2.

It also does not seem to be good at removing CSS the way that I am using it! So, cancelled for now...

2023-12-25: Video is Not the Main Content of the Page

GSC apparently demoted many sites' video pages, including EOU's two video podcast episodes.

GSC still knows that the videos exist, but will now no longer show the pages in video search results, claiming that Video is not the main content of the page.

As a small experiment to regain GSC's attention for these two, I have decorated with an id the principal video/audio object for podcast episodes, and pointed the otherwise-useless page mainEntityOfPage at it. I have also updated the target type from WebPage to MediaObject for such podcast episodes.

2023-12-27: clutter

Today I am tinkering, reducing some of the clutter and vertical space above podcast episode video/audio players in particular.

Now is a good time because the forced rebuild of everything, due to changing the main wrapper script, is still pending because of the gloomy weather; thus I can avoid all main pages being built twice for this GSC video tweak!

I am now suppressing the table of contents and 'See:' auto-inserted page link for podcast episodes.

I am also now repeating the up-to-section links to the bottom of the page, and combining multiple up-to-section tags into one, and suppressing the top set for podcast episodes.

I manually suppressed the hero image for the latest video podcast episode with the appropriate NOHERO directive, matching the older one.

These videos are now somewhat more in your face.

2023-12-29: GSC unconvinced

As of the 29th this update did not seem to have convinced GSC!

So for the (video, hero-suppressed) episodes I now suppress the by-line and updated line too.

I also shortened the titles/headlines so as not to wrap.

All this gets the video closer to the very top of the page.

I may yet suppress the page description or move it elsewhere for these.

2023-12-30: wider

GSC still seems unmoved.

I have allowed videos to break out of the page container and then grow to up to twice their natural width to help fill the viewport, still kept centred. The appropriate fullw CSS needs to be enabled.

This currently creates one ugly artefact: a horizontal scrollbar at the page foot. I ought to be able to kill this with overflow-x:hidden, but have not been able to do so yet.

Nope: Google "sidebar" ads float all over the video, yuck!

2024-01-02: smaller

Where NOHERO is flagged for a podcast episode, I am now dropping the page description (above the media), and have set the H1 headline font to be small on narrow screens.

(I may rescue and re-insert the pgdescription text at the foot.)

2024-01-04: NOADS

Next: trying to disable ads for the video podcast pages!

2024-01-11: keyword

I have added the word "video" to the title of one target page.

I have also wrapped most of the body text for those pages beyond the transcript into details.

I have also added a false requiresSubscription attribute, and a true isAccessibleForFree attribute, to each video and audio object metadata set.
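
For illustration only, here is a sketch of what such a metadata set might look like if expressed as schema.org JSON-LD. The pages may well carry this as microdata instead, and the function name and example values are hypothetical; only the requiresSubscription and isAccessibleForFree properties and their values come from the note above:

```python
import json
from typing import Any, Dict


def video_object_metadata(name: str, content_url: str, upload_date: str) -> Dict[str, Any]:
    """Build a schema.org VideoObject description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "contentUrl": content_url,
        "uploadDate": upload_date,
        # The two attributes added in this round of tweaks.
        "requiresSubscription": False,
        "isAccessibleForFree": True,
    }


if __name__ == "__main__":
    # Hypothetical values, purely to show the serialised shape.
    meta = video_object_metadata(
        "Example episode", "https://example.com/ep.mp4", "2023-12-25T12:00Z"
    )
    print(json.dumps(meta, indent=2))
```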

I have also made it possible to wrap the transcript of a video (or audio) object in details to reduce visual clutter, and I have done this for the two target pages and one audio podcast episode.

I have dropped some optional footer material for all podcast episodes.

2024-01-18: [VIDEO]

I have appended the literal text [VIDEO] to the end of the name and description metadata. I have also added "video" before each generated instance of "clip". For balance and consistency, I am likewise adding "audio" before "clip" for those items.

2024-01-19: playsinline

I am adding the playsinline attribute to videos, since that reflects the intent, though it does not prevent full-screen play.

2024-01-22: footer trim

Screenshot 20240123 GSC video not main content

Still no dice, so trimming the footer a little where NOHERO:

  • Omitting the "Page media" line.
  • Omitting the "This server is powered by off-grid solar PV" line.
  • Omitting the "Popularity rank" line.
  • Omitting the "Tags" line (and the section linking is now in the navigation only).
  • Omitting the 'revision' and 'rebuilt' and 'featured months'.
  • Omitting the word count when also a podcast episode.

2024-02-04: success!

As of this morning, GSC seems to be reporting that yesterday it indexed one of the pages as a video page.

2023-12-19: Paper Submitted

My paper was submitted today, so expect different excuses for not getting things done, for a while!

2023-12-11: Earlier Every Year

GSC is complaining about some of my uploadDates for videos (some from 2007!) being just dates, as it now wants date-time values with a timezone! The schema allows the field to be Date or DateTime.

So in one place where I am extracting this largely-imaginary value from a file timestamp (which may or may not survive a server move), I have added T%H:%MZ to the end of the generation string. Where the time is not available I am making the time a nominal noon by adding T12:00Z; spurious precision and a waste of seven bytes...
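
A minimal sketch of the two cases, assuming the file timestamp is in epoch seconds and is rendered in UTC (which is what the trailing Z claims). The function names are made up for this example and the real generation is done elsewhere in the site build:

```python
from datetime import date, datetime, timezone


def upload_date_from_timestamp(epoch_seconds: float) -> str:
    """Render a file timestamp as a date-time with a UTC designator.

    Uses the same T%H:%MZ suffix as described above.
    """
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%MZ")


def upload_date_nominal_noon(date_only: str) -> str:
    """Extend a bare YYYY-MM-DD date with a nominal noon time."""
    # Reject anything that is not a valid ISO date before decorating it.
    date.fromisoformat(date_only)
    return date_only + "T12:00Z"
```

The noon fallback is seven extra characters of spurious precision per value, as noted, but it keeps every uploadDate in a single date-time shape.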

Since I am updating this core script which will force all pages to rebuild, I updated the site/footer copyright date to 2024 to save a rebuild later. Jumping the shark, coming earlier every year, or prudent efficiency: you decide!

Uptime

The main server (sencha) uptime reached 101 days. Then after some heavy-ish work including ansible stuff, it crashed!

2023-12-10: Dataset Citation Rollercoaster

Screenshot 20231210 GSC Datasets

The GSC Dataset stats have been strange! The initial leap from ~8 to ~30 was GSC taking all my bibliography Dataset citations as first-class entries. To make the red errors go away I supplied the description field that GSC was complaining about (as an abbreviated abstract) and provided reasonable coverage of license, creator and publisher in types it approved of. Which is all good, because I improved my page metadata and the source BibTeX and the processing of the latter to the former!

But dropping back to ~8 entries makes me think that someone at Google saw a monumental climb in Dataset entries, thought What have I done? and demoted citations of CreativeWorks again!

As of this morning, though, I can still find at least one in Google Dataset Search.

2023-12-12: this morning the count is back to 30...

2023-12-13: this morning the count is down to 9, not including any bibliography citation entries.

2024-01-07: having crept up to 11 yesterday, the count is now at 33, with complaints about missing licenses for bibliography citations again...

2024-01-28: Now showing as back down to 10!

2024-01-30: Now showing as 33!

2023-12-09: ShellCheck

I have tentatively started playing with ShellCheck to clean up some of my important shell scripts.

This has been both good and bad so far. Good in that nothing huge was found in the few small scripts that I pointed ShellCheck at, and I have made one or two suggested improvements. Bad in that a couple of the scripts that I poked seem to be broken and I cannot easily tell if that matters, ie if they are still doing anything important!

Work storage: refuse some HTTP requests when battery is low

Some HTTP requests are annoying because they likely come from spiders and are also likely wasteful or at least not urgent (and maybe could have been avoided entirely).

For example, image requests without a Referer might reasonably be deflected with a 5xx temporary failure code when we are short of (off-grid) energy, ie the battery state-of-charge is low.

This is a form of work storage or deferral until better times.

(See previous work storage note and next.)

A generic way to reject no-Referer requests is implied by the Apache documentation's example for rejecting hotlinking:

RewriteCond "%{HTTP_REFERER}" "!^$"
RewriteCond "%{HTTP_REFERER}" "!www.example.com" [NC]
RewriteRule "\.(jpg|png)$"    "-"   [F,NC]

For my purpose the response would most likely need to be replaced with a code such as 429 Too Many Requests (RFC 6585) or 503 Service Unavailable.

(It is a shame that there is not a more specific code for "Conserving resources [and your request may be frivolous]"!)

I also want to gate this code on the real-time battery state, such as the presence or absence of a flag file. Maybe something like this:

RewriteCond "%{HTTP_REFERER}" "^$"
RewriteCond /path/to/flag/for/low_battery.flag -f
RewriteRule "\.(jpg|png)$"    "-"   [R=503,L]

Here is a very close example (503 Temporarily Unavailable, with trigger):

RewriteEngine On
RewriteCond %{ENV:REDIRECT_STATUS} !=503
RewriteCond "/srv/www/example.com/maintenance.trigger" -f
RewriteRule ^(.*)$ /$1 [R=503,L]

As a test, I have inserted this for the Gallery:

RewriteCond "%{HTTP_REFERER}" "^$"
RewriteCond /run/EXTERNAL_BATTERY_VLOW.flag -f
RewriteRule "^/_exhibits/" "-" [L,R=503]

This could be set to trigger only for plain-file exhibits over a certain size.

It also might be good to shut out bots that fake the Referer, eg to be the same as the target URL.

Here is what a few match/deferral entries look like in the logs:

gallery.hd.org:80 X.X.X.X - - [09/Dec/2023:18:55:09 +0000] "GET /_exhibits/places-and-sights/_more2007/_more05/England-London-Docklands-Canary-Wharf-HSBC-tower-vertiginous-view-1-DHD.jpg HTTP/1.0" 503 473 "-" "-"
gallery.hd.org:80 X.X.X.X - - [09/Dec/2023:19:01:18 +0000] "GET /_exhibits/places-and-sights/Italy-Pisa-the-Leaning-Tower-BGK.jpg HTTP/1.1" 503 473 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b"
gallery.hd.org:80 X.X.X.X - - [09/Dec/2023:19:04:44 +0000] "GET /_exhibits/medicine/_more2007/_more11/PICC-line-peripherally-inserted-central-catheter-on-arm-and-paraphernalia-syringes-sterile-solutions-etc-8-DHD.jpg HTTP/1.1" 503 473 "-" "Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)"
gallery.hd.org:80 X.X.X.X - - [09/Dec/2023:19:05:01 +0000] "GET /_exhibits/baby/_more2006/_more05/playtime-for-5-five-month-old-baby-girl-facial-expressions-concentrating-cheeky-attentive-face-closeup-mono-26-JR.jpg HTTP/1.1" 503 473 "-" "Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)"
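
To see who is being deferred, log lines in the shape shown above can be tallied by user agent. This is only a sketch: the regular expression assumes exactly the vhost-prefixed layout of the sample lines, and the function name is invented for the example:

```python
import re
from collections import Counter
from typing import Iterable

# vhost:port client ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<vhost>\S+) (?P<client>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def count_deferrals(lines: Iterable[str]) -> Counter:
    """Count 503 responses to no-Referer requests, keyed by user agent."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # Skip anything not in the expected layout.
        if match["status"] == "503" and match["referer"] == "-":
            counts[match["agent"]] += 1
    return counts
```

Note that this counts every 503 with an empty Referer, so any 503s from other causes would be included too.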

The cost of the extra filtering, particularly testing for the filesystem flag, may outweigh the energy saved, at least with this simple formulation.

(I have added a filter on the request ahead of the flag test to minimise unnecessary filesystem access.)
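
The ordering idea can be sketched outside Apache as follows: do the cheap string tests on the request first, and only touch the filesystem for requests that could actually be deferred. This is an illustration of the logic, not the live configuration; the function and parameter names are hypothetical, while the flag path and URL prefix are taken from the Gallery rule above:

```python
import os
from typing import Callable, Optional

FLAG_PATH = "/run/EXTERNAL_BATTERY_VLOW.flag"  # Flag file from the Gallery rule.
DEFERRABLE_PREFIX = "/_exhibits/"              # URL prefix from the Gallery rule.


def should_defer(
    path: str,
    referer: Optional[str],
    flag_path: str = FLAG_PATH,
    flag_exists: Callable[[str], bool] = os.path.isfile,
) -> bool:
    """Return True if the request should get a 503 to save energy."""
    # Cheap checks first: these need no filesystem access.
    if not path.startswith(DEFERRABLE_PREFIX):
        return False
    if referer:
        return False  # A Referer is present, so serve normally.
    # Only now pay for the flag-file test.
    return bool(flag_exists(flag_path))
```

The flag_exists parameter is there only so that the check can be exercised without a real flag file.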

2023-12-29: visible effect

Screenshot 20231229 5XX Crawl errors from Gallery

We have had some very gloomy days and the off-grid battery has been getting VLOW. The 503s seem to be working according to the GSC crawl stats graph...