Earth Notes: On Website Technicals (2020-11)

Updated 2024-04-17.
Tech updates: work storage, Let's Encrypt auto-renew, lazy wins, slow https switch, AMP https only, soft canonical, Apple touch, Apache stop, ad sub.
Continuing to slowly shift functions from the old RPi to the new one this month, and enjoying how lazy pays off in unexpected ways...

2020-11-20: Ads Subtracted

I am tweaking to reduce blank spaces and pointless page weight for pages where Google won't show ads because of low traffic.

I have simplified the logic for desktop to be the same as AMP, ie to insert ad code only on pages with at least a specified popularity by page hits.
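
A minimal sketch of that threshold rule, assuming per-page hit counts are available as a simple mapping. The function name, the threshold value and the sample figures are all illustrative assumptions, not the site's actual build logic:

```python
# Hypothetical sketch: decide which pages get ad code from their hit counts.
# MIN_HITS_FOR_ADS is an assumed value, not the threshold the site uses.
MIN_HITS_FOR_ADS = 100


def pages_carrying_ads(hits_by_page, min_hits=MIN_HITS_FOR_ADS):
    """Return the sorted pages whose hit count meets the threshold."""
    return sorted(page for page, hits in hits_by_page.items() if hits >= min_hits)


# Made-up sample counts, for illustration only.
sample_hits = {"/index.html": 950, "/popular.html": 120, "/quiet.html": 7}
print(pages_carrying_ads(sample_hits))
```

Raising or lowering min_hits is the single knob that trades page weight against ad views.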

My expectation is that only a small fraction of (desktop and AMP) pages will now carry ad code, but effective site-wide RPM and earnings should be only slightly reduced.

AMP and desktop pages with ads are periodically rebuilt to purge ad weight from those that are no longer popular enough.

After a full rebuild, the count of (eg) desktop pages carrying ads fell from more than 210 to under 70, out of nearly 300 candidate pages.

Very early indications are that views of pages with ads have dropped about 20%, against the 3-fold reduction in pages carrying ads. I can move the threshold for pages to carry ads to adjust this balance.

2020-11-17: Old Apache Stop

I have turned off the Apache2 instance running on the old RPi2, since it is not now running any material static site.

# /etc/init.d/apache2 stop
# update-rc.d apache2 disable

A quick attempt to contact one of the residual services now hangs/fails, correctly.

After a reboot Apache is still not responding, correctly.

Note that netstat still shows the servlet-based listeners.

2020-11-16: Apple Touch Icons

apple touch icon 120x120

I get the occasional blast of requests from an Apple device like so:

www.earth.org.uk:80 "GET /apple-touch-icon-120x120-precomposed.png HTTP/1.1"
www.earth.org.uk:80 "GET /apple-touch-icon-120x120.png HTTP/1.1"
www.earth.org.uk:80 "GET /apple-touch-icon.png HTTP/1.1"

Sometimes a 152x152 icon is requested.

This happens (I think) when EOU is added to an i-device's homescreen.

So, using ManyTools' Apple-touch-icon generator, I created a set of icons. I then svn cp-ed the 60x60, 120x120 and 152x152 versions (generated as apple-touch-icon-iphone-60x60.png, apple-touch-icon-iphone-retina-120x120.png and apple-touch-icon-ipad-retina-152x152.png) down to the desktop root under the names below, where they are usable by default, while avoiding multiple copies of the pixels in the repo.

apple-touch-icon.png
apple-touch-icon-120x120.png
apple-touch-icon-152x152.png

Before putting them in the repo I reduced their weight with zopflipng -m -m.

Possibly a visit to tinypng.com first would have been even better!

I may copy them to the AMP root too, though probably not to the lite/m-dot to avoid incurring extra bandwidth (and storage) costs for users, for just a little bit of eye-candy.

2020-11-15: WWW Soft Canonical

Without adding any overhead (eg extra headers) to normal connections, but to gently redirect spiders to the WWW https versions of most files, I have inserted the following early config for www.earth.org.uk.

# Redirect most Referer-less http accesses to https.
# Aim to gently redirect spiders to canonical https for most content.
# Avoid redirecting (top-level) HTML files that contain own rel=canonical,
# so users directly choosing http can stay on http.
# Avoid breaking the LE ACME challenge.
# Use a 302 (temporary) redirect, for now.
RewriteEngine on
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{REQUEST_URI} !^/\.well-known/
RewriteCond %{REQUEST_URI} !^/[^/]*\.html$
RewriteCond %{REQUEST_URI} !^$
RewriteCond %{REQUEST_URI} !^/$
RewriteRule ^/(.*)$ https://www.earth.org.uk/$1 [L,R=302]
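
To check by hand which requests these conditions catch, the rule set can be modelled roughly in Python. This is only an approximation for reasoning about cases: the function name is hypothetical, query strings are ignored, and Apache remains the real arbiter.

```python
import re

# Hypothetical model of the rule set above, for checking cases by hand;
# Apache itself does the real work and this is only an approximation.
ACME_PREFIX = re.compile(r"^/\.well-known/")
TOP_LEVEL_HTML = re.compile(r"^/[^/]*\.html$")


def redirect_target(is_https, referer, uri):
    """Return the https URL to 302 to, or None to serve the request as-is."""
    if is_https:                 # RewriteCond %{HTTPS} off
        return None
    if referer:                  # RewriteCond %{HTTP_REFERER} ^$
        return None
    if ACME_PREFIX.search(uri):  # do not break the LE ACME challenge
        return None
    if TOP_LEVEL_HTML.search(uri):  # top-level HTML carries its own canonical
        return None
    if uri in ("", "/"):         # leave the home page alone
        return None
    if not uri.startswith("/"):  # RewriteRule pattern requires a leading slash
        return None
    return "https://www.earth.org.uk" + uri


print(redirect_target(False, "", "/img/example.png"))
print(redirect_target(False, "", "/index.html"))
```

The first call yields an https URL because a Referer-less http request for a non-HTML file is redirected; the second yields None because top-level HTML is left on http.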

Currently there are slightly more http than https WWW requests, by ~10%.

2020-11-12: AMP HTTPS Only

Today I am making amp.EOU https only, eg by redirecting http to https.

Since AMP users are already paying the overhead of the AMP JavaScript, etc, the bandwidth- and latency-saving aspects of http are likely less important to them. They have probably already been pushed to the AMP page via https-based search and AMP cache.

Making the http option disappear should save a little crawl bandwidth by spiders. About one quarter of AMP page hits are currently http.

Then I will be down from serving 6 variants of each page — www, m, amp for each of http and https — to 5. It's a start.

There's a little wrinkle to avoid interfering with Let's Encrypt auto-renew.

RewriteEngine on
RewriteCond %{HTTPS} off
RewriteCond %{REQUEST_URI} !^/\.well-known/
RewriteRule ^/(.*)$ https://amp.earth.org.uk/$1 [L,R=301]

I also aim to add a couple of other tweaks, eg that make my security rating in WebPageTest better than the current "F"!

The biggest complaint from WebPageTest is fixed by adding the header Strict-Transport-Security: max-age=31536000 which should force a browser to use https for amp.EOU for a year. This raises the WPT security rating for the home page from "F" to "E".
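
As a quick check of the arithmetic, the max-age value is one non-leap year expressed in seconds. A small sketch (variable names are illustrative):

```python
# One 365-day year in seconds, as used in the max-age value quoted above.
SECONDS_PER_DAY = 24 * 60 * 60
max_age = 365 * SECONDS_PER_DAY

hsts_header = f"Strict-Transport-Security: max-age={max_age}"
print(hsts_header)  # Strict-Transport-Security: max-age=31536000
```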

Adding the header X-Frame-Options: DENY, which I already use for the desktop site, improves the security score to "D", but seems to stop images loading in Firefox (though not Chrome Canary). The header is apparently effectively obsoleted by the Content-Security-Policy header though. Given all that, it's not staying!

I note that https://www.theguardian.com/uk includes these headers:

  • x-frame-options: SAMEORIGIN
  • content-security-policy: default-src https:; script-src https: 'unsafe-inline' 'unsafe-eval' blob: 'unsafe-inline'; frame-src https: data:; style-src https: 'unsafe-inline'; img-src https: data: blob:; media-src https: data: blob:; font-src https: data:; connect-src https: wss:; child-src https: blob:; object-src 'none'; base-uri 'none'
  • referrer-policy: no-referrer-when-downgrade
  • strict-transport-security: max-age=31536000; includeSubDomains; preload
  • x-content-type-options: nosniff
  • x-xss-protection: 1; mode=block

It seems as if x-xss-protection is also only for older browsers, and I should concentrate my efforts on crafting content-security-policy.

Part may be Referrer-Policy: origin-when-cross-origin, or the equivalent via content-security-policy.

Another part may be script-src https://cdn.ampproject.org:* to let the AMP scripts run, though that may not let Google ads run.

I do use a little inline CSS to keep the header and CRP (Critical Rendering Path) small, which implies something like style-src 'unsafe-inline' which weakens the whole mechanism. Maybe I should wean myself off local CSS in all critical cases instead.
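
A rough sketch of how those fragments might be assembled into one header value. The directive set here is hypothetical: default-src is borrowed from the Guardian example above as an assumption, and the result is untested against real AMP pages or ads, so it is not a recommended policy.

```python
# Hypothetical directive set assembled from the fragments discussed above;
# it is untested against real AMP pages or ads and is not a recommended policy.
directives = {
    "default-src": ["https:"],
    "script-src": ["https://cdn.ampproject.org:*"],
    "style-src": ["https:", "'unsafe-inline'"],
}


def build_csp(directives):
    """Join directives into a single Content-Security-Policy header value."""
    return "; ".join(
        name + " " + " ".join(sources) for name, sources in directives.items()
    )


print("Content-Security-Policy: " + build_csp(directives))
```

Keeping the directives in a mapping makes it easy to drop 'unsafe-inline' later if the inline CSS goes away.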

2020-11-11: Slow Switchover

I made the https desktop pages canonical (rather than http) 2020-09-21.

As well as the slow switchover of entries in the https-canon sitemap, there have been interesting glitches such as doubling-up in some of the items (http and https) for GSC enhancements such as breadcrumbs, and bizarre double entries (ie the same item listed twice) in AMP. The latter may be because I am now redirecting http AMP to https...

Screenshot 20201101 sitemap xml coverage after switch to https canonical
GSC coverage graph for https sitemap.xml.
20201111 screenshot GSC sitemap xml coverage recovering after canonical switch to https
GSC coverage graph for http (not https) sitemap.xml; note the drop to zero 'valid' pages at switchover of canonicals to https, and the slow recovery.
20201202 screenshot GSC sitemap xml coverage recovering after canonical switch to https
GSC coverage graph for https sitemap.xml; now up to 84 'valid', 191 excluded because "Duplicate, submitted URL not selected as canonical".
20201212 screenshot GSC sitemap xml coverage recovering after canonical switch to https
GSC coverage graph for https sitemap.xml; now up to 95 'valid', 181 excluded because "Duplicate, submitted URL not selected as canonical".
20201223 screenshot GSC sitemap xml coverage recovering after canonical switch to https
GSC coverage graph for https sitemap.xml; now up to 103 'valid', 174 excluded because "Duplicate, submitted URL not selected as canonical". Impressions also shown.
20201230 screenshot GSC sitemap xml coverage recovering after canonical switch to https
More than half-way! GSC coverage graph for https sitemap.xml; now up to 141 'valid', 139 excluded because "Duplicate, submitted URL not selected as canonical". This process has been sped up with careful use of the reinstated GSC 'Request Indexing' feature for https pages marked as non-canonical.
20210124 screenshot GSC sitemap xml coverage recovering after canonical switch to https
The last few are proving very stubborn: I wonder from other things that I have seen if Google is having difficulty ingesting page updates... GSC coverage graph for https sitemap.xml; now up to 270 'valid', 13 excluded because "Duplicate, submitted URL not selected as canonical" (11) and "Crawled - currently not indexed" (2). This process has been sped up with more GSC 'Request Indexing'.

(I removed from the sitemap an XHTML page that is not canonical.)

20210131 screenshot GSC sitemap xml coverage recovering after canonical switch to https
GSC coverage graph for https sitemap.xml; now up to 280 'valid', 2 excluded because "Duplicate, submitted URL not selected as canonical". Still being nudged along with more GSC 'Request Indexing' and manual page tweaks/improvements.
20210204 screenshot GSC sitemap xml coverage recovering after canonical switch to https
And we're done after a little over 4 months! GSC coverage graph for https sitemap.xml; now up to 283 'valid' (0 excluded/warnings).

2020-11-05: Lazy Wins

Lazy loading seems to win in two ways. Reducing bandwidth is the obvious one, but it also reduces initial visible page rendering time even when no bandwidth is saved.

So, for example, on the home page, Chrome does not avoid loading any images because even the ones below the fold are not far enough below. But it seems that by letting Chrome concentrate on the important bits above the fold, initial layout is faster. Firefox manages to save bandwidth too by avoiding loading several images for the initial view. Those images will never be loaded if the visitor does not scroll down; even if they do, load on the server is spread out.

Here are three simple scenarios, all from WebPageTest instances in London, all over HTTP/2 (ie one TCP connection) to https://www.earth.org.uk.

Note that the bandwidth limits were not identical across all runs, but higher bandwidth does not beat the advantages of lazy loading!

Chrome without Lazy Loading

For this run all loading=lazy attributes were manually removed from the HTML.
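
The attributes were removed by hand for this run. For repeat tests, one possible way to automate the same edit is a crude regex pass such as the sketch below. The function name is hypothetical, and a regex is only adequate for simple, well-behaved markup, not HTML in general.

```python
import re

# Matches a loading=lazy attribute (quoted or not) plus its leading whitespace.
LAZY_ATTR = re.compile(
    r"""\s+loading\s*=\s*(?:"lazy"|'lazy'|lazy\b)""", re.IGNORECASE
)


def strip_lazy(html):
    """Return html with every loading=lazy attribute removed."""
    return LAZY_ATTR.sub("", html)


print(strip_lazy('<img src="a.png" loading="lazy" alt="x">'))
```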

20201105 screenshot WebPageTest home page https Chrome no lazy load
Chrome not lazy, 55kB total download, speed index 500ms.

Chrome with Lazy Loading

20201105 screenshot WebPageTest home page https Chrome lazy load
Chrome lazy, 51kB total download, speed index 400ms.

Firefox with Lazy Loading

20201105 screenshot WebPageTest home page https Firefox lazy load
Firefox lazy, 21kB total download, speed index 400ms.

2020-11-02: Let's Encrypt Auto-renew Un-snagging

Amongst ignorable email complaints from Let's Encrypt I received a worrying one that implied that the actual EOU TLS certs were going to expire.

Looking in the Apache logs I could see redirects and errors during the http-01 challenge for the amp. and m. sites. The two sites fairly aggressively redirect to www. anything that does not look like a top-level HTML page (or script or favicon, etc). That breaks the GET /.well-known/acme-challenge/.... So I put in special-case fixes to not rewrite/redirect any such requests.

Having done that, renewal succeeded by manually running:

% sudo certbot renew

With a fair wind behind, auto-renewal should "just work"!

(Note from future me: it does!)

2020-11-01: More Work Storage

(See previous work storage note and next.)

I have made a few more tweaks, eg to stop almost all DAILY and WEEKLY periodic updates when battery is LOW or below.

A few more tweaks mean that almost no periodic page rebuilds will happen unless the sun is out. Nor will changing the build scripts force a rebuild in the absence of sunshine and a decent state of battery charge.
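
The gating can be pictured roughly as below. This is a simplified sketch under stated assumptions: only the VLOW and LOW levels are named in these notes, so the other level names, their ordering and the function itself are hypothetical, and the real scripts allow a few exceptions ("almost all", "almost no") that this ignores.

```python
from enum import IntEnum


class Battery(IntEnum):
    # Only VLOW and LOW are named in the text; OK and HIGH are assumed names.
    VLOW = 0
    LOW = 1
    OK = 2
    HIGH = 3


def allow_periodic_rebuild(battery, sun_is_out):
    """Allow DAILY/WEEKLY rebuilds only with sunshine and battery above LOW."""
    return sun_is_out and battery > Battery.LOW


print(allow_periodic_rebuild(Battery.OK, True))
print(allow_periodic_rebuild(Battery.VLOW, True))
```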

(Pleasingly, with the battery VLOW from several dark foggy days in a row, no pages were rebuilt at all, not even the stats page. Good dynamic conservation response from work storage!)