Earth Notes: On Website Technicals (2021-07)
Updated 2024-06-30.
2021-07-31: AMP Ate My Hamster
The page-experience / CWV Google Search Console oddness continues!
The Page Experience headline is currently "Your site has 100% URLs with a good page experience". But this has been flip-flopping daily, and the graphs are very odd!
2021-07-27: AMP Be Going?
GSC is this afternoon reporting the AMP page count back at 9!
But after yesterday's all-green "page experience", I am back to: "Your site has no URLs with a good page experience".
The page experience graph simultaneously says 100% of my URLs are good, while saying that I only had 5 "Total impressions from good URLs".
2021-07-23: AMP BE GONE
GSC is this afternoon reporting the AMP page count up from 9 to 10!
At least some of these seem to be where Google has decided that the HTTP version of the full-fat desktop www page is canonical, rather than the HTTPS version.
2021-07-22: Yak Shaving
I am shaving a few more bytes from error pages such as 404. That reduces (marginally) the tax from broken bots and spiders and rogues scanning the site for vulnerabilities.
In the first instance I am dropping specialist 'print' media tweaks. Who is printing the 404 page and thus who cares?
I am also dropping the schema.org metadata, since nothing should care about it in such noindex pages either.
Before:
% ls -fl {,m/}404.html{,gz,br}
1536 404.html
774 404.htmlgz
564 404.htmlbr
1234 m/404.html
660 m/404.htmlgz
455 m/404.htmlbr
So far (65+ bytes or >10% saved for HTTP/2 Brotli-supporting clients):
% ls -fl {,m/}404.html{,gz,br}
1293 404.html
670 404.htmlgz
486 404.htmlbr
991 m/404.html
554 m/404.htmlgz
390 m/404.htmlbr
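As a quick check on the "65+ bytes or >10%" figure, this small Python sketch recomputes the per-file savings from the two listings. The byte counts are copied from above; nothing else is assumed.

```python
# Byte counts copied from the "Before" and "So far" listings above.
BEFORE = {
    "404.html": 1536, "404.htmlgz": 774, "404.htmlbr": 564,
    "m/404.html": 1234, "m/404.htmlgz": 660, "m/404.htmlbr": 455,
}
AFTER = {
    "404.html": 1293, "404.htmlgz": 670, "404.htmlbr": 486,
    "m/404.html": 991, "m/404.htmlgz": 554, "m/404.htmlbr": 390,
}


def savings(before, after):
    """Return {name: (bytes_saved, percent_saved)} for each file in both listings."""
    result = {}
    for name, old in before.items():
        saved = old - after[name]
        result[name] = (saved, 100.0 * saved / old)
    return result


for name, (saved, pct) in savings(BEFORE, AFTER).items():
    print(f"{name:14} {saved:4d} bytes  {pct:5.1f}%")
```

The smallest Brotli saving is the 65 bytes on m/404.htmlbr (about 14%), so both halves of the claim hold for the Brotli files.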
Good and Bad
GSC is still stubbornly reporting 9 AMP pages. Meanwhile GSC has decided that almost none of my page impressions is 'good'. But as of this evening "Your site uses HTTPS", and I have an overall good page experience.
2021-07-18: Little Save-Data
I looked for Save-Data: on requests by checking for input='on' pattern='on' in the ErrorLog overnight, having turned on rewrite logging:
LogLevel alert rewrite:trace6
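The log check can be scripted. This is a minimal Python sketch: only the input='on' pattern='on' fragment comes from the notes; the rest of the mod_rewrite trace-line format shown in the comment is an assumption, and the script just counts and prints the matching lines.

```python
import re
import sys

# Assumed shape of a mod_rewrite trace line (Apache 2.4, rewrite:traceN), e.g.
#   ... RewriteCond: input='on' pattern='on' [NC] => matched
# Only the "input='on' pattern='on'" fragment is taken from the notes;
# the surrounding line format is an assumption.
SAVE_DATA_HIT = re.compile(r"input='on' pattern='on'", re.IGNORECASE)


def save_data_hits(lines):
    """Return the log lines that record a Save-Data: on RewriteCond check."""
    return [line.rstrip("\n") for line in lines if SAVE_DATA_HIT.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python save_data_hits.py /path/to/error_log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        hits = save_data_hits(log)
    print(f"{len(hits)} Save-Data: on condition checks")
    for hit in hits:
        print(hit)
```

Requests without the header show up with an empty input='' and so are not counted.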
There was evidence of just one genuine third-party request, for an embedded intensity button in my profile at another site (Fieldlines). Apparently not a single direct article view, though.
I was able to trigger such requests via WebPageTest.
2021-07-17: AMP Gone
GSC is reporting AMP impressions down from ~1000 per day to under 70.
Given this, I am going to manually remove all the generated AMP pages. Nothing seems to be (accidentally) updating them any more, so they should stay gone.
Note that I have to leave a home page (index.html) in place for AMP for now to avoid a bare directory listing. It is marked 'noindex'.
If I make no further changes then AMP article requests will get 302 (temporary) redirects to appropriate m-dot pages. Possibly that should become 301 (permanent) at some point.
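The redirect decision can be sketched as a tiny mapping function. The /amp/ prefix and the m.example.org host are hypothetical placeholders, not the real site layout; the sketch only illustrates the 302 versus 301 switch and the exception for the AMP index page.

```python
# Hypothetical layout, for illustration only: AMP pages under /amp/ on the
# main host, lite pages at the same relative path on an m-dot host.
AMP_PREFIX = "/amp/"
MDOT_BASE = "https://m.example.org/"


def amp_redirect(path, permanent=False):
    """Map a retired AMP article path to (status, Location) for its m-dot page.

    Returns None for paths outside the AMP tree and for the AMP index page,
    which stays in place (marked noindex) to avoid a bare directory listing.
    """
    if not path.startswith(AMP_PREFIX):
        return None
    rest = path[len(AMP_PREFIX):]
    if rest in ("", "index.html"):
        return None
    status = 301 if permanent else 302  # 302 = temporary, 301 = permanent
    return status, MDOT_BASE + rest
```

Flipping to 301 is then a one-flag change, which is roughly the "at some point" decision above.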
I have also added support for serving .webpL files when there is a Save-Data (on) header.
Lo-fi WebP: webpL
I have also attempted to make the Save-Data value match case-insensitive:
# Serve smaller images/audio for Save-Data clients if possible.
# Ensure caches handle Save-Data correctly.
<FilesMatch "\.(jpg|png|webp|mp3|mp4)$">
  Header append Vary "Save-Data"
</FilesMatch>
# If client has Save-Data header set (to "on", case-insensitive).
RewriteCond %{HTTP:Save-Data} on [NC]
# ... and if the lo-fi image/audio exists...
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME}L -s
# ... then send .xxxL content instead of .xxx hi-fi original.
RewriteRule ^/(.+)\.(jpg|png|webp|mp3|mp4)$ /$1.$2L [L]
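To make the rule's decision explicit, here is a Python sketch of the same logic (not how Apache implements it). `pick_variant` and the `lofi_size` callback are hypothetical; the callback stands in for the `-s` "exists and is non-empty" file test.

```python
import re

# Mirrors the RewriteRule pattern above; the leading slash is part of the path.
MEDIA = re.compile(r"^/(.+)\.(jpg|png|webp|mp3|mp4)$")


def pick_variant(path, headers, lofi_size):
    """Return (path_to_serve, extra_response_headers).

    headers:   request headers as a dict (names compared case-insensitively).
    lofi_size: callable giving the size in bytes of a document-root-relative
               path, or 0 if it does not exist (stands in for the -s test).
    """
    m = MEDIA.match(path)
    if not m:
        return path, {}
    extra = {"Vary": "Save-Data"}  # caches must key media on Save-Data
    save_data = next((v for k, v in headers.items() if k.lower() == "save-data"), "")
    lofi = f"/{m.group(1)}.{m.group(2)}L"
    # Like "RewriteCond ... on [NC]": unanchored, case-insensitive match.
    if re.search("on", save_data, re.IGNORECASE) and lofi_size(lofi) > 0:
        return lofi, extra
    return path, extra
```

In this sketch the Vary header is attached whichever variant is chosen, so a cache never stores a lo-fi response as the only version of a URL.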
2021-07-13: AMP Downslope Momentum
I am looking for pages that are still showing as having AMP in GSC, and manually submitting reindexing requests. Only a handful, but it all helps.
I am trying to pick pages that Google is likely to be relatively slow to refresh, eg less-well-ranked ones.
2021-07-12: M-Dot Higherer
The GSC crawl-stats reports for m-dot pages, with latest data for the 9th, show ~120 per day compared to 20-something on previous days. AMP crawling is below 20 per day. Overall (and www) crawl rates are fairly constant. So the bot has apparently switched most AMP crawl budget to m-dot.
The Page Experience report is showing the impressions of good URLs ticking up from a low on the 9th (56/d vs 756/d on the 2nd).
All this lines up, as hoped, with now pointing the header and navigation 'lite' page links at https://m. rather than http://m. URLs!
Small even smaller
I am trying to reduce the overhead on small pages further, to be more like lite pages:
% ls -alS m/OpenTRV-protocol-discussions-201412-3.html
4095 m/OpenTRV-protocol-discussions-201412-3.html
% ls -alS OpenTRV-protocol-discussions-201412-3.html*
5533 OpenTRV-protocol-discussions-201412-3.html
2053 OpenTRV-protocol-discussions-201412-3.htmlgz
1664 OpenTRV-protocol-discussions-201412-3.htmlbr
Avoiding 'extra' page image/video metadata, and trimming the (invisible) precision of date metadata, for small pages gets to:
% ls -alS OpenTRV-protocol-discussions-201412-3.html*
5076 OpenTRV-protocol-discussions-201412-3.html
1968 OpenTRV-protocol-discussions-201412-3.htmlgz
1593 OpenTRV-protocol-discussions-201412-3.htmlbr
Turning off SpeakableSpecification support for noindex pages gets the brotli-compressed content close to fitting in a single TCP packet (~1400 bytes), like the 'lite' version:
% ls -alS OpenTRV-protocol-discussions-201412-3.html*
4839 OpenTRV-protocol-discussions-201412-3.html
1905 OpenTRV-protocol-discussions-201412-3.htmlgz
1524 OpenTRV-protocol-discussions-201412-3.htmlbr
A bit more trimming of metadata not needed for noindex pages:
% ls -alS m/OpenTRV-protocol-discussions-201412-3.html*
4089 m/OpenTRV-protocol-discussions-201412-3.html
1617 m/OpenTRV-protocol-discussions-201412-3.htmlgz
1283 m/OpenTRV-protocol-discussions-201412-3.htmlbr
% ls -alS OpenTRV-protocol-discussions-201412-3.html*
4751 OpenTRV-protocol-discussions-201412-3.html
1870 OpenTRV-protocol-discussions-201412-3.htmlgz
1496 OpenTRV-protocol-discussions-201412-3.htmlbr
~10% weight reduction for the maximally-compressed desktop page!
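The arithmetic behind the ~10% figure, and the single-packet comparison, in a short Python sketch. The sizes are copied from the listings above; the 1400-byte budget is the rough figure from the text, not an exact segment size.

```python
# Desktop OpenTRV-protocol-discussions-201412-3.html sizes from the listings
# above, as (stage, plain, gzip, brotli).
STAGES = [
    ("start", 5533, 2053, 1664),
    ("no extra image/video metadata, coarser dates", 5076, 1968, 1593),
    ("no SpeakableSpecification", 4839, 1905, 1524),
    ("more noindex metadata trimmed", 4751, 1870, 1496),
]
LITE_BROTLI = 1283  # m/ (lite) version, brotli
ONE_PACKET = 1400   # rough single-packet payload budget, as in the text


def reduction_pct(old, new):
    """Percentage size reduction from old to new."""
    return 100.0 * (old - new) / old


first, last = STAGES[0], STAGES[-1]
for i, label in enumerate(("plain", "gzip", "brotli"), start=1):
    pct = reduction_pct(first[i], last[i])
    print(f"{label:6} {first[i]:5d} -> {last[i]:5d}  {pct:4.1f}% smaller")
print("desktop brotli fits one packet:", last[3] <= ONE_PACKET)
print("lite brotli fits one packet:   ", LITE_BROTLI <= ONE_PACKET)
```

The brotli reduction is 168 bytes (about 10.1%), while the desktop brotli page still sits roughly 100 bytes over the packet budget that the lite page already meets.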
2021-07-11: M-Dot Higher
Searching in Google on my mobile for the same term as a few days ago, one that reliably brings up an EOU page prominently, now brought up the (HTTPS) m/lite page.
So Google may now be using https://m. as the preferred target for mobile searches.
2021-07-10: WebP Footling
Much as I am keen to use JXL (JPEG XL), I am first going to have a go at using WebP as a more compact (lossless) alternative for hero PNGs.
Hero images are provided in picture elements, at a single resolution (though which image is used depends on a wide/narrow viewport).
This enables folding in WebP versions of a PNG where they are smaller. I should fall back to PNG for older browsers, though most support WebP. It is even worth using an inline WebP image, with an out-of-line PNG fallback, since few browsers will need to fall back.
This should only be done where the WebP image can be produced and saves many more bytes than the extra HTML costs!
The following incantation seems to produce a smaller WebP than the source PNG, via ImageMagick, on both Mac and RPi:
% convert train-fast.png -define webp:method=6 train-fast.webp
% ls -al
26736 train-fast.png
22584 train-fast.webp
Sometimes convert also does better with -quality 100, but sometimes worse.
I should do this for the .pngL lo-fi versions too.
In each case, let the PNG version determine inlining etc, but if a smaller WebP version is available, put that in ahead, and use the PNG as fallback.
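A Python sketch of that rule for a single hero image. `hero_picture` is hypothetical, ignores the wide/narrow variants and inlining, and assumes the WebP sits next to the PNG with ".webp" appended (as in the tools-1280w listing below). It only emits the WebP source when the bytes saved exceed the extra markup it costs.

```python
from html import escape


def hero_picture(png_url, png_size, webp_size=None, alt=""):
    """Build hero-image markup: WebP first if worthwhile, PNG as fallback.

    webp_size is None when no WebP could be produced. The WebP URL is
    assumed to be the PNG URL with ".webp" appended.
    """
    img = f'<img src="{escape(png_url)}" alt="{escape(alt)}">'
    if webp_size is None:
        return img
    picture = (f'<picture><source type="image/webp" srcset="{escape(png_url + ".webp")}">'
               f"{img}</picture>")
    overhead = len(picture) - len(img)  # extra HTML bytes the WebP costs
    if png_size - webp_size <= overhead:
        return img  # not enough saving to pay for the extra markup
    return picture
```

Browsers that do not understand image/webp skip the source element and load the img, which is the PNG fallback.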
I am having to add support for .webp with MIME type image/webp, and for the .webpL and .webpLL suffixes too. Eventually the same Apache support to switch to the L version with Save-Data will need to be added.
This works in at least some (non-inlining) cases, but not, for example, when the source image is suitable (eg light enough) to use as-is.
It does potentially save a few hundred bytes for every single 'tools' hero load in these site-technicals pages...
% ls -al img/a/h/tools-1280w.l354283.*
2584 img/a/h/tools-1280w.l354283.640x80.l.png
2004 img/a/h/tools-1280w.l354283.640x80.l.png.webp
2043 img/a/h/tools-1280w.l354283.640x80.l.pngL
5399 img/a/h/tools-1280w.l354283.800x200.png
4486 img/a/h/tools-1280w.l354283.800x200.png.webp
4434 img/a/h/tools-1280w.l354283.800x200.pngL
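Comparing the three variants of each hero size from that listing, as a small Python sketch (sizes copied from above):

```python
# Sizes (bytes) from the tools-1280w.l354283 listing above, per hero variant.
TOOLS = {
    "640x80.l": {"png": 2584, "png.webp": 2004, "pngL": 2043},
    "800x200": {"png": 5399, "png.webp": 4486, "pngL": 4434},
}


def best(variants):
    """Return (suffix, size) of the smallest file among the variants."""
    suffix = min(variants, key=variants.get)
    return suffix, variants[suffix]


for name, variants in TOOLS.items():
    suffix, size = best(variants)
    print(f"{name}: smallest is .{suffix} at {size} bytes; "
          f"WebP saves {variants['png'] - variants['png.webp']} bytes over PNG")
```

WebP saves 580 and 913 bytes over the full PNGs; for the 800x200 hero the lo-fi .pngL is slightly smaller still.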
All these results are on the Mac, where cwebp is version 1.2.0. On the RPi server, with version 0.5.2, output file sizes are much larger, and so the WebP images are not being deployed.
Much better results seem to be happening on the RPi side (and marginally less good on the Mac) with:
cwebp -lossless -m 6 input.png -o output.webp
Which suggests that ImageMagick is, unusually, not helping...
Adding -q 100 adds effort, and should generally result in smaller files.
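A Python sketch of a build step around that command: run cwebp, then keep the WebP only if it beats the PNG. The function names are hypothetical, and it assumes cwebp is on the PATH; with -lossless, -q sets compression effort rather than visual quality.

```python
import os
import subprocess


def cwebp_cmd(src, dst, effort=100):
    """Build the cwebp command line above, with -q as lossless effort."""
    return ["cwebp", "-lossless", "-m", "6", "-q", str(effort), src, "-o", dst]


def make_webp_if_smaller(src, dst, effort=100):
    """Run cwebp; keep dst only if it is smaller than src. Returns True if kept.

    Assumes cwebp is on PATH. Output sizes depend heavily on the cwebp
    version (compare 1.2.0 on the Mac with 0.5.2 on the RPi).
    """
    subprocess.run(cwebp_cmd(src, dst, effort), check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if os.path.getsize(dst) < os.path.getsize(src):
        return True
    os.remove(dst)  # no saving: do not deploy the WebP
    return False
```

Dropping a WebP that does not beat the PNG would also guard against the larger outputs seen from the older cwebp on the RPi.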
2021-07-09: M-Dot HTTPS
The dropping of AMP pages has slowed right down, presumably as the better-ranked ones have now been re-digested with 'noindex'. I assume that others are polled/re-read less frequently, so there is likely to be a natural asymptotic-like decay. (Without me manually requesting/forcing re-indexing, anyway.)
I am not now going to wait for all the dust to settle: I am going to make the m-dot / lite official view HTTPS. The HTTP side will remain available, but I am going to see if GSC gets happier.
It was a fairly gloomy day, so I forced a rebuild of the desktop pages (then the rest) containing the navigation links to the now-HTTPS m-dot variants.
I have also added an AMPDEPRECATED flag to the makefile, to parallel the one in wrap_art, to gradually turn off parts of the AMP support. The first step was removing the references to AMP pages from sitemap.xml, and I have now also removed most automatic AMP-page building. It's still possible to build AMP pages individually or en masse.
I note from the GSC crawl stats that the AMP page crawl rate more than halved on July 2nd to mid-30s pages per day.
2021-07-06: M-Dot Ascendant
Searching in Google on my mobile for a term that reliably brings up an EOU page prominently has now brought up the (HTTP) m/lite page, whereas it always used to bring up the (HTTPS) AMP page. (I saw the (HTTPS) www page come up once, a couple of days ago.)
To make the warning about "too many http URLs" go away, I may have to link to lite and lite-http pages in the navigation bar, and make the https://m page the official alternate to the canonical. I will wait for the AMP page removal to settle before trying that...
2021-07-04: AMP Be Going
GSC and thus Google is quickly purging EOU AMP pages from its index, down to 99 reported cf 187 before starting, and against ~300 actual!
In the GSC page experience section I am getting a slightly alarming (red) "Insufficient HTTPS coverage on your site" warning: "If your site has too high a ratio of HTTP URLs, you will see warning banner on your site, and the HTTPS section will show Failing."
This is possibly because I had 3 sets of HTTPS pages (www, m, amp) and two HTTP (www, m), but I am now down to 2 and 2.
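Treating each set as one unit (an assumption: GSC counts URLs, and the sets differ in size), the arithmetic shows why dropping AMP pushes the HTTP share up rather than down:

```python
from fractions import Fraction


def http_share(https_sets, http_sets):
    """Fraction of page sets served over plain HTTP."""
    return Fraction(http_sets, https_sets + http_sets)


before = http_share(3, 2)  # HTTPS: www, m, amp; HTTP: www, m
after = http_share(2, 2)   # AMP dropped
print(f"HTTP share before: {before} ({float(before):.0%})")
print(f"HTTP share after:  {after} ({float(after):.0%})")
```

This prints 2/5 (40%) before and 1/2 (50%) after.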
2021-07-01: AMP Be Gone
Starting at about noon today I put the AMP-be-gone programme in place. Today's step is updating the page-build script to mark all AMP pages noindex, and removing the explicit and header amphtml cross-site links so as to orphan the AMP pages.
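A minimal Python sketch of the two page-build changes, assuming pages are processed as HTML strings. The function names are hypothetical, and the regular expressions are deliberately simple: they only handle the head `<link rel="amphtml">` form, not explicit in-body links, whose markup is not shown here.

```python
import re

# Matches <link ... rel="amphtml" ...> tags (quoted or bare attribute value).
AMPHTML_LINK = re.compile(r"""<link\b[^>]*\brel\s*=\s*["']?amphtml["']?[^>]*>\s*""",
                          re.IGNORECASE)
NOINDEX = '<meta name="robots" content="noindex">'


def strip_amphtml(html):
    """Remove amphtml alternate links from a (non-AMP) page."""
    return AMPHTML_LINK.sub("", html)


def add_noindex(html):
    """Insert a robots noindex meta tag right after <head>, if not already present."""
    if re.search(r'<meta\s+name="robots"\s+content="[^"]*noindex', html, re.IGNORECASE):
        return html
    return re.sub(r"(<head\b[^>]*>)", lambda m: m.group(1) + NOINDEX, html,
                  count=1, flags=re.IGNORECASE)
```

Removing the amphtml links orphans the AMP pages, and the noindex tag tells crawlers that do still reach them to drop them from the index.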
Just removed from the site guide:
As of December 2018 there is also an AMP site version, much like the mobile/"lite" version, but possibly faster for visitors from Google search for the first page at least, because of the AMP cache. Not every page can be reproduced for AMP because of restrictions that the AMP format imposes, and a few minor features may be missing from all AMP page versions.
I captured a screenshot to remind me of current crawl stats.