Earth Notes: On Website Technicals (2020-09)

Updated 2022-09-18.
Tech updates: Brotli side, AMP https preferred, H2 oddity, anchor ads away, forever compression, canonical https www, 92222[2], GSC domain property.
The biggest change this month is preferring the https forms of AMP and desktop (www), and indeed making https://www.EOU canonical, though at a cost of at least ~100ms in page rendering time for UK-based visitors. All that and Brotli enabled for https pages!

2020-09-23: Domain Property in GSC

I have added the whole of earth.org.uk as a new-style 'domain property' in Google Search Console (GSC).

I note that:

  • crawl stats are not available for the new domain property
  • the domain performance graph looks somewhat different to the canonical property performance (possibly in part because of the move of some traffic to https)

The domain property performance graph shows ~500k impressions for the last year, with the http canonical showing about double that. Interestingly, the ratio for clicks is somewhat less... The graph shapes are sufficiently different that I think a chunk of data has gone missing from the domain stats.

2020-09-22: Cache-Control Simplification

I have simplified the config for the m. and amp. files. They now have a uniform Cache-Control for all files, set at ~1 day for amp. (to avoid larding on top of the AMP cache too much) and ~11 days for m. to minimise traffic even at the cost of being a bit more stale. Having the same value for all objects per site should maximise H2 (and H3) header compression, since a repeated identical header can be sent as a short reference to an earlier one.

To reduce header size, the Expires header is not sent. (Accept-Ranges and Etag are also omitted.)

The actual Cache-Control max-ages used are 92222 and 922222 seconds for amp. and m. respectively for an efficient representation for H1, H2, and H3. (For H2 and H3 the static Huffman code sizes for the digits themselves are relevant.)
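
The point about digit choice can be made concrete with a rough sketch. In the HPACK static Huffman code (RFC 7541 Appendix B), which QPACK reuses, the digits 0, 1 and 2 have 5-bit codes and the digits 3 to 9 have 6-bit codes. The helper name huff_digit_bits below is hypothetical, and it counts bits for the digits alone: it ignores the 'max-age=' prefix, padding and any table indexing, so it is illustrative rather than an exact header size.

```shell
# Hypothetical helper: bits needed to Huffman-code a string of decimal digits
# with the HPACK static code, where 0-2 cost 5 bits each and 3-9 cost 6 bits.
huff_digit_bits() {
  low=$(printf '%s' "$1" | tr -cd '012' | wc -c)
  high=$(printf '%s' "$1" | tr -cd '3456789' | wc -c)
  echo $(( $low * 5 + $high * 6 ))
}

# Compare exact 1-day and 11-day values with the values actually used.
for secs in 86400 92222 950400 922222; do
  echo "$secs: $(huff_digit_bits "$secs") bits"
done
```

On this count 92222 needs 26 bits against 28 for an exact day (86400), and 922222 needs 31 bits against 33 for an exact 11 days (950400).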

The 'public' value is unlikely to help overall, so is omitted.

2020-09-21: Canonical WWW Now HTTPS

MachMetrics https WWW Overview 20200926 screenshot

Here goes nothing!

I'm switching the canonical (desktop) site to be https. Let's see what happens...

I will not be astonished if there is some turmoil, maybe several weeks' worth! Let's hope it's all sorted by the winter solstice!

2020-09-22: I have updated my MachMetrics account to poll the https version of the desktop/canonical/www home page.

2021-01-29: the glacial switchover is still in train well through the turn of the year, months later!

2020-09-20: Forever Expiry Time 31536000s

Noting that the static table in QPACK: Header Compression for HTTP/3 (H3) has a special entry for an expiry time of 31536000 seconds (365 days), in effect 'forever', I am making that the 'forever' time for EOU (www/static) too.

I have also removed 'immutable', which I had taken to be Firefox-only, from the Cache-Control header. (In fact Safari also supports 'immutable', I see.) Most browsers probably won't use it, it takes some space, and it prevents use of the built-in H3 static header compression entry.

The magic config line (to exactly match the H3 static table entry) is now:

Header set Cache-Control "public, max-age=31536000"
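
Because the match against the static table has to be exact, a small check can catch a value that has drifted. The sketch below uses a hypothetical helper name, and the list of cache-control values is as I recall it from the QPACK static table (RFC 9204 Appendix A), so it is worth verifying against the spec before relying on it.

```shell
# Hypothetical helper: succeed if the given Cache-Control value is one of the
# cache-control values predefined in the QPACK static table.
# The match is byte-for-byte, including the ", " separator.
qpack_static_cache_control() {
  case "$1" in
    "max-age=0"|"max-age=2592000"|"max-age=604800"|"no-cache"|"no-store"|"public, max-age=31536000")
      return 0 ;;
    *)
      return 1 ;;
  esac
}

for v in "public, max-age=31536000" "public, max-age=31536000, immutable" "max-age=31536000"; do
  if qpack_static_cache_control "$v"; then
    echo "static: $v"
  else
    echo "not static: $v"
  fi
done
```

Only the first of the three values is reported as static. Adding 'immutable' or dropping 'public' both lose the exact match.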

2020-09-18: Anchor AutoAds off

A 90-day AdSense test finished today, which indicated that having a heavier-than-average ad load did not generate more revenue than a below-average ad load. So I'm back on the less-pushy below-average setting.

I also took the opportunity to turn off the 'anchor' ads that stick to the top of (desktop) pages when viewed on mobile. I find them distracting and a significant waste of screen real-estate.

Screenshot 20200918 Performance with scale

It's difficult to tell for sure, but I think that traffic from Google is coming off its peak of the last year or so, at least looking at the GSC performance graph. Impressions are good, but actual clickthroughs less convincing...

2020-09-17: H2 Strangeness

I tried out the 'HTTP/2 Test: Verify HTTP/2 support' tool. For https://www.earth.org.uk it reports that the HTTP/2 protocol is supported and that the ALPN extension is supported. (Both are also reported as supported over plain http, but no browser will make use of that in practice! For some reason, although http AMP shows the same, http m. is reported as supporting neither, and likewise on https: odd.)

It turns out there was some stray Let's Encrypt 'auto' config hanging around:

# ls -al /etc/apache2/sites-enabled/
m.earth.org.uk.conf -> ../sites-available/m.earth.org.uk.conf
m.earth.org.uk-le-ssl.conf -> /etc/apache2/sites-available/m.earth.org.uk-le-ssl.conf

Removing the link and restarting Apache makes all well with the world, or at least well with the HTTP/2 test:

# rm /etc/apache2/sites-enabled/m.earth.org.uk-le-ssl.conf
# /etc/init.d/apache2 restart

A quick survey of the access log suggests that about one quarter of HTTPS traffic uses HTTP/2, ie three HTTP/1 requests for every HTTP/2 request. Note that Googlebot, for example, does not yet use HTTP/2.

% egrep ':443 .* HTTP/1' /var/log/apache2/other_vhosts_access.log | wc -l
5934
% egrep ':443 .* HTTP/2' /var/log/apache2/other_vhosts_access.log | wc -l
1985
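
The same tally can be done in one pass, with the percentage worked out at the same time. This is a minimal sketch: the helper name h2_share is hypothetical, it uses the same patterns as the greps above, and the sample lines are made up rather than real log data.

```shell
# Hypothetical helper: read vhost access-log lines on stdin and print
# "<HTTP/1 count> <HTTP/2 count> <HTTP/2 percentage>" for port-443 requests.
h2_share() {
  awk '
    /:443 .* HTTP\/1/ { h1++ }
    /:443 .* HTTP\/2/ { h2++ }
    END { if (h1 + h2 > 0) printf "%d %d %.1f\n", h1, h2, 100 * h2 / (h1 + h2) }
  '
}

# Made-up sample lines, not real log data.
printf '%s\n' \
  'www.example.org:443 192.0.2.1 - - [17/Sep/2020:10:00:00 +0000] "GET / HTTP/1.1" 200 5120' \
  'www.example.org:443 192.0.2.2 - - [17/Sep/2020:10:00:01 +0000] "GET /a.html HTTP/1.1" 200 4096' \
  'www.example.org:443 192.0.2.3 - - [17/Sep/2020:10:00:02 +0000] "GET /b.html HTTP/1.1" 200 4096' \
  'www.example.org:443 192.0.2.4 - - [17/Sep/2020:10:00:03 +0000] "GET / HTTP/2.0" 200 5120' \
  'www.example.org:80 192.0.2.5 - - [17/Sep/2020:10:00:04 +0000] "GET / HTTP/1.1" 200 5120' \
  | h2_share
```

Against the live log this would be run as h2_share < /var/log/apache2/other_vhosts_access.log. With the counts above (5934 and 1985) the HTTP/2 share works out at about 25.1%.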

Towards the end of October the HTTP/2 fraction is about one third, ie about two HTTP/1 requests for every HTTP/2 request.

2020-09-04: AMP HTTPS Now Preferred

As of today the preferred scheme for the AMP pages is https, though http will still be served and should be fully functional.

This means in practice:

  • Making the amphtml link in the header and AMP navigation link point to the https version.
  • Giving all the header inter-version and canonical and navigation links explicit http / https schemes.
  • Making redirects between versions appropriate in terms of schemes, retaining https as appropriate to avoid security snafus (principle of least surprise).
  • Using the correct scheme in robots.txt and sitemap.xml as appropriate.
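
The first item in the list above is easy to spot-check from the command line. The sketch below is only that: the helper name amphtml_scheme is hypothetical, it assumes double-quoted attributes with the whole link tag on one line, and the URL is a placeholder rather than a real EOU page.

```shell
# Hypothetical helper: read HTML on stdin and print the URL scheme used by
# each <link rel="amphtml" ...> tag (assumes double-quoted attributes and
# one tag per line).
amphtml_scheme() {
  grep -o '<link[^>]*rel="amphtml"[^>]*>' \
    | sed -n 's/.*href="\([a-z][a-z]*\):\/\/.*/\1/p'
}

# Placeholder markup for illustration.
printf '%s\n' '<head><link rel="amphtml" href="https://amp.example.org/page.html"></head>' \
  | amphtml_scheme
```

Fed a page fetched from the desktop site, the output should be https once the change has taken effect.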

It will take a little while to get all the wrinkles out!

In future, the m/lite preferred version is likely to remain http for speed, and the www/desktop preferred version is likely to become https for a small SEO boost.

2020-10-28: HTTPS Doubles Page Crawl Time

Screenshot 20201028 AMP page crawl time doubles with https

GSC's average page download time (now at ~10kB per page) has risen from ~250ms to ~500ms with the switch from http to https preferred.

Note that Google's crawlers are generally visiting from across the Atlantic, which multiplies the effect of the extra round trips in the TLS setup handshake. If the crawlers ever get to use H3 (HTTP/3), including header compression, some of that time may be recovered.

As of today mean page fetch size is a little under 10kB and ~180 pages are crawled per day, as given in GSC's crawl stats for the AMP site version.

2020-09-01: Brotli Sides

I have enabled Brotli static pre-compression for supporting top-level pages, such as the home and sitemap pages. If that does not cause any problems then I shall extend such br content-encoding to main pages also, either side of making the https set canonical for (say) AMP and desktop. (In practice Brotli only works over https, since browsers only offer br in Accept-Encoding there.)

: I weakened and have turned on Brotli precompression for all main pages, having seen at least some (legit) spiders starting to fetch content over https, where it may help. (Not all spiders can Accept-Encoding: br even with https though.)
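
Whether a given client can take the precompressed version comes down to its Accept-Encoding request header. As a rough sketch, with a hypothetical helper name and no claim that this is how the server here decides, the check looks like this. It ignores q-values, so 'br;q=0' would wrongly count as acceptance.

```shell
# Hypothetical helper: succeed if an Accept-Encoding header value lists "br".
# Splits on commas, strips whitespace and any ";q=..." parameter, then looks
# for an exact (case-insensitive) "br" token. q-values are not interpreted.
accepts_br() {
  printf '%s\n' "$1" | tr ',' '\n' | sed 's/[[:space:]]//g; s/;.*//' | grep -qix 'br'
}

for ae in "gzip, deflate, br" "gzip, deflate" "br;q=1.0, gzip;q=0.8"; do
  if accepts_br "$ae"; then
    echo "br ok: $ae"
  else
    echo "no br: $ae"
  fi
done
```

The first and third header values are reported as accepting br, the second is not.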