Earth Notes: On Website Technicals (2020-10)

Updated 2022-09-15.
Tech updates: smaller than recommended, https 150ms slower, https Dataset canonical, Textract, ORCID, 1995...
A fiddling-round-the-edges month... I have verified that https adds significant page rendering latency (~150ms) even for clients in the UK.

2020-10-17: Ancient History

ExNet home page 1995 via TheOldNet

Doesn't really belong here, but rearing its ugly head from TheOldNet: ExNet's home page circa 1995/1996!

Swirly background ahoy, and Java applet that most browsers will decline to show these days...

2020-10-14: ORCID

I have included my ORCID link in author metadata in each page.

I think that the main benefit will accrue to the ~13 Datasets that refer to that metadata as the creator.
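As a rough illustration of what ORCID-bearing author metadata could look like when reused as a Dataset creator, here is a minimal sketch. It assumes schema.org JSON-LD with the ORCID URL carried in sameAs; the site may use different markup or a different property, and the name, ORCID iD and dataset title are hypothetical placeholders.

```python
import json

# Hypothetical placeholder values: not the author's real name or ORCID iD.
AUTHOR_NAME = "A. N. Author"
ORCID_URL = "https://orcid.org/0000-0000-0000-0000"


def author_jsonld(name, orcid_url):
    """Return a schema.org Person dict that points at an ORCID record."""
    return {
        "@type": "Person",
        "name": name,
        # sameAs links this Person to the same identity elsewhere.
        "sameAs": [orcid_url],
    }


def dataset_jsonld(title, creator):
    """Return a minimal schema.org Dataset dict reusing the author as creator."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "creator": creator,
    }


author = author_jsonld(AUTHOR_NAME, ORCID_URL)
print(json.dumps(dataset_jsonld("Example dataset", author), indent=2))
```

Defining the author once and passing it in as creator means every Dataset picks up the ORCID link without repeating it by hand.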

2020-10-11: Textract

When computing readability of articles, I use reado with unfluff. The latter sometimes discards most of the content, resulting in wacky scores.

I have given textract a little spin. It's a little slow but spits out decent plain text given my HTML core source.

(I'm having some difficulty getting it to install properly on the Mac, as I did with the Google AMP validator, for example.)

The code change for checking would be from reado --unfluff to textract | reado, though there would be some complications...

I haven't yet found a compelling improvement, but I may reassess that!

2020-10-08: HTTPS Dataset Page Canonical

For all pages containing a Dataset, I have hardwired the https://www. version to be canonical. This avoids the current confusion of both the http and https pages claiming to be the canonical copy.
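Hardwiring the canonical form amounts to mapping every variant of a page URL onto one https://www. URL. A minimal sketch of that mapping follows, assuming a hypothetical host name (www.example.org) and assuming that the canonical URL is emitted as a link element in the page head; the real build may do this differently.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical host; substitute the site's real www host name.
CANONICAL_HOST = "www.example.org"


def canonical_url(url, host=CANONICAL_HOST):
    """Map any http/https variant of a page URL to its https://www. form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Drop query and fragment: the canonical URL names the page itself.
    return urlunsplit(("https", host, path, "", ""))


def canonical_link_tag(url):
    """Build the link rel=canonical element for the page head."""
    return '<link rel="canonical" href="%s">' % canonical_url(url)


print(canonical_link_tag("http://example.org/data/survey.html"))
```

Dropping the query string and fragment is a design choice for this sketch, suitable for static pages; a site whose pages depend on query parameters would need to keep them.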

I also made the survey results page a Dataset in its own right!

Also, for those datasets that are under data/, I have flagged them as isPartOf the main 16WW dataset. The reverse hasPart relationship appears to be rejected by Google at the moment, eg by the Structured Data Testing tool, which reports "CreativeWork is not a known valid target type for the hasPart property".
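The one-way child-to-parent link can be sketched as below. This assumes JSON-LD output (the same schema.org properties can be expressed in other markup), and the URLs and dataset names are hypothetical.

```python
import json

# Hypothetical URL for the parent dataset; the real site paths will differ.
MAIN_DATASET_URL = "https://www.example.org/data/"


def child_dataset(name, url, parent_url=MAIN_DATASET_URL):
    """Describe a dataset and point at its parent via isPartOf only."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "url": url,
        # Link child -> parent; no hasPart is emitted on the parent side.
        "isPartOf": {"@type": "Dataset", "url": parent_url},
    }


doc = child_dataset("Example sub-dataset", "https://www.example.org/data/example/")
print('<script type="application/ld+json">%s</script>' % json.dumps(doc))
```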

2020-10-06: Is HTTPS Fast Yet?

MachMetrics https WWW metrics 20201006 screenshot

It's clear amongst the volatility in "Interactive" and "Speed Index" values that https is consistently slower for visitors, by ~150ms, even here where the client and server are both in London.

This change is basically the https negotiation time from cold.

Maybe gains from HTTP/2 (h2) mean that some metrics are a bit less volatile, and page complete is only up by ~100ms (~510ms vs ~410ms).

These numbers are a mixture of mobile and desktop renderings of the same desktop home page, over 'cable', eg for users coming in over WiFi from home. Desktop numbers are showing as ~560ms, mobile ~460ms, to page complete.

2020-10-05: Image Size Smaller Than Recommended

I received a warning from GSC for one AMP page: Image size smaller than recommended size.

It doesn't say which image, and as I have multiple ImageObject alternates for that page now, I think what it really means is "none of the images that you list with ImageObject markup is large enough" at the point that the page was last checked, many days ago.

According to the Google developer guidelines for Article, there should be one or more ImageObjects (or URLs) with images representative of the article.

The guidelines suggest minimum width and area, and the warning I got was with reference to those, so GSC claims.

The relevant dimensions are different for normal and AMP (non-story) pages.

For non-AMP pages, images should be at least 696 pixels wide and have a minimum of 300,000 pixels when multiplying width and height, at aspect ratios 16x9, 4x3, and 1x1.

For AMP non-story pages, images should be at least 1200 pixels wide and have a minimum of 800,000 pixels when multiplying width and height, at the same aspect ratios.

The images that I mark up as ImageObjects at the various aspect ratios come from the same set that I select the top-of-page hero banner from. As I want (at least) 800px width for the banner, the non-AMP minimum width is not much of a restriction in practice.

I have these limits (or higher) already coded into the page-build script. I am tweaking this to warn about not meeting the higher width just for the AMP page builds, and will continue to provide info for others. (The minimum area suggestions seem to be weaker.) That will generate some more noise, but at least it should be in sync with GSC complaints.
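The kind of check described above might look roughly like this. It is a sketch in Python rather than the actual page-build script; the function names and the sample image sizes are hypothetical, and the limits are the figures quoted above from the guidelines. A page passes if at least one listed image meets both limits, matching my reading of the GSC warning.

```python
# Minimum sizes quoted above from the Google Article guidelines (as of 2020-10).
LIMITS = {
    "non-amp": {"min_width": 696, "min_area": 300_000},
    "amp": {"min_width": 1200, "min_area": 800_000},
}


def image_warnings(width, height, page_kind):
    """Return a list of warning strings for one candidate image."""
    limits = LIMITS[page_kind]
    warnings = []
    if width < limits["min_width"]:
        warnings.append("width %dpx < %dpx" % (width, limits["min_width"]))
    if width * height < limits["min_area"]:
        warnings.append("area %dpx < %dpx" % (width * height, limits["min_area"]))
    return warnings


def page_ok(images, page_kind):
    """True if at least one (width, height) image passes both checks."""
    return any(not image_warnings(w, h, page_kind) for (w, h) in images)


# Hypothetical hero image candidates at 16x9, 4x3 and 1x1.
candidates = [(800, 450), (800, 600), (800, 800)]
print(page_ok(candidates, "non-amp"))
print(page_ok(candidates, "amp"))
```

With 800px-wide images the non-AMP check passes while the AMP check fails on width alone, which is the case the extra AMP-only warning is meant to catch.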