Earth Notes: On Website Technicals (2019-02)

Updated 2024-04-21.
Tech updates: micro-optimisation, isBasedOn, misuse of link rel prev/next, AMP half-indexed, Google-, soft 404, 1990 style, desktop tweak, 60% AMPed.
I failed to get more than ~60% of EOU AMP pages reported as "valid" (elsewhere "indexed") though none are marked invalid and essentially all the canonical pages are reported as "indexed". Bizarre!

2019-02-28: AMP 60% Indexed

20190228 AMP 60pc pages indexed from GSC Enhancements view

Wandered back up to about 60% indexed or "valid". This might be as high as it gets...

Though essentially all the canonical pages are reported as "indexed" (two for a few days were reported as "excluded"), the old GSC structured data report only reports about the same number of "Organization" schema.org objects though all main canonical pages have them. Unhelpful muddiness and inconsistency in reporting. Certainly not actionable.

2019-02-24: Desktop Tweak

Given the truncation of my desktop social-media button bar, I am tweaking the layout of the navigation bar at the top of the page.

In the first instance I have narrowed it and (temporarily) dropped the carbon-intensity button. My aim is to improve the look of the top of the desktop page for desktop visitors.

Cache-Control

I also tweaked the 'access +nnn' Cache-Control and Expires values sent for desktop auto-generated out graphics to be just over half the nominal interval rather than the full interval. So graphs under out/weekly now expire after about 4 days rather than 7. The aim is to avoid leaving a client with a stale copy older than the implied interval, given timing races and so on.
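A minimal sketch of the arithmetic, in Python rather than server configuration. The function names and the exact margin (one fourteenth of the interval, chosen so that 7 days maps to 4) are illustrative assumptions, not the live settings:

```python
DAY_S = 24 * 60 * 60

def expiry_seconds(nominal_interval_s: int, margin_divisor: int = 14) -> int:
    """Return a lifetime just over half the nominal update interval.

    margin_divisor is a hypothetical knob: the extra margin added on top
    of half the interval is nominal_interval_s / margin_divisor.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return nominal_interval_s // 2 + nominal_interval_s // margin_divisor

def cache_control_value(nominal_interval_s: int) -> str:
    """Build a Cache-Control header value from the shortened lifetime."""
    return f"max-age={expiry_seconds(nominal_interval_s)}"

# A weekly graph: nominal interval 7 days, served with a 4-day lifetime.
weekly = cache_control_value(7 * DAY_S)
```

With a lifetime this far short of the full interval, a copy fetched just before regeneration should expire well before the following update, at the cost of some extra revalidation traffic.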

I drastically shortened the general expiry time under data since although most of the objects under there become completely static after a certain point, other things change daily or more often. I will have to see if this change induces significant extra traffic.

2019-02-20: WWWoaah!

I am pleased to see that EOU seems to work reasonably well in an emulation of the 1990 initial Web browser, with Open -> Open from full document reference.

See: WWW = Woeful, er, winternet wendering? CERN browser rebuilt after 30 years barely recognizes modern web.

2019-02-18: Soft 404

I am puzzled by Google reporting (in GSC) files such as www.earth.org.uk/data/WW-PV-roof/E2019.csv, with a MIME type in the HTTP header of text/csv, as "Soft 404". There's nothing '404' about it: it's clearly a data file, and present, and behaving as expected. It is not a missing HTML page, nor a duplicate, nor an error document, for example.

2019-02-13: Google-

Since Google+ is going away in March/April I have removed the social media button for it from desktop/lite pages. (AMP uses a different mechanism.)

While I am having fun, and to save more page weight, I removed the RSS button, since I saw no evidence of it being used.

Thank you again to Share42 for the script and buttons, to TinyPNG for minifying the icons, and to zopfli for minimising the pre-compressed JavaScript!

Page weight (on first load) should now have dropped by more than 180 bytes.

I will probably tidy up the appearance of the float box that includes the now-shorter button bar, in due course...

(The old and new versions of the button bar have distinct paths so that they can coexist as pages are gradually rebuilt and/or old ones live in various caches. At some point I may remove the older files for tidiness. Note also that the desktop and lite JavaScript files, though under different paths, each on their own site to avoid security snafus, are the same object in the repository.)

2019-02-10: AMP 50% Indexed

20190210 AMP 50pc pages indexed from GSC Enhancements view

The number of AMP pages marked as valid/indexed has been wobbling around the 100 (ie ~50%) mark for many days. Note that only one residual AMP error is being reported. (This one is apparently still from Google's internal "crawl issue" bug.) All main canonical pages as listed in sitemap.xml are reported as indexed. So it puzzles me why half the AMP versions are not.

2019-02-09: Holding it Wrong: link rel= prev/next

I have been linking sets of pages together, such as in this sequence of tech notes, with manual links in the page body and link rel prev and next in the head. It's slightly tiresome and error-prone work.

Also, the link rel part seems simply to be wrong, eg from "Indicating paginated content to Google":

Note: You should not use this technique merely to indicate a reading list of an article series; you should use this to indicate a single long piece of content that is broken into multiple pages.

I have read various things on this topic, but this seems to be the clearest statement so far.

I have manually removed a couple of manual prev/next pairs between individual article headers as a small quick test and improvement.

But I would like to do something more systematic for the long series that I have. Eg some fixed metadata that does the right thing in the body of the page, and whatever is appropriate (but probably not prev/next) in the head.

Happily this may trim the head/CRP for all the affected pages. It should certainly save me some manual boilerplate hacking and maintenance over time!

Now for pages marked as SERIES, I automatically insert previous and next links, and breadcrumb structured data, with a link to the head/unnumbered page if extant: Breadcrumb. I am still tweaking the appearance of the resulting early sidebar.
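The neighbour lookup behind such automatic links can be sketched as below. This is only an illustration in Python: the function names, the link text and the page file names are hypothetical, and the real site build does more (breadcrumb structured data, the link to the head page):

```python
from typing import List, Optional, Tuple

def series_neighbours(pages: List[str], current: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (previous, next) for current within an ordered series.

    Either element is None at the corresponding end of the series.
    """
    i = pages.index(current)  # raises ValueError if the page is not in the series
    prev_page = pages[i - 1] if i > 0 else None
    next_page = pages[i + 1] if i + 1 < len(pages) else None
    return prev_page, next_page

def body_nav_html(pages: List[str], current: str) -> str:
    """Build simple in-body navigation links; nothing is emitted for the head."""
    prev_page, next_page = series_neighbours(pages, current)
    parts = []
    if prev_page is not None:
        parts.append(f'<a href="{prev_page}">previous</a>')
    if next_page is not None:
        parts.append(f'<a href="{next_page}">next</a>')
    return " | ".join(parts)

# Hypothetical series of three pages.
series = ["notes-1.html", "notes-2.html", "notes-3.html"]
middle_nav = body_nav_html(series, "notes-2.html")
```

Deriving the links from one ordered list means that inserting a page into the series only needs the list updated, not hand edits to its neighbours.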

2019-02-03: Schema.org ImageObject isBasedOn

For hero images used in EOU and derived from external sources, and for which I have a credit/discussion .txt file, I have made two enhancements.

The .txt link now gets an itemprop=discussionUrl. I am not sure if the semantics are quite right, but it's close.

If the .txt file contains a line of the form isBasedOn: URL then a 'src' link is made after the 'i' link to the given URL with an itemprop=isBasedOn.
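Picking that line out of the credit file might look something like the following Python sketch. The function name and the sample file content are assumptions for illustration; only the "isBasedOn: URL" line format comes from the description above:

```python
from typing import Optional

def find_is_based_on(txt: str) -> Optional[str]:
    """Return the URL from the first 'isBasedOn: URL' line, or None."""
    for line in txt.splitlines():
        # Split on the first colon only, so the URL's own colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "isBasedOn":
            url = value.strip()
            if url:
                return url
    return None

# Hypothetical credit file content.
sample_txt = (
    "Image credit and discussion.\n"
    "isBasedOn: https://pixabay.com/en/tool-pliers-screwdriver-145375/\n"
)
source_url = find_is_based_on(sample_txt)
```

If no such line is present the function returns None, and the 'src' link would simply be omitted.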

isBasedOn Example

Here is a snippet from the foot of the desktop/canonical version of this page as of writing, with some whitespace added for readability:

<strong id=pgMedia>Page Media</strong>:
<span itemprop=image itemscope itemtype=http://schema.org/ImageObject><meta itemprop=width content=1280><meta itemprop=height content=1192>
<a href=img/tools-1280w.png itemprop=url>image</a>
(<a href=img/tools-1280w.png.txt itemprop=discussionUrl>i</a>/
<a href=https://pixabay.com/en/tool-pliers-screwdriver-145375/ itemprop=isBasedOn>src</a>)</span>.

2019-02-02: Micro-optimisation

Last month I managed to squeak the head/CRP for a particular page under the limit to retain its Twitter video player card, etc.

This was in part through assuming that the embedded player video URL, eg https://www.youtube.com/embed/BAP56HIPBY8, would not need quoting when used as an attribute value. For this it must not contain spaces nor quotes nor a '>' closing angle bracket.

At the time I could not be sure that the URL would never end in a '/' (slash). If one did, it would not be safe to use unquoted in an attribute at the end of an HTML tag ie ... attrname=value>.
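The conditions above can be expressed as a small check, sketched here in Python. The function name is hypothetical, and this covers only the rules stated in the text, not a full HTML validator:

```python
def safe_unquoted_attr_value(url: str) -> bool:
    """True if url meets the stated conditions for use as an unquoted
    attribute value, including as the last attribute before '>'."""
    if not url:
        return False  # an empty value cannot be written unquoted
    if any(ch.isspace() for ch in url):
        return False  # whitespace would end the value early
    if any(ch in url for ch in "\"'>"):
        return False  # quotes and '>' are excluded
    if url.endswith("/"):
        return False  # trailing slash next to '>' is treated as unsafe here
    return True

ok = safe_unquoted_attr_value("https://www.youtube.com/embed/BAP56HIPBY8")
```

Note that the HTML syntax rules for unquoted attribute values are a little stricter than this, also excluding '=', '<' and the backtick, so a URL with a query string would need quoting anyway.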

I rearranged the attributes so as to have the URL-containing one not last. But that inconsistency in attribute ordering reduces compressibility.

Today I added checks for raw and Twitter player URL safety, and put the attributes back in the same order that I use elsewhere. The uncompressed form of the page preamble/head/CRP is exactly the same size and semantic content, but the gzip -8 and zopfli output is slightly smaller. The pre-compressed version is made with zopfli, but the CRP size is tested with gzip -8, and the desktop page threshold is currently 1260, aiming to allow some meaningful body text into the first TCP frame sent, after HTTP/1.1 headers.

Compression gains from reordering attributes

Version     Uncompressed bytes  gzip -8 bytes  zopfli bytes
Original    2961                1258           1243
Re-ordered  2961                1257           1237
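A size test of this kind can be approximated with Python's standard gzip module, as in the rough sketch below. The function names and sample markup are hypothetical, treating "under the limit" as less-than-or-equal is an assumption, and the byte counts from Python's gzip may differ by a few bytes from the gzip command line (zopfli is not in the standard library so is omitted):

```python
import gzip

CRP_LIMIT_BYTES = 1260  # desktop threshold quoted in the text

def gzip8_size(data: bytes) -> int:
    """Compressed size at level 8; mtime=0 keeps the output reproducible."""
    return len(gzip.compress(data, compresslevel=8, mtime=0))

def fits_crp_limit(head: bytes, limit: int = CRP_LIMIT_BYTES) -> bool:
    """True if the gzip-compressed head is within the limit (assumed inclusive)."""
    return gzip8_size(head) <= limit

# Two hypothetical fragments with identical attributes and total length:
# one keeps the attribute order consistent, the other alternates it.
consistent = "".join(
    f"<a href=p{i}.html itemprop=url>x</a>\n" for i in range(10)
).encode("ascii")
alternating = "".join(
    (f"<a href=p{i}.html itemprop=url>x</a>\n" if i % 2 == 0
     else f"<a itemprop=url href=p{i}.html>x</a>\n")
    for i in range(10)
).encode("ascii")

sizes = {
    "consistent": (len(consistent), gzip8_size(consistent)),
    "alternating": (len(alternating), gzip8_size(alternating)),
}
```

Comparing the two entries in sizes shows the uncompressed lengths matching while the compressed lengths may differ slightly, which is the sort of effect the table records.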