Earth Notes: On Website Technicals (2019-02)
Updated 2022-11-30.
2019-02-28: AMP 60% Indexed
The count of AMP pages reported as indexed or "valid" has wandered back up to about 60%. This might be as high as it gets...
Although essentially all the canonical pages are reported as "indexed" (two were reported as "excluded" for a few days), the old GSC structured data report shows only about the same number of schema.org "Organization" objects, even though all main canonical pages have them. Unhelpful muddiness and inconsistency in reporting. Certainly not actionable.
2019-02-24: Desktop Tweak
Given the truncation of my desktop social-media button bar, I am tweaking the layout of the navigation bar at the top of the page.
In the first instance I have narrowed it and (temporarily) dropped the carbon-intensity button. My aim is to improve the look of the top of the desktop page for non-desktop visitors.
I also tweaked the 'Expires' values sent for desktop auto-generated graphics under out, setting them to just over half the nominal interval rather than the full interval. So graphs under out/weekly now expire after about 4 days rather than 7. The aim is to avoid leaving a client with a stale copy older than the implied interval, given timing races and so on.
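The "just over half the nominal interval" rule can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the mechanism EOU actually uses: the one-hour MARGIN and the function names are hypothetical, and a real site would more likely set these headers in its web server configuration. It also emits a Cache-Control max-age alongside Expires, since HTTP/1.1 caches give max-age precedence over Expires when both are present.

```python
from __future__ import annotations

import time
from email.utils import formatdate

DAY = 24 * 60 * 60  # seconds in a day

# Hypothetical safety margin (one hour) added on top of half the interval.
MARGIN = 60 * 60


def expiry_seconds(nominal_interval: int) -> int:
    """Cache lifetime of just over half the nominal regeneration interval (seconds)."""
    lifetime = nominal_interval // 2 + MARGIN
    # For very short intervals the margin would dominate, so cap at the interval itself.
    return min(lifetime, nominal_interval)


def caching_headers(nominal_interval: int, now: float | None = None) -> dict[str, str]:
    """Expires and Cache-Control headers for a graph regenerated every nominal_interval seconds."""
    if now is None:
        now = time.time()
    lifetime = expiry_seconds(nominal_interval)
    return {
        "Expires": formatdate(now + lifetime, usegmt=True),  # HTTP date in GMT
        "Cache-Control": f"max-age={lifetime}",
    }


if __name__ == "__main__":
    # A weekly graph: roughly 3.5 days of cache lifetime rather than 7.
    print(expiry_seconds(7 * DAY) / DAY)
    print(caching_headers(7 * DAY))
```

Halving rather than using the full interval means that a copy fetched just before regeneration is still discarded well before the following regeneration.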
I drastically shortened the general expiry time under data since, although most of the objects there become completely static after a certain point, others change daily or more often. I will have to see if this change induces significant extra traffic.
I'm pleased to see that EOU seems to work reasonably well in an emulation of the 1990 initial Web browser, with Open -> Open from full document reference.
2019-02-18: Soft 404
I am puzzled by Google reporting (in GSC) files such as www.earth.org.uk/data/WW-PV-roof/E2019.csv, served with an HTTP Content-Type of text/csv, as "Soft 404". There's nothing '404' about it: it's clearly a data file, present, and behaving as expected. It is not, for example, a missing HTML page, a duplicate, or an error document.
Since Google+ is going away in March/April, I have removed its social-media button from desktop/lite pages. (AMP uses a different mechanism.)
While I am having fun, and to save more page weight, I removed the RSS button, since I saw no evidence of it being used.
Page weight (on first load) should now have dropped by more than 180 bytes.
I will probably tidy up the appearance of the float box that includes the now-shorter button bar, in due course...
2019-02-10: AMP 50% Indexed
The count of AMP pages marked as valid/indexed has been wobbling around the 100 (ie ~50%) mark for many days. Note that only one residual AMP error is being reported. (This one apparently still stems from Google's internal "crawl issue" bug.) All main canonical pages as listed in sitemap.xml are reported as indexed. So it puzzles me why half the AMP versions are not.
2019-02-09: Holding it Wrong: link rel=prev/next
I have been linking sets of pages together, such as in this sequence of tech notes, with manual links in the page body and link rel=prev/next in the head. It's slightly tiresome and error-prone work.
But the link rel part seems simply to be wrong, eg from "Indicating paginated content to Google":
Note: You should not use this technique merely to indicate a reading list of an article series; you should use this to indicate a single long piece of content that is broken into multiple pages.
I have read various things on this topic, but this seems to be the clearest statement so far.
As a small quick test and improvement, I have manually removed a couple of link rel=prev/next pairs from the heads of individual articles.
But I'd like to do something more systematic for the long series that I have: eg some fixed metadata that does the right thing in the body of the page, and whatever is appropriate (but probably not prev/next) in the head.
Happily this may trim the head/CRP for all the affected pages. It should certainly save me some manual boilerplate hacking and maintenance over time!
Now, for pages marked as SERIES, I automatically insert previous and next links, and breadcrumb structured data, with a link to the head/unnumbered page if one exists (see: Breadcrumb). I'm still tweaking the appearance of the resulting early sidebar.
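As a rough sketch of the SERIES handling, assuming a series is simply an ordered list of page URLs plus an optional unnumbered head page, the previous/next links and the breadcrumb structured data might be generated like this. The function names, link labels and markup are hypothetical, not EOU's actual build code; the breadcrumb follows the schema.org BreadcrumbList/ListItem microdata pattern. It emits only plain in-body links, with no link rel=prev/next in the head, in line with the Google note quoted earlier.

```python
from __future__ import annotations

from html import escape


def series_neighbours(pages: list[str], current: str) -> tuple[str | None, str | None]:
    """Return the (previous, next) page URLs around `current` in an ordered series."""
    i = pages.index(current)  # raises ValueError if current is not in the series
    prev_page = pages[i - 1] if i > 0 else None
    next_page = pages[i + 1] if i + 1 < len(pages) else None
    return prev_page, next_page


def series_nav_html(pages: list[str], current: str, head: str | None = None) -> str:
    """Plain in-body navigation links; deliberately no <link rel=prev/next> in the head."""
    prev_page, next_page = series_neighbours(pages, current)
    links = []
    if head is not None:
        links.append(f'<a href="{escape(head)}">series start</a>')
    if prev_page is not None:
        links.append(f'<a href="{escape(prev_page)}">previous</a>')
    if next_page is not None:
        links.append(f'<a href="{escape(next_page)}">next</a>')
    return " | ".join(links)


def breadcrumb_microdata(trail: list[tuple[str, str]]) -> str:
    """schema.org BreadcrumbList microdata for (name, url) pairs, outermost first."""
    items = []
    for position, (name, url) in enumerate(trail, start=1):
        items.append(
            '<li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">'
            f'<a itemprop="item" href="{escape(url)}"><span itemprop="name">{escape(name)}</span></a>'
            f'<meta itemprop="position" content="{position}"></li>'
        )
    return ('<ol itemscope itemtype="https://schema.org/BreadcrumbList">'
            + "".join(items) + "</ol>")


if __name__ == "__main__":
    # Hypothetical series: an unnumbered head page plus numbered parts.
    series = ["note-2.html", "note-3.html", "note-4.html"]
    print(series_nav_html(series, "note-3.html", head="note.html"))
    print(breadcrumb_microdata([("Notes", "note.html"), ("Part 3", "note-3.html")]))
```

Deriving both the links and the breadcrumb from one ordered list avoids the hand-maintained boilerplate that made the manual approach error-prone.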
2019-02-03: Schema.org ImageObject isBasedOn
For hero images used in EOU and derived from external sources, and for which I have a credit/discussion .txt file, I have made two enhancements:
The 'i' link to the .txt file now gets an itemprop=discussionUrl. I'm not sure the semantics are quite right, but it's close.
If the .txt file contains a line of the form isBasedOn: URL then a 'src' link to the given URL is made after the 'i' link, with an itemprop=isBasedOn.
Here is a snippet from the foot of the desktop/canonical version of this page as of writing, with some whitespace added for readability:
<strong id=pgMedia>Page Media</strong>: <span itemprop=image itemscope itemtype=http://schema.org/ImageObject><meta itemprop=width content=1280><meta itemprop=height content=1192> <a href=img/tools-1280w.png itemprop=url>image</a> (<a href=img/tools-1280w.png.txt itemprop=discussionUrl>i</a>/ <a href=https://pixabay.com/en/tool-pliers-screwdriver-145375/ itemprop=isBasedOn>src</a>)</span>.
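A minimal Python sketch of those two enhancements, reading the credit .txt file contents and producing markup shaped like the snippet above, might look like this. The function names are hypothetical, and the assumption that the credit file always sits at the image path plus ".txt" is taken from the snippet rather than from any stated rule. Unlike the minimised EOU output, this sketch always quotes and escapes attribute values, which is simpler and safe for arbitrary URLs.

```python
from __future__ import annotations

from html import escape


def find_is_based_on(credit_txt: str) -> str | None:
    """Return the URL from the first 'isBasedOn: URL' line of a credit .txt file, if any."""
    for line in credit_txt.splitlines():
        # Split on the first colon only, so the URL's own 'https:' survives intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "isBasedOn" and value.strip():
            return value.strip()
    return None


def _q(value: object) -> str:
    """Attribute-escape a value (this sketch always quotes, unlike the minimised EOU output)."""
    return escape(str(value), quote=True)


def page_media_html(img: str, width: int, height: int, credit_txt: str | None) -> str:
    """Build the 'Page Media' schema.org ImageObject microdata for a hero image.

    Assumes the credit file lives at img + '.txt'; pass None when there is none.
    """
    parts = [
        '<span itemprop="image" itemscope itemtype="http://schema.org/ImageObject">',
        f'<meta itemprop="width" content="{_q(width)}">',
        f'<meta itemprop="height" content="{_q(height)}">',
        f' <a href="{_q(img)}" itemprop="url">image</a>',
    ]
    if credit_txt is not None:
        # Enhancement 1: the 'i' link to the credit file is marked as the discussionUrl.
        parts.append(f' (<a href="{_q(img + ".txt")}" itemprop="discussionUrl">i</a>')
        # Enhancement 2: an optional 'src' link taken from an isBasedOn: line.
        source = find_is_based_on(credit_txt)
        if source is not None:
            parts.append(f'/ <a href="{_q(source)}" itemprop="isBasedOn">src</a>')
        parts.append(")")
    parts.append("</span>")
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical credit text; only the isBasedOn line is significant.
    txt = "Credit: example text.\nisBasedOn: https://pixabay.com/en/tool-pliers-screwdriver-145375/\n"
    print(page_media_html("img/tools-1280w.png", 1280, 1192, txt))
```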
Last month I managed to squeak the head/CRP for a particular page under the limit to retain its Twitter video player card, etc.
This was in part through assuming that the embedded player video URL, eg https://www.youtube.com/embed/BAP56HIPBY8, would not need quoting when used as an attribute value. For that to hold, it must not contain spaces, quotes, or a '>' closing angle bracket.
At the time I could not be sure that the URL would never end in a '/' (slash). If it did, it would not be safe to use unquoted in an attribute at the end of an HTML tag, where the trailing slash would sit directly before the closing '>'.
So I rearranged the attributes so that the URL-containing one was not last. But that inconsistency in attribute ordering reduces compressibility.
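A URL-safety check of the kind described can be sketched in Python as follows. This is an illustration, not the checks EOU actually added: the function names are hypothetical. It combines the conditions given in the text (no spaces, quotes or '>', and no trailing '/' when the attribute ends the tag) with the HTML Living Standard's full list of characters forbidden in unquoted attribute values, which also includes '=', '<' and the backtick. When a value fails, it falls back to a quoted, escaped form rather than failing the build.

```python
from html import escape

# Characters that the HTML Living Standard forbids in an unquoted attribute value:
# ASCII whitespace, both quote marks, '=', '<', '>' and the backtick.
FORBIDDEN_UNQUOTED = frozenset(" \t\n\f\r\"'=<>`")


def safe_unquoted(value: str, last_in_tag: bool = False) -> bool:
    """True if `value` can be emitted as an unquoted attribute value."""
    if not value:
        return False  # an empty unquoted value is not allowed
    if any(ch in FORBIDDEN_UNQUOTED for ch in value):
        return False
    # The extra rule from the text: avoid a trailing '/' when the
    # attribute would sit immediately before the tag's closing '>'.
    if last_in_tag and value.endswith("/"):
        return False
    return True


def attr(name: str, value: str, last_in_tag: bool = False) -> str:
    """Emit name=value unquoted when safe, otherwise quoted and escaped."""
    if safe_unquoted(value, last_in_tag):
        return f"{name}={value}"
    return f'{name}="{escape(value, quote=True)}"'


if __name__ == "__main__":
    url = "https://www.youtube.com/embed/BAP56HIPBY8"
    print("<meta " + attr("name", "twitter:player") + " " + attr("content", url, last_in_tag=True) + ">")
```

With a check like this in place the URL-bearing attribute can go back to its usual position, since any unsafe value is quoted automatically.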
Today I added checks for raw and Twitter player URL safety, and put the attributes back in the same order that I use elsewhere. The uncompressed page preamble/head/CRP is exactly the same size and has the same semantic content, but the gzip -8 and zopfli output is slightly smaller. The pre-compressed version is made with zopfli, but the CRP size is tested with gzip -8, and the desktop page threshold is currently 1260 bytes, aiming to allow some meaningful body text into the first TCP frame sent, after the HTTP/1.1 headers.
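The size test can be approximated in Python as sketched below, using the 1260-byte desktop threshold from the text. This is an assumption-laden stand-in rather than the real test: the function names are hypothetical, and because Python's gzip module uses zlib rather than GNU gzip's own deflate code (and the command-line tool may also store a filename in the header), the sizes may differ by a few bytes from gzip -8 output.

```python
import gzip

# Desktop head/CRP threshold quoted in the text, in bytes of gzip -8 output.
CRP_THRESHOLD = 1260


def gzip8_size(data: bytes) -> int:
    """Length of `data` compressed at gzip level 8.

    mtime=0 keeps the output reproducible. Python uses zlib, so sizes may
    differ by a few bytes from the GNU gzip command-line tool.
    """
    return len(gzip.compress(data, compresslevel=8, mtime=0))


def crp_fits(head_html: str, threshold: int = CRP_THRESHOLD) -> bool:
    """True if the page preamble/head compresses to no more than `threshold` bytes."""
    return gzip8_size(head_html.encode("utf-8")) <= threshold


if __name__ == "__main__":
    # Hypothetical heads with the same attributes in consistent vs mixed order;
    # repeated byte sequences give DEFLATE more back-references to exploit.
    same = '<meta name=a content=x><meta name=b content=y><meta name=c content=z>'
    mixed = '<meta name=a content=x><meta content=y name=b><meta name=c content=z>'
    for label, head in (("consistent", same), ("mixed", mixed)):
        print(label, len(head), gzip8_size(head.encode()), crp_fits(head))
```

Comparing the two orderings shows whether consistent attribute order actually helps for a given head, rather than assuming it does.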