Earth Notes: On Website Technicals (2019/04)

Updated 2019-04-22 19:37 GMT
Tech updates: moar liter, bumpy indexing, copyrightYear fix, Schedule, HH:MM and spatial page metadata...
More (microdata) metadata this month. Once I went looking for what might be relevant, rather than just scraping together a minimum to keep Google happy, I found lots!

2019/04/22: Adding spatialCoverage

I have just created a special meta/header tag to label a page as being spatially at 16WW. This injects an appropriate spatialCoverage itemprop in a footer, except on 'lite' pages.

In another pass I added a way to label a page as being in a specific country (usually UK for this site), then optionally at a latitude and longitude, and then exceptionally at a given elevation.

I have been able to give more than 75% of pages a sensible spatialCoverage between those two.
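As an illustration only, here is a minimal Python sketch of how a country, optional latitude/longitude and optional elevation might be turned into a spatialCoverage microdata fragment. The function name, the nesting of a Place with a GeoCoordinates, and the sample coordinates are my assumptions for the example, not the site's actual build code or real locations.

```python
from html import escape
from typing import Optional


def spatial_coverage_microdata(
    country: str,
    latitude: Optional[float] = None,
    longitude: Optional[float] = None,
    elevation_m: Optional[float] = None,
) -> str:
    """Build a schema.org spatialCoverage microdata fragment (hypothetical helper)."""
    parts = [
        '<span itemprop="spatialCoverage" itemscope itemtype="https://schema.org/Place">',
        '<span itemprop="geo" itemscope itemtype="https://schema.org/GeoCoordinates">',
        f'<meta itemprop="addressCountry" content="{escape(country)}">',
    ]
    # Latitude and longitude only make sense as a pair.
    if latitude is not None and longitude is not None:
        parts.append(f'<meta itemprop="latitude" content="{latitude}">')
        parts.append(f'<meta itemprop="longitude" content="{longitude}">')
        # Elevation is only emitted when a position is also known.
        if elevation_m is not None:
            parts.append(f'<meta itemprop="elevation" content="{elevation_m}">')
    parts.append("</span></span>")
    return "".join(parts)


if __name__ == "__main__":
    # Made-up coordinates, purely for demonstration.
    print(spatial_coverage_microdata("GB", 51.5, -0.1, 20))
```

The layering mirrors the text: country first, then position, then elevation, each level only present if the one before it is.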

I just implemented an item from my to-do list:

2019/04/18: More Precise datePublished

All but ~20 pages have now had their datePublished updated, mainly simply to add a trailing time, but sometimes also to correct the date.

There are cases where the SVN repository date is not a good reflection of when the information on the page was first published, eg because a single huge page was split up into many. (Sometimes the creation date of a new page into which information was moved has been accepted, but the temporalCoverage set to reflect a range including the original date.)

Another case is where a page was created a day or three ahead of time to save a rush on the day, but that page was not actually published (ie embargoed) until a more logical time.

In such cases where the SVN timestamp is unhelpful, the manually-selected plain date has been left in place.

The output of the tool I created to cross-check as I last ran it was:

WARNING: datePublished .electricity-storage-whole-household.html is 2010-12-20, svn is 2017-05-20T10:23:44Z...
WARNING: datePublished .index.html is 2007-05-25, svn is 2007-07-18T09:52:33Z...
WARNING: datePublished .note-on-site-technicals-20.html is 2019-01-01, svn is 2018-12-27T12:48:38Z...
WARNING: datePublished .note-on-site-technicals-3.html is 2017-08-01, svn is 2017-07-31T19:22:13Z...
WARNING: datePublished .off-grid-stats-historical-200909.html is 2009-09-11, svn is 2018-10-06T18:48:45Z...
WARNING: datePublished .off-grid-stats-historical.html is 2007-11-08, svn is 2018-10-06T18:09:39Z...
WARNING: datePublished .saving-electricity-2008.html is 2008-01-01, svn is 2017-08-20T14:20:48Z...
WARNING: datePublished .saving-electricity-2009.html is 2009-01-01, svn is 2017-08-20T14:09:14Z...
WARNING: datePublished .saving-electricity-2010.html is 2010-01-01, svn is 2017-08-20T13:51:28Z...
WARNING: datePublished .saving-electricity-2011.html is 2011-01-01, svn is 2017-08-20T13:09:22Z...
WARNING: datePublished .saving-electricity-2012.html is 2012-01-01, svn is 2017-08-20T12:54:01Z...
WARNING: datePublished .saving-electricity-2013.html is 2013-01-01, svn is 2017-08-20T12:13:14Z...
WARNING: datePublished .saving-electricity-2014.html is 2014-01-01, svn is 2017-08-20T11:00:31Z...
WARNING: datePublished .saving-electricity-2015.html is 2015-01-01, svn is 2017-08-20T10:29:47Z...
WARNING: datePublished .saving-electricity-2016.html is 2016-01-01, svn is 2017-08-19T15:23:50Z...
WARNING: datePublished .saving-electricity-2017.html is 2017-01-01, svn is 2017-08-18T18:39:50Z...
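A cross-check of this kind can be sketched roughly as below. This is not the actual tool: the function name is hypothetical, the SVN timestamp is assumed to have already been pulled from the repository log, and I assume a warning is wanted whenever the calendar dates differ.

```python
from datetime import date, datetime
from typing import Optional


def check_date_published(page: str, date_published: str, svn_first_commit: str) -> Optional[str]:
    """Return a warning line if the two dates disagree, else None (hypothetical helper)."""
    # Compare only the leading YYYY-MM-DD, since datePublished may or may not carry a time.
    published_day = date.fromisoformat(date_published[:10])
    svn_day = datetime.strptime(svn_first_commit, "%Y-%m-%dT%H:%M:%SZ").date()
    if published_day == svn_day:
        return None
    return f"WARNING: datePublished {page} is {date_published}, svn is {svn_first_commit}"


if __name__ == "__main__":
    print(check_date_published(".index.html", "2007-05-25", "2007-07-18T09:52:33Z"))
```

Comparing at day granularity means that simply adding a trailing time to datePublished does not itself trigger a warning.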

Note that the whole site appears to have been imported into SVN 2007-07-18T09:52:33Z, at that point consisting of the following files/dirs:


Timestamps from SVN for anything imported at that point are misleading. Clues in the text, and assuming that a page is at least one day older than the oldest capture in the Wayback Machine, help with these.
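The Wayback Machine rule of thumb can be written down as a small Python sketch. The function name is hypothetical, and the capture timestamps are assumed to be in the 14-digit YYYYMMDDhhmmss form that appears in Wayback Machine URLs.

```python
from datetime import date, datetime, timedelta
from typing import Iterable


def latest_plausible_publication(capture_stamps: Iterable[str]) -> date:
    """Latest plausible first-publication date: one day before the oldest capture."""
    oldest = min(datetime.strptime(s, "%Y%m%d%H%M%S") for s in capture_stamps)
    return oldest.date() - timedelta(days=1)
```

This only gives an upper bound on the date; clues in the page text may push the real date much earlier.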

Note that this date inference is needed for the home page index.html as it predates the repository.

2019/04/16: More Precise dateModified

I've extended page date metadata to hours and minutes (UTC) for dateModified and sdDatePublished. In particular dateModified is now the repository source file latest commit date and time rather than the file timestamp.
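For illustration, formatting a commit time to minute precision in UTC might look like the following Python sketch. The helper name and the exact output shape (ISO 8601 with a trailing Z) are assumptions of mine; the timestamp itself would come from the repository log.

```python
from datetime import datetime, timedelta, timezone


def to_utc_minutes(dt: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MMZ in UTC (hypothetical helper)."""
    if dt.tzinfo is None:
        # Refuse naive datetimes rather than guess which zone they are in.
        raise ValueError("timezone-aware datetime required")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")


if __name__ == "__main__":
    bst = timezone(timedelta(hours=1))
    # 09:30:45 at UTC+1 becomes 08:30 UTC; seconds are dropped.
    print(to_utc_minutes(datetime(2019, 4, 16, 9, 30, 45, tzinfo=bst)))  # 2019-04-16T08:30Z
```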

Note that other places, such as the sitemaps, may still use the file timestamp, as it is quick to get and a reasonable guide for a search engine of when content has changed. Timestamps such as the HTTP Last-Modified header will come from the file timestamps of the plain or compressed version of the file as requested by the client.

Rather than be free-floating, I have now attached the 'EOU' info as sourceOrganization to the page/Article.

I've also allowed datePublished to include a (UTC) full time, where I am able to provide it, eg from inspection of SVN repository logs.

Finally, to make the (last updated) date easy for a user to find, it is now shown per the Google News guidelines:

Date and time should be positioned between the headline and the article text.

2019/04/15: The Joy of Schedule

As I posted as a new issue on Github schemaorg/schemaorg:

In my page:

I talk about the joy of planting, growing and eating pumpkins.

Somewhere in there I'd like to mark up that this fun is to be had April to September every year. Maybe I could jam one of the ISO 8601 repeating times into temporal or temporalCoverage, maybe for the Article or an embedded Thing or Event representing the growing of pumpkins.

What would be the right thing to do here? So far I really can't see what it would be!

I was pointed to the existence of and Schedule so I shall see how I might make those work!
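As a rough idea of where that might lead, here is a Python sketch that builds a JSON-LD Schedule for "April to September every year". The function name is hypothetical, and the use of repeatFrequency and byMonth reflects my reading of the schema.org Schedule type rather than markup that is actually on the page.

```python
import json


def yearly_season_schedule(first_month: int, last_month: int) -> dict:
    """Build a schema.org Schedule repeating yearly over a run of months (hypothetical helper)."""
    if not (1 <= first_month <= last_month <= 12):
        raise ValueError("months must satisfy 1 <= first_month <= last_month <= 12")
    return {
        "@context": "https://schema.org",
        "@type": "Schedule",
        "repeatFrequency": "P1Y",  # ISO 8601 duration: repeats once a year
        "byMonth": list(range(first_month, last_month + 1)),
    }


if __name__ == "__main__":
    # April (4) to September (9), as for the pumpkin growing season.
    print(json.dumps(yearly_season_schedule(4, 9), indent=2))
```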

2019/04/10: Fixed copyrightYear

Since structured/meta data has been on EOU, I have had the copyrightYear be the whole site's first year, ie 2007. I have now fixed it to be the year that the individual page, ie CreativeWork, was first published, which is more true to the definition, and more granular.
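In code terms the change amounts to something like this minimal Python sketch, assuming datePublished is held as an ISO 8601 string with or without a trailing time. The function name is hypothetical.

```python
from datetime import date


def copyright_year(date_published: str) -> int:
    """Derive copyrightYear from a page's own datePublished (hypothetical helper)."""
    # Parse only the leading YYYY-MM-DD so a trailing time is tolerated.
    return date.fromisoformat(date_published[:10]).year
```

So a page first published on 2010-12-20 gets copyrightYear 2010 rather than the site-wide 2007.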

2019/04/07: Indexing Bumpy

56% of AMP pages reported valid/indexed

It's still really unclear to me what "valid" and "indexed" mean in various places in GSC, such as the AMP and Mobile Usability 'Enhancements' vs coverage by sitemap... GSC seems unwilling to report much more than 50% of my AMP pages as "Valid" (green), even though it reports no problems and has 100% of the canonical (desktop) pages as "Valid" in the main sitemap.

(Also, my network connection has been very flaky for about 24h, so I'm expecting some complaints from GSC about that in due course...)

2019/04/04: Speak Moar Liter

I am for now stripping out speakable metadata from lite pages, since no one is going to be using it for a while, and Google seems to care only about markup and content parity between desktop and AMP.

~80 bytes lopped off each 'lite' .htmlgz.

There are other marginal metadata elements that I could strip out for lite (m-dot) also, if I had the urge!