Earth Notes: On Website Technicals (2018-09)

Updated 2022-10-16.

Tech updates: data file Atom sitemap in robots.txt, Google Dataset Search, poetry, DataDownload, CC0 licence, About, AMP.

Data, data, data: working on making numbers/stats/data published on EOU easier to find and easier to use, including licensing. And warming up to creating a full AMP view of EOU!

2018-09-30: AMP

I am contemplating adding a set of AMP-formatted pages, maybe under an amp. sub-domain analogous to the existing m. mobile domain.

For performance I'm already doing many of the things that AMP wants, but the following are likely to be a pain:

Need to replace all img and related tags, which implies huge manual effort.
Need to dispense with JavaScript-driven social media buttons in AMP page versions.
Need to tweak page generation for tags that AMP does not regard as optional such as head and body.
Need to update page generation for three-way MOBILE value, eg true/false/amp, and test all cases that currently assume excluded middle!
Need to accommodate extra rel=amphtml link in each header.
Need to adjust CSS eg for !important items.
Need to do without any inline CSS styling.
Need to do without any other JavaScript, eg the interactive 'stacks'.

I initially hand-crafted and validated one page (based on the validator's own minimal template) to establish that the boilerplate itself is probably not too horrible.

Images may kill me, so there may need to be a higher-level representation, and a refusal to translate pages containing low-level image stuff.

The solution may be to find a reliable way to detect pages that contain untranslatable elements (raw img/picture/etc, JS, in-line style) and refuse to generate 'AMP' versions, redirecting to the plain HTML5 mobile version. (And not putting in link rel=amphtml references from the other page versions.) A command-line amphtml-validator could be handy: this NPM tool works on my RPi but not on my Mac...
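A minimal sketch of such a detector, in Python, using only the standard library. It is not the site's actual build code: the tag list, the class name `AmpBlockerFinder` and the function `amp_blockers` are all hypothetical, and it only looks for raw img/picture/script tags and in-line style attributes.

```python
from html.parser import HTMLParser

# Assumption: these raw tags are the ones treated as untranslatable.
BLOCKING_TAGS = {"img", "picture", "script"}


class AmpBlockerFinder(HTMLParser):
    """Collects reasons why a page source should not get an AMP version."""

    def __init__(self):
        super().__init__()
        self.reasons = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag in BLOCKING_TAGS:
            self.reasons.append(f"raw <{tag}> tag")
        if any(name == "style" for name, _ in attrs):
            self.reasons.append(f"inline style on <{tag}>")


def amp_blockers(html_source):
    """Return a list of blocking reasons; an empty list means 'go ahead'."""
    finder = AmpBlockerFinder()
    finder.feed(html_source)
    finder.close()
    return finder.reasons
```

A page generator could skip both the AMP output and the rel=amphtml link whenever the returned list is non-empty, and log the reasons to help decide which pages are worth reworking.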

After some more footling, I have fixed the boilerplate head/foot generation and CSS, and at least a few very simple AMP-compliant pages can be generated from the same source as the desktop and lite versions, hurrah! (Nothing with images, scripts including Google's site search, etc.)

2018-10-02: I observed that details and summary tags, which I use quite a lot, were rejected by AMP with no reason given. Lo and behold a fix is on its way...

2018-11-10: I have just validated an AMP version of the 'about' page, including a floating IMG converted to amp-img. Not very exciting of itself, and it won't stay where it currently is, but it has been submitted to Google to see what happens.
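As an illustration of the IMG to amp-img conversion, here is a hedged Python sketch. The function name `to_amp_img` and the choice of which attributes to carry over are assumptions, not the site's real converter. It refuses any image without explicit src, width and height, on the assumption that the AMP version needs known dimensions.

```python
from html import escape

# Assumption: only these attributes are carried over to the AMP element.
KEPT_ATTRIBUTES = ("src", "alt", "width", "height")


def to_amp_img(attrs):
    """Build an <amp-img> element from a dict of <img> attributes.

    Returns None when src, width or height is missing, so the caller
    can fall back to not generating an AMP page at all.
    """
    if not all(attrs.get(name) for name in ("src", "width", "height")):
        return None
    rendered = " ".join(
        f'{name}="{escape(attrs[name], quote=True)}"'
        for name in KEPT_ATTRIBUTES
        if name in attrs
    )
    # amp-img is not a void element, so it needs an explicit closing tag.
    return f"<amp-img {rendered}></amp-img>"
```

Float positioning would have to come from a CSS class rather than an in-line style, which this sketch does not attempt.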

2021-01-13: the AMP page set is going strong, and gets most of the Google SERP visibility on searches from a mobile phone. Only a few pages, eg with custom JavaScript or hand-crafted images, are absent.

However, if Google does drive mobile-search appearance by actual page performance, rather than simply being AMP, as suggested for circa March 2021, and the 'lite' m-dot pages start appearing (as they are faster and lighter), then I may discontinue the AMP pages.

I serve both http and https versions of the desktop and lite pages, but already only serve https versions of AMP pages (redirecting http requests) to trim the cost of AMP support.

2018-09-28: About Us

About a decade late I have finally created an About Us page! Bare bones to start with, but may get grander later...

Today I also visited brightonSEO.

2018-09-24: Data Licence CC0

I asked a lawyer friend (Andrew Katz) who thinks deeply about these things, often over beers at FOSDEM, what data licensing might work for me. He said that attribution licences are a bit of a pain for data, as data tends to get fragmented very easily, and if you combine datasets to get an output then it is not clear whether the output is a derivative and therefore in some way demands attribution.

He prefers the most liberal licence possible, CC0. As I understand it, this is essentially "public domain" even in places where there is no such thing directly in law! He points out that CC0 also covers database rights.

He also suggests that I mention in the data FAQ/README my preferred position that attribution is welcome but not essential. I don't know how to do that yet in a structured machine-readable way in the schema.org/Dataset!

The URL for the license field for CC0 is https://creativecommons.org/publicdomain/zero/1.0/.

So I have applied CC0 in this way to one small dull data set which has zero economic value in itself, but that can be used as training wheels for this purpose.
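A small Python sketch of what applying the licence might look like when the JSON-LD is built programmatically. The helper `with_cc0_licence` and the placeholder name and description are hypothetical; only the license property and the CC0 URL come from the text above.

```python
import json

CC0_URL = "https://creativecommons.org/publicdomain/zero/1.0/"


def with_cc0_licence(dataset):
    """Return a copy of a schema.org/Dataset dict with license set to CC0."""
    marked = dict(dataset)  # shallow copy, so the caller's dict is untouched
    marked["license"] = CC0_URL
    return marked


# Hypothetical placeholder metadata, not one of the real EOU data sets.
original = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example data set",
    "description": "Placeholder description for illustration only.",
}
example = with_cc0_licence(original)
print(json.dumps(example, indent=2))
```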

2018-09-18: DataDownload

Having (re-)read Google Search Dataset, I have amended the JSON-LD for the couple of data sets that refer to directories of data: the url now points at the describing HTML page, and a distribution of type DataDownload has its contentUrl pointing at the data directory itself.
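The resulting shape can be sketched in Python as below. This is an illustration under assumptions: the function name `dataset_jsonld` is made up, and the example.org URLs are placeholders rather than real EOU paths. Note that schema.org spells the DataDownload property contentUrl.

```python
import json


def dataset_jsonld(name, description, page_url, data_url):
    """Build schema.org/Dataset JSON-LD for a directory of data files.

    'url' points at the human-readable page describing the data;
    the DataDownload 'contentUrl' points at the data itself.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": page_url,
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": data_url,
        },
    }


# Hypothetical values for illustration only.
doc = dataset_jsonld(
    "Example data set",
    "Placeholder description for illustration only.",
    "https://example.org/data/about.html",
    "https://example.org/data/files/",
)
print(json.dumps(doc, indent=2))
```

The printed JSON would then be embedded in the describing page inside a script element of type application/ld+json.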

Giving stuff away correctly is hard: Amongst other sources I'm looking at "How to License Research Data" to select a good license (sic, US English) field value for my data sets...

2018-09-15: Poetry Incoming

Not strictly 'tech' this one! Given various people's worries (including a friend's) about tipping points and positive feedback loops, and the New Scientist 15th September issue p44 piece about "Giving voice to a planet's suffering", I am stirred to have another crack at expressing important thoughts in verse.

There's a definite risk that my attempts at poetry may increase the planet's suffering directly, but I have enjoyed trying before.

Not fixing fast enough
and run-away dangers,
are reasons to work harder —
not to slump in despair.
"It may never happen,"
or at least may never happen as bad...

2018-09-09: Google Dataset Search

Nature reports that Google unveils search engine for open data. "The tool, called Google Dataset Search, should help researchers to find the data they need more easily."

Dataset Search for site:earth.org.uk shows 3 results as of 2018-09-09: 16WW Relative Humidity, Water Mains Inlet Temperature, and 1-minute Sunny Beam PV grid-tie power generation.

I saw a couple of issues (eg a typo) with one of them, so fixed it.

Now that Google appears to be doing something with Dataset schema data, I may work harder to mark up and point to other data that I have.

Indeed, I have just created a Dataset for the Enphase data set.

(On 2018-09-15 I see that Dataset Search has found this new data.)

This is separate from, but complementary to, the data sitemap Atom feed.

2018-09-03: Data File Sitemap Atom Feed for Robots

The new Google Search Console seems unwilling to show Atom sitemaps. But it and other search engines seem happy to pick up such sitemaps automatically from robots.txt.

I maintain an Atom feed of all new and updated data files over the last week or so, to help with discovery of new data, rss/datafeed.atom.

So, reckless and impetuous as I am, I added the following to EOU's (desktop) robots.txt:

Sitemap: https://www.earth.org.uk/rss/datafeed.atom
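To check that a Sitemap line like this is discoverable the way a crawler would find it, Python's standard urllib.robotparser can read it back. This is a hedged sketch: the helper name `sitemaps_in` and the example.org URL are placeholders, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in(robots_txt):
    """Return the list of Sitemap URLs declared in robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


# Hypothetical robots.txt content for illustration only.
sample = """User-agent: *
Disallow:
Sitemap: https://example.org/rss/datafeed.atom
"""
print(sitemaps_in(sample))
```

The parser does not care whether the target is an XML sitemap or an Atom feed; it simply reports the URLs listed.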