Earth Notes: On Website Technicals (2018/09)

Tech updates: data file Atom feed sitemap for robots.txt, Google Dataset Search, Poetry...
tools

2018/09/24: Data Licence CC0

I asked a friend (Andrew Katz) who thinks deeply about these things, often over beers at FOSDEM, what data licensing might work for me. He said that "attribution licences are a bit of a pain for data, as it tends to get fragmented very easy, and if you combine datasets and get an output, then it's not clear if the output is a derivate, and therefore in some way demands the attribution."

He prefers the most liberal licence possible, CC0. As I understand it, this is essentially "public domain" even in places where there is no such thing directly in law! He points out that CC0 also covers database rights.

He also suggests that I mention in the data FAQ/README my preferred position that attribution is nice but not essential. I don't know how to do that yet in a structured machine-readable way in the schema.org/Dataset!

The URL for the license field for CC0 is https://creativecommons.org/publicdomain/zero/1.0/.

So I have applied CC0 in this way to one small dull data set which has zero economic value in itself, but that can be used as training wheels for this purpose.
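
As a rough sketch of how the licence might be recorded, the Python below builds a minimal schema.org/Dataset JSON-LD record with the CC0 URL in the license field. The data set name and description are hypothetical placeholders, not my real markup. Putting the attribution preference in the free-text description is just one assumption about where it could live; it is not a structured, machine-readable answer.

```python
import json

# The CC0 URL for the schema.org "license" field.
CC0_URL = "https://creativecommons.org/publicdomain/zero/1.0/"


def dataset_jsonld(name, description, license_url=CC0_URL):
    """Return a minimal schema.org Dataset record as a plain dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": license_url,
    }


# Hypothetical example values, for illustration only.
record = dataset_jsonld(
    "Example small data set",
    "Placeholder description. Attribution is appreciated but not required.",
)

# Serialise for embedding in a <script type="application/ld+json"> element.
print(json.dumps(record, indent=2))
```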

2018/09/18: DataDownload

Having (re)read the Google Search Dataset guidance, I have amended the JSON-LD for the couple of data sets that refer to directories of data: the url now points at the describing HTML page, and a distribution of type DataDownload has its contentUrl pointing at the data directory itself.
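
The shape of that change can be sketched in Python as below. This is an illustration under assumptions rather than my actual markup: the function name, data set name and example.org URLs are all hypothetical.

```python
import json


def directory_dataset(name, description, page_url, data_dir_url):
    """Build a Dataset record for data that lives in a directory.

    "url" names the human-readable page describing the data, while the
    DataDownload distribution's "contentUrl" names the data directory.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": page_url,
        "distribution": [
            {"@type": "DataDownload", "contentUrl": data_dir_url},
        ],
    }


# Hypothetical URLs, for illustration only.
record = directory_dataset(
    "Example directory data set",
    "Placeholder description of a directory of data files.",
    "https://example.org/about-the-data.html",
    "https://example.org/data/example/",
)
print(json.dumps(record, indent=2))
```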

(Giving stuff away correctly is hard: Amongst other sources I'm looking at "How to License Research Data" to select a good license (sic, US English) field value for my data sets...)

2018/09/15: Poetry Incoming

Not strictly 'tech' this one! Given various people's worries (including a friend's) about tipping points and positive feedback loops, and the New Scientist 15th September issue p44 piece about "Giving voice to a planet's suffering", I am stirred to have another crack at expressing important thoughts in verse.

There's a definite risk that my attempts at poetry may increase the planet's suffering directly, but I have enjoyed trying before.

Not fixing fast enough
and run-away dangers,
are reasons to work harder —
not to slump in despair.
"It may never happen,"
or at least may never happen as bad...

2018/09/09: Google Dataset Search

Nature reports that "Google unveils search engine for open data": "The tool, called Google Dataset Search, should help researchers to find the data they need more easily."

Dataset Search for site:earth.org.uk shows 3 results as of 2018/09/09: 16WW Relative Humidity, Water Mains Inlet Temperature, and 1-minute SunnyBeam PV grid-tie power generation.

I saw a couple of issues (eg a typo) with one of them, so fixed it.

Now that Google appears to be doing something with Dataset schema data, I may work harder to mark up and point to other data that I have.

Indeed, I have just created a Dataset for the Enphase data set.

(On 2018/09/15 I see that Dataset Search has found this new data.)

This is separate from, but complementary to, the data sitemap Atom feed.

2018/09/03: Data File Sitemap Atom Feed for Robots

The new Google Search Console seems unwilling to show Atom sitemaps. But it and other search engines seem happy to pick up such sitemaps automatically from robots.txt.

I maintain an Atom file, datafeed.atom, listing all data files that are new or updated over the last week or so, to help with discovery of new data.
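
For illustration, here is a minimal Python sketch of how such a feed could be generated. It is not the script that I actually use: the function name, the input format (pairs of URL and modification time), the feed title, the author name and the example.org URLs are all assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_feed(files, feed_url, title, author, now, max_age=timedelta(days=7)):
    """Return Atom XML listing the files modified within max_age of now.

    files is an iterable of (url, modified) pairs, where modified is a
    timezone-aware datetime.
    """
    ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace

    def tag(name):
        return f"{{{ATOM_NS}}}{name}"

    feed = ET.Element(tag("feed"))
    ET.SubElement(feed, tag("title")).text = title
    ET.SubElement(feed, tag("id")).text = feed_url
    ET.SubElement(feed, tag("updated")).text = now.isoformat()
    ET.SubElement(ET.SubElement(feed, tag("author")), tag("name")).text = author

    # Keep only recent files, newest first.
    recent = [(url, modified) for url, modified in files if now - modified <= max_age]
    for url, modified in sorted(recent, key=lambda pair: pair[1], reverse=True):
        entry = ET.SubElement(feed, tag("entry"))
        ET.SubElement(entry, tag("title")).text = url
        ET.SubElement(entry, tag("id")).text = url
        ET.SubElement(entry, tag("link"), href=url)
        ET.SubElement(entry, tag("updated")).text = modified.isoformat()
    return ET.tostring(feed, encoding="unicode")


# Hypothetical data files: one fresh, one too old to be listed.
now = datetime(2018, 9, 3, 12, 0, tzinfo=timezone.utc)
files = [
    ("https://example.org/data/new.csv", now - timedelta(days=1)),
    ("https://example.org/data/old.csv", now - timedelta(days=30)),
]
feed_xml = build_feed(files, "https://example.org/datafeed.atom",
                      "Recent data files", "Example Author", now)
print(feed_xml)
```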

So, reckless and impetuous as I am, I added the following to EOU's (desktop) robots.txt:

Sitemap: http://www.earth.org.uk/datafeed.atom
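
A crawler presumably finds that line by scanning robots.txt for Sitemap fields. The sketch below is a simplified hand-rolled parser written for illustration, not how any particular search engine does it; the User-agent and Disallow lines in the sample are hypothetical padding around the one real line.

```python
def sitemap_urls(robots_txt):
    """Return the URLs named by Sitemap: lines in robots.txt content."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        # Split at the first colon only, so the URL's own colons survive.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Only the Sitemap line is real; the rest is hypothetical.
example = """\
User-agent: *
Disallow:

Sitemap: http://www.earth.org.uk/datafeed.atom
"""
found = sitemap_urls(example)
print(found)
```

The field name is matched case-insensitively, and an Atom feed is listed just as an XML sitemap would be.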