Earth Notes: On Website Technicals (2023-11)
Updated 2023-12-09.
2023-11-25: Public Data Archive
I want to make it easy to occasionally (manually) archive the bulk of the public data files off-site, eg in Zenodo, for safe-keeping.
The first step is adding a make rule to periodically rebuild public-data-files.txt, a plain-text, one-filename-per-line, tricky-files-excluded, sorted list. Tricky files include those with unsafe names, such as starting with -, or containing whitespace, or even anything that looks like a broken temporary file. Sorting the list may help with compression as a solid archive in various ways. As of now this list contains 10384 files.
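A rough Python sketch of the listing step, for illustration only: the real work is done by a make rule, and the temporary-file patterns below (trailing ~, .tmp, .swp, Emacs lock and autosave names) are my assumptions about what "looks like a broken temporary file". The function names are hypothetical. Python's default string sort is by code point, much like LC_ALL=C sort, which keeps the order stable across machines.

```python
import re
from pathlib import Path

# Hypothetical heuristics for temporary-looking names; the real rule may differ.
TEMP_LIKE = re.compile(r"(~$|\.tmp$|\.swp$|^\.#|^#.*#$)")

def is_safe_name(relpath: str) -> bool:
    """Return True if every component of relpath looks safe to list and archive."""
    for part in relpath.split("/"):
        if not part or part.startswith("-"):
            return False  # could be mistaken for a command-line option
        if any(ch.isspace() for ch in part):
            return False  # whitespace breaks one-name-per-line handling
        if TEMP_LIKE.search(part):
            return False  # looks like an editor/temporary leftover
    return True

def build_file_list(root: str) -> list[str]:
    """Sorted (code-point order) list of safe relative file paths under root."""
    base = Path(root)
    names = (p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file())
    return sorted(n for n in names if is_safe_name(n))

def write_list(root: str, out: str = "public-data-files.txt") -> int:
    """Write the list one name per line and return how many names were written."""
    files = build_file_list(root)
    Path(out).write_text("".join(f + "\n" for f in files))
    return len(files)
```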
This list is useful metadata in its own right, and is also used to build an xz-compressed tar archive of the data itself. This currently runs to somewhat under 400MB.
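Similarly, the archive step can be sketched with Python's standard tarfile module, whose "w:xz" mode writes the whole tar stream as one xz-compressed stream, so similar neighbouring files in the sorted list compress together solidly. The function name and arguments are hypothetical; in practice a tar command reading the list would do the same job.

```python
import tarfile
from pathlib import Path

def build_archive(root: str, list_file: str, out: str) -> int:
    """Create an xz-compressed tar of the files named in list_file.

    Names are relative to root and are added in list order (already sorted).
    Returns the number of files archived.
    """
    names = [ln for ln in Path(list_file).read_text().splitlines() if ln]
    # "w:xz" compresses the entire tar stream with LZMA/xz (default preset).
    with tarfile.open(out, "w:xz") as tar:
        for name in names:
            tar.add(str(Path(root) / name), arcname=name, recursive=False)
    return len(names)
```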
This new snapshot is now listed as an extra 'distribution' on the main data page, along with its DOI [hart-davis2023EOUdata].
I have updated all DOIs in the Zenodo 'snapshot' bibliography [hart-davis2023bibliography] and offline page archive [hart-davis2023EOUoffline] entries to use the 'invariant' DOI that redirects to the latest version. I have also changed the bibliography entry URL in each case to point to the live version of each file here at EOU.
2023-11-20: SD Card Prep
I bought a 128GB microSDXC card via Amazon for pekoe (RPi), which looks after the Thermino heat battery.
It is described as "UHS-I U3 160MB/s Full HD & 4K UHD", and the packaging claims "up to" 160MB/s read and 120MB/s write.
I am running f3write from f3 - Fight Flash Fraud on my Mac to test it (installed via brew install f3).
The f3write tool reports Free space: 119.33 GB and seems to be writing 1GB files, achieving ~75+MB/s on each early file.
% f3write /Volumes/Untitled/
F3 write 8.0
Copyright (C) 2010 Digirati Internet LTDA.
This is free software; see the source for copying conditions.
Free space: 119.33 GB
Creating file 1.h2w ... OK!
...
Creating file 119.h2w ... OK!
Creating file 120.h2w ... OK!
Free space: 640.00 KB
Average writing speed: 73.77 MB/s
Now verifying with f3read:
% f3read /Volumes/Untitled/
F3 read 8.0
Copyright (C) 2010 Digirati Internet LTDA.
This is free software; see the source for copying conditions.
SECTORS ok/corrupted/changed/overwritten
Validating file 1.h2w ... 2097152/ 0/ 0/ 0
...
Validating file 119.h2w ... 2097152/ 0/ 0/ 0
Validating file 120.h2w ... 697167/ 0/ 0/ 0
Data OK: 119.33 GB (250258255 sectors)
Data LOST: 0.00 Byte (0 sectors)
Corrupted: 0.00 Byte (0 sectors)
Slightly changed: 0.00 Byte (0 sectors)
Overwritten: 0.00 Byte (0 sectors)
Average reading speed: 84.74 MB/s
So this card looks to be per spec on capacity, and good to go for fixing pekoe.
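As a cross-check on the f3read summary, the per-file sector counts add up to the reported total. This assumes f3's 512-byte sectors and that its "GB" are binary gigabytes (GiB), which is what the numbers imply; it also shows the verified space is at least the nominal (decimal) 128GB on the label.

```python
SECTOR = 512  # bytes per sector, as assumed for f3's counts

full_files = 119             # 1.h2w .. 119.h2w, each validated in full
sectors_per_file = 2097152   # per full file, from the f3read output
last_file_sectors = 697167   # 120.h2w, the partial final file

total_sectors = full_files * sectors_per_file + last_file_sectors
total_bytes = total_sectors * SECTOR
gib = total_bytes / 2**30

print(total_sectors)                        # 250258255, matching "Data OK"
print(sectors_per_file * SECTOR == 2**30)   # True: each .h2w file is 1 GiB
print(f"{gib:.2f}")                         # 119.33
print(total_bytes >= 128 * 10**9)           # True: meets the decimal 128GB label
```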
CrowView
After some bracing root canal surgery in the afternoon (yes really), I unboxed and played a little with my new CrowView 14" Portable Ultralight Dual Monitor. (It was on Kickstarter and I paid HKD1038, so ~£106.)
I am a little concerned about the forces applied to my MBA by the CrowView attached to the side, but it also stands alone, eg portrait for PDF papers!
In any case, connecting the CrowView and MBA with the USB-C to USB-C cable supplied makes the CrowView just work. Its position relative to the MBA display, and its rotation, are easy to adjust in the MBA settings. Note that the CrowView can get its power (~5W) from the MBA or via a second USB-C connector.
The MBA (M1) will not apparently drive the CrowView and my Dell monitor simultaneously, not that there is space on my desk anyway!
The image shows testing the CrowView on the train: it works.
2023-11-12: Dataset Bibliography Improvements
Google Search Console started complaining that schema.org/Dataset citations in my bibliography were incomplete. (This seems to be the first time that these have been counted as items in their own right.)
In particular GSC was complaining that Datasets were missing a description. So I now insert an abbreviated abstract and description item for them.
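A minimal sketch of deriving a short description from a BibTeX abstract, assuming the entry has already been parsed into a Python dict. The 200-character limit, the word-boundary cut and the function names are illustrative guesses, not what the EOU build necessarily does.

```python
import re

def abbreviate(text: str, limit: int = 200) -> str:
    """Collapse whitespace and cut text at a word boundary, adding '...'."""
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) <= limit:
        return text
    cut = text[:limit].rsplit(" ", 1)[0]  # drop any partial last word
    return cut.rstrip(",;:.") + "..."

def dataset_description(entry: dict) -> dict:
    """schema.org Dataset abstract/description fields from a parsed BibTeX entry."""
    abstract = entry.get("abstract", "")
    if not abstract:
        return {}  # nothing to abbreviate; GSC will still complain
    short = abbreviate(abstract)
    return {"abstract": short, "description": short}
```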
This also forced me to add or improve the abstract for some BibTeX entries!
I also made the entry url a sameAs, to indicate that I am not claiming my entry to be canonical, nor indeed the data to be mine.
I am also adding creator for each author in the easiest cases, as GSC seems to want the former even though schema.org/CreativeWork says for creator: "The creator/author of this CreativeWork. This is the same as the Author property for CreativeWork."
Where multiple authors are listed I am assuming them to be people, though there may be some minor exceptions; these get to be creator and author. The rest stay as just author for now.
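The author/creator split described above might look something like the following. It is a sketch with hypothetical function names: it assumes BibTeX's " and " separator (ignored inside braces), treats a wholly braced name such as {{Domestic Building Services Panel}} as a literal organisational name, and does not handle the special case of my own name.

```python
def split_authors(field: str) -> list[str]:
    """Split a BibTeX author field on ' and ', ignoring text inside braces."""
    parts, buf, depth, i = [], [], 0, 0
    while i < len(field):
        if depth == 0 and field.startswith(" and ", i):
            parts.append("".join(buf))
            buf, i = [], i + 5
            continue
        ch = field[i]
        depth += (ch == "{") - (ch == "}")
        buf.append(ch)
        i += 1
    parts.append("".join(buf))
    return [p.strip() for p in parts if p.strip()]

def display_name(bibname: str) -> str:
    """'Last, First' -> 'First Last'; a braced name is kept literally."""
    if bibname.startswith("{"):
        return bibname.strip("{}")
    if "," in bibname:
        last, first = (p.strip() for p in bibname.split(",", 1))
        return f"{first} {last}"
    return bibname

def author_props(field: str) -> dict:
    """schema.org author/creator properties for one BibTeX author field."""
    names = [display_name(a) for a in split_authors(field)]
    if len(names) > 1:
        # Several authors: assume people (there may be minor exceptions).
        people = [{"@type": "Person", "name": n} for n in names]
        return {"author": people, "creator": people}
    # Single, possibly organisational, author: stays a plain author for now.
    return {"author": names[0]} if names else {}
```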
However, that does not actually fix any of the Dataset cases yet!
(2023-11-13: I have fixed the special case of a single author when it is me me me, as I think that my name is world-wide unique, and also of me in a list of authors. For both cases I now link to extra metadata provided on the host page, such as my ORCID.)
And naturally, adding extra detail where the author / creator is now explicitly a schema.org/Person turns up a new set of warnings (a 'missing' URL for each)!
Apparently-single-author entries can be picked out with:
% egrep "^ *author" db.bibliography/single/* | egrep -v "( and )|;" | wc -l
72
which includes examples such as:
DBSP2020domestic.bib:author={{Domestic Building Services Panel}},
EHSprofile.bib:author = {DCLG},
ESC2022EoH.bib:author={LCP Delta},
NEEDdataset.bib:author = {UK BEIS},
NGMdataset.bib:author={Kiln},
NIHSreport.bib:author = {Northern Ireland Executive},
RBK2019energy.bib:author={Burohappold Engineering},
SHCScollection.bib:author = {Scottish Government},
hart-davis2023TRVmodel.bib:author={Hart-Davis, Damon},
hart-davis2023central.bib:author = {Damon Hart-Davis},
hemmings2023airport.bib:author={Peter Jonathan Hemmings},
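The egrep pipeline above can be mirrored in Python, for example to feed the same selection into a page-generation script. This is a sketch with a hypothetical function name; like the pipeline it only inspects lines that start with optional spaces and author, so an author field split over several lines would be missed. wc -l counts matching lines, and len() of the result plays the same role.

```python
import re
from pathlib import Path

def single_author_lines(bib_dir: str) -> list[tuple[str, str]]:
    """Mimic: egrep "^ *author" DIR/* | egrep -v "( and )|;"

    Returns (filename, line) pairs for apparently single-author entries.
    """
    hits = []
    for path in sorted(Path(bib_dir).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            # re.match anchors at the start, like "^ *author".
            if re.match(r" *author", line) and not re.search(r"( and )|;", line):
                hits.append((path.name, line))
    return hits
```

Usage would be along the lines of len(single_author_lines("db.bibliography/single")).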
For complaints about missing license fields, for which there is no directly corresponding BibTeX field, I now carefully map a particular set of copyright values to licence URLs. Current licences listed (for fewer than 20 bibliography entries) include:
- Apache License Version 2.0
- Creative Commons Attribution 4.0 International
- Creative Commons Attribution-NonCommercial 4.0
- Creative Commons Attribution-NonCommercial-NoDerivs (by-nc-nd) 2.0 UK: England and Wales
- Creative Commons ShareAlike Attribution Licence (CC BY-SA 4.0)
- Creative Commons Zero v1.0 Universal
- UK Open Government Licence
- UK Open Government Licence v3.0
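Because BibTeX has no licence field, the mapping has to be an explicit lookup. Here is a sketch of such a table for the licences listed above, using the licences' official URLs; the exact copyright strings used as keys and the licence_for name are assumptions. Unknown values deliberately map to None, so that no licence is ever guessed.

```python
from typing import Optional

# Hypothetical table: copyright strings as they might appear in BibTeX
# entries, mapped to the official licence URLs.
LICENCE_URLS = {
    "Apache License Version 2.0":
        "https://www.apache.org/licenses/LICENSE-2.0",
    "Creative Commons Attribution 4.0 International":
        "https://creativecommons.org/licenses/by/4.0/",
    "Creative Commons Attribution-NonCommercial 4.0":
        "https://creativecommons.org/licenses/by-nc/4.0/",
    "Creative Commons Attribution-NonCommercial-NoDerivs (by-nc-nd) 2.0 UK: England and Wales":
        "https://creativecommons.org/licenses/by-nc-nd/2.0/uk/",
    "Creative Commons ShareAlike Attribution Licence (CC BY-SA 4.0)":
        "https://creativecommons.org/licenses/by-sa/4.0/",
    "Creative Commons Zero v1.0 Universal":
        "https://creativecommons.org/publicdomain/zero/1.0/",
    "UK Open Government Licence":
        "https://www.nationalarchives.gov.uk/doc/open-government-licence/",
    "UK Open Government Licence v3.0":
        "https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/",
}

def licence_for(copyright_value: Optional[str]) -> Optional[str]:
    """Exact match after collapsing whitespace; None means emit no license."""
    if copyright_value is None:
        return None
    return LICENCE_URLS.get(" ".join(copyright_value.split()))
```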
As of 2023-11-18 GSC reports that every Dataset (~28 of 'em) has a description and creator, and all but 8 have a license.
Zenodo
I am integrating GitHub and Zenodo to automatically capture TRVmodel [hart-davis2023TRVmodel] releases and give them DOIs.
This process seems to be mainly automated and free.
I have created TRVmodel V0.9.4 for Zenodo, which gets DOI 10.5281/zenodo.10116281. Yippee!
2023-11-13: I have created/archived the 2023-11-03 snapshot of EOU main pages [hart-davis2023EOUoffline] at Zenodo, which gets DOI 10.5281/zenodo.10119196.
References
- [hart-davis2023bibliography] Earth.Org.UK (EOU) general BibTeX bibliography snapshot
- [hart-davis2023EOUdata] Earth.Org.UK (EOU) public data snapshot
- [hart-davis2023EOUoffline] Earth.Org.UK main pages offline archive
- [hart-davis2023TRVmodel] TRVmodel: TRV energy modelling in home heating