Earth Notes: On Website Technicals (2023-11)

Updated 2025-02-09.

Tech updates: GSC Dataset pickiness, Zenodo DOIs, microSDXC card prep and f3, CrowView, public data archive.

Still working on my first academic paper, keeping a watching brief on an upgrade to an RPi5, and making various minor tweaks. Also more DOIs, and off-site backups of some important parts of my public data hoard.

2023-11-25: Public Data Archive

I want to make it easy to archive the bulk of the public data files off-site occasionally (and manually), eg in Zenodo for safe-keeping.

The first step is adding a make rule to periodically rebuild public-data-files.txt: a plain-text, sorted list with one filename per line and tricky files excluded. Tricky files include those with unsafe names, such as names starting with - or containing whitespace, and anything that looks like a broken temporary file. Sorting the list may help compression in various ways when the files are packed as a solid archive, eg by keeping similarly named (and often similar) files adjacent. As of now this list contains 10384 files.
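The make rule itself is not shown here, so the following is only a minimal Python sketch of the filtering step such a rule might call. The patterns for "looks like a broken temporary file" (names ending in ~ or .tmp, or starting with .#) and the function names are assumptions for illustration.

```python
import os
import re

# Assumed patterns for "looks like a broken temporary file": editor backups
# (name~), *.tmp, and Emacs-style lock files (.#name). Adjust to taste.
TEMP_LIKE = re.compile(r"(~|\.tmp)$|^\.#")

def is_tricky(name: str) -> bool:
    """True if one path component is unsafe for a one-name-per-line list."""
    return (
        name.startswith("-")                 # could be mistaken for a command option
        or any(c.isspace() for c in name)    # breaks line-based lists and shell word-splitting
        or TEMP_LIKE.search(name) is not None
    )

def list_public_files(root: str) -> list[str]:
    """Sorted relative paths of regular files under root, tricky names excluded."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune tricky directories so nothing beneath them is listed.
        dirnames[:] = [d for d in dirnames if not is_tricky(d)]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not is_tricky(name) and os.path.isfile(full):
                found.append(os.path.relpath(full, root))
    return sorted(found)

def write_list(root: str, out_path: str) -> int:
    """Write the list one path per line; return the number of files listed."""
    files = list_public_files(root)
    with open(out_path, "w", encoding="utf-8") as f:
        f.writelines(p + "\n" for p in files)
    return len(files)
```

Excluding awkward names up front, rather than quoting them everywhere downstream, keeps the list safe to feed to line-oriented tools.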

This list is useful metadata in its own right, and is also used to build an xz-compressed tar archive of the data itself. The archive is currently somewhat under 400MB.
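A shell version would be something like tar -cJf public-data.tar.xz -T public-data-files.txt (the output name is hypothetical). The same step is sketched below in Python with the standard tarfile module; the function name and the choice of preset=9 are assumptions, not taken from the original make rule.

```python
import os
import tarfile

def archive_from_list(root: str, list_path: str, out_path: str) -> int:
    """Pack the files named in list_path (one per line, relative to root) into an
    xz-compressed tar, in list order. Returns the number of members added."""
    with open(list_path, encoding="utf-8") as f:
        names = [line.rstrip("\n") for line in f if line.strip()]
    # "w:xz" compresses the whole tar stream as one unit (a solid archive), so
    # neighbouring similar files can share the compressor's dictionary.
    # preset=9 (maximum compression) is an assumed setting.
    with tarfile.open(out_path, "w:xz", preset=9) as tar:
        for name in names:
            tar.add(os.path.join(root, name), arcname=name, recursive=False)
    return len(names)
```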

This new snapshot is now listed as an extra 'distribution' on the main data page, along with its DOI [hart-davis2023EOUdata].

I have updated all DOIs in the Zenodo 'snapshot' bibliography [hart-davis2023bibliography] and offline page archive [hart-davis2023EOUoffline] entries to use the 'invariant' DOI that redirects to the latest version. I have also changed the bibliography entry URL in each case to point to the live version of each file here at EOU.

2023-11-20: SD Card Prep

I bought a 128GB microSDXC card via Amazon for pekoe, the RPi that looks after the Thermino heat battery.

The device is a Samsung PRO PLUS 128GB MB-MD128KAEU, claiming UHS-I U3, 160MB/s, Full HD & 4K UHD, and on the packaging "up to" 160MB/s read and 120MB/s write.

I am running f3write from f3 (Fight Flash Fraud) on my Mac against it (f3 installed via brew install f3).

The f3write tool reports Free space: 119.33 GB, seems to be writing 1GB files, and achieves ~75+MB/s on each early file.

% f3write /Volumes/Untitled/
F3 write 8.0
Copyright (C) 2010 Digirati Internet LTDA.
This is free software; see the source for copying conditions.

Free space: 119.33 GB
Creating file 1.h2w ... OK!
...
Creating file 119.h2w ... OK!
Creating file 120.h2w ... OK!
Free space: 640.00 KB
Average writing speed: 73.77 MB/s

Now verifying with f3read:

% f3read /Volumes/Untitled/
F3 read 8.0
Copyright (C) 2010 Digirati Internet LTDA.
This is free software; see the source for copying conditions.

                  SECTORS      ok/corrupted/changed/overwritten
Validating file 1.h2w ... 2097152/        0/      0/      0
...
Validating file 119.h2w ... 2097152/        0/      0/      0
Validating file 120.h2w ...  697167/        0/      0/      0

  Data OK: 119.33 GB (250258255 sectors)
Data LOST: 0.00 Byte (0 sectors)
	       Corrupted: 0.00 Byte (0 sectors)
	Slightly changed: 0.00 Byte (0 sectors)
	     Overwritten: 0.00 Byte (0 sectors)
Average reading speed: 84.74 MB/s

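So this card appears to meet its spec on capacity (though the measured read and write speeds are well below the "up to" figures on the packaging), and is good to go for fixing pekoe.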
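As a quick sanity check, the f3read figures can be recombined by hand. f3 counts 512-byte sectors, and the arithmetic shows that its "GB" is binary (GiB), while the packaging's 128GB is decimal. This assumes files 2 to 118, elided in the transcript, were also full 1GiB files; the total matching the reported sector count supports that.

```python
# Figures copied from the f3read output; f3 counts 512-byte sectors.
SECTOR_BYTES = 512
FULL_FILE_SECTORS = 2_097_152   # each full N.h2w file: 2097152 * 512 B = 1 GiB
full_files = 119                # files 1..119 (assumed all full size)
last_file_sectors = 697_167     # the partial 120.h2w

total_sectors = full_files * FULL_FILE_SECTORS + last_file_sectors
total_bytes = total_sectors * SECTOR_BYTES
gib = total_bytes / 2**30       # what f3 prints as "GB"
gb = total_bytes / 10**9        # what the packaging means by "GB"

print(total_sectors, f"{gib:.2f} GiB", f"{gb:.2f} GB")
# prints: 250258255 119.33 GiB 128.13 GB
```

So the ~119GB that f3 reports is really just over the promised 128 decimal GB.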

CrowView

After some bracing root canal surgery in the afternoon (yes really), I unboxed and played a little with my new CrowView 14" Portable Ultralight Dual Monitor. (It was on Kickstarter and I paid HKD1038, so ~£106.)

I am a little concerned about the forces that the CrowView applies to my MBA when attached to its side, but it can also stand alone, eg in portrait orientation for reading PDF papers!

In any case, connecting the CrowView and MBA with the supplied USB-C to USB-C cable makes the CrowView just work. Its rotation and position relative to the MBA display are easy to adjust in the MBA's display settings. Note that the CrowView can draw its power (~5W) from the MBA or via a second USB-C connector.

The MBA (M1) will not drive the CrowView and my Dell monitor simultaneously (the M1 MacBook Air officially supports only one external display), not that there is space on my desk anyway!

The image shows testing the CrowView on the train: it works.

2023-11-12: Dataset Bibliography Improvements

Google Search Console started complaining that schema.org/Dataset citations in my bibliography were incomplete. (This seems to be the first time that these have been counted as items in their own right.)

In particular GSC was complaining that Datasets were missing a description. So I now insert an abbreviated abstract and description item for them.

This also forced me to add or improve the abstract for some BibTeX entries!

I also made the entry url a sameAs, to indicate that I am not claiming my entry to be canonical, nor indeed the data to be mine.

I am also adding a creator for each author in the easiest cases, as GSC seems to want the former, even though schema.org/CreativeWork says of creator: "The creator/author of this CreativeWork. This is the same as the Author property for CreativeWork."

Where multiple authors are listed I am assuming them to be people, though there may be some minor exceptions; these are marked up as both creator and author. The rest stay as just author for now.
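The rules above for Dataset items (description, sameAs, and creator plus author for multi-author entries) can be sketched as below. This is a minimal illustration, not the actual EOU generator: the entry dict shape, the 200-character description limit, and all function names are assumptions, and the special case for my own name is omitted.

```python
import json

def abbreviate(text: str, limit: int = 200) -> str:
    """Shorten an abstract to at most `limit` characters, cutting at a word boundary."""
    text = " ".join(text.split())  # collapse BibTeX line breaks and repeated spaces
    if len(text) <= limit:
        return text
    cut = text[: limit - 1].rsplit(" ", 1)[0]
    return cut + "…"

def split_authors(author_field: str) -> list[str]:
    """Split a BibTeX author field on ' and ', dropping protective braces."""
    return [a.strip().strip("{}") for a in author_field.split(" and ")]

def dataset_jsonld(entry: dict) -> dict:
    """Build a schema.org Dataset item from a (hypothetical) parsed BibTeX entry dict."""
    authors = split_authors(entry["author"])
    item = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": entry["title"],
        "description": abbreviate(entry["abstract"]),
        # sameAs, not url: points to the data without claiming this entry is canonical.
        "sameAs": entry["url"],
    }
    if len(authors) > 1:
        # Multiple authors are assumed to be people: emit them as both creator and author.
        people = [{"@type": "Person", "name": a} for a in authors]
        item["creator"] = people
        item["author"] = people
    else:
        # A lone author may be an organisation (eg "Scottish Government"): plain author only.
        item["author"] = authors[0]
    return item

if __name__ == "__main__":
    example = {
        "title": "Example dataset",
        "abstract": "A hypothetical abstract that\n  spans several BibTeX lines.",
        "url": "https://example.org/data",
        "author": "Alice Smith and Bob Jones",
    }
    print(json.dumps(dataset_jsonld(example), indent=2, ensure_ascii=False))
```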

However, that does not actually fix any of the Dataset cases yet!

(2023-11-13: I have fixed the special case of a single author when it is me me me, as I think that my name is world-wide unique, and also of me in a list of authors. In both cases I now link to extra metadata provided on the host page, such as my ORCID.)

And naturally, adding extra detail where the author / creator is now explicitly a schema.org/Person turns up a new set of warnings (a 'missing' URL for each)!

Apparently-single-author entries can be picked out with:

% egrep "^ *author" db.bibliography/single/* | egrep -v "( and )|;" | wc -l
      72

which includes examples such as:

DBSP2020domestic.bib:author={{Domestic Building Services Panel}},
EHSprofile.bib:author = {DCLG},
ESC2022EoH.bib:author={LCP Delta},
NEEDdataset.bib:author = {UK BEIS},
NGMdataset.bib:author={Kiln},
NIHSreport.bib:author = {Northern Ireland Executive},
RBK2019energy.bib:author={Burohappold Engineering},
SHCScollection.bib:author = {Scottish Government},
hart-davis2023TRVmodel.bib:author={Hart-Davis, Damon},
hart-davis2023central.bib:author = {Damon Hart-Davis},
hemmings2023airport.bib:author={Peter Jonathan Hemmings},
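For anyone without egrep to hand, the pipeline above translates into Python roughly as follows: keep lines starting (after optional spaces) with author, then drop any containing " and " or ";". It prints bare filenames, as in the examples above, rather than the directory-prefixed names that egrep would actually emit. The function name is hypothetical.

```python
import glob
import os
import re

AUTHOR_LINE = re.compile(r"^ *author")   # same as egrep "^ *author"
MULTI = re.compile(r"( and )|;")         # same as egrep -v "( and )|;"

def single_author_lines(directory: str) -> list[str]:
    """Return 'file:line' for apparently-single-author lines in directory/*."""
    hits = []
    for path in sorted(glob.glob(os.path.join(directory, "*"))):
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                line = line.rstrip("\n")
                if AUTHOR_LINE.search(line) and not MULTI.search(line):
                    hits.append(f"{os.path.basename(path)}:{line}")
    return hits
```

len(single_author_lines("db.bibliography/single")) then plays the role of wc -l.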

GSC also complained about missing license fields. There is no directly corresponding BibTeX field, so I now carefully map particular values set in the copyright field to licence URLs. Current licences listed (for fewer than 20 bibliography entries) include:

Apache License Version 2.0
Creative Commons Attribution 4.0 International
Creative Commons Attribution-NonCommercial 4.0
Creative Commons Attribution-NonCommercial-NoDerivs (by-nc-nd) 2.0 UK: England and Wales
Creative Commons ShareAlike Attribution Licence (CC BY-SA 4.0)
Creative Commons Zero v1.0 Universal
UK Open Government Licence
UK Open Government Licence v3.0
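A mapping like this can be a plain exact-match lookup, sketched below. It is a hypothetical table rather than the site's real code: the URLs are the licence pages as I understand them (worth double-checking), and any copyright value not in the table gets no license at all rather than a guess.

```python
from typing import Optional

# Hypothetical exact-match table: copyright field text -> licence URL.
LICENCE_URLS = {
    "Apache License Version 2.0":
        "https://www.apache.org/licenses/LICENSE-2.0",
    "Creative Commons Attribution 4.0 International":
        "https://creativecommons.org/licenses/by/4.0/",
    "Creative Commons Attribution-NonCommercial 4.0":
        "https://creativecommons.org/licenses/by-nc/4.0/",
    "Creative Commons Attribution-NonCommercial-NoDerivs (by-nc-nd) 2.0 UK: England and Wales":
        "https://creativecommons.org/licenses/by-nc-nd/2.0/uk/",
    "Creative Commons ShareAlike Attribution Licence (CC BY-SA 4.0)":
        "https://creativecommons.org/licenses/by-sa/4.0/",
    "Creative Commons Zero v1.0 Universal":
        "https://creativecommons.org/publicdomain/zero/1.0/",
    "UK Open Government Licence":
        "https://www.nationalarchives.gov.uk/doc/open-government-licence/",
    "UK Open Government Licence v3.0":
        "https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/",
}

def licence_url(copyright_field: Optional[str]) -> Optional[str]:
    """Licence URL for a recognised copyright value, else None (emit no license)."""
    if copyright_field is None:
        return None
    # Normalise whitespace only; anything else must match exactly.
    return LICENCE_URLS.get(" ".join(copyright_field.split()))
```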

As of 2023-11-18 GSC reports that every Dataset (~28 of 'em) has a description and creator, and all but 8 have a license.

Zenodo

I am integrating GitHub and Zenodo to automatically capture TRVmodel [hart-davis2023TRVmodel] releases and give them DOIs.

This process seems to be mainly automated and free.

I have created TRVmodel V0.9.4 for Zenodo, which gets DOI 10.5281/zenodo.10116281. Yippee!

2023-11-13: I have created/archived the 2023-11-03 snapshot of EOU main pages [hart-davis2023EOUoffline] at Zenodo which gets DOI 10.5281/zenodo.10119196.

References

(Count: 4)