Earth Notes: On Website Technicals (2023-01)

Updated 2023-04-08.
Tech updates: energy stats, ASCII .bib, et al., toot from Java, undead, technically, ProGuard, Sendmail HELO, bib lite, date +h, 410.
Trying not to drown in traffic to the Multimedia Gallery, seemingly to URLs that have been dead for well over a decade...

2023-01-28: 410 Gone

Now that the Gallery is back, albeit skeletally, bots are going at it quite hard. Some URLs that they seem especially keen on, such as the 'pick a random entry' page, will not be coming back.

To try to make that clear, a 410 status ("Gone") is now returned:

RewriteRule ^/_cat/doHTMLRandom.jsp - [L,G]

This rule is placed early to avoid any redirections of the request before the 410 is delivered, ie to minimise useless requests.

If 410 seems at all effective then I will apply it more widely.

2023-01-25: Bib GZip

I added MIME-type text/x-bibtex and suffix bib to the list of files that can be GZipped on the fly if the client accepts that content encoding. That may save a few bytes for anyone using my bibliography.

I have also atomised the monolithic 92-entry general.bib file into separate source files, one per entry, and now automatically reconstruct the monolith from them when they change.
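A minimal Java sketch of that split-and-reassemble step, assuming every entry begins with "@type{key," at the start of a line and that there are no @string or @comment blocks (the class and method names are hypothetical):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class BibAtomiser {
    // Start of an entry: "@type{key," at the beginning of a line; group 1 is the key.
    private static final Pattern ENTRY_START =
        Pattern.compile("(?m)^@\\w+\\s*\\{\\s*([^,\\s]+)\\s*,");

    /** Splits a monolithic .bib text into entries keyed (and sorted) by citation key. */
    public static Map<String, String> split(final String bib) {
        final Map<String, String> entries = new TreeMap<>();
        final Matcher m = ENTRY_START.matcher(bib);
        if (!m.find()) return entries;
        while (true) {
            final int start = m.start();
            final String key = m.group(1);
            final boolean more = m.find(); // advances to the next entry, if any
            final int end = more ? m.start() : bib.length();
            if (entries.put(key, bib.substring(start, end).trim()) != null) {
                throw new IllegalArgumentException("duplicate key: " + key);
            }
            if (!more) return entries;
        }
    }

    /** Rebuilds the monolith from single-entry texts, in key order, one blank line apart. */
    public static String rebuild(final Map<String, String> entries) {
        return String.join("\n\n", new TreeMap<>(entries).values()) + "\n";
    }
}
```

Sorting by key keeps the regenerated monolith byte-for-byte stable when no entry has changed, which suits a timestamp-driven rebuild.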

2023-01-27: Faster

I switched a couple of uses of the .bib files (checking for existence of a citation, and extracting a single terse citation) to the individual single-entry files, to stay nearer O(n) time than O(n^2)...
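The O(n^2) comes from n citations each scanning a monolith of n entries; with one file per entry, each check becomes a single file-system lookup. A minimal sketch, assuming a hypothetical layout of one <key>.bib file per entry in a single directory:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Pattern;

public final class BibLookup {
    // Conservative key syntax: no "/" or "." so a key cannot escape the directory.
    private static final Pattern SAFE_KEY = Pattern.compile("[A-Za-z0-9_:-]+");

    private static Path entryPath(final Path dir, final String key) {
        if (!SAFE_KEY.matcher(key).matches()) {
            throw new IllegalArgumentException("bad citation key: " + key);
        }
        return dir.resolve(key + ".bib");
    }

    /** True if a single-entry file exists for this citation key. */
    public static boolean exists(final Path dir, final String key) {
        return Files.isRegularFile(entryPath(dir, key));
    }

    /** Reads the single entry for this key, if present; non-ASCII bytes cause an exception. */
    public static Optional<String> read(final Path dir, final String key) throws IOException {
        final Path p = entryPath(dir, key);
        return Files.isRegularFile(p)
            ? Optional.of(Files.readString(p, StandardCharsets.US_ASCII))
            : Optional.empty();
    }
}
```

Reading with US-ASCII also acts as a cheap guard that the single-entry files stay 7-bit.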

2023-01-24: Little Date Utility

A new handy and portable (at least between my Mac and RPi!) utility to print the current time plus n hours UTC:

% sh script/UTCplusH.sh
2023-01-24T21:46:56Z
% sh script/UTCplusH.sh 1
2023-01-24T22:47:00Z
% sh script/UTCplusH.sh -1
2023-01-24T20:47:03Z
% sh script/UTCplusH.sh 10
2023-01-25T07:47:06Z
% sh script/UTCplusH.sh 48
2023-01-26T21:47:11Z

The core is:

gawk -v ADJH="$1" 'BEGIN{print strftime("%FT%TZ", systime()+3600*ADJH, 1)}'
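For comparison, the same calculation in Java with java.time might look like this minimal sketch (the class name is hypothetical). Truncating to whole seconds keeps Instant.toString() in the same 2023-01-24T21:46:56Z shape as the gawk output:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public final class UTCPlusH {
    /** Formats (now + hours) as yyyy-MM-ddTHH:mm:ssZ in UTC, to whole seconds. */
    public static String utcPlusHours(final Instant now, final long hours) {
        return now.truncatedTo(ChronoUnit.SECONDS).plus(hours, ChronoUnit.HOURS).toString();
    }

    public static void main(final String[] args) {
        // Optional first argument: signed whole hours to add (default 0).
        final long hours = (args.length > 0) ? Long.parseLong(args[0]) : 0;
        System.out.println(utcPlusHours(Instant.now(), hours));
    }
}
```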

2023-01-23: Sendmail confHELO_NAME

For all sorts of reasons ... mumble mumble ... and idleness and trepidation, it has been the case that the name by which my sendmail MTA (Mail Transfer Agent) greets the world when sending or receiving email has not matched the domain name that comes up when looking up the PTR record for its IP address in DNS.

This is naughty, or at least slack, and sometimes the smell of a SPAMmer, and has caused a few remote mail servers to reject outgoing mail from my system. They are right to do this in fact!

Because I no longer have a whole class C block of 256 addresses (or indeed three of them: those were the days), I do not directly control the PTR records, and have to go through whatever custom system the ISP provides. Things have drifted apart over some years, thus the discrepancy. (It seems that even my ISP does not directly manage the PTR records any more, so changes may be a doubly-manual, annoying and error-prone process!)

I will do a more thorough review of these mappings given how services have moved about, and I will also pin down outgoing mail to be from a single IP address with CLIENT_OPTIONS.

In the meantime, thanks to search engines and serverfault.com, I have discovered the confHELO_NAME configuration item, and I have set it to be the actual current PTR record domain name.
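A quick way to spot this kind of drift is to compare the HELO name with what a PTR lookup actually returns. A minimal Java sketch (class name and arguments are hypothetical); InetAddress.getCanonicalHostName() is a best-effort reverse lookup that returns the address text itself if no name is found:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

public final class HeloPtrCheck {
    /** Normalises a DNS name for comparison: trimmed, lower case, no trailing dot. */
    public static String normalise(final String name) {
        final String n = name.trim().toLowerCase(Locale.ROOT);
        return n.endsWith(".") ? n.substring(0, n.length() - 1) : n;
    }

    /** True if the HELO name and the PTR name are the same DNS name. */
    public static boolean heloMatchesPtr(final String heloName, final String ptrName) {
        return normalise(heloName).equals(normalise(ptrName));
    }

    public static void main(final String[] args) throws UnknownHostException {
        if (args.length != 2) {
            System.err.println("usage: java HeloPtrCheck IP_ADDRESS HELO_NAME");
            return;
        }
        // A literal IP is not looked up here; getCanonicalHostName() does the reverse lookup.
        final String ptr = InetAddress.getByName(args[0]).getCanonicalHostName();
        System.out.println("PTR " + ptr + (heloMatchesPtr(args[1], ptr) ? " matches" : " DOES NOT match"));
    }
}
```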

And my first outgoing email to one of the picky mail servers worked immediately. Hurrah!

Bibliography Lite

To roughly halve the size of the HTML for the m-dot lite site, I now make a version with abstract and keywords omitted.

Conversely, on the desktop page my notes now default to open, and so are searchable with the browser's find function.

2023-01-15: Lowering Sizes

Every time I encounter the ProGuard Java shrinker and optimiser I am newly impressed.

I was unhappy with the >600kB JAR file supporting the grid intensity page.

After a couple of hours wondering why ProGuard was blaring warnings at me, I was down to ~50kB fully obfuscated (but a little hard to debug)!

Done with ~20 lines added to my Ant build.xml file.

I have now settled on a ~80kB unobfuscated, slightly statically optimised, and much cleaner and more robust JAR file. Hurrah!

684487 Jan 10 18:56 reutils-1.1.21.jar
 84974 Jan 15 19:30 edhMain.reutils-1.1.22.jar

Lower, Lower!

By being a bit bolder and allowing almost all obfuscation to reduce size, but retaining some information (source file names and line numbers) to help diagnose run-time exceptions, size is now ~10% of the original. (Removing unrelated functionality no longer used would cut that further!)

 65114 16 Jan 15:04 edhMain.reutils-1.1.23.jar

Example exception output now:

% sh extraTweet.sh "testing 1 2 3"
FAILED command: extraTweet
java.lang.IllegalArgumentException
        at org.hd.d.edh.E.a(TwitterUtils.java:370)
        at org.hd.d.edh.Main.main(Main.java:146)

Naturally there were a couple of astonishments due to the obfuscation. But those were fixed and some overdue related code improvements done.

2023-01-14: Lowering Standards

A couple of pages on EOU, eg my PhD research page, are rated as "very difficult to read" by a Flesch–Kincaid scoring mechanism.

% reado --unfluff < PhD-research.html
...
score: 29.94
school level: college graduate
notes: Very difficult to read. Best understood by university graduates.

Pages tagged as 'technical' are allowed a lower readability than most. But the previous value (42) was way too high for these rogue pages, so that 'technical' threshold has been lowered to 25 for now!

That did not last long... Down to 15 within a week!
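The "very difficult" band (30 and below) comes from the Flesch Reading Ease scale. A minimal Java sketch of that standard formula, taking word, sentence and syllable counts as inputs, since syllable counting is the heuristic part and may well differ from what reado does internally (class and method names are hypothetical):

```java
public final class FleschReadingEase {
    /**
     * Flesch Reading Ease: 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words).
     * Higher is easier; scores of 30 and below are conventionally "very difficult".
     */
    public static double score(final long words, final long sentences, final long syllables) {
        if (words <= 0 || sentences <= 0 || syllables < 0) {
            throw new IllegalArgumentException("need positive word and sentence counts");
        }
        return 206.835
            - 1.015 * ((double) words / sentences)
            - 84.6 * ((double) syllables / words);
    }

    /** A page passes if its score is at or above the minimum allowed for its tag. */
    public static boolean passes(final double score, final double minimum) {
        return score >= minimum;
    }

    public static void main(final String[] args) {
        // Illustrative counts only: 100 words, 5 sentences, 150 syllables.
        final double s = score(100, 5, 150);
        System.out.printf("score %.3f, passes a 'technical' minimum of 15: %b%n", s, passes(s, 15));
    }
}
```

With these thresholds, the PhD page's 29.94 fails the old 'technical' minimum of 42 but passes the current 15.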

2023-01-10: Undead URLs

It is fascinating that search engines (in this case apparently at Microsoft) are still polling for Gallery URLs that have been dead more than a decade, possibly two!

d.hd.org:80 40.86.XX.XX - - [11/Jan/2023:15:18:16 +0000] "GET /_I/cat/13/20yuxbmq0pc.HTM HTTP/1.1" 404 397 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Safari/605.1.15"
www.hd.org:80 40.69.XX.XX - - [11/Jan/2023:15:18:18 +0000] "GET /Damon/_I/cat/19/2g4emqm6t78.HTM HTTP/1.1" 301 583 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Safari/605.1.15"
d.hd.org:80 40.69.XX.XX - - [11/Jan/2023:15:18:18 +0000] "GET /_I/cat/19/2g4emqm6t78.HTM HTTP/1.1" 404 397 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Safari/605.1.15"
d.hd.org:80 52.173.XX.XX - - [11/Jan/2023:15:18:19 +0000] "GET /_I/cat/19/rs1t27q1u0.HTM HTTP/1.1" 404 397 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.1 Safari/605.1.15"
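Tallying such requests by status code takes only a regex over the log lines, and the same tally could show over time whether returning 410 Gone reduces repeat requests. A minimal Java sketch, assuming the vhost-prefixed log format shown above (the class name and default path fragment are illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class LogStatusTally {
    // vhost:port client ident user [time] "METHOD path protocol" status ...
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) \\S+ \\S+ \\[[^\\]]*\\] \"(\\S+) (\\S+)[^\"]*\" (\\d{3}) ");

    /** Counts requests whose path contains pathFragment, grouped by HTTP status. */
    public static Map<Integer, Integer> tally(final Iterable<String> lines, final String pathFragment) {
        final Map<Integer, Integer> counts = new TreeMap<>();
        for (final String line : lines) {
            final Matcher m = LINE.matcher(line);
            // Skip lines that do not parse or are for unrelated paths (group 4).
            if (!m.find() || !m.group(4).contains(pathFragment)) continue;
            counts.merge(Integer.parseInt(m.group(5)), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(final String[] args) throws IOException {
        final String fragment = (args.length > 0) ? args[0] : "/_I/cat/";
        // ISO-8859-1 decodes any byte, so odd bytes in request lines cannot abort the run.
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.ISO_8859_1))) {
            final Iterable<String> lines = r.lines()::iterator;
            System.out.println(tally(lines, fragment));
        }
    }
}
```

For example: java LogStatusTally.java /_I/cat/ < access.log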

2023-01-08: Posting/Tooting on Mastodon from Java

I did not see a nice copy-and-paste example elsewhere (though it was not that hard in the end), so here is the core of my solution:

// Fetch the auth tokens, or silently abort if not available...
final String authtoken = getMastodonAuthToken();

// Send message...

// Here is how to do it with curl...
// (MAT is a file containing the access token.)
// % curl https://mastodon.energy/api/v1/statuses -H "Authorization: Bearer `cat $MAT`" -F "status=$1"
// See https://dev.to/bitsrfr/getting-started-with-the-mastodon-api-41jj

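// URL-encode the message: the encoded form is always 7-bit ASCII.
// Encode via UTF-8 so that non-ASCII characters (accents, emoji) survive as
// %-escapes; with US_ASCII here each would silently become "?" (%3F).
final String formEncodedBody = "status=" +
    URLEncoder.encode(statusMessage, StandardCharsets.UTF_8);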

final int timeout_ms = 10000;

final URL u = new URL("https", md.hostname, "/api/v1/statuses");

final HttpsURLConnection uc = (HttpsURLConnection) u.openConnection();
uc.setUseCaches(false);
uc.setAllowUserInteraction(false);
uc.setDoOutput(true);
uc.setDoInput(true);
uc.setConnectTimeout(timeout_ms);
uc.setReadTimeout(timeout_ms);
uc.setRequestMethod("POST");
uc.setRequestProperty("Authorization", "Bearer " + authtoken);
uc.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
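final byte[] bodyBytes = formEncodedBody.getBytes(StandardCharsets.US_ASCII);
// Content-Length is a restricted header that HttpURLConnection may silently
// drop if set via setRequestProperty(), so declare the body length this way.
uc.setFixedLengthStreamingMode(bodyBytes.length);
try (final OutputStream output = uc.getOutputStream()) {
    output.write(bodyBytes);
}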
final int responseCode = uc.getResponseCode();
final String responseMessage = uc.getResponseMessage();

uc.disconnect();

(Note the embedded cURL example and link!)
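Since Java 11, the java.net.http.HttpClient API offers a more compact alternative to HttpsURLConnection. A minimal sketch of the same POST, assuming the token comes from a hypothetical MASTODON_TOKEN environment variable (the class name is also illustrative); it URL-encodes via UTF-8 so accented characters and emoji survive as %-escapes while the body on the wire stays 7-bit ASCII:

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

public final class TootSketch {
    /** Form-encodes the status; the result is pure ASCII. */
    public static String formBody(final String statusMessage) {
        return "status=" + URLEncoder.encode(statusMessage, StandardCharsets.UTF_8);
    }

    /** Posts one status to the given instance; returns the HTTP status (200 expected on success). */
    public static int toot(final String host, final String token, final String statusMessage)
            throws IOException, InterruptedException {
        final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();
        final HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://" + host + "/api/v1/statuses"))
            .timeout(Duration.ofSeconds(10))
            .header("Authorization", "Bearer " + token)
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(formBody(statusMessage), StandardCharsets.US_ASCII))
            .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).statusCode();
    }

    public static void main(final String[] args) throws IOException, InterruptedException {
        // Hypothetical: token from the environment, host and message from the command line.
        final String token = System.getenv("MASTODON_TOKEN");
        if (token == null || args.length != 2) {
            System.err.println("usage: MASTODON_TOKEN=... java TootSketch HOST MESSAGE");
            return;
        }
        System.out.println("HTTP " + toot(args[0], token, args[1]));
    }
}
```

For example: MASTODON_TOKEN=... java TootSketch.java mastodon.energy "testing 1 2 3"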

First production toot from Java!

2023-01-07: Author Lists

I have now put in place code to parse author lists in the bibliography (split by semicolon or "and"), truncate overly-long lists and tail them with et al. (the trailing "." makes it an unsexed abbreviation, I understand), and standardise the separator in the HTML to be ";" for compactness. Each author gets their own itemprop=author section, which improves the metadata.

There are some potential potholes with the ;s in HTML entity codes for accented letters, but so far these have been swerved around by insisting on a space after any ";" used as a separator.
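A minimal Java sketch of that parse, truncate and render sequence, assuming names are already HTML-escaped (the class, method names and span markup are hypothetical). Note that it only dodges entities in mid-word: a name ending in one, such as "Andr&eacute; Smith", would still be split wrongly, which is exactly the pothole.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public final class AuthorList {
    // Separator: ";" followed by whitespace, or "and" surrounded by whitespace.
    // Requiring whitespace after ";" leaves mid-word entities such as "Fr&eacute;d" intact.
    private static final Pattern SEPARATOR = Pattern.compile(";\\s+|\\s+and\\s+");

    /** Splits a raw author field into trimmed, non-empty names. */
    public static List<String> parse(final String raw) {
        final List<String> authors = new ArrayList<>();
        for (final String a : SEPARATOR.split(raw.trim())) {
            if (!a.isBlank()) authors.add(a.trim());
        }
        return authors;
    }

    /** Renders at most max authors, "; "-separated, each as itemprop=author, plus "et al." if truncated. */
    public static String toHtml(final List<String> authors, final int max) {
        final StringBuilder sb = new StringBuilder();
        final int n = Math.min(authors.size(), max);
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append("; ");
            sb.append("<span itemprop=\"author\">").append(authors.get(i)).append("</span>");
        }
        if (authors.size() > max) sb.append(" et al.");
        return sb.toString();
    }
}
```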

2023-01-04: ASCII7 Bibliography

It turns out that some people have accents in their names! I am shocked, I tell you!

I want the bibliography .bib files to stay ASCII7, ie 7-bit ASCII, for robustness.

Accents don't fit in that reduced character set. HTML can get round that with entities such as "&eacute;" for "é". LaTeX, which is in effect the source language of BibTeX, uses backslash escapes such as \'{e} (or {\'e}).

I have set up my HTML conversion to recognise the single one of these that has so far appeared! [bouckaert2021net]
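A minimal Java sketch of both halves, a 7-bit check on the raw bytes and a conversion of simple LaTeX vowel accents in either \'{e} or {\'e} form to HTML named entities (class and method names are hypothetical; anything unrecognised is left untouched):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class LatexAccents {
    // Either {\'e} (accent in group 1, vowel in group 2) or \'{e} (groups 3 and 4).
    private static final Pattern ACCENT = Pattern.compile(
        "\\{\\\\(['`\"^])([AEIOUaeiou])\\}|\\\\(['`\"^])\\{([AEIOUaeiou])\\}");

    // LaTeX accent character -> HTML entity suffix (valid for all plain vowels).
    private static final Map<String, String> SUFFIX =
        Map.of("'", "acute", "`", "grave", "^", "circ", "\"", "uml");

    /** True if every byte is 7-bit ASCII (Java bytes 0x80..0xFF are negative). */
    public static boolean isAscii7(final byte[] bytes) {
        for (final byte b : bytes) {
            if (b < 0) return false;
        }
        return true;
    }

    /** Replaces simple LaTeX vowel accents with HTML entities, eg \'{e} becomes &eacute;. */
    public static String toHtmlEntities(final String s) {
        return ACCENT.matcher(s).replaceAll(mr -> {
            final String accent = (mr.group(1) != null) ? mr.group(1) : mr.group(3);
            final String vowel = (mr.group(2) != null) ? mr.group(2) : mr.group(4);
            return Matcher.quoteReplacement("&" + vowel + SUFFIX.get(accent) + ";");
        });
    }
}
```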

2023-01-01: New Year New Stats

Yearly Electricity and Gas (ie heat and light) Carbon Footprint

I have spent most of today gathering and archiving and commenting on energy stats for 16WW!

One improvement was to switch to the hour-by-hour carbon figure for electricity for the front-page 16WW carbon-footprint graph (a snapshot of which is shown here) where available, ie for 2020 onwards.
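Hour-by-hour figures matter because consumption is not spread evenly across cleaner and dirtier hours. A minimal Java sketch of the general idea, assuming aligned hourly series of kWh used and grid intensity in gCO2/kWh; the flat version uses a simple mean intensity for contrast (names and numbers are illustrative, not the code behind the graph):

```java
public final class HourlyCarbon {
    /** Sum over hours of kWh used times that hour's intensity (gCO2/kWh), in grams. */
    public static double hourlyGrams(final double[] kWh, final double[] gPerKWh) {
        if (kWh.length != gPerKWh.length) throw new IllegalArgumentException("series differ in length");
        double grams = 0;
        for (int i = 0; i < kWh.length; i++) grams += kWh[i] * gPerKWh[i];
        return grams;
    }

    /** Cruder alternative: total kWh times the simple mean intensity over the same hours. */
    public static double flatGrams(final double[] kWh, final double[] gPerKWh) {
        if (kWh.length != gPerKWh.length || kWh.length == 0) {
            throw new IllegalArgumentException("need matching non-empty series");
        }
        double totalKWh = 0;
        double sumIntensity = 0;
        for (int i = 0; i < kWh.length; i++) {
            totalKWh += kWh[i];
            sumIntensity += gPerKWh[i];
        }
        return totalKWh * (sumIntensity / kWh.length);
    }

    public static void main(final String[] args) {
        // Illustrative only: more use in the dirtier hours raises the hourly figure.
        final double[] kWh = {1, 2, 3};
        final double[] intensity = {100, 200, 300};
        System.out.println(hourlyGrams(kWh, intensity) + " g hourly vs " + flatGrams(kWh, intensity) + " g flat");
    }
}
```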

(A little time was spent updating copyright notices for 2023, too!)

References

(Count: 1)