Earth Notes: On Website Technicals (2017-12)

Updated 2022-10-21.

Tech updates: allegedly too little markup, bad traffic, big hero, base download ms, service worker no rel, jump-to.

The jump-to links clue-stick from Tony McC was a revelation and improved both Google search results and on-page usability: hurrah and thank you! As to bad/bot traffic: it hovered around 50% for the following year, though there was some much more malicious hackery mixed in towards the end.

2017-12-30: Jump-to Links

Having been prompted by Tony McC's tweet ("How many people are aware of, and optimise for, jump to links?"), I have put in place automatic table-of-contents generation for longer pages where I have suitable IDs on headings. And I now get warnings where headings need to be fixed...

Although the ToC extraction is automated, it only works where headings carry IDs. To get enough coverage for reasonable results across the site, I ended up editing IDs into h2 (and h3) tags in ~80% of the pages! (Generally, to avoid bloating page size, I made most of the new IDs a single salient word from the heading's text, which should compress well with the likes of gzip; I mainly left existing IDs as-is.)

It took me three major attempts to be happy with the styling: initially I had an h3-headed small bullet list, which became an h2-headed sidebar list, then a details-wrapped list to avoid eating too much vertical space and distorting some layouts.
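A minimal sketch of how such a ToC could be extracted and wrapped in a details element follows. It is not the site's actual build code: the function names extractHeadings and buildToc, the regex-based parsing, the minEntries threshold and the "Jump to" label are all assumptions for illustration.

```javascript
// Sketch only: collect h2/h3 headings that carry an id from an HTML string.
// A regex is adequate for simple hand-written markup; it is not a general HTML parser.
function extractHeadings(html) {
  const re = /<h([23])\b[^>]*\sid=(?:"([^"]+)"|'([^']+)'|([^\s>]+))[^>]*>([\s\S]*?)<\/h\1>/gi;
  const headings = [];
  let m;
  while ((m = re.exec(html)) !== null) {
    headings.push({
      level: Number(m[1]),
      id: m[2] || m[3] || m[4],
      // Strip any nested tags (eg <em>) from the heading text.
      text: m[5].replace(/<[^>]+>/g, '').trim(),
    });
  }
  return headings;
}

// Build a collapsible ToC; returns '' for pages with too few usable headings.
// minEntries is a hypothetical threshold standing in for "longer pages".
function buildToc(headings, minEntries = 3) {
  if (headings.length < minEntries) return '';
  const items = headings
    .map(h => `<li class="toc-h${h.level}"><a href="#${h.id}">${h.text}</a></li>`)
    .join('\n');
  return `<details><summary>Jump to</summary>\n<ul>\n${items}\n</ul>\n</details>`;
}
```

Headings without an id are simply skipped here; a real build step could instead log a warning for each one so that they can be fixed by hand.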

I also slightly increased the number of words before an ad insertion to help avoid ugly layouts with too much stacked in the right-hand margin. (I turned the count up for the headed sidebar and did not turn it back down for the details sidebar, so my first ad is now significantly further down the page.) In passing I also fixed a subtle bug in counting words for the ad-insertion point. The count included the words of a header itself and the ad was then inserted above that line, so the ad could land between a section tag and its child header tag, causing an HTML5 conformance error. Subtle and rare, and only caught by testing the whole (full) site for conformance.
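One way to avoid that class of bug is to allow insertion only after a completed paragraph, as in the sketch below. This is an illustration rather than the site's real logic: the names countWords and findAdInsertionLine are hypothetical, and it assumes the page source has one block-level element per line.

```javascript
// Count the words in one line of HTML, ignoring the tags themselves.
function countWords(line) {
  const text = line.replace(/<[^>]+>/g, ' ').trim();
  return text === '' ? 0 : text.split(/\s+/).length;
}

// Return the index of the line AFTER which an ad may be inserted,
// or -1 if the page never reaches minWords at a safe point.
function findAdInsertionLine(lines, minWords) {
  let words = 0;
  for (let i = 0; i < lines.length; i++) {
    words += countWords(lines[i]);
    if (words < minWords) continue;
    // Only insert after a line that closes a paragraph, so the ad can
    // never land between a <section> tag and its child header.
    if (/<\/p>\s*$/i.test(lines[i])) return i;
  }
  return -1;
}
```

Header words still count towards the threshold, but a header line can never itself be chosen as the insertion point.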

2017-12-29: Service Workers no Relation

I have been doing a bit of fiddling with service workers to help make this site work better off-line (or with poor networking). None of this will apply until I start serving the site under https, and that is still a way off (even under TLS 1.3 with 0-RTT there is still at least one slow-for-mobile round-trip added to the initial page view).

Service Workers look very useful, but there's a lot of code to get right, or a fair amount to load (potentially tens of kB) to use one of the flexible and popular libraries. That is all a bit much for this site's intended application.

One small ray of hope, arrived at while wondering why Firefox 57 was rejecting all my in-line JavaScript (including that to register my Service Worker, and apparently even the Google AdSense code), was a proposal to Add support for LINK rel=serviceworker as an element and header, allowing a Service Worker to be installed declaratively in a document or via a header. That would allow something as simple and elegant to load and register as <link rel=serviceworker href=sw.js> rather than:

<head>
...
<script>
// Skip browsers (or insecure contexts) where the Service Worker API is unavailable.
if ('serviceWorker' in navigator) {
  if (navigator.serviceWorker.controller) {
    console.log('[Service Worker] active service worker already running');
  } else {
    navigator.serviceWorker.register('sw16.js', { scope: '/' }).then(function(reg) {
      console.log('Service worker has been registered for scope: ' + reg.scope);
    });
  }
}
</script>
</head>

I spent a fair amount of time searching and failing to find anything definitive. I certainly failed to make it actually work on my test site. So I wrote to the owners of that putative feature and got a fast response (from Marijn K):

That feature was never actually implemented in browsers (other than behind a flag in chrome), and was removed from the ServiceWorker spec in https://github.com/w3c/ServiceWorker/pull/1207 If you do have compelling use cases for this feature, you could open a new issue on the ServiceWorker spec repository explaining your use case, and we might reconsider bringing it back...

I am only fiddling around. So I cannot present a compelling use case, yet. I can whine into the void instead:

Other than it being much, much tidier and more compact (and semantically safer to prefetch the targets of, parallelise, etc) than having messy JavaScript inline (or a separate file, or even managed and folded into some 'global.js')?
I have always been far keener on compact clear declaratives than random messy imperative code for reliability and maintainability! And that's across codebases from embedded (robots and radiator valves) to the edges of HPC such as Lehman Brothers' finest optimised credit derivatives...
...

Noindex and Sitemaps

An interesting thread on Twitter about noindex, link following and sitemaps...

(I got a like from @JohnMu!)

2017-12-27: GSC Crawl Base Download Time

Today GoogleBot crawled/spidered many more items on the 'full' site than usual, mainly very small data text files, and the mean 'time spent downloading' dropped to a low of 133ms, matching the recent low for the mobile/lite site.

2017-12-22: Big Hero

I am fairly happy with the multi-column display on the home page of newest/popular/updated articles, and have experimentally added a featured-article hero and link (manually selected, 'full' site only) above it, and a 'random' article link below.

I have now also expanded the featured article image to half target-page width, ie 400px for the full-fat site and a nominal 320px for the lite site if it were ever to be shown there. Full-fat home page weight goes up a bit, but I think the UX improvement justifies it.

I will be automating the featured article selection by month to line up with a nominal editorial calendar; the manual override will remain available.
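The selection logic could be as simple as the following sketch. Everything in it is an assumption for illustration: the function name pickFeatured, the shape of the calendar object (month number 1 to 12 mapped to an article slug, with an optional default), and the example slugs in the usage note.

```javascript
// Sketch: choose the featured article for a given date.
// A manual override, when set, always wins over the calendar.
function pickFeatured(date, calendar, manualOverride) {
  if (manualOverride) return manualOverride;
  const month = date.getUTCMonth() + 1; // getUTCMonth() is 0-based
  return calendar[month] ?? calendar.default ?? null;
}
```

For example, with a hypothetical calendar of { 12: 'winter.html', default: 'about.html' }, a build run on any day in December would feature 'winter.html' unless an override were supplied.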

SmartThings

Hush, hush, whisper who dares... I'm not mentioning at all the fact that I have just installed some SmartThings to cross-correlate against OpenTRV sensors; nor that it took a titanic ~2.5h support chat with ST support to get the hub working but thanks to Daniel for sticking with me until we got there; nor could I possibly praise SmartThings Google Sheets Logging as a neat and quick solution to export data from the ST system which manages to mix the cloud, Web apps, phone apps and IoT while actually getting things done rather efficiently!

2017-12-16: Bad Traffic

About one quarter of all the hits on this site (across full and lite variants) were previously identified as coming from one bad distributed (mainly Asian) bot trying to rapidly download the same page over and over in occasional bursts from different hosts. Its aim remains opaque. For now that is blocked relatively simply in the Apache configuration, so there are still log entries but little actual bandwidth cost. The bot carries on oblivious.

In the last few days I identified much of the rest of the traffic as being from a couple of the big SEO firms' bots, while providing little to no value for this site. So I have added some robots.txt entries for the most egregious cases and even asked one to manually blacklist my domains (though it still seems to be visiting somewhat anyway). One of those firms boasts of being one of the most active bots/spiders just behind Google and maybe Bing. Considering the lack of value to most of the sites crawled (the crawl data may even help competitors out-rank them, and it brings no new eyeballs), I would have thought keeping heads down would be more appropriate...

With those two measures in place, bot hits are now reduced to more like one third of all hits, and most of those with little bandwidth cost.
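A rough figure like that can be estimated from the access log with something like the sketch below. It makes several assumptions: that the log uses Apache's "combined" format (where the User-Agent is the last double-quoted field), that the hypothetical botShare helper and its default pattern are adequate, and that bots identify themselves honestly. A bot spoofing a browser User-Agent would not be caught this way.

```javascript
// In Apache "combined" log format the User-Agent is the last double-quoted field.
function userAgentOf(logLine) {
  const m = logLine.match(/"([^"]*)"\s*$/);
  return m ? m[1] : '';
}

// Fraction (0..1) of log lines whose User-Agent matches botPattern.
// The default pattern is illustrative only, not a complete bot list.
function botShare(logLines, botPattern = /bot|crawler|spider/i) {
  if (logLines.length === 0) return 0;
  const botHits = logLines.filter(l => botPattern.test(userAgentOf(l))).length;
  return botHits / logLines.length;
}
```

Matching on User-Agent alone undercounts anything that hides its identity, so the result is best treated as a lower bound on bot traffic.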

All this is in my head because mid-winter I am trying to minimise unwanted processing while the server's off-grid batteries are low.

Behind all that bot barrage there seem to be somewhere between 100 and 200 unique human visitors each day. Not quite Google nor the BBC News site, but not quite zero either!

2017-12-11: Too Little Markup?

As I noted at Webmaster Central Help Forum:

I always like to keep my pages light-weight, especially for mobile (and low-bandwidth) users, and have since the '90s! Remember 9600bps connections and Netscape?
I saw a strange warning from one of the well-known SEO outfits saying that I had too little markup on my page and it was a SPAM flag, apparently a significant one.
I have in a previous life been paid to optimise code, and I can, and I have applied minimisation techniques to the HTML and to the CSS and have very little JS for example.
Am I really going to upset an SE such as Google by having very lean markup?

Given the "Sounds fishy" response I got, I won't lose much sleep over this!