Earth Notes: On Website Technicals (2018-08)

Updated 2024-05-06.
Tech updates: PWA revisited, auto lazy loading, jumpy AutoAds, more content pyramid, CRP and efficient canonicals, custom 404.
As ever, I am continuing to try to improve the user experience (and eat less bandwidth) with lazy loading, keeping a leash on the CRP, etc. And I am not enjoying AutoAds' jank, which partly prompted me to turn them off a few months later and return to placing ads with my own code.

2018-08-27: Custom 404 Page

About 10 years late I finally have a simple (fully-static) custom 404 error page. All it required were the following lines at the end of the VirtualHost sections for the full and lite sites:

# Custom 404 page.
ErrorDocument 404 /404.html

This new, fairly minimal custom page is still several times larger than the ~400 bytes that Apache serves by default for a 404.

(This is not perfect/finished yet. Eg, malformed requests to the mobile site that don't look like top-level pages get redirected to the desktop site 404 page.)

This points up the fact that the search-engine-supporting metadata in the head and elsewhere could be omitted for pages such as 404.html that are never intended to be indexed. That would save a fair amount of overhead, though it might hurt some screen readers.

This could be triggered on noindex alone, or just on noindex 'site' pages.
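A minimal sketch of that idea, in Python for illustration only: a hypothetical head builder that always keeps the title (useful to screen readers and browser tabs) but drops the search-engine-oriented description, canonical link and Open Graph title on noindex pages. The site_pages_only switch models the narrower "noindex 'site' pages only" option. The function name, parameters and choice of tags are assumptions, not EOU's actual generator.

```python
from html import escape
from typing import List


def head_metadata(title: str, description: str, canonical: str,
                  noindex: bool, is_site_page: bool,
                  site_pages_only: bool = False) -> List[str]:
    """Return <head> metadata lines for one page (hypothetical sketch).

    Search-oriented extras are stripped for noindex pages, or only for
    noindex 'site' pages when site_pages_only is True.
    """
    lines = [f"<title>{escape(title)}</title>"]  # always kept
    strip = noindex and (is_site_page or not site_pages_only)
    if noindex:
        lines.append('<meta name="robots" content="noindex">')
    if not strip:
        lines.append(f'<meta name="description" content="{escape(description)}">')
        lines.append(f'<link rel="canonical" href="{escape(canonical)}">')
        lines.append(f'<meta property="og:title" content="{escape(title)}">')
    return lines
```

For a 404 page this leaves just the title and the robots noindex line.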

(Bing insists on probing all-lower-case versions of pages frequently, and all bogus page probes eg for break-in attempts will waste some bandwidth too.)

As a first step, I am omitting the word count from the foot of all trivially-small pages, such as the 404 page. Also I am stripping some other non-essential text from full-fat noindex pages as if mobile/lite. This is simple, hooking into existing logic. It saves ~10% of the uncompressed 404 page size for example.
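The same two rules as a hypothetical Python sketch: the word count is dropped below a size threshold, and a full-fat noindex page is treated as if it were lite. The 100-word threshold and the placeholder "extras" line are illustrative assumptions; the real cut-off is not stated above.

```python
from typing import List

# Placeholder for the non-essential text that lite pages omit (assumption).
EXTRAS = "Related pages, share links and other extras"


def page_footer(word_count: int, noindex: bool, lite: bool,
                min_words: int = 100) -> List[str]:
    """Build footer lines for a page (hypothetical sketch)."""
    treat_as_lite = lite or noindex  # full-fat noindex pages get lite treatment
    footer: List[str] = []
    if word_count >= min_words:  # omit word count on trivially-small pages
        footer.append(f"~{word_count} words")
    if not treat_as_lite:
        footer.append(EXTRAS)
    return footer
```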

As of I added custom 406 and 429 error pages and subsequently made all the custom error pages somewhat smaller, since they represent wasted bandwidth themselves!

2018-08-18: CRP and Efficient Canonicals

If a site (or site group) has multiple page versions presenting the same information, one page should be designated as canonical and the others as alternate (or even noindex). This avoids indexing duplicate content unwittingly.

EOU has mobile/lite (m.) and full-fat/desktop (www.) versions of each of its main pages. The canonical/alternate relationship is declared in the head with link elements, which takes a bit of space on the critical rendering path (CRP) but is OK. There is enough repetition that gzip compresses this well.
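For illustration, a Python sketch of generating those link elements, roughly following the pattern Google documents for separate mobile URLs: both versions point their canonical at the desktop page, and the desktop page also names the mobile page as an alternate with a media query. The host names, the 640px breakpoint and the function name are placeholders, not EOU's real values.

```python
from html import escape

# Placeholder hosts and breakpoint (assumptions for illustration).
FULL_HOST = "www.example.com"
LITE_HOST = "m.example.com"
MOBILE_MEDIA = "only screen and (max-width: 640px)"


def link_elements(path: str, is_lite: bool, scheme: str = "http") -> str:
    """Return the canonical/alternate <link> tags for one page version."""
    full = f"{scheme}://{FULL_HOST}{path}"
    lite = f"{scheme}://{LITE_HOST}{path}"
    # Both versions declare the desktop page as canonical.
    tags = [f'<link rel="canonical" href="{escape(full)}">']
    if not is_lite:
        # Only the desktop page advertises the mobile alternate.
        tags.append(f'<link rel="alternate" media="{MOBILE_MEDIA}" href="{escape(lite)}">')
    return "\n".join(tags)
```

Keeping the hosts and scheme in one place means a later switch to https touches a single parameter.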

However, if I continue to support http while bringing https on-line, there will be four versions of each page. The https://www. 'secure' desktop version would be canonical, and the https://m. 'secure' lite/mobile version the preferred alternate.

Having the four link tags in each head is going to start to get heavy. It's not even clear to me what the relationship between the https and http desktop pages should be!

Google at least supports canonical and alternate annotations in sitemaps. Maybe that would allow every head to carry only the main (https) canonical and alternate links, with a sitemap published only for the https site, listing the canonical and all three alternates with appropriate markup.

Search engines that don't understand the extended sitemap attributes may end up simply ignoring the http page alternates, and index only the https pages. I could live with that.
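A hedged sketch of generating such a sitemap with Python's standard xml.etree.ElementTree, using the xhtml:link rel="alternate" annotation that Google describes for sitemaps. The URLs are placeholders, and how best to annotate the http variants is left open, as above: the function simply emits whatever (href, media) alternates it is given.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)      # default namespace for <urlset> etc
ET.register_namespace("xhtml", XHTML_NS)   # prefix for <xhtml:link>


def sitemap(entries) -> str:
    """entries: iterable of (canonical_url, [(alt_href, media_or_None), ...])."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, alternates in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        for href, media in alternates:
            attrs = {"rel": "alternate", "href": href}
            if media:  # eg a mobile media query for the lite version
                attrs["media"] = media
            ET.SubElement(url, f"{{{XHTML_NS}}}link", attrs)
    return ET.tostring(urlset, encoding="unicode")
```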

Also today I visited the Beer Lovers microbrewery, and suggested adding a map to the site! All to help a fellow site owner, natch.

(As of there is a map!)

2018-08-17: More Content Pyramid

I continue to think that a content pyramid approach is useful to satisfy a number of categories of visitor, from quick skim for an "answer" to in-depth research. (And I still like a pun-laden but informative cross-head here and there!)

If your title promises an answer, then deliver that answer in the first paragraph of your article, eg with hypotext or an 'accordion'.

EOU tries to achieve some of this through the title and description/sub-title, which are pushed to search engines in metadata as well as appearing at the top of the page. So no 'click-bait' SERP result leading to completely unrelated body text, for a start. Then an optional 'intro' paragraph (not created for many pages yet), then the main body, then a good set of links/references at the end.
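One way to keep the SERP snippet and the top of the page in step is to generate both from the same fields. The hypothetical Article record below (a Python sketch, not EOU's real data model) feeds its description to both the meta description and the visible sub-title, then emits the optional intro, the body and the closing links in pyramid order.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Optional, Tuple


@dataclass
class Article:
    """Hypothetical page record; field names are illustrative only."""
    title: str
    description: str
    body_html: str
    intro_html: Optional[str] = None
    links: List[Tuple[str, str]] = field(default_factory=list)  # (href, text)


def render(a: Article) -> Tuple[str, str]:
    """Return (head_html, body_html) in content-pyramid order."""
    title, desc = escape(a.title), escape(a.description)
    # The same title and description go to search engines...
    head = f'<title>{title}</title><meta name="description" content="{desc}">'
    # ...and appear at the top of the visible page.
    body = [f"<h1>{title}</h1>", f"<p>{desc}</p>"]
    if a.intro_html:  # optional short 'answer' paragraph
        body.append(a.intro_html)
    body.append(a.body_html)
    if a.links:  # references and further reading come last
        items = "".join(f'<li><a href="{escape(h)}">{escape(t)}</a></li>' for h, t in a.links)
        body.append(f"<ul>{items}</ul>")
    return head, "".join(body)
```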

That main body text may contain details/summary 'accordion' sections, which have the advantages of being 'semantic' tags (I think), of not requiring JavaScript, and of never failing to display fully on any browser that I have so far encountered.
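A minimal Python sketch of wrapping already-rendered HTML in such a section; the helper name is hypothetical. The browser handles expand and collapse natively, so no script is involved.

```python
from html import escape


def accordion(summary: str, body_html: str, open_: bool = False) -> str:
    """Wrap rendered HTML in a native <details>/<summary> section."""
    attr = " open" if open_ else ""  # 'open' shows the section expanded initially
    return f"<details{attr}><summary>{escape(summary)}</summary>{body_html}</details>"
```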

I can still probably do better, especially on the longer pieces.

2018-08-16: Jumpy AutoAds

Most of the (Google) ads that I am running on EOU currently are the 'AutoAds' injected with a single script near the top of the page. Google chooses where to inject the ads, if any. (Not at the top or in the head so as to stay out of the critical rendering path or CRP.)

One problem is that such ads often change markedly in placement on page reload, sometimes disappearing entirely and making the page jump away from where I was looking (aka "jank"). This is quite annoying.

Also I now have no real way of telling which lucky pages are actually getting the clicks, which is one less clue about which content to concentrate my time on, improving it and keeping it up to date.

2018-08-15: Auto Lazy Loading

In general I am in favour of "lazy" approaches to complex computations, such as displaying complex pages with lots of weight and tricky rendering, where often not all the possible elements are needed. A visitor may never scroll to the end of a page, and thus loading images late in that page is a waste of bandwidth, and probably of time and money, for both the site and the visitor.

But even in HTML5, until now, pretty much all the lazy loading schemes have needed JavaScript support and are fragile and burdensome in significant ways.

I think that a browser is perfectly placed to avoid loading things until they are actually needed, folding the logic into the preloading and other cleverness that already goes on. I observe that some people get hot under the collar when it is suggested that the browser do this, and not load everything immediately. Frankly I am amazed that modern browsers are not lazy loading already, especially in any kind of 'data saving' mode.

I was therefore pleased to see "Built-In Lazy Loading Lands in Google Chrome Canary" and "Feature Policy: lazyload".

However, my local W3C HTML5 Validator will not (yet) allow me to add lazyload="on" to existing img tags speculatively!
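Once the markup is accepted, a build step could add the attribute mechanically. The hypothetical Python sketch below uses a simple regular expression (adequate for well-formed generated HTML, not for arbitrary pages) to add an attribute to each img tag that lacks one, leaving the first image alone on the assumption that it is near the top of the page. The attribute is a parameter because the experimental lazyload="on" was later standardised in HTML as loading="lazy".

```python
import re

IMG_TAG = re.compile(r"<img\b([^>]*)>", re.IGNORECASE)
HAS_LAZY_ATTR = re.compile(r"\b(loading|lazyload)\s*=", re.IGNORECASE)


def add_lazy(html: str, skip_first: int = 1, attr: str = 'loading="lazy"') -> str:
    """Add a lazy-loading attribute to <img> tags that lack one (sketch)."""
    count = 0

    def fix(m: "re.Match[str]") -> str:
        nonlocal count
        count += 1
        inner = m.group(1)  # everything after '<img' up to '>'
        # Leave early (probably above-the-fold) images and explicit choices alone.
        if count <= skip_first or HAS_LAZY_ATTR.search(inner):
            return m.group(0)
        return f"<img {attr}{inner}>"

    return IMG_TAG.sub(fix, html)
```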

2018-08-14: PWA Revisited

I was pleased to see this tweet: "There's a common misconception that making a PWA means creating a Single Page App with an app-shell architecture." I certainly hope to fold some PWA enhancements into EOU over https, while leaving the plain HTTP site working just as before.