Earth Notes: On Website Technicals (2019-01)

Updated 2023-04-14.
Tech updates: Happy New Ear, cssgip, work storage, AMP srcset, LEDs, details, 400kpx image warning, bad AutoAd, indigestion, multi-hero, OGP revisited.
What might happen in the New Year on this site, including a snapshot of some items from my secret to-do and FIXME list, and lots of AMP-related fiddling. And a new energy-sensitive makefile-based 'work' storage scheme...

2019-01-28: Twitter and Multiple Images

I've tweaked page generation to add a square thumbnail (SQTN) image to the page head. For this, there must be a declared SQTN, it must be exactly square, and the head/CRP must not already be large.

Interestingly, when there are two og:images in the head of a page using a Twitter summary_large_image card, Twitter seems to show the last one. (This would be consistent with its treatment of other repeated/redundant info, but the Twitter documentation is not explicit on this.) So I have put the LIMG last to encourage Twitter to show the often-more-interesting LIMG.

This may mean that some social media channels pick one of the images blindly, rather than (for example) the one with the optimal aspect ratio.

For Facebook, according to Open Graph - Object Properties:

You can include multiple og:image tags if you have multiple resolutions available.
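A minimal sketch of the SQTN conditions and the LIMG-last ordering described above, in Python purely for illustration (`HeroImage`, `attr` and `og_image_tags` are hypothetical names, not the site's actual scripts). The square thumbnail is emitted first, and only when declared, exactly square, and the head is not already large; the LIMG always goes last. Attribute quotes are dropped where HTML5 permits, in the same byte-saving spirit as the rest of the head.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeroImage:
    """Hypothetical record for a declared hero image."""
    url: str
    width: int
    height: int


def attr(value: str) -> str:
    """Return an HTML attribute value, unquoted where HTML5 allows it."""
    value = value.replace("&", "&amp;")
    # Unquoted values must be non-empty and free of whitespace and " ' = < > `
    if value and not any(c in value for c in " \t\n\f\r\"'=<>`"):
        return value
    return '"' + value.replace('"', "&quot;") + '"'


def og_image_tags(limg: HeroImage, sqtn: Optional[HeroImage],
                  head_is_large: bool) -> list:
    """Return og:image meta tags with the large image (LIMG) last."""
    tags = []
    # Add the SQTN only if declared, exactly square, and the head/CRP
    # is not already large.
    if sqtn is not None and sqtn.width == sqtn.height and not head_is_large:
        tags.append(f"<meta property=og:image content={attr(sqtn.url)}>")
    # Twitter appears to use the last og:image, so put the LIMG last.
    tags.append(f"<meta property=og:image content={attr(limg.url)}>")
    return tags
```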

2019-01-27: OGP: Open Graph Protocol

Looking again at The Open Graph protocol I note that it says that all pages should have at least og:title, og:image, og:type, and og:url.

I don't have the last two, though Twitter has been happy to validate my pages as Twitter cards, and I do have equivalent semantic information as schema.org subtypes of CreativeWork for og:type, and link rel=canonical for og:url. I'm particularly not keen to waste space in the head/CRP for the latter duplicating its content given the HTML5 semantics are clear.

I also have optional values such as og:locale.

Is the site suffering in any way, eg not getting some visibility that it might? I don't know, though I doubt it.

I was looking at this to see if there was anything that I could trim, to reduce the size of the head/CRP. For example drop the <meta property=og:locale content=en_GB> and extend the <html lang=en> to <html lang=en-GB>? I have done that for now with complaint from neither the v.Nu HTML5 validator nor the AMP validator. It seems to cut ~8 bytes from the GZIPped size on a sample page (~37 bytes cut before compression). This has allowed a couple of desktop pages to keep their head/CRP short enough without ditching their (optional) video tags, good! The Twitter card validator seems happy too!
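A small Python sketch of how such a saving can be measured: build a toy head before and after swapping og:locale for lang=en-GB, and compare raw and gzipped sizes. The toy head is invented for illustration; on a real page the compressed saving depends on the surrounding content, and is typically smaller than the raw saving, as it was here (~8 vs ~37 bytes).

```python
import gzip

OLD_HEAD = ("<html lang=en><head><meta charset=utf-8>"
            "<meta property=og:locale content=en_GB>"
            "<title>Example</title></head>")
NEW_HEAD = ("<html lang=en-GB><head><meta charset=utf-8>"
            "<title>Example</title></head>")


def sizes(text: str) -> tuple:
    """Return (raw bytes, gzipped bytes) for a UTF-8 string."""
    raw = text.encode("utf-8")
    return len(raw), len(gzip.compress(raw, compresslevel=9))


old_raw, old_gz = sizes(OLD_HEAD)
new_raw, new_gz = sizes(NEW_HEAD)
print(f"raw saving: {old_raw - new_raw} bytes, "
      f"gzipped saving: {old_gz - new_gz} bytes")
```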

It may yet be worthwhile replacing the Twitter-specific <meta name=twitter:card content=summary_large_image> with something like the smaller and more generic <meta property=og:type content=article>. It looks as if Twitter would accept the latter, though as a possibly-plainer "summary" card.

(Note that even https://www.facebook.com/ does not use og:type, and it is not listed under 'should use' in a developer page, though it does use link rel=canonical and og:url. And Optimizing Metadata suggests that the og:url could live on only the canonical page as indicated by link rel=canonical. The-Open-Graph-protocol claims og:url not to be necessary when link rel=canonical is present.)

(I did sneakily adjust one of my CSS files to compress better (sizes adjusted from 256 to 222) to slip one more page in under the head/CRP limits while retaining its video elements in the desktop page version... And removed a few superfluous quote marks from the AMP wrappers, and a few stray spaces when inserting page-specific extra headers...)

2019-01-26: Multiple Heroes

I now allow multiple (large and square thumbnail) hero images for each page (with the first/primary preferably 1200w and 800k pixels area minimum per Google guidelines).

The primary is mentioned in the page header, ie as the og:image.

All large hero images are candidates to display on the page in the hero position and for links to that page, eg from the home page.

All of the large and square thumbnail images are now mentioned in the page footer as ImageObjects (except for lite pages, where only the primary is listed, to save page weight). The aim is to give search engines and social media a wider set of aspect ratios to work with. (Google suggests 16x9, 4x3, 1x1 aspect ratios.)
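A sketch of listing hero images as schema.org ImageObjects, here as compact JSON-LD (one of the options mooted in the to-do list below; the site's actual footer markup may differ), keeping only the primary for lite pages. It also checks which of Google's suggested 16x9, 4x3 and 1x1 ratios are not yet covered. The function names and the `(url, width, height)` tuple layout are hypothetical.

```python
import json
from math import gcd

SUGGESTED_RATIOS = {"16x9", "4x3", "1x1"}  # per Google's suggestion


def aspect_label(width: int, height: int) -> str:
    """Reduce width x height to lowest terms, eg 1200x675 -> '16x9'."""
    g = gcd(width, height)
    return f"{width // g}x{height // g}"


def missing_ratios(images) -> set:
    """Suggested aspect ratios not exactly matched by any (url, w, h) image."""
    return SUGGESTED_RATIOS - {aspect_label(w, h) for _, w, h in images}


def image_objects_jsonld(images, lite: bool = False) -> str:
    """Compact JSON-LD list of ImageObjects; the primary (first) only if lite."""
    chosen = images[:1] if lite else images
    objects = [{"@context": "https://schema.org", "@type": "ImageObject",
                "url": url, "width": w, "height": h}
               for url, w, h in chosen]
    return json.dumps(objects, separators=(",", ":"))
```

Only exact ratios match here; a real check might allow a small tolerance for images that are a pixel or two off.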

2019-01-25: The path of index ingestion never did run smooth

Image: GSC Enhancements view, 2019-01-25, showing 54% of AMP pages indexed.

At the moment the number of valid AMP pages reported against canonical pages in GSC is going down, even though the coverage report against the main sitemap shows all main pages as indexed. No AMP errors or warnings are reported.

Shakespeare had the right idea...

2019-01-24: IMG title

Since there is generally no hover on mobile, an img's title is unlikely to be seen (eg as a tooltip). Therefore, to trim non-desktop page size, I'm omitting the generated title attribute from IMGs in those versions. This should not impair mobile UX.

2019-01-20: CRP Head

With all the different permutations of link rel between page versions, dynamically including support for ads where supported, and optional preloads for speed, keeping the header and CRP (Critical Rendering Path) small enough to get out in the first TCP frame is increasingly complex!

I already had in place an optional -H flag for the main page-generation script to minimise the header if a first attempt created an oversize one. That has had to be made a little more brutal, eg to shut out ads on AMP since ad support for those pages must be inserted in the head. But in general this lets me optimistically cram more into a header than I otherwise might, allowing a valid page to be created by discarding some non-essential stuff if necessary.

Alongside that I have now put in place a mechanism whereby, when large optional chunks are being added to the CRP, a LARGEHEAD flag is set. Then some really-optional stuff such as preloads can be silently dropped even without the -H flag, ie a gentler slimming of the header content on the first attempt.
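A rough Python sketch of the two levels of trimming, with a hypothetical three-way classification of head items and a caller-supplied byte budget (the real limit depends on what fits in the first TCP transmission, which this does not model). When any large optional item is present, a LARGEHEAD-style flag silently drops preloads; if the head is still oversize, a second -H-style pass also drops large optional items such as AMP ad support.

```python
from dataclasses import dataclass


@dataclass
class HeadItem:
    html: str
    kind: str  # hypothetical: "essential", "large" (eg ad/video support) or "preload"


def build_head(items, minimise: bool = False) -> str:
    """Assemble head content, dropping optional items as the flags dictate."""
    large_head = any(i.kind == "large" for i in items)  # LARGEHEAD
    kept = []
    for item in items:
        if item.kind == "preload" and (large_head or minimise):
            continue  # really-optional: drop silently
        if item.kind == "large" and minimise:
            continue  # -H: more brutal, eg no ad support on AMP
        kept.append(item.html)
    return "".join(kept)


def make_head(items, budget: int) -> str:
    """Try a normal build first, then a minimised (-H) one."""
    for minimise in (False, True):
        head = build_head(items, minimise)
        if len(head.encode("utf-8")) <= budget:
            return head
    raise ValueError("head too large even when minimised")
```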

2019-01-19: Auto Ads Bad?

Having never had a single AMP AdSense 'auto' ad appear that I know of, and being rather concerned that there are just too many non-AMP AutoAds on a page, I'm reverting to having my code insert ads in sensible places. I can nominally switch back to AutoAds at any time with a one-line change.

(Those sensible places are basically immediately above headings. These seem good breaks in the reading flow at which to interrupt with an ad, if anywhere is. However, the W3C Nu Html Checker does not like anything injected between the start of a section or article and the h2 (or whatever) that should be its first child. So section and article starts are also injection points, and should then get the ad rather than the immediately-following heading.)

I have tweaked my code to inject ads on longer pages, up to a tunable limit that is lower than the ~8 that Auto Ads would place.
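A simplified Python sketch of that injection-point rule, using a regex over generated HTML rather than a real parser (so it assumes well-behaved, machine-generated markup). As an assumption, h2 to h6 are treated as candidate headings (h1 being the page title). A heading, or nested section, that is the first child of a section or article is skipped, so the ad goes before the container instead. Up to `limit` points are then picked, evenly spread. The `AD` placeholder and the function names are hypothetical.

```python
import re

AD = "<div class=ad></div>"  # hypothetical ad placeholder
TAG = re.compile(r"<(section|article|h[2-6])\b[^>]*>", re.IGNORECASE)


def injection_points(html: str) -> list:
    """Offsets just before headings, or before their enclosing container."""
    points = []
    container_end = None  # end of the immediately preceding section/article tag
    for m in TAG.finditer(html):
        # First child of a section/article: the container's point covers it.
        first_child = (container_end is not None
                       and not html[container_end:m.start()].strip())
        if not first_child:
            points.append(m.start())
        is_container = m.group(1).lower() in ("section", "article")
        container_end = m.end() if is_container else None
    return points


def inject_ads(html: str, limit: int) -> str:
    """Insert up to `limit` ads at evenly spread injection points."""
    if limit <= 0:
        return html
    points = injection_points(html)
    if len(points) > limit:
        step = len(points) / limit
        points = [points[int(i * step)] for i in range(limit)]
    # Insert from the end so that earlier offsets stay valid.
    for pos in reversed(points):
        html = html[:pos] + AD + html[pos:]
    return html
```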

There is still no automatic ad injection for 'lite' pages.

Now I have 'manual' AMP ads showing, in the Chrome and Firefox responsive/mobile developer view and in my Opera Mini on Fairphone, but not on a desktop browser view. In fact, the desktop view (Firefox, Chrome, Safari) is a big ugly empty space that doesn't collapse. Why? What were they thinking?

2019-01-20: curiouser and curiouser. AMP ads will display on (for example) desktop Firefox if the viewport is narrow enough.

2019-01-17: 400k Pixel Minimum Area?

I think the threshold at which GSC issues the "Image size smaller than recommended size" warning may be 400k--500k pixels area. I set my script's warning threshold to 500k for when building a page, and all but a handful are now above that. The smallest area hero was below 300k, and when I went and inspected it in GSC there was indeed a warning lurking that it had not yet volunteered to me. So I beefed up that image and GSC is happy for that page.

It will be interesting to see if GSC complains about the sub-500k images, some of which will be hard to get higher-resolution versions of.
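A minimal sketch of such a build-time check, using the figures mentioned in these notes: Google's recommended minima of 1200px width and 800k pixels area, plus the script's own 500k-pixel warning threshold. The severity labels and function name are hypothetical.

```python
MIN_WIDTH = 1200             # Google's recommended minimum width (px)
RECOMMENDED_AREA = 800_000   # Google's recommended minimum area (pixels)
WARN_AREA = 500_000          # this site's own build-time warning threshold


def hero_size_issues(width: int, height: int) -> list:
    """Return (severity, message) pairs for a hero image's dimensions."""
    issues = []
    if width < MIN_WIDTH:
        issues.append(("WARNING", f"width {width}px < {MIN_WIDTH}px"))
    area = width * height
    if area < WARN_AREA:
        issues.append(("WARNING", f"area {area}px < {WARN_AREA}px"))
    elif area < RECOMMENDED_AREA:
        issues.append(("note", f"area {area}px < recommended {RECOMMENDED_AREA}px"))
    return issues
```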

Image: GSC Enhancements view, 2019-01-17, showing 57% of AMP pages indexed.

After something of a hiatus during Google's global index update, the new AMP pages seem to be being absorbed again.

(2019-01-22: given a lack of complaints about AMP pages with hero images down to just over 300kpx I suspect that the threshold may be nearer there, and Google's non-AMP image guidelines mention a 300kpx minimum.)

2019-01-15: New GSC Image Warning

Today I received an interesting and new warning from the Google Search Console (GSC) for about eight of my EOU pages re "Linked AMP version is valid with warnings" and "Image size smaller than recommended size". The Google guidance says to aim for the page image (or images, in different aspect ratios) to be at least 1200px wide and at least 800k pixels area. While almost all my hero images meet the first of those two, the latter is often not going to be met right now.

The warning arrived by email, and is all described in more detail in the on-line console.

As an experiment I fixed one of the sample/example pages to meet the requirements, and the live test indicated that I had indeed fixed the issue.

Quite an efficient and effective workflow, and the change is good for (eg) Twitter postings too.

While complying with Google recommendations is not necessary, done right it benefits pretty well all uses and users of the page.

Good one Google!

(2019-01-20: all the warnings have gone, though the last of the eight pages lingered for several days for no obvious reason!)

2019-01-11: AMP details, details

Hurrah! As of 5pm GSC had stopped complaining about use of details and summary in my 'canary' page.

So I put back use of this in the home page also, and will, sunshine permitting, roll out the 'Contents' feature to all suitable AMP pages in the next day or three. Excitement! (The Contents implementation depends on details.)

Since I'm having fun and forcing lots of stuff to be rebuilt anyway, I'm taking this opportunity to slightly reduce the threshold on page size before auto-injecting a Contents section.

I've also slightly improved the image background colour (shown while images are loading): it is now omitted for images with any transparency, and where it would be the same as the page background (ie white). The former is more a correctness issue, the latter a space optimisation.
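A sketch of that rule, assuming the mean colour has already been extracted (eg with the ImageMagick command in the cssgip notes further down) and that the page background is white; the function names are hypothetical. It returns an empty string when no background should be emitted.

```python
def normalise_hex(colour: str) -> str:
    """Normalise '#FFF' or '#ffffff' style colours to lower-case '#rrggbb'."""
    c = colour.strip().lstrip("#").lower()
    if len(c) == 3:
        c = "".join(ch * 2 for ch in c)
    if len(c) != 6 or any(ch not in "0123456789abcdef" for ch in c):
        raise ValueError(f"not a hex colour: {colour!r}")
    return "#" + c


def placeholder_style(mean_colour: str, has_alpha: bool,
                      page_bg: str = "#fff") -> str:
    """CSS for an image's loading background, or '' if it should be omitted."""
    if has_alpha:
        return ""  # correctness: it would show through transparent areas
    colour = normalise_hex(mean_colour)
    if colour == normalise_hex(page_bg):
        return ""  # space optimisation: invisible against the page anyway
    return f"background:{colour}"
```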

Also, while the gloves are off, I'm now cross-linking the m-dot and AMP pages, both in the header (link rel=) and in the top navigation.

2019-01-06: AMP LED Reviews

After a considerable amount of rework and general plumbing, the LED lamp review page is available in AMP!

2019-01-04: AMP Progress

According to Google Search Console, Google seems to be absorbing about eight AMP pages per day (out of ~200 candidates) into the index, and is ~40% done.

Image: GSC Enhancements view, 2019-01-13, showing 47% of AMP pages indexed.

GSC is reporting a fairly strong swing away from showing m-dot pages towards AMP pages. It's not yet clear if there will be any switch from desktop pages to AMP.

I'm not yet seeing any sign of the smallest new srcset images (targeted at 360w x 1.5 density) being used and pulled through the AMP cache. Probably far too soon anyway.

... And then it all paused and indeed one page dropped out of the index, with the number of indexed pages rising again slightly on the 8th. During this interval Google has apparently been rolling out a major index update. It would make sense not to add more pages during such an update.

(As of 2019-01-12 the rate is still low, but ~47% of the AMP URLs have been indexed.)

2019-01-02: AMP srcset

Following Gregable's comments on apparent AMP cache Oddities, I am now inserting an explicit srcset in autogenerated mobile (amp. and m.) body images, with a new lower bound on device size of 360 CSS pixels at 1.5x pixel density, ie 540px real. This smallest image size is eligible for the desktop srcset too. This can provide ~33% image weight savings for the smallest displays.

(As of 2019-01-20 I can see some apparent small use of these smallest 540px/270px/178px images.)

I'll see if that works once some AMP pages time out in the cache, such as in the cached version of the Ecobuild Exhibition write-up.

By some trickery I can see that my srcset is indeed honoured. So I may extend this feature to more body images (non-autogenerated), and to hero images such as for the home page carousel.
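A sketch of srcset generation with that 540px floor (360 CSS px at 1.5x density). The intermediate widths (800, 1200) and the `-NNNw` file-naming scheme are hypothetical, chosen only for illustration. Since w descriptors are used, the img tag would also need a suitable sizes attribute.

```python
MIN_CSS_WIDTH = 360   # smallest device width considered (CSS px)
MIN_DENSITY = 1.5     # pixel density assumed at that width
SMALLEST = int(MIN_CSS_WIDTH * MIN_DENSITY)  # 540 real pixels


def srcset(base: str, ext: str, full_width: int,
           steps=(SMALLEST, 800, 1200)) -> str:
    """Build a srcset from scaled copies narrower than the original."""
    widths = sorted({w for w in steps if w < full_width} | {full_width})
    parts = []
    for w in widths:
        # Hypothetical naming: original as-is, scaled copies get a -NNNw suffix.
        name = f"{base}.{ext}" if w == full_width else f"{base}-{w}w.{ext}"
        parts.append(f"{name} {w}w")
    return ", ".join(parts)
```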

2019-01-01: Happy New Ear

Old to-do lists are often interesting: what seemed important before often seems much less so now! For future amusement then, if nothing else, is a sample (not neatly formatted) from my secret-squirrel list...

In future I might move most of the live list into public view, but a snapshot still has value.

To-Do

  • Try cssgip for (esp) hero images.
  • Wrap some Q&A elements in schema.org Question and Answer.
  • Start adding blog entries by day under db.st/.src/2019/01/.DD.html and have skeletons constructed automatically with texts inserted. The description may need to be manually created, or could maybe be stitched from H2 headings until full.
  • Revisit all use of HTML5 lines for AMP pages to re-introduce missing material.
  • Add 301 redirect and other partial auto-canonicalisation for query params for m. (and amp.) also.
  • Reject any GET under /img or /out (or data) with a cache-buster ? param with 403 forbidden.
  • Add social-media buttons to AMP version somehow... Maybe with a clickable image map expanded by wrap_art.sh itself esp for AMP version? https://www.highballblog.com/2010/02/how-to-create-image-map-for-your.html
  • Remove G+ social media button.
  • Move scripts from .work/*.sh to script/ or .work/script with the former preferred (more 'open').
  • Have many of the page-header processing tools (eg to extract LIMG) stop at the first blank line to save time and avoid some accidents!
  • Consider converting some TechArticle parts to http://schema.org/HowTo and HowToSection https://twitter.com/maxxeight/status/1075259222135332864 https://api.slack.com/slash-commands
  • Stop the build for ERROR on stderr when building page insert.
  • Allow multiple LIMG/SQTN per page (pref 1200w min and 800k pixels area min) at 16x9, 4x3, 1x1, to support https://developers.google.com/search/docs/data-types/article (maybe the second and subsequent can be added at end of page to avoid the CRP, and maybe in JSON-LD to save space).
  • Include in Page Media actual hero image used as well as primary (eg in metadata) if different.
  • Improve non-descriptive link anchors such as "here" and "more".
  • Auto-extract data from mains-inlet temp .html source and create out/ graph included in the page.
  • For IMG link back to Gallery cat page, link to lite/xhtml for mobile?
  • Extend Gallery use to cover hero images including banners.
  • Tidy up use of small images from Gallery to allow desktop srcset eg for IR image of fan in MHRV-mechanical-heat-recovery-ventilation.html.
  • In footer (maybe only desktop) add link to next page (per PAGES original order) and possibly prev to make systematic review of all/most articles easier.
  • Remove Firefox-only 'immutable' from Cache-Control? https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching
  • Move bulk of OpenTRV (and archive) links to note-on-other-sites.html and clean up / split.
  • Build utility to find and eagerly pre-build all hero and body images. Makes full image rebuild separate from page build. Have the .html page depend on a report of the pre-built hero and body images for a page so that the page build itself is quick. This could also build an image sitemap (esp for body images).
  • Pre-create (and have Apache serve) pre-compressed (SVGO+zopfli/brotli?) .svggz and .svgbr files? Maybe make such versions under a/b/ to capture better compression than the originals even if scaling isn't done, at least for mobile.
  • Run AMP HTML through html-minify, but will need to keep head/body tags...
  • Make desktop article hero image and front page feature image 800w 4:3. Use picture or srcset to avoid huge download for smaller screens. Maybe drop front-page own banner to save weight...
  • Add OpenTRV pubarchive DataSets.
  • Add new direct-to-kitchen circuit to schematic (brown+black) and add colours of existing (?+green).
  • Create (in makefile?) day-per-frame (30fps, thus ~1M/s?) video of battery behaviour for expanded battery system piece, and grid-tie for saving-electricity pieces?
  • Add lossless_JPEG_compress.sh flag to allow full lossless optimisation of even tiny JPEGs.
  • Auto substitute Save-Data page requests with 'lite' page content. See http://m.earth.org.uk/note-on-site-technicals-11.html#Lite
  • ARTICLE: create appropriate http://schema.org/Comment for EOU/homepage reader testimonial and possibly letters pages (inbox).
  • ARTICLE: write up new bathroom LED fixture (2018-03).
  • ARTICLE: put up Clas Ohlson LED review.
  • ARTICLE: note that Lueco LEDs failed quickly.
  • ARTICLE: in LEDON piece link to broken bulb internal pic.
  • ARTICLE: write up m/sitemap.xml autodiscovery with lastmod and mention in m/robots.txt only, for now.
  • ARTICLE: Air quality piece incl urban wood burning eg WBS (and GBF+EB debacle).
  • ARTICLE: Add facebot log and SSH bruteforcing (~5% potential idle/sleep lost) SPAM to carbon cost article. Document facebookexternalhit SPAM attack (and FB's indifference).
  • ARTICLE: Timeline to destruction: what to do with PV etc if house pulled down.
  • ARTICLE: Create a series of short DID YOU KNOW videos for social media, wrapped in 500-word articles. (Do we embed an og:video?) See: https://rigor.com/blog/2015/12/optimizing-animated-gifs-with-html5-video for flags etc.
  • Rewrite all LED db '<img src="img/' to use protocol-relative static URL.
  • Make banner hero image full-bleed if page no wider than image (800px/640px for full/lite). Consider adding super-wide (eg 1200px) hero banner for wide pages, also full-bleed if appropriate. Note Twitter Bootstrap breakpoints of larger than 992px (medium devices) and larger than 1200px (large devices). Possibly overlay header and/or description on image for desktop. Eg see https://codepen.io/mrnathan8/pen/KwKdmO
  • Have featured article ~25% of the time be an EASYREAD article (other than already FEATUREM).
  • Extend image XxY extraction to video (eg with mediainfo).
  • Add F/T/V (Featured, Tagged, Video) etc flags to 'All' articles list in sitemap.html.
  • Add 'online' live testcase for .jpgL serving with img/TRV-1.jpg{L} and .pngL.
  • Fix/improve colour contrast for _gridCarbonIntensityGB.(x)html. See https://dequeuniversity.com/rules/axe/3.0/color-contrast
  • Fix the generated HTML for _gridCarbonIntensityGB.html to use new CSS to be more compact.
  • Put in check that smallest page (offline.html?) < 1200 bytes, home page < 10kB, icwd.
  • Possibly promote media incl video (pref the hosted media to save site bandwidth) early on relevant pages.
  • Add warning for all (non-SHORT, no-NOINDEX?) pages without pgintro to encourage creating it (and inverted pyramid) nearly everywhere!

Maybe: No Immediate Fix Required

  • Fix PNG hero generation to try to preserve transparency first.
  • Serve low-fi version of images where possible if referrer not from earth.org.uk as well as for Save-Data. Reduces cost of hotlinking. Would need Referer added to Vary.
  • See how small .jpgLL can get with jpegrescan and maybe add to end of inlineable search list on 'lite' only to get fewer separate fetches (and more stuff bundled in the first HTML fetch).
  • Create dynamic make dependencies on <IMG src="out/..." ... /> sources?
  • Add support for srcset for IMG, esp for out/* dynamic images.
  • Respond with 410 gone (or 301 redirect) to all images under /img/autogen.
  • Make site-technicals pages 'BlogPosting' to possibly get into the Google AMP results carousel. (With tag BLOG?) Though BlogPosting may offer no advantage over any Article or sub-type.

Evergreen

  • Improve non-descriptive link anchors such as "here" and "more".
  • Write more test(s) using test-page.html.
  • Move scripts from .work/*.sh to script/ or .work/script with the former preferred (more 'open').
  • Link through from appropriate Gallery pages/sections to targeted EOU pages, eg those that use the images in the Gallery section/page.
  • Rename all img/css/ minified to .min.css as upgraded.
  • Consider adjusting titles to 50--55 char (~8 words) per https://twitter.com/CyrusShepard/status/1039534537351614465
  • Add suitable (open, data) licence to each Dataset. http://creativecommons.org/licenses/by/4.0/ ?
  • Keep links fresh (eg fix broken ones) on the most popular pages (and home page and site technicals index).
  • Create new pages with all-lower-case URLs to reduce bad traffic, eg from BingBot.
  • Wrap repo links (and code snippets?) in schema.org/SoftwareSourceCode. Eg for OpenTRV pages.
  • Make low-CTR search-query target page descriptions calls for action.
  • Replace more text-align:right and text-align:center c/o ta.css
  • Try to raise most articles to readability > 50. (egrep "readability score 4" .m/*.info)
  • Consider using <abbr> to expand terms in many pages, esp non-tech.
  • Ensure that static.EOU redirects to www.EOU to avoid dup content issues.

Jumping the Shark

I may just possibly have started tinkering with the first (cssgip) item, reviving some fast and simple ImageMagick code to extract a mean colour: convert XXX.jpg +dither -colors 1 -unique-colors txt:. That hex colour code can then be dropped into a style attribute eg style="background:#efefef" in the image tag.
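For illustration, a pure-Python equivalent of that step: average the RGB values of all pixels and format the result as a style attribute. ImageMagick's single-colour quantisation should give a similar, though not necessarily identical, result. The pixels are assumed to arrive as (r, g, b) tuples from whatever image library is to hand; the function names are hypothetical.

```python
def mean_colour(pixels) -> str:
    """Arithmetic mean of (r, g, b) tuples as a '#rrggbb' hex string."""
    n = r_sum = g_sum = b_sum = 0
    for r, g, b in pixels:
        r_sum += r
        g_sum += g
        b_sum += b
        n += 1
    if n == 0:
        raise ValueError("no pixels")
    return "#{:02x}{:02x}{:02x}".format(
        round(r_sum / n), round(g_sum / n), round(b_sum / n))


def background_attr(pixels) -> str:
    """Inline style attribute for an img tag, eg style="background:#efefef"."""
    return f'style="background:{mean_colour(pixels)}"'
```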

A quick run in WebPageTest emulating a 56kbps modem connection shows that once the browser (Chrome in this case) makes space for the image, the background colour is used for undecoded areas. This will be less useful on progressive images, and more useful on typical non-progressive PNGs. It is annoying that browsers will not trust the supplied width/height attributes and reserve space before attempting to load metadata from the source image. For a typical mobile device, with (often) plenty of bandwidth but high latency, there is not much time between receiving the metadata to size the image on the page and starting to paint it. Maybe with decoding=async the gap might be more usefully filled by this background effect.

Work Storage: Make Files while the Sun Shines...

(See next work storage note.)

Parts of the site such as live stats graphs are (re)built more often when there is more energy available in the off-grid system, eg when the sun is shining brightly and the batteries are full. On the grid this would be known as "Demand Control" or "Dynamic Demand Management (DDM)", or "Dynamic Demand". This simply reduces the frequency with which some work is done when energy is scarce. Work avoided in this case may never (need to) be done at all.

Now I've added to that by relaxing the strictness with which some pages are rebuilt unless energy is abundant. Thus I store up work when energy is scarce, which is an alternative to storing the energy to build those pages. This may also mean that some intermediate builds are never done and that energy is permanently 'saved' rather than just having its use postponed. I have done this for the Gallery for many years.
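A minimal sketch of such a relaxed rebuild rule, as a decision function that a makefile recipe or wrapper script might (hypothetically) consult. How "energy is abundant" is determined (eg batteries full and sun shining) is left to the caller, and the one-day maximum lag is an arbitrary illustrative value.

```python
import os
import time

MAX_LAG_SCARCE_S = 24 * 3600  # illustrative: tolerate up to a day of staleness


def should_rebuild(target_mtime: float, newest_source_mtime: float,
                   now: float, energy_abundant: bool,
                   max_lag_s: float = MAX_LAG_SCARCE_S) -> bool:
    """Decide whether to rebuild an out-of-date target given energy availability."""
    if newest_source_mtime <= target_mtime:
        return False  # up to date: nothing to do
    if energy_abundant:
        return True   # strict, make-like behaviour when energy is plentiful
    # Scarce energy: let work accumulate; intermediate source changes
    # are then folded into one later build (or never built at all).
    return now - target_mtime > max_lag_s


def should_rebuild_files(target: str, sources: list, energy_abundant: bool) -> bool:
    """File-based wrapper; a missing target is always built."""
    if not os.path.exists(target):
        return True
    newest = max((os.path.getmtime(s) for s in sources), default=0.0)
    return should_rebuild(os.path.getmtime(target), newest,
                          time.time(), energy_abundant)
```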

There is going to need to be a lot more of this, woven into the fabric of the Web, as routine.