Earth Notes: On Website Technicals (2017/10)

Tech updates: rounded corners, mobile usability, HTTP/2 vs mobile, bad bot, UnCSS tweaks, latency, unit tests, visuals, Save-Data, lite vs mobile.

2017/10/30: Lite vs Mobile

Because the www. main site attempts to be very responsive, and the m. mobile site is all that and also a bit lighter-weight, I've changed the text in the top navigation from "mobile" to "lite" to give people more of a clue of what they're getting, whatever device type they use it from. The mobile site may still be a better fit for viewports under 640px, but that's not its only use!

2017/10/28: PNG Save-Data

I hand-crafted a few well-used .pngL file versions, such as the 'tools' hero image for this page in its multiple variants, and will continue to manually create lo-fi versions as I find pertinent images with significant use. Anything directly linked to from the mobile pages is fair game, for example.

To quickly scan for all " 200 " responses (GETs) at or over 10000 bytes to PNGs on the main site referred to from the mobile site I could use:

egrep '^www.earth.*\.png HTTP.*" 200 [1-9][0-9]{4,} "http://m\.earth\.org\.uk\/' LOGFILE
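
For anyone who prefers to do the same scan from a script, here is a minimal Python sketch that reuses the egrep pattern unchanged. The sample log lines (and their vhost-prefixed combined-log layout, IP addresses and paths) are invented for illustration; only the pattern comes from the command above.

```python
import re

# Same pattern as the egrep command, applied to one log line at a time.
BIG_PNG_FROM_MOBILE = re.compile(
    r'^www.earth.*\.png HTTP.*" 200 [1-9][0-9]{4,} "http://m\.earth\.org\.uk/'
)

def big_png_hits(lines):
    """Return log lines for 200 responses of at least 10000 bytes to PNGs
    on the main site that were referred from the mobile site."""
    return [line for line in lines if BIG_PNG_FROM_MOBILE.search(line)]

# Hypothetical log lines: the exact log format is an assumption.
SAMPLE = [
    'www.earth.org.uk 192.0.2.1 - - [28/Oct/2017:10:00:00 +0000] '
    '"GET /img/a.png HTTP/1.1" 200 12345 "http://m.earth.org.uk/" "UA"',
    'www.earth.org.uk 192.0.2.1 - - [28/Oct/2017:10:00:01 +0000] '
    '"GET /img/b.png HTTP/1.1" 200 9999 "http://m.earth.org.uk/" "UA"',
    'www.earth.org.uk 192.0.2.1 - - [28/Oct/2017:10:00:02 +0000] '
    '"GET /img/c.png HTTP/1.1" 200 20000 "http://www.earth.org.uk/" "UA"',
]

if __name__ == "__main__":
    for hit in big_png_hits(SAMPLE):
        print(hit)
```

Only the first sample line should be reported: the second is under 10000 bytes and the third was not referred from the mobile site.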

There aren't many big PNGs currently being requested, but there are a few big JPEGs. Where they are inline in a page there may be value to a visitor in creating the 'L' version.

I also added hero auto-generation for PNG files, and found a good ImageMagick convert combination to be -trim, -resize, -posterize. The order is critical; attempting to reduce colours (to reduce final file size) before resizing is not useful, for example. The -trim has to be done first so that the final target image dimensions are what are expected.

This auto-generation includes the .pngL outputs, though size reductions are typically much less dramatic than for .jpgL.

After ImageMagick convert I am using zopflipng to squeeze a final ~8% from the file size.
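
As a rough sketch of that ordering, the helper below builds (but does not run) the two command lines for one image. The function name, the temporary file name, the 640px width and the 16 posterize levels are all illustrative assumptions, not the values used on this site; only the option order and the convert-then-zopflipng sequence come from the text above.

```python
def hero_png_commands(src, dst, width_px=640, colour_levels=16):
    """Build (without running) the command lines for one hero PNG.

    width_px and colour_levels are illustrative values only.
    """
    tmp = dst + ".tmp.png"  # hypothetical intermediate file name
    convert = [
        "convert", src,
        "-trim",                            # 1: crop borders first
        "-resize", f"{width_px}x",          # 2: then scale to target width
        "-posterize", str(colour_levels),   # 3: only then reduce colours
        tmp,
    ]
    # zopflipng takes input then output; -y overwrites without asking.
    squeeze = ["zopflipng", "-y", tmp, dst]
    return [convert, squeeze]

if __name__ == "__main__":
    for argv in hero_png_commands("hero.png", "hero-mobile.png"):
        print(" ".join(argv))
```

Each list could be handed to subprocess.run(argv, check=True) on a machine with ImageMagick and zopflipng installed.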

Picture

There's now a picture wrapper for desktop hero banners to select the mobile version of the image for small screens to save significant page weight, and be even more responsive.

Not all browsers will support this, such as apparently my Opera Mini, but the fallback to the desktop image should be reliable.
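
A minimal sketch of how a page-building script might emit such a wrapper is below. The function name, the file names and the 640px breakpoint are assumptions for illustration; the real wrapper scripts may well differ. A browser that ignores picture and source should still use the plain img, which is the fallback described above.

```python
from html import escape

def hero_picture(desktop_src, mobile_src, alt, max_mobile_width_px=640):
    """Return a picture wrapper that offers the mobile image to small screens.

    The breakpoint default is an assumed value, not taken from the site.
    """
    return (
        "<picture>"
        f'<source media="(max-width:{max_mobile_width_px}px)" '
        f'srcset="{escape(mobile_src, quote=True)}">'
        # The img is the fallback for browsers without picture support.
        f'<img src="{escape(desktop_src, quote=True)}" '
        f'alt="{escape(alt, quote=True)}">'
        "</picture>"
    )

if __name__ == "__main__":
    # Hypothetical file names.
    print(hero_picture("img/hero-800.jpg", "img/hero-640.jpg", "tools"))
```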

2017/10/25: Save-Data Brainwave

I've noticed that my mobile's Opera Mini browser is using the Save-Data request header, and I've installed the Data Saver Chrome Plugin which should do the same, though I've not seen evidence of it working yet. So I have some real-life easy ways to test changes involving the header in any case: I can eat my own dog-food.

In the same way that I have Apache send pre-compressed HTML .htmlgz pages where available and where an appropriate Accept-Encoding request header is present, it occurred to me that I could swap in the mobile version of a page with Save-Data, though that would not quite fully work right now for minor technical reasons, but also would stop users getting to the richer 'desktop' pages explicitly, even if they wanted to.

But maybe I could do the same with selective (lower-quality, fewer bytes, identical subject and dimensions) .jpgL files alongside key original .jpg image files, having Apache serve the .jpgL where available and where the Save-Data request header is present. This could be particularly easy to arrange for the autogenerated 'hero' images which are mainly for decoration. Combining that with more intensive use of srcset, again especially for the hero images, could trim back page weight significantly without any extra JavaScript, etc, in my completely static pages. (Save-Data would have to be added to the Vary header.)

Apache mod_PageSpeed can do something very like this dynamically; my proposal is entirely static for best response latency and throughput.

I think it would also be good to have a hero banner width specifically to support the very common 320px viewport, again to reduce download size in common cases. The desktop page might have a srcset roster of (800px, 640px, 360px), and the mobile only (640px, 360px), with each of those potentially able to drop back to a reduced byte weight for Save-Data requests. All responsive design, all invisible and fairly simple and robust, with decent default behaviour.

I would need to be slightly careful about adding too much HTML overhead before the real body content, and one thing in favour of the .jpgL method is that it adds no HTML at all.

I may also use something slightly more subtle than .jpgL, even though those images are really not meant to be used directly, such as (say) .JPEG differentiated on spelling and case.

See also: New Save-Data HTTP header tells websites to reduce their data usage.

Tinkering

Here is what a couple of the initial results look like (with the lo-fi version renamed to defeat any substitution on the fly by the HTTP server):

Here are some new lines for Apache's mime.conf:

AddType image/jpeg .jpgL
AddType image/png .pngL

In the site definition I need to tell clients/caches to allow image responses to Vary with the Save-Data header, as well as send out the Lo-fi version of the image if it exists and the browser is setting Save-Data:

# Ensure caches handle Save-Data correctly.
<FilesMatch "\.(jpg|png)$">
  Header append Vary "Save-Data"
</FilesMatch>
# If client has Save-Data header set (to "on", initially).
RewriteCond %{HTTP:Save-Data} on
# ... and if the lo-fi image exists...
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME}L -s
# ... then send .XXXL content instead of .XXX.
RewriteRule ^/(.+)\.(jpg|png)$ /$1.$2L [L]
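
To make the decision logic of those rewrite lines easier to reason about (and to test off-line), here is a small Python model of it. This is only a sketch of what the rules are intended to do, not of how Apache evaluates them internally; the function name and the set-of-paths stand-in for the filesystem are assumptions.

```python
import re

# Mirrors the RewriteRule pattern: only .jpg and .png URLs are candidates.
IMAGE_URL = re.compile(r"^/(.+)\.(jpg|png)$")

def choose_image(url_path, headers, nonempty_files):
    """Return the path that would be served for url_path.

    headers: dict of request headers (looked up here with the exact
             name "Save-Data"; real HTTP header names are case-insensitive).
    nonempty_files: set of site-relative paths that exist with size > 0,
             standing in for the '-s' file test.
    """
    lofi = url_path + "L"
    # RewriteCond patterns are unanchored regexes, so 'on' is a substring test.
    wants_saving = "on" in headers.get("Save-Data", "")
    if IMAGE_URL.match(url_path) and wants_saving and lofi in nonempty_files:
        return lofi
    return url_path

if __name__ == "__main__":
    files = {"/img/hero.jpgL"}  # hypothetical lo-fi file
    print(choose_image("/img/hero.jpg", {"Save-Data": "on"}, files))
    print(choose_image("/img/hero.jpg", {}, files))
```

The useful property is the default: without the header, or without a lo-fi file, the original image is served untouched.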

I have manually created the .jpgL version of one of the key (manually-created) site images as a canary to test that the system is working. (It is possible to manually create any such required images.) Note that I am not yet doing anything with PNGs, though support is there.

I have set up the auto-generated .jpgL images to be about half the weight (ie file size) of the .jpg files, and since images are often the bulk of the entire loaded page weight, that makes a significant difference to total bandwidth used.

So, for example, setting Save-Data knocks a couple of seconds off the time to load the mobile home page with the slowest (dial-up) settings that I can easily set in WebPagetest.

And it's live! Just 8 non-comment lines of Apache config and no changes at all to my HTML. I did have to get my scripts to autogenerate new lo-fi versions in tandem with the existing ones, but that was not very hard. I can see my desktop Chrome and my mobile's Opera Mini correctly pulling the normal or lo-fi images depending on whether they have data saving enabled. Hurrah!

2017/10/24: More Chrome Dev Summit

Watching the second day...

The Save-Data HTTP header and other indications of effective bandwidth and memory constraints may suggest pushing a visitor to the mobile/"lite" site or even replacing a few images (etc) on the fly in Apache with rewrites.

Given that progressive JPEGs are significantly harder work to decode on mobile, I have switched all for-mobile autogenerated JPEGs to be non-progressive for this site. (I regenerated the images and tested the effect in WebPagetest while watching the livestream!)

The priority-hints proposal (a group attribute on img and script tags) looks interesting, as does the async img proposal for non-critical material such as a hero image for example.

GSC "Time spent downloading a page"

For the mobile site it's now pretty close to 200ms and fairly flat, with a mean download size of ~4--8kB. For www the mean is hovering at more like 260ms with a mean download size of ~20--40kB, which implies an effective bandwidth of 500kB/s or ~5Mbps raw.
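
One plausible way to arrive at a figure of that order is to treat the ~200ms as a fixed latency floor and divide the extra bytes by the extra time. The sketch below does that arithmetic for the low and high ends of the quoted size ranges; pairing low with low and high with high is my assumption, as is the idea that the floor cancels out.

```python
def effective_bandwidth_kB_per_s(size_a_kB, time_a_ms, size_b_kB, time_b_ms):
    """Bandwidth implied by the extra bytes taking the extra time."""
    extra_kB = size_b_kB - size_a_kB
    extra_s = (time_b_ms - time_a_ms) / 1000.0
    return extra_kB / extra_s

# Mobile: ~4--8kB in ~200ms; www: ~20--40kB in ~260ms.
low = effective_bandwidth_kB_per_s(4, 200, 20, 260)
high = effective_bandwidth_kB_per_s(8, 200, 40, 260)

if __name__ == "__main__":
    print(f"{low:.0f} to {high:.0f} kB/s")
    print(f"{low * 8 / 1000:.1f} to {high * 8 / 1000:.1f} Mbit/s of payload")
```

On these assumptions the estimate lands somewhere between a quarter and a half of a megabyte per second of payload, ie a few megabits per second before protocol overhead.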

2017/10/23: Service Workers

Watching the Chrome Dev Summit 2017 - Livestream (Day 1) suggests that even for a plain HTML site such as this it would be good to help pre-cache and auto-refresh some assets, even to the point of keeping the site usable when off-line, eg on the Tube!

Service workers don't work in all browsers, and need https, but as long as they are only enhancing a site rather than critical to it, and are not too heavyweight, this can be all gravy.

The improvement could be something as simple and lightweight as the Network or Cache service worker example, ensuring that a fall-back cache is held for a fixed set of pages and other key small assets (such as the home page and all the updated and popular pages linked from it) when the network becomes a notwork.

Also it would be possible to show a list of available cached pages when offline on a cache miss, or from the home page for example.
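
The fallback behaviour itself is simple enough to model outside the browser. The sketch below is plain Python rather than real service worker code (which would be JavaScript handling fetch events against the browser's cache), and all the names in it are invented: it tries the network first, drops back to a fixed pre-filled cache, and on a miss reports which pages could be offered instead.

```python
class CacheMiss(Exception):
    """Raised when the network is down and the page was never cached."""

    def __init__(self, url, available):
        super().__init__(url)
        self.available = available  # cached pages that could be listed

def network_or_cache(url, fetch, cache):
    """Return fetch(url), or the cached copy if the network fails.

    fetch: callable raising OSError when the network is unavailable.
    cache: dict of url -> body for a fixed set of key pages.
    """
    try:
        return fetch(url)
    except OSError:
        if url in cache:
            return cache[url]
        raise CacheMiss(url, sorted(cache)) from None

if __name__ == "__main__":
    def notwork(url):
        raise OSError("no network")

    cache = {"/": "cached home page"}  # hypothetical pre-cached content
    print(network_or_cache("/", notwork, cache))
```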

See also "Making a Simple Site Work Offline with ServiceWorker" and particularly the CodePen version. Notice that the service worker has to be in the root, but is invoked in this case via js/global.js.

I may play with some of this on my fossil site, which is served over both http and https via Cloudflare, where I can test that nothing breaks with http and that going off-line doesn't stop the site from working when using https. The code looks simple and small to achieve this, and can be created manually.

I have tried a couple of times now to make a basic worker setup with Workbox, but it seems to copy over or generate 200kB+ of dense JS which would outweigh many many pages on most of my sites... Maybe I'm missing something. Maybe I can use a subset of functionality, and only trigger any extra download or activity when the user goes to a "make this site work off-line" page which then unleashes the JS to register the worker, etc.

What is Fast?

In the livestream, the Googlers seem to be regarding a sub-1s first meaningful paint as 'fast', but with a few extra seconds allowed on first visit (2--5s total to interactive). I aim for ~1s even on first visit.

2017/10/15: Latency

Although I am able to serve precompressed HTML pages locally to my laptop in as little as 4ms or so, even with a simple tool such as curl (see ttfb.sh) I'm occasionally getting huge spikes up to 200ms+. Eeek.

% sh script/ttfb.sh http://m.earth.org.uk/
0.039382
% sh script/ttfb.sh http://m.earth.org.uk/
0.011480
% sh script/ttfb.sh http://m.earth.org.uk/
0.011241
% sh script/ttfb.sh http://m.earth.org.uk/
0.036796
% sh script/ttfb.sh http://m.earth.org.uk/
0.012767
% sh script/ttfb.sh http://m.earth.org.uk/
0.121798
% sh script/ttfb.sh http://m.earth.org.uk/
0.009894
% sh script/ttfb.sh http://m.earth.org.uk/
0.010384
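
To put numbers on how spiky that is, here is a quick summary of the eight samples above. Treating anything over twice the median as a 'spike' is an arbitrary threshold chosen just for this illustration.

```python
import statistics

# The eight TTFB samples above, in seconds.
SAMPLES_S = [0.039382, 0.011480, 0.011241, 0.036796,
             0.012767, 0.121798, 0.009894, 0.010384]

median_s = statistics.median(SAMPLES_S)
worst_s = max(SAMPLES_S)
# Arbitrary definition of a spike: more than twice the median.
spikes_s = [s for s in SAMPLES_S if s > 2 * median_s]

if __name__ == "__main__":
    print(f"median {median_s * 1000:.1f}ms, worst {worst_s * 1000:.0f}ms, "
          f"{len(spikes_s)} of {len(SAMPLES_S)} over 2x median")
```

So the typical case in this run is around 12ms, but three of the eight requests took several times that.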

I don't believe that this is anything particularly to do with my Apache configuration now, since (when I am testing) there should be threads free and waiting.

I have installed NGINX to have a little play, and to understand how it works. It is reportedly faster and more memory efficient than Apache.

One aspect that I noticed while looking at configuration, is that NGINX allows log files to be written buffered (and even gzipped) to reduce filesystem activity.

I suspect that my latency spikes arise from contending with other filesystem traffic, so I could aim (for example) to cache key critical path HTML content in memory with Apache and/or tweak how it writes logs.

If log file activity is any part of the problem then I can enable server wide BufferedLogs for Apache, though for 2.2 it is still 'experimental' and would (for example) prevent me viewing activity in real time.

If contending with other filesystem traffic is an issue then there is mod_mem_cache which could be set up to hold (say) up to 100kB of objects up to 14600 bytes each (ie that could be sent in the initcwnd). Some limitations are that this cache is worker process based (so may be held in multiple copies and continually discarded, so with a low hit rate too), it is shadowing and likely duplicating any filesystem cacheing done by the OS, and there is no obvious way to keep this cache for just the key HTML pages let alone just the precompressed ones. For the mobile site (this directive appears to be per-site) almost all objects are reasonable candidates for caching however.

To enable cacheing generally (though the default setup parameters are system-wide and not really appropriate):

a2enmod cache
a2enmod mem_cache
/etc/init.d/apache2 restart

With that global setting there is no visible improvement, nor even with a specific config block for the mobile site instead:

# Set up a small in-memory cache for objects that will fit initcwnd.
<IfModule mod_mem_cache.c>
CacheEnable mem /
MCacheMaxObjectSize 14600
MCacheSize 100
</IfModule>

So I have disabled mem_cache again for now.

The latency seems much more consistent (and at ~14ms) when tested locally (on the server), possibly because this way the CPU will likely be spun up to max by running the script before Apache is invoked, or possibly because my router (and WiFi) is introducing jitter.

% sh script/ttfb.sh http://m.earth.org.uk/
0.013
% sh script/ttfb.sh http://m.earth.org.uk/
0.014
% sh script/ttfb.sh http://m.earth.org.uk/
0.015
% sh script/ttfb.sh http://m.earth.org.uk/
0.014
% sh script/ttfb.sh http://m.earth.org.uk/
0.016
% sh script/ttfb.sh http://m.earth.org.uk/
0.014
% sh script/ttfb.sh http://m.earth.org.uk/
0.015
% sh script/ttfb.sh http://m.earth.org.uk/
0.014

Unit Tests

I've introduced cleancss to further minify CSS after the uncss (etc) step, to trim some remaining redundancy, from newlines to adjacent residual rules for the same selector. I have thus squeezed out a few more bytes from the header.

However, there is quite a narrow path to tread to minify more, but not to remove a few pieces that need to remain, such as fallbacks for old browsers. To this end I have wrapped up cleancss in a script with the right properties, and created a makefile for unit testcases to ensure that the minification does strike the required balance.
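
The real testcases live in a makefile around the cleancss wrapper script, but the kind of property being checked can be sketched in a few lines of Python. Everything here is hypothetical: the function name, the sample CSS, and the particular fallback declaration are stand-ins for whatever the actual tests protect.

```python
def minification_problems(minified_css, required_fragments):
    """Return a list of problems found in already-minified CSS.

    required_fragments: pieces (eg old-browser fallbacks) that must
    survive minification verbatim.
    """
    problems = []
    if minified_css.endswith("\n"):
        problems.append("trailing newline")
    if "\n\n" in minified_css:
        problems.append("blank line")
    for fragment in required_fragments:
        if fragment not in minified_css:
            problems.append("missing: " + fragment)
    return problems

if __name__ == "__main__":
    # Hypothetical minifier output with a colour fallback kept in place.
    css = ".h{background:#fff;background:rgba(255,255,255,.9)}"
    print(minification_problems(css, ["background:#fff;"]) or "OK")
```

A check like this fails loudly if a more aggressive minifier setting starts folding away a fallback that needs to stay.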

Tests for other delicate/fragile functionality can be added to the new makefile in due course.

Aesthetics

Amongst all this fun I made some minor design changes, such as having full-width hero banners narrower than a full container width (ie up to desktop 800px width) stretch to the container width to look consistent with other elements. This includes all mobile hero banners, so I bumped up the ceiling size of auto-generated mobile images from 5kB to 7.5kB to look better when they are stretched. (And 2.5kB on an image already being streamed, and still within the initcwnd, is not a big deal.)

Column/carousel headers and header floats are now selected for a nearer 1:1 aspect ratio, and for not being hugely over-width, for a more consistent appearance, and to avoid inefficient use of bandwidth.

2017/10/13: uncss Tweaks

Discussing "With -n avoid injecting blank lines (and unnecessary newlines)" with the uncss maintainers I have resolved to avoid leaving any trailing newline(s) in my minified CSS files (which is easy now that I have cleancss available on the command line, so no more manual cut-n-paste), and they may be willing to tweak uncss not to inject a newline after processing each CSS fragment when -n is used.

The first of those (my bit) means that I have been able to remove an extra pipeline process that I was using to remove blank lines.

(The uncss maintainers want to avoid scope creep and starting to actively strip out whitespace, which is reasonable.)

The second will save a few more bytes in the header (and thus the critical rendering path) without extra minification work, if it happens.

2017/10/10: Bad Bot

I wasn't quite believing my logs-derived stats, and in particular they claimed that I was seeing much less mobile traffic from real humans than (say) Google claims to send my way from searches alone.

There's one inexplicable type of activity where a varying IP, usually somewhere in China, though occasionally elsewhere in Asia, downloads the same legitimate page over and over at a moderate rate. The User-Agent is not recognisably a bot, ie looks like a human's browser. The stats' computed mean page download size seemed to be too close to that page's size (with gzip Content-Encoding) as if it might be dominating the calculation.

I wrote a simple filter in the stats to ignore most of the effect of that traffic, and count it as 'bot' traffic, and my bot traffic percentage jumped from about 50% to about 75%, ie that one bot is responsible for 25% of all my traffic! Bah! What a waste. And presumably trying to break in. I may be able to add other defences in due course.
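
The effect of such a reclassification on the headline percentage is easy to illustrate. The sketch below is not the actual filter: the function, the page name and the made-up hit counts are all invented, with the counts chosen only so that the before and after shares mirror the 50% and 75% figures above.

```python
def bot_share(hits, suspect_path=None):
    """Fraction of hits counted as bot traffic.

    hits: iterable of (path, is_known_bot) pairs.
    suspect_path: a page whose hits are all treated as bot traffic,
                  even when the User-Agent looks human.
    """
    hits = list(hits)
    bots = sum(1 for path, known in hits if known or path == suspect_path)
    return bots / len(hits)

# Invented data: 50 known-bot hits, 25 hits on the hammered page
# with human-looking User-Agents, and 25 ordinary human hits.
HITS = ([("/a.html", True)] * 50
        + [("/hammered-page.html", False)] * 25
        + [("/b.html", False)] * 25)

if __name__ == "__main__":
    print(bot_share(HITS), bot_share(HITS, "/hammered-page.html"))
```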

Now mean www and m page transfer size (eg including HTTP header overhead) show as 10745 and 13913 bytes respectively, both within the TCP initcwnd. So able to be delivered to the browser in the first volley without waiting for an ACK, which is good.
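
As a quick check of the headroom, assuming an initcwnd of 10 segments and a 1460-byte MSS (which gives the 14600-byte figure used elsewhere in these notes; other stacks and paths may differ):

```python
MSS_BYTES = 1460          # assumed typical Ethernet-path MSS
INITCWND_SEGMENTS = 10    # assumed initial congestion window
INITCWND_BYTES = MSS_BYTES * INITCWND_SEGMENTS  # 14600

# Mean page transfer sizes quoted above, in bytes.
MEAN_TRANSFER = {"www": 10745, "m": 13913}

headroom = {site: INITCWND_BYTES - size
            for site, size in MEAN_TRANSFER.items()}

if __name__ == "__main__":
    for site, spare in headroom.items():
        print(f"{site}: {spare} bytes to spare in the first volley")
```

Both means fit, though the second has only a few hundred bytes of slack.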

Also, mobile page downloads now seem to be about 13% of www, which is lower than I'd like, but closer to what other data sources suggest.

2017/10/09: Mobile vs HTTP/2 Dilemma

It's going to be a while before I can easily update this site to HTTP/2, since basically it requires a complete OS upgrade to support the underlying TLS that HTTP/2 in practice requires.

But articles such as "Mobile Networks May Put HTTP2 On Hold" and "HTTP/2 Performance in Cellular Networks [PDF]" strongly suggest that HTTP/2's placing of all eggs in one TCP connection basket, coupled with the lossy nature of cellular/mobile, the loss of HTTP/1.1's multiple connections' initial connection windows (initcwnd), and maybe how TCP deals with loss (still largely assuming it to indicate congestion), makes it perform worse than venerable HTTP/1.1 for mobile (especially unencrypted). TLS adds lots of latency that HTTP/2 doesn't fully undo, and that's what really matters for mobile performance it seems: "More Bandwidth Doesn’t Matter (much)".

My current thinking: when I introduce HTTPS I will make it the canonical form for desktop, but have plain old plain text HTTP for mobile as the preferred alternate form. (I'll support both for both, without redirects or pinning, to let users choose; search engines can exercise some policy too.)

In the future, improved TCP congestion and loss management algorithms such as "BBR, the new kid on the TCP block" may help ease the difference in favour of HTTP/2.

2017/10/08: Mobile Usability

I just had a couple more pages show up in the GSC "Mobile Usability" report, dated 2017/10/09 (well, "10/09/17" in what I consider a particularly perverse display format that I am unable to adjust) even though I pre-emptively 'fixed' them a week or more ago and verified them to be 'mobile friendly' with Google's tool then and again now. Very odd that it should take 4 weeks to propagate a stale notification to the GSC UI.

Incidentally, on the mobile site, there seems to be an effective floor of 200ms on "Time spent downloading a page" in GSC with occasional dips below that possibly from G then crawling from (say) Europe rather than the US west coast. Very low download times only seem to be reported when there are very few items downloaded that day. (The average is currently running at 239ms, minimum 86ms.)

I'm now marking the (already-end-of-body) Share42 JavaScript async to try to lower its priority (from medium to low in Chrome) and to ensure that nothing at all blocks waiting for it. With async the script does seem to get loaded later, and some images sooner, but nothing totally clear, and I fear that this may break something subtle for some browsers, eg by introducing a timing race.

I've also significantly pushed up the number of threads that Apache (worker) has waiting to service requests (MinSpareThreads, MaxSpareThreads, plus MaxClients), for when browsers open many concurrent connections, possibly amplified by sharding. This seems to be dealing with cases where one or two connections seemed to be blocked for hundreds of milliseconds, delaying final rendering. For some reason this effect seemed more pronounced on slow/mobile connections, which has been puzzling me. I suspect that although the connections were quickly accepted by Apache, it was not able to send content on some of them until others had completed. For example for Pingdom, testing from Australia, the site moves from the top half of all sites to the top 20%. This may also kill some of the spikes in download time seen by GSC. There is more to life than just TTFB to optimise for! I will have to keep an eye on memory use on my little RPi though.

2017/10/01: Rounded Corners

Time to visit the dark side again. Long after the fashion has been and gone, no doubt, I feel that it is time to knock the sharp corners off the hero images in the column display on the front page at least. (Indeed, after a bit of fiddling, I've applied this as a house image style to 'soften' all floats too.)

So I have applied a small amount of border-radius magic, which involved a new CSS class in the base CSS support, and some tweaking of the wrapper scripts. Yes, it's a little smoother, at very little increased page weight, maybe ~10s of bytes after compression, given the removal of most unused CSS by static analysis per-page.