Earth Notes: RSS Podcast Efficiency
Updated 2024-05-01.
Abstract
Keywords
RSS, podcasting, efficiency, climate, skipHours, alternateEnclosure, Cache-Control
Introduction
IN PROGRESS
Working Notes
This describes work in progress.
Note that podping is not in scope for this work as it introduces a central service dependency and may simply hide poor behaviour further upstream.
2024-04: size of the problem
For the EOU Web (off-grid, RPi) server hosting a mixture of static sites including EOU, over 7 days 25,881,279,225 bytes (~26GB) (sum of column 11 in the logs) were served over 301,193 requests (eg GET and HEAD) ie log lines.
Filtering for requests for /rss/podcast.rss gives 134,263,853 bytes (~134MB, ~0.5%) over 8,618 requests (~2.9%).
The traffic to all of EOU in this interval is 8,927,622,485 bytes (~9GB) over 115,247 requests, so /rss/podcast.rss is ~7.5% of EOU hits, ~1.5% of EOU bytes.
Note that this podcast RSS file does not contain the body text of articles nor audio/video content, only summaries and links. Some RSS feed files (not at EOU) contain the full text for their entries.
134MB per week or ~600MB per month (and ~7.5% of all EOU server requests) to check for new entries in the RSS feed, which emerge less than once per month on average, is excessive. And this feed has a very small number of readers, including only a very small number of direct clients polling, eg from browser RSS readers or mobile phone podcast players.
This represents a waste of CPU and bandwidth, and thus energy, for all participants, and of battery life for mobile clients.
Given that the system is not run on entirely zero-carbon energy, this in turn will be hurting the climate.
Ofcom: Audio listening in the UK notes that:
A fifth of adults listen to podcasts each week, with reach higher among the under 35s and those in higher socioeconomic groups. Those who do listen to podcasts listen to an average of five per week. [ofcom2024listening]
Which implies that a daily poll/update of each podcast feed might be a good default, rather than many times per hour!
A live view of RSS podcast hits and bytes as a fraction of EOU site desktop traffic is available. One aim is to keep these values below ~4.5% of hits and ~1% of bytes, as seen after some defensive measures were put in place, even if the number of podcast listeners goes up.
(See some of the scripting tools that I am using to extract and present data.)
Stats
/rss/podcast.rss feed file by access count and total bytes 2024-03-24 to 2024-04-01 (06:25Z), plus 'ALL' total.
Count Bytes User-Agent ("-" means none, ALL is total)
8618 134263853 ALL
2769 30608880 "Amazon Music Podcast"
1458 39332327 "iTMS"
653 6895886 "Podbean/FeedUpdate 2.1"
437 8646182 "-"
254 2713382 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
/rss/podcast.rss feed file by total bytes and access count 2024-03-24 to 2024-04-01 (06:25Z), plus 'ALL' total.
Count Bytes User-Agent ("-" means none, ALL is total)
8618 134263853 ALL
1458 39332327 "iTMS"
2769 30608880 "Amazon Music Podcast"
437 8646182 "-"
653 6895886 "Podbean/FeedUpdate 2.1"
100 4235406 "Podchaser (https://www.podchaser.com)"
So Apple and Amazon are clearly dominant in terms of traffic, and probably no one wants to complain too much because of their dominance in the market.
iTMS appears to be overwhelmingly Apple (Apple also has an itms agent), with a handful of hits from a feed validator.
The anonymous (no User-Agent) traffic bears examination too.
Podbean appears to make about one request a day from each of tens of instances located in data centres (ie these are not end-user podcast player requests).
Podchaser appears high in the by-bytes list because, like iTMS, it does not accept compression and thus uses ~8x more bandwidth per fetch than a client that does.
/rss/podcast.rss feed file per hour UTC by access count and total bytes 2024-03-24 to 2024-04-01 (06:25Z).
Count Bytes Hour UTC
303 4573230 00
340 5748777 01
328 5203708 02
336 5664193 03
354 5792703 04
349 6714477 05
330 5144024 06
338 5197583 07
331 5551755 08
316 4765563 09
345 5205566 10
348 5347440 11
435 6084557 12
345 5260004 13
393 5699269 14
395 5937681 15
370 5690353 16
404 7035478 17
437 6078302 18
444 6730083 19
340 5415327 20
389 5864647 21
335 4920618 22
313 4638515 23
Note that for this interval requests are fairly evenly spread over 24h, with a little more traffic in UK day and evening.
Looking at logs (before more aggressive 406/429 defences were raised), the top-35 RSS feed bad boys/bots are:
/rss/podcast.rss feed file by User-Agent by access count and total bytes 2024-04-07 to 2024-04-15 (06:25Z) and ALL.
Count Bytes User-Agent ("-" means none, ALL is total)
9643 175368483 ALL
2806 34128111 "Amazon Music Podcast"
2401 73456937 "iTMS"
542 6380012 "Podbean/FeedUpdate 2.1"
483 8501504 "-"
360 4300947 "Mozilla/5.0 (Linux;) AppleWebKit/ Chrome/ Safari - iHeartRadio"
250 3799126 "itms"
242 2815836 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
196 192996 "FeedBurner/1.0 (http://www.FeedBurner.com)"
192 2258465 "fyyd-poll-1/0.5"
190 2600966 "Overcast/1.0 Podcast Sync (3 subscribers; feed-id=XXXX; +http://overcast.fm/)"
141 1471059 "PocketCasts/1.0 (Pocket Casts Feed Parser; +http://pocketcasts.com/)"
123 1385317 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:124.0) Gecko/20100101 Firefox/124.0"
110 1642327 "NRCAudioIndexer/1.1"
103 1566098 "gPodder/3.11.1 (+http://gpodder.org/) Linux"
98 1150676 "CastFeedValidator/3.6.1 (https://castfeedvalidator.com)"
97 1155327 "axios/1.5.1"
90 1359564 "TPA/1.0.0"
88 1043874 "iVoox Global Podcasting Service"
77 966741 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
65 991562 "PodcastRepublic/18.0"
64 5675347 "deezer/curl-3.0"
62 618821 "Aggrivator (PodcastIndex.org)/v0.1.7"
57 666816 "TuneIn-Podcast-Checker"
53 815776 "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
48 729556 "Podcasts/1555.2.1 CFNetwork/1237 Darwin/20.4.0"
48 493782 "Wget/1.21.3"
44 669960 "ListenNotes/3.0 (id=XXXX; +https://www.listennotes.com/about/)"
35 433865 "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.66 Safari/537.36"
32 1437205 "Podchaser (https://www.podchaser.com)"
31 340478 "SpaceCowboys Android RSS Reader / 2.6.21(306)"
28 209109 "AntennaPod/3.3.2"
20 211622 "okhttp/4.9.3"
19 227454 "Mozilla/5.0 (compatible; MuckRack/1.0; +https://muckrack.com)"
19 223363 "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"
Note Podnews RSS Stats: for 2024-04-15:
??? Unknown 7,101 every 0 minutes; Zapier 2,283 every 1 minutes; Google Podcasts and Search 2,199 every 1 minutes; NetNewsWire 616 every 2 minutes; PodcastAddict 609 every 2 minutes; Reeder 422 every 3 minutes; Amazon Music Podcasts 345 every 4 minutes; Overcast 288 every 5 minutes; iHeartRadio 248 every 6 minutes; FreshRSS 233 every 6 minutes; AntennaPod 230 every 6 minutes; ...
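Per-User-Agent counts such as those in the tables above can be produced along the following lines. This is a minimal sketch in Python, not the scripting actually used for this page. It assumes log lines trimmed to the form of the samples shown on this page (timestamp, request, status, bytes, referrer, User-Agent), whereas the real server logs have more columns. The names LOG_RE and tally_by_user_agent are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed trimmed log format, as in the samples on this page:
# [timestamp] "METHOD path PROTOCOL" status bytes "referrer" "user-agent"
LOG_RE = re.compile(
    r'^\[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) "[^"]*" "(?P<ua>[^"]*)"$'
)


def tally_by_user_agent(lines, path="/rss/podcast.rss"):
    """Return {user_agent: [hits, bytes]} for requests to path, plus 'ALL'."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m is None or m.group("path") != path:
            continue  # Skip unparseable lines and requests for other URLs.
        for key in (m.group("ua"), "ALL"):
            totals[key][0] += 1
            totals[key][1] += int(m.group("bytes"))
    return dict(totals)


sample = [
    '[12/Mar/2024:05:33:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"',
    '[02/Apr/2024:19:01:14 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 412 "-" "iTMS"',
    '[02/Apr/2024:19:01:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 79283 "-" "iTMS"',
]

# Print in the same "Count Bytes User-Agent" layout, busiest first.
by_hits = sorted(tally_by_user_agent(sample).items(), key=lambda kv: -kv[1][0])
for ua, (hits, nbytes) in by_hits:
    print(hits, nbytes, f'"{ua}"' if ua != "ALL" else ua)
```

Sorting on `kv[1][1]` instead would give the by-bytes ordering.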
Interactions with Technology Providers
Various providers of pieces of the technology puzzle (eg aggregators, mobile podcast app writers) were contacted to better understand behaviour of their systems, and possibly nudge them in a good direction.
Some of the interactions are summarised below.
Linux Audit
Please also see the Linux Audit parallel work RSS is cool! Some RSS feed readers are not (yet), including:
- Slackbot
- Newsboat
- Selfoss
- Feedbin
- Tiny Tiny RSS
- Miniflux
- Nextcloud
- Feed on Feeds / SimplePie
- Feedly
LA pointed out as relevant the
Email and other content has been edited to preserve confidentiality, etc, as appropriate:
The Earth Notes Podcast RSS has been registered with Amazon Music for Podcasters. Amazon serves as an aggregator and catalogue.
On I sent Amazon (UK) podcasting an email containing:
May I ask why you are polling my podcast RSS feed every few minutes when it usually updates only every few weeks? Probably more than all other users combined... (See a sample of the log below.) Also the skipHours in the RSS and the 3h+ Cache-Control / Expires HTTP headers that I have set seem to be ignored, and there appears to be no attempt to use If-Modified-Since or If-None-Match. What am I doing wrong? RSS file start: Log sample:
Note that the Amazon requests come in from a large variety of IP addresses, with those checked being from within the
Throwing
After being prodded the US-Global support team replied:
Please note, this request goes beyond the scope of support our team offers, and therefore will take some time before we receive a response from the engineers.
The Earth Notes Podcast RSS has been registered with Apple's iTunes podcast catalogue. Apple serves as an aggregator and catalogue, and hosts the de facto canonical podcast catalogue.
Apple says at RSS feed refresh:
Apple Podcasts checks RSS feeds frequently to detect new episodes and any other metadata or artwork changes so that listeners have access to the latest as soon as possible. These changes usually display quickly, often within a few hours. You can view the time and date when each show was last refreshed from your show information pages in Apple Podcasts Connect.
On I contacted Apple via its
Is there any way that I can set your RSS fetcher to honour Cache-Control (or Expires) and/or SkipHours? Currently it does not seem to. My server is off grid and I'd prefer polling to be minimised in the hours I include (23Z to 07Z).
Done right this could save a lot of bandwidth, CPU and carbon for you and the servers that you poll.
An initial response said that
I responded with:
This could be added to the other simple technical fixes that Apple already implements to reduce carbon emissions from unnecessary CPU and bandwidth use. I note that your agent polls very frequently and often does not even use compression, ie is not compliant with even basic de facto etiquette.
Some example Apple fetches, including uncompressed
On I was provided with links to Apple Podcasts feedback, Environment, and the contact email for environment report feedback.
There is also poor behaviour like this (all from the same IP address):
AntennaPod uses conditional fetches for the RSS feed file. When set with a 12h refresh interval log entries for the feed fetch are (noting an underlying feed file change before the
Feeder is an open-source feed reader and podcast player for Android mobile devices. I noticed its user agent in the Earth Notes logs.
I asked (by logging an 'idea'):
To which the author responded:
regular http cache-control is already supported. what's skiphours?
I pointed the author at the
I added:
The author noted in the exchange that in version 2.6.20 (of )
I gave 2.6.21 a sneaky test run and the log showed:
After loading the new app version, telling it the feed URL and messing around, then forcing a
A set of ~hourly interactions for the 2.6.20 Feeder version by another user for a different feed during a period where it was unchanged:
I have seen what appears to be one other user upgrade to 2.6.21 and RSS polling traffic is tiny, even if still unconditional. So I am recommending Feeder to my podcast page visitors.
Three different clients (the last three hits are from the same client) all getting
The Earth Notes Podcast RSS has been registered with the
I emailed a suggestion:
Would it be possible to support the RSS SkipHours tag in future, and/or respect the Cache-Control/Expires/ETag headers from the fetch?
As of I have what looks like one client (one IP) downloading unconditionally every 20 minutes, which puts it on the edge of the top-10 (ie including Apple, Amazon and Spotify)!
I have not seen a
I have filed an issue asking if
I had a quick response (test URL now supplied):
Yes, we support If-Modified-Since. Please provide a test url if possible, to debug... We don't support Cache-Control or Expires at the moment, only Etag/If-Modified-Since.
It seems that 3.11.4 (unreleased) and the 3.11.1 should support
But in any case the aberrant polling stopped spontaneously! Strange...
TuneIn (RSS feed fetcher user agent
It seems to poll faster than hourly, not respecting HTTP cache control or RSS
It seems to be hosted on AWS (Amazon Web Services).
I used the contact form to ask:
RSS feed polling excessively
Is there any way that I can get your RSS fetcher to honour Cache-Control (or Expires) and/or SkipHours? Currently it does not seem to, and is polling far more often than makes sense. I am concerned about climate impact.
After several miscommunications, including attempting to create me an account, I sent further explanation:
I am referring to how often you poll my RSS feed at https://www.earth.org.uk/rss/podcast.rss It updates with new content about monthly. You poll it about every 30 minutes, and don’t seem to pay any attention to Cache-Control, Expires, Last-Modified or ETag, nor the skipHours tag (or other update-hint tags) in the RSS feed itself, eg:
You are wasting a tremendous amount of your CPU time and bandwidth and feed providers’ (such as me), with an accompanying hit on all our bills and climate emissions.
I only have a small off-grid server which is not updating the feed overnight for example. Is there anything we can do to make this better? I note that some of the other services polling the same feed are making use of at least some of those fields and hints. Less-than-monthly according to Listen Notes
I received a response offering to extend the polling interval on my feed from 4h to 40h because
I accepted the increase to 40h, but asked:
But what do you mean by "headings are unfortunately not reliable across our directory"? HTTP cache control headers are very basic, and if you don’t trust them entirely, you can limit whatever cache life you see to (say) 1 day or even 12h, vastly reducing pointless polling traffic (and climate emissions) for many (slow) feeds.
... and my ticket was closed!
I asked anyway:
Things are looking a little better. Note that each IP address (other than for
Still not quite one poll every 40h (more like 6h!), but much better anyhow! I hope that TuneIn also at least thought about how wastefully it is polling everyone else too...
On I asked Podbean to implement any of skipHours, Expires, Cache-Control, If-Modified-Since, since their traffic was very visible in my logs. This got a response a couple of days later:
I responded that their polls (from many IPs of what appear to be datacentre-based bots, probably theirs) are far more than daily, eg in one log sample I sent them, more like ~8 per hour, and then in another, 22 in under one minute...
Podbean traffic is still very visible in my logs, now showing up at #3 by hits. So I have emailed again:
I do not know if those are many separate human clients, or all your machines in datacentres (all the IPs that I have checked are datacentre-based). But you are showing up as #3 of ALL clients polling my RSS feed. Only Amazon and iTunes' horrible implementations consume more than you.
And most of those polls are completely redundant, ie if you did If-Modified-Since conditional GETs on all but (say) at most one poll per day you’d get 304 responses consuming less of your and my and Internet resources, and heating the planet less, with no worse outcomes. And there must be a bug lurking given a Cache-Control value of never less than ~4h on the feed [successive hits from the same IP]:
So please re-consider honouring Cache-Control, use If-None-Match, and ideally also observe skipHours in the RSS file, or even do something smarter like other clients seem to, such as looking at the interval between recent updates/episodes. At the moment I am rejecting ~25% of all
Also this client has poor behaviour when batted away with a
I had a response:
Our technical team has thoroughly reviewed the requests from Podbean and confirmed that the frequency of less than 200 requests per day falls within the normal range. There are no issues with this level of activity. If you have any further questions or concerns, feel free to reach out to us anytime.
To which I replied:
200 requests per day for something that updates less than once per month is as much as 10000 (ten thousand) times too often and a 99.99% waste of resources. Even given a more typical weekly podcast update frequency this is maybe >1000x too often. I’d urge your team to reconsider for the sake of the climate and your clients’ bandwidth bill and battery life if nothing else. But in any case, there seems to be a bug around 429 that should be fixed: When asked to slow down with 429, polling *faster* is bad. There is a Retry-After header present which ideally you should be honouring, but maybe at least wait an hour or so if you want to keep the code simple.
I received an automated "did we fix your problem" email, to which I answered "no" on the grounds that none of the issues that I had reported had been addressed, and that wasting creator bandwidth and energy etc are not good.
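The behaviour asked for on a 429 can be sketched as follows. This is a minimal illustration in Python, not any provider's actual code. It honours Retry-After when the header is present and parseable (either a number of seconds or an HTTP-date), and otherwise waits an assumed default of one hour, per the "at least wait an hour or so" suggestion. The name delay_after_429 and the DEFAULT_BACKOFF constant are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed fallback wait when Retry-After is missing or unusable.
DEFAULT_BACKOFF = timedelta(hours=1)


def delay_after_429(retry_after, now, default=DEFAULT_BACKOFF):
    """Return how long to wait before polling again after a 429 response.

    retry_after is the raw Retry-After header value (or None).
    now must be a timezone-aware datetime.
    """
    if retry_after is None:
        return default
    value = retry_after.strip()
    if value.isdigit():
        # Delay given directly as a number of seconds.
        return timedelta(seconds=int(value))
    try:
        # Otherwise expect an HTTP-date such as "Tue, 30 Apr 2024 10:00:00 GMT".
        wait = parsedate_to_datetime(value) - now
    except (TypeError, ValueError):
        return default  # Unparseable header: fall back to the default.
    return max(wait, timedelta(0))  # A date in the past means no extra wait.


now = datetime(2024, 4, 30, 8, 0, tzinfo=timezone.utc)
print(delay_after_429("Tue, 30 Apr 2024 10:00:00 GMT", now))
```

The key point is only that the next poll moves later after a 429, never earlier.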
On I happened to notice an odd double request in logs, and emailed the project owner:
I noticed what may be a bug in the app looking at my logs. First a request that worked (albeit ignoring the skipHours set in the RSS feed). Then immediately following is a redundant request that I rejected because it will have been making a bad request in some different way:
I can see this duplicate request pattern at various times in my logs.
The author replied very quickly:
The app only accesses the RSS feed once, unless an error is returned in which case it will retry by chaining some headers parameters. So following a 200 response code, the app will not reconnect on its own. However, if the user for some reason presses the refresh button quickly multiple refreshes will happen again
I sent him another 06:00Z example to look at from two days earlier,
He replied that his app relies on
The author says that
openrss.org issues list.
Amazon
...
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:podcast="https://podcastindex.org/namespace/1.0" xml:lang="en-gb">
<channel>
<atom:link href="https://www.earth.org.uk/rss/podcast.rss" rel="self" type="application/rss+xml"/>
<title>Earth Notes Podcast</title>
<description>All things green and efficient @Home in the UK, cutting carbon and improving comfort.</description>
<link>https://www.earth.org.uk/SECTION_podcast.html</link>
<language>en-gb</language>
<itunes:author>Earth Notes / Damon Hart-Davis</itunes:author>
<itunes:owner><itunes:email>d@hd.org</itunes:email></itunes:owner>
<itunes:image href="https://www.earth.org.uk/img/wordcloud/podcast-1.png"/>
<itunes:category text="Education"/>
<itunes:category text="Technology"/>
<itunes:explicit>no</itunes:explicit>
<podcast:location geo="geo:51.406696,-0.288789,16">16WW, Kingston-upon-Thames, UK</podcast:location>
<ttl>367</ttl>
<skipHours><hour>0</hour><hour>1</hour><hour>2</hour><hour>3</hour><hour>4</hour><hour>5</hour><hour>6</hour><hour>7</hour></skipHours>
<item><title>2024-01-28 Diarycast - Year In Review (2023)</title><description>The rollercoaster thrills and spills of 2023 at EOU Towers... #podcast #yearInReview</description><link>https://www.earth.org.uk/diarycast-20240128.html</link><guid isPermaLink="false">img/audio/diary/20240128.mp3</guid><enclosure url="https://www.earth.org.uk/img/audio/diary/20240128.mp3" length="4859775" type="audio/mpeg"/><pubDate>Sun, 28 Jan 2024 13:51:53 GMT</pubDate><itunes:duration>271</itunes:duration></item>
[12/Mar/2024:05:33:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:35:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:41:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:46:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:47:16 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:53:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:59:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:05:59:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:05:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:11:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:12:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:17:15 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:23:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:25:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:29:28 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:35:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:38:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:41:15 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:47:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:51:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:53:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:06:59:15 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8351 "-" "Amazon Music Podcast"
[12/Mar/2024:07:04:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8350 "-" "Amazon Music Podcast"
[12/Mar/2024:07:05:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8350 "-" "Amazon Music Podcast"
[12/Mar/2024:07:11:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 8350 "-" "Amazon Music Podcast"
compute.amazonaws.com zone.
429 (Too many requests) codes at Amazon slows it down about 10-fold.
...
Apple
Podcasts for Creators portal, including the following:
...
I've received confirmation from our internal teams that we do not provide any technical support for the implementation of your requested changes.
...
GETs:
[02/Apr/2024:19:01:14 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 3599 "-" "iTMS"
[02/Apr/2024:19:01:14 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 412 "-" "iTMS"
[02/Apr/2024:19:01:14 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 79283 "-" "iTMS"
[02/Apr/2024:19:16:36 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 3599 "-" "iTMS"
[02/Apr/2024:19:16:36 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 412 "-" "iTMS"
[02/Apr/2024:19:16:37 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 79283 "-" "iTMS"
[02/Apr/2024:19:32:53 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 3599 "-" "iTMS"
[02/Apr/2024:19:32:53 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 412 "-" "iTMS"
[02/Apr/2024:19:32:53 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 79283 "-" "iTMS"
[02/Apr/2024:19:51:44 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 3599 "-" "iTMS"
[02/Apr/2024:19:51:44 +0000] "HEAD /rss/podcast.rss HTTP/1.1" 200 412 "-" "iTMS"
[02/Apr/2024:19:51:44 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 79283 "-" "iTMS"
[30/Apr/2024:08:03:57 +0000] "GET /img/audio/diary/20200726/20200722-Waterloo-station-ticket-barriers-keep-your-distance-signage-sq-1000w.jpg HTTP/1.1" 200 48396 "-" "iTMS"
[30/Apr/2024:08:03:57 +0000] "GET /img/audio/podcast-furniture/title/diarycast-1.png HTTP/1.1" 200 4180 "-" "iTMS"
[30/Apr/2024:08:03:57 +0000] "GET /img/audio/podcast-furniture/title/statscast-1.png HTTP/1.1" 200 4174 "-" "iTMS"
[30/Apr/2024:08:03:57 +0000] "GET /img/audio/podcast-furniture/title/statscast-1.png HTTP/1.1" 200 4174 "-" "iTMS"
[30/Apr/2024:08:03:57 +0000] "HEAD /img/site/podcast/20200523-Ambient-haiku.png HTTP/1.1" 200 339 "-" "iTMS"
[30/Apr/2024:08:03:58 +0000] "GET /img/audio/podcast-furniture/title/diarycast-1.png HTTP/1.1" 200 4180 "-" "iTMS"
[30/Apr/2024:08:03:58 +0000] "GET /img/audio/podcast-furniture/title/diarycast-1.png HTTP/1.1" 200 4180 "-" "iTMS"
"-" "iTMS"
[30/Apr/2024:08:03:58 +0000] "GET /img/audio/podcast-furniture/title/metacast-1.png HTTP/1.1" 200 4109 "-" "iTMS"
[30/Apr/2024:08:03:59 +0000] "GET /img/audio/podcast-furniture/title/diarycast-1.png HTTP/1.1" 200 4180 "-" "iTMS"
[30/Apr/2024:08:03:59 +0000] "GET /img/audio/podcast-furniture/title/metacast-1.png HTTP/1.1" 200 4109 "-" "iTMS"
iTMS is fetching and re-fetching the same cover art repeatedly, with no attempt to cache or de-duplicate even within one batch run. This may not be Apple's preferred use case (iTunes would prefer every episode's cover art to be distinct), but clearly no senior engineer that cares about efficiency has been given any time on this. Because everyone just has to put up with whatever Apple does? Handy that I made my images very compact.
AntennaPod
200 entry):
[01/Apr/2024:13:30:24 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11144 "-" "AntennaPod/3.2.0"
[02/Apr/2024:06:59:47 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 14443 "-" "AntennaPod/3.2.0"
[02/Apr/2024:19:07:46 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11266 "-" "AntennaPod/3.3.2"
[03/Apr/2024:08:12:00 +0000] "GET /rss/podcast.rss HTTP/1.1" 304 3443 "-" "AntennaPod/3.3.2"
[03/Apr/2024:20:12:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 14445 "-" "AntennaPod/3.3.2"
[04/Apr/2024:08:14:20 +0000] "GET /rss/podcast.rss HTTP/1.1" 304 280 "-" "AntennaPod/3.3.2"
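The 200/304 pattern in the AntennaPod log above is what a conditional GET produces: the client sends back the validators it saved from the last full fetch, and the server answers 304 with no body if nothing has changed. The following is a minimal sketch in Python using only the standard library, and is not AntennaPod's implementation. The names conditional_headers and fetch_if_changed are made up for illustration, and the caller is assumed to persist the Last-Modified and ETag values between polls.

```python
import gzip
import urllib.error
import urllib.request


def conditional_headers(last_modified=None, etag=None):
    """Build request headers from validators saved at the last 200 response."""
    headers = {"Accept-Encoding": "gzip"}  # Ask for a compressed body.
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        headers["If-None-Match"] = etag
    return headers


def fetch_if_changed(url, last_modified=None, etag=None):
    """Return (body, last_modified, etag); body is None on 304 Not Modified."""
    request = urllib.request.Request(
        url, headers=conditional_headers(last_modified, etag)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            body = response.read()
            if response.headers.get("Content-Encoding") == "gzip":
                body = gzip.decompress(body)
            return (
                body,
                response.headers.get("Last-Modified"),
                response.headers.get("ETag"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Unchanged: keep using the cached copy and the old validators.
            return None, last_modified, etag
        raise
```

A first call with no validators does a normal fetch. Later calls pass back the returned Last-Modified and ETag values so that an unchanged feed costs only a small 304 response.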
Feeder
Have you considered support for these (RSS-feed-specified SkipHours tag, and server-supplied HTTP expiry time/date) to reduce bandwidth and CPU?
:
skipHours definition in the RSS 2.0 spec [RAB2009RSS]. He noted that:
Regarding skipHours, any implementation would result in stochastic behavior for users. The feature was designed for servers which can pick when they sync, but Feeder is not in control of when its background sync runs. This is determined by Android.
A thought: maybe during skipHours you could avoid actually doing a poll when woken when there have been no non-skipHours since your last poll. The source is telling you that you will not (likely) have missed any change in that time.
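That thought can be sketched in Python as follows. This is an illustration of the idea only, not Feeder's code. It assumes that the skipHours values are hours in UTC (GMT), as in the RSS 2.0 definition, and that the client records when it last polled. The names parse_skip_hours and worth_polling are hypothetical.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET


def parse_skip_hours(rss_xml):
    """Return the set of UTC hours (0 to 23) listed in the channel's skipHours."""
    root = ET.fromstring(rss_xml)
    return {int(h.text) for h in root.findall("./channel/skipHours/hour")}


def worth_polling(last_poll, now, skip_hours):
    """True if any non-skipped UTC hour has occurred since last_poll.

    If every hour from the last poll up to now is a skip hour, the feed
    has said that it is unlikely to have changed, so the poll can be avoided.
    """
    hour_start = last_poll.replace(minute=0, second=0, microsecond=0)
    while hour_start <= now:
        if hour_start.hour not in skip_hours:
            return True
        hour_start += timedelta(hours=1)
    return False


RSS = (
    '<rss version="2.0"><channel><skipHours>'
    "<hour>0</hour><hour>1</hour><hour>2</hour><hour>3</hour>"
    "<hour>4</hour><hour>5</hour><hour>6</hour><hour>7</hour>"
    "</skipHours></channel></rss>"
)
skip = parse_skip_hours(RSS)
utc = timezone.utc
# Woken at 05:00Z having last polled at 00:30Z: nothing but skip hours since.
print(worth_polling(datetime(2024, 4, 4, 0, 30, tzinfo=utc),
                    datetime(2024, 4, 4, 5, 0, tzinfo=utc), skip))
```

This only decides whether skipHours rules out a poll. A real client would combine it with its other checks, such as the HTTP cache lifetime.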
One quirk is that Feeder will revalidate the cache if last sync is older than 15 minutes.
And in version 2.6.21 (of ) one of the fixes is Tweaked Cache-Control headers to respect site headers even more.
2.6.21
[03/Apr/2024:18:42:04 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 10965 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[03/Apr/2024:18:42:05 +0000] "GET /SECTION_podcast.html HTTP/2.0" 200 11259 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[03/Apr/2024:18:43:23 +0000] "GET /img/wordcloud/podcast-1.png HTTP/2.0" 200 71167 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[03/Apr/2024:18:43:30 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 10965 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[03/Apr/2024:18:51:32 +0000] "GET /img/site/podcast/20200523-Ambient-haiku.png HTTP/2.0" 200 90726 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[03/Apr/2024:19:12:31 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 10965 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[04/Apr/2024:06:38:51 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 10965 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[04/Apr/2024:13:49:26 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 10965 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[04/Apr/2024:18:40:12 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 10965 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
Sync feeds (), the feed was not reloaded until I picked the phone up at . Feeder is set to the default nominal 1h between refreshes of the feed. (Though Feeder then did unconditional fetches (200) which should probably have been 304 given that the feed file was unchanged since , and ideally deferred until after skipHours ie .) Good progress!
[03/Apr/2024:17:17:28 +0000] "GET /rss/note-on-site-technicals.rss HTTP/2.0" 200 7285 "-" "SpaceCowboys Android RSS Reader / 2.6.20(305)"
[03/Apr/2024:18:24:03 +0000] "GET /rss/note-on-site-technicals.rss HTTP/2.0" 200 7285 "-" "SpaceCowboys Android RSS Reader / 2.6.20(305)"
[03/Apr/2024:19:25:25 +0000] "GET /rss/note-on-site-technicals.rss HTTP/2.0" 200 7285 "-" "SpaceCowboys Android RSS Reader / 2.6.20(305)"
[03/Apr/2024:20:25:40 +0000] "GET /rss/note-on-site-technicals.rss HTTP/2.0" 200 7285 "-" "SpaceCowboys Android RSS Reader / 2.6.20(305)"
[03/Apr/2024:21:25:44 +0000] "GET /rss/note-on-site-technicals.rss HTTP/2.0" 200 7285 "-" "SpaceCowboys Android RSS Reader / 2.6.20(305)"
304s since I also turned off ETag for RSS feed files to avoid a long-standing bad interaction with mod_deflate in Apache:
[15/Apr/2024:07:02:11 +0000] "GET /rss/saving-electricity.rss HTTP/2.0" 304 93 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[15/Apr/2024:07:02:29 +0000] "GET /rss/note-on-site-technicals.rss HTTP/2.0" 304 93 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[15/Apr/2024:09:07:18 +0000] "GET /rss/podcast.rss HTTP/1.1" 304 223 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[15/Apr/2024:09:07:18 +0000] "GET /rss/saving-electricity.rss HTTP/1.1" 304 223 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
[15/Apr/2024:09:07:18 +0000] "GET /rss/note-on-site-technicals.rss HTTP/1.1" 304 223 "-" "SpaceCowboys Android RSS Reader / 2.6.21(306)"
fyyd
fyyd directory. It seems to poll unconditionally for updates hourly: no 304 codes are returned even when the feed file is not changing.
...
gPodder
[29/Apr/2024:13:24:54 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 12856 "-" "gPodder/3.11.1 (+http://gpodder.org/) Linux"
[29/Apr/2024:13:44:55 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 12856 "-" "gPodder/3.11.1 (+http://gpodder.org/) Linux"
[29/Apr/2024:14:05:12 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 12856 "-" "gPodder/3.11.1 (+http://gpodder.org/) Linux"
[29/Apr/2024:14:25:01 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 12856 "-" "gPodder/3.11.1 (+http://gpodder.org/) Linux"
304 for over a week, which may be to do with me turning off ETags:
[21/Apr/2024:08:02:49 +0000] "GET /rss/podcast.rss HTTP/1.1" 304 3473 "-" "gPodder/3.11.1 (+http://gpodder.org/) Linux"
[21/Apr/2024:08:22:37 +0000] "GET /rss/podcast.rss HTTP/1.1" 304 3473 "-" "gPodder/3.11.1 (+http://gpodder.org/) Linux"
[21/Apr/2024:08:42:33 +0000] "GET /rss/podcast.rss HTTP/1.1" 304 3473 "-" "gPodder/3.11.1 (+http://gpodder.org/) Linux"
[21/Apr/2024:09:02:34 +0000] "GET /rss/podcast.rss HTTP/1.1" 304 3473 "-" "gPodder/3.11.1 (+http://gpodder.org/) Linux"
gPodder could implement If-Modified-Since (or skipHours, or Cache-Control / Expires to avoid a premature re-poll), on the grounds that this is likely to affect more than just me.
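Avoiding a premature re-poll from Cache-Control might look something like the following. This is a minimal sketch in Python, not gPodder's code. It reads max-age only (Expires is not handled), and clamps the result between an assumed floor and cap so that a client which does not entirely trust the header can still bound its polling, along the lines of the "(say) 1 day or even 12h" limit suggested to TuneIn. The function name and the MIN_INTERVAL, MAX_INTERVAL and default values are all assumptions for illustration.

```python
import re
from datetime import timedelta

MIN_INTERVAL = timedelta(hours=1)   # Assumed floor: never poll more often.
MAX_INTERVAL = timedelta(hours=24)  # Assumed cap on any cache life seen.


def poll_interval(cache_control, default=timedelta(hours=4)):
    """Return how long to wait before the next poll, given a Cache-Control value."""
    match = re.search(r"\bmax-age\s*=\s*(\d+)", cache_control or "", re.IGNORECASE)
    wanted = timedelta(seconds=int(match.group(1))) if match else default
    # Clamp what the server asked for into the locally trusted range.
    return min(max(wanted, MIN_INTERVAL), MAX_INTERVAL)


print(poll_interval("public, max-age=14400"))
```

A header of `max-age=14400` gives a 4h interval here, a missing header gives the assumed default, and a very long lifetime is capped at one day.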
Last-Modified, and did on a test.
... the Last-Modified and Etag headers are stored in DB alongside the podcast and relevant headers are added to the query if present.
TuneIn
TuneIn-Podcast-Checker) hosts a podcast directory.
skipHours.
[03/Apr/2024:04:59:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:05:52:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:06:25:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:07:02:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
...
...
[03/Apr/2024:02:23:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:03:00:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:03:33:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:04:26:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:04:59:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:05:52:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:06:25:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:07:02:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:07:35:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:08:28:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:09:01:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:09:54:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:10:27:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:11:04:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:11:37:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:12:30:11 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:13:03:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:13:57:07 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:14:29:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:15:06:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:15:39:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11117 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:16:32:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11117 "-" "TuneIn-Podcast-Checker"
[03/Apr/2024:17:05:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11117 "-" "TuneIn-Podcast-Checker"
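To put a number on poll spacing, a small Python sketch such as the one below can compute the gaps between successive requests in a log fragment. It assumes the timestamp format seen in these logs and an English/C locale for the month abbreviation. The helper name poll_gaps_seconds is hypothetical, and the four sample lines are copied from the TuneIn log above.

```python
import re
from datetime import datetime

# Matches the bracketed timestamp used in these log fragments.
STAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def poll_gaps_seconds(log_lines):
    """Return the gaps in seconds between successive requests."""
    times = []
    for line in log_lines:
        match = STAMP.search(line)
        if match:
            times.append(
                datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
            )
    times.sort()
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


sample = [
    '[03/Apr/2024:04:59:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"',
    '[03/Apr/2024:05:52:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"',
    '[03/Apr/2024:06:25:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"',
    '[03/Apr/2024:07:02:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11114 "-" "TuneIn-Podcast-Checker"',
]

gaps = poll_gaps_seconds(sample)
mean_minutes = round(sum(gaps) / len(gaps) / 60, 1)
print(gaps, mean_minutes)  # [3164, 1995, 2205] 40.9
```

For these four lines the mean gap is about 41 minutes, ie more often than hourly.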
Update frequency: every 52 days Average audio length: 9 minutes. ... we've found that the headings are unfortunately not reliable across our directory so our system doesn’t take a look at them.
...
When can I expect to see the change to 40h polling? It is still more than hourly: see the log fragment below.
[07/Apr/2024:06:40:09 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:07:12:27 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:08:06:09 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:08:23:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:09:16:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:09:49:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:10:42:09 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:11:14:27 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:12:08:09 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:12:25:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11489 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:13:18:10 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11610 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:13:18:11 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11591 "-" "TuneInRssParser/1.0"
[07/Apr/2024:13:51:25 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11610 "-" "TuneIn-Podcast-Checker"
[07/Apr/2024:13:51:26 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11591 "-" "TuneInRssParser/1.0"
[07/Apr/2024:14:44:09 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11610 "-" "TuneIn-Podcast-Checker"
TuneInRssParser) is unique in this log fragment:
[08/Apr/2024:20:55:12 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11715 "-" "TuneIn-Podcast-Checker"
[08/Apr/2024:20:55:17 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11696 "-" "TuneInRssParser/1.0"
[09/Apr/2024:14:38:27 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11965 "-" "TuneIn-Podcast-Checker"
[09/Apr/2024:14:38:28 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11946 "-" "TuneInRssParser/1.0"
[09/Apr/2024:21:20:12 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11965 "-" "TuneIn-Podcast-Checker"
[09/Apr/2024:21:20:13 +0000] "GET /rss/podcast.rss HTTP/1.1" 200 11946 "-" "TuneInRssParser/1.0"
Podbean
... Podbean's daily requests for your feed are not frequent. Continuous requests occur because there are request failures, so we will retry this request. We will optimize this request. If a 429 error occurs again, we will increase the interval to reduce the crawling of your feed within a short period of time.
2024-04-20
...
[20/Apr/2024:11:04:50 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 11393 "-" "Podbean/FeedUpdate 2.1"
[20/Apr/2024:11:19:10 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 11389 "-" "Podbean/FeedUpdate 2.1"
Podbean/FeedUpdate 2.1 requests with a 429 during skipHours, because of its non-conditional requests, when it should not be asking anyway. 429. For example one client (one IP) below comes back every minute to retry, ignoring the several hours Retry-After response header:
[22/Apr/2024:01:04:35 +0000] "GET /rss/podcast.rss HTTP/2.0" 429 747 "-" "Podbean/FeedUpdate 2.1"
[22/Apr/2024:01:05:38 +0000] "GET /rss/podcast.rss HTTP/2.0" 429 747 "-" "Podbean/FeedUpdate 2.1"
[22/Apr/2024:01:06:40 +0000] "GET /rss/podcast.rss HTTP/2.0" 429 747 "-" "Podbean/FeedUpdate 2.1"
[22/Apr/2024:01:07:43 +0000] "GET /rss/podcast.rss HTTP/2.0" 429 747 "-" "Podbean/FeedUpdate 2.1"
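For contrast with the once-a-minute retries above, here is a hedged Python sketch of how a poller could honour Retry-After. The function names and the one-hour default interval are assumptions for illustration. The value 25620 is the Retry-After set in the Apache configuration shown later in this document.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Seconds to wait for a Retry-After value (delta-seconds or HTTP-date)."""
    value = value.strip()
    if value.isdigit():
        return int(value)
    now = now or datetime.now(timezone.utc)
    when = parsedate_to_datetime(value)
    return max(0, int((when - now).total_seconds()))


def next_poll_time(status, headers, now, default_interval=3600):
    """Earliest time at which the feed should be polled again."""
    if status == 429 and "Retry-After" in headers:
        wait = retry_after_seconds(headers["Retry-After"], now)
    else:
        wait = default_interval  # Assumed fallback, not from the source.
    return now + timedelta(seconds=wait)


# A 429 received at the time of the first log line above.
rejected_at = datetime(2024, 4, 22, 1, 4, 35, tzinfo=timezone.utc)
retry_at = next_poll_time(429, {"Retry-After": "25620"}, rejected_at)
print(retry_at.isoformat())  # 2024-04-22T08:11:35+00:00
```

Waiting the full 7h7 would take this client out of skipHours before its next attempt, rather than collecting another 429 each minute.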
Podcast Addict
...
[22/Apr/2024:06:00:05 +0000] "GET /rss/podcast.rss HTTP/2.0" 200 11373 "-" "PodcastAddict/v5 (+https://podcastaddict.com/; Android podcast app)"
[22/Apr/2024:06:00:05 +0000] "GET /rss/podcast.rss HTTP/2.0" 429 674 "-" "PodcastAddict/v5 (+https://podcastaddict.com/; Android podcast app)"
...
which makes me think that you may have a debounce issue in the GUI, though I still suspect it is more likely a glitch in your HTTP backend library...
okhttp like most Android apps, so I pointed him at the GitHub Feeder issue where Feeder and my Apache server learnt to play together better! The app is already using if modified since and behaves according to the returned value (skipping the update in case if 304).
But to get a 429 from my server the request must be missing both If-None-Match and If-Modified-Since headers (among other things), and polling within the hours forbidden in the RSS feed skipHours.
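The combination of conditions can be easier to follow as code. Below is a rough Python model of the 429 rule as it appears in the Apache configuration later in this document (the 2024-04-22 snapshot). The function and parameter names are hypothetical, and it assumes that the server clock is UTC. It is a reading aid, not the server's actual logic.

```python
def would_get_429(hour_utc, headers, grid_red=False, battery_low=False):
    """Return True if this model of the rule would reject the request."""
    # skipHours window: hours 22, 23 and 0 to 7.
    in_skip_hours = hour_utc < 8 or hour_utc > 21
    # Unconditional: neither validator header was sent.
    unconditional = (
        not headers.get("If-Modified-Since")
        and not headers.get("If-None-Match")
    )
    no_referer = not headers.get("Referer")
    no_user_agent = not headers.get("User-Agent")
    # All of the first three must hold, plus at least one aggravating factor.
    return (
        in_skip_hours
        and no_referer
        and unconditional
        and (no_user_agent or grid_red or battery_low)
    )


bot = {"User-Agent": "Podbean/FeedUpdate 2.1"}
polite = {
    "User-Agent": "ExampleReader/1.0",  # Hypothetical client.
    "If-Modified-Since": "Wed, 03 Apr 2024 15:34:48 GMT",
}

print(would_get_429(1, bot, grid_red=True))      # True
print(would_get_429(1, polite, grid_red=True))   # False
print(would_get_429(11, bot, grid_red=True))     # False
```

So a client sending either validator header, or polling outside skipHours, never trips this rule.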
Hints Dropped
In order to give remote entities polling the RSS feed file as much chance as possible to avoid polling when it is pointless, wasting CPU and bandwidth, I provide a suite of hints, at least some of which any poller could act on.
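As a sketch of how little a poller needs to do to act on two of these hints, the Python below reads ttl and skipHours from a channel and decides whether a poll is allowed. The helper names are hypothetical. The XML is a cut-down copy of the feed's channel values from the 2024-04-03 snapshot, with the namespaced sy: and podcast: elements left out so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

CHANNEL_SNIPPET = """<channel>
<ttl>1507</ttl>
<skipHours><hour>0</hour><hour>1</hour><hour>2</hour><hour>3</hour>
<hour>4</hour><hour>5</hour><hour>6</hour><hour>7</hour>
<hour>22</hour><hour>23</hour></skipHours>
</channel>"""


def read_hints(channel_xml):
    """Return (ttl_minutes or None, set of UTC hours to skip)."""
    channel = ET.fromstring(channel_xml)
    ttl_text = channel.findtext("ttl")
    ttl_minutes = int(ttl_text) if ttl_text else None
    skip_hours = {int(h.text) for h in channel.findall("skipHours/hour")}
    return ttl_minutes, skip_hours


def may_poll(now_utc, last_poll_utc, ttl_minutes, skip_hours):
    """True if polling now respects both skipHours and ttl."""
    if now_utc.hour in skip_hours:
        return False
    if ttl_minutes is not None and last_poll_utc is not None:
        return now_utc - last_poll_utc >= timedelta(minutes=ttl_minutes)
    return True


ttl, skip = read_hints(CHANNEL_SNIPPET)
last = datetime(2024, 4, 3, 12, 0, tzinfo=timezone.utc)

late_night = datetime(2024, 4, 3, 23, 30, tzinfo=timezone.utc)
too_soon = datetime(2024, 4, 4, 9, 0, tzinfo=timezone.utc)
ok = datetime(2024, 4, 4, 13, 30, tzinfo=timezone.utc)
print(may_poll(late_night, last, ttl, skip))  # False: inside skipHours
print(may_poll(too_soon, last, ttl, skip))    # False: ttl not yet elapsed
print(may_poll(ok, last, ttl, skip))          # True
```

A real client would combine this with conditional requests and the HTTP caching headers, but even this check alone would remove most of the hourly polls seen in the logs.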
I also provide alternateEnclosure items, alternatives alongside the default MP3 (audio) or MP4 (video) file, that allow users to download much smaller versions if they wish, to save more bandwidth, data-charges, CPU, etc. I have not seen evidence of any client using (or able to use) these.
More on hints...
2024-04-03: snapshot
In the RSS file itself are the following lines in the channel part:
<pubDate>Wed, 03 Apr 2024 12:58:31 GMT</pubDate>
<ttl>1507</ttl>
<skipHours><hour>0</hour><hour>1</hour><hour>2</hour><hour>3</hour><hour>4</hour><hour>5</hour><hour>6</hour><hour>7</hour><hour>22</hour><hour>23</hour></skipHours>
<sy:updatePeriod>monthly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<podcast:updateFrequency rrule="FREQ=MONTHLY">monthly</podcast:updateFrequency>
This says that updates are expected roughly monthly and that updating once in that interval is OK, and that this feed has a TTL (time to live) of ~25h, ie can be cached that long, and that updates will generally not be happening from 22:00Z to 07:00Z so please do not poll then at all. (Possibly the TTL should be higher, up to a month... I have since pushed the value up to 4327 minutes, ie a little over 3 days.)
In the HTTP response headers for the feed file are the following relevant lines:
Date: Wed, 03 Apr 2024 18:15:19 GMT
Last-Modified: Wed, 03 Apr 2024 15:34:48 GMT
ETag: "133ff-61532f67e0edd"
Cache-Control: max-age=14820
Expires: Wed, 03 Apr 2024 22:22:19 GMT
The Last-Modified header allows an If-Modified-Since conditional fetch. The ETag allows an If-None-Match conditional fetch. So if a conditional fetch is used and the feed file has not changed, then only a very small 304 status response is sent.
The Cache-Control: max-age and Expires are pushed out from this daytime poll's 4h7 to 7h7 during skipHours. Paying attention to either header would push polling frequency well below the typical default ~1h. If a conditional fetch is done, only a slow string of tiny 304s should happen almost all the time, and not even that in skipHours ideally!
Also I defer any rebuilding of the rss/podcast.rss file during skipHours, or when the GB grid has high carbon intensity, or when the local battery is low. This should help reduce GB-grid-powered network traffic at these times.
2024-04-22: defences
Current defensive measures on top of the hints and update restrictions in place to reduce wasted bandwidth/CPU polling the RSS feed itself are:
During skipHours (22:00Z to 07:59Z) when no polling should be happening at all ideally, making an unconditional request (eg no If-Modified-Since), and in the absence of a Referer, and when GB grid carbon intensity is high (or there is not even any identifying User-Agent), the request will be rejected with 429 "Too Many Requests" with a long Retry-After that matches the Cache-Control max-age for an accepted poll.
# For RSS files (which will have skipHours matching the above),
# if there is no Referer and no conditional fetching, back off
# when battery is low or the grid intensity is high or there is no UA.
# 429 Too Many Requests
RewriteCond "%{TIME_HOUR}" "<08" [OR]
RewriteCond "%{TIME_HOUR}" ">21"
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP:If-Modified-Since} ^$ [NV]
RewriteCond %{HTTP:If-None-Match} ^$ [NV]
# Not saying who you are (no User-Agent) and ignoring skipHours is rude.
RewriteCond %{HTTP:User-Agent} ^$ [NV,OR]
# Have any interaction with the filesystem as late as possible.
RewriteCond %{DOCUMENT_ROOT}/_gridCarbonIntensityGB.7d.red.flag -f [OR]
RewriteCond /run/EXTERNAL_BATTERY_LOW.flag -f
RewriteRule "^/rss/.*\.rss$" - [L,R=429,E=RSS_RATE_LIMIT:1]
Header always set Retry-After "25620" env=RSS_RATE_LIMIT
This is intended to allow through manual/human updates, especially from browsers and podcast catcher mobile clients, where possible, only blocking mindless brute-force waste.
For attempts to fetch during skipHours or when GB grid carbon intensity is high, making an unconditional request (eg no If-Modified-Since), and in the absence of a Referer, and with not even gzip compression invited, the request will be rejected with 406 "Not Acceptable". The lack of gzip in Accept-Encoding is technically the trigger for this rejection.
# Reject (bot) attempts to unconditionally fetch without compression.
# 406 Not Acceptable.
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP:If-Modified-Since} ^$ [NV]
RewriteCond %{HTTP:If-None-Match} ^$ [NV]
RewriteCond %{HTTP:Accept-Encoding} !gzip
RewriteCond "%{TIME_HOUR}" "<08" [OR]
RewriteCond "%{TIME_HOUR}" ">21" [OR]
# Have any interaction with the filesystem as late as possible.
RewriteCond %{DOCUMENT_ROOT}/_gridCarbonIntensityGB.7d.red.flag -f
RewriteRule "^/rss/.*\.rss$" - [L,R=406]
Again, these rules are intended to allow human-driven requests through and only block brute-force badly-behaved bots.
These defences overlap, so a client may get either where they do, though only 406 can happen outside skipHours for now.
Bad poll defences may be extended, especially 429.
I have extended the in-skipHours expiry time to 10h7, so as to push the next allowed poll out of skipHours.
For EOU RSS traffic I have rearranged the defences to return a 429 in preference to a 406 if both are applicable, since a 429 does seem to slow down Amazon for example. And the Retry-After header may provide more control (than 406) with better behaved clients.
I removed the If-None-Match guards on the 406 and 429 defences, as the EOU site is no longer generating ETags to which those would be a good response.
References