Earth Notes: On Website Technicals (2021-12)

Updated 2023-08-07.
Tech updates: Sitebulb 5.4.0 review, AMP off, reviews reviewed, crawl frenzy, IndexNow.
Winding down for Christmas (2022?), and actually paying for Sitebulb finally after the free version picked up more minor site bugs! (Oh, and AMP may have really really stopped.) Plus some random GSC stats snapshots, and setting up IndexNow.

2021-12-31: IndexNow

What else is there to do while waiting for New Year than set up IndexNow support?

I have this set up to push changed, reasonably new main pages incrementally. Each push goes to an IndexNow participant chosen at random, since participants should share all the URLs that they receive.

This was provoked by Bing suddenly (unannounced, in the last few days) ceasing to accept sitemap pings, rejecting them with a 410 Gone.

This is pretty much the entire support in the makefile:

# Recreate/expose the IndexNow key as necessary.
# It is not built in to the makefile since it is meant to be 'secret'.
# https://www.indexnow.org/documentation
IndexNowKeySrc=.work/IndexNow.key.txt
IndexNowKey := $(shell cat $(IndexNowKeySrc))
IndexNowKeyFile := $(IndexNowKey).txt
all:: $(IndexNowKeyFile)
$(IndexNowKeyFile): $(IndexNowKeySrc)
        @echo "Rebuilding $@"
        ln $(IndexNowKeySrc) $(IndexNowKeyFile)
        chmod a+r $(IndexNowKeyFile)
# Pings main-page updates to IndexNow and remembers which have been done.
# https://www.indexnow.org/
# Errs on the side of under-reporting.
# Submits updates incrementally.
# Only considers pages up to a few days old.
IndexNowMaxDaysOld=7
# Eliminates explicit 'NOINDEX' pages.
# Does not attempt to ping any one page more than once between updates.
# All pings could be sent to the primary (URL1) or can be shared at random.
IndexNowSEURL1=https://yandex.com/indexnow
IndexNowSEURL2=https://www.bing.com/indexnow
IndexNowFlags=.work/IndexNow.flags
.PHONY: IndexNow.ping
IndexNow.ping: $(WORKTMP)/IndexNow.ping
all:: $(WORKTMP)/IndexNow.ping
$(WORKTMP)/IndexNow.ping: makefile $(IndexNowKeyFile) $(SCWPAGES)
        @echo "Rebuilding $@"
        @$(LOCKFILENRSLOW) $@.lock
        @count=0; \
        for f in `find $(PAGES) -mtime -${IndexNowMaxDaysOld} | sort -R`; do \
            if egrep -q '<!-- *NOINDEX *-->' .$$f; then continue; fi; \
            n=$$f; \
            flag=${IndexNowFlags}/$$f.log; \
            if [ ! -f $$flag -o $$f -nt $$flag ]; then \
                echo IndexNow: $$n; \
                URL=`( echo ${IndexNowSEURL1} ; echo ${IndexNowSEURL2} ) | sort -R | head -1`; \
                wget -O $$flag "$$URL"'?url=$(URLLISTPREFIX)'"$$n"'&key=${IndexNowKey}'; \
                count=1; break; \
            fi; \
            done; \
            if [ 0 = "$$count" ]; then echo "All done..."; touch $@; fi
        @/bin/rm -f $@.lock
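
As a hedged sketch rather than the site's actual build code, the per-page step above can be pulled out into small shell functions. The function names, the example.org page URL and the EXAMPLEKEY value are all hypothetical. One deliberate difference from the makefile: the flag file is only written if wget reports success, relying on wget exiting non-zero when the server returns an error response, whereas the rule above records every attempt and so errs towards under-reporting.

```shell
#!/bin/sh
# Sketch only: helper names, example.org and EXAMPLEKEY are hypothetical.

# Build a single-URL IndexNow GET request:
#   <endpoint>?url=<changed page URL>&key=<key>
# The page URL is assumed to need no further URL-escaping.
indexnow_request_url() {
    printf '%s?url=%s&key=%s\n' "$1" "$2" "$3"
}

# Pick one endpoint at random from the arguments, as the makefile does.
indexnow_pick_endpoint() {
    printf '%s\n' "$@" | sort -R | head -n 1
}

# Succeed if the page has never been pinged (no flag file)
# or has been modified since its flag file was written.
# Arguments: page file, flag file.
indexnow_needs_ping() {
    [ ! -f "$2" ] || [ "$1" -nt "$2" ]
}

# Send one request; keep the response as the flag file only on success.
# Arguments: request URL, flag file.
indexnow_ping() {
    if wget -q -O "$2.tmp" "$1"; then
        mv "$2.tmp" "$2"
    else
        rm -f "$2.tmp"
        return 1
    fi
}

# Dry run: show the request that would be sent, with no network access.
endpoint=$(indexnow_pick_endpoint \
    https://yandex.com/indexnow https://www.bing.com/indexnow)
indexnow_request_url "$endpoint" "https://example.org/page.html" "EXAMPLEKEY"
```

A failed ping leaves no flag file behind, so the page would be retried on the next run. That trades some extra requests for fewer missed updates.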

Screenshot 20220105 BWT URL Submission
2022-01-05: I note that if I submit a URL to Bing or Yandex, Yandex spiders it immediately. But I don't think I've seen Bing respond at all to a URL submission.

But I can see the Bing-IndexNow submitted URLs in the appropriate section of the Bing Webmaster Tools, with a date and time. They appear immediately, given a (BWT) page refresh.

2021-12-27: Race to Crawl

Screenshot 20211217 GSC Crawl stats
HTML page crawling for desktop EOU has shot up and stayed there...

2021-12-13: Reviews Healing

Screenshot 20211213 GSC Review snippets
Work on replacing the 30+ reviews (and their underlying Product and Event stuff to make GSC happy) with the external attribute files is basically done, which is swift considering that the last fix went in yesterday.
Screenshot 20220101 GSC Review snippets
2022-01-01: the persistent drop in impressions is interesting. Given recent GSC changes, while typically two Review fragments are shown by GSC per nominal review the way I do it, maybe now only one of those impressions is counted (the nested instance rather than the top-level AggregateRating).

2021-12-04: AMP Off

Screenshot 20211204 AMP impressions stopped
Google appears to have stopped sending AMP impressions to EOU.

2021-12-03: Sitebulb 5.4.0

I was sent a canny "Would you like to try again?" marketing email from Sitebulb, so I did, and this time I have ponied up for a 'Lite' licence at least for now. I cannot possibly justify the expenditure as it is basically all my ad revenue, but I have found the tool helpful in its trial version for a number of things. So I think that Sitebulb ought to have at least a little of my money...

A number of observations:

  • I cannot start a new project with my Mac's internal firewall turned on, which is ugly. But I can then (re)run projects with it on.
  • The schema.org structured microdata parsing is currently broken; it does not understand multiple values in a single itemprop (or itemtype) attribute.
  • Paying for a Lite licence terminates access for the Pro features usable in the trial licence: don't pay up too soon!
  • Running a Sitebulb crawl against my Apache (2.4.25) MPM Event configuration caused Apache to stop responding after a while. I switched to MPM Worker, though it may need to be trimmed a little to conserve memory.
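
On the microdata point: in HTML microdata the itemprop and itemtype attribute values are space-separated sets of tokens, so a parser has to split the value before looking up each name. A minimal sketch of that splitting step follows. The function name and the sample value are hypothetical, and the shell's default word splitting only approximates HTML's definition of whitespace.

```shell
#!/bin/sh
# Sketch only: the function name and sample attribute value are hypothetical.

# Print each token of a space-separated microdata attribute value
# (itemprop or itemtype) on its own line.
microdata_tokens() {
    (
        set -f    # disable filename globbing during word splitting
        for token in $1; do
            printf '%s\n' "$token"
        done
    )
}

# A single attribute such as itemprop="image thumbnailUrl" names two properties.
microdata_tokens "image thumbnailUrl"
```

This prints image and thumbnailUrl on separate lines. A parser that treats the whole attribute value as a single property name would miss both.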

See previous (2.0.2) review.

Product: Sitebulb Website Crawler 5.4.0

Sitebulb logo
Website auditing tool for site owners, SEO consultants and agencies, for Mac and Windows.
  • InStock
  • for Lite single user monthly including VAT valid at/until:
Review summary
  • 14-day free trial upgraded to Lite
  • As in the previous (2.0.2) review, I found the Sitebulb desktop website crawler to perform a thorough crawl and cross-check of many aspects of a site's content and behaviour. I encountered a problem with incorrect parsing of schema.org multi-value itemprop attributes, and an inability to create a new project with the Mac's internal firewall enabled. Tested on x64 Mac OS X and x86 Windows 10 laptops. Support remains friendly and good. I upgraded to a paid (Lite) account.
  • Rating: 4/5
  • Published: