Truncating domain for analytics referer doesn't account for all domain patterns

Describe the bug

The 'Web pages visits by source' analytics report collates domains by truncating them at the second-last '.' character. This works for standard DOMAIN.TLD patterns, but not for all cases. For example, UK sites commonly use the two-part suffix '.co.uk', so statistics are shown for "co.uk" as if it were a single site.

I'm not sure there's a benefit to dropping subdomains either. For example, blog links from 'wordpress.com' will all get collated together as well.

It seems it would be better to take the whole domain value instead of "SUBSTRING_INDEX(domain, '.', -2)" at https://code.castopod.org/adaures/castopod/-/blob/develop/modules/Analytics/Models/AnalyticsWebsiteByRefererModel.php#L102
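To illustrate the collapsing described above, here is a minimal sketch in Python that mimics what SUBSTRING_INDEX(domain, '.', -2) returns for a few referers. It is not Castopod code: the function name `last_two_labels` and the sample referers are made up for illustration.

```python
from collections import Counter


def last_two_labels(domain: str) -> str:
    """Mimic MySQL's SUBSTRING_INDEX(domain, '.', -2).

    Keeps everything to the right of the second-last '.'; a domain with
    fewer than two dots is returned unchanged.
    """
    return ".".join(domain.split(".")[-2:])


# Hypothetical referer domains, chosen to match the cases in this report.
referers = [
    "www.google.co.uk",
    "www.bbc.co.uk",
    "alice.wordpress.com",
    "bob.wordpress.com",
    "example.com",
]

# Group visits by the truncated label, as the report effectively does.
counts = Counter(last_two_labels(d) for d in referers)

for label, visits in counts.items():
    print(label, visits)
```

In this sketch the two unrelated UK sites are counted together under "co.uk", and the two separate blogs are counted together under "wordpress.com".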

Steps to reproduce

  1. Send a request to Castopod with a multi-part domain as referer (e.g. from google.co.uk).
  2. Note that the 'Web pages visits by source' report shows statistics for "co.uk".

Expected behaviour

The report lists unique websites as sources, including where they don't match a DOMAIN.TLD pattern.

Actual behavior

Some collated entries in the report are too general.

Context

  • Castopod: 1.6.0

Possible fixes

Just use the whole domain for the label (https://code.castopod.org/adaures/castopod/-/blob/develop/modules/Analytics/Models/AnalyticsWebsiteByRefererModel.php#L102). Or, if the truncation is there to avoid separate entries for www.DOMAIN.TLD and DOMAIN.TLD, strip a leading "www." specifically and keep the rest of the domain.
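A rough sketch of the second option, written in Python purely to show the intended behaviour (the real change would presumably go in the SQL expression in the model, which is not shown here). The function name `normalize_referer` and the sample domains are hypothetical.

```python
from collections import Counter


def normalize_referer(domain: str) -> str:
    """Return the full domain, minus a leading 'www.' label if present.

    The 'www.' prefix is only stripped when at least two labels remain,
    so an input such as 'www.com' is left untouched.
    """
    domain = domain.lower()
    if domain.startswith("www."):
        rest = domain[len("www."):]
        if "." in rest:
            return rest
    return domain


# Hypothetical referer domains for illustration.
referers = [
    "www.google.co.uk",
    "google.co.uk",
    "www.bbc.co.uk",
    "alice.wordpress.com",
    "bob.wordpress.com",
]

# Group visits by the normalized domain.
counts = Counter(normalize_referer(d) for d in referers)

for label, visits in counts.items():
    print(label, visits)
```

With this approach "www.google.co.uk" and "google.co.uk" share one entry, while "bbc.co.uk" and each wordpress.com blog keep their own. Other subdomains of the same site would still be listed separately, which may or may not be what the report wants.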

I can create a merge request, but there may have been another reason for taking this approach?

Edited by Aonrud