Referrer spam in Google Analytics (and other systems) is a growing problem, distorting genuine traffic numbers and making it harder to extract useful data. Fortunately, it's easy to fix.
What is referrer spam?
A referrer is the last page someone was on before clicking through to yours; the browser usually sends it with every page view. Referrer spam dates back to the days when many sites published their analytics data (usually generated from server log files). This meant that if you sent traffic to a third-party site that did this, an actual link would be created to your site from their analytics pages. For instance, have a look at the links in this demo report (look for referral reports on the left).
Unfortunately, referrer information is set by the browser or software a visitor is using, and can easily be faked or modified. So, spammers sent fake traffic around the web pretending it was referred from their own sites, hoping to create links to their own pages (and that those links would make their sites rank better in Google). A useful side-effect from the spammer's point of view was the occasional click from site owners, wondering why a particular site had sent them traffic.
Why spam Google Analytics?
Google Analytics reporting is not publicly accessible. However, its usage is so wide (something like 2/3 of the most visited sites online use it) that spammers decided the "curiosity clicks" alone would be worth their efforts, and set about spamming referrers on a massive scale, hoping to attract traffic to their sites. Unfortunately, the way a typical Google Analytics project is set up makes this very easy for them:
- Everyone's tracking code is the same, apart from their ID number (e.g. UA-12345-1)
- These ID numbers can be "guessed" as they follow a specific format
- The tracking code can be installed on third-party sites and Google will show the data in your reports
- Google Analytics has handy API features that allow data to be sent in bulk by software or scripts
What are the symptoms of referrer spam?
Put simply, you'll see odd or fake referrers in your Google reports. Visit the report at Acquisition >> All Traffic >> Referrals to see referrers. For example:
It's quite common to see the ".xyz" domain extension appear frequently, as spammers seem to prefer it.
Depending on your overall traffic, these referrers could be a significant proportion of your overall visitors, as reported by Google - as high as 15-20% for lower-traffic sites. The traffic will usually have a 100% bounce rate, dragging down your overall site metrics.
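To see how much a block of 100%-bounce spam can distort the headline figure, here is a minimal arithmetic sketch. The session counts and the 40% genuine bounce rate are made-up illustrative numbers, not data from any real report, and the function names are hypothetical.

```python
def spam_share(real_sessions, spam_sessions):
    """Fraction of all reported sessions that are spam."""
    return spam_sessions / (real_sessions + spam_sessions)


def blended_bounce_rate(real_sessions, real_bounce_rate,
                        spam_sessions, spam_bounce_rate=1.0):
    """Bounce rate a report would show once spam sessions are mixed in.

    Assumes spam sessions bounce 100% of the time unless told otherwise.
    """
    total_sessions = real_sessions + spam_sessions
    if total_sessions == 0:
        raise ValueError("no sessions to average over")
    bounces = (real_sessions * real_bounce_rate
               + spam_sessions * spam_bounce_rate)
    return bounces / total_sessions


# Illustrative numbers only: 800 genuine sessions at a 40% bounce rate,
# plus 200 spam sessions (20% of the reported total).
share = spam_share(800, 200)
reported = blended_bounce_rate(800, 0.40, 200)
print(f"spam share: {share:.0%}, reported bounce rate: {reported:.0%}")
```

With these assumed numbers, a genuine 40% bounce rate would be reported as 52%.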
How to block the spammers
Blocking the spam is actually quite straightforward, but does require some care to make sure you don't disrupt your stats. Unfortunately, you can't block historic spam, so you will only be able to get better numbers for future reports. One method would be to make a note of the bad referrers and filter them out. However, they change all the time, which means you would need to constantly add new referrers. So, we'll use a different method that will block future spam too, based on the fact that the spammers don't know which site a tracking number belongs to - they're just running through tracking numbers randomly.
Add a new view to your Google Analytics profile.
Creating a new view is strongly recommended. While it's a pain to now have an additional view, and there won't be any historic data available, this avoids the potential for lost data. You can keep the existing view as an "unfiltered" one. In the event of any problems, your data will still be there.
Determine which are the valid sites for your Google Analytics code
The technique we'll use to block the spammers will only record stats for your own site(s), so you need to make sure they are recorded correctly. Visit the report below:
Audience >> Technology >> Network
In this report, choose "hostname" as the 'primary dimension':
It's best to choose a long time frame (e.g. 12 months). You should see something like the below. Interestingly, all of the below is spam, because this site never installed the Google Analytics code (!):
Most of the spam traffic will be under (not set). Regardless, you're looking for:
- Your own domain name
- Any valid external domain names where you've installed the code (e.g. a subdomain, regional domain name, checkout server or similar)
- Any other third-party sites that provide genuine traffic (e.g. Google Translate, the Google cache)
So, for SEOThing.co.uk, we might include the below in addition to our own domain:
- Googleusercontent.com (Google Translate/cache)
- Googleweblight.com (some kind of Google proxy)
If you had an external shopping cart checkout, you might include checkout-processor.com, for example. Google is going to use your list in a regular expression pattern. If you don't know what that is, don't worry - do the following:
Take the "word" part of each domain name (e.g. seothing for www.seothing.co.uk). Make a list of them separated by a bar character (|). Don't include a bar at the beginning or the end:
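As a rough sketch of that step, the snippet below joins the "word" parts into a single bar-separated pattern. The helper name `build_hostname_pattern` is hypothetical, and the three words are the ones used as examples in this article; substitute your own list.

```python
import re


def build_hostname_pattern(words):
    """Join hostname 'words' with | and no leading or trailing bar."""
    # Drop empty entries and escape any regex-special characters.
    cleaned = [re.escape(w.strip().lower()) for w in words if w.strip()]
    if not cleaned:
        raise ValueError("at least one hostname word is required")
    return "|".join(cleaned)


pattern = build_hostname_pattern(
    ["seothing", "googleusercontent", "googleweblight"]
)
print(pattern)  # seothing|googleusercontent|googleweblight
```

Escaping is a precaution: plain words pass through unchanged, but a word containing a character such as a dot would otherwise be read as regular expression syntax.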
Create a new filter
You need admin access to the Google Analytics account to do this. Visit Admin >> All filters. Then click "New filter" at the top and follow these steps:
- Choose "custom" for the filter type
- Select "include" as the type of filtering
- Choose "hostname" as the filter field
- Add your list of domain names
- Apply the filter to the new view you created earlier
The whole form should then look something like this:
If everything looks good, save your changes. Congratulations, you have eliminated referrer spam from your Google Analytics data. No maintenance is required unless:
- You change domain name
- You want to start tracking another domain name associated with your site (e.g. you host a support forum with a third party or similar)
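To illustrate what the include filter does, here is a small simulation in Python. It assumes the filter keeps a hit when the pattern matches anywhere in the hostname, ignoring case, which is approximated here with `re.search`; the sample hits, hostnames and referrers are invented for illustration and the function name is hypothetical.

```python
import re


def apply_include_filter(hits, pattern):
    """Keep only hits whose hostname matches the pattern.

    Assumption: a partial, case-insensitive match, modelled with re.search.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return [hit for hit in hits if regex.search(hit.get("hostname") or "")]


# Invented sample data: two genuine hits and two spam hits.
hits = [
    {"hostname": "www.seothing.co.uk", "referrer": "google.com"},
    {"hostname": "(not set)", "referrer": "spam-example.xyz"},
    {"hostname": "translate.googleusercontent.com", "referrer": "(direct)"},
    {"hostname": "unrelated-site.example", "referrer": "spam-example.xyz"},
]

kept = apply_include_filter(
    hits, "seothing|googleusercontent|googleweblight"
)
for hit in kept:
    print(hit["hostname"], hit["referrer"])
```

Under this assumption the match is partial, so any hostname that merely contains one of the words would also pass. That is acceptable here because the spammers are not targeting your hostname specifically.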
Removing spam from existing reports
If you want to look at existing reports with the referrer spammers removed, you will need to create a new "segment" via the button in the top right of the page:
Then choose "new segment":
You will use an advanced condition to filter by hostname. Click where the existing condition says "Ad Content" and start typing "host..."; the field should auto-complete:
Finally, you will need to add your valid hostnames, choosing a match type of "matches regex". Your final segment will look like this:
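The effect of such a segment on a historic referral report can be sketched as below. This is only an illustration of the idea, not how Google Analytics computes segments: the rows are invented, the `(hostname, referrer, sessions)` layout is an assumed export format, and `referral_totals` is a hypothetical helper.

```python
import re
from collections import Counter


def referral_totals(rows, hostname_pattern):
    """Sum sessions per referrer, counting only rows with a valid hostname."""
    regex = re.compile(hostname_pattern, re.IGNORECASE)
    totals = Counter()
    for hostname, referrer, sessions in rows:
        if regex.search(hostname):
            totals[referrer] += sessions
    return totals


# Invented historic rows: (hostname, referrer, sessions)
rows = [
    ("www.seothing.co.uk", "google.com", 120),
    ("www.seothing.co.uk", "twitter.com", 30),
    ("(not set)", "spam-example.xyz", 85),
    ("seothing.co.uk", "google.com", 15),
]

clean = referral_totals(rows, "seothing|googleusercontent|googleweblight")
print(dict(clean))
```

The spam referrer disappears from the totals because its only row has a hostname of (not set), which the pattern does not match.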
Let us know if you have any questions or need help adding your filter.