I often get questions from clients or marketeers who are frustrated by rubbish (spam) visits in their Google Analytics. Google Analytics spam, often called ghost spam or referral spam, contaminates the data, making it difficult to see the real trends in website visitors and their behaviour. This causes the most problems for smaller sites, where spam is a larger portion of the traffic. Such sites may not have the time or budget to call on a specialist analyst, and many website developers are not up to date with the evolving issue. The result is that the client’s vital Google Analytics data is rendered useless and ignored.
In this post I would like to answer some of our clients’ questions by briefly explaining the problem and, more importantly, showing an occasional user how to quickly and safely exclude spam and see the real visitor data.
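Spam data occurs when your Google Analytics records fake traffic. It is usually most visible in the referral traffic but can crop up in other sources, including direct traffic and organic traffic. Google Analytics spam comes in broadly two types: ghost spam, which never visits your site and instead sends fake hits straight to Google Analytics, and crawler spam, which comes from ‘bad bots’ that do visit your site.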
Google Analytics spam is harmless to your website. The security of your site has not been compromised. You may see strange landing pages from this spam activity but these pages do not exist. Your site has not been hacked.
The spam activity can increase the average bounce rate and lower the average time on site recorded in your Google Analytics, but this will not affect your search rankings. True, Google uses such visitor engagement metrics as ranking signals, but the data in Google Analytics is not used by their search algorithms. Google is very clear about that.
While Google Analytics spam may be harmless, it masks valuable information in your Google Analytics, and trying to remove it with the wrong actions can waste time and even do damage. Trying to block ghost spam visits to your site (by adding lines to your .htaccess file) is a common mistake. Most spam does not come from real visitors, so it cannot be blocked from your site, and you risk blocking real visitors or good bots. Adding a new filter in Google Analytics to remove each new referring domain is hopeless. Ghost spam usually shows up for a few days and then disappears, only to reappear from a different domain.
Now that we understand a little more about the two types of spam, we can remove much of it from your reports quite easily.
Most of the spam (ghost spam) works by hitting random Google Analytics tracking IDs, meaning the offender doesn’t really know who the target is. For that reason either the hostname is not set or a fake one is recorded, as shown in this screenshot, where lines 1, 2 and 5 are clearly ghost spam visits.
If we add a segment to view only traffic from the valid hostnames (for most small to medium-sized sites that will mean just your domain name), then the ghost spam is excluded when you view your Google Analytics data.
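To make the idea concrete, here is a minimal sketch of what a valid-hostname segment does, applied to rows you might export from a report. It is only an illustration: the hostnames, the field names (hostname, source, sessions) and the sample rows are all made up, and the segment inside Google Analytics does this work for you.

```python
import re

# Hypothetical valid hostnames; replace with the domains your site is really served from.
VALID_HOSTNAMES = ["example.com", "www.example.com"]


def build_hostname_pattern(hostnames):
    """Return a regex string that matches any of the given hostnames exactly."""
    escaped = [re.escape(name) for name in hostnames]
    return "^(" + "|".join(escaped) + ")$"


def keep_valid_hostnames(rows, hostnames):
    """Keep only the rows whose 'hostname' is one of the valid hostnames."""
    pattern = re.compile(build_hostname_pattern(hostnames), re.IGNORECASE)
    return [row for row in rows if pattern.match(row.get("hostname") or "")]


# Made-up sample rows; the field names are assumptions, not a Google Analytics export format.
rows = [
    {"hostname": "www.example.com", "source": "google", "sessions": 120},
    {"hostname": "(not set)", "source": "spam-referrer.test", "sessions": 45},
    {"hostname": "fake-host.test", "source": "spam-referrer.test", "sessions": 30},
    {"hostname": "example.com", "source": "(direct)", "sessions": 60},
]

clean_rows = keep_valid_hostnames(rows, VALID_HOSTNAMES)
clean_sessions = sum(row["sessions"] for row in clean_rows)
```

With this sample data, the two rows with a missing or fake hostname drop out and only the visits recorded against your own domain remain. The pattern is anchored at both ends so that a fake hostname which merely contains your domain name does not slip through.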
To add this segment when in Google Analytics:
That is great for removing ghost spam, which is most of the problem. Depending on your situation you may be content with this.
However, this simple segment is less effective at removing crawler spam.
These ‘bad bots’ really do visit your site, so they know who they are hitting and can record a valid hostname.
For example see Lines 1, 3, 4, 5 and 6 in the image on the right. We can tell the data is spam by exploring the source but the hostname is correct.
Please note: Do not click on the suspect source in Google Analytics to go to the site and see if it is spam. This could confirm to the spammers that your Google Analytics account is active and invite more spam. Cut and paste the suspect source into your browser or a web search to investigate it.
To remove crawler spam we must add additional filters to the segment to exclude these sources:
Now, when you apply this segment, you can see just your real visitor data. You can even compare it to your unfiltered data to check that you are not inadvertently excluding real visits. Adding a segment is safe, relatively simple and quick. It is also applied to all your data, both past and present, so you can instantly view trends over time. If it goes wrong, don’t panic – just re-edit your segment – all your raw data is still there completely unaffected. If more spam appears next month, again, don’t panic, just edit the segment to remove the data from the new spam attack. For most occasional users this will be more than enough. You can now view your data with no pollution from spam.
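The check described above, comparing the segment with the unfiltered data, can also be sketched outside Google Analytics. The example below is a rough illustration under stated assumptions: the spam source names, the hostname pattern, the field names and the sample rows are all hypothetical, so substitute the sources you have actually identified as spam.

```python
import re

# Hypothetical patterns; replace with your own domain and the spam sources you have identified.
VALID_HOSTNAME = re.compile(r"^(www\.)?example\.com$", re.IGNORECASE)
SPAM_SOURCE = re.compile(r"spam-crawler\.test|bad-bot\.test", re.IGNORECASE)


def is_real_visit(row):
    """A row counts as real if the hostname is valid and the source is not known spam."""
    host_ok = bool(VALID_HOSTNAME.match(row["hostname"]))
    source_ok = SPAM_SOURCE.search(row["source"]) is None
    return host_ok and source_ok


def compare_with_unfiltered(rows):
    """Summarise sessions before and after the segment, and list what was excluded."""
    excluded = [row for row in rows if not is_real_visit(row)]
    return {
        "all_sessions": sum(row["sessions"] for row in rows),
        "segment_sessions": sum(row["sessions"] for row in rows if is_real_visit(row)),
        "excluded_rows": excluded,
    }


# Made-up sample rows; the field names are assumptions, not a Google Analytics export format.
rows = [
    {"hostname": "www.example.com", "source": "google", "sessions": 120},
    {"hostname": "www.example.com", "source": "spam-crawler.test", "sessions": 40},
    {"hostname": "(not set)", "source": "bad-bot.test", "sessions": 25},
    {"hostname": "example.com", "source": "newsletter", "sessions": 15},
]

summary = compare_with_unfiltered(rows)
```

Looking through the excluded rows is the useful part: if a source you recognise as genuine appears there, the segment is too aggressive and should be edited.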
However, this segment approach has limitations. For a more complete and permanent removal of spam from your data you will need to go into admin and add these exclusions as Google Analytics filters to your view, but at least you now understand what you are excluding and have validated the method. Adding filters at an admin level to remove the spam from your view is quite straightforward. However, it permanently removes the data from the view, so it should be done with care to avoid losing data you want to keep. It should also be noted that while admin-level ‘view filters’ will stop more spam being recorded in the future, they cannot be applied to historic data. You can only report previously collected data spam-free by applying a segment. For these reasons I would not recommend ‘view filters’ for an occasional Google Analytics user, and so they are beyond the scope of this post. This link gives further information about Google Analytics spam and describes adding filters to your view to exclude spam permanently from your Google Analytics. If you are an enthusiast or heavily dependent on Google Analytics data, I recommend exploring this.
If you would like help setting up segments or filtered views, or analysing and understanding the vital data in your Google Analytics then please contact us. I will be happy to help.