Click fraud on Google Ads, especially on the Display Network, has become such a massive issue that some advertisers are refraining from using it at all. In basic terms, we no longer trust Google to show our ads to human audiences. Most of us are familiar with third-party click-fraud prevention solutions (Cheq and ClickCease, to name two). While these tools are great and perform well, not everyone can justify their cost, and some of us may be reluctant to give a third party that kind of access to our campaigns.
In a previous post, I discussed how you can use Google's reCAPTCHA tool to identify fraudulent clicks in your campaigns by applying a GTM script by Simo Ahava to trigger Google Ads conversion tracking and report the results in Data Studio.
Today I want to present you with an elegant DIY solution that will enable you to identify, capture and report invalid traffic on Google ads. Implementing this solution will provide you with two “super-powers”:
- All the information required to block click-fraud IPs and to file an Invalid Traffic Investigation (ITI) with Google.
The scripts used in this post retain the core functionality described by Simo, so you can also use the Google Analytics events he suggests (for example, for simple visualization of the data).
NERD ALERT: I should point out that it gets rather technical from this point on. I'll try to break it down into simple steps, and I'm sure you'll find it easier than it looks. Just bear with me and shoot me any technical questions you have on Twitter.
Why do you need this report?
To deal with fraudulent traffic in your campaigns, Google Ads provides two tools:
- IP exclusion: You can exclude up to 500 IP addresses from each campaign to minimize fraudulent clicks from competitors and click farms.
- Invalid Traffic Investigation: If your campaigns are attacked by fraudulent clicks, you can also ask Google to run an Invalid Traffic Investigation (ITI).
For both tools, you need data on the visitors identified as potentially fraudulent traffic. The minimum is their IP address, but for an ITI you also need the user's GCLID parameter, User-Agent and the URL they visited.
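To make those requirements concrete, here is a minimal JavaScript sketch that checks a single logged visit against them. It assumes the row uses the field names the PHP script later in this post sends to the webhook (remoteip, gclid, userAgent, URL); the helper name and the sample values are hypothetical.

```javascript
// Fields needed for each Google Ads tool, per the requirements above.
// Assumption: rows use the webhook field names from recaptcha.php.
const IP_EXCLUSION_FIELDS = ["remoteip"];
const ITI_FIELDS = ["remoteip", "gclid", "userAgent", "URL"];

// Hypothetical helper: returns the required fields that are missing or empty.
function missingFields(row, required) {
  return required.filter((field) => {
    const value = row[field];
    return value === undefined || value === null || String(value).trim() === "";
  });
}

// Made-up example row: a suspected bot that did not arrive via an ad click.
const row = {
  remoteip: "203.0.113.7",
  userAgent: "Mozilla/5.0",
  gclid: "",
  URL: "https://www.example.com/",
};

console.log(missingFields(row, IP_EXCLUSION_FIELDS)); // []
console.log(missingFields(row, ITI_FIELDS)); // [ 'gclid' ]
```

A row like this one is enough to exclude the IP, but without a GCLID it cannot be tied to a specific ad click in an ITI.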
The “standard” way of collecting this data is through server logs. These are files saved, well, on your servers, that log every request (e.g. an HTML page or an image load) made by visitors to your site. Analyzing these logs is complex (although some tools simplify it), and the logs contain no clear indication of whether a visitor is suspected to be a bot.
With the log this framework creates, you can easily collect all the required data, for fraudulent visits only, without any hassle. It also comes in a format you can easily submit to Google.
Setting up the log
Create a reCAPTCHA account
First, go to the reCAPTCHA site and register your website (for free):
https://www.google.com/recaptcha/admin/create
Make sure you select reCAPTCHA v3 and register all relevant domains.
Copy the Site Key and Secret Key; you'll need them in a few steps.
Create an Integromat webhook
Next, go to Integromat and create a new Scenario (you can create a free account for that).
The scenario will use two services: Webhooks and Google Sheets.
Add your first module of the type Webhooks and select the trigger Custom Webhook.
Add a new webhook and name it something memorable, e.g. reCAPTCHA Log
To enable the webhook, click on ‘Re-determine data structure’ and then ‘Copy address to clipboard’.
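Then send a sample POST request to that address containing the fields the PHP script (further down) will submit: botscore, remoteip, userAgent, gclid, URL and referrer. Any HTTP client will do.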
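If you don't have an HTTP client at hand, this minimal Node.js sketch (Node 18 or later, for the built-in fetch) sends one sample record. The field names mirror what recaptcha.php posts; the values are made up, and the function names are hypothetical.

```javascript
// Sketch: send one sample record so Integromat can determine the data structure.
// Field names match the PHP script's webhook payload; values are made-up samples.
function buildSamplePayload() {
  return new URLSearchParams({
    botscore: "0.1",
    remoteip: "203.0.113.7",
    userAgent: "Mozilla/5.0 (sample)",
    gclid: "sample-gclid",
    URL: "https://www.example.com/?gclid=sample-gclid",
    referrer: "direct",
  });
}

async function sendSample(webhookUrl) {
  const res = await fetch(webhookUrl, {
    method: "POST",
    // A URLSearchParams body is sent as application/x-www-form-urlencoded.
    body: buildSamplePayload(),
  });
  return res.status;
}

// Usage: replace the placeholder with the address you copied from Integromat.
// sendSample("_Integromat_webhook_URL").then((status) => console.log(status));
```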
You should now see that the data structure has been ‘Successfully determined’.
Add the PHP file to your server
In this post, we’ll use the PHP script devised by Sebastian Pospischil and Philipp Schneider. We’ll make several alterations to it to make sure we capture the relevant data points.
Create a file named recaptcha.php and upload it to a subdirectory named /gtm/ on every domain where you want to validate reCAPTCHA requests. You can use any path and filename you like for the PHP file, as long as you update the endpoint URL of the HTTP request in the Custom HTML tag accordingly (in the following step).
Paste the following code into the file.
<?php
// reCAPTCHA info
$url = 'https://www.google.com/recaptcha/api/siteverify';
$secret = '_reCAPTCHA_secret_key_';
$remoteip = $_SERVER['REMOTE_ADDR'];
$refUrl = $_SERVER['HTTP_REFERER'] ?? null;
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? null;
// Form info sent by the GTM tag
$action = $_POST['action'] ?? '';
$response = $_POST['token'] ?? '';
// Bot score from the previous request (cookie), if any
$botscore = $_COOKIE['_rbs'] ?? null;
// Info for the log: gclid and referrer (cookies set by the GTM tag)
$gclid = $_COOKIE['_gclid'] ?? null;
$referrer = $_COOKIE['_referrer'] ?? null;

// Verify the token with reCAPTCHA
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query(array(
    'secret' => $secret,
    'remoteip' => $remoteip,
    'action' => $action,
    'response' => $response
)));
$curlData = curl_exec($curl);
curl_close($curl);
$curlJson = json_decode($curlData, true);

// Verification failed or returned no score: pass Google's answer on (if any) and log nothing
if (!is_array($curlJson) || !isset($curlJson['score'])) {
    echo is_array($curlJson) ? $curlData : 'noChange';
    exit;
}
$score = (float) $curlJson['score'];

// Integromat webhook
$iurl = '_Integromat_webhook_URL';

// Refresh the botscore cookie (30 minutes); must run before any output
setcookie('_rbs', (string) $score, time() + 1800, '/', '', false);

// Only answer and log if the botscore cookie is not set or differs from the actual score
if ($botscore === null || (float) $botscore !== $score) {
    echo $curlData;
    // Send suspected bots (score of 0.3 or lower) to Integromat
    if ($score <= 0.3) {
        $icurl = curl_init();
        curl_setopt($icurl, CURLOPT_URL, $iurl);
        curl_setopt($icurl, CURLOPT_POST, true);
        curl_setopt($icurl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($icurl, CURLOPT_POSTFIELDS, http_build_query(array(
            'botscore' => $score,
            'remoteip' => $remoteip,
            'userAgent' => $userAgent,
            'gclid' => $gclid,
            'URL' => $refUrl,
            'referrer' => $referrer
        )));
        curl_exec($icurl);
        curl_close($icurl);
    }
} else {
    echo 'noChange';
}
?>
Make sure you replace both placeholders:
1. _reCAPTCHA_secret_key_ with the secret key you got from the reCAPTCHA console in the first step.
2. _Integromat_webhook_URL with the webhook URL you got from Integromat.
As written, the script logs only visitors with a score of 0.3 or lower, i.e. those reCAPTCHA considers highly likely to be bots rather than humans.
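To reason about (or tweak) the threshold, it helps to see the script's per-request decision as a small pure function. This JavaScript sketch only restates the PHP branching for illustration; the function name and the returned object are hypothetical and not part of the setup.

```javascript
// Restates the branching in recaptcha.php (illustration only).
// previousScore: value of the _rbs cookie (string), or undefined on the first request
// currentScore: the score returned by reCAPTCHA's siteverify endpoint
const BOT_THRESHOLD = 0.3; // same cut-off as the PHP script

function decide(previousScore, currentScore, threshold = BOT_THRESHOLD) {
  // The cookie holds a string, so compare it as a number.
  const changed = previousScore === undefined || Number(previousScore) !== currentScore;
  return {
    respond: changed, // false means the script answers "noChange"
    logToWebhook: changed && currentScore <= threshold,
  };
}

console.log(decide(undefined, 0.1)); // { respond: true, logToWebhook: true }
console.log(decide("0.9", 0.9)); // { respond: false, logToWebhook: false }
console.log(decide("0.1", 0.9)); // { respond: true, logToWebhook: false }
```

Note that a visitor whose score does not change between page views is logged only once per cookie lifetime, which keeps the sheet from filling up with duplicates.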
Add the Custom HTML tag to GTM
In your Google Tag Manager account, create a new tag of the type Custom HTML.
Paste the following code into it.
<style>
  /* hides the reCAPTCHA badge */
  .grecaptcha-badge {
    display: none !important;
  }
</style>
<script src="https://www.google.com/recaptcha/api.js?render=_reCAPTCHA_site_key_"></script>
<script>
  // Read a query-string parameter from the current URL
  function getParameterByName(name) {
    name = name.replace(/[\[\]]/g, '\\$&');
    var regex = new RegExp('[?&]' + name + '=([^&#]*)');
    var results = regex.exec(location.search);
    return results === null ? '' : decodeURIComponent(results[1].replace(/\+/g, ' '));
  }

  // Store the gclid in a cookie; path=/ so it is also sent to /gtm/recaptcha.php
  var gclid = getParameterByName('gclid');
  if (gclid) {
    document.cookie = '_gclid=' + encodeURIComponent(gclid) + '; path=/';
  }

  // Store the external referrer, or "direct" if there is none.
  // Internal navigation keeps the referrer stored earlier in the session.
  var referrer = document.referrer;
  if (referrer && referrer.indexOf(location.host) === -1) {
    document.cookie = '_referrer=' + encodeURIComponent(referrer) + '; path=/';
  } else if (!referrer) {
    document.cookie = '_referrer=direct; path=/';
  }

  grecaptcha.ready(function() {
    // Use the same site key as in the api.js URL above
    grecaptcha.execute('_reCAPTCHA_site_key_', {
      action: 'homepage'
    }).then(function(token) {
      var xhr = new XMLHttpRequest();
      xhr.onload = function() {
        if (xhr.status === 200 && xhr.response !== 'noChange') {
          var greResult;
          try {
            greResult = JSON.parse(xhr.response);
          } catch (e) {
            return; // not JSON (e.g. a PHP error page), so skip the dataLayer push
          }
          window.dataLayer = window.dataLayer || [];
          window.dataLayer.push({
            event: 'recaptcha',
            recaptchaAnswer: greResult.success,
            recaptchaScore: greResult.score
          });
        }
      };
      xhr.open('POST', '/gtm/recaptcha.php', true); // replace this with the URL of your PHP file
      xhr.setRequestHeader('Content-type', 'application/x-www-form-urlencoded');
      xhr.send('token=' + encodeURIComponent(token) + '&action=homepage');
    });
  });
</script>
Make sure you replace _reCAPTCHA_site_key_ with the site key you got from the reCAPTCHA console in the first step, and use the same key in the grecaptcha.execute() call.
Set the tag to trigger on ‘All Pages’. You can also set it to fire later, e.g. on DOM Ready or Window Loaded.
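If you want to check the tag's cookie logic outside the browser, it can be restated as a pure function. This ES5 sketch is hypothetical (the function name and its inputs are my own); it returns the strings the tag would assign to document.cookie. Two design choices are baked in: cookies use path=/ so the PHP file under /gtm/ receives them (a cookie set without a path is scoped to the current page's directory), and an internal navigation keeps an earlier external referrer instead of overwriting it with "direct".

```javascript
// Testable restatement of the tag's cookie logic (ES5, no DOM access).
// search: location.search, referrer: document.referrer, host: location.host
function cookieAssignments(search, referrer, host) {
  var cookies = [];
  var match = /[?&]gclid=([^&#]*)/.exec(search);
  var gclid = match ? decodeURIComponent(match[1].replace(/\+/g, " ")) : "";
  if (gclid) {
    // path=/ so the cookie is also sent to /gtm/recaptcha.php
    cookies.push("_gclid=" + encodeURIComponent(gclid) + "; path=/");
  }
  if (referrer && referrer.indexOf(host) === -1) {
    // External referrer: store it
    cookies.push("_referrer=" + encodeURIComponent(referrer) + "; path=/");
  } else if (!referrer) {
    // No referrer at all: direct visit
    cookies.push("_referrer=direct; path=/");
  }
  // Internal navigation: nothing is set, so the earlier value is kept
  return cookies;
}

console.log(cookieAssignments("?gclid=abc123", "", "www.example.com"));
// [ '_gclid=abc123; path=/', '_referrer=direct; path=/' ]
```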
Create a Google Sheet for logging
Create a new Google Sheets spreadsheet and give it a memorable name, e.g. reCAPTCHA Log.
Add the following headers to the sheet:
Timestamp, IP, User-Agent, Score, URL, GCLID, Referrer
Connect Integromat and Google Sheets
Go back to Integromat and add a Google Sheets module to the scenario (to the right of the webhook you've created).
If this is the first time you’re using Integromat with Google Sheets, you’ll be prompted to grant access to Integromat to your Google account.
Select the action ‘Add a Row’ and then select the relevant spreadsheet (e.g. reCAPTCHA Log) and sheet (e.g. Sheet1).
When the sheet has loaded, you will see the headers you've set under Values (make sure that ‘Table contains headers’ is set to Yes).
You can now map each data point submitted by the webhook to its field:
| Field | Value |
| --- | --- |
| IP | remoteip |
| User-Agent | userAgent |
| Score | botscore |
| URL | URL |
| GCLID | gclid |
| Referrer | referrer |
For the Timestamp field, open the tab with the calendar icon and select ‘now’.
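The "Add a Row" mapping above amounts to reordering the webhook payload into the sheet's column order. As a plain JavaScript function it looks roughly like this; it is a sketch for illustration only (Integromat does this for you), and the ISO timestamp format is my assumption, as Integromat's own "now" may be formatted differently.

```javascript
// Sketch of the Integromat "Add a Row" mapping: webhook payload -> sheet row.
const COLUMNS = ["Timestamp", "IP", "User-Agent", "Score", "URL", "GCLID", "Referrer"];

function toSheetRow(payload, now = new Date()) {
  return [
    now.toISOString(), // Timestamp ("now"; ISO format is an assumption)
    payload.remoteip ?? "", // IP
    payload.userAgent ?? "", // User-Agent
    Number(payload.botscore), // Score
    payload.URL ?? "", // URL
    payload.gclid ?? "", // GCLID (empty for non-ad visits)
    payload.referrer ?? "", // Referrer
  ];
}

const sample = {
  botscore: "0.1",
  remoteip: "203.0.113.7",
  userAgent: "Mozilla/5.0",
  URL: "https://www.example.com/?gclid=g1",
  gclid: "g1",
  referrer: "direct",
};
console.log(toSheetRow(sample, new Date("2021-01-01T00:00:00Z")));
```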
To activate the scenario, save it (disc icon) and turn it on using the toggle in the bottom-left corner of the screen.
Testing the implementation
To test your implementation, we need to access the site using, well, a bot. Regular visits from a user identified as a human will not be logged. The easiest way that I’ve come up with is to simply submit the page for inspection via Google’s Search Console.
In Google Search Console, enter the URL you want to test in the top search bar. I recommend adding the GCLID parameter to it to make sure it's also captured correctly, e.g. https://www.example.com?gclid=12345
After adding in the URL, you can select ‘Test Live URL’ so that Googlebot fetches the page and triggers the reCAPTCHA Log.
You will most likely see two visits logged, as Googlebot comes both as a Desktop and as a Mobile User-Agent.
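Since these test visits land in the same sheet as real suspects, you may want to tag them when reviewing the log. This sketch matches User-Agent substrings; that approach, the function name and the token list are my assumptions (to my knowledge, newer Search Console versions fetch live tests as "Google-InspectionTool" rather than "Googlebot"), and any client can fake these strings, so treat the result as a hint only.

```javascript
// Sketch: flag rows produced by Google's own fetchers (e.g. the Search Console
// test above) so they are not mistaken for ad-click fraud.
// Assumption: User-Agent substring matching; these strings can be spoofed.
const GOOGLE_FETCHER_TOKENS = ["Googlebot", "Google-InspectionTool"];

function googleFetcherType(userAgent) {
  const ua = userAgent || "";
  if (!GOOGLE_FETCHER_TOKENS.some((token) => ua.includes(token))) {
    return null; // not a Google fetcher
  }
  // The smartphone crawler reports a mobile browser ("Mobile Safari")
  return ua.includes("Mobile") ? "mobile" : "desktop";
}

// Example User-Agent strings (Chrome version made up)
const desktopUa =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
const mobileUa =
  "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 " +
  "(KHTML, like Gecko) Chrome/99.0.4844.84 Mobile Safari/537.36 " +
  "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

console.log(googleFetcherType(desktopUa)); // desktop
console.log(googleFetcherType(mobileUa)); // mobile
```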
What can you do with this data?
With this log available, you can run a weekly or monthly report to see how many fraudulent clicks you've received in your Google Ads campaigns (these are the rows that have a GCLID value).
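Such a report boils down to counting rows with a GCLID in a date range. Here is a minimal sketch, assuming you export the sheet (e.g. as CSV) and parse each row into an object keyed by the sheet headers, with ISO 8601 timestamps; the function name and sample rows are hypothetical.

```javascript
// Sketch: count suspected bot visits that came from ad clicks (non-empty GCLID)
// in the half-open range [from, to).
// Assumption: rows are objects keyed by the sheet headers, ISO 8601 timestamps.
function countAdClickBots(rows, from, to) {
  return rows.filter((row) => {
    const t = new Date(row.Timestamp);
    return Boolean(row.GCLID) && t >= from && t < to;
  }).length;
}

const rows = [
  { Timestamp: "2021-03-01T10:00:00Z", GCLID: "a" },
  { Timestamp: "2021-03-02T10:00:00Z", GCLID: "" }, // not an ad click
  { Timestamp: "2021-03-09T10:00:00Z", GCLID: "b" }, // outside the week
];

const weekStart = new Date("2021-03-01T00:00:00Z");
const weekEnd = new Date("2021-03-08T00:00:00Z");
console.log(countAdClickBots(rows, weekStart, weekEnd)); // 1
```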
You can then take the IPs of these users and add them to an IP exclusion list (see Google’s documentation on how to do this).
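Because each campaign accepts up to 500 excluded IP addresses, you may need to prioritize. This sketch (same assumed row format as above, hypothetical function name) keeps only ad-click rows, removes duplicate IPs and puts the most frequent offenders first.

```javascript
// Sketch: unique IPs from ad-click rows, most frequent first, capped at the
// per-campaign exclusion limit (500 addresses, as mentioned earlier).
function ipExclusionList(rows, limit = 500) {
  const counts = new Map();
  for (const row of rows) {
    if (!row.GCLID || !row.IP) continue; // only ad clicks with a known IP
    counts.set(row.IP, (counts.get(row.IP) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // stable sort: ties keep first-seen order
    .slice(0, limit)
    .map(([ip]) => ip);
}

const logRows = [
  { IP: "203.0.113.7", GCLID: "g1" },
  { IP: "198.51.100.4", GCLID: "g2" },
  { IP: "198.51.100.4", GCLID: "g3" },
  { IP: "192.0.2.9", GCLID: "" }, // not an ad click, ignored
];
console.log(ipExclusionList(logRows)); // [ '198.51.100.4', '203.0.113.7' ]
```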
Additionally, you can export the full data and add it to a Click Quality Form submission to highlight the exact clicks you suspect as fraudulent.
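To prepare that submission, you can export only the ad-click rows with the fields the investigation relies on. This is a sketch under assumptions: the column selection is mine (check what the form actually asks for), rows use the sheet headers as keys, and quoting follows the usual CSV convention of wrapping cells that contain commas, quotes or newlines.

```javascript
// Sketch: CSV of ad-click rows with the fields an ITI relies on.
// Assumption: column choice; rows are objects keyed by the sheet headers.
const ITI_COLUMNS = ["Timestamp", "IP", "GCLID", "User-Agent", "URL"];

function csvCell(value) {
  const s = String(value ?? "");
  // Quote cells containing commas, quotes or line breaks; double inner quotes.
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

function toItiCsv(rows) {
  const lines = [ITI_COLUMNS.join(",")];
  for (const row of rows) {
    if (!row.GCLID) continue; // only ad clicks can be reported to Google
    lines.push(ITI_COLUMNS.map((column) => csvCell(row[column])).join(","));
  }
  return lines.join("\n");
}

const exportRows = [
  {
    Timestamp: "2021-03-01T10:00:00Z",
    IP: "203.0.113.7",
    GCLID: "g1",
    "User-Agent": "Mozilla/5.0 (KHTML, like Gecko)",
    URL: "https://www.example.com/?gclid=g1",
  },
  { Timestamp: "2021-03-01T11:00:00Z", IP: "192.0.2.9", GCLID: "" },
];
console.log(toItiCsv(exportRows));
```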
Expansions
You can limit the data logged to visits that contain the GCLID parameter. This filters out standard bots visiting your site, such as Google's and Facebook's crawlers.
You can do this by setting up a filter in Integromat that lets through only hits with a GCLID value. In any case, these are the only visits you can act on in Google Ads.
To create the filter, click on the wrench icon between the two modules and then ‘Set up filter’. Name the filter, e.g. ‘Has GCLID’, and add a condition for GCLID exists (under Basic Operators).
Legal disclaimer
The data logged in this solution is largely similar to the data collected by other web analytics tools (e.g. Google Analytics). The key difference is that the IP addresses of suspected users are visible in the log, much as they are in server logs (the traditional way of analyzing such data). A user's IP address is considered Personally Identifiable Information (PII) and as such should be stored responsibly. I recommend consulting your Data Protection Officer or legal team before using this solution to make sure you comply with current regulations (e.g. GDPR and CCPA).