Hiding Google Analytics Campaign Variables
Do you use a service like Google Analytics for viewing your website statistics? Are you keeping track of your inbound links using campaign variables (utm_source, utm_medium, utm_campaign)? I recently ran into a situation where Google search results were linking to URLs with my campaign variables in them. Not a good thing – it really messes up your stats by reporting Google searches as coming from another source! Not to mention causing duplicate copies of your content to appear in the search listings.
Thankfully there is a quick fix for Google. Adding a canonical link to the page header (a <link rel="canonical"> element in the <head> that points at the URL without the campaign variables) will cause Google to re-evaluate the URL the next time your site is indexed. But what about a user copy-and-pasting a link to another site, or bookmarking that link?
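For illustration, here is a minimal sketch of how the canonical URL could be derived: strip the utm_ campaign variables (and any fragment) from the requested URL and emit the link element. The helper names canonicalUrl and canonicalLinkTag, and the list of five utm_ parameters, are assumptions for this example; most sites get the canonical URL from their CMS instead.

```javascript
// Hypothetical helpers: build a canonical <link> tag by removing the Google
// Analytics campaign variables (and any fragment) from a page URL.
const CAMPAIGN_PARAMS = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content'];

function canonicalUrl(pageUrl) {
  const url = new URL(pageUrl);
  const params = new URLSearchParams(url.search);
  for (const name of CAMPAIGN_PARAMS) {
    params.delete(name);
  }
  url.search = params.toString(); // an empty string drops the '?' entirely
  url.hash = '';                  // the fragment is not part of the canonical URL
  return url.toString();
}

function canonicalLinkTag(pageUrl) {
  // Minimal attribute escaping for the href value.
  const href = canonicalUrl(pageUrl).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  return '<link rel="canonical" href="' + href + '">';
}

console.log(canonicalLinkTag('http://example.com/page/?utm_source=youtube&utm_medium=video&id=7'));
// → <link rel="canonical" href="http://example.com/page/?id=7">
```

Non-campaign parameters such as id=7 are kept, since they may select different content.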
It turns out Google Analytics can read campaign variables from two places. It can parse them from the query parameters (the variables that come after the ‘?’ in your URLs), or from the fragment after the ‘#’ in your URL. Parsing of the fragment is off by default; Google provides the API function _setAllowAnchor(true) to enable it. You insert this just before the call to _trackPageview:
```javascript
pageTracker._setAllowAnchor(true);
pageTracker._trackPageview();
```
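Links for this style of tracking simply carry the campaign variables after the ‘#’ instead of after the ‘?’. A minimal sketch, assuming the same utm_ names the PHP code below looks for; anchorCampaignUrl is a hypothetical helper, not part of Google Analytics:

```javascript
// Hypothetical helper: build a campaign link that carries the utm_* variables
// in the fragment (after '#') instead of the query string (after '?').
// Google Analytics only reads this form when _setAllowAnchor(true) is called
// before _trackPageview().
function anchorCampaignUrl(pageUrl, campaign) {
  const url = new URL(pageUrl);
  const fragment = new URLSearchParams();
  for (const [name, value] of Object.entries(campaign)) {
    fragment.append('utm_' + name, value); // e.g. source -> utm_source
  }
  url.hash = fragment.toString(); // replaces any existing fragment
  return url.toString();
}

console.log(anchorCampaignUrl('http://example.com/page/', { source: 'youtube', medium: 'video' }));
// → http://example.com/page/#utm_source=youtube&utm_medium=video
```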
In theory, this should work well. Google is not supposed to index the fragment that comes after the ‘#’ in a URL. But what if a user bookmarks the URL? Or what if they copy-and-paste it to Digg or another site? The campaign variables travel along with the link, so this still doesn’t solve the problem.
Time to rethink. Ideally, the campaign variables would only be available to Google Analytics and not even show in the client’s URL bar. Then they cannot be indexed by search engines and it would be unlikely they’d be copy-and-pasted to another site by the user. Here is a better solution.
When campaign variables are passed to a web page, the PHP page that is loaded can look at the $_GET parameters and detect those variables. It can then remove them, stick them in a session cookie, and redirect the user on to the correct URL – the one without the campaign variables.
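The same server-side step can be expressed in Javascript, for example for a Node server (the author’s PHP version appears at the end of this post). This is a sketch under the assumption that the server can see the full request URL; splitCampaign is a hypothetical name. It separates the campaign variables, serialized for the temporary cookie, from the clean URL to redirect to:

```javascript
// Split an incoming URL into (a) the campaign variables, serialized for a
// temporary cookie, and (b) the "clean" URL to redirect the visitor to.
// Returns null when no campaign variables are present (no redirect needed).
const CAMPAIGN_PARAMS = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content'];

function splitCampaign(requestUrl) {
  const url = new URL(requestUrl);
  const kept = new URLSearchParams(url.search);
  const found = new URLSearchParams();
  for (const name of CAMPAIGN_PARAMS) {
    const value = kept.get(name);
    if (value) {              // skip missing or empty values
      found.append(name, value);
      kept.delete(name);      // don't forward it in the redirect
    }
  }
  if (found.toString() === '') {
    return null;              // nothing to hide, serve the page normally
  }
  url.search = kept.toString(); // '' drops the '?' entirely
  return { cookieValue: found.toString(), redirectTo: url.toString() };
}

console.log(splitCampaign('http://example.com/page/?utm_source=youtube&p=2&utm_medium=video'));
```

The cookie value contains ‘=’ and ‘&’, so it should be URL-encoded (for instance with encodeURIComponent) before it goes into a Set-Cookie header; PHP’s setcookie() does that encoding automatically.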
The fragment portion of the URL, the part after the ‘#’, can be modified by Javascript. When the redirected page loads, before the Google Analytics code is called, a bit of Javascript code can be used to pull the campaign variables out of the cookie and place them into the fragment portion of the URL. After the Google Analytics code runs, these campaign variables can be removed, the cookie can be deleted, and the original fragment (if there was one) can be restored.
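The save, swap, track, restore sequence can be captured in a small function. This is a sketch, not the author’s code: loc is anything with a writable hash property (window.location in a browser), and track is whatever records the pageview, for example a function that calls pageTracker._trackPageview(). The try/finally makes sure the original fragment comes back even if the tracker throws.

```javascript
// Temporarily put the campaign variables in the fragment, record the
// pageview, then restore the original fragment no matter what happens.
function trackWithTemporaryHash(loc, campaignQuery, track) {
  const savedHash = loc.hash;       // remember the original fragment
  loc.hash = '#' + campaignQuery;   // expose the campaign variables briefly
  try {
    track();                        // e.g. () => pageTracker._trackPageview()
  } finally {
    loc.hash = savedHash;           // always put the original fragment back
  }
}

// Usage with a stand-in location object:
const fakeLocation = { hash: '#photo=12' };
trackWithTemporaryHash(fakeLocation, 'utm_source=youtube', () => console.log(fakeLocation.hash));
console.log(fakeLocation.hash);
```

Two caveats worth testing in real browsers: each assignment to location.hash adds a Back-button history entry (location.replace('#' + campaignQuery) should avoid that), and a page that listens for hashchange events may also see the temporary campaign fragment.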
So, putting it all together, here is what happens.
- User clicks on a link with campaign variables and visits your website with a URL that looks something like: http://example.com/page/?utm_source=youtube&…
- The PHP code running your website detects that the user has clicked on a link with campaign variables, stores those variables in a cookie, and redirects the user to that same URL but without the campaign variables.
- The user’s browser visits the new page, without the campaign variables, and passes the cookie along to that page. The new URL looks something like this: http://example.com/page/
- The page loads in the user’s browser. As it loads, a bit of Javascript runs. The Javascript adds the campaign variables to the fragment portion of the URL. At this snapshot in time, the URL looks like this: http://example.com/page/#utm_source=youtube&…
- Right after the URL is rewritten, the Google Analytics page tracker code runs and credits the visit to the intended campaign. The custom Javascript then erases the variables from the fragment so the user never sees them, putting the URL back to: http://example.com/page/#
That’s it. Google Analytics gets the proper information to keep track of your campaigns, the user doesn’t see a cluttered URL, and Google doesn’t get a chance to index the page with the campaign variables in the query string.
There is one annoyance. After the Google Analytics page tracker runs and the fragment is erased, a lone ‘#’ is still left at the end of the URL. At least this won’t cause any harm if the user bookmarks it or copy-and-pastes it somewhere. If anyone has ideas on how to get rid of this, please leave me feedback in the comments.
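One possible way to drop the stray ‘#’, assuming the browser supports the HTML5 History API (older browsers do not): history.replaceState() can rewrite the address bar to the same URL without the fragment, with no reload and no new history entry. The helper names below are hypothetical. The check that location.hash is empty keeps a real fragment, such as a photo ID, from ever being removed.

```javascript
// Return the same URL with its fragment (and the '#') removed.
function urlWithoutFragment(href) {
  const url = new URL(href);
  url.hash = ''; // an empty hash drops the '#' as well
  return url.toString();
}

// If the address bar ends in a bare '#', replace the current history entry
// with the same URL minus the '#'. Does nothing without replaceState support.
function clearEmptyFragment(win) {
  const bareHash = win.location.hash === '' && win.location.href.endsWith('#');
  if (bareHash && win.history && typeof win.history.replaceState === 'function') {
    win.history.replaceState(win.history.state, '', urlWithoutFragment(win.location.href));
  }
}

// In a browser: call clearEmptyFragment(window) right after the fragment is restored.
console.log(urlWithoutFragment('http://example.com/page/#'));
// → http://example.com/page/
```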
Now, if only this could be incorporated into Joost de Valk’s wonderful Google Analytics for WordPress plugin! I’ve modified my copy to do this already. See the attached Google Analytics for WordPress Modifications. This isn’t a complete plugin, only a modification to the source file for version 2.9.5 of the official plugin.
[Update: See the comments below. Adding this to Google Analytics for WordPress might not be that useful]
Lastly, the code
Here is the bit of PHP code that detects the Google Analytics variables, sets the cookie, and redirects the user to a “clean” URL. Because it sets a cookie and sends a redirect header, it has to run before the page produces any output. If you have run into a similar situation and solved it a different or better way, please leave a comment and let me know what you did. I’m very interested in knowing if this could be done a better way!
```php
// Add any Google Analytics campaign variables to the $found_tags array and
// remove them from $_GET so they don't get forwarded in the redirect.
$found_tags = array();
foreach (array('utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content') as $tag) {
    if (!empty($_GET[$tag])) {
        $found_tags[$tag] = $_GET[$tag];
        unset($_GET[$tag]);
    }
}

// If any campaign variables were found, set the 'gatmp' session cookie with
// the campaign variables and redirect the user to the "clean" URL.
if (count($found_tags) > 0) {
    setcookie('gatmp', http_build_query($found_tags));
    // $_SERVER['SCRIPT_URI'] is set by Apache's mod_rewrite and is not
    // available on every server, so take the path from REQUEST_URI instead.
    $dest = strtok($_SERVER['REQUEST_URI'], '?');
    if (count($_GET) > 0) {
        $dest .= '?' . http_build_query($_GET);
    }
    header('Location: ' . $dest, true, 301);
    exit(0);
}
```
Next, here is the Javascript that detects the cookie and passes the campaign variables on to Google Analytics. This code takes the place of the normal pageTracker._trackPageview() function call.
```javascript
function gaTrackerClass() {
    this.cookieVal = false;

    // Grab the 'gatmp' cookie, if it exists, and store its value in this.cookieVal
    if (typeof(document.cookie) != "undefined" && document.cookie.length > 0) {
        var c_name = 'gatmp'; // Cookie name
        var c_start = document.cookie.indexOf(c_name + "=");
        if (c_start != -1) {
            var v_start = c_start + c_name.length + 1;
            var v_end = document.cookie.indexOf(";", v_start);
            if (v_end == -1) v_end = document.cookie.length;
            // PHP's setcookie() URL-encodes the value; decode it here
            this.cookieVal = decodeURIComponent(document.cookie.substring(v_start, v_end));
            // Unset the cookie so it doesn't get used multiple times
            document.cookie = c_name + "=; expires=Thu, 01-Jan-1970 00:00:10 GMT";
        }
    }

    // Our _trackPageview function. It emulates the behavior of the Google
    // function, using the cookie rather than query parameters in the URL.
    // If no cookie is found, it just calls the normal _trackPageview function.
    this._trackPageview = function(str) {
        if (typeof(pageTracker) != "undefined") {
            if (this.cookieVal != false && typeof(window.location) != "undefined") {
                // Save the current fragment
                var hashtmp = window.location.hash;
                // Put the campaign variables in the fragment, tell Google
                // Analytics to parse the fragment, and record the pageview
                window.location.hash = '#' + this.cookieVal;
                pageTracker._setAllowAnchor(true);
                pageTracker._trackPageview(str);
                // Restore the fragment to its original value
                window.location.hash = hashtmp;
            } else {
                pageTracker._trackPageview(str);
            }
        }
    };
}

var gaTracker = new gaTrackerClass();
gaTracker._trackPageview();
```
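One weak spot in the cookie lookup above is that indexOf('gatmp=') also matches a longer cookie name that merely ends in "gatmp". A more careful reader, sketched here as a hypothetical readCookie helper, splits the cookie string on ‘;’ and compares whole names:

```javascript
// Return the decoded value of cookie `name` from a document.cookie-style
// string, or null if it is absent. Whole-name comparison avoids matching
// a longer cookie name such as "xgatmp".
function readCookie(cookieString, name) {
  for (const part of cookieString.split(';')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue;                       // not a name=value pair
    if (part.slice(0, eq).trim() === name) {
      return decodeURIComponent(part.slice(eq + 1).trim());
    }
  }
  return null;
}

console.log(readCookie('xgatmp=nope; gatmp=utm_source%3Dyoutube%26utm_medium%3Dvideo', 'gatmp'));
// → utm_source=youtube&utm_medium=video
```

In the browser, readCookie(document.cookie, 'gatmp') could stand in for the indexOf block inside gaTrackerClass.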
Now I haven’t had any coffee yet, but how is the result of this different from doing campaign tracking with # other than the bookmarking aspect? And how much of an issue is the bookmarking aspect of this, really? It might actually be valuable data if you look at it from another point of view.
@Joost de Valk
One of the reasons for this post was to get suggestions on how this could be done better. MANY thanks for the comment! Most people will find campaign tracking with the ‘#’ URL fragment will solve any issues with search engines indexing their content. And yes, now that I look at it, knowing that a link from a tweet was reposted to Digg could be valuable information! So hiding the campaign variables from the user might not always be a good idea. That’s a very good point! Thank you for mentioning it.
I probably got started down this path the same way most people do. I noticed my campaign links showing up in Google search results and I panicked.
I didn’t realize that it takes time for the Google crawler to process the canonical header link and update their index. In my case it took about 3-4 days.
My first approach to solving this was exactly as you suggested. I checked the box labeled ‘Use # instead of ? for Campaign tracking?’ in your Google Analytics for WordPress plugin. Then I started adjusting all my existing redirects to use ‘#’ rather than ‘?’. When I changed my Apache mod_rewrite rules, I found that Apache escaped the ‘#’ and turned it into %23. I thought I might run into similar problems with other websites not accepting campaign tracking with ‘#’, so I figured it wouldn’t work for me. (Note: I know better now. For anyone else looking for the answer, add the NE (noescape) flag to the RewriteRule to prevent Apache from turning the ‘#’ into %23.)
Another motivation for not using the ‘#’ hash fragment for the campaign tracking was due to a photography website that I’m currently putting together. The photography website uses an AJAX interface to allow switching between images quickly without doing a complete page reload (similar to what is done on smugmug.com). To keep a proper history state in the client’s browser, this site stores its photoID (identifies a specific/unique photo in a gallery) in the ‘#’ fragment. This modifies the URL so the browser’s ‘Back’ and ‘Forward’ buttons work correctly – and, since it only modifies the ‘#’ fragment, it doesn’t trigger a complete reload of the web page. It is also used so that the client can add a bookmark while looking at a specific image and, when they visit that bookmark, the site will fetch that same image.
That’s probably a longer description than needed, but using the ‘#’ fragment for campaign tracking on this photo site would interfere with the AJAX navigation that is also done using the ‘#’ fragment. I was looking for a solution that would keep search engines from seeing (and indexing) the campaign variables while at the same time not interfering with the ‘#’ based navigation features of the photo site. And that is how I arrived at this solution.
Have you tried passing the campaign information directly into the GA _trackPageview call? I'm looking at using a tidy URL that doesn't expose marketing information to the end user, e.g. mysite.com/?campaign=1, rather than mysite.com/?utm_source=…, etc.
So the key would be to override the query parameters in the _trackPageview call directly, e.g. _trackPageview('/?utm_source=…'), rather than setting the codes on the page URL. Have you tried this technique yourself?
Looks like I'm in exactly this situation… I want to use bit.ly to redirect users to the right URL. But how exactly would you set everything up so the campaign information doesn't show in the URL? Can you be a little more specific? Sounds very interesting…
I don't see why anything more than the canonical URL tag is necessary. In fact, this is a great use for that tag, and more necessary than most other uses I can think of. As for Google Analytics for WordPress: am I to understand it already uses the # instead of the ? version of the code?
Forgot to say this is a really great post which covers the whole thing very nicely. I've searched through the net for a long time, but I didn't find anything similar to your post about hiding the campaign variables… Thanks a lot!