Hiding Google Analytics Campaign Variables

July 6th, 2009

Do you use a service like Google Analytics for viewing your website statistics? Are you keeping track of your inbound links using campaign variables (utm_source, utm_medium, utm_campaign)? I recently ran into a situation where Google search results were linking to URLs with my campaign variables in them. Not a good thing – it really messes up your stats by reporting Google searches as coming from another source! Not to mention causing duplicate copies of your content to appear in the search listings.

Thankfully there is a quick fix for Google. Adding a canonical link (a <link rel="canonical"> element in the page's <head>) that points at the clean URL will cause Google to re-evaluate the URL the next time your site is indexed. But what about a user copying and pasting a link to another site, or bookmarking that link?
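
As a rough sketch of what that canonical link could look like, here is a small helper that strips the campaign variables from a page URL and builds the tag. The function name canonicalLinkTag is my own invention for illustration, and it assumes an environment with the standard URL class (a modern browser or Node.js).

```javascript
// Hypothetical helper: builds a canonical <link> tag for a page URL,
// dropping the Google Analytics campaign variables from the query string.
function canonicalLinkTag(pageUrl) {
  const campaignKeys = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content'];
  const url = new URL(pageUrl);
  for (const key of campaignKeys) {
    url.searchParams.delete(key);
  }
  url.hash = ''; // the fragment is not part of the canonical URL
  // Escape the characters that would break the attribute value.
  const href = url.toString().replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  return '<link rel="canonical" href="' + href + '" />';
}
```

Any other query parameters (a page id, for example) are left alone, so only the campaign variables disappear from the canonical URL.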

It turns out Google Analytics can parse campaign URLs in two different ways. It can parse them in the query parameters (those variables that come after the ‘?’ in your URLs). Or, it can parse them when they are stored in the fragment after the ‘#’ in your URL. Google provides an API function to enable parsing of the fragment parameters. The function is _setAllowAnchor(true). You insert this just before the call to _trackPageview.

pageTracker._setAllowAnchor(true);
pageTracker._trackPageview();
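
To make the two locations concrete, here is a minimal sketch that pulls the campaign variables out of either the query string or the fragment. It is only an illustration of where the variables live, not how ga.js is implemented; findCampaignVars and its allowAnchor parameter are hypothetical names, and the standard URL and URLSearchParams classes are assumed to be available.

```javascript
// Hypothetical helper, not part of the Google Analytics API.
// Returns the utm_* variables found in the query string, or in the
// fragment when allowAnchor is true.
function findCampaignVars(pageUrl, allowAnchor) {
  const url = new URL(pageUrl);
  // url.search starts with '?' and url.hash with '#', so drop the first character.
  const raw = allowAnchor ? url.hash.slice(1) : url.search.slice(1);
  const found = {};
  for (const [key, value] of new URLSearchParams(raw)) {
    if (key.startsWith('utm_')) {
      found[key] = value;
    }
  }
  return found;
}
```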

In theory, this should work well. Google is not supposed to index the fragment parameters that come after a URL. But what if a user bookmarks the URL?  Or what if they copy-and-paste the URL to digg or another site? This still isn’t going to solve the problem.

Time to rethink. Ideally, the campaign variables would only be available to Google Analytics and not even show in the client’s URL bar. Then they cannot be indexed by search engines and it would be unlikely they’d be copy-and-pasted to another site by the user. Here is a better solution.

When campaign variables are passed to a web page, the PHP page that is loaded can look at the $_GET parameters and detect those variables. It can then remove them, stick them in a session cookie, and redirect the user on to the correct URL – the one without the campaign variables.

The fragment portion of the URL, the part after the ‘#’, can be modified by Javascript. When the redirected page loads, before the Google Analytics code is called, a bit of Javascript code can be used to pull the campaign variables out of the cookie and place them into the fragment portion of the URL. After the Google Analytics code runs, these campaign variables can be removed, the cookie can be deleted, and the original fragment (if there was one) can be restored.

So, putting it all together, here is what happens.

  1. User clicks on a link with campaign variables and visits your website with a URL that looks something like: http://example.com/page/?utm_source=youtube&…
  2. The PHP code running your website detects that the user has clicked on a link with campaign variables, stores those variables in a cookie, and redirects the user to that same URL but without the campaign variables.
  3. The user’s browser visits the new page, without the campaign variables, and passes the cookie along to that page. The new URL looks something like this: http://example.com/page/
  4. The page loads in the user’s browser. As it loads a bit of Javascript runs.  The Javascript adds the campaign variables to the fragment portion of the URL.  At this snapshot in time, the URL looks like this: http://example.com/page/#utm_source=youtube&…
  5. Immediately after the URL is rewritten, the Google Analytics page tracker code runs and credits the source to the intended campaign. Immediately afterward, the custom Javascript erases the variables from the fragment so that the user never sees them, putting the URL back to: http://example.com/page/#
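
The steps above can be sketched as a small function that computes what the URL looks like at each stage. This is only a model of the flow for illustration: campaignUrlStages is a hypothetical name, the standard URL and URLSearchParams classes are assumed, and the landing URL is assumed to have no fragment of its own.

```javascript
// Hypothetical model of the flow: given the landing URL from step 1,
// returns the URL seen at each later step plus the cookie payload.
// Assumes the landing URL carries no fragment of its own.
function campaignUrlStages(landingUrl) {
  const campaignKeys = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content'];
  const url = new URL(landingUrl);
  const campaign = new URLSearchParams();
  for (const key of campaignKeys) {
    const value = url.searchParams.get(key);
    if (value) {
      campaign.set(key, value);     // step 2: remember the variable
      url.searchParams.delete(key); // ...and drop it from the URL
    }
  }
  const cleanUrl = url.toString();
  const cookieValue = campaign.toString();
  return {
    cookieValue: cookieValue,                // step 2: stored in the cookie
    cleanUrl: cleanUrl,                      // step 3: redirect target
    trackingUrl: cleanUrl + '#' + cookieValue, // step 4: what the tracker sees
    finalUrl: cleanUrl + '#'                 // step 5: after the fragment is erased
  };
}
```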

That’s it. Google Analytics gets the proper information to keep track of your campaigns, the user doesn’t see a cluttered URL, and Google doesn’t get a chance to index the page with the campaign variables in the query string.

There is one annoyance. After the Google Analytics page tracker runs and the fragment is erased, a single ‘#’ character is still left in the URL. But at least this won’t cause any harm if the user bookmarks it or copies and pastes it somewhere. If anyone has ideas on how to get rid of this, please leave me feedback in the comments.
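
One possible approach, assuming a browser that implements the HTML5 History API, is to rewrite the address with history.replaceState once tracking is done. The sketch below is untested against the setup described here; dropEmptyHash is a hypothetical name, and it takes the window object as a parameter so it can be tried outside a browser.

```javascript
// Hypothetical helper: removes a bare trailing '#' from the address bar
// when the History API is available. Returns true if the URL was rewritten.
function dropEmptyHash(win) {
  const canReplace = Boolean(win.history) &&
    typeof win.history.replaceState === 'function';
  const loc = win.location;
  // A URL ending in a bare '#' reports an empty hash but keeps the '#' in href.
  const hasBareHash = loc.hash === '' && loc.href.endsWith('#');
  if (!canReplace || !hasBareHash) {
    return false;
  }
  // Replace the current history entry with the same path and query, minus the '#'.
  win.history.replaceState(null, '', loc.pathname + loc.search);
  return true;
}
```

In a page it would be called as dropEmptyHash(window) right after the fragment is restored. Browsers without the History API simply keep the trailing ‘#’.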

Now, if only this could be incorporated into Joost de Valk’s wonderful Google Analytics for WordPress plugin! I’ve modified my copy to do this already. See the attached Google Analytics for WordPress Modifications. This isn’t a complete plugin, only a modification to the source file for version 2.9.5 of the official plugin.
[Update: See the comments below. Adding this to Google Analytics for WordPress might not be that useful]

Lastly, the code :) Here is the bit of PHP code that detects the Google Analytics variables, sets the cookie, and redirects the user to a “clean” URL. If you have run into a similar situation and solved it a different/better way, please leave a comment and let me know what you did. I’m very interested in knowing if this could be done a better way!

// Add any Google Analytics Campaign variables to the found_tags array.
// Remove them from the _GET array so they don't get forwarded on
$found_tags = array();
foreach(array('utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content') as $tag) {
    if(!empty($_GET[$tag])) {
        $found_tags[$tag] = $_GET[$tag];
        unset($_GET[$tag]);
    }
}
 
// If any campaign variables were found, redirect the user to the "clean" URL
// after setting the 'gatmp' session cookie with the campaign variables.
if(count($found_tags) > 0) {
    setcookie('gatmp', http_build_query($found_tags));
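    // Note: SCRIPT_URI is set by Apache's mod_rewrite; on other setups the
    // URL may need to be built from HTTP_HOST and REQUEST_URI instead.
    $dest = $_SERVER['SCRIPT_URI'];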
    if( count($_GET) > 0 ) {
        $dest .= '?'.http_build_query($_GET);
    }
    header ('HTTP/1.1 301 Moved Permanently');
    header ('Location: '.$dest);
    exit(0);
}
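
One detail worth knowing: PHP's setcookie() URL-encodes the cookie value, so the browser sees something like gatmp=utm_source%3Dyoutube%26utm_medium%3Dvideo and the value has to be decoded before use. Here is a minimal sketch of reading such a cookie by its exact name. readCookie is a hypothetical helper, and it takes the cookie string as a parameter (normally document.cookie) so it can run anywhere.

```javascript
// Hypothetical helper: returns the decoded value of the named cookie from a
// document.cookie-style string, or null when the cookie is absent.
function readCookie(cookieString, name) {
  for (const pair of cookieString.split(';')) {
    const trimmed = pair.trim();
    const eq = trimmed.indexOf('=');
    // Compare the full cookie name so 'xgatmp' is not mistaken for 'gatmp'.
    if (eq !== -1 && trimmed.slice(0, eq) === name) {
      return decodeURIComponent(trimmed.slice(eq + 1));
    }
  }
  return null;
}
```

In a page this would be called as readCookie(document.cookie, 'gatmp').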

Next, here is the Javascript that detects the cookie and passes the campaign variables on to Google Analytics. This code takes the place of the normal pageTracker._trackPageview() function call.

function gaTrackerClass() {
  this.cookieVal = false;
 
  // Grab the cookie, if it exists, store in this.cookieVal
  if (typeof(document.cookie) != "undefined" && document.cookie.length > 0) {
    var c_name = 'gatmp'; // Cookie name
    var c_start=document.cookie.indexOf(c_name + "=");
    if (c_start!=-1) {
      var v_start=c_start + c_name.length+1;
      var v_end=document.cookie.indexOf(";",v_start);
      if (v_end==-1) v_end=document.cookie.length;
      this.cookieVal = unescape(document.cookie.substring(v_start,v_end));
      // Unset the cookie so it doesn't get used multiple times
      document.cookie = c_name + "=; expires=Thu, 01-Jan-1970 00:00:10 GMT";
    }
  }
 
  // Our _trackPageview function. It emulates the behavior of the Google
  // function, using the cookie rather than query parameters in the URL.
  // If no cookie is found, just call the normal _trackPageview function
  this._trackPageview = function(str) {
    if( typeof(pageTracker) != "undefined" ) {
      if(this.cookieVal != false && typeof(window.location) != "undefined") {
        // Save the current fragment
        var hashtmp = window.location.hash;
 
        // Call Google Analytics and record the campaign variables
        window.location.hash = '#' + this.cookieVal;
        pageTracker._setAllowAnchor(true);
        pageTracker._trackPageview(str);
 
        // Restore the fragment to its original value
        window.location.hash = hashtmp;
      } else {
        pageTracker._trackPageview(str);
      }
    }
  };
}
var gaTracker = new gaTrackerClass();
gaTracker._trackPageview();
Categories: Tutorial