<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Steve Taylor &#187; search engines</title>
	<atom:link href="http://sltaylor.co.uk/blog/category/search-engines/feed/" rel="self" type="application/rss+xml" />
	<link>http://sltaylor.co.uk</link>
	<description>Freelance WordPress developer in London - XHTML, CSS &#38; design</description>
	<lastBuildDate>Wed, 25 Apr 2012 20:43:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>Double slashes in Analytics URLs</title>
		<link>http://sltaylor.co.uk/blog/double-slashes-in-analytics-urls/</link>
		<comments>http://sltaylor.co.uk/blog/double-slashes-in-analytics-urls/#comments</comments>
		<pubDate>Sun, 18 Oct 2009 15:22:25 +0000</pubDate>
		<dc:creator>Steve Taylor</dc:creator>
				<category><![CDATA[search engines]]></category>
		<category><![CDATA[WordPress]]></category>

		<guid isPermaLink="false">http://sltaylor.co.uk/?p=215</guid>
		<description><![CDATA[I&#8217;ve just been dealing with an issue on a site where Google Analytics is logging a lot of pages twice, once normally and once with a double slash (&#8220;//&#8221;) at the end. Obviously this is worrying. If Google is seeing the same page in two &#8220;places&#8221; via two technically different URLs, duplicate content penalties and PageRank squandering [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/wp-content/uploads/2009/10/double-slash.gif" alt="double-slash" width="254" height="204" class="alignright size-full wp-image-216" /></p>
<p>I&#8217;ve just been dealing with an issue on a site where Google Analytics is logging a lot of pages twice, once normally and once with a double slash (&#8220;//&#8221;) at the end.</p>
<p>Obviously this is worrying. If Google is seeing the same page in two &#8220;places&#8221; via two technically different URLs, duplicate content penalties and PageRank squandering are distinct possibilities. It also seems to break a lot of the Analytics &#8220;Site Overlay&#8221; functionality.</p>
<p>Here I&#8217;ll go through what I did to isolate the cause of the issue, and the approaches I&#8217;ve taken to fix it.</p>
<p><span id="more-215"></span></p>
<h2>Ruling out the obvious</h2>
<p>Naturally I combed through our <code>sitemap.xml</code> (generated by the Google XML Sitemaps WordPress plugin). No double slashes in there.</p>
<p>I also did a <code>site:domain.com</code> search in Google and went through all the returned URLs. No double slashes there either. This suggests that the worst possibility is thankfully not an issue: Google doesn&#8217;t appear to be <em>indexing</em> multiple versions of the same pages. It looks like an Analytics-specific problem.</p>
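<p>Combing through a long list of URLs by eye is tedious. As a rough alternative, here&#8217;s a minimal PHP sketch of the same check, assuming you&#8217;ve already collected the URLs (from the sitemap, a <code>site:</code> search or a crawl) into an array. <code>find_double_slash_urls()</code> is a hypothetical helper, not part of WordPress or any plugin. Using <code>parse_url()</code> means the <code>//</code> after <code>http:</code> isn&#8217;t flagged.</p>
<pre name="code" class="php">// Hypothetical helper: return every URL whose path contains "//".
// parse_url() isolates the path, so the "//" after "http:" is ignored.
function find_double_slash_urls( array $urls ) {
	$bad = array();
	foreach ( $urls as $url ) {
		$path = parse_url( $url, PHP_URL_PATH );
		// parse_url() returns null (no path) or false (malformed URL)
		if ( is_string( $path ) && strpos( $path, '//' ) !== false ) {
			$bad[] = $url;
		}
	}
	return $bad;
}

$urls = array(
	'http://example.com/',
	'http://example.com/about/',
	'http://example.com/about//',
);
print_r( find_double_slash_urls( $urls ) ); // only the last URL is listed</pre>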
<h2>Possible cause #1: Bad incoming links</h2>
<p>I don&#8217;t think this is the issue. I pinpointed the referring URLs for a few of the double-slashed entries in our Analytics, and the links are fine.</p>
<h2>Possible cause #2: Bad .htaccess redirects</h2>
<p>There are quite a few 301 redirects in our <code>.htaccess</code> file, because this site is a revamp where many URLs have changed. I was worried that some might be badly formed and redirecting to URLs with extra slashes. However, the test above pretty much rules this out too: clicking through the referring links worked fine. If a redirect were adding slashes to incoming links, that would have shown up when following those links in a browser.</p>
<h2>Probable causes: A single bad internal URL and an Analytics mystery</h2>
<p>I used <a href="http://home.snafu.de/tilman/xenulink.html">Xenu</a> to check internal links, and found <em>one</em> internal link with a double slash at the end. It was the top-left link back to the front page on the WordPress login page of a sub-site, which runs its own WP installation separate from the main site in the root.</p>
<p>This seemed to point the finger at the generally excellent <a href="http://www.qianqin.de/qtranslate/">qTranslate</a> plugin, which is installed on this sub-site but nowhere else. The &#8220;back to blog&#8221; links on the other WP installations were fine.</p>
<p>As part of handling multiple language versions of the same WP content, qTranslate uses URL suffixes such as <code>/de/</code>. This &#8220;back to blog&#8221; link is output with the WP <code>bloginfo('url')</code> function, followed by a hard-coded trailing slash. The function returns the blog URL entered in WP&#8217;s settings, which should not have a trailing slash. There was indeed no trailing slash in our setting, so qTranslate&#8217;s filtering of <code>bloginfo('url')</code> appears to be adding a trailing slash where none is expected.</p>
<p>I&#8217;ve no idea how, but it seems that this single instance of a double-slash was being picked up by Analytics, and was proliferating through other logged data.</p>
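<p>The template side of this is easy to reproduce in isolation. In this sketch (plain PHP, no WordPress; both function names and the <code>example.com</code> URLs are made up for illustration), a template appends its own slash to a URL it assumes has none, which goes wrong as soon as a filter has already added one:</p>
<pre name="code" class="php">// Hypothetical template code: assumes $blog_url has no trailing slash,
// as the WordPress "url" setting normally doesn't.
function back_to_blog_link( $blog_url ) {
	return $blog_url . '/';
}

// Defensive variant: strip any trailing slashes before appending one.
function back_to_blog_link_safe( $blog_url ) {
	return rtrim( $blog_url, '/' ) . '/';
}

echo back_to_blog_link( 'http://example.com/sub' ), "\n";       // http://example.com/sub/
echo back_to_blog_link( 'http://example.com/sub/' ), "\n";      // http://example.com/sub//
echo back_to_blog_link_safe( 'http://example.com/sub/' ), "\n"; // http://example.com/sub/</pre>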
<h2>Solution #1: Fixing the trailing slash in WordPress</h2>
<p>The first step was to remove the extra slash from the output of the <code>bloginfo('url')</code> function. Judging by the qTranslate forums, others have noticed this issue too. Hopefully it&#8217;ll be fixed soon, but until then, placing this code in your WordPress theme&#8217;s <code>functions.php</code> file should make sure this function returns the right URL:</p>
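<pre name="code" class="php">// Strip the trailing slash that qTranslate's filtering adds to bloginfo('url').
// WordPress runs the "bloginfo_url" filter on URL-type values output by bloginfo().
function fixblogInfoURL( $result = '' ) {
	// rtrim() removes any trailing slashes, so templates can append their own
	return rtrim( $result, '/' );
}
add_filter( 'bloginfo_url', 'fixblogInfoURL' );</pre>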
<h2>Solution #2: An Analytics filter</h2>
<p><a href="http://www.google.com/support/forum/p/Google+Analytics/thread?tid=132fe1bb420478d3&#038;hl=en">This Analytics support thread</a> ends with a suggestion to add a search/replace filter to your Analytics profile. As all my testing seems to show this is a problem internal to the Analytics system, this seems like a good approach. I&#8217;ve only just set this up, so I&#8217;ll report back if any problems come up. Please let me know how it&#8217;s worked (or not) for you!</p>
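<p>I won&#8217;t try to reproduce the exact filter settings from the thread here, but the idea is a regular-expression search and replace on the Request URI. As a rough model only (this is PHP, not anything Analytics actually runs, and the pattern is my own assumption), collapsing a trailing run of slashes looks like this:</p>
<pre name="code" class="php">// Rough model of a search/replace filter on the Request URI field.
// Assumed pattern: two or more slashes at the very end become one.
function collapse_trailing_slashes( $request_uri ) {
	return preg_replace( '#/{2,}$#', '/', $request_uri );
}

echo collapse_trailing_slashes( '/about//' ), "\n"; // /about/
echo collapse_trailing_slashes( '/about/' ), "\n";  // /about/ (unchanged)
echo collapse_trailing_slashes( '/a//b/' ), "\n";   // /a//b/ (only the end is touched)</pre>
<p>Because the pattern is anchored to the end of the string, a URI with a query string after the double slash (e.g. <code>/about//?p=2</code>) wouldn&#8217;t be caught by it.</p>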
<h2>Solution #3? Rewriting</h2>
<p>Searching for this issue brings up plenty of suggestions for <code>.htaccess</code> rewrite code that replaces double slashes with single slashes.</p>
<p>I can see the logic in this, but as my testing seems to indicate that this is a case of one or two bad URLs mysteriously spreading through the Analytics system (and not through Google&#8217;s index or anywhere else), it seems sufficient to isolate those bad URLs, fix them, and add an Analytics filter.</p>
<p>Again, I&#8217;ll report back if this rewrite approach ends up being necessary; and again, let me know your experiences if they differ.</p>
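<p>For comparison, the logic such a rewrite implements can be sketched in PHP. This isn&#8217;t any particular <code>.htaccess</code> rule, just an illustration: <code>double_slash_redirect_target()</code> is a hypothetical function that returns the cleaned-up URL to 301-redirect to, or <code>null</code> if the path is already fine. It only touches the path, leaving any query string alone.</p>
<pre name="code" class="php">// Hypothetical PHP equivalent of a "collapse double slashes" rewrite.
// Returns the URL to redirect to, or null if nothing needs fixing.
function double_slash_redirect_target( $request_uri ) {
	// Only normalise the path; leave any query string untouched
	$parts = explode( '?', $request_uri, 2 );
	$path  = preg_replace( '#/{2,}#', '/', $parts[0] );
	if ( $path === $parts[0] ) {
		return null;
	}
	return isset( $parts[1] ) ? $path . '?' . $parts[1] : $path;
}

// Run early, before any output is sent (not WordPress-specific):
// $target = double_slash_redirect_target( $_SERVER['REQUEST_URI'] );
// if ( $target !== null ) {
//	header( 'Location: ' . $target, true, 301 );
//	exit;
// }</pre>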
<hr />
<p><b class="alert">UPDATE:</b> I&#8217;ve decided it&#8217;s best to include a <code>.htaccess</code> rewrite just in case. Thomas Scholz suggests one way below. I&#8217;ve actually ended up using <a href="http://yoast.com/trailing-double-forward-slashes-in-urls-in-the-serps/">this one at yoast.com</a>.</p>
<h2>Postscript: The Default Page setting</h2>
<p><i>9/11/09:</i> Thanks to <a href="/blog/double-slashes-in-analytics-urls/#comment-2399">Gavin Doolan&#8217;s comment</a>, this problem has become a little clearer. On the Google Analytics profile in question, the &#8220;Default page&#8221; setting was set to &#8220;/&#8221; (one forward slash). It now seems this was wrong, and at least part of the problem.</p>
<p>Since I implemented the fixes above, our stats have stopped registering double-slash URLs completely (even with the default page still set to &#8220;/&#8221;). They&#8217;re still in the stats from before the fixes were applied, but hits dropped to zero afterwards. Maybe the rewrite was stopping Analytics from registering double-slash URLs. Maybe some factor other than the default page setting was at work (which certainly seems possible, given the confusion I&#8217;ve seen on forums about this issue).</p>
<p>However, even with the double slashes not registering as hits, the Site Overlay feature just wasn&#8217;t working. It registered no hits on any links, and when you hovered over a link, there at the end of its URL was the dreaded double slash. Now that the default page is left blank, Site Overlay is back to normal. I suspect and hope that this is the last nail in this issue&#8217;s coffin!</p>
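<p>Why would a default page of &#8220;/&#8221; cause this? My understanding is that Analytics appends the default page to any request path that ends in a slash, so that <code>/about/</code> and <code>/about/index.php</code> are counted as the same page. With the setting at &#8220;/&#8221;, <code>/about/</code> would become <code>/about//</code>. Here&#8217;s a rough PHP model of that behaviour; it&#8217;s an assumption about what Analytics does, not its actual code, and <code>apply_default_page()</code> is a made-up name:</p>
<pre name="code" class="php">// Rough model of the profile's "Default page" setting: a request path
// ending in "/" gets the default page appended to it.
function apply_default_page( $path, $default_page ) {
	if ( $default_page !== '' && substr( $path, -1 ) === '/' ) {
		return $path . $default_page;
	}
	return $path;
}

echo apply_default_page( '/about/', 'index.php' ), "\n"; // /about/index.php
echo apply_default_page( '/about/', '/' ), "\n";         // /about//
echo apply_default_page( '/about/', '' ), "\n";          // /about/ (setting left blank)</pre>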
]]></content:encoded>
			<wfw:commentRss>http://sltaylor.co.uk/blog/double-slashes-in-analytics-urls/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Google, SEO &amp; CSS image replacement</title>
		<link>http://sltaylor.co.uk/blog/google-seo-css-image-replacement/</link>
		<comments>http://sltaylor.co.uk/blog/google-seo-css-image-replacement/#comments</comments>
		<pubDate>Sat, 09 Jun 2007 12:06:37 +0000</pubDate>
		<dc:creator>Steve Taylor</dc:creator>
				<category><![CDATA[search engines]]></category>
		<category><![CDATA[XHTML/CSS]]></category>

		<guid isPermaLink="false">http://sltaylor.co.uk/blog/2007/06/google-seo-css-image-replacement/</guid>
		<description><![CDATA[I&#8217;ve just been reading about possible clashes between the CSS &#8220;image replacement&#8221; technique that I use and Google&#8217;s rules about spam techniques. Image replacement involves using CSS to hide the text for an element (e.g. a &#60;h1&#62;), and setting the background-image for that element to replace it with an image. Users with visual browsers with [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve just been reading about possible clashes between the <a href="http://www.mezzoblue.com/tests/revised-image-replacement/">CSS &#8220;image replacement&#8221; technique</a> that I use and Google&#8217;s rules about spam techniques.</p>
<p>Image replacement involves using CSS to hide the text for an element (e.g. a <code>&lt;h1&gt;</code>), and setting the <code>background-image</code> for that element to replace it with an image. Users of visual browsers with CSS enabled see the image; text-only browsers, bots, etc. just see plain text.</p>
<p>It&#8217;s not without its detractors and slight drawbacks, but it&#8217;s a widespread technique. At the time of writing, a quick scan of big-name sites found it in use on <a href="http://www.stopdesign.com/">stopdesign.com</a>, <a href="http://www.mezzoblue.com/">mezzoblue.com</a> and <a href="http://www.adobe.com/">adobe.com</a>.</p>
<p><span id="more-21"></span></p>
<p>However:</p>
<blockquote cite="http://www.google.com/support/webmasters/bin/answer.py?answer=66353"><p>Hiding text or links in your content can cause your site to be perceived as untrustworthy since it presents information to search engines differently than to visitors. (<a href="http://www.google.com/support/webmasters/bin/answer.py?answer=66353">Google Webmaster Help Center</a>)</p></blockquote>
<p>This obviously caused some panic among developers using image replacement. While Google seem to have said that they would distinguish between legitimate usage and spamming, they&#8217;re pretty vague about what constitutes one or the other. The ever-informative 456bereastreet.com <a href="http://www.456bereastreet.com/archive/200510/google_seo_and_using_css_to_hide_text/">have a good summary</a>. But while their conclusion is quite reassuring, it&#8217;s two years old &#8211; an aeon in web technology &#8211; and still vague:</p>
<blockquote cite="http://www.456bereastreet.com/archive/200510/google_seo_and_using_css_to_hide_text/"><p>While it&#8217;s good to know that sites are not currently being removed without a manual review, that could change in the future. So I would advise anyone making extensive use of CSS techniques that hide text to make sure that it can&#8217;t be mistaken for spamming.</p></blockquote>
<p>How do you make sure? They don&#8217;t say.</p>
<p>More recently, <span class="removed_link" title="http://www.seocritique.com/sem/seo/css-image-replacement">SEO Critique</span> took the approach of checking out high-profile designers with close links to Google and seeing if they used image replacement:</p>
<blockquote cite="http://www.seocritique.com/sem/seo/css-image-replacement"><p>Rand Fishkin at SEOmoz does use CSS image replacement on the SEOmoz.org site, albeit sparingly. &#8230; I figure that Rand goes to enough conferences and has enough interaction with Googlers like Matt Cutts and Vanessa Fox that if there was a danger of imminent death by penalty he would know and would quickly order the offense removed. Hence, used sparingly and in the strict spirit of If a Blind Person Were Using a Text Reader How Would It Sound? My opinion is that using CSS image replacement is probably okay.</p></blockquote>
<p>Probably not an issue to get panicked about, then, but one to keep your eye on. I&#8217;ll carry on using it judiciously until further notice.</p>
]]></content:encoded>
			<wfw:commentRss>http://sltaylor.co.uk/blog/google-seo-css-image-replacement/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
