


Ignoring internal rewrites in .htaccess

mod_rewrite is a notoriously fiendish chunk of software, almost legendarily so among web developers. I’ve got by with snippets in my .htaccess file: stuff that makes sure there’s no “www” in the URL, or that manages holding pages. I absorbed a lot of how it works, but I couldn’t readily construct my own rewrites.

Until tonight.

I was forced to solve a situation involving a WordPress site which had some legacy flat HTML content. Way back when, I hacked it to use .html suffixes on the WP URLs, so we didn’t lose any search engine juice for the old URLs. This necessitated changing core WP code—always a bad idea—and thus blocked any automatic WP upgrade process. Which is a pain.

So, I’ve removed the .html hack, and tried to pull flat HTML content in for the old stuff, under the guise of smooth new .html-free URLs.

Say we have something that was at /news_2004_09.html. First, we redirect to new URLs:

RewriteRule ^news_([0-9]{4})_([0-9]{2})\.html$ /news/$1/$2/ [R=301,L]
RewriteRule ^(.*)\.html$ /$1/ [R=301,L]

The first bit redirects (301 = permanently) pages with the format news_[year]_[month].html to news/[year]/[month]/. The second line deals with all other pages that used to have .html endings.
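
To sanity-check patterns like these outside Apache, here is a rough Python sketch. It only imitates the matching: the names REDIRECTS and redirect_target are made up for illustration, and mod_rewrite's $1/$2 backreferences are written as \1/\2 for Python's re module. It assumes the .htaccess sits in the document root, so the leading slash is already stripped from the path before the pattern is tested.

```python
import re

# The two redirect rules from above as (pattern, substitution) pairs.
# Hypothetical helper data: Apache's $1/$2 become \1/\2 in Python.
REDIRECTS = [
    (r"^news_([0-9]{4})_([0-9]{2})\.html$", r"/news/\1/\2/"),
    (r"^(.*)\.html$", r"/\1/"),
]


def redirect_target(path):
    """Return the redirect target for a per-directory path, or None."""
    for pattern, substitution in REDIRECTS:
        match = re.match(pattern, path)
        if match:
            # First matching rule wins, mimicking the L flag.
            return match.expand(substitution)
    return None


print(redirect_target("news_2004_09.html"))  # /news/2004/09/
print(redirect_target("about.html"))         # /about/
print(redirect_target("news/2004/09/"))      # None
```

The order matters: the catch-all pattern would also match news_2004_09.html, so the more specific news rule has to come first.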

Now to pull in flat HTML:

RewriteRule ^news/([0-9]{4})/([0-9]{2})/$ /html-news/news_$1_$2.html [L]

Note the absence of the R=301 flag. Without it, an “internal rewrite” happens instead of an “external redirect”. For the end user, this means the URL you typed in or clicked on stays the same; only the resource the server uses to answer that request changes. R=301 (or any other 3xx redirect status) makes the URL in the browser change, meaning the new URL is the one that gets bookmarked, indexed, etc.

All well and good, but (and this is where my last couple of hours have gone)… Even though no R flag means it’s an “internal rewrite”, and the L flag indicates this is the last rule to be processed, Apache still effectively “makes another request”.

I don’t think this involves more traffic between the server and browser, and I think it’s related to the fact that the rewriting is being set up in .htaccess (rather than at the server level, in httpd.conf). I’m not quite sure.

The end result is, all your other rules get processed again, even though you gave it that L flag! So for me, the old URL was going through the following rewrites:

  1. http://example.com/news_2004_09.html (initial URL)
  2. http://example.com/news/2004/09/ (permanent redirect)
  3. http://example.com/html-news/news_2004_09.html (“internal rewrite”, not visible to browser)
  4. http://example.com/html-news/news_2004_09/ (another permanent redirect)

See what happened? My catch-all code for changing old .html URLs was processing the “internal rewrite” URL. That last step shouldn’t have happened. The browser’s URL should have stopped at stage 2, with the content coming from the rewrite in stage 3. Instead, I got the URL in stage 4, with no content at all.
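
The chain above can be reproduced with a toy model in Python. This is only a sketch of the behaviour described here, not how Apache is implemented: RULES and trace are invented names, and the model assumes that after any rule fires, the whole rule set is run again against the resulting URL (for a redirect, because the browser sends a fresh request; for an internal rewrite, because Apache reprocesses the new path).

```python
import re

# (pattern, substitution, is_external_redirect), in .htaccess order.
# Hypothetical model data; $1/$2 are written \1/\2 for Python.
RULES = [
    (r"^news_([0-9]{4})_([0-9]{2})\.html$", r"/news/\1/\2/", True),
    (r"^(.*)\.html$", r"/\1/", True),
    (r"^news/([0-9]{4})/([0-9]{2})/$", r"/html-news/news_\1_\2.html", False),
]


def trace(url_path, max_passes=10):
    """Follow a URL path through repeated passes over RULES."""
    steps = [("initial", url_path)]
    current = url_path
    for _ in range(max_passes):
        for pattern, substitution, external in RULES:
            # Per-directory rules see the path without its leading slash.
            match = re.match(pattern, current.lstrip("/"))
            if match:
                current = match.expand(substitution)
                steps.append(("redirect" if external else "rewrite", current))
                break  # the L flag only ends this pass
        else:
            break  # no rule matched, so processing really stops
    return steps


for kind, path in trace("/news_2004_09.html"):
    print(kind, path)
```

Running it on /news_2004_09.html walks through the same four stages as the list, ending on the unwanted /html-news/news_2004_09/ redirect.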

The solution? Well, thanks to Richard K on the Mod_Rewrite forums, here it is:

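# Old monthly news pages: permanent redirect to the new URL scheme
RewriteRule ^news_([0-9]{4})_([0-9]{2})\.html$ /news/$1/$2/ [R=301,L]
# Catch-all for other .html URLs; the condition (which applies only to the next rule) skips requests that came from an internal rewrite
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^(.*)\.html$ /$1/ [R=301,L]
# Serve the flat HTML file behind the new news URL (internal rewrite, no R flag)
RewriteRule ^news/([0-9]{4})/([0-9]{2})/$ /html-news/news_$1_$2.html [L]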

I worked out a while ago that I needed a RewriteCond before my catch-all rule, but I couldn’t for the life of me find the condition to test whether the current request is already the result of an internal rewrite.

ENV:REDIRECT_STATUS is my new best friend.
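
For anyone who wants to see why the condition helps, here is a small Python model of the fixed rule set. It is a sketch under assumptions, not Apache's real logic: RULES and trace_with_guard are hypothetical names, and the redirect_status variable stands in for ENV:REDIRECT_STATUS, which I model as empty on a fresh browser request and "200" once an internal rewrite has happened.

```python
import re

# (pattern, substitution, is_external_redirect, initial_request_only)
# Hypothetical model data; the last field plays the role of
# "RewriteCond %{ENV:REDIRECT_STATUS} ^$" on the catch-all rule.
RULES = [
    (r"^news_([0-9]{4})_([0-9]{2})\.html$", r"/news/\1/\2/", True, False),
    (r"^(.*)\.html$", r"/\1/", True, True),
    (r"^news/([0-9]{4})/([0-9]{2})/$", r"/html-news/news_\1_\2.html", False, False),
]


def trace_with_guard(url_path, max_passes=10):
    """Follow a URL path through RULES, honouring the guard condition."""
    steps = [("initial", url_path)]
    current = url_path
    redirect_status = ""  # assumed empty on a fresh browser request
    for _ in range(max_passes):
        for pattern, substitution, external, initial_only in RULES:
            if initial_only and redirect_status != "":
                continue  # guard fails: skip the catch-all rule
            match = re.match(pattern, current.lstrip("/"))
            if match:
                current = match.expand(substitution)
                steps.append(("redirect" if external else "rewrite", current))
                # A redirect starts a new request; a rewrite stays internal.
                redirect_status = "" if external else "200"
                break
        else:
            break  # no rule matched, so processing stops
    return steps


for kind, path in trace_with_guard("/news_2004_09.html"):
    print(kind, path)
```

In this model the trace stops at the internal rewrite to /html-news/news_2004_09.html, because the catch-all is skipped on the reprocessing pass, while an ordinary old URL such as /about.html is still redirected on its first request.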

1 comment

  1. matt (22nd April 2010)

    THANK YOU for sharing this! Can’t believe how much time I just wasted trying to resolve this internal redirect issue.
