Steve Taylor

Preventing internal rewrites in .htaccess

mod_rewrite is a notoriously fiendish chunk of software, almost legendarily so among web developers. I’ve got by with snippets in my .htaccess file: stuff that makes sure there’s no “www” in the URL and manages holding pages. I absorbed a lot of how it works, but I couldn’t readily construct my own rewrites.

Until tonight.

I was forced to solve a situation involving a WordPress site which had some legacy flat HTML content. Way back when, I hacked it to use .html suffixes on the WP URLs, so we didn’t lose any search engine juice for the old URLs. This necessitated changing core WP code—always a bad idea—and thus blocked any automatic WP upgrade process. Which is a pain.

So, I’ve removed the .html hack, and tried to pull flat HTML content in for the old stuff, under the guise of smooth new .html-free URLs.

Say we have something that was at /news_2004_09.html. First, we redirect to new URLs:

RewriteRule ^news_([0-9]{4})_([0-9]{2})\.html$ /news/$1/$2/ [R=301,L]
RewriteRule ^(.*)\.html$ /$1/ [R=301,L]

The first bit redirects (301 = permanently) pages with the format news_[year]_[month].html to news/[year]/[month]/. The second line deals with all other pages that used to have .html endings.

Now to pull in flat HTML:

RewriteRule ^news/([0-9]{4})/([0-9]{2})/$ /html-news/news_$1_$2.html [L]

Note the absence of the R=301 flag. Without it, an “internal rewrite” happens instead of an “external redirect”. For the end user, this means the URL typed in or clicked on stays the same; only the resource the server uses to answer that request changes. R=301 (or any other redirect status code) makes the URL in the browser change, meaning the new URL is the one that gets bookmarked, indexed, etc.
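As a rough sketch of the difference, here are the two kinds of rule side by side. The paths (old-page.html, /new-page/, /flat/) are made up for illustration and are not part of the site described here; the rules assume an .htaccess file in the document root.

```apache
RewriteEngine On

# External redirect: Apache answers with a 301 and a Location header,
# so the browser makes a fresh request and shows /new-page/.
RewriteRule ^old-page\.html$ /new-page/ [R=301,L]

# Internal rewrite: no R flag, so the browser keeps showing /new-page/
# while Apache quietly serves /flat/new-page.html instead.
RewriteRule ^new-page/$ /flat/new-page.html [L]
```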

All well and good, but (and this is where my last couple of hours have gone)… Even though no R flag means it’s an “internal rewrite”, and the L flag indicates this is the last rule to be processed, Apache still effectively “makes another request”.

I don’t think this involves more traffic between the server and browser, and I think it’s related to the fact that the rewriting is being set up in .htaccess (rather than at the server level, in httpd.conf). I’m not quite sure.

The end result is, all your other rules get processed again, even though you gave it that L flag! So for me, the old URL was going through the following rewrites:

  1. http://example.com/news_2004_09.html (initial URL)
  2. http://example.com/news/2004/09/ (permanent redirect)
  3. http://example.com/html-news/news_2004_09.html (“internal rewrite”, not visible to browser)
  4. http://example.com/html-news/news_2004_09/ (another permanent redirect)

See what happened? My catch-all code for changing old .html URLs was processing the “internal rewrite” URL. That last step shouldn’t have happened. The browser’s URL should have stopped at stage 2, with the content coming from the rewrite in stage 3. Instead, I got the URL in stage 4, with no content at all.
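To make the two passes easier to follow, here are the same three rules again with a pass-by-pass trace written as comments. The labels A, B and C are added purely for reference, and the dots before "html" are escaped in this copy; the trace is a reading of the behaviour described above, not output from Apache.

```apache
# A: old news pages -> new URL scheme (external redirect)
RewriteRule ^news_([0-9]{4})_([0-9]{2})\.html$ /news/$1/$2/ [R=301,L]
# B: catch-all for every other .html URL (external redirect)
RewriteRule ^(.*)\.html$ /$1/ [R=301,L]
# C: serve the flat file (internal rewrite)
RewriteRule ^news/([0-9]{4})/([0-9]{2})/$ /html-news/news_$1_$2.html [L]

# Request for news/2004/09/ (stage 2 above):
#   pass 1: A no match, B no match, C matches -> html-news/news_2004_09.html
#           [L] ends this pass, but the rewritten path is fed back in.
#   pass 2: A no match (path now starts with html-news/), B matches
#           -> 301 to /html-news/news_2004_09/ (stage 4 above)
```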

The solution? Well, thanks to Richard K on the Mod_Rewrite forums, here it is:

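# Old news pages: permanent redirect to the new URL scheme.
RewriteRule ^news_([0-9]{4})_([0-9]{2})\.html$ /news/$1/$2/ [R=301,L]
# Catch-all, skipped when REDIRECT_STATUS is set (i.e. after an internal rewrite).
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^(.*)\.html$ /$1/ [R=301,L]
# Serve the flat HTML file without changing the URL in the browser.
RewriteRule ^news/([0-9]{4})/([0-9]{2})/$ /html-news/news_$1_$2.html [L]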

I worked out a while ago that I needed a RewriteCond before my catch-all rule, but I couldn’t for the life of me find the condition to test if the current request is a result of an internal rewrite already.

ENV:REDIRECT_STATUS is my new best friend.
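For what it’s worth, Apache 2.3.9 and later also offer the END flag. It stops the current pass like L does and, in .htaccess context, also prevents any further rewrite passes for that request. The sketch below is an alternative under that version assumption, not the fix Richard K suggested; it reuses the same three rules and drops the RewriteCond.

```apache
RewriteRule ^news_([0-9]{4})_([0-9]{2})\.html$ /news/$1/$2/ [R=301,L]
RewriteRule ^(.*)\.html$ /$1/ [R=301,L]
# END instead of L: the rewritten path is not run through the rules again,
# so the catch-all above never sees html-news/news_2004_09.html.
RewriteRule ^news/([0-9]{4})/([0-9]{2})/$ /html-news/news_$1_$2.html [END]
```

On older Apache versions the END flag is not recognised, so the REDIRECT_STATUS condition remains the more portable choice.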

7 comments

  1. matt

    THANK YOU for sharing this! Can’t believe how much time I just wasted trying to resolve this internal redirect issue.

  2. Carl

    You need to make this post more prominent! I found this post via a search in Google for “preventing internal rewrites in htaccess” but I had to spend an hour going through non-relevant stuff that didn’t work before I found you.

    Can I suggest you change your page title from “ignoring internal rewrites…” to “preventing internal rewrites…” and people will find this excellent advice more easily because it will appear higher for the expression that people are more likely to search for.

  3. Thanks for the tip, Carl. What I’ve done is change the title but kept “ignoring” in the slug—both words seem appropriate, and I don’t want to hinder the searches of those who would use “ignoring”!

  4. Very Thankful

    Thanks for this!

    Seriously, this should be more well known. I was beating my head on the keyboard for over 30 minutes trying to figure out how to keep Apache from 301 redirecting something that was supposed to be an INTERNAL redirect only.

  5. Alena Markus

    It took me a lot of time just to find this kind of solution to my problem. I no longer spend most of my time redirecting things and refreshing my work every time I run into a problem. Your post is very helpful and really works.

  6. Vaclav Kohout

    Great! Put it in the .htaccess Bible! It finally stops all my never-ending loops.

  7. Pablo Minetti

    Hi,
    you saved my life. Thanks for this post.
    I spent a lot of hours trying to solve this problem; fortunately I found your post.
    THANKS
