performance when using a .htaccess

Lisa Kachold lisakachold at obnosis.com
Fri Oct 22 20:18:34 MST 2010


On Fri, Oct 22, 2010 at 4:00 PM, Lisa Kachold <lisakachold at obnosis.com> wrote:

>
>
> On Fri, Oct 22, 2010 at 2:18 PM, keith smith <klsmith2020 at yahoo.com> wrote:
>
>>
>>
>> Hi,
>>
>> I have a question about performance when using a .htaccess file.  I have
>> read that having multiple .htaccess files can slow Apache.  Meaning a
>> .htaccess file in each directory.
>>
>> We have moved a ton of content, upwards of 900 pages.  About 600 of those
>> have been moved from our blog which was located in the directory /blog.  It
>> was suggested to break the .htaccess into files that reflect the content
>> moved.  For example put a .htaccess file in the /blog directory that
>> reflects all the content from the blog instead of one big .htaccess file in
>> the doc root directory that would contain 900 redirects.
>>
>
> Well, that's better than FollowSymLinks?
>
> The reason that managing multiple .htaccess files can be slow and
> difficult is that, on every request, Apache2 looks for a .htaccess file in
> each directory along the path to the requested file, and each directory
> inherits the directives from the directories above it.
>
> A rewrite might actually be able to do exactly what you need.  Have you
> considered that?  Rewrite overhead is not huge, especially if you are
> caching for this /blog URL.
>
>
You simply enable mod_rewrite in Apache2 (the procedure varies depending on
your distro/version).
A mod_rewrite solution can be as small as ONE rule in your configuration
file for that VirtualHost.  Some options:

1) Here's a simple rewrite (provided your directory "blog" containing all of
the 600 files can be trivially redirected to something like "newblog").

RewriteEngine  on
RewriteRule    ^/blog/(.*)$  /newblog/$1  [R=301,L]

This sends every URL under /blog/ to the same path under /newblog/ with a
permanent (R=301) redirect.  RewriteBase is not needed here; it only applies
in per-directory (.htaccess or <Directory>) context.

2) Use a RewriteMap, which is defined ONCE in the server configuration and
whose lookups Apache caches in memory:

http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html#rewritemap

The RewriteMap directive defines a *Rewriting Map* which can be used inside
rule substitution strings by the mapping-functions to insert/substitute
fields through a key lookup. The source of this lookup can be of various
types.

The *MapName* is the name of the map and will be used to specify a
mapping-function for the substitution strings of a rewriting rule via one of
the following constructs:

${ MapName : LookupKey }
${ MapName : LookupKey | DefaultValue }

When such a construct occurs, the map *MapName* is consulted and the key
*LookupKey* is looked up. If the key is found, the map-function construct is
substituted by *SubstValue*. If the key is not found then it is substituted
by *DefaultValue* or by the empty string if no *DefaultValue* was specified.

For example, you might define a RewriteMap as:

RewriteMap examplemap txt:/path/to/file/map.txt

You would then be able to use this map in a RewriteRule as follows:

RewriteRule ^/ex/(.*) ${examplemap:$1}
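
For the /blog move in question, a map-based setup might look like the
following sketch.  The map name blogmap, the file path
/etc/apache2/blogmap.txt, and the sample entries are assumptions for
illustration, not taken from the actual site.  Note that RewriteMap itself is
only allowed in server or virtual host context, not in .htaccess.

```apache
# Server or <VirtualHost> context; RewriteMap cannot be declared in .htaccess.
RewriteEngine on

# Hypothetical map file, one "old-key new-URL" pair per line, e.g.:
#   2009/10/my-post.html   /articles/my-post.html
#   about.html             /company/about.html
RewriteMap blogmap txt:/etc/apache2/blogmap.txt

# Look up the part after /blog/ in the map.  The condition succeeds only if
# the lookup returns a non-empty value, which is captured as %1.
RewriteCond ${blogmap:$1} ^(.+)$
# Redirect permanently to the mapped URL; unmapped URLs fall through.
RewriteRule ^/blog/(.+)$ %1 [R=301,L]
```

With a txt: map, looked-up keys stay cached until the map file's mtime
changes or the server restarts, so 600 entries cost one rule rather than 600.
For very large maps, a dbm: map is a faster alternative.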

3) Advanced Rewrites: Filesystem Reorganization

Description:

This really is a hardcore example: a killer application which heavily uses
per-directory RewriteRules to get a smooth look and feel on the Web while
its data structure is never touched or adjusted.  The existing directory
structure of the archive looks like this:

drwxrwxr-x   2 netsw  users    512 Aug  3 18:39 Audio/
drwxrwxr-x   2 netsw  users    512 Jul  9 14:37 Benchmark/
drwxrwxr-x  12 netsw  users    512 Jul  9 00:34 Crypto/
drwxrwxr-x   5 netsw  users    512 Jul  9 00:41 Database/
drwxrwxr-x   4 netsw  users    512 Jul 30 19:25 Dicts/
drwxrwxr-x  10 netsw  users    512 Jul  9 01:54 Graphic/
drwxrwxr-x   5 netsw  users    512 Jul  9 01:58 Hackers/
drwxrwxr-x   8 netsw  users    512 Jul  9 03:19 InfoSys/
drwxrwxr-x   3 netsw  users    512 Jul  9 03:21 Math/
drwxrwxr-x   3 netsw  users    512 Jul  9 03:24 Misc/
drwxrwxr-x   9 netsw  users    512 Aug  1 16:33 Network/
drwxrwxr-x   2 netsw  users    512 Jul  9 05:53 Office/
drwxrwxr-x   7 netsw  users    512 Jul  9 09:24 SoftEng/
drwxrwxr-x   7 netsw  users    512 Jul  9 12:17 System/
drwxrwxr-x  12 netsw  users    512 Aug  3 20:15 Typesetting/
drwxrwxr-x  10 netsw  users    512 Jul  9 14:08 X11/

Solution:

The solution has two parts: The first is a set of CGI scripts which create
all the pages at all directory levels on-the-fly. I put them under
/e/netsw/.www/ as follows:

-rw-r--r--   1 netsw  users    1318 Aug  1 18:10 .wwwacl
drwxr-xr-x  18 netsw  users     512 Aug  5 15:51 DATA/
-rw-rw-rw-   1 netsw  users  372982 Aug  5 16:35 LOGFILE
-rw-r--r--   1 netsw  users     659 Aug  4 09:27 TODO
-rw-r--r--   1 netsw  users    5697 Aug  1 18:01 netsw-about.html
-rwxr-xr-x   1 netsw  users     579 Aug  2 10:33 netsw-access.pl
-rwxr-xr-x   1 netsw  users    1532 Aug  1 17:35 netsw-changes.cgi
-rwxr-xr-x   1 netsw  users    2866 Aug  5 14:49 netsw-home.cgi
drwxr-xr-x   2 netsw  users     512 Jul  8 23:47 netsw-img/
-rwxr-xr-x   1 netsw  users   24050 Aug  5 15:49 netsw-lsdir.cgi
-rwxr-xr-x   1 netsw  users    1589 Aug  3 18:43 netsw-search.cgi
-rwxr-xr-x   1 netsw  users    1885 Aug  1 17:41 netsw-tree.cgi
-rw-r--r--   1 netsw  users     234 Jul 30 16:35 netsw-unlimit.lst

The DATA/ subdirectory holds the above directory structure, i.e. the real
*net.sw* stuff, and gets automatically updated via rdist from time to time.
The second part of the problem remains: how to link these two structures
together into one smooth-looking URL tree? We want to hide the DATA/
directory from the user while running the appropriate CGI scripts for the
various URLs. Here is the solution: first I put the following into the
per-directory configuration file in the DocumentRoot
<http://httpd.apache.org/docs/2.0/mod/core.html#documentroot> of the server
to rewrite the announced URL /net.sw/ to the internal path /e/netsw:

RewriteRule  ^net.sw$       net.sw/        [R]
RewriteRule  ^net.sw/(.*)$  e/netsw/$1

The first rule is for requests which miss the trailing slash! The second
rule does the real thing. And then comes the killer configuration which
stays in the per-directory config file /e/netsw/.www/.wwwacl:

Options       ExecCGI FollowSymLinks Includes MultiViews

RewriteEngine on

#  we are reached via /net.sw/ prefix
RewriteBase   /net.sw/

#  first we rewrite the root dir to
#  the handling cgi script
RewriteRule   ^$                       netsw-home.cgi     [L]
RewriteRule   ^index\.html$            netsw-home.cgi     [L]

#  strip out the subdirs when
#  the browser requests us from perdir pages
RewriteRule   ^.+/(netsw-[^/]+/.+)$    $1                 [L]

#  and now break the rewriting for local files
RewriteRule   ^netsw-home\.cgi.*       -                  [L]
RewriteRule   ^netsw-changes\.cgi.*    -                  [L]
RewriteRule   ^netsw-search\.cgi.*     -                  [L]
RewriteRule   ^netsw-tree\.cgi$        -                  [L]
RewriteRule   ^netsw-about\.html$      -                  [L]
RewriteRule   ^netsw-img/.*$           -                  [L]

#  anything else is a subdir which gets handled
#  by another cgi script
RewriteRule   !^netsw-lsdir\.cgi.*     -                  [C]
RewriteRule   (.*)                     netsw-lsdir.cgi/$1

Some hints for interpretation:

   1. Notice the L (last) flag and no substitution field ('-') in the fourth
   part
   2. Notice the ! (not) character and the C (chain) flag at the first rule
   in the last part
   3. Notice the catch-all pattern in the last rule
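
The same three techniques can be seen in a much smaller per-directory
sketch.  The file location /blog/.htaccess and the names images/ and
index.cgi are hypothetical, chosen only to illustrate the pattern:

```apache
# Hypothetical /blog/.htaccess.  Per-directory rewriting needs
# FollowSymLinks (or SymLinksIfOwnerMatch) enabled for this directory.
RewriteEngine on
RewriteBase   /blog/

# '-' means "no substitution"; [L] stops rule processing, so requests for
# static images are served untouched.
RewriteRule   ^images/         -              [L]

# Negated pattern plus [C]: the next rule runs only when the request is
# NOT already aimed at index.cgi, which prevents a rewrite loop.
RewriteRule   !^index\.cgi     -              [C]
# Catch-all: hand everything else to the script as extra path info.
RewriteRule   (.*)             index.cgi/$1
```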

Reference:  http://httpd.apache.org/docs/2.0/misc/rewriteguide.html  (see
also the excellent sections on blocking annoying robots, and other tricks).

4) I would consider reorganizing your blog files into some regular
structure, say an alphabetical file layout, so that wildcard rewrites can
reduce your total number of rewrite rules.
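
As a sketch of that idea, assuming the moved posts are filed under a
subdirectory named after their first letter (the /newblog/ target and this
layout are hypothetical), one wildcard rule can cover every post:

```apache
# Virtual host context.  Hypothetical layout: /newblog/<first letter>/<file>
RewriteEngine on

# $1 is the first (lowercase) letter, $2 the rest of the file name, so
# /blog/apache-tips.html is redirected to /newblog/a/apache-tips.html
RewriteRule ^/blog/([a-z])([^/]*)$ /newblog/$1/$1$2 [R=301,L]
```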

With a large number of rewrites, especially where a permanent (R=301)
redirect is used, I would ALWAYS keep them in real configuration files under
/etc/apache2, pulled in with an Include statement.  They are easier to back
up, manage, and grep through, and problems are easier to evaluate after a
graceful restart picks up new changes.
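
A minimal sketch of that layout follows.  The host name, the DocumentRoot,
and both file paths are assumptions for illustration only.  Once all the
redirects live in the main configuration, .htaccess processing can also be
switched off, which removes the per-directory lookups asked about at the
top of this thread.

```apache
# /etc/apache2/sites-available/example.conf (hypothetical file and host names)
<VirtualHost *:80>
    ServerName   www.example.com
    DocumentRoot /var/www/example

    # With no redirects left in .htaccess files, stop Apache from looking
    # for them in every directory on every request.
    <Directory /var/www/example>
        AllowOverride None
    </Directory>

    # Hypothetical file holding the RewriteEngine/RewriteRule lines.  It is
    # included inside the VirtualHost because rewrite rules defined in the
    # main server are not inherited by virtual hosts by default.
    Include /etc/apache2/redirects/blog.conf
</VirtualHost>
```

After editing, running "apache2ctl configtest" before "apache2ctl graceful"
catches syntax errors without interrupting the running server.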



>
>> Thank you for your feedback.
>>
>> ------------------------
>> Keith Smith
>


-- 
Skype: 6022393392
ATT:     5037544452
GV:      6923073392
Phoenix Linux Security Team <http://hackfest.obnosis.com>
PLUG.PHOENIX.AZ.US
http://www.it-clowns.com
*"Great things are not done by impulse but a series of small things brought
together." -Van Gogh*