On Fri, Oct 22, 2010 at 4:00 PM, Lisa Kachold <lisakachold@obnosis.com> wrote:


On Fri, Oct 22, 2010 at 2:18 PM, keith smith <klsmith2020@yahoo.com> wrote:


Hi,

I have a question about performance when using a .htaccess file.  I have read that having multiple .htaccess files, meaning one in each directory, can slow Apache.

We have moved a ton of content, upwards of 900 pages.  About 600 of those have been moved from our blog, which was located in the directory /blog.  It was suggested that we break the .htaccess into files that reflect the content moved.  For example, put a .htaccess file in the /blog directory that covers all the content moved from the blog, instead of one big .htaccess file in the doc root directory that would contain all 900 redirects.

Well, that's better than FollowSymlinks?

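The reason that managing multiple .htaccess files can be slow and difficult is that, on every request, Apache2 looks for a .htaccess file in each directory along the path to the requested file, and directives in a parent directory's .htaccess are inherited by the directories below it.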
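
If you control the server configuration, one way to avoid those per-request lookups is to turn .htaccess processing off and keep the redirects in the VirtualHost itself. This is only a sketch: the host name, DocumentRoot, and the example URLs are placeholders I made up, not your real ones.

```apache
<VirtualHost *:80>
    # Placeholder host name and document root
    ServerName   www.example.com
    DocumentRoot /var/www/html

    <Directory /var/www/html>
        # With AllowOverride None, Apache does not look for .htaccess files here
        AllowOverride None
    </Directory>

    # One permanent redirect for one moved page (placeholder URLs)
    Redirect permanent /blog/old-post.html http://www.example.com/articles/old-post.html
</VirtualHost>
```

The trade-off is that changes need a reload of Apache instead of taking effect as soon as a file is saved.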

A rewrite might actually do exactly what you need.  Have you considered that?  Rewrite overhead is not huge, especially if you are caching for this /blog URL.
 
You simply enable mod_rewrite in Apache2 (procedure varies depending on your distro/version).
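
As a rough illustration, on many source or RPM-style installs enabling the module comes down to a LoadModule line like the one below (the module path is an assumption and varies by build). On Debian and Ubuntu the usual route is the a2enmod rewrite command followed by a reload.

```apache
# Module path is an assumption; check where your distro installs modules
LoadModule rewrite_module modules/mod_rewrite.so
```
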
A mod_rewrite solution can be a ONE-line entry in the configuration file for that VirtualHost, for instance:

1) Here's a simple rewrite (provided the /blog directory containing all 600 files can be trivially redirected to something like /newblog).

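RewriteEngine  on
# In VirtualHost context the pattern matches the full URL-path, so no RewriteBase is needed.
RewriteRule    ^/blog/(.*)$  /newblog/$1  [R=301,L]

This sends every URL under /blog/ to the same path under /newblog/ with a permanent (R=301) redirect.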

2) Use a RewriteMap, which is defined in the server or VirtualHost configuration (not in .htaccess) and cached by Apache:

http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html#rewritemap

The RewriteMap directive defines a Rewriting Map which can be used inside rule substitution strings by the mapping-functions to insert/substitute fields through a key lookup. The source of this lookup can be of various types.

The MapName is the name of the map and will be used to specify a mapping-function for the substitution strings of a rewriting rule via one of the following constructs:

${ MapName : LookupKey }
${ MapName : LookupKey | DefaultValue }

When such a construct occurs, the map MapName is consulted and the key LookupKey is looked-up. If the key is found, the map-function construct is substituted by SubstValue. If the key is not found then it is substituted by DefaultValue or by the empty string if no DefaultValue was specified.

For example, you might define a RewriteMap as:

RewriteMap examplemap txt:/path/to/file/map.txt

You would then be able to use this map in a RewriteRule as follows:

RewriteRule ^/ex/(.*) ${examplemap:$1}
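
Applied to your blog move, a map-based setup might look something like the sketch below. The map name blogmap, the file path, and the sample entries are all hypothetical; the map file would hold one line per moved page, and the RewriteCond keeps URLs that are not in the map from being redirected.

```apache
# Hypothetical map file /etc/apache2/blogmap.txt, one "old-name new-path" pair per line:
#   my-first-post.html    /articles/2010/my-first-post.html
#   about-us.html         /company/about.html

RewriteEngine on
RewriteMap    blogmap  txt:/etc/apache2/blogmap.txt

# Look up the part after /blog/ in the map; the condition only matches
# when the lookup returns a path starting with "/", which is captured as %1
RewriteCond   ${blogmap:$1}  ^(/.+)$
RewriteRule   ^/blog/(.+)$   %1  [R=301,L]
```

With this layout all 600 blog redirects live in one text file and a single rule, so adding a redirect means adding a line to the map.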


3) Advanced Rewrites: Filesystem Reorganization

Description:

This really is a hardcore example: a killer application which heavily uses per-directory RewriteRules to get a smooth look and feel on the Web while its data structure is never touched or adjusted. The data is stored in a directory hierarchy like this:

drwxrwxr-x   2 netsw  users    512 Aug  3 18:39 Audio/
drwxrwxr-x   2 netsw  users    512 Jul  9 14:37 Benchmark/
drwxrwxr-x  12 netsw  users    512 Jul  9 00:34 Crypto/
drwxrwxr-x   5 netsw  users    512 Jul  9 00:41 Database/
drwxrwxr-x   4 netsw  users    512 Jul 30 19:25 Dicts/
drwxrwxr-x  10 netsw  users    512 Jul  9 01:54 Graphic/
drwxrwxr-x   5 netsw  users    512 Jul  9 01:58 Hackers/
drwxrwxr-x   8 netsw  users    512 Jul  9 03:19 InfoSys/
drwxrwxr-x   3 netsw  users    512 Jul  9 03:21 Math/
drwxrwxr-x   3 netsw  users    512 Jul  9 03:24 Misc/
drwxrwxr-x   9 netsw  users    512 Aug  1 16:33 Network/
drwxrwxr-x   2 netsw  users    512 Jul  9 05:53 Office/
drwxrwxr-x   7 netsw  users    512 Jul  9 09:24 SoftEng/
drwxrwxr-x   7 netsw  users    512 Jul  9 12:17 System/
drwxrwxr-x  12 netsw  users    512 Aug  3 20:15 Typesetting/
drwxrwxr-x  10 netsw  users    512 Jul  9 14:08 X11/

Solution:

The solution has two parts: The first is a set of CGI scripts which create all the pages at all directory levels on-the-fly. I put them under /e/netsw/.www/ as follows:

-rw-r--r--   1 netsw  users    1318 Aug  1 18:10 .wwwacl
drwxr-xr-x  18 netsw  users     512 Aug  5 15:51 DATA/
-rw-rw-rw-   1 netsw  users  372982 Aug  5 16:35 LOGFILE
-rw-r--r--   1 netsw  users     659 Aug  4 09:27 TODO
-rw-r--r--   1 netsw  users    5697 Aug  1 18:01 netsw-about.html
-rwxr-xr-x   1 netsw  users     579 Aug  2 10:33 netsw-access.pl
-rwxr-xr-x   1 netsw  users    1532 Aug  1 17:35 netsw-changes.cgi
-rwxr-xr-x   1 netsw  users    2866 Aug  5 14:49 netsw-home.cgi
drwxr-xr-x   2 netsw  users     512 Jul  8 23:47 netsw-img/
-rwxr-xr-x   1 netsw  users   24050 Aug  5 15:49 netsw-lsdir.cgi
-rwxr-xr-x   1 netsw  users    1589 Aug  3 18:43 netsw-search.cgi
-rwxr-xr-x   1 netsw  users    1885 Aug  1 17:41 netsw-tree.cgi
-rw-r--r--   1 netsw  users     234 Jul 30 16:35 netsw-unlimit.lst

The DATA/ subdirectory holds the above directory structure, i.e. the real net.sw stuff and gets automatically updated via rdist from time to time. The second part of the problem remains: how to link these two structures together into one smooth-looking URL tree? We want to hide the DATA/ directory from the user while running the appropriate CGI scripts for the various URLs. Here is the solution: first I put the following into the per-directory configuration file in the DocumentRoot of the server to rewrite the announced URL /net.sw/ to the internal path /e/netsw:

RewriteRule  ^net.sw$       net.sw/        [R]
RewriteRule  ^net.sw/(.*)$  e/netsw/$1

The first rule is for requests which miss the trailing slash! The second rule does the real thing. And then comes the killer configuration which stays in the per-directory config file /e/netsw/.www/.wwwacl:

Options       ExecCGI FollowSymLinks Includes MultiViews

RewriteEngine on

#  we are reached via /net.sw/ prefix
RewriteBase   /net.sw/

#  first we rewrite the root dir to
#  the handling cgi script
RewriteRule   ^$                       netsw-home.cgi     [L]
RewriteRule   ^index\.html$            netsw-home.cgi     [L]

#  strip out the subdirs when
#  the browser requests us from perdir pages
RewriteRule   ^.+/(netsw-[^/]+/.+)$    $1                 [L]

#  and now break the rewriting for local files
RewriteRule   ^netsw-home\.cgi.*       -                  [L]
RewriteRule   ^netsw-changes\.cgi.*    -                  [L]
RewriteRule   ^netsw-search\.cgi.*     -                  [L]
RewriteRule   ^netsw-tree\.cgi$        -                  [L]
RewriteRule   ^netsw-about\.html$      -                  [L]
RewriteRule   ^netsw-img/.*$           -                  [L]

#  anything else is a subdir which gets handled
#  by another cgi script
RewriteRule   !^netsw-lsdir\.cgi.*     -                  [C]
RewriteRule   (.*)                     netsw-lsdir.cgi/$1

Some hints for interpretation:

  1. Notice the L (last) flag and no substitution field ('-') in the fourth part
  2. Notice the ! (not) character and the C (chain) flag at the first rule in the last part
  3. Notice the catch-all pattern in the last rule

Reference:  http://httpd.apache.org/docs/2.0/misc/rewriteguide.html  (See also the excellent sections on blocking annoying robots, and other tricks).
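
The same "pass through with '-' and [L], then catch everything else" idiom could be borrowed for a per-directory file in /blog. This is only a sketch under assumptions: the images/ subdirectory and the /newblog/ target are made-up names, and it presumes some content stays behind in /blog.

```apache
RewriteEngine on

#  we are reached via the /blog/ prefix
RewriteBase   /blog/

#  leave anything under images/ (hypothetical directory) untouched
RewriteRule   ^images/.*$   -             [L]

#  everything else is permanently redirected to the new location
RewriteRule   ^(.*)$        /newblog/$1   [R=301,L]
```
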

4) I would consider arranging your blog files in a regular structure, say a new alphabetical file layout, so that wildcard rewrites reduce your total number of rewrites.
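
For instance, if the moved posts were filed by first letter, one wildcard rule could replace hundreds of individual ones. The /articles/ layout here is purely hypothetical, and the rule is written for VirtualHost context.

```apache
RewriteEngine on

# $1 is the whole file name, $2 is its first letter, so
# /blog/apache-tips.html is redirected to /articles/a/apache-tips.html
RewriteRule   ^/blog/(([a-z])[^/]*)$   /articles/$2/$1   [R=301,L]
```
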

With a large number of rewrites, especially where a permanent (R=301) redirect is used, I would ALWAYS USE HARD /etc/apache2 configuration files pulled in with an Include statement.  They are easier to back up, manage, and grep through, and problems are easier to evaluate after a graceful restart to pick up new changes.
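
A minimal sketch of that layout, with a made-up file name and placeholder URLs:

```apache
# In the VirtualHost (or main server) configuration; the path is hypothetical
Include /etc/apache2/redirects/blog-redirects.conf

# Hypothetical contents of blog-redirects.conf, one line per moved page:
#   Redirect permanent /blog/post-one.html http://www.example.com/articles/post-one.html
#   Redirect permanent /blog/post-two.html http://www.example.com/articles/post-two.html
```

Running apachectl configtest (apache2ctl on Debian-style systems) before apachectl graceful should catch syntax errors in the included file before they affect the running server.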

 

Thank you for your feedback.

------------------------
Keith Smith


---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss



--
Skype: 6022393392
ATT:     5037544452
GV:      6923073392
Phoenix Linux Security Team   PLUG.PHOENIX.AZ.US
http://www.it-clowns.com
"Great things are not done by impulse but a series of small things brought together." -Van Gogh

















