Archive for May, 2008
Doing Mod_Rewrite Right
Posted by pbg in General, Web Development on May 16th, 2008
There are a few different things to do to get Apache's mod_rewrite right. Overall it isn't too difficult, but setting it up properly at the beginning is the key. You don't want to have to catch every little exception in mod_rewrite regular expressions. Using your database to store URL-safe strings makes the whole process much more efficient, and this little fact is usually not mentioned in mod_rewrite tutorials.
You really do want to keep the mod_rewrite rules simple. Don't try to write a complex regexp in mod_rewrite that handles all kinds of apostrophes, special characters, etc. (like I did). You don't need question marks, quotation marks, or colons in the rewritten URL for it to be useful to search engines. You can turn a title like “O’mally’s dog’s bone” into http://domain.com/Omallys_dogs_bone, and there is definitely enough textual sense in that rewritten URL for a search engine to deal with it.
Take the table that holds all your content data and add a field for a safe title. Then process your old titles into the new field: in your looping construct, use a bit of PHP to clean your old titles of spaces, quotes, slashes, and other silly things.
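$punctuations = array('.', '\'', '"', '“', '”', '‘', '’', '?', '!', '*', '=', ':', '%', '@', '&', ',', '/'); // straight and curly quotes included
// any character missing from this list (e.g. '-' or '(') will still fail the [a-zA-Z0-9_] rewrite rule
$safeTitle = str_replace($punctuations, "", $title); // get rid of the junk
$safeTitle = str_replace(" ", "_", $safeTitle); // replace spaces with underscores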
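A possible alternative is to whitelist instead of blacklist: keep only the characters the rewrite rule accepts, so the database value and the RewriteRule can never disagree. This is a sketch, not the method above; `makeSafeTitle()` is a hypothetical helper name, and the sample rows stand in for rows you would normally fetch from your content table.

```php
<?php
// Hypothetical helper: build a URL-safe title containing only the characters
// the rewrite rule accepts ([A-Za-z0-9_]).
function makeSafeTitle(string $title): string
{
    // 1. turn runs of whitespace into a single underscore
    $safe = preg_replace('/\s+/', '_', trim($title));
    // 2. drop every byte that is not a letter, digit or underscore
    //    (this also removes curly quotes and any other non-ASCII characters)
    $safe = preg_replace('/[^A-Za-z0-9_]/', '', $safe);
    // 3. collapse repeated underscores left behind by removed characters
    $safe = preg_replace('/_+/', '_', $safe);
    return trim($safe, '_');
}

// Example loop over old titles (these rows are made-up sample data)
$rows = array(
    array('articleid' => 249, 'title' => 'O’mally’s dog’s bone'),
    array('articleid' => 250, 'title' => 'Earth & Moon: a "love" story?'),
);
foreach ($rows as $row) {
    echo $row['articleid'], ' => ', makeSafeTitle($row['title']), "\n";
}
```

The trade-off: accented letters such as é are dropped entirely rather than transliterated, and two different titles can collapse to the same safe string, so a unique index on the safe title column may be worth considering.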
Now you have a safe-title column that you can add to your output queries and use to fill in the links on your pages, for mod_rewrite goodness.
Make your mod_rewrite rule in your .htaccess file. Note that the rule has a place for two variables, each of which matches a string of upper- and lower-case letters, the digits 0-9, and the underscore character. And of course, it turns it all back into a query string to submit to your content page.
RewriteEngine On
RewriteRule ^/?([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)(/)?$ item.php?safeTopicName=$1&safeTitle=$2 [L]
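To check which request paths the rule accepts without touching Apache, one option is to run the same pattern through preg_match() in PHP. This is a sketch assuming the pattern is copied verbatim from the rule; `rewriteTarget()` and `itemLink()` are hypothetical helpers. In .htaccess context Apache strips the leading slash before matching, which is presumably why the pattern starts with `^/?`.

```php
<?php
// Hypothetical helper that mimics the RewriteRule's pattern and substitution.
// Apache itself is not involved; this only shows what $1 and $2 would be.
function rewriteTarget(string $path): ?string
{
    $pattern = '#^/?([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)(/)?$#';
    if (!preg_match($pattern, $path, $m)) {
        return null; // no match: the rule would leave this path alone
    }
    // $m[1] and $m[2] play the role of $1 and $2 in the substitution
    return 'item.php?safeTopicName=' . $m[1] . '&safeTitle=' . $m[2];
}

// Hypothetical helper for the other direction: build a link from the safe columns
function itemLink(string $safeTopicName, string $safeTitle): string
{
    return '/' . rawurlencode($safeTopicName) . '/' . rawurlencode($safeTitle);
}

foreach (array('/Planets/earth', 'Planets/earth/', 'item.php', "Planets/earth's") as $path) {
    var_dump(rewriteTarget($path));
}
```

Because the safe strings only contain letters, digits and underscores, rawurlencode() leaves them unchanged; it is there only as a guard if an unsafe value ever slips into the column.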
Almost done, right? Eh, not quite. Almost, though. Don't screw over your existing users, who may have linked to something of yours in the past. You can still support your old style of referencing your web content, and you most definitely should. You can write query string validation checks that allow transparent access to content through either the old query string method or the new one.
if (!empty($_GET["safeTopicName"]) && !empty($_GET["safeTitle"])) {
    // new style: /Topic_Name/Article_Title, rewritten by mod_rewrite
    // use = rather than LIKE: in LIKE, the underscores in safe titles are wildcards
    $sql = sprintf("SELECT topicId
        FROM contentTopics
        WHERE safeTopicName = '%s'",
        mysql_real_escape_string($_GET["safeTopicName"]));
    diode($topicId = $db->getOne($sql), $sql); // my db connection wrapper
    $sql = sprintf("SELECT articleid
        FROM content
        WHERE safeTitle = '%s'",
        mysql_real_escape_string($_GET["safeTitle"]));
    diode($articleid = $db->getOne($sql), $sql);
} else {
    // old style: item.php?topicId=2&articleid=249
    if (!empty($_GET["topicId"])) {
        $topicId = (int)$_GET["topicId"];
    }
    if (!empty($_GET["articleid"])) {
        $articleid = (int)$_GET["articleid"];
    }
}
if (!isset($topicId) || !isset($articleid)) {
    addMessage("no item found", "MsgErr");
    redirect();
    exit();
}
A couple of notes: I'm using PEAR, plus a couple of custom functions (diode(), addMessage(), redirect()) for efficiency's sake. Note the use of (int) and mysql_real_escape_string() for sanitizing and typing. And yes, there are probably better ways to write this, but you get the idea. Look for your $_GET vars: if neither the new set nor the old set gives you an item, there is no result. Otherwise, process them so that the rest of the code no longer depends on which style was used, and a user can get to your site with /Planets/earth as well as with item.php?topicId=2&articleid=249.
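The same decision can be pulled into a small function and exercised without a database. This is a sketch assuming PHP 7.1+ (for the `?array` return type and `??`); `resolveItem()`, `$findTopicId` and `$findArticleId` are hypothetical names, and the callables stand in for the two SELECT queries.

```php
<?php
// Hypothetical, database-free sketch of "accept both URL styles".
// $findTopicId / $findArticleId stand in for the lookups on contentTopics and content.
function resolveItem(array $get, callable $findTopicId, callable $findArticleId): ?array
{
    $topicName = $get['safeTopicName'] ?? null;
    $title = $get['safeTitle'] ?? null;

    if (is_string($topicName) && $topicName !== '' && is_string($title) && $title !== '') {
        // New style: /Planets/earth, rewritten to item.php?safeTopicName=Planets&safeTitle=earth
        $topicId = $findTopicId($topicName);
        $articleId = $findArticleId($title);
    } else {
        // Old style: item.php?topicId=2&articleid=249, forced to integers
        $topicId = isset($get['topicId']) ? (int)$get['topicId'] : null;
        $articleId = isset($get['articleid']) ? (int)$get['articleid'] : null;
    }

    // Whichever style was used, both ids must be known; otherwise "no item found"
    if (empty($topicId) || empty($articleId)) {
        return null;
    }
    return array('topicId' => $topicId, 'articleid' => $articleId);
}
```

Passing the lookups in as callables is only a testing convenience; in a real page they would run the queries through your own database layer, and the rest of the script would work with the returned ids regardless of the URL style.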
To Recap:
- Set up safe versions of your content titles
- Process the old titles with a script
- Make a simpler rewrite rule as a result
- Set up your validation to process both kinds of queries
- Marvel at how much simpler it was to do it that way than to try to do it all with mod_rewrite alone.
Web Form Security: Moving target vs. Honeypot
Posted by pbg in General, Web Development, php on May 13th, 2008
In my last blog post I alluded to randomizing form field names as a solution to form attacks. Here is an example of how it can be done for a simple form page. There is no doubt more than one way to accomplish this kind of idea, so please treat this example only as a basic demo that suited my needs.
Create a moving target that attackers cannot seize upon repeatedly. Build arrays in a looping construct for all the form fields you want to assign in your page, and store them in a PHP session array. You can use built-in PHP functions such as md5(), uniqid(), microtime(), and mt_rand(), plus a salt value if you like. Output your form fields dynamically, using PHP to assign the randomized hash to the name attribute of each form field. When someone enters some data and submits the form, the script takes the $_POST array and compares its keys to the names stored in $_SESSION. You can then do further validation and assign the values to common-sense variable names that never appear in the form itself.
When a submission passes this check, you know it was made from a copy of your form page that was generated for that session. While referrers can be spoofed, the form field names can't be guessed in advance, because they are only created at runtime.
The honeypot is the inverse approach, and it also has lots of fans in its camp. A honeypot is a web form with additional form elements, usually hidden, that get discovered by a spammer's crawler. The spammer then seizes upon the field name and fills it in during an attack. But since the form field isn't visible to users in the browser, any submission that fills it in must be some kind of forged submission, and is worth filtering out.
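For comparison, the honeypot check itself can be very small. This is a sketch; the trap field name `homepage_url` and the helper `isHoneypotTripped()` are hypothetical, and the markup in the comment is just one way to hide the field from people.

```php
<?php
// Hypothetical honeypot check. The trap field name "homepage_url" is an assumption.
// In the form it would be rendered empty and hidden from people, for example:
//   <div style="display:none"><input type="text" name="homepage_url" value="" /></div>
function isHoneypotTripped(array $post, string $trapName = 'homepage_url'): bool
{
    if (!isset($post[$trapName])) {
        // Trap not submitted at all (note: a bot that simply omits it also passes)
        return false;
    }
    $value = $post[$trapName];
    // An array (name="homepage_url[]") or any non-blank text means something filled it in
    return is_array($value) || trim((string)$value) !== '';
}
```

In a form handler this would run on $_POST after the rest of the submission has been read, which is why forged submissions are only filtered out at that point.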
The advantage of the moving target over the honeypot is that forged submissions can be filtered out earlier in the script. Also, an attacker could easily analyze the form page once, work out which form fields to leave out, and add that information to the submitting script. They visit the page once, make a correction, and are back in business. Even so, the honeypot is known as a successful defense, and it succeeds for the very reason spam is spam: people mess with your site without ever visiting it, not even once. And if you are using an off-the-shelf website-in-a-box like WordPress or Drupal or whatever, the attacker can attack your site even more easily, with its cookie-cutter template form elements, each one the same as the million others already out there.
It is very economical to attack as many sites as possible in the same way as possible. It will always be so.
I have had my share of naysayers over the moving target method. Please allow me to reply to a few of the comments others have already made.
Why not just randomize the form name? Why all the form fields? I guess you could, but there are really a couple of answers. First is the concept of defense in depth: secure the whole thing, not just one element that an attacker could lock on to. Second, it is simple enough to do the work in PHP to generate all the form field names you wish.
The site could still be attacked. Yes. Assume that it will be. Funky forms are, of course, not the only line of defense you must apply to stop your site from being trashed. What I was able to accomplish here is to break the link between the site and the garden-variety automated attack, which must know your form name and the names of your input fields in advance in order to forge the rest of the information. The client must be on your web page in real time to submit data into your form, and in fact that is all the moving target approach does. The attacker still harvests your page, prepares a remote HTTP attack in the guise of a simulated form posting, then goes to work, submitting to all the websites. But nothing gets through to a site using the moving target approach, because the field names won't match up.
A position-based attacker could still hit it. Yes, but of course having this in place doesn't mean you are done validating your input. Spam, like anything else, is a matter of economics, in terms of both time and money. Yes, someone could get you, but it's not likely, because, like two boxers in a ring, both have to be stationary for a moment for a punch to connect. Otherwise it's much harder to be effective, and the punch is much less powerful. The analogy is a fair one: the time required to hit a site using a moving target is greater than the time needed for the usual kind of automated crawling and submitting designed for static form field names. The mere fact that you require your user to actually be on your page is enough in itself for attackers not to bother changing their tactics for millions of websites, or to lose so much time making an exception for you that it becomes uneconomical. As it stands, they may never even know that their submission was unsuccessful. You can of course push suspicious submissions to Akismet.
Yeah, but sessions are evil and should never be used. Some have said so. Not too long ago they didn't work very well, but that isn't the case anymore. Drupal doesn’t use sessions, for example, and other middleware avoids them as well; projects with requirements for handling legacy code, or particular kinds of services or policies, may insist that sessions not be used. But it is even more evil to never use sessions simply because you don't understand how to use them properly and parsimoniously.
First comes your form page: use some PHP before the form to generate the field names you need.
<?php
session_start();
// generate fresh field names unless we are redisplaying a previous submission
if (empty($_SESSION["subscriber"]["values"])) {
    $fieldNamesCount = 11;
    $fieldNamesArray = array();
    for ($i = 0; $i < $fieldNamesCount; $i++) {
        // $fieldNamesArray[] = md5("killSpam" . uniqid(microtime(), true)); // random cocktail with salt, if you wish
        $fieldNamesArray[] = uniqid(md5(mt_rand())); // random cocktail
    }
    $_SESSION["subscriber"]["fieldNames"] = $fieldNamesArray;
} else {
    // do something when it's a return page
}
// debugging only: show the generated names
echo "<pre>"; print_r($_SESSION["subscriber"]["fieldNames"]); echo "</pre>";
?>
...and then your form fields look something like this:
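<!-- submitted values are escaped with htmlspecialchars() because they come back from the user -->
Name: <input type="text" name="<?php echo $_SESSION["subscriber"]["fieldNames"][0]; ?>" value="<?php echo htmlspecialchars($_SESSION["subscriber"]["values"][$_SESSION["subscriber"]["fieldNames"][0]] ?? ''); ?>" size="20" maxlength="50" />
Phone: <input type="text" name="<?php echo $_SESSION["subscriber"]["fieldNames"][1]; ?>" value="<?php echo htmlspecialchars($_SESSION["subscriber"]["values"][$_SESSION["subscriber"]["fieldNames"][1]] ?? ''); ?>" size="20" maxlength="20" />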
You submit this to your form target script. If you open Page Info in Firefox and look under the Forms tab, you will see form field names created from random hashes generated at runtime. The names will be different on every fresh page load, so the user must be on the page to submit.
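If you want stronger randomness than mt_rand(), one option is a field-name generator built on random_bytes(). This is a sketch assuming PHP 7+; it is a swapped-in variant of the `uniqid(md5(mt_rand()))` line, and it uses a hypothetical session layout (a map of random name => real name) instead of a plain list. `createFieldMap()` and the `f` prefix are illustrative choices, and a real page would call session_start() and store the map in $_SESSION.

```php
<?php
// Sketch of a field-name generator (PHP 7+ for random_bytes()).
// random_bytes() draws from the operating system's CSPRNG; mt_rand() does not.
// The "f" prefix is arbitrary: it keeps names from ever looking numeric,
// because PHP turns numeric-looking $_POST keys into integer array keys.
function createFieldMap(array $realNames): array
{
    $map = array();
    foreach ($realNames as $realName) {
        do {
            $random = 'f' . bin2hex(random_bytes(16)); // "f" + 32 hex characters
        } while (isset($map[$random])); // a repeat is practically impossible, but be sure
        $map[$random] = $realName;
    }
    return $map; // random name => real name; this is what would go into the session
}

$fieldMap = createFieldMap(array('Name', 'Telephone'));
foreach ($fieldMap as $random => $realName) {
    // the names are hex only, but escaping keeps the habit consistent
    printf('%s: <input type="text" name="%s" value="" />' . "\n",
        $realName, htmlspecialchars($random));
}
```

Keeping the real name next to each random name means the receiving script can translate fields by name rather than by position.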
So let's take a look at the script you are posting this data to.
Let's just assume that you are pointing this form submission to a different file; here is what is required at a minimum:
<?php
session_start();
if (empty($_POST)) {
    echo "no post reference";
    exit();
}
// compare $_SESSION["subscriber"]["fieldNames"]
// to array_keys($_POST);
if (empty($_SESSION["subscriber"]["fieldNames"])) {
    echo "no ref to my session";
    exit();
}
$postedKeys = array_keys($_POST); // I need to access this as an array.
$_SESSION["subscriber"]["values"] = $_POST; // kept so the form can be redisplayed
$realNames = array('Name', 'Telephone' /* , .... etc */);
$realValues = array();
for ($i = 0; $i < count($postedKeys); $i++) {
    $expected = isset($_SESSION["subscriber"]["fieldNames"][$i]) ? $_SESSION["subscriber"]["fieldNames"][$i] : null;
    // strict comparison: before PHP 8, a field named "0" arrives as integer key 0, and 0 == "abc..." is true
    if ($postedKeys[$i] === $expected) {
        // no cheating! you must use my randomly generated field names to use this page!!!!
        $realValues[$realNames[$i]] = $_SESSION["subscriber"]["values"][$expected];
    } else {
        // it's the work of satan
        echo "please don't do that";
        exit();
    }
}
So if it passes all the tests, it's good to go. Otherwise, it's like two people talking to each other who don't speak each other's language: they will never understand what the other is saying, and will just move on.
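The loop above checks field names position by position, so it relies on the browser posting fields in the same order they appear in the form. An order-independent variant is sketched below, assuming the session holds a map of random field name => real field name (a hypothetical layout, not the fieldNames list used above). `mapPostedFields()` is a hypothetical helper.

```php
<?php
// Sketch of an order-independent check against a map of random name => real name.
// Returns the values keyed by real field names, or null if the names don't match.
function mapPostedFields(array $post, array $fieldMap): ?array
{
    // Compare the two sets of names as strings, ignoring order
    $expected = array_map('strval', array_keys($fieldMap));
    $posted = array_map('strval', array_keys($post));
    sort($expected, SORT_STRING);
    sort($posted, SORT_STRING);
    if ($expected !== $posted) {
        return null; // missing, extra or forged field names
    }
    $realValues = array();
    foreach ($fieldMap as $random => $realName) {
        $realValues[$realName] = $post[$random];
    }
    return $realValues; // still needs the usual per-field validation
}
```

Two practical notes: a named submit button also shows up in $_POST, so either leave it unnamed or include it in the map; and clearing the map from the session after a successful submission would make each set of names usable only once.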