Input validation is the
process of testing input received by the application for compliance
against a standard defined within the application. It can be as simple
as strictly typing a parameter and as complex as using regular
expressions or business logic to validate input. There are two different
types of input validation approaches: whitelist validation (sometimes
referred to as inclusion or positive validation) and blacklist
validation (sometimes known as exclusion or negative validation). These
two approaches, and examples of validating input in Java, C#, and PHP to
prevent SQL injection, are detailed in the following subsections.
Tip
When performing
input validation you should always ensure that the input is in its
canonical (simplest) form before making any input validation decisions.
This may involve decoding the input into a simpler format, or just
rejecting input that isn't already in canonical format where
non-canonical input isn't expected.
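As a rough illustration of the "reject non-canonical input" option, the following Java sketch treats a value as canonical only if URL decoding and Unicode normalization leave it unchanged. The class and method names are hypothetical. It also assumes the framework has already decoded the request once, so any encoding that remains is unexpected.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

final class Canonicalizer {
    /**
     * Returns true only if the input is already in its simplest form:
     * it contains no URL encoding and no Unicode compatibility
     * characters (for example, fullwidth quotes) that normalize to
     * something else. Assumption: a literal '+' or '%' is not expected
     * in this field, because URLDecoder treats both as encoding.
     */
    static boolean isCanonical(String input) {
        if (input == null) {
            return false;
        }
        try {
            String decoded = URLDecoder.decode(input, StandardCharsets.UTF_8);
            String normalized = Normalizer.normalize(decoded, Normalizer.Form.NFKC);
            return normalized.equals(input);
        } catch (IllegalArgumentException e) {
            // Malformed percent-encoding such as a trailing "%"
            return false;
        }
    }
}
```

A caller would run this check first and apply whitelist or blacklist validation only to values that pass it.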
Whitelisting
Whitelist validation is
the practice of only accepting input that is known to be good. This can
involve validating compliance with the expected type, length or size,
numeric range, or other format standards before accepting the input for
further processing. For example, validating that an input value is a
credit card number may involve validating that the input value contains
only numbers, is between 13 and 16 digits long, and passes the business
logic check of correctly passing the Luhn formula (the formula for
calculating the validity of a number based on the last “check” digit of
the card).
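The credit card example above can be sketched in Java as follows. The class and method names are illustrative only. The code applies the three checks in the order described: character set, length, and then the Luhn formula.

```java
final class CreditCardValidator {
    /** Whitelist check: digits only, 13 to 16 long, and a valid Luhn check digit. */
    static boolean isValidCardNumber(String input) {
        if (input == null || !input.matches("[0-9]{13,16}")) {
            return false;
        }
        return passesLuhn(input);
    }

    /** Luhn formula: double every second digit from the right and sum the digits. */
    static boolean passesLuhn(String digits) {
        int sum = 0;
        boolean doubleIt = false; // the rightmost (check) digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as adding the two digits of the product
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }
}
```

With this sketch, the widely used test number 4111111111111111 is accepted, while the same number with separators or with an altered last digit is rejected.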
When using whitelist validation you should consider the following points:
Data type
Is the data type correct? If the value is supposed to be numeric, is it
numeric? If it is supposed to be a positive number, is it a negative
number instead?
Data size
If the data is a string, is it of the correct length? Is it less than
the expected maximum length? If it is a binary blob, is it less than the
maximum expected size? If it is numeric, is it of the correct size or
accuracy? (For example, if an integer is expected, is the number that is
passed too large to be an integer value?)
Data range
If the data is numeric, is it in the expected numeric range for this type of data?
Data content
Does the data look like the expected type of data? For example, does it
satisfy the expected properties of a ZIP Code if it is supposed to be a
ZIP Code? Does it contain only the expected character set for the data
type expected? If a name value is submitted, only some punctuation
(single quotes and character accents) would normally be expected, and
other characters, such as the less than sign (<), would not be
expected.
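The checklist above might translate into code along the following lines for a simple numeric field. This is a minimal sketch: the field (an order quantity) and its limits are assumptions made for the example, not values from the text.

```java
final class QuantityValidator {
    // Hypothetical business limits for an order quantity field
    private static final int MIN_QUANTITY = 1;
    private static final int MAX_QUANTITY = 100;

    static boolean isValidQuantity(String input) {
        // Data size and content: one to three ASCII digits, nothing else.
        // This also rules out signs, whitespace, and SQL metacharacters.
        if (input == null || !input.matches("[0-9]{1,3}")) {
            return false;
        }
        // Data type: safe to parse, since at most three digits fit in an int
        int quantity = Integer.parseInt(input);
        // Data range: enforce the expected business range
        return quantity >= MIN_QUANTITY && quantity <= MAX_QUANTITY;
    }
}
```

Checking size and content before parsing keeps the type conversion from ever seeing unexpected input.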
A common method of
implementing content validation is to use regular expressions. Following
is a simple example of a regular expression for validating a U.S. ZIP
Code contained in a string:
^\d{5}(-\d{4})?$
In this case, the regular expression matches both five-digit and five-digit + four-digit ZIP Codes as follows:
^\d{5}
Match exactly five numeric digits at the start of the string.
(-\d{4})?
Match the dash character plus exactly four digits either once (present) or not at all (not present).
$
Match the end of the string. If there is additional
content at the end of the string, the regular expression will not match.
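A minimal Java sketch of applying the ZIP Code expression described above could look like this. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class ZipCodeValidator {
    // Five digits, optionally followed by a dash and four more digits
    private static final Pattern ZIP = Pattern.compile("^\\d{5}(-\\d{4})?$");

    static boolean isValidZip(String input) {
        // matches() requires the whole input to match, so a trailing
        // newline or any appended SQL is rejected
        return input != null && ZIP.matcher(input).matches();
    }
}
```

Using matches() rather than find() is a deliberate choice here: in Java, $ on its own can match just before a final line terminator, so find() would let a trailing newline through.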
In general, whitelist
validation is the more powerful of the two input validation approaches.
It can, however, be difficult to implement in scenarios where there is
complex input, or where the full set of possible inputs cannot be easily
determined. Difficult examples may include applications that are
localized in languages with large character sets (e.g., Unicode
character sets such as the various Chinese and Japanese character sets).
It is recommended that you use whitelist validation wherever possible,
and then supplement it by using other controls such as output encoding
to ensure that information that is then submitted elsewhere (such as to
the database) is handled correctly.
Designing an Input Validation and Handling Strategy
Input validation is
a valuable tool for securing an application. However, it should be only
part of a defense-in-depth strategy, with multiple layers of defense
contributing to the application's overall security.
For example, such a strategy could combine the following controls:
Whitelist input validation is used at the application input layer to validate all user input as it is accepted by the application. The application allows only input that is in the expected form.
Whitelist input validation is also performed at the client's browser. This is done to avoid a round trip to the server in case the user enters data that is unacceptable. You cannot rely on this as a security control, as all data from the user's browser can be altered by an attacker.
Blacklist and whitelist input validation is present at a Web application firewall (WAF) layer (in the form of vulnerability “signatures” and “learned” behavior) to provide intrusion detection/prevention capabilities and monitoring of application attacks.
Parameterized statements are used throughout the application to ensure that safe SQL execution is performed.
Encoding is used within the database to safely encode input when used in dynamic SQL.
Data extracted from the database is appropriately encoded before it is used. For example, data being displayed in the browser is encoded to prevent cross-site scripting (XSS).
Blacklisting
Blacklisting is the
practice of only rejecting input that is known to be bad. This commonly
involves rejecting input that contains content that is specifically
known to be malicious by looking through the content for a number of
“known bad” characters, strings, or patterns. This approach is generally
weaker than whitelist validation because the list of potentially bad
characters is extremely large, and as such any list of bad content is
likely to be large, slow to run through, incomplete, and difficult to
keep up to date.
A common method of
implementing a blacklist is also to use regular expressions, with a list
of characters or strings to disallow, such as the following example:
'|%|--|;|/\*|\\\*|_|\[|@|xp_
In general, you
should not use blacklisting in isolation, and you should use
whitelisting if possible. However, in scenarios where you cannot use
whitelisting, blacklisting can still provide a useful partial control.
In these scenarios, however, it is recommended that you use blacklisting
in conjunction with output encoding to ensure that input passed
elsewhere (e.g., to the database) is subject to an additional check to
ensure that it is correctly handled to prevent SQL injection.
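To make the weakness of this approach concrete, here is a small Java sketch that applies the blacklist expression shown above. The class and method names are hypothetical. The only change to the expression is the extra escaping that a Java string literal requires.

```java
import java.util.regex.Pattern;

final class BlacklistFilter {
    // The blacklist expression from the text, escaped for a Java string literal
    private static final Pattern KNOWN_BAD =
            Pattern.compile("'|%|--|;|/\\*|\\\\\\*|_|\\[|@|xp_");

    /** Returns true if the input contains any "known bad" character or string. */
    static boolean containsKnownBad(String input) {
        return KNOWN_BAD.matcher(input).find();
    }
}
```

With this filter, a legitimate name such as O'Brien is rejected because of its single quote, while the injection string 1 OR 1=1 contains none of the listed patterns and passes. Both outcomes show why a blacklist should be treated as a partial control only.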
What to Do When Input Fails Validation?
So, what do you do when
input fails validation? There are two major approaches: recovering and
continuing on, or failing the action and reporting an error. Each has
its advantages and disadvantages:
Recovering
Recovering from an input validation failure implies that the input can
be sanitized or fixed—that is, that the problem that caused the failure
can be solved programmatically. This is generally more likely to be
possible if you are taking a blacklisting approach for input validation,
and it commonly takes the approach of removing bad characters from the
input. The major difficulty with this approach is ensuring that the
filtering or removal of values actually sanitizes the input, and
doesn't just mask the malicious input, which can still lead to SQL
injection issues.
Failing
Failing the action entails generating a security error, and possibly
redirecting to a generic error page indicating to the user that the
application had a problem and cannot continue. This is generally the
safer option, but you should still be careful to make sure that no
information regarding the specific error is presented to the user, as
this could help an attacker determine what the validation is checking
for. The major disadvantage of this approach is that the
user experience is interrupted and any transaction in progress may be
lost. You can mitigate this by additionally performing input validation
at the client's browser, so that genuine users are unlikely to submit
invalid data, but you cannot rely on this as a control because a
malicious user can change what is ultimately submitted to the site.
Whichever approach you
choose, ensure that you log that an input validation error has occurred
in your application logs. This could be a valuable resource for you to
use to investigate an actual or attempted break-in to your application.
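The "fail and log" approach could be sketched in Java as follows, using the standard java.util.logging package. The class name, method name, parameters, and message text are all assumptions made for illustration. In practice you may also want to avoid logging values from sensitive fields such as passwords.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ValidationFailureHandler {
    private static final Logger LOG =
            Logger.getLogger(ValidationFailureHandler.class.getName());

    // Generic text for the user; it reveals nothing about the rule that failed
    static final String GENERIC_MESSAGE = "The request could not be processed.";

    /** Logs the details of the failure and returns the generic user message. */
    static String fail(String fieldName, String rejectedValue, String clientAddress) {
        // Replace CR/LF so the rejected value cannot forge extra log lines
        String safeValue = rejectedValue == null
                ? "null"
                : rejectedValue.replaceAll("[\\r\\n]", "_");
        LOG.log(Level.WARNING,
                "Input validation failed: field={0} client={1} value={2}",
                new Object[] {fieldName, clientAddress, safeValue});
        return GENERIC_MESSAGE;
    }
}
```

The detail goes to the application log for later investigation, and the user sees only the generic message.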
Validating Input in Java
In
Java, input validation support is specific to the framework being used.
To demonstrate input validation in Java, we will look at how a common
framework for building Web applications in Java, JavaServer Faces
(JSF), provides support for input validation. For this purpose, the best
way to implement input validation is to define an input validation
class that implements the javax.faces.validator.Validator interface. Refer to the following code snippet for an example of validating a username in JSF:
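import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.faces.application.FacesMessage;
import javax.faces.component.UIComponent;
import javax.faces.context.FacesContext;
import javax.faces.validator.Validator;
import javax.faces.validator.ValidatorException;

public class UsernameValidator implements Validator {
    public void validate(FacesContext facesContext,
            UIComponent uIComponent, Object value) throws ValidatorException
    {
        //Get supplied username and cast to a String
        String username = (String)value;
        //Set up regular expression: 8 to 12 letters only
        Pattern p = Pattern.compile("^[a-zA-Z]{8,12}$");
        //Match username against the whole expression
        Matcher m = p.matcher(username);
        boolean matchFound = m.matches();
        if (!matchFound) {
            //Build an error message and reject the value
            FacesMessage message = new FacesMessage();
            message.setDetail("Not valid - it must be 8-12 letters only");
            message.setSummary("Username not valid");
            message.setSeverity(FacesMessage.SEVERITY_ERROR);
            throw new ValidatorException(message);
        }
    }
}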
And the following will need to be added to the faces-config.xml file in order to enable the above validator:
<validator>
<validator-id>namespace.UsernameValidator</validator-id>
<validator-class>namespace.package.UsernameValidator</validator-class>
</validator>
You can then refer to this in the related JSP file as follows:
<h:inputText value="username" id="username" required="true">
    <f:validator validatorId="namespace.UsernameValidator" />
</h:inputText>
An
additional useful resource for implementing input validation in Java is
the OWASP Enterprise Security API (ESAPI) that you can download at www.owasp.org/index.php/ESAPI.
ESAPI is a freely available reference implementation of
security-related methods that you can use to build a secure application.
This includes an implementation of an input validation class, org.owasp.esapi.reference.DefaultValidator, which you can use directly or as a reference implementation for a custom input validation engine.
Validating Input in .NET
ASP.NET features a number of built-in controls that you can use for input validation, the most useful of which are the RegularExpressionValidator control and the CustomValidator
control. Using these controls with an ASP.NET application provides the
additional benefit that client-side validation will also be performed,
which will improve the user experience in case the user genuinely enters
erroneous input. The following code is an example of the use of RegularExpressionValidator to validate that a username contains only letters (uppercase and lowercase) and is between eight and 12 characters long:
<asp:textbox id="userName" runat="server"/>
<asp:RegularExpressionValidator id="usernameRegEx" runat="server"
    ControlToValidate="userName"
    ErrorMessage="Username must contain 8–12 letters only."
    ValidationExpression="^[a-zA-Z]{8,12}$" />
The next code snippet is an example of the use of CustomValidator to validate that a password is correctly formatted. In this case, you also need to create two user-defined functions: PwdValidate on the server to perform validation on the password value, and ClientPwdValidate in client-side JavaScript or VBScript to validate the password value at the user's browser.
<asp:textbox id="txtPassword" runat="server"/>
<asp:CustomValidator runat="server"
    ControlToValidate="txtPassword"
    ClientValidationFunction="ClientPwdValidate"
    ErrorMessage="Password does not meet requirements."
    OnServerValidate="PwdValidate" />
Validating Input in PHP
As PHP is not
directly tied to a presentation layer, input validation support in PHP,
as in Java, is specific to the framework in use. Because there is no
presentation framework in PHP with overwhelming popularity, a large
number of PHP applications implement input validation directly in their
own code.
You can use a number of functions in PHP as the basic building blocks for building input validation, including the following:
preg_match(regex, matchstring)
Do a regular expression match with matchstring using the regular expression regex.
is_<type>(input)
Check whether the input is <type>; for example, is_numeric().
strlen(input)
Check the length of the input.
An example of using preg_match to validate a form parameter could be as follows:
$username = $_POST['username'];
if (!preg_match('/^[a-zA-Z]{8,12}$/D', $username)) {
    // handle failed validation
}