On February 23, 2006, an American B-2 bomber
crashed shortly after take-off in Guam due to bad data being fed to the
airplane's flight control computers. A lack of data checking resulted
in the loss of a $2.1 billion plane. As with any user interaction in
programming, it is foolish to trust data without validating its
integrity first.
One of the main ways of validating user input is to
verify the data definition for the database. More often than not, the
database definition will be known at the time of application
development. You can then verify user input against a known
specification. However, if you do not have this luxury, you will need to
query the database for its definition and ensure the user's data does
not run afoul of it.
Querying the database for its structure
If we are completely ignorant of a database's structure, we need to first retrieve a table listing. To affect that, we use SHOW TABLES.
statement = """SHOW TABLES"""
command = cur.execute(statement)
Be sure to follow the execute() call with fetchall() assigned to a variable to hold the tuple that is returned.
The tuple results can then be accessed to give the user a choice.
print "Which table would you like to use?"
for i in xrange(0, len(results)): print i+1, results[i][0]
choice = input("Input number:")
As fish only has one table, the output of the for loop would simply be:
But if we do the same for the world database, we get a more realistic selection:
1 City
2 Combo
3 Country
4 CountryLanguage
The user can then choose from the list. If we want to verify the user's data, we need to verify three things:
1. The value input by the user is only a number.
2. The numeric value is not outside the offered range.
3. The value is a whole number.
To validate the input as a number, we need to import the string module and use the isdigit() method of string objects.
We would then use an if statement along the following lines:
if choice.isdigit() is True:
print "We have a number!" ## or do further checking
We then need to confirm that the input is within the given range. To verify that the value of results is greater than 0 but not greater than the number of given options:
if (choice<0) or (choice>len(results)):
print "We need a new number!" ## or do further checking
Within the previous range, however, we still run into problems with decimals. We currently have no protection against choice being equal to 3.5, for example. There are a couple of ways that we can protect against this at the validation stage:
By checking the length of the input and telling the user that we need a single digit within the given range
By stripping out all but the first digit and returning the results to the user for confirmation
To check the length of the input, we simply use Python's built-in len() function in a conditional loop:
if len(choicea) != 1:
print "We need a single digit within the given range, please."
This, however, is not the most user-friendly way to
handle the data. In cases where there are a lot of choices, it can even
leave the user confused. Better is to offer an alternative by way of
confirmation. To do this, we convert the input to a string using
Python's built-in str() function and then present the first element of the indexed string to the user for confirmation.
choice_string = str(choice)
confirm = input("Did you mean %s?" %(choice_string[0]))
If confirm is assigned a positive value by the user—whether it is yes, true, or just 1, we should then convert the value of choice_string[0] to an integer. We do this with Python's built-in int() function.
real_choice = int(choice_string[0])
This has the benefit of handling input from users who
either have poor typing skills or who may otherwise input gobbledygook
after their initial, valid selection.
Retrieving the table structure
After validating the user's input, we have the choice
of database to be used. We now need to give the user details on the
fields being used by that table. Again, we use DESCRIBE.
table_statement = """DESCRIBE %s""" %(results[real_choice-1][0])
cur.execute(table_statement)
table_desc = cur.fetchall()
It is worth noting here that indices start at 0 but
our choices to the user started at 1. Therefore, whatever choice the
user makes must be reduced by one in order to synchronize it with the
index of results.
Also, we do not want to pass the value of the entire
tuple in the statement. We just want the value of the table to be
queried. Therefore, we must subscript the results record with a 0.
In MySQL, the DESCRIBE statement returns a
table. In MySQL for Python, we get another tuple. Each element in that
tuple is a row in the table returned by MySQL. So where MySQL would
return the following.
mysql> DESCRIBE CountryLanguage;
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| CountryCode | char(3) | NO | PRI | | |
| Language | char(30) | NO | PRI | | |
| IsOfficial | enum('T','F') | NO | | F | |
| Percentage | float(4,1) | NO | | 0.0 | |
+-------------+---------------+------+-----+---------+-------+
A prettified version of what Python returns is the following:
>>> for i in xrange(0, len(table_desc)): print table_desc[i]
...
('CountryCode', 'char(3)', 'NO', 'PRI', '', '')
('Language', 'char(30)', 'NO', 'PRI', '', '')
('IsOfficial', "enum('T','F')", 'NO', '', 'F', '')
The differences between xrange() and range() are often overlooked or even ignored. Both count through the same values, but they do it differently. Where range() calculates a list the first time it is called and then stores it in memory, xrange() creates an immutable sequence that returns the next in the series each time it is called. As a consequence, xrange() is much more memory efficient than range(),
especially when dealing with large groups of integers. As a consequence
of its memory efficiency, however, it does not support functionality
such as slicing, which range() does, because the series is not yet fully determined.
Each element of each row is then available by a further subscript for the column that you want to access:
>>> print table_desc[0][0]
CountryCode
>>> print table_desc[0][1]
char(3)
>>> print table_desc[1][3]
So to offer the user the format of the table columns, we could use the following code:
print "The records of table %s follow this format:" %(results[choice-1][0])
for i in xrange(0, len(table_desc)):
print table_desc[i][0]
The output is as follows:
The records of table CountryLanguage follow this format:
CountryCode
Language
IsOfficial
We can also walk through this data to give the user
the format of each field for each record in the table. The fields of
information for each field in any MySQL table remains constant and
follow this order:
Field: The name of the column
Type: The data type allowed in that column, along with its length
Null: Whether a null value is allowed
Key: Whether this value is a primary key for the table
Default: What the default value is, if no value is entered as input for this column
Extra: Any additional information about this column
To access a particular column, one simply appends the appropriate column number as a subscript, as shown previously.
Knowing this, one can code in helps options
for each field in turn. This can be a JavaScript pop-up or a manual page
for each column. For the sake of space, however, this is left here as
an exercise for yourself.