MySQL for Python : Simple Insertion - Using metadata

5/28/2013 7:37:54 PM

On February 23, 2006, an American B-2 bomber crashed shortly after take-off in Guam due to bad data being fed to the airplane's flight control computers. A lack of data checking resulted in the loss of a $2.1 billion plane. As with any user interaction in programming, it is foolish to trust data without validating its integrity first.

One of the main ways of validating user input is to verify the data definition for the database. More often than not, the database definition will be known at the time of application development. You can then verify user input against a known specification. However, if you do not have this luxury, you will need to query the database for its definition and ensure the user's data does not run afoul of it.

Querying the database for its structure

If we are completely ignorant of a database's structure, we need to first retrieve a table listing. To affect that, we use SHOW TABLES.

statement = """SHOW TABLES"""
command = cur.execute(statement)

Be sure to follow the execute() call with fetchall() assigned to a variable to hold the tuple that is returned.

results = cur.fetchall()

The tuple results can then be accessed to give the user a choice.

print "Which table would you like to use?"
for i in xrange(0, len(results)): print i+1, results[i][0]
choice = input("Input number:")

As fish only has one table, the output of the for loop would simply be:

1 menu

But if we do the same for the world database, we get a more realistic selection:

1 City
2 Combo
3 Country
4 CountryLanguage

The user can then choose from the list. If we want to verify the user's data, we need to verify three things:

1. The value input by the user is only a number.
2. The numeric value is not outside the offered range.

3. The value is a whole number.

To validate the input as a number, we need to import the string module and use the isdigit() method of string objects.

import string

We would then use an if statement along the following lines:

if choice.isdigit() is True:
print "We have a number!" ## or do further checking

We then need to confirm that the input is within the given range. To verify that the value of results is greater than 0 but not greater than the number of given options:

if (choice<0) or (choice>len(results)):
print "We need a new number!" ## or do further checking

Within the previous range, however, we still run into problems with decimals. We currently have no protection against choice being equal to 3.5, for example. There are a couple of ways that we can protect against this at the validation stage:

By checking the length of the input and telling the user that we need a single digit within the given range
By stripping out all but the first digit and returning the results to the user for confirmation

To check the length of the input, we simply use Python's built-in len() function in a conditional loop:

if len(choicea) != 1:
print "We need a single digit within the given range, please."

This, however, is not the most user-friendly way to handle the data. In cases where there are a lot of choices, it can even leave the user confused. Better is to offer an alternative by way of confirmation. To do this, we convert the input to a string using Python's built-in str() function and then present the first element of the indexed string to the user for confirmation.

choice_string = str(choice)
confirm = input("Did you mean %s?" %(choice_string[0]))

If confirm is assigned a positive value by the user—whether it is yes, true, or just 1, we should then convert the value of choice_string[0] to an integer. We do this with Python's built-in int() function.

real_choice = int(choice_string[0])

This has the benefit of handling input from users who either have poor typing skills or who may otherwise input gobbledygook after their initial, valid selection.

Retrieving the table structure

After validating the user's input, we have the choice of database to be used. We now need to give the user details on the fields being used by that table. Again, we use DESCRIBE.

table_statement = """DESCRIBE %s""" %(results[real_choice-1][0])
cur.execute(table_statement)
table_desc = cur.fetchall()

It is worth noting here that indices start at 0 but our choices to the user started at 1. Therefore, whatever choice the user makes must be reduced by one in order to synchronize it with the index of results.

Also, we do not want to pass the value of the entire tuple in the statement. We just want the value of the table to be queried. Therefore, we must subscript the results record with a 0.

In MySQL, the DESCRIBE statement returns a table. In MySQL for Python, we get another tuple. Each element in that tuple is a row in the table returned by MySQL. So where MySQL would return the following.

mysql> DESCRIBE CountryLanguage;
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| CountryCode | char(3) | NO | PRI | | |
| Language | char(30) | NO | PRI | | |
| IsOfficial | enum('T','F') | NO | | F | |
| Percentage | float(4,1) | NO | | 0.0 | |
+-------------+---------------+------+-----+---------+-------+

A prettified version of what Python returns is the following:

>>> for i in xrange(0, len(table_desc)): print table_desc[i]
...
('CountryCode', 'char(3)', 'NO', 'PRI', '', '')
('Language', 'char(30)', 'NO', 'PRI', '', '')
('IsOfficial', "enum('T','F')", 'NO', '', 'F', '')

The differences between xrange() and range() are often overlooked or even ignored. Both count through the same values, but they do it differently. Where range() calculates a list the first time it is called and then stores it in memory, xrange() creates an immutable sequence that returns the next in the series each time it is called. As a consequence, xrange() is much more memory efficient than range(), especially when dealing with large groups of integers. As a consequence of its memory efficiency, however, it does not support functionality such as slicing, which range() does, because the series is not yet fully determined.

Each element of each row is then available by a further subscript for the column that you want to access:

>>> print table_desc[0][0]
CountryCode
>>> print table_desc[0][1]
char(3)
>>> print table_desc[1][3]

So to offer the user the format of the table columns, we could use the following code:

print "The records of table %s follow this format:" %(results[choice-1][0])
for i in xrange(0, len(table_desc)):
print table_desc[i][0]

The output is as follows:

The records of table CountryLanguage follow this format:
CountryCode
Language
IsOfficial

We can also walk through this data to give the user the format of each field for each record in the table. The fields of information for each field in any MySQL table remains constant and follow this order:

Field: The name of the column
Type: The data type allowed in that column, along with its length
Null: Whether a null value is allowed
Key: Whether this value is a primary key for the table
Default: What the default value is, if no value is entered as input for this column
Extra: Any additional information about this column

To access a particular column, one simply appends the appropriate column number as a subscript, as shown previously.

Knowing this, one can code in helps options for each field in turn. This can be a JavaScript pop-up or a manual page for each column. For the sake of space, however, this is left here as an exercise for yourself.

Others