MySQL: Logging Statements (part 1) - Logging Queries

6/5/2013 4:01:39 AM

MySQL has traditionally employed statement-based replication and just recently implemented row-based replication.

In statement-based replication, the actual executed statement is written to the binary log together with some execution information, and the statement is reexecuted on the slave. Since not all statements can be logged as statements, there are some exceptions that you should be aware of. This section will describe the process of logging statements as well as the important caveats.

Since the binary log is a common resource—all threads write statements to it—it is critical to prevent two threads from updating the binary log at the same time. To handle this, a lock for the binary log—the LOCK_log mutex—is acquired just before the event is written to the binary log and released just after the event has been written. Because all session threads for the server log statements to the binary log, it is quite common for several session threads to block on this lock.
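
The locking pattern described above can be sketched with a small Python model. This is only an illustration, not the server's code: the BinaryLog class, its position counter, and the run_sessions helper are hypothetical names invented for the example, and a threading.Lock stands in for the LOCK_log mutex.

```python
import threading


class BinaryLog:
    """Toy stand-in for the binary log: an event list guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of LOCK_log
        self.position = 0              # offset where the next event starts
        self.events = []               # (start_position, payload) pairs

    def write_event(self, payload):
        # Acquire the lock just before writing and release it just after,
        # so reading and advancing the position is never interleaved.
        with self._lock:
            start = self.position
            self.events.append((start, payload))
            self.position = start + len(payload)


def session(log, session_id, count):
    # Every session thread logs its statements through the same lock.
    for i in range(count):
        log.write_event(f"session {session_id} statement {i}")


def run_sessions(n_sessions=4, per_session=500):
    log = BinaryLog()
    threads = [
        threading.Thread(target=session, args=(log, s, per_session))
        for s in range(n_sessions)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because every write goes through the same lock, the recorded positions stay contiguous no matter how the threads are scheduled. The price is that all sessions queue up on that one lock, which is the contention the text describes.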

1. Logging Data Manipulation Language Statements

Data Manipulation Language (DML) statements are usually DELETE, INSERT, and UPDATE statements. To support safe logging, MySQL writes the binary log while transaction-level locks are held, and releases them after the binary log has been written.

To ensure the binary log is updated consistently with the tables that the statement modifies, the statement is logged to the binary log at the same time that the statement is being committed, just before the table locks are released. If the logging were not made as part of the statement, another statement could be “injected” between the changes that the statement introduces to the database and the logging of the statement to the binary log. This would mean that the statements would be logged in a different order than the one in which they took effect in the database, which clearly could lead to inconsistencies between master and slave. For instance, an UPDATE statement with a WHERE clause could update different rows on the slave because the values in those rows could change if the statement order changed.
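
To see why the logged order has to match the order in which statements took effect, consider the following toy model in Python. The table contents and the two "statements" are made up for the illustration. Each function stands in for an UPDATE, and the dictionary stands in for a table.

```python
def apply(statements, table):
    """Apply toy statements (plain functions) in order to a copy of the table."""
    rows = dict(table)
    for stmt in statements:
        stmt(rows)
    return rows


def raise_low_prices(rows):
    # Stands in for: UPDATE t SET price = price + 10 WHERE price < 100
    for key, price in list(rows.items()):
        if price < 100:
            rows[key] = price + 10


def double_prices(rows):
    # Stands in for: UPDATE t SET price = price * 2
    for key in list(rows):
        rows[key] *= 2


initial = {"a": 60, "b": 95}

# Order in which the statements took effect on the master.
master = apply([raise_low_prices, double_prices], initial)

# Order a slave would use if the statements were logged the other way round.
slave_wrong_order = apply([double_prices, raise_low_prices], initial)
```

With these values the master ends up with {"a": 140, "b": 210}, while the reordered replay yields {"a": 120, "b": 190}, because the WHERE clause matches different rows once the order changes.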

2. Logging Data Definition Language Statements

Data Definition Language (DDL) statements affect a schema, such as CREATE TABLE and ALTER TABLE statements. These create or change objects in the filesystem—for example, table definitions are stored in .frm files and databases are represented as filesystem directories—so the server keeps information about these available in data structures internally. To protect the update of the internal data structure, it is necessary to acquire a lock before altering the table definition.

Since a single lock is used to protect these data structures, the creation, alteration, and destruction of database objects can be a considerable source of performance problems. This includes the creation and destruction of temporary tables, which is quite common as a technique to create an intermediate result set to perform computations on.

If you are creating and destroying a lot of temporary tables, it is often possible to boost performance by reducing the creation (and subsequent destruction) of temporary tables.

3. Logging Queries

For statement-based replication, the most common binlog event is the Query event, which is used to hold a statement executed on the master. In addition to the actual statement executed, the event contains some additional information necessary for execution of the statement.

Recall that the binary log can be used for many purposes and contains statements in a potentially different order than that in which they were executed on the master. In some cases, part of the binary log may be played back to a server to perform point-in-time recovery (PITR), and in some cases, replication may start in the middle of a sequence of events because a backup has been restored on a slave before starting replication. Furthermore, a database administrator (DBA) might manually tweak the binary log to fix a problem.

In all these cases, the events are executing in different contexts. That is, there is information that is implicit when the server executes the statement but that has to be known to execute the statement correctly. Examples include:

Current database: If the statement refers to a table, function, or procedure without qualifying it with the database, the current database is implicit for the statement.
Value of user-defined variable: If a statement refers to a user-defined variable, the value of the variable is implicit for the statement.
Seed for the RAND function: The RAND function is based on a pseudorandom number function, meaning that it can generate a sequence of numbers that are reproducible but appear random in the sense that they are evenly distributed. The function is not really random, but starts from a seed number and applies a pseudorandom function to generate a deterministic sequence of numbers. This means that given the same seed, the RAND function will always return the same sequence of numbers. The seed is therefore implicit for the statement.
The current time: Obviously, the time the statement started executing is implicit. Having a correct time is important when calling functions that are dependent on the current time—such as NOW and UNIX_TIMESTAMP—because otherwise they will return different results if there is a delay between the statement execution on the master and on the slave.
Value used when inserting into an AUTO_INCREMENT column: If a statement inserts a row into a table with a column defined with the AUTO_INCREMENT attribute, the value used for that row is implicit for the statement since it depends on the rows inserted before it.
Value returned by a call to LAST_INSERT_ID: If the LAST_INSERT_ID function is used in a statement, it depends on the value inserted by a previous statement, which makes this value implicit for the statement.
Thread ID: For some statements, the thread ID is implicit. For example, if the statement refers to a temporary table or uses the CONNECTION_ID function, the thread ID is implicit for the statement.

Since the context for executing the statements cannot be known when they’re replayed—either on a slave or on the master after a crash and restart—it is necessary to make the implicit information explicit by adding it to the binary log. This is done in slightly different ways depending on the kind of information.

In addition to the previous list, some information is implicit to the execution of triggers and stored routines, but we will cover that separately in Section 3.2.6.

Let’s consider each of the cases of implicit information individually, demonstrate the problem with each one, and examine how the server handles it.

3.1. Current database

The log records the current database by adding it to a special field of the Query event. This field also exists for the events used to handle the LOAD DATA INFILE statement, so the description here applies to that statement as well.
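
As a rough illustration of why the field is needed, the Python sketch below models an event that carries the database name alongside the statement. The QueryEvent class and the resolve_table and replay helpers are hypothetical and exist only for this example. The real event layout and name resolution are more involved.

```python
from dataclasses import dataclass


@dataclass
class QueryEvent:
    db: str         # current database when the statement ran on the master
    statement: str  # the statement text as executed


def resolve_table(name, current_db):
    """Qualify a table name the way an unqualified reference is resolved."""
    return name if "." in name else f"{current_db}.{name}"


def replay(event, table_name):
    # The replaying session has no current database of its own to rely on,
    # so it uses the one recorded in the event.
    return resolve_table(table_name, event.db)


event = QueryEvent(db="test", statement="INSERT INTO t1 VALUES (1)")
```

Here replay(event, "t1") resolves to "test.t1" regardless of which database the replaying session happens to have selected, while an already qualified name such as "other.t1" is left alone.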

3.2. Current time

Five functions use the current time to compute their values: NOW, CURDATE, CURTIME, UNIX_TIMESTAMP, and SYSDATE. The first four functions return a value based on the time when the statement started to execute. In contrast, SYSDATE will return the value of time(2). The difference can best be demonstrated by comparing the execution of NOW and SYSDATE with an intermediate sleep:

mysql> SELECT SYSDATE(), SLEEP(2), SYSDATE();
+---------------------+----------+---------------------+
| SYSDATE()           | SLEEP(2) | SYSDATE()           |
+---------------------+----------+---------------------+
| 2010-03-27 22:27:36 |        0 | 2010-03-27 22:27:38 |
+---------------------+----------+---------------------+
1 row in set (2.00 sec)

mysql> SELECT NOW(), SLEEP(2), NOW();
+---------------------+----------+---------------------+
| NOW()               | SLEEP(2) | NOW()               |
+---------------------+----------+---------------------+
| 2010-03-27 22:27:49 |        0 | 2010-03-27 22:27:49 |
+---------------------+----------+---------------------+
1 row in set (2.00 sec)

Both functions are evaluated when they are encountered, but NOW returns the time that the statement started executing and SYSDATE returns the time from time(2).

To handle these time functions correctly, the timestamp indicating when the event started executing is stored in the event. This value is then copied from the event to the slave execution thread and used as if it were the time the event started executing when computing the value of the time functions.
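
The mechanism can be sketched in Python as follows. This is a simplified model under stated assumptions, not the server's implementation: the Session and QueryEvent classes and the forced_timestamp parameter are invented for the example, and an injectable clock function stands in for time(2).

```python
import time
from dataclasses import dataclass


@dataclass
class QueryEvent:
    timestamp: int  # when the statement started executing on the master
    statement: str


class Session:
    """Toy session: now() is frozen at statement start, sysdate() is not."""

    def __init__(self, clock=time.time):
        self._clock = clock      # stands in for time(2)
        self.start_time = None

    def begin_statement(self, forced_timestamp=None):
        # A replaying thread passes the timestamp stored in the event
        # instead of reading its own clock.
        if forced_timestamp is not None:
            self.start_time = forced_timestamp
        else:
            self.start_time = int(self._clock())

    def now(self):
        # Like NOW: the time the statement started executing.
        return self.start_time

    def sysdate(self):
        # Like SYSDATE: always asks the clock directly.
        return int(self._clock())
```

If the master starts a statement at time 1000 and the slave replays the event at time 1500 with forced_timestamp=1000, now() returns 1000 on both, whereas sysdate() returns 1500 on the slave. That difference is the reason SYSDATE is unsafe for statement-based replication.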

Since SYSDATE calls time(2) directly, it is not safe for replication and will return different values on the master and slave when executed. So unless you really want to have the actual time inserted into your tables, it is prudent to stay away from this function.

3.3. Context events

Some implicit information is associated with statements that meet certain conditions:

If the statement contains a reference to a user-defined variable (as in Example 1), it is necessary to add the value of the user-defined variable to the binary log.
If the statement contains a call to the RAND function, it is necessary to add the pseudorandom seed to the binary log.
If the statement contains a call to the LAST_INSERT_ID function, it is necessary to add the last inserted ID to the binary log.
If the statement performs an insert into a table with an AUTO_INCREMENT column, it is necessary to add the value that was used for the column (or columns) to the binary log.

Example 1. Statements with user-defined variables

SET @value = 45;
INSERT INTO t1 VALUES (@value);

In each of these cases, one or more context events are added to the binary log before the event containing the query is written. Since there can be several context events preceding a Query event, the binary log can handle multiple user-defined variables together with the RAND function, or (almost) any combination of the previously listed conditions. The binary log stores the necessary context information through the following events:

User_var

Each such event records the name and value of a single user-defined variable.

Rand

Records the random number seed used by the RAND function. The seed is fetched internally from the session’s state.

Intvar

If the statement is inserting into an autoincrement column, this event records the value of the internal autoincrement counter for the table before the statement starts.

If the statement contains a call to LAST_INSERT_ID, this event records the value that this function returned in the statement.

Example 2 shows some statements that generate all of the context events and how the events appear when displayed using SHOW BINLOG EVENTS. Note that there can be several context events before each statement.

Example 2. Query events with context events

master> CREATE TABLE t1 (a INT AUTO_INCREMENT PRIMARY KEY, b INT, c CHAR(64));
Query OK, 0 rows affected (0.00 sec)

master> SET @foo = 12;
Query OK, 0 rows affected (0.00 sec)

master> SET @bar = 'Smoothnoodlemaps';
Query OK, 0 rows affected (0.00 sec)

master> INSERT INTO t1(b,c) VALUES (@foo,@bar), (RAND(), 'random');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

master> INSERT INTO t1(b) VALUES (LAST_INSERT_ID());
Query OK, 1 row affected (0.00 sec)

master> SHOW BINLOG EVENTS FROM 238\G
*************************** 1. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 238
 Event_type: Query
  Server_id: 1
End_log_pos: 306
       Info: BEGIN
*************************** 2. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 306
 Event_type: Intvar
  Server_id: 1
End_log_pos: 334
       Info: INSERT_ID=1
*************************** 3. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 334
 Event_type: RAND
  Server_id: 1
End_log_pos: 369
       Info: rand_seed1=952494611,rand_seed2=949641547
*************************** 4. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 369
 Event_type: User var
  Server_id: 1
End_log_pos: 413
       Info: @`foo`=12
*************************** 5. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 413
 Event_type: User var
  Server_id: 1
End_log_pos: 465
       Info: @`bar`=_latin1 0x536D6F6F74686E6F6F6... COLLATE latin1_swedish_ci
*************************** 6. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 465
 Event_type: Query
  Server_id: 1
End_log_pos: 586
       Info: use `test`; INSERT INTO t1(b,c) VALUES (@foo,@bar), (RAND(), ...
*************************** 7. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 586
 Event_type: Xid
  Server_id: 1
End_log_pos: 613
       Info: COMMIT /* xid=44 */
*************************** 8. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 613
 Event_type: Query
  Server_id: 1
End_log_pos: 681
       Info: BEGIN
*************************** 9. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 681
 Event_type: Intvar
  Server_id: 1
End_log_pos: 709
       Info: LAST_INSERT_ID=1
*************************** 10. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 709
 Event_type: Intvar
  Server_id: 1
End_log_pos: 737
       Info: INSERT_ID=3
*************************** 11. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 737
 Event_type: Query
  Server_id: 1
End_log_pos: 843
       Info: use `test`; INSERT INTO t1(b) VALUES (LAST_INSERT_ID())
*************************** 12. row ***************************
   Log_name: mysqld1-bin.000001
        Pos: 843
 Event_type: Xid
  Server_id: 1
End_log_pos: 870
       Info: COMMIT /* xid=45 */
12 rows in set (0.00 sec)
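
The way context events prepare the ground for the Query event that follows them can be mimicked with a small Python model. Everything here is a sketch under assumptions: the tuple-based event format, the fresh_context and replay helpers, and the two statement functions are hypothetical. Python's random.Random (with a single seed) merely stands in for the server's own RAND generator, which uses two seeds. Discarding the context after each Query event is a simplification made by this model.

```python
import random


def fresh_context():
    return {"user_vars": {}, "rng": None,
            "insert_id": None, "last_insert_id": None}


def replay(events):
    """Replay a toy event stream and return what each Query event produced."""
    context = fresh_context()
    results = []
    for kind, payload in events:
        if kind == "User_var":
            name, value = payload
            context["user_vars"][name] = value
        elif kind == "Rand":
            # Re-seed so that the statement sees the same number sequence.
            context["rng"] = random.Random(payload)
        elif kind == "Intvar":
            name, value = payload  # "INSERT_ID" or "LAST_INSERT_ID"
            context[name.lower()] = value
        elif kind == "Query":
            results.append(payload(context))
            # Assumption of this model: context applies to one Query only.
            context = fresh_context()
    return results


def insert_row(ctx):
    # Stands in for a statement that uses @foo and RAND().
    return (ctx["insert_id"], ctx["user_vars"]["foo"], ctx["rng"].random())


def insert_last_id(ctx):
    # Stands in for: INSERT INTO t1(b) VALUES (LAST_INSERT_ID())
    return (ctx["insert_id"], ctx["last_insert_id"])


events = [
    ("Intvar", ("INSERT_ID", 1)),
    ("Rand", 952494611),
    ("User_var", ("foo", 12)),
    ("Query", insert_row),
    ("Intvar", ("LAST_INSERT_ID", 1)),
    ("Intvar", ("INSERT_ID", 3)),
    ("Query", insert_last_id),
]
```

Calling replay(events) twice gives identical results, because everything the statements depend on is carried explicitly by the events that precede them rather than taken from the replaying session.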

3.4. Thread ID

The last implicit piece of information that the binary log sometimes needs is the thread ID of the MySQL session handling the statement. The thread ID is necessary when a function is dependent on the thread ID—such as when it refers to CONNECTION_ID—but most importantly for handling temporary tables.

Temporary tables are specific to each thread, meaning that two temporary tables with the same name are allowed to coexist, provided they are defined in different sessions. Temporary tables can provide an effective means to improve the performance of certain operations, but they require special handling to work with the binary log.

Internally in the server, temporary tables are handled by creating obscure names for storing the table definitions. The names are based on the process ID of the server, the thread ID that creates the table, and a thread-specific counter to distinguish between different instances of the table from the same thread. This naming scheme allows tables from different threads to be distinguished from each other, but each statement can access its proper table only if the thread ID is stored in the binary log.

Similar to how the current database is handled in the binary log, the thread ID is stored as a separate field in every Query event and can therefore be used to compute thread-specific data and handle temporary tables correctly.
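
The following Python sketch shows why the stored thread ID matters for temporary tables. It is a toy model only: the Server class, its execute method, and the event tuples are hypothetical, and a dictionary keyed by thread ID and table name stands in for the server's internal naming scheme.

```python
class Server:
    """Toy server: temporary tables live in a namespace keyed by thread ID."""

    def __init__(self):
        self.temp_tables = {}

    def execute(self, thread_id, action, name, row=None):
        key = (thread_id, name)
        if action == "create":
            self.temp_tables[key] = []
        elif action == "insert":
            self.temp_tables[key].append(row)
        elif action == "drop":
            del self.temp_tables[key]


def replay(server, events):
    # A single replaying thread applies every event, but it uses the
    # thread ID stored in each event to pick the right temporary table.
    for thread_id, action, name, row in events:
        server.execute(thread_id, action, name, row)
    return server


# Two master sessions (thread IDs 10 and 11) interleave statements on
# temporary tables that share the name "tmp".
events = [
    (10, "create", "tmp", None),
    (11, "create", "tmp", None),
    (10, "insert", "tmp", "row from session 10"),
    (11, "insert", "tmp", "row from session 11"),
    (10, "drop", "tmp", None),
]

slave = replay(Server(), events)
```

After the replay, only the table belonging to thread 11 remains, and it holds only that session's row. Without the thread ID in each event, the two tables named tmp could not be told apart.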

When writing the Query event, the thread ID to store in the event is read from the server variable pseudo_thread_id. This means that it can be set before executing a statement, but only if you have SUPER privileges. This server variable is intended to be used by mysqlbinlog to emit statements correctly and should not normally be used.

For a statement that contains a call to the CONNECTION_ID function or that uses or creates a temporary table, the Query event is marked as thread-specific in the binary log. Since the thread ID is always present in the Query event, this flag is not necessary, but is mainly used to allow mysqlbinlog to avoid printing unnecessary assignments to the pseudo_thread_id variable.
