MySQL has traditionally employed statement-based
replication and just recently implemented row-based replication.
In statement-based replication, the actual executed statement is
written to the binary log together with some execution information, and
the statement is reexecuted on the slave. Since not all statements can be
logged as statements, there are some exceptions that you should be aware
of. This section will describe the process of logging statements as well
as the important caveats.
Since the binary log is a common resource—all threads write
statements to it—it is critical to prevent two threads from updating the
binary log at the same time. To handle this, a lock for the binary
log—the LOCK_log mutex—is
acquired just before the event is written to the binary log and released
just after the event has been written. Because all session threads for the
server log statements to the binary log, it is quite common for several
session threads to block on this lock.
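The locking discipline described above can be sketched in a few lines of Python. This is a toy model, not server code: the BinaryLog class, its write_event method, and the two-part event layout are hypothetical names chosen for illustration, and a threading.Lock stands in for LOCK_log.

```python
import threading


class BinaryLog:
    """Toy log whose writes are serialized by one mutex."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of LOCK_log
        self.records = []

    def write_event(self, header, body):
        # Acquire the lock just before writing and release it just after,
        # so the two parts of one event are never interleaved with another.
        with self._lock:
            self.records.append(("header", header))
            self.records.append(("body", body))


def session(log, session_id, count):
    # Each session thread logs `count` statements.
    for i in range(count):
        log.write_event((session_id, i), (session_id, i))


log = BinaryLog()
threads = [threading.Thread(target=session, args=(log, sid, 500))
           for sid in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every session funnels through the same lock, each header is always followed by its own body, at the cost of sessions waiting on one another.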
1. Logging Data Manipulation Language Statements
Data Manipulation Language (DML) statements are usually DELETE, INSERT, and UPDATE statements. To support safe logging,
MySQL writes the binary log while transaction-level locks are held, and
releases them after the binary log has been written.
To ensure the binary log is updated consistently with the tables
that the statement modifies, the statement is logged to the binary log
at the same time that the statement is being committed, just before the
table locks are released. If the logging were not performed as part of the
statement, another statement could be “injected” between the changes
that the statement introduces to the database and the logging of the
statement to the binary log. This would mean that the statements would
be logged in a different order than the one in which they took effect in
the database, which clearly could lead to inconsistencies between master
and slave. For instance, an UPDATE statement with
a WHERE clause could update different
rows on the slave because the values in those rows could change if the
statement order changed.
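To see why the logged order matters, consider the following minimal sketch. It models a table as a Python dictionary and two UPDATE statements as functions; the table contents and both statements are made up for illustration.

```python
def run(statements, rows):
    """Apply statements in order to a copy of the table; return the result."""
    table = dict(rows)
    for statement in statements:
        statement(table)
    return table


def raise_small_balances(table):
    # Models: UPDATE t SET balance = balance + 100 WHERE balance < 50
    for key, balance in list(table.items()):
        if balance < 50:
            table[key] = balance + 100


def halve_all_balances(table):
    # Models: UPDATE t SET balance = balance / 2 (integer division here)
    for key, balance in list(table.items()):
        table[key] = balance // 2


rows = {"a": 80, "b": 40}

# Order in which the statements took effect on the master.
master = run([halve_all_balances, raise_small_balances], rows)

# Order a slave would use if the statements were logged the other way around.
slave = run([raise_small_balances, halve_all_balances], rows)

print(master)  # {'a': 140, 'b': 120}
print(slave)   # {'a': 40, 'b': 70}
```

The WHERE clause matches both rows in one order and only one row in the other, so the two servers end up with different data.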
2. Logging Data Definition Language Statements
Data Definition Language (DDL) statements, such as CREATE TABLE and
ALTER TABLE, affect a schema. These statements create or change objects in the
filesystem (for example, table definitions are stored in
.frm files and databases are
represented as filesystem directories), so the server keeps information
about these objects in internal data structures. To protect updates
to these internal data structures, it is necessary to acquire a lock
before altering the table definition.
Since a single lock is used to protect these data structures, the
creation, alteration, and destruction of database objects can be a
considerable source of performance problems. This includes the creation and
destruction of temporary tables, which is quite common as a technique to
create an intermediate result set to perform computations on.
If your application creates and destroys a lot of temporary tables,
restructuring it so that fewer temporary tables are created (and
subsequently destroyed) will often boost performance.
3. Logging Queries
For statement-based replication, the most common binlog event is the Query event, which is used to hold a statement
executed on the master. In addition to the actual statement executed,
the event contains some additional information necessary for execution
of the statement.
Recall that the binary log can be used for many purposes and
contains statements in a potentially different order than that in which
they were executed on the master. In some cases, part of the binary log
may be played back to a server to perform point-in-time recovery (PITR), and in some cases, replication may start in the
middle of a sequence of events because a backup has been restored on a
slave before starting replication. Furthermore, a database administrator
(DBA) might manually tweak the binary log to fix a problem.
In all these cases, the events are executed in a different
context from the one in which they were originally logged. That is, some information is
implicit when the server first executes the statement but has to be known in order to
reexecute the statement correctly. Examples include:
Current database
If the statement refers to a table, function, or procedure
without qualifying it with the database, the current database is
implicit for the statement.
Value of user-defined variable
If a statement refers to a user-defined variable, the value of the variable is
implicit for the statement.
Seed for the RAND function
The RAND function
is based on a pseudorandom number function, meaning
that it generates a sequence of numbers that is reproducible
but appears random in the sense that the numbers are evenly distributed.
The function is not really random: it starts from a seed number
and applies a pseudorandom function to generate a deterministic
sequence of numbers. Given the same seed, the
RAND function will therefore always
return the same sequence of numbers, which makes the seed implicit for
the statement.
The current time
Obviously, the time the statement started executing is implicit.
Having a correct time is important when calling functions that are
dependent on the current time—such as NOW and UNIX_TIMESTAMP—because otherwise they
will return different results if there is a delay between the
statement execution on the master and on the slave.
Value used when inserting into an AUTO_INCREMENT column
If a statement inserts a row into a table with a column
defined with the AUTO_INCREMENT attribute, the value
used for that row is implicit for the statement since it depends
on the rows inserted before it.
Value returned by a call to LAST_INSERT_ID
If the LAST_INSERT_ID
function is used in a statement, it depends on the value inserted
by a previous statement, which makes this value implicit for the
statement.
Thread ID
For some statements, the thread ID is implicit. For example, if the
statement refers to a temporary table or uses the CONNECTION_ID function, the thread ID is implicit for the
statement.
Since the context for executing the statements cannot be known
when they’re replayed—either on a
slave or on the master after a crash and restart—it is necessary to make
the implicit information explicit by adding it to the binary log. This
is done in slightly different ways depending on the kind of
information.
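As one illustration of making implicit information explicit, the seed case can be sketched as follows. The sketch uses Python's own random module, which is not the generator MySQL uses; the rand_sequence helper and the seed values are hypothetical. The point is only that a recorded seed lets a second server reproduce the same sequence.

```python
import random


def rand_sequence(seed, count):
    """Return `count` pseudorandom numbers generated from `seed`."""
    generator = random.Random(seed)
    return [generator.random() for _ in range(count)]


# The master happens to use some seed; writing it to the log makes it explicit.
logged_seed = 42
on_master = rand_sequence(logged_seed, 3)

# A slave that reads the seed from the log reproduces the same numbers.
on_slave = rand_sequence(logged_seed, 3)

# A slave that uses its own seed does not.
without_seed = rand_sequence(43, 3)
```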
In addition to the previous list, some information is implicit to
the execution of triggers and stored routines, but we will cover that
separately in Section 3.2.6.
Let’s consider each of the cases of implicit information
individually, demonstrate the problem with each one, and examine how the
server handles it.
3.1. Current database
The log records the current database by adding it to a special field of the
Query event. This field also exists
for the events used to handle the LOAD DATA INFILE
statement, so the description here
applies to that statement as well.
3.2. Current time
Five functions use the current time to compute their values: NOW, CURDATE, CURTIME, UNIX_TIMESTAMP, and SYSDATE. The first four functions return a
value based on the time when the statement
started to execute. In contrast, SYSDATE will return the value of time(2). The difference can best be
demonstrated by comparing the execution of NOW and SYSDATE with an intermediate sleep:
mysql> SELECT SYSDATE(), SLEEP(2), SYSDATE();
+---------------------+----------+---------------------+
| SYSDATE()           | SLEEP(2) | SYSDATE()           |
+---------------------+----------+---------------------+
| 2010-03-27 22:27:36 |        0 | 2010-03-27 22:27:38 |
+---------------------+----------+---------------------+
1 row in set (2.00 sec)

mysql> SELECT NOW(), SLEEP(2), NOW();
+---------------------+----------+---------------------+
| NOW()               | SLEEP(2) | NOW()               |
+---------------------+----------+---------------------+
| 2010-03-27 22:27:49 |        0 | 2010-03-27 22:27:49 |
+---------------------+----------+---------------------+
1 row in set (2.00 sec)
Both functions are evaluated when they are encountered, but
NOW returns the time that the
statement started executing and SYSDATE returns the time from time(2).
To handle these time functions correctly, the timestamp indicating when the event
started executing is stored in the event. This
value is then copied from the event to the slave execution thread and
used as if it were the time the event started executing when computing
the value of the time functions.
Since SYSDATE calls time(2) directly, it is not safe for
replication and will return different values on the master and slave
when executed. So unless you really want to have the actual time
inserted into your tables, it is prudent to stay away from this
function.
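The difference between the two kinds of time function can be modeled with a small sketch. The Session class and its method names are hypothetical and only mirror the behavior described in this section: now() is pinned to the statement's start time, which a slave takes from the event, while sysdate() always reads the clock.

```python
import time


class Session:
    """Toy session whose now() is pinned to the statement's start time."""

    def __init__(self):
        self.statement_start = None

    def begin_statement(self, timestamp=None):
        # Master: read the clock. Slave: use the timestamp stored in the event.
        self.statement_start = time.time() if timestamp is None else timestamp

    def now(self):
        # Models NOW(): the time the statement started executing.
        return self.statement_start

    def sysdate(self):
        # Models SYSDATE(): always asks the system clock.
        return time.time()


master = Session()
master.begin_statement()
logged_timestamp = master.statement_start  # value stored in the event

time.sleep(0.05)  # stand-in for replication delay

slave = Session()
slave.begin_statement(timestamp=logged_timestamp)
```

After the delay, slave.now() still equals master.now(), whereas slave.sysdate() reflects the later wall-clock time on the slave.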
3.3. Context events
Some implicit information is associated with statements that meet certain
conditions:
If the statement contains a reference to a user-defined
variable (as in Example 1), it is
necessary to add the value of the user-defined variable to the
binary log.
If the statement contains a call to the RAND
function, it is necessary to add the pseudorandom seed to the
binary log.
If the statement contains a call to the LAST_INSERT_ID
function, it is necessary to add the last inserted ID to the
binary log.
If the statement performs an insert into a table
with an AUTO_INCREMENT column, it is necessary
to add the value that was used for the column (or columns) to the
binary log.
Example 1. Statements with user-defined variables
SET @value = 45;
INSERT INTO t1 VALUES (@value);
In each of these cases, one or more context
events are added to the binary log before the event
containing the query is written. Since there can be several context
events preceding a Query event, the binary log can
handle multiple user-defined variables together with the RAND function, or (almost) any combination
of the previously listed conditions. The binary log stores the
necessary context information through the following events:
User_var
Each such event records the name and value of a single
user-defined variable.
Rand
Records the random number seed used by the RAND
function. The seed is fetched internally from the session’s
state.
Intvar
If the statement is inserting into an autoincrement
column, this event records the value of the internal
autoincrement counter for the table before the statement
starts.
If the statement contains a call to LAST_INSERT_ID, this event records the
value that this function returned in the statement.
Example 2 shows some
statements that generate all of the context events and how the events
appear when displayed using SHOW BINLOG EVENTS.
Note that there can be several context events before each statement.
Example 2. Query events with context events
master> CREATE TABLE t1 (a INT AUTO_INCREMENT PRIMARY KEY, b INT, c CHAR(64));
Query OK, 0 rows affected (0.00 sec)
master> SET @foo = 12;
Query OK, 0 rows affected (0.00 sec)
master> SET @bar = 'Smoothnoodlemaps';
Query OK, 0 rows affected (0.00 sec)
master> INSERT INTO t1(b,c) VALUES (@foo,@bar), (RAND(), 'random');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
master> INSERT INTO t1(b) VALUES (LAST_INSERT_ID());
Query OK, 1 row affected (0.00 sec)
master> SHOW BINLOG EVENTS FROM 238\G
*************************** 1. row ***************************
Log_name: mysqld1-bin.000001
Pos: 238
Event_type: Query
Server_id: 1
End_log_pos: 306
Info: BEGIN
*************************** 2. row ***************************
Log_name: mysqld1-bin.000001
Pos: 306
Event_type: Intvar
Server_id: 1
End_log_pos: 334
Info: INSERT_ID=1
*************************** 3. row ***************************
Log_name: mysqld1-bin.000001
Pos: 334
Event_type: RAND
Server_id: 1
End_log_pos: 369
Info: rand_seed1=952494611,rand_seed2=949641547
*************************** 4. row ***************************
Log_name: mysqld1-bin.000001
Pos: 369
Event_type: User var
Server_id: 1
End_log_pos: 413
Info: @`foo`=12
*************************** 5. row ***************************
Log_name: mysqld1-bin.000001
Pos: 413
Event_type: User var
Server_id: 1
End_log_pos: 465
Info: @`bar`=_latin1 0x536D6F6F74686E6F6F6... COLLATE latin1_swedish_ci
*************************** 6. row ***************************
Log_name: mysqld1-bin.000001
Pos: 465
Event_type: Query
Server_id: 1
End_log_pos: 586
Info: use `test`; INSERT INTO t1(b,c) VALUES (@foo,@bar), (RAND(), ...
*************************** 7. row ***************************
Log_name: mysqld1-bin.000001
Pos: 586
Event_type: Xid
Server_id: 1
End_log_pos: 613
Info: COMMIT /* xid=44 */
*************************** 8. row ***************************
Log_name: mysqld1-bin.000001
Pos: 613
Event_type: Query
Server_id: 1
End_log_pos: 681
Info: BEGIN
*************************** 9. row ***************************
Log_name: mysqld1-bin.000001
Pos: 681
Event_type: Intvar
Server_id: 1
End_log_pos: 709
Info: LAST_INSERT_ID=1
*************************** 10. row ***************************
Log_name: mysqld1-bin.000001
Pos: 709
Event_type: Intvar
Server_id: 1
End_log_pos: 737
Info: INSERT_ID=3
*************************** 11. row ***************************
Log_name: mysqld1-bin.000001
Pos: 737
Event_type: Query
Server_id: 1
End_log_pos: 843
Info: use `test`; INSERT INTO t1(b) VALUES (LAST_INSERT_ID())
*************************** 12. row ***************************
Log_name: mysqld1-bin.000001
Pos: 843
Event_type: Xid
Server_id: 1
End_log_pos: 870
Info: COMMIT /* xid=45 */
12 rows in set (0.00 sec)
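
How a reader of the binary log might consume these events can be sketched as follows. This is a simplified model, not the slave's actual implementation: the replay function, the tuple layout of the events, and the choice to reset the context after each Query event are all assumptions made for illustration. The event contents are taken from Example 2.

```python
def replay(events):
    """Collect context events and attach them to the Query that follows."""
    context = {"user_vars": {}}
    executed = []
    for event_type, payload in events:
        if event_type == "User_var":
            name, value = payload
            context["user_vars"][name] = value
        elif event_type == "Rand":
            context["rand_seed"] = payload
        elif event_type == "Intvar":
            kind, value = payload  # "INSERT_ID" or "LAST_INSERT_ID"
            context[kind] = value
        elif event_type == "Query":
            executed.append((payload, context))
            context = {"user_vars": {}}  # simplification: start afresh
    return executed


events = [
    ("Intvar", ("INSERT_ID", 1)),
    ("Rand", (952494611, 949641547)),
    ("User_var", ("foo", 12)),
    ("User_var", ("bar", "Smoothnoodlemaps")),
    ("Query", "INSERT INTO t1(b,c) VALUES (@foo,@bar), (RAND(), 'random')"),
    ("Intvar", ("LAST_INSERT_ID", 1)),
    ("Intvar", ("INSERT_ID", 3)),
    ("Query", "INSERT INTO t1(b) VALUES (LAST_INSERT_ID())"),
]

executed = replay(events)
for statement, context in executed:
    print(statement, context)
```

Each statement is paired with exactly the context events that preceded it, which is why the order of events in the log matters.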
3.4. Thread ID
The last implicit piece of information that the binary log
sometimes needs is the thread ID of the MySQL session handling the statement.
The thread ID is necessary when a statement depends on it (for example,
when it calls CONNECTION_ID), but
most importantly for handling temporary tables.
Temporary tables are specific to each thread, meaning that two temporary
tables with the same name are allowed to coexist, provided they are
defined in different sessions. Temporary tables can provide an
effective means to improve the performance of certain operations, but
they require special handling to work with the binary log.
Internally in the server, temporary tables are handled by
creating obscure names for storing the table definitions. The names
are based on the process ID of the server, the thread ID that creates the
table, and a thread-specific counter to distinguish between different
instances of the table from the same thread. This naming scheme allows
tables from different threads to be distinguished from each other, but
each statement can access its proper table only if the thread ID is
stored in the binary log.
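The naming scheme can be illustrated with a short sketch. The Thread class is hypothetical, and the exact name format used here (a "#sql" prefix followed by hexadecimal fields) is an assumption for illustration; only the three ingredients (process ID, thread ID, and per-thread counter) come from the description above.

```python
import itertools
import os


class Thread:
    """Toy session thread that generates internal temporary table names."""

    def __init__(self, thread_id):
        self.thread_id = thread_id
        self._counter = itertools.count()  # thread-specific counter

    def temp_table_name(self):
        # Assumed format: prefix, server process ID, thread ID, counter.
        return "#sql%x_%x_%x" % (os.getpid(), self.thread_id,
                                 next(self._counter))


session_a = Thread(7)
session_b = Thread(8)

first = session_a.temp_table_name()
second = session_a.temp_table_name()  # same thread, next counter value
other = session_b.temp_table_name()   # different thread, same counter value

# Replaying with the logged thread ID recomputes the same internal name.
replayed = Thread(7).temp_table_name()
```

Without the thread ID from the binary log, a replaying thread would compute a name based on its own ID and could not locate the table the original session created.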
Similar to how the current database is handled in the binary
log, the thread ID is stored as a separate field in every Query event and can
therefore be used to compute thread-specific data and handle temporary
tables correctly.
When writing the Query event,
the thread ID to store in the event is read from the server variable pseudo_thread_id. This means that it can be
set before executing a statement, but only if you have SUPER privileges.
This server variable is intended to be used by mysqlbinlog to emit
statements correctly and should not normally be used.
For a statement that contains a call to the CONNECTION_ID
function or that uses or creates a temporary table, the Query event is marked as thread-specific in
the binary log. Since the thread ID is always present in the Query event, this flag is not strictly necessary; it
is mainly used to allow mysqlbinlog
to avoid printing unnecessary assignments to the pseudo_thread_id
variable.