The events mentioned thus far are information carriers in the sense
that they represent some real change of data that occurred on the master.
There are, however, other events that can affect replication but do not represent any
change of data on the master. For example, if the server is stopped, it
can potentially affect replication since changes can occur on the
datafiles while the server is stopped. A typical example of this is
restoring a backup, or otherwise manipulating the datafiles. Such changes
are not replicated because the server is not running.
Events are needed for other purposes as well. Since the binary logs
consist of multiple files, it is necessary to split the groups at
convenient places to form the sequence of binlog files. To handle this
safely, special events are added to the log.
1. The Binary Log and Crash Safety
As you have seen, changes to the binary log do not correspond to changes to
the master databases on a one-to-one basis. It is important to keep the
databases and the binary log mutually consistent in case of a crash. In
other words, there should be no changes committed to the storage engine
that are not written to the binary log, and vice versa.
Nontransactional engines introduce problems right away. For
example, it is not possible to guarantee consistency between the binary
log and a MyISAM table because MyISAM is nontransactional and the storage engine
will carry through any requested change long before any attempts at
logging the statement.
But for transactional storage engines, MySQL includes measures to
make sure that a crash does not cause the binary log to lose too much
information.
Events are written to the binary log before releasing the locks on the table,
but after all the changes have been given to the storage engine. So if
there is a crash before the storage engine releases the locks, the
server has to ensure that any changes recorded to the binary log are
actually in the table on the disk before allowing the statement (or
transaction) to commit. This requires coordination with standard
filesystem synchronization.
Because disk accesses are very expensive compared to memory
accesses, operating systems are designed to cache parts of the file in a
dedicated part of the main memory—usually called the page cache—and wait to write file
data to disk until necessary. Writing to disk becomes necessary when
another page must be loaded from disk and the page cache is full, but it
can also be requested by an application by doing an explicit call to
write the pages of a file to disk.
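The difference between handing data to the operating system and forcing it to disk can be sketched in a few lines of Python. This is only an illustration of the explicit call mentioned above, not how the server itself is written; the function name durable_append and the file name demo.log are made up for the example.

```python
import os
import tempfile

def durable_append(path, data):
    """Append data to a file and ask the OS to write it to disk before returning."""
    with open(path, "ab") as f:
        f.write(data)          # bytes land in the process's own buffer
        f.flush()              # bytes are handed to the OS (page cache)
        os.fsync(f.fileno())   # explicit request to write the cached pages to disk
    return os.path.getsize(path)

# Hypothetical file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.log")
size = durable_append(path, b"event-1\n")
```

Without the os.fsync call the data would still reach the file eventually, but only when the operating system decides to write the page out.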
Recall from the earlier description of XA that when the first
phase is complete, all data has to be written to durable storage—that
is, to disk—for the protocol to handle crashes correctly. This means
that every time a transaction is committed, the page cache has to be
written to disk. This can be very expensive and, depending on the
application, not always necessary. To control how often the data is
written to disk, you can set the sync-binlog option.
This option takes an integer specifying how often to write the binary
log to disk. If the option is set to 5, for instance, the binary log
will be written to disk every fifth commit of a statement or
transaction. The default value is 0, which means that the server does not
explicitly write the binary log to disk; the write happens at the
discretion of the operating system.
For storage engines that support XA, such as InnoDB, setting the sync-binlog option to 1 means that you will
not lose any transactions under normal crashes. For engines that do not
support XA, you might lose at most one transaction.
If, however, every group is written to disk, performance suffers,
usually considerably. Disk accesses are notoriously slow
and caches are used for precisely the purpose of improving the
performance by not having to always write data to disk. If you are
prepared to risk losing a few transactions or statements—either because
you can handle the work it takes to recover this manually or because it
is not important for the application—you can set sync-binlog to a higher value or leave it at
the default.
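The trade-off controlled by sync-binlog can be illustrated with a toy model that counts how many explicit disk synchronizations a given setting causes. This is a sketch under simplifying assumptions, not the server's implementation: the class ToyBinlog, the helper count_syncs, and the file name toy-bin.000001 are invented for the example.

```python
import os
import tempfile

class ToyBinlog:
    """Toy log writer that syncs to disk every sync_binlog commits (0 = never)."""

    def __init__(self, path, sync_binlog=0):
        self.f = open(path, "ab")
        self.sync_binlog = sync_binlog
        self.commits_since_sync = 0
        self.sync_calls = 0

    def commit(self, group):
        self.f.write(group)
        self.f.flush()                      # hand the group to the OS page cache
        if self.sync_binlog > 0:
            self.commits_since_sync += 1
            if self.commits_since_sync >= self.sync_binlog:
                os.fsync(self.f.fileno())   # force the cached pages to disk
                self.sync_calls += 1
                self.commits_since_sync = 0

    def close(self):
        self.f.close()

def count_syncs(sync_binlog, commits):
    """Run the given number of commits and report how many explicit syncs occurred."""
    path = os.path.join(tempfile.mkdtemp(), "toy-bin.000001")
    log = ToyBinlog(path, sync_binlog)
    for i in range(commits):
        log.commit(b"group %d\n" % i)
    log.close()
    return log.sync_calls
```

In this model a setting of 1 costs one synchronization per commit, a setting of 5 costs one per five commits, and a setting of 0 leaves the timing entirely to the operating system.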
2. Binlog File Rotation
MySQL starts a new file to hold binary log events at regular intervals. For
practical and administrative reasons, it wouldn’t work to keep writing
to a single file—operating systems have limits on file sizes. As
mentioned earlier, the file to which the server is currently writing is
called the active binlog file.
Switching to a new file is called binary log
rotation or binlog file rotation depending on the
context.
There are four main activities that cause a rotation:
The server stops
Each time the server starts, it begins a new binary log.
We’ll discuss why shortly.
The binlog file reaches a maximum size
If the binlog file grows too large, it will be
automatically rotated. You can control the maximum size of the binlog
files using the max-binlog-size server
variable.
The binary log is explicitly flushed
The FLUSH LOGS command
writes all logs to disk and creates a new file to
continue writing the binary log. This can be useful when
administering recovery images for PITR. Reading from an open binlog file can have
unexpected results, so it is advisable to force an explicit flush
before trying to use binlog files for recovery.
An incident occurred on the server
In addition to stopping altogether, the server can encounter
other incidents that cause the binary log to be rotated. These
incidents sometimes require special manual intervention from the
administrator, because they can leave a “gap” in the replication
stream. It is easier for the DBA to handle the incident if the
server starts on a fresh binlog file after an incident.
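Two of the triggers listed above, the size limit and the explicit flush, can be pictured with a small in-memory model. It is a deliberately simplified sketch: the class ToyBinlogSet and its methods are hypothetical, sizes are plain counters rather than real files, and the model assumes that a group is written in full before the size check is made.

```python
class ToyBinlogSet:
    """In-memory model of a sequence of binlog files and the index listing them."""

    def __init__(self, basename="master-bin", max_size=100):
        self.basename = basename
        self.max_size = max_size      # plays the role of the maximum binlog size
        self.index = []               # plays the role of the index file
        self.sizes = {}               # file name -> bytes written (toy bookkeeping)
        self._seq = 0
        self._open_new_file()

    def _open_new_file(self):
        self._seq += 1
        name = "%s.%06d" % (self.basename, self._seq)
        self.index.append(name)
        self.sizes[name] = 0

    @property
    def active(self):
        """The file currently being written is the last one in the index."""
        return self.index[-1]

    def write_group(self, nbytes):
        # The whole group goes into the active file; rotation is checked afterward,
        # so a file can end up larger than max_size.
        self.sizes[self.active] += nbytes
        if self.sizes[self.active] >= self.max_size:
            self._open_new_file()

    def flush_logs(self):
        """Model of an explicit flush: always start a new file."""
        self._open_new_file()
```

For example, with max_size=100, writing two groups of 60 bytes fills the first file past the limit and makes the second file active; a subsequent flush_logs() call moves on to a third.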
The first event of every binlog file is the Format description
event, which describes the server that wrote the file along with
information about the contents and status of the file.
Three items are of particular interest here:
The binlog-in-use flag
Because a crash can occur while the server is writing to a
binlog file, it is critical to indicate when a file was closed
properly. Otherwise, a DBA could replay a corrupted file on the
master or slave and cause more problems. To provide assurance
about the file’s integrity, the binlog-in-use flag is set when the file
is created and cleared after the final event (Rotate) has
been written to the file. Thus, any program can see whether the
binlog file was properly closed.
Binlog file format version
Over the course of MySQL development, the format for the
binary log has changed several times, and it will certainly change
again. Developers increment the version number for the format when
significant changes—notably changes to the common headers—render
new files unreadable to previous versions of the server. (The
current format, starting with MySQL version 5.0, is version 4.)
The binlog file format version field lists its version number; if
a different server cannot handle a file with that version, it
simply refuses to read the file.
Server version
This is a string denoting the version of the server that wrote
the file. The server version used to run the examples in this
article was “5.1.37-1ubuntu5-log,” for instance,
and another version with the string “5.1.40-debug-log” is used to
run tests. As you can see, the string is guaranteed to include the
MySQL server version, but it also contains additional information
related to the specific build. In some situations, this
information can help you or the developers figure out and resolve
subtle bugs that can occur when replicating between different
versions of the server.
To rotate the binary log safely even in
the presence of crashes, the server uses a write-ahead strategy
and records its intention in a temporary file called the purge index file (this name
was chosen because the file is used while purging binlog
files as well, as you will see). Its name is based on
that of the index file, so for instance if the name of the
index file is master-bin.index, the name of the purge
index file is master-bin.~rec~. After creating the
new binlog file and updating the index file to point to it, the
server removes the purge index file.
In the event of a crash, if a purge index file is present on
the server, the server can compare the purge index file and the
index file when it restarts and see what was actually accomplished
compared to what was intended.
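Returning to the Format description event: the three fields discussed above can be read directly from the start of a version 4 binlog file. The following sketch assumes the version 4 layout (a 4-byte magic number, a 19-byte common header, then the binlog format version and a 50-byte server version string); the function name read_format_description is made up for the example, and the code handles no other format versions.

```python
import struct

BINLOG_MAGIC = b"\xfebin"            # first four bytes of every binlog file
FORMAT_DESCRIPTION_EVENT = 15        # event type code
LOG_EVENT_BINLOG_IN_USE_F = 0x0001   # flag bit in the common header

def read_format_description(data):
    """Extract the in-use flag, format version, and server version from raw bytes."""
    if data[:4] != BINLOG_MAGIC:
        raise ValueError("not a binlog file")
    # Common header: timestamp, type code, server ID, event length,
    # next position, flags (19 bytes, little-endian).
    _ts, type_code, _server_id, _length, _next_pos, flags = struct.unpack_from(
        "<IBIIIH", data, 4)
    if type_code != FORMAT_DESCRIPTION_EVENT:
        raise ValueError("first event is not a Format description event")
    # Event body starts right after the common header.
    binlog_version, raw_version = struct.unpack_from("<H50s", data, 4 + 19)
    return {
        "in_use": bool(flags & LOG_EVENT_BINLOG_IN_USE_F),
        "binlog_version": binlog_version,
        "server_version": raw_version.split(b"\0", 1)[0].decode("ascii"),
    }
```

To try it on a real file, read the first 128 bytes or so in binary mode and pass them to the function. A result with in_use set to True means the file was either still being written or not closed properly.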
3. Incidents
The term “incidents” refers to events that don’t change data on a server but must be
written to the binary log because they have the potential to affect
replication. Most incidents don’t require special intervention from the
DBA—for instance, servers can stop and restart without changes to
database files—but there will inevitably be some incidents that call for
special action.
Currently, there are two incident events that you might discover
in a binary log:
Stop
Indicates that the server was stopped through normal means. If
the server crashed, no stop event will be written, even when the
server is brought up again. This event is written in the old
binlog file (restarting the server rotates to a new file) and
contains only a common header; no other information is provided in
the event.
When the binary log is replayed on the slave, the slave ignores any
Stop events. Normally, the fact
that the server stopped does not require special attention and
replication can proceed as usual. If the server was switched to a
new version while it was stopped, this will be indicated in the
next binlog file, and the server reading the binlog file will then
stop if it cannot handle the new version of the binlog format. In
this sense, the Stop event does
not represent a “gap” in the replication stream. However, the
event is worth recording because someone might manually restore a
backup or make other changes to files before restarting
replication, and the DBA replaying the file could find this event
in order to start or stop the replay at the right time.
Incident
An event type introduced in version 5.1 as a generic
incident event. In contrast with the Stop event, this event contains an
identifier to specify what kind of incident occurred. It is used
to indicate that the server was forced to perform actions almost
guaranteeing that changes are missing from the binary log.
For example, incident events in version 5.1 are written if
the database was reloaded or if a nontransactional event was too
big to fit in the binlog file. MySQL Cluster generates this event
when one of the nodes had to reload the database and could
therefore be out of sync.
When the binary log is replayed on the slave, it stops with
an error if it encounters an Incident event. In the case of the
MySQL Cluster reload event, it indicates a need to
resynchronize the cluster and probably to search for events that
are missing from the binary log.
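A DBA looking for these two events in a binlog file could scan the common headers and skip over the event bodies. The sketch below assumes the version 4 common header layout and the type codes 3 (Stop) and 26 (Incident); the function name find_incidents is hypothetical, and the code does not decode the incident identifier.

```python
import struct

BINLOG_MAGIC = b"\xfebin"
STOP_EVENT = 3
INCIDENT_EVENT = 26
# Common header: timestamp, type code, server ID, event length, next position, flags.
HEADER = struct.Struct("<IBIIIH")

def find_incidents(data):
    """Return (offset, name) pairs for every Stop and Incident event in the bytes."""
    if data[:4] != BINLOG_MAGIC:
        raise ValueError("not a binlog file")
    found = []
    pos = 4
    while pos + HEADER.size <= len(data):
        _ts, type_code, _sid, event_len, _next_pos, _flags = HEADER.unpack_from(data, pos)
        if event_len < HEADER.size:
            break                      # corrupt or truncated event; stop scanning
        if type_code == STOP_EVENT:
            found.append((pos, "Stop"))
        elif type_code == INCIDENT_EVENT:
            found.append((pos, "Incident"))
        pos += event_len               # jump to the next event header
    return found
```

The offsets returned are byte positions in the file, which is the same kind of position used to start or stop a replay at a given point.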
4. Purging the Binlog File
Over time, the server will accumulate binlog files unless old ones are purged
from the filesystem. The server can automatically purge old binary logs
from the filesystem, or you can explicitly tell the server to purge the
files.
To make the server automatically purge old binlog files, set
the expire-logs-days
option—which is available as a server variable as well—to the number of
days that you want to keep binlog files. Remember that, as with all
server variables, a value set at runtime is not preserved between restarts of the
server. So if you want the automatic purging to keep going across
restarts, you have to add the setting to the my.cnf file for the server.
To purge the binlog files manually, use the PURGE BINARY LOGS
command, which comes in two forms:
PURGE BINARY LOGS BEFORE datetime
This form of the command will purge all files that are
before the given date. If datetime is
in the middle of a logfile (and it usually is), all files before
the one holding datetime will be
purged.
PURGE BINARY LOGS TO 'filename'
This form of the command will purge all files that precede
the given file. In other words, all files before
filename in the output from SHOW MASTER LOGS will be removed,
leaving filename as the first binlog
file.
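Which files each form selects can be modeled with two small functions. This is a sketch of the selection rules as described above, not of the server's code: the function names are invented, the index is a plain list in the order the files were created, and the BEFORE form is modeled on an assumed "time of last write" per file.

```python
from datetime import datetime

def purged_by_to(index, filename):
    """Model of PURGE BINARY LOGS TO: every file listed before filename is purged."""
    if filename not in index:
        raise ValueError("target log not found in index")
    return index[:index.index(filename)]

def purged_by_before(index, last_write, cutoff):
    """Model of PURGE BINARY LOGS BEFORE: purge leading files last written before cutoff.

    last_write maps each file name to the (assumed) time of its last write.
    The file that holds the cutoff time was written at or after it, so it is kept.
    """
    purged = []
    for name in index:
        if last_write[name] >= cutoff:
            break
        purged.append(name)
    return purged

index = ["master-bin.000001", "master-bin.000002", "master-bin.000003"]
last_write = {
    "master-bin.000001": datetime(2009, 10, 1, 12, 0),
    "master-bin.000002": datetime(2009, 10, 5, 12, 0),
    "master-bin.000003": datetime(2009, 10, 9, 12, 0),
}
```

With this sample data, a cutoff in the middle of the second file's time range purges only the first file, which matches the behavior described for a datetime that falls inside a logfile.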
Binlog files are purged when the server starts or when a binary
log rotation is done. If the server discovers files that require
purging, either because a file is older than expire-logs-days or because a PURGE BINARY LOGS command was executed, it
will start by writing the names of those files to the purge index file (for example, master-bin.~rec~). After that, the files are
removed from the filesystem, and finally the purge index file is removed.
In the event of a crash, the server can continue removing files on restart by
comparing the contents of the purge index file and the index file and
removing any files whose removal was interrupted by the crash. As you saw
earlier, the purge index file is used when rotating as well, so if a
crash occurs before the index file can be properly updated, the new
binlog file will be removed and then re-created when the rotate is
repeated.
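The write-ahead idea behind the purge index file can be sketched as follows. This is a toy model rather than the server's actual procedure: the function names are invented, only the purge case is covered (not rotation), and the point at which the index file is rewritten is an assumption made for the example. The file names master-bin.index and master-bin.~rec~ are the ones used above.

```python
import os
import tempfile

INDEX = "master-bin.index"
PURGE_INDEX = "master-bin.~rec~"

def read_lines(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def write_lines(path, lines):
    with open(path, "w") as f:
        f.writelines(name + "\n" for name in lines)
        f.flush()
        os.fsync(f.fileno())   # the record must be on disk before work continues

def begin_purge(directory, doomed):
    """Step 1: record the intention before touching any binlog file."""
    write_lines(os.path.join(directory, PURGE_INDEX), doomed)

def finish_purge(directory):
    """Steps 2 and 3: remove the listed files, then the purge index file.

    Safe to call again after a crash: files already gone are simply skipped.
    Returns the names of the files removed by this call.
    """
    purge_path = os.path.join(directory, PURGE_INDEX)
    if not os.path.exists(purge_path):
        return []              # nothing was in progress
    doomed = read_lines(purge_path)
    index_path = os.path.join(directory, INDEX)
    # Assumption of this sketch: the index file is brought up to date here.
    write_lines(index_path, [n for n in read_lines(index_path) if n not in doomed])
    removed = []
    for name in doomed:
        path = os.path.join(directory, name)
        if os.path.exists(path):
            os.remove(path)
            removed.append(name)
    os.remove(purge_path)
    return removed
```

On startup, a recovery routine in this model would only need to call finish_purge: if the purge index file exists, an earlier purge was interrupted and is completed; if not, there is nothing to do.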