Although the master is quite good at handling a large number of slaves,
there is a limit to how many it can serve before the load becomes too
high for comfort, and an unresponsive master is always a problem. (One
user mentioned 70 slaves as a practical limit for his purposes, but as you
probably realize, this depends a lot on the application.) In those
cases, you can add an extra slave (or several) as a relay
slave (or simply relay), whose only
purpose is to lighten the replication load on the master by serving a
group of slaves itself. Using a relay in this manner is called
hierarchical replication. Figure 1 illustrates a
typical setup with a master, a relay, and several slaves connected to the
relay.
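To get a feel for how much a relay can help, it is enough to count the servers that replicate directly from the master. The following is a back-of-the-envelope sketch only: the function name is made up for this illustration, and it assumes the slaves are spread evenly across the relays, which a real deployment may not do.

```python
import math

def master_connections(total_slaves, slaves_per_relay=None):
    """Return how many servers replicate directly from the master.

    Without relays, every slave connects to the master. With relays,
    only the relays do (assuming slaves are spread evenly across them).
    """
    if slaves_per_relay is None:
        return total_slaves
    if slaves_per_relay < 1:
        raise ValueError("slaves_per_relay must be at least 1")
    return math.ceil(total_slaves / slaves_per_relay)

print(master_connections(70))      # all 70 slaves attach to the master
print(master_connections(70, 10))  # only the relays attach to the master
```

The count says nothing about the lag a relay adds, so it is only one side of the trade-off.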
By default, the changes the slave receives from its master are not
written to the binary log of the slave, so if SHOW BINLOG EVENTS is
executed on the slave in the previous setup, you will not see any events
in the binlog. The reason for this is that there is no point in wasting
disk space by recording the changes: if there is a problem and, say, the
slave crashes, you can always recover by cloning the master or another
slave.
On the other hand, the relay server needs to keep a binary log to
record all the changes, because the relay passes them on to other slaves.
Unlike typical slaves, however, the relay doesn’t need to actually apply
changes to a database of its own, because it doesn’t answer
queries.
In short, a typical slave needs to apply changes to a database, but
not to a binary log. A relay server needs to keep a binary log, but does
not apply changes to a database.
To avoid storing changes in the database, the relay still needs its
tables to exist (so that the statements can be executed), but the changes
themselves should simply be thrown away. A storage engine named Blackhole
was created for purposes just like this one. The Blackhole engine accepts
all statements and always reports success in executing them, but discards
any changes.
A relay introduces an extra delay that can cause its slaves to lag
further behind the master than slaves that are directly connected to the
master. Weigh this lag, and the fact that a hierarchical setup is
significantly more difficult to manage than a simple one, against the
benefit of removing some load from the master.
1. Setting Up a Relay Server
Setting up a relay slave is quite easy, but we have to consider what to do
with tables that are being created on the relay as well as what to do
with tables that already exist on the relay when we change its role. Not
keeping data in the databases will make processing events faster and
reduce the lag for the slaves at the end of the replication process,
since there is no data to be updated. To set up a relay slave, we thus
have to:
Configure the slave to forward any events executed by the slave thread by writing them to the binlog of the relay slave.
Change the storage engine for all tables on the relay slave to use the BLACKHOLE storage engine to preserve space and improve performance.
Ensure that any new tables added to the relay also use the BLACKHOLE engine.
Configuring the relay server to forward events executed by the
slave thread is done by adding the log-slave-updates
option to my.cnf, as demonstrated
earlier.
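If you prefer to script the configuration change, a minimal sketch using Python's standard configparser module could look like the following. The function name and the sample configuration are assumptions made for this illustration, and the parser does not handle every construct a real my.cnf may contain, so treat it as a starting point. Note also that log-slave-updates only has an effect when the binary log is enabled on the relay.

```python
import configparser
import io

def enable_relay_logging(cnf_text):
    """Return my.cnf text with log-slave-updates set in [mysqld]."""
    # allow_no_value lets the parser accept bare options such as "log-bin".
    config = configparser.ConfigParser(allow_no_value=True,
                                       interpolation=None)
    config.optionxform = str  # keep option names exactly as written
    config.read_string(cnf_text)
    if not config.has_section('mysqld'):
        config.add_section('mysqld')
    config.set('mysqld', 'log-slave-updates', '1')
    out = io.StringIO()
    config.write(out)
    return out.getvalue()

# Hypothetical starting configuration for the relay.
sample = "[mysqld]\nserver-id = 2\nlog-bin = relay-bin\n"
print(enable_relay_logging(sample))
```

The server reads my.cnf at startup, so the relay has to be restarted before the option takes effect.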
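To ensure all tables created on the relay slave are created with
the BLACKHOLE engine,
connect to the server and set the default storage engine (the variable
shown here is the one used by older MySQL versions; newer versions name
it DEFAULT_STORAGE_ENGINE instead):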
relay> SET GLOBAL STORAGE_ENGINE = 'BLACKHOLE';
The final task is to change the storage engine for all tables
already on the relay slave to use BLACKHOLE. Do this using the ALTER TABLE statement to change the storage
engine for each table on the server. Since the ALTER TABLE statements
shouldn’t be written to the binary log (the last thing we want is for
slaves to discard the changes they receive!), turn off the binary log
temporarily while executing the ALTER
TABLE statements. This is shown in Example 1.
Example 1. Changing the engine for all tables in database windy
relay> SHOW TABLES FROM windy;
+-----------------+
| Tables_in_windy |
+-----------------+
| user_data |
.
.
.
| profile |
+-----------------+
45 rows in set (0.15 sec)
relay> SET SQL_LOG_BIN = 0;
relay> ALTER TABLE user_data ENGINE = 'BLACKHOLE';
.
.
.
relay> ALTER TABLE profile ENGINE = 'BLACKHOLE';
relay> SET SQL_LOG_BIN = 1;
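With dozens of tables, typing the statements by hand gets tedious. The sketch below generates the same sequence of statements for a list of table names; the function name is hypothetical, and the code only builds the statements, leaving it to you to run them over a connection to the relay.

```python
def blackhole_statements(tables):
    """Return the statements that convert the given tables to BLACKHOLE.

    Binary logging is switched off first and back on last, so the
    ALTER TABLE statements are not passed on to the relay's slaves.
    """
    statements = ["SET SQL_LOG_BIN = 0"]
    for table in tables:
        # Backticks quote the identifier; embedded backticks are doubled.
        quoted = "`" + table.replace("`", "``") + "`"
        statements.append("ALTER TABLE %s ENGINE = BLACKHOLE" % quoted)
    statements.append("SET SQL_LOG_BIN = 1")
    return statements

for statement in blackhole_statements(["user_data", "profile"]):
    print(statement + ";")
```

All statements have to be executed in the same session, because SQL_LOG_BIN is a session setting.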
This is all you need to turn a server into a relay server. The
usual way you come to employ a relay is to start with a setup where all
slaves attach directly to a master and discover after some time that it
is necessary to introduce a relay slave. The reason is usually that the
master has become too loaded, but there could be architectural reasons
for making the change as well. So how do you handle that?
You can use what you learned in the previous sections and modify
the existing deployment to introduce the new relay server by:
Connecting the relay slave to the master and configuring it to act as a relay server.
Switching over the slaves one by one to the relay server.
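The order of these steps can be modeled with a small sketch. It only tracks which server each slave replicates from; the function name, the dictionary layout, and the server names are assumptions made for this illustration, and the actual redirection of each slave is not shown.

```python
def introduce_relay(topology, master, relay):
    """Yield the topology after each step of introducing a relay.

    topology maps each slave's name to the name of the server it
    replicates from. The dictionary passed in is not modified.
    """
    current = dict(topology)
    # Step 1: attach the relay to the master.
    current[relay] = master
    yield dict(current)
    # Step 2: move the master's slaves to the relay, one at a time.
    for slave in sorted(topology):
        if topology[slave] == master and slave != relay:
            current[slave] = relay
            yield dict(current)

before = {"slave-1": "master", "slave-2": "master"}
for step in introduce_relay(before, "master", "relay"):
    print(step)
```

Moving one slave per step means that, at any point, only a single slave is in transition while the rest keep replicating.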
2. Adding a Relay in Python
Let’s turn to the task of developing support for administering relays by
extending our library. Since we have a system for creating new roles and imbuing servers with those roles, let’s use
that by defining a special role for the relay server. This is shown
in Example 2.
Example 2. Role definition for relay
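class Relay(role.Base):
    """Role for a relay slave: keeps a binary log but stores no data."""

    def __init__(self, master):
        pass

    def imbue(self, server):
        # Write events applied by the slave thread to the relay's binlog.
        config = server.get_config()
        self._set_server_id(server, config)
        self._enable_binlog(server, config)
        config.set('mysqld', 'log-slave-updates', '1')
        server.put_config(config)
        # Keep the engine changes out of the binary log so they are
        # not replicated to the slaves attached to the relay.
        server.sql("SET SQL_LOG_BIN = 0")
        try:
            # Assumption: server.sql() yields plain names for these
            # one-column results and fills in %s without quoting.
            for db in server.sql("SHOW DATABASES"):
                if db in ('mysql', 'information_schema', 'performance_schema'):
                    continue    # leave the system schemas untouched
                for table in server.sql("SHOW TABLES FROM %s", (db,)):
                    server.sql("ALTER TABLE %s.%s ENGINE=BLACKHOLE", (db, table))
        finally:
            server.sql("SET SQL_LOG_BIN = 1")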