MySQL Replication for Scale-Out : Managing the Replication Topology - Example of an Application-Level Load Balancer

12/3/2011 4:45:58 PM

A deployment is scaled by creating new slaves and adding them to the collection of computers you have. The term replication topology refers to the ways you connect servers using replication. Figure 1 shows some examples of replication topologies: a simple topology, a tree topology, a dual-master topology, and a circular topology.

Figure 1. Simple, tree, dual-master, and circular replication topologies

These topologies are used for different purposes: the dual-master topology handles failovers elegantly, for example, and circular replication and dual masters allow different sites to work locally while still replicating changes over to the other sites.

The simple and tree topologies are used for scale-out. The use of replication causes the number of reads to greatly exceed the number of writes. This places special demands on the deployment in two ways:

It requires load balancing

We’re using the term load balancing here to describe any way of dividing queries among servers. Replication creates both reasons for load balancing and methods for doing so. First, replication imposes a basic division of the load by specifying writes to be directed to the masters while reads go to the slaves. Furthermore, you sometimes have to send a particular query to a particular slave.

It requires you to manage the topology

Servers crash sooner or later, which makes it necessary to replace them. Replacing a crashed slave might not be urgent, but you’ll have to replace a crashed master quickly.

In addition to this, if a master crashes, clients have to be redirected to the new master. If a slave crashes, it has to be taken out of the pool of load balancers so no queries are directed to it.

To handle load balancing and management, you should put tools in place to manage the replication topology, specifically tools that monitor the status and performance of servers and tools to handle the distribution of queries.

For load balancing to be effective, it is necessary to have spare capacity on the servers. There are a few reasons for ensuring you have spare capacity:

Peak load handling

You need to have margins to be able to handle peak loads. The load on a system is never even but fluctuates up and down. The spare capacity necessary to handle a large deployment depends a lot on the application, so you need to monitor it closely to know when the response times start to suffer.

Distribution cost

You need to have spare capacity for running the replication setup. Replication always causes a “waste” of some capacity on the overhead of running a distributed system. It involves extra queries to manage the distributed system, such as the extra queries necessary to figure out where to execute a read query.

One item that is easily forgotten is that each slave has to perform the same writes as the master. The queries from the master are executed in an orderly manner (that is, serially), with no risk of conflicting updates, but the slave needs extra capacity for running replication.

Administrative tasks

Restructuring the replication setup requires spare capacity so you can support temporary dual use, for example, when moving data between servers.

Load balancing works in two basic ways: either the application asks for a server based on the type of query, or an intermediate layer—usually referred to as aproxy—analyzes the query and sends it to the correct server.

Using an intermediate layer to analyze and distribute the queries (as shown in Figure 2 ) is by far the most flexible approach, but it has two disadvantages:

Processing resources have to be spent on analyzing queries. This delays the query, which now has to be parsed and analyzed twice: once by the proxy and again by the MySQL server. The more advanced the analysis, the more the query is delayed. Depending on the application, this may or may not be a problem.
Correct query analysis can be hard to implement, sometimes even impossible. A proxy will often hide the internal structure of the deployment from the application programmer so that she does not have to make the hard choices. For this reason, the client may send a query that can be very hard to analyze properly and might require a significant rewrite before being sent to the servers.

Figure 2. Using a proxy to distribute queries

One of the tools that you can use for proxy load balancing is MySQL Proxy. It contains a full implementation of the MySQL client protocol, and therefore can act as a server for the real client connecting to it and as a client when connecting to the MySQL server. This means that it can be fully transparent: a client can’t distinguish between the proxy and a real server.

The MySQL Proxy is controlled using the Lua programming language . It has a built-in Lua engine that executes small—and sometimes not so small—programs to intercept and manipulate both the queries and the result sets. Since the proxy is controlled using a real programming language, it can carry out a variety of sophisticated tasks, including query analysis, query filtering, query manipulation, and query distribution.

The precise methods for using a proxy depend entirely on the type of proxy you use, so we will not cover that information here. Instead, we’ll focus on using a load balancer in the application layer. There are a number of load balancers available, including:

Hardware
Simple software load balancers, such as Balance
Peer-based systems, such as Wackamole
Full-blown clustering solutions, such as the Linux Virtual Server

It is also possible to distribute the load on the DNS level and to handle the distribution directly in the application.

1. Example of an Application-Level Load Balancer

Let’s tackle the task of designing and implementing a simple application-level load balancer to see how it works. In this section, we’ll implement read/write splitting.

The most straightforward approach to load balancing at the application level is to have the application ask the load balancer for a connection based on the type of query it is going to send. In most cases, the application already knows if the query is going to be a read or write query and also which tables will be affected. In fact, forcing the application developer to consider these issues when designing the queries may produce other benefits for the application, usually in the form of improved overall performance of the system. Based on this information, a load balancer can provide a connection to the right server, which the application then can use to execute the query.

A load balancer on the application layer needs to have a central store with information about the servers and what queries they should handle. Functions in the application layer send queries to this central store, which returns the name or IP address of the MySQL server to query.

Let’s develop a simple load balancer like the one shown in Figure 3 for use by the application layer. We’ll use PHP for the presentation logic because it’s so popular on web servers. It is necessary to write functions for updating the server pool information and functions to fetch servers from the pool.

Figure 3. Load balancing on the application level

The pool is implemented by creating a table with all the servers in the deployment in a common database that is shared by all nodes. In this case, we just use the host and port as primary key for the table (instead of creating a host ID) and create a common database to contain the tables of the shared data.

Note:

You should duplicate the central store so that it doesn’t create a single point of failure. In addition, because the list of available servers does not often change, load balancing information is a perfect candidate for caching.

For the sake of simplicity—and to avoid introducing dependencies on other systems—we demonstrate the application-level load balancer using a pure MySQL implementation.

There are many other techniques that you can use that do not involve MySQL. The most common technique is to use round-robin DNS; another alternative is using Memcached, which is a distributed in-memory key/value store.

Also note that the addition of an extra query might be a significant overhead for high-performing systems and should be avoided.

The load balancer lists servers in the load balancer pool, separated into categories based on what kind of queries they can handle. Information about the servers in the pool is stored in a central repository. The implementation consists of a table in the common database given in Example 1, the PHP functions in Example 2 for querying the load balancer from the application, and the Python functions in Example 3 for updating information about the servers.

Example 1. Database tables for the load balancer

CREATE TABLE nodes (
  host CHAR(28) NOT NULL,
  port INT UNSIGNED NOT NULL,
  sock CHAR(64) NOT NULL,
  type SET('READ','WRITE') NOT NULL DEFAULT '',
  PRIMARY KEY (host, port)
);

We store for each host regarding whether it accepts reads, writes, both, or neither. This information is stored in the type field. By setting it to the empty set, we can bring the server offline, which is important for maintenance.

A simple SELECT will suffice to find all the servers that can accept the query. Since we want just a single server, we limit the output to a single line using the LIMIT modifier to the SELECT query, and to distribute queries evenly among available servers, we use the ORDER BY RAND() modifier.

Note:

Using the ORDER BY RAND() modifier requires the server to sort the rows in the table, which may not be the most efficient way to pick a number randomly (it’s actually a very bad way to pick a number randomly), but we picked this approach for demonstration purposes only.

Example 2 shows the PHP function getServerConnection, which queries for a server and connects to it. It returns a connection to the server suitable for issuing a query, or NULL if no suitable server can be found. The helper function connect_to constructs a suitable connection string given its host, port, and a Unix socket. If the host is localhost, it will use the socket to connect to the server for efficiency.

Example 2. PHP function for querying the load balancer

function connect_to($host, $port, $socket) {
  $db_server = $host == "localhost" ? ":{$socket}" : "{$host}:{$port}";
  return mysql_connect($db_server, 'query_user');
}

$COMMON = connect_to(host, port, socket);
mysql_select_db('common', $COMMON);

define('DB_WRITE', 'WRITE');
define('DB_READ', 'READ');

function getServerConnection($queryType)
{
  global $COMMON;
  $query = <<<END_OF_SQL
SELECT host, port, sock FROM nodes
 WHERE FIND_IN_SET('$queryType', type)
ORDER BY RAND() LIMIT 1
END_OF_SQL;
  $result = mysql_query($query, $COMMON);
  if ($row = mysql_fetch_row($result))
    return connect_to($row[0], $row[1], $row[2]);
  return NULL;
}

The final task is to provide utility functions for adding and removing servers and for updating the capabilities of a server. Since these are mainly to be used from the administration logic, we’ve implemented this function in Python using the Replicant library . The utility consists of three functions:

pool_add(common, server, type): Adds a server to the pool. The pool is stored at the server denoted by common, and the type to use is a list—or other iterable—of values to set.
pool_del(common, server): Deletes a server from the pool.
pool_set(common, server, type): Changes the type of the server.

Example 3. Administrative functions for the load balancer

class AlreadyInPoolError(replicant.Error):
    pass

_INSERT_SERVER = """
INSERT INTO nodes(host, port, sock, type)
    VALUES (%s, %s, %s, %s)"""

_DELETE_SERVER = "DELETE FROM nodes WHERE host = %s AND port = %s"

_UPDATE_SERVER = "UPDATE nodes SET type = %s WHERE host = %s AND port = %s"

def pool_add(common, server, type=[]):
    common.use("common")
    try:
        common.sql(_INSERT_SERVER,
                   (server.host, server.port, server.socket, ','.join(type)));
    except MySQLdb.IntegrityError:
        raise AlreadyInPoolError


def pool_del(common, server):
    common.use("common")
    common.sql(_DELETE_SERVER, (server.host, server.port))

def pool_set(common, server, type):
    common.use("common")
    common.sql(_UPDATE_SERVER, (','.join(type), server.host, server.port))

These functions can be used as shown in the following examples:

pool_add(common, master, ['READ', 'WRITE'])

for slave in slaves:
    pool_add(common, slave, ['READ'])

Others