ZooKeeper Source Code Analysis

Analysis of ZooKeeper-Jute

This chapter analyzes the zookeeper-jute module within the ZooKeeper project. In the ZooKeeper ecosystem, the zookeeper-jute module primarily handles serialization and deserialization operations, along with defining several core data structures.

Overview of ZooKeeper-Jute

The ZooKeeper project consolidates all serialization and deserialization-related functionality within the zookeeper-jute module. Let's begin with a basic overview of this module's fundamental components. To demonstrate, we'll create a Jute-related test case, starting with implementing the Record interface. Here's the implementation:

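import java.io.IOException;
import org.apache.jute.InputArchive;
import org.apache.jute.OutputArchive;
import org.apache.jute.Record;

// Getters, setters, and constructors omitted for brevity
public class DemoRecord implements Record {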
  private String name;
  private int age;

  @Override
  public void serialize(OutputArchive archive, String tag) throws IOException {
    archive.startRecord(this, tag);
    archive.writeInt(age, "age");
    archive.writeString(name, "name");
    archive.endRecord(this, tag);
  }

  @Override
  public void deserialize(InputArchive archive, String tag) throws IOException {
    archive.startRecord(tag);
    this.age = archive.readInt("age");
    this.name = archive.readString("name");
    archive.endRecord(tag);
  }
}

The code above defines two member variables:

  1. The 'name' variable representing the name
  2. The 'age' variable representing the age

Let's focus on the Record interface implementation. First, let's examine the serialize method, which follows these steps:

  1. Marks the beginning of the record in the output archive
  2. Writes the age value with the tag name "age"
  3. Writes the name value with the tag name "name"
  4. Marks the end of the record in the output archive

The deserialize method follows a similar pattern:

  1. Marks the beginning of the record in the input archive
  2. Reads the age value from the input archive using the "age" tag
  3. Reads the name value from the input archive using the "name" tag
  4. Marks the end of the record in the input archive

After preparing the Record implementation, here's a usage example:

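import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.apache.jute.BinaryInputArchive;
import org.apache.jute.BinaryOutputArchive;
import org.apache.jute.InputArchive;
import org.apache.jute.OutputArchive;

// The wrapper class name is illustrative
public class DemoRecordTest {
  public static void main(String[] args) throws Exception {
    // Serialize a record into an in-memory byte stream
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    OutputArchive boa = BinaryOutputArchive.getArchive(baos);
    DemoRecord zhangsan = new DemoRecord("zhangsan", 10);
    zhangsan.serialize(boa, "data1");

    // Deserialize a new record from the bytes that were just written
    ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
    InputArchive bia = BinaryInputArchive.getArchive(bais);
    DemoRecord demoRecord = new DemoRecord();
    demoRecord.deserialize(bia, "data1");

    baos.close();
    bais.close();
  }
}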

The core processing flow includes:

  1. Creating a ByteArrayOutputStream and using it to create an OutputArchive
  2. Creating a DemoRecord object and serializing it to the output archive
  3. Creating a ByteArrayInputStream using the output stream's content
  4. Deserializing the data into the demoRecord variable using the input archive

Note that in step 3 the byte input stream must be created from the bytes held by the output stream (baos.toByteArray()); otherwise there is nothing to read back. Stepping through the program in a debugger shows that the content serialized in step 2 is restored in steps 3 and 4 and stored in the demoRecord variable, as shown in the image.

[Figure: debugger view of the demoRecord variable after deserialization]
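
To make the round trip more concrete, here is a minimal sketch that reproduces, with plain java.io classes, the byte layout we would expect the binary archive to produce for DemoRecord. It assumes that an int is written as four big-endian bytes, that a string is written as a four-byte length followed by its UTF-8 bytes, and that the record start and end markers add no bytes in the binary format. The class BinaryLayoutSketch and its methods are illustrative names and are not part of Jute.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Illustrative only: mimics the assumed binary layout of DemoRecord
// (int age, then length-prefixed UTF-8 name) without using Jute itself.
class BinaryLayoutSketch {

  static byte[] encode(String name, int age) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(baos);
    out.writeInt(age);                                   // 4 bytes, big-endian
    byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
    out.writeInt(utf8.length);                           // 4-byte length prefix
    out.write(utf8);                                     // string payload
    out.flush();
    return baos.toByteArray();
  }

  // Reads the fields back in the order they were written and
  // returns them as "name:age" for easy inspection.
  static String decode(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int age = in.readInt();
    byte[] utf8 = new byte[in.readInt()];
    in.readFully(utf8);
    return new String(utf8, StandardCharsets.UTF_8) + ":" + age;
  }
}
```

Under these assumptions, encode("zhangsan", 10) yields 16 bytes (4 for the age, 4 for the length, 8 for the name), and decode returns "zhangsan:10". The field order on the wire is fixed by the order of the write calls, which is why deserialize must read age before name.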

From the usage of jute, we can identify three core interfaces:

  1. InputArchive for input operations
  2. OutputArchive for output operations
  3. Record, which every serializable type implements; it declares the serialize and deserialize methods and delegates the actual reading and writing to the input and output archives
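
The division of labour between these interfaces can be sketched with stripped-down stand-ins. The Mini* types, TextArchive, and Person below are hypothetical and far smaller than the real Jute interfaces, and only the output side is shown. The point is that a record lists its fields once, while the archive decides how they are encoded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stripped-down counterparts of OutputArchive and Record.
interface MiniOutputArchive {
  void writeInt(int value, String tag);
  void writeString(String value, String tag);
}

interface MiniRecord {
  void serialize(MiniOutputArchive archive);
}

// One possible archive: renders every field as "tag=value" text.
class TextArchive implements MiniOutputArchive {
  private final List<String> fields = new ArrayList<>();

  @Override
  public void writeInt(int value, String tag) {
    fields.add(tag + "=" + value);
  }

  @Override
  public void writeString(String value, String tag) {
    fields.add(tag + "=" + value);
  }

  String result() {
    return String.join(",", fields);
  }
}

// The record only names its fields and their order; it knows nothing
// about the output format.
class Person implements MiniRecord {
  private final String name;
  private final int age;

  Person(String name, int age) {
    this.name = name;
    this.age = age;
  }

  @Override
  public void serialize(MiniOutputArchive archive) {
    archive.writeInt(age, "age");
    archive.writeString(name, "name");
  }
}
```

Serializing new Person("zhangsan", 10) into a TextArchive produces "age=10,name=zhangsan". Swapping in a different MiniOutputArchive implementation would change the output without touching Person, which is the same separation the real Record and archive interfaces aim for.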

InputArchive and OutputArchive

In the ZooKeeper project, there are three types of archive storage or transmission formats:

  1. XML-based input and output archives: XmlInputArchive and XmlOutputArchive
  2. CSV-based input and output archives: CsvInputArchive and CsvOutputArchive
  3. Binary-based input and output archives: BinaryInputArchive and BinaryOutputArchive

The binary format is the most commonly used of the three. Let's analyze the BinaryInputArchive and BinaryOutputArchive classes. First, we'll examine the BinaryOutputArchive class, starting with its constructor:

public BinaryOutputArchive(DataOutput out) {
  this.out = out;
}

In this constructor, the DataOutput interface is used, and its instance is assigned to the member variable 'out'. This constructor is not frequently used; instead, the static method getArchive is often used to create a BinaryOutputArchive instance:

public static BinaryOutputArchive getArchive(OutputStream strm) {
  return new BinaryOutputArchive(new DataOutputStream(strm));
}

In this code, the output stream is converted to a DataOutputStream type and passed to the constructor to initialize the BinaryOutputArchive instance. Once the 'out' member variable is set, data can be written out. Let's take the example of writing a boolean value:

public void writeBool(boolean b, String tag) throws IOException {
  out.writeBoolean(b);
}

From this code, we can see that writing a boolean value simply calls the writeBoolean method provided by the DataOutput interface; the tag parameter is not used. The other write methods are not analyzed in detail here.

Next, let's analyze the BinaryInputArchive class, starting with its constructor:

public BinaryInputArchive(DataInput in, int maxBufferSize, int extraMaxBufferSize) {
  this.in = in;
  this.maxBufferSize = maxBufferSize;
  this.extraMaxBufferSize = extraMaxBufferSize;
}

This constructor takes three parameters:

  1. 'in' represents the data input
  2. 'maxBufferSize' represents the maximum buffer size
  3. 'extraMaxBufferSize' represents the extra maximum buffer size

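This constructor is rarely called directly; instead, the static method getArchive is normally used to create a BinaryInputArchive instance. Note that getArchive calls a single-argument constructor (not shown here), which presumably supplies default values for the two buffer limits:

public static BinaryInputArchive getArchive(InputStream strm) {
  return new BinaryInputArchive(new DataInputStream(strm));
}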
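
The excerpts above do not show where maxBufferSize and extraMaxBufferSize are applied. One plausible reading, and the assumption behind the sketch below, is that they bound the length prefix of strings and byte buffers, so that a corrupt or malicious length cannot trigger a huge allocation. The class LengthCheckSketch, its method names, the error message, and the exact comparison (the length must lie between 0 and maxBufferSize + extraMaxBufferSize) are illustrative assumptions, not code copied from BinaryInputArchive.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative only: a length-prefixed read guarded by two size limits.
class LengthCheckSketch {
  private final DataInput in;
  private final int maxBufferSize;
  private final int extraMaxBufferSize;

  LengthCheckSketch(DataInput in, int maxBufferSize, int extraMaxBufferSize) {
    this.in = in;
    this.maxBufferSize = maxBufferSize;
    this.extraMaxBufferSize = extraMaxBufferSize;
  }

  // Convenience factory for in-memory data.
  static LengthCheckSketch of(byte[] data, int maxBufferSize, int extraMaxBufferSize) {
    DataInput in = new DataInputStream(new ByteArrayInputStream(data));
    return new LengthCheckSketch(in, maxBufferSize, extraMaxBufferSize);
  }

  // Reads a 4-byte length, validates it, then reads that many bytes.
  byte[] readBuffer() throws IOException {
    int len = in.readInt();
    // Use long arithmetic so the sum of the two limits cannot overflow.
    long limit = (long) maxBufferSize + extraMaxBufferSize;
    if (len < 0 || len > limit) {
      throw new IOException("Unreasonable length = " + len);
    }
    byte[] buf = new byte[len];
    in.readFully(buf);
    return buf;
  }
}
```

With the input bytes {0, 0, 0, 3, 'a', 'b', 'c'}, limits of (2, 1) accept the three-byte payload, while limits of (1, 1) reject it with an IOException before any buffer is allocated.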

Let's examine the readBool method, which corresponds to the writeBool method:

public boolean readBool(String tag) throws IOException {
  return in.readBoolean();
}

In this code, the boolean value is read using the 'in' member variable, and the result is returned. The BinaryInputArchive class has other read methods, which are not analyzed in detail.
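
Because writeBool and readBool delegate directly to DataOutput and DataInput, the pair can be exercised with the JDK classes alone. The following sketch writes a boolean through a DataOutputStream and reads it back through a DataInputStream; BoolRoundTripSketch and its methods are illustrative names, not Jute classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative only: the same DataOutput/DataInput calls that
// writeBool and readBool delegate to, applied to in-memory streams.
class BoolRoundTripSketch {

  static byte[] write(boolean b) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutput out = new DataOutputStream(baos);
    out.writeBoolean(b);          // a single byte: 1 for true, 0 for false
    return baos.toByteArray();
  }

  static boolean read(byte[] data) throws IOException {
    DataInput in = new DataInputStream(new ByteArrayInputStream(data));
    return in.readBoolean();      // any non-zero byte is read as true
  }
}
```

A boolean therefore occupies exactly one byte in the stream, and read(write(b)) returns b for both values.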

ZooKeeper Core Data Structures

In the zookeeper-jute module, apart from serialization and deserialization-related functionality, there are also definitions for several core data structures. These definitions are located in the zookeeper-jute/src/main/resources/zookeeper.jute file. Before analyzing this file, let's understand the common data attributes used in the ZooKeeper project:

  1. zxid: a globally unique transaction ID
  2. czxid: the zxid when the node was created
  3. mzxid: the zxid when the node was last modified
  4. ctime: the time when the node was created
  5. mtime: the time when the node was last modified
  6. version: the version number of the node's data
  7. cversion: the version number of the node's children, which changes when child nodes are added or removed
  8. aversion: the version number of the ACL
  9. ephemeralOwner: the session ID that created the ephemeral node; 0 if the node is persistent
  10. dataLength: the length of the node data
  11. numChildren: the number of child nodes
  12. pzxid: the zxid of the last change to this node's children
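
These attributes map naturally onto a plain data holder. The sketch below is a hypothetical NodeStatSketch covering only a subset of the attributes; it is not the Stat class generated from zookeeper.jute, and the field types (long for zxids and the session ID, int for versions and counts) are assumptions. It mainly shows how ephemeralOwner distinguishes ephemeral nodes from persistent ones.

```java
// Illustrative only: a hand-written holder for a few of the attributes
// listed above. Field types are assumptions.
class NodeStatSketch {
  final long czxid;           // zxid when the node was created
  final long mzxid;           // zxid when the node was last modified
  final int version;          // version of the node's data
  final long ephemeralOwner;  // owning session ID, or 0 if persistent
  final int numChildren;      // number of child nodes

  NodeStatSketch(long czxid, long mzxid, int version,
                 long ephemeralOwner, int numChildren) {
    this.czxid = czxid;
    this.mzxid = mzxid;
    this.version = version;
    this.ephemeralOwner = ephemeralOwner;
    this.numChildren = numChildren;
  }

  // A node is ephemeral exactly when it records an owning session.
  boolean isEphemeral() {
    return ephemeralOwner != 0;
  }
}
```

For example, new NodeStatSketch(5L, 5L, 0, 0L, 0).isEphemeral() is false, while the same node with a non-zero session ID in ephemeralOwner reports true.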

After understanding these common data attributes, we can analyze the data definitions in the zookeeper.jute file. The org.apache.zookeeper.data package contains four classes: Id, ACL, Stat, and StatPersisted. Let's focus on the Id and ACL classes.

The Id class has two member variables:

Variable Name    Variable Type    Variable Description
id               String           ID
scheme           String           Scheme

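The 'scheme' variable in the Id class commonly takes one of four values:

  1. world: open access, no restrictions (the id is "anyone")
  2. ip: access control based on the client's IP address
  3. auth: any client that has already authenticated
  4. digest: user name and password authentication, with the password stored as a hash rather than in plain text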
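
To make the digest scheme more concrete: as far as we understand it, the id of a digest entry consists of the user name followed by a Base64-encoded SHA-1 hash of the string "username:password", so the plain password never appears in the ACL. The sketch below is built on that assumption using JDK classes only. DigestIdSketch and digestId are illustrative names, and the charset and hash algorithm are assumptions; the real logic lives in ZooKeeper's authentication code, which is not covered in this chapter.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Illustrative only: builds an id string of the assumed form
// "user:base64(sha1(user:password))" for use with the digest scheme.
class DigestIdSketch {

  static String digestId(String user, String password)
      throws NoSuchAlgorithmException {
    String idPassword = user + ":" + password;
    byte[] sha1 = MessageDigest.getInstance("SHA-1")
        .digest(idPassword.getBytes(StandardCharsets.UTF_8));
    return user + ":" + Base64.getEncoder().encodeToString(sha1);
  }
}
```

The result always starts with the user name and a colon, followed by 28 Base64 characters (the encoding of the 20-byte SHA-1 hash). Such a string would go into the 'id' field of an Id whose 'scheme' is "digest".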

After understanding the Id class, let's analyze the ACL class. The ACL class has two member variables:

Variable Name    Variable Type                   Variable Description
id               org.apache.zookeeper.data.Id    ID
perms            int                             Permissions

The 'perms' variable in the ACL class is a bitmask. The org.apache.zookeeper.ZooDefs.Perms class defines five individual permission bits and one combined value:

  1. 1: READ permission
  2. 2: WRITE permission
  3. 4: CREATE permission
  4. 8: DELETE permission
  5. 16: ADMIN permission
  6. 31: all permissions (the bitwise OR of the five values above)
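
Because each individual permission is a distinct power of two, permissions can be combined with bitwise OR and tested with bitwise AND. The sketch below redeclares the constants locally with the values listed above rather than importing ZooDefs.Perms; PermsSketch and allows are illustrative names.

```java
// Local copies of the values listed above; the real constants live in
// org.apache.zookeeper.ZooDefs.Perms.
class PermsSketch {
  static final int READ = 1;
  static final int WRITE = 2;
  static final int CREATE = 4;
  static final int DELETE = 8;
  static final int ADMIN = 16;
  static final int ALL = READ | WRITE | CREATE | DELETE | ADMIN; // 31

  // True if every bit in 'required' is also set in 'perms'.
  static boolean allows(int perms, int required) {
    return (perms & required) == required;
  }

  public static void main(String[] args) {
    int readWrite = READ | WRITE;
    System.out.println(readWrite);                    // 3
    System.out.println(allows(readWrite, READ));      // true
    System.out.println(allows(readWrite, DELETE));    // false
    System.out.println(allows(ALL, ADMIN | CREATE));  // true
  }
}
```

An ACL entry whose perms value is 3 therefore grants READ and WRITE but nothing else, and 31 is simply the value with all five bits set.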

Summary

This chapter focuses on the analysis of the zookeeper-jute module in the ZooKeeper project. We started with an introduction to the jute module, followed by an analysis of the InputArchive and OutputArchive interfaces. Finally, we explored the core data structures defined in the zookeeper-jute module.