Thrift框架详解（四）

概诉

上一届主要讨论了Thrift编解码中的二进制协议格式。这是一种比较大众的协议。本节来讨论二进制协议的另一种加强版本。TCompactProtocol

编解码

对于二进制编码，经常需要对数据进行压缩以节省空间，varint可以压缩较小的正数，但是对于负数varint反而更浪费空间，zigzag编码可以处理负数，使负数也可以使用varint编码压缩，protobuf和thrift都使用二者结合的方式来压缩数字类型。

原码、反码、补码
首先，计算机为了方便位运算，使用补码来存储数字。

然后回顾下：

原码：最高位为符号位，剩余位表示绝对值；
反码：除符号位外，对原码剩余位依次取反；
补码：对于正数，补码为其自身；对于负数，除符号位外对原码剩余位依次取反然后+1。

varint

本文以int类型为例，详细介绍如果通过varint+zigzag编码技术压缩数字。
首先，我们常用的数字其实多数都不是很大，比如：商品的价值、动态计数等，对于这些不是很大的数字，二进制的高位多数是0。
varint编码每个字节前1位表示下一个字节是否也是该数字的一部分，后7位表示实际的值，最后，先低位后高位，对于int类型来说，varint编码最少占用1个字节，最多占用5个字节。

varint编码java代码表示：

byte[] bytes = new byte[5];
int idx = 0;
for(idx = 0; (n & -128) != 0; n >>>= 7) {
  bytes[idx++] = (byte)(n & 127 | 128);
}
bytes[idx++] = (byte)n;

例如，对于int类型1来说，二进制表示为 00000000 00000000 00000000 00000001，总共占用4个字节，然而，前3个字节都是0，varint就是通过压缩高位的0来达到节省空间的目的，使用varint压缩后，二进制表示为 00000001，只占用1个字节。

对于int类型2^30来说二进制表示为
01000000 00000000 00000000 00000000，使用varint压缩后，二进制表示为10000000 10000000 10000000 10000000 00000100，占用5个字节。

zigzag

对于负数来说，因为最高位符号位始终为1，使用 varint 编码就很浪费空间，zigzag 编码就是解决负数的问题的，同时其对正数也没有很大的影响。

int类型zigzag变换的代码表示为(n << 1) ^ (n >> 31)

左移1位可以消去符号位，低位补0
有符号右移31位将符号位移动到最低位，负数高位补1，正数高位补0

例如：对于正数来说，最低位符号位为0，其他位不变
对于负数，最低位符号位为1，其他位按位取反
-1的二进制表示为 11111111 11111111 11111111 11111111，zigzag变换后 00000000 00000000 00000000 00000001，再用varint编码，是不是很小了。

1的二进制表示为 00000000 00000000 00000000 00000001，zigzag变换后00000001，再用varint编码，依然很小。

解码

zigzag操作为(n >>> 1) ^ -(n & 1)

正数：当前的数乘以2， zigzagY = x * 2
负数：当前的数乘以-2后减1, zigzagY = x * -2 - 1

无符号右移1位
按位与1，然后取负值，这一步非常巧妙，对于正数就是0，负数就是-1
按位异或得到结果
- 正数是与0按位异或
- 负数是与-1按位异或

-1 的解码过程为

00000000 00000000 00000000 00000010 = 无符号右移1位（(n >>> 1)） = 00000000 00000000 00000000 00000001
00000000 00000000 00000000 00000010 = 按位与1（都为1则为1，否则为0）（(n & 1)）= 00000000 00000000 00000000 00000000 为负数
上面两个记过异或 00000000 00000000 00000000 00000001 (异或 -1 相同为0，不同为1) = 11111111 11111111 11111111 11111111 = 00000000 00000000 00000000 00000001

TCompactProtocol 代码详解

TCompactProtocol对于二进制编码，经常需要对数据进行压缩以节省空间，varint可以压缩较小的正数，但是对于负数varint反而更浪费空间，zigzag编码可以处理负数，使负数也可以使用varint编码压缩，protobuf和thrift都使用二者结合的方式来压缩数字类型。

来看一下二级制的图示

代码

下面代码省略了一些部分信息，保留了关键注释和方法。

public class TCompactProtocol extends TProtocol {
  private final static TStruct ANONYMOUS_STRUCT = new TStruct("");
  private final static TField TSTOP = new TField("", TType.STOP, (short)0);
  private final static byte[] ttypeToCompactType = new byte[16];
  static {
    ttypeToCompactType[TType.STOP] = TType.STOP;
    ttypeToCompactType[TType.BOOL] = Types.BOOLEAN_TRUE;
    ttypeToCompactType[TType.BYTE] = Types.BYTE;
    ttypeToCompactType[TType.I16] = Types.I16;
    ttypeToCompactType[TType.I32] = Types.I32;
    ttypeToCompactType[TType.I64] = Types.I64;
    ttypeToCompactType[TType.DOUBLE] = Types.DOUBLE;
    ttypeToCompactType[TType.STRING] = Types.BINARY;
    ttypeToCompactType[TType.LIST] = Types.LIST;
    ttypeToCompactType[TType.SET] = Types.SET;
    ttypeToCompactType[TType.MAP] = Types.MAP;
    ttypeToCompactType[TType.STRUCT] = Types.STRUCT;
  }

  private static final byte PROTOCOL_ID = (byte)0x82;
  private static final byte VERSION = 1;
  private static final byte VERSION_MASK = 0x1f; // 0001 1111
  private static final byte TYPE_MASK = (byte)0xE0; // 1110 0000
  private static final int  TYPE_SHIFT_AMOUNT = 5;

  private static class Types {
    public static final byte BOOLEAN_TRUE   = 0x01;
    public static final byte BOOLEAN_FALSE  = 0x02;
    public static final byte BYTE           = 0x03;
    public static final byte I16            = 0x04;
    public static final byte I32            = 0x05;
    public static final byte I64            = 0x06;
    public static final byte DOUBLE         = 0x07;
    public static final byte BINARY         = 0x08;
    public static final byte LIST           = 0x09;
    public static final byte SET            = 0x0A;
    public static final byte MAP            = 0x0B;
    public static final byte STRUCT         = 0x0C;
  }

  /** 
   * Thrift 自己实现的short类型栈
   */
  private ShortStack lastField_ = new ShortStack(15);

  private short lastFieldId_ = 0;

  /**
   * If we encounter a boolean field begin, save the TField here so it can 
   * have the value incorporated.
   */
  private TField booleanField_ = null;

  /**
   * If we read a field header, and it's a boolean field, save the boolean 
   * value here so that readBool can use it.
   */
  private Boolean boolValue_ = null;

  @Override
  public void reset() {
    lastField_.clear();
    lastFieldId_ = 0;
  }

 
  /**
   * Write a message header to the wire. Compact Protocol messages contain the
   * protocol version so we can migrate forwards in the future if need be.
   */
  public void writeMessageBegin(TMessage message) throws TException {
    writeByteDirect(PROTOCOL_ID);
    writeByteDirect((VERSION & VERSION_MASK) | ((message.type << TYPE_SHIFT_AMOUNT) & TYPE_MASK));
    writeVarint32(message.seqid);
    writeString(message.name);
  }

  /**
   * Write a struct begin. This doesn't actually put anything on the wire. We 
   * use it as an opportunity to put special placeholder markers on the field
   * stack so we can get the field id deltas correct.
   */
  public void writeStructBegin(TStruct struct) throws TException {
    lastField_.push(lastFieldId_);
    lastFieldId_ = 0;
  }

  /**
   * Write a struct end. This doesn't actually put anything on the wire. We use
   * this as an opportunity to pop the last field from the current struct off
   * of the field stack.
   */
  public void writeStructEnd() throws TException {
    lastFieldId_ = lastField_.pop();
  }

  /**
   * Write a field header containing the field id and field type. If the
   * difference between the current field id and the last one is small (< 15),
   * then the field id will be encoded in the 4 MSB as a delta. Otherwise, the
   * field id will follow the type header as a zigzag varint.
   */ 
  public void writeFieldBegin(TField field) throws TException {
    if (field.type == TType.BOOL) {
      // we want to possibly include the value, so we'll wait.
      booleanField_ = field;
    } else {
      writeFieldBeginInternal(field, (byte)-1);
    }
  }

  /**
   * The workhorse of writeFieldBegin. It has the option of doing a 
   * 'type override' of the type header. This is used specifically in the 
   * boolean field case.
   */
  private void writeFieldBeginInternal(TField field, byte typeOverride) throws TException {
    // short lastField = lastField_.pop();

    // if there's a type override, use that.
    byte typeToWrite = typeOverride == -1 ? getCompactType(field.type) : typeOverride;

    // check if we can use delta encoding for the field id
    if (field.id > lastFieldId_ && field.id - lastFieldId_ <= 15) {
      // write them together
      writeByteDirect((field.id - lastFieldId_) << 4 | typeToWrite);
    } else {
      // write them separate
      writeByteDirect(typeToWrite);
      writeI16(field.id);
    }

    lastFieldId_ = field.id;
    // lastField_.push(field.id);
  }

  /**
   * Write the STOP symbol so we know there are no more fields in this struct.
   */
  public void writeFieldStop() throws TException {
    writeByteDirect(TType.STOP);
  }

  /**
   * Write a map header. If the map is empty, omit the key and value type 
   * headers, as we don't need any additional information to skip it.
   */
  public void writeMapBegin(TMap map) throws TException {
    if (map.size == 0) {
      writeByteDirect(0);
    } else {
      writeVarint32(map.size);
      writeByteDirect(getCompactType(map.keyType) << 4 | getCompactType(map.valueType));
    }
  }
  
  /** 
   * Write a list header.
   */
  public void writeListBegin(TList list) throws TException {
    writeCollectionBegin(list.elemType, list.size);
  }

  /**
   * Write a set header.
   */
  public void writeSetBegin(TSet set) throws TException {
    writeCollectionBegin(set.elemType, set.size);
  }

  /**
   * Write a boolean value. Potentially, this could be a boolean field, in 
   * which case the field header info isn't written yet. If so, decide what the
   * right type header is for the value and then write the field header. 
   * Otherwise, write a single byte.
   */
  public void writeBool(boolean b) throws TException {
    if (booleanField_ != null) {
      // we haven't written the field header yet
      writeFieldBeginInternal(booleanField_, b ? Types.BOOLEAN_TRUE : Types.BOOLEAN_FALSE);
      booleanField_ = null;
    } else {
      // we're not part of a field, so just write the value.
      writeByteDirect(b ? Types.BOOLEAN_TRUE : Types.BOOLEAN_FALSE);
    }
  }
  public void writeByte(byte b) throws TException {
    writeByteDirect(b);
  }
  //ZigZag编码：16位 
  public void writeI16(short i16) throws TException {
    writeVarint32(intToZigZag(i16));
  }
 //ZigZag编码：32位 
  public void writeI32(int i32) throws TException {
    writeVarint32(intToZigZag(i32));
  }
  //ZigZag编码：64位 
  public void writeI64(long i64) throws TException {
    writeVarint64(longToZigzag(i64));
  }

  public void writeDouble(double dub) throws TException {
    byte[] data = new byte[]{0, 0, 0, 0, 0, 0, 0, 0};
    fixedLongToBytes(Double.doubleToLongBits(dub), data, 0);
    trans_.write(data);
  }

  public void writeString(String str) throws TException {
    try {
      byte[] bytes = str.getBytes("UTF-8");
      writeBinary(bytes, 0, bytes.length);
    } catch (UnsupportedEncodingException e) {
      throw new TException("UTF-8 not supported!");
    }
  }

  /**
   * Write a byte array, using a varint for the size. 
   */
  public void writeBinary(ByteBuffer bin) throws TException {
    int length = bin.limit() - bin.position();
    writeBinary(bin.array(), bin.position() + bin.arrayOffset(), length);
  }

  private void writeBinary(byte[] buf, int offset, int length) throws TException {
    writeVarint32(length);
    trans_.write(buf, offset, length);
  }

   // 其他层复写的方法
  public void writeMessageEnd() throws TException {}
  public void writeMapEnd() throws TException {}
  public void writeListEnd() throws TException {}
  public void writeSetEnd() throws TException {}
  public void writeFieldEnd() throws TException {}

  /**
   * Abstract method for writing the start of lists and sets. List and sets on 
   * the wire differ only by the type indicator.
   */
  protected void writeCollectionBegin(byte elemType, int size) throws TException {
    if (size <= 14) {
      writeByteDirect(size << 4 | getCompactType(elemType));
    } else {
      writeByteDirect(0xf0 | getCompactType(elemType));
      writeVarint32(size);
    }
  }

  /**
   * Write an i32 as a varint. Results in 1-5 bytes on the wire.
   * TODO: make a permanent buffer like writeVarint64?
   */
  byte[] i32buf = new byte[5];
  private void writeVarint32(int n) throws TException {
    int idx = 0;
    while (true) {
      if ((n & ~0x7F) == 0) {
        i32buf[idx++] = (byte)n;
        // writeByteDirect((byte)n);
        break;
        // return;
      } else {
        i32buf[idx++] = (byte)((n & 0x7F) | 0x80);
        // writeByteDirect((byte)((n & 0x7F) | 0x80));
        n >>>= 7;
      }
    }
    trans_.write(i32buf, 0, idx);
  }

  /**
   * Write an i64 as a varint. Results in 1-10 bytes on the wire.
   */
  byte[] varint64out = new byte[10];
  private void writeVarint64(long n) throws TException {
    int idx = 0;
    while (true) {
      if ((n & ~0x7FL) == 0) {
        varint64out[idx++] = (byte)n;
        break;
      } else {
        varint64out[idx++] = ((byte)((n & 0x7F) | 0x80));
        n >>>= 7;
      }
    }
    trans_.write(varint64out, 0, idx);
  }

  /**
   * Convert l into a zigzag long. This allows negative numbers to be 
   * represented compactly as a varint.
   */
  private long longToZigzag(long l) {
    return (l << 1) ^ (l >> 63);
  }

  /**
   * Convert n into a zigzag int. This allows negative numbers to be 
   * represented compactly as a varint.
   */
  private int intToZigZag(int n) {
    return (n << 1) ^ (n >> 31);
  }

  /**
   * Convert a long into little-endian bytes in buf starting at off and going 
   * until off+7.
   */
  private void fixedLongToBytes(long n, byte[] buf, int off) {
    buf[off+0] = (byte)( n        & 0xff);
    buf[off+1] = (byte)((n >> 8 ) & 0xff);
    buf[off+2] = (byte)((n >> 16) & 0xff);
    buf[off+3] = (byte)((n >> 24) & 0xff);
    buf[off+4] = (byte)((n >> 32) & 0xff);
    buf[off+5] = (byte)((n >> 40) & 0xff);
    buf[off+6] = (byte)((n >> 48) & 0xff);
    buf[off+7] = (byte)((n >> 56) & 0xff);
  }

  /** 
   * Writes a byte without any possibility of all that field header nonsense. 
   * Used internally by other writing methods that know they need to write a byte.
   */
  private byte[] byteDirectBuffer = new byte[1];
  private void writeByteDirect(byte b) throws TException {
    byteDirectBuffer[0] = b;
    trans_.write(byteDirectBuffer);
  }

  /** 
   * Writes a byte without any possibility of all that field header nonsense.
   */
  private void writeByteDirect(int n) throws TException {
    writeByteDirect((byte)n);
  }


  // 
  // Reading methods.
  // 

  /**
   * Read a message header. 
   */
  public TMessage readMessageBegin() throws TException {
    byte protocolId = readByte();
    if (protocolId != PROTOCOL_ID) {
      throw new TProtocolException("Expected protocol id " + Integer.toHexString(PROTOCOL_ID) + " but got " + Integer.toHexString(protocolId));
    }
    byte versionAndType = readByte();
    byte version = (byte)(versionAndType & VERSION_MASK);
    if (version != VERSION) {
      throw new TProtocolException("Expected version " + VERSION + " but got " + version);
    }
    byte type = (byte)((versionAndType >> TYPE_SHIFT_AMOUNT) & 0x03);
    int seqid = readVarint32();
    String messageName = readString();
    return new TMessage(messageName, type, seqid);
  }

  /**
   * Read a struct begin. There's nothing on the wire for this, but it is our
   * opportunity to push a new struct begin marker onto the field stack.
   */
  public TStruct readStructBegin() throws TException {
    lastField_.push(lastFieldId_);
    lastFieldId_ = 0;
    return ANONYMOUS_STRUCT;
  }

  /**
   * Doesn't actually consume any wire data, just removes the last field for 
   * this struct from the field stack.
   */
  public void readStructEnd() throws TException {
    // consume the last field we read off the wire.
    lastFieldId_ = lastField_.pop();
  }
  
  /**
   * Read a field header off the wire. 
   */
  public TField readFieldBegin() throws TException {
    byte type = readByte();

    // if it's a stop, then we can return immediately, as the struct is over.
    if (type == TType.STOP) {
      return TSTOP;
    }

    short fieldId;

    // mask off the 4 MSB of the type header. it could contain a field id delta.
    short modifier = (short)((type & 0xf0) >> 4);
    if (modifier == 0) {
      // not a delta. look ahead for the zigzag varint field id.
      fieldId = readI16();
    } else {
      // has a delta. add the delta to the last read field id.
      fieldId = (short)(lastFieldId_ + modifier);
    }

    TField field = new TField("", getTType((byte)(type & 0x0f)), fieldId);

    // if this happens to be a boolean field, the value is encoded in the type
    if (isBoolType(type)) {
      // save the boolean value in a special instance variable.
      boolValue_ = (byte)(type & 0x0f) == Types.BOOLEAN_TRUE ? Boolean.TRUE : Boolean.FALSE;
    } 

    // push the new field onto the field stack so we can keep the deltas going.
    lastFieldId_ = field.id;
    return field;
  }

  /** 
   * Read a map header off the wire. If the size is zero, skip reading the key
   * and value type. This means that 0-length maps will yield TMaps without the
   * "correct" types.
   */
  public TMap readMapBegin() throws TException {
    int size = readVarint32();
    byte keyAndValueType = size == 0 ? 0 : readByte();
    return new TMap(getTType((byte)(keyAndValueType >> 4)), getTType((byte)(keyAndValueType & 0xf)), size);
  }

  /**
   * Read a list header off the wire. If the list size is 0-14, the size will 
   * be packed into the element type header. If it's a longer list, the 4 MSB
   * of the element type header will be 0xF, and a varint will follow with the
   * true size.
   */
  public TList readListBegin() throws TException {
    byte size_and_type = readByte();
    int size = (size_and_type >> 4) & 0x0f;
    if (size == 15) {
      size = readVarint32();
    }
    byte type = getTType(size_and_type);
    return new TList(type, size);
  }

  /**
   * Read a set header off the wire. If the set size is 0-14, the size will 
   * be packed into the element type header. If it's a longer set, the 4 MSB
   * of the element type header will be 0xF, and a varint will follow with the
   * true size.
   */
  public TSet readSetBegin() throws TException {
    return new TSet(readListBegin());
  }

  /**
   * Read a boolean off the wire. If this is a boolean field, the value should
   * already have been read during readFieldBegin, so we'll just consume the
   * pre-stored value. Otherwise, read a byte.
   */
  public boolean readBool() throws TException {
    if (boolValue_ != null) {
      boolean result = boolValue_.booleanValue();
      boolValue_ = null;
      return result;
    }
    return readByte() == Types.BOOLEAN_TRUE;
  }

  byte[] byteRawBuf = new byte[1];
  /**
   * Read a single byte off the wire. Nothing interesting here.
   */
  public byte readByte() throws TException {
    byte b;
    if (trans_.getBytesRemainingInBuffer() > 0) {
      b = trans_.getBuffer()[trans_.getBufferPosition()];
      trans_.consumeBuffer(1);
    } else {
      trans_.readAll(byteRawBuf, 0, 1);
      b = byteRawBuf[0];
    }
    return b;
  }

  /**
   * Read an i16 from the wire as a zigzag varint.
   */
  public short readI16() throws TException {
    return (short)zigzagToInt(readVarint32());
  }

  /**
   * Read an i32 from the wire as a zigzag varint.
   */
  public int readI32() throws TException {
    return zigzagToInt(readVarint32());
  }

  /**
   * Read an i64 from the wire as a zigzag varint.
   */
  public long readI64() throws TException {
    return zigzagToLong(readVarint64());
  }

  /**
   * No magic here - just read a double off the wire.
   */
  public double readDouble() throws TException {
    byte[] longBits = new byte[8];
    trans_.readAll(longBits, 0, 8);
    return Double.longBitsToDouble(bytesToLong(longBits));
  }

  /**
   * Reads a byte[] (via readBinary), and then UTF-8 decodes it.
   */
  public String readString() throws TException {
    int length = readVarint32();

    if (length == 0) {
      return "";
    }

    try {
      if (trans_.getBytesRemainingInBuffer() >= length) {
        String str = new String(trans_.getBuffer(), trans_.getBufferPosition(), length, "UTF-8");
        trans_.consumeBuffer(length);
        return str;
      } else {
        return new String(readBinary(length), "UTF-8");
      }
    } catch (UnsupportedEncodingException e) {
      throw new TException("UTF-8 not supported!");
    }
  }

  /**
   * Read a byte[] from the wire. 
   */
  public ByteBuffer readBinary() throws TException {
    int length = readVarint32();
    if (length == 0) return ByteBuffer.wrap(new byte[0]);

    byte[] buf = new byte[length];
    trans_.readAll(buf, 0, length);
    return ByteBuffer.wrap(buf);
  }

  /**
   * Read a byte[] of a known length from the wire. 
   */
  private byte[] readBinary(int length) throws TException {
    if (length == 0) return new byte[0];

    byte[] buf = new byte[length];
    trans_.readAll(buf, 0, length);
    return buf;
  }

  //
  // These methods are here for the struct to call, but don't have any wire 
  // encoding.
  //
  public void readMessageEnd() throws TException {}
  public void readFieldEnd() throws TException {}
  public void readMapEnd() throws TException {}
  public void readListEnd() throws TException {}
  public void readSetEnd() throws TException {}

  //
  // Internal reading methods
  //

  /**
   * Read an i32 from the wire as a varint. The MSB of each byte is set
   * if there is another byte to follow. This can read up to 5 bytes.
   */
  private int readVarint32() throws TException {
    int result = 0;
    int shift = 0;
    if (trans_.getBytesRemainingInBuffer() >= 5) {
      byte[] buf = trans_.getBuffer();
      int pos = trans_.getBufferPosition();
      int off = 0;
      while (true) {
        byte b = buf[pos+off];
        result |= (int) (b & 0x7f) << shift;
        if ((b & 0x80) != 0x80) break;
        shift += 7;
        off++;
      }
      trans_.consumeBuffer(off+1);
    } else {
      while (true) {
        byte b = readByte();
        result |= (int) (b & 0x7f) << shift;
        if ((b & 0x80) != 0x80) break;
        shift += 7;
      }
    }
    return result;
  }

  /**
   * Read an i64 from the wire as a proper varint. The MSB of each byte is set 
   * if there is another byte to follow. This can read up to 10 bytes.
   */
  private long readVarint64() throws TException {
    int shift = 0;
    long result = 0;
    if (trans_.getBytesRemainingInBuffer() >= 10) {
      byte[] buf = trans_.getBuffer();
      int pos = trans_.getBufferPosition();
      int off = 0;
      while (true) {
        byte b = buf[pos+off];
        result |= (long) (b & 0x7f) << shift;
        if ((b & 0x80) != 0x80) break;
        shift += 7;
        off++;
      }
      trans_.consumeBuffer(off+1);
    } else {
      while (true) {
        byte b = readByte();
        result |= (long) (b & 0x7f) << shift;
        if ((b & 0x80) != 0x80) break;
        shift +=7;
      }
    }
    return result;
  }

  //
  // encoding helpers
  //

  /**
   * Convert from zigzag int to int.
   */
  private int zigzagToInt(int n) {
    return (n >>> 1) ^ -(n & 1);
  }

  /** 
   * Convert from zigzag long to long.
   */
  private long zigzagToLong(long n) {
    return (n >>> 1) ^ -(n & 1);
  }

  /**
   * Note that it's important that the mask bytes are long literals, 
   * otherwise they'll default to ints, and when you shift an int left 56 bits,
   * you just get a messed up int.
   */
  private long bytesToLong(byte[] bytes) {
    return
      ((bytes[7] & 0xffL) << 56) |
      ((bytes[6] & 0xffL) << 48) |
      ((bytes[5] & 0xffL) << 40) |
      ((bytes[4] & 0xffL) << 32) |
      ((bytes[3] & 0xffL) << 24) |
      ((bytes[2] & 0xffL) << 16) |
      ((bytes[1] & 0xffL) <<  8) |
      ((bytes[0] & 0xffL));
  }

  //
  // type testing and converting
  //

  private boolean isBoolType(byte b) {
    int lowerNibble = b & 0x0f;
    return lowerNibble == Types.BOOLEAN_TRUE || lowerNibble == Types.BOOLEAN_FALSE;
  }

  /**
   * Given a TCompactProtocol.Types constant, convert it to its corresponding 
   * TType value.
   */
  private byte getTType(byte type) throws TProtocolException {
    switch ((byte)(type & 0x0f)) {
      case TType.STOP:
        return TType.STOP;
      case Types.BOOLEAN_FALSE:
      case Types.BOOLEAN_TRUE:
        return TType.BOOL;
      case Types.BYTE:
        return TType.BYTE;
      case Types.I16:
        return TType.I16;
      case Types.I32:
        return TType.I32;
      case Types.I64:
        return TType.I64;
      case Types.DOUBLE:
        return TType.DOUBLE;
      case Types.BINARY:
        return TType.STRING;
      case Types.LIST:
        return TType.LIST;
      case Types.SET:
        return TType.SET;
      case Types.MAP:
        return TType.MAP;
      case Types.STRUCT:
        return TType.STRUCT;
      default:
        throw new TProtocolException("don't know what type: " + (byte)(type & 0x0f));
    }
  }

  /**
   * Given a TType value, find the appropriate TCompactProtocol.Types constant.
   */
  private byte getCompactType(byte ttype) {
    return ttypeToCompactType[ttype];
  }
}

写入方式

这里我们需要讲几个具体的例子

写bool类型。

由于bool类型只有是否两种状态，且在我们的生成的代码当中是以TField的形式存在的，所有对于boolean的写入特殊处理。
写入bool类型的在struct 生成代码当中的的样例

TCompactProtocol写入Boolean分两种情况，

1：该boolean值为TStruct中的内部成员时TField时，得写入header数据（即内容和数据类型压缩在一起写）；
2 ：如果不为TField内部类型的话，直接按byte写入

1 2	oprot.writeFieldBegin(BOOLTYPE_FIELD_DESC); // BOOLTYPE_FIELD_DESC type = oprot.writeBool(struct.boolType);

所以在下面用booleanField_ 先保存了TField 然后在后续执行 writeBool 判断一下booleanField_ 是否为空

public void writeFieldBegin(TField field) throws TException {
    if (field.type == TType.BOOL) {
      booleanField_ = field;
    } else {
      writeFieldBeginInternal(field, (byte)-1);
    }
}

 public void writeBool(boolean b) throws TException {
    if (booleanField_ != null) {
      // we haven't written the field header yet
      writeFieldBeginInternal(booleanField_, b ? Types.BOOLEAN_TRUE : Types.BOOLEAN_FALSE);
      booleanField_ = null;
    } else {
      // we're not part of a field, so just write the value.
      writeByteDirect(b ? Types.BOOLEAN_TRUE : Types.BOOLEAN_FALSE);
    }
  }

`writeFieldBeginInternal` 压缩写法

writeFieldBeginInternal 每次写入一个Field 就说做一个记录，如果连续写的Field 不超过16个，那么他会将type 压缩写入。
具体的压缩过程。
第一，算出 Field的标号
第二，把编号的低4为空余出来
第三，将type类型和Field的第四位拼起来。

比如：

field.id > this.lastFieldId_ = 3 即为 0000 0011
<< 4 = 0011 0000
找到field type this.getCompactType(field.type)
| 上 type 比如说Type 为 1
最终结果 0011 0001 写入

代码如下

static {
    ttypeToCompactType[TType.STOP] = TType.STOP;
    ttypeToCompactType[TType.BOOL] = Types.BOOLEAN_TRUE;
    ttypeToCompactType[TType.BYTE] = Types.BYTE;
    ttypeToCompactType[TType.I16] = Types.I16;
    ttypeToCompactType[TType.I32] = Types.I32;
    ttypeToCompactType[TType.I64] = Types.I64;
    ttypeToCompactType[TType.DOUBLE] = Types.DOUBLE;
    ttypeToCompactType[TType.STRING] = Types.BINARY;
    ttypeToCompactType[TType.LIST] = Types.LIST;
    ttypeToCompactType[TType.SET] = Types.SET;
    ttypeToCompactType[TType.MAP] = Types.MAP;
    ttypeToCompactType[TType.STRUCT] = Types.STRUCT;
  }
private void writeFieldBeginInternal(TField field, byte typeOverride) throws TException {
                                            // ttypeToCompactType[ttype]
    byte typeToWrite = typeOverride == -1 ? this.getCompactType(field.type) : typeOverride;
    // 得到 差值 这个差值在 0 - 15 一个byte 就可。  
    if (field.id > this.lastFieldId_ && field.id - this.lastFieldId_ <= 15) {
        // 然后 << 4 表示 把地四位 空余出来，
        this.writeByteDirect(field.id - this.lastFieldId_ << 4 | typeToWrite);
    } else {
        this.writeByteDirect(typeToWrite);
        this.writeI16(field.id);
    }

    this.lastFieldId_ = field.id;
}

其他格式

bool类型：
一个字节。

如果bool型的字段是结构体或消息的成员字段并且有编号，一个字节的高4位表示字段编号，低4位表示bool的值（0001:true， 0010：false），即：一个字节的低4位的值（true：1，false：2）.

如果bool型的字段单独存在，一个字节表示值，即：一个字节的值（true：1，false：2）.

Byte类型：

一个字节的编号与类型组合（高4位编号偏移1，低4位类型），一个字节的值.

I16类型：

一个字节的编号与类型组合（高4位编号偏移1，低4位类型），一至三个字节的值.

I32类型：

一个字节的编号与类型组合（高4位编号偏移1，低4位类型），一至五个字节的值.

I64类型：

一个字节的编号与类型组合（高4位编号偏移1，低4位类型），一至十个字节的值.

double类型：

一个字节的编号与类型组合（高4位编号偏移1，低4位类型），八个字节的值.注：把double类型的数据转成八字节保存，并用小端方式发送。

String类型：

一个字节的编号与类型组合（高4位编号偏移1，低4位类型），一至五个字节的负载数据的长度，负载数据.

Struct类型：

一个字节的编号与类型组合（高4位编号偏移1，低4位类型），结构体负载数据，一个字节的结束标记.

MAP类型：

一个字节的编号与类型组合（高4位编号偏移1，低4位类型），一至五个字节的map元素的个数，一个字节的键值类型组合（高4位键类型，低4位值类型），Map负载数据.

Set类型：

表示方式一：一个字节的编号与类型组合（高4位编号偏移1，低4位类型），一个字节的元素个数和值类型组合（高4位键元素个数，低4位值类型），Set负载数据.适用于Set中元素个数小于等于14个的情况。

表示方式二：一个字节的编号与类型组合（高4位编号偏移1，低4位类型），一个字节的键值类型（高4位全为1，低4位值类型），一至五个字节的map元素的个数，Set负载数据.适用于Set中元素个数大于14个的情况。

List类型：

表示方式一：一个字节的编号与类型组合（高4位编号偏移1，低4位类型），一个字节的元素个数和值类型组合（高4位键元素个数，低4位值类型），List负载数据.
适用于Set中元素个数小于等于14个的情况。

表示方式二：一个字节的编号与类型组合（高4位编号偏移1，低4位类型），一个字节的键值类型（高4位全为1，低4位值类型），一至五个字节的map元素的个数，List负载数据.适用于Set中元素个数大于14个的情况。

消息（函数）类型：

一个字节的版本，一个字节的消息调用（请求：0x21，响应：0x41，异常：0x61，oneway：0x81），一至五个字节的消息名称长度，消息名称，消息参数负载数据，一个字节的结束标记。

以上说明是基于相邻字段的编号小于等于15的情况。

如果字段相邻编号大于15，需要把类型和编号分开表示：用一个字节表示类型，一至五个字节表示编号偏移值。

读取方式

经过代码分析，我们了解了写入的过程，下面来了解读取过程。
我们还是来看读取Field的例子，Field 会封装在TMessage 当中，所以，首先read会读取TMessage,读取TMessage的没有什么特殊的，就是将TMessage 按照writeMessage 的顺序读取出来，之前会有一个判断版本号的一个操作

public TField readFieldBegin() throws TException {
    byte type = readByte();
    if (type == TType.STOP) {
      return TSTOP;
    }

    short fieldId;

    // 主要是计算field的id
    // 写入是以高4为未field id的偏移 所以需要将field 的偏移算出来
    short modifier = (short)((type & 0xf0) >> 4);
    if (modifier == 0) {
      //如果否没有偏移 直接 读取16位的field id
      fieldId = readI16();
    } else {
      // 否则 偏移 + 组内编号得到 field整整的id
      fieldId = (short)(lastFieldId_ + modifier);
    }
    // 转换为 type
    TField field = new TField("", getTType((byte)(type & 0x0f)), fieldId);

    // if this happens to be a boolean field, the value is encoded in the type
    if (isBoolType(type)) {
      // save the boolean value in a special instance variable.
      boolValue_ = (byte)(type & 0x0f) == Types.BOOLEAN_TRUE ? Boolean.TRUE : Boolean.FALSE;
    } 

    //记录field
    lastFieldId_ = field.id;
    return field;
  }

read 方法举例

public short readI16() throws TException {
    return (short)zigzagToInt(readVarint32());
}

private int readVarint32() throws TException {
    int result = 0;
    int shift = 0;
    // i32buf 是5个字节 也就是说 varint 最大字节为5 
    if (trans_.getBytesRemainingInBuffer() >= 5) {
      byte[] buf = trans_.getBuffer();
      int pos = trans_.getBufferPosition();
      int off = 0;
      while (true) {
        // varin32的读法，建议读一下上面的 varint32的写法 这里不再详解
        byte b = buf[pos+off];
        result |= (int) (b & 0x7f) << shift;
        if ((b & 0x80) != 0x80) break;
        shift += 7;
        off++;
      }
      trans_.consumeBuffer(off+1);
    } else {
      while (true) {
        byte b = readByte();
        result |= (int) (b & 0x7f) << shift;
        if ((b & 0x80) != 0x80) break;
        shift += 7;
      }
    }
    return result;
  }

压缩测试

本节我们来讨论压缩比，上面说了这么多，那么TCompactProtocol 的压缩比如何呢？
首先我们需要 MemeoryUtil 这个工具

1、编写测试bean 并生成对象

struct User {
    1: required string name,
    2: required i16 age = 0;
    3: required bool gender,
    4: required i32 No,
    5: required i64 createTime,
    6: required double grade,
    7: required list<Friends> friends,
    8: optional map<i32, Friends> mapUser,
    9: optional set<Friends> setUser,
    10: required UserType userType = UserType.STUDENT,
    11: optional i32 number
}

2、编写测试方法。

public class CompressionTest {

    protected byte[] buf = new byte[1024];
    protected TMemoryInputTransport memory ;
    protected TByteArrayOutputStream tByteArrayOutputStream ;
    protected TFramedTransport tFramedTransport;
    protected TCompactProtocol tcprotocol;
    protected Class<? extends TFramedTransport> aClass ;
    Field writeBuffer_;

    @Before
    public void preCompactTest() throws Exception {
        byte[] buf = new byte[1024];
        memory = new TMemoryInputTransport(buf);
        /*
          利用反射 将写如的对象转换成 TByteArrayOutputStream
          这样通过 TCompactProtocol 写入的就是   TByteArrayOutputStream 这个对象当中，方面我们统计
        **/
        tByteArrayOutputStream = new TByteArrayOutputStream(1024);
        tFramedTransport = new TFramedTransport(memory);
        tcprotocol = new TCompactProtocol(tFramedTransport);
        aClass = tFramedTransport.getClass();

        writeBuffer_ = aClass.getDeclaredField("writeBuffer_");
        writeBuffer_.setAccessible(true);
        writeBuffer_.set(tFramedTransport, tByteArrayOutputStream);

    }
    // MemoryUtil 是classmexer 当中的方法
   @Test
    public void testTMessageCompactTest() throws Exception {
        final long[] total = {0};
        IntStream.range(1, 1000)
                .mapToObj(CompressionTest::mockUser)
                .forEach(user -> {
                    try {
                        user.write(tcprotocol);
                        long l = MemoryUtil.deepMemoryUsageOf(user, MemoryUtil.VisibilityFilter.ALL);
                        total[0] +=l;
                    } catch (TException e) {
                        e.printStackTrace();
                    }
                });

        int len = tByteArrayOutputStream.len();
        // 转换
        double userTotalDouble = (double) total[0];
        double rate = (double) len / userTotalDouble;
        String format = String.format("%.4f", rate);
        System.out.println("total:" + total[0]);
        System.out.println("len:" + len);
        System.out.println("rate:" + format);
    }
    // inet 类型的测试方法
    @Test
    public void testIntCompactTest() throws Exception {
        for (int i = 1; i <= 1000; i++) {
            tcprotocol.writeI32(i);
        }
        int len = tByteArrayOutputStream.len();
        double rate = (double) len / (4 * 1000);
        System.out.println("rate:" + rate);
        System.out.println("len:" + len);
    }
    // 生成一个 测试对象
    public static User mockUser (int no) {
        User user = new User();
        user.setAge(Short.MAX_VALUE);
        user.setCreateTime(System.currentTimeMillis());
        user.setGender(true);
        user.setName("name" + no);
        user.setNo(no);
        Friends friends = new Friends();
        friends.setNo(Short.MAX_VALUE);
        user.setFriends(Collections.singletonList(friends));
        return user;
    }

3、将 classmexer.jar 添加到lib 当中去
4、 jvm 参数加上代理参数： VM options:如下图所示

最后得出结果
User 1000个的压缩结果如下

1
2
3

total:815184
len:42786
rate:0.0525

Int 类型压缩结果如下

1 2	rate:0.48425 len:1937

参考

赏

支付宝打赏

微信打赏

赞赏一下

协议和编解码(2)-TCompactProtocol

概诉

编解码

varint

zigzag

解码

TCompactProtocol 代码详解

代码

写入方式

写bool类型。

`writeFieldBeginInternal` 压缩写法

其他格式

读取方式

read 方法举例

压缩测试

参考

FEATURED TAGS

FRIENDS

概诉

编解码

varint

zigzag

解码

TCompactProtocol 代码详解

代码

写入方式

写bool类型。

writeFieldBeginInternal 压缩写法

其他格式

读取方式

read 方法举例

压缩测试

参考

FEATURED TAGS

FRIENDS

`writeFieldBeginInternal` 压缩写法