Kryo

Spark can use the Kryo framework for serialization.

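Three metrics matter most during serialization:

  1. Size of the serialized object : a serialization tool turns an object into a byte array that holds the object's field values plus metadata, so that the bytes can be deserialized back into an object
  2. Serialization and deserialization speed : the time needed to turn an object into a byte array depends on how the tool generates/parses the byte array
  3. Overhead of the serialization tool itself : creating the serialization tool has a cost of its own
  • Chunked Encoding : writing the data in chunks
  • Forward compatibility : reading bytes serialized by newer classes
  • Backward compatibility : reading bytes serialized by older classes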
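The first metric can be made concrete with a small measurement. The sketch below uses the JDK's built-in serialization (not Kryo) only so that it stays self-contained; the class name SerializedSizeDemo, the sample type Point, and the helper serializedSize are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializedSizeDemo {
    // Hypothetical sample type: a single int field, i.e. 4 bytes of actual payload.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        Point(int x) { this.x = x; }
    }

    // Serializes obj with JDK serialization and returns the number of bytes produced.
    static int serializedSize(Serializable obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The output is larger than the 4-byte field because class metadata is written too.
        System.out.println("serialized size = " + serializedSize(new Point(7)) + " bytes");
    }
}
```

The gap between the 4-byte field and the reported size is the metadata mentioned in the first metric; compact formats aim to shrink it.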

Kryo local development

  1. Change the version of bcel from:

       <dependency>
         <groupId>org.apache.bcel</groupId>
         <artifactId>bcel</artifactId>
         <version>6.0-SNAPSHOT</version>
       </dependency>

     to:

       <dependency>
         <groupId>org.apache.bcel</groupId>
         <artifactId>bcel</artifactId>
         <version>6.0</version>
       </dependency>

  2. Run the command : mvn clean compile -P java8

>> is an arithmetic right shift; >>> is a logical right shift.

In an arithmetic shift, the sign bit is extended to preserve the signedness of the number.

For example: -2 represented in 8 bits would be 11111110 (because the most significant bit has negative weight). Shifting it right one bit using arithmetic shift would give you 11111111, or -1. Logical right shift, however, does not care that the value could possibly represent a number; it simply moves everything to the right and fills in from the left with 0s. Shifting our -2 right one bit using logical shift would give 01111111.

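In Java, only the low five bits of the shift count are used when the left operand is an int (six bits for a long). As a result, -1 >>> 32 is equivalent to -1 >>> 0, -1 >>> 33 is equivalent to -1 >>> 1 and, especially confusing, -1 >>> -1 is equivalent to -1 >>> 31.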

System.out.println(Integer.toBinaryString(7));
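The shift behaviour described above can be checked with a short program. This is a minimal sketch; the class name ShiftDemo and its helper methods are hypothetical. Because Java promotes byte to int before shifting, the 8-bit example masks the value with 0xFF first.

```java
public class ShiftDemo {
    // Arithmetic right shift: the sign bit is copied into the vacated high bits.
    static int arithmetic(int value, int count) {
        return value >> count;
    }

    // Logical right shift: the vacated high bits are filled with zeros.
    static int logical(int value, int count) {
        return value >>> count;
    }

    // Logical right shift on an 8-bit pattern: mask to 8 bits first,
    // otherwise the byte is sign-extended to a 32-bit int before the shift.
    static int logical8(byte value, int count) {
        return (value & 0xFF) >>> count;
    }

    public static void main(String[] args) {
        System.out.println(arithmetic(-2, 1));                                // -1
        System.out.println(Integer.toBinaryString(logical8((byte) -2, 1)));   // 1111111
        System.out.println(logical(-1, 32) == logical(-1, 0));                // true
        System.out.println(logical(-1, 33) == logical(-1, 1));                // true
        System.out.println(logical(-1, -1) == logical(-1, 31));               // true
    }
}
```

Note that Integer.toBinaryString drops leading zeros, so the 8-bit result 01111111 prints as 1111111.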

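  1. b & 0x7F is a bitwise AND: b is ANDed bit by bit with 0111 1111. AND behaves like a per-bit multiplication, so the result is b itself with the top bit (the sign bit of a byte) cleared.

    b & 0x80 isolates that top bit, the sign bit.

  2. When optimizePositive=false, writeVarInt() uses Zigzag Encoding, similar to Protocol Buffers.
  3. In Kryo, each class has an associated registration ID and a reference ID for its class name. The first time a class is written, the order is: registration ID, class-name reference ID, then the class name.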
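The two masks and the zigzag step from items 1 and 2 fit together in a variable-length integer encoding. The following is a minimal sketch of that idea, not Kryo's actual source: the class VarIntSketch and its helpers are hypothetical, and the writeVarInt signature here (returning a byte array) is an assumption made to keep the example self-contained. Each output byte carries seven payload bits (value & 0x7F), and the top bit (0x80) marks that more bytes follow.

```java
import java.io.ByteArrayOutputStream;

public class VarIntSketch {
    // Zigzag maps small magnitudes of either sign to small non-negative values:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigzagEncode(int n) {
        return (n << 1) ^ (n >> 31);
    }

    static int zigzagDecode(int n) {
        return (n >>> 1) ^ -(n & 1);
    }

    // Writes value as 1 to 5 bytes, least significant 7-bit group first.
    static byte[] writeVarInt(int value, boolean optimizePositive) {
        if (!optimizePositive) {
            value = zigzagEncode(value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // 7 payload bits plus continuation flag
            value >>>= 7;                     // logical shift, so the loop terminates
        }
        out.write(value);                     // last group, top bit clear
        return out.toByteArray();
    }

    // Reads back a value produced by writeVarInt with the same optimizePositive flag.
    static int readVarInt(byte[] bytes, boolean optimizePositive) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;    // keep the 7 payload bits
            if ((b & 0x80) == 0) {            // top bit clear: last byte
                break;
            }
            shift += 7;
        }
        return optimizePositive ? result : zigzagDecode(result);
    }

    public static void main(String[] args) {
        System.out.println(writeVarInt(-1, false).length); // 1
        System.out.println(writeVarInt(-1, true).length);  // 5
        System.out.println(readVarInt(writeVarInt(300, true), true)); // 300
    }
}
```

Without zigzag, a small negative number such as -1 has all high bits set and needs the full five bytes; with zigzag it fits in one byte, which is why the flag matters for values that may be negative.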

