2024/05/13

Guava Hashing

Guava Hash 裡面有幾個重要的類別:Hashing, HashFunction, Hasher, HashCode, Funnel, PrimitiveSink

HashFunction

HashFunction 是 stateless function,可將任意的 data 轉換為 fixed number of bits,相同的 input 會得到相同的 output 結果,不同的 input 得到不同的唯一的結果

可產生 Hasher 物件,也可以根據條件回傳 HashCode 結果

    @Test
    public void hashFunction() {
        HashFunction hf = Hashing.md5();
        HashCode hc = hf.newHasher()
                .putLong(10)
                .putString("abc", Charsets.UTF_8)
                .hash();
        System.out.println("hashFunction="+hc.toString());
    }

Hasher

HashFunction 可以使用 stateful Hasher,Hasher 提供 fluent syntax,可將資料填充到 Hash 裡面,並得到 hash value。Hasher 可接受 primitive input, byte array, slices of byte arrays, character sequences, 任意物件產生的 Funnel。

資料透過 putXXX() method 放到 Hasher 的資料源中,有 putByte, putBytes, putShort, putInt, putLong, putFloat, putdouble, putBoolean, putChar, putEnencodedChars, putString, putObject

Funnel

如果有一個類別

class Person {
  final int id;
  final String firstName;
  final String lastName;
  final int birthYear;
}

其 Funnel 會是

Funnel<Person> personFunnel = new Funnel<Person>() {
  @Override
  public void funnel(Person person, PrimitiveSink into) {
    into
        .putInt(person.id)
        .putString(person.firstName, Charsets.UTF_8)
        .putString(person.lastName, Charsets.UTF_8)
        .putInt(birthYear);
  }
};

組合起來的code

    @Test
    public void objectToHashCode() {

        // 需要hash的对象
        Person person = new Person(27, "John", "Wink", 1990);

        Funnel<Person> personFunnel = new Funnel<Person>() {
            @Override
            public void funnel(Person person, PrimitiveSink into) {
                into
                        .putInt(person.id)
                        .putString(person.firstName, Charsets.UTF_8)
                        .putString(person.lastName, Charsets.UTF_8)
                        .putInt(person.birthYear);
            }
        };

        // md5算法
        HashFunction hf = Hashing.md5();
        HashCode hc = hf.newHasher()
                .putString("abc", Charsets.UTF_8)
                .putObject(person, personFunnel)
                .hash();
        System.out.println("objectToHashCode="+hc.toString());
    }

    class Person {
        int id;
        String firstName;
        String lastName;
        int birthYear;

        public Person(int id, String firstName, String lastName, int birthYear) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
            this.birthYear = birthYear;
        }
    }

HashCode

當 Hasher 給了所有 input 資料,他的 hash() method 會產生 HashCode 結果

HashCode 支援 euqality testing 例如 asInt(), asLong, asBytes()

另外有 writeBytesTo(array, offset, maxLength) 可將 hash 的 maxLength bytes 寫入 array

BloomFilter

可偵測物件是否 "definitely" 不在 filter 裡面,或是有可能已經被加入 Bloom filter

對於 hash 來說,就是偵測是否 hash value 有碰撞,不同的 input 得到相同的結果

BloomFilter<Person> friends = BloomFilter.create(personFunnel, 500, 0.01);
for (Person friend : friendsList) {
  friends.put(friend);
}
// much later
if (friends.mightContain(someone)) {
  // the probability that someone reached this place if they aren't a friend is
  // 1% we might, for example, start asynchronously loading things for someone
  // while we do a more expensive exact check
}

Hashing

Hashing (Guava: Google Core Libraries for Java HEAD-jre-SNAPSHOT API)

裡面有 static method,可取得某些演算法的 HashFunction

  • adler32()

  • crc32(), crc32c()

  • farmHashFingerprint64()

  • fingerprint2011()

  • goodFastHash(int minimumBits)

  • hmacMd5

  • hmacSha1

  • hmacSha256

  • hmacSha512

  • md5()

  • murmur3_128(), murmur3_32()

  • sha1(), sha256(), sha384(), sha512()

  • sipHash24

    @Test
    public void hashing() {
        // 測試 Hasing 演算法
        String input = "hello, world";
        // MD5
        System.out.println("md5="+Hashing.md5().hashBytes(input.getBytes()).toString());
        // sha256
        System.out.println("sha256="+Hashing.sha256().hashBytes(input.getBytes()).toString());
        // sha512
        System.out.println("sha512="+Hashing.sha512().hashBytes(input.getBytes()).toString());
        // crc32
        System.out.println("crc32="+Hashing.crc32().hashBytes(input.getBytes()).toString());
        // MD5
        System.out.println("MD5="+Hashing.md5().hashUnencodedChars(input).toString());
    }

References

HashingExplained · google/guava Wiki · GitHub

沒有留言:

張貼留言