实现

2025年02月22日

柏拉文

越努力，越幸运

一、认识

基于 JavaScript 模拟实现布隆过滤器的示例代码。该实现支持在构造时传入预期存储元素的数量和允许的误判率（false positive rate），并根据标准公式计算位数组大小和哈希函数数量。代码中使用了两种简单的哈希算法（djb2 和 sdbm），并通过双重哈希技术生成多个哈希值，从而实现对元素的添加和查询。

二、实现

class BloomFilter {
  /**
   * 构造函数
   * @param {number} expectedItems 预期存储的元素数量
   * @param {number} falsePositiveRate 允许的误判率（例如 0.01 表示 1% 误判率）
   */
  constructor(expectedItems, falsePositiveRate) {
    this.n = expectedItems;
    this.p = falsePositiveRate;
    // 计算位数组大小 m： m = - (n * ln(p)) / (ln2)^2
    this.m = Math.ceil(-(this.n * Math.log(this.p)) / (Math.log(2) ** 2));
    // 计算需要的哈希函数个数 k： k = (m/n) * ln2
    this.k = Math.ceil((this.m / this.n) * Math.log(2));
    // 使用 Uint8Array 构建位数组，每个元素代表 8 位
    this.bitArray = new Uint8Array(Math.ceil(this.m / 8));
  }

  // 内部方法：设置位数组中指定位置为 1
  _setBit(index) {
    const pos = Math.floor(index / 8);
    const bit = index % 8;
    this.bitArray[pos] |= (1 << bit);
  }

  // 内部方法：检查位数组中指定位置是否为 1
  _getBit(index) {
    const pos = Math.floor(index / 8);
    const bit = index % 8;
    return (this.bitArray[pos] & (1 << bit)) !== 0;
  }

  // 哈希函数 1：djb2 算法
  _hash1(str) {
    let hash = 5381;
    for (let i = 0; i < str.length; i++) {
      hash = ((hash << 5) + hash) + str.charCodeAt(i); // hash * 33 + c
    }
    return hash >>> 0; // 转为无符号 32 位整数
  }

  // 哈希函数 2：sdbm 算法
  _hash2(str) {
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
      hash = str.charCodeAt(i) + (hash << 6) + (hash << 16) - hash;
    }
    return hash >>> 0;
  }

  /**
   * 添加元素到布隆过滤器
   * @param {string} item 要添加的元素（使用字符串表示）
   */
  add(item) {
    const h1 = this._hash1(item);
    const h2 = this._hash2(item);
    for (let i = 0; i < this.k; i++) {
      // 使用双哈希技巧生成第 i 个哈希值
      const combinedHash = (h1 + i * h2) % this.m;
      this._setBit(combinedHash);
    }
  }

  /**
   * 检查元素是否可能存在于布隆过滤器中
   * @param {string} item 要检查的元素
   * @returns {boolean} 返回 true 表示元素可能存在；返回 false 表示元素一定不存在
   */
  test(item) {
    const h1 = this._hash1(item);
    const h2 = this._hash2(item);
    for (let i = 0; i < this.k; i++) {
      const combinedHash = (h1 + i * h2) % this.m;
      if (!this._getBit(combinedHash)) {
        return false;
      }
    }
    return true;
  }
}

// 使用示例
const bf = new BloomFilter(1000, 0.01); // 预期存储 1000 个元素，允许 1% 的误判率
bf.add("hello");
bf.add("world");

console.log("hello 是否存在:", bf.test("hello")); // 应输出 true
console.log("world 是否存在:", bf.test("world")); // 应输出 true
console.log("foo 是否存在:", bf.test("foo"));     // 大概率输出 false

1. 构造函数: 根据预期存储的元素数量 expectedItems 和允许的误判率 falsePositiveRate，利用公式计算, 并用 Uint8Array 构建一个位数组来存储数据。

位数组大小: $m = - \frac{(\ln 2)^2}{n \cdot \ln p}$
哈希函数数量: $k = \frac{n}{m} \cdot \ln 2$

2. 设置与获取位: 通过 _setBit 和 _getBit 方法实现对位数组中某一位的操作，这里采用了按位操作的方式存储布隆过滤器的状态。

3. 哈希函数: 实现了两个简单的哈希函数（_hash1 和 _hash2），并利用双哈希技术生成多个哈希值，这样避免了实现多个独立的哈希函数。

4. 添加与查询:

add(item): 方法会对传入的字符串计算 k 个哈希值，并将相应位设置为 1。
test(item): 方法则判断传入字符串的所有 k 个位置是否均为 1，若有任一位为 0，则该元素一定不存在；若全部为 1，则可能存在。

一、认识​

二、实现​

一、认识

二、实现