Dynamic Range Quantization

February 9, 2025

柏拉文

The harder you work, the luckier you get

I. Overview

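Dynamic range quantization (DRQ) converts a model's weights from floating-point numbers to low-precision integers (for example, 8-bit), which shrinks the model and can speed up inference. It is a post-training quantization (PTQ) method: only the weights are quantized to 8 bits, while the activations are still computed in floating point, so it strikes a good balance between model size and inference speed. TensorFlow.js supports quantized weights, which significantly reduces a model's storage requirements.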
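To make the idea concrete, here is a minimal sketch of 8-bit affine quantization on a single weight array, independent of TensorFlow.js. The helper names `quantizeUint8` and `dequantizeUint8` are made up for illustration, and the sketch assumes a per-tensor `min` and `scale` stored next to the bytes.

```javascript
// Affine 8-bit quantization of one weight tensor (hypothetical helper names).
function quantizeUint8(values) {
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  // A constant tensor has max === min; fall back to scale 1 to avoid dividing by zero.
  const scale = max > min ? (max - min) / 255 : 1;
  const data = new Uint8Array(values.length);
  for (let i = 0; i < values.length; i++) {
    data[i] = Math.round((values[i] - min) / scale); // maps [min, max] to [0, 255]
  }
  return { data, min, scale };
}

// Recover approximate float32 values: x ≈ q * scale + min.
function dequantizeUint8({ data, min, scale }) {
  const out = new Float32Array(data.length);
  for (let i = 0; i < data.length; i++) {
    out[i] = data[i] * scale + min;
  }
  return out;
}

const weights = new Float32Array([-1.0, -0.25, 0.0, 0.5, 1.0]);
const q = quantizeUint8(weights);
const restored = dequantizeUint8(q);

// Each element's round-trip error is at most about half a quantization step.
let maxError = 0;
for (let i = 0; i < weights.length; i++) {
  maxError = Math.max(maxError, Math.abs(weights[i] - restored[i]));
}
console.log("scale:", q.scale, "max error:", maxError);
```

The rounding error per weight is bounded by roughly `scale / 2`, which is why tensors with a narrow value range lose very little precision.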

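In short, dynamic range quantization (DRQ) reduces model size (the weights are stored as 8-bit integers, about a quarter of their float32 size, while the activations stay float32), improves inference speed (on CPUs, mobile devices, and similar hardware), and loses almost no inference accuracy (it is more stable than full 8-bit quantization). If you use WebGL, DRQ is a good fit: TensorFlow.js dequantizes the weights back to float32 when the model is loaded, so WebGL hardware acceleration is not affected.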
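The split between 8-bit weights and float32 activations can be illustrated with a tiny dense layer. This is only a sketch under assumptions: `toUint8` and `dense` are hypothetical helpers, the numbers are made up, and the weights are dequantized before the layer runs so that all arithmetic stays in float32.

```javascript
// Hypothetical helper: store float32 weights as uint8 plus (min, scale).
function toUint8(values) {
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const scale = max > min ? (max - min) / 255 : 1;
  const data = Uint8Array.from(values, (v) => Math.round((v - min) / scale));
  return { data, min, scale };
}

// Dense layer y = W x + b, computed entirely in float32.
// W is row-major with shape [outputs, inputs].
function dense(W, b, x) {
  const y = new Float32Array(b.length);
  for (let o = 0; o < b.length; o++) {
    let sum = b[o];
    for (let i = 0; i < x.length; i++) {
      sum += W[o * x.length + i] * x[i];
    }
    y[o] = sum;
  }
  return y;
}

const W = new Float32Array([0.2, -0.5, 0.1, 0.7, 0.3, -0.4]); // shape [2, 3]
const b = new Float32Array([0.05, -0.1]);
const x = new Float32Array([1.0, 2.0, 3.0]); // float32 activations

// What would be stored on disk: one byte per weight.
const stored = toUint8(W);
// What the runtime computes with: weights restored to float32.
const restoredW = Float32Array.from(stored.data, (qv) => qv * stored.scale + stored.min);

const reference = dense(W, b, x);
const quantized = dense(restoredW, b, x);

console.log("float32 bytes:", W.byteLength, "uint8 bytes:", stored.data.byteLength);
console.log("reference:", Array.from(reference));
console.log("quantized:", Array.from(quantized));
```

The stored weights take a quarter of the space, and the two outputs differ only by the accumulated rounding error of the weights.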

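Dynamic range quantization (DRQ) is implemented in the following steps:

Load the original model
Quantize the weights (convert them to 8-bit integers or float16)
Keep the activation computation in float32
Re-export the model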

II. Implementation

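const fs = require("fs");
const tf = require("@tensorflow/tfjs-node");

async function quantizeModelDynamicRange(inputModelPath, outputModelPath) {
  console.log("🔹 Loading original model...");
  const model = await tf.loadGraphModel(`file://${inputModelPath}`);

  console.log("🔹 Extracting weightMap...");
  // A Graph Model keeps its weights here. `executor` is an internal field,
  // so this access may break between TensorFlow.js versions.
  const weightMap = model.executor.weightMap;

  if (!weightMap || Object.keys(weightMap).length === 0) {
    console.error("❌ No weights found in model. Check model.json.");
    return;
  }

  console.log("🔹 Applying dynamic range quantization...");
  const quantizedWeights = []; // manifest entries, in the order the data is written
  const allWeightData = []; // byte buffers for all weights

  for (const [name, tensors] of Object.entries(weightMap)) {
    const tensor = tensors[0]; // take the first entry (a weight usually has only one)
    const originalData = tensor.dataSync();

    // Only float32 weights are quantized. Other numeric dtypes (int32, bool)
    // are copied byte for byte; string tensors are not handled by this script.
    if (tensor.dtype !== "float32") {
      quantizedWeights.push({ name, shape: tensor.shape, dtype: tensor.dtype });
      allWeightData.push(
        Buffer.from(originalData.buffer, originalData.byteOffset, originalData.byteLength)
      );
      continue;
    }

    // Find the value range with a loop; spreading a large array into
    // Math.min/Math.max can exceed the call stack limit.
    let minVal = Infinity;
    let maxVal = -Infinity;
    for (const v of originalData) {
      if (v < minVal) minVal = v;
      if (v > maxVal) maxVal = v;
    }
    if (originalData.length === 0) minVal = maxVal = 0;

    // Compute the scale factor. A constant tensor has maxVal === minVal,
    // so fall back to 1 to avoid dividing by zero.
    const scale = maxVal > minVal ? (maxVal - minVal) / 255 : 1;

    // Quantize the weights to [0, 255]. The values are unsigned, so they are
    // stored in a Uint8Array (an Int8Array would wrap everything above 127).
    const uint8Data = new Uint8Array(originalData.length);
    for (let i = 0; i < originalData.length; i++) {
      const q = Math.round((originalData[i] - minVal) / scale);
      uint8Data[i] = Math.min(255, Math.max(0, q));
    }

    quantizedWeights.push({
      name,
      shape: tensor.shape,
      dtype: "float32", // dtype of the weight after it is dequantized on load
      quantization: { dtype: "uint8", min: minVal, scale }, // scaling info
    });

    allWeightData.push(Buffer.from(uint8Data));
  }

  console.log("🔹 Saving quantized model...");
  fs.mkdirSync(outputModelPath, { recursive: true });

  // Save model.json. Reuse the original file (topology, format, signature)
  // and replace only `weightsManifest` so it matches the new weight file.
  const originalJson = JSON.parse(fs.readFileSync(inputModelPath, "utf8"));
  const quantizedModel = {
    ...originalJson,
    weightsManifest: [
      {
        paths: ["group1-shard1of1.bin"],
        weights: quantizedWeights,
      },
    ],
  };

  fs.writeFileSync(
    `${outputModelPath}/model.json`,
    JSON.stringify(quantizedModel, null, 2)
  );

  // Save the weights to `group1-shard1of1.bin`
  const weightBinPath = `${outputModelPath}/group1-shard1of1.bin`;
  fs.writeFileSync(weightBinPath, Buffer.concat(allWeightData));

  console.log(`✅ Quantized model saved at: ${outputModelPath}`);
}

// Run the quantization
quantizeModelDynamicRange("./model.json", "newModel").catch(console.error);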
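
A quick way to sanity-check an exported shard is to decode it by hand. The sketch below is only an illustration under assumptions: `decodeShard` is a hypothetical helper, every manifest entry is assumed to carry `quantization.min` and `quantization.scale` with one unsigned byte per element, and the tensors are assumed to be stored back to back in manifest order. The manifest and bytes here are toy data built in memory, not taken from a real model.

```javascript
// Hypothetical decoder: walks the manifest in order and dequantizes each tensor.
function decodeShard(manifestWeights, bytes) {
  const result = {};
  let offset = 0;
  for (const entry of manifestWeights) {
    // Number of elements is the product of the shape (1 for a scalar).
    const size = entry.shape.reduce((acc, dim) => acc * dim, 1);
    const { min, scale } = entry.quantization;
    const out = new Float32Array(size);
    for (let i = 0; i < size; i++) {
      out[i] = bytes[offset + i] * scale + min;
    }
    result[entry.name] = out;
    offset += size; // one byte per quantized element
  }
  if (offset !== bytes.length) {
    throw new Error(`Manifest covers ${offset} bytes, shard has ${bytes.length}`);
  }
  return result;
}

// Toy manifest and shard (made-up names and values).
const manifestWeights = [
  {
    name: "dense/kernel",
    shape: [2, 2],
    dtype: "float32",
    quantization: { dtype: "uint8", min: -1, scale: 2 / 255 },
  },
  {
    name: "dense/bias",
    shape: [2],
    dtype: "float32",
    quantization: { dtype: "uint8", min: 0, scale: 1 / 255 },
  },
];
const shard = Uint8Array.from([0, 255, 51, 204, 0, 255]);

const decoded = decodeShard(manifestWeights, shard);
console.log(decoded);
```

For a real export, the same function could be fed the parsed `weightsManifest[0].weights` array and the bytes of the `.bin` file, provided those assumptions hold for every entry. Comparing the decoded values with the original weights gives the actual quantization error.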
