TensorFlow

February 19, 2024

柏拉文

The harder you work, the luckier you get

1. Overview

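Face recognition is implemented in three parts:

Capture video frame images in real time while recording
Run the machine learning task on the image data inside a Worker
Draw the prediction results on a canvas, including the face, eyes, mouth, and hand gestures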

1.1 Capturing video frame images in real time while recording

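The machine learning model dictates the input format, so each video frame must be resized to a fixed size, typically 320 * 320. The frame is scaled to fit, anchored at the top-left corner, and the remaining area is left blank as padding. To keep the model's predictions consistent with the mirrored preview on which the results are drawn, the frame is also flipped horizontally. The video element's image is then drawn onto a canvas with context.drawImage, and the canvas pixel data is read back with context.getImageData.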
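
The resize step above can be sketched as a small helper that computes the destination rectangle passed to context.drawImage. This is a minimal sketch, not the code used in section 2.2: the function name fitToSquare and its parameters are hypothetical, and unlike the listing it also handles portrait video by scaling against the longer side.

```javascript
// Compute the destination rectangle for drawing a video frame into a square
// model input, anchored at the top-left corner.
// fitToSquare and its parameter names are illustrative, not from the article.
function fitToSquare(videoWidth, videoHeight, targetSize) {
  // Scale so the longer side exactly fills the square; the shorter side
  // leaves a blank strip that acts as padding.
  const scale = targetSize / Math.max(videoWidth, videoHeight);
  return {
    x: 0,
    y: 0,
    width: Math.round(videoWidth * scale),
    height: Math.round(videoHeight * scale),
  };
}

const rect = fitToSquare(640, 480, 320);
console.log(rect); // { x: 0, y: 0, width: 320, height: 240 }
```

For a 640 * 480 camera frame this yields a 320 * 240 image in the top part of the 320 * 320 input, with the bottom 80 rows left empty.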

1.2 Running the machine learning task on image data inside a `Worker`

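Inside the Web Worker, TensorFlow.js is imported with importScripts. Once it has loaded, tf.loadGraphModel loads the face recognition model described by model.json. The Worker then handles the message event: each message carries the pixel data of a video frame captured during recording, which the Worker wraps in a Uint8ClampedArray. Face prediction and localization then run on that image data as follows: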

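Preprocess the pixel data: the captured frame is a one-dimensional Uint8ClampedArray of RGBA pixel data. The model only needs the three RGB color channels, so the alpha (A) channel of every pixel is filtered out with if (index % 4 !== 3) { data.push(val) }. The alpha value is the fourth value of each pixel, so its index always satisfies index % 4 === 3.
Create a Tensor from the preprocessed data and manage GPU memory: tf.tensor creates a Tensor from the data produced in the previous step, and reshape turns it into a 4-dimensional tensor that matches the model's input format. The work is wrapped in tf.tidy, which automatically cleans up intermediate tensors once the callback finishes. tf.tidy frees the memory of every tensor created inside the callback except the tensor that the callback returns.
Run the prediction asynchronously and wait for the result: model.executeAsync(tensor) runs the prediction asynchronously.
Dispose of the input tensor to save memory: tf.dispose releases the input tensor inputData.
Transform the prediction result and manage GPU memory: loop over the prediction result and separate faces, hands, smiles, and so on into their own arrays. The transformed result is an object that contains all face and hand information. This step is also wrapped in tf.tidy, so any intermediate tensors created in the callback are released when it finishes, while a returned tensor would be kept.
Report the prediction result to the main thread: postMessage sends the face recognition model's prediction result to the main thread.
Dispose of the prediction tensor to save memory: tf.dispose releases the tensor that holds the prediction.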
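
The alpha-filtering step can be illustrated in isolation. The sketch below is one possible variant, assuming the same RGBA layout: instead of pushing values into a plain array, it writes into a preallocated Float32Array, which avoids growing an array one element at a time. The function name rgbaToRgb is hypothetical and does not appear in the worker listing.

```javascript
// Drop the alpha channel from RGBA pixel data, keeping R, G and B.
// rgbaToRgb is an illustrative name, not part of the article's worker code.
function rgbaToRgb(rgba) {
  const pixelCount = rgba.length / 4;
  const rgb = new Float32Array(pixelCount * 3);
  for (let i = 0, j = 0; i < rgba.length; i += 4, j += 3) {
    rgb[j] = rgba[i];         // R
    rgb[j + 1] = rgba[i + 1]; // G
    rgb[j + 2] = rgba[i + 2]; // B
    // rgba[i + 3] is the alpha value and is skipped
  }
  return rgb;
}

// Two pixels: opaque red, then a semi-transparent green-ish pixel.
const twoPixels = new Uint8ClampedArray([255, 0, 0, 255, 0, 128, 64, 200]);
console.log(Array.from(rgbaToRgb(twoPixels))); // [ 255, 0, 0, 0, 128, 64 ]
```

The resulting array has three values per pixel, which matches the last dimension of the 4-dimensional input tensor described above.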

1.3 Drawing the face, eyes, mouth, and hand gestures from the prediction result

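The prediction result contains position information for hands and faces, and each entry in faces also carries eyes and mouth positions. Positions are given as x1,y1,x2,y2, and a rectangle is drawn for each of them. Because prediction runs on the resized image, the coordinates must first be mapped back to the size of the canvas that is drawn on.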
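
The coordinate mapping can be sketched as a uniform scale, which is valid here because the frame was scaled by a single factor and anchored at the top-left corner. The helper name scaleBox and the sample box are assumptions made for illustration.

```javascript
// Map a box [x1, y1, x2, y2] from prediction space (for example 320 px wide)
// to canvas space (for example 640 px wide).
// scaleBox is an illustrative name, not a function from draw.js.
function scaleBox(box, predictSize, canvasWidth) {
  const ratio = canvasWidth / predictSize;
  return box.map((value) => value * ratio);
}

console.log(scaleBox([40, 30, 120, 110], 320, 640)); // [ 80, 60, 240, 220 ]
```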

1.4 Calculating gaze, smile, and gesture scores from the prediction result

Each prediction result records an array of hands and an array of faces. The scores are calculated from these arrays as follows:

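Count the gestures, faces, smiles, and confident gazes:
1. Gesture count: the total number of hands detected across all video frames
2. Face count: the total number of faces detected across all video frames
3. Smile count: the face recognition model detects the smile on each face and stores a smile value that represents its intensity in the face data. Iterate over the smile values of all faces; whenever a value exceeds the given threshold, it counts as a smile and the smile count increases by 1
4. Confident gaze count: the model also reports the position of each eye and of the pupil inside it. For each eye, compute how far the pupil sits from the eye's horizontal center relative to the eye's horizontal extent, and likewise for the vertical direction. In the code in section 2.4, the two ratios are added for each eye and the sum is averaged over the eyes of a face. If the result is below 0.2, the eyes are considered centered, the gaze is considered confident, and the confident gaze count increases by 1.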
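
The gaze check in item 4 can be sketched with named fields instead of raw array indices. This is only an illustration: the eye object layout (left, right, top, bottom, pupilX, pupilY) is hypothetical, since the real model returns a flat array whose layout is specific to that model.

```javascript
// Offset of the pupil from the midpoint of two eye boundary coordinates,
// relative to the distance between them. 0 means perfectly centered.
function axisOffsetRatio(edgeA, edgeB, pupil) {
  const span = edgeA - edgeB;
  if (span === 0) return Infinity; // degenerate eye box: treat as not centered
  return Math.abs(((edgeA + edgeB) / 2 - pupil) / span);
}

// eye: { left, right, top, bottom, pupilX, pupilY } is a hypothetical layout.
function isEyeCentered(eye, threshold = 0.2) {
  const horizontal = axisOffsetRatio(eye.left, eye.right, eye.pupilX);
  const vertical = axisOffsetRatio(eye.top, eye.bottom, eye.pupilY);
  return horizontal + vertical < threshold;
}

const centered = { left: 100, right: 140, top: 50, bottom: 60, pupilX: 121, pupilY: 55 };
const lookingAway = { left: 100, right: 140, top: 50, bottom: 60, pupilX: 135, pupilY: 55 };
console.log(isEyeCentered(centered));    // true
console.log(isEyeCentered(lookingAway)); // false
```
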
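Then calculate each score from the number of prediction results and the counts above:
1. Gesture score: compute the ratio of the gesture count to the total number of prediction results. If the ratio is greater than 0.52, gesture use is considered close to perfect and receives the full score of 5. Otherwise, the ratio is compared against an ascending list of thresholds that range from 0.005 to 0.52 and divide the range into finer levels. The score starts at 3.0 and increases by 0.1 for each threshold visited. The loop stops at the first threshold that is greater than or equal to the ratio, and the score at that point is the final gesture score.
2. Confidence score: compute the ratio of confident gazes to all detected gazes, then add Math.round(1.9 * ratio * 10) / 10 to a base of 3.1 to obtain the final confidence score. If no confident gaze was detected, the score is 0.
3. Smile score: if the smile count is less than 4, the expression is considered rather stiff and the score is derived directly from the smile count (3.0 plus 0.1 per smile, or 0 if there were no smiles). If the count is 4 or more, compute the ratio of the smile count to the total number of prediction results and compare it against an ascending list of thresholds that range from 1 / 300 to 1. The score starts at 3.4 and increases by 0.1 for each threshold visited. The loop stops at the first threshold that is greater than or equal to the ratio, and the score at that point is the final smile score.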
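
The threshold walk shared by the gesture and smile scores can be sketched as one generic helper. The name ladderScore, its parameters, and the shortened threshold list are assumptions for illustration; the listing in section 2.4 works with integer scores divided by 10 rather than a rounding step.

```javascript
// Walk an ascending list of ratio thresholds. The score starts at `base`
// and grows by `step` for every threshold visited, stopping at the first
// threshold that is greater than or equal to the observed ratio.
// ladderScore is an illustrative name, not a function from calculate.js.
function ladderScore(ratio, thresholds, base, step = 0.1) {
  let steps = 0;
  for (const threshold of thresholds) {
    steps += 1;
    if (ratio <= threshold) break;
  }
  // Round to one decimal place to hide floating-point noise.
  return Math.round((base + steps * step) * 10) / 10;
}

// A shortened threshold list, used only for this example.
const handThresholds = [0.005, 0.01, 0.015, 0.02, 0.04];
console.log(ladderScore(0.012, handThresholds, 3)); // 3.3
console.log(ladderScore(0.001, handThresholds, 3)); // 3.1
```

A ratio of 0.012 passes the first two thresholds and stops at 0.015, the third, so three steps are added to the base of 3.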

1.5 Uploading the recorded video in chunks

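The MediaRecorder API is configured to emit data at 1 s intervals. Every second, the emitted data is converted to WebM video format and uploaded, and the returned resourceId is recorded. When recording ends, the recorded resourceIds are sent to BOS for conversion into streaming media. If the user cancels the recording or re-uploads, the resources that were already uploaded must be deleted, otherwise they take up too much BOS storage. When the user's network is poor and an upload fails, the upload is retried and a dialog warns the user about the poor connection. If the retry also fails, the data is converted to a Blob and stored in localStorage while the page listens for changes in network status: once the network recovers, the upload is retried, and if the network stays down for more than 30 s, a dialog tells the user that the upload failed and asks whether to cancel the recording.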
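
The retry behavior described above could look roughly like the following. This is a sketch under stated assumptions, not the production upload code: uploadWithRetry, uploadChunk, and the option names are hypothetical, and the real upload endpoint, dialog, and localStorage fallback are left out.

```javascript
// Retry an async upload a fixed number of times with a delay between tries.
// `uploadChunk` is a caller-supplied function (hypothetical) that returns a
// promise resolving to a resource id.
async function uploadWithRetry(uploadChunk, chunk, { retries = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await uploadChunk(chunk);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // Wait before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: let the caller decide what to do next.
  throw lastError;
}

// Example: a fake uploader that fails twice, then succeeds.
let calls = 0;
const flakyUpload = async (chunk) => {
  calls += 1;
  if (calls < 3) throw new Error('network error');
  return `resource-${chunk.index}`;
};

uploadWithRetry(flakyUpload, { index: 7 }, { retries: 3, delayMs: 10 })
  .then((resourceId) => console.log(resourceId, calls)); // resource-7 3
```

When the final attempt fails, the rejected promise is the natural place to hook in the fallback described above, such as caching the chunk and waiting for the network to recover.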

2. Implementation

2.1 index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
  </head>
  <body>
    <div class="operation">
      <button id="start-record">Record</button>
      <button id="stop-record">Stop</button>
    </div>
    <div class="video-container" style="position: relative">
      <video
        id="video"
        autoplay
        style="width: 640px; height: 480px; transform: scaleX(-1)"
      ></video>
      <div class="canva-container" style="position: absolute; top: 0">
        <canvas
          id="capture-canvas"
          width="640"
          height="480"
          style="display: none"
        ></canvas>
        <canvas id="face-canvas" width="640" height="480"></canvas>
      </div>
    </div>

    <script type="module" src="./record.js"></script>
  </body>
</html>

2.2 record.js

import { drawFaceCanvas } from './draw.js';
import { calculate } from './calculate.js';

let recordStatus = 'init';
let worker = null;
let mediaRecorder = null;
const recordedBlobs = [];
const predictResult = [];
let recordDurationInMs = 0;
let startRecordTimestamp = 0;
const video = document.getElementById('video');
const startRecordEl = document.getElementById('start-record');
const stopRecordEl = document.getElementById('stop-record');

const captureCanvas = document.getElementById('capture-canvas');
const captureContext = captureCanvas.getContext('2d');
const faceCanvas = document.getElementById('face-canvas');
const faceContext = faceCanvas.getContext('2d');

function makeId(length) {
  const characters =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  const charactersLength = characters.length;
  let result = '';
  for (let i = 0; i < length; i++) {
    result += characters.charAt(Math.floor(Math.random() * charactersLength));
  }

  return result;
}

function getImageDataFromVideoElement({
  context,
  videoSize,
  videoElement,
  imageDataSize
}) {
  context.clearRect(0, 0, imageDataSize, imageDataSize);
  context.save();
  context.translate(imageDataSize, 0);
  context.scale(-1, 1);
  context.drawImage(
    videoElement,
    0,
    0,
    videoSize.width,
    videoSize.height,
    0,
    0,
    imageDataSize,
    (imageDataSize * videoSize.height) / videoSize.width
  );
  context.restore();
  const imageData = context.getImageData(0, 0, imageDataSize, imageDataSize);
  const imageDataBuffer = imageData.data.buffer;
  const imageWidth = imageData.width;
  const imageHeight = imageData.height;

  return {
    id: makeId(4),
    buffer: imageDataBuffer,
    width: imageWidth,
    height: imageHeight
  };
}

function captureFrame() {
  const imageData = getImageDataFromVideoElement({
    imageDataSize: 320,
    videoElement: video,
    context: captureContext,
    videoSize: { width: 640, height: 480 }
  });
  worker.postMessage(
    {
      id: imageData.id,
      buffer: imageData.buffer,
      width: imageData.width,
      height: imageData.height
    },
    [imageData.buffer]
  );
}

function startRecord() {
  recordStatus = 'recording';
  const timeSlice = 5000;
  mediaRecorder.start(timeSlice);
  startRecordTimestamp = Date.now();
  captureFrame();
}

async function stopRecord() {
  recordStatus = 'stopped';
  const stream = video.srcObject;
  const tracks = stream.getTracks();
  recordDurationInMs = Date.now() - startRecordTimestamp;

  tracks.forEach(track => {
    track.stop();
  });

  video.srcObject = null;
  mediaRecorder.stop();

  const calcParams = {
    predictResult,
    config: {
      smilePassLine: 0.6,
      smileThreshold: 0.4,
    },
    recordSeconds: recordDurationInMs / 1000,
  };

  const aiResult = calculate(calcParams);
  console.log('aiResult', aiResult);
}

async function prepareRecord() {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: true,
    audio: true
  });

  video.srcObject = stream;
  video.play();

  mediaRecorder = new MediaRecorder(stream, {
    mimeType: 'video/webm'
  });

  mediaRecorder.ondataavailable = event => {
    if (event.data && event.data.size > 0) {
      recordedBlobs.push(event.data);
    }
  };
}

async function checkPermissions(name) {
  try {
    return await navigator.permissions.query({ name: name });
  } catch (error) {
    return false;
  }
}

function prepareWorker() {
  worker = new Worker('./worker/worker.js');
}

async function run() {
  const camera = await checkPermissions('camera');
  const microphone = await checkPermissions('microphone');
  if (camera.state === 'granted' && microphone.state === 'granted') {
    // Wait until the stream and MediaRecorder are ready before wiring up the buttons.
    await prepareRecord();
    prepareWorker();
    startRecordEl.addEventListener('click', startRecord);
    stopRecordEl.addEventListener('click', stopRecord);
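    worker.addEventListener('message', event => {
      const { message, output } = event.data || {};

      if (recordStatus !== 'recording') {
        return;
      }

      if (message === 'model_not_loaded' || message === 'predict_failed') {
        // No usable output for this frame: try again shortly instead of
        // reading faces and hands from an undefined result.
        setTimeout(captureFrame, 200);
        return;
      }

      if (message !== 'predict_success') {
        // Status messages such as 'load_model_success' carry no output.
        return;
      }

      drawFaceCanvas(faceContext, output, faceCanvas.width, faceCanvas.height);

      // Keep only what the score calculation needs; positions used solely
      // for drawing are dropped to save memory.
      predictResult.push({
        ...output,
        faces: output.faces.map(faceItem => ({
          ...faceItem,
          mouthPos: undefined,
          facePos: undefined
        }))
      });

      // Request the next frame only after the previous one has been handled.
      captureFrame();
    });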
  } else {
    alert('Please allow access to the camera and microphone');
  }
}

run();

2.3 draw.js

const EYE_GROUP_LEN = 10;

const CANVAS_PALETTE = {
  face: {
    color: '#FFD92B',
    pointColor: '#FF9800',
    opacity: ['100%', '20%']
  },
  eye: {
    color: '#46D06E',
    pointColor: '#1F9E40',
    opacity: ['100%', '20%']
  },
  mouth: {
    color: '#FFA43E',
    pointColor: '#E56014',
    opacity: ['100%', '20%']
  },
  hand: {
    color: '#5ec8fe',
    pointColor: '#0b66f9',
    opacity: ['100%', '20%']
  },
  sound: {
    color: '#F5A623',
    pointColor: '',
    opacity: ['100%', '8%', '0%']
  }
};

export const VIDEO_WIDTH = 640;
export const VIDEO_HEIGHT = 480;
export const PREDICT_SIZE = 320;
export const SCALE_RATIO = VIDEO_WIDTH / PREDICT_SIZE;

function getSightAverage(totalEyeCount, passedEyeCount) {
  // Avoid dividing by zero when no face was detected.
  return Number(totalEyeCount) === 0 ? 0 : passedEyeCount / totalEyeCount;
}

function getSightPassCount(workerResult) {
  let passedEyeCount = 0;
  const faces = workerResult.faces || [];
  faces.forEach(face => {
    const eyesCount = Math.round(face.eyesPos.length / EYE_GROUP_LEN);
    let ratio = 0;
    for (let i = 0; i < eyesCount; i++) {
      const startIndex = i * EYE_GROUP_LEN;
      const eyePos = face.eyesPos.slice(startIndex, startIndex + EYE_GROUP_LEN);

      const ey1 = eyePos[1];
      const ey2 = eyePos[3];
      const ey3 = eyePos[9];

      const ex1 = eyePos[4];
      const ex2 = eyePos[6];
      const ex3 = eyePos[8];

      const horizontalRatio = Math.abs(((ey1 + ey2) / 2 - ey3) / (ey1 - ey2));
      const verticalRatio = Math.abs(((ex1 + ex2) / 2 - ex3) / (ex1 - ex2));
      const sightRatio = horizontalRatio + verticalRatio;
      ratio += sightRatio;
    }
    ratio /= eyesCount;
    if (ratio < 0.035) {
      passedEyeCount += 1;
    }
  });
  return passedEyeCount;
}

function drawFaceBorder(
  videoCtx,
  xPos,
  yPos,
  x2Pos,
  y2Pos,
  lineWidth,
  type = 'face'
) {
  const { color: lineColor, pointColor } = CANVAS_PALETTE[type];

  xPos *= SCALE_RATIO;
  yPos *= SCALE_RATIO;

  x2Pos *= SCALE_RATIO;
  y2Pos *= SCALE_RATIO;

  drawRectBorder(
    videoCtx,
    xPos,
    yPos,
    x2Pos,
    y2Pos,
    lineWidth,
    lineColor,
    pointColor
  );
}

function drawRectBorder(
  videoCtx,
  xPos,
  yPos,
  x2Pos,
  y2Pos,
  lineWidth,
  lineColor,
  pointColor
) {
  videoCtx.lineWidth = lineWidth; // set the line width

  drawLine(videoCtx, xPos, yPos, xPos, y2Pos, lineColor, pointColor);
  drawLine(videoCtx, xPos, y2Pos, x2Pos, y2Pos, lineColor, pointColor);
  drawLine(videoCtx, x2Pos, y2Pos, x2Pos, yPos, lineColor, pointColor);
  drawLine(videoCtx, x2Pos, yPos, xPos, yPos, lineColor, pointColor);

  videoCtx.closePath();
}

function drawLine(
  videoCtx,
  xStartPos,
  yStartPos,
  xEndPos,
  yEndPos,
  lineColor,
  pointColor
) {
  drawStartPoint(videoCtx, xStartPos, yStartPos, pointColor, 2);
  drawGradientLine(videoCtx, xStartPos, yStartPos, xEndPos, yEndPos, lineColor);
}

function drawGradientLine(
  videoCtx,
  xStartPos,
  yStartPos,
  xEndPos,
  yEndPos,
  lineColor
) {
  const gradient = videoCtx.createLinearGradient(
    xStartPos,
    yStartPos,
    xEndPos,
    yEndPos
  );

  gradient.addColorStop(0, lineColor);
  gradient.addColorStop(0.8, `${lineColor}88`);
  gradient.addColorStop(1, `${lineColor}00`);

  videoCtx.beginPath();
  videoCtx.moveTo(xStartPos, yStartPos);
  videoCtx.lineTo(xEndPos, yEndPos);
  videoCtx.strokeStyle = gradient;
  videoCtx.stroke();
}

function drawStartPoint(videoCtx, x, y, color, radius) {
  videoCtx.fillStyle = color;

  videoCtx.beginPath();
  videoCtx.arc(x, y, radius, 0, 2 * Math.PI);
  videoCtx.fill();
  videoCtx.closePath();
}

export function drawFaceCanvas(context, output, width, height) {
  const { hands, faces } = output;
  context.clearRect(0, 0, width, height);

  const sightAverage = getSightAverage(faces.length, getSightPassCount(output));
  const showSight = sightAverage > 0.2;
  faces.forEach(face => {
    const { faceRate, facePos } = face;

    // draw face border
    if (faceRate > 0.3) {
      const [x1, y1, x2, y2] = facePos;
      drawFaceBorder(context, x1, y1, x2, y2, 3, 'face');
    }

    if (showSight) {
      const offset = 0;
      // draw eyes border
      const { eyesPos } = face;

      const ex1 = eyesPos[4] - offset;
      const ey1 = eyesPos[1] - offset;
      const ex4 = eyesPos[16] + offset;
      const ey4 = eyesPos[13] + offset;
      drawFaceBorder(context, ex1, ey1, ex4, ey4, 3, 'eye');
    }

    const { mouthPos } = face;
    const mx1 = mouthPos[0];
    const my1 = mouthPos[7];
    const mx2 = mouthPos[12];
    const my2 = mouthPos[19];
    if (face.smileRate >= 0.4) {
      drawFaceBorder(context, mx1, my1, mx2, my2, 3, 'mouth');
    }
  });

  hands.forEach(hand => {
    const [x1, y1, x2, y2] = hand.handPos;
    drawFaceBorder(context, x1, y1, x2, y2, 3, 'hand');
  });
}

2.4 calculate.js

const DEFAULT_SCORE = 3;
const EYE_GROUP_LEN = 10;

function getHandAverage(handCount, recordSeconds) {
  const handAverage = (handCount / recordSeconds) * 60;
  return handAverage;
}

function getSightAverage(totalEyeCount, passedEyeCount) {
  // Return a numeric 0 when no face was detected, which also avoids
  // dividing by zero.
  if (totalEyeCount === 0) {
    return 0;
  }

  return passedEyeCount / totalEyeCount;
}

function getSmileAverage(smileCount, aiSize, smilePassLine) {
  // Fewer than two smiles: fall back to the pass line.
  if (smileCount < 2) {
    return smilePassLine;
  }

  // Otherwise scale up from the pass line, capped at 1.
  return Math.min(
    1,
    smilePassLine + (1 - smilePassLine) * ((4 * smileCount) / aiSize)
  );
}

function getSmileScore(smileCount, aiSize) {
  let newScore = DEFAULT_SCORE * 10;
  if (!aiSize || !smileCount) {
    return 0;
  } else if (smileCount < 4) {
    newScore += smileCount;
    return newScore / 10;
  }
  const smileRate = smileCount / aiSize;
  const scoreLevelList = [
    1 / 300,
    2 / 300,
    3 / 300,
    4 / 300,
    1 / 60,
    4 / 180,
    5 / 180,
    6 / 180,
    1 / 10,
    2 / 10,
    3 / 10,
    4 / 10,
    5 / 10,
    0.6,
    0.8,
    1
  ];
  newScore = newScore + 4;
  for (let i = 0; i < scoreLevelList.length; i++) {
    const rate = scoreLevelList[i];
    newScore++;
    if (smileRate <= rate) {
      break;
    }
  }
  return newScore / 10;
}

function getSightScore(passedEyeCount, totalEyeCount) {
  if (!passedEyeCount) {
    return 0;
  } else {
    const ratio = passedEyeCount / totalEyeCount;
    const num = Math.round(1.9 * ratio * 10) / 10;
    return 3.1 + num;
  }
}

function getHandScore(handCount, aiSize) {
  if (!handCount) {
    return 0;
  }
  const handRate = handCount / aiSize;
  if (handRate > 0.52) {
    return 5;
  }
  const handLevelList = [
    0.005, 0.01, 0.015, 0.02, 0.04, 0.06, 0.08, 0.1, 0.12, 0.16, 0.2, 0.24,
    0.28, 0.32, 0.36, 0.4, 0.44, 0.48, 0.52
  ];
  let newScore = DEFAULT_SCORE * 10;

  for (let i = 0; i < handLevelList.length; i++) {
    const rate = handLevelList[i];
    newScore++;
    if (handRate <= rate) {
      break;
    }
  }

  return newScore / 10;
}

function parseAIFacePredictData({ predictResult, smileThreshold }) {
  let smileCount = 0;
  let handCount = 0;
  let totalEyeCount = 0;
  let passedEyeCount = 0;

  predictResult.forEach(item => {
    const { hands, faces } = item;
    if (hands.length > 0) {
      handCount += item.hands.length;
    }
    if (faces.length > 0) {
      // Add up the number of faces
      totalEyeCount += faces.length;

      faces.forEach(face => {
        // A smile rate above smileThreshold counts as a smile; increment the smile count.
        if (face.smileRate > smileThreshold) {
          smileCount += 1;
        }

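        // For each eye, measure how far the pupil is from the center of the eye
        // along both axes, relative to the eye's extent. The two ratios are
        // summed per eye and averaged over the eyes of the face. Below 0.2 the
        // eyes count as centered, the gaze counts as confident, and the
        // confident gaze count increases by 1.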
        const eyesCount = Math.round(face.eyesPos.length / EYE_GROUP_LEN);
        let ratio = 0;
        for (let i = 0; i < eyesCount; i++) {
          const startIndex = i * EYE_GROUP_LEN;
          const eyePos = face.eyesPos.slice(
            startIndex,
            startIndex + EYE_GROUP_LEN
          );

          const ey1 = eyePos[1];
          const ey2 = eyePos[3];
          const ey3 = eyePos[9];

          const ex1 = eyePos[4];
          const ex2 = eyePos[6];
          const ex3 = eyePos[8];

          const horizontalRatio = Math.abs(
            ((ey1 + ey2) / 2 - ey3) / (ey1 - ey2)
          );
          const verticalRatio = Math.abs(((ex1 + ex2) / 2 - ex3) / (ex1 - ex2));
          const sightRatio = horizontalRatio + verticalRatio;
          ratio += sightRatio;
        }
        ratio /= eyesCount;

        if (ratio < 0.2) {
          passedEyeCount += 1;
        }
      });
    }
  });

  return {
    smileCount,
    handCount,
    totalEyeCount,
    passedEyeCount
  };
}

export function calculate({
  recordSeconds,
  predictResult,
  config: { smileThreshold, smilePassLine }
}) {
  const aiSize = predictResult.length;

  const { smileCount, handCount, totalEyeCount, passedEyeCount } =
    parseAIFacePredictData({
      predictResult,
      smileThreshold
    });

  const handAverage = getHandAverage(handCount, recordSeconds);
  const sightAverage = getSightAverage(totalEyeCount, passedEyeCount);
  const smileAverage = getSmileAverage(smileCount, aiSize, smilePassLine);

  const smileScore = getSmileScore(smileCount, aiSize);
  const sightScore = getSightScore(passedEyeCount, totalEyeCount);
  const handScore = getHandScore(handCount, aiSize);

  return {
    handAverage,
    sightAverage,
    smileAverage,

    scoreMap: {
      smileScore,
      sightScore,
      handScore
    }
  };
}

2.5 worker/worker.js

importScripts('https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.min.js')

let model;

(async () => {
  const baseUrl = self.location.href.replace(/\/[^\/]*$/, '')
  const modelUrl = `${baseUrl}/model.json`

  try {
    model = await tf.loadGraphModel(modelUrl)
    postMessage({ message: 'load_model_success' })
  } catch (e) {
    postMessage({ message: 'load_model_failed' })
  }
})()

const width = 640
const scale = 0.5

function transform(prediction) {
  let hands = []
  let faces = []

  prediction.forEach((object) => {
    let rates = object.splice(-4)
    let smileRate = rates[0]
    let bgRate = rates[1]
    let faceRate = rates[2]
    let handRate = rates[3]

    let pos = object.map((val) => {
      return val * width * scale
    })

    let type = [bgRate, faceRate, handRate].indexOf(Math.max(bgRate, faceRate, handRate))
    if (type === 1) {
      let facePos = pos.slice(0, 4)
      let eyesPos = pos.slice(4, 24)
      let mouthPos = pos.slice(24, 52)
      faces.push({
        faceRate,
        smileRate,
        facePos,
        mouthPos,
        eyesPos,
      })
    }
    if (type === 2) {
      hands.push({
        handRate,
        handPos: pos.slice(0, 4),
      })
    }
  })
  return {
    faces,
    hands,
  }
}

async function predict(imageData, imageId) {
  let data = []

  imageData.data.forEach((val, index) => {
    if (index % 4 !== 3) {
      data.push(val)
    }
  })

  // Image tensors are laid out as [batch, height, width, channels].
  let inputData = tf.tidy(() => {
    return tf.tensor(data).reshape([1, imageData.height, imageData.width, 3])
  })

  try {
    const t0 = performance.now()
    let prediction = await model.executeAsync(inputData)

    // Release the input tensor as soon as the prediction is available.
    tf.dispose(inputData)
    inputData = null

    let output = tf.tidy(() => {
      return transform(prediction.arraySync())
    })

    const t1 = performance.now()
    postMessage({ message: 'predict_success', output, frame: 1000 / (t1 - t0), id: imageId })

    // Release the prediction tensor once the result has been posted.
    tf.dispose(prediction)
    prediction = null
    output = null
  } catch (error) {
    inputData && inputData.dispose()
    tf.dispose(inputData)
    inputData = null
    postMessage({ message: 'predict_failed', error })
  }
}

onmessage = function (event) {
  const { data } = event;
  const image = new ImageData(new Uint8ClampedArray(data.buffer), data.width, data.height)

  if (!model) {
    postMessage({ message: 'model_not_loaded' })
    return
  }

  predict(image, data.id)
}
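
The classification step inside transform can be looked at on its own. The sketch below assumes the same row layout as the worker code, with positions followed by smileRate, bgRate, faceRate, and handRate. The function name classifyRow and the label strings are hypothetical, and slice is used instead of splice so that the input row is not modified.

```javascript
// Split one prediction row into coordinates and trailing scores, then pick
// the class with the highest score.
// Assumed row layout: [...positions, smileRate, bgRate, faceRate, handRate].
function classifyRow(row) {
  const positions = row.slice(0, -4); // non-mutating, unlike splice
  const [smileRate, bgRate, faceRate, handRate] = row.slice(-4);
  const scores = [bgRate, faceRate, handRate];
  const labels = ['background', 'face', 'hand'];
  const label = labels[scores.indexOf(Math.max(...scores))];
  return { label, positions, smileRate, faceRate, handRate };
}

// Four position values followed by smile, background, face and hand scores.
console.log(classifyRow([0.1, 0.2, 0.5, 0.6, 0.7, 0.05, 0.9, 0.05]).label); // face
```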
