

February 15, 2025
柏拉文
The harder you work, the luckier you get

I. Overview


An AI audio/video recording tool built on TensorFlow.js with an integrated face-detection model. It uses WebGL to accelerate AI inference and multi-threaded parallel processing to keep the recording frame rate above 40 FPS, and it supports face, gesture, and smile detection.

1. Audio/Video Recording: We obtain the user's audio and video streams with navigator.mediaDevices.getUserMedia, grab the video element (videoEl), and assign the media stream (mediaStream) to its srcObject so the video can play. We also listen for the video's play event so that frame capture starts as soon as playback begins, drawing each captured frame onto a Canvas. During capture we use requestVideoFrameCallback (where supported) or requestAnimationFrame to grab every frame of the video element in real time, convert each frame into an ImageBitmap with createImageBitmap, and send it to a Worker. Inside the Worker, an OffscreenCanvas processes the frame on its own thread (for example, mirroring it), then transferToImageBitmap() converts the OffscreenCanvas into an ImageBitmap with zero copying, and postMessage transfers the processed ImageBitmap back to the main thread, where it is drawn through the Canvas 2D context (processCtx2D). We then call this.processCanvas.captureStream(60) to obtain a 60 FPS canvas stream containing the frames drawn on the canvas, and add the audio tracks from the user's audio stream to that canvas stream, composing video and audio together. From the composed canvas stream we create a MediaRecorder instance with recording options such as mimeType; if the requested mimeType is not supported by the browser, the unsupported option is removed dynamically. Whenever MediaRecorder produces data (via the ondataavailable event), the chunk is pushed into the recordChunks array and an upload is triggered: the data is sent to the backend through uploadWorker, supporting real-time upload of recorded segments, with the recorded data packaged and uploaded every 5 seconds. For upload and network monitoring, we watch the network state; when it changes (disconnects, speed changes, and so on) the network information is forwarded to uploadWorker, which adjusts the upload strategy in real time and supports scheduled uploads, concurrent uploads after the network recovers, and retrying failed uploads. We also handle a range of media-permission and device error cases.

1.1 Detecting media permission and device issues (a condensed sketch follows this list):

  1. Request camera and microphone permission: call navigator.mediaDevices.getUserMedia with {video: true, audio: true} to request access to both the video and audio devices at once. The method returns a Promise that resolves to a MediaStream on success and rejects with an error on failure. Afterwards we call mediaStream.getTracks() and stop every track to release the media resources. Releasing hardware such as the camera and microphone promptly prevents memory leaks and device lock-up; calling this when recording ends or the component unmounts guarantees system resources are freed in time.

  2. Enumerate the available media input devices: call navigator.mediaDevices.enumerateDevices() to list every media device, then filter and classify the results: keep only input devices ( endsWith("input") ), drop devices with an empty deviceId ( deviceId !== "" ), and split the rest into audio inputs ( audioinput ) and video inputs ( videoinput ). This lets the user switch between multiple cameras or microphones and pick the best recording device (for example, an HD camera or a professional microphone).

  3. Use navigator.permissions.query({ name: "camera" }) and navigator.permissions.query({ name: "microphone" }) to check whether camera and microphone access has been granted, by testing whether the returned permission status's state property equals granted.

  4. Call navigator.mediaDevices.getSupportedConstraints() to learn which constraints window.navigator.mediaDevices.getUserMedia supports, for example audio: {echoCancellation, noiseSuppression, autoGainControl} and video: {width, height, frameRate, facingMode}.

  5. Use window.MediaRecorder.isTypeSupported() to check codec/container support. For example, Safari does not accept new MediaRecorder(stream, { mimeType: "video/webm" }); when the mimeType is unsupported, the option has to be removed.
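
The five checks above can be condensed into one small helper. This is an illustrative sketch only; the helper name and return shape are not part of the project, which implements the same checks inside the VideoRecorder class below.

// Illustrative helper covering the five checks above.
async function probeMediaEnvironment() {
  // 1. Request camera + microphone once, then release the tracks immediately.
  let granted = false;
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
    stream.getTracks().forEach((track) => track.stop());
    granted = true;
  } catch (error) {
    console.warn("getUserMedia failed:", error.name);
  }

  // 2. Enumerate input devices and split them by kind.
  const devices = await navigator.mediaDevices.enumerateDevices();
  const inputs = devices.filter((d) => d.kind.endsWith("input") && d.deviceId !== "");
  const audioInputs = inputs.filter((d) => d.kind === "audioinput");
  const videoInputs = inputs.filter((d) => d.kind === "videoinput");

  // 3. Query the camera permission state (not supported everywhere, hence the try/catch).
  let cameraState = "unknown";
  try {
    cameraState = (await navigator.permissions.query({ name: "camera" })).state;
  } catch (_) {}

  // 4. Which constraints can getUserMedia honor in this browser?
  const supported = navigator.mediaDevices.getSupportedConstraints();

  // 5. Which container/codec can MediaRecorder produce?
  const webmSupported = window.MediaRecorder?.isTypeSupported?.("video/webm") ?? false;

  return { granted, audioInputs, videoInputs, cameraState, supported, webmSupported };
}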

1.2 Obtaining the user's audio and video streams: we get the streams with navigator.mediaDevices.getUserMedia. getUserMedia is part of the WebRTC (Web Real-Time Communication) API; it lets a web page access the user's camera and microphone and is the foundation for audio/video streaming applications. Its job is to capture a MediaStream. A constraints parameter can be passed in to customize camera and microphone behavior: the audio constraints control microphone quality, noise suppression, echo cancellation, automatic gain, and so on; the video constraints control the camera, such as resolution, frame rate, facing direction, and device ID. The two bullets below list the main options, and a sample constraints object follows them.

  • audio: deviceId: selects the audio input device; deviceId values come from enumerateDevices(). sampleRate: a number specifying the sample rate, e.g. { ideal: 48000 } for 48 kHz high-quality audio (recommended), { ideal: 44100 } for standard CD quality, { ideal: 8000 } for low-quality telephone audio. sampleSize: the sample size, e.g. { ideal: 8 } for 8-bit, { ideal: 16 } for the 16-bit standard, { ideal: 24 } for 24-bit high quality, { ideal: 32 } for 32-bit professional grade. channelCount: the number of channels, 1 = mono (suited to voice calls), 2 = stereo (suited to music recording). autoGainControl: a boolean enabling automatic gain control, which keeps the volume from becoming too loud or too quiet. echoCancellation: a boolean enabling echo cancellation, useful in voice calls to reduce echo from headphones or speakers. noiseSuppression: a boolean enabling noise suppression to reduce background noise such as wind or keyboard clatter.

  • video: width: a number specifying the horizontal resolution, e.g. width: 1920, width: { ideal: 1280 }, or width: { min: 1024, ideal: 1280, max: 1920 }. height: a number specifying the vertical resolution, e.g. height: 1080, height: { ideal: 1080 }, or height: { min: 576, ideal: 1080, max: 1280 }. deviceId: selects the video input device. frameRate: the video frame rate (FPS), which controls smoothness; 30 FPS suits typical applications, 60 FPS suits high-performance scenarios. facingMode: a string choosing the front or rear camera, user (front) or environment (rear); to require the rear camera, use facingMode: { exact: "environment" }. resolution: an object giving an ideal or exact resolution, such as { width: 1280, height: 720 }. aspectRatio: the width-to-height ratio, e.g. aspectRatio: 1.7777, aspectRatio: { ideal: 1.7777777778 }, or aspectRatio: { min: 1, ideal: 1.7777777778, max: 2 }. noiseSuppression: a boolean enabling the camera's noise reduction to improve clarity (supported by some browsers).
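
For reference, a constraints object combining the options above might look like the following; every value here is an example, not a requirement of the project.

// Example constraints built from the options listed above (illustrative values only).
const constraints = {
  audio: {
    deviceId: { ideal: "default" },               // from enumerateDevices()
    sampleRate: { ideal: 48000 },                  // 48 kHz
    sampleSize: { ideal: 16 },                     // 16-bit
    channelCount: { ideal: 2 },                    // stereo
    autoGainControl: true,
    echoCancellation: true,
    noiseSuppression: true,
  },
  video: {
    width: { min: 1024, ideal: 1280, max: 1920 },
    height: { min: 576, ideal: 720, max: 1080 },
    frameRate: { ideal: 30, max: 60 },
    facingMode: "user",                            // or { exact: "environment" } for the rear camera
    aspectRatio: { ideal: 16 / 9 },
  },
};

const stream = await navigator.mediaDevices.getUserMedia(constraints);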

1.3 Rendering the user media stream into the video element and capturing frames onto a Canvas: get the video element (videoEl) and assign the media stream (mediaStream) to its srcObject so the video can play. Set muted = true so the microphone does not pick up the video's own playback; otherwise the audio could be recorded twice, causing echo and noise. Also listen for the video's play event so frame capture begins as soon as playback starts and each captured frame is drawn onto a Canvas. During capture we use requestVideoFrameCallback (where supported) or requestAnimationFrame to grab every frame in real time, convert it into an ImageBitmap with createImageBitmap, and send it to a Worker. In the Worker, an OffscreenCanvas processes the frame on its own thread (for example, mirroring it), transferToImageBitmap() converts the OffscreenCanvas into an ImageBitmap with zero copying, and postMessage transfers the processed ImageBitmap back to the main thread, where the Canvas 2D context (processCtx2D) draws it onto the canvas. A condensed sketch of this loop follows the two steps below.

  1. Capturing video frames: prefer the newer requestVideoFrameCallback API, which is designed specifically for video-frame synchronization and provides precise timestamps and metadata. If it is unavailable, fall back to requestAnimationFrame, which is synchronized with the browser's render loop and gives smooth capture. If that is also unavailable, fall back to a timer at a fixed 30 FPS to approximate frame capture. Inside the wrapped callback we check video.paused and video.ended to make sure the video is still playing before scheduling the next frame. A timer and requestAnimationFrame cannot be synchronized precisely with video frames, so frames may be dropped or duplicated; requestAnimationFrame in particular gives no guarantee that each call corresponds to a new video frame. requestVideoFrameCallback is driven by the video frames themselves, so every frame is captured exactly once, it supplies extra frame metadata (timestamp, presentationTime, expectedDisplayTime), and it only fires when a new frame is available, avoiding wasted work. That makes requestVideoFrameCallback ideal for efficiently synchronizing with video frames without missing or repeating any; its frame-driven callbacks suit video analysis, AI inference, and filter rendering. It is more precise than requestAnimationFrame but requires browser support.

  2. Drawing video frames: convert the video element into an imageBitmap with createImageBitmap and send it to the Worker, passing the imageBitmap in postMessage's transfer list so ownership moves to the Worker thread without the cost of a copy. Inside the Worker, an OffscreenCanvas performs high-performance image processing, using setTransform() to implement the horizontal flip (mirror) efficiently. The Worker supports initializing the canvas (init) and processing an image (process); it uses transferToImageBitmap() to turn the OffscreenCanvas into an ImageBitmap with zero copying and sends the result back with postMessage, again transferring ownership to the main thread to avoid a copy. Doing the drawing and processing in a Web Worker keeps the computation off the main thread and improves rendering efficiency. Finally, the main thread simply draws the returned ImageBitmap.
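
A condensed sketch of the capture-and-hand-off loop described in the two steps above, assuming a worker shaped like video-processor-worker.js later in this article (the function name is illustrative):

// Pick the best available frame callback, then ship each frame to the worker as an ImageBitmap.
function captureFrames(video, worker) {
  const request =
    "requestVideoFrameCallback" in HTMLVideoElement.prototype
      ? (cb) => video.requestVideoFrameCallback(cb)
      : window.requestAnimationFrame
      ? (cb) => requestAnimationFrame(cb)
      : (cb) => setTimeout(() => cb(performance.now()), 1000 / 30); // ~30 FPS fallback

  async function onFrame() {
    if (video.paused || video.ended) return;       // only capture while playing
    const imageBitmap = await createImageBitmap(video);
    // Transfer (not copy) the bitmap to the worker for off-main-thread processing.
    worker.postMessage({ type: "process", imageBitmap }, [imageBitmap]);
    request(onFrame);                               // schedule the next frame
  }

  request(onFrame);
}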

1.4 Capturing the canvas stream: call this.processCanvas.captureStream(60) to obtain a 60 FPS stream of the frames drawn on the canvas, then add the audio tracks from the user's audio stream to it so video and audio are composed together:

const audioTracks = this.stream.getAudioTracks();
audioTracks.forEach((track) => this.processStream.addTrack(track));

1.5 Instantiating the MediaRecorder: create a recorder from the composed canvas stream with MediaRecorder, passing recording options such as mimeType, and register its event callbacks: onstart (recording started), onstop (recording finished), onpause (recording paused), onresume (recording resumed), and so on.

1.6 Collecting and uploading the recorded data: every 5 seconds we call mediaRecorder.requestData() to trigger the mediaRecorder.ondataavailable event manually. Each time MediaRecorder delivers data through ondataavailable, the chunk is pushed into the recordChunks array and an upload is triggered; the data is sent to the backend through uploadWorker, so recorded segments can be uploaded in real time. To keep the timing accurate and independent of main-thread blocking, the recording timer also runs inside a Worker. A minimal sketch of this loop follows.
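
A minimal sketch of that flush-and-upload loop; processStream and uploadWorker are assumed to come from the earlier steps, and the full implementation keeps the 5-second timer inside a dedicated Worker rather than on the main thread.

// Record the composited canvas stream and flush a chunk every 5 seconds.
// Note: "video/webm" may need to be dropped on Safari (see 1.1, item 5).
const recorder = new MediaRecorder(processStream, { mimeType: "video/webm" });
const recordChunks = [];

recorder.ondataavailable = (event) => {
  if (event.data.size > 0) {
    recordChunks.push(event.data);
    uploadWorker.postMessage({ type: "upload", recorderChunk: event.data });
  }
};

recorder.start();
const flushTimer = setInterval(() => recorder.requestData(), 5000);
// Call clearInterval(flushTimer) and recorder.stop() when recording ends.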

1.7 Scheduling uploads of recorded segments in the background: the Worker implements a Scheduler. Each incoming recorded chunk becomes a pending task added to the scheduler, which executes tasks as capacity allows; every task supports retrying failed requests, and the scheduler automatically pauses and resumes based on network conditions. A stripped-down sketch of the network wiring follows this list.

  1. On the init message, create the Scheduler: addTask adds a task to the queue and runs the scheduler; pause suspends task execution; resume continues it; the scheduler loop keeps dispatching tasks while checking the parallelism limit, the paused flag, and the queue length.

  2. The main thread listens for navigator.connection change, window online, and window offline events. On a connection change it reads navigator.connection.downlink (the current bandwidth) and posts it to the Worker; in the Worker, if navigator.connection.downlink < 1 Mbps, scheduler.pause() suspends uploads, and if it is above 1 Mbps, scheduler.resume() continues them. On window online the main thread posts to the Worker, which calls scheduler.resume() to continue the upload tasks; on window offline it posts to the Worker, which calls scheduler.pause() to suspend them.

  3. Add retry logic so a failed task is automatically retried up to 3 times.

  4. Every 5 seconds the main thread calls recorder.requestData(); in ondataavailable the 5-second chunk is posted to the Worker (a Blob is immutable, so the structured clone does not need to copy the underlying bytes). The upload request is wrapped with retry support; after 3 failed retries, the upload of that segment is marked as failed.
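
A stripped-down sketch of the network wiring between the main thread and the upload Worker, mirroring initEventListener and the Scheduler shown later in this article:

// Main thread: forward connectivity changes to the upload worker.
navigator.connection?.addEventListener("change", () => {
  uploadWorker.postMessage({
    type: "network",
    online: navigator.onLine,
    speed: navigator.connection.downlink, // Mbps
  });
});
window.addEventListener("online", () => uploadWorker.postMessage({ type: "network", online: true }));
window.addEventListener("offline", () => uploadWorker.postMessage({ type: "network", online: false }));

// Worker side (see video-upload-worker.js): pause below 1 Mbps or when offline, resume otherwise.
//   if (!online || speed < 1) scheduler.pause();
//   else scheduler.resume();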

2. Face Detection and Drawing the Detected Features: this stage uses three canvases: the canvas that mirrors the video stream frame by frame, the canvas used to extract the image data fed to face detection, and the canvas used to draw the detected features. Each frame from the video-synced canvas is drawn onto the detection canvas and scaled down, and the resulting image data is handed to a Web Worker for face and hand inference. Inside the Web Worker, TensorFlow.js loads and runs a pre-trained face-detection model in graph-model format, predicts the face information in the frame's image data, and returns the result. When the main thread receives the detected features, it draws them.

2.1 Capturing and scaling a video frame: each frame from the video-synced canvas is drawn onto the detection canvas and scaled. An OffscreenCanvas can perform the scaling off the main thread; transferToImageBitmap() turns the OffscreenCanvas into an ImageBitmap with zero copying, and the ImageBitmap is passed through postMessage's transfer list so ownership moves without a copy. A sketch of posting the scaled pixels to the prediction worker follows this list.

  1. Scaling the image: on one hand, scaling down-samples and compresses the input, reducing the amount of data the model has to process and speeding up inference; shrinking the image (for example, down to 320x320) not only accelerates detection but also lowers the computational complexity of the model, and smaller inputs generally mean faster TensorFlow processing. On the other hand, the model.json model expects a consistent input size because it was trained at that size; feeding it the wrong dimensions can keep it from capturing the important image features and hurt accuracy. Scaling this way ensures the key features are still captured.

  2. Using OffscreenCanvas for background scaling: OffscreenCanvas is an API for rendering graphics inside a Web Worker, allowing image processing to happen on a background thread without blocking the main thread.

  3. The transfer parameter of postMessage: an optional array of transferable objects. Ownership of a transferable object moves from the sender to the receiver instead of being copied, which improves performance. It accepts transferables such as ArrayBuffer, MessagePort, and ImageBitmap. Note that once ownership has been transferred, the sender can no longer use the object.
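
A sketch of scaling one frame and handing its pixels to the prediction worker, mirroring what facePredictCanvas.js below does (the function name is illustrative):

// Scale one frame down to predictImageSize and transfer the raw pixels to the prediction worker.
function postFrameForPrediction(ctx2D, videoEl, videoWidth, videoHeight, predictImageSize, worker) {
  const ratio = videoWidth / videoHeight;
  const newWidth = ratio > 1 ? predictImageSize : Math.round(predictImageSize * ratio);
  const newHeight = ratio > 1 ? Math.round(predictImageSize / ratio) : predictImageSize;

  ctx2D.clearRect(0, 0, predictImageSize, predictImageSize);
  ctx2D.drawImage(videoEl, 0, 0, videoWidth, videoHeight, 0, 0, newWidth, newHeight);

  const imageData = ctx2D.getImageData(0, 0, predictImageSize, predictImageSize);
  // The ArrayBuffer backing the pixels is transferred, not copied.
  worker.postMessage(
    {
      id: Date.now(),
      width: imageData.width,
      height: imageData.height,
      buffer: imageData.data.buffer,
      predictImageSize,
    },
    [imageData.data.buffer]
  );
}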

2.2 Loading and optimizing the model: inside the Worker we load TensorFlow.js with importScripts and load model.json with tf.loadGraphModel. A Service Worker caches the TensorFlow model files and related JavaScript libraries and controls the cache version. A custom fetchFunc first tries caches.match(url) to read the resource from the browser cache and returns it if present, otherwise it falls back to a network request. This reduces network latency and speeds up model loading. The reason: the model files (model.json plus the .bin weights) are large, and requesting them repeatedly wastes bandwidth and slows loading; reading from the cache first avoids repeated downloads, improves cold-start time, and enables offline AI inference on the web. A sketch of the cache-first loader follows this list.

  1. What is TensorFlow.js: TensorFlow.js is an open-source machine-learning library implemented in JavaScript that can run machine-learning models, and perform training and inference, in the browser and in Node.js.

  2. Registering the Service Worker: register it with navigator.serviceWorker.register. Note that a service worker's maximum scope is the location of the worker itself (in other words, if the script sw.js lives at /js/sw.js, by default it can only control URLs under /js/). The Service-Worker-Allowed header can widen the maximum allowed scope; if navigator.serviceWorker.register(workerPath, { scope: "xx" }) exceeds the maximum scope, registration throws an error.

  3. Writing the Service Worker: 1. Cache versioning: CACHE_VERSION and CACHE_NAME embed a version number in the cache name so that each resource update can clear the old cache and use a new one; on every version bump only the current version's cache is kept and older caches are deleted. 2. Install phase (install): during installation, inside the event.waitUntil callback, open the cache with caches.open and add the listed files (model.json, group1-shard1of1.bin, and the TensorFlow JS library) with cache.addAll. event.waitUntil keeps the Service Worker from entering the activated state until the install handler finishes, so cache setup cannot be cut short. 3. Activate phase (activate): during activation, inside the event.waitUntil callback, remove stale caches from caches so that only the current version remains and storage is not wasted. 4. Intercepting requests (fetch): when intercepting a fetch, first check whether the request URL matches one of the files to cache; if it does, hijack the response with event.respondWith. Inside the event.respondWith callback, caches.match(event.request) matches the request against the cached resources; if a cached copy exists it is served from the cache, otherwise the file is fetched over the network and can then be cached.

  4. Loading the model in a Web Worker: on the Web Worker's own thread, tf.loadGraphModel loads the model resources, and the fetchFunc option customizes how each resource is fetched, ensuring TensorFlow.js reads the model files from the cache first instead of always going to the network. This avoids repeated requests and improves performance and offline support. A Web Worker runs JavaScript off the browser's main thread and suits compute-heavy tasks; a Service Worker is mainly for caching and intercepting network requests and suits offline support, push notifications, and similar scenarios.
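
A sketch of the cache-first model load inside the prediction worker, mirroring facePredictWorker/worker.js below:

// Inside the prediction worker: load TensorFlow.js, then the graph model with a cache-first fetch.
importScripts("https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.min.js");

let model;

async function loadModel(modelUrl) {
  model = await tf.loadGraphModel(modelUrl, {
    // Try the Cache Storage populated by the Service Worker first, then fall back to the network.
    fetchFunc: (url, options) =>
      caches.match(url).then((cached) => cached || fetch(url, options)),
  });
}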

2.3 Model preflight (warm-up): once the model has loaded, postPreflightPredictionMessage is called. This preflight sends an initial prediction request, essentially to warm up the model so later predictions run more efficiently. Warm-up: after a deep-learning model loads, the first inference is usually slow; a warm-up run absorbs that latency so subsequent predictions stay smooth. Sanity check: the preflight also confirms the model loaded correctly and can run inference, catching problems such as corrupted model files or network delays. Resource initialization: some models perform initialization work after loading, and the preflight helps ensure that work completes successfully.

2.4 Preprocessing the image data: 1. The input is the ImageData buffer; convert RGBA to RGB (dropping the alpha channel) to suit TensorFlow.js. Why drop alpha: ImageData.data is in RGBA format (4 channels), but most models accept only RGB (3 channels); dropping the alpha (transparency) channel removes 25% of the data, makes tf.tensor() processing faster, and avoids wasted computation. 2. Then tf.tensor(data) turns the preprocessed buffer into a one-dimensional tensor. 3. But a graph model expects a four-dimensional tensor in [batch, height, width, channels] form: batch is the batch dimension, because models usually process several samples at once, so even a single image needs a leading batch dimension of 1; height and width are the spatial dimensions, corresponding to imageData.height and imageData.width; channels is the channel count, 3 for an RGB image (red, green, blue). So to match the model's expected input shape, the one-dimensional tensor is reshaped with .reshape([1, imageData.width, imageData.height, 3]) into a four-dimensional tensor in the required format. tf.tidy ensures that intermediate tensors created inside the closure are released automatically when the function returns, preventing memory leaks. A sketch of this preprocessing follows.
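
A sketch of this preprocessing step, assuming TensorFlow.js is already loaded in the worker:

// Turn an ImageData into the [1, width, height, 3] tensor the graph model expects.
function toInputTensor(imageData) {
  // Drop the alpha channel: keep indices 0,1,2 of every 4-byte RGBA pixel.
  const rgb = new Array((imageData.data.length / 4) * 3);
  let j = 0;
  for (let i = 0; i < imageData.data.length; i++) {
    if ((i & 3) !== 3) rgb[j++] = imageData.data[i];
  }

  // tf.tidy releases the intermediate 1-D tensor created before reshape; the returned tensor is kept.
  return tf.tidy(() =>
    tf.tensor(rgb).reshape([1, imageData.width, imageData.height, 3])
  );
}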

2.5 Asynchronous inference and optimization: run inference with model.executeAsync(inputData) and measure the frame rate with performance.now() to gauge inference performance. Use tf.tidy() plus manual dispose() calls to release tensors so the Web Worker does not leak GPU or CPU memory over time (see the two points and the sketch below). How model.executeAsync computes internally: 1. Input validation and preparation: when model.executeAsync(inputTensor) is called, the input tensor is first validated (shape, data type, and so on) against the computation graph; at this point the input is already prepared, typically created with tf.tensor() and formatted with .reshape([batch, height, width, channels]). 2. Graph executor scheduling: TensorFlow.js uses a GraphExecutor internally to manage the computation graph; it topologically sorts the graph, determines each node's (operation's) dependencies, and produces an execution plan in which every operation runs in the correct order, triggered only after all of its prerequisite nodes have finished. 3. Dispatch to the backend: following the plan, the GraphExecutor submits each operation node to the appropriate backend; in the WebGL backend, for example, each operation calls runProgram, which dynamically compiles or reuses a shader program for the operation type and uploads the input data to GPU memory (usually as textures or buffers). 4. Asynchronous GPU computation: once submitted, the shader programs execute asynchronously in the GPU command queue, so JavaScript is not blocked while the work runs in parallel in the background; some operations involve asynchronous data transfer or special GPU calls (for example, asynchronous reads via gl.readPixels), all coordinated through Promises. 5. Dependency management and layer-by-layer execution: the GraphExecutor tracks the dependencies between nodes and only triggers an operation once all of its inputs have finished, so the whole graph is scheduled correctly and in parallel where possible. 6. GPU-to-CPU data transfer: when the last operation in the graph finishes, if the result needs to come back to JavaScript (for example via .arraySync()), the backend starts an asynchronous transfer; the WebGL backend uses APIs such as gl.readPixels to read the data from GPU memory back into CPU memory, again without blocking the main thread, handled through Promises. 7. Promise resolution and output: once all computation and data transfers complete, the GraphExecutor collects the output tensors and the Promise returned by model.executeAsync resolves, returning the output Tensor (or array of Tensors) to the caller.

  1. Asynchronous inference improves FPS: the Web Worker runs on its own thread, but TensorFlow.js inference still consumes GPU/CPU resources; if it ran synchronously it could block the entire Worker and hurt responsiveness.

  2. Releasing memory to prevent GPU leaks: TensorFlow.js creates many tensors during inference; if they are not released manually they keep occupying WebGL/GPU memory, leading to video-memory leaks and eventually crashes. Call dispose() explicitly to free tensors and reduce WebGL resource usage, and use tf.tidy(() => {...}) scoping so intermediate tensors are released promptly.
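
A sketch of one inference pass with timing and tensor cleanup, assuming the model returns a single output tensor as this project's model does:

// Run one async inference, measure it, and release every tensor afterwards.
async function runInference(inputTensor) {
  const t0 = performance.now();
  const prediction = await model.executeAsync(inputTensor);
  const raw = prediction.arraySync();      // GPU -> CPU readback of the output
  const fps = 1000 / (performance.now() - t0);

  inputTensor.dispose();                   // free the input tensor
  tf.dispose(prediction);                  // free the output tensor(s)

  return { raw, fps };
}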

2.6 Converting and classifying the output: prediction.arraySync() converts the output Tensor into a JavaScript array, and the transform function turns it into a structured result; in other words, the raw model output is post-processed from a flat float array into something easy to use and understand. Concretely: extract the confidences by splicing off the last 4 numbers with splice(-4), which are smileRate (smile probability), bgRate (background probability), faceRate (face probability), and handRate (hand probability). The remaining values are normalized coordinates, multiplied by width * scale to get pixel coordinates. The background, face, and hand confidences are compared to decide which class the prediction belongs to, and the per-class position data (the face bounding box, eye region, and mouth region, or the hand bounding box) is extracted and pushed into the corresponding array. A simplified sketch of this transform follows.
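
A simplified sketch of that post-processing for a single prediction row; the classification here uses direct comparisons rather than the indexOf(Math.max(...)) form used in worker.js below:

// Post-process one prediction row into a structured result.
function transformRow(row, width = 640, scale = 0.5) {
  // The last four values are confidences: [smileRate, bgRate, faceRate, handRate].
  const [smileRate, bgRate, faceRate, handRate] = row.splice(-4);
  // The remaining values are normalized coordinates; map them to pixels.
  const pos = row.map((v) => v * width * scale);

  if (faceRate >= bgRate && faceRate >= handRate) {
    return {
      type: "face",
      faceRate,
      smileRate,
      facePos: pos.slice(0, 4),
      eyesPos: pos.slice(4, 24),
      mouthPos: pos.slice(24, 52),
    };
  }
  if (handRate >= bgRate) {
    return { type: "hand", handRate, handPos: pos.slice(0, 4) };
  }
  return { type: "background", bgRate };
}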

2.7 Web Worker communication: the Worker listens in onmessage for image data sent by the main thread, runs the prediction, and posts the result back with postMessage, giving efficient asynchronous communication.

2.8 Drawing the detected features: once the Worker returns a prediction, the main thread uses it to draw the feature information on a Canvas, chiefly the face box, hand box, mouth box, and eye box. Each feature type (face, eyes, mouth, hand, and so on) has its own color and opacity settings. A small sketch of the coordinate mapping follows this list.

  1. Computing the scale ratio: the Web Worker detects on the scaled-down video frame, so the returned feature coordinates are in that scaled space. We therefore compute a scale ratio and scale the coordinates back up when drawing.

  2. Drawing the face box: if faceRate is above 0.3, a face was detected, and its box is drawn from the feature coordinates x1, y1, x2, y2.

  3. Drawing the eye box: on top of a detected face, we only draw eyes judged to show a confident gaze. A gaze counts as confident when the pupil's horizontal offset from the eye's center, as a fraction of the horizontal radius, and its vertical offset as a fraction of the vertical radius average to less than 0.2; the eyes are then considered centered and the gaze confident.

    • Counting confident gazes (getSightPassCount): iterate over all detected faces; in each face, eyesPos is an array of all eye keypoint positions, which we process in groups of EYE_GROUP_LEN (10 values here), giving each face an eye-group count eyesCount. For each eye group, measure its symmetry using a horizontal ratio (horizontalRatio) and a vertical ratio (verticalRatio): the horizontal ratio is computed from three Y values, taking the midpoint of ey1 and ey2 and comparing it with ey3; the vertical ratio is computed from three X values, the midpoint of ex1 and ex2 compared with ex3. sightRatio is then the sum of the horizontal and vertical ratios. The per-group ratios are averaged over eyesCount to get the face's average gaze ratio; if that ratio is below 0.035, the eyes pass the gaze check and passedEyeCount is incremented by 1. Finally the accumulated confident-gaze count is returned.

    • Computing the confident-gaze average: passedEyeCount (the number of confident gazes) divided by totalEyeCount (the total number of eye groups across all faces) gives the fraction of eyes with a confident gaze.

  4. Drawing the mouth box: on top of a detected face, if the smileRate is above 0.4, draw the mouth box.

  5. Drawing the hand box: hand boxes are drawn directly.
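
A small sketch of mapping the scaled prediction coordinates back onto the full-size canvas when drawing a face box (the color comes from CANVAS_PALETTE below):

// Map coordinates from the scaled prediction image back onto the full-size drawing canvas.
function drawFaceBox(ctx, canvas, facePos, predictImageSize) {
  const scaleRatio =
    canvas.width / canvas.height > 1
      ? canvas.width / predictImageSize
      : canvas.height / predictImageSize;

  const [x1, y1, x2, y2] = facePos.map((v) => v * scaleRatio);
  ctx.strokeStyle = "#FFD92B"; // face color from CANVAS_PALETTE
  ctx.lineWidth = 3;
  ctx.strokeRect(x1, y1, x2 - x1, y2 - y1);
}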

2.9 Fetching the next frame and repeating the pipeline: only after the previous frame has been scaled, detected, and drawn do we move on to drawing the next frame.

II. Implementation


2.1 index.js

import { VideoRecorder } from "../videoRecorder/index.js";
import { FaceDrawCanvas } from "../faceDrawCanvas/index.js";
import { calculatePredictScore } from "../predictScore/index.js";
import { FacePredictWorker } from "../facePredictWorker/index.js";
import { FacePredictCanvas } from "../facePredictCanvas/index.js";

export class AIVideoRecorder {
constructor(options) {
const {
videoElId,
videoWidth,
videoHeight,
videoDeviceId,
audioDeviceId,
aiScoreSetting,
predictImageSize,
faceDrawCanvasElId,
videoUploadWorkerUrl,
facePredictCanvasElId,
facePredictWorkerUrl,
processVideoCanvasElId,
videoProcessorWorkerUrl,
videoRecorderTimeWorkerUrl,
} = options;

this.predictResult = [];
this.modelStatus = false;
this.videoElId = videoElId;
this.videoWidth = videoWidth;
this.videoHeight = videoHeight;
this.audioDeviceId = audioDeviceId;
this.videoDeviceId = videoDeviceId;
this.faceDrawCanvasElId = faceDrawCanvasElId;
this.predictImageSize = predictImageSize || 320;
this.facePredictWorkerUrl = facePredictWorkerUrl;
this.videoUploadWorkerUrl = videoUploadWorkerUrl;
this.facePredictCanvasElId = facePredictCanvasElId;
this.processVideoCanvasElId = processVideoCanvasElId;
this.videoProcessorWorkerUrl = videoProcessorWorkerUrl;
this.videoRecorderTimeWorkerUrl = videoRecorderTimeWorkerUrl;

this.videoEl = document.getElementById(this.videoElId);
this.aiScoreSetting = aiScoreSetting || {
smileThreshold: 0.4,
volumeThreshold: 0.2,
smilePassLine: 0.6,
};

this.videoRecorder = null;
this.faceDrawCanvas = null;
this.facePredictWorker = null;
this.facePredictCanvas = null;

this.run();
}

run() {
this.initWorker();
this.initVideoAudio();
}

initCanvas(videoEl) {
this.faceDrawCanvas = new FaceDrawCanvas({
canvasElId: this.faceDrawCanvasElId,
});
this.facePredictCanvas = new FacePredictCanvas({
videoEl: videoEl,
videoWidth: this.videoWidth,
videoHeight: this.videoHeight,
canvasElId: this.facePredictCanvasElId,
predictImageSize: this.predictImageSize,
});
}

initWorker() {
this.facePredictWorker = new FacePredictWorker(this.facePredictWorkerUrl, {
predictImageSize: this.predictImageSize,
predictSuccessCallback: this.predictSuccessCallback,
loadModelSuccessCallback: this.loadModelSuccessCallback,
});
}

getRecorderVideoUrl = () => {
return this.videoRecorder.getRecorderChunkUrl();
};

async initVideoAudio() {
this.videoRecorder = new VideoRecorder({
videoWidth: this.videoWidth,
videoHeight: this.videoHeight,
videoElId: this.videoElId,
videoConstraint: {
noiseSuppression: true,
deviceId: this.videoDeviceId,
width: { ideal: this.videoWidth },
height: { ideal: this.videoHeight },
},
audioConstraint: {
noiseSuppression: true, // noise suppression
echoCancellation: true, // echo cancellation
autoGainControl: true, // automatic gain control
deviceId: this.audioDeviceId,
},
stopCallback: this.videoStopCallback,
pauseCallback: this.videoPauseCallback,
resumeCallback: this.videoResumeCallback,
uploadWorkerUrl: this.videoUploadWorkerUrl,
processWorkerUrl: this.videoProcessorWorkerUrl,
processVideoCanvasElId: this.processVideoCanvasElId,
recorderTimeWorkerUrl: this.videoRecorderTimeWorkerUrl,
});
const stream = await this.videoRecorder.prepareRecord();
this.initCanvas(this.videoRecorder.processCanvas);
}

predict = async () => {
const imageData = await this.facePredictCanvas.getPredictImageData(
this.videoEl,
this.predictImageSize
);

this.facePredictWorker.postMessage(imageData);
};

startRecord = () => {
if (!this.modelStatus) {
console.log("Model not loaded yet");
return;
}

if (this.videoRecorder.status !== "notStarted") {
return;
}

this.videoRecorder.startRecorder();
this.predict();
};

pauseRecord = () => {
this.videoRecorder.pauseRecorder();
};

resumeRecord = () => {
this.videoRecorder.resumeRecorder();
};

stopRecord = () => {
this.videoRecorder.stopRecorder();
};

videoStopCallback = () => {
this.faceDrawCanvas.clear();
this.audioVolumeCanvas?.clear?.(); // audioVolumeCanvas is not initialized in this class; guard the call

const predictScoreResult = calculatePredictScore({
config: {
...this.aiScoreSetting,
},
duration: 30, // hard-coded for now
predictResult: this.predictResult,
soundVolumes: [],
});
console.log("predictScoreResult", predictScoreResult);
};

videoPauseCallback = () => {
this.faceDrawCanvas.clear();
this.audioVolumeCanvas?.clear?.();
};

videoResumeCallback = () => {
this.predict();
};

predictSuccessCallback = (predictData) => {
const { id, output } = predictData;

if (id === "preflight") {
return;
}
if (this.videoRecorder.status !== "recording") {
return;
}

if (output && output.faces?.length !== 0) {
this.faceDrawCanvas?.draw?.(predictData);
this.predictResult.push({ ...output });
}

this.predict();
};

loadModelSuccessCallback = () => {
this.modelStatus = true;
};
}

2.2 predictScore.js

const DEFAULT_SCORE = 3;
const EYE_GROUP_LEN = 10;

function getASRSentencesDuration(sentences) {
const sentencesCount = sentences.length;
if (sentencesCount === 0) {
return 0;
}

const firstSentences = sentences[0];
const lastSentences = sentences[sentencesCount - 1];

const startTime = firstSentences.start_time;
const endTime = lastSentences.end_time;
const duration = endTime - startTime; // ms

const minutes = duration / 1000 / 60;

return minutes;
}

/**
* Calculate how many characters per minute the Tencent ASR (speech-to-text) result contains
* @param sentences Tencent ASR sentence list
* @returns characters per minute
*/
function getASRSentencesSpeed(sentences) {
const duration = getASRSentencesDuration(sentences);
if (duration === 0) {
return "0";
}

// Total characters, including commas, periods, and question marks
let totalText = "";
sentences.forEach((sentence) => {
totalText += sentence.voice_text_str;
});

// Total character length after removing commas, periods, and question marks
// const totalCharLength = totalText.replace(/,|。|?/g, '').length
const totalCharLength = totalText.length;
const charLengthPerMinutes = totalCharLength / duration;

return charLengthPerMinutes;
}

function parseSentenceData(sentenceData) {
return sentenceData.map((sentence) => ({
start_time: sentence.start_time,
end_time: sentence.end_time,
}));
}

function getHandAverage(handCount, duration) {
const handAverage = (handCount / duration) * 60;
return handAverage;
}

function getSightAverage(totalEyeCount, passedEyeCount) {
if (totalEyeCount === 0) {
return "0";
}

const average = totalEyeCount == 0 ? 0 : passedEyeCount / totalEyeCount;
return average;
}

function getSmileAverage(smileCount, aiSize, smilePassLine) {
let smileAverage = 0;

if (smileCount < 1) {
smileAverage = smilePassLine;
} else if (smileCount < 2) {
smileAverage = smilePassLine;
} else {
smileAverage = Math.min(
1,
smilePassLine + (1 - smilePassLine) * ((4 * smileCount) / aiSize)
);
}

return smileAverage;
}

function getVolumeAverage(soundVolumes, volumeThreshold) {
if (soundVolumes.length === 0) {
return "0";
}
let valueTotal = 0;
let volumeCount = 0;

soundVolumes.forEach((item) => {
if (item > volumeThreshold) {
volumeCount += 1;
valueTotal += item;
}
});

const volumeAverage = (valueTotal / volumeCount) * 100;
return volumeAverage;
}

function getSmileScore(smileCount, aiSize) {
let newScore = DEFAULT_SCORE * 10;
if (!aiSize || !smileCount) {
return 0;
} else if (smileCount < 4) {
newScore += smileCount;
return newScore / 10;
}
const smileRate = smileCount / aiSize;
const scoreLevelList = [
1 / 300,
2 / 300,
3 / 300,
4 / 300,
1 / 60,
4 / 180,
5 / 180,
6 / 180,
1 / 10,
2 / 10,
3 / 10,
4 / 10,
5 / 10,
0.6,
0.8,
1,
];
newScore = newScore + 4;
for (let i = 0; i < scoreLevelList.length; i++) {
const rate = scoreLevelList[i];
newScore++;
if (smileRate <= rate) {
break;
}
}
return newScore / 10;
}

function getSightScore(passedEyeCount, totalEyeCount) {
if (!passedEyeCount) {
return 0;
} else {
const ratio = passedEyeCount / totalEyeCount;
const num = Math.round(1.9 * ratio * 10) / 10;
return 3.1 + num;
}
}

function getHandScore(handCount, aiSize) {
if (!handCount) {
return 0;
}
const handRate = handCount / aiSize;
if (handRate > 0.52) {
return 5;
}
const handLevelList = [
0.005, 0.01, 0.015, 0.02, 0.04, 0.06, 0.08, 0.1, 0.12, 0.16, 0.2, 0.24,
0.28, 0.32, 0.36, 0.4, 0.44, 0.48, 0.52,
];
let newScore = DEFAULT_SCORE * 10;

for (let i = 0; i < handLevelList.length; i++) {
const rate = handLevelList[i];
newScore++;
if (handRate <= rate) {
break;
}
}

return newScore / 10;
}

function parseAIFacePredictData({ predictResult, smileThreshold }) {
let smileCount = 0;
let handCount = 0;
let totalEyeCount = 0;
let passedEyeCount = 0;

predictResult.forEach((item) => {
const { hands, faces } = item;
if (hands.length > 0) {
handCount += item.hands.length;
}
if (faces.length > 0) {
// Accumulate the face count (each face contributes one eye group to the total)
totalEyeCount += faces.length;

faces.forEach((face) => {
// A smileRate above the smile threshold counts as a smile; increment the smile count.
if (face.smileRate > smileThreshold) {
smileCount += 1;
}

// If the pupil's horizontal offset from center (as a fraction of the horizontal radius) and its vertical offset (as a fraction of the vertical radius) average below 0.2, the eyes are considered centered and the gaze confident; increment the pass count.
const eyesCount = Math.round(face.eyesPos.length / EYE_GROUP_LEN);
let ratio = 0;
for (let i = 0; i < eyesCount; i++) {
const startIndex = i * EYE_GROUP_LEN;
const eyePos = face.eyesPos.slice(
startIndex,
startIndex + EYE_GROUP_LEN
);

const ey1 = eyePos[1];
const ey2 = eyePos[3];
const ey3 = eyePos[9];

const ex1 = eyePos[4];
const ex2 = eyePos[6];
const ex3 = eyePos[8];

const horizontalRatio = Math.abs(
((ey1 + ey2) / 2 - ey3) / (ey1 - ey2)
);
const verticalRatio = Math.abs(((ex1 + ex2) / 2 - ex3) / (ex1 - ex2));
const sightRatio = horizontalRatio + verticalRatio;
ratio += sightRatio;
}
ratio /= eyesCount;

if (ratio < 0.2) {
passedEyeCount += 1;
}
});
}
});

return {
smileCount,
handCount,
totalEyeCount,
passedEyeCount,
};
}

export function calculatePredictScore({
duration,
predictResult,
soundVolumes,
totalSentences,
config: { smileThreshold, volumeThreshold, smilePassLine },
}) {
const aiSize = predictResult.length;

const { smileCount, handCount, totalEyeCount, passedEyeCount } =
parseAIFacePredictData({
predictResult,
smileThreshold,
});

const handAverage = getHandAverage(handCount, duration);
const sightAverage = getSightAverage(totalEyeCount, passedEyeCount);
const smileAverage = getSmileAverage(smileCount, aiSize, smilePassLine);
const volumeAverage = getVolumeAverage(soundVolumes, volumeThreshold);

const asr = totalSentences
? {
charLengthPerMinutes: getASRSentencesSpeed(totalSentences),
duration: getASRSentencesDuration(totalSentences),
totalSentences: parseSentenceData(totalSentences),
}
: undefined;

const smileScore = getSmileScore(smileCount, aiSize);
const sightScore = getSightScore(passedEyeCount, totalEyeCount);
const handScore = getHandScore(handCount, aiSize);

return {
handAverage,
sightAverage,
smileAverage,
volumeAverage,
asr,

scoreMap: {
smileScore,
sightScore,
handScore,
},
};
}

2.3 serviceWorker.js

const CACHE_VERSION = "v2"; // cache version number
const CACHE_NAME = `tensorflow-model-cache-${CACHE_VERSION}`; // cache name includes the version
const cacheUrlList = [
"./facePredictWorker/model.json",
"./facePredictWorker/group1-shard1of1.bin",
"https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.min.js",
];

// Install phase: cache the model files
self.addEventListener("install", (event) => {
event.waitUntil(
caches.open(CACHE_NAME).then((cache) => {
return cache.addAll(cacheUrlList);
})
);
});

// Activate phase: delete old caches
self.addEventListener("activate", (event) => {
const cacheWhitelist = [CACHE_NAME]; // keep only the current version's cache
event.waitUntil(
caches.keys().then((cacheNames) => {
return Promise.all(
cacheNames
.filter((cacheName) => !cacheWhitelist.includes(cacheName)) // drop caches not in the whitelist
.map((cacheName) => caches.delete(cacheName))
);
})
);
});

function checkIsUseCache(requestUrl) {
const newCacheUrlList = cacheUrlList.map((url) => {
if (url.startsWith("./")) {
url = url.replace("./", "");
}
return url;
});

return newCacheUrlList.some((url) => requestUrl.includes(url));
}

// Intercept network requests and serve the model from the cache
self.addEventListener("fetch", (event) => {
const requestUrl = event?.request?.url || "";
const isUseCache = checkIsUseCache(requestUrl);

if (isUseCache) {
event.respondWith(
caches.match(event.request).then((cachedResponse) => {
// If the requested file is cached, return the cached response directly
return cachedResponse || fetch(event.request);
})
);
}
});

2.4 faceDrawCanvas.js

const EYE_GROUP_LEN = 10;

const CANVAS_PALETTE = {
face: {
color: "#FFD92B",
pointColor: "#FF9800",
opacity: ["100%", "20%"],
},
eye: {
color: "#46D06E",
pointColor: "#1F9E40",
opacity: ["100%", "20%"],
},
mouth: {
color: "#FFA43E",
pointColor: "#E56014",
opacity: ["100%", "20%"],
},
hand: {
color: "#5ec8fe",
pointColor: "#0b66f9",
opacity: ["100%", "20%"],
},
sound: {
color: "#F5A623",
pointColor: "",
opacity: ["100%", "8%", "%0"],
},
};

function getSightAverage(totalEyeCount, passedEyeCount) {
return totalEyeCount === 0 ? 0 : passedEyeCount / totalEyeCount;
}

function getSightPassCount(workerResult) {
let passedEyeCount = 0;
const faces = workerResult.faces || [];
faces.forEach((face) => {
const eyesCount = Math.round(face.eyesPos.length / EYE_GROUP_LEN);
let ratio = 0;
for (let i = 0; i < eyesCount; i++) {
const startIndex = i * EYE_GROUP_LEN;
const eyePos = face.eyesPos.slice(startIndex, startIndex + EYE_GROUP_LEN);

const ey1 = eyePos[1];
const ey2 = eyePos[3];
const ey3 = eyePos[9];

const ex1 = eyePos[4];
const ex2 = eyePos[6];
const ex3 = eyePos[8];

const horizontalRatio = Math.abs(((ey1 + ey2) / 2 - ey3) / (ey1 - ey2));
const verticalRatio = Math.abs(((ex1 + ex2) / 2 - ex3) / (ex1 - ex2));
const sightRatio = horizontalRatio + verticalRatio;
ratio += sightRatio;
}
ratio /= eyesCount;
if (ratio < 0.035) {
passedEyeCount += 1;
}
});
return passedEyeCount;
}

function drawStartPoint(ctx, x, y, color, radius) {
ctx.fillStyle = color;
ctx.beginPath();
ctx.arc(x, y, radius, 0, 2 * Math.PI);
ctx.fill();
ctx.closePath();
}

function drawGradientLine(
ctx,
xStartPos,
yStartPos,
xEndPos,
yEndPos,
lineColor
) {
const gradient = ctx.createLinearGradient(
xStartPos,
yStartPos,
xEndPos,
yEndPos
);
gradient.addColorStop(0, lineColor);
gradient.addColorStop(0.8, `${lineColor}88`);
gradient.addColorStop(1, `${lineColor}00`);

ctx.beginPath();
ctx.moveTo(xStartPos, yStartPos);
ctx.lineTo(xEndPos, yEndPos);
ctx.strokeStyle = gradient;
ctx.stroke();
}

function drawLine(
ctx,
xStartPos,
yStartPos,
xEndPos,
yEndPos,
lineColor,
pointColor
) {
drawStartPoint(ctx, xStartPos, yStartPos, pointColor, 2);
drawGradientLine(ctx, xStartPos, yStartPos, xEndPos, yEndPos, lineColor);
}

function drawRectBorder(
ctx,
xPos,
yPos,
x2Pos,
y2Pos,
lineWidth,
lineColor,
pointColor
) {
ctx.lineWidth = lineWidth;
drawLine(ctx, xPos, yPos, xPos, y2Pos, lineColor, pointColor);
drawLine(ctx, xPos, y2Pos, x2Pos, y2Pos, lineColor, pointColor);
drawLine(ctx, x2Pos, y2Pos, x2Pos, yPos, lineColor, pointColor);
drawLine(ctx, x2Pos, yPos, xPos, yPos, lineColor, pointColor);
ctx.closePath();
}

function calculateScaleRatio(width, height, predictImageSize) {
return width / height > 1
? width / predictImageSize
: height / predictImageSize;
}

function drawFaceBorder(
ctx,
xPos,
yPos,
x2Pos,
y2Pos,
scaleRatio,
lineWidth = 3,
type = "face"
) {
const { color: lineColor, pointColor } = CANVAS_PALETTE[type];
xPos *= scaleRatio;
yPos *= scaleRatio;
x2Pos *= scaleRatio;
y2Pos *= scaleRatio;
drawRectBorder(
ctx,
xPos,
yPos,
x2Pos,
y2Pos,
lineWidth,
lineColor,
pointColor
);
}

export function drawBy2D(canvas, ctx, predictData) {
const { output, predictImageSize } = predictData;
const { hands, faces } = output;
ctx.clearRect(0, 0, canvas.width, canvas.height);

const sightAverage = getSightAverage(faces.length, getSightPassCount(output));
const showSight = sightAverage > 0.2;
const scaleRatio = calculateScaleRatio(
canvas.width,
canvas.height,
predictImageSize
);

faces.forEach((face) => {
const { faceRate, facePos } = face;
if (faceRate > 0.3) {
const [x1, y1, x2, y2] = facePos;
drawFaceBorder(ctx, x1, y1, x2, y2, scaleRatio, 3, "face");
}

const { eyesPos: eyesPosition } = face;
if (showSight) {
const ex3 = eyesPosition[4];
const ey1 = eyesPosition[1];
const ey2 = eyesPosition[3];
const ex9 = eyesPosition[16];
const ey6 = eyesPosition[11];
const ey7 = eyesPosition[13];
const xOffset = 5;
const yOffset = 5;
const eRect1 = { x: ex3 - xOffset, y: Math.min(ey1, ey6) - yOffset };
const eRect2 = { x: ex9 + xOffset, y: Math.max(ey2, ey7) + yOffset };
drawFaceBorder(
ctx,
eRect1.x,
eRect1.y,
eRect2.x,
eRect2.y,
scaleRatio,
3,
"eye"
);
}

const { mouthPos } = face;
const mx1 = mouthPos[0];
const my1 = mouthPos[7];
const mx2 = mouthPos[12];
const my2 = mouthPos[19];
if (face.smileRate >= 0.4) {
drawFaceBorder(ctx, mx1, my1, mx2, my2, scaleRatio, 3, "mouth");
}
});

hands.forEach((hand) => {
const [x1, y1, x2, y2] = hand.handPos;
drawFaceBorder(ctx, x1, y1, x2, y2, scaleRatio, 3, "hand");
});
}


export class FaceDrawCanvas {
constructor(options = {}) {
const { canvasElId } = options;

this.canvas = document.getElementById(canvasElId);
this.ctx2D = this.canvas.getContext("2d");
}

draw = async (predictData) => {
drawBy2D(this.canvas, this.ctx2D, predictData);
};

clear = async () => {
this.ctx2D.clearRect(0, 0, this.canvas.width, this.canvas.height);
};
}

2.5 facePredictCanvas.js

function calculateNewDimensions(videoWidth, videoHeight, predictImageSize) {
const aspectRatio = videoWidth / videoHeight;
let newWidth, newHeight;

if (aspectRatio > 1) {
newWidth = predictImageSize;
newHeight = Math.round(predictImageSize / aspectRatio); // round to the nearest integer
} else {
newHeight = predictImageSize;
newWidth = Math.round(predictImageSize * aspectRatio); // round to the nearest integer
}

return { newWidth, newHeight };
}

function scalePredictImageData(
ctx,
videoEl,
videoWidth,
videoHeight,
predictImageSize,
isFlipHorizontal = false,
isFlipVertical = false
) {
const { newWidth, newHeight } = calculateNewDimensions(
videoWidth,
videoHeight,
predictImageSize
);

ctx.clearRect(0, 0, newWidth, newHeight);
ctx.save();
if (isFlipHorizontal) ctx.scale(-1, 1);
if (isFlipVertical) ctx.scale(1, -1);

ctx.drawImage(
videoEl,
0,
0,
videoWidth,
videoHeight,
isFlipHorizontal ? -newWidth : 0,
isFlipVertical ? -newHeight : 0,
newWidth,
newHeight
);
ctx.restore();
}

export function getPredictImageDataBy2D(
ctx,
videoEl,
videoWidth,
videoHeight,
predictImageSize,
isFlipHorizontal,
isFlipVertical
) {
scalePredictImageData(
ctx,
videoEl,
videoWidth,
videoHeight,
predictImageSize,
isFlipHorizontal,
isFlipVertical
);

const imageData = ctx.getImageData(0, 0, predictImageSize, predictImageSize);
// For debugging: draw the scaled imageData back onto the canvas
// ctx.putImageData(imageData, predictImageSize, predictImageSize);

return {
id: +new Date(),
width: imageData.width,
height: imageData.height,
buffer: imageData.data.buffer,
predictImageSize: predictImageSize,
};
}

export class FacePredictCanvas {
constructor(options = {}) {
const {
videoEl,
canvasElId,
videoWidth,
videoHeight,
isFlipVertical,
isFlipHorizontal,
predictImageSize,
} = options;

this.videoEl = videoEl;
this.videoWidth = videoWidth;
this.videoHeight = videoHeight;
this.isFlipVertical = isFlipVertical;
this.predictImageSize = predictImageSize;
this.isFlipHorizontal = isFlipHorizontal;
this.canvas = document.getElementById(canvasElId);
this.ctx2D = this.canvas.getContext("2d");
}

getPredictImageData = async () => {
return getPredictImageDataBy2D(
this.ctx2D,
this.videoEl,
this.videoWidth,
this.videoHeight,
this.predictImageSize,
this.isFlipHorizontal,
this.isFlipVertical
);
};
}

2.6 registerServiceWorker.js

export async function registerServiceWorker(workerPath) {
if (!("serviceWorker" in navigator)) {
console.warn("Service Worker is not supported in this browser.");
return;
}

try {
const registration = await navigator.serviceWorker.register(workerPath);
console.log("Service Worker registered with scope:", registration.scope);
} catch (error) {
console.error(`Service Worker registration failed: ${error}`);
}
}

2.7 videoRecorder/index.js

const recordModeMap = {
face: "face",
screen: "screen",
};

const statusMap = {
notStarted: "notStarted",
recording: "recording",
paused: "paused",
stopped: "stopped",
};

const EMediaError = {
AbortError: "media_aborted",
NotAllowedError: "permission_denied",
NotFoundError: "no_specified_media_found",
NotReadableError: "media_in_use",
OverconstrainedError: "invalid_media_constraints",
TypeError: "no_constraints",
SecurityError: "security_error",
OtherError: "other_error",
NoRecorder: "recorder_error",
// Recording is unavailable in this browser; upgrade the browser or use Chrome
NotChrome: "not_chrome",
// In Chrome, for security reasons, recording permission is only granted on localhost, 127.0.0.1, or HTTPS
UrlSecurity: "url_security",
None: "",
};

function isObject(object) {
return typeof object === "object" && object != null;
}

async function checkPermissions(name) {
try {
return await navigator.permissions.query({ name: name });
} catch (error) {
return false;
}
}

function removeUnsupportedConstraints(constraints) {
try {
const supportedMediaConstraints =
navigator.mediaDevices.getSupportedConstraints();

if (!supportedMediaConstraints) {
return;
}

Object.keys(constraints).forEach((constraint) => {
if (!supportedMediaConstraints[constraint]) {
console.log(
`VideoRecorder removeUnsupportedConstraints: Removing unsupported constraint "${constraint}".`
);
delete constraints[constraint];
}
});
} catch (error) {
console.log("VideoRecorder removeUnsupportedConstraints error: ", error);
}
}

function sanitizeStreamConstraints(streamConstraints) {
if (
isObject(streamConstraints.audio) &&
typeof streamConstraints.audio !== "boolean"
) {
removeUnsupportedConstraints(streamConstraints.audio);
}

if (
isObject(streamConstraints.video) &&
typeof streamConstraints.video !== "boolean"
) {
removeUnsupportedConstraints(streamConstraints.video);
}
}

function sanitizeRecorderOptions(recorderOptions) {
if (!window.MediaRecorder) {
console.log(
"VideoRecorder sanitizeRecorderOptions error",
"window.MediaRecorder is not supported in this browser"
);
return;
}

if (
recorderOptions?.mimeType &&
typeof window.MediaRecorder.isTypeSupported === "function" &&
!window.MediaRecorder.isTypeSupported(recorderOptions.mimeType)
) {
console.log(
`VideoRecorder sanitizeRecorderOptions: Removing unsupported mimeType "${recorderOptions.mimeType}".`
);
delete recorderOptions.mimeType;
}
}

function getMediaPermissionErrorMessage(error) {
const errName = error.name;
if (errName === "NotFoundError" || errName === "DevicesNotFoundError") {
// required track is missing
// No media type satisfying the requested constraints was found.
return EMediaError.NotFoundError;
} else if (errName === "NotReadableError" || errName === "TrackStartError") {
// The media device is already in use by another application.
// A hardware, browser, or page-level error in the OS prevented the device from being accessed.
// webcam or mic already in use
return EMediaError.NotReadableError;
} else if (
errName === "OverConstrainedError" ||
errName === "ConstraintNotSatisfiedError"
) {
// The current device cannot satisfy the constraints.
return EMediaError.OverconstrainedError;
} else if (
errName === "NotAllowedError" ||
errName === "permissionDeniedError"
) {
// permission denied in browser
// The user denied access for this browser instance, for this session, or globally for all media requests.
return EMediaError.NotAllowedError;
} else if (errName === "TypeError") {
// Type error: the constraints object is missing/empty, or every constraint is set to false.
return EMediaError.TypeError;
} else if (errName === "AbortError") {
// Hardware problem.
return EMediaError.AbortError;
} else if (errName === "SecurityError") {
// Security error: media device usage is disabled on the Document where getUserMedia() was called; whether this applies depends on the individual user's preference settings.
return EMediaError.SecurityError;
} else {
// other errors
return error;
}
}

function releaseMediaStream(mediaStream) {
if (!mediaStream) {
return;
}

mediaStream.getTracks().forEach((track) => track.stop());
}

async function getMediaStream(recordMode, streamConstraints) {
try {
if (recordMode === recordModeMap.screen) {
const displayMediaSteam =
await window.navigator.mediaDevices.getDisplayMedia(streamConstraints);

/**
* @description: Merge the microphone audio stream into the screen-recording stream
*/
const displayAudioStream =
await window.navigator.mediaDevices.getUserMedia({
audio: streamConstraints.audio, // getMediaStream is a standalone function, so `this` is not available here
});

displayAudioStream
.getAudioTracks()
.forEach((audioTrack) => displayMediaSteam.addTrack(audioTrack));

return displayMediaSteam;
}

const userMediaStream = await window.navigator.mediaDevices.getUserMedia(
streamConstraints
);
return userMediaStream;
} catch (error) {
console.log("error", error);
const errorMessage = getMediaPermissionErrorMessage(error);
console.log("VideoRecorder setMediaStream Error:", errorMessage);
}
}

export class VideoRecorder {
constructor(options = {}) {
const {
videoElId,
recordMode,
videoWidth,
videoHeight,
stopCallback,
pauseCallback,
startCallback,
resumeCallback,
processCallback,
recorderOptions,
videoConstraint,
audioConstraint,
uploadWorkerUrl,
processWorkerUrl,
recorderTimeWorkerUrl,
processVideoCanvasElId,
} = options;

this.stream = null;
this.videoEl = null;
this.recorder = null;
this.recordChunks = [];
this.videoElId = videoElId;
this.videoWidth = videoWidth;
this.videoHeight = videoHeight;
this.stopCallback = stopCallback;
this.pauseCallback = pauseCallback;
this.startCallback = startCallback;
this.resumeCallback = resumeCallback;
this.processCallback = processCallback;
this.uploadWorkerUrl = uploadWorkerUrl;
this.processWorkerUrl = processWorkerUrl;
this.recorderTimeWorkerUrl = recorderTimeWorkerUrl;
this.processVideoCanvasElId = processVideoCanvasElId;

this.uploadWorker = null;
this.processCtx2D = null;
this.processCanvas = null;
this.processStream = null;
this.processWorker = null;
this.recorderTimerWorker = null;

this.isPauseIng = false;
this.status = statusMap.notStarted;

this.recordMode = recordMode || recordModeMap.face;

this.streamConstraints = {
audio: audioConstraint,
video: videoConstraint,
};

this.recorderOptions = recorderOptions || {
mimeType: "video/webm",
};

this.run();
}

run() {
this.initVideoEl();
this.initEventListener();
this.initUploadWorker();
this.initProcessCanvas();
this.initProcessWorker();
this.initRecorderTimerWorker();
}

initVideoEl() {
this.videoEl = document.getElementById(this.videoElId);

if (!this.videoEl) {
return;
}
// Mute playback so the microphone does not re-record the video's own audio (avoids echo).
this.videoEl.muted = true;
this.videoEl.addEventListener("play", () => {
VideoRecorder.captureVideoFrame(this.videoEl, this.processVideoFrame);
});
}

initEventListener() {
navigator.connection?.addEventListener?.("change", () => {
const { downlink, effectiveType } = navigator.connection;
const online = navigator.onLine;
console.log(`🌐 Current network speed: ${downlink} Mbps, type: ${effectiveType}`);
this.uploadWorker.postMessage({
type: "network",
online,
speed: downlink,
});
});

window.addEventListener("online", () => {
this.uploadWorker.postMessage({ type: "network", online: true });
});

window.addEventListener("offline", () => {
this.uploadWorker.postMessage({ type: "network", online: false });
});
}

initProcessCanvas() {
this.processCanvas = document.getElementById(this.processVideoCanvasElId);
this.processCanvas.width = this.videoWidth;
this.processCanvas.height = this.videoHeight;
this.processCtx2D = this.processCanvas.getContext("2d");
}

initProcessWorker() {
this.processWorker = new Worker(this.processWorkerUrl);
this.processWorker.onmessage = this.onVideoFrameMessage;
this.processWorker.postMessage({
type: "init",
width: this.videoWidth,
height: this.videoHeight,
});
}

initUploadWorker() {
this.uploadWorker = new Worker(this.uploadWorkerUrl);
this.uploadWorker.onmessage = this.onUploadMessage;
}

initRecorderTimerWorker() {
this.recorderTimerWorker = new Worker(this.recorderTimeWorkerUrl);
this.recorderTimerWorker.onmessage = this.onRecorderTimerMessage;
}

clearAll() {
this.clearRecorder();
releaseMediaStream(this.stream);
}

clearRecorder() {
this.recorder = null;
this.recordChunks = [];
}

getRecorderChunkUrl() {
const mimeType = "video/mp4";
const blob = new Blob(this.recordChunks, { type: mimeType });
const url = URL.createObjectURL(blob);
return url;
}

static async enumMediaDevices() {
try {
const devices = await window.navigator.mediaDevices.enumerateDevices();
const inputDevices = devices.filter((item) => {
return item.kind.endsWith("input") && item.deviceId !== "";
});
const audioInputs = inputDevices.filter(
(item) => item.kind === "audioinput"
);
const videoInputs = inputDevices.filter(
(item) => item.kind === "videoinput"
);

return [videoInputs, audioInputs];
} catch (error) {
console.error("VideoRecorder static enumMediaDevices error:", error);
return [[], []];
}
}

static isVideoPlaying(video) {
if (!video) {
return false;
}

return !video.paused && !video.ended;
}

static captureVideoFrame(video, callback) {
if (!video) {
return;
}

let requestFrame;

function processFrame(now, metadata) {
if (!VideoRecorder.isVideoPlaying(video)) {
return;
}

callback?.(video, now, metadata);
requestFrame(processFrame);
}

if ("requestVideoFrameCallback" in HTMLVideoElement.prototype) {
requestFrame = (cb) => video.requestVideoFrameCallback(cb);
} else if (window.requestAnimationFrame) {
requestFrame = (cb) => requestAnimationFrame(cb);
} else {
requestFrame = (cb) => {
// Use a one-shot timeout (~30 FPS); setInterval here would pile up a new interval on every frame.
setTimeout(() => {
cb(performance.now());
}, 1000 / 30);
};
}

requestFrame(processFrame);
}

static async requestMediaPermissions() {
let mediaStream = null;
try {
mediaStream = await window.navigator.mediaDevices.getUserMedia({
video: true,
audio: true,
});
} catch (error) {
console.log("VideoRecorder static requestMediaPermissions error", error);
} finally {
// Always release the stream after this permission probe; otherwise the camera or microphone may stay occupied and block other applications from using the devices.
releaseMediaStream(mediaStream);
return !!mediaStream;
}
}

async checkMediaPermissions() {
try {
const camera = await checkPermissions("camera");
const microphone = await checkPermissions("microphone");
if (camera.state === "granted" && microphone.state === "granted") {
return true;
}

console.log("Please allow access to the camera and microphone");
return false;
} catch (error) {
const errorMessage = getMediaPermissionErrorMessage(error);
console.log(
"VideoRecorder checkPermissions error:",
errorMessage
);
return false;
}
}

drawVideoFrame = (imageBitmap) => {
this.processCtx2D.clearRect(
0,
0,
this.processCanvas.width,
this.processCanvas.height
);
this.processCtx2D.drawImage(
imageBitmap,
0,
0,
this.processCanvas.width,
this.processCanvas.height
);
};

onVideoFrameMessage = (event) => {
const { data = {} } = event;

if (data.type !== "processed") {
return;
}

this.drawVideoFrame(data.imageBitmap);
};

processVideoFrame = async (video) => {
const imageBitmap = await createImageBitmap(video);
this.processWorker.postMessage({ type: "process", imageBitmap }, [
imageBitmap,
]);
};

onUploadMessage = (event) => {
const { message } = event.data;
console.log("message", message);
};

onRecorderTimerMessage = (event) => {
const { type } = event.data;
if (type === "updateTime") {
this.recorder.requestData();
}
};

pauseVideo() {
this.videoEl?.pause?.();
}

playVideo() {
if (!this.videoEl) {
return;
}

this.videoEl?.play?.();
}

async prepareRecord() {
const checkResult1 = await this.checkMediaPermissions();
if (!checkResult1) {
return null;
}

sanitizeRecorderOptions(this.recorderOptions);
sanitizeStreamConstraints(this.streamConstraints);

if (this.recorder) {
this.recorder = null;
this.recordChunks = [];
}

this.processStream = this.processCanvas.captureStream(60);
this.stream = await getMediaStream(this.recordMode, this.streamConstraints);

if (!this.stream) {
return null;
}

if (this.videoEl) {
this.videoEl.srcObject = this.stream;
}

const audioTracks = this.stream.getAudioTracks();
audioTracks.forEach((track) => this.processStream.addTrack(track));

this.recorder = new window.MediaRecorder(
this.processStream,
this.recorderOptions
);

this.recorder.onstop = this.onRecorderStop;
this.recorder.onstart = this.onRecorderStart;
this.recorder.onpause = this.onRecorderPause;
this.recorder.onresume = this.onRecorderResume;
this.recorder.ondataavailable = this.onRecorderDataavailable;

return this.processStream;
}

startRecorder() {
if (this.status !== statusMap.notStarted) {
return;
}

this.playVideo();
this.recorder.start();
this.status = statusMap.recording;
this.uploadWorker.postMessage({ type: "init" });
this.recorderTimerWorker.postMessage({ action: "start" });
}

pauseRecorder() {
if (this.status !== statusMap.recording) {
return;
}

this.pauseVideo();
this.recorder.pause();
this.recorderTimerWorker.postMessage({ action: "stop" });
}

resumeRecorder() {
if (!this.isPauseIng) {
return;
}

this.playVideo();
this.recorder.resume();
this.recorderTimerWorker.postMessage({ action: "start" });
}

stopRecorder() {
this.pauseVideo();
this.recorder.stop();
this.recorderTimerWorker.postMessage({ action: "stop" });
}

onRecorderStop = () => {
this.stopCallback?.();
this.status = statusMap.stopped;
};

onRecorderStart = () => {
this.startCallback?.();
this.status = statusMap.recording;
};

onRecorderPause = () => {
this.isPauseIng = true;
this.recorder.requestData();
this.status = statusMap.paused;

this.pauseCallback?.();
};

onRecorderResume = () => {
this.resumeCallback?.();
this.isPauseIng = false;

this.status = statusMap.recording;
};

onRecorderError = () => {
this.errorCallback?.();
};

onRecorderDataavailable = (event) => {
const { data } = event;

if (data.size <= 0) {
return;
}

this.recordChunks.push(data);
this.processCallback?.(data);
// Blobs are not transferable; structured clone shares the Blob's immutable data, so no transfer list is needed.
this.uploadWorker.postMessage({ type: "upload", recorderChunk: data });
};
}

2.8 videoRecorder/video-upload-worker.js

let scheduler = null;
let uploadQueue = [];
let isUploading = false;
let recorderChunkIndex = 0;

class Scheduler {
constructor(parallelism) {
this.queue = [];
this.paused = false;
this.runningTask = 0;
this.parallelism = parallelism;
}

add(task, callback) {
return new Promise((resolve, reject) => {
const taskItem = {
reject,
resolve,
callback,
processor: () => Promise.resolve().then(() => task()),
};

this.queue.push(taskItem);
this.schedule();
});
}

pause() {
this.paused = true;
console.log("⚠️ Poor network conditions; uploads paused");
}

resume() {
if (this.paused) {
this.paused = false;
console.log("✅ Network recovered; resuming uploads");
this.schedule();
}
}

schedule() {
while (
!this.paused &&
this.runningTask < this.parallelism &&
this.queue.length
) {
this.runningTask++;
const taskItem = this.queue.shift();
const { processor, resolve, reject, callback } = taskItem;

processor()
.then((res) => {
resolve && resolve(res);
callback && callback(null, res);
})
.catch((error) => {
reject && reject(error);
callback && callback(error, null);
})
.finally(() => {
this.runningTask--;
this.schedule();
});
}
}
}

function request(data) {
return new Promise((resolve) => {
setTimeout(() => {
console.log(`📤 Uploading chunk ${data.id}:`, data.chunk);
resolve();
}, 1000);
});
}

function requestWithRetry(data, retries = 3) {
return new Promise((resolve, reject) => {
function attempt(remaining) {
request(data)
.then(resolve)
.catch((err) => {
if (remaining > 0) {
console.log(
`⚠️ Task ${data.id} failed, retrying (${3 - remaining + 1}/3)...`
);
attempt(remaining - 1);
} else {
reject(err);
}
});
}
attempt(retries);
});
}

function addTask(data) {
scheduler.add(
() => requestWithRetry(data),
(error, result) => {
console.log(`Chunk ${data.id} uploaded`);
if (scheduler.queue.length === 0 && scheduler.runningTask === 0) {
console.log("✅ Upload queue drained");
self.postMessage({ message: "✅ Upload queue drained" });
}
}
);
}

self.onmessage = (event) => {
const { type, speed, online, recorderChunk } = event.data;

switch (type) {
case "init":
recorderChunkIndex = 0;
scheduler = new Scheduler(4);
break;
case "upload":
addTask({ id: recorderChunkIndex, chunk: recorderChunk });
recorderChunkIndex++;
break;
case "network":
if (!online || (speed !== undefined && speed < 1)) {
scheduler?.pause?.();
} else {
scheduler?.resume?.();
}
break;
}
};

2.9 videoRecorder/video-processor-worker.js

let offscreenCanvas;
let offscreenCanvasCtx2D;

self.onmessage = async (event) => {
const { data = {} } = event;
const { type } = data;

switch (type) {
case "init":
const { width, height } = data;
offscreenCanvas = new OffscreenCanvas(width, height);
offscreenCanvasCtx2D = offscreenCanvas.getContext("2d");
break;
case "process":
const { imageBitmap } = data;
offscreenCanvasCtx2D.setTransform(-1, 0, 0, 1, offscreenCanvas.width, 0);
offscreenCanvasCtx2D.drawImage(
imageBitmap,
0,
0,
offscreenCanvas.width,
offscreenCanvas.height
);
const newImageBitmap = await offscreenCanvas.transferToImageBitmap();
self.postMessage({ type: "processed", imageBitmap: newImageBitmap }, [
newImageBitmap,
]);
break;
default:
console.log("Unknown message type");
}
};

2.10 videoRecorder/video-recorder-time-worker.js

let recordingTime = 0;

self.onmessage = (event) => {
const { action } = event.data;

if (action === "start") {
recordingTime = 0;
startTimer();
} else if (action === "stop") {
stopTimer();
}
};

let timerId;

function startTimer() {
timerId = setInterval(() => {
recordingTime += 5; // Increment time by 5 seconds
self.postMessage({ type: "updateTime", time: recordingTime });
}, 5000);
}

function stopTimer() {
clearInterval(timerId);
}

2.11 facePredictWorker/index.js

export class FacePredictWorker {
constructor(url, options = {}) {
const {
predictImageSize,
predictSuccessCallback,
loadModelSuccessCallback,
} = options;

this.url = url;
this.worker = null;
this.predictImageSize = predictImageSize;
this.predictSuccessCallback = predictSuccessCallback;
this.loadModelSuccessCallback = loadModelSuccessCallback;

this.run();
}

run() {
this.worker = new Worker(this.url);
this.worker.onmessage = this.onMessage;
}

onMessage = (event) => {
const data = event.data;

if (data.message === "load_model_success") {
console.log("Model loaded successfully.");
this.loadModelSuccessCallback?.();
this.postPreflightPredictionMessage();
} else if (data.message === "load_model_failed") {
console.error("Model loading failed.", event);
} else if (data.message === "predict_success") {
this.predictSuccessCallback?.(data);
} else if (data.message === "model_not_loaded") {
console.log("Model Not Load");
}
};

postMessage(data) {
this.worker.postMessage(
{
id: data.id,
width: data.width,
height: data.height,
buffer: data.buffer,
predictImageSize: data.predictImageSize || this.predictImageSize,
},
[data.buffer]
);
}

postPreflightPredictionMessage() {
const width = this.predictImageSize;
const height = this.predictImageSize;

const buffer = new ArrayBuffer(width * height * 4);

this.worker.postMessage(
{
id: "preflight",
width,
height,
buffer,
predictImageSize: this.predictImageSize,
},
[buffer]
);
}
}

2.12 facePredictWorker/worker.js

importScripts("https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.min.js");

let model;
const width = 640;
const scale = 0.5;

(async () => {
const baseUrl = self.location.href.replace(/\/[^\/]*$/, "");
const modelUrl = `${baseUrl}/model.json`;

try {
model = await tf.loadGraphModel(modelUrl, {
fetchFunc: (url, options) => {
return caches.match(url).then((cachedResponse) => {
return cachedResponse || fetch(url, options); // return the cached response if present, otherwise fall back to the network
});
},
});
postMessage({ message: "load_model_success" });
} catch (e) {
postMessage({ message: "load_model_failed" });
}
})();

function transform(prediction) {
let hands = [];
let faces = [];

prediction.forEach((object) => {
let rates = object.splice(-4);
let smileRate = rates[0];
let bgRate = rates[1];
let faceRate = rates[2];
let handRate = rates[3];

let pos = object.map((val) => {
return val * width * scale;
});

let type = [bgRate, faceRate, handRate].indexOf(
Math.max(bgRate, faceRate, handRate)
);
if (type === 1) {
let facePos = pos.slice(0, 4);
let eyesPos = pos.slice(4, 24);
let mouthPos = pos.slice(24, 52);
faces.push({
faceRate,
smileRate,
facePos,
mouthPos,
eyesPos,
});
}
if (type === 2) {
hands.push({
handRate,
handPos: pos.slice(0, 4),
});
}
});
return {
faces,
hands,
};
}

async function predict(imageData, options = {}) {
const { id: imageId, predictImageSize } = options;

const data = new Array(imageData.data.length * 0.75 || 0);
let j = 0;
for (let i = 0; i < imageData.data.length; i++) {
if ((i & 3) !== 3) {
data[j++] = imageData.data[i];
}
}

let inputData = tf.tidy(() => {
return tf.tensor(data).reshape([1, imageData.width, imageData.height, 3]);
});

try {
const t0 = performance.now();
let prediction = await model.executeAsync(inputData);

inputData.dispose();
tf.dispose(inputData);
inputData = null;

let output = tf.tidy(() => {
return transform(prediction.arraySync());
});

const t1 = performance.now();

postMessage({
output,
id: imageId,
predictImageSize,
frame: 1000 / (t1 - t0),
message: "predict_success",
});

prediction.dispose && prediction.dispose();
tf.dispose(prediction);
prediction = null;
output = null;
} catch (error) {
inputData && inputData.dispose();
tf.dispose(inputData);
inputData = null;
postMessage({ message: "predict_failed", error });
}
}

onmessage = function (event) {
const image = new ImageData(
new Uint8ClampedArray(event.data.buffer),
event.data.width,
event.data.height
);

if (!model) {
postMessage({ message: "model_not_loaded" });
return;
}

predict(image, event.data);
};

2.13 facePredictWorker/model.json

2.14 facePredictWorker/group1-shard1of1.bin

III. Testing


3.1 test.js

import { VideoRecorder } from "./videoRecorder/index.js";
import { AIVideoRecorder } from "./aiVideoRecorder/index.js";
import { registerServiceWorker } from "./registerServiceWorker.js";

registerServiceWorker("./serviceWorker.js");

await VideoRecorder.requestMediaPermissions();
const mediaDevices = await VideoRecorder.enumMediaDevices();
const videoDeviceId = mediaDevices[0][0].deviceId;
const audioDeviceId = mediaDevices[1][0].deviceId;

const aiVideoRecorder = new AIVideoRecorder({
videoDeviceId,
audioDeviceId,
videoWidth: 862,
videoHeight: 485,
videoElId: "recorder-video",
faceDrawCanvasElId: "face-draw-canvas",
facePredictCanvasElId: "face-predict-canvas",
processVideoCanvasElId: "process-video-canvas",
facePredictWorkerUrl: "./facePredictWorker/worker.js",
videoUploadWorkerUrl: "./videoRecorder/video-upload-worker.js",
videoProcessorWorkerUrl: "./videoRecorder/video-processor-worker.js",
videoRecorderTimeWorkerUrl: "./videoRecorder/video-recorder-time-worker.js",
});

function handleStartRecord() {
aiVideoRecorder.startRecord();
}

function handlePauseRecord() {
aiVideoRecorder.pauseRecord();
}

function handleResumeRecord() {
aiVideoRecorder.resumeRecord();
}

function handleStopRecord() {
aiVideoRecorder.stopRecord();
}

function handlePreviewRecord() {
const url = aiVideoRecorder.getRecorderVideoUrl();
const previewViewEl = document.getElementById("preview-recorder-video");

previewViewEl.src = url;
previewViewEl.play();
}

const stopRecordEl = document.getElementById("stop-record");
const startRecordEl = document.getElementById("start-record");
const pauseRecordEl = document.getElementById("pause-record");
const resumeRecordEl = document.getElementById("resume-record");
const previewRecordEl = document.getElementById("preview-record");

startRecordEl.addEventListener("click", () => handleStartRecord());
pauseRecordEl.addEventListener("click", () => handlePauseRecord());
resumeRecordEl.addEventListener("click", () => handleResumeRecord());
stopRecordEl.addEventListener("click", () => handleStopRecord());
previewRecordEl.addEventListener("click", () => handlePreviewRecord());

3.2 test.html

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Video Recording</title>
<style>
#recorder-container {
width: 862px;
height: 485px;
position: relative;
}

#recorder-video {
opacity: 0;
width: 100%;
height: 100%;
}

#process-video-canvas {
width: 862;
height: 485;
left: 0;
top: 0;
z-index: 100;
position: absolute;
}

#audio-volume-canvas-container {
position: absolute;
top: 12px;
left: 12px;
width: 180px;
height: 34px;
background: #000000;
border-radius: 17px;
opacity: 0.8;
z-index: 999;
}

#audio-volume-canvas {
position: absolute;
top: 13px;
left: 36px;
border-radius: 4px;
}

#face-draw-canvas {
position: absolute;
left: 0;
top: 0;
z-index: 888;
}

#face-predict-canvas {
position: absolute;
left: 0;
top: 0;
}

#recorder-operation {
width: 862px;
margin-top: 24px;
display: flex;
justify-content: center;
gap: 24px;
}
</style>
</head>
<body>
<div id="recorder-container">
<video id="recorder-video"></video>
<canvas id="face-predict-canvas" width="862" height="485"></canvas>
<canvas id="process-video-canvas"></canvas>
<div id="audio-volume-canvas-container">
<canvas id="audio-volume-canvas" width="110" height="8"></canvas>
</div>
<canvas id="face-draw-canvas" width="862" height="485"></canvas>
</div>
<div id="recorder-operation">
<button id="start-record">Start Recording</button>
<button id="pause-record">Pause Recording</button>
<button id="resume-record">Resume Recording</button>
<button id="stop-record">Stop Recording</button>
<button id="preview-record">Preview Recording</button>
</div>

<div id="preview-record-container">
<video id="preview-recorder-video"></video>
</div>

<script type="module" src="./test.js"></script>
</body>
</html>