Toward smart aquaculture: A review of multimodal methods, datasets, and applications from the modality perspective

Zhang, Qinyue; Wang, Shasha; Zhang, Tianshu; Ren, Guiming; Zhang, Lingling; Wang, Yangfan; Zheng, Bing; Li, Juan; Zheng, Haiyong

doi:10.1016/j.compag.2025.111227

Qinyue Zhang^a, Shasha Wang^b, Tianshu Zhang^c, Guiming Ren^c, Lingling Zhang^d, Yangfan Wang^d,e, Bing Zheng^a,e,g, Juan Li^f,*, Haiyong Zheng^a,g,**

^aCollege of Electronic Engineering, Ocean University of China
^bCollege of Oceanic and Atmospheric Sciences, OUC
^cChina National Fisheries Corp.
^dCollege of Marine Life Sciences, OUC
^eSanya Oceanographic Institution, OUC
^fSchool of Mechanical and Electrical Engineering, Qingdao Agricultural University
^gShenzhen Research Institute of OUC

Computers and Electronics in Agriculture 2026

*Corresponding author · **Corresponding author

Multimodal Intelligence for Precision Aquaculture
面向精准水产养殖的多模态智能

Combining vision, acoustics, and environmental sensors overcomes single-sensor limits in water-quality monitoring, disease detection, and behavior analysis. This review organizes prior work by modality combinations, fusion strategies, and aquaculture applications.
结合视觉、声学与环境传感器可突破单一传感器在水质监测、疾病检测与行为分析中的局限。本综述按模态组合、融合策略与养殖应用系统组织已有研究。

Smart Aquaculture System Overview / 智能水产养殖系统概览

EN. A typical smart aquaculture system integrates multiple sensing modalities: underwater cameras for visual monitoring, environmental sensors (temperature, dissolved oxygen, pH) for water quality assessment, acoustic systems (sonar) for detection in turbid water, and automated feeders controlled by intelligent management systems. Data from all sensors flow to a central decision-making platform that outputs actionable insights for feeding decisions, water quality management, fish disease prediction and diagnosis, and continuous fish monitoring.

中文. 典型的智能水产养殖系统集成多种感知模态：水下相机用于视觉监测、环境传感器（温度、溶解氧、pH）用于水质评估、声学系统（声呐）用于浑浊水体检测、自动投喂机由智能管理系统控制。所有传感器数据汇入中央决策平台，输出可执行的洞察：投喂决策、水质管理、鱼病预测与诊断、以及持续的鱼类监测。
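
Before streams from different sensors can be pooled on a central platform, readings taken at different rates have to be matched in time. The following is a minimal sketch of nearest-timestamp alignment, assuming camera frames arrive more often than water-quality readings. The function names, the sample values, and the 5-second tolerance are illustrative assumptions, not details from the paper.

```python
from bisect import bisect_left

def nearest_reading(timestamps, values, t, max_gap):
    """Return the value whose timestamp is closest to t.

    timestamps must be sorted ascending and parallel to values.
    Returns None if the closest reading is more than max_gap seconds away.
    """
    if not timestamps:
        return None
    i = bisect_left(timestamps, t)
    # Candidates: the reading just before t and the reading at or after t.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    best = min(candidates, key=lambda j: abs(timestamps[j] - t))
    if abs(timestamps[best] - t) > max_gap:
        return None
    return values[best]

def align(frame_times, probe_times, probe_values, max_gap=5.0):
    """Pair each camera frame time with the nearest probe reading."""
    return [(t, nearest_reading(probe_times, probe_values, t, max_gap))
            for t in frame_times]

# Hypothetical data: camera frames (seconds) and dissolved-oxygen readings (mg/L).
frame_times = [0.0, 2.0, 4.0, 6.0, 8.0, 18.0]
do_times = [0.0, 10.0]
do_mg_per_l = [7.2, 6.9]

aligned = align(frame_times, do_times, do_mg_per_l, max_gap=5.0)
for t, do in aligned:
    print(t, do)
```

Frames with no reading inside the tolerance are paired with `None` rather than a stale value, so downstream logic can decide how to handle the gap.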

Abstract / 摘要

In smart aquaculture, combining sensing modalities – vision, acoustics, and environmental probes – helps overcome single-sensor limits in water-quality monitoring, disease detection, and behavior analysis. This review organizes prior work by modality combinations (including an LLM-guided net-pen inspection prototype with ROV/AUV, synchronized camera-sonar trials for salmon counting and sizing, buoy-relayed ROV platforms for long-running video and water-quality monitoring, fjord-stratification-aware echo-feeding, and trimodal observatories combining cameras, imaging sonar, and environmental probes), deployment patterns, and aquaculture applications.

We compare the strengths of each pairing for core tasks – biomass estimation, infrastructure inspection, and health monitoring – and summarize design choices that matter in practice. Evidence from real-world deployments illustrates gains in robustness and accuracy. We also catalog publicly available multimodal datasets and assess suitability by species, environment, duration, and modality coverage to guide benchmarking and data collection. Finally, we highlight trends in lightweight edge computing, large language model interfaces, and digital-twin integration.

中文. 在智能水产养殖中，结合视觉、声学与环境传感器可突破单一传感器在水质监测、疾病检测与行为分析中的局限。本综述按模态组合（包括 LLM 引导的网箱巡检 ROV/AUV 原型、用于鲑鱼计数与测量的同步相机-声呐实验、浮标中继 ROV 长时段视频与水质监测平台、峡湾分层感知的声学投喂系统，以及融合相机、成像声呐与环境探头的三模态观测站）、部署模式与养殖应用组织已有研究。我们比较各模态组合在生物量估计、设施巡检与健康监测等核心任务中的优势，并总结实践中重要的设计选择。我们还汇总公开的多模态数据集，按物种、环境、时长与模态覆盖评估其适用性。最后，我们探讨轻量边缘计算、大语言模型接口与数字孪生集成等发展趋势。

Literature Selection / 文献筛选流程

EN. We followed the PRISMA 2020 guidelines for systematic reviews. The initial search identified 420 records from databases (Web of Science, Scopus, IEEE Xplore) and 20 from registers, 440 in total. Before screening, we removed 30 duplicates and 10 records for other reasons. Of the 400 records screened, 154 single-modal/no-fusion studies were excluded, leaving 246 reports assessed for eligibility. We further excluded 4 theses/dissertations and 70 non-aquaculture/non-underwater studies. Another 7 reports were identified through other methods (websites). In total, 172 studies were included in this review, covering multimodal fusion methods, datasets, and applications in smart aquaculture.

中文. 本综述遵循 PRISMA 2020 系统综述指南。初始检索自数据库（Web of Science、Scopus、IEEE Xplore）获得 420 条记录及 20 条注册记录，去重 30 条、其他原因排除 10 条后，对 400 条记录进行筛选，排除 154 篇单模态/无融合研究；246 篇报告进入资格评估阶段，进一步排除 4 篇学位论文及 70 篇非水产/非水下研究。通过其他途径（网站）补充 7 篇报告。最终纳入本综述的研究共 172 篇，涵盖智能水产养殖中的多模态融合方法、数据集与应用。
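
The selection counts can be checked with simple arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative. It assumes the 20 register records are added to the 420 database records, which is the reading under which 400 records remain for screening. The 7 reports found through websites are kept separate because the text does not say how they enter the final count.

```python
# Counts as stated in the literature selection paragraph.
records_databases = 420
records_registers = 20
duplicates_removed = 30
removed_other_reasons = 10
excluded_single_modal = 154
excluded_theses = 4
excluded_out_of_scope = 70
reports_other_methods = 7  # tracked separately; not added to the total below

identified = records_databases + records_registers
screened = identified - duplicates_removed - removed_other_reasons
assessed = screened - excluded_single_modal
included = assessed - excluded_theses - excluded_out_of_scope

print(identified, screened, assessed, included)  # 440 400 246 172
```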

Review Scope / 综述范围

Sensing Modalities / 感知模态

🎥 Underwater Optical Imaging / 水下光学成像
Visual sensing for species detection, behavior tracking, and health assessment.

🔊 Acoustic Monitoring / 声学监测
Sonar and hydrophones for detection in turbid water and for biomass estimation.

🌡️ Environmental Sensing / 环境传感
Water quality probes (DO, pH, temperature, salinity) for real-time monitoring.

📡 Supplementary Sensors / 辅助传感器
Passive acoustics, IMU, and GPS for navigation and localization.

Fusion Strategies / 融合策略

Strategy / 策略	Description / 描述	Trade-offs / 权衡
Early Fusion 早期融合	Integrating raw signals in input space	Rich features; high compute cost
Intermediate Fusion 中期融合	Feature-level modal collaboration	Balanced; requires aligned representations
Late Fusion 晚期融合	Integration of decision outcomes	Flexible; may lose cross-modal synergy
Prompt-guided Fusion 提示引导融合	Semantic priors via LLM/VLM	Interpretable; emerging approach
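
The first three strategies in the table differ mainly in where the modalities meet. The sketch below illustrates that difference with toy stand-ins: the "models" are plain averaging functions, and the input values are invented for illustration. It is not the method of any reviewed system, and prompt-guided fusion is omitted because it requires an LLM/VLM.

```python
def mean(xs):
    return sum(xs) / len(xs)

def early_fusion(image, sonar, model):
    """Concatenate raw inputs, then run one joint model."""
    return model(image + sonar)

def intermediate_fusion(image, sonar, encode_image, encode_sonar, head):
    """Encode each modality separately, concatenate features, run a joint head."""
    return head(encode_image(image) + encode_sonar(sonar))

def late_fusion(image, sonar, image_model, sonar_model, weights=(0.5, 0.5)):
    """Run one model per modality, then combine their decisions."""
    scores = (image_model(image), sonar_model(sonar))
    return sum(w * s for w, s in zip(weights, scores))

# Toy stand-ins (assumptions): each feature vector is [mean, max] of the signal.
def encoder(signal):
    return [mean(signal), max(signal)]

image = [0.2, 0.4]        # hypothetical normalized pixel values
sonar = [0.6, 0.8, 1.0]   # hypothetical normalized echo intensities

early = early_fusion(image, sonar, mean)
mid = intermediate_fusion(image, sonar, encoder, encoder, mean)
late = late_fusion(image, sonar, mean, mean)
print(round(early, 3), round(mid, 3), round(late, 3))  # 0.6 0.625 0.55
```

Even with identical toy models the three outputs differ. Early fusion lets the longer sonar signal dominate the average, while late fusion weights the two modalities equally regardless of input size.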

Modality Combinations Covered / 涵盖的模态组合

Vision + Language · Vision + Active Acoustics · Vision + Other Sensors · Acoustic + Non-visual · Trimodal & Higher-order

Each combination is analyzed for technical progress, underwater challenges, aquaculture tasks, and available data resources.
每种组合均从技术进展、水下挑战、养殖任务与数据资源四个维度进行分析。

Key Applications / 核心应用

Biomass Estimation
生物量估计

Accurate fish counting and sizing using synchronized vision-sonar systems in commercial sea cages.

Infrastructure Inspection
设施巡检

LLM-guided ROV/AUV systems for net-pen inspection and anomaly detection.

Health Monitoring
健康监测

Disease detection and feeding behavior analysis via multimodal data integration.

Water Quality
水质监测

Real-time environmental monitoring with buoy-relayed platforms and edge computing.

Smart Feeding
智能投喂

Fjord-stratification-aware echo-feeding that adapts appetite triggers over seasons.

Underwater Robots
水下机器人

ROV/AUV platforms carrying multimodal sensors for autonomous monitoring.

Multimodal Datasets / 多模态数据集

We catalog publicly available multimodal datasets and assess suitability by species, environment, duration, and modality coverage.
我们汇总公开的多模态数据集，按物种、环境、时长与模态覆盖评估其适用性。
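
The four suitability criteria can be treated as fields of a small metadata record, which makes filtering a catalog straightforward. The sketch below is one possible representation. The record layout, the function name, and all catalog entries are hypothetical placeholders and do not describe real datasets from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetRecord:
    name: str
    species: str
    environment: str        # e.g. "sea cage", "tank", "pond"
    duration_hours: float
    modalities: frozenset   # e.g. frozenset({"vision", "sonar"})

def is_suitable(record, species, required_modalities, min_hours=0.0, environment=None):
    """Check a record against species, environment, duration, and modality coverage."""
    if record.species != species:
        return False
    if environment is not None and record.environment != environment:
        return False
    if record.duration_hours < min_hours:
        return False
    # Every required modality must be present in the dataset.
    return set(required_modalities) <= record.modalities

# Placeholder entries for illustration only.
catalog = [
    DatasetRecord("placeholder-A", "salmon", "sea cage", 120.0, frozenset({"vision", "sonar"})),
    DatasetRecord("placeholder-B", "salmon", "tank", 6.0, frozenset({"vision", "language"})),
    DatasetRecord("placeholder-C", "carp", "pond", 300.0, frozenset({"sonar", "environmental"})),
]

matches = [r.name for r in catalog
           if is_suitable(r, "salmon", {"vision", "sonar"}, min_hours=24.0)]
print(matches)  # ['placeholder-A']
```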

Dataset Categories Covered / 数据集类别

🎥 + 💬 Vision-Language
Datasets for aquatic vision-language fusion models, VQA, and captioning.

🎥 + 🔊 Vision-Active Acoustic
Synchronized camera-sonar datasets for underwater detection and tracking.

🎥 + 🌡️ Vision-Environmental
Combined optical and water quality data for holistic monitoring.

🔊 + 🌡️ Acoustic-Environmental
Non-visual multimodal data for turbid water scenarios.

Note: Detailed dataset tables with links are available in the full paper (Section 3.6).
注：完整数据集表格及链接请参阅论文 Section 3.6。

Technology Roadmap / 技术演进路线

EN. The evolution of smart aquaculture spans six paradigms from 1986 to 2024+:

Manual (1986–): On-site inspection, manual logs, sporadic measurements, and reactive decisions.
Single-sensor (1988–): Standalone optical/acoustic/physico-chemical probes; task-specific analytics; sensitive to turbidity, night, and biofouling; siloed data.
AIoT (1997–): Networked sensing, edge/cloud analytics, automated feeding/aeration, lower latency; still limited by reliance on single modalities.
Multimodal Fusion (2012–): Fuse vision/acoustic/environmental streams via early/intermediate/late/prompt-guided fusion; calibration & temporal alignment; robust under low visibility.
Digital Twin (2023–): Virtualized farm state; data assimilation & what-if simulation; decision support; validated models & synchronized data required.
Autonomous (2024+): Closed-loop perception→planning→actuation; human-in-the-loop safety; fault tolerance & explainability; toward self-optimizing operations.

中文. 智能水产养殖技术演进横跨六个范式（1986–2024+）：

人工阶段 (1986–)：现场巡检、手工记录、零星测量、被动决策。
单传感器 (1988–)：独立的光学/声学/理化探头；任务专用分析；对浑浊、夜间、生物污损敏感；数据孤岛。
AIoT (1997–)：联网感知、边缘/云端分析、自动投喂/增氧、延迟降低；仍受单模态局限。
多模态融合 (2012–)：融合视觉/声学/环境数据流，采用早期/中期/晚期/提示引导融合；需校准与时间对齐；低能见度下鲁棒。
数字孪生 (2023–)：养殖场状态虚拟化；数据同化与假设仿真；决策支持；需验证模型与同步数据。
自主阶段 (2024+)：感知→规划→执行闭环；人在回路保安全；容错与可解释性；迈向自优化运营。

Emerging Trends / 发展趋势

Edge Computing
边缘计算

Lightweight models for on-site inference, reducing latency and bandwidth requirements.
轻量模型实现现场推理，降低延迟与带宽需求。

LLM Interfaces
大语言模型接口

Natural language interaction for farmers; prompt-guided inspection and reporting.
养殖户自然语言交互；提示引导巡检与报告生成。

Digital Twin
数字孪生

Virtual replicas for simulation, prediction, and optimized decision-making.
虚拟副本用于仿真、预测与优化决策。

BibTeX

Citation / 引用

@article{Zhang2026SmartAquaculture,
  title   = {Toward smart aquaculture: A review of multimodal methods, datasets, and applications from the modality perspective},
  author  = {Zhang, Qinyue and Wang, Shasha and Zhang, Tianshu and Ren, Guiming and Zhang, Lingling and Wang, Yangfan and Zheng, Bing and Li, Juan and Zheng, Haiyong},
  journal = {Computers and Electronics in Agriculture},
  volume  = {240},
  pages   = {111227},
  year    = {2026},
  doi     = {10.1016/j.compag.2025.111227}
}

Related Works / 相关工作

FishFaceID

OUC-MOI-ID