The two-volume set LNCS 13141 and LNCS 13142 constitutes the proceedings of the 28th International Conference on MultiMedia Modeling, MMM 2022, which took place in Phu Quoc, Vietnam, during June 6-10, 2022.
The 107 papers presented in these proceedings were carefully reviewed and selected from a total of 212 submissions. They focus on topics related to multimedia content analysis; multimedia signal processing and communications; and multimedia applications and services.
Table of Contents
BEST PAPER SESSION. -
Real-time detection of tiny objects based on a weighted bi-directional FPN. - Multi-Modal Fusion Network for Rumor Detection with Texts and Images. - PF-VTON: Toward High-Quality Parser-Free Virtual Try-On Network. - MF-GAN: Multi-conditional fusion Generative Adversarial Network for Text-to-Image Synthesis. -
APPLICATIONS 1. -
Learning to classify weather conditions from single images without labels. - Learning Image Representation via Attribute-aware Attention Networks for Fashion Classification. - Toward Detail-Oriented Image-Based Virtual Try-On with Arbitrary Poses. - Parallel DBSCAN-Martingale estimation of the number of concepts for automatic satellite image clustering. -
MULTIMEDIA APPLICATIONS - PERSPECTIVES, TOOLS & APPLICATIONS (Special Session) & BRAVE NEW IDEAS. -
AI for the Media Industry: Application Potential and Automation Level. - Color the Word: Leveraging Web Images for Machine Translation of Untranslatable Words. -
ACTIVITIES & EVENTS. -
MGMP: Multimodal Graph Message Propagation Network for Event Detection. - Pose-Enhanced Relation Feature for Action Recognition in Still Images. - Prostate Segmentation of Ultrasound Images based on Interpretable-guided Mathematical Model. - Spatiotemporal Perturbation Based Dynamic Consistency for Semi-Supervised Temporal Action Detection. -
MULTIMEDIA DATASETS FOR REPEATABLE EXPERIMENTATION (Special Session). -
A Task Category Space for User-Centric Comparative Multimedia Search Evaluations. - GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval. - LLQA - Lifelog Question Answering Dataset. -
LEARNING. -
Category-sensitive Incremental Learning For Image-based 3D Shape Reconstruction. - AdaConfigure: Reinforcement Learning-based Adaptive Configuration for Video Analytics Services. - Mining Minority-class Examples With Uncertainty Estimates. - Conditional Context-aware Feature Alignment for Domain Adaptive Detection Transformer. -
MULTIMEDIA for MEDICAL APPLICATIONS (Special Session). -
Human activity recognition with IMU and vital signs feature fusion. - On Identifying Pareidolia Phenomenon by Emulating Patient Behavior. - Using Explainable AI to Identify Differences between Clinical and Experimental Pain Detection Models Based on Facial Expressions. -
APPLICATIONS 2. -
Double Granularity Relation Network with Self-Criticism for Occluded Person Re-Identification. - A Complementary Fusion Strategy for RGB-D Face Recognition. - Multi-scale Cross-modal Transformer Network for RGB-D Object Detection. - Joint Re-Detection and Re-Identification for Multi-Object Tracking. -
MULTIMEDIA ANALYTICS for CONTEXTUAL HUMAN UNDERSTANDING (Special Session). -
An Investigation into Keystroke Dynamics and Heart Rate Variability as Indicators of Stress. - Fall detection using multimodal data. - Prediction of Blood Glucose using Contextual LifeLog Data. - Multimodal Embedding for Lifelog Retrieval. -
APPLICATIONS 3. -
A Multiple Positives Enhanced NCE Loss for Image-Text Retrieval. - SAM: Self Attention Mechanism for Scene Text Recognition based on Swin Transformer. - JVCSR: Video Compressive Sensing Reconstruction with Joint In-loop Reference Enhancement and Out-loop Super-resolution. - Point Cloud Upsampling via a Coarse-to-fine Network. -
IMAGE ANALYTICS. -
Arbitrary Style Transfer With Adaptive Channel Network. - Fast Single Image Dehazing Using Morphological Reconstruction and Saturation Compensation. - One-Stage Image Inpainting with Hybrid Attention. - Real-time FPGA Design for OMP Targeting 8K Image Reconstruction. -
SPEECH & MUSIC. -
Time-Frequency Attention For Speech Emotion Recognition With Squeeze-and-Excitation Blocks. - Speech Intelligibility Enhancement by Non-Parallel Speech Style Conversion Using CWT and iMetricGAN Based CycleGAN. - A-Muze-Net: Music Generation by Composing the Harmony based on the Generated Melody. -
MULTIMODAL ANALYTICS. -
Bi-attention modal separation network for multimodal video fusion. - Combining Knowledge and Multi-modal Fusion for Meme Classification. - Non-Uniform Attention Network for Multi-modal Sentiment Analysis. - Multimodal Unsupervised Image-to-Image Translation Without Independent Style Encoder.