2010 IEEE International Workshop on Multimedia Signal Processing
Technical Program
Monday, October 4
Room: Amphi Maupertuis
08:30 - 09:30
Plen1: Protected Video Distribution in the Networked Age
Chair: Beatrice Pesquet-Popescu (Télécom ParisTech, France)
Ton Kalker, IEEE Fellow, HP Labs
The way in which professional music is distributed and consumed has changed dramatically over the last 10 years. For this transitional period, the three key concepts that stand out are Napster, iPod and Digital Rights Management (DRM). Currently, we have arrived at a stable situation where most digital audio distribution is controlled by a single retailer, and digital music is no longer encumbered by DRM. However, it is unclear whether the distribution and consumption of professional digital video will follow the path of digital music. It might very well be that the future of digital video will include a strong DRM component. Why this might be the case, what form the distribution of digital video will take, and why the inclusion of DRM might be less controversial than feared, will be the topic of this talk.
09:30 - 10:50
SS1: Fingerprinting based multimedia content management and security
Chairs: Sviatoslav Voloshynovskiy (University of Geneva, Switzerland), Oleksiy Koval (University of Geneva, Switzerland)
- 9:30 Considering Security and Robustness Constraints for Watermark-based Tardos Fingerprinting
- 9:50 Challenging the Security of Content Based Image Retrieval Systems
- 10:10 Private Content Identification: performance-privacy-complexity trade-off
- 10:30 Identification Based on Digital Fingerprinting: What Can Be Done if ML Decoding Fails?
11:10 - 12:30
L1: Immersive communications and systems
Chair: John Apostolopoulos (Hewlett-Packard Labs, USA)
- 11:10 Fusion of Active and Passive Sensors for Fast 3D Capture
- 11:30 Robust Foreground Segmentation for GPU Architecture in an Immersive 3D
- 11:50 Rate-Distortion Optimized Low-Delay 3D Video Communications
- 12:10 Hierarchical Hole-Filling (HHF): Depth Image Based Rendering without Depth Map Filtering for 3D-TV
Room: Espace Lamennais
09:30 - 10:50
P1: Audio and Speech Processing
Chair: Yves Grenier (Télécom ParisTech, France)
- A comparative study between different pre-whitening decorrelation based acoustic feedback cancellers
- Improving Multiple-F0 Estimation by Onset Detection for Polyphonic Music Transcription
- Geometric calibration of distributed microphone arrays from acoustic source correspondences
- A Weighted Approach of Missing Data Technique in Cepstra Domain Based on S-function
- Integrating a HRTF-based Sound Synthesis System into Mumble
- Enhancing Stereophonic Teleconferencing with Microphone Arrays through Sound Field Warping
- Enhancing Loudspeaker-based 3D Audio with Room Modeling
- Visibility-Based Beam Tracing for Soundfield Rendering
Room: Amphi Maupertuis
14:00 - 14:50
Plen2: Telepresence: from Virtual to Reality
Chair: Eckehard Steinbach (Munich University of Technology, Germany)
Phil Chou, IEEE Fellow, Microsoft Research
The teleconferencing industry newsletter Wainhouse Report defines Telepresence as "a videoconferencing experience that creates the illusion that the remote participants are in the same room with you." Today, Telepresence is embodied in the marketplace by solutions such as HP Halo and Cisco Telepresence: dedicated conference rooms sporting built-in furniture and life-sized high-definition video, costing hundreds of thousands of dollars per room. In the future, Telepresence systems will be more diverse, enabling connections between not only meeting rooms but also offices, hotel rooms, vehicles, and even large unstructured spaces such as conference halls and stadiums. Mixed reality as well as ubiquitous computing - including robotics - will play key roles, because these systems will not only need to immerse the participants in a common world, but will also need to empower the participants in ways that are better than being physically present. In this talk, I will take you on a tour of various component technologies as well as experiences that are being developed in Microsoft Research for the future of Telepresence. Along the way, many opportunities for advances in multimedia signal processing will become evident.
14:50 - 16:10
L2: Sparse representations and compressed sensing
Chair: Hayder Radha (Michigan State University, USA)
- 14:50 The Iteration Tuned Dictionary for Sparse Representations
- 15:10 Hybrid Compressed Sensing of Images
- 15:30 Compressive Demosaicing
- 15:50 Multistage Compressed-Sensing Reconstruction of Multiview Images
Room: Espace Lamennais
14:50 - 16:10
P2: Virtual Reality Signal Processing
Chair: Mohamed Daoudi (LIFL (UMR USTL/CNRS 8022), University of Lille, France)
- Robust Head Pose Estimation by Fusing Time-of-Flight Depth and Color
- Optimized decomposition basis using Lanczos filters for lossless compression of biomedical images
- A new image projection method for panoramic image stitching
- Fast Environment Extraction for Lighting and Occlusion of Virtual Objects in Real Scenes
- Real-Time Particle Filtering with Heuristics for 3D Motion Capture by Monocular Vision
- Bilateral Depth-Discontinuity Filter for Novel View Synthesis
- Spectral EEG Features and Tasks Selection Process: Some Considerations toward BCI Applications
- Color Transfer for Complex Content Images Based on Intrinsic Component
- Clickable Augmented Documents
- Depth-aided image inpainting for Novel View Synthesis
- Robust Background Subtraction Method Based on 3D Model Projections with Likelihood
16:30 - 17:30
D1: Demo session
Chair: Thomas Guionnet (Envivio, France)
Tuesday, October 5
Room: Amphi Maupertuis
08:30 - 09:30
Plen3: High Definition Communication - What it takes to implement it and what difference does it make?
Chair: Yves Grenier (Télécom ParisTech, France)
Bernhard Grill, Audio Department, Fraunhofer Institute for Integrated Circuits IIS
The audio quality of voice connections has remained virtually unchanged for more than 100 years. In most cases the audio bandwidth is still constrained to 3.5 kHz, and nobody should expect to recognize, just by listening to the sound, what is going on in the background of a call. With IP connections being used more and more for voice communication, several attempts are now being made to improve the situation. Some propose to considerably increase the audio bandwidth, while others go as far as to promote communication in "CD quality", which could even include stereo or multichannel audio to fully transmit the acoustical image of the speaker's background. What are the benefits to the user, and what does it take to implement such services, as far as the audio components are concerned? This talk will try to give an overview of the various systems proposed and the difference they can make in user experience.
09:30 - 10:50
L3: Audio processing
Chair: Marco Tagliasacchi (Politecnico di Milano, Italy)
- 9:30 Unsupervised Detection of Multimodal Clusters in Edited Recordings
- 9:50 Probabilistic framework for template-based chord recognition
- 10:10 Parametric stereo extension of ITU-T G.722 based on a new downmixing scheme
- 10:30 Fitting Pinna-Related Transfer Functions to Anthropometry for Binaural Sound Rendering
11:10 - 12:30
SS2: Virtual Worlds and multisensorial experience
Chair: Marius Preda (Télécom SudParis, France)
- 11:10 Controlling virtual world by the real world devices with an MPEG-V framework
- 11:30 4-D Broadcasting with MPEG-V
- 11:50 Avatars interoperability in Virtual Worlds
- 12:10 Audio-haptic physically based simulation and evaluation of walking sounds
Room: Espace Lamennais
09:10 - 10:50
P3: Video coding
Chair: Kenneth Rose (University of California, Santa Barbara, USA)
- Reference Frame Modification Methods in Scalable Video Coding (SVC)
- Motion Vector Forecast and Mapping (MV-FMap) Method for Entropy Coding based Video Coders
- Optimal mode switching for multi-hypothesis motion compensated prediction
- Data hiding of Motion Information in Chroma and Luma Samples for Video Compression
- Motion Vector Coding Algorithm Based on Adaptive Template Matching
- Efficient MV Prediction for Zonal Search In Video Transcoding
- Bit Allocation and Encoded View Selection for Optimal Multiview Image Representation
- H.264-Based Multiple Description Coding Using Motion Compensated Temporal Interpolation
- Optimizing the free distance of Error-Correcting Variable-Length Codes
Room: Amphi Maupertuis
14:00 - 14:50
Plen4: Signal Processing Based Research Issues in 3DTV
Chair: Christine Guillemot (INRIA, France)
Levent Onural, IEEE Fellow, Bilkent University
A typical 3DTV chain has capture, representation, compression, transmission, display interface and display stages. Each stage has its own specific nature and problems, and there are many alternative technologies for implementing each of these functional units. Signal processing tools play an important role in each such stage. The capture unit deals with difficult video data fusion problems. The post-capture signal processing needs may range from nil in the simplest 3DTV operations to demanding time-varying 3D model generation in sophisticated ones. Coding and compression of 3DTV video have their own specific nature and solutions. Probably the most complicated and demanding signal processing is at the display interface stage, since 3D displays are quite different from 2D displays and, furthermore, come in many different forms. There are signal processing needs even within the camera and display units. Among all the different 3D modes, true 3D versions which target physical duplication of information-carrying light, such as holography and integral imaging, have their own rich signal processing needs. The signal processing problems associated especially with holographic 3DTV are unique and by far more demanding, and therefore have the potential to trigger a new line of sophisticated signal processing techniques and associated mathematics.
14:50 - 16:10
L4: Joint source channel coding / error control
Chair: Vladimir Stankovic (University of Strathclyde, United Kingdom)
- 14:50 Recovering the Output of an OFB in the case of Instantaneous Erasures in Sub-band Domain
- 15:10 Unequal Error Protection Random Linear Coding for Multimedia Communications
- 15:30 Joint Source Channel Coding/Decoding of 3D-Escot bitstreams
- 15:50 Efficient Error Control in 3D Mesh Coding
Room: Espace Lamennais
14:50 - 16:10
P4: Distributed Source Coding
Chair: Soren Forchhammer (Technical University of Denmark, Denmark)
- Side information enhancement using an adaptive hash-based genetic algorithm in a Wyner-Ziv context
- On Joint Distribution Modeling in Distributed Video Coding Systems
- Side Information Refinement for Long Duration GOPs in DVC
- Reducing DVC Decoder Complexity in a Multicore System
- Toward Realtime Side Information Decoding on Multi-core Processors
- Scalable-to-Lossless Transform Domain Distributed Video Coding
- Encoder Rate Control for Block-based Distributed Video Coding
- Encoder and Decoder Side Global and Local Motion Estimation for Distributed Video Coding
- Spatial intra-prediction based on mixtures of sparse representations
Wednesday, October 6
Room: Amphi Maupertuis
08:30 - 09:30
Plen5: Interactive Digital Art, a need for authoring tools to orchestrate the multimodal interaction between spectators and Art pieces
Chair: Christine Guillemot (INRIA, France)
Stéphane Donikian, Inria Rennes Bretagne Atlantique
Interactive poly-artistic works are a form of expression that is becoming increasingly common nowadays. Consequently, users, specta(c)tors, expect more and more to play an active part in these works. Such creations always require the use of a wide range of technologies (3D video and audio display, video and audio synthesis, body tracking…), and a large number of computer environments, software packages and frameworks have been created to fulfill these needs. However, despite this important profusion of technical tools, several issues remain unsolved when realizing such artistic works. First, in the context of collaborative arts, existing frameworks do not provide means for conceptualizing art pieces for contributors coming from different artistic areas (composition, choreography, video, 3D graphics…). Second, establishing communication between software or hardware components is often complicated. Finally, the communication process and its language have to be redefined from scratch for each new realization. We will introduce ConceptMove, a unified paradigm for describing interactive poly-artistic works.
In the second part of this talk we will focus on Interactive Storytelling, which can be regarded as a new genre, deriving both from interactive media such as video games and from narrative media such as cinema or literature. Whatever degree of interactivity, freedom, and non-linearity might be provided, the role that the interactor is assigned to play always has to remain inside the boundaries defined by the author, which convey the essence of the work itself. This brings an extra level of complexity for writers, while the tools at their disposal remain limited compared to technological evolutions.
09:30 - 10:50
L5: Virtual Reality
Chair: Marc Antonini (I3S-CNRS-University of Nice Sophia Antipolis, France)
- 9:30 Adaptive Semi-Regular Remeshing: A Voronoi-Based Approach
- 9:50 A subjective experiment for 3D-mesh segmentation evaluation
- 10:10 Depth camera based system for auto-stereoscopic displays
- 10:30 Generalized Multiscale Seam Carving
11:10 - 12:30
L6: Scene analysis for immersive telecommunication
Chair: Peter Schelkens (Vrije Universiteit Brussel, Belgium)
- 11:10 Movement recognition exploiting multi-view information
- 11:30 Generation of See-Through Baseball Movie from Multi-Camera Views
- 11:50 Video Super-resolution for Dual-Mode Digital Cameras via Scene-matched Learning
- 12:10 Gaussian Mixture Vector Quantization-Based Video Summarization Using Independent Component Analysis
Room: Espace Lamennais
09:30 - 10:50
P5: Media delivery and quality evaluation
Chair: Pascal Frossard (Swiss Federal Institute of Technology - EPFL, Switzerland)
- An Objective Metric for Assessing Quality of Experience on Stereoscopic Images
- Measuring Errors for Massive Triangle Meshes
- Depth Consistency Testing for Improved View Interpolation
- Visual Quality of Current Coding Technologies at High Definition IPTV Bitrates
- Error Concealment Considering Error Propagation inside a Frame
- A resilient and low-delay P2P streaming system based on network coding with random multicast trees
- An Improved Foresighted Resource Reciprocation Strategy for Multimedia Streaming Applications
- Strategies of Buffering Schedule in P2P VoD Streaming
- QoE Based Adaptation Mechanism for Media Distribution in Connected Home
- Sigmoid Shrinkage for BM3D denoising algorithm
Room: Amphi Maupertuis
14:00 - 14:50
Plen6: On the sampling and compression of the plenoptic function
Chair: Beatrice Pesquet-Popescu (Télécom ParisTech, France)
Pier Luigi Dragotti, Electrical and Electronic Engineering Department, Imperial College London
Image-based rendering (IBR) is a promising way to produce arbitrary views of a scene using images instead of object models. In IBR, new views are rendered by interpolating available nearby images. The plenoptic function, which describes the light intensity passing through every viewpoint, in every direction and at all times, is a powerful tool for studying the IBR problem. In fact, image-based rendering can be seen as the problem of sampling and interpolating the plenoptic function. We therefore first briefly review some classical results on the spectral properties of the plenoptic function and then provide a closed-form expression for its bandwidth under the finite-field-of-view constraint. This naturally leads to an adaptive sampling strategy where the local geometrical complexity of the scene is used to adapt the sampling density of the plenoptic function. In this context, we also present an adaptive image-based rendering algorithm built around an adaptive extraction of depth layers, where the rendering system automatically adapts the minimum number of depth layers according to the scene observed and to the spacing of the sample cameras. Finally, we discuss the problem of compressing the multiple images acquired for image-based rendering and present competitive centralized and distributed compression algorithms. This talk is based on work done with a number of collaborators, in particular M. Brookes (ICL), C. Gilliam (ICL), A. Gelman (ICL), V. Velisavlievic (Deutsche Telekom) and J. Berent (Google Inc.).
14:50 - 16:10
L7: Multimedia for communication and collaboration
Chair: Shantanu Rane (Mitsubishi Electric Research Laboratories, USA)
- 14:50 Face Hallucination Using Bayesian Global Estimation and Local Basis Selection
- 15:10 Real-Time Video Enhancement for High Quality Videoconferencing
- 15:30 Spatial Synchronization of Audiovisual Objects by 3D Audio Object Coding
- 15:50 Overcoming Asynchrony in Audio-Visual Speech Recognition
16:30 - 17:30
D2: Panel session: virtual reality for future immersive communications and emerging applications
Chair: Touradj Ebrahimi (EPFL, Switzerland)
Room: Espace Lamennais
14:50 - 16:10
P6: Object/pattern detection, classification and recognition
Chair: Enis Cetin (Bilkent University, Ankara, Turkey)
- Common Spatial Pattern revisited by Riemannian geometry
- An N-gram model for unstructured audio signals toward information retrieval
- An Efficient Framework on Large-scale Video Genre Classification
- Time-Space Acoustical Feature for Fast Video Copy Detection
- A Hierarchical Statistical Model For Object Classification
- A Bayesian Image Annotation Framework Integrating Search and Context
- Human Emotion Recognition Using Real 3D Visual Features from Gabor Library
- Person Recognition using a bag of facial soft biometrics (BoFSB)
- Multimodal Speech Recognition of a Person with Articulation Disorders Using AAM and MAF
- Object Tracking under Illumination Variations using 2D-Cepstrum Characteristics of the Target