ÖAGM/AAPR Workshop 2011

Graz, Austria
May 26-27, 2011

Institute for Information and Communication Technologies - Machine Vision Applications
June 15, 2010
Picture-Gallery online
June 3, 2010
Proceedings online

Program & Proceedings

Invited Speakers

Prof. Dr. Javier Gonzalez is the head of the MAPIR group and full professor at the University of Málaga. Prof. Gonzalez received the B.S. degree in Electrical Engineering from the University of Seville in 1987. He joined the Department of "Ingenieria de Sistemas y Automatica" at the University of Malaga in 1988 and received his Ph.D. from this University in 1993. In 1990-1991 he was at the Field Robotics Center, Robotics Institute, Carnegie Mellon University (USA) working on mobile robots. Since 1996 he has been heading Spanish and European projects on mobile robotics and perception. His research interests include mobile robot autonomous navigation, computer vision and remote sensing. He is the author or co-author of more than 100 papers and three books about computer vision and mobile robot symbolic world modeling. (See also http://www.isa.uma.es/C0/jgonzalez/default.aspx)

Stefan Scherer studied technical physics at Graz University of Technology and received his Ph.D. degree in the field of telematics in 2000. During his studies he took part in various projects, for instance as a member of the development team for 3D visualization and analysis of the Magellan radar data of the planet Venus (Boulder, USA).
During his four years of assisting and leading the industrial machine vision department at the Institute for Computer Graphics and Vision at Graz University of Technology, he founded a group for digital microscopy. Lectures and practical training courses were held in the field of "Measurement". Together with Dr. Manfred Prantl he founded the company Alicona in 2001. Alicona is a highly specialized enterprise and now a leading manufacturer in the field of high-resolution optical 3D measurement. (See also http://www.alicona.com/)

Workshop Program

Planned Agenda - Day 1

8:15-9:15 Registration
9:15-9:30 Opening
9:30-10:20 Keynote - Javier Gonzalez (University of Malaga)
10:20-11:00 Session 1 - Robotics
11:00-11:15 Coffee break
11:20-12:20 Session 2 - Depth & 3D
12:20-14:00 Lunch break
14:00-14:30 Keynote - Stefan Scherer (Alicona)
14:30-16:15 Session 3 - Short Presentations & Industrial Exhibition
16:15-16:30 Coffee break
16:30-17:50 Session 4 - Quality Inspection & Learning
18:00-18:45 AAPR General Assembly
19:00 Conference Dinner

Planned Agenda - Day 2

9:00-10:00 Microsoft Award Session
10:00-11:00 Session 5 - Classification
11:00-11:20 Coffee break
11:20-12:20 Session 6 - Object Detection
12:20 Lunch, Award ceremony and Closing

A more detailed program can be downloaded here (PDF, 39KB).

Here you can find all presented papers of this year's workshop together with an abstract. To view or download the full version (PDF documents), simply click the headline of the paper you are interested in.

Table of Contents

Keynote - Javier Gonzalez (University of Malaga)

Session 1 - Robotics

  • Large-Scale Robotic SLAM through Visual Mapping

    Christof Hoppe, Katrin Pirker, Matthias Ruether, Horst Bischof

    Keyframe-based visual SLAM systems perform reliably and fast in medium-sized environments. Currently, their main weaknesses are robustness and scalability in large scenarios. In this work, we propose a hybrid, keyframe-based visual SLAM system, which overcomes these problems. We combine visual features of different strength, add appearance-based loop detection and present a novel method to incorporate non-visual sensor information into standard bundle adjustment frameworks to tackle the problem of weakly textured scenes. On a standardized test dataset, we outperform EKF-based solutions in terms of localization accuracy by at least a factor of two. On a self-recorded dataset, we achieve a performance comparable to a laser scanner approach.

  • Accurate Plane Estimation Within A Holistic Probabilistic Framework

    Kai Zhou, Andreas Richtsfeld, Karthik Mahesh Varadarajan, Michael Zillich, Markus Vincze

    Accurate 3D plane estimation in complex environments is an important functionality in many robotics applications such as navigation, manipulation, and human-machine interaction. Following recent research in coherent geometrical contextual reasoning and object recognition, this paper proposes a joint probabilistic model which uses the results of wireframe feature detection to facilitate refinement of supporting plane estimation. By maximizing the probability of the joint model, our method has the capability of simultaneously estimating multiple 3D surfaces. The experiments using both synthetic data and an indoor mobile robot scenario demonstrate the benefits of our coherent model approach.

Session 2 - Depth & 3D

  • Multi-View Stereo: Redundancy Benefits for 3D Reconstruction

    Markus Rumpler, Arnold Irschara, Horst Bischof

    This work investigates the influence of using multiple views for 3D reconstruction with respect to depth accuracy and robustness. In particular we show that multiview matching not only contributes to scene completeness, but also improves depth accuracy through improved triangulation angles. We start with synthetic experiments on a typical aerial photogrammetric camera network and investigate how baseline (i.e. triangulation angle) and redundancy affect the depth error. Our evaluation also includes a comparison between combined pairwise triangulated and fused stereo pairs and true multiview triangulation. By analyzing the 3D uncertainty ellipsoid of triangulated points we demonstrate the clear advantage of a multiview approach over fused two-view stereo algorithms. We propose an efficient dense matching algorithm that utilizes pairwise optical flow followed by a robust correspondence chaining approach. We provide evaluation results of the proposed method on ground truth data and compare its performance with a multiview plane sweep method.

  • Segmentation-Based Depth Propagation in Videos

    Nicole Brosch, Christoph Rhemann, Margrit Gelautz

    In this paper we propose a simple yet effective approach to convert existing 2D video content into 3D. We present a new semi-automatic algorithm that propagates sparse user-provided depth information over the whole monocular video sequence. The advantage of our algorithm over previous work comes from the use of spatio-temporal video segmentation techniques. The segmentation preserves depth discontinuities, which is challenging for previous approaches. A subsequent refinement step enables smooth depth changes over time and yields the final depth map. Quantitative evaluations show that the proposed algorithm is able to produce good-quality and temporally coherent 3D videos.

  • Object Removal by Depth-guided Inpainting

    Liu He, Michael Bleyer, Margrit Gelautz

    Object removal by image inpainting aims at the visual uniformity of the inpainted blanks among their surroundings. Most inpainting algorithms pursue the structure continuity and texture similarity only in color. In this paper we take the view depth continuity into account and propose a depth-guided inpainting algorithm, in which a single color image and its associated disparity map are inpainted simultaneously. A fast exemplar-based inpainting is applied to fill the blank. Exemplars are randomly selected under depth constraints in initialization and optimized with a nearest neighbor search method in a semi-global way for smooth completion. Experimental results with datasets of different scenes demonstrate the positive impact of depth control in exemplar selection and the efficiency of the proposed algorithm.

Keynote - Stefan Scherer (Alicona Imaging GmbH)

Session 3 - Short Presentations & Industrial Exhibition

  • Affective computing for wearable diary and lifelogging systems: An overview

    Jana Machajdik, Allan Hanbury, Angelika Garz, Robert Sablatnig

    Diaries have transformed over the last decade. Originally in handwritten text format, photo albums and visual diaries became popular as photography became commonly available. Traditionally intended to remain private, the dimension of audience was added in blogs along with Internet communication. However, humans have a limited capacity to record their lives. The goal of lifelogs is to overcome this limit and collect and store all of a person's personal information digitally. This can be done by recording all computer and cell phone activity and mobile context (e.g. GPS), but also adding multiple wearable sensors such as always-on cameras or bio-sensors. This creates enormous amounts of data that has to be processed to be made useful to humans. The main purposes of diaries are to store memories and encourage self-reflection. In both processes human emotion plays a major role. In this paper, we review state-of-the-art literature on wearable diaries and lifelogging systems, and discuss the key issues and main challenges.

  • Learning Object Detectors from Weakly-Labeled Internet Images

    Inayatullah Khan, Peter M. Roth, and Horst Bischof

    Learning visual object detectors typically requires a large amount of labeled data, which is hard to obtain. To overcome this limitation, we propose a three-stage system that avoids any human labeling and autonomously learns an object detector from unlabeled Internet images. In the first stage, we collect images via visual image search, just using the name of an object class. Then, in the second stage, we determine the presence of the target object and, finally, in the third stage, we estimate its localization and crop patches, which are used to learn a detector. Since we have to cope with ambiguously/wrongly labeled data, we apply multiple instance learning (MIL) techniques in the last two stages. In the experimental results, we demonstrate the benefits of the approach on publicly available benchmark datasets. In fact, we show that we can train competitive object detectors without using visually labeled data.

  • Calculation of Attention Points Using 3D Cues

    Ekaterina Potapova, Michael Zillich, Markus Vincze

    Attention points are an effective means to tackle scene complexity, for example for grasping objects in a table scene. One way to obtain attention points is from a saliency map. Inspired by findings from preattentive human vision we investigated 3D cues to build a new type of saliency map. We implemented two 3D cues - one based on surface height and the other based on relative surface orientation. To evaluate the approach we built up an RGB-D (colour and depth) image database with table scenes of different complexity. We compared results of our algorithm to the classical 2D Itti-Koch-Niebur saliency map. In all types of scenes 3D cues showed better results than using 2D cues only.

  • Preprocessing of microscopy images via Shannon's entropy

    Jan Urban, Jan Vanek, Dalibor Stys

    In this paper, a method for image preprocessing based on Shannon's definition of information entropy is presented. The enhancement algorithm analyzes the information contribution of individual pixels to the whole image. This method was developed especially for microscopy images captured in phase-contrast mode. However, the approach is general and may be applied to any other image. The basic idea is that the background is poor in information, while objects carry relevant information. The method preserves details, highlights edges, and decreases random noise, all in one calculation. Two different variants of using the information entropy in image analysis are adopted and compared in our approach. Also, the performance of the individual methods is illustrated and discussed. Finally, an optimization of the algorithm as well as an implementation on graphical processing units (GPU) is described to overcome the high computational burden.

  • Industrial Application of a New Camera System based on Hyperspectral Imaging for Inline Quality Control of Potatoes

    Marcus Groinig, Markus Burgstaller, Manfred Pail

    In laboratory environments near infrared (NIR) spectroscopy is a common tool for chemometric investigations. The application of Hyperspectral Imaging (HSI) systems is now well established for bulk sorting of polymers. Since working with the complex nature of HSI data and its huge data volumes is recognized to be quite challenging, the market entrance of these camera systems is quite complicated for new fields of application. To overcome these difficulties, a new camera system technology called EVK Chemical Colour Camera (EC3) was introduced. In this work the application of this new camera system technology for the inline quality control of potatoes is presented, using the example of Sugar-Ends detection.

  • Automatic Sorting of Aluminium Alloys Based on Spectroscopy Measures

    Marcin Grzegorzek, David Schwerbel, Dirk Balthasar, and Dietrich Paulus

    In this paper, we present an approach for classification of aluminium alloys based on spectroscopy measures. We use established pattern recognition techniques for a highly interesting and novel application domain from the area of waste sorting. First, spectrometric data is acquired from aluminium samples using the so-called LIBS (Laser Induced Breakdown Spectroscopy). Second, the dimensionality of the feature space achieved in this way is significantly reduced by applying intelligent feature selection schemes. Finally, Nearest Neighbour and Bayes Classifiers as well as Support Vector Machines are used for classification. Comprehensive and comparative evaluation of the algorithms integrated in our system provides us with very interesting conclusions.

Session 4 - Quality Inspection & Learning

  • Photometric stereo on carbon fiber surfaces

    Werner Palfinger, Stefan Thumfart, Christian Eitzinger

    The production of carbon fiber-reinforced plastic (CFRP) is currently changing from a highly manual and expensive process to an automated one. However, for automated production of CFRP parts, new sensor systems for quality control are required. In this article we present a photometric stereo inspection system that is able to automatically evaluate critical quality criteria of carbon fiber fabrics. Based on the sensor output we propose a specific segmentation method, tailored towards the typical properties of woven carbon fiber fabrics, that partitions the fabric into single segments for feature calculation and classification. Finally we show that the proposed workflow is able to detect a multitude of defects in a real-time system.

  • Vision-Based Quality Inspection in Robotic Welding

    Markus Heber, Christian Reinbacher, Matthias Ruether, Horst Bischof

    In this work we present a novel method for assessing the quality of a robotic welding process. While most conventional automated approaches rely on non-visual information like sound or voltage, we introduce a vision-based approach. Although the weld seam appearance changes, we exploit only the information from error-free reference data, and assess the welding quality through the number of highly dissimilar frames. In our experiments we show that this approach enables an efficient and accurate separation of defective from error-free weldings, as well as detection of welding defects in real-time by exploiting the spatial information provided by the welding robot.

  • Learning Face Recognition in Videos from Associated Information Sources

    Paul Wohlhart, Martin Köstinger, Peter M. Roth, Horst Bischof

    Videos are often associated with additional information that could be valuable for the interpretation of their content. This especially applies to the recognition of faces within video streams, where cues such as transcripts and subtitles are often available. However, this data is not completely reliable and might be ambiguously labeled. To overcome these limitations, we propose a new semi-supervised multiple instance learning algorithm, where the contribution is twofold. First, we can transfer information on labeled bags of instances, thus enabling us to weaken the prerequisite of knowing the label of each instance. Second, we can integrate unlabeled data, given only probabilistic information in the form of priors. The benefits of the approach are demonstrated for face recognition in videos on a publicly available benchmark dataset.

  • Inpainting of Occluded Regions in Handwritings

    Fabian Hollaus, Robert Sablatnig

    This paper deals with the reconstruction of handwritings that are partially overlapped by other texts. The presented system utilizes a high-order Markov Random Field in order to learn image models, which capture the statistics of handwritten strokes. Different handwriting models are used for the retouching of missing stroke regions in English words and ancient Greek handwritings. The Greek writings are overwritten by younger texts. The system makes use of multi-spectral images in order to separate the overwritings from the underwritings and to restore the older texts.

Session 5 - Classification

  • Text Classification and Layout Analysis of Paper Fragments

    Stefan Fiel, Markus Diem, Florian Kleber, Angelika Garz, and Robert Sablatnig

    Document image analysis methods such as text classification and layout analysis allow for the automated extraction of document properties. In general these methodologies are pre-processing steps for Optical Character Recognition (OCR) systems. In contrast, the proposed method aims at clustering document snippets so that an automated clustering of documents can be performed. First, localized words are classified according to printed text, manuscript, and noise. The third class permits the correction of falsely segmented background elements. Having classified the text elements, a clustering is carried out which groups words into text lines and paragraphs. A back propagation of the class weights - assigned to each word in the first step - enables correcting wrong class labels. Finally, additional features such as the detection of underlined text or the paragraph layout (e.g. left aligned, centered) are extracted. The proposed method shows promising results on a dataset consisting of document fragments with varying shapes, content, writing and layout.

  • Detection and Classification of Local Primitives in Line Drawings

    Naeem A. Bhatti and Allan Hanbury

    The local primitives found in binary images are useful in the analysis and recognition of document and patent images. In this paper, an optimum detection of end points and junction points is obtained using morphological spurring and the granulometric curve of the image. A distance-based algorithm is proposed to classify the local primitives found at the detected points. The size of the local region to classify a local primitive is determined granulometrically using the average thickness of lines found in the image. The classified primitives are quantized using a variant of local binary patterns. Ground truth is created and an analysis of the classification accuracy is performed. The values for all the parameters used in the proposed method are determined granulometrically, which makes the method scale invariant.

  • Comparative study of Landmark Detection techniques for Airport Visibility Estimation

    Jean-Philippe Andreu, Harald Ganster, Martina Uray

    Reliable and exact assessment of visibility is essential for safe air traffic. In order to overcome the drawbacks of the currently subjective reports from human observers, we present an approach to automatically derive visibility measures by means of image processing. It is based on identifying the visibility of individual landmarks and compiling an overall visibility range. The methods used are based on concepts of illumination compensation as well as structural (edges) and texture recognition. Validation on individual landmarks showed a reliable performance of 96% correct detections. Furthermore, a solution for compiling the overall visibility report is presented that resembles the currently used standard in air traffic management.

Session 6 - Object Detection

  • 3D Object Category Pose from 2D Images using Probabilistic 3D Contour Models

    Kerstin Poetsch, Axel Pinz

    We present a pose estimation algorithm for 2D input images which is based on probabilistic 3D shape models. Our underlying model is a pose-invariant category model comprised of 3D contour fragments. These 3D contour fragments are represented by probability density functions, which are described by Gaussian Mixture Models. Our pose estimation algorithm consists of two steps. First, we use an Unscented Transformation to build 2D aspect models from the pose-invariant 3D category model. Second, we introduce a novel similarity measure between Gaussian Mixture Models which is based on a hypothesis test. We demonstrate our pose estimation approach on seventeen poses of the ETH80 database for the two categories ‘horse’ and ‘cow’.

  • Hierarchical shape model for windows detection

    Jan Mačák, Ondřej Drbohlav

    In this paper, we test the performance of a hierarchical shape detector on the problem of window detection in facade images. The hierarchical shape model detector is constructed automatically using a small number of hand-drawn images of windows. The window detections are evaluated on both rectified and non-rectified facade images. On an eTRIMS dataset containing around 1000 windows, the detector found around 600 windows and 250 false detections in rectified images. Similar performance was obtained for non-rectified facades.

  • Multi-camera and radio fusion for person localization in a cluttered environment

    Rok Mandeljc, Janez Pers, Matej Kristan, Stanislav Kovacic

    We investigate the problem of person localization in a cluttered environment. We evaluate the performance of an Ultra-Wideband radio localization system and a multi-camera system based on the Probabilistic Occupancy Map algorithm. After demonstrating the strengths and weaknesses of both systems, we improve the localization results by fusing both the radio and the visual information within the Probabilistic Occupancy Map framework. This is done by treating the radio modality as an additional independent sensory input that contributes to a given cell's occupancy likelihood.
