Deep Learning Competence Center

German Research Center for Artificial Intelligence (DFKI)


Shortly after the advent of Deep Learning, DFKI launched its Deep Learning Competence Center (DLCC) as a cross-sectional unit that connects and bundles Deep Learning competences throughout DFKI. Across all research departments, the DLCC brings together hundreds of experts focusing on the development and application of Deep Learning approaches in their respective fields. Apart from pushing the limits in each domain, the DLCC focuses especially on current DL-specific research areas such as Efficiency, XAI, Robustness, Trustworthy AI, Small Data, Unsupervised / Self-supervised Learning, and Generative Approaches.

The DLCC also functions as a central contact point for external researchers and experts interested in collaboration, and bundles the DFKI's efforts to teach novel Deep Learning approaches to a new generation of students, scientists and employees embracing machine learning.

Below you can find a selection of past and current research activities. Further information is available in our publication listing, which can also be searched for specific sub-topics.

Research projects


Piping and instrumentation diagrams (P&IDs) are central documents in the engineering process of chemical plants. They describe how the individual devices of such a plant are configured and interconnected. However, P&IDs of older plants are often stored on paper or in crude digital formats that lack any kind of semantic information. PIDGraph is designed to bridge that gap by extracting graph structures from P&IDs and providing functions to edit and enhance them automatically. PIDGraph brings together deep learning, computer vision and semantic technologies.

Representation Learning for Streaming Data

Recent advancements have turned the world into a smart one, connected by billions of sensors that emit trillions of data streams at any given point in time, and have transformed societies into data-driven ones. In such a world, it is important to uncover both the hidden and the obvious patterns in the data, and representation learning is key to this pattern discovery. Supervised training has limited application in such a scenario due to the lack of labelled data. The goal of this project is to learn a meaningful representation that captures the information necessary for downstream tasks in high-dimensional space while preserving the semantics in two-dimensional space for visualization. In this project, we are further exploring hybrid and self-organizing approaches for representation learning.

FiN - Fühler im Netz 2.0

With the roll-out of smart meters by the European Union, the next step in the digitalisation of the power grid began. However, apart from the smart meters themselves, a communication infrastructure became necessary to pass readings from the smart meters on to the grid providers. This infrastructure is set up on top of the power grid and not only provides a communication platform but can also establish comprehensive monitoring of the low-voltage grid. Even in 2020, most of the low-voltage grid is entirely unmonitored, so no one really knows what state the grid is currently in. To overcome this, we defined two goals for FiN-2.0: on the one hand, to advance grid state monitoring, and on the other, to keep track of the assets attached to the grid. To achieve this, we work on different use cases based on the data collected by the distributed, mesh-based communication nodes inside the low-voltage power grid. These use cases cover, for example, predictive maintenance with regard to cable aging, localization of cable damage, anomaly detection and identification of interfering sources.

Adversarial Defense based on Structure-to-Signal Autoencoders

Adversarial attacks have exposed the intricacies of the complex loss surfaces approximated by neural networks. Mitigating the effects of said attacks has proven non-trivial but has also raised the need for more interpretable models. In this paper, we present a defense strategy against gradient-based attacks, on the premise that input gradients need to expose information about the semantic manifold for attacks to be successful. We propose an architecture based on compressive autoencoders (AEs) with a two-stage training scheme, creating not only an architectural bottleneck but also a representational bottleneck. We show that the proposed mechanism yields robust results against a collection of gradient-based attacks under challenging white-box conditions. This defense is attack-agnostic and can, therefore, be used for arbitrary pre-trained models, while not compromising the original performance. These claims are supported by experiments conducted with state-of-the-art image classifiers (ResNet50 and Inception v3), on the full ImageNet validation set. Experiments, including counterfactual analysis, empirically show that the robustness stems from a shift in the distribution of input gradients, which mitigates the effect of tested adversarial attack methods. Gradients propagated through the proposed AEs represent less semantic information and instead point to low-level structural features.
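To illustrate why input gradients matter for such attacks, the following is a minimal sketch of one well-known gradient-based attack, the fast gradient sign method (FGSM), applied to a tiny logistic-regression classifier. The text above does not name the attacks that were evaluated, so the choice of FGSM, the toy model, and all names and values are assumptions made for illustration only; the sketch does not reproduce the proposed defense.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of the positive class for a toy logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def input_gradient(weights, bias, x, label):
    """Gradient of the binary cross-entropy loss with respect to the input.

    For logistic regression this is (p - y) * w, where p is the predicted
    probability and y the true label (0 or 1).
    """
    p = predict(weights, bias, x)
    return [(p - label) * w for w in weights]


def sign(value):
    """Sign of a number as -1.0, 0.0 or 1.0."""
    return (value > 0) - (value < 0) + 0.0


def fgsm(weights, bias, x, label, epsilon):
    """Move every input feature by epsilon in the direction that raises the loss."""
    grad = input_gradient(weights, bias, x, label)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Illustrative values (assumptions): a two-feature model and one positive sample.
weights, bias = [2.0, -1.0], 0.0
x, label = [1.0, 0.5], 1
x_adv = fgsm(weights, bias, x, label, epsilon=0.5)
```

In this toy setting the perturbed input lowers the model's confidence in the true class, because the attack reads the direction of steepest loss increase directly from the input gradient. A defense that changes what the input gradients expose therefore targets exactly this step.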

Generative View Synthesis: From Single-view Semantics to Novel-view Images

Content creation, central to applications such as virtual reality, can be tedious and time-consuming. Recent image synthesis methods simplify this task by offering tools to generate new views from as little as a single input image, or by converting a semantic map into a photorealistic image. We propose to push the envelope further, and introduce Generative View Synthesis (GVS), which can synthesize multiple photorealistic views of a scene given a single semantic map. We show that the sequential application of existing techniques, e.g., semantics-to-image translation followed by monocular view synthesis, fails to capture the scene's structure. In contrast, we solve the semantics-to-image translation in concert with the estimation of the 3D layout of the scene, thus producing geometrically consistent novel views that preserve semantic structures. We first lift the input 2D semantic map onto a 3D layered representation of the scene in feature space, thereby preserving the semantic labels of 3D geometric structures. We then project the layered features onto the target views to generate the final novel-view images. We verify the strengths of our method and compare it with several advanced baselines on three different datasets. Our approach also allows for style manipulation and image editing operations, such as the addition or removal of objects, with simple manipulations of the input style images and semantic maps respectively.

Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Nets

Classification problems solved with deep neural networks (DNNs) typically rely on a closed-world paradigm and optimize over a single objective (e.g., minimization of the cross-entropy loss). This setup dismisses all kinds of supporting signals that can be used to reinforce the existence or absence of particular patterns. The increasing need for models that are interpretable by design makes the inclusion of said contextual signals a crucial necessity. To this end, we introduce the notion of Self-Supervised Autogenous Learning (SSAL). An SSAL objective is realized through one or more additional targets that are derived from the original supervised classification task, following architectural principles found in multi-task learning. SSAL branches impose low-level priors on the optimization process (e.g., grouping). The ability to use SSAL branches during inference allows models to converge faster, focusing on a richer set of class-relevant features. We equip state-of-the-art DNNs with SSAL objectives and report consistent improvements for all of them on CIFAR100 and ImageNet. We show that SSAL models outperform similar state-of-the-art methods focused on contextual loss functions, auxiliary branches and hierarchical priors.
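The idea of deriving additional targets from the original labels can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names, the grouping, and the `aux_weight` hyperparameter are hypothetical, and the two probability vectors stand in for the outputs of a main branch and an auxiliary branch.

```python
import math

# Hypothetical grouping of fine-grained classes into coarser groups.
# Class names and groups are illustrative assumptions, not taken from the paper.
FINE_TO_GROUP = {
    "maple": "tree",
    "oak": "tree",
    "rose": "flower",
    "tulip": "flower",
}
FINE_CLASSES = sorted(FINE_TO_GROUP)                  # ['maple', 'oak', 'rose', 'tulip']
GROUP_CLASSES = sorted(set(FINE_TO_GROUP.values()))   # ['flower', 'tree']


def derive_group_target(fine_label):
    """Derive the auxiliary target from the original label; no extra annotation needed."""
    return FINE_TO_GROUP[fine_label]


def cross_entropy(probs, classes, target):
    """Negative log-likelihood of the target class under the predicted distribution."""
    return -math.log(probs[classes.index(target)])


def joint_loss(fine_probs, group_probs, fine_label, aux_weight=0.5):
    """Main classification loss plus a weighted auxiliary loss on the derived target.

    aux_weight is an assumed hyperparameter for this sketch.
    """
    main = cross_entropy(fine_probs, FINE_CLASSES, fine_label)
    aux = cross_entropy(group_probs, GROUP_CLASSES, derive_group_target(fine_label))
    return main + aux_weight * aux
```

For a sample labelled "oak", the auxiliary branch is trained to predict "tree", so the grouping acts as a prior that is obtained from the existing labels alone.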

TSViz - Time Series Visualization

Deep neural networks are used in a wide range of applications, which raises the question of their trustworthiness, especially in safety-critical domains. Explanation plays a key role in gaining trust. Most existing explanation methods are designed for images or language, as these modalities are directly intelligible to humans. In contrast, time-series data is not directly intelligible. Finding an explanation for a given network decision is therefore a challenging task that involves different factors such as the domain, the target group, and the data format. TSViz acts as a bridge between researchers, developers, and users to explain certain decisions and understand errors. Key features of the framework are the importance ranking of the input data and the filters, the network partitioning, and the optimization based on the partitioning and the importance. In addition, the depth of the explanation is aligned with the user's experience.

exAID - AI-based, explainable Diagnostic Assistance System

The lack of trust in opaque Deep Learning (DL)-based AI systems is a major obstacle to their practical use in supporting clinical processes such as the diagnosis of skin cancer. The aim of exAID is to provide practicing dermatologists with detailed decision support that acts as a safety net. In a diagnostic setting, exAID offers comprehensive conceptual explanations that conform to expert-approved concepts. In addition to the generated concept scores, exAID allows specific concept regions to be localized for quick validation by doctors. Moreover, an analytic mode allows users to explore the collected database as well as the AI’s behavior.

P ≈ NP, at least in Visual Question Answering

In recent years, progress in the Visual Question Answering (VQA) field has largely been driven by public challenges and large datasets. One of the most widely used of these is the VQA 2.0 dataset, consisting of polar ("yes/no") and non-polar questions. Looking at the question distribution over all answers, we find that the answers "yes" and "no" account for 38% of the questions, while the remaining 62% are spread over the more than 3000 remaining answers. While several sources of bias have already been investigated in the field, the effects of such an over-representation of polar vs. non-polar questions remain unclear. In this paper, we measure the potential confounding factors when polar and non-polar samples are used jointly to train a baseline VQA classifier, and compare it to an upper bound where the over-representation of polar questions is excluded from the training. Further, we perform cross-over experiments to analyze how well the feature spaces align. Contrary to expectations, we find no evidence of counterproductive effects in the joint training of unbalanced classes. In fact, by exploring the intermediate feature space of visual-text embeddings, we find that the feature space of polar questions already encodes sufficient structure to answer many non-polar questions. Our results indicate that the polar (P) and the non-polar (NP) feature spaces are strongly aligned, hence the expression P ≈ NP.
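The share of polar answers described above can be measured with a few lines of code. This is a minimal sketch assuming the answers are available as a plain list of strings; the function names and the toy data are hypothetical, and loading the actual VQA 2.0 annotations is not shown.

```python
from collections import Counter

POLAR_ANSWERS = {"yes", "no"}


def normalize(answer):
    """Lower-case and trim an answer so that 'Yes ' and 'yes' are counted together."""
    return answer.strip().lower()


def polar_share(answers):
    """Fraction of samples whose answer is 'yes' or 'no'."""
    counts = Counter(normalize(a) for a in answers)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("answers must not be empty")
    return sum(counts[a] for a in POLAR_ANSWERS) / total


def split_polar(samples):
    """Split (question, answer) pairs into polar and non-polar subsets."""
    polar = [s for s in samples if normalize(s[1]) in POLAR_ANSWERS]
    non_polar = [s for s in samples if normalize(s[1]) not in POLAR_ANSWERS]
    return polar, non_polar


# Toy data for illustration only; these are not VQA 2.0 samples.
toy_samples = [
    ("Is the cat asleep?", "Yes"),
    ("How many dogs are there?", "2"),
    ("Is it raining?", "no"),
    ("What color is the car?", "red"),
    ("Is the door open?", "yes"),
]
toy_share = polar_share(answer for _, answer in toy_samples)
toy_polar, toy_non_polar = split_polar(toy_samples)
```

A split of this kind is what allows training on the polar subset, the non-polar subset, or both, as in the joint and cross-over experiments mentioned above.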

ExplAINN - Explainable AI and Neural Networks

Despite astonishing progress in the field of Machine Learning (ML), the robustness of high-performance models, especially those based on Deep Learning technologies, has been lower than initially predicted. These networks do not generalize as expected and remain vulnerable to small adversarial perturbations (also known as adversarial attacks). Such shortcomings pose a critical obstacle to deploying Deep Learning models in safety-critical scenarios such as autonomous driving, medical imaging, and credit rating. Moreover, the gap between good performance and robustness also demonstrates the severe lack of explainability of modern AI approaches: despite good performance, even experts cannot reliably explain model predictions. Hence, the goals of this project are threefold: (1) investigate methods of explainability and interpretability for existing AI approaches (focusing on Deep Neural Networks); (2) develop novel architectures and training schemes that are more interpretable by design; (3) analyze the trade-offs between explainability, robustness, and performance.

Knowledge Incorporated Neural Networks

Knowledge Incorporated Neural Networks combine the knowledge of domain experts with the strengths of Deep Neural Networks (DNNs), offering an unprecedented boost in performance and robustness to DNN systems. Nowadays, Artificial Intelligence (AI) has almost become synonymous with DNNs, and rightly so, since DNNs have revolutionized AI and have delivered remarkable performance in various domains, sometimes even beating their human counterparts. The ability of DNNs to rely on data to discover and extract features that are useful for the solution, without any prior knowledge about the problem, is fascinating. However, while many see this as a strength of DNNs, it is also their biggest weakness: by relying solely on the data, we ignore one very important source, namely expert knowledge, which can contain information that is not present in the data. Hence, relying on only one source, be it knowledge or data, can be suboptimal. In Knowledge Incorporated Neural Networks, we have built novel architectures that allow information to be fused between the data-driven and knowledge-driven domains, combining the best of both worlds. We have shown that Knowledge Incorporated Neural Networks offer an unparalleled performance boost compared to DNNs or knowledge-based programs in isolation, with these architectures beating traditional DNNs even when trained on a mere 10% of the data, and hence alleviating the data hunger of DNNs. This proves crucial in many real-world problems where data are scarce and labelling is expensive.

BReXSys: A Bibliographic Reference Extraction System

Bibliographic reference extraction from scientific publications is a challenging task due to the diversity of referencing styles and document layouts. BReXSys is an end-to-end system for bibliographic reference extraction from scientific publications. It is capable of handling diverse input types, i.e., scanned images, PDF (scanned and born-digital), text files, HTML, and structured XML. To support these diverse input types, BReXSys is equipped with various reference extraction methods, which range from simple text-based methods to a complex end-to-end layout-driven deep learning-based method. BReXSys is highly configurable and can use any of these methods in isolation or in the form of a fusion. The fusion helps to mitigate the limitations of individual methods by taking advantage of the strengths of the others, while using methods in isolation provides more insight into each individual method.

ACE 2.0 - Academic Community Explorer 2.0

Scientific contributions play a vital role in determining the progress of society. With the exponential increase in scientific publications, a solution for carrying out a semantic and pragmatic analysis of scientific trends has become essential. Academic Community Explorer (ACE) 2.0 is equipped with state-of-the-art machine learning-based models to extract relevant information from scientific publications. ACE 2.0 identifies collaboration, citation, and topic trends. ACE 2.0 is also able to perform a range of scientometric analyses on an individual or cumulative level to determine influential researchers for various roles in a community, such as opinion leaders, idea generators, collaborators, and contribution influencers. Furthermore, ACE 2.0 is equipped with interactive visualizations that provide instant insights into a scientific community.

ForGAN - Probabilistic Forecasting

Quantifying uncertainty in forecasting is crucial for optimal decision-making, especially in sensitive domains. In the project ForGAN, we employed the power of Generative Adversarial Networks to learn probability distributions in order to provide a probabilistic model for forecasting tasks. Using ForGAN, we can learn the full probability distribution of all possible outcomes. With this information at hand, our method can not only quantify uncertainty precisely but also provide extra insights about future outcomes (e.g., skew, long tails, etc.). These insights can indicate various phenomena depending on the domain.
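Once a generative model can draw many possible outcomes for the same forecast, uncertainty and shape statistics follow directly from the samples. The sketch below assumes such samples are already available as a list of numbers; it does not implement the GAN itself, and the function names, the linear-interpolation quantile, and the default coverage of 0.9 are choices made for this illustration.

```python
import statistics


def empirical_quantile(samples, q):
    """Quantile of the samples for 0 <= q <= 1, using linear interpolation."""
    ordered = sorted(samples)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] * (1 - fraction) + ordered[upper] * fraction


def skewness(samples):
    """Population skewness: the mean of the cubed standardized deviations."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    if std == 0:
        return 0.0
    return sum(((x - mean) / std) ** 3 for x in samples) / len(samples)


def summarize_forecast(samples, coverage=0.9):
    """Median, central prediction interval and skewness of sampled outcomes."""
    tail = (1 - coverage) / 2
    return {
        "median": empirical_quantile(samples, 0.5),
        "lower": empirical_quantile(samples, tail),
        "upper": empirical_quantile(samples, 1 - tail),
        "skewness": skewness(samples),
    }


# Illustrative samples (assumption): outcomes with one large upper outlier.
summary = summarize_forecast([1.0, 1.0, 1.0, 1.0, 10.0])
```

A wide interval signals high uncertainty, and a clearly positive skewness, as in the toy samples above, points to a long upper tail that a single point forecast would hide.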


AI-based Computer-Aided Diagnosis (CAD) has promising prospects as an assistive tool for clinicians. However, unless these CAD systems are equipped with PACE, their deployment in routine clinical setups is highly doubtful. PACE stands for Pragmatic, Accurate, Confident and Explainable; it represents a multipronged approach that enables a prototype CAD system born in the lab to integrate and perform in a practical healthcare environment. To achieve this, research is ongoing to deploy practical CAD systems seamlessly in the clinical workflow and to assist medical practitioners in a wide range of diagnostic tasks with high accuracy. Since prognosis depends greatly on a correct diagnosis, and misdiagnosis carries a very high risk, CAD systems are also required to provide an estimate of the uncertainty of their predictions. These predictions should also be explainable and justifiable in light of known medical criteria. So far, notable success has been achieved in realising clinically usable CAD systems with improved accuracy and uncertainty estimates, along with explanations of their predictions, in applications from ophthalmology and dermatology.

EDeL - Efficient Data Processing for Deep Learning

While modern GPUs can easily process hundreds or even thousands of frames per second during training, frameworks like PyTorch and TensorFlow deliver fewer than 100 frames per second per CPU core. The usual recommendation to close this gap with additional CPU resources often fails even today, because too few CPU cores are available per computing node (server) and thus per GPU. A further increase in GPU density is already foreseeable. One of EDeL's goals is therefore to develop software solutions that achieve the highest possible data throughput per CPU core for efficient training of deep learning models. This is to be achieved both by using optimized software and by relocating calculations that currently often take place on CPUs to GPUs. This will reduce idle times and thus increase the efficiency of training machine learning methods. Another aspect that stands in the way of efficient hardware use is the large variety of data sets and their formats. Even very similar data sets are often delivered in different formats, which makes it impossible to exchange them without adapting the training program. In addition, these formats can usually only be read at insufficient speed or with high CPU load, which in turn stands in the way of full use of the GPUs. Another main goal of EDeL is therefore the development of a unified storage format for data sets, which makes them exchangeable and, above all, fast to load with low CPU load. The planned developments are to be made available to the widest possible audience from research and industry. To this end, libraries are to be made available under an open source software license, as is customary in the deep learning sector.
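The mismatch between GPU throughput and per-core data loading can be made concrete with simple arithmetic. The sketch below is a back-of-the-envelope calculation; the function names and all example figures (1000 frames per second per GPU, 100 frames per second per core, 8 GPUs and 32 cores per node) are assumptions chosen to match the orders of magnitude mentioned above, not measurements.

```python
import math


def cores_needed(gpu_frames_per_s, frames_per_s_per_core, gpus_per_node=1):
    """Number of CPU cores required to keep all GPUs of a node supplied with data."""
    if frames_per_s_per_core <= 0:
        raise ValueError("frames_per_s_per_core must be positive")
    demanded = gpu_frames_per_s * gpus_per_node
    return math.ceil(demanded / frames_per_s_per_core)


def achievable_gpu_utilization(available_cores, frames_per_s_per_core,
                               gpu_frames_per_s, gpus_per_node=1):
    """Upper bound on GPU utilization when data loading is the only bottleneck."""
    supplied = available_cores * frames_per_s_per_core
    demanded = gpu_frames_per_s * gpus_per_node
    return min(1.0, supplied / demanded)


# Assumed example figures for one node.
needed = cores_needed(gpu_frames_per_s=1000, frames_per_s_per_core=100, gpus_per_node=8)
utilization = achievable_gpu_utilization(
    available_cores=32, frames_per_s_per_core=100,
    gpu_frames_per_s=1000, gpus_per_node=8,
)
```

Under these assumed figures, a node would need 80 cores to feed its GPUs but, with 32 cores, could keep them busy at most 40% of the time. Raising the throughput per core or moving preprocessing to the GPU shifts this bound, which is the motivation stated above.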


The goal of MetaDL is code generation for AI applications on a variety of accelerator-based systems based on the AnyDSL framework. Particularly interesting are systems with special instructions for deep learning such as the tensor cores on NVIDIA graphics processors (GPUs) or dedicated AI hardware such as Google's Tensor Processing Units (TPUs). The usefulness of the concepts developed is demonstrated by applications in bioinformatics (analysis of DNA sequences) and image synthesis (noise suppression).

Project Skincare

In Skincare, we develop a mobile application for patients and health professionals in the context of skin cancer diagnosis and treatment. We will combine patient records with mobile images for knowledge discovery and knowledge acquisition toward decision support and services in clinical and non-clinical environments. Input modes include smartphones for a direct digitization of patient data and images. The innovative aspect is a holistic view on individual patients based on teledermatology, whereby patient data and lesions photographed with a mobile device can be taken into account for clinical and non-clinical decision support.


The goal of KAMeri is to detect different mental and emotional states of workers cooperating with robots in a real industrial setting. We use EEG data recorded by a dry cap system built in the project to train a network to detect critical situations.


In previous applications, the online detection of error-related potentials (ErrPs) has been used in human-robot interaction, where ErrPs serve as rewards in reinforcement learning. In the project Q-Rock, we will use ErrPs in deep reinforcement learning approaches.

Q-Rock - Robot behavior learning

The Q-Rock project is about using artificial intelligence to let robots learn about their own capabilities. For this purpose, we use Deep Neural Networks to represent parameterizable robot behaviors obtained from an evolutionary algorithm. These behaviors can later be composed or refined using Deep Reinforcement Learning in simulation, and then executed on the robot.

What do Deep Networks Like to See

We propose a novel way to measure and understand convolutional neural networks by quantifying the amount of input signal they let in. To do this, an autoencoder (AE) was fine-tuned on gradients from a pre-trained classifier with fixed parameters. We compared the reconstructed samples from AEs that were fine-tuned on a set of image classifiers (AlexNet, VGG16, ResNet-50, and Inception v3) and found substantial differences. The AE learns which aspects of the input space to preserve and which ones to ignore, based on the information encoded in the backpropagated gradients. Measuring the changes in accuracy when the signal of one classifier is used by a second one, a relation of total order emerges. This order depends directly on each classifier’s input signal but it does not correlate with classification accuracy or network size. Further evidence of this phenomenon is provided by measuring the normalized mutual information between original images and auto-encoded reconstructions from different fine-tuned AEs. These findings break new ground in the area of neural network understanding, opening a new way to reason, debug, and interpret their results. We present four concrete examples in the literature where observations can now be explained in terms of the input signal that a model uses.
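The normalized mutual information mentioned above can be computed for two discrete sequences, for example quantized pixel intensities of an original image and of its reconstruction. This is a minimal sketch: several normalizations exist, the text does not state which one was used, and the variant below (twice the mutual information divided by the sum of the entropies) is an assumption.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Mutual information (in nats) between two aligned discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of raw counts.
    return sum(
        (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def normalized_mutual_information(xs, ys):
    """NMI in [0, 1], normalized by the mean of the two entropies (assumed variant)."""
    h_x, h_y = entropy(xs), entropy(ys)
    if h_x == 0 and h_y == 0:
        return 1.0  # both sequences are constant, hence trivially identical in structure
    return 2 * mutual_information(xs, ys) / (h_x + h_y)


# Toy sequences for illustration: quantized "pixels" of an image and two reconstructions.
original = [0, 0, 1, 1]
faithful = [0, 0, 1, 1]
unrelated = [0, 1, 0, 1]
nmi_faithful = normalized_mutual_information(original, faithful)
nmi_unrelated = normalized_mutual_information(original, unrelated)
```

A reconstruction that preserves the structure of the original yields a value near 1, while one that shares no information with it yields a value near 0, which makes the measure suitable for comparing reconstructions from different fine-tuned AEs.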


The MyCarlot project aims to develop a parking assistant app with which you can book a parking spot or be guided in advance to the closest parking facility, or the one with the most free spaces, in your vicinity. For this purpose, we develop a module that predicts the occupancy of a parking space from data measured by sensors in each parking spot. We use deep neural networks and LSTMs, which can accurately model seasonality during the week and on special dates, providing a superior solution for parking occupancy prediction.

Project ERICS

Refugees and migrants often lack essential information about asylum, housing, health, kindergarten, university, or work. Even when they know where and how to look for this information on the Internet, the process can be very time-consuming, exhausting, disorienting, and confusing. How do ERICS and the Eike chatbot, which is based on deep learning, help? We have built an easy-to-use chatbot service that is able to interact with refugees and migrants via natural language. To do so, we designed a new user interface to: create an empathic and engaging visual interface; facilitate a natural text-based dialogue between users and our chatbot during the information seeking and retrieval process; collect user feedback to further improve the chatbot service over time using machine learning.


The TRACTAT project aims to lay the foundation for a smooth and effective Transfer of Control (ToC) between autonomous systems and humans in cyber-physical environments. In the case of self-driving vehicles, the system must occasionally ask a human to take over in anomalous or otherwise unexpected situations. We train our ML systems on a variety of user-, environment-, and situation-related data, ranging from traffic information to biosensor input, to predict the smallest feasible transfer time that still allows a safe takeover.


In previous projects, the Python-based framework for signal processing and classification (pySPACE) has been used to detect specific electroencephalogram (EEG) patterns online in robotic applications and brain-computer interfaces (BCIs). In the project TransFIT, deep learning algorithms are implemented and integrated into pySPACE. For example, a convolutional neural network (CNN) is implemented to detect the P300, which is elicited by the recognition of task-specific events.

TransFIT - Object Recognition

In order to have helpful assistance of autonomous systems it's essential for robots to perceive and locate objects in their surroundings. In the TransFIT project we have developed a modular deep learning library for robot vision tasks. Specifically, we have implemented state-of-the-art models for real-time object detection of tools, common objects and humans. Furthermore, we have added pose and keypoint estimation methods that predict the orientation of objects in 3D space. Our library contains the complete optimization pipelines as well as a general structure for fast prototyping. This flexibility allows us to quickly develop ideas and implement new state-of-the-art deep learning architectures for robot-vision.

TransFIT - RH5 learning to walk

Walking is simple … or is it? State-of-the-art bipedal humanoid robotic systems still struggle with this surprisingly complex task. In this project, novel approaches utilizing Deep Reinforcement Learning are developed to train joint-level and whole-body controllers that minimize communication overhead. For this purpose, biologically inspired control loops are implemented. In combination with imitation learning techniques, which leverage the error correction ability of humans, we aim to reduce training time and generate stable and human-like walking behavior on a humanoid robot prototype.

TransFIT - Trajectory Learning

Learning complex end-effector trajectories to show complicated behaviors is a challenging problem for robots because the search space for reinforcement learning algorithms is very high-dimensional. Our approach tries to capture various demonstrations of a skill that should be learned from humans in the latent space of an autoencoder. We can search for a precise solution to our problem in this latent space. As a result, reinforcement learning for adaptation to new, but similar tasks can be much faster than it would be in the original space of trajectories.


The Dreams4Cars project explores improvements in autonomous driving through self-learning from a dreamlike simulation. One part of the project involves learning forward and inverse dynamics models for a real autonomous car. We use deep and recurrent neural networks for this purpose, which allows the real behavior of the vehicle to be learned directly from data, without any simulation, enabling more realistic driving for the end user.


The goal of xMove is the development of a prototypical support system, based on LSTMs and standard Machine Learning methods, for workers in the field of aircraft assembly and inspection. The current work step of a worker is recognized automatically. The collected movement data is also used to improve employee support: it is processed automatically to provide workers with ergonomic support via data glasses.

HP-DLF High Performance Deep Learning Framework

The goal of HP-DLF is to provide researchers and developers in the "deep learning" domain with easy access to current and future high-performance computing systems. For this purpose, a new software framework will be developed that automates the highly complex parallel training of large neural networks on heterogeneous computing clusters. The focus is on scaling and energy efficiency, as well as high portability and user transparency. The goal is to scale the training of networks designed in existing frameworks, without additional user effort, over a three-digit number of compute nodes.


To allow for better integration of renewable energy into our energy systems, it is essential to forecast the amount of energy production in order to keep the grids stable or to avoid blackouts. In our energy projects (Designetz, charge4C, BloGPV, PolyEnergyNet) we provide deep learning techniques for day-ahead and intraday forecasting of photovoltaic and wind energy production as well as forecasting the electricity consumption of private households. We use weather forecasts and historical production data to build and train the corresponding prediction models. Learning techniques for optimizing energy storage management are also investigated.

ZIM Kartoffelfäule

The aim of the Kartoffelfäule project is to devise an intelligent system capable of detecting plants infected with Phytophthora infestans and segmenting them from healthy plants, using images of the fields taken from a certain altitude. The system is intended to act as an early warning system that informs the responsible personnel about the presence of potato blight in the plants. Plants respond differently to different spectral bands of incident light when they undergo chemical or structural changes at the molecular level, and these changes help to differentiate between healthy and infected plants. A multi-spectral camera is therefore used to capture the behavior of the plants under different wavelengths. The camera is attached to a semi-autonomous drone, which flies over the fields and captures the images. The acquired images are then preprocessed, and the health state of the plants is determined using various machine learning methods such as support vector machines, neural networks and Convolutional Neural Networks. The system is intended to help farmers automatically identify the presence of the disease and apply pesticides only to the affected area of the field instead of the whole field, which normally spans hundreds or thousands of acres.


Predictive maintenance for printers will bring a shift in the performance and durability of commercial printing equipment. In the current commercial printing environment, commercial printers need to be able to deliver constant uptime at the customer location. To avoid service interruptions, manufacturers have mainly emphasized corrective and preventive maintenance. Corrective maintenance means fixing a machine once it has actually broken down, while preventive maintenance involves replacing parts when they are about to exceed their expected lifespan for error-free service. However, these two approaches do not satisfy the growing need of commercial printing services for reliability and predictability in their production systems.
Océ, together with technology provider DFKI, is taking the path of predicting potential issues rather than correcting or preventing them. Analyzing the sensor data produced by commercial printers with an algorithm makes it possible to determine when a part, or multiple parts, are starting to fail, so that corrective action can be taken. The algorithm itself is being developed by DFKI based on a large set of different data coming from different printers and spanning several months of operation.
A first working prototype of the whole system is to be showcased by the end of 2018, and a mature product is expected by the end of 2019.


Approaching the target while avoiding obstacles: For a manipulator to perform manipulation tasks, such as reaching or approaching an object to grasp, freely and intelligently, it is vital that it can avoid the obstacles in its path while still completing its task. For this part of the project, we devised a workflow based on Deep Learning techniques that makes the manipulator avoid obstacles while approaching the object to grasp. Based on the location of the obstacle, the manipulator changes its path to reach the object successfully.


Tasty food, sweet animals, breathtaking landscapes - to characterize something briefly, an adjective is used together with a noun. These adjective noun pairs (ANPs) describe the visual content of an image together with the feelings that it triggers in the viewer. If they occur with great frequency, they can be used for machine descriptions of images that go beyond a textual reproduction of the visual content. Capttitude (Captions with Attitude) is a system able to produce affective captions with an emotional component. Two different methods are used, which provide two variants of a text caption with emotional content: in the first approach, a Convolutional Neural Network (CNN) together with a long short-term memory (LSTM) network, and in the second approach, a graph-based concept and syntax transition (CAST) network.


The project DeepEye addresses the challenge of recognizing natural catastrophes in satellite images and enriching this information with multimedia content gained from social media. In an image analysis, the different spectral bands of satellite data are combined and the geographic areas affected by a natural catastrophe are extracted. In a multimodal analysis, relevant information from text, image and metadata is extracted using various machine learning methods such as Convolutional Neural Networks. The focus of the analysis is the extraction of contextual aspects in order to obtain a comprehensive and complete view of a specific event. The prepared data can be used in the context of crisis management, for example for the coordination of rescue forces in situ. With the combination of satellite data and multimedia content from social media, DeepEye aims for the next big step in crisis management: detailed depictions of natural catastrophes through the fusion of different information channels.


The aim of the project Mantos is the maintenance of satellites in space. For this purpose, manipulators are supposed to perform the maintenance tasks autonomously. One part of the project comprises estimating the pose of an object and then learning the skill for contact-based manipulator operation with that object, which can be considered a peg-in-a-hole task. Deep learning techniques are used for the pose estimation, and imitation learning is employed to solve the peg-in-a-hole part.


Prof. Dr. Prof. h.c. Andreas Dengel


Contact Us

Prof. Andreas Dengel
German Research Center for Artificial Intelligence
Trippstadter Straße 122
67663 Kaiserslautern

Phone: +49 631 20575 1000