
Ing. Zdeněk Straka, Ph.D.

All publications

PreCNet: Next-Frame Video Prediction Based on Predictive Coding

  • DOI: 10.1109/TNNLS.2023.3240857
  • Link: https://doi.org/10.1109/TNNLS.2023.3240857
  • Department: Vidění pro roboty a autonomní systémy
  • Abstract:
    Predictive coding, currently a highly influential theory in neuroscience, has not been widely adopted in machine learning yet. In this work, we transform the seminal model of Rao and Ballard (1999) into a modern deep learning framework while remaining maximally faithful to the original schema. The resulting network we propose (PreCNet) is tested on a widely used next-frame video prediction benchmark, which consists of images from an urban environment recorded from a car-mounted camera, and achieves state-of-the-art performance. Performance on all measures (MSE, PSNR, SSIM) was further improved when a larger training set (2M images from BDD100k) was used, pointing to the limitations of the KITTI training set. This work demonstrates that an architecture carefully grounded in a neuroscience model, without being explicitly tailored to the task at hand, can exhibit exceptional performance.
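
    As a rough illustration of the Rao and Ballard predictive-coding scheme that PreCNet builds on, the toy sketch below iteratively updates a latent representation to reduce the bottom-up prediction error. It is a minimal single-layer illustration with made-up dimensions, not the PreCNet architecture itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: a 16-dim input reconstructed from an 8-dim latent state.
U = rng.normal(scale=0.1, size=(16, 8))   # generative (top-down) weights
x = rng.normal(size=16)                   # observed input
r = np.zeros(8)                           # latent representation

def pc_step(r, x, U, lr=0.1):
    """One inference step: compute the prediction error and move the
    representation to reduce it (gradient descent on the squared error)."""
    err = x - U @ r            # bottom-up prediction error
    r = r + lr * (U.T @ err)   # top-down state update
    return r, err

errors = []
for _ in range(200):
    r, err = pc_step(r, x, U)
    errors.append(float(np.sum(err ** 2)))

# The squared prediction error shrinks as inference proceeds.
print(errors[0], "->", errors[-1])
```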

Single-Grasp Deformable Object Discrimination: The Effect of Gripper Morphology, Sensing Modalities, and Action Parameters

  • DOI: 10.1109/TRO.2024.3463402
  • Link: https://doi.org/10.1109/TRO.2024.3463402
  • Department: Vidění pro roboty a autonomní systémy
  • Abstract:
    In haptic object discrimination, the effect of gripper embodiment, action parameters, and sensory channels has not been systematically studied. We used two anthropomorphic hands and two 2-finger grippers to grasp two sets of deformable objects. On the object classification task, we found: (i) among classifiers, SVM on sensory features and LSTM on raw time series performed best across all grippers; (ii) faster compression speeds degraded performance; (iii) generalization to different grasping configurations was limited; transfer to different compression speeds worked well for the Barrett Hand only. Visualization of the feature spaces using PCA showed that gripper morphology and action parameters were the main source of variance, making generalization across embodiment or grip configurations very difficult. On the highly challenging dataset consisting of polyurethane foams alone, only the Barrett Hand achieved excellent performance. Tactile sensors can thus provide a key advantage even if recognition is based on stiffness rather than shape. The data set with 24,000 measurements is publicly available.

A normative model of peripersonal space encoding as performing impact prediction

  • DOI: 10.1371/journal.pcbi.1010464
  • Link: https://doi.org/10.1371/journal.pcbi.1010464
  • Department: Vidění pro roboty a autonomní systémy
  • Abstract:
    Accurately predicting contact between our bodies and environmental objects is paramount to our evolutionary survival. It has been hypothesized that multisensory neurons responding both to touch on the body, and to auditory or visual stimuli occurring near them—thus delineating our peripersonal space (PPS)—may be a critical player in this computation. However, we lack a normative account (i.e., a model specifying how we ought to compute) linking impact prediction and PPS encoding. Here, we leverage Bayesian Decision Theory to develop such a model and show that it recapitulates many of the characteristics of PPS. Namely, a normative model of impact prediction (i) delineates a graded boundary between near and far space, (ii) demonstrates an enlargement of PPS as the speed of incoming stimuli increases, (iii) shows stronger contact prediction for looming than receding stimuli, a prediction that, critically, is still present for receding stimuli when observation uncertainty is non-zero, (iv) scales with the value we attribute to environmental objects, and finally (v) can account for the differing sizes of PPS for different body parts. Together, these modeling results support the conjecture that PPS reflects the computation of impact prediction, and make a number of testable predictions for future empirical studies.
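
    The flavour of such a normative account can be conveyed by a deliberately simplified sketch, assuming Gaussian uncertainty on a one-dimensional distance estimate. The helper names (`contact_probability`, `response`) and all parameter values are illustrative, not taken from the paper's model.

```python
from math import erf, sqrt

def contact_probability(distance, speed, horizon=1.0, sigma=0.2):
    """P(stimulus reaches the body within `horizon` seconds), given a noisy
    distance estimate (Gaussian with std `sigma`): contact occurs if the
    true distance is below speed * horizon."""
    reach = speed * horizon
    z = (reach - distance) / (sigma * sqrt(2))
    return 0.5 * (1.0 + erf(z))          # Gaussian CDF at `reach`

def response(distance, speed, value=1.0):
    """Normative defensive response: expected cost of ignoring the stimulus,
    scaled by the value attributed to the object."""
    return value * contact_probability(distance, speed)

# (i) graded near/far boundary: the response falls off smoothly with distance
near, far = response(0.2, speed=0.5), response(2.0, speed=0.5)
# (ii) PPS enlarges with stimulus speed: same distance, stronger response
slow, fast = response(0.8, speed=0.5), response(0.8, speed=1.5)
print(near > far, fast > slow)
```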

Learning a peripersonal space representation using Conditional Restricted Boltzmann Machine

  • Department: Vidění pro roboty a autonomní systémy
  • Abstract:
    We present a neural network learning architecture composed of a Restricted Boltzmann Machine (RBM) and a Conditional RBM (CRBM) that performs multisensory integration and prediction, motivated by the problem of learning a representation of defensive peripersonal space. This work follows up on our previous work (Straka and Hoffmann 2017), where we proposed a network composed of an RBM and a feedforward neural network (FFNN). In this work, with a similar 2D simulated scenario, we sought to replace the FFNN with an RBM-like module and opted for the CRBM, which is responsible for making a temporal prediction. We demonstrate that the new architecture is capable of learning to map from visual and tactile inputs at a previous time step (without tactile activation) to future activations with the visual stimulus at the “skin” and corresponding tactile activation, including the confidence of the predictions.

Robotic homunculus: Learning of artificial skin representation in a humanoid robot motivated by primary somatosensory cortex

  • DOI: 10.1109/TCDS.2017.2649225
  • Link: https://doi.org/10.1109/TCDS.2017.2649225
  • Department: Vidění pro roboty a autonomní systémy
  • Abstract:
    Using the iCub humanoid robot with an artificial pressure-sensitive skin, we investigate how representations of the whole skin surface resembling those found in primate primary somatosensory cortex can be formed from local tactile stimulations traversing the body of the physical robot. We employ the well-known self-organizing map algorithm and introduce its modification that makes it possible to restrict the maximum receptive field (MRF) size of neuron groups at the output layer. This is motivated by findings from biology where basic somatotopy of the cortical sheet seems to be prescribed genetically and connections are localized to particular regions. We explore different settings of the MRF and the effect of activity-independent (input-output connection constraints implemented by the MRF) and activity-dependent (learning from skin stimulations) mechanisms on the formation of the tactile map. The framework conveniently allows one to specify prior knowledge regarding the skin topology and thus to effectively seed a particular representation that training shapes further. Furthermore, we show that the MRF modification facilitates learning in situations when concurrent stimulation at nonadjacent places occurs (“multitouch”). The procedure is sufficiently robust, is not demanding in terms of data collection, and can be applied to any robot where a representation of its “skin” is desirable.
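
    A toy one-dimensional sketch of a self-organizing map whose winner search and learning are restricted by maximum-receptive-field windows. The skin layout, window sizes, and learning parameters are invented for illustration and do not reproduce the iCub setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "skin": taxel positions on a 1-D strip in [0, 1]; 10 map neurons.
n_out = 10
weights = rng.uniform(0, 1, size=n_out)    # preferred skin location per neuron
# Maximum receptive field (MRF): neuron i may only connect to skin locations
# inside its window (the activity-independent, "genetic" constraint).
mrf_lo = np.linspace(0.0, 0.8, n_out)
mrf_hi = mrf_lo + 0.4

def som_step(x, weights, lr=0.2, sigma=1.0):
    """One self-organizing-map update restricted by the MRF windows."""
    allowed = (x >= mrf_lo) & (x <= mrf_hi)
    if not allowed.any():
        return weights
    dist = np.where(allowed, np.abs(weights - x), np.inf)
    win = int(np.argmin(dist))             # winner among allowed neurons only
    h = np.exp(-((np.arange(n_out) - win) ** 2) / (2 * sigma ** 2))
    h = np.where(allowed, h, 0.0)          # neurons outside their MRF never learn x
    return weights + lr * h * (x - weights)  # activity-dependent learning

for _ in range(3000):
    weights = som_step(rng.uniform(0, 1), weights)

# After training, preferred locations follow the staggered MRF windows,
# i.e. the map is (approximately) topographic.
print(weights.round(2))
```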

Toward safe separation distance monitoring from RGB-D sensors in human-robot interaction

  • Department: Vidění pro roboty a autonomní systémy
  • Abstract:
    The interaction of humans and robots in less constrained environments has been gaining a lot of attention lately, and the safety of such interaction is of utmost importance. Two ways of risk assessment are prescribed by recent safety standards: (i) power and force limiting and (ii) speed and separation monitoring. Unlike typical solutions in industry that are restricted to mere safety zone monitoring, we present a framework that realizes separation distance monitoring between a robot and a human operator in a detailed, yet versatile, transparent, and tunable fashion. The separation distance is assessed pair-wise for all keypoints on the robot and the human body and as such can be selectively modified to account for specific conditions. The operation of this framework is illustrated on a Nao humanoid robot interacting with a human partner perceived by a RealSense RGB-D sensor and employing the OpenPose human skeleton estimation algorithm.
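
    The pair-wise assessment can be sketched as follows; the keypoint coordinates and protective thresholds are hypothetical, and the per-pair threshold matrix stands in for the selectively tunable conditions mentioned above.

```python
import numpy as np

def separation_status(robot_kps, human_kps, thresholds):
    """Pair-wise separation monitoring: compute the distance between every
    (robot, human) keypoint pair and flag pairs closer than their threshold.

    robot_kps:  (R, 3) robot keypoint positions in metres
    human_kps:  (H, 3) human keypoint positions in metres
    thresholds: (R, H) per-pair protective distances (tunable per body part)
    """
    diff = robot_kps[:, None, :] - human_kps[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)   # (R, H) pairwise distances
    return dist, dist < thresholds         # distances + violation mask

robot = np.array([[0.0, 0.0, 1.0], [0.3, 0.0, 0.8]])  # e.g. end effector, elbow
human = np.array([[0.5, 0.0, 1.0], [1.5, 0.0, 1.0]])  # e.g. hand, head
thr = np.full((2, 2), 0.6)
thr[:, 1] = 1.0      # stricter protective distance around the head

dist, violated = separation_status(robot, human, thr)
print(dist.round(2))
print(violated)
```

Because the distances are evaluated per pair rather than per zone, individual entries of the threshold matrix can be tightened or relaxed without affecting the rest of the monitoring.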

Versatile distance measurement between robot and human key points using RGB-D sensors for safe HRI

  • DOI: 10.5445/IR/1000086870
  • Link: https://doi.org/10.5445/IR/1000086870
  • Department: Vidění pro roboty a autonomní systémy
  • Abstract:
    The safety of interaction between collaborative robots and humans can be guaranteed in two main ways: (i) power and force limiting and (ii) speed and separation monitoring. We present a framework that realises separation distance monitoring between a robot and a human operator based on pair-wise evaluation of key points. We show preliminary results using a setup with a Nao humanoid robot and a RealSense RGB-D sensor, employing the OpenPose human skeleton estimation algorithm, and work in progress on a KUKA LBR iiwa platform.

Learning a Peripersonal Space Representation as a Visuo-Tactile Prediction Task

  • DOI: 10.1007/978-3-319-68600-4_13
  • Link: https://doi.org/10.1007/978-3-319-68600-4_13
  • Department: Vidění pro roboty a autonomní systémy
  • Abstract:
    The space immediately surrounding our body, or peripersonal space, is crucial for interaction with the environment. In primate brains, specific neural circuitry is responsible for its encoding. An important component is a safety margin around the body that draws on visuo-tactile interactions: approaching stimuli are registered by vision and processed, producing anticipation or prediction of contact in the tactile modality. The mechanisms of this representation and its development are not understood. We propose a computational model that addresses this: a neural network composed of a Restricted Boltzmann Machine and a feedforward neural network. The former learns in an unsupervised manner to represent position and velocity features of the stimulus. The latter is trained in a supervised way to predict the position of touch (contact). Unique to this model, it considers: (i) stimulus position and velocity, (ii) uncertainty of all variables, and (iii) not only multisensory integration but also prediction.

Where is my forearm? Clustering of body parts from simultaneous tactile and linguistic input using sequential mapping

  • Authors: Štěpánová, K., doc. Mgr. Matěj Hoffmann, Ph.D., Ing. Zdeněk Straka, Ph.D., Klein, F.B., Cangelosi, A., Vavrečka, M.
  • Published in: Kognice a umělý život XVII [Cognition and Artificial Life XVII]. Bratislava: Comenius University Bratislava, 2017, pp. 155-162. ISBN 978-80-223-4346-6.
  • Year: 2017
  • Department: Vidění pro roboty a autonomní systémy
  • Abstract:
    Humans and animals are constantly exposed to a continuous stream of sensory information from different modalities. At the same time, they form more compressed representations of concepts or symbols. In species that use language, this process is further structured by linguistic interaction, in which a mapping between sensorimotor concepts and linguistic elements needs to be established. There is evidence that children might be learning language by simply disambiguating potential meanings based on multiple exposures to utterances in different contexts (cross-situational learning). In existing models, the mapping between modalities is usually found in a single step by directly using frequencies of referent and meaning co-occurrences. In this paper, we present an extension of this one-step mapping and introduce a newly proposed sequential mapping algorithm together with a publicly available Matlab implementation. For demonstration, we have chosen a less typical scenario: instead of learning to associate objects with their names, we focus on body representations. A humanoid robot is receiving tactile stimulations on its body, while at the same time listening to utterances of the body part names (e.g., hand, forearm, and torso). With the goal of arriving at the correct “body categories”, we demonstrate how a sequential mapping algorithm outperforms one-step mapping. In addition, the effects of data set size and of noise in the linguistic input are studied.
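
    For contrast, the one-step baseline described above, which maps each referent directly to its most frequent co-occurring word, can be sketched as follows (the proposed sequential algorithm itself is not reproduced here); the region and word labels are invented.

```python
from collections import Counter, defaultdict

def one_step_mapping(episodes):
    """Baseline cross-situational learner: associate each body region with the
    word it most often co-occurs with.  `episodes` is a list of
    (stimulated_region, words_heard) pairs; the word sets may be ambiguous."""
    cooc = defaultdict(Counter)
    for region, words in episodes:
        cooc[region].update(words)
    return {region: counts.most_common(1)[0][0]
            for region, counts in cooc.items()}

# Ambiguous toy input: each touch comes with the correct name plus a distractor.
episodes = [
    ("r1", ["hand", "torso"]),
    ("r1", ["hand", "forearm"]),
    ("r2", ["forearm", "hand"]),
    ("r2", ["forearm", "torso"]),
]
print(one_step_mapping(episodes))  # → {'r1': 'hand', 'r2': 'forearm'}
```

Repeated exposure in different contexts lets the co-occurrence statistics single out the correct name even though every individual episode is ambiguous.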

Responsible for this page: Ing. Mgr. Radovan Suk