Computer vision is now one of the key areas of artificial intelligence. It enables computers to “see” and understand visual data, ranging from face recognition and autonomous driving to the analysis of human movement in sports or medicine. As the amount of visual data continues to grow, so does the importance of methods that can accurately and reliably identify objects and people in images, even in challenging situations.
This is precisely the problem addressed by Constantin Kolomiiets in his paper "SAM-pose2seg: Pose-Guided Human Instance Segmentation in Crowds." He builds on the existing Segment Anything (SAM) model, which is used for object segmentation in images. Based on a set of input points, the model can recognize a specific object and create its segmentation mask—a collection of all pixels belonging to that object or person in the image. However, SAM is a general-purpose model and can fail in certain situations, such as when people overlap in a crowd or are in physical contact. "The model often incorrectly includes parts of other bodies or detects only clothing instead of the entire person," explains Kolomiiets, who worked on the paper together with Ing. Miroslav Purkrábek and Prof. Jiří Matas from the Visual Recognition Group (VRG) at CTU FEE.
The solution was to modify and retrain the model so that it adapts specifically to recognizing human figures. Instead of the general inputs on which the original SAM model was trained, the authors guided their SAM-pose2seg model during training using inputs derived from specific human poses, such as joint or nose positions. As a result, they achieved significantly better performance in difficult scenarios with overlapping people. "The new model also helped simplify the algorithm we used to select a set of input points from the full pose, allowing it to better handle inaccurate poses," adds the paper’s first author, Constantin Kolomiiets.
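To make the idea of pose-derived inputs more concrete, here is a minimal sketch of how a handful of point prompts might be chosen from a full pose. It is not the selection algorithm from the paper, which the article does not describe. The `Keypoint` type, the `select_prompt_points` function, the confidence threshold, and the sample coordinates are all hypothetical and serve only as an illustration: keypoints the pose estimator is unsure about (for example, occluded joints) are dropped, and the most confident remaining ones are kept.

```python
from typing import NamedTuple


class Keypoint(NamedTuple):
    """One joint of a 2D pose (hypothetical structure for illustration)."""
    name: str
    x: float
    y: float
    confidence: float  # pose-estimator score in [0, 1]


def select_prompt_points(pose, max_points=3, min_confidence=0.5):
    """Return up to `max_points` (x, y) prompts from the most confident keypoints.

    Assumption: keypoints below `min_confidence` are likely occluded or
    misplaced, so they are not used as prompts.
    """
    visible = [kp for kp in pose if kp.confidence >= min_confidence]
    # Highest confidence first; sorting is stable, so ties keep input order.
    visible.sort(key=lambda kp: kp.confidence, reverse=True)
    return [(kp.x, kp.y) for kp in visible[:max_points]]


# Made-up pose of one person; the right shoulder is hidden behind someone else.
pose = [
    Keypoint("nose", 120.0, 40.0, 0.95),
    Keypoint("left_shoulder", 100.0, 80.0, 0.88),
    Keypoint("right_shoulder", 140.0, 82.0, 0.35),
    Keypoint("left_hip", 105.0, 160.0, 0.72),
    Keypoint("right_hip", 135.0, 161.0, 0.60),
]

prompts = select_prompt_points(pose)
print(prompts)  # [(120.0, 40.0), (100.0, 80.0), (105.0, 160.0)]
```

In a real pipeline, points like these would be passed to the segmentation model as positive prompts for the person in question; how the actual model consumes them is outside the scope of this sketch.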
The result is a method capable of precisely "cutting out" individual figures from an image (more specifically, producing their segmentation masks) based on their skeletal structure. The model can be used, for example, for tracking players on a field, identifying individuals in a crowd, or analyzing human movement. The technology can also assist in annotating data that does not yet have segmentation masks or, in the future, in converting 2D images into 3D models, which has potential applications in virtual reality or virtual clothing try-on.
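As a rough illustration of what "cutting out" a figure with a segmentation mask means, the sketch below applies a per-pixel true/false mask to a tiny grayscale image stored as nested lists. The `cut_out` function, the 3×3 image, and the mask values are invented for this example and do not come from the paper; a real system would work on full-size images and on masks predicted by the model.

```python
def cut_out(image, mask, background=0):
    """Keep pixels where the mask is True; replace all others with `background`.

    `image` and `mask` are lists of rows with identical dimensions
    (a simplifying assumption made for this sketch).
    """
    if len(image) != len(mask) or any(
        len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same shape")
    return [
        [pixel if keep else background for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Tiny made-up grayscale image and a plus-shaped mask marking the "person".
image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
mask = [
    [False, True, False],
    [True, True, True],
    [False, True, False],
]

person_only = cut_out(image, mask)
print(person_only)  # [[0, 20, 0], [40, 50, 60], [0, 80, 0]]
```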
The 29th Computer Vision Winter Workshop took place in Jindřichův Hradec from February 9 to 11, 2026. Each year, research groups focused on computer vision from Prague, Ljubljana, Graz, and Vienna come together at this event. The workshop aims to facilitate the sharing of new scientific insights among groups and to provide young researchers and students with conference experience. It was at this workshop that Constantin Kolomiiets received the Best Paper Award for his research article and resulting model. The jury appreciated the clear presentation of the underlying principles and, in particular, the contribution of the work to the field.
"The award is a great motivation for me to continue my research, and I am very pleased that even beginners have the opportunity to participate in such conferences and receive recognition for their work," says Kolomiiets, who continues to work on the project within the Visual Recognition Group (VRG) at CTU FEE. Together with his colleagues, he is now exploring newly available technologies and considering future directions for the project.
Constantin Kolomiiets joined the VRG research group even before starting his first year at CTU FEE, through the summer internships for prospective first-year students that the Open Informatics program offers annually. "Although I had no prior experience in computer vision at the time, Prof. Matas explained the essence of the project I was to contribute to, and I then worked mainly with Matěj Suchánek and Mirek Purkrábek, who were very supportive and willing to answer all my questions," Kolomiiets says of his summer internship at VRG. At that time, he worked on a simpler task, as he was just getting acquainted with the basics; he focused on a single model and data augmentation. Nevertheless, the internship gave him a clear idea of what working in a research group looks like.
Kolomiiets rejoined the VRG the following summer after completing his first year of studies. "I wanted to gain valuable experience in the field, not only to absorb theoretical knowledge during my studies but also to apply the skills I had acquired in practice," he explains. This is why he became involved in research at VRG again and began working on the SAM-pose2seg model. He rates his experience in the research group very highly and recommends it to other students. "I really appreciate that the university provides opportunities already in the early years of study to engage in cutting-edge research reflecting current trends at leading universities worldwide. I would recommend starting with a summer internship and then deciding whether it is manageable time-wise. If you achieve good results and can handle everything consistently, I wouldn’t hesitate to give it a try. It is also excellent preparation for an independent project and a bachelor’s thesis in the third year," Kolomiiets concludes.
Photo Credit: Petr Neugebauer