AI technologies, like animal detection and pose estimation, have transformed the field of behavior recognition. This research is vital for advancements in agriculture, animal care, veterinary medicine, and various science fields, including pharmacology and neurobiology.
Our article discusses how artificial intelligence and computer vision are applied to identify patterns in animal behavior. We also address the challenges faced and the methods implemented to tackle them, highlighting the importance of AI in enhancing our understanding and management of animal behavior.
We will review several important AI methods that are widely used in the field of animal behavior analysis.
A "behavior" means how an animal or person acts in response to a particular situation or stimulus. On the other hand, from a deep learning perspective, the behavior could be described as a sequence of an animal’s actions or movements. The task of animal behavior recognition becomes a matter of identifying and classifying these movement patterns using observed data, such as video footage.
In the majority of cases, the process involves detecting animals, along with estimating and tracking their poses. However, some methods do not rely on pose tracking. We will begin by exploring approaches dedicated to animal pose detection and then proceed to discuss combined methods that address both pose and behavior recognition.
Pose estimation and tracking are crucial for recognizing animal behavior. Methods like MoSeq (discussed below) or B-SOiD utilize sequences of keypoints that represent the positions of animal body parts over time, instead of relying directly on video footage. These keypoints, forming a pose or skeleton, can be depicted in 2D or 3D space and must accurately capture the animal's posture as seen by the camera.
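To make this concrete, here is a minimal sketch of how such a keypoint sequence might be stored in code; the array shape and body-part names are illustrative assumptions rather than a format required by any particular tool.

```python
import numpy as np

# Hypothetical example: 300 frames, 2 animals, 7 body parts, (x, y) coordinates.
# Real tools (SLEAP, DeepLabCut) export similar arrays, though exact layouts differ.
n_frames, n_animals, n_keypoints = 300, 2, 7
body_parts = ["nose", "left_ear", "right_ear", "spine", "left_hip", "right_hip", "tail_base"]

# poses[t, a, k] = (x, y) position of body part k of animal a at frame t.
# NaN marks keypoints that are occluded or not detected in that frame.
poses = np.full((n_frames, n_animals, n_keypoints, 2), np.nan)

# Downstream behavior models (e.g., MoSeq, B-SOiD) consume such sequences
# rather than raw video frames.
```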
Pose estimation methods are already well-developed for human subjects, as we discussed in our article “AI and Fitness: Explore Exercise with Pose Tracking Technology”. Furthermore, extending these techniques to analyze animal movements has also advanced significantly, supported by numerous specialized papers and repositories.
An example of such adaptation is ScarceNet, an AI animal pose recognition model developed by Chen Li and Gim Hee Lee in 2023. ScarceNet employs a semi-supervised learning approach, combining pseudo-labels produced by a pre-trained model with a small amount of human-labeled data to train a model for tracking 2D and 3D poses across various animals using one or multiple cameras.
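The general idea behind this kind of semi-supervised training can be sketched roughly as follows. This is a simplified illustration of pseudo-labeling, not ScarceNet's actual implementation, and the names (`pose_model`, `teacher_model`, `confidence`) are hypothetical placeholders.

```python
import torch

def train_step(pose_model, teacher_model, labeled_batch, unlabeled_images,
               optimizer, conf_threshold=0.8):
    """One simplified semi-supervised step: a supervised loss on the small labeled
    set plus a pseudo-label loss on unlabeled images. Illustrative only; this is
    not ScarceNet's actual training procedure."""
    images, keypoints = labeled_batch

    # Supervised loss on the scarce human-labeled data.
    loss = torch.nn.functional.mse_loss(pose_model(images), keypoints)

    # Pseudo-labels: a pre-trained teacher predicts keypoints for unlabeled frames;
    # only confident predictions are kept as training targets for the student.
    with torch.no_grad():
        pseudo_keypoints, confidence = teacher_model(unlabeled_images)
    mask = (confidence > conf_threshold).float().unsqueeze(-1)  # (batch, n_keypoints, 1)
    loss = loss + (mask * (pose_model(unlabeled_images) - pseudo_keypoints) ** 2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```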
The challenge lies in the diversity of animal species, each with its unique shape, size, and body structure. Even within the same species, factors such as age, gender, size, and coloration significantly alter an animal's appearance.
Animals can also adopt a variety of intricate poses, and some of them, like snakes or octopuses, are capable of highly flexible movements, which makes their poses challenging to estimate.
As a result, developing a universal solution for animal pose estimation is a complicated task. Below, we look at two methods that detect animal poses accurately and that you can use in your own projects.
SLEAP is an open-source tool introduced by Pereira et al. (2022). It is designed to track how animals move and interact, especially when multiple animals are present, which is particularly important for studying social behaviors. A common challenge arises when animals are close together or overlap in view: because they occlude each other, it becomes hard to tell which body part belongs to which animal.
There are two basic approaches to this challenge, top-down and bottom-up, both of which are implemented in SLEAP. The top-down approach first locates each animal instance with a bounding box and then solves a more standard single-animal pose estimation problem with a neural network (by default, SLEAP uses a UNet backbone, but other options are available). This approach is usually faster but scales poorly to large numbers of animals.
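Schematically, the top-down route looks like the sketch below; the detector and pose network here are hypothetical placeholders rather than SLEAP's actual API.

```python
import numpy as np

def top_down_pose(frame, animal_detector, pose_network):
    """Schematic top-down pipeline: detect each animal first, then run
    single-animal pose estimation on every crop. Both models are
    hypothetical placeholders, not SLEAP's actual API."""
    poses = []
    # 1. Detect animals: each box is (x_min, y_min, x_max, y_max).
    for (x0, y0, x1, y1) in animal_detector(frame):
        crop = frame[int(y0):int(y1), int(x0):int(x1)]
        # 2. Estimate keypoints inside the crop, then shift them back
        #    into full-frame coordinates.
        keypoints = pose_network(crop)            # shape: (n_keypoints, 2)
        poses.append(keypoints + np.array([x0, y0]))
    return poses  # one (n_keypoints, 2) array per detected animal
```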
The bottom-up approach involves determining the location of all body parts in the image and grouping them into instances.
First, SLEAP's neural network detects all the keypoints visible in the video frame but doesn't immediately know which animal they belong to. Part affinity fields, also predicted by the network, come into play here, creating links, or "graphs," between these points to determine the exact pose of each animal, even in a crowded scene.
This approach greatly improves the accuracy of pose estimation when many animals are present in the scene.
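A heavily simplified sketch of that grouping step is shown below. It uses a generic Hungarian assignment over part-affinity scores and is not SLEAP's exact matching procedure; `paf_score` is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_body_parts(part_a, part_b, paf_score):
    """Assign candidate keypoints of two connected body parts (e.g. head and thorax)
    to each other using part-affinity scores. Simplified illustration of the
    bottom-up grouping step; SLEAP's actual matching is more involved.

    part_a, part_b: (N, 2) and (M, 2) arrays of candidate keypoint coordinates.
    paf_score(p, q): hypothetical helper returning the affinity of the connection
    from p to q, e.g. the mean alignment of the predicted field with the p->q vector.
    """
    scores = np.array([[paf_score(p, q) for q in part_b] for p in part_a])
    # Hungarian assignment maximizes the total affinity of the chosen connections,
    # so keypoints end up grouped into the most plausible animal instances.
    rows, cols = linear_sum_assignment(-scores)
    return [(i, j) for i, j in zip(rows, cols) if scores[i, j] > 0]
```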
DeepLabCut is a widely used tool for 2D and 3D animal pose estimation and tracking. It provides a comprehensive infrastructure, including a GUI for data labeling, and allows you to train and evaluate your own models or apply existing deep learning models from its extensive model zoo.
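For orientation, a typical DeepLabCut workflow looks roughly like this. The function names come from DeepLabCut's documented Python API, but arguments and defaults vary between versions, so treat it as a sketch rather than a copy-paste recipe.

```python
import deeplabcut

# Create a project from your videos; this returns the path to the project's config.yaml.
config_path = deeplabcut.create_new_project(
    "mouse-openfield", "lab", ["videos/session1.mp4"], copy_videos=True
)

# Extract and label frames (labeling opens the GUI), then build the training set.
deeplabcut.extract_frames(config_path)
deeplabcut.label_frames(config_path)
deeplabcut.create_training_dataset(config_path)

# Train and evaluate the pose estimation network.
deeplabcut.train_network(config_path)
deeplabcut.evaluate_network(config_path)

# Run inference on new videos; keypoints are saved alongside the videos.
deeplabcut.analyze_videos(config_path, ["videos/session2.mp4"])
```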
Like SLEAP, it utilizes standard convolutional neural network backbones, such as ResNets or EfficientNets, for animal pose estimation and employs part affinity fields to group identified keypoints into instances.
DeepLabCut also introduces a method for accurate and efficient animal tracking through an unsupervised reID system. Their article, "Multi-animal pose estimation, identification and tracking with DeepLabCut," proposes a transformer-based neural network head that learns animal identities using the same CNN backbone. This is particularly useful in scenarios where basic tracking methods like SORT fail to correctly resolve tracks due to interactions between animals or in instances where animals can enter or exit the scene.
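The underlying re-identification idea can be illustrated with a generic embedding-matching step like the one below. This is not DeepLabCut's transformer head, just the general principle of assigning tracks to identities by appearance similarity.

```python
import numpy as np

def assign_identities(track_embeddings, identity_prototypes):
    """Generic re-identification step: match each track's appearance embedding to
    the closest known identity by cosine similarity. Illustrative only; DeepLabCut's
    transformer-based reID head is more elaborate."""
    tracks = track_embeddings / np.linalg.norm(track_embeddings, axis=1, keepdims=True)
    protos = identity_prototypes / np.linalg.norm(identity_prototypes, axis=1, keepdims=True)
    similarity = tracks @ protos.T          # (n_tracks, n_identities)
    return similarity.argmax(axis=1)        # identity index for each track
```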
After covering several of the most popular animal pose estimation and tracking solutions, let's explore widely used techniques for analyzing animal behavior. The first method works with 3D depth videos or with keypoints derived from an animal pose recognition technique. The second simplifies the process by recognizing behaviors directly from video, without the need for pose estimation as an intermediate step. We'll delve into the technical details of both approaches.
MoSeq, or Motion Sequencing, is a behavioral analysis method developed at Harvard Medical School (2019) to study and understand how mice behave.
The method combines 3D imaging with unsupervised machine learning (an autoregressive hidden Markov model, or AR-HMM) to identify recurring patterns in animal behavior. It takes as input a 3D depth video recording or a sequence of keypoints, which can be obtained using technologies like ScarceNet or DeepLabCut. This input is processed and translated into "syllables."
Think of "syllables" as the basic building blocks of the mouse's actions, much like how syllables form words in language. For example, a mouse might have a "syllable" for running, another for grooming, and another for eating. By identifying these "syllables," researchers can piece together the mouse's behavior more coherently, helping them understand the full story of what the mouse does and why.
MoSeq provides a novel approach to identifying and organizing these behavior "syllables" and tracking how often they change. This method is particularly valuable in neurobiology, where it has been used to identify specific drug-induced behaviors in mice.
One of the key benefits of using MoSeq is its reliance on unsupervised machine learning, which eliminates the need for manual data labeling. However, this approach comes with limitations. Mapping the identified "syllables" onto behaviors that we can easily understand can be difficult, and the system requires retraining for each new dataset. Furthermore, the use of an autoregressive hidden Markov model (AR-HMM) means that only one syllable label can be assigned to each frame, even though an animal might perform multiple actions simultaneously.
MoSeq has significantly impacted research, allowing scientists to analyze and predict mouse behavior with high accuracy in such vital areas as the study of Alzheimer's disease ("Revealing the structure of pharmacobehavioral space through motion sequencing," 2020) or epilepsy ("Hidden behavioral fingerprints in epilepsy," 2023).
LabGym, a tool developed by researchers at the University of Michigan and Northern Illinois University (2023), quantifies specific animal behaviors from standard 2D video footage and handles a diverse range of animals and environments.
It can analyze not only the collective behavior of a group of animals, such as grooming, but also identify specific actions of individual animals within the group, like a chipmunk taking a peanut from a hand.
The workflow begins with detecting animals in the footage. This can be done either by background subtraction, which is quick but less effective with dynamic backgrounds or during close contact between animals, or by more sophisticated instance segmentation provided by Detectron2. The latter is slower but offers higher accuracy and adaptability.
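The background-subtraction route can be illustrated with standard OpenCV building blocks. This is a generic MOG2-based sketch, not LabGym's own code, and the file name and area threshold are arbitrary assumptions.

```python
import cv2

# Generic background-subtraction sketch (not LabGym's own implementation).
cap = cv2.VideoCapture("animals.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Foreground mask: moving animals appear as white blobs on a black background.
    mask = subtractor.apply(frame)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Keep blobs large enough to be an animal; each remaining contour is a detection.
    animals = [c for c in contours if cv2.contourArea(c) > 500]

cap.release()
```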
The extracted data is used to generate a short animation of behavior for a chosen time window and a pattern image that visualizes the movement within that period. These outputs are analyzed by LabGym's Categorizer, which combines several neural networks with both convolutional and recurrent layers to identify user-defined behavior present in the animation. This component requires human-labeled data for its supervised learning process.
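The general pattern of combining convolutional and recurrent layers to classify a short clip can be sketched as follows. This is a generic PyTorch illustration, not LabGym's actual Categorizer architecture.

```python
import torch
import torch.nn as nn

class ClipCategorizer(nn.Module):
    """Generic CNN + LSTM clip classifier: per-frame convolutional features are
    fed to a recurrent layer that summarizes the whole time window.
    Illustrative only; LabGym's actual Categorizer differs in its details."""

    def __init__(self, n_behaviors, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> (batch, 32) per frame
        )
        self.rnn = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_behaviors)

    def forward(self, clip):                              # clip: (batch, time, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.rnn(feats)
        return self.head(h[-1])                           # behavior logits per clip

# Example: classify 2 clips of 16 frames each into 4 user-defined behaviors.
logits = ClipCategorizer(n_behaviors=4)(torch.randn(2, 16, 3, 64, 64))
```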
Additionally, LabGym introduces the Quantifier, a useful tool in research settings for measuring behavioral metrics such as movement speed and action intensity, and for counting the occurrences of specific behaviors.
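For instance, movement speed can be derived from a tracked centroid trajectory as in the sketch below. The frame rate and pixel-to-centimeter calibration are assumed values, and this is not LabGym's Quantifier code.

```python
import numpy as np

def movement_speed(centroids, fps=30.0, cm_per_pixel=0.05):
    """Per-frame speed (cm/s) from a tracked centroid trajectory of shape (n_frames, 2).
    fps and cm_per_pixel are assumed calibration values, not defaults from any tool."""
    step = np.linalg.norm(np.diff(centroids, axis=0), axis=1)  # pixels per frame
    return step * cm_per_pixel * fps

# Example: count frames where the animal moves faster than 2 cm/s.
trajectory = np.cumsum(np.random.default_rng(1).normal(size=(300, 2)), axis=0)
speed = movement_speed(trajectory)
print((speed > 2.0).sum())
```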
To capture the real-world impact and breadth of deep learning in animal pose estimation and behavior analysis, we cite several studies that demonstrate the significant advances made in this area.
1. “EXPLORE: a novel deep learning-based analysis method for exploration behaviour in object recognition tests” by Victor Ibañez et al. (2023). Published in the Nature Journal, this paper introduces a technique for identifying and quantifying exploration behavior in rodents, a key component of memory function research.
2. "Counting Piglet Suckling Events using Deep Learning-based Action Density Estimation” by Haiming Gan et al. (2023). Featured in ScienceDirect, the study proposes a novel method for analyzing the frequency of piglet suckling events.
3. "Explainable Automated Pain Recognition in Cats" by Marcelo Feighelstein et al. (2023). Published in the Nature Journal, this research develops a method to assess pain in cats through facial expression analysis, advancing animal welfare and veterinary care.
4. "Automatic detection of stereotypical behaviors of captive wild animals based on surveillance videos of zoos and animal reserves" by Zixuan Yin et al. (2024). Available in ScienceDirect, this paper aims to detect early signs of stress and depression in captive wild animals, suggesting improvements in captivity conditions to enhance animal well-being.
These examples highlight that animal behavior recognition has applications ranging from neuroscience and pharmacology, where it deepens our understanding of cognitive function and therapeutic interventions, to agriculture, where it advances monitoring practices and enhances animal welfare.
The application of AI is crucial for advancing our understanding of animal welfare, agricultural practices, and scientific research. Although deep learning introduces certain challenges, such as the need for large annotated datasets and the difficulty of tracking multiple animals simultaneously, improvements in algorithms have led to more accurate and efficient systems.
These advances allow researchers to study the natural behavior of animals, help improve animal care practices, and increase the accuracy of neurobiological and pharmacological studies.
At OpenCV.ai, we are dedicated to developing computer vision solutions tailored for laboratories to analyze animal behavior. Whether you're looking to integrate AI into your research projects or require specialized consultation, our team is here to provide expert guidance and support. Visit our Services page for more details on how we can assist in elevating your research.
Thank you for engaging with our content!