The Language of Gait


Language, the words and sentences we use to make our intentions known to others and to understand their intentions toward us, is only part of human-to-human communication. Non-verbal communication (gestures, actions, body movements) is often said to make up as much as three quarters of our ability to communicate. Within weeks of being born we begin to learn the meaning of a whole physical language, from a mother’s smile to a sibling’s frown. From then on, how others sit, how they stand, where they look and what they do with their hands and feet give us clues about their thoughts, sincerity and mood.

This is clearly noticeable on the street, among crowds of people walking in every direction. Some walk fast, some slow. Some upright, some low. Some hold their arms out wide like a penguin, while others hold close what’s dear to them: body and belongings. Some have a confident, bouncy walk and move cheerfully forward without keeping to a straight line. Others walk pensively, lost in their own thoughts. Their cadence, their body weight and how they handle it, their breadth of movement, and the way they carve a path through the crowd each tell a story. The authoritative, disciplined walk of a police officer is very different from the playful movements of children. People with injuries become noticeable as they carefully try to hide or avoid the pain they are in.

Gait is not always unconscious, as one might assume. It has been a form of expression for as long as human history has been recorded. The march of an army bears witness to its confidence, discipline and commitment. Famous characters and performances, such as Charlie Chaplin’s spread-eagle stride, Monty Python’s Ministry of Silly Walks and Frankenstein’s monster, conveyed humor or terror through their iconic body movements.


Among the sensory inputs that humans can give a machine, the most recent and arguably the richest source of data is the ‘gait’ of an individual. Compared to elementary inputs like touch (primary and fairly sophisticated in its usage), speech, gaze and biometrics, non-verbal communication, and gait in particular, can passively derive contextual information about a situation from the movement of the human body.

For example, the way people move and the way they hold themselves play a key role in identification. Faces and facial expressions are important in the same way the spoken word is, but we also take a mental snapshot of someone’s body, their gestures and mannerisms. The better we know someone, the better we are able to recognize, interpret and predict their body language, because we have evolved to be sensitive to human movement. We can even judge someone’s emotional state simply by observing their movement, expression, eyes and even their breathing, and this is something we do unconsciously.

This article, however, tries to discuss the possible use cases of this form of communication with or by a machine, that is, human-computer interaction using non-verbal communication, specifically the gait of an individual. In 2017, researchers at the School of Psychology at the University of Aberdeen, led by Dr Karin Pilz, demonstrated that people can be recognized by their movement[1]. They created a pair of computer-generated characters and asked 16 participants to identify them by observing their body language. One of the characters performed professional karate movements, while the other did the same but in a more amateur way. When their faces were swapped over, participants were still able to identify them based on their movement. The study showed that the less we are able to recognize someone, the more we rely on watching them in action. This means that we are able to recognize people from a distance even if we are unable to see their faces. To make sure participants were not distracted by other factors, the study used faces that had the same hairstyle, ears and face outline.

As can be found elsewhere in the series, the earliest biometrics were palm prints, which suited the computational facilities available in the 1970s. Interest then turned to the more popular biometrics: the fingerprint, given its long forensic use; the face, given that it is non-invasive and can be captured without a subject's knowledge or interaction; and the iris. Iris recognition has proved quite an inspiration in biometrics, providing some of the largest biometric deployments with excellent performance. The fingerprint is now used in products such as mobile phones, computers and access control. Face recognition has a more checkered history, but it is the biometric favored by many in view of its practical advantages. Those same advantages, of course, make face recognition more difficult to deploy, as can be found in other volumes in the International Series on Biometrics. Visitors to the US now routinely find their fingerprints and faces recorded at ports of entry. Our aim here is to set the scene, not to contrast merit and advantage; that comes later. One of the main reasons for the late entry of gait onto the biometrics stage was not just the idea but also the technology. Recognition by gait requires processing sequences of images, which imposes a large computational burden, and only recent advances in speed and memory have made gait practicable as a biometric.

As a concept, identification based on body language is familiar even in popular culture. In Mission: Impossible – Rogue Nation, Agent Dunn has to bypass a gait-analysis security system, which identifies people by the way they walk, to enter a closely guarded power plant. In 2008, Mark Nixon, a professor of computer vision at the UK’s University of Southampton, developed a 3-D gait-recognition system[2] that analyses video to identify individuals by their strut. His improved system can now reportedly identify a person from up to 100 feet away. Cities in China and the UK are reported to use gait detection to spot criminals and monitor civilians.

But a few questions arise:

  1. If gait analysis is capable of detecting the idiosyncratic patterns of an individual, can it, with the help of machine learning, identify generalized behavior or mood patterns? In short, is there a general way in which we move when we are sad, confident, or feeling some other specific emotion?

  2. If so, under what scenarios will such knowledge be required, and what will be its advantages and disadvantages over traditional data channels such as gestures, touch and speech?

  3. Since most methods for gait analysis use surveillance data, how will we deal with the ethical and moral implications of such knowledge? How can we keep it out of the wrong hands and protect the privacy and identity of its subjects? Also, how can the results be kept free of bias based on color, race, gender, age, etc.?


3.1 Immigration and Homeland Security

Biometrics has risen to prominence quickly, despite its short history. The current political agendas of many countries are permeated by questions that biometrics might answer, including security and immigration. US Citizenship and Immigration Services requires applicants for immigration benefits to be fingerprinted for the purpose of conducting FBI criminal background checks; US-VISIT requires that most foreign visitors traveling to the US on a visa have their two index fingers scanned and a digital face photograph taken to verify their identity at the port of entry. In the Enhanced Border Security and Visa Entry Reform Act of 2002, the US Congress mandated the use of biometrics with US visas. This law required that embassies and consulates abroad issue to international visitors "only machine-readable, tamper-resistant visas and other travel and entry documents that use biometric identifiers," not later than October 26, 2004.

Having been largely a university research topic in 2002, biometrics has moved fast. The move was largely due to performance: biometrics offer a combination of speed and security, ideal in any mass transit scenario. Also, since they are part of a human subject, they are in principle difficult to counterfeit. Not only this, but they are amenable to electronic storage and checking, and devices with such capability continue to proliferate. It is for these reasons that face, iris and fingerprint have been evaluated for security and immigration. Other biometrics have not enjoyed this attention: some do not lend themselves well to that application scenario, while others, like gait, were simply too new to be considered at that time.

3.2 Surveillance

In many developed countries, concern over security has manifested itself in surveillance systems. These systems are particularly advanced in the UK, where online face recognition is already in routine use to deter crime. In fact, the inspiration for Southampton's gait research was a high-profile case in the UK in which a child was abducted and murdered and only the gait of the murderer could be determined from the surveillance data. Since only gait could be perceived, was it a valid biometric? A primary aim of surveillance is naturally to deter criminal acts; much of it is video, and it has been used as evidence in courts. The video data can suffer from poor quality due to low resolution, time-lapse imagery (images recorded at a frequency much lower than the video sampling rate to save on storage) and tape re-use, as well as from a subject concealing the more conventional biometrics. But it does offer data to which gait recognition technology could be applied.

The ongoing trend is that deployment of surveillance systems will continue to increase, suggesting wider deployment of gait recognition techniques.

3.3 Human ID at a Distance (HiD) Program

The single main contributor to progress in automatic recognition by gait has been the Defense Advanced Research Projects Agency's (DARPA's) Human ID at a Distance research program, led by Dr. Jonathon Phillips of the National Institute of Standards and Technology (NIST). This program embraced three main areas: face, gait and new technologies. It was initially aimed at improving security at US embassies following terrorist acts in 1998. The Human ID at a Distance program started in 2000 and finished in 2004 (ironically, privacy concerns in the US led to its closure). Gait is a natural contender for recognition at a distance, given its unique capabilities. The aim of the gait program was to progress from laboratory-based studies on small populations to large-scale populations of real-world data. Of the current approaches to recognition by gait and the data that can be used to analyze performance, those from MIT, Georgia Institute of Technology (GaTech), NIST and the Universities of Maryland (UMD), Southampton (Soton), Carnegie Mellon (CMU) and South Florida (USF) were originally associated with the Human ID at a Distance program.


Desmond Morris, in his book Peoplewatching, describes 20 different ways in which human beings move from one place to another without any artificial aid[3]. Among them are the most common movement patterns that humans exhibit, some of which stand out in public. They are explained below:

  1. The Totter: Unsteady, slow, vertical locomotion. Usually seen among infants, this form of locomotion is repeated in later life only by the invalid, the injured, and the intoxicated, when, for some reason, the adult legs become as unreliable as the infantile ones.

  2. The Stroll: Particular form of slow walking, usually performed at a rate of about one stride per second, where there is clearly no desire to arrive somewhere, but merely to travel. In addition to being a social walk, the Stroll is also the gait of a person pacing up and down, deep in thought, or reading a book.

  3. The Shuffle: Hobbling gait of the aged and the infirm in which the speed of movement is drastically reduced and the feet are slid cautiously along the ground.

  4. The Walk: Ordinary bipedal movement. During walking, either one or two feet are touching the ground at any moment. The walker is never airborne. This is the essential difference between walking and running.

  5. The Hurry: Fast walk of a person who is desperate to get somewhere quickly, but who has not yet been driven to the extreme of breaking into a run.

  6. The Run: The form of locomotion in which the individual becomes airborne and the whole sequence of striding changes. There is no longer a moment when both feet are on the ground together, and the runner is 'sailing through the air'.

  7. The Jog: Deliberately slowed-down version of the Run, carrying the jogger along at little more than a brisk walking pace while avoiding the exhaustion of a run.

  8. The Sprint: High-speed version of running in which the feet strike the ground not flat but at the toe end, with the heels hardly making any contact at all. The stride rate is dramatically increased.

  9. The Tiptoe: Hybrid of slow walking and sprinting. This mode of progression is less noisy than the ordinary walk and is used for a stealthy approach towards an unsuspecting victim, or when trying not to waken a sleeper.
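The distinction Morris draws between the Walk and the Run (a walker is never airborne, a runner never has both feet down together) can be turned into a simple rule. The sketch below is only an illustration: it assumes that some earlier stage has already produced per-frame foot-contact flags, and the function name and the 'mixed' label are hypothetical.

```python
from typing import Sequence


def classify_by_foot_contact(left_down: Sequence[bool],
                             right_down: Sequence[bool]) -> str:
    """Label a clip 'walk', 'run' or 'mixed' from per-frame foot contact.

    left_down[i] / right_down[i] are True when that foot touches the
    ground in frame i (assumed to come from an earlier detection step).
    """
    if len(left_down) != len(right_down) or len(left_down) == 0:
        raise ValueError("need two non-empty sequences of equal length")

    feet_on_ground = [int(l) + int(r) for l, r in zip(left_down, right_down)]
    has_flight = any(n == 0 for n in feet_on_ground)          # airborne frame
    has_double_support = any(n == 2 for n in feet_on_ground)  # both feet down

    if not has_flight:
        return "walk"   # always one or two feet on the ground
    if not has_double_support:
        return "run"    # airborne phases and never both feet together
    return "mixed"      # e.g. a clip that contains both gaits
```

A clip in which the subject changes gait partway through would come back as 'mixed', which is one reason a real system would classify short windows rather than whole clips.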

One of the unconscious processes at work as we move about in a crowd is the automatic classification of each person we see into one of the locomotion categories. Without being aware of it, we tag each person as a stroller, a shuffler, a walker, or a hurrier and then make unconscious calculations concerning where their gaits will bring them, relative to our own movements. In this way we can anticipate and avoid collisions more efficiently. If people did not employ these characteristic and identifiable gaits when moving about in public, we would find manoeuvring among them much more difficult than it is. This would be especially true for people who are blind and often have to make their way through crowds.

Some other noticeable forms of locomotion are marching, goose-stepping, hopping, skipping, climbing, swinging, jumping, swimming and acrobatics, but these are rarely seen in public except among children or sports professionals.

TOTTER GAIT: Probably the most familiar gait pattern, easily detected by humans but not so easily by machines. Technology that has such situational information can respond accordingly and help avoid mishaps. CCTV cameras around a bar or club could detect whether a person is drunk beyond the permissible limit and summon an autonomous car to drive that individual to their destination. Similarly, 360-degree cameras mounted on a car could detect a drunken gait and lock the vehicle if the driver is too drunk to drive. Such measures could lower the number of risky drivers on the street and help avoid alcohol-related road accidents.

STROLL GAIT: Use of cellphones caused nearly 2000 deaths in 2016, according to the report on road accidents by the Ministry of Road Transport and Highways in India. Using machine learning and pose training, pedestrians engaged with their mobile devices can be alerted to incoming traffic. Such detection is difficult or inconclusive when other computer vision approaches are employed.

Unauthorized entry: Gait analysis can non-invasively detect theft or unsolicited entry onto private property and notify the concerned authority.

Authorized entry: Gait detection can respond to the decorum of a place, e.g. devices can be switched to silent mode when a person is entering a meeting.

SHUFFLE GAIT: Drones that act as first responders to an accident or natural calamity can assess the medical situation of victims using gait analysis. Such data is useful for authorities to plan the supply of food and medicine to the affected region and to attend to the more urgent cases first. A similar algorithm can be used in military reconnaissance missions to assess combat strategies.


What we really need is a checklist of typical poses and positions in order to understand, or at least guess, their meaning. But be warned: although interpreting body language might appear to be a matter of common sense, most of what follows is only a rough guide and may not be entirely accurate! The gait of an individual depends on the complex interplay of a variety of factors, a few of which are listed below:

Extrinsic: such as terrain, footwear, clothing, cargo (its size, shape, temperature, etc.)
Intrinsic: sex, weight, height, age, etc.
Physical: such as weight, height, physique
Psychological: personality type, emotions, urgency
Physiological: anthropometric characteristics, i.e. measurements and proportions of the body
Pathological: e.g. trauma, neurological diseases, musculoskeletal anomalies, psychiatric disorders


Many studies have considered human motion extraction and tracking, though not for biometric recognition purposes, and there is quite a range of detailed surveys of research in this area. As with mainstream computer vision, the earliest surveys confined their analysis to motion modelling and tracking.

The suggested applications included athletic training, biomechanical evaluation, animation and machine interaction, but not biometrics, which was then at a very early stage. At a similar time, another survey reviewed work on visual analysis of gestures and whole-body movement and distinguished 2D approaches, with or without shape models, from 3D approaches. The applications in focus included gesture-based interaction, motion analysis and model-based image coding, but again not biometric use.

The human body is an extremely complex object, being highly articulated and capable of a variety of motions. Rotations and twists of each body part occur in nearly every movement, and various parts of the body continually move into and out of occlusion. The selection of a good body model is important for efficiently recognizing human shapes in images and properly analyzing human motion. Stick figure models and volumetric models are commonly used for three-dimensional tracking; the ribbon model and blob model are also used but are less popular. Stick figure models connect sticks at joints to represent the human body. Volumetric models, on the other hand, give a fuller representation of the human body. Among the different volumetric models, generalized cones are the most commonly used.

However, these structural models need to be modified according to the application and are mainly used in human motion tracking. The alternative is to consider the property of the spatio-temporal pattern as a whole. In current research, human motion may refer to different gestures, different athletic activities (tennis, ballet) or human walking and running, and the analysis varies accordingly. There are two main methods to model human motion. The first is model-based: after the human body model is selected, the 3-D structure of the model is recovered from image sequences with or without moving light displays. The second emphasizes determining features of motion fields without structural reconstruction.

Ideas from human motion studies can be used for modelling the movement of human walking. Hogg[4] and Rohr[5] use flexion/extension curves for the hip, knee, shoulder and elbow joints in their walking models. A different approach to the modelling of motion was taken by Akita[6], who used a sequence of stick figures, called a key frame sequence, to model rough movements of the body. In his key frame sequence of stick figures, each figure represents a different phase of body posture from the point of view of occlusion. The key frame sequence is determined in advance and referred to in the prediction process.
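To make the idea of a flexion/extension curve concrete, here is a minimal sketch that computes a knee flexion angle per frame from 2-D joint positions. It is not the formulation used by Hogg or Rohr; the function names, the 2-D simplification and the convention that a straight leg has 0 degrees of flexion are all assumptions made for illustration.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("joint positions must be distinct")
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def knee_flexion_curve(hips, knees, ankles):
    """Per-frame knee flexion: 0 for a straight leg, larger when bent."""
    return [180.0 - joint_angle(h, k, a)
            for h, k, a in zip(hips, knees, ankles)]
```

Plotting the returned list against the frame index gives the kind of periodic curve that a model-based walking model can be fitted to.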

Other approaches differ from the above in that they consider the properties of the spatio-temporal pattern as a whole. These are the model-free approaches, versions of which are found in gait biometrics. Polana and Nelson[7] defined temporal textures as motion patterns of indeterminate spatial and temporal extent, activities as motion patterns that are temporally periodic but limited in spatial extent, and motion events as isolated simple motions that do not exhibit any temporal or spatial repetition. Little and Boyd's approach[8] is similar to Polana and Nelson's idea, but they derive a dense 2-D optical flow of the person and, from it, a series of measures of the position of the person and the distribution of the flow. The frequency and phase of these periodic signals are determined and used as features of the motion.
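As a rough illustration of using frequency and phase as features, the sketch below estimates the dominant frequency and its phase for a single scalar signal with a plain discrete Fourier transform. It does not reproduce Little and Boyd's optical-flow measures; the input is assumed to be any periodic per-frame measurement (for example, the horizontal position of an ankle), and the function name is hypothetical.

```python
import cmath
import math


def dominant_frequency_and_phase(signal, frame_rate):
    """Return (frequency in Hz, phase in radians) of the strongest
    periodic component of a per-frame signal, using a direct DFT."""
    n = len(signal)
    if n < 4:
        raise ValueError("signal is too short")
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the constant offset

    best_k, best_coeff = 1, 0j
    for k in range(1, n // 2 + 1):  # skip bin 0, stop at Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > abs(best_coeff):
            best_k, best_coeff = k, coeff

    frequency_hz = best_k * frame_rate / n
    return frequency_hz, cmath.phase(best_coeff)
```

The direct DFT costs O(n²) and is used here only to keep the example free of dependencies; an FFT routine would be the usual choice for long sequences.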


To build a strong case in favor of gait, we need machine learning: classification algorithms can be used to predict whether a particular trait in gait exhibits itself under certain conditions. If a sufficient amount of training data (for, let us say, drunk gait or sneak gait) is collected, the algorithm can detect the same patterns in a test data set. In ML terminology, this is called supervised classification. For simplicity, we will look more closely at the drunk gait use case.
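Supervised classification can be sketched with one of the simplest possible classifiers, a nearest-centroid rule. This is only an illustration of the train-then-predict idea, not the method a production system would use. The two features (cadence and lateral sway), the labels and every number below are invented for the example.

```python
import math


def train_nearest_centroid(samples, labels):
    """Average the feature vectors of each label into one centroid."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the features."""
    return min(centroids,
               key=lambda label: math.dist(centroids[label], features))


# Made-up training data: (cadence in steps/s, lateral sway in cm).
training_samples = [(1.8, 2.0), (1.9, 2.5), (1.1, 8.0), (1.0, 9.0)]
training_labels = ["steady", "steady", "totter", "totter"]
centroids = train_nearest_centroid(training_samples, training_labels)
```

Calling `predict(centroids, (1.2, 7.5))` then assigns an unseen sample to the nearer of the two learned centroids. Because the features have different units, a real pipeline would normalize them before measuring distance.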

Before doing so, I will try to answer the questions that I raised at the beginning of the article. Drunk driving is no doubt a leading cause of road accidents, and despite numerous efforts such as ad campaigns and awareness drives, the menace continues. A breath analyzer is a great way to detect and fine the perpetrators, but it is not practical to monitor every driver on the road. It is here that machines can help the authorities.

There are many challenges to deal with. Gait analysis involves not just recognizing objects, but recognizing moving objects. To successfully capture the gait of an individual, the program must know the accurate location of the body joints in space at every point in time. In a controlled environment such as a lab, there are no obstructions and the background is a green screen, which makes it easy to isolate the moving pedestrians (aka ‘targets’) from the scene. Raw CCTV footage can have various problems: there might not be enough light, the frames can be hazy, and there can be many obstructions (moving cars, other pedestrians, etc.). The algorithm therefore proceeds step by step:


Getting the image sequence, then resizing and formatting it to prepare the frames for analysis. Each frame has to be resized to allow faster, real-time computation. This is done using a method called pooling. The main aim at this stage is to downsize the inputs for the machine without losing useful information.
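One common reading of pooling is average pooling, where each non-overlapping tile of pixels is replaced by its mean. The sketch below shows this on a grayscale frame stored as a list of rows. The function name and the plain-list representation are assumptions chosen to keep the example dependency-free; an array library would normally be used.

```python
def average_pool(frame, block):
    """Downsample a 2-D grayscale frame by averaging block x block tiles.

    Rows and columns that do not fill a complete tile are dropped.
    """
    if block < 1:
        raise ValueError("block size must be at least 1")
    height = len(frame) - len(frame) % block
    width = len(frame[0]) - len(frame[0]) % block

    pooled = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            tile = [frame[y][x]
                    for y in range(top, top + block)
                    for x in range(left, left + block)]
            row.append(sum(tile) / len(tile))
        pooled.append(row)
    return pooled
```

With `block=2` the frame shrinks to a quarter of its original pixel count, which is the kind of reduction that makes later stages cheaper at the cost of fine detail.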


Background modelling, image subtraction and edge detection to remove visual obstructions from the image, leaving only the moving objects in the scene. An additional edge detection method can be added.
We use OpenCV methods for this. Only the features required to detect the pose of the humans in the frame are kept. This makes the next phase work much faster.

Figure: original image (left) and extracted moving object (right).
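The background modelling and subtraction step can be illustrated without any library. The sketch below takes the per-pixel median over a few frames as the background and marks pixels that differ from it by more than a threshold. It is a simplified stand-in for the OpenCV routines mentioned above, not a copy of them; the function names and the threshold value are assumptions.

```python
from statistics import median


def median_background(frames):
    """Estimate the static background as the per-pixel median over time."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def foreground_mask(frame, background, threshold=25):
    """Mark pixels (1) that differ from the background by more than
    the threshold; everything else is treated as background (0)."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frame_row, background_row)]
            for frame_row, background_row in zip(frame, background)]
```

The median works here because a moving object covers any given pixel in only a minority of frames. Slow lighting changes or a person standing still would need an adaptive background model instead.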


Pose estimation and detection using the PoseNet model to determine the position of the joints in space per instance. The PoseNet library from TensorFlow can be used to estimate instance-based poses. This layer discards the features that carry the 'appearance' information of the body. Hence, appearance-based biases of race, color, gender, age, etc., are largely removed at this stage. What we receive are stick figures of the individuals in each frame.
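Once keypoints are available, turning them into a comparable stick figure is mostly bookkeeping. The sketch below assumes the pose estimator's output has already been converted into a dictionary of keypoint name to (x, y) pixel position. The camelCase names mirror PoseNet's keypoint names, but the dictionary layout, the choice of joint pairs and both function names are assumptions. It does not call PoseNet itself.

```python
import math

# Joint pairs drawn as "sticks"; this particular selection is illustrative.
SKELETON = [
    ("leftShoulder", "rightShoulder"),
    ("leftHip", "rightHip"),
    ("leftShoulder", "leftHip"),
    ("rightShoulder", "rightHip"),
    ("leftHip", "leftKnee"),
    ("leftKnee", "leftAnkle"),
    ("rightHip", "rightKnee"),
    ("rightKnee", "rightAnkle"),
]


def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def normalize_pose(keypoints):
    """Center the pose on the mid-hip and scale it by torso length, so
    position in the frame and distance from the camera drop out."""
    hip = midpoint(keypoints["leftHip"], keypoints["rightHip"])
    shoulder = midpoint(keypoints["leftShoulder"], keypoints["rightShoulder"])
    torso = math.dist(hip, shoulder)
    if torso == 0:
        raise ValueError("degenerate pose: torso length is zero")
    return {name: ((x - hip[0]) / torso, (y - hip[1]) / torso)
            for name, (x, y) in keypoints.items()}


def stick_segments(keypoints):
    """Return the stick-figure line segments for the joints present."""
    return [(keypoints[a], keypoints[b]) for a, b in SKELETON
            if a in keypoints and b in keypoints]
```

Normalizing by torso length is one simple way to make poses of people at different distances comparable before they are passed to a classifier.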



For a complete gait cycle, the body points are passed as inputs to a machine learning algorithm. The image sequences serve as the training data set for a particular kind of gait. The output is a prediction in the form of a label from among the ones defined in the previous sections.
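Before body points can be grouped by gait cycle, the cycle boundaries have to be found. One simple heuristic, sketched below, treats each local maximum of the horizontal distance between the ankles as a step and takes two consecutive steps as one cycle. It assumes a side-on camera view, reasonably smooth keypoint tracks and hypothetical function names; real footage would need smoothing first.

```python
def step_frames(left_ankle_x, right_ankle_x):
    """Frames where the ankles are farthest apart (one per step)."""
    separation = [abs(l - r) for l, r in zip(left_ankle_x, right_ankle_x)]
    return [i for i in range(1, len(separation) - 1)
            if separation[i - 1] < separation[i] >= separation[i + 1]]


def gait_cycles(steps):
    """Pair step frames into non-overlapping (start, end) cycles.

    One gait cycle spans two steps, so it runs from a step to the
    second step after it.
    """
    return [(steps[i], steps[i + 2]) for i in range(0, len(steps) - 2, 2)]


def cadence(steps, frame_rate):
    """Average number of steps per second between first and last step."""
    if len(steps) < 2:
        raise ValueError("need at least two detected steps")
    return (len(steps) - 1) * frame_rate / (steps[-1] - steps[0])
```

The frames inside each (start, end) pair can then be resampled to a fixed length so that every cycle yields a feature vector of the same size for the classifier.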


  1. Pilz, K. S., & Thornton, I. M. (2017). Idiosyncratic body motion influences person recognition. Visual Cognition, 25(4–6), 539–549

  2. Nixon, M. S., Tan, T., & Chellappa, R. (2006). Human Identification Based on Gait (1st ed.). Springer US.

  3. Desmond Morris (2002). Peoplewatching: The Desmond Morris guide to Body Language (2nd Edition) Locomotion, 437-44, Random House Publication.

  4. D. Hogg, Model-based vision: a program to see a walking person, Image and Vision Computing, 1(1), pp. 5-20, 1983

  5. K. Rohr, Towards model-based recognition of human movements in image sequences, Computer Vision, Graphics, and Image Processing, 59(1), pp. 94-115, 1994

  6. K. Akita, Image Sequence Analysis of Real World Human Motion, Pattern Recognition, 17(1), pp. 73-83, 1984

  7. R. Polana and R. Nelson, Detecting activities, Proc. Conf. on Computer Vision and Pattern Recognition, New York, USA, pp. 2-7, 1993

  8. J. Little and J. Boyd, Describing motion for recognition, Proceedings of the International Symposium on Computer Vision, pp 235-240, 1995